From dc552c6739a5774ee60073cfdf6fe41a6505ffa1 Mon Sep 17 00:00:00 2001
From: Stefan Stojanovic <StefanStojanovic@users.noreply.github.com>
Date: Fri, 11 Oct 2024 12:28:55 +0200
Subject: [PATCH 001/216] build,win: enable pch for clang-cl

Fixes: https://github.com/nodejs/node/issues/55208
PR-URL: https://github.com/nodejs/node/pull/55249
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
---
 tools/gyp/pylib/gyp/generator/msvs.py | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/gyp/pylib/gyp/generator/msvs.py b/tools/gyp/pylib/gyp/generator/msvs.py
index 13b0794b4dccc3..3f2497c8127a3d 100644
--- a/tools/gyp/pylib/gyp/generator/msvs.py
+++ b/tools/gyp/pylib/gyp/generator/msvs.py
@@ -3416,7 +3416,11 @@ def _FinalizeMSBuildSettings(spec, configuration):
     )
     # Turn on precompiled headers if appropriate.
     if precompiled_header:
-        precompiled_header = os.path.split(precompiled_header)[1]
+        # MSVC finds the header by file name alone, e.g. "v8_pch.h", but ClangCL
+        # needs the full path, e.g. "tools/msvs/pch/v8_pch.h", to find the file.
+        # Note: only ClangCL defines msbuild_toolset; for MSVC it is None.
+        if configuration.get("msbuild_toolset") != 'ClangCL':
+            precompiled_header = os.path.split(precompiled_header)[1]
         _ToolAppend(msbuild_settings, "ClCompile", "PrecompiledHeader", "Use")
         _ToolAppend(
             msbuild_settings, "ClCompile", "PrecompiledHeaderFile", precompiled_header

From e70abce96a0b1f9db596e79acf1b83814f52cab3 Mon Sep 17 00:00:00 2001
From: Jimmy Leung <43258070+hkleungai@users.noreply.github.com>
Date: Fri, 11 Oct 2024 21:03:35 +0800
Subject: [PATCH 002/216] doc: fix the return type of
 outgoingMessage.setHeaders()

The implementation returns the `outgoingMessage` instance itself (`this`),
which is not necessarily an `http.ServerResponse`.

Refs: https://github.com/nodejs/node/blob/20d8b85d3493bec944de541a896e0165dd356345/lib/_http_outgoing.js#L712-L751
PR-URL: https://github.com/nodejs/node/pull/55290
Reviewed-By: Paolo Insogna <paolo@cowtech.it>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Qingyu Deng <i@ayase-lab.com>
---
 doc/api/http.md | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/doc/api/http.md b/doc/api/http.md
index fb035b6fcc996f..0f9baef123edf8 100644
--- a/doc/api/http.md
+++ b/doc/api/http.md
@@ -3289,9 +3289,7 @@ added:
 -->
 
 * `headers` {Headers|Map}
-* Returns: {http.ServerResponse}
-
-Returns the response object.
+* Returns: {this}
 
 Sets multiple header values for implicit headers.
 `headers` must be an instance of [`Headers`][] or `Map`,
@@ -3300,14 +3298,14 @@ its value will be replaced.
 
 ```js
 const headers = new Headers({ foo: 'bar' });
-response.setHeaders(headers);
+outgoingMessage.setHeaders(headers);
 ```
 
 or
 
 ```js
 const headers = new Map([['foo', 'bar']]);
-res.setHeaders(headers);
+outgoingMessage.setHeaders(headers);
 ```
 
 When headers have been set with [`outgoingMessage.setHeaders()`][],

From fbfcb0cc084e42a72d7fd3e1a3a804c8d487fc6a Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Fri, 11 Oct 2024 15:30:32 +0200
Subject: [PATCH 003/216] doc: edit onboarding guide to clarify when mailmap
 addition is needed

PR-URL: https://github.com/nodejs/node/pull/55334
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 onboarding.md | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/onboarding.md b/onboarding.md
index 0731161a4c4e25..bdd266eb52799b 100644
--- a/onboarding.md
+++ b/onboarding.md
@@ -216,16 +216,14 @@ needs to be pointed out separately during the onboarding.
     `git show --format=%B 6669b3857f0f43ee0296eb7ac45086cd907b9e94`
 * Collaborators are in alphabetical order by GitHub username.
 * Optionally, include your personal pronouns.
-* The PR should include an addition to the
-  [mailmap](.mailmap) file if the email
-  being added to the collaborator list does not match the email used for
-  commits. Otherwise tooling will not see the collaborator as being active and
-  may suggest removing them. See
-  [gitmailmap](https://git-scm.com/docs/gitmailmap) for information on the
-  format of the mailmap file.
-* Add the `Fixes: <collaborator-nomination-issue-url>` to the commit message
+* Commit, including a `Fixes: <collaborator-nomination-issue-url>` trailer
   so that when the commit lands, the nomination issue url will be
   automatically closed.
+* Run `tools/find-inactive-collaborators.mjs`. If that command outputs your name,
+  amend the commit to include an addition to the [mailmap](.mailmap) file. See
+  [gitmailmap](https://git-scm.com/docs/gitmailmap) for information on the
+  format of the mailmap file.
+* Push the commit to your own fork.
 * Label your pull request with the `doc`, `notable-change`, and `fast-track`
   labels. The `fast-track` label should cause the Node.js GitHub bot to post a
   comment in the pull request asking collaborators to approve the pull request

From 8d3d4a9fab5bcebb4044166cafc6a205108d69e9 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Fri, 11 Oct 2024 19:18:24 +0000
Subject: [PATCH 004/216] meta: bump step-security/harden-runner from 2.9.1 to
 2.10.1

Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.9.1 to 2.10.1.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](https://github.com/step-security/harden-runner/compare/5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde...91182cccc01eb5e619899d80e4e971d6181294a7)

---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/55220
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/workflows/scorecard.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/scorecard.yml b/.github/workflows/scorecard.yml
index 5bda200cc79aa7..5c6a6edd7d91e1 100644
--- a/.github/workflows/scorecard.yml
+++ b/.github/workflows/scorecard.yml
@@ -33,7 +33,7 @@ jobs:
 
     steps:
       - name: Harden Runner
-        uses: step-security/harden-runner@5c7944e73c4c2a096b17a9cb74d65b6c2bbafbde  # v2.9.1
+        uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7  # v2.10.1
         with:
           egress-policy: audit  # TODO: change to 'egress-policy: block' after couple of runs
 

From 158c8ad77c565225de082e941b9f6990c9c178e8 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Fri, 11 Oct 2024 19:18:39 +0000
Subject: [PATCH 005/216] meta: bump github/codeql-action from 3.26.6 to
 3.26.10

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.6 to 3.26.10.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/4dd16135b69a43b6c8efb853346f8437d92d3c93...e2b3eafc8d227b0241d48be5f425d47c2d750a13)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/55221
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/workflows/scorecard.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/scorecard.yml b/.github/workflows/scorecard.yml
index 5c6a6edd7d91e1..c23dae03fc036c 100644
--- a/.github/workflows/scorecard.yml
+++ b/.github/workflows/scorecard.yml
@@ -73,6 +73,6 @@ jobs:
 
       # Upload the results to GitHub's code scanning dashboard.
       - name: Upload to code-scanning
-        uses: github/codeql-action/upload-sarif@4dd16135b69a43b6c8efb853346f8437d92d3c93  # v3.26.6
+        uses: github/codeql-action/upload-sarif@e2b3eafc8d227b0241d48be5f425d47c2d750a13  # v3.26.10
         with:
           sarif_file: results.sarif

From 7441e289db3ec5d56bf02c867e948db2111f5108 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Fri, 11 Oct 2024 19:19:28 +0000
Subject: [PATCH 006/216] meta: bump codecov/codecov-action from 4.5.0 to 4.6.0

Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4.5.0 to 4.6.0.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/e28ff129e5465c2c0dcc6f003fc735cb6ae0c673...b9fd7d16f6d7d1b5d2bec1a2887e65ceed900238)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/55222
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/workflows/coverage-linux-without-intl.yml | 2 +-
 .github/workflows/coverage-linux.yml              | 2 +-
 .github/workflows/coverage-windows.yml            | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/coverage-linux-without-intl.yml b/.github/workflows/coverage-linux-without-intl.yml
index b2354a0b5d07a2..744e9b30b89ab7 100644
--- a/.github/workflows/coverage-linux-without-intl.yml
+++ b/.github/workflows/coverage-linux-without-intl.yml
@@ -79,7 +79,7 @@ jobs:
       - name: Clean tmp
         run: rm -rf coverage/tmp && rm -rf out
       - name: Upload
-        uses: codecov/codecov-action@e28ff129e5465c2c0dcc6f003fc735cb6ae0c673  # v4.5.0
+        uses: codecov/codecov-action@b9fd7d16f6d7d1b5d2bec1a2887e65ceed900238  # v4.6.0
         with:
           directory: ./coverage
           token: ${{ secrets.CODECOV_TOKEN }}
diff --git a/.github/workflows/coverage-linux.yml b/.github/workflows/coverage-linux.yml
index f9afc41e478fca..68f98937e73ba5 100644
--- a/.github/workflows/coverage-linux.yml
+++ b/.github/workflows/coverage-linux.yml
@@ -79,7 +79,7 @@ jobs:
       - name: Clean tmp
         run: rm -rf coverage/tmp && rm -rf out
       - name: Upload
-        uses: codecov/codecov-action@e28ff129e5465c2c0dcc6f003fc735cb6ae0c673  # v4.5.0
+        uses: codecov/codecov-action@b9fd7d16f6d7d1b5d2bec1a2887e65ceed900238  # v4.6.0
         with:
           directory: ./coverage
           token: ${{ secrets.CODECOV_TOKEN }}
diff --git a/.github/workflows/coverage-windows.yml b/.github/workflows/coverage-windows.yml
index 642cd18e10e8e6..ced6ff661c297a 100644
--- a/.github/workflows/coverage-windows.yml
+++ b/.github/workflows/coverage-windows.yml
@@ -71,7 +71,7 @@ jobs:
       - name: Clean tmp
         run: npx rimraf ./coverage/tmp
       - name: Upload
-        uses: codecov/codecov-action@e28ff129e5465c2c0dcc6f003fc735cb6ae0c673  # v4.5.0
+        uses: codecov/codecov-action@b9fd7d16f6d7d1b5d2bec1a2887e65ceed900238  # v4.6.0
         with:
           directory: ./coverage
           token: ${{ secrets.CODECOV_TOKEN }}

From a5df7087fdcbe78d157a749abe259f3b29cd2a41 Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Sun, 13 Oct 2024 16:14:48 -0300
Subject: [PATCH 007/216] doc: fix ambassador markdown list

PR-URL: https://github.com/nodejs/node/pull/55361
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
---
 doc/contributing/advocacy-ambasador-program.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/contributing/advocacy-ambasador-program.md b/doc/contributing/advocacy-ambasador-program.md
index f034af248e697c..3b518d092e1feb 100644
--- a/doc/contributing/advocacy-ambasador-program.md
+++ b/doc/contributing/advocacy-ambasador-program.md
@@ -45,9 +45,9 @@ If there is no objection within 14 days, the nomination is approved.
 
 To onboard an ambassador, a member of the TSC will:
 
-* \[] Add the ambassador to the nodejs/ambassadors team.
-* \[] Add the ambassador to the nodejs/ambassadors `README.md`.
-* \[] Add the ambassador to the OpenJS Slack channel.
+* \[ ] Add the ambassador to the nodejs/ambassadors team.
+* \[ ] Add the ambassador to the nodejs/ambassadors `README.md`.
+* \[ ] Add the ambassador to the OpenJS Slack channel.
 
 ## Reviewing content
 

From de8de542b52ba5be0bdf2e1d6009965c82753faf Mon Sep 17 00:00:00 2001
From: Karl Horky <karl.horky@gmail.com>
Date: Sun, 13 Oct 2024 21:42:03 +0200
Subject: [PATCH 008/216] doc: add missing return values in buffer docs

PR-URL: https://github.com/nodejs/node/pull/55273
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
---
 doc/api/buffer.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/doc/api/buffer.md b/doc/api/buffer.md
index d6ec5372821ee0..ca820ba909dfdb 100644
--- a/doc/api/buffer.md
+++ b/doc/api/buffer.md
@@ -687,6 +687,7 @@ changes:
   with. **Default:** `0`.
 * `encoding` {string} If `fill` is a string, this is its encoding.
   **Default:** `'utf8'`.
+* Returns: {Buffer}
 
 Allocates a new `Buffer` of `size` bytes. If `fill` is `undefined`, the
 `Buffer` will be zero-filled.
@@ -781,6 +782,7 @@ changes:
 -->
 
 * `size` {integer} The desired length of the new `Buffer`.
+* Returns: {Buffer}
 
 Allocates a new `Buffer` of `size` bytes. If `size` is larger than
 [`buffer.constants.MAX_LENGTH`][] or smaller than 0, [`ERR_OUT_OF_RANGE`][]
@@ -851,6 +853,7 @@ changes:
 -->
 
 * `size` {integer} The desired length of the new `Buffer`.
+* Returns: {Buffer}
 
 Allocates a new `Buffer` of `size` bytes. If `size` is larger than
 [`buffer.constants.MAX_LENGTH`][] or smaller than 0, [`ERR_OUT_OF_RANGE`][]
@@ -1095,6 +1098,7 @@ added: v19.8.0
 * `offset` {integer} The starting offset within `view`. **Default:**: `0`.
 * `length` {integer} The number of elements from `view` to copy.
   **Default:** `view.length - offset`.
+* Returns: {Buffer}
 
 Copies the underlying memory of `view` into a new `Buffer`.
 
@@ -1114,6 +1118,7 @@ added: v5.10.0
 -->
 
 * `array` {integer\[]}
+* Returns: {Buffer}
 
 Allocates a new `Buffer` using an `array` of bytes in the range `0` – `255`.
 Array entries outside that range will be truncated to fit into it.
@@ -1156,6 +1161,7 @@ added: v5.10.0
 * `byteOffset` {integer} Index of first byte to expose. **Default:** `0`.
 * `length` {integer} Number of bytes to expose.
   **Default:** `arrayBuffer.byteLength - byteOffset`.
+* Returns: {Buffer}
 
 This creates a view of the [`ArrayBuffer`][] without copying the underlying
 memory. For example, when passed a reference to the `.buffer` property of a
@@ -1268,6 +1274,7 @@ added: v5.10.0
 
 * `buffer` {Buffer|Uint8Array} An existing `Buffer` or [`Uint8Array`][] from
   which to copy data.
+* Returns: {Buffer}
 
 Copies the passed `buffer` data onto a new `Buffer` instance.
 
@@ -1311,6 +1318,7 @@ added: v8.2.0
 * `object` {Object} An object supporting `Symbol.toPrimitive` or `valueOf()`.
 * `offsetOrEncoding` {integer|string} A byte-offset or encoding.
 * `length` {integer} A length.
+* Returns: {Buffer}
 
 For objects whose `valueOf()` function returns a value not strictly equal to
 `object`, returns `Buffer.from(object.valueOf(), offsetOrEncoding, length)`.
@@ -1369,6 +1377,7 @@ added: v5.10.0
 
 * `string` {string} A string to encode.
 * `encoding` {string} The encoding of `string`. **Default:** `'utf8'`.
+* Returns: {Buffer}
 
 Creates a new `Buffer` containing `string`. The `encoding` parameter identifies
 the character encoding to be used when converting `string` into bytes.

From 720d23f3ac432c696aa0d3117eb0468db68e601e Mon Sep 17 00:00:00 2001
From: minkyu_kim <minq2dev@gmail.com>
Date: Mon, 14 Oct 2024 15:41:11 +0900
Subject: [PATCH 009/216] build: fix make errors that occur in Makefile

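Fix the make errors in the coverage-clean and coverage-test targets.
The `find ... -exec $(RM) {};` invocations did not escape the `;`
terminator, so the shell consumed it and find saw an unterminated
-exec. Pipe the matches to `xargs $(RM)` instead.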

PR-URL: https://github.com/nodejs/node/pull/55287
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 8e098207f1bf33..22c58a2676a564 100644
--- a/Makefile
+++ b/Makefile
@@ -240,7 +240,7 @@ coverage-clean:
 	$(RM) -r coverage/tmp
 	@if [ -d "out/Release/obj.target" ]; then \
 		$(FIND) out/$(BUILDTYPE)/obj.target \( -name "*.gcda" -o -name "*.gcno" \) \
-			-type f -exec $(RM) {};\
+			-type f | xargs $(RM); \
 	fi
 
 .PHONY: coverage
@@ -266,7 +266,7 @@ coverage-build-js:
 .PHONY: coverage-test
 coverage-test: coverage-build
 	@if [ -d "out/Release/obj.target" ]; then \
-		$(FIND) out/$(BUILDTYPE)/obj.target -name "*.gcda" -type f -exec $(RM) {}; \
+		$(FIND) out/$(BUILDTYPE)/obj.target -name "*.gcda" -type f | xargs $(RM); \
 	fi
 	-NODE_V8_COVERAGE=coverage/tmp \
 		TEST_CI_ARGS="$(TEST_CI_ARGS) --type=coverage" $(MAKE) $(COVTESTS)

From 5489656b351ee51455ec11311707d60bce50b488 Mon Sep 17 00:00:00 2001
From: minkyu_kim <minq2dev@gmail.com>
Date: Mon, 14 Oct 2024 15:41:27 +0900
Subject: [PATCH 010/216] test: update test_util.cc for coverage

Update test_util.cc to cover PopFront() in src/util-inl.h when it is
called on an empty list.

PR-URL: https://github.com/nodejs/node/pull/55291
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 test/cctest/test_util.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/test/cctest/test_util.cc b/test/cctest/test_util.cc
index 79113965f50e36..c3beea15f2dfc4 100644
--- a/test/cctest/test_util.cc
+++ b/test/cctest/test_util.cc
@@ -27,6 +27,7 @@ TEST_F(UtilTest, ListHead) {
 
   List list;
   EXPECT_TRUE(list.IsEmpty());
+  EXPECT_TRUE(list.PopFront() == nullptr);
 
   Item one;
   EXPECT_TRUE(one.node_.IsEmpty());

From a41b0e12474e49a5a90da3fc0c1e1f06dbb757ba Mon Sep 17 00:00:00 2001
From: Robert Nagy <ronagy@icloud.com>
Date: Mon, 14 Oct 2024 12:24:32 +0200
Subject: [PATCH 011/216] events: optimize EventTarget.addEventListener
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55312
Fixes: https://github.com/nodejs/node/issues/55311
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Vinícius Lourenço Claro Cardoso <contact@viniciusl.com.br>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
---
 .../events/eventtarget-add-remove-abort.js    | 25 ++++++++
 benchmark/events/eventtarget-add-remove.js    |  4 +-
 lib/internal/event_target.js                  | 59 +++++++++++--------
 3 files changed, 62 insertions(+), 26 deletions(-)
 create mode 100644 benchmark/events/eventtarget-add-remove-abort.js

diff --git a/benchmark/events/eventtarget-add-remove-abort.js b/benchmark/events/eventtarget-add-remove-abort.js
new file mode 100644
index 00000000000000..b1565c1f432a3e
--- /dev/null
+++ b/benchmark/events/eventtarget-add-remove-abort.js
@@ -0,0 +1,25 @@
+'use strict';
+const common = require('../common.js');
+
+const bench = common.createBenchmark(main, {
+  n: [1e5],
+  nListener: [1, 5, 10],
+});
+
+function main({ n, nListener }) {
+  const target = new EventTarget();
+  const listeners = [];
+  for (let k = 0; k < nListener; k += 1)
+    listeners.push(() => {});
+
+  bench.start();
+  for (let i = 0; i < n; i += 1) {
+    for (let k = listeners.length; --k >= 0;) {
+      target.addEventListener('abort', listeners[k]);
+    }
+    for (let k = listeners.length; --k >= 0;) {
+      target.removeEventListener('abort', listeners[k]);
+    }
+  }
+  bench.end(n);
+}
diff --git a/benchmark/events/eventtarget-add-remove.js b/benchmark/events/eventtarget-add-remove.js
index a3defce03cfb8d..62048086d5d0bf 100644
--- a/benchmark/events/eventtarget-add-remove.js
+++ b/benchmark/events/eventtarget-add-remove.js
@@ -2,8 +2,8 @@
 const common = require('../common.js');
 
 const bench = common.createBenchmark(main, {
-  n: [1e6],
-  nListener: [5, 10],
+  n: [1e5],
+  nListener: [1, 5, 10],
 });
 
 function main({ n, nListener }) {
diff --git a/lib/internal/event_target.js b/lib/internal/event_target.js
index 70875ec3a142f8..183b0fe0a28a60 100644
--- a/lib/internal/event_target.js
+++ b/lib/internal/event_target.js
@@ -601,19 +601,40 @@ class EventTarget {
     if (arguments.length < 2)
       throw new ERR_MISSING_ARGS('type', 'listener');
 
-    // We validateOptions before the validateListener check because the spec
-    // requires us to hit getters.
-    const {
-      once,
-      capture,
-      passive,
-      signal,
-      isNodeStyleListener,
-      weak,
-      resistStopPropagation,
-    } = validateEventListenerOptions(options);
-
-    validateAbortSignal(signal, 'options.signal');
+    let once = false;
+    let capture = false;
+    let passive = false;
+    let isNodeStyleListener = false;
+    let weak = false;
+    let resistStopPropagation = false;
+
+    if (options !== kEmptyObject) {
+      // We validateOptions before the validateListener check because the spec
+      // requires us to hit getters.
+      options = validateEventListenerOptions(options);
+
+      once = options.once;
+      capture = options.capture;
+      passive = options.passive;
+      isNodeStyleListener = options.isNodeStyleListener;
+      weak = options.weak;
+      resistStopPropagation = options.resistStopPropagation;
+
+      const signal = options.signal;
+
+      validateAbortSignal(signal, 'options.signal');
+
+      if (signal) {
+        if (signal.aborted) {
+          return;
+        }
+        // TODO(benjamingr) make this weak somehow? ideally the signal would
+        // not prevent the event target from GC.
+        signal.addEventListener('abort', () => {
+          this.removeEventListener(type, listener, options);
+        }, { __proto__: null, once: true, [kWeakHandler]: this, [kResistStopPropagation]: true });
+      }
+    }
 
     if (!validateEventListener(listener)) {
       // The DOM silently allows passing undefined as a second argument
@@ -627,18 +648,8 @@ class EventTarget {
       process.emitWarning(w);
       return;
     }
-    type = webidl.converters.DOMString(type);
 
-    if (signal) {
-      if (signal.aborted) {
-        return;
-      }
-      // TODO(benjamingr) make this weak somehow? ideally the signal would
-      // not prevent the event target from GC.
-      signal.addEventListener('abort', () => {
-        this.removeEventListener(type, listener, options);
-      }, { __proto__: null, once: true, [kWeakHandler]: this, [kResistStopPropagation]: true });
-    }
+    type = webidl.converters.DOMString(type);
 
     let root = this[kEvents].get(type);
 

From de265b9558ea03c1f616a2a4f91601d0f0d46248 Mon Sep 17 00:00:00 2001
From: simon-id <simon.id@protonmail.com>
Date: Mon, 14 Oct 2024 12:55:39 +0200
Subject: [PATCH 012/216] diagnostics_channel: fix unsubscribe during publish

PR-URL: https://github.com/nodejs/node/pull/55116
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Claudio Wunder <cwunder@gnome.org>
---
 lib/diagnostics_channel.js                          | 13 ++++++++++---
 .../test-diagnostics-channel-sync-unsubscribe.js    |  1 +
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/lib/diagnostics_channel.js b/lib/diagnostics_channel.js
index 68f604e719d267..f1467675038a9c 100644
--- a/lib/diagnostics_channel.js
+++ b/lib/diagnostics_channel.js
@@ -4,6 +4,8 @@ const {
   ArrayPrototypeAt,
   ArrayPrototypeIndexOf,
   ArrayPrototypePush,
+  ArrayPrototypePushApply,
+  ArrayPrototypeSlice,
   ArrayPrototypeSplice,
   SafeFinalizationRegistry,
   ObjectDefineProperty,
@@ -97,6 +99,7 @@ function wrapStoreRun(store, data, next, transform = defaultTransform) {
 class ActiveChannel {
   subscribe(subscription) {
     validateFunction(subscription, 'subscription');
+    this._subscribers = ArrayPrototypeSlice(this._subscribers);
     ArrayPrototypePush(this._subscribers, subscription);
     channels.incRef(this.name);
   }
@@ -105,7 +108,10 @@ class ActiveChannel {
     const index = ArrayPrototypeIndexOf(this._subscribers, subscription);
     if (index === -1) return false;
 
-    ArrayPrototypeSplice(this._subscribers, index, 1);
+    const before = ArrayPrototypeSlice(this._subscribers, 0, index);
+    const after = ArrayPrototypeSlice(this._subscribers, index + 1);
+    this._subscribers = before;
+    ArrayPrototypePushApply(this._subscribers, after);
 
     channels.decRef(this.name);
     maybeMarkInactive(this);
@@ -137,9 +143,10 @@ class ActiveChannel {
   }
 
   publish(data) {
-    for (let i = 0; i < (this._subscribers?.length || 0); i++) {
+    const subscribers = this._subscribers;
+    for (let i = 0; i < (subscribers?.length || 0); i++) {
       try {
-        const onMessage = this._subscribers[i];
+        const onMessage = subscribers[i];
         onMessage(data, this.name);
       } catch (err) {
         process.nextTick(() => {
diff --git a/test/parallel/test-diagnostics-channel-sync-unsubscribe.js b/test/parallel/test-diagnostics-channel-sync-unsubscribe.js
index 87bf44249f5fc4..51db6a56c2389d 100644
--- a/test/parallel/test-diagnostics-channel-sync-unsubscribe.js
+++ b/test/parallel/test-diagnostics-channel-sync-unsubscribe.js
@@ -9,6 +9,7 @@ const published_data = 'some message';
 const onMessageHandler = common.mustCall(() => dc.unsubscribe(channel_name, onMessageHandler));
 
 dc.subscribe(channel_name, onMessageHandler);
+dc.subscribe(channel_name, common.mustCall());
 
 // This must not throw.
 dc.channel(channel_name).publish(published_data);

From 470789a9817236b86c1aecf5216a53f0ec8a8b57 Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Mon, 14 Oct 2024 09:14:01 -0300
Subject: [PATCH 013/216] benchmark: adjust byte size for buffer-copy

PR-URL: https://github.com/nodejs/node/pull/55295
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 benchmark/buffers/buffer-copy.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/benchmark/buffers/buffer-copy.js b/benchmark/buffers/buffer-copy.js
index a498c08e65276f..d6e510c1ac8296 100644
--- a/benchmark/buffers/buffer-copy.js
+++ b/benchmark/buffers/buffer-copy.js
@@ -2,7 +2,7 @@
 const common = require('../common.js');
 
 const bench = common.createBenchmark(main, {
-  bytes: [0, 8, 128, 32 * 1024],
+  bytes: [8, 128, 1024],
   partial: ['true', 'false'],
   n: [6e6],
 }, {

From f2a55d9d2d6f86932a23eb0350fa12e524eb3b7f Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Tue, 15 Oct 2024 02:45:37 +0200
Subject: [PATCH 014/216] deps: update c-ares to v1.34.1

PR-URL: https://github.com/nodejs/node/pull/55369
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
---
 deps/cares/CMakeLists.txt                     |    17 +-
 deps/cares/Makefile.Watcom                    |     5 +-
 deps/cares/Makefile.dj                        |     2 +-
 deps/cares/Makefile.m32                       |     2 +-
 deps/cares/Makefile.msvc                      |    11 +-
 deps/cares/README.md                          |     9 +-
 deps/cares/RELEASE-NOTES.md                   |    89 +-
 deps/cares/aminclude_static.am                |     2 +-
 deps/cares/cares.gyp                          |    81 +-
 deps/cares/configure                          |   741 +-
 deps/cares/configure.ac                       |     6 +-
 deps/cares/docs/Makefile.in                   |     4 +
 deps/cares/docs/Makefile.inc                  |     4 +
 deps/cares/docs/adig.1                        |   196 +-
 deps/cares/docs/ares_inet_pton.3              |     7 +-
 deps/cares/docs/ares_init_options.3           |     7 +-
 deps/cares/docs/ares_process.3                |   135 +-
 deps/cares/docs/ares_process_fd.3             |     3 +
 deps/cares/docs/ares_process_fds.3            |     3 +
 deps/cares/docs/ares_process_pending_write.3  |     3 +
 deps/cares/docs/ares_set_local_dev.3          |    10 +-
 deps/cares/docs/ares_set_pending_write_cb.3   |    62 +
 deps/cares/docs/ares_set_servers_csv.3        |    88 +-
 deps/cares/docs/ares_set_socket_functions.3   |   347 +-
 deps/cares/include/ares.h                     |   308 +-
 deps/cares/include/ares_build.h               |     1 -
 deps/cares/include/ares_version.h             |    15 +-
 deps/cares/m4/libtool.m4                      |   307 +-
 deps/cares/m4/ltoptions.m4                    |   106 +-
 deps/cares/m4/ltsugar.m4                      |     2 +-
 deps/cares/m4/ltversion.m4                    |    12 +-
 deps/cares/m4/lt~obsolete.m4                  |     2 +-
 deps/cares/src/lib/CMakeLists.txt             |     2 +
 deps/cares/src/lib/Makefile.am                |     3 +-
 deps/cares/src/lib/Makefile.in                |   604 +-
 deps/cares/src/lib/Makefile.inc               |    83 +-
 deps/cares/src/lib/ares__socket.c             |   764 --
 ...info2hostent.c => ares_addrinfo2hostent.c} |    14 +-
 ..._localhost.c => ares_addrinfo_localhost.c} |    26 +-
 deps/cares/src/lib/ares_android.c             |     6 +-
 deps/cares/src/lib/ares_cancel.c              |    26 +-
 ...__close_sockets.c => ares_close_sockets.c} |    69 +-
 deps/cares/src/lib/ares_config.h.cmake        |     6 +
 deps/cares/src/lib/ares_config.h.in           |     6 +
 deps/cares/src/lib/ares_conn.c                |   511 +
 deps/cares/src/lib/ares_conn.h                |   196 +
 deps/cares/src/lib/ares_cookie.c              |     9 +-
 deps/cares/src/lib/ares_destroy.c             |    68 +-
 deps/cares/src/lib/ares_freeaddrinfo.c        |     8 +-
 deps/cares/src/lib/ares_getaddrinfo.c         |    64 +-
 deps/cares/src/lib/ares_gethostbyaddr.c       |     9 +-
 deps/cares/src/lib/ares_gethostbyname.c       |    29 +-
 deps/cares/src/lib/ares_getnameinfo.c         |    10 +-
 .../{ares__hosts_file.c => ares_hosts_file.c} |   339 +-
 deps/cares/src/lib/ares_inet_net_pton.h       |     4 -
 deps/cares/src/lib/ares_init.c                |   141 +-
 deps/cares/src/lib/ares_ipv6.h                |     2 +-
 deps/cares/src/lib/ares_library_init.c        |    40 +-
 deps/cares/src/lib/ares_metrics.c             |     4 +-
 deps/cares/src/lib/ares_options.c             |    40 +-
 ..._addrinfo.c => ares_parse_into_addrinfo.c} |    22 +-
 deps/cares/src/lib/ares_platform.c            | 11047 ----------------
 deps/cares/src/lib/ares_private.h             |   605 +-
 deps/cares/src/lib/ares_process.c             |  1056 +-
 deps/cares/src/lib/ares_qcache.c              |   139 +-
 deps/cares/src/lib/ares_query.c               |    11 +-
 deps/cares/src/lib/ares_search.c              |   129 +-
 deps/cares/src/lib/ares_send.c                |    63 +-
 .../cares/src/lib/ares_set_socket_functions.c |   586 +
 deps/cares/src/lib/ares_setup.h               |   149 +-
 deps/cares/src/lib/ares_socket.c              |   424 +
 deps/cares/src/lib/ares_socket.h              |   163 +
 ...es__sortaddrinfo.c => ares_sortaddrinfo.c} |    35 +-
 deps/cares/src/lib/ares_sysconfig.c           |   106 +-
 deps/cares/src/lib/ares_sysconfig_files.c     |   382 +-
 deps/cares/src/lib/ares_sysconfig_mac.c       |    32 +-
 deps/cares/src/lib/ares_sysconfig_win.c       |    15 +-
 deps/cares/src/lib/ares_timeout.c             |    24 +-
 deps/cares/src/lib/ares_update_servers.c      |   593 +-
 deps/cares/src/lib/config-dos.h               |     1 +
 deps/cares/src/lib/config-win32.h             |     3 +
 .../lib/dsa/{ares__array.c => ares_array.c}   |   156 +-
 .../lib/dsa/{ares__htable.c => ares_htable.c} |   179 +-
 .../lib/dsa/{ares__htable.h => ares_htable.h} |    46 +-
 ...ares__htable_asvp.c => ares_htable_asvp.c} |    80 +-
 deps/cares/src/lib/dsa/ares_htable_dict.c     |   228 +
 ...es__htable_strvp.c => ares_htable_strvp.c} |   104 +-
 ...ares__htable_szvp.c => ares_htable_szvp.c} |    69 +-
 deps/cares/src/lib/dsa/ares_htable_vpstr.c    |   186 +
 ...ares__htable_vpvp.c => ares_htable_vpvp.c} |    75 +-
 .../lib/dsa/{ares__llist.c => ares_llist.c}   |   150 +-
 .../lib/dsa/{ares__slist.c => ares_slist.c}   |   146 +-
 .../lib/dsa/{ares__slist.h => ares_slist.h}   |    55 +-
 deps/cares/src/lib/event/ares_event.h         |    12 +-
 .../src/lib/event/ares_event_configchg.c      |    54 +-
 deps/cares/src/lib/event/ares_event_epoll.c   |     4 +-
 deps/cares/src/lib/event/ares_event_kqueue.c  |     4 +-
 deps/cares/src/lib/event/ares_event_poll.c    |     6 +-
 deps/cares/src/lib/event/ares_event_select.c  |     6 +-
 deps/cares/src/lib/event/ares_event_thread.c  |   121 +-
 deps/cares/src/lib/event/ares_event_win32.c   |    74 +-
 .../ares__array.h => include/ares_array.h}    |   109 +-
 .../{str/ares__buf.h => include/ares_buf.h}   |   327 +-
 .../ares_htable_asvp.h}                       |    40 +-
 deps/cares/src/lib/include/ares_htable_dict.h |   123 +
 .../ares_htable_strvp.h}                      |    48 +-
 .../ares_htable_szvp.h}                       |    35 +-
 .../cares/src/lib/include/ares_htable_vpstr.h |   111 +
 .../ares_htable_vpvp.h}                       |    40 +-
 .../ares__llist.h => include/ares_llist.h}    |    72 +-
 .../ares_strcasecmp.h => include/ares_mem.h}  |    20 +-
 deps/cares/src/lib/include/ares_str.h         |   230 +
 deps/cares/src/lib/inet_net_pton.c            |    58 +-
 deps/cares/src/lib/inet_ntop.c                |    29 +-
 deps/cares/src/lib/legacy/ares_expand_name.c  |    26 +-
 .../cares/src/lib/legacy/ares_expand_string.c |    14 +-
 deps/cares/src/lib/legacy/ares_fds.c          |    30 +-
 deps/cares/src/lib/legacy/ares_getsock.c      |    32 +-
 .../cares/src/lib/legacy/ares_parse_a_reply.c |    12 +-
 .../src/lib/legacy/ares_parse_aaaa_reply.c    |    12 +-
 .../src/lib/legacy/ares_parse_ptr_reply.c     |     2 +-
 .../src/lib/legacy/ares_parse_txt_reply.c     |    11 +-
 deps/cares/src/lib/record/ares_dns_mapping.c  |     6 +-
 .../src/lib/record/ares_dns_multistring.c     |   145 +-
 .../src/lib/record/ares_dns_multistring.h     |    58 +-
 deps/cares/src/lib/record/ares_dns_name.c     |   197 +-
 deps/cares/src/lib/record/ares_dns_parse.c    |   189 +-
 deps/cares/src/lib/record/ares_dns_private.h  |   132 +-
 deps/cares/src/lib/record/ares_dns_record.c   |   208 +-
 deps/cares/src/lib/record/ares_dns_write.c    |   264 +-
 .../src/lib/str/{ares__buf.c => ares_buf.c}   |   687 +-
 deps/cares/src/lib/str/ares_str.c             |   337 +-
 deps/cares/src/lib/str/ares_str.h             |    89 -
 deps/cares/src/lib/str/ares_strsplit.c        |    71 +-
 deps/cares/src/lib/str/ares_strsplit.h        |     8 +-
 deps/cares/src/lib/util/ares__threads.h       |    60 -
 .../{ares__iface_ips.c => ares_iface_ips.c}   |   188 +-
 .../{ares__iface_ips.h => ares_iface_ips.h}   |    40 +-
 deps/cares/src/lib/util/ares_math.c           |    26 +-
 .../ares_strcasecmp.c => util/ares_math.h}    |    66 +-
 deps/cares/src/lib/util/ares_rand.c           |    50 +-
 .../lib/{ares_platform.h => util/ares_rand.h} |    30 +-
 .../util/{ares__threads.c => ares_threads.c}  |   174 +-
 deps/cares/src/lib/util/ares_threads.h        |    60 +
 deps/cares/src/lib/util/ares_time.h           |    48 +
 .../util/{ares__timeval.c => ares_timeval.c}  |     6 +-
 deps/cares/src/lib/util/ares_uri.c            |  1626 +++
 deps/cares/src/lib/util/ares_uri.h            |   252 +
 deps/cares/src/tools/CMakeLists.txt           |    14 +-
 deps/cares/src/tools/Makefile.am              |     3 +-
 deps/cares/src/tools/Makefile.in              |    95 +-
 deps/cares/src/tools/Makefile.inc             |     6 +-
 deps/cares/src/tools/adig.c                   |  1141 +-
 deps/cares/src/tools/ahost.c                  |    21 +-
 154 files changed, 12736 insertions(+), 17952 deletions(-)
 create mode 100644 deps/cares/docs/ares_process_fd.3
 create mode 100644 deps/cares/docs/ares_process_fds.3
 create mode 100644 deps/cares/docs/ares_process_pending_write.3
 create mode 100644 deps/cares/docs/ares_set_pending_write_cb.3
 delete mode 100644 deps/cares/src/lib/ares__socket.c
 rename deps/cares/src/lib/{ares__addrinfo2hostent.c => ares_addrinfo2hostent.c} (93%)
 rename deps/cares/src/lib/{ares__addrinfo_localhost.c => ares_addrinfo_localhost.c} (89%)
 rename deps/cares/src/lib/{ares__close_sockets.c => ares_close_sockets.c} (59%)
 create mode 100644 deps/cares/src/lib/ares_conn.c
 create mode 100644 deps/cares/src/lib/ares_conn.h
 rename deps/cares/src/lib/{ares__hosts_file.c => ares_hosts_file.c} (64%)
 rename deps/cares/src/lib/{ares__parse_into_addrinfo.c => ares_parse_into_addrinfo.c} (89%)
 delete mode 100644 deps/cares/src/lib/ares_platform.c
 create mode 100644 deps/cares/src/lib/ares_set_socket_functions.c
 create mode 100644 deps/cares/src/lib/ares_socket.c
 create mode 100644 deps/cares/src/lib/ares_socket.h
 rename deps/cares/src/lib/{ares__sortaddrinfo.c => ares_sortaddrinfo.c} (94%)
 rename deps/cares/src/lib/dsa/{ares__array.c => ares_array.c} (60%)
 rename deps/cares/src/lib/dsa/{ares__htable.c => ares_htable.c} (64%)
 rename deps/cares/src/lib/dsa/{ares__htable.h => ares_htable.h} (76%)
 rename deps/cares/src/lib/dsa/{ares__htable_asvp.c => ares_htable_asvp.c} (62%)
 create mode 100644 deps/cares/src/lib/dsa/ares_htable_dict.c
 rename deps/cares/src/lib/dsa/{ares__htable_strvp.c => ares_htable_strvp.c} (54%)
 rename deps/cares/src/lib/dsa/{ares__htable_szvp.c => ares_htable_szvp.c} (61%)
 create mode 100644 deps/cares/src/lib/dsa/ares_htable_vpstr.c
 rename deps/cares/src/lib/dsa/{ares__htable_vpvp.c => ares_htable_vpvp.c} (60%)
 rename deps/cares/src/lib/dsa/{ares__llist.c => ares_llist.c} (52%)
 rename deps/cares/src/lib/dsa/{ares__slist.c => ares_slist.c} (70%)
 rename deps/cares/src/lib/dsa/{ares__slist.h => ares_slist.h} (75%)
 rename deps/cares/src/lib/{dsa/ares__array.h => include/ares_array.h} (61%)
 rename deps/cares/src/lib/{str/ares__buf.h => include/ares_buf.h} (60%)
 rename deps/cares/src/lib/{dsa/ares__htable_asvp.h => include/ares_htable_asvp.h} (72%)
 create mode 100644 deps/cares/src/lib/include/ares_htable_dict.h
 rename deps/cares/src/lib/{dsa/ares__htable_strvp.h => include/ares_htable_strvp.h} (66%)
 rename deps/cares/src/lib/{dsa/ares__htable_szvp.h => include/ares_htable_szvp.h} (72%)
 create mode 100644 deps/cares/src/lib/include/ares_htable_vpstr.h
 rename deps/cares/src/lib/{dsa/ares__htable_vpvp.h => include/ares_htable_vpvp.h} (72%)
 rename deps/cares/src/lib/{dsa/ares__llist.h => include/ares_llist.h} (68%)
 rename deps/cares/src/lib/{str/ares_strcasecmp.h => include/ares_mem.h} (75%)
 create mode 100644 deps/cares/src/lib/include/ares_str.h
 rename deps/cares/src/lib/str/{ares__buf.c => ares_buf.c} (55%)
 delete mode 100644 deps/cares/src/lib/str/ares_str.h
 delete mode 100644 deps/cares/src/lib/util/ares__threads.h
 rename deps/cares/src/lib/util/{ares__iface_ips.c => ares_iface_ips.c} (69%)
 rename deps/cares/src/lib/util/{ares__iface_ips.h => ares_iface_ips.h} (77%)
 rename deps/cares/src/lib/{str/ares_strcasecmp.c => util/ares_math.h} (53%)
 rename deps/cares/src/lib/{ares_platform.h => util/ares_rand.h} (72%)
 rename deps/cares/src/lib/util/{ares__threads.c => ares_threads.c} (64%)
 create mode 100644 deps/cares/src/lib/util/ares_threads.h
 create mode 100644 deps/cares/src/lib/util/ares_time.h
 rename deps/cares/src/lib/util/{ares__timeval.c => ares_timeval.c} (96%)
 create mode 100644 deps/cares/src/lib/util/ares_uri.c
 create mode 100644 deps/cares/src/lib/util/ares_uri.h

diff --git a/deps/cares/CMakeLists.txt b/deps/cares/CMakeLists.txt
index 9862406495f4fa..39963f1e8c3cb4 100644
--- a/deps/cares/CMakeLists.txt
+++ b/deps/cares/CMakeLists.txt
@@ -12,7 +12,7 @@ INCLUDE (CheckCSourceCompiles)
 INCLUDE (CheckStructHasMember)
 INCLUDE (CheckLibraryExists)
 
-PROJECT (c-ares LANGUAGES C VERSION "1.33.1" )
+PROJECT (c-ares LANGUAGES C VERSION "1.34.1" )
 
 # Set this version before release
 SET (CARES_VERSION "${PROJECT_VERSION}")
@@ -30,7 +30,7 @@ INCLUDE (GNUInstallDirs) # include this *AFTER* PROJECT(), otherwise paths are w
 # For example, a version of 4:0:2 would generate output such as:
 #    libname.so   -> libname.so.2
 #    libname.so.2 -> libname.so.2.2.0
-SET (CARES_LIB_VERSIONINFO "20:1:18")
+SET (CARES_LIB_VERSIONINFO "21:1:19")
 
 
 OPTION (CARES_STATIC        "Build as a static library"                                             OFF)
@@ -406,7 +406,7 @@ ENDIF ()
 
 CHECK_STRUCT_HAS_MEMBER("struct sockaddr_in6" sin6_scope_id "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_STRUCT_SOCKADDR_IN6_SIN6_SCOPE_ID LANGUAGE C)
 
-
+CHECK_SYMBOL_EXISTS (memmem          "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_MEMMEM)
 CHECK_SYMBOL_EXISTS (closesocket     "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_CLOSESOCKET)
 CHECK_SYMBOL_EXISTS (CloseSocket     "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_CLOSESOCKET_CAMEL)
 CHECK_SYMBOL_EXISTS (connect         "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_CONNECT)
@@ -443,6 +443,7 @@ CHECK_SYMBOL_EXISTS (IoctlSocket     "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_IOCTLSO
 CHECK_SYMBOL_EXISTS (recv            "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_RECV)
 CHECK_SYMBOL_EXISTS (recvfrom        "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_RECVFROM)
 CHECK_SYMBOL_EXISTS (send            "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_SEND)
+CHECK_SYMBOL_EXISTS (sendto          "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_SENDTO)
 CHECK_SYMBOL_EXISTS (setsockopt      "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_SETSOCKOPT)
 CHECK_SYMBOL_EXISTS (socket          "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_SOCKET)
 CHECK_SYMBOL_EXISTS (strcasecmp      "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_STRCASECMP)
@@ -500,7 +501,10 @@ IF (CARES_THREADS)
 			CARES_EXTRAINCLUDE_IFSET (HAVE_PTHREAD_NP_H                    pthread_np.h)
 			CHECK_SYMBOL_EXISTS (pthread_init  "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_PTHREAD_INIT)
 			# Make sure libcares.pc.cmake knows about thread libraries on static builds
-			LIST (APPEND CARES_DEPENDENT_LIBS ${CMAKE_THREAD_LIBS_INIT})
+			# The variable set by FIND_PACKAGE(Threads) has a -l prefix on it, we need
+			# to strip that first since CARES_DEPENDENT_LIBS doesn't expect that.
+			STRING (REPLACE "-l" "" CARES_THREAD_LIBRARY "${CMAKE_THREAD_LIBS_INIT}")
+			LIST (APPEND CARES_DEPENDENT_LIBS ${CARES_THREAD_LIBRARY})
 		ELSE ()
 			MESSAGE (WARNING "Threading support not found, disabling...")
 			SET (CARES_THREADS OFF)
@@ -688,7 +692,6 @@ IF (HAVE_ARPA_NAMESER_COMPAT_H)
 	SET (CARES_HAVE_ARPA_NAMESER_COMPAT_H 1)
 ENDIF()
 
-
 # Coverage
 IF (CARES_COVERAGE)
 	# set compiler flags
@@ -755,9 +758,6 @@ IF (CARES_BUILD_TESTS OR CARES_BUILD_CONTAINER_TESTS)
 	ADD_SUBDIRECTORY (test)
 ENDIF ()
 
-
-
-
 # Export targets
 IF (CARES_INSTALL)
 	SET (CMAKECONFIG_INSTALL_DIR "${CMAKE_INSTALL_LIBDIR}/cmake/${PROJECT_NAME}")
@@ -781,7 +781,6 @@ IF (CARES_INSTALL)
 	INSTALL (FILES "${CMAKE_CURRENT_BINARY_DIR}/libcares.pc" COMPONENT Devel DESTINATION "${CMAKE_INSTALL_LIBDIR}/pkgconfig")
 ENDIF ()
 
-
 # Legacy chain-building variables (provided for compatibility with old code).
 # Don't use these, external code should be updated to refer to the aliases directly (e.g., Cares::cares).
 SET (CARES_FOUND 1 CACHE INTERNAL "CARES LIBRARY FOUND")
diff --git a/deps/cares/Makefile.Watcom b/deps/cares/Makefile.Watcom
index 96ffedb1eb3301..1e445f287a54cc 100644
--- a/deps/cares/Makefile.Watcom
+++ b/deps/cares/Makefile.Watcom
@@ -43,7 +43,8 @@ CP = copy
 
 CFLAGS = -3r -mf -hc -zff -zgf -zq -zm -zc -s -fr=con -w2 -fpi -oilrtfm -aa   &
          -wcd=201 -bt=nt -d+ -dCARES_BUILDING_LIBRARY &
-         -dNTDDI_VERSION=0x06020000 -I. -I.\include -I.\src\lib $(SYS_INCL)
+         -dNTDDI_VERSION=0x06020000 -I. -I.\include -I.\src\lib -I.\src\lib\include &
+         $(SYS_INCL)
 
 LFLAGS = option quiet, map, caseexact, eliminate
 
@@ -124,7 +125,7 @@ $(LIBNAME).lib: $(OBJS_STAT) $(LIB_ARG)
 $(OBJ_BASE)\tools\ares_getopt.obj:
 	$(CC) $(CFLAGS) -DCARES_STATICLIB .\src\tools\ares_getopt.c -fo=$^@
 
-adig.exe: $(OBJ_BASE)\tools\ares_getopt.obj $(LIBNAME).lib
+adig.exe: $(LIBNAME).lib
 	$(CC) $(CFLAGS) src\tools\adig.c -fo=$(OBJ_BASE)\tools\adig.obj
 	$(LD) name $^@ system nt $(LFLAGS) file { $(OBJ_BASE)\tools\adig.obj $[@ } library $]@, ws2_32.lib, iphlpapi.lib
 
diff --git a/deps/cares/Makefile.dj b/deps/cares/Makefile.dj
index 69b3ca31851b32..8dca20cb42b4c3 100644
--- a/deps/cares/Makefile.dj
+++ b/deps/cares/Makefile.dj
@@ -23,7 +23,7 @@ VPATH = src/lib src/tools
 WATT32_ROOT = $(realpath $(WATT_ROOT))
 WATT32_LIB  = $(WATT32_ROOT)/lib/libwatt.a
 
-CFLAGS = -g -O2 -I./include -I./src/lib \
+CFLAGS = -g -O2 -I./include -I./src/lib -I./src/lib/include \
          -I$(WATT32_ROOT)/inc \
          -Wall \
          -Wextra \
diff --git a/deps/cares/Makefile.m32 b/deps/cares/Makefile.m32
index 36ae674c635194..7bd85165978208 100644
--- a/deps/cares/Makefile.m32
+++ b/deps/cares/Makefile.m32
@@ -19,7 +19,7 @@ RANLIB	= $(CROSSPREFIX)ranlib
 #RM	= rm -f
 CP	= cp -afv
 
-CFLAGS	= $(CARES_CFLAG_EXTRAS) -O2 -Wall -I./include -I./src/lib -D_WIN32_WINNT=0x0602
+CFLAGS	= $(CARES_CFLAG_EXTRAS) -O2 -Wall -I./include -I./src/lib -I./src/lib/include -D_WIN32_WINNT=0x0602
 CFLAGS	+= -DCARES_STATICLIB
 LDFLAGS	= $(CARES_LDFLAG_EXTRAS) -s
 LIBS	= -lws2_32 -liphlpapi
diff --git a/deps/cares/Makefile.msvc b/deps/cares/Makefile.msvc
index 619c2d39274521..8395d1a7d67728 100644
--- a/deps/cares/Makefile.msvc
+++ b/deps/cares/Makefile.msvc
@@ -192,7 +192,7 @@ EX_LIBS_DBG = ws2_32.lib advapi32.lib kernel32.lib iphlpapi.lib
 
 CC_CMD_REL = cl.exe /nologo $(RTLIB) /DNDEBUG /O2
 CC_CMD_DBG = cl.exe /nologo $(RTLIBD) /D_DEBUG /Od /Zi /RTCsu
-CC_CFLAGS  = $(CFLAGS) /D_REENTRANT /I.\src\lib /I.\include /W3 /EHsc /FD
+CC_CFLAGS  = $(CFLAGS) /D_REENTRANT /I.\src\lib /I.\include /I.\src\lib\include /W3 /EHsc /FD
 
 RC_CMD_REL = rc.exe /l 0x409 /d "NDEBUG"
 RC_CMD_DBG = rc.exe /l 0x409 /d "_DEBUG"
@@ -344,15 +344,6 @@ PROG3_OBJS = $(PROG3_OBJS) $(PROG3_OBJDIR)\ahost.obj
 {$(SRCDIR)\src\tools}.c{$(PROG3_OBJDIR)}.obj:
     $(CC_CMD) $(CC_CFLAGS) $(SPROG_CFLAGS) /Fo$@ /Fd$(PROG3_OBJDIR)\ /c $<
 
-# Hack Alert! we reference ../lib/str files in the Makefile.inc for tools as they
-# share some files with the library itself.  We need to hack around that here.
-
-{$(SRCDIR)\src\lib\str}.c{$(PROG2_OBJDIR)\..\lib\str}.obj:
-    $(CC_CMD) $(CC_CFLAGS) $(SPROG_CFLAGS) /Fo$(PROG2_OBJDIR)\str\$(@F) /Fd$(PROG2_OBJDIR)\str\ /c $<
-
-{$(SRCDIR)\src\lib\str}.c{$(PROG3_OBJDIR)\..\lib\str}.obj:
-    $(CC_CMD) $(CC_CFLAGS) $(SPROG_CFLAGS) /Fo$(PROG3_OBJDIR)\str\$(@F) /Fd$(PROG3_OBJDIR)\str\ /c $<
-
 # ------------------------------------------------------------- #
 # ------------------------------------------------------------- #
 # Default target when no CFG library type has been specified,   #
diff --git a/deps/cares/README.md b/deps/cares/README.md
index c32d0677c83419..6566c9fe6aa18e 100644
--- a/deps/cares/README.md
+++ b/deps/cares/README.md
@@ -82,7 +82,7 @@ to sign releases):
 
 ```bash
 gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys 27EDEAF22F3ABCEB50DB9A125CC908FDB71E12C2 # Daniel Stenberg
-gpg --keyserver hkps://keys.openpgp.org --recv-keys DA7D64E4C82C6294CB73A20E22E3D13B5411B7CA     # Brad House
+gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys DA7D64E4C82C6294CB73A20E22E3D13B5411B7CA # Brad House
 ```
 
 ### Verifying signatures
@@ -109,8 +109,11 @@ gpg: binary signature, digest algorithm SHA512, key algorithm rsa2048
 ```
 
 ## Features
+
+See [Features](FEATURES.md)
+
 ### Supported RFCs and Proposals
-- [RFC1035](https://datatracker.ietf.org/doc/html/rfc7873).
+- [RFC1035](https://datatracker.ietf.org/doc/html/rfc1035).
   Initial/Base DNS RFC
 - [RFC2671](https://datatracker.ietf.org/doc/html/rfc2671),
   [RFC6891](https://datatracker.ietf.org/doc/html/rfc6891).
@@ -151,3 +154,5 @@ gpg: binary signature, digest algorithm SHA512, key algorithm rsa2048
   IPv6 address sorting as used by `ares_getaddrinfo()`.
 - [RFC7413](https://datatracker.ietf.org/doc/html/rfc7413).
   TCP FastOpen (TFO) for 0-RTT TCP Connection Resumption.
+- [RFC3986](https://datatracker.ietf.org/doc/html/rfc3986).
+  Uniform Resource Identifier (URI). Used for server configuration.
diff --git a/deps/cares/RELEASE-NOTES.md b/deps/cares/RELEASE-NOTES.md
index e9c04953dc6022..fa1db666083365 100644
--- a/deps/cares/RELEASE-NOTES.md
+++ b/deps/cares/RELEASE-NOTES.md
@@ -1,65 +1,60 @@
-## c-ares version 1.33.1 - August 23 2024
+## c-ares version 1.34.1 - October 9 2024
 
-This is a bugfix release.
+This release fixes a packaging issue.
 
-Bugfixes:
-* Work around systemd-resolved quirk that returns unexpected codes for single
-  label names.  Also adds test cases to validate the work around works and
-  will continue to work in future releases.
-  [PR #863](https://github.com/c-ares/c-ares/pull/863),
-  See Also https://github.com/systemd/systemd/issues/34101
-* Fix sysconfig ndots default value, also adds containerized test case to
-  prevent future regressions.
-  [PR #862](https://github.com/c-ares/c-ares/pull/862)
-* Fix blank DNS name returning error code rather than valid record for
-  commands like: `adig -t SOA .`.  Also adds test case to prevent future
-  regressions.
-  [9e574af](https://github.com/c-ares/c-ares/commit/9e574af)
-* Fix calculation of query times > 1s.
-  [2b2eae7](https://github.com/c-ares/c-ares/commit/2b2eae7)
-* Fix building on old Linux releases that don't have `TCP_FASTOPEN_CONNECT`.
-  [b7a89b9](https://github.com/c-ares/c-ares/commit/b7a89b9)
-* Fix minor Android build warnings.
-  [PR #848](https://github.com/c-ares/c-ares/pull/848)
-
-Thanks go to these friendly people for their efforts and contributions for this
-release:
-* Brad House (@bradh352)
-* Erik Lax (@eriklax)
-* Hans-Christian Egtvedt (@egtvedt)
-* Mikael Lindemann (@mikaellindemann)
-* Nodar Chkuaselidze (@nodech)
 
-## c-ares version 1.33.0 - August 2 2024
+## c-ares version 1.34.0 - October 9 2024
 
 This is a feature and bugfix release.
 
 Features:
-* Add DNS cookie support (RFC7873 + RFC9018) to help prevent off-path cache
-  poisoning attacks. [PR #833](https://github.com/c-ares/c-ares/pull/833)
-* Implement TCP FastOpen (TFO) RFC7413, which will make TCP reconnects 0-RTT
-  on supported systems. [PR #840](https://github.com/c-ares/c-ares/pull/840)
+* adig: read arguments from adigrc.
+  [PR #856](https://github.com/c-ares/c-ares/pull/856)
+* Add new pending write callback optimization via `ares_set_pending_write_cb`.
+  [PR #857](https://github.com/c-ares/c-ares/pull/857)
+* New function `ares_process_fds()`.
+  [PR #875](https://github.com/c-ares/c-ares/pull/875)
+* Failed servers should be probed rather than redirecting queries which could
+  cause unexpected latency.
+  [PR #877](https://github.com/c-ares/c-ares/pull/877)
+* adig: rework command line arguments to mimic dig from bind.
+  [PR #890](https://github.com/c-ares/c-ares/pull/890)
+* Add new method for overriding network functions
+  `ares_set_socket_function_ex()` to properly support all new functionality.
+  [PR #894](https://github.com/c-ares/c-ares/pull/894)
+* Fix regression with custom socket callbacks due to DNS cookie support.
+  [PR #895](https://github.com/c-ares/c-ares/pull/895)
+* ares_socket: set IP_BIND_ADDRESS_NO_PORT on ares_set_local_ip* tcp sockets
+  [PR #887](https://github.com/c-ares/c-ares/pull/887)
+* URI parser/writer for ares_set_servers_csv()/ares_get_servers_csv().
+  [PR #882](https://github.com/c-ares/c-ares/pull/882)
 
 Changes:
-* Reorganize source tree. [PR #822](https://github.com/c-ares/c-ares/pull/822)
-* Refactoring of connection handling to prevent code duplication.
-  [PR #839](https://github.com/c-ares/c-ares/pull/839)
-* New dynamic array data structure to prevent simple logic flaws in array
-  handling in various code paths.
-  [PR #841](https://github.com/c-ares/c-ares/pull/841)
+* Connection handling modularization.
+  [PR #857](https://github.com/c-ares/c-ares/pull/857),
+  [PR #876](https://github.com/c-ares/c-ares/pull/876)
+* Expose library/utility functions to tools.
+  [PR #860](https://github.com/c-ares/c-ares/pull/860)
+* Remove `ares__` prefix, just use `ares_` for internal functions.
+  [PR #872](https://github.com/c-ares/c-ares/pull/872)
+
 
 Bugfixes:
-* `ares_destroy()` race condition during shutdown due to missing lock.
-  [PR #831](https://github.com/c-ares/c-ares/pull/831)
-* Android: Preserve thread name after attaching it to JVM.
-  [PR #838](https://github.com/c-ares/c-ares/pull/838)
-* Windows UWP (Store) support fix.
-  [PR #845](https://github.com/c-ares/c-ares/pull/845)
+* fix: potential WIN32_LEAN_AND_MEAN redefinition.
+  [PR #869](https://github.com/c-ares/c-ares/pull/869)
+* Fix googletest v1.15 compatibility.
+  [PR #874](https://github.com/c-ares/c-ares/pull/874)
+* Fix pkgconfig thread dependencies.
+  [PR #884](https://github.com/c-ares/c-ares/pull/884)
 
 
 Thanks go to these friendly people for their efforts and contributions for this
 release:
 
 * Brad House (@bradh352)
-* Yauheni Khnykin (@Hsilgos)
+* Cristian Rodríguez (@crrodriguez)
+* Georg (@tacerus)
+* @lifenjoiner
+* Shelley Vohr (@codebytere)
+* 前进，前进，进 (@leleliu008)
 
diff --git a/deps/cares/aminclude_static.am b/deps/cares/aminclude_static.am
index 538a810c9eb0ee..7cc94c822cf773 100644
--- a/deps/cares/aminclude_static.am
+++ b/deps/cares/aminclude_static.am
@@ -1,6 +1,6 @@
 
 # aminclude_static.am generated automatically by Autoconf
-# from AX_AM_MACROS_STATIC on Fri Aug 23 09:37:25 EDT 2024
+# from AX_AM_MACROS_STATIC on Wed Oct  9 20:58:25 EDT 2024
 
 
 # Code coverage
diff --git a/deps/cares/cares.gyp b/deps/cares/cares.gyp
index 6b6d520f9ee872..87898f5610d56c 100644
--- a/deps/cares/cares.gyp
+++ b/deps/cares/cares.gyp
@@ -6,16 +6,14 @@
       'include/ares_dns_record.h',
       'include/ares_nameser.h',
       'include/ares_version.h',
-      'src/lib/ares__addrinfo2hostent.c',
-      'src/lib/ares__addrinfo_localhost.c',
-      'src/lib/ares__close_sockets.c',
-      'src/lib/ares__hosts_file.c',
-      'src/lib/ares__parse_into_addrinfo.c',
-      'src/lib/ares__socket.c',
-      'src/lib/ares__sortaddrinfo.c',
+      'src/lib/ares_addrinfo2hostent.c',
+      'src/lib/ares_addrinfo_localhost.c',
       'src/lib/ares_android.c',
       'src/lib/ares_android.h',
       'src/lib/ares_cancel.c',
+      'src/lib/ares_close_sockets.c',
+      'src/lib/ares_conn.c',
+      'src/lib/ares_conn.h',
       'src/lib/ares_cookie.c',
       'src/lib/ares_data.c',
       'src/lib/ares_data.h',
@@ -29,43 +27,43 @@
       'src/lib/ares_gethostbyaddr.c',
       'src/lib/ares_gethostbyname.c',
       'src/lib/ares_getnameinfo.c',
+      'src/lib/ares_hosts_file.c',
       'src/lib/ares_inet_net_pton.h',
       'src/lib/ares_init.c',
       'src/lib/ares_ipv6.h',
       'src/lib/ares_library_init.c',
       'src/lib/ares_metrics.c',
       'src/lib/ares_options.c',
-      'src/lib/ares_platform.c',
-      'src/lib/ares_platform.h',
+      'src/lib/ares_parse_into_addrinfo.c',
       'src/lib/ares_private.h',
       'src/lib/ares_process.c',
       'src/lib/ares_qcache.c',
       'src/lib/ares_query.c',
       'src/lib/ares_search.c',
       'src/lib/ares_send.c',
+      'src/lib/ares_set_socket_functions.c',
       'src/lib/ares_setup.h',
+      'src/lib/ares_socket.c',
+      'src/lib/ares_socket.h',
+      'src/lib/ares_sortaddrinfo.c',
       'src/lib/ares_strerror.c',
       'src/lib/ares_sysconfig.c',
       'src/lib/ares_sysconfig_files.c',
       'src/lib/ares_timeout.c',
       'src/lib/ares_update_servers.c',
       'src/lib/ares_version.c',
-      'src/lib/dsa/ares__array.c',
-      'src/lib/dsa/ares__array.h',
-      'src/lib/dsa/ares__htable.c',
-      'src/lib/dsa/ares__htable.h',
-      'src/lib/dsa/ares__htable_asvp.c',
-      'src/lib/dsa/ares__htable_asvp.h',
-      'src/lib/dsa/ares__htable_strvp.c',
-      'src/lib/dsa/ares__htable_strvp.h',
-      'src/lib/dsa/ares__htable_szvp.c',
-      'src/lib/dsa/ares__htable_szvp.h',
-      'src/lib/dsa/ares__htable_vpvp.c',
-      'src/lib/dsa/ares__htable_vpvp.h',
-      'src/lib/dsa/ares__llist.c',
-      'src/lib/dsa/ares__llist.h',
-      'src/lib/dsa/ares__slist.c',
-      'src/lib/dsa/ares__slist.h',
+      'src/lib/dsa/ares_array.c',
+      'src/lib/dsa/ares_htable.c',
+      'src/lib/dsa/ares_htable.h',
+      'src/lib/dsa/ares_htable_asvp.c',
+      'src/lib/dsa/ares_htable_dict.c',
+      'src/lib/dsa/ares_htable_strvp.c',
+      'src/lib/dsa/ares_htable_szvp.c',
+      'src/lib/dsa/ares_htable_vpstr.c',
+      'src/lib/dsa/ares_htable_vpvp.c',
+      'src/lib/dsa/ares_llist.c',
+      'src/lib/dsa/ares_slist.c',
+      'src/lib/dsa/ares_slist.h',
       'src/lib/event/ares_event.h',
       'src/lib/event/ares_event_configchg.c',
       'src/lib/event/ares_event_epoll.c',
@@ -76,6 +74,17 @@
       'src/lib/event/ares_event_wake_pipe.c',
       'src/lib/event/ares_event_win32.c',
       'src/lib/event/ares_event_win32.h',
+      'src/lib/include/ares_array.h',
+      'src/lib/include/ares_buf.h',
+      'src/lib/include/ares_htable_asvp.h',
+      'src/lib/include/ares_htable_dict.h',
+      'src/lib/include/ares_htable_strvp.h',
+      'src/lib/include/ares_htable_szvp.h',
+      'src/lib/include/ares_htable_vpstr.h',
+      'src/lib/include/ares_htable_vpvp.h',
+      'src/lib/include/ares_llist.h',
+      'src/lib/include/ares_mem.h',
+      'src/lib/include/ares_str.h',
       'src/lib/inet_net_pton.c',
       'src/lib/inet_ntop.c',
       'src/lib/legacy/ares_create_query.c',
@@ -102,21 +111,23 @@
       'src/lib/record/ares_dns_private.h',
       'src/lib/record/ares_dns_record.c',
       'src/lib/record/ares_dns_write.c',
-      'src/lib/str/ares__buf.c',
       'src/lib/str/ares__buf.h',
+      'src/lib/str/ares_buf.c',
       'src/lib/str/ares_str.c',
-      'src/lib/str/ares_str.h',
-      'src/lib/str/ares_strcasecmp.c',
-      'src/lib/str/ares_strcasecmp.h',
       'src/lib/str/ares_strsplit.c',
       'src/lib/str/ares_strsplit.h',
-      'src/lib/util/ares__iface_ips.c',
-      'src/lib/util/ares__iface_ips.h',
-      'src/lib/util/ares__threads.c',
-      'src/lib/util/ares__threads.h',
-      'src/lib/util/ares__timeval.c',
+      'src/lib/util/ares_iface_ips.c',
+      'src/lib/util/ares_iface_ips.h',
       'src/lib/util/ares_math.c',
+      'src/lib/util/ares_math.h',
       'src/lib/util/ares_rand.c',
+      'src/lib/util/ares_rand.h',
+      'src/lib/util/ares_threads.c',
+      'src/lib/util/ares_threads.h',
+      'src/lib/util/ares_time.h',
+      'src/lib/util/ares_timeval.c',
+      'src/lib/util/ares_uri.c',
+      'src/lib/util/ares_uri.h',
       'src/tools/ares_getopt.c',
       'src/tools/ares_getopt.h',
     ],
@@ -163,7 +174,7 @@
     {
       'target_name': 'cares',
       'type': '<(library)',
-      'include_dirs': [ 'include', 'src/lib' ],
+      'include_dirs': [ 'include', 'src/lib', 'src/lib/include' ],
       'direct_dependent_settings': {
         'include_dirs': [ 'include' ],
         'cflags': [ '-Wno-error=deprecated-declarations' ],
diff --git a/deps/cares/configure b/deps/cares/configure
index 74e9741fe6ee7a..635872c9f18e1a 100755
--- a/deps/cares/configure
+++ b/deps/cares/configure
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.72 for c-ares 1.33.1.
+# Generated by GNU Autoconf 2.72 for c-ares 1.34.1.
 #
 # Report bugs to <c-ares mailing list: http://lists.haxx.se/listinfo/c-ares>.
 #
@@ -614,8 +614,8 @@ MAKEFLAGS=
 # Identity of this package.
 PACKAGE_NAME='c-ares'
 PACKAGE_TARNAME='c-ares'
-PACKAGE_VERSION='1.33.1'
-PACKAGE_STRING='c-ares 1.33.1'
+PACKAGE_VERSION='1.34.1'
+PACKAGE_STRING='c-ares 1.34.1'
 PACKAGE_BUGREPORT='c-ares mailing list: http://lists.haxx.se/listinfo/c-ares'
 PACKAGE_URL=''
 
@@ -834,8 +834,10 @@ enable_dependency_tracking
 enable_silent_rules
 enable_shared
 enable_static
+enable_pic
 with_pic
 enable_fast_install
+enable_aix_soname
 with_aix_soname
 with_gnu_ld
 with_sysroot
@@ -1421,7 +1423,7 @@ if test "$ac_init_help" = "long"; then
   # Omit some internal or obsolete options to make the list less imposing.
   # This message is too long to be a string in the A/UX 3.1 sh.
   cat <<_ACEOF
-'configure' configures c-ares 1.33.1 to adapt to many kinds of systems.
+'configure' configures c-ares 1.34.1 to adapt to many kinds of systems.
 
 Usage: $0 [OPTION]... [VAR=VALUE]...
 
@@ -1492,7 +1494,7 @@ fi
 
 if test -n "$ac_init_help"; then
   case $ac_init_help in
-     short | recursive ) echo "Configuration of c-ares 1.33.1:";;
+     short | recursive ) echo "Configuration of c-ares 1.34.1:";;
    esac
   cat <<\_ACEOF
 
@@ -1508,8 +1510,13 @@ Optional Features:
   --disable-silent-rules  verbose build output (undo: "make V=0")
   --enable-shared[=PKGS]  build shared libraries [default=yes]
   --enable-static[=PKGS]  build static libraries [default=yes]
+  --enable-pic[=PKGS]     try to use only PIC/non-PIC objects [default=use
+                          both]
   --enable-fast-install[=PKGS]
                           optimize for fast installation [default=yes]
+  --enable-aix-soname=aix|svr4|both
+                          shared library versioning (aka "SONAME") variant to
+                          provide on AIX, [default=aix].
   --disable-libtool-lock  avoid locking (might break parallel builds)
   --disable-warnings      Disable strict compiler warnings
   --disable-symbol-hiding Disable symbol hiding. Enabled by default if the
@@ -1528,11 +1535,6 @@ Optional Features:
 Optional Packages:
   --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
   --without-PACKAGE       do not use PACKAGE (same as --with-PACKAGE=no)
-  --with-pic[=PKGS]       try to use only PIC/non-PIC objects [default=use
-                          both]
-  --with-aix-soname=aix|svr4|both
-                          shared library versioning (aka "SONAME") variant to
-                          provide on AIX, [default=aix].
   --with-gnu-ld           assume the C compiler uses GNU ld [default=no]
   --with-sysroot[=DIR]    Search for dependent libraries within DIR (or the
                           compiler's sysroot if not specified).
@@ -1633,7 +1635,7 @@ fi
 test -n "$ac_init_help" && exit $ac_status
 if $ac_init_version; then
   cat <<\_ACEOF
-c-ares configure 1.33.1
+c-ares configure 1.34.1
 generated by GNU Autoconf 2.72
 
 Copyright (C) 2023 Free Software Foundation, Inc.
@@ -2277,7 +2279,7 @@ cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
 
-It was created by c-ares $as_me 1.33.1, which was
+It was created by c-ares $as_me 1.34.1, which was
 generated by GNU Autoconf 2.72.  Invocation command line was
 
   $ $0$ac_configure_args_raw
@@ -3269,7 +3271,7 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 
 
 
-CARES_VERSION_INFO="20:1:18"
+CARES_VERSION_INFO="21:1:19"
 
 
 
@@ -6190,7 +6192,7 @@ fi
 
 # Define the identity of the package.
  PACKAGE='c-ares'
- VERSION='1.33.1'
+ VERSION='1.34.1'
 
 
 printf "%s\n" "#define PACKAGE \"$PACKAGE\"" >>confdefs.h
@@ -6523,8 +6525,8 @@ esac
 
 
 
-macro_version='2.4.7'
-macro_revision='2.4.7'
+macro_version='2.5.3'
+macro_revision='2.5.3'
 
 
 
@@ -7035,7 +7037,7 @@ if test yes = "$GCC"; then
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for ld used by $CC" >&5
 printf %s "checking for ld used by $CC... " >&6; }
   case $host in
-  *-*-mingw*)
+  *-*-mingw* | *-*-windows*)
     # gcc leaves a trailing carriage return, which upsets mingw
     ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;;
   *)
@@ -7164,7 +7166,7 @@ else
 	# Tru64's nm complains that /dev/null is an invalid object file
 	# MSYS converts /dev/null to NUL, MinGW nm treats NUL as empty
 	case $build_os in
-	mingw*) lt_bad_file=conftest.nm/nofile ;;
+	mingw* | windows*) lt_bad_file=conftest.nm/nofile ;;
 	*) lt_bad_file=/dev/null ;;
 	esac
 	case `"$tmp_nm" -B $lt_bad_file 2>&1 | $SED '1q'` in
@@ -7397,7 +7399,7 @@ else case e in #(
     lt_cv_sys_max_cmd_len=-1;
     ;;
 
-  cygwin* | mingw* | cegcc*)
+  cygwin* | mingw* | windows* | cegcc*)
     # On Win9x/ME, this test blows up -- it succeeds, but takes
     # about 5 minutes as the teststring grows exponentially.
     # Worse, since 9x/ME are not pre-emptively multitasking,
@@ -7419,7 +7421,7 @@ else case e in #(
     lt_cv_sys_max_cmd_len=8192;
     ;;
 
-  bitrig* | darwin* | dragonfly* | freebsd* | midnightbsd* | netbsd* | openbsd*)
+  darwin* | dragonfly* | freebsd* | midnightbsd* | netbsd* | openbsd*)
     # This has been around since 386BSD, at least.  Likely further.
     if test -x /sbin/sysctl; then
       lt_cv_sys_max_cmd_len=`/sbin/sysctl -n kern.argmax`
@@ -7562,7 +7564,7 @@ else case e in #(
   e) case $host in
   *-*-mingw* )
     case $build in
-      *-*-mingw* ) # actually msys
+      *-*-mingw* | *-*-windows* ) # actually msys
         lt_cv_to_host_file_cmd=func_convert_file_msys_to_w32
         ;;
       *-*-cygwin* )
@@ -7575,7 +7577,7 @@ else case e in #(
     ;;
   *-*-cygwin* )
     case $build in
-      *-*-mingw* ) # actually msys
+      *-*-mingw* | *-*-windows* ) # actually msys
         lt_cv_to_host_file_cmd=func_convert_file_msys_to_cygwin
         ;;
       *-*-cygwin* )
@@ -7611,9 +7613,9 @@ else case e in #(
   e) #assume ordinary cross tools, or native build.
 lt_cv_to_tool_file_cmd=func_convert_file_noop
 case $host in
-  *-*-mingw* )
+  *-*-mingw* | *-*-windows* )
     case $build in
-      *-*-mingw* ) # actually msys
+      *-*-mingw* | *-*-windows* ) # actually msys
         lt_cv_to_tool_file_cmd=func_convert_file_msys_to_w32
         ;;
     esac
@@ -7649,7 +7651,7 @@ case $reload_flag in
 esac
 reload_cmds='$LD$reload_flag -o $output$reload_objs'
 case $host_os in
-  cygwin* | mingw* | pw32* | cegcc*)
+  cygwin* | mingw* | windows* | pw32* | cegcc*)
     if test yes != "$GCC"; then
       reload_cmds=false
     fi
@@ -7671,9 +7673,8 @@ esac
 
 
 
-if test -n "$ac_tool_prefix"; then
-  # Extract the first word of "${ac_tool_prefix}file", so it can be a program name with args.
-set dummy ${ac_tool_prefix}file; ac_word=$2
+# Extract the first word of "file", so it can be a program name with args.
+set dummy file; ac_word=$2
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
 printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_FILECMD+y}
@@ -7694,7 +7695,7 @@ do
   esac
     for ac_exec_ext in '' $ac_executable_extensions; do
   if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
-    ac_cv_prog_FILECMD="${ac_tool_prefix}file"
+    ac_cv_prog_FILECMD="file"
     printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
     break 2
   fi
@@ -7702,6 +7703,7 @@ done
   done
 IFS=$as_save_IFS
 
+  test -z "$ac_cv_prog_FILECMD" && ac_cv_prog_FILECMD=":"
 fi ;;
 esac
 fi
@@ -7715,66 +7717,6 @@ printf "%s\n" "no" >&6; }
 fi
 
 
-fi
-if test -z "$ac_cv_prog_FILECMD"; then
-  ac_ct_FILECMD=$FILECMD
-  # Extract the first word of "file", so it can be a program name with args.
-set dummy file; ac_word=$2
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
-printf %s "checking for $ac_word... " >&6; }
-if test ${ac_cv_prog_ac_ct_FILECMD+y}
-then :
-  printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_FILECMD"; then
-  ac_cv_prog_ac_ct_FILECMD="$ac_ct_FILECMD" # Let the user override the test.
-else
-as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
-  IFS=$as_save_IFS
-  case $as_dir in #(((
-    '') as_dir=./ ;;
-    */) ;;
-    *) as_dir=$as_dir/ ;;
-  esac
-    for ac_exec_ext in '' $ac_executable_extensions; do
-  if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
-    ac_cv_prog_ac_ct_FILECMD="file"
-    printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
-    break 2
-  fi
-done
-  done
-IFS=$as_save_IFS
-
-fi ;;
-esac
-fi
-ac_ct_FILECMD=$ac_cv_prog_ac_ct_FILECMD
-if test -n "$ac_ct_FILECMD"; then
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_FILECMD" >&5
-printf "%s\n" "$ac_ct_FILECMD" >&6; }
-else
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-fi
-
-  if test "x$ac_ct_FILECMD" = x; then
-    FILECMD=":"
-  else
-    case $cross_compiling:$ac_tool_warned in
-yes:)
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5
-printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;}
-ac_tool_warned=yes ;;
-esac
-    FILECMD=$ac_ct_FILECMD
-  fi
-else
-  FILECMD="$ac_cv_prog_FILECMD"
-fi
-
 
 
 
@@ -7906,7 +7848,6 @@ lt_cv_deplibs_check_method='unknown'
 # 'none' -- dependencies not supported.
 # 'unknown' -- same as none, but documents that we really don't know.
 # 'pass_all' -- all dependencies passed with no checks.
-# 'test_compile' -- check by making test program.
 # 'file_magic [[regex]]' -- check by looking for files in library path
 # that responds to the $file_magic_cmd with a given extended regex.
 # If you have 'file' or equivalent on your system and you're not sure
@@ -7933,7 +7874,7 @@ cygwin*)
   lt_cv_file_magic_cmd='func_win32_libid'
   ;;
 
-mingw* | pw32*)
+mingw* | windows* | pw32*)
   # Base MSYS/MinGW do not provide the 'file' command needed by
   # func_win32_libid shell function, so use a weaker test based on 'objdump',
   # unless we find 'file', for example because we are cross-compiling.
@@ -7942,7 +7883,7 @@ mingw* | pw32*)
     lt_cv_file_magic_cmd='func_win32_libid'
   else
     # Keep this pattern in sync with the one in func_win32_libid.
-    lt_cv_deplibs_check_method='file_magic file format (pei*-i386(.*architecture: i386)?|pe-arm-wince|pe-x86-64)'
+    lt_cv_deplibs_check_method='file_magic file format (pei*-i386(.*architecture: i386)?|pe-arm-wince|pe-x86-64|pe-aarch64)'
     lt_cv_file_magic_cmd='$OBJDUMP -f'
   fi
   ;;
@@ -8033,7 +7974,7 @@ newos6*)
   lt_cv_deplibs_check_method=pass_all
   ;;
 
-openbsd* | bitrig*)
+openbsd*)
   if test -z "`echo __ELF__ | $CC -E - | $GREP __ELF__`"; then
     lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so\.[0-9]+\.[0-9]+|\.so|_pic\.a)$'
   else
@@ -8101,7 +8042,7 @@ file_magic_glob=
 want_nocaseglob=no
 if test "$build" = "$host"; then
   case $host_os in
-  mingw* | pw32*)
+  mingw* | windows* | pw32*)
     if ( shopt | grep nocaseglob ) >/dev/null 2>&1; then
       want_nocaseglob=yes
     else
@@ -8257,7 +8198,7 @@ else case e in #(
   e) lt_cv_sharedlib_from_linklib_cmd='unknown'
 
 case $host_os in
-cygwin* | mingw* | pw32* | cegcc*)
+cygwin* | mingw* | windows* | pw32* | cegcc*)
   # two different shell functions defined in ltmain.sh;
   # decide which one to use based on capabilities of $DLLTOOL
   case `$DLLTOOL --help 2>&1` in
@@ -8288,6 +8229,110 @@ test -z "$sharedlib_from_linklib_cmd" && sharedlib_from_linklib_cmd=$ECHO
 
 
 
+if test -n "$ac_tool_prefix"; then
+  # Extract the first word of "${ac_tool_prefix}ranlib", so it can be a program name with args.
+set dummy ${ac_tool_prefix}ranlib; ac_word=$2
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
+printf %s "checking for $ac_word... " >&6; }
+if test ${ac_cv_prog_RANLIB+y}
+then :
+  printf %s "(cached) " >&6
+else case e in #(
+  e) if test -n "$RANLIB"; then
+  ac_cv_prog_RANLIB="$RANLIB" # Let the user override the test.
+else
+as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH
+do
+  IFS=$as_save_IFS
+  case $as_dir in #(((
+    '') as_dir=./ ;;
+    */) ;;
+    *) as_dir=$as_dir/ ;;
+  esac
+    for ac_exec_ext in '' $ac_executable_extensions; do
+  if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
+    ac_cv_prog_RANLIB="${ac_tool_prefix}ranlib"
+    printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
+    break 2
+  fi
+done
+  done
+IFS=$as_save_IFS
+
+fi ;;
+esac
+fi
+RANLIB=$ac_cv_prog_RANLIB
+if test -n "$RANLIB"; then
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $RANLIB" >&5
+printf "%s\n" "$RANLIB" >&6; }
+else
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
+printf "%s\n" "no" >&6; }
+fi
+
+
+fi
+if test -z "$ac_cv_prog_RANLIB"; then
+  ac_ct_RANLIB=$RANLIB
+  # Extract the first word of "ranlib", so it can be a program name with args.
+set dummy ranlib; ac_word=$2
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
+printf %s "checking for $ac_word... " >&6; }
+if test ${ac_cv_prog_ac_ct_RANLIB+y}
+then :
+  printf %s "(cached) " >&6
+else case e in #(
+  e) if test -n "$ac_ct_RANLIB"; then
+  ac_cv_prog_ac_ct_RANLIB="$ac_ct_RANLIB" # Let the user override the test.
+else
+as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH
+do
+  IFS=$as_save_IFS
+  case $as_dir in #(((
+    '') as_dir=./ ;;
+    */) ;;
+    *) as_dir=$as_dir/ ;;
+  esac
+    for ac_exec_ext in '' $ac_executable_extensions; do
+  if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
+    ac_cv_prog_ac_ct_RANLIB="ranlib"
+    printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
+    break 2
+  fi
+done
+  done
+IFS=$as_save_IFS
+
+fi ;;
+esac
+fi
+ac_ct_RANLIB=$ac_cv_prog_ac_ct_RANLIB
+if test -n "$ac_ct_RANLIB"; then
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_RANLIB" >&5
+printf "%s\n" "$ac_ct_RANLIB" >&6; }
+else
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
+printf "%s\n" "no" >&6; }
+fi
+
+  if test "x$ac_ct_RANLIB" = x; then
+    RANLIB=":"
+  else
+    case $cross_compiling:$ac_tool_warned in
+yes:)
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5
+printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;}
+ac_tool_warned=yes ;;
+esac
+    RANLIB=$ac_ct_RANLIB
+  fi
+else
+  RANLIB="$ac_cv_prog_RANLIB"
+fi
+
 if test -n "$ac_tool_prefix"; then
   for ac_prog in ar
   do
@@ -8409,7 +8454,7 @@ fi
 
 # Use ARFLAGS variable as AR's operation code to sync the variable naming with
 # Automake.  If both AR_FLAGS and ARFLAGS are specified, AR_FLAGS should have
-# higher priority because thats what people were doing historically (setting
+# higher priority because that's what people were doing historically (setting
 # ARFLAGS for automake and AR_FLAGS for libtool).  FIXME: Make the AR_FLAGS
 # variable obsoleted/removed.
 
@@ -8601,109 +8646,6 @@ test -z "$STRIP" && STRIP=:
 
 
 
-if test -n "$ac_tool_prefix"; then
-  # Extract the first word of "${ac_tool_prefix}ranlib", so it can be a program name with args.
-set dummy ${ac_tool_prefix}ranlib; ac_word=$2
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
-printf %s "checking for $ac_word... " >&6; }
-if test ${ac_cv_prog_RANLIB+y}
-then :
-  printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$RANLIB"; then
-  ac_cv_prog_RANLIB="$RANLIB" # Let the user override the test.
-else
-as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
-  IFS=$as_save_IFS
-  case $as_dir in #(((
-    '') as_dir=./ ;;
-    */) ;;
-    *) as_dir=$as_dir/ ;;
-  esac
-    for ac_exec_ext in '' $ac_executable_extensions; do
-  if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
-    ac_cv_prog_RANLIB="${ac_tool_prefix}ranlib"
-    printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
-    break 2
-  fi
-done
-  done
-IFS=$as_save_IFS
-
-fi ;;
-esac
-fi
-RANLIB=$ac_cv_prog_RANLIB
-if test -n "$RANLIB"; then
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $RANLIB" >&5
-printf "%s\n" "$RANLIB" >&6; }
-else
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-fi
-
-
-fi
-if test -z "$ac_cv_prog_RANLIB"; then
-  ac_ct_RANLIB=$RANLIB
-  # Extract the first word of "ranlib", so it can be a program name with args.
-set dummy ranlib; ac_word=$2
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
-printf %s "checking for $ac_word... " >&6; }
-if test ${ac_cv_prog_ac_ct_RANLIB+y}
-then :
-  printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_RANLIB"; then
-  ac_cv_prog_ac_ct_RANLIB="$ac_ct_RANLIB" # Let the user override the test.
-else
-as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
-  IFS=$as_save_IFS
-  case $as_dir in #(((
-    '') as_dir=./ ;;
-    */) ;;
-    *) as_dir=$as_dir/ ;;
-  esac
-    for ac_exec_ext in '' $ac_executable_extensions; do
-  if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
-    ac_cv_prog_ac_ct_RANLIB="ranlib"
-    printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
-    break 2
-  fi
-done
-  done
-IFS=$as_save_IFS
-
-fi ;;
-esac
-fi
-ac_ct_RANLIB=$ac_cv_prog_ac_ct_RANLIB
-if test -n "$ac_ct_RANLIB"; then
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_RANLIB" >&5
-printf "%s\n" "$ac_ct_RANLIB" >&6; }
-else
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-fi
-
-  if test "x$ac_ct_RANLIB" = x; then
-    RANLIB=":"
-  else
-    case $cross_compiling:$ac_tool_warned in
-yes:)
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5
-printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;}
-ac_tool_warned=yes ;;
-esac
-    RANLIB=$ac_ct_RANLIB
-  fi
-else
-  RANLIB="$ac_cv_prog_RANLIB"
-fi
 
 test -z "$RANLIB" && RANLIB=:
 
@@ -8718,15 +8660,8 @@ old_postinstall_cmds='chmod 644 $oldlib'
 old_postuninstall_cmds=
 
 if test -n "$RANLIB"; then
-  case $host_os in
-  bitrig* | openbsd*)
-    old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB -t \$tool_oldlib"
-    ;;
-  *)
-    old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$tool_oldlib"
-    ;;
-  esac
   old_archive_cmds="$old_archive_cmds~\$RANLIB \$tool_oldlib"
+  old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$tool_oldlib"
 fi
 
 case $host_os in
@@ -8806,7 +8741,7 @@ case $host_os in
 aix*)
   symcode='[BCDT]'
   ;;
-cygwin* | mingw* | pw32* | cegcc*)
+cygwin* | mingw* | windows* | pw32* | cegcc*)
   symcode='[ABCDGISTW]'
   ;;
 hpux*)
@@ -8821,7 +8756,7 @@ osf*)
   symcode='[BCDEGQRST]'
   ;;
 solaris*)
-  symcode='[BDRT]'
+  symcode='[BCDRT]'
   ;;
 sco3.2v5*)
   symcode='[DT]'
@@ -8885,7 +8820,7 @@ $lt_c_name_lib_hook\
 # Handle CRLF in mingw tool chain
 opt_cr=
 case $build_os in
-mingw*)
+mingw* | windows*)
   opt_cr=`$ECHO 'x\{0,1\}' | tr x '\015'` # option cr in regexp
   ;;
 esac
@@ -8936,7 +8871,7 @@ void nm_test_func(void){}
 #ifdef __cplusplus
 }
 #endif
-int main(){nm_test_var='a';nm_test_func();return(0);}
+int main(void){nm_test_var='a';nm_test_func();return(0);}
 _LT_EOF
 
   if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5
@@ -9121,7 +9056,9 @@ lt_sysroot=
 case $with_sysroot in #(
  yes)
    if test yes = "$GCC"; then
-     lt_sysroot=`$CC --print-sysroot 2>/dev/null`
+     # Trim trailing / since we'll always append absolute paths and we want
+     # to avoid //, if only for less confusing output for the user.
+     lt_sysroot=`$CC --print-sysroot 2>/dev/null | $SED 's:/\+$::'`
    fi
    ;; #(
  /*)
@@ -9338,7 +9275,7 @@ mips64*-*linux*)
   ;;
 
 x86_64-*kfreebsd*-gnu|x86_64-*linux*|powerpc*-*linux*| \
-s390*-*linux*|s390*-*tpf*|sparc*-*linux*)
+s390*-*linux*|s390*-*tpf*|sparc*-*linux*|x86_64-gnu*)
   # Find out what ABI is being produced by ac_compile, and set linker
   # options accordingly.  Note that the listed cases only cover the
   # situations where additional linker options are needed (such as when
@@ -9357,7 +9294,7 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux*)
 	  x86_64-*kfreebsd*-gnu)
 	    LD="${LD-ld} -m elf_i386_fbsd"
 	    ;;
-	  x86_64-*linux*)
+	  x86_64-*linux*|x86_64-gnu*)
 	    case `$FILECMD conftest.o` in
 	      *x86-64*)
 		LD="${LD-ld} -m elf32_x86_64"
@@ -9386,7 +9323,7 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux*)
 	  x86_64-*kfreebsd*-gnu)
 	    LD="${LD-ld} -m elf_x86_64_fbsd"
 	    ;;
-	  x86_64-*linux*)
+	  x86_64-*linux*|x86_64-gnu*)
 	    LD="${LD-ld} -m elf_x86_64"
 	    ;;
 	  powerpcle-*linux*)
@@ -9607,23 +9544,23 @@ fi
 test -z "$MANIFEST_TOOL" && MANIFEST_TOOL=mt
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if $MANIFEST_TOOL is a manifest tool" >&5
 printf %s "checking if $MANIFEST_TOOL is a manifest tool... " >&6; }
-if test ${lt_cv_path_mainfest_tool+y}
+if test ${lt_cv_path_manifest_tool+y}
 then :
   printf %s "(cached) " >&6
 else case e in #(
-  e) lt_cv_path_mainfest_tool=no
+  e) lt_cv_path_manifest_tool=no
   echo "$as_me:$LINENO: $MANIFEST_TOOL '-?'" >&5
   $MANIFEST_TOOL '-?' 2>conftest.err > conftest.out
   cat conftest.err >&5
   if $GREP 'Manifest Tool' conftest.out > /dev/null; then
-    lt_cv_path_mainfest_tool=yes
+    lt_cv_path_manifest_tool=yes
   fi
   rm -f conftest* ;;
 esac
 fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_path_mainfest_tool" >&5
-printf "%s\n" "$lt_cv_path_mainfest_tool" >&6; }
-if test yes != "$lt_cv_path_mainfest_tool"; then
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_path_manifest_tool" >&5
+printf "%s\n" "$lt_cv_path_manifest_tool" >&6; }
+if test yes != "$lt_cv_path_manifest_tool"; then
   MANIFEST_TOOL=:
 fi
 
@@ -10218,6 +10155,45 @@ fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_apple_cc_single_mod" >&5
 printf "%s\n" "$lt_cv_apple_cc_single_mod" >&6; }
 
+    # Feature test to disable chained fixups since it is not
+    # compatible with '-undefined dynamic_lookup'
+    { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for -no_fixup_chains linker flag" >&5
+printf %s "checking for -no_fixup_chains linker flag... " >&6; }
+if test ${lt_cv_support_no_fixup_chains+y}
+then :
+  printf %s "(cached) " >&6
+else case e in #(
+  e)  save_LDFLAGS=$LDFLAGS
+        LDFLAGS="$LDFLAGS -Wl,-no_fixup_chains"
+        cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main (void)
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"
+then :
+  lt_cv_support_no_fixup_chains=yes
+else case e in #(
+  e) lt_cv_support_no_fixup_chains=no
+         ;;
+esac
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.beam \
+    conftest$ac_exeext conftest.$ac_ext
+        LDFLAGS=$save_LDFLAGS
+
+     ;;
+esac
+fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_support_no_fixup_chains" >&5
+printf "%s\n" "$lt_cv_support_no_fixup_chains" >&6; }
+
     { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for -exported_symbols_list linker flag" >&5
 printf %s "checking for -exported_symbols_list linker flag... " >&6; }
 if test ${lt_cv_ld_exported_symbols_list+y}
@@ -10272,7 +10248,7 @@ _LT_EOF
       echo "$RANLIB libconftest.a" >&5
       $RANLIB libconftest.a 2>&5
       cat > conftest.c << _LT_EOF
-int main() { return 0;}
+int main(void) { return 0;}
 _LT_EOF
       echo "$LTCC $LTCFLAGS $LDFLAGS -o conftest conftest.c -Wl,-force_load,./libconftest.a" >&5
       $LTCC $LTCFLAGS $LDFLAGS -o conftest conftest.c -Wl,-force_load,./libconftest.a 2>conftest.err
@@ -10301,7 +10277,11 @@ printf "%s\n" "$lt_cv_ld_force_load" >&6; }
         10.[012],*|,*powerpc*-darwin[5-8]*)
           _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;;
         *)
-          _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup' ;;
+          _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup'
+          if test yes = "$lt_cv_support_no_fixup_chains"; then
+            as_fn_append _lt_dar_allow_undefined ' $wl-no_fixup_chains'
+          fi
+        ;;
       esac
     ;;
   esac
@@ -10382,7 +10362,7 @@ func_stripname_cnf ()
 enable_win32_dll=yes
 
 case $host in
-*-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-cegcc*)
+*-*-cygwin* | *-*-mingw* | *-*-windows* | *-*-pw32* | *-*-cegcc*)
   if test -n "$ac_tool_prefix"; then
   # Extract the first word of "${ac_tool_prefix}as", so it can be a program name with args.
 set dummy ${ac_tool_prefix}as; ac_word=$2
@@ -10788,31 +10768,54 @@ fi
 
 
 
-
-# Check whether --with-pic was given.
+  # Check whether --enable-pic was given.
+if test ${enable_pic+y}
+then :
+  enableval=$enable_pic; lt_p=${PACKAGE-default}
+     case $enableval in
+     yes|no) pic_mode=$enableval ;;
+     *)
+       pic_mode=default
+       # Look at the argument we got.  We use all the common list separators.
+       lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR,
+       for lt_pkg in $enableval; do
+	 IFS=$lt_save_ifs
+	 if test "X$lt_pkg" = "X$lt_p"; then
+	   pic_mode=yes
+	 fi
+       done
+       IFS=$lt_save_ifs
+       ;;
+     esac
+else case e in #(
+  e)           # Check whether --with-pic was given.
 if test ${with_pic+y}
 then :
   withval=$with_pic; lt_p=${PACKAGE-default}
-    case $withval in
-    yes|no) pic_mode=$withval ;;
-    *)
-      pic_mode=default
-      # Look at the argument we got.  We use all the common list separators.
-      lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR,
-      for lt_pkg in $withval; do
-	IFS=$lt_save_ifs
-	if test "X$lt_pkg" = "X$lt_p"; then
-	  pic_mode=yes
-	fi
-      done
-      IFS=$lt_save_ifs
-      ;;
-    esac
+	 case $withval in
+	 yes|no) pic_mode=$withval ;;
+	 *)
+	   pic_mode=default
+	   # Look at the argument we got.  We use all the common list separators.
+	   lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR,
+	   for lt_pkg in $withval; do
+	     IFS=$lt_save_ifs
+	     if test "X$lt_pkg" = "X$lt_p"; then
+	       pic_mode=yes
+	     fi
+	   done
+	   IFS=$lt_save_ifs
+	   ;;
+	 esac
 else case e in #(
   e) pic_mode=default ;;
 esac
 fi
 
+     ;;
+esac
+fi
+
 
 
 
@@ -10857,18 +10860,29 @@ case $host,$enable_shared in
 power*-*-aix[5-9]*,yes)
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking which variant of shared library versioning to provide" >&5
 printf %s "checking which variant of shared library versioning to provide... " >&6; }
-
-# Check whether --with-aix-soname was given.
+  # Check whether --enable-aix-soname was given.
+if test ${enable_aix_soname+y}
+then :
+  enableval=$enable_aix_soname; case $enableval in
+     aix|svr4|both)
+       ;;
+     *)
+       as_fn_error $? "Unknown argument to --enable-aix-soname" "$LINENO" 5
+       ;;
+     esac
+     lt_cv_with_aix_soname=$enable_aix_soname
+else case e in #(
+  e) # Check whether --with-aix-soname was given.
 if test ${with_aix_soname+y}
 then :
   withval=$with_aix_soname; case $withval in
-    aix|svr4|both)
-      ;;
-    *)
-      as_fn_error $? "Unknown argument to --with-aix-soname" "$LINENO" 5
-      ;;
-    esac
-    lt_cv_with_aix_soname=$with_aix_soname
+         aix|svr4|both)
+           ;;
+         *)
+           as_fn_error $? "Unknown argument to --with-aix-soname" "$LINENO" 5
+           ;;
+         esac
+         lt_cv_with_aix_soname=$with_aix_soname
 else case e in #(
   e) if test ${lt_cv_with_aix_soname+y}
 then :
@@ -10876,12 +10890,16 @@ then :
 else case e in #(
   e) lt_cv_with_aix_soname=aix ;;
 esac
+fi
+ ;;
+esac
 fi
 
-    with_aix_soname=$lt_cv_with_aix_soname ;;
+     enable_aix_soname=$lt_cv_with_aix_soname ;;
 esac
 fi
 
+  with_aix_soname=$enable_aix_soname
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $with_aix_soname" >&5
 printf "%s\n" "$with_aix_soname" >&6; }
   if test aix != "$with_aix_soname"; then
@@ -11197,7 +11215,7 @@ objext=$objext
 lt_simple_compile_test_code="int some_variable = 0;"
 
 # Code to be used in simple link tests
-lt_simple_link_test_code='int main(){return(0);}'
+lt_simple_link_test_code='int main(void){return(0);}'
 
 
 
@@ -11339,7 +11357,7 @@ lt_prog_compiler_static=
       # PIC is the default for these OSes.
       ;;
 
-    mingw* | cygwin* | pw32* | os2* | cegcc*)
+    mingw* | windows* | cygwin* | pw32* | os2* | cegcc*)
       # This hack is so that the source file can tell whether it is being
       # built for inclusion in a dll (and should export symbols for example).
       # Although the cygwin gcc ignores -fPIC, still need this for old-style
@@ -11442,7 +11460,7 @@ lt_prog_compiler_static=
       esac
       ;;
 
-    mingw* | cygwin* | pw32* | os2* | cegcc*)
+    mingw* | windows* | cygwin* | pw32* | os2* | cegcc*)
       # This hack is so that the source file can tell whether it is being
       # built for inclusion in a dll (and should export symbols for example).
       lt_prog_compiler_pic='-DDLL_EXPORT'
@@ -11483,6 +11501,12 @@ lt_prog_compiler_static=
 	lt_prog_compiler_pic='-KPIC'
 	lt_prog_compiler_static='-static'
         ;;
+      *flang* | ftn)
+        # Flang compiler.
+	lt_prog_compiler_wl='-Wl,'
+	lt_prog_compiler_pic='-fPIC'
+	lt_prog_compiler_static='-static'
+        ;;
       # icc used to be incompatible with GCC.
       # ICC 10 doesn't accept -KPIC any more.
       icc* | ifort*)
@@ -11954,7 +11978,7 @@ printf %s "checking whether the $compiler linker ($LD) supports shared libraries
   extract_expsyms_cmds=
 
   case $host_os in
-  cygwin* | mingw* | pw32* | cegcc*)
+  cygwin* | mingw* | windows* | pw32* | cegcc*)
     # FIXME: the MSVC++ and ICC port hasn't been tested in a loooong time
     # When not using gcc, we currently assume that we are using
     # Microsoft Visual C++ or Intel C++ Compiler.
@@ -11966,7 +11990,7 @@ printf %s "checking whether the $compiler linker ($LD) supports shared libraries
     # we just hope/assume this is gcc and not c89 (= MSVC++ or ICC)
     with_gnu_ld=yes
     ;;
-  openbsd* | bitrig*)
+  openbsd*)
     with_gnu_ld=no
     ;;
   esac
@@ -12069,7 +12093,7 @@ _LT_EOF
       fi
       ;;
 
-    cygwin* | mingw* | pw32* | cegcc*)
+    cygwin* | mingw* | windows* | pw32* | cegcc*)
       # _LT_TAGVAR(hardcode_libdir_flag_spec, ) is actually meaningless,
       # as there is no search path for DLLs.
       hardcode_libdir_flag_spec='-L$libdir'
@@ -12125,7 +12149,7 @@ _LT_EOF
 	cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~
 	$CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~
 	emximp -o $lib $output_objdir/$libname.def'
-      old_archive_From_new_cmds='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
+      old_archive_from_new_cmds='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
       enable_shared_with_static_runtimes=yes
       file_list_spec='@'
       ;;
@@ -12616,7 +12640,7 @@ fi
       export_dynamic_flag_spec=-rdynamic
       ;;
 
-    cygwin* | mingw* | pw32* | cegcc*)
+    cygwin* | mingw* | windows* | pw32* | cegcc*)
       # When not using gcc, we currently assume that we are using
       # Microsoft Visual C++ or Intel C++ Compiler.
       # hardcode_libdir_flag_spec is actually meaningless, as there is
@@ -12633,14 +12657,14 @@ fi
 	# Tell ltmain to make .dll files, not .so files.
 	shrext_cmds=.dll
 	# FIXME: Setting linknames here is a bad hack.
-	archive_cmds='$CC -o $output_objdir/$soname $libobjs $compiler_flags $deplibs -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~linknames='
+	archive_cmds='$CC -Fe $output_objdir/$soname $libobjs $compiler_flags $deplibs -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~linknames='
 	archive_expsym_cmds='if   test DEF = "`$SED -n     -e '\''s/^[	 ]*//'\''     -e '\''/^\(;.*\)*$/d'\''     -e '\''s/^\(EXPORTS\|LIBRARY\)\([	 ].*\)*$/DEF/p'\''     -e q     $export_symbols`" ; then
             cp "$export_symbols" "$output_objdir/$soname.def";
             echo "$tool_output_objdir$soname.def" > "$output_objdir/$soname.exp";
           else
             $SED -e '\''s/^/-link -EXPORT:/'\'' < $export_symbols > $output_objdir/$soname.exp;
           fi~
-          $CC -o $tool_output_objdir$soname $libobjs $compiler_flags $deplibs "@$tool_output_objdir$soname.exp" -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~
+          $CC -Fe $tool_output_objdir$soname $libobjs $compiler_flags $deplibs "@$tool_output_objdir$soname.exp" -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~
           linknames='
 	# The linker will not automatically build a static lib if we build a DLL.
 	# _LT_TAGVAR(old_archive_from_new_cmds, )='true'
@@ -12949,7 +12973,7 @@ printf "%s\n" "$lt_cv_irix_exported_symbol" >&6; }
     *nto* | *qnx*)
       ;;
 
-    openbsd* | bitrig*)
+    openbsd*)
       if test -f /usr/libexec/ld.so; then
 	hardcode_direct=yes
 	hardcode_shlibpath_var=no
@@ -12992,7 +13016,7 @@ printf "%s\n" "$lt_cv_irix_exported_symbol" >&6; }
 	cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~
 	$CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~
 	emximp -o $lib $output_objdir/$libname.def'
-      old_archive_From_new_cmds='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
+      old_archive_from_new_cmds='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
       enable_shared_with_static_runtimes=yes
       file_list_spec='@'
       ;;
@@ -13434,7 +13458,7 @@ if test yes = "$GCC"; then
     *) lt_awk_arg='/^libraries:/' ;;
   esac
   case $host_os in
-    mingw* | cegcc*) lt_sed_strip_eq='s|=\([A-Za-z]:\)|\1|g' ;;
+    mingw* | windows* | cegcc*) lt_sed_strip_eq='s|=\([A-Za-z]:\)|\1|g' ;;
     *) lt_sed_strip_eq='s|=/|/|g' ;;
   esac
   lt_search_path_spec=`$CC -print-search-dirs | awk $lt_awk_arg | $SED -e "s/^libraries://" -e $lt_sed_strip_eq`
@@ -13492,7 +13516,7 @@ BEGIN {RS = " "; FS = "/|\n";} {
   # AWK program above erroneously prepends '/' to C:/dos/paths
   # for these hosts.
   case $host_os in
-    mingw* | cegcc*) lt_search_path_spec=`$ECHO "$lt_search_path_spec" |\
+    mingw* | windows* | cegcc*) lt_search_path_spec=`$ECHO "$lt_search_path_spec" |\
       $SED 's|/\([A-Za-z]:\)|\1|g'` ;;
   esac
   sys_lib_search_path_spec=`$ECHO "$lt_search_path_spec" | $lt_NL2SP`
@@ -13566,7 +13590,7 @@ aix[4-9]*)
     # Unfortunately, runtime linking may impact performance, so we do
     # not want this to be the default eventually. Also, we use the
     # versioned .so libs for executables only if there is the -brtl
-    # linker flag in LDFLAGS as well, or --with-aix-soname=svr4 only.
+    # linker flag in LDFLAGS as well, or --enable-aix-soname=svr4 only.
     # To allow for filename-based versioning support, we need to create
     # libNAME.so.V as an archive file, containing:
     # *) an Import File, referring to the versioned filename of the
@@ -13660,7 +13684,7 @@ bsdi[45]*)
   # libtool to hard-code these into programs
   ;;
 
-cygwin* | mingw* | pw32* | cegcc*)
+cygwin* | mingw* | windows* | pw32* | cegcc*)
   version_type=windows
   shrext_cmds=.dll
   need_version=no
@@ -13671,6 +13695,19 @@ cygwin* | mingw* | pw32* | cegcc*)
     # gcc
     library_names_spec='$libname.dll.a'
     # DLL is installed to $(libdir)/../bin by postinstall_cmds
+    # If user builds GCC with multilibs enabled,
+    # it should just install on $(libdir)
+    # not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones.
+    if test yes = $multilib; then
+    postinstall_cmds='base_file=`basename \$file`~
+      dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~
+      dldir=$destdir/`dirname \$dlpath`~
+      $install_prog $dir/$dlname $destdir/$dlname~
+      chmod a+x $destdir/$dlname~
+      if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
+        eval '\''$striplib $destdir/$dlname'\'' || exit \$?;
+      fi'
+    else
     postinstall_cmds='base_file=`basename \$file`~
       dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~
       dldir=$destdir/`dirname \$dlpath`~
@@ -13680,6 +13717,7 @@ cygwin* | mingw* | pw32* | cegcc*)
       if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
         eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
       fi'
+    fi
     postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~
       dlpath=$dir/\$dldll~
        $RM \$dlpath'
@@ -13692,7 +13730,7 @@ cygwin* | mingw* | pw32* | cegcc*)
 
       sys_lib_search_path_spec="$sys_lib_search_path_spec /usr/lib/w32api"
       ;;
-    mingw* | cegcc*)
+    mingw* | windows* | cegcc*)
       # MinGW DLLs use traditional 'lib' prefix
       soname_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
       ;;
@@ -13711,7 +13749,7 @@ cygwin* | mingw* | pw32* | cegcc*)
     library_names_spec='$libname.dll.lib'
 
     case $build_os in
-    mingw*)
+    mingw* | windows*)
       sys_lib_search_path_spec=
       lt_save_ifs=$IFS
       IFS=';'
@@ -13818,7 +13856,28 @@ freebsd* | dragonfly* | midnightbsd*)
       need_version=yes
       ;;
   esac
+  case $host_cpu in
+    powerpc64)
+      # On FreeBSD bi-arch platforms, a different variable is used for 32-bit
+      # binaries.  See <https://man.freebsd.org/cgi/man.cgi?query=ld.so>.
+      cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+int test_pointer_size[sizeof (void *) - 5];
+
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"
+then :
   shlibpath_var=LD_LIBRARY_PATH
+else case e in #(
+  e) shlibpath_var=LD_32_LIBRARY_PATH ;;
+esac
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
+      ;;
+    *)
+      shlibpath_var=LD_LIBRARY_PATH
+      ;;
+  esac
   case $host_os in
   freebsd2.*)
     shlibpath_overrides_runpath=yes
@@ -13959,7 +14018,7 @@ linux*android*)
   version_type=none # Android doesn't support versioned libraries.
   need_lib_prefix=no
   need_version=no
-  library_names_spec='$libname$release$shared_ext'
+  library_names_spec='$libname$release$shared_ext $libname$shared_ext'
   soname_spec='$libname$release$shared_ext'
   finish_cmds=
   shlibpath_var=LD_LIBRARY_PATH
@@ -13971,8 +14030,9 @@ linux*android*)
   hardcode_into_libs=yes
 
   dynamic_linker='Android linker'
-  # Don't embed -rpath directories since the linker doesn't support them.
-  hardcode_libdir_flag_spec='-L$libdir'
+  # -rpath works at least for libraries that are not overridden by
+  # libraries installed in system locations.
+  hardcode_libdir_flag_spec='$wl-rpath $wl$libdir'
   ;;
 
 # This must be glibc/ELF.
@@ -14029,7 +14089,7 @@ fi
   # before this can be enabled.
   hardcode_into_libs=yes
 
-  # Ideally, we could use ldconfig to report *all* directores which are
+  # Ideally, we could use ldconfig to report *all* directories which are
   # searched for libraries, however this is still not possible.  Aside from not
   # being certain /sbin/ldconfig is available, command
   # 'ldconfig -N -X -v | grep ^/' on 64bit Fedora does not report /usr/lib64,
@@ -14086,7 +14146,7 @@ newsos6)
   dynamic_linker='ldqnx.so'
   ;;
 
-openbsd* | bitrig*)
+openbsd*)
   version_type=sunos
   sys_lib_dlsearch_path_spec=/usr/lib
   need_lib_prefix=no
@@ -14427,7 +14487,7 @@ else
     lt_cv_dlopen_self=yes
     ;;
 
-  mingw* | pw32* | cegcc*)
+  mingw* | windows* | pw32* | cegcc*)
     lt_cv_dlopen=LoadLibrary
     lt_cv_dlopen_libs=
     ;;
@@ -14800,11 +14860,11 @@ else
 /* When -fvisibility=hidden is used, assume the code has been annotated
    correspondingly for the symbols needed.  */
 #if defined __GNUC__ && (((__GNUC__ == 3) && (__GNUC_MINOR__ >= 3)) || (__GNUC__ > 3))
-int fnord () __attribute__((visibility("default")));
+int fnord (void) __attribute__((visibility("default")));
 #endif
 
-int fnord () { return 42; }
-int main ()
+int fnord (void) { return 42; }
+int main (void)
 {
   void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW);
   int status = $lt_dlunknown;
@@ -14908,11 +14968,11 @@ else
 /* When -fvisibility=hidden is used, assume the code has been annotated
    correspondingly for the symbols needed.  */
 #if defined __GNUC__ && (((__GNUC__ == 3) && (__GNUC_MINOR__ >= 3)) || (__GNUC__ > 3))
-int fnord () __attribute__((visibility("default")));
+int fnord (void) __attribute__((visibility("default")));
 #endif
 
-int fnord () { return 42; }
-int main ()
+int fnord (void) { return 42; }
+int main (void)
 {
   void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW);
   int status = $lt_dlunknown;
@@ -15373,7 +15433,7 @@ if test yes = "$GCC"; then
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for ld used by $CC" >&5
 printf %s "checking for ld used by $CC... " >&6; }
   case $host in
-  *-*-mingw*)
+  *-*-mingw* | *-*-windows*)
     # gcc leaves a trailing carriage return, which upsets mingw
     ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;;
   *)
@@ -15488,8 +15548,7 @@ with_gnu_ld=$lt_cv_prog_gnu_ld
         wlarc='$wl'
 
         # ancient GNU ld didn't support --whole-archive et. al.
-        if eval "`$CC -print-prog-name=ld` --help 2>&1" |
-	  $GREP 'no-whole-archive' > /dev/null; then
+        if $LD --help 2>&1 | $GREP 'no-whole-archive' > /dev/null; then
           whole_archive_flag_spec_CXX=$wlarc'--whole-archive$convenience '$wlarc'--no-whole-archive'
         else
           whole_archive_flag_spec_CXX=
@@ -15509,7 +15568,7 @@ with_gnu_ld=$lt_cv_prog_gnu_ld
       # Commands to make compiler produce verbose output that lists
       # what "hidden" libraries, object files and flags are used when
       # linking a shared library.
-      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "\-L"'
+      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[-]L"'
 
     else
       GXX=no
@@ -15809,7 +15868,7 @@ fi
         esac
         ;;
 
-      cygwin* | mingw* | pw32* | cegcc*)
+      cygwin* | mingw* | windows* | pw32* | cegcc*)
 	case $GXX,$cc_basename in
 	,cl* | no,cl* | ,icl* | no,icl*)
 	  # Native MSVC or ICC
@@ -15940,7 +15999,7 @@ fi
 	  cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~
 	  $CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~
 	  emximp -o $lib $output_objdir/$libname.def'
-	old_archive_From_new_cmds_CXX='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
+	old_archive_from_new_cmds_CXX='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
 	enable_shared_with_static_runtimes_CXX=yes
 	file_list_spec_CXX='@'
 	;;
@@ -16008,7 +16067,7 @@ fi
             # explicitly linking system object files so we need to strip them
             # from the output so that they don't get included in the library
             # dependencies.
-            output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $EGREP "\-L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
+            output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $EGREP "[-]L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
             ;;
           *)
             if test yes = "$GXX"; then
@@ -16073,7 +16132,7 @@ fi
 	    # explicitly linking system object files so we need to strip them
 	    # from the output so that they don't get included in the library
 	    # dependencies.
-	    output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $GREP "\-L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
+	    output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $GREP "[-]L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
 	    ;;
           *)
 	    if test yes = "$GXX"; then
@@ -16321,7 +16380,7 @@ fi
         ld_shlibs_CXX=yes
 	;;
 
-      openbsd* | bitrig*)
+      openbsd*)
 	if test -f /usr/libexec/ld.so; then
 	  hardcode_direct_CXX=yes
 	  hardcode_shlibpath_var_CXX=no
@@ -16412,7 +16471,7 @@ fi
 	      # Commands to make compiler produce verbose output that lists
 	      # what "hidden" libraries, object files and flags are used when
 	      # linking a shared library.
-	      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "\-L"'
+	      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[-]L"'
 
 	    else
 	      # FIXME: insert proper C++ library support
@@ -16496,7 +16555,7 @@ fi
 	        # Commands to make compiler produce verbose output that lists
 	        # what "hidden" libraries, object files and flags are used when
 	        # linking a shared library.
-	        output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "\-L"'
+	        output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[-]L"'
 	      else
 	        # g++ 2.7 appears to require '-G' NOT '-shared' on this
 	        # platform.
@@ -16507,7 +16566,7 @@ fi
 	        # Commands to make compiler produce verbose output that lists
 	        # what "hidden" libraries, object files and flags are used when
 	        # linking a shared library.
-	        output_verbose_link_cmd='$CC -G $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "\-L"'
+	        output_verbose_link_cmd='$CC -G $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[-]L"'
 	      fi
 
 	      hardcode_libdir_flag_spec_CXX='$wl-R $wl$libdir'
@@ -16650,10 +16709,11 @@ if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5
     case $prev$p in
 
     -L* | -R* | -l*)
-       # Some compilers place space between "-{L,R}" and the path.
+       # Some compilers place space between "-{L,R,l}" and the path.
        # Remove the space.
-       if test x-L = "$p" ||
-          test x-R = "$p"; then
+       if test x-L = x"$p" ||
+          test x-R = x"$p" ||
+          test x-l = x"$p"; then
 	 prev=$p
 	 continue
        fi
@@ -16820,7 +16880,7 @@ lt_prog_compiler_static_CXX=
     beos* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*)
       # PIC is the default for these OSes.
       ;;
-    mingw* | cygwin* | os2* | pw32* | cegcc*)
+    mingw* | windows* | cygwin* | os2* | pw32* | cegcc*)
       # This hack is so that the source file can tell whether it is being
       # built for inclusion in a dll (and should export symbols for example).
       # Although the cygwin gcc ignores -fPIC, still need this for old-style
@@ -16895,7 +16955,7 @@ lt_prog_compiler_static_CXX=
 	  ;;
 	esac
 	;;
-      mingw* | cygwin* | os2* | pw32* | cegcc*)
+      mingw* | windows* | cygwin* | os2* | pw32* | cegcc*)
 	# This hack is so that the source file can tell whether it is being
 	# built for inclusion in a dll (and should export symbols for example).
 	lt_prog_compiler_pic_CXX='-DDLL_EXPORT'
@@ -17394,7 +17454,7 @@ printf %s "checking whether the $compiler linker ($LD) supports shared libraries
   pw32*)
     export_symbols_cmds_CXX=$ltdll_cmds
     ;;
-  cygwin* | mingw* | cegcc*)
+  cygwin* | mingw* | windows* | cegcc*)
     case $cc_basename in
     cl* | icl*)
       exclude_expsyms_CXX='_NULL_IMPORT_DESCRIPTOR|_IMPORT_DESCRIPTOR_.*'
@@ -17623,7 +17683,7 @@ aix[4-9]*)
     # Unfortunately, runtime linking may impact performance, so we do
     # not want this to be the default eventually. Also, we use the
     # versioned .so libs for executables only if there is the -brtl
-    # linker flag in LDFLAGS as well, or --with-aix-soname=svr4 only.
+    # linker flag in LDFLAGS as well, or --enable-aix-soname=svr4 only.
     # To allow for filename-based versioning support, we need to create
     # libNAME.so.V as an archive file, containing:
     # *) an Import File, referring to the versioned filename of the
@@ -17717,7 +17777,7 @@ bsdi[45]*)
   # libtool to hard-code these into programs
   ;;
 
-cygwin* | mingw* | pw32* | cegcc*)
+cygwin* | mingw* | windows* | pw32* | cegcc*)
   version_type=windows
   shrext_cmds=.dll
   need_version=no
@@ -17728,6 +17788,19 @@ cygwin* | mingw* | pw32* | cegcc*)
     # gcc
     library_names_spec='$libname.dll.a'
     # DLL is installed to $(libdir)/../bin by postinstall_cmds
+    # If the user builds GCC with multilibs enabled,
+    # the DLL should just be installed in $(libdir),
+    # not in $(libdir)/../bin, or 32-bit DLLs would override 64-bit ones.
+    if test yes = "$multilib"; then
+    postinstall_cmds='base_file=`basename \$file`~
+      dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~
+      dldir=$destdir/`dirname \$dlpath`~
+      $install_prog $dir/$dlname $destdir/$dlname~
+      chmod a+x $destdir/$dlname~
+      if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
+        eval '\''$striplib $destdir/$dlname'\'' || exit \$?;
+      fi'
+    else
     postinstall_cmds='base_file=`basename \$file`~
       dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~
       dldir=$destdir/`dirname \$dlpath`~
@@ -17737,6 +17810,7 @@ cygwin* | mingw* | pw32* | cegcc*)
       if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
         eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
       fi'
+    fi
     postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~
       dlpath=$dir/\$dldll~
        $RM \$dlpath'
@@ -17748,7 +17822,7 @@ cygwin* | mingw* | pw32* | cegcc*)
       soname_spec='`echo $libname | $SED -e 's/^lib/cyg/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
 
       ;;
-    mingw* | cegcc*)
+    mingw* | windows* | cegcc*)
       # MinGW DLLs use traditional 'lib' prefix
       soname_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
       ;;
@@ -17767,7 +17841,7 @@ cygwin* | mingw* | pw32* | cegcc*)
     library_names_spec='$libname.dll.lib'
 
     case $build_os in
-    mingw*)
+    mingw* | windows*)
       sys_lib_search_path_spec=
       lt_save_ifs=$IFS
       IFS=';'
@@ -17873,7 +17947,28 @@ freebsd* | dragonfly* | midnightbsd*)
       need_version=yes
       ;;
   esac
+  case $host_cpu in
+    powerpc64)
+      # On FreeBSD bi-arch platforms, a different variable is used for 32-bit
+      # binaries.  See <https://man.freebsd.org/cgi/man.cgi?query=ld.so>.
+      cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+int test_pointer_size[sizeof (void *) - 5];
+
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"
+then :
   shlibpath_var=LD_LIBRARY_PATH
+else case e in #(
+  e) shlibpath_var=LD_32_LIBRARY_PATH ;;
+esac
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
+      ;;
+    *)
+      shlibpath_var=LD_LIBRARY_PATH
+      ;;
+  esac
   case $host_os in
   freebsd2.*)
     shlibpath_overrides_runpath=yes
@@ -18014,7 +18109,7 @@ linux*android*)
   version_type=none # Android doesn't support versioned libraries.
   need_lib_prefix=no
   need_version=no
-  library_names_spec='$libname$release$shared_ext'
+  library_names_spec='$libname$release$shared_ext $libname$shared_ext'
   soname_spec='$libname$release$shared_ext'
   finish_cmds=
   shlibpath_var=LD_LIBRARY_PATH
@@ -18026,8 +18121,9 @@ linux*android*)
   hardcode_into_libs=yes
 
   dynamic_linker='Android linker'
-  # Don't embed -rpath directories since the linker doesn't support them.
-  hardcode_libdir_flag_spec_CXX='-L$libdir'
+  # -rpath works at least for libraries that are not overridden by
+  # libraries installed in system locations.
+  hardcode_libdir_flag_spec_CXX='$wl-rpath $wl$libdir'
   ;;
 
 # This must be glibc/ELF.
@@ -18084,7 +18180,7 @@ fi
   # before this can be enabled.
   hardcode_into_libs=yes
 
-  # Ideally, we could use ldconfig to report *all* directores which are
+  # Ideally, we could use ldconfig to report *all* directories which are
   # searched for libraries, however this is still not possible.  Aside from not
   # being certain /sbin/ldconfig is available, command
   # 'ldconfig -N -X -v | grep ^/' on 64bit Fedora does not report /usr/lib64,
@@ -18141,7 +18237,7 @@ newsos6)
   dynamic_linker='ldqnx.so'
   ;;
 
-openbsd* | bitrig*)
+openbsd*)
   version_type=sunos
   sys_lib_dlsearch_path_spec=/usr/lib
   need_lib_prefix=no
@@ -22653,6 +22749,14 @@ fi
 
 
 
+ac_fn_check_decl "$LINENO" "memmem" "ac_cv_have_decl_memmem" "$cares_all_includes
+" "$ac_c_undeclared_builtin_options" "CFLAGS"
+if test "x$ac_cv_have_decl_memmem" = xyes
+then :
+
+printf "%s\n" "#define HAVE_MEMMEM 1" >>confdefs.h
+
+fi
 ac_fn_check_decl "$LINENO" "recv" "ac_cv_have_decl_recv" "$cares_all_includes
 " "$ac_c_undeclared_builtin_options" "CFLAGS"
 if test "x$ac_cv_have_decl_recv" = xyes
@@ -22676,6 +22780,14 @@ then :
 
 printf "%s\n" "#define HAVE_SEND 1" >>confdefs.h
 
+fi
+ac_fn_check_decl "$LINENO" "sendto" "ac_cv_have_decl_sendto" "$cares_all_includes
+" "$ac_c_undeclared_builtin_options" "CFLAGS"
+if test "x$ac_cv_have_decl_sendto" = xyes
+then :
+
+printf "%s\n" "#define HAVE_SENDTO 1" >>confdefs.h
+
 fi
 ac_fn_check_decl "$LINENO" "getnameinfo" "ac_cv_have_decl_getnameinfo" "$cares_all_includes
 " "$ac_c_undeclared_builtin_options" "CFLAGS"
@@ -26711,7 +26823,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
 # report actual input values of CONFIG_FILES etc. instead of their
 # values after options handling.
 ac_log="
-This file was extended by c-ares $as_me 1.33.1, which was
+This file was extended by c-ares $as_me 1.34.1, which was
 generated by GNU Autoconf 2.72.  Invocation command line was
 
   CONFIG_FILES    = $CONFIG_FILES
@@ -26779,7 +26891,7 @@ ac_cs_config_escaped=`printf "%s\n" "$ac_cs_config" | sed "s/^ //; s/'/'\\\\\\\\
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 ac_cs_config='$ac_cs_config_escaped'
 ac_cs_version="\\
-c-ares config.status 1.33.1
+c-ares config.status 1.34.1
 configured by $0, generated by GNU Autoconf 2.72,
   with options \\"\$ac_cs_config\\"
 
@@ -28011,19 +28123,18 @@ See 'config.log' for more details" "$LINENO" 5; }
     cat <<_LT_EOF >> "$cfgfile"
 #! $SHELL
 # Generated automatically by $as_me ($PACKAGE) $VERSION
-# Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`:
 # NOTE: Changes made to this file will be lost: look at ltmain.sh.
 
 # Provide generalized library-building support services.
 # Written by Gordon Matzigkeit, 1996
 
-# Copyright (C) 2014 Free Software Foundation, Inc.
+# Copyright (C) 2024 Free Software Foundation, Inc.
 # This is free software; see the source for copying conditions.  There is NO
 # warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
 # GNU Libtool is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of of the License, or
+# the Free Software Foundation; either version 2 of the License, or
 # (at your option) any later version.
 #
 # As a special exception to the GNU General Public License, if you
@@ -28407,7 +28518,7 @@ hardcode_direct=$hardcode_direct
 
 # Set to "yes" if using DIR/libNAME\$shared_ext during linking hardcodes
 # DIR into the resulting binary and the resulting library dependency is
-# "absolute",i.e impossible to change by setting \$shlibpath_var if the
+# "absolute",i.e. impossible to change by setting \$shlibpath_var if the
 # library is relocated.
 hardcode_direct_absolute=$hardcode_direct_absolute
 
@@ -28650,7 +28761,7 @@ hardcode_direct=$hardcode_direct_CXX
 
 # Set to "yes" if using DIR/libNAME\$shared_ext during linking hardcodes
 # DIR into the resulting binary and the resulting library dependency is
-# "absolute",i.e impossible to change by setting \$shlibpath_var if the
+# "absolute",i.e. impossible to change by setting \$shlibpath_var if the
 # library is relocated.
 hardcode_direct_absolute=$hardcode_direct_absolute_CXX
 
diff --git a/deps/cares/configure.ac b/deps/cares/configure.ac
index 59fd975b64f873..cc52c2c6c5de0a 100644
--- a/deps/cares/configure.ac
+++ b/deps/cares/configure.ac
@@ -2,10 +2,10 @@ dnl Copyright (C) The c-ares project and its contributors
 dnl SPDX-License-Identifier: MIT
 AC_PREREQ([2.69])
 
-AC_INIT([c-ares], [1.33.1],
+AC_INIT([c-ares], [1.34.1],
   [c-ares mailing list: http://lists.haxx.se/listinfo/c-ares])
 
-CARES_VERSION_INFO="20:1:18"
+CARES_VERSION_INFO="21:1:19"
 dnl This flag accepts an argument of the form current[:revision[:age]]. So,
 dnl passing -version-info 3:12:1 sets current to 3, revision to 12, and age to
 dnl 1.
@@ -540,9 +540,11 @@ dnl https://mailman.videolan.org/pipermail/vlc-devel/2015-March/101802.html
 dnl which would require we check each individually and provide function arguments
 dnl for the test.
 
+AC_CHECK_DECL(memmem,          [AC_DEFINE([HAVE_MEMMEM],            1, [Define to 1 if you have `memmem`]         )], [], $cares_all_includes)
 AC_CHECK_DECL(recv,            [AC_DEFINE([HAVE_RECV],              1, [Define to 1 if you have `recv`]           )], [], $cares_all_includes)
 AC_CHECK_DECL(recvfrom,        [AC_DEFINE([HAVE_RECVFROM],          1, [Define to 1 if you have `recvfrom`]       )], [], $cares_all_includes)
 AC_CHECK_DECL(send,            [AC_DEFINE([HAVE_SEND],              1, [Define to 1 if you have `send`]           )], [], $cares_all_includes)
+AC_CHECK_DECL(sendto,          [AC_DEFINE([HAVE_SENDTO],            1, [Define to 1 if you have `sendto`]         )], [], $cares_all_includes)
 AC_CHECK_DECL(getnameinfo,     [AC_DEFINE([HAVE_GETNAMEINFO],       1, [Define to 1 if you have `getnameinfo`]    )], [], $cares_all_includes)
 AC_CHECK_DECL(gethostname,     [AC_DEFINE([HAVE_GETHOSTNAME],       1, [Define to 1 if you have `gethostname`]    )], [], $cares_all_includes)
 AC_CHECK_DECL(connect,         [AC_DEFINE([HAVE_CONNECT],           1, [Define to 1 if you have `connect`]        )], [], $cares_all_includes)
diff --git a/deps/cares/docs/Makefile.in b/deps/cares/docs/Makefile.in
index a57cd0abc18846..75b3f3d942bbd6 100644
--- a/deps/cares/docs/Makefile.in
+++ b/deps/cares/docs/Makefile.in
@@ -457,6 +457,9 @@ MANPAGES = ares_cancel.3		\
   ares_parse_txt_reply.3		\
   ares_parse_uri_reply.3		\
   ares_process.3			\
+  ares_process_fd.3			\
+  ares_process_fds.3			\
+  ares_process_pending_write.3		\
   ares_query.3				\
   ares_query_dnsrec.3			\
   ares_queue.3				\
@@ -471,6 +474,7 @@ MANPAGES = ares_cancel.3		\
   ares_set_local_dev.3			\
   ares_set_local_ip4.3			\
   ares_set_local_ip6.3			\
+  ares_set_pending_write_cb.3	\
   ares_set_server_state_callback.3	\
   ares_set_servers.3			\
   ares_set_servers_csv.3		\
diff --git a/deps/cares/docs/Makefile.inc b/deps/cares/docs/Makefile.inc
index 46e30346cfb86a..d6ad73246b3f99 100644
--- a/deps/cares/docs/Makefile.inc
+++ b/deps/cares/docs/Makefile.inc
@@ -114,6 +114,9 @@ MANPAGES = ares_cancel.3		\
   ares_parse_txt_reply.3		\
   ares_parse_uri_reply.3		\
   ares_process.3			\
+  ares_process_fd.3			\
+  ares_process_fds.3			\
+  ares_process_pending_write.3		\
   ares_query.3				\
   ares_query_dnsrec.3			\
   ares_queue.3				\
@@ -128,6 +131,7 @@ MANPAGES = ares_cancel.3		\
   ares_set_local_dev.3			\
   ares_set_local_ip4.3			\
   ares_set_local_ip6.3			\
+  ares_set_pending_write_cb.3	\
   ares_set_server_state_callback.3	\
   ares_set_servers.3			\
   ares_set_servers_csv.3		\
diff --git a/deps/cares/docs/adig.1 b/deps/cares/docs/adig.1
index 59923790587ddd..e0b81c91e3f9fa 100644
--- a/deps/cares/docs/adig.1
+++ b/deps/cares/docs/adig.1
@@ -3,65 +3,177 @@
 .\" Copyright (C) Daniel Stenberg
 .\" SPDX-License-Identifier: MIT
 .\"
-.TH ADIG "1" "April 2011" "c-ares utilities"
+.TH ADIG "1" "Sept 2024" "c-ares utilities"
 .SH NAME
 adig \- print information collected from Domain Name System (DNS) servers
 .SH SYNOPSIS
-.B adig
-[\fIOPTION\fR]... \fINAME\fR...
+\fBadig\fP [\fI@server\fR] [\fI-c class\fR] [\fI-p port#\fR] [\fI-q name\fR]
+[\fI-t type\fR] [\fI-x addr\fR] [\fIname\fR] [\fItype\fR] [\fIclass\fR]
+[\fIqueryopt\fR...]
+
 .SH DESCRIPTION
 .PP
-.\" Add any additional description here
-.PP
-Send queries to DNS servers about \fINAME\fR and print received
-information, where \fINAME\fR is a valid DNS name (e.g. www.example.com,
+Send queries to DNS servers about \fIname\fR and print received
+information, where \fIname\fR is a valid DNS name (e.g. www.example.com,
 1.2.3.10.in-addr.arpa).
 .PP
 This utility comes with the \fBc\-ares\fR asynchronous resolver library.
-.SH OPTIONS
+.PP
+It is possible to specify default arguments for \fBadig\fR via \fB${XDG_CONFIG_HOME}/adigrc\fR.
+.SH ARGS
+.TP
+\fB@server\fR
+Server IP address.  Multiple servers may be specified in comma-delimited
+format.  Servers may also be specified in URI format.
+.TP
+\fBname\fR
+Name of the resource record that is to be looked up.
+.TP
+\fBtype\fR
+The type of query to perform, e.g. A, AAAA, MX, or TXT.  If not
+specified, A will be used.
+.TP
+\fBclass\fR
+Sets the query class, defaults to IN.  May also be HS or CH.
+
+.SH FLAGS
 .TP
 \fB\-c\fR class
-Set the query class.
-Possible values for class are
-ANY, CHAOS, HS and IN (default).
-.TP
-\fB\-d\fR
-Print some extra debugging output.
-.TP
-\fB\-f\fR flag
-Add a behavior control flag.
-Possible values for flag are
- igntc     - ignore query truncation, return answer as-is instead of retrying
-             via tcp.
- noaliases - don't honor the HOSTALIASES environment variable,
- norecurse - don't query upstream servers recursively,
- primary   - use the first server,
- stayopen  - don't close the communication sockets, and
- usevc     - always use TCP.
-.TP
-\fB\-h\fR, \fB\-?\fR
-Display this help and exit.
-.TP
-\fB\-s\fR server
-Connect to specified DNS server, instead of the system's default one(s).
-Servers are tried in round-robin, if the previous one failed.
+Sets the query class, defaults to IN.  May also be HS or CH.
+.TP
+\fB\-h\fR
+Prints the help.
+.TP
+\fB\-p\fR port
+Sends query to a port other than 53.  Often recommended to set the port using
+\fI@server\fR instead.
+.TP
+\fB\-q\fR name
+Specifies the domain name to query. Useful to distinguish name from other
+arguments
+.TP
+\fB\-r\fR
+Skip adigrc processing
+.TP
+\fB\-s\fR
+Server (alias for the @server syntax), kept for compatibility with the old command line.
 .TP
 \fB\-t\fR type
-Query records of specified type.
-Possible values for type are
-A (default), AAAA, ANY, AXFR, CNAME, HINFO, MX, NAPTR, NS, PTR, SOA, SRV, TXT,
-URI, CAA, SVCB, and HTTPS.
+Indicates resource record type to query. Useful to distinguish type from other
+arguments
+.TP
+\fB\-x\fR addr
+Simplified reverse lookups.  Sets the type to PTR and forms a valid in-arpa
+query string
+
+.SH QUERY OPTIONS
+.TP
+\fB+[no]aaonly\fR
+Sets the aa flag in the query. Default is off.
+.TP
+\fB+[no]aaflag\fR
+Alias for +[no]aaonly
+.TP
+\fB+[no]additional\fR
+Toggles printing the additional section. On by default.
+.TP
+\fB+[no]adflag\fR
+Sets the ad (authentic data) bit in the query. Default is off.
+.TP
+\fB+[no]aliases\fR
+Whether or not to honor the HOSTALIASES file. Default is on.
+.TP
+\fB+[no]all\fR
+Toggles all of +[no]cmd, +[no]stats, +[no]question, +[no]answer,
++[no]authority, +[no]additional, +[no]comments
+.TP
+\fB+[no]answer\fR
+Toggles printing the answer. On by default.
+.TP
+\fB+[no]authority\fR
+Toggles printing the authority. On by default.
 .TP
-\fB\-T\fR port
-Connect to the specified TCP port of DNS server.
+\fB+bufsize=\fR#
+UDP EDNS 0 packet size allowed. Defaults to 1232.
 .TP
-\fB\-U\fR port
-Connect to the specified UDP port of DNS server.
+\fB+[no]cdflag\fR
+Sets the CD (checking disabled) bit in the query. Default is off.
+.TP
+\fB+[no]class\fR
+Display the class when printing the record. On by default.
+.TP
+\fB+[no]cmd\fR
+Toggles printing the command requested. On by default.
+.TP
+\fB+[no]comments\fR
+Toggles printing the comments. On by default.
+.TP
+\fB+[no]defname\fR
+Alias for +[no]search
+.TP
+\fB+domain=somename\fR
+Sets the search list to a single domain.
+.TP
+\fB+[no]dns0x20\fR
+Whether or not to use DNS 0x20 case randomization when sending queries.
+Default is off.
+.TP
+\fB+[no]edns\fR[=#]
+Enable or disable EDNS.  Only allows a value of 0 if specified. Default is to
+enable EDNS.
+.TP
+\fB+[no]ignore\fR
+Ignore truncation on UDP.  By default, truncated responses are retried over TCP.
+.TP
+\fB+[no]keepopen\fR
+Whether or not the server connection should be persistent. Default is off.
+.TP
+\fB+ndots\fR=#
+Sets the number of dots that must appear in a name before it is considered
+absolute.  Defaults to 1.
+.TP
+\fB+[no]primary\fR
+Whether or not to only use a single server if more than one server is available.
+Defaults to using all servers.
+.TP
+\fB+[no]qr\fR
+Toggles printing the request query. Off by default.
+.TP
+\fB+[no]question\fR
+Toggles printing the question. On by default.
+.TP
+\fB+[no]recurse\fR
+Toggles the RD (Recursion Desired) bit. On by default.
+.TP
+\fB+retry\fR=#
+Same as +tries but does not include the initial attempt.
+.TP
+\fB+[no]search\fR
+Whether or not to use the search list. The search list is not used by default.
+.TP
+\fB+[no]stats\fR
+Toggles printing the statistics. On by default.
+.TP
+\fB+[no]tcp\fR
+Whether to use TCP when querying name servers. Default is UDP.
+.TP
+\fB+tries\fR=#
+Number of query tries. Defaults to 3.
+.TP
+\fB+[no]ttlid\fR
+Display the TTL when printing the record. On by default.
+.TP
+\fB+[no]vc\fR
+Alias for +[no]tcp
+
+.SH FILES
+
+${XDG_CONFIG_HOME}/adigrc
 
 .SH "REPORTING BUGS"
-Report bugs to the c-ares mailing list:
+Report bugs to the c-ares GitHub issue tracker:
 .br
-\fBhttps://lists.haxx.se/listinfo/c-ares\fR
+\fBhttps://github.com/c-ares/c-ares/issues\fR
 .SH "SEE ALSO"
 .PP
 ahost(1).
diff --git a/deps/cares/docs/ares_inet_pton.3 b/deps/cares/docs/ares_inet_pton.3
index 5b7b8010d22a3a..34b2df063c6820 100644
--- a/deps/cares/docs/ares_inet_pton.3
+++ b/deps/cares/docs/ares_inet_pton.3
@@ -9,7 +9,7 @@ ares_inet_pton \- convert an IPv4 or IPv6 address from text to binary form
 .nf
 #include <ares.h>
 
-const char *ares_inet_pton(int \fIaf\fP, const char *\fIsrc\fP, void *\fIdst\fP);
+int ares_inet_pton(int \fIaf\fP, const char *\fIsrc\fP, void *\fIdst\fP);
 .fi
 .SH DESCRIPTION
 This is a portable version with the identical functionality of the commonly
@@ -22,6 +22,11 @@ shall be supported. The \fBsrc\fP argument points to the string being passed
 in. The \fBdst\fP argument points to a buffer into which the function stores
 the numeric address; this shall be large enough to hold the numeric address
 (32 bits for AF_INET, 128 bits for AF_INET6).
+
+It returns 1 if the address was valid for the specified address family, or 0
+if the address was not parseable in the specified address family, or -1 if
+some system error occurred (in which case errno will have been set).
+
 .SH SEE ALSO
 .BR ares_init (3),
 .BR ares_inet_ntop (3)
diff --git a/deps/cares/docs/ares_init_options.3 b/deps/cares/docs/ares_init_options.3
index 694beb5ed28d13..9b3b4815355659 100644
--- a/deps/cares/docs/ares_init_options.3
+++ b/deps/cares/docs/ares_init_options.3
@@ -345,7 +345,8 @@ Configure server failover retry behavior.  When a DNS server fails to
 respond to a query, c-ares will deprioritize the server.  On subsequent
 queries, servers with fewer consecutive failures will be selected in
 preference.  However, in order to detect when such a server has recovered,
-c-ares will occasionally retry failed servers.  The
+c-ares will occasionally retry failed servers by probing with a copy of
+the query, without affecting the latency of the actual requested query.  The
 \fIares_server_failover_options\fP structure contains options to control this
 behavior.
 The \fIretry_chance\fP field gives the probability (1/N) of retrying a
@@ -367,7 +368,9 @@ for each resolution.
 .TP 23
 .B ARES_OPT_NOROTATE
 Do not perform round-robin nameserver selection; always use the list of
-nameservers in the same order.
+nameservers in the same order.  The default is not to rotate servers; however,
+the system configuration can request rotation, and this option overrides
+such a system configuration.
 .PP
 
 .SH RETURN VALUES
diff --git a/deps/cares/docs/ares_process.3 b/deps/cares/docs/ares_process.3
index d45d92a6259682..ce45a60d6c07d4 100644
--- a/deps/cares/docs/ares_process.3
+++ b/deps/cares/docs/ares_process.3
@@ -4,61 +4,106 @@
 .\"
 .TH ARES_PROCESS 3 "25 July 1998"
 .SH NAME
-ares_process \- Process events for name resolution
+ares_process_fds, ares_process_fd, ares_process \- Process events for name resolution
 .SH SYNOPSIS
 .nf
 #include <ares.h>
 
-void ares_process(ares_channel_t *\fIchannel\fP,
-                  fd_set *\fIread_fds\fP,
-                  fd_set *\fIwrite_fds\fP)
+/*! Events used by ares_fd_events_t */
+typedef enum {
+  ARES_FD_EVENT_NONE  = 0,      /*!< No events */
+  ARES_FD_EVENT_READ  = 1 << 0, /*!< Read event (including disconnect/error) */
+  ARES_FD_EVENT_WRITE = 1 << 1  /*!< Write event */
+} ares_fd_eventflag_t;
+
+/*! Type holding a file descriptor and mask of events, used by
+ *  ares_process_fds() */
+typedef struct {
+  ares_socket_t fd;     /*!< File descriptor */
+  unsigned int  events; /*!< Mask of ares_fd_eventflag_t */
+} ares_fd_events_t;
+
+typedef enum {
+  ARES_PROCESS_FLAG_NONE        = 0,
+  ARES_PROCESS_FLAG_SKIP_NON_FD = 1 << 0
+} ares_process_flag_t;
+
+
+ares_status_t ares_process_fds(ares_channel_t         *\fIchannel\fP,
+                               const ares_fd_events_t *\fIevents\fP,
+                               size_t                  \fInevents\fP,
+                               unsigned int            \fIflags\fP)
 
 void ares_process_fd(ares_channel_t *\fIchannel\fP,
                      ares_socket_t \fIread_fd\fP,
                      ares_socket_t \fIwrite_fd\fP)
+
+void ares_process(ares_channel_t *\fIchannel\fP,
+                  fd_set *\fIread_fds\fP,
+                  fd_set *\fIwrite_fds\fP)
+
 .fi
 .SH DESCRIPTION
-The \fBares_process(3)\fP function handles input/output events and timeouts
-associated with queries pending on the name service channel identified by
-.IR channel .
-The file descriptor sets pointed to by \fIread_fds\fP and \fIwrite_fds\fP
-should have file descriptors set in them according to whether the file
-descriptors specified by \fIares_fds(3)\fP are ready for reading and writing.
-(The easiest way to determine this information is to invoke \fBselect(3)\fP
-with a timeout no greater than the timeout given by \fIares_timeout(3)\fP).
-
-The \fBares_process(3)\fP function will invoke callbacks for pending queries
-if they complete successfully or fail.
-
-\fBares_process_fd(3)\fP works the same way but acts and operates only on the
-specific file descriptors (sockets) you pass in to the function. Use
-ARES_SOCKET_BAD for "no action". This function is provided to allow users of
-c-ares to avoid \fIselect(3)\fP in their applications and within c-ares.
-
-To only process possible timeout conditions without a socket event occurring,
-one may pass NULL as the values for both \fIread_fds\fP and \fIwrite_fds\fP for
-\fBares_process(3)\fP, or ARES_SOCKET_BAD for both \fIread_fd\fP and
-\fIwrite_fd\fP for \fBares_process_fd(3)\fP.
-.SH EXAMPLE
-The following code fragment waits for all pending queries on a channel
-to complete:
+These functions must be used by integrators choosing not to use the
+EventThread enabled via \fBARES_OPT_EVENT_THREAD\fP passed to
+\fBares_init_options\fP.  This assumes integrators already have their own
+event loop handling event notifications for various file descriptors and
+wish to do the same with their integration with c-ares.
+
+The \fBares_process_fds(3)\fP function handles input/output events on file
+descriptors and timeouts associated with queries pending on the channel
+identified by \fIchannel\fP.  The file descriptors to be processed are passed
+in an array of \fIares_fd_events_t\fP data structures in the \fIfd\fP member,
+and events are a bitwise mask of \fIares_fd_eventflag_t\fP in the \fIevents\fP
+member.  This function can also be used to process timeouts by passing NULL
+as the \fIevents\fP parameter with a \fInevents\fP value of 0.  Flags may be
+specified in the \fIflags\fP parameter and are defined in \fBares_process_flag_t\fP.
+
+\fBARES_PROCESS_FLAG_SKIP_NON_FD\fP can be specified to skip any processing
+unrelated to the file descriptor events passed in, such as timeout processing
+and cleanup handling.  This is useful if an integrator knows they will be
+making multiple \fIares_process_fds(3)\fP calls and wants to skip that extra
+processing.  However, the integrator must make the final call without the
+flag so that timeout and other processing gets performed before their event
+loop waits on additional events.
+
+It is allowable to use an \fIares_fd_events_t\fP with \fIevents\fP member of
+value \fIARES_FD_EVENT_NONE\fP (0) if there are no events for a given file
+descriptor if an integrator wishes to simply maintain an array with all
+possible file descriptors and update readiness via the \fIevents\fP member.
+
+This function will return \fIARES_ENOMEM\fP in out of memory conditions,
+otherwise will return \fIARES_SUCCESS\fP.
+
+This function is recommended over \fBares_process_fd(3)\fP since it can
+handle processing of multiple file descriptors at once, thus avoiding repeated
+additional logic such as timeout processing which would be required if calling
+\fBares_process_fd(3)\fP for multiple file descriptors notified at the same
+time.
+
+This function is typically used with the \fIARES_OPT_SOCK_STATE_CB\fP option.
+
+\fBares_timeout(3)\fP should be used to retrieve the desired timeout, and when
+the timeout expires, the integrator must call \fBares_process_fds(3)\fP with
+a NULL \fIevents\fP array (or \fBares_process_fd(3)\fP with both sockets set
+to \fIARES_SOCKET_BAD\fP).  There is no need to do this if events are also
+delivered for any file descriptors as timeout processing will automatically be
+handled by any call to \fBares_process_fds(3)\fP or \fBares_process_fd(3)\fP.
+
+The \fBares_process_fd(3)\fP function is the same as \fBares_process_fds(3)\fP
+except that it can only process a single read and write file descriptor at a time.
+New integrators should use \fBares_process_fds(3)\fP if possible.
+
+The \fBares_process(3)\fP function works in the same manner, except it works
+on \fIfd_sets\fP as used by \fBselect(3)\fP and retrieved by
+\fBares_fds(3)\fP.  This method is deprecated and should not be used in modern
+applications due to known limitations of the \fBselect(3)\fP implementation.
+
+.SH AVAILABILITY
+\fBares_process_fds(3)\fP was introduced in c-ares 1.34.0.
 
-.nf
-int nfds, count;
-fd_set readers, writers;
-struct timeval tv, *tvp;
-
-while (1) {
-  FD_ZERO(&readers);
-  FD_ZERO(&writers);
-  nfds = ares_fds(channel, &readers, &writers);
-  if (nfds == 0)
-    break;
-  tvp = ares_timeout(channel, NULL, &tv);
-  count = select(nfds, &readers, &writers, NULL, tvp);
-  ares_process(channel, &readers, &writers);
-}
-.fi
 .SH SEE ALSO
 .BR ares_fds (3),
-.BR ares_timeout (3)
+.BR ares_timeout (3),
+.BR ares_init_options (3)
+with \fIARES_OPT_EVENT_THREAD\fP or \fIARES_OPT_SOCK_STATE_CB\fP
diff --git a/deps/cares/docs/ares_process_fd.3 b/deps/cares/docs/ares_process_fd.3
new file mode 100644
index 00000000000000..94e50f41a91717
--- /dev/null
+++ b/deps/cares/docs/ares_process_fd.3
@@ -0,0 +1,3 @@
+.\" Copyright (C) 2023 The c-ares project and its contributors.
+.\" SPDX-License-Identifier: MIT
+.so man3/ares_process.3
diff --git a/deps/cares/docs/ares_process_fds.3 b/deps/cares/docs/ares_process_fds.3
new file mode 100644
index 00000000000000..94e50f41a91717
--- /dev/null
+++ b/deps/cares/docs/ares_process_fds.3
@@ -0,0 +1,3 @@
+.\" Copyright (C) 2023 The c-ares project and its contributors.
+.\" SPDX-License-Identifier: MIT
+.so man3/ares_process.3
diff --git a/deps/cares/docs/ares_process_pending_write.3 b/deps/cares/docs/ares_process_pending_write.3
new file mode 100644
index 00000000000000..90843341950bbb
--- /dev/null
+++ b/deps/cares/docs/ares_process_pending_write.3
@@ -0,0 +1,3 @@
+.\" Copyright (C) 2023 The c-ares project and its contributors.
+.\" SPDX-License-Identifier: MIT
+.so man3/ares_set_pending_write_cb.3
diff --git a/deps/cares/docs/ares_set_local_dev.3 b/deps/cares/docs/ares_set_local_dev.3
index 2e2028f616ae3e..621cf35184fe10 100644
--- a/deps/cares/docs/ares_set_local_dev.3
+++ b/deps/cares/docs/ares_set_local_dev.3
@@ -2,7 +2,7 @@
 .\" Copyright 2010 by Ben Greear <greearb@candelatech.com>
 .\" SPDX-License-Identifier: MIT
 .\"
-.TH ARES_SET_LOCAL_DEV 3 "30 June 2010"
+.TH ARES_SET_LOCAL_DEV 3 "23 September 2024"
 .SH NAME
 ares_set_local_dev \- Bind to a specific network device when creating sockets.
 .SH SYNOPSIS
@@ -15,12 +15,14 @@ void ares_set_local_dev(ares_channel_t *\fIchannel\fP, const char* \fIlocal_dev_
 The \fBares_set_local_dev\fP function causes all future sockets
 to be bound to this device with SO_BINDTODEVICE.  This forces communications
 to go over a certain interface, which can be useful on multi-homed machines.
-This option is only supported on Linux, and root privileges are required
-for the option to work.  If SO_BINDTODEVICE is not supported or the
-setsocktop call fails (probably because of permissions), the error is
+This option is only supported on Linux.  For it to work, either the interface must
+not already be bound to the socket, or the current effective user must have the CAP_NET_RAW
+capability in the current network namespace.  If SO_BINDTODEVICE is not supported or the
+setsockopt call fails (probably because of permissions), the error is
 silently ignored.
 .SH SEE ALSO
 .BR ares_set_local_ip4 (3)
 .BR ares_set_local_ip6 (3)
+.BR network_namespaces (7)
 .SH NOTES
 This function was added in c-ares 1.7.4
diff --git a/deps/cares/docs/ares_set_pending_write_cb.3 b/deps/cares/docs/ares_set_pending_write_cb.3
new file mode 100644
index 00000000000000..3c7712a69d96e4
--- /dev/null
+++ b/deps/cares/docs/ares_set_pending_write_cb.3
@@ -0,0 +1,62 @@
+.\"
+.\" Copyright 2024 by the c-ares project and its contributors
+.\" SPDX-License-Identifier: MIT
+.\"
+.TH ARES_SET_PENDING_WRITE_CB 3 "13 Aug 2024"
+.SH NAME
+ares_set_pending_write_cb, ares_process_pending_write \- Function
+for setting a callback which is triggered when there is potential pending data
+which needs to be written.
+.SH SYNOPSIS
+.nf
+#include <ares.h>
+
+typedef void (*ares_pending_write_cb)(void *\fIdata\fP);
+
+void ares_set_pending_write_cb(
+  ares_channel_t        *\fIchannel\fP,
+  ares_pending_write_cb  \fIcallback\fP,
+  void                  *\fIuser_data\fP);
+
+void ares_process_pending_write(ares_channel_t *\fIchannel\fP);
+
+.fi
+
+.SH DESCRIPTION
+The \fBares_set_pending_write_cb(3)\fP function sets a callback
+function \fIcallback\fP in the given ares channel handle \fIchannel\fP that
+is invoked whenever there is new pending TCP data to be written.  Since TCP
+is stream based, if there are multiple queries being enqueued back to back they
+can be sent as one large buffer. Normally a \fBsend(2)\fP syscall operation
+would be triggered for each query.
+
+When setting this callback, an event will be triggered when data is buffered,
+but not written.  This event is used to wake the caller's event loop which
+should call \fBares_process_pending_write(3)\fP using the channel associated
+with the callback.  Each invocation of the callback must result in a call
+to \fBares_process_pending_write(3)\fP from the caller's event loop; otherwise
+stalls and timeouts may occur.  The callback \fBmust not\fP call
+\fBares_process_pending_write(3)\fP directly, as doing so would negate
+any advantage of this optimization.
+
+This is considered an optimization, especially when using TLS-based connections
+which add additional overhead to the data stream.  Due to the asynchronous nature
+of c-ares, there is no way to identify when a caller may be finished enqueuing
+queries via any of the possible public API calls such as
+\fBares_getaddrinfo(3)\fP or \fBares_search_dnsrec(3)\fP, so this is an
+enhancement to try to group query send operations together and will rely on the
+signaling latency involved in waking the user's event loop.
+
+If no callback is set, data will be written immediately to the socket, thus
+bypassing this optimization.
+
+This option cannot be used with \fIARES_OPT_EVENT_THREAD\fP passed to
+\fBares_init_options(3)\fP since the user has no event loop.  This optimization
+is automatically enabled when using the Event Thread as it sets the callback
+for its own internal signaling.
+
+.SH AVAILABILITY
+These functions were first introduced in c-ares version 1.34.0.
+
+.SH SEE ALSO
+.BR ares_init_options (3)
diff --git a/deps/cares/docs/ares_set_servers_csv.3 b/deps/cares/docs/ares_set_servers_csv.3
index 875a156bfb1014..f1435143f567a4 100644
--- a/deps/cares/docs/ares_set_servers_csv.3
+++ b/deps/cares/docs/ares_set_servers_csv.3
@@ -29,11 +29,18 @@ simulation but unlikely to be useful in production.
 The \fBares_get_servers_csv\fP retrieves the list of servers in comma delimited
 format.
 
-The input and output format is a comma separated list of servers.  Each server
-entry may contain these forms:
+The input and output format is a comma separated list of servers.  Two formats
+are available, the typical \fBresolv.conf(5)\fP \fInameserver\fP format, as
+well as a \fIURI\fP format.  Both formats can be used at the same time in the
+provided CSV string.
+
+The \fInameserver\fP format is:
+.nf
 
 ip[:port][%iface]
 
+.fi
+.RS 4
 The \fBip\fP may be encapsulated in square brackets ([ ]), and must be if
 using ipv6 and also specifying a port.
 
@@ -42,16 +49,88 @@ The \fBport\fP is optional, and will default to 53 or the value specified in
 
 The \fBiface\fP is specific to IPv6 link-local servers (fe80::/10) and should
 not otherwise be used.
+.RE
+
+\fInameserver\fP format examples:
+.nf
+
+192.168.1.100
+192.168.1.101:53
+[1:2:3::4]:53
+[fe80::1]:53%eth0
+
+.fi
+.PP
+
+The \fIURI\fP format is made up of these defined schemes:
+.RS 4
+\fIdns://\fP - Normal DNS server (UDP + TCP). We need to be careful not to
+conflict with query params defined in RFC4501 since we'd technically be
+extending this URI scheme. Port defaults to 53.
+
+\fIdns+tls://\fP - DNS over TLS. Port defaults to 853.
+
+\fIdns+https://\fP - DNS over HTTPS. Port defaults to 443.
+.RE
+
+.PP
+Query parameters are defined as below.  Additional parameters may be defined
+in the future.
+
+.RS 4
+\fItcpport\fP - TCP port to use, only for \fIdns://\fP scheme. The port
+specified as part of the authority component of the URI will be used for both
+UDP and TCP by default; this option overrides the TCP port.
+
+\fIipaddr\fP - Only for \fIdns+tls://\fP and \fIdns+https://\fP. If the
+authority component of the URI contains a hostname, this is used to specify the
+ip address of the hostname. If not specified, c-ares will need to use a
+non-secure server to perform a DNS lookup to retrieve this information. It is
+always recommended to have both the ip address and fully qualified domain name
+specified.
+
+\fIhostname\fP - Only for \fIdns+tls://\fP and \fIdns+https://\fP. If the
+authority component of the URI contains an ip address, this is used to specify
+the fully qualified domain name of the server. If not specified, c-ares will
+need to use a non-secure server to perform a DNS reverse lookup to retrieve
+this information. It is always recommended to have both the ip address and
+fully qualified domain name specified.
+
+\fIdomain\fP - If specified, this server is a domain-specific server. Any
+queries for this domain will be routed to this server. Multiple servers may be
+tagged with the same domain.
+.RE
+
+\fIURI\fP format examples:
+.nf
 
-For example:
+dns://8.8.8.8
+dns://[2001:4860:4860::8888]
+dns://[fe80::b542:84df:1719:65e3%en0]
+dns://192.168.1.1:55
+dns://192.168.1.1?tcpport=1153
+dns://10.0.1.1?domain=myvpn.com
+dns+tls://8.8.8.8?hostname=dns.google
+dns+tls://one.one.one.one?ipaddr=1.1.1.1
+
+.fi
+
+\fBNOTE\fP: While we are defining the scheme for things like domain-specific
+servers, DNS over TLS and DNS over HTTPS, the underlying implementations for
+those features do not yet exist, and attempting to use them will therefore
+result in errors.
 
-192.168.1.100,192.168.1.101:53,[1:2:3::4]:53,[fe80::1]:53%eth0
 .PP
 As of c-ares 1.24.0, \fBares_set_servers_csv\fP and \fBares_set_servers_ports_csv\fP
 are identical.  Prior versions would simply omit ports in \fBares_set_servers_csv\fP
 but due to the addition of link local interface support, this difference was
 removed.
 
+.SH EXAMPLE
+.nf
+192.168.1.100,[fe80::1]:53%eth0,dns://192.168.1.1?tcpport=1153
+.fi
+
 .SH RETURN VALUES
 .B ares_set_servers_csv(3)
 and
@@ -81,3 +160,4 @@ returns a string representing the servers configured which must be freed with
 \fBares_set_servers_csv\fP was added in c-ares 1.7.2
 \fBares_set_servers_ports_csv\fP was added in c-ares 1.11.0.
 \fBares_get_servers_csv\fP was added in c-ares 1.24.0.
+\fIURI\fP support was added in c-ares 1.34.0.
diff --git a/deps/cares/docs/ares_set_socket_functions.3 b/deps/cares/docs/ares_set_socket_functions.3
index ab945ed18de86b..8a903dc6521607 100644
--- a/deps/cares/docs/ares_set_socket_functions.3
+++ b/deps/cares/docs/ares_set_socket_functions.3
@@ -1,12 +1,63 @@
 .\" Copyright (C) Daniel Stenberg
 .\" SPDX-License-Identifier: MIT
-.TH ARES_SET_SOCKET_FUNCTIONS 3 "13 Dec 2016"
+.TH ARES_SET_SOCKET_FUNCTIONS 3 "8 Oct 2024"
 .SH NAME
-ares_set_socket_functions \- Set socket io callbacks
+ares_set_socket_functions, ares_set_socket_functions_ex \- Set socket io callbacks
 .SH SYNOPSIS
 .nf
 #include <ares.h>
 
+typedef enum {
+  ARES_SOCKFUNC_FLAG_NONBLOCKING = 1 << 0
+} ares_sockfunc_flags_t;
+
+typedef enum {
+  ARES_SOCKET_OPT_SENDBUF_SIZE,
+  ARES_SOCKET_OPT_RECVBUF_SIZE,
+  ARES_SOCKET_OPT_BIND_DEVICE,
+  ARES_SOCKET_OPT_TCP_FASTOPEN
+} ares_socket_opt_t;
+
+typedef enum {
+  ARES_SOCKET_CONN_TCP_FASTOPEN = 1 << 0
+} ares_socket_connect_flags_t;
+
+typedef enum {
+  ARES_SOCKET_BIND_TCP = 1 << 0,
+  ARES_SOCKET_BIND_CLIENT = 1 << 1
+} ares_socket_bind_flags_t;
+
+struct ares_socket_functions_ex {
+  unsigned int version; /* ABI Version: must be "1" */
+  unsigned int flags;
+
+  ares_socket_t (*asocket)(int domain, int type, int protocol, void *user_data);
+  int (*aclose)(ares_socket_t sock, void *user_data);
+  int (*asetsockopt)(ares_socket_t sock, ares_socket_opt_t opt, const void *val,
+                     ares_socklen_t val_size, void *user_data);
+  int (*aconnect)(ares_socket_t sock, const struct sockaddr *address,
+                  ares_socklen_t address_len, unsigned int flags,
+                  void *user_data);
+  ares_ssize_t (*arecvfrom)(ares_socket_t sock, void *buffer, size_t length,
+                            int flags, struct sockaddr *address,
+                            ares_socklen_t *address_len, void *user_data);
+  ares_ssize_t (*asendto)(ares_socket_t sock, const void *buffer, size_t length,
+                          int flags, const struct sockaddr *address,
+                          ares_socklen_t address_len, void *user_data);
+  int (*agetsockname)(ares_socket_t sock, struct sockaddr *address,
+                      ares_socklen_t *address_len, void *user_data);
+  int (*abind)(ares_socket_t sock, unsigned int flags,
+               const struct sockaddr *address, socklen_t address_len,
+               void *user_data);
+  unsigned int (*aif_nametoindex)(const char *ifname, void *user_data);
+  const char *(*aif_indextoname)(unsigned int ifindex, char *ifname_buf,
+                                 size_t ifname_buf_len, void *user_data);
+};
+
+ares_status_t ares_set_socket_functions_ex(ares_channel_t *channel,
+  const struct ares_socket_functions_ex *funcs, void *user_data);
+
+
 struct ares_socket_functions {
     ares_socket_t (*\fIasocket\fP)(int, int, int, void *);
     int (*\fIaclose\fP)(ares_socket_t, void *);
@@ -22,80 +73,262 @@ void ares_set_socket_functions(ares_channel_t *\fIchannel\fP,
 .fi
 .SH DESCRIPTION
 .PP
-This function sets a set of callback \fIfunctions\fP in the given ares channel handle.
-Cannot be used when \fBARES_OPT_EVENT_THREAD\fP is passed to \fIares_init_options(3)\fP.
-
-These callback functions will be invoked to create/destroy socket objects and perform
-io, instead of the normal system calls. A client application can override normal network
-operation fully through this functionality, and provide its own transport layer. You
-can choose to only implement some of the socket functions, and provide NULL to any
-others and c-ares will use its built-in system functions in that case.
+
+\fBares_set_socket_functions_ex(3)\fP installs a set of callback \fIfunctions\fP in
+the given ares channel handle.  It cannot be used when \fBARES_OPT_EVENT_THREAD\fP
+is passed to \fIares_init_options(3)\fP.  This function replaces the now
+deprecated \fBares_set_socket_functions(3)\fP call.
+
+These callback functions will be invoked to create/destroy socket objects and
+perform io, instead of the normal system calls. A client application can
+override normal network operation fully through this functionality, and provide
+its own transport layer.
+
+Some callbacks may be optional and are documented as such below, but failing
+to implement such callbacks will disable certain features within c-ares.  It
+is strongly recommended to implement all callbacks.
+
+All callback functions are expected to operate like their system equivalents,
+and to set \fBerrno(3)\fP (or call \fBWSASetLastError()\fP on Windows) to an appropriate error
+code on failure. It is strongly recommended that io callbacks are implemented
+to be asynchronous and indicated as such in the \fIflags\fP member.  The io
+callbacks can return error codes of \fBEAGAIN\fP, \fBEWOULDBLOCK\fP, or
+\fBWSAEWOULDBLOCK\fP when they would otherwise block.
+
+The \fIuser_data\fP value is provided to each callback function invocation to
+serve as context.
+
+The structure passed to \fBares_set_socket_functions_ex(3)\fP must provide the
+following members and callbacks (which are different from the
+\fBares_set_socket_functions(3)\fP members and callbacks):
+
+.RS 4
+.TP 8
+.B unsigned int \fIversion\fP
+.br
+ABI Version of structure.  Must be set to a value of "1".
+
+.TP 8
+.B unsigned int \fIflags\fP
+.br
+Flags available are specified in \fIares_sockfunc_flags_t\fP.
+
+.TP 8
+.B ares_socket_t (*\fIasocket\fP)(int \fIdomain\fP, int \fItype\fP, int \fIprotocol\fP, void * \fIuser_data\fP)
+.br
+\fIREQUIRED\fP. Creates an endpoint for communication and returns a descriptor. \fIdomain\fP,
+\fItype\fP, and \fIprotocol\fP each correspond to the parameters of
+\fBsocket(2)\fP. Returns a handle to the newly created socket, or
+\fBARES_SOCKET_BAD\fP on error.
+
+.TP 8
+.B int (*\fIaclose\fP)(ares_socket_t \fIfd\fP, void * \fIuser_data\fP)
+.br
+\fIREQUIRED\fP. Closes the socket endpoint indicated by \fIfd\fP. See \fBclose(2)\fP.
+
+.TP 8
+.B int (*\fIasetsockopt\fP)(ares_socket_t \fIfd\fP, ares_socket_opt_t \fIopt\fP, const void * \fIval\fP, ares_socklen_t \fIval_size\fP, void * \fIuser_data\fP)
+.br
+\fIREQUIRED\fP. Set socket option.  This shares a similar syntax to the BSD \fIsetsockopt(2)\fP
+call, however c-ares uses different options for portability. The value is
+a pointer to the desired value, and each option has its own data type listed
+in the options below defined in \fIares_socket_opt_t\fP.
+
+.TP 8
+.B int (*\fIaconnect\fP)(ares_socket_t \fIfd\fP, const struct sockaddr * \fIaddr\fP, ares_socklen_t \fIaddr_len\fP, unsigned int \fIflags\fP, void * \fIuser_data\fP)
+.br
+\fIREQUIRED\fP. Initiate a connection to the address indicated by \fIaddr\fP on
+a socket. Additional flags controlling behavior are in
+\fIares_socket_connect_flags_t\fP. See \fBconnect(2)\fP.
+
+.TP 8
+.B ares_ssize_t (*\fIarecvfrom\fP)(ares_socket_t \fIfd\fP, void * \fIbuffer\fP, size_t \fIbuf_size\fP, int \fIflags\fP, struct sockaddr * \fIaddr\fP, ares_socklen_t * \fIaddr_len\fP, void * \fIuser_data\fP)
+.br
+\fIREQUIRED\fP. Receives data from remote socket endpoint, if available. If the
+\fIaddr\fP parameter is not NULL and the connection protocol provides the source
+address, the callback should fill this in. The \fIflags\fP parameter is
+currently unused. See \fBrecvfrom(2)\fP.
+
+.TP 8
+.B ares_ssize_t (*\fIasendto\fP)(ares_socket_t \fIfd\fP, const void * \fIbuffer\fP, size_t \fIlength\fP, int \fIflags\fP, const struct sockaddr * \fIaddress\fP, ares_socklen_t \fIaddress_len\fP, void * \fIuser_data\fP)
+.br
+\fIREQUIRED\fP. Send data, as provided by the \fIbuffer\fP, to the socket
+endpoint. The \fIflags\fP member may be used on systems that have
+\fBMSG_NOSIGNAL\fP defined but is otherwise unused.  An \fIaddress\fP is
+provided primarily to support TCP FastOpen scenarios, which will be NULL in
+other circumstances. See \fBsendto(2)\fP.
+
+.TP 8
+.B int (*\fIagetsockname\fP)(ares_socket_t \fIfd\fP, struct sockaddr * \fIaddress\fP, ares_socklen_t * \fIaddress_len\fP, void * \fIuser_data\fP)
+.br
+\fIOptional\fP. Retrieve the local address of a socket and store it into the provided
+\fIaddress\fP buffer. May impact DNS Cookies if not provided. See
+\fBgetsockname(2)\fP.
+
+.TP 8
+.B int (*\fIabind\fP)(ares_socket_t \fIfd\fP, unsigned int \fIflags\fP, const struct sockaddr * \fIaddress\fP, ares_socklen_t \fIaddress_len\fP, void * \fIuser_data\fP)
+.br
+\fIOptional\fP. Bind the socket to an address.  This can be used for client
+connections to bind the source address for packets before connect, or
+for server connections to bind to an address and port before listening.
+Currently c-ares only supports client connections.  \fIflags\fP from
+\fIares_socket_bind_flags_t\fP can be specified.  See \fBbind(2)\fP.
+
+.TP 8
+.B unsigned int (*\fIaif_nametoindex\fP)(const char * \fIifname\fP, void * \fIuser_data\fP)
+.br
+\fIOptional\fP. Convert an interface name into the interface index.  If this
+callback is not specified, then IPv6 Link-Local DNS servers cannot be used.
+See \fBif_nametoindex(3)\fP.
+
+.TP 8
+.B const char * (*\fIaif_indextoname\fP)(unsigned int \fIifindex\fP, char * \fIifname_buf\fP, size_t \fIifname_buf_len\fP, void * \fIuser_data\fP)
+.br
+\fIOptional\fP. Convert an interface index into the interface name.  If this
+callback is not specified, then IPv6 Link-Local DNS servers cannot be used.
+\fIifname_buf\fP must be at least \fBIF_NAMESIZE\fP or \fBIFNAMSIZ\fP in size.
+See \fBif_indextoname(3)\fP.
+.RE
+
+.PP
+\fBares_sockfunc_flags_t\fP values:
+
+.RS 4
+.TP 8
+.B \fIARES_SOCKFUNC_FLAG_NONBLOCKING\fP
+.br
+Indicates that the io functions are implemented asynchronously.
+.RE
+
+.PP
+\fBares_socket_opt_t\fP values:
+
+.RS 4
+.TP 8
+.B \fIARES_SOCKET_OPT_SENDBUF_SIZE\fP
+.br
+Set the Send Buffer size.  Value is a pointer to an int. (SO_SNDBUF).
+
+.TP 8
+.B \fIARES_SOCKET_OPT_RECVBUF_SIZE\fP
+.br
+Set the Receive Buffer size.  Value is a pointer to an int. (SO_RCVBUF).
+
+.TP 8
+.B \fIARES_SOCKET_OPT_BIND_DEVICE\fP
+.br
+Set the network interface to use as the source for communication. Value is a C
+string. (SO_BINDTODEVICE)
+
+.TP 8
+.B \fIARES_SOCKET_OPT_TCP_FASTOPEN\fP
+.br
+Enable TCP Fast Open.  Value is a pointer to an \fIares_bool_t\fP.  On systems
+where the feature is known to be on by default, this may be a no-op that
+returns success.  On systems known not to support the feature, this may be a
+no-op that returns failure with errno set to \fBENOSYS\fP or via
+\fBWSASetLastError(WSAEOPNOTSUPP);\fP.
+.RE
+
+.PP
+\fBares_socket_connect_flags_t\fP values:
+.RS 4
+.TP 8
+.B \fIARES_SOCKET_CONN_TCP_FASTOPEN\fP
+.br
+Connect using TCP Fast Open.
+.RE
+
+.PP
+\fBares_socket_bind_flags_t\fP values:
+
+.RS 4
+.TP 8
+.B \fIARES_SOCKET_BIND_TCP\fP
+.br
+Bind is for a TCP connection.
+
+.TP 8
+.B \fIARES_SOCKET_BIND_CLIENT\fP
+.br
+Bind is for a client connection, not server.
+.RE
+
+.PP
+
+\fBares_set_socket_functions(3)\fP installs a set of callback \fIfunctions\fP in the
+given ares channel handle.  It cannot be used when \fBARES_OPT_EVENT_THREAD\fP is
+passed to \fIares_init_options(3)\fP.  This function is deprecated as of
+c-ares 1.34.0 in favor of \fIares_set_socket_functions_ex(3)\fP.
+
+\fBares_set_socket_functions(3)\fP allows implementing only some of the
+socket functions: set any others to NULL and c-ares will use its built-in
+system functions for them.
+
 .PP
-All callback functions are expected to operate like their system equivalents, and to
-set
-.BR errno(3)
-to an appropriate error code on failure. C-ares also expects all io functions to behave
-asynchronously, i.e. as if the socket object has been set to non-blocking mode. Thus
-read/write calls (for TCP connections) are expected to often generate
-.BR EAGAIN
-or
-.BR EWOULDBLOCK.
+All callback functions are expected to operate like their system equivalents,
+and to set \fBerrno(3)\fP (or call \fBWSASetLastError()\fP on Windows) to an appropriate error
+code on failure. It is strongly recommended all io functions behave
+asynchronously and return error codes of \fBEAGAIN\fP, \fBEWOULDBLOCK\fP, or
+\fBWSAEWOULDBLOCK\fP when they would otherwise block.
 
 .PP
-The \fIuser_data\fP value is provided to each callback function invocation to serve as
-context.
+The \fIuser_data\fP value is provided to each callback function invocation to
+serve as context.
 .PP
-The
-.B ares_socket_functions
-must provide the following callbacks:
-.TP 18
-.B \fIasocket\fP
-.B ares_socket_t(*)(int \fIdomain\fP, int \fItype\fP, int \fIprotocol\fP, void * \fIuser_data\fP)
+The structure passed to \fBares_set_socket_functions(3)\fP must provide the following
+callbacks (which are different from the \fBares_set_socket_functions_ex(3)\fP callbacks):
+
+.RS 4
+.TP 8
+.B ares_socket_t (*\fIasocket\fP)(int \fIdomain\fP, int \fItype\fP, int \fIprotocol\fP, void * \fIuser_data\fP)
 .br
 Creates an endpoint for communication and returns a descriptor. \fIdomain\fP, \fItype\fP, and \fIprotocol\fP
-each correspond to the parameters of
-.BR socket(2).
-Returns ahandle to the newly created socket, or -1 on error.
-.TP 18
-.B \fIaclose\fP
-.B int(*)(ares_socket_t \fIfd\fP, void * \fIuser_data\fP)
-.br
-Closes the socket endpoint indicated by \fIfd\fP. See
-.BR close(2)
-.TP 18
-.B \fIaconnect\fP
-.B int(*)(ares_socket_t \fIfd\fP, const struct sockaddr * \fIaddr\fP, ares_socklen_t \fIaddr_len\fP, void * \fIuser_data\fP)
+each correspond to the parameters of \fBsocket(2)\fP. Returns a handle to the
+newly created socket, or ARES_SOCKET_BAD on error.
+
+.TP 8
+.B int (*\fIaclose\fP)(ares_socket_t \fIfd\fP, void * \fIuser_data\fP)
+.br
+Closes the socket endpoint indicated by \fIfd\fP. See \fBclose(2)\fP.
+
+.TP 8
+.B int (*\fIaconnect\fP)(ares_socket_t \fIfd\fP, const struct sockaddr * \fIaddr\fP, ares_socklen_t \fIaddr_len\fP, void * \fIuser_data\fP)
 .br
 Initiate a connection to the address indicated by \fIaddr\fP on a socket. See
-.BR connect(2)
+\fBconnect(2)\fP.
 
-.TP 18
-.B \fIarecvfrom\fP
-.B ares_ssize_t(*)(ares_socket_t \fIfd\fP, void * \fIbuffer\fP, size_t \fIbuf_size\fP, int \fIflags\fP, struct sockaddr * \fIaddr\fP, ares_socklen_t * \fIaddr_len\fP, void * \fIuser_data\fP)
+.TP 8
+.B ares_ssize_t (*\fIarecvfrom\fP)(ares_socket_t \fIfd\fP, void * \fIbuffer\fP, size_t \fIbuf_size\fP, int \fIflags\fP, struct sockaddr * \fIaddr\fP, ares_socklen_t * \fIaddr_len\fP, void * \fIuser_data\fP)
 .br
-Receives data from remote socket endpoint, if available. If the \fIaddr\fP parameter is not NULL and the connection protocol provides the source address, the callback should fill this in. See
-.BR recvfrom(2)
+Receives data from remote socket endpoint, if available. If the \fIaddr\fP
+parameter is not NULL and the connection protocol provides the source address,
+the callback should fill this in. See \fBrecvfrom(2)\fP.
 
-.TP 18
-.B \fIasendv\fP
-.B ares_ssize_t(*)(ares_socket_t \fIfd\fP, const struct iovec * \fIdata\fP, int \fIlen\fP, void * \fIuser_data\fP)
+.TP 8
+.B ares_ssize_t (*\fIasendv\fP)(ares_socket_t \fIfd\fP, const struct iovec * \fIdata\fP, int \fIlen\fP, void * \fIuser_data\fP)
 .br
-Send data, as provided by the iovec array \fIdata\fP, to the socket endpoint. See
-.BR writev(2),
+Send data, as provided by the iovec array \fIdata\fP, to the socket endpoint.
+See \fBwritev(2)\fP.
+.RE
 
 .PP
-The
-.B ares_socket_functions
-struct provided is not copied but directly referenced,
-and must thus remain valid through out the channels and any created socket's lifetime.
+The \fBares_set_socket_functions(3)\fP struct provided is not copied but directly
+referenced, and must thus remain valid throughout the lifetime of the channel
+and any sockets it creates.  However, the \fBares_set_socket_functions_ex(3)\fP
+struct is duplicated and does not need to survive past the call to the function.
+
 .SH AVAILABILITY
-Added in c-ares 1.13.0
+ares_socket_functions added in c-ares 1.13.0, ares_socket_functions_ex added in
+c-ares 1.34.0
 .SH SEE ALSO
 .BR ares_init_options (3),
 .BR socket (2),
 .BR close (2),
 .BR connect (2),
-.BR recv (2),
 .BR recvfrom (2),
-.BR send (2),
+.BR sendto (2),
+.BR bind (2),
+.BR getsockname (2),
+.BR setsockopt (2),
 .BR writev (2)
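The asendv description above points at writev(2). A minimal sketch of such a callback, assuming a POSIX system, might look like the following. The demo_ typedefs are stand-ins for ares_socket_t and ares_ssize_t, and demo_ctx_t is a hypothetical user_data payload; none of these names come from c-ares.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Stand-ins for ares_socket_t and ares_ssize_t (assumption: POSIX, where a
 * socket is an int file descriptor). */
typedef int     demo_socket_t;
typedef ssize_t demo_ssize_t;

/* Hypothetical context object handed in through user_data. */
typedef struct {
  unsigned long send_calls; /* number of times the callback was invoked */
} demo_ctx_t;

/* Same parameter order as the asendv callback in the man page: forward the
 * iovec array to writev(2) and return its result unchanged. */
static demo_ssize_t demo_asendv(demo_socket_t fd, const struct iovec *data,
                                int len, void *user_data)
{
  demo_ctx_t *ctx = user_data;

  if (ctx != NULL) {
    ctx->send_calls++;
  }
  return writev(fd, data, len);
}
```

With the legacy API, a function of this shape would be stored in the asendv member of a struct that stays alive as long as the channel does, since the man page states that struct is referenced rather than copied.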
diff --git a/deps/cares/include/ares.h b/deps/cares/include/ares.h
index 95fc2440a1f662..139c6d66ee90df 100644
--- a/deps/cares/include/ares.h
+++ b/deps/cares/include/ares.h
@@ -460,6 +460,8 @@ typedef void (*ares_server_state_callback)(const char *server_string,
                                            ares_bool_t success, int flags,
                                            void *data);
 
+typedef void (*ares_pending_write_cb)(void *data);
+
 CARES_EXTERN int ares_library_init(int flags);
 
 CARES_EXTERN int ares_library_init_mem(int flags, void *(*amalloc)(size_t size),
@@ -473,6 +475,9 @@ CARES_EXTERN int  ares_library_init_android(jobject connectivity_manager);
 CARES_EXTERN int  ares_library_android_initialized(void);
 #endif
 
+#define CARES_HAVE_ARES_LIBRARY_INIT    1
+#define CARES_HAVE_ARES_LIBRARY_CLEANUP 1
+
 CARES_EXTERN int         ares_library_initialized(void);
 
 CARES_EXTERN void        ares_library_cleanup(void);
@@ -526,6 +531,12 @@ CARES_EXTERN void
                                                  ares_server_state_callback callback,
                                                  void                      *user_data);
 
+CARES_EXTERN void ares_set_pending_write_cb(ares_channel_t       *channel,
+                                            ares_pending_write_cb callback,
+                                            void                 *user_data);
+
+CARES_EXTERN void ares_process_pending_write(ares_channel_t *channel);
+
 CARES_EXTERN int  ares_set_sortlist(ares_channel_t *channel,
                                     const char     *sortstr);
 
@@ -556,10 +567,251 @@ struct ares_socket_functions {
   ares_ssize_t (*asendv)(ares_socket_t, const struct iovec *, int, void *);
 };
 
-CARES_EXTERN void
-             ares_set_socket_functions(ares_channel_t                     *channel,
-                                       const struct ares_socket_functions *funcs,
-                                       void                               *user_data);
+CARES_EXTERN CARES_DEPRECATED_FOR(
+  ares_set_socket_functions_ex) void ares_set_socket_functions(ares_channel_t
+                                                                 *channel,
+                                                               const struct
+                                                               ares_socket_functions
+                                                                    *funcs,
+                                                               void *user_data);
+
+/*! Flags defining behavior of socket functions */
+typedef enum {
+  /*! Strongly recommended to create sockets as non-blocking and set this
+   *  flag */
+  ARES_SOCKFUNC_FLAG_NONBLOCKING = 1 << 0
+} ares_sockfunc_flags_t;
+
+/*! Socket options in request to asetsockopt() in struct
+ *  ares_socket_functions_ex */
+typedef enum {
+  /*! Set the send buffer size. Value is a pointer to an int. (SO_SNDBUF) */
+  ARES_SOCKET_OPT_SENDBUF_SIZE,
+  /*! Set the recv buffer size. Value is a pointer to an int. (SO_RCVBUF) */
+  ARES_SOCKET_OPT_RECVBUF_SIZE,
+  /*! Set the network interface to use as the source for communication.
+   *  Value is a C string. (SO_BINDTODEVICE) */
+  ARES_SOCKET_OPT_BIND_DEVICE,
+  /*! Enable TCP Fast Open.  Value is a pointer to an ares_bool_t.  On some
+   *  systems this could be a no-op if it is known it is on by default and
+   *  return success.  Other systems may be a no-op if known the system does
+   *  not support the feature and returns failure with errno set to ENOSYS or
+   *  WSASetLastError(WSAEOPNOTSUPP).
+   */
+  ARES_SOCKET_OPT_TCP_FASTOPEN
+} ares_socket_opt_t;
+
+/*! Flags for behavior during connect */
+typedef enum {
+  /*! Connect using TCP Fast Open */
+  ARES_SOCKET_CONN_TCP_FASTOPEN = 1 << 0
+} ares_socket_connect_flags_t;
+
+/*! Flags for behavior during bind */
+typedef enum {
+  /*! Bind is for a TCP connection */
+  ARES_SOCKET_BIND_TCP = 1 << 0,
+  /*! Bind is for a client connection, not server */
+  ARES_SOCKET_BIND_CLIENT = 1 << 1
+} ares_socket_bind_flags_t;
+
+/*! Socket functions to call rather than using OS-native functions */
+struct ares_socket_functions_ex {
+  /*! ABI Version: must be "1" */
+  unsigned int version;
+
+  /*! Flags indicating behavior of the subsystem. One or more
+   * ares_sockfunc_flags_t  */
+  unsigned int flags;
+
+  /*! REQUIRED. Create a new socket file descriptor.  The file descriptor must
+   * be opened in non-blocking mode (so that reads and writes never block).
+   * Recommended other options would be to disable signals on write errors
+   * (SO_NOSIGPIPE), Disable the Nagle algorithm on SOCK_STREAM (TCP_NODELAY),
+   * and to automatically close file descriptors on exec (FD_CLOEXEC).
+   *
+   *  \param[in] domain     Socket domain. Valid values are AF_INET, AF_INET6.
+   *  \param[in] type       Socket type. Valid values are SOCK_STREAM (tcp) and
+   *                        SOCK_DGRAM (udp).
+   *  \param[in] protocol   In general this should be ignored, may be passed as
+   *                        0 (use as default for type), or may be IPPROTO_UDP
+   *                        or IPPROTO_TCP.
+   *  \param[in] user_data  Pointer provided to ares_set_socket_functions_ex().
+   *  \return ARES_SOCKET_BAD on error, or socket file descriptor on success.
+   *          On error, it is expected to set errno (or WSASetLastError()) to an
+   *          appropriate reason code such as EAFNOSUPPORT / WSAAFNOSUPPORT. */
+  ares_socket_t (*asocket)(int domain, int type, int protocol, void *user_data);
+
+  /*! REQUIRED. Close a socket file descriptor.
+   *  \param[in] sock      Socket file descriptor returned from asocket.
+   *  \param[in] user_data Pointer provided to ares_set_socket_functions_ex().
+   *  \return 0 on success.  On failure, should set errno (or WSASetLastError)
+   *          to an appropriate code such as EBADF / WSAEBADF */
+  int (*aclose)(ares_socket_t sock, void *user_data);
+
+
+  /*! REQUIRED. Set socket option.  This shares a similar syntax to the BSD
+   *  setsockopt() call, however we use our own options.  The value is typically
+   *  a pointer to the desired value and each option has its own data type it
+   *  will express in the documentation.
+   *
+   * \param[in] sock         Socket file descriptor returned from asocket.
+   * \param[in] opt          Option to set.
+   * \param[in] val          Pointer to value for option.
+   * \param[in] val_size     Size of value.
+   * \param[in] user_data    Pointer provided to
+   * ares_set_socket_functions_ex().
+   * \return Return 0 on success, otherwise -1 should be returned with an
+   *         appropriate errno (or WSASetLastError()) set.  If error is ENOSYS /
+   *         WSAEOPNOTSUPP an error will not be propagated as it will take it
+   *         to mean it is an intentional decision to not support the feature.
+   */
+  int (*asetsockopt)(ares_socket_t sock, ares_socket_opt_t opt, const void *val,
+                     ares_socklen_t val_size, void *user_data);
+
+  /*! REQUIRED. Connect to the remote using the supplied address.  For UDP
+   * sockets this will bind the file descriptor to only send and receive packets
+   * from the remote address provided.
+   *
+   *  \param[in] sock         Socket file descriptor returned from asocket.
+   *  \param[in] address      Address to connect to
+   *  \param[in] address_len  Size of address structure passed
+   *  \param[in] flags        One or more ares_socket_connect_flags_t
+   *  \param[in] user_data    Pointer provided to
+   * ares_set_socket_functions_ex().
+   *  \return Return 0 upon successful establishment, otherwise -1 should be
+   *          returned with an appropriate errno (or WSASetLastError()) set.  It
+   * is generally expected that most TCP connections (not using TCP Fast Open)
+   * will return -1 with an error of EINPROGRESS / WSAEINPROGRESS due to the
+   * non-blocking nature of the connection.  It is then the responsibility of
+   * the implementation to notify of writability on the socket to indicate the
+   * connection has succeeded (or readability on failure to retrieve the
+   * appropriate error).
+   */
+  int (*aconnect)(ares_socket_t sock, const struct sockaddr *address,
+                  ares_socklen_t address_len, unsigned int flags,
+                  void *user_data);
+
+  /*! REQUIRED. Attempt to read data from the remote.
+   *
+   *  \param[in]     sock        Socket file descriptor returned from asocket.
+   *  \param[in,out] buffer      Allocated buffer to place data read from
+   * socket.
+   *  \param[in]     length      Size of buffer
+   *  \param[in]     flags       Unused, always 0.
+   *  \param[in,out] address     Buffer to hold address data was received from.
+   *                             May be NULL if address not desired.
+   *  \param[in,out] address_len Input size of address buffer, output actual
+   *                             written size. Must be NULL if address is NULL.
+   *  \param[in]     user_data   Pointer provided to
+   * ares_set_socket_functions_ex().
+   *  \return -1 on error with appropriate errno (or WSASetLastError()) set,
+   * such as EWOULDBLOCK / EAGAIN / WSAEWOULDBLOCK, or ECONNRESET /
+   * WSAECONNRESET.
+   */
+  ares_ssize_t (*arecvfrom)(ares_socket_t sock, void *buffer, size_t length,
+                            int flags, struct sockaddr *address,
+                            ares_socklen_t *address_len, void *user_data);
+
+  /*! REQUIRED. Attempt to send data to the remote.  Optional address may be
+   * specified which may be useful on unbound UDP sockets (though currently not
+   * used), and TCP FastOpen where the connection is delayed until first write.
+   *
+   *  \param[in]     sock        Socket file descriptor returned from asocket.
+   *  \param[in]     buffer      Containing data to place onto wire.
+   *  \param[in]     length      Size of buffer
+   *  \param[in]     flags       Flags for writing.  Currently only used flag is
+   *                             MSG_NOSIGNAL if the host OS has such a flag. In
+   *                             general flags can be ignored.
+   *  \param[in]     address     Buffer containing address to send data to.  May
+   *                             be NULL.
+   *  \param[in]     address_len Size of address buffer.  Must be 0 if address
+   *                             is NULL.
+   *  \param[in]     user_data   Pointer provided to
+   * ares_set_socket_functions_ex().
+   *  \return Number of bytes written. -1 on error with appropriate errno (or
+   * WSASetLastError()) set.
+   */
+  ares_ssize_t (*asendto)(ares_socket_t sock, const void *buffer, size_t length,
+                          int flags, const struct sockaddr *address,
+                          ares_socklen_t address_len, void *user_data);
+
+  /*! Optional. Retrieve the local address of the socket.
+   *
+   *  \param[in]     sock        Socket file descriptor returned from asocket
+   *  \param[in,out] address     Buffer to hold address
+   *  \param[in,out] address_len Size of address buffer on input, written size
+   * on output.
+   *  \param[in]     user_data   Pointer provided to
+   * ares_set_socket_functions_ex().
+   *  \return 0 on success. -1 on error with an appropriate errno (or
+   * WSASetLastError()) set.
+   */
+  int (*agetsockname)(ares_socket_t sock, struct sockaddr *address,
+                      ares_socklen_t *address_len, void *user_data);
+
+  /*! Optional. Bind the socket to an address.  This can be used for client
+   *  connections to bind the source address for packets before connect, or
+   *  for server connections to bind to an address and port before listening.
+   *  Currently c-ares only supports client connections.
+   *
+   *  \param[in] sock        Socket file descriptor returned from asocket
+   *  \param[in] flags       ares_socket_bind_flags_t flags.
+   *  \param[in] address     Buffer containing address.
+   *  \param[in] address_len Size of address buffer.
+   *  \param[in] user_data   Pointer provided to
+   * ares_set_socket_functions_ex().
+   *  \return 0 on success. -1 on error with an appropriate errno (or
+   * WSASetLastError()) set.
+   */
+  int (*abind)(ares_socket_t sock, unsigned int flags,
+               const struct sockaddr *address, socklen_t address_len,
+               void *user_data);
+
+  /*! Optional. Convert an interface name into the interface index.  If this
+   * callback is not specified, then IPv6 Link-Local DNS servers cannot be used.
+   *
+   * \param[in] ifname  Interface Name as NULL-terminated string.
+   * \param[in] user_data Pointer provided to
+   * ares_set_socket_functions_ex().
+   * \return 0 on failure, otherwise interface index.
+   */
+  unsigned int (*aif_nametoindex)(const char *ifname, void *user_data);
+
+  /*! Optional. Convert an interface index into the interface name.  If this
+   * callback is not specified, then IPv6 Link-Local DNS servers cannot be used.
+   *
+   * \param[in] ifindex        Interface index, must be > 0
+   * \param[in] ifname_buf     Buffer to hold interface name. Must be at least
+   *                           IFNAMSIZ in length or 32 bytes if IFNAMSIZ isn't
+   *                           defined.
+   * \param[in] ifname_buf_len Size of ifname_buf for verification.
+   * \param[in] user_data      Pointer provided to
+   * ares_set_socket_functions_ex().
+   * \return NULL on failure, otherwise pointer to provided ifname_buf
+   */
+  const char *(*aif_indextoname)(unsigned int ifindex, char *ifname_buf,
+                                 size_t ifname_buf_len, void *user_data);
+};
+
+/*! Override the native socket functions for the OS with the provided set.
+ *  An optional user data thunk may be specified which will be passed to
+ *  each registered callback.  Replaces ares_set_socket_functions().
+ *
+ *  \param[in] channel   An initialized c-ares channel.
+ *  \param[in] funcs     Structure registering the implementations for the
+ *                       various functions.  See the structure definition.
+ *                       This will be duplicated and does not need to exist
+ *                       past the life of this call.
+ *  \param[in] user_data User data thunk which will be passed to each call of
+ *                       the registered callbacks.
+ *  \return ARES_SUCCESS on success, or another error code such as ARES_EFORMERR
+ *          on misuse.
+ */
+CARES_EXTERN ares_status_t ares_set_socket_functions_ex(
+  ares_channel_t *channel, const struct ares_socket_functions_ex *funcs,
+  void *user_data);
+
 
 CARES_EXTERN CARES_DEPRECATED_FOR(ares_send_dnsrec) void ares_send(
   ares_channel_t *channel, const unsigned char *qbuf, int qlen,
@@ -655,12 +907,52 @@ CARES_EXTERN struct timeval *ares_timeout(const ares_channel_t *channel,
                                           struct timeval       *maxtv,
                                           struct timeval       *tv);
 
-CARES_EXTERN CARES_DEPRECATED_FOR(ares_process_fd) void ares_process(
+CARES_EXTERN CARES_DEPRECATED_FOR(ares_process_fds) void ares_process(
   ares_channel_t *channel, fd_set *read_fds, fd_set *write_fds);
 
-CARES_EXTERN void ares_process_fd(ares_channel_t *channel,
-                                  ares_socket_t   read_fd,
-                                  ares_socket_t   write_fd);
+/*! Events used by ares_fd_events_t */
+typedef enum {
+  ARES_FD_EVENT_NONE  = 0,      /*!< No events */
+  ARES_FD_EVENT_READ  = 1 << 0, /*!< Read event (including disconnect/error) */
+  ARES_FD_EVENT_WRITE = 1 << 1  /*!< Write event */
+} ares_fd_eventflag_t;
+
+/*! Type holding a file descriptor and mask of events, used by
+ *  ares_process_fds() */
+typedef struct {
+  ares_socket_t fd;     /*!< File descriptor */
+  unsigned int  events; /*!< Mask of ares_fd_eventflag_t */
+} ares_fd_events_t;
+
+/*! Flags used by ares_process_fds() */
+typedef enum {
+  ARES_PROCESS_FLAG_NONE        = 0,     /*!< No flag value */
+  ARES_PROCESS_FLAG_SKIP_NON_FD = 1 << 0 /*!< skip any processing unrelated to
+                                          *   the file descriptor events passed
+                                          *    in */
+} ares_process_flag_t;
+
+/*! Process events on multiple file descriptors based on the event mask
+ *  associated with each file descriptor.  Recommended over calling
+ *  ares_process_fd() multiple times since it would trigger additional logic
+ *  such as timeout processing on each call.
+ *
+ *  \param[in] channel  Initialized ares channel
+ *  \param[in] events   Array of file descriptors with events.  May be NULL if
+ *                      no events, but may have timeouts to process.
+ *  \param[in] nevents  Number of elements in the events array.  May be 0 if
+ *                      no events, but may have timeouts to process.
+ *  \param[in] flags    Flags to alter behavior of the process command.
+ *  \return ARES_ENOMEM on out of memory, ARES_EFORMERR on misuse,
+ *          otherwise ARES_SUCCESS
+ */
+CARES_EXTERN ares_status_t ares_process_fds(ares_channel_t         *channel,
+                                            const ares_fd_events_t *events,
+                                            size_t nevents, unsigned int flags);
+
+CARES_EXTERN void          ares_process_fd(ares_channel_t *channel,
+                                           ares_socket_t   read_fd,
+                                           ares_socket_t   write_fd);
 
 CARES_EXTERN CARES_DEPRECATED_FOR(ares_dns_record_create) int ares_create_query(
   const char *name, int dnsclass, int type, unsigned short id, int rd,
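The ares_process_fds() declaration above takes an array of descriptor/event-mask pairs. As a rough sketch of how such an array could be filled from poll(2) results, assuming a POSIX system: the demo_ types below mirror ares_fd_eventflag_t and ares_fd_events_t from the header so the snippet compiles without c-ares, and demo_events_from_poll() is a hypothetical helper, not part of the library.

```c
#include <poll.h>
#include <stddef.h>

/* Stand-ins mirroring ares_fd_eventflag_t and ares_fd_events_t from the
 * header; real code would include <ares.h> and use the library types. */
typedef enum {
  DEMO_FD_EVENT_NONE  = 0,
  DEMO_FD_EVENT_READ  = 1 << 0,
  DEMO_FD_EVENT_WRITE = 1 << 1
} demo_fd_eventflag_t;

typedef struct {
  int          fd;     /* file descriptor */
  unsigned int events; /* mask of demo_fd_eventflag_t */
} demo_fd_events_t;

/* Hypothetical helper: translate poll() results into an event array.
 * Returns the number of entries written to out (at most nfds). */
static size_t demo_events_from_poll(const struct pollfd *pfds, size_t nfds,
                                    demo_fd_events_t *out)
{
  size_t i;
  size_t n = 0;

  for (i = 0; i < nfds; i++) {
    unsigned int mask = DEMO_FD_EVENT_NONE;

    /* The header documents the read event as covering disconnect/error. */
    if (pfds[i].revents & (POLLIN | POLLERR | POLLHUP)) {
      mask |= DEMO_FD_EVENT_READ;
    }
    if (pfds[i].revents & POLLOUT) {
      mask |= DEMO_FD_EVENT_WRITE;
    }
    if (mask == DEMO_FD_EVENT_NONE) {
      continue; /* nothing to report for this descriptor */
    }
    out[n].fd     = pfds[i].fd;
    out[n].events = mask;
    n++;
  }
  return n;
}
```

With the real types, the resulting array and count would be handed to ares_process_fds() in a single call. The header comment recommends that over repeated ares_process_fd() calls because each call triggers additional logic such as timeout processing.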
diff --git a/deps/cares/include/ares_build.h b/deps/cares/include/ares_build.h
index 18a92606a81714..aca66bebcd9b5c 100644
--- a/deps/cares/include/ares_build.h
+++ b/deps/cares/include/ares_build.h
@@ -165,7 +165,6 @@
 #  define CARES_TYPEOF_ARES_SOCKLEN_T int
 
 #elif defined(_WIN32)
-#  define WIN32_LEAN_AND_MEAN
 #  define CARES_TYPEOF_ARES_SOCKLEN_T int
 #  define CARES_HAVE_WINDOWS_H          1
 #  define CARES_HAVE_SYS_TYPES_H        1
diff --git a/deps/cares/include/ares_version.h b/deps/cares/include/ares_version.h
index c910d79209a3fb..ba98e6949d53c8 100644
--- a/deps/cares/include/ares_version.h
+++ b/deps/cares/include/ares_version.h
@@ -31,14 +31,21 @@
 #define ARES_COPYRIGHT "2004 - 2024 Daniel Stenberg, <daniel@haxx.se>."
 
 #define ARES_VERSION_MAJOR 1
-#define ARES_VERSION_MINOR 33
+#define ARES_VERSION_MINOR 34
 #define ARES_VERSION_PATCH 1
+
 #define ARES_VERSION                                        \
   ((ARES_VERSION_MAJOR << 16) | (ARES_VERSION_MINOR << 8) | \
    (ARES_VERSION_PATCH))
-#define ARES_VERSION_STR "1.33.1"
 
-#define CARES_HAVE_ARES_LIBRARY_INIT    1
-#define CARES_HAVE_ARES_LIBRARY_CLEANUP 1
+
+/* Need a level of indirection due to argument prescan to stringify a macro
+ * value. */
+#define ARES_STRINGIFY_PRE(s) #s
+#define ARES_STRINGIFY(s)     ARES_STRINGIFY_PRE(s)
+
+#define ARES_VERSION_STR             \
+  ARES_STRINGIFY(ARES_VERSION_MAJOR) \
+  "." ARES_STRINGIFY(ARES_VERSION_MINOR) "." ARES_STRINGIFY(ARES_VERSION_PATCH)
 
 #endif
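The ARES_VERSION_STR change above relies on two-level stringification. A small sketch of why the extra level is needed, using hypothetical DEMO_ names so nothing collides with ares_version.h:

```c
/* Hypothetical DEMO_ names; the values mirror the version numbers above. */
#define DEMO_VERSION_MAJOR 1
#define DEMO_VERSION_MINOR 34
#define DEMO_VERSION_PATCH 1

/* One level: the # operator stringifies the argument as written, so a macro
 * name passed in is NOT expanded first. */
#define DEMO_STRINGIFY_PRE(s) #s
/* Two levels: the argument is macro-expanded on the way through this macro,
 * and only then stringified by DEMO_STRINGIFY_PRE. */
#define DEMO_STRINGIFY(s)     DEMO_STRINGIFY_PRE(s)

/* Adjacent string literals are concatenated by the compiler. */
#define DEMO_VERSION_STR             \
  DEMO_STRINGIFY(DEMO_VERSION_MAJOR) \
  "." DEMO_STRINGIFY(DEMO_VERSION_MINOR) "." DEMO_STRINGIFY(DEMO_VERSION_PATCH)

/* Yields the version built from the numeric macros. */
static const char *demo_version_str(void)
{
  return DEMO_VERSION_STR;
}

/* Yields the macro's name, showing what a single level would produce. */
static const char *demo_unexpanded(void)
{
  return DEMO_STRINGIFY_PRE(DEMO_VERSION_MINOR);
}
```

Deriving the string from the numeric macros means a version bump only has to touch the numbers, which appears to be the motivation for dropping the hard-coded "1.33.1" literal.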
diff --git a/deps/cares/m4/libtool.m4 b/deps/cares/m4/libtool.m4
index 79a2451ef520f2..e5ddacee99c5cd 100644
--- a/deps/cares/m4/libtool.m4
+++ b/deps/cares/m4/libtool.m4
@@ -1,6 +1,6 @@
 # libtool.m4 - Configure libtool for the host system. -*-Autoconf-*-
 #
-#   Copyright (C) 1996-2001, 2003-2019, 2021-2022 Free Software
+#   Copyright (C) 1996-2001, 2003-2019, 2021-2024 Free Software
 #   Foundation, Inc.
 #   Written by Gordon Matzigkeit, 1996
 #
@@ -9,13 +9,13 @@
 # modifications, as long as this notice is preserved.
 
 m4_define([_LT_COPYING], [dnl
-# Copyright (C) 2014 Free Software Foundation, Inc.
+# Copyright (C) 2024 Free Software Foundation, Inc.
 # This is free software; see the source for copying conditions.  There is NO
 # warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
 # GNU Libtool is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of of the License, or
+# the Free Software Foundation; either version 2 of the License, or
 # (at your option) any later version.
 #
 # As a special exception to the GNU General Public License, if you
@@ -32,7 +32,7 @@ m4_define([_LT_COPYING], [dnl
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.
 ])
 
-# serial 59 LT_INIT
+# serial 62 LT_INIT
 
 
 # LT_PREREQ(VERSION)
@@ -60,7 +60,7 @@ esac
 # LT_INIT([OPTIONS])
 # ------------------
 AC_DEFUN([LT_INIT],
-[AC_PREREQ([2.62])dnl We use AC_PATH_PROGS_FEATURE_CHECK
+[AC_PREREQ([2.64])dnl We use AC_PATH_PROGS_FEATURE_CHECK
 AC_REQUIRE([AC_CONFIG_AUX_DIR_DEFAULT])dnl
 AC_BEFORE([$0], [LT_LANG])dnl
 AC_BEFORE([$0], [LT_OUTPUT])dnl
@@ -616,7 +616,7 @@ m4_popdef([AS_MESSAGE_LOG_FD])])])# _LT_GENERATED_FILE_INIT
 # LT_OUTPUT
 # ---------
 # This macro allows early generation of the libtool script (before
-# AC_OUTPUT is called), incase it is used in configure for compilation
+# AC_OUTPUT is called), in case it is used in configure for compilation
 # tests.
 AC_DEFUN([LT_OUTPUT],
 [: ${CONFIG_LT=./config.lt}
@@ -651,9 +651,9 @@ m4_ifset([AC_PACKAGE_NAME], [AC_PACKAGE_NAME ])config.lt[]dnl
 m4_ifset([AC_PACKAGE_VERSION], [ AC_PACKAGE_VERSION])
 configured by $[0], generated by m4_PACKAGE_STRING.
 
-Copyright (C) 2011 Free Software Foundation, Inc.
+Copyright (C) 2024 Free Software Foundation, Inc.
 This config.lt script is free software; the Free Software Foundation
-gives unlimited permision to copy, distribute and modify it."
+gives unlimited permission to copy, distribute and modify it."
 
 while test 0 != $[#]
 do
@@ -730,7 +730,6 @@ _LT_CONFIG_SAVE_COMMANDS([
     cat <<_LT_EOF >> "$cfgfile"
 #! $SHELL
 # Generated automatically by $as_me ($PACKAGE) $VERSION
-# Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`:
 # NOTE: Changes made to this file will be lost: look at ltmain.sh.
 
 # Provide generalized library-building support services.
@@ -975,6 +974,7 @@ _lt_linker_boilerplate=`cat conftest.err`
 $RM -r conftest*
 ])# _LT_LINKER_BOILERPLATE
 
+
 # _LT_REQUIRED_DARWIN_CHECKS
 # -------------------------
 m4_defun_once([_LT_REQUIRED_DARWIN_CHECKS],[
@@ -1025,6 +1025,21 @@ m4_defun_once([_LT_REQUIRED_DARWIN_CHECKS],[
 	rm -f conftest.*
       fi])
 
+    # Feature test to disable chained fixups since it is not
+    # compatible with '-undefined dynamic_lookup'
+    AC_CACHE_CHECK([for -no_fixup_chains linker flag],
+      [lt_cv_support_no_fixup_chains],
+      [ save_LDFLAGS=$LDFLAGS
+        LDFLAGS="$LDFLAGS -Wl,-no_fixup_chains"
+        AC_LINK_IFELSE(
+          [AC_LANG_PROGRAM([],[])],
+          lt_cv_support_no_fixup_chains=yes,
+          lt_cv_support_no_fixup_chains=no
+        )
+        LDFLAGS=$save_LDFLAGS
+      ]
+    )
+
     AC_CACHE_CHECK([for -exported_symbols_list linker flag],
       [lt_cv_ld_exported_symbols_list],
       [lt_cv_ld_exported_symbols_list=no
@@ -1049,7 +1064,7 @@ _LT_EOF
       echo "$RANLIB libconftest.a" >&AS_MESSAGE_LOG_FD
       $RANLIB libconftest.a 2>&AS_MESSAGE_LOG_FD
       cat > conftest.c << _LT_EOF
-int main() { return 0;}
+int main(void) { return 0;}
 _LT_EOF
       echo "$LTCC $LTCFLAGS $LDFLAGS -o conftest conftest.c -Wl,-force_load,./libconftest.a" >&AS_MESSAGE_LOG_FD
       $LTCC $LTCFLAGS $LDFLAGS -o conftest conftest.c -Wl,-force_load,./libconftest.a 2>conftest.err
@@ -1074,7 +1089,11 @@ _LT_EOF
         10.[[012]],*|,*powerpc*-darwin[[5-8]]*)
           _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;;
         *)
-          _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup' ;;
+          _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup'
+          if test yes = "$lt_cv_support_no_fixup_chains"; then
+            AS_VAR_APPEND([_lt_dar_allow_undefined], [' $wl-no_fixup_chains'])
+          fi
+        ;;
       esac
     ;;
   esac
@@ -1256,7 +1275,9 @@ lt_sysroot=
 case $with_sysroot in #(
  yes)
    if test yes = "$GCC"; then
-     lt_sysroot=`$CC --print-sysroot 2>/dev/null`
+     # Trim trailing / since we'll always append absolute paths and we want
+     # to avoid //, if only for less confusing output for the user.
+     lt_sysroot=`$CC --print-sysroot 2>/dev/null | $SED 's:/\+$::'`
    fi
    ;; #(
  /*)
@@ -1368,7 +1389,7 @@ mips64*-*linux*)
   ;;
 
 x86_64-*kfreebsd*-gnu|x86_64-*linux*|powerpc*-*linux*| \
-s390*-*linux*|s390*-*tpf*|sparc*-*linux*)
+s390*-*linux*|s390*-*tpf*|sparc*-*linux*|x86_64-gnu*)
   # Find out what ABI is being produced by ac_compile, and set linker
   # options accordingly.  Note that the listed cases only cover the
   # situations where additional linker options are needed (such as when
@@ -1383,7 +1404,7 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux*)
 	  x86_64-*kfreebsd*-gnu)
 	    LD="${LD-ld} -m elf_i386_fbsd"
 	    ;;
-	  x86_64-*linux*)
+	  x86_64-*linux*|x86_64-gnu*)
 	    case `$FILECMD conftest.o` in
 	      *x86-64*)
 		LD="${LD-ld} -m elf32_x86_64"
@@ -1412,7 +1433,7 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux*)
 	  x86_64-*kfreebsd*-gnu)
 	    LD="${LD-ld} -m elf_x86_64_fbsd"
 	    ;;
-	  x86_64-*linux*)
+	  x86_64-*linux*|x86_64-gnu*)
 	    LD="${LD-ld} -m elf_x86_64"
 	    ;;
 	  powerpcle-*linux*)
@@ -1495,7 +1516,7 @@ _LT_DECL([], [AR], [1], [The archiver])
 
 # Use ARFLAGS variable as AR's operation code to sync the variable naming with
 # Automake.  If both AR_FLAGS and ARFLAGS are specified, AR_FLAGS should have
-# higher priority because thats what people were doing historically (setting
+# higher priority because that's what people were doing historically (setting
 # ARFLAGS for automake and AR_FLAGS for libtool).  FIXME: Make the AR_FLAGS
 # variable obsoleted/removed.
 
@@ -1545,7 +1566,7 @@ AC_CHECK_TOOL(STRIP, strip, :)
 test -z "$STRIP" && STRIP=:
 _LT_DECL([], [STRIP], [1], [A symbol stripping program])
 
-AC_CHECK_TOOL(RANLIB, ranlib, :)
+AC_REQUIRE([AC_PROG_RANLIB])
 test -z "$RANLIB" && RANLIB=:
 _LT_DECL([], [RANLIB], [1],
     [Commands used to install an old-style archive])
@@ -1556,15 +1577,8 @@ old_postinstall_cmds='chmod 644 $oldlib'
 old_postuninstall_cmds=
 
 if test -n "$RANLIB"; then
-  case $host_os in
-  bitrig* | openbsd*)
-    old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB -t \$tool_oldlib"
-    ;;
-  *)
-    old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$tool_oldlib"
-    ;;
-  esac
   old_archive_cmds="$old_archive_cmds~\$RANLIB \$tool_oldlib"
+  old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$tool_oldlib"
 fi
 
 case $host_os in
@@ -1703,7 +1717,7 @@ AC_CACHE_VAL([lt_cv_sys_max_cmd_len], [dnl
     lt_cv_sys_max_cmd_len=-1;
     ;;
 
-  cygwin* | mingw* | cegcc*)
+  cygwin* | mingw* | windows* | cegcc*)
     # On Win9x/ME, this test blows up -- it succeeds, but takes
     # about 5 minutes as the teststring grows exponentially.
     # Worse, since 9x/ME are not pre-emptively multitasking,
@@ -1725,7 +1739,7 @@ AC_CACHE_VAL([lt_cv_sys_max_cmd_len], [dnl
     lt_cv_sys_max_cmd_len=8192;
     ;;
 
-  bitrig* | darwin* | dragonfly* | freebsd* | midnightbsd* | netbsd* | openbsd*)
+  darwin* | dragonfly* | freebsd* | midnightbsd* | netbsd* | openbsd*)
     # This has been around since 386BSD, at least.  Likely further.
     if test -x /sbin/sysctl; then
       lt_cv_sys_max_cmd_len=`/sbin/sysctl -n kern.argmax`
@@ -1885,11 +1899,11 @@ else
 /* When -fvisibility=hidden is used, assume the code has been annotated
    correspondingly for the symbols needed.  */
 #if defined __GNUC__ && (((__GNUC__ == 3) && (__GNUC_MINOR__ >= 3)) || (__GNUC__ > 3))
-int fnord () __attribute__((visibility("default")));
+int fnord (void) __attribute__((visibility("default")));
 #endif
 
-int fnord () { return 42; }
-int main ()
+int fnord (void) { return 42; }
+int main (void)
 {
   void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW);
   int status = $lt_dlunknown;
@@ -1946,7 +1960,7 @@ else
     lt_cv_dlopen_self=yes
     ;;
 
-  mingw* | pw32* | cegcc*)
+  mingw* | windows* | pw32* | cegcc*)
     lt_cv_dlopen=LoadLibrary
     lt_cv_dlopen_libs=
     ;;
@@ -2314,7 +2328,7 @@ if test yes = "$GCC"; then
     *) lt_awk_arg='/^libraries:/' ;;
   esac
   case $host_os in
-    mingw* | cegcc*) lt_sed_strip_eq='s|=\([[A-Za-z]]:\)|\1|g' ;;
+    mingw* | windows* | cegcc*) lt_sed_strip_eq='s|=\([[A-Za-z]]:\)|\1|g' ;;
     *) lt_sed_strip_eq='s|=/|/|g' ;;
   esac
   lt_search_path_spec=`$CC -print-search-dirs | awk $lt_awk_arg | $SED -e "s/^libraries://" -e $lt_sed_strip_eq`
@@ -2372,7 +2386,7 @@ BEGIN {RS = " "; FS = "/|\n";} {
   # AWK program above erroneously prepends '/' to C:/dos/paths
   # for these hosts.
   case $host_os in
-    mingw* | cegcc*) lt_search_path_spec=`$ECHO "$lt_search_path_spec" |\
+    mingw* | windows* | cegcc*) lt_search_path_spec=`$ECHO "$lt_search_path_spec" |\
       $SED 's|/\([[A-Za-z]]:\)|\1|g'` ;;
   esac
   sys_lib_search_path_spec=`$ECHO "$lt_search_path_spec" | $lt_NL2SP`
@@ -2447,7 +2461,7 @@ aix[[4-9]]*)
     # Unfortunately, runtime linking may impact performance, so we do
     # not want this to be the default eventually. Also, we use the
     # versioned .so libs for executables only if there is the -brtl
-    # linker flag in LDFLAGS as well, or --with-aix-soname=svr4 only.
+    # linker flag in LDFLAGS as well, or --enable-aix-soname=svr4 only.
     # To allow for filename-based versioning support, we need to create
     # libNAME.so.V as an archive file, containing:
     # *) an Import File, referring to the versioned filename of the
@@ -2541,7 +2555,7 @@ bsdi[[45]]*)
   # libtool to hard-code these into programs
   ;;
 
-cygwin* | mingw* | pw32* | cegcc*)
+cygwin* | mingw* | windows* | pw32* | cegcc*)
   version_type=windows
   shrext_cmds=.dll
   need_version=no
@@ -2552,6 +2566,19 @@ cygwin* | mingw* | pw32* | cegcc*)
     # gcc
     library_names_spec='$libname.dll.a'
     # DLL is installed to $(libdir)/../bin by postinstall_cmds
+    # If the user builds GCC with multilibs enabled,
+    # the DLL should just be installed in $(libdir),
+    # not in $(libdir)/../bin, or 32-bit DLLs would override 64-bit ones.
+    if test yes = $multilib; then
+    postinstall_cmds='base_file=`basename \$file`~
+      dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~
+      dldir=$destdir/`dirname \$dlpath`~
+      $install_prog $dir/$dlname $destdir/$dlname~
+      chmod a+x $destdir/$dlname~
+      if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
+        eval '\''$striplib $destdir/$dlname'\'' || exit \$?;
+      fi'
+    else
     postinstall_cmds='base_file=`basename \$file`~
       dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~
       dldir=$destdir/`dirname \$dlpath`~
@@ -2561,6 +2588,7 @@ cygwin* | mingw* | pw32* | cegcc*)
       if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
         eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
       fi'
+    fi
     postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~
       dlpath=$dir/\$dldll~
        $RM \$dlpath'
@@ -2573,7 +2601,7 @@ cygwin* | mingw* | pw32* | cegcc*)
 m4_if([$1], [],[
       sys_lib_search_path_spec="$sys_lib_search_path_spec /usr/lib/w32api"])
       ;;
-    mingw* | cegcc*)
+    mingw* | windows* | cegcc*)
       # MinGW DLLs use traditional 'lib' prefix
       soname_spec='$libname`echo $release | $SED -e 's/[[.]]/-/g'`$versuffix$shared_ext'
       ;;
@@ -2592,7 +2620,7 @@ m4_if([$1], [],[
     library_names_spec='$libname.dll.lib'
 
     case $build_os in
-    mingw*)
+    mingw* | windows*)
       sys_lib_search_path_spec=
       lt_save_ifs=$IFS
       IFS=';'
@@ -2699,7 +2727,21 @@ freebsd* | dragonfly* | midnightbsd*)
       need_version=yes
       ;;
   esac
-  shlibpath_var=LD_LIBRARY_PATH
+  case $host_cpu in
+    powerpc64)
+      # On FreeBSD bi-arch platforms, a different variable is used for 32-bit
+      # binaries.  See <https://man.freebsd.org/cgi/man.cgi?query=ld.so>.
+      AC_COMPILE_IFELSE(
+        [AC_LANG_SOURCE(
+           [[int test_pointer_size[sizeof (void *) - 5];
+           ]])],
+        [shlibpath_var=LD_LIBRARY_PATH],
+        [shlibpath_var=LD_32_LIBRARY_PATH])
+      ;;
+    *)
+      shlibpath_var=LD_LIBRARY_PATH
+      ;;
+  esac
   case $host_os in
   freebsd2.*)
     shlibpath_overrides_runpath=yes
@@ -2840,7 +2882,7 @@ linux*android*)
   version_type=none # Android doesn't support versioned libraries.
   need_lib_prefix=no
   need_version=no
-  library_names_spec='$libname$release$shared_ext'
+  library_names_spec='$libname$release$shared_ext $libname$shared_ext'
   soname_spec='$libname$release$shared_ext'
   finish_cmds=
   shlibpath_var=LD_LIBRARY_PATH
@@ -2852,8 +2894,9 @@ linux*android*)
   hardcode_into_libs=yes
 
   dynamic_linker='Android linker'
-  # Don't embed -rpath directories since the linker doesn't support them.
-  _LT_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
+  # -rpath works at least for libraries that are not overridden by
+  # libraries installed in system locations.
+  _LT_TAGVAR(hardcode_libdir_flag_spec, $1)='$wl-rpath $wl$libdir'
   ;;
 
 # This must be glibc/ELF.
@@ -2887,7 +2930,7 @@ linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*)
   # before this can be enabled.
   hardcode_into_libs=yes
 
-  # Ideally, we could use ldconfig to report *all* directores which are
+  # Ideally, we could use ldconfig to report *all* directories which are
   # searched for libraries, however this is still not possible.  Aside from not
   # being certain /sbin/ldconfig is available, command
   # 'ldconfig -N -X -v | grep ^/' on 64bit Fedora does not report /usr/lib64,
@@ -2944,7 +2987,7 @@ newsos6)
   dynamic_linker='ldqnx.so'
   ;;
 
-openbsd* | bitrig*)
+openbsd*)
   version_type=sunos
   sys_lib_dlsearch_path_spec=/usr/lib
   need_lib_prefix=no
@@ -3276,7 +3319,7 @@ if test yes = "$GCC"; then
   # Check if gcc -print-prog-name=ld gives a path.
   AC_MSG_CHECKING([for ld used by $CC])
   case $host in
-  *-*-mingw*)
+  *-*-mingw* | *-*-windows*)
     # gcc leaves a trailing carriage return, which upsets mingw
     ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;;
   *)
@@ -3385,7 +3428,7 @@ case $reload_flag in
 esac
 reload_cmds='$LD$reload_flag -o $output$reload_objs'
 case $host_os in
-  cygwin* | mingw* | pw32* | cegcc*)
+  cygwin* | mingw* | windows* | pw32* | cegcc*)
     if test yes != "$GCC"; then
       reload_cmds=false
     fi
@@ -3457,7 +3500,6 @@ lt_cv_deplibs_check_method='unknown'
 # 'none' -- dependencies not supported.
 # 'unknown' -- same as none, but documents that we really don't know.
 # 'pass_all' -- all dependencies passed with no checks.
-# 'test_compile' -- check by making test program.
 # 'file_magic [[regex]]' -- check by looking for files in library path
 # that responds to the $file_magic_cmd with a given extended regex.
 # If you have 'file' or equivalent on your system and you're not sure
@@ -3484,7 +3526,7 @@ cygwin*)
   lt_cv_file_magic_cmd='func_win32_libid'
   ;;
 
-mingw* | pw32*)
+mingw* | windows* | pw32*)
   # Base MSYS/MinGW do not provide the 'file' command needed by
   # func_win32_libid shell function, so use a weaker test based on 'objdump',
   # unless we find 'file', for example because we are cross-compiling.
@@ -3493,7 +3535,7 @@ mingw* | pw32*)
     lt_cv_file_magic_cmd='func_win32_libid'
   else
     # Keep this pattern in sync with the one in func_win32_libid.
-    lt_cv_deplibs_check_method='file_magic file format (pei*-i386(.*architecture: i386)?|pe-arm-wince|pe-x86-64)'
+    lt_cv_deplibs_check_method='file_magic file format (pei*-i386(.*architecture: i386)?|pe-arm-wince|pe-x86-64|pe-aarch64)'
     lt_cv_file_magic_cmd='$OBJDUMP -f'
   fi
   ;;
@@ -3584,7 +3626,7 @@ newos6*)
   lt_cv_deplibs_check_method=pass_all
   ;;
 
-openbsd* | bitrig*)
+openbsd*)
   if test -z "`echo __ELF__ | $CC -E - | $GREP __ELF__`"; then
     lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so\.[[0-9]]+\.[[0-9]]+|\.so|_pic\.a)$'
   else
@@ -3648,7 +3690,7 @@ file_magic_glob=
 want_nocaseglob=no
 if test "$build" = "$host"; then
   case $host_os in
-  mingw* | pw32*)
+  mingw* | windows* | pw32*)
     if ( shopt | grep nocaseglob ) >/dev/null 2>&1; then
       want_nocaseglob=yes
     else
@@ -3700,7 +3742,7 @@ else
 	# Tru64's nm complains that /dev/null is an invalid object file
 	# MSYS converts /dev/null to NUL, MinGW nm treats NUL as empty
 	case $build_os in
-	mingw*) lt_bad_file=conftest.nm/nofile ;;
+	mingw* | windows*) lt_bad_file=conftest.nm/nofile ;;
 	*) lt_bad_file=/dev/null ;;
 	esac
 	case `"$tmp_nm" -B $lt_bad_file 2>&1 | $SED '1q'` in
@@ -3791,7 +3833,7 @@ lt_cv_sharedlib_from_linklib_cmd,
 [lt_cv_sharedlib_from_linklib_cmd='unknown'
 
 case $host_os in
-cygwin* | mingw* | pw32* | cegcc*)
+cygwin* | mingw* | windows* | pw32* | cegcc*)
   # two different shell functions defined in ltmain.sh;
   # decide which one to use based on capabilities of $DLLTOOL
   case `$DLLTOOL --help 2>&1` in
@@ -3823,16 +3865,16 @@ _LT_DECL([], [sharedlib_from_linklib_cmd], [1],
 m4_defun([_LT_PATH_MANIFEST_TOOL],
 [AC_CHECK_TOOL(MANIFEST_TOOL, mt, :)
 test -z "$MANIFEST_TOOL" && MANIFEST_TOOL=mt
-AC_CACHE_CHECK([if $MANIFEST_TOOL is a manifest tool], [lt_cv_path_mainfest_tool],
-  [lt_cv_path_mainfest_tool=no
+AC_CACHE_CHECK([if $MANIFEST_TOOL is a manifest tool], [lt_cv_path_manifest_tool],
+  [lt_cv_path_manifest_tool=no
   echo "$as_me:$LINENO: $MANIFEST_TOOL '-?'" >&AS_MESSAGE_LOG_FD
   $MANIFEST_TOOL '-?' 2>conftest.err > conftest.out
   cat conftest.err >&AS_MESSAGE_LOG_FD
   if $GREP 'Manifest Tool' conftest.out > /dev/null; then
-    lt_cv_path_mainfest_tool=yes
+    lt_cv_path_manifest_tool=yes
   fi
   rm -f conftest*])
-if test yes != "$lt_cv_path_mainfest_tool"; then
+if test yes != "$lt_cv_path_manifest_tool"; then
   MANIFEST_TOOL=:
 fi
 _LT_DECL([], [MANIFEST_TOOL], [1], [Manifest tool])dnl
@@ -3861,7 +3903,7 @@ AC_DEFUN([LT_LIB_M],
 [AC_REQUIRE([AC_CANONICAL_HOST])dnl
 LIBM=
 case $host in
-*-*-beos* | *-*-cegcc* | *-*-cygwin* | *-*-haiku* | *-*-pw32* | *-*-darwin*)
+*-*-beos* | *-*-cegcc* | *-*-cygwin* | *-*-haiku* | *-*-mingw* | *-*-pw32* | *-*-darwin*)
   # These system don't have libm, or don't need it
   ;;
 *-ncr-sysv4.3*)
@@ -3936,7 +3978,7 @@ case $host_os in
 aix*)
   symcode='[[BCDT]]'
   ;;
-cygwin* | mingw* | pw32* | cegcc*)
+cygwin* | mingw* | windows* | pw32* | cegcc*)
   symcode='[[ABCDGISTW]]'
   ;;
 hpux*)
@@ -3951,7 +3993,7 @@ osf*)
   symcode='[[BCDEGQRST]]'
   ;;
 solaris*)
-  symcode='[[BDRT]]'
+  symcode='[[BCDRT]]'
   ;;
 sco3.2v5*)
   symcode='[[DT]]'
@@ -4015,7 +4057,7 @@ $lt_c_name_lib_hook\
 # Handle CRLF in mingw tool chain
 opt_cr=
 case $build_os in
-mingw*)
+mingw* | windows*)
   opt_cr=`$ECHO 'x\{0,1\}' | tr x '\015'` # option cr in regexp
   ;;
 esac
@@ -4066,7 +4108,7 @@ void nm_test_func(void){}
 #ifdef __cplusplus
 }
 #endif
-int main(){nm_test_var='a';nm_test_func();return(0);}
+int main(void){nm_test_var='a';nm_test_func();return(0);}
 _LT_EOF
 
   if AC_TRY_EVAL(ac_compile); then
@@ -4242,7 +4284,7 @@ m4_if([$1], [CXX], [
     beos* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*)
       # PIC is the default for these OSes.
       ;;
-    mingw* | cygwin* | os2* | pw32* | cegcc*)
+    mingw* | windows* | cygwin* | os2* | pw32* | cegcc*)
       # This hack is so that the source file can tell whether it is being
       # built for inclusion in a dll (and should export symbols for example).
       # Although the cygwin gcc ignores -fPIC, still need this for old-style
@@ -4318,7 +4360,7 @@ m4_if([$1], [CXX], [
 	  ;;
 	esac
 	;;
-      mingw* | cygwin* | os2* | pw32* | cegcc*)
+      mingw* | windows* | cygwin* | os2* | pw32* | cegcc*)
 	# This hack is so that the source file can tell whether it is being
 	# built for inclusion in a dll (and should export symbols for example).
 	m4_if([$1], [GCJ], [],
@@ -4566,7 +4608,7 @@ m4_if([$1], [CXX], [
       # PIC is the default for these OSes.
       ;;
 
-    mingw* | cygwin* | pw32* | os2* | cegcc*)
+    mingw* | windows* | cygwin* | pw32* | os2* | cegcc*)
       # This hack is so that the source file can tell whether it is being
       # built for inclusion in a dll (and should export symbols for example).
       # Although the cygwin gcc ignores -fPIC, still need this for old-style
@@ -4670,7 +4712,7 @@ m4_if([$1], [CXX], [
       esac
       ;;
 
-    mingw* | cygwin* | pw32* | os2* | cegcc*)
+    mingw* | windows* | cygwin* | pw32* | os2* | cegcc*)
       # This hack is so that the source file can tell whether it is being
       # built for inclusion in a dll (and should export symbols for example).
       m4_if([$1], [GCJ], [],
@@ -4712,6 +4754,12 @@ m4_if([$1], [CXX], [
 	_LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
 	_LT_TAGVAR(lt_prog_compiler_static, $1)='-static'
         ;;
+      *flang* | ftn)
+        # Flang compiler.
+	_LT_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
+	_LT_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC'
+	_LT_TAGVAR(lt_prog_compiler_static, $1)='-static'
+        ;;
       # icc used to be incompatible with GCC.
       # ICC 10 doesn't accept -KPIC any more.
       icc* | ifort*)
@@ -4945,7 +4993,7 @@ m4_if([$1], [CXX], [
   pw32*)
     _LT_TAGVAR(export_symbols_cmds, $1)=$ltdll_cmds
     ;;
-  cygwin* | mingw* | cegcc*)
+  cygwin* | mingw* | windows* | cegcc*)
     case $cc_basename in
     cl* | icl*)
       _LT_TAGVAR(exclude_expsyms, $1)='_NULL_IMPORT_DESCRIPTOR|_IMPORT_DESCRIPTOR_.*'
@@ -5003,7 +5051,7 @@ dnl Note also adjust exclude_expsyms for C++ above.
   extract_expsyms_cmds=
 
   case $host_os in
-  cygwin* | mingw* | pw32* | cegcc*)
+  cygwin* | mingw* | windows* | pw32* | cegcc*)
     # FIXME: the MSVC++ and ICC port hasn't been tested in a loooong time
     # When not using gcc, we currently assume that we are using
     # Microsoft Visual C++ or Intel C++ Compiler.
@@ -5015,7 +5063,7 @@ dnl Note also adjust exclude_expsyms for C++ above.
     # we just hope/assume this is gcc and not c89 (= MSVC++ or ICC)
     with_gnu_ld=yes
     ;;
-  openbsd* | bitrig*)
+  openbsd*)
     with_gnu_ld=no
     ;;
   esac
@@ -5118,7 +5166,7 @@ _LT_EOF
       fi
       ;;
 
-    cygwin* | mingw* | pw32* | cegcc*)
+    cygwin* | mingw* | windows* | pw32* | cegcc*)
       # _LT_TAGVAR(hardcode_libdir_flag_spec, $1) is actually meaningless,
       # as there is no search path for DLLs.
       _LT_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
@@ -5174,7 +5222,7 @@ _LT_EOF
 	cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~
 	$CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~
 	emximp -o $lib $output_objdir/$libname.def'
-      _LT_TAGVAR(old_archive_From_new_cmds, $1)='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
+      _LT_TAGVAR(old_archive_from_new_cmds, $1)='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
       _LT_TAGVAR(enable_shared_with_static_runtimes, $1)=yes
       _LT_TAGVAR(file_list_spec, $1)='@'
       ;;
@@ -5575,7 +5623,7 @@ _LT_EOF
       _LT_TAGVAR(export_dynamic_flag_spec, $1)=-rdynamic
       ;;
 
-    cygwin* | mingw* | pw32* | cegcc*)
+    cygwin* | mingw* | windows* | pw32* | cegcc*)
       # When not using gcc, we currently assume that we are using
       # Microsoft Visual C++ or Intel C++ Compiler.
       # hardcode_libdir_flag_spec is actually meaningless, as there is
@@ -5592,14 +5640,14 @@ _LT_EOF
 	# Tell ltmain to make .dll files, not .so files.
 	shrext_cmds=.dll
 	# FIXME: Setting linknames here is a bad hack.
-	_LT_TAGVAR(archive_cmds, $1)='$CC -o $output_objdir/$soname $libobjs $compiler_flags $deplibs -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~linknames='
+	_LT_TAGVAR(archive_cmds, $1)='$CC -Fe $output_objdir/$soname $libobjs $compiler_flags $deplibs -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~linknames='
 	_LT_TAGVAR(archive_expsym_cmds, $1)='if _LT_DLL_DEF_P([$export_symbols]); then
             cp "$export_symbols" "$output_objdir/$soname.def";
             echo "$tool_output_objdir$soname.def" > "$output_objdir/$soname.exp";
           else
             $SED -e '\''s/^/-link -EXPORT:/'\'' < $export_symbols > $output_objdir/$soname.exp;
           fi~
-          $CC -o $tool_output_objdir$soname $libobjs $compiler_flags $deplibs "@$tool_output_objdir$soname.exp" -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~
+          $CC -Fe $tool_output_objdir$soname $libobjs $compiler_flags $deplibs "@$tool_output_objdir$soname.exp" -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~
           linknames='
 	# The linker will not automatically build a static lib if we build a DLL.
 	# _LT_TAGVAR(old_archive_from_new_cmds, $1)='true'
@@ -5837,7 +5885,7 @@ _LT_EOF
     *nto* | *qnx*)
       ;;
 
-    openbsd* | bitrig*)
+    openbsd*)
       if test -f /usr/libexec/ld.so; then
 	_LT_TAGVAR(hardcode_direct, $1)=yes
 	_LT_TAGVAR(hardcode_shlibpath_var, $1)=no
@@ -5880,7 +5928,7 @@ _LT_EOF
 	cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~
 	$CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~
 	emximp -o $lib $output_objdir/$libname.def'
-      _LT_TAGVAR(old_archive_From_new_cmds, $1)='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
+      _LT_TAGVAR(old_archive_from_new_cmds, $1)='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
       _LT_TAGVAR(enable_shared_with_static_runtimes, $1)=yes
       _LT_TAGVAR(file_list_spec, $1)='@'
       ;;
@@ -6174,7 +6222,7 @@ _LT_TAGDECL([], [hardcode_direct], [0],
 _LT_TAGDECL([], [hardcode_direct_absolute], [0],
     [Set to "yes" if using DIR/libNAME$shared_ext during linking hardcodes
     DIR into the resulting binary and the resulting library dependency is
-    "absolute", i.e impossible to change by setting $shlibpath_var if the
+    "absolute", i.e. impossible to change by setting $shlibpath_var if the
     library is relocated])
 _LT_TAGDECL([], [hardcode_minus_L], [0],
     [Set to "yes" if using the -LDIR flag during linking hardcodes DIR
@@ -6232,7 +6280,7 @@ _LT_TAGVAR(objext, $1)=$objext
 lt_simple_compile_test_code="int some_variable = 0;"
 
 # Code to be used in simple link tests
-lt_simple_link_test_code='int main(){return(0);}'
+lt_simple_link_test_code='int main(void){return(0);}'
 
 _LT_TAG_COMPILER
 # Save the default compiler, since it gets overwritten when the other
@@ -6421,8 +6469,7 @@ if test yes != "$_lt_caught_CXX_error"; then
         wlarc='$wl'
 
         # ancient GNU ld didn't support --whole-archive et. al.
-        if eval "`$CC -print-prog-name=ld` --help 2>&1" |
-	  $GREP 'no-whole-archive' > /dev/null; then
+        if $LD --help 2>&1 | $GREP 'no-whole-archive' > /dev/null; then
           _LT_TAGVAR(whole_archive_flag_spec, $1)=$wlarc'--whole-archive$convenience '$wlarc'--no-whole-archive'
         else
           _LT_TAGVAR(whole_archive_flag_spec, $1)=
@@ -6442,7 +6489,7 @@ if test yes != "$_lt_caught_CXX_error"; then
       # Commands to make compiler produce verbose output that lists
       # what "hidden" libraries, object files and flags are used when
       # linking a shared library.
-      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "\-L"'
+      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[[-]]L"'
 
     else
       GXX=no
@@ -6651,7 +6698,7 @@ if test yes != "$_lt_caught_CXX_error"; then
         esac
         ;;
 
-      cygwin* | mingw* | pw32* | cegcc*)
+      cygwin* | mingw* | windows* | pw32* | cegcc*)
 	case $GXX,$cc_basename in
 	,cl* | no,cl* | ,icl* | no,icl*)
 	  # Native MSVC or ICC
@@ -6750,7 +6797,7 @@ if test yes != "$_lt_caught_CXX_error"; then
 	  cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~
 	  $CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~
 	  emximp -o $lib $output_objdir/$libname.def'
-	_LT_TAGVAR(old_archive_From_new_cmds, $1)='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
+	_LT_TAGVAR(old_archive_from_new_cmds, $1)='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
 	_LT_TAGVAR(enable_shared_with_static_runtimes, $1)=yes
 	_LT_TAGVAR(file_list_spec, $1)='@'
 	;;
@@ -6818,7 +6865,7 @@ if test yes != "$_lt_caught_CXX_error"; then
             # explicitly linking system object files so we need to strip them
             # from the output so that they don't get included in the library
             # dependencies.
-            output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $EGREP "\-L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
+            output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $EGREP "[[-]]L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
             ;;
           *)
             if test yes = "$GXX"; then
@@ -6883,7 +6930,7 @@ if test yes != "$_lt_caught_CXX_error"; then
 	    # explicitly linking system object files so we need to strip them
 	    # from the output so that they don't get included in the library
 	    # dependencies.
-	    output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $GREP "\-L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
+	    output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $GREP "[[-]]L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
 	    ;;
           *)
 	    if test yes = "$GXX"; then
@@ -7131,7 +7178,7 @@ if test yes != "$_lt_caught_CXX_error"; then
         _LT_TAGVAR(ld_shlibs, $1)=yes
 	;;
 
-      openbsd* | bitrig*)
+      openbsd*)
 	if test -f /usr/libexec/ld.so; then
 	  _LT_TAGVAR(hardcode_direct, $1)=yes
 	  _LT_TAGVAR(hardcode_shlibpath_var, $1)=no
@@ -7222,7 +7269,7 @@ if test yes != "$_lt_caught_CXX_error"; then
 	      # Commands to make compiler produce verbose output that lists
 	      # what "hidden" libraries, object files and flags are used when
 	      # linking a shared library.
-	      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "\-L"'
+	      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[[-]]L"'
 
 	    else
 	      # FIXME: insert proper C++ library support
@@ -7306,7 +7353,7 @@ if test yes != "$_lt_caught_CXX_error"; then
 	        # Commands to make compiler produce verbose output that lists
 	        # what "hidden" libraries, object files and flags are used when
 	        # linking a shared library.
-	        output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "\-L"'
+	        output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[[-]]L"'
 	      else
 	        # g++ 2.7 appears to require '-G' NOT '-shared' on this
 	        # platform.
@@ -7317,7 +7364,7 @@ if test yes != "$_lt_caught_CXX_error"; then
 	        # Commands to make compiler produce verbose output that lists
 	        # what "hidden" libraries, object files and flags are used when
 	        # linking a shared library.
-	        output_verbose_link_cmd='$CC -G $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "\-L"'
+	        output_verbose_link_cmd='$CC -G $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[[-]]L"'
 	      fi
 
 	      _LT_TAGVAR(hardcode_libdir_flag_spec, $1)='$wl-R $wl$libdir'
@@ -7555,10 +7602,11 @@ if AC_TRY_EVAL(ac_compile); then
     case $prev$p in
 
     -L* | -R* | -l*)
-       # Some compilers place space between "-{L,R}" and the path.
+       # Some compilers place space between "-{L,R,l}" and the path.
        # Remove the space.
-       if test x-L = "$p" ||
-          test x-R = "$p"; then
+       if test x-L = x"$p" ||
+          test x-R = x"$p" ||
+          test x-l = x"$p"; then
 	 prev=$p
 	 continue
        fi
@@ -8216,7 +8264,7 @@ AC_SUBST([DLLTOOL])
 # ----------------
 # Check for a file(cmd) program that can be used to detect file type and magic
 m4_defun([_LT_DECL_FILECMD],
-[AC_CHECK_TOOL([FILECMD], [file], [:])
+[AC_CHECK_PROG([FILECMD], [file], [file], [:])
 _LT_DECL([], [FILECMD], [1], [A file(cmd) program that detects file types])
 ])# _LD_DECL_FILECMD
 
@@ -8232,73 +8280,6 @@ _LT_DECL([], [SED], [1], [A sed program that does not truncate output])
 _LT_DECL([], [Xsed], ["\$SED -e 1s/^X//"],
     [Sed that helps us avoid accidentally triggering echo(1) options like -n])
 ])# _LT_DECL_SED
-
-m4_ifndef([AC_PROG_SED], [
-############################################################
-# NOTE: This macro has been submitted for inclusion into   #
-#  GNU Autoconf as AC_PROG_SED.  When it is available in   #
-#  a released version of Autoconf we should remove this    #
-#  macro and use it instead.                               #
-############################################################
-
-m4_defun([AC_PROG_SED],
-[AC_MSG_CHECKING([for a sed that does not truncate output])
-AC_CACHE_VAL(lt_cv_path_SED,
-[# Loop through the user's path and test for sed and gsed.
-# Then use that list of sed's as ones to test for truncation.
-as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
-  IFS=$as_save_IFS
-  test -z "$as_dir" && as_dir=.
-  for lt_ac_prog in sed gsed; do
-    for ac_exec_ext in '' $ac_executable_extensions; do
-      if $as_executable_p "$as_dir/$lt_ac_prog$ac_exec_ext"; then
-        lt_ac_sed_list="$lt_ac_sed_list $as_dir/$lt_ac_prog$ac_exec_ext"
-      fi
-    done
-  done
-done
-IFS=$as_save_IFS
-lt_ac_max=0
-lt_ac_count=0
-# Add /usr/xpg4/bin/sed as it is typically found on Solaris
-# along with /bin/sed that truncates output.
-for lt_ac_sed in $lt_ac_sed_list /usr/xpg4/bin/sed; do
-  test ! -f "$lt_ac_sed" && continue
-  cat /dev/null > conftest.in
-  lt_ac_count=0
-  echo $ECHO_N "0123456789$ECHO_C" >conftest.in
-  # Check for GNU sed and select it if it is found.
-  if "$lt_ac_sed" --version 2>&1 < /dev/null | grep 'GNU' > /dev/null; then
-    lt_cv_path_SED=$lt_ac_sed
-    break
-  fi
-  while true; do
-    cat conftest.in conftest.in >conftest.tmp
-    mv conftest.tmp conftest.in
-    cp conftest.in conftest.nl
-    echo >>conftest.nl
-    $lt_ac_sed -e 's/a$//' < conftest.nl >conftest.out || break
-    cmp -s conftest.out conftest.nl || break
-    # 10000 chars as input seems more than enough
-    test 10 -lt "$lt_ac_count" && break
-    lt_ac_count=`expr $lt_ac_count + 1`
-    if test "$lt_ac_count" -gt "$lt_ac_max"; then
-      lt_ac_max=$lt_ac_count
-      lt_cv_path_SED=$lt_ac_sed
-    fi
-  done
-done
-])
-SED=$lt_cv_path_SED
-AC_SUBST([SED])
-AC_MSG_RESULT([$SED])
-])#AC_PROG_SED
-])#m4_ifndef
-
-# Old name:
-AU_ALIAS([LT_AC_PROG_SED], [AC_PROG_SED])
 dnl aclocal-1.4 backwards compatibility:
 dnl AC_DEFUN([LT_AC_PROG_SED], [])
 
@@ -8345,7 +8326,7 @@ AC_CACHE_VAL(lt_cv_to_host_file_cmd,
 [case $host in
   *-*-mingw* )
     case $build in
-      *-*-mingw* ) # actually msys
+      *-*-mingw* | *-*-windows* ) # actually msys
         lt_cv_to_host_file_cmd=func_convert_file_msys_to_w32
         ;;
       *-*-cygwin* )
@@ -8358,7 +8339,7 @@ AC_CACHE_VAL(lt_cv_to_host_file_cmd,
     ;;
   *-*-cygwin* )
     case $build in
-      *-*-mingw* ) # actually msys
+      *-*-mingw* | *-*-windows* ) # actually msys
         lt_cv_to_host_file_cmd=func_convert_file_msys_to_cygwin
         ;;
       *-*-cygwin* )
@@ -8384,9 +8365,9 @@ AC_CACHE_VAL(lt_cv_to_tool_file_cmd,
 [#assume ordinary cross tools, or native build.
 lt_cv_to_tool_file_cmd=func_convert_file_noop
 case $host in
-  *-*-mingw* )
+  *-*-mingw* | *-*-windows* )
     case $build in
-      *-*-mingw* ) # actually msys
+      *-*-mingw* | *-*-windows* ) # actually msys
         lt_cv_to_tool_file_cmd=func_convert_file_msys_to_w32
         ;;
     esac
diff --git a/deps/cares/m4/ltoptions.m4 b/deps/cares/m4/ltoptions.m4
index b0b5e9c2126062..25caa890298a4e 100644
--- a/deps/cares/m4/ltoptions.m4
+++ b/deps/cares/m4/ltoptions.m4
@@ -1,6 +1,6 @@
 # Helper functions for option handling.                    -*- Autoconf -*-
 #
-#   Copyright (C) 2004-2005, 2007-2009, 2011-2019, 2021-2022 Free
+#   Copyright (C) 2004-2005, 2007-2009, 2011-2019, 2021-2024 Free
 #   Software Foundation, Inc.
 #   Written by Gary V. Vaughan, 2004
 #
@@ -8,7 +8,7 @@
 # unlimited permission to copy and/or distribute it, with or without
 # modifications, as long as this notice is preserved.
 
-# serial 8 ltoptions.m4
+# serial 10 ltoptions.m4
 
 # This is to help aclocal find these macros, as it can't see m4_define.
 AC_DEFUN([LTOPTIONS_VERSION], [m4_if([1])])
@@ -128,7 +128,7 @@ LT_OPTION_DEFINE([LT_INIT], [win32-dll],
 [enable_win32_dll=yes
 
 case $host in
-*-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-cegcc*)
+*-*-cygwin* | *-*-mingw* | *-*-windows* | *-*-pw32* | *-*-cegcc*)
   AC_CHECK_TOOL(AS, as, false)
   AC_CHECK_TOOL(DLLTOOL, dlltool, false)
   AC_CHECK_TOOL(OBJDUMP, objdump, false)
@@ -323,29 +323,39 @@ dnl AC_DEFUN([AM_DISABLE_FAST_INSTALL], [])
 
 # _LT_WITH_AIX_SONAME([DEFAULT])
 # ----------------------------------
-# implement the --with-aix-soname flag, and support the `aix-soname=aix'
-# and `aix-soname=both' and `aix-soname=svr4' LT_INIT options. DEFAULT
-# is either `aix', `both' or `svr4'.  If omitted, it defaults to `aix'.
+# implement the --enable-aix-soname configure option, and support the
+# `aix-soname=aix' and `aix-soname=both' and `aix-soname=svr4' LT_INIT options.
+# DEFAULT is either `aix', `both', or `svr4'.  If omitted, it defaults to `aix'.
 m4_define([_LT_WITH_AIX_SONAME],
 [m4_define([_LT_WITH_AIX_SONAME_DEFAULT], [m4_if($1, svr4, svr4, m4_if($1, both, both, aix))])dnl
 shared_archive_member_spec=
 case $host,$enable_shared in
 power*-*-aix[[5-9]]*,yes)
   AC_MSG_CHECKING([which variant of shared library versioning to provide])
-  AC_ARG_WITH([aix-soname],
-    [AS_HELP_STRING([--with-aix-soname=aix|svr4|both],
+  AC_ARG_ENABLE([aix-soname],
+    [AS_HELP_STRING([--enable-aix-soname=aix|svr4|both],
       [shared library versioning (aka "SONAME") variant to provide on AIX, @<:@default=]_LT_WITH_AIX_SONAME_DEFAULT[@:>@.])],
-    [case $withval in
-    aix|svr4|both)
-      ;;
-    *)
-      AC_MSG_ERROR([Unknown argument to --with-aix-soname])
-      ;;
-    esac
-    lt_cv_with_aix_soname=$with_aix_soname],
-    [AC_CACHE_VAL([lt_cv_with_aix_soname],
-      [lt_cv_with_aix_soname=]_LT_WITH_AIX_SONAME_DEFAULT)
-    with_aix_soname=$lt_cv_with_aix_soname])
+    [case $enableval in
+     aix|svr4|both)
+       ;;
+     *)
+       AC_MSG_ERROR([Unknown argument to --enable-aix-soname])
+       ;;
+     esac
+     lt_cv_with_aix_soname=$enable_aix_soname],
+    [_AC_ENABLE_IF([with], [aix-soname],
+        [case $withval in
+         aix|svr4|both)
+           ;;
+         *)
+           AC_MSG_ERROR([Unknown argument to --with-aix-soname])
+           ;;
+         esac
+         lt_cv_with_aix_soname=$with_aix_soname],
+        [AC_CACHE_VAL([lt_cv_with_aix_soname],
+           [lt_cv_with_aix_soname=]_LT_WITH_AIX_SONAME_DEFAULT)])
+     enable_aix_soname=$lt_cv_with_aix_soname])
+  with_aix_soname=$enable_aix_soname
   AC_MSG_RESULT([$with_aix_soname])
   if test aix != "$with_aix_soname"; then
     # For the AIX way of multilib, we name the shared archive member
@@ -376,30 +386,50 @@ LT_OPTION_DEFINE([LT_INIT], [aix-soname=svr4], [_LT_WITH_AIX_SONAME([svr4])])
 
 # _LT_WITH_PIC([MODE])
 # --------------------
-# implement the --with-pic flag, and support the 'pic-only' and 'no-pic'
+# implement the --enable-pic flag, and support the 'pic-only' and 'no-pic'
 # LT_INIT options.
 # MODE is either 'yes' or 'no'.  If omitted, it defaults to 'both'.
 m4_define([_LT_WITH_PIC],
-[AC_ARG_WITH([pic],
-    [AS_HELP_STRING([--with-pic@<:@=PKGS@:>@],
+[AC_ARG_ENABLE([pic],
+    [AS_HELP_STRING([--enable-pic@<:@=PKGS@:>@],
 	[try to use only PIC/non-PIC objects @<:@default=use both@:>@])],
     [lt_p=${PACKAGE-default}
-    case $withval in
-    yes|no) pic_mode=$withval ;;
-    *)
-      pic_mode=default
-      # Look at the argument we got.  We use all the common list separators.
-      lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR,
-      for lt_pkg in $withval; do
-	IFS=$lt_save_ifs
-	if test "X$lt_pkg" = "X$lt_p"; then
-	  pic_mode=yes
-	fi
-      done
-      IFS=$lt_save_ifs
-      ;;
-    esac],
-    [pic_mode=m4_default([$1], [default])])
+     case $enableval in
+     yes|no) pic_mode=$enableval ;;
+     *)
+       pic_mode=default
+       # Look at the argument we got.  We use all the common list separators.
+       lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR,
+       for lt_pkg in $enableval; do
+	 IFS=$lt_save_ifs
+	 if test "X$lt_pkg" = "X$lt_p"; then
+	   pic_mode=yes
+	 fi
+       done
+       IFS=$lt_save_ifs
+       ;;
+     esac],
+    [dnl Continue to support --with-pic and --without-pic, for backward
+     dnl compatibility.
+     _AC_ENABLE_IF([with], [pic],
+	[lt_p=${PACKAGE-default}
+	 case $withval in
+	 yes|no) pic_mode=$withval ;;
+	 *)
+	   pic_mode=default
+	   # Look at the argument we got.  We use all the common list separators.
+	   lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR,
+	   for lt_pkg in $withval; do
+	     IFS=$lt_save_ifs
+	     if test "X$lt_pkg" = "X$lt_p"; then
+	       pic_mode=yes
+	     fi
+	   done
+	   IFS=$lt_save_ifs
+	   ;;
+	 esac],
+	[pic_mode=m4_default([$1], [default])])]
+    )
 
 _LT_DECL([], [pic_mode], [0], [What type of objects to build])dnl
 ])# _LT_WITH_PIC
diff --git a/deps/cares/m4/ltsugar.m4 b/deps/cares/m4/ltsugar.m4
index 902508bd93aec6..5b5c80a3ad78a1 100644
--- a/deps/cares/m4/ltsugar.m4
+++ b/deps/cares/m4/ltsugar.m4
@@ -1,6 +1,6 @@
 # ltsugar.m4 -- libtool m4 base layer.                         -*-Autoconf-*-
 #
-# Copyright (C) 2004-2005, 2007-2008, 2011-2019, 2021-2022 Free Software
+# Copyright (C) 2004-2005, 2007-2008, 2011-2019, 2021-2024 Free Software
 # Foundation, Inc.
 # Written by Gary V. Vaughan, 2004
 #
diff --git a/deps/cares/m4/ltversion.m4 b/deps/cares/m4/ltversion.m4
index b155d0aceca376..149c9719fa5983 100644
--- a/deps/cares/m4/ltversion.m4
+++ b/deps/cares/m4/ltversion.m4
@@ -1,6 +1,6 @@
 # ltversion.m4 -- version numbers			-*- Autoconf -*-
 #
-#   Copyright (C) 2004, 2011-2019, 2021-2022 Free Software Foundation,
+#   Copyright (C) 2004, 2011-2019, 2021-2024 Free Software Foundation,
 #   Inc.
 #   Written by Scott James Remnant, 2004
 #
@@ -10,15 +10,15 @@
 
 # @configure_input@
 
-# serial 4245 ltversion.m4
+# serial 4392 ltversion.m4
 # This file is part of GNU Libtool
 
-m4_define([LT_PACKAGE_VERSION], [2.4.7])
-m4_define([LT_PACKAGE_REVISION], [2.4.7])
+m4_define([LT_PACKAGE_VERSION], [2.5.3])
+m4_define([LT_PACKAGE_REVISION], [2.5.3])
 
 AC_DEFUN([LTVERSION_VERSION],
-[macro_version='2.4.7'
-macro_revision='2.4.7'
+[macro_version='2.5.3'
+macro_revision='2.5.3'
 _LT_DECL(, macro_version, 0, [Which release of libtool.m4 was used?])
 _LT_DECL(, macro_revision, 0)
 ])
diff --git a/deps/cares/m4/lt~obsolete.m4 b/deps/cares/m4/lt~obsolete.m4
index 0f7a8759da8d46..22b5346973571a 100644
--- a/deps/cares/m4/lt~obsolete.m4
+++ b/deps/cares/m4/lt~obsolete.m4
@@ -1,6 +1,6 @@
 # lt~obsolete.m4 -- aclocal satisfying obsolete definitions.    -*-Autoconf-*-
 #
-#   Copyright (C) 2004-2005, 2007, 2009, 2011-2019, 2021-2022 Free
+#   Copyright (C) 2004-2005, 2007, 2009, 2011-2019, 2021-2024 Free
 #   Software Foundation, Inc.
 #   Written by Scott James Remnant, 2004.
 #
diff --git a/deps/cares/src/lib/CMakeLists.txt b/deps/cares/src/lib/CMakeLists.txt
index ef0acf371ff091..9956fd625b2ad6 100644
--- a/deps/cares/src/lib/CMakeLists.txt
+++ b/deps/cares/src/lib/CMakeLists.txt
@@ -53,6 +53,7 @@ IF (CARES_SHARED)
 		       "$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>"
 		       "$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>"
 		PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}"
+		        "${CMAKE_CURRENT_SOURCE_DIR}/include"
 	)
 
 	TARGET_COMPILE_DEFINITIONS (${PROJECT_NAME} PRIVATE HAVE_CONFIG_H=1 CARES_BUILDING_LIBRARY)
@@ -110,6 +111,7 @@ IF (CARES_STATIC)
 		       "$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>"
 		       "$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>"
 		PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}"
+		        "${CMAKE_CURRENT_SOURCE_DIR}/include"
 	)
 
 	TARGET_COMPILE_DEFINITIONS (${LIBNAME} PRIVATE HAVE_CONFIG_H=1 CARES_BUILDING_LIBRARY)
diff --git a/deps/cares/src/lib/Makefile.am b/deps/cares/src/lib/Makefile.am
index 44e04bd35ccf7d..db4f2640f2bd40 100644
--- a/deps/cares/src/lib/Makefile.am
+++ b/deps/cares/src/lib/Makefile.am
@@ -11,7 +11,8 @@ ACLOCAL_AMFLAGS = -I m4 --install
 AM_CPPFLAGS += -I$(top_builddir)/include \
                -I$(top_builddir)/src/lib \
                -I$(top_srcdir)/include \
-               -I$(top_srcdir)/src/lib
+               -I$(top_srcdir)/src/lib \
+               -I$(top_srcdir)/src/lib/include
 
 lib_LTLIBRARIES = libcares.la
 
diff --git a/deps/cares/src/lib/Makefile.in b/deps/cares/src/lib/Makefile.in
index 30d33843d5d2d8..6fdb27835828af 100644
--- a/deps/cares/src/lib/Makefile.in
+++ b/deps/cares/src/lib/Makefile.in
@@ -15,7 +15,7 @@
 @SET_MAKE@
 
 # aminclude_static.am generated automatically by Autoconf
-# from AX_AM_MACROS_STATIC on Fri Aug 23 09:37:25 EDT 2024
+# from AX_AM_MACROS_STATIC on Wed Oct  9 20:58:25 EDT 2024
 
 # Copyright (C) The c-ares project and its contributors
 # SPDX-License-Identifier: MIT
@@ -162,13 +162,10 @@ am__installdirs = "$(DESTDIR)$(libdir)"
 LTLIBRARIES = $(lib_LTLIBRARIES)
 libcares_la_LIBADD =
 am__dirstamp = $(am__leading_dot)dirstamp
-am__objects_1 = libcares_la-ares__addrinfo2hostent.lo \
-	libcares_la-ares__addrinfo_localhost.lo \
-	libcares_la-ares__close_sockets.lo \
-	libcares_la-ares__hosts_file.lo \
-	libcares_la-ares__parse_into_addrinfo.lo \
-	libcares_la-ares__socket.lo libcares_la-ares__sortaddrinfo.lo \
+am__objects_1 = libcares_la-ares_addrinfo2hostent.lo \
+	libcares_la-ares_addrinfo_localhost.lo \
 	libcares_la-ares_android.lo libcares_la-ares_cancel.lo \
+	libcares_la-ares_close_sockets.lo libcares_la-ares_conn.lo \
 	libcares_la-ares_cookie.lo libcares_la-ares_data.lo \
 	libcares_la-ares_destroy.lo libcares_la-ares_free_hostent.lo \
 	libcares_la-ares_free_string.lo \
@@ -176,25 +173,30 @@ am__objects_1 = libcares_la-ares__addrinfo2hostent.lo \
 	libcares_la-ares_getaddrinfo.lo libcares_la-ares_getenv.lo \
 	libcares_la-ares_gethostbyaddr.lo \
 	libcares_la-ares_gethostbyname.lo \
-	libcares_la-ares_getnameinfo.lo libcares_la-ares_init.lo \
-	libcares_la-ares_library_init.lo libcares_la-ares_metrics.lo \
-	libcares_la-ares_options.lo libcares_la-ares_platform.lo \
+	libcares_la-ares_getnameinfo.lo libcares_la-ares_hosts_file.lo \
+	libcares_la-ares_init.lo libcares_la-ares_library_init.lo \
+	libcares_la-ares_metrics.lo libcares_la-ares_options.lo \
+	libcares_la-ares_parse_into_addrinfo.lo \
 	libcares_la-ares_process.lo libcares_la-ares_qcache.lo \
 	libcares_la-ares_query.lo libcares_la-ares_search.lo \
-	libcares_la-ares_send.lo libcares_la-ares_strerror.lo \
-	libcares_la-ares_sysconfig.lo \
+	libcares_la-ares_send.lo \
+	libcares_la-ares_set_socket_functions.lo \
+	libcares_la-ares_socket.lo libcares_la-ares_sortaddrinfo.lo \
+	libcares_la-ares_strerror.lo libcares_la-ares_sysconfig.lo \
 	libcares_la-ares_sysconfig_files.lo \
 	libcares_la-ares_sysconfig_mac.lo \
 	libcares_la-ares_sysconfig_win.lo libcares_la-ares_timeout.lo \
 	libcares_la-ares_update_servers.lo libcares_la-ares_version.lo \
 	libcares_la-inet_net_pton.lo libcares_la-inet_ntop.lo \
-	libcares_la-windows_port.lo dsa/libcares_la-ares__array.lo \
-	dsa/libcares_la-ares__htable.lo \
-	dsa/libcares_la-ares__htable_asvp.lo \
-	dsa/libcares_la-ares__htable_strvp.lo \
-	dsa/libcares_la-ares__htable_szvp.lo \
-	dsa/libcares_la-ares__htable_vpvp.lo \
-	dsa/libcares_la-ares__llist.lo dsa/libcares_la-ares__slist.lo \
+	libcares_la-windows_port.lo dsa/libcares_la-ares_array.lo \
+	dsa/libcares_la-ares_htable.lo \
+	dsa/libcares_la-ares_htable_asvp.lo \
+	dsa/libcares_la-ares_htable_dict.lo \
+	dsa/libcares_la-ares_htable_strvp.lo \
+	dsa/libcares_la-ares_htable_szvp.lo \
+	dsa/libcares_la-ares_htable_vpstr.lo \
+	dsa/libcares_la-ares_htable_vpvp.lo \
+	dsa/libcares_la-ares_llist.lo dsa/libcares_la-ares_slist.lo \
 	event/libcares_la-ares_event_configchg.lo \
 	event/libcares_la-ares_event_epoll.lo \
 	event/libcares_la-ares_event_kqueue.lo \
@@ -225,13 +227,12 @@ am__objects_1 = libcares_la-ares__addrinfo2hostent.lo \
 	record/libcares_la-ares_dns_parse.lo \
 	record/libcares_la-ares_dns_record.lo \
 	record/libcares_la-ares_dns_write.lo \
-	str/libcares_la-ares__buf.lo \
-	str/libcares_la-ares_strcasecmp.lo str/libcares_la-ares_str.lo \
+	str/libcares_la-ares_buf.lo str/libcares_la-ares_str.lo \
 	str/libcares_la-ares_strsplit.lo \
-	util/libcares_la-ares__iface_ips.lo \
-	util/libcares_la-ares__threads.lo \
-	util/libcares_la-ares__timeval.lo \
-	util/libcares_la-ares_math.lo util/libcares_la-ares_rand.lo
+	util/libcares_la-ares_iface_ips.lo \
+	util/libcares_la-ares_threads.lo \
+	util/libcares_la-ares_timeval.lo util/libcares_la-ares_math.lo \
+	util/libcares_la-ares_rand.lo util/libcares_la-ares_uri.lo
 am__objects_2 =
 am_libcares_la_OBJECTS = $(am__objects_1) $(am__objects_2)
 libcares_la_OBJECTS = $(am_libcares_la_OBJECTS)
@@ -258,15 +259,12 @@ DEFAULT_INCLUDES =
 depcomp = $(SHELL) $(top_srcdir)/config/depcomp
 am__maybe_remake_depfiles = depfiles
 am__depfiles_remade =  \
-	./$(DEPDIR)/libcares_la-ares__addrinfo2hostent.Plo \
-	./$(DEPDIR)/libcares_la-ares__addrinfo_localhost.Plo \
-	./$(DEPDIR)/libcares_la-ares__close_sockets.Plo \
-	./$(DEPDIR)/libcares_la-ares__hosts_file.Plo \
-	./$(DEPDIR)/libcares_la-ares__parse_into_addrinfo.Plo \
-	./$(DEPDIR)/libcares_la-ares__socket.Plo \
-	./$(DEPDIR)/libcares_la-ares__sortaddrinfo.Plo \
+	./$(DEPDIR)/libcares_la-ares_addrinfo2hostent.Plo \
+	./$(DEPDIR)/libcares_la-ares_addrinfo_localhost.Plo \
 	./$(DEPDIR)/libcares_la-ares_android.Plo \
 	./$(DEPDIR)/libcares_la-ares_cancel.Plo \
+	./$(DEPDIR)/libcares_la-ares_close_sockets.Plo \
+	./$(DEPDIR)/libcares_la-ares_conn.Plo \
 	./$(DEPDIR)/libcares_la-ares_cookie.Plo \
 	./$(DEPDIR)/libcares_la-ares_data.Plo \
 	./$(DEPDIR)/libcares_la-ares_destroy.Plo \
@@ -278,16 +276,20 @@ am__depfiles_remade =  \
 	./$(DEPDIR)/libcares_la-ares_gethostbyaddr.Plo \
 	./$(DEPDIR)/libcares_la-ares_gethostbyname.Plo \
 	./$(DEPDIR)/libcares_la-ares_getnameinfo.Plo \
+	./$(DEPDIR)/libcares_la-ares_hosts_file.Plo \
 	./$(DEPDIR)/libcares_la-ares_init.Plo \
 	./$(DEPDIR)/libcares_la-ares_library_init.Plo \
 	./$(DEPDIR)/libcares_la-ares_metrics.Plo \
 	./$(DEPDIR)/libcares_la-ares_options.Plo \
-	./$(DEPDIR)/libcares_la-ares_platform.Plo \
+	./$(DEPDIR)/libcares_la-ares_parse_into_addrinfo.Plo \
 	./$(DEPDIR)/libcares_la-ares_process.Plo \
 	./$(DEPDIR)/libcares_la-ares_qcache.Plo \
 	./$(DEPDIR)/libcares_la-ares_query.Plo \
 	./$(DEPDIR)/libcares_la-ares_search.Plo \
 	./$(DEPDIR)/libcares_la-ares_send.Plo \
+	./$(DEPDIR)/libcares_la-ares_set_socket_functions.Plo \
+	./$(DEPDIR)/libcares_la-ares_socket.Plo \
+	./$(DEPDIR)/libcares_la-ares_sortaddrinfo.Plo \
 	./$(DEPDIR)/libcares_la-ares_strerror.Plo \
 	./$(DEPDIR)/libcares_la-ares_sysconfig.Plo \
 	./$(DEPDIR)/libcares_la-ares_sysconfig_files.Plo \
@@ -299,14 +301,16 @@ am__depfiles_remade =  \
 	./$(DEPDIR)/libcares_la-inet_net_pton.Plo \
 	./$(DEPDIR)/libcares_la-inet_ntop.Plo \
 	./$(DEPDIR)/libcares_la-windows_port.Plo \
-	dsa/$(DEPDIR)/libcares_la-ares__array.Plo \
-	dsa/$(DEPDIR)/libcares_la-ares__htable.Plo \
-	dsa/$(DEPDIR)/libcares_la-ares__htable_asvp.Plo \
-	dsa/$(DEPDIR)/libcares_la-ares__htable_strvp.Plo \
-	dsa/$(DEPDIR)/libcares_la-ares__htable_szvp.Plo \
-	dsa/$(DEPDIR)/libcares_la-ares__htable_vpvp.Plo \
-	dsa/$(DEPDIR)/libcares_la-ares__llist.Plo \
-	dsa/$(DEPDIR)/libcares_la-ares__slist.Plo \
+	dsa/$(DEPDIR)/libcares_la-ares_array.Plo \
+	dsa/$(DEPDIR)/libcares_la-ares_htable.Plo \
+	dsa/$(DEPDIR)/libcares_la-ares_htable_asvp.Plo \
+	dsa/$(DEPDIR)/libcares_la-ares_htable_dict.Plo \
+	dsa/$(DEPDIR)/libcares_la-ares_htable_strvp.Plo \
+	dsa/$(DEPDIR)/libcares_la-ares_htable_szvp.Plo \
+	dsa/$(DEPDIR)/libcares_la-ares_htable_vpstr.Plo \
+	dsa/$(DEPDIR)/libcares_la-ares_htable_vpvp.Plo \
+	dsa/$(DEPDIR)/libcares_la-ares_llist.Plo \
+	dsa/$(DEPDIR)/libcares_la-ares_slist.Plo \
 	event/$(DEPDIR)/libcares_la-ares_event_configchg.Plo \
 	event/$(DEPDIR)/libcares_la-ares_event_epoll.Plo \
 	event/$(DEPDIR)/libcares_la-ares_event_kqueue.Plo \
@@ -337,15 +341,15 @@ am__depfiles_remade =  \
 	record/$(DEPDIR)/libcares_la-ares_dns_parse.Plo \
 	record/$(DEPDIR)/libcares_la-ares_dns_record.Plo \
 	record/$(DEPDIR)/libcares_la-ares_dns_write.Plo \
-	str/$(DEPDIR)/libcares_la-ares__buf.Plo \
+	str/$(DEPDIR)/libcares_la-ares_buf.Plo \
 	str/$(DEPDIR)/libcares_la-ares_str.Plo \
-	str/$(DEPDIR)/libcares_la-ares_strcasecmp.Plo \
 	str/$(DEPDIR)/libcares_la-ares_strsplit.Plo \
-	util/$(DEPDIR)/libcares_la-ares__iface_ips.Plo \
-	util/$(DEPDIR)/libcares_la-ares__threads.Plo \
-	util/$(DEPDIR)/libcares_la-ares__timeval.Plo \
+	util/$(DEPDIR)/libcares_la-ares_iface_ips.Plo \
 	util/$(DEPDIR)/libcares_la-ares_math.Plo \
-	util/$(DEPDIR)/libcares_la-ares_rand.Plo
+	util/$(DEPDIR)/libcares_la-ares_rand.Plo \
+	util/$(DEPDIR)/libcares_la-ares_threads.Plo \
+	util/$(DEPDIR)/libcares_la-ares_timeval.Plo \
+	util/$(DEPDIR)/libcares_la-ares_uri.Plo
 am__mv = mv -f
 COMPILE = $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) \
 	$(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS)
@@ -445,7 +449,7 @@ AM_CFLAGS = @AM_CFLAGS@
 # might possibly already be installed in the system.
 AM_CPPFLAGS = @AM_CPPFLAGS@ -I$(top_builddir)/include \
 	-I$(top_builddir)/src/lib -I$(top_srcdir)/include \
-	-I$(top_srcdir)/src/lib
+	-I$(top_srcdir)/src/lib -I$(top_srcdir)/src/lib/include
 AM_DEFAULT_VERBOSITY = @AM_DEFAULT_VERBOSITY@
 AR = @AR@
 AS = @AS@
@@ -643,15 +647,12 @@ libcares_la_CPPFLAGS_EXTRA = -DCARES_BUILDING_LIBRARY $(am__append_3) \
 libcares_la_LIBS = $(CODE_COVERAGE_LIBS)
 libcares_la_CFLAGS = $(AM_CFLAGS) $(libcares_la_CFLAGS_EXTRA)
 libcares_la_CPPFLAGS = $(AM_CPPFLAGS) $(libcares_la_CPPFLAGS_EXTRA)
-CSOURCES = ares__addrinfo2hostent.c	\
-  ares__addrinfo_localhost.c		\
-  ares__close_sockets.c			\
-  ares__hosts_file.c			\
-  ares__parse_into_addrinfo.c		\
-  ares__socket.c			\
-  ares__sortaddrinfo.c			\
+CSOURCES = ares_addrinfo2hostent.c	\
+  ares_addrinfo_localhost.c		\
   ares_android.c			\
   ares_cancel.c				\
+  ares_close_sockets.c			\
+  ares_conn.c				\
   ares_cookie.c				\
   ares_data.c				\
   ares_destroy.c			\
@@ -663,16 +664,20 @@ CSOURCES = ares__addrinfo2hostent.c	\
   ares_gethostbyaddr.c			\
   ares_gethostbyname.c			\
   ares_getnameinfo.c			\
+  ares_hosts_file.c			\
   ares_init.c				\
   ares_library_init.c			\
   ares_metrics.c			\
   ares_options.c			\
-  ares_platform.c			\
+  ares_parse_into_addrinfo.c		\
   ares_process.c			\
   ares_qcache.c				\
   ares_query.c				\
   ares_search.c				\
   ares_send.c				\
+  ares_set_socket_functions.c		\
+  ares_socket.c				\
+  ares_sortaddrinfo.c			\
   ares_strerror.c			\
   ares_sysconfig.c			\
   ares_sysconfig_files.c		\
@@ -684,14 +689,16 @@ CSOURCES = ares__addrinfo2hostent.c	\
   inet_net_pton.c			\
   inet_ntop.c				\
   windows_port.c			\
-  dsa/ares__array.c			\
-  dsa/ares__htable.c			\
-  dsa/ares__htable_asvp.c		\
-  dsa/ares__htable_strvp.c		\
-  dsa/ares__htable_szvp.c		\
-  dsa/ares__htable_vpvp.c		\
-  dsa/ares__llist.c			\
-  dsa/ares__slist.c			\
+  dsa/ares_array.c			\
+  dsa/ares_htable.c			\
+  dsa/ares_htable_asvp.c		\
+  dsa/ares_htable_dict.c		\
+  dsa/ares_htable_strvp.c		\
+  dsa/ares_htable_szvp.c		\
+  dsa/ares_htable_vpstr.c		\
+  dsa/ares_htable_vpvp.c		\
+  dsa/ares_llist.c			\
+  dsa/ares_slist.c			\
   event/ares_event_configchg.c		\
   event/ares_event_epoll.c		\
   event/ares_event_kqueue.c		\
@@ -722,42 +729,49 @@ CSOURCES = ares__addrinfo2hostent.c	\
   record/ares_dns_parse.c		\
   record/ares_dns_record.c		\
   record/ares_dns_write.c		\
-  str/ares__buf.c			\
-  str/ares_strcasecmp.c			\
+  str/ares_buf.c			\
   str/ares_str.c			\
   str/ares_strsplit.c			\
-  util/ares__iface_ips.c		\
-  util/ares__threads.c			\
-  util/ares__timeval.c			\
+  util/ares_iface_ips.c			\
+  util/ares_threads.c			\
+  util/ares_timeval.c			\
   util/ares_math.c			\
-  util/ares_rand.c
+  util/ares_rand.c			\
+  util/ares_uri.c
 
 HHEADERS = ares_android.h			\
+  ares_conn.h				\
   ares_data.h				\
   ares_getenv.h				\
   ares_inet_net_pton.h			\
   ares_ipv6.h				\
-  ares_platform.h			\
   ares_private.h			\
   ares_setup.h				\
-  dsa/ares__array.h			\
-  dsa/ares__htable.h			\
-  dsa/ares__htable_asvp.h		\
-  dsa/ares__htable_strvp.h		\
-  dsa/ares__htable_szvp.h		\
-  dsa/ares__htable_vpvp.h		\
-  dsa/ares__llist.h			\
-  dsa/ares__slist.h			\
+  ares_socket.h				\
+  dsa/ares_htable.h			\
+  dsa/ares_slist.h			\
   event/ares_event.h			\
   event/ares_event_win32.h		\
+  include/ares_array.h			\
+  include/ares_buf.h			\
+  include/ares_htable_asvp.h		\
+  include/ares_htable_dict.h		\
+  include/ares_htable_strvp.h		\
+  include/ares_htable_szvp.h		\
+  include/ares_htable_vpstr.h		\
+  include/ares_htable_vpvp.h		\
+  include/ares_llist.h			\
+  include/ares_mem.h			\
+  include/ares_str.h			\
   record/ares_dns_multistring.h		\
   record/ares_dns_private.h		\
-  str/ares__buf.h			\
-  str/ares_strcasecmp.h			\
-  str/ares_str.h			\
   str/ares_strsplit.h			\
-  util/ares__iface_ips.h		\
-  util/ares__threads.h			\
+  util/ares_iface_ips.h			\
+  util/ares_math.h			\
+  util/ares_rand.h			\
+  util/ares_time.h			\
+  util/ares_threads.h			\
+  util/ares_uri.h			\
   thirdparty/apple/dnsinfo.h
 
 
@@ -852,21 +866,25 @@ dsa/$(am__dirstamp):
 dsa/$(DEPDIR)/$(am__dirstamp):
 	@$(MKDIR_P) dsa/$(DEPDIR)
 	@: >>dsa/$(DEPDIR)/$(am__dirstamp)
-dsa/libcares_la-ares__array.lo: dsa/$(am__dirstamp) \
+dsa/libcares_la-ares_array.lo: dsa/$(am__dirstamp) \
 	dsa/$(DEPDIR)/$(am__dirstamp)
-dsa/libcares_la-ares__htable.lo: dsa/$(am__dirstamp) \
+dsa/libcares_la-ares_htable.lo: dsa/$(am__dirstamp) \
 	dsa/$(DEPDIR)/$(am__dirstamp)
-dsa/libcares_la-ares__htable_asvp.lo: dsa/$(am__dirstamp) \
+dsa/libcares_la-ares_htable_asvp.lo: dsa/$(am__dirstamp) \
 	dsa/$(DEPDIR)/$(am__dirstamp)
-dsa/libcares_la-ares__htable_strvp.lo: dsa/$(am__dirstamp) \
+dsa/libcares_la-ares_htable_dict.lo: dsa/$(am__dirstamp) \
 	dsa/$(DEPDIR)/$(am__dirstamp)
-dsa/libcares_la-ares__htable_szvp.lo: dsa/$(am__dirstamp) \
+dsa/libcares_la-ares_htable_strvp.lo: dsa/$(am__dirstamp) \
 	dsa/$(DEPDIR)/$(am__dirstamp)
-dsa/libcares_la-ares__htable_vpvp.lo: dsa/$(am__dirstamp) \
+dsa/libcares_la-ares_htable_szvp.lo: dsa/$(am__dirstamp) \
 	dsa/$(DEPDIR)/$(am__dirstamp)
-dsa/libcares_la-ares__llist.lo: dsa/$(am__dirstamp) \
+dsa/libcares_la-ares_htable_vpstr.lo: dsa/$(am__dirstamp) \
 	dsa/$(DEPDIR)/$(am__dirstamp)
-dsa/libcares_la-ares__slist.lo: dsa/$(am__dirstamp) \
+dsa/libcares_la-ares_htable_vpvp.lo: dsa/$(am__dirstamp) \
+	dsa/$(DEPDIR)/$(am__dirstamp)
+dsa/libcares_la-ares_llist.lo: dsa/$(am__dirstamp) \
+	dsa/$(DEPDIR)/$(am__dirstamp)
+dsa/libcares_la-ares_slist.lo: dsa/$(am__dirstamp) \
 	dsa/$(DEPDIR)/$(am__dirstamp)
 event/$(am__dirstamp):
 	@$(MKDIR_P) event
@@ -952,9 +970,7 @@ str/$(am__dirstamp):
 str/$(DEPDIR)/$(am__dirstamp):
 	@$(MKDIR_P) str/$(DEPDIR)
 	@: >>str/$(DEPDIR)/$(am__dirstamp)
-str/libcares_la-ares__buf.lo: str/$(am__dirstamp) \
-	str/$(DEPDIR)/$(am__dirstamp)
-str/libcares_la-ares_strcasecmp.lo: str/$(am__dirstamp) \
+str/libcares_la-ares_buf.lo: str/$(am__dirstamp) \
 	str/$(DEPDIR)/$(am__dirstamp)
 str/libcares_la-ares_str.lo: str/$(am__dirstamp) \
 	str/$(DEPDIR)/$(am__dirstamp)
@@ -966,16 +982,18 @@ util/$(am__dirstamp):
 util/$(DEPDIR)/$(am__dirstamp):
 	@$(MKDIR_P) util/$(DEPDIR)
 	@: >>util/$(DEPDIR)/$(am__dirstamp)
-util/libcares_la-ares__iface_ips.lo: util/$(am__dirstamp) \
+util/libcares_la-ares_iface_ips.lo: util/$(am__dirstamp) \
 	util/$(DEPDIR)/$(am__dirstamp)
-util/libcares_la-ares__threads.lo: util/$(am__dirstamp) \
+util/libcares_la-ares_threads.lo: util/$(am__dirstamp) \
 	util/$(DEPDIR)/$(am__dirstamp)
-util/libcares_la-ares__timeval.lo: util/$(am__dirstamp) \
+util/libcares_la-ares_timeval.lo: util/$(am__dirstamp) \
 	util/$(DEPDIR)/$(am__dirstamp)
 util/libcares_la-ares_math.lo: util/$(am__dirstamp) \
 	util/$(DEPDIR)/$(am__dirstamp)
 util/libcares_la-ares_rand.lo: util/$(am__dirstamp) \
 	util/$(DEPDIR)/$(am__dirstamp)
+util/libcares_la-ares_uri.lo: util/$(am__dirstamp) \
+	util/$(DEPDIR)/$(am__dirstamp)
 
 libcares.la: $(libcares_la_OBJECTS) $(libcares_la_DEPENDENCIES) $(EXTRA_libcares_la_DEPENDENCIES) 
 	$(AM_V_CCLD)$(libcares_la_LINK) -rpath $(libdir) $(libcares_la_OBJECTS) $(libcares_la_LIBADD) $(LIBS)
@@ -998,15 +1016,12 @@ mostlyclean-compile:
 distclean-compile:
 	-rm -f *.tab.c
 
-@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares__addrinfo2hostent.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares__addrinfo_localhost.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares__close_sockets.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares__hosts_file.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares__parse_into_addrinfo.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares__socket.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares__sortaddrinfo.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_addrinfo2hostent.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_addrinfo_localhost.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_android.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_cancel.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_close_sockets.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_conn.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_cookie.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_data.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_destroy.Plo@am__quote@ # am--include-marker
@@ -1018,16 +1033,20 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_gethostbyaddr.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_gethostbyname.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_getnameinfo.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_hosts_file.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_init.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_library_init.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_metrics.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_options.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_platform.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_parse_into_addrinfo.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_process.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_qcache.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_query.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_search.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_send.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_set_socket_functions.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_socket.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_sortaddrinfo.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_strerror.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_sysconfig.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-ares_sysconfig_files.Plo@am__quote@ # am--include-marker
@@ -1039,14 +1058,16 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-inet_net_pton.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-inet_ntop.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libcares_la-windows_port.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares__array.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares__htable.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares__htable_asvp.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares__htable_strvp.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares__htable_szvp.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares__htable_vpvp.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares__llist.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares__slist.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares_array.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares_htable.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares_htable_asvp.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares_htable_dict.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares_htable_strvp.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares_htable_szvp.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares_htable_vpstr.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares_htable_vpvp.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares_llist.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@dsa/$(DEPDIR)/libcares_la-ares_slist.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@event/$(DEPDIR)/libcares_la-ares_event_configchg.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@event/$(DEPDIR)/libcares_la-ares_event_epoll.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@event/$(DEPDIR)/libcares_la-ares_event_kqueue.Plo@am__quote@ # am--include-marker
@@ -1077,15 +1098,15 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@record/$(DEPDIR)/libcares_la-ares_dns_parse.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@record/$(DEPDIR)/libcares_la-ares_dns_record.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@record/$(DEPDIR)/libcares_la-ares_dns_write.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@str/$(DEPDIR)/libcares_la-ares__buf.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@str/$(DEPDIR)/libcares_la-ares_buf.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@str/$(DEPDIR)/libcares_la-ares_str.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@str/$(DEPDIR)/libcares_la-ares_strcasecmp.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@str/$(DEPDIR)/libcares_la-ares_strsplit.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@util/$(DEPDIR)/libcares_la-ares__iface_ips.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@util/$(DEPDIR)/libcares_la-ares__threads.Plo@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@util/$(DEPDIR)/libcares_la-ares__timeval.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@util/$(DEPDIR)/libcares_la-ares_iface_ips.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@util/$(DEPDIR)/libcares_la-ares_math.Plo@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@util/$(DEPDIR)/libcares_la-ares_rand.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@util/$(DEPDIR)/libcares_la-ares_threads.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@util/$(DEPDIR)/libcares_la-ares_timeval.Plo@am__quote@ # am--include-marker
+@AMDEP_TRUE@@am__include@ @am__quote@util/$(DEPDIR)/libcares_la-ares_uri.Plo@am__quote@ # am--include-marker
 
 $(am__depfiles_remade):
 	@$(MKDIR_P) $(@D)
@@ -1117,54 +1138,19 @@ am--depfiles: $(am__depfiles_remade)
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LTCOMPILE) -c -o $@ $<
 
-libcares_la-ares__addrinfo2hostent.lo: ares__addrinfo2hostent.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares__addrinfo2hostent.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares__addrinfo2hostent.Tpo -c -o libcares_la-ares__addrinfo2hostent.lo `test -f 'ares__addrinfo2hostent.c' || echo '$(srcdir)/'`ares__addrinfo2hostent.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares__addrinfo2hostent.Tpo $(DEPDIR)/libcares_la-ares__addrinfo2hostent.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares__addrinfo2hostent.c' object='libcares_la-ares__addrinfo2hostent.lo' libtool=yes @AMDEPBACKSLASH@
+libcares_la-ares_addrinfo2hostent.lo: ares_addrinfo2hostent.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_addrinfo2hostent.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_addrinfo2hostent.Tpo -c -o libcares_la-ares_addrinfo2hostent.lo `test -f 'ares_addrinfo2hostent.c' || echo '$(srcdir)/'`ares_addrinfo2hostent.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares_addrinfo2hostent.Tpo $(DEPDIR)/libcares_la-ares_addrinfo2hostent.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares_addrinfo2hostent.c' object='libcares_la-ares_addrinfo2hostent.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares__addrinfo2hostent.lo `test -f 'ares__addrinfo2hostent.c' || echo '$(srcdir)/'`ares__addrinfo2hostent.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_addrinfo2hostent.lo `test -f 'ares_addrinfo2hostent.c' || echo '$(srcdir)/'`ares_addrinfo2hostent.c
 
-libcares_la-ares__addrinfo_localhost.lo: ares__addrinfo_localhost.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares__addrinfo_localhost.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares__addrinfo_localhost.Tpo -c -o libcares_la-ares__addrinfo_localhost.lo `test -f 'ares__addrinfo_localhost.c' || echo '$(srcdir)/'`ares__addrinfo_localhost.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares__addrinfo_localhost.Tpo $(DEPDIR)/libcares_la-ares__addrinfo_localhost.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares__addrinfo_localhost.c' object='libcares_la-ares__addrinfo_localhost.lo' libtool=yes @AMDEPBACKSLASH@
+libcares_la-ares_addrinfo_localhost.lo: ares_addrinfo_localhost.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_addrinfo_localhost.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_addrinfo_localhost.Tpo -c -o libcares_la-ares_addrinfo_localhost.lo `test -f 'ares_addrinfo_localhost.c' || echo '$(srcdir)/'`ares_addrinfo_localhost.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares_addrinfo_localhost.Tpo $(DEPDIR)/libcares_la-ares_addrinfo_localhost.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares_addrinfo_localhost.c' object='libcares_la-ares_addrinfo_localhost.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares__addrinfo_localhost.lo `test -f 'ares__addrinfo_localhost.c' || echo '$(srcdir)/'`ares__addrinfo_localhost.c
-
-libcares_la-ares__close_sockets.lo: ares__close_sockets.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares__close_sockets.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares__close_sockets.Tpo -c -o libcares_la-ares__close_sockets.lo `test -f 'ares__close_sockets.c' || echo '$(srcdir)/'`ares__close_sockets.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares__close_sockets.Tpo $(DEPDIR)/libcares_la-ares__close_sockets.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares__close_sockets.c' object='libcares_la-ares__close_sockets.lo' libtool=yes @AMDEPBACKSLASH@
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares__close_sockets.lo `test -f 'ares__close_sockets.c' || echo '$(srcdir)/'`ares__close_sockets.c
-
-libcares_la-ares__hosts_file.lo: ares__hosts_file.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares__hosts_file.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares__hosts_file.Tpo -c -o libcares_la-ares__hosts_file.lo `test -f 'ares__hosts_file.c' || echo '$(srcdir)/'`ares__hosts_file.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares__hosts_file.Tpo $(DEPDIR)/libcares_la-ares__hosts_file.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares__hosts_file.c' object='libcares_la-ares__hosts_file.lo' libtool=yes @AMDEPBACKSLASH@
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares__hosts_file.lo `test -f 'ares__hosts_file.c' || echo '$(srcdir)/'`ares__hosts_file.c
-
-libcares_la-ares__parse_into_addrinfo.lo: ares__parse_into_addrinfo.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares__parse_into_addrinfo.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares__parse_into_addrinfo.Tpo -c -o libcares_la-ares__parse_into_addrinfo.lo `test -f 'ares__parse_into_addrinfo.c' || echo '$(srcdir)/'`ares__parse_into_addrinfo.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares__parse_into_addrinfo.Tpo $(DEPDIR)/libcares_la-ares__parse_into_addrinfo.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares__parse_into_addrinfo.c' object='libcares_la-ares__parse_into_addrinfo.lo' libtool=yes @AMDEPBACKSLASH@
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares__parse_into_addrinfo.lo `test -f 'ares__parse_into_addrinfo.c' || echo '$(srcdir)/'`ares__parse_into_addrinfo.c
-
-libcares_la-ares__socket.lo: ares__socket.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares__socket.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares__socket.Tpo -c -o libcares_la-ares__socket.lo `test -f 'ares__socket.c' || echo '$(srcdir)/'`ares__socket.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares__socket.Tpo $(DEPDIR)/libcares_la-ares__socket.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares__socket.c' object='libcares_la-ares__socket.lo' libtool=yes @AMDEPBACKSLASH@
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares__socket.lo `test -f 'ares__socket.c' || echo '$(srcdir)/'`ares__socket.c
-
-libcares_la-ares__sortaddrinfo.lo: ares__sortaddrinfo.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares__sortaddrinfo.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares__sortaddrinfo.Tpo -c -o libcares_la-ares__sortaddrinfo.lo `test -f 'ares__sortaddrinfo.c' || echo '$(srcdir)/'`ares__sortaddrinfo.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares__sortaddrinfo.Tpo $(DEPDIR)/libcares_la-ares__sortaddrinfo.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares__sortaddrinfo.c' object='libcares_la-ares__sortaddrinfo.lo' libtool=yes @AMDEPBACKSLASH@
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares__sortaddrinfo.lo `test -f 'ares__sortaddrinfo.c' || echo '$(srcdir)/'`ares__sortaddrinfo.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_addrinfo_localhost.lo `test -f 'ares_addrinfo_localhost.c' || echo '$(srcdir)/'`ares_addrinfo_localhost.c
 
 libcares_la-ares_android.lo: ares_android.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_android.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_android.Tpo -c -o libcares_la-ares_android.lo `test -f 'ares_android.c' || echo '$(srcdir)/'`ares_android.c
@@ -1180,6 +1166,20 @@ libcares_la-ares_cancel.lo: ares_cancel.c
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_cancel.lo `test -f 'ares_cancel.c' || echo '$(srcdir)/'`ares_cancel.c
 
+libcares_la-ares_close_sockets.lo: ares_close_sockets.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_close_sockets.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_close_sockets.Tpo -c -o libcares_la-ares_close_sockets.lo `test -f 'ares_close_sockets.c' || echo '$(srcdir)/'`ares_close_sockets.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares_close_sockets.Tpo $(DEPDIR)/libcares_la-ares_close_sockets.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares_close_sockets.c' object='libcares_la-ares_close_sockets.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_close_sockets.lo `test -f 'ares_close_sockets.c' || echo '$(srcdir)/'`ares_close_sockets.c
+
+libcares_la-ares_conn.lo: ares_conn.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_conn.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_conn.Tpo -c -o libcares_la-ares_conn.lo `test -f 'ares_conn.c' || echo '$(srcdir)/'`ares_conn.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares_conn.Tpo $(DEPDIR)/libcares_la-ares_conn.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares_conn.c' object='libcares_la-ares_conn.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_conn.lo `test -f 'ares_conn.c' || echo '$(srcdir)/'`ares_conn.c
+
 libcares_la-ares_cookie.lo: ares_cookie.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_cookie.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_cookie.Tpo -c -o libcares_la-ares_cookie.lo `test -f 'ares_cookie.c' || echo '$(srcdir)/'`ares_cookie.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares_cookie.Tpo $(DEPDIR)/libcares_la-ares_cookie.Plo
@@ -1257,6 +1257,13 @@ libcares_la-ares_getnameinfo.lo: ares_getnameinfo.c
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_getnameinfo.lo `test -f 'ares_getnameinfo.c' || echo '$(srcdir)/'`ares_getnameinfo.c
 
+libcares_la-ares_hosts_file.lo: ares_hosts_file.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_hosts_file.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_hosts_file.Tpo -c -o libcares_la-ares_hosts_file.lo `test -f 'ares_hosts_file.c' || echo '$(srcdir)/'`ares_hosts_file.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares_hosts_file.Tpo $(DEPDIR)/libcares_la-ares_hosts_file.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares_hosts_file.c' object='libcares_la-ares_hosts_file.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_hosts_file.lo `test -f 'ares_hosts_file.c' || echo '$(srcdir)/'`ares_hosts_file.c
+
 libcares_la-ares_init.lo: ares_init.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_init.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_init.Tpo -c -o libcares_la-ares_init.lo `test -f 'ares_init.c' || echo '$(srcdir)/'`ares_init.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares_init.Tpo $(DEPDIR)/libcares_la-ares_init.Plo
@@ -1285,12 +1292,12 @@ libcares_la-ares_options.lo: ares_options.c
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_options.lo `test -f 'ares_options.c' || echo '$(srcdir)/'`ares_options.c
 
-libcares_la-ares_platform.lo: ares_platform.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_platform.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_platform.Tpo -c -o libcares_la-ares_platform.lo `test -f 'ares_platform.c' || echo '$(srcdir)/'`ares_platform.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares_platform.Tpo $(DEPDIR)/libcares_la-ares_platform.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares_platform.c' object='libcares_la-ares_platform.lo' libtool=yes @AMDEPBACKSLASH@
+libcares_la-ares_parse_into_addrinfo.lo: ares_parse_into_addrinfo.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_parse_into_addrinfo.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_parse_into_addrinfo.Tpo -c -o libcares_la-ares_parse_into_addrinfo.lo `test -f 'ares_parse_into_addrinfo.c' || echo '$(srcdir)/'`ares_parse_into_addrinfo.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares_parse_into_addrinfo.Tpo $(DEPDIR)/libcares_la-ares_parse_into_addrinfo.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares_parse_into_addrinfo.c' object='libcares_la-ares_parse_into_addrinfo.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_platform.lo `test -f 'ares_platform.c' || echo '$(srcdir)/'`ares_platform.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_parse_into_addrinfo.lo `test -f 'ares_parse_into_addrinfo.c' || echo '$(srcdir)/'`ares_parse_into_addrinfo.c
 
 libcares_la-ares_process.lo: ares_process.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_process.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_process.Tpo -c -o libcares_la-ares_process.lo `test -f 'ares_process.c' || echo '$(srcdir)/'`ares_process.c
@@ -1327,6 +1334,27 @@ libcares_la-ares_send.lo: ares_send.c
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_send.lo `test -f 'ares_send.c' || echo '$(srcdir)/'`ares_send.c
 
+libcares_la-ares_set_socket_functions.lo: ares_set_socket_functions.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_set_socket_functions.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_set_socket_functions.Tpo -c -o libcares_la-ares_set_socket_functions.lo `test -f 'ares_set_socket_functions.c' || echo '$(srcdir)/'`ares_set_socket_functions.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares_set_socket_functions.Tpo $(DEPDIR)/libcares_la-ares_set_socket_functions.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares_set_socket_functions.c' object='libcares_la-ares_set_socket_functions.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_set_socket_functions.lo `test -f 'ares_set_socket_functions.c' || echo '$(srcdir)/'`ares_set_socket_functions.c
+
+libcares_la-ares_socket.lo: ares_socket.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_socket.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_socket.Tpo -c -o libcares_la-ares_socket.lo `test -f 'ares_socket.c' || echo '$(srcdir)/'`ares_socket.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares_socket.Tpo $(DEPDIR)/libcares_la-ares_socket.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares_socket.c' object='libcares_la-ares_socket.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_socket.lo `test -f 'ares_socket.c' || echo '$(srcdir)/'`ares_socket.c
+
+libcares_la-ares_sortaddrinfo.lo: ares_sortaddrinfo.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_sortaddrinfo.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_sortaddrinfo.Tpo -c -o libcares_la-ares_sortaddrinfo.lo `test -f 'ares_sortaddrinfo.c' || echo '$(srcdir)/'`ares_sortaddrinfo.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares_sortaddrinfo.Tpo $(DEPDIR)/libcares_la-ares_sortaddrinfo.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares_sortaddrinfo.c' object='libcares_la-ares_sortaddrinfo.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-ares_sortaddrinfo.lo `test -f 'ares_sortaddrinfo.c' || echo '$(srcdir)/'`ares_sortaddrinfo.c
+
 libcares_la-ares_strerror.lo: ares_strerror.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT libcares_la-ares_strerror.lo -MD -MP -MF $(DEPDIR)/libcares_la-ares_strerror.Tpo -c -o libcares_la-ares_strerror.lo `test -f 'ares_strerror.c' || echo '$(srcdir)/'`ares_strerror.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libcares_la-ares_strerror.Tpo $(DEPDIR)/libcares_la-ares_strerror.Plo
@@ -1404,61 +1432,75 @@ libcares_la-windows_port.lo: windows_port.c
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o libcares_la-windows_port.lo `test -f 'windows_port.c' || echo '$(srcdir)/'`windows_port.c
 
-dsa/libcares_la-ares__array.lo: dsa/ares__array.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares__array.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares__array.Tpo -c -o dsa/libcares_la-ares__array.lo `test -f 'dsa/ares__array.c' || echo '$(srcdir)/'`dsa/ares__array.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares__array.Tpo dsa/$(DEPDIR)/libcares_la-ares__array.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares__array.c' object='dsa/libcares_la-ares__array.lo' libtool=yes @AMDEPBACKSLASH@
+dsa/libcares_la-ares_array.lo: dsa/ares_array.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares_array.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares_array.Tpo -c -o dsa/libcares_la-ares_array.lo `test -f 'dsa/ares_array.c' || echo '$(srcdir)/'`dsa/ares_array.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares_array.Tpo dsa/$(DEPDIR)/libcares_la-ares_array.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares_array.c' object='dsa/libcares_la-ares_array.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares__array.lo `test -f 'dsa/ares__array.c' || echo '$(srcdir)/'`dsa/ares__array.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares_array.lo `test -f 'dsa/ares_array.c' || echo '$(srcdir)/'`dsa/ares_array.c
 
-dsa/libcares_la-ares__htable.lo: dsa/ares__htable.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares__htable.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares__htable.Tpo -c -o dsa/libcares_la-ares__htable.lo `test -f 'dsa/ares__htable.c' || echo '$(srcdir)/'`dsa/ares__htable.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares__htable.Tpo dsa/$(DEPDIR)/libcares_la-ares__htable.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares__htable.c' object='dsa/libcares_la-ares__htable.lo' libtool=yes @AMDEPBACKSLASH@
+dsa/libcares_la-ares_htable.lo: dsa/ares_htable.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares_htable.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares_htable.Tpo -c -o dsa/libcares_la-ares_htable.lo `test -f 'dsa/ares_htable.c' || echo '$(srcdir)/'`dsa/ares_htable.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares_htable.Tpo dsa/$(DEPDIR)/libcares_la-ares_htable.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares_htable.c' object='dsa/libcares_la-ares_htable.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares__htable.lo `test -f 'dsa/ares__htable.c' || echo '$(srcdir)/'`dsa/ares__htable.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares_htable.lo `test -f 'dsa/ares_htable.c' || echo '$(srcdir)/'`dsa/ares_htable.c
 
-dsa/libcares_la-ares__htable_asvp.lo: dsa/ares__htable_asvp.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares__htable_asvp.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares__htable_asvp.Tpo -c -o dsa/libcares_la-ares__htable_asvp.lo `test -f 'dsa/ares__htable_asvp.c' || echo '$(srcdir)/'`dsa/ares__htable_asvp.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares__htable_asvp.Tpo dsa/$(DEPDIR)/libcares_la-ares__htable_asvp.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares__htable_asvp.c' object='dsa/libcares_la-ares__htable_asvp.lo' libtool=yes @AMDEPBACKSLASH@
+dsa/libcares_la-ares_htable_asvp.lo: dsa/ares_htable_asvp.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares_htable_asvp.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares_htable_asvp.Tpo -c -o dsa/libcares_la-ares_htable_asvp.lo `test -f 'dsa/ares_htable_asvp.c' || echo '$(srcdir)/'`dsa/ares_htable_asvp.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares_htable_asvp.Tpo dsa/$(DEPDIR)/libcares_la-ares_htable_asvp.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares_htable_asvp.c' object='dsa/libcares_la-ares_htable_asvp.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares__htable_asvp.lo `test -f 'dsa/ares__htable_asvp.c' || echo '$(srcdir)/'`dsa/ares__htable_asvp.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares_htable_asvp.lo `test -f 'dsa/ares_htable_asvp.c' || echo '$(srcdir)/'`dsa/ares_htable_asvp.c
 
-dsa/libcares_la-ares__htable_strvp.lo: dsa/ares__htable_strvp.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares__htable_strvp.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares__htable_strvp.Tpo -c -o dsa/libcares_la-ares__htable_strvp.lo `test -f 'dsa/ares__htable_strvp.c' || echo '$(srcdir)/'`dsa/ares__htable_strvp.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares__htable_strvp.Tpo dsa/$(DEPDIR)/libcares_la-ares__htable_strvp.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares__htable_strvp.c' object='dsa/libcares_la-ares__htable_strvp.lo' libtool=yes @AMDEPBACKSLASH@
+dsa/libcares_la-ares_htable_dict.lo: dsa/ares_htable_dict.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares_htable_dict.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares_htable_dict.Tpo -c -o dsa/libcares_la-ares_htable_dict.lo `test -f 'dsa/ares_htable_dict.c' || echo '$(srcdir)/'`dsa/ares_htable_dict.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares_htable_dict.Tpo dsa/$(DEPDIR)/libcares_la-ares_htable_dict.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares_htable_dict.c' object='dsa/libcares_la-ares_htable_dict.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares__htable_strvp.lo `test -f 'dsa/ares__htable_strvp.c' || echo '$(srcdir)/'`dsa/ares__htable_strvp.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares_htable_dict.lo `test -f 'dsa/ares_htable_dict.c' || echo '$(srcdir)/'`dsa/ares_htable_dict.c
 
-dsa/libcares_la-ares__htable_szvp.lo: dsa/ares__htable_szvp.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares__htable_szvp.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares__htable_szvp.Tpo -c -o dsa/libcares_la-ares__htable_szvp.lo `test -f 'dsa/ares__htable_szvp.c' || echo '$(srcdir)/'`dsa/ares__htable_szvp.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares__htable_szvp.Tpo dsa/$(DEPDIR)/libcares_la-ares__htable_szvp.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares__htable_szvp.c' object='dsa/libcares_la-ares__htable_szvp.lo' libtool=yes @AMDEPBACKSLASH@
+dsa/libcares_la-ares_htable_strvp.lo: dsa/ares_htable_strvp.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares_htable_strvp.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares_htable_strvp.Tpo -c -o dsa/libcares_la-ares_htable_strvp.lo `test -f 'dsa/ares_htable_strvp.c' || echo '$(srcdir)/'`dsa/ares_htable_strvp.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares_htable_strvp.Tpo dsa/$(DEPDIR)/libcares_la-ares_htable_strvp.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares_htable_strvp.c' object='dsa/libcares_la-ares_htable_strvp.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares__htable_szvp.lo `test -f 'dsa/ares__htable_szvp.c' || echo '$(srcdir)/'`dsa/ares__htable_szvp.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares_htable_strvp.lo `test -f 'dsa/ares_htable_strvp.c' || echo '$(srcdir)/'`dsa/ares_htable_strvp.c
 
-dsa/libcares_la-ares__htable_vpvp.lo: dsa/ares__htable_vpvp.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares__htable_vpvp.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares__htable_vpvp.Tpo -c -o dsa/libcares_la-ares__htable_vpvp.lo `test -f 'dsa/ares__htable_vpvp.c' || echo '$(srcdir)/'`dsa/ares__htable_vpvp.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares__htable_vpvp.Tpo dsa/$(DEPDIR)/libcares_la-ares__htable_vpvp.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares__htable_vpvp.c' object='dsa/libcares_la-ares__htable_vpvp.lo' libtool=yes @AMDEPBACKSLASH@
+dsa/libcares_la-ares_htable_szvp.lo: dsa/ares_htable_szvp.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares_htable_szvp.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares_htable_szvp.Tpo -c -o dsa/libcares_la-ares_htable_szvp.lo `test -f 'dsa/ares_htable_szvp.c' || echo '$(srcdir)/'`dsa/ares_htable_szvp.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares_htable_szvp.Tpo dsa/$(DEPDIR)/libcares_la-ares_htable_szvp.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares_htable_szvp.c' object='dsa/libcares_la-ares_htable_szvp.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares__htable_vpvp.lo `test -f 'dsa/ares__htable_vpvp.c' || echo '$(srcdir)/'`dsa/ares__htable_vpvp.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares_htable_szvp.lo `test -f 'dsa/ares_htable_szvp.c' || echo '$(srcdir)/'`dsa/ares_htable_szvp.c
 
-dsa/libcares_la-ares__llist.lo: dsa/ares__llist.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares__llist.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares__llist.Tpo -c -o dsa/libcares_la-ares__llist.lo `test -f 'dsa/ares__llist.c' || echo '$(srcdir)/'`dsa/ares__llist.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares__llist.Tpo dsa/$(DEPDIR)/libcares_la-ares__llist.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares__llist.c' object='dsa/libcares_la-ares__llist.lo' libtool=yes @AMDEPBACKSLASH@
+dsa/libcares_la-ares_htable_vpstr.lo: dsa/ares_htable_vpstr.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares_htable_vpstr.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares_htable_vpstr.Tpo -c -o dsa/libcares_la-ares_htable_vpstr.lo `test -f 'dsa/ares_htable_vpstr.c' || echo '$(srcdir)/'`dsa/ares_htable_vpstr.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares_htable_vpstr.Tpo dsa/$(DEPDIR)/libcares_la-ares_htable_vpstr.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares_htable_vpstr.c' object='dsa/libcares_la-ares_htable_vpstr.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares__llist.lo `test -f 'dsa/ares__llist.c' || echo '$(srcdir)/'`dsa/ares__llist.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares_htable_vpstr.lo `test -f 'dsa/ares_htable_vpstr.c' || echo '$(srcdir)/'`dsa/ares_htable_vpstr.c
 
-dsa/libcares_la-ares__slist.lo: dsa/ares__slist.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares__slist.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares__slist.Tpo -c -o dsa/libcares_la-ares__slist.lo `test -f 'dsa/ares__slist.c' || echo '$(srcdir)/'`dsa/ares__slist.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares__slist.Tpo dsa/$(DEPDIR)/libcares_la-ares__slist.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares__slist.c' object='dsa/libcares_la-ares__slist.lo' libtool=yes @AMDEPBACKSLASH@
+dsa/libcares_la-ares_htable_vpvp.lo: dsa/ares_htable_vpvp.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares_htable_vpvp.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares_htable_vpvp.Tpo -c -o dsa/libcares_la-ares_htable_vpvp.lo `test -f 'dsa/ares_htable_vpvp.c' || echo '$(srcdir)/'`dsa/ares_htable_vpvp.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares_htable_vpvp.Tpo dsa/$(DEPDIR)/libcares_la-ares_htable_vpvp.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares_htable_vpvp.c' object='dsa/libcares_la-ares_htable_vpvp.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares__slist.lo `test -f 'dsa/ares__slist.c' || echo '$(srcdir)/'`dsa/ares__slist.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares_htable_vpvp.lo `test -f 'dsa/ares_htable_vpvp.c' || echo '$(srcdir)/'`dsa/ares_htable_vpvp.c
+
+dsa/libcares_la-ares_llist.lo: dsa/ares_llist.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares_llist.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares_llist.Tpo -c -o dsa/libcares_la-ares_llist.lo `test -f 'dsa/ares_llist.c' || echo '$(srcdir)/'`dsa/ares_llist.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares_llist.Tpo dsa/$(DEPDIR)/libcares_la-ares_llist.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares_llist.c' object='dsa/libcares_la-ares_llist.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares_llist.lo `test -f 'dsa/ares_llist.c' || echo '$(srcdir)/'`dsa/ares_llist.c
+
+dsa/libcares_la-ares_slist.lo: dsa/ares_slist.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT dsa/libcares_la-ares_slist.lo -MD -MP -MF dsa/$(DEPDIR)/libcares_la-ares_slist.Tpo -c -o dsa/libcares_la-ares_slist.lo `test -f 'dsa/ares_slist.c' || echo '$(srcdir)/'`dsa/ares_slist.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) dsa/$(DEPDIR)/libcares_la-ares_slist.Tpo dsa/$(DEPDIR)/libcares_la-ares_slist.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='dsa/ares_slist.c' object='dsa/libcares_la-ares_slist.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o dsa/libcares_la-ares_slist.lo `test -f 'dsa/ares_slist.c' || echo '$(srcdir)/'`dsa/ares_slist.c
 
 event/libcares_la-ares_event_configchg.lo: event/ares_event_configchg.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT event/libcares_la-ares_event_configchg.lo -MD -MP -MF event/$(DEPDIR)/libcares_la-ares_event_configchg.Tpo -c -o event/libcares_la-ares_event_configchg.lo `test -f 'event/ares_event_configchg.c' || echo '$(srcdir)/'`event/ares_event_configchg.c
@@ -1670,19 +1712,12 @@ record/libcares_la-ares_dns_write.lo: record/ares_dns_write.c
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o record/libcares_la-ares_dns_write.lo `test -f 'record/ares_dns_write.c' || echo '$(srcdir)/'`record/ares_dns_write.c
 
-str/libcares_la-ares__buf.lo: str/ares__buf.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT str/libcares_la-ares__buf.lo -MD -MP -MF str/$(DEPDIR)/libcares_la-ares__buf.Tpo -c -o str/libcares_la-ares__buf.lo `test -f 'str/ares__buf.c' || echo '$(srcdir)/'`str/ares__buf.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) str/$(DEPDIR)/libcares_la-ares__buf.Tpo str/$(DEPDIR)/libcares_la-ares__buf.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='str/ares__buf.c' object='str/libcares_la-ares__buf.lo' libtool=yes @AMDEPBACKSLASH@
+str/libcares_la-ares_buf.lo: str/ares_buf.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT str/libcares_la-ares_buf.lo -MD -MP -MF str/$(DEPDIR)/libcares_la-ares_buf.Tpo -c -o str/libcares_la-ares_buf.lo `test -f 'str/ares_buf.c' || echo '$(srcdir)/'`str/ares_buf.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) str/$(DEPDIR)/libcares_la-ares_buf.Tpo str/$(DEPDIR)/libcares_la-ares_buf.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='str/ares_buf.c' object='str/libcares_la-ares_buf.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o str/libcares_la-ares__buf.lo `test -f 'str/ares__buf.c' || echo '$(srcdir)/'`str/ares__buf.c
-
-str/libcares_la-ares_strcasecmp.lo: str/ares_strcasecmp.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT str/libcares_la-ares_strcasecmp.lo -MD -MP -MF str/$(DEPDIR)/libcares_la-ares_strcasecmp.Tpo -c -o str/libcares_la-ares_strcasecmp.lo `test -f 'str/ares_strcasecmp.c' || echo '$(srcdir)/'`str/ares_strcasecmp.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) str/$(DEPDIR)/libcares_la-ares_strcasecmp.Tpo str/$(DEPDIR)/libcares_la-ares_strcasecmp.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='str/ares_strcasecmp.c' object='str/libcares_la-ares_strcasecmp.lo' libtool=yes @AMDEPBACKSLASH@
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o str/libcares_la-ares_strcasecmp.lo `test -f 'str/ares_strcasecmp.c' || echo '$(srcdir)/'`str/ares_strcasecmp.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o str/libcares_la-ares_buf.lo `test -f 'str/ares_buf.c' || echo '$(srcdir)/'`str/ares_buf.c
 
 str/libcares_la-ares_str.lo: str/ares_str.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT str/libcares_la-ares_str.lo -MD -MP -MF str/$(DEPDIR)/libcares_la-ares_str.Tpo -c -o str/libcares_la-ares_str.lo `test -f 'str/ares_str.c' || echo '$(srcdir)/'`str/ares_str.c
@@ -1698,26 +1733,26 @@ str/libcares_la-ares_strsplit.lo: str/ares_strsplit.c
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o str/libcares_la-ares_strsplit.lo `test -f 'str/ares_strsplit.c' || echo '$(srcdir)/'`str/ares_strsplit.c
 
-util/libcares_la-ares__iface_ips.lo: util/ares__iface_ips.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT util/libcares_la-ares__iface_ips.lo -MD -MP -MF util/$(DEPDIR)/libcares_la-ares__iface_ips.Tpo -c -o util/libcares_la-ares__iface_ips.lo `test -f 'util/ares__iface_ips.c' || echo '$(srcdir)/'`util/ares__iface_ips.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) util/$(DEPDIR)/libcares_la-ares__iface_ips.Tpo util/$(DEPDIR)/libcares_la-ares__iface_ips.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='util/ares__iface_ips.c' object='util/libcares_la-ares__iface_ips.lo' libtool=yes @AMDEPBACKSLASH@
+util/libcares_la-ares_iface_ips.lo: util/ares_iface_ips.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT util/libcares_la-ares_iface_ips.lo -MD -MP -MF util/$(DEPDIR)/libcares_la-ares_iface_ips.Tpo -c -o util/libcares_la-ares_iface_ips.lo `test -f 'util/ares_iface_ips.c' || echo '$(srcdir)/'`util/ares_iface_ips.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) util/$(DEPDIR)/libcares_la-ares_iface_ips.Tpo util/$(DEPDIR)/libcares_la-ares_iface_ips.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='util/ares_iface_ips.c' object='util/libcares_la-ares_iface_ips.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o util/libcares_la-ares__iface_ips.lo `test -f 'util/ares__iface_ips.c' || echo '$(srcdir)/'`util/ares__iface_ips.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o util/libcares_la-ares_iface_ips.lo `test -f 'util/ares_iface_ips.c' || echo '$(srcdir)/'`util/ares_iface_ips.c
 
-util/libcares_la-ares__threads.lo: util/ares__threads.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT util/libcares_la-ares__threads.lo -MD -MP -MF util/$(DEPDIR)/libcares_la-ares__threads.Tpo -c -o util/libcares_la-ares__threads.lo `test -f 'util/ares__threads.c' || echo '$(srcdir)/'`util/ares__threads.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) util/$(DEPDIR)/libcares_la-ares__threads.Tpo util/$(DEPDIR)/libcares_la-ares__threads.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='util/ares__threads.c' object='util/libcares_la-ares__threads.lo' libtool=yes @AMDEPBACKSLASH@
+util/libcares_la-ares_threads.lo: util/ares_threads.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT util/libcares_la-ares_threads.lo -MD -MP -MF util/$(DEPDIR)/libcares_la-ares_threads.Tpo -c -o util/libcares_la-ares_threads.lo `test -f 'util/ares_threads.c' || echo '$(srcdir)/'`util/ares_threads.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) util/$(DEPDIR)/libcares_la-ares_threads.Tpo util/$(DEPDIR)/libcares_la-ares_threads.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='util/ares_threads.c' object='util/libcares_la-ares_threads.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o util/libcares_la-ares__threads.lo `test -f 'util/ares__threads.c' || echo '$(srcdir)/'`util/ares__threads.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o util/libcares_la-ares_threads.lo `test -f 'util/ares_threads.c' || echo '$(srcdir)/'`util/ares_threads.c
 
-util/libcares_la-ares__timeval.lo: util/ares__timeval.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT util/libcares_la-ares__timeval.lo -MD -MP -MF util/$(DEPDIR)/libcares_la-ares__timeval.Tpo -c -o util/libcares_la-ares__timeval.lo `test -f 'util/ares__timeval.c' || echo '$(srcdir)/'`util/ares__timeval.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) util/$(DEPDIR)/libcares_la-ares__timeval.Tpo util/$(DEPDIR)/libcares_la-ares__timeval.Plo
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='util/ares__timeval.c' object='util/libcares_la-ares__timeval.lo' libtool=yes @AMDEPBACKSLASH@
+util/libcares_la-ares_timeval.lo: util/ares_timeval.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT util/libcares_la-ares_timeval.lo -MD -MP -MF util/$(DEPDIR)/libcares_la-ares_timeval.Tpo -c -o util/libcares_la-ares_timeval.lo `test -f 'util/ares_timeval.c' || echo '$(srcdir)/'`util/ares_timeval.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) util/$(DEPDIR)/libcares_la-ares_timeval.Tpo util/$(DEPDIR)/libcares_la-ares_timeval.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='util/ares_timeval.c' object='util/libcares_la-ares_timeval.lo' libtool=yes @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o util/libcares_la-ares__timeval.lo `test -f 'util/ares__timeval.c' || echo '$(srcdir)/'`util/ares__timeval.c
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o util/libcares_la-ares_timeval.lo `test -f 'util/ares_timeval.c' || echo '$(srcdir)/'`util/ares_timeval.c
 
 util/libcares_la-ares_math.lo: util/ares_math.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT util/libcares_la-ares_math.lo -MD -MP -MF util/$(DEPDIR)/libcares_la-ares_math.Tpo -c -o util/libcares_la-ares_math.lo `test -f 'util/ares_math.c' || echo '$(srcdir)/'`util/ares_math.c
@@ -1733,6 +1768,13 @@ util/libcares_la-ares_rand.lo: util/ares_rand.c
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o util/libcares_la-ares_rand.lo `test -f 'util/ares_rand.c' || echo '$(srcdir)/'`util/ares_rand.c
 
+util/libcares_la-ares_uri.lo: util/ares_uri.c
+@am__fastdepCC_TRUE@	$(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -MT util/libcares_la-ares_uri.lo -MD -MP -MF util/$(DEPDIR)/libcares_la-ares_uri.Tpo -c -o util/libcares_la-ares_uri.lo `test -f 'util/ares_uri.c' || echo '$(srcdir)/'`util/ares_uri.c
+@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) util/$(DEPDIR)/libcares_la-ares_uri.Tpo util/$(DEPDIR)/libcares_la-ares_uri.Plo
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='util/ares_uri.c' object='util/libcares_la-ares_uri.lo' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libcares_la_CPPFLAGS) $(CPPFLAGS) $(libcares_la_CFLAGS) $(CFLAGS) -c -o util/libcares_la-ares_uri.lo `test -f 'util/ares_uri.c' || echo '$(srcdir)/'`util/ares_uri.c
+
 mostlyclean-libtool:
 	-rm -f *.lo
 
@@ -1958,15 +2000,12 @@ clean-am: clean-generic clean-libLTLIBRARIES clean-libtool \
 	mostlyclean-am
 
 distclean: distclean-recursive
-	-rm -f ./$(DEPDIR)/libcares_la-ares__addrinfo2hostent.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares__addrinfo_localhost.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares__close_sockets.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares__hosts_file.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares__parse_into_addrinfo.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares__socket.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares__sortaddrinfo.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_addrinfo2hostent.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_addrinfo_localhost.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_android.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_cancel.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_close_sockets.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_conn.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_cookie.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_data.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_destroy.Plo
@@ -1978,16 +2017,20 @@ distclean: distclean-recursive
 	-rm -f ./$(DEPDIR)/libcares_la-ares_gethostbyaddr.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_gethostbyname.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_getnameinfo.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_hosts_file.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_init.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_library_init.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_metrics.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_options.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares_platform.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_parse_into_addrinfo.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_process.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_qcache.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_query.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_search.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_send.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_set_socket_functions.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_socket.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_sortaddrinfo.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_strerror.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_sysconfig.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_sysconfig_files.Plo
@@ -1999,14 +2042,16 @@ distclean: distclean-recursive
 	-rm -f ./$(DEPDIR)/libcares_la-inet_net_pton.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-inet_ntop.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-windows_port.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__array.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__htable.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__htable_asvp.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__htable_strvp.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__htable_szvp.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__htable_vpvp.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__llist.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__slist.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_array.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable_asvp.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable_dict.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable_strvp.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable_szvp.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable_vpstr.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable_vpvp.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_llist.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_slist.Plo
 	-rm -f event/$(DEPDIR)/libcares_la-ares_event_configchg.Plo
 	-rm -f event/$(DEPDIR)/libcares_la-ares_event_epoll.Plo
 	-rm -f event/$(DEPDIR)/libcares_la-ares_event_kqueue.Plo
@@ -2037,15 +2082,15 @@ distclean: distclean-recursive
 	-rm -f record/$(DEPDIR)/libcares_la-ares_dns_parse.Plo
 	-rm -f record/$(DEPDIR)/libcares_la-ares_dns_record.Plo
 	-rm -f record/$(DEPDIR)/libcares_la-ares_dns_write.Plo
-	-rm -f str/$(DEPDIR)/libcares_la-ares__buf.Plo
+	-rm -f str/$(DEPDIR)/libcares_la-ares_buf.Plo
 	-rm -f str/$(DEPDIR)/libcares_la-ares_str.Plo
-	-rm -f str/$(DEPDIR)/libcares_la-ares_strcasecmp.Plo
 	-rm -f str/$(DEPDIR)/libcares_la-ares_strsplit.Plo
-	-rm -f util/$(DEPDIR)/libcares_la-ares__iface_ips.Plo
-	-rm -f util/$(DEPDIR)/libcares_la-ares__threads.Plo
-	-rm -f util/$(DEPDIR)/libcares_la-ares__timeval.Plo
+	-rm -f util/$(DEPDIR)/libcares_la-ares_iface_ips.Plo
 	-rm -f util/$(DEPDIR)/libcares_la-ares_math.Plo
 	-rm -f util/$(DEPDIR)/libcares_la-ares_rand.Plo
+	-rm -f util/$(DEPDIR)/libcares_la-ares_threads.Plo
+	-rm -f util/$(DEPDIR)/libcares_la-ares_timeval.Plo
+	-rm -f util/$(DEPDIR)/libcares_la-ares_uri.Plo
 	-rm -f Makefile
 distclean-am: clean-am distclean-compile distclean-generic \
 	distclean-hdr distclean-tags
@@ -2091,15 +2136,12 @@ install-ps-am:
 installcheck-am:
 
 maintainer-clean: maintainer-clean-recursive
-	-rm -f ./$(DEPDIR)/libcares_la-ares__addrinfo2hostent.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares__addrinfo_localhost.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares__close_sockets.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares__hosts_file.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares__parse_into_addrinfo.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares__socket.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares__sortaddrinfo.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_addrinfo2hostent.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_addrinfo_localhost.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_android.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_cancel.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_close_sockets.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_conn.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_cookie.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_data.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_destroy.Plo
@@ -2111,16 +2153,20 @@ maintainer-clean: maintainer-clean-recursive
 	-rm -f ./$(DEPDIR)/libcares_la-ares_gethostbyaddr.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_gethostbyname.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_getnameinfo.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_hosts_file.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_init.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_library_init.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_metrics.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_options.Plo
-	-rm -f ./$(DEPDIR)/libcares_la-ares_platform.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_parse_into_addrinfo.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_process.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_qcache.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_query.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_search.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_send.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_set_socket_functions.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_socket.Plo
+	-rm -f ./$(DEPDIR)/libcares_la-ares_sortaddrinfo.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_strerror.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_sysconfig.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_sysconfig_files.Plo
@@ -2132,14 +2178,16 @@ maintainer-clean: maintainer-clean-recursive
 	-rm -f ./$(DEPDIR)/libcares_la-inet_net_pton.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-inet_ntop.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-windows_port.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__array.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__htable.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__htable_asvp.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__htable_strvp.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__htable_szvp.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__htable_vpvp.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__llist.Plo
-	-rm -f dsa/$(DEPDIR)/libcares_la-ares__slist.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_array.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable_asvp.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable_dict.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable_strvp.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable_szvp.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable_vpstr.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_htable_vpvp.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_llist.Plo
+	-rm -f dsa/$(DEPDIR)/libcares_la-ares_slist.Plo
 	-rm -f event/$(DEPDIR)/libcares_la-ares_event_configchg.Plo
 	-rm -f event/$(DEPDIR)/libcares_la-ares_event_epoll.Plo
 	-rm -f event/$(DEPDIR)/libcares_la-ares_event_kqueue.Plo
@@ -2170,15 +2218,15 @@ maintainer-clean: maintainer-clean-recursive
 	-rm -f record/$(DEPDIR)/libcares_la-ares_dns_parse.Plo
 	-rm -f record/$(DEPDIR)/libcares_la-ares_dns_record.Plo
 	-rm -f record/$(DEPDIR)/libcares_la-ares_dns_write.Plo
-	-rm -f str/$(DEPDIR)/libcares_la-ares__buf.Plo
+	-rm -f str/$(DEPDIR)/libcares_la-ares_buf.Plo
 	-rm -f str/$(DEPDIR)/libcares_la-ares_str.Plo
-	-rm -f str/$(DEPDIR)/libcares_la-ares_strcasecmp.Plo
 	-rm -f str/$(DEPDIR)/libcares_la-ares_strsplit.Plo
-	-rm -f util/$(DEPDIR)/libcares_la-ares__iface_ips.Plo
-	-rm -f util/$(DEPDIR)/libcares_la-ares__threads.Plo
-	-rm -f util/$(DEPDIR)/libcares_la-ares__timeval.Plo
+	-rm -f util/$(DEPDIR)/libcares_la-ares_iface_ips.Plo
 	-rm -f util/$(DEPDIR)/libcares_la-ares_math.Plo
 	-rm -f util/$(DEPDIR)/libcares_la-ares_rand.Plo
+	-rm -f util/$(DEPDIR)/libcares_la-ares_threads.Plo
+	-rm -f util/$(DEPDIR)/libcares_la-ares_timeval.Plo
+	-rm -f util/$(DEPDIR)/libcares_la-ares_uri.Plo
 	-rm -f Makefile
 maintainer-clean-am: distclean-am maintainer-clean-generic
 
diff --git a/deps/cares/src/lib/Makefile.inc b/deps/cares/src/lib/Makefile.inc
index 8fa434c3e2c860..10e4edf5c761a0 100644
--- a/deps/cares/src/lib/Makefile.inc
+++ b/deps/cares/src/lib/Makefile.inc
@@ -1,15 +1,12 @@
 # Copyright (C) The c-ares project and its contributors
 # SPDX-License-Identifier: MIT
 
-CSOURCES = ares__addrinfo2hostent.c	\
-  ares__addrinfo_localhost.c		\
-  ares__close_sockets.c			\
-  ares__hosts_file.c			\
-  ares__parse_into_addrinfo.c		\
-  ares__socket.c			\
-  ares__sortaddrinfo.c			\
+CSOURCES = ares_addrinfo2hostent.c	\
+  ares_addrinfo_localhost.c		\
   ares_android.c			\
   ares_cancel.c				\
+  ares_close_sockets.c			\
+  ares_conn.c				\
   ares_cookie.c				\
   ares_data.c				\
   ares_destroy.c			\
@@ -21,16 +18,20 @@ CSOURCES = ares__addrinfo2hostent.c	\
   ares_gethostbyaddr.c			\
   ares_gethostbyname.c			\
   ares_getnameinfo.c			\
+  ares_hosts_file.c			\
   ares_init.c				\
   ares_library_init.c			\
   ares_metrics.c			\
   ares_options.c			\
-  ares_platform.c			\
+  ares_parse_into_addrinfo.c		\
   ares_process.c			\
   ares_qcache.c				\
   ares_query.c				\
   ares_search.c				\
   ares_send.c				\
+  ares_set_socket_functions.c		\
+  ares_socket.c				\
+  ares_sortaddrinfo.c			\
   ares_strerror.c			\
   ares_sysconfig.c			\
   ares_sysconfig_files.c		\
@@ -42,14 +43,16 @@ CSOURCES = ares__addrinfo2hostent.c	\
   inet_net_pton.c			\
   inet_ntop.c				\
   windows_port.c			\
-  dsa/ares__array.c			\
-  dsa/ares__htable.c			\
-  dsa/ares__htable_asvp.c		\
-  dsa/ares__htable_strvp.c		\
-  dsa/ares__htable_szvp.c		\
-  dsa/ares__htable_vpvp.c		\
-  dsa/ares__llist.c			\
-  dsa/ares__slist.c			\
+  dsa/ares_array.c			\
+  dsa/ares_htable.c			\
+  dsa/ares_htable_asvp.c		\
+  dsa/ares_htable_dict.c		\
+  dsa/ares_htable_strvp.c		\
+  dsa/ares_htable_szvp.c		\
+  dsa/ares_htable_vpstr.c		\
+  dsa/ares_htable_vpvp.c		\
+  dsa/ares_llist.c			\
+  dsa/ares_slist.c			\
   event/ares_event_configchg.c		\
   event/ares_event_epoll.c		\
   event/ares_event_kqueue.c		\
@@ -80,41 +83,47 @@ CSOURCES = ares__addrinfo2hostent.c	\
   record/ares_dns_parse.c		\
   record/ares_dns_record.c		\
   record/ares_dns_write.c		\
-  str/ares__buf.c			\
-  str/ares_strcasecmp.c			\
+  str/ares_buf.c			\
   str/ares_str.c			\
   str/ares_strsplit.c			\
-  util/ares__iface_ips.c		\
-  util/ares__threads.c			\
-  util/ares__timeval.c			\
+  util/ares_iface_ips.c			\
+  util/ares_threads.c			\
+  util/ares_timeval.c			\
   util/ares_math.c			\
-  util/ares_rand.c
+  util/ares_rand.c			\
+  util/ares_uri.c
 
 HHEADERS = ares_android.h			\
+  ares_conn.h				\
   ares_data.h				\
   ares_getenv.h				\
   ares_inet_net_pton.h			\
   ares_ipv6.h				\
-  ares_platform.h			\
   ares_private.h			\
   ares_setup.h				\
-  dsa/ares__array.h			\
-  dsa/ares__htable.h			\
-  dsa/ares__htable_asvp.h		\
-  dsa/ares__htable_strvp.h		\
-  dsa/ares__htable_szvp.h		\
-  dsa/ares__htable_vpvp.h		\
-  dsa/ares__llist.h			\
-  dsa/ares__slist.h			\
+  ares_socket.h				\
+  dsa/ares_htable.h			\
+  dsa/ares_slist.h			\
   event/ares_event.h			\
   event/ares_event_win32.h		\
+  include/ares_array.h			\
+  include/ares_buf.h			\
+  include/ares_htable_asvp.h		\
+  include/ares_htable_dict.h		\
+  include/ares_htable_strvp.h		\
+  include/ares_htable_szvp.h		\
+  include/ares_htable_vpstr.h		\
+  include/ares_htable_vpvp.h		\
+  include/ares_llist.h			\
+  include/ares_mem.h			\
+  include/ares_str.h			\
   record/ares_dns_multistring.h		\
   record/ares_dns_private.h		\
-  str/ares__buf.h			\
-  str/ares_strcasecmp.h			\
-  str/ares_str.h			\
   str/ares_strsplit.h			\
-  util/ares__iface_ips.h		\
-  util/ares__threads.h			\
+  util/ares_iface_ips.h			\
+  util/ares_math.h			\
+  util/ares_rand.h			\
+  util/ares_time.h			\
+  util/ares_threads.h			\
+  util/ares_uri.h			\
   thirdparty/apple/dnsinfo.h
-
diff --git a/deps/cares/src/lib/ares__socket.c b/deps/cares/src/lib/ares__socket.c
deleted file mode 100644
index 86e281fcddadd4..00000000000000
--- a/deps/cares/src/lib/ares__socket.c
+++ /dev/null
@@ -1,764 +0,0 @@
-/* MIT License
- *
- * Copyright (c) Massachusetts Institute of Technology
- * Copyright (c) The c-ares project and its contributors
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- * SPDX-License-Identifier: MIT
- */
-#include "ares_private.h"
-
-#ifdef HAVE_SYS_UIO_H
-#  include <sys/uio.h>
-#endif
-#ifdef HAVE_NETINET_IN_H
-#  include <netinet/in.h>
-#endif
-#ifdef HAVE_NETINET_TCP_H
-#  include <netinet/tcp.h>
-#endif
-#ifdef HAVE_NETDB_H
-#  include <netdb.h>
-#endif
-#ifdef HAVE_ARPA_INET_H
-#  include <arpa/inet.h>
-#endif
-
-#ifdef HAVE_STRINGS_H
-#  include <strings.h>
-#endif
-#ifdef HAVE_SYS_IOCTL_H
-#  include <sys/ioctl.h>
-#endif
-#ifdef NETWARE
-#  include <sys/filio.h>
-#endif
-
-#include <assert.h>
-#include <fcntl.h>
-#include <limits.h>
-
-#if defined(__linux__) && defined(TCP_FASTOPEN_CONNECT)
-#  define TFO_SUPPORTED      1
-#  define TFO_SKIP_CONNECT   0
-#  define TFO_USE_SENDTO     0
-#  define TFO_USE_CONNECTX   0
-#  define TFO_CLIENT_SOCKOPT TCP_FASTOPEN_CONNECT
-#elif defined(__FreeBSD__) && defined(TCP_FASTOPEN)
-#  define TFO_SUPPORTED      1
-#  define TFO_SKIP_CONNECT   1
-#  define TFO_USE_SENDTO     1
-#  define TFO_USE_CONNECTX   0
-#  define TFO_CLIENT_SOCKOPT TCP_FASTOPEN
-#elif defined(__APPLE__) && defined(HAVE_CONNECTX)
-#  define TFO_SUPPORTED    1
-#  define TFO_SKIP_CONNECT 0
-#  define TFO_USE_SENDTO   0
-#  define TFO_USE_CONNECTX 1
-#  undef TFO_CLIENT_SOCKOPT
-#else
-#  define TFO_SUPPORTED 0
-#endif
-
-
-#ifndef HAVE_WRITEV
-/* Structure for scatter/gather I/O. */
-struct iovec {
-  void  *iov_base; /* Pointer to data. */
-  size_t iov_len;  /* Length of data.  */
-};
-#endif
-
-
-/* Return 1 if the specified error number describes a readiness error, or 0
- * otherwise. This is mostly for HP-UX, which could return EAGAIN or
- * EWOULDBLOCK. See this man page
- *
- * http://devrsrc1.external.hp.com/STKS/cgi-bin/man2html?
- *     manpage=/usr/share/man/man2.Z/send.2
- */
-ares_bool_t ares__socket_try_again(int errnum)
-{
-#if !defined EWOULDBLOCK && !defined EAGAIN
-#  error "Neither EWOULDBLOCK nor EAGAIN defined"
-#endif
-
-#ifdef EWOULDBLOCK
-  if (errnum == EWOULDBLOCK) {
-    return ARES_TRUE;
-  }
-#endif
-
-#if defined EAGAIN && EAGAIN != EWOULDBLOCK
-  if (errnum == EAGAIN) {
-    return ARES_TRUE;
-  }
-#endif
-
-  return ARES_FALSE;
-}
-
-ares_ssize_t ares__socket_recv(ares_channel_t *channel, ares_socket_t s,
-                               void *data, size_t data_len)
-{
-  if (channel->sock_funcs && channel->sock_funcs->arecvfrom) {
-    return channel->sock_funcs->arecvfrom(s, data, data_len, 0, 0, 0,
-                                          channel->sock_func_cb_data);
-  }
-
-  return (ares_ssize_t)recv((RECV_TYPE_ARG1)s, (RECV_TYPE_ARG2)data,
-                            (RECV_TYPE_ARG3)data_len, (RECV_TYPE_ARG4)(0));
-}
-
-ares_ssize_t ares__socket_recvfrom(ares_channel_t *channel, ares_socket_t s,
-                                   void *data, size_t data_len, int flags,
-                                   struct sockaddr *from,
-                                   ares_socklen_t  *from_len)
-{
-  if (channel->sock_funcs && channel->sock_funcs->arecvfrom) {
-    return channel->sock_funcs->arecvfrom(s, data, data_len, flags, from,
-                                          from_len, channel->sock_func_cb_data);
-  }
-
-#ifdef HAVE_RECVFROM
-  return (ares_ssize_t)recvfrom(s, data, (RECVFROM_TYPE_ARG3)data_len, flags,
-                                from, from_len);
-#else
-  return ares__socket_recv(channel, s, data, data_len);
-#endif
-}
-
-/* Use like:
- *   struct sockaddr_storage sa_storage;
- *   ares_socklen_t          salen     = sizeof(sa_storage);
- *   struct sockaddr        *sa        = (struct sockaddr *)&sa_storage;
- *   ares__conn_set_sockaddr(conn, sa, &salen);
- */
-static ares_status_t ares__conn_set_sockaddr(const ares_conn_t *conn,
-                                             struct sockaddr   *sa,
-                                             ares_socklen_t    *salen)
-{
-  const ares_server_t *server = conn->server;
-  unsigned short       port =
-    conn->flags & ARES_CONN_FLAG_TCP ? server->tcp_port : server->udp_port;
-  struct sockaddr_in  *sin;
-  struct sockaddr_in6 *sin6;
-
-  switch (server->addr.family) {
-    case AF_INET:
-      sin = (struct sockaddr_in *)(void *)sa;
-      if (*salen < (ares_socklen_t)sizeof(*sin)) {
-        return ARES_EFORMERR;
-      }
-      *salen = sizeof(*sin);
-      memset(sin, 0, sizeof(*sin));
-      sin->sin_family = AF_INET;
-      sin->sin_port   = htons(port);
-      memcpy(&sin->sin_addr, &server->addr.addr.addr4, sizeof(sin->sin_addr));
-      return ARES_SUCCESS;
-    case AF_INET6:
-      sin6 = (struct sockaddr_in6 *)(void *)sa;
-      if (*salen < (ares_socklen_t)sizeof(*sin6)) {
-        return ARES_EFORMERR;
-      }
-      *salen = sizeof(*sin6);
-      memset(sin6, 0, sizeof(*sin6));
-      sin6->sin6_family = AF_INET6;
-      sin6->sin6_port   = htons(port);
-      memcpy(&sin6->sin6_addr, &server->addr.addr.addr6,
-             sizeof(sin6->sin6_addr));
-#ifdef HAVE_STRUCT_SOCKADDR_IN6_SIN6_SCOPE_ID
-      sin6->sin6_scope_id = server->ll_scope;
-#endif
-      return ARES_SUCCESS;
-    default:
-      break;
-  }
-
-  return ARES_EBADFAMILY;
-}
-
-static ares_status_t ares_conn_set_self_ip(ares_conn_t *conn, ares_bool_t early)
-{
-  struct sockaddr_storage sa_storage;
-  int                     rv;
-  ares_socklen_t          len = sizeof(sa_storage);
-
-  /* We call this twice on TFO, if we already have the IP we can go ahead and
-   * skip processing */
-  if (!early && conn->self_ip.family != AF_UNSPEC) {
-    return ARES_SUCCESS;
-  }
-
-  memset(&sa_storage, 0, sizeof(sa_storage));
-
-  rv = getsockname(conn->fd, (struct sockaddr *)(void *)&sa_storage, &len);
-  if (rv != 0) {
-    /* During TCP FastOpen, we can't get the IP this early since connect()
-     * may not be called.  That's ok, we'll try again later */
-    if (early && conn->flags & ARES_CONN_FLAG_TCP &&
-        conn->flags & ARES_CONN_FLAG_TFO) {
-      memset(&conn->self_ip, 0, sizeof(conn->self_ip));
-      return ARES_SUCCESS;
-    }
-    return ARES_ECONNREFUSED;
-  }
-
-  if (!ares_sockaddr_to_ares_addr(&conn->self_ip, NULL,
-                                  (struct sockaddr *)(void *)&sa_storage)) {
-    return ARES_ECONNREFUSED;
-  }
-
-  return ARES_SUCCESS;
-}
-
-ares_ssize_t ares__conn_write(ares_conn_t *conn, const void *data, size_t len)
-{
-  ares_channel_t *channel = conn->server->channel;
-  int             flags   = 0;
-
-#ifdef HAVE_MSG_NOSIGNAL
-  flags |= MSG_NOSIGNAL;
-#endif
-
-  if (channel->sock_funcs && channel->sock_funcs->asendv) {
-    struct iovec vec;
-    vec.iov_base = (void *)((size_t)data); /* Cast off const */
-    vec.iov_len  = len;
-    return channel->sock_funcs->asendv(conn->fd, &vec, 1,
-                                       channel->sock_func_cb_data);
-  }
-
-  if (conn->flags & ARES_CONN_FLAG_TFO_INITIAL) {
-    conn->flags &= ~((unsigned int)ARES_CONN_FLAG_TFO_INITIAL);
-
-#if defined(TFO_USE_SENDTO) && TFO_USE_SENDTO
-    {
-      struct sockaddr_storage sa_storage;
-      ares_socklen_t          salen = sizeof(sa_storage);
-      struct sockaddr        *sa    = (struct sockaddr *)&sa_storage;
-      ares_status_t           status;
-      ares_ssize_t            rv;
-
-      status = ares__conn_set_sockaddr(conn, sa, &salen);
-      if (status != ARES_SUCCESS) {
-        return status;
-      }
-
-      rv = (ares_ssize_t)sendto((SEND_TYPE_ARG1)conn->fd, (SEND_TYPE_ARG2)data,
-                                (SEND_TYPE_ARG3)len, (SEND_TYPE_ARG4)flags, sa,
-                                salen);
-
-      /* If using TFO, we might not have been able to get an IP earlier, since
-       * we hadn't informed the OS of the destination.  When using sendto()
-       * now we have so we should be able to fetch it */
-      ares_conn_set_self_ip(conn, ARES_TRUE);
-      return rv;
-    }
-#endif
-  }
-
-  return (ares_ssize_t)send((SEND_TYPE_ARG1)conn->fd, (SEND_TYPE_ARG2)data,
-                            (SEND_TYPE_ARG3)len, (SEND_TYPE_ARG4)flags);
-}
-
-/*
- * setsocknonblock sets the given socket to either blocking or non-blocking
- * mode based on the 'nonblock' boolean argument. This function is highly
- * portable.
- */
-static int setsocknonblock(ares_socket_t sockfd, /* operate on this */
-                           int           nonblock /* TRUE or FALSE */)
-{
-#if defined(USE_BLOCKING_SOCKETS)
-
-  return 0; /* returns success */
-
-#elif defined(HAVE_FCNTL_O_NONBLOCK)
-
-  /* most recent unix versions */
-  int flags;
-  flags = fcntl(sockfd, F_GETFL, 0);
-  if (nonblock) {
-    return fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);
-  } else {
-    return fcntl(sockfd, F_SETFL, flags & (~O_NONBLOCK)); /* LCOV_EXCL_LINE */
-  }
-
-#elif defined(HAVE_IOCTL_FIONBIO)
-
-  /* older unix versions */
-  int flags = nonblock ? 1 : 0;
-  return ioctl(sockfd, FIONBIO, &flags);
-
-#elif defined(HAVE_IOCTLSOCKET_FIONBIO)
-
-#  ifdef WATT32
-  char flags = nonblock ? 1 : 0;
-#  else
-  /* Windows */
-  unsigned long flags = nonblock ? 1UL : 0UL;
-#  endif
-  return ioctlsocket(sockfd, (long)FIONBIO, &flags);
-
-#elif defined(HAVE_IOCTLSOCKET_CAMEL_FIONBIO)
-
-  /* Amiga */
-  long flags = nonblock ? 1L : 0L;
-  return IoctlSocket(sockfd, FIONBIO, flags);
-
-#elif defined(HAVE_SETSOCKOPT_SO_NONBLOCK)
-
-  /* BeOS */
-  long b = nonblock ? 1L : 0L;
-  return setsockopt(sockfd, SOL_SOCKET, SO_NONBLOCK, &b, sizeof(b));
-
-#else
-#  error "no non-blocking method was found/used/set"
-#endif
-}
-
-#if defined(IPV6_V6ONLY) && defined(USE_WINSOCK)
-/* It makes support for IPv4-mapped IPv6 addresses.
- * Linux kernel, NetBSD, FreeBSD and Darwin: default is off;
- * Windows Vista and later: default is on;
- * DragonFly BSD: acts like off, and dummy setting;
- * OpenBSD and earlier Windows: unsupported.
- * Linux: controlled by /proc/sys/net/ipv6/bindv6only.
- */
-static void set_ipv6_v6only(ares_socket_t sockfd, int on)
-{
-  (void)setsockopt(sockfd, IPPROTO_IPV6, IPV6_V6ONLY, (void *)&on, sizeof(on));
-}
-#else
-#  define set_ipv6_v6only(s, v)
-#endif
-
-static ares_status_t configure_socket(ares_conn_t *conn)
-{
-  union {
-    struct sockaddr     sa;
-    struct sockaddr_in  sa4;
-    struct sockaddr_in6 sa6;
-  } local;
-
-  ares_socklen_t  bindlen = 0;
-  ares_server_t  *server  = conn->server;
-  ares_channel_t *channel = server->channel;
-
-  /* do not set options for user-managed sockets */
-  if (channel->sock_funcs && channel->sock_funcs->asocket) {
-    return ARES_SUCCESS;
-  }
-
-  (void)setsocknonblock(conn->fd, 1);
-
-#if defined(FD_CLOEXEC) && !defined(MSDOS)
-  /* Configure the socket fd as close-on-exec. */
-  if (fcntl(conn->fd, F_SETFD, FD_CLOEXEC) != 0) {
-    return ARES_ECONNREFUSED; /* LCOV_EXCL_LINE */
-  }
-#endif
-
-  /* No need to emit SIGPIPE on socket errors */
-#if defined(SO_NOSIGPIPE)
-  {
-    int opt = 1;
-    (void)setsockopt(conn->fd, SOL_SOCKET, SO_NOSIGPIPE, (void *)&opt,
-                     sizeof(opt));
-  }
-#endif
-
-  /* Set the socket's send and receive buffer sizes. */
-  if (channel->socket_send_buffer_size > 0 &&
-      setsockopt(conn->fd, SOL_SOCKET, SO_SNDBUF,
-                 (void *)&channel->socket_send_buffer_size,
-                 sizeof(channel->socket_send_buffer_size)) != 0) {
-    return ARES_ECONNREFUSED; /* LCOV_EXCL_LINE: UntestablePath */
-  }
-
-  if (channel->socket_receive_buffer_size > 0 &&
-      setsockopt(conn->fd, SOL_SOCKET, SO_RCVBUF,
-                 (void *)&channel->socket_receive_buffer_size,
-                 sizeof(channel->socket_receive_buffer_size)) != 0) {
-    return ARES_ECONNREFUSED; /* LCOV_EXCL_LINE: UntestablePath */
-  }
-
-#ifdef SO_BINDTODEVICE
-  if (ares_strlen(channel->local_dev_name)) {
-    /* Only root can do this, and usually not fatal if it doesn't work, so
-     * just continue on. */
-    (void)setsockopt(conn->fd, SOL_SOCKET, SO_BINDTODEVICE,
-                     channel->local_dev_name, sizeof(channel->local_dev_name));
-  }
-#endif
-
-  if (server->addr.family == AF_INET && channel->local_ip4) {
-    memset(&local.sa4, 0, sizeof(local.sa4));
-    local.sa4.sin_family      = AF_INET;
-    local.sa4.sin_addr.s_addr = htonl(channel->local_ip4);
-    bindlen                   = sizeof(local.sa4);
-  } else if (server->addr.family == AF_INET6 && server->ll_scope == 0 &&
-             memcmp(channel->local_ip6, ares_in6addr_any._S6_un._S6_u8,
-                    sizeof(channel->local_ip6)) != 0) {
-    /* Only if not link-local and an ip other than "::" is specified */
-    memset(&local.sa6, 0, sizeof(local.sa6));
-    local.sa6.sin6_family = AF_INET6;
-    memcpy(&local.sa6.sin6_addr, channel->local_ip6,
-           sizeof(channel->local_ip6));
-    bindlen = sizeof(local.sa6);
-  }
-
-  if (bindlen && bind(conn->fd, &local.sa, bindlen) < 0) {
-    return ARES_ECONNREFUSED;
-  }
-
-  if (server->addr.family == AF_INET6) {
-    set_ipv6_v6only(conn->fd, 0);
-  }
-
-  if (conn->flags & ARES_CONN_FLAG_TCP) {
-    int opt = 1;
-
-#ifdef TCP_NODELAY
-    /*
-     * Disable the Nagle algorithm (only relevant for TCP sockets, and thus not
-     * in configure_socket). In general, in DNS lookups we're pretty much
-     * interested in firing off a single request and then waiting for a reply,
-     * so batching isn't very interesting.
-     */
-    if (setsockopt(conn->fd, IPPROTO_TCP, TCP_NODELAY, (void *)&opt,
-                   sizeof(opt)) != 0) {
-      return ARES_ECONNREFUSED;
-    }
-#endif
-
-#if defined(TFO_CLIENT_SOCKOPT)
-    if (conn->flags & ARES_CONN_FLAG_TFO &&
-        setsockopt(conn->fd, IPPROTO_TCP, TFO_CLIENT_SOCKOPT, (void *)&opt,
-                   sizeof(opt)) != 0) {
-      /* Disable TFO if flag can't be set. */
-      conn->flags &= ~((unsigned int)ARES_CONN_FLAG_TFO);
-    }
-#endif
-  }
-
-  return ARES_SUCCESS;
-}
-
-ares_bool_t ares_sockaddr_to_ares_addr(struct ares_addr      *ares_addr,
-                                       unsigned short        *port,
-                                       const struct sockaddr *sockaddr)
-{
-  if (sockaddr->sa_family == AF_INET) {
-    /* NOTE: memcpy sockaddr_in due to alignment issues found by UBSAN due to
-     *       dnsinfo packing on MacOS */
-    struct sockaddr_in sockaddr_in;
-    memcpy(&sockaddr_in, sockaddr, sizeof(sockaddr_in));
-
-    ares_addr->family = AF_INET;
-    memcpy(&ares_addr->addr.addr4, &(sockaddr_in.sin_addr),
-           sizeof(ares_addr->addr.addr4));
-
-    if (port) {
-      *port = ntohs(sockaddr_in.sin_port);
-    }
-    return ARES_TRUE;
-  }
-
-  if (sockaddr->sa_family == AF_INET6) {
-    /* NOTE: memcpy sockaddr_in6 due to alignment issues found by UBSAN due to
-     *       dnsinfo packing on MacOS */
-    struct sockaddr_in6 sockaddr_in6;
-    memcpy(&sockaddr_in6, sockaddr, sizeof(sockaddr_in6));
-
-    ares_addr->family = AF_INET6;
-    memcpy(&ares_addr->addr.addr6, &(sockaddr_in6.sin6_addr),
-           sizeof(ares_addr->addr.addr6));
-    if (port) {
-      *port = ntohs(sockaddr_in6.sin6_port);
-    }
-    return ARES_TRUE;
-  }
-
-  return ARES_FALSE;
-}
-
-static ares_status_t ares__conn_connect(ares_conn_t *conn, struct sockaddr *sa,
-                                        ares_socklen_t salen)
-{
-  /* Normal non TCPFastOpen style connect */
-  if (!(conn->flags & ARES_CONN_FLAG_TFO)) {
-    return ares__connect_socket(conn->server->channel, conn->fd, sa, salen);
-  }
-
-  /* FreeBSD don't want any sort of connect() so skip */
-#if defined(TFO_SKIP_CONNECT) && TFO_SKIP_CONNECT
-  return ARES_SUCCESS;
-#elif defined(TFO_USE_CONNECTX) && TFO_USE_CONNECTX
-  {
-    int rv;
-    int err;
-
-    do {
-      sa_endpoints_t endpoints;
-      memset(&endpoints, 0, sizeof(endpoints));
-      endpoints.sae_dstaddr    = sa;
-      endpoints.sae_dstaddrlen = salen;
-
-      rv = connectx(conn->fd, &endpoints, SAE_ASSOCID_ANY,
-                    CONNECT_DATA_IDEMPOTENT | CONNECT_RESUME_ON_READ_WRITE,
-                    NULL, 0, NULL, NULL);
-
-      err = SOCKERRNO;
-      if (rv == -1 && err != EINPROGRESS && err != EWOULDBLOCK) {
-        return ARES_ECONNREFUSED;
-      }
-
-    } while (rv == -1 && err == EINTR);
-  }
-  return ARES_SUCCESS;
-#elif defined(TFO_SUPPORTED) && TFO_SUPPORTED
-  return ares__connect_socket(conn->server->channel, conn->fd, sa, salen);
-#else
-  /* Shouldn't be possible */
-  return ARES_ECONNREFUSED;
-#endif
-}
-
-ares_status_t ares__open_connection(ares_conn_t   **conn_out,
-                                    ares_channel_t *channel,
-                                    ares_server_t *server, ares_bool_t is_tcp)
-{
-  ares_status_t           status;
-  struct sockaddr_storage sa_storage;
-  ares_socklen_t          salen = sizeof(sa_storage);
-  struct sockaddr        *sa    = (struct sockaddr *)&sa_storage;
-  ares_conn_t            *conn;
-  ares__llist_node_t     *node  = NULL;
-  int                     stype = is_tcp ? SOCK_STREAM : SOCK_DGRAM;
-
-  *conn_out = NULL;
-
-  conn = ares_malloc(sizeof(*conn));
-  if (conn == NULL) {
-    return ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
-  }
-
-  memset(conn, 0, sizeof(*conn));
-  conn->fd              = ARES_SOCKET_BAD;
-  conn->server          = server;
-  conn->queries_to_conn = ares__llist_create(NULL);
-  conn->flags           = is_tcp ? ARES_CONN_FLAG_TCP : ARES_CONN_FLAG_NONE;
-
-  /* Enable TFO if the OS supports it and we were passed in data to send during
-   * the connect. It might be disabled later if an error is encountered. Make
-   * sure a user isn't overriding anything. */
-  if (conn->flags & ARES_CONN_FLAG_TCP && channel->sock_funcs == NULL &&
-      TFO_SUPPORTED) {
-    conn->flags |= ARES_CONN_FLAG_TFO;
-  }
-
-  if (conn->queries_to_conn == NULL) {
-    /* LCOV_EXCL_START: OutOfMemory */
-    status = ARES_ENOMEM;
-    goto done;
-    /* LCOV_EXCL_STOP */
-  }
-
-  /* Convert into the struct sockaddr structure needed by the OS */
-  status = ares__conn_set_sockaddr(conn, sa, &salen);
-  if (status != ARES_SUCCESS) {
-    goto done;
-  }
-
-  /* Acquire a socket. */
-  conn->fd = ares__open_socket(channel, server->addr.family, stype, 0);
-  if (conn->fd == ARES_SOCKET_BAD) {
-    status = ARES_ECONNREFUSED;
-    goto done;
-  }
-
-  /* Configure it. */
-  status = configure_socket(conn);
-  if (status != ARES_SUCCESS) {
-    goto done;
-  }
-
-  if (channel->sock_config_cb) {
-    int err =
-      channel->sock_config_cb(conn->fd, stype, channel->sock_config_cb_data);
-    if (err < 0) {
-      status = ARES_ECONNREFUSED;
-      goto done;
-    }
-  }
-
-  /* Connect */
-  status = ares__conn_connect(conn, sa, salen);
-  if (status != ARES_SUCCESS) {
-    goto done;
-  }
-
-  if (channel->sock_create_cb) {
-    int err =
-      channel->sock_create_cb(conn->fd, stype, channel->sock_create_cb_data);
-    if (err < 0) {
-      status = ARES_ECONNREFUSED;
-      goto done;
-    }
-  }
-
-  /* Let the connection know we haven't written our first packet yet for TFO */
-  if (conn->flags & ARES_CONN_FLAG_TFO) {
-    conn->flags |= ARES_CONN_FLAG_TFO_INITIAL;
-  }
-
-  /* Need to store our own ip for DNS cookie support */
-  status = ares_conn_set_self_ip(conn, ARES_FALSE);
-  if (status != ARES_SUCCESS) {
-    goto done; /* LCOV_EXCL_LINE: UntestablePath */
-  }
-
-  /* TCP connections are thrown to the end as we don't spawn multiple TCP
-   * connections. UDP connections are put on front where the newest connection
-   * can be quickly pulled */
-  if (is_tcp) {
-    node = ares__llist_insert_last(server->connections, conn);
-  } else {
-    node = ares__llist_insert_first(server->connections, conn);
-  }
-  if (node == NULL) {
-    /* LCOV_EXCL_START: OutOfMemory */
-    status = ARES_ENOMEM;
-    goto done;
-    /* LCOV_EXCL_STOP */
-  }
-
-  /* Register globally to quickly map event on file descriptor to connection
-   * node object */
-  if (!ares__htable_asvp_insert(channel->connnode_by_socket, conn->fd, node)) {
-    /* LCOV_EXCL_START: OutOfMemory */
-    status = ARES_ENOMEM;
-    goto done;
-    /* LCOV_EXCL_STOP */
-  }
-
-  SOCK_STATE_CALLBACK(channel, conn->fd, 1, is_tcp ? 1 : 0);
-
-  if (is_tcp) {
-    server->tcp_conn = conn;
-  }
-
-done:
-  if (status != ARES_SUCCESS) {
-    ares__llist_node_claim(node);
-    ares__llist_destroy(conn->queries_to_conn);
-    ares__close_socket(channel, conn->fd);
-    ares_free(conn);
-  } else {
-    *conn_out = conn;
-  }
-  return status;
-}
-
-ares_socket_t ares__open_socket(ares_channel_t *channel, int af, int type,
-                                int protocol)
-{
-  if (channel->sock_funcs && channel->sock_funcs->asocket) {
-    return channel->sock_funcs->asocket(af, type, protocol,
-                                        channel->sock_func_cb_data);
-  }
-
-  return socket(af, type, protocol);
-}
-
-ares_status_t ares__connect_socket(ares_channel_t        *channel,
-                                   ares_socket_t          sockfd,
-                                   const struct sockaddr *addr,
-                                   ares_socklen_t         addrlen)
-{
-  int rv;
-  int err;
-
-  do {
-    if (channel->sock_funcs && channel->sock_funcs->aconnect) {
-      rv = channel->sock_funcs->aconnect(sockfd, addr, addrlen,
-                                         channel->sock_func_cb_data);
-    } else {
-      rv = connect(sockfd, addr, addrlen);
-    }
-
-    err = SOCKERRNO;
-
-    if (rv == -1 && err != EINPROGRESS && err != EWOULDBLOCK) {
-      return ARES_ECONNREFUSED;
-    }
-
-  } while (rv == -1 && err == EINTR);
-
-  return ARES_SUCCESS;
-}
-
-void ares__close_socket(ares_channel_t *channel, ares_socket_t s)
-{
-  if (s == ARES_SOCKET_BAD) {
-    return;
-  }
-
-  if (channel->sock_funcs && channel->sock_funcs->aclose) {
-    channel->sock_funcs->aclose(s, channel->sock_func_cb_data);
-  } else {
-    sclose(s);
-  }
-}
-
-void ares_set_socket_callback(ares_channel_t           *channel,
-                              ares_sock_create_callback cb, void *data)
-{
-  if (channel == NULL) {
-    return;
-  }
-  channel->sock_create_cb      = cb;
-  channel->sock_create_cb_data = data;
-}
-
-void ares_set_socket_configure_callback(ares_channel_t           *channel,
-                                        ares_sock_config_callback cb,
-                                        void                     *data)
-{
-  if (channel == NULL || channel->optmask & ARES_OPT_EVENT_THREAD) {
-    return;
-  }
-  channel->sock_config_cb      = cb;
-  channel->sock_config_cb_data = data;
-}
-
-void ares_set_socket_functions(ares_channel_t                     *channel,
-                               const struct ares_socket_functions *funcs,
-                               void                               *data)
-{
-  if (channel == NULL || channel->optmask & ARES_OPT_EVENT_THREAD) {
-    return;
-  }
-  channel->sock_funcs        = funcs;
-  channel->sock_func_cb_data = data;
-}
diff --git a/deps/cares/src/lib/ares__addrinfo2hostent.c b/deps/cares/src/lib/ares_addrinfo2hostent.c
similarity index 93%
rename from deps/cares/src/lib/ares__addrinfo2hostent.c
rename to deps/cares/src/lib/ares_addrinfo2hostent.c
index f7b6d1edd251f2..2bbc791157b01e 100644
--- a/deps/cares/src/lib/ares__addrinfo2hostent.c
+++ b/deps/cares/src/lib/ares_addrinfo2hostent.c
@@ -48,8 +48,8 @@
 #endif
 
 
-ares_status_t ares__addrinfo2hostent(const struct ares_addrinfo *ai, int family,
-                                     struct hostent **host)
+ares_status_t ares_addrinfo2hostent(const struct ares_addrinfo *ai, int family,
+                                    struct hostent **host)
 {
   struct ares_addrinfo_node  *next;
   struct ares_addrinfo_cname *next_cname;
@@ -196,11 +196,11 @@ ares_status_t ares__addrinfo2hostent(const struct ares_addrinfo *ai, int family,
   /* LCOV_EXCL_STOP */
 }
 
-ares_status_t ares__addrinfo2addrttl(const struct ares_addrinfo *ai, int family,
-                                     size_t                req_naddrttls,
-                                     struct ares_addrttl  *addrttls,
-                                     struct ares_addr6ttl *addr6ttls,
-                                     size_t               *naddrttls)
+ares_status_t ares_addrinfo2addrttl(const struct ares_addrinfo *ai, int family,
+                                    size_t                req_naddrttls,
+                                    struct ares_addrttl  *addrttls,
+                                    struct ares_addr6ttl *addr6ttls,
+                                    size_t               *naddrttls)
 {
   struct ares_addrinfo_node  *next;
   struct ares_addrinfo_cname *next_cname;
diff --git a/deps/cares/src/lib/ares__addrinfo_localhost.c b/deps/cares/src/lib/ares_addrinfo_localhost.c
similarity index 89%
rename from deps/cares/src/lib/ares__addrinfo_localhost.c
rename to deps/cares/src/lib/ares_addrinfo_localhost.c
index e98dd4e277b056..6f4f2a373b3feb 100644
--- a/deps/cares/src/lib/ares__addrinfo_localhost.c
+++ b/deps/cares/src/lib/ares_addrinfo_localhost.c
@@ -55,7 +55,7 @@ ares_status_t ares_append_ai_node(int aftype, unsigned short port,
 {
   struct ares_addrinfo_node *node;
 
-  node = ares__append_addrinfo_node(nodes);
+  node = ares_append_addrinfo_node(nodes);
   if (!node) {
     return ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -102,8 +102,8 @@ ares_status_t ares_append_ai_node(int aftype, unsigned short port,
 }
 
 static ares_status_t
-  ares__default_loopback_addrs(int aftype, unsigned short port,
-                               struct ares_addrinfo_node **nodes)
+  ares_default_loopback_addrs(int aftype, unsigned short port,
+                              struct ares_addrinfo_node **nodes)
 {
   ares_status_t status = ARES_SUCCESS;
 
@@ -129,8 +129,8 @@ static ares_status_t
 }
 
 static ares_status_t
-  ares__system_loopback_addrs(int aftype, unsigned short port,
-                              struct ares_addrinfo_node **nodes)
+  ares_system_loopback_addrs(int aftype, unsigned short port,
+                             struct ares_addrinfo_node **nodes)
 {
 #if defined(USE_WINSOCK) && defined(_WIN32_WINNT) && _WIN32_WINNT >= 0x0600 && \
   !defined(__WATCOMC__)
@@ -176,7 +176,7 @@ static ares_status_t
   FreeMibTable(table);
 
   if (status != ARES_SUCCESS) {
-    ares__freeaddrinfo_nodes(*nodes);
+    ares_freeaddrinfo_nodes(*nodes);
     *nodes = NULL;
   }
 
@@ -191,9 +191,9 @@ static ares_status_t
 #endif
 }
 
-ares_status_t ares__addrinfo_localhost(const char *name, unsigned short port,
-                                       const struct ares_addrinfo_hints *hints,
-                                       struct ares_addrinfo             *ai)
+ares_status_t ares_addrinfo_localhost(const char *name, unsigned short port,
+                                      const struct ares_addrinfo_hints *hints,
+                                      struct ares_addrinfo             *ai)
 {
   struct ares_addrinfo_node *nodes = NULL;
   ares_status_t              status;
@@ -213,19 +213,19 @@ ares_status_t ares__addrinfo_localhost(const char *name, unsigned short port,
     goto enomem; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  status = ares__system_loopback_addrs(hints->ai_family, port, &nodes);
+  status = ares_system_loopback_addrs(hints->ai_family, port, &nodes);
 
   if (status == ARES_ENOTFOUND) {
-    status = ares__default_loopback_addrs(hints->ai_family, port, &nodes);
+    status = ares_default_loopback_addrs(hints->ai_family, port, &nodes);
   }
 
-  ares__addrinfo_cat_nodes(&ai->nodes, nodes);
+  ares_addrinfo_cat_nodes(&ai->nodes, nodes);
 
   return status;
 
 /* LCOV_EXCL_START: OutOfMemory */
 enomem:
-  ares__freeaddrinfo_nodes(nodes);
+  ares_freeaddrinfo_nodes(nodes);
   ares_free(ai->name);
   ai->name = NULL;
   return ARES_ENOMEM;
diff --git a/deps/cares/src/lib/ares_android.c b/deps/cares/src/lib/ares_android.c
index 06ab8940ad736d..a8284c1e50235a 100644
--- a/deps/cares/src/lib/ares_android.c
+++ b/deps/cares/src/lib/ares_android.c
@@ -84,9 +84,9 @@ static jmethodID jni_get_method_id(JNIEnv *env, jclass cls,
 
 static int jvm_attach(JNIEnv **env)
 {
-  char              name[17] = {0};
+  char             name[17] = { 0 };
 
-  JavaVMAttachArgs  args;
+  JavaVMAttachArgs args;
 
   args.version = JNI_VERSION_1_6;
   if (prctl(PR_GET_NAME, name) == 0) {
@@ -94,7 +94,7 @@ static int jvm_attach(JNIEnv **env)
   } else {
     args.name = NULL;
   }
-  args.group   = NULL;
+  args.group = NULL;
 
   return (*android_jvm)->AttachCurrentThread(android_jvm, env, &args);
 }
diff --git a/deps/cares/src/lib/ares_cancel.c b/deps/cares/src/lib/ares_cancel.c
index c29d8ef82f4d1a..75600dea6bcdaf 100644
--- a/deps/cares/src/lib/ares_cancel.c
+++ b/deps/cares/src/lib/ares_cancel.c
@@ -37,18 +37,18 @@ void ares_cancel(ares_channel_t *channel)
     return;
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
-  if (ares__llist_len(channel->all_queries) > 0) {
-    ares__llist_node_t *node = NULL;
-    ares__llist_node_t *next = NULL;
+  if (ares_llist_len(channel->all_queries) > 0) {
+    ares_llist_node_t *node = NULL;
+    ares_llist_node_t *next = NULL;
 
     /* Swap list heads, so that only those queries which were present on entry
      * into this function are cancelled. New queries added by callbacks of
      * queries being cancelled will not be cancelled themselves.
      */
-    ares__llist_t      *list_copy = channel->all_queries;
-    channel->all_queries          = ares__llist_create(NULL);
+    ares_llist_t      *list_copy = channel->all_queries;
+    channel->all_queries         = ares_llist_create(NULL);
 
     /* Out of memory, this function doesn't return a result code though so we
      * can't report to caller */
@@ -57,31 +57,31 @@ void ares_cancel(ares_channel_t *channel)
       goto done;                        /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
-    node = ares__llist_node_first(list_copy);
+    node = ares_llist_node_first(list_copy);
     while (node != NULL) {
       ares_query_t *query;
 
       /* Cache next since this node is being deleted */
-      next = ares__llist_node_next(node);
+      next = ares_llist_node_next(node);
 
-      query                   = ares__llist_node_claim(node);
+      query                   = ares_llist_node_claim(node);
       query->node_all_queries = NULL;
 
       /* NOTE: its possible this may enqueue new queries */
       query->callback(query->arg, ARES_ECANCELLED, 0, NULL);
-      ares__free_query(query);
+      ares_free_query(query);
 
       node = next;
     }
 
-    ares__llist_destroy(list_copy);
+    ares_llist_destroy(list_copy);
   }
 
   /* See if the connections should be cleaned up */
-  ares__check_cleanup_conns(channel);
+  ares_check_cleanup_conns(channel);
 
   ares_queue_notify_empty(channel);
 
 done:
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 }
diff --git a/deps/cares/src/lib/ares__close_sockets.c b/deps/cares/src/lib/ares_close_sockets.c
similarity index 59%
rename from deps/cares/src/lib/ares__close_sockets.c
rename to deps/cares/src/lib/ares_close_sockets.c
index 71c7e64f08ad38..fd3bf3c4b1e09e 100644
--- a/deps/cares/src/lib/ares__close_sockets.c
+++ b/deps/cares/src/lib/ares_close_sockets.c
@@ -28,81 +28,82 @@
 #include "ares_private.h"
 #include <assert.h>
 
-static void ares__requeue_queries(ares_conn_t  *conn,
-                                  ares_status_t requeue_status)
+static void ares_requeue_queries(ares_conn_t  *conn,
+                                 ares_status_t requeue_status)
 {
   ares_query_t  *query;
   ares_timeval_t now;
 
-  ares__tvnow(&now);
+  ares_tvnow(&now);
 
-  while ((query = ares__llist_first_val(conn->queries_to_conn)) != NULL) {
-    ares__requeue_query(query, &now, requeue_status, ARES_TRUE, NULL);
+  while ((query = ares_llist_first_val(conn->queries_to_conn)) != NULL) {
+    ares_requeue_query(query, &now, requeue_status, ARES_TRUE, NULL);
   }
 }
 
-void ares__close_connection(ares_conn_t *conn, ares_status_t requeue_status)
+void ares_close_connection(ares_conn_t *conn, ares_status_t requeue_status)
 {
   ares_server_t  *server  = conn->server;
   ares_channel_t *channel = server->channel;
 
   /* Unlink */
-  ares__llist_node_claim(
-    ares__htable_asvp_get_direct(channel->connnode_by_socket, conn->fd));
-  ares__htable_asvp_remove(channel->connnode_by_socket, conn->fd);
+  ares_llist_node_claim(
+    ares_htable_asvp_get_direct(channel->connnode_by_socket, conn->fd));
+  ares_htable_asvp_remove(channel->connnode_by_socket, conn->fd);
 
   if (conn->flags & ARES_CONN_FLAG_TCP) {
-    /* Reset any existing input and output buffer. */
-    ares__buf_consume(server->tcp_parser, ares__buf_len(server->tcp_parser));
-    ares__buf_consume(server->tcp_send, ares__buf_len(server->tcp_send));
     server->tcp_conn = NULL;
   }
 
+  ares_buf_destroy(conn->in_buf);
+  ares_buf_destroy(conn->out_buf);
+
   /* Requeue queries to other connections */
-  ares__requeue_queries(conn, requeue_status);
+  ares_requeue_queries(conn, requeue_status);
+
+  ares_llist_destroy(conn->queries_to_conn);
 
-  ares__llist_destroy(conn->queries_to_conn);
+  ares_conn_sock_state_cb_update(conn, ARES_CONN_STATE_NONE);
 
-  SOCK_STATE_CALLBACK(channel, conn->fd, 0, 0);
-  ares__close_socket(channel, conn->fd);
+  ares_socket_close(channel, conn->fd);
 
   ares_free(conn);
 }
 
-void ares__close_sockets(ares_server_t *server)
+void ares_close_sockets(ares_server_t *server)
 {
-  ares__llist_node_t *node;
+  ares_llist_node_t *node;
 
-  while ((node = ares__llist_node_first(server->connections)) != NULL) {
-    ares_conn_t *conn = ares__llist_node_val(node);
-    ares__close_connection(conn, ARES_SUCCESS);
+  while ((node = ares_llist_node_first(server->connections)) != NULL) {
+    ares_conn_t *conn = ares_llist_node_val(node);
+    ares_close_connection(conn, ARES_SUCCESS);
   }
 }
 
-void ares__check_cleanup_conns(const ares_channel_t *channel)
+void ares_check_cleanup_conns(const ares_channel_t *channel)
 {
-  ares__slist_node_t *snode;
+  ares_slist_node_t *snode;
 
   if (channel == NULL) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
   /* Iterate across each server */
-  for (snode = ares__slist_node_first(channel->servers); snode != NULL;
-       snode = ares__slist_node_next(snode)) {
-    ares_server_t      *server = ares__slist_node_val(snode);
-    ares__llist_node_t *cnode;
+  for (snode = ares_slist_node_first(channel->servers); snode != NULL;
+       snode = ares_slist_node_next(snode)) {
+    ares_server_t     *server = ares_slist_node_val(snode);
+    ares_llist_node_t *cnode;
 
     /* Iterate across each connection */
-    cnode = ares__llist_node_first(server->connections);
+    cnode = ares_llist_node_first(server->connections);
     while (cnode != NULL) {
-      ares__llist_node_t *next       = ares__llist_node_next(cnode);
-      ares_conn_t        *conn       = ares__llist_node_val(cnode);
-      ares_bool_t         do_cleanup = ARES_FALSE;
-      cnode                          = next;
+      ares_llist_node_t *next       = ares_llist_node_next(cnode);
+      ares_conn_t       *conn       = ares_llist_node_val(cnode);
+      ares_bool_t        do_cleanup = ARES_FALSE;
+      cnode                         = next;
 
       /* Has connections, not eligible */
-      if (ares__llist_len(conn->queries_to_conn)) {
+      if (ares_llist_len(conn->queries_to_conn)) {
         continue;
       }
 
@@ -130,7 +131,7 @@ void ares__check_cleanup_conns(const ares_channel_t *channel)
       }
 
       /* Clean it up */
-      ares__close_connection(conn, ARES_SUCCESS);
+      ares_close_connection(conn, ARES_SUCCESS);
     }
   }
 }
diff --git a/deps/cares/src/lib/ares_config.h.cmake b/deps/cares/src/lib/ares_config.h.cmake
index da73867197145f..051b97f494fd32 100644
--- a/deps/cares/src/lib/ares_config.h.cmake
+++ b/deps/cares/src/lib/ares_config.h.cmake
@@ -82,6 +82,9 @@
 /* Define to 1 if you have the <poll.h> header file. */
 #cmakedefine HAVE_POLL_H 1
 
+/* Define to 1 if you have the memmem function. */
+#cmakedefine HAVE_MEMMEM 1
+
 /* Define to 1 if you have the poll function. */
 #cmakedefine HAVE_POLL 1
 
@@ -242,6 +245,9 @@
 /* Define to 1 if you have the send function. */
 #cmakedefine HAVE_SEND 1
 
+/* Define to 1 if you have the sendto function. */
+#cmakedefine HAVE_SENDTO 1
+
 /* Define to 1 if you have the setsockopt function. */
 #cmakedefine HAVE_SETSOCKOPT 1
 
diff --git a/deps/cares/src/lib/ares_config.h.in b/deps/cares/src/lib/ares_config.h.in
index 3e75b4c2cd088e..d22fa863477fbf 100644
--- a/deps/cares/src/lib/ares_config.h.in
+++ b/deps/cares/src/lib/ares_config.h.in
@@ -177,6 +177,9 @@
 /* Define to 1 if you have the <malloc.h> header file. */
 #undef HAVE_MALLOC_H
 
+/* Define to 1 if you have `memmem` */
+#undef HAVE_MEMMEM
+
 /* Define to 1 if you have the <memory.h> header file. */
 #undef HAVE_MEMORY_H
 
@@ -249,6 +252,9 @@
 /* Define to 1 if you have `send` */
 #undef HAVE_SEND
 
+/* Define to 1 if you have `sendto` */
+#undef HAVE_SENDTO
+
 /* Define to 1 if you have `setsockopt` */
 #undef HAVE_SETSOCKOPT
 
diff --git a/deps/cares/src/lib/ares_conn.c b/deps/cares/src/lib/ares_conn.c
new file mode 100644
index 00000000000000..6b315b05486d69
--- /dev/null
+++ b/deps/cares/src/lib/ares_conn.c
@@ -0,0 +1,511 @@
+/* MIT License
+ *
+ * Copyright (c) Massachusetts Institute of Technology
+ * Copyright (c) The c-ares project and its contributors
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+#include "ares_private.h"
+
+void ares_conn_sock_state_cb_update(ares_conn_t            *conn,
+                                    ares_conn_state_flags_t flags)
+{
+  ares_channel_t *channel = conn->server->channel;
+
+  if ((conn->state_flags & ARES_CONN_STATE_CBFLAGS) != flags &&
+      channel->sock_state_cb) {
+    channel->sock_state_cb(channel->sock_state_cb_data, conn->fd,
+                           flags & ARES_CONN_STATE_READ ? 1 : 0,
+                           flags & ARES_CONN_STATE_WRITE ? 1 : 0);
+  }
+
+  conn->state_flags &= ~((unsigned int)ARES_CONN_STATE_CBFLAGS);
+  conn->state_flags |= flags;
+}
+
+ares_conn_err_t ares_conn_read(ares_conn_t *conn, void *data, size_t len,
+                               size_t *read_bytes)
+{
+  ares_channel_t *channel = conn->server->channel;
+  ares_conn_err_t err;
+
+  if (!(conn->flags & ARES_CONN_FLAG_TCP)) {
+    struct sockaddr_storage sa_storage;
+    ares_socklen_t          salen = sizeof(sa_storage);
+
+    memset(&sa_storage, 0, sizeof(sa_storage));
+
+    err =
+      ares_socket_recvfrom(channel, conn->fd, ARES_FALSE, data, len, 0,
+                           (struct sockaddr *)&sa_storage, &salen, read_bytes);
+
+#ifdef HAVE_RECVFROM
+    if (err == ARES_CONN_ERR_SUCCESS &&
+        !ares_sockaddr_addr_eq((struct sockaddr *)&sa_storage,
+                               &conn->server->addr)) {
+      err = ARES_CONN_ERR_WOULDBLOCK;
+    }
+#endif
+  } else {
+    err = ares_socket_recv(channel, conn->fd, ARES_TRUE, data, len, read_bytes);
+  }
+
+  /* Toggle connected state if needed */
+  if (err == ARES_CONN_ERR_SUCCESS) {
+    conn->state_flags |= ARES_CONN_STATE_CONNECTED;
+  }
+
+  return err;
+}
+
+/* Use like:
+ *   struct sockaddr_storage sa_storage;
+ *   ares_socklen_t          salen     = sizeof(sa_storage);
+ *   struct sockaddr        *sa        = (struct sockaddr *)&sa_storage;
+ *   ares_conn_set_sockaddr(conn, sa, &salen);
+ */
+static ares_status_t ares_conn_set_sockaddr(const ares_conn_t *conn,
+                                            struct sockaddr   *sa,
+                                            ares_socklen_t    *salen)
+{
+  const ares_server_t *server = conn->server;
+  unsigned short       port =
+    conn->flags & ARES_CONN_FLAG_TCP ? server->tcp_port : server->udp_port;
+  struct sockaddr_in  *sin;
+  struct sockaddr_in6 *sin6;
+
+  switch (server->addr.family) {
+    case AF_INET:
+      sin = (struct sockaddr_in *)(void *)sa;
+      if (*salen < (ares_socklen_t)sizeof(*sin)) {
+        return ARES_EFORMERR;
+      }
+      *salen = sizeof(*sin);
+      memset(sin, 0, sizeof(*sin));
+      sin->sin_family = AF_INET;
+      sin->sin_port   = htons(port);
+      memcpy(&sin->sin_addr, &server->addr.addr.addr4, sizeof(sin->sin_addr));
+      return ARES_SUCCESS;
+    case AF_INET6:
+      sin6 = (struct sockaddr_in6 *)(void *)sa;
+      if (*salen < (ares_socklen_t)sizeof(*sin6)) {
+        return ARES_EFORMERR;
+      }
+      *salen = sizeof(*sin6);
+      memset(sin6, 0, sizeof(*sin6));
+      sin6->sin6_family = AF_INET6;
+      sin6->sin6_port   = htons(port);
+      memcpy(&sin6->sin6_addr, &server->addr.addr.addr6,
+             sizeof(sin6->sin6_addr));
+#ifdef HAVE_STRUCT_SOCKADDR_IN6_SIN6_SCOPE_ID
+      sin6->sin6_scope_id = server->ll_scope;
+#endif
+      return ARES_SUCCESS;
+    default:
+      break;
+  }
+
+  return ARES_EBADFAMILY;
+}
+
+static ares_status_t ares_conn_set_self_ip(ares_conn_t *conn, ares_bool_t early)
+{
+  ares_channel_t         *channel = conn->server->channel;
+  struct sockaddr_storage sa_storage;
+  int                     rv;
+  ares_socklen_t          len = sizeof(sa_storage);
+
+  /* We call this twice on TFO, if we already have the IP we can go ahead and
+   * skip processing */
+  if (!early && conn->self_ip.family != AF_UNSPEC) {
+    return ARES_SUCCESS;
+  }
+
+  memset(&sa_storage, 0, sizeof(sa_storage));
+
+  if (channel->sock_funcs.agetsockname == NULL) {
+    /* Not specified, we can still use cookies cooked with an empty self_ip */
+    memset(&conn->self_ip, 0, sizeof(conn->self_ip));
+    return ARES_SUCCESS;
+  }
+  rv = channel->sock_funcs.agetsockname(conn->fd,
+                                        (struct sockaddr *)(void *)&sa_storage,
+                                        &len, channel->sock_func_cb_data);
+  if (rv != 0) {
+    /* During TCP FastOpen, we can't get the IP this early since connect()
+     * may not be called.  That's ok, we'll try again later */
+    if (early && conn->flags & ARES_CONN_FLAG_TCP &&
+        conn->flags & ARES_CONN_FLAG_TFO) {
+      memset(&conn->self_ip, 0, sizeof(conn->self_ip));
+      return ARES_SUCCESS;
+    }
+    return ARES_ECONNREFUSED;
+  }
+
+  if (!ares_sockaddr_to_ares_addr(&conn->self_ip, NULL,
+                                  (struct sockaddr *)(void *)&sa_storage)) {
+    return ARES_ECONNREFUSED;
+  }
+
+  return ARES_SUCCESS;
+}
+
+ares_conn_err_t ares_conn_write(ares_conn_t *conn, const void *data, size_t len,
+                                size_t *written)
+{
+  ares_channel_t         *channel = conn->server->channel;
+  ares_bool_t             is_tfo  = ARES_FALSE;
+  ares_conn_err_t         err     = ARES_CONN_ERR_SUCCESS;
+  struct sockaddr_storage sa_storage;
+  ares_socklen_t          salen = 0;
+  struct sockaddr        *sa    = NULL;
+
+  *written = 0;
+
+  /* Don't try to write if not doing initial TFO and not connected */
+  if (conn->flags & ARES_CONN_FLAG_TCP &&
+      !(conn->state_flags & ARES_CONN_STATE_CONNECTED) &&
+      !(conn->flags & ARES_CONN_FLAG_TFO_INITIAL)) {
+    return ARES_CONN_ERR_WOULDBLOCK;
+  }
+
+  /* On initial write during TFO we need to send an address */
+  if (conn->flags & ARES_CONN_FLAG_TFO_INITIAL) {
+    salen = sizeof(sa_storage);
+    sa    = (struct sockaddr *)&sa_storage;
+
+    conn->flags &= ~((unsigned int)ARES_CONN_FLAG_TFO_INITIAL);
+    is_tfo       = ARES_TRUE;
+
+    if (ares_conn_set_sockaddr(conn, sa, &salen) != ARES_SUCCESS) {
+      return ARES_CONN_ERR_FAILURE;
+    }
+  }
+
+  err = ares_socket_write(channel, conn->fd, data, len, written, sa, salen);
+  if (err != ARES_CONN_ERR_SUCCESS) {
+    goto done;
+  }
+
+  if (is_tfo) {
+    /* If using TFO, we might not have been able to get an IP earlier, since
+     * we hadn't informed the OS of the destination.  Now that sendto() has
+     * done so, we should be able to fetch it */
+    ares_conn_set_self_ip(conn, ARES_FALSE);
+    goto done;
+  }
+
+done:
+  if (err == ARES_CONN_ERR_SUCCESS && len == *written) {
+    /* Wrote all data, make sure we're not listening for write events unless
+     * using TFO, in which case we'll need a write event to know when
+     * we're connected. */
+    ares_conn_sock_state_cb_update(
+      conn, ARES_CONN_STATE_READ |
+              (is_tfo ? ARES_CONN_STATE_WRITE : ARES_CONN_STATE_NONE));
+  } else if (err == ARES_CONN_ERR_WOULDBLOCK) {
+    /* Need to wait on more buffer space to write */
+    ares_conn_sock_state_cb_update(conn, ARES_CONN_STATE_READ |
+                                           ARES_CONN_STATE_WRITE);
+  }
+
+  return err;
+}
+
+ares_status_t ares_conn_flush(ares_conn_t *conn)
+{
+  const unsigned char *data;
+  size_t               data_len;
+  size_t               count;
+  ares_conn_err_t      err;
+  ares_status_t        status;
+  ares_bool_t          tfo = ARES_FALSE;
+
+  if (conn == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  if (conn->flags & ARES_CONN_FLAG_TFO_INITIAL) {
+    tfo = ARES_TRUE;
+  }
+
+  do {
+    if (ares_buf_len(conn->out_buf) == 0) {
+      status = ARES_SUCCESS;
+      goto done;
+    }
+
+    if (conn->flags & ARES_CONN_FLAG_TCP) {
+      data = ares_buf_peek(conn->out_buf, &data_len);
+    } else {
+      unsigned short msg_len;
+
+      /* Read length, then provide buffer without length */
+      ares_buf_tag(conn->out_buf);
+      status = ares_buf_fetch_be16(conn->out_buf, &msg_len);
+      if (status != ARES_SUCCESS) {
+        return status;
+      }
+      ares_buf_tag_rollback(conn->out_buf);
+
+      data = ares_buf_peek(conn->out_buf, &data_len);
+      if (data_len < (size_t)(msg_len + 2)) {
+        status = ARES_EFORMERR;
+        goto done;
+      }
+      data     += 2;
+      data_len  = msg_len;
+    }
+
+    err = ares_conn_write(conn, data, data_len, &count);
+    if (err != ARES_CONN_ERR_SUCCESS) {
+      if (err != ARES_CONN_ERR_WOULDBLOCK) {
+        status = ARES_ECONNREFUSED;
+        goto done;
+      }
+      status = ARES_SUCCESS;
+      goto done;
+    }
+
+    /* UDP didn't send the length prefix so augment that here */
+    if (!(conn->flags & ARES_CONN_FLAG_TCP)) {
+      count += 2;
+    }
+
+    /* Strip data written from the buffer */
+    ares_buf_consume(conn->out_buf, count);
+    status = ARES_SUCCESS;
+
+    /* Loop only for UDP since we have to send per-packet.  We already
+     * sent everything we could if using tcp */
+  } while (!(conn->flags & ARES_CONN_FLAG_TCP));
+
+done:
+  if (status == ARES_SUCCESS) {
+    ares_conn_state_flags_t flags = ARES_CONN_STATE_READ;
+
+    /* When using TFO, we need to enable waiting on a write event to
+     * be notified of when a connection is actually established */
+    if (tfo) {
+      flags |= ARES_CONN_STATE_WRITE;
+    }
+
+    /* If using TCP and not all data was written (partial write), that means
+     * we need to also wait on a write event */
+    if (conn->flags & ARES_CONN_FLAG_TCP && ares_buf_len(conn->out_buf)) {
+      flags |= ARES_CONN_STATE_WRITE;
+    }
+
+    ares_conn_sock_state_cb_update(conn, flags);
+  }
+
+  return status;
+}
+
+static ares_status_t ares_conn_connect(ares_conn_t           *conn,
+                                       const struct sockaddr *sa,
+                                       ares_socklen_t         salen)
+{
+  ares_conn_err_t err;
+
+  err = ares_socket_connect(
+    conn->server->channel, conn->fd,
+    (conn->flags & ARES_CONN_FLAG_TFO) ? ARES_TRUE : ARES_FALSE, sa, salen);
+
+  if (err != ARES_CONN_ERR_WOULDBLOCK && err != ARES_CONN_ERR_SUCCESS) {
+    return ARES_ECONNREFUSED;
+  }
+  return ARES_SUCCESS;
+}
+
+ares_status_t ares_open_connection(ares_conn_t   **conn_out,
+                                   ares_channel_t *channel,
+                                   ares_server_t *server, ares_bool_t is_tcp)
+{
+  ares_status_t           status;
+  struct sockaddr_storage sa_storage;
+  ares_socklen_t          salen = sizeof(sa_storage);
+  struct sockaddr        *sa    = (struct sockaddr *)&sa_storage;
+  ares_conn_t            *conn;
+  ares_llist_node_t      *node  = NULL;
+  int                     stype = is_tcp ? SOCK_STREAM : SOCK_DGRAM;
+  ares_conn_state_flags_t state_flags;
+
+  *conn_out = NULL;
+
+  conn = ares_malloc(sizeof(*conn));
+  if (conn == NULL) {
+    return ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
+  }
+
+  memset(conn, 0, sizeof(*conn));
+  conn->fd              = ARES_SOCKET_BAD;
+  conn->server          = server;
+  conn->queries_to_conn = ares_llist_create(NULL);
+  conn->flags           = is_tcp ? ARES_CONN_FLAG_TCP : ARES_CONN_FLAG_NONE;
+  conn->out_buf         = ares_buf_create();
+  conn->in_buf          = ares_buf_create();
+
+  if (conn->queries_to_conn == NULL || conn->out_buf == NULL ||
+      conn->in_buf == NULL) {
+    /* LCOV_EXCL_START: OutOfMemory */
+    status = ARES_ENOMEM;
+    goto done;
+    /* LCOV_EXCL_STOP */
+  }
+
+  /* Try to enable TFO always if using TCP.  It will fail later on if it's
+   * really not supported when we try to enable it on the socket. */
+  if (conn->flags & ARES_CONN_FLAG_TCP) {
+    conn->flags |= ARES_CONN_FLAG_TFO;
+  }
+
+  /* Convert into the struct sockaddr structure needed by the OS */
+  status = ares_conn_set_sockaddr(conn, sa, &salen);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  /* Acquire a socket. */
+  if (ares_socket_open(&conn->fd, channel, server->addr.family, stype, 0) !=
+      ARES_CONN_ERR_SUCCESS) {
+    status = ARES_ECONNREFUSED;
+    goto done;
+  }
+
+  /* Configure channel configured options */
+  status = ares_socket_configure(
+    channel, server->addr.family,
+    (conn->flags & ARES_CONN_FLAG_TCP) ? ARES_TRUE : ARES_FALSE, conn->fd);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  /* Enable TFO if possible */
+  if (conn->flags & ARES_CONN_FLAG_TFO &&
+      ares_socket_enable_tfo(channel, conn->fd) != ARES_CONN_ERR_SUCCESS) {
+    conn->flags &= ~((unsigned int)ARES_CONN_FLAG_TFO);
+  }
+
+  if (channel->sock_config_cb) {
+    int err =
+      channel->sock_config_cb(conn->fd, stype, channel->sock_config_cb_data);
+    if (err < 0) {
+      status = ARES_ECONNREFUSED;
+      goto done;
+    }
+  }
+
+  /* Connect */
+  status = ares_conn_connect(conn, sa, salen);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  if (channel->sock_create_cb) {
+    int err =
+      channel->sock_create_cb(conn->fd, stype, channel->sock_create_cb_data);
+    if (err < 0) {
+      status = ARES_ECONNREFUSED;
+      goto done;
+    }
+  }
+
+  /* Let the connection know we haven't written our first packet yet for TFO */
+  if (conn->flags & ARES_CONN_FLAG_TFO) {
+    conn->flags |= ARES_CONN_FLAG_TFO_INITIAL;
+  }
+
+  /* Need to store our own ip for DNS cookie support */
+  status = ares_conn_set_self_ip(conn, ARES_TRUE);
+  if (status != ARES_SUCCESS) {
+    goto done; /* LCOV_EXCL_LINE: UntestablePath */
+  }
+
+  /* TCP connections are thrown to the end as we don't spawn multiple TCP
+   * connections. UDP connections are put on front where the newest connection
+   * can be quickly pulled */
+  if (is_tcp) {
+    node = ares_llist_insert_last(server->connections, conn);
+  } else {
+    node = ares_llist_insert_first(server->connections, conn);
+  }
+  if (node == NULL) {
+    /* LCOV_EXCL_START: OutOfMemory */
+    status = ARES_ENOMEM;
+    goto done;
+    /* LCOV_EXCL_STOP */
+  }
+
+  /* Register globally to quickly map event on file descriptor to connection
+   * node object */
+  if (!ares_htable_asvp_insert(channel->connnode_by_socket, conn->fd, node)) {
+    /* LCOV_EXCL_START: OutOfMemory */
+    status = ARES_ENOMEM;
+    goto done;
+    /* LCOV_EXCL_STOP */
+  }
+
+  state_flags = ARES_CONN_STATE_READ;
+
+  /* Get notified on connect if using TCP */
+  if (conn->flags & ARES_CONN_FLAG_TCP) {
+    state_flags |= ARES_CONN_STATE_WRITE;
+  }
+
+  /* Do not attempt to update sock state callbacks on TFO until *after* the
+   * initial write is performed.  Due to the notification event, it's possible
+   * an erroneous read can come in before the attempt to write the data which
+   * might be used to set the ip address */
+  if (!(conn->flags & ARES_CONN_FLAG_TFO_INITIAL)) {
+    ares_conn_sock_state_cb_update(conn, state_flags);
+  }
+
+  if (is_tcp) {
+    server->tcp_conn = conn;
+  }
+
+done:
+  if (status != ARES_SUCCESS) {
+    ares_llist_node_claim(node);
+    ares_llist_destroy(conn->queries_to_conn);
+    ares_socket_close(channel, conn->fd);
+    ares_buf_destroy(conn->out_buf);
+    ares_buf_destroy(conn->in_buf);
+    ares_free(conn);
+  } else {
+    *conn_out = conn;
+  }
+  return status;
+}
+
+ares_conn_t *ares_conn_from_fd(const ares_channel_t *channel, ares_socket_t fd)
+{
+  ares_llist_node_t *node;
+
+  node = ares_htable_asvp_get_direct(channel->connnode_by_socket, fd);
+  if (node == NULL) {
+    return NULL;
+  }
+
+  return ares_llist_node_val(node);
+}
diff --git a/deps/cares/src/lib/ares_conn.h b/deps/cares/src/lib/ares_conn.h
new file mode 100644
index 00000000000000..16ecefdd19fa67
--- /dev/null
+++ b/deps/cares/src/lib/ares_conn.h
@@ -0,0 +1,196 @@
+/* MIT License
+ *
+ * Copyright (c) 2024 Brad House
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+#ifndef __ARES_CONN_H
+#define __ARES_CONN_H
+
+#include "ares_socket.h"
+
+struct ares_conn;
+typedef struct ares_conn ares_conn_t;
+
+struct ares_server;
+typedef struct ares_server ares_server_t;
+
+typedef enum {
+  /*! No flags */
+  ARES_CONN_FLAG_NONE = 0,
+  /*! TCP connection, not UDP */
+  ARES_CONN_FLAG_TCP = 1 << 0,
+  /*! TCP Fast Open is enabled and being used if supported by the OS */
+  ARES_CONN_FLAG_TFO = 1 << 1,
+  /*! TCP Fast Open has not yet sent its first packet. Gets unset on first
+   *  write to a connection */
+  ARES_CONN_FLAG_TFO_INITIAL = 1 << 2
+} ares_conn_flags_t;
+
+typedef enum {
+  ARES_CONN_STATE_NONE      = 0,
+  ARES_CONN_STATE_READ      = 1 << 0,
+  ARES_CONN_STATE_WRITE     = 1 << 1,
+  ARES_CONN_STATE_CONNECTED = 1 << 2, /* This doesn't get a callback */
+  ARES_CONN_STATE_CBFLAGS   = ARES_CONN_STATE_READ | ARES_CONN_STATE_WRITE
+} ares_conn_state_flags_t;
+
+struct ares_conn {
+  ares_server_t          *server;
+  ares_socket_t           fd;
+  struct ares_addr        self_ip;
+  ares_conn_flags_t       flags;
+  ares_conn_state_flags_t state_flags;
+
+  /*! Outbound buffered data that is not yet sent.  Exists as one contiguous
+   *  stream in TCP format (big endian 16bit length prefix followed by DNS
+   *  wire-format message).  For TCP this can be sent as-is; for UDP this must
+   *  be sent per-packet (stripping the length prefix) */
+  ares_buf_t             *out_buf;
+
+  /*! Inbound buffered data that is not yet parsed.  Exists as one contiguous
+   *  stream in TCP format (big endian 16bit length prefix followed by DNS
+   *  wire-format message).  TCP may have partial data and this needs to be
+   *  handled gracefully, but UDP will always have a full message */
+  ares_buf_t             *in_buf;
+
+  /* total number of queries run on this connection since it was established */
+  size_t                  total_queries;
+
+  /* list of outstanding queries to this connection */
+  ares_llist_t           *queries_to_conn;
+};
+
+/*! Various buckets for grouping history */
+typedef enum {
+  ARES_METRIC_1MINUTE = 0, /*!< Bucket for tracking over the last minute */
+  ARES_METRIC_15MINUTES,   /*!< Bucket for tracking over the last 15 minutes */
+  ARES_METRIC_1HOUR,       /*!< Bucket for tracking over the last hour */
+  ARES_METRIC_1DAY,        /*!< Bucket for tracking over the last day */
+  ARES_METRIC_INCEPTION,   /*!< Bucket for tracking since inception */
+  ARES_METRIC_COUNT        /*!< Count of buckets, not a real bucket */
+} ares_server_bucket_t;
+
+/*! Data metrics collected for each bucket */
+typedef struct {
+  time_t        ts;             /*!< Timestamp divided by bucket divisor */
+  unsigned int  latency_min_ms; /*!< Minimum latency for queries */
+  unsigned int  latency_max_ms; /*!< Maximum latency for queries */
+  ares_uint64_t total_ms;       /*!< Cumulative query time for bucket */
+  ares_uint64_t total_count;    /*!< Number of queries for bucket */
+
+  time_t        prev_ts;        /*!< Previous period bucket timestamp */
+  ares_uint64_t
+    prev_total_ms; /*!< Previous period bucket cumulative query time */
+  ares_uint64_t prev_total_count; /*!< Previous period bucket query count */
+} ares_server_metrics_t;
+
+typedef enum {
+  ARES_COOKIE_INITIAL     = 0,
+  ARES_COOKIE_GENERATED   = 1,
+  ARES_COOKIE_SUPPORTED   = 2,
+  ARES_COOKIE_UNSUPPORTED = 3
+} ares_cookie_state_t;
+
+/*! Structure holding tracking data for RFC 7873/9018 DNS cookies.
+ *  Implementation plan for this feature is here:
+ *  https://github.com/c-ares/c-ares/issues/620
+ */
+typedef struct {
+  /*! starts at INITIAL, transitions as needed. */
+  ares_cookie_state_t state;
+  /*! randomly-generated client cookie */
+  unsigned char       client[8];
+  /*! timestamp client cookie was generated, used for rotation purposes */
+  ares_timeval_t      client_ts;
+  /*! IP address last used for client to connect to server.  If this changes
+   *  the client cookie gets invalidated */
+  struct ares_addr    client_ip;
+  /*! Server Cookie last received, 8-32 bytes in length */
+  unsigned char       server[32];
+  /*! Length of server cookie on file. */
+  size_t              server_len;
+  /*! Timestamp of last attempt to use cookies, but it was determined that the
+   *  server didn't support them */
+  ares_timeval_t      unsupported_ts;
+} ares_cookie_t;
+
+struct ares_server {
+  /* Configuration */
+  size_t                idx;      /* index for server in system configuration */
+  struct ares_addr      addr;
+  unsigned short        udp_port; /* host byte order */
+  unsigned short        tcp_port; /* host byte order */
+  char                  ll_iface[64];    /* IPv6 Link Local Interface */
+  unsigned int          ll_scope;        /* IPv6 Link Local Scope */
+
+  size_t                consec_failures; /* Consecutive query failure count
+                                          * can be hard errors or timeouts
+                                          */
+  ares_bool_t           probe_pending;   /* Whether a probe is pending for this
+                                          * server due to prior failures */
+  ares_llist_t         *connections;
+  ares_conn_t          *tcp_conn;
+
+  /* The next time when we will retry this server if it has hit failures */
+  ares_timeval_t        next_retry_time;
+
+  /*! Buckets for collecting metrics about the server */
+  ares_server_metrics_t metrics[ARES_METRIC_COUNT];
+
+  /*! RFC 7873/9018 DNS Cookies */
+  ares_cookie_t         cookie;
+
+  /* Link back to owning channel */
+  ares_channel_t       *channel;
+};
+
+void ares_close_connection(ares_conn_t *conn, ares_status_t requeue_status);
+void ares_close_sockets(ares_server_t *server);
+void ares_check_cleanup_conns(const ares_channel_t *channel);
+
+void ares_destroy_servers_state(ares_channel_t *channel);
+ares_status_t   ares_open_connection(ares_conn_t   **conn_out,
+                                     ares_channel_t *channel,
+                                     ares_server_t *server, ares_bool_t is_tcp);
+
+ares_conn_err_t ares_conn_write(ares_conn_t *conn, const void *data, size_t len,
+                                size_t *written);
+ares_status_t   ares_conn_flush(ares_conn_t *conn);
+ares_conn_err_t ares_conn_read(ares_conn_t *conn, void *data, size_t len,
+                               size_t *read_bytes);
+ares_conn_t *ares_conn_from_fd(const ares_channel_t *channel, ares_socket_t fd);
+void         ares_conn_sock_state_cb_update(ares_conn_t            *conn,
+                                            ares_conn_state_flags_t flags);
+ares_conn_err_t ares_socket_recv(ares_channel_t *channel, ares_socket_t s,
+                                 ares_bool_t is_tcp, void *data,
+                                 size_t data_len, size_t *read_bytes);
+ares_conn_err_t ares_socket_recvfrom(ares_channel_t *channel, ares_socket_t s,
+                                     ares_bool_t is_tcp, void *data,
+                                     size_t data_len, int flags,
+                                     struct sockaddr *from,
+                                     ares_socklen_t  *from_len,
+                                     size_t          *read_bytes);
+
+void            ares_destroy_server(ares_server_t *server);
+
+#endif
diff --git a/deps/cares/src/lib/ares_cookie.c b/deps/cares/src/lib/ares_cookie.c
index bf9d1ba25da41d..f31c74e748d974 100644
--- a/deps/cares/src/lib/ares_cookie.c
+++ b/deps/cares/src/lib/ares_cookie.c
@@ -229,7 +229,7 @@ static ares_bool_t timeval_expired(const ares_timeval_t *tv,
 {
   ares_int64_t   tvdiff_ms;
   ares_timeval_t tvdiff;
-  ares__timeval_diff(&tvdiff, tv, now);
+  ares_timeval_diff(&tvdiff, tv, now);
 
   tvdiff_ms = tvdiff.sec * 1000 + tvdiff.usec / 1000;
   if (tvdiff_ms >= (ares_int64_t)millsecs) {
@@ -249,7 +249,7 @@ static void ares_cookie_generate(ares_cookie_t *cookie, ares_conn_t *conn,
 {
   ares_channel_t *channel = conn->server->channel;
 
-  ares__rand_bytes(channel->rand_state, cookie->client, sizeof(cookie->client));
+  ares_rand_bytes(channel->rand_state, cookie->client, sizeof(cookie->client));
   memcpy(&cookie->client_ts, now, sizeof(cookie->client_ts));
   memcpy(&cookie->client_ip, &conn->self_ip, sizeof(cookie->client_ip));
 }
@@ -426,9 +426,8 @@ ares_status_t ares_cookie_validate(ares_query_t            *query,
 
     /* Resend the request, hopefully it will work the next time as we should
      * have recorded a server cookie */
-    ares__requeue_query(query, now, ARES_SUCCESS,
-                        ARES_FALSE /* Don't increment try count */,
-                        NULL);
+    ares_requeue_query(query, now, ARES_SUCCESS,
+                       ARES_FALSE /* Don't increment try count */, NULL);
 
     /* Parent needs to drop this response */
     return ARES_EBADRESP;
diff --git a/deps/cares/src/lib/ares_destroy.c b/deps/cares/src/lib/ares_destroy.c
index d75b5e227cc28d..1e5706e0e06b80 100644
--- a/deps/cares/src/lib/ares_destroy.c
+++ b/deps/cares/src/lib/ares_destroy.c
@@ -31,17 +31,17 @@
 
 void ares_destroy(ares_channel_t *channel)
 {
-  size_t              i;
-  ares__llist_node_t *node = NULL;
+  size_t             i;
+  ares_llist_node_t *node = NULL;
 
   if (channel == NULL) {
     return;
   }
 
   /* Mark as being shutdown */
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
   channel->sys_up = ARES_FALSE;
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 
   /* Disable configuration change monitoring.  We can't hold a lock because
    * some cleanup routines, such as on Windows, are synchronous operations.
@@ -61,23 +61,23 @@ void ares_destroy(ares_channel_t *channel)
    * holding a lock as the thread may take locks. */
   if (channel->reinit_thread != NULL) {
     void *rv;
-    ares__thread_join(channel->reinit_thread, &rv);
+    ares_thread_join(channel->reinit_thread, &rv);
     channel->reinit_thread = NULL;
   }
 
   /* Lock because callbacks will be triggered, and any system-generated
    * callbacks need to hold a channel lock. */
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
   /* Destroy all queries */
-  node = ares__llist_node_first(channel->all_queries);
+  node = ares_llist_node_first(channel->all_queries);
   while (node != NULL) {
-    ares__llist_node_t *next  = ares__llist_node_next(node);
-    ares_query_t       *query = ares__llist_node_claim(node);
+    ares_llist_node_t *next  = ares_llist_node_next(node);
+    ares_query_t      *query = ares_llist_node_claim(node);
 
     query->node_all_queries = NULL;
     query->callback(query->arg, ARES_EDESTRUCTION, 0, NULL);
-    ares__free_query(query);
+    ares_free_query(query);
 
     node = next;
   }
@@ -88,19 +88,19 @@ void ares_destroy(ares_channel_t *channel)
   /* Freeing the query should remove it from all the lists in which it sits,
    * so all query lists should be empty now.
    */
-  assert(ares__llist_len(channel->all_queries) == 0);
-  assert(ares__htable_szvp_num_keys(channel->queries_by_qid) == 0);
-  assert(ares__slist_len(channel->queries_by_timeout) == 0);
+  assert(ares_llist_len(channel->all_queries) == 0);
+  assert(ares_htable_szvp_num_keys(channel->queries_by_qid) == 0);
+  assert(ares_slist_len(channel->queries_by_timeout) == 0);
 #endif
 
-  ares__destroy_servers_state(channel);
+  ares_destroy_servers_state(channel);
 
 #ifndef NDEBUG
-  assert(ares__htable_asvp_num_keys(channel->connnode_by_socket) == 0);
+  assert(ares_htable_asvp_num_keys(channel->connnode_by_socket) == 0);
 #endif
 
   /* No more callbacks will be triggered after this point, unlock */
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 
   /* Shut down the event thread */
   if (channel->optmask & ARES_OPT_EVENT_THREAD) {
@@ -114,48 +114,46 @@ void ares_destroy(ares_channel_t *channel)
     ares_free(channel->domains);
   }
 
-  ares__llist_destroy(channel->all_queries);
-  ares__slist_destroy(channel->queries_by_timeout);
-  ares__htable_szvp_destroy(channel->queries_by_qid);
-  ares__htable_asvp_destroy(channel->connnode_by_socket);
+  ares_llist_destroy(channel->all_queries);
+  ares_slist_destroy(channel->queries_by_timeout);
+  ares_htable_szvp_destroy(channel->queries_by_qid);
+  ares_htable_asvp_destroy(channel->connnode_by_socket);
 
   ares_free(channel->sortlist);
   ares_free(channel->lookups);
   ares_free(channel->resolvconf_path);
   ares_free(channel->hosts_path);
-  ares__destroy_rand_state(channel->rand_state);
+  ares_destroy_rand_state(channel->rand_state);
 
-  ares__hosts_file_destroy(channel->hf);
+  ares_hosts_file_destroy(channel->hf);
 
-  ares__qcache_destroy(channel->qcache);
+  ares_qcache_destroy(channel->qcache);
 
-  ares__channel_threading_destroy(channel);
+  ares_channel_threading_destroy(channel);
 
   ares_free(channel);
 }
 
-void ares__destroy_server(ares_server_t *server)
+void ares_destroy_server(ares_server_t *server)
 {
   if (server == NULL) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  ares__close_sockets(server);
-  ares__llist_destroy(server->connections);
-  ares__buf_destroy(server->tcp_parser);
-  ares__buf_destroy(server->tcp_send);
+  ares_close_sockets(server);
+  ares_llist_destroy(server->connections);
   ares_free(server);
 }
 
-void ares__destroy_servers_state(ares_channel_t *channel)
+void ares_destroy_servers_state(ares_channel_t *channel)
 {
-  ares__slist_node_t *node;
+  ares_slist_node_t *node;
 
-  while ((node = ares__slist_node_first(channel->servers)) != NULL) {
-    ares_server_t *server = ares__slist_node_claim(node);
-    ares__destroy_server(server);
+  while ((node = ares_slist_node_first(channel->servers)) != NULL) {
+    ares_server_t *server = ares_slist_node_claim(node);
+    ares_destroy_server(server);
   }
 
-  ares__slist_destroy(channel->servers);
+  ares_slist_destroy(channel->servers);
   channel->servers = NULL;
 }
diff --git a/deps/cares/src/lib/ares_freeaddrinfo.c b/deps/cares/src/lib/ares_freeaddrinfo.c
index 2a49f57531e9bb..c996df9104a18a 100644
--- a/deps/cares/src/lib/ares_freeaddrinfo.c
+++ b/deps/cares/src/lib/ares_freeaddrinfo.c
@@ -31,7 +31,7 @@
 #  include <netdb.h>
 #endif
 
-void ares__freeaddrinfo_cnames(struct ares_addrinfo_cname *head)
+void ares_freeaddrinfo_cnames(struct ares_addrinfo_cname *head)
 {
   struct ares_addrinfo_cname *current;
   while (head) {
@@ -43,7 +43,7 @@ void ares__freeaddrinfo_cnames(struct ares_addrinfo_cname *head)
   }
 }
 
-void ares__freeaddrinfo_nodes(struct ares_addrinfo_node *head)
+void ares_freeaddrinfo_nodes(struct ares_addrinfo_node *head)
 {
   struct ares_addrinfo_node *current;
   while (head) {
@@ -59,8 +59,8 @@ void ares_freeaddrinfo(struct ares_addrinfo *ai)
   if (ai == NULL) {
     return;
   }
-  ares__freeaddrinfo_cnames(ai->cnames);
-  ares__freeaddrinfo_nodes(ai->nodes);
+  ares_freeaddrinfo_cnames(ai->cnames);
+  ares_freeaddrinfo_nodes(ai->nodes);
 
   ares_free(ai->name);
   ares_free(ai);
diff --git a/deps/cares/src/lib/ares_getaddrinfo.c b/deps/cares/src/lib/ares_getaddrinfo.c
index 713acf744a0dca..09d34d337834af 100644
--- a/deps/cares/src/lib/ares_getaddrinfo.c
+++ b/deps/cares/src/lib/ares_getaddrinfo.c
@@ -58,10 +58,6 @@
 
 #include "ares_dns.h"
 
-#ifdef _WIN32
-#  include "ares_platform.h"
-#endif
-
 struct host_query {
   ares_channel_t            *channel;
   char                      *name;
@@ -101,7 +97,7 @@ static const struct ares_addrinfo_hints default_hints = {
 static ares_bool_t next_dns_lookup(struct host_query *hquery);
 
 struct ares_addrinfo_cname *
-  ares__append_addrinfo_cname(struct ares_addrinfo_cname **head)
+  ares_append_addrinfo_cname(struct ares_addrinfo_cname **head)
 {
   struct ares_addrinfo_cname *tail = ares_malloc_zero(sizeof(*tail));
   struct ares_addrinfo_cname *last = *head;
@@ -123,8 +119,8 @@ struct ares_addrinfo_cname *
   return tail;
 }
 
-void ares__addrinfo_cat_cnames(struct ares_addrinfo_cname **head,
-                               struct ares_addrinfo_cname  *tail)
+void ares_addrinfo_cat_cnames(struct ares_addrinfo_cname **head,
+                              struct ares_addrinfo_cname  *tail)
 {
   struct ares_addrinfo_cname *last = *head;
   if (!last) {
@@ -141,7 +137,7 @@ void ares__addrinfo_cat_cnames(struct ares_addrinfo_cname **head,
 
 /* Allocate new addrinfo and append to the tail. */
 struct ares_addrinfo_node *
-  ares__append_addrinfo_node(struct ares_addrinfo_node **head)
+  ares_append_addrinfo_node(struct ares_addrinfo_node **head)
 {
   struct ares_addrinfo_node *tail = ares_malloc_zero(sizeof(*tail));
   struct ares_addrinfo_node *last = *head;
@@ -163,8 +159,8 @@ struct ares_addrinfo_node *
   return tail;
 }
 
-void ares__addrinfo_cat_nodes(struct ares_addrinfo_node **head,
-                              struct ares_addrinfo_node  *tail)
+void ares_addrinfo_cat_nodes(struct ares_addrinfo_node **head,
+                             struct ares_addrinfo_node  *tail)
 {
   struct ares_addrinfo_node *last = *head;
   if (!last) {
@@ -252,7 +248,7 @@ static ares_bool_t fake_addrinfo(const char *name, unsigned short port,
     ares_bool_t valid   = ARES_TRUE;
     const char *p;
     for (p = name; *p; p++) {
-      if (!ares__isdigit(*p) && *p != '.') {
+      if (!ares_isdigit(*p) && *p != '.') {
         valid = ARES_FALSE;
         break;
       } else if (*p == '.') {
@@ -297,7 +293,7 @@ static ares_bool_t fake_addrinfo(const char *name, unsigned short port,
   }
 
   if (hints->ai_flags & ARES_AI_CANONNAME) {
-    cname = ares__append_addrinfo_cname(&ai->cnames);
+    cname = ares_append_addrinfo_cname(&ai->cnames);
     if (!cname) {
       /* LCOV_EXCL_START: OutOfMemory */
       ares_freeaddrinfo(ai);
@@ -327,7 +323,7 @@ static void hquery_free(struct host_query *hquery, ares_bool_t cleanup_ai)
   if (cleanup_ai) {
     ares_freeaddrinfo(hquery->ai);
   }
-  ares__strsplit_free(hquery->names, hquery->names_cnt);
+  ares_strsplit_free(hquery->names, hquery->names_cnt);
   ares_free(hquery->name);
   ares_free(hquery->lookups);
   ares_free(hquery);
@@ -341,7 +337,7 @@ static void end_hquery(struct host_query *hquery, ares_status_t status)
   if (status == ARES_SUCCESS) {
     if (!(hquery->hints.ai_flags & ARES_AI_NOSORT) && hquery->ai->nodes) {
       sentinel.ai_next = hquery->ai->nodes;
-      ares__sortaddrinfo(hquery->channel, &sentinel);
+      ares_sortaddrinfo(hquery->channel, &sentinel);
       hquery->ai->nodes = sentinel.ai_next;
     }
     next = hquery->ai->nodes;
@@ -361,7 +357,7 @@ static void end_hquery(struct host_query *hquery, ares_status_t status)
   hquery_free(hquery, ARES_FALSE);
 }
 
-ares_bool_t ares__is_localhost(const char *name)
+ares_bool_t ares_is_localhost(const char *name)
 {
   /* RFC6761 6.3 says : The domain "localhost." and any names falling within
    * ".localhost." */
@@ -371,7 +367,7 @@ ares_bool_t ares__is_localhost(const char *name)
     return ARES_FALSE; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  if (strcmp(name, "localhost") == 0) {
+  if (ares_strcaseeq(name, "localhost")) {
     return ARES_TRUE;
   }
 
@@ -380,7 +376,8 @@ ares_bool_t ares__is_localhost(const char *name)
     return ARES_FALSE;
   }
 
-  if (strcmp(name + (len - 10 /* strlen(".localhost") */), ".localhost") == 0) {
+  if (ares_strcaseeq(name + (len - 10 /* strlen(".localhost") */),
+                     ".localhost")) {
     return ARES_TRUE;
   }
 
@@ -393,11 +390,11 @@ static ares_status_t file_lookup(struct host_query *hquery)
   ares_status_t             status;
 
   /* Per RFC 7686, reject queries for ".onion" domain names with NXDOMAIN. */
-  if (ares__is_onion_domain(hquery->name)) {
+  if (ares_is_onion_domain(hquery->name)) {
     return ARES_ENOTFOUND;
   }
 
-  status = ares__hosts_search_host(
+  status = ares_hosts_search_host(
     hquery->channel,
     (hquery->hints.ai_flags & ARES_AI_ENVHOSTS) ? ARES_TRUE : ARES_FALSE,
     hquery->name, &entry);
@@ -406,7 +403,7 @@ static ares_status_t file_lookup(struct host_query *hquery)
     goto done;
   }
 
-  status = ares__hosts_entry_to_addrinfo(
+  status = ares_hosts_entry_to_addrinfo(
     entry, hquery->name, hquery->hints.ai_family, hquery->port,
     (hquery->hints.ai_flags & ARES_AI_CANONNAME) ? ARES_TRUE : ARES_FALSE,
     hquery->ai);
@@ -423,9 +420,9 @@ static ares_status_t file_lookup(struct host_query *hquery)
    * We will also ignore ALL errors when trying to resolve localhost, such
    * as permissions errors reading /etc/hosts or a malformed /etc/hosts */
   if (status != ARES_SUCCESS && status != ARES_ENOMEM &&
-      ares__is_localhost(hquery->name)) {
-    return ares__addrinfo_localhost(hquery->name, hquery->port, &hquery->hints,
-                                    hquery->ai);
+      ares_is_localhost(hquery->name)) {
+    return ares_addrinfo_localhost(hquery->name, hquery->port, &hquery->hints,
+                                   hquery->ai);
   }
 
   return status;
@@ -439,7 +436,7 @@ static void next_lookup(struct host_query *hquery, ares_status_t status)
        * queries for localhost names to their configured caching DNS
        * server(s)."
        * Otherwise, DNS lookup. */
-      if (!ares__is_localhost(hquery->name) && next_dns_lookup(hquery)) {
+      if (!ares_is_localhost(hquery->name) && next_dns_lookup(hquery)) {
         break;
       }
 
@@ -476,7 +473,7 @@ static void terminate_retries(const struct host_query *hquery,
     return;
   }
 
-  query = ares__htable_szvp_get_direct(channel->queries_by_qid, term_qid);
+  query = ares_htable_szvp_get_direct(channel->queries_by_qid, term_qid);
   if (query == NULL) {
     return;
   }
@@ -497,7 +494,7 @@ static void host_callback(void *arg, ares_status_t status, size_t timeouts,
       addinfostatus = ARES_EBADRESP; /* LCOV_EXCL_LINE: DefensiveCoding */
     } else {
       addinfostatus =
-        ares__parse_into_addrinfo(dnsrec, ARES_TRUE, hquery->port, hquery->ai);
+        ares_parse_into_addrinfo(dnsrec, ARES_TRUE, hquery->port, hquery->ai);
     }
     if (addinfostatus == ARES_SUCCESS) {
       terminate_retries(hquery, ares_dns_record_get_id(dnsrec));
@@ -528,10 +525,9 @@ static void host_callback(void *arg, ares_status_t status, size_t timeouts,
         hquery->nodata_cnt++;
       }
       next_lookup(hquery, hquery->nodata_cnt ? ARES_ENODATA : status);
-    } else if (
-        (status == ARES_ESERVFAIL || status == ARES_EREFUSED) &&
-        ares__name_label_cnt(hquery->names[hquery->next_name_idx-1]) == 1
-      ) {
+    } else if ((status == ARES_ESERVFAIL || status == ARES_EREFUSED) &&
+               ares_name_label_cnt(hquery->names[hquery->next_name_idx - 1]) ==
+                 1) {
       /* Issue #852, systemd-resolved may return SERVFAIL or REFUSED on a
        * single label domain name. */
       next_lookup(hquery, hquery->nodata_cnt ? ARES_ENODATA : status);
@@ -567,7 +563,7 @@ static void ares_getaddrinfo_int(ares_channel_t *channel, const char *name,
     return;
   }
 
-  if (ares__is_onion_domain(name)) {
+  if (ares_is_onion_domain(name)) {
     callback(arg, ARES_ENOTFOUND, 0, NULL);
     return;
   }
@@ -630,7 +626,7 @@ static void ares_getaddrinfo_int(ares_channel_t *channel, const char *name,
   }
 
   status =
-    ares__search_name_list(channel, name, &hquery->names, &hquery->names_cnt);
+    ares_search_name_list(channel, name, &hquery->names, &hquery->names_cnt);
   if (status != ARES_SUCCESS) {
     hquery_free(hquery, ARES_TRUE);
     callback(arg, (int)status, 0, NULL);
@@ -659,9 +655,9 @@ void ares_getaddrinfo(ares_channel_t *channel, const char *name,
   if (channel == NULL) {
     return;
   }
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
   ares_getaddrinfo_int(channel, name, service, hints, callback, arg);
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 }
 
 static ares_bool_t next_dns_lookup(struct host_query *hquery)
diff --git a/deps/cares/src/lib/ares_gethostbyaddr.c b/deps/cares/src/lib/ares_gethostbyaddr.c
index 1db81ec2b48b9d..a7acf3c45c9e6d 100644
--- a/deps/cares/src/lib/ares_gethostbyaddr.c
+++ b/deps/cares/src/lib/ares_gethostbyaddr.c
@@ -39,7 +39,6 @@
 
 #include "ares_nameser.h"
 #include "ares_inet_net_pton.h"
-#include "ares_platform.h"
 
 struct addr_query {
   /* Arguments passed to ares_gethostbyaddr() */
@@ -112,9 +111,9 @@ void ares_gethostbyaddr(ares_channel_t *channel, const void *addr, int addrlen,
   if (channel == NULL) {
     return;
   }
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
   ares_gethostbyaddr_nolock(channel, addr, addrlen, family, callback, arg);
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 }
 
 static void next_lookup(struct addr_query *aquery)
@@ -216,12 +215,12 @@ static ares_status_t file_lookup(ares_channel_t         *channel,
     return ARES_ENOTFOUND;
   }
 
-  status = ares__hosts_search_ipaddr(channel, ARES_FALSE, ipaddr, &entry);
+  status = ares_hosts_search_ipaddr(channel, ARES_FALSE, ipaddr, &entry);
   if (status != ARES_SUCCESS) {
     return status;
   }
 
-  status = ares__hosts_entry_to_hostent(entry, addr->family, host);
+  status = ares_hosts_entry_to_hostent(entry, addr->family, host);
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
diff --git a/deps/cares/src/lib/ares_gethostbyname.c b/deps/cares/src/lib/ares_gethostbyname.c
index 6db86e0eed1c65..56de729526af34 100644
--- a/deps/cares/src/lib/ares_gethostbyname.c
+++ b/deps/cares/src/lib/ares_gethostbyname.c
@@ -44,7 +44,6 @@
 #endif
 
 #include "ares_inet_net_pton.h"
-#include "ares_platform.h"
 
 static void   sort_addresses(const struct hostent  *host,
                              const struct apattern *sortlist, size_t nsort);
@@ -68,7 +67,7 @@ static void ares_gethostbyname_callback(void *arg, int status, int timeouts,
   struct host_query *ghbn_arg = arg;
 
   if (status == ARES_SUCCESS) {
-    status = (int)ares__addrinfo2hostent(result, AF_UNSPEC, &hostent);
+    status = (int)ares_addrinfo2hostent(result, AF_UNSPEC, &hostent);
   }
 
   /* addrinfo2hostent will only return ENODATA if there are no addresses _and_
@@ -175,7 +174,7 @@ static size_t get_address_index(const struct in_addr  *addr,
       continue;
     }
 
-    if (ares__subnet_match(&aaddr, &sortlist[i].addr, sortlist[i].mask)) {
+    if (ares_subnet_match(&aaddr, &sortlist[i].addr, sortlist[i].mask)) {
       break;
     }
   }
@@ -231,15 +230,15 @@ static size_t get6_address_index(const struct ares_in6_addr *addr,
       continue;
     }
 
-    if (ares__subnet_match(&aaddr, &sortlist[i].addr, sortlist[i].mask)) {
+    if (ares_subnet_match(&aaddr, &sortlist[i].addr, sortlist[i].mask)) {
       break;
     }
   }
   return i;
 }
 
-static ares_status_t ares__hostent_localhost(const char *name, int family,
-                                             struct hostent **host_out)
+static ares_status_t ares_hostent_localhost(const char *name, int family,
+                                            struct hostent **host_out)
 {
   ares_status_t              status;
   struct ares_addrinfo      *ai = NULL;
@@ -254,12 +253,12 @@ static ares_status_t ares__hostent_localhost(const char *name, int family,
     goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  status = ares__addrinfo_localhost(name, 0, &hints, ai);
+  status = ares_addrinfo_localhost(name, 0, &hints, ai);
   if (status != ARES_SUCCESS) {
     goto done; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  status = ares__addrinfo2hostent(ai, family, host_out);
+  status = ares_addrinfo2hostent(ai, family, host_out);
   if (status != ARES_SUCCESS) {
     goto done; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -289,16 +288,16 @@ static ares_status_t ares_gethostbyname_file_int(ares_channel_t *channel,
   }
 
   /* Per RFC 7686, reject queries for ".onion" domain names with NXDOMAIN. */
-  if (ares__is_onion_domain(name)) {
+  if (ares_is_onion_domain(name)) {
     return ARES_ENOTFOUND;
   }
 
-  status = ares__hosts_search_host(channel, ARES_FALSE, name, &entry);
+  status = ares_hosts_search_host(channel, ARES_FALSE, name, &entry);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  status = ares__hosts_entry_to_hostent(entry, family, host);
+  status = ares_hosts_entry_to_hostent(entry, family, host);
   if (status != ARES_SUCCESS) {
     goto done; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -310,8 +309,8 @@ static ares_status_t ares_gethostbyname_file_int(ares_channel_t *channel,
    * We will also ignore ALL errors when trying to resolve localhost, such
    * as permissions errors reading /etc/hosts or a malformed /etc/hosts */
   if (status != ARES_SUCCESS && status != ARES_ENOMEM &&
-      ares__is_localhost(name)) {
-    return ares__hostent_localhost(name, family, host);
+      ares_is_localhost(name)) {
+    return ares_hostent_localhost(name, family, host);
   }
 
   return status;
@@ -325,8 +324,8 @@ int ares_gethostbyname_file(ares_channel_t *channel, const char *name,
     return ARES_ENOTFOUND;
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
   status = ares_gethostbyname_file_int(channel, name, family, host);
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
   return (int)status;
 }
diff --git a/deps/cares/src/lib/ares_getnameinfo.c b/deps/cares/src/lib/ares_getnameinfo.c
index 622c1adb1c7342..01593545308cc4 100644
--- a/deps/cares/src/lib/ares_getnameinfo.c
+++ b/deps/cares/src/lib/ares_getnameinfo.c
@@ -193,9 +193,9 @@ void ares_getnameinfo(ares_channel_t *channel, const struct sockaddr *sa,
     return;
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
   ares_getnameinfo_int(channel, sa, salen, flags_int, callback, arg);
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 }
 
 static void nameinfo_callback(void *arg, int status, int timeouts,
@@ -410,8 +410,8 @@ static char *ares_striendstr(const char *s1, const char *s2)
   c1       = c1_begin;
   c2       = s2;
   while (c2 < s2 + s2_len) {
-    lo1 = ares__tolower((unsigned char)*c1);
-    lo2 = ares__tolower((unsigned char)*c2);
+    lo1 = ares_tolower((unsigned char)*c1);
+    lo2 = ares_tolower((unsigned char)*c2);
     if (lo1 != lo2) {
       return NULL;
     } else {
@@ -423,7 +423,7 @@ static char *ares_striendstr(const char *s1, const char *s2)
   return (char *)((size_t)c1_begin);
 }
 
-ares_bool_t ares__is_onion_domain(const char *name)
+ares_bool_t ares_is_onion_domain(const char *name)
 {
   if (ares_striendstr(name, ".onion")) {
     return ARES_TRUE;
diff --git a/deps/cares/src/lib/ares__hosts_file.c b/deps/cares/src/lib/ares_hosts_file.c
similarity index 64%
rename from deps/cares/src/lib/ares__hosts_file.c
rename to deps/cares/src/lib/ares_hosts_file.c
index ae9c071d7aa90b..0439b8e1d476b9 100644
--- a/deps/cares/src/lib/ares__hosts_file.c
+++ b/deps/cares/src/lib/ares_hosts_file.c
@@ -40,7 +40,6 @@
 #  include <arpa/inet.h>
 #endif
 #include <time.h>
-#include "ares_platform.h"
 
 /* HOSTS FILE PROCESSING OVERVIEW
  * ==============================
@@ -78,22 +77,22 @@
  */
 
 struct ares_hosts_file {
-  time_t                ts;
+  time_t               ts;
   /*! cache the filename so we know if the filename changes it automatically
    *  invalidates the cache */
-  char                 *filename;
+  char                *filename;
   /*! iphash is the owner of the 'entry' object as there is only ever a single
    *  match to the object. */
-  ares__htable_strvp_t *iphash;
+  ares_htable_strvp_t *iphash;
   /*! hosthash does not own the entry so won't free on destruction */
-  ares__htable_strvp_t *hosthash;
+  ares_htable_strvp_t *hosthash;
 };
 
 struct ares_hosts_entry {
-  size_t         refcnt; /*! If the entry is stored multiple times in the
-                          *  ip address hash, we have to reference count it */
-  ares__llist_t *ips;
-  ares__llist_t *hosts;
+  size_t        refcnt; /*! If the entry is stored multiple times in the
+                         *  ip address hash, we have to reference count it */
+  ares_llist_t *ips;
+  ares_llist_t *hosts;
 };
 
 const void *ares_dns_pton(const char *ipaddr, struct ares_addr *addr,
@@ -132,8 +131,8 @@ const void *ares_dns_pton(const char *ipaddr, struct ares_addr *addr,
   return ptr;
 }
 
-static ares_bool_t ares__normalize_ipaddr(const char *ipaddr, char *out,
-                                          size_t out_len)
+static ares_bool_t ares_normalize_ipaddr(const char *ipaddr, char *out,
+                                         size_t out_len)
 {
   struct ares_addr data;
   const void      *addr;
@@ -154,7 +153,7 @@ static ares_bool_t ares__normalize_ipaddr(const char *ipaddr, char *out,
   return ARES_TRUE;
 }
 
-static void ares__hosts_entry_destroy(ares_hosts_entry_t *entry)
+static void ares_hosts_entry_destroy(ares_hosts_entry_t *entry)
 {
   if (entry == NULL) {
     return;
@@ -169,29 +168,29 @@ static void ares__hosts_entry_destroy(ares_hosts_entry_t *entry)
     return;
   }
 
-  ares__llist_destroy(entry->hosts);
-  ares__llist_destroy(entry->ips);
+  ares_llist_destroy(entry->hosts);
+  ares_llist_destroy(entry->ips);
   ares_free(entry);
 }
 
-static void ares__hosts_entry_destroy_cb(void *entry)
+static void ares_hosts_entry_destroy_cb(void *entry)
 {
-  ares__hosts_entry_destroy(entry);
+  ares_hosts_entry_destroy(entry);
 }
 
-void ares__hosts_file_destroy(ares_hosts_file_t *hf)
+void ares_hosts_file_destroy(ares_hosts_file_t *hf)
 {
   if (hf == NULL) {
     return;
   }
 
   ares_free(hf->filename);
-  ares__htable_strvp_destroy(hf->hosthash);
-  ares__htable_strvp_destroy(hf->iphash);
+  ares_htable_strvp_destroy(hf->hosthash);
+  ares_htable_strvp_destroy(hf->iphash);
   ares_free(hf);
 }
 
-static ares_hosts_file_t *ares__hosts_file_create(const char *filename)
+static ares_hosts_file_t *ares_hosts_file_create(const char *filename)
 {
   ares_hosts_file_t *hf = ares_malloc_zero(sizeof(*hf));
   if (hf == NULL) {
@@ -205,12 +204,12 @@ static ares_hosts_file_t *ares__hosts_file_create(const char *filename)
     goto fail;
   }
 
-  hf->iphash = ares__htable_strvp_create(ares__hosts_entry_destroy_cb);
+  hf->iphash = ares_htable_strvp_create(ares_hosts_entry_destroy_cb);
   if (hf->iphash == NULL) {
     goto fail;
   }
 
-  hf->hosthash = ares__htable_strvp_create(NULL);
+  hf->hosthash = ares_htable_strvp_create(NULL);
   if (hf->hosthash == NULL) {
     goto fail;
   }
@@ -218,7 +217,7 @@ static ares_hosts_file_t *ares__hosts_file_create(const char *filename)
   return hf;
 
 fail:
-  ares__hosts_file_destroy(hf);
+  ares_hosts_file_destroy(hf);
   return NULL;
 }
 
@@ -228,63 +227,63 @@ typedef enum {
   ARES_MATCH_HOST   = 2
 } ares_hosts_file_match_t;
 
-static ares_status_t ares__hosts_file_merge_entry(
+static ares_status_t ares_hosts_file_merge_entry(
   const ares_hosts_file_t *hf, ares_hosts_entry_t *existing,
   ares_hosts_entry_t *entry, ares_hosts_file_match_t matchtype)
 {
-  ares__llist_node_t *node;
+  ares_llist_node_t *node;
 
   /* If we matched on IP address, we know there can only be 1, so there's no
    * reason to do anything */
   if (matchtype != ARES_MATCH_IPADDR) {
-    while ((node = ares__llist_node_first(entry->ips)) != NULL) {
-      const char *ipaddr = ares__llist_node_val(node);
+    while ((node = ares_llist_node_first(entry->ips)) != NULL) {
+      const char *ipaddr = ares_llist_node_val(node);
 
-      if (ares__htable_strvp_get_direct(hf->iphash, ipaddr) != NULL) {
-        ares__llist_node_destroy(node);
+      if (ares_htable_strvp_get_direct(hf->iphash, ipaddr) != NULL) {
+        ares_llist_node_destroy(node);
         continue;
       }
 
-      ares__llist_node_move_parent_last(node, existing->ips);
+      ares_llist_node_mvparent_last(node, existing->ips);
     }
   }
 
 
-  while ((node = ares__llist_node_first(entry->hosts)) != NULL) {
-    const char *hostname = ares__llist_node_val(node);
+  while ((node = ares_llist_node_first(entry->hosts)) != NULL) {
+    const char *hostname = ares_llist_node_val(node);
 
-    if (ares__htable_strvp_get_direct(hf->hosthash, hostname) != NULL) {
-      ares__llist_node_destroy(node);
+    if (ares_htable_strvp_get_direct(hf->hosthash, hostname) != NULL) {
+      ares_llist_node_destroy(node);
       continue;
     }
 
-    ares__llist_node_move_parent_last(node, existing->hosts);
+    ares_llist_node_mvparent_last(node, existing->hosts);
   }
 
-  ares__hosts_entry_destroy(entry);
+  ares_hosts_entry_destroy(entry);
   return ARES_SUCCESS;
 }
 
 static ares_hosts_file_match_t
-  ares__hosts_file_match(const ares_hosts_file_t *hf, ares_hosts_entry_t *entry,
-                         ares_hosts_entry_t **match)
+  ares_hosts_file_match(const ares_hosts_file_t *hf, ares_hosts_entry_t *entry,
+                        ares_hosts_entry_t **match)
 {
-  ares__llist_node_t *node;
+  ares_llist_node_t *node;
   *match = NULL;
 
-  for (node = ares__llist_node_first(entry->ips); node != NULL;
-       node = ares__llist_node_next(node)) {
-    const char *ipaddr = ares__llist_node_val(node);
-    *match             = ares__htable_strvp_get_direct(hf->iphash, ipaddr);
+  for (node = ares_llist_node_first(entry->ips); node != NULL;
+       node = ares_llist_node_next(node)) {
+    const char *ipaddr = ares_llist_node_val(node);
+    *match             = ares_htable_strvp_get_direct(hf->iphash, ipaddr);
     if (*match != NULL) {
       return ARES_MATCH_IPADDR;
     }
   }
 
-  for (node = ares__llist_node_first(entry->hosts); node != NULL;
-       node = ares__llist_node_next(node)) {
-    const char *host = ares__llist_node_val(node);
-    *match           = ares__htable_strvp_get_direct(hf->hosthash, host);
+  for (node = ares_llist_node_first(entry->hosts); node != NULL;
+       node = ares_llist_node_next(node)) {
+    const char *host = ares_llist_node_val(node);
+    *match           = ares_htable_strvp_get_direct(hf->hosthash, host);
     if (*match != NULL) {
       return ARES_MATCH_HOST;
     }
@@ -294,38 +293,38 @@ static ares_hosts_file_match_t
 }
 
 /*! entry is invalidated upon calling this function, always, even on error */
-static ares_status_t ares__hosts_file_add(ares_hosts_file_t  *hosts,
-                                          ares_hosts_entry_t *entry)
+static ares_status_t ares_hosts_file_add(ares_hosts_file_t  *hosts,
+                                         ares_hosts_entry_t *entry)
 {
   ares_hosts_entry_t     *match  = NULL;
   ares_status_t           status = ARES_SUCCESS;
-  ares__llist_node_t     *node;
+  ares_llist_node_t      *node;
   ares_hosts_file_match_t matchtype;
   size_t                  num_hostnames;
 
   /* Record the number of hostnames in this entry file.  If we merge into an
    * existing record, these will be *appended* to the entry, so we'll count
    * backwards when adding to the hosts hashtable */
-  num_hostnames = ares__llist_len(entry->hosts);
+  num_hostnames = ares_llist_len(entry->hosts);
 
-  matchtype = ares__hosts_file_match(hosts, entry, &match);
+  matchtype = ares_hosts_file_match(hosts, entry, &match);
 
   if (matchtype != ARES_MATCH_NONE) {
-    status = ares__hosts_file_merge_entry(hosts, match, entry, matchtype);
+    status = ares_hosts_file_merge_entry(hosts, match, entry, matchtype);
     if (status != ARES_SUCCESS) {
-      ares__hosts_entry_destroy(entry); /* LCOV_EXCL_LINE: DefensiveCoding */
-      return status;                    /* LCOV_EXCL_LINE: DefensiveCoding */
+      ares_hosts_entry_destroy(entry); /* LCOV_EXCL_LINE: DefensiveCoding */
+      return status;                   /* LCOV_EXCL_LINE: DefensiveCoding */
     }
     /* entry was invalidated above by merging */
     entry = match;
   }
 
   if (matchtype != ARES_MATCH_IPADDR) {
-    const char *ipaddr = ares__llist_last_val(entry->ips);
+    const char *ipaddr = ares_llist_last_val(entry->ips);
 
-    if (!ares__htable_strvp_get(hosts->iphash, ipaddr, NULL)) {
-      if (!ares__htable_strvp_insert(hosts->iphash, ipaddr, entry)) {
-        ares__hosts_entry_destroy(entry);
+    if (!ares_htable_strvp_get(hosts->iphash, ipaddr, NULL)) {
+      if (!ares_htable_strvp_insert(hosts->iphash, ipaddr, entry)) {
+        ares_hosts_entry_destroy(entry);
         return ARES_ENOMEM;
       }
       entry->refcnt++;
@@ -334,9 +333,9 @@ static ares_status_t ares__hosts_file_add(ares_hosts_file_t  *hosts,
 
   /* Go backwards, on a merge, hostnames are appended.  Breakout once we've
    * consumed all the hosts that we appended */
-  for (node = ares__llist_node_last(entry->hosts); node != NULL;
-       node = ares__llist_node_prev(node)) {
-    const char *val = ares__llist_node_val(node);
+  for (node = ares_llist_node_last(entry->hosts); node != NULL;
+       node = ares_llist_node_prev(node)) {
+    const char *val = ares_llist_node_val(node);
 
     if (num_hostnames == 0) {
       break;
@@ -346,11 +345,11 @@ static ares_status_t ares__hosts_file_add(ares_hosts_file_t  *hosts,
 
     /* first hostname match wins.  If we detect a duplicate hostname for another
      * ip it will automatically be added to the same entry */
-    if (ares__htable_strvp_get(hosts->hosthash, val, NULL)) {
+    if (ares_htable_strvp_get(hosts->hosthash, val, NULL)) {
       continue;
     }
 
-    if (!ares__htable_strvp_insert(hosts->hosthash, val, entry)) {
+    if (!ares_htable_strvp_insert(hosts->hosthash, val, entry)) {
       return ARES_ENOMEM;
     }
   }
@@ -358,15 +357,15 @@ static ares_status_t ares__hosts_file_add(ares_hosts_file_t  *hosts,
   return ARES_SUCCESS;
 }
 
-static ares_bool_t ares__hosts_entry_isdup(ares_hosts_entry_t *entry,
-                                           const char         *host)
+static ares_bool_t ares_hosts_entry_isdup(ares_hosts_entry_t *entry,
+                                          const char         *host)
 {
-  ares__llist_node_t *node;
+  ares_llist_node_t *node;
 
-  for (node = ares__llist_node_first(entry->ips); node != NULL;
-       node = ares__llist_node_next(node)) {
-    const char *myhost = ares__llist_node_val(node);
-    if (strcasecmp(myhost, host) == 0) {
+  for (node = ares_llist_node_first(entry->ips); node != NULL;
+       node = ares_llist_node_next(node)) {
+    const char *myhost = ares_llist_node_val(node);
+    if (ares_strcaseeq(myhost, host)) {
       return ARES_TRUE;
     }
   }
@@ -374,44 +373,44 @@ static ares_bool_t ares__hosts_entry_isdup(ares_hosts_entry_t *entry,
   return ARES_FALSE;
 }
 
-static ares_status_t ares__parse_hosts_hostnames(ares__buf_t        *buf,
-                                                 ares_hosts_entry_t *entry)
+static ares_status_t ares_parse_hosts_hostnames(ares_buf_t         *buf,
+                                                ares_hosts_entry_t *entry)
 {
-  entry->hosts = ares__llist_create(ares_free);
+  entry->hosts = ares_llist_create(ares_free);
   if (entry->hosts == NULL) {
     return ARES_ENOMEM;
   }
 
   /* Parse hostnames and aliases */
-  while (ares__buf_len(buf)) {
+  while (ares_buf_len(buf)) {
     char          hostname[256];
     char         *temp;
     ares_status_t status;
     unsigned char comment = '#';
 
-    ares__buf_consume_whitespace(buf, ARES_FALSE);
+    ares_buf_consume_whitespace(buf, ARES_FALSE);
 
-    if (ares__buf_len(buf) == 0) {
+    if (ares_buf_len(buf) == 0) {
       break;
     }
 
     /* See if it is a comment, if so stop processing */
-    if (ares__buf_begins_with(buf, &comment, 1)) {
+    if (ares_buf_begins_with(buf, &comment, 1)) {
       break;
     }
 
-    ares__buf_tag(buf);
+    ares_buf_tag(buf);
 
     /* Must be at end of line */
-    if (ares__buf_consume_nonwhitespace(buf) == 0) {
+    if (ares_buf_consume_nonwhitespace(buf) == 0) {
       break;
     }
 
-    status = ares__buf_tag_fetch_string(buf, hostname, sizeof(hostname));
+    status = ares_buf_tag_fetch_string(buf, hostname, sizeof(hostname));
     if (status != ARES_SUCCESS) {
       /* Bad entry, just ignore as long as its not the first.  If its the first,
        * it must be valid */
-      if (ares__llist_len(entry->hosts) == 0) {
+      if (ares_llist_len(entry->hosts) == 0) {
         return ARES_EBADSTR;
       }
 
@@ -419,12 +418,12 @@ static ares_status_t ares__parse_hosts_hostnames(ares__buf_t        *buf,
     }
 
     /* Validate it is a valid hostname characterset */
-    if (!ares__is_hostname(hostname)) {
+    if (!ares_is_hostname(hostname)) {
       continue;
     }
 
     /* Don't add a duplicate to the same entry */
-    if (ares__hosts_entry_isdup(entry, hostname)) {
+    if (ares_hosts_entry_isdup(entry, hostname)) {
       continue;
     }
 
@@ -434,22 +433,22 @@ static ares_status_t ares__parse_hosts_hostnames(ares__buf_t        *buf,
       return ARES_ENOMEM;
     }
 
-    if (ares__llist_insert_last(entry->hosts, temp) == NULL) {
+    if (ares_llist_insert_last(entry->hosts, temp) == NULL) {
       ares_free(temp);
       return ARES_ENOMEM;
     }
   }
 
   /* Must have at least 1 entry */
-  if (ares__llist_len(entry->hosts) == 0) {
+  if (ares_llist_len(entry->hosts) == 0) {
     return ARES_EBADSTR;
   }
 
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares__parse_hosts_ipaddr(ares__buf_t         *buf,
-                                              ares_hosts_entry_t **entry_out)
+static ares_status_t ares_parse_hosts_ipaddr(ares_buf_t          *buf,
+                                             ares_hosts_entry_t **entry_out)
 {
   char                addr[INET6_ADDRSTRLEN];
   char               *temp;
@@ -458,15 +457,15 @@ static ares_status_t ares__parse_hosts_ipaddr(ares__buf_t         *buf,
 
   *entry_out = NULL;
 
-  ares__buf_tag(buf);
-  ares__buf_consume_nonwhitespace(buf);
-  status = ares__buf_tag_fetch_string(buf, addr, sizeof(addr));
+  ares_buf_tag(buf);
+  ares_buf_consume_nonwhitespace(buf);
+  status = ares_buf_tag_fetch_string(buf, addr, sizeof(addr));
   if (status != ARES_SUCCESS) {
     return status;
   }
 
   /* Validate and normalize the ip address format */
-  if (!ares__normalize_ipaddr(addr, addr, sizeof(addr))) {
+  if (!ares_normalize_ipaddr(addr, addr, sizeof(addr))) {
     return ARES_EBADSTR;
   }
 
@@ -475,21 +474,21 @@ static ares_status_t ares__parse_hosts_ipaddr(ares__buf_t         *buf,
     return ARES_ENOMEM;
   }
 
-  entry->ips = ares__llist_create(ares_free);
+  entry->ips = ares_llist_create(ares_free);
   if (entry->ips == NULL) {
-    ares__hosts_entry_destroy(entry);
+    ares_hosts_entry_destroy(entry);
     return ARES_ENOMEM;
   }
 
   temp = ares_strdup(addr);
   if (temp == NULL) {
-    ares__hosts_entry_destroy(entry);
+    ares_hosts_entry_destroy(entry);
     return ARES_ENOMEM;
   }
 
-  if (ares__llist_insert_first(entry->ips, temp) == NULL) {
+  if (ares_llist_insert_first(entry->ips, temp) == NULL) {
     ares_free(temp);
-    ares__hosts_entry_destroy(entry);
+    ares_hosts_entry_destroy(entry);
     return ARES_ENOMEM;
   }
 
@@ -498,100 +497,100 @@ static ares_status_t ares__parse_hosts_ipaddr(ares__buf_t         *buf,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares__parse_hosts(const char         *filename,
-                                       ares_hosts_file_t **out)
+static ares_status_t ares_parse_hosts(const char         *filename,
+                                      ares_hosts_file_t **out)
 {
-  ares__buf_t        *buf    = NULL;
+  ares_buf_t         *buf    = NULL;
   ares_status_t       status = ARES_EBADRESP;
   ares_hosts_file_t  *hf     = NULL;
   ares_hosts_entry_t *entry  = NULL;
 
   *out = NULL;
 
-  buf = ares__buf_create();
+  buf = ares_buf_create();
   if (buf == NULL) {
     status = ARES_ENOMEM;
     goto done;
   }
 
-  status = ares__buf_load_file(filename, buf);
+  status = ares_buf_load_file(filename, buf);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  hf = ares__hosts_file_create(filename);
+  hf = ares_hosts_file_create(filename);
   if (hf == NULL) {
     status = ARES_ENOMEM;
     goto done;
   }
 
-  while (ares__buf_len(buf)) {
+  while (ares_buf_len(buf)) {
     unsigned char comment = '#';
 
     /* -- Start of new line here -- */
 
     /* Consume any leading whitespace */
-    ares__buf_consume_whitespace(buf, ARES_FALSE);
+    ares_buf_consume_whitespace(buf, ARES_FALSE);
 
-    if (ares__buf_len(buf) == 0) {
+    if (ares_buf_len(buf) == 0) {
       break;
     }
 
     /* See if it is a comment, if so, consume remaining line */
-    if (ares__buf_begins_with(buf, &comment, 1)) {
-      ares__buf_consume_line(buf, ARES_TRUE);
+    if (ares_buf_begins_with(buf, &comment, 1)) {
+      ares_buf_consume_line(buf, ARES_TRUE);
       continue;
     }
 
     /* Pull off ip address */
-    status = ares__parse_hosts_ipaddr(buf, &entry);
+    status = ares_parse_hosts_ipaddr(buf, &entry);
     if (status == ARES_ENOMEM) {
       goto done;
     }
     if (status != ARES_SUCCESS) {
       /* Bad line, consume and go onto next */
-      ares__buf_consume_line(buf, ARES_TRUE);
+      ares_buf_consume_line(buf, ARES_TRUE);
       continue;
     }
 
     /* Parse of the hostnames */
-    status = ares__parse_hosts_hostnames(buf, entry);
+    status = ares_parse_hosts_hostnames(buf, entry);
     if (status == ARES_ENOMEM) {
       goto done;
     } else if (status != ARES_SUCCESS) {
       /* Bad line, consume and go onto next */
-      ares__hosts_entry_destroy(entry);
+      ares_hosts_entry_destroy(entry);
       entry = NULL;
-      ares__buf_consume_line(buf, ARES_TRUE);
+      ares_buf_consume_line(buf, ARES_TRUE);
       continue;
     }
 
     /* Append the successful entry to the hosts file */
-    status = ares__hosts_file_add(hf, entry);
+    status = ares_hosts_file_add(hf, entry);
     entry  = NULL; /* is always invalidated by this function, even on error */
     if (status != ARES_SUCCESS) {
       goto done;
     }
 
     /* Go to next line */
-    ares__buf_consume_line(buf, ARES_TRUE);
+    ares_buf_consume_line(buf, ARES_TRUE);
   }
 
   status = ARES_SUCCESS;
 
 done:
-  ares__hosts_entry_destroy(entry);
-  ares__buf_destroy(buf);
+  ares_hosts_entry_destroy(entry);
+  ares_buf_destroy(buf);
   if (status != ARES_SUCCESS) {
-    ares__hosts_file_destroy(hf);
+    ares_hosts_file_destroy(hf);
   } else {
     *out = hf;
   }
   return status;
 }
 
-static ares_bool_t ares__hosts_expired(const char              *filename,
-                                       const ares_hosts_file_t *hf)
+static ares_bool_t ares_hosts_expired(const char              *filename,
+                                      const ares_hosts_file_t *hf)
 {
   time_t mod_ts = 0;
 
@@ -620,7 +619,7 @@ static ares_bool_t ares__hosts_expired(const char              *filename,
   }
 
   /* If filenames are different, its expired */
-  if (strcasecmp(hf->filename, filename) != 0) {
+  if (!ares_strcaseeq(hf->filename, filename)) {
     return ARES_TRUE;
   }
 
@@ -631,8 +630,8 @@ static ares_bool_t ares__hosts_expired(const char              *filename,
   return ARES_FALSE;
 }
 
-static ares_status_t ares__hosts_path(const ares_channel_t *channel,
-                                      ares_bool_t use_env, char **path)
+static ares_status_t ares_hosts_path(const ares_channel_t *channel,
+                                     ares_bool_t use_env, char **path)
 {
   char *path_hosts = NULL;
 
@@ -688,40 +687,40 @@ static ares_status_t ares__hosts_path(const ares_channel_t *channel,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares__hosts_update(ares_channel_t *channel,
-                                        ares_bool_t     use_env)
+static ares_status_t ares_hosts_update(ares_channel_t *channel,
+                                       ares_bool_t     use_env)
 {
   ares_status_t status;
   char         *filename = NULL;
 
-  status = ares__hosts_path(channel, use_env, &filename);
+  status = ares_hosts_path(channel, use_env, &filename);
   if (status != ARES_SUCCESS) {
     return status;
   }
 
-  if (!ares__hosts_expired(filename, channel->hf)) {
+  if (!ares_hosts_expired(filename, channel->hf)) {
     ares_free(filename);
     return ARES_SUCCESS;
   }
 
-  ares__hosts_file_destroy(channel->hf);
+  ares_hosts_file_destroy(channel->hf);
   channel->hf = NULL;
 
-  status = ares__parse_hosts(filename, &channel->hf);
+  status = ares_parse_hosts(filename, &channel->hf);
   ares_free(filename);
   return status;
 }
 
-ares_status_t ares__hosts_search_ipaddr(ares_channel_t *channel,
-                                        ares_bool_t use_env, const char *ipaddr,
-                                        const ares_hosts_entry_t **entry)
+ares_status_t ares_hosts_search_ipaddr(ares_channel_t *channel,
+                                       ares_bool_t use_env, const char *ipaddr,
+                                       const ares_hosts_entry_t **entry)
 {
   ares_status_t status;
   char          addr[INET6_ADDRSTRLEN];
 
   *entry = NULL;
 
-  status = ares__hosts_update(channel, use_env);
+  status = ares_hosts_update(channel, use_env);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -730,11 +729,11 @@ ares_status_t ares__hosts_search_ipaddr(ares_channel_t *channel,
     return ARES_ENOTFOUND; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  if (!ares__normalize_ipaddr(ipaddr, addr, sizeof(addr))) {
+  if (!ares_normalize_ipaddr(ipaddr, addr, sizeof(addr))) {
     return ARES_EBADNAME;
   }
 
-  *entry = ares__htable_strvp_get_direct(channel->hf->iphash, addr);
+  *entry = ares_htable_strvp_get_direct(channel->hf->iphash, addr);
   if (*entry == NULL) {
     return ARES_ENOTFOUND;
   }
@@ -742,15 +741,15 @@ ares_status_t ares__hosts_search_ipaddr(ares_channel_t *channel,
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__hosts_search_host(ares_channel_t *channel,
-                                      ares_bool_t use_env, const char *host,
-                                      const ares_hosts_entry_t **entry)
+ares_status_t ares_hosts_search_host(ares_channel_t *channel,
+                                     ares_bool_t use_env, const char *host,
+                                     const ares_hosts_entry_t **entry)
 {
   ares_status_t status;
 
   *entry = NULL;
 
-  status = ares__hosts_update(channel, use_env);
+  status = ares_hosts_update(channel, use_env);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -759,7 +758,7 @@ ares_status_t ares__hosts_search_host(ares_channel_t *channel,
     return ARES_ENOTFOUND; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  *entry = ares__htable_strvp_get_direct(channel->hf->hosthash, host);
+  *entry = ares_htable_strvp_get_direct(channel->hf->hosthash, host);
   if (*entry == NULL) {
     return ARES_ENOTFOUND;
   }
@@ -768,23 +767,23 @@ ares_status_t ares__hosts_search_host(ares_channel_t *channel,
 }
 
 static ares_status_t
-  ares__hosts_ai_append_cnames(const ares_hosts_entry_t    *entry,
-                               struct ares_addrinfo_cname **cnames_out)
+  ares_hosts_ai_append_cnames(const ares_hosts_entry_t    *entry,
+                              struct ares_addrinfo_cname **cnames_out)
 {
   struct ares_addrinfo_cname *cname  = NULL;
   struct ares_addrinfo_cname *cnames = NULL;
   const char                 *primaryhost;
-  ares__llist_node_t         *node;
+  ares_llist_node_t          *node;
   ares_status_t               status;
   size_t                      cnt = 0;
 
-  node        = ares__llist_node_first(entry->hosts);
-  primaryhost = ares__llist_node_val(node);
+  node        = ares_llist_node_first(entry->hosts);
+  primaryhost = ares_llist_node_val(node);
   /* Skip to next node to start with aliases */
-  node = ares__llist_node_next(node);
+  node = ares_llist_node_next(node);
 
   while (node != NULL) {
-    const char *host = ares__llist_node_val(node);
+    const char *host = ares_llist_node_val(node);
 
     /* Cap at 100 entries. , some people use
      * https://github.com/StevenBlack/hosts and we don't need 200k+ aliases */
@@ -793,7 +792,7 @@ static ares_status_t
       break; /* LCOV_EXCL_LINE: DefensiveCoding */
     }
 
-    cname = ares__append_addrinfo_cname(&cnames);
+    cname = ares_append_addrinfo_cname(&cnames);
     if (cname == NULL) {
       status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
       goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
@@ -811,12 +810,12 @@ static ares_status_t
       goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
-    node = ares__llist_node_next(node);
+    node = ares_llist_node_next(node);
   }
 
   /* No entries, add only primary */
   if (cnames == NULL) {
-    cname = ares__append_addrinfo_cname(&cnames);
+    cname = ares_append_addrinfo_cname(&cnames);
     if (cname == NULL) {
       status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
       goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
@@ -832,24 +831,24 @@ static ares_status_t
 
 done:
   if (status != ARES_SUCCESS) {
-    ares__freeaddrinfo_cnames(cnames); /* LCOV_EXCL_LINE: DefensiveCoding */
-    return status;                     /* LCOV_EXCL_LINE: DefensiveCoding */
+    ares_freeaddrinfo_cnames(cnames); /* LCOV_EXCL_LINE: DefensiveCoding */
+    return status;                    /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
   *cnames_out = cnames;
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__hosts_entry_to_addrinfo(const ares_hosts_entry_t *entry,
-                                            const char *name, int family,
-                                            unsigned short        port,
-                                            ares_bool_t           want_cnames,
-                                            struct ares_addrinfo *ai)
+ares_status_t ares_hosts_entry_to_addrinfo(const ares_hosts_entry_t *entry,
+                                           const char *name, int family,
+                                           unsigned short        port,
+                                           ares_bool_t           want_cnames,
+                                           struct ares_addrinfo *ai)
 {
   ares_status_t               status;
   struct ares_addrinfo_cname *cnames  = NULL;
   struct ares_addrinfo_node  *ainodes = NULL;
-  ares__llist_node_t         *node;
+  ares_llist_node_t          *node;
 
   switch (family) {
     case AF_INET:
@@ -868,12 +867,12 @@ ares_status_t ares__hosts_entry_to_addrinfo(const ares_hosts_entry_t *entry,
     }
   }
 
-  for (node = ares__llist_node_first(entry->ips); node != NULL;
-       node = ares__llist_node_next(node)) {
+  for (node = ares_llist_node_first(entry->ips); node != NULL;
+       node = ares_llist_node_next(node)) {
     struct ares_addr addr;
     const void      *ptr     = NULL;
     size_t           ptr_len = 0;
-    const char      *ipaddr  = ares__llist_node_val(node);
+    const char      *ipaddr  = ares_llist_node_val(node);
 
     memset(&addr, 0, sizeof(addr));
     addr.family = family;
@@ -890,7 +889,7 @@ ares_status_t ares__hosts_entry_to_addrinfo(const ares_hosts_entry_t *entry,
   }
 
   if (want_cnames) {
-    status = ares__hosts_ai_append_cnames(entry, &cnames);
+    status = ares_hosts_ai_append_cnames(entry, &cnames);
     if (status != ARES_SUCCESS) {
       goto done; /* LCOV_EXCL_LINE: DefensiveCoding */
     }
@@ -901,21 +900,21 @@ ares_status_t ares__hosts_entry_to_addrinfo(const ares_hosts_entry_t *entry,
 done:
   if (status != ARES_SUCCESS) {
     /* LCOV_EXCL_START: defensive coding */
-    ares__freeaddrinfo_cnames(cnames);
-    ares__freeaddrinfo_nodes(ainodes);
+    ares_freeaddrinfo_cnames(cnames);
+    ares_freeaddrinfo_nodes(ainodes);
     ares_free(ai->name);
     ai->name = NULL;
     return status;
     /* LCOV_EXCL_STOP */
   }
-  ares__addrinfo_cat_cnames(&ai->cnames, cnames);
-  ares__addrinfo_cat_nodes(&ai->nodes, ainodes);
+  ares_addrinfo_cat_cnames(&ai->cnames, cnames);
+  ares_addrinfo_cat_nodes(&ai->nodes, ainodes);
 
   return status;
 }
 
-ares_status_t ares__hosts_entry_to_hostent(const ares_hosts_entry_t *entry,
-                                           int family, struct hostent **hostent)
+ares_status_t ares_hosts_entry_to_hostent(const ares_hosts_entry_t *entry,
+                                          int family, struct hostent **hostent)
 {
   ares_status_t         status;
   struct ares_addrinfo *ai = ares_malloc_zero(sizeof(*ai));
@@ -926,12 +925,12 @@ ares_status_t ares__hosts_entry_to_hostent(const ares_hosts_entry_t *entry,
     return ARES_ENOMEM;
   }
 
-  status = ares__hosts_entry_to_addrinfo(entry, NULL, family, 0, ARES_TRUE, ai);
+  status = ares_hosts_entry_to_addrinfo(entry, NULL, family, 0, ARES_TRUE, ai);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  status = ares__addrinfo2hostent(ai, family, hostent);
+  status = ares_addrinfo2hostent(ai, family, hostent);
   if (status != ARES_SUCCESS) {
     goto done;
   }
diff --git a/deps/cares/src/lib/ares_inet_net_pton.h b/deps/cares/src/lib/ares_inet_net_pton.h
index 0a52855bd8c966..e3ed83a3cb1166 100644
--- a/deps/cares/src/lib/ares_inet_net_pton.h
+++ b/deps/cares/src/lib/ares_inet_net_pton.h
@@ -26,10 +26,6 @@
 #ifndef HEADER_CARES_INET_NET_PTON_H
 #define HEADER_CARES_INET_NET_PTON_H
 
-#ifdef HAVE_INET_NET_PTON
-#  define ares_inet_net_pton(w, x, y, z) inet_net_pton(w, x, y, z)
-#else
 int ares_inet_net_pton(int af, const char *src, void *dst, size_t size);
-#endif
 
 #endif /* HEADER_CARES_INET_NET_PTON_H */
diff --git a/deps/cares/src/lib/ares_init.c b/deps/cares/src/lib/ares_init.c
index 6dc5f4f9353ce2..ae78262a112fc0 100644
--- a/deps/cares/src/lib/ares_init.c
+++ b/deps/cares/src/lib/ares_init.c
@@ -62,7 +62,6 @@
 #endif
 
 #include "ares_inet_net_pton.h"
-#include "ares_platform.h"
 #include "event/ares_event.h"
 
 int ares_init(ares_channel_t **channelptr)
@@ -117,7 +116,7 @@ static void server_destroy_cb(void *data)
   if (data == NULL) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
-  ares__destroy_server(data);
+  ares_destroy_server(data);
 }
 
 static ares_status_t init_by_defaults(ares_channel_t *channel)
@@ -128,7 +127,7 @@ static ares_status_t init_by_defaults(ares_channel_t *channel)
   const char *dot;
 #endif
   struct ares_addr addr;
-  ares__llist_t   *sconfig = NULL;
+  ares_llist_t    *sconfig = NULL;
 
   /* Enable EDNS by default */
   if (!(channel->optmask & ARES_OPT_FLAGS)) {
@@ -146,7 +145,7 @@ static ares_status_t init_by_defaults(ares_channel_t *channel)
     channel->tries = DEFAULT_TRIES;
   }
 
-  if (ares__slist_len(channel->servers) == 0) {
+  if (ares_slist_len(channel->servers) == 0) {
     /* Add a default local named server to the channel unless configured not
      * to (in which case return an error).
      */
@@ -158,28 +157,19 @@ static ares_status_t init_by_defaults(ares_channel_t *channel)
     addr.family            = AF_INET;
     addr.addr.addr4.s_addr = htonl(INADDR_LOOPBACK);
 
-    rc = ares__sconfig_append(&sconfig, &addr, 0, 0, NULL);
+    rc = ares_sconfig_append(channel, &sconfig, &addr, 0, 0, NULL);
     if (rc != ARES_SUCCESS) {
       goto error; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
-    rc = ares__servers_update(channel, sconfig, ARES_FALSE);
-    ares__llist_destroy(sconfig);
+    rc = ares_servers_update(channel, sconfig, ARES_FALSE);
+    ares_llist_destroy(sconfig);
 
     if (rc != ARES_SUCCESS) {
       goto error;
     }
   }
 
-#if defined(USE_WINSOCK)
-#  define toolong(x) (x == -1) && (SOCKERRNO == WSAEFAULT)
-#elif defined(ENAMETOOLONG)
-#  define toolong(x) \
-    (x == -1) && ((SOCKERRNO == ENAMETOOLONG) || (SOCKERRNO == EINVAL))
-#else
-#  define toolong(x) (x == -1) && (SOCKERRNO == EINVAL)
-#endif
-
   if (channel->ndomains == 0) {
     /* Derive a default domain search list from the kernel hostname,
      * or set it to empty if the hostname isn't helpful.
@@ -187,9 +177,7 @@ static ares_status_t init_by_defaults(ares_channel_t *channel)
 #ifndef HAVE_GETHOSTNAME
     channel->ndomains = 0; /* default to none */
 #else
-    GETHOSTNAME_TYPE_ARG2 lenv = 64;
-    size_t                len  = 64;
-    int                   res;
+    size_t len        = 256;
     channel->ndomains = 0; /* default to none */
 
     hostname = ares_malloc(len);
@@ -198,28 +186,11 @@ static ares_status_t init_by_defaults(ares_channel_t *channel)
       goto error;       /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
-    do {
-      res = gethostname(hostname, lenv);
-
-      if (toolong(res)) {
-        char *p;
-        len  *= 2;
-        lenv *= 2;
-        p     = ares_realloc(hostname, len);
-        if (!p) {
-          rc = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
-          goto error;       /* LCOV_EXCL_LINE: OutOfMemory */
-        }
-        hostname = p;
-        continue;
-      } else if (res) {
-        /* Lets not treat a gethostname failure as critical, since we
-         * are ok if gethostname doesn't even exist */
-        *hostname = '\0';
-        break;
-      }
-
-    } while (res != 0);
+    if (gethostname(hostname, (GETHOSTNAME_TYPE_ARG2)len) != 0) {
+      /* Lets not treat a gethostname failure as critical, since we
+       * are ok if gethostname doesn't even exist */
+      *hostname = '\0';
+    }
 
     dot = strchr(hostname, '.');
     if (dot) {
@@ -286,13 +257,13 @@ int ares_init_options(ares_channel_t           **channelptr,
   /* One option where zero is valid, so set default value here */
   channel->ndots = 1;
 
-  status = ares__channel_threading_init(channel);
+  status = ares_channel_threading_init(channel);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
   /* Generate random key */
-  channel->rand_state = ares__init_rand_state();
+  channel->rand_state = ares_init_rand_state();
   if (channel->rand_state == NULL) {
     status = ARES_ENOMEM;
     DEBUGF(fprintf(stderr, "Error: init_id_key failed: %s\n",
@@ -302,33 +273,33 @@ int ares_init_options(ares_channel_t           **channelptr,
 
   /* Initialize Server List */
   channel->servers =
-    ares__slist_create(channel->rand_state, server_sort_cb, server_destroy_cb);
+    ares_slist_create(channel->rand_state, server_sort_cb, server_destroy_cb);
   if (channel->servers == NULL) {
     status = ARES_ENOMEM;
     goto done;
   }
 
   /* Initialize our lists of queries */
-  channel->all_queries = ares__llist_create(NULL);
+  channel->all_queries = ares_llist_create(NULL);
   if (channel->all_queries == NULL) {
     status = ARES_ENOMEM;
     goto done;
   }
 
-  channel->queries_by_qid = ares__htable_szvp_create(NULL);
+  channel->queries_by_qid = ares_htable_szvp_create(NULL);
   if (channel->queries_by_qid == NULL) {
     status = ARES_ENOMEM;
     goto done;
   }
 
   channel->queries_by_timeout =
-    ares__slist_create(channel->rand_state, ares_query_timeout_cmp_cb, NULL);
+    ares_slist_create(channel->rand_state, ares_query_timeout_cmp_cb, NULL);
   if (channel->queries_by_timeout == NULL) {
     status = ARES_ENOMEM;
     goto done;
   }
 
-  channel->connnode_by_socket = ares__htable_asvp_create(NULL);
+  channel->connnode_by_socket = ares_htable_asvp_create(NULL);
   if (channel->connnode_by_socket == NULL) {
     status = ARES_ENOMEM;
     goto done;
@@ -338,7 +309,7 @@ int ares_init_options(ares_channel_t           **channelptr,
    * precedence to lowest.
    */
 
-  status = ares__init_by_options(channel, options, optmask);
+  status = ares_init_by_options(channel, options, optmask);
   if (status != ARES_SUCCESS) {
     DEBUGF(fprintf(stderr, "Error: init_by_options failed: %s\n",
                    ares_strerror(status)));
@@ -350,14 +321,14 @@ int ares_init_options(ares_channel_t           **channelptr,
   /* Go ahead and let it initialize the query cache even if the ttl is 0 and
    * completely unused.  This reduces the number of different code paths that
    * might be followed even if there is a minor performance hit. */
-  status = ares__qcache_create(channel->rand_state, channel->qcache_max_ttl,
-                               &channel->qcache);
+  status = ares_qcache_create(channel->rand_state, channel->qcache_max_ttl,
+                              &channel->qcache);
   if (status != ARES_SUCCESS) {
     goto done; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   if (status == ARES_SUCCESS) {
-    status = ares__init_by_sysconfig(channel);
+    status = ares_init_by_sysconfig(channel);
     if (status != ARES_SUCCESS) {
       DEBUGF(fprintf(stderr, "Error: init_by_sysconfig failed: %s\n",
                      ares_strerror(status)));
@@ -375,6 +346,8 @@ int ares_init_options(ares_channel_t           **channelptr,
     goto done;
   }
 
+  ares_set_socket_functions_def(channel);
+
   /* Initialize the event thread */
   if (channel->optmask & ARES_OPT_EVENT_THREAD) {
     ares_event_thread_t *e = NULL;
@@ -409,23 +382,23 @@ static void *ares_reinit_thread(void *arg)
   ares_channel_t *channel = arg;
   ares_status_t   status;
 
-  /* ares__init_by_sysconfig() will lock when applying the config, but not
+  /* ares_init_by_sysconfig() will lock when applying the config, but not
    * when retrieving. */
-  status = ares__init_by_sysconfig(channel);
+  status = ares_init_by_sysconfig(channel);
   if (status != ARES_SUCCESS) {
     DEBUGF(fprintf(stderr, "Error: init_by_sysconfig failed: %s\n",
                    ares_strerror(status)));
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
   /* Flush cached queries on reinit */
   if (status == ARES_SUCCESS && channel->qcache) {
-    ares__qcache_flush(channel->qcache);
+    ares_qcache_flush(channel->qcache);
   }
 
   channel->reinit_pending = ARES_FALSE;
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 
   return NULL;
 }
@@ -438,34 +411,34 @@ ares_status_t ares_reinit(ares_channel_t *channel)
     return ARES_EFORMERR;
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
   /* If a reinit is already in process, lets not do it again. Or if we are
    * shutting down, skip. */
   if (!channel->sys_up || channel->reinit_pending) {
-    ares__channel_unlock(channel);
+    ares_channel_unlock(channel);
     return ARES_SUCCESS;
   }
   channel->reinit_pending = ARES_TRUE;
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 
   if (ares_threadsafety()) {
     /* clean up the prior reinit process's thread.  We know the thread isn't
      * running since reinit_pending was false */
     if (channel->reinit_thread != NULL) {
       void *rv;
-      ares__thread_join(channel->reinit_thread, &rv);
+      ares_thread_join(channel->reinit_thread, &rv);
       channel->reinit_thread = NULL;
     }
 
     /* Spawn a new thread */
     status =
-      ares__thread_create(&channel->reinit_thread, ares_reinit_thread, channel);
+      ares_thread_create(&channel->reinit_thread, ares_reinit_thread, channel);
     if (status != ARES_SUCCESS) {
       /* LCOV_EXCL_START: UntestablePath */
-      ares__channel_lock(channel);
+      ares_channel_lock(channel);
       channel->reinit_pending = ARES_FALSE;
-      ares__channel_unlock(channel);
+      ares_channel_unlock(channel);
       /* LCOV_EXCL_STOP */
     }
   } else {
@@ -508,23 +481,25 @@ int ares_dup(ares_channel_t **dest, const ares_channel_t *src)
     goto done;
   }
 
-  ares__channel_lock(src);
+  ares_channel_lock(src);
   /* Now clone the options that ares_save_options() doesn't support, but are
    * user-provided */
-  (*dest)->sock_create_cb       = src->sock_create_cb;
-  (*dest)->sock_create_cb_data  = src->sock_create_cb_data;
-  (*dest)->sock_config_cb       = src->sock_config_cb;
-  (*dest)->sock_config_cb_data  = src->sock_config_cb_data;
-  (*dest)->sock_funcs           = src->sock_funcs;
-  (*dest)->sock_func_cb_data    = src->sock_func_cb_data;
-  (*dest)->server_state_cb      = src->server_state_cb;
-  (*dest)->server_state_cb_data = src->server_state_cb_data;
+  (*dest)->sock_create_cb            = src->sock_create_cb;
+  (*dest)->sock_create_cb_data       = src->sock_create_cb_data;
+  (*dest)->sock_config_cb            = src->sock_config_cb;
+  (*dest)->sock_config_cb_data       = src->sock_config_cb_data;
+  memcpy(&(*dest)->sock_funcs, &(src->sock_funcs), sizeof((*dest)->sock_funcs));
+  (*dest)->sock_func_cb_data         = src->sock_func_cb_data;
+  (*dest)->legacy_sock_funcs         = src->legacy_sock_funcs;
+  (*dest)->legacy_sock_funcs_cb_data = src->legacy_sock_funcs_cb_data;
+  (*dest)->server_state_cb           = src->server_state_cb;
+  (*dest)->server_state_cb_data      = src->server_state_cb_data;
 
   ares_strcpy((*dest)->local_dev_name, src->local_dev_name,
               sizeof((*dest)->local_dev_name));
   (*dest)->local_ip4 = src->local_ip4;
   memcpy((*dest)->local_ip6, src->local_ip6, sizeof(src->local_ip6));
-  ares__channel_unlock(src);
+  ares_channel_unlock(src);
 
   /* Servers are a bit unique as ares_init_options() only allows ipv4 servers
    * and not a port per server, but there are other user specified ways, that
@@ -568,9 +543,9 @@ void ares_set_local_ip4(ares_channel_t *channel, unsigned int local_ip)
   if (channel == NULL) {
     return;
   }
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
   channel->local_ip4 = local_ip;
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 }
 
 /* local_ip6 should be 16 bytes in length */
@@ -579,9 +554,9 @@ void ares_set_local_ip6(ares_channel_t *channel, const unsigned char *local_ip6)
   if (channel == NULL) {
     return;
   }
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
   memcpy(&channel->local_ip6, local_ip6, sizeof(channel->local_ip6));
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 }
 
 /* local_dev_name should be null terminated. */
@@ -591,11 +566,11 @@ void ares_set_local_dev(ares_channel_t *channel, const char *local_dev_name)
     return;
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
   ares_strcpy(channel->local_dev_name, local_dev_name,
               sizeof(channel->local_dev_name));
   channel->local_dev_name[sizeof(channel->local_dev_name) - 1] = 0;
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 }
 
 int ares_set_sortlist(ares_channel_t *channel, const char *sortstr)
@@ -607,9 +582,9 @@ int ares_set_sortlist(ares_channel_t *channel, const char *sortstr)
   if (!channel) {
     return ARES_ENODATA;
   }
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
-  status = ares__parse_sortlist(&sortlist, &nsort, sortstr);
+  status = ares_parse_sortlist(&sortlist, &nsort, sortstr);
   if (status == ARES_SUCCESS && sortlist) {
     if (channel->sortlist) {
       ares_free(channel->sortlist);
@@ -620,6 +595,6 @@ int ares_set_sortlist(ares_channel_t *channel, const char *sortstr)
     /* Save sortlist as if it was passed in as an option */
     channel->optmask |= ARES_OPT_SORTLIST;
   }
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
   return (int)status;
 }
diff --git a/deps/cares/src/lib/ares_ipv6.h b/deps/cares/src/lib/ares_ipv6.h
index e7e0b6d3a5e60c..5da341b010603f 100644
--- a/deps/cares/src/lib/ares_ipv6.h
+++ b/deps/cares/src/lib/ares_ipv6.h
@@ -94,7 +94,7 @@ struct addrinfo {
 #  ifdef IFNAMSIZ
 #    define IF_NAMESIZE IFNAMSIZ
 #  else
-#    define IF_NAMESIZE 256
+#    define IF_NAMESIZE 32
 #  endif
 #endif
 
diff --git a/deps/cares/src/lib/ares_library_init.c b/deps/cares/src/lib/ares_library_init.c
index 6c2f7c0f8ce9ee..2b91692baec6d5 100644
--- a/deps/cares/src/lib/ares_library_init.c
+++ b/deps/cares/src/lib/ares_library_init.c
@@ -52,8 +52,6 @@ static void        *default_malloc(size_t size)
   return malloc(size);
 }
 
-#if defined(_WIN32)
-/* We need indirections to handle Windows DLL rules. */
 static void *default_realloc(void *p, size_t size)
 {
   return realloc(p, size);
@@ -63,13 +61,25 @@ static void default_free(void *p)
 {
   free(p);
 }
-#else
-#  define default_realloc realloc
-#  define default_free    free
-#endif
-void *(*ares_malloc)(size_t size)             = default_malloc;
-void *(*ares_realloc)(void *ptr, size_t size) = default_realloc;
-void (*ares_free)(void *ptr)                  = default_free;
+
+static void *(*__ares_malloc)(size_t size)             = default_malloc;
+static void *(*__ares_realloc)(void *ptr, size_t size) = default_realloc;
+static void (*__ares_free)(void *ptr)                  = default_free;
+
+void *ares_malloc(size_t size)
+{
+  return __ares_malloc(size);
+}
+
+void *ares_realloc(void *ptr, size_t size)
+{
+  return __ares_realloc(ptr, size);
+}
+
+void ares_free(void *ptr)
+{
+  __ares_free(ptr);
+}
 
 void *ares_malloc_zero(size_t size)
 {
@@ -115,13 +125,13 @@ int ares_library_init_mem(int flags, void *(*amalloc)(size_t size),
                           void *(*arealloc)(void *ptr, size_t size))
 {
   if (amalloc) {
-    ares_malloc = amalloc;
+    __ares_malloc = amalloc;
   }
   if (arealloc) {
-    ares_realloc = arealloc;
+    __ares_realloc = arealloc;
   }
   if (afree) {
-    ares_free = afree;
+    __ares_free = afree;
   }
   return ares_library_init(flags);
 }
@@ -143,9 +153,9 @@ void ares_library_cleanup(void)
 #endif
 
   ares_init_flags = ARES_LIB_INIT_NONE;
-  ares_malloc     = malloc;
-  ares_realloc    = realloc;
-  ares_free       = free;
+  __ares_malloc   = default_malloc;
+  __ares_realloc  = default_realloc;
+  __ares_free     = default_free;
 }
 
 int ares_library_initialized(void)
diff --git a/deps/cares/src/lib/ares_metrics.c b/deps/cares/src/lib/ares_metrics.c
index 0e22fc37e7cb46..13e34decc06ae6 100644
--- a/deps/cares/src/lib/ares_metrics.c
+++ b/deps/cares/src/lib/ares_metrics.c
@@ -162,14 +162,14 @@ void ares_metrics_record(const ares_query_t *query, ares_server_t *server,
     return;
   }
 
-  ares__tvnow(&now);
+  ares_tvnow(&now);
 
   rcode = ares_dns_record_get_rcode(dnsrec);
   if (rcode != ARES_RCODE_NOERROR && rcode != ARES_RCODE_NXDOMAIN) {
     return;
   }
 
-  ares__timeval_diff(&tvdiff, &query->ts, &now);
+  ares_timeval_diff(&tvdiff, &query->ts, &now);
   query_ms = (unsigned int)((tvdiff.sec * 1000) + (tvdiff.usec / 1000));
   if (query_ms == 0) {
     query_ms = 1;
diff --git a/deps/cares/src/lib/ares_options.c b/deps/cares/src/lib/ares_options.c
index 9aeb4bad3d743b..3082f332457c22 100644
--- a/deps/cares/src/lib/ares_options.c
+++ b/deps/cares/src/lib/ares_options.c
@@ -54,9 +54,9 @@ void ares_destroy_options(struct ares_options *options)
 static struct in_addr *ares_save_opt_servers(const ares_channel_t *channel,
                                              int                  *nservers)
 {
-  ares__slist_node_t *snode;
-  struct in_addr     *out =
-    ares_malloc_zero(ares__slist_len(channel->servers) * sizeof(*out));
+  ares_slist_node_t *snode;
+  struct in_addr    *out =
+    ares_malloc_zero(ares_slist_len(channel->servers) * sizeof(*out));
 
   *nservers = 0;
 
@@ -64,9 +64,9 @@ static struct in_addr *ares_save_opt_servers(const ares_channel_t *channel,
     return NULL;
   }
 
-  for (snode = ares__slist_node_first(channel->servers); snode != NULL;
-       snode = ares__slist_node_next(snode)) {
-    const ares_server_t *server = ares__slist_node_val(snode);
+  for (snode = ares_slist_node_first(channel->servers); snode != NULL;
+       snode = ares_slist_node_next(snode)) {
+    const ares_server_t *server = ares_slist_node_val(snode);
 
     if (server->addr.family != AF_INET) {
       continue;
@@ -111,7 +111,7 @@ int ares_save_options(const ares_channel_t *channel,
   }
 
   /* We convert ARES_OPT_TIMEOUT to ARES_OPT_TIMEOUTMS in
-   * ares__init_by_options() */
+   * ares_init_by_options() */
   if (channel->optmask & ARES_OPT_TIMEOUTMS) {
     options->timeout = (int)channel->timeout;
   }
@@ -238,28 +238,28 @@ int ares_save_options(const ares_channel_t *channel,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares__init_options_servers(ares_channel_t       *channel,
-                                                const struct in_addr *servers,
-                                                size_t                nservers)
+static ares_status_t ares_init_options_servers(ares_channel_t       *channel,
+                                               const struct in_addr *servers,
+                                               size_t                nservers)
 {
-  ares__llist_t *slist = NULL;
-  ares_status_t  status;
+  ares_llist_t *slist = NULL;
+  ares_status_t status;
 
-  status = ares_in_addr_to_server_config_llist(servers, nservers, &slist);
+  status = ares_in_addr_to_sconfig_llist(servers, nservers, &slist);
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  status = ares__servers_update(channel, slist, ARES_TRUE);
+  status = ares_servers_update(channel, slist, ARES_TRUE);
 
-  ares__llist_destroy(slist);
+  ares_llist_destroy(slist);
 
   return status;
 }
 
-ares_status_t ares__init_by_options(ares_channel_t            *channel,
-                                    const struct ares_options *options,
-                                    int                        optmask)
+ares_status_t ares_init_by_options(ares_channel_t            *channel,
+                                   const struct ares_options *options,
+                                   int                        optmask)
 {
   size_t i;
 
@@ -472,8 +472,8 @@ ares_status_t ares__init_by_options(ares_channel_t            *channel,
       optmask &= ~(ARES_OPT_SERVERS);
     } else {
       ares_status_t status;
-      status = ares__init_options_servers(channel, options->servers,
-                                          (size_t)options->nservers);
+      status = ares_init_options_servers(channel, options->servers,
+                                         (size_t)options->nservers);
       if (status != ARES_SUCCESS) {
         return status; /* LCOV_EXCL_LINE: OutOfMemory */
       }
diff --git a/deps/cares/src/lib/ares__parse_into_addrinfo.c b/deps/cares/src/lib/ares_parse_into_addrinfo.c
similarity index 89%
rename from deps/cares/src/lib/ares__parse_into_addrinfo.c
rename to deps/cares/src/lib/ares_parse_into_addrinfo.c
index 65c94c0401441b..2108f9b8615816 100644
--- a/deps/cares/src/lib/ares__parse_into_addrinfo.c
+++ b/deps/cares/src/lib/ares_parse_into_addrinfo.c
@@ -45,10 +45,10 @@
 #endif
 
 
-ares_status_t ares__parse_into_addrinfo(const ares_dns_record_t *dnsrec,
-                                        ares_bool_t    cname_only_is_enodata,
-                                        unsigned short port,
-                                        struct ares_addrinfo *ai)
+ares_status_t ares_parse_into_addrinfo(const ares_dns_record_t *dnsrec,
+                                       ares_bool_t    cname_only_is_enodata,
+                                       unsigned short port,
+                                       struct ares_addrinfo *ai)
 {
   ares_status_t               status;
   size_t                      i;
@@ -90,7 +90,7 @@ ares_status_t ares__parse_into_addrinfo(const ares_dns_record_t *dnsrec,
      *
      * rname = ares_dns_rr_get_name(rr);
      * if ((rtype == ARES_REC_TYPE_A || rtype == ARES_REC_TYPE_AAAA) &&
-     *     strcasecmp(rname, hostname) != 0) {
+     *     !ares_strcaseeq(rname, hostname)) {
      *   continue;
      * }
      */
@@ -103,7 +103,7 @@ ares_status_t ares__parse_into_addrinfo(const ares_dns_record_t *dnsrec,
        * SA: Seems wrong as it introduces order dependency. */
       hostname = ares_dns_rr_get_str(rr, ARES_RR_CNAME_CNAME);
 
-      cname = ares__append_addrinfo_cname(&cnames);
+      cname = ares_append_addrinfo_cname(&cnames);
       if (cname == NULL) {
         status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
         goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
@@ -147,7 +147,7 @@ ares_status_t ares__parse_into_addrinfo(const ares_dns_record_t *dnsrec,
   }
 
   /* save the hostname as ai->name */
-  if (ai->name == NULL || strcasecmp(ai->name, hostname) != 0) {
+  if (ai->name == NULL || !ares_strcaseeq(ai->name, hostname)) {
     ares_free(ai->name);
     ai->name = ares_strdup(hostname);
     if (ai->name == NULL) {
@@ -157,18 +157,18 @@ ares_status_t ares__parse_into_addrinfo(const ares_dns_record_t *dnsrec,
   }
 
   if (got_a || got_aaaa) {
-    ares__addrinfo_cat_nodes(&ai->nodes, nodes);
+    ares_addrinfo_cat_nodes(&ai->nodes, nodes);
     nodes = NULL;
   }
 
   if (got_cname) {
-    ares__addrinfo_cat_cnames(&ai->cnames, cnames);
+    ares_addrinfo_cat_cnames(&ai->cnames, cnames);
     cnames = NULL;
   }
 
 done:
-  ares__freeaddrinfo_cnames(cnames);
-  ares__freeaddrinfo_nodes(nodes);
+  ares_freeaddrinfo_cnames(cnames);
+  ares_freeaddrinfo_nodes(nodes);
 
   /* compatibility */
   if (status == ARES_EBADNAME) {
diff --git a/deps/cares/src/lib/ares_platform.c b/deps/cares/src/lib/ares_platform.c
deleted file mode 100644
index 8f0a1dbffb8173..00000000000000
--- a/deps/cares/src/lib/ares_platform.c
+++ /dev/null
@@ -1,11047 +0,0 @@
-/* MIT License
- *
- * Copyright (c) 1998 Massachusetts Institute of Technology
- * Copyright (c) 2004 Daniel Stenberg
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- * SPDX-License-Identifier: MIT
- */
-
-#include "ares_private.h"
-#include "ares_platform.h"
-
-#if defined(_WIN32) && !defined(MSDOS)
-
-#  define V_PLATFORM_WIN32s        0
-#  define V_PLATFORM_WIN32_WINDOWS 1
-#  define V_PLATFORM_WIN32_NT      2
-#  define V_PLATFORM_WIN32_CE      3
-
-win_platform ares__getplatform(void)
-{
-  OSVERSIONINFOEX OsvEx;
-
-  memset(&OsvEx, 0, sizeof(OsvEx));
-  OsvEx.dwOSVersionInfoSize = sizeof(OSVERSIONINFOEX);
-#  ifdef _MSC_VER
-#    pragma warning(push)
-#    pragma warning(disable : 4996) /* warning C4996: 'GetVersionExW': was \
-                                       declared deprecated */
-#  endif
-  if (!GetVersionEx((void *)&OsvEx)) {
-    memset(&OsvEx, 0, sizeof(OsvEx));
-    OsvEx.dwOSVersionInfoSize = sizeof(OSVERSIONINFO);
-    if (!GetVersionEx((void *)&OsvEx)) {
-      return WIN_UNKNOWN;
-    }
-  }
-#  ifdef _MSC_VER
-#    pragma warning(pop)
-#  endif
-
-  switch (OsvEx.dwPlatformId) {
-    case V_PLATFORM_WIN32s:
-      return WIN_3X;
-
-    case V_PLATFORM_WIN32_WINDOWS:
-      return WIN_9X;
-
-    case V_PLATFORM_WIN32_NT:
-      return WIN_NT;
-
-    case V_PLATFORM_WIN32_CE:
-      return WIN_CE;
-
-    default:
-      return WIN_UNKNOWN;
-  }
-}
-
-#endif /* _WIN32 && ! MSDOS */
-
-#if defined(_WIN32_WCE)
-
-/* IANA Well Known Ports are in range 0-1023 */
-#  define USE_IANA_WELL_KNOWN_PORTS 1
-
-/* IANA Registered Ports are in range 1024-49151 */
-#  define USE_IANA_REGISTERED_PORTS 1
-
-struct pvt_servent {
-  char          *s_name;
-  char         **s_aliases;
-  unsigned short s_port;
-  char          *s_proto;
-};
-
-/*
- * Ref: http://www.iana.org/assignments/port-numbers
- */
-
-static struct pvt_servent IANAports[] = {
-#  ifdef USE_IANA_WELL_KNOWN_PORTS
-  { "tcpmux",          { NULL }, 1,     "tcp"  },
-  { "tcpmux",          { NULL }, 1,     "udp"  },
-  { "compressnet",     { NULL }, 2,     "tcp"  },
-  { "compressnet",     { NULL }, 2,     "udp"  },
-  { "compressnet",     { NULL }, 3,     "tcp"  },
-  { "compressnet",     { NULL }, 3,     "udp"  },
-  { "rje",             { NULL }, 5,     "tcp"  },
-  { "rje",             { NULL }, 5,     "udp"  },
-  { "echo",            { NULL }, 7,     "tcp"  },
-  { "echo",            { NULL }, 7,     "udp"  },
-  { "discard",         { NULL }, 9,     "tcp"  },
-  { "discard",         { NULL }, 9,     "udp"  },
-  { "discard",         { NULL }, 9,     "sctp" },
-  { "discard",         { NULL }, 9,     "dccp" },
-  { "systat",          { NULL }, 11,    "tcp"  },
-  { "systat",          { NULL }, 11,    "udp"  },
-  { "daytime",         { NULL }, 13,    "tcp"  },
-  { "daytime",         { NULL }, 13,    "udp"  },
-  { "qotd",            { NULL }, 17,    "tcp"  },
-  { "qotd",            { NULL }, 17,    "udp"  },
-  { "msp",             { NULL }, 18,    "tcp"  },
-  { "msp",             { NULL }, 18,    "udp"  },
-  { "chargen",         { NULL }, 19,    "tcp"  },
-  { "chargen",         { NULL }, 19,    "udp"  },
-  { "ftp-data",        { NULL }, 20,    "tcp"  },
-  { "ftp-data",        { NULL }, 20,    "udp"  },
-  { "ftp-data",        { NULL }, 20,    "sctp" },
-  { "ftp",             { NULL }, 21,    "tcp"  },
-  { "ftp",             { NULL }, 21,    "udp"  },
-  { "ftp",             { NULL }, 21,    "sctp" },
-  { "ssh",             { NULL }, 22,    "tcp"  },
-  { "ssh",             { NULL }, 22,    "udp"  },
-  { "ssh",             { NULL }, 22,    "sctp" },
-  { "telnet",          { NULL }, 23,    "tcp"  },
-  { "telnet",          { NULL }, 23,    "udp"  },
-  { "smtp",            { NULL }, 25,    "tcp"  },
-  { "smtp",            { NULL }, 25,    "udp"  },
-  { "nsw-fe",          { NULL }, 27,    "tcp"  },
-  { "nsw-fe",          { NULL }, 27,    "udp"  },
-  { "msg-icp",         { NULL }, 29,    "tcp"  },
-  { "msg-icp",         { NULL }, 29,    "udp"  },
-  { "msg-auth",        { NULL }, 31,    "tcp"  },
-  { "msg-auth",        { NULL }, 31,    "udp"  },
-  { "dsp",             { NULL }, 33,    "tcp"  },
-  { "dsp",             { NULL }, 33,    "udp"  },
-  { "time",            { NULL }, 37,    "tcp"  },
-  { "time",            { NULL }, 37,    "udp"  },
-  { "rap",             { NULL }, 38,    "tcp"  },
-  { "rap",             { NULL }, 38,    "udp"  },
-  { "rlp",             { NULL }, 39,    "tcp"  },
-  { "rlp",             { NULL }, 39,    "udp"  },
-  { "graphics",        { NULL }, 41,    "tcp"  },
-  { "graphics",        { NULL }, 41,    "udp"  },
-  { "name",            { NULL }, 42,    "tcp"  },
-  { "name",            { NULL }, 42,    "udp"  },
-  { "nameserver",      { NULL }, 42,    "tcp"  },
-  { "nameserver",      { NULL }, 42,    "udp"  },
-  { "nicname",         { NULL }, 43,    "tcp"  },
-  { "nicname",         { NULL }, 43,    "udp"  },
-  { "mpm-flags",       { NULL }, 44,    "tcp"  },
-  { "mpm-flags",       { NULL }, 44,    "udp"  },
-  { "mpm",             { NULL }, 45,    "tcp"  },
-  { "mpm",             { NULL }, 45,    "udp"  },
-  { "mpm-snd",         { NULL }, 46,    "tcp"  },
-  { "mpm-snd",         { NULL }, 46,    "udp"  },
-  { "ni-ftp",          { NULL }, 47,    "tcp"  },
-  { "ni-ftp",          { NULL }, 47,    "udp"  },
-  { "auditd",          { NULL }, 48,    "tcp"  },
-  { "auditd",          { NULL }, 48,    "udp"  },
-  { "tacacs",          { NULL }, 49,    "tcp"  },
-  { "tacacs",          { NULL }, 49,    "udp"  },
-  { "re-mail-ck",      { NULL }, 50,    "tcp"  },
-  { "re-mail-ck",      { NULL }, 50,    "udp"  },
-  { "la-maint",        { NULL }, 51,    "tcp"  },
-  { "la-maint",        { NULL }, 51,    "udp"  },
-  { "xns-time",        { NULL }, 52,    "tcp"  },
-  { "xns-time",        { NULL }, 52,    "udp"  },
-  { "domain",          { NULL }, 53,    "tcp"  },
-  { "domain",          { NULL }, 53,    "udp"  },
-  { "xns-ch",          { NULL }, 54,    "tcp"  },
-  { "xns-ch",          { NULL }, 54,    "udp"  },
-  { "isi-gl",          { NULL }, 55,    "tcp"  },
-  { "isi-gl",          { NULL }, 55,    "udp"  },
-  { "xns-auth",        { NULL }, 56,    "tcp"  },
-  { "xns-auth",        { NULL }, 56,    "udp"  },
-  { "xns-mail",        { NULL }, 58,    "tcp"  },
-  { "xns-mail",        { NULL }, 58,    "udp"  },
-  { "ni-mail",         { NULL }, 61,    "tcp"  },
-  { "ni-mail",         { NULL }, 61,    "udp"  },
-  { "acas",            { NULL }, 62,    "tcp"  },
-  { "acas",            { NULL }, 62,    "udp"  },
-  { "whois++",         { NULL }, 63,    "tcp"  },
-  { "whois++",         { NULL }, 63,    "udp"  },
-  { "covia",           { NULL }, 64,    "tcp"  },
-  { "covia",           { NULL }, 64,    "udp"  },
-  { "tacacs-ds",       { NULL }, 65,    "tcp"  },
-  { "tacacs-ds",       { NULL }, 65,    "udp"  },
-  { "sql*net",         { NULL }, 66,    "tcp"  },
-  { "sql*net",         { NULL }, 66,    "udp"  },
-  { "bootps",          { NULL }, 67,    "tcp"  },
-  { "bootps",          { NULL }, 67,    "udp"  },
-  { "bootpc",          { NULL }, 68,    "tcp"  },
-  { "bootpc",          { NULL }, 68,    "udp"  },
-  { "tftp",            { NULL }, 69,    "tcp"  },
-  { "tftp",            { NULL }, 69,    "udp"  },
-  { "gopher",          { NULL }, 70,    "tcp"  },
-  { "gopher",          { NULL }, 70,    "udp"  },
-  { "netrjs-1",        { NULL }, 71,    "tcp"  },
-  { "netrjs-1",        { NULL }, 71,    "udp"  },
-  { "netrjs-2",        { NULL }, 72,    "tcp"  },
-  { "netrjs-2",        { NULL }, 72,    "udp"  },
-  { "netrjs-3",        { NULL }, 73,    "tcp"  },
-  { "netrjs-3",        { NULL }, 73,    "udp"  },
-  { "netrjs-4",        { NULL }, 74,    "tcp"  },
-  { "netrjs-4",        { NULL }, 74,    "udp"  },
-  { "deos",            { NULL }, 76,    "tcp"  },
-  { "deos",            { NULL }, 76,    "udp"  },
-  { "vettcp",          { NULL }, 78,    "tcp"  },
-  { "vettcp",          { NULL }, 78,    "udp"  },
-  { "finger",          { NULL }, 79,    "tcp"  },
-  { "finger",          { NULL }, 79,    "udp"  },
-  { "http",            { NULL }, 80,    "tcp"  },
-  { "http",            { NULL }, 80,    "udp"  },
-  { "www",             { NULL }, 80,    "tcp"  },
-  { "www",             { NULL }, 80,    "udp"  },
-  { "www-http",        { NULL }, 80,    "tcp"  },
-  { "www-http",        { NULL }, 80,    "udp"  },
-  { "http",            { NULL }, 80,    "sctp" },
-  { "xfer",            { NULL }, 82,    "tcp"  },
-  { "xfer",            { NULL }, 82,    "udp"  },
-  { "mit-ml-dev",      { NULL }, 83,    "tcp"  },
-  { "mit-ml-dev",      { NULL }, 83,    "udp"  },
-  { "ctf",             { NULL }, 84,    "tcp"  },
-  { "ctf",             { NULL }, 84,    "udp"  },
-  { "mit-ml-dev",      { NULL }, 85,    "tcp"  },
-  { "mit-ml-dev",      { NULL }, 85,    "udp"  },
-  { "mfcobol",         { NULL }, 86,    "tcp"  },
-  { "mfcobol",         { NULL }, 86,    "udp"  },
-  { "kerberos",        { NULL }, 88,    "tcp"  },
-  { "kerberos",        { NULL }, 88,    "udp"  },
-  { "su-mit-tg",       { NULL }, 89,    "tcp"  },
-  { "su-mit-tg",       { NULL }, 89,    "udp"  },
-  { "dnsix",           { NULL }, 90,    "tcp"  },
-  { "dnsix",           { NULL }, 90,    "udp"  },
-  { "mit-dov",         { NULL }, 91,    "tcp"  },
-  { "mit-dov",         { NULL }, 91,    "udp"  },
-  { "npp",             { NULL }, 92,    "tcp"  },
-  { "npp",             { NULL }, 92,    "udp"  },
-  { "dcp",             { NULL }, 93,    "tcp"  },
-  { "dcp",             { NULL }, 93,    "udp"  },
-  { "objcall",         { NULL }, 94,    "tcp"  },
-  { "objcall",         { NULL }, 94,    "udp"  },
-  { "supdup",          { NULL }, 95,    "tcp"  },
-  { "supdup",          { NULL }, 95,    "udp"  },
-  { "dixie",           { NULL }, 96,    "tcp"  },
-  { "dixie",           { NULL }, 96,    "udp"  },
-  { "swift-rvf",       { NULL }, 97,    "tcp"  },
-  { "swift-rvf",       { NULL }, 97,    "udp"  },
-  { "tacnews",         { NULL }, 98,    "tcp"  },
-  { "tacnews",         { NULL }, 98,    "udp"  },
-  { "metagram",        { NULL }, 99,    "tcp"  },
-  { "metagram",        { NULL }, 99,    "udp"  },
-  { "newacct",         { NULL }, 100,   "tcp"  },
-  { "hostname",        { NULL }, 101,   "tcp"  },
-  { "hostname",        { NULL }, 101,   "udp"  },
-  { "iso-tsap",        { NULL }, 102,   "tcp"  },
-  { "iso-tsap",        { NULL }, 102,   "udp"  },
-  { "gppitnp",         { NULL }, 103,   "tcp"  },
-  { "gppitnp",         { NULL }, 103,   "udp"  },
-  { "acr-nema",        { NULL }, 104,   "tcp"  },
-  { "acr-nema",        { NULL }, 104,   "udp"  },
-  { "cso",             { NULL }, 105,   "tcp"  },
-  { "cso",             { NULL }, 105,   "udp"  },
-  { "csnet-ns",        { NULL }, 105,   "tcp"  },
-  { "csnet-ns",        { NULL }, 105,   "udp"  },
-  { "3com-tsmux",      { NULL }, 106,   "tcp"  },
-  { "3com-tsmux",      { NULL }, 106,   "udp"  },
-  { "rtelnet",         { NULL }, 107,   "tcp"  },
-  { "rtelnet",         { NULL }, 107,   "udp"  },
-  { "snagas",          { NULL }, 108,   "tcp"  },
-  { "snagas",          { NULL }, 108,   "udp"  },
-  { "pop2",            { NULL }, 109,   "tcp"  },
-  { "pop2",            { NULL }, 109,   "udp"  },
-  { "pop3",            { NULL }, 110,   "tcp"  },
-  { "pop3",            { NULL }, 110,   "udp"  },
-  { "sunrpc",          { NULL }, 111,   "tcp"  },
-  { "sunrpc",          { NULL }, 111,   "udp"  },
-  { "mcidas",          { NULL }, 112,   "tcp"  },
-  { "mcidas",          { NULL }, 112,   "udp"  },
-  { "ident",           { NULL }, 113,   "tcp"  },
-  { "auth",            { NULL }, 113,   "tcp"  },
-  { "auth",            { NULL }, 113,   "udp"  },
-  { "sftp",            { NULL }, 115,   "tcp"  },
-  { "sftp",            { NULL }, 115,   "udp"  },
-  { "ansanotify",      { NULL }, 116,   "tcp"  },
-  { "ansanotify",      { NULL }, 116,   "udp"  },
-  { "uucp-path",       { NULL }, 117,   "tcp"  },
-  { "uucp-path",       { NULL }, 117,   "udp"  },
-  { "sqlserv",         { NULL }, 118,   "tcp"  },
-  { "sqlserv",         { NULL }, 118,   "udp"  },
-  { "nntp",            { NULL }, 119,   "tcp"  },
-  { "nntp",            { NULL }, 119,   "udp"  },
-  { "cfdptkt",         { NULL }, 120,   "tcp"  },
-  { "cfdptkt",         { NULL }, 120,   "udp"  },
-  { "erpc",            { NULL }, 121,   "tcp"  },
-  { "erpc",            { NULL }, 121,   "udp"  },
-  { "smakynet",        { NULL }, 122,   "tcp"  },
-  { "smakynet",        { NULL }, 122,   "udp"  },
-  { "ntp",             { NULL }, 123,   "tcp"  },
-  { "ntp",             { NULL }, 123,   "udp"  },
-  { "ansatrader",      { NULL }, 124,   "tcp"  },
-  { "ansatrader",      { NULL }, 124,   "udp"  },
-  { "locus-map",       { NULL }, 125,   "tcp"  },
-  { "locus-map",       { NULL }, 125,   "udp"  },
-  { "nxedit",          { NULL }, 126,   "tcp"  },
-  { "nxedit",          { NULL }, 126,   "udp"  },
-  { "locus-con",       { NULL }, 127,   "tcp"  },
-  { "locus-con",       { NULL }, 127,   "udp"  },
-  { "gss-xlicen",      { NULL }, 128,   "tcp"  },
-  { "gss-xlicen",      { NULL }, 128,   "udp"  },
-  { "pwdgen",          { NULL }, 129,   "tcp"  },
-  { "pwdgen",          { NULL }, 129,   "udp"  },
-  { "cisco-fna",       { NULL }, 130,   "tcp"  },
-  { "cisco-fna",       { NULL }, 130,   "udp"  },
-  { "cisco-tna",       { NULL }, 131,   "tcp"  },
-  { "cisco-tna",       { NULL }, 131,   "udp"  },
-  { "cisco-sys",       { NULL }, 132,   "tcp"  },
-  { "cisco-sys",       { NULL }, 132,   "udp"  },
-  { "statsrv",         { NULL }, 133,   "tcp"  },
-  { "statsrv",         { NULL }, 133,   "udp"  },
-  { "ingres-net",      { NULL }, 134,   "tcp"  },
-  { "ingres-net",      { NULL }, 134,   "udp"  },
-  { "epmap",           { NULL }, 135,   "tcp"  },
-  { "epmap",           { NULL }, 135,   "udp"  },
-  { "profile",         { NULL }, 136,   "tcp"  },
-  { "profile",         { NULL }, 136,   "udp"  },
-  { "netbios-ns",      { NULL }, 137,   "tcp"  },
-  { "netbios-ns",      { NULL }, 137,   "udp"  },
-  { "netbios-dgm",     { NULL }, 138,   "tcp"  },
-  { "netbios-dgm",     { NULL }, 138,   "udp"  },
-  { "netbios-ssn",     { NULL }, 139,   "tcp"  },
-  { "netbios-ssn",     { NULL }, 139,   "udp"  },
-  { "emfis-data",      { NULL }, 140,   "tcp"  },
-  { "emfis-data",      { NULL }, 140,   "udp"  },
-  { "emfis-cntl",      { NULL }, 141,   "tcp"  },
-  { "emfis-cntl",      { NULL }, 141,   "udp"  },
-  { "bl-idm",          { NULL }, 142,   "tcp"  },
-  { "bl-idm",          { NULL }, 142,   "udp"  },
-  { "imap",            { NULL }, 143,   "tcp"  },
-  { "imap",            { NULL }, 143,   "udp"  },
-  { "uma",             { NULL }, 144,   "tcp"  },
-  { "uma",             { NULL }, 144,   "udp"  },
-  { "uaac",            { NULL }, 145,   "tcp"  },
-  { "uaac",            { NULL }, 145,   "udp"  },
-  { "iso-tp0",         { NULL }, 146,   "tcp"  },
-  { "iso-tp0",         { NULL }, 146,   "udp"  },
-  { "iso-ip",          { NULL }, 147,   "tcp"  },
-  { "iso-ip",          { NULL }, 147,   "udp"  },
-  { "jargon",          { NULL }, 148,   "tcp"  },
-  { "jargon",          { NULL }, 148,   "udp"  },
-  { "aed-512",         { NULL }, 149,   "tcp"  },
-  { "aed-512",         { NULL }, 149,   "udp"  },
-  { "sql-net",         { NULL }, 150,   "tcp"  },
-  { "sql-net",         { NULL }, 150,   "udp"  },
-  { "hems",            { NULL }, 151,   "tcp"  },
-  { "hems",            { NULL }, 151,   "udp"  },
-  { "bftp",            { NULL }, 152,   "tcp"  },
-  { "bftp",            { NULL }, 152,   "udp"  },
-  { "sgmp",            { NULL }, 153,   "tcp"  },
-  { "sgmp",            { NULL }, 153,   "udp"  },
-  { "netsc-prod",      { NULL }, 154,   "tcp"  },
-  { "netsc-prod",      { NULL }, 154,   "udp"  },
-  { "netsc-dev",       { NULL }, 155,   "tcp"  },
-  { "netsc-dev",       { NULL }, 155,   "udp"  },
-  { "sqlsrv",          { NULL }, 156,   "tcp"  },
-  { "sqlsrv",          { NULL }, 156,   "udp"  },
-  { "knet-cmp",        { NULL }, 157,   "tcp"  },
-  { "knet-cmp",        { NULL }, 157,   "udp"  },
-  { "pcmail-srv",      { NULL }, 158,   "tcp"  },
-  { "pcmail-srv",      { NULL }, 158,   "udp"  },
-  { "nss-routing",     { NULL }, 159,   "tcp"  },
-  { "nss-routing",     { NULL }, 159,   "udp"  },
-  { "sgmp-traps",      { NULL }, 160,   "tcp"  },
-  { "sgmp-traps",      { NULL }, 160,   "udp"  },
-  { "snmp",            { NULL }, 161,   "tcp"  },
-  { "snmp",            { NULL }, 161,   "udp"  },
-  { "snmptrap",        { NULL }, 162,   "tcp"  },
-  { "snmptrap",        { NULL }, 162,   "udp"  },
-  { "cmip-man",        { NULL }, 163,   "tcp"  },
-  { "cmip-man",        { NULL }, 163,   "udp"  },
-  { "cmip-agent",      { NULL }, 164,   "tcp"  },
-  { "cmip-agent",      { NULL }, 164,   "udp"  },
-  { "xns-courier",     { NULL }, 165,   "tcp"  },
-  { "xns-courier",     { NULL }, 165,   "udp"  },
-  { "s-net",           { NULL }, 166,   "tcp"  },
-  { "s-net",           { NULL }, 166,   "udp"  },
-  { "namp",            { NULL }, 167,   "tcp"  },
-  { "namp",            { NULL }, 167,   "udp"  },
-  { "rsvd",            { NULL }, 168,   "tcp"  },
-  { "rsvd",            { NULL }, 168,   "udp"  },
-  { "send",            { NULL }, 169,   "tcp"  },
-  { "send",            { NULL }, 169,   "udp"  },
-  { "print-srv",       { NULL }, 170,   "tcp"  },
-  { "print-srv",       { NULL }, 170,   "udp"  },
-  { "multiplex",       { NULL }, 171,   "tcp"  },
-  { "multiplex",       { NULL }, 171,   "udp"  },
-  { "cl/1",            { NULL }, 172,   "tcp"  },
-  { "cl/1",            { NULL }, 172,   "udp"  },
-  { "xyplex-mux",      { NULL }, 173,   "tcp"  },
-  { "xyplex-mux",      { NULL }, 173,   "udp"  },
-  { "mailq",           { NULL }, 174,   "tcp"  },
-  { "mailq",           { NULL }, 174,   "udp"  },
-  { "vmnet",           { NULL }, 175,   "tcp"  },
-  { "vmnet",           { NULL }, 175,   "udp"  },
-  { "genrad-mux",      { NULL }, 176,   "tcp"  },
-  { "genrad-mux",      { NULL }, 176,   "udp"  },
-  { "xdmcp",           { NULL }, 177,   "tcp"  },
-  { "xdmcp",           { NULL }, 177,   "udp"  },
-  { "nextstep",        { NULL }, 178,   "tcp"  },
-  { "nextstep",        { NULL }, 178,   "udp"  },
-  { "bgp",             { NULL }, 179,   "tcp"  },
-  { "bgp",             { NULL }, 179,   "udp"  },
-  { "bgp",             { NULL }, 179,   "sctp" },
-  { "ris",             { NULL }, 180,   "tcp"  },
-  { "ris",             { NULL }, 180,   "udp"  },
-  { "unify",           { NULL }, 181,   "tcp"  },
-  { "unify",           { NULL }, 181,   "udp"  },
-  { "audit",           { NULL }, 182,   "tcp"  },
-  { "audit",           { NULL }, 182,   "udp"  },
-  { "ocbinder",        { NULL }, 183,   "tcp"  },
-  { "ocbinder",        { NULL }, 183,   "udp"  },
-  { "ocserver",        { NULL }, 184,   "tcp"  },
-  { "ocserver",        { NULL }, 184,   "udp"  },
-  { "remote-kis",      { NULL }, 185,   "tcp"  },
-  { "remote-kis",      { NULL }, 185,   "udp"  },
-  { "kis",             { NULL }, 186,   "tcp"  },
-  { "kis",             { NULL }, 186,   "udp"  },
-  { "aci",             { NULL }, 187,   "tcp"  },
-  { "aci",             { NULL }, 187,   "udp"  },
-  { "mumps",           { NULL }, 188,   "tcp"  },
-  { "mumps",           { NULL }, 188,   "udp"  },
-  { "qft",             { NULL }, 189,   "tcp"  },
-  { "qft",             { NULL }, 189,   "udp"  },
-  { "gacp",            { NULL }, 190,   "tcp"  },
-  { "gacp",            { NULL }, 190,   "udp"  },
-  { "prospero",        { NULL }, 191,   "tcp"  },
-  { "prospero",        { NULL }, 191,   "udp"  },
-  { "osu-nms",         { NULL }, 192,   "tcp"  },
-  { "osu-nms",         { NULL }, 192,   "udp"  },
-  { "srmp",            { NULL }, 193,   "tcp"  },
-  { "srmp",            { NULL }, 193,   "udp"  },
-  { "irc",             { NULL }, 194,   "tcp"  },
-  { "irc",             { NULL }, 194,   "udp"  },
-  { "dn6-nlm-aud",     { NULL }, 195,   "tcp"  },
-  { "dn6-nlm-aud",     { NULL }, 195,   "udp"  },
-  { "dn6-smm-red",     { NULL }, 196,   "tcp"  },
-  { "dn6-smm-red",     { NULL }, 196,   "udp"  },
-  { "dls",             { NULL }, 197,   "tcp"  },
-  { "dls",             { NULL }, 197,   "udp"  },
-  { "dls-mon",         { NULL }, 198,   "tcp"  },
-  { "dls-mon",         { NULL }, 198,   "udp"  },
-  { "smux",            { NULL }, 199,   "tcp"  },
-  { "smux",            { NULL }, 199,   "udp"  },
-  { "src",             { NULL }, 200,   "tcp"  },
-  { "src",             { NULL }, 200,   "udp"  },
-  { "at-rtmp",         { NULL }, 201,   "tcp"  },
-  { "at-rtmp",         { NULL }, 201,   "udp"  },
-  { "at-nbp",          { NULL }, 202,   "tcp"  },
-  { "at-nbp",          { NULL }, 202,   "udp"  },
-  { "at-3",            { NULL }, 203,   "tcp"  },
-  { "at-3",            { NULL }, 203,   "udp"  },
-  { "at-echo",         { NULL }, 204,   "tcp"  },
-  { "at-echo",         { NULL }, 204,   "udp"  },
-  { "at-5",            { NULL }, 205,   "tcp"  },
-  { "at-5",            { NULL }, 205,   "udp"  },
-  { "at-zis",          { NULL }, 206,   "tcp"  },
-  { "at-zis",          { NULL }, 206,   "udp"  },
-  { "at-7",            { NULL }, 207,   "tcp"  },
-  { "at-7",            { NULL }, 207,   "udp"  },
-  { "at-8",            { NULL }, 208,   "tcp"  },
-  { "at-8",            { NULL }, 208,   "udp"  },
-  { "qmtp",            { NULL }, 209,   "tcp"  },
-  { "qmtp",            { NULL }, 209,   "udp"  },
-  { "z39.50",          { NULL }, 210,   "tcp"  },
-  { "z39.50",          { NULL }, 210,   "udp"  },
-  { "914c/g",          { NULL }, 211,   "tcp"  },
-  { "914c/g",          { NULL }, 211,   "udp"  },
-  { "anet",            { NULL }, 212,   "tcp"  },
-  { "anet",            { NULL }, 212,   "udp"  },
-  { "ipx",             { NULL }, 213,   "tcp"  },
-  { "ipx",             { NULL }, 213,   "udp"  },
-  { "vmpwscs",         { NULL }, 214,   "tcp"  },
-  { "vmpwscs",         { NULL }, 214,   "udp"  },
-  { "softpc",          { NULL }, 215,   "tcp"  },
-  { "softpc",          { NULL }, 215,   "udp"  },
-  { "CAIlic",          { NULL }, 216,   "tcp"  },
-  { "CAIlic",          { NULL }, 216,   "udp"  },
-  { "dbase",           { NULL }, 217,   "tcp"  },
-  { "dbase",           { NULL }, 217,   "udp"  },
-  { "mpp",             { NULL }, 218,   "tcp"  },
-  { "mpp",             { NULL }, 218,   "udp"  },
-  { "uarps",           { NULL }, 219,   "tcp"  },
-  { "uarps",           { NULL }, 219,   "udp"  },
-  { "imap3",           { NULL }, 220,   "tcp"  },
-  { "imap3",           { NULL }, 220,   "udp"  },
-  { "fln-spx",         { NULL }, 221,   "tcp"  },
-  { "fln-spx",         { NULL }, 221,   "udp"  },
-  { "rsh-spx",         { NULL }, 222,   "tcp"  },
-  { "rsh-spx",         { NULL }, 222,   "udp"  },
-  { "cdc",             { NULL }, 223,   "tcp"  },
-  { "cdc",             { NULL }, 223,   "udp"  },
-  { "masqdialer",      { NULL }, 224,   "tcp"  },
-  { "masqdialer",      { NULL }, 224,   "udp"  },
-  { "direct",          { NULL }, 242,   "tcp"  },
-  { "direct",          { NULL }, 242,   "udp"  },
-  { "sur-meas",        { NULL }, 243,   "tcp"  },
-  { "sur-meas",        { NULL }, 243,   "udp"  },
-  { "inbusiness",      { NULL }, 244,   "tcp"  },
-  { "inbusiness",      { NULL }, 244,   "udp"  },
-  { "link",            { NULL }, 245,   "tcp"  },
-  { "link",            { NULL }, 245,   "udp"  },
-  { "dsp3270",         { NULL }, 246,   "tcp"  },
-  { "dsp3270",         { NULL }, 246,   "udp"  },
-  { "subntbcst_tftp",  { NULL }, 247,   "tcp"  },
-  { "subntbcst_tftp",  { NULL }, 247,   "udp"  },
-  { "bhfhs",           { NULL }, 248,   "tcp"  },
-  { "bhfhs",           { NULL }, 248,   "udp"  },
-  { "rap",             { NULL }, 256,   "tcp"  },
-  { "rap",             { NULL }, 256,   "udp"  },
-  { "set",             { NULL }, 257,   "tcp"  },
-  { "set",             { NULL }, 257,   "udp"  },
-  { "esro-gen",        { NULL }, 259,   "tcp"  },
-  { "esro-gen",        { NULL }, 259,   "udp"  },
-  { "openport",        { NULL }, 260,   "tcp"  },
-  { "openport",        { NULL }, 260,   "udp"  },
-  { "nsiiops",         { NULL }, 261,   "tcp"  },
-  { "nsiiops",         { NULL }, 261,   "udp"  },
-  { "arcisdms",        { NULL }, 262,   "tcp"  },
-  { "arcisdms",        { NULL }, 262,   "udp"  },
-  { "hdap",            { NULL }, 263,   "tcp"  },
-  { "hdap",            { NULL }, 263,   "udp"  },
-  { "bgmp",            { NULL }, 264,   "tcp"  },
-  { "bgmp",            { NULL }, 264,   "udp"  },
-  { "x-bone-ctl",      { NULL }, 265,   "tcp"  },
-  { "x-bone-ctl",      { NULL }, 265,   "udp"  },
-  { "sst",             { NULL }, 266,   "tcp"  },
-  { "sst",             { NULL }, 266,   "udp"  },
-  { "td-service",      { NULL }, 267,   "tcp"  },
-  { "td-service",      { NULL }, 267,   "udp"  },
-  { "td-replica",      { NULL }, 268,   "tcp"  },
-  { "td-replica",      { NULL }, 268,   "udp"  },
-  { "manet",           { NULL }, 269,   "tcp"  },
-  { "manet",           { NULL }, 269,   "udp"  },
-  { "gist",            { NULL }, 270,   "udp"  },
-  { "http-mgmt",       { NULL }, 280,   "tcp"  },
-  { "http-mgmt",       { NULL }, 280,   "udp"  },
-  { "personal-link",   { NULL }, 281,   "tcp"  },
-  { "personal-link",   { NULL }, 281,   "udp"  },
-  { "cableport-ax",    { NULL }, 282,   "tcp"  },
-  { "cableport-ax",    { NULL }, 282,   "udp"  },
-  { "rescap",          { NULL }, 283,   "tcp"  },
-  { "rescap",          { NULL }, 283,   "udp"  },
-  { "corerjd",         { NULL }, 284,   "tcp"  },
-  { "corerjd",         { NULL }, 284,   "udp"  },
-  { "fxp",             { NULL }, 286,   "tcp"  },
-  { "fxp",             { NULL }, 286,   "udp"  },
-  { "k-block",         { NULL }, 287,   "tcp"  },
-  { "k-block",         { NULL }, 287,   "udp"  },
-  { "novastorbakcup",  { NULL }, 308,   "tcp"  },
-  { "novastorbakcup",  { NULL }, 308,   "udp"  },
-  { "entrusttime",     { NULL }, 309,   "tcp"  },
-  { "entrusttime",     { NULL }, 309,   "udp"  },
-  { "bhmds",           { NULL }, 310,   "tcp"  },
-  { "bhmds",           { NULL }, 310,   "udp"  },
-  { "asip-webadmin",   { NULL }, 311,   "tcp"  },
-  { "asip-webadmin",   { NULL }, 311,   "udp"  },
-  { "vslmp",           { NULL }, 312,   "tcp"  },
-  { "vslmp",           { NULL }, 312,   "udp"  },
-  { "magenta-logic",   { NULL }, 313,   "tcp"  },
-  { "magenta-logic",   { NULL }, 313,   "udp"  },
-  { "opalis-robot",    { NULL }, 314,   "tcp"  },
-  { "opalis-robot",    { NULL }, 314,   "udp"  },
-  { "dpsi",            { NULL }, 315,   "tcp"  },
-  { "dpsi",            { NULL }, 315,   "udp"  },
-  { "decauth",         { NULL }, 316,   "tcp"  },
-  { "decauth",         { NULL }, 316,   "udp"  },
-  { "zannet",          { NULL }, 317,   "tcp"  },
-  { "zannet",          { NULL }, 317,   "udp"  },
-  { "pkix-timestamp",  { NULL }, 318,   "tcp"  },
-  { "pkix-timestamp",  { NULL }, 318,   "udp"  },
-  { "ptp-event",       { NULL }, 319,   "tcp"  },
-  { "ptp-event",       { NULL }, 319,   "udp"  },
-  { "ptp-general",     { NULL }, 320,   "tcp"  },
-  { "ptp-general",     { NULL }, 320,   "udp"  },
-  { "pip",             { NULL }, 321,   "tcp"  },
-  { "pip",             { NULL }, 321,   "udp"  },
-  { "rtsps",           { NULL }, 322,   "tcp"  },
-  { "rtsps",           { NULL }, 322,   "udp"  },
-  { "texar",           { NULL }, 333,   "tcp"  },
-  { "texar",           { NULL }, 333,   "udp"  },
-  { "pdap",            { NULL }, 344,   "tcp"  },
-  { "pdap",            { NULL }, 344,   "udp"  },
-  { "pawserv",         { NULL }, 345,   "tcp"  },
-  { "pawserv",         { NULL }, 345,   "udp"  },
-  { "zserv",           { NULL }, 346,   "tcp"  },
-  { "zserv",           { NULL }, 346,   "udp"  },
-  { "fatserv",         { NULL }, 347,   "tcp"  },
-  { "fatserv",         { NULL }, 347,   "udp"  },
-  { "csi-sgwp",        { NULL }, 348,   "tcp"  },
-  { "csi-sgwp",        { NULL }, 348,   "udp"  },
-  { "mftp",            { NULL }, 349,   "tcp"  },
-  { "mftp",            { NULL }, 349,   "udp"  },
-  { "matip-type-a",    { NULL }, 350,   "tcp"  },
-  { "matip-type-a",    { NULL }, 350,   "udp"  },
-  { "matip-type-b",    { NULL }, 351,   "tcp"  },
-  { "matip-type-b",    { NULL }, 351,   "udp"  },
-  { "bhoetty",         { NULL }, 351,   "tcp"  },
-  { "bhoetty",         { NULL }, 351,   "udp"  },
-  { "dtag-ste-sb",     { NULL }, 352,   "tcp"  },
-  { "dtag-ste-sb",     { NULL }, 352,   "udp"  },
-  { "bhoedap4",        { NULL }, 352,   "tcp"  },
-  { "bhoedap4",        { NULL }, 352,   "udp"  },
-  { "ndsauth",         { NULL }, 353,   "tcp"  },
-  { "ndsauth",         { NULL }, 353,   "udp"  },
-  { "bh611",           { NULL }, 354,   "tcp"  },
-  { "bh611",           { NULL }, 354,   "udp"  },
-  { "datex-asn",       { NULL }, 355,   "tcp"  },
-  { "datex-asn",       { NULL }, 355,   "udp"  },
-  { "cloanto-net-1",   { NULL }, 356,   "tcp"  },
-  { "cloanto-net-1",   { NULL }, 356,   "udp"  },
-  { "bhevent",         { NULL }, 357,   "tcp"  },
-  { "bhevent",         { NULL }, 357,   "udp"  },
-  { "shrinkwrap",      { NULL }, 358,   "tcp"  },
-  { "shrinkwrap",      { NULL }, 358,   "udp"  },
-  { "nsrmp",           { NULL }, 359,   "tcp"  },
-  { "nsrmp",           { NULL }, 359,   "udp"  },
-  { "scoi2odialog",    { NULL }, 360,   "tcp"  },
-  { "scoi2odialog",    { NULL }, 360,   "udp"  },
-  { "semantix",        { NULL }, 361,   "tcp"  },
-  { "semantix",        { NULL }, 361,   "udp"  },
-  { "srssend",         { NULL }, 362,   "tcp"  },
-  { "srssend",         { NULL }, 362,   "udp"  },
-  { "rsvp_tunnel",     { NULL }, 363,   "tcp"  },
-  { "rsvp_tunnel",     { NULL }, 363,   "udp"  },
-  { "aurora-cmgr",     { NULL }, 364,   "tcp"  },
-  { "aurora-cmgr",     { NULL }, 364,   "udp"  },
-  { "dtk",             { NULL }, 365,   "tcp"  },
-  { "dtk",             { NULL }, 365,   "udp"  },
-  { "odmr",            { NULL }, 366,   "tcp"  },
-  { "odmr",            { NULL }, 366,   "udp"  },
-  { "mortgageware",    { NULL }, 367,   "tcp"  },
-  { "mortgageware",    { NULL }, 367,   "udp"  },
-  { "qbikgdp",         { NULL }, 368,   "tcp"  },
-  { "qbikgdp",         { NULL }, 368,   "udp"  },
-  { "rpc2portmap",     { NULL }, 369,   "tcp"  },
-  { "rpc2portmap",     { NULL }, 369,   "udp"  },
-  { "codaauth2",       { NULL }, 370,   "tcp"  },
-  { "codaauth2",       { NULL }, 370,   "udp"  },
-  { "clearcase",       { NULL }, 371,   "tcp"  },
-  { "clearcase",       { NULL }, 371,   "udp"  },
-  { "ulistproc",       { NULL }, 372,   "tcp"  },
-  { "ulistproc",       { NULL }, 372,   "udp"  },
-  { "legent-1",        { NULL }, 373,   "tcp"  },
-  { "legent-1",        { NULL }, 373,   "udp"  },
-  { "legent-2",        { NULL }, 374,   "tcp"  },
-  { "legent-2",        { NULL }, 374,   "udp"  },
-  { "hassle",          { NULL }, 375,   "tcp"  },
-  { "hassle",          { NULL }, 375,   "udp"  },
-  { "nip",             { NULL }, 376,   "tcp"  },
-  { "nip",             { NULL }, 376,   "udp"  },
-  { "tnETOS",          { NULL }, 377,   "tcp"  },
-  { "tnETOS",          { NULL }, 377,   "udp"  },
-  { "dsETOS",          { NULL }, 378,   "tcp"  },
-  { "dsETOS",          { NULL }, 378,   "udp"  },
-  { "is99c",           { NULL }, 379,   "tcp"  },
-  { "is99c",           { NULL }, 379,   "udp"  },
-  { "is99s",           { NULL }, 380,   "tcp"  },
-  { "is99s",           { NULL }, 380,   "udp"  },
-  { "hp-collector",    { NULL }, 381,   "tcp"  },
-  { "hp-collector",    { NULL }, 381,   "udp"  },
-  { "hp-managed-node", { NULL }, 382,   "tcp"  },
-  { "hp-managed-node", { NULL }, 382,   "udp"  },
-  { "hp-alarm-mgr",    { NULL }, 383,   "tcp"  },
-  { "hp-alarm-mgr",    { NULL }, 383,   "udp"  },
-  { "arns",            { NULL }, 384,   "tcp"  },
-  { "arns",            { NULL }, 384,   "udp"  },
-  { "ibm-app",         { NULL }, 385,   "tcp"  },
-  { "ibm-app",         { NULL }, 385,   "udp"  },
-  { "asa",             { NULL }, 386,   "tcp"  },
-  { "asa",             { NULL }, 386,   "udp"  },
-  { "aurp",            { NULL }, 387,   "tcp"  },
-  { "aurp",            { NULL }, 387,   "udp"  },
-  { "unidata-ldm",     { NULL }, 388,   "tcp"  },
-  { "unidata-ldm",     { NULL }, 388,   "udp"  },
-  { "ldap",            { NULL }, 389,   "tcp"  },
-  { "ldap",            { NULL }, 389,   "udp"  },
-  { "uis",             { NULL }, 390,   "tcp"  },
-  { "uis",             { NULL }, 390,   "udp"  },
-  { "synotics-relay",  { NULL }, 391,   "tcp"  },
-  { "synotics-relay",  { NULL }, 391,   "udp"  },
-  { "synotics-broker", { NULL }, 392,   "tcp"  },
-  { "synotics-broker", { NULL }, 392,   "udp"  },
-  { "meta5",           { NULL }, 393,   "tcp"  },
-  { "meta5",           { NULL }, 393,   "udp"  },
-  { "embl-ndt",        { NULL }, 394,   "tcp"  },
-  { "embl-ndt",        { NULL }, 394,   "udp"  },
-  { "netcp",           { NULL }, 395,   "tcp"  },
-  { "netcp",           { NULL }, 395,   "udp"  },
-  { "netware-ip",      { NULL }, 396,   "tcp"  },
-  { "netware-ip",      { NULL }, 396,   "udp"  },
-  { "mptn",            { NULL }, 397,   "tcp"  },
-  { "mptn",            { NULL }, 397,   "udp"  },
-  { "kryptolan",       { NULL }, 398,   "tcp"  },
-  { "kryptolan",       { NULL }, 398,   "udp"  },
-  { "iso-tsap-c2",     { NULL }, 399,   "tcp"  },
-  { "iso-tsap-c2",     { NULL }, 399,   "udp"  },
-  { "osb-sd",          { NULL }, 400,   "tcp"  },
-  { "osb-sd",          { NULL }, 400,   "udp"  },
-  { "ups",             { NULL }, 401,   "tcp"  },
-  { "ups",             { NULL }, 401,   "udp"  },
-  { "genie",           { NULL }, 402,   "tcp"  },
-  { "genie",           { NULL }, 402,   "udp"  },
-  { "decap",           { NULL }, 403,   "tcp"  },
-  { "decap",           { NULL }, 403,   "udp"  },
-  { "nced",            { NULL }, 404,   "tcp"  },
-  { "nced",            { NULL }, 404,   "udp"  },
-  { "ncld",            { NULL }, 405,   "tcp"  },
-  { "ncld",            { NULL }, 405,   "udp"  },
-  { "imsp",            { NULL }, 406,   "tcp"  },
-  { "imsp",            { NULL }, 406,   "udp"  },
-  { "timbuktu",        { NULL }, 407,   "tcp"  },
-  { "timbuktu",        { NULL }, 407,   "udp"  },
-  { "prm-sm",          { NULL }, 408,   "tcp"  },
-  { "prm-sm",          { NULL }, 408,   "udp"  },
-  { "prm-nm",          { NULL }, 409,   "tcp"  },
-  { "prm-nm",          { NULL }, 409,   "udp"  },
-  { "decladebug",      { NULL }, 410,   "tcp"  },
-  { "decladebug",      { NULL }, 410,   "udp"  },
-  { "rmt",             { NULL }, 411,   "tcp"  },
-  { "rmt",             { NULL }, 411,   "udp"  },
-  { "synoptics-trap",  { NULL }, 412,   "tcp"  },
-  { "synoptics-trap",  { NULL }, 412,   "udp"  },
-  { "smsp",            { NULL }, 413,   "tcp"  },
-  { "smsp",            { NULL }, 413,   "udp"  },
-  { "infoseek",        { NULL }, 414,   "tcp"  },
-  { "infoseek",        { NULL }, 414,   "udp"  },
-  { "bnet",            { NULL }, 415,   "tcp"  },
-  { "bnet",            { NULL }, 415,   "udp"  },
-  { "silverplatter",   { NULL }, 416,   "tcp"  },
-  { "silverplatter",   { NULL }, 416,   "udp"  },
-  { "onmux",           { NULL }, 417,   "tcp"  },
-  { "onmux",           { NULL }, 417,   "udp"  },
-  { "hyper-g",         { NULL }, 418,   "tcp"  },
-  { "hyper-g",         { NULL }, 418,   "udp"  },
-  { "ariel1",          { NULL }, 419,   "tcp"  },
-  { "ariel1",          { NULL }, 419,   "udp"  },
-  { "smpte",           { NULL }, 420,   "tcp"  },
-  { "smpte",           { NULL }, 420,   "udp"  },
-  { "ariel2",          { NULL }, 421,   "tcp"  },
-  { "ariel2",          { NULL }, 421,   "udp"  },
-  { "ariel3",          { NULL }, 422,   "tcp"  },
-  { "ariel3",          { NULL }, 422,   "udp"  },
-  { "opc-job-start",   { NULL }, 423,   "tcp"  },
-  { "opc-job-start",   { NULL }, 423,   "udp"  },
-  { "opc-job-track",   { NULL }, 424,   "tcp"  },
-  { "opc-job-track",   { NULL }, 424,   "udp"  },
-  { "icad-el",         { NULL }, 425,   "tcp"  },
-  { "icad-el",         { NULL }, 425,   "udp"  },
-  { "smartsdp",        { NULL }, 426,   "tcp"  },
-  { "smartsdp",        { NULL }, 426,   "udp"  },
-  { "svrloc",          { NULL }, 427,   "tcp"  },
-  { "svrloc",          { NULL }, 427,   "udp"  },
-  { "ocs_cmu",         { NULL }, 428,   "tcp"  },
-  { "ocs_cmu",         { NULL }, 428,   "udp"  },
-  { "ocs_amu",         { NULL }, 429,   "tcp"  },
-  { "ocs_amu",         { NULL }, 429,   "udp"  },
-  { "utmpsd",          { NULL }, 430,   "tcp"  },
-  { "utmpsd",          { NULL }, 430,   "udp"  },
-  { "utmpcd",          { NULL }, 431,   "tcp"  },
-  { "utmpcd",          { NULL }, 431,   "udp"  },
-  { "iasd",            { NULL }, 432,   "tcp"  },
-  { "iasd",            { NULL }, 432,   "udp"  },
-  { "nnsp",            { NULL }, 433,   "tcp"  },
-  { "nnsp",            { NULL }, 433,   "udp"  },
-  { "mobileip-agent",  { NULL }, 434,   "tcp"  },
-  { "mobileip-agent",  { NULL }, 434,   "udp"  },
-  { "mobilip-mn",      { NULL }, 435,   "tcp"  },
-  { "mobilip-mn",      { NULL }, 435,   "udp"  },
-  { "dna-cml",         { NULL }, 436,   "tcp"  },
-  { "dna-cml",         { NULL }, 436,   "udp"  },
-  { "comscm",          { NULL }, 437,   "tcp"  },
-  { "comscm",          { NULL }, 437,   "udp"  },
-  { "dsfgw",           { NULL }, 438,   "tcp"  },
-  { "dsfgw",           { NULL }, 438,   "udp"  },
-  { "dasp",            { NULL }, 439,   "tcp"  },
-  { "dasp",            { NULL }, 439,   "udp"  },
-  { "sgcp",            { NULL }, 440,   "tcp"  },
-  { "sgcp",            { NULL }, 440,   "udp"  },
-  { "decvms-sysmgt",   { NULL }, 441,   "tcp"  },
-  { "decvms-sysmgt",   { NULL }, 441,   "udp"  },
-  { "cvc_hostd",       { NULL }, 442,   "tcp"  },
-  { "cvc_hostd",       { NULL }, 442,   "udp"  },
-  { "https",           { NULL }, 443,   "tcp"  },
-  { "https",           { NULL }, 443,   "udp"  },
-  { "https",           { NULL }, 443,   "sctp" },
-  { "snpp",            { NULL }, 444,   "tcp"  },
-  { "snpp",            { NULL }, 444,   "udp"  },
-  { "microsoft-ds",    { NULL }, 445,   "tcp"  },
-  { "microsoft-ds",    { NULL }, 445,   "udp"  },
-  { "ddm-rdb",         { NULL }, 446,   "tcp"  },
-  { "ddm-rdb",         { NULL }, 446,   "udp"  },
-  { "ddm-dfm",         { NULL }, 447,   "tcp"  },
-  { "ddm-dfm",         { NULL }, 447,   "udp"  },
-  { "ddm-ssl",         { NULL }, 448,   "tcp"  },
-  { "ddm-ssl",         { NULL }, 448,   "udp"  },
-  { "as-servermap",    { NULL }, 449,   "tcp"  },
-  { "as-servermap",    { NULL }, 449,   "udp"  },
-  { "tserver",         { NULL }, 450,   "tcp"  },
-  { "tserver",         { NULL }, 450,   "udp"  },
-  { "sfs-smp-net",     { NULL }, 451,   "tcp"  },
-  { "sfs-smp-net",     { NULL }, 451,   "udp"  },
-  { "sfs-config",      { NULL }, 452,   "tcp"  },
-  { "sfs-config",      { NULL }, 452,   "udp"  },
-  { "creativeserver",  { NULL }, 453,   "tcp"  },
-  { "creativeserver",  { NULL }, 453,   "udp"  },
-  { "contentserver",   { NULL }, 454,   "tcp"  },
-  { "contentserver",   { NULL }, 454,   "udp"  },
-  { "creativepartnr",  { NULL }, 455,   "tcp"  },
-  { "creativepartnr",  { NULL }, 455,   "udp"  },
-  { "macon-tcp",       { NULL }, 456,   "tcp"  },
-  { "macon-udp",       { NULL }, 456,   "udp"  },
-  { "scohelp",         { NULL }, 457,   "tcp"  },
-  { "scohelp",         { NULL }, 457,   "udp"  },
-  { "appleqtc",        { NULL }, 458,   "tcp"  },
-  { "appleqtc",        { NULL }, 458,   "udp"  },
-  { "ampr-rcmd",       { NULL }, 459,   "tcp"  },
-  { "ampr-rcmd",       { NULL }, 459,   "udp"  },
-  { "skronk",          { NULL }, 460,   "tcp"  },
-  { "skronk",          { NULL }, 460,   "udp"  },
-  { "datasurfsrv",     { NULL }, 461,   "tcp"  },
-  { "datasurfsrv",     { NULL }, 461,   "udp"  },
-  { "datasurfsrvsec",  { NULL }, 462,   "tcp"  },
-  { "datasurfsrvsec",  { NULL }, 462,   "udp"  },
-  { "alpes",           { NULL }, 463,   "tcp"  },
-  { "alpes",           { NULL }, 463,   "udp"  },
-  { "kpasswd",         { NULL }, 464,   "tcp"  },
-  { "kpasswd",         { NULL }, 464,   "udp"  },
-  { "urd",             { NULL }, 465,   "tcp"  },
-  { "igmpv3lite",      { NULL }, 465,   "udp"  },
-  { "digital-vrc",     { NULL }, 466,   "tcp"  },
-  { "digital-vrc",     { NULL }, 466,   "udp"  },
-  { "mylex-mapd",      { NULL }, 467,   "tcp"  },
-  { "mylex-mapd",      { NULL }, 467,   "udp"  },
-  { "photuris",        { NULL }, 468,   "tcp"  },
-  { "photuris",        { NULL }, 468,   "udp"  },
-  { "rcp",             { NULL }, 469,   "tcp"  },
-  { "rcp",             { NULL }, 469,   "udp"  },
-  { "scx-proxy",       { NULL }, 470,   "tcp"  },
-  { "scx-proxy",       { NULL }, 470,   "udp"  },
-  { "mondex",          { NULL }, 471,   "tcp"  },
-  { "mondex",          { NULL }, 471,   "udp"  },
-  { "ljk-login",       { NULL }, 472,   "tcp"  },
-  { "ljk-login",       { NULL }, 472,   "udp"  },
-  { "hybrid-pop",      { NULL }, 473,   "tcp"  },
-  { "hybrid-pop",      { NULL }, 473,   "udp"  },
-  { "tn-tl-w1",        { NULL }, 474,   "tcp"  },
-  { "tn-tl-w2",        { NULL }, 474,   "udp"  },
-  { "tcpnethaspsrv",   { NULL }, 475,   "tcp"  },
-  { "tcpnethaspsrv",   { NULL }, 475,   "udp"  },
-  { "tn-tl-fd1",       { NULL }, 476,   "tcp"  },
-  { "tn-tl-fd1",       { NULL }, 476,   "udp"  },
-  { "ss7ns",           { NULL }, 477,   "tcp"  },
-  { "ss7ns",           { NULL }, 477,   "udp"  },
-  { "spsc",            { NULL }, 478,   "tcp"  },
-  { "spsc",            { NULL }, 478,   "udp"  },
-  { "iafserver",       { NULL }, 479,   "tcp"  },
-  { "iafserver",       { NULL }, 479,   "udp"  },
-  { "iafdbase",        { NULL }, 480,   "tcp"  },
-  { "iafdbase",        { NULL }, 480,   "udp"  },
-  { "ph",              { NULL }, 481,   "tcp"  },
-  { "ph",              { NULL }, 481,   "udp"  },
-  { "bgs-nsi",         { NULL }, 482,   "tcp"  },
-  { "bgs-nsi",         { NULL }, 482,   "udp"  },
-  { "ulpnet",          { NULL }, 483,   "tcp"  },
-  { "ulpnet",          { NULL }, 483,   "udp"  },
-  { "integra-sme",     { NULL }, 484,   "tcp"  },
-  { "integra-sme",     { NULL }, 484,   "udp"  },
-  { "powerburst",      { NULL }, 485,   "tcp"  },
-  { "powerburst",      { NULL }, 485,   "udp"  },
-  { "avian",           { NULL }, 486,   "tcp"  },
-  { "avian",           { NULL }, 486,   "udp"  },
-  { "saft",            { NULL }, 487,   "tcp"  },
-  { "saft",            { NULL }, 487,   "udp"  },
-  { "gss-http",        { NULL }, 488,   "tcp"  },
-  { "gss-http",        { NULL }, 488,   "udp"  },
-  { "nest-protocol",   { NULL }, 489,   "tcp"  },
-  { "nest-protocol",   { NULL }, 489,   "udp"  },
-  { "micom-pfs",       { NULL }, 490,   "tcp"  },
-  { "micom-pfs",       { NULL }, 490,   "udp"  },
-  { "go-login",        { NULL }, 491,   "tcp"  },
-  { "go-login",        { NULL }, 491,   "udp"  },
-  { "ticf-1",          { NULL }, 492,   "tcp"  },
-  { "ticf-1",          { NULL }, 492,   "udp"  },
-  { "ticf-2",          { NULL }, 493,   "tcp"  },
-  { "ticf-2",          { NULL }, 493,   "udp"  },
-  { "pov-ray",         { NULL }, 494,   "tcp"  },
-  { "pov-ray",         { NULL }, 494,   "udp"  },
-  { "intecourier",     { NULL }, 495,   "tcp"  },
-  { "intecourier",     { NULL }, 495,   "udp"  },
-  { "pim-rp-disc",     { NULL }, 496,   "tcp"  },
-  { "pim-rp-disc",     { NULL }, 496,   "udp"  },
-  { "dantz",           { NULL }, 497,   "tcp"  },
-  { "dantz",           { NULL }, 497,   "udp"  },
-  { "siam",            { NULL }, 498,   "tcp"  },
-  { "siam",            { NULL }, 498,   "udp"  },
-  { "iso-ill",         { NULL }, 499,   "tcp"  },
-  { "iso-ill",         { NULL }, 499,   "udp"  },
-  { "isakmp",          { NULL }, 500,   "tcp"  },
-  { "isakmp",          { NULL }, 500,   "udp"  },
-  { "stmf",            { NULL }, 501,   "tcp"  },
-  { "stmf",            { NULL }, 501,   "udp"  },
-  { "asa-appl-proto",  { NULL }, 502,   "tcp"  },
-  { "asa-appl-proto",  { NULL }, 502,   "udp"  },
-  { "intrinsa",        { NULL }, 503,   "tcp"  },
-  { "intrinsa",        { NULL }, 503,   "udp"  },
-  { "citadel",         { NULL }, 504,   "tcp"  },
-  { "citadel",         { NULL }, 504,   "udp"  },
-  { "mailbox-lm",      { NULL }, 505,   "tcp"  },
-  { "mailbox-lm",      { NULL }, 505,   "udp"  },
-  { "ohimsrv",         { NULL }, 506,   "tcp"  },
-  { "ohimsrv",         { NULL }, 506,   "udp"  },
-  { "crs",             { NULL }, 507,   "tcp"  },
-  { "crs",             { NULL }, 507,   "udp"  },
-  { "xvttp",           { NULL }, 508,   "tcp"  },
-  { "xvttp",           { NULL }, 508,   "udp"  },
-  { "snare",           { NULL }, 509,   "tcp"  },
-  { "snare",           { NULL }, 509,   "udp"  },
-  { "fcp",             { NULL }, 510,   "tcp"  },
-  { "fcp",             { NULL }, 510,   "udp"  },
-  { "passgo",          { NULL }, 511,   "tcp"  },
-  { "passgo",          { NULL }, 511,   "udp"  },
-  { "exec",            { NULL }, 512,   "tcp"  },
-  { "comsat",          { NULL }, 512,   "udp"  },
-  { "biff",            { NULL }, 512,   "udp"  },
-  { "login",           { NULL }, 513,   "tcp"  },
-  { "who",             { NULL }, 513,   "udp"  },
-  { "shell",           { NULL }, 514,   "tcp"  },
-  { "syslog",          { NULL }, 514,   "udp"  },
-  { "printer",         { NULL }, 515,   "tcp"  },
-  { "printer",         { NULL }, 515,   "udp"  },
-  { "videotex",        { NULL }, 516,   "tcp"  },
-  { "videotex",        { NULL }, 516,   "udp"  },
-  { "talk",            { NULL }, 517,   "tcp"  },
-  { "talk",            { NULL }, 517,   "udp"  },
-  { "ntalk",           { NULL }, 518,   "tcp"  },
-  { "ntalk",           { NULL }, 518,   "udp"  },
-  { "utime",           { NULL }, 519,   "tcp"  },
-  { "utime",           { NULL }, 519,   "udp"  },
-  { "efs",             { NULL }, 520,   "tcp"  },
-  { "router",          { NULL }, 520,   "udp"  },
-  { "ripng",           { NULL }, 521,   "tcp"  },
-  { "ripng",           { NULL }, 521,   "udp"  },
-  { "ulp",             { NULL }, 522,   "tcp"  },
-  { "ulp",             { NULL }, 522,   "udp"  },
-  { "ibm-db2",         { NULL }, 523,   "tcp"  },
-  { "ibm-db2",         { NULL }, 523,   "udp"  },
-  { "ncp",             { NULL }, 524,   "tcp"  },
-  { "ncp",             { NULL }, 524,   "udp"  },
-  { "timed",           { NULL }, 525,   "tcp"  },
-  { "timed",           { NULL }, 525,   "udp"  },
-  { "tempo",           { NULL }, 526,   "tcp"  },
-  { "tempo",           { NULL }, 526,   "udp"  },
-  { "stx",             { NULL }, 527,   "tcp"  },
-  { "stx",             { NULL }, 527,   "udp"  },
-  { "custix",          { NULL }, 528,   "tcp"  },
-  { "custix",          { NULL }, 528,   "udp"  },
-  { "irc-serv",        { NULL }, 529,   "tcp"  },
-  { "irc-serv",        { NULL }, 529,   "udp"  },
-  { "courier",         { NULL }, 530,   "tcp"  },
-  { "courier",         { NULL }, 530,   "udp"  },
-  { "conference",      { NULL }, 531,   "tcp"  },
-  { "conference",      { NULL }, 531,   "udp"  },
-  { "netnews",         { NULL }, 532,   "tcp"  },
-  { "netnews",         { NULL }, 532,   "udp"  },
-  { "netwall",         { NULL }, 533,   "tcp"  },
-  { "netwall",         { NULL }, 533,   "udp"  },
-  { "windream",        { NULL }, 534,   "tcp"  },
-  { "windream",        { NULL }, 534,   "udp"  },
-  { "iiop",            { NULL }, 535,   "tcp"  },
-  { "iiop",            { NULL }, 535,   "udp"  },
-  { "opalis-rdv",      { NULL }, 536,   "tcp"  },
-  { "opalis-rdv",      { NULL }, 536,   "udp"  },
-  { "nmsp",            { NULL }, 537,   "tcp"  },
-  { "nmsp",            { NULL }, 537,   "udp"  },
-  { "gdomap",          { NULL }, 538,   "tcp"  },
-  { "gdomap",          { NULL }, 538,   "udp"  },
-  { "apertus-ldp",     { NULL }, 539,   "tcp"  },
-  { "apertus-ldp",     { NULL }, 539,   "udp"  },
-  { "uucp",            { NULL }, 540,   "tcp"  },
-  { "uucp",            { NULL }, 540,   "udp"  },
-  { "uucp-rlogin",     { NULL }, 541,   "tcp"  },
-  { "uucp-rlogin",     { NULL }, 541,   "udp"  },
-  { "commerce",        { NULL }, 542,   "tcp"  },
-  { "commerce",        { NULL }, 542,   "udp"  },
-  { "klogin",          { NULL }, 543,   "tcp"  },
-  { "klogin",          { NULL }, 543,   "udp"  },
-  { "kshell",          { NULL }, 544,   "tcp"  },
-  { "kshell",          { NULL }, 544,   "udp"  },
-  { "appleqtcsrvr",    { NULL }, 545,   "tcp"  },
-  { "appleqtcsrvr",    { NULL }, 545,   "udp"  },
-  { "dhcpv6-client",   { NULL }, 546,   "tcp"  },
-  { "dhcpv6-client",   { NULL }, 546,   "udp"  },
-  { "dhcpv6-server",   { NULL }, 547,   "tcp"  },
-  { "dhcpv6-server",   { NULL }, 547,   "udp"  },
-  { "afpovertcp",      { NULL }, 548,   "tcp"  },
-  { "afpovertcp",      { NULL }, 548,   "udp"  },
-  { "idfp",            { NULL }, 549,   "tcp"  },
-  { "idfp",            { NULL }, 549,   "udp"  },
-  { "new-rwho",        { NULL }, 550,   "tcp"  },
-  { "new-rwho",        { NULL }, 550,   "udp"  },
-  { "cybercash",       { NULL }, 551,   "tcp"  },
-  { "cybercash",       { NULL }, 551,   "udp"  },
-  { "devshr-nts",      { NULL }, 552,   "tcp"  },
-  { "devshr-nts",      { NULL }, 552,   "udp"  },
-  { "pirp",            { NULL }, 553,   "tcp"  },
-  { "pirp",            { NULL }, 553,   "udp"  },
-  { "rtsp",            { NULL }, 554,   "tcp"  },
-  { "rtsp",            { NULL }, 554,   "udp"  },
-  { "dsf",             { NULL }, 555,   "tcp"  },
-  { "dsf",             { NULL }, 555,   "udp"  },
-  { "remotefs",        { NULL }, 556,   "tcp"  },
-  { "remotefs",        { NULL }, 556,   "udp"  },
-  { "openvms-sysipc",  { NULL }, 557,   "tcp"  },
-  { "openvms-sysipc",  { NULL }, 557,   "udp"  },
-  { "sdnskmp",         { NULL }, 558,   "tcp"  },
-  { "sdnskmp",         { NULL }, 558,   "udp"  },
-  { "teedtap",         { NULL }, 559,   "tcp"  },
-  { "teedtap",         { NULL }, 559,   "udp"  },
-  { "rmonitor",        { NULL }, 560,   "tcp"  },
-  { "rmonitor",        { NULL }, 560,   "udp"  },
-  { "monitor",         { NULL }, 561,   "tcp"  },
-  { "monitor",         { NULL }, 561,   "udp"  },
-  { "chshell",         { NULL }, 562,   "tcp"  },
-  { "chshell",         { NULL }, 562,   "udp"  },
-  { "nntps",           { NULL }, 563,   "tcp"  },
-  { "nntps",           { NULL }, 563,   "udp"  },
-  { "9pfs",            { NULL }, 564,   "tcp"  },
-  { "9pfs",            { NULL }, 564,   "udp"  },
-  { "whoami",          { NULL }, 565,   "tcp"  },
-  { "whoami",          { NULL }, 565,   "udp"  },
-  { "streettalk",      { NULL }, 566,   "tcp"  },
-  { "streettalk",      { NULL }, 566,   "udp"  },
-  { "banyan-rpc",      { NULL }, 567,   "tcp"  },
-  { "banyan-rpc",      { NULL }, 567,   "udp"  },
-  { "ms-shuttle",      { NULL }, 568,   "tcp"  },
-  { "ms-shuttle",      { NULL }, 568,   "udp"  },
-  { "ms-rome",         { NULL }, 569,   "tcp"  },
-  { "ms-rome",         { NULL }, 569,   "udp"  },
-  { "meter",           { NULL }, 570,   "tcp"  },
-  { "meter",           { NULL }, 570,   "udp"  },
-  { "meter",           { NULL }, 571,   "tcp"  },
-  { "meter",           { NULL }, 571,   "udp"  },
-  { "sonar",           { NULL }, 572,   "tcp"  },
-  { "sonar",           { NULL }, 572,   "udp"  },
-  { "banyan-vip",      { NULL }, 573,   "tcp"  },
-  { "banyan-vip",      { NULL }, 573,   "udp"  },
-  { "ftp-agent",       { NULL }, 574,   "tcp"  },
-  { "ftp-agent",       { NULL }, 574,   "udp"  },
-  { "vemmi",           { NULL }, 575,   "tcp"  },
-  { "vemmi",           { NULL }, 575,   "udp"  },
-  { "ipcd",            { NULL }, 576,   "tcp"  },
-  { "ipcd",            { NULL }, 576,   "udp"  },
-  { "vnas",            { NULL }, 577,   "tcp"  },
-  { "vnas",            { NULL }, 577,   "udp"  },
-  { "ipdd",            { NULL }, 578,   "tcp"  },
-  { "ipdd",            { NULL }, 578,   "udp"  },
-  { "decbsrv",         { NULL }, 579,   "tcp"  },
-  { "decbsrv",         { NULL }, 579,   "udp"  },
-  { "sntp-heartbeat",  { NULL }, 580,   "tcp"  },
-  { "sntp-heartbeat",  { NULL }, 580,   "udp"  },
-  { "bdp",             { NULL }, 581,   "tcp"  },
-  { "bdp",             { NULL }, 581,   "udp"  },
-  { "scc-security",    { NULL }, 582,   "tcp"  },
-  { "scc-security",    { NULL }, 582,   "udp"  },
-  { "philips-vc",      { NULL }, 583,   "tcp"  },
-  { "philips-vc",      { NULL }, 583,   "udp"  },
-  { "keyserver",       { NULL }, 584,   "tcp"  },
-  { "keyserver",       { NULL }, 584,   "udp"  },
-  { "password-chg",    { NULL }, 586,   "tcp"  },
-  { "password-chg",    { NULL }, 586,   "udp"  },
-  { "submission",      { NULL }, 587,   "tcp"  },
-  { "submission",      { NULL }, 587,   "udp"  },
-  { "cal",             { NULL }, 588,   "tcp"  },
-  { "cal",             { NULL }, 588,   "udp"  },
-  { "eyelink",         { NULL }, 589,   "tcp"  },
-  { "eyelink",         { NULL }, 589,   "udp"  },
-  { "tns-cml",         { NULL }, 590,   "tcp"  },
-  { "tns-cml",         { NULL }, 590,   "udp"  },
-  { "http-alt",        { NULL }, 591,   "tcp"  },
-  { "http-alt",        { NULL }, 591,   "udp"  },
-  { "eudora-set",      { NULL }, 592,   "tcp"  },
-  { "eudora-set",      { NULL }, 592,   "udp"  },
-  { "http-rpc-epmap",  { NULL }, 593,   "tcp"  },
-  { "http-rpc-epmap",  { NULL }, 593,   "udp"  },
-  { "tpip",            { NULL }, 594,   "tcp"  },
-  { "tpip",            { NULL }, 594,   "udp"  },
-  { "cab-protocol",    { NULL }, 595,   "tcp"  },
-  { "cab-protocol",    { NULL }, 595,   "udp"  },
-  { "smsd",            { NULL }, 596,   "tcp"  },
-  { "smsd",            { NULL }, 596,   "udp"  },
-  { "ptcnameservice",  { NULL }, 597,   "tcp"  },
-  { "ptcnameservice",  { NULL }, 597,   "udp"  },
-  { "sco-websrvrmg3",  { NULL }, 598,   "tcp"  },
-  { "sco-websrvrmg3",  { NULL }, 598,   "udp"  },
-  { "acp",             { NULL }, 599,   "tcp"  },
-  { "acp",             { NULL }, 599,   "udp"  },
-  { "ipcserver",       { NULL }, 600,   "tcp"  },
-  { "ipcserver",       { NULL }, 600,   "udp"  },
-  { "syslog-conn",     { NULL }, 601,   "tcp"  },
-  { "syslog-conn",     { NULL }, 601,   "udp"  },
-  { "xmlrpc-beep",     { NULL }, 602,   "tcp"  },
-  { "xmlrpc-beep",     { NULL }, 602,   "udp"  },
-  { "idxp",            { NULL }, 603,   "tcp"  },
-  { "idxp",            { NULL }, 603,   "udp"  },
-  { "tunnel",          { NULL }, 604,   "tcp"  },
-  { "tunnel",          { NULL }, 604,   "udp"  },
-  { "soap-beep",       { NULL }, 605,   "tcp"  },
-  { "soap-beep",       { NULL }, 605,   "udp"  },
-  { "urm",             { NULL }, 606,   "tcp"  },
-  { "urm",             { NULL }, 606,   "udp"  },
-  { "nqs",             { NULL }, 607,   "tcp"  },
-  { "nqs",             { NULL }, 607,   "udp"  },
-  { "sift-uft",        { NULL }, 608,   "tcp"  },
-  { "sift-uft",        { NULL }, 608,   "udp"  },
-  { "npmp-trap",       { NULL }, 609,   "tcp"  },
-  { "npmp-trap",       { NULL }, 609,   "udp"  },
-  { "npmp-local",      { NULL }, 610,   "tcp"  },
-  { "npmp-local",      { NULL }, 610,   "udp"  },
-  { "npmp-gui",        { NULL }, 611,   "tcp"  },
-  { "npmp-gui",        { NULL }, 611,   "udp"  },
-  { "hmmp-ind",        { NULL }, 612,   "tcp"  },
-  { "hmmp-ind",        { NULL }, 612,   "udp"  },
-  { "hmmp-op",         { NULL }, 613,   "tcp"  },
-  { "hmmp-op",         { NULL }, 613,   "udp"  },
-  { "sshell",          { NULL }, 614,   "tcp"  },
-  { "sshell",          { NULL }, 614,   "udp"  },
-  { "sco-inetmgr",     { NULL }, 615,   "tcp"  },
-  { "sco-inetmgr",     { NULL }, 615,   "udp"  },
-  { "sco-sysmgr",      { NULL }, 616,   "tcp"  },
-  { "sco-sysmgr",      { NULL }, 616,   "udp"  },
-  { "sco-dtmgr",       { NULL }, 617,   "tcp"  },
-  { "sco-dtmgr",       { NULL }, 617,   "udp"  },
-  { "dei-icda",        { NULL }, 618,   "tcp"  },
-  { "dei-icda",        { NULL }, 618,   "udp"  },
-  { "compaq-evm",      { NULL }, 619,   "tcp"  },
-  { "compaq-evm",      { NULL }, 619,   "udp"  },
-  { "sco-websrvrmgr",  { NULL }, 620,   "tcp"  },
-  { "sco-websrvrmgr",  { NULL }, 620,   "udp"  },
-  { "escp-ip",         { NULL }, 621,   "tcp"  },
-  { "escp-ip",         { NULL }, 621,   "udp"  },
-  { "collaborator",    { NULL }, 622,   "tcp"  },
-  { "collaborator",    { NULL }, 622,   "udp"  },
-  { "oob-ws-http",     { NULL }, 623,   "tcp"  },
-  { "asf-rmcp",        { NULL }, 623,   "udp"  },
-  { "cryptoadmin",     { NULL }, 624,   "tcp"  },
-  { "cryptoadmin",     { NULL }, 624,   "udp"  },
-  { "dec_dlm",         { NULL }, 625,   "tcp"  },
-  { "dec_dlm",         { NULL }, 625,   "udp"  },
-  { "asia",            { NULL }, 626,   "tcp"  },
-  { "asia",            { NULL }, 626,   "udp"  },
-  { "passgo-tivoli",   { NULL }, 627,   "tcp"  },
-  { "passgo-tivoli",   { NULL }, 627,   "udp"  },
-  { "qmqp",            { NULL }, 628,   "tcp"  },
-  { "qmqp",            { NULL }, 628,   "udp"  },
-  { "3com-amp3",       { NULL }, 629,   "tcp"  },
-  { "3com-amp3",       { NULL }, 629,   "udp"  },
-  { "rda",             { NULL }, 630,   "tcp"  },
-  { "rda",             { NULL }, 630,   "udp"  },
-  { "ipp",             { NULL }, 631,   "tcp"  },
-  { "ipp",             { NULL }, 631,   "udp"  },
-  { "bmpp",            { NULL }, 632,   "tcp"  },
-  { "bmpp",            { NULL }, 632,   "udp"  },
-  { "servstat",        { NULL }, 633,   "tcp"  },
-  { "servstat",        { NULL }, 633,   "udp"  },
-  { "ginad",           { NULL }, 634,   "tcp"  },
-  { "ginad",           { NULL }, 634,   "udp"  },
-  { "rlzdbase",        { NULL }, 635,   "tcp"  },
-  { "rlzdbase",        { NULL }, 635,   "udp"  },
-  { "ldaps",           { NULL }, 636,   "tcp"  },
-  { "ldaps",           { NULL }, 636,   "udp"  },
-  { "lanserver",       { NULL }, 637,   "tcp"  },
-  { "lanserver",       { NULL }, 637,   "udp"  },
-  { "mcns-sec",        { NULL }, 638,   "tcp"  },
-  { "mcns-sec",        { NULL }, 638,   "udp"  },
-  { "msdp",            { NULL }, 639,   "tcp"  },
-  { "msdp",            { NULL }, 639,   "udp"  },
-  { "entrust-sps",     { NULL }, 640,   "tcp"  },
-  { "entrust-sps",     { NULL }, 640,   "udp"  },
-  { "repcmd",          { NULL }, 641,   "tcp"  },
-  { "repcmd",          { NULL }, 641,   "udp"  },
-  { "esro-emsdp",      { NULL }, 642,   "tcp"  },
-  { "esro-emsdp",      { NULL }, 642,   "udp"  },
-  { "sanity",          { NULL }, 643,   "tcp"  },
-  { "sanity",          { NULL }, 643,   "udp"  },
-  { "dwr",             { NULL }, 644,   "tcp"  },
-  { "dwr",             { NULL }, 644,   "udp"  },
-  { "pssc",            { NULL }, 645,   "tcp"  },
-  { "pssc",            { NULL }, 645,   "udp"  },
-  { "ldp",             { NULL }, 646,   "tcp"  },
-  { "ldp",             { NULL }, 646,   "udp"  },
-  { "dhcp-failover",   { NULL }, 647,   "tcp"  },
-  { "dhcp-failover",   { NULL }, 647,   "udp"  },
-  { "rrp",             { NULL }, 648,   "tcp"  },
-  { "rrp",             { NULL }, 648,   "udp"  },
-  { "cadview-3d",      { NULL }, 649,   "tcp"  },
-  { "cadview-3d",      { NULL }, 649,   "udp"  },
-  { "obex",            { NULL }, 650,   "tcp"  },
-  { "obex",            { NULL }, 650,   "udp"  },
-  { "ieee-mms",        { NULL }, 651,   "tcp"  },
-  { "ieee-mms",        { NULL }, 651,   "udp"  },
-  { "hello-port",      { NULL }, 652,   "tcp"  },
-  { "hello-port",      { NULL }, 652,   "udp"  },
-  { "repscmd",         { NULL }, 653,   "tcp"  },
-  { "repscmd",         { NULL }, 653,   "udp"  },
-  { "aodv",            { NULL }, 654,   "tcp"  },
-  { "aodv",            { NULL }, 654,   "udp"  },
-  { "tinc",            { NULL }, 655,   "tcp"  },
-  { "tinc",            { NULL }, 655,   "udp"  },
-  { "spmp",            { NULL }, 656,   "tcp"  },
-  { "spmp",            { NULL }, 656,   "udp"  },
-  { "rmc",             { NULL }, 657,   "tcp"  },
-  { "rmc",             { NULL }, 657,   "udp"  },
-  { "tenfold",         { NULL }, 658,   "tcp"  },
-  { "tenfold",         { NULL }, 658,   "udp"  },
-  { "mac-srvr-admin",  { NULL }, 660,   "tcp"  },
-  { "mac-srvr-admin",  { NULL }, 660,   "udp"  },
-  { "hap",             { NULL }, 661,   "tcp"  },
-  { "hap",             { NULL }, 661,   "udp"  },
-  { "pftp",            { NULL }, 662,   "tcp"  },
-  { "pftp",            { NULL }, 662,   "udp"  },
-  { "purenoise",       { NULL }, 663,   "tcp"  },
-  { "purenoise",       { NULL }, 663,   "udp"  },
-  { "oob-ws-https",    { NULL }, 664,   "tcp"  },
-  { "asf-secure-rmcp", { NULL }, 664,   "udp"  },
-  { "sun-dr",          { NULL }, 665,   "tcp"  },
-  { "sun-dr",          { NULL }, 665,   "udp"  },
-  { "mdqs",            { NULL }, 666,   "tcp"  },
-  { "mdqs",            { NULL }, 666,   "udp"  },
-  { "doom",            { NULL }, 666,   "tcp"  },
-  { "doom",            { NULL }, 666,   "udp"  },
-  { "disclose",        { NULL }, 667,   "tcp"  },
-  { "disclose",        { NULL }, 667,   "udp"  },
-  { "mecomm",          { NULL }, 668,   "tcp"  },
-  { "mecomm",          { NULL }, 668,   "udp"  },
-  { "meregister",      { NULL }, 669,   "tcp"  },
-  { "meregister",      { NULL }, 669,   "udp"  },
-  { "vacdsm-sws",      { NULL }, 670,   "tcp"  },
-  { "vacdsm-sws",      { NULL }, 670,   "udp"  },
-  { "vacdsm-app",      { NULL }, 671,   "tcp"  },
-  { "vacdsm-app",      { NULL }, 671,   "udp"  },
-  { "vpps-qua",        { NULL }, 672,   "tcp"  },
-  { "vpps-qua",        { NULL }, 672,   "udp"  },
-  { "cimplex",         { NULL }, 673,   "tcp"  },
-  { "cimplex",         { NULL }, 673,   "udp"  },
-  { "acap",            { NULL }, 674,   "tcp"  },
-  { "acap",            { NULL }, 674,   "udp"  },
-  { "dctp",            { NULL }, 675,   "tcp"  },
-  { "dctp",            { NULL }, 675,   "udp"  },
-  { "vpps-via",        { NULL }, 676,   "tcp"  },
-  { "vpps-via",        { NULL }, 676,   "udp"  },
-  { "vpp",             { NULL }, 677,   "tcp"  },
-  { "vpp",             { NULL }, 677,   "udp"  },
-  { "ggf-ncp",         { NULL }, 678,   "tcp"  },
-  { "ggf-ncp",         { NULL }, 678,   "udp"  },
-  { "mrm",             { NULL }, 679,   "tcp"  },
-  { "mrm",             { NULL }, 679,   "udp"  },
-  { "entrust-aaas",    { NULL }, 680,   "tcp"  },
-  { "entrust-aaas",    { NULL }, 680,   "udp"  },
-  { "entrust-aams",    { NULL }, 681,   "tcp"  },
-  { "entrust-aams",    { NULL }, 681,   "udp"  },
-  { "xfr",             { NULL }, 682,   "tcp"  },
-  { "xfr",             { NULL }, 682,   "udp"  },
-  { "corba-iiop",      { NULL }, 683,   "tcp"  },
-  { "corba-iiop",      { NULL }, 683,   "udp"  },
-  { "corba-iiop-ssl",  { NULL }, 684,   "tcp"  },
-  { "corba-iiop-ssl",  { NULL }, 684,   "udp"  },
-  { "mdc-portmapper",  { NULL }, 685,   "tcp"  },
-  { "mdc-portmapper",  { NULL }, 685,   "udp"  },
-  { "hcp-wismar",      { NULL }, 686,   "tcp"  },
-  { "hcp-wismar",      { NULL }, 686,   "udp"  },
-  { "asipregistry",    { NULL }, 687,   "tcp"  },
-  { "asipregistry",    { NULL }, 687,   "udp"  },
-  { "realm-rusd",      { NULL }, 688,   "tcp"  },
-  { "realm-rusd",      { NULL }, 688,   "udp"  },
-  { "nmap",            { NULL }, 689,   "tcp"  },
-  { "nmap",            { NULL }, 689,   "udp"  },
-  { "vatp",            { NULL }, 690,   "tcp"  },
-  { "vatp",            { NULL }, 690,   "udp"  },
-  { "msexch-routing",  { NULL }, 691,   "tcp"  },
-  { "msexch-routing",  { NULL }, 691,   "udp"  },
-  { "hyperwave-isp",   { NULL }, 692,   "tcp"  },
-  { "hyperwave-isp",   { NULL }, 692,   "udp"  },
-  { "connendp",        { NULL }, 693,   "tcp"  },
-  { "connendp",        { NULL }, 693,   "udp"  },
-  { "ha-cluster",      { NULL }, 694,   "tcp"  },
-  { "ha-cluster",      { NULL }, 694,   "udp"  },
-  { "ieee-mms-ssl",    { NULL }, 695,   "tcp"  },
-  { "ieee-mms-ssl",    { NULL }, 695,   "udp"  },
-  { "rushd",           { NULL }, 696,   "tcp"  },
-  { "rushd",           { NULL }, 696,   "udp"  },
-  { "uuidgen",         { NULL }, 697,   "tcp"  },
-  { "uuidgen",         { NULL }, 697,   "udp"  },
-  { "olsr",            { NULL }, 698,   "tcp"  },
-  { "olsr",            { NULL }, 698,   "udp"  },
-  { "accessnetwork",   { NULL }, 699,   "tcp"  },
-  { "accessnetwork",   { NULL }, 699,   "udp"  },
-  { "epp",             { NULL }, 700,   "tcp"  },
-  { "epp",             { NULL }, 700,   "udp"  },
-  { "lmp",             { NULL }, 701,   "tcp"  },
-  { "lmp",             { NULL }, 701,   "udp"  },
-  { "iris-beep",       { NULL }, 702,   "tcp"  },
-  { "iris-beep",       { NULL }, 702,   "udp"  },
-  { "elcsd",           { NULL }, 704,   "tcp"  },
-  { "elcsd",           { NULL }, 704,   "udp"  },
-  { "agentx",          { NULL }, 705,   "tcp"  },
-  { "agentx",          { NULL }, 705,   "udp"  },
-  { "silc",            { NULL }, 706,   "tcp"  },
-  { "silc",            { NULL }, 706,   "udp"  },
-  { "borland-dsj",     { NULL }, 707,   "tcp"  },
-  { "borland-dsj",     { NULL }, 707,   "udp"  },
-  { "entrust-kmsh",    { NULL }, 709,   "tcp"  },
-  { "entrust-kmsh",    { NULL }, 709,   "udp"  },
-  { "entrust-ash",     { NULL }, 710,   "tcp"  },
-  { "entrust-ash",     { NULL }, 710,   "udp"  },
-  { "cisco-tdp",       { NULL }, 711,   "tcp"  },
-  { "cisco-tdp",       { NULL }, 711,   "udp"  },
-  { "tbrpf",           { NULL }, 712,   "tcp"  },
-  { "tbrpf",           { NULL }, 712,   "udp"  },
-  { "iris-xpc",        { NULL }, 713,   "tcp"  },
-  { "iris-xpc",        { NULL }, 713,   "udp"  },
-  { "iris-xpcs",       { NULL }, 714,   "tcp"  },
-  { "iris-xpcs",       { NULL }, 714,   "udp"  },
-  { "iris-lwz",        { NULL }, 715,   "tcp"  },
-  { "iris-lwz",        { NULL }, 715,   "udp"  },
-  { "pana",            { NULL }, 716,   "udp"  },
-  { "netviewdm1",      { NULL }, 729,   "tcp"  },
-  { "netviewdm1",      { NULL }, 729,   "udp"  },
-  { "netviewdm2",      { NULL }, 730,   "tcp"  },
-  { "netviewdm2",      { NULL }, 730,   "udp"  },
-  { "netviewdm3",      { NULL }, 731,   "tcp"  },
-  { "netviewdm3",      { NULL }, 731,   "udp"  },
-  { "netgw",           { NULL }, 741,   "tcp"  },
-  { "netgw",           { NULL }, 741,   "udp"  },
-  { "netrcs",          { NULL }, 742,   "tcp"  },
-  { "netrcs",          { NULL }, 742,   "udp"  },
-  { "flexlm",          { NULL }, 744,   "tcp"  },
-  { "flexlm",          { NULL }, 744,   "udp"  },
-  { "fujitsu-dev",     { NULL }, 747,   "tcp"  },
-  { "fujitsu-dev",     { NULL }, 747,   "udp"  },
-  { "ris-cm",          { NULL }, 748,   "tcp"  },
-  { "ris-cm",          { NULL }, 748,   "udp"  },
-  { "kerberos-adm",    { NULL }, 749,   "tcp"  },
-  { "kerberos-adm",    { NULL }, 749,   "udp"  },
-  { "rfile",           { NULL }, 750,   "tcp"  },
-  { "loadav",          { NULL }, 750,   "udp"  },
-  { "kerberos-iv",     { NULL }, 750,   "udp"  },
-  { "pump",            { NULL }, 751,   "tcp"  },
-  { "pump",            { NULL }, 751,   "udp"  },
-  { "qrh",             { NULL }, 752,   "tcp"  },
-  { "qrh",             { NULL }, 752,   "udp"  },
-  { "rrh",             { NULL }, 753,   "tcp"  },
-  { "rrh",             { NULL }, 753,   "udp"  },
-  { "tell",            { NULL }, 754,   "tcp"  },
-  { "tell",            { NULL }, 754,   "udp"  },
-  { "nlogin",          { NULL }, 758,   "tcp"  },
-  { "nlogin",          { NULL }, 758,   "udp"  },
-  { "con",             { NULL }, 759,   "tcp"  },
-  { "con",             { NULL }, 759,   "udp"  },
-  { "ns",              { NULL }, 760,   "tcp"  },
-  { "ns",              { NULL }, 760,   "udp"  },
-  { "rxe",             { NULL }, 761,   "tcp"  },
-  { "rxe",             { NULL }, 761,   "udp"  },
-  { "quotad",          { NULL }, 762,   "tcp"  },
-  { "quotad",          { NULL }, 762,   "udp"  },
-  { "cycleserv",       { NULL }, 763,   "tcp"  },
-  { "cycleserv",       { NULL }, 763,   "udp"  },
-  { "omserv",          { NULL }, 764,   "tcp"  },
-  { "omserv",          { NULL }, 764,   "udp"  },
-  { "webster",         { NULL }, 765,   "tcp"  },
-  { "webster",         { NULL }, 765,   "udp"  },
-  { "phonebook",       { NULL }, 767,   "tcp"  },
-  { "phonebook",       { NULL }, 767,   "udp"  },
-  { "vid",             { NULL }, 769,   "tcp"  },
-  { "vid",             { NULL }, 769,   "udp"  },
-  { "cadlock",         { NULL }, 770,   "tcp"  },
-  { "cadlock",         { NULL }, 770,   "udp"  },
-  { "rtip",            { NULL }, 771,   "tcp"  },
-  { "rtip",            { NULL }, 771,   "udp"  },
-  { "cycleserv2",      { NULL }, 772,   "tcp"  },
-  { "cycleserv2",      { NULL }, 772,   "udp"  },
-  { "submit",          { NULL }, 773,   "tcp"  },
-  { "notify",          { NULL }, 773,   "udp"  },
-  { "rpasswd",         { NULL }, 774,   "tcp"  },
-  { "acmaint_dbd",     { NULL }, 774,   "udp"  },
-  { "entomb",          { NULL }, 775,   "tcp"  },
-  { "acmaint_transd",  { NULL }, 775,   "udp"  },
-  { "wpages",          { NULL }, 776,   "tcp"  },
-  { "wpages",          { NULL }, 776,   "udp"  },
-  { "multiling-http",  { NULL }, 777,   "tcp"  },
-  { "multiling-http",  { NULL }, 777,   "udp"  },
-  { "wpgs",            { NULL }, 780,   "tcp"  },
-  { "wpgs",            { NULL }, 780,   "udp"  },
-  { "mdbs_daemon",     { NULL }, 800,   "tcp"  },
-  { "mdbs_daemon",     { NULL }, 800,   "udp"  },
-  { "device",          { NULL }, 801,   "tcp"  },
-  { "device",          { NULL }, 801,   "udp"  },
-  { "fcp-udp",         { NULL }, 810,   "tcp"  },
-  { "fcp-udp",         { NULL }, 810,   "udp"  },
-  { "itm-mcell-s",     { NULL }, 828,   "tcp"  },
-  { "itm-mcell-s",     { NULL }, 828,   "udp"  },
-  { "pkix-3-ca-ra",    { NULL }, 829,   "tcp"  },
-  { "pkix-3-ca-ra",    { NULL }, 829,   "udp"  },
-  { "netconf-ssh",     { NULL }, 830,   "tcp"  },
-  { "netconf-ssh",     { NULL }, 830,   "udp"  },
-  { "netconf-beep",    { NULL }, 831,   "tcp"  },
-  { "netconf-beep",    { NULL }, 831,   "udp"  },
-  { "netconfsoaphttp", { NULL }, 832,   "tcp"  },
-  { "netconfsoaphttp", { NULL }, 832,   "udp"  },
-  { "netconfsoapbeep", { NULL }, 833,   "tcp"  },
-  { "netconfsoapbeep", { NULL }, 833,   "udp"  },
-  { "dhcp-failover2",  { NULL }, 847,   "tcp"  },
-  { "dhcp-failover2",  { NULL }, 847,   "udp"  },
-  { "gdoi",            { NULL }, 848,   "tcp"  },
-  { "gdoi",            { NULL }, 848,   "udp"  },
-  { "iscsi",           { NULL }, 860,   "tcp"  },
-  { "iscsi",           { NULL }, 860,   "udp"  },
-  { "owamp-control",   { NULL }, 861,   "tcp"  },
-  { "owamp-control",   { NULL }, 861,   "udp"  },
-  { "twamp-control",   { NULL }, 862,   "tcp"  },
-  { "twamp-control",   { NULL }, 862,   "udp"  },
-  { "rsync",           { NULL }, 873,   "tcp"  },
-  { "rsync",           { NULL }, 873,   "udp"  },
-  { "iclcnet-locate",  { NULL }, 886,   "tcp"  },
-  { "iclcnet-locate",  { NULL }, 886,   "udp"  },
-  { "iclcnet_svinfo",  { NULL }, 887,   "tcp"  },
-  { "iclcnet_svinfo",  { NULL }, 887,   "udp"  },
-  { "accessbuilder",   { NULL }, 888,   "tcp"  },
-  { "accessbuilder",   { NULL }, 888,   "udp"  },
-  { "cddbp",           { NULL }, 888,   "tcp"  },
-  { "omginitialrefs",  { NULL }, 900,   "tcp"  },
-  { "omginitialrefs",  { NULL }, 900,   "udp"  },
-  { "smpnameres",      { NULL }, 901,   "tcp"  },
-  { "smpnameres",      { NULL }, 901,   "udp"  },
-  { "ideafarm-door",   { NULL }, 902,   "tcp"  },
-  { "ideafarm-door",   { NULL }, 902,   "udp"  },
-  { "ideafarm-panic",  { NULL }, 903,   "tcp"  },
-  { "ideafarm-panic",  { NULL }, 903,   "udp"  },
-  { "kink",            { NULL }, 910,   "tcp"  },
-  { "kink",            { NULL }, 910,   "udp"  },
-  { "xact-backup",     { NULL }, 911,   "tcp"  },
-  { "xact-backup",     { NULL }, 911,   "udp"  },
-  { "apex-mesh",       { NULL }, 912,   "tcp"  },
-  { "apex-mesh",       { NULL }, 912,   "udp"  },
-  { "apex-edge",       { NULL }, 913,   "tcp"  },
-  { "apex-edge",       { NULL }, 913,   "udp"  },
-  { "ftps-data",       { NULL }, 989,   "tcp"  },
-  { "ftps-data",       { NULL }, 989,   "udp"  },
-  { "ftps",            { NULL }, 990,   "tcp"  },
-  { "ftps",            { NULL }, 990,   "udp"  },
-  { "nas",             { NULL }, 991,   "tcp"  },
-  { "nas",             { NULL }, 991,   "udp"  },
-  { "telnets",         { NULL }, 992,   "tcp"  },
-  { "telnets",         { NULL }, 992,   "udp"  },
-  { "imaps",           { NULL }, 993,   "tcp"  },
-  { "imaps",           { NULL }, 993,   "udp"  },
-  { "ircs",            { NULL }, 994,   "tcp"  },
-  { "ircs",            { NULL }, 994,   "udp"  },
-  { "pop3s",           { NULL }, 995,   "tcp"  },
-  { "pop3s",           { NULL }, 995,   "udp"  },
-  { "vsinet",          { NULL }, 996,   "tcp"  },
-  { "vsinet",          { NULL }, 996,   "udp"  },
-  { "maitrd",          { NULL }, 997,   "tcp"  },
-  { "maitrd",          { NULL }, 997,   "udp"  },
-  { "busboy",          { NULL }, 998,   "tcp"  },
-  { "puparp",          { NULL }, 998,   "udp"  },
-  { "garcon",          { NULL }, 999,   "tcp"  },
-  { "applix",          { NULL }, 999,   "udp"  },
-  { "puprouter",       { NULL }, 999,   "tcp"  },
-  { "puprouter",       { NULL }, 999,   "udp"  },
-  { "cadlock2",        { NULL }, 1000,  "tcp"  },
-  { "cadlock2",        { NULL }, 1000,  "udp"  },
-  { "surf",            { NULL }, 1010,  "tcp"  },
-  { "surf",            { NULL }, 1010,  "udp"  },
-  { "exp1",            { NULL }, 1021,  "tcp"  },
-  { "exp1",            { NULL }, 1021,  "udp"  },
-  { "exp2",            { NULL }, 1022,  "tcp"  },
-  { "exp2",            { NULL }, 1022,  "udp"  },
-#  endif  /* USE_IANA_WELL_KNOWN_PORTS */
-#  ifdef USE_IANA_REGISTERED_PORTS
-  { "blackjack",       { NULL }, 1025,  "tcp"  },
-  { "blackjack",       { NULL }, 1025,  "udp"  },
-  { "cap",             { NULL }, 1026,  "tcp"  },
-  { "cap",             { NULL }, 1026,  "udp"  },
-  { "solid-mux",       { NULL }, 1029,  "tcp"  },
-  { "solid-mux",       { NULL }, 1029,  "udp"  },
-  { "iad1",            { NULL }, 1030,  "tcp"  },
-  { "iad1",            { NULL }, 1030,  "udp"  },
-  { "iad2",            { NULL }, 1031,  "tcp"  },
-  { "iad2",            { NULL }, 1031,  "udp"  },
-  { "iad3",            { NULL }, 1032,  "tcp"  },
-  { "iad3",            { NULL }, 1032,  "udp"  },
-  { "netinfo-local",   { NULL }, 1033,  "tcp"  },
-  { "netinfo-local",   { NULL }, 1033,  "udp"  },
-  { "activesync",      { NULL }, 1034,  "tcp"  },
-  { "activesync",      { NULL }, 1034,  "udp"  },
-  { "mxxrlogin",       { NULL }, 1035,  "tcp"  },
-  { "mxxrlogin",       { NULL }, 1035,  "udp"  },
-  { "nsstp",           { NULL }, 1036,  "tcp"  },
-  { "nsstp",           { NULL }, 1036,  "udp"  },
-  { "ams",             { NULL }, 1037,  "tcp"  },
-  { "ams",             { NULL }, 1037,  "udp"  },
-  { "mtqp",            { NULL }, 1038,  "tcp"  },
-  { "mtqp",            { NULL }, 1038,  "udp"  },
-  { "sbl",             { NULL }, 1039,  "tcp"  },
-  { "sbl",             { NULL }, 1039,  "udp"  },
-  { "netarx",          { NULL }, 1040,  "tcp"  },
-  { "netarx",          { NULL }, 1040,  "udp"  },
-  { "danf-ak2",        { NULL }, 1041,  "tcp"  },
-  { "danf-ak2",        { NULL }, 1041,  "udp"  },
-  { "afrog",           { NULL }, 1042,  "tcp"  },
-  { "afrog",           { NULL }, 1042,  "udp"  },
-  { "boinc-client",    { NULL }, 1043,  "tcp"  },
-  { "boinc-client",    { NULL }, 1043,  "udp"  },
-  { "dcutility",       { NULL }, 1044,  "tcp"  },
-  { "dcutility",       { NULL }, 1044,  "udp"  },
-  { "fpitp",           { NULL }, 1045,  "tcp"  },
-  { "fpitp",           { NULL }, 1045,  "udp"  },
-  { "wfremotertm",     { NULL }, 1046,  "tcp"  },
-  { "wfremotertm",     { NULL }, 1046,  "udp"  },
-  { "neod1",           { NULL }, 1047,  "tcp"  },
-  { "neod1",           { NULL }, 1047,  "udp"  },
-  { "neod2",           { NULL }, 1048,  "tcp"  },
-  { "neod2",           { NULL }, 1048,  "udp"  },
-  { "td-postman",      { NULL }, 1049,  "tcp"  },
-  { "td-postman",      { NULL }, 1049,  "udp"  },
-  { "cma",             { NULL }, 1050,  "tcp"  },
-  { "cma",             { NULL }, 1050,  "udp"  },
-  { "optima-vnet",     { NULL }, 1051,  "tcp"  },
-  { "optima-vnet",     { NULL }, 1051,  "udp"  },
-  { "ddt",             { NULL }, 1052,  "tcp"  },
-  { "ddt",             { NULL }, 1052,  "udp"  },
-  { "remote-as",       { NULL }, 1053,  "tcp"  },
-  { "remote-as",       { NULL }, 1053,  "udp"  },
-  { "brvread",         { NULL }, 1054,  "tcp"  },
-  { "brvread",         { NULL }, 1054,  "udp"  },
-  { "ansyslmd",        { NULL }, 1055,  "tcp"  },
-  { "ansyslmd",        { NULL }, 1055,  "udp"  },
-  { "vfo",             { NULL }, 1056,  "tcp"  },
-  { "vfo",             { NULL }, 1056,  "udp"  },
-  { "startron",        { NULL }, 1057,  "tcp"  },
-  { "startron",        { NULL }, 1057,  "udp"  },
-  { "nim",             { NULL }, 1058,  "tcp"  },
-  { "nim",             { NULL }, 1058,  "udp"  },
-  { "nimreg",          { NULL }, 1059,  "tcp"  },
-  { "nimreg",          { NULL }, 1059,  "udp"  },
-  { "polestar",        { NULL }, 1060,  "tcp"  },
-  { "polestar",        { NULL }, 1060,  "udp"  },
-  { "kiosk",           { NULL }, 1061,  "tcp"  },
-  { "kiosk",           { NULL }, 1061,  "udp"  },
-  { "veracity",        { NULL }, 1062,  "tcp"  },
-  { "veracity",        { NULL }, 1062,  "udp"  },
-  { "kyoceranetdev",   { NULL }, 1063,  "tcp"  },
-  { "kyoceranetdev",   { NULL }, 1063,  "udp"  },
-  { "jstel",           { NULL }, 1064,  "tcp"  },
-  { "jstel",           { NULL }, 1064,  "udp"  },
-  { "syscomlan",       { NULL }, 1065,  "tcp"  },
-  { "syscomlan",       { NULL }, 1065,  "udp"  },
-  { "fpo-fns",         { NULL }, 1066,  "tcp"  },
-  { "fpo-fns",         { NULL }, 1066,  "udp"  },
-  { "instl_boots",     { NULL }, 1067,  "tcp"  },
-  { "instl_boots",     { NULL }, 1067,  "udp"  },
-  { "instl_bootc",     { NULL }, 1068,  "tcp"  },
-  { "instl_bootc",     { NULL }, 1068,  "udp"  },
-  { "cognex-insight",  { NULL }, 1069,  "tcp"  },
-  { "cognex-insight",  { NULL }, 1069,  "udp"  },
-  { "gmrupdateserv",   { NULL }, 1070,  "tcp"  },
-  { "gmrupdateserv",   { NULL }, 1070,  "udp"  },
-  { "bsquare-voip",    { NULL }, 1071,  "tcp"  },
-  { "bsquare-voip",    { NULL }, 1071,  "udp"  },
-  { "cardax",          { NULL }, 1072,  "tcp"  },
-  { "cardax",          { NULL }, 1072,  "udp"  },
-  { "bridgecontrol",   { NULL }, 1073,  "tcp"  },
-  { "bridgecontrol",   { NULL }, 1073,  "udp"  },
-  { "warmspotMgmt",    { NULL }, 1074,  "tcp"  },
-  { "warmspotMgmt",    { NULL }, 1074,  "udp"  },
-  { "rdrmshc",         { NULL }, 1075,  "tcp"  },
-  { "rdrmshc",         { NULL }, 1075,  "udp"  },
-  { "dab-sti-c",       { NULL }, 1076,  "tcp"  },
-  { "dab-sti-c",       { NULL }, 1076,  "udp"  },
-  { "imgames",         { NULL }, 1077,  "tcp"  },
-  { "imgames",         { NULL }, 1077,  "udp"  },
-  { "avocent-proxy",   { NULL }, 1078,  "tcp"  },
-  { "avocent-proxy",   { NULL }, 1078,  "udp"  },
-  { "asprovatalk",     { NULL }, 1079,  "tcp"  },
-  { "asprovatalk",     { NULL }, 1079,  "udp"  },
-  { "socks",           { NULL }, 1080,  "tcp"  },
-  { "socks",           { NULL }, 1080,  "udp"  },
-  { "pvuniwien",       { NULL }, 1081,  "tcp"  },
-  { "pvuniwien",       { NULL }, 1081,  "udp"  },
-  { "amt-esd-prot",    { NULL }, 1082,  "tcp"  },
-  { "amt-esd-prot",    { NULL }, 1082,  "udp"  },
-  { "ansoft-lm-1",     { NULL }, 1083,  "tcp"  },
-  { "ansoft-lm-1",     { NULL }, 1083,  "udp"  },
-  { "ansoft-lm-2",     { NULL }, 1084,  "tcp"  },
-  { "ansoft-lm-2",     { NULL }, 1084,  "udp"  },
-  { "webobjects",      { NULL }, 1085,  "tcp"  },
-  { "webobjects",      { NULL }, 1085,  "udp"  },
-  { "cplscrambler-lg", { NULL }, 1086,  "tcp"  },
-  { "cplscrambler-lg", { NULL }, 1086,  "udp"  },
-  { "cplscrambler-in", { NULL }, 1087,  "tcp"  },
-  { "cplscrambler-in", { NULL }, 1087,  "udp"  },
-  { "cplscrambler-al", { NULL }, 1088,  "tcp"  },
-  { "cplscrambler-al", { NULL }, 1088,  "udp"  },
-  { "ff-annunc",       { NULL }, 1089,  "tcp"  },
-  { "ff-annunc",       { NULL }, 1089,  "udp"  },
-  { "ff-fms",          { NULL }, 1090,  "tcp"  },
-  { "ff-fms",          { NULL }, 1090,  "udp"  },
-  { "ff-sm",           { NULL }, 1091,  "tcp"  },
-  { "ff-sm",           { NULL }, 1091,  "udp"  },
-  { "obrpd",           { NULL }, 1092,  "tcp"  },
-  { "obrpd",           { NULL }, 1092,  "udp"  },
-  { "proofd",          { NULL }, 1093,  "tcp"  },
-  { "proofd",          { NULL }, 1093,  "udp"  },
-  { "rootd",           { NULL }, 1094,  "tcp"  },
-  { "rootd",           { NULL }, 1094,  "udp"  },
-  { "nicelink",        { NULL }, 1095,  "tcp"  },
-  { "nicelink",        { NULL }, 1095,  "udp"  },
-  { "cnrprotocol",     { NULL }, 1096,  "tcp"  },
-  { "cnrprotocol",     { NULL }, 1096,  "udp"  },
-  { "sunclustermgr",   { NULL }, 1097,  "tcp"  },
-  { "sunclustermgr",   { NULL }, 1097,  "udp"  },
-  { "rmiactivation",   { NULL }, 1098,  "tcp"  },
-  { "rmiactivation",   { NULL }, 1098,  "udp"  },
-  { "rmiregistry",     { NULL }, 1099,  "tcp"  },
-  { "rmiregistry",     { NULL }, 1099,  "udp"  },
-  { "mctp",            { NULL }, 1100,  "tcp"  },
-  { "mctp",            { NULL }, 1100,  "udp"  },
-  { "pt2-discover",    { NULL }, 1101,  "tcp"  },
-  { "pt2-discover",    { NULL }, 1101,  "udp"  },
-  { "adobeserver-1",   { NULL }, 1102,  "tcp"  },
-  { "adobeserver-1",   { NULL }, 1102,  "udp"  },
-  { "adobeserver-2",   { NULL }, 1103,  "tcp"  },
-  { "adobeserver-2",   { NULL }, 1103,  "udp"  },
-  { "xrl",             { NULL }, 1104,  "tcp"  },
-  { "xrl",             { NULL }, 1104,  "udp"  },
-  { "ftranhc",         { NULL }, 1105,  "tcp"  },
-  { "ftranhc",         { NULL }, 1105,  "udp"  },
-  { "isoipsigport-1",  { NULL }, 1106,  "tcp"  },
-  { "isoipsigport-1",  { NULL }, 1106,  "udp"  },
-  { "isoipsigport-2",  { NULL }, 1107,  "tcp"  },
-  { "isoipsigport-2",  { NULL }, 1107,  "udp"  },
-  { "ratio-adp",       { NULL }, 1108,  "tcp"  },
-  { "ratio-adp",       { NULL }, 1108,  "udp"  },
-  { "webadmstart",     { NULL }, 1110,  "tcp"  },
-  { "nfsd-keepalive",  { NULL }, 1110,  "udp"  },
-  { "lmsocialserver",  { NULL }, 1111,  "tcp"  },
-  { "lmsocialserver",  { NULL }, 1111,  "udp"  },
-  { "icp",             { NULL }, 1112,  "tcp"  },
-  { "icp",             { NULL }, 1112,  "udp"  },
-  { "ltp-deepspace",   { NULL }, 1113,  "tcp"  },
-  { "ltp-deepspace",   { NULL }, 1113,  "udp"  },
-  { "mini-sql",        { NULL }, 1114,  "tcp"  },
-  { "mini-sql",        { NULL }, 1114,  "udp"  },
-  { "ardus-trns",      { NULL }, 1115,  "tcp"  },
-  { "ardus-trns",      { NULL }, 1115,  "udp"  },
-  { "ardus-cntl",      { NULL }, 1116,  "tcp"  },
-  { "ardus-cntl",      { NULL }, 1116,  "udp"  },
-  { "ardus-mtrns",     { NULL }, 1117,  "tcp"  },
-  { "ardus-mtrns",     { NULL }, 1117,  "udp"  },
-  { "sacred",          { NULL }, 1118,  "tcp"  },
-  { "sacred",          { NULL }, 1118,  "udp"  },
-  { "bnetgame",        { NULL }, 1119,  "tcp"  },
-  { "bnetgame",        { NULL }, 1119,  "udp"  },
-  { "bnetfile",        { NULL }, 1120,  "tcp"  },
-  { "bnetfile",        { NULL }, 1120,  "udp"  },
-  { "rmpp",            { NULL }, 1121,  "tcp"  },
-  { "rmpp",            { NULL }, 1121,  "udp"  },
-  { "availant-mgr",    { NULL }, 1122,  "tcp"  },
-  { "availant-mgr",    { NULL }, 1122,  "udp"  },
-  { "murray",          { NULL }, 1123,  "tcp"  },
-  { "murray",          { NULL }, 1123,  "udp"  },
-  { "hpvmmcontrol",    { NULL }, 1124,  "tcp"  },
-  { "hpvmmcontrol",    { NULL }, 1124,  "udp"  },
-  { "hpvmmagent",      { NULL }, 1125,  "tcp"  },
-  { "hpvmmagent",      { NULL }, 1125,  "udp"  },
-  { "hpvmmdata",       { NULL }, 1126,  "tcp"  },
-  { "hpvmmdata",       { NULL }, 1126,  "udp"  },
-  { "kwdb-commn",      { NULL }, 1127,  "tcp"  },
-  { "kwdb-commn",      { NULL }, 1127,  "udp"  },
-  { "saphostctrl",     { NULL }, 1128,  "tcp"  },
-  { "saphostctrl",     { NULL }, 1128,  "udp"  },
-  { "saphostctrls",    { NULL }, 1129,  "tcp"  },
-  { "saphostctrls",    { NULL }, 1129,  "udp"  },
-  { "casp",            { NULL }, 1130,  "tcp"  },
-  { "casp",            { NULL }, 1130,  "udp"  },
-  { "caspssl",         { NULL }, 1131,  "tcp"  },
-  { "caspssl",         { NULL }, 1131,  "udp"  },
-  { "kvm-via-ip",      { NULL }, 1132,  "tcp"  },
-  { "kvm-via-ip",      { NULL }, 1132,  "udp"  },
-  { "dfn",             { NULL }, 1133,  "tcp"  },
-  { "dfn",             { NULL }, 1133,  "udp"  },
-  { "aplx",            { NULL }, 1134,  "tcp"  },
-  { "aplx",            { NULL }, 1134,  "udp"  },
-  { "omnivision",      { NULL }, 1135,  "tcp"  },
-  { "omnivision",      { NULL }, 1135,  "udp"  },
-  { "hhb-gateway",     { NULL }, 1136,  "tcp"  },
-  { "hhb-gateway",     { NULL }, 1136,  "udp"  },
-  { "trim",            { NULL }, 1137,  "tcp"  },
-  { "trim",            { NULL }, 1137,  "udp"  },
-  { "encrypted_admin", { NULL }, 1138,  "tcp"  },
-  { "encrypted_admin", { NULL }, 1138,  "udp"  },
-  { "evm",             { NULL }, 1139,  "tcp"  },
-  { "evm",             { NULL }, 1139,  "udp"  },
-  { "autonoc",         { NULL }, 1140,  "tcp"  },
-  { "autonoc",         { NULL }, 1140,  "udp"  },
-  { "mxomss",          { NULL }, 1141,  "tcp"  },
-  { "mxomss",          { NULL }, 1141,  "udp"  },
-  { "edtools",         { NULL }, 1142,  "tcp"  },
-  { "edtools",         { NULL }, 1142,  "udp"  },
-  { "imyx",            { NULL }, 1143,  "tcp"  },
-  { "imyx",            { NULL }, 1143,  "udp"  },
-  { "fuscript",        { NULL }, 1144,  "tcp"  },
-  { "fuscript",        { NULL }, 1144,  "udp"  },
-  { "x9-icue",         { NULL }, 1145,  "tcp"  },
-  { "x9-icue",         { NULL }, 1145,  "udp"  },
-  { "audit-transfer",  { NULL }, 1146,  "tcp"  },
-  { "audit-transfer",  { NULL }, 1146,  "udp"  },
-  { "capioverlan",     { NULL }, 1147,  "tcp"  },
-  { "capioverlan",     { NULL }, 1147,  "udp"  },
-  { "elfiq-repl",      { NULL }, 1148,  "tcp"  },
-  { "elfiq-repl",      { NULL }, 1148,  "udp"  },
-  { "bvtsonar",        { NULL }, 1149,  "tcp"  },
-  { "bvtsonar",        { NULL }, 1149,  "udp"  },
-  { "blaze",           { NULL }, 1150,  "tcp"  },
-  { "blaze",           { NULL }, 1150,  "udp"  },
-  { "unizensus",       { NULL }, 1151,  "tcp"  },
-  { "unizensus",       { NULL }, 1151,  "udp"  },
-  { "winpoplanmess",   { NULL }, 1152,  "tcp"  },
-  { "winpoplanmess",   { NULL }, 1152,  "udp"  },
-  { "c1222-acse",      { NULL }, 1153,  "tcp"  },
-  { "c1222-acse",      { NULL }, 1153,  "udp"  },
-  { "resacommunity",   { NULL }, 1154,  "tcp"  },
-  { "resacommunity",   { NULL }, 1154,  "udp"  },
-  { "nfa",             { NULL }, 1155,  "tcp"  },
-  { "nfa",             { NULL }, 1155,  "udp"  },
-  { "iascontrol-oms",  { NULL }, 1156,  "tcp"  },
-  { "iascontrol-oms",  { NULL }, 1156,  "udp"  },
-  { "iascontrol",      { NULL }, 1157,  "tcp"  },
-  { "iascontrol",      { NULL }, 1157,  "udp"  },
-  { "dbcontrol-oms",   { NULL }, 1158,  "tcp"  },
-  { "dbcontrol-oms",   { NULL }, 1158,  "udp"  },
-  { "oracle-oms",      { NULL }, 1159,  "tcp"  },
-  { "oracle-oms",      { NULL }, 1159,  "udp"  },
-  { "olsv",            { NULL }, 1160,  "tcp"  },
-  { "olsv",            { NULL }, 1160,  "udp"  },
-  { "health-polling",  { NULL }, 1161,  "tcp"  },
-  { "health-polling",  { NULL }, 1161,  "udp"  },
-  { "health-trap",     { NULL }, 1162,  "tcp"  },
-  { "health-trap",     { NULL }, 1162,  "udp"  },
-  { "sddp",            { NULL }, 1163,  "tcp"  },
-  { "sddp",            { NULL }, 1163,  "udp"  },
-  { "qsm-proxy",       { NULL }, 1164,  "tcp"  },
-  { "qsm-proxy",       { NULL }, 1164,  "udp"  },
-  { "qsm-gui",         { NULL }, 1165,  "tcp"  },
-  { "qsm-gui",         { NULL }, 1165,  "udp"  },
-  { "qsm-remote",      { NULL }, 1166,  "tcp"  },
-  { "qsm-remote",      { NULL }, 1166,  "udp"  },
-  { "cisco-ipsla",     { NULL }, 1167,  "tcp"  },
-  { "cisco-ipsla",     { NULL }, 1167,  "udp"  },
-  { "cisco-ipsla",     { NULL }, 1167,  "sctp" },
-  { "vchat",           { NULL }, 1168,  "tcp"  },
-  { "vchat",           { NULL }, 1168,  "udp"  },
-  { "tripwire",        { NULL }, 1169,  "tcp"  },
-  { "tripwire",        { NULL }, 1169,  "udp"  },
-  { "atc-lm",          { NULL }, 1170,  "tcp"  },
-  { "atc-lm",          { NULL }, 1170,  "udp"  },
-  { "atc-appserver",   { NULL }, 1171,  "tcp"  },
-  { "atc-appserver",   { NULL }, 1171,  "udp"  },
-  { "dnap",            { NULL }, 1172,  "tcp"  },
-  { "dnap",            { NULL }, 1172,  "udp"  },
-  { "d-cinema-rrp",    { NULL }, 1173,  "tcp"  },
-  { "d-cinema-rrp",    { NULL }, 1173,  "udp"  },
-  { "fnet-remote-ui",  { NULL }, 1174,  "tcp"  },
-  { "fnet-remote-ui",  { NULL }, 1174,  "udp"  },
-  { "dossier",         { NULL }, 1175,  "tcp"  },
-  { "dossier",         { NULL }, 1175,  "udp"  },
-  { "indigo-server",   { NULL }, 1176,  "tcp"  },
-  { "indigo-server",   { NULL }, 1176,  "udp"  },
-  { "dkmessenger",     { NULL }, 1177,  "tcp"  },
-  { "dkmessenger",     { NULL }, 1177,  "udp"  },
-  { "sgi-storman",     { NULL }, 1178,  "tcp"  },
-  { "sgi-storman",     { NULL }, 1178,  "udp"  },
-  { "b2n",             { NULL }, 1179,  "tcp"  },
-  { "b2n",             { NULL }, 1179,  "udp"  },
-  { "mc-client",       { NULL }, 1180,  "tcp"  },
-  { "mc-client",       { NULL }, 1180,  "udp"  },
-  { "3comnetman",      { NULL }, 1181,  "tcp"  },
-  { "3comnetman",      { NULL }, 1181,  "udp"  },
-  { "accelenet",       { NULL }, 1182,  "tcp"  },
-  { "accelenet-data",  { NULL }, 1182,  "udp"  },
-  { "llsurfup-http",   { NULL }, 1183,  "tcp"  },
-  { "llsurfup-http",   { NULL }, 1183,  "udp"  },
-  { "llsurfup-https",  { NULL }, 1184,  "tcp"  },
-  { "llsurfup-https",  { NULL }, 1184,  "udp"  },
-  { "catchpole",       { NULL }, 1185,  "tcp"  },
-  { "catchpole",       { NULL }, 1185,  "udp"  },
-  { "mysql-cluster",   { NULL }, 1186,  "tcp"  },
-  { "mysql-cluster",   { NULL }, 1186,  "udp"  },
-  { "alias",           { NULL }, 1187,  "tcp"  },
-  { "alias",           { NULL }, 1187,  "udp"  },
-  { "hp-webadmin",     { NULL }, 1188,  "tcp"  },
-  { "hp-webadmin",     { NULL }, 1188,  "udp"  },
-  { "unet",            { NULL }, 1189,  "tcp"  },
-  { "unet",            { NULL }, 1189,  "udp"  },
-  { "commlinx-avl",    { NULL }, 1190,  "tcp"  },
-  { "commlinx-avl",    { NULL }, 1190,  "udp"  },
-  { "gpfs",            { NULL }, 1191,  "tcp"  },
-  { "gpfs",            { NULL }, 1191,  "udp"  },
-  { "caids-sensor",    { NULL }, 1192,  "tcp"  },
-  { "caids-sensor",    { NULL }, 1192,  "udp"  },
-  { "fiveacross",      { NULL }, 1193,  "tcp"  },
-  { "fiveacross",      { NULL }, 1193,  "udp"  },
-  { "openvpn",         { NULL }, 1194,  "tcp"  },
-  { "openvpn",         { NULL }, 1194,  "udp"  },
-  { "rsf-1",           { NULL }, 1195,  "tcp"  },
-  { "rsf-1",           { NULL }, 1195,  "udp"  },
-  { "netmagic",        { NULL }, 1196,  "tcp"  },
-  { "netmagic",        { NULL }, 1196,  "udp"  },
-  { "carrius-rshell",  { NULL }, 1197,  "tcp"  },
-  { "carrius-rshell",  { NULL }, 1197,  "udp"  },
-  { "cajo-discovery",  { NULL }, 1198,  "tcp"  },
-  { "cajo-discovery",  { NULL }, 1198,  "udp"  },
-  { "dmidi",           { NULL }, 1199,  "tcp"  },
-  { "dmidi",           { NULL }, 1199,  "udp"  },
-  { "scol",            { NULL }, 1200,  "tcp"  },
-  { "scol",            { NULL }, 1200,  "udp"  },
-  { "nucleus-sand",    { NULL }, 1201,  "tcp"  },
-  { "nucleus-sand",    { NULL }, 1201,  "udp"  },
-  { "caiccipc",        { NULL }, 1202,  "tcp"  },
-  { "caiccipc",        { NULL }, 1202,  "udp"  },
-  { "ssslic-mgr",      { NULL }, 1203,  "tcp"  },
-  { "ssslic-mgr",      { NULL }, 1203,  "udp"  },
-  { "ssslog-mgr",      { NULL }, 1204,  "tcp"  },
-  { "ssslog-mgr",      { NULL }, 1204,  "udp"  },
-  { "accord-mgc",      { NULL }, 1205,  "tcp"  },
-  { "accord-mgc",      { NULL }, 1205,  "udp"  },
-  { "anthony-data",    { NULL }, 1206,  "tcp"  },
-  { "anthony-data",    { NULL }, 1206,  "udp"  },
-  { "metasage",        { NULL }, 1207,  "tcp"  },
-  { "metasage",        { NULL }, 1207,  "udp"  },
-  { "seagull-ais",     { NULL }, 1208,  "tcp"  },
-  { "seagull-ais",     { NULL }, 1208,  "udp"  },
-  { "ipcd3",           { NULL }, 1209,  "tcp"  },
-  { "ipcd3",           { NULL }, 1209,  "udp"  },
-  { "eoss",            { NULL }, 1210,  "tcp"  },
-  { "eoss",            { NULL }, 1210,  "udp"  },
-  { "groove-dpp",      { NULL }, 1211,  "tcp"  },
-  { "groove-dpp",      { NULL }, 1211,  "udp"  },
-  { "lupa",            { NULL }, 1212,  "tcp"  },
-  { "lupa",            { NULL }, 1212,  "udp"  },
-  { "mpc-lifenet",     { NULL }, 1213,  "tcp"  },
-  { "mpc-lifenet",     { NULL }, 1213,  "udp"  },
-  { "kazaa",           { NULL }, 1214,  "tcp"  },
-  { "kazaa",           { NULL }, 1214,  "udp"  },
-  { "scanstat-1",      { NULL }, 1215,  "tcp"  },
-  { "scanstat-1",      { NULL }, 1215,  "udp"  },
-  { "etebac5",         { NULL }, 1216,  "tcp"  },
-  { "etebac5",         { NULL }, 1216,  "udp"  },
-  { "hpss-ndapi",      { NULL }, 1217,  "tcp"  },
-  { "hpss-ndapi",      { NULL }, 1217,  "udp"  },
-  { "aeroflight-ads",  { NULL }, 1218,  "tcp"  },
-  { "aeroflight-ads",  { NULL }, 1218,  "udp"  },
-  { "aeroflight-ret",  { NULL }, 1219,  "tcp"  },
-  { "aeroflight-ret",  { NULL }, 1219,  "udp"  },
-  { "qt-serveradmin",  { NULL }, 1220,  "tcp"  },
-  { "qt-serveradmin",  { NULL }, 1220,  "udp"  },
-  { "sweetware-apps",  { NULL }, 1221,  "tcp"  },
-  { "sweetware-apps",  { NULL }, 1221,  "udp"  },
-  { "nerv",            { NULL }, 1222,  "tcp"  },
-  { "nerv",            { NULL }, 1222,  "udp"  },
-  { "tgp",             { NULL }, 1223,  "tcp"  },
-  { "tgp",             { NULL }, 1223,  "udp"  },
-  { "vpnz",            { NULL }, 1224,  "tcp"  },
-  { "vpnz",            { NULL }, 1224,  "udp"  },
-  { "slinkysearch",    { NULL }, 1225,  "tcp"  },
-  { "slinkysearch",    { NULL }, 1225,  "udp"  },
-  { "stgxfws",         { NULL }, 1226,  "tcp"  },
-  { "stgxfws",         { NULL }, 1226,  "udp"  },
-  { "dns2go",          { NULL }, 1227,  "tcp"  },
-  { "dns2go",          { NULL }, 1227,  "udp"  },
-  { "florence",        { NULL }, 1228,  "tcp"  },
-  { "florence",        { NULL }, 1228,  "udp"  },
-  { "zented",          { NULL }, 1229,  "tcp"  },
-  { "zented",          { NULL }, 1229,  "udp"  },
-  { "periscope",       { NULL }, 1230,  "tcp"  },
-  { "periscope",       { NULL }, 1230,  "udp"  },
-  { "menandmice-lpm",  { NULL }, 1231,  "tcp"  },
-  { "menandmice-lpm",  { NULL }, 1231,  "udp"  },
-  { "univ-appserver",  { NULL }, 1233,  "tcp"  },
-  { "univ-appserver",  { NULL }, 1233,  "udp"  },
-  { "search-agent",    { NULL }, 1234,  "tcp"  },
-  { "search-agent",    { NULL }, 1234,  "udp"  },
-  { "mosaicsyssvc1",   { NULL }, 1235,  "tcp"  },
-  { "mosaicsyssvc1",   { NULL }, 1235,  "udp"  },
-  { "bvcontrol",       { NULL }, 1236,  "tcp"  },
-  { "bvcontrol",       { NULL }, 1236,  "udp"  },
-  { "tsdos390",        { NULL }, 1237,  "tcp"  },
-  { "tsdos390",        { NULL }, 1237,  "udp"  },
-  { "hacl-qs",         { NULL }, 1238,  "tcp"  },
-  { "hacl-qs",         { NULL }, 1238,  "udp"  },
-  { "nmsd",            { NULL }, 1239,  "tcp"  },
-  { "nmsd",            { NULL }, 1239,  "udp"  },
-  { "instantia",       { NULL }, 1240,  "tcp"  },
-  { "instantia",       { NULL }, 1240,  "udp"  },
-  { "nessus",          { NULL }, 1241,  "tcp"  },
-  { "nessus",          { NULL }, 1241,  "udp"  },
-  { "nmasoverip",      { NULL }, 1242,  "tcp"  },
-  { "nmasoverip",      { NULL }, 1242,  "udp"  },
-  { "serialgateway",   { NULL }, 1243,  "tcp"  },
-  { "serialgateway",   { NULL }, 1243,  "udp"  },
-  { "isbconference1",  { NULL }, 1244,  "tcp"  },
-  { "isbconference1",  { NULL }, 1244,  "udp"  },
-  { "isbconference2",  { NULL }, 1245,  "tcp"  },
-  { "isbconference2",  { NULL }, 1245,  "udp"  },
-  { "payrouter",       { NULL }, 1246,  "tcp"  },
-  { "payrouter",       { NULL }, 1246,  "udp"  },
-  { "visionpyramid",   { NULL }, 1247,  "tcp"  },
-  { "visionpyramid",   { NULL }, 1247,  "udp"  },
-  { "hermes",          { NULL }, 1248,  "tcp"  },
-  { "hermes",          { NULL }, 1248,  "udp"  },
-  { "mesavistaco",     { NULL }, 1249,  "tcp"  },
-  { "mesavistaco",     { NULL }, 1249,  "udp"  },
-  { "swldy-sias",      { NULL }, 1250,  "tcp"  },
-  { "swldy-sias",      { NULL }, 1250,  "udp"  },
-  { "servergraph",     { NULL }, 1251,  "tcp"  },
-  { "servergraph",     { NULL }, 1251,  "udp"  },
-  { "bspne-pcc",       { NULL }, 1252,  "tcp"  },
-  { "bspne-pcc",       { NULL }, 1252,  "udp"  },
-  { "q55-pcc",         { NULL }, 1253,  "tcp"  },
-  { "q55-pcc",         { NULL }, 1253,  "udp"  },
-  { "de-noc",          { NULL }, 1254,  "tcp"  },
-  { "de-noc",          { NULL }, 1254,  "udp"  },
-  { "de-cache-query",  { NULL }, 1255,  "tcp"  },
-  { "de-cache-query",  { NULL }, 1255,  "udp"  },
-  { "de-server",       { NULL }, 1256,  "tcp"  },
-  { "de-server",       { NULL }, 1256,  "udp"  },
-  { "shockwave2",      { NULL }, 1257,  "tcp"  },
-  { "shockwave2",      { NULL }, 1257,  "udp"  },
-  { "opennl",          { NULL }, 1258,  "tcp"  },
-  { "opennl",          { NULL }, 1258,  "udp"  },
-  { "opennl-voice",    { NULL }, 1259,  "tcp"  },
-  { "opennl-voice",    { NULL }, 1259,  "udp"  },
-  { "ibm-ssd",         { NULL }, 1260,  "tcp"  },
-  { "ibm-ssd",         { NULL }, 1260,  "udp"  },
-  { "mpshrsv",         { NULL }, 1261,  "tcp"  },
-  { "mpshrsv",         { NULL }, 1261,  "udp"  },
-  { "qnts-orb",        { NULL }, 1262,  "tcp"  },
-  { "qnts-orb",        { NULL }, 1262,  "udp"  },
-  { "dka",             { NULL }, 1263,  "tcp"  },
-  { "dka",             { NULL }, 1263,  "udp"  },
-  { "prat",            { NULL }, 1264,  "tcp"  },
-  { "prat",            { NULL }, 1264,  "udp"  },
-  { "dssiapi",         { NULL }, 1265,  "tcp"  },
-  { "dssiapi",         { NULL }, 1265,  "udp"  },
-  { "dellpwrappks",    { NULL }, 1266,  "tcp"  },
-  { "dellpwrappks",    { NULL }, 1266,  "udp"  },
-  { "epc",             { NULL }, 1267,  "tcp"  },
-  { "epc",             { NULL }, 1267,  "udp"  },
-  { "propel-msgsys",   { NULL }, 1268,  "tcp"  },
-  { "propel-msgsys",   { NULL }, 1268,  "udp"  },
-  { "watilapp",        { NULL }, 1269,  "tcp"  },
-  { "watilapp",        { NULL }, 1269,  "udp"  },
-  { "opsmgr",          { NULL }, 1270,  "tcp"  },
-  { "opsmgr",          { NULL }, 1270,  "udp"  },
-  { "excw",            { NULL }, 1271,  "tcp"  },
-  { "excw",            { NULL }, 1271,  "udp"  },
-  { "cspmlockmgr",     { NULL }, 1272,  "tcp"  },
-  { "cspmlockmgr",     { NULL }, 1272,  "udp"  },
-  { "emc-gateway",     { NULL }, 1273,  "tcp"  },
-  { "emc-gateway",     { NULL }, 1273,  "udp"  },
-  { "t1distproc",      { NULL }, 1274,  "tcp"  },
-  { "t1distproc",      { NULL }, 1274,  "udp"  },
-  { "ivcollector",     { NULL }, 1275,  "tcp"  },
-  { "ivcollector",     { NULL }, 1275,  "udp"  },
-  { "ivmanager",       { NULL }, 1276,  "tcp"  },
-  { "ivmanager",       { NULL }, 1276,  "udp"  },
-  { "miva-mqs",        { NULL }, 1277,  "tcp"  },
-  { "miva-mqs",        { NULL }, 1277,  "udp"  },
-  { "dellwebadmin-1",  { NULL }, 1278,  "tcp"  },
-  { "dellwebadmin-1",  { NULL }, 1278,  "udp"  },
-  { "dellwebadmin-2",  { NULL }, 1279,  "tcp"  },
-  { "dellwebadmin-2",  { NULL }, 1279,  "udp"  },
-  { "pictrography",    { NULL }, 1280,  "tcp"  },
-  { "pictrography",    { NULL }, 1280,  "udp"  },
-  { "healthd",         { NULL }, 1281,  "tcp"  },
-  { "healthd",         { NULL }, 1281,  "udp"  },
-  { "emperion",        { NULL }, 1282,  "tcp"  },
-  { "emperion",        { NULL }, 1282,  "udp"  },
-  { "productinfo",     { NULL }, 1283,  "tcp"  },
-  { "productinfo",     { NULL }, 1283,  "udp"  },
-  { "iee-qfx",         { NULL }, 1284,  "tcp"  },
-  { "iee-qfx",         { NULL }, 1284,  "udp"  },
-  { "neoiface",        { NULL }, 1285,  "tcp"  },
-  { "neoiface",        { NULL }, 1285,  "udp"  },
-  { "netuitive",       { NULL }, 1286,  "tcp"  },
-  { "netuitive",       { NULL }, 1286,  "udp"  },
-  { "routematch",      { NULL }, 1287,  "tcp"  },
-  { "routematch",      { NULL }, 1287,  "udp"  },
-  { "navbuddy",        { NULL }, 1288,  "tcp"  },
-  { "navbuddy",        { NULL }, 1288,  "udp"  },
-  { "jwalkserver",     { NULL }, 1289,  "tcp"  },
-  { "jwalkserver",     { NULL }, 1289,  "udp"  },
-  { "winjaserver",     { NULL }, 1290,  "tcp"  },
-  { "winjaserver",     { NULL }, 1290,  "udp"  },
-  { "seagulllms",      { NULL }, 1291,  "tcp"  },
-  { "seagulllms",      { NULL }, 1291,  "udp"  },
-  { "dsdn",            { NULL }, 1292,  "tcp"  },
-  { "dsdn",            { NULL }, 1292,  "udp"  },
-  { "pkt-krb-ipsec",   { NULL }, 1293,  "tcp"  },
-  { "pkt-krb-ipsec",   { NULL }, 1293,  "udp"  },
-  { "cmmdriver",       { NULL }, 1294,  "tcp"  },
-  { "cmmdriver",       { NULL }, 1294,  "udp"  },
-  { "ehtp",            { NULL }, 1295,  "tcp"  },
-  { "ehtp",            { NULL }, 1295,  "udp"  },
-  { "dproxy",          { NULL }, 1296,  "tcp"  },
-  { "dproxy",          { NULL }, 1296,  "udp"  },
-  { "sdproxy",         { NULL }, 1297,  "tcp"  },
-  { "sdproxy",         { NULL }, 1297,  "udp"  },
-  { "lpcp",            { NULL }, 1298,  "tcp"  },
-  { "lpcp",            { NULL }, 1298,  "udp"  },
-  { "hp-sci",          { NULL }, 1299,  "tcp"  },
-  { "hp-sci",          { NULL }, 1299,  "udp"  },
-  { "h323hostcallsc",  { NULL }, 1300,  "tcp"  },
-  { "h323hostcallsc",  { NULL }, 1300,  "udp"  },
-  { "ci3-software-1",  { NULL }, 1301,  "tcp"  },
-  { "ci3-software-1",  { NULL }, 1301,  "udp"  },
-  { "ci3-software-2",  { NULL }, 1302,  "tcp"  },
-  { "ci3-software-2",  { NULL }, 1302,  "udp"  },
-  { "sftsrv",          { NULL }, 1303,  "tcp"  },
-  { "sftsrv",          { NULL }, 1303,  "udp"  },
-  { "boomerang",       { NULL }, 1304,  "tcp"  },
-  { "boomerang",       { NULL }, 1304,  "udp"  },
-  { "pe-mike",         { NULL }, 1305,  "tcp"  },
-  { "pe-mike",         { NULL }, 1305,  "udp"  },
-  { "re-conn-proto",   { NULL }, 1306,  "tcp"  },
-  { "re-conn-proto",   { NULL }, 1306,  "udp"  },
-  { "pacmand",         { NULL }, 1307,  "tcp"  },
-  { "pacmand",         { NULL }, 1307,  "udp"  },
-  { "odsi",            { NULL }, 1308,  "tcp"  },
-  { "odsi",            { NULL }, 1308,  "udp"  },
-  { "jtag-server",     { NULL }, 1309,  "tcp"  },
-  { "jtag-server",     { NULL }, 1309,  "udp"  },
-  { "husky",           { NULL }, 1310,  "tcp"  },
-  { "husky",           { NULL }, 1310,  "udp"  },
-  { "rxmon",           { NULL }, 1311,  "tcp"  },
-  { "rxmon",           { NULL }, 1311,  "udp"  },
-  { "sti-envision",    { NULL }, 1312,  "tcp"  },
-  { "sti-envision",    { NULL }, 1312,  "udp"  },
-  { "bmc_patroldb",    { NULL }, 1313,  "tcp"  },
-  { "bmc_patroldb",    { NULL }, 1313,  "udp"  },
-  { "pdps",            { NULL }, 1314,  "tcp"  },
-  { "pdps",            { NULL }, 1314,  "udp"  },
-  { "els",             { NULL }, 1315,  "tcp"  },
-  { "els",             { NULL }, 1315,  "udp"  },
-  { "exbit-escp",      { NULL }, 1316,  "tcp"  },
-  { "exbit-escp",      { NULL }, 1316,  "udp"  },
-  { "vrts-ipcserver",  { NULL }, 1317,  "tcp"  },
-  { "vrts-ipcserver",  { NULL }, 1317,  "udp"  },
-  { "krb5gatekeeper",  { NULL }, 1318,  "tcp"  },
-  { "krb5gatekeeper",  { NULL }, 1318,  "udp"  },
-  { "amx-icsp",        { NULL }, 1319,  "tcp"  },
-  { "amx-icsp",        { NULL }, 1319,  "udp"  },
-  { "amx-axbnet",      { NULL }, 1320,  "tcp"  },
-  { "amx-axbnet",      { NULL }, 1320,  "udp"  },
-  { "pip",             { NULL }, 1321,  "tcp"  },
-  { "pip",             { NULL }, 1321,  "udp"  },
-  { "novation",        { NULL }, 1322,  "tcp"  },
-  { "novation",        { NULL }, 1322,  "udp"  },
-  { "brcd",            { NULL }, 1323,  "tcp"  },
-  { "brcd",            { NULL }, 1323,  "udp"  },
-  { "delta-mcp",       { NULL }, 1324,  "tcp"  },
-  { "delta-mcp",       { NULL }, 1324,  "udp"  },
-  { "dx-instrument",   { NULL }, 1325,  "tcp"  },
-  { "dx-instrument",   { NULL }, 1325,  "udp"  },
-  { "wimsic",          { NULL }, 1326,  "tcp"  },
-  { "wimsic",          { NULL }, 1326,  "udp"  },
-  { "ultrex",          { NULL }, 1327,  "tcp"  },
-  { "ultrex",          { NULL }, 1327,  "udp"  },
-  { "ewall",           { NULL }, 1328,  "tcp"  },
-  { "ewall",           { NULL }, 1328,  "udp"  },
-  { "netdb-export",    { NULL }, 1329,  "tcp"  },
-  { "netdb-export",    { NULL }, 1329,  "udp"  },
-  { "streetperfect",   { NULL }, 1330,  "tcp"  },
-  { "streetperfect",   { NULL }, 1330,  "udp"  },
-  { "intersan",        { NULL }, 1331,  "tcp"  },
-  { "intersan",        { NULL }, 1331,  "udp"  },
-  { "pcia-rxp-b",      { NULL }, 1332,  "tcp"  },
-  { "pcia-rxp-b",      { NULL }, 1332,  "udp"  },
-  { "passwrd-policy",  { NULL }, 1333,  "tcp"  },
-  { "passwrd-policy",  { NULL }, 1333,  "udp"  },
-  { "writesrv",        { NULL }, 1334,  "tcp"  },
-  { "writesrv",        { NULL }, 1334,  "udp"  },
-  { "digital-notary",  { NULL }, 1335,  "tcp"  },
-  { "digital-notary",  { NULL }, 1335,  "udp"  },
-  { "ischat",          { NULL }, 1336,  "tcp"  },
-  { "ischat",          { NULL }, 1336,  "udp"  },
-  { "menandmice-dns",  { NULL }, 1337,  "tcp"  },
-  { "menandmice-dns",  { NULL }, 1337,  "udp"  },
-  { "wmc-log-svc",     { NULL }, 1338,  "tcp"  },
-  { "wmc-log-svc",     { NULL }, 1338,  "udp"  },
-  { "kjtsiteserver",   { NULL }, 1339,  "tcp"  },
-  { "kjtsiteserver",   { NULL }, 1339,  "udp"  },
-  { "naap",            { NULL }, 1340,  "tcp"  },
-  { "naap",            { NULL }, 1340,  "udp"  },
-  { "qubes",           { NULL }, 1341,  "tcp"  },
-  { "qubes",           { NULL }, 1341,  "udp"  },
-  { "esbroker",        { NULL }, 1342,  "tcp"  },
-  { "esbroker",        { NULL }, 1342,  "udp"  },
-  { "re101",           { NULL }, 1343,  "tcp"  },
-  { "re101",           { NULL }, 1343,  "udp"  },
-  { "icap",            { NULL }, 1344,  "tcp"  },
-  { "icap",            { NULL }, 1344,  "udp"  },
-  { "vpjp",            { NULL }, 1345,  "tcp"  },
-  { "vpjp",            { NULL }, 1345,  "udp"  },
-  { "alta-ana-lm",     { NULL }, 1346,  "tcp"  },
-  { "alta-ana-lm",     { NULL }, 1346,  "udp"  },
-  { "bbn-mmc",         { NULL }, 1347,  "tcp"  },
-  { "bbn-mmc",         { NULL }, 1347,  "udp"  },
-  { "bbn-mmx",         { NULL }, 1348,  "tcp"  },
-  { "bbn-mmx",         { NULL }, 1348,  "udp"  },
-  { "sbook",           { NULL }, 1349,  "tcp"  },
-  { "sbook",           { NULL }, 1349,  "udp"  },
-  { "editbench",       { NULL }, 1350,  "tcp"  },
-  { "editbench",       { NULL }, 1350,  "udp"  },
-  { "equationbuilder", { NULL }, 1351,  "tcp"  },
-  { "equationbuilder", { NULL }, 1351,  "udp"  },
-  { "lotusnote",       { NULL }, 1352,  "tcp"  },
-  { "lotusnote",       { NULL }, 1352,  "udp"  },
-  { "relief",          { NULL }, 1353,  "tcp"  },
-  { "relief",          { NULL }, 1353,  "udp"  },
-  { "XSIP-network",    { NULL }, 1354,  "tcp"  },
-  { "XSIP-network",    { NULL }, 1354,  "udp"  },
-  { "intuitive-edge",  { NULL }, 1355,  "tcp"  },
-  { "intuitive-edge",  { NULL }, 1355,  "udp"  },
-  { "cuillamartin",    { NULL }, 1356,  "tcp"  },
-  { "cuillamartin",    { NULL }, 1356,  "udp"  },
-  { "pegboard",        { NULL }, 1357,  "tcp"  },
-  { "pegboard",        { NULL }, 1357,  "udp"  },
-  { "connlcli",        { NULL }, 1358,  "tcp"  },
-  { "connlcli",        { NULL }, 1358,  "udp"  },
-  { "ftsrv",           { NULL }, 1359,  "tcp"  },
-  { "ftsrv",           { NULL }, 1359,  "udp"  },
-  { "mimer",           { NULL }, 1360,  "tcp"  },
-  { "mimer",           { NULL }, 1360,  "udp"  },
-  { "linx",            { NULL }, 1361,  "tcp"  },
-  { "linx",            { NULL }, 1361,  "udp"  },
-  { "timeflies",       { NULL }, 1362,  "tcp"  },
-  { "timeflies",       { NULL }, 1362,  "udp"  },
-  { "ndm-requester",   { NULL }, 1363,  "tcp"  },
-  { "ndm-requester",   { NULL }, 1363,  "udp"  },
-  { "ndm-server",      { NULL }, 1364,  "tcp"  },
-  { "ndm-server",      { NULL }, 1364,  "udp"  },
-  { "adapt-sna",       { NULL }, 1365,  "tcp"  },
-  { "adapt-sna",       { NULL }, 1365,  "udp"  },
-  { "netware-csp",     { NULL }, 1366,  "tcp"  },
-  { "netware-csp",     { NULL }, 1366,  "udp"  },
-  { "dcs",             { NULL }, 1367,  "tcp"  },
-  { "dcs",             { NULL }, 1367,  "udp"  },
-  { "screencast",      { NULL }, 1368,  "tcp"  },
-  { "screencast",      { NULL }, 1368,  "udp"  },
-  { "gv-us",           { NULL }, 1369,  "tcp"  },
-  { "gv-us",           { NULL }, 1369,  "udp"  },
-  { "us-gv",           { NULL }, 1370,  "tcp"  },
-  { "us-gv",           { NULL }, 1370,  "udp"  },
-  { "fc-cli",          { NULL }, 1371,  "tcp"  },
-  { "fc-cli",          { NULL }, 1371,  "udp"  },
-  { "fc-ser",          { NULL }, 1372,  "tcp"  },
-  { "fc-ser",          { NULL }, 1372,  "udp"  },
-  { "chromagrafx",     { NULL }, 1373,  "tcp"  },
-  { "chromagrafx",     { NULL }, 1373,  "udp"  },
-  { "molly",           { NULL }, 1374,  "tcp"  },
-  { "molly",           { NULL }, 1374,  "udp"  },
-  { "bytex",           { NULL }, 1375,  "tcp"  },
-  { "bytex",           { NULL }, 1375,  "udp"  },
-  { "ibm-pps",         { NULL }, 1376,  "tcp"  },
-  { "ibm-pps",         { NULL }, 1376,  "udp"  },
-  { "cichlid",         { NULL }, 1377,  "tcp"  },
-  { "cichlid",         { NULL }, 1377,  "udp"  },
-  { "elan",            { NULL }, 1378,  "tcp"  },
-  { "elan",            { NULL }, 1378,  "udp"  },
-  { "dbreporter",      { NULL }, 1379,  "tcp"  },
-  { "dbreporter",      { NULL }, 1379,  "udp"  },
-  { "telesis-licman",  { NULL }, 1380,  "tcp"  },
-  { "telesis-licman",  { NULL }, 1380,  "udp"  },
-  { "apple-licman",    { NULL }, 1381,  "tcp"  },
-  { "apple-licman",    { NULL }, 1381,  "udp"  },
-  { "udt_os",          { NULL }, 1382,  "tcp"  },
-  { "udt_os",          { NULL }, 1382,  "udp"  },
-  { "gwha",            { NULL }, 1383,  "tcp"  },
-  { "gwha",            { NULL }, 1383,  "udp"  },
-  { "os-licman",       { NULL }, 1384,  "tcp"  },
-  { "os-licman",       { NULL }, 1384,  "udp"  },
-  { "atex_elmd",       { NULL }, 1385,  "tcp"  },
-  { "atex_elmd",       { NULL }, 1385,  "udp"  },
-  { "checksum",        { NULL }, 1386,  "tcp"  },
-  { "checksum",        { NULL }, 1386,  "udp"  },
-  { "cadsi-lm",        { NULL }, 1387,  "tcp"  },
-  { "cadsi-lm",        { NULL }, 1387,  "udp"  },
-  { "objective-dbc",   { NULL }, 1388,  "tcp"  },
-  { "objective-dbc",   { NULL }, 1388,  "udp"  },
-  { "iclpv-dm",        { NULL }, 1389,  "tcp"  },
-  { "iclpv-dm",        { NULL }, 1389,  "udp"  },
-  { "iclpv-sc",        { NULL }, 1390,  "tcp"  },
-  { "iclpv-sc",        { NULL }, 1390,  "udp"  },
-  { "iclpv-sas",       { NULL }, 1391,  "tcp"  },
-  { "iclpv-sas",       { NULL }, 1391,  "udp"  },
-  { "iclpv-pm",        { NULL }, 1392,  "tcp"  },
-  { "iclpv-pm",        { NULL }, 1392,  "udp"  },
-  { "iclpv-nls",       { NULL }, 1393,  "tcp"  },
-  { "iclpv-nls",       { NULL }, 1393,  "udp"  },
-  { "iclpv-nlc",       { NULL }, 1394,  "tcp"  },
-  { "iclpv-nlc",       { NULL }, 1394,  "udp"  },
-  { "iclpv-wsm",       { NULL }, 1395,  "tcp"  },
-  { "iclpv-wsm",       { NULL }, 1395,  "udp"  },
-  { "dvl-activemail",  { NULL }, 1396,  "tcp"  },
-  { "dvl-activemail",  { NULL }, 1396,  "udp"  },
-  { "audio-activmail", { NULL }, 1397,  "tcp"  },
-  { "audio-activmail", { NULL }, 1397,  "udp"  },
-  { "video-activmail", { NULL }, 1398,  "tcp"  },
-  { "video-activmail", { NULL }, 1398,  "udp"  },
-  { "cadkey-licman",   { NULL }, 1399,  "tcp"  },
-  { "cadkey-licman",   { NULL }, 1399,  "udp"  },
-  { "cadkey-tablet",   { NULL }, 1400,  "tcp"  },
-  { "cadkey-tablet",   { NULL }, 1400,  "udp"  },
-  { "goldleaf-licman", { NULL }, 1401,  "tcp"  },
-  { "goldleaf-licman", { NULL }, 1401,  "udp"  },
-  { "prm-sm-np",       { NULL }, 1402,  "tcp"  },
-  { "prm-sm-np",       { NULL }, 1402,  "udp"  },
-  { "prm-nm-np",       { NULL }, 1403,  "tcp"  },
-  { "prm-nm-np",       { NULL }, 1403,  "udp"  },
-  { "igi-lm",          { NULL }, 1404,  "tcp"  },
-  { "igi-lm",          { NULL }, 1404,  "udp"  },
-  { "ibm-res",         { NULL }, 1405,  "tcp"  },
-  { "ibm-res",         { NULL }, 1405,  "udp"  },
-  { "netlabs-lm",      { NULL }, 1406,  "tcp"  },
-  { "netlabs-lm",      { NULL }, 1406,  "udp"  },
-  { "dbsa-lm",         { NULL }, 1407,  "tcp"  },
-  { "dbsa-lm",         { NULL }, 1407,  "udp"  },
-  { "sophia-lm",       { NULL }, 1408,  "tcp"  },
-  { "sophia-lm",       { NULL }, 1408,  "udp"  },
-  { "here-lm",         { NULL }, 1409,  "tcp"  },
-  { "here-lm",         { NULL }, 1409,  "udp"  },
-  { "hiq",             { NULL }, 1410,  "tcp"  },
-  { "hiq",             { NULL }, 1410,  "udp"  },
-  { "af",              { NULL }, 1411,  "tcp"  },
-  { "af",              { NULL }, 1411,  "udp"  },
-  { "innosys",         { NULL }, 1412,  "tcp"  },
-  { "innosys",         { NULL }, 1412,  "udp"  },
-  { "innosys-acl",     { NULL }, 1413,  "tcp"  },
-  { "innosys-acl",     { NULL }, 1413,  "udp"  },
-  { "ibm-mqseries",    { NULL }, 1414,  "tcp"  },
-  { "ibm-mqseries",    { NULL }, 1414,  "udp"  },
-  { "dbstar",          { NULL }, 1415,  "tcp"  },
-  { "dbstar",          { NULL }, 1415,  "udp"  },
-  { "novell-lu6.2",    { NULL }, 1416,  "tcp"  },
-  { "novell-lu6.2",    { NULL }, 1416,  "udp"  },
-  { "timbuktu-srv1",   { NULL }, 1417,  "tcp"  },
-  { "timbuktu-srv1",   { NULL }, 1417,  "udp"  },
-  { "timbuktu-srv2",   { NULL }, 1418,  "tcp"  },
-  { "timbuktu-srv2",   { NULL }, 1418,  "udp"  },
-  { "timbuktu-srv3",   { NULL }, 1419,  "tcp"  },
-  { "timbuktu-srv3",   { NULL }, 1419,  "udp"  },
-  { "timbuktu-srv4",   { NULL }, 1420,  "tcp"  },
-  { "timbuktu-srv4",   { NULL }, 1420,  "udp"  },
-  { "gandalf-lm",      { NULL }, 1421,  "tcp"  },
-  { "gandalf-lm",      { NULL }, 1421,  "udp"  },
-  { "autodesk-lm",     { NULL }, 1422,  "tcp"  },
-  { "autodesk-lm",     { NULL }, 1422,  "udp"  },
-  { "essbase",         { NULL }, 1423,  "tcp"  },
-  { "essbase",         { NULL }, 1423,  "udp"  },
-  { "hybrid",          { NULL }, 1424,  "tcp"  },
-  { "hybrid",          { NULL }, 1424,  "udp"  },
-  { "zion-lm",         { NULL }, 1425,  "tcp"  },
-  { "zion-lm",         { NULL }, 1425,  "udp"  },
-  { "sais",            { NULL }, 1426,  "tcp"  },
-  { "sais",            { NULL }, 1426,  "udp"  },
-  { "mloadd",          { NULL }, 1427,  "tcp"  },
-  { "mloadd",          { NULL }, 1427,  "udp"  },
-  { "informatik-lm",   { NULL }, 1428,  "tcp"  },
-  { "informatik-lm",   { NULL }, 1428,  "udp"  },
-  { "nms",             { NULL }, 1429,  "tcp"  },
-  { "nms",             { NULL }, 1429,  "udp"  },
-  { "tpdu",            { NULL }, 1430,  "tcp"  },
-  { "tpdu",            { NULL }, 1430,  "udp"  },
-  { "rgtp",            { NULL }, 1431,  "tcp"  },
-  { "rgtp",            { NULL }, 1431,  "udp"  },
-  { "blueberry-lm",    { NULL }, 1432,  "tcp"  },
-  { "blueberry-lm",    { NULL }, 1432,  "udp"  },
-  { "ms-sql-s",        { NULL }, 1433,  "tcp"  },
-  { "ms-sql-s",        { NULL }, 1433,  "udp"  },
-  { "ms-sql-m",        { NULL }, 1434,  "tcp"  },
-  { "ms-sql-m",        { NULL }, 1434,  "udp"  },
-  { "ibm-cics",        { NULL }, 1435,  "tcp"  },
-  { "ibm-cics",        { NULL }, 1435,  "udp"  },
-  { "saism",           { NULL }, 1436,  "tcp"  },
-  { "saism",           { NULL }, 1436,  "udp"  },
-  { "tabula",          { NULL }, 1437,  "tcp"  },
-  { "tabula",          { NULL }, 1437,  "udp"  },
-  { "eicon-server",    { NULL }, 1438,  "tcp"  },
-  { "eicon-server",    { NULL }, 1438,  "udp"  },
-  { "eicon-x25",       { NULL }, 1439,  "tcp"  },
-  { "eicon-x25",       { NULL }, 1439,  "udp"  },
-  { "eicon-slp",       { NULL }, 1440,  "tcp"  },
-  { "eicon-slp",       { NULL }, 1440,  "udp"  },
-  { "cadis-1",         { NULL }, 1441,  "tcp"  },
-  { "cadis-1",         { NULL }, 1441,  "udp"  },
-  { "cadis-2",         { NULL }, 1442,  "tcp"  },
-  { "cadis-2",         { NULL }, 1442,  "udp"  },
-  { "ies-lm",          { NULL }, 1443,  "tcp"  },
-  { "ies-lm",          { NULL }, 1443,  "udp"  },
-  { "marcam-lm",       { NULL }, 1444,  "tcp"  },
-  { "marcam-lm",       { NULL }, 1444,  "udp"  },
-  { "proxima-lm",      { NULL }, 1445,  "tcp"  },
-  { "proxima-lm",      { NULL }, 1445,  "udp"  },
-  { "ora-lm",          { NULL }, 1446,  "tcp"  },
-  { "ora-lm",          { NULL }, 1446,  "udp"  },
-  { "apri-lm",         { NULL }, 1447,  "tcp"  },
-  { "apri-lm",         { NULL }, 1447,  "udp"  },
-  { "oc-lm",           { NULL }, 1448,  "tcp"  },
-  { "oc-lm",           { NULL }, 1448,  "udp"  },
-  { "peport",          { NULL }, 1449,  "tcp"  },
-  { "peport",          { NULL }, 1449,  "udp"  },
-  { "dwf",             { NULL }, 1450,  "tcp"  },
-  { "dwf",             { NULL }, 1450,  "udp"  },
-  { "infoman",         { NULL }, 1451,  "tcp"  },
-  { "infoman",         { NULL }, 1451,  "udp"  },
-  { "gtegsc-lm",       { NULL }, 1452,  "tcp"  },
-  { "gtegsc-lm",       { NULL }, 1452,  "udp"  },
-  { "genie-lm",        { NULL }, 1453,  "tcp"  },
-  { "genie-lm",        { NULL }, 1453,  "udp"  },
-  { "interhdl_elmd",   { NULL }, 1454,  "tcp"  },
-  { "interhdl_elmd",   { NULL }, 1454,  "udp"  },
-  { "esl-lm",          { NULL }, 1455,  "tcp"  },
-  { "esl-lm",          { NULL }, 1455,  "udp"  },
-  { "dca",             { NULL }, 1456,  "tcp"  },
-  { "dca",             { NULL }, 1456,  "udp"  },
-  { "valisys-lm",      { NULL }, 1457,  "tcp"  },
-  { "valisys-lm",      { NULL }, 1457,  "udp"  },
-  { "nrcabq-lm",       { NULL }, 1458,  "tcp"  },
-  { "nrcabq-lm",       { NULL }, 1458,  "udp"  },
-  { "proshare1",       { NULL }, 1459,  "tcp"  },
-  { "proshare1",       { NULL }, 1459,  "udp"  },
-  { "proshare2",       { NULL }, 1460,  "tcp"  },
-  { "proshare2",       { NULL }, 1460,  "udp"  },
-  { "ibm_wrless_lan",  { NULL }, 1461,  "tcp"  },
-  { "ibm_wrless_lan",  { NULL }, 1461,  "udp"  },
-  { "world-lm",        { NULL }, 1462,  "tcp"  },
-  { "world-lm",        { NULL }, 1462,  "udp"  },
-  { "nucleus",         { NULL }, 1463,  "tcp"  },
-  { "nucleus",         { NULL }, 1463,  "udp"  },
-  { "msl_lmd",         { NULL }, 1464,  "tcp"  },
-  { "msl_lmd",         { NULL }, 1464,  "udp"  },
-  { "pipes",           { NULL }, 1465,  "tcp"  },
-  { "pipes",           { NULL }, 1465,  "udp"  },
-  { "oceansoft-lm",    { NULL }, 1466,  "tcp"  },
-  { "oceansoft-lm",    { NULL }, 1466,  "udp"  },
-  { "csdmbase",        { NULL }, 1467,  "tcp"  },
-  { "csdmbase",        { NULL }, 1467,  "udp"  },
-  { "csdm",            { NULL }, 1468,  "tcp"  },
-  { "csdm",            { NULL }, 1468,  "udp"  },
-  { "aal-lm",          { NULL }, 1469,  "tcp"  },
-  { "aal-lm",          { NULL }, 1469,  "udp"  },
-  { "uaiact",          { NULL }, 1470,  "tcp"  },
-  { "uaiact",          { NULL }, 1470,  "udp"  },
-  { "csdmbase",        { NULL }, 1471,  "tcp"  },
-  { "csdmbase",        { NULL }, 1471,  "udp"  },
-  { "csdm",            { NULL }, 1472,  "tcp"  },
-  { "csdm",            { NULL }, 1472,  "udp"  },
-  { "openmath",        { NULL }, 1473,  "tcp"  },
-  { "openmath",        { NULL }, 1473,  "udp"  },
-  { "telefinder",      { NULL }, 1474,  "tcp"  },
-  { "telefinder",      { NULL }, 1474,  "udp"  },
-  { "taligent-lm",     { NULL }, 1475,  "tcp"  },
-  { "taligent-lm",     { NULL }, 1475,  "udp"  },
-  { "clvm-cfg",        { NULL }, 1476,  "tcp"  },
-  { "clvm-cfg",        { NULL }, 1476,  "udp"  },
-  { "ms-sna-server",   { NULL }, 1477,  "tcp"  },
-  { "ms-sna-server",   { NULL }, 1477,  "udp"  },
-  { "ms-sna-base",     { NULL }, 1478,  "tcp"  },
-  { "ms-sna-base",     { NULL }, 1478,  "udp"  },
-  { "dberegister",     { NULL }, 1479,  "tcp"  },
-  { "dberegister",     { NULL }, 1479,  "udp"  },
-  { "pacerforum",      { NULL }, 1480,  "tcp"  },
-  { "pacerforum",      { NULL }, 1480,  "udp"  },
-  { "airs",            { NULL }, 1481,  "tcp"  },
-  { "airs",            { NULL }, 1481,  "udp"  },
-  { "miteksys-lm",     { NULL }, 1482,  "tcp"  },
-  { "miteksys-lm",     { NULL }, 1482,  "udp"  },
-  { "afs",             { NULL }, 1483,  "tcp"  },
-  { "afs",             { NULL }, 1483,  "udp"  },
-  { "confluent",       { NULL }, 1484,  "tcp"  },
-  { "confluent",       { NULL }, 1484,  "udp"  },
-  { "lansource",       { NULL }, 1485,  "tcp"  },
-  { "lansource",       { NULL }, 1485,  "udp"  },
-  { "nms_topo_serv",   { NULL }, 1486,  "tcp"  },
-  { "nms_topo_serv",   { NULL }, 1486,  "udp"  },
-  { "localinfosrvr",   { NULL }, 1487,  "tcp"  },
-  { "localinfosrvr",   { NULL }, 1487,  "udp"  },
-  { "docstor",         { NULL }, 1488,  "tcp"  },
-  { "docstor",         { NULL }, 1488,  "udp"  },
-  { "dmdocbroker",     { NULL }, 1489,  "tcp"  },
-  { "dmdocbroker",     { NULL }, 1489,  "udp"  },
-  { "insitu-conf",     { NULL }, 1490,  "tcp"  },
-  { "insitu-conf",     { NULL }, 1490,  "udp"  },
-  { "stone-design-1",  { NULL }, 1492,  "tcp"  },
-  { "stone-design-1",  { NULL }, 1492,  "udp"  },
-  { "netmap_lm",       { NULL }, 1493,  "tcp"  },
-  { "netmap_lm",       { NULL }, 1493,  "udp"  },
-  { "ica",             { NULL }, 1494,  "tcp"  },
-  { "ica",             { NULL }, 1494,  "udp"  },
-  { "cvc",             { NULL }, 1495,  "tcp"  },
-  { "cvc",             { NULL }, 1495,  "udp"  },
-  { "liberty-lm",      { NULL }, 1496,  "tcp"  },
-  { "liberty-lm",      { NULL }, 1496,  "udp"  },
-  { "rfx-lm",          { NULL }, 1497,  "tcp"  },
-  { "rfx-lm",          { NULL }, 1497,  "udp"  },
-  { "sybase-sqlany",   { NULL }, 1498,  "tcp"  },
-  { "sybase-sqlany",   { NULL }, 1498,  "udp"  },
-  { "fhc",             { NULL }, 1499,  "tcp"  },
-  { "fhc",             { NULL }, 1499,  "udp"  },
-  { "vlsi-lm",         { NULL }, 1500,  "tcp"  },
-  { "vlsi-lm",         { NULL }, 1500,  "udp"  },
-  { "saiscm",          { NULL }, 1501,  "tcp"  },
-  { "saiscm",          { NULL }, 1501,  "udp"  },
-  { "shivadiscovery",  { NULL }, 1502,  "tcp"  },
-  { "shivadiscovery",  { NULL }, 1502,  "udp"  },
-  { "imtc-mcs",        { NULL }, 1503,  "tcp"  },
-  { "imtc-mcs",        { NULL }, 1503,  "udp"  },
-  { "evb-elm",         { NULL }, 1504,  "tcp"  },
-  { "evb-elm",         { NULL }, 1504,  "udp"  },
-  { "funkproxy",       { NULL }, 1505,  "tcp"  },
-  { "funkproxy",       { NULL }, 1505,  "udp"  },
-  { "utcd",            { NULL }, 1506,  "tcp"  },
-  { "utcd",            { NULL }, 1506,  "udp"  },
-  { "symplex",         { NULL }, 1507,  "tcp"  },
-  { "symplex",         { NULL }, 1507,  "udp"  },
-  { "diagmond",        { NULL }, 1508,  "tcp"  },
-  { "diagmond",        { NULL }, 1508,  "udp"  },
-  { "robcad-lm",       { NULL }, 1509,  "tcp"  },
-  { "robcad-lm",       { NULL }, 1509,  "udp"  },
-  { "mvx-lm",          { NULL }, 1510,  "tcp"  },
-  { "mvx-lm",          { NULL }, 1510,  "udp"  },
-  { "3l-l1",           { NULL }, 1511,  "tcp"  },
-  { "3l-l1",           { NULL }, 1511,  "udp"  },
-  { "wins",            { NULL }, 1512,  "tcp"  },
-  { "wins",            { NULL }, 1512,  "udp"  },
-  { "fujitsu-dtc",     { NULL }, 1513,  "tcp"  },
-  { "fujitsu-dtc",     { NULL }, 1513,  "udp"  },
-  { "fujitsu-dtcns",   { NULL }, 1514,  "tcp"  },
-  { "fujitsu-dtcns",   { NULL }, 1514,  "udp"  },
-  { "ifor-protocol",   { NULL }, 1515,  "tcp"  },
-  { "ifor-protocol",   { NULL }, 1515,  "udp"  },
-  { "vpad",            { NULL }, 1516,  "tcp"  },
-  { "vpad",            { NULL }, 1516,  "udp"  },
-  { "vpac",            { NULL }, 1517,  "tcp"  },
-  { "vpac",            { NULL }, 1517,  "udp"  },
-  { "vpvd",            { NULL }, 1518,  "tcp"  },
-  { "vpvd",            { NULL }, 1518,  "udp"  },
-  { "vpvc",            { NULL }, 1519,  "tcp"  },
-  { "vpvc",            { NULL }, 1519,  "udp"  },
-  { "atm-zip-office",  { NULL }, 1520,  "tcp"  },
-  { "atm-zip-office",  { NULL }, 1520,  "udp"  },
-  { "ncube-lm",        { NULL }, 1521,  "tcp"  },
-  { "ncube-lm",        { NULL }, 1521,  "udp"  },
-  { "ricardo-lm",      { NULL }, 1522,  "tcp"  },
-  { "ricardo-lm",      { NULL }, 1522,  "udp"  },
-  { "cichild-lm",      { NULL }, 1523,  "tcp"  },
-  { "cichild-lm",      { NULL }, 1523,  "udp"  },
-  { "ingreslock",      { NULL }, 1524,  "tcp"  },
-  { "ingreslock",      { NULL }, 1524,  "udp"  },
-  { "orasrv",          { NULL }, 1525,  "tcp"  },
-  { "orasrv",          { NULL }, 1525,  "udp"  },
-  { "prospero-np",     { NULL }, 1525,  "tcp"  },
-  { "prospero-np",     { NULL }, 1525,  "udp"  },
-  { "pdap-np",         { NULL }, 1526,  "tcp"  },
-  { "pdap-np",         { NULL }, 1526,  "udp"  },
-  { "tlisrv",          { NULL }, 1527,  "tcp"  },
-  { "tlisrv",          { NULL }, 1527,  "udp"  },
-  { "coauthor",        { NULL }, 1529,  "tcp"  },
-  { "coauthor",        { NULL }, 1529,  "udp"  },
-  { "rap-service",     { NULL }, 1530,  "tcp"  },
-  { "rap-service",     { NULL }, 1530,  "udp"  },
-  { "rap-listen",      { NULL }, 1531,  "tcp"  },
-  { "rap-listen",      { NULL }, 1531,  "udp"  },
-  { "miroconnect",     { NULL }, 1532,  "tcp"  },
-  { "miroconnect",     { NULL }, 1532,  "udp"  },
-  { "virtual-places",  { NULL }, 1533,  "tcp"  },
-  { "virtual-places",  { NULL }, 1533,  "udp"  },
-  { "micromuse-lm",    { NULL }, 1534,  "tcp"  },
-  { "micromuse-lm",    { NULL }, 1534,  "udp"  },
-  { "ampr-info",       { NULL }, 1535,  "tcp"  },
-  { "ampr-info",       { NULL }, 1535,  "udp"  },
-  { "ampr-inter",      { NULL }, 1536,  "tcp"  },
-  { "ampr-inter",      { NULL }, 1536,  "udp"  },
-  { "sdsc-lm",         { NULL }, 1537,  "tcp"  },
-  { "sdsc-lm",         { NULL }, 1537,  "udp"  },
-  { "3ds-lm",          { NULL }, 1538,  "tcp"  },
-  { "3ds-lm",          { NULL }, 1538,  "udp"  },
-  { "intellistor-lm",  { NULL }, 1539,  "tcp"  },
-  { "intellistor-lm",  { NULL }, 1539,  "udp"  },
-  { "rds",             { NULL }, 1540,  "tcp"  },
-  { "rds",             { NULL }, 1540,  "udp"  },
-  { "rds2",            { NULL }, 1541,  "tcp"  },
-  { "rds2",            { NULL }, 1541,  "udp"  },
-  { "gridgen-elmd",    { NULL }, 1542,  "tcp"  },
-  { "gridgen-elmd",    { NULL }, 1542,  "udp"  },
-  { "simba-cs",        { NULL }, 1543,  "tcp"  },
-  { "simba-cs",        { NULL }, 1543,  "udp"  },
-  { "aspeclmd",        { NULL }, 1544,  "tcp"  },
-  { "aspeclmd",        { NULL }, 1544,  "udp"  },
-  { "vistium-share",   { NULL }, 1545,  "tcp"  },
-  { "vistium-share",   { NULL }, 1545,  "udp"  },
-  { "abbaccuray",      { NULL }, 1546,  "tcp"  },
-  { "abbaccuray",      { NULL }, 1546,  "udp"  },
-  { "laplink",         { NULL }, 1547,  "tcp"  },
-  { "laplink",         { NULL }, 1547,  "udp"  },
-  { "axon-lm",         { NULL }, 1548,  "tcp"  },
-  { "axon-lm",         { NULL }, 1548,  "udp"  },
-  { "shivahose",       { NULL }, 1549,  "tcp"  },
-  { "shivasound",      { NULL }, 1549,  "udp"  },
-  { "3m-image-lm",     { NULL }, 1550,  "tcp"  },
-  { "3m-image-lm",     { NULL }, 1550,  "udp"  },
-  { "hecmtl-db",       { NULL }, 1551,  "tcp"  },
-  { "hecmtl-db",       { NULL }, 1551,  "udp"  },
-  { "pciarray",        { NULL }, 1552,  "tcp"  },
-  { "pciarray",        { NULL }, 1552,  "udp"  },
-  { "sna-cs",          { NULL }, 1553,  "tcp"  },
-  { "sna-cs",          { NULL }, 1553,  "udp"  },
-  { "caci-lm",         { NULL }, 1554,  "tcp"  },
-  { "caci-lm",         { NULL }, 1554,  "udp"  },
-  { "livelan",         { NULL }, 1555,  "tcp"  },
-  { "livelan",         { NULL }, 1555,  "udp"  },
-  { "veritas_pbx",     { NULL }, 1556,  "tcp"  },
-  { "veritas_pbx",     { NULL }, 1556,  "udp"  },
-  { "arbortext-lm",    { NULL }, 1557,  "tcp"  },
-  { "arbortext-lm",    { NULL }, 1557,  "udp"  },
-  { "xingmpeg",        { NULL }, 1558,  "tcp"  },
-  { "xingmpeg",        { NULL }, 1558,  "udp"  },
-  { "web2host",        { NULL }, 1559,  "tcp"  },
-  { "web2host",        { NULL }, 1559,  "udp"  },
-  { "asci-val",        { NULL }, 1560,  "tcp"  },
-  { "asci-val",        { NULL }, 1560,  "udp"  },
-  { "facilityview",    { NULL }, 1561,  "tcp"  },
-  { "facilityview",    { NULL }, 1561,  "udp"  },
-  { "pconnectmgr",     { NULL }, 1562,  "tcp"  },
-  { "pconnectmgr",     { NULL }, 1562,  "udp"  },
-  { "cadabra-lm",      { NULL }, 1563,  "tcp"  },
-  { "cadabra-lm",      { NULL }, 1563,  "udp"  },
-  { "pay-per-view",    { NULL }, 1564,  "tcp"  },
-  { "pay-per-view",    { NULL }, 1564,  "udp"  },
-  { "winddlb",         { NULL }, 1565,  "tcp"  },
-  { "winddlb",         { NULL }, 1565,  "udp"  },
-  { "corelvideo",      { NULL }, 1566,  "tcp"  },
-  { "corelvideo",      { NULL }, 1566,  "udp"  },
-  { "jlicelmd",        { NULL }, 1567,  "tcp"  },
-  { "jlicelmd",        { NULL }, 1567,  "udp"  },
-  { "tsspmap",         { NULL }, 1568,  "tcp"  },
-  { "tsspmap",         { NULL }, 1568,  "udp"  },
-  { "ets",             { NULL }, 1569,  "tcp"  },
-  { "ets",             { NULL }, 1569,  "udp"  },
-  { "orbixd",          { NULL }, 1570,  "tcp"  },
-  { "orbixd",          { NULL }, 1570,  "udp"  },
-  { "rdb-dbs-disp",    { NULL }, 1571,  "tcp"  },
-  { "rdb-dbs-disp",    { NULL }, 1571,  "udp"  },
-  { "chip-lm",         { NULL }, 1572,  "tcp"  },
-  { "chip-lm",         { NULL }, 1572,  "udp"  },
-  { "itscomm-ns",      { NULL }, 1573,  "tcp"  },
-  { "itscomm-ns",      { NULL }, 1573,  "udp"  },
-  { "mvel-lm",         { NULL }, 1574,  "tcp"  },
-  { "mvel-lm",         { NULL }, 1574,  "udp"  },
-  { "oraclenames",     { NULL }, 1575,  "tcp"  },
-  { "oraclenames",     { NULL }, 1575,  "udp"  },
-  { "moldflow-lm",     { NULL }, 1576,  "tcp"  },
-  { "moldflow-lm",     { NULL }, 1576,  "udp"  },
-  { "hypercube-lm",    { NULL }, 1577,  "tcp"  },
-  { "hypercube-lm",    { NULL }, 1577,  "udp"  },
-  { "jacobus-lm",      { NULL }, 1578,  "tcp"  },
-  { "jacobus-lm",      { NULL }, 1578,  "udp"  },
-  { "ioc-sea-lm",      { NULL }, 1579,  "tcp"  },
-  { "ioc-sea-lm",      { NULL }, 1579,  "udp"  },
-  { "tn-tl-r1",        { NULL }, 1580,  "tcp"  },
-  { "tn-tl-r2",        { NULL }, 1580,  "udp"  },
-  { "mil-2045-47001",  { NULL }, 1581,  "tcp"  },
-  { "mil-2045-47001",  { NULL }, 1581,  "udp"  },
-  { "msims",           { NULL }, 1582,  "tcp"  },
-  { "msims",           { NULL }, 1582,  "udp"  },
-  { "simbaexpress",    { NULL }, 1583,  "tcp"  },
-  { "simbaexpress",    { NULL }, 1583,  "udp"  },
-  { "tn-tl-fd2",       { NULL }, 1584,  "tcp"  },
-  { "tn-tl-fd2",       { NULL }, 1584,  "udp"  },
-  { "intv",            { NULL }, 1585,  "tcp"  },
-  { "intv",            { NULL }, 1585,  "udp"  },
-  { "ibm-abtact",      { NULL }, 1586,  "tcp"  },
-  { "ibm-abtact",      { NULL }, 1586,  "udp"  },
-  { "pra_elmd",        { NULL }, 1587,  "tcp"  },
-  { "pra_elmd",        { NULL }, 1587,  "udp"  },
-  { "triquest-lm",     { NULL }, 1588,  "tcp"  },
-  { "triquest-lm",     { NULL }, 1588,  "udp"  },
-  { "vqp",             { NULL }, 1589,  "tcp"  },
-  { "vqp",             { NULL }, 1589,  "udp"  },
-  { "gemini-lm",       { NULL }, 1590,  "tcp"  },
-  { "gemini-lm",       { NULL }, 1590,  "udp"  },
-  { "ncpm-pm",         { NULL }, 1591,  "tcp"  },
-  { "ncpm-pm",         { NULL }, 1591,  "udp"  },
-  { "commonspace",     { NULL }, 1592,  "tcp"  },
-  { "commonspace",     { NULL }, 1592,  "udp"  },
-  { "mainsoft-lm",     { NULL }, 1593,  "tcp"  },
-  { "mainsoft-lm",     { NULL }, 1593,  "udp"  },
-  { "sixtrak",         { NULL }, 1594,  "tcp"  },
-  { "sixtrak",         { NULL }, 1594,  "udp"  },
-  { "radio",           { NULL }, 1595,  "tcp"  },
-  { "radio",           { NULL }, 1595,  "udp"  },
-  { "radio-sm",        { NULL }, 1596,  "tcp"  },
-  { "radio-bc",        { NULL }, 1596,  "udp"  },
-  { "orbplus-iiop",    { NULL }, 1597,  "tcp"  },
-  { "orbplus-iiop",    { NULL }, 1597,  "udp"  },
-  { "picknfs",         { NULL }, 1598,  "tcp"  },
-  { "picknfs",         { NULL }, 1598,  "udp"  },
-  { "simbaservices",   { NULL }, 1599,  "tcp"  },
-  { "simbaservices",   { NULL }, 1599,  "udp"  },
-  { "issd",            { NULL }, 1600,  "tcp"  },
-  { "issd",            { NULL }, 1600,  "udp"  },
-  { "aas",             { NULL }, 1601,  "tcp"  },
-  { "aas",             { NULL }, 1601,  "udp"  },
-  { "inspect",         { NULL }, 1602,  "tcp"  },
-  { "inspect",         { NULL }, 1602,  "udp"  },
-  { "picodbc",         { NULL }, 1603,  "tcp"  },
-  { "picodbc",         { NULL }, 1603,  "udp"  },
-  { "icabrowser",      { NULL }, 1604,  "tcp"  },
-  { "icabrowser",      { NULL }, 1604,  "udp"  },
-  { "slp",             { NULL }, 1605,  "tcp"  },
-  { "slp",             { NULL }, 1605,  "udp"  },
-  { "slm-api",         { NULL }, 1606,  "tcp"  },
-  { "slm-api",         { NULL }, 1606,  "udp"  },
-  { "stt",             { NULL }, 1607,  "tcp"  },
-  { "stt",             { NULL }, 1607,  "udp"  },
-  { "smart-lm",        { NULL }, 1608,  "tcp"  },
-  { "smart-lm",        { NULL }, 1608,  "udp"  },
-  { "isysg-lm",        { NULL }, 1609,  "tcp"  },
-  { "isysg-lm",        { NULL }, 1609,  "udp"  },
-  { "taurus-wh",       { NULL }, 1610,  "tcp"  },
-  { "taurus-wh",       { NULL }, 1610,  "udp"  },
-  { "ill",             { NULL }, 1611,  "tcp"  },
-  { "ill",             { NULL }, 1611,  "udp"  },
-  { "netbill-trans",   { NULL }, 1612,  "tcp"  },
-  { "netbill-trans",   { NULL }, 1612,  "udp"  },
-  { "netbill-keyrep",  { NULL }, 1613,  "tcp"  },
-  { "netbill-keyrep",  { NULL }, 1613,  "udp"  },
-  { "netbill-cred",    { NULL }, 1614,  "tcp"  },
-  { "netbill-cred",    { NULL }, 1614,  "udp"  },
-  { "netbill-auth",    { NULL }, 1615,  "tcp"  },
-  { "netbill-auth",    { NULL }, 1615,  "udp"  },
-  { "netbill-prod",    { NULL }, 1616,  "tcp"  },
-  { "netbill-prod",    { NULL }, 1616,  "udp"  },
-  { "nimrod-agent",    { NULL }, 1617,  "tcp"  },
-  { "nimrod-agent",    { NULL }, 1617,  "udp"  },
-  { "skytelnet",       { NULL }, 1618,  "tcp"  },
-  { "skytelnet",       { NULL }, 1618,  "udp"  },
-  { "xs-openstorage",  { NULL }, 1619,  "tcp"  },
-  { "xs-openstorage",  { NULL }, 1619,  "udp"  },
-  { "faxportwinport",  { NULL }, 1620,  "tcp"  },
-  { "faxportwinport",  { NULL }, 1620,  "udp"  },
-  { "softdataphone",   { NULL }, 1621,  "tcp"  },
-  { "softdataphone",   { NULL }, 1621,  "udp"  },
-  { "ontime",          { NULL }, 1622,  "tcp"  },
-  { "ontime",          { NULL }, 1622,  "udp"  },
-  { "jaleosnd",        { NULL }, 1623,  "tcp"  },
-  { "jaleosnd",        { NULL }, 1623,  "udp"  },
-  { "udp-sr-port",     { NULL }, 1624,  "tcp"  },
-  { "udp-sr-port",     { NULL }, 1624,  "udp"  },
-  { "svs-omagent",     { NULL }, 1625,  "tcp"  },
-  { "svs-omagent",     { NULL }, 1625,  "udp"  },
-  { "shockwave",       { NULL }, 1626,  "tcp"  },
-  { "shockwave",       { NULL }, 1626,  "udp"  },
-  { "t128-gateway",    { NULL }, 1627,  "tcp"  },
-  { "t128-gateway",    { NULL }, 1627,  "udp"  },
-  { "lontalk-norm",    { NULL }, 1628,  "tcp"  },
-  { "lontalk-norm",    { NULL }, 1628,  "udp"  },
-  { "lontalk-urgnt",   { NULL }, 1629,  "tcp"  },
-  { "lontalk-urgnt",   { NULL }, 1629,  "udp"  },
-  { "oraclenet8cman",  { NULL }, 1630,  "tcp"  },
-  { "oraclenet8cman",  { NULL }, 1630,  "udp"  },
-  { "visitview",       { NULL }, 1631,  "tcp"  },
-  { "visitview",       { NULL }, 1631,  "udp"  },
-  { "pammratc",        { NULL }, 1632,  "tcp"  },
-  { "pammratc",        { NULL }, 1632,  "udp"  },
-  { "pammrpc",         { NULL }, 1633,  "tcp"  },
-  { "pammrpc",         { NULL }, 1633,  "udp"  },
-  { "loaprobe",        { NULL }, 1634,  "tcp"  },
-  { "loaprobe",        { NULL }, 1634,  "udp"  },
-  { "edb-server1",     { NULL }, 1635,  "tcp"  },
-  { "edb-server1",     { NULL }, 1635,  "udp"  },
-  { "isdc",            { NULL }, 1636,  "tcp"  },
-  { "isdc",            { NULL }, 1636,  "udp"  },
-  { "islc",            { NULL }, 1637,  "tcp"  },
-  { "islc",            { NULL }, 1637,  "udp"  },
-  { "ismc",            { NULL }, 1638,  "tcp"  },
-  { "ismc",            { NULL }, 1638,  "udp"  },
-  { "cert-initiator",  { NULL }, 1639,  "tcp"  },
-  { "cert-initiator",  { NULL }, 1639,  "udp"  },
-  { "cert-responder",  { NULL }, 1640,  "tcp"  },
-  { "cert-responder",  { NULL }, 1640,  "udp"  },
-  { "invision",        { NULL }, 1641,  "tcp"  },
-  { "invision",        { NULL }, 1641,  "udp"  },
-  { "isis-am",         { NULL }, 1642,  "tcp"  },
-  { "isis-am",         { NULL }, 1642,  "udp"  },
-  { "isis-ambc",       { NULL }, 1643,  "tcp"  },
-  { "isis-ambc",       { NULL }, 1643,  "udp"  },
-  { "saiseh",          { NULL }, 1644,  "tcp"  },
-  { "sightline",       { NULL }, 1645,  "tcp"  },
-  { "sightline",       { NULL }, 1645,  "udp"  },
-  { "sa-msg-port",     { NULL }, 1646,  "tcp"  },
-  { "sa-msg-port",     { NULL }, 1646,  "udp"  },
-  { "rsap",            { NULL }, 1647,  "tcp"  },
-  { "rsap",            { NULL }, 1647,  "udp"  },
-  { "concurrent-lm",   { NULL }, 1648,  "tcp"  },
-  { "concurrent-lm",   { NULL }, 1648,  "udp"  },
-  { "kermit",          { NULL }, 1649,  "tcp"  },
-  { "kermit",          { NULL }, 1649,  "udp"  },
-  { "nkd",             { NULL }, 1650,  "tcp"  },
-  { "nkd",             { NULL }, 1650,  "udp"  },
-  { "shiva_confsrvr",  { NULL }, 1651,  "tcp"  },
-  { "shiva_confsrvr",  { NULL }, 1651,  "udp"  },
-  { "xnmp",            { NULL }, 1652,  "tcp"  },
-  { "xnmp",            { NULL }, 1652,  "udp"  },
-  { "alphatech-lm",    { NULL }, 1653,  "tcp"  },
-  { "alphatech-lm",    { NULL }, 1653,  "udp"  },
-  { "stargatealerts",  { NULL }, 1654,  "tcp"  },
-  { "stargatealerts",  { NULL }, 1654,  "udp"  },
-  { "dec-mbadmin",     { NULL }, 1655,  "tcp"  },
-  { "dec-mbadmin",     { NULL }, 1655,  "udp"  },
-  { "dec-mbadmin-h",   { NULL }, 1656,  "tcp"  },
-  { "dec-mbadmin-h",   { NULL }, 1656,  "udp"  },
-  { "fujitsu-mmpdc",   { NULL }, 1657,  "tcp"  },
-  { "fujitsu-mmpdc",   { NULL }, 1657,  "udp"  },
-  { "sixnetudr",       { NULL }, 1658,  "tcp"  },
-  { "sixnetudr",       { NULL }, 1658,  "udp"  },
-  { "sg-lm",           { NULL }, 1659,  "tcp"  },
-  { "sg-lm",           { NULL }, 1659,  "udp"  },
-  { "skip-mc-gikreq",  { NULL }, 1660,  "tcp"  },
-  { "skip-mc-gikreq",  { NULL }, 1660,  "udp"  },
-  { "netview-aix-1",   { NULL }, 1661,  "tcp"  },
-  { "netview-aix-1",   { NULL }, 1661,  "udp"  },
-  { "netview-aix-2",   { NULL }, 1662,  "tcp"  },
-  { "netview-aix-2",   { NULL }, 1662,  "udp"  },
-  { "netview-aix-3",   { NULL }, 1663,  "tcp"  },
-  { "netview-aix-3",   { NULL }, 1663,  "udp"  },
-  { "netview-aix-4",   { NULL }, 1664,  "tcp"  },
-  { "netview-aix-4",   { NULL }, 1664,  "udp"  },
-  { "netview-aix-5",   { NULL }, 1665,  "tcp"  },
-  { "netview-aix-5",   { NULL }, 1665,  "udp"  },
-  { "netview-aix-6",   { NULL }, 1666,  "tcp"  },
-  { "netview-aix-6",   { NULL }, 1666,  "udp"  },
-  { "netview-aix-7",   { NULL }, 1667,  "tcp"  },
-  { "netview-aix-7",   { NULL }, 1667,  "udp"  },
-  { "netview-aix-8",   { NULL }, 1668,  "tcp"  },
-  { "netview-aix-8",   { NULL }, 1668,  "udp"  },
-  { "netview-aix-9",   { NULL }, 1669,  "tcp"  },
-  { "netview-aix-9",   { NULL }, 1669,  "udp"  },
-  { "netview-aix-10",  { NULL }, 1670,  "tcp"  },
-  { "netview-aix-10",  { NULL }, 1670,  "udp"  },
-  { "netview-aix-11",  { NULL }, 1671,  "tcp"  },
-  { "netview-aix-11",  { NULL }, 1671,  "udp"  },
-  { "netview-aix-12",  { NULL }, 1672,  "tcp"  },
-  { "netview-aix-12",  { NULL }, 1672,  "udp"  },
-  { "proshare-mc-1",   { NULL }, 1673,  "tcp"  },
-  { "proshare-mc-1",   { NULL }, 1673,  "udp"  },
-  { "proshare-mc-2",   { NULL }, 1674,  "tcp"  },
-  { "proshare-mc-2",   { NULL }, 1674,  "udp"  },
-  { "pdp",             { NULL }, 1675,  "tcp"  },
-  { "pdp",             { NULL }, 1675,  "udp"  },
-  { "netcomm1",        { NULL }, 1676,  "tcp"  },
-  { "netcomm2",        { NULL }, 1676,  "udp"  },
-  { "groupwise",       { NULL }, 1677,  "tcp"  },
-  { "groupwise",       { NULL }, 1677,  "udp"  },
-  { "prolink",         { NULL }, 1678,  "tcp"  },
-  { "prolink",         { NULL }, 1678,  "udp"  },
-  { "darcorp-lm",      { NULL }, 1679,  "tcp"  },
-  { "darcorp-lm",      { NULL }, 1679,  "udp"  },
-  { "microcom-sbp",    { NULL }, 1680,  "tcp"  },
-  { "microcom-sbp",    { NULL }, 1680,  "udp"  },
-  { "sd-elmd",         { NULL }, 1681,  "tcp"  },
-  { "sd-elmd",         { NULL }, 1681,  "udp"  },
-  { "lanyon-lantern",  { NULL }, 1682,  "tcp"  },
-  { "lanyon-lantern",  { NULL }, 1682,  "udp"  },
-  { "ncpm-hip",        { NULL }, 1683,  "tcp"  },
-  { "ncpm-hip",        { NULL }, 1683,  "udp"  },
-  { "snaresecure",     { NULL }, 1684,  "tcp"  },
-  { "snaresecure",     { NULL }, 1684,  "udp"  },
-  { "n2nremote",       { NULL }, 1685,  "tcp"  },
-  { "n2nremote",       { NULL }, 1685,  "udp"  },
-  { "cvmon",           { NULL }, 1686,  "tcp"  },
-  { "cvmon",           { NULL }, 1686,  "udp"  },
-  { "nsjtp-ctrl",      { NULL }, 1687,  "tcp"  },
-  { "nsjtp-ctrl",      { NULL }, 1687,  "udp"  },
-  { "nsjtp-data",      { NULL }, 1688,  "tcp"  },
-  { "nsjtp-data",      { NULL }, 1688,  "udp"  },
-  { "firefox",         { NULL }, 1689,  "tcp"  },
-  { "firefox",         { NULL }, 1689,  "udp"  },
-  { "ng-umds",         { NULL }, 1690,  "tcp"  },
-  { "ng-umds",         { NULL }, 1690,  "udp"  },
-  { "empire-empuma",   { NULL }, 1691,  "tcp"  },
-  { "empire-empuma",   { NULL }, 1691,  "udp"  },
-  { "sstsys-lm",       { NULL }, 1692,  "tcp"  },
-  { "sstsys-lm",       { NULL }, 1692,  "udp"  },
-  { "rrirtr",          { NULL }, 1693,  "tcp"  },
-  { "rrirtr",          { NULL }, 1693,  "udp"  },
-  { "rrimwm",          { NULL }, 1694,  "tcp"  },
-  { "rrimwm",          { NULL }, 1694,  "udp"  },
-  { "rrilwm",          { NULL }, 1695,  "tcp"  },
-  { "rrilwm",          { NULL }, 1695,  "udp"  },
-  { "rrifmm",          { NULL }, 1696,  "tcp"  },
-  { "rrifmm",          { NULL }, 1696,  "udp"  },
-  { "rrisat",          { NULL }, 1697,  "tcp"  },
-  { "rrisat",          { NULL }, 1697,  "udp"  },
-  { "rsvp-encap-1",    { NULL }, 1698,  "tcp"  },
-  { "rsvp-encap-1",    { NULL }, 1698,  "udp"  },
-  { "rsvp-encap-2",    { NULL }, 1699,  "tcp"  },
-  { "rsvp-encap-2",    { NULL }, 1699,  "udp"  },
-  { "mps-raft",        { NULL }, 1700,  "tcp"  },
-  { "mps-raft",        { NULL }, 1700,  "udp"  },
-  { "l2f",             { NULL }, 1701,  "tcp"  },
-  { "l2f",             { NULL }, 1701,  "udp"  },
-  { "l2tp",            { NULL }, 1701,  "tcp"  },
-  { "l2tp",            { NULL }, 1701,  "udp"  },
-  { "deskshare",       { NULL }, 1702,  "tcp"  },
-  { "deskshare",       { NULL }, 1702,  "udp"  },
-  { "hb-engine",       { NULL }, 1703,  "tcp"  },
-  { "hb-engine",       { NULL }, 1703,  "udp"  },
-  { "bcs-broker",      { NULL }, 1704,  "tcp"  },
-  { "bcs-broker",      { NULL }, 1704,  "udp"  },
-  { "slingshot",       { NULL }, 1705,  "tcp"  },
-  { "slingshot",       { NULL }, 1705,  "udp"  },
-  { "jetform",         { NULL }, 1706,  "tcp"  },
-  { "jetform",         { NULL }, 1706,  "udp"  },
-  { "vdmplay",         { NULL }, 1707,  "tcp"  },
-  { "vdmplay",         { NULL }, 1707,  "udp"  },
-  { "gat-lmd",         { NULL }, 1708,  "tcp"  },
-  { "gat-lmd",         { NULL }, 1708,  "udp"  },
-  { "centra",          { NULL }, 1709,  "tcp"  },
-  { "centra",          { NULL }, 1709,  "udp"  },
-  { "impera",          { NULL }, 1710,  "tcp"  },
-  { "impera",          { NULL }, 1710,  "udp"  },
-  { "pptconference",   { NULL }, 1711,  "tcp"  },
-  { "pptconference",   { NULL }, 1711,  "udp"  },
-  { "registrar",       { NULL }, 1712,  "tcp"  },
-  { "registrar",       { NULL }, 1712,  "udp"  },
-  { "conferencetalk",  { NULL }, 1713,  "tcp"  },
-  { "conferencetalk",  { NULL }, 1713,  "udp"  },
-  { "sesi-lm",         { NULL }, 1714,  "tcp"  },
-  { "sesi-lm",         { NULL }, 1714,  "udp"  },
-  { "houdini-lm",      { NULL }, 1715,  "tcp"  },
-  { "houdini-lm",      { NULL }, 1715,  "udp"  },
-  { "xmsg",            { NULL }, 1716,  "tcp"  },
-  { "xmsg",            { NULL }, 1716,  "udp"  },
-  { "fj-hdnet",        { NULL }, 1717,  "tcp"  },
-  { "fj-hdnet",        { NULL }, 1717,  "udp"  },
-  { "h323gatedisc",    { NULL }, 1718,  "tcp"  },
-  { "h323gatedisc",    { NULL }, 1718,  "udp"  },
-  { "h323gatestat",    { NULL }, 1719,  "tcp"  },
-  { "h323gatestat",    { NULL }, 1719,  "udp"  },
-  { "h323hostcall",    { NULL }, 1720,  "tcp"  },
-  { "h323hostcall",    { NULL }, 1720,  "udp"  },
-  { "caicci",          { NULL }, 1721,  "tcp"  },
-  { "caicci",          { NULL }, 1721,  "udp"  },
-  { "hks-lm",          { NULL }, 1722,  "tcp"  },
-  { "hks-lm",          { NULL }, 1722,  "udp"  },
-  { "pptp",            { NULL }, 1723,  "tcp"  },
-  { "pptp",            { NULL }, 1723,  "udp"  },
-  { "csbphonemaster",  { NULL }, 1724,  "tcp"  },
-  { "csbphonemaster",  { NULL }, 1724,  "udp"  },
-  { "iden-ralp",       { NULL }, 1725,  "tcp"  },
-  { "iden-ralp",       { NULL }, 1725,  "udp"  },
-  { "iberiagames",     { NULL }, 1726,  "tcp"  },
-  { "iberiagames",     { NULL }, 1726,  "udp"  },
-  { "winddx",          { NULL }, 1727,  "tcp"  },
-  { "winddx",          { NULL }, 1727,  "udp"  },
-  { "telindus",        { NULL }, 1728,  "tcp"  },
-  { "telindus",        { NULL }, 1728,  "udp"  },
-  { "citynl",          { NULL }, 1729,  "tcp"  },
-  { "citynl",          { NULL }, 1729,  "udp"  },
-  { "roketz",          { NULL }, 1730,  "tcp"  },
-  { "roketz",          { NULL }, 1730,  "udp"  },
-  { "msiccp",          { NULL }, 1731,  "tcp"  },
-  { "msiccp",          { NULL }, 1731,  "udp"  },
-  { "proxim",          { NULL }, 1732,  "tcp"  },
-  { "proxim",          { NULL }, 1732,  "udp"  },
-  { "siipat",          { NULL }, 1733,  "tcp"  },
-  { "siipat",          { NULL }, 1733,  "udp"  },
-  { "cambertx-lm",     { NULL }, 1734,  "tcp"  },
-  { "cambertx-lm",     { NULL }, 1734,  "udp"  },
-  { "privatechat",     { NULL }, 1735,  "tcp"  },
-  { "privatechat",     { NULL }, 1735,  "udp"  },
-  { "street-stream",   { NULL }, 1736,  "tcp"  },
-  { "street-stream",   { NULL }, 1736,  "udp"  },
-  { "ultimad",         { NULL }, 1737,  "tcp"  },
-  { "ultimad",         { NULL }, 1737,  "udp"  },
-  { "gamegen1",        { NULL }, 1738,  "tcp"  },
-  { "gamegen1",        { NULL }, 1738,  "udp"  },
-  { "webaccess",       { NULL }, 1739,  "tcp"  },
-  { "webaccess",       { NULL }, 1739,  "udp"  },
-  { "encore",          { NULL }, 1740,  "tcp"  },
-  { "encore",          { NULL }, 1740,  "udp"  },
-  { "cisco-net-mgmt",  { NULL }, 1741,  "tcp"  },
-  { "cisco-net-mgmt",  { NULL }, 1741,  "udp"  },
-  { "3Com-nsd",        { NULL }, 1742,  "tcp"  },
-  { "3Com-nsd",        { NULL }, 1742,  "udp"  },
-  { "cinegrfx-lm",     { NULL }, 1743,  "tcp"  },
-  { "cinegrfx-lm",     { NULL }, 1743,  "udp"  },
-  { "ncpm-ft",         { NULL }, 1744,  "tcp"  },
-  { "ncpm-ft",         { NULL }, 1744,  "udp"  },
-  { "remote-winsock",  { NULL }, 1745,  "tcp"  },
-  { "remote-winsock",  { NULL }, 1745,  "udp"  },
-  { "ftrapid-1",       { NULL }, 1746,  "tcp"  },
-  { "ftrapid-1",       { NULL }, 1746,  "udp"  },
-  { "ftrapid-2",       { NULL }, 1747,  "tcp"  },
-  { "ftrapid-2",       { NULL }, 1747,  "udp"  },
-  { "oracle-em1",      { NULL }, 1748,  "tcp"  },
-  { "oracle-em1",      { NULL }, 1748,  "udp"  },
-  { "aspen-services",  { NULL }, 1749,  "tcp"  },
-  { "aspen-services",  { NULL }, 1749,  "udp"  },
-  { "sslp",            { NULL }, 1750,  "tcp"  },
-  { "sslp",            { NULL }, 1750,  "udp"  },
-  { "swiftnet",        { NULL }, 1751,  "tcp"  },
-  { "swiftnet",        { NULL }, 1751,  "udp"  },
-  { "lofr-lm",         { NULL }, 1752,  "tcp"  },
-  { "lofr-lm",         { NULL }, 1752,  "udp"  },
-  { "oracle-em2",      { NULL }, 1754,  "tcp"  },
-  { "oracle-em2",      { NULL }, 1754,  "udp"  },
-  { "ms-streaming",    { NULL }, 1755,  "tcp"  },
-  { "ms-streaming",    { NULL }, 1755,  "udp"  },
-  { "capfast-lmd",     { NULL }, 1756,  "tcp"  },
-  { "capfast-lmd",     { NULL }, 1756,  "udp"  },
-  { "cnhrp",           { NULL }, 1757,  "tcp"  },
-  { "cnhrp",           { NULL }, 1757,  "udp"  },
-  { "tftp-mcast",      { NULL }, 1758,  "tcp"  },
-  { "tftp-mcast",      { NULL }, 1758,  "udp"  },
-  { "spss-lm",         { NULL }, 1759,  "tcp"  },
-  { "spss-lm",         { NULL }, 1759,  "udp"  },
-  { "www-ldap-gw",     { NULL }, 1760,  "tcp"  },
-  { "www-ldap-gw",     { NULL }, 1760,  "udp"  },
-  { "cft-0",           { NULL }, 1761,  "tcp"  },
-  { "cft-0",           { NULL }, 1761,  "udp"  },
-  { "cft-1",           { NULL }, 1762,  "tcp"  },
-  { "cft-1",           { NULL }, 1762,  "udp"  },
-  { "cft-2",           { NULL }, 1763,  "tcp"  },
-  { "cft-2",           { NULL }, 1763,  "udp"  },
-  { "cft-3",           { NULL }, 1764,  "tcp"  },
-  { "cft-3",           { NULL }, 1764,  "udp"  },
-  { "cft-4",           { NULL }, 1765,  "tcp"  },
-  { "cft-4",           { NULL }, 1765,  "udp"  },
-  { "cft-5",           { NULL }, 1766,  "tcp"  },
-  { "cft-5",           { NULL }, 1766,  "udp"  },
-  { "cft-6",           { NULL }, 1767,  "tcp"  },
-  { "cft-6",           { NULL }, 1767,  "udp"  },
-  { "cft-7",           { NULL }, 1768,  "tcp"  },
-  { "cft-7",           { NULL }, 1768,  "udp"  },
-  { "bmc-net-adm",     { NULL }, 1769,  "tcp"  },
-  { "bmc-net-adm",     { NULL }, 1769,  "udp"  },
-  { "bmc-net-svc",     { NULL }, 1770,  "tcp"  },
-  { "bmc-net-svc",     { NULL }, 1770,  "udp"  },
-  { "vaultbase",       { NULL }, 1771,  "tcp"  },
-  { "vaultbase",       { NULL }, 1771,  "udp"  },
-  { "essweb-gw",       { NULL }, 1772,  "tcp"  },
-  { "essweb-gw",       { NULL }, 1772,  "udp"  },
-  { "kmscontrol",      { NULL }, 1773,  "tcp"  },
-  { "kmscontrol",      { NULL }, 1773,  "udp"  },
-  { "global-dtserv",   { NULL }, 1774,  "tcp"  },
-  { "global-dtserv",   { NULL }, 1774,  "udp"  },
-  { "femis",           { NULL }, 1776,  "tcp"  },
-  { "femis",           { NULL }, 1776,  "udp"  },
-  { "powerguardian",   { NULL }, 1777,  "tcp"  },
-  { "powerguardian",   { NULL }, 1777,  "udp"  },
-  { "prodigy-intrnet", { NULL }, 1778,  "tcp"  },
-  { "prodigy-intrnet", { NULL }, 1778,  "udp"  },
-  { "pharmasoft",      { NULL }, 1779,  "tcp"  },
-  { "pharmasoft",      { NULL }, 1779,  "udp"  },
-  { "dpkeyserv",       { NULL }, 1780,  "tcp"  },
-  { "dpkeyserv",       { NULL }, 1780,  "udp"  },
-  { "answersoft-lm",   { NULL }, 1781,  "tcp"  },
-  { "answersoft-lm",   { NULL }, 1781,  "udp"  },
-  { "hp-hcip",         { NULL }, 1782,  "tcp"  },
-  { "hp-hcip",         { NULL }, 1782,  "udp"  },
-  { "finle-lm",        { NULL }, 1784,  "tcp"  },
-  { "finle-lm",        { NULL }, 1784,  "udp"  },
-  { "windlm",          { NULL }, 1785,  "tcp"  },
-  { "windlm",          { NULL }, 1785,  "udp"  },
-  { "funk-logger",     { NULL }, 1786,  "tcp"  },
-  { "funk-logger",     { NULL }, 1786,  "udp"  },
-  { "funk-license",    { NULL }, 1787,  "tcp"  },
-  { "funk-license",    { NULL }, 1787,  "udp"  },
-  { "psmond",          { NULL }, 1788,  "tcp"  },
-  { "psmond",          { NULL }, 1788,  "udp"  },
-  { "hello",           { NULL }, 1789,  "tcp"  },
-  { "hello",           { NULL }, 1789,  "udp"  },
-  { "nmsp",            { NULL }, 1790,  "tcp"  },
-  { "nmsp",            { NULL }, 1790,  "udp"  },
-  { "ea1",             { NULL }, 1791,  "tcp"  },
-  { "ea1",             { NULL }, 1791,  "udp"  },
-  { "ibm-dt-2",        { NULL }, 1792,  "tcp"  },
-  { "ibm-dt-2",        { NULL }, 1792,  "udp"  },
-  { "rsc-robot",       { NULL }, 1793,  "tcp"  },
-  { "rsc-robot",       { NULL }, 1793,  "udp"  },
-  { "cera-bcm",        { NULL }, 1794,  "tcp"  },
-  { "cera-bcm",        { NULL }, 1794,  "udp"  },
-  { "dpi-proxy",       { NULL }, 1795,  "tcp"  },
-  { "dpi-proxy",       { NULL }, 1795,  "udp"  },
-  { "vocaltec-admin",  { NULL }, 1796,  "tcp"  },
-  { "vocaltec-admin",  { NULL }, 1796,  "udp"  },
-  { "uma",             { NULL }, 1797,  "tcp"  },
-  { "uma",             { NULL }, 1797,  "udp"  },
-  { "etp",             { NULL }, 1798,  "tcp"  },
-  { "etp",             { NULL }, 1798,  "udp"  },
-  { "netrisk",         { NULL }, 1799,  "tcp"  },
-  { "netrisk",         { NULL }, 1799,  "udp"  },
-  { "ansys-lm",        { NULL }, 1800,  "tcp"  },
-  { "ansys-lm",        { NULL }, 1800,  "udp"  },
-  { "msmq",            { NULL }, 1801,  "tcp"  },
-  { "msmq",            { NULL }, 1801,  "udp"  },
-  { "concomp1",        { NULL }, 1802,  "tcp"  },
-  { "concomp1",        { NULL }, 1802,  "udp"  },
-  { "hp-hcip-gwy",     { NULL }, 1803,  "tcp"  },
-  { "hp-hcip-gwy",     { NULL }, 1803,  "udp"  },
-  { "enl",             { NULL }, 1804,  "tcp"  },
-  { "enl",             { NULL }, 1804,  "udp"  },
-  { "enl-name",        { NULL }, 1805,  "tcp"  },
-  { "enl-name",        { NULL }, 1805,  "udp"  },
-  { "musiconline",     { NULL }, 1806,  "tcp"  },
-  { "musiconline",     { NULL }, 1806,  "udp"  },
-  { "fhsp",            { NULL }, 1807,  "tcp"  },
-  { "fhsp",            { NULL }, 1807,  "udp"  },
-  { "oracle-vp2",      { NULL }, 1808,  "tcp"  },
-  { "oracle-vp2",      { NULL }, 1808,  "udp"  },
-  { "oracle-vp1",      { NULL }, 1809,  "tcp"  },
-  { "oracle-vp1",      { NULL }, 1809,  "udp"  },
-  { "jerand-lm",       { NULL }, 1810,  "tcp"  },
-  { "jerand-lm",       { NULL }, 1810,  "udp"  },
-  { "scientia-sdb",    { NULL }, 1811,  "tcp"  },
-  { "scientia-sdb",    { NULL }, 1811,  "udp"  },
-  { "radius",          { NULL }, 1812,  "tcp"  },
-  { "radius",          { NULL }, 1812,  "udp"  },
-  { "radius-acct",     { NULL }, 1813,  "tcp"  },
-  { "radius-acct",     { NULL }, 1813,  "udp"  },
-  { "tdp-suite",       { NULL }, 1814,  "tcp"  },
-  { "tdp-suite",       { NULL }, 1814,  "udp"  },
-  { "mmpft",           { NULL }, 1815,  "tcp"  },
-  { "mmpft",           { NULL }, 1815,  "udp"  },
-  { "harp",            { NULL }, 1816,  "tcp"  },
-  { "harp",            { NULL }, 1816,  "udp"  },
-  { "rkb-oscs",        { NULL }, 1817,  "tcp"  },
-  { "rkb-oscs",        { NULL }, 1817,  "udp"  },
-  { "etftp",           { NULL }, 1818,  "tcp"  },
-  { "etftp",           { NULL }, 1818,  "udp"  },
-  { "plato-lm",        { NULL }, 1819,  "tcp"  },
-  { "plato-lm",        { NULL }, 1819,  "udp"  },
-  { "mcagent",         { NULL }, 1820,  "tcp"  },
-  { "mcagent",         { NULL }, 1820,  "udp"  },
-  { "donnyworld",      { NULL }, 1821,  "tcp"  },
-  { "donnyworld",      { NULL }, 1821,  "udp"  },
-  { "es-elmd",         { NULL }, 1822,  "tcp"  },
-  { "es-elmd",         { NULL }, 1822,  "udp"  },
-  { "unisys-lm",       { NULL }, 1823,  "tcp"  },
-  { "unisys-lm",       { NULL }, 1823,  "udp"  },
-  { "metrics-pas",     { NULL }, 1824,  "tcp"  },
-  { "metrics-pas",     { NULL }, 1824,  "udp"  },
-  { "direcpc-video",   { NULL }, 1825,  "tcp"  },
-  { "direcpc-video",   { NULL }, 1825,  "udp"  },
-  { "ardt",            { NULL }, 1826,  "tcp"  },
-  { "ardt",            { NULL }, 1826,  "udp"  },
-  { "asi",             { NULL }, 1827,  "tcp"  },
-  { "asi",             { NULL }, 1827,  "udp"  },
-  { "itm-mcell-u",     { NULL }, 1828,  "tcp"  },
-  { "itm-mcell-u",     { NULL }, 1828,  "udp"  },
-  { "optika-emedia",   { NULL }, 1829,  "tcp"  },
-  { "optika-emedia",   { NULL }, 1829,  "udp"  },
-  { "net8-cman",       { NULL }, 1830,  "tcp"  },
-  { "net8-cman",       { NULL }, 1830,  "udp"  },
-  { "myrtle",          { NULL }, 1831,  "tcp"  },
-  { "myrtle",          { NULL }, 1831,  "udp"  },
-  { "tht-treasure",    { NULL }, 1832,  "tcp"  },
-  { "tht-treasure",    { NULL }, 1832,  "udp"  },
-  { "udpradio",        { NULL }, 1833,  "tcp"  },
-  { "udpradio",        { NULL }, 1833,  "udp"  },
-  { "ardusuni",        { NULL }, 1834,  "tcp"  },
-  { "ardusuni",        { NULL }, 1834,  "udp"  },
-  { "ardusmul",        { NULL }, 1835,  "tcp"  },
-  { "ardusmul",        { NULL }, 1835,  "udp"  },
-  { "ste-smsc",        { NULL }, 1836,  "tcp"  },
-  { "ste-smsc",        { NULL }, 1836,  "udp"  },
-  { "csoft1",          { NULL }, 1837,  "tcp"  },
-  { "csoft1",          { NULL }, 1837,  "udp"  },
-  { "talnet",          { NULL }, 1838,  "tcp"  },
-  { "talnet",          { NULL }, 1838,  "udp"  },
-  { "netopia-vo1",     { NULL }, 1839,  "tcp"  },
-  { "netopia-vo1",     { NULL }, 1839,  "udp"  },
-  { "netopia-vo2",     { NULL }, 1840,  "tcp"  },
-  { "netopia-vo2",     { NULL }, 1840,  "udp"  },
-  { "netopia-vo3",     { NULL }, 1841,  "tcp"  },
-  { "netopia-vo3",     { NULL }, 1841,  "udp"  },
-  { "netopia-vo4",     { NULL }, 1842,  "tcp"  },
-  { "netopia-vo4",     { NULL }, 1842,  "udp"  },
-  { "netopia-vo5",     { NULL }, 1843,  "tcp"  },
-  { "netopia-vo5",     { NULL }, 1843,  "udp"  },
-  { "direcpc-dll",     { NULL }, 1844,  "tcp"  },
-  { "direcpc-dll",     { NULL }, 1844,  "udp"  },
-  { "altalink",        { NULL }, 1845,  "tcp"  },
-  { "altalink",        { NULL }, 1845,  "udp"  },
-  { "tunstall-pnc",    { NULL }, 1846,  "tcp"  },
-  { "tunstall-pnc",    { NULL }, 1846,  "udp"  },
-  { "slp-notify",      { NULL }, 1847,  "tcp"  },
-  { "slp-notify",      { NULL }, 1847,  "udp"  },
-  { "fjdocdist",       { NULL }, 1848,  "tcp"  },
-  { "fjdocdist",       { NULL }, 1848,  "udp"  },
-  { "alpha-sms",       { NULL }, 1849,  "tcp"  },
-  { "alpha-sms",       { NULL }, 1849,  "udp"  },
-  { "gsi",             { NULL }, 1850,  "tcp"  },
-  { "gsi",             { NULL }, 1850,  "udp"  },
-  { "ctcd",            { NULL }, 1851,  "tcp"  },
-  { "ctcd",            { NULL }, 1851,  "udp"  },
-  { "virtual-time",    { NULL }, 1852,  "tcp"  },
-  { "virtual-time",    { NULL }, 1852,  "udp"  },
-  { "vids-avtp",       { NULL }, 1853,  "tcp"  },
-  { "vids-avtp",       { NULL }, 1853,  "udp"  },
-  { "buddy-draw",      { NULL }, 1854,  "tcp"  },
-  { "buddy-draw",      { NULL }, 1854,  "udp"  },
-  { "fiorano-rtrsvc",  { NULL }, 1855,  "tcp"  },
-  { "fiorano-rtrsvc",  { NULL }, 1855,  "udp"  },
-  { "fiorano-msgsvc",  { NULL }, 1856,  "tcp"  },
-  { "fiorano-msgsvc",  { NULL }, 1856,  "udp"  },
-  { "datacaptor",      { NULL }, 1857,  "tcp"  },
-  { "datacaptor",      { NULL }, 1857,  "udp"  },
-  { "privateark",      { NULL }, 1858,  "tcp"  },
-  { "privateark",      { NULL }, 1858,  "udp"  },
-  { "gammafetchsvr",   { NULL }, 1859,  "tcp"  },
-  { "gammafetchsvr",   { NULL }, 1859,  "udp"  },
-  { "sunscalar-svc",   { NULL }, 1860,  "tcp"  },
-  { "sunscalar-svc",   { NULL }, 1860,  "udp"  },
-  { "lecroy-vicp",     { NULL }, 1861,  "tcp"  },
-  { "lecroy-vicp",     { NULL }, 1861,  "udp"  },
-  { "mysql-cm-agent",  { NULL }, 1862,  "tcp"  },
-  { "mysql-cm-agent",  { NULL }, 1862,  "udp"  },
-  { "msnp",            { NULL }, 1863,  "tcp"  },
-  { "msnp",            { NULL }, 1863,  "udp"  },
-  { "paradym-31port",  { NULL }, 1864,  "tcp"  },
-  { "paradym-31port",  { NULL }, 1864,  "udp"  },
-  { "entp",            { NULL }, 1865,  "tcp"  },
-  { "entp",            { NULL }, 1865,  "udp"  },
-  { "swrmi",           { NULL }, 1866,  "tcp"  },
-  { "swrmi",           { NULL }, 1866,  "udp"  },
-  { "udrive",          { NULL }, 1867,  "tcp"  },
-  { "udrive",          { NULL }, 1867,  "udp"  },
-  { "viziblebrowser",  { NULL }, 1868,  "tcp"  },
-  { "viziblebrowser",  { NULL }, 1868,  "udp"  },
-  { "transact",        { NULL }, 1869,  "tcp"  },
-  { "transact",        { NULL }, 1869,  "udp"  },
-  { "sunscalar-dns",   { NULL }, 1870,  "tcp"  },
-  { "sunscalar-dns",   { NULL }, 1870,  "udp"  },
-  { "canocentral0",    { NULL }, 1871,  "tcp"  },
-  { "canocentral0",    { NULL }, 1871,  "udp"  },
-  { "canocentral1",    { NULL }, 1872,  "tcp"  },
-  { "canocentral1",    { NULL }, 1872,  "udp"  },
-  { "fjmpjps",         { NULL }, 1873,  "tcp"  },
-  { "fjmpjps",         { NULL }, 1873,  "udp"  },
-  { "fjswapsnp",       { NULL }, 1874,  "tcp"  },
-  { "fjswapsnp",       { NULL }, 1874,  "udp"  },
-  { "westell-stats",   { NULL }, 1875,  "tcp"  },
-  { "westell-stats",   { NULL }, 1875,  "udp"  },
-  { "ewcappsrv",       { NULL }, 1876,  "tcp"  },
-  { "ewcappsrv",       { NULL }, 1876,  "udp"  },
-  { "hp-webqosdb",     { NULL }, 1877,  "tcp"  },
-  { "hp-webqosdb",     { NULL }, 1877,  "udp"  },
-  { "drmsmc",          { NULL }, 1878,  "tcp"  },
-  { "drmsmc",          { NULL }, 1878,  "udp"  },
-  { "nettgain-nms",    { NULL }, 1879,  "tcp"  },
-  { "nettgain-nms",    { NULL }, 1879,  "udp"  },
-  { "vsat-control",    { NULL }, 1880,  "tcp"  },
-  { "vsat-control",    { NULL }, 1880,  "udp"  },
-  { "ibm-mqseries2",   { NULL }, 1881,  "tcp"  },
-  { "ibm-mqseries2",   { NULL }, 1881,  "udp"  },
-  { "ecsqdmn",         { NULL }, 1882,  "tcp"  },
-  { "ecsqdmn",         { NULL }, 1882,  "udp"  },
-  { "ibm-mqisdp",      { NULL }, 1883,  "tcp"  },
-  { "ibm-mqisdp",      { NULL }, 1883,  "udp"  },
-  { "idmaps",          { NULL }, 1884,  "tcp"  },
-  { "idmaps",          { NULL }, 1884,  "udp"  },
-  { "vrtstrapserver",  { NULL }, 1885,  "tcp"  },
-  { "vrtstrapserver",  { NULL }, 1885,  "udp"  },
-  { "leoip",           { NULL }, 1886,  "tcp"  },
-  { "leoip",           { NULL }, 1886,  "udp"  },
-  { "filex-lport",     { NULL }, 1887,  "tcp"  },
-  { "filex-lport",     { NULL }, 1887,  "udp"  },
-  { "ncconfig",        { NULL }, 1888,  "tcp"  },
-  { "ncconfig",        { NULL }, 1888,  "udp"  },
-  { "unify-adapter",   { NULL }, 1889,  "tcp"  },
-  { "unify-adapter",   { NULL }, 1889,  "udp"  },
-  { "wilkenlistener",  { NULL }, 1890,  "tcp"  },
-  { "wilkenlistener",  { NULL }, 1890,  "udp"  },
-  { "childkey-notif",  { NULL }, 1891,  "tcp"  },
-  { "childkey-notif",  { NULL }, 1891,  "udp"  },
-  { "childkey-ctrl",   { NULL }, 1892,  "tcp"  },
-  { "childkey-ctrl",   { NULL }, 1892,  "udp"  },
-  { "elad",            { NULL }, 1893,  "tcp"  },
-  { "elad",            { NULL }, 1893,  "udp"  },
-  { "o2server-port",   { NULL }, 1894,  "tcp"  },
-  { "o2server-port",   { NULL }, 1894,  "udp"  },
-  { "b-novative-ls",   { NULL }, 1896,  "tcp"  },
-  { "b-novative-ls",   { NULL }, 1896,  "udp"  },
-  { "metaagent",       { NULL }, 1897,  "tcp"  },
-  { "metaagent",       { NULL }, 1897,  "udp"  },
-  { "cymtec-port",     { NULL }, 1898,  "tcp"  },
-  { "cymtec-port",     { NULL }, 1898,  "udp"  },
-  { "mc2studios",      { NULL }, 1899,  "tcp"  },
-  { "mc2studios",      { NULL }, 1899,  "udp"  },
-  { "ssdp",            { NULL }, 1900,  "tcp"  },
-  { "ssdp",            { NULL }, 1900,  "udp"  },
-  { "fjicl-tep-a",     { NULL }, 1901,  "tcp"  },
-  { "fjicl-tep-a",     { NULL }, 1901,  "udp"  },
-  { "fjicl-tep-b",     { NULL }, 1902,  "tcp"  },
-  { "fjicl-tep-b",     { NULL }, 1902,  "udp"  },
-  { "linkname",        { NULL }, 1903,  "tcp"  },
-  { "linkname",        { NULL }, 1903,  "udp"  },
-  { "fjicl-tep-c",     { NULL }, 1904,  "tcp"  },
-  { "fjicl-tep-c",     { NULL }, 1904,  "udp"  },
-  { "sugp",            { NULL }, 1905,  "tcp"  },
-  { "sugp",            { NULL }, 1905,  "udp"  },
-  { "tpmd",            { NULL }, 1906,  "tcp"  },
-  { "tpmd",            { NULL }, 1906,  "udp"  },
-  { "intrastar",       { NULL }, 1907,  "tcp"  },
-  { "intrastar",       { NULL }, 1907,  "udp"  },
-  { "dawn",            { NULL }, 1908,  "tcp"  },
-  { "dawn",            { NULL }, 1908,  "udp"  },
-  { "global-wlink",    { NULL }, 1909,  "tcp"  },
-  { "global-wlink",    { NULL }, 1909,  "udp"  },
-  { "ultrabac",        { NULL }, 1910,  "tcp"  },
-  { "ultrabac",        { NULL }, 1910,  "udp"  },
-  { "mtp",             { NULL }, 1911,  "tcp"  },
-  { "mtp",             { NULL }, 1911,  "udp"  },
-  { "rhp-iibp",        { NULL }, 1912,  "tcp"  },
-  { "rhp-iibp",        { NULL }, 1912,  "udp"  },
-  { "armadp",          { NULL }, 1913,  "tcp"  },
-  { "armadp",          { NULL }, 1913,  "udp"  },
-  { "elm-momentum",    { NULL }, 1914,  "tcp"  },
-  { "elm-momentum",    { NULL }, 1914,  "udp"  },
-  { "facelink",        { NULL }, 1915,  "tcp"  },
-  { "facelink",        { NULL }, 1915,  "udp"  },
-  { "persona",         { NULL }, 1916,  "tcp"  },
-  { "persona",         { NULL }, 1916,  "udp"  },
-  { "noagent",         { NULL }, 1917,  "tcp"  },
-  { "noagent",         { NULL }, 1917,  "udp"  },
-  { "can-nds",         { NULL }, 1918,  "tcp"  },
-  { "can-nds",         { NULL }, 1918,  "udp"  },
-  { "can-dch",         { NULL }, 1919,  "tcp"  },
-  { "can-dch",         { NULL }, 1919,  "udp"  },
-  { "can-ferret",      { NULL }, 1920,  "tcp"  },
-  { "can-ferret",      { NULL }, 1920,  "udp"  },
-  { "noadmin",         { NULL }, 1921,  "tcp"  },
-  { "noadmin",         { NULL }, 1921,  "udp"  },
-  { "tapestry",        { NULL }, 1922,  "tcp"  },
-  { "tapestry",        { NULL }, 1922,  "udp"  },
-  { "spice",           { NULL }, 1923,  "tcp"  },
-  { "spice",           { NULL }, 1923,  "udp"  },
-  { "xiip",            { NULL }, 1924,  "tcp"  },
-  { "xiip",            { NULL }, 1924,  "udp"  },
-  { "discovery-port",  { NULL }, 1925,  "tcp"  },
-  { "discovery-port",  { NULL }, 1925,  "udp"  },
-  { "egs",             { NULL }, 1926,  "tcp"  },
-  { "egs",             { NULL }, 1926,  "udp"  },
-  { "videte-cipc",     { NULL }, 1927,  "tcp"  },
-  { "videte-cipc",     { NULL }, 1927,  "udp"  },
-  { "emsd-port",       { NULL }, 1928,  "tcp"  },
-  { "emsd-port",       { NULL }, 1928,  "udp"  },
-  { "bandwiz-system",  { NULL }, 1929,  "tcp"  },
-  { "bandwiz-system",  { NULL }, 1929,  "udp"  },
-  { "driveappserver",  { NULL }, 1930,  "tcp"  },
-  { "driveappserver",  { NULL }, 1930,  "udp"  },
-  { "amdsched",        { NULL }, 1931,  "tcp"  },
-  { "amdsched",        { NULL }, 1931,  "udp"  },
-  { "ctt-broker",      { NULL }, 1932,  "tcp"  },
-  { "ctt-broker",      { NULL }, 1932,  "udp"  },
-  { "xmapi",           { NULL }, 1933,  "tcp"  },
-  { "xmapi",           { NULL }, 1933,  "udp"  },
-  { "xaapi",           { NULL }, 1934,  "tcp"  },
-  { "xaapi",           { NULL }, 1934,  "udp"  },
-  { "macromedia-fcs",  { NULL }, 1935,  "tcp"  },
-  { "macromedia-fcs",  { NULL }, 1935,  "udp"  },
-  { "jetcmeserver",    { NULL }, 1936,  "tcp"  },
-  { "jetcmeserver",    { NULL }, 1936,  "udp"  },
-  { "jwserver",        { NULL }, 1937,  "tcp"  },
-  { "jwserver",        { NULL }, 1937,  "udp"  },
-  { "jwclient",        { NULL }, 1938,  "tcp"  },
-  { "jwclient",        { NULL }, 1938,  "udp"  },
-  { "jvserver",        { NULL }, 1939,  "tcp"  },
-  { "jvserver",        { NULL }, 1939,  "udp"  },
-  { "jvclient",        { NULL }, 1940,  "tcp"  },
-  { "jvclient",        { NULL }, 1940,  "udp"  },
-  { "dic-aida",        { NULL }, 1941,  "tcp"  },
-  { "dic-aida",        { NULL }, 1941,  "udp"  },
-  { "res",             { NULL }, 1942,  "tcp"  },
-  { "res",             { NULL }, 1942,  "udp"  },
-  { "beeyond-media",   { NULL }, 1943,  "tcp"  },
-  { "beeyond-media",   { NULL }, 1943,  "udp"  },
-  { "close-combat",    { NULL }, 1944,  "tcp"  },
-  { "close-combat",    { NULL }, 1944,  "udp"  },
-  { "dialogic-elmd",   { NULL }, 1945,  "tcp"  },
-  { "dialogic-elmd",   { NULL }, 1945,  "udp"  },
-  { "tekpls",          { NULL }, 1946,  "tcp"  },
-  { "tekpls",          { NULL }, 1946,  "udp"  },
-  { "sentinelsrm",     { NULL }, 1947,  "tcp"  },
-  { "sentinelsrm",     { NULL }, 1947,  "udp"  },
-  { "eye2eye",         { NULL }, 1948,  "tcp"  },
-  { "eye2eye",         { NULL }, 1948,  "udp"  },
-  { "ismaeasdaqlive",  { NULL }, 1949,  "tcp"  },
-  { "ismaeasdaqlive",  { NULL }, 1949,  "udp"  },
-  { "ismaeasdaqtest",  { NULL }, 1950,  "tcp"  },
-  { "ismaeasdaqtest",  { NULL }, 1950,  "udp"  },
-  { "bcs-lmserver",    { NULL }, 1951,  "tcp"  },
-  { "bcs-lmserver",    { NULL }, 1951,  "udp"  },
-  { "mpnjsc",          { NULL }, 1952,  "tcp"  },
-  { "mpnjsc",          { NULL }, 1952,  "udp"  },
-  { "rapidbase",       { NULL }, 1953,  "tcp"  },
-  { "rapidbase",       { NULL }, 1953,  "udp"  },
-  { "abr-api",         { NULL }, 1954,  "tcp"  },
-  { "abr-api",         { NULL }, 1954,  "udp"  },
-  { "abr-secure",      { NULL }, 1955,  "tcp"  },
-  { "abr-secure",      { NULL }, 1955,  "udp"  },
-  { "vrtl-vmf-ds",     { NULL }, 1956,  "tcp"  },
-  { "vrtl-vmf-ds",     { NULL }, 1956,  "udp"  },
-  { "unix-status",     { NULL }, 1957,  "tcp"  },
-  { "unix-status",     { NULL }, 1957,  "udp"  },
-  { "dxadmind",        { NULL }, 1958,  "tcp"  },
-  { "dxadmind",        { NULL }, 1958,  "udp"  },
-  { "simp-all",        { NULL }, 1959,  "tcp"  },
-  { "simp-all",        { NULL }, 1959,  "udp"  },
-  { "nasmanager",      { NULL }, 1960,  "tcp"  },
-  { "nasmanager",      { NULL }, 1960,  "udp"  },
-  { "bts-appserver",   { NULL }, 1961,  "tcp"  },
-  { "bts-appserver",   { NULL }, 1961,  "udp"  },
-  { "biap-mp",         { NULL }, 1962,  "tcp"  },
-  { "biap-mp",         { NULL }, 1962,  "udp"  },
-  { "webmachine",      { NULL }, 1963,  "tcp"  },
-  { "webmachine",      { NULL }, 1963,  "udp"  },
-  { "solid-e-engine",  { NULL }, 1964,  "tcp"  },
-  { "solid-e-engine",  { NULL }, 1964,  "udp"  },
-  { "tivoli-npm",      { NULL }, 1965,  "tcp"  },
-  { "tivoli-npm",      { NULL }, 1965,  "udp"  },
-  { "slush",           { NULL }, 1966,  "tcp"  },
-  { "slush",           { NULL }, 1966,  "udp"  },
-  { "sns-quote",       { NULL }, 1967,  "tcp"  },
-  { "sns-quote",       { NULL }, 1967,  "udp"  },
-  { "lipsinc",         { NULL }, 1968,  "tcp"  },
-  { "lipsinc",         { NULL }, 1968,  "udp"  },
-  { "lipsinc1",        { NULL }, 1969,  "tcp"  },
-  { "lipsinc1",        { NULL }, 1969,  "udp"  },
-  { "netop-rc",        { NULL }, 1970,  "tcp"  },
-  { "netop-rc",        { NULL }, 1970,  "udp"  },
-  { "netop-school",    { NULL }, 1971,  "tcp"  },
-  { "netop-school",    { NULL }, 1971,  "udp"  },
-  { "intersys-cache",  { NULL }, 1972,  "tcp"  },
-  { "intersys-cache",  { NULL }, 1972,  "udp"  },
-  { "dlsrap",          { NULL }, 1973,  "tcp"  },
-  { "dlsrap",          { NULL }, 1973,  "udp"  },
-  { "drp",             { NULL }, 1974,  "tcp"  },
-  { "drp",             { NULL }, 1974,  "udp"  },
-  { "tcoflashagent",   { NULL }, 1975,  "tcp"  },
-  { "tcoflashagent",   { NULL }, 1975,  "udp"  },
-  { "tcoregagent",     { NULL }, 1976,  "tcp"  },
-  { "tcoregagent",     { NULL }, 1976,  "udp"  },
-  { "tcoaddressbook",  { NULL }, 1977,  "tcp"  },
-  { "tcoaddressbook",  { NULL }, 1977,  "udp"  },
-  { "unisql",          { NULL }, 1978,  "tcp"  },
-  { "unisql",          { NULL }, 1978,  "udp"  },
-  { "unisql-java",     { NULL }, 1979,  "tcp"  },
-  { "unisql-java",     { NULL }, 1979,  "udp"  },
-  { "pearldoc-xact",   { NULL }, 1980,  "tcp"  },
-  { "pearldoc-xact",   { NULL }, 1980,  "udp"  },
-  { "p2pq",            { NULL }, 1981,  "tcp"  },
-  { "p2pq",            { NULL }, 1981,  "udp"  },
-  { "estamp",          { NULL }, 1982,  "tcp"  },
-  { "estamp",          { NULL }, 1982,  "udp"  },
-  { "lhtp",            { NULL }, 1983,  "tcp"  },
-  { "lhtp",            { NULL }, 1983,  "udp"  },
-  { "bb",              { NULL }, 1984,  "tcp"  },
-  { "bb",              { NULL }, 1984,  "udp"  },
-  { "hsrp",            { NULL }, 1985,  "tcp"  },
-  { "hsrp",            { NULL }, 1985,  "udp"  },
-  { "licensedaemon",   { NULL }, 1986,  "tcp"  },
-  { "licensedaemon",   { NULL }, 1986,  "udp"  },
-  { "tr-rsrb-p1",      { NULL }, 1987,  "tcp"  },
-  { "tr-rsrb-p1",      { NULL }, 1987,  "udp"  },
-  { "tr-rsrb-p2",      { NULL }, 1988,  "tcp"  },
-  { "tr-rsrb-p2",      { NULL }, 1988,  "udp"  },
-  { "tr-rsrb-p3",      { NULL }, 1989,  "tcp"  },
-  { "tr-rsrb-p3",      { NULL }, 1989,  "udp"  },
-  { "mshnet",          { NULL }, 1989,  "tcp"  },
-  { "mshnet",          { NULL }, 1989,  "udp"  },
-  { "stun-p1",         { NULL }, 1990,  "tcp"  },
-  { "stun-p1",         { NULL }, 1990,  "udp"  },
-  { "stun-p2",         { NULL }, 1991,  "tcp"  },
-  { "stun-p2",         { NULL }, 1991,  "udp"  },
-  { "stun-p3",         { NULL }, 1992,  "tcp"  },
-  { "stun-p3",         { NULL }, 1992,  "udp"  },
-  { "ipsendmsg",       { NULL }, 1992,  "tcp"  },
-  { "ipsendmsg",       { NULL }, 1992,  "udp"  },
-  { "snmp-tcp-port",   { NULL }, 1993,  "tcp"  },
-  { "snmp-tcp-port",   { NULL }, 1993,  "udp"  },
-  { "stun-port",       { NULL }, 1994,  "tcp"  },
-  { "stun-port",       { NULL }, 1994,  "udp"  },
-  { "perf-port",       { NULL }, 1995,  "tcp"  },
-  { "perf-port",       { NULL }, 1995,  "udp"  },
-  { "tr-rsrb-port",    { NULL }, 1996,  "tcp"  },
-  { "tr-rsrb-port",    { NULL }, 1996,  "udp"  },
-  { "gdp-port",        { NULL }, 1997,  "tcp"  },
-  { "gdp-port",        { NULL }, 1997,  "udp"  },
-  { "x25-svc-port",    { NULL }, 1998,  "tcp"  },
-  { "x25-svc-port",    { NULL }, 1998,  "udp"  },
-  { "tcp-id-port",     { NULL }, 1999,  "tcp"  },
-  { "tcp-id-port",     { NULL }, 1999,  "udp"  },
-  { "cisco-sccp",      { NULL }, 2000,  "tcp"  },
-  { "cisco-sccp",      { NULL }, 2000,  "udp"  },
-  { "dc",              { NULL }, 2001,  "tcp"  },
-  { "wizard",          { NULL }, 2001,  "udp"  },
-  { "globe",           { NULL }, 2002,  "tcp"  },
-  { "globe",           { NULL }, 2002,  "udp"  },
-  { "brutus",          { NULL }, 2003,  "tcp"  },
-  { "brutus",          { NULL }, 2003,  "udp"  },
-  { "mailbox",         { NULL }, 2004,  "tcp"  },
-  { "emce",            { NULL }, 2004,  "udp"  },
-  { "berknet",         { NULL }, 2005,  "tcp"  },
-  { "oracle",          { NULL }, 2005,  "udp"  },
-  { "invokator",       { NULL }, 2006,  "tcp"  },
-  { "raid-cd",         { NULL }, 2006,  "udp"  },
-  { "dectalk",         { NULL }, 2007,  "tcp"  },
-  { "raid-am",         { NULL }, 2007,  "udp"  },
-  { "conf",            { NULL }, 2008,  "tcp"  },
-  { "terminaldb",      { NULL }, 2008,  "udp"  },
-  { "news",            { NULL }, 2009,  "tcp"  },
-  { "whosockami",      { NULL }, 2009,  "udp"  },
-  { "search",          { NULL }, 2010,  "tcp"  },
-  { "pipe_server",     { NULL }, 2010,  "udp"  },
-  { "raid-cc",         { NULL }, 2011,  "tcp"  },
-  { "servserv",        { NULL }, 2011,  "udp"  },
-  { "ttyinfo",         { NULL }, 2012,  "tcp"  },
-  { "raid-ac",         { NULL }, 2012,  "udp"  },
-  { "raid-am",         { NULL }, 2013,  "tcp"  },
-  { "raid-cd",         { NULL }, 2013,  "udp"  },
-  { "troff",           { NULL }, 2014,  "tcp"  },
-  { "raid-sf",         { NULL }, 2014,  "udp"  },
-  { "cypress",         { NULL }, 2015,  "tcp"  },
-  { "raid-cs",         { NULL }, 2015,  "udp"  },
-  { "bootserver",      { NULL }, 2016,  "tcp"  },
-  { "bootserver",      { NULL }, 2016,  "udp"  },
-  { "cypress-stat",    { NULL }, 2017,  "tcp"  },
-  { "bootclient",      { NULL }, 2017,  "udp"  },
-  { "terminaldb",      { NULL }, 2018,  "tcp"  },
-  { "rellpack",        { NULL }, 2018,  "udp"  },
-  { "whosockami",      { NULL }, 2019,  "tcp"  },
-  { "about",           { NULL }, 2019,  "udp"  },
-  { "xinupageserver",  { NULL }, 2020,  "tcp"  },
-  { "xinupageserver",  { NULL }, 2020,  "udp"  },
-  { "servexec",        { NULL }, 2021,  "tcp"  },
-  { "xinuexpansion1",  { NULL }, 2021,  "udp"  },
-  { "down",            { NULL }, 2022,  "tcp"  },
-  { "xinuexpansion2",  { NULL }, 2022,  "udp"  },
-  { "xinuexpansion3",  { NULL }, 2023,  "tcp"  },
-  { "xinuexpansion3",  { NULL }, 2023,  "udp"  },
-  { "xinuexpansion4",  { NULL }, 2024,  "tcp"  },
-  { "xinuexpansion4",  { NULL }, 2024,  "udp"  },
-  { "ellpack",         { NULL }, 2025,  "tcp"  },
-  { "xribs",           { NULL }, 2025,  "udp"  },
-  { "scrabble",        { NULL }, 2026,  "tcp"  },
-  { "scrabble",        { NULL }, 2026,  "udp"  },
-  { "shadowserver",    { NULL }, 2027,  "tcp"  },
-  { "shadowserver",    { NULL }, 2027,  "udp"  },
-  { "submitserver",    { NULL }, 2028,  "tcp"  },
-  { "submitserver",    { NULL }, 2028,  "udp"  },
-  { "hsrpv6",          { NULL }, 2029,  "tcp"  },
-  { "hsrpv6",          { NULL }, 2029,  "udp"  },
-  { "device2",         { NULL }, 2030,  "tcp"  },
-  { "device2",         { NULL }, 2030,  "udp"  },
-  { "mobrien-chat",    { NULL }, 2031,  "tcp"  },
-  { "mobrien-chat",    { NULL }, 2031,  "udp"  },
-  { "blackboard",      { NULL }, 2032,  "tcp"  },
-  { "blackboard",      { NULL }, 2032,  "udp"  },
-  { "glogger",         { NULL }, 2033,  "tcp"  },
-  { "glogger",         { NULL }, 2033,  "udp"  },
-  { "scoremgr",        { NULL }, 2034,  "tcp"  },
-  { "scoremgr",        { NULL }, 2034,  "udp"  },
-  { "imsldoc",         { NULL }, 2035,  "tcp"  },
-  { "imsldoc",         { NULL }, 2035,  "udp"  },
-  { "e-dpnet",         { NULL }, 2036,  "tcp"  },
-  { "e-dpnet",         { NULL }, 2036,  "udp"  },
-  { "applus",          { NULL }, 2037,  "tcp"  },
-  { "applus",          { NULL }, 2037,  "udp"  },
-  { "objectmanager",   { NULL }, 2038,  "tcp"  },
-  { "objectmanager",   { NULL }, 2038,  "udp"  },
-  { "prizma",          { NULL }, 2039,  "tcp"  },
-  { "prizma",          { NULL }, 2039,  "udp"  },
-  { "lam",             { NULL }, 2040,  "tcp"  },
-  { "lam",             { NULL }, 2040,  "udp"  },
-  { "interbase",       { NULL }, 2041,  "tcp"  },
-  { "interbase",       { NULL }, 2041,  "udp"  },
-  { "isis",            { NULL }, 2042,  "tcp"  },
-  { "isis",            { NULL }, 2042,  "udp"  },
-  { "isis-bcast",      { NULL }, 2043,  "tcp"  },
-  { "isis-bcast",      { NULL }, 2043,  "udp"  },
-  { "rimsl",           { NULL }, 2044,  "tcp"  },
-  { "rimsl",           { NULL }, 2044,  "udp"  },
-  { "cdfunc",          { NULL }, 2045,  "tcp"  },
-  { "cdfunc",          { NULL }, 2045,  "udp"  },
-  { "sdfunc",          { NULL }, 2046,  "tcp"  },
-  { "sdfunc",          { NULL }, 2046,  "udp"  },
-  { "dls",             { NULL }, 2047,  "tcp"  },
-  { "dls",             { NULL }, 2047,  "udp"  },
-  { "dls-monitor",     { NULL }, 2048,  "tcp"  },
-  { "dls-monitor",     { NULL }, 2048,  "udp"  },
-  { "shilp",           { NULL }, 2049,  "tcp"  },
-  { "shilp",           { NULL }, 2049,  "udp"  },
-  { "nfs",             { NULL }, 2049,  "tcp"  },
-  { "nfs",             { NULL }, 2049,  "udp"  },
-  { "nfs",             { NULL }, 2049,  "sctp" },
-  { "av-emb-config",   { NULL }, 2050,  "tcp"  },
-  { "av-emb-config",   { NULL }, 2050,  "udp"  },
-  { "epnsdp",          { NULL }, 2051,  "tcp"  },
-  { "epnsdp",          { NULL }, 2051,  "udp"  },
-  { "clearvisn",       { NULL }, 2052,  "tcp"  },
-  { "clearvisn",       { NULL }, 2052,  "udp"  },
-  { "lot105-ds-upd",   { NULL }, 2053,  "tcp"  },
-  { "lot105-ds-upd",   { NULL }, 2053,  "udp"  },
-  { "weblogin",        { NULL }, 2054,  "tcp"  },
-  { "weblogin",        { NULL }, 2054,  "udp"  },
-  { "iop",             { NULL }, 2055,  "tcp"  },
-  { "iop",             { NULL }, 2055,  "udp"  },
-  { "omnisky",         { NULL }, 2056,  "tcp"  },
-  { "omnisky",         { NULL }, 2056,  "udp"  },
-  { "rich-cp",         { NULL }, 2057,  "tcp"  },
-  { "rich-cp",         { NULL }, 2057,  "udp"  },
-  { "newwavesearch",   { NULL }, 2058,  "tcp"  },
-  { "newwavesearch",   { NULL }, 2058,  "udp"  },
-  { "bmc-messaging",   { NULL }, 2059,  "tcp"  },
-  { "bmc-messaging",   { NULL }, 2059,  "udp"  },
-  { "teleniumdaemon",  { NULL }, 2060,  "tcp"  },
-  { "teleniumdaemon",  { NULL }, 2060,  "udp"  },
-  { "netmount",        { NULL }, 2061,  "tcp"  },
-  { "netmount",        { NULL }, 2061,  "udp"  },
-  { "icg-swp",         { NULL }, 2062,  "tcp"  },
-  { "icg-swp",         { NULL }, 2062,  "udp"  },
-  { "icg-bridge",      { NULL }, 2063,  "tcp"  },
-  { "icg-bridge",      { NULL }, 2063,  "udp"  },
-  { "icg-iprelay",     { NULL }, 2064,  "tcp"  },
-  { "icg-iprelay",     { NULL }, 2064,  "udp"  },
-  { "dlsrpn",          { NULL }, 2065,  "tcp"  },
-  { "dlsrpn",          { NULL }, 2065,  "udp"  },
-  { "aura",            { NULL }, 2066,  "tcp"  },
-  { "aura",            { NULL }, 2066,  "udp"  },
-  { "dlswpn",          { NULL }, 2067,  "tcp"  },
-  { "dlswpn",          { NULL }, 2067,  "udp"  },
-  { "avauthsrvprtcl",  { NULL }, 2068,  "tcp"  },
-  { "avauthsrvprtcl",  { NULL }, 2068,  "udp"  },
-  { "event-port",      { NULL }, 2069,  "tcp"  },
-  { "event-port",      { NULL }, 2069,  "udp"  },
-  { "ah-esp-encap",    { NULL }, 2070,  "tcp"  },
-  { "ah-esp-encap",    { NULL }, 2070,  "udp"  },
-  { "acp-port",        { NULL }, 2071,  "tcp"  },
-  { "acp-port",        { NULL }, 2071,  "udp"  },
-  { "msync",           { NULL }, 2072,  "tcp"  },
-  { "msync",           { NULL }, 2072,  "udp"  },
-  { "gxs-data-port",   { NULL }, 2073,  "tcp"  },
-  { "gxs-data-port",   { NULL }, 2073,  "udp"  },
-  { "vrtl-vmf-sa",     { NULL }, 2074,  "tcp"  },
-  { "vrtl-vmf-sa",     { NULL }, 2074,  "udp"  },
-  { "newlixengine",    { NULL }, 2075,  "tcp"  },
-  { "newlixengine",    { NULL }, 2075,  "udp"  },
-  { "newlixconfig",    { NULL }, 2076,  "tcp"  },
-  { "newlixconfig",    { NULL }, 2076,  "udp"  },
-  { "tsrmagt",         { NULL }, 2077,  "tcp"  },
-  { "tsrmagt",         { NULL }, 2077,  "udp"  },
-  { "tpcsrvr",         { NULL }, 2078,  "tcp"  },
-  { "tpcsrvr",         { NULL }, 2078,  "udp"  },
-  { "idware-router",   { NULL }, 2079,  "tcp"  },
-  { "idware-router",   { NULL }, 2079,  "udp"  },
-  { "autodesk-nlm",    { NULL }, 2080,  "tcp"  },
-  { "autodesk-nlm",    { NULL }, 2080,  "udp"  },
-  { "kme-trap-port",   { NULL }, 2081,  "tcp"  },
-  { "kme-trap-port",   { NULL }, 2081,  "udp"  },
-  { "infowave",        { NULL }, 2082,  "tcp"  },
-  { "infowave",        { NULL }, 2082,  "udp"  },
-  { "radsec",          { NULL }, 2083,  "tcp"  },
-  { "radsec",          { NULL }, 2083,  "udp"  },
-  { "sunclustergeo",   { NULL }, 2084,  "tcp"  },
-  { "sunclustergeo",   { NULL }, 2084,  "udp"  },
-  { "ada-cip",         { NULL }, 2085,  "tcp"  },
-  { "ada-cip",         { NULL }, 2085,  "udp"  },
-  { "gnunet",          { NULL }, 2086,  "tcp"  },
-  { "gnunet",          { NULL }, 2086,  "udp"  },
-  { "eli",             { NULL }, 2087,  "tcp"  },
-  { "eli",             { NULL }, 2087,  "udp"  },
-  { "ip-blf",          { NULL }, 2088,  "tcp"  },
-  { "ip-blf",          { NULL }, 2088,  "udp"  },
-  { "sep",             { NULL }, 2089,  "tcp"  },
-  { "sep",             { NULL }, 2089,  "udp"  },
-  { "lrp",             { NULL }, 2090,  "tcp"  },
-  { "lrp",             { NULL }, 2090,  "udp"  },
-  { "prp",             { NULL }, 2091,  "tcp"  },
-  { "prp",             { NULL }, 2091,  "udp"  },
-  { "descent3",        { NULL }, 2092,  "tcp"  },
-  { "descent3",        { NULL }, 2092,  "udp"  },
-  { "nbx-cc",          { NULL }, 2093,  "tcp"  },
-  { "nbx-cc",          { NULL }, 2093,  "udp"  },
-  { "nbx-au",          { NULL }, 2094,  "tcp"  },
-  { "nbx-au",          { NULL }, 2094,  "udp"  },
-  { "nbx-ser",         { NULL }, 2095,  "tcp"  },
-  { "nbx-ser",         { NULL }, 2095,  "udp"  },
-  { "nbx-dir",         { NULL }, 2096,  "tcp"  },
-  { "nbx-dir",         { NULL }, 2096,  "udp"  },
-  { "jetformpreview",  { NULL }, 2097,  "tcp"  },
-  { "jetformpreview",  { NULL }, 2097,  "udp"  },
-  { "dialog-port",     { NULL }, 2098,  "tcp"  },
-  { "dialog-port",     { NULL }, 2098,  "udp"  },
-  { "h2250-annex-g",   { NULL }, 2099,  "tcp"  },
-  { "h2250-annex-g",   { NULL }, 2099,  "udp"  },
-  { "amiganetfs",      { NULL }, 2100,  "tcp"  },
-  { "amiganetfs",      { NULL }, 2100,  "udp"  },
-  { "rtcm-sc104",      { NULL }, 2101,  "tcp"  },
-  { "rtcm-sc104",      { NULL }, 2101,  "udp"  },
-  { "zephyr-srv",      { NULL }, 2102,  "tcp"  },
-  { "zephyr-srv",      { NULL }, 2102,  "udp"  },
-  { "zephyr-clt",      { NULL }, 2103,  "tcp"  },
-  { "zephyr-clt",      { NULL }, 2103,  "udp"  },
-  { "zephyr-hm",       { NULL }, 2104,  "tcp"  },
-  { "zephyr-hm",       { NULL }, 2104,  "udp"  },
-  { "minipay",         { NULL }, 2105,  "tcp"  },
-  { "minipay",         { NULL }, 2105,  "udp"  },
-  { "mzap",            { NULL }, 2106,  "tcp"  },
-  { "mzap",            { NULL }, 2106,  "udp"  },
-  { "bintec-admin",    { NULL }, 2107,  "tcp"  },
-  { "bintec-admin",    { NULL }, 2107,  "udp"  },
-  { "comcam",          { NULL }, 2108,  "tcp"  },
-  { "comcam",          { NULL }, 2108,  "udp"  },
-  { "ergolight",       { NULL }, 2109,  "tcp"  },
-  { "ergolight",       { NULL }, 2109,  "udp"  },
-  { "umsp",            { NULL }, 2110,  "tcp"  },
-  { "umsp",            { NULL }, 2110,  "udp"  },
-  { "dsatp",           { NULL }, 2111,  "tcp"  },
-  { "dsatp",           { NULL }, 2111,  "udp"  },
-  { "idonix-metanet",  { NULL }, 2112,  "tcp"  },
-  { "idonix-metanet",  { NULL }, 2112,  "udp"  },
-  { "hsl-storm",       { NULL }, 2113,  "tcp"  },
-  { "hsl-storm",       { NULL }, 2113,  "udp"  },
-  { "newheights",      { NULL }, 2114,  "tcp"  },
-  { "newheights",      { NULL }, 2114,  "udp"  },
-  { "kdm",             { NULL }, 2115,  "tcp"  },
-  { "kdm",             { NULL }, 2115,  "udp"  },
-  { "ccowcmr",         { NULL }, 2116,  "tcp"  },
-  { "ccowcmr",         { NULL }, 2116,  "udp"  },
-  { "mentaclient",     { NULL }, 2117,  "tcp"  },
-  { "mentaclient",     { NULL }, 2117,  "udp"  },
-  { "mentaserver",     { NULL }, 2118,  "tcp"  },
-  { "mentaserver",     { NULL }, 2118,  "udp"  },
-  { "gsigatekeeper",   { NULL }, 2119,  "tcp"  },
-  { "gsigatekeeper",   { NULL }, 2119,  "udp"  },
-  { "qencp",           { NULL }, 2120,  "tcp"  },
-  { "qencp",           { NULL }, 2120,  "udp"  },
-  { "scientia-ssdb",   { NULL }, 2121,  "tcp"  },
-  { "scientia-ssdb",   { NULL }, 2121,  "udp"  },
-  { "caupc-remote",    { NULL }, 2122,  "tcp"  },
-  { "caupc-remote",    { NULL }, 2122,  "udp"  },
-  { "gtp-control",     { NULL }, 2123,  "tcp"  },
-  { "gtp-control",     { NULL }, 2123,  "udp"  },
-  { "elatelink",       { NULL }, 2124,  "tcp"  },
-  { "elatelink",       { NULL }, 2124,  "udp"  },
-  { "lockstep",        { NULL }, 2125,  "tcp"  },
-  { "lockstep",        { NULL }, 2125,  "udp"  },
-  { "pktcable-cops",   { NULL }, 2126,  "tcp"  },
-  { "pktcable-cops",   { NULL }, 2126,  "udp"  },
-  { "index-pc-wb",     { NULL }, 2127,  "tcp"  },
-  { "index-pc-wb",     { NULL }, 2127,  "udp"  },
-  { "net-steward",     { NULL }, 2128,  "tcp"  },
-  { "net-steward",     { NULL }, 2128,  "udp"  },
-  { "cs-live",         { NULL }, 2129,  "tcp"  },
-  { "cs-live",         { NULL }, 2129,  "udp"  },
-  { "xds",             { NULL }, 2130,  "tcp"  },
-  { "xds",             { NULL }, 2130,  "udp"  },
-  { "avantageb2b",     { NULL }, 2131,  "tcp"  },
-  { "avantageb2b",     { NULL }, 2131,  "udp"  },
-  { "solera-epmap",    { NULL }, 2132,  "tcp"  },
-  { "solera-epmap",    { NULL }, 2132,  "udp"  },
-  { "zymed-zpp",       { NULL }, 2133,  "tcp"  },
-  { "zymed-zpp",       { NULL }, 2133,  "udp"  },
-  { "avenue",          { NULL }, 2134,  "tcp"  },
-  { "avenue",          { NULL }, 2134,  "udp"  },
-  { "gris",            { NULL }, 2135,  "tcp"  },
-  { "gris",            { NULL }, 2135,  "udp"  },
-  { "appworxsrv",      { NULL }, 2136,  "tcp"  },
-  { "appworxsrv",      { NULL }, 2136,  "udp"  },
-  { "connect",         { NULL }, 2137,  "tcp"  },
-  { "connect",         { NULL }, 2137,  "udp"  },
-  { "unbind-cluster",  { NULL }, 2138,  "tcp"  },
-  { "unbind-cluster",  { NULL }, 2138,  "udp"  },
-  { "ias-auth",        { NULL }, 2139,  "tcp"  },
-  { "ias-auth",        { NULL }, 2139,  "udp"  },
-  { "ias-reg",         { NULL }, 2140,  "tcp"  },
-  { "ias-reg",         { NULL }, 2140,  "udp"  },
-  { "ias-admind",      { NULL }, 2141,  "tcp"  },
-  { "ias-admind",      { NULL }, 2141,  "udp"  },
-  { "tdmoip",          { NULL }, 2142,  "tcp"  },
-  { "tdmoip",          { NULL }, 2142,  "udp"  },
-  { "lv-jc",           { NULL }, 2143,  "tcp"  },
-  { "lv-jc",           { NULL }, 2143,  "udp"  },
-  { "lv-ffx",          { NULL }, 2144,  "tcp"  },
-  { "lv-ffx",          { NULL }, 2144,  "udp"  },
-  { "lv-pici",         { NULL }, 2145,  "tcp"  },
-  { "lv-pici",         { NULL }, 2145,  "udp"  },
-  { "lv-not",          { NULL }, 2146,  "tcp"  },
-  { "lv-not",          { NULL }, 2146,  "udp"  },
-  { "lv-auth",         { NULL }, 2147,  "tcp"  },
-  { "lv-auth",         { NULL }, 2147,  "udp"  },
-  { "veritas-ucl",     { NULL }, 2148,  "tcp"  },
-  { "veritas-ucl",     { NULL }, 2148,  "udp"  },
-  { "acptsys",         { NULL }, 2149,  "tcp"  },
-  { "acptsys",         { NULL }, 2149,  "udp"  },
-  { "dynamic3d",       { NULL }, 2150,  "tcp"  },
-  { "dynamic3d",       { NULL }, 2150,  "udp"  },
-  { "docent",          { NULL }, 2151,  "tcp"  },
-  { "docent",          { NULL }, 2151,  "udp"  },
-  { "gtp-user",        { NULL }, 2152,  "tcp"  },
-  { "gtp-user",        { NULL }, 2152,  "udp"  },
-  { "ctlptc",          { NULL }, 2153,  "tcp"  },
-  { "ctlptc",          { NULL }, 2153,  "udp"  },
-  { "stdptc",          { NULL }, 2154,  "tcp"  },
-  { "stdptc",          { NULL }, 2154,  "udp"  },
-  { "brdptc",          { NULL }, 2155,  "tcp"  },
-  { "brdptc",          { NULL }, 2155,  "udp"  },
-  { "trp",             { NULL }, 2156,  "tcp"  },
-  { "trp",             { NULL }, 2156,  "udp"  },
-  { "xnds",            { NULL }, 2157,  "tcp"  },
-  { "xnds",            { NULL }, 2157,  "udp"  },
-  { "touchnetplus",    { NULL }, 2158,  "tcp"  },
-  { "touchnetplus",    { NULL }, 2158,  "udp"  },
-  { "gdbremote",       { NULL }, 2159,  "tcp"  },
-  { "gdbremote",       { NULL }, 2159,  "udp"  },
-  { "apc-2160",        { NULL }, 2160,  "tcp"  },
-  { "apc-2160",        { NULL }, 2160,  "udp"  },
-  { "apc-2161",        { NULL }, 2161,  "tcp"  },
-  { "apc-2161",        { NULL }, 2161,  "udp"  },
-  { "navisphere",      { NULL }, 2162,  "tcp"  },
-  { "navisphere",      { NULL }, 2162,  "udp"  },
-  { "navisphere-sec",  { NULL }, 2163,  "tcp"  },
-  { "navisphere-sec",  { NULL }, 2163,  "udp"  },
-  { "ddns-v3",         { NULL }, 2164,  "tcp"  },
-  { "ddns-v3",         { NULL }, 2164,  "udp"  },
-  { "x-bone-api",      { NULL }, 2165,  "tcp"  },
-  { "x-bone-api",      { NULL }, 2165,  "udp"  },
-  { "iwserver",        { NULL }, 2166,  "tcp"  },
-  { "iwserver",        { NULL }, 2166,  "udp"  },
-  { "raw-serial",      { NULL }, 2167,  "tcp"  },
-  { "raw-serial",      { NULL }, 2167,  "udp"  },
-  { "easy-soft-mux",   { NULL }, 2168,  "tcp"  },
-  { "easy-soft-mux",   { NULL }, 2168,  "udp"  },
-  { "brain",           { NULL }, 2169,  "tcp"  },
-  { "brain",           { NULL }, 2169,  "udp"  },
-  { "eyetv",           { NULL }, 2170,  "tcp"  },
-  { "eyetv",           { NULL }, 2170,  "udp"  },
-  { "msfw-storage",    { NULL }, 2171,  "tcp"  },
-  { "msfw-storage",    { NULL }, 2171,  "udp"  },
-  { "msfw-s-storage",  { NULL }, 2172,  "tcp"  },
-  { "msfw-s-storage",  { NULL }, 2172,  "udp"  },
-  { "msfw-replica",    { NULL }, 2173,  "tcp"  },
-  { "msfw-replica",    { NULL }, 2173,  "udp"  },
-  { "msfw-array",      { NULL }, 2174,  "tcp"  },
-  { "msfw-array",      { NULL }, 2174,  "udp"  },
-  { "airsync",         { NULL }, 2175,  "tcp"  },
-  { "airsync",         { NULL }, 2175,  "udp"  },
-  { "rapi",            { NULL }, 2176,  "tcp"  },
-  { "rapi",            { NULL }, 2176,  "udp"  },
-  { "qwave",           { NULL }, 2177,  "tcp"  },
-  { "qwave",           { NULL }, 2177,  "udp"  },
-  { "bitspeer",        { NULL }, 2178,  "tcp"  },
-  { "bitspeer",        { NULL }, 2178,  "udp"  },
-  { "vmrdp",           { NULL }, 2179,  "tcp"  },
-  { "vmrdp",           { NULL }, 2179,  "udp"  },
-  { "mc-gt-srv",       { NULL }, 2180,  "tcp"  },
-  { "mc-gt-srv",       { NULL }, 2180,  "udp"  },
-  { "eforward",        { NULL }, 2181,  "tcp"  },
-  { "eforward",        { NULL }, 2181,  "udp"  },
-  { "cgn-stat",        { NULL }, 2182,  "tcp"  },
-  { "cgn-stat",        { NULL }, 2182,  "udp"  },
-  { "cgn-config",      { NULL }, 2183,  "tcp"  },
-  { "cgn-config",      { NULL }, 2183,  "udp"  },
-  { "nvd",             { NULL }, 2184,  "tcp"  },
-  { "nvd",             { NULL }, 2184,  "udp"  },
-  { "onbase-dds",      { NULL }, 2185,  "tcp"  },
-  { "onbase-dds",      { NULL }, 2185,  "udp"  },
-  { "gtaua",           { NULL }, 2186,  "tcp"  },
-  { "gtaua",           { NULL }, 2186,  "udp"  },
-  { "ssmc",            { NULL }, 2187,  "tcp"  },
-  { "ssmd",            { NULL }, 2187,  "udp"  },
-  { "tivoconnect",     { NULL }, 2190,  "tcp"  },
-  { "tivoconnect",     { NULL }, 2190,  "udp"  },
-  { "tvbus",           { NULL }, 2191,  "tcp"  },
-  { "tvbus",           { NULL }, 2191,  "udp"  },
-  { "asdis",           { NULL }, 2192,  "tcp"  },
-  { "asdis",           { NULL }, 2192,  "udp"  },
-  { "drwcs",           { NULL }, 2193,  "tcp"  },
-  { "drwcs",           { NULL }, 2193,  "udp"  },
-  { "mnp-exchange",    { NULL }, 2197,  "tcp"  },
-  { "mnp-exchange",    { NULL }, 2197,  "udp"  },
-  { "onehome-remote",  { NULL }, 2198,  "tcp"  },
-  { "onehome-remote",  { NULL }, 2198,  "udp"  },
-  { "onehome-help",    { NULL }, 2199,  "tcp"  },
-  { "onehome-help",    { NULL }, 2199,  "udp"  },
-  { "ici",             { NULL }, 2200,  "tcp"  },
-  { "ici",             { NULL }, 2200,  "udp"  },
-  { "ats",             { NULL }, 2201,  "tcp"  },
-  { "ats",             { NULL }, 2201,  "udp"  },
-  { "imtc-map",        { NULL }, 2202,  "tcp"  },
-  { "imtc-map",        { NULL }, 2202,  "udp"  },
-  { "b2-runtime",      { NULL }, 2203,  "tcp"  },
-  { "b2-runtime",      { NULL }, 2203,  "udp"  },
-  { "b2-license",      { NULL }, 2204,  "tcp"  },
-  { "b2-license",      { NULL }, 2204,  "udp"  },
-  { "jps",             { NULL }, 2205,  "tcp"  },
-  { "jps",             { NULL }, 2205,  "udp"  },
-  { "hpocbus",         { NULL }, 2206,  "tcp"  },
-  { "hpocbus",         { NULL }, 2206,  "udp"  },
-  { "hpssd",           { NULL }, 2207,  "tcp"  },
-  { "hpssd",           { NULL }, 2207,  "udp"  },
-  { "hpiod",           { NULL }, 2208,  "tcp"  },
-  { "hpiod",           { NULL }, 2208,  "udp"  },
-  { "rimf-ps",         { NULL }, 2209,  "tcp"  },
-  { "rimf-ps",         { NULL }, 2209,  "udp"  },
-  { "noaaport",        { NULL }, 2210,  "tcp"  },
-  { "noaaport",        { NULL }, 2210,  "udp"  },
-  { "emwin",           { NULL }, 2211,  "tcp"  },
-  { "emwin",           { NULL }, 2211,  "udp"  },
-  { "leecoposserver",  { NULL }, 2212,  "tcp"  },
-  { "leecoposserver",  { NULL }, 2212,  "udp"  },
-  { "kali",            { NULL }, 2213,  "tcp"  },
-  { "kali",            { NULL }, 2213,  "udp"  },
-  { "rpi",             { NULL }, 2214,  "tcp"  },
-  { "rpi",             { NULL }, 2214,  "udp"  },
-  { "ipcore",          { NULL }, 2215,  "tcp"  },
-  { "ipcore",          { NULL }, 2215,  "udp"  },
-  { "vtu-comms",       { NULL }, 2216,  "tcp"  },
-  { "vtu-comms",       { NULL }, 2216,  "udp"  },
-  { "gotodevice",      { NULL }, 2217,  "tcp"  },
-  { "gotodevice",      { NULL }, 2217,  "udp"  },
-  { "bounzza",         { NULL }, 2218,  "tcp"  },
-  { "bounzza",         { NULL }, 2218,  "udp"  },
-  { "netiq-ncap",      { NULL }, 2219,  "tcp"  },
-  { "netiq-ncap",      { NULL }, 2219,  "udp"  },
-  { "netiq",           { NULL }, 2220,  "tcp"  },
-  { "netiq",           { NULL }, 2220,  "udp"  },
-  { "rockwell-csp1",   { NULL }, 2221,  "tcp"  },
-  { "rockwell-csp1",   { NULL }, 2221,  "udp"  },
-  { "EtherNet/IP-1",   { NULL }, 2222,  "tcp"  },
-  { "EtherNet/IP-1",   { NULL }, 2222,  "udp"  },
-  { "rockwell-csp2",   { NULL }, 2223,  "tcp"  },
-  { "rockwell-csp2",   { NULL }, 2223,  "udp"  },
-  { "efi-mg",          { NULL }, 2224,  "tcp"  },
-  { "efi-mg",          { NULL }, 2224,  "udp"  },
-  { "rcip-itu",        { NULL }, 2225,  "tcp"  },
-  { "rcip-itu",        { NULL }, 2225,  "sctp" },
-  { "di-drm",          { NULL }, 2226,  "tcp"  },
-  { "di-drm",          { NULL }, 2226,  "udp"  },
-  { "di-msg",          { NULL }, 2227,  "tcp"  },
-  { "di-msg",          { NULL }, 2227,  "udp"  },
-  { "ehome-ms",        { NULL }, 2228,  "tcp"  },
-  { "ehome-ms",        { NULL }, 2228,  "udp"  },
-  { "datalens",        { NULL }, 2229,  "tcp"  },
-  { "datalens",        { NULL }, 2229,  "udp"  },
-  { "queueadm",        { NULL }, 2230,  "tcp"  },
-  { "queueadm",        { NULL }, 2230,  "udp"  },
-  { "wimaxasncp",      { NULL }, 2231,  "tcp"  },
-  { "wimaxasncp",      { NULL }, 2231,  "udp"  },
-  { "ivs-video",       { NULL }, 2232,  "tcp"  },
-  { "ivs-video",       { NULL }, 2232,  "udp"  },
-  { "infocrypt",       { NULL }, 2233,  "tcp"  },
-  { "infocrypt",       { NULL }, 2233,  "udp"  },
-  { "directplay",      { NULL }, 2234,  "tcp"  },
-  { "directplay",      { NULL }, 2234,  "udp"  },
-  { "sercomm-wlink",   { NULL }, 2235,  "tcp"  },
-  { "sercomm-wlink",   { NULL }, 2235,  "udp"  },
-  { "nani",            { NULL }, 2236,  "tcp"  },
-  { "nani",            { NULL }, 2236,  "udp"  },
-  { "optech-port1-lm", { NULL }, 2237,  "tcp"  },
-  { "optech-port1-lm", { NULL }, 2237,  "udp"  },
-  { "aviva-sna",       { NULL }, 2238,  "tcp"  },
-  { "aviva-sna",       { NULL }, 2238,  "udp"  },
-  { "imagequery",      { NULL }, 2239,  "tcp"  },
-  { "imagequery",      { NULL }, 2239,  "udp"  },
-  { "recipe",          { NULL }, 2240,  "tcp"  },
-  { "recipe",          { NULL }, 2240,  "udp"  },
-  { "ivsd",            { NULL }, 2241,  "tcp"  },
-  { "ivsd",            { NULL }, 2241,  "udp"  },
-  { "foliocorp",       { NULL }, 2242,  "tcp"  },
-  { "foliocorp",       { NULL }, 2242,  "udp"  },
-  { "magicom",         { NULL }, 2243,  "tcp"  },
-  { "magicom",         { NULL }, 2243,  "udp"  },
-  { "nmsserver",       { NULL }, 2244,  "tcp"  },
-  { "nmsserver",       { NULL }, 2244,  "udp"  },
-  { "hao",             { NULL }, 2245,  "tcp"  },
-  { "hao",             { NULL }, 2245,  "udp"  },
-  { "pc-mta-addrmap",  { NULL }, 2246,  "tcp"  },
-  { "pc-mta-addrmap",  { NULL }, 2246,  "udp"  },
-  { "antidotemgrsvr",  { NULL }, 2247,  "tcp"  },
-  { "antidotemgrsvr",  { NULL }, 2247,  "udp"  },
-  { "ums",             { NULL }, 2248,  "tcp"  },
-  { "ums",             { NULL }, 2248,  "udp"  },
-  { "rfmp",            { NULL }, 2249,  "tcp"  },
-  { "rfmp",            { NULL }, 2249,  "udp"  },
-  { "remote-collab",   { NULL }, 2250,  "tcp"  },
-  { "remote-collab",   { NULL }, 2250,  "udp"  },
-  { "dif-port",        { NULL }, 2251,  "tcp"  },
-  { "dif-port",        { NULL }, 2251,  "udp"  },
-  { "njenet-ssl",      { NULL }, 2252,  "tcp"  },
-  { "njenet-ssl",      { NULL }, 2252,  "udp"  },
-  { "dtv-chan-req",    { NULL }, 2253,  "tcp"  },
-  { "dtv-chan-req",    { NULL }, 2253,  "udp"  },
-  { "seispoc",         { NULL }, 2254,  "tcp"  },
-  { "seispoc",         { NULL }, 2254,  "udp"  },
-  { "vrtp",            { NULL }, 2255,  "tcp"  },
-  { "vrtp",            { NULL }, 2255,  "udp"  },
-  { "pcc-mfp",         { NULL }, 2256,  "tcp"  },
-  { "pcc-mfp",         { NULL }, 2256,  "udp"  },
-  { "simple-tx-rx",    { NULL }, 2257,  "tcp"  },
-  { "simple-tx-rx",    { NULL }, 2257,  "udp"  },
-  { "rcts",            { NULL }, 2258,  "tcp"  },
-  { "rcts",            { NULL }, 2258,  "udp"  },
-  { "acd-pm",          { NULL }, 2259,  "tcp"  },
-  { "acd-pm",          { NULL }, 2259,  "udp"  },
-  { "apc-2260",        { NULL }, 2260,  "tcp"  },
-  { "apc-2260",        { NULL }, 2260,  "udp"  },
-  { "comotionmaster",  { NULL }, 2261,  "tcp"  },
-  { "comotionmaster",  { NULL }, 2261,  "udp"  },
-  { "comotionback",    { NULL }, 2262,  "tcp"  },
-  { "comotionback",    { NULL }, 2262,  "udp"  },
-  { "ecwcfg",          { NULL }, 2263,  "tcp"  },
-  { "ecwcfg",          { NULL }, 2263,  "udp"  },
-  { "apx500api-1",     { NULL }, 2264,  "tcp"  },
-  { "apx500api-1",     { NULL }, 2264,  "udp"  },
-  { "apx500api-2",     { NULL }, 2265,  "tcp"  },
-  { "apx500api-2",     { NULL }, 2265,  "udp"  },
-  { "mfserver",        { NULL }, 2266,  "tcp"  },
-  { "mfserver",        { NULL }, 2266,  "udp"  },
-  { "ontobroker",      { NULL }, 2267,  "tcp"  },
-  { "ontobroker",      { NULL }, 2267,  "udp"  },
-  { "amt",             { NULL }, 2268,  "tcp"  },
-  { "amt",             { NULL }, 2268,  "udp"  },
-  { "mikey",           { NULL }, 2269,  "tcp"  },
-  { "mikey",           { NULL }, 2269,  "udp"  },
-  { "starschool",      { NULL }, 2270,  "tcp"  },
-  { "starschool",      { NULL }, 2270,  "udp"  },
-  { "mmcals",          { NULL }, 2271,  "tcp"  },
-  { "mmcals",          { NULL }, 2271,  "udp"  },
-  { "mmcal",           { NULL }, 2272,  "tcp"  },
-  { "mmcal",           { NULL }, 2272,  "udp"  },
-  { "mysql-im",        { NULL }, 2273,  "tcp"  },
-  { "mysql-im",        { NULL }, 2273,  "udp"  },
-  { "pcttunnell",      { NULL }, 2274,  "tcp"  },
-  { "pcttunnell",      { NULL }, 2274,  "udp"  },
-  { "ibridge-data",    { NULL }, 2275,  "tcp"  },
-  { "ibridge-data",    { NULL }, 2275,  "udp"  },
-  { "ibridge-mgmt",    { NULL }, 2276,  "tcp"  },
-  { "ibridge-mgmt",    { NULL }, 2276,  "udp"  },
-  { "bluectrlproxy",   { NULL }, 2277,  "tcp"  },
-  { "bluectrlproxy",   { NULL }, 2277,  "udp"  },
-  { "s3db",            { NULL }, 2278,  "tcp"  },
-  { "s3db",            { NULL }, 2278,  "udp"  },
-  { "xmquery",         { NULL }, 2279,  "tcp"  },
-  { "xmquery",         { NULL }, 2279,  "udp"  },
-  { "lnvpoller",       { NULL }, 2280,  "tcp"  },
-  { "lnvpoller",       { NULL }, 2280,  "udp"  },
-  { "lnvconsole",      { NULL }, 2281,  "tcp"  },
-  { "lnvconsole",      { NULL }, 2281,  "udp"  },
-  { "lnvalarm",        { NULL }, 2282,  "tcp"  },
-  { "lnvalarm",        { NULL }, 2282,  "udp"  },
-  { "lnvstatus",       { NULL }, 2283,  "tcp"  },
-  { "lnvstatus",       { NULL }, 2283,  "udp"  },
-  { "lnvmaps",         { NULL }, 2284,  "tcp"  },
-  { "lnvmaps",         { NULL }, 2284,  "udp"  },
-  { "lnvmailmon",      { NULL }, 2285,  "tcp"  },
-  { "lnvmailmon",      { NULL }, 2285,  "udp"  },
-  { "nas-metering",    { NULL }, 2286,  "tcp"  },
-  { "nas-metering",    { NULL }, 2286,  "udp"  },
-  { "dna",             { NULL }, 2287,  "tcp"  },
-  { "dna",             { NULL }, 2287,  "udp"  },
-  { "netml",           { NULL }, 2288,  "tcp"  },
-  { "netml",           { NULL }, 2288,  "udp"  },
-  { "dict-lookup",     { NULL }, 2289,  "tcp"  },
-  { "dict-lookup",     { NULL }, 2289,  "udp"  },
-  { "sonus-logging",   { NULL }, 2290,  "tcp"  },
-  { "sonus-logging",   { NULL }, 2290,  "udp"  },
-  { "eapsp",           { NULL }, 2291,  "tcp"  },
-  { "eapsp",           { NULL }, 2291,  "udp"  },
-  { "mib-streaming",   { NULL }, 2292,  "tcp"  },
-  { "mib-streaming",   { NULL }, 2292,  "udp"  },
-  { "npdbgmngr",       { NULL }, 2293,  "tcp"  },
-  { "npdbgmngr",       { NULL }, 2293,  "udp"  },
-  { "konshus-lm",      { NULL }, 2294,  "tcp"  },
-  { "konshus-lm",      { NULL }, 2294,  "udp"  },
-  { "advant-lm",       { NULL }, 2295,  "tcp"  },
-  { "advant-lm",       { NULL }, 2295,  "udp"  },
-  { "theta-lm",        { NULL }, 2296,  "tcp"  },
-  { "theta-lm",        { NULL }, 2296,  "udp"  },
-  { "d2k-datamover1",  { NULL }, 2297,  "tcp"  },
-  { "d2k-datamover1",  { NULL }, 2297,  "udp"  },
-  { "d2k-datamover2",  { NULL }, 2298,  "tcp"  },
-  { "d2k-datamover2",  { NULL }, 2298,  "udp"  },
-  { "pc-telecommute",  { NULL }, 2299,  "tcp"  },
-  { "pc-telecommute",  { NULL }, 2299,  "udp"  },
-  { "cvmmon",          { NULL }, 2300,  "tcp"  },
-  { "cvmmon",          { NULL }, 2300,  "udp"  },
-  { "cpq-wbem",        { NULL }, 2301,  "tcp"  },
-  { "cpq-wbem",        { NULL }, 2301,  "udp"  },
-  { "binderysupport",  { NULL }, 2302,  "tcp"  },
-  { "binderysupport",  { NULL }, 2302,  "udp"  },
-  { "proxy-gateway",   { NULL }, 2303,  "tcp"  },
-  { "proxy-gateway",   { NULL }, 2303,  "udp"  },
-  { "attachmate-uts",  { NULL }, 2304,  "tcp"  },
-  { "attachmate-uts",  { NULL }, 2304,  "udp"  },
-  { "mt-scaleserver",  { NULL }, 2305,  "tcp"  },
-  { "mt-scaleserver",  { NULL }, 2305,  "udp"  },
-  { "tappi-boxnet",    { NULL }, 2306,  "tcp"  },
-  { "tappi-boxnet",    { NULL }, 2306,  "udp"  },
-  { "pehelp",          { NULL }, 2307,  "tcp"  },
-  { "pehelp",          { NULL }, 2307,  "udp"  },
-  { "sdhelp",          { NULL }, 2308,  "tcp"  },
-  { "sdhelp",          { NULL }, 2308,  "udp"  },
-  { "sdserver",        { NULL }, 2309,  "tcp"  },
-  { "sdserver",        { NULL }, 2309,  "udp"  },
-  { "sdclient",        { NULL }, 2310,  "tcp"  },
-  { "sdclient",        { NULL }, 2310,  "udp"  },
-  { "messageservice",  { NULL }, 2311,  "tcp"  },
-  { "messageservice",  { NULL }, 2311,  "udp"  },
-  { "wanscaler",       { NULL }, 2312,  "tcp"  },
-  { "wanscaler",       { NULL }, 2312,  "udp"  },
-  { "iapp",            { NULL }, 2313,  "tcp"  },
-  { "iapp",            { NULL }, 2313,  "udp"  },
-  { "cr-websystems",   { NULL }, 2314,  "tcp"  },
-  { "cr-websystems",   { NULL }, 2314,  "udp"  },
-  { "precise-sft",     { NULL }, 2315,  "tcp"  },
-  { "precise-sft",     { NULL }, 2315,  "udp"  },
-  { "sent-lm",         { NULL }, 2316,  "tcp"  },
-  { "sent-lm",         { NULL }, 2316,  "udp"  },
-  { "attachmate-g32",  { NULL }, 2317,  "tcp"  },
-  { "attachmate-g32",  { NULL }, 2317,  "udp"  },
-  { "cadencecontrol",  { NULL }, 2318,  "tcp"  },
-  { "cadencecontrol",  { NULL }, 2318,  "udp"  },
-  { "infolibria",      { NULL }, 2319,  "tcp"  },
-  { "infolibria",      { NULL }, 2319,  "udp"  },
-  { "siebel-ns",       { NULL }, 2320,  "tcp"  },
-  { "siebel-ns",       { NULL }, 2320,  "udp"  },
-  { "rdlap",           { NULL }, 2321,  "tcp"  },
-  { "rdlap",           { NULL }, 2321,  "udp"  },
-  { "ofsd",            { NULL }, 2322,  "tcp"  },
-  { "ofsd",            { NULL }, 2322,  "udp"  },
-  { "3d-nfsd",         { NULL }, 2323,  "tcp"  },
-  { "3d-nfsd",         { NULL }, 2323,  "udp"  },
-  { "cosmocall",       { NULL }, 2324,  "tcp"  },
-  { "cosmocall",       { NULL }, 2324,  "udp"  },
-  { "ansysli",         { NULL }, 2325,  "tcp"  },
-  { "ansysli",         { NULL }, 2325,  "udp"  },
-  { "idcp",            { NULL }, 2326,  "tcp"  },
-  { "idcp",            { NULL }, 2326,  "udp"  },
-  { "xingcsm",         { NULL }, 2327,  "tcp"  },
-  { "xingcsm",         { NULL }, 2327,  "udp"  },
-  { "netrix-sftm",     { NULL }, 2328,  "tcp"  },
-  { "netrix-sftm",     { NULL }, 2328,  "udp"  },
-  { "nvd",             { NULL }, 2329,  "tcp"  },
-  { "nvd",             { NULL }, 2329,  "udp"  },
-  { "tscchat",         { NULL }, 2330,  "tcp"  },
-  { "tscchat",         { NULL }, 2330,  "udp"  },
-  { "agentview",       { NULL }, 2331,  "tcp"  },
-  { "agentview",       { NULL }, 2331,  "udp"  },
-  { "rcc-host",        { NULL }, 2332,  "tcp"  },
-  { "rcc-host",        { NULL }, 2332,  "udp"  },
-  { "snapp",           { NULL }, 2333,  "tcp"  },
-  { "snapp",           { NULL }, 2333,  "udp"  },
-  { "ace-client",      { NULL }, 2334,  "tcp"  },
-  { "ace-client",      { NULL }, 2334,  "udp"  },
-  { "ace-proxy",       { NULL }, 2335,  "tcp"  },
-  { "ace-proxy",       { NULL }, 2335,  "udp"  },
-  { "appleugcontrol",  { NULL }, 2336,  "tcp"  },
-  { "appleugcontrol",  { NULL }, 2336,  "udp"  },
-  { "ideesrv",         { NULL }, 2337,  "tcp"  },
-  { "ideesrv",         { NULL }, 2337,  "udp"  },
-  { "norton-lambert",  { NULL }, 2338,  "tcp"  },
-  { "norton-lambert",  { NULL }, 2338,  "udp"  },
-  { "3com-webview",    { NULL }, 2339,  "tcp"  },
-  { "3com-webview",    { NULL }, 2339,  "udp"  },
-  { "wrs_registry",    { NULL }, 2340,  "tcp"  },
-  { "wrs_registry",    { NULL }, 2340,  "udp"  },
-  { "xiostatus",       { NULL }, 2341,  "tcp"  },
-  { "xiostatus",       { NULL }, 2341,  "udp"  },
-  { "manage-exec",     { NULL }, 2342,  "tcp"  },
-  { "manage-exec",     { NULL }, 2342,  "udp"  },
-  { "nati-logos",      { NULL }, 2343,  "tcp"  },
-  { "nati-logos",      { NULL }, 2343,  "udp"  },
-  { "fcmsys",          { NULL }, 2344,  "tcp"  },
-  { "fcmsys",          { NULL }, 2344,  "udp"  },
-  { "dbm",             { NULL }, 2345,  "tcp"  },
-  { "dbm",             { NULL }, 2345,  "udp"  },
-  { "redstorm_join",   { NULL }, 2346,  "tcp"  },
-  { "redstorm_join",   { NULL }, 2346,  "udp"  },
-  { "redstorm_find",   { NULL }, 2347,  "tcp"  },
-  { "redstorm_find",   { NULL }, 2347,  "udp"  },
-  { "redstorm_info",   { NULL }, 2348,  "tcp"  },
-  { "redstorm_info",   { NULL }, 2348,  "udp"  },
-  { "redstorm_diag",   { NULL }, 2349,  "tcp"  },
-  { "redstorm_diag",   { NULL }, 2349,  "udp"  },
-  { "psbserver",       { NULL }, 2350,  "tcp"  },
-  { "psbserver",       { NULL }, 2350,  "udp"  },
-  { "psrserver",       { NULL }, 2351,  "tcp"  },
-  { "psrserver",       { NULL }, 2351,  "udp"  },
-  { "pslserver",       { NULL }, 2352,  "tcp"  },
-  { "pslserver",       { NULL }, 2352,  "udp"  },
-  { "pspserver",       { NULL }, 2353,  "tcp"  },
-  { "pspserver",       { NULL }, 2353,  "udp"  },
-  { "psprserver",      { NULL }, 2354,  "tcp"  },
-  { "psprserver",      { NULL }, 2354,  "udp"  },
-  { "psdbserver",      { NULL }, 2355,  "tcp"  },
-  { "psdbserver",      { NULL }, 2355,  "udp"  },
-  { "gxtelmd",         { NULL }, 2356,  "tcp"  },
-  { "gxtelmd",         { NULL }, 2356,  "udp"  },
-  { "unihub-server",   { NULL }, 2357,  "tcp"  },
-  { "unihub-server",   { NULL }, 2357,  "udp"  },
-  { "futrix",          { NULL }, 2358,  "tcp"  },
-  { "futrix",          { NULL }, 2358,  "udp"  },
-  { "flukeserver",     { NULL }, 2359,  "tcp"  },
-  { "flukeserver",     { NULL }, 2359,  "udp"  },
-  { "nexstorindltd",   { NULL }, 2360,  "tcp"  },
-  { "nexstorindltd",   { NULL }, 2360,  "udp"  },
-  { "tl1",             { NULL }, 2361,  "tcp"  },
-  { "tl1",             { NULL }, 2361,  "udp"  },
-  { "digiman",         { NULL }, 2362,  "tcp"  },
-  { "digiman",         { NULL }, 2362,  "udp"  },
-  { "mediacntrlnfsd",  { NULL }, 2363,  "tcp"  },
-  { "mediacntrlnfsd",  { NULL }, 2363,  "udp"  },
-  { "oi-2000",         { NULL }, 2364,  "tcp"  },
-  { "oi-2000",         { NULL }, 2364,  "udp"  },
-  { "dbref",           { NULL }, 2365,  "tcp"  },
-  { "dbref",           { NULL }, 2365,  "udp"  },
-  { "qip-login",       { NULL }, 2366,  "tcp"  },
-  { "qip-login",       { NULL }, 2366,  "udp"  },
-  { "service-ctrl",    { NULL }, 2367,  "tcp"  },
-  { "service-ctrl",    { NULL }, 2367,  "udp"  },
-  { "opentable",       { NULL }, 2368,  "tcp"  },
-  { "opentable",       { NULL }, 2368,  "udp"  },
-  { "l3-hbmon",        { NULL }, 2370,  "tcp"  },
-  { "l3-hbmon",        { NULL }, 2370,  "udp"  },
-  { "worldwire",       { NULL }, 2371,  "tcp"  },
-  { "worldwire",       { NULL }, 2371,  "udp"  },
-  { "lanmessenger",    { NULL }, 2372,  "tcp"  },
-  { "lanmessenger",    { NULL }, 2372,  "udp"  },
-  { "remographlm",     { NULL }, 2373,  "tcp"  },
-  { "hydra",           { NULL }, 2374,  "tcp"  },
-  { "compaq-https",    { NULL }, 2381,  "tcp"  },
-  { "compaq-https",    { NULL }, 2381,  "udp"  },
-  { "ms-olap3",        { NULL }, 2382,  "tcp"  },
-  { "ms-olap3",        { NULL }, 2382,  "udp"  },
-  { "ms-olap4",        { NULL }, 2383,  "tcp"  },
-  { "ms-olap4",        { NULL }, 2383,  "udp"  },
-  { "sd-request",      { NULL }, 2384,  "tcp"  },
-  { "sd-capacity",     { NULL }, 2384,  "udp"  },
-  { "sd-data",         { NULL }, 2385,  "tcp"  },
-  { "sd-data",         { NULL }, 2385,  "udp"  },
-  { "virtualtape",     { NULL }, 2386,  "tcp"  },
-  { "virtualtape",     { NULL }, 2386,  "udp"  },
-  { "vsamredirector",  { NULL }, 2387,  "tcp"  },
-  { "vsamredirector",  { NULL }, 2387,  "udp"  },
-  { "mynahautostart",  { NULL }, 2388,  "tcp"  },
-  { "mynahautostart",  { NULL }, 2388,  "udp"  },
-  { "ovsessionmgr",    { NULL }, 2389,  "tcp"  },
-  { "ovsessionmgr",    { NULL }, 2389,  "udp"  },
-  { "rsmtp",           { NULL }, 2390,  "tcp"  },
-  { "rsmtp",           { NULL }, 2390,  "udp"  },
-  { "3com-net-mgmt",   { NULL }, 2391,  "tcp"  },
-  { "3com-net-mgmt",   { NULL }, 2391,  "udp"  },
-  { "tacticalauth",    { NULL }, 2392,  "tcp"  },
-  { "tacticalauth",    { NULL }, 2392,  "udp"  },
-  { "ms-olap1",        { NULL }, 2393,  "tcp"  },
-  { "ms-olap1",        { NULL }, 2393,  "udp"  },
-  { "ms-olap2",        { NULL }, 2394,  "tcp"  },
-  { "ms-olap2",        { NULL }, 2394,  "udp"  },
-  { "lan900_remote",   { NULL }, 2395,  "tcp"  },
-  { "lan900_remote",   { NULL }, 2395,  "udp"  },
-  { "wusage",          { NULL }, 2396,  "tcp"  },
-  { "wusage",          { NULL }, 2396,  "udp"  },
-  { "ncl",             { NULL }, 2397,  "tcp"  },
-  { "ncl",             { NULL }, 2397,  "udp"  },
-  { "orbiter",         { NULL }, 2398,  "tcp"  },
-  { "orbiter",         { NULL }, 2398,  "udp"  },
-  { "fmpro-fdal",      { NULL }, 2399,  "tcp"  },
-  { "fmpro-fdal",      { NULL }, 2399,  "udp"  },
-  { "opequus-server",  { NULL }, 2400,  "tcp"  },
-  { "opequus-server",  { NULL }, 2400,  "udp"  },
-  { "cvspserver",      { NULL }, 2401,  "tcp"  },
-  { "cvspserver",      { NULL }, 2401,  "udp"  },
-  { "taskmaster2000",  { NULL }, 2402,  "tcp"  },
-  { "taskmaster2000",  { NULL }, 2402,  "udp"  },
-  { "taskmaster2000",  { NULL }, 2403,  "tcp"  },
-  { "taskmaster2000",  { NULL }, 2403,  "udp"  },
-  { "iec-104",         { NULL }, 2404,  "tcp"  },
-  { "iec-104",         { NULL }, 2404,  "udp"  },
-  { "trc-netpoll",     { NULL }, 2405,  "tcp"  },
-  { "trc-netpoll",     { NULL }, 2405,  "udp"  },
-  { "jediserver",      { NULL }, 2406,  "tcp"  },
-  { "jediserver",      { NULL }, 2406,  "udp"  },
-  { "orion",           { NULL }, 2407,  "tcp"  },
-  { "orion",           { NULL }, 2407,  "udp"  },
-  { "optimanet",       { NULL }, 2408,  "tcp"  },
-  { "optimanet",       { NULL }, 2408,  "udp"  },
-  { "sns-protocol",    { NULL }, 2409,  "tcp"  },
-  { "sns-protocol",    { NULL }, 2409,  "udp"  },
-  { "vrts-registry",   { NULL }, 2410,  "tcp"  },
-  { "vrts-registry",   { NULL }, 2410,  "udp"  },
-  { "netwave-ap-mgmt", { NULL }, 2411,  "tcp"  },
-  { "netwave-ap-mgmt", { NULL }, 2411,  "udp"  },
-  { "cdn",             { NULL }, 2412,  "tcp"  },
-  { "cdn",             { NULL }, 2412,  "udp"  },
-  { "orion-rmi-reg",   { NULL }, 2413,  "tcp"  },
-  { "orion-rmi-reg",   { NULL }, 2413,  "udp"  },
-  { "beeyond",         { NULL }, 2414,  "tcp"  },
-  { "beeyond",         { NULL }, 2414,  "udp"  },
-  { "codima-rtp",      { NULL }, 2415,  "tcp"  },
-  { "codima-rtp",      { NULL }, 2415,  "udp"  },
-  { "rmtserver",       { NULL }, 2416,  "tcp"  },
-  { "rmtserver",       { NULL }, 2416,  "udp"  },
-  { "composit-server", { NULL }, 2417,  "tcp"  },
-  { "composit-server", { NULL }, 2417,  "udp"  },
-  { "cas",             { NULL }, 2418,  "tcp"  },
-  { "cas",             { NULL }, 2418,  "udp"  },
-  { "attachmate-s2s",  { NULL }, 2419,  "tcp"  },
-  { "attachmate-s2s",  { NULL }, 2419,  "udp"  },
-  { "dslremote-mgmt",  { NULL }, 2420,  "tcp"  },
-  { "dslremote-mgmt",  { NULL }, 2420,  "udp"  },
-  { "g-talk",          { NULL }, 2421,  "tcp"  },
-  { "g-talk",          { NULL }, 2421,  "udp"  },
-  { "crmsbits",        { NULL }, 2422,  "tcp"  },
-  { "crmsbits",        { NULL }, 2422,  "udp"  },
-  { "rnrp",            { NULL }, 2423,  "tcp"  },
-  { "rnrp",            { NULL }, 2423,  "udp"  },
-  { "kofax-svr",       { NULL }, 2424,  "tcp"  },
-  { "kofax-svr",       { NULL }, 2424,  "udp"  },
-  { "fjitsuappmgr",    { NULL }, 2425,  "tcp"  },
-  { "fjitsuappmgr",    { NULL }, 2425,  "udp"  },
-  { "mgcp-gateway",    { NULL }, 2427,  "tcp"  },
-  { "mgcp-gateway",    { NULL }, 2427,  "udp"  },
-  { "ott",             { NULL }, 2428,  "tcp"  },
-  { "ott",             { NULL }, 2428,  "udp"  },
-  { "ft-role",         { NULL }, 2429,  "tcp"  },
-  { "ft-role",         { NULL }, 2429,  "udp"  },
-  { "venus",           { NULL }, 2430,  "tcp"  },
-  { "venus",           { NULL }, 2430,  "udp"  },
-  { "venus-se",        { NULL }, 2431,  "tcp"  },
-  { "venus-se",        { NULL }, 2431,  "udp"  },
-  { "codasrv",         { NULL }, 2432,  "tcp"  },
-  { "codasrv",         { NULL }, 2432,  "udp"  },
-  { "codasrv-se",      { NULL }, 2433,  "tcp"  },
-  { "codasrv-se",      { NULL }, 2433,  "udp"  },
-  { "pxc-epmap",       { NULL }, 2434,  "tcp"  },
-  { "pxc-epmap",       { NULL }, 2434,  "udp"  },
-  { "optilogic",       { NULL }, 2435,  "tcp"  },
-  { "optilogic",       { NULL }, 2435,  "udp"  },
-  { "topx",            { NULL }, 2436,  "tcp"  },
-  { "topx",            { NULL }, 2436,  "udp"  },
-  { "unicontrol",      { NULL }, 2437,  "tcp"  },
-  { "unicontrol",      { NULL }, 2437,  "udp"  },
-  { "msp",             { NULL }, 2438,  "tcp"  },
-  { "msp",             { NULL }, 2438,  "udp"  },
-  { "sybasedbsynch",   { NULL }, 2439,  "tcp"  },
-  { "sybasedbsynch",   { NULL }, 2439,  "udp"  },
-  { "spearway",        { NULL }, 2440,  "tcp"  },
-  { "spearway",        { NULL }, 2440,  "udp"  },
-  { "pvsw-inet",       { NULL }, 2441,  "tcp"  },
-  { "pvsw-inet",       { NULL }, 2441,  "udp"  },
-  { "netangel",        { NULL }, 2442,  "tcp"  },
-  { "netangel",        { NULL }, 2442,  "udp"  },
-  { "powerclientcsf",  { NULL }, 2443,  "tcp"  },
-  { "powerclientcsf",  { NULL }, 2443,  "udp"  },
-  { "btpp2sectrans",   { NULL }, 2444,  "tcp"  },
-  { "btpp2sectrans",   { NULL }, 2444,  "udp"  },
-  { "dtn1",            { NULL }, 2445,  "tcp"  },
-  { "dtn1",            { NULL }, 2445,  "udp"  },
-  { "bues_service",    { NULL }, 2446,  "tcp"  },
-  { "bues_service",    { NULL }, 2446,  "udp"  },
-  { "ovwdb",           { NULL }, 2447,  "tcp"  },
-  { "ovwdb",           { NULL }, 2447,  "udp"  },
-  { "hpppssvr",        { NULL }, 2448,  "tcp"  },
-  { "hpppssvr",        { NULL }, 2448,  "udp"  },
-  { "ratl",            { NULL }, 2449,  "tcp"  },
-  { "ratl",            { NULL }, 2449,  "udp"  },
-  { "netadmin",        { NULL }, 2450,  "tcp"  },
-  { "netadmin",        { NULL }, 2450,  "udp"  },
-  { "netchat",         { NULL }, 2451,  "tcp"  },
-  { "netchat",         { NULL }, 2451,  "udp"  },
-  { "snifferclient",   { NULL }, 2452,  "tcp"  },
-  { "snifferclient",   { NULL }, 2452,  "udp"  },
-  { "madge-ltd",       { NULL }, 2453,  "tcp"  },
-  { "madge-ltd",       { NULL }, 2453,  "udp"  },
-  { "indx-dds",        { NULL }, 2454,  "tcp"  },
-  { "indx-dds",        { NULL }, 2454,  "udp"  },
-  { "wago-io-system",  { NULL }, 2455,  "tcp"  },
-  { "wago-io-system",  { NULL }, 2455,  "udp"  },
-  { "altav-remmgt",    { NULL }, 2456,  "tcp"  },
-  { "altav-remmgt",    { NULL }, 2456,  "udp"  },
-  { "rapido-ip",       { NULL }, 2457,  "tcp"  },
-  { "rapido-ip",       { NULL }, 2457,  "udp"  },
-  { "griffin",         { NULL }, 2458,  "tcp"  },
-  { "griffin",         { NULL }, 2458,  "udp"  },
-  { "community",       { NULL }, 2459,  "tcp"  },
-  { "community",       { NULL }, 2459,  "udp"  },
-  { "ms-theater",      { NULL }, 2460,  "tcp"  },
-  { "ms-theater",      { NULL }, 2460,  "udp"  },
-  { "qadmifoper",      { NULL }, 2461,  "tcp"  },
-  { "qadmifoper",      { NULL }, 2461,  "udp"  },
-  { "qadmifevent",     { NULL }, 2462,  "tcp"  },
-  { "qadmifevent",     { NULL }, 2462,  "udp"  },
-  { "lsi-raid-mgmt",   { NULL }, 2463,  "tcp"  },
-  { "lsi-raid-mgmt",   { NULL }, 2463,  "udp"  },
-  { "direcpc-si",      { NULL }, 2464,  "tcp"  },
-  { "direcpc-si",      { NULL }, 2464,  "udp"  },
-  { "lbm",             { NULL }, 2465,  "tcp"  },
-  { "lbm",             { NULL }, 2465,  "udp"  },
-  { "lbf",             { NULL }, 2466,  "tcp"  },
-  { "lbf",             { NULL }, 2466,  "udp"  },
-  { "high-criteria",   { NULL }, 2467,  "tcp"  },
-  { "high-criteria",   { NULL }, 2467,  "udp"  },
-  { "qip-msgd",        { NULL }, 2468,  "tcp"  },
-  { "qip-msgd",        { NULL }, 2468,  "udp"  },
-  { "mti-tcs-comm",    { NULL }, 2469,  "tcp"  },
-  { "mti-tcs-comm",    { NULL }, 2469,  "udp"  },
-  { "taskman-port",    { NULL }, 2470,  "tcp"  },
-  { "taskman-port",    { NULL }, 2470,  "udp"  },
-  { "seaodbc",         { NULL }, 2471,  "tcp"  },
-  { "seaodbc",         { NULL }, 2471,  "udp"  },
-  { "c3",              { NULL }, 2472,  "tcp"  },
-  { "c3",              { NULL }, 2472,  "udp"  },
-  { "aker-cdp",        { NULL }, 2473,  "tcp"  },
-  { "aker-cdp",        { NULL }, 2473,  "udp"  },
-  { "vitalanalysis",   { NULL }, 2474,  "tcp"  },
-  { "vitalanalysis",   { NULL }, 2474,  "udp"  },
-  { "ace-server",      { NULL }, 2475,  "tcp"  },
-  { "ace-server",      { NULL }, 2475,  "udp"  },
-  { "ace-svr-prop",    { NULL }, 2476,  "tcp"  },
-  { "ace-svr-prop",    { NULL }, 2476,  "udp"  },
-  { "ssm-cvs",         { NULL }, 2477,  "tcp"  },
-  { "ssm-cvs",         { NULL }, 2477,  "udp"  },
-  { "ssm-cssps",       { NULL }, 2478,  "tcp"  },
-  { "ssm-cssps",       { NULL }, 2478,  "udp"  },
-  { "ssm-els",         { NULL }, 2479,  "tcp"  },
-  { "ssm-els",         { NULL }, 2479,  "udp"  },
-  { "powerexchange",   { NULL }, 2480,  "tcp"  },
-  { "powerexchange",   { NULL }, 2480,  "udp"  },
-  { "giop",            { NULL }, 2481,  "tcp"  },
-  { "giop",            { NULL }, 2481,  "udp"  },
-  { "giop-ssl",        { NULL }, 2482,  "tcp"  },
-  { "giop-ssl",        { NULL }, 2482,  "udp"  },
-  { "ttc",             { NULL }, 2483,  "tcp"  },
-  { "ttc",             { NULL }, 2483,  "udp"  },
-  { "ttc-ssl",         { NULL }, 2484,  "tcp"  },
-  { "ttc-ssl",         { NULL }, 2484,  "udp"  },
-  { "netobjects1",     { NULL }, 2485,  "tcp"  },
-  { "netobjects1",     { NULL }, 2485,  "udp"  },
-  { "netobjects2",     { NULL }, 2486,  "tcp"  },
-  { "netobjects2",     { NULL }, 2486,  "udp"  },
-  { "pns",             { NULL }, 2487,  "tcp"  },
-  { "pns",             { NULL }, 2487,  "udp"  },
-  { "moy-corp",        { NULL }, 2488,  "tcp"  },
-  { "moy-corp",        { NULL }, 2488,  "udp"  },
-  { "tsilb",           { NULL }, 2489,  "tcp"  },
-  { "tsilb",           { NULL }, 2489,  "udp"  },
-  { "qip-qdhcp",       { NULL }, 2490,  "tcp"  },
-  { "qip-qdhcp",       { NULL }, 2490,  "udp"  },
-  { "conclave-cpp",    { NULL }, 2491,  "tcp"  },
-  { "conclave-cpp",    { NULL }, 2491,  "udp"  },
-  { "groove",          { NULL }, 2492,  "tcp"  },
-  { "groove",          { NULL }, 2492,  "udp"  },
-  { "talarian-mqs",    { NULL }, 2493,  "tcp"  },
-  { "talarian-mqs",    { NULL }, 2493,  "udp"  },
-  { "bmc-ar",          { NULL }, 2494,  "tcp"  },
-  { "bmc-ar",          { NULL }, 2494,  "udp"  },
-  { "fast-rem-serv",   { NULL }, 2495,  "tcp"  },
-  { "fast-rem-serv",   { NULL }, 2495,  "udp"  },
-  { "dirgis",          { NULL }, 2496,  "tcp"  },
-  { "dirgis",          { NULL }, 2496,  "udp"  },
-  { "quaddb",          { NULL }, 2497,  "tcp"  },
-  { "quaddb",          { NULL }, 2497,  "udp"  },
-  { "odn-castraq",     { NULL }, 2498,  "tcp"  },
-  { "odn-castraq",     { NULL }, 2498,  "udp"  },
-  { "unicontrol",      { NULL }, 2499,  "tcp"  },
-  { "unicontrol",      { NULL }, 2499,  "udp"  },
-  { "rtsserv",         { NULL }, 2500,  "tcp"  },
-  { "rtsserv",         { NULL }, 2500,  "udp"  },
-  { "rtsclient",       { NULL }, 2501,  "tcp"  },
-  { "rtsclient",       { NULL }, 2501,  "udp"  },
-  { "kentrox-prot",    { NULL }, 2502,  "tcp"  },
-  { "kentrox-prot",    { NULL }, 2502,  "udp"  },
-  { "nms-dpnss",       { NULL }, 2503,  "tcp"  },
-  { "nms-dpnss",       { NULL }, 2503,  "udp"  },
-  { "wlbs",            { NULL }, 2504,  "tcp"  },
-  { "wlbs",            { NULL }, 2504,  "udp"  },
-  { "ppcontrol",       { NULL }, 2505,  "tcp"  },
-  { "ppcontrol",       { NULL }, 2505,  "udp"  },
-  { "jbroker",         { NULL }, 2506,  "tcp"  },
-  { "jbroker",         { NULL }, 2506,  "udp"  },
-  { "spock",           { NULL }, 2507,  "tcp"  },
-  { "spock",           { NULL }, 2507,  "udp"  },
-  { "jdatastore",      { NULL }, 2508,  "tcp"  },
-  { "jdatastore",      { NULL }, 2508,  "udp"  },
-  { "fjmpss",          { NULL }, 2509,  "tcp"  },
-  { "fjmpss",          { NULL }, 2509,  "udp"  },
-  { "fjappmgrbulk",    { NULL }, 2510,  "tcp"  },
-  { "fjappmgrbulk",    { NULL }, 2510,  "udp"  },
-  { "metastorm",       { NULL }, 2511,  "tcp"  },
-  { "metastorm",       { NULL }, 2511,  "udp"  },
-  { "citrixima",       { NULL }, 2512,  "tcp"  },
-  { "citrixima",       { NULL }, 2512,  "udp"  },
-  { "citrixadmin",     { NULL }, 2513,  "tcp"  },
-  { "citrixadmin",     { NULL }, 2513,  "udp"  },
-  { "facsys-ntp",      { NULL }, 2514,  "tcp"  },
-  { "facsys-ntp",      { NULL }, 2514,  "udp"  },
-  { "facsys-router",   { NULL }, 2515,  "tcp"  },
-  { "facsys-router",   { NULL }, 2515,  "udp"  },
-  { "maincontrol",     { NULL }, 2516,  "tcp"  },
-  { "maincontrol",     { NULL }, 2516,  "udp"  },
-  { "call-sig-trans",  { NULL }, 2517,  "tcp"  },
-  { "call-sig-trans",  { NULL }, 2517,  "udp"  },
-  { "willy",           { NULL }, 2518,  "tcp"  },
-  { "willy",           { NULL }, 2518,  "udp"  },
-  { "globmsgsvc",      { NULL }, 2519,  "tcp"  },
-  { "globmsgsvc",      { NULL }, 2519,  "udp"  },
-  { "pvsw",            { NULL }, 2520,  "tcp"  },
-  { "pvsw",            { NULL }, 2520,  "udp"  },
-  { "adaptecmgr",      { NULL }, 2521,  "tcp"  },
-  { "adaptecmgr",      { NULL }, 2521,  "udp"  },
-  { "windb",           { NULL }, 2522,  "tcp"  },
-  { "windb",           { NULL }, 2522,  "udp"  },
-  { "qke-llc-v3",      { NULL }, 2523,  "tcp"  },
-  { "qke-llc-v3",      { NULL }, 2523,  "udp"  },
-  { "optiwave-lm",     { NULL }, 2524,  "tcp"  },
-  { "optiwave-lm",     { NULL }, 2524,  "udp"  },
-  { "ms-v-worlds",     { NULL }, 2525,  "tcp"  },
-  { "ms-v-worlds",     { NULL }, 2525,  "udp"  },
-  { "ema-sent-lm",     { NULL }, 2526,  "tcp"  },
-  { "ema-sent-lm",     { NULL }, 2526,  "udp"  },
-  { "iqserver",        { NULL }, 2527,  "tcp"  },
-  { "iqserver",        { NULL }, 2527,  "udp"  },
-  { "ncr_ccl",         { NULL }, 2528,  "tcp"  },
-  { "ncr_ccl",         { NULL }, 2528,  "udp"  },
-  { "utsftp",          { NULL }, 2529,  "tcp"  },
-  { "utsftp",          { NULL }, 2529,  "udp"  },
-  { "vrcommerce",      { NULL }, 2530,  "tcp"  },
-  { "vrcommerce",      { NULL }, 2530,  "udp"  },
-  { "ito-e-gui",       { NULL }, 2531,  "tcp"  },
-  { "ito-e-gui",       { NULL }, 2531,  "udp"  },
-  { "ovtopmd",         { NULL }, 2532,  "tcp"  },
-  { "ovtopmd",         { NULL }, 2532,  "udp"  },
-  { "snifferserver",   { NULL }, 2533,  "tcp"  },
-  { "snifferserver",   { NULL }, 2533,  "udp"  },
-  { "combox-web-acc",  { NULL }, 2534,  "tcp"  },
-  { "combox-web-acc",  { NULL }, 2534,  "udp"  },
-  { "madcap",          { NULL }, 2535,  "tcp"  },
-  { "madcap",          { NULL }, 2535,  "udp"  },
-  { "btpp2audctr1",    { NULL }, 2536,  "tcp"  },
-  { "btpp2audctr1",    { NULL }, 2536,  "udp"  },
-  { "upgrade",         { NULL }, 2537,  "tcp"  },
-  { "upgrade",         { NULL }, 2537,  "udp"  },
-  { "vnwk-prapi",      { NULL }, 2538,  "tcp"  },
-  { "vnwk-prapi",      { NULL }, 2538,  "udp"  },
-  { "vsiadmin",        { NULL }, 2539,  "tcp"  },
-  { "vsiadmin",        { NULL }, 2539,  "udp"  },
-  { "lonworks",        { NULL }, 2540,  "tcp"  },
-  { "lonworks",        { NULL }, 2540,  "udp"  },
-  { "lonworks2",       { NULL }, 2541,  "tcp"  },
-  { "lonworks2",       { NULL }, 2541,  "udp"  },
-  { "udrawgraph",      { NULL }, 2542,  "tcp"  },
-  { "udrawgraph",      { NULL }, 2542,  "udp"  },
-  { "reftek",          { NULL }, 2543,  "tcp"  },
-  { "reftek",          { NULL }, 2543,  "udp"  },
-  { "novell-zen",      { NULL }, 2544,  "tcp"  },
-  { "novell-zen",      { NULL }, 2544,  "udp"  },
-  { "sis-emt",         { NULL }, 2545,  "tcp"  },
-  { "sis-emt",         { NULL }, 2545,  "udp"  },
-  { "vytalvaultbrtp",  { NULL }, 2546,  "tcp"  },
-  { "vytalvaultbrtp",  { NULL }, 2546,  "udp"  },
-  { "vytalvaultvsmp",  { NULL }, 2547,  "tcp"  },
-  { "vytalvaultvsmp",  { NULL }, 2547,  "udp"  },
-  { "vytalvaultpipe",  { NULL }, 2548,  "tcp"  },
-  { "vytalvaultpipe",  { NULL }, 2548,  "udp"  },
-  { "ipass",           { NULL }, 2549,  "tcp"  },
-  { "ipass",           { NULL }, 2549,  "udp"  },
-  { "ads",             { NULL }, 2550,  "tcp"  },
-  { "ads",             { NULL }, 2550,  "udp"  },
-  { "isg-uda-server",  { NULL }, 2551,  "tcp"  },
-  { "isg-uda-server",  { NULL }, 2551,  "udp"  },
-  { "call-logging",    { NULL }, 2552,  "tcp"  },
-  { "call-logging",    { NULL }, 2552,  "udp"  },
-  { "efidiningport",   { NULL }, 2553,  "tcp"  },
-  { "efidiningport",   { NULL }, 2553,  "udp"  },
-  { "vcnet-link-v10",  { NULL }, 2554,  "tcp"  },
-  { "vcnet-link-v10",  { NULL }, 2554,  "udp"  },
-  { "compaq-wcp",      { NULL }, 2555,  "tcp"  },
-  { "compaq-wcp",      { NULL }, 2555,  "udp"  },
-  { "nicetec-nmsvc",   { NULL }, 2556,  "tcp"  },
-  { "nicetec-nmsvc",   { NULL }, 2556,  "udp"  },
-  { "nicetec-mgmt",    { NULL }, 2557,  "tcp"  },
-  { "nicetec-mgmt",    { NULL }, 2557,  "udp"  },
-  { "pclemultimedia",  { NULL }, 2558,  "tcp"  },
-  { "pclemultimedia",  { NULL }, 2558,  "udp"  },
-  { "lstp",            { NULL }, 2559,  "tcp"  },
-  { "lstp",            { NULL }, 2559,  "udp"  },
-  { "labrat",          { NULL }, 2560,  "tcp"  },
-  { "labrat",          { NULL }, 2560,  "udp"  },
-  { "mosaixcc",        { NULL }, 2561,  "tcp"  },
-  { "mosaixcc",        { NULL }, 2561,  "udp"  },
-  { "delibo",          { NULL }, 2562,  "tcp"  },
-  { "delibo",          { NULL }, 2562,  "udp"  },
-  { "cti-redwood",     { NULL }, 2563,  "tcp"  },
-  { "cti-redwood",     { NULL }, 2563,  "udp"  },
-  { "hp-3000-telnet",  { NULL }, 2564,  "tcp"  },
-  { "coord-svr",       { NULL }, 2565,  "tcp"  },
-  { "coord-svr",       { NULL }, 2565,  "udp"  },
-  { "pcs-pcw",         { NULL }, 2566,  "tcp"  },
-  { "pcs-pcw",         { NULL }, 2566,  "udp"  },
-  { "clp",             { NULL }, 2567,  "tcp"  },
-  { "clp",             { NULL }, 2567,  "udp"  },
-  { "spamtrap",        { NULL }, 2568,  "tcp"  },
-  { "spamtrap",        { NULL }, 2568,  "udp"  },
-  { "sonuscallsig",    { NULL }, 2569,  "tcp"  },
-  { "sonuscallsig",    { NULL }, 2569,  "udp"  },
-  { "hs-port",         { NULL }, 2570,  "tcp"  },
-  { "hs-port",         { NULL }, 2570,  "udp"  },
-  { "cecsvc",          { NULL }, 2571,  "tcp"  },
-  { "cecsvc",          { NULL }, 2571,  "udp"  },
-  { "ibp",             { NULL }, 2572,  "tcp"  },
-  { "ibp",             { NULL }, 2572,  "udp"  },
-  { "trustestablish",  { NULL }, 2573,  "tcp"  },
-  { "trustestablish",  { NULL }, 2573,  "udp"  },
-  { "blockade-bpsp",   { NULL }, 2574,  "tcp"  },
-  { "blockade-bpsp",   { NULL }, 2574,  "udp"  },
-  { "hl7",             { NULL }, 2575,  "tcp"  },
-  { "hl7",             { NULL }, 2575,  "udp"  },
-  { "tclprodebugger",  { NULL }, 2576,  "tcp"  },
-  { "tclprodebugger",  { NULL }, 2576,  "udp"  },
-  { "scipticslsrvr",   { NULL }, 2577,  "tcp"  },
-  { "scipticslsrvr",   { NULL }, 2577,  "udp"  },
-  { "rvs-isdn-dcp",    { NULL }, 2578,  "tcp"  },
-  { "rvs-isdn-dcp",    { NULL }, 2578,  "udp"  },
-  { "mpfoncl",         { NULL }, 2579,  "tcp"  },
-  { "mpfoncl",         { NULL }, 2579,  "udp"  },
-  { "tributary",       { NULL }, 2580,  "tcp"  },
-  { "tributary",       { NULL }, 2580,  "udp"  },
-  { "argis-te",        { NULL }, 2581,  "tcp"  },
-  { "argis-te",        { NULL }, 2581,  "udp"  },
-  { "argis-ds",        { NULL }, 2582,  "tcp"  },
-  { "argis-ds",        { NULL }, 2582,  "udp"  },
-  { "mon",             { NULL }, 2583,  "tcp"  },
-  { "mon",             { NULL }, 2583,  "udp"  },
-  { "cyaserv",         { NULL }, 2584,  "tcp"  },
-  { "cyaserv",         { NULL }, 2584,  "udp"  },
-  { "netx-server",     { NULL }, 2585,  "tcp"  },
-  { "netx-server",     { NULL }, 2585,  "udp"  },
-  { "netx-agent",      { NULL }, 2586,  "tcp"  },
-  { "netx-agent",      { NULL }, 2586,  "udp"  },
-  { "masc",            { NULL }, 2587,  "tcp"  },
-  { "masc",            { NULL }, 2587,  "udp"  },
-  { "privilege",       { NULL }, 2588,  "tcp"  },
-  { "privilege",       { NULL }, 2588,  "udp"  },
-  { "quartus-tcl",     { NULL }, 2589,  "tcp"  },
-  { "quartus-tcl",     { NULL }, 2589,  "udp"  },
-  { "idotdist",        { NULL }, 2590,  "tcp"  },
-  { "idotdist",        { NULL }, 2590,  "udp"  },
-  { "maytagshuffle",   { NULL }, 2591,  "tcp"  },
-  { "maytagshuffle",   { NULL }, 2591,  "udp"  },
-  { "netrek",          { NULL }, 2592,  "tcp"  },
-  { "netrek",          { NULL }, 2592,  "udp"  },
-  { "mns-mail",        { NULL }, 2593,  "tcp"  },
-  { "mns-mail",        { NULL }, 2593,  "udp"  },
-  { "dts",             { NULL }, 2594,  "tcp"  },
-  { "dts",             { NULL }, 2594,  "udp"  },
-  { "worldfusion1",    { NULL }, 2595,  "tcp"  },
-  { "worldfusion1",    { NULL }, 2595,  "udp"  },
-  { "worldfusion2",    { NULL }, 2596,  "tcp"  },
-  { "worldfusion2",    { NULL }, 2596,  "udp"  },
-  { "homesteadglory",  { NULL }, 2597,  "tcp"  },
-  { "homesteadglory",  { NULL }, 2597,  "udp"  },
-  { "citriximaclient", { NULL }, 2598,  "tcp"  },
-  { "citriximaclient", { NULL }, 2598,  "udp"  },
-  { "snapd",           { NULL }, 2599,  "tcp"  },
-  { "snapd",           { NULL }, 2599,  "udp"  },
-  { "hpstgmgr",        { NULL }, 2600,  "tcp"  },
-  { "hpstgmgr",        { NULL }, 2600,  "udp"  },
-  { "discp-client",    { NULL }, 2601,  "tcp"  },
-  { "discp-client",    { NULL }, 2601,  "udp"  },
-  { "discp-server",    { NULL }, 2602,  "tcp"  },
-  { "discp-server",    { NULL }, 2602,  "udp"  },
-  { "servicemeter",    { NULL }, 2603,  "tcp"  },
-  { "servicemeter",    { NULL }, 2603,  "udp"  },
-  { "nsc-ccs",         { NULL }, 2604,  "tcp"  },
-  { "nsc-ccs",         { NULL }, 2604,  "udp"  },
-  { "nsc-posa",        { NULL }, 2605,  "tcp"  },
-  { "nsc-posa",        { NULL }, 2605,  "udp"  },
-  { "netmon",          { NULL }, 2606,  "tcp"  },
-  { "netmon",          { NULL }, 2606,  "udp"  },
-  { "connection",      { NULL }, 2607,  "tcp"  },
-  { "connection",      { NULL }, 2607,  "udp"  },
-  { "wag-service",     { NULL }, 2608,  "tcp"  },
-  { "wag-service",     { NULL }, 2608,  "udp"  },
-  { "system-monitor",  { NULL }, 2609,  "tcp"  },
-  { "system-monitor",  { NULL }, 2609,  "udp"  },
-  { "versa-tek",       { NULL }, 2610,  "tcp"  },
-  { "versa-tek",       { NULL }, 2610,  "udp"  },
-  { "lionhead",        { NULL }, 2611,  "tcp"  },
-  { "lionhead",        { NULL }, 2611,  "udp"  },
-  { "qpasa-agent",     { NULL }, 2612,  "tcp"  },
-  { "qpasa-agent",     { NULL }, 2612,  "udp"  },
-  { "smntubootstrap",  { NULL }, 2613,  "tcp"  },
-  { "smntubootstrap",  { NULL }, 2613,  "udp"  },
-  { "neveroffline",    { NULL }, 2614,  "tcp"  },
-  { "neveroffline",    { NULL }, 2614,  "udp"  },
-  { "firepower",       { NULL }, 2615,  "tcp"  },
-  { "firepower",       { NULL }, 2615,  "udp"  },
-  { "appswitch-emp",   { NULL }, 2616,  "tcp"  },
-  { "appswitch-emp",   { NULL }, 2616,  "udp"  },
-  { "cmadmin",         { NULL }, 2617,  "tcp"  },
-  { "cmadmin",         { NULL }, 2617,  "udp"  },
-  { "priority-e-com",  { NULL }, 2618,  "tcp"  },
-  { "priority-e-com",  { NULL }, 2618,  "udp"  },
-  { "bruce",           { NULL }, 2619,  "tcp"  },
-  { "bruce",           { NULL }, 2619,  "udp"  },
-  { "lpsrecommender",  { NULL }, 2620,  "tcp"  },
-  { "lpsrecommender",  { NULL }, 2620,  "udp"  },
-  { "miles-apart",     { NULL }, 2621,  "tcp"  },
-  { "miles-apart",     { NULL }, 2621,  "udp"  },
-  { "metricadbc",      { NULL }, 2622,  "tcp"  },
-  { "metricadbc",      { NULL }, 2622,  "udp"  },
-  { "lmdp",            { NULL }, 2623,  "tcp"  },
-  { "lmdp",            { NULL }, 2623,  "udp"  },
-  { "aria",            { NULL }, 2624,  "tcp"  },
-  { "aria",            { NULL }, 2624,  "udp"  },
-  { "blwnkl-port",     { NULL }, 2625,  "tcp"  },
-  { "blwnkl-port",     { NULL }, 2625,  "udp"  },
-  { "gbjd816",         { NULL }, 2626,  "tcp"  },
-  { "gbjd816",         { NULL }, 2626,  "udp"  },
-  { "moshebeeri",      { NULL }, 2627,  "tcp"  },
-  { "moshebeeri",      { NULL }, 2627,  "udp"  },
-  { "dict",            { NULL }, 2628,  "tcp"  },
-  { "dict",            { NULL }, 2628,  "udp"  },
-  { "sitaraserver",    { NULL }, 2629,  "tcp"  },
-  { "sitaraserver",    { NULL }, 2629,  "udp"  },
-  { "sitaramgmt",      { NULL }, 2630,  "tcp"  },
-  { "sitaramgmt",      { NULL }, 2630,  "udp"  },
-  { "sitaradir",       { NULL }, 2631,  "tcp"  },
-  { "sitaradir",       { NULL }, 2631,  "udp"  },
-  { "irdg-post",       { NULL }, 2632,  "tcp"  },
-  { "irdg-post",       { NULL }, 2632,  "udp"  },
-  { "interintelli",    { NULL }, 2633,  "tcp"  },
-  { "interintelli",    { NULL }, 2633,  "udp"  },
-  { "pk-electronics",  { NULL }, 2634,  "tcp"  },
-  { "pk-electronics",  { NULL }, 2634,  "udp"  },
-  { "backburner",      { NULL }, 2635,  "tcp"  },
-  { "backburner",      { NULL }, 2635,  "udp"  },
-  { "solve",           { NULL }, 2636,  "tcp"  },
-  { "solve",           { NULL }, 2636,  "udp"  },
-  { "imdocsvc",        { NULL }, 2637,  "tcp"  },
-  { "imdocsvc",        { NULL }, 2637,  "udp"  },
-  { "sybaseanywhere",  { NULL }, 2638,  "tcp"  },
-  { "sybaseanywhere",  { NULL }, 2638,  "udp"  },
-  { "aminet",          { NULL }, 2639,  "tcp"  },
-  { "aminet",          { NULL }, 2639,  "udp"  },
-  { "sai_sentlm",      { NULL }, 2640,  "tcp"  },
-  { "sai_sentlm",      { NULL }, 2640,  "udp"  },
-  { "hdl-srv",         { NULL }, 2641,  "tcp"  },
-  { "hdl-srv",         { NULL }, 2641,  "udp"  },
-  { "tragic",          { NULL }, 2642,  "tcp"  },
-  { "tragic",          { NULL }, 2642,  "udp"  },
-  { "gte-samp",        { NULL }, 2643,  "tcp"  },
-  { "gte-samp",        { NULL }, 2643,  "udp"  },
-  { "travsoft-ipx-t",  { NULL }, 2644,  "tcp"  },
-  { "travsoft-ipx-t",  { NULL }, 2644,  "udp"  },
-  { "novell-ipx-cmd",  { NULL }, 2645,  "tcp"  },
-  { "novell-ipx-cmd",  { NULL }, 2645,  "udp"  },
-  { "and-lm",          { NULL }, 2646,  "tcp"  },
-  { "and-lm",          { NULL }, 2646,  "udp"  },
-  { "syncserver",      { NULL }, 2647,  "tcp"  },
-  { "syncserver",      { NULL }, 2647,  "udp"  },
-  { "upsnotifyprot",   { NULL }, 2648,  "tcp"  },
-  { "upsnotifyprot",   { NULL }, 2648,  "udp"  },
-  { "vpsipport",       { NULL }, 2649,  "tcp"  },
-  { "vpsipport",       { NULL }, 2649,  "udp"  },
-  { "eristwoguns",     { NULL }, 2650,  "tcp"  },
-  { "eristwoguns",     { NULL }, 2650,  "udp"  },
-  { "ebinsite",        { NULL }, 2651,  "tcp"  },
-  { "ebinsite",        { NULL }, 2651,  "udp"  },
-  { "interpathpanel",  { NULL }, 2652,  "tcp"  },
-  { "interpathpanel",  { NULL }, 2652,  "udp"  },
-  { "sonus",           { NULL }, 2653,  "tcp"  },
-  { "sonus",           { NULL }, 2653,  "udp"  },
-  { "corel_vncadmin",  { NULL }, 2654,  "tcp"  },
-  { "corel_vncadmin",  { NULL }, 2654,  "udp"  },
-  { "unglue",          { NULL }, 2655,  "tcp"  },
-  { "unglue",          { NULL }, 2655,  "udp"  },
-  { "kana",            { NULL }, 2656,  "tcp"  },
-  { "kana",            { NULL }, 2656,  "udp"  },
-  { "sns-dispatcher",  { NULL }, 2657,  "tcp"  },
-  { "sns-dispatcher",  { NULL }, 2657,  "udp"  },
-  { "sns-admin",       { NULL }, 2658,  "tcp"  },
-  { "sns-admin",       { NULL }, 2658,  "udp"  },
-  { "sns-query",       { NULL }, 2659,  "tcp"  },
-  { "sns-query",       { NULL }, 2659,  "udp"  },
-  { "gcmonitor",       { NULL }, 2660,  "tcp"  },
-  { "gcmonitor",       { NULL }, 2660,  "udp"  },
-  { "olhost",          { NULL }, 2661,  "tcp"  },
-  { "olhost",          { NULL }, 2661,  "udp"  },
-  { "bintec-capi",     { NULL }, 2662,  "tcp"  },
-  { "bintec-capi",     { NULL }, 2662,  "udp"  },
-  { "bintec-tapi",     { NULL }, 2663,  "tcp"  },
-  { "bintec-tapi",     { NULL }, 2663,  "udp"  },
-  { "patrol-mq-gm",    { NULL }, 2664,  "tcp"  },
-  { "patrol-mq-gm",    { NULL }, 2664,  "udp"  },
-  { "patrol-mq-nm",    { NULL }, 2665,  "tcp"  },
-  { "patrol-mq-nm",    { NULL }, 2665,  "udp"  },
-  { "extensis",        { NULL }, 2666,  "tcp"  },
-  { "extensis",        { NULL }, 2666,  "udp"  },
-  { "alarm-clock-s",   { NULL }, 2667,  "tcp"  },
-  { "alarm-clock-s",   { NULL }, 2667,  "udp"  },
-  { "alarm-clock-c",   { NULL }, 2668,  "tcp"  },
-  { "alarm-clock-c",   { NULL }, 2668,  "udp"  },
-  { "toad",            { NULL }, 2669,  "tcp"  },
-  { "toad",            { NULL }, 2669,  "udp"  },
-  { "tve-announce",    { NULL }, 2670,  "tcp"  },
-  { "tve-announce",    { NULL }, 2670,  "udp"  },
-  { "newlixreg",       { NULL }, 2671,  "tcp"  },
-  { "newlixreg",       { NULL }, 2671,  "udp"  },
-  { "nhserver",        { NULL }, 2672,  "tcp"  },
-  { "nhserver",        { NULL }, 2672,  "udp"  },
-  { "firstcall42",     { NULL }, 2673,  "tcp"  },
-  { "firstcall42",     { NULL }, 2673,  "udp"  },
-  { "ewnn",            { NULL }, 2674,  "tcp"  },
-  { "ewnn",            { NULL }, 2674,  "udp"  },
-  { "ttc-etap",        { NULL }, 2675,  "tcp"  },
-  { "ttc-etap",        { NULL }, 2675,  "udp"  },
-  { "simslink",        { NULL }, 2676,  "tcp"  },
-  { "simslink",        { NULL }, 2676,  "udp"  },
-  { "gadgetgate1way",  { NULL }, 2677,  "tcp"  },
-  { "gadgetgate1way",  { NULL }, 2677,  "udp"  },
-  { "gadgetgate2way",  { NULL }, 2678,  "tcp"  },
-  { "gadgetgate2way",  { NULL }, 2678,  "udp"  },
-  { "syncserverssl",   { NULL }, 2679,  "tcp"  },
-  { "syncserverssl",   { NULL }, 2679,  "udp"  },
-  { "pxc-sapxom",      { NULL }, 2680,  "tcp"  },
-  { "pxc-sapxom",      { NULL }, 2680,  "udp"  },
-  { "mpnjsomb",        { NULL }, 2681,  "tcp"  },
-  { "mpnjsomb",        { NULL }, 2681,  "udp"  },
-  { "ncdloadbalance",  { NULL }, 2683,  "tcp"  },
-  { "ncdloadbalance",  { NULL }, 2683,  "udp"  },
-  { "mpnjsosv",        { NULL }, 2684,  "tcp"  },
-  { "mpnjsosv",        { NULL }, 2684,  "udp"  },
-  { "mpnjsocl",        { NULL }, 2685,  "tcp"  },
-  { "mpnjsocl",        { NULL }, 2685,  "udp"  },
-  { "mpnjsomg",        { NULL }, 2686,  "tcp"  },
-  { "mpnjsomg",        { NULL }, 2686,  "udp"  },
-  { "pq-lic-mgmt",     { NULL }, 2687,  "tcp"  },
-  { "pq-lic-mgmt",     { NULL }, 2687,  "udp"  },
-  { "md-cg-http",      { NULL }, 2688,  "tcp"  },
-  { "md-cg-http",      { NULL }, 2688,  "udp"  },
-  { "fastlynx",        { NULL }, 2689,  "tcp"  },
-  { "fastlynx",        { NULL }, 2689,  "udp"  },
-  { "hp-nnm-data",     { NULL }, 2690,  "tcp"  },
-  { "hp-nnm-data",     { NULL }, 2690,  "udp"  },
-  { "itinternet",      { NULL }, 2691,  "tcp"  },
-  { "itinternet",      { NULL }, 2691,  "udp"  },
-  { "admins-lms",      { NULL }, 2692,  "tcp"  },
-  { "admins-lms",      { NULL }, 2692,  "udp"  },
-  { "pwrsevent",       { NULL }, 2694,  "tcp"  },
-  { "pwrsevent",       { NULL }, 2694,  "udp"  },
-  { "vspread",         { NULL }, 2695,  "tcp"  },
-  { "vspread",         { NULL }, 2695,  "udp"  },
-  { "unifyadmin",      { NULL }, 2696,  "tcp"  },
-  { "unifyadmin",      { NULL }, 2696,  "udp"  },
-  { "oce-snmp-trap",   { NULL }, 2697,  "tcp"  },
-  { "oce-snmp-trap",   { NULL }, 2697,  "udp"  },
-  { "mck-ivpip",       { NULL }, 2698,  "tcp"  },
-  { "mck-ivpip",       { NULL }, 2698,  "udp"  },
-  { "csoft-plusclnt",  { NULL }, 2699,  "tcp"  },
-  { "csoft-plusclnt",  { NULL }, 2699,  "udp"  },
-  { "tqdata",          { NULL }, 2700,  "tcp"  },
-  { "tqdata",          { NULL }, 2700,  "udp"  },
-  { "sms-rcinfo",      { NULL }, 2701,  "tcp"  },
-  { "sms-rcinfo",      { NULL }, 2701,  "udp"  },
-  { "sms-xfer",        { NULL }, 2702,  "tcp"  },
-  { "sms-xfer",        { NULL }, 2702,  "udp"  },
-  { "sms-chat",        { NULL }, 2703,  "tcp"  },
-  { "sms-chat",        { NULL }, 2703,  "udp"  },
-  { "sms-remctrl",     { NULL }, 2704,  "tcp"  },
-  { "sms-remctrl",     { NULL }, 2704,  "udp"  },
-  { "sds-admin",       { NULL }, 2705,  "tcp"  },
-  { "sds-admin",       { NULL }, 2705,  "udp"  },
-  { "ncdmirroring",    { NULL }, 2706,  "tcp"  },
-  { "ncdmirroring",    { NULL }, 2706,  "udp"  },
-  { "emcsymapiport",   { NULL }, 2707,  "tcp"  },
-  { "emcsymapiport",   { NULL }, 2707,  "udp"  },
-  { "banyan-net",      { NULL }, 2708,  "tcp"  },
-  { "banyan-net",      { NULL }, 2708,  "udp"  },
-  { "supermon",        { NULL }, 2709,  "tcp"  },
-  { "supermon",        { NULL }, 2709,  "udp"  },
-  { "sso-service",     { NULL }, 2710,  "tcp"  },
-  { "sso-service",     { NULL }, 2710,  "udp"  },
-  { "sso-control",     { NULL }, 2711,  "tcp"  },
-  { "sso-control",     { NULL }, 2711,  "udp"  },
-  { "aocp",            { NULL }, 2712,  "tcp"  },
-  { "aocp",            { NULL }, 2712,  "udp"  },
-  { "raventbs",        { NULL }, 2713,  "tcp"  },
-  { "raventbs",        { NULL }, 2713,  "udp"  },
-  { "raventdm",        { NULL }, 2714,  "tcp"  },
-  { "raventdm",        { NULL }, 2714,  "udp"  },
-  { "hpstgmgr2",       { NULL }, 2715,  "tcp"  },
-  { "hpstgmgr2",       { NULL }, 2715,  "udp"  },
-  { "inova-ip-disco",  { NULL }, 2716,  "tcp"  },
-  { "inova-ip-disco",  { NULL }, 2716,  "udp"  },
-  { "pn-requester",    { NULL }, 2717,  "tcp"  },
-  { "pn-requester",    { NULL }, 2717,  "udp"  },
-  { "pn-requester2",   { NULL }, 2718,  "tcp"  },
-  { "pn-requester2",   { NULL }, 2718,  "udp"  },
-  { "scan-change",     { NULL }, 2719,  "tcp"  },
-  { "scan-change",     { NULL }, 2719,  "udp"  },
-  { "wkars",           { NULL }, 2720,  "tcp"  },
-  { "wkars",           { NULL }, 2720,  "udp"  },
-  { "smart-diagnose",  { NULL }, 2721,  "tcp"  },
-  { "smart-diagnose",  { NULL }, 2721,  "udp"  },
-  { "proactivesrvr",   { NULL }, 2722,  "tcp"  },
-  { "proactivesrvr",   { NULL }, 2722,  "udp"  },
-  { "watchdog-nt",     { NULL }, 2723,  "tcp"  },
-  { "watchdog-nt",     { NULL }, 2723,  "udp"  },
-  { "qotps",           { NULL }, 2724,  "tcp"  },
-  { "qotps",           { NULL }, 2724,  "udp"  },
-  { "msolap-ptp2",     { NULL }, 2725,  "tcp"  },
-  { "msolap-ptp2",     { NULL }, 2725,  "udp"  },
-  { "tams",            { NULL }, 2726,  "tcp"  },
-  { "tams",            { NULL }, 2726,  "udp"  },
-  { "mgcp-callagent",  { NULL }, 2727,  "tcp"  },
-  { "mgcp-callagent",  { NULL }, 2727,  "udp"  },
-  { "sqdr",            { NULL }, 2728,  "tcp"  },
-  { "sqdr",            { NULL }, 2728,  "udp"  },
-  { "tcim-control",    { NULL }, 2729,  "tcp"  },
-  { "tcim-control",    { NULL }, 2729,  "udp"  },
-  { "nec-raidplus",    { NULL }, 2730,  "tcp"  },
-  { "nec-raidplus",    { NULL }, 2730,  "udp"  },
-  { "fyre-messanger",  { NULL }, 2731,  "tcp"  },
-  { "fyre-messanger",  { NULL }, 2731,  "udp"  },
-  { "g5m",             { NULL }, 2732,  "tcp"  },
-  { "g5m",             { NULL }, 2732,  "udp"  },
-  { "signet-ctf",      { NULL }, 2733,  "tcp"  },
-  { "signet-ctf",      { NULL }, 2733,  "udp"  },
-  { "ccs-software",    { NULL }, 2734,  "tcp"  },
-  { "ccs-software",    { NULL }, 2734,  "udp"  },
-  { "netiq-mc",        { NULL }, 2735,  "tcp"  },
-  { "netiq-mc",        { NULL }, 2735,  "udp"  },
-  { "radwiz-nms-srv",  { NULL }, 2736,  "tcp"  },
-  { "radwiz-nms-srv",  { NULL }, 2736,  "udp"  },
-  { "srp-feedback",    { NULL }, 2737,  "tcp"  },
-  { "srp-feedback",    { NULL }, 2737,  "udp"  },
-  { "ndl-tcp-ois-gw",  { NULL }, 2738,  "tcp"  },
-  { "ndl-tcp-ois-gw",  { NULL }, 2738,  "udp"  },
-  { "tn-timing",       { NULL }, 2739,  "tcp"  },
-  { "tn-timing",       { NULL }, 2739,  "udp"  },
-  { "alarm",           { NULL }, 2740,  "tcp"  },
-  { "alarm",           { NULL }, 2740,  "udp"  },
-  { "tsb",             { NULL }, 2741,  "tcp"  },
-  { "tsb",             { NULL }, 2741,  "udp"  },
-  { "tsb2",            { NULL }, 2742,  "tcp"  },
-  { "tsb2",            { NULL }, 2742,  "udp"  },
-  { "murx",            { NULL }, 2743,  "tcp"  },
-  { "murx",            { NULL }, 2743,  "udp"  },
-  { "honyaku",         { NULL }, 2744,  "tcp"  },
-  { "honyaku",         { NULL }, 2744,  "udp"  },
-  { "urbisnet",        { NULL }, 2745,  "tcp"  },
-  { "urbisnet",        { NULL }, 2745,  "udp"  },
-  { "cpudpencap",      { NULL }, 2746,  "tcp"  },
-  { "cpudpencap",      { NULL }, 2746,  "udp"  },
-  { "fjippol-swrly",   { NULL }, 2747,  "tcp"  },
-  { "fjippol-swrly",   { NULL }, 2747,  "udp"  },
-  { "fjippol-polsvr",  { NULL }, 2748,  "tcp"  },
-  { "fjippol-polsvr",  { NULL }, 2748,  "udp"  },
-  { "fjippol-cnsl",    { NULL }, 2749,  "tcp"  },
-  { "fjippol-cnsl",    { NULL }, 2749,  "udp"  },
-  { "fjippol-port1",   { NULL }, 2750,  "tcp"  },
-  { "fjippol-port1",   { NULL }, 2750,  "udp"  },
-  { "fjippol-port2",   { NULL }, 2751,  "tcp"  },
-  { "fjippol-port2",   { NULL }, 2751,  "udp"  },
-  { "rsisysaccess",    { NULL }, 2752,  "tcp"  },
-  { "rsisysaccess",    { NULL }, 2752,  "udp"  },
-  { "de-spot",         { NULL }, 2753,  "tcp"  },
-  { "de-spot",         { NULL }, 2753,  "udp"  },
-  { "apollo-cc",       { NULL }, 2754,  "tcp"  },
-  { "apollo-cc",       { NULL }, 2754,  "udp"  },
-  { "expresspay",      { NULL }, 2755,  "tcp"  },
-  { "expresspay",      { NULL }, 2755,  "udp"  },
-  { "simplement-tie",  { NULL }, 2756,  "tcp"  },
-  { "simplement-tie",  { NULL }, 2756,  "udp"  },
-  { "cnrp",            { NULL }, 2757,  "tcp"  },
-  { "cnrp",            { NULL }, 2757,  "udp"  },
-  { "apollo-status",   { NULL }, 2758,  "tcp"  },
-  { "apollo-status",   { NULL }, 2758,  "udp"  },
-  { "apollo-gms",      { NULL }, 2759,  "tcp"  },
-  { "apollo-gms",      { NULL }, 2759,  "udp"  },
-  { "sabams",          { NULL }, 2760,  "tcp"  },
-  { "sabams",          { NULL }, 2760,  "udp"  },
-  { "dicom-iscl",      { NULL }, 2761,  "tcp"  },
-  { "dicom-iscl",      { NULL }, 2761,  "udp"  },
-  { "dicom-tls",       { NULL }, 2762,  "tcp"  },
-  { "dicom-tls",       { NULL }, 2762,  "udp"  },
-  { "desktop-dna",     { NULL }, 2763,  "tcp"  },
-  { "desktop-dna",     { NULL }, 2763,  "udp"  },
-  { "data-insurance",  { NULL }, 2764,  "tcp"  },
-  { "data-insurance",  { NULL }, 2764,  "udp"  },
-  { "qip-audup",       { NULL }, 2765,  "tcp"  },
-  { "qip-audup",       { NULL }, 2765,  "udp"  },
-  { "compaq-scp",      { NULL }, 2766,  "tcp"  },
-  { "compaq-scp",      { NULL }, 2766,  "udp"  },
-  { "uadtc",           { NULL }, 2767,  "tcp"  },
-  { "uadtc",           { NULL }, 2767,  "udp"  },
-  { "uacs",            { NULL }, 2768,  "tcp"  },
-  { "uacs",            { NULL }, 2768,  "udp"  },
-  { "exce",            { NULL }, 2769,  "tcp"  },
-  { "exce",            { NULL }, 2769,  "udp"  },
-  { "veronica",        { NULL }, 2770,  "tcp"  },
-  { "veronica",        { NULL }, 2770,  "udp"  },
-  { "vergencecm",      { NULL }, 2771,  "tcp"  },
-  { "vergencecm",      { NULL }, 2771,  "udp"  },
-  { "auris",           { NULL }, 2772,  "tcp"  },
-  { "auris",           { NULL }, 2772,  "udp"  },
-  { "rbakcup1",        { NULL }, 2773,  "tcp"  },
-  { "rbakcup1",        { NULL }, 2773,  "udp"  },
-  { "rbakcup2",        { NULL }, 2774,  "tcp"  },
-  { "rbakcup2",        { NULL }, 2774,  "udp"  },
-  { "smpp",            { NULL }, 2775,  "tcp"  },
-  { "smpp",            { NULL }, 2775,  "udp"  },
-  { "ridgeway1",       { NULL }, 2776,  "tcp"  },
-  { "ridgeway1",       { NULL }, 2776,  "udp"  },
-  { "ridgeway2",       { NULL }, 2777,  "tcp"  },
-  { "ridgeway2",       { NULL }, 2777,  "udp"  },
-  { "gwen-sonya",      { NULL }, 2778,  "tcp"  },
-  { "gwen-sonya",      { NULL }, 2778,  "udp"  },
-  { "lbc-sync",        { NULL }, 2779,  "tcp"  },
-  { "lbc-sync",        { NULL }, 2779,  "udp"  },
-  { "lbc-control",     { NULL }, 2780,  "tcp"  },
-  { "lbc-control",     { NULL }, 2780,  "udp"  },
-  { "whosells",        { NULL }, 2781,  "tcp"  },
-  { "whosells",        { NULL }, 2781,  "udp"  },
-  { "everydayrc",      { NULL }, 2782,  "tcp"  },
-  { "everydayrc",      { NULL }, 2782,  "udp"  },
-  { "aises",           { NULL }, 2783,  "tcp"  },
-  { "aises",           { NULL }, 2783,  "udp"  },
-  { "www-dev",         { NULL }, 2784,  "tcp"  },
-  { "www-dev",         { NULL }, 2784,  "udp"  },
-  { "aic-np",          { NULL }, 2785,  "tcp"  },
-  { "aic-np",          { NULL }, 2785,  "udp"  },
-  { "aic-oncrpc",      { NULL }, 2786,  "tcp"  },
-  { "aic-oncrpc",      { NULL }, 2786,  "udp"  },
-  { "piccolo",         { NULL }, 2787,  "tcp"  },
-  { "piccolo",         { NULL }, 2787,  "udp"  },
-  { "fryeserv",        { NULL }, 2788,  "tcp"  },
-  { "fryeserv",        { NULL }, 2788,  "udp"  },
-  { "media-agent",     { NULL }, 2789,  "tcp"  },
-  { "media-agent",     { NULL }, 2789,  "udp"  },
-  { "plgproxy",        { NULL }, 2790,  "tcp"  },
-  { "plgproxy",        { NULL }, 2790,  "udp"  },
-  { "mtport-regist",   { NULL }, 2791,  "tcp"  },
-  { "mtport-regist",   { NULL }, 2791,  "udp"  },
-  { "f5-globalsite",   { NULL }, 2792,  "tcp"  },
-  { "f5-globalsite",   { NULL }, 2792,  "udp"  },
-  { "initlsmsad",      { NULL }, 2793,  "tcp"  },
-  { "initlsmsad",      { NULL }, 2793,  "udp"  },
-  { "livestats",       { NULL }, 2795,  "tcp"  },
-  { "livestats",       { NULL }, 2795,  "udp"  },
-  { "ac-tech",         { NULL }, 2796,  "tcp"  },
-  { "ac-tech",         { NULL }, 2796,  "udp"  },
-  { "esp-encap",       { NULL }, 2797,  "tcp"  },
-  { "esp-encap",       { NULL }, 2797,  "udp"  },
-  { "tmesis-upshot",   { NULL }, 2798,  "tcp"  },
-  { "tmesis-upshot",   { NULL }, 2798,  "udp"  },
-  { "icon-discover",   { NULL }, 2799,  "tcp"  },
-  { "icon-discover",   { NULL }, 2799,  "udp"  },
-  { "acc-raid",        { NULL }, 2800,  "tcp"  },
-  { "acc-raid",        { NULL }, 2800,  "udp"  },
-  { "igcp",            { NULL }, 2801,  "tcp"  },
-  { "igcp",            { NULL }, 2801,  "udp"  },
-  { "veritas-tcp1",    { NULL }, 2802,  "tcp"  },
-  { "veritas-udp1",    { NULL }, 2802,  "udp"  },
-  { "btprjctrl",       { NULL }, 2803,  "tcp"  },
-  { "btprjctrl",       { NULL }, 2803,  "udp"  },
-  { "dvr-esm",         { NULL }, 2804,  "tcp"  },
-  { "dvr-esm",         { NULL }, 2804,  "udp"  },
-  { "wta-wsp-s",       { NULL }, 2805,  "tcp"  },
-  { "wta-wsp-s",       { NULL }, 2805,  "udp"  },
-  { "cspuni",          { NULL }, 2806,  "tcp"  },
-  { "cspuni",          { NULL }, 2806,  "udp"  },
-  { "cspmulti",        { NULL }, 2807,  "tcp"  },
-  { "cspmulti",        { NULL }, 2807,  "udp"  },
-  { "j-lan-p",         { NULL }, 2808,  "tcp"  },
-  { "j-lan-p",         { NULL }, 2808,  "udp"  },
-  { "corbaloc",        { NULL }, 2809,  "tcp"  },
-  { "corbaloc",        { NULL }, 2809,  "udp"  },
-  { "netsteward",      { NULL }, 2810,  "tcp"  },
-  { "netsteward",      { NULL }, 2810,  "udp"  },
-  { "gsiftp",          { NULL }, 2811,  "tcp"  },
-  { "gsiftp",          { NULL }, 2811,  "udp"  },
-  { "atmtcp",          { NULL }, 2812,  "tcp"  },
-  { "atmtcp",          { NULL }, 2812,  "udp"  },
-  { "llm-pass",        { NULL }, 2813,  "tcp"  },
-  { "llm-pass",        { NULL }, 2813,  "udp"  },
-  { "llm-csv",         { NULL }, 2814,  "tcp"  },
-  { "llm-csv",         { NULL }, 2814,  "udp"  },
-  { "lbc-measure",     { NULL }, 2815,  "tcp"  },
-  { "lbc-measure",     { NULL }, 2815,  "udp"  },
-  { "lbc-watchdog",    { NULL }, 2816,  "tcp"  },
-  { "lbc-watchdog",    { NULL }, 2816,  "udp"  },
-  { "nmsigport",       { NULL }, 2817,  "tcp"  },
-  { "nmsigport",       { NULL }, 2817,  "udp"  },
-  { "rmlnk",           { NULL }, 2818,  "tcp"  },
-  { "rmlnk",           { NULL }, 2818,  "udp"  },
-  { "fc-faultnotify",  { NULL }, 2819,  "tcp"  },
-  { "fc-faultnotify",  { NULL }, 2819,  "udp"  },
-  { "univision",       { NULL }, 2820,  "tcp"  },
-  { "univision",       { NULL }, 2820,  "udp"  },
-  { "vrts-at-port",    { NULL }, 2821,  "tcp"  },
-  { "vrts-at-port",    { NULL }, 2821,  "udp"  },
-  { "ka0wuc",          { NULL }, 2822,  "tcp"  },
-  { "ka0wuc",          { NULL }, 2822,  "udp"  },
-  { "cqg-netlan",      { NULL }, 2823,  "tcp"  },
-  { "cqg-netlan",      { NULL }, 2823,  "udp"  },
-  { "cqg-netlan-1",    { NULL }, 2824,  "tcp"  },
-  { "cqg-netlan-1",    { NULL }, 2824,  "udp"  },
-  { "slc-systemlog",   { NULL }, 2826,  "tcp"  },
-  { "slc-systemlog",   { NULL }, 2826,  "udp"  },
-  { "slc-ctrlrloops",  { NULL }, 2827,  "tcp"  },
-  { "slc-ctrlrloops",  { NULL }, 2827,  "udp"  },
-  { "itm-lm",          { NULL }, 2828,  "tcp"  },
-  { "itm-lm",          { NULL }, 2828,  "udp"  },
-  { "silkp1",          { NULL }, 2829,  "tcp"  },
-  { "silkp1",          { NULL }, 2829,  "udp"  },
-  { "silkp2",          { NULL }, 2830,  "tcp"  },
-  { "silkp2",          { NULL }, 2830,  "udp"  },
-  { "silkp3",          { NULL }, 2831,  "tcp"  },
-  { "silkp3",          { NULL }, 2831,  "udp"  },
-  { "silkp4",          { NULL }, 2832,  "tcp"  },
-  { "silkp4",          { NULL }, 2832,  "udp"  },
-  { "glishd",          { NULL }, 2833,  "tcp"  },
-  { "glishd",          { NULL }, 2833,  "udp"  },
-  { "evtp",            { NULL }, 2834,  "tcp"  },
-  { "evtp",            { NULL }, 2834,  "udp"  },
-  { "evtp-data",       { NULL }, 2835,  "tcp"  },
-  { "evtp-data",       { NULL }, 2835,  "udp"  },
-  { "catalyst",        { NULL }, 2836,  "tcp"  },
-  { "catalyst",        { NULL }, 2836,  "udp"  },
-  { "repliweb",        { NULL }, 2837,  "tcp"  },
-  { "repliweb",        { NULL }, 2837,  "udp"  },
-  { "starbot",         { NULL }, 2838,  "tcp"  },
-  { "starbot",         { NULL }, 2838,  "udp"  },
-  { "nmsigport",       { NULL }, 2839,  "tcp"  },
-  { "nmsigport",       { NULL }, 2839,  "udp"  },
-  { "l3-exprt",        { NULL }, 2840,  "tcp"  },
-  { "l3-exprt",        { NULL }, 2840,  "udp"  },
-  { "l3-ranger",       { NULL }, 2841,  "tcp"  },
-  { "l3-ranger",       { NULL }, 2841,  "udp"  },
-  { "l3-hawk",         { NULL }, 2842,  "tcp"  },
-  { "l3-hawk",         { NULL }, 2842,  "udp"  },
-  { "pdnet",           { NULL }, 2843,  "tcp"  },
-  { "pdnet",           { NULL }, 2843,  "udp"  },
-  { "bpcp-poll",       { NULL }, 2844,  "tcp"  },
-  { "bpcp-poll",       { NULL }, 2844,  "udp"  },
-  { "bpcp-trap",       { NULL }, 2845,  "tcp"  },
-  { "bpcp-trap",       { NULL }, 2845,  "udp"  },
-  { "aimpp-hello",     { NULL }, 2846,  "tcp"  },
-  { "aimpp-hello",     { NULL }, 2846,  "udp"  },
-  { "aimpp-port-req",  { NULL }, 2847,  "tcp"  },
-  { "aimpp-port-req",  { NULL }, 2847,  "udp"  },
-  { "amt-blc-port",    { NULL }, 2848,  "tcp"  },
-  { "amt-blc-port",    { NULL }, 2848,  "udp"  },
-  { "fxp",             { NULL }, 2849,  "tcp"  },
-  { "fxp",             { NULL }, 2849,  "udp"  },
-  { "metaconsole",     { NULL }, 2850,  "tcp"  },
-  { "metaconsole",     { NULL }, 2850,  "udp"  },
-  { "webemshttp",      { NULL }, 2851,  "tcp"  },
-  { "webemshttp",      { NULL }, 2851,  "udp"  },
-  { "bears-01",        { NULL }, 2852,  "tcp"  },
-  { "bears-01",        { NULL }, 2852,  "udp"  },
-  { "ispipes",         { NULL }, 2853,  "tcp"  },
-  { "ispipes",         { NULL }, 2853,  "udp"  },
-  { "infomover",       { NULL }, 2854,  "tcp"  },
-  { "infomover",       { NULL }, 2854,  "udp"  },
-  { "msrp",            { NULL }, 2855,  "tcp"  },
-  { "msrp",            { NULL }, 2855,  "udp"  },
-  { "cesdinv",         { NULL }, 2856,  "tcp"  },
-  { "cesdinv",         { NULL }, 2856,  "udp"  },
-  { "simctlp",         { NULL }, 2857,  "tcp"  },
-  { "simctlp",         { NULL }, 2857,  "udp"  },
-  { "ecnp",            { NULL }, 2858,  "tcp"  },
-  { "ecnp",            { NULL }, 2858,  "udp"  },
-  { "activememory",    { NULL }, 2859,  "tcp"  },
-  { "activememory",    { NULL }, 2859,  "udp"  },
-  { "dialpad-voice1",  { NULL }, 2860,  "tcp"  },
-  { "dialpad-voice1",  { NULL }, 2860,  "udp"  },
-  { "dialpad-voice2",  { NULL }, 2861,  "tcp"  },
-  { "dialpad-voice2",  { NULL }, 2861,  "udp"  },
-  { "ttg-protocol",    { NULL }, 2862,  "tcp"  },
-  { "ttg-protocol",    { NULL }, 2862,  "udp"  },
-  { "sonardata",       { NULL }, 2863,  "tcp"  },
-  { "sonardata",       { NULL }, 2863,  "udp"  },
-  { "astromed-main",   { NULL }, 2864,  "tcp"  },
-  { "astromed-main",   { NULL }, 2864,  "udp"  },
-  { "pit-vpn",         { NULL }, 2865,  "tcp"  },
-  { "pit-vpn",         { NULL }, 2865,  "udp"  },
-  { "iwlistener",      { NULL }, 2866,  "tcp"  },
-  { "iwlistener",      { NULL }, 2866,  "udp"  },
-  { "esps-portal",     { NULL }, 2867,  "tcp"  },
-  { "esps-portal",     { NULL }, 2867,  "udp"  },
-  { "npep-messaging",  { NULL }, 2868,  "tcp"  },
-  { "npep-messaging",  { NULL }, 2868,  "udp"  },
-  { "icslap",          { NULL }, 2869,  "tcp"  },
-  { "icslap",          { NULL }, 2869,  "udp"  },
-  { "daishi",          { NULL }, 2870,  "tcp"  },
-  { "daishi",          { NULL }, 2870,  "udp"  },
-  { "msi-selectplay",  { NULL }, 2871,  "tcp"  },
-  { "msi-selectplay",  { NULL }, 2871,  "udp"  },
-  { "radix",           { NULL }, 2872,  "tcp"  },
-  { "radix",           { NULL }, 2872,  "udp"  },
-  { "dxmessagebase1",  { NULL }, 2874,  "tcp"  },
-  { "dxmessagebase1",  { NULL }, 2874,  "udp"  },
-  { "dxmessagebase2",  { NULL }, 2875,  "tcp"  },
-  { "dxmessagebase2",  { NULL }, 2875,  "udp"  },
-  { "sps-tunnel",      { NULL }, 2876,  "tcp"  },
-  { "sps-tunnel",      { NULL }, 2876,  "udp"  },
-  { "bluelance",       { NULL }, 2877,  "tcp"  },
-  { "bluelance",       { NULL }, 2877,  "udp"  },
-  { "aap",             { NULL }, 2878,  "tcp"  },
-  { "aap",             { NULL }, 2878,  "udp"  },
-  { "ucentric-ds",     { NULL }, 2879,  "tcp"  },
-  { "ucentric-ds",     { NULL }, 2879,  "udp"  },
-  { "synapse",         { NULL }, 2880,  "tcp"  },
-  { "synapse",         { NULL }, 2880,  "udp"  },
-  { "ndsp",            { NULL }, 2881,  "tcp"  },
-  { "ndsp",            { NULL }, 2881,  "udp"  },
-  { "ndtp",            { NULL }, 2882,  "tcp"  },
-  { "ndtp",            { NULL }, 2882,  "udp"  },
-  { "ndnp",            { NULL }, 2883,  "tcp"  },
-  { "ndnp",            { NULL }, 2883,  "udp"  },
-  { "flashmsg",        { NULL }, 2884,  "tcp"  },
-  { "flashmsg",        { NULL }, 2884,  "udp"  },
-  { "topflow",         { NULL }, 2885,  "tcp"  },
-  { "topflow",         { NULL }, 2885,  "udp"  },
-  { "responselogic",   { NULL }, 2886,  "tcp"  },
-  { "responselogic",   { NULL }, 2886,  "udp"  },
-  { "aironetddp",      { NULL }, 2887,  "tcp"  },
-  { "aironetddp",      { NULL }, 2887,  "udp"  },
-  { "spcsdlobby",      { NULL }, 2888,  "tcp"  },
-  { "spcsdlobby",      { NULL }, 2888,  "udp"  },
-  { "rsom",            { NULL }, 2889,  "tcp"  },
-  { "rsom",            { NULL }, 2889,  "udp"  },
-  { "cspclmulti",      { NULL }, 2890,  "tcp"  },
-  { "cspclmulti",      { NULL }, 2890,  "udp"  },
-  { "cinegrfx-elmd",   { NULL }, 2891,  "tcp"  },
-  { "cinegrfx-elmd",   { NULL }, 2891,  "udp"  },
-  { "snifferdata",     { NULL }, 2892,  "tcp"  },
-  { "snifferdata",     { NULL }, 2892,  "udp"  },
-  { "vseconnector",    { NULL }, 2893,  "tcp"  },
-  { "vseconnector",    { NULL }, 2893,  "udp"  },
-  { "abacus-remote",   { NULL }, 2894,  "tcp"  },
-  { "abacus-remote",   { NULL }, 2894,  "udp"  },
-  { "natuslink",       { NULL }, 2895,  "tcp"  },
-  { "natuslink",       { NULL }, 2895,  "udp"  },
-  { "ecovisiong6-1",   { NULL }, 2896,  "tcp"  },
-  { "ecovisiong6-1",   { NULL }, 2896,  "udp"  },
-  { "citrix-rtmp",     { NULL }, 2897,  "tcp"  },
-  { "citrix-rtmp",     { NULL }, 2897,  "udp"  },
-  { "appliance-cfg",   { NULL }, 2898,  "tcp"  },
-  { "appliance-cfg",   { NULL }, 2898,  "udp"  },
-  { "powergemplus",    { NULL }, 2899,  "tcp"  },
-  { "powergemplus",    { NULL }, 2899,  "udp"  },
-  { "quicksuite",      { NULL }, 2900,  "tcp"  },
-  { "quicksuite",      { NULL }, 2900,  "udp"  },
-  { "allstorcns",      { NULL }, 2901,  "tcp"  },
-  { "allstorcns",      { NULL }, 2901,  "udp"  },
-  { "netaspi",         { NULL }, 2902,  "tcp"  },
-  { "netaspi",         { NULL }, 2902,  "udp"  },
-  { "suitcase",        { NULL }, 2903,  "tcp"  },
-  { "suitcase",        { NULL }, 2903,  "udp"  },
-  { "m2ua",            { NULL }, 2904,  "tcp"  },
-  { "m2ua",            { NULL }, 2904,  "udp"  },
-  { "m2ua",            { NULL }, 2904,  "sctp" },
-  { "m3ua",            { NULL }, 2905,  "tcp"  },
-  { "m3ua",            { NULL }, 2905,  "sctp" },
-  { "caller9",         { NULL }, 2906,  "tcp"  },
-  { "caller9",         { NULL }, 2906,  "udp"  },
-  { "webmethods-b2b",  { NULL }, 2907,  "tcp"  },
-  { "webmethods-b2b",  { NULL }, 2907,  "udp"  },
-  { "mao",             { NULL }, 2908,  "tcp"  },
-  { "mao",             { NULL }, 2908,  "udp"  },
-  { "funk-dialout",    { NULL }, 2909,  "tcp"  },
-  { "funk-dialout",    { NULL }, 2909,  "udp"  },
-  { "tdaccess",        { NULL }, 2910,  "tcp"  },
-  { "tdaccess",        { NULL }, 2910,  "udp"  },
-  { "blockade",        { NULL }, 2911,  "tcp"  },
-  { "blockade",        { NULL }, 2911,  "udp"  },
-  { "epicon",          { NULL }, 2912,  "tcp"  },
-  { "epicon",          { NULL }, 2912,  "udp"  },
-  { "boosterware",     { NULL }, 2913,  "tcp"  },
-  { "boosterware",     { NULL }, 2913,  "udp"  },
-  { "gamelobby",       { NULL }, 2914,  "tcp"  },
-  { "gamelobby",       { NULL }, 2914,  "udp"  },
-  { "tksocket",        { NULL }, 2915,  "tcp"  },
-  { "tksocket",        { NULL }, 2915,  "udp"  },
-  { "elvin_server",    { NULL }, 2916,  "tcp"  },
-  { "elvin_server",    { NULL }, 2916,  "udp"  },
-  { "elvin_client",    { NULL }, 2917,  "tcp"  },
-  { "elvin_client",    { NULL }, 2917,  "udp"  },
-  { "kastenchasepad",  { NULL }, 2918,  "tcp"  },
-  { "kastenchasepad",  { NULL }, 2918,  "udp"  },
-  { "roboer",          { NULL }, 2919,  "tcp"  },
-  { "roboer",          { NULL }, 2919,  "udp"  },
-  { "roboeda",         { NULL }, 2920,  "tcp"  },
-  { "roboeda",         { NULL }, 2920,  "udp"  },
-  { "cesdcdman",       { NULL }, 2921,  "tcp"  },
-  { "cesdcdman",       { NULL }, 2921,  "udp"  },
-  { "cesdcdtrn",       { NULL }, 2922,  "tcp"  },
-  { "cesdcdtrn",       { NULL }, 2922,  "udp"  },
-  { "wta-wsp-wtp-s",   { NULL }, 2923,  "tcp"  },
-  { "wta-wsp-wtp-s",   { NULL }, 2923,  "udp"  },
-  { "precise-vip",     { NULL }, 2924,  "tcp"  },
-  { "precise-vip",     { NULL }, 2924,  "udp"  },
-  { "mobile-file-dl",  { NULL }, 2926,  "tcp"  },
-  { "mobile-file-dl",  { NULL }, 2926,  "udp"  },
-  { "unimobilectrl",   { NULL }, 2927,  "tcp"  },
-  { "unimobilectrl",   { NULL }, 2927,  "udp"  },
-  { "redstone-cpss",   { NULL }, 2928,  "tcp"  },
-  { "redstone-cpss",   { NULL }, 2928,  "udp"  },
-  { "amx-webadmin",    { NULL }, 2929,  "tcp"  },
-  { "amx-webadmin",    { NULL }, 2929,  "udp"  },
-  { "amx-weblinx",     { NULL }, 2930,  "tcp"  },
-  { "amx-weblinx",     { NULL }, 2930,  "udp"  },
-  { "circle-x",        { NULL }, 2931,  "tcp"  },
-  { "circle-x",        { NULL }, 2931,  "udp"  },
-  { "incp",            { NULL }, 2932,  "tcp"  },
-  { "incp",            { NULL }, 2932,  "udp"  },
-  { "4-tieropmgw",     { NULL }, 2933,  "tcp"  },
-  { "4-tieropmgw",     { NULL }, 2933,  "udp"  },
-  { "4-tieropmcli",    { NULL }, 2934,  "tcp"  },
-  { "4-tieropmcli",    { NULL }, 2934,  "udp"  },
-  { "qtp",             { NULL }, 2935,  "tcp"  },
-  { "qtp",             { NULL }, 2935,  "udp"  },
-  { "otpatch",         { NULL }, 2936,  "tcp"  },
-  { "otpatch",         { NULL }, 2936,  "udp"  },
-  { "pnaconsult-lm",   { NULL }, 2937,  "tcp"  },
-  { "pnaconsult-lm",   { NULL }, 2937,  "udp"  },
-  { "sm-pas-1",        { NULL }, 2938,  "tcp"  },
-  { "sm-pas-1",        { NULL }, 2938,  "udp"  },
-  { "sm-pas-2",        { NULL }, 2939,  "tcp"  },
-  { "sm-pas-2",        { NULL }, 2939,  "udp"  },
-  { "sm-pas-3",        { NULL }, 2940,  "tcp"  },
-  { "sm-pas-3",        { NULL }, 2940,  "udp"  },
-  { "sm-pas-4",        { NULL }, 2941,  "tcp"  },
-  { "sm-pas-4",        { NULL }, 2941,  "udp"  },
-  { "sm-pas-5",        { NULL }, 2942,  "tcp"  },
-  { "sm-pas-5",        { NULL }, 2942,  "udp"  },
-  { "ttnrepository",   { NULL }, 2943,  "tcp"  },
-  { "ttnrepository",   { NULL }, 2943,  "udp"  },
-  { "megaco-h248",     { NULL }, 2944,  "tcp"  },
-  { "megaco-h248",     { NULL }, 2944,  "udp"  },
-  { "megaco-h248",     { NULL }, 2944,  "sctp" },
-  { "h248-binary",     { NULL }, 2945,  "tcp"  },
-  { "h248-binary",     { NULL }, 2945,  "udp"  },
-  { "h248-binary",     { NULL }, 2945,  "sctp" },
-  { "fjsvmpor",        { NULL }, 2946,  "tcp"  },
-  { "fjsvmpor",        { NULL }, 2946,  "udp"  },
-  { "gpsd",            { NULL }, 2947,  "tcp"  },
-  { "gpsd",            { NULL }, 2947,  "udp"  },
-  { "wap-push",        { NULL }, 2948,  "tcp"  },
-  { "wap-push",        { NULL }, 2948,  "udp"  },
-  { "wap-pushsecure",  { NULL }, 2949,  "tcp"  },
-  { "wap-pushsecure",  { NULL }, 2949,  "udp"  },
-  { "esip",            { NULL }, 2950,  "tcp"  },
-  { "esip",            { NULL }, 2950,  "udp"  },
-  { "ottp",            { NULL }, 2951,  "tcp"  },
-  { "ottp",            { NULL }, 2951,  "udp"  },
-  { "mpfwsas",         { NULL }, 2952,  "tcp"  },
-  { "mpfwsas",         { NULL }, 2952,  "udp"  },
-  { "ovalarmsrv",      { NULL }, 2953,  "tcp"  },
-  { "ovalarmsrv",      { NULL }, 2953,  "udp"  },
-  { "ovalarmsrv-cmd",  { NULL }, 2954,  "tcp"  },
-  { "ovalarmsrv-cmd",  { NULL }, 2954,  "udp"  },
-  { "csnotify",        { NULL }, 2955,  "tcp"  },
-  { "csnotify",        { NULL }, 2955,  "udp"  },
-  { "ovrimosdbman",    { NULL }, 2956,  "tcp"  },
-  { "ovrimosdbman",    { NULL }, 2956,  "udp"  },
-  { "jmact5",          { NULL }, 2957,  "tcp"  },
-  { "jmact5",          { NULL }, 2957,  "udp"  },
-  { "jmact6",          { NULL }, 2958,  "tcp"  },
-  { "jmact6",          { NULL }, 2958,  "udp"  },
-  { "rmopagt",         { NULL }, 2959,  "tcp"  },
-  { "rmopagt",         { NULL }, 2959,  "udp"  },
-  { "dfoxserver",      { NULL }, 2960,  "tcp"  },
-  { "dfoxserver",      { NULL }, 2960,  "udp"  },
-  { "boldsoft-lm",     { NULL }, 2961,  "tcp"  },
-  { "boldsoft-lm",     { NULL }, 2961,  "udp"  },
-  { "iph-policy-cli",  { NULL }, 2962,  "tcp"  },
-  { "iph-policy-cli",  { NULL }, 2962,  "udp"  },
-  { "iph-policy-adm",  { NULL }, 2963,  "tcp"  },
-  { "iph-policy-adm",  { NULL }, 2963,  "udp"  },
-  { "bullant-srap",    { NULL }, 2964,  "tcp"  },
-  { "bullant-srap",    { NULL }, 2964,  "udp"  },
-  { "bullant-rap",     { NULL }, 2965,  "tcp"  },
-  { "bullant-rap",     { NULL }, 2965,  "udp"  },
-  { "idp-infotrieve",  { NULL }, 2966,  "tcp"  },
-  { "idp-infotrieve",  { NULL }, 2966,  "udp"  },
-  { "ssc-agent",       { NULL }, 2967,  "tcp"  },
-  { "ssc-agent",       { NULL }, 2967,  "udp"  },
-  { "enpp",            { NULL }, 2968,  "tcp"  },
-  { "enpp",            { NULL }, 2968,  "udp"  },
-  { "essp",            { NULL }, 2969,  "tcp"  },
-  { "essp",            { NULL }, 2969,  "udp"  },
-  { "index-net",       { NULL }, 2970,  "tcp"  },
-  { "index-net",       { NULL }, 2970,  "udp"  },
-  { "netclip",         { NULL }, 2971,  "tcp"  },
-  { "netclip",         { NULL }, 2971,  "udp"  },
-  { "pmsm-webrctl",    { NULL }, 2972,  "tcp"  },
-  { "pmsm-webrctl",    { NULL }, 2972,  "udp"  },
-  { "svnetworks",      { NULL }, 2973,  "tcp"  },
-  { "svnetworks",      { NULL }, 2973,  "udp"  },
-  { "signal",          { NULL }, 2974,  "tcp"  },
-  { "signal",          { NULL }, 2974,  "udp"  },
-  { "fjmpcm",          { NULL }, 2975,  "tcp"  },
-  { "fjmpcm",          { NULL }, 2975,  "udp"  },
-  { "cns-srv-port",    { NULL }, 2976,  "tcp"  },
-  { "cns-srv-port",    { NULL }, 2976,  "udp"  },
-  { "ttc-etap-ns",     { NULL }, 2977,  "tcp"  },
-  { "ttc-etap-ns",     { NULL }, 2977,  "udp"  },
-  { "ttc-etap-ds",     { NULL }, 2978,  "tcp"  },
-  { "ttc-etap-ds",     { NULL }, 2978,  "udp"  },
-  { "h263-video",      { NULL }, 2979,  "tcp"  },
-  { "h263-video",      { NULL }, 2979,  "udp"  },
-  { "wimd",            { NULL }, 2980,  "tcp"  },
-  { "wimd",            { NULL }, 2980,  "udp"  },
-  { "mylxamport",      { NULL }, 2981,  "tcp"  },
-  { "mylxamport",      { NULL }, 2981,  "udp"  },
-  { "iwb-whiteboard",  { NULL }, 2982,  "tcp"  },
-  { "iwb-whiteboard",  { NULL }, 2982,  "udp"  },
-  { "netplan",         { NULL }, 2983,  "tcp"  },
-  { "netplan",         { NULL }, 2983,  "udp"  },
-  { "hpidsadmin",      { NULL }, 2984,  "tcp"  },
-  { "hpidsadmin",      { NULL }, 2984,  "udp"  },
-  { "hpidsagent",      { NULL }, 2985,  "tcp"  },
-  { "hpidsagent",      { NULL }, 2985,  "udp"  },
-  { "stonefalls",      { NULL }, 2986,  "tcp"  },
-  { "stonefalls",      { NULL }, 2986,  "udp"  },
-  { "identify",        { NULL }, 2987,  "tcp"  },
-  { "identify",        { NULL }, 2987,  "udp"  },
-  { "hippad",          { NULL }, 2988,  "tcp"  },
-  { "hippad",          { NULL }, 2988,  "udp"  },
-  { "zarkov",          { NULL }, 2989,  "tcp"  },
-  { "zarkov",          { NULL }, 2989,  "udp"  },
-  { "boscap",          { NULL }, 2990,  "tcp"  },
-  { "boscap",          { NULL }, 2990,  "udp"  },
-  { "wkstn-mon",       { NULL }, 2991,  "tcp"  },
-  { "wkstn-mon",       { NULL }, 2991,  "udp"  },
-  { "avenyo",          { NULL }, 2992,  "tcp"  },
-  { "avenyo",          { NULL }, 2992,  "udp"  },
-  { "veritas-vis1",    { NULL }, 2993,  "tcp"  },
-  { "veritas-vis1",    { NULL }, 2993,  "udp"  },
-  { "veritas-vis2",    { NULL }, 2994,  "tcp"  },
-  { "veritas-vis2",    { NULL }, 2994,  "udp"  },
-  { "idrs",            { NULL }, 2995,  "tcp"  },
-  { "idrs",            { NULL }, 2995,  "udp"  },
-  { "vsixml",          { NULL }, 2996,  "tcp"  },
-  { "vsixml",          { NULL }, 2996,  "udp"  },
-  { "rebol",           { NULL }, 2997,  "tcp"  },
-  { "rebol",           { NULL }, 2997,  "udp"  },
-  { "realsecure",      { NULL }, 2998,  "tcp"  },
-  { "realsecure",      { NULL }, 2998,  "udp"  },
-  { "remoteware-un",   { NULL }, 2999,  "tcp"  },
-  { "remoteware-un",   { NULL }, 2999,  "udp"  },
-  { "hbci",            { NULL }, 3000,  "tcp"  },
-  { "hbci",            { NULL }, 3000,  "udp"  },
-  { "remoteware-cl",   { NULL }, 3000,  "tcp"  },
-  { "remoteware-cl",   { NULL }, 3000,  "udp"  },
-  { "exlm-agent",      { NULL }, 3002,  "tcp"  },
-  { "exlm-agent",      { NULL }, 3002,  "udp"  },
-  { "remoteware-srv",  { NULL }, 3002,  "tcp"  },
-  { "remoteware-srv",  { NULL }, 3002,  "udp"  },
-  { "cgms",            { NULL }, 3003,  "tcp"  },
-  { "cgms",            { NULL }, 3003,  "udp"  },
-  { "csoftragent",     { NULL }, 3004,  "tcp"  },
-  { "csoftragent",     { NULL }, 3004,  "udp"  },
-  { "geniuslm",        { NULL }, 3005,  "tcp"  },
-  { "geniuslm",        { NULL }, 3005,  "udp"  },
-  { "ii-admin",        { NULL }, 3006,  "tcp"  },
-  { "ii-admin",        { NULL }, 3006,  "udp"  },
-  { "lotusmtap",       { NULL }, 3007,  "tcp"  },
-  { "lotusmtap",       { NULL }, 3007,  "udp"  },
-  { "midnight-tech",   { NULL }, 3008,  "tcp"  },
-  { "midnight-tech",   { NULL }, 3008,  "udp"  },
-  { "pxc-ntfy",        { NULL }, 3009,  "tcp"  },
-  { "pxc-ntfy",        { NULL }, 3009,  "udp"  },
-  { "gw",              { NULL }, 3010,  "tcp"  },
-  { "ping-pong",       { NULL }, 3010,  "udp"  },
-  { "trusted-web",     { NULL }, 3011,  "tcp"  },
-  { "trusted-web",     { NULL }, 3011,  "udp"  },
-  { "twsdss",          { NULL }, 3012,  "tcp"  },
-  { "twsdss",          { NULL }, 3012,  "udp"  },
-  { "gilatskysurfer",  { NULL }, 3013,  "tcp"  },
-  { "gilatskysurfer",  { NULL }, 3013,  "udp"  },
-  { "broker_service",  { NULL }, 3014,  "tcp"  },
-  { "broker_service",  { NULL }, 3014,  "udp"  },
-  { "nati-dstp",       { NULL }, 3015,  "tcp"  },
-  { "nati-dstp",       { NULL }, 3015,  "udp"  },
-  { "notify_srvr",     { NULL }, 3016,  "tcp"  },
-  { "notify_srvr",     { NULL }, 3016,  "udp"  },
-  { "event_listener",  { NULL }, 3017,  "tcp"  },
-  { "event_listener",  { NULL }, 3017,  "udp"  },
-  { "srvc_registry",   { NULL }, 3018,  "tcp"  },
-  { "srvc_registry",   { NULL }, 3018,  "udp"  },
-  { "resource_mgr",    { NULL }, 3019,  "tcp"  },
-  { "resource_mgr",    { NULL }, 3019,  "udp"  },
-  { "cifs",            { NULL }, 3020,  "tcp"  },
-  { "cifs",            { NULL }, 3020,  "udp"  },
-  { "agriserver",      { NULL }, 3021,  "tcp"  },
-  { "agriserver",      { NULL }, 3021,  "udp"  },
-  { "csregagent",      { NULL }, 3022,  "tcp"  },
-  { "csregagent",      { NULL }, 3022,  "udp"  },
-  { "magicnotes",      { NULL }, 3023,  "tcp"  },
-  { "magicnotes",      { NULL }, 3023,  "udp"  },
-  { "nds_sso",         { NULL }, 3024,  "tcp"  },
-  { "nds_sso",         { NULL }, 3024,  "udp"  },
-  { "arepa-raft",      { NULL }, 3025,  "tcp"  },
-  { "arepa-raft",      { NULL }, 3025,  "udp"  },
-  { "agri-gateway",    { NULL }, 3026,  "tcp"  },
-  { "agri-gateway",    { NULL }, 3026,  "udp"  },
-  { "LiebDevMgmt_C",   { NULL }, 3027,  "tcp"  },
-  { "LiebDevMgmt_C",   { NULL }, 3027,  "udp"  },
-  { "LiebDevMgmt_DM",  { NULL }, 3028,  "tcp"  },
-  { "LiebDevMgmt_DM",  { NULL }, 3028,  "udp"  },
-  { "LiebDevMgmt_A",   { NULL }, 3029,  "tcp"  },
-  { "LiebDevMgmt_A",   { NULL }, 3029,  "udp"  },
-  { "arepa-cas",       { NULL }, 3030,  "tcp"  },
-  { "arepa-cas",       { NULL }, 3030,  "udp"  },
-  { "eppc",            { NULL }, 3031,  "tcp"  },
-  { "eppc",            { NULL }, 3031,  "udp"  },
-  { "redwood-chat",    { NULL }, 3032,  "tcp"  },
-  { "redwood-chat",    { NULL }, 3032,  "udp"  },
-  { "pdb",             { NULL }, 3033,  "tcp"  },
-  { "pdb",             { NULL }, 3033,  "udp"  },
-  { "osmosis-aeea",    { NULL }, 3034,  "tcp"  },
-  { "osmosis-aeea",    { NULL }, 3034,  "udp"  },
-  { "fjsv-gssagt",     { NULL }, 3035,  "tcp"  },
-  { "fjsv-gssagt",     { NULL }, 3035,  "udp"  },
-  { "hagel-dump",      { NULL }, 3036,  "tcp"  },
-  { "hagel-dump",      { NULL }, 3036,  "udp"  },
-  { "hp-san-mgmt",     { NULL }, 3037,  "tcp"  },
-  { "hp-san-mgmt",     { NULL }, 3037,  "udp"  },
-  { "santak-ups",      { NULL }, 3038,  "tcp"  },
-  { "santak-ups",      { NULL }, 3038,  "udp"  },
-  { "cogitate",        { NULL }, 3039,  "tcp"  },
-  { "cogitate",        { NULL }, 3039,  "udp"  },
-  { "tomato-springs",  { NULL }, 3040,  "tcp"  },
-  { "tomato-springs",  { NULL }, 3040,  "udp"  },
-  { "di-traceware",    { NULL }, 3041,  "tcp"  },
-  { "di-traceware",    { NULL }, 3041,  "udp"  },
-  { "journee",         { NULL }, 3042,  "tcp"  },
-  { "journee",         { NULL }, 3042,  "udp"  },
-  { "brp",             { NULL }, 3043,  "tcp"  },
-  { "brp",             { NULL }, 3043,  "udp"  },
-  { "epp",             { NULL }, 3044,  "tcp"  },
-  { "epp",             { NULL }, 3044,  "udp"  },
-  { "responsenet",     { NULL }, 3045,  "tcp"  },
-  { "responsenet",     { NULL }, 3045,  "udp"  },
-  { "di-ase",          { NULL }, 3046,  "tcp"  },
-  { "di-ase",          { NULL }, 3046,  "udp"  },
-  { "hlserver",        { NULL }, 3047,  "tcp"  },
-  { "hlserver",        { NULL }, 3047,  "udp"  },
-  { "pctrader",        { NULL }, 3048,  "tcp"  },
-  { "pctrader",        { NULL }, 3048,  "udp"  },
-  { "nsws",            { NULL }, 3049,  "tcp"  },
-  { "nsws",            { NULL }, 3049,  "udp"  },
-  { "gds_db",          { NULL }, 3050,  "tcp"  },
-  { "gds_db",          { NULL }, 3050,  "udp"  },
-  { "galaxy-server",   { NULL }, 3051,  "tcp"  },
-  { "galaxy-server",   { NULL }, 3051,  "udp"  },
-  { "apc-3052",        { NULL }, 3052,  "tcp"  },
-  { "apc-3052",        { NULL }, 3052,  "udp"  },
-  { "dsom-server",     { NULL }, 3053,  "tcp"  },
-  { "dsom-server",     { NULL }, 3053,  "udp"  },
-  { "amt-cnf-prot",    { NULL }, 3054,  "tcp"  },
-  { "amt-cnf-prot",    { NULL }, 3054,  "udp"  },
-  { "policyserver",    { NULL }, 3055,  "tcp"  },
-  { "policyserver",    { NULL }, 3055,  "udp"  },
-  { "cdl-server",      { NULL }, 3056,  "tcp"  },
-  { "cdl-server",      { NULL }, 3056,  "udp"  },
-  { "goahead-fldup",   { NULL }, 3057,  "tcp"  },
-  { "goahead-fldup",   { NULL }, 3057,  "udp"  },
-  { "videobeans",      { NULL }, 3058,  "tcp"  },
-  { "videobeans",      { NULL }, 3058,  "udp"  },
-  { "qsoft",           { NULL }, 3059,  "tcp"  },
-  { "qsoft",           { NULL }, 3059,  "udp"  },
-  { "interserver",     { NULL }, 3060,  "tcp"  },
-  { "interserver",     { NULL }, 3060,  "udp"  },
-  { "cautcpd",         { NULL }, 3061,  "tcp"  },
-  { "cautcpd",         { NULL }, 3061,  "udp"  },
-  { "ncacn-ip-tcp",    { NULL }, 3062,  "tcp"  },
-  { "ncacn-ip-tcp",    { NULL }, 3062,  "udp"  },
-  { "ncadg-ip-udp",    { NULL }, 3063,  "tcp"  },
-  { "ncadg-ip-udp",    { NULL }, 3063,  "udp"  },
-  { "rprt",            { NULL }, 3064,  "tcp"  },
-  { "rprt",            { NULL }, 3064,  "udp"  },
-  { "slinterbase",     { NULL }, 3065,  "tcp"  },
-  { "slinterbase",     { NULL }, 3065,  "udp"  },
-  { "netattachsdmp",   { NULL }, 3066,  "tcp"  },
-  { "netattachsdmp",   { NULL }, 3066,  "udp"  },
-  { "fjhpjp",          { NULL }, 3067,  "tcp"  },
-  { "fjhpjp",          { NULL }, 3067,  "udp"  },
-  { "ls3bcast",        { NULL }, 3068,  "tcp"  },
-  { "ls3bcast",        { NULL }, 3068,  "udp"  },
-  { "ls3",             { NULL }, 3069,  "tcp"  },
-  { "ls3",             { NULL }, 3069,  "udp"  },
-  { "mgxswitch",       { NULL }, 3070,  "tcp"  },
-  { "mgxswitch",       { NULL }, 3070,  "udp"  },
-  { "csd-mgmt-port",   { NULL }, 3071,  "tcp"  },
-  { "csd-mgmt-port",   { NULL }, 3071,  "udp"  },
-  { "csd-monitor",     { NULL }, 3072,  "tcp"  },
-  { "csd-monitor",     { NULL }, 3072,  "udp"  },
-  { "vcrp",            { NULL }, 3073,  "tcp"  },
-  { "vcrp",            { NULL }, 3073,  "udp"  },
-  { "xbox",            { NULL }, 3074,  "tcp"  },
-  { "xbox",            { NULL }, 3074,  "udp"  },
-  { "orbix-locator",   { NULL }, 3075,  "tcp"  },
-  { "orbix-locator",   { NULL }, 3075,  "udp"  },
-  { "orbix-config",    { NULL }, 3076,  "tcp"  },
-  { "orbix-config",    { NULL }, 3076,  "udp"  },
-  { "orbix-loc-ssl",   { NULL }, 3077,  "tcp"  },
-  { "orbix-loc-ssl",   { NULL }, 3077,  "udp"  },
-  { "orbix-cfg-ssl",   { NULL }, 3078,  "tcp"  },
-  { "orbix-cfg-ssl",   { NULL }, 3078,  "udp"  },
-  { "lv-frontpanel",   { NULL }, 3079,  "tcp"  },
-  { "lv-frontpanel",   { NULL }, 3079,  "udp"  },
-  { "stm_pproc",       { NULL }, 3080,  "tcp"  },
-  { "stm_pproc",       { NULL }, 3080,  "udp"  },
-  { "tl1-lv",          { NULL }, 3081,  "tcp"  },
-  { "tl1-lv",          { NULL }, 3081,  "udp"  },
-  { "tl1-raw",         { NULL }, 3082,  "tcp"  },
-  { "tl1-raw",         { NULL }, 3082,  "udp"  },
-  { "tl1-telnet",      { NULL }, 3083,  "tcp"  },
-  { "tl1-telnet",      { NULL }, 3083,  "udp"  },
-  { "itm-mccs",        { NULL }, 3084,  "tcp"  },
-  { "itm-mccs",        { NULL }, 3084,  "udp"  },
-  { "pcihreq",         { NULL }, 3085,  "tcp"  },
-  { "pcihreq",         { NULL }, 3085,  "udp"  },
-  { "jdl-dbkitchen",   { NULL }, 3086,  "tcp"  },
-  { "jdl-dbkitchen",   { NULL }, 3086,  "udp"  },
-  { "asoki-sma",       { NULL }, 3087,  "tcp"  },
-  { "asoki-sma",       { NULL }, 3087,  "udp"  },
-  { "xdtp",            { NULL }, 3088,  "tcp"  },
-  { "xdtp",            { NULL }, 3088,  "udp"  },
-  { "ptk-alink",       { NULL }, 3089,  "tcp"  },
-  { "ptk-alink",       { NULL }, 3089,  "udp"  },
-  { "stss",            { NULL }, 3090,  "tcp"  },
-  { "stss",            { NULL }, 3090,  "udp"  },
-  { "1ci-smcs",        { NULL }, 3091,  "tcp"  },
-  { "1ci-smcs",        { NULL }, 3091,  "udp"  },
-  { "rapidmq-center",  { NULL }, 3093,  "tcp"  },
-  { "rapidmq-center",  { NULL }, 3093,  "udp"  },
-  { "rapidmq-reg",     { NULL }, 3094,  "tcp"  },
-  { "rapidmq-reg",     { NULL }, 3094,  "udp"  },
-  { "panasas",         { NULL }, 3095,  "tcp"  },
-  { "panasas",         { NULL }, 3095,  "udp"  },
-  { "ndl-aps",         { NULL }, 3096,  "tcp"  },
-  { "ndl-aps",         { NULL }, 3096,  "udp"  },
-  { "itu-bicc-stc",    { NULL }, 3097,  "sctp" },
-  { "umm-port",        { NULL }, 3098,  "tcp"  },
-  { "umm-port",        { NULL }, 3098,  "udp"  },
-  { "chmd",            { NULL }, 3099,  "tcp"  },
-  { "chmd",            { NULL }, 3099,  "udp"  },
-  { "opcon-xps",       { NULL }, 3100,  "tcp"  },
-  { "opcon-xps",       { NULL }, 3100,  "udp"  },
-  { "hp-pxpib",        { NULL }, 3101,  "tcp"  },
-  { "hp-pxpib",        { NULL }, 3101,  "udp"  },
-  { "slslavemon",      { NULL }, 3102,  "tcp"  },
-  { "slslavemon",      { NULL }, 3102,  "udp"  },
-  { "autocuesmi",      { NULL }, 3103,  "tcp"  },
-  { "autocuesmi",      { NULL }, 3103,  "udp"  },
-  { "autocuelog",      { NULL }, 3104,  "tcp"  },
-  { "autocuetime",     { NULL }, 3104,  "udp"  },
-  { "cardbox",         { NULL }, 3105,  "tcp"  },
-  { "cardbox",         { NULL }, 3105,  "udp"  },
-  { "cardbox-http",    { NULL }, 3106,  "tcp"  },
-  { "cardbox-http",    { NULL }, 3106,  "udp"  },
-  { "business",        { NULL }, 3107,  "tcp"  },
-  { "business",        { NULL }, 3107,  "udp"  },
-  { "geolocate",       { NULL }, 3108,  "tcp"  },
-  { "geolocate",       { NULL }, 3108,  "udp"  },
-  { "personnel",       { NULL }, 3109,  "tcp"  },
-  { "personnel",       { NULL }, 3109,  "udp"  },
-  { "sim-control",     { NULL }, 3110,  "tcp"  },
-  { "sim-control",     { NULL }, 3110,  "udp"  },
-  { "wsynch",          { NULL }, 3111,  "tcp"  },
-  { "wsynch",          { NULL }, 3111,  "udp"  },
-  { "ksysguard",       { NULL }, 3112,  "tcp"  },
-  { "ksysguard",       { NULL }, 3112,  "udp"  },
-  { "cs-auth-svr",     { NULL }, 3113,  "tcp"  },
-  { "cs-auth-svr",     { NULL }, 3113,  "udp"  },
-  { "ccmad",           { NULL }, 3114,  "tcp"  },
-  { "ccmad",           { NULL }, 3114,  "udp"  },
-  { "mctet-master",    { NULL }, 3115,  "tcp"  },
-  { "mctet-master",    { NULL }, 3115,  "udp"  },
-  { "mctet-gateway",   { NULL }, 3116,  "tcp"  },
-  { "mctet-gateway",   { NULL }, 3116,  "udp"  },
-  { "mctet-jserv",     { NULL }, 3117,  "tcp"  },
-  { "mctet-jserv",     { NULL }, 3117,  "udp"  },
-  { "pkagent",         { NULL }, 3118,  "tcp"  },
-  { "pkagent",         { NULL }, 3118,  "udp"  },
-  { "d2000kernel",     { NULL }, 3119,  "tcp"  },
-  { "d2000kernel",     { NULL }, 3119,  "udp"  },
-  { "d2000webserver",  { NULL }, 3120,  "tcp"  },
-  { "d2000webserver",  { NULL }, 3120,  "udp"  },
-  { "vtr-emulator",    { NULL }, 3122,  "tcp"  },
-  { "vtr-emulator",    { NULL }, 3122,  "udp"  },
-  { "edix",            { NULL }, 3123,  "tcp"  },
-  { "edix",            { NULL }, 3123,  "udp"  },
-  { "beacon-port",     { NULL }, 3124,  "tcp"  },
-  { "beacon-port",     { NULL }, 3124,  "udp"  },
-  { "a13-an",          { NULL }, 3125,  "tcp"  },
-  { "a13-an",          { NULL }, 3125,  "udp"  },
-  { "ctx-bridge",      { NULL }, 3127,  "tcp"  },
-  { "ctx-bridge",      { NULL }, 3127,  "udp"  },
-  { "ndl-aas",         { NULL }, 3128,  "tcp"  },
-  { "ndl-aas",         { NULL }, 3128,  "udp"  },
-  { "netport-id",      { NULL }, 3129,  "tcp"  },
-  { "netport-id",      { NULL }, 3129,  "udp"  },
-  { "icpv2",           { NULL }, 3130,  "tcp"  },
-  { "icpv2",           { NULL }, 3130,  "udp"  },
-  { "netbookmark",     { NULL }, 3131,  "tcp"  },
-  { "netbookmark",     { NULL }, 3131,  "udp"  },
-  { "ms-rule-engine",  { NULL }, 3132,  "tcp"  },
-  { "ms-rule-engine",  { NULL }, 3132,  "udp"  },
-  { "prism-deploy",    { NULL }, 3133,  "tcp"  },
-  { "prism-deploy",    { NULL }, 3133,  "udp"  },
-  { "ecp",             { NULL }, 3134,  "tcp"  },
-  { "ecp",             { NULL }, 3134,  "udp"  },
-  { "peerbook-port",   { NULL }, 3135,  "tcp"  },
-  { "peerbook-port",   { NULL }, 3135,  "udp"  },
-  { "grubd",           { NULL }, 3136,  "tcp"  },
-  { "grubd",           { NULL }, 3136,  "udp"  },
-  { "rtnt-1",          { NULL }, 3137,  "tcp"  },
-  { "rtnt-1",          { NULL }, 3137,  "udp"  },
-  { "rtnt-2",          { NULL }, 3138,  "tcp"  },
-  { "rtnt-2",          { NULL }, 3138,  "udp"  },
-  { "incognitorv",     { NULL }, 3139,  "tcp"  },
-  { "incognitorv",     { NULL }, 3139,  "udp"  },
-  { "ariliamulti",     { NULL }, 3140,  "tcp"  },
-  { "ariliamulti",     { NULL }, 3140,  "udp"  },
-  { "vmodem",          { NULL }, 3141,  "tcp"  },
-  { "vmodem",          { NULL }, 3141,  "udp"  },
-  { "rdc-wh-eos",      { NULL }, 3142,  "tcp"  },
-  { "rdc-wh-eos",      { NULL }, 3142,  "udp"  },
-  { "seaview",         { NULL }, 3143,  "tcp"  },
-  { "seaview",         { NULL }, 3143,  "udp"  },
-  { "tarantella",      { NULL }, 3144,  "tcp"  },
-  { "tarantella",      { NULL }, 3144,  "udp"  },
-  { "csi-lfap",        { NULL }, 3145,  "tcp"  },
-  { "csi-lfap",        { NULL }, 3145,  "udp"  },
-  { "bears-02",        { NULL }, 3146,  "tcp"  },
-  { "bears-02",        { NULL }, 3146,  "udp"  },
-  { "rfio",            { NULL }, 3147,  "tcp"  },
-  { "rfio",            { NULL }, 3147,  "udp"  },
-  { "nm-game-admin",   { NULL }, 3148,  "tcp"  },
-  { "nm-game-admin",   { NULL }, 3148,  "udp"  },
-  { "nm-game-server",  { NULL }, 3149,  "tcp"  },
-  { "nm-game-server",  { NULL }, 3149,  "udp"  },
-  { "nm-asses-admin",  { NULL }, 3150,  "tcp"  },
-  { "nm-asses-admin",  { NULL }, 3150,  "udp"  },
-  { "nm-assessor",     { NULL }, 3151,  "tcp"  },
-  { "nm-assessor",     { NULL }, 3151,  "udp"  },
-  { "feitianrockey",   { NULL }, 3152,  "tcp"  },
-  { "feitianrockey",   { NULL }, 3152,  "udp"  },
-  { "s8-client-port",  { NULL }, 3153,  "tcp"  },
-  { "s8-client-port",  { NULL }, 3153,  "udp"  },
-  { "ccmrmi",          { NULL }, 3154,  "tcp"  },
-  { "ccmrmi",          { NULL }, 3154,  "udp"  },
-  { "jpegmpeg",        { NULL }, 3155,  "tcp"  },
-  { "jpegmpeg",        { NULL }, 3155,  "udp"  },
-  { "indura",          { NULL }, 3156,  "tcp"  },
-  { "indura",          { NULL }, 3156,  "udp"  },
-  { "e3consultants",   { NULL }, 3157,  "tcp"  },
-  { "e3consultants",   { NULL }, 3157,  "udp"  },
-  { "stvp",            { NULL }, 3158,  "tcp"  },
-  { "stvp",            { NULL }, 3158,  "udp"  },
-  { "navegaweb-port",  { NULL }, 3159,  "tcp"  },
-  { "navegaweb-port",  { NULL }, 3159,  "udp"  },
-  { "tip-app-server",  { NULL }, 3160,  "tcp"  },
-  { "tip-app-server",  { NULL }, 3160,  "udp"  },
-  { "doc1lm",          { NULL }, 3161,  "tcp"  },
-  { "doc1lm",          { NULL }, 3161,  "udp"  },
-  { "sflm",            { NULL }, 3162,  "tcp"  },
-  { "sflm",            { NULL }, 3162,  "udp"  },
-  { "res-sap",         { NULL }, 3163,  "tcp"  },
-  { "res-sap",         { NULL }, 3163,  "udp"  },
-  { "imprs",           { NULL }, 3164,  "tcp"  },
-  { "imprs",           { NULL }, 3164,  "udp"  },
-  { "newgenpay",       { NULL }, 3165,  "tcp"  },
-  { "newgenpay",       { NULL }, 3165,  "udp"  },
-  { "sossecollector",  { NULL }, 3166,  "tcp"  },
-  { "sossecollector",  { NULL }, 3166,  "udp"  },
-  { "nowcontact",      { NULL }, 3167,  "tcp"  },
-  { "nowcontact",      { NULL }, 3167,  "udp"  },
-  { "poweronnud",      { NULL }, 3168,  "tcp"  },
-  { "poweronnud",      { NULL }, 3168,  "udp"  },
-  { "serverview-as",   { NULL }, 3169,  "tcp"  },
-  { "serverview-as",   { NULL }, 3169,  "udp"  },
-  { "serverview-asn",  { NULL }, 3170,  "tcp"  },
-  { "serverview-asn",  { NULL }, 3170,  "udp"  },
-  { "serverview-gf",   { NULL }, 3171,  "tcp"  },
-  { "serverview-gf",   { NULL }, 3171,  "udp"  },
-  { "serverview-rm",   { NULL }, 3172,  "tcp"  },
-  { "serverview-rm",   { NULL }, 3172,  "udp"  },
-  { "serverview-icc",  { NULL }, 3173,  "tcp"  },
-  { "serverview-icc",  { NULL }, 3173,  "udp"  },
-  { "armi-server",     { NULL }, 3174,  "tcp"  },
-  { "armi-server",     { NULL }, 3174,  "udp"  },
-  { "t1-e1-over-ip",   { NULL }, 3175,  "tcp"  },
-  { "t1-e1-over-ip",   { NULL }, 3175,  "udp"  },
-  { "ars-master",      { NULL }, 3176,  "tcp"  },
-  { "ars-master",      { NULL }, 3176,  "udp"  },
-  { "phonex-port",     { NULL }, 3177,  "tcp"  },
-  { "phonex-port",     { NULL }, 3177,  "udp"  },
-  { "radclientport",   { NULL }, 3178,  "tcp"  },
-  { "radclientport",   { NULL }, 3178,  "udp"  },
-  { "h2gf-w-2m",       { NULL }, 3179,  "tcp"  },
-  { "h2gf-w-2m",       { NULL }, 3179,  "udp"  },
-  { "mc-brk-srv",      { NULL }, 3180,  "tcp"  },
-  { "mc-brk-srv",      { NULL }, 3180,  "udp"  },
-  { "bmcpatrolagent",  { NULL }, 3181,  "tcp"  },
-  { "bmcpatrolagent",  { NULL }, 3181,  "udp"  },
-  { "bmcpatrolrnvu",   { NULL }, 3182,  "tcp"  },
-  { "bmcpatrolrnvu",   { NULL }, 3182,  "udp"  },
-  { "cops-tls",        { NULL }, 3183,  "tcp"  },
-  { "cops-tls",        { NULL }, 3183,  "udp"  },
-  { "apogeex-port",    { NULL }, 3184,  "tcp"  },
-  { "apogeex-port",    { NULL }, 3184,  "udp"  },
-  { "smpppd",          { NULL }, 3185,  "tcp"  },
-  { "smpppd",          { NULL }, 3185,  "udp"  },
-  { "iiw-port",        { NULL }, 3186,  "tcp"  },
-  { "iiw-port",        { NULL }, 3186,  "udp"  },
-  { "odi-port",        { NULL }, 3187,  "tcp"  },
-  { "odi-port",        { NULL }, 3187,  "udp"  },
-  { "brcm-comm-port",  { NULL }, 3188,  "tcp"  },
-  { "brcm-comm-port",  { NULL }, 3188,  "udp"  },
-  { "pcle-infex",      { NULL }, 3189,  "tcp"  },
-  { "pcle-infex",      { NULL }, 3189,  "udp"  },
-  { "csvr-proxy",      { NULL }, 3190,  "tcp"  },
-  { "csvr-proxy",      { NULL }, 3190,  "udp"  },
-  { "csvr-sslproxy",   { NULL }, 3191,  "tcp"  },
-  { "csvr-sslproxy",   { NULL }, 3191,  "udp"  },
-  { "firemonrcc",      { NULL }, 3192,  "tcp"  },
-  { "firemonrcc",      { NULL }, 3192,  "udp"  },
-  { "spandataport",    { NULL }, 3193,  "tcp"  },
-  { "spandataport",    { NULL }, 3193,  "udp"  },
-  { "magbind",         { NULL }, 3194,  "tcp"  },
-  { "magbind",         { NULL }, 3194,  "udp"  },
-  { "ncu-1",           { NULL }, 3195,  "tcp"  },
-  { "ncu-1",           { NULL }, 3195,  "udp"  },
-  { "ncu-2",           { NULL }, 3196,  "tcp"  },
-  { "ncu-2",           { NULL }, 3196,  "udp"  },
-  { "embrace-dp-s",    { NULL }, 3197,  "tcp"  },
-  { "embrace-dp-s",    { NULL }, 3197,  "udp"  },
-  { "embrace-dp-c",    { NULL }, 3198,  "tcp"  },
-  { "embrace-dp-c",    { NULL }, 3198,  "udp"  },
-  { "dmod-workspace",  { NULL }, 3199,  "tcp"  },
-  { "dmod-workspace",  { NULL }, 3199,  "udp"  },
-  { "tick-port",       { NULL }, 3200,  "tcp"  },
-  { "tick-port",       { NULL }, 3200,  "udp"  },
-  { "cpq-tasksmart",   { NULL }, 3201,  "tcp"  },
-  { "cpq-tasksmart",   { NULL }, 3201,  "udp"  },
-  { "intraintra",      { NULL }, 3202,  "tcp"  },
-  { "intraintra",      { NULL }, 3202,  "udp"  },
-  { "netwatcher-mon",  { NULL }, 3203,  "tcp"  },
-  { "netwatcher-mon",  { NULL }, 3203,  "udp"  },
-  { "netwatcher-db",   { NULL }, 3204,  "tcp"  },
-  { "netwatcher-db",   { NULL }, 3204,  "udp"  },
-  { "isns",            { NULL }, 3205,  "tcp"  },
-  { "isns",            { NULL }, 3205,  "udp"  },
-  { "ironmail",        { NULL }, 3206,  "tcp"  },
-  { "ironmail",        { NULL }, 3206,  "udp"  },
-  { "vx-auth-port",    { NULL }, 3207,  "tcp"  },
-  { "vx-auth-port",    { NULL }, 3207,  "udp"  },
-  { "pfu-prcallback",  { NULL }, 3208,  "tcp"  },
-  { "pfu-prcallback",  { NULL }, 3208,  "udp"  },
-  { "netwkpathengine", { NULL }, 3209,  "tcp"  },
-  { "netwkpathengine", { NULL }, 3209,  "udp"  },
-  { "flamenco-proxy",  { NULL }, 3210,  "tcp"  },
-  { "flamenco-proxy",  { NULL }, 3210,  "udp"  },
-  { "avsecuremgmt",    { NULL }, 3211,  "tcp"  },
-  { "avsecuremgmt",    { NULL }, 3211,  "udp"  },
-  { "surveyinst",      { NULL }, 3212,  "tcp"  },
-  { "surveyinst",      { NULL }, 3212,  "udp"  },
-  { "neon24x7",        { NULL }, 3213,  "tcp"  },
-  { "neon24x7",        { NULL }, 3213,  "udp"  },
-  { "jmq-daemon-1",    { NULL }, 3214,  "tcp"  },
-  { "jmq-daemon-1",    { NULL }, 3214,  "udp"  },
-  { "jmq-daemon-2",    { NULL }, 3215,  "tcp"  },
-  { "jmq-daemon-2",    { NULL }, 3215,  "udp"  },
-  { "ferrari-foam",    { NULL }, 3216,  "tcp"  },
-  { "ferrari-foam",    { NULL }, 3216,  "udp"  },
-  { "unite",           { NULL }, 3217,  "tcp"  },
-  { "unite",           { NULL }, 3217,  "udp"  },
-  { "smartpackets",    { NULL }, 3218,  "tcp"  },
-  { "smartpackets",    { NULL }, 3218,  "udp"  },
-  { "wms-messenger",   { NULL }, 3219,  "tcp"  },
-  { "wms-messenger",   { NULL }, 3219,  "udp"  },
-  { "xnm-ssl",         { NULL }, 3220,  "tcp"  },
-  { "xnm-ssl",         { NULL }, 3220,  "udp"  },
-  { "xnm-clear-text",  { NULL }, 3221,  "tcp"  },
-  { "xnm-clear-text",  { NULL }, 3221,  "udp"  },
-  { "glbp",            { NULL }, 3222,  "tcp"  },
-  { "glbp",            { NULL }, 3222,  "udp"  },
-  { "digivote",        { NULL }, 3223,  "tcp"  },
-  { "digivote",        { NULL }, 3223,  "udp"  },
-  { "aes-discovery",   { NULL }, 3224,  "tcp"  },
-  { "aes-discovery",   { NULL }, 3224,  "udp"  },
-  { "fcip-port",       { NULL }, 3225,  "tcp"  },
-  { "fcip-port",       { NULL }, 3225,  "udp"  },
-  { "isi-irp",         { NULL }, 3226,  "tcp"  },
-  { "isi-irp",         { NULL }, 3226,  "udp"  },
-  { "dwnmshttp",       { NULL }, 3227,  "tcp"  },
-  { "dwnmshttp",       { NULL }, 3227,  "udp"  },
-  { "dwmsgserver",     { NULL }, 3228,  "tcp"  },
-  { "dwmsgserver",     { NULL }, 3228,  "udp"  },
-  { "global-cd-port",  { NULL }, 3229,  "tcp"  },
-  { "global-cd-port",  { NULL }, 3229,  "udp"  },
-  { "sftdst-port",     { NULL }, 3230,  "tcp"  },
-  { "sftdst-port",     { NULL }, 3230,  "udp"  },
-  { "vidigo",          { NULL }, 3231,  "tcp"  },
-  { "vidigo",          { NULL }, 3231,  "udp"  },
-  { "mdtp",            { NULL }, 3232,  "tcp"  },
-  { "mdtp",            { NULL }, 3232,  "udp"  },
-  { "whisker",         { NULL }, 3233,  "tcp"  },
-  { "whisker",         { NULL }, 3233,  "udp"  },
-  { "alchemy",         { NULL }, 3234,  "tcp"  },
-  { "alchemy",         { NULL }, 3234,  "udp"  },
-  { "mdap-port",       { NULL }, 3235,  "tcp"  },
-  { "mdap-port",       { NULL }, 3235,  "udp"  },
-  { "apparenet-ts",    { NULL }, 3236,  "tcp"  },
-  { "apparenet-ts",    { NULL }, 3236,  "udp"  },
-  { "apparenet-tps",   { NULL }, 3237,  "tcp"  },
-  { "apparenet-tps",   { NULL }, 3237,  "udp"  },
-  { "apparenet-as",    { NULL }, 3238,  "tcp"  },
-  { "apparenet-as",    { NULL }, 3238,  "udp"  },
-  { "apparenet-ui",    { NULL }, 3239,  "tcp"  },
-  { "apparenet-ui",    { NULL }, 3239,  "udp"  },
-  { "triomotion",      { NULL }, 3240,  "tcp"  },
-  { "triomotion",      { NULL }, 3240,  "udp"  },
-  { "sysorb",          { NULL }, 3241,  "tcp"  },
-  { "sysorb",          { NULL }, 3241,  "udp"  },
-  { "sdp-id-port",     { NULL }, 3242,  "tcp"  },
-  { "sdp-id-port",     { NULL }, 3242,  "udp"  },
-  { "timelot",         { NULL }, 3243,  "tcp"  },
-  { "timelot",         { NULL }, 3243,  "udp"  },
-  { "onesaf",          { NULL }, 3244,  "tcp"  },
-  { "onesaf",          { NULL }, 3244,  "udp"  },
-  { "vieo-fe",         { NULL }, 3245,  "tcp"  },
-  { "vieo-fe",         { NULL }, 3245,  "udp"  },
-  { "dvt-system",      { NULL }, 3246,  "tcp"  },
-  { "dvt-system",      { NULL }, 3246,  "udp"  },
-  { "dvt-data",        { NULL }, 3247,  "tcp"  },
-  { "dvt-data",        { NULL }, 3247,  "udp"  },
-  { "procos-lm",       { NULL }, 3248,  "tcp"  },
-  { "procos-lm",       { NULL }, 3248,  "udp"  },
-  { "ssp",             { NULL }, 3249,  "tcp"  },
-  { "ssp",             { NULL }, 3249,  "udp"  },
-  { "hicp",            { NULL }, 3250,  "tcp"  },
-  { "hicp",            { NULL }, 3250,  "udp"  },
-  { "sysscanner",      { NULL }, 3251,  "tcp"  },
-  { "sysscanner",      { NULL }, 3251,  "udp"  },
-  { "dhe",             { NULL }, 3252,  "tcp"  },
-  { "dhe",             { NULL }, 3252,  "udp"  },
-  { "pda-data",        { NULL }, 3253,  "tcp"  },
-  { "pda-data",        { NULL }, 3253,  "udp"  },
-  { "pda-sys",         { NULL }, 3254,  "tcp"  },
-  { "pda-sys",         { NULL }, 3254,  "udp"  },
-  { "semaphore",       { NULL }, 3255,  "tcp"  },
-  { "semaphore",       { NULL }, 3255,  "udp"  },
-  { "cpqrpm-agent",    { NULL }, 3256,  "tcp"  },
-  { "cpqrpm-agent",    { NULL }, 3256,  "udp"  },
-  { "cpqrpm-server",   { NULL }, 3257,  "tcp"  },
-  { "cpqrpm-server",   { NULL }, 3257,  "udp"  },
-  { "ivecon-port",     { NULL }, 3258,  "tcp"  },
-  { "ivecon-port",     { NULL }, 3258,  "udp"  },
-  { "epncdp2",         { NULL }, 3259,  "tcp"  },
-  { "epncdp2",         { NULL }, 3259,  "udp"  },
-  { "iscsi-target",    { NULL }, 3260,  "tcp"  },
-  { "iscsi-target",    { NULL }, 3260,  "udp"  },
-  { "winshadow",       { NULL }, 3261,  "tcp"  },
-  { "winshadow",       { NULL }, 3261,  "udp"  },
-  { "necp",            { NULL }, 3262,  "tcp"  },
-  { "necp",            { NULL }, 3262,  "udp"  },
-  { "ecolor-imager",   { NULL }, 3263,  "tcp"  },
-  { "ecolor-imager",   { NULL }, 3263,  "udp"  },
-  { "ccmail",          { NULL }, 3264,  "tcp"  },
-  { "ccmail",          { NULL }, 3264,  "udp"  },
-  { "altav-tunnel",    { NULL }, 3265,  "tcp"  },
-  { "altav-tunnel",    { NULL }, 3265,  "udp"  },
-  { "ns-cfg-server",   { NULL }, 3266,  "tcp"  },
-  { "ns-cfg-server",   { NULL }, 3266,  "udp"  },
-  { "ibm-dial-out",    { NULL }, 3267,  "tcp"  },
-  { "ibm-dial-out",    { NULL }, 3267,  "udp"  },
-  { "msft-gc",         { NULL }, 3268,  "tcp"  },
-  { "msft-gc",         { NULL }, 3268,  "udp"  },
-  { "msft-gc-ssl",     { NULL }, 3269,  "tcp"  },
-  { "msft-gc-ssl",     { NULL }, 3269,  "udp"  },
-  { "verismart",       { NULL }, 3270,  "tcp"  },
-  { "verismart",       { NULL }, 3270,  "udp"  },
-  { "csoft-prev",      { NULL }, 3271,  "tcp"  },
-  { "csoft-prev",      { NULL }, 3271,  "udp"  },
-  { "user-manager",    { NULL }, 3272,  "tcp"  },
-  { "user-manager",    { NULL }, 3272,  "udp"  },
-  { "sxmp",            { NULL }, 3273,  "tcp"  },
-  { "sxmp",            { NULL }, 3273,  "udp"  },
-  { "ordinox-server",  { NULL }, 3274,  "tcp"  },
-  { "ordinox-server",  { NULL }, 3274,  "udp"  },
-  { "samd",            { NULL }, 3275,  "tcp"  },
-  { "samd",            { NULL }, 3275,  "udp"  },
-  { "maxim-asics",     { NULL }, 3276,  "tcp"  },
-  { "maxim-asics",     { NULL }, 3276,  "udp"  },
-  { "awg-proxy",       { NULL }, 3277,  "tcp"  },
-  { "awg-proxy",       { NULL }, 3277,  "udp"  },
-  { "lkcmserver",      { NULL }, 3278,  "tcp"  },
-  { "lkcmserver",      { NULL }, 3278,  "udp"  },
-  { "admind",          { NULL }, 3279,  "tcp"  },
-  { "admind",          { NULL }, 3279,  "udp"  },
-  { "vs-server",       { NULL }, 3280,  "tcp"  },
-  { "vs-server",       { NULL }, 3280,  "udp"  },
-  { "sysopt",          { NULL }, 3281,  "tcp"  },
-  { "sysopt",          { NULL }, 3281,  "udp"  },
-  { "datusorb",        { NULL }, 3282,  "tcp"  },
-  { "datusorb",        { NULL }, 3282,  "udp"  },
-  { "net-assistant",   { NULL }, 3283,  "tcp"  },
-  { "net-assistant",   { NULL }, 3283,  "udp"  },
-  { "4talk",           { NULL }, 3284,  "tcp"  },
-  { "4talk",           { NULL }, 3284,  "udp"  },
-  { "plato",           { NULL }, 3285,  "tcp"  },
-  { "plato",           { NULL }, 3285,  "udp"  },
-  { "e-net",           { NULL }, 3286,  "tcp"  },
-  { "e-net",           { NULL }, 3286,  "udp"  },
-  { "directvdata",     { NULL }, 3287,  "tcp"  },
-  { "directvdata",     { NULL }, 3287,  "udp"  },
-  { "cops",            { NULL }, 3288,  "tcp"  },
-  { "cops",            { NULL }, 3288,  "udp"  },
-  { "enpc",            { NULL }, 3289,  "tcp"  },
-  { "enpc",            { NULL }, 3289,  "udp"  },
-  { "caps-lm",         { NULL }, 3290,  "tcp"  },
-  { "caps-lm",         { NULL }, 3290,  "udp"  },
-  { "sah-lm",          { NULL }, 3291,  "tcp"  },
-  { "sah-lm",          { NULL }, 3291,  "udp"  },
-  { "cart-o-rama",     { NULL }, 3292,  "tcp"  },
-  { "cart-o-rama",     { NULL }, 3292,  "udp"  },
-  { "fg-fps",          { NULL }, 3293,  "tcp"  },
-  { "fg-fps",          { NULL }, 3293,  "udp"  },
-  { "fg-gip",          { NULL }, 3294,  "tcp"  },
-  { "fg-gip",          { NULL }, 3294,  "udp"  },
-  { "dyniplookup",     { NULL }, 3295,  "tcp"  },
-  { "dyniplookup",     { NULL }, 3295,  "udp"  },
-  { "rib-slm",         { NULL }, 3296,  "tcp"  },
-  { "rib-slm",         { NULL }, 3296,  "udp"  },
-  { "cytel-lm",        { NULL }, 3297,  "tcp"  },
-  { "cytel-lm",        { NULL }, 3297,  "udp"  },
-  { "deskview",        { NULL }, 3298,  "tcp"  },
-  { "deskview",        { NULL }, 3298,  "udp"  },
-  { "pdrncs",          { NULL }, 3299,  "tcp"  },
-  { "pdrncs",          { NULL }, 3299,  "udp"  },
-  { "mcs-fastmail",    { NULL }, 3302,  "tcp"  },
-  { "mcs-fastmail",    { NULL }, 3302,  "udp"  },
-  { "opsession-clnt",  { NULL }, 3303,  "tcp"  },
-  { "opsession-clnt",  { NULL }, 3303,  "udp"  },
-  { "opsession-srvr",  { NULL }, 3304,  "tcp"  },
-  { "opsession-srvr",  { NULL }, 3304,  "udp"  },
-  { "odette-ftp",      { NULL }, 3305,  "tcp"  },
-  { "odette-ftp",      { NULL }, 3305,  "udp"  },
-  { "mysql",           { NULL }, 3306,  "tcp"  },
-  { "mysql",           { NULL }, 3306,  "udp"  },
-  { "opsession-prxy",  { NULL }, 3307,  "tcp"  },
-  { "opsession-prxy",  { NULL }, 3307,  "udp"  },
-  { "tns-server",      { NULL }, 3308,  "tcp"  },
-  { "tns-server",      { NULL }, 3308,  "udp"  },
-  { "tns-adv",         { NULL }, 3309,  "tcp"  },
-  { "tns-adv",         { NULL }, 3309,  "udp"  },
-  { "dyna-access",     { NULL }, 3310,  "tcp"  },
-  { "dyna-access",     { NULL }, 3310,  "udp"  },
-  { "mcns-tel-ret",    { NULL }, 3311,  "tcp"  },
-  { "mcns-tel-ret",    { NULL }, 3311,  "udp"  },
-  { "appman-server",   { NULL }, 3312,  "tcp"  },
-  { "appman-server",   { NULL }, 3312,  "udp"  },
-  { "uorb",            { NULL }, 3313,  "tcp"  },
-  { "uorb",            { NULL }, 3313,  "udp"  },
-  { "uohost",          { NULL }, 3314,  "tcp"  },
-  { "uohost",          { NULL }, 3314,  "udp"  },
-  { "cdid",            { NULL }, 3315,  "tcp"  },
-  { "cdid",            { NULL }, 3315,  "udp"  },
-  { "aicc-cmi",        { NULL }, 3316,  "tcp"  },
-  { "aicc-cmi",        { NULL }, 3316,  "udp"  },
-  { "vsaiport",        { NULL }, 3317,  "tcp"  },
-  { "vsaiport",        { NULL }, 3317,  "udp"  },
-  { "ssrip",           { NULL }, 3318,  "tcp"  },
-  { "ssrip",           { NULL }, 3318,  "udp"  },
-  { "sdt-lmd",         { NULL }, 3319,  "tcp"  },
-  { "sdt-lmd",         { NULL }, 3319,  "udp"  },
-  { "officelink2000",  { NULL }, 3320,  "tcp"  },
-  { "officelink2000",  { NULL }, 3320,  "udp"  },
-  { "vnsstr",          { NULL }, 3321,  "tcp"  },
-  { "vnsstr",          { NULL }, 3321,  "udp"  },
-  { "sftu",            { NULL }, 3326,  "tcp"  },
-  { "sftu",            { NULL }, 3326,  "udp"  },
-  { "bbars",           { NULL }, 3327,  "tcp"  },
-  { "bbars",           { NULL }, 3327,  "udp"  },
-  { "egptlm",          { NULL }, 3328,  "tcp"  },
-  { "egptlm",          { NULL }, 3328,  "udp"  },
-  { "hp-device-disc",  { NULL }, 3329,  "tcp"  },
-  { "hp-device-disc",  { NULL }, 3329,  "udp"  },
-  { "mcs-calypsoicf",  { NULL }, 3330,  "tcp"  },
-  { "mcs-calypsoicf",  { NULL }, 3330,  "udp"  },
-  { "mcs-messaging",   { NULL }, 3331,  "tcp"  },
-  { "mcs-messaging",   { NULL }, 3331,  "udp"  },
-  { "mcs-mailsvr",     { NULL }, 3332,  "tcp"  },
-  { "mcs-mailsvr",     { NULL }, 3332,  "udp"  },
-  { "dec-notes",       { NULL }, 3333,  "tcp"  },
-  { "dec-notes",       { NULL }, 3333,  "udp"  },
-  { "directv-web",     { NULL }, 3334,  "tcp"  },
-  { "directv-web",     { NULL }, 3334,  "udp"  },
-  { "directv-soft",    { NULL }, 3335,  "tcp"  },
-  { "directv-soft",    { NULL }, 3335,  "udp"  },
-  { "directv-tick",    { NULL }, 3336,  "tcp"  },
-  { "directv-tick",    { NULL }, 3336,  "udp"  },
-  { "directv-catlg",   { NULL }, 3337,  "tcp"  },
-  { "directv-catlg",   { NULL }, 3337,  "udp"  },
-  { "anet-b",          { NULL }, 3338,  "tcp"  },
-  { "anet-b",          { NULL }, 3338,  "udp"  },
-  { "anet-l",          { NULL }, 3339,  "tcp"  },
-  { "anet-l",          { NULL }, 3339,  "udp"  },
-  { "anet-m",          { NULL }, 3340,  "tcp"  },
-  { "anet-m",          { NULL }, 3340,  "udp"  },
-  { "anet-h",          { NULL }, 3341,  "tcp"  },
-  { "anet-h",          { NULL }, 3341,  "udp"  },
-  { "webtie",          { NULL }, 3342,  "tcp"  },
-  { "webtie",          { NULL }, 3342,  "udp"  },
-  { "ms-cluster-net",  { NULL }, 3343,  "tcp"  },
-  { "ms-cluster-net",  { NULL }, 3343,  "udp"  },
-  { "bnt-manager",     { NULL }, 3344,  "tcp"  },
-  { "bnt-manager",     { NULL }, 3344,  "udp"  },
-  { "influence",       { NULL }, 3345,  "tcp"  },
-  { "influence",       { NULL }, 3345,  "udp"  },
-  { "trnsprntproxy",   { NULL }, 3346,  "tcp"  },
-  { "trnsprntproxy",   { NULL }, 3346,  "udp"  },
-  { "phoenix-rpc",     { NULL }, 3347,  "tcp"  },
-  { "phoenix-rpc",     { NULL }, 3347,  "udp"  },
-  { "pangolin-laser",  { NULL }, 3348,  "tcp"  },
-  { "pangolin-laser",  { NULL }, 3348,  "udp"  },
-  { "chevinservices",  { NULL }, 3349,  "tcp"  },
-  { "chevinservices",  { NULL }, 3349,  "udp"  },
-  { "findviatv",       { NULL }, 3350,  "tcp"  },
-  { "findviatv",       { NULL }, 3350,  "udp"  },
-  { "btrieve",         { NULL }, 3351,  "tcp"  },
-  { "btrieve",         { NULL }, 3351,  "udp"  },
-  { "ssql",            { NULL }, 3352,  "tcp"  },
-  { "ssql",            { NULL }, 3352,  "udp"  },
-  { "fatpipe",         { NULL }, 3353,  "tcp"  },
-  { "fatpipe",         { NULL }, 3353,  "udp"  },
-  { "suitjd",          { NULL }, 3354,  "tcp"  },
-  { "suitjd",          { NULL }, 3354,  "udp"  },
-  { "ordinox-dbase",   { NULL }, 3355,  "tcp"  },
-  { "ordinox-dbase",   { NULL }, 3355,  "udp"  },
-  { "upnotifyps",      { NULL }, 3356,  "tcp"  },
-  { "upnotifyps",      { NULL }, 3356,  "udp"  },
-  { "adtech-test",     { NULL }, 3357,  "tcp"  },
-  { "adtech-test",     { NULL }, 3357,  "udp"  },
-  { "mpsysrmsvr",      { NULL }, 3358,  "tcp"  },
-  { "mpsysrmsvr",      { NULL }, 3358,  "udp"  },
-  { "wg-netforce",     { NULL }, 3359,  "tcp"  },
-  { "wg-netforce",     { NULL }, 3359,  "udp"  },
-  { "kv-server",       { NULL }, 3360,  "tcp"  },
-  { "kv-server",       { NULL }, 3360,  "udp"  },
-  { "kv-agent",        { NULL }, 3361,  "tcp"  },
-  { "kv-agent",        { NULL }, 3361,  "udp"  },
-  { "dj-ilm",          { NULL }, 3362,  "tcp"  },
-  { "dj-ilm",          { NULL }, 3362,  "udp"  },
-  { "nati-vi-server",  { NULL }, 3363,  "tcp"  },
-  { "nati-vi-server",  { NULL }, 3363,  "udp"  },
-  { "creativeserver",  { NULL }, 3364,  "tcp"  },
-  { "creativeserver",  { NULL }, 3364,  "udp"  },
-  { "contentserver",   { NULL }, 3365,  "tcp"  },
-  { "contentserver",   { NULL }, 3365,  "udp"  },
-  { "creativepartnr",  { NULL }, 3366,  "tcp"  },
-  { "creativepartnr",  { NULL }, 3366,  "udp"  },
-  { "tip2",            { NULL }, 3372,  "tcp"  },
-  { "tip2",            { NULL }, 3372,  "udp"  },
-  { "lavenir-lm",      { NULL }, 3373,  "tcp"  },
-  { "lavenir-lm",      { NULL }, 3373,  "udp"  },
-  { "cluster-disc",    { NULL }, 3374,  "tcp"  },
-  { "cluster-disc",    { NULL }, 3374,  "udp"  },
-  { "vsnm-agent",      { NULL }, 3375,  "tcp"  },
-  { "vsnm-agent",      { NULL }, 3375,  "udp"  },
-  { "cdbroker",        { NULL }, 3376,  "tcp"  },
-  { "cdbroker",        { NULL }, 3376,  "udp"  },
-  { "cogsys-lm",       { NULL }, 3377,  "tcp"  },
-  { "cogsys-lm",       { NULL }, 3377,  "udp"  },
-  { "wsicopy",         { NULL }, 3378,  "tcp"  },
-  { "wsicopy",         { NULL }, 3378,  "udp"  },
-  { "socorfs",         { NULL }, 3379,  "tcp"  },
-  { "socorfs",         { NULL }, 3379,  "udp"  },
-  { "sns-channels",    { NULL }, 3380,  "tcp"  },
-  { "sns-channels",    { NULL }, 3380,  "udp"  },
-  { "geneous",         { NULL }, 3381,  "tcp"  },
-  { "geneous",         { NULL }, 3381,  "udp"  },
-  { "fujitsu-neat",    { NULL }, 3382,  "tcp"  },
-  { "fujitsu-neat",    { NULL }, 3382,  "udp"  },
-  { "esp-lm",          { NULL }, 3383,  "tcp"  },
-  { "esp-lm",          { NULL }, 3383,  "udp"  },
-  { "hp-clic",         { NULL }, 3384,  "tcp"  },
-  { "hp-clic",         { NULL }, 3384,  "udp"  },
-  { "qnxnetman",       { NULL }, 3385,  "tcp"  },
-  { "qnxnetman",       { NULL }, 3385,  "udp"  },
-  { "gprs-data",       { NULL }, 3386,  "tcp"  },
-  { "gprs-sig",        { NULL }, 3386,  "udp"  },
-  { "backroomnet",     { NULL }, 3387,  "tcp"  },
-  { "backroomnet",     { NULL }, 3387,  "udp"  },
-  { "cbserver",        { NULL }, 3388,  "tcp"  },
-  { "cbserver",        { NULL }, 3388,  "udp"  },
-  { "ms-wbt-server",   { NULL }, 3389,  "tcp"  },
-  { "ms-wbt-server",   { NULL }, 3389,  "udp"  },
-  { "dsc",             { NULL }, 3390,  "tcp"  },
-  { "dsc",             { NULL }, 3390,  "udp"  },
-  { "savant",          { NULL }, 3391,  "tcp"  },
-  { "savant",          { NULL }, 3391,  "udp"  },
-  { "efi-lm",          { NULL }, 3392,  "tcp"  },
-  { "efi-lm",          { NULL }, 3392,  "udp"  },
-  { "d2k-tapestry1",   { NULL }, 3393,  "tcp"  },
-  { "d2k-tapestry1",   { NULL }, 3393,  "udp"  },
-  { "d2k-tapestry2",   { NULL }, 3394,  "tcp"  },
-  { "d2k-tapestry2",   { NULL }, 3394,  "udp"  },
-  { "dyna-lm",         { NULL }, 3395,  "tcp"  },
-  { "dyna-lm",         { NULL }, 3395,  "udp"  },
-  { "printer_agent",   { NULL }, 3396,  "tcp"  },
-  { "printer_agent",   { NULL }, 3396,  "udp"  },
-  { "cloanto-lm",      { NULL }, 3397,  "tcp"  },
-  { "cloanto-lm",      { NULL }, 3397,  "udp"  },
-  { "mercantile",      { NULL }, 3398,  "tcp"  },
-  { "mercantile",      { NULL }, 3398,  "udp"  },
-  { "csms",            { NULL }, 3399,  "tcp"  },
-  { "csms",            { NULL }, 3399,  "udp"  },
-  { "csms2",           { NULL }, 3400,  "tcp"  },
-  { "csms2",           { NULL }, 3400,  "udp"  },
-  { "filecast",        { NULL }, 3401,  "tcp"  },
-  { "filecast",        { NULL }, 3401,  "udp"  },
-  { "fxaengine-net",   { NULL }, 3402,  "tcp"  },
-  { "fxaengine-net",   { NULL }, 3402,  "udp"  },
-  { "nokia-ann-ch1",   { NULL }, 3405,  "tcp"  },
-  { "nokia-ann-ch1",   { NULL }, 3405,  "udp"  },
-  { "nokia-ann-ch2",   { NULL }, 3406,  "tcp"  },
-  { "nokia-ann-ch2",   { NULL }, 3406,  "udp"  },
-  { "ldap-admin",      { NULL }, 3407,  "tcp"  },
-  { "ldap-admin",      { NULL }, 3407,  "udp"  },
-  { "BESApi",          { NULL }, 3408,  "tcp"  },
-  { "BESApi",          { NULL }, 3408,  "udp"  },
-  { "networklens",     { NULL }, 3409,  "tcp"  },
-  { "networklens",     { NULL }, 3409,  "udp"  },
-  { "networklenss",    { NULL }, 3410,  "tcp"  },
-  { "networklenss",    { NULL }, 3410,  "udp"  },
-  { "biolink-auth",    { NULL }, 3411,  "tcp"  },
-  { "biolink-auth",    { NULL }, 3411,  "udp"  },
-  { "xmlblaster",      { NULL }, 3412,  "tcp"  },
-  { "xmlblaster",      { NULL }, 3412,  "udp"  },
-  { "svnet",           { NULL }, 3413,  "tcp"  },
-  { "svnet",           { NULL }, 3413,  "udp"  },
-  { "wip-port",        { NULL }, 3414,  "tcp"  },
-  { "wip-port",        { NULL }, 3414,  "udp"  },
-  { "bcinameservice",  { NULL }, 3415,  "tcp"  },
-  { "bcinameservice",  { NULL }, 3415,  "udp"  },
-  { "commandport",     { NULL }, 3416,  "tcp"  },
-  { "commandport",     { NULL }, 3416,  "udp"  },
-  { "csvr",            { NULL }, 3417,  "tcp"  },
-  { "csvr",            { NULL }, 3417,  "udp"  },
-  { "rnmap",           { NULL }, 3418,  "tcp"  },
-  { "rnmap",           { NULL }, 3418,  "udp"  },
-  { "softaudit",       { NULL }, 3419,  "tcp"  },
-  { "softaudit",       { NULL }, 3419,  "udp"  },
-  { "ifcp-port",       { NULL }, 3420,  "tcp"  },
-  { "ifcp-port",       { NULL }, 3420,  "udp"  },
-  { "bmap",            { NULL }, 3421,  "tcp"  },
-  { "bmap",            { NULL }, 3421,  "udp"  },
-  { "rusb-sys-port",   { NULL }, 3422,  "tcp"  },
-  { "rusb-sys-port",   { NULL }, 3422,  "udp"  },
-  { "xtrm",            { NULL }, 3423,  "tcp"  },
-  { "xtrm",            { NULL }, 3423,  "udp"  },
-  { "xtrms",           { NULL }, 3424,  "tcp"  },
-  { "xtrms",           { NULL }, 3424,  "udp"  },
-  { "agps-port",       { NULL }, 3425,  "tcp"  },
-  { "agps-port",       { NULL }, 3425,  "udp"  },
-  { "arkivio",         { NULL }, 3426,  "tcp"  },
-  { "arkivio",         { NULL }, 3426,  "udp"  },
-  { "websphere-snmp",  { NULL }, 3427,  "tcp"  },
-  { "websphere-snmp",  { NULL }, 3427,  "udp"  },
-  { "twcss",           { NULL }, 3428,  "tcp"  },
-  { "twcss",           { NULL }, 3428,  "udp"  },
-  { "gcsp",            { NULL }, 3429,  "tcp"  },
-  { "gcsp",            { NULL }, 3429,  "udp"  },
-  { "ssdispatch",      { NULL }, 3430,  "tcp"  },
-  { "ssdispatch",      { NULL }, 3430,  "udp"  },
-  { "ndl-als",         { NULL }, 3431,  "tcp"  },
-  { "ndl-als",         { NULL }, 3431,  "udp"  },
-  { "osdcp",           { NULL }, 3432,  "tcp"  },
-  { "osdcp",           { NULL }, 3432,  "udp"  },
-  { "alta-smp",        { NULL }, 3433,  "tcp"  },
-  { "alta-smp",        { NULL }, 3433,  "udp"  },
-  { "opencm",          { NULL }, 3434,  "tcp"  },
-  { "opencm",          { NULL }, 3434,  "udp"  },
-  { "pacom",           { NULL }, 3435,  "tcp"  },
-  { "pacom",           { NULL }, 3435,  "udp"  },
-  { "gc-config",       { NULL }, 3436,  "tcp"  },
-  { "gc-config",       { NULL }, 3436,  "udp"  },
-  { "autocueds",       { NULL }, 3437,  "tcp"  },
-  { "autocueds",       { NULL }, 3437,  "udp"  },
-  { "spiral-admin",    { NULL }, 3438,  "tcp"  },
-  { "spiral-admin",    { NULL }, 3438,  "udp"  },
-  { "hri-port",        { NULL }, 3439,  "tcp"  },
-  { "hri-port",        { NULL }, 3439,  "udp"  },
-  { "ans-console",     { NULL }, 3440,  "tcp"  },
-  { "ans-console",     { NULL }, 3440,  "udp"  },
-  { "connect-client",  { NULL }, 3441,  "tcp"  },
-  { "connect-client",  { NULL }, 3441,  "udp"  },
-  { "connect-server",  { NULL }, 3442,  "tcp"  },
-  { "connect-server",  { NULL }, 3442,  "udp"  },
-  { "ov-nnm-websrv",   { NULL }, 3443,  "tcp"  },
-  { "ov-nnm-websrv",   { NULL }, 3443,  "udp"  },
-  { "denali-server",   { NULL }, 3444,  "tcp"  },
-  { "denali-server",   { NULL }, 3444,  "udp"  },
-  { "monp",            { NULL }, 3445,  "tcp"  },
-  { "monp",            { NULL }, 3445,  "udp"  },
-  { "3comfaxrpc",      { NULL }, 3446,  "tcp"  },
-  { "3comfaxrpc",      { NULL }, 3446,  "udp"  },
-  { "directnet",       { NULL }, 3447,  "tcp"  },
-  { "directnet",       { NULL }, 3447,  "udp"  },
-  { "dnc-port",        { NULL }, 3448,  "tcp"  },
-  { "dnc-port",        { NULL }, 3448,  "udp"  },
-  { "hotu-chat",       { NULL }, 3449,  "tcp"  },
-  { "hotu-chat",       { NULL }, 3449,  "udp"  },
-  { "castorproxy",     { NULL }, 3450,  "tcp"  },
-  { "castorproxy",     { NULL }, 3450,  "udp"  },
-  { "asam",            { NULL }, 3451,  "tcp"  },
-  { "asam",            { NULL }, 3451,  "udp"  },
-  { "sabp-signal",     { NULL }, 3452,  "tcp"  },
-  { "sabp-signal",     { NULL }, 3452,  "udp"  },
-  { "pscupd",          { NULL }, 3453,  "tcp"  },
-  { "pscupd",          { NULL }, 3453,  "udp"  },
-  { "mira",            { NULL }, 3454,  "tcp"  },
-  { "prsvp",           { NULL }, 3455,  "tcp"  },
-  { "prsvp",           { NULL }, 3455,  "udp"  },
-  { "vat",             { NULL }, 3456,  "tcp"  },
-  { "vat",             { NULL }, 3456,  "udp"  },
-  { "vat-control",     { NULL }, 3457,  "tcp"  },
-  { "vat-control",     { NULL }, 3457,  "udp"  },
-  { "d3winosfi",       { NULL }, 3458,  "tcp"  },
-  { "d3winosfi",       { NULL }, 3458,  "udp"  },
-  { "integral",        { NULL }, 3459,  "tcp"  },
-  { "integral",        { NULL }, 3459,  "udp"  },
-  { "edm-manager",     { NULL }, 3460,  "tcp"  },
-  { "edm-manager",     { NULL }, 3460,  "udp"  },
-  { "edm-stager",      { NULL }, 3461,  "tcp"  },
-  { "edm-stager",      { NULL }, 3461,  "udp"  },
-  { "edm-std-notify",  { NULL }, 3462,  "tcp"  },
-  { "edm-std-notify",  { NULL }, 3462,  "udp"  },
-  { "edm-adm-notify",  { NULL }, 3463,  "tcp"  },
-  { "edm-adm-notify",  { NULL }, 3463,  "udp"  },
-  { "edm-mgr-sync",    { NULL }, 3464,  "tcp"  },
-  { "edm-mgr-sync",    { NULL }, 3464,  "udp"  },
-  { "edm-mgr-cntrl",   { NULL }, 3465,  "tcp"  },
-  { "edm-mgr-cntrl",   { NULL }, 3465,  "udp"  },
-  { "workflow",        { NULL }, 3466,  "tcp"  },
-  { "workflow",        { NULL }, 3466,  "udp"  },
-  { "rcst",            { NULL }, 3467,  "tcp"  },
-  { "rcst",            { NULL }, 3467,  "udp"  },
-  { "ttcmremotectrl",  { NULL }, 3468,  "tcp"  },
-  { "ttcmremotectrl",  { NULL }, 3468,  "udp"  },
-  { "pluribus",        { NULL }, 3469,  "tcp"  },
-  { "pluribus",        { NULL }, 3469,  "udp"  },
-  { "jt400",           { NULL }, 3470,  "tcp"  },
-  { "jt400",           { NULL }, 3470,  "udp"  },
-  { "jt400-ssl",       { NULL }, 3471,  "tcp"  },
-  { "jt400-ssl",       { NULL }, 3471,  "udp"  },
-  { "jaugsremotec-1",  { NULL }, 3472,  "tcp"  },
-  { "jaugsremotec-1",  { NULL }, 3472,  "udp"  },
-  { "jaugsremotec-2",  { NULL }, 3473,  "tcp"  },
-  { "jaugsremotec-2",  { NULL }, 3473,  "udp"  },
-  { "ttntspauto",      { NULL }, 3474,  "tcp"  },
-  { "ttntspauto",      { NULL }, 3474,  "udp"  },
-  { "genisar-port",    { NULL }, 3475,  "tcp"  },
-  { "genisar-port",    { NULL }, 3475,  "udp"  },
-  { "nppmp",           { NULL }, 3476,  "tcp"  },
-  { "nppmp",           { NULL }, 3476,  "udp"  },
-  { "ecomm",           { NULL }, 3477,  "tcp"  },
-  { "ecomm",           { NULL }, 3477,  "udp"  },
-  { "stun",            { NULL }, 3478,  "tcp"  },
-  { "stun",            { NULL }, 3478,  "udp"  },
-  { "turn",            { NULL }, 3478,  "tcp"  },
-  { "turn",            { NULL }, 3478,  "udp"  },
-  { "stun-behavior",   { NULL }, 3478,  "tcp"  },
-  { "stun-behavior",   { NULL }, 3478,  "udp"  },
-  { "twrpc",           { NULL }, 3479,  "tcp"  },
-  { "twrpc",           { NULL }, 3479,  "udp"  },
-  { "plethora",        { NULL }, 3480,  "tcp"  },
-  { "plethora",        { NULL }, 3480,  "udp"  },
-  { "cleanerliverc",   { NULL }, 3481,  "tcp"  },
-  { "cleanerliverc",   { NULL }, 3481,  "udp"  },
-  { "vulture",         { NULL }, 3482,  "tcp"  },
-  { "vulture",         { NULL }, 3482,  "udp"  },
-  { "slim-devices",    { NULL }, 3483,  "tcp"  },
-  { "slim-devices",    { NULL }, 3483,  "udp"  },
-  { "gbs-stp",         { NULL }, 3484,  "tcp"  },
-  { "gbs-stp",         { NULL }, 3484,  "udp"  },
-  { "celatalk",        { NULL }, 3485,  "tcp"  },
-  { "celatalk",        { NULL }, 3485,  "udp"  },
-  { "ifsf-hb-port",    { NULL }, 3486,  "tcp"  },
-  { "ifsf-hb-port",    { NULL }, 3486,  "udp"  },
-  { "ltctcp",          { NULL }, 3487,  "tcp"  },
-  { "ltcudp",          { NULL }, 3487,  "udp"  },
-  { "fs-rh-srv",       { NULL }, 3488,  "tcp"  },
-  { "fs-rh-srv",       { NULL }, 3488,  "udp"  },
-  { "dtp-dia",         { NULL }, 3489,  "tcp"  },
-  { "dtp-dia",         { NULL }, 3489,  "udp"  },
-  { "colubris",        { NULL }, 3490,  "tcp"  },
-  { "colubris",        { NULL }, 3490,  "udp"  },
-  { "swr-port",        { NULL }, 3491,  "tcp"  },
-  { "swr-port",        { NULL }, 3491,  "udp"  },
-  { "tvdumtray-port",  { NULL }, 3492,  "tcp"  },
-  { "tvdumtray-port",  { NULL }, 3492,  "udp"  },
-  { "nut",             { NULL }, 3493,  "tcp"  },
-  { "nut",             { NULL }, 3493,  "udp"  },
-  { "ibm3494",         { NULL }, 3494,  "tcp"  },
-  { "ibm3494",         { NULL }, 3494,  "udp"  },
-  { "seclayer-tcp",    { NULL }, 3495,  "tcp"  },
-  { "seclayer-tcp",    { NULL }, 3495,  "udp"  },
-  { "seclayer-tls",    { NULL }, 3496,  "tcp"  },
-  { "seclayer-tls",    { NULL }, 3496,  "udp"  },
-  { "ipether232port",  { NULL }, 3497,  "tcp"  },
-  { "ipether232port",  { NULL }, 3497,  "udp"  },
-  { "dashpas-port",    { NULL }, 3498,  "tcp"  },
-  { "dashpas-port",    { NULL }, 3498,  "udp"  },
-  { "sccip-media",     { NULL }, 3499,  "tcp"  },
-  { "sccip-media",     { NULL }, 3499,  "udp"  },
-  { "rtmp-port",       { NULL }, 3500,  "tcp"  },
-  { "rtmp-port",       { NULL }, 3500,  "udp"  },
-  { "isoft-p2p",       { NULL }, 3501,  "tcp"  },
-  { "isoft-p2p",       { NULL }, 3501,  "udp"  },
-  { "avinstalldisc",   { NULL }, 3502,  "tcp"  },
-  { "avinstalldisc",   { NULL }, 3502,  "udp"  },
-  { "lsp-ping",        { NULL }, 3503,  "tcp"  },
-  { "lsp-ping",        { NULL }, 3503,  "udp"  },
-  { "ironstorm",       { NULL }, 3504,  "tcp"  },
-  { "ironstorm",       { NULL }, 3504,  "udp"  },
-  { "ccmcomm",         { NULL }, 3505,  "tcp"  },
-  { "ccmcomm",         { NULL }, 3505,  "udp"  },
-  { "apc-3506",        { NULL }, 3506,  "tcp"  },
-  { "apc-3506",        { NULL }, 3506,  "udp"  },
-  { "nesh-broker",     { NULL }, 3507,  "tcp"  },
-  { "nesh-broker",     { NULL }, 3507,  "udp"  },
-  { "interactionweb",  { NULL }, 3508,  "tcp"  },
-  { "interactionweb",  { NULL }, 3508,  "udp"  },
-  { "vt-ssl",          { NULL }, 3509,  "tcp"  },
-  { "vt-ssl",          { NULL }, 3509,  "udp"  },
-  { "xss-port",        { NULL }, 3510,  "tcp"  },
-  { "xss-port",        { NULL }, 3510,  "udp"  },
-  { "webmail-2",       { NULL }, 3511,  "tcp"  },
-  { "webmail-2",       { NULL }, 3511,  "udp"  },
-  { "aztec",           { NULL }, 3512,  "tcp"  },
-  { "aztec",           { NULL }, 3512,  "udp"  },
-  { "arcpd",           { NULL }, 3513,  "tcp"  },
-  { "arcpd",           { NULL }, 3513,  "udp"  },
-  { "must-p2p",        { NULL }, 3514,  "tcp"  },
-  { "must-p2p",        { NULL }, 3514,  "udp"  },
-  { "must-backplane",  { NULL }, 3515,  "tcp"  },
-  { "must-backplane",  { NULL }, 3515,  "udp"  },
-  { "smartcard-port",  { NULL }, 3516,  "tcp"  },
-  { "smartcard-port",  { NULL }, 3516,  "udp"  },
-  { "802-11-iapp",     { NULL }, 3517,  "tcp"  },
-  { "802-11-iapp",     { NULL }, 3517,  "udp"  },
-  { "artifact-msg",    { NULL }, 3518,  "tcp"  },
-  { "artifact-msg",    { NULL }, 3518,  "udp"  },
-  { "nvmsgd",          { NULL }, 3519,  "tcp"  },
-  { "galileo",         { NULL }, 3519,  "udp"  },
-  { "galileolog",      { NULL }, 3520,  "tcp"  },
-  { "galileolog",      { NULL }, 3520,  "udp"  },
-  { "mc3ss",           { NULL }, 3521,  "tcp"  },
-  { "mc3ss",           { NULL }, 3521,  "udp"  },
-  { "nssocketport",    { NULL }, 3522,  "tcp"  },
-  { "nssocketport",    { NULL }, 3522,  "udp"  },
-  { "odeumservlink",   { NULL }, 3523,  "tcp"  },
-  { "odeumservlink",   { NULL }, 3523,  "udp"  },
-  { "ecmport",         { NULL }, 3524,  "tcp"  },
-  { "ecmport",         { NULL }, 3524,  "udp"  },
-  { "eisport",         { NULL }, 3525,  "tcp"  },
-  { "eisport",         { NULL }, 3525,  "udp"  },
-  { "starquiz-port",   { NULL }, 3526,  "tcp"  },
-  { "starquiz-port",   { NULL }, 3526,  "udp"  },
-  { "beserver-msg-q",  { NULL }, 3527,  "tcp"  },
-  { "beserver-msg-q",  { NULL }, 3527,  "udp"  },
-  { "jboss-iiop",      { NULL }, 3528,  "tcp"  },
-  { "jboss-iiop",      { NULL }, 3528,  "udp"  },
-  { "jboss-iiop-ssl",  { NULL }, 3529,  "tcp"  },
-  { "jboss-iiop-ssl",  { NULL }, 3529,  "udp"  },
-  { "gf",              { NULL }, 3530,  "tcp"  },
-  { "gf",              { NULL }, 3530,  "udp"  },
-  { "joltid",          { NULL }, 3531,  "tcp"  },
-  { "joltid",          { NULL }, 3531,  "udp"  },
-  { "raven-rmp",       { NULL }, 3532,  "tcp"  },
-  { "raven-rmp",       { NULL }, 3532,  "udp"  },
-  { "raven-rdp",       { NULL }, 3533,  "tcp"  },
-  { "raven-rdp",       { NULL }, 3533,  "udp"  },
-  { "urld-port",       { NULL }, 3534,  "tcp"  },
-  { "urld-port",       { NULL }, 3534,  "udp"  },
-  { "ms-la",           { NULL }, 3535,  "tcp"  },
-  { "ms-la",           { NULL }, 3535,  "udp"  },
-  { "snac",            { NULL }, 3536,  "tcp"  },
-  { "snac",            { NULL }, 3536,  "udp"  },
-  { "ni-visa-remote",  { NULL }, 3537,  "tcp"  },
-  { "ni-visa-remote",  { NULL }, 3537,  "udp"  },
-  { "ibm-diradm",      { NULL }, 3538,  "tcp"  },
-  { "ibm-diradm",      { NULL }, 3538,  "udp"  },
-  { "ibm-diradm-ssl",  { NULL }, 3539,  "tcp"  },
-  { "ibm-diradm-ssl",  { NULL }, 3539,  "udp"  },
-  { "pnrp-port",       { NULL }, 3540,  "tcp"  },
-  { "pnrp-port",       { NULL }, 3540,  "udp"  },
-  { "voispeed-port",   { NULL }, 3541,  "tcp"  },
-  { "voispeed-port",   { NULL }, 3541,  "udp"  },
-  { "hacl-monitor",    { NULL }, 3542,  "tcp"  },
-  { "hacl-monitor",    { NULL }, 3542,  "udp"  },
-  { "qftest-lookup",   { NULL }, 3543,  "tcp"  },
-  { "qftest-lookup",   { NULL }, 3543,  "udp"  },
-  { "teredo",          { NULL }, 3544,  "tcp"  },
-  { "teredo",          { NULL }, 3544,  "udp"  },
-  { "camac",           { NULL }, 3545,  "tcp"  },
-  { "camac",           { NULL }, 3545,  "udp"  },
-  { "symantec-sim",    { NULL }, 3547,  "tcp"  },
-  { "symantec-sim",    { NULL }, 3547,  "udp"  },
-  { "interworld",      { NULL }, 3548,  "tcp"  },
-  { "interworld",      { NULL }, 3548,  "udp"  },
-  { "tellumat-nms",    { NULL }, 3549,  "tcp"  },
-  { "tellumat-nms",    { NULL }, 3549,  "udp"  },
-  { "ssmpp",           { NULL }, 3550,  "tcp"  },
-  { "ssmpp",           { NULL }, 3550,  "udp"  },
-  { "apcupsd",         { NULL }, 3551,  "tcp"  },
-  { "apcupsd",         { NULL }, 3551,  "udp"  },
-  { "taserver",        { NULL }, 3552,  "tcp"  },
-  { "taserver",        { NULL }, 3552,  "udp"  },
-  { "rbr-discovery",   { NULL }, 3553,  "tcp"  },
-  { "rbr-discovery",   { NULL }, 3553,  "udp"  },
-  { "questnotify",     { NULL }, 3554,  "tcp"  },
-  { "questnotify",     { NULL }, 3554,  "udp"  },
-  { "razor",           { NULL }, 3555,  "tcp"  },
-  { "razor",           { NULL }, 3555,  "udp"  },
-  { "sky-transport",   { NULL }, 3556,  "tcp"  },
-  { "sky-transport",   { NULL }, 3556,  "udp"  },
-  { "personalos-001",  { NULL }, 3557,  "tcp"  },
-  { "personalos-001",  { NULL }, 3557,  "udp"  },
-  { "mcp-port",        { NULL }, 3558,  "tcp"  },
-  { "mcp-port",        { NULL }, 3558,  "udp"  },
-  { "cctv-port",       { NULL }, 3559,  "tcp"  },
-  { "cctv-port",       { NULL }, 3559,  "udp"  },
-  { "iniserve-port",   { NULL }, 3560,  "tcp"  },
-  { "iniserve-port",   { NULL }, 3560,  "udp"  },
-  { "bmc-onekey",      { NULL }, 3561,  "tcp"  },
-  { "bmc-onekey",      { NULL }, 3561,  "udp"  },
-  { "sdbproxy",        { NULL }, 3562,  "tcp"  },
-  { "sdbproxy",        { NULL }, 3562,  "udp"  },
-  { "watcomdebug",     { NULL }, 3563,  "tcp"  },
-  { "watcomdebug",     { NULL }, 3563,  "udp"  },
-  { "esimport",        { NULL }, 3564,  "tcp"  },
-  { "esimport",        { NULL }, 3564,  "udp"  },
-  { "m2pa",            { NULL }, 3565,  "tcp"  },
-  { "m2pa",            { NULL }, 3565,  "sctp" },
-  { "quest-data-hub",  { NULL }, 3566,  "tcp"  },
-  { "oap",             { NULL }, 3567,  "tcp"  },
-  { "oap",             { NULL }, 3567,  "udp"  },
-  { "oap-s",           { NULL }, 3568,  "tcp"  },
-  { "oap-s",           { NULL }, 3568,  "udp"  },
-  { "mbg-ctrl",        { NULL }, 3569,  "tcp"  },
-  { "mbg-ctrl",        { NULL }, 3569,  "udp"  },
-  { "mccwebsvr-port",  { NULL }, 3570,  "tcp"  },
-  { "mccwebsvr-port",  { NULL }, 3570,  "udp"  },
-  { "megardsvr-port",  { NULL }, 3571,  "tcp"  },
-  { "megardsvr-port",  { NULL }, 3571,  "udp"  },
-  { "megaregsvrport",  { NULL }, 3572,  "tcp"  },
-  { "megaregsvrport",  { NULL }, 3572,  "udp"  },
-  { "tag-ups-1",       { NULL }, 3573,  "tcp"  },
-  { "tag-ups-1",       { NULL }, 3573,  "udp"  },
-  { "dmaf-server",     { NULL }, 3574,  "tcp"  },
-  { "dmaf-caster",     { NULL }, 3574,  "udp"  },
-  { "ccm-port",        { NULL }, 3575,  "tcp"  },
-  { "ccm-port",        { NULL }, 3575,  "udp"  },
-  { "cmc-port",        { NULL }, 3576,  "tcp"  },
-  { "cmc-port",        { NULL }, 3576,  "udp"  },
-  { "config-port",     { NULL }, 3577,  "tcp"  },
-  { "config-port",     { NULL }, 3577,  "udp"  },
-  { "data-port",       { NULL }, 3578,  "tcp"  },
-  { "data-port",       { NULL }, 3578,  "udp"  },
-  { "ttat3lb",         { NULL }, 3579,  "tcp"  },
-  { "ttat3lb",         { NULL }, 3579,  "udp"  },
-  { "nati-svrloc",     { NULL }, 3580,  "tcp"  },
-  { "nati-svrloc",     { NULL }, 3580,  "udp"  },
-  { "kfxaclicensing",  { NULL }, 3581,  "tcp"  },
-  { "kfxaclicensing",  { NULL }, 3581,  "udp"  },
-  { "press",           { NULL }, 3582,  "tcp"  },
-  { "press",           { NULL }, 3582,  "udp"  },
-  { "canex-watch",     { NULL }, 3583,  "tcp"  },
-  { "canex-watch",     { NULL }, 3583,  "udp"  },
-  { "u-dbap",          { NULL }, 3584,  "tcp"  },
-  { "u-dbap",          { NULL }, 3584,  "udp"  },
-  { "emprise-lls",     { NULL }, 3585,  "tcp"  },
-  { "emprise-lls",     { NULL }, 3585,  "udp"  },
-  { "emprise-lsc",     { NULL }, 3586,  "tcp"  },
-  { "emprise-lsc",     { NULL }, 3586,  "udp"  },
-  { "p2pgroup",        { NULL }, 3587,  "tcp"  },
-  { "p2pgroup",        { NULL }, 3587,  "udp"  },
-  { "sentinel",        { NULL }, 3588,  "tcp"  },
-  { "sentinel",        { NULL }, 3588,  "udp"  },
-  { "isomair",         { NULL }, 3589,  "tcp"  },
-  { "isomair",         { NULL }, 3589,  "udp"  },
-  { "wv-csp-sms",      { NULL }, 3590,  "tcp"  },
-  { "wv-csp-sms",      { NULL }, 3590,  "udp"  },
-  { "gtrack-server",   { NULL }, 3591,  "tcp"  },
-  { "gtrack-server",   { NULL }, 3591,  "udp"  },
-  { "gtrack-ne",       { NULL }, 3592,  "tcp"  },
-  { "gtrack-ne",       { NULL }, 3592,  "udp"  },
-  { "bpmd",            { NULL }, 3593,  "tcp"  },
-  { "bpmd",            { NULL }, 3593,  "udp"  },
-  { "mediaspace",      { NULL }, 3594,  "tcp"  },
-  { "mediaspace",      { NULL }, 3594,  "udp"  },
-  { "shareapp",        { NULL }, 3595,  "tcp"  },
-  { "shareapp",        { NULL }, 3595,  "udp"  },
-  { "iw-mmogame",      { NULL }, 3596,  "tcp"  },
-  { "iw-mmogame",      { NULL }, 3596,  "udp"  },
-  { "a14",             { NULL }, 3597,  "tcp"  },
-  { "a14",             { NULL }, 3597,  "udp"  },
-  { "a15",             { NULL }, 3598,  "tcp"  },
-  { "a15",             { NULL }, 3598,  "udp"  },
-  { "quasar-server",   { NULL }, 3599,  "tcp"  },
-  { "quasar-server",   { NULL }, 3599,  "udp"  },
-  { "trap-daemon",     { NULL }, 3600,  "tcp"  },
-  { "trap-daemon",     { NULL }, 3600,  "udp"  },
-  { "visinet-gui",     { NULL }, 3601,  "tcp"  },
-  { "visinet-gui",     { NULL }, 3601,  "udp"  },
-  { "infiniswitchcl",  { NULL }, 3602,  "tcp"  },
-  { "infiniswitchcl",  { NULL }, 3602,  "udp"  },
-  { "int-rcv-cntrl",   { NULL }, 3603,  "tcp"  },
-  { "int-rcv-cntrl",   { NULL }, 3603,  "udp"  },
-  { "bmc-jmx-port",    { NULL }, 3604,  "tcp"  },
-  { "bmc-jmx-port",    { NULL }, 3604,  "udp"  },
-  { "comcam-io",       { NULL }, 3605,  "tcp"  },
-  { "comcam-io",       { NULL }, 3605,  "udp"  },
-  { "splitlock",       { NULL }, 3606,  "tcp"  },
-  { "splitlock",       { NULL }, 3606,  "udp"  },
-  { "precise-i3",      { NULL }, 3607,  "tcp"  },
-  { "precise-i3",      { NULL }, 3607,  "udp"  },
-  { "trendchip-dcp",   { NULL }, 3608,  "tcp"  },
-  { "trendchip-dcp",   { NULL }, 3608,  "udp"  },
-  { "cpdi-pidas-cm",   { NULL }, 3609,  "tcp"  },
-  { "cpdi-pidas-cm",   { NULL }, 3609,  "udp"  },
-  { "echonet",         { NULL }, 3610,  "tcp"  },
-  { "echonet",         { NULL }, 3610,  "udp"  },
-  { "six-degrees",     { NULL }, 3611,  "tcp"  },
-  { "six-degrees",     { NULL }, 3611,  "udp"  },
-  { "hp-dataprotect",  { NULL }, 3612,  "tcp"  },
-  { "hp-dataprotect",  { NULL }, 3612,  "udp"  },
-  { "alaris-disc",     { NULL }, 3613,  "tcp"  },
-  { "alaris-disc",     { NULL }, 3613,  "udp"  },
-  { "sigma-port",      { NULL }, 3614,  "tcp"  },
-  { "sigma-port",      { NULL }, 3614,  "udp"  },
-  { "start-network",   { NULL }, 3615,  "tcp"  },
-  { "start-network",   { NULL }, 3615,  "udp"  },
-  { "cd3o-protocol",   { NULL }, 3616,  "tcp"  },
-  { "cd3o-protocol",   { NULL }, 3616,  "udp"  },
-  { "sharp-server",    { NULL }, 3617,  "tcp"  },
-  { "sharp-server",    { NULL }, 3617,  "udp"  },
-  { "aairnet-1",       { NULL }, 3618,  "tcp"  },
-  { "aairnet-1",       { NULL }, 3618,  "udp"  },
-  { "aairnet-2",       { NULL }, 3619,  "tcp"  },
-  { "aairnet-2",       { NULL }, 3619,  "udp"  },
-  { "ep-pcp",          { NULL }, 3620,  "tcp"  },
-  { "ep-pcp",          { NULL }, 3620,  "udp"  },
-  { "ep-nsp",          { NULL }, 3621,  "tcp"  },
-  { "ep-nsp",          { NULL }, 3621,  "udp"  },
-  { "ff-lr-port",      { NULL }, 3622,  "tcp"  },
-  { "ff-lr-port",      { NULL }, 3622,  "udp"  },
-  { "haipe-discover",  { NULL }, 3623,  "tcp"  },
-  { "haipe-discover",  { NULL }, 3623,  "udp"  },
-  { "dist-upgrade",    { NULL }, 3624,  "tcp"  },
-  { "dist-upgrade",    { NULL }, 3624,  "udp"  },
-  { "volley",          { NULL }, 3625,  "tcp"  },
-  { "volley",          { NULL }, 3625,  "udp"  },
-  { "bvcdaemon-port",  { NULL }, 3626,  "tcp"  },
-  { "bvcdaemon-port",  { NULL }, 3626,  "udp"  },
-  { "jamserverport",   { NULL }, 3627,  "tcp"  },
-  { "jamserverport",   { NULL }, 3627,  "udp"  },
-  { "ept-machine",     { NULL }, 3628,  "tcp"  },
-  { "ept-machine",     { NULL }, 3628,  "udp"  },
-  { "escvpnet",        { NULL }, 3629,  "tcp"  },
-  { "escvpnet",        { NULL }, 3629,  "udp"  },
-  { "cs-remote-db",    { NULL }, 3630,  "tcp"  },
-  { "cs-remote-db",    { NULL }, 3630,  "udp"  },
-  { "cs-services",     { NULL }, 3631,  "tcp"  },
-  { "cs-services",     { NULL }, 3631,  "udp"  },
-  { "distcc",          { NULL }, 3632,  "tcp"  },
-  { "distcc",          { NULL }, 3632,  "udp"  },
-  { "wacp",            { NULL }, 3633,  "tcp"  },
-  { "wacp",            { NULL }, 3633,  "udp"  },
-  { "hlibmgr",         { NULL }, 3634,  "tcp"  },
-  { "hlibmgr",         { NULL }, 3634,  "udp"  },
-  { "sdo",             { NULL }, 3635,  "tcp"  },
-  { "sdo",             { NULL }, 3635,  "udp"  },
-  { "servistaitsm",    { NULL }, 3636,  "tcp"  },
-  { "servistaitsm",    { NULL }, 3636,  "udp"  },
-  { "scservp",         { NULL }, 3637,  "tcp"  },
-  { "scservp",         { NULL }, 3637,  "udp"  },
-  { "ehp-backup",      { NULL }, 3638,  "tcp"  },
-  { "ehp-backup",      { NULL }, 3638,  "udp"  },
-  { "xap-ha",          { NULL }, 3639,  "tcp"  },
-  { "xap-ha",          { NULL }, 3639,  "udp"  },
-  { "netplay-port1",   { NULL }, 3640,  "tcp"  },
-  { "netplay-port1",   { NULL }, 3640,  "udp"  },
-  { "netplay-port2",   { NULL }, 3641,  "tcp"  },
-  { "netplay-port2",   { NULL }, 3641,  "udp"  },
-  { "juxml-port",      { NULL }, 3642,  "tcp"  },
-  { "juxml-port",      { NULL }, 3642,  "udp"  },
-  { "audiojuggler",    { NULL }, 3643,  "tcp"  },
-  { "audiojuggler",    { NULL }, 3643,  "udp"  },
-  { "ssowatch",        { NULL }, 3644,  "tcp"  },
-  { "ssowatch",        { NULL }, 3644,  "udp"  },
-  { "cyc",             { NULL }, 3645,  "tcp"  },
-  { "cyc",             { NULL }, 3645,  "udp"  },
-  { "xss-srv-port",    { NULL }, 3646,  "tcp"  },
-  { "xss-srv-port",    { NULL }, 3646,  "udp"  },
-  { "splitlock-gw",    { NULL }, 3647,  "tcp"  },
-  { "splitlock-gw",    { NULL }, 3647,  "udp"  },
-  { "fjcp",            { NULL }, 3648,  "tcp"  },
-  { "fjcp",            { NULL }, 3648,  "udp"  },
-  { "nmmp",            { NULL }, 3649,  "tcp"  },
-  { "nmmp",            { NULL }, 3649,  "udp"  },
-  { "prismiq-plugin",  { NULL }, 3650,  "tcp"  },
-  { "prismiq-plugin",  { NULL }, 3650,  "udp"  },
-  { "xrpc-registry",   { NULL }, 3651,  "tcp"  },
-  { "xrpc-registry",   { NULL }, 3651,  "udp"  },
-  { "vxcrnbuport",     { NULL }, 3652,  "tcp"  },
-  { "vxcrnbuport",     { NULL }, 3652,  "udp"  },
-  { "tsp",             { NULL }, 3653,  "tcp"  },
-  { "tsp",             { NULL }, 3653,  "udp"  },
-  { "vaprtm",          { NULL }, 3654,  "tcp"  },
-  { "vaprtm",          { NULL }, 3654,  "udp"  },
-  { "abatemgr",        { NULL }, 3655,  "tcp"  },
-  { "abatemgr",        { NULL }, 3655,  "udp"  },
-  { "abatjss",         { NULL }, 3656,  "tcp"  },
-  { "abatjss",         { NULL }, 3656,  "udp"  },
-  { "immedianet-bcn",  { NULL }, 3657,  "tcp"  },
-  { "immedianet-bcn",  { NULL }, 3657,  "udp"  },
-  { "ps-ams",          { NULL }, 3658,  "tcp"  },
-  { "ps-ams",          { NULL }, 3658,  "udp"  },
-  { "apple-sasl",      { NULL }, 3659,  "tcp"  },
-  { "apple-sasl",      { NULL }, 3659,  "udp"  },
-  { "can-nds-ssl",     { NULL }, 3660,  "tcp"  },
-  { "can-nds-ssl",     { NULL }, 3660,  "udp"  },
-  { "can-ferret-ssl",  { NULL }, 3661,  "tcp"  },
-  { "can-ferret-ssl",  { NULL }, 3661,  "udp"  },
-  { "pserver",         { NULL }, 3662,  "tcp"  },
-  { "pserver",         { NULL }, 3662,  "udp"  },
-  { "dtp",             { NULL }, 3663,  "tcp"  },
-  { "dtp",             { NULL }, 3663,  "udp"  },
-  { "ups-engine",      { NULL }, 3664,  "tcp"  },
-  { "ups-engine",      { NULL }, 3664,  "udp"  },
-  { "ent-engine",      { NULL }, 3665,  "tcp"  },
-  { "ent-engine",      { NULL }, 3665,  "udp"  },
-  { "eserver-pap",     { NULL }, 3666,  "tcp"  },
-  { "eserver-pap",     { NULL }, 3666,  "udp"  },
-  { "infoexch",        { NULL }, 3667,  "tcp"  },
-  { "infoexch",        { NULL }, 3667,  "udp"  },
-  { "dell-rm-port",    { NULL }, 3668,  "tcp"  },
-  { "dell-rm-port",    { NULL }, 3668,  "udp"  },
-  { "casanswmgmt",     { NULL }, 3669,  "tcp"  },
-  { "casanswmgmt",     { NULL }, 3669,  "udp"  },
-  { "smile",           { NULL }, 3670,  "tcp"  },
-  { "smile",           { NULL }, 3670,  "udp"  },
-  { "efcp",            { NULL }, 3671,  "tcp"  },
-  { "efcp",            { NULL }, 3671,  "udp"  },
-  { "lispworks-orb",   { NULL }, 3672,  "tcp"  },
-  { "lispworks-orb",   { NULL }, 3672,  "udp"  },
-  { "mediavault-gui",  { NULL }, 3673,  "tcp"  },
-  { "mediavault-gui",  { NULL }, 3673,  "udp"  },
-  { "wininstall-ipc",  { NULL }, 3674,  "tcp"  },
-  { "wininstall-ipc",  { NULL }, 3674,  "udp"  },
-  { "calltrax",        { NULL }, 3675,  "tcp"  },
-  { "calltrax",        { NULL }, 3675,  "udp"  },
-  { "va-pacbase",      { NULL }, 3676,  "tcp"  },
-  { "va-pacbase",      { NULL }, 3676,  "udp"  },
-  { "roverlog",        { NULL }, 3677,  "tcp"  },
-  { "roverlog",        { NULL }, 3677,  "udp"  },
-  { "ipr-dglt",        { NULL }, 3678,  "tcp"  },
-  { "ipr-dglt",        { NULL }, 3678,  "udp"  },
-  { "newton-dock",     { NULL }, 3679,  "tcp"  },
-  { "newton-dock",     { NULL }, 3679,  "udp"  },
-  { "npds-tracker",    { NULL }, 3680,  "tcp"  },
-  { "npds-tracker",    { NULL }, 3680,  "udp"  },
-  { "bts-x73",         { NULL }, 3681,  "tcp"  },
-  { "bts-x73",         { NULL }, 3681,  "udp"  },
-  { "cas-mapi",        { NULL }, 3682,  "tcp"  },
-  { "cas-mapi",        { NULL }, 3682,  "udp"  },
-  { "bmc-ea",          { NULL }, 3683,  "tcp"  },
-  { "bmc-ea",          { NULL }, 3683,  "udp"  },
-  { "faxstfx-port",    { NULL }, 3684,  "tcp"  },
-  { "faxstfx-port",    { NULL }, 3684,  "udp"  },
-  { "dsx-agent",       { NULL }, 3685,  "tcp"  },
-  { "dsx-agent",       { NULL }, 3685,  "udp"  },
-  { "tnmpv2",          { NULL }, 3686,  "tcp"  },
-  { "tnmpv2",          { NULL }, 3686,  "udp"  },
-  { "simple-push",     { NULL }, 3687,  "tcp"  },
-  { "simple-push",     { NULL }, 3687,  "udp"  },
-  { "simple-push-s",   { NULL }, 3688,  "tcp"  },
-  { "simple-push-s",   { NULL }, 3688,  "udp"  },
-  { "daap",            { NULL }, 3689,  "tcp"  },
-  { "daap",            { NULL }, 3689,  "udp"  },
-  { "svn",             { NULL }, 3690,  "tcp"  },
-  { "svn",             { NULL }, 3690,  "udp"  },
-  { "magaya-network",  { NULL }, 3691,  "tcp"  },
-  { "magaya-network",  { NULL }, 3691,  "udp"  },
-  { "intelsync",       { NULL }, 3692,  "tcp"  },
-  { "intelsync",       { NULL }, 3692,  "udp"  },
-  { "bmc-data-coll",   { NULL }, 3695,  "tcp"  },
-  { "bmc-data-coll",   { NULL }, 3695,  "udp"  },
-  { "telnetcpcd",      { NULL }, 3696,  "tcp"  },
-  { "telnetcpcd",      { NULL }, 3696,  "udp"  },
-  { "nw-license",      { NULL }, 3697,  "tcp"  },
-  { "nw-license",      { NULL }, 3697,  "udp"  },
-  { "sagectlpanel",    { NULL }, 3698,  "tcp"  },
-  { "sagectlpanel",    { NULL }, 3698,  "udp"  },
-  { "kpn-icw",         { NULL }, 3699,  "tcp"  },
-  { "kpn-icw",         { NULL }, 3699,  "udp"  },
-  { "lrs-paging",      { NULL }, 3700,  "tcp"  },
-  { "lrs-paging",      { NULL }, 3700,  "udp"  },
-  { "netcelera",       { NULL }, 3701,  "tcp"  },
-  { "netcelera",       { NULL }, 3701,  "udp"  },
-  { "ws-discovery",    { NULL }, 3702,  "tcp"  },
-  { "ws-discovery",    { NULL }, 3702,  "udp"  },
-  { "adobeserver-3",   { NULL }, 3703,  "tcp"  },
-  { "adobeserver-3",   { NULL }, 3703,  "udp"  },
-  { "adobeserver-4",   { NULL }, 3704,  "tcp"  },
-  { "adobeserver-4",   { NULL }, 3704,  "udp"  },
-  { "adobeserver-5",   { NULL }, 3705,  "tcp"  },
-  { "adobeserver-5",   { NULL }, 3705,  "udp"  },
-  { "rt-event",        { NULL }, 3706,  "tcp"  },
-  { "rt-event",        { NULL }, 3706,  "udp"  },
-  { "rt-event-s",      { NULL }, 3707,  "tcp"  },
-  { "rt-event-s",      { NULL }, 3707,  "udp"  },
-  { "sun-as-iiops",    { NULL }, 3708,  "tcp"  },
-  { "sun-as-iiops",    { NULL }, 3708,  "udp"  },
-  { "ca-idms",         { NULL }, 3709,  "tcp"  },
-  { "ca-idms",         { NULL }, 3709,  "udp"  },
-  { "portgate-auth",   { NULL }, 3710,  "tcp"  },
-  { "portgate-auth",   { NULL }, 3710,  "udp"  },
-  { "edb-server2",     { NULL }, 3711,  "tcp"  },
-  { "edb-server2",     { NULL }, 3711,  "udp"  },
-  { "sentinel-ent",    { NULL }, 3712,  "tcp"  },
-  { "sentinel-ent",    { NULL }, 3712,  "udp"  },
-  { "tftps",           { NULL }, 3713,  "tcp"  },
-  { "tftps",           { NULL }, 3713,  "udp"  },
-  { "delos-dms",       { NULL }, 3714,  "tcp"  },
-  { "delos-dms",       { NULL }, 3714,  "udp"  },
-  { "anoto-rendezv",   { NULL }, 3715,  "tcp"  },
-  { "anoto-rendezv",   { NULL }, 3715,  "udp"  },
-  { "wv-csp-sms-cir",  { NULL }, 3716,  "tcp"  },
-  { "wv-csp-sms-cir",  { NULL }, 3716,  "udp"  },
-  { "wv-csp-udp-cir",  { NULL }, 3717,  "tcp"  },
-  { "wv-csp-udp-cir",  { NULL }, 3717,  "udp"  },
-  { "opus-services",   { NULL }, 3718,  "tcp"  },
-  { "opus-services",   { NULL }, 3718,  "udp"  },
-  { "itelserverport",  { NULL }, 3719,  "tcp"  },
-  { "itelserverport",  { NULL }, 3719,  "udp"  },
-  { "ufastro-instr",   { NULL }, 3720,  "tcp"  },
-  { "ufastro-instr",   { NULL }, 3720,  "udp"  },
-  { "xsync",           { NULL }, 3721,  "tcp"  },
-  { "xsync",           { NULL }, 3721,  "udp"  },
-  { "xserveraid",      { NULL }, 3722,  "tcp"  },
-  { "xserveraid",      { NULL }, 3722,  "udp"  },
-  { "sychrond",        { NULL }, 3723,  "tcp"  },
-  { "sychrond",        { NULL }, 3723,  "udp"  },
-  { "blizwow",         { NULL }, 3724,  "tcp"  },
-  { "blizwow",         { NULL }, 3724,  "udp"  },
-  { "na-er-tip",       { NULL }, 3725,  "tcp"  },
-  { "na-er-tip",       { NULL }, 3725,  "udp"  },
-  { "array-manager",   { NULL }, 3726,  "tcp"  },
-  { "array-manager",   { NULL }, 3726,  "udp"  },
-  { "e-mdu",           { NULL }, 3727,  "tcp"  },
-  { "e-mdu",           { NULL }, 3727,  "udp"  },
-  { "e-woa",           { NULL }, 3728,  "tcp"  },
-  { "e-woa",           { NULL }, 3728,  "udp"  },
-  { "fksp-audit",      { NULL }, 3729,  "tcp"  },
-  { "fksp-audit",      { NULL }, 3729,  "udp"  },
-  { "client-ctrl",     { NULL }, 3730,  "tcp"  },
-  { "client-ctrl",     { NULL }, 3730,  "udp"  },
-  { "smap",            { NULL }, 3731,  "tcp"  },
-  { "smap",            { NULL }, 3731,  "udp"  },
-  { "m-wnn",           { NULL }, 3732,  "tcp"  },
-  { "m-wnn",           { NULL }, 3732,  "udp"  },
-  { "multip-msg",      { NULL }, 3733,  "tcp"  },
-  { "multip-msg",      { NULL }, 3733,  "udp"  },
-  { "synel-data",      { NULL }, 3734,  "tcp"  },
-  { "synel-data",      { NULL }, 3734,  "udp"  },
-  { "pwdis",           { NULL }, 3735,  "tcp"  },
-  { "pwdis",           { NULL }, 3735,  "udp"  },
-  { "rs-rmi",          { NULL }, 3736,  "tcp"  },
-  { "rs-rmi",          { NULL }, 3736,  "udp"  },
-  { "xpanel",          { NULL }, 3737,  "tcp"  },
-  { "versatalk",       { NULL }, 3738,  "tcp"  },
-  { "versatalk",       { NULL }, 3738,  "udp"  },
-  { "launchbird-lm",   { NULL }, 3739,  "tcp"  },
-  { "launchbird-lm",   { NULL }, 3739,  "udp"  },
-  { "heartbeat",       { NULL }, 3740,  "tcp"  },
-  { "heartbeat",       { NULL }, 3740,  "udp"  },
-  { "wysdma",          { NULL }, 3741,  "tcp"  },
-  { "wysdma",          { NULL }, 3741,  "udp"  },
-  { "cst-port",        { NULL }, 3742,  "tcp"  },
-  { "cst-port",        { NULL }, 3742,  "udp"  },
-  { "ipcs-command",    { NULL }, 3743,  "tcp"  },
-  { "ipcs-command",    { NULL }, 3743,  "udp"  },
-  { "sasg",            { NULL }, 3744,  "tcp"  },
-  { "sasg",            { NULL }, 3744,  "udp"  },
-  { "gw-call-port",    { NULL }, 3745,  "tcp"  },
-  { "gw-call-port",    { NULL }, 3745,  "udp"  },
-  { "linktest",        { NULL }, 3746,  "tcp"  },
-  { "linktest",        { NULL }, 3746,  "udp"  },
-  { "linktest-s",      { NULL }, 3747,  "tcp"  },
-  { "linktest-s",      { NULL }, 3747,  "udp"  },
-  { "webdata",         { NULL }, 3748,  "tcp"  },
-  { "webdata",         { NULL }, 3748,  "udp"  },
-  { "cimtrak",         { NULL }, 3749,  "tcp"  },
-  { "cimtrak",         { NULL }, 3749,  "udp"  },
-  { "cbos-ip-port",    { NULL }, 3750,  "tcp"  },
-  { "cbos-ip-port",    { NULL }, 3750,  "udp"  },
-  { "gprs-cube",       { NULL }, 3751,  "tcp"  },
-  { "gprs-cube",       { NULL }, 3751,  "udp"  },
-  { "vipremoteagent",  { NULL }, 3752,  "tcp"  },
-  { "vipremoteagent",  { NULL }, 3752,  "udp"  },
-  { "nattyserver",     { NULL }, 3753,  "tcp"  },
-  { "nattyserver",     { NULL }, 3753,  "udp"  },
-  { "timestenbroker",  { NULL }, 3754,  "tcp"  },
-  { "timestenbroker",  { NULL }, 3754,  "udp"  },
-  { "sas-remote-hlp",  { NULL }, 3755,  "tcp"  },
-  { "sas-remote-hlp",  { NULL }, 3755,  "udp"  },
-  { "canon-capt",      { NULL }, 3756,  "tcp"  },
-  { "canon-capt",      { NULL }, 3756,  "udp"  },
-  { "grf-port",        { NULL }, 3757,  "tcp"  },
-  { "grf-port",        { NULL }, 3757,  "udp"  },
-  { "apw-registry",    { NULL }, 3758,  "tcp"  },
-  { "apw-registry",    { NULL }, 3758,  "udp"  },
-  { "exapt-lmgr",      { NULL }, 3759,  "tcp"  },
-  { "exapt-lmgr",      { NULL }, 3759,  "udp"  },
-  { "adtempusclient",  { NULL }, 3760,  "tcp"  },
-  { "adtempusclient",  { NULL }, 3760,  "udp"  },
-  { "gsakmp",          { NULL }, 3761,  "tcp"  },
-  { "gsakmp",          { NULL }, 3761,  "udp"  },
-  { "gbs-smp",         { NULL }, 3762,  "tcp"  },
-  { "gbs-smp",         { NULL }, 3762,  "udp"  },
-  { "xo-wave",         { NULL }, 3763,  "tcp"  },
-  { "xo-wave",         { NULL }, 3763,  "udp"  },
-  { "mni-prot-rout",   { NULL }, 3764,  "tcp"  },
-  { "mni-prot-rout",   { NULL }, 3764,  "udp"  },
-  { "rtraceroute",     { NULL }, 3765,  "tcp"  },
-  { "rtraceroute",     { NULL }, 3765,  "udp"  },
-  { "listmgr-port",    { NULL }, 3767,  "tcp"  },
-  { "listmgr-port",    { NULL }, 3767,  "udp"  },
-  { "rblcheckd",       { NULL }, 3768,  "tcp"  },
-  { "rblcheckd",       { NULL }, 3768,  "udp"  },
-  { "haipe-otnk",      { NULL }, 3769,  "tcp"  },
-  { "haipe-otnk",      { NULL }, 3769,  "udp"  },
-  { "cindycollab",     { NULL }, 3770,  "tcp"  },
-  { "cindycollab",     { NULL }, 3770,  "udp"  },
-  { "paging-port",     { NULL }, 3771,  "tcp"  },
-  { "paging-port",     { NULL }, 3771,  "udp"  },
-  { "ctp",             { NULL }, 3772,  "tcp"  },
-  { "ctp",             { NULL }, 3772,  "udp"  },
-  { "ctdhercules",     { NULL }, 3773,  "tcp"  },
-  { "ctdhercules",     { NULL }, 3773,  "udp"  },
-  { "zicom",           { NULL }, 3774,  "tcp"  },
-  { "zicom",           { NULL }, 3774,  "udp"  },
-  { "ispmmgr",         { NULL }, 3775,  "tcp"  },
-  { "ispmmgr",         { NULL }, 3775,  "udp"  },
-  { "dvcprov-port",    { NULL }, 3776,  "tcp"  },
-  { "dvcprov-port",    { NULL }, 3776,  "udp"  },
-  { "jibe-eb",         { NULL }, 3777,  "tcp"  },
-  { "jibe-eb",         { NULL }, 3777,  "udp"  },
-  { "c-h-it-port",     { NULL }, 3778,  "tcp"  },
-  { "c-h-it-port",     { NULL }, 3778,  "udp"  },
-  { "cognima",         { NULL }, 3779,  "tcp"  },
-  { "cognima",         { NULL }, 3779,  "udp"  },
-  { "nnp",             { NULL }, 3780,  "tcp"  },
-  { "nnp",             { NULL }, 3780,  "udp"  },
-  { "abcvoice-port",   { NULL }, 3781,  "tcp"  },
-  { "abcvoice-port",   { NULL }, 3781,  "udp"  },
-  { "iso-tp0s",        { NULL }, 3782,  "tcp"  },
-  { "iso-tp0s",        { NULL }, 3782,  "udp"  },
-  { "bim-pem",         { NULL }, 3783,  "tcp"  },
-  { "bim-pem",         { NULL }, 3783,  "udp"  },
-  { "bfd-control",     { NULL }, 3784,  "tcp"  },
-  { "bfd-control",     { NULL }, 3784,  "udp"  },
-  { "bfd-echo",        { NULL }, 3785,  "tcp"  },
-  { "bfd-echo",        { NULL }, 3785,  "udp"  },
-  { "upstriggervsw",   { NULL }, 3786,  "tcp"  },
-  { "upstriggervsw",   { NULL }, 3786,  "udp"  },
-  { "fintrx",          { NULL }, 3787,  "tcp"  },
-  { "fintrx",          { NULL }, 3787,  "udp"  },
-  { "isrp-port",       { NULL }, 3788,  "tcp"  },
-  { "isrp-port",       { NULL }, 3788,  "udp"  },
-  { "remotedeploy",    { NULL }, 3789,  "tcp"  },
-  { "remotedeploy",    { NULL }, 3789,  "udp"  },
-  { "quickbooksrds",   { NULL }, 3790,  "tcp"  },
-  { "quickbooksrds",   { NULL }, 3790,  "udp"  },
-  { "tvnetworkvideo",  { NULL }, 3791,  "tcp"  },
-  { "tvnetworkvideo",  { NULL }, 3791,  "udp"  },
-  { "sitewatch",       { NULL }, 3792,  "tcp"  },
-  { "sitewatch",       { NULL }, 3792,  "udp"  },
-  { "dcsoftware",      { NULL }, 3793,  "tcp"  },
-  { "dcsoftware",      { NULL }, 3793,  "udp"  },
-  { "jaus",            { NULL }, 3794,  "tcp"  },
-  { "jaus",            { NULL }, 3794,  "udp"  },
-  { "myblast",         { NULL }, 3795,  "tcp"  },
-  { "myblast",         { NULL }, 3795,  "udp"  },
-  { "spw-dialer",      { NULL }, 3796,  "tcp"  },
-  { "spw-dialer",      { NULL }, 3796,  "udp"  },
-  { "idps",            { NULL }, 3797,  "tcp"  },
-  { "idps",            { NULL }, 3797,  "udp"  },
-  { "minilock",        { NULL }, 3798,  "tcp"  },
-  { "minilock",        { NULL }, 3798,  "udp"  },
-  { "radius-dynauth",  { NULL }, 3799,  "tcp"  },
-  { "radius-dynauth",  { NULL }, 3799,  "udp"  },
-  { "pwgpsi",          { NULL }, 3800,  "tcp"  },
-  { "pwgpsi",          { NULL }, 3800,  "udp"  },
-  { "ibm-mgr",         { NULL }, 3801,  "tcp"  },
-  { "ibm-mgr",         { NULL }, 3801,  "udp"  },
-  { "vhd",             { NULL }, 3802,  "tcp"  },
-  { "vhd",             { NULL }, 3802,  "udp"  },
-  { "soniqsync",       { NULL }, 3803,  "tcp"  },
-  { "soniqsync",       { NULL }, 3803,  "udp"  },
-  { "iqnet-port",      { NULL }, 3804,  "tcp"  },
-  { "iqnet-port",      { NULL }, 3804,  "udp"  },
-  { "tcpdataserver",   { NULL }, 3805,  "tcp"  },
-  { "tcpdataserver",   { NULL }, 3805,  "udp"  },
-  { "wsmlb",           { NULL }, 3806,  "tcp"  },
-  { "wsmlb",           { NULL }, 3806,  "udp"  },
-  { "spugna",          { NULL }, 3807,  "tcp"  },
-  { "spugna",          { NULL }, 3807,  "udp"  },
-  { "sun-as-iiops-ca", { NULL }, 3808,  "tcp"  },
-  { "sun-as-iiops-ca", { NULL }, 3808,  "udp"  },
-  { "apocd",           { NULL }, 3809,  "tcp"  },
-  { "apocd",           { NULL }, 3809,  "udp"  },
-  { "wlanauth",        { NULL }, 3810,  "tcp"  },
-  { "wlanauth",        { NULL }, 3810,  "udp"  },
-  { "amp",             { NULL }, 3811,  "tcp"  },
-  { "amp",             { NULL }, 3811,  "udp"  },
-  { "neto-wol-server", { NULL }, 3812,  "tcp"  },
-  { "neto-wol-server", { NULL }, 3812,  "udp"  },
-  { "rap-ip",          { NULL }, 3813,  "tcp"  },
-  { "rap-ip",          { NULL }, 3813,  "udp"  },
-  { "neto-dcs",        { NULL }, 3814,  "tcp"  },
-  { "neto-dcs",        { NULL }, 3814,  "udp"  },
-  { "lansurveyorxml",  { NULL }, 3815,  "tcp"  },
-  { "lansurveyorxml",  { NULL }, 3815,  "udp"  },
-  { "sunlps-http",     { NULL }, 3816,  "tcp"  },
-  { "sunlps-http",     { NULL }, 3816,  "udp"  },
-  { "tapeware",        { NULL }, 3817,  "tcp"  },
-  { "tapeware",        { NULL }, 3817,  "udp"  },
-  { "crinis-hb",       { NULL }, 3818,  "tcp"  },
-  { "crinis-hb",       { NULL }, 3818,  "udp"  },
-  { "epl-slp",         { NULL }, 3819,  "tcp"  },
-  { "epl-slp",         { NULL }, 3819,  "udp"  },
-  { "scp",             { NULL }, 3820,  "tcp"  },
-  { "scp",             { NULL }, 3820,  "udp"  },
-  { "pmcp",            { NULL }, 3821,  "tcp"  },
-  { "pmcp",            { NULL }, 3821,  "udp"  },
-  { "acp-discovery",   { NULL }, 3822,  "tcp"  },
-  { "acp-discovery",   { NULL }, 3822,  "udp"  },
-  { "acp-conduit",     { NULL }, 3823,  "tcp"  },
-  { "acp-conduit",     { NULL }, 3823,  "udp"  },
-  { "acp-policy",      { NULL }, 3824,  "tcp"  },
-  { "acp-policy",      { NULL }, 3824,  "udp"  },
-  { "ffserver",        { NULL }, 3825,  "tcp"  },
-  { "ffserver",        { NULL }, 3825,  "udp"  },
-  { "wormux",          { NULL }, 3826,  "tcp"  },
-  { "wormux",          { NULL }, 3826,  "udp"  },
-  { "netmpi",          { NULL }, 3827,  "tcp"  },
-  { "netmpi",          { NULL }, 3827,  "udp"  },
-  { "neteh",           { NULL }, 3828,  "tcp"  },
-  { "neteh",           { NULL }, 3828,  "udp"  },
-  { "neteh-ext",       { NULL }, 3829,  "tcp"  },
-  { "neteh-ext",       { NULL }, 3829,  "udp"  },
-  { "cernsysmgmtagt",  { NULL }, 3830,  "tcp"  },
-  { "cernsysmgmtagt",  { NULL }, 3830,  "udp"  },
-  { "dvapps",          { NULL }, 3831,  "tcp"  },
-  { "dvapps",          { NULL }, 3831,  "udp"  },
-  { "xxnetserver",     { NULL }, 3832,  "tcp"  },
-  { "xxnetserver",     { NULL }, 3832,  "udp"  },
-  { "aipn-auth",       { NULL }, 3833,  "tcp"  },
-  { "aipn-auth",       { NULL }, 3833,  "udp"  },
-  { "spectardata",     { NULL }, 3834,  "tcp"  },
-  { "spectardata",     { NULL }, 3834,  "udp"  },
-  { "spectardb",       { NULL }, 3835,  "tcp"  },
-  { "spectardb",       { NULL }, 3835,  "udp"  },
-  { "markem-dcp",      { NULL }, 3836,  "tcp"  },
-  { "markem-dcp",      { NULL }, 3836,  "udp"  },
-  { "mkm-discovery",   { NULL }, 3837,  "tcp"  },
-  { "mkm-discovery",   { NULL }, 3837,  "udp"  },
-  { "sos",             { NULL }, 3838,  "tcp"  },
-  { "sos",             { NULL }, 3838,  "udp"  },
-  { "amx-rms",         { NULL }, 3839,  "tcp"  },
-  { "amx-rms",         { NULL }, 3839,  "udp"  },
-  { "flirtmitmir",     { NULL }, 3840,  "tcp"  },
-  { "flirtmitmir",     { NULL }, 3840,  "udp"  },
-  { "zfirm-shiprush3", { NULL }, 3841,  "tcp"  },
-  { "zfirm-shiprush3", { NULL }, 3841,  "udp"  },
-  { "nhci",            { NULL }, 3842,  "tcp"  },
-  { "nhci",            { NULL }, 3842,  "udp"  },
-  { "quest-agent",     { NULL }, 3843,  "tcp"  },
-  { "quest-agent",     { NULL }, 3843,  "udp"  },
-  { "rnm",             { NULL }, 3844,  "tcp"  },
-  { "rnm",             { NULL }, 3844,  "udp"  },
-  { "v-one-spp",       { NULL }, 3845,  "tcp"  },
-  { "v-one-spp",       { NULL }, 3845,  "udp"  },
-  { "an-pcp",          { NULL }, 3846,  "tcp"  },
-  { "an-pcp",          { NULL }, 3846,  "udp"  },
-  { "msfw-control",    { NULL }, 3847,  "tcp"  },
-  { "msfw-control",    { NULL }, 3847,  "udp"  },
-  { "item",            { NULL }, 3848,  "tcp"  },
-  { "item",            { NULL }, 3848,  "udp"  },
-  { "spw-dnspreload",  { NULL }, 3849,  "tcp"  },
-  { "spw-dnspreload",  { NULL }, 3849,  "udp"  },
-  { "qtms-bootstrap",  { NULL }, 3850,  "tcp"  },
-  { "qtms-bootstrap",  { NULL }, 3850,  "udp"  },
-  { "spectraport",     { NULL }, 3851,  "tcp"  },
-  { "spectraport",     { NULL }, 3851,  "udp"  },
-  { "sse-app-config",  { NULL }, 3852,  "tcp"  },
-  { "sse-app-config",  { NULL }, 3852,  "udp"  },
-  { "sscan",           { NULL }, 3853,  "tcp"  },
-  { "sscan",           { NULL }, 3853,  "udp"  },
-  { "stryker-com",     { NULL }, 3854,  "tcp"  },
-  { "stryker-com",     { NULL }, 3854,  "udp"  },
-  { "opentrac",        { NULL }, 3855,  "tcp"  },
-  { "opentrac",        { NULL }, 3855,  "udp"  },
-  { "informer",        { NULL }, 3856,  "tcp"  },
-  { "informer",        { NULL }, 3856,  "udp"  },
-  { "trap-port",       { NULL }, 3857,  "tcp"  },
-  { "trap-port",       { NULL }, 3857,  "udp"  },
-  { "trap-port-mom",   { NULL }, 3858,  "tcp"  },
-  { "trap-port-mom",   { NULL }, 3858,  "udp"  },
-  { "nav-port",        { NULL }, 3859,  "tcp"  },
-  { "nav-port",        { NULL }, 3859,  "udp"  },
-  { "sasp",            { NULL }, 3860,  "tcp"  },
-  { "sasp",            { NULL }, 3860,  "udp"  },
-  { "winshadow-hd",    { NULL }, 3861,  "tcp"  },
-  { "winshadow-hd",    { NULL }, 3861,  "udp"  },
-  { "giga-pocket",     { NULL }, 3862,  "tcp"  },
-  { "giga-pocket",     { NULL }, 3862,  "udp"  },
-  { "asap-tcp",        { NULL }, 3863,  "tcp"  },
-  { "asap-udp",        { NULL }, 3863,  "udp"  },
-  { "asap-sctp",       { NULL }, 3863,  "sctp" },
-  { "asap-tcp-tls",    { NULL }, 3864,  "tcp"  },
-  { "asap-sctp-tls",   { NULL }, 3864,  "sctp" },
-  { "xpl",             { NULL }, 3865,  "tcp"  },
-  { "xpl",             { NULL }, 3865,  "udp"  },
-  { "dzdaemon",        { NULL }, 3866,  "tcp"  },
-  { "dzdaemon",        { NULL }, 3866,  "udp"  },
-  { "dzoglserver",     { NULL }, 3867,  "tcp"  },
-  { "dzoglserver",     { NULL }, 3867,  "udp"  },
-  { "diameter",        { NULL }, 3868,  "tcp"  },
-  { "diameter",        { NULL }, 3868,  "sctp" },
-  { "ovsam-mgmt",      { NULL }, 3869,  "tcp"  },
-  { "ovsam-mgmt",      { NULL }, 3869,  "udp"  },
-  { "ovsam-d-agent",   { NULL }, 3870,  "tcp"  },
-  { "ovsam-d-agent",   { NULL }, 3870,  "udp"  },
-  { "avocent-adsap",   { NULL }, 3871,  "tcp"  },
-  { "avocent-adsap",   { NULL }, 3871,  "udp"  },
-  { "oem-agent",       { NULL }, 3872,  "tcp"  },
-  { "oem-agent",       { NULL }, 3872,  "udp"  },
-  { "fagordnc",        { NULL }, 3873,  "tcp"  },
-  { "fagordnc",        { NULL }, 3873,  "udp"  },
-  { "sixxsconfig",     { NULL }, 3874,  "tcp"  },
-  { "sixxsconfig",     { NULL }, 3874,  "udp"  },
-  { "pnbscada",        { NULL }, 3875,  "tcp"  },
-  { "pnbscada",        { NULL }, 3875,  "udp"  },
-  { "dl_agent",        { NULL }, 3876,  "tcp"  },
-  { "dl_agent",        { NULL }, 3876,  "udp"  },
-  { "xmpcr-interface", { NULL }, 3877,  "tcp"  },
-  { "xmpcr-interface", { NULL }, 3877,  "udp"  },
-  { "fotogcad",        { NULL }, 3878,  "tcp"  },
-  { "fotogcad",        { NULL }, 3878,  "udp"  },
-  { "appss-lm",        { NULL }, 3879,  "tcp"  },
-  { "appss-lm",        { NULL }, 3879,  "udp"  },
-  { "igrs",            { NULL }, 3880,  "tcp"  },
-  { "igrs",            { NULL }, 3880,  "udp"  },
-  { "idac",            { NULL }, 3881,  "tcp"  },
-  { "idac",            { NULL }, 3881,  "udp"  },
-  { "msdts1",          { NULL }, 3882,  "tcp"  },
-  { "msdts1",          { NULL }, 3882,  "udp"  },
-  { "vrpn",            { NULL }, 3883,  "tcp"  },
-  { "vrpn",            { NULL }, 3883,  "udp"  },
-  { "softrack-meter",  { NULL }, 3884,  "tcp"  },
-  { "softrack-meter",  { NULL }, 3884,  "udp"  },
-  { "topflow-ssl",     { NULL }, 3885,  "tcp"  },
-  { "topflow-ssl",     { NULL }, 3885,  "udp"  },
-  { "nei-management",  { NULL }, 3886,  "tcp"  },
-  { "nei-management",  { NULL }, 3886,  "udp"  },
-  { "ciphire-data",    { NULL }, 3887,  "tcp"  },
-  { "ciphire-data",    { NULL }, 3887,  "udp"  },
-  { "ciphire-serv",    { NULL }, 3888,  "tcp"  },
-  { "ciphire-serv",    { NULL }, 3888,  "udp"  },
-  { "dandv-tester",    { NULL }, 3889,  "tcp"  },
-  { "dandv-tester",    { NULL }, 3889,  "udp"  },
-  { "ndsconnect",      { NULL }, 3890,  "tcp"  },
-  { "ndsconnect",      { NULL }, 3890,  "udp"  },
-  { "rtc-pm-port",     { NULL }, 3891,  "tcp"  },
-  { "rtc-pm-port",     { NULL }, 3891,  "udp"  },
-  { "pcc-image-port",  { NULL }, 3892,  "tcp"  },
-  { "pcc-image-port",  { NULL }, 3892,  "udp"  },
-  { "cgi-starapi",     { NULL }, 3893,  "tcp"  },
-  { "cgi-starapi",     { NULL }, 3893,  "udp"  },
-  { "syam-agent",      { NULL }, 3894,  "tcp"  },
-  { "syam-agent",      { NULL }, 3894,  "udp"  },
-  { "syam-smc",        { NULL }, 3895,  "tcp"  },
-  { "syam-smc",        { NULL }, 3895,  "udp"  },
-  { "sdo-tls",         { NULL }, 3896,  "tcp"  },
-  { "sdo-tls",         { NULL }, 3896,  "udp"  },
-  { "sdo-ssh",         { NULL }, 3897,  "tcp"  },
-  { "sdo-ssh",         { NULL }, 3897,  "udp"  },
-  { "senip",           { NULL }, 3898,  "tcp"  },
-  { "senip",           { NULL }, 3898,  "udp"  },
-  { "itv-control",     { NULL }, 3899,  "tcp"  },
-  { "itv-control",     { NULL }, 3899,  "udp"  },
-  { "udt_os",          { NULL }, 3900,  "tcp"  },
-  { "udt_os",          { NULL }, 3900,  "udp"  },
-  { "nimsh",           { NULL }, 3901,  "tcp"  },
-  { "nimsh",           { NULL }, 3901,  "udp"  },
-  { "nimaux",          { NULL }, 3902,  "tcp"  },
-  { "nimaux",          { NULL }, 3902,  "udp"  },
-  { "charsetmgr",      { NULL }, 3903,  "tcp"  },
-  { "charsetmgr",      { NULL }, 3903,  "udp"  },
-  { "omnilink-port",   { NULL }, 3904,  "tcp"  },
-  { "omnilink-port",   { NULL }, 3904,  "udp"  },
-  { "mupdate",         { NULL }, 3905,  "tcp"  },
-  { "mupdate",         { NULL }, 3905,  "udp"  },
-  { "topovista-data",  { NULL }, 3906,  "tcp"  },
-  { "topovista-data",  { NULL }, 3906,  "udp"  },
-  { "imoguia-port",    { NULL }, 3907,  "tcp"  },
-  { "imoguia-port",    { NULL }, 3907,  "udp"  },
-  { "hppronetman",     { NULL }, 3908,  "tcp"  },
-  { "hppronetman",     { NULL }, 3908,  "udp"  },
-  { "surfcontrolcpa",  { NULL }, 3909,  "tcp"  },
-  { "surfcontrolcpa",  { NULL }, 3909,  "udp"  },
-  { "prnrequest",      { NULL }, 3910,  "tcp"  },
-  { "prnrequest",      { NULL }, 3910,  "udp"  },
-  { "prnstatus",       { NULL }, 3911,  "tcp"  },
-  { "prnstatus",       { NULL }, 3911,  "udp"  },
-  { "gbmt-stars",      { NULL }, 3912,  "tcp"  },
-  { "gbmt-stars",      { NULL }, 3912,  "udp"  },
-  { "listcrt-port",    { NULL }, 3913,  "tcp"  },
-  { "listcrt-port",    { NULL }, 3913,  "udp"  },
-  { "listcrt-port-2",  { NULL }, 3914,  "tcp"  },
-  { "listcrt-port-2",  { NULL }, 3914,  "udp"  },
-  { "agcat",           { NULL }, 3915,  "tcp"  },
-  { "agcat",           { NULL }, 3915,  "udp"  },
-  { "wysdmc",          { NULL }, 3916,  "tcp"  },
-  { "wysdmc",          { NULL }, 3916,  "udp"  },
-  { "aftmux",          { NULL }, 3917,  "tcp"  },
-  { "aftmux",          { NULL }, 3917,  "udp"  },
-  { "pktcablemmcops",  { NULL }, 3918,  "tcp"  },
-  { "pktcablemmcops",  { NULL }, 3918,  "udp"  },
-  { "hyperip",         { NULL }, 3919,  "tcp"  },
-  { "hyperip",         { NULL }, 3919,  "udp"  },
-  { "exasoftport1",    { NULL }, 3920,  "tcp"  },
-  { "exasoftport1",    { NULL }, 3920,  "udp"  },
-  { "herodotus-net",   { NULL }, 3921,  "tcp"  },
-  { "herodotus-net",   { NULL }, 3921,  "udp"  },
-  { "sor-update",      { NULL }, 3922,  "tcp"  },
-  { "sor-update",      { NULL }, 3922,  "udp"  },
-  { "symb-sb-port",    { NULL }, 3923,  "tcp"  },
-  { "symb-sb-port",    { NULL }, 3923,  "udp"  },
-  { "mpl-gprs-port",   { NULL }, 3924,  "tcp"  },
-  { "mpl-gprs-port",   { NULL }, 3924,  "udp"  },
-  { "zmp",             { NULL }, 3925,  "tcp"  },
-  { "zmp",             { NULL }, 3925,  "udp"  },
-  { "winport",         { NULL }, 3926,  "tcp"  },
-  { "winport",         { NULL }, 3926,  "udp"  },
-  { "natdataservice",  { NULL }, 3927,  "tcp"  },
-  { "natdataservice",  { NULL }, 3927,  "udp"  },
-  { "netboot-pxe",     { NULL }, 3928,  "tcp"  },
-  { "netboot-pxe",     { NULL }, 3928,  "udp"  },
-  { "smauth-port",     { NULL }, 3929,  "tcp"  },
-  { "smauth-port",     { NULL }, 3929,  "udp"  },
-  { "syam-webserver",  { NULL }, 3930,  "tcp"  },
-  { "syam-webserver",  { NULL }, 3930,  "udp"  },
-  { "msr-plugin-port", { NULL }, 3931,  "tcp"  },
-  { "msr-plugin-port", { NULL }, 3931,  "udp"  },
-  { "dyn-site",        { NULL }, 3932,  "tcp"  },
-  { "dyn-site",        { NULL }, 3932,  "udp"  },
-  { "plbserve-port",   { NULL }, 3933,  "tcp"  },
-  { "plbserve-port",   { NULL }, 3933,  "udp"  },
-  { "sunfm-port",      { NULL }, 3934,  "tcp"  },
-  { "sunfm-port",      { NULL }, 3934,  "udp"  },
-  { "sdp-portmapper",  { NULL }, 3935,  "tcp"  },
-  { "sdp-portmapper",  { NULL }, 3935,  "udp"  },
-  { "mailprox",        { NULL }, 3936,  "tcp"  },
-  { "mailprox",        { NULL }, 3936,  "udp"  },
-  { "dvbservdsc",      { NULL }, 3937,  "tcp"  },
-  { "dvbservdsc",      { NULL }, 3937,  "udp"  },
-  { "dbcontrol_agent", { NULL }, 3938,  "tcp"  },
-  { "dbcontrol_agent", { NULL }, 3938,  "udp"  },
-  { "aamp",            { NULL }, 3939,  "tcp"  },
-  { "aamp",            { NULL }, 3939,  "udp"  },
-  { "xecp-node",       { NULL }, 3940,  "tcp"  },
-  { "xecp-node",       { NULL }, 3940,  "udp"  },
-  { "homeportal-web",  { NULL }, 3941,  "tcp"  },
-  { "homeportal-web",  { NULL }, 3941,  "udp"  },
-  { "srdp",            { NULL }, 3942,  "tcp"  },
-  { "srdp",            { NULL }, 3942,  "udp"  },
-  { "tig",             { NULL }, 3943,  "tcp"  },
-  { "tig",             { NULL }, 3943,  "udp"  },
-  { "sops",            { NULL }, 3944,  "tcp"  },
-  { "sops",            { NULL }, 3944,  "udp"  },
-  { "emcads",          { NULL }, 3945,  "tcp"  },
-  { "emcads",          { NULL }, 3945,  "udp"  },
-  { "backupedge",      { NULL }, 3946,  "tcp"  },
-  { "backupedge",      { NULL }, 3946,  "udp"  },
-  { "ccp",             { NULL }, 3947,  "tcp"  },
-  { "ccp",             { NULL }, 3947,  "udp"  },
-  { "apdap",           { NULL }, 3948,  "tcp"  },
-  { "apdap",           { NULL }, 3948,  "udp"  },
-  { "drip",            { NULL }, 3949,  "tcp"  },
-  { "drip",            { NULL }, 3949,  "udp"  },
-  { "namemunge",       { NULL }, 3950,  "tcp"  },
-  { "namemunge",       { NULL }, 3950,  "udp"  },
-  { "pwgippfax",       { NULL }, 3951,  "tcp"  },
-  { "pwgippfax",       { NULL }, 3951,  "udp"  },
-  { "i3-sessionmgr",   { NULL }, 3952,  "tcp"  },
-  { "i3-sessionmgr",   { NULL }, 3952,  "udp"  },
-  { "xmlink-connect",  { NULL }, 3953,  "tcp"  },
-  { "xmlink-connect",  { NULL }, 3953,  "udp"  },
-  { "adrep",           { NULL }, 3954,  "tcp"  },
-  { "adrep",           { NULL }, 3954,  "udp"  },
-  { "p2pcommunity",    { NULL }, 3955,  "tcp"  },
-  { "p2pcommunity",    { NULL }, 3955,  "udp"  },
-  { "gvcp",            { NULL }, 3956,  "tcp"  },
-  { "gvcp",            { NULL }, 3956,  "udp"  },
-  { "mqe-broker",      { NULL }, 3957,  "tcp"  },
-  { "mqe-broker",      { NULL }, 3957,  "udp"  },
-  { "mqe-agent",       { NULL }, 3958,  "tcp"  },
-  { "mqe-agent",       { NULL }, 3958,  "udp"  },
-  { "treehopper",      { NULL }, 3959,  "tcp"  },
-  { "treehopper",      { NULL }, 3959,  "udp"  },
-  { "bess",            { NULL }, 3960,  "tcp"  },
-  { "bess",            { NULL }, 3960,  "udp"  },
-  { "proaxess",        { NULL }, 3961,  "tcp"  },
-  { "proaxess",        { NULL }, 3961,  "udp"  },
-  { "sbi-agent",       { NULL }, 3962,  "tcp"  },
-  { "sbi-agent",       { NULL }, 3962,  "udp"  },
-  { "thrp",            { NULL }, 3963,  "tcp"  },
-  { "thrp",            { NULL }, 3963,  "udp"  },
-  { "sasggprs",        { NULL }, 3964,  "tcp"  },
-  { "sasggprs",        { NULL }, 3964,  "udp"  },
-  { "ati-ip-to-ncpe",  { NULL }, 3965,  "tcp"  },
-  { "ati-ip-to-ncpe",  { NULL }, 3965,  "udp"  },
-  { "bflckmgr",        { NULL }, 3966,  "tcp"  },
-  { "bflckmgr",        { NULL }, 3966,  "udp"  },
-  { "ppsms",           { NULL }, 3967,  "tcp"  },
-  { "ppsms",           { NULL }, 3967,  "udp"  },
-  { "ianywhere-dbns",  { NULL }, 3968,  "tcp"  },
-  { "ianywhere-dbns",  { NULL }, 3968,  "udp"  },
-  { "landmarks",       { NULL }, 3969,  "tcp"  },
-  { "landmarks",       { NULL }, 3969,  "udp"  },
-  { "lanrevagent",     { NULL }, 3970,  "tcp"  },
-  { "lanrevagent",     { NULL }, 3970,  "udp"  },
-  { "lanrevserver",    { NULL }, 3971,  "tcp"  },
-  { "lanrevserver",    { NULL }, 3971,  "udp"  },
-  { "iconp",           { NULL }, 3972,  "tcp"  },
-  { "iconp",           { NULL }, 3972,  "udp"  },
-  { "progistics",      { NULL }, 3973,  "tcp"  },
-  { "progistics",      { NULL }, 3973,  "udp"  },
-  { "citysearch",      { NULL }, 3974,  "tcp"  },
-  { "citysearch",      { NULL }, 3974,  "udp"  },
-  { "airshot",         { NULL }, 3975,  "tcp"  },
-  { "airshot",         { NULL }, 3975,  "udp"  },
-  { "opswagent",       { NULL }, 3976,  "tcp"  },
-  { "opswagent",       { NULL }, 3976,  "udp"  },
-  { "opswmanager",     { NULL }, 3977,  "tcp"  },
-  { "opswmanager",     { NULL }, 3977,  "udp"  },
-  { "secure-cfg-svr",  { NULL }, 3978,  "tcp"  },
-  { "secure-cfg-svr",  { NULL }, 3978,  "udp"  },
-  { "smwan",           { NULL }, 3979,  "tcp"  },
-  { "smwan",           { NULL }, 3979,  "udp"  },
-  { "acms",            { NULL }, 3980,  "tcp"  },
-  { "acms",            { NULL }, 3980,  "udp"  },
-  { "starfish",        { NULL }, 3981,  "tcp"  },
-  { "starfish",        { NULL }, 3981,  "udp"  },
-  { "eis",             { NULL }, 3982,  "tcp"  },
-  { "eis",             { NULL }, 3982,  "udp"  },
-  { "eisp",            { NULL }, 3983,  "tcp"  },
-  { "eisp",            { NULL }, 3983,  "udp"  },
-  { "mapper-nodemgr",  { NULL }, 3984,  "tcp"  },
-  { "mapper-nodemgr",  { NULL }, 3984,  "udp"  },
-  { "mapper-mapethd",  { NULL }, 3985,  "tcp"  },
-  { "mapper-mapethd",  { NULL }, 3985,  "udp"  },
-  { "mapper-ws_ethd",  { NULL }, 3986,  "tcp"  },
-  { "mapper-ws_ethd",  { NULL }, 3986,  "udp"  },
-  { "centerline",      { NULL }, 3987,  "tcp"  },
-  { "centerline",      { NULL }, 3987,  "udp"  },
-  { "dcs-config",      { NULL }, 3988,  "tcp"  },
-  { "dcs-config",      { NULL }, 3988,  "udp"  },
-  { "bv-queryengine",  { NULL }, 3989,  "tcp"  },
-  { "bv-queryengine",  { NULL }, 3989,  "udp"  },
-  { "bv-is",           { NULL }, 3990,  "tcp"  },
-  { "bv-is",           { NULL }, 3990,  "udp"  },
-  { "bv-smcsrv",       { NULL }, 3991,  "tcp"  },
-  { "bv-smcsrv",       { NULL }, 3991,  "udp"  },
-  { "bv-ds",           { NULL }, 3992,  "tcp"  },
-  { "bv-ds",           { NULL }, 3992,  "udp"  },
-  { "bv-agent",        { NULL }, 3993,  "tcp"  },
-  { "bv-agent",        { NULL }, 3993,  "udp"  },
-  { "iss-mgmt-ssl",    { NULL }, 3995,  "tcp"  },
-  { "iss-mgmt-ssl",    { NULL }, 3995,  "udp"  },
-  { "abcsoftware",     { NULL }, 3996,  "tcp"  },
-  { "abcsoftware",     { NULL }, 3996,  "udp"  },
-  { "agentsease-db",   { NULL }, 3997,  "tcp"  },
-  { "agentsease-db",   { NULL }, 3997,  "udp"  },
-  { "dnx",             { NULL }, 3998,  "tcp"  },
-  { "dnx",             { NULL }, 3998,  "udp"  },
-  { "nvcnet",          { NULL }, 3999,  "tcp"  },
-  { "nvcnet",          { NULL }, 3999,  "udp"  },
-  { "terabase",        { NULL }, 4000,  "tcp"  },
-  { "terabase",        { NULL }, 4000,  "udp"  },
-  { "newoak",          { NULL }, 4001,  "tcp"  },
-  { "newoak",          { NULL }, 4001,  "udp"  },
-  { "pxc-spvr-ft",     { NULL }, 4002,  "tcp"  },
-  { "pxc-spvr-ft",     { NULL }, 4002,  "udp"  },
-  { "pxc-splr-ft",     { NULL }, 4003,  "tcp"  },
-  { "pxc-splr-ft",     { NULL }, 4003,  "udp"  },
-  { "pxc-roid",        { NULL }, 4004,  "tcp"  },
-  { "pxc-roid",        { NULL }, 4004,  "udp"  },
-  { "pxc-pin",         { NULL }, 4005,  "tcp"  },
-  { "pxc-pin",         { NULL }, 4005,  "udp"  },
-  { "pxc-spvr",        { NULL }, 4006,  "tcp"  },
-  { "pxc-spvr",        { NULL }, 4006,  "udp"  },
-  { "pxc-splr",        { NULL }, 4007,  "tcp"  },
-  { "pxc-splr",        { NULL }, 4007,  "udp"  },
-  { "netcheque",       { NULL }, 4008,  "tcp"  },
-  { "netcheque",       { NULL }, 4008,  "udp"  },
-  { "chimera-hwm",     { NULL }, 4009,  "tcp"  },
-  { "chimera-hwm",     { NULL }, 4009,  "udp"  },
-  { "samsung-unidex",  { NULL }, 4010,  "tcp"  },
-  { "samsung-unidex",  { NULL }, 4010,  "udp"  },
-  { "altserviceboot",  { NULL }, 4011,  "tcp"  },
-  { "altserviceboot",  { NULL }, 4011,  "udp"  },
-  { "pda-gate",        { NULL }, 4012,  "tcp"  },
-  { "pda-gate",        { NULL }, 4012,  "udp"  },
-  { "acl-manager",     { NULL }, 4013,  "tcp"  },
-  { "acl-manager",     { NULL }, 4013,  "udp"  },
-  { "taiclock",        { NULL }, 4014,  "tcp"  },
-  { "taiclock",        { NULL }, 4014,  "udp"  },
-  { "talarian-mcast1", { NULL }, 4015,  "tcp"  },
-  { "talarian-mcast1", { NULL }, 4015,  "udp"  },
-  { "talarian-mcast2", { NULL }, 4016,  "tcp"  },
-  { "talarian-mcast2", { NULL }, 4016,  "udp"  },
-  { "talarian-mcast3", { NULL }, 4017,  "tcp"  },
-  { "talarian-mcast3", { NULL }, 4017,  "udp"  },
-  { "talarian-mcast4", { NULL }, 4018,  "tcp"  },
-  { "talarian-mcast4", { NULL }, 4018,  "udp"  },
-  { "talarian-mcast5", { NULL }, 4019,  "tcp"  },
-  { "talarian-mcast5", { NULL }, 4019,  "udp"  },
-  { "trap",            { NULL }, 4020,  "tcp"  },
-  { "trap",            { NULL }, 4020,  "udp"  },
-  { "nexus-portal",    { NULL }, 4021,  "tcp"  },
-  { "nexus-portal",    { NULL }, 4021,  "udp"  },
-  { "dnox",            { NULL }, 4022,  "tcp"  },
-  { "dnox",            { NULL }, 4022,  "udp"  },
-  { "esnm-zoning",     { NULL }, 4023,  "tcp"  },
-  { "esnm-zoning",     { NULL }, 4023,  "udp"  },
-  { "tnp1-port",       { NULL }, 4024,  "tcp"  },
-  { "tnp1-port",       { NULL }, 4024,  "udp"  },
-  { "partimage",       { NULL }, 4025,  "tcp"  },
-  { "partimage",       { NULL }, 4025,  "udp"  },
-  { "as-debug",        { NULL }, 4026,  "tcp"  },
-  { "as-debug",        { NULL }, 4026,  "udp"  },
-  { "bxp",             { NULL }, 4027,  "tcp"  },
-  { "bxp",             { NULL }, 4027,  "udp"  },
-  { "dtserver-port",   { NULL }, 4028,  "tcp"  },
-  { "dtserver-port",   { NULL }, 4028,  "udp"  },
-  { "ip-qsig",         { NULL }, 4029,  "tcp"  },
-  { "ip-qsig",         { NULL }, 4029,  "udp"  },
-  { "jdmn-port",       { NULL }, 4030,  "tcp"  },
-  { "jdmn-port",       { NULL }, 4030,  "udp"  },
-  { "suucp",           { NULL }, 4031,  "tcp"  },
-  { "suucp",           { NULL }, 4031,  "udp"  },
-  { "vrts-auth-port",  { NULL }, 4032,  "tcp"  },
-  { "vrts-auth-port",  { NULL }, 4032,  "udp"  },
-  { "sanavigator",     { NULL }, 4033,  "tcp"  },
-  { "sanavigator",     { NULL }, 4033,  "udp"  },
-  { "ubxd",            { NULL }, 4034,  "tcp"  },
-  { "ubxd",            { NULL }, 4034,  "udp"  },
-  { "wap-push-http",   { NULL }, 4035,  "tcp"  },
-  { "wap-push-http",   { NULL }, 4035,  "udp"  },
-  { "wap-push-https",  { NULL }, 4036,  "tcp"  },
-  { "wap-push-https",  { NULL }, 4036,  "udp"  },
-  { "ravehd",          { NULL }, 4037,  "tcp"  },
-  { "ravehd",          { NULL }, 4037,  "udp"  },
-  { "fazzt-ptp",       { NULL }, 4038,  "tcp"  },
-  { "fazzt-ptp",       { NULL }, 4038,  "udp"  },
-  { "fazzt-admin",     { NULL }, 4039,  "tcp"  },
-  { "fazzt-admin",     { NULL }, 4039,  "udp"  },
-  { "yo-main",         { NULL }, 4040,  "tcp"  },
-  { "yo-main",         { NULL }, 4040,  "udp"  },
-  { "houston",         { NULL }, 4041,  "tcp"  },
-  { "houston",         { NULL }, 4041,  "udp"  },
-  { "ldxp",            { NULL }, 4042,  "tcp"  },
-  { "ldxp",            { NULL }, 4042,  "udp"  },
-  { "nirp",            { NULL }, 4043,  "tcp"  },
-  { "nirp",            { NULL }, 4043,  "udp"  },
-  { "ltp",             { NULL }, 4044,  "tcp"  },
-  { "ltp",             { NULL }, 4044,  "udp"  },
-  { "npp",             { NULL }, 4045,  "tcp"  },
-  { "npp",             { NULL }, 4045,  "udp"  },
-  { "acp-proto",       { NULL }, 4046,  "tcp"  },
-  { "acp-proto",       { NULL }, 4046,  "udp"  },
-  { "ctp-state",       { NULL }, 4047,  "tcp"  },
-  { "ctp-state",       { NULL }, 4047,  "udp"  },
-  { "wafs",            { NULL }, 4049,  "tcp"  },
-  { "wafs",            { NULL }, 4049,  "udp"  },
-  { "cisco-wafs",      { NULL }, 4050,  "tcp"  },
-  { "cisco-wafs",      { NULL }, 4050,  "udp"  },
-  { "cppdp",           { NULL }, 4051,  "tcp"  },
-  { "cppdp",           { NULL }, 4051,  "udp"  },
-  { "interact",        { NULL }, 4052,  "tcp"  },
-  { "interact",        { NULL }, 4052,  "udp"  },
-  { "ccu-comm-1",      { NULL }, 4053,  "tcp"  },
-  { "ccu-comm-1",      { NULL }, 4053,  "udp"  },
-  { "ccu-comm-2",      { NULL }, 4054,  "tcp"  },
-  { "ccu-comm-2",      { NULL }, 4054,  "udp"  },
-  { "ccu-comm-3",      { NULL }, 4055,  "tcp"  },
-  { "ccu-comm-3",      { NULL }, 4055,  "udp"  },
-  { "lms",             { NULL }, 4056,  "tcp"  },
-  { "lms",             { NULL }, 4056,  "udp"  },
-  { "wfm",             { NULL }, 4057,  "tcp"  },
-  { "wfm",             { NULL }, 4057,  "udp"  },
-  { "kingfisher",      { NULL }, 4058,  "tcp"  },
-  { "kingfisher",      { NULL }, 4058,  "udp"  },
-  { "dlms-cosem",      { NULL }, 4059,  "tcp"  },
-  { "dlms-cosem",      { NULL }, 4059,  "udp"  },
-  { "dsmeter_iatc",    { NULL }, 4060,  "tcp"  },
-  { "dsmeter_iatc",    { NULL }, 4060,  "udp"  },
-  { "ice-location",    { NULL }, 4061,  "tcp"  },
-  { "ice-location",    { NULL }, 4061,  "udp"  },
-  { "ice-slocation",   { NULL }, 4062,  "tcp"  },
-  { "ice-slocation",   { NULL }, 4062,  "udp"  },
-  { "ice-router",      { NULL }, 4063,  "tcp"  },
-  { "ice-router",      { NULL }, 4063,  "udp"  },
-  { "ice-srouter",     { NULL }, 4064,  "tcp"  },
-  { "ice-srouter",     { NULL }, 4064,  "udp"  },
-  { "avanti_cdp",      { NULL }, 4065,  "tcp"  },
-  { "avanti_cdp",      { NULL }, 4065,  "udp"  },
-  { "pmas",            { NULL }, 4066,  "tcp"  },
-  { "pmas",            { NULL }, 4066,  "udp"  },
-  { "idp",             { NULL }, 4067,  "tcp"  },
-  { "idp",             { NULL }, 4067,  "udp"  },
-  { "ipfltbcst",       { NULL }, 4068,  "tcp"  },
-  { "ipfltbcst",       { NULL }, 4068,  "udp"  },
-  { "minger",          { NULL }, 4069,  "tcp"  },
-  { "minger",          { NULL }, 4069,  "udp"  },
-  { "tripe",           { NULL }, 4070,  "tcp"  },
-  { "tripe",           { NULL }, 4070,  "udp"  },
-  { "aibkup",          { NULL }, 4071,  "tcp"  },
-  { "aibkup",          { NULL }, 4071,  "udp"  },
-  { "zieto-sock",      { NULL }, 4072,  "tcp"  },
-  { "zieto-sock",      { NULL }, 4072,  "udp"  },
-  { "iRAPP",           { NULL }, 4073,  "tcp"  },
-  { "iRAPP",           { NULL }, 4073,  "udp"  },
-  { "cequint-cityid",  { NULL }, 4074,  "tcp"  },
-  { "cequint-cityid",  { NULL }, 4074,  "udp"  },
-  { "perimlan",        { NULL }, 4075,  "tcp"  },
-  { "perimlan",        { NULL }, 4075,  "udp"  },
-  { "seraph",          { NULL }, 4076,  "tcp"  },
-  { "seraph",          { NULL }, 4076,  "udp"  },
-  { "ascomalarm",      { NULL }, 4077,  "udp"  },
-  { "cssp",            { NULL }, 4078,  "tcp"  },
-  { "santools",        { NULL }, 4079,  "tcp"  },
-  { "santools",        { NULL }, 4079,  "udp"  },
-  { "lorica-in",       { NULL }, 4080,  "tcp"  },
-  { "lorica-in",       { NULL }, 4080,  "udp"  },
-  { "lorica-in-sec",   { NULL }, 4081,  "tcp"  },
-  { "lorica-in-sec",   { NULL }, 4081,  "udp"  },
-  { "lorica-out",      { NULL }, 4082,  "tcp"  },
-  { "lorica-out",      { NULL }, 4082,  "udp"  },
-  { "lorica-out-sec",  { NULL }, 4083,  "tcp"  },
-  { "lorica-out-sec",  { NULL }, 4083,  "udp"  },
-  { "fortisphere-vm",  { NULL }, 4084,  "udp"  },
-  { "ezmessagesrv",    { NULL }, 4085,  "tcp"  },
-  { "ftsync",          { NULL }, 4086,  "udp"  },
-  { "applusservice",   { NULL }, 4087,  "tcp"  },
-  { "npsp",            { NULL }, 4088,  "tcp"  },
-  { "opencore",        { NULL }, 4089,  "tcp"  },
-  { "opencore",        { NULL }, 4089,  "udp"  },
-  { "omasgport",       { NULL }, 4090,  "tcp"  },
-  { "omasgport",       { NULL }, 4090,  "udp"  },
-  { "ewinstaller",     { NULL }, 4091,  "tcp"  },
-  { "ewinstaller",     { NULL }, 4091,  "udp"  },
-  { "ewdgs",           { NULL }, 4092,  "tcp"  },
-  { "ewdgs",           { NULL }, 4092,  "udp"  },
-  { "pvxpluscs",       { NULL }, 4093,  "tcp"  },
-  { "pvxpluscs",       { NULL }, 4093,  "udp"  },
-  { "sysrqd",          { NULL }, 4094,  "tcp"  },
-  { "sysrqd",          { NULL }, 4094,  "udp"  },
-  { "xtgui",           { NULL }, 4095,  "tcp"  },
-  { "xtgui",           { NULL }, 4095,  "udp"  },
-  { "bre",             { NULL }, 4096,  "tcp"  },
-  { "bre",             { NULL }, 4096,  "udp"  },
-  { "patrolview",      { NULL }, 4097,  "tcp"  },
-  { "patrolview",      { NULL }, 4097,  "udp"  },
-  { "drmsfsd",         { NULL }, 4098,  "tcp"  },
-  { "drmsfsd",         { NULL }, 4098,  "udp"  },
-  { "dpcp",            { NULL }, 4099,  "tcp"  },
-  { "dpcp",            { NULL }, 4099,  "udp"  },
-  { "igo-incognito",   { NULL }, 4100,  "tcp"  },
-  { "igo-incognito",   { NULL }, 4100,  "udp"  },
-  { "brlp-0",          { NULL }, 4101,  "tcp"  },
-  { "brlp-0",          { NULL }, 4101,  "udp"  },
-  { "brlp-1",          { NULL }, 4102,  "tcp"  },
-  { "brlp-1",          { NULL }, 4102,  "udp"  },
-  { "brlp-2",          { NULL }, 4103,  "tcp"  },
-  { "brlp-2",          { NULL }, 4103,  "udp"  },
-  { "brlp-3",          { NULL }, 4104,  "tcp"  },
-  { "brlp-3",          { NULL }, 4104,  "udp"  },
-  { "shofarplayer",    { NULL }, 4105,  "tcp"  },
-  { "shofarplayer",    { NULL }, 4105,  "udp"  },
-  { "synchronite",     { NULL }, 4106,  "tcp"  },
-  { "synchronite",     { NULL }, 4106,  "udp"  },
-  { "j-ac",            { NULL }, 4107,  "tcp"  },
-  { "j-ac",            { NULL }, 4107,  "udp"  },
-  { "accel",           { NULL }, 4108,  "tcp"  },
-  { "accel",           { NULL }, 4108,  "udp"  },
-  { "izm",             { NULL }, 4109,  "tcp"  },
-  { "izm",             { NULL }, 4109,  "udp"  },
-  { "g2tag",           { NULL }, 4110,  "tcp"  },
-  { "g2tag",           { NULL }, 4110,  "udp"  },
-  { "xgrid",           { NULL }, 4111,  "tcp"  },
-  { "xgrid",           { NULL }, 4111,  "udp"  },
-  { "apple-vpns-rp",   { NULL }, 4112,  "tcp"  },
-  { "apple-vpns-rp",   { NULL }, 4112,  "udp"  },
-  { "aipn-reg",        { NULL }, 4113,  "tcp"  },
-  { "aipn-reg",        { NULL }, 4113,  "udp"  },
-  { "jomamqmonitor",   { NULL }, 4114,  "tcp"  },
-  { "jomamqmonitor",   { NULL }, 4114,  "udp"  },
-  { "cds",             { NULL }, 4115,  "tcp"  },
-  { "cds",             { NULL }, 4115,  "udp"  },
-  { "smartcard-tls",   { NULL }, 4116,  "tcp"  },
-  { "smartcard-tls",   { NULL }, 4116,  "udp"  },
-  { "hillrserv",       { NULL }, 4117,  "tcp"  },
-  { "hillrserv",       { NULL }, 4117,  "udp"  },
-  { "netscript",       { NULL }, 4118,  "tcp"  },
-  { "netscript",       { NULL }, 4118,  "udp"  },
-  { "assuria-slm",     { NULL }, 4119,  "tcp"  },
-  { "assuria-slm",     { NULL }, 4119,  "udp"  },
-  { "e-builder",       { NULL }, 4121,  "tcp"  },
-  { "e-builder",       { NULL }, 4121,  "udp"  },
-  { "fprams",          { NULL }, 4122,  "tcp"  },
-  { "fprams",          { NULL }, 4122,  "udp"  },
-  { "z-wave",          { NULL }, 4123,  "tcp"  },
-  { "z-wave",          { NULL }, 4123,  "udp"  },
-  { "tigv2",           { NULL }, 4124,  "tcp"  },
-  { "tigv2",           { NULL }, 4124,  "udp"  },
-  { "opsview-envoy",   { NULL }, 4125,  "tcp"  },
-  { "opsview-envoy",   { NULL }, 4125,  "udp"  },
-  { "ddrepl",          { NULL }, 4126,  "tcp"  },
-  { "ddrepl",          { NULL }, 4126,  "udp"  },
-  { "unikeypro",       { NULL }, 4127,  "tcp"  },
-  { "unikeypro",       { NULL }, 4127,  "udp"  },
-  { "nufw",            { NULL }, 4128,  "tcp"  },
-  { "nufw",            { NULL }, 4128,  "udp"  },
-  { "nuauth",          { NULL }, 4129,  "tcp"  },
-  { "nuauth",          { NULL }, 4129,  "udp"  },
-  { "fronet",          { NULL }, 4130,  "tcp"  },
-  { "fronet",          { NULL }, 4130,  "udp"  },
-  { "stars",           { NULL }, 4131,  "tcp"  },
-  { "stars",           { NULL }, 4131,  "udp"  },
-  { "nuts_dem",        { NULL }, 4132,  "tcp"  },
-  { "nuts_dem",        { NULL }, 4132,  "udp"  },
-  { "nuts_bootp",      { NULL }, 4133,  "tcp"  },
-  { "nuts_bootp",      { NULL }, 4133,  "udp"  },
-  { "nifty-hmi",       { NULL }, 4134,  "tcp"  },
-  { "nifty-hmi",       { NULL }, 4134,  "udp"  },
-  { "cl-db-attach",    { NULL }, 4135,  "tcp"  },
-  { "cl-db-attach",    { NULL }, 4135,  "udp"  },
-  { "cl-db-request",   { NULL }, 4136,  "tcp"  },
-  { "cl-db-request",   { NULL }, 4136,  "udp"  },
-  { "cl-db-remote",    { NULL }, 4137,  "tcp"  },
-  { "cl-db-remote",    { NULL }, 4137,  "udp"  },
-  { "nettest",         { NULL }, 4138,  "tcp"  },
-  { "nettest",         { NULL }, 4138,  "udp"  },
-  { "thrtx",           { NULL }, 4139,  "tcp"  },
-  { "thrtx",           { NULL }, 4139,  "udp"  },
-  { "cedros_fds",      { NULL }, 4140,  "tcp"  },
-  { "cedros_fds",      { NULL }, 4140,  "udp"  },
-  { "oirtgsvc",        { NULL }, 4141,  "tcp"  },
-  { "oirtgsvc",        { NULL }, 4141,  "udp"  },
-  { "oidocsvc",        { NULL }, 4142,  "tcp"  },
-  { "oidocsvc",        { NULL }, 4142,  "udp"  },
-  { "oidsr",           { NULL }, 4143,  "tcp"  },
-  { "oidsr",           { NULL }, 4143,  "udp"  },
-  { "vvr-control",     { NULL }, 4145,  "tcp"  },
-  { "vvr-control",     { NULL }, 4145,  "udp"  },
-  { "tgcconnect",      { NULL }, 4146,  "tcp"  },
-  { "tgcconnect",      { NULL }, 4146,  "udp"  },
-  { "vrxpservman",     { NULL }, 4147,  "tcp"  },
-  { "vrxpservman",     { NULL }, 4147,  "udp"  },
-  { "hhb-handheld",    { NULL }, 4148,  "tcp"  },
-  { "hhb-handheld",    { NULL }, 4148,  "udp"  },
-  { "agslb",           { NULL }, 4149,  "tcp"  },
-  { "agslb",           { NULL }, 4149,  "udp"  },
-  { "PowerAlert-nsa",  { NULL }, 4150,  "tcp"  },
-  { "PowerAlert-nsa",  { NULL }, 4150,  "udp"  },
-  { "menandmice_noh",  { NULL }, 4151,  "tcp"  },
-  { "menandmice_noh",  { NULL }, 4151,  "udp"  },
-  { "idig_mux",        { NULL }, 4152,  "tcp"  },
-  { "idig_mux",        { NULL }, 4152,  "udp"  },
-  { "mbl-battd",       { NULL }, 4153,  "tcp"  },
-  { "mbl-battd",       { NULL }, 4153,  "udp"  },
-  { "atlinks",         { NULL }, 4154,  "tcp"  },
-  { "atlinks",         { NULL }, 4154,  "udp"  },
-  { "bzr",             { NULL }, 4155,  "tcp"  },
-  { "bzr",             { NULL }, 4155,  "udp"  },
-  { "stat-results",    { NULL }, 4156,  "tcp"  },
-  { "stat-results",    { NULL }, 4156,  "udp"  },
-  { "stat-scanner",    { NULL }, 4157,  "tcp"  },
-  { "stat-scanner",    { NULL }, 4157,  "udp"  },
-  { "stat-cc",         { NULL }, 4158,  "tcp"  },
-  { "stat-cc",         { NULL }, 4158,  "udp"  },
-  { "nss",             { NULL }, 4159,  "tcp"  },
-  { "nss",             { NULL }, 4159,  "udp"  },
-  { "jini-discovery",  { NULL }, 4160,  "tcp"  },
-  { "jini-discovery",  { NULL }, 4160,  "udp"  },
-  { "omscontact",      { NULL }, 4161,  "tcp"  },
-  { "omscontact",      { NULL }, 4161,  "udp"  },
-  { "omstopology",     { NULL }, 4162,  "tcp"  },
-  { "omstopology",     { NULL }, 4162,  "udp"  },
-  { "silverpeakpeer",  { NULL }, 4163,  "tcp"  },
-  { "silverpeakpeer",  { NULL }, 4163,  "udp"  },
-  { "silverpeakcomm",  { NULL }, 4164,  "tcp"  },
-  { "silverpeakcomm",  { NULL }, 4164,  "udp"  },
-  { "altcp",           { NULL }, 4165,  "tcp"  },
-  { "altcp",           { NULL }, 4165,  "udp"  },
-  { "joost",           { NULL }, 4166,  "tcp"  },
-  { "joost",           { NULL }, 4166,  "udp"  },
-  { "ddgn",            { NULL }, 4167,  "tcp"  },
-  { "ddgn",            { NULL }, 4167,  "udp"  },
-  { "pslicser",        { NULL }, 4168,  "tcp"  },
-  { "pslicser",        { NULL }, 4168,  "udp"  },
-  { "iadt",            { NULL }, 4169,  "tcp"  },
-  { "iadt-disc",       { NULL }, 4169,  "udp"  },
-  { "d-cinema-csp",    { NULL }, 4170,  "tcp"  },
-  { "ml-svnet",        { NULL }, 4171,  "tcp"  },
-  { "pcoip",           { NULL }, 4172,  "tcp"  },
-  { "pcoip",           { NULL }, 4172,  "udp"  },
-  { "smcluster",       { NULL }, 4174,  "tcp"  },
-  { "bccp",            { NULL }, 4175,  "tcp"  },
-  { "tl-ipcproxy",     { NULL }, 4176,  "tcp"  },
-  { "wello",           { NULL }, 4177,  "tcp"  },
-  { "wello",           { NULL }, 4177,  "udp"  },
-  { "storman",         { NULL }, 4178,  "tcp"  },
-  { "storman",         { NULL }, 4178,  "udp"  },
-  { "MaxumSP",         { NULL }, 4179,  "tcp"  },
-  { "MaxumSP",         { NULL }, 4179,  "udp"  },
-  { "httpx",           { NULL }, 4180,  "tcp"  },
-  { "httpx",           { NULL }, 4180,  "udp"  },
-  { "macbak",          { NULL }, 4181,  "tcp"  },
-  { "macbak",          { NULL }, 4181,  "udp"  },
-  { "pcptcpservice",   { NULL }, 4182,  "tcp"  },
-  { "pcptcpservice",   { NULL }, 4182,  "udp"  },
-  { "gmmp",            { NULL }, 4183,  "tcp"  },
-  { "gmmp",            { NULL }, 4183,  "udp"  },
-  { "universe_suite",  { NULL }, 4184,  "tcp"  },
-  { "universe_suite",  { NULL }, 4184,  "udp"  },
-  { "wcpp",            { NULL }, 4185,  "tcp"  },
-  { "wcpp",            { NULL }, 4185,  "udp"  },
-  { "boxbackupstore",  { NULL }, 4186,  "tcp"  },
-  { "csc_proxy",       { NULL }, 4187,  "tcp"  },
-  { "vatata",          { NULL }, 4188,  "tcp"  },
-  { "vatata",          { NULL }, 4188,  "udp"  },
-  { "pcep",            { NULL }, 4189,  "tcp"  },
-  { "sieve",           { NULL }, 4190,  "tcp"  },
-  { "dsmipv6",         { NULL }, 4191,  "udp"  },
-  { "azeti",           { NULL }, 4192,  "tcp"  },
-  { "azeti-bd",        { NULL }, 4192,  "udp"  },
-  { "pvxplusio",       { NULL }, 4193,  "tcp"  },
-  { "eims-admin",      { NULL }, 4199,  "tcp"  },
-  { "eims-admin",      { NULL }, 4199,  "udp"  },
-  { "corelccam",       { NULL }, 4300,  "tcp"  },
-  { "corelccam",       { NULL }, 4300,  "udp"  },
-  { "d-data",          { NULL }, 4301,  "tcp"  },
-  { "d-data",          { NULL }, 4301,  "udp"  },
-  { "d-data-control",  { NULL }, 4302,  "tcp"  },
-  { "d-data-control",  { NULL }, 4302,  "udp"  },
-  { "srcp",            { NULL }, 4303,  "tcp"  },
-  { "srcp",            { NULL }, 4303,  "udp"  },
-  { "owserver",        { NULL }, 4304,  "tcp"  },
-  { "owserver",        { NULL }, 4304,  "udp"  },
-  { "batman",          { NULL }, 4305,  "tcp"  },
-  { "batman",          { NULL }, 4305,  "udp"  },
-  { "pinghgl",         { NULL }, 4306,  "tcp"  },
-  { "pinghgl",         { NULL }, 4306,  "udp"  },
-  { "visicron-vs",     { NULL }, 4307,  "tcp"  },
-  { "visicron-vs",     { NULL }, 4307,  "udp"  },
-  { "compx-lockview",  { NULL }, 4308,  "tcp"  },
-  { "compx-lockview",  { NULL }, 4308,  "udp"  },
-  { "dserver",         { NULL }, 4309,  "tcp"  },
-  { "dserver",         { NULL }, 4309,  "udp"  },
-  { "mirrtex",         { NULL }, 4310,  "tcp"  },
-  { "mirrtex",         { NULL }, 4310,  "udp"  },
-  { "p6ssmc",          { NULL }, 4311,  "tcp"  },
-  { "pscl-mgt",        { NULL }, 4312,  "tcp"  },
-  { "perrla",          { NULL }, 4313,  "tcp"  },
-  { "fdt-rcatp",       { NULL }, 4320,  "tcp"  },
-  { "fdt-rcatp",       { NULL }, 4320,  "udp"  },
-  { "rwhois",          { NULL }, 4321,  "tcp"  },
-  { "rwhois",          { NULL }, 4321,  "udp"  },
-  { "trim-event",      { NULL }, 4322,  "tcp"  },
-  { "trim-event",      { NULL }, 4322,  "udp"  },
-  { "trim-ice",        { NULL }, 4323,  "tcp"  },
-  { "trim-ice",        { NULL }, 4323,  "udp"  },
-  { "balour",          { NULL }, 4324,  "tcp"  },
-  { "balour",          { NULL }, 4324,  "udp"  },
-  { "geognosisman",    { NULL }, 4325,  "tcp"  },
-  { "geognosisman",    { NULL }, 4325,  "udp"  },
-  { "geognosis",       { NULL }, 4326,  "tcp"  },
-  { "geognosis",       { NULL }, 4326,  "udp"  },
-  { "jaxer-web",       { NULL }, 4327,  "tcp"  },
-  { "jaxer-web",       { NULL }, 4327,  "udp"  },
-  { "jaxer-manager",   { NULL }, 4328,  "tcp"  },
-  { "jaxer-manager",   { NULL }, 4328,  "udp"  },
-  { "publiqare-sync",  { NULL }, 4329,  "tcp"  },
-  { "gaia",            { NULL }, 4340,  "tcp"  },
-  { "gaia",            { NULL }, 4340,  "udp"  },
-  { "lisp-data",       { NULL }, 4341,  "tcp"  },
-  { "lisp-data",       { NULL }, 4341,  "udp"  },
-  { "lisp-cons",       { NULL }, 4342,  "tcp"  },
-  { "lisp-control",    { NULL }, 4342,  "udp"  },
-  { "unicall",         { NULL }, 4343,  "tcp"  },
-  { "unicall",         { NULL }, 4343,  "udp"  },
-  { "vinainstall",     { NULL }, 4344,  "tcp"  },
-  { "vinainstall",     { NULL }, 4344,  "udp"  },
-  { "m4-network-as",   { NULL }, 4345,  "tcp"  },
-  { "m4-network-as",   { NULL }, 4345,  "udp"  },
-  { "elanlm",          { NULL }, 4346,  "tcp"  },
-  { "elanlm",          { NULL }, 4346,  "udp"  },
-  { "lansurveyor",     { NULL }, 4347,  "tcp"  },
-  { "lansurveyor",     { NULL }, 4347,  "udp"  },
-  { "itose",           { NULL }, 4348,  "tcp"  },
-  { "itose",           { NULL }, 4348,  "udp"  },
-  { "fsportmap",       { NULL }, 4349,  "tcp"  },
-  { "fsportmap",       { NULL }, 4349,  "udp"  },
-  { "net-device",      { NULL }, 4350,  "tcp"  },
-  { "net-device",      { NULL }, 4350,  "udp"  },
-  { "plcy-net-svcs",   { NULL }, 4351,  "tcp"  },
-  { "plcy-net-svcs",   { NULL }, 4351,  "udp"  },
-  { "pjlink",          { NULL }, 4352,  "tcp"  },
-  { "pjlink",          { NULL }, 4352,  "udp"  },
-  { "f5-iquery",       { NULL }, 4353,  "tcp"  },
-  { "f5-iquery",       { NULL }, 4353,  "udp"  },
-  { "qsnet-trans",     { NULL }, 4354,  "tcp"  },
-  { "qsnet-trans",     { NULL }, 4354,  "udp"  },
-  { "qsnet-workst",    { NULL }, 4355,  "tcp"  },
-  { "qsnet-workst",    { NULL }, 4355,  "udp"  },
-  { "qsnet-assist",    { NULL }, 4356,  "tcp"  },
-  { "qsnet-assist",    { NULL }, 4356,  "udp"  },
-  { "qsnet-cond",      { NULL }, 4357,  "tcp"  },
-  { "qsnet-cond",      { NULL }, 4357,  "udp"  },
-  { "qsnet-nucl",      { NULL }, 4358,  "tcp"  },
-  { "qsnet-nucl",      { NULL }, 4358,  "udp"  },
-  { "omabcastltkm",    { NULL }, 4359,  "tcp"  },
-  { "omabcastltkm",    { NULL }, 4359,  "udp"  },
-  { "matrix_vnet",     { NULL }, 4360,  "tcp"  },
-  { "nacnl",           { NULL }, 4361,  "udp"  },
-  { "afore-vdp-disc",  { NULL }, 4362,  "udp"  },
-  { "wxbrief",         { NULL }, 4368,  "tcp"  },
-  { "wxbrief",         { NULL }, 4368,  "udp"  },
-  { "epmd",            { NULL }, 4369,  "tcp"  },
-  { "epmd",            { NULL }, 4369,  "udp"  },
-  { "elpro_tunnel",    { NULL }, 4370,  "tcp"  },
-  { "elpro_tunnel",    { NULL }, 4370,  "udp"  },
-  { "l2c-control",     { NULL }, 4371,  "tcp"  },
-  { "l2c-disc",        { NULL }, 4371,  "udp"  },
-  { "l2c-data",        { NULL }, 4372,  "tcp"  },
-  { "l2c-data",        { NULL }, 4372,  "udp"  },
-  { "remctl",          { NULL }, 4373,  "tcp"  },
-  { "remctl",          { NULL }, 4373,  "udp"  },
-  { "psi-ptt",         { NULL }, 4374,  "tcp"  },
-  { "tolteces",        { NULL }, 4375,  "tcp"  },
-  { "tolteces",        { NULL }, 4375,  "udp"  },
-  { "bip",             { NULL }, 4376,  "tcp"  },
-  { "bip",             { NULL }, 4376,  "udp"  },
-  { "cp-spxsvr",       { NULL }, 4377,  "tcp"  },
-  { "cp-spxsvr",       { NULL }, 4377,  "udp"  },
-  { "cp-spxdpy",       { NULL }, 4378,  "tcp"  },
-  { "cp-spxdpy",       { NULL }, 4378,  "udp"  },
-  { "ctdb",            { NULL }, 4379,  "tcp"  },
-  { "ctdb",            { NULL }, 4379,  "udp"  },
-  { "xandros-cms",     { NULL }, 4389,  "tcp"  },
-  { "xandros-cms",     { NULL }, 4389,  "udp"  },
-  { "wiegand",         { NULL }, 4390,  "tcp"  },
-  { "wiegand",         { NULL }, 4390,  "udp"  },
-  { "apwi-imserver",   { NULL }, 4391,  "tcp"  },
-  { "apwi-rxserver",   { NULL }, 4392,  "tcp"  },
-  { "apwi-rxspooler",  { NULL }, 4393,  "tcp"  },
-  { "apwi-disc",       { NULL }, 4394,  "udp"  },
-  { "omnivisionesx",   { NULL }, 4395,  "tcp"  },
-  { "omnivisionesx",   { NULL }, 4395,  "udp"  },
-  { "fly",             { NULL }, 4396,  "tcp"  },
-  { "ds-srv",          { NULL }, 4400,  "tcp"  },
-  { "ds-srv",          { NULL }, 4400,  "udp"  },
-  { "ds-srvr",         { NULL }, 4401,  "tcp"  },
-  { "ds-srvr",         { NULL }, 4401,  "udp"  },
-  { "ds-clnt",         { NULL }, 4402,  "tcp"  },
-  { "ds-clnt",         { NULL }, 4402,  "udp"  },
-  { "ds-user",         { NULL }, 4403,  "tcp"  },
-  { "ds-user",         { NULL }, 4403,  "udp"  },
-  { "ds-admin",        { NULL }, 4404,  "tcp"  },
-  { "ds-admin",        { NULL }, 4404,  "udp"  },
-  { "ds-mail",         { NULL }, 4405,  "tcp"  },
-  { "ds-mail",         { NULL }, 4405,  "udp"  },
-  { "ds-slp",          { NULL }, 4406,  "tcp"  },
-  { "ds-slp",          { NULL }, 4406,  "udp"  },
-  { "nacagent",        { NULL }, 4407,  "tcp"  },
-  { "slscc",           { NULL }, 4408,  "tcp"  },
-  { "netcabinet-com",  { NULL }, 4409,  "tcp"  },
-  { "itwo-server",     { NULL }, 4410,  "tcp"  },
-  { "netrockey6",      { NULL }, 4425,  "tcp"  },
-  { "netrockey6",      { NULL }, 4425,  "udp"  },
-  { "beacon-port-2",   { NULL }, 4426,  "tcp"  },
-  { "beacon-port-2",   { NULL }, 4426,  "udp"  },
-  { "drizzle",         { NULL }, 4427,  "tcp"  },
-  { "omviserver",      { NULL }, 4428,  "tcp"  },
-  { "omviagent",       { NULL }, 4429,  "tcp"  },
-  { "rsqlserver",      { NULL }, 4430,  "tcp"  },
-  { "rsqlserver",      { NULL }, 4430,  "udp"  },
-  { "wspipe",          { NULL }, 4431,  "tcp"  },
-  { "netblox",         { NULL }, 4441,  "udp"  },
-  { "saris",           { NULL }, 4442,  "tcp"  },
-  { "saris",           { NULL }, 4442,  "udp"  },
-  { "pharos",          { NULL }, 4443,  "tcp"  },
-  { "pharos",          { NULL }, 4443,  "udp"  },
-  { "krb524",          { NULL }, 4444,  "tcp"  },
-  { "krb524",          { NULL }, 4444,  "udp"  },
-  { "nv-video",        { NULL }, 4444,  "tcp"  },
-  { "nv-video",        { NULL }, 4444,  "udp"  },
-  { "upnotifyp",       { NULL }, 4445,  "tcp"  },
-  { "upnotifyp",       { NULL }, 4445,  "udp"  },
-  { "n1-fwp",          { NULL }, 4446,  "tcp"  },
-  { "n1-fwp",          { NULL }, 4446,  "udp"  },
-  { "n1-rmgmt",        { NULL }, 4447,  "tcp"  },
-  { "n1-rmgmt",        { NULL }, 4447,  "udp"  },
-  { "asc-slmd",        { NULL }, 4448,  "tcp"  },
-  { "asc-slmd",        { NULL }, 4448,  "udp"  },
-  { "privatewire",     { NULL }, 4449,  "tcp"  },
-  { "privatewire",     { NULL }, 4449,  "udp"  },
-  { "camp",            { NULL }, 4450,  "tcp"  },
-  { "camp",            { NULL }, 4450,  "udp"  },
-  { "ctisystemmsg",    { NULL }, 4451,  "tcp"  },
-  { "ctisystemmsg",    { NULL }, 4451,  "udp"  },
-  { "ctiprogramload",  { NULL }, 4452,  "tcp"  },
-  { "ctiprogramload",  { NULL }, 4452,  "udp"  },
-  { "nssalertmgr",     { NULL }, 4453,  "tcp"  },
-  { "nssalertmgr",     { NULL }, 4453,  "udp"  },
-  { "nssagentmgr",     { NULL }, 4454,  "tcp"  },
-  { "nssagentmgr",     { NULL }, 4454,  "udp"  },
-  { "prchat-user",     { NULL }, 4455,  "tcp"  },
-  { "prchat-user",     { NULL }, 4455,  "udp"  },
-  { "prchat-server",   { NULL }, 4456,  "tcp"  },
-  { "prchat-server",   { NULL }, 4456,  "udp"  },
-  { "prRegister",      { NULL }, 4457,  "tcp"  },
-  { "prRegister",      { NULL }, 4457,  "udp"  },
-  { "mcp",             { NULL }, 4458,  "tcp"  },
-  { "mcp",             { NULL }, 4458,  "udp"  },
-  { "hpssmgmt",        { NULL }, 4484,  "tcp"  },
-  { "hpssmgmt",        { NULL }, 4484,  "udp"  },
-  { "assyst-dr",       { NULL }, 4485,  "tcp"  },
-  { "icms",            { NULL }, 4486,  "tcp"  },
-  { "icms",            { NULL }, 4486,  "udp"  },
-  { "prex-tcp",        { NULL }, 4487,  "tcp"  },
-  { "awacs-ice",       { NULL }, 4488,  "tcp"  },
-  { "awacs-ice",       { NULL }, 4488,  "udp"  },
-  { "ipsec-nat-t",     { NULL }, 4500,  "tcp"  },
-  { "ipsec-nat-t",     { NULL }, 4500,  "udp"  },
-  { "ehs",             { NULL }, 4535,  "tcp"  },
-  { "ehs",             { NULL }, 4535,  "udp"  },
-  { "ehs-ssl",         { NULL }, 4536,  "tcp"  },
-  { "ehs-ssl",         { NULL }, 4536,  "udp"  },
-  { "wssauthsvc",      { NULL }, 4537,  "tcp"  },
-  { "wssauthsvc",      { NULL }, 4537,  "udp"  },
-  { "swx-gate",        { NULL }, 4538,  "tcp"  },
-  { "swx-gate",        { NULL }, 4538,  "udp"  },
-  { "worldscores",     { NULL }, 4545,  "tcp"  },
-  { "worldscores",     { NULL }, 4545,  "udp"  },
-  { "sf-lm",           { NULL }, 4546,  "tcp"  },
-  { "sf-lm",           { NULL }, 4546,  "udp"  },
-  { "lanner-lm",       { NULL }, 4547,  "tcp"  },
-  { "lanner-lm",       { NULL }, 4547,  "udp"  },
-  { "synchromesh",     { NULL }, 4548,  "tcp"  },
-  { "synchromesh",     { NULL }, 4548,  "udp"  },
-  { "aegate",          { NULL }, 4549,  "tcp"  },
-  { "aegate",          { NULL }, 4549,  "udp"  },
-  { "gds-adppiw-db",   { NULL }, 4550,  "tcp"  },
-  { "gds-adppiw-db",   { NULL }, 4550,  "udp"  },
-  { "ieee-mih",        { NULL }, 4551,  "tcp"  },
-  { "ieee-mih",        { NULL }, 4551,  "udp"  },
-  { "menandmice-mon",  { NULL }, 4552,  "tcp"  },
-  { "menandmice-mon",  { NULL }, 4552,  "udp"  },
-  { "icshostsvc",      { NULL }, 4553,  "tcp"  },
-  { "msfrs",           { NULL }, 4554,  "tcp"  },
-  { "msfrs",           { NULL }, 4554,  "udp"  },
-  { "rsip",            { NULL }, 4555,  "tcp"  },
-  { "rsip",            { NULL }, 4555,  "udp"  },
-  { "dtn-bundle-tcp",  { NULL }, 4556,  "tcp"  },
-  { "dtn-bundle-udp",  { NULL }, 4556,  "udp"  },
-  { "mtcevrunqss",     { NULL }, 4557,  "udp"  },
-  { "mtcevrunqman",    { NULL }, 4558,  "udp"  },
-  { "hylafax",         { NULL }, 4559,  "tcp"  },
-  { "hylafax",         { NULL }, 4559,  "udp"  },
-  { "kwtc",            { NULL }, 4566,  "tcp"  },
-  { "kwtc",            { NULL }, 4566,  "udp"  },
-  { "tram",            { NULL }, 4567,  "tcp"  },
-  { "tram",            { NULL }, 4567,  "udp"  },
-  { "bmc-reporting",   { NULL }, 4568,  "tcp"  },
-  { "bmc-reporting",   { NULL }, 4568,  "udp"  },
-  { "iax",             { NULL }, 4569,  "tcp"  },
-  { "iax",             { NULL }, 4569,  "udp"  },
-  { "rid",             { NULL }, 4590,  "tcp"  },
-  { "l3t-at-an",       { NULL }, 4591,  "tcp"  },
-  { "l3t-at-an",       { NULL }, 4591,  "udp"  },
-  { "hrpd-ith-at-an",  { NULL }, 4592,  "udp"  },
-  { "ipt-anri-anri",   { NULL }, 4593,  "tcp"  },
-  { "ipt-anri-anri",   { NULL }, 4593,  "udp"  },
-  { "ias-session",     { NULL }, 4594,  "tcp"  },
-  { "ias-session",     { NULL }, 4594,  "udp"  },
-  { "ias-paging",      { NULL }, 4595,  "tcp"  },
-  { "ias-paging",      { NULL }, 4595,  "udp"  },
-  { "ias-neighbor",    { NULL }, 4596,  "tcp"  },
-  { "ias-neighbor",    { NULL }, 4596,  "udp"  },
-  { "a21-an-1xbs",     { NULL }, 4597,  "tcp"  },
-  { "a21-an-1xbs",     { NULL }, 4597,  "udp"  },
-  { "a16-an-an",       { NULL }, 4598,  "tcp"  },
-  { "a16-an-an",       { NULL }, 4598,  "udp"  },
-  { "a17-an-an",       { NULL }, 4599,  "tcp"  },
-  { "a17-an-an",       { NULL }, 4599,  "udp"  },
-  { "piranha1",        { NULL }, 4600,  "tcp"  },
-  { "piranha1",        { NULL }, 4600,  "udp"  },
-  { "piranha2",        { NULL }, 4601,  "tcp"  },
-  { "piranha2",        { NULL }, 4601,  "udp"  },
-  { "mtsserver",       { NULL }, 4602,  "tcp"  },
-  { "menandmice-upg",  { NULL }, 4603,  "tcp"  },
-  { "playsta2-app",    { NULL }, 4658,  "tcp"  },
-  { "playsta2-app",    { NULL }, 4658,  "udp"  },
-  { "playsta2-lob",    { NULL }, 4659,  "tcp"  },
-  { "playsta2-lob",    { NULL }, 4659,  "udp"  },
-  { "smaclmgr",        { NULL }, 4660,  "tcp"  },
-  { "smaclmgr",        { NULL }, 4660,  "udp"  },
-  { "kar2ouche",       { NULL }, 4661,  "tcp"  },
-  { "kar2ouche",       { NULL }, 4661,  "udp"  },
-  { "oms",             { NULL }, 4662,  "tcp"  },
-  { "oms",             { NULL }, 4662,  "udp"  },
-  { "noteit",          { NULL }, 4663,  "tcp"  },
-  { "noteit",          { NULL }, 4663,  "udp"  },
-  { "ems",             { NULL }, 4664,  "tcp"  },
-  { "ems",             { NULL }, 4664,  "udp"  },
-  { "contclientms",    { NULL }, 4665,  "tcp"  },
-  { "contclientms",    { NULL }, 4665,  "udp"  },
-  { "eportcomm",       { NULL }, 4666,  "tcp"  },
-  { "eportcomm",       { NULL }, 4666,  "udp"  },
-  { "mmacomm",         { NULL }, 4667,  "tcp"  },
-  { "mmacomm",         { NULL }, 4667,  "udp"  },
-  { "mmaeds",          { NULL }, 4668,  "tcp"  },
-  { "mmaeds",          { NULL }, 4668,  "udp"  },
-  { "eportcommdata",   { NULL }, 4669,  "tcp"  },
-  { "eportcommdata",   { NULL }, 4669,  "udp"  },
-  { "light",           { NULL }, 4670,  "tcp"  },
-  { "light",           { NULL }, 4670,  "udp"  },
-  { "acter",           { NULL }, 4671,  "tcp"  },
-  { "acter",           { NULL }, 4671,  "udp"  },
-  { "rfa",             { NULL }, 4672,  "tcp"  },
-  { "rfa",             { NULL }, 4672,  "udp"  },
-  { "cxws",            { NULL }, 4673,  "tcp"  },
-  { "cxws",            { NULL }, 4673,  "udp"  },
-  { "appiq-mgmt",      { NULL }, 4674,  "tcp"  },
-  { "appiq-mgmt",      { NULL }, 4674,  "udp"  },
-  { "dhct-status",     { NULL }, 4675,  "tcp"  },
-  { "dhct-status",     { NULL }, 4675,  "udp"  },
-  { "dhct-alerts",     { NULL }, 4676,  "tcp"  },
-  { "dhct-alerts",     { NULL }, 4676,  "udp"  },
-  { "bcs",             { NULL }, 4677,  "tcp"  },
-  { "bcs",             { NULL }, 4677,  "udp"  },
-  { "traversal",       { NULL }, 4678,  "tcp"  },
-  { "traversal",       { NULL }, 4678,  "udp"  },
-  { "mgesupervision",  { NULL }, 4679,  "tcp"  },
-  { "mgesupervision",  { NULL }, 4679,  "udp"  },
-  { "mgemanagement",   { NULL }, 4680,  "tcp"  },
-  { "mgemanagement",   { NULL }, 4680,  "udp"  },
-  { "parliant",        { NULL }, 4681,  "tcp"  },
-  { "parliant",        { NULL }, 4681,  "udp"  },
-  { "finisar",         { NULL }, 4682,  "tcp"  },
-  { "finisar",         { NULL }, 4682,  "udp"  },
-  { "spike",           { NULL }, 4683,  "tcp"  },
-  { "spike",           { NULL }, 4683,  "udp"  },
-  { "rfid-rp1",        { NULL }, 4684,  "tcp"  },
-  { "rfid-rp1",        { NULL }, 4684,  "udp"  },
-  { "autopac",         { NULL }, 4685,  "tcp"  },
-  { "autopac",         { NULL }, 4685,  "udp"  },
-  { "msp-os",          { NULL }, 4686,  "tcp"  },
-  { "msp-os",          { NULL }, 4686,  "udp"  },
-  { "nst",             { NULL }, 4687,  "tcp"  },
-  { "nst",             { NULL }, 4687,  "udp"  },
-  { "mobile-p2p",      { NULL }, 4688,  "tcp"  },
-  { "mobile-p2p",      { NULL }, 4688,  "udp"  },
-  { "altovacentral",   { NULL }, 4689,  "tcp"  },
-  { "altovacentral",   { NULL }, 4689,  "udp"  },
-  { "prelude",         { NULL }, 4690,  "tcp"  },
-  { "prelude",         { NULL }, 4690,  "udp"  },
-  { "mtn",             { NULL }, 4691,  "tcp"  },
-  { "mtn",             { NULL }, 4691,  "udp"  },
-  { "conspiracy",      { NULL }, 4692,  "tcp"  },
-  { "conspiracy",      { NULL }, 4692,  "udp"  },
-  { "netxms-agent",    { NULL }, 4700,  "tcp"  },
-  { "netxms-agent",    { NULL }, 4700,  "udp"  },
-  { "netxms-mgmt",     { NULL }, 4701,  "tcp"  },
-  { "netxms-mgmt",     { NULL }, 4701,  "udp"  },
-  { "netxms-sync",     { NULL }, 4702,  "tcp"  },
-  { "netxms-sync",     { NULL }, 4702,  "udp"  },
-  { "npqes-test",      { NULL }, 4703,  "tcp"  },
-  { "assuria-ins",     { NULL }, 4704,  "tcp"  },
-  { "truckstar",       { NULL }, 4725,  "tcp"  },
-  { "truckstar",       { NULL }, 4725,  "udp"  },
-  { "a26-fap-fgw",     { NULL }, 4726,  "udp"  },
-  { "fcis",            { NULL }, 4727,  "tcp"  },
-  { "fcis-disc",       { NULL }, 4727,  "udp"  },
-  { "capmux",          { NULL }, 4728,  "tcp"  },
-  { "capmux",          { NULL }, 4728,  "udp"  },
-  { "gsmtap",          { NULL }, 4729,  "udp"  },
-  { "gearman",         { NULL }, 4730,  "tcp"  },
-  { "gearman",         { NULL }, 4730,  "udp"  },
-  { "remcap",          { NULL }, 4731,  "tcp"  },
-  { "ohmtrigger",      { NULL }, 4732,  "udp"  },
-  { "resorcs",         { NULL }, 4733,  "tcp"  },
-  { "ipdr-sp",         { NULL }, 4737,  "tcp"  },
-  { "ipdr-sp",         { NULL }, 4737,  "udp"  },
-  { "solera-lpn",      { NULL }, 4738,  "tcp"  },
-  { "solera-lpn",      { NULL }, 4738,  "udp"  },
-  { "ipfix",           { NULL }, 4739,  "tcp"  },
-  { "ipfix",           { NULL }, 4739,  "udp"  },
-  { "ipfix",           { NULL }, 4739,  "sctp" },
-  { "ipfixs",          { NULL }, 4740,  "tcp"  },
-  { "ipfixs",          { NULL }, 4740,  "sctp" },
-  { "ipfixs",          { NULL }, 4740,  "udp"  },
-  { "lumimgrd",        { NULL }, 4741,  "tcp"  },
-  { "lumimgrd",        { NULL }, 4741,  "udp"  },
-  { "sicct",           { NULL }, 4742,  "tcp"  },
-  { "sicct-sdp",       { NULL }, 4742,  "udp"  },
-  { "openhpid",        { NULL }, 4743,  "tcp"  },
-  { "openhpid",        { NULL }, 4743,  "udp"  },
-  { "ifsp",            { NULL }, 4744,  "tcp"  },
-  { "ifsp",            { NULL }, 4744,  "udp"  },
-  { "fmp",             { NULL }, 4745,  "tcp"  },
-  { "fmp",             { NULL }, 4745,  "udp"  },
-  { "profilemac",      { NULL }, 4749,  "tcp"  },
-  { "profilemac",      { NULL }, 4749,  "udp"  },
-  { "ssad",            { NULL }, 4750,  "tcp"  },
-  { "ssad",            { NULL }, 4750,  "udp"  },
-  { "spocp",           { NULL }, 4751,  "tcp"  },
-  { "spocp",           { NULL }, 4751,  "udp"  },
-  { "snap",            { NULL }, 4752,  "tcp"  },
-  { "snap",            { NULL }, 4752,  "udp"  },
-  { "bfd-multi-ctl",   { NULL }, 4784,  "tcp"  },
-  { "bfd-multi-ctl",   { NULL }, 4784,  "udp"  },
-  { "cncp",            { NULL }, 4785,  "udp"  },
-  { "smart-install",   { NULL }, 4786,  "tcp"  },
-  { "sia-ctrl-plane",  { NULL }, 4787,  "tcp"  },
-  { "iims",            { NULL }, 4800,  "tcp"  },
-  { "iims",            { NULL }, 4800,  "udp"  },
-  { "iwec",            { NULL }, 4801,  "tcp"  },
-  { "iwec",            { NULL }, 4801,  "udp"  },
-  { "ilss",            { NULL }, 4802,  "tcp"  },
-  { "ilss",            { NULL }, 4802,  "udp"  },
-  { "notateit",        { NULL }, 4803,  "tcp"  },
-  { "notateit-disc",   { NULL }, 4803,  "udp"  },
-  { "aja-ntv4-disc",   { NULL }, 4804,  "udp"  },
-  { "htcp",            { NULL }, 4827,  "tcp"  },
-  { "htcp",            { NULL }, 4827,  "udp"  },
-  { "varadero-0",      { NULL }, 4837,  "tcp"  },
-  { "varadero-0",      { NULL }, 4837,  "udp"  },
-  { "varadero-1",      { NULL }, 4838,  "tcp"  },
-  { "varadero-1",      { NULL }, 4838,  "udp"  },
-  { "varadero-2",      { NULL }, 4839,  "tcp"  },
-  { "varadero-2",      { NULL }, 4839,  "udp"  },
-  { "opcua-tcp",       { NULL }, 4840,  "tcp"  },
-  { "opcua-udp",       { NULL }, 4840,  "udp"  },
-  { "quosa",           { NULL }, 4841,  "tcp"  },
-  { "quosa",           { NULL }, 4841,  "udp"  },
-  { "gw-asv",          { NULL }, 4842,  "tcp"  },
-  { "gw-asv",          { NULL }, 4842,  "udp"  },
-  { "opcua-tls",       { NULL }, 4843,  "tcp"  },
-  { "opcua-tls",       { NULL }, 4843,  "udp"  },
-  { "gw-log",          { NULL }, 4844,  "tcp"  },
-  { "gw-log",          { NULL }, 4844,  "udp"  },
-  { "wcr-remlib",      { NULL }, 4845,  "tcp"  },
-  { "wcr-remlib",      { NULL }, 4845,  "udp"  },
-  { "contamac_icm",    { NULL }, 4846,  "tcp"  },
-  { "contamac_icm",    { NULL }, 4846,  "udp"  },
-  { "wfc",             { NULL }, 4847,  "tcp"  },
-  { "wfc",             { NULL }, 4847,  "udp"  },
-  { "appserv-http",    { NULL }, 4848,  "tcp"  },
-  { "appserv-http",    { NULL }, 4848,  "udp"  },
-  { "appserv-https",   { NULL }, 4849,  "tcp"  },
-  { "appserv-https",   { NULL }, 4849,  "udp"  },
-  { "sun-as-nodeagt",  { NULL }, 4850,  "tcp"  },
-  { "sun-as-nodeagt",  { NULL }, 4850,  "udp"  },
-  { "derby-repli",     { NULL }, 4851,  "tcp"  },
-  { "derby-repli",     { NULL }, 4851,  "udp"  },
-  { "unify-debug",     { NULL }, 4867,  "tcp"  },
-  { "unify-debug",     { NULL }, 4867,  "udp"  },
-  { "phrelay",         { NULL }, 4868,  "tcp"  },
-  { "phrelay",         { NULL }, 4868,  "udp"  },
-  { "phrelaydbg",      { NULL }, 4869,  "tcp"  },
-  { "phrelaydbg",      { NULL }, 4869,  "udp"  },
-  { "cc-tracking",     { NULL }, 4870,  "tcp"  },
-  { "cc-tracking",     { NULL }, 4870,  "udp"  },
-  { "wired",           { NULL }, 4871,  "tcp"  },
-  { "wired",           { NULL }, 4871,  "udp"  },
-  { "tritium-can",     { NULL }, 4876,  "tcp"  },
-  { "tritium-can",     { NULL }, 4876,  "udp"  },
-  { "lmcs",            { NULL }, 4877,  "tcp"  },
-  { "lmcs",            { NULL }, 4877,  "udp"  },
-  { "inst-discovery",  { NULL }, 4878,  "udp"  },
-  { "wsdl-event",      { NULL }, 4879,  "tcp"  },
-  { "hislip",          { NULL }, 4880,  "tcp"  },
-  { "socp-t",          { NULL }, 4881,  "udp"  },
-  { "socp-c",          { NULL }, 4882,  "udp"  },
-  { "wmlserver",       { NULL }, 4883,  "tcp"  },
-  { "hivestor",        { NULL }, 4884,  "tcp"  },
-  { "hivestor",        { NULL }, 4884,  "udp"  },
-  { "abbs",            { NULL }, 4885,  "tcp"  },
-  { "abbs",            { NULL }, 4885,  "udp"  },
-  { "lyskom",          { NULL }, 4894,  "tcp"  },
-  { "lyskom",          { NULL }, 4894,  "udp"  },
-  { "radmin-port",     { NULL }, 4899,  "tcp"  },
-  { "radmin-port",     { NULL }, 4899,  "udp"  },
-  { "hfcs",            { NULL }, 4900,  "tcp"  },
-  { "hfcs",            { NULL }, 4900,  "udp"  },
-  { "flr_agent",       { NULL }, 4901,  "tcp"  },
-  { "magiccontrol",    { NULL }, 4902,  "tcp"  },
-  { "lutap",           { NULL }, 4912,  "tcp"  },
-  { "lutcp",           { NULL }, 4913,  "tcp"  },
-  { "bones",           { NULL }, 4914,  "tcp"  },
-  { "bones",           { NULL }, 4914,  "udp"  },
-  { "frcs",            { NULL }, 4915,  "tcp"  },
-  { "atsc-mh-ssc",     { NULL }, 4937,  "udp"  },
-  { "eq-office-4940",  { NULL }, 4940,  "tcp"  },
-  { "eq-office-4940",  { NULL }, 4940,  "udp"  },
-  { "eq-office-4941",  { NULL }, 4941,  "tcp"  },
-  { "eq-office-4941",  { NULL }, 4941,  "udp"  },
-  { "eq-office-4942",  { NULL }, 4942,  "tcp"  },
-  { "eq-office-4942",  { NULL }, 4942,  "udp"  },
-  { "munin",           { NULL }, 4949,  "tcp"  },
-  { "munin",           { NULL }, 4949,  "udp"  },
-  { "sybasesrvmon",    { NULL }, 4950,  "tcp"  },
-  { "sybasesrvmon",    { NULL }, 4950,  "udp"  },
-  { "pwgwims",         { NULL }, 4951,  "tcp"  },
-  { "pwgwims",         { NULL }, 4951,  "udp"  },
-  { "sagxtsds",        { NULL }, 4952,  "tcp"  },
-  { "sagxtsds",        { NULL }, 4952,  "udp"  },
-  { "dbsyncarbiter",   { NULL }, 4953,  "tcp"  },
-  { "ccss-qmm",        { NULL }, 4969,  "tcp"  },
-  { "ccss-qmm",        { NULL }, 4969,  "udp"  },
-  { "ccss-qsm",        { NULL }, 4970,  "tcp"  },
-  { "ccss-qsm",        { NULL }, 4970,  "udp"  },
-  { "webyast",         { NULL }, 4984,  "tcp"  },
-  { "gerhcs",          { NULL }, 4985,  "tcp"  },
-  { "mrip",            { NULL }, 4986,  "tcp"  },
-  { "mrip",            { NULL }, 4986,  "udp"  },
-  { "smar-se-port1",   { NULL }, 4987,  "tcp"  },
-  { "smar-se-port1",   { NULL }, 4987,  "udp"  },
-  { "smar-se-port2",   { NULL }, 4988,  "tcp"  },
-  { "smar-se-port2",   { NULL }, 4988,  "udp"  },
-  { "parallel",        { NULL }, 4989,  "tcp"  },
-  { "parallel",        { NULL }, 4989,  "udp"  },
-  { "busycal",         { NULL }, 4990,  "tcp"  },
-  { "busycal",         { NULL }, 4990,  "udp"  },
-  { "vrt",             { NULL }, 4991,  "tcp"  },
-  { "vrt",             { NULL }, 4991,  "udp"  },
-  { "hfcs-manager",    { NULL }, 4999,  "tcp"  },
-  { "hfcs-manager",    { NULL }, 4999,  "udp"  },
-  { "commplex-main",   { NULL }, 5000,  "tcp"  },
-  { "commplex-main",   { NULL }, 5000,  "udp"  },
-  { "commplex-link",   { NULL }, 5001,  "tcp"  },
-  { "commplex-link",   { NULL }, 5001,  "udp"  },
-  { "rfe",             { NULL }, 5002,  "tcp"  },
-  { "rfe",             { NULL }, 5002,  "udp"  },
-  { "fmpro-internal",  { NULL }, 5003,  "tcp"  },
-  { "fmpro-internal",  { NULL }, 5003,  "udp"  },
-  { "avt-profile-1",   { NULL }, 5004,  "tcp"  },
-  { "avt-profile-1",   { NULL }, 5004,  "udp"  },
-  { "avt-profile-1",   { NULL }, 5004,  "dccp" },
-  { "avt-profile-2",   { NULL }, 5005,  "tcp"  },
-  { "avt-profile-2",   { NULL }, 5005,  "udp"  },
-  { "avt-profile-2",   { NULL }, 5005,  "dccp" },
-  { "wsm-server",      { NULL }, 5006,  "tcp"  },
-  { "wsm-server",      { NULL }, 5006,  "udp"  },
-  { "wsm-server-ssl",  { NULL }, 5007,  "tcp"  },
-  { "wsm-server-ssl",  { NULL }, 5007,  "udp"  },
-  { "synapsis-edge",   { NULL }, 5008,  "tcp"  },
-  { "synapsis-edge",   { NULL }, 5008,  "udp"  },
-  { "winfs",           { NULL }, 5009,  "tcp"  },
-  { "winfs",           { NULL }, 5009,  "udp"  },
-  { "telelpathstart",  { NULL }, 5010,  "tcp"  },
-  { "telelpathstart",  { NULL }, 5010,  "udp"  },
-  { "telelpathattack", { NULL }, 5011,  "tcp"  },
-  { "telelpathattack", { NULL }, 5011,  "udp"  },
-  { "nsp",             { NULL }, 5012,  "tcp"  },
-  { "nsp",             { NULL }, 5012,  "udp"  },
-  { "fmpro-v6",        { NULL }, 5013,  "tcp"  },
-  { "fmpro-v6",        { NULL }, 5013,  "udp"  },
-  { "onpsocket",       { NULL }, 5014,  "udp"  },
-  { "fmwp",            { NULL }, 5015,  "tcp"  },
-  { "zenginkyo-1",     { NULL }, 5020,  "tcp"  },
-  { "zenginkyo-1",     { NULL }, 5020,  "udp"  },
-  { "zenginkyo-2",     { NULL }, 5021,  "tcp"  },
-  { "zenginkyo-2",     { NULL }, 5021,  "udp"  },
-  { "mice",            { NULL }, 5022,  "tcp"  },
-  { "mice",            { NULL }, 5022,  "udp"  },
-  { "htuilsrv",        { NULL }, 5023,  "tcp"  },
-  { "htuilsrv",        { NULL }, 5023,  "udp"  },
-  { "scpi-telnet",     { NULL }, 5024,  "tcp"  },
-  { "scpi-telnet",     { NULL }, 5024,  "udp"  },
-  { "scpi-raw",        { NULL }, 5025,  "tcp"  },
-  { "scpi-raw",        { NULL }, 5025,  "udp"  },
-  { "strexec-d",       { NULL }, 5026,  "tcp"  },
-  { "strexec-d",       { NULL }, 5026,  "udp"  },
-  { "strexec-s",       { NULL }, 5027,  "tcp"  },
-  { "strexec-s",       { NULL }, 5027,  "udp"  },
-  { "qvr",             { NULL }, 5028,  "tcp"  },
-  { "infobright",      { NULL }, 5029,  "tcp"  },
-  { "infobright",      { NULL }, 5029,  "udp"  },
-  { "surfpass",        { NULL }, 5030,  "tcp"  },
-  { "surfpass",        { NULL }, 5030,  "udp"  },
-  { "dmp",             { NULL }, 5031,  "udp"  },
-  { "asnaacceler8db",  { NULL }, 5042,  "tcp"  },
-  { "asnaacceler8db",  { NULL }, 5042,  "udp"  },
-  { "swxadmin",        { NULL }, 5043,  "tcp"  },
-  { "swxadmin",        { NULL }, 5043,  "udp"  },
-  { "lxi-evntsvc",     { NULL }, 5044,  "tcp"  },
-  { "lxi-evntsvc",     { NULL }, 5044,  "udp"  },
-  { "osp",             { NULL }, 5045,  "tcp"  },
-  { "vpm-udp",         { NULL }, 5046,  "udp"  },
-  { "iscape",          { NULL }, 5047,  "udp"  },
-  { "texai",           { NULL }, 5048,  "tcp"  },
-  { "ivocalize",       { NULL }, 5049,  "tcp"  },
-  { "ivocalize",       { NULL }, 5049,  "udp"  },
-  { "mmcc",            { NULL }, 5050,  "tcp"  },
-  { "mmcc",            { NULL }, 5050,  "udp"  },
-  { "ita-agent",       { NULL }, 5051,  "tcp"  },
-  { "ita-agent",       { NULL }, 5051,  "udp"  },
-  { "ita-manager",     { NULL }, 5052,  "tcp"  },
-  { "ita-manager",     { NULL }, 5052,  "udp"  },
-  { "rlm",             { NULL }, 5053,  "tcp"  },
-  { "rlm-admin",       { NULL }, 5054,  "tcp"  },
-  { "unot",            { NULL }, 5055,  "tcp"  },
-  { "unot",            { NULL }, 5055,  "udp"  },
-  { "intecom-ps1",     { NULL }, 5056,  "tcp"  },
-  { "intecom-ps1",     { NULL }, 5056,  "udp"  },
-  { "intecom-ps2",     { NULL }, 5057,  "tcp"  },
-  { "intecom-ps2",     { NULL }, 5057,  "udp"  },
-  { "locus-disc",      { NULL }, 5058,  "udp"  },
-  { "sds",             { NULL }, 5059,  "tcp"  },
-  { "sds",             { NULL }, 5059,  "udp"  },
-  { "sip",             { NULL }, 5060,  "tcp"  },
-  { "sip",             { NULL }, 5060,  "udp"  },
-  { "sip-tls",         { NULL }, 5061,  "tcp"  },
-  { "sip-tls",         { NULL }, 5061,  "udp"  },
-  { "na-localise",     { NULL }, 5062,  "tcp"  },
-  { "na-localise",     { NULL }, 5062,  "udp"  },
-  { "csrpc",           { NULL }, 5063,  "tcp"  },
-  { "ca-1",            { NULL }, 5064,  "tcp"  },
-  { "ca-1",            { NULL }, 5064,  "udp"  },
-  { "ca-2",            { NULL }, 5065,  "tcp"  },
-  { "ca-2",            { NULL }, 5065,  "udp"  },
-  { "stanag-5066",     { NULL }, 5066,  "tcp"  },
-  { "stanag-5066",     { NULL }, 5066,  "udp"  },
-  { "authentx",        { NULL }, 5067,  "tcp"  },
-  { "authentx",        { NULL }, 5067,  "udp"  },
-  { "bitforestsrv",    { NULL }, 5068,  "tcp"  },
-  { "i-net-2000-npr",  { NULL }, 5069,  "tcp"  },
-  { "i-net-2000-npr",  { NULL }, 5069,  "udp"  },
-  { "vtsas",           { NULL }, 5070,  "tcp"  },
-  { "vtsas",           { NULL }, 5070,  "udp"  },
-  { "powerschool",     { NULL }, 5071,  "tcp"  },
-  { "powerschool",     { NULL }, 5071,  "udp"  },
-  { "ayiya",           { NULL }, 5072,  "tcp"  },
-  { "ayiya",           { NULL }, 5072,  "udp"  },
-  { "tag-pm",          { NULL }, 5073,  "tcp"  },
-  { "tag-pm",          { NULL }, 5073,  "udp"  },
-  { "alesquery",       { NULL }, 5074,  "tcp"  },
-  { "alesquery",       { NULL }, 5074,  "udp"  },
-  { "cp-spxrpts",      { NULL }, 5079,  "udp"  },
-  { "onscreen",        { NULL }, 5080,  "tcp"  },
-  { "onscreen",        { NULL }, 5080,  "udp"  },
-  { "sdl-ets",         { NULL }, 5081,  "tcp"  },
-  { "sdl-ets",         { NULL }, 5081,  "udp"  },
-  { "qcp",             { NULL }, 5082,  "tcp"  },
-  { "qcp",             { NULL }, 5082,  "udp"  },
-  { "qfp",             { NULL }, 5083,  "tcp"  },
-  { "qfp",             { NULL }, 5083,  "udp"  },
-  { "llrp",            { NULL }, 5084,  "tcp"  },
-  { "llrp",            { NULL }, 5084,  "udp"  },
-  { "encrypted-llrp",  { NULL }, 5085,  "tcp"  },
-  { "encrypted-llrp",  { NULL }, 5085,  "udp"  },
-  { "aprigo-cs",       { NULL }, 5086,  "tcp"  },
-  { "car",             { NULL }, 5090,  "sctp" },
-  { "cxtp",            { NULL }, 5091,  "sctp" },
-  { "magpie",          { NULL }, 5092,  "udp"  },
-  { "sentinel-lm",     { NULL }, 5093,  "tcp"  },
-  { "sentinel-lm",     { NULL }, 5093,  "udp"  },
-  { "hart-ip",         { NULL }, 5094,  "tcp"  },
-  { "hart-ip",         { NULL }, 5094,  "udp"  },
-  { "sentlm-srv2srv",  { NULL }, 5099,  "tcp"  },
-  { "sentlm-srv2srv",  { NULL }, 5099,  "udp"  },
-  { "socalia",         { NULL }, 5100,  "tcp"  },
-  { "socalia",         { NULL }, 5100,  "udp"  },
-  { "talarian-tcp",    { NULL }, 5101,  "tcp"  },
-  { "talarian-udp",    { NULL }, 5101,  "udp"  },
-  { "oms-nonsecure",   { NULL }, 5102,  "tcp"  },
-  { "oms-nonsecure",   { NULL }, 5102,  "udp"  },
-  { "actifio-c2c",     { NULL }, 5103,  "tcp"  },
-  { "tinymessage",     { NULL }, 5104,  "udp"  },
-  { "hughes-ap",       { NULL }, 5105,  "udp"  },
-  { "taep-as-svc",     { NULL }, 5111,  "tcp"  },
-  { "taep-as-svc",     { NULL }, 5111,  "udp"  },
-  { "pm-cmdsvr",       { NULL }, 5112,  "tcp"  },
-  { "pm-cmdsvr",       { NULL }, 5112,  "udp"  },
-  { "ev-services",     { NULL }, 5114,  "tcp"  },
-  { "autobuild",       { NULL }, 5115,  "tcp"  },
-  { "emb-proj-cmd",    { NULL }, 5116,  "udp"  },
-  { "gradecam",        { NULL }, 5117,  "tcp"  },
-  { "nbt-pc",          { NULL }, 5133,  "tcp"  },
-  { "nbt-pc",          { NULL }, 5133,  "udp"  },
-  { "ppactivation",    { NULL }, 5134,  "tcp"  },
-  { "erp-scale",       { NULL }, 5135,  "tcp"  },
-  { "minotaur-sa",     { NULL }, 5136,  "udp"  },
-  { "ctsd",            { NULL }, 5137,  "tcp"  },
-  { "ctsd",            { NULL }, 5137,  "udp"  },
-  { "rmonitor_secure", { NULL }, 5145,  "tcp"  },
-  { "rmonitor_secure", { NULL }, 5145,  "udp"  },
-  { "social-alarm",    { NULL }, 5146,  "tcp"  },
-  { "atmp",            { NULL }, 5150,  "tcp"  },
-  { "atmp",            { NULL }, 5150,  "udp"  },
-  { "esri_sde",        { NULL }, 5151,  "tcp"  },
-  { "esri_sde",        { NULL }, 5151,  "udp"  },
-  { "sde-discovery",   { NULL }, 5152,  "tcp"  },
-  { "sde-discovery",   { NULL }, 5152,  "udp"  },
-  { "toruxserver",     { NULL }, 5153,  "tcp"  },
-  { "bzflag",          { NULL }, 5154,  "tcp"  },
-  { "bzflag",          { NULL }, 5154,  "udp"  },
-  { "asctrl-agent",    { NULL }, 5155,  "tcp"  },
-  { "asctrl-agent",    { NULL }, 5155,  "udp"  },
-  { "rugameonline",    { NULL }, 5156,  "tcp"  },
-  { "mediat",          { NULL }, 5157,  "tcp"  },
-  { "snmpssh",         { NULL }, 5161,  "tcp"  },
-  { "snmpssh-trap",    { NULL }, 5162,  "tcp"  },
-  { "sbackup",         { NULL }, 5163,  "tcp"  },
-  { "vpa",             { NULL }, 5164,  "tcp"  },
-  { "vpa-disc",        { NULL }, 5164,  "udp"  },
-  { "ife_icorp",       { NULL }, 5165,  "tcp"  },
-  { "ife_icorp",       { NULL }, 5165,  "udp"  },
-  { "winpcs",          { NULL }, 5166,  "tcp"  },
-  { "winpcs",          { NULL }, 5166,  "udp"  },
-  { "scte104",         { NULL }, 5167,  "tcp"  },
-  { "scte104",         { NULL }, 5167,  "udp"  },
-  { "scte30",          { NULL }, 5168,  "tcp"  },
-  { "scte30",          { NULL }, 5168,  "udp"  },
-  { "aol",             { NULL }, 5190,  "tcp"  },
-  { "aol",             { NULL }, 5190,  "udp"  },
-  { "aol-1",           { NULL }, 5191,  "tcp"  },
-  { "aol-1",           { NULL }, 5191,  "udp"  },
-  { "aol-2",           { NULL }, 5192,  "tcp"  },
-  { "aol-2",           { NULL }, 5192,  "udp"  },
-  { "aol-3",           { NULL }, 5193,  "tcp"  },
-  { "aol-3",           { NULL }, 5193,  "udp"  },
-  { "cpscomm",         { NULL }, 5194,  "tcp"  },
-  { "targus-getdata",  { NULL }, 5200,  "tcp"  },
-  { "targus-getdata",  { NULL }, 5200,  "udp"  },
-  { "targus-getdata1", { NULL }, 5201,  "tcp"  },
-  { "targus-getdata1", { NULL }, 5201,  "udp"  },
-  { "targus-getdata2", { NULL }, 5202,  "tcp"  },
-  { "targus-getdata2", { NULL }, 5202,  "udp"  },
-  { "targus-getdata3", { NULL }, 5203,  "tcp"  },
-  { "targus-getdata3", { NULL }, 5203,  "udp"  },
-  { "3exmp",           { NULL }, 5221,  "tcp"  },
-  { "xmpp-client",     { NULL }, 5222,  "tcp"  },
-  { "hpvirtgrp",       { NULL }, 5223,  "tcp"  },
-  { "hpvirtgrp",       { NULL }, 5223,  "udp"  },
-  { "hpvirtctrl",      { NULL }, 5224,  "tcp"  },
-  { "hpvirtctrl",      { NULL }, 5224,  "udp"  },
-  { "hp-server",       { NULL }, 5225,  "tcp"  },
-  { "hp-server",       { NULL }, 5225,  "udp"  },
-  { "hp-status",       { NULL }, 5226,  "tcp"  },
-  { "hp-status",       { NULL }, 5226,  "udp"  },
-  { "perfd",           { NULL }, 5227,  "tcp"  },
-  { "perfd",           { NULL }, 5227,  "udp"  },
-  { "hpvroom",         { NULL }, 5228,  "tcp"  },
-  { "csedaemon",       { NULL }, 5232,  "tcp"  },
-  { "enfs",            { NULL }, 5233,  "tcp"  },
-  { "eenet",           { NULL }, 5234,  "tcp"  },
-  { "eenet",           { NULL }, 5234,  "udp"  },
-  { "galaxy-network",  { NULL }, 5235,  "tcp"  },
-  { "galaxy-network",  { NULL }, 5235,  "udp"  },
-  { "padl2sim",        { NULL }, 5236,  "tcp"  },
-  { "padl2sim",        { NULL }, 5236,  "udp"  },
-  { "mnet-discovery",  { NULL }, 5237,  "tcp"  },
-  { "mnet-discovery",  { NULL }, 5237,  "udp"  },
-  { "downtools",       { NULL }, 5245,  "tcp"  },
-  { "downtools-disc",  { NULL }, 5245,  "udp"  },
-  { "capwap-control",  { NULL }, 5246,  "udp"  },
-  { "capwap-data",     { NULL }, 5247,  "udp"  },
-  { "caacws",          { NULL }, 5248,  "tcp"  },
-  { "caacws",          { NULL }, 5248,  "udp"  },
-  { "caaclang2",       { NULL }, 5249,  "tcp"  },
-  { "caaclang2",       { NULL }, 5249,  "udp"  },
-  { "soagateway",      { NULL }, 5250,  "tcp"  },
-  { "soagateway",      { NULL }, 5250,  "udp"  },
-  { "caevms",          { NULL }, 5251,  "tcp"  },
-  { "caevms",          { NULL }, 5251,  "udp"  },
-  { "movaz-ssc",       { NULL }, 5252,  "tcp"  },
-  { "movaz-ssc",       { NULL }, 5252,  "udp"  },
-  { "kpdp",            { NULL }, 5253,  "tcp"  },
-  { "3com-njack-1",    { NULL }, 5264,  "tcp"  },
-  { "3com-njack-1",    { NULL }, 5264,  "udp"  },
-  { "3com-njack-2",    { NULL }, 5265,  "tcp"  },
-  { "3com-njack-2",    { NULL }, 5265,  "udp"  },
-  { "xmpp-server",     { NULL }, 5269,  "tcp"  },
-  { "xmp",             { NULL }, 5270,  "tcp"  },
-  { "xmp",             { NULL }, 5270,  "udp"  },
-  { "cuelink",         { NULL }, 5271,  "tcp"  },
-  { "cuelink-disc",    { NULL }, 5271,  "udp"  },
-  { "pk",              { NULL }, 5272,  "tcp"  },
-  { "pk",              { NULL }, 5272,  "udp"  },
-  { "xmpp-bosh",       { NULL }, 5280,  "tcp"  },
-  { "undo-lm",         { NULL }, 5281,  "tcp"  },
-  { "transmit-port",   { NULL }, 5282,  "tcp"  },
-  { "transmit-port",   { NULL }, 5282,  "udp"  },
-  { "presence",        { NULL }, 5298,  "tcp"  },
-  { "presence",        { NULL }, 5298,  "udp"  },
-  { "nlg-data",        { NULL }, 5299,  "tcp"  },
-  { "nlg-data",        { NULL }, 5299,  "udp"  },
-  { "hacl-hb",         { NULL }, 5300,  "tcp"  },
-  { "hacl-hb",         { NULL }, 5300,  "udp"  },
-  { "hacl-gs",         { NULL }, 5301,  "tcp"  },
-  { "hacl-gs",         { NULL }, 5301,  "udp"  },
-  { "hacl-cfg",        { NULL }, 5302,  "tcp"  },
-  { "hacl-cfg",        { NULL }, 5302,  "udp"  },
-  { "hacl-probe",      { NULL }, 5303,  "tcp"  },
-  { "hacl-probe",      { NULL }, 5303,  "udp"  },
-  { "hacl-local",      { NULL }, 5304,  "tcp"  },
-  { "hacl-local",      { NULL }, 5304,  "udp"  },
-  { "hacl-test",       { NULL }, 5305,  "tcp"  },
-  { "hacl-test",       { NULL }, 5305,  "udp"  },
-  { "sun-mc-grp",      { NULL }, 5306,  "tcp"  },
-  { "sun-mc-grp",      { NULL }, 5306,  "udp"  },
-  { "sco-aip",         { NULL }, 5307,  "tcp"  },
-  { "sco-aip",         { NULL }, 5307,  "udp"  },
-  { "cfengine",        { NULL }, 5308,  "tcp"  },
-  { "cfengine",        { NULL }, 5308,  "udp"  },
-  { "jprinter",        { NULL }, 5309,  "tcp"  },
-  { "jprinter",        { NULL }, 5309,  "udp"  },
-  { "outlaws",         { NULL }, 5310,  "tcp"  },
-  { "outlaws",         { NULL }, 5310,  "udp"  },
-  { "permabit-cs",     { NULL }, 5312,  "tcp"  },
-  { "permabit-cs",     { NULL }, 5312,  "udp"  },
-  { "rrdp",            { NULL }, 5313,  "tcp"  },
-  { "rrdp",            { NULL }, 5313,  "udp"  },
-  { "opalis-rbt-ipc",  { NULL }, 5314,  "tcp"  },
-  { "opalis-rbt-ipc",  { NULL }, 5314,  "udp"  },
-  { "hacl-poll",       { NULL }, 5315,  "tcp"  },
-  { "hacl-poll",       { NULL }, 5315,  "udp"  },
-  { "hpdevms",         { NULL }, 5316,  "tcp"  },
-  { "hpdevms",         { NULL }, 5316,  "udp"  },
-  { "bsfserver-zn",    { NULL }, 5320,  "tcp"  },
-  { "bsfsvr-zn-ssl",   { NULL }, 5321,  "tcp"  },
-  { "kfserver",        { NULL }, 5343,  "tcp"  },
-  { "kfserver",        { NULL }, 5343,  "udp"  },
-  { "xkotodrcp",       { NULL }, 5344,  "tcp"  },
-  { "xkotodrcp",       { NULL }, 5344,  "udp"  },
-  { "stuns",           { NULL }, 5349,  "tcp"  },
-  { "stuns",           { NULL }, 5349,  "udp"  },
-  { "turns",           { NULL }, 5349,  "tcp"  },
-  { "turns",           { NULL }, 5349,  "udp"  },
-  { "stun-behaviors",  { NULL }, 5349,  "tcp"  },
-  { "stun-behaviors",  { NULL }, 5349,  "udp"  },
-  { "nat-pmp-status",  { NULL }, 5350,  "tcp"  },
-  { "nat-pmp-status",  { NULL }, 5350,  "udp"  },
-  { "nat-pmp",         { NULL }, 5351,  "tcp"  },
-  { "nat-pmp",         { NULL }, 5351,  "udp"  },
-  { "dns-llq",         { NULL }, 5352,  "tcp"  },
-  { "dns-llq",         { NULL }, 5352,  "udp"  },
-  { "mdns",            { NULL }, 5353,  "tcp"  },
-  { "mdns",            { NULL }, 5353,  "udp"  },
-  { "mdnsresponder",   { NULL }, 5354,  "tcp"  },
-  { "mdnsresponder",   { NULL }, 5354,  "udp"  },
-  { "llmnr",           { NULL }, 5355,  "tcp"  },
-  { "llmnr",           { NULL }, 5355,  "udp"  },
-  { "ms-smlbiz",       { NULL }, 5356,  "tcp"  },
-  { "ms-smlbiz",       { NULL }, 5356,  "udp"  },
-  { "wsdapi",          { NULL }, 5357,  "tcp"  },
-  { "wsdapi",          { NULL }, 5357,  "udp"  },
-  { "wsdapi-s",        { NULL }, 5358,  "tcp"  },
-  { "wsdapi-s",        { NULL }, 5358,  "udp"  },
-  { "ms-alerter",      { NULL }, 5359,  "tcp"  },
-  { "ms-alerter",      { NULL }, 5359,  "udp"  },
-  { "ms-sideshow",     { NULL }, 5360,  "tcp"  },
-  { "ms-sideshow",     { NULL }, 5360,  "udp"  },
-  { "ms-s-sideshow",   { NULL }, 5361,  "tcp"  },
-  { "ms-s-sideshow",   { NULL }, 5361,  "udp"  },
-  { "serverwsd2",      { NULL }, 5362,  "tcp"  },
-  { "serverwsd2",      { NULL }, 5362,  "udp"  },
-  { "net-projection",  { NULL }, 5363,  "tcp"  },
-  { "net-projection",  { NULL }, 5363,  "udp"  },
-  { "stresstester",    { NULL }, 5397,  "tcp"  },
-  { "stresstester",    { NULL }, 5397,  "udp"  },
-  { "elektron-admin",  { NULL }, 5398,  "tcp"  },
-  { "elektron-admin",  { NULL }, 5398,  "udp"  },
-  { "securitychase",   { NULL }, 5399,  "tcp"  },
-  { "securitychase",   { NULL }, 5399,  "udp"  },
-  { "excerpt",         { NULL }, 5400,  "tcp"  },
-  { "excerpt",         { NULL }, 5400,  "udp"  },
-  { "excerpts",        { NULL }, 5401,  "tcp"  },
-  { "excerpts",        { NULL }, 5401,  "udp"  },
-  { "mftp",            { NULL }, 5402,  "tcp"  },
-  { "mftp",            { NULL }, 5402,  "udp"  },
-  { "hpoms-ci-lstn",   { NULL }, 5403,  "tcp"  },
-  { "hpoms-ci-lstn",   { NULL }, 5403,  "udp"  },
-  { "hpoms-dps-lstn",  { NULL }, 5404,  "tcp"  },
-  { "hpoms-dps-lstn",  { NULL }, 5404,  "udp"  },
-  { "netsupport",      { NULL }, 5405,  "tcp"  },
-  { "netsupport",      { NULL }, 5405,  "udp"  },
-  { "systemics-sox",   { NULL }, 5406,  "tcp"  },
-  { "systemics-sox",   { NULL }, 5406,  "udp"  },
-  { "foresyte-clear",  { NULL }, 5407,  "tcp"  },
-  { "foresyte-clear",  { NULL }, 5407,  "udp"  },
-  { "foresyte-sec",    { NULL }, 5408,  "tcp"  },
-  { "foresyte-sec",    { NULL }, 5408,  "udp"  },
-  { "salient-dtasrv",  { NULL }, 5409,  "tcp"  },
-  { "salient-dtasrv",  { NULL }, 5409,  "udp"  },
-  { "salient-usrmgr",  { NULL }, 5410,  "tcp"  },
-  { "salient-usrmgr",  { NULL }, 5410,  "udp"  },
-  { "actnet",          { NULL }, 5411,  "tcp"  },
-  { "actnet",          { NULL }, 5411,  "udp"  },
-  { "continuus",       { NULL }, 5412,  "tcp"  },
-  { "continuus",       { NULL }, 5412,  "udp"  },
-  { "wwiotalk",        { NULL }, 5413,  "tcp"  },
-  { "wwiotalk",        { NULL }, 5413,  "udp"  },
-  { "statusd",         { NULL }, 5414,  "tcp"  },
-  { "statusd",         { NULL }, 5414,  "udp"  },
-  { "ns-server",       { NULL }, 5415,  "tcp"  },
-  { "ns-server",       { NULL }, 5415,  "udp"  },
-  { "sns-gateway",     { NULL }, 5416,  "tcp"  },
-  { "sns-gateway",     { NULL }, 5416,  "udp"  },
-  { "sns-agent",       { NULL }, 5417,  "tcp"  },
-  { "sns-agent",       { NULL }, 5417,  "udp"  },
-  { "mcntp",           { NULL }, 5418,  "tcp"  },
-  { "mcntp",           { NULL }, 5418,  "udp"  },
-  { "dj-ice",          { NULL }, 5419,  "tcp"  },
-  { "dj-ice",          { NULL }, 5419,  "udp"  },
-  { "cylink-c",        { NULL }, 5420,  "tcp"  },
-  { "cylink-c",        { NULL }, 5420,  "udp"  },
-  { "netsupport2",     { NULL }, 5421,  "tcp"  },
-  { "netsupport2",     { NULL }, 5421,  "udp"  },
-  { "salient-mux",     { NULL }, 5422,  "tcp"  },
-  { "salient-mux",     { NULL }, 5422,  "udp"  },
-  { "virtualuser",     { NULL }, 5423,  "tcp"  },
-  { "virtualuser",     { NULL }, 5423,  "udp"  },
-  { "beyond-remote",   { NULL }, 5424,  "tcp"  },
-  { "beyond-remote",   { NULL }, 5424,  "udp"  },
-  { "br-channel",      { NULL }, 5425,  "tcp"  },
-  { "br-channel",      { NULL }, 5425,  "udp"  },
-  { "devbasic",        { NULL }, 5426,  "tcp"  },
-  { "devbasic",        { NULL }, 5426,  "udp"  },
-  { "sco-peer-tta",    { NULL }, 5427,  "tcp"  },
-  { "sco-peer-tta",    { NULL }, 5427,  "udp"  },
-  { "telaconsole",     { NULL }, 5428,  "tcp"  },
-  { "telaconsole",     { NULL }, 5428,  "udp"  },
-  { "base",            { NULL }, 5429,  "tcp"  },
-  { "base",            { NULL }, 5429,  "udp"  },
-  { "radec-corp",      { NULL }, 5430,  "tcp"  },
-  { "radec-corp",      { NULL }, 5430,  "udp"  },
-  { "park-agent",      { NULL }, 5431,  "tcp"  },
-  { "park-agent",      { NULL }, 5431,  "udp"  },
-  { "postgresql",      { NULL }, 5432,  "tcp"  },
-  { "postgresql",      { NULL }, 5432,  "udp"  },
-  { "pyrrho",          { NULL }, 5433,  "tcp"  },
-  { "pyrrho",          { NULL }, 5433,  "udp"  },
-  { "sgi-arrayd",      { NULL }, 5434,  "tcp"  },
-  { "sgi-arrayd",      { NULL }, 5434,  "udp"  },
-  { "sceanics",        { NULL }, 5435,  "tcp"  },
-  { "sceanics",        { NULL }, 5435,  "udp"  },
-  { "pmip6-cntl",      { NULL }, 5436,  "udp"  },
-  { "pmip6-data",      { NULL }, 5437,  "udp"  },
-  { "spss",            { NULL }, 5443,  "tcp"  },
-  { "spss",            { NULL }, 5443,  "udp"  },
-  { "surebox",         { NULL }, 5453,  "tcp"  },
-  { "surebox",         { NULL }, 5453,  "udp"  },
-  { "apc-5454",        { NULL }, 5454,  "tcp"  },
-  { "apc-5454",        { NULL }, 5454,  "udp"  },
-  { "apc-5455",        { NULL }, 5455,  "tcp"  },
-  { "apc-5455",        { NULL }, 5455,  "udp"  },
-  { "apc-5456",        { NULL }, 5456,  "tcp"  },
-  { "apc-5456",        { NULL }, 5456,  "udp"  },
-  { "silkmeter",       { NULL }, 5461,  "tcp"  },
-  { "silkmeter",       { NULL }, 5461,  "udp"  },
-  { "ttl-publisher",   { NULL }, 5462,  "tcp"  },
-  { "ttl-publisher",   { NULL }, 5462,  "udp"  },
-  { "ttlpriceproxy",   { NULL }, 5463,  "tcp"  },
-  { "ttlpriceproxy",   { NULL }, 5463,  "udp"  },
-  { "quailnet",        { NULL }, 5464,  "tcp"  },
-  { "quailnet",        { NULL }, 5464,  "udp"  },
-  { "netops-broker",   { NULL }, 5465,  "tcp"  },
-  { "netops-broker",   { NULL }, 5465,  "udp"  },
-  { "fcp-addr-srvr1",  { NULL }, 5500,  "tcp"  },
-  { "fcp-addr-srvr1",  { NULL }, 5500,  "udp"  },
-  { "fcp-addr-srvr2",  { NULL }, 5501,  "tcp"  },
-  { "fcp-addr-srvr2",  { NULL }, 5501,  "udp"  },
-  { "fcp-srvr-inst1",  { NULL }, 5502,  "tcp"  },
-  { "fcp-srvr-inst1",  { NULL }, 5502,  "udp"  },
-  { "fcp-srvr-inst2",  { NULL }, 5503,  "tcp"  },
-  { "fcp-srvr-inst2",  { NULL }, 5503,  "udp"  },
-  { "fcp-cics-gw1",    { NULL }, 5504,  "tcp"  },
-  { "fcp-cics-gw1",    { NULL }, 5504,  "udp"  },
-  { "checkoutdb",      { NULL }, 5505,  "tcp"  },
-  { "checkoutdb",      { NULL }, 5505,  "udp"  },
-  { "amc",             { NULL }, 5506,  "tcp"  },
-  { "amc",             { NULL }, 5506,  "udp"  },
-  { "sgi-eventmond",   { NULL }, 5553,  "tcp"  },
-  { "sgi-eventmond",   { NULL }, 5553,  "udp"  },
-  { "sgi-esphttp",     { NULL }, 5554,  "tcp"  },
-  { "sgi-esphttp",     { NULL }, 5554,  "udp"  },
-  { "personal-agent",  { NULL }, 5555,  "tcp"  },
-  { "personal-agent",  { NULL }, 5555,  "udp"  },
-  { "freeciv",         { NULL }, 5556,  "tcp"  },
-  { "freeciv",         { NULL }, 5556,  "udp"  },
-  { "farenet",         { NULL }, 5557,  "tcp"  },
-  { "westec-connect",  { NULL }, 5566,  "tcp"  },
-  { "m-oap",           { NULL }, 5567,  "tcp"  },
-  { "m-oap",           { NULL }, 5567,  "udp"  },
-  { "sdt",             { NULL }, 5568,  "tcp"  },
-  { "sdt",             { NULL }, 5568,  "udp"  },
-  { "sdmmp",           { NULL }, 5573,  "tcp"  },
-  { "sdmmp",           { NULL }, 5573,  "udp"  },
-  { "lsi-bobcat",      { NULL }, 5574,  "tcp"  },
-  { "ora-oap",         { NULL }, 5575,  "tcp"  },
-  { "fdtracks",        { NULL }, 5579,  "tcp"  },
-  { "tmosms0",         { NULL }, 5580,  "tcp"  },
-  { "tmosms0",         { NULL }, 5580,  "udp"  },
-  { "tmosms1",         { NULL }, 5581,  "tcp"  },
-  { "tmosms1",         { NULL }, 5581,  "udp"  },
-  { "fac-restore",     { NULL }, 5582,  "tcp"  },
-  { "fac-restore",     { NULL }, 5582,  "udp"  },
-  { "tmo-icon-sync",   { NULL }, 5583,  "tcp"  },
-  { "tmo-icon-sync",   { NULL }, 5583,  "udp"  },
-  { "bis-web",         { NULL }, 5584,  "tcp"  },
-  { "bis-web",         { NULL }, 5584,  "udp"  },
-  { "bis-sync",        { NULL }, 5585,  "tcp"  },
-  { "bis-sync",        { NULL }, 5585,  "udp"  },
-  { "ininmessaging",   { NULL }, 5597,  "tcp"  },
-  { "ininmessaging",   { NULL }, 5597,  "udp"  },
-  { "mctfeed",         { NULL }, 5598,  "tcp"  },
-  { "mctfeed",         { NULL }, 5598,  "udp"  },
-  { "esinstall",       { NULL }, 5599,  "tcp"  },
-  { "esinstall",       { NULL }, 5599,  "udp"  },
-  { "esmmanager",      { NULL }, 5600,  "tcp"  },
-  { "esmmanager",      { NULL }, 5600,  "udp"  },
-  { "esmagent",        { NULL }, 5601,  "tcp"  },
-  { "esmagent",        { NULL }, 5601,  "udp"  },
-  { "a1-msc",          { NULL }, 5602,  "tcp"  },
-  { "a1-msc",          { NULL }, 5602,  "udp"  },
-  { "a1-bs",           { NULL }, 5603,  "tcp"  },
-  { "a1-bs",           { NULL }, 5603,  "udp"  },
-  { "a3-sdunode",      { NULL }, 5604,  "tcp"  },
-  { "a3-sdunode",      { NULL }, 5604,  "udp"  },
-  { "a4-sdunode",      { NULL }, 5605,  "tcp"  },
-  { "a4-sdunode",      { NULL }, 5605,  "udp"  },
-  { "ninaf",           { NULL }, 5627,  "tcp"  },
-  { "ninaf",           { NULL }, 5627,  "udp"  },
-  { "htrust",          { NULL }, 5628,  "tcp"  },
-  { "htrust",          { NULL }, 5628,  "udp"  },
-  { "symantec-sfdb",   { NULL }, 5629,  "tcp"  },
-  { "symantec-sfdb",   { NULL }, 5629,  "udp"  },
-  { "precise-comm",    { NULL }, 5630,  "tcp"  },
-  { "precise-comm",    { NULL }, 5630,  "udp"  },
-  { "pcanywheredata",  { NULL }, 5631,  "tcp"  },
-  { "pcanywheredata",  { NULL }, 5631,  "udp"  },
-  { "pcanywherestat",  { NULL }, 5632,  "tcp"  },
-  { "pcanywherestat",  { NULL }, 5632,  "udp"  },
-  { "beorl",           { NULL }, 5633,  "tcp"  },
-  { "beorl",           { NULL }, 5633,  "udp"  },
-  { "xprtld",          { NULL }, 5634,  "tcp"  },
-  { "xprtld",          { NULL }, 5634,  "udp"  },
-  { "sfmsso",          { NULL }, 5635,  "tcp"  },
-  { "sfm-db-server",   { NULL }, 5636,  "tcp"  },
-  { "cssc",            { NULL }, 5637,  "tcp"  },
-  { "amqps",           { NULL }, 5671,  "tcp"  },
-  { "amqps",           { NULL }, 5671,  "udp"  },
-  { "amqp",            { NULL }, 5672,  "tcp"  },
-  { "amqp",            { NULL }, 5672,  "udp"  },
-  { "amqp",            { NULL }, 5672,  "sctp" },
-  { "jms",             { NULL }, 5673,  "tcp"  },
-  { "jms",             { NULL }, 5673,  "udp"  },
-  { "hyperscsi-port",  { NULL }, 5674,  "tcp"  },
-  { "hyperscsi-port",  { NULL }, 5674,  "udp"  },
-  { "v5ua",            { NULL }, 5675,  "tcp"  },
-  { "v5ua",            { NULL }, 5675,  "udp"  },
-  { "v5ua",            { NULL }, 5675,  "sctp" },
-  { "raadmin",         { NULL }, 5676,  "tcp"  },
-  { "raadmin",         { NULL }, 5676,  "udp"  },
-  { "questdb2-lnchr",  { NULL }, 5677,  "tcp"  },
-  { "questdb2-lnchr",  { NULL }, 5677,  "udp"  },
-  { "rrac",            { NULL }, 5678,  "tcp"  },
-  { "rrac",            { NULL }, 5678,  "udp"  },
-  { "dccm",            { NULL }, 5679,  "tcp"  },
-  { "dccm",            { NULL }, 5679,  "udp"  },
-  { "auriga-router",   { NULL }, 5680,  "tcp"  },
-  { "auriga-router",   { NULL }, 5680,  "udp"  },
-  { "ncxcp",           { NULL }, 5681,  "tcp"  },
-  { "ncxcp",           { NULL }, 5681,  "udp"  },
-  { "brightcore",      { NULL }, 5682,  "udp"  },
-  { "ggz",             { NULL }, 5688,  "tcp"  },
-  { "ggz",             { NULL }, 5688,  "udp"  },
-  { "qmvideo",         { NULL }, 5689,  "tcp"  },
-  { "qmvideo",         { NULL }, 5689,  "udp"  },
-  { "proshareaudio",   { NULL }, 5713,  "tcp"  },
-  { "proshareaudio",   { NULL }, 5713,  "udp"  },
-  { "prosharevideo",   { NULL }, 5714,  "tcp"  },
-  { "prosharevideo",   { NULL }, 5714,  "udp"  },
-  { "prosharedata",    { NULL }, 5715,  "tcp"  },
-  { "prosharedata",    { NULL }, 5715,  "udp"  },
-  { "prosharerequest", { NULL }, 5716,  "tcp"  },
-  { "prosharerequest", { NULL }, 5716,  "udp"  },
-  { "prosharenotify",  { NULL }, 5717,  "tcp"  },
-  { "prosharenotify",  { NULL }, 5717,  "udp"  },
-  { "dpm",             { NULL }, 5718,  "tcp"  },
-  { "dpm",             { NULL }, 5718,  "udp"  },
-  { "dpm-agent",       { NULL }, 5719,  "tcp"  },
-  { "dpm-agent",       { NULL }, 5719,  "udp"  },
-  { "ms-licensing",    { NULL }, 5720,  "tcp"  },
-  { "ms-licensing",    { NULL }, 5720,  "udp"  },
-  { "dtpt",            { NULL }, 5721,  "tcp"  },
-  { "dtpt",            { NULL }, 5721,  "udp"  },
-  { "msdfsr",          { NULL }, 5722,  "tcp"  },
-  { "msdfsr",          { NULL }, 5722,  "udp"  },
-  { "omhs",            { NULL }, 5723,  "tcp"  },
-  { "omhs",            { NULL }, 5723,  "udp"  },
-  { "omsdk",           { NULL }, 5724,  "tcp"  },
-  { "omsdk",           { NULL }, 5724,  "udp"  },
-  { "ms-ilm",          { NULL }, 5725,  "tcp"  },
-  { "ms-ilm-sts",      { NULL }, 5726,  "tcp"  },
-  { "asgenf",          { NULL }, 5727,  "tcp"  },
-  { "io-dist-data",    { NULL }, 5728,  "tcp"  },
-  { "io-dist-group",   { NULL }, 5728,  "udp"  },
-  { "openmail",        { NULL }, 5729,  "tcp"  },
-  { "openmail",        { NULL }, 5729,  "udp"  },
-  { "unieng",          { NULL }, 5730,  "tcp"  },
-  { "unieng",          { NULL }, 5730,  "udp"  },
-  { "ida-discover1",   { NULL }, 5741,  "tcp"  },
-  { "ida-discover1",   { NULL }, 5741,  "udp"  },
-  { "ida-discover2",   { NULL }, 5742,  "tcp"  },
-  { "ida-discover2",   { NULL }, 5742,  "udp"  },
-  { "watchdoc-pod",    { NULL }, 5743,  "tcp"  },
-  { "watchdoc-pod",    { NULL }, 5743,  "udp"  },
-  { "watchdoc",        { NULL }, 5744,  "tcp"  },
-  { "watchdoc",        { NULL }, 5744,  "udp"  },
-  { "fcopy-server",    { NULL }, 5745,  "tcp"  },
-  { "fcopy-server",    { NULL }, 5745,  "udp"  },
-  { "fcopys-server",   { NULL }, 5746,  "tcp"  },
-  { "fcopys-server",   { NULL }, 5746,  "udp"  },
-  { "tunatic",         { NULL }, 5747,  "tcp"  },
-  { "tunatic",         { NULL }, 5747,  "udp"  },
-  { "tunalyzer",       { NULL }, 5748,  "tcp"  },
-  { "tunalyzer",       { NULL }, 5748,  "udp"  },
-  { "rscd",            { NULL }, 5750,  "tcp"  },
-  { "rscd",            { NULL }, 5750,  "udp"  },
-  { "openmailg",       { NULL }, 5755,  "tcp"  },
-  { "openmailg",       { NULL }, 5755,  "udp"  },
-  { "x500ms",          { NULL }, 5757,  "tcp"  },
-  { "x500ms",          { NULL }, 5757,  "udp"  },
-  { "openmailns",      { NULL }, 5766,  "tcp"  },
-  { "openmailns",      { NULL }, 5766,  "udp"  },
-  { "s-openmail",      { NULL }, 5767,  "tcp"  },
-  { "s-openmail",      { NULL }, 5767,  "udp"  },
-  { "openmailpxy",     { NULL }, 5768,  "tcp"  },
-  { "openmailpxy",     { NULL }, 5768,  "udp"  },
-  { "spramsca",        { NULL }, 5769,  "tcp"  },
-  { "spramsca",        { NULL }, 5769,  "udp"  },
-  { "spramsd",         { NULL }, 5770,  "tcp"  },
-  { "spramsd",         { NULL }, 5770,  "udp"  },
-  { "netagent",        { NULL }, 5771,  "tcp"  },
-  { "netagent",        { NULL }, 5771,  "udp"  },
-  { "dali-port",       { NULL }, 5777,  "tcp"  },
-  { "dali-port",       { NULL }, 5777,  "udp"  },
-  { "vts-rpc",         { NULL }, 5780,  "tcp"  },
-  { "3par-evts",       { NULL }, 5781,  "tcp"  },
-  { "3par-evts",       { NULL }, 5781,  "udp"  },
-  { "3par-mgmt",       { NULL }, 5782,  "tcp"  },
-  { "3par-mgmt",       { NULL }, 5782,  "udp"  },
-  { "3par-mgmt-ssl",   { NULL }, 5783,  "tcp"  },
-  { "3par-mgmt-ssl",   { NULL }, 5783,  "udp"  },
-  { "ibar",            { NULL }, 5784,  "udp"  },
-  { "3par-rcopy",      { NULL }, 5785,  "tcp"  },
-  { "3par-rcopy",      { NULL }, 5785,  "udp"  },
-  { "cisco-redu",      { NULL }, 5786,  "udp"  },
-  { "waascluster",     { NULL }, 5787,  "udp"  },
-  { "xtreamx",         { NULL }, 5793,  "tcp"  },
-  { "xtreamx",         { NULL }, 5793,  "udp"  },
-  { "spdp",            { NULL }, 5794,  "udp"  },
-  { "icmpd",           { NULL }, 5813,  "tcp"  },
-  { "icmpd",           { NULL }, 5813,  "udp"  },
-  { "spt-automation",  { NULL }, 5814,  "tcp"  },
-  { "spt-automation",  { NULL }, 5814,  "udp"  },
-  { "wherehoo",        { NULL }, 5859,  "tcp"  },
-  { "wherehoo",        { NULL }, 5859,  "udp"  },
-  { "ppsuitemsg",      { NULL }, 5863,  "tcp"  },
-  { "ppsuitemsg",      { NULL }, 5863,  "udp"  },
-  { "rfb",             { NULL }, 5900,  "tcp"  },
-  { "rfb",             { NULL }, 5900,  "udp"  },
-  { "cm",              { NULL }, 5910,  "tcp"  },
-  { "cm",              { NULL }, 5910,  "udp"  },
-  { "cpdlc",           { NULL }, 5911,  "tcp"  },
-  { "cpdlc",           { NULL }, 5911,  "udp"  },
-  { "fis",             { NULL }, 5912,  "tcp"  },
-  { "fis",             { NULL }, 5912,  "udp"  },
-  { "ads-c",           { NULL }, 5913,  "tcp"  },
-  { "ads-c",           { NULL }, 5913,  "udp"  },
-  { "indy",            { NULL }, 5963,  "tcp"  },
-  { "indy",            { NULL }, 5963,  "udp"  },
-  { "mppolicy-v5",     { NULL }, 5968,  "tcp"  },
-  { "mppolicy-v5",     { NULL }, 5968,  "udp"  },
-  { "mppolicy-mgr",    { NULL }, 5969,  "tcp"  },
-  { "mppolicy-mgr",    { NULL }, 5969,  "udp"  },
-  { "couchdb",         { NULL }, 5984,  "tcp"  },
-  { "couchdb",         { NULL }, 5984,  "udp"  },
-  { "wsman",           { NULL }, 5985,  "tcp"  },
-  { "wsman",           { NULL }, 5985,  "udp"  },
-  { "wsmans",          { NULL }, 5986,  "tcp"  },
-  { "wsmans",          { NULL }, 5986,  "udp"  },
-  { "wbem-rmi",        { NULL }, 5987,  "tcp"  },
-  { "wbem-rmi",        { NULL }, 5987,  "udp"  },
-  { "wbem-http",       { NULL }, 5988,  "tcp"  },
-  { "wbem-http",       { NULL }, 5988,  "udp"  },
-  { "wbem-https",      { NULL }, 5989,  "tcp"  },
-  { "wbem-https",      { NULL }, 5989,  "udp"  },
-  { "wbem-exp-https",  { NULL }, 5990,  "tcp"  },
-  { "wbem-exp-https",  { NULL }, 5990,  "udp"  },
-  { "nuxsl",           { NULL }, 5991,  "tcp"  },
-  { "nuxsl",           { NULL }, 5991,  "udp"  },
-  { "consul-insight",  { NULL }, 5992,  "tcp"  },
-  { "consul-insight",  { NULL }, 5992,  "udp"  },
-  { "cvsup",           { NULL }, 5999,  "tcp"  },
-  { "cvsup",           { NULL }, 5999,  "udp"  },
-  { "ndl-ahp-svc",     { NULL }, 6064,  "tcp"  },
-  { "ndl-ahp-svc",     { NULL }, 6064,  "udp"  },
-  { "winpharaoh",      { NULL }, 6065,  "tcp"  },
-  { "winpharaoh",      { NULL }, 6065,  "udp"  },
-  { "ewctsp",          { NULL }, 6066,  "tcp"  },
-  { "ewctsp",          { NULL }, 6066,  "udp"  },
-  { "gsmp",            { NULL }, 6068,  "tcp"  },
-  { "gsmp",            { NULL }, 6068,  "udp"  },
-  { "trip",            { NULL }, 6069,  "tcp"  },
-  { "trip",            { NULL }, 6069,  "udp"  },
-  { "messageasap",     { NULL }, 6070,  "tcp"  },
-  { "messageasap",     { NULL }, 6070,  "udp"  },
-  { "ssdtp",           { NULL }, 6071,  "tcp"  },
-  { "ssdtp",           { NULL }, 6071,  "udp"  },
-  { "diagnose-proc",   { NULL }, 6072,  "tcp"  },
-  { "diagnose-proc",   { NULL }, 6072,  "udp"  },
-  { "directplay8",     { NULL }, 6073,  "tcp"  },
-  { "directplay8",     { NULL }, 6073,  "udp"  },
-  { "max",             { NULL }, 6074,  "tcp"  },
-  { "max",             { NULL }, 6074,  "udp"  },
-  { "dpm-acm",         { NULL }, 6075,  "tcp"  },
-  { "miami-bcast",     { NULL }, 6083,  "udp"  },
-  { "p2p-sip",         { NULL }, 6084,  "tcp"  },
-  { "konspire2b",      { NULL }, 6085,  "tcp"  },
-  { "konspire2b",      { NULL }, 6085,  "udp"  },
-  { "pdtp",            { NULL }, 6086,  "tcp"  },
-  { "pdtp",            { NULL }, 6086,  "udp"  },
-  { "ldss",            { NULL }, 6087,  "tcp"  },
-  { "ldss",            { NULL }, 6087,  "udp"  },
-  { "raxa-mgmt",       { NULL }, 6099,  "tcp"  },
-  { "synchronet-db",   { NULL }, 6100,  "tcp"  },
-  { "synchronet-db",   { NULL }, 6100,  "udp"  },
-  { "synchronet-rtc",  { NULL }, 6101,  "tcp"  },
-  { "synchronet-rtc",  { NULL }, 6101,  "udp"  },
-  { "synchronet-upd",  { NULL }, 6102,  "tcp"  },
-  { "synchronet-upd",  { NULL }, 6102,  "udp"  },
-  { "rets",            { NULL }, 6103,  "tcp"  },
-  { "rets",            { NULL }, 6103,  "udp"  },
-  { "dbdb",            { NULL }, 6104,  "tcp"  },
-  { "dbdb",            { NULL }, 6104,  "udp"  },
-  { "primaserver",     { NULL }, 6105,  "tcp"  },
-  { "primaserver",     { NULL }, 6105,  "udp"  },
-  { "mpsserver",       { NULL }, 6106,  "tcp"  },
-  { "mpsserver",       { NULL }, 6106,  "udp"  },
-  { "etc-control",     { NULL }, 6107,  "tcp"  },
-  { "etc-control",     { NULL }, 6107,  "udp"  },
-  { "sercomm-scadmin", { NULL }, 6108,  "tcp"  },
-  { "sercomm-scadmin", { NULL }, 6108,  "udp"  },
-  { "globecast-id",    { NULL }, 6109,  "tcp"  },
-  { "globecast-id",    { NULL }, 6109,  "udp"  },
-  { "softcm",          { NULL }, 6110,  "tcp"  },
-  { "softcm",          { NULL }, 6110,  "udp"  },
-  { "spc",             { NULL }, 6111,  "tcp"  },
-  { "spc",             { NULL }, 6111,  "udp"  },
-  { "dtspcd",          { NULL }, 6112,  "tcp"  },
-  { "dtspcd",          { NULL }, 6112,  "udp"  },
-  { "dayliteserver",   { NULL }, 6113,  "tcp"  },
-  { "wrspice",         { NULL }, 6114,  "tcp"  },
-  { "xic",             { NULL }, 6115,  "tcp"  },
-  { "xtlserv",         { NULL }, 6116,  "tcp"  },
-  { "daylitetouch",    { NULL }, 6117,  "tcp"  },
-  { "spdy",            { NULL }, 6121,  "tcp"  },
-  { "bex-webadmin",    { NULL }, 6122,  "tcp"  },
-  { "bex-webadmin",    { NULL }, 6122,  "udp"  },
-  { "backup-express",  { NULL }, 6123,  "tcp"  },
-  { "backup-express",  { NULL }, 6123,  "udp"  },
-  { "pnbs",            { NULL }, 6124,  "tcp"  },
-  { "pnbs",            { NULL }, 6124,  "udp"  },
-  { "nbt-wol",         { NULL }, 6133,  "tcp"  },
-  { "nbt-wol",         { NULL }, 6133,  "udp"  },
-  { "pulsonixnls",     { NULL }, 6140,  "tcp"  },
-  { "pulsonixnls",     { NULL }, 6140,  "udp"  },
-  { "meta-corp",       { NULL }, 6141,  "tcp"  },
-  { "meta-corp",       { NULL }, 6141,  "udp"  },
-  { "aspentec-lm",     { NULL }, 6142,  "tcp"  },
-  { "aspentec-lm",     { NULL }, 6142,  "udp"  },
-  { "watershed-lm",    { NULL }, 6143,  "tcp"  },
-  { "watershed-lm",    { NULL }, 6143,  "udp"  },
-  { "statsci1-lm",     { NULL }, 6144,  "tcp"  },
-  { "statsci1-lm",     { NULL }, 6144,  "udp"  },
-  { "statsci2-lm",     { NULL }, 6145,  "tcp"  },
-  { "statsci2-lm",     { NULL }, 6145,  "udp"  },
-  { "lonewolf-lm",     { NULL }, 6146,  "tcp"  },
-  { "lonewolf-lm",     { NULL }, 6146,  "udp"  },
-  { "montage-lm",      { NULL }, 6147,  "tcp"  },
-  { "montage-lm",      { NULL }, 6147,  "udp"  },
-  { "ricardo-lm",      { NULL }, 6148,  "tcp"  },
-  { "ricardo-lm",      { NULL }, 6148,  "udp"  },
-  { "tal-pod",         { NULL }, 6149,  "tcp"  },
-  { "tal-pod",         { NULL }, 6149,  "udp"  },
-  { "efb-aci",         { NULL }, 6159,  "tcp"  },
-  { "patrol-ism",      { NULL }, 6161,  "tcp"  },
-  { "patrol-ism",      { NULL }, 6161,  "udp"  },
-  { "patrol-coll",     { NULL }, 6162,  "tcp"  },
-  { "patrol-coll",     { NULL }, 6162,  "udp"  },
-  { "pscribe",         { NULL }, 6163,  "tcp"  },
-  { "pscribe",         { NULL }, 6163,  "udp"  },
-  { "lm-x",            { NULL }, 6200,  "tcp"  },
-  { "lm-x",            { NULL }, 6200,  "udp"  },
-  { "radmind",         { NULL }, 6222,  "tcp"  },
-  { "radmind",         { NULL }, 6222,  "udp"  },
-  { "jeol-nsdtp-1",    { NULL }, 6241,  "tcp"  },
-  { "jeol-nsddp-1",    { NULL }, 6241,  "udp"  },
-  { "jeol-nsdtp-2",    { NULL }, 6242,  "tcp"  },
-  { "jeol-nsddp-2",    { NULL }, 6242,  "udp"  },
-  { "jeol-nsdtp-3",    { NULL }, 6243,  "tcp"  },
-  { "jeol-nsddp-3",    { NULL }, 6243,  "udp"  },
-  { "jeol-nsdtp-4",    { NULL }, 6244,  "tcp"  },
-  { "jeol-nsddp-4",    { NULL }, 6244,  "udp"  },
-  { "tl1-raw-ssl",     { NULL }, 6251,  "tcp"  },
-  { "tl1-raw-ssl",     { NULL }, 6251,  "udp"  },
-  { "tl1-ssh",         { NULL }, 6252,  "tcp"  },
-  { "tl1-ssh",         { NULL }, 6252,  "udp"  },
-  { "crip",            { NULL }, 6253,  "tcp"  },
-  { "crip",            { NULL }, 6253,  "udp"  },
-  { "gld",             { NULL }, 6267,  "tcp"  },
-  { "grid",            { NULL }, 6268,  "tcp"  },
-  { "grid",            { NULL }, 6268,  "udp"  },
-  { "grid-alt",        { NULL }, 6269,  "tcp"  },
-  { "grid-alt",        { NULL }, 6269,  "udp"  },
-  { "bmc-grx",         { NULL }, 6300,  "tcp"  },
-  { "bmc-grx",         { NULL }, 6300,  "udp"  },
-  { "bmc_ctd_ldap",    { NULL }, 6301,  "tcp"  },
-  { "bmc_ctd_ldap",    { NULL }, 6301,  "udp"  },
-  { "ufmp",            { NULL }, 6306,  "tcp"  },
-  { "ufmp",            { NULL }, 6306,  "udp"  },
-  { "scup",            { NULL }, 6315,  "tcp"  },
-  { "scup-disc",       { NULL }, 6315,  "udp"  },
-  { "abb-escp",        { NULL }, 6316,  "tcp"  },
-  { "abb-escp",        { NULL }, 6316,  "udp"  },
-  { "repsvc",          { NULL }, 6320,  "tcp"  },
-  { "repsvc",          { NULL }, 6320,  "udp"  },
-  { "emp-server1",     { NULL }, 6321,  "tcp"  },
-  { "emp-server1",     { NULL }, 6321,  "udp"  },
-  { "emp-server2",     { NULL }, 6322,  "tcp"  },
-  { "emp-server2",     { NULL }, 6322,  "udp"  },
-  { "sflow",           { NULL }, 6343,  "tcp"  },
-  { "sflow",           { NULL }, 6343,  "udp"  },
-  { "gnutella-svc",    { NULL }, 6346,  "tcp"  },
-  { "gnutella-svc",    { NULL }, 6346,  "udp"  },
-  { "gnutella-rtr",    { NULL }, 6347,  "tcp"  },
-  { "gnutella-rtr",    { NULL }, 6347,  "udp"  },
-  { "adap",            { NULL }, 6350,  "tcp"  },
-  { "adap",            { NULL }, 6350,  "udp"  },
-  { "pmcs",            { NULL }, 6355,  "tcp"  },
-  { "pmcs",            { NULL }, 6355,  "udp"  },
-  { "metaedit-mu",     { NULL }, 6360,  "tcp"  },
-  { "metaedit-mu",     { NULL }, 6360,  "udp"  },
-  { "metaedit-se",     { NULL }, 6370,  "tcp"  },
-  { "metaedit-se",     { NULL }, 6370,  "udp"  },
-  { "metatude-mds",    { NULL }, 6382,  "tcp"  },
-  { "metatude-mds",    { NULL }, 6382,  "udp"  },
-  { "clariion-evr01",  { NULL }, 6389,  "tcp"  },
-  { "clariion-evr01",  { NULL }, 6389,  "udp"  },
-  { "metaedit-ws",     { NULL }, 6390,  "tcp"  },
-  { "metaedit-ws",     { NULL }, 6390,  "udp"  },
-  { "faxcomservice",   { NULL }, 6417,  "tcp"  },
-  { "faxcomservice",   { NULL }, 6417,  "udp"  },
-  { "syserverremote",  { NULL }, 6418,  "tcp"  },
-  { "svdrp",           { NULL }, 6419,  "tcp"  },
-  { "nim-vdrshell",    { NULL }, 6420,  "tcp"  },
-  { "nim-vdrshell",    { NULL }, 6420,  "udp"  },
-  { "nim-wan",         { NULL }, 6421,  "tcp"  },
-  { "nim-wan",         { NULL }, 6421,  "udp"  },
-  { "pgbouncer",       { NULL }, 6432,  "tcp"  },
-  { "sun-sr-https",    { NULL }, 6443,  "tcp"  },
-  { "sun-sr-https",    { NULL }, 6443,  "udp"  },
-  { "sge_qmaster",     { NULL }, 6444,  "tcp"  },
-  { "sge_qmaster",     { NULL }, 6444,  "udp"  },
-  { "sge_execd",       { NULL }, 6445,  "tcp"  },
-  { "sge_execd",       { NULL }, 6445,  "udp"  },
-  { "mysql-proxy",     { NULL }, 6446,  "tcp"  },
-  { "mysql-proxy",     { NULL }, 6446,  "udp"  },
-  { "skip-cert-recv",  { NULL }, 6455,  "tcp"  },
-  { "skip-cert-send",  { NULL }, 6456,  "udp"  },
-  { "lvision-lm",      { NULL }, 6471,  "tcp"  },
-  { "lvision-lm",      { NULL }, 6471,  "udp"  },
-  { "sun-sr-http",     { NULL }, 6480,  "tcp"  },
-  { "sun-sr-http",     { NULL }, 6480,  "udp"  },
-  { "servicetags",     { NULL }, 6481,  "tcp"  },
-  { "servicetags",     { NULL }, 6481,  "udp"  },
-  { "ldoms-mgmt",      { NULL }, 6482,  "tcp"  },
-  { "ldoms-mgmt",      { NULL }, 6482,  "udp"  },
-  { "SunVTS-RMI",      { NULL }, 6483,  "tcp"  },
-  { "SunVTS-RMI",      { NULL }, 6483,  "udp"  },
-  { "sun-sr-jms",      { NULL }, 6484,  "tcp"  },
-  { "sun-sr-jms",      { NULL }, 6484,  "udp"  },
-  { "sun-sr-iiop",     { NULL }, 6485,  "tcp"  },
-  { "sun-sr-iiop",     { NULL }, 6485,  "udp"  },
-  { "sun-sr-iiops",    { NULL }, 6486,  "tcp"  },
-  { "sun-sr-iiops",    { NULL }, 6486,  "udp"  },
-  { "sun-sr-iiop-aut", { NULL }, 6487,  "tcp"  },
-  { "sun-sr-iiop-aut", { NULL }, 6487,  "udp"  },
-  { "sun-sr-jmx",      { NULL }, 6488,  "tcp"  },
-  { "sun-sr-jmx",      { NULL }, 6488,  "udp"  },
-  { "sun-sr-admin",    { NULL }, 6489,  "tcp"  },
-  { "sun-sr-admin",    { NULL }, 6489,  "udp"  },
-  { "boks",            { NULL }, 6500,  "tcp"  },
-  { "boks",            { NULL }, 6500,  "udp"  },
-  { "boks_servc",      { NULL }, 6501,  "tcp"  },
-  { "boks_servc",      { NULL }, 6501,  "udp"  },
-  { "boks_servm",      { NULL }, 6502,  "tcp"  },
-  { "boks_servm",      { NULL }, 6502,  "udp"  },
-  { "boks_clntd",      { NULL }, 6503,  "tcp"  },
-  { "boks_clntd",      { NULL }, 6503,  "udp"  },
-  { "badm_priv",       { NULL }, 6505,  "tcp"  },
-  { "badm_priv",       { NULL }, 6505,  "udp"  },
-  { "badm_pub",        { NULL }, 6506,  "tcp"  },
-  { "badm_pub",        { NULL }, 6506,  "udp"  },
-  { "bdir_priv",       { NULL }, 6507,  "tcp"  },
-  { "bdir_priv",       { NULL }, 6507,  "udp"  },
-  { "bdir_pub",        { NULL }, 6508,  "tcp"  },
-  { "bdir_pub",        { NULL }, 6508,  "udp"  },
-  { "mgcs-mfp-port",   { NULL }, 6509,  "tcp"  },
-  { "mgcs-mfp-port",   { NULL }, 6509,  "udp"  },
-  { "mcer-port",       { NULL }, 6510,  "tcp"  },
-  { "mcer-port",       { NULL }, 6510,  "udp"  },
-  { "netconf-tls",     { NULL }, 6513,  "tcp"  },
-  { "syslog-tls",      { NULL }, 6514,  "tcp"  },
-  { "syslog-tls",      { NULL }, 6514,  "udp"  },
-  { "syslog-tls",      { NULL }, 6514,  "dccp" },
-  { "elipse-rec",      { NULL }, 6515,  "tcp"  },
-  { "elipse-rec",      { NULL }, 6515,  "udp"  },
-  { "lds-distrib",     { NULL }, 6543,  "tcp"  },
-  { "lds-distrib",     { NULL }, 6543,  "udp"  },
-  { "lds-dump",        { NULL }, 6544,  "tcp"  },
-  { "lds-dump",        { NULL }, 6544,  "udp"  },
-  { "apc-6547",        { NULL }, 6547,  "tcp"  },
-  { "apc-6547",        { NULL }, 6547,  "udp"  },
-  { "apc-6548",        { NULL }, 6548,  "tcp"  },
-  { "apc-6548",        { NULL }, 6548,  "udp"  },
-  { "apc-6549",        { NULL }, 6549,  "tcp"  },
-  { "apc-6549",        { NULL }, 6549,  "udp"  },
-  { "fg-sysupdate",    { NULL }, 6550,  "tcp"  },
-  { "fg-sysupdate",    { NULL }, 6550,  "udp"  },
-  { "sum",             { NULL }, 6551,  "tcp"  },
-  { "sum",             { NULL }, 6551,  "udp"  },
-  { "xdsxdm",          { NULL }, 6558,  "tcp"  },
-  { "xdsxdm",          { NULL }, 6558,  "udp"  },
-  { "sane-port",       { NULL }, 6566,  "tcp"  },
-  { "sane-port",       { NULL }, 6566,  "udp"  },
-  { "esp",             { NULL }, 6567,  "tcp"  },
-  { "esp",             { NULL }, 6567,  "udp"  },
-  { "canit_store",     { NULL }, 6568,  "tcp"  },
-  { "rp-reputation",   { NULL }, 6568,  "udp"  },
-  { "affiliate",       { NULL }, 6579,  "tcp"  },
-  { "affiliate",       { NULL }, 6579,  "udp"  },
-  { "parsec-master",   { NULL }, 6580,  "tcp"  },
-  { "parsec-master",   { NULL }, 6580,  "udp"  },
-  { "parsec-peer",     { NULL }, 6581,  "tcp"  },
-  { "parsec-peer",     { NULL }, 6581,  "udp"  },
-  { "parsec-game",     { NULL }, 6582,  "tcp"  },
-  { "parsec-game",     { NULL }, 6582,  "udp"  },
-  { "joaJewelSuite",   { NULL }, 6583,  "tcp"  },
-  { "joaJewelSuite",   { NULL }, 6583,  "udp"  },
-  { "mshvlm",          { NULL }, 6600,  "tcp"  },
-  { "mstmg-sstp",      { NULL }, 6601,  "tcp"  },
-  { "wsscomfrmwk",     { NULL }, 6602,  "tcp"  },
-  { "odette-ftps",     { NULL }, 6619,  "tcp"  },
-  { "odette-ftps",     { NULL }, 6619,  "udp"  },
-  { "kftp-data",       { NULL }, 6620,  "tcp"  },
-  { "kftp-data",       { NULL }, 6620,  "udp"  },
-  { "kftp",            { NULL }, 6621,  "tcp"  },
-  { "kftp",            { NULL }, 6621,  "udp"  },
-  { "mcftp",           { NULL }, 6622,  "tcp"  },
-  { "mcftp",           { NULL }, 6622,  "udp"  },
-  { "ktelnet",         { NULL }, 6623,  "tcp"  },
-  { "ktelnet",         { NULL }, 6623,  "udp"  },
-  { "datascaler-db",   { NULL }, 6624,  "tcp"  },
-  { "datascaler-ctl",  { NULL }, 6625,  "tcp"  },
-  { "wago-service",    { NULL }, 6626,  "tcp"  },
-  { "wago-service",    { NULL }, 6626,  "udp"  },
-  { "nexgen",          { NULL }, 6627,  "tcp"  },
-  { "nexgen",          { NULL }, 6627,  "udp"  },
-  { "afesc-mc",        { NULL }, 6628,  "tcp"  },
-  { "afesc-mc",        { NULL }, 6628,  "udp"  },
-  { "mxodbc-connect",  { NULL }, 6632,  "tcp"  },
-  { "pcs-sf-ui-man",   { NULL }, 6655,  "tcp"  },
-  { "emgmsg",          { NULL }, 6656,  "tcp"  },
-  { "palcom-disc",     { NULL }, 6657,  "udp"  },
-  { "vocaltec-gold",   { NULL }, 6670,  "tcp"  },
-  { "vocaltec-gold",   { NULL }, 6670,  "udp"  },
-  { "p4p-portal",      { NULL }, 6671,  "tcp"  },
-  { "p4p-portal",      { NULL }, 6671,  "udp"  },
-  { "vision_server",   { NULL }, 6672,  "tcp"  },
-  { "vision_server",   { NULL }, 6672,  "udp"  },
-  { "vision_elmd",     { NULL }, 6673,  "tcp"  },
-  { "vision_elmd",     { NULL }, 6673,  "udp"  },
-  { "vfbp",            { NULL }, 6678,  "tcp"  },
-  { "vfbp-disc",       { NULL }, 6678,  "udp"  },
-  { "osaut",           { NULL }, 6679,  "tcp"  },
-  { "osaut",           { NULL }, 6679,  "udp"  },
-  { "clever-ctrace",   { NULL }, 6687,  "tcp"  },
-  { "clever-tcpip",    { NULL }, 6688,  "tcp"  },
-  { "tsa",             { NULL }, 6689,  "tcp"  },
-  { "tsa",             { NULL }, 6689,  "udp"  },
-  { "babel",           { NULL }, 6697,  "udp"  },
-  { "kti-icad-srvr",   { NULL }, 6701,  "tcp"  },
-  { "kti-icad-srvr",   { NULL }, 6701,  "udp"  },
-  { "e-design-net",    { NULL }, 6702,  "tcp"  },
-  { "e-design-net",    { NULL }, 6702,  "udp"  },
-  { "e-design-web",    { NULL }, 6703,  "tcp"  },
-  { "e-design-web",    { NULL }, 6703,  "udp"  },
-  { "frc-hp",          { NULL }, 6704,  "sctp" },
-  { "frc-mp",          { NULL }, 6705,  "sctp" },
-  { "frc-lp",          { NULL }, 6706,  "sctp" },
-  { "ibprotocol",      { NULL }, 6714,  "tcp"  },
-  { "ibprotocol",      { NULL }, 6714,  "udp"  },
-  { "fibotrader-com",  { NULL }, 6715,  "tcp"  },
-  { "fibotrader-com",  { NULL }, 6715,  "udp"  },
-  { "bmc-perf-agent",  { NULL }, 6767,  "tcp"  },
-  { "bmc-perf-agent",  { NULL }, 6767,  "udp"  },
-  { "bmc-perf-mgrd",   { NULL }, 6768,  "tcp"  },
-  { "bmc-perf-mgrd",   { NULL }, 6768,  "udp"  },
-  { "adi-gxp-srvprt",  { NULL }, 6769,  "tcp"  },
-  { "adi-gxp-srvprt",  { NULL }, 6769,  "udp"  },
-  { "plysrv-http",     { NULL }, 6770,  "tcp"  },
-  { "plysrv-http",     { NULL }, 6770,  "udp"  },
-  { "plysrv-https",    { NULL }, 6771,  "tcp"  },
-  { "plysrv-https",    { NULL }, 6771,  "udp"  },
-  { "dgpf-exchg",      { NULL }, 6785,  "tcp"  },
-  { "dgpf-exchg",      { NULL }, 6785,  "udp"  },
-  { "smc-jmx",         { NULL }, 6786,  "tcp"  },
-  { "smc-jmx",         { NULL }, 6786,  "udp"  },
-  { "smc-admin",       { NULL }, 6787,  "tcp"  },
-  { "smc-admin",       { NULL }, 6787,  "udp"  },
-  { "smc-http",        { NULL }, 6788,  "tcp"  },
-  { "smc-http",        { NULL }, 6788,  "udp"  },
-  { "smc-https",       { NULL }, 6789,  "tcp"  },
-  { "smc-https",       { NULL }, 6789,  "udp"  },
-  { "hnmp",            { NULL }, 6790,  "tcp"  },
-  { "hnmp",            { NULL }, 6790,  "udp"  },
-  { "hnm",             { NULL }, 6791,  "tcp"  },
-  { "hnm",             { NULL }, 6791,  "udp"  },
-  { "acnet",           { NULL }, 6801,  "tcp"  },
-  { "acnet",           { NULL }, 6801,  "udp"  },
-  { "pentbox-sim",     { NULL }, 6817,  "tcp"  },
-  { "ambit-lm",        { NULL }, 6831,  "tcp"  },
-  { "ambit-lm",        { NULL }, 6831,  "udp"  },
-  { "netmo-default",   { NULL }, 6841,  "tcp"  },
-  { "netmo-default",   { NULL }, 6841,  "udp"  },
-  { "netmo-http",      { NULL }, 6842,  "tcp"  },
-  { "netmo-http",      { NULL }, 6842,  "udp"  },
-  { "iccrushmore",     { NULL }, 6850,  "tcp"  },
-  { "iccrushmore",     { NULL }, 6850,  "udp"  },
-  { "acctopus-cc",     { NULL }, 6868,  "tcp"  },
-  { "acctopus-st",     { NULL }, 6868,  "udp"  },
-  { "muse",            { NULL }, 6888,  "tcp"  },
-  { "muse",            { NULL }, 6888,  "udp"  },
-  { "jetstream",       { NULL }, 6901,  "tcp"  },
-  { "xsmsvc",          { NULL }, 6936,  "tcp"  },
-  { "xsmsvc",          { NULL }, 6936,  "udp"  },
-  { "bioserver",       { NULL }, 6946,  "tcp"  },
-  { "bioserver",       { NULL }, 6946,  "udp"  },
-  { "otlp",            { NULL }, 6951,  "tcp"  },
-  { "otlp",            { NULL }, 6951,  "udp"  },
-  { "jmact3",          { NULL }, 6961,  "tcp"  },
-  { "jmact3",          { NULL }, 6961,  "udp"  },
-  { "jmevt2",          { NULL }, 6962,  "tcp"  },
-  { "jmevt2",          { NULL }, 6962,  "udp"  },
-  { "swismgr1",        { NULL }, 6963,  "tcp"  },
-  { "swismgr1",        { NULL }, 6963,  "udp"  },
-  { "swismgr2",        { NULL }, 6964,  "tcp"  },
-  { "swismgr2",        { NULL }, 6964,  "udp"  },
-  { "swistrap",        { NULL }, 6965,  "tcp"  },
-  { "swistrap",        { NULL }, 6965,  "udp"  },
-  { "swispol",         { NULL }, 6966,  "tcp"  },
-  { "swispol",         { NULL }, 6966,  "udp"  },
-  { "acmsoda",         { NULL }, 6969,  "tcp"  },
-  { "acmsoda",         { NULL }, 6969,  "udp"  },
-  { "MobilitySrv",     { NULL }, 6997,  "tcp"  },
-  { "MobilitySrv",     { NULL }, 6997,  "udp"  },
-  { "iatp-highpri",    { NULL }, 6998,  "tcp"  },
-  { "iatp-highpri",    { NULL }, 6998,  "udp"  },
-  { "iatp-normalpri",  { NULL }, 6999,  "tcp"  },
-  { "iatp-normalpri",  { NULL }, 6999,  "udp"  },
-  { "afs3-fileserver", { NULL }, 7000,  "tcp"  },
-  { "afs3-fileserver", { NULL }, 7000,  "udp"  },
-  { "afs3-callback",   { NULL }, 7001,  "tcp"  },
-  { "afs3-callback",   { NULL }, 7001,  "udp"  },
-  { "afs3-prserver",   { NULL }, 7002,  "tcp"  },
-  { "afs3-prserver",   { NULL }, 7002,  "udp"  },
-  { "afs3-vlserver",   { NULL }, 7003,  "tcp"  },
-  { "afs3-vlserver",   { NULL }, 7003,  "udp"  },
-  { "afs3-kaserver",   { NULL }, 7004,  "tcp"  },
-  { "afs3-kaserver",   { NULL }, 7004,  "udp"  },
-  { "afs3-volser",     { NULL }, 7005,  "tcp"  },
-  { "afs3-volser",     { NULL }, 7005,  "udp"  },
-  { "afs3-errors",     { NULL }, 7006,  "tcp"  },
-  { "afs3-errors",     { NULL }, 7006,  "udp"  },
-  { "afs3-bos",        { NULL }, 7007,  "tcp"  },
-  { "afs3-bos",        { NULL }, 7007,  "udp"  },
-  { "afs3-update",     { NULL }, 7008,  "tcp"  },
-  { "afs3-update",     { NULL }, 7008,  "udp"  },
-  { "afs3-rmtsys",     { NULL }, 7009,  "tcp"  },
-  { "afs3-rmtsys",     { NULL }, 7009,  "udp"  },
-  { "ups-onlinet",     { NULL }, 7010,  "tcp"  },
-  { "ups-onlinet",     { NULL }, 7010,  "udp"  },
-  { "talon-disc",      { NULL }, 7011,  "tcp"  },
-  { "talon-disc",      { NULL }, 7011,  "udp"  },
-  { "talon-engine",    { NULL }, 7012,  "tcp"  },
-  { "talon-engine",    { NULL }, 7012,  "udp"  },
-  { "microtalon-dis",  { NULL }, 7013,  "tcp"  },
-  { "microtalon-dis",  { NULL }, 7013,  "udp"  },
-  { "microtalon-com",  { NULL }, 7014,  "tcp"  },
-  { "microtalon-com",  { NULL }, 7014,  "udp"  },
-  { "talon-webserver", { NULL }, 7015,  "tcp"  },
-  { "talon-webserver", { NULL }, 7015,  "udp"  },
-  { "dpserve",         { NULL }, 7020,  "tcp"  },
-  { "dpserve",         { NULL }, 7020,  "udp"  },
-  { "dpserveadmin",    { NULL }, 7021,  "tcp"  },
-  { "dpserveadmin",    { NULL }, 7021,  "udp"  },
-  { "ctdp",            { NULL }, 7022,  "tcp"  },
-  { "ctdp",            { NULL }, 7022,  "udp"  },
-  { "ct2nmcs",         { NULL }, 7023,  "tcp"  },
-  { "ct2nmcs",         { NULL }, 7023,  "udp"  },
-  { "vmsvc",           { NULL }, 7024,  "tcp"  },
-  { "vmsvc",           { NULL }, 7024,  "udp"  },
-  { "vmsvc-2",         { NULL }, 7025,  "tcp"  },
-  { "vmsvc-2",         { NULL }, 7025,  "udp"  },
-  { "op-probe",        { NULL }, 7030,  "tcp"  },
-  { "op-probe",        { NULL }, 7030,  "udp"  },
-  { "arcp",            { NULL }, 7070,  "tcp"  },
-  { "arcp",            { NULL }, 7070,  "udp"  },
-  { "iwg1",            { NULL }, 7071,  "tcp"  },
-  { "iwg1",            { NULL }, 7071,  "udp"  },
-  { "empowerid",       { NULL }, 7080,  "tcp"  },
-  { "empowerid",       { NULL }, 7080,  "udp"  },
-  { "lazy-ptop",       { NULL }, 7099,  "tcp"  },
-  { "lazy-ptop",       { NULL }, 7099,  "udp"  },
-  { "font-service",    { NULL }, 7100,  "tcp"  },
-  { "font-service",    { NULL }, 7100,  "udp"  },
-  { "elcn",            { NULL }, 7101,  "tcp"  },
-  { "elcn",            { NULL }, 7101,  "udp"  },
-  { "aes-x170",        { NULL }, 7107,  "udp"  },
-  { "virprot-lm",      { NULL }, 7121,  "tcp"  },
-  { "virprot-lm",      { NULL }, 7121,  "udp"  },
-  { "scenidm",         { NULL }, 7128,  "tcp"  },
-  { "scenidm",         { NULL }, 7128,  "udp"  },
-  { "scenccs",         { NULL }, 7129,  "tcp"  },
-  { "scenccs",         { NULL }, 7129,  "udp"  },
-  { "cabsm-comm",      { NULL }, 7161,  "tcp"  },
-  { "cabsm-comm",      { NULL }, 7161,  "udp"  },
-  { "caistoragemgr",   { NULL }, 7162,  "tcp"  },
-  { "caistoragemgr",   { NULL }, 7162,  "udp"  },
-  { "cacsambroker",    { NULL }, 7163,  "tcp"  },
-  { "cacsambroker",    { NULL }, 7163,  "udp"  },
-  { "fsr",             { NULL }, 7164,  "tcp"  },
-  { "fsr",             { NULL }, 7164,  "udp"  },
-  { "doc-server",      { NULL }, 7165,  "tcp"  },
-  { "doc-server",      { NULL }, 7165,  "udp"  },
-  { "aruba-server",    { NULL }, 7166,  "tcp"  },
-  { "aruba-server",    { NULL }, 7166,  "udp"  },
-  { "casrmagent",      { NULL }, 7167,  "tcp"  },
-  { "cnckadserver",    { NULL }, 7168,  "tcp"  },
-  { "ccag-pib",        { NULL }, 7169,  "tcp"  },
-  { "ccag-pib",        { NULL }, 7169,  "udp"  },
-  { "nsrp",            { NULL }, 7170,  "tcp"  },
-  { "nsrp",            { NULL }, 7170,  "udp"  },
-  { "drm-production",  { NULL }, 7171,  "tcp"  },
-  { "drm-production",  { NULL }, 7171,  "udp"  },
-  { "zsecure",         { NULL }, 7173,  "tcp"  },
-  { "clutild",         { NULL }, 7174,  "tcp"  },
-  { "clutild",         { NULL }, 7174,  "udp"  },
-  { "fodms",           { NULL }, 7200,  "tcp"  },
-  { "fodms",           { NULL }, 7200,  "udp"  },
-  { "dlip",            { NULL }, 7201,  "tcp"  },
-  { "dlip",            { NULL }, 7201,  "udp"  },
-  { "ramp",            { NULL }, 7227,  "tcp"  },
-  { "ramp",            { NULL }, 7227,  "udp"  },
-  { "citrixupp",       { NULL }, 7228,  "tcp"  },
-  { "citrixuppg",      { NULL }, 7229,  "tcp"  },
-  { "pads",            { NULL }, 7237,  "tcp"  },
-  { "cnap",            { NULL }, 7262,  "tcp"  },
-  { "cnap",            { NULL }, 7262,  "udp"  },
-  { "watchme-7272",    { NULL }, 7272,  "tcp"  },
-  { "watchme-7272",    { NULL }, 7272,  "udp"  },
-  { "oma-rlp",         { NULL }, 7273,  "tcp"  },
-  { "oma-rlp",         { NULL }, 7273,  "udp"  },
-  { "oma-rlp-s",       { NULL }, 7274,  "tcp"  },
-  { "oma-rlp-s",       { NULL }, 7274,  "udp"  },
-  { "oma-ulp",         { NULL }, 7275,  "tcp"  },
-  { "oma-ulp",         { NULL }, 7275,  "udp"  },
-  { "oma-ilp",         { NULL }, 7276,  "tcp"  },
-  { "oma-ilp",         { NULL }, 7276,  "udp"  },
-  { "oma-ilp-s",       { NULL }, 7277,  "tcp"  },
-  { "oma-ilp-s",       { NULL }, 7277,  "udp"  },
-  { "oma-dcdocbs",     { NULL }, 7278,  "tcp"  },
-  { "oma-dcdocbs",     { NULL }, 7278,  "udp"  },
-  { "ctxlic",          { NULL }, 7279,  "tcp"  },
-  { "ctxlic",          { NULL }, 7279,  "udp"  },
-  { "itactionserver1", { NULL }, 7280,  "tcp"  },
-  { "itactionserver1", { NULL }, 7280,  "udp"  },
-  { "itactionserver2", { NULL }, 7281,  "tcp"  },
-  { "itactionserver2", { NULL }, 7281,  "udp"  },
-  { "mzca-action",     { NULL }, 7282,  "tcp"  },
-  { "mzca-alert",      { NULL }, 7282,  "udp"  },
-  { "lcm-server",      { NULL }, 7365,  "tcp"  },
-  { "lcm-server",      { NULL }, 7365,  "udp"  },
-  { "mindfilesys",     { NULL }, 7391,  "tcp"  },
-  { "mindfilesys",     { NULL }, 7391,  "udp"  },
-  { "mrssrendezvous",  { NULL }, 7392,  "tcp"  },
-  { "mrssrendezvous",  { NULL }, 7392,  "udp"  },
-  { "nfoldman",        { NULL }, 7393,  "tcp"  },
-  { "nfoldman",        { NULL }, 7393,  "udp"  },
-  { "fse",             { NULL }, 7394,  "tcp"  },
-  { "fse",             { NULL }, 7394,  "udp"  },
-  { "winqedit",        { NULL }, 7395,  "tcp"  },
-  { "winqedit",        { NULL }, 7395,  "udp"  },
-  { "hexarc",          { NULL }, 7397,  "tcp"  },
-  { "hexarc",          { NULL }, 7397,  "udp"  },
-  { "rtps-discovery",  { NULL }, 7400,  "tcp"  },
-  { "rtps-discovery",  { NULL }, 7400,  "udp"  },
-  { "rtps-dd-ut",      { NULL }, 7401,  "tcp"  },
-  { "rtps-dd-ut",      { NULL }, 7401,  "udp"  },
-  { "rtps-dd-mt",      { NULL }, 7402,  "tcp"  },
-  { "rtps-dd-mt",      { NULL }, 7402,  "udp"  },
-  { "ionixnetmon",     { NULL }, 7410,  "tcp"  },
-  { "ionixnetmon",     { NULL }, 7410,  "udp"  },
-  { "mtportmon",       { NULL }, 7421,  "tcp"  },
-  { "mtportmon",       { NULL }, 7421,  "udp"  },
-  { "pmdmgr",          { NULL }, 7426,  "tcp"  },
-  { "pmdmgr",          { NULL }, 7426,  "udp"  },
-  { "oveadmgr",        { NULL }, 7427,  "tcp"  },
-  { "oveadmgr",        { NULL }, 7427,  "udp"  },
-  { "ovladmgr",        { NULL }, 7428,  "tcp"  },
-  { "ovladmgr",        { NULL }, 7428,  "udp"  },
-  { "opi-sock",        { NULL }, 7429,  "tcp"  },
-  { "opi-sock",        { NULL }, 7429,  "udp"  },
-  { "xmpv7",           { NULL }, 7430,  "tcp"  },
-  { "xmpv7",           { NULL }, 7430,  "udp"  },
-  { "pmd",             { NULL }, 7431,  "tcp"  },
-  { "pmd",             { NULL }, 7431,  "udp"  },
-  { "faximum",         { NULL }, 7437,  "tcp"  },
-  { "faximum",         { NULL }, 7437,  "udp"  },
-  { "oracleas-https",  { NULL }, 7443,  "tcp"  },
-  { "oracleas-https",  { NULL }, 7443,  "udp"  },
-  { "rise",            { NULL }, 7473,  "tcp"  },
-  { "rise",            { NULL }, 7473,  "udp"  },
-  { "telops-lmd",      { NULL }, 7491,  "tcp"  },
-  { "telops-lmd",      { NULL }, 7491,  "udp"  },
-  { "silhouette",      { NULL }, 7500,  "tcp"  },
-  { "silhouette",      { NULL }, 7500,  "udp"  },
-  { "ovbus",           { NULL }, 7501,  "tcp"  },
-  { "ovbus",           { NULL }, 7501,  "udp"  },
-  { "acplt",           { NULL }, 7509,  "tcp"  },
-  { "ovhpas",          { NULL }, 7510,  "tcp"  },
-  { "ovhpas",          { NULL }, 7510,  "udp"  },
-  { "pafec-lm",        { NULL }, 7511,  "tcp"  },
-  { "pafec-lm",        { NULL }, 7511,  "udp"  },
-  { "saratoga",        { NULL }, 7542,  "tcp"  },
-  { "saratoga",        { NULL }, 7542,  "udp"  },
-  { "atul",            { NULL }, 7543,  "tcp"  },
-  { "atul",            { NULL }, 7543,  "udp"  },
-  { "nta-ds",          { NULL }, 7544,  "tcp"  },
-  { "nta-ds",          { NULL }, 7544,  "udp"  },
-  { "nta-us",          { NULL }, 7545,  "tcp"  },
-  { "nta-us",          { NULL }, 7545,  "udp"  },
-  { "cfs",             { NULL }, 7546,  "tcp"  },
-  { "cfs",             { NULL }, 7546,  "udp"  },
-  { "cwmp",            { NULL }, 7547,  "tcp"  },
-  { "cwmp",            { NULL }, 7547,  "udp"  },
-  { "tidp",            { NULL }, 7548,  "tcp"  },
-  { "tidp",            { NULL }, 7548,  "udp"  },
-  { "nls-tl",          { NULL }, 7549,  "tcp"  },
-  { "nls-tl",          { NULL }, 7549,  "udp"  },
-  { "sncp",            { NULL }, 7560,  "tcp"  },
-  { "sncp",            { NULL }, 7560,  "udp"  },
-  { "cfw",             { NULL }, 7563,  "tcp"  },
-  { "vsi-omega",       { NULL }, 7566,  "tcp"  },
-  { "vsi-omega",       { NULL }, 7566,  "udp"  },
-  { "dell-eql-asm",    { NULL }, 7569,  "tcp"  },
-  { "aries-kfinder",   { NULL }, 7570,  "tcp"  },
-  { "aries-kfinder",   { NULL }, 7570,  "udp"  },
-  { "sun-lm",          { NULL }, 7588,  "tcp"  },
-  { "sun-lm",          { NULL }, 7588,  "udp"  },
-  { "indi",            { NULL }, 7624,  "tcp"  },
-  { "indi",            { NULL }, 7624,  "udp"  },
-  { "simco",           { NULL }, 7626,  "tcp"  },
-  { "simco",           { NULL }, 7626,  "sctp" },
-  { "soap-http",       { NULL }, 7627,  "tcp"  },
-  { "soap-http",       { NULL }, 7627,  "udp"  },
-  { "zen-pawn",        { NULL }, 7628,  "tcp"  },
-  { "zen-pawn",        { NULL }, 7628,  "udp"  },
-  { "xdas",            { NULL }, 7629,  "tcp"  },
-  { "xdas",            { NULL }, 7629,  "udp"  },
-  { "hawk",            { NULL }, 7630,  "tcp"  },
-  { "tesla-sys-msg",   { NULL }, 7631,  "tcp"  },
-  { "pmdfmgt",         { NULL }, 7633,  "tcp"  },
-  { "pmdfmgt",         { NULL }, 7633,  "udp"  },
-  { "cuseeme",         { NULL }, 7648,  "tcp"  },
-  { "cuseeme",         { NULL }, 7648,  "udp"  },
-  { "imqstomp",        { NULL }, 7672,  "tcp"  },
-  { "imqstomps",       { NULL }, 7673,  "tcp"  },
-  { "imqtunnels",      { NULL }, 7674,  "tcp"  },
-  { "imqtunnels",      { NULL }, 7674,  "udp"  },
-  { "imqtunnel",       { NULL }, 7675,  "tcp"  },
-  { "imqtunnel",       { NULL }, 7675,  "udp"  },
-  { "imqbrokerd",      { NULL }, 7676,  "tcp"  },
-  { "imqbrokerd",      { NULL }, 7676,  "udp"  },
-  { "sun-user-https",  { NULL }, 7677,  "tcp"  },
-  { "sun-user-https",  { NULL }, 7677,  "udp"  },
-  { "pando-pub",       { NULL }, 7680,  "tcp"  },
-  { "pando-pub",       { NULL }, 7680,  "udp"  },
-  { "collaber",        { NULL }, 7689,  "tcp"  },
-  { "collaber",        { NULL }, 7689,  "udp"  },
-  { "klio",            { NULL }, 7697,  "tcp"  },
-  { "klio",            { NULL }, 7697,  "udp"  },
-  { "em7-secom",       { NULL }, 7700,  "tcp"  },
-  { "sync-em7",        { NULL }, 7707,  "tcp"  },
-  { "sync-em7",        { NULL }, 7707,  "udp"  },
-  { "scinet",          { NULL }, 7708,  "tcp"  },
-  { "scinet",          { NULL }, 7708,  "udp"  },
-  { "medimageportal",  { NULL }, 7720,  "tcp"  },
-  { "medimageportal",  { NULL }, 7720,  "udp"  },
-  { "nsdeepfreezectl", { NULL }, 7724,  "tcp"  },
-  { "nsdeepfreezectl", { NULL }, 7724,  "udp"  },
-  { "nitrogen",        { NULL }, 7725,  "tcp"  },
-  { "nitrogen",        { NULL }, 7725,  "udp"  },
-  { "freezexservice",  { NULL }, 7726,  "tcp"  },
-  { "freezexservice",  { NULL }, 7726,  "udp"  },
-  { "trident-data",    { NULL }, 7727,  "tcp"  },
-  { "trident-data",    { NULL }, 7727,  "udp"  },
-  { "smip",            { NULL }, 7734,  "tcp"  },
-  { "smip",            { NULL }, 7734,  "udp"  },
-  { "aiagent",         { NULL }, 7738,  "tcp"  },
-  { "aiagent",         { NULL }, 7738,  "udp"  },
-  { "scriptview",      { NULL }, 7741,  "tcp"  },
-  { "scriptview",      { NULL }, 7741,  "udp"  },
-  { "msss",            { NULL }, 7742,  "tcp"  },
-  { "sstp-1",          { NULL }, 7743,  "tcp"  },
-  { "sstp-1",          { NULL }, 7743,  "udp"  },
-  { "raqmon-pdu",      { NULL }, 7744,  "tcp"  },
-  { "raqmon-pdu",      { NULL }, 7744,  "udp"  },
-  { "prgp",            { NULL }, 7747,  "tcp"  },
-  { "prgp",            { NULL }, 7747,  "udp"  },
-  { "cbt",             { NULL }, 7777,  "tcp"  },
-  { "cbt",             { NULL }, 7777,  "udp"  },
-  { "interwise",       { NULL }, 7778,  "tcp"  },
-  { "interwise",       { NULL }, 7778,  "udp"  },
-  { "vstat",           { NULL }, 7779,  "tcp"  },
-  { "vstat",           { NULL }, 7779,  "udp"  },
-  { "accu-lmgr",       { NULL }, 7781,  "tcp"  },
-  { "accu-lmgr",       { NULL }, 7781,  "udp"  },
-  { "minivend",        { NULL }, 7786,  "tcp"  },
-  { "minivend",        { NULL }, 7786,  "udp"  },
-  { "popup-reminders", { NULL }, 7787,  "tcp"  },
-  { "popup-reminders", { NULL }, 7787,  "udp"  },
-  { "office-tools",    { NULL }, 7789,  "tcp"  },
-  { "office-tools",    { NULL }, 7789,  "udp"  },
-  { "q3ade",           { NULL }, 7794,  "tcp"  },
-  { "q3ade",           { NULL }, 7794,  "udp"  },
-  { "pnet-conn",       { NULL }, 7797,  "tcp"  },
-  { "pnet-conn",       { NULL }, 7797,  "udp"  },
-  { "pnet-enc",        { NULL }, 7798,  "tcp"  },
-  { "pnet-enc",        { NULL }, 7798,  "udp"  },
-  { "altbsdp",         { NULL }, 7799,  "tcp"  },
-  { "altbsdp",         { NULL }, 7799,  "udp"  },
-  { "asr",             { NULL }, 7800,  "tcp"  },
-  { "asr",             { NULL }, 7800,  "udp"  },
-  { "ssp-client",      { NULL }, 7801,  "tcp"  },
-  { "ssp-client",      { NULL }, 7801,  "udp"  },
-  { "rbt-wanopt",      { NULL }, 7810,  "tcp"  },
-  { "rbt-wanopt",      { NULL }, 7810,  "udp"  },
-  { "apc-7845",        { NULL }, 7845,  "tcp"  },
-  { "apc-7845",        { NULL }, 7845,  "udp"  },
-  { "apc-7846",        { NULL }, 7846,  "tcp"  },
-  { "apc-7846",        { NULL }, 7846,  "udp"  },
-  { "mobileanalyzer",  { NULL }, 7869,  "tcp"  },
-  { "rbt-smc",         { NULL }, 7870,  "tcp"  },
-  { "pss",             { NULL }, 7880,  "tcp"  },
-  { "pss",             { NULL }, 7880,  "udp"  },
-  { "ubroker",         { NULL }, 7887,  "tcp"  },
-  { "ubroker",         { NULL }, 7887,  "udp"  },
-  { "mevent",          { NULL }, 7900,  "tcp"  },
-  { "mevent",          { NULL }, 7900,  "udp"  },
-  { "tnos-sp",         { NULL }, 7901,  "tcp"  },
-  { "tnos-sp",         { NULL }, 7901,  "udp"  },
-  { "tnos-dp",         { NULL }, 7902,  "tcp"  },
-  { "tnos-dp",         { NULL }, 7902,  "udp"  },
-  { "tnos-dps",        { NULL }, 7903,  "tcp"  },
-  { "tnos-dps",        { NULL }, 7903,  "udp"  },
-  { "qo-secure",       { NULL }, 7913,  "tcp"  },
-  { "qo-secure",       { NULL }, 7913,  "udp"  },
-  { "t2-drm",          { NULL }, 7932,  "tcp"  },
-  { "t2-drm",          { NULL }, 7932,  "udp"  },
-  { "t2-brm",          { NULL }, 7933,  "tcp"  },
-  { "t2-brm",          { NULL }, 7933,  "udp"  },
-  { "supercell",       { NULL }, 7967,  "tcp"  },
-  { "supercell",       { NULL }, 7967,  "udp"  },
-  { "micromuse-ncps",  { NULL }, 7979,  "tcp"  },
-  { "micromuse-ncps",  { NULL }, 7979,  "udp"  },
-  { "quest-vista",     { NULL }, 7980,  "tcp"  },
-  { "quest-vista",     { NULL }, 7980,  "udp"  },
-  { "sossd-collect",   { NULL }, 7981,  "tcp"  },
-  { "sossd-agent",     { NULL }, 7982,  "tcp"  },
-  { "sossd-disc",      { NULL }, 7982,  "udp"  },
-  { "pushns",          { NULL }, 7997,  "tcp"  },
-  { "usicontentpush",  { NULL }, 7998,  "udp"  },
-  { "irdmi2",          { NULL }, 7999,  "tcp"  },
-  { "irdmi2",          { NULL }, 7999,  "udp"  },
-  { "irdmi",           { NULL }, 8000,  "tcp"  },
-  { "irdmi",           { NULL }, 8000,  "udp"  },
-  { "vcom-tunnel",     { NULL }, 8001,  "tcp"  },
-  { "vcom-tunnel",     { NULL }, 8001,  "udp"  },
-  { "teradataordbms",  { NULL }, 8002,  "tcp"  },
-  { "teradataordbms",  { NULL }, 8002,  "udp"  },
-  { "mcreport",        { NULL }, 8003,  "tcp"  },
-  { "mcreport",        { NULL }, 8003,  "udp"  },
-  { "mxi",             { NULL }, 8005,  "tcp"  },
-  { "mxi",             { NULL }, 8005,  "udp"  },
-  { "http-alt",        { NULL }, 8008,  "tcp"  },
-  { "http-alt",        { NULL }, 8008,  "udp"  },
-  { "qbdb",            { NULL }, 8019,  "tcp"  },
-  { "qbdb",            { NULL }, 8019,  "udp"  },
-  { "intu-ec-svcdisc", { NULL }, 8020,  "tcp"  },
-  { "intu-ec-svcdisc", { NULL }, 8020,  "udp"  },
-  { "intu-ec-client",  { NULL }, 8021,  "tcp"  },
-  { "intu-ec-client",  { NULL }, 8021,  "udp"  },
-  { "oa-system",       { NULL }, 8022,  "tcp"  },
-  { "oa-system",       { NULL }, 8022,  "udp"  },
-  { "ca-audit-da",     { NULL }, 8025,  "tcp"  },
-  { "ca-audit-da",     { NULL }, 8025,  "udp"  },
-  { "ca-audit-ds",     { NULL }, 8026,  "tcp"  },
-  { "ca-audit-ds",     { NULL }, 8026,  "udp"  },
-  { "pro-ed",          { NULL }, 8032,  "tcp"  },
-  { "pro-ed",          { NULL }, 8032,  "udp"  },
-  { "mindprint",       { NULL }, 8033,  "tcp"  },
-  { "mindprint",       { NULL }, 8033,  "udp"  },
-  { "vantronix-mgmt",  { NULL }, 8034,  "tcp"  },
-  { "vantronix-mgmt",  { NULL }, 8034,  "udp"  },
-  { "ampify",          { NULL }, 8040,  "tcp"  },
-  { "ampify",          { NULL }, 8040,  "udp"  },
-  { "fs-agent",        { NULL }, 8042,  "tcp"  },
-  { "fs-server",       { NULL }, 8043,  "tcp"  },
-  { "fs-mgmt",         { NULL }, 8044,  "tcp"  },
-  { "senomix01",       { NULL }, 8052,  "tcp"  },
-  { "senomix01",       { NULL }, 8052,  "udp"  },
-  { "senomix02",       { NULL }, 8053,  "tcp"  },
-  { "senomix02",       { NULL }, 8053,  "udp"  },
-  { "senomix03",       { NULL }, 8054,  "tcp"  },
-  { "senomix03",       { NULL }, 8054,  "udp"  },
-  { "senomix04",       { NULL }, 8055,  "tcp"  },
-  { "senomix04",       { NULL }, 8055,  "udp"  },
-  { "senomix05",       { NULL }, 8056,  "tcp"  },
-  { "senomix05",       { NULL }, 8056,  "udp"  },
-  { "senomix06",       { NULL }, 8057,  "tcp"  },
-  { "senomix06",       { NULL }, 8057,  "udp"  },
-  { "senomix07",       { NULL }, 8058,  "tcp"  },
-  { "senomix07",       { NULL }, 8058,  "udp"  },
-  { "senomix08",       { NULL }, 8059,  "tcp"  },
-  { "senomix08",       { NULL }, 8059,  "udp"  },
-  { "gadugadu",        { NULL }, 8074,  "tcp"  },
-  { "gadugadu",        { NULL }, 8074,  "udp"  },
-  { "http-alt",        { NULL }, 8080,  "tcp"  },
-  { "http-alt",        { NULL }, 8080,  "udp"  },
-  { "sunproxyadmin",   { NULL }, 8081,  "tcp"  },
-  { "sunproxyadmin",   { NULL }, 8081,  "udp"  },
-  { "us-cli",          { NULL }, 8082,  "tcp"  },
-  { "us-cli",          { NULL }, 8082,  "udp"  },
-  { "us-srv",          { NULL }, 8083,  "tcp"  },
-  { "us-srv",          { NULL }, 8083,  "udp"  },
-  { "d-s-n",           { NULL }, 8086,  "tcp"  },
-  { "d-s-n",           { NULL }, 8086,  "udp"  },
-  { "simplifymedia",   { NULL }, 8087,  "tcp"  },
-  { "simplifymedia",   { NULL }, 8087,  "udp"  },
-  { "radan-http",      { NULL }, 8088,  "tcp"  },
-  { "radan-http",      { NULL }, 8088,  "udp"  },
-  { "jamlink",         { NULL }, 8091,  "tcp"  },
-  { "sac",             { NULL }, 8097,  "tcp"  },
-  { "sac",             { NULL }, 8097,  "udp"  },
-  { "xprint-server",   { NULL }, 8100,  "tcp"  },
-  { "xprint-server",   { NULL }, 8100,  "udp"  },
-  { "ldoms-migr",      { NULL }, 8101,  "tcp"  },
-  { "mtl8000-matrix",  { NULL }, 8115,  "tcp"  },
-  { "mtl8000-matrix",  { NULL }, 8115,  "udp"  },
-  { "cp-cluster",      { NULL }, 8116,  "tcp"  },
-  { "cp-cluster",      { NULL }, 8116,  "udp"  },
-  { "privoxy",         { NULL }, 8118,  "tcp"  },
-  { "privoxy",         { NULL }, 8118,  "udp"  },
-  { "apollo-data",     { NULL }, 8121,  "tcp"  },
-  { "apollo-data",     { NULL }, 8121,  "udp"  },
-  { "apollo-admin",    { NULL }, 8122,  "tcp"  },
-  { "apollo-admin",    { NULL }, 8122,  "udp"  },
-  { "paycash-online",  { NULL }, 8128,  "tcp"  },
-  { "paycash-online",  { NULL }, 8128,  "udp"  },
-  { "paycash-wbp",     { NULL }, 8129,  "tcp"  },
-  { "paycash-wbp",     { NULL }, 8129,  "udp"  },
-  { "indigo-vrmi",     { NULL }, 8130,  "tcp"  },
-  { "indigo-vrmi",     { NULL }, 8130,  "udp"  },
-  { "indigo-vbcp",     { NULL }, 8131,  "tcp"  },
-  { "indigo-vbcp",     { NULL }, 8131,  "udp"  },
-  { "dbabble",         { NULL }, 8132,  "tcp"  },
-  { "dbabble",         { NULL }, 8132,  "udp"  },
-  { "isdd",            { NULL }, 8148,  "tcp"  },
-  { "isdd",            { NULL }, 8148,  "udp"  },
-  { "patrol",          { NULL }, 8160,  "tcp"  },
-  { "patrol",          { NULL }, 8160,  "udp"  },
-  { "patrol-snmp",     { NULL }, 8161,  "tcp"  },
-  { "patrol-snmp",     { NULL }, 8161,  "udp"  },
-  { "vmware-fdm",      { NULL }, 8182,  "tcp"  },
-  { "vmware-fdm",      { NULL }, 8182,  "udp"  },
-  { "proremote",       { NULL }, 8183,  "tcp"  },
-  { "itach",           { NULL }, 8184,  "tcp"  },
-  { "itach",           { NULL }, 8184,  "udp"  },
-  { "spytechphone",    { NULL }, 8192,  "tcp"  },
-  { "spytechphone",    { NULL }, 8192,  "udp"  },
-  { "blp1",            { NULL }, 8194,  "tcp"  },
-  { "blp1",            { NULL }, 8194,  "udp"  },
-  { "blp2",            { NULL }, 8195,  "tcp"  },
-  { "blp2",            { NULL }, 8195,  "udp"  },
-  { "vvr-data",        { NULL }, 8199,  "tcp"  },
-  { "vvr-data",        { NULL }, 8199,  "udp"  },
-  { "trivnet1",        { NULL }, 8200,  "tcp"  },
-  { "trivnet1",        { NULL }, 8200,  "udp"  },
-  { "trivnet2",        { NULL }, 8201,  "tcp"  },
-  { "trivnet2",        { NULL }, 8201,  "udp"  },
-  { "lm-perfworks",    { NULL }, 8204,  "tcp"  },
-  { "lm-perfworks",    { NULL }, 8204,  "udp"  },
-  { "lm-instmgr",      { NULL }, 8205,  "tcp"  },
-  { "lm-instmgr",      { NULL }, 8205,  "udp"  },
-  { "lm-dta",          { NULL }, 8206,  "tcp"  },
-  { "lm-dta",          { NULL }, 8206,  "udp"  },
-  { "lm-sserver",      { NULL }, 8207,  "tcp"  },
-  { "lm-sserver",      { NULL }, 8207,  "udp"  },
-  { "lm-webwatcher",   { NULL }, 8208,  "tcp"  },
-  { "lm-webwatcher",   { NULL }, 8208,  "udp"  },
-  { "rexecj",          { NULL }, 8230,  "tcp"  },
-  { "rexecj",          { NULL }, 8230,  "udp"  },
-  { "synapse-nhttps",  { NULL }, 8243,  "tcp"  },
-  { "synapse-nhttps",  { NULL }, 8243,  "udp"  },
-  { "pando-sec",       { NULL }, 8276,  "tcp"  },
-  { "pando-sec",       { NULL }, 8276,  "udp"  },
-  { "synapse-nhttp",   { NULL }, 8280,  "tcp"  },
-  { "synapse-nhttp",   { NULL }, 8280,  "udp"  },
-  { "blp3",            { NULL }, 8292,  "tcp"  },
-  { "blp3",            { NULL }, 8292,  "udp"  },
-  { "hiperscan-id",    { NULL }, 8293,  "tcp"  },
-  { "blp4",            { NULL }, 8294,  "tcp"  },
-  { "blp4",            { NULL }, 8294,  "udp"  },
-  { "tmi",             { NULL }, 8300,  "tcp"  },
-  { "tmi",             { NULL }, 8300,  "udp"  },
-  { "amberon",         { NULL }, 8301,  "tcp"  },
-  { "amberon",         { NULL }, 8301,  "udp"  },
-  { "tnp-discover",    { NULL }, 8320,  "tcp"  },
-  { "tnp-discover",    { NULL }, 8320,  "udp"  },
-  { "tnp",             { NULL }, 8321,  "tcp"  },
-  { "tnp",             { NULL }, 8321,  "udp"  },
-  { "server-find",     { NULL }, 8351,  "tcp"  },
-  { "server-find",     { NULL }, 8351,  "udp"  },
-  { "cruise-enum",     { NULL }, 8376,  "tcp"  },
-  { "cruise-enum",     { NULL }, 8376,  "udp"  },
-  { "cruise-swroute",  { NULL }, 8377,  "tcp"  },
-  { "cruise-swroute",  { NULL }, 8377,  "udp"  },
-  { "cruise-config",   { NULL }, 8378,  "tcp"  },
-  { "cruise-config",   { NULL }, 8378,  "udp"  },
-  { "cruise-diags",    { NULL }, 8379,  "tcp"  },
-  { "cruise-diags",    { NULL }, 8379,  "udp"  },
-  { "cruise-update",   { NULL }, 8380,  "tcp"  },
-  { "cruise-update",   { NULL }, 8380,  "udp"  },
-  { "m2mservices",     { NULL }, 8383,  "tcp"  },
-  { "m2mservices",     { NULL }, 8383,  "udp"  },
-  { "cvd",             { NULL }, 8400,  "tcp"  },
-  { "cvd",             { NULL }, 8400,  "udp"  },
-  { "sabarsd",         { NULL }, 8401,  "tcp"  },
-  { "sabarsd",         { NULL }, 8401,  "udp"  },
-  { "abarsd",          { NULL }, 8402,  "tcp"  },
-  { "abarsd",          { NULL }, 8402,  "udp"  },
-  { "admind",          { NULL }, 8403,  "tcp"  },
-  { "admind",          { NULL }, 8403,  "udp"  },
-  { "svcloud",         { NULL }, 8404,  "tcp"  },
-  { "svbackup",        { NULL }, 8405,  "tcp"  },
-  { "espeech",         { NULL }, 8416,  "tcp"  },
-  { "espeech",         { NULL }, 8416,  "udp"  },
-  { "espeech-rtp",     { NULL }, 8417,  "tcp"  },
-  { "espeech-rtp",     { NULL }, 8417,  "udp"  },
-  { "cybro-a-bus",     { NULL }, 8442,  "tcp"  },
-  { "cybro-a-bus",     { NULL }, 8442,  "udp"  },
-  { "pcsync-https",    { NULL }, 8443,  "tcp"  },
-  { "pcsync-https",    { NULL }, 8443,  "udp"  },
-  { "pcsync-http",     { NULL }, 8444,  "tcp"  },
-  { "pcsync-http",     { NULL }, 8444,  "udp"  },
-  { "npmp",            { NULL }, 8450,  "tcp"  },
-  { "npmp",            { NULL }, 8450,  "udp"  },
-  { "cisco-avp",       { NULL }, 8470,  "tcp"  },
-  { "pim-port",        { NULL }, 8471,  "tcp"  },
-  { "pim-port",        { NULL }, 8471,  "sctp" },
-  { "otv",             { NULL }, 8472,  "tcp"  },
-  { "otv",             { NULL }, 8472,  "udp"  },
-  { "vp2p",            { NULL }, 8473,  "tcp"  },
-  { "vp2p",            { NULL }, 8473,  "udp"  },
-  { "noteshare",       { NULL }, 8474,  "tcp"  },
-  { "noteshare",       { NULL }, 8474,  "udp"  },
-  { "fmtp",            { NULL }, 8500,  "tcp"  },
-  { "fmtp",            { NULL }, 8500,  "udp"  },
-  { "rtsp-alt",        { NULL }, 8554,  "tcp"  },
-  { "rtsp-alt",        { NULL }, 8554,  "udp"  },
-  { "d-fence",         { NULL }, 8555,  "tcp"  },
-  { "d-fence",         { NULL }, 8555,  "udp"  },
-  { "oap-admin",       { NULL }, 8567,  "tcp"  },
-  { "oap-admin",       { NULL }, 8567,  "udp"  },
-  { "asterix",         { NULL }, 8600,  "tcp"  },
-  { "asterix",         { NULL }, 8600,  "udp"  },
-  { "canon-mfnp",      { NULL }, 8610,  "tcp"  },
-  { "canon-mfnp",      { NULL }, 8610,  "udp"  },
-  { "canon-bjnp1",     { NULL }, 8611,  "tcp"  },
-  { "canon-bjnp1",     { NULL }, 8611,  "udp"  },
-  { "canon-bjnp2",     { NULL }, 8612,  "tcp"  },
-  { "canon-bjnp2",     { NULL }, 8612,  "udp"  },
-  { "canon-bjnp3",     { NULL }, 8613,  "tcp"  },
-  { "canon-bjnp3",     { NULL }, 8613,  "udp"  },
-  { "canon-bjnp4",     { NULL }, 8614,  "tcp"  },
-  { "canon-bjnp4",     { NULL }, 8614,  "udp"  },
-  { "sun-as-jmxrmi",   { NULL }, 8686,  "tcp"  },
-  { "sun-as-jmxrmi",   { NULL }, 8686,  "udp"  },
-  { "vnyx",            { NULL }, 8699,  "tcp"  },
-  { "vnyx",            { NULL }, 8699,  "udp"  },
-  { "dtp-net",         { NULL }, 8732,  "udp"  },
-  { "ibus",            { NULL }, 8733,  "tcp"  },
-  { "ibus",            { NULL }, 8733,  "udp"  },
-  { "mc-appserver",    { NULL }, 8763,  "tcp"  },
-  { "mc-appserver",    { NULL }, 8763,  "udp"  },
-  { "openqueue",       { NULL }, 8764,  "tcp"  },
-  { "openqueue",       { NULL }, 8764,  "udp"  },
-  { "ultraseek-http",  { NULL }, 8765,  "tcp"  },
-  { "ultraseek-http",  { NULL }, 8765,  "udp"  },
-  { "dpap",            { NULL }, 8770,  "tcp"  },
-  { "dpap",            { NULL }, 8770,  "udp"  },
-  { "msgclnt",         { NULL }, 8786,  "tcp"  },
-  { "msgclnt",         { NULL }, 8786,  "udp"  },
-  { "msgsrvr",         { NULL }, 8787,  "tcp"  },
-  { "msgsrvr",         { NULL }, 8787,  "udp"  },
-  { "sunwebadmin",     { NULL }, 8800,  "tcp"  },
-  { "sunwebadmin",     { NULL }, 8800,  "udp"  },
-  { "truecm",          { NULL }, 8804,  "tcp"  },
-  { "truecm",          { NULL }, 8804,  "udp"  },
-  { "dxspider",        { NULL }, 8873,  "tcp"  },
-  { "dxspider",        { NULL }, 8873,  "udp"  },
-  { "cddbp-alt",       { NULL }, 8880,  "tcp"  },
-  { "cddbp-alt",       { NULL }, 8880,  "udp"  },
-  { "secure-mqtt",     { NULL }, 8883,  "tcp"  },
-  { "secure-mqtt",     { NULL }, 8883,  "udp"  },
-  { "ddi-tcp-1",       { NULL }, 8888,  "tcp"  },
-  { "ddi-udp-1",       { NULL }, 8888,  "udp"  },
-  { "ddi-tcp-2",       { NULL }, 8889,  "tcp"  },
-  { "ddi-udp-2",       { NULL }, 8889,  "udp"  },
-  { "ddi-tcp-3",       { NULL }, 8890,  "tcp"  },
-  { "ddi-udp-3",       { NULL }, 8890,  "udp"  },
-  { "ddi-tcp-4",       { NULL }, 8891,  "tcp"  },
-  { "ddi-udp-4",       { NULL }, 8891,  "udp"  },
-  { "ddi-tcp-5",       { NULL }, 8892,  "tcp"  },
-  { "ddi-udp-5",       { NULL }, 8892,  "udp"  },
-  { "ddi-tcp-6",       { NULL }, 8893,  "tcp"  },
-  { "ddi-udp-6",       { NULL }, 8893,  "udp"  },
-  { "ddi-tcp-7",       { NULL }, 8894,  "tcp"  },
-  { "ddi-udp-7",       { NULL }, 8894,  "udp"  },
-  { "ospf-lite",       { NULL }, 8899,  "tcp"  },
-  { "ospf-lite",       { NULL }, 8899,  "udp"  },
-  { "jmb-cds1",        { NULL }, 8900,  "tcp"  },
-  { "jmb-cds1",        { NULL }, 8900,  "udp"  },
-  { "jmb-cds2",        { NULL }, 8901,  "tcp"  },
-  { "jmb-cds2",        { NULL }, 8901,  "udp"  },
-  { "manyone-http",    { NULL }, 8910,  "tcp"  },
-  { "manyone-http",    { NULL }, 8910,  "udp"  },
-  { "manyone-xml",     { NULL }, 8911,  "tcp"  },
-  { "manyone-xml",     { NULL }, 8911,  "udp"  },
-  { "wcbackup",        { NULL }, 8912,  "tcp"  },
-  { "wcbackup",        { NULL }, 8912,  "udp"  },
-  { "dragonfly",       { NULL }, 8913,  "tcp"  },
-  { "dragonfly",       { NULL }, 8913,  "udp"  },
-  { "twds",            { NULL }, 8937,  "tcp"  },
-  { "cumulus-admin",   { NULL }, 8954,  "tcp"  },
-  { "cumulus-admin",   { NULL }, 8954,  "udp"  },
-  { "sunwebadmins",    { NULL }, 8989,  "tcp"  },
-  { "sunwebadmins",    { NULL }, 8989,  "udp"  },
-  { "http-wmap",       { NULL }, 8990,  "tcp"  },
-  { "http-wmap",       { NULL }, 8990,  "udp"  },
-  { "https-wmap",      { NULL }, 8991,  "tcp"  },
-  { "https-wmap",      { NULL }, 8991,  "udp"  },
-  { "bctp",            { NULL }, 8999,  "tcp"  },
-  { "bctp",            { NULL }, 8999,  "udp"  },
-  { "cslistener",      { NULL }, 9000,  "tcp"  },
-  { "cslistener",      { NULL }, 9000,  "udp"  },
-  { "etlservicemgr",   { NULL }, 9001,  "tcp"  },
-  { "etlservicemgr",   { NULL }, 9001,  "udp"  },
-  { "dynamid",         { NULL }, 9002,  "tcp"  },
-  { "dynamid",         { NULL }, 9002,  "udp"  },
-  { "ogs-client",      { NULL }, 9007,  "udp"  },
-  { "ogs-server",      { NULL }, 9008,  "tcp"  },
-  { "pichat",          { NULL }, 9009,  "tcp"  },
-  { "pichat",          { NULL }, 9009,  "udp"  },
-  { "sdr",             { NULL }, 9010,  "tcp"  },
-  { "tambora",         { NULL }, 9020,  "tcp"  },
-  { "tambora",         { NULL }, 9020,  "udp"  },
-  { "panagolin-ident", { NULL }, 9021,  "tcp"  },
-  { "panagolin-ident", { NULL }, 9021,  "udp"  },
-  { "paragent",        { NULL }, 9022,  "tcp"  },
-  { "paragent",        { NULL }, 9022,  "udp"  },
-  { "swa-1",           { NULL }, 9023,  "tcp"  },
-  { "swa-1",           { NULL }, 9023,  "udp"  },
-  { "swa-2",           { NULL }, 9024,  "tcp"  },
-  { "swa-2",           { NULL }, 9024,  "udp"  },
-  { "swa-3",           { NULL }, 9025,  "tcp"  },
-  { "swa-3",           { NULL }, 9025,  "udp"  },
-  { "swa-4",           { NULL }, 9026,  "tcp"  },
-  { "swa-4",           { NULL }, 9026,  "udp"  },
-  { "versiera",        { NULL }, 9050,  "tcp"  },
-  { "fio-cmgmt",       { NULL }, 9051,  "tcp"  },
-  { "glrpc",           { NULL }, 9080,  "tcp"  },
-  { "glrpc",           { NULL }, 9080,  "udp"  },
-  { "lcs-ap",          { NULL }, 9082,  "sctp" },
-  { "emc-pp-mgmtsvc",  { NULL }, 9083,  "tcp"  },
-  { "aurora",          { NULL }, 9084,  "tcp"  },
-  { "aurora",          { NULL }, 9084,  "udp"  },
-  { "aurora",          { NULL }, 9084,  "sctp" },
-  { "ibm-rsyscon",     { NULL }, 9085,  "tcp"  },
-  { "ibm-rsyscon",     { NULL }, 9085,  "udp"  },
-  { "net2display",     { NULL }, 9086,  "tcp"  },
-  { "net2display",     { NULL }, 9086,  "udp"  },
-  { "classic",         { NULL }, 9087,  "tcp"  },
-  { "classic",         { NULL }, 9087,  "udp"  },
-  { "sqlexec",         { NULL }, 9088,  "tcp"  },
-  { "sqlexec",         { NULL }, 9088,  "udp"  },
-  { "sqlexec-ssl",     { NULL }, 9089,  "tcp"  },
-  { "sqlexec-ssl",     { NULL }, 9089,  "udp"  },
-  { "websm",           { NULL }, 9090,  "tcp"  },
-  { "websm",           { NULL }, 9090,  "udp"  },
-  { "xmltec-xmlmail",  { NULL }, 9091,  "tcp"  },
-  { "xmltec-xmlmail",  { NULL }, 9091,  "udp"  },
-  { "XmlIpcRegSvc",    { NULL }, 9092,  "tcp"  },
-  { "XmlIpcRegSvc",    { NULL }, 9092,  "udp"  },
-  { "hp-pdl-datastr",  { NULL }, 9100,  "tcp"  },
-  { "hp-pdl-datastr",  { NULL }, 9100,  "udp"  },
-  { "pdl-datastream",  { NULL }, 9100,  "tcp"  },
-  { "pdl-datastream",  { NULL }, 9100,  "udp"  },
-  { "bacula-dir",      { NULL }, 9101,  "tcp"  },
-  { "bacula-dir",      { NULL }, 9101,  "udp"  },
-  { "bacula-fd",       { NULL }, 9102,  "tcp"  },
-  { "bacula-fd",       { NULL }, 9102,  "udp"  },
-  { "bacula-sd",       { NULL }, 9103,  "tcp"  },
-  { "bacula-sd",       { NULL }, 9103,  "udp"  },
-  { "peerwire",        { NULL }, 9104,  "tcp"  },
-  { "peerwire",        { NULL }, 9104,  "udp"  },
-  { "xadmin",          { NULL }, 9105,  "tcp"  },
-  { "xadmin",          { NULL }, 9105,  "udp"  },
-  { "astergate",       { NULL }, 9106,  "tcp"  },
-  { "astergate-disc",  { NULL }, 9106,  "udp"  },
-  { "astergatefax",    { NULL }, 9107,  "tcp"  },
-  { "mxit",            { NULL }, 9119,  "tcp"  },
-  { "mxit",            { NULL }, 9119,  "udp"  },
-  { "dddp",            { NULL }, 9131,  "tcp"  },
-  { "dddp",            { NULL }, 9131,  "udp"  },
-  { "apani1",          { NULL }, 9160,  "tcp"  },
-  { "apani1",          { NULL }, 9160,  "udp"  },
-  { "apani2",          { NULL }, 9161,  "tcp"  },
-  { "apani2",          { NULL }, 9161,  "udp"  },
-  { "apani3",          { NULL }, 9162,  "tcp"  },
-  { "apani3",          { NULL }, 9162,  "udp"  },
-  { "apani4",          { NULL }, 9163,  "tcp"  },
-  { "apani4",          { NULL }, 9163,  "udp"  },
-  { "apani5",          { NULL }, 9164,  "tcp"  },
-  { "apani5",          { NULL }, 9164,  "udp"  },
-  { "sun-as-jpda",     { NULL }, 9191,  "tcp"  },
-  { "sun-as-jpda",     { NULL }, 9191,  "udp"  },
-  { "wap-wsp",         { NULL }, 9200,  "tcp"  },
-  { "wap-wsp",         { NULL }, 9200,  "udp"  },
-  { "wap-wsp-wtp",     { NULL }, 9201,  "tcp"  },
-  { "wap-wsp-wtp",     { NULL }, 9201,  "udp"  },
-  { "wap-wsp-s",       { NULL }, 9202,  "tcp"  },
-  { "wap-wsp-s",       { NULL }, 9202,  "udp"  },
-  { "wap-wsp-wtp-s",   { NULL }, 9203,  "tcp"  },
-  { "wap-wsp-wtp-s",   { NULL }, 9203,  "udp"  },
-  { "wap-vcard",       { NULL }, 9204,  "tcp"  },
-  { "wap-vcard",       { NULL }, 9204,  "udp"  },
-  { "wap-vcal",        { NULL }, 9205,  "tcp"  },
-  { "wap-vcal",        { NULL }, 9205,  "udp"  },
-  { "wap-vcard-s",     { NULL }, 9206,  "tcp"  },
-  { "wap-vcard-s",     { NULL }, 9206,  "udp"  },
-  { "wap-vcal-s",      { NULL }, 9207,  "tcp"  },
-  { "wap-vcal-s",      { NULL }, 9207,  "udp"  },
-  { "rjcdb-vcards",    { NULL }, 9208,  "tcp"  },
-  { "rjcdb-vcards",    { NULL }, 9208,  "udp"  },
-  { "almobile-system", { NULL }, 9209,  "tcp"  },
-  { "almobile-system", { NULL }, 9209,  "udp"  },
-  { "oma-mlp",         { NULL }, 9210,  "tcp"  },
-  { "oma-mlp",         { NULL }, 9210,  "udp"  },
-  { "oma-mlp-s",       { NULL }, 9211,  "tcp"  },
-  { "oma-mlp-s",       { NULL }, 9211,  "udp"  },
-  { "serverviewdbms",  { NULL }, 9212,  "tcp"  },
-  { "serverviewdbms",  { NULL }, 9212,  "udp"  },
-  { "serverstart",     { NULL }, 9213,  "tcp"  },
-  { "serverstart",     { NULL }, 9213,  "udp"  },
-  { "ipdcesgbs",       { NULL }, 9214,  "tcp"  },
-  { "ipdcesgbs",       { NULL }, 9214,  "udp"  },
-  { "insis",           { NULL }, 9215,  "tcp"  },
-  { "insis",           { NULL }, 9215,  "udp"  },
-  { "acme",            { NULL }, 9216,  "tcp"  },
-  { "acme",            { NULL }, 9216,  "udp"  },
-  { "fsc-port",        { NULL }, 9217,  "tcp"  },
-  { "fsc-port",        { NULL }, 9217,  "udp"  },
-  { "teamcoherence",   { NULL }, 9222,  "tcp"  },
-  { "teamcoherence",   { NULL }, 9222,  "udp"  },
-  { "mon",             { NULL }, 9255,  "tcp"  },
-  { "mon",             { NULL }, 9255,  "udp"  },
-  { "pegasus",         { NULL }, 9278,  "tcp"  },
-  { "pegasus",         { NULL }, 9278,  "udp"  },
-  { "pegasus-ctl",     { NULL }, 9279,  "tcp"  },
-  { "pegasus-ctl",     { NULL }, 9279,  "udp"  },
-  { "pgps",            { NULL }, 9280,  "tcp"  },
-  { "pgps",            { NULL }, 9280,  "udp"  },
-  { "swtp-port1",      { NULL }, 9281,  "tcp"  },
-  { "swtp-port1",      { NULL }, 9281,  "udp"  },
-  { "swtp-port2",      { NULL }, 9282,  "tcp"  },
-  { "swtp-port2",      { NULL }, 9282,  "udp"  },
-  { "callwaveiam",     { NULL }, 9283,  "tcp"  },
-  { "callwaveiam",     { NULL }, 9283,  "udp"  },
-  { "visd",            { NULL }, 9284,  "tcp"  },
-  { "visd",            { NULL }, 9284,  "udp"  },
-  { "n2h2server",      { NULL }, 9285,  "tcp"  },
-  { "n2h2server",      { NULL }, 9285,  "udp"  },
-  { "n2receive",       { NULL }, 9286,  "udp"  },
-  { "cumulus",         { NULL }, 9287,  "tcp"  },
-  { "cumulus",         { NULL }, 9287,  "udp"  },
-  { "armtechdaemon",   { NULL }, 9292,  "tcp"  },
-  { "armtechdaemon",   { NULL }, 9292,  "udp"  },
-  { "storview",        { NULL }, 9293,  "tcp"  },
-  { "storview",        { NULL }, 9293,  "udp"  },
-  { "armcenterhttp",   { NULL }, 9294,  "tcp"  },
-  { "armcenterhttp",   { NULL }, 9294,  "udp"  },
-  { "armcenterhttps",  { NULL }, 9295,  "tcp"  },
-  { "armcenterhttps",  { NULL }, 9295,  "udp"  },
-  { "vrace",           { NULL }, 9300,  "tcp"  },
-  { "vrace",           { NULL }, 9300,  "udp"  },
-  { "sphinxql",        { NULL }, 9306,  "tcp"  },
-  { "sphinxapi",       { NULL }, 9312,  "tcp"  },
-  { "secure-ts",       { NULL }, 9318,  "tcp"  },
-  { "secure-ts",       { NULL }, 9318,  "udp"  },
-  { "guibase",         { NULL }, 9321,  "tcp"  },
-  { "guibase",         { NULL }, 9321,  "udp"  },
-  { "mpidcmgr",        { NULL }, 9343,  "tcp"  },
-  { "mpidcmgr",        { NULL }, 9343,  "udp"  },
-  { "mphlpdmc",        { NULL }, 9344,  "tcp"  },
-  { "mphlpdmc",        { NULL }, 9344,  "udp"  },
-  { "ctechlicensing",  { NULL }, 9346,  "tcp"  },
-  { "ctechlicensing",  { NULL }, 9346,  "udp"  },
-  { "fjdmimgr",        { NULL }, 9374,  "tcp"  },
-  { "fjdmimgr",        { NULL }, 9374,  "udp"  },
-  { "boxp",            { NULL }, 9380,  "tcp"  },
-  { "boxp",            { NULL }, 9380,  "udp"  },
-  { "d2dconfig",       { NULL }, 9387,  "tcp"  },
-  { "d2ddatatrans",    { NULL }, 9388,  "tcp"  },
-  { "adws",            { NULL }, 9389,  "tcp"  },
-  { "otp",             { NULL }, 9390,  "tcp"  },
-  { "fjinvmgr",        { NULL }, 9396,  "tcp"  },
-  { "fjinvmgr",        { NULL }, 9396,  "udp"  },
-  { "mpidcagt",        { NULL }, 9397,  "tcp"  },
-  { "mpidcagt",        { NULL }, 9397,  "udp"  },
-  { "sec-t4net-srv",   { NULL }, 9400,  "tcp"  },
-  { "sec-t4net-srv",   { NULL }, 9400,  "udp"  },
-  { "sec-t4net-clt",   { NULL }, 9401,  "tcp"  },
-  { "sec-t4net-clt",   { NULL }, 9401,  "udp"  },
-  { "sec-pc2fax-srv",  { NULL }, 9402,  "tcp"  },
-  { "sec-pc2fax-srv",  { NULL }, 9402,  "udp"  },
-  { "git",             { NULL }, 9418,  "tcp"  },
-  { "git",             { NULL }, 9418,  "udp"  },
-  { "tungsten-https",  { NULL }, 9443,  "tcp"  },
-  { "tungsten-https",  { NULL }, 9443,  "udp"  },
-  { "wso2esb-console", { NULL }, 9444,  "tcp"  },
-  { "wso2esb-console", { NULL }, 9444,  "udp"  },
-  { "sntlkeyssrvr",    { NULL }, 9450,  "tcp"  },
-  { "sntlkeyssrvr",    { NULL }, 9450,  "udp"  },
-  { "ismserver",       { NULL }, 9500,  "tcp"  },
-  { "ismserver",       { NULL }, 9500,  "udp"  },
-  { "sma-spw",         { NULL }, 9522,  "udp"  },
-  { "mngsuite",        { NULL }, 9535,  "tcp"  },
-  { "mngsuite",        { NULL }, 9535,  "udp"  },
-  { "laes-bf",         { NULL }, 9536,  "tcp"  },
-  { "laes-bf",         { NULL }, 9536,  "udp"  },
-  { "trispen-sra",     { NULL }, 9555,  "tcp"  },
-  { "trispen-sra",     { NULL }, 9555,  "udp"  },
-  { "ldgateway",       { NULL }, 9592,  "tcp"  },
-  { "ldgateway",       { NULL }, 9592,  "udp"  },
-  { "cba8",            { NULL }, 9593,  "tcp"  },
-  { "cba8",            { NULL }, 9593,  "udp"  },
-  { "msgsys",          { NULL }, 9594,  "tcp"  },
-  { "msgsys",          { NULL }, 9594,  "udp"  },
-  { "pds",             { NULL }, 9595,  "tcp"  },
-  { "pds",             { NULL }, 9595,  "udp"  },
-  { "mercury-disc",    { NULL }, 9596,  "tcp"  },
-  { "mercury-disc",    { NULL }, 9596,  "udp"  },
-  { "pd-admin",        { NULL }, 9597,  "tcp"  },
-  { "pd-admin",        { NULL }, 9597,  "udp"  },
-  { "vscp",            { NULL }, 9598,  "tcp"  },
-  { "vscp",            { NULL }, 9598,  "udp"  },
-  { "robix",           { NULL }, 9599,  "tcp"  },
-  { "robix",           { NULL }, 9599,  "udp"  },
-  { "micromuse-ncpw",  { NULL }, 9600,  "tcp"  },
-  { "micromuse-ncpw",  { NULL }, 9600,  "udp"  },
-  { "streamcomm-ds",   { NULL }, 9612,  "tcp"  },
-  { "streamcomm-ds",   { NULL }, 9612,  "udp"  },
-  { "iadt-tls",        { NULL }, 9614,  "tcp"  },
-  { "erunbook_agent",  { NULL }, 9616,  "tcp"  },
-  { "erunbook_server", { NULL }, 9617,  "tcp"  },
-  { "condor",          { NULL }, 9618,  "tcp"  },
-  { "condor",          { NULL }, 9618,  "udp"  },
-  { "odbcpathway",     { NULL }, 9628,  "tcp"  },
-  { "odbcpathway",     { NULL }, 9628,  "udp"  },
-  { "uniport",         { NULL }, 9629,  "tcp"  },
-  { "uniport",         { NULL }, 9629,  "udp"  },
-  { "peoctlr",         { NULL }, 9630,  "tcp"  },
-  { "peocoll",         { NULL }, 9631,  "tcp"  },
-  { "mc-comm",         { NULL }, 9632,  "udp"  },
-  { "pqsflows",        { NULL }, 9640,  "tcp"  },
-  { "xmms2",           { NULL }, 9667,  "tcp"  },
-  { "xmms2",           { NULL }, 9667,  "udp"  },
-  { "tec5-sdctp",      { NULL }, 9668,  "tcp"  },
-  { "tec5-sdctp",      { NULL }, 9668,  "udp"  },
-  { "client-wakeup",   { NULL }, 9694,  "tcp"  },
-  { "client-wakeup",   { NULL }, 9694,  "udp"  },
-  { "ccnx",            { NULL }, 9695,  "tcp"  },
-  { "ccnx",            { NULL }, 9695,  "udp"  },
-  { "board-roar",      { NULL }, 9700,  "tcp"  },
-  { "board-roar",      { NULL }, 9700,  "udp"  },
-  { "l5nas-parchan",   { NULL }, 9747,  "tcp"  },
-  { "l5nas-parchan",   { NULL }, 9747,  "udp"  },
-  { "board-voip",      { NULL }, 9750,  "tcp"  },
-  { "board-voip",      { NULL }, 9750,  "udp"  },
-  { "rasadv",          { NULL }, 9753,  "tcp"  },
-  { "rasadv",          { NULL }, 9753,  "udp"  },
-  { "tungsten-http",   { NULL }, 9762,  "tcp"  },
-  { "tungsten-http",   { NULL }, 9762,  "udp"  },
-  { "davsrc",          { NULL }, 9800,  "tcp"  },
-  { "davsrc",          { NULL }, 9800,  "udp"  },
-  { "sstp-2",          { NULL }, 9801,  "tcp"  },
-  { "sstp-2",          { NULL }, 9801,  "udp"  },
-  { "davsrcs",         { NULL }, 9802,  "tcp"  },
-  { "davsrcs",         { NULL }, 9802,  "udp"  },
-  { "sapv1",           { NULL }, 9875,  "tcp"  },
-  { "sapv1",           { NULL }, 9875,  "udp"  },
-  { "sd",              { NULL }, 9876,  "tcp"  },
-  { "sd",              { NULL }, 9876,  "udp"  },
-  { "cyborg-systems",  { NULL }, 9888,  "tcp"  },
-  { "cyborg-systems",  { NULL }, 9888,  "udp"  },
-  { "gt-proxy",        { NULL }, 9889,  "tcp"  },
-  { "gt-proxy",        { NULL }, 9889,  "udp"  },
-  { "monkeycom",       { NULL }, 9898,  "tcp"  },
-  { "monkeycom",       { NULL }, 9898,  "udp"  },
-  { "sctp-tunneling",  { NULL }, 9899,  "tcp"  },
-  { "sctp-tunneling",  { NULL }, 9899,  "udp"  },
-  { "iua",             { NULL }, 9900,  "tcp"  },
-  { "iua",             { NULL }, 9900,  "udp"  },
-  { "iua",             { NULL }, 9900,  "sctp" },
-  { "enrp",            { NULL }, 9901,  "udp"  },
-  { "enrp-sctp",       { NULL }, 9901,  "sctp" },
-  { "enrp-sctp-tls",   { NULL }, 9902,  "sctp" },
-  { "domaintime",      { NULL }, 9909,  "tcp"  },
-  { "domaintime",      { NULL }, 9909,  "udp"  },
-  { "sype-transport",  { NULL }, 9911,  "tcp"  },
-  { "sype-transport",  { NULL }, 9911,  "udp"  },
-  { "apc-9950",        { NULL }, 9950,  "tcp"  },
-  { "apc-9950",        { NULL }, 9950,  "udp"  },
-  { "apc-9951",        { NULL }, 9951,  "tcp"  },
-  { "apc-9951",        { NULL }, 9951,  "udp"  },
-  { "apc-9952",        { NULL }, 9952,  "tcp"  },
-  { "apc-9952",        { NULL }, 9952,  "udp"  },
-  { "acis",            { NULL }, 9953,  "tcp"  },
-  { "acis",            { NULL }, 9953,  "udp"  },
-  { "odnsp",           { NULL }, 9966,  "tcp"  },
-  { "odnsp",           { NULL }, 9966,  "udp"  },
-  { "dsm-scm-target",  { NULL }, 9987,  "tcp"  },
-  { "dsm-scm-target",  { NULL }, 9987,  "udp"  },
-  { "nsesrvr",         { NULL }, 9988,  "tcp"  },
-  { "osm-appsrvr",     { NULL }, 9990,  "tcp"  },
-  { "osm-appsrvr",     { NULL }, 9990,  "udp"  },
-  { "osm-oev",         { NULL }, 9991,  "tcp"  },
-  { "osm-oev",         { NULL }, 9991,  "udp"  },
-  { "palace-1",        { NULL }, 9992,  "tcp"  },
-  { "palace-1",        { NULL }, 9992,  "udp"  },
-  { "palace-2",        { NULL }, 9993,  "tcp"  },
-  { "palace-2",        { NULL }, 9993,  "udp"  },
-  { "palace-3",        { NULL }, 9994,  "tcp"  },
-  { "palace-3",        { NULL }, 9994,  "udp"  },
-  { "palace-4",        { NULL }, 9995,  "tcp"  },
-  { "palace-4",        { NULL }, 9995,  "udp"  },
-  { "palace-5",        { NULL }, 9996,  "tcp"  },
-  { "palace-5",        { NULL }, 9996,  "udp"  },
-  { "palace-6",        { NULL }, 9997,  "tcp"  },
-  { "palace-6",        { NULL }, 9997,  "udp"  },
-  { "distinct32",      { NULL }, 9998,  "tcp"  },
-  { "distinct32",      { NULL }, 9998,  "udp"  },
-  { "distinct",        { NULL }, 9999,  "tcp"  },
-  { "distinct",        { NULL }, 9999,  "udp"  },
-  { "ndmp",            { NULL }, 10000, "tcp"  },
-  { "ndmp",            { NULL }, 10000, "udp"  },
-  { "scp-config",      { NULL }, 10001, "tcp"  },
-  { "scp-config",      { NULL }, 10001, "udp"  },
-  { "documentum",      { NULL }, 10002, "tcp"  },
-  { "documentum",      { NULL }, 10002, "udp"  },
-  { "documentum_s",    { NULL }, 10003, "tcp"  },
-  { "documentum_s",    { NULL }, 10003, "udp"  },
-  { "emcrmirccd",      { NULL }, 10004, "tcp"  },
-  { "emcrmird",        { NULL }, 10005, "tcp"  },
-  { "mvs-capacity",    { NULL }, 10007, "tcp"  },
-  { "mvs-capacity",    { NULL }, 10007, "udp"  },
-  { "octopus",         { NULL }, 10008, "tcp"  },
-  { "octopus",         { NULL }, 10008, "udp"  },
-  { "swdtp-sv",        { NULL }, 10009, "tcp"  },
-  { "swdtp-sv",        { NULL }, 10009, "udp"  },
-  { "rxapi",           { NULL }, 10010, "tcp"  },
-  { "zabbix-agent",    { NULL }, 10050, "tcp"  },
-  { "zabbix-agent",    { NULL }, 10050, "udp"  },
-  { "zabbix-trapper",  { NULL }, 10051, "tcp"  },
-  { "zabbix-trapper",  { NULL }, 10051, "udp"  },
-  { "qptlmd",          { NULL }, 10055, "tcp"  },
-  { "amanda",          { NULL }, 10080, "tcp"  },
-  { "amanda",          { NULL }, 10080, "udp"  },
-  { "famdc",           { NULL }, 10081, "tcp"  },
-  { "famdc",           { NULL }, 10081, "udp"  },
-  { "itap-ddtp",       { NULL }, 10100, "tcp"  },
-  { "itap-ddtp",       { NULL }, 10100, "udp"  },
-  { "ezmeeting-2",     { NULL }, 10101, "tcp"  },
-  { "ezmeeting-2",     { NULL }, 10101, "udp"  },
-  { "ezproxy-2",       { NULL }, 10102, "tcp"  },
-  { "ezproxy-2",       { NULL }, 10102, "udp"  },
-  { "ezrelay",         { NULL }, 10103, "tcp"  },
-  { "ezrelay",         { NULL }, 10103, "udp"  },
-  { "swdtp",           { NULL }, 10104, "tcp"  },
-  { "swdtp",           { NULL }, 10104, "udp"  },
-  { "bctp-server",     { NULL }, 10107, "tcp"  },
-  { "bctp-server",     { NULL }, 10107, "udp"  },
-  { "nmea-0183",       { NULL }, 10110, "tcp"  },
-  { "nmea-0183",       { NULL }, 10110, "udp"  },
-  { "netiq-endpoint",  { NULL }, 10113, "tcp"  },
-  { "netiq-endpoint",  { NULL }, 10113, "udp"  },
-  { "netiq-qcheck",    { NULL }, 10114, "tcp"  },
-  { "netiq-qcheck",    { NULL }, 10114, "udp"  },
-  { "netiq-endpt",     { NULL }, 10115, "tcp"  },
-  { "netiq-endpt",     { NULL }, 10115, "udp"  },
-  { "netiq-voipa",     { NULL }, 10116, "tcp"  },
-  { "netiq-voipa",     { NULL }, 10116, "udp"  },
-  { "iqrm",            { NULL }, 10117, "tcp"  },
-  { "iqrm",            { NULL }, 10117, "udp"  },
-  { "bmc-perf-sd",     { NULL }, 10128, "tcp"  },
-  { "bmc-perf-sd",     { NULL }, 10128, "udp"  },
-  { "bmc-gms",         { NULL }, 10129, "tcp"  },
-  { "qb-db-server",    { NULL }, 10160, "tcp"  },
-  { "qb-db-server",    { NULL }, 10160, "udp"  },
-  { "snmptls",         { NULL }, 10161, "tcp"  },
-  { "snmpdtls",        { NULL }, 10161, "udp"  },
-  { "snmptls-trap",    { NULL }, 10162, "tcp"  },
-  { "snmpdtls-trap",   { NULL }, 10162, "udp"  },
-  { "trisoap",         { NULL }, 10200, "tcp"  },
-  { "trisoap",         { NULL }, 10200, "udp"  },
-  { "rsms",            { NULL }, 10201, "tcp"  },
-  { "rscs",            { NULL }, 10201, "udp"  },
-  { "apollo-relay",    { NULL }, 10252, "tcp"  },
-  { "apollo-relay",    { NULL }, 10252, "udp"  },
-  { "axis-wimp-port",  { NULL }, 10260, "tcp"  },
-  { "axis-wimp-port",  { NULL }, 10260, "udp"  },
-  { "blocks",          { NULL }, 10288, "tcp"  },
-  { "blocks",          { NULL }, 10288, "udp"  },
-  { "cosir",           { NULL }, 10321, "tcp"  },
-  { "hip-nat-t",       { NULL }, 10500, "udp"  },
-  { "MOS-lower",       { NULL }, 10540, "tcp"  },
-  { "MOS-lower",       { NULL }, 10540, "udp"  },
-  { "MOS-upper",       { NULL }, 10541, "tcp"  },
-  { "MOS-upper",       { NULL }, 10541, "udp"  },
-  { "MOS-aux",         { NULL }, 10542, "tcp"  },
-  { "MOS-aux",         { NULL }, 10542, "udp"  },
-  { "MOS-soap",        { NULL }, 10543, "tcp"  },
-  { "MOS-soap",        { NULL }, 10543, "udp"  },
-  { "MOS-soap-opt",    { NULL }, 10544, "tcp"  },
-  { "MOS-soap-opt",    { NULL }, 10544, "udp"  },
-  { "gap",             { NULL }, 10800, "tcp"  },
-  { "gap",             { NULL }, 10800, "udp"  },
-  { "lpdg",            { NULL }, 10805, "tcp"  },
-  { "lpdg",            { NULL }, 10805, "udp"  },
-  { "nbd",             { NULL }, 10809, "tcp"  },
-  { "nmc-disc",        { NULL }, 10810, "udp"  },
-  { "helix",           { NULL }, 10860, "tcp"  },
-  { "helix",           { NULL }, 10860, "udp"  },
-  { "rmiaux",          { NULL }, 10990, "tcp"  },
-  { "rmiaux",          { NULL }, 10990, "udp"  },
-  { "irisa",           { NULL }, 11000, "tcp"  },
-  { "irisa",           { NULL }, 11000, "udp"  },
-  { "metasys",         { NULL }, 11001, "tcp"  },
-  { "metasys",         { NULL }, 11001, "udp"  },
-  { "netapp-icmgmt",   { NULL }, 11104, "tcp"  },
-  { "netapp-icdata",   { NULL }, 11105, "tcp"  },
-  { "sgi-lk",          { NULL }, 11106, "tcp"  },
-  { "sgi-lk",          { NULL }, 11106, "udp"  },
-  { "vce",             { NULL }, 11111, "tcp"  },
-  { "vce",             { NULL }, 11111, "udp"  },
-  { "dicom",           { NULL }, 11112, "tcp"  },
-  { "dicom",           { NULL }, 11112, "udp"  },
-  { "suncacao-snmp",   { NULL }, 11161, "tcp"  },
-  { "suncacao-snmp",   { NULL }, 11161, "udp"  },
-  { "suncacao-jmxmp",  { NULL }, 11162, "tcp"  },
-  { "suncacao-jmxmp",  { NULL }, 11162, "udp"  },
-  { "suncacao-rmi",    { NULL }, 11163, "tcp"  },
-  { "suncacao-rmi",    { NULL }, 11163, "udp"  },
-  { "suncacao-csa",    { NULL }, 11164, "tcp"  },
-  { "suncacao-csa",    { NULL }, 11164, "udp"  },
-  { "suncacao-websvc", { NULL }, 11165, "tcp"  },
-  { "suncacao-websvc", { NULL }, 11165, "udp"  },
-  { "snss",            { NULL }, 11171, "udp"  },
-  { "oemcacao-jmxmp",  { NULL }, 11172, "tcp"  },
-  { "oemcacao-rmi",    { NULL }, 11174, "tcp"  },
-  { "oemcacao-websvc", { NULL }, 11175, "tcp"  },
-  { "smsqp",           { NULL }, 11201, "tcp"  },
-  { "smsqp",           { NULL }, 11201, "udp"  },
-  { "wifree",          { NULL }, 11208, "tcp"  },
-  { "wifree",          { NULL }, 11208, "udp"  },
-  { "memcache",        { NULL }, 11211, "tcp"  },
-  { "memcache",        { NULL }, 11211, "udp"  },
-  { "imip",            { NULL }, 11319, "tcp"  },
-  { "imip",            { NULL }, 11319, "udp"  },
-  { "imip-channels",   { NULL }, 11320, "tcp"  },
-  { "imip-channels",   { NULL }, 11320, "udp"  },
-  { "arena-server",    { NULL }, 11321, "tcp"  },
-  { "arena-server",    { NULL }, 11321, "udp"  },
-  { "atm-uhas",        { NULL }, 11367, "tcp"  },
-  { "atm-uhas",        { NULL }, 11367, "udp"  },
-  { "hkp",             { NULL }, 11371, "tcp"  },
-  { "hkp",             { NULL }, 11371, "udp"  },
-  { "asgcypresstcps",  { NULL }, 11489, "tcp"  },
-  { "tempest-port",    { NULL }, 11600, "tcp"  },
-  { "tempest-port",    { NULL }, 11600, "udp"  },
-  { "h323callsigalt",  { NULL }, 11720, "tcp"  },
-  { "h323callsigalt",  { NULL }, 11720, "udp"  },
-  { "intrepid-ssl",    { NULL }, 11751, "tcp"  },
-  { "intrepid-ssl",    { NULL }, 11751, "udp"  },
-  { "xoraya",          { NULL }, 11876, "tcp"  },
-  { "xoraya",          { NULL }, 11876, "udp"  },
-  { "x2e-disc",        { NULL }, 11877, "udp"  },
-  { "sysinfo-sp",      { NULL }, 11967, "tcp"  },
-  { "sysinfo-sp",      { NULL }, 11967, "udp"  },
-  { "wmereceiving",    { NULL }, 11997, "sctp" },
-  { "wmedistribution", { NULL }, 11998, "sctp" },
-  { "wmereporting",    { NULL }, 11999, "sctp" },
-  { "entextxid",       { NULL }, 12000, "tcp"  },
-  { "entextxid",       { NULL }, 12000, "udp"  },
-  { "entextnetwk",     { NULL }, 12001, "tcp"  },
-  { "entextnetwk",     { NULL }, 12001, "udp"  },
-  { "entexthigh",      { NULL }, 12002, "tcp"  },
-  { "entexthigh",      { NULL }, 12002, "udp"  },
-  { "entextmed",       { NULL }, 12003, "tcp"  },
-  { "entextmed",       { NULL }, 12003, "udp"  },
-  { "entextlow",       { NULL }, 12004, "tcp"  },
-  { "entextlow",       { NULL }, 12004, "udp"  },
-  { "dbisamserver1",   { NULL }, 12005, "tcp"  },
-  { "dbisamserver1",   { NULL }, 12005, "udp"  },
-  { "dbisamserver2",   { NULL }, 12006, "tcp"  },
-  { "dbisamserver2",   { NULL }, 12006, "udp"  },
-  { "accuracer",       { NULL }, 12007, "tcp"  },
-  { "accuracer",       { NULL }, 12007, "udp"  },
-  { "accuracer-dbms",  { NULL }, 12008, "tcp"  },
-  { "accuracer-dbms",  { NULL }, 12008, "udp"  },
-  { "edbsrvr",         { NULL }, 12010, "tcp"  },
-  { "vipera",          { NULL }, 12012, "tcp"  },
-  { "vipera",          { NULL }, 12012, "udp"  },
-  { "vipera-ssl",      { NULL }, 12013, "tcp"  },
-  { "vipera-ssl",      { NULL }, 12013, "udp"  },
-  { "rets-ssl",        { NULL }, 12109, "tcp"  },
-  { "rets-ssl",        { NULL }, 12109, "udp"  },
-  { "nupaper-ss",      { NULL }, 12121, "tcp"  },
-  { "nupaper-ss",      { NULL }, 12121, "udp"  },
-  { "cawas",           { NULL }, 12168, "tcp"  },
-  { "cawas",           { NULL }, 12168, "udp"  },
-  { "hivep",           { NULL }, 12172, "tcp"  },
-  { "hivep",           { NULL }, 12172, "udp"  },
-  { "linogridengine",  { NULL }, 12300, "tcp"  },
-  { "linogridengine",  { NULL }, 12300, "udp"  },
-  { "warehouse-sss",   { NULL }, 12321, "tcp"  },
-  { "warehouse-sss",   { NULL }, 12321, "udp"  },
-  { "warehouse",       { NULL }, 12322, "tcp"  },
-  { "warehouse",       { NULL }, 12322, "udp"  },
-  { "italk",           { NULL }, 12345, "tcp"  },
-  { "italk",           { NULL }, 12345, "udp"  },
-  { "tsaf",            { NULL }, 12753, "tcp"  },
-  { "tsaf",            { NULL }, 12753, "udp"  },
-  { "i-zipqd",         { NULL }, 13160, "tcp"  },
-  { "i-zipqd",         { NULL }, 13160, "udp"  },
-  { "bcslogc",         { NULL }, 13216, "tcp"  },
-  { "bcslogc",         { NULL }, 13216, "udp"  },
-  { "rs-pias",         { NULL }, 13217, "tcp"  },
-  { "rs-pias",         { NULL }, 13217, "udp"  },
-  { "emc-vcas-tcp",    { NULL }, 13218, "tcp"  },
-  { "emc-vcas-udp",    { NULL }, 13218, "udp"  },
-  { "powwow-client",   { NULL }, 13223, "tcp"  },
-  { "powwow-client",   { NULL }, 13223, "udp"  },
-  { "powwow-server",   { NULL }, 13224, "tcp"  },
-  { "powwow-server",   { NULL }, 13224, "udp"  },
-  { "doip-data",       { NULL }, 13400, "tcp"  },
-  { "doip-disc",       { NULL }, 13400, "udp"  },
-  { "bprd",            { NULL }, 13720, "tcp"  },
-  { "bprd",            { NULL }, 13720, "udp"  },
-  { "bpdbm",           { NULL }, 13721, "tcp"  },
-  { "bpdbm",           { NULL }, 13721, "udp"  },
-  { "bpjava-msvc",     { NULL }, 13722, "tcp"  },
-  { "bpjava-msvc",     { NULL }, 13722, "udp"  },
-  { "vnetd",           { NULL }, 13724, "tcp"  },
-  { "vnetd",           { NULL }, 13724, "udp"  },
-  { "bpcd",            { NULL }, 13782, "tcp"  },
-  { "bpcd",            { NULL }, 13782, "udp"  },
-  { "vopied",          { NULL }, 13783, "tcp"  },
-  { "vopied",          { NULL }, 13783, "udp"  },
-  { "nbdb",            { NULL }, 13785, "tcp"  },
-  { "nbdb",            { NULL }, 13785, "udp"  },
-  { "nomdb",           { NULL }, 13786, "tcp"  },
-  { "nomdb",           { NULL }, 13786, "udp"  },
-  { "dsmcc-config",    { NULL }, 13818, "tcp"  },
-  { "dsmcc-config",    { NULL }, 13818, "udp"  },
-  { "dsmcc-session",   { NULL }, 13819, "tcp"  },
-  { "dsmcc-session",   { NULL }, 13819, "udp"  },
-  { "dsmcc-passthru",  { NULL }, 13820, "tcp"  },
-  { "dsmcc-passthru",  { NULL }, 13820, "udp"  },
-  { "dsmcc-download",  { NULL }, 13821, "tcp"  },
-  { "dsmcc-download",  { NULL }, 13821, "udp"  },
-  { "dsmcc-ccp",       { NULL }, 13822, "tcp"  },
-  { "dsmcc-ccp",       { NULL }, 13822, "udp"  },
-  { "bmdss",           { NULL }, 13823, "tcp"  },
-  { "dta-systems",     { NULL }, 13929, "tcp"  },
-  { "dta-systems",     { NULL }, 13929, "udp"  },
-  { "medevolve",       { NULL }, 13930, "tcp"  },
-  { "scotty-ft",       { NULL }, 14000, "tcp"  },
-  { "scotty-ft",       { NULL }, 14000, "udp"  },
-  { "sua",             { NULL }, 14001, "tcp"  },
-  { "sua",             { NULL }, 14001, "udp"  },
-  { "sua",             { NULL }, 14001, "sctp" },
-  { "sage-best-com1",  { NULL }, 14033, "tcp"  },
-  { "sage-best-com1",  { NULL }, 14033, "udp"  },
-  { "sage-best-com2",  { NULL }, 14034, "tcp"  },
-  { "sage-best-com2",  { NULL }, 14034, "udp"  },
-  { "vcs-app",         { NULL }, 14141, "tcp"  },
-  { "vcs-app",         { NULL }, 14141, "udp"  },
-  { "icpp",            { NULL }, 14142, "tcp"  },
-  { "icpp",            { NULL }, 14142, "udp"  },
-  { "gcm-app",         { NULL }, 14145, "tcp"  },
-  { "gcm-app",         { NULL }, 14145, "udp"  },
-  { "vrts-tdd",        { NULL }, 14149, "tcp"  },
-  { "vrts-tdd",        { NULL }, 14149, "udp"  },
-  { "vcscmd",          { NULL }, 14150, "tcp"  },
-  { "vad",             { NULL }, 14154, "tcp"  },
-  { "vad",             { NULL }, 14154, "udp"  },
-  { "cps",             { NULL }, 14250, "tcp"  },
-  { "cps",             { NULL }, 14250, "udp"  },
-  { "ca-web-update",   { NULL }, 14414, "tcp"  },
-  { "ca-web-update",   { NULL }, 14414, "udp"  },
-  { "hde-lcesrvr-1",   { NULL }, 14936, "tcp"  },
-  { "hde-lcesrvr-1",   { NULL }, 14936, "udp"  },
-  { "hde-lcesrvr-2",   { NULL }, 14937, "tcp"  },
-  { "hde-lcesrvr-2",   { NULL }, 14937, "udp"  },
-  { "hydap",           { NULL }, 15000, "tcp"  },
-  { "hydap",           { NULL }, 15000, "udp"  },
-  { "xpilot",          { NULL }, 15345, "tcp"  },
-  { "xpilot",          { NULL }, 15345, "udp"  },
-  { "3link",           { NULL }, 15363, "tcp"  },
-  { "3link",           { NULL }, 15363, "udp"  },
-  { "cisco-snat",      { NULL }, 15555, "tcp"  },
-  { "cisco-snat",      { NULL }, 15555, "udp"  },
-  { "bex-xr",          { NULL }, 15660, "tcp"  },
-  { "bex-xr",          { NULL }, 15660, "udp"  },
-  { "ptp",             { NULL }, 15740, "tcp"  },
-  { "ptp",             { NULL }, 15740, "udp"  },
-  { "2ping",           { NULL }, 15998, "udp"  },
-  { "programmar",      { NULL }, 15999, "tcp"  },
-  { "fmsas",           { NULL }, 16000, "tcp"  },
-  { "fmsascon",        { NULL }, 16001, "tcp"  },
-  { "gsms",            { NULL }, 16002, "tcp"  },
-  { "alfin",           { NULL }, 16003, "udp"  },
-  { "jwpc",            { NULL }, 16020, "tcp"  },
-  { "jwpc-bin",        { NULL }, 16021, "tcp"  },
-  { "sun-sea-port",    { NULL }, 16161, "tcp"  },
-  { "sun-sea-port",    { NULL }, 16161, "udp"  },
-  { "solaris-audit",   { NULL }, 16162, "tcp"  },
-  { "etb4j",           { NULL }, 16309, "tcp"  },
-  { "etb4j",           { NULL }, 16309, "udp"  },
-  { "pduncs",          { NULL }, 16310, "tcp"  },
-  { "pduncs",          { NULL }, 16310, "udp"  },
-  { "pdefmns",         { NULL }, 16311, "tcp"  },
-  { "pdefmns",         { NULL }, 16311, "udp"  },
-  { "netserialext1",   { NULL }, 16360, "tcp"  },
-  { "netserialext1",   { NULL }, 16360, "udp"  },
-  { "netserialext2",   { NULL }, 16361, "tcp"  },
-  { "netserialext2",   { NULL }, 16361, "udp"  },
-  { "netserialext3",   { NULL }, 16367, "tcp"  },
-  { "netserialext3",   { NULL }, 16367, "udp"  },
-  { "netserialext4",   { NULL }, 16368, "tcp"  },
-  { "netserialext4",   { NULL }, 16368, "udp"  },
-  { "connected",       { NULL }, 16384, "tcp"  },
-  { "connected",       { NULL }, 16384, "udp"  },
-  { "xoms",            { NULL }, 16619, "tcp"  },
-  { "newbay-snc-mc",   { NULL }, 16900, "tcp"  },
-  { "newbay-snc-mc",   { NULL }, 16900, "udp"  },
-  { "sgcip",           { NULL }, 16950, "tcp"  },
-  { "sgcip",           { NULL }, 16950, "udp"  },
-  { "intel-rci-mp",    { NULL }, 16991, "tcp"  },
-  { "intel-rci-mp",    { NULL }, 16991, "udp"  },
-  { "amt-soap-http",   { NULL }, 16992, "tcp"  },
-  { "amt-soap-http",   { NULL }, 16992, "udp"  },
-  { "amt-soap-https",  { NULL }, 16993, "tcp"  },
-  { "amt-soap-https",  { NULL }, 16993, "udp"  },
-  { "amt-redir-tcp",   { NULL }, 16994, "tcp"  },
-  { "amt-redir-tcp",   { NULL }, 16994, "udp"  },
-  { "amt-redir-tls",   { NULL }, 16995, "tcp"  },
-  { "amt-redir-tls",   { NULL }, 16995, "udp"  },
-  { "isode-dua",       { NULL }, 17007, "tcp"  },
-  { "isode-dua",       { NULL }, 17007, "udp"  },
-  { "soundsvirtual",   { NULL }, 17185, "tcp"  },
-  { "soundsvirtual",   { NULL }, 17185, "udp"  },
-  { "chipper",         { NULL }, 17219, "tcp"  },
-  { "chipper",         { NULL }, 17219, "udp"  },
-  { "integrius-stp",   { NULL }, 17234, "tcp"  },
-  { "integrius-stp",   { NULL }, 17234, "udp"  },
-  { "ssh-mgmt",        { NULL }, 17235, "tcp"  },
-  { "ssh-mgmt",        { NULL }, 17235, "udp"  },
-  { "db-lsp",          { NULL }, 17500, "tcp"  },
-  { "db-lsp-disc",     { NULL }, 17500, "udp"  },
-  { "ea",              { NULL }, 17729, "tcp"  },
-  { "ea",              { NULL }, 17729, "udp"  },
-  { "zep",             { NULL }, 17754, "tcp"  },
-  { "zep",             { NULL }, 17754, "udp"  },
-  { "zigbee-ip",       { NULL }, 17755, "tcp"  },
-  { "zigbee-ip",       { NULL }, 17755, "udp"  },
-  { "zigbee-ips",      { NULL }, 17756, "tcp"  },
-  { "zigbee-ips",      { NULL }, 17756, "udp"  },
-  { "sw-orion",        { NULL }, 17777, "tcp"  },
-  { "biimenu",         { NULL }, 18000, "tcp"  },
-  { "biimenu",         { NULL }, 18000, "udp"  },
-  { "radpdf",          { NULL }, 18104, "tcp"  },
-  { "racf",            { NULL }, 18136, "tcp"  },
-  { "opsec-cvp",       { NULL }, 18181, "tcp"  },
-  { "opsec-cvp",       { NULL }, 18181, "udp"  },
-  { "opsec-ufp",       { NULL }, 18182, "tcp"  },
-  { "opsec-ufp",       { NULL }, 18182, "udp"  },
-  { "opsec-sam",       { NULL }, 18183, "tcp"  },
-  { "opsec-sam",       { NULL }, 18183, "udp"  },
-  { "opsec-lea",       { NULL }, 18184, "tcp"  },
-  { "opsec-lea",       { NULL }, 18184, "udp"  },
-  { "opsec-omi",       { NULL }, 18185, "tcp"  },
-  { "opsec-omi",       { NULL }, 18185, "udp"  },
-  { "ohsc",            { NULL }, 18186, "tcp"  },
-  { "ohsc",            { NULL }, 18186, "udp"  },
-  { "opsec-ela",       { NULL }, 18187, "tcp"  },
-  { "opsec-ela",       { NULL }, 18187, "udp"  },
-  { "checkpoint-rtm",  { NULL }, 18241, "tcp"  },
-  { "checkpoint-rtm",  { NULL }, 18241, "udp"  },
-  { "gv-pf",           { NULL }, 18262, "tcp"  },
-  { "gv-pf",           { NULL }, 18262, "udp"  },
-  { "ac-cluster",      { NULL }, 18463, "tcp"  },
-  { "ac-cluster",      { NULL }, 18463, "udp"  },
-  { "rds-ib",          { NULL }, 18634, "tcp"  },
-  { "rds-ib",          { NULL }, 18634, "udp"  },
-  { "rds-ip",          { NULL }, 18635, "tcp"  },
-  { "rds-ip",          { NULL }, 18635, "udp"  },
-  { "ique",            { NULL }, 18769, "tcp"  },
-  { "ique",            { NULL }, 18769, "udp"  },
-  { "infotos",         { NULL }, 18881, "tcp"  },
-  { "infotos",         { NULL }, 18881, "udp"  },
-  { "apc-necmp",       { NULL }, 18888, "tcp"  },
-  { "apc-necmp",       { NULL }, 18888, "udp"  },
-  { "igrid",           { NULL }, 19000, "tcp"  },
-  { "igrid",           { NULL }, 19000, "udp"  },
-  { "j-link",          { NULL }, 19020, "tcp"  },
-  { "opsec-uaa",       { NULL }, 19191, "tcp"  },
-  { "opsec-uaa",       { NULL }, 19191, "udp"  },
-  { "ua-secureagent",  { NULL }, 19194, "tcp"  },
-  { "ua-secureagent",  { NULL }, 19194, "udp"  },
-  { "keysrvr",         { NULL }, 19283, "tcp"  },
-  { "keysrvr",         { NULL }, 19283, "udp"  },
-  { "keyshadow",       { NULL }, 19315, "tcp"  },
-  { "keyshadow",       { NULL }, 19315, "udp"  },
-  { "mtrgtrans",       { NULL }, 19398, "tcp"  },
-  { "mtrgtrans",       { NULL }, 19398, "udp"  },
-  { "hp-sco",          { NULL }, 19410, "tcp"  },
-  { "hp-sco",          { NULL }, 19410, "udp"  },
-  { "hp-sca",          { NULL }, 19411, "tcp"  },
-  { "hp-sca",          { NULL }, 19411, "udp"  },
-  { "hp-sessmon",      { NULL }, 19412, "tcp"  },
-  { "hp-sessmon",      { NULL }, 19412, "udp"  },
-  { "fxuptp",          { NULL }, 19539, "tcp"  },
-  { "fxuptp",          { NULL }, 19539, "udp"  },
-  { "sxuptp",          { NULL }, 19540, "tcp"  },
-  { "sxuptp",          { NULL }, 19540, "udp"  },
-  { "jcp",             { NULL }, 19541, "tcp"  },
-  { "jcp",             { NULL }, 19541, "udp"  },
-  { "iec-104-sec",     { NULL }, 19998, "tcp"  },
-  { "dnp-sec",         { NULL }, 19999, "tcp"  },
-  { "dnp-sec",         { NULL }, 19999, "udp"  },
-  { "dnp",             { NULL }, 20000, "tcp"  },
-  { "dnp",             { NULL }, 20000, "udp"  },
-  { "microsan",        { NULL }, 20001, "tcp"  },
-  { "microsan",        { NULL }, 20001, "udp"  },
-  { "commtact-http",   { NULL }, 20002, "tcp"  },
-  { "commtact-http",   { NULL }, 20002, "udp"  },
-  { "commtact-https",  { NULL }, 20003, "tcp"  },
-  { "commtact-https",  { NULL }, 20003, "udp"  },
-  { "openwebnet",      { NULL }, 20005, "tcp"  },
-  { "openwebnet",      { NULL }, 20005, "udp"  },
-  { "ss-idi-disc",     { NULL }, 20012, "udp"  },
-  { "ss-idi",          { NULL }, 20013, "tcp"  },
-  { "opendeploy",      { NULL }, 20014, "tcp"  },
-  { "opendeploy",      { NULL }, 20014, "udp"  },
-  { "nburn_id",        { NULL }, 20034, "tcp"  },
-  { "nburn_id",        { NULL }, 20034, "udp"  },
-  { "tmophl7mts",      { NULL }, 20046, "tcp"  },
-  { "tmophl7mts",      { NULL }, 20046, "udp"  },
-  { "mountd",          { NULL }, 20048, "tcp"  },
-  { "mountd",          { NULL }, 20048, "udp"  },
-  { "nfsrdma",         { NULL }, 20049, "tcp"  },
-  { "nfsrdma",         { NULL }, 20049, "udp"  },
-  { "nfsrdma",         { NULL }, 20049, "sctp" },
-  { "tolfab",          { NULL }, 20167, "tcp"  },
-  { "tolfab",          { NULL }, 20167, "udp"  },
-  { "ipdtp-port",      { NULL }, 20202, "tcp"  },
-  { "ipdtp-port",      { NULL }, 20202, "udp"  },
-  { "ipulse-ics",      { NULL }, 20222, "tcp"  },
-  { "ipulse-ics",      { NULL }, 20222, "udp"  },
-  { "emwavemsg",       { NULL }, 20480, "tcp"  },
-  { "emwavemsg",       { NULL }, 20480, "udp"  },
-  { "track",           { NULL }, 20670, "tcp"  },
-  { "track",           { NULL }, 20670, "udp"  },
-  { "athand-mmp",      { NULL }, 20999, "tcp"  },
-  { "athand-mmp",      { NULL }, 20999, "udp"  },
-  { "irtrans",         { NULL }, 21000, "tcp"  },
-  { "irtrans",         { NULL }, 21000, "udp"  },
-  { "dfserver",        { NULL }, 21554, "tcp"  },
-  { "dfserver",        { NULL }, 21554, "udp"  },
-  { "vofr-gateway",    { NULL }, 21590, "tcp"  },
-  { "vofr-gateway",    { NULL }, 21590, "udp"  },
-  { "tvpm",            { NULL }, 21800, "tcp"  },
-  { "tvpm",            { NULL }, 21800, "udp"  },
-  { "webphone",        { NULL }, 21845, "tcp"  },
-  { "webphone",        { NULL }, 21845, "udp"  },
-  { "netspeak-is",     { NULL }, 21846, "tcp"  },
-  { "netspeak-is",     { NULL }, 21846, "udp"  },
-  { "netspeak-cs",     { NULL }, 21847, "tcp"  },
-  { "netspeak-cs",     { NULL }, 21847, "udp"  },
-  { "netspeak-acd",    { NULL }, 21848, "tcp"  },
-  { "netspeak-acd",    { NULL }, 21848, "udp"  },
-  { "netspeak-cps",    { NULL }, 21849, "tcp"  },
-  { "netspeak-cps",    { NULL }, 21849, "udp"  },
-  { "snapenetio",      { NULL }, 22000, "tcp"  },
-  { "snapenetio",      { NULL }, 22000, "udp"  },
-  { "optocontrol",     { NULL }, 22001, "tcp"  },
-  { "optocontrol",     { NULL }, 22001, "udp"  },
-  { "optohost002",     { NULL }, 22002, "tcp"  },
-  { "optohost002",     { NULL }, 22002, "udp"  },
-  { "optohost003",     { NULL }, 22003, "tcp"  },
-  { "optohost003",     { NULL }, 22003, "udp"  },
-  { "optohost004",     { NULL }, 22004, "tcp"  },
-  { "optohost004",     { NULL }, 22004, "udp"  },
-  { "optohost004",     { NULL }, 22005, "tcp"  },
-  { "optohost004",     { NULL }, 22005, "udp"  },
-  { "dcap",            { NULL }, 22125, "tcp"  },
-  { "gsidcap",         { NULL }, 22128, "tcp"  },
-  { "wnn6",            { NULL }, 22273, "tcp"  },
-  { "wnn6",            { NULL }, 22273, "udp"  },
-  { "cis",             { NULL }, 22305, "tcp"  },
-  { "cis",             { NULL }, 22305, "udp"  },
-  { "cis-secure",      { NULL }, 22343, "tcp"  },
-  { "cis-secure",      { NULL }, 22343, "udp"  },
-  { "WibuKey",         { NULL }, 22347, "tcp"  },
-  { "WibuKey",         { NULL }, 22347, "udp"  },
-  { "CodeMeter",       { NULL }, 22350, "tcp"  },
-  { "CodeMeter",       { NULL }, 22350, "udp"  },
-  { "vocaltec-wconf",  { NULL }, 22555, "tcp"  },
-  { "vocaltec-phone",  { NULL }, 22555, "udp"  },
-  { "talikaserver",    { NULL }, 22763, "tcp"  },
-  { "talikaserver",    { NULL }, 22763, "udp"  },
-  { "aws-brf",         { NULL }, 22800, "tcp"  },
-  { "aws-brf",         { NULL }, 22800, "udp"  },
-  { "brf-gw",          { NULL }, 22951, "tcp"  },
-  { "brf-gw",          { NULL }, 22951, "udp"  },
-  { "inovaport1",      { NULL }, 23000, "tcp"  },
-  { "inovaport1",      { NULL }, 23000, "udp"  },
-  { "inovaport2",      { NULL }, 23001, "tcp"  },
-  { "inovaport2",      { NULL }, 23001, "udp"  },
-  { "inovaport3",      { NULL }, 23002, "tcp"  },
-  { "inovaport3",      { NULL }, 23002, "udp"  },
-  { "inovaport4",      { NULL }, 23003, "tcp"  },
-  { "inovaport4",      { NULL }, 23003, "udp"  },
-  { "inovaport5",      { NULL }, 23004, "tcp"  },
-  { "inovaport5",      { NULL }, 23004, "udp"  },
-  { "inovaport6",      { NULL }, 23005, "tcp"  },
-  { "inovaport6",      { NULL }, 23005, "udp"  },
-  { "s102",            { NULL }, 23272, "udp"  },
-  { "elxmgmt",         { NULL }, 23333, "tcp"  },
-  { "elxmgmt",         { NULL }, 23333, "udp"  },
-  { "novar-dbase",     { NULL }, 23400, "tcp"  },
-  { "novar-dbase",     { NULL }, 23400, "udp"  },
-  { "novar-alarm",     { NULL }, 23401, "tcp"  },
-  { "novar-alarm",     { NULL }, 23401, "udp"  },
-  { "novar-global",    { NULL }, 23402, "tcp"  },
-  { "novar-global",    { NULL }, 23402, "udp"  },
-  { "aequus",          { NULL }, 23456, "tcp"  },
-  { "aequus-alt",      { NULL }, 23457, "tcp"  },
-  { "med-ltp",         { NULL }, 24000, "tcp"  },
-  { "med-ltp",         { NULL }, 24000, "udp"  },
-  { "med-fsp-rx",      { NULL }, 24001, "tcp"  },
-  { "med-fsp-rx",      { NULL }, 24001, "udp"  },
-  { "med-fsp-tx",      { NULL }, 24002, "tcp"  },
-  { "med-fsp-tx",      { NULL }, 24002, "udp"  },
-  { "med-supp",        { NULL }, 24003, "tcp"  },
-  { "med-supp",        { NULL }, 24003, "udp"  },
-  { "med-ovw",         { NULL }, 24004, "tcp"  },
-  { "med-ovw",         { NULL }, 24004, "udp"  },
-  { "med-ci",          { NULL }, 24005, "tcp"  },
-  { "med-ci",          { NULL }, 24005, "udp"  },
-  { "med-net-svc",     { NULL }, 24006, "tcp"  },
-  { "med-net-svc",     { NULL }, 24006, "udp"  },
-  { "filesphere",      { NULL }, 24242, "tcp"  },
-  { "filesphere",      { NULL }, 24242, "udp"  },
-  { "vista-4gl",       { NULL }, 24249, "tcp"  },
-  { "vista-4gl",       { NULL }, 24249, "udp"  },
-  { "ild",             { NULL }, 24321, "tcp"  },
-  { "ild",             { NULL }, 24321, "udp"  },
-  { "intel_rci",       { NULL }, 24386, "tcp"  },
-  { "intel_rci",       { NULL }, 24386, "udp"  },
-  { "tonidods",        { NULL }, 24465, "tcp"  },
-  { "tonidods",        { NULL }, 24465, "udp"  },
-  { "binkp",           { NULL }, 24554, "tcp"  },
-  { "binkp",           { NULL }, 24554, "udp"  },
-  { "canditv",         { NULL }, 24676, "tcp"  },
-  { "canditv",         { NULL }, 24676, "udp"  },
-  { "flashfiler",      { NULL }, 24677, "tcp"  },
-  { "flashfiler",      { NULL }, 24677, "udp"  },
-  { "proactivate",     { NULL }, 24678, "tcp"  },
-  { "proactivate",     { NULL }, 24678, "udp"  },
-  { "tcc-http",        { NULL }, 24680, "tcp"  },
-  { "tcc-http",        { NULL }, 24680, "udp"  },
-  { "cslg",            { NULL }, 24754, "tcp"  },
-  { "find",            { NULL }, 24922, "tcp"  },
-  { "find",            { NULL }, 24922, "udp"  },
-  { "icl-twobase1",    { NULL }, 25000, "tcp"  },
-  { "icl-twobase1",    { NULL }, 25000, "udp"  },
-  { "icl-twobase2",    { NULL }, 25001, "tcp"  },
-  { "icl-twobase2",    { NULL }, 25001, "udp"  },
-  { "icl-twobase3",    { NULL }, 25002, "tcp"  },
-  { "icl-twobase3",    { NULL }, 25002, "udp"  },
-  { "icl-twobase4",    { NULL }, 25003, "tcp"  },
-  { "icl-twobase4",    { NULL }, 25003, "udp"  },
-  { "icl-twobase5",    { NULL }, 25004, "tcp"  },
-  { "icl-twobase5",    { NULL }, 25004, "udp"  },
-  { "icl-twobase6",    { NULL }, 25005, "tcp"  },
-  { "icl-twobase6",    { NULL }, 25005, "udp"  },
-  { "icl-twobase7",    { NULL }, 25006, "tcp"  },
-  { "icl-twobase7",    { NULL }, 25006, "udp"  },
-  { "icl-twobase8",    { NULL }, 25007, "tcp"  },
-  { "icl-twobase8",    { NULL }, 25007, "udp"  },
-  { "icl-twobase9",    { NULL }, 25008, "tcp"  },
-  { "icl-twobase9",    { NULL }, 25008, "udp"  },
-  { "icl-twobase10",   { NULL }, 25009, "tcp"  },
-  { "icl-twobase10",   { NULL }, 25009, "udp"  },
-  { "rna",             { NULL }, 25471, "sctp" },
-  { "sauterdongle",    { NULL }, 25576, "tcp"  },
-  { "vocaltec-hos",    { NULL }, 25793, "tcp"  },
-  { "vocaltec-hos",    { NULL }, 25793, "udp"  },
-  { "tasp-net",        { NULL }, 25900, "tcp"  },
-  { "tasp-net",        { NULL }, 25900, "udp"  },
-  { "niobserver",      { NULL }, 25901, "tcp"  },
-  { "niobserver",      { NULL }, 25901, "udp"  },
-  { "nilinkanalyst",   { NULL }, 25902, "tcp"  },
-  { "nilinkanalyst",   { NULL }, 25902, "udp"  },
-  { "niprobe",         { NULL }, 25903, "tcp"  },
-  { "niprobe",         { NULL }, 25903, "udp"  },
-  { "quake",           { NULL }, 26000, "tcp"  },
-  { "quake",           { NULL }, 26000, "udp"  },
-  { "scscp",           { NULL }, 26133, "tcp"  },
-  { "scscp",           { NULL }, 26133, "udp"  },
-  { "wnn6-ds",         { NULL }, 26208, "tcp"  },
-  { "wnn6-ds",         { NULL }, 26208, "udp"  },
-  { "ezproxy",         { NULL }, 26260, "tcp"  },
-  { "ezproxy",         { NULL }, 26260, "udp"  },
-  { "ezmeeting",       { NULL }, 26261, "tcp"  },
-  { "ezmeeting",       { NULL }, 26261, "udp"  },
-  { "k3software-svr",  { NULL }, 26262, "tcp"  },
-  { "k3software-svr",  { NULL }, 26262, "udp"  },
-  { "k3software-cli",  { NULL }, 26263, "tcp"  },
-  { "k3software-cli",  { NULL }, 26263, "udp"  },
-  { "exoline-tcp",     { NULL }, 26486, "tcp"  },
-  { "exoline-udp",     { NULL }, 26486, "udp"  },
-  { "exoconfig",       { NULL }, 26487, "tcp"  },
-  { "exoconfig",       { NULL }, 26487, "udp"  },
-  { "exonet",          { NULL }, 26489, "tcp"  },
-  { "exonet",          { NULL }, 26489, "udp"  },
-  { "imagepump",       { NULL }, 27345, "tcp"  },
-  { "imagepump",       { NULL }, 27345, "udp"  },
-  { "jesmsjc",         { NULL }, 27442, "tcp"  },
-  { "jesmsjc",         { NULL }, 27442, "udp"  },
-  { "kopek-httphead",  { NULL }, 27504, "tcp"  },
-  { "kopek-httphead",  { NULL }, 27504, "udp"  },
-  { "ars-vista",       { NULL }, 27782, "tcp"  },
-  { "ars-vista",       { NULL }, 27782, "udp"  },
-  { "tw-auth-key",     { NULL }, 27999, "tcp"  },
-  { "tw-auth-key",     { NULL }, 27999, "udp"  },
-  { "nxlmd",           { NULL }, 28000, "tcp"  },
-  { "nxlmd",           { NULL }, 28000, "udp"  },
-  { "pqsp",            { NULL }, 28001, "tcp"  },
-  { "siemensgsm",      { NULL }, 28240, "tcp"  },
-  { "siemensgsm",      { NULL }, 28240, "udp"  },
-  { "sgsap",           { NULL }, 29118, "sctp" },
-  { "otmp",            { NULL }, 29167, "tcp"  },
-  { "otmp",            { NULL }, 29167, "udp"  },
-  { "sbcap",           { NULL }, 29168, "sctp" },
-  { "iuhsctpassoc",    { NULL }, 29169, "sctp" },
-  { "pago-services1",  { NULL }, 30001, "tcp"  },
-  { "pago-services1",  { NULL }, 30001, "udp"  },
-  { "pago-services2",  { NULL }, 30002, "tcp"  },
-  { "pago-services2",  { NULL }, 30002, "udp"  },
-  { "kingdomsonline",  { NULL }, 30260, "tcp"  },
-  { "kingdomsonline",  { NULL }, 30260, "udp"  },
-  { "ovobs",           { NULL }, 30999, "tcp"  },
-  { "ovobs",           { NULL }, 30999, "udp"  },
-  { "autotrac-acp",    { NULL }, 31020, "tcp"  },
-  { "yawn",            { NULL }, 31029, "udp"  },
-  { "xqosd",           { NULL }, 31416, "tcp"  },
-  { "xqosd",           { NULL }, 31416, "udp"  },
-  { "tetrinet",        { NULL }, 31457, "tcp"  },
-  { "tetrinet",        { NULL }, 31457, "udp"  },
-  { "lm-mon",          { NULL }, 31620, "tcp"  },
-  { "lm-mon",          { NULL }, 31620, "udp"  },
-  { "dsx_monitor",     { NULL }, 31685, "tcp"  },
-  { "gamesmith-port",  { NULL }, 31765, "tcp"  },
-  { "gamesmith-port",  { NULL }, 31765, "udp"  },
-  { "iceedcp_tx",      { NULL }, 31948, "tcp"  },
-  { "iceedcp_tx",      { NULL }, 31948, "udp"  },
-  { "iceedcp_rx",      { NULL }, 31949, "tcp"  },
-  { "iceedcp_rx",      { NULL }, 31949, "udp"  },
-  { "iracinghelper",   { NULL }, 32034, "tcp"  },
-  { "iracinghelper",   { NULL }, 32034, "udp"  },
-  { "t1distproc60",    { NULL }, 32249, "tcp"  },
-  { "t1distproc60",    { NULL }, 32249, "udp"  },
-  { "apm-link",        { NULL }, 32483, "tcp"  },
-  { "apm-link",        { NULL }, 32483, "udp"  },
-  { "sec-ntb-clnt",    { NULL }, 32635, "tcp"  },
-  { "sec-ntb-clnt",    { NULL }, 32635, "udp"  },
-  { "DMExpress",       { NULL }, 32636, "tcp"  },
-  { "DMExpress",       { NULL }, 32636, "udp"  },
-  { "filenet-powsrm",  { NULL }, 32767, "tcp"  },
-  { "filenet-powsrm",  { NULL }, 32767, "udp"  },
-  { "filenet-tms",     { NULL }, 32768, "tcp"  },
-  { "filenet-tms",     { NULL }, 32768, "udp"  },
-  { "filenet-rpc",     { NULL }, 32769, "tcp"  },
-  { "filenet-rpc",     { NULL }, 32769, "udp"  },
-  { "filenet-nch",     { NULL }, 32770, "tcp"  },
-  { "filenet-nch",     { NULL }, 32770, "udp"  },
-  { "filenet-rmi",     { NULL }, 32771, "tcp"  },
-  { "filenet-rmi",     { NULL }, 32771, "udp"  },
-  { "filenet-pa",      { NULL }, 32772, "tcp"  },
-  { "filenet-pa",      { NULL }, 32772, "udp"  },
-  { "filenet-cm",      { NULL }, 32773, "tcp"  },
-  { "filenet-cm",      { NULL }, 32773, "udp"  },
-  { "filenet-re",      { NULL }, 32774, "tcp"  },
-  { "filenet-re",      { NULL }, 32774, "udp"  },
-  { "filenet-pch",     { NULL }, 32775, "tcp"  },
-  { "filenet-pch",     { NULL }, 32775, "udp"  },
-  { "filenet-peior",   { NULL }, 32776, "tcp"  },
-  { "filenet-peior",   { NULL }, 32776, "udp"  },
-  { "filenet-obrok",   { NULL }, 32777, "tcp"  },
-  { "filenet-obrok",   { NULL }, 32777, "udp"  },
-  { "mlsn",            { NULL }, 32801, "tcp"  },
-  { "mlsn",            { NULL }, 32801, "udp"  },
-  { "retp",            { NULL }, 32811, "tcp"  },
-  { "idmgratm",        { NULL }, 32896, "tcp"  },
-  { "idmgratm",        { NULL }, 32896, "udp"  },
-  { "aurora-balaena",  { NULL }, 33123, "tcp"  },
-  { "aurora-balaena",  { NULL }, 33123, "udp"  },
-  { "diamondport",     { NULL }, 33331, "tcp"  },
-  { "diamondport",     { NULL }, 33331, "udp"  },
-  { "dgi-serv",        { NULL }, 33333, "tcp"  },
-  { "traceroute",      { NULL }, 33434, "tcp"  },
-  { "traceroute",      { NULL }, 33434, "udp"  },
-  { "snip-slave",      { NULL }, 33656, "tcp"  },
-  { "snip-slave",      { NULL }, 33656, "udp"  },
-  { "turbonote-2",     { NULL }, 34249, "tcp"  },
-  { "turbonote-2",     { NULL }, 34249, "udp"  },
-  { "p-net-local",     { NULL }, 34378, "tcp"  },
-  { "p-net-local",     { NULL }, 34378, "udp"  },
-  { "p-net-remote",    { NULL }, 34379, "tcp"  },
-  { "p-net-remote",    { NULL }, 34379, "udp"  },
-  { "dhanalakshmi",    { NULL }, 34567, "tcp"  },
-  { "profinet-rt",     { NULL }, 34962, "tcp"  },
-  { "profinet-rt",     { NULL }, 34962, "udp"  },
-  { "profinet-rtm",    { NULL }, 34963, "tcp"  },
-  { "profinet-rtm",    { NULL }, 34963, "udp"  },
-  { "profinet-cm",     { NULL }, 34964, "tcp"  },
-  { "profinet-cm",     { NULL }, 34964, "udp"  },
-  { "ethercat",        { NULL }, 34980, "tcp"  },
-  { "ethercat",        { NULL }, 34980, "udp"  },
-  { "allpeers",        { NULL }, 36001, "tcp"  },
-  { "allpeers",        { NULL }, 36001, "udp"  },
-  { "s1-control",      { NULL }, 36412, "sctp" },
-  { "x2-control",      { NULL }, 36422, "sctp" },
-  { "m2ap",            { NULL }, 36443, "sctp" },
-  { "m3ap",            { NULL }, 36444, "sctp" },
-  { "kastenxpipe",     { NULL }, 36865, "tcp"  },
-  { "kastenxpipe",     { NULL }, 36865, "udp"  },
-  { "neckar",          { NULL }, 37475, "tcp"  },
-  { "neckar",          { NULL }, 37475, "udp"  },
-  { "unisys-eportal",  { NULL }, 37654, "tcp"  },
-  { "unisys-eportal",  { NULL }, 37654, "udp"  },
-  { "galaxy7-data",    { NULL }, 38201, "tcp"  },
-  { "galaxy7-data",    { NULL }, 38201, "udp"  },
-  { "fairview",        { NULL }, 38202, "tcp"  },
-  { "fairview",        { NULL }, 38202, "udp"  },
-  { "agpolicy",        { NULL }, 38203, "tcp"  },
-  { "agpolicy",        { NULL }, 38203, "udp"  },
-  { "turbonote-1",     { NULL }, 39681, "tcp"  },
-  { "turbonote-1",     { NULL }, 39681, "udp"  },
-  { "safetynetp",      { NULL }, 40000, "tcp"  },
-  { "safetynetp",      { NULL }, 40000, "udp"  },
-  { "cscp",            { NULL }, 40841, "tcp"  },
-  { "cscp",            { NULL }, 40841, "udp"  },
-  { "csccredir",       { NULL }, 40842, "tcp"  },
-  { "csccredir",       { NULL }, 40842, "udp"  },
-  { "csccfirewall",    { NULL }, 40843, "tcp"  },
-  { "csccfirewall",    { NULL }, 40843, "udp"  },
-  { "ortec-disc",      { NULL }, 40853, "udp"  },
-  { "fs-qos",          { NULL }, 41111, "tcp"  },
-  { "fs-qos",          { NULL }, 41111, "udp"  },
-  { "tentacle",        { NULL }, 41121, "tcp"  },
-  { "crestron-cip",    { NULL }, 41794, "tcp"  },
-  { "crestron-cip",    { NULL }, 41794, "udp"  },
-  { "crestron-ctp",    { NULL }, 41795, "tcp"  },
-  { "crestron-ctp",    { NULL }, 41795, "udp"  },
-  { "candp",           { NULL }, 42508, "tcp"  },
-  { "candp",           { NULL }, 42508, "udp"  },
-  { "candrp",          { NULL }, 42509, "tcp"  },
-  { "candrp",          { NULL }, 42509, "udp"  },
-  { "caerpc",          { NULL }, 42510, "tcp"  },
-  { "caerpc",          { NULL }, 42510, "udp"  },
-  { "reachout",        { NULL }, 43188, "tcp"  },
-  { "reachout",        { NULL }, 43188, "udp"  },
-  { "ndm-agent-port",  { NULL }, 43189, "tcp"  },
-  { "ndm-agent-port",  { NULL }, 43189, "udp"  },
-  { "ip-provision",    { NULL }, 43190, "tcp"  },
-  { "ip-provision",    { NULL }, 43190, "udp"  },
-  { "noit-transport",  { NULL }, 43191, "tcp"  },
-  { "ew-mgmt",         { NULL }, 43440, "tcp"  },
-  { "ew-disc-cmd",     { NULL }, 43440, "udp"  },
-  { "ciscocsdb",       { NULL }, 43441, "tcp"  },
-  { "ciscocsdb",       { NULL }, 43441, "udp"  },
-  { "pmcd",            { NULL }, 44321, "tcp"  },
-  { "pmcd",            { NULL }, 44321, "udp"  },
-  { "pmcdproxy",       { NULL }, 44322, "tcp"  },
-  { "pmcdproxy",       { NULL }, 44322, "udp"  },
-  { "pcp",             { NULL }, 44323, "udp"  },
-  { "rbr-debug",       { NULL }, 44553, "tcp"  },
-  { "rbr-debug",       { NULL }, 44553, "udp"  },
-  { "EtherNet/IP-2",   { NULL }, 44818, "tcp"  },
-  { "EtherNet/IP-2",   { NULL }, 44818, "udp"  },
-  { "invision-ag",     { NULL }, 45054, "tcp"  },
-  { "invision-ag",     { NULL }, 45054, "udp"  },
-  { "eba",             { NULL }, 45678, "tcp"  },
-  { "eba",             { NULL }, 45678, "udp"  },
-  { "qdb2service",     { NULL }, 45825, "tcp"  },
-  { "qdb2service",     { NULL }, 45825, "udp"  },
-  { "ssr-servermgr",   { NULL }, 45966, "tcp"  },
-  { "ssr-servermgr",   { NULL }, 45966, "udp"  },
-  { "mediabox",        { NULL }, 46999, "tcp"  },
-  { "mediabox",        { NULL }, 46999, "udp"  },
-  { "mbus",            { NULL }, 47000, "tcp"  },
-  { "mbus",            { NULL }, 47000, "udp"  },
-  { "winrm",           { NULL }, 47001, "tcp"  },
-  { "dbbrowse",        { NULL }, 47557, "tcp"  },
-  { "dbbrowse",        { NULL }, 47557, "udp"  },
-  { "directplaysrvr",  { NULL }, 47624, "tcp"  },
-  { "directplaysrvr",  { NULL }, 47624, "udp"  },
-  { "ap",              { NULL }, 47806, "tcp"  },
-  { "ap",              { NULL }, 47806, "udp"  },
-  { "bacnet",          { NULL }, 47808, "tcp"  },
-  { "bacnet",          { NULL }, 47808, "udp"  },
-  { "nimcontroller",   { NULL }, 48000, "tcp"  },
-  { "nimcontroller",   { NULL }, 48000, "udp"  },
-  { "nimspooler",      { NULL }, 48001, "tcp"  },
-  { "nimspooler",      { NULL }, 48001, "udp"  },
-  { "nimhub",          { NULL }, 48002, "tcp"  },
-  { "nimhub",          { NULL }, 48002, "udp"  },
-  { "nimgtw",          { NULL }, 48003, "tcp"  },
-  { "nimgtw",          { NULL }, 48003, "udp"  },
-  { "nimbusdb",        { NULL }, 48004, "tcp"  },
-  { "nimbusdbctrl",    { NULL }, 48005, "tcp"  },
-  { "3gpp-cbsp",       { NULL }, 48049, "tcp"  },
-  { "isnetserv",       { NULL }, 48128, "tcp"  },
-  { "isnetserv",       { NULL }, 48128, "udp"  },
-  { "blp5",            { NULL }, 48129, "tcp"  },
-  { "blp5",            { NULL }, 48129, "udp"  },
-  { "com-bardac-dw",   { NULL }, 48556, "tcp"  },
-  { "com-bardac-dw",   { NULL }, 48556, "udp"  },
-  { "iqobject",        { NULL }, 48619, "tcp"  },
-  { "iqobject",        { NULL }, 48619, "udp"  },
-#  endif  /* USE_IANA_REGISTERED_PORTS */
-  { NULL,              { NULL }, 0,     NULL   }
-};
-
-struct servent *getservbyport(int port, const char *proto)
-{
-  unsigned short u_port;
-  const char    *protocol = NULL;
-  int            error    = 0;
-  size_t         i;
-
-  u_port = ntohs((unsigned short)port);
-
-  if (proto) {
-    switch (ares_strlen(proto)) {
-      case 3:
-        if (!strncasecmp(proto, "tcp", 3)) {
-          protocol = "tcp";
-        } else if (!strncasecmp(proto, "udp", 3)) {
-          protocol = "udp";
-        } else {
-          error = WSAEFAULT;
-        }
-        break;
-      case 4:
-        if (!strncasecmp(proto, "sctp", 4)) {
-          protocol = "sctp";
-        } else if (!strncasecmp(proto, "dccp", 4)) {
-          protocol = "dccp";
-        } else {
-          error = WSAEFAULT;
-        }
-        break;
-      default:
-        error = WSAEFAULT;
-    }
-  }
-
-  if (!error) {
-    for (i = 0; i < (sizeof(IANAports) / sizeof(IANAports[0])) - 1; i++) {
-      if (u_port == IANAports[i].s_port) {
-        if (!protocol || !strcasecmp(protocol, IANAports[i].s_proto)) {
-          return (struct servent *)&IANAports[i];
-        }
-      }
-    }
-    error = WSANO_DATA;
-  }
-
-  SET_SOCKERRNO(error);
-  return NULL;
-}
-
-#endif /* _WIN32_WCE */
diff --git a/deps/cares/src/lib/ares_private.h b/deps/cares/src/lib/ares_private.h
index 263c2a606d3708..ce8c3f2ddc2f6c 100644
--- a/deps/cares/src/lib/ares_private.h
+++ b/deps/cares/src/lib/ares_private.h
@@ -40,6 +40,36 @@
 #  include <netinet/in.h>
 #endif
 
+#include "ares_mem.h"
+#include "ares_ipv6.h"
+#include "util/ares_math.h"
+#include "util/ares_time.h"
+#include "util/ares_rand.h"
+#include "ares_array.h"
+#include "ares_llist.h"
+#include "dsa/ares_slist.h"
+#include "ares_htable_strvp.h"
+#include "ares_htable_szvp.h"
+#include "ares_htable_asvp.h"
+#include "ares_htable_dict.h"
+#include "ares_htable_vpvp.h"
+#include "ares_htable_vpstr.h"
+#include "record/ares_dns_multistring.h"
+#include "ares_buf.h"
+#include "record/ares_dns_private.h"
+#include "util/ares_iface_ips.h"
+#include "util/ares_threads.h"
+#include "ares_socket.h"
+#include "ares_conn.h"
+#include "ares_str.h"
+#include "str/ares_strsplit.h"
+#include "util/ares_uri.h"
+
+#ifndef HAVE_GETENV
+#  include "ares_getenv.h"
+#  define getenv(ptr) ares_getenv(ptr)
+#endif
+
 #define DEFAULT_TIMEOUT 2000 /* milliseconds */
 #define DEFAULT_TRIES   3
 #ifndef INADDR_NONE
@@ -100,42 +130,6 @@ W32_FUNC const char *_w32_GetHostsFile(void);
 
 #endif
 
-#include "ares_ipv6.h"
-
-struct ares_rand_state;
-typedef struct ares_rand_state ares_rand_state;
-
-#include "dsa/ares__array.h"
-#include "dsa/ares__llist.h"
-#include "dsa/ares__slist.h"
-#include "dsa/ares__htable_strvp.h"
-#include "dsa/ares__htable_szvp.h"
-#include "dsa/ares__htable_asvp.h"
-#include "dsa/ares__htable_vpvp.h"
-#include "record/ares_dns_multistring.h"
-#include "str/ares__buf.h"
-#include "record/ares_dns_private.h"
-#include "util/ares__iface_ips.h"
-#include "util/ares__threads.h"
-
-#ifndef HAVE_GETENV
-#  include "ares_getenv.h"
-#  define getenv(ptr) ares_getenv(ptr)
-#endif
-
-#include "str/ares_str.h"
-#include "str/ares_strsplit.h"
-
-#ifndef HAVE_STRCASECMP
-#  include "str/ares_strcasecmp.h"
-#  define strcasecmp(p1, p2) ares_strcasecmp(p1, p2)
-#endif
-
-#ifndef HAVE_STRNCASECMP
-#  include "str/ares_strcasecmp.h"
-#  define strncasecmp(p1, p2, n) ares_strncasecmp(p1, p2, n)
-#endif
-
 /********* EDNS defines section ******/
 #define EDNSPACKETSZ                                          \
   1232 /* Reasonable UDP payload size, as agreed by operators \
@@ -154,140 +148,6 @@ typedef struct ares_rand_state ares_rand_state;
 struct ares_query;
 typedef struct ares_query ares_query_t;
 
-struct ares_server;
-typedef struct ares_server ares_server_t;
-
-struct ares_conn;
-typedef struct ares_conn ares_conn_t;
-
-typedef enum {
-  /*! No flags */
-  ARES_CONN_FLAG_NONE = 0,
-  /*! TCP connection, not UDP */
-  ARES_CONN_FLAG_TCP = 1 << 0,
-  /*! TCP Fast Open is enabled and being used if supported by the OS */
-  ARES_CONN_FLAG_TFO = 1 << 1,
-  /*! TCP Fast Open has not yet sent its first packet. Gets unset on first
-   *  write to a connection */
-  ARES_CONN_FLAG_TFO_INITIAL = 1 << 2
-} ares_conn_flags_t;
-
-struct ares_conn {
-  ares_server_t    *server;
-  ares_socket_t     fd;
-  struct ares_addr  self_ip;
-  ares_conn_flags_t flags;
-  /* total number of queries run on this connection since it was established */
-  size_t            total_queries;
-  /* list of outstanding queries to this connection */
-  ares__llist_t    *queries_to_conn;
-};
-
-#ifdef _MSC_VER
-typedef __int64          ares_int64_t;
-typedef unsigned __int64 ares_uint64_t;
-#else
-typedef long long          ares_int64_t;
-typedef unsigned long long ares_uint64_t;
-#endif
-
-/*! struct timeval on some systems like Windows doesn't support 64bit time so
- *  therefore can't be used due to Y2K38 issues.  Make our own that does have
- *  64bit time. */
-typedef struct {
-  ares_int64_t sec;  /*!< Seconds */
-  unsigned int usec; /*!< Microseconds. Can't be negative. */
-} ares_timeval_t;
-
-/*! Various buckets for grouping history */
-typedef enum {
-  ARES_METRIC_1MINUTE = 0, /*!< Bucket for tracking over the last minute */
-  ARES_METRIC_15MINUTES,   /*!< Bucket for tracking over the last 15 minutes */
-  ARES_METRIC_1HOUR,       /*!< Bucket for tracking over the last hour */
-  ARES_METRIC_1DAY,        /*!< Bucket for tracking over the last day */
-  ARES_METRIC_INCEPTION,   /*!< Bucket for tracking since inception */
-  ARES_METRIC_COUNT        /*!< Count of buckets, not a real bucket */
-} ares_server_bucket_t;
-
-/*! Data metrics collected for each bucket */
-typedef struct {
-  time_t        ts;             /*!< Timestamp divided by bucket divisor */
-  unsigned int  latency_min_ms; /*!< Minimum latency for queries */
-  unsigned int  latency_max_ms; /*!< Maximum latency for queries */
-  ares_uint64_t total_ms;       /*!< Cumulative query time for bucket */
-  ares_uint64_t total_count;    /*!< Number of queries for bucket */
-
-  time_t        prev_ts;        /*!< Previous period bucket timestamp */
-  ares_uint64_t
-    prev_total_ms; /*!< Previous period bucket cumulative query time */
-  ares_uint64_t prev_total_count; /*!< Previous period bucket query count */
-} ares_server_metrics_t;
-
-typedef enum {
-  ARES_COOKIE_INITIAL     = 0,
-  ARES_COOKIE_GENERATED   = 1,
-  ARES_COOKIE_SUPPORTED   = 2,
-  ARES_COOKIE_UNSUPPORTED = 3
-} ares_cookie_state_t;
-
-/*! Structure holding tracking data for RFC 7873/9018 DNS cookies.
- *  Implementation plan for this feature is here:
- *  https://github.com/c-ares/c-ares/issues/620
- */
-typedef struct {
-  /*! starts at INITIAL, transitions as needed. */
-  ares_cookie_state_t state;
-  /*! randomly-generate client cookie */
-  unsigned char       client[8];
-  /*! timestamp client cookie was generated, used for rotation purposes */
-  ares_timeval_t      client_ts;
-  /*! IP address last used for client to connect to server.  If this changes
-   *  The client cookie gets invalidated */
-  struct ares_addr    client_ip;
-  /*! Server Cookie last received, 8-32 bytes in length */
-  unsigned char       server[32];
-  /*! Length of server cookie on file. */
-  size_t              server_len;
-  /*! Timestamp of last attempt to use cookies, but it was determined that the
-   *  server didn't support them */
-  ares_timeval_t      unsupported_ts;
-} ares_cookie_t;
-
-struct ares_server {
-  /* Configuration */
-  size_t                idx;      /* index for server in system configuration */
-  struct ares_addr      addr;
-  unsigned short        udp_port; /* host byte order */
-  unsigned short        tcp_port; /* host byte order */
-  char                  ll_iface[64];    /* IPv6 Link Local Interface */
-  unsigned int          ll_scope;        /* IPv6 Link Local Scope */
-
-  size_t                consec_failures; /* Consecutive query failure count
-                                          * can be hard errors or timeouts
-                                          */
-  ares__llist_t        *connections;
-  ares_conn_t          *tcp_conn;
-
-  /* The next time when we will retry this server if it has hit failures */
-  ares_timeval_t        next_retry_time;
-
-  /* TCP buffer since multiple responses can come back in one read, or partial
-   * in a read */
-  ares__buf_t          *tcp_parser;
-
-  /* TCP output queue */
-  ares__buf_t          *tcp_send;
-
-  /*! Buckets for collecting metrics about the server */
-  ares_server_metrics_t metrics[ARES_METRIC_COUNT];
-
-  /*! RFC 7873/9018 DNS Cookies */
-  ares_cookie_t         cookie;
-
-  /* Link back to owning channel */
-  ares_channel_t       *channel;
-};
-
 /* State to represent a DNS query */
 struct ares_query {
   /* Query ID from qbuf, for faster lookup, and current timeout */
@@ -300,9 +160,9 @@ struct ares_query {
    * Node object for each list entry the query belongs to in order to
    * make removal operations O(1).
    */
-  ares__slist_node_t  *node_queries_by_timeout;
-  ares__llist_node_t  *node_queries_to_conn;
-  ares__llist_node_t  *node_all_queries;
+  ares_slist_node_t   *node_queries_by_timeout;
+  ares_llist_node_t   *node_queries_to_conn;
+  ares_llist_node_t   *node_all_queries;
 
   /* connection handle query is associated with */
   ares_conn_t         *conn;
@@ -328,71 +188,71 @@ struct apattern {
   unsigned char    mask;
 };
 
-struct ares__qcache;
-typedef struct ares__qcache ares__qcache_t;
+struct ares_qcache;
+typedef struct ares_qcache ares_qcache_t;
 
 struct ares_hosts_file;
 typedef struct ares_hosts_file ares_hosts_file_t;
 
 struct ares_channeldata {
   /* Configuration data */
-  unsigned int          flags;
-  size_t                timeout; /* in milliseconds */
-  size_t                tries;
-  size_t                ndots;
-  size_t                maxtimeout;              /* in milliseconds */
-  ares_bool_t           rotate;
-  unsigned short        udp_port;                /* stored in network order */
-  unsigned short        tcp_port;                /* stored in network order */
-  int                   socket_send_buffer_size; /* setsockopt takes int */
-  int                   socket_receive_buffer_size; /* setsockopt takes int */
-  char                **domains;
-  size_t                ndomains;
-  struct apattern      *sortlist;
-  size_t                nsort;
-  char                 *lookups;
-  size_t                ednspsz;
-  unsigned int          qcache_max_ttl;
-  ares_evsys_t          evsys;
-  unsigned int          optmask;
+  unsigned int         flags;
+  size_t               timeout; /* in milliseconds */
+  size_t               tries;
+  size_t               ndots;
+  size_t               maxtimeout;                 /* in milliseconds */
+  ares_bool_t          rotate;
+  unsigned short       udp_port;                   /* stored in network order */
+  unsigned short       tcp_port;                   /* stored in network order */
+  int                  socket_send_buffer_size;    /* setsockopt takes int */
+  int                  socket_receive_buffer_size; /* setsockopt takes int */
+  char               **domains;
+  size_t               ndomains;
+  struct apattern     *sortlist;
+  size_t               nsort;
+  char                *lookups;
+  size_t               ednspsz;
+  unsigned int         qcache_max_ttl;
+  ares_evsys_t         evsys;
+  unsigned int         optmask;
 
   /* For binding to local devices and/or IP addresses.  Leave
    * them null/zero for no binding.
    */
-  char                  local_dev_name[32];
-  unsigned int          local_ip4;
-  unsigned char         local_ip6[16];
+  char                 local_dev_name[32];
+  unsigned int         local_ip4;
+  unsigned char        local_ip6[16];
 
   /* Thread safety lock */
-  ares__thread_mutex_t *lock;
+  ares_thread_mutex_t *lock;
 
   /* Conditional to wake waiters when queue is empty */
-  ares__thread_cond_t  *cond_empty;
+  ares_thread_cond_t  *cond_empty;
 
   /* Server addresses and communications state. Sorted by least consecutive
    * failures, followed by the configuration order if failures are equal. */
-  ares__slist_t        *servers;
+  ares_slist_t        *servers;
 
   /* random state to use when generating new ids and generating retry penalties
    */
-  ares_rand_state      *rand_state;
+  ares_rand_state     *rand_state;
 
   /* All active queries in a single list */
-  ares__llist_t        *all_queries;
+  ares_llist_t        *all_queries;
   /* Queries bucketed by qid, for quickly dispatching DNS responses: */
-  ares__htable_szvp_t  *queries_by_qid;
+  ares_htable_szvp_t  *queries_by_qid;
 
   /* Queries bucketed by timeout, for quickly handling timeouts: */
-  ares__slist_t        *queries_by_timeout;
+  ares_slist_t        *queries_by_timeout;
 
   /* Map linked list node member for connection to file descriptor.  We use
    * the node instead of the connection object itself so we can quickly look
    * up a connection and remove it if necessary (as otherwise we'd have to
    * scan all connections) */
-  ares__htable_asvp_t  *connnode_by_socket;
+  ares_htable_asvp_t  *connnode_by_socket;
 
-  ares_sock_state_cb    sock_state_cb;
-  void                 *sock_state_cb_data;
+  ares_sock_state_cb   sock_state_cb;
+  void                *sock_state_cb_data;
 
   ares_sock_create_callback           sock_create_cb;
   void                               *sock_create_cb_data;
@@ -400,8 +260,14 @@ struct ares_channeldata {
   ares_sock_config_callback           sock_config_cb;
   void                               *sock_config_cb_data;
 
-  const struct ares_socket_functions *sock_funcs;
+  struct ares_socket_functions_ex     sock_funcs;
   void                               *sock_func_cb_data;
+  const struct ares_socket_functions *legacy_sock_funcs;
+  void                               *legacy_sock_funcs_cb_data;
+
+  ares_pending_write_cb               notify_pending_write_cb;
+  void                               *notify_pending_write_cb_data;
+  ares_bool_t                         notify_pending_write;
 
   /* Path for resolv.conf file, configurable via ares_options */
   char                               *resolvconf_path;
@@ -416,7 +282,7 @@ struct ares_channeldata {
   ares_hosts_file_t                  *hf;
 
   /* Query Cache */
-  ares__qcache_t                     *qcache;
+  ares_qcache_t                      *qcache;
 
   /* Fields controlling server failover behavior.
    * The retry chance is the probability (1/N) by which we will retry a failed
@@ -437,7 +303,7 @@ struct ares_channeldata {
    * reading may block.  The thread handle is provided for waiting on thread
    * exit. */
   ares_bool_t                         reinit_pending;
-  ares__thread_t                     *reinit_thread;
+  ares_thread_t                      *reinit_thread;
 
   /* Whether the system is up or not.  This is mainly to prevent deadlocks
    * and access violations during the cleanup process.  Some things like
@@ -447,29 +313,18 @@ struct ares_channeldata {
 };
 
 /* Does the domain end in ".onion" or ".onion."? Case-insensitive. */
-ares_bool_t ares__is_onion_domain(const char *name);
-
-/* Memory management functions */
-extern void *(*ares_malloc)(size_t size);
-extern void *(*ares_realloc)(void *ptr, size_t size);
-extern void (*ares_free)(void *ptr);
-void         *ares_malloc_zero(size_t size);
-void         *ares_realloc_zero(void *ptr, size_t orig_size, size_t new_size);
-
-/* return true if now is exactly check time or later */
-ares_bool_t   ares__timedout(const ares_timeval_t *now,
-                             const ares_timeval_t *check);
+ares_bool_t   ares_is_onion_domain(const char *name);
 
 /* Returns one of the normal ares status codes like ARES_SUCCESS */
-ares_status_t ares__send_query(ares_query_t *query, const ares_timeval_t *now);
-ares_status_t ares__requeue_query(ares_query_t            *query,
-                                  const ares_timeval_t    *now,
-                                  ares_status_t            status,
-                                  ares_bool_t              inc_try_count,
-                                  const ares_dns_record_t *dnsrec);
+ares_status_t ares_send_query(ares_server_t *requested_server /* Optional */,
+                              ares_query_t *query, const ares_timeval_t *now);
+ares_status_t ares_requeue_query(ares_query_t *query, const ares_timeval_t *now,
+                                 ares_status_t            status,
+                                 ares_bool_t              inc_try_count,
+                                 const ares_dns_record_t *dnsrec);
 
 /*! Count the number of labels (dots+1) in a domain */
-size_t ares__name_label_cnt(const char *name);
+size_t        ares_name_label_cnt(const char *name);
 
 /*! Retrieve a list of names to use for searching.  The first successful
  *  query in the list wins.  This function also uses the HOSTSALIASES file
@@ -477,57 +332,45 @@ size_t ares__name_label_cnt(const char *name);
  *
  *  \param[in]  channel   initialized ares channel
  *  \param[in]  name      initial name being searched
- *  \param[out] names     array of names to attempt, use ares__strsplit_free()
+ *  \param[out] names     array of names to attempt, use ares_strsplit_free()
  *                        when no longer needed.
  *  \param[out] names_len number of names in array
  *  \return ARES_SUCCESS on success, otherwise one of the other error codes.
  */
-ares_status_t ares__search_name_list(const ares_channel_t *channel,
-                                     const char *name, char ***names,
-                                     size_t *names_len);
+ares_status_t ares_search_name_list(const ares_channel_t *channel,
+                                    const char *name, char ***names,
+                                    size_t *names_len);
 
 /*! Function to create callback arg for converting from ares_callback_dnsrec
  *  to ares_calback */
-void         *ares__dnsrec_convert_arg(ares_callback callback, void *arg);
+void         *ares_dnsrec_convert_arg(ares_callback callback, void *arg);
 
 /*! Callback function used to convert from the ares_callback_dnsrec prototype to
  *  the ares_callback prototype, by writing the result and passing that to
  *  the inner callback.
  */
-void ares__dnsrec_convert_cb(void *arg, ares_status_t status, size_t timeouts,
-                             const ares_dns_record_t *dnsrec);
-
-void ares__close_connection(ares_conn_t *conn, ares_status_t requeue_status);
-void ares__close_sockets(ares_server_t *server);
-void ares__check_cleanup_conns(const ares_channel_t *channel);
-void ares__free_query(ares_query_t *query);
-
-ares_rand_state *ares__init_rand_state(void);
-void             ares__destroy_rand_state(ares_rand_state *state);
-void ares__rand_bytes(ares_rand_state *state, unsigned char *buf, size_t len);
-
-unsigned short ares__generate_new_id(ares_rand_state *state);
-void           ares__tvnow(ares_timeval_t *now);
-void           ares__timeval_remaining(ares_timeval_t       *remaining,
-                                       const ares_timeval_t *now,
-                                       const ares_timeval_t *tout);
-void ares__timeval_diff(ares_timeval_t *tvdiff, const ares_timeval_t *tvstart,
-                        const ares_timeval_t *tvstop);
-ares_status_t ares__expand_name_validated(const unsigned char *encoded,
-                                          const unsigned char *abuf,
-                                          size_t alen, char **s, size_t *enclen,
+void ares_dnsrec_convert_cb(void *arg, ares_status_t status, size_t timeouts,
+                            const ares_dns_record_t *dnsrec);
+
+void ares_free_query(ares_query_t *query);
+
+unsigned short ares_generate_new_id(ares_rand_state *state);
+ares_status_t  ares_expand_name_validated(const unsigned char *encoded,
+                                          const unsigned char *abuf, size_t alen,
+                                          char **s, size_t *enclen,
                                           ares_bool_t is_hostname);
-ares_status_t ares_expand_string_ex(const unsigned char *encoded,
-                                    const unsigned char *abuf, size_t alen,
-                                    unsigned char **s, size_t *enclen);
-ares_status_t ares__init_servers_state(ares_channel_t *channel);
-ares_status_t ares__init_by_options(ares_channel_t            *channel,
+ares_status_t  ares_expand_string_ex(const unsigned char *encoded,
+                                     const unsigned char *abuf, size_t alen,
+                                     unsigned char **s, size_t *enclen);
+ares_status_t  ares_init_servers_state(ares_channel_t *channel);
+ares_status_t  ares_init_by_options(ares_channel_t            *channel,
                                     const struct ares_options *options,
                                     int                        optmask);
-ares_status_t ares__init_by_sysconfig(ares_channel_t *channel);
+ares_status_t  ares_init_by_sysconfig(ares_channel_t *channel);
+void           ares_set_socket_functions_def(ares_channel_t *channel);
 
 typedef struct {
-  ares__llist_t   *sconfig;
+  ares_llist_t    *sconfig;
   struct apattern *sortlist;
   size_t           nsortlist;
   char           **domains;
@@ -540,131 +383,108 @@ typedef struct {
   ares_bool_t      usevc;
 } ares_sysconfig_t;
 
-ares_status_t ares__sysconfig_set_options(ares_sysconfig_t *sysconfig,
-                                          const char       *str);
+ares_status_t ares_sysconfig_set_options(ares_sysconfig_t *sysconfig,
+                                         const char       *str);
 
-ares_status_t ares__init_by_environment(ares_sysconfig_t *sysconfig);
+ares_status_t ares_init_by_environment(ares_sysconfig_t *sysconfig);
 
-ares_status_t ares__init_sysconfig_files(const ares_channel_t *channel,
-                                         ares_sysconfig_t     *sysconfig);
+ares_status_t ares_init_sysconfig_files(const ares_channel_t *channel,
+                                        ares_sysconfig_t     *sysconfig);
 #ifdef __APPLE__
-ares_status_t ares__init_sysconfig_macos(ares_sysconfig_t *sysconfig);
+ares_status_t ares_init_sysconfig_macos(const ares_channel_t *channel,
+                                        ares_sysconfig_t     *sysconfig);
 #endif
 #ifdef USE_WINSOCK
-ares_status_t ares__init_sysconfig_windows(ares_sysconfig_t *sysconfig);
+ares_status_t ares_init_sysconfig_windows(const ares_channel_t *channel,
+                                          ares_sysconfig_t     *sysconfig);
 #endif
 
-ares_status_t ares__parse_sortlist(struct apattern **sortlist, size_t *nsort,
-                                   const char *str);
-
-void          ares__destroy_servers_state(ares_channel_t *channel);
+ares_status_t ares_parse_sortlist(struct apattern **sortlist, size_t *nsort,
+                                  const char *str);
 
 /* Returns ARES_SUCCESS if alias found, alias is set.  Returns ARES_ENOTFOUND
  * if not alias found.  Returns other errors on critical failure like
  * ARES_ENOMEM */
-ares_status_t ares__lookup_hostaliases(const ares_channel_t *channel,
-                                       const char *name, char **alias);
+ares_status_t ares_lookup_hostaliases(const ares_channel_t *channel,
+                                      const char *name, char **alias);
 
-ares_status_t ares__cat_domain(const char *name, const char *domain, char **s);
-ares_status_t ares__sortaddrinfo(ares_channel_t            *channel,
-                                 struct ares_addrinfo_node *ai_node);
+ares_status_t ares_cat_domain(const char *name, const char *domain, char **s);
+ares_status_t ares_sortaddrinfo(ares_channel_t            *channel,
+                                struct ares_addrinfo_node *ai_node);
 
-void          ares__freeaddrinfo_nodes(struct ares_addrinfo_node *ai_node);
-ares_bool_t   ares__is_localhost(const char *name);
+void          ares_freeaddrinfo_nodes(struct ares_addrinfo_node *ai_node);
+ares_bool_t   ares_is_localhost(const char *name);
 
 struct ares_addrinfo_node    *
-  ares__append_addrinfo_node(struct ares_addrinfo_node **ai_node);
-void ares__addrinfo_cat_nodes(struct ares_addrinfo_node **head,
-                              struct ares_addrinfo_node  *tail);
+  ares_append_addrinfo_node(struct ares_addrinfo_node **ai_node);
+void ares_addrinfo_cat_nodes(struct ares_addrinfo_node **head,
+                             struct ares_addrinfo_node  *tail);
 
-void ares__freeaddrinfo_cnames(struct ares_addrinfo_cname *ai_cname);
+void ares_freeaddrinfo_cnames(struct ares_addrinfo_cname *ai_cname);
 
-struct ares_addrinfo_cname *
-  ares__append_addrinfo_cname(struct ares_addrinfo_cname **ai_cname);
+struct ares_addrinfo_cname             *
+  ares_append_addrinfo_cname(struct ares_addrinfo_cname **ai_cname);
 
 ares_status_t ares_append_ai_node(int aftype, unsigned short port,
                                   unsigned int ttl, const void *adata,
                                   struct ares_addrinfo_node **nodes);
 
-void          ares__addrinfo_cat_cnames(struct ares_addrinfo_cname **head,
-                                        struct ares_addrinfo_cname  *tail);
+void          ares_addrinfo_cat_cnames(struct ares_addrinfo_cname **head,
+                                       struct ares_addrinfo_cname  *tail);
 
-ares_status_t ares__parse_into_addrinfo(const ares_dns_record_t *dnsrec,
-                                        ares_bool_t    cname_only_is_enodata,
-                                        unsigned short port,
-                                        struct ares_addrinfo *ai);
+ares_status_t ares_parse_into_addrinfo(const ares_dns_record_t *dnsrec,
+                                       ares_bool_t    cname_only_is_enodata,
+                                       unsigned short port,
+                                       struct ares_addrinfo *ai);
 ares_status_t ares_parse_ptr_reply_dnsrec(const ares_dns_record_t *dnsrec,
                                           const void *addr, int addrlen,
                                           int family, struct hostent **host);
 
-ares_status_t ares__addrinfo2hostent(const struct ares_addrinfo *ai, int family,
-                                     struct hostent **host);
-ares_status_t ares__addrinfo2addrttl(const struct ares_addrinfo *ai, int family,
-                                     size_t                req_naddrttls,
-                                     struct ares_addrttl  *addrttls,
-                                     struct ares_addr6ttl *addr6ttls,
-                                     size_t               *naddrttls);
-ares_status_t ares__addrinfo_localhost(const char *name, unsigned short port,
-                                       const struct ares_addrinfo_hints *hints,
-                                       struct ares_addrinfo             *ai);
-ares_status_t ares__open_connection(ares_conn_t   **conn_out,
-                                    ares_channel_t *channel,
-                                    ares_server_t *server, ares_bool_t is_tcp);
-ares_bool_t   ares_sockaddr_to_ares_addr(struct ares_addr      *ares_addr,
-                                         unsigned short        *port,
-                                         const struct sockaddr *sockaddr);
-ares_socket_t ares__open_socket(ares_channel_t *channel, int af, int type,
-                                int protocol);
-ares_bool_t   ares__socket_try_again(int errnum);
-ares_ssize_t  ares__conn_write(ares_conn_t *conn, const void *data, size_t len);
-ares_ssize_t  ares__socket_recvfrom(ares_channel_t *channel, ares_socket_t s,
-                                    void *data, size_t data_len, int flags,
-                                    struct sockaddr *from,
-                                    ares_socklen_t  *from_len);
-ares_ssize_t  ares__socket_recv(ares_channel_t *channel, ares_socket_t s,
-                                void *data, size_t data_len);
-void          ares__close_socket(ares_channel_t *channel, ares_socket_t s);
-ares_status_t ares__connect_socket(ares_channel_t        *channel,
-                                   ares_socket_t          sockfd,
-                                   const struct sockaddr *addr,
-                                   ares_socklen_t         addrlen);
-void          ares__destroy_server(ares_server_t *server);
-
-ares_status_t ares__servers_update(ares_channel_t *channel,
-                                   ares__llist_t  *server_list,
-                                   ares_bool_t     user_specified);
-ares_status_t ares__sconfig_append(ares__llist_t         **sconfig,
-                                   const struct ares_addr *addr,
-                                   unsigned short          udp_port,
-                                   unsigned short          tcp_port,
-                                   const char             *ll_iface);
-ares_status_t ares__sconfig_append_fromstr(ares__llist_t **sconfig,
-                                           const char     *str,
-                                           ares_bool_t     ignore_invalid);
-ares_status_t ares_in_addr_to_server_config_llist(const struct in_addr *servers,
-                                                  size_t          nservers,
-                                                  ares__llist_t **llist);
+ares_status_t ares_addrinfo2hostent(const struct ares_addrinfo *ai, int family,
+                                    struct hostent **host);
+ares_status_t ares_addrinfo2addrttl(const struct ares_addrinfo *ai, int family,
+                                    size_t                req_naddrttls,
+                                    struct ares_addrttl  *addrttls,
+                                    struct ares_addr6ttl *addr6ttls,
+                                    size_t               *naddrttls);
+ares_status_t ares_addrinfo_localhost(const char *name, unsigned short port,
+                                      const struct ares_addrinfo_hints *hints,
+                                      struct ares_addrinfo             *ai);
+
+ares_status_t ares_servers_update(ares_channel_t *channel,
+                                  ares_llist_t   *server_list,
+                                  ares_bool_t     user_specified);
+ares_status_t
+  ares_sconfig_append(const ares_channel_t *channel, ares_llist_t **sconfig,
+                      const struct ares_addr *addr, unsigned short udp_port,
+                      unsigned short tcp_port, const char *ll_iface);
+ares_status_t ares_sconfig_append_fromstr(const ares_channel_t *channel,
+                                          ares_llist_t        **sconfig,
+                                          const char           *str,
+                                          ares_bool_t           ignore_invalid);
+ares_status_t ares_in_addr_to_sconfig_llist(const struct in_addr *servers,
+                                            size_t                nservers,
+                                            ares_llist_t        **llist);
 ares_status_t ares_get_server_addr(const ares_server_t *server,
-                                   ares__buf_t         *buf);
+                                   ares_buf_t          *buf);
 
 struct ares_hosts_entry;
 typedef struct ares_hosts_entry ares_hosts_entry_t;
 
-void                            ares__hosts_file_destroy(ares_hosts_file_t *hf);
-ares_status_t ares__hosts_search_ipaddr(ares_channel_t *channel,
-                                        ares_bool_t use_env, const char *ipaddr,
-                                        const ares_hosts_entry_t **entry);
-ares_status_t ares__hosts_search_host(ares_channel_t *channel,
-                                      ares_bool_t use_env, const char *host,
-                                      const ares_hosts_entry_t **entry);
-ares_status_t ares__hosts_entry_to_hostent(const ares_hosts_entry_t *entry,
-                                           int                       family,
-                                           struct hostent          **hostent);
-ares_status_t ares__hosts_entry_to_addrinfo(const ares_hosts_entry_t *entry,
-                                            const char *name, int family,
-                                            unsigned short        port,
-                                            ares_bool_t           want_cnames,
-                                            struct ares_addrinfo *ai);
+void                            ares_hosts_file_destroy(ares_hosts_file_t *hf);
+ares_status_t ares_hosts_search_ipaddr(ares_channel_t *channel,
+                                       ares_bool_t use_env, const char *ipaddr,
+                                       const ares_hosts_entry_t **entry);
+ares_status_t ares_hosts_search_host(ares_channel_t *channel,
+                                     ares_bool_t use_env, const char *host,
+                                     const ares_hosts_entry_t **entry);
+ares_status_t ares_hosts_entry_to_hostent(const ares_hosts_entry_t *entry,
+                                          int family, struct hostent **hostent);
+ares_status_t ares_hosts_entry_to_addrinfo(const ares_hosts_entry_t *entry,
+                                           const char *name, int family,
+                                           unsigned short        port,
+                                           ares_bool_t           want_cnames,
+                                           struct ares_addrinfo *ai);
 
 /* Same as ares_query_dnsrec() except does not take a channel lock.  Use this
  * if a channel lock is already held */
@@ -674,9 +494,17 @@ ares_status_t ares_query_nolock(ares_channel_t *channel, const char *name,
                                 ares_callback_dnsrec callback, void *arg,
                                 unsigned short *qid);
 
-/* Same as ares_send_dnsrec() except does not take a channel lock.  Use this
- * if a channel lock is already held */
-ares_status_t ares_send_nolock(ares_channel_t          *channel,
+/*! Flags controlling behavior for ares_send_nolock() */
+typedef enum {
+  ARES_SEND_FLAG_NOCACHE = 1 << 0, /*!< Do not query the cache */
+  ARES_SEND_FLAG_NORETRY = 1 << 1  /*!< Do not retry this query on error */
+} ares_send_flags_t;
+
+/* Similar to ares_send_dnsrec() except does not take a channel lock, allows
+ * specifying a particular server to use, and also flags controlling behavior.
+ */
+ares_status_t ares_send_nolock(ares_channel_t *channel, ares_server_t *server,
+                               ares_send_flags_t        flags,
                                const ares_dns_record_t *dnsrec,
                                ares_callback_dnsrec callback, void *arg,
                                unsigned short *qid);
@@ -691,7 +519,7 @@ void ares_gethostbyaddr_nolock(ares_channel_t *channel, const void *addr,
  *  offset within the buffer.
  *
  *  It is assumed that either a const buffer is being used, or before
- *  the message processing was started that ares__buf_reclaim() was called.
+ *  the message processing was started that ares_buf_reclaim() was called.
  *
  *  \param[in]  buf        Initialized buffer object
  *  \param[out] name       Pointer passed by reference to be filled in with
@@ -701,8 +529,8 @@ void ares_gethostbyaddr_nolock(ares_channel_t *channel, const void *addr,
  *                         a valid hostname or will return error.
  *  \return ARES_SUCCESS on success
  */
-ares_status_t ares__dns_name_parse(ares__buf_t *buf, char **name,
-                                   ares_bool_t is_hostname);
+ares_status_t ares_dns_name_parse(ares_buf_t *buf, char **name,
+                                  ares_bool_t is_hostname);
 
 /*! Write the DNS name to the buffer in the DNS domain-name syntax as a
  *  series of labels.  The maximum domain name length is 255 characters with
@@ -721,9 +549,9 @@ ares_status_t ares__dns_name_parse(ares__buf_t *buf, char **name,
  *  \return ARES_SUCCESS on success, most likely ARES_EBADNAME if the name is
  *          bad.
  */
-ares_status_t ares__dns_name_write(ares__buf_t *buf, ares__llist_t **list,
-                                   ares_bool_t validate_hostname,
-                                   const char *name);
+ares_status_t ares_dns_name_write(ares_buf_t *buf, ares_llist_t **list,
+                                  ares_bool_t validate_hostname,
+                                  const char *name);
 
 /*! Check if the queue is empty, if so, wake any waiters.  This is only
  *  effective if built with threading support.
@@ -734,35 +562,20 @@ ares_status_t ares__dns_name_write(ares__buf_t *buf, ares__llist_t **list,
  */
 void          ares_queue_notify_empty(ares_channel_t *channel);
 
-
-#define SOCK_STATE_CALLBACK(c, s, r, w)                           \
-  do {                                                            \
-    if ((c)->sock_state_cb) {                                     \
-      (c)->sock_state_cb((c)->sock_state_cb_data, (s), (r), (w)); \
-    }                                                             \
-  } while (0)
-
-#define ARES_CONFIG_CHECK(x)                                               \
-  (x && x->lookups && ares__slist_len(x->servers) > 0 && x->timeout > 0 && \
+#define ARES_CONFIG_CHECK(x)                                              \
+  (x && x->lookups && ares_slist_len(x->servers) > 0 && x->timeout > 0 && \
    x->tries > 0)
 
-ares_bool_t   ares__subnet_match(const struct ares_addr *addr,
-                                 const struct ares_addr *subnet,
-                                 unsigned char           netmask);
-ares_bool_t   ares__addr_is_linklocal(const struct ares_addr *addr);
-
-ares_bool_t   ares__is_64bit(void);
-size_t        ares__round_up_pow2(size_t n);
-size_t        ares__log2(size_t n);
-size_t        ares__pow(size_t x, size_t y);
-size_t        ares__count_digits(size_t n);
-size_t        ares__count_hexdigits(size_t n);
-unsigned char ares__count_bits_u8(unsigned char x);
-void          ares__qcache_destroy(ares__qcache_t *cache);
-ares_status_t ares__qcache_create(ares_rand_state *rand_state,
-                                  unsigned int     max_ttl,
-                                  ares__qcache_t **cache_out);
-void          ares__qcache_flush(ares__qcache_t *cache);
+ares_bool_t   ares_subnet_match(const struct ares_addr *addr,
+                                const struct ares_addr *subnet,
+                                unsigned char           netmask);
+ares_bool_t   ares_addr_is_linklocal(const struct ares_addr *addr);
+
+void          ares_qcache_destroy(ares_qcache_t *cache);
+ares_status_t ares_qcache_create(ares_rand_state *rand_state,
+                                 unsigned int     max_ttl,
+                                 ares_qcache_t  **cache_out);
+void          ares_qcache_flush(ares_qcache_t *cache);
 ares_status_t ares_qcache_insert(ares_channel_t       *channel,
                                  const ares_timeval_t *now,
                                  const ares_query_t   *query,
@@ -784,10 +597,10 @@ ares_status_t ares_cookie_validate(ares_query_t            *query,
                                    ares_conn_t             *conn,
                                    const ares_timeval_t    *now);
 
-ares_status_t ares__channel_threading_init(ares_channel_t *channel);
-void          ares__channel_threading_destroy(ares_channel_t *channel);
-void          ares__channel_lock(const ares_channel_t *channel);
-void          ares__channel_unlock(const ares_channel_t *channel);
+ares_status_t ares_channel_threading_init(ares_channel_t *channel);
+void          ares_channel_threading_destroy(ares_channel_t *channel);
+void          ares_channel_lock(const ares_channel_t *channel);
+void          ares_channel_unlock(const ares_channel_t *channel);
 
 struct ares_event_thread;
 typedef struct ares_event_thread ares_event_thread_t;
diff --git a/deps/cares/src/lib/ares_process.c b/deps/cares/src/lib/ares_process.c
index f05f67d8f2b176..62a6ae1ddaa46e 100644
--- a/deps/cares/src/lib/ares_process.c
+++ b/deps/cares/src/lib/ares_process.c
@@ -46,32 +46,30 @@
 
 
 static void          timeadd(ares_timeval_t *now, size_t millisecs);
-static void          write_tcp_data(ares_channel_t *channel, fd_set *write_fds,
-                                    ares_socket_t write_fd);
-static void          read_packets(ares_channel_t *channel, fd_set *read_fds,
-                                  ares_socket_t read_fd, const ares_timeval_t *now);
-static void          process_timeouts(ares_channel_t       *channel,
+static ares_status_t process_write(ares_channel_t *channel,
+                                   ares_socket_t   write_fd);
+static ares_status_t process_read(ares_channel_t       *channel,
+                                  ares_socket_t         read_fd,
+                                  const ares_timeval_t *now);
+static ares_status_t process_timeouts(ares_channel_t       *channel,
                                       const ares_timeval_t *now);
 static ares_status_t process_answer(ares_channel_t      *channel,
                                     const unsigned char *abuf, size_t alen,
-                                    ares_conn_t *conn, ares_bool_t tcp,
+                                    ares_conn_t          *conn,
                                     const ares_timeval_t *now);
 static void handle_conn_error(ares_conn_t *conn, ares_bool_t critical_failure,
                               ares_status_t failure_status);
-
 static ares_bool_t same_questions(const ares_query_t      *query,
                                   const ares_dns_record_t *arec);
-static ares_bool_t same_address(const struct sockaddr  *sa,
-                                const struct ares_addr *aa);
 static void        end_query(ares_channel_t *channel, ares_server_t *server,
                              ares_query_t *query, ares_status_t status,
                              const ares_dns_record_t *dnsrec);
 
-static void        ares__query_disassociate_from_conn(ares_query_t *query)
+static void        ares_query_remove_from_conn(ares_query_t *query)
 {
   /* If its not part of a connection, it can't be tracked for timeouts either */
-  ares__slist_node_destroy(query->node_queries_by_timeout);
-  ares__llist_node_destroy(query->node_queries_to_conn);
+  ares_slist_node_destroy(query->node_queries_by_timeout);
+  ares_llist_node_destroy(query->node_queries_to_conn);
   query->node_queries_by_timeout = NULL;
   query->node_queries_to_conn    = NULL;
   query->conn                    = NULL;
@@ -82,7 +80,7 @@ static void invoke_server_state_cb(const ares_server_t *server,
                                    ares_bool_t success, int flags)
 {
   const ares_channel_t *channel = server->channel;
-  ares__buf_t          *buf;
+  ares_buf_t           *buf;
   ares_status_t         status;
   char                 *server_string;
 
@@ -90,18 +88,18 @@ static void invoke_server_state_cb(const ares_server_t *server,
     return;
   }
 
-  buf = ares__buf_create();
+  buf = ares_buf_create();
   if (buf == NULL) {
     return; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   status = ares_get_server_addr(server, buf);
   if (status != ARES_SUCCESS) {
-    ares__buf_destroy(buf); /* LCOV_EXCL_LINE: OutOfMemory */
-    return;                 /* LCOV_EXCL_LINE: OutOfMemory */
+    ares_buf_destroy(buf); /* LCOV_EXCL_LINE: OutOfMemory */
+    return;                /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  server_string = ares__buf_finish_str(buf, NULL);
+  server_string = ares_buf_finish_str(buf, NULL);
   buf           = NULL;
   if (server_string == NULL) {
     return; /* LCOV_EXCL_LINE: OutOfMemory */
@@ -115,19 +113,19 @@ static void invoke_server_state_cb(const ares_server_t *server,
 static void server_increment_failures(ares_server_t *server,
                                       ares_bool_t    used_tcp)
 {
-  ares__slist_node_t   *node;
+  ares_slist_node_t    *node;
   const ares_channel_t *channel = server->channel;
   ares_timeval_t        next_retry_time;
 
-  node = ares__slist_node_find(channel->servers, server);
+  node = ares_slist_node_find(channel->servers, server);
   if (node == NULL) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
   server->consec_failures++;
-  ares__slist_node_reinsert(node);
+  ares_slist_node_reinsert(node);
 
-  ares__tvnow(&next_retry_time);
+  ares_tvnow(&next_retry_time);
   timeadd(&next_retry_time, channel->server_retry_delay);
   server->next_retry_time = next_retry_time;
 
@@ -138,17 +136,17 @@ static void server_increment_failures(ares_server_t *server,
 
 static void server_set_good(ares_server_t *server, ares_bool_t used_tcp)
 {
-  ares__slist_node_t   *node;
+  ares_slist_node_t    *node;
   const ares_channel_t *channel = server->channel;
 
-  node = ares__slist_node_find(channel->servers, server);
+  node = ares_slist_node_find(channel->servers, server);
   if (node == NULL) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
   if (server->consec_failures > 0) {
     server->consec_failures = 0;
-    ares__slist_node_reinsert(node);
+    ares_slist_node_reinsert(node);
   }
 
   server->next_retry_time.sec  = 0;
@@ -160,8 +158,8 @@ static void server_set_good(ares_server_t *server, ares_bool_t used_tcp)
 }
 
 /* return true if now is exactly check time or later */
-ares_bool_t ares__timedout(const ares_timeval_t *now,
-                           const ares_timeval_t *check)
+ares_bool_t ares_timedout(const ares_timeval_t *now,
+                          const ares_timeval_t *check)
 {
   ares_int64_t secs = (now->sec - check->sec);
 
@@ -190,380 +188,422 @@ static void timeadd(ares_timeval_t *now, size_t millisecs)
   }
 }
 
-/*
- * generic process function
- */
-static void processfds(ares_channel_t *channel, fd_set *read_fds,
-                       ares_socket_t read_fd, fd_set *write_fds,
-                       ares_socket_t write_fd)
+static ares_status_t ares_process_fds_nolock(ares_channel_t         *channel,
+                                             const ares_fd_events_t *events,
+                                             size_t nevents, unsigned int flags)
 {
   ares_timeval_t now;
+  size_t         i;
+  ares_status_t  status = ARES_SUCCESS;
 
-  if (channel == NULL) {
-    return; /* LCOV_EXCL_LINE: DefensiveCoding */
+  if (channel == NULL || (events == NULL && nevents != 0)) {
+    return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  ares__channel_lock(channel);
+  ares_tvnow(&now);
+
+  /* Process write events */
+  for (i = 0; i < nevents; i++) {
+    if (events[i].fd == ARES_SOCKET_BAD ||
+        !(events[i].events & ARES_FD_EVENT_WRITE)) {
+      continue;
+    }
+    status = process_write(channel, events[i].fd);
+    /* We only care about ENOMEM, anything else is handled via connection
+     * retries, etc */
+    if (status == ARES_ENOMEM) {
+      goto done;
+    }
+  }
 
-  ares__tvnow(&now);
-  read_packets(channel, read_fds, read_fd, &now);
-  process_timeouts(channel, &now);
-  /* Write last as the other 2 operations might have triggered writes */
-  write_tcp_data(channel, write_fds, write_fd);
+  /* Process read events */
+  for (i = 0; i < nevents; i++) {
+    if (events[i].fd == ARES_SOCKET_BAD ||
+        !(events[i].events & ARES_FD_EVENT_READ)) {
+      continue;
+    }
+    status = process_read(channel, events[i].fd, &now);
+    if (status == ARES_ENOMEM) {
+      goto done;
+    }
+  }
+
+  if (!(flags & ARES_PROCESS_FLAG_SKIP_NON_FD)) {
+    ares_check_cleanup_conns(channel);
+    status = process_timeouts(channel, &now);
+    if (status == ARES_ENOMEM) {
+      goto done;
+    }
+  }
 
-  /* See if any connections should be cleaned up */
-  ares__check_cleanup_conns(channel);
-  ares__channel_unlock(channel);
+done:
+  if (status == ARES_ENOMEM) {
+    return ARES_ENOMEM;
+  }
+  return ARES_SUCCESS;
 }
 
-/* Something interesting happened on the wire, or there was a timeout.
- * See what's up and respond accordingly.
- */
-void ares_process(ares_channel_t *channel, fd_set *read_fds, fd_set *write_fds)
+ares_status_t ares_process_fds(ares_channel_t         *channel,
+                               const ares_fd_events_t *events, size_t nevents,
+                               unsigned int flags)
 {
-  processfds(channel, read_fds, ARES_SOCKET_BAD, write_fds, ARES_SOCKET_BAD);
+  ares_status_t status;
+
+  if (channel == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  ares_channel_lock(channel);
+  status = ares_process_fds_nolock(channel, events, nevents, flags);
+  ares_channel_unlock(channel);
+  return status;
 }
 
-/* Something interesting happened on the wire, or there was a timeout.
- * See what's up and respond accordingly.
- */
-void ares_process_fd(ares_channel_t *channel,
-                     ares_socket_t   read_fd, /* use ARES_SOCKET_BAD or valid
-                                                 file descriptors */
-                     ares_socket_t   write_fd)
+void ares_process_fd(ares_channel_t *channel, ares_socket_t read_fd,
+                     ares_socket_t write_fd)
 {
-  processfds(channel, NULL, read_fd, NULL, write_fd);
+  ares_fd_events_t events[2];
+  size_t           nevents = 0;
+
+  memset(events, 0, sizeof(events));
+
+  if (read_fd != ARES_SOCKET_BAD) {
+    nevents++;
+    events[nevents - 1].fd      = read_fd;
+    events[nevents - 1].events |= ARES_FD_EVENT_READ;
+  }
+
+  if (write_fd != ARES_SOCKET_BAD) {
+    if (write_fd != read_fd) {
+      nevents++;
+    }
+    events[nevents - 1].fd      = write_fd;
+    events[nevents - 1].events |= ARES_FD_EVENT_WRITE;
+  }
+
+  ares_process_fds(channel, events, nevents, ARES_PROCESS_FLAG_NONE);
 }
 
-/* If any TCP sockets select true for writing, write out queued data
- * we have for them.
- */
-static void write_tcp_data(ares_channel_t *channel, fd_set *write_fds,
-                           ares_socket_t write_fd)
+static ares_socket_t *channel_socket_list(const ares_channel_t *channel,
+                                          size_t               *num)
 {
-  ares__slist_node_t *node;
+  ares_slist_node_t *snode;
+  ares_array_t      *arr = ares_array_create(sizeof(ares_socket_t), NULL);
 
-  if (!write_fds && (write_fd == ARES_SOCKET_BAD)) {
-    /* no possible action */
-    return;
+  *num = 0;
+
+  if (arr == NULL) {
+    return NULL; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  for (node = ares__slist_node_first(channel->servers); node != NULL;
-       node = ares__slist_node_next(node)) {
-    ares_server_t       *server = ares__slist_node_val(node);
-    const unsigned char *data;
-    size_t               data_len;
-    ares_ssize_t         count;
+  for (snode = ares_slist_node_first(channel->servers); snode != NULL;
+       snode = ares_slist_node_next(snode)) {
+    ares_server_t     *server = ares_slist_node_val(snode);
+    ares_llist_node_t *node;
 
-    /* Make sure server has data to send and is selected in write_fds or
-       write_fd. */
-    if (ares__buf_len(server->tcp_send) == 0 || server->tcp_conn == NULL) {
-      continue;
-    }
+    for (node = ares_llist_node_first(server->connections); node != NULL;
+         node = ares_llist_node_next(node)) {
+      const ares_conn_t *conn = ares_llist_node_val(node);
+      ares_socket_t     *sptr;
+      ares_status_t      status;
 
-    if (write_fds) {
-      if (!FD_ISSET(server->tcp_conn->fd, write_fds)) {
-        continue;
-      }
-    } else {
-      if (server->tcp_conn->fd != write_fd) {
+      if (conn->fd == ARES_SOCKET_BAD) {
         continue;
       }
-    }
 
-    if (write_fds) {
-      /* If there's an error and we close this socket, then open
-       * another with the same fd to talk to another server, then we
-       * don't want to think that it was the new socket that was
-       * ready. This is not disastrous, but is likely to result in
-       * extra system calls and confusion. */
-      FD_CLR(server->tcp_conn->fd, write_fds);
-    }
-
-    data  = ares__buf_peek(server->tcp_send, &data_len);
-    count = ares__conn_write(server->tcp_conn, data, data_len);
-    if (count <= 0) {
-      if (!ares__socket_try_again(SOCKERRNO)) {
-        handle_conn_error(server->tcp_conn, ARES_TRUE, ARES_ECONNREFUSED);
+      status = ares_array_insert_last((void **)&sptr, arr);
+      if (status != ARES_SUCCESS) {
+        ares_array_destroy(arr); /* LCOV_EXCL_LINE: OutOfMemory */
+        return NULL;             /* LCOV_EXCL_LINE: OutOfMemory */
       }
-      continue;
-    }
-
-    /* Strip data written from the buffer */
-    ares__buf_consume(server->tcp_send, (size_t)count);
-
-    /* Notify state callback all data is written */
-    if (ares__buf_len(server->tcp_send) == 0) {
-      SOCK_STATE_CALLBACK(channel, server->tcp_conn->fd, 1, 0);
+      *sptr = conn->fd;
     }
   }
+
+  return ares_array_finish(arr, num);
 }
 
-/* If any TCP socket selects true for reading, read some data,
- * allocate a buffer if we finish reading the length word, and process
- * a packet if we finish reading one.
+/* Something interesting happened on the wire, or there was a timeout.
+ * See what's up and respond accordingly.
  */
-static void read_tcp_data(ares_channel_t *channel, ares_conn_t *conn,
-                          const ares_timeval_t *now)
+void ares_process(ares_channel_t *channel, fd_set *read_fds, fd_set *write_fds)
 {
-  ares_ssize_t   count;
-  ares_server_t *server = conn->server;
-
-  /* Fetch buffer to store data we are reading */
-  size_t         ptr_len = 65535;
-  unsigned char *ptr;
-
-  ptr = ares__buf_append_start(server->tcp_parser, &ptr_len);
-
-  if (ptr == NULL) {
-    handle_conn_error(conn, ARES_FALSE /* not critical to connection */,
-                      ARES_SUCCESS);
-    return; /* bail out on malloc failure. TODO: make this
-               function return error codes */
-  }
+  size_t            i;
+  size_t            num_sockets;
+  ares_socket_t    *socketlist;
+  ares_fd_events_t *events  = NULL;
+  size_t            nevents = 0;
 
-  /* Read from socket */
-  count = ares__socket_recv(channel, conn->fd, ptr, ptr_len);
-  if (count <= 0) {
-    ares__buf_append_finish(server->tcp_parser, 0);
-    if (!(count == -1 && ares__socket_try_again(SOCKERRNO))) {
-      handle_conn_error(conn, ARES_TRUE, ARES_ECONNREFUSED);
-    }
+  if (channel == NULL) {
     return;
   }
 
-  /* Record amount of data read */
-  ares__buf_append_finish(server->tcp_parser, (size_t)count);
-
-  /* Process all queued answers */
-  while (1) {
-    unsigned short       dns_len  = 0;
-    const unsigned char *data     = NULL;
-    size_t               data_len = 0;
-    ares_status_t        status;
+  ares_channel_lock(channel);
 
-    /* Tag so we can roll back */
-    ares__buf_tag(server->tcp_parser);
+  /* There is no good way to iterate across an fd_set, instead we must pull a
+   * list of all known fds, and iterate across that checking against the fd_set.
+   */
+  socketlist = channel_socket_list(channel, &num_sockets);
 
-    /* Read length indicator */
-    if (ares__buf_fetch_be16(server->tcp_parser, &dns_len) != ARES_SUCCESS) {
-      ares__buf_tag_rollback(server->tcp_parser);
-      break;
+  /* Let's create an events array, maximum number is the number of sockets in
+   * the list, so we'll use that and just track entries with nevents */
+  if (num_sockets) {
+    events = ares_malloc_zero(sizeof(*events) * num_sockets);
+    if (events == NULL) {
+      goto done;
     }
+  }
 
-    /* Not enough data for a full response yet */
-    if (ares__buf_consume(server->tcp_parser, dns_len) != ARES_SUCCESS) {
-      ares__buf_tag_rollback(server->tcp_parser);
-      break;
+  for (i = 0; i < num_sockets; i++) {
+    ares_bool_t had_read = ARES_FALSE;
+    if (read_fds && FD_ISSET(socketlist[i], read_fds)) {
+      nevents++;
+      events[nevents - 1].fd      = socketlist[i];
+      events[nevents - 1].events |= ARES_FD_EVENT_READ;
+      had_read                    = ARES_TRUE;
     }
-
-    /* Can't fail except for misuse */
-    data = ares__buf_tag_fetch(server->tcp_parser, &data_len);
-    if (data == NULL || data_len < 2) {
-      ares__buf_tag_clear(server->tcp_parser);
-      break;
+    if (write_fds && FD_ISSET(socketlist[i], write_fds)) {
+      if (!had_read) {
+        nevents++;
+      }
+      events[nevents - 1].fd      = socketlist[i];
+      events[nevents - 1].events |= ARES_FD_EVENT_WRITE;
     }
+  }
 
-    /* Strip off 2 bytes length */
-    data     += 2;
-    data_len -= 2;
+done:
+  ares_process_fds_nolock(channel, events, nevents, ARES_PROCESS_FLAG_NONE);
+  ares_free(events);
+  ares_free(socketlist);
+  ares_channel_unlock(channel);
+}
 
-    /* We finished reading this answer; process it */
-    status = process_answer(channel, data, data_len, conn, ARES_TRUE, now);
-    if (status != ARES_SUCCESS) {
-      handle_conn_error(conn, ARES_TRUE, status);
-      return;
-    }
+static ares_status_t process_write(ares_channel_t *channel,
+                                   ares_socket_t   write_fd)
+{
+  ares_conn_t  *conn = ares_conn_from_fd(channel, write_fd);
+  ares_status_t status;
 
-    /* Since we processed the answer, clear the tag so space can be reclaimed */
-    ares__buf_tag_clear(server->tcp_parser);
+  if (conn == NULL) {
+    return ARES_SUCCESS;
+  }
+
+  /* Mark as connected if we got here and TFO Initial not set */
+  if (!(conn->flags & ARES_CONN_FLAG_TFO_INITIAL)) {
+    conn->state_flags |= ARES_CONN_STATE_CONNECTED;
+  }
+
+  status = ares_conn_flush(conn);
+  if (status != ARES_SUCCESS) {
+    handle_conn_error(conn, ARES_TRUE, status);
   }
+  return status;
 }
 
-static ares_socket_t *channel_socket_list(const ares_channel_t *channel,
-                                          size_t               *num)
+void ares_process_pending_write(ares_channel_t *channel)
 {
-  ares__slist_node_t *snode;
-  ares__array_t      *arr = ares__array_create(sizeof(ares_socket_t), NULL);
+  ares_slist_node_t *node;
 
-  *num = 0;
+  if (channel == NULL) {
+    return;
+  }
 
-  if (arr == NULL) {
-    return NULL; /* LCOV_EXCL_LINE: OutOfMemory */
+  ares_channel_lock(channel);
+  if (!channel->notify_pending_write) {
+    ares_channel_unlock(channel);
+    return;
   }
 
-  for (snode = ares__slist_node_first(channel->servers); snode != NULL;
-       snode = ares__slist_node_next(snode)) {
-    ares_server_t      *server = ares__slist_node_val(snode);
-    ares__llist_node_t *node;
+  /* Set as untriggered before calling into ares_conn_flush(), this is
+   * because it's possible ares_conn_flush() might cause additional data to
+   * be enqueued if there is some form of exception so it will need to recurse.
+   */
+  channel->notify_pending_write = ARES_FALSE;
 
-    for (node = ares__llist_node_first(server->connections); node != NULL;
-         node = ares__llist_node_next(node)) {
-      const ares_conn_t *conn = ares__llist_node_val(node);
-      ares_socket_t     *sptr;
-      ares_status_t      status;
+  for (node = ares_slist_node_first(channel->servers); node != NULL;
+       node = ares_slist_node_next(node)) {
+    ares_server_t *server = ares_slist_node_val(node);
+    ares_conn_t   *conn   = server->tcp_conn;
+    ares_status_t  status;
 
-      if (conn->fd == ARES_SOCKET_BAD) {
-        continue;
-      }
+    if (conn == NULL) {
+      continue;
+    }
 
-      status = ares__array_insert_last((void **)&sptr, arr);
-      if (status != ARES_SUCCESS) {
-        ares__array_destroy(arr); /* LCOV_EXCL_LINE: OutOfMemory */
-        return NULL;              /* LCOV_EXCL_LINE: OutOfMemory */
-      }
-      *sptr = conn->fd;
+    /* Enqueue any pending data if there is any */
+    status = ares_conn_flush(conn);
+    if (status != ARES_SUCCESS) {
+      handle_conn_error(conn, ARES_TRUE, status);
     }
   }
 
-  return ares__array_finish(arr, num);
+  ares_channel_unlock(channel);
 }
 
-/* If any UDP sockets select true for reading, process them. */
-static void read_udp_packets_fd(ares_channel_t *channel, ares_conn_t *conn,
-                                const ares_timeval_t *now)
+static ares_status_t read_conn_packets(ares_conn_t *conn)
 {
-  ares_ssize_t  read_len;
-  unsigned char buf[MAXENDSSZ + 1];
+  ares_bool_t           read_again;
+  ares_conn_err_t       err;
+  const ares_channel_t *channel = conn->server->channel;
 
-#ifdef HAVE_RECVFROM
-  ares_socklen_t fromlen;
+  do {
+    size_t         count;
+    size_t         len = 65535;
+    unsigned char *ptr;
+    size_t         start_len = ares_buf_len(conn->in_buf);
+
+    /* If UDP, let's write out a placeholder for the length indicator */
+    if (!(conn->flags & ARES_CONN_FLAG_TCP) &&
+        ares_buf_append_be16(conn->in_buf, 0) != ARES_SUCCESS) {
+      handle_conn_error(conn, ARES_FALSE /* not critical to connection */,
+                        ARES_SUCCESS);
+      return ARES_ENOMEM;
+    }
 
-  union {
-    struct sockaddr     sa;
-    struct sockaddr_in  sa4;
-    struct sockaddr_in6 sa6;
-  } from;
+    /* Get a buffer of sufficient size */
+    ptr = ares_buf_append_start(conn->in_buf, &len);
 
-  memset(&from, 0, sizeof(from));
-#endif
-
-  /* To reduce event loop overhead, read and process as many
-   * packets as we can. */
-  do {
-    if (conn->fd == ARES_SOCKET_BAD) {
-      read_len = -1;
-    } else {
-      if (conn->server->addr.family == AF_INET) {
-        fromlen = sizeof(from.sa4);
-      } else {
-        fromlen = sizeof(from.sa6);
-      }
-      read_len = ares__socket_recvfrom(channel, conn->fd, (void *)buf,
-                                       sizeof(buf), 0, &from.sa, &fromlen);
+    if (ptr == NULL) {
+      handle_conn_error(conn, ARES_FALSE /* not critical to connection */,
+                        ARES_SUCCESS);
+      return ARES_ENOMEM;
     }
 
-    if (read_len == 0) {
-      /* UDP is connectionless, so result code of 0 is a 0-length UDP
-       * packet, and not an indication the connection is closed like on
-       * tcp */
-      continue;
-    } else if (read_len < 0) {
-      if (ares__socket_try_again(SOCKERRNO)) {
-        break;
+    /* Read from socket */
+    err = ares_conn_read(conn, ptr, len, &count);
+
+    if (err != ARES_CONN_ERR_SUCCESS) {
+      ares_buf_append_finish(conn->in_buf, 0);
+      if (!(conn->flags & ARES_CONN_FLAG_TCP)) {
+        ares_buf_set_length(conn->in_buf, start_len);
       }
+      break;
+    }
 
-      handle_conn_error(conn, ARES_TRUE, ARES_ECONNREFUSED);
-      return;
-#ifdef HAVE_RECVFROM
-    } else if (!same_address(&from.sa, &conn->server->addr)) {
-      /* The address the response comes from does not match the address we
-       * sent the request to. Someone may be attempting to perform a cache
-       * poisoning attack. */
-      continue;
-#endif
+    /* Record amount of data read */
+    ares_buf_append_finish(conn->in_buf, count);
 
-    } else {
-      process_answer(channel, buf, (size_t)read_len, conn, ARES_FALSE, now);
+    /* Only loop if sockets support non-blocking operation, and are using UDP
+     * or are using TCP and read the maximum buffer size */
+    read_again = ARES_FALSE;
+    if (channel->sock_funcs.flags & ARES_SOCKFUNC_FLAG_NONBLOCKING &&
+        (!(conn->flags & ARES_CONN_FLAG_TCP) || count == len)) {
+      read_again = ARES_TRUE;
     }
 
+    /* If UDP, overwrite length */
+    if (!(conn->flags & ARES_CONN_FLAG_TCP)) {
+      len = ares_buf_len(conn->in_buf);
+      ares_buf_set_length(conn->in_buf, start_len);
+      ares_buf_append_be16(conn->in_buf, (unsigned short)count);
+      ares_buf_set_length(conn->in_buf, len);
+    }
     /* Try to read again only if *we* set up the socket, otherwise it may be
      * a blocking socket and would cause recvfrom to hang. */
-  } while (read_len >= 0 && channel->sock_funcs == NULL);
-}
+  } while (read_again);
 
-static void read_packets(ares_channel_t *channel, fd_set *read_fds,
-                         ares_socket_t read_fd, const ares_timeval_t *now)
-{
-  size_t              i;
-  ares_socket_t      *socketlist  = NULL;
-  size_t              num_sockets = 0;
-  ares_conn_t        *conn        = NULL;
-  ares__llist_node_t *node        = NULL;
-
-  if (!read_fds && (read_fd == ARES_SOCKET_BAD)) {
-    /* no possible action */
-    return;
+  if (err != ARES_CONN_ERR_SUCCESS && err != ARES_CONN_ERR_WOULDBLOCK) {
+    handle_conn_error(conn, ARES_TRUE, ARES_ECONNREFUSED);
+    return ARES_ECONNREFUSED;
   }
 
-  /* Single socket specified */
-  if (!read_fds) {
-    node = ares__htable_asvp_get_direct(channel->connnode_by_socket, read_fd);
-    if (node == NULL) {
-      return;
-    }
+  return ARES_SUCCESS;
+}
 
-    conn = ares__llist_node_val(node);
+static ares_status_t read_answers(ares_conn_t *conn, const ares_timeval_t *now)
+{
+  ares_status_t   status;
+  ares_channel_t *channel = conn->server->channel;
 
-    if (conn->flags & ARES_CONN_FLAG_TCP) {
-      read_tcp_data(channel, conn, now);
-    } else {
-      read_udp_packets_fd(channel, conn, now);
-    }
+  /* Process all queued answers */
+  while (1) {
+    unsigned short       dns_len  = 0;
+    const unsigned char *data     = NULL;
+    size_t               data_len = 0;
 
-    return;
-  }
+    /* Tag so we can roll back */
+    ares_buf_tag(conn->in_buf);
 
-  /* There is no good way to iterate across an fd_set, instead we must pull a
-   * list of all known fds, and iterate across that checking against the fd_set.
-   */
-  socketlist = channel_socket_list(channel, &num_sockets);
+    /* Read length indicator */
+    status = ares_buf_fetch_be16(conn->in_buf, &dns_len);
+    if (status != ARES_SUCCESS) {
+      ares_buf_tag_rollback(conn->in_buf);
+      break;
+    }
 
-  for (i = 0; i < num_sockets; i++) {
-    if (!FD_ISSET(socketlist[i], read_fds)) {
-      continue;
+    /* Not enough data for a full response yet */
+    status = ares_buf_consume(conn->in_buf, dns_len);
+    if (status != ARES_SUCCESS) {
+      ares_buf_tag_rollback(conn->in_buf);
+      break;
     }
 
-    /* If there's an error and we close this socket, then open
-     * another with the same fd to talk to another server, then we
-     * don't want to think that it was the new socket that was
-     * ready. This is not disastrous, but is likely to result in
-     * extra system calls and confusion. */
-    FD_CLR(socketlist[i], read_fds);
-
-    node =
-      ares__htable_asvp_get_direct(channel->connnode_by_socket, socketlist[i]);
-    if (node == NULL) {
-      return;
+    /* Can't fail except for misuse */
+    data = ares_buf_tag_fetch(conn->in_buf, &data_len);
+    if (data == NULL || data_len < 2) {
+      ares_buf_tag_clear(conn->in_buf);
+      break;
     }
 
-    conn = ares__llist_node_val(node);
+    /* Strip off 2 bytes length */
+    data     += 2;
+    data_len -= 2;
 
-    if (conn->flags & ARES_CONN_FLAG_TCP) {
-      read_tcp_data(channel, conn, now);
-    } else {
-      read_udp_packets_fd(channel, conn, now);
+    /* We finished reading this answer; process it */
+    status = process_answer(channel, data, data_len, conn, now);
+    if (status != ARES_SUCCESS) {
+      handle_conn_error(conn, ARES_TRUE, status);
+      return status;
     }
+
+    /* Since we processed the answer, clear the tag so space can be reclaimed */
+    ares_buf_tag_clear(conn->in_buf);
   }
+  return status;
+}
 
-  ares_free(socketlist);
+static ares_status_t process_read(ares_channel_t       *channel,
+                                  ares_socket_t         read_fd,
+                                  const ares_timeval_t *now)
+{
+  ares_conn_t  *conn = ares_conn_from_fd(channel, read_fd);
+  ares_status_t status;
+
+  if (conn == NULL) {
+    return ARES_SUCCESS;
+  }
+
+  /* TODO: There might be a potential issue here where there was a read that
+   *       read some data, then looped and read again and got a disconnect.
+   *       Right now, that would cause a resend instead of processing the data
+   *       we have.  This is fairly unlikely to occur due to only looping if
+   *       a full buffer of 65535 bytes was read. */
+  status = read_conn_packets(conn);
+
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  return read_answers(conn, now);
 }
 
 /* If any queries have timed out, note the timeout and move them on. */
-static void process_timeouts(ares_channel_t *channel, const ares_timeval_t *now)
+static ares_status_t process_timeouts(ares_channel_t       *channel,
+                                      const ares_timeval_t *now)
 {
-  ares__slist_node_t *node;
+  ares_slist_node_t *node;
+  ares_status_t      status = ARES_SUCCESS;
 
   /* Just keep popping off the first as this list will re-sort as things come
    * and go.  We don't want to try to rely on 'next' as some operation might
    * cause a cleanup of that pointer and would become invalid */
-  while ((node = ares__slist_node_first(channel->queries_by_timeout)) != NULL) {
-    ares_query_t *query = ares__slist_node_val(node);
+  while ((node = ares_slist_node_first(channel->queries_by_timeout)) != NULL) {
+    ares_query_t *query = ares_slist_node_val(node);
     ares_conn_t  *conn;
 
     /* Since this is sorted, as soon as we hit a query that isn't timed out,
      * break */
-    if (!ares__timedout(now, &query->timeout)) {
+    if (!ares_timedout(now, &query->timeout)) {
       break;
     }
 
@@ -571,8 +611,16 @@ static void process_timeouts(ares_channel_t *channel, const ares_timeval_t *now)
 
     conn = query->conn;
     server_increment_failures(conn->server, query->using_tcp);
-    ares__requeue_query(query, now, ARES_ETIMEOUT, ARES_TRUE, NULL);
+    status = ares_requeue_query(query, now, ARES_ETIMEOUT, ARES_TRUE, NULL);
+    if (status == ARES_ENOMEM) {
+      goto done;
+    }
   }
+done:
+  if (status == ARES_ENOMEM) {
+    return ARES_ENOMEM;
+  }
+  return ARES_SUCCESS;
 }
 
 static ares_status_t rewrite_without_edns(ares_query_t *query)
@@ -607,17 +655,22 @@ static ares_status_t rewrite_without_edns(ares_query_t *query)
  * the connection to be terminated after this call. */
 static ares_status_t process_answer(ares_channel_t      *channel,
                                     const unsigned char *abuf, size_t alen,
-                                    ares_conn_t *conn, ares_bool_t tcp,
+                                    ares_conn_t          *conn,
                                     const ares_timeval_t *now)
 {
   ares_query_t      *query;
-  /* Cache these as once ares__send_query() gets called, it may end up
+  /* Cache these as once ares_send_query() gets called, it may end up
    * invalidating the connection all-together */
   ares_server_t     *server  = conn->server;
   ares_dns_record_t *rdnsrec = NULL;
   ares_status_t      status;
   ares_bool_t        is_cached = ARES_FALSE;
 
+  /* UDP can have 0-byte messages, drop them to the ground */
+  if (alen == 0) {
+    return ARES_SUCCESS;
+  }
+
   /* Parse the response */
   status = ares_dns_parse(abuf, alen, 0, &rdnsrec);
   if (status != ARES_SUCCESS) {
@@ -629,8 +682,8 @@ static ares_status_t process_answer(ares_channel_t      *channel,
   /* Find the query corresponding to this packet. The queries are
    * hashed/bucketed by query id, so this lookup should be quick.
    */
-  query = ares__htable_szvp_get_direct(channel->queries_by_qid,
-                                       ares_dns_record_get_id(rdnsrec));
+  query = ares_htable_szvp_get_direct(channel->queries_by_qid,
+                                      ares_dns_record_get_id(rdnsrec));
   if (!query) {
     /* We may have stopped listening for this query, that's ok */
     status = ARES_SUCCESS;
@@ -657,7 +710,7 @@ static ares_status_t process_answer(ares_channel_t      *channel,
    * remove it from the connection's queue so we can possibly invalidate the
    * connection. Delay cleaning up the connection though as we may enqueue
    * something new.  */
-  ares__llist_node_destroy(query->node_queries_to_conn);
+  ares_llist_node_destroy(query->node_queries_to_conn);
   query->node_queries_to_conn = NULL;
 
   /* If we use EDNS and server answers with FORMERR without an OPT RR, the
@@ -672,7 +725,8 @@ static ares_status_t process_answer(ares_channel_t      *channel,
       goto cleanup;
     }
 
-    ares__send_query(query, now);
+    /* Send to same server */
+    ares_send_query(server, query, now);
     status = ARES_SUCCESS;
     goto cleanup;
   }
@@ -681,10 +735,11 @@ static ares_status_t process_answer(ares_channel_t      *channel,
    * don't accept the packet, and switch the query to TCP if we hadn't
    * done so already.
    */
-  if (ares_dns_record_get_flags(rdnsrec) & ARES_FLAG_TC && !tcp &&
+  if (ares_dns_record_get_flags(rdnsrec) & ARES_FLAG_TC &&
+      !(conn->flags & ARES_CONN_FLAG_TCP) &&
       !(channel->flags & ARES_FLAG_IGNTC)) {
     query->using_tcp = ARES_TRUE;
-    ares__send_query(query, now);
+    ares_send_query(NULL, query, now);
     status = ARES_SUCCESS; /* Switched to TCP is ok */
     goto cleanup;
   }
@@ -711,7 +766,7 @@ static ares_status_t process_answer(ares_channel_t      *channel,
       }
 
       server_increment_failures(server, query->using_tcp);
-      ares__requeue_query(query, now, status, ARES_TRUE, rdnsrec);
+      ares_requeue_query(query, now, status, ARES_TRUE, rdnsrec);
 
       /* Should any of these cause a connection termination?
        * Maybe SERVER_FAILURE? */
@@ -753,19 +808,18 @@ static void handle_conn_error(ares_conn_t *conn, ares_bool_t critical_failure,
   }
 
   /* This will requeue any connections automatically */
-  ares__close_connection(conn, failure_status);
+  ares_close_connection(conn, failure_status);
 }
 
-ares_status_t ares__requeue_query(ares_query_t            *query,
-                                  const ares_timeval_t    *now,
-                                  ares_status_t            status,
-                                  ares_bool_t              inc_try_count,
-                                  const ares_dns_record_t *dnsrec)
+ares_status_t ares_requeue_query(ares_query_t *query, const ares_timeval_t *now,
+                                 ares_status_t            status,
+                                 ares_bool_t              inc_try_count,
+                                 const ares_dns_record_t *dnsrec)
 {
-  ares_channel_t *channel = query->channel;
-  size_t max_tries        = ares__slist_len(channel->servers) * channel->tries;
+  ares_channel_t *channel   = query->channel;
+  size_t          max_tries = ares_slist_len(channel->servers) * channel->tries;
 
-  ares__query_disassociate_from_conn(query);
+  ares_query_remove_from_conn(query);
 
   if (status != ARES_SUCCESS) {
     query->error_status = status;
@@ -776,7 +830,7 @@ ares_status_t ares__requeue_query(ares_query_t            *query,
   }
 
   if (query->try_count < max_tries && !query->no_retries) {
-    return ares__send_query(query, now);
+    return ares_send_query(NULL, query, now);
   }
 
   /* If we are here, all attempts to perform query failed. */
@@ -788,32 +842,58 @@ ares_status_t ares__requeue_query(ares_query_t            *query,
   return ARES_ETIMEOUT;
 }
 
-/* Pick a random server from the list, we first get a random number in the
- * range of the number of servers, then scan until we find that server in
- * the list */
-static ares_server_t *ares__random_server(ares_channel_t *channel)
+/*! Count the number of servers that share the same highest priority (lowest
+ *  consecutive failures).  Since they are sorted in priority order, we just
+ *  stop when the consecutive failure count changes. Used for random selection
+ *  of good servers. */
+static size_t count_highest_prio_servers(const ares_channel_t *channel)
 {
-  unsigned char       c;
-  size_t              cnt;
-  size_t              idx;
-  ares__slist_node_t *node;
-  size_t              num_servers = ares__slist_len(channel->servers);
+  ares_slist_node_t *node;
+  size_t             cnt                  = 0;
+  size_t             last_consec_failures = SIZE_MAX;
+
+  for (node = ares_slist_node_first(channel->servers); node != NULL;
+       node = ares_slist_node_next(node)) {
+    const ares_server_t *server = ares_slist_node_val(node);
+
+    if (last_consec_failures != SIZE_MAX &&
+        last_consec_failures < server->consec_failures) {
+      break;
+    }
+
+    last_consec_failures = server->consec_failures;
+    cnt++;
+  }
+
+  return cnt;
+}
+
+/* Pick a random *best* server from the list, we first get a random number in
+ * the range of the number of *best* servers, then scan until we find that
+ * server in the list */
+static ares_server_t *ares_random_server(ares_channel_t *channel)
+{
+  unsigned char      c;
+  size_t             cnt;
+  size_t             idx;
+  ares_slist_node_t *node;
+  size_t             num_servers = count_highest_prio_servers(channel);
 
   /* Silence coverity, not possible */
   if (num_servers == 0) {
     return NULL;
   }
 
-  ares__rand_bytes(channel->rand_state, &c, 1);
+  ares_rand_bytes(channel->rand_state, &c, 1);
 
   cnt = c;
   idx = cnt % num_servers;
 
   cnt = 0;
-  for (node = ares__slist_node_first(channel->servers); node != NULL;
-       node = ares__slist_node_next(node)) {
+  for (node = ares_slist_node_first(channel->servers); node != NULL;
+       node = ares_slist_node_next(node)) {
     if (cnt == idx) {
-      return ares__slist_node_val(node);
+      return ares_slist_node_val(node);
     }
 
     cnt++;
@@ -822,40 +902,32 @@ static ares_server_t *ares__random_server(ares_channel_t *channel)
   return NULL;
 }
 
-/* Pick a server from the list with failover behavior.
- *
- * We default to using the first server in the sorted list of servers. That is
- * the server with the lowest number of consecutive failures and then the
- * highest priority server (by idx) if there is a draw.
- *
- * However, if a server temporarily goes down and hits some failures, then that
- * server will never be retried until all other servers hit the same number of
- * failures. This may prevent the server from being retried for a long time.
- *
- * To resolve this, with some probability we select a failed server to retry
- * instead.
- */
-static ares_server_t *ares__failover_server(ares_channel_t *channel)
+static void server_probe_cb(void *arg, ares_status_t status, size_t timeouts,
+                            const ares_dns_record_t *dnsrec)
 {
-  ares_server_t       *first_server = ares__slist_first_val(channel->servers);
-  const ares_server_t *last_server  = ares__slist_last_val(channel->servers);
-  unsigned short       r;
-
-  /* Defensive code against no servers being available on the channel. */
-  if (first_server == NULL) {
-    return NULL; /* LCOV_EXCL_LINE: DefensiveCoding */
-  }
-
-  /* If no servers have failures, then prefer the first server in the list. */
-  if (last_server != NULL && last_server->consec_failures == 0) {
-    return first_server;
-  }
+  (void)arg;
+  (void)status;
+  (void)timeouts;
+  (void)dnsrec;
+  /* Nothing to do, the logic internally will handle success/fail of this */
+}
 
-  /* If we are not configured with a server retry chance then return the first
-   * server.
-   */
-  if (channel->server_retry_chance == 0) {
-    return first_server;
+/* Determine if we should probe a downed server */
+static void ares_probe_failed_server(ares_channel_t      *channel,
+                                     const ares_server_t *server,
+                                     const ares_query_t  *query)
+{
+  const ares_server_t *last_server = ares_slist_last_val(channel->servers);
+  unsigned short       r;
+  ares_timeval_t       now;
+  ares_slist_node_t   *node;
+  ares_server_t       *probe_server = NULL;
+
+  /* If no servers have failures, or we're not configured with a server retry
+   * chance, then nothing to probe */
+  if ((last_server != NULL && last_server->consec_failures == 0) ||
+      channel->server_retry_chance == 0) {
+    return;
   }
 
   /* Generate a random value to decide whether to retry a failed server. The
@@ -863,36 +935,49 @@ static ares_server_t *ares__failover_server(ares_channel_t *channel)
    * precision of 1/2^B where B is the number of bits in the random value.
    * We use an unsigned short for the random value for increased precision.
    */
-  ares__rand_bytes(channel->rand_state, (unsigned char *)&r, sizeof(r));
-  if (r % channel->server_retry_chance == 0) {
-    /* Select a suitable failed server to retry. */
-    ares_timeval_t      now;
-    ares__slist_node_t *node;
-
-    ares__tvnow(&now);
-    for (node = ares__slist_node_first(channel->servers); node != NULL;
-         node = ares__slist_node_next(node)) {
-      ares_server_t *node_val = ares__slist_node_val(node);
-      if (node_val != NULL && node_val->consec_failures > 0 &&
-          ares__timedout(&now, &node_val->next_retry_time)) {
-        return node_val;
-      }
+  ares_rand_bytes(channel->rand_state, (unsigned char *)&r, sizeof(r));
+  if (r % channel->server_retry_chance != 0) {
+    return;
+  }
+
+  /* Select the first server with failures to retry that has passed the retry
+   * timeout and doesn't already have a pending probe */
+  ares_tvnow(&now);
+  for (node = ares_slist_node_first(channel->servers); node != NULL;
+       node = ares_slist_node_next(node)) {
+    ares_server_t *node_val = ares_slist_node_val(node);
+    if (node_val != NULL && node_val->consec_failures > 0 &&
+        !node_val->probe_pending &&
+        ares_timedout(&now, &node_val->next_retry_time)) {
+      probe_server = node_val;
+      break;
     }
   }
 
-  /* If we have not returned yet, then return the first server. */
-  return first_server;
+  /* Either nothing to probe or the query was enqueued to the same server
+   * we were going to probe. Do nothing. */
+  if (probe_server == NULL || server == probe_server) {
+    return;
+  }
+
+  /* Enqueue an identical query onto the specified server without honoring
+   * the cache or allowing retries.  We want to make sure it only attempts to
+   * use the server in question */
+  probe_server->probe_pending = ARES_TRUE;
+  ares_send_nolock(channel, probe_server,
+                   ARES_SEND_FLAG_NOCACHE | ARES_SEND_FLAG_NORETRY,
+                   query->query, server_probe_cb, NULL, NULL);
 }
 
-static size_t ares__calc_query_timeout(const ares_query_t   *query,
-                                       const ares_server_t  *server,
-                                       const ares_timeval_t *now)
+static size_t ares_calc_query_timeout(const ares_query_t   *query,
+                                      const ares_server_t  *server,
+                                      const ares_timeval_t *now)
 {
   const ares_channel_t *channel  = query->channel;
   size_t                timeout  = ares_metrics_server_timeout(server, now);
   size_t                timeplus = timeout;
   size_t                rounds;
-  size_t                num_servers = ares__slist_len(channel->servers);
+  size_t                num_servers = ares_slist_len(channel->servers);
 
   if (num_servers == 0) {
     return 0; /* LCOV_EXCL_LINE: DefensiveCoding */
@@ -922,7 +1007,7 @@ static size_t ares__calc_query_timeout(const ares_query_t   *query,
     unsigned short r;
     float          delta_multiplier;
 
-    ares__rand_bytes(channel->rand_state, (unsigned char *)&r, sizeof(r));
+    ares_rand_bytes(channel->rand_state, (unsigned char *)&r, sizeof(r));
     delta_multiplier  = ((float)r / USHRT_MAX) * 0.5f;
     timeplus         -= (size_t)((float)timeplus * delta_multiplier);
   }
@@ -936,24 +1021,24 @@ static size_t ares__calc_query_timeout(const ares_query_t   *query,
   return timeplus;
 }
 
-static ares_conn_t *ares__fetch_connection(const ares_channel_t *channel,
-                                           ares_server_t        *server,
-                                           const ares_query_t   *query)
+static ares_conn_t *ares_fetch_connection(const ares_channel_t *channel,
+                                          ares_server_t        *server,
+                                          const ares_query_t   *query)
 {
-  ares__llist_node_t *node;
-  ares_conn_t        *conn;
+  ares_llist_node_t *node;
+  ares_conn_t       *conn;
 
   if (query->using_tcp) {
     return server->tcp_conn;
   }
 
   /* Fetch existing UDP connection */
-  node = ares__llist_node_first(server->connections);
+  node = ares_llist_node_first(server->connections);
   if (node == NULL) {
     return NULL;
   }
 
-  conn = ares__llist_node_val(node);
+  conn = ares_llist_node_val(node);
   /* Not UDP, skip */
   if (conn->flags & ARES_CONN_FLAG_TCP) {
     return NULL;
@@ -968,13 +1053,10 @@ static ares_conn_t *ares__fetch_connection(const ares_channel_t *channel,
   return conn;
 }
 
-static ares_status_t ares__conn_query_write(ares_conn_t          *conn,
-                                            ares_query_t         *query,
-                                            const ares_timeval_t *now)
+static ares_status_t ares_conn_query_write(ares_conn_t          *conn,
+                                           ares_query_t         *query,
+                                           const ares_timeval_t *now)
 {
-  unsigned char  *qbuf     = NULL;
-  size_t          qbuf_len = 0;
-  ares_ssize_t    len;
   ares_server_t  *server  = conn->server;
   ares_channel_t *channel = server->channel;
   ares_status_t   status;
@@ -984,81 +1066,57 @@ static ares_status_t ares__conn_query_write(ares_conn_t          *conn,
     return status;
   }
 
-  if (conn->flags & ARES_CONN_FLAG_TCP) {
-    size_t prior_len = ares__buf_len(server->tcp_send);
-
-    status = ares_dns_write_buf_tcp(query->query, server->tcp_send);
-    if (status != ARES_SUCCESS) {
-      return status;
-    }
-
-    if (conn->flags & ARES_CONN_FLAG_TFO_INITIAL) {
-      /* When using TFO, we need to put it on the wire immediately. */
-      size_t               data_len;
-      const unsigned char *data = NULL;
-
-      data = ares__buf_peek(server->tcp_send, &data_len);
-      len  = ares__conn_write(conn, data, data_len);
-      if (len <= 0) {
-        if (ares__socket_try_again(SOCKERRNO)) {
-          /* This means we must not have qualified for TFO, keep the data
-           * buffered, wait on write signal. */
-          return ARES_SUCCESS;
-        }
-
-        /* TCP TFO might delay failure.  Reflect that here */
-        return ARES_ECONNREFUSED;
-      }
-
-      /* Consume what was written */
-      ares__buf_consume(server->tcp_send, (size_t)len);
-      return ARES_SUCCESS;
-    }
-
-    if (prior_len == 0) {
-      SOCK_STATE_CALLBACK(channel, conn->fd, 1, 1);
-    }
-
-    return ARES_SUCCESS;
-  }
-
-  /* UDP Here */
-  status = ares_dns_write(query->query, &qbuf, &qbuf_len);
+  /* We write using the TCP format even for UDP, we just strip the length
+   * before putting on the wire */
+  status = ares_dns_write_buf_tcp(query->query, conn->out_buf);
   if (status != ARES_SUCCESS) {
     return status;
   }
 
-  len = ares__conn_write(conn, qbuf, qbuf_len);
-  ares_free(qbuf);
+  /* Not pending a TFO write and not connected, so we can't even try to
+   * write until we get a signal */
+  if (conn->flags & ARES_CONN_FLAG_TCP &&
+      !(conn->state_flags & ARES_CONN_STATE_CONNECTED) &&
+      !(conn->flags & ARES_CONN_FLAG_TFO_INITIAL)) {
+    return ARES_SUCCESS;
+  }
 
-  if (len == -1) {
-    if (ares__socket_try_again(SOCKERRNO)) {
-      return ARES_ESERVFAIL;
-    }
-    /* UDP is connection-less, but we might receive an ICMP unreachable which
-     * means we can't talk to the remote host at all and that will be
-     * reflected here */
-    return ARES_ECONNREFUSED;
+  /* Delay actual write if possible (TCP only, and only if callback
+   * configured) */
+  if (channel->notify_pending_write_cb && !channel->notify_pending_write &&
+      conn->flags & ARES_CONN_FLAG_TCP) {
+    channel->notify_pending_write = ARES_TRUE;
+    channel->notify_pending_write_cb(channel->notify_pending_write_cb_data);
+    return ARES_SUCCESS;
   }
 
-  return ARES_SUCCESS;
+  /* Unfortunately we need to write right away and can't aggregate multiple
+   * queries into a single write. */
+  return ares_conn_flush(conn);
 }
 
-ares_status_t ares__send_query(ares_query_t *query, const ares_timeval_t *now)
+ares_status_t ares_send_query(ares_server_t *requested_server,
+                              ares_query_t *query, const ares_timeval_t *now)
 {
   ares_channel_t *channel = query->channel;
   ares_server_t  *server;
   ares_conn_t    *conn;
   size_t          timeplus;
   ares_status_t   status;
+  ares_bool_t     probe_downed_server = ARES_TRUE;
+
 
   /* Choose the server to send the query to */
-  if (channel->rotate) {
-    /* Pull random server */
-    server = ares__random_server(channel);
+  if (requested_server != NULL) {
+    server = requested_server;
   } else {
-    /* Pull server with failover behavior */
-    server = ares__failover_server(channel);
+    /* If rotate is turned on, do a random selection */
+    if (channel->rotate) {
+      server = ares_random_server(channel);
+    } else {
+      /* First server in list */
+      server = ares_slist_first_val(channel->servers);
+    }
   }
 
   if (server == NULL) {
@@ -1066,9 +1124,16 @@ ares_status_t ares__send_query(ares_query_t *query, const ares_timeval_t *now)
     return ARES_ENOSERVER;
   }
 
-  conn = ares__fetch_connection(channel, server, query);
+  /* If a query is directed to a specific server, or the server chosen has
+   * failures, or the query is being retried, don't probe for downed servers */
+  if (requested_server != NULL || server->consec_failures > 0 ||
+      query->try_count != 0) {
+    probe_downed_server = ARES_FALSE;
+  }
+
+  conn = ares_fetch_connection(channel, server, query);
   if (conn == NULL) {
-    status = ares__open_connection(&conn, channel, server, query->using_tcp);
+    status = ares_open_connection(&conn, channel, server, query->using_tcp);
     switch (status) {
       /* Good result, continue on */
       case ARES_SUCCESS:
@@ -1079,7 +1144,7 @@ ares_status_t ares__send_query(ares_query_t *query, const ares_timeval_t *now)
       case ARES_ECONNREFUSED:
       case ARES_EBADFAMILY:
         server_increment_failures(server, query->using_tcp);
-        return ares__requeue_query(query, now, status, ARES_TRUE, NULL);
+        return ares_requeue_query(query, now, status, ARES_TRUE, NULL);
 
       /* Anything else is not retryable, likely ENOMEM */
       default:
@@ -1089,7 +1154,7 @@ ares_status_t ares__send_query(ares_query_t *query, const ares_timeval_t *now)
   }
 
   /* Write the query */
-  status = ares__conn_query_write(conn, query, now);
+  status = ares_conn_query_write(conn, query, now);
   switch (status) {
     /* Good result, continue on */
     case ARES_SUCCESS:
@@ -1105,30 +1170,28 @@ ares_status_t ares__send_query(ares_query_t *query, const ares_timeval_t *now)
     case ARES_ECONNREFUSED:
     case ARES_EBADFAMILY:
       handle_conn_error(conn, ARES_TRUE, status);
-      status = ares__requeue_query(query, now, status, ARES_TRUE, NULL);
+      status = ares_requeue_query(query, now, status, ARES_TRUE, NULL);
       if (status == ARES_ETIMEOUT) {
         status = ARES_ECONNREFUSED;
       }
       return status;
 
-    /* FIXME: Handle EAGAIN here since it likely can happen. Right now we
-     * just requeue to a different server/connection. */
     default:
       server_increment_failures(server, query->using_tcp);
-      status = ares__requeue_query(query, now, status, ARES_TRUE, NULL);
+      status = ares_requeue_query(query, now, status, ARES_TRUE, NULL);
       return status;
   }
 
-  timeplus = ares__calc_query_timeout(query, server, now);
+  timeplus = ares_calc_query_timeout(query, server, now);
   /* Keep track of queries bucketed by timeout, so we can process
    * timeout events quickly.
    */
-  ares__slist_node_destroy(query->node_queries_by_timeout);
+  ares_slist_node_destroy(query->node_queries_by_timeout);
   query->ts      = *now;
   query->timeout = *now;
   timeadd(&query->timeout, timeplus);
   query->node_queries_by_timeout =
-    ares__slist_insert(channel->queries_by_timeout, query);
+    ares_slist_insert(channel->queries_by_timeout, query);
   if (!query->node_queries_by_timeout) {
     /* LCOV_EXCL_START: OutOfMemory */
     end_query(channel, server, query, ARES_ENOMEM, NULL);
@@ -1138,9 +1201,9 @@ ares_status_t ares__send_query(ares_query_t *query, const ares_timeval_t *now)
 
   /* Keep track of queries bucketed by connection, so we can process errors
    * quickly. */
-  ares__llist_node_destroy(query->node_queries_to_conn);
+  ares_llist_node_destroy(query->node_queries_to_conn);
   query->node_queries_to_conn =
-    ares__llist_insert_last(conn->queries_to_conn, query);
+    ares_llist_insert_last(conn->queries_to_conn, query);
 
   if (query->node_queries_to_conn == NULL) {
     /* LCOV_EXCL_START: OutOfMemory */
@@ -1151,6 +1214,13 @@ ares_status_t ares__send_query(ares_query_t *query, const ares_timeval_t *now)
 
   query->conn = conn;
   conn->total_queries++;
+
+  /* We just successfully enqueued a query, see if we should probe downed
+   * servers. */
+  if (probe_downed_server) {
+    ares_probe_failed_server(channel, server, query);
+  }
+
   return ARES_SUCCESS;
 }
 
@@ -1197,12 +1267,12 @@ static ares_bool_t same_questions(const ares_query_t      *query,
        *       server to preserve the case of the name in the response packet.
        *       https://datatracker.ietf.org/doc/html/draft-vixie-dnsext-dns0x20-00
        */
-      if (strcmp(qname, aname) != 0) {
+      if (!ares_streq(qname, aname)) {
         goto done;
       }
     } else {
       /* without DNS0x20 use case-insensitive matching */
-      if (strcasecmp(qname, aname) != 0) {
+      if (!ares_strcaseeq(qname, aname)) {
         goto done;
       }
     }
@@ -1214,42 +1284,12 @@ static ares_bool_t same_questions(const ares_query_t      *query,
   return rv;
 }
 
-static ares_bool_t same_address(const struct sockaddr  *sa,
-                                const struct ares_addr *aa)
-{
-  const void *addr1;
-  const void *addr2;
-
-  if (sa->sa_family == aa->family) {
-    switch (aa->family) {
-      case AF_INET:
-        addr1 = &aa->addr.addr4;
-        addr2 = &(CARES_INADDR_CAST(const struct sockaddr_in *, sa))->sin_addr;
-        if (memcmp(addr1, addr2, sizeof(aa->addr.addr4)) == 0) {
-          return ARES_TRUE; /* match */
-        }
-        break;
-      case AF_INET6:
-        addr1 = &aa->addr.addr6;
-        addr2 =
-          &(CARES_INADDR_CAST(const struct sockaddr_in6 *, sa))->sin6_addr;
-        if (memcmp(addr1, addr2, sizeof(aa->addr.addr6)) == 0) {
-          return ARES_TRUE; /* match */
-        }
-        break;
-      default:
-        break; /* LCOV_EXCL_LINE */
-    }
-  }
-  return ARES_FALSE; /* different */
-}
-
 static void ares_detach_query(ares_query_t *query)
 {
   /* Remove the query from all the lists in which it is linked */
-  ares__query_disassociate_from_conn(query);
-  ares__htable_szvp_remove(query->channel->queries_by_qid, query->qid);
-  ares__llist_node_destroy(query->node_all_queries);
+  ares_query_remove_from_conn(query);
+  ares_htable_szvp_remove(query->channel->queries_by_qid, query->qid);
+  ares_llist_node_destroy(query->node_all_queries);
   query->node_all_queries = NULL;
 }
 
@@ -1257,11 +1297,17 @@ static void end_query(ares_channel_t *channel, ares_server_t *server,
                       ares_query_t *query, ares_status_t status,
                       const ares_dns_record_t *dnsrec)
 {
+  /* If we were probing for the server to come back online, let's mark it as
+   * no longer being probed */
+  if (server != NULL) {
+    server->probe_pending = ARES_FALSE;
+  }
+
   ares_metrics_record(query, server, status, dnsrec);
 
   /* Invoke the callback. */
   query->callback(query->arg, status, query->timeouts, dnsrec);
-  ares__free_query(query);
+  ares_free_query(query);
 
   /* Check and notify if no other queries are enqueued on the channel.  This
    * must come after the callback and freeing the query for 2 reasons.
@@ -1271,7 +1317,7 @@ static void end_query(ares_channel_t *channel, ares_server_t *server,
   ares_queue_notify_empty(channel);
 }
 
-void ares__free_query(ares_query_t *query)
+void ares_free_query(ares_query_t *query)
 {
   ares_detach_query(query);
   /* Zero out some important stuff, to help catch bugs */
diff --git a/deps/cares/src/lib/ares_qcache.c b/deps/cares/src/lib/ares_qcache.c
index 9725212fded7d1..97c0a9137da58b 100644
--- a/deps/cares/src/lib/ares_qcache.c
+++ b/deps/cares/src/lib/ares_qcache.c
@@ -25,10 +25,10 @@
  */
 #include "ares_private.h"
 
-struct ares__qcache {
-  ares__htable_strvp_t *cache;
-  ares__slist_t        *expire;
-  unsigned int          max_ttl;
+struct ares_qcache {
+  ares_htable_strvp_t *cache;
+  ares_slist_t        *expire;
+  unsigned int         max_ttl;
 };
 
 typedef struct {
@@ -36,11 +36,11 @@ typedef struct {
   ares_dns_record_t *dnsrec;
   time_t             expire_ts;
   time_t             insert_ts;
-} ares__qcache_entry_t;
+} ares_qcache_entry_t;
 
-static char *ares__qcache_calc_key(const ares_dns_record_t *dnsrec)
+static char *ares_qcache_calc_key(const ares_dns_record_t *dnsrec)
 {
-  ares__buf_t     *buf = ares__buf_create();
+  ares_buf_t      *buf = ares_buf_create();
   size_t           i;
   ares_status_t    status;
   ares_dns_flags_t flags;
@@ -51,13 +51,13 @@ static char *ares__qcache_calc_key(const ares_dns_record_t *dnsrec)
 
   /* Format is OPCODE|FLAGS[|QTYPE1|QCLASS1|QNAME1]... */
 
-  status = ares__buf_append_str(
+  status = ares_buf_append_str(
     buf, ares_dns_opcode_tostr(ares_dns_record_get_opcode(dnsrec)));
   if (status != ARES_SUCCESS) {
     goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  status = ares__buf_append_byte(buf, '|');
+  status = ares_buf_append_byte(buf, '|');
   if (status != ARES_SUCCESS) {
     goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -65,13 +65,13 @@ static char *ares__qcache_calc_key(const ares_dns_record_t *dnsrec)
   flags = ares_dns_record_get_flags(dnsrec);
   /* Only care about RD and CD */
   if (flags & ARES_FLAG_RD) {
-    status = ares__buf_append_str(buf, "rd");
+    status = ares_buf_append_str(buf, "rd");
     if (status != ARES_SUCCESS) {
       goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
     }
   }
   if (flags & ARES_FLAG_CD) {
-    status = ares__buf_append_str(buf, "cd");
+    status = ares_buf_append_str(buf, "cd");
     if (status != ARES_SUCCESS) {
       goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -88,27 +88,27 @@ static char *ares__qcache_calc_key(const ares_dns_record_t *dnsrec)
       goto fail; /* LCOV_EXCL_LINE: DefensiveCoding */
     }
 
-    status = ares__buf_append_byte(buf, '|');
+    status = ares_buf_append_byte(buf, '|');
     if (status != ARES_SUCCESS) {
       goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
-    status = ares__buf_append_str(buf, ares_dns_rec_type_tostr(qtype));
+    status = ares_buf_append_str(buf, ares_dns_rec_type_tostr(qtype));
     if (status != ARES_SUCCESS) {
       goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
-    status = ares__buf_append_byte(buf, '|');
+    status = ares_buf_append_byte(buf, '|');
     if (status != ARES_SUCCESS) {
       goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
-    status = ares__buf_append_str(buf, ares_dns_class_tostr(qclass));
+    status = ares_buf_append_str(buf, ares_dns_class_tostr(qclass));
     if (status != ARES_SUCCESS) {
       goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
-    status = ares__buf_append_byte(buf, '|');
+    status = ares_buf_append_byte(buf, '|');
     if (status != ARES_SUCCESS) {
       goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -122,64 +122,63 @@ static char *ares__qcache_calc_key(const ares_dns_record_t *dnsrec)
     }
 
     if (name_len > 0) {
-      status = ares__buf_append(buf, (const unsigned char *)name, name_len);
+      status = ares_buf_append(buf, (const unsigned char *)name, name_len);
       if (status != ARES_SUCCESS) {
         goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
       }
     }
   }
 
-  return ares__buf_finish_str(buf, NULL);
+  return ares_buf_finish_str(buf, NULL);
 
 /* LCOV_EXCL_START: OutOfMemory */
 fail:
-  ares__buf_destroy(buf);
+  ares_buf_destroy(buf);
   return NULL;
   /* LCOV_EXCL_STOP */
 }
 
-static void ares__qcache_expire(ares__qcache_t       *cache,
-                                const ares_timeval_t *now)
+static void ares_qcache_expire(ares_qcache_t *cache, const ares_timeval_t *now)
 {
-  ares__slist_node_t *node;
+  ares_slist_node_t *node;
 
   if (cache == NULL) {
     return;
   }
 
-  while ((node = ares__slist_node_first(cache->expire)) != NULL) {
-    const ares__qcache_entry_t *entry = ares__slist_node_val(node);
+  while ((node = ares_slist_node_first(cache->expire)) != NULL) {
+    const ares_qcache_entry_t *entry = ares_slist_node_val(node);
 
     /* If now is NULL, we're flushing everything, so don't break */
     if (now != NULL && entry->expire_ts > now->sec) {
       break;
     }
 
-    ares__htable_strvp_remove(cache->cache, entry->key);
-    ares__slist_node_destroy(node);
+    ares_htable_strvp_remove(cache->cache, entry->key);
+    ares_slist_node_destroy(node);
   }
 }
 
-void ares__qcache_flush(ares__qcache_t *cache)
+void ares_qcache_flush(ares_qcache_t *cache)
 {
-  ares__qcache_expire(cache, NULL /* flush all */);
+  ares_qcache_expire(cache, NULL /* flush all */);
 }
 
-void ares__qcache_destroy(ares__qcache_t *cache)
+void ares_qcache_destroy(ares_qcache_t *cache)
 {
   if (cache == NULL) {
     return;
   }
 
-  ares__htable_strvp_destroy(cache->cache);
-  ares__slist_destroy(cache->expire);
+  ares_htable_strvp_destroy(cache->cache);
+  ares_slist_destroy(cache->expire);
   ares_free(cache);
 }
 
-static int ares__qcache_entry_sort_cb(const void *arg1, const void *arg2)
+static int ares_qcache_entry_sort_cb(const void *arg1, const void *arg2)
 {
-  const ares__qcache_entry_t *entry1 = arg1;
-  const ares__qcache_entry_t *entry2 = arg2;
+  const ares_qcache_entry_t *entry1 = arg1;
+  const ares_qcache_entry_t *entry2 = arg2;
 
   if (entry1->expire_ts > entry2->expire_ts) {
     return 1;
@@ -192,9 +191,9 @@ static int ares__qcache_entry_sort_cb(const void *arg1, const void *arg2)
   return 0;
 }
 
-static void ares__qcache_entry_destroy_cb(void *arg)
+static void ares_qcache_entry_destroy_cb(void *arg)
 {
-  ares__qcache_entry_t *entry = arg;
+  ares_qcache_entry_t *entry = arg;
   if (entry == NULL) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
@@ -204,12 +203,12 @@ static void ares__qcache_entry_destroy_cb(void *arg)
   ares_free(entry);
 }
 
-ares_status_t ares__qcache_create(ares_rand_state *rand_state,
-                                  unsigned int     max_ttl,
-                                  ares__qcache_t **cache_out)
+ares_status_t ares_qcache_create(ares_rand_state *rand_state,
+                                 unsigned int     max_ttl,
+                                 ares_qcache_t  **cache_out)
 {
-  ares_status_t   status = ARES_SUCCESS;
-  ares__qcache_t *cache;
+  ares_status_t  status = ARES_SUCCESS;
+  ares_qcache_t *cache;
 
   cache = ares_malloc_zero(sizeof(*cache));
   if (cache == NULL) {
@@ -217,14 +216,14 @@ ares_status_t ares__qcache_create(ares_rand_state *rand_state,
     goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  cache->cache = ares__htable_strvp_create(NULL);
+  cache->cache = ares_htable_strvp_create(NULL);
   if (cache->cache == NULL) {
     status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
     goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  cache->expire = ares__slist_create(rand_state, ares__qcache_entry_sort_cb,
-                                     ares__qcache_entry_destroy_cb);
+  cache->expire = ares_slist_create(rand_state, ares_qcache_entry_sort_cb,
+                                    ares_qcache_entry_destroy_cb);
   if (cache->expire == NULL) {
     status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
     goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
@@ -235,7 +234,7 @@ ares_status_t ares__qcache_create(ares_rand_state *rand_state,
 done:
   if (status != ARES_SUCCESS) {
     *cache_out = NULL;
-    ares__qcache_destroy(cache);
+    ares_qcache_destroy(cache);
     return status;
   }
 
@@ -243,7 +242,7 @@ ares_status_t ares__qcache_create(ares_rand_state *rand_state,
   return status;
 }
 
-static unsigned int ares__qcache_calc_minttl(ares_dns_record_t *dnsrec)
+static unsigned int ares_qcache_calc_minttl(ares_dns_record_t *dnsrec)
 {
   unsigned int minttl = 0xFFFFFFFF;
   size_t       sect;
@@ -272,7 +271,7 @@ static unsigned int ares__qcache_calc_minttl(ares_dns_record_t *dnsrec)
   return minttl;
 }
 
-static unsigned int ares__qcache_soa_minimum(ares_dns_record_t *dnsrec)
+static unsigned int ares_qcache_soa_minimum(ares_dns_record_t *dnsrec)
 {
   size_t i;
 
@@ -302,15 +301,15 @@ static unsigned int ares__qcache_soa_minimum(ares_dns_record_t *dnsrec)
 }
 
 /* On success, takes ownership of dnsrec */
-static ares_status_t ares__qcache_insert(ares__qcache_t          *qcache,
-                                         ares_dns_record_t       *qresp,
-                                         const ares_dns_record_t *qreq,
-                                         const ares_timeval_t    *now)
+static ares_status_t ares_qcache_insert_int(ares_qcache_t           *qcache,
+                                            ares_dns_record_t       *qresp,
+                                            const ares_dns_record_t *qreq,
+                                            const ares_timeval_t    *now)
 {
-  ares__qcache_entry_t *entry;
-  unsigned int          ttl;
-  ares_dns_rcode_t      rcode = ares_dns_record_get_rcode(qresp);
-  ares_dns_flags_t      flags = ares_dns_record_get_flags(qresp);
+  ares_qcache_entry_t *entry;
+  unsigned int         ttl;
+  ares_dns_rcode_t     rcode = ares_dns_record_get_rcode(qresp);
+  ares_dns_flags_t     flags = ares_dns_record_get_flags(qresp);
 
   if (qcache == NULL || qresp == NULL) {
     return ARES_EFORMERR;
@@ -328,9 +327,9 @@ static ares_status_t ares__qcache_insert(ares__qcache_t          *qcache,
 
   /* Look at SOA for NXDOMAIN for minimum */
   if (rcode == ARES_RCODE_NXDOMAIN) {
-    ttl = ares__qcache_soa_minimum(qresp);
+    ttl = ares_qcache_soa_minimum(qresp);
   } else {
-    ttl = ares__qcache_calc_minttl(qresp);
+    ttl = ares_qcache_calc_minttl(qresp);
   }
 
   if (ttl > qcache->max_ttl) {
@@ -355,16 +354,16 @@ static ares_status_t ares__qcache_insert(ares__qcache_t          *qcache,
    * request had, so we have to re-parse the request in order to generate the
    * key for caching, but we'll only do this once we know for sure we really
    * want to cache it */
-  entry->key = ares__qcache_calc_key(qreq);
+  entry->key = ares_qcache_calc_key(qreq);
   if (entry->key == NULL) {
     goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  if (!ares__htable_strvp_insert(qcache->cache, entry->key, entry)) {
+  if (!ares_htable_strvp_insert(qcache->cache, entry->key, entry)) {
     goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  if (ares__slist_insert(qcache->expire, entry) == NULL) {
+  if (ares_slist_insert(qcache->expire, entry) == NULL) {
     goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
@@ -373,7 +372,7 @@ static ares_status_t ares__qcache_insert(ares__qcache_t          *qcache,
 /* LCOV_EXCL_START: OutOfMemory */
 fail:
   if (entry != NULL && entry->key != NULL) {
-    ares__htable_strvp_remove(qcache->cache, entry->key);
+    ares_htable_strvp_remove(qcache->cache, entry->key);
     ares_free(entry->key);
     ares_free(entry);
   }
@@ -386,9 +385,9 @@ ares_status_t ares_qcache_fetch(ares_channel_t           *channel,
                                 const ares_dns_record_t  *dnsrec,
                                 const ares_dns_record_t **dnsrec_resp)
 {
-  char                 *key = NULL;
-  ares__qcache_entry_t *entry;
-  ares_status_t         status = ARES_SUCCESS;
+  char                *key = NULL;
+  ares_qcache_entry_t *entry;
+  ares_status_t        status = ARES_SUCCESS;
 
   if (channel == NULL || dnsrec == NULL || dnsrec_resp == NULL) {
     return ARES_EFORMERR;
@@ -398,22 +397,22 @@ ares_status_t ares_qcache_fetch(ares_channel_t           *channel,
     return ARES_ENOTFOUND;
   }
 
-  ares__qcache_expire(channel->qcache, now);
+  ares_qcache_expire(channel->qcache, now);
 
-  key = ares__qcache_calc_key(dnsrec);
+  key = ares_qcache_calc_key(dnsrec);
   if (key == NULL) {
     status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
     goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  entry = ares__htable_strvp_get_direct(channel->qcache->cache, key);
+  entry = ares_htable_strvp_get_direct(channel->qcache->cache, key);
   if (entry == NULL) {
     status = ARES_ENOTFOUND;
     goto done;
   }
 
-  ares_dns_record_write_ttl_decrement(
-    entry->dnsrec, (unsigned int)(now->sec - entry->insert_ts));
+  ares_dns_record_ttl_decrement(entry->dnsrec,
+                                (unsigned int)(now->sec - entry->insert_ts));
 
   *dnsrec_resp = entry->dnsrec;
 
@@ -427,5 +426,5 @@ ares_status_t ares_qcache_insert(ares_channel_t       *channel,
                                  const ares_query_t   *query,
                                  ares_dns_record_t    *dnsrec)
 {
-  return ares__qcache_insert(channel->qcache, dnsrec, query->query, now);
+  return ares_qcache_insert_int(channel->qcache, dnsrec, query->query, now);
 }
diff --git a/deps/cares/src/lib/ares_query.c b/deps/cares/src/lib/ares_query.c
index 4d0861a5f52d51..ca3b6a9b732add 100644
--- a/deps/cares/src/lib/ares_query.c
+++ b/deps/cares/src/lib/ares_query.c
@@ -105,7 +105,8 @@ ares_status_t ares_query_nolock(ares_channel_t *channel, const char *name,
   qquery->arg      = arg;
 
   /* Send it off.  qcallback will be called when we get an answer. */
-  status = ares_send_nolock(channel, dnsrec, ares_query_dnsrec_cb, qquery, qid);
+  status = ares_send_nolock(channel, NULL, 0, dnsrec, ares_query_dnsrec_cb,
+                            qquery, qid);
 
   ares_dns_record_destroy(dnsrec);
   return status;
@@ -123,9 +124,9 @@ ares_status_t ares_query_dnsrec(ares_channel_t *channel, const char *name,
     return ARES_EFORMERR;
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
   status = ares_query_nolock(channel, name, dnsclass, type, callback, arg, qid);
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
   return status;
 }
 
@@ -138,13 +139,13 @@ void ares_query(ares_channel_t *channel, const char *name, int dnsclass,
     return;
   }
 
-  carg = ares__dnsrec_convert_arg(callback, arg);
+  carg = ares_dnsrec_convert_arg(callback, arg);
   if (carg == NULL) {
     callback(arg, ARES_ENOMEM, 0, NULL, 0); /* LCOV_EXCL_LINE: OutOfMemory */
     return;                                 /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   ares_query_dnsrec(channel, name, (ares_dns_class_t)dnsclass,
-                    (ares_dns_rec_type_t)type, ares__dnsrec_convert_cb, carg,
+                    (ares_dns_rec_type_t)type, ares_dnsrec_convert_cb, carg,
                     NULL);
 }
diff --git a/deps/cares/src/lib/ares_search.c b/deps/cares/src/lib/ares_search.c
index 2d3c0fc5145684..c605caf42cb7a8 100644
--- a/deps/cares/src/lib/ares_search.c
+++ b/deps/cares/src/lib/ares_search.c
@@ -55,7 +55,7 @@ static void squery_free(struct search_query *squery)
   if (squery == NULL) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
-  ares__strsplit_free(squery->names, squery->names_cnt);
+  ares_strsplit_free(squery->names, squery->names_cnt);
   ares_dns_record_destroy(squery->dnsrec);
   ares_free(squery);
 }
@@ -92,8 +92,8 @@ static ares_status_t ares_search_next(ares_channel_t      *channel,
     return status;
   }
 
-  status =
-    ares_send_nolock(channel, squery->dnsrec, search_callback, squery, NULL);
+  status = ares_send_nolock(channel, NULL, 0, squery->dnsrec, search_callback,
+                            squery, NULL);
 
   if (status != ARES_EFORMERR) {
     *skip_cleanup = ARES_TRUE;
@@ -114,10 +114,9 @@ static void search_callback(void *arg, ares_status_t status, size_t timeouts,
   squery->timeouts += timeouts;
 
   if (dnsrec) {
-    ares_dns_rcode_t rcode   = ares_dns_record_get_rcode(dnsrec);
-    size_t           ancount = ares_dns_record_rr_cnt(dnsrec,
-                                                      ARES_SECTION_ANSWER);
-    mystatus = ares_dns_query_reply_tostatus(rcode, ancount);
+    ares_dns_rcode_t rcode = ares_dns_record_get_rcode(dnsrec);
+    size_t ancount = ares_dns_record_rr_cnt(dnsrec, ARES_SECTION_ANSWER);
+    mystatus       = ares_dns_query_reply_tostatus(rcode, ancount);
   } else {
     mystatus = status;
   }
@@ -128,9 +127,9 @@ static void search_callback(void *arg, ares_status_t status, size_t timeouts,
       break;
     case ARES_ESERVFAIL:
     case ARES_EREFUSED:
-       /* Issue #852, systemd-resolved may return SERVFAIL or REFUSED on a
+      /* Issue #852, systemd-resolved may return SERVFAIL or REFUSED on a
        * single label domain name. */
-      if (ares__name_label_cnt(squery->names[squery->next_name_idx-1]) != 1) {
+      if (ares_name_label_cnt(squery->names[squery->next_name_idx - 1]) != 1) {
         end_squery(squery, mystatus, dnsrec);
         return;
       }
@@ -169,8 +168,8 @@ static void search_callback(void *arg, ares_status_t status, size_t timeouts,
 
 /* Determine if the domain should be looked up as-is, or if it is eligible
  * for search by appending domains */
-static ares_bool_t ares__search_eligible(const ares_channel_t *channel,
-                                         const char           *name)
+static ares_bool_t ares_search_eligible(const ares_channel_t *channel,
+                                        const char           *name)
 {
   size_t len = ares_strlen(name);
 
@@ -186,10 +185,10 @@ static ares_bool_t ares__search_eligible(const ares_channel_t *channel,
   return ARES_TRUE;
 }
 
-size_t ares__name_label_cnt(const char *name)
+size_t ares_name_label_cnt(const char *name)
 {
-  const char   *p;
-  size_t        ndots = 0;
+  const char *p;
+  size_t      ndots = 0;
 
   if (name == NULL) {
     return 0;
@@ -202,12 +201,12 @@ size_t ares__name_label_cnt(const char *name)
   }
 
   /* Label count is 1 greater than ndots */
-  return ndots+1;
+  return ndots + 1;
 }
 
-ares_status_t ares__search_name_list(const ares_channel_t *channel,
-                                     const char *name, char ***names,
-                                     size_t *names_len)
+ares_status_t ares_search_name_list(const ares_channel_t *channel,
+                                    const char *name, char ***names,
+                                    size_t *names_len)
 {
   ares_status_t status;
   char        **list     = NULL;
@@ -218,7 +217,7 @@ ares_status_t ares__search_name_list(const ares_channel_t *channel,
   size_t        i;
 
   /* Perform HOSTALIASES resolution */
-  status = ares__lookup_hostaliases(channel, name, &alias);
+  status = ares_lookup_hostaliases(channel, name, &alias);
   if (status == ARES_SUCCESS) {
     /* If hostalias succeeds, there is no searching, it is used as-is */
     list_len = 1;
@@ -235,7 +234,7 @@ ares_status_t ares__search_name_list(const ares_channel_t *channel,
   }
 
   /* See if searching is eligible at all, if not, look up as-is only */
-  if (!ares__search_eligible(channel, name)) {
+  if (!ares_search_eligible(channel, name)) {
     list_len = 1;
     list     = ares_malloc_zero(sizeof(*list) * list_len);
     if (list == NULL) {
@@ -252,7 +251,7 @@ ares_status_t ares__search_name_list(const ares_channel_t *channel,
   }
 
   /* Count the number of dots in name, 1 less than label count */
-  ndots = ares__name_label_cnt(name);
+  ndots = ares_name_label_cnt(name);
   if (ndots > 0) {
     ndots--;
   }
@@ -266,7 +265,7 @@ ares_status_t ares__search_name_list(const ares_channel_t *channel,
   }
 
   /* Set status here, its possible there are no search domains at all, so
-   * status may be ARES_ENOTFOUND from ares__lookup_hostaliases(). */
+   * status may be ARES_ENOTFOUND from ares_lookup_hostaliases(). */
   status = ARES_SUCCESS;
 
   /* Try as-is first */
@@ -281,7 +280,7 @@ ares_status_t ares__search_name_list(const ares_channel_t *channel,
 
   /* Append each search suffix to the name */
   for (i = 0; i < channel->ndomains; i++) {
-    status = ares__cat_domain(name, channel->domains[i], &list[idx]);
+    status = ares_cat_domain(name, channel->domains[i], &list[idx]);
     if (status != ARES_SUCCESS) {
       goto done;
     }
@@ -304,7 +303,7 @@ ares_status_t ares__search_name_list(const ares_channel_t *channel,
     *names     = list;
     *names_len = list_len;
   } else {
-    ares__strsplit_free(list, list_len);
+    ares_strsplit_free(list, list_len);
   }
 
   ares_free(alias);
@@ -334,7 +333,7 @@ static ares_status_t ares_search_int(ares_channel_t          *channel,
   }
 
   /* Per RFC 7686, reject queries for ".onion" domain names with NXDOMAIN. */
-  if (ares__is_onion_domain(name)) {
+  if (ares_is_onion_domain(name)) {
     status = ARES_ENOTFOUND;
     goto fail;
   }
@@ -363,7 +362,7 @@ static ares_status_t ares_search_int(ares_channel_t          *channel,
   squery->ever_got_nodata = ARES_FALSE;
 
   status =
-    ares__search_name_list(channel, name, &squery->names, &squery->names_cnt);
+    ares_search_name_list(channel, name, &squery->names, &squery->names_cnt);
   if (status != ARES_SUCCESS) {
     goto fail;
   }
@@ -383,7 +382,7 @@ static ares_status_t ares_search_int(ares_channel_t          *channel,
   return status;
 }
 
-/* Callback argument structure passed to ares__dnsrec_convert_cb(). */
+/* Callback argument structure passed to ares_dnsrec_convert_cb(). */
 typedef struct {
   ares_callback callback;
   void         *arg;
@@ -391,7 +390,7 @@ typedef struct {
 
 /*! Function to create callback arg for converting from ares_callback_dnsrec
  *  to ares_calback */
-void *ares__dnsrec_convert_arg(ares_callback callback, void *arg)
+void *ares_dnsrec_convert_arg(ares_callback callback, void *arg)
 {
   dnsrec_convert_arg_t *carg = ares_malloc_zero(sizeof(*carg));
   if (carg == NULL) {
@@ -406,8 +405,8 @@ void *ares__dnsrec_convert_arg(ares_callback callback, void *arg)
  *  the ares_callback prototype, by writing the result and passing that to
  *  the inner callback.
  */
-void ares__dnsrec_convert_cb(void *arg, ares_status_t status, size_t timeouts,
-                             const ares_dns_record_t *dnsrec)
+void ares_dnsrec_convert_cb(void *arg, ares_status_t status, size_t timeouts,
+                            const ares_dns_record_t *dnsrec)
 {
   dnsrec_convert_arg_t *carg = arg;
   unsigned char        *abuf = NULL;
@@ -442,11 +441,11 @@ void ares_search(ares_channel_t *channel, const char *name, int dnsclass,
   }
 
   /* For now, ares_search_int() uses the ares_callback prototype. We need to
-   * wrap the callback passed to this function in ares__dnsrec_convert_cb, to
+   * wrap the callback passed to this function in ares_dnsrec_convert_cb, to
    * convert from ares_callback_dnsrec to ares_callback. Allocate the convert
    * arg structure here.
    */
-  carg = ares__dnsrec_convert_arg(callback, arg);
+  carg = ares_dnsrec_convert_arg(callback, arg);
   if (carg == NULL) {
     callback(arg, ARES_ENOMEM, 0, NULL, 0);
     return;
@@ -463,9 +462,9 @@ void ares_search(ares_channel_t *channel, const char *name, int dnsclass,
     return;
   }
 
-  ares__channel_lock(channel);
-  ares_search_int(channel, dnsrec, ares__dnsrec_convert_cb, carg);
-  ares__channel_unlock(channel);
+  ares_channel_lock(channel);
+  ares_search_int(channel, dnsrec, ares_dnsrec_convert_cb, carg);
+  ares_channel_unlock(channel);
 
   ares_dns_record_destroy(dnsrec);
 }
@@ -481,15 +480,15 @@ ares_status_t ares_search_dnsrec(ares_channel_t          *channel,
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
   status = ares_search_int(channel, dnsrec, callback, arg);
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 
   return status;
 }
 
 /* Concatenate two domains. */
-ares_status_t ares__cat_domain(const char *name, const char *domain, char **s)
+ares_status_t ares_cat_domain(const char *name, const char *domain, char **s)
 {
   size_t nlen = ares_strlen(name);
   size_t dlen = ares_strlen(domain);
@@ -500,7 +499,7 @@ ares_status_t ares__cat_domain(const char *name, const char *domain, char **s)
   }
   memcpy(*s, name, nlen);
   (*s)[nlen] = '.';
-  if (strcmp(domain, ".") == 0) {
+  if (ares_streq(domain, ".")) {
     /* Avoid appending the root domain to the separator, which would set *s to
        an ill-formed value (ending in two consecutive dots). */
     dlen = 0;
@@ -510,14 +509,15 @@ ares_status_t ares__cat_domain(const char *name, const char *domain, char **s)
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__lookup_hostaliases(const ares_channel_t *channel,
-                                       const char *name, char **alias)
+ares_status_t ares_lookup_hostaliases(const ares_channel_t *channel,
+                                      const char *name, char **alias)
 {
-  ares_status_t       status      = ARES_SUCCESS;
-  const char         *hostaliases = NULL;
-  ares__buf_t        *buf         = NULL;
-  ares__llist_t      *lines       = NULL;
-  ares__llist_node_t *node;
+  ares_status_t status      = ARES_SUCCESS;
+  const char   *hostaliases = NULL;
+  ares_buf_t   *buf         = NULL;
+  ares_array_t *lines       = NULL;
+  size_t        num;
+  size_t        i;
 
   if (channel == NULL || name == NULL || alias == NULL) {
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
@@ -541,13 +541,13 @@ ares_status_t ares__lookup_hostaliases(const ares_channel_t *channel,
     goto done;
   }
 
-  buf = ares__buf_create();
+  buf = ares_buf_create();
   if (buf == NULL) {
     status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
     goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  status = ares__buf_load_file(hostaliases, buf);
+  status = ares_buf_load_file(hostaliases, buf);
   if (status != ARES_SUCCESS) {
     goto done;
   }
@@ -560,44 +560,45 @@ ares_status_t ares__lookup_hostaliases(const ares_channel_t *channel,
    * curl    www.curl.se
    */
 
-  status = ares__buf_split(buf, (const unsigned char *)"\n", 1,
-                           ARES_BUF_SPLIT_TRIM, 0, &lines);
+  status = ares_buf_split(buf, (const unsigned char *)"\n", 1,
+                          ARES_BUF_SPLIT_TRIM, 0, &lines);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  for (node = ares__llist_node_first(lines); node != NULL;
-       node = ares__llist_node_next(node)) {
-    ares__buf_t *line         = ares__llist_node_val(node);
+  num = ares_array_len(lines);
+  for (i = 0; i < num; i++) {
+    ares_buf_t **bufptr       = ares_array_at(lines, i);
+    ares_buf_t  *line         = *bufptr;
     char         hostname[64] = "";
     char         fqdn[256]    = "";
 
     /* Pull off hostname */
-    ares__buf_tag(line);
-    ares__buf_consume_nonwhitespace(line);
-    if (ares__buf_tag_fetch_string(line, hostname, sizeof(hostname)) !=
+    ares_buf_tag(line);
+    ares_buf_consume_nonwhitespace(line);
+    if (ares_buf_tag_fetch_string(line, hostname, sizeof(hostname)) !=
         ARES_SUCCESS) {
       continue;
     }
 
     /* Match hostname */
-    if (strcasecmp(hostname, name) != 0) {
+    if (!ares_strcaseeq(hostname, name)) {
       continue;
     }
 
     /* consume whitespace */
-    ares__buf_consume_whitespace(line, ARES_TRUE);
+    ares_buf_consume_whitespace(line, ARES_TRUE);
 
     /* pull off fqdn */
-    ares__buf_tag(line);
-    ares__buf_consume_nonwhitespace(line);
-    if (ares__buf_tag_fetch_string(line, fqdn, sizeof(fqdn)) != ARES_SUCCESS ||
+    ares_buf_tag(line);
+    ares_buf_consume_nonwhitespace(line);
+    if (ares_buf_tag_fetch_string(line, fqdn, sizeof(fqdn)) != ARES_SUCCESS ||
         ares_strlen(fqdn) == 0) {
       continue;
     }
 
     /* Validate characterset */
-    if (!ares__is_hostname(fqdn)) {
+    if (!ares_is_hostname(fqdn)) {
       continue;
     }
 
@@ -615,8 +616,8 @@ ares_status_t ares__lookup_hostaliases(const ares_channel_t *channel,
   status = ARES_ENOTFOUND;
 
 done:
-  ares__buf_destroy(buf);
-  ares__llist_destroy(lines);
+  ares_buf_destroy(buf);
+  ares_array_destroy(lines);
 
   return status;
 }
diff --git a/deps/cares/src/lib/ares_send.c b/deps/cares/src/lib/ares_send.c
index 64ff7edd3ac602..ca178a1741ed7d 100644
--- a/deps/cares/src/lib/ares_send.c
+++ b/deps/cares/src/lib/ares_send.c
@@ -37,8 +37,8 @@ static unsigned short generate_unique_qid(ares_channel_t *channel)
   unsigned short id;
 
   do {
-    id = ares__generate_new_id(channel->rand_state);
-  } while (ares__htable_szvp_get(channel->queries_by_qid, id, NULL));
+    id = ares_generate_new_id(channel->rand_state);
+  } while (ares_htable_szvp_get(channel->queries_by_qid, id, NULL));
 
   return id;
 }
@@ -77,14 +77,14 @@ static ares_status_t ares_apply_dns0x20(ares_channel_t    *channel,
    * is 1 bit per byte */
   total_bits     = ((len + 7) / 8) * 8;
   remaining_bits = total_bits;
-  ares__rand_bytes(channel->rand_state, randdata, total_bits / 8);
+  ares_rand_bytes(channel->rand_state, randdata, total_bits / 8);
 
   /* Randomly apply 0x20 to name */
   for (i = 0; i < len; i++) {
     size_t bit;
 
     /* Only apply 0x20 to alpha characters */
-    if (!ares__isalpha(name[i])) {
+    if (!ares_isalpha(name[i])) {
       dns0x20name[i] = name[i];
       continue;
     }
@@ -105,7 +105,8 @@ static ares_status_t ares_apply_dns0x20(ares_channel_t    *channel,
   return status;
 }
 
-ares_status_t ares_send_nolock(ares_channel_t          *channel,
+ares_status_t ares_send_nolock(ares_channel_t *channel, ares_server_t *server,
+                               ares_send_flags_t        flags,
                                const ares_dns_record_t *dnsrec,
                                ares_callback_dnsrec callback, void *arg,
                                unsigned short *qid)
@@ -116,20 +117,22 @@ ares_status_t ares_send_nolock(ares_channel_t          *channel,
   unsigned short           id          = generate_unique_qid(channel);
   const ares_dns_record_t *dnsrec_resp = NULL;
 
-  ares__tvnow(&now);
+  ares_tvnow(&now);
 
-  if (ares__slist_len(channel->servers) == 0) {
+  if (ares_slist_len(channel->servers) == 0) {
     callback(arg, ARES_ENOSERVER, 0, NULL);
     return ARES_ENOSERVER;
   }
 
-  /* Check query cache */
-  status = ares_qcache_fetch(channel, &now, dnsrec, &dnsrec_resp);
-  if (status != ARES_ENOTFOUND) {
-    /* ARES_SUCCESS means we retrieved the cache, anything else is a critical
-     * failure, all result in termination */
-    callback(arg, status, 0, dnsrec_resp);
-    return status;
+  if (!(flags & ARES_SEND_FLAG_NOCACHE)) {
+    /* Check query cache */
+    status = ares_qcache_fetch(channel, &now, dnsrec, &dnsrec_resp);
+    if (status != ARES_ENOTFOUND) {
+      /* ARES_SUCCESS means we retrieved the cache, anything else is a critical
+       * failure, all result in termination */
+      callback(arg, status, 0, dnsrec_resp);
+      return status;
+    }
   }
 
   /* Allocate space for query and allocated fields. */
@@ -162,7 +165,7 @@ ares_status_t ares_send_nolock(ares_channel_t          *channel,
     if (status != ARES_SUCCESS) {
       /* LCOV_EXCL_START: OutOfMemory */
       callback(arg, status, 0, NULL);
-      ares__free_query(query);
+      ares_free_query(query);
       return status;
       /* LCOV_EXCL_STOP */
     }
@@ -175,6 +178,9 @@ ares_status_t ares_send_nolock(ares_channel_t          *channel,
   /* Initialize query status. */
   query->try_count = 0;
 
+  if (flags & ARES_SEND_FLAG_NORETRY) {
+    query->no_retries = ARES_TRUE;
+  }
 
   query->error_status = ARES_SUCCESS;
   query->timeouts     = 0;
@@ -184,12 +190,11 @@ ares_status_t ares_send_nolock(ares_channel_t          *channel,
   query->node_queries_to_conn    = NULL;
 
   /* Chain the query into the list of all queries. */
-  query->node_all_queries =
-    ares__llist_insert_last(channel->all_queries, query);
+  query->node_all_queries = ares_llist_insert_last(channel->all_queries, query);
   if (query->node_all_queries == NULL) {
     /* LCOV_EXCL_START: OutOfMemory */
     callback(arg, ARES_ENOMEM, 0, NULL);
-    ares__free_query(query);
+    ares_free_query(query);
     return ARES_ENOMEM;
     /* LCOV_EXCL_STOP */
   }
@@ -197,17 +202,17 @@ ares_status_t ares_send_nolock(ares_channel_t          *channel,
   /* Keep track of queries bucketed by qid, so we can process DNS
    * responses quickly.
    */
-  if (!ares__htable_szvp_insert(channel->queries_by_qid, query->qid, query)) {
+  if (!ares_htable_szvp_insert(channel->queries_by_qid, query->qid, query)) {
     /* LCOV_EXCL_START: OutOfMemory */
     callback(arg, ARES_ENOMEM, 0, NULL);
-    ares__free_query(query);
+    ares_free_query(query);
     return ARES_ENOMEM;
     /* LCOV_EXCL_STOP */
   }
 
   /* Perform the first query action. */
 
-  status = ares__send_query(query, &now);
+  status = ares_send_query(server, query, &now);
   if (status == ARES_SUCCESS && qid) {
     *qid = id;
   }
@@ -225,11 +230,11 @@ ares_status_t ares_send_dnsrec(ares_channel_t          *channel,
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
-  status = ares_send_nolock(channel, dnsrec, callback, arg, qid);
+  status = ares_send_nolock(channel, NULL, 0, dnsrec, callback, arg, qid);
 
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 
   return status;
 }
@@ -257,7 +262,7 @@ void ares_send(ares_channel_t *channel, const unsigned char *qbuf, int qlen,
     return;
   }
 
-  carg = ares__dnsrec_convert_arg(callback, arg);
+  carg = ares_dnsrec_convert_arg(callback, arg);
   if (carg == NULL) {
     /* LCOV_EXCL_START: OutOfMemory */
     status = ARES_ENOMEM;
@@ -267,7 +272,7 @@ void ares_send(ares_channel_t *channel, const unsigned char *qbuf, int qlen,
     /* LCOV_EXCL_STOP */
   }
 
-  ares_send_dnsrec(channel, dnsrec, ares__dnsrec_convert_cb, carg, NULL);
+  ares_send_dnsrec(channel, dnsrec, ares_dnsrec_convert_cb, carg, NULL);
 
   ares_dns_record_destroy(dnsrec);
 }
@@ -280,11 +285,11 @@ size_t ares_queue_active_queries(const ares_channel_t *channel)
     return 0;
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
-  len = ares__llist_len(channel->all_queries);
+  len = ares_llist_len(channel->all_queries);
 
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 
   return len;
 }
diff --git a/deps/cares/src/lib/ares_set_socket_functions.c b/deps/cares/src/lib/ares_set_socket_functions.c
new file mode 100644
index 00000000000000..143c491174fdba
--- /dev/null
+++ b/deps/cares/src/lib/ares_set_socket_functions.c
@@ -0,0 +1,586 @@
+/* MIT License
+ *
+ * Copyright (c) 2024 Brad House
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+#include "ares_private.h"
+#ifdef HAVE_SYS_UIO_H
+#  include <sys/uio.h>
+#endif
+#ifdef HAVE_NETINET_IN_H
+#  include <netinet/in.h>
+#endif
+#ifdef HAVE_NETINET_TCP_H
+#  include <netinet/tcp.h>
+#endif
+#ifdef HAVE_NETDB_H
+#  include <netdb.h>
+#endif
+#ifdef HAVE_ARPA_INET_H
+#  include <arpa/inet.h>
+#endif
+
+#ifdef HAVE_STRINGS_H
+#  include <strings.h>
+#endif
+#ifdef HAVE_SYS_IOCTL_H
+#  include <sys/ioctl.h>
+#endif
+#ifdef NETWARE
+#  include <sys/filio.h>
+#endif
+
+#include <assert.h>
+#include <fcntl.h>
+#include <limits.h>
+
+
+#if defined(__linux__) && defined(TCP_FASTOPEN_CONNECT)
+#  define TFO_SUPPORTED      1
+#  define TFO_SKIP_CONNECT   0
+#  define TFO_USE_SENDTO     0
+#  define TFO_USE_CONNECTX   0
+#  define TFO_CLIENT_SOCKOPT TCP_FASTOPEN_CONNECT
+#elif defined(__FreeBSD__) && defined(TCP_FASTOPEN)
+#  define TFO_SUPPORTED      1
+#  define TFO_SKIP_CONNECT   1
+#  define TFO_USE_SENDTO     1
+#  define TFO_USE_CONNECTX   0
+#  define TFO_CLIENT_SOCKOPT TCP_FASTOPEN
+#elif defined(__APPLE__) && defined(HAVE_CONNECTX)
+#  define TFO_SUPPORTED    1
+#  define TFO_SKIP_CONNECT 0
+#  define TFO_USE_SENDTO   0
+#  define TFO_USE_CONNECTX 1
+#  undef TFO_CLIENT_SOCKOPT
+#else
+#  define TFO_SUPPORTED 0
+#endif
+
+#ifndef HAVE_WRITEV
+/* Structure for scatter/gather I/O. */
+struct iovec {
+  void  *iov_base; /* Pointer to data. */
+  size_t iov_len;  /* Length of data.  */
+};
+#endif
+
+ares_status_t
+  ares_set_socket_functions_ex(ares_channel_t                        *channel,
+                               const struct ares_socket_functions_ex *funcs,
+                               void                                  *user_data)
+{
+  unsigned int known_versions[] = { 1 };
+  size_t       i;
+
+  if (channel == NULL || funcs == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  /* Check to see if we know the version referenced */
+  for (i = 0; i < sizeof(known_versions) / sizeof(*known_versions); i++) {
+    if (funcs->version == known_versions[i]) {
+      break;
+    }
+  }
+  if (i == sizeof(known_versions) / sizeof(*known_versions)) {
+    return ARES_EFORMERR;
+  }
+
+  memset(&channel->sock_funcs, 0, sizeof(channel->sock_funcs));
+
+  /* Copy individually for ABI compliance.  memcpy() with a sizeof would do
+   * invalid reads */
+  if (funcs->version >= 1) {
+    if (funcs->asocket == NULL || funcs->aclose == NULL ||
+        funcs->asetsockopt == NULL || funcs->aconnect == NULL ||
+        funcs->arecvfrom == NULL || funcs->asendto == NULL) {
+      return ARES_EFORMERR;
+    }
+    channel->sock_funcs.version      = funcs->version;
+    channel->sock_funcs.flags        = funcs->flags;
+    channel->sock_funcs.asocket      = funcs->asocket;
+    channel->sock_funcs.aclose       = funcs->aclose;
+    channel->sock_funcs.asetsockopt  = funcs->asetsockopt;
+    channel->sock_funcs.aconnect     = funcs->aconnect;
+    channel->sock_funcs.arecvfrom    = funcs->arecvfrom;
+    channel->sock_funcs.asendto      = funcs->asendto;
+    channel->sock_funcs.agetsockname = funcs->agetsockname;
+    channel->sock_funcs.abind        = funcs->abind;
+  }
+
+  /* Implement newer versions here ...*/
+
+
+  channel->sock_func_cb_data = user_data;
+
+  return ARES_SUCCESS;
+}
+
+static int setsocknonblock(ares_socket_t sockfd, /* operate on this */
+                           int           nonblock /* TRUE or FALSE */)
+{
+#if defined(HAVE_FCNTL_O_NONBLOCK)
+
+  /* most recent unix versions */
+  int flags;
+  flags = fcntl(sockfd, F_GETFL, 0);
+  if (nonblock) {
+    return fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);
+  } else {
+    return fcntl(sockfd, F_SETFL, flags & (~O_NONBLOCK)); /* LCOV_EXCL_LINE */
+  }
+
+#elif defined(HAVE_IOCTL_FIONBIO)
+
+  /* older unix versions */
+  int flags = nonblock ? 1 : 0;
+  return ioctl(sockfd, FIONBIO, &flags);
+
+#elif defined(HAVE_IOCTLSOCKET_FIONBIO)
+
+#  ifdef WATT32
+  char flags = nonblock ? 1 : 0;
+#  else
+  /* Windows */
+  unsigned long flags = nonblock ? 1UL : 0UL;
+#  endif
+  return ioctlsocket(sockfd, (long)FIONBIO, &flags);
+
+#elif defined(HAVE_IOCTLSOCKET_CAMEL_FIONBIO)
+
+  /* Amiga */
+  long flags = nonblock ? 1L : 0L;
+  return IoctlSocket(sockfd, FIONBIO, flags);
+
+#elif defined(HAVE_SETSOCKOPT_SO_NONBLOCK)
+
+  /* BeOS */
+  long b = nonblock ? 1L : 0L;
+  return setsockopt(sockfd, SOL_SOCKET, SO_NONBLOCK, &b, sizeof(b));
+
+#else
+#  error "no non-blocking method was found/used/set"
+#endif
+}
+
+static int default_aclose(ares_socket_t sock, void *user_data)
+{
+  (void)user_data;
+
+#if defined(HAVE_CLOSESOCKET)
+  return closesocket(sock);
+#elif defined(HAVE_CLOSESOCKET_CAMEL)
+  return CloseSocket(sock);
+#elif defined(HAVE_CLOSE_S)
+  return close_s(sock);
+#else
+  return close(sock);
+#endif
+}
+
+static ares_socket_t default_asocket(int domain, int type, int protocol,
+                                     void *user_data)
+{
+  ares_socket_t s;
+  (void)user_data;
+
+  s = socket(domain, type, protocol);
+  if (s == ARES_SOCKET_BAD) {
+    return s;
+  }
+
+  if (setsocknonblock(s, 1) != 0) {
+    goto fail; /* LCOV_EXCL_LINE */
+  }
+
+#if defined(FD_CLOEXEC) && !defined(MSDOS)
+  /* Configure the socket fd as close-on-exec. */
+  if (fcntl(s, F_SETFD, FD_CLOEXEC) != 0) {
+    goto fail; /* LCOV_EXCL_LINE */
+  }
+#endif
+
+  /* No need to emit SIGPIPE on socket errors */
+#if defined(SO_NOSIGPIPE)
+  {
+    int opt = 1;
+    (void)setsockopt(s, SOL_SOCKET, SO_NOSIGPIPE, (void *)&opt, sizeof(opt));
+  }
+#endif
+
+
+  if (type == SOCK_STREAM) {
+    int opt = 1;
+
+#ifdef TCP_NODELAY
+    /*
+     * Disable the Nagle algorithm (only relevant for TCP sockets, and thus not
+     * in configure_socket). In general, in DNS lookups we're pretty much
+     * interested in firing off a single request and then waiting for a reply,
+     * so batching isn't very interesting.
+     */
+    if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, (void *)&opt, sizeof(opt)) !=
+        0) {
+      goto fail;
+    }
+#endif
+  }
+
+#if defined(IPV6_V6ONLY) && defined(USE_WINSOCK)
+  /* Support for IPv4-mapped IPv6 addresses.
+   * Linux kernel, NetBSD, FreeBSD and Darwin: default is off;
+   * Windows Vista and later: default is on;
+   * DragonFly BSD: acts like off, and dummy setting;
+   * OpenBSD and earlier Windows: unsupported.
+   * Linux: controlled by /proc/sys/net/ipv6/bindv6only.
+   */
+  if (domain == PF_INET6) {
+    int on = 0;
+    (void)setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, (void *)&on, sizeof(on));
+  }
+#endif
+
+  return s;
+
+fail:
+  default_aclose(s, user_data);
+  return ARES_SOCKET_BAD;
+}
+
+static int default_asetsockopt(ares_socket_t sock, ares_socket_opt_t opt,
+                               const void *val, ares_socklen_t val_size,
+                               void *user_data)
+{
+  switch (opt) {
+    case ARES_SOCKET_OPT_SENDBUF_SIZE:
+      if (val_size != sizeof(int)) {
+        SET_SOCKERRNO(EINVAL);
+        return -1;
+      }
+      return setsockopt(sock, SOL_SOCKET, SO_SNDBUF, val, val_size);
+
+    case ARES_SOCKET_OPT_RECVBUF_SIZE:
+      if (val_size != sizeof(int)) {
+        SET_SOCKERRNO(EINVAL);
+        return -1;
+      }
+      return setsockopt(sock, SOL_SOCKET, SO_RCVBUF, val, val_size);
+
+    case ARES_SOCKET_OPT_BIND_DEVICE:
+      if (!ares_str_isprint(val, (size_t)val_size)) {
+        SET_SOCKERRNO(EINVAL);
+        return -1;
+      }
+#ifdef SO_BINDTODEVICE
+      return setsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE, val, val_size);
+#else
+      SET_SOCKERRNO(ENOSYS);
+      return -1;
+#endif
+
+    case ARES_SOCKET_OPT_TCP_FASTOPEN:
+      if (val_size != sizeof(ares_bool_t)) {
+        SET_SOCKERRNO(EINVAL);
+        return -1;
+      }
+#if defined(TFO_CLIENT_SOCKOPT)
+      {
+        int                oval;
+        const ares_bool_t *pval = val;
+        oval                    = (int)*pval;
+        return setsockopt(sock, IPPROTO_TCP, TFO_CLIENT_SOCKOPT, (void *)&oval,
+                          sizeof(oval));
+      }
+#elif TFO_SUPPORTED
+      return 0;
+#else
+      SET_SOCKERRNO(ENOSYS);
+      return -1;
+#endif
+  }
+
+  (void)user_data;
+  SET_SOCKERRNO(ENOSYS);
+  return -1;
+}
+
+static int default_aconnect(ares_socket_t sock, const struct sockaddr *address,
+                            ares_socklen_t address_len, unsigned int flags,
+                            void *user_data)
+{
+  (void)user_data;
+
+#if defined(TFO_SKIP_CONNECT) && TFO_SKIP_CONNECT
+  if (flags & ARES_SOCKET_CONN_TCP_FASTOPEN) {
+    return 0;
+  }
+  return connect(sock, address, address_len);
+#elif defined(TFO_USE_CONNECTX) && TFO_USE_CONNECTX
+  if (flags & ARES_SOCKET_CONN_TCP_FASTOPEN) {
+    sa_endpoints_t endpoints;
+
+    memset(&endpoints, 0, sizeof(endpoints));
+    endpoints.sae_dstaddr    = address;
+    endpoints.sae_dstaddrlen = address_len;
+
+    return connectx(sock, &endpoints, SAE_ASSOCID_ANY,
+                    CONNECT_DATA_IDEMPOTENT | CONNECT_RESUME_ON_READ_WRITE,
+                    NULL, 0, NULL, NULL);
+  } else {
+    return connect(sock, address, address_len);
+  }
+#else
+  (void)flags;
+  return connect(sock, address, address_len);
+#endif
+}
+
+static ares_ssize_t default_arecvfrom(ares_socket_t sock, void *buffer,
+                                      size_t length, int flags,
+                                      struct sockaddr *address,
+                                      ares_socklen_t  *address_len,
+                                      void            *user_data)
+{
+  (void)user_data;
+
+#ifdef HAVE_RECVFROM
+  return (ares_ssize_t)recvfrom(sock, buffer, (RECVFROM_TYPE_ARG3)length, flags,
+                                address, address_len);
+#else
+  if (address != NULL && address_len != NULL) {
+    memset(address, 0, (size_t)*address_len);
+    address->sa_family = AF_UNSPEC;
+  }
+  return (ares_ssize_t)recv(sock, buffer, (RECVFROM_TYPE_ARG3)length, flags);
+#endif
+}
+
+static ares_ssize_t default_asendto(ares_socket_t sock, const void *buffer,
+                                    size_t length, int flags,
+                                    const struct sockaddr *address,
+                                    ares_socklen_t address_len, void *user_data)
+{
+  (void)user_data;
+
+  if (address != NULL) {
+#ifdef HAVE_SENDTO
+    return (ares_ssize_t)sendto((SEND_TYPE_ARG1)sock, (SEND_TYPE_ARG2)buffer,
+                                (SEND_TYPE_ARG3)length, (SEND_TYPE_ARG4)flags,
+                                address, address_len);
+#else
+    (void)address_len;
+#endif
+  }
+
+  return (ares_ssize_t)send((SEND_TYPE_ARG1)sock, (SEND_TYPE_ARG2)buffer,
+                            (SEND_TYPE_ARG3)length, (SEND_TYPE_ARG4)flags);
+}
+
+static int default_agetsockname(ares_socket_t sock, struct sockaddr *address,
+                                ares_socklen_t *address_len, void *user_data)
+{
+  (void)user_data;
+  return getsockname(sock, address, address_len);
+}
+
+static int default_abind(ares_socket_t sock, unsigned int flags,
+                         const struct sockaddr *address, socklen_t address_len,
+                         void *user_data)
+{
+  (void)user_data;
+
+#ifdef IP_BIND_ADDRESS_NO_PORT
+  if (flags & ARES_SOCKET_BIND_TCP && flags & ARES_SOCKET_BIND_CLIENT) {
+    int opt = 1;
+    (void)setsockopt(sock, SOL_IP, IP_BIND_ADDRESS_NO_PORT, &opt, sizeof(opt));
+  }
+#else
+  (void)flags;
+#endif
+
+  return bind(sock, address, address_len);
+}
+
+static unsigned int default_aif_nametoindex(const char *ifname, void *user_data)
+{
+  (void)user_data;
+  return ares_os_if_nametoindex(ifname);
+}
+
+static const char *default_aif_indextoname(unsigned int ifindex,
+                                           char        *ifname_buf,
+                                           size_t       ifname_buf_len,
+                                           void        *user_data)
+{
+  (void)user_data;
+  return ares_os_if_indextoname(ifindex, ifname_buf, ifname_buf_len);
+}
+
+static const struct ares_socket_functions_ex default_socket_functions = {
+  1,
+  ARES_SOCKFUNC_FLAG_NONBLOCKING,
+  default_asocket,
+  default_aclose,
+  default_asetsockopt,
+  default_aconnect,
+  default_arecvfrom,
+  default_asendto,
+  default_agetsockname,
+  default_abind,
+  default_aif_nametoindex,
+  default_aif_indextoname
+};
+
+void ares_set_socket_functions_def(ares_channel_t *channel)
+{
+  ares_set_socket_functions_ex(channel, &default_socket_functions, NULL);
+}
+
+static int legacycb_aclose(ares_socket_t sock, void *user_data)
+{
+  ares_channel_t *channel = user_data;
+
+  if (channel->legacy_sock_funcs != NULL &&
+      channel->legacy_sock_funcs->aclose != NULL) {
+    return channel->legacy_sock_funcs->aclose(
+      sock, channel->legacy_sock_funcs_cb_data);
+  }
+
+  return default_aclose(sock, NULL);
+}
+
+static ares_socket_t legacycb_asocket(int domain, int type, int protocol,
+                                      void *user_data)
+{
+  ares_channel_t *channel = user_data;
+
+  if (channel->legacy_sock_funcs != NULL &&
+      channel->legacy_sock_funcs->asocket != NULL) {
+    return channel->legacy_sock_funcs->asocket(
+      domain, type, protocol, channel->legacy_sock_funcs_cb_data);
+  }
+
+  return default_asocket(domain, type, protocol, NULL);
+}
+
+static int legacycb_asetsockopt(ares_socket_t sock, ares_socket_opt_t opt,
+                                const void *val, ares_socklen_t val_size,
+                                void *user_data)
+{
+  (void)sock;
+  (void)opt;
+  (void)val;
+  (void)val_size;
+  (void)user_data;
+  SET_SOCKERRNO(ENOSYS);
+  return -1;
+}
+
+static int legacycb_aconnect(ares_socket_t sock, const struct sockaddr *address,
+                             ares_socklen_t address_len, unsigned int flags,
+                             void *user_data)
+{
+  ares_channel_t *channel = user_data;
+
+  if (channel->legacy_sock_funcs != NULL &&
+      channel->legacy_sock_funcs->aconnect != NULL) {
+    return channel->legacy_sock_funcs->aconnect(
+      sock, address, address_len, channel->legacy_sock_funcs_cb_data);
+  }
+
+  return default_aconnect(sock, address, address_len, flags, NULL);
+}
+
+static ares_ssize_t legacycb_arecvfrom(ares_socket_t sock, void *buffer,
+                                       size_t length, int flags,
+                                       struct sockaddr *address,
+                                       ares_socklen_t  *address_len,
+                                       void            *user_data)
+{
+  ares_channel_t *channel = user_data;
+
+  if (channel->legacy_sock_funcs != NULL &&
+      channel->legacy_sock_funcs->arecvfrom != NULL) {
+    if (address != NULL && address_len != NULL) {
+      memset(address, 0, (size_t)*address_len);
+      address->sa_family = AF_UNSPEC;
+    }
+    return channel->legacy_sock_funcs->arecvfrom(
+      sock, buffer, length, flags, address, address_len,
+      channel->legacy_sock_funcs_cb_data);
+  }
+
+  return default_arecvfrom(sock, buffer, length, flags, address, address_len,
+                           NULL);
+}
+
+static ares_ssize_t legacycb_asendto(ares_socket_t sock, const void *buffer,
+                                     size_t length, int flags,
+                                     const struct sockaddr *address,
+                                     ares_socklen_t         address_len,
+                                     void                  *user_data)
+{
+  ares_channel_t *channel = user_data;
+
+  if (channel->legacy_sock_funcs != NULL &&
+      channel->legacy_sock_funcs->asendv != NULL) {
+    struct iovec vec;
+    vec.iov_base = (void *)((size_t)buffer); /* Cast off const */
+    vec.iov_len  = length;
+    return channel->legacy_sock_funcs->asendv(
+      sock, &vec, 1, channel->legacy_sock_funcs_cb_data);
+  }
+
+  return default_asendto(sock, buffer, length, flags, address, address_len,
+                         NULL);
+}
+
+
+static const struct ares_socket_functions_ex legacy_socket_functions = {
+  1,
+  0,
+  legacycb_asocket,
+  legacycb_aclose,
+  legacycb_asetsockopt,
+  legacycb_aconnect,
+  legacycb_arecvfrom,
+  legacycb_asendto,
+  NULL, /* agetsockname */
+  NULL, /* abind */
+  NULL, /* aif_nametoindex */
+  NULL  /* aif_indextoname */
+};
+
+void ares_set_socket_functions(ares_channel_t                     *channel,
+                               const struct ares_socket_functions *funcs,
+                               void                               *data)
+{
+  if (channel == NULL || channel->optmask & ARES_OPT_EVENT_THREAD) {
+    return;
+  }
+
+  channel->legacy_sock_funcs         = funcs;
+  channel->legacy_sock_funcs_cb_data = data;
+  ares_set_socket_functions_ex(channel, &legacy_socket_functions, channel);
+}
diff --git a/deps/cares/src/lib/ares_setup.h b/deps/cares/src/lib/ares_setup.h
index b6ce077ff64758..8890c3c338bf15 100644
--- a/deps/cares/src/lib/ares_setup.h
+++ b/deps/cares/src/lib/ares_setup.h
@@ -199,51 +199,15 @@
 #endif
 
 
-#ifdef __hpux
-#  if !defined(_XOPEN_SOURCE_EXTENDED) || defined(_KERNEL)
-#    ifdef _APP32_64BIT_OFF_T
-#      define OLD_APP32_64BIT_OFF_T _APP32_64BIT_OFF_T
-#      undef _APP32_64BIT_OFF_T
-#    else
-#      undef OLD_APP32_64BIT_OFF_T
-#    endif
-#  endif
-#endif
-
-#ifdef __hpux
-#  if !defined(_XOPEN_SOURCE_EXTENDED) || defined(_KERNEL)
-#    ifdef OLD_APP32_64BIT_OFF_T
-#      define _APP32_64BIT_OFF_T OLD_APP32_64BIT_OFF_T
-#      undef OLD_APP32_64BIT_OFF_T
-#    endif
-#  endif
-#endif
-
-
-/*
- * Definition of timeval struct for platforms that don't have it.
- */
+/* Definition of timeval struct for platforms that don't have it. */
 
 #ifndef HAVE_STRUCT_TIMEVAL
 struct timeval {
-  long tv_sec;
-  long tv_usec;
+  ares_int64_t tv_sec;
+  long         tv_usec;
 };
 #endif
 
-/*
- * Function-like macro definition used to close a socket.
- */
-
-#if defined(HAVE_CLOSESOCKET)
-#  define sclose(x) closesocket((x))
-#elif defined(HAVE_CLOSESOCKET_CAMEL)
-#  define sclose(x) CloseSocket((x))
-#elif defined(HAVE_CLOSE_S)
-#  define sclose(x) close_s((x))
-#else
-#  define sclose(x) close((x))
-#endif
 
 /*
  * Macro used to include code only in debug builds.
@@ -257,111 +221,4 @@ struct timeval {
     } while (0)
 #endif
 
-/*
- * Macro SOCKERRNO / SET_SOCKERRNO() returns / sets the *socket-related* errno
- * (or equivalent) on this platform to hide platform details to code using it.
- */
-
-#ifdef USE_WINSOCK
-#  define SOCKERRNO        ((int)WSAGetLastError())
-#  define SET_SOCKERRNO(x) (WSASetLastError((int)(x)))
-#else
-#  define SOCKERRNO        (errno)
-#  define SET_SOCKERRNO(x) (errno = (x))
-#endif
-
-
-/*
- * Macro ERRNO / SET_ERRNO() returns / sets the NOT *socket-related* errno
- * (or equivalent) on this platform to hide platform details to code using it.
- */
-
-#if defined(WIN32) && !defined(WATT32)
-#  define ERRNO        ((int)GetLastError())
-#  define SET_ERRNO(x) (SetLastError((DWORD)(x)))
-#else
-#  define ERRNO        (errno)
-#  define SET_ERRNO(x) (errno = (x))
-#endif
-
-
-/*
- * Portable error number symbolic names defined to Winsock error codes.
- */
-
-#ifdef USE_WINSOCK
-#  undef EBADF           /* override definition in errno.h */
-#  define EBADF WSAEBADF
-#  undef EINTR           /* override definition in errno.h */
-#  define EINTR WSAEINTR
-#  undef EINVAL          /* override definition in errno.h */
-#  define EINVAL WSAEINVAL
-#  undef EWOULDBLOCK     /* override definition in errno.h */
-#  define EWOULDBLOCK WSAEWOULDBLOCK
-#  undef EINPROGRESS     /* override definition in errno.h */
-#  define EINPROGRESS WSAEINPROGRESS
-#  undef EALREADY        /* override definition in errno.h */
-#  define EALREADY WSAEALREADY
-#  undef ENOTSOCK        /* override definition in errno.h */
-#  define ENOTSOCK WSAENOTSOCK
-#  undef EDESTADDRREQ    /* override definition in errno.h */
-#  define EDESTADDRREQ WSAEDESTADDRREQ
-#  undef EMSGSIZE        /* override definition in errno.h */
-#  define EMSGSIZE WSAEMSGSIZE
-#  undef EPROTOTYPE      /* override definition in errno.h */
-#  define EPROTOTYPE WSAEPROTOTYPE
-#  undef ENOPROTOOPT     /* override definition in errno.h */
-#  define ENOPROTOOPT WSAENOPROTOOPT
-#  undef EPROTONOSUPPORT /* override definition in errno.h */
-#  define EPROTONOSUPPORT WSAEPROTONOSUPPORT
-#  define ESOCKTNOSUPPORT WSAESOCKTNOSUPPORT
-#  undef EOPNOTSUPP /* override definition in errno.h */
-#  define EOPNOTSUPP   WSAEOPNOTSUPP
-#  define EPFNOSUPPORT WSAEPFNOSUPPORT
-#  undef EAFNOSUPPORT  /* override definition in errno.h */
-#  define EAFNOSUPPORT WSAEAFNOSUPPORT
-#  undef EADDRINUSE    /* override definition in errno.h */
-#  define EADDRINUSE WSAEADDRINUSE
-#  undef EADDRNOTAVAIL /* override definition in errno.h */
-#  define EADDRNOTAVAIL WSAEADDRNOTAVAIL
-#  undef ENETDOWN      /* override definition in errno.h */
-#  define ENETDOWN WSAENETDOWN
-#  undef ENETUNREACH   /* override definition in errno.h */
-#  define ENETUNREACH WSAENETUNREACH
-#  undef ENETRESET     /* override definition in errno.h */
-#  define ENETRESET WSAENETRESET
-#  undef ECONNABORTED  /* override definition in errno.h */
-#  define ECONNABORTED WSAECONNABORTED
-#  undef ECONNRESET    /* override definition in errno.h */
-#  define ECONNRESET WSAECONNRESET
-#  undef ENOBUFS       /* override definition in errno.h */
-#  define ENOBUFS WSAENOBUFS
-#  undef EISCONN       /* override definition in errno.h */
-#  define EISCONN WSAEISCONN
-#  undef ENOTCONN      /* override definition in errno.h */
-#  define ENOTCONN     WSAENOTCONN
-#  define ESHUTDOWN    WSAESHUTDOWN
-#  define ETOOMANYREFS WSAETOOMANYREFS
-#  undef ETIMEDOUT     /* override definition in errno.h */
-#  define ETIMEDOUT WSAETIMEDOUT
-#  undef ECONNREFUSED  /* override definition in errno.h */
-#  define ECONNREFUSED WSAECONNREFUSED
-#  undef ELOOP         /* override definition in errno.h */
-#  define ELOOP WSAELOOP
-#  ifndef ENAMETOOLONG /* possible previous definition in errno.h */
-#    define ENAMETOOLONG WSAENAMETOOLONG
-#  endif
-#  define EHOSTDOWN WSAEHOSTDOWN
-#  undef EHOSTUNREACH /* override definition in errno.h */
-#  define EHOSTUNREACH WSAEHOSTUNREACH
-#  ifndef ENOTEMPTY   /* possible previous definition in errno.h */
-#    define ENOTEMPTY WSAENOTEMPTY
-#  endif
-#  define EPROCLIM WSAEPROCLIM
-#  define EUSERS   WSAEUSERS
-#  define EDQUOT   WSAEDQUOT
-#  define ESTALE   WSAESTALE
-#  define EREMOTE  WSAEREMOTE
-#endif
-
 #endif /* __ARES_SETUP_H */
diff --git a/deps/cares/src/lib/ares_socket.c b/deps/cares/src/lib/ares_socket.c
new file mode 100644
index 00000000000000..df02fd61b60b14
--- /dev/null
+++ b/deps/cares/src/lib/ares_socket.c
@@ -0,0 +1,424 @@
+/* MIT License
+ *
+ * Copyright (c) Massachusetts Institute of Technology
+ * Copyright (c) The c-ares project and its contributors
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+#include "ares_private.h"
+#ifdef HAVE_SYS_UIO_H
+#  include <sys/uio.h>
+#endif
+#ifdef HAVE_NETINET_IN_H
+#  include <netinet/in.h>
+#endif
+#ifdef HAVE_NETINET_TCP_H
+#  include <netinet/tcp.h>
+#endif
+#ifdef HAVE_NETDB_H
+#  include <netdb.h>
+#endif
+#ifdef HAVE_ARPA_INET_H
+#  include <arpa/inet.h>
+#endif
+
+#ifdef HAVE_STRINGS_H
+#  include <strings.h>
+#endif
+#ifdef HAVE_SYS_IOCTL_H
+#  include <sys/ioctl.h>
+#endif
+#ifdef NETWARE
+#  include <sys/filio.h>
+#endif
+
+#include <assert.h>
+#include <fcntl.h>
+#include <limits.h>
+
+static ares_conn_err_t ares_socket_deref_error(int err)
+{
+  switch (err) {
+#if defined(EWOULDBLOCK)
+    case EWOULDBLOCK:
+      return ARES_CONN_ERR_WOULDBLOCK;
+#endif
+#if defined(EAGAIN) && (!defined(EWOULDBLOCK) || EAGAIN != EWOULDBLOCK)
+    case EAGAIN:
+      return ARES_CONN_ERR_WOULDBLOCK;
+#endif
+    case EINPROGRESS:
+      return ARES_CONN_ERR_WOULDBLOCK;
+    case ENETDOWN:
+      return ARES_CONN_ERR_NETDOWN;
+    case ENETUNREACH:
+      return ARES_CONN_ERR_NETUNREACH;
+    case ECONNABORTED:
+      return ARES_CONN_ERR_CONNABORTED;
+    case ECONNRESET:
+      return ARES_CONN_ERR_CONNRESET;
+    case ECONNREFUSED:
+      return ARES_CONN_ERR_CONNREFUSED;
+    case ETIMEDOUT:
+      return ARES_CONN_ERR_CONNTIMEDOUT;
+    case EHOSTDOWN:
+      return ARES_CONN_ERR_HOSTDOWN;
+    case EHOSTUNREACH:
+      return ARES_CONN_ERR_HOSTUNREACH;
+    case EINTR:
+      return ARES_CONN_ERR_INTERRUPT;
+    case EAFNOSUPPORT:
+      return ARES_CONN_ERR_AFNOSUPPORT;
+    case EADDRNOTAVAIL:
+      return ARES_CONN_ERR_BADADDR;
+    default:
+      break;
+  }
+
+  return ARES_CONN_ERR_FAILURE;
+}
+
+ares_bool_t ares_sockaddr_addr_eq(const struct sockaddr  *sa,
+                                  const struct ares_addr *aa)
+{
+  const void *addr1;
+  const void *addr2;
+
+  if (sa->sa_family == aa->family) {
+    switch (aa->family) {
+      case AF_INET:
+        addr1 = &aa->addr.addr4;
+        addr2 = &(CARES_INADDR_CAST(const struct sockaddr_in *, sa))->sin_addr;
+        if (memcmp(addr1, addr2, sizeof(aa->addr.addr4)) == 0) {
+          return ARES_TRUE; /* match */
+        }
+        break;
+      case AF_INET6:
+        addr1 = &aa->addr.addr6;
+        addr2 =
+          &(CARES_INADDR_CAST(const struct sockaddr_in6 *, sa))->sin6_addr;
+        if (memcmp(addr1, addr2, sizeof(aa->addr.addr6)) == 0) {
+          return ARES_TRUE; /* match */
+        }
+        break;
+      default:
+        break; /* LCOV_EXCL_LINE */
+    }
+  }
+  return ARES_FALSE; /* different */
+}
+
+ares_conn_err_t ares_socket_write(ares_channel_t *channel, ares_socket_t fd,
+                                  const void *data, size_t len, size_t *written,
+                                  const struct sockaddr *sa,
+                                  ares_socklen_t         salen)
+{
+  int             flags = 0;
+  ares_ssize_t    rv;
+  ares_conn_err_t err = ARES_CONN_ERR_SUCCESS;
+
+#ifdef HAVE_MSG_NOSIGNAL
+  flags |= MSG_NOSIGNAL;
+#endif
+
+  rv = channel->sock_funcs.asendto(fd, data, len, flags, sa, salen,
+                                   channel->sock_func_cb_data);
+  if (rv <= 0) {
+    err = ares_socket_deref_error(SOCKERRNO);
+  } else {
+    *written = (size_t)rv;
+  }
+  return err;
+}
+
+ares_conn_err_t ares_socket_recv(ares_channel_t *channel, ares_socket_t s,
+                                 ares_bool_t is_tcp, void *data,
+                                 size_t data_len, size_t *read_bytes)
+{
+  ares_ssize_t rv;
+
+  *read_bytes = 0;
+
+  rv = channel->sock_funcs.arecvfrom(s, data, data_len, 0, NULL, 0,
+                                     channel->sock_func_cb_data);
+
+  if (rv > 0) {
+    *read_bytes = (size_t)rv;
+    return ARES_CONN_ERR_SUCCESS;
+  }
+
+  if (rv == 0) {
+    /* UDP allows 0-byte packets and is connectionless, so this is success */
+    if (!is_tcp) {
+      return ARES_CONN_ERR_SUCCESS;
+    } else {
+      return ARES_CONN_ERR_CONNCLOSED;
+    }
+  }
+
+  /* If we're here, rv<0 */
+  return ares_socket_deref_error(SOCKERRNO);
+}
+
+ares_conn_err_t ares_socket_recvfrom(ares_channel_t *channel, ares_socket_t s,
+                                     ares_bool_t is_tcp, void *data,
+                                     size_t data_len, int flags,
+                                     struct sockaddr *from,
+                                     ares_socklen_t  *from_len,
+                                     size_t          *read_bytes)
+{
+  ares_ssize_t rv;
+
+  rv = channel->sock_funcs.arecvfrom(s, data, data_len, flags, from, from_len,
+                                     channel->sock_func_cb_data);
+
+  if (rv > 0) {
+    *read_bytes = (size_t)rv;
+    return ARES_CONN_ERR_SUCCESS;
+  }
+
+  if (rv == 0) {
+    /* UDP allows 0-byte packets and is connectionless, so this is success */
+    if (!is_tcp) {
+      return ARES_CONN_ERR_SUCCESS;
+    } else {
+      return ARES_CONN_ERR_CONNCLOSED;
+    }
+  }
+
+  /* If we're here, rv<0 */
+  return ares_socket_deref_error(SOCKERRNO);
+}
+
+ares_conn_err_t ares_socket_enable_tfo(const ares_channel_t *channel,
+                                       ares_socket_t         fd)
+{
+  ares_bool_t opt = ARES_TRUE;
+
+  if (channel->sock_funcs.asetsockopt(fd, ARES_SOCKET_OPT_TCP_FASTOPEN,
+                                      (void *)&opt, sizeof(opt),
+                                      channel->sock_func_cb_data) != 0) {
+    return ARES_CONN_ERR_NOTIMP;
+  }
+
+  return ARES_CONN_ERR_SUCCESS;
+}
+
+ares_status_t ares_socket_configure(ares_channel_t *channel, int family,
+                                    ares_bool_t is_tcp, ares_socket_t fd)
+{
+  union {
+    struct sockaddr     sa;
+    struct sockaddr_in  sa4;
+    struct sockaddr_in6 sa6;
+  } local;
+
+  ares_socklen_t bindlen = 0;
+  int            rv;
+  unsigned int   bind_flags = 0;
+
+  /* Set the socket's send and receive buffer sizes. */
+  if (channel->socket_send_buffer_size > 0) {
+    rv = channel->sock_funcs.asetsockopt(
+      fd, ARES_SOCKET_OPT_SENDBUF_SIZE,
+      (void *)&channel->socket_send_buffer_size,
+      sizeof(channel->socket_send_buffer_size), channel->sock_func_cb_data);
+    if (rv != 0 && SOCKERRNO != ENOSYS) {
+      return ARES_ECONNREFUSED; /* LCOV_EXCL_LINE: UntestablePath */
+    }
+  }
+
+  if (channel->socket_receive_buffer_size > 0) {
+    rv = channel->sock_funcs.asetsockopt(
+      fd, ARES_SOCKET_OPT_RECVBUF_SIZE,
+      (void *)&channel->socket_receive_buffer_size,
+      sizeof(channel->socket_receive_buffer_size), channel->sock_func_cb_data);
+    if (rv != 0 && SOCKERRNO != ENOSYS) {
+      return ARES_ECONNREFUSED; /* LCOV_EXCL_LINE: UntestablePath */
+    }
+  }
+
+  /* Bind to network interface if configured */
+  if (ares_strlen(channel->local_dev_name)) {
+    /* Prior versions silently ignored failure, so we need to maintain that
+     * compatibility */
+    (void)channel->sock_funcs.asetsockopt(
+      fd, ARES_SOCKET_OPT_BIND_DEVICE, channel->local_dev_name,
+      sizeof(channel->local_dev_name), channel->sock_func_cb_data);
+  }
+
+  /* Bind to ip address if configured */
+  if (family == AF_INET && channel->local_ip4) {
+    memset(&local.sa4, 0, sizeof(local.sa4));
+    local.sa4.sin_family      = AF_INET;
+    local.sa4.sin_addr.s_addr = htonl(channel->local_ip4);
+    bindlen                   = sizeof(local.sa4);
+  } else if (family == AF_INET6 &&
+             memcmp(channel->local_ip6, ares_in6addr_any._S6_un._S6_u8,
+                    sizeof(channel->local_ip6)) != 0) {
+    /* Only if not link-local and an ip other than "::" is specified */
+    memset(&local.sa6, 0, sizeof(local.sa6));
+    local.sa6.sin6_family = AF_INET6;
+    memcpy(&local.sa6.sin6_addr, channel->local_ip6,
+           sizeof(channel->local_ip6));
+    bindlen = sizeof(local.sa6);
+  }
+
+
+  if (bindlen && channel->sock_funcs.abind != NULL) {
+    bind_flags |= ARES_SOCKET_BIND_CLIENT;
+    if (is_tcp) {
+      bind_flags |= ARES_SOCKET_BIND_TCP;
+    }
+    if (channel->sock_funcs.abind(fd, bind_flags, &local.sa, bindlen,
+                                  channel->sock_func_cb_data) != 0) {
+      return ARES_ECONNREFUSED;
+    }
+  }
+
+  return ARES_SUCCESS;
+}
+
+ares_bool_t ares_sockaddr_to_ares_addr(struct ares_addr      *ares_addr,
+                                       unsigned short        *port,
+                                       const struct sockaddr *sockaddr)
+{
+  if (sockaddr->sa_family == AF_INET) {
+    /* NOTE: memcpy sockaddr_in to avoid alignment issues found by UBSAN,
+     *       caused by dnsinfo packing on macOS */
+    struct sockaddr_in sockaddr_in;
+    memcpy(&sockaddr_in, sockaddr, sizeof(sockaddr_in));
+
+    ares_addr->family = AF_INET;
+    memcpy(&ares_addr->addr.addr4, &(sockaddr_in.sin_addr),
+           sizeof(ares_addr->addr.addr4));
+
+    if (port) {
+      *port = ntohs(sockaddr_in.sin_port);
+    }
+    return ARES_TRUE;
+  }
+
+  if (sockaddr->sa_family == AF_INET6) {
+    /* NOTE: memcpy sockaddr_in6 to avoid alignment issues found by UBSAN,
+     *       caused by dnsinfo packing on macOS */
+    struct sockaddr_in6 sockaddr_in6;
+    memcpy(&sockaddr_in6, sockaddr, sizeof(sockaddr_in6));
+
+    ares_addr->family = AF_INET6;
+    memcpy(&ares_addr->addr.addr6, &(sockaddr_in6.sin6_addr),
+           sizeof(ares_addr->addr.addr6));
+    if (port) {
+      *port = ntohs(sockaddr_in6.sin6_port);
+    }
+    return ARES_TRUE;
+  }
+
+  return ARES_FALSE;
+}
+
+ares_conn_err_t ares_socket_open(ares_socket_t *sock, ares_channel_t *channel,
+                                 int af, int type, int protocol)
+{
+  ares_socket_t s;
+
+  *sock = ARES_SOCKET_BAD;
+
+  s =
+    channel->sock_funcs.asocket(af, type, protocol, channel->sock_func_cb_data);
+
+  if (s == ARES_SOCKET_BAD) {
+    return ares_socket_deref_error(SOCKERRNO);
+  }
+
+  *sock = s;
+
+  return ARES_CONN_ERR_SUCCESS;
+}
+
+ares_conn_err_t ares_socket_connect(ares_channel_t *channel,
+                                    ares_socket_t sockfd, ares_bool_t is_tfo,
+                                    const struct sockaddr *addr,
+                                    ares_socklen_t         addrlen)
+{
+  ares_conn_err_t err   = ARES_CONN_ERR_SUCCESS;
+  unsigned int    flags = 0;
+
+  if (is_tfo) {
+    flags |= ARES_SOCKET_CONN_TCP_FASTOPEN;
+  }
+
+  do {
+    int rv;
+
+    rv = channel->sock_funcs.aconnect(sockfd, addr, addrlen, flags,
+                                      channel->sock_func_cb_data);
+
+    if (rv < 0) {
+      err = ares_socket_deref_error(SOCKERRNO);
+    } else {
+      err = ARES_CONN_ERR_SUCCESS;
+    }
+  } while (err == ARES_CONN_ERR_INTERRUPT);
+
+  return err;
+}
+
+void ares_socket_close(ares_channel_t *channel, ares_socket_t s)
+{
+  if (channel == NULL || s == ARES_SOCKET_BAD) {
+    return;
+  }
+
+  channel->sock_funcs.aclose(s, channel->sock_func_cb_data);
+}
+
+void ares_set_socket_callback(ares_channel_t           *channel,
+                              ares_sock_create_callback cb, void *data)
+{
+  if (channel == NULL) {
+    return;
+  }
+  channel->sock_create_cb      = cb;
+  channel->sock_create_cb_data = data;
+}
+
+void ares_set_socket_configure_callback(ares_channel_t           *channel,
+                                        ares_sock_config_callback cb,
+                                        void                     *data)
+{
+  if (channel == NULL || channel->optmask & ARES_OPT_EVENT_THREAD) {
+    return;
+  }
+  channel->sock_config_cb      = cb;
+  channel->sock_config_cb_data = data;
+}
+
+void ares_set_pending_write_cb(ares_channel_t       *channel,
+                               ares_pending_write_cb callback, void *user_data)
+{
+  if (channel == NULL || channel->optmask & ARES_OPT_EVENT_THREAD) {
+    return;
+  }
+  channel->notify_pending_write_cb      = callback;
+  channel->notify_pending_write_cb_data = user_data;
+}
diff --git a/deps/cares/src/lib/ares_socket.h b/deps/cares/src/lib/ares_socket.h
new file mode 100644
index 00000000000000..24a99ab3316943
--- /dev/null
+++ b/deps/cares/src/lib/ares_socket.h
@@ -0,0 +1,163 @@
+/* MIT License
+ *
+ * Copyright (c) 2024 Brad House
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+
+#ifndef __ARES_SOCKET_H
+#define __ARES_SOCKET_H
+
+/* The SOCKERRNO / SET_SOCKERRNO() macros read / set the *socket-related* errno
+ * (or equivalent) on this platform, hiding platform details from callers.
+ */
+#ifdef USE_WINSOCK
+#  define SOCKERRNO        ((int)WSAGetLastError())
+#  define SET_SOCKERRNO(x) (WSASetLastError((int)(x)))
+#else
+#  define SOCKERRNO        (errno)
+#  define SET_SOCKERRNO(x) (errno = (x))
+#endif
+
+/* Portable symbolic error names, mapped to Winsock error codes. */
+#ifdef USE_WINSOCK
+#  undef EBADF           /* override definition in errno.h */
+#  define EBADF WSAEBADF
+#  undef EINTR           /* override definition in errno.h */
+#  define EINTR WSAEINTR
+#  undef EINVAL          /* override definition in errno.h */
+#  define EINVAL WSAEINVAL
+#  undef EWOULDBLOCK     /* override definition in errno.h */
+#  define EWOULDBLOCK WSAEWOULDBLOCK
+#  undef EINPROGRESS     /* override definition in errno.h */
+#  define EINPROGRESS WSAEINPROGRESS
+#  undef EALREADY        /* override definition in errno.h */
+#  define EALREADY WSAEALREADY
+#  undef ENOTSOCK        /* override definition in errno.h */
+#  define ENOTSOCK WSAENOTSOCK
+#  undef EDESTADDRREQ    /* override definition in errno.h */
+#  define EDESTADDRREQ WSAEDESTADDRREQ
+#  undef EMSGSIZE        /* override definition in errno.h */
+#  define EMSGSIZE WSAEMSGSIZE
+#  undef EPROTOTYPE      /* override definition in errno.h */
+#  define EPROTOTYPE WSAEPROTOTYPE
+#  undef ENOPROTOOPT     /* override definition in errno.h */
+#  define ENOPROTOOPT WSAENOPROTOOPT
+#  undef EPROTONOSUPPORT /* override definition in errno.h */
+#  define EPROTONOSUPPORT WSAEPROTONOSUPPORT
+#  define ESOCKTNOSUPPORT WSAESOCKTNOSUPPORT
+#  undef EOPNOTSUPP /* override definition in errno.h */
+#  define EOPNOTSUPP WSAEOPNOTSUPP
+#  undef ENOSYS     /* override definition in errno.h */
+#  define ENOSYS       WSAEOPNOTSUPP
+#  define EPFNOSUPPORT WSAEPFNOSUPPORT
+#  undef EAFNOSUPPORT  /* override definition in errno.h */
+#  define EAFNOSUPPORT WSAEAFNOSUPPORT
+#  undef EADDRINUSE    /* override definition in errno.h */
+#  define EADDRINUSE WSAEADDRINUSE
+#  undef EADDRNOTAVAIL /* override definition in errno.h */
+#  define EADDRNOTAVAIL WSAEADDRNOTAVAIL
+#  undef ENETDOWN      /* override definition in errno.h */
+#  define ENETDOWN WSAENETDOWN
+#  undef ENETUNREACH   /* override definition in errno.h */
+#  define ENETUNREACH WSAENETUNREACH
+#  undef ENETRESET     /* override definition in errno.h */
+#  define ENETRESET WSAENETRESET
+#  undef ECONNABORTED  /* override definition in errno.h */
+#  define ECONNABORTED WSAECONNABORTED
+#  undef ECONNRESET    /* override definition in errno.h */
+#  define ECONNRESET WSAECONNRESET
+#  undef ENOBUFS       /* override definition in errno.h */
+#  define ENOBUFS WSAENOBUFS
+#  undef EISCONN       /* override definition in errno.h */
+#  define EISCONN WSAEISCONN
+#  undef ENOTCONN      /* override definition in errno.h */
+#  define ENOTCONN     WSAENOTCONN
+#  define ESHUTDOWN    WSAESHUTDOWN
+#  define ETOOMANYREFS WSAETOOMANYREFS
+#  undef ETIMEDOUT     /* override definition in errno.h */
+#  define ETIMEDOUT WSAETIMEDOUT
+#  undef ECONNREFUSED  /* override definition in errno.h */
+#  define ECONNREFUSED WSAECONNREFUSED
+#  undef ELOOP         /* override definition in errno.h */
+#  define ELOOP WSAELOOP
+#  ifndef ENAMETOOLONG /* possible previous definition in errno.h */
+#    define ENAMETOOLONG WSAENAMETOOLONG
+#  endif
+#  define EHOSTDOWN WSAEHOSTDOWN
+#  undef EHOSTUNREACH /* override definition in errno.h */
+#  define EHOSTUNREACH WSAEHOSTUNREACH
+#  ifndef ENOTEMPTY   /* possible previous definition in errno.h */
+#    define ENOTEMPTY WSAENOTEMPTY
+#  endif
+#  define EPROCLIM WSAEPROCLIM
+#  define EUSERS   WSAEUSERS
+#  define EDQUOT   WSAEDQUOT
+#  define ESTALE   WSAESTALE
+#  define EREMOTE  WSAEREMOTE
+#endif
+
+/*! Socket errors */
+typedef enum {
+  ARES_CONN_ERR_SUCCESS      = 0,  /*!< Success */
+  ARES_CONN_ERR_WOULDBLOCK   = 1,  /*!< Operation would block */
+  ARES_CONN_ERR_CONNCLOSED   = 2,  /*!< Connection closed (gracefully) */
+  ARES_CONN_ERR_CONNABORTED  = 3,  /*!< Connection Aborted */
+  ARES_CONN_ERR_CONNRESET    = 4,  /*!< Connection Reset */
+  ARES_CONN_ERR_CONNREFUSED  = 5,  /*!< Connection Refused */
+  ARES_CONN_ERR_CONNTIMEDOUT = 6,  /*!< Connection Timed Out */
+  ARES_CONN_ERR_HOSTDOWN     = 7,  /*!< Host Down */
+  ARES_CONN_ERR_HOSTUNREACH  = 8,  /*!< Host Unreachable */
+  ARES_CONN_ERR_NETDOWN      = 9,  /*!< Network Down */
+  ARES_CONN_ERR_NETUNREACH   = 10, /*!< Network Unreachable */
+  ARES_CONN_ERR_INTERRUPT    = 11, /*!< Call interrupted by signal, repeat */
+  ARES_CONN_ERR_AFNOSUPPORT  = 12, /*!< Address family not supported */
+  ARES_CONN_ERR_BADADDR      = 13, /*!< Bad Address / Unavailable */
+  ARES_CONN_ERR_NOMEM        = 14, /*!< Out of memory */
+  ARES_CONN_ERR_INVALID      = 15, /*!< Invalid Usage */
+  ARES_CONN_ERR_TOOLARGE     = 16, /*!< Request size too large */
+  ARES_CONN_ERR_NOTIMP       = 17, /*!< Not implemented */
+  ARES_CONN_ERR_FAILURE      = 99  /*!< Generic failure */
+} ares_conn_err_t;
+
+ares_bool_t     ares_sockaddr_addr_eq(const struct sockaddr  *sa,
+                                      const struct ares_addr *aa);
+ares_status_t   ares_socket_configure(ares_channel_t *channel, int family,
+                                      ares_bool_t is_tcp, ares_socket_t fd);
+ares_conn_err_t ares_socket_enable_tfo(const ares_channel_t *channel,
+                                       ares_socket_t         fd);
+ares_conn_err_t ares_socket_open(ares_socket_t *sock, ares_channel_t *channel,
+                                 int af, int type, int protocol);
+ares_bool_t     ares_socket_try_again(int errnum);
+void            ares_socket_close(ares_channel_t *channel, ares_socket_t s);
+ares_conn_err_t ares_socket_connect(ares_channel_t *channel,
+                                    ares_socket_t sockfd, ares_bool_t is_tfo,
+                                    const struct sockaddr *addr,
+                                    ares_socklen_t         addrlen);
+ares_bool_t     ares_sockaddr_to_ares_addr(struct ares_addr      *ares_addr,
+                                           unsigned short        *port,
+                                           const struct sockaddr *sockaddr);
+ares_conn_err_t ares_socket_write(ares_channel_t *channel, ares_socket_t fd,
+                                  const void *data, size_t len, size_t *written,
+                                  const struct sockaddr *sa,
+                                  ares_socklen_t         salen);
+#endif
diff --git a/deps/cares/src/lib/ares__sortaddrinfo.c b/deps/cares/src/lib/ares_sortaddrinfo.c
similarity index 94%
rename from deps/cares/src/lib/ares__sortaddrinfo.c
rename to deps/cares/src/lib/ares_sortaddrinfo.c
index 1aab81ecf84ce7..e6c21ea0ad712a 100644
--- a/deps/cares/src/lib/ares__sortaddrinfo.c
+++ b/deps/cares/src/lib/ares_sortaddrinfo.c
@@ -345,8 +345,9 @@ static int rfc6724_compare(const void *ptr1, const void *ptr2)
 static int find_src_addr(ares_channel_t *channel, const struct sockaddr *addr,
                          struct sockaddr *src_addr)
 {
-  ares_socket_t  sock;
-  ares_socklen_t len;
+  ares_socket_t   sock;
+  ares_socklen_t  len;
+  ares_conn_err_t err;
 
   switch (addr->sa_family) {
     case AF_INET:
@@ -360,25 +361,27 @@ static int find_src_addr(ares_channel_t *channel, const struct sockaddr *addr,
       return 0;
   }
 
-  sock = ares__open_socket(channel, addr->sa_family, SOCK_DGRAM, IPPROTO_UDP);
-  if (sock == ARES_SOCKET_BAD) {
-    if (SOCKERRNO == EAFNOSUPPORT) {
-      return 0;
-    } else {
-      return -1;
-    }
+  err =
+    ares_socket_open(&sock, channel, addr->sa_family, SOCK_DGRAM, IPPROTO_UDP);
+  if (err == ARES_CONN_ERR_AFNOSUPPORT) {
+    return 0;
+  } else if (err != ARES_CONN_ERR_SUCCESS) {
+    return -1;
   }
 
-  if (ares__connect_socket(channel, sock, addr, len) != ARES_SUCCESS) {
-    ares__close_socket(channel, sock);
+  err = ares_socket_connect(channel, sock, ARES_FALSE, addr, len);
+  if (err != ARES_CONN_ERR_SUCCESS && err != ARES_CONN_ERR_WOULDBLOCK) {
+    ares_socket_close(channel, sock);
     return 0;
   }
 
-  if (getsockname(sock, src_addr, &len) != 0) {
-    ares__close_socket(channel, sock);
+  if (channel->sock_funcs.agetsockname == NULL ||
+      channel->sock_funcs.agetsockname(sock, src_addr, &len,
+                                       channel->sock_func_cb_data) != 0) {
+    ares_socket_close(channel, sock);
     return -1;
   }
-  ares__close_socket(channel, sock);
+  ares_socket_close(channel, sock);
   return 1;
 }
 
@@ -386,8 +389,8 @@ static int find_src_addr(ares_channel_t *channel, const struct sockaddr *addr,
  * Sort the linked list starting at sentinel->ai_next in RFC6724 order.
  * Will leave the list unchanged if an error occurs.
  */
-ares_status_t ares__sortaddrinfo(ares_channel_t            *channel,
-                                 struct ares_addrinfo_node *list_sentinel)
+ares_status_t ares_sortaddrinfo(ares_channel_t            *channel,
+                                struct ares_addrinfo_node *list_sentinel)
 {
   struct ares_addrinfo_node *cur;
   size_t                     nelem = 0;
diff --git a/deps/cares/src/lib/ares_sysconfig.c b/deps/cares/src/lib/ares_sysconfig.c
index 61e6a423a7578a..9f0d7e5061ffe0 100644
--- a/deps/cares/src/lib/ares_sysconfig.c
+++ b/deps/cares/src/lib/ares_sysconfig.c
@@ -56,11 +56,11 @@
 #endif
 
 #include "ares_inet_net_pton.h"
-#include "ares_platform.h"
 
 
 #if defined(__MVS__)
-static ares_status_t ares__init_sysconfig_mvs(ares_sysconfig_t *sysconfig)
+static ares_status_t ares_init_sysconfig_mvs(const ares_channel_t *channel,
+                                             ares_sysconfig_t     *sysconfig)
 {
   struct __res_state *res = 0;
   size_t              count4;
@@ -99,9 +99,9 @@ static ares_status_t ares__init_sysconfig_mvs(ares_sysconfig_t *sysconfig)
     addr.addr.addr4.s_addr = addr_in->sin_addr.s_addr;
     addr.family            = AF_INET;
 
-    status =
-      ares__sconfig_append(&sysconfig->sconfig, &addr, htons(addr_in->sin_port),
-                           htons(addr_in->sin_port), NULL);
+    status = ares_sconfig_append(channel, &sysconfig->sconfig, &addr,
+                                 htons(addr_in->sin_port),
+                                 htons(addr_in->sin_port), NULL);
 
     if (status != ARES_SUCCESS) {
       return status;
@@ -116,9 +116,9 @@ static ares_status_t ares__init_sysconfig_mvs(ares_sysconfig_t *sysconfig)
     memcpy(&(addr.addr.addr6), &(addr_in->sin6_addr),
            sizeof(addr_in->sin6_addr));
 
-    status =
-      ares__sconfig_append(&sysconfig->sconfig, &addr, htons(addr_in->sin_port),
-                           htons(addr_in->sin_port), NULL);
+    status = ares_sconfig_append(channel, &sysconfig->sconfig, &addr,
+                                 htons(addr_in->sin_port),
+                                 htons(addr_in->sin_port), NULL);
 
     if (status != ARES_SUCCESS) {
       return status;
@@ -130,7 +130,8 @@ static ares_status_t ares__init_sysconfig_mvs(ares_sysconfig_t *sysconfig)
 #endif
 
 #if defined(__riscos__)
-static ares_status_t ares__init_sysconfig_riscos(ares_sysconfig_t *sysconfig)
+static ares_status_t ares_init_sysconfig_riscos(const ares_channel_t *channel,
+                                                ares_sysconfig_t     *sysconfig)
 {
   char         *line;
   ares_status_t status = ARES_SUCCESS;
@@ -153,8 +154,8 @@ static ares_status_t ares__init_sysconfig_riscos(ares_sysconfig_t *sysconfig)
       if (space) {
         *space = '\0';
       }
-      status =
-        ares__sconfig_append_fromstr(&sysconfig->sconfig, pos, ARES_TRUE);
+      status = ares_sconfig_append_fromstr(channel, &sysconfig->sconfig, pos,
+                                           ARES_TRUE);
       if (status != ARES_SUCCESS) {
         break;
       }
@@ -169,7 +170,8 @@ static ares_status_t ares__init_sysconfig_riscos(ares_sysconfig_t *sysconfig)
 #endif
 
 #if defined(WATT32)
-static ares_status_t ares__init_sysconfig_watt32(ares_sysconfig_t *sysconfig)
+static ares_status_t ares_init_sysconfig_watt32(const ares_channel_t *channel,
+                                                ares_sysconfig_t     *sysconfig)
 {
   size_t        i;
   ares_status_t status;
@@ -182,7 +184,8 @@ static ares_status_t ares__init_sysconfig_watt32(ares_sysconfig_t *sysconfig)
     addr.family            = AF_INET;
     addr.addr.addr4.s_addr = htonl(def_nameservers[i]);
 
-    status = ares__sconfig_append(&sysconfig->sconfig, &addr, 0, 0, NULL);
+    status =
+      ares_sconfig_append(channel, &sysconfig->sconfig, &addr, 0, 0, NULL);
 
     if (status != ARES_SUCCESS) {
       return status;
@@ -194,7 +197,8 @@ static ares_status_t ares__init_sysconfig_watt32(ares_sysconfig_t *sysconfig)
 #endif
 
 #if defined(ANDROID) || defined(__ANDROID__)
-static ares_status_t ares__init_sysconfig_android(ares_sysconfig_t *sysconfig)
+static ares_status_t ares_init_sysconfig_android(const ares_channel_t *channel,
+                                                 ares_sysconfig_t *sysconfig)
 {
   size_t        i;
   char        **dns_servers;
@@ -211,8 +215,8 @@ static ares_status_t ares__init_sysconfig_android(ares_sysconfig_t *sysconfig)
   dns_servers = ares_get_android_server_list(MAX_DNS_PROPERTIES, &num_servers);
   if (dns_servers != NULL) {
     for (i = 0; i < num_servers; i++) {
-      status = ares__sconfig_append_fromstr(&sysconfig->sconfig, dns_servers[i],
-                                            ARES_TRUE);
+      status = ares_sconfig_append_fromstr(channel, &sysconfig->sconfig,
+                                           dns_servers[i], ARES_TRUE);
       if (status != ARES_SUCCESS) {
         return status;
       }
@@ -224,7 +228,7 @@ static ares_status_t ares__init_sysconfig_android(ares_sysconfig_t *sysconfig)
   }
 
   domains            = ares_get_android_search_domains_list();
-  sysconfig->domains = ares__strsplit(domains, ", ", &sysconfig->ndomains);
+  sysconfig->domains = ares_strsplit(domains, ", ", &sysconfig->ndomains);
   ares_free(domains);
 
 #  ifdef HAVE___SYSTEM_PROPERTY_GET
@@ -243,8 +247,8 @@ static ares_status_t ares__init_sysconfig_android(ares_sysconfig_t *sysconfig)
       if (__system_property_get(propname, propvalue) < 1) {
         break;
       }
-      status =
-        ares__sconfig_append_fromstr(&sysconfig->sconfig, propvalue, ARES_TRUE);
+      status = ares_sconfig_append_fromstr(channel, &sysconfig->sconfig,
+                                           propvalue, ARES_TRUE);
       if (status != ARES_SUCCESS) {
         return status;
       }
@@ -257,7 +261,9 @@ static ares_status_t ares__init_sysconfig_android(ares_sysconfig_t *sysconfig)
 #endif
 
 #if defined(CARES_USE_LIBRESOLV)
-static ares_status_t ares__init_sysconfig_libresolv(ares_sysconfig_t *sysconfig)
+static ares_status_t
+  ares_init_sysconfig_libresolv(const ares_channel_t *channel,
+                                ares_sysconfig_t     *sysconfig)
 {
   struct __res_state       res;
   ares_status_t            status = ARES_SUCCESS;
@@ -265,7 +271,7 @@ static ares_status_t ares__init_sysconfig_libresolv(ares_sysconfig_t *sysconfig)
   int                      nscount;
   size_t                   i;
   size_t                   entries = 0;
-  ares__buf_t             *ipbuf   = NULL;
+  ares_buf_t              *ipbuf   = NULL;
 
   memset(&res, 0, sizeof(res));
 
@@ -295,58 +301,58 @@ static ares_status_t ares__init_sysconfig_libresolv(ares_sysconfig_t *sysconfig)
 
 
     /* [ip]:port%iface */
-    ipbuf = ares__buf_create();
+    ipbuf = ares_buf_create();
     if (ipbuf == NULL) {
       status = ARES_ENOMEM;
       goto done;
     }
 
-    status = ares__buf_append_str(ipbuf, "[");
+    status = ares_buf_append_str(ipbuf, "[");
     if (status != ARES_SUCCESS) {
       goto done;
     }
 
-    status = ares__buf_append_str(ipbuf, ipaddr);
+    status = ares_buf_append_str(ipbuf, ipaddr);
     if (status != ARES_SUCCESS) {
       goto done;
     }
 
-    status = ares__buf_append_str(ipbuf, "]");
+    status = ares_buf_append_str(ipbuf, "]");
     if (status != ARES_SUCCESS) {
       goto done;
     }
 
     if (port) {
-      status = ares__buf_append_str(ipbuf, ":");
+      status = ares_buf_append_str(ipbuf, ":");
       if (status != ARES_SUCCESS) {
         goto done;
       }
-      status = ares__buf_append_num_dec(ipbuf, port, 0);
+      status = ares_buf_append_num_dec(ipbuf, port, 0);
       if (status != ARES_SUCCESS) {
         goto done;
       }
     }
 
     if (ll_scope) {
-      status = ares__buf_append_str(ipbuf, "%");
+      status = ares_buf_append_str(ipbuf, "%");
       if (status != ARES_SUCCESS) {
         goto done;
       }
-      status = ares__buf_append_num_dec(ipbuf, ll_scope, 0);
+      status = ares_buf_append_num_dec(ipbuf, ll_scope, 0);
       if (status != ARES_SUCCESS) {
         goto done;
       }
     }
 
-    ipstr = ares__buf_finish_str(ipbuf, NULL);
+    ipstr = ares_buf_finish_str(ipbuf, NULL);
     ipbuf = NULL;
     if (ipstr == NULL) {
       status = ARES_ENOMEM;
       goto done;
     }
 
-    status =
-      ares__sconfig_append_fromstr(&sysconfig->sconfig, ipstr, ARES_TRUE);
+    status = ares_sconfig_append_fromstr(channel, &sysconfig->sconfig, ipstr,
+                                         ARES_TRUE);
 
     ares_free(ipstr);
     if (status != ARES_SUCCESS) {
@@ -400,7 +406,7 @@ static ares_status_t ares__init_sysconfig_libresolv(ares_sysconfig_t *sysconfig)
   }
 
 done:
-  ares__buf_destroy(ipbuf);
+  ares_buf_destroy(ipbuf);
   res_ndestroy(&res);
   return status;
 }
@@ -408,8 +414,8 @@ static ares_status_t ares__init_sysconfig_libresolv(ares_sysconfig_t *sysconfig)
 
 static void ares_sysconfig_free(ares_sysconfig_t *sysconfig)
 {
-  ares__llist_destroy(sysconfig->sconfig);
-  ares__strsplit_free(sysconfig->domains, sysconfig->ndomains);
+  ares_llist_destroy(sysconfig->sconfig);
+  ares_strsplit_free(sysconfig->domains, sysconfig->ndomains);
   ares_free(sysconfig->sortlist);
   ares_free(sysconfig->lookups);
   memset(sysconfig, 0, sizeof(*sysconfig));
@@ -421,7 +427,7 @@ static ares_status_t ares_sysconfig_apply(ares_channel_t         *channel,
   ares_status_t status;
 
   if (sysconfig->sconfig && !(channel->optmask & ARES_OPT_SERVERS)) {
-    status = ares__servers_update(channel, sysconfig->sconfig, ARES_FALSE);
+    status = ares_servers_update(channel, sysconfig->sconfig, ARES_FALSE);
     if (status != ARES_SUCCESS) {
       return status;
     }
@@ -431,12 +437,12 @@ static ares_status_t ares_sysconfig_apply(ares_channel_t         *channel,
     /* Make sure we duplicate first then replace so even if there is
      * ARES_ENOMEM, the channel stays in a good state */
     char **temp =
-      ares__strsplit_duplicate(sysconfig->domains, sysconfig->ndomains);
+      ares_strsplit_duplicate(sysconfig->domains, sysconfig->ndomains);
     if (temp == NULL) {
       return ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
-    ares__strsplit_free(channel->domains, channel->ndomains);
+    ares_strsplit_free(channel->domains, channel->ndomains);
     channel->domains  = temp;
     channel->ndomains = sysconfig->ndomains;
   }
@@ -488,7 +494,7 @@ static ares_status_t ares_sysconfig_apply(ares_channel_t         *channel,
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__init_by_sysconfig(ares_channel_t *channel)
+ares_status_t ares_init_by_sysconfig(ares_channel_t *channel)
 {
   ares_status_t    status;
   ares_sysconfig_t sysconfig;
@@ -497,21 +503,21 @@ ares_status_t ares__init_by_sysconfig(ares_channel_t *channel)
   sysconfig.ndots = 1; /* Default value if not otherwise set */
 
 #if defined(USE_WINSOCK)
-  status = ares__init_sysconfig_windows(&sysconfig);
+  status = ares_init_sysconfig_windows(channel, &sysconfig);
 #elif defined(__MVS__)
-  status = ares__init_sysconfig_mvs(&sysconfig);
+  status = ares_init_sysconfig_mvs(channel, &sysconfig);
 #elif defined(__riscos__)
-  status = ares__init_sysconfig_riscos(&sysconfig);
+  status = ares_init_sysconfig_riscos(channel, &sysconfig);
 #elif defined(WATT32)
-  status = ares__init_sysconfig_watt32(&sysconfig);
+  status = ares_init_sysconfig_watt32(channel, &sysconfig);
 #elif defined(ANDROID) || defined(__ANDROID__)
-  status = ares__init_sysconfig_android(&sysconfig);
+  status = ares_init_sysconfig_android(channel, &sysconfig);
 #elif defined(__APPLE__)
-  status = ares__init_sysconfig_macos(&sysconfig);
+  status = ares_init_sysconfig_macos(channel, &sysconfig);
 #elif defined(CARES_USE_LIBRESOLV)
-  status = ares__init_sysconfig_libresolv(&sysconfig);
+  status = ares_init_sysconfig_libresolv(channel, &sysconfig);
 #else
-  status = ares__init_sysconfig_files(channel, &sysconfig);
+  status = ares_init_sysconfig_files(channel, &sysconfig);
 #endif
 
   if (status != ARES_SUCCESS) {
@@ -519,7 +525,7 @@ ares_status_t ares__init_by_sysconfig(ares_channel_t *channel)
   }
 
   /* Environment is supposed to override sysconfig */
-  status = ares__init_by_environment(&sysconfig);
+  status = ares_init_by_environment(&sysconfig);
   if (status != ARES_SUCCESS) {
     goto done;
   }
@@ -527,10 +533,10 @@ ares_status_t ares__init_by_sysconfig(ares_channel_t *channel)
   /* Lock when applying the configuration to the channel.  Don't need to
    * lock prior to this. */
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
   status = ares_sysconfig_apply(channel, &sysconfig);
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 
   if (status != ARES_SUCCESS) {
     goto done;
diff --git a/deps/cares/src/lib/ares_sysconfig_files.c b/deps/cares/src/lib/ares_sysconfig_files.c
index 7b8bdbe41879d6..49bc330d9d346d 100644
--- a/deps/cares/src/lib/ares_sysconfig_files.c
+++ b/deps/cares/src/lib/ares_sysconfig_files.c
@@ -60,7 +60,6 @@
 #endif
 
 #include "ares_inet_net_pton.h"
-#include "ares_platform.h"
 
 static unsigned char ip_natural_mask(const struct ares_addr *addr)
 {
@@ -110,7 +109,7 @@ static ares_bool_t sortlist_append(struct apattern **sortlist, size_t *nsort,
   return ARES_TRUE;
 }
 
-static ares_status_t parse_sort(ares__buf_t *buf, struct apattern *pat)
+static ares_status_t parse_sort(ares_buf_t *buf, struct apattern *pat)
 {
   ares_status_t       status;
   const unsigned char ip_charset[]             = "ABCDEFabcdef0123456789.:";
@@ -120,22 +119,22 @@ static ares_status_t parse_sort(ares__buf_t *buf, struct apattern *pat)
   memset(pat, 0, sizeof(*pat));
 
   /* Consume any leading whitespace */
-  ares__buf_consume_whitespace(buf, ARES_TRUE);
+  ares_buf_consume_whitespace(buf, ARES_TRUE);
 
   /* If no length, just ignore, return ENOTFOUND as an indicator */
-  if (ares__buf_len(buf) == 0) {
+  if (ares_buf_len(buf) == 0) {
     return ARES_ENOTFOUND;
   }
 
-  ares__buf_tag(buf);
+  ares_buf_tag(buf);
 
   /* Consume ip address */
-  if (ares__buf_consume_charset(buf, ip_charset, sizeof(ip_charset) - 1) == 0) {
+  if (ares_buf_consume_charset(buf, ip_charset, sizeof(ip_charset) - 1) == 0) {
     return ARES_EBADSTR;
   }
 
   /* Fetch ip address */
-  status = ares__buf_tag_fetch_string(buf, ipaddr, sizeof(ipaddr));
+  status = ares_buf_tag_fetch_string(buf, ipaddr, sizeof(ipaddr));
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -147,24 +146,24 @@ static ares_status_t parse_sort(ares__buf_t *buf, struct apattern *pat)
   }
 
   /* See if there is a subnet mask */
-  if (ares__buf_begins_with(buf, (const unsigned char *)"/", 1)) {
+  if (ares_buf_begins_with(buf, (const unsigned char *)"/", 1)) {
     char                maskstr[16];
     const unsigned char ipv4_charset[] = "0123456789.";
 
 
     /* Consume / */
-    ares__buf_consume(buf, 1);
+    ares_buf_consume(buf, 1);
 
-    ares__buf_tag(buf);
+    ares_buf_tag(buf);
 
     /* Consume mask */
-    if (ares__buf_consume_charset(buf, ipv4_charset,
-                                  sizeof(ipv4_charset) - 1) == 0) {
+    if (ares_buf_consume_charset(buf, ipv4_charset, sizeof(ipv4_charset) - 1) ==
+        0) {
       return ARES_EBADSTR;
     }
 
     /* Fetch mask */
-    status = ares__buf_tag_fetch_string(buf, maskstr, sizeof(maskstr));
+    status = ares_buf_tag_fetch_string(buf, maskstr, sizeof(maskstr));
     if (status != ARES_SUCCESS) {
       return status;
     }
@@ -190,33 +189,34 @@ static ares_status_t parse_sort(ares__buf_t *buf, struct apattern *pat)
         return ARES_EBADSTR;
       }
       ptr       = (const unsigned char *)&maskaddr.addr.addr4;
-      pat->mask = (unsigned char)(ares__count_bits_u8(ptr[0]) +
-                                  ares__count_bits_u8(ptr[1]) +
-                                  ares__count_bits_u8(ptr[2]) +
-                                  ares__count_bits_u8(ptr[3]));
+      pat->mask = (unsigned char)(ares_count_bits_u8(ptr[0]) +
+                                  ares_count_bits_u8(ptr[1]) +
+                                  ares_count_bits_u8(ptr[2]) +
+                                  ares_count_bits_u8(ptr[3]));
     }
   } else {
     pat->mask = ip_natural_mask(&pat->addr);
   }
 
   /* Consume any trailing whitespace */
-  ares__buf_consume_whitespace(buf, ARES_TRUE);
+  ares_buf_consume_whitespace(buf, ARES_TRUE);
 
   /* If we have any trailing bytes other than whitespace, its a parse failure */
-  if (ares__buf_len(buf) != 0) {
+  if (ares_buf_len(buf) != 0) {
     return ARES_EBADSTR;
   }
 
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__parse_sortlist(struct apattern **sortlist, size_t *nsort,
-                                   const char *str)
+ares_status_t ares_parse_sortlist(struct apattern **sortlist, size_t *nsort,
+                                  const char *str)
 {
-  ares__buf_t        *buf    = NULL;
-  ares__llist_t      *list   = NULL;
-  ares_status_t       status = ARES_SUCCESS;
-  ares__llist_node_t *node   = NULL;
+  ares_buf_t   *buf    = NULL;
+  ares_status_t status = ARES_SUCCESS;
+  ares_array_t *arr    = NULL;
+  size_t        num    = 0;
+  size_t        i;
 
   if (sortlist == NULL || nsort == NULL || str == NULL) {
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
@@ -229,22 +229,23 @@ ares_status_t ares__parse_sortlist(struct apattern **sortlist, size_t *nsort,
   *sortlist = NULL;
   *nsort    = 0;
 
-  buf = ares__buf_create_const((const unsigned char *)str, ares_strlen(str));
+  buf = ares_buf_create_const((const unsigned char *)str, ares_strlen(str));
   if (buf == NULL) {
     status = ARES_ENOMEM;
     goto done;
   }
 
   /* Split on space or semicolon */
-  status = ares__buf_split(buf, (const unsigned char *)" ;", 2,
-                           ARES_BUF_SPLIT_NONE, 0, &list);
+  status = ares_buf_split(buf, (const unsigned char *)" ;", 2,
+                          ARES_BUF_SPLIT_NONE, 0, &arr);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  for (node = ares__llist_node_first(list); node != NULL;
-       node = ares__llist_node_next(node)) {
-    ares__buf_t    *entry = ares__llist_node_val(node);
+  num = ares_array_len(arr);
+  for (i = 0; i < num; i++) {
+    ares_buf_t    **bufptr = ares_array_at(arr, i);
+    ares_buf_t     *entry  = *bufptr;
 
     struct apattern pat;
 
@@ -266,8 +267,8 @@ ares_status_t ares__parse_sortlist(struct apattern **sortlist, size_t *nsort,
   status = ARES_SUCCESS;
 
 done:
-  ares__buf_destroy(buf);
-  ares__llist_destroy(list);
+  ares_buf_destroy(buf);
+  ares_array_destroy(arr);
 
   if (status != ARES_SUCCESS) {
     ares_free(*sortlist);
@@ -283,12 +284,12 @@ static ares_status_t config_search(ares_sysconfig_t *sysconfig, const char *str,
 {
   if (sysconfig->domains && sysconfig->ndomains > 0) {
     /* if we already have some domains present, free them first */
-    ares__strsplit_free(sysconfig->domains, sysconfig->ndomains);
+    ares_strsplit_free(sysconfig->domains, sysconfig->ndomains);
     sysconfig->domains  = NULL;
     sysconfig->ndomains = 0;
   }
 
-  sysconfig->domains = ares__strsplit(str, ", ", &sysconfig->ndomains);
+  sysconfig->domains = ares_strsplit(str, ", ", &sysconfig->ndomains);
   if (sysconfig->domains == NULL) {
     return ARES_ENOMEM;
   }
@@ -306,52 +307,45 @@ static ares_status_t config_search(ares_sysconfig_t *sysconfig, const char *str,
   return ARES_SUCCESS;
 }
 
-static ares_status_t buf_fetch_string(ares__buf_t *buf, char *str,
+static ares_status_t buf_fetch_string(ares_buf_t *buf, char *str,
                                       size_t str_len)
 {
   ares_status_t status;
-  ares__buf_tag(buf);
-  ares__buf_consume(buf, ares__buf_len(buf));
+  ares_buf_tag(buf);
+  ares_buf_consume(buf, ares_buf_len(buf));
 
-  status = ares__buf_tag_fetch_string(buf, str, str_len);
+  status = ares_buf_tag_fetch_string(buf, str, str_len);
   return status;
 }
 
-static ares_status_t config_lookup(ares_sysconfig_t *sysconfig,
-                                   ares__buf_t *buf, const char *separators)
+static ares_status_t config_lookup(ares_sysconfig_t *sysconfig, ares_buf_t *buf,
+                                   const char *separators)
 {
-  ares_status_t       status;
-  char                lookupstr[32];
-  size_t              lookupstr_cnt = 0;
-  ares__llist_t      *lookups       = NULL;
-  ares__llist_node_t *node;
-  size_t              separators_len = ares_strlen(separators);
-
-  status = ares__buf_split(buf, (const unsigned char *)separators,
-                           separators_len, ARES_BUF_SPLIT_TRIM, 0, &lookups);
+  ares_status_t status;
+  char          lookupstr[32];
+  size_t        lookupstr_cnt = 0;
+  char        **lookups       = NULL;
+  size_t        num           = 0;
+  size_t        i;
+  size_t        separators_len = ares_strlen(separators);
+
+  status =
+    ares_buf_split_str(buf, (const unsigned char *)separators, separators_len,
+                       ARES_BUF_SPLIT_TRIM, 0, &lookups, &num);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  memset(lookupstr, 0, sizeof(lookupstr));
-
-  for (node = ares__llist_node_first(lookups); node != NULL;
-       node = ares__llist_node_next(node)) {
-    char         value[128];
-    char         ch;
-    ares__buf_t *valbuf = ares__llist_node_val(node);
-
-    status = buf_fetch_string(valbuf, value, sizeof(value));
-    if (status != ARES_SUCCESS) {
-      continue;
-    }
+  for (i = 0; i < num; i++) {
+    const char *value = lookups[i];
+    char        ch;
 
-    if (strcasecmp(value, "dns") == 0 || strcasecmp(value, "bind") == 0 ||
-        strcasecmp(value, "resolv") == 0 || strcasecmp(value, "resolve") == 0) {
+    if (ares_strcaseeq(value, "dns") || ares_strcaseeq(value, "bind") ||
+        ares_strcaseeq(value, "resolv") || ares_strcaseeq(value, "resolve")) {
       ch = 'b';
-    } else if (strcasecmp(value, "files") == 0 ||
-               strcasecmp(value, "file") == 0 ||
-               strcasecmp(value, "local") == 0) {
+    } else if (ares_strcaseeq(value, "files") ||
+               ares_strcaseeq(value, "file") ||
+               ares_strcaseeq(value, "local")) {
       ch = 'f';
     } else {
       continue;
@@ -364,10 +358,12 @@ static ares_status_t config_lookup(ares_sysconfig_t *sysconfig,
   }
 
   if (lookupstr_cnt) {
+    lookupstr[lookupstr_cnt] = 0;
     ares_free(sysconfig->lookups);
     sysconfig->lookups = ares_strdup(lookupstr);
     if (sysconfig->lookups == NULL) {
-      return ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
+      status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
+      goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
     }
   }
 
@@ -377,83 +373,85 @@ static ares_status_t config_lookup(ares_sysconfig_t *sysconfig,
   if (status != ARES_ENOMEM) {
     status = ARES_SUCCESS;
   }
-  ares__llist_destroy(lookups);
+  ares_free_array(lookups, num, ares_free);
   return status;
 }
 
 static ares_status_t process_option(ares_sysconfig_t *sysconfig,
-                                    ares__buf_t      *option)
+                                    ares_buf_t       *option)
 {
-  ares__llist_t *kv      = NULL;
-  char           key[32] = "";
-  char           val[32] = "";
-  unsigned int   valint  = 0;
-  ares_status_t  status;
+  char        **kv  = NULL;
+  size_t        num = 0;
+  const char   *key;
+  const char   *val;
+  unsigned int  valint = 0;
+  ares_status_t status;
 
   /* Split on : */
-  status = ares__buf_split(option, (const unsigned char *)":", 1,
-                           ARES_BUF_SPLIT_TRIM, 2, &kv);
+  status = ares_buf_split_str(option, (const unsigned char *)":", 1,
+                              ARES_BUF_SPLIT_TRIM, 2, &kv, &num);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  status = buf_fetch_string(ares__llist_first_val(kv), key, sizeof(key));
-  if (status != ARES_SUCCESS) {
+  if (num < 1) {
+    status = ARES_EBADSTR;
     goto done;
   }
-  if (ares__llist_len(kv) == 2) {
-    status = buf_fetch_string(ares__llist_last_val(kv), val, sizeof(val));
-    if (status != ARES_SUCCESS) {
-      goto done;
-    }
+
+  key = kv[0];
+  if (num == 2) {
+    val    = kv[1];
     valint = (unsigned int)strtoul(val, NULL, 10);
   }
 
-  if (strcmp(key, "ndots") == 0) {
+  if (ares_streq(key, "ndots")) {
     sysconfig->ndots = valint;
-  } else if (strcmp(key, "retrans") == 0 || strcmp(key, "timeout") == 0) {
+  } else if (ares_streq(key, "retrans") || ares_streq(key, "timeout")) {
     if (valint == 0) {
       return ARES_EFORMERR;
     }
     sysconfig->timeout_ms = valint * 1000;
-  } else if (strcmp(key, "retry") == 0 || strcmp(key, "attempts") == 0) {
+  } else if (ares_streq(key, "retry") || ares_streq(key, "attempts")) {
     if (valint == 0) {
       return ARES_EFORMERR;
     }
     sysconfig->tries = valint;
-  } else if (strcmp(key, "rotate") == 0) {
+  } else if (ares_streq(key, "rotate")) {
     sysconfig->rotate = ARES_TRUE;
-  } else if (strcmp(key, "use-vc") == 0 || strcmp(key, "usevc") == 0) {
+  } else if (ares_streq(key, "use-vc") || ares_streq(key, "usevc")) {
     sysconfig->usevc = ARES_TRUE;
   }
 
 done:
-  ares__llist_destroy(kv);
+  ares_free_array(kv, num, ares_free);
   return status;
 }
 
-ares_status_t ares__sysconfig_set_options(ares_sysconfig_t *sysconfig,
-                                          const char       *str)
+ares_status_t ares_sysconfig_set_options(ares_sysconfig_t *sysconfig,
+                                         const char       *str)
 {
-  ares__buf_t        *buf     = NULL;
-  ares__llist_t      *options = NULL;
-  ares_status_t       status;
-  ares__llist_node_t *node;
+  ares_buf_t   *buf     = NULL;
+  ares_array_t *options = NULL;
+  size_t        num;
+  size_t        i;
+  ares_status_t status;
 
-  buf = ares__buf_create_const((const unsigned char *)str, ares_strlen(str));
+  buf = ares_buf_create_const((const unsigned char *)str, ares_strlen(str));
   if (buf == NULL) {
     return ARES_ENOMEM;
   }
 
-  status = ares__buf_split(buf, (const unsigned char *)" \t", 2,
-                           ARES_BUF_SPLIT_TRIM, 0, &options);
+  status = ares_buf_split(buf, (const unsigned char *)" \t", 2,
+                          ARES_BUF_SPLIT_TRIM, 0, &options);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  for (node = ares__llist_node_first(options); node != NULL;
-       node = ares__llist_node_next(node)) {
-    ares__buf_t *valbuf = ares__llist_node_val(node);
+  num = ares_array_len(options);
+  for (i = 0; i < num; i++) {
+    ares_buf_t **bufptr = ares_array_at(options, i);
+    ares_buf_t  *valbuf = *bufptr;
 
     status = process_option(sysconfig, valbuf);
     /* Out of memory is the only fatal condition */
@@ -465,12 +463,12 @@ ares_status_t ares__sysconfig_set_options(ares_sysconfig_t *sysconfig,
   status = ARES_SUCCESS;
 
 done:
-  ares__llist_destroy(options);
-  ares__buf_destroy(buf);
+  ares_array_destroy(options);
+  ares_buf_destroy(buf);
   return status;
 }
 
-ares_status_t ares__init_by_environment(ares_sysconfig_t *sysconfig)
+ares_status_t ares_init_by_environment(ares_sysconfig_t *sysconfig)
 {
   const char   *localdomain;
   const char   *res_options;
@@ -491,7 +489,7 @@ ares_status_t ares__init_by_environment(ares_sysconfig_t *sysconfig)
 
   res_options = getenv("RES_OPTIONS");
   if (res_options) {
-    status = ares__sysconfig_set_options(sysconfig, res_options);
+    status = ares_sysconfig_set_options(sysconfig, res_options);
     if (status != ARES_SUCCESS) {
       return status;
     }
@@ -551,70 +549,71 @@ ares_status_t ares__init_by_environment(ares_sysconfig_t *sysconfig)
 /* This function will only return ARES_SUCCESS or ARES_ENOMEM.  Any other
  * conditions are ignored.  Users may mess up config files, but we want to
  * process anything we can. */
-static ares_status_t parse_resolvconf_line(ares_sysconfig_t *sysconfig,
-                                           ares__buf_t      *line)
+static ares_status_t parse_resolvconf_line(const ares_channel_t *channel,
+                                           ares_sysconfig_t     *sysconfig,
+                                           ares_buf_t           *line)
 {
   char          option[32];
   char          value[512];
   ares_status_t status = ARES_SUCCESS;
 
   /* Ignore lines beginning with a comment */
-  if (ares__buf_begins_with(line, (const unsigned char *)"#", 1) ||
-      ares__buf_begins_with(line, (const unsigned char *)";", 1)) {
+  if (ares_buf_begins_with(line, (const unsigned char *)"#", 1) ||
+      ares_buf_begins_with(line, (const unsigned char *)";", 1)) {
     return ARES_SUCCESS;
   }
 
-  ares__buf_tag(line);
+  ares_buf_tag(line);
 
   /* Shouldn't be possible, but if it happens, ignore the line. */
-  if (ares__buf_consume_nonwhitespace(line) == 0) {
+  if (ares_buf_consume_nonwhitespace(line) == 0) {
     return ARES_SUCCESS;
   }
 
-  status = ares__buf_tag_fetch_string(line, option, sizeof(option));
+  status = ares_buf_tag_fetch_string(line, option, sizeof(option));
   if (status != ARES_SUCCESS) {
     return ARES_SUCCESS;
   }
 
-  ares__buf_consume_whitespace(line, ARES_TRUE);
+  ares_buf_consume_whitespace(line, ARES_TRUE);
 
   status = buf_fetch_string(line, value, sizeof(value));
   if (status != ARES_SUCCESS) {
     return ARES_SUCCESS;
   }
 
-  ares__str_trim(value);
+  ares_str_trim(value);
   if (*value == 0) {
     return ARES_SUCCESS;
   }
 
   /* At this point we have a string option and a string value, both trimmed
    * of leading and trailing whitespace.  Lets try to evaluate them */
-  if (strcmp(option, "domain") == 0) {
+  if (ares_streq(option, "domain")) {
     /* Domain is legacy, don't overwrite an existing config set by search */
     if (sysconfig->domains == NULL) {
       status = config_search(sysconfig, value, 1);
     }
-  } else if (strcmp(option, "lookup") == 0 ||
-             strcmp(option, "hostresorder") == 0) {
-    ares__buf_tag_rollback(line);
+  } else if (ares_streq(option, "lookup") ||
+             ares_streq(option, "hostresorder")) {
+    ares_buf_tag_rollback(line);
     status = config_lookup(sysconfig, line, " \t");
-  } else if (strcmp(option, "search") == 0) {
+  } else if (ares_streq(option, "search")) {
     status = config_search(sysconfig, value, 0);
-  } else if (strcmp(option, "nameserver") == 0) {
-    status =
-      ares__sconfig_append_fromstr(&sysconfig->sconfig, value, ARES_TRUE);
-  } else if (strcmp(option, "sortlist") == 0) {
+  } else if (ares_streq(option, "nameserver")) {
+    status = ares_sconfig_append_fromstr(channel, &sysconfig->sconfig, value,
+                                         ARES_TRUE);
+  } else if (ares_streq(option, "sortlist")) {
     /* Ignore all failures except ENOMEM.  If the sysadmin set a bad
      * sortlist, just ignore the sortlist, don't cause an inoperable
      * channel */
     status =
-      ares__parse_sortlist(&sysconfig->sortlist, &sysconfig->nsortlist, value);
+      ares_parse_sortlist(&sysconfig->sortlist, &sysconfig->nsortlist, value);
     if (status != ARES_ENOMEM) {
       status = ARES_SUCCESS;
     }
-  } else if (strcmp(option, "options") == 0) {
-    status = ares__sysconfig_set_options(sysconfig, value);
+  } else if (ares_streq(option, "options")) {
+    status = ares_sysconfig_set_options(sysconfig, value);
   }
 
   return status;
@@ -623,44 +622,51 @@ static ares_status_t parse_resolvconf_line(ares_sysconfig_t *sysconfig,
 /* This function will only return ARES_SUCCESS or ARES_ENOMEM.  Any other
  * conditions are ignored.  Users may mess up config files, but we want to
  * process anything we can. */
-static ares_status_t parse_nsswitch_line(ares_sysconfig_t *sysconfig,
-                                         ares__buf_t      *line)
+static ares_status_t parse_nsswitch_line(const ares_channel_t *channel,
+                                         ares_sysconfig_t     *sysconfig,
+                                         ares_buf_t           *line)
 {
-  char           option[32];
-  ares__buf_t   *buf;
-  ares_status_t  status = ARES_SUCCESS;
-  ares__llist_t *sects  = NULL;
+  char          option[32];
+  ares_status_t status = ARES_SUCCESS;
+  ares_array_t *sects  = NULL;
+  ares_buf_t  **bufptr;
+  ares_buf_t   *buf;
+
+  (void)channel;
 
   /* Ignore lines beginning with a comment */
-  if (ares__buf_begins_with(line, (const unsigned char *)"#", 1)) {
+  if (ares_buf_begins_with(line, (const unsigned char *)"#", 1)) {
     return ARES_SUCCESS;
   }
 
   /* database : values (space delimited) */
-  status = ares__buf_split(line, (const unsigned char *)":", 1,
-                           ARES_BUF_SPLIT_TRIM, 2, &sects);
+  status = ares_buf_split(line, (const unsigned char *)":", 1,
+                          ARES_BUF_SPLIT_TRIM, 2, &sects);
 
-  if (status != ARES_SUCCESS || ares__llist_len(sects) != 2) {
+  if (status != ARES_SUCCESS || ares_array_len(sects) != 2) {
     goto done;
   }
 
-  buf    = ares__llist_first_val(sects);
+  bufptr = ares_array_at(sects, 0);
+  buf    = *bufptr;
+
   status = buf_fetch_string(buf, option, sizeof(option));
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
   /* Only support "hosts:" */
-  if (strcmp(option, "hosts") != 0) {
+  if (!ares_streq(option, "hosts")) {
     goto done;
   }
 
   /* Values are space separated */
-  buf    = ares__llist_last_val(sects);
+  bufptr = ares_array_at(sects, 1);
+  buf    = *bufptr;
   status = config_lookup(sysconfig, buf, " \t");
 
 done:
-  ares__llist_destroy(sects);
+  ares_array_destroy(sects);
   if (status != ARES_ENOMEM) {
     status = ARES_SUCCESS;
   }
@@ -670,52 +676,59 @@ static ares_status_t parse_nsswitch_line(ares_sysconfig_t *sysconfig,
 /* This function will only return ARES_SUCCESS or ARES_ENOMEM.  Any other
  * conditions are ignored.  Users may mess up config files, but we want to
  * process anything we can. */
-static ares_status_t parse_svcconf_line(ares_sysconfig_t *sysconfig,
-                                        ares__buf_t      *line)
+static ares_status_t parse_svcconf_line(const ares_channel_t *channel,
+                                        ares_sysconfig_t     *sysconfig,
+                                        ares_buf_t           *line)
 {
-  char           option[32];
-  ares__buf_t   *buf;
-  ares_status_t  status = ARES_SUCCESS;
-  ares__llist_t *sects  = NULL;
+  char          option[32];
+  ares_buf_t  **bufptr;
+  ares_buf_t   *buf;
+  ares_status_t status = ARES_SUCCESS;
+  ares_array_t *sects  = NULL;
+
+  (void)channel;
 
   /* Ignore lines beginning with a comment */
-  if (ares__buf_begins_with(line, (const unsigned char *)"#", 1)) {
+  if (ares_buf_begins_with(line, (const unsigned char *)"#", 1)) {
     return ARES_SUCCESS;
   }
 
   /* database = values (comma delimited)*/
-  status = ares__buf_split(line, (const unsigned char *)"=", 1,
-                           ARES_BUF_SPLIT_TRIM, 2, &sects);
+  status = ares_buf_split(line, (const unsigned char *)"=", 1,
+                          ARES_BUF_SPLIT_TRIM, 2, &sects);
 
-  if (status != ARES_SUCCESS || ares__llist_len(sects) != 2) {
+  if (status != ARES_SUCCESS || ares_array_len(sects) != 2) {
     goto done;
   }
 
-  buf    = ares__llist_first_val(sects);
+  bufptr = ares_array_at(sects, 0);
+  buf    = *bufptr;
   status = buf_fetch_string(buf, option, sizeof(option));
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
   /* Only support "hosts=" */
-  if (strcmp(option, "hosts") != 0) {
+  if (!ares_streq(option, "hosts")) {
     goto done;
   }
 
   /* Values are comma separated */
-  buf    = ares__llist_last_val(sects);
+  bufptr = ares_array_at(sects, 1);
+  buf    = *bufptr;
   status = config_lookup(sysconfig, buf, ",");
 
 done:
-  ares__llist_destroy(sects);
+  ares_array_destroy(sects);
   if (status != ARES_ENOMEM) {
     status = ARES_SUCCESS;
   }
   return status;
 }
 
-typedef ares_status_t (*line_callback_t)(ares_sysconfig_t *sysconfig,
-                                         ares__buf_t      *line);
+typedef ares_status_t (*line_callback_t)(const ares_channel_t *channel,
+                                         ares_sysconfig_t     *sysconfig,
+                                         ares_buf_t           *line);
 
 /* Should only return:
  *  ARES_ENOTFOUND - file not found
@@ -724,56 +737,60 @@ typedef ares_status_t (*line_callback_t)(ares_sysconfig_t *sysconfig,
  *  ARES_SUCCESS   - file processed, doesn't necessarily mean it was a good
  *                   file, but we're not erroring out if we can't parse
  *                   something (or anything at all) */
-static ares_status_t process_config_lines(const char       *filename,
-                                          ares_sysconfig_t *sysconfig,
-                                          line_callback_t   cb)
+static ares_status_t process_config_lines(const ares_channel_t *channel,
+                                          const char           *filename,
+                                          ares_sysconfig_t     *sysconfig,
+                                          line_callback_t       cb)
 {
-  ares_status_t       status = ARES_SUCCESS;
-  ares__llist_node_t *node;
-  ares__llist_t      *lines = NULL;
-  ares__buf_t        *buf   = NULL;
+  ares_status_t status = ARES_SUCCESS;
+  ares_array_t *lines  = NULL;
+  ares_buf_t   *buf    = NULL;
+  size_t        num;
+  size_t        i;
 
-  buf = ares__buf_create();
+  buf = ares_buf_create();
   if (buf == NULL) {
     status = ARES_ENOMEM;
     goto done;
   }
 
-  status = ares__buf_load_file(filename, buf);
+  status = ares_buf_load_file(filename, buf);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  status = ares__buf_split(buf, (const unsigned char *)"\n", 1,
-                           ARES_BUF_SPLIT_TRIM, 0, &lines);
+  status = ares_buf_split(buf, (const unsigned char *)"\n", 1,
+                          ARES_BUF_SPLIT_TRIM, 0, &lines);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  for (node = ares__llist_node_first(lines); node != NULL;
-       node = ares__llist_node_next(node)) {
-    ares__buf_t *line = ares__llist_node_val(node);
+  num = ares_array_len(lines);
+  for (i = 0; i < num; i++) {
+    ares_buf_t **bufptr = ares_array_at(lines, i);
+    ares_buf_t  *line   = *bufptr;
 
-    status = cb(sysconfig, line);
+    status = cb(channel, sysconfig, line);
     if (status != ARES_SUCCESS) {
       goto done;
     }
   }
 
 done:
-  ares__buf_destroy(buf);
-  ares__llist_destroy(lines);
+  ares_buf_destroy(buf);
+  ares_array_destroy(lines);
 
   return status;
 }
 
-ares_status_t ares__init_sysconfig_files(const ares_channel_t *channel,
-                                         ares_sysconfig_t     *sysconfig)
+ares_status_t ares_init_sysconfig_files(const ares_channel_t *channel,
+                                        ares_sysconfig_t     *sysconfig)
 {
   ares_status_t status = ARES_SUCCESS;
 
   /* Resolv.conf */
-  status = process_config_lines((channel->resolvconf_path != NULL)
+  status = process_config_lines(channel,
+                                (channel->resolvconf_path != NULL)
                                   ? channel->resolvconf_path
                                   : PATH_RESOLV_CONF,
                                 sysconfig, parse_resolvconf_line);
@@ -782,21 +799,22 @@ ares_status_t ares__init_sysconfig_files(const ares_channel_t *channel,
   }
 
   /* Nsswitch.conf */
-  status =
-    process_config_lines("/etc/nsswitch.conf", sysconfig, parse_nsswitch_line);
+  status = process_config_lines(channel, "/etc/nsswitch.conf", sysconfig,
+                                parse_nsswitch_line);
   if (status != ARES_SUCCESS && status != ARES_ENOTFOUND) {
     goto done;
   }
 
   /* netsvc.conf */
-  status =
-    process_config_lines("/etc/netsvc.conf", sysconfig, parse_svcconf_line);
+  status = process_config_lines(channel, "/etc/netsvc.conf", sysconfig,
+                                parse_svcconf_line);
   if (status != ARES_SUCCESS && status != ARES_ENOTFOUND) {
     goto done;
   }
 
   /* svc.conf */
-  status = process_config_lines("/etc/svc.conf", sysconfig, parse_svcconf_line);
+  status = process_config_lines(channel, "/etc/svc.conf", sysconfig,
+                                parse_svcconf_line);
   if (status != ARES_SUCCESS && status != ARES_ENOTFOUND) {
     goto done;
   }
diff --git a/deps/cares/src/lib/ares_sysconfig_mac.c b/deps/cares/src/lib/ares_sysconfig_mac.c
index 38ac451ca5f410..4d46ffd58df53b 100644
--- a/deps/cares/src/lib/ares_sysconfig_mac.c
+++ b/deps/cares/src/lib/ares_sysconfig_mac.c
@@ -154,14 +154,15 @@ static ares_bool_t search_is_duplicate(const ares_sysconfig_t *sysconfig,
 {
   size_t i;
   for (i = 0; i < sysconfig->ndomains; i++) {
-    if (strcasecmp(sysconfig->domains[i], name) == 0) {
+    if (ares_strcaseeq(sysconfig->domains[i], name)) {
       return ARES_TRUE;
     }
   }
   return ARES_FALSE;
 }
 
-static ares_status_t read_resolver(const dns_resolver_t *resolver,
+static ares_status_t read_resolver(const ares_channel_t *channel,
+                                   const dns_resolver_t *resolver,
                                    ares_sysconfig_t     *sysconfig)
 {
   int            i;
@@ -243,7 +244,7 @@ static ares_status_t read_resolver(const dns_resolver_t *resolver,
 #  endif
 
   if (resolver->options != NULL) {
-    status = ares__sysconfig_set_options(sysconfig, resolver->options);
+    status = ares_sysconfig_set_options(sysconfig, resolver->options);
     if (status != ARES_SUCCESS) {
       return status;
     }
@@ -269,7 +270,7 @@ static ares_status_t read_resolver(const dns_resolver_t *resolver,
     unsigned short         addrport;
     const struct sockaddr *sockaddr;
     char                   if_name_str[256] = "";
-    const char            *if_name;
+    const char            *if_name          = NULL;
 
     /* UBSAN alignment workaround to fetch memory address */
     memcpy(&sockaddr, resolver->nameserver + i, sizeof(sockaddr));
@@ -282,10 +283,14 @@ static ares_status_t read_resolver(const dns_resolver_t *resolver,
       addrport = port;
     }
 
-    if_name = ares__if_indextoname(resolver->if_index, if_name_str,
-                                   sizeof(if_name_str));
-    status  = ares__sconfig_append(&sysconfig->sconfig, &addr, addrport,
-                                   addrport, if_name);
+    if (channel->sock_funcs.aif_indextoname != NULL) {
+      if_name = channel->sock_funcs.aif_indextoname(
+        resolver->if_index, if_name_str, sizeof(if_name_str),
+        channel->sock_func_cb_data);
+    }
+
+    status = ares_sconfig_append(channel, &sysconfig->sconfig, &addr, addrport,
+                                 addrport, if_name);
     if (status != ARES_SUCCESS) {
       return status;
     }
@@ -294,7 +299,8 @@ static ares_status_t read_resolver(const dns_resolver_t *resolver,
   return status;
 }
 
-static ares_status_t read_resolvers(dns_resolver_t **resolvers, int nresolvers,
+static ares_status_t read_resolvers(const ares_channel_t *channel,
+                                    dns_resolver_t **resolvers, int nresolvers,
                                     ares_sysconfig_t *sysconfig)
 {
   ares_status_t status = ARES_SUCCESS;
@@ -309,13 +315,14 @@ static ares_status_t read_resolvers(dns_resolver_t **resolvers, int nresolvers,
      */
     memcpy(&resolver_ptr, resolvers + i, sizeof(resolver_ptr));
 
-    status = read_resolver(resolver_ptr, sysconfig);
+    status = read_resolver(channel, resolver_ptr, sysconfig);
   }
 
   return status;
 }
 
-ares_status_t ares__init_sysconfig_macos(ares_sysconfig_t *sysconfig)
+ares_status_t ares_init_sysconfig_macos(const ares_channel_t *channel,
+                                        ares_sysconfig_t     *sysconfig)
 {
   dnsinfo_t    *dnsinfo = NULL;
   dns_config_t *sc_dns  = NULL;
@@ -343,7 +350,8 @@ ares_status_t ares__init_sysconfig_macos(ares_sysconfig_t *sysconfig)
    * Likely this wasn't available via `/etc/resolv.conf` nor `libresolv` anyhow
    * so its not worse to prior configuration methods, worst case. */
 
-  status = read_resolvers(sc_dns->resolver, sc_dns->n_resolver, sysconfig);
+  status =
+    read_resolvers(channel, sc_dns->resolver, sc_dns->n_resolver, sysconfig);
 
 done:
   if (dnsinfo) {
diff --git a/deps/cares/src/lib/ares_sysconfig_win.c b/deps/cares/src/lib/ares_sysconfig_win.c
index ce2a261cec82bb..f6e07f92e47380 100644
--- a/deps/cares/src/lib/ares_sysconfig_win.c
+++ b/deps/cares/src/lib/ares_sysconfig_win.c
@@ -53,7 +53,6 @@
 #endif
 
 #include "ares_inet_net_pton.h"
-#include "ares_platform.h"
 
 #if defined(USE_WINSOCK)
 /*
@@ -420,7 +419,7 @@ static ares_bool_t get_DNS_Windows(char **outptr)
         memset(&addr, 0, sizeof(addr));
         addr.family = AF_INET6;
         memcpy(&addr.addr.addr6, &namesrvr.sa6->sin6_addr, 16);
-        if (ares__addr_is_linklocal(&addr)) {
+        if (ares_addr_is_linklocal(&addr)) {
           ll_scope = ipaaEntry->Ipv6IfIndex;
         }
 
@@ -514,10 +513,6 @@ static ares_bool_t get_SuffixList_Windows(char **outptr)
 
   *outptr = NULL;
 
-  if (ares__getplatform() != WIN_NT) {
-    return ARES_FALSE;
-  }
-
   /* 1. Global DNS Suffix Search List */
   if (RegOpenKeyExA(HKEY_LOCAL_MACHINE, WIN_NS_NT_KEY, 0, KEY_READ, &hKey) ==
       ERROR_SUCCESS) {
@@ -589,13 +584,15 @@ static ares_bool_t get_SuffixList_Windows(char **outptr)
   return *outptr != NULL ? ARES_TRUE : ARES_FALSE;
 }
 
-ares_status_t ares__init_sysconfig_windows(ares_sysconfig_t *sysconfig)
+ares_status_t ares_init_sysconfig_windows(const ares_channel_t *channel,
+                                          ares_sysconfig_t     *sysconfig)
 {
   char         *line   = NULL;
   ares_status_t status = ARES_SUCCESS;
 
   if (get_DNS_Windows(&line)) {
-    status = ares__sconfig_append_fromstr(&sysconfig->sconfig, line, ARES_TRUE);
+    status = ares_sconfig_append_fromstr(channel, &sysconfig->sconfig, line,
+                                         ARES_TRUE);
     ares_free(line);
     if (status != ARES_SUCCESS) {
       goto done;
@@ -603,7 +600,7 @@ ares_status_t ares__init_sysconfig_windows(ares_sysconfig_t *sysconfig)
   }
 
   if (get_SuffixList_Windows(&line)) {
-    sysconfig->domains = ares__strsplit(line, ", ", &sysconfig->ndomains);
+    sysconfig->domains = ares_strsplit(line, ", ", &sysconfig->ndomains);
     ares_free(line);
     if (sysconfig->domains == NULL) {
       status = ARES_EFILE;
diff --git a/deps/cares/src/lib/ares_timeout.c b/deps/cares/src/lib/ares_timeout.c
index 5ed8b553a3c5ef..0d2fdcff21f657 100644
--- a/deps/cares/src/lib/ares_timeout.c
+++ b/deps/cares/src/lib/ares_timeout.c
@@ -32,9 +32,9 @@
 #endif
 
 
-void ares__timeval_remaining(ares_timeval_t       *remaining,
-                             const ares_timeval_t *now,
-                             const ares_timeval_t *tout)
+void ares_timeval_remaining(ares_timeval_t       *remaining,
+                            const ares_timeval_t *now,
+                            const ares_timeval_t *tout)
 {
   memset(remaining, 0, sizeof(*remaining));
 
@@ -53,8 +53,8 @@ void ares__timeval_remaining(ares_timeval_t       *remaining,
   }
 }
 
-void ares__timeval_diff(ares_timeval_t *tvdiff, const ares_timeval_t *tvstart,
-                        const ares_timeval_t *tvstop)
+void ares_timeval_diff(ares_timeval_t *tvdiff, const ares_timeval_t *tvstart,
+                       const ares_timeval_t *tvstop)
 {
   tvdiff->sec = tvstop->sec - tvstart->sec;
   if (tvstop->usec > tvstart->usec) {
@@ -89,24 +89,24 @@ static struct timeval *ares_timeout_int(const ares_channel_t *channel,
                                         struct timeval       *tvbuf)
 {
   const ares_query_t *query;
-  ares__slist_node_t *node;
+  ares_slist_node_t  *node;
   ares_timeval_t      now;
   ares_timeval_t      atvbuf;
   ares_timeval_t      amaxtv;
 
   /* The minimum timeout of all queries is always the first entry in
    * channel->queries_by_timeout */
-  node = ares__slist_node_first(channel->queries_by_timeout);
+  node = ares_slist_node_first(channel->queries_by_timeout);
   /* no queries/timeout */
   if (node == NULL) {
     return maxtv;
   }
 
-  query = ares__slist_node_val(node);
+  query = ares_slist_node_val(node);
 
-  ares__tvnow(&now);
+  ares_tvnow(&now);
 
-  ares__timeval_remaining(&atvbuf, &now, &query->timeout);
+  ares_timeval_remaining(&atvbuf, &now, &query->timeout);
 
   ares_timeval_to_struct_timeval(tvbuf, &atvbuf);
 
@@ -141,11 +141,11 @@ struct timeval *ares_timeout(const ares_channel_t *channel,
     return NULL;
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
   rv = ares_timeout_int(channel, maxtv, tvbuf);
 
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 
   return rv;
 }
diff --git a/deps/cares/src/lib/ares_update_servers.c b/deps/cares/src/lib/ares_update_servers.c
index 639f79d815915f..70a9381499f8c2 100644
--- a/deps/cares/src/lib/ares_update_servers.c
+++ b/deps/cares/src/lib/ares_update_servers.c
@@ -39,6 +39,9 @@
 #ifdef HAVE_NET_IF_H
 #  include <net/if.h>
 #endif
+#ifdef HAVE_STDINT_H
+#  include <stdint.h>
+#endif
 
 #if defined(USE_WINSOCK)
 #  if defined(HAVE_IPHLPAPI_H)
@@ -61,8 +64,8 @@ typedef struct {
   unsigned int     ll_scope;
 } ares_sconfig_t;
 
-static ares_bool_t ares__addr_match(const struct ares_addr *addr1,
-                                    const struct ares_addr *addr2)
+static ares_bool_t ares_addr_match(const struct ares_addr *addr1,
+                                   const struct ares_addr *addr2)
 {
   if (addr1 == NULL && addr2 == NULL) {
     return ARES_TRUE; /* LCOV_EXCL_LINE: DefensiveCoding */
@@ -90,9 +93,9 @@ static ares_bool_t ares__addr_match(const struct ares_addr *addr1,
   return ARES_FALSE;
 }
 
-ares_bool_t ares__subnet_match(const struct ares_addr *addr,
-                               const struct ares_addr *subnet,
-                               unsigned char           netmask)
+ares_bool_t ares_subnet_match(const struct ares_addr *addr,
+                              const struct ares_addr *subnet,
+                              unsigned char           netmask)
 {
   const unsigned char *addr_ptr;
   const unsigned char *subnet_ptr;
@@ -144,7 +147,7 @@ ares_bool_t ares__subnet_match(const struct ares_addr *addr,
   return ARES_TRUE;
 }
 
-ares_bool_t ares__addr_is_linklocal(const struct ares_addr *addr)
+ares_bool_t ares_addr_is_linklocal(const struct ares_addr *addr)
 {
   struct ares_addr    subnet;
   const unsigned char subnetaddr[16] = { 0xfe, 0x80, 0x00, 0x00, 0x00, 0x00,
@@ -155,7 +158,7 @@ ares_bool_t ares__addr_is_linklocal(const struct ares_addr *addr)
   subnet.family = AF_INET6;
   memcpy(&subnet.addr.addr6, subnetaddr, 16);
 
-  return ares__subnet_match(addr, &subnet, 10);
+  return ares_subnet_match(addr, &subnet, 10);
 }
 
 static ares_bool_t ares_server_blacklisted(const struct ares_addr *addr)
@@ -185,13 +188,60 @@ static ares_bool_t ares_server_blacklisted(const struct ares_addr *addr)
     struct ares_addr subnet;
     subnet.family = AF_INET6;
     memcpy(&subnet.addr.addr6, blacklist_v6[i].netbase, 16);
-    if (ares__subnet_match(addr, &subnet, blacklist_v6[i].netmask)) {
+    if (ares_subnet_match(addr, &subnet, blacklist_v6[i].netmask)) {
       return ARES_TRUE;
     }
   }
   return ARES_FALSE;
 }
 
+static ares_status_t parse_nameserver_uri(ares_buf_t     *buf,
+                                          ares_sconfig_t *sconfig)
+{
+  ares_uri_t   *uri    = NULL;
+  ares_status_t status = ARES_SUCCESS;
+  const char   *port;
+  char         *ll_scope;
+  char          hoststr[256];
+  size_t        addrlen;
+
+  status = ares_uri_parse_buf(&uri, buf);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  if (!ares_streq("dns", ares_uri_get_scheme(uri))) {
+    status = ARES_EBADSTR;
+    goto done;
+  }
+
+  ares_strcpy(hoststr, ares_uri_get_host(uri), sizeof(hoststr));
+  ll_scope = strchr(hoststr, '%');
+  if (ll_scope != NULL) {
+    *ll_scope = 0;
+    ll_scope++;
+    ares_strcpy(sconfig->ll_iface, ll_scope, sizeof(sconfig->ll_iface));
+  }
+
+  /* Convert ip address from string to network byte order */
+  sconfig->addr.family = AF_UNSPEC;
+  if (ares_dns_pton(hoststr, &sconfig->addr, &addrlen) == NULL) {
+    status = ARES_EBADSTR;
+    goto done;
+  }
+
+  sconfig->udp_port = ares_uri_get_port(uri);
+  sconfig->tcp_port = sconfig->udp_port;
+  port              = ares_uri_get_query_key(uri, "tcpport");
+  if (port != NULL) {
+    sconfig->tcp_port = (unsigned short)atoi(port);
+  }
+
+done:
+  ares_uri_destroy(uri);
+  return status;
+}
+
 /* Parse address and port in these formats, either ipv4 or ipv6 addresses
  * are allowed:
  *   ipaddr
@@ -211,7 +261,7 @@ static ares_bool_t ares_server_blacklisted(const struct ares_addr *addr)
  * Returns an error code on failure, else ARES_SUCCESS
  */
 
-static ares_status_t parse_nameserver(ares__buf_t *buf, ares_sconfig_t *sconfig)
+static ares_status_t parse_nameserver(ares_buf_t *buf, ares_sconfig_t *sconfig)
 {
   ares_status_t status;
   char          ipaddr[INET6_ADDRSTRLEN] = "";
@@ -220,57 +270,57 @@ static ares_status_t parse_nameserver(ares__buf_t *buf, ares_sconfig_t *sconfig)
   memset(sconfig, 0, sizeof(*sconfig));
 
   /* Consume any leading whitespace */
-  ares__buf_consume_whitespace(buf, ARES_TRUE);
+  ares_buf_consume_whitespace(buf, ARES_TRUE);
 
   /* pop off IP address.  If it is in [ ] then it can be ipv4 or ipv6.  If
    * not, ipv4 only */
-  if (ares__buf_begins_with(buf, (const unsigned char *)"[", 1)) {
+  if (ares_buf_begins_with(buf, (const unsigned char *)"[", 1)) {
     /* Consume [ */
-    ares__buf_consume(buf, 1);
+    ares_buf_consume(buf, 1);
 
-    ares__buf_tag(buf);
+    ares_buf_tag(buf);
 
     /* Consume until ] */
-    if (ares__buf_consume_until_charset(buf, (const unsigned char *)"]", 1,
-                                        ARES_TRUE) == 0) {
+    if (ares_buf_consume_until_charset(buf, (const unsigned char *)"]", 1,
+                                       ARES_TRUE) == SIZE_MAX) {
       return ARES_EBADSTR;
     }
 
-    status = ares__buf_tag_fetch_string(buf, ipaddr, sizeof(ipaddr));
+    status = ares_buf_tag_fetch_string(buf, ipaddr, sizeof(ipaddr));
     if (status != ARES_SUCCESS) {
       return status;
     }
 
     /* Skip over ] */
-    ares__buf_consume(buf, 1);
+    ares_buf_consume(buf, 1);
   } else {
     size_t offset;
 
     /* Not in [ ], see if '.' is in first 4 characters, if it is, then its ipv4,
      * otherwise treat as ipv6 */
-    ares__buf_tag(buf);
+    ares_buf_tag(buf);
 
-    offset = ares__buf_consume_until_charset(buf, (const unsigned char *)".", 1,
-                                             ARES_TRUE);
-    ares__buf_tag_rollback(buf);
-    ares__buf_tag(buf);
+    offset = ares_buf_consume_until_charset(buf, (const unsigned char *)".", 1,
+                                            ARES_TRUE);
+    ares_buf_tag_rollback(buf);
+    ares_buf_tag(buf);
 
     if (offset > 0 && offset < 4) {
       /* IPv4 */
-      if (ares__buf_consume_charset(buf, (const unsigned char *)"0123456789.",
-                                    11) == 0) {
+      if (ares_buf_consume_charset(buf, (const unsigned char *)"0123456789.",
+                                   11) == 0) {
         return ARES_EBADSTR;
       }
     } else {
       /* IPv6 */
       const unsigned char ipv6_charset[] = "ABCDEFabcdef0123456789.:";
-      if (ares__buf_consume_charset(buf, ipv6_charset,
-                                    sizeof(ipv6_charset) - 1) == 0) {
+      if (ares_buf_consume_charset(buf, ipv6_charset,
+                                   sizeof(ipv6_charset) - 1) == 0) {
         return ARES_EBADSTR;
       }
     }
 
-    status = ares__buf_tag_fetch_string(buf, ipaddr, sizeof(ipaddr));
+    status = ares_buf_tag_fetch_string(buf, ipaddr, sizeof(ipaddr));
     if (status != ARES_SUCCESS) {
       return status;
     }
@@ -283,21 +333,21 @@ static ares_status_t parse_nameserver(ares__buf_t *buf, ares_sconfig_t *sconfig)
   }
 
   /* Pull off port */
-  if (ares__buf_begins_with(buf, (const unsigned char *)":", 1)) {
+  if (ares_buf_begins_with(buf, (const unsigned char *)":", 1)) {
     char portstr[6];
 
     /* Consume : */
-    ares__buf_consume(buf, 1);
+    ares_buf_consume(buf, 1);
 
-    ares__buf_tag(buf);
+    ares_buf_tag(buf);
 
     /* Read numbers */
-    if (ares__buf_consume_charset(buf, (const unsigned char *)"0123456789",
-                                  10) == 0) {
+    if (ares_buf_consume_charset(buf, (const unsigned char *)"0123456789",
+                                 10) == 0) {
       return ARES_EBADSTR;
     }
 
-    status = ares__buf_tag_fetch_string(buf, portstr, sizeof(portstr));
+    status = ares_buf_tag_fetch_string(buf, portstr, sizeof(portstr));
     if (status != ARES_SUCCESS) {
       return status;
     }
@@ -307,22 +357,22 @@ static ares_status_t parse_nameserver(ares__buf_t *buf, ares_sconfig_t *sconfig)
   }
 
   /* Pull off interface modifier */
-  if (ares__buf_begins_with(buf, (const unsigned char *)"%", 1)) {
+  if (ares_buf_begins_with(buf, (const unsigned char *)"%", 1)) {
     const unsigned char iface_charset[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                                           "abcdefghijklmnopqrstuvwxyz"
                                           "0123456789.-_\\:{}";
     /* Consume % */
-    ares__buf_consume(buf, 1);
+    ares_buf_consume(buf, 1);
 
-    ares__buf_tag(buf);
+    ares_buf_tag(buf);
 
-    if (ares__buf_consume_charset(buf, iface_charset,
-                                  sizeof(iface_charset) - 1) == 0) {
+    if (ares_buf_consume_charset(buf, iface_charset,
+                                 sizeof(iface_charset) - 1) == 0) {
       return ARES_EBADSTR;
     }
 
-    status = ares__buf_tag_fetch_string(buf, sconfig->ll_iface,
-                                        sizeof(sconfig->ll_iface));
+    status = ares_buf_tag_fetch_string(buf, sconfig->ll_iface,
+                                       sizeof(sconfig->ll_iface));
     if (status != ARES_SUCCESS) {
       return status;
     }
@@ -330,24 +380,29 @@ static ares_status_t parse_nameserver(ares__buf_t *buf, ares_sconfig_t *sconfig)
 
   /* Consume any trailing whitespace so we can bail out if there is something
    * after we didn't read */
-  ares__buf_consume_whitespace(buf, ARES_TRUE);
+  ares_buf_consume_whitespace(buf, ARES_TRUE);
 
-  if (ares__buf_len(buf) != 0) {
+  if (ares_buf_len(buf) != 0) {
     return ARES_EBADSTR;
   }
 
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares__sconfig_linklocal(ares_sconfig_t *s,
-                                             const char     *ll_iface)
+static ares_status_t ares_sconfig_linklocal(const ares_channel_t *channel,
+                                            ares_sconfig_t       *s,
+                                            const char           *ll_iface)
 {
   unsigned int ll_scope = 0;
 
+
   if (ares_str_isnum(ll_iface)) {
     char ifname[IF_NAMESIZE] = "";
     ll_scope                 = (unsigned int)atoi(ll_iface);
-    if (ares__if_indextoname(ll_scope, ifname, sizeof(ifname)) == NULL) {
+    if (channel->sock_funcs.aif_indextoname == NULL ||
+        channel->sock_funcs.aif_indextoname(ll_scope, ifname, sizeof(ifname),
+                                            channel->sock_func_cb_data) ==
+          NULL) {
       DEBUGF(fprintf(stderr, "Interface %s for ipv6 Link Local not found\n",
                      ll_iface));
       return ARES_ENOTFOUND;
@@ -357,7 +412,10 @@ static ares_status_t ares__sconfig_linklocal(ares_sconfig_t *s,
     return ARES_SUCCESS;
   }
 
-  ll_scope = ares__if_nametoindex(ll_iface);
+  if (channel->sock_funcs.aif_nametoindex != NULL) {
+    ll_scope =
+      channel->sock_funcs.aif_nametoindex(ll_iface, channel->sock_func_cb_data);
+  }
   if (ll_scope == 0) {
     DEBUGF(fprintf(stderr, "Interface %s for ipv6 Link Local not found\n",
                    ll_iface));
@@ -368,11 +426,11 @@ static ares_status_t ares__sconfig_linklocal(ares_sconfig_t *s,
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__sconfig_append(ares__llist_t         **sconfig,
-                                   const struct ares_addr *addr,
-                                   unsigned short          udp_port,
-                                   unsigned short          tcp_port,
-                                   const char             *ll_iface)
+ares_status_t ares_sconfig_append(const ares_channel_t   *channel,
+                                  ares_llist_t          **sconfig,
+                                  const struct ares_addr *addr,
+                                  unsigned short          udp_port,
+                                  unsigned short tcp_port, const char *ll_iface)
 {
   ares_sconfig_t *s;
   ares_status_t   status;
@@ -392,7 +450,7 @@ ares_status_t ares__sconfig_append(ares__llist_t         **sconfig,
   }
 
   if (*sconfig == NULL) {
-    *sconfig = ares__llist_create(ares_free);
+    *sconfig = ares_llist_create(ares_free);
     if (*sconfig == NULL) {
       status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
       goto fail;            /* LCOV_EXCL_LINE: OutOfMemory */
@@ -405,13 +463,13 @@ ares_status_t ares__sconfig_append(ares__llist_t         **sconfig,
 
   /* Handle link-local enumeration. If an interface is specified on a
    * non-link-local address, we'll simply end up ignoring that */
-  if (ares__addr_is_linklocal(&s->addr)) {
+  if (ares_addr_is_linklocal(&s->addr)) {
     if (ares_strlen(ll_iface) == 0) {
       /* Silently ignore this entry, we require an interface */
       status = ARES_SUCCESS;
       goto fail;
     }
-    status = ares__sconfig_linklocal(s, ll_iface);
+    status = ares_sconfig_linklocal(channel, s, ll_iface);
     /* Silently ignore this entry, we can't validate the interface */
     if (status != ARES_SUCCESS) {
       status = ARES_SUCCESS;
@@ -419,7 +477,7 @@ ares_status_t ares__sconfig_append(ares__llist_t         **sconfig,
     }
   }
 
-  if (ares__llist_insert_last(*sconfig, s) == NULL) {
+  if (ares_llist_insert_last(*sconfig, s) == NULL) {
     status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
     goto fail;            /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -448,36 +506,43 @@ ares_status_t ares__sconfig_append(ares__llist_t         **sconfig,
  *
  * Returns an error code on failure, else ARES_SUCCESS.
  */
-ares_status_t ares__sconfig_append_fromstr(ares__llist_t **sconfig,
-                                           const char     *str,
-                                           ares_bool_t     ignore_invalid)
+ares_status_t ares_sconfig_append_fromstr(const ares_channel_t *channel,
+                                          ares_llist_t        **sconfig,
+                                          const char           *str,
+                                          ares_bool_t           ignore_invalid)
 {
-  ares_status_t       status = ARES_SUCCESS;
-  ares__buf_t        *buf    = NULL;
-  ares__llist_t      *list   = NULL;
-  ares__llist_node_t *node;
+  ares_status_t status = ARES_SUCCESS;
+  ares_buf_t   *buf    = NULL;
+  ares_array_t *list   = NULL;
+  size_t        num;
+  size_t        i;
 
   /* On Windows, there may be more than one nameserver specified in the same
    * registry key, so we parse input as a space or comma separated list.
    */
-  buf = ares__buf_create_const((const unsigned char *)str, ares_strlen(str));
+  buf = ares_buf_create_const((const unsigned char *)str, ares_strlen(str));
   if (buf == NULL) {
     status = ARES_ENOMEM;
     goto done;
   }
 
-  status = ares__buf_split(buf, (const unsigned char *)" ,", 2,
-                           ARES_BUF_SPLIT_NONE, 0, &list);
+  status = ares_buf_split(buf, (const unsigned char *)" ,", 2,
+                          ARES_BUF_SPLIT_NONE, 0, &list);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  for (node = ares__llist_node_first(list); node != NULL;
-       node = ares__llist_node_next(node)) {
-    ares__buf_t   *entry = ares__llist_node_val(node);
+  num = ares_array_len(list);
+  for (i = 0; i < num; i++) {
+    ares_buf_t   **bufptr = ares_array_at(list, i);
+    ares_buf_t    *entry  = *bufptr;
     ares_sconfig_t s;
 
-    status = parse_nameserver(entry, &s);
+    status = parse_nameserver_uri(entry, &s);
+    if (status != ARES_SUCCESS) {
+      status = parse_nameserver(entry, &s);
+    }
+
     if (status != ARES_SUCCESS) {
       if (ignore_invalid) {
         continue;
@@ -486,8 +551,8 @@ ares_status_t ares__sconfig_append_fromstr(ares__llist_t **sconfig,
       }
     }
 
-    status = ares__sconfig_append(sconfig, &s.addr, s.udp_port, s.tcp_port,
-                                  s.ll_iface);
+    status = ares_sconfig_append(channel, sconfig, &s.addr, s.udp_port,
+                                 s.tcp_port, s.ll_iface);
     if (status != ARES_SUCCESS) {
       goto done; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -496,14 +561,14 @@ ares_status_t ares__sconfig_append_fromstr(ares__llist_t **sconfig,
   status = ARES_SUCCESS;
 
 done:
-  ares__llist_destroy(list);
-  ares__buf_destroy(buf);
+  ares_array_destroy(list);
+  ares_buf_destroy(buf);
   return status;
 }
 
-static unsigned short ares__sconfig_get_port(const ares_channel_t *channel,
-                                             const ares_sconfig_t *s,
-                                             ares_bool_t           is_tcp)
+static unsigned short ares_sconfig_get_port(const ares_channel_t *channel,
+                                            const ares_sconfig_t *s,
+                                            ares_bool_t           is_tcp)
 {
   unsigned short port = is_tcp ? s->tcp_port : s->udp_port;
 
@@ -518,24 +583,24 @@ static unsigned short ares__sconfig_get_port(const ares_channel_t *channel,
   return port;
 }
 
-static ares__slist_node_t *ares__server_find(ares_channel_t       *channel,
-                                             const ares_sconfig_t *s)
+static ares_slist_node_t *ares_server_find(const ares_channel_t *channel,
+                                           const ares_sconfig_t *s)
 {
-  ares__slist_node_t *node;
+  ares_slist_node_t *node;
 
-  for (node = ares__slist_node_first(channel->servers); node != NULL;
-       node = ares__slist_node_next(node)) {
-    const ares_server_t *server = ares__slist_node_val(node);
+  for (node = ares_slist_node_first(channel->servers); node != NULL;
+       node = ares_slist_node_next(node)) {
+    const ares_server_t *server = ares_slist_node_val(node);
 
-    if (!ares__addr_match(&server->addr, &s->addr)) {
+    if (!ares_addr_match(&server->addr, &s->addr)) {
       continue;
     }
 
-    if (server->tcp_port != ares__sconfig_get_port(channel, s, ARES_TRUE)) {
+    if (server->tcp_port != ares_sconfig_get_port(channel, s, ARES_TRUE)) {
       continue;
     }
 
-    if (server->udp_port != ares__sconfig_get_port(channel, s, ARES_FALSE)) {
+    if (server->udp_port != ares_sconfig_get_port(channel, s, ARES_FALSE)) {
       continue;
     }
 
@@ -544,28 +609,28 @@ static ares__slist_node_t *ares__server_find(ares_channel_t       *channel,
   return NULL;
 }
 
-static ares_bool_t ares__server_isdup(const ares_channel_t *channel,
-                                      ares__llist_node_t   *s)
+static ares_bool_t ares_server_isdup(const ares_channel_t *channel,
+                                     ares_llist_node_t    *s)
 {
   /* Scan backwards to see if this is a duplicate */
-  ares__llist_node_t   *prev;
-  const ares_sconfig_t *server = ares__llist_node_val(s);
+  ares_llist_node_t    *prev;
+  const ares_sconfig_t *server = ares_llist_node_val(s);
 
-  for (prev = ares__llist_node_prev(s); prev != NULL;
-       prev = ares__llist_node_prev(prev)) {
-    const ares_sconfig_t *p = ares__llist_node_val(prev);
+  for (prev = ares_llist_node_prev(s); prev != NULL;
+       prev = ares_llist_node_prev(prev)) {
+    const ares_sconfig_t *p = ares_llist_node_val(prev);
 
-    if (!ares__addr_match(&server->addr, &p->addr)) {
+    if (!ares_addr_match(&server->addr, &p->addr)) {
       continue;
     }
 
-    if (ares__sconfig_get_port(channel, server, ARES_TRUE) !=
-        ares__sconfig_get_port(channel, p, ARES_TRUE)) {
+    if (ares_sconfig_get_port(channel, server, ARES_TRUE) !=
+        ares_sconfig_get_port(channel, p, ARES_TRUE)) {
       continue;
     }
 
-    if (ares__sconfig_get_port(channel, server, ARES_FALSE) !=
-        ares__sconfig_get_port(channel, p, ARES_FALSE)) {
+    if (ares_sconfig_get_port(channel, server, ARES_FALSE) !=
+        ares_sconfig_get_port(channel, p, ARES_FALSE)) {
       continue;
     }
 
@@ -575,9 +640,9 @@ static ares_bool_t ares__server_isdup(const ares_channel_t *channel,
   return ARES_FALSE;
 }
 
-static ares_status_t ares__server_create(ares_channel_t       *channel,
-                                         const ares_sconfig_t *sconfig,
-                                         size_t                idx)
+static ares_status_t ares_server_create(ares_channel_t       *channel,
+                                        const ares_sconfig_t *sconfig,
+                                        size_t                idx)
 {
   ares_status_t  status;
   ares_server_t *server = ares_malloc_zero(sizeof(*server));
@@ -588,8 +653,8 @@ static ares_status_t ares__server_create(ares_channel_t       *channel,
 
   server->idx         = idx;
   server->channel     = channel;
-  server->udp_port    = ares__sconfig_get_port(channel, sconfig, ARES_FALSE);
-  server->tcp_port    = ares__sconfig_get_port(channel, sconfig, ARES_TRUE);
+  server->udp_port    = ares_sconfig_get_port(channel, sconfig, ARES_FALSE);
+  server->tcp_port    = ares_sconfig_get_port(channel, sconfig, ARES_TRUE);
   server->addr.family = sconfig->addr.family;
   server->next_retry_time.sec  = 0;
   server->next_retry_time.usec = 0;
@@ -608,25 +673,13 @@ static ares_status_t ares__server_create(ares_channel_t       *channel,
     server->ll_scope = sconfig->ll_scope;
   }
 
-  server->tcp_parser = ares__buf_create();
-  if (server->tcp_parser == NULL) {
-    status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
-    goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
-  }
-
-  server->tcp_send = ares__buf_create();
-  if (server->tcp_send == NULL) {
-    status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
-    goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
-  }
-
-  server->connections = ares__llist_create(NULL);
+  server->connections = ares_llist_create(NULL);
   if (server->connections == NULL) {
     status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
     goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  if (ares__slist_insert(channel->servers, server) == NULL) {
+  if (ares_slist_insert(channel->servers, server) == NULL) {
     status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
     goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -635,31 +688,31 @@ static ares_status_t ares__server_create(ares_channel_t       *channel,
 
 done:
   if (status != ARES_SUCCESS) {
-    ares__destroy_server(server); /* LCOV_EXCL_LINE: OutOfMemory */
+    ares_destroy_server(server); /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   return status;
 }
 
-static ares_bool_t ares__server_in_newconfig(const ares_server_t *server,
-                                             ares__llist_t       *srvlist)
+static ares_bool_t ares_server_in_newconfig(const ares_server_t *server,
+                                            ares_llist_t        *srvlist)
 {
-  ares__llist_node_t   *node;
+  ares_llist_node_t    *node;
   const ares_channel_t *channel = server->channel;
 
-  for (node = ares__llist_node_first(srvlist); node != NULL;
-       node = ares__llist_node_next(node)) {
-    const ares_sconfig_t *s = ares__llist_node_val(node);
+  for (node = ares_llist_node_first(srvlist); node != NULL;
+       node = ares_llist_node_next(node)) {
+    const ares_sconfig_t *s = ares_llist_node_val(node);
 
-    if (!ares__addr_match(&server->addr, &s->addr)) {
+    if (!ares_addr_match(&server->addr, &s->addr)) {
       continue;
     }
 
-    if (server->tcp_port != ares__sconfig_get_port(channel, s, ARES_TRUE)) {
+    if (server->tcp_port != ares_sconfig_get_port(channel, s, ARES_TRUE)) {
       continue;
     }
 
-    if (server->udp_port != ares__sconfig_get_port(channel, s, ARES_FALSE)) {
+    if (server->udp_port != ares_sconfig_get_port(channel, s, ARES_FALSE)) {
       continue;
     }
 
@@ -669,19 +722,19 @@ static ares_bool_t ares__server_in_newconfig(const ares_server_t *server,
   return ARES_FALSE;
 }
 
-static ares_bool_t ares__servers_remove_stale(ares_channel_t *channel,
-                                              ares__llist_t  *srvlist)
+static ares_bool_t ares_servers_remove_stale(ares_channel_t *channel,
+                                             ares_llist_t   *srvlist)
 {
-  ares_bool_t         stale_removed = ARES_FALSE;
-  ares__slist_node_t *snode         = ares__slist_node_first(channel->servers);
+  ares_bool_t        stale_removed = ARES_FALSE;
+  ares_slist_node_t *snode         = ares_slist_node_first(channel->servers);
 
   while (snode != NULL) {
-    ares__slist_node_t  *snext  = ares__slist_node_next(snode);
-    const ares_server_t *server = ares__slist_node_val(snode);
-    if (!ares__server_in_newconfig(server, srvlist)) {
+    ares_slist_node_t   *snext  = ares_slist_node_next(snode);
+    const ares_server_t *server = ares_slist_node_val(snode);
+    if (!ares_server_in_newconfig(server, srvlist)) {
       /* This will clean up all server state via the destruction callback and
        * move any queries to new servers */
-      ares__slist_node_destroy(snode);
+      ares_slist_node_destroy(snode);
       stale_removed = ARES_TRUE;
     }
     snode = snext;
@@ -689,21 +742,21 @@ static ares_bool_t ares__servers_remove_stale(ares_channel_t *channel,
   return stale_removed;
 }
 
-static void ares__servers_trim_single(ares_channel_t *channel)
+static void ares_servers_trim_single(ares_channel_t *channel)
 {
-  while (ares__slist_len(channel->servers) > 1) {
-    ares__slist_node_destroy(ares__slist_node_last(channel->servers));
+  while (ares_slist_len(channel->servers) > 1) {
+    ares_slist_node_destroy(ares_slist_node_last(channel->servers));
   }
 }
 
-ares_status_t ares__servers_update(ares_channel_t *channel,
-                                   ares__llist_t  *server_list,
-                                   ares_bool_t     user_specified)
+ares_status_t ares_servers_update(ares_channel_t *channel,
+                                  ares_llist_t   *server_list,
+                                  ares_bool_t     user_specified)
 {
-  ares__llist_node_t *node;
-  size_t              idx = 0;
-  ares_status_t       status;
-  ares_bool_t         list_changed = ARES_FALSE;
+  ares_llist_node_t *node;
+  size_t             idx = 0;
+  ares_status_t      status;
+  ares_bool_t        list_changed = ARES_FALSE;
 
   if (channel == NULL) {
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
@@ -714,19 +767,19 @@ ares_status_t ares__servers_update(ares_channel_t *channel,
    */
 
   /* Add new entries */
-  for (node = ares__llist_node_first(server_list); node != NULL;
-       node = ares__llist_node_next(node)) {
-    const ares_sconfig_t *sconfig = ares__llist_node_val(node);
-    ares__slist_node_t   *snode;
+  for (node = ares_llist_node_first(server_list); node != NULL;
+       node = ares_llist_node_next(node)) {
+    const ares_sconfig_t *sconfig = ares_llist_node_val(node);
+    ares_slist_node_t    *snode;
 
     /* If a server has already appeared in the list of new servers, skip it. */
-    if (ares__server_isdup(channel, node)) {
+    if (ares_server_isdup(channel, node)) {
       continue;
     }
 
-    snode = ares__server_find(channel, sconfig);
+    snode = ares_server_find(channel, sconfig);
     if (snode != NULL) {
-      ares_server_t *server = ares__slist_node_val(snode);
+      ares_server_t *server = ares_slist_node_val(snode);
 
       /* Copy over link-local settings.  Its possible some of this data has
        * changed, maybe ...  */
@@ -740,10 +793,10 @@ ares_status_t ares__servers_update(ares_channel_t *channel,
         server->idx = idx;
         /* Index changed, reinsert node, doesn't require any memory
          * allocations so can't fail. */
-        ares__slist_node_reinsert(snode);
+        ares_slist_node_reinsert(snode);
       }
     } else {
-      status = ares__server_create(channel, sconfig, idx);
+      status = ares_server_create(channel, sconfig, idx);
       if (status != ARES_SUCCESS) {
         goto done;
       }
@@ -755,13 +808,13 @@ ares_status_t ares__servers_update(ares_channel_t *channel,
   }
 
   /* Remove any servers that don't exist in the current configuration */
-  if (ares__servers_remove_stale(channel, server_list)) {
+  if (ares_servers_remove_stale(channel, server_list)) {
     list_changed = ARES_TRUE;
   }
 
   /* Trim to one server if ARES_FLAG_PRIMARY is set. */
   if (channel->flags & ARES_FLAG_PRIMARY) {
-    ares__servers_trim_single(channel);
+    ares_servers_trim_single(channel);
   }
 
   if (user_specified) {
@@ -771,7 +824,7 @@ ares_status_t ares__servers_update(ares_channel_t *channel,
 
   /* Clear any cached query results only if the server list changed */
   if (list_changed) {
-    ares__qcache_flush(channel->qcache);
+    ares_qcache_flush(channel->qcache);
   }
 
   status = ARES_SUCCESS;
@@ -781,15 +834,15 @@ ares_status_t ares__servers_update(ares_channel_t *channel,
 }
 
 static ares_status_t
-  ares_addr_node_to_server_config_llist(const struct ares_addr_node *servers,
-                                        ares__llist_t              **llist)
+  ares_addr_node_to_sconfig_llist(const struct ares_addr_node *servers,
+                                  ares_llist_t               **llist)
 {
   const struct ares_addr_node *node;
-  ares__llist_t               *s;
+  ares_llist_t                *s;
 
   *llist = NULL;
 
-  s = ares__llist_create(ares_free);
+  s = ares_llist_create(ares_free);
   if (s == NULL) {
     goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -816,7 +869,7 @@ static ares_status_t
              sizeof(sconfig->addr.addr.addr6));
     }
 
-    if (ares__llist_insert_last(s, sconfig) == NULL) {
+    if (ares_llist_insert_last(s, sconfig) == NULL) {
       ares_free(sconfig); /* LCOV_EXCL_LINE: OutOfMemory */
       goto fail;          /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -827,20 +880,21 @@ static ares_status_t
 
 /* LCOV_EXCL_START: OutOfMemory */
 fail:
-  ares__llist_destroy(s);
+  ares_llist_destroy(s);
   return ARES_ENOMEM;
   /* LCOV_EXCL_STOP */
 }
 
-static ares_status_t ares_addr_port_node_to_server_config_llist(
-  const struct ares_addr_port_node *servers, ares__llist_t **llist)
+static ares_status_t
+  ares_addrpnode_to_sconfig_llist(const struct ares_addr_port_node *servers,
+                                  ares_llist_t                    **llist)
 {
   const struct ares_addr_port_node *node;
-  ares__llist_t                    *s;
+  ares_llist_t                     *s;
 
   *llist = NULL;
 
-  s = ares__llist_create(ares_free);
+  s = ares_llist_create(ares_free);
   if (s == NULL) {
     goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -870,7 +924,7 @@ static ares_status_t ares_addr_port_node_to_server_config_llist(
     sconfig->tcp_port = (unsigned short)node->tcp_port;
     sconfig->udp_port = (unsigned short)node->udp_port;
 
-    if (ares__llist_insert_last(s, sconfig) == NULL) {
+    if (ares_llist_insert_last(s, sconfig) == NULL) {
       ares_free(sconfig); /* LCOV_EXCL_LINE: OutOfMemory */
       goto fail;          /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -881,21 +935,21 @@ static ares_status_t ares_addr_port_node_to_server_config_llist(
 
 /* LCOV_EXCL_START: OutOfMemory */
 fail:
-  ares__llist_destroy(s);
+  ares_llist_destroy(s);
   return ARES_ENOMEM;
   /* LCOV_EXCL_STOP */
 }
 
-ares_status_t ares_in_addr_to_server_config_llist(const struct in_addr *servers,
-                                                  size_t          nservers,
-                                                  ares__llist_t **llist)
+ares_status_t ares_in_addr_to_sconfig_llist(const struct in_addr *servers,
+                                            size_t                nservers,
+                                            ares_llist_t        **llist)
 {
-  size_t         i;
-  ares__llist_t *s;
+  size_t        i;
+  ares_llist_t *s;
 
   *llist = NULL;
 
-  s = ares__llist_create(ares_free);
+  s = ares_llist_create(ares_free);
   if (s == NULL) {
     goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -912,7 +966,7 @@ ares_status_t ares_in_addr_to_server_config_llist(const struct in_addr *servers,
     memcpy(&sconfig->addr.addr.addr4, &servers[i],
            sizeof(sconfig->addr.addr.addr4));
 
-    if (ares__llist_insert_last(s, sconfig) == NULL) {
+    if (ares_llist_insert_last(s, sconfig) == NULL) {
       goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
     }
   }
@@ -922,21 +976,90 @@ ares_status_t ares_in_addr_to_server_config_llist(const struct in_addr *servers,
 
 /* LCOV_EXCL_START: OutOfMemory */
 fail:
-  ares__llist_destroy(s);
+  ares_llist_destroy(s);
   return ARES_ENOMEM;
   /* LCOV_EXCL_STOP */
 }
 
+static ares_bool_t ares_server_use_uri(const ares_server_t *server)
+{
+  /* Currently the only reason to use the new URI format is that the UDP and
+   * TCP ports differ */
+  if (server->tcp_port != server->udp_port) {
+    return ARES_TRUE;
+  }
+  return ARES_FALSE;
+}
+
+static ares_status_t ares_get_server_addr_uri(const ares_server_t *server,
+                                              ares_buf_t          *buf)
+{
+  ares_uri_t   *uri = NULL;
+  ares_status_t status;
+  char          addr[INET6_ADDRSTRLEN];
+
+  uri = ares_uri_create();
+  if (uri == NULL) {
+    return ARES_ENOMEM;
+  }
+
+  status = ares_uri_set_scheme(uri, "dns");
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  ares_inet_ntop(server->addr.family, &server->addr.addr, addr, sizeof(addr));
+
+  if (ares_strlen(server->ll_iface)) {
+    char addr_iface[256];
+
+    snprintf(addr_iface, sizeof(addr_iface), "%s%%%s", addr, server->ll_iface);
+    status = ares_uri_set_host(uri, addr_iface);
+  } else {
+    status = ares_uri_set_host(uri, addr);
+  }
+
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_set_port(uri, server->udp_port);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  if (server->udp_port != server->tcp_port) {
+    char port[6];
+    snprintf(port, sizeof(port), "%d", server->tcp_port);
+    status = ares_uri_set_query_key(uri, "tcpport", port);
+    if (status != ARES_SUCCESS) {
+      goto done;
+    }
+  }
+
+  status = ares_uri_write_buf(uri, buf);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+done:
+  ares_uri_destroy(uri);
+  return status;
+}
+
 /* Write out the details of a server to a buffer */
-ares_status_t ares_get_server_addr(const ares_server_t *server,
-                                   ares__buf_t         *buf)
+ares_status_t ares_get_server_addr(const ares_server_t *server, ares_buf_t *buf)
 {
   ares_status_t status;
   char          addr[INET6_ADDRSTRLEN];
 
+  if (ares_server_use_uri(server)) {
+    return ares_get_server_addr_uri(server, buf);
+  }
+
   /* ipv4addr or [ipv6addr] */
   if (server->addr.family == AF_INET6) {
-    status = ares__buf_append_byte(buf, '[');
+    status = ares_buf_append_byte(buf, '[');
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -944,37 +1067,37 @@ ares_status_t ares_get_server_addr(const ares_server_t *server,
 
   ares_inet_ntop(server->addr.family, &server->addr.addr, addr, sizeof(addr));
 
-  status = ares__buf_append_str(buf, addr);
+  status = ares_buf_append_str(buf, addr);
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   if (server->addr.family == AF_INET6) {
-    status = ares__buf_append_byte(buf, ']');
+    status = ares_buf_append_byte(buf, ']');
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
   }
 
   /* :port */
-  status = ares__buf_append_byte(buf, ':');
+  status = ares_buf_append_byte(buf, ':');
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  status = ares__buf_append_num_dec(buf, server->udp_port, 0);
+  status = ares_buf_append_num_dec(buf, server->udp_port, 0);
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   /* %iface */
   if (ares_strlen(server->ll_iface)) {
-    status = ares__buf_append_byte(buf, '%');
+    status = ares_buf_append_byte(buf, '%');
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
-    status = ares__buf_append_str(buf, server->ll_iface);
+    status = ares_buf_append_str(buf, server->ll_iface);
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -990,17 +1113,17 @@ int ares_get_servers(const ares_channel_t   *channel,
   struct ares_addr_node *srvr_last = NULL;
   struct ares_addr_node *srvr_curr;
   ares_status_t          status = ARES_SUCCESS;
-  ares__slist_node_t    *node;
+  ares_slist_node_t     *node;
 
   if (channel == NULL) {
     return ARES_ENODATA;
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
-  for (node = ares__slist_node_first(channel->servers); node != NULL;
-       node = ares__slist_node_next(node)) {
-    const ares_server_t *server = ares__slist_node_val(node);
+  for (node = ares_slist_node_first(channel->servers); node != NULL;
+       node = ares_slist_node_next(node)) {
+    const ares_server_t *server = ares_slist_node_val(node);
 
     /* Allocate storage for this server node appending it to the list */
     srvr_curr = ares_malloc_data(ARES_DATATYPE_ADDR_NODE);
@@ -1033,7 +1156,7 @@ int ares_get_servers(const ares_channel_t   *channel,
 
   *servers = srvr_head;
 
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
 
   return (int)status;
 }
@@ -1045,17 +1168,17 @@ int ares_get_servers_ports(const ares_channel_t        *channel,
   struct ares_addr_port_node *srvr_last = NULL;
   struct ares_addr_port_node *srvr_curr;
   ares_status_t               status = ARES_SUCCESS;
-  ares__slist_node_t         *node;
+  ares_slist_node_t          *node;
 
   if (channel == NULL) {
     return ARES_ENODATA;
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
-  for (node = ares__slist_node_first(channel->servers); node != NULL;
-       node = ares__slist_node_next(node)) {
-    const ares_server_t *server = ares__slist_node_val(node);
+  for (node = ares_slist_node_first(channel->servers); node != NULL;
+       node = ares_slist_node_next(node)) {
+    const ares_server_t *server = ares_slist_node_val(node);
 
     /* Allocate storage for this server node appending it to the list */
     srvr_curr = ares_malloc_data(ARES_DATATYPE_ADDR_PORT_NODE);
@@ -1091,30 +1214,30 @@ int ares_get_servers_ports(const ares_channel_t        *channel,
 
   *servers = srvr_head;
 
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
   return (int)status;
 }
 
 int ares_set_servers(ares_channel_t              *channel,
                      const struct ares_addr_node *servers)
 {
-  ares__llist_t *slist;
-  ares_status_t  status;
+  ares_llist_t *slist;
+  ares_status_t status;
 
   if (channel == NULL) {
     return ARES_ENODATA;
   }
 
-  status = ares_addr_node_to_server_config_llist(servers, &slist);
+  status = ares_addr_node_to_sconfig_llist(servers, &slist);
   if (status != ARES_SUCCESS) {
     return (int)status;
   }
 
-  ares__channel_lock(channel);
-  status = ares__servers_update(channel, slist, ARES_TRUE);
-  ares__channel_unlock(channel);
+  ares_channel_lock(channel);
+  status = ares_servers_update(channel, slist, ARES_TRUE);
+  ares_channel_unlock(channel);
 
-  ares__llist_destroy(slist);
+  ares_llist_destroy(slist);
 
   return (int)status;
 }
@@ -1122,23 +1245,23 @@ int ares_set_servers(ares_channel_t              *channel,
 int ares_set_servers_ports(ares_channel_t                   *channel,
                            const struct ares_addr_port_node *servers)
 {
-  ares__llist_t *slist;
-  ares_status_t  status;
+  ares_llist_t *slist;
+  ares_status_t status;
 
   if (channel == NULL) {
     return ARES_ENODATA;
   }
 
-  status = ares_addr_port_node_to_server_config_llist(servers, &slist);
+  status = ares_addrpnode_to_sconfig_llist(servers, &slist);
   if (status != ARES_SUCCESS) {
     return (int)status;
   }
 
-  ares__channel_lock(channel);
-  status = ares__servers_update(channel, slist, ARES_TRUE);
-  ares__channel_unlock(channel);
+  ares_channel_lock(channel);
+  status = ares_servers_update(channel, slist, ARES_TRUE);
+  ares_channel_unlock(channel);
 
-  ares__llist_destroy(slist);
+  ares_llist_destroy(slist);
 
   return (int)status;
 }
@@ -1147,8 +1270,8 @@ int ares_set_servers_ports(ares_channel_t                   *channel,
 /* IPv6 addresses with ports require square brackets [fe80::1]:53 */
 static ares_status_t set_servers_csv(ares_channel_t *channel, const char *_csv)
 {
-  ares_status_t  status;
-  ares__llist_t *slist = NULL;
+  ares_status_t status;
+  ares_llist_t *slist = NULL;
 
   if (channel == NULL) {
     return ARES_ENODATA;
@@ -1156,23 +1279,23 @@ static ares_status_t set_servers_csv(ares_channel_t *channel, const char *_csv)
 
   if (ares_strlen(_csv) == 0) {
     /* blank all servers */
-    ares__channel_lock(channel);
-    status = ares__servers_update(channel, NULL, ARES_TRUE);
-    ares__channel_unlock(channel);
+    ares_channel_lock(channel);
+    status = ares_servers_update(channel, NULL, ARES_TRUE);
+    ares_channel_unlock(channel);
     return status;
   }
 
-  status = ares__sconfig_append_fromstr(&slist, _csv, ARES_FALSE);
+  status = ares_sconfig_append_fromstr(channel, &slist, _csv, ARES_FALSE);
   if (status != ARES_SUCCESS) {
-    ares__llist_destroy(slist);
+    ares_llist_destroy(slist);
     return status;
   }
 
-  ares__channel_lock(channel);
-  status = ares__servers_update(channel, slist, ARES_TRUE);
-  ares__channel_unlock(channel);
+  ares_channel_lock(channel);
+  status = ares_servers_update(channel, slist, ARES_TRUE);
+  ares_channel_unlock(channel);
 
-  ares__llist_destroy(slist);
+  ares_llist_destroy(slist);
 
   return status;
 }
@@ -1190,24 +1313,24 @@ int ares_set_servers_ports_csv(ares_channel_t *channel, const char *_csv)
 
 char *ares_get_servers_csv(const ares_channel_t *channel)
 {
-  ares__buf_t        *buf = NULL;
-  char               *out = NULL;
-  ares__slist_node_t *node;
+  ares_buf_t        *buf = NULL;
+  char              *out = NULL;
+  ares_slist_node_t *node;
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
-  buf = ares__buf_create();
+  buf = ares_buf_create();
   if (buf == NULL) {
     goto done; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  for (node = ares__slist_node_first(channel->servers); node != NULL;
-       node = ares__slist_node_next(node)) {
+  for (node = ares_slist_node_first(channel->servers); node != NULL;
+       node = ares_slist_node_next(node)) {
     ares_status_t        status;
-    const ares_server_t *server = ares__slist_node_val(node);
+    const ares_server_t *server = ares_slist_node_val(node);
 
-    if (ares__buf_len(buf)) {
-      status = ares__buf_append_byte(buf, ',');
+    if (ares_buf_len(buf)) {
+      status = ares_buf_append_byte(buf, ',');
       if (status != ARES_SUCCESS) {
         goto done; /* LCOV_EXCL_LINE: OutOfMemory */
       }
@@ -1219,12 +1342,12 @@ char *ares_get_servers_csv(const ares_channel_t *channel)
     }
   }
 
-  out = ares__buf_finish_str(buf, NULL);
+  out = ares_buf_finish_str(buf, NULL);
   buf = NULL;
 
 done:
-  ares__channel_unlock(channel);
-  ares__buf_destroy(buf);
+  ares_channel_unlock(channel);
+  ares_buf_destroy(buf);
   return out;
 }
 
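Reviewer note: the hunks above make `ares_get_server_addr()` choose between two textual forms, the legacy `addr:port` / `[addr6]:port` form and a `dns://` URI form that is used only when the UDP and TCP ports differ. The following is a minimal standalone sketch of that selection, not the c-ares implementation. `demo_server_t` and `demo_server_to_str()` are hypothetical names, the link-local `%iface` suffix is left out, and the exact URI text is an assumption, because the output of `ares_uri_write_buf()` is not shown in this diff.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the few ares_server_t fields used here. */
typedef struct {
  const char    *host;    /* address already rendered as text */
  int            is_ipv6; /* non-zero if host is an IPv6 literal */
  unsigned short udp_port;
  unsigned short tcp_port;
} demo_server_t;

/* Writes the server into out. Returns the number of characters written
 * (excluding the NUL terminator), or -1 on empty host or truncation. */
static int demo_server_to_str(const demo_server_t *s, char *out, size_t len)
{
  const char *lb;
  const char *rb;
  int         n;

  assert(s != NULL && out != NULL);

  if (s->host == NULL || strlen(s->host) == 0) {
    return -1;
  }

  /* IPv6 literals are bracketed so the port separator is unambiguous */
  lb = s->is_ipv6 ? "[" : "";
  rb = s->is_ipv6 ? "]" : "";

  if (s->udp_port != s->tcp_port) {
    /* ASSUMED layout: the UDP port is the URI port, the TCP port is carried
     * in a "tcpport" query key, mirroring the calls made in the diff. */
    n = snprintf(out, len, "dns://%s%s%s:%u?tcpport=%u", lb, s->host, rb,
                 (unsigned int)s->udp_port, (unsigned int)s->tcp_port);
  } else {
    /* Legacy form: ipv4addr:port or [ipv6addr]:port */
    n = snprintf(out, len, "%s%s%s:%u", lb, s->host, rb,
                 (unsigned int)s->udp_port);
  }

  return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Keeping the legacy form whenever both ports match means existing consumers of `ares_get_servers_csv()` see unchanged output unless a configuration actually needs the extra information.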
diff --git a/deps/cares/src/lib/config-dos.h b/deps/cares/src/lib/config-dos.h
index db758fcca6619a..afbcfb2858ea0b 100644
--- a/deps/cares/src/lib/config-dos.h
+++ b/deps/cares/src/lib/config-dos.h
@@ -21,6 +21,7 @@
 #define HAVE_RECV                1
 #define HAVE_RECVFROM            1
 #define HAVE_SEND                1
+#define HAVE_SENDTO              1
 #define HAVE_STRDUP              1
 #define HAVE_STRICMP             1
 #define HAVE_STRUCT_IN6_ADDR     1
diff --git a/deps/cares/src/lib/config-win32.h b/deps/cares/src/lib/config-win32.h
index b9c4ec17336b86..be233a2f8b9c2d 100644
--- a/deps/cares/src/lib/config-win32.h
+++ b/deps/cares/src/lib/config-win32.h
@@ -187,6 +187,9 @@
 /* Define if you have the send function. */
 #define HAVE_SEND 1
 
+/* Define if you have the sendto function. */
+#define HAVE_SENDTO 1
+
 /* Define to the type of arg 1 for send. */
 #define SEND_TYPE_ARG1 SOCKET
 
diff --git a/deps/cares/src/lib/dsa/ares__array.c b/deps/cares/src/lib/dsa/ares_array.c
similarity index 60%
rename from deps/cares/src/lib/dsa/ares__array.c
rename to deps/cares/src/lib/dsa/ares_array.c
index 0c724248bf2e09..c421d5c5f670bd 100644
--- a/deps/cares/src/lib/dsa/ares__array.c
+++ b/deps/cares/src/lib/dsa/ares_array.c
@@ -24,23 +24,23 @@
  * SPDX-License-Identifier: MIT
  */
 #include "ares_private.h"
-#include "ares__array.h"
+#include "ares_array.h"
 
 #define ARES__ARRAY_MIN 4
 
-struct ares__array {
-  ares__array_destructor_t destruct;
-  void                    *arr;
-  size_t                   member_size;
-  size_t                   cnt;
-  size_t                   offset;
-  size_t                   alloc_cnt;
+struct ares_array {
+  ares_array_destructor_t destruct;
+  void                   *arr;
+  size_t                  member_size;
+  size_t                  cnt;
+  size_t                  offset;
+  size_t                  alloc_cnt;
 };
 
-ares__array_t *ares__array_create(size_t                   member_size,
-                                  ares__array_destructor_t destruct)
+ares_array_t *ares_array_create(size_t                  member_size,
+                                ares_array_destructor_t destruct)
 {
-  ares__array_t *arr;
+  ares_array_t *arr;
 
   if (member_size == 0) {
     return NULL;
@@ -56,7 +56,7 @@ ares__array_t *ares__array_create(size_t                   member_size,
   return arr;
 }
 
-size_t ares__array_len(const ares__array_t *arr)
+size_t ares_array_len(const ares_array_t *arr)
 {
   if (arr == NULL) {
     return 0;
@@ -64,7 +64,7 @@ size_t ares__array_len(const ares__array_t *arr)
   return arr->cnt;
 }
 
-void *ares__array_at(ares__array_t *arr, size_t idx)
+void *ares_array_at(ares_array_t *arr, size_t idx)
 {
   if (arr == NULL || idx >= arr->cnt) {
     return NULL;
@@ -72,7 +72,7 @@ void *ares__array_at(ares__array_t *arr, size_t idx)
   return (unsigned char *)arr->arr + ((idx + arr->offset) * arr->member_size);
 }
 
-const void *ares__array_at_const(const ares__array_t *arr, size_t idx)
+const void *ares_array_at_const(const ares_array_t *arr, size_t idx)
 {
   if (arr == NULL || idx >= arr->cnt) {
     return NULL;
@@ -80,7 +80,7 @@ const void *ares__array_at_const(const ares__array_t *arr, size_t idx)
   return (unsigned char *)arr->arr + ((idx + arr->offset) * arr->member_size);
 }
 
-ares_status_t ares__array_sort(ares__array_t *arr, ares__array_cmp_t cmp)
+ares_status_t ares_array_sort(ares_array_t *arr, ares_array_cmp_t cmp)
 {
   if (arr == NULL || cmp == NULL) {
     return ARES_EFORMERR;
@@ -96,7 +96,7 @@ ares_status_t ares__array_sort(ares__array_t *arr, ares__array_cmp_t cmp)
   return ARES_SUCCESS;
 }
 
-void ares__array_destroy(ares__array_t *arr)
+void ares_array_destroy(ares_array_t *arr)
 {
   size_t i;
 
@@ -106,7 +106,7 @@ void ares__array_destroy(ares__array_t *arr)
 
   if (arr->destruct != NULL) {
     for (i = 0; i < arr->cnt; i++) {
-      arr->destruct(ares__array_at(arr, i));
+      arr->destruct(ares_array_at(arr, i));
     }
   }
 
@@ -116,8 +116,8 @@ void ares__array_destroy(ares__array_t *arr)
 
 /* NOTE: this function operates on actual indexes, NOT indexes using the
  *       arr->offset */
-static ares_status_t ares__array_move(ares__array_t *arr, size_t dest_idx,
-                                      size_t src_idx)
+static ares_status_t ares_array_move(ares_array_t *arr, size_t dest_idx,
+                                     size_t src_idx)
 {
   void       *dest_ptr;
   const void *src_ptr;
@@ -140,18 +140,14 @@ static ares_status_t ares__array_move(ares__array_t *arr, size_t dest_idx,
   if (dest_idx > src_idx && arr->cnt + (dest_idx - src_idx) > arr->alloc_cnt) {
     return ARES_EFORMERR;
   }
-  if (dest_idx < src_idx) {
-    nmembers = arr->cnt - dest_idx;
-  } else {
-    nmembers = arr->cnt - src_idx;
-  }
 
+  nmembers = arr->cnt - (src_idx - arr->offset);
   memmove(dest_ptr, src_ptr, nmembers * arr->member_size);
 
   return ARES_SUCCESS;
 }
 
-void *ares__array_finish(ares__array_t *arr, size_t *num_members)
+void *ares_array_finish(ares_array_t *arr, size_t *num_members)
 {
   void *ptr;
 
@@ -161,7 +157,7 @@ void *ares__array_finish(ares__array_t *arr, size_t *num_members)
 
   /* Make sure we move data to beginning of allocation */
   if (arr->offset != 0) {
-    if (ares__array_move(arr, 0, arr->offset) != ARES_SUCCESS) {
+    if (ares_array_move(arr, 0, arr->offset) != ARES_SUCCESS) {
       return NULL;
     }
     arr->offset = 0;
@@ -173,7 +169,7 @@ void *ares__array_finish(ares__array_t *arr, size_t *num_members)
   return ptr;
 }
 
-ares_status_t ares__array_set_size(ares__array_t *arr, size_t size)
+ares_status_t ares_array_set_size(ares_array_t *arr, size_t size)
 {
   void *temp;
 
@@ -182,7 +178,7 @@ ares_status_t ares__array_set_size(ares__array_t *arr, size_t size)
   }
 
   /* Always operate on powers of 2 */
-  size = ares__round_up_pow2(size);
+  size = ares_round_up_pow2(size);
 
   if (size < ARES__ARRAY_MIN) {
     size = ARES__ARRAY_MIN;
@@ -203,8 +199,8 @@ ares_status_t ares__array_set_size(ares__array_t *arr, size_t size)
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__array_insert_at(void **elem_ptr, ares__array_t *arr,
-                                    size_t idx)
+ares_status_t ares_array_insert_at(void **elem_ptr, ares_array_t *arr,
+                                   size_t idx)
 {
   void         *ptr;
   ares_status_t status;
@@ -219,14 +215,14 @@ ares_status_t ares__array_insert_at(void **elem_ptr, ares__array_t *arr,
   }
 
   /* Allocate more if needed */
-  status = ares__array_set_size(arr, arr->cnt + 1);
+  status = ares_array_set_size(arr, arr->cnt + 1);
   if (status != ARES_SUCCESS) {
     return status;
   }
 
   /* Shift if we have memory but not enough room at the end */
   if (arr->cnt + 1 + arr->offset > arr->alloc_cnt) {
-    status = ares__array_move(arr, 0, arr->offset);
+    status = ares_array_move(arr, 0, arr->offset);
     if (status != ARES_SUCCESS) {
       return status;
     }
@@ -236,7 +232,7 @@ ares_status_t ares__array_insert_at(void **elem_ptr, ares__array_t *arr,
   /* If we're inserting anywhere other than the end, we need to move some
    * elements out of the way */
   if (idx != arr->cnt) {
-    status = ares__array_move(arr, idx + arr->offset + 1, idx + arr->offset);
+    status = ares_array_move(arr, idx + arr->offset + 1, idx + arr->offset);
     if (status != ARES_SUCCESS) {
       return status;
     }
@@ -255,46 +251,88 @@ ares_status_t ares__array_insert_at(void **elem_ptr, ares__array_t *arr,
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__array_insert_last(void **elem_ptr, ares__array_t *arr)
+ares_status_t ares_array_insert_last(void **elem_ptr, ares_array_t *arr)
+{
+  return ares_array_insert_at(elem_ptr, arr, ares_array_len(arr));
+}
+
+ares_status_t ares_array_insert_first(void **elem_ptr, ares_array_t *arr)
+{
+  return ares_array_insert_at(elem_ptr, arr, 0);
+}
+
+ares_status_t ares_array_insertdata_at(ares_array_t *arr, size_t idx,
+                                       const void *data_ptr)
+{
+  ares_status_t status;
+  void         *ptr = NULL;
+
+  status = ares_array_insert_at(&ptr, arr, idx);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+  memcpy(ptr, data_ptr, arr->member_size);
+  return ARES_SUCCESS;
+}
+
+ares_status_t ares_array_insertdata_last(ares_array_t *arr,
+                                         const void   *data_ptr)
 {
-  return ares__array_insert_at(elem_ptr, arr, ares__array_len(arr));
+  ares_status_t status;
+  void         *ptr = NULL;
+
+  status = ares_array_insert_last(&ptr, arr);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+  memcpy(ptr, data_ptr, arr->member_size);
+  return ARES_SUCCESS;
 }
 
-ares_status_t ares__array_insert_first(void **elem_ptr, ares__array_t *arr)
+ares_status_t ares_array_insertdata_first(ares_array_t *arr,
+                                          const void   *data_ptr)
 {
-  return ares__array_insert_at(elem_ptr, arr, 0);
+  ares_status_t status;
+  void         *ptr = NULL;
+
+  status = ares_array_insert_first(&ptr, arr);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+  memcpy(ptr, data_ptr, arr->member_size);
+  return ARES_SUCCESS;
 }
 
-void *ares__array_first(ares__array_t *arr)
+void *ares_array_first(ares_array_t *arr)
 {
-  return ares__array_at(arr, 0);
+  return ares_array_at(arr, 0);
 }
 
-void *ares__array_last(ares__array_t *arr)
+void *ares_array_last(ares_array_t *arr)
 {
-  size_t cnt = ares__array_len(arr);
+  size_t cnt = ares_array_len(arr);
   if (cnt == 0) {
     return NULL;
   }
-  return ares__array_at(arr, cnt - 1);
+  return ares_array_at(arr, cnt - 1);
 }
 
-const void *ares__array_first_const(const ares__array_t *arr)
+const void *ares_array_first_const(const ares_array_t *arr)
 {
-  return ares__array_at_const(arr, 0);
+  return ares_array_at_const(arr, 0);
 }
 
-const void *ares__array_last_const(const ares__array_t *arr)
+const void *ares_array_last_const(const ares_array_t *arr)
 {
-  size_t cnt = ares__array_len(arr);
+  size_t cnt = ares_array_len(arr);
   if (cnt == 0) {
     return NULL;
   }
-  return ares__array_at_const(arr, cnt - 1);
+  return ares_array_at_const(arr, cnt - 1);
 }
 
-ares_status_t ares__array_claim_at(void *dest, size_t dest_size,
-                                   ares__array_t *arr, size_t idx)
+ares_status_t ares_array_claim_at(void *dest, size_t dest_size,
+                                  ares_array_t *arr, size_t idx)
 {
   ares_status_t status;
 
@@ -307,7 +345,7 @@ ares_status_t ares__array_claim_at(void *dest, size_t dest_size,
   }
 
   if (dest) {
-    memcpy(dest, ares__array_at(arr, idx), arr->member_size);
+    memcpy(dest, ares_array_at(arr, idx), arr->member_size);
   }
 
   if (idx == 0) {
@@ -317,7 +355,7 @@ ares_status_t ares__array_claim_at(void *dest, size_t dest_size,
   } else if (idx != arr->cnt - 1) {
     /* Must shift entire array if removing an element from the middle. Does
      * nothing if removing last element other than decrement count. */
-    status = ares__array_move(arr, idx + arr->offset, idx + arr->offset + 1);
+    status = ares_array_move(arr, idx + arr->offset, idx + arr->offset + 1);
     if (status != ARES_SUCCESS) {
       return status;
     }
@@ -327,9 +365,9 @@ ares_status_t ares__array_claim_at(void *dest, size_t dest_size,
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__array_remove_at(ares__array_t *arr, size_t idx)
+ares_status_t ares_array_remove_at(ares_array_t *arr, size_t idx)
 {
-  void *ptr = ares__array_at(arr, idx);
+  void *ptr = ares_array_at(arr, idx);
   if (arr == NULL || ptr == NULL) {
     return ARES_EFORMERR;
   }
@@ -338,19 +376,19 @@ ares_status_t ares__array_remove_at(ares__array_t *arr, size_t idx)
     arr->destruct(ptr);
   }
 
-  return ares__array_claim_at(NULL, 0, arr, idx);
+  return ares_array_claim_at(NULL, 0, arr, idx);
 }
 
-ares_status_t ares__array_remove_first(ares__array_t *arr)
+ares_status_t ares_array_remove_first(ares_array_t *arr)
 {
-  return ares__array_remove_at(arr, 0);
+  return ares_array_remove_at(arr, 0);
 }
 
-ares_status_t ares__array_remove_last(ares__array_t *arr)
+ares_status_t ares_array_remove_last(ares_array_t *arr)
 {
-  size_t cnt = ares__array_len(arr);
+  size_t cnt = ares_array_len(arr);
   if (cnt == 0) {
     return ARES_EFORMERR;
   }
-  return ares__array_remove_at(arr, cnt - 1);
+  return ares_array_remove_at(arr, cnt - 1);
 }
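Reviewer note: `ares_array_at()` maps a logical index to the physical slot `idx + arr->offset`, and the reworked `ares_array_move()` computes the number of members to shift as `arr->cnt - (src_idx - arr->offset)`. The sketch below illustrates that bookkeeping on a fixed-size `int` array. All `demo_*` names are hypothetical, and the behaviour for removing the first element (advance `offset` instead of shifting) is an assumption, since that branch body falls outside the hunks shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CAP 8

/* Hypothetical fixed-capacity analogue of struct ares_array. */
typedef struct {
  int    slots[DEMO_CAP];
  size_t offset; /* physical index of logical element 0 */
  size_t cnt;    /* number of live elements */
} demo_array_t;

/* Logical index -> element pointer, or NULL when out of range. */
static int *demo_at(demo_array_t *a, size_t idx)
{
  if (a == NULL || idx >= a->cnt) {
    return NULL;
  }
  return &a->slots[idx + a->offset];
}

/* Shift the tail that starts at physical src_idx to physical dest_idx.
 * Both indexes are physical, i.e. they already include a->offset. */
static int demo_move(demo_array_t *a, size_t dest_idx, size_t src_idx)
{
  size_t nmembers;

  if (a == NULL || src_idx < a->offset || src_idx > a->offset + a->cnt) {
    return -1;
  }

  /* Same formula as the diff: members from src_idx to the logical end */
  nmembers = a->cnt - (src_idx - a->offset);
  if (dest_idx + nmembers > DEMO_CAP) {
    return -1;
  }

  memmove(&a->slots[dest_idx], &a->slots[src_idx], nmembers * sizeof(int));
  return 0;
}

/* Remove the element at logical index idx. */
static int demo_remove_at(demo_array_t *a, size_t idx)
{
  if (a == NULL || idx >= a->cnt) {
    return -1;
  }

  if (idx == 0) {
    a->offset++; /* ASSUMED: dropping the head only advances the offset */
  } else if (idx != a->cnt - 1) {
    /* Middle removal: pull the tail down by one slot */
    if (demo_move(a, idx + a->offset, idx + a->offset + 1) != 0) {
      return -1;
    }
  }
  /* Removing the last element needs no data movement at all */
  a->cnt--;
  return 0;
}
```

With this layout, repeatedly removing from the front is constant-time, and only middle removals pay for a `memmove`.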
diff --git a/deps/cares/src/lib/dsa/ares__htable.c b/deps/cares/src/lib/dsa/ares_htable.c
similarity index 64%
rename from deps/cares/src/lib/dsa/ares__htable.c
rename to deps/cares/src/lib/dsa/ares_htable.c
index 9049b3246b36f1..f76b67cae9a668 100644
--- a/deps/cares/src/lib/dsa/ares__htable.c
+++ b/deps/cares/src/lib/dsa/ares_htable.c
@@ -24,33 +24,37 @@
  * SPDX-License-Identifier: MIT
  */
 #include "ares_private.h"
-#include "ares__llist.h"
-#include "ares__htable.h"
+#include "ares_llist.h"
+#include "ares_htable.h"
 
 #define ARES__HTABLE_MAX_BUCKETS    (1U << 24)
 #define ARES__HTABLE_MIN_BUCKETS    (1U << 4)
 #define ARES__HTABLE_EXPAND_PERCENT 75
 
-struct ares__htable {
-  ares__htable_hashfunc_t    hash;
-  ares__htable_bucket_key_t  bucket_key;
-  ares__htable_bucket_free_t bucket_free;
-  ares__htable_key_eq_t      key_eq;
-  unsigned int               seed;
-  unsigned int               size;
-  size_t                     num_keys;
-  size_t                     num_collisions;
-  /* NOTE: if we converted buckets into ares__slist_t we could guarantee on
+struct ares_htable {
+  ares_htable_hashfunc_t    hash;
+  ares_htable_bucket_key_t  bucket_key;
+  ares_htable_bucket_free_t bucket_free;
+  ares_htable_key_eq_t      key_eq;
+  unsigned int              seed;
+  unsigned int              size;
+  size_t                    num_keys;
+  size_t                    num_collisions;
+  /* NOTE: if we converted buckets into ares_slist_t we could guarantee on
    *       hash collisions we would have O(log n) worst case insert and search
    *       performance.  (We'd also need to make key_eq into a key_cmp to
    *       support sort).  That said, risk with a random hash seed is near zero,
-   *       and ares__slist_t is heavier weight, so I think using ares__llist_t
+   *       and ares_slist_t is heavier weight, so I think using ares_llist_t
    *       is an overall win. */
-  ares__llist_t            **buckets;
+  ares_llist_t            **buckets;
 };
 
-static unsigned int ares__htable_generate_seed(ares__htable_t *htable)
+static unsigned int ares_htable_generate_seed(ares_htable_t *htable)
 {
+#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
+  /* Seed needs to be static for fuzzing */
+  return 0;
+#else
   unsigned int seed = 0;
   time_t       t    = time(NULL);
 
@@ -61,11 +65,12 @@ static unsigned int ares__htable_generate_seed(ares__htable_t *htable)
   seed |= (unsigned int)((size_t)&seed & 0xFFFFFFFF);
   seed |= (unsigned int)(((ares_uint64_t)t) & 0xFFFFFFFF);
   return seed;
+#endif
 }
 
-static void ares__htable_buckets_destroy(ares__llist_t **buckets,
-                                         unsigned int    size,
-                                         ares_bool_t     destroy_vals)
+static void ares_htable_buckets_destroy(ares_llist_t **buckets,
+                                        unsigned int   size,
+                                        ares_bool_t    destroy_vals)
 {
   unsigned int i;
 
@@ -79,30 +84,30 @@ static void ares__htable_buckets_destroy(ares__llist_t **buckets,
     }
 
     if (!destroy_vals) {
-      ares__llist_replace_destructor(buckets[i], NULL);
+      ares_llist_replace_destructor(buckets[i], NULL);
     }
 
-    ares__llist_destroy(buckets[i]);
+    ares_llist_destroy(buckets[i]);
   }
 
   ares_free(buckets);
 }
 
-void ares__htable_destroy(ares__htable_t *htable)
+void ares_htable_destroy(ares_htable_t *htable)
 {
   if (htable == NULL) {
     return;
   }
-  ares__htable_buckets_destroy(htable->buckets, htable->size, ARES_TRUE);
+  ares_htable_buckets_destroy(htable->buckets, htable->size, ARES_TRUE);
   ares_free(htable);
 }
 
-ares__htable_t *ares__htable_create(ares__htable_hashfunc_t    hash_func,
-                                    ares__htable_bucket_key_t  bucket_key,
-                                    ares__htable_bucket_free_t bucket_free,
-                                    ares__htable_key_eq_t      key_eq)
+ares_htable_t *ares_htable_create(ares_htable_hashfunc_t    hash_func,
+                                  ares_htable_bucket_key_t  bucket_key,
+                                  ares_htable_bucket_free_t bucket_free,
+                                  ares_htable_key_eq_t      key_eq)
 {
-  ares__htable_t *htable = NULL;
+  ares_htable_t *htable = NULL;
 
   if (hash_func == NULL || bucket_key == NULL || bucket_free == NULL ||
       key_eq == NULL) {
@@ -118,7 +123,7 @@ ares__htable_t *ares__htable_create(ares__htable_hashfunc_t    hash_func,
   htable->bucket_key  = bucket_key;
   htable->bucket_free = bucket_free;
   htable->key_eq      = key_eq;
-  htable->seed        = ares__htable_generate_seed(htable);
+  htable->seed        = ares_htable_generate_seed(htable);
   htable->size        = ARES__HTABLE_MIN_BUCKETS;
   htable->buckets = ares_malloc_zero(sizeof(*htable->buckets) * htable->size);
 
@@ -129,11 +134,11 @@ ares__htable_t *ares__htable_create(ares__htable_hashfunc_t    hash_func,
   return htable;
 
 fail:
-  ares__htable_destroy(htable);
+  ares_htable_destroy(htable);
   return NULL;
 }
 
-const void **ares__htable_all_buckets(const ares__htable_t *htable, size_t *num)
+const void **ares_htable_all_buckets(const ares_htable_t *htable, size_t *num)
 {
   const void **out = NULL;
   size_t       cnt = 0;
@@ -151,10 +156,10 @@ const void **ares__htable_all_buckets(const ares__htable_t *htable, size_t *num)
   }
 
   for (i = 0; i < htable->size; i++) {
-    ares__llist_node_t *node;
-    for (node = ares__llist_node_first(htable->buckets[i]); node != NULL;
-         node = ares__llist_node_next(node)) {
-      out[cnt++] = ares__llist_node_val(node);
+    ares_llist_node_t *node;
+    for (node = ares_llist_node_first(htable->buckets[i]); node != NULL;
+         node = ares_llist_node_next(node)) {
+      out[cnt++] = ares_llist_node_val(node);
     }
   }
 
@@ -169,14 +174,14 @@ const void **ares__htable_all_buckets(const ares__htable_t *htable, size_t *num)
  * efficient */
 #define HASH_IDX(h, key) h->hash(key, h->seed) & (h->size - 1)
 
-static ares__llist_node_t *ares__htable_find(const ares__htable_t *htable,
-                                             unsigned int idx, const void *key)
+static ares_llist_node_t *ares_htable_find(const ares_htable_t *htable,
+                                           unsigned int idx, const void *key)
 {
-  ares__llist_node_t *node = NULL;
+  ares_llist_node_t *node = NULL;
 
-  for (node = ares__llist_node_first(htable->buckets[idx]); node != NULL;
-       node = ares__llist_node_next(node)) {
-    if (htable->key_eq(key, htable->bucket_key(ares__llist_node_val(node)))) {
+  for (node = ares_llist_node_first(htable->buckets[idx]); node != NULL;
+       node = ares_llist_node_next(node)) {
+    if (htable->key_eq(key, htable->bucket_key(ares_llist_node_val(node)))) {
       break;
     }
   }
@@ -184,14 +189,14 @@ static ares__llist_node_t *ares__htable_find(const ares__htable_t *htable,
   return node;
 }
 
-static ares_bool_t ares__htable_expand(ares__htable_t *htable)
+static ares_bool_t ares_htable_expand(ares_htable_t *htable)
 {
-  ares__llist_t **buckets  = NULL;
-  unsigned int    old_size = htable->size;
-  size_t          i;
-  ares__llist_t **prealloc_llist     = NULL;
-  size_t          prealloc_llist_len = 0;
-  ares_bool_t     rv                 = ARES_FALSE;
+  ares_llist_t **buckets  = NULL;
+  unsigned int   old_size = htable->size;
+  size_t         i;
+  ares_llist_t **prealloc_llist     = NULL;
+  size_t         prealloc_llist_len = 0;
+  ares_bool_t    rv                 = ARES_FALSE;
 
   /* Not a failure, just won't expand */
   if (old_size == ARES__HTABLE_MAX_BUCKETS) {
@@ -219,7 +224,7 @@ static ares_bool_t ares__htable_expand(ares__htable_t *htable)
     }
   }
   for (i = 0; i < prealloc_llist_len; i++) {
-    prealloc_llist[i] = ares__llist_create(htable->bucket_free);
+    prealloc_llist[i] = ares_llist_create(htable->bucket_free);
     if (prealloc_llist[i] == NULL) {
       goto done;
     }
@@ -228,7 +233,7 @@ static ares_bool_t ares__htable_expand(ares__htable_t *htable)
   /* Iterate across all buckets and move the entries to the new buckets */
   htable->num_collisions = 0;
   for (i = 0; i < old_size; i++) {
-    ares__llist_node_t *node;
+    ares_llist_node_t *node;
 
     /* Nothing in this bucket */
     if (htable->buckets[i] == NULL) {
@@ -238,8 +243,8 @@ static ares_bool_t ares__htable_expand(ares__htable_t *htable)
     /* Fast path optimization (most likely case), there is likely only a single
      * entry in both the source and destination, check for this to confirm and
      * if so, just move the bucket over */
-    if (ares__llist_len(htable->buckets[i]) == 1) {
-      const void *val = ares__llist_first_val(htable->buckets[i]);
+    if (ares_llist_len(htable->buckets[i]) == 1) {
+      const void *val = ares_llist_first_val(htable->buckets[i]);
       size_t      idx = HASH_IDX(htable, htable->bucket_key(val));
 
       if (buckets[idx] == NULL) {
@@ -251,13 +256,13 @@ static ares_bool_t ares__htable_expand(ares__htable_t *htable)
     }
 
     /* Slow path, collisions */
-    while ((node = ares__llist_node_first(htable->buckets[i])) != NULL) {
-      const void *val = ares__llist_node_val(node);
+    while ((node = ares_llist_node_first(htable->buckets[i])) != NULL) {
+      const void *val = ares_llist_node_val(node);
       size_t      idx = HASH_IDX(htable, htable->bucket_key(val));
 
       /* Try fast path again as maybe we popped one collision off and the
        * next we can reuse the llist parent */
-      if (buckets[idx] == NULL && ares__llist_len(htable->buckets[i]) == 1) {
+      if (buckets[idx] == NULL && ares_llist_len(htable->buckets[i]) == 1) {
         /* Swap! */
         buckets[idx]       = htable->buckets[i];
         htable->buckets[i] = NULL;
@@ -277,12 +282,12 @@ static ares_bool_t ares__htable_expand(ares__htable_t *htable)
         htable->num_collisions++;
       }
 
-      ares__llist_node_move_parent_first(node, buckets[idx]);
+      ares_llist_node_mvparent_first(node, buckets[idx]);
     }
 
     /* Abandoned bucket, destroy */
     if (htable->buckets[i] != NULL) {
-      ares__llist_destroy(htable->buckets[i]);
+      ares_llist_destroy(htable->buckets[i]);
       htable->buckets[i] = NULL;
     }
   }
@@ -297,8 +302,8 @@ static ares_bool_t ares__htable_expand(ares__htable_t *htable)
 done:
   ares_free(buckets);
   /* destroy any unused preallocated buckets */
-  ares__htable_buckets_destroy(prealloc_llist, (unsigned int)prealloc_llist_len,
-                               ARES_FALSE);
+  ares_htable_buckets_destroy(prealloc_llist, (unsigned int)prealloc_llist_len,
+                              ARES_FALSE);
 
   /* On failure, we need to restore the htable size */
   if (rv != ARES_TRUE) {
@@ -308,11 +313,11 @@ static ares_bool_t ares__htable_expand(ares__htable_t *htable)
   return rv;
 }
 
-ares_bool_t ares__htable_insert(ares__htable_t *htable, void *bucket)
+ares_bool_t ares_htable_insert(ares_htable_t *htable, void *bucket)
 {
-  unsigned int        idx  = 0;
-  ares__llist_node_t *node = NULL;
-  const void         *key  = NULL;
+  unsigned int       idx  = 0;
+  ares_llist_node_t *node = NULL;
+  const void        *key  = NULL;
 
   if (htable == NULL || bucket == NULL) {
     return ARES_FALSE;
@@ -323,9 +328,9 @@ ares_bool_t ares__htable_insert(ares__htable_t *htable, void *bucket)
   idx = HASH_IDX(htable, key);
 
   /* See if we have a matching bucket already, if so, replace it */
-  node = ares__htable_find(htable, idx, key);
+  node = ares_htable_find(htable, idx, key);
   if (node != NULL) {
-    ares__llist_node_replace(node, bucket);
+    ares_llist_node_replace(node, bucket);
     return ARES_TRUE;
   }
 
@@ -333,7 +338,7 @@ ares_bool_t ares__htable_insert(ares__htable_t *htable, void *bucket)
    * increased beyond our threshold */
   if (htable->num_keys + 1 >
       (htable->size * ARES__HTABLE_EXPAND_PERCENT) / 100) {
-    if (!ares__htable_expand(htable)) {
+    if (!ares_htable_expand(htable)) {
       return ARES_FALSE; /* LCOV_EXCL_LINE */
     }
     /* If we expanded, need to calculate a new index */
@@ -342,19 +347,19 @@ ares_bool_t ares__htable_insert(ares__htable_t *htable, void *bucket)
 
   /* We lazily allocate the linked list */
   if (htable->buckets[idx] == NULL) {
-    htable->buckets[idx] = ares__llist_create(htable->bucket_free);
+    htable->buckets[idx] = ares_llist_create(htable->bucket_free);
     if (htable->buckets[idx] == NULL) {
       return ARES_FALSE;
     }
   }
 
-  node = ares__llist_insert_first(htable->buckets[idx], bucket);
+  node = ares_llist_insert_first(htable->buckets[idx], bucket);
   if (node == NULL) {
     return ARES_FALSE;
   }
 
   /* Track collisions for rehash stability */
-  if (ares__llist_len(htable->buckets[idx]) > 1) {
+  if (ares_llist_len(htable->buckets[idx]) > 1) {
     htable->num_collisions++;
   }
 
@@ -363,7 +368,7 @@ ares_bool_t ares__htable_insert(ares__htable_t *htable, void *bucket)
   return ARES_TRUE;
 }
 
-void *ares__htable_get(const ares__htable_t *htable, const void *key)
+void *ares_htable_get(const ares_htable_t *htable, const void *key)
 {
   unsigned int idx;
 
@@ -373,20 +378,20 @@ void *ares__htable_get(const ares__htable_t *htable, const void *key)
 
   idx = HASH_IDX(htable, key);
 
-  return ares__llist_node_val(ares__htable_find(htable, idx, key));
+  return ares_llist_node_val(ares_htable_find(htable, idx, key));
 }
 
-ares_bool_t ares__htable_remove(ares__htable_t *htable, const void *key)
+ares_bool_t ares_htable_remove(ares_htable_t *htable, const void *key)
 {
-  ares__llist_node_t *node;
-  unsigned int        idx;
+  ares_llist_node_t *node;
+  unsigned int       idx;
 
   if (htable == NULL || key == NULL) {
     return ARES_FALSE;
   }
 
   idx  = HASH_IDX(htable, key);
-  node = ares__htable_find(htable, idx, key);
+  node = ares_htable_find(htable, idx, key);
   if (node == NULL) {
     return ARES_FALSE;
   }
@@ -394,15 +399,15 @@ ares_bool_t ares__htable_remove(ares__htable_t *htable, const void *key)
   htable->num_keys--;
 
   /* Reduce collisions */
-  if (ares__llist_len(ares__llist_node_parent(node)) > 1) {
+  if (ares_llist_len(ares_llist_node_parent(node)) > 1) {
     htable->num_collisions--;
   }
 
-  ares__llist_node_destroy(node);
+  ares_llist_node_destroy(node);
   return ARES_TRUE;
 }
 
-size_t ares__htable_num_keys(const ares__htable_t *htable)
+size_t ares_htable_num_keys(const ares_htable_t *htable)
 {
   if (htable == NULL) {
     return 0;
@@ -410,16 +415,15 @@ size_t ares__htable_num_keys(const ares__htable_t *htable)
   return htable->num_keys;
 }
 
-unsigned int ares__htable_hash_FNV1a(const unsigned char *key, size_t key_len,
-                                     unsigned int seed)
+unsigned int ares_htable_hash_FNV1a(const unsigned char *key, size_t key_len,
+                                    unsigned int seed)
 {
-  /* recommended seed is 2166136261U, but we don't want collisions */
-  unsigned int hv = seed;
+  unsigned int hv = seed ^ 2166136261U;
   size_t       i;
 
   for (i = 0; i < key_len; i++) {
     hv ^= (unsigned int)key[i];
-    /* hv *= 0x01000193 */
+    /* hv *= 16777619 (0x01000193) */
     hv += (hv << 1) + (hv << 4) + (hv << 7) + (hv << 8) + (hv << 24);
   }
 
@@ -427,16 +431,15 @@ unsigned int ares__htable_hash_FNV1a(const unsigned char *key, size_t key_len,
 }
 
 /* Case insensitive version, meant for ASCII strings */
-unsigned int ares__htable_hash_FNV1a_casecmp(const unsigned char *key,
-                                             size_t key_len, unsigned int seed)
+unsigned int ares_htable_hash_FNV1a_casecmp(const unsigned char *key,
+                                            size_t key_len, unsigned int seed)
 {
-  /* recommended seed is 2166136261U, but we don't want collisions */
-  unsigned int hv = seed;
+  unsigned int hv = seed ^ 2166136261U;
   size_t       i;
 
   for (i = 0; i < key_len; i++) {
-    hv ^= (unsigned int)ares__tolower(key[i]);
-    /* hv *= 0x01000193 */
+    hv ^= (unsigned int)ares_tolower(key[i]);
+    /* hv *= 16777619 (0x01000193) */
     hv += (hv << 1) + (hv << 4) + (hv << 7) + (hv << 8) + (hv << 24);
   }
 
diff --git a/deps/cares/src/lib/dsa/ares__htable.h b/deps/cares/src/lib/dsa/ares_htable.h
similarity index 76%
rename from deps/cares/src/lib/dsa/ares__htable.h
rename to deps/cares/src/lib/dsa/ares_htable.h
index d09c865977cdae..5700286eb0fabf 100644
--- a/deps/cares/src/lib/dsa/ares__htable.h
+++ b/deps/cares/src/lib/dsa/ares_htable.h
@@ -27,7 +27,7 @@
 #define __ARES__HTABLE_H
 
 
-/*! \addtogroup ares__htable Base HashTable Data Structure
+/*! \addtogroup ares_htable Base HashTable Data Structure
  *
  * This is a basic hashtable data structure that is meant to be wrapped
  * by a higher level implementation.  This data structure is designed to
@@ -45,10 +45,10 @@
  * @{
  */
 
-struct ares__htable;
+struct ares_htable;
 
 /*! Opaque data type for generic hash table implementation */
-typedef struct ares__htable ares__htable_t;
+typedef struct ares_htable ares_htable_t;
 
 /*! Callback for generating a hash of the key.
  *
@@ -58,21 +58,21 @@ typedef struct ares__htable ares__htable_t;
  *                   but otherwise will not change between calls.
  *  \return hash
  */
-typedef unsigned int (*ares__htable_hashfunc_t)(const void  *key,
-                                                unsigned int seed);
+typedef unsigned int (*ares_htable_hashfunc_t)(const void  *key,
+                                               unsigned int seed);
 
 /*! Callback to free the bucket
  *
  *  \param[in] bucket  user provided bucket
  */
-typedef void (*ares__htable_bucket_free_t)(void *bucket);
+typedef void (*ares_htable_bucket_free_t)(void *bucket);
 
 /*! Callback to extract the key from the user-provided bucket
  *
  *  \param[in] bucket  user provided bucket
  *  \return pointer to key held in bucket
  */
-typedef const void *(*ares__htable_bucket_key_t)(const void *bucket);
+typedef const void *(*ares_htable_bucket_key_t)(const void *bucket);
 
 /*! Callback to compare two keys for equality
  *
@@ -80,15 +80,14 @@ typedef const void *(*ares__htable_bucket_key_t)(const void *bucket);
  *  \param[in] key2  second key
  *  \return ARES_TRUE if equal, ARES_FALSE if not
  */
-typedef ares_bool_t (*ares__htable_key_eq_t)(const void *key1,
-                                             const void *key2);
+typedef ares_bool_t (*ares_htable_key_eq_t)(const void *key1, const void *key2);
 
 
 /*! Destroy the initialized hashtable
  *
  *  \param[in] htable initialized hashtable
  */
-void            ares__htable_destroy(ares__htable_t *htable);
+void           ares_htable_destroy(ares_htable_t *htable);
 
 /*! Create a new hashtable
  *
@@ -98,17 +97,17 @@ void            ares__htable_destroy(ares__htable_t *htable);
  *  \param[in] key_eq      Required. Callback to check for key equality.
  *  \return initialized hashtable.  NULL if out of memory or misuse.
  */
-ares__htable_t *ares__htable_create(ares__htable_hashfunc_t    hash_func,
-                                    ares__htable_bucket_key_t  bucket_key,
-                                    ares__htable_bucket_free_t bucket_free,
-                                    ares__htable_key_eq_t      key_eq);
+ares_htable_t *ares_htable_create(ares_htable_hashfunc_t    hash_func,
+                                  ares_htable_bucket_key_t  bucket_key,
+                                  ares_htable_bucket_free_t bucket_free,
+                                  ares_htable_key_eq_t      key_eq);
 
 /*! Count of keys from initialized hashtable
  *
  *  \param[in] htable  Initialized hashtable.
  *  \return count of keys
  */
-size_t          ares__htable_num_keys(const ares__htable_t *htable);
+size_t         ares_htable_num_keys(const ares_htable_t *htable);
 
 /*! Retrieve an array of buckets from the hashtable.  This is mainly used as
  *  a helper for retrieving an array of keys.
@@ -120,8 +119,7 @@ size_t          ares__htable_num_keys(const ares__htable_t *htable);
  *          will be a dangling pointer.  It is expected wrappers will make
  *          such values safe by duplicating them.
  */
-const void    **ares__htable_all_buckets(const ares__htable_t *htable,
-                                         size_t               *num);
+const void **ares_htable_all_buckets(const ares_htable_t *htable, size_t *num);
 
 /*! Insert bucket into hashtable
  *
@@ -130,7 +128,7 @@ const void    **ares__htable_all_buckets(const ares__htable_t *htable,
  *                     allowed to be NULL.
  *  \return ARES_TRUE on success, ARES_FALSE if out of memory
  */
-ares_bool_t     ares__htable_insert(ares__htable_t *htable, void *bucket);
+ares_bool_t  ares_htable_insert(ares_htable_t *htable, void *bucket);
 
 /*! Retrieve bucket from hashtable based on key.
  *
@@ -138,7 +136,7 @@ ares_bool_t     ares__htable_insert(ares__htable_t *htable, void *bucket);
  *  \param[in] key     Pointer to key to use for comparison.
  *  \return matching bucket, or NULL if not found.
  */
-void           *ares__htable_get(const ares__htable_t *htable, const void *key);
+void        *ares_htable_get(const ares_htable_t *htable, const void *key);
 
 /*! Remove bucket from hashtable by key
  *
@@ -146,7 +144,7 @@ void           *ares__htable_get(const ares__htable_t *htable, const void *key);
  *  \param[in] key     Pointer to key to use for comparison
  *  \return ARES_TRUE if found, ARES_FALSE if not found
  */
-ares_bool_t     ares__htable_remove(ares__htable_t *htable, const void *key);
+ares_bool_t  ares_htable_remove(ares_htable_t *htable, const void *key);
 
 /*! FNV1a hash algorithm.  Can be used as underlying primitive for building
  *  a wrapper hashtable.
@@ -156,8 +154,8 @@ ares_bool_t     ares__htable_remove(ares__htable_t *htable, const void *key);
  *  \param[in] seed     Seed for generating hash
  *  \return hash value
  */
-unsigned int ares__htable_hash_FNV1a(const unsigned char *key, size_t key_len,
-                                     unsigned int seed);
+unsigned int ares_htable_hash_FNV1a(const unsigned char *key, size_t key_len,
+                                    unsigned int seed);
 
 /*! FNV1a hash algorithm, but converts all characters to lowercase before
  *  hashing to make the hash case-insensitive. Can be used as underlying
@@ -168,8 +166,8 @@ unsigned int ares__htable_hash_FNV1a(const unsigned char *key, size_t key_len,
  *  \param[in] seed     Seed for generating hash
  *  \return hash value
  */
-unsigned int ares__htable_hash_FNV1a_casecmp(const unsigned char *key,
-                                             size_t key_len, unsigned int seed);
+unsigned int ares_htable_hash_FNV1a_casecmp(const unsigned char *key,
+                                            size_t key_len, unsigned int seed);
 
 /*! @} */
 
diff --git a/deps/cares/src/lib/dsa/ares__htable_asvp.c b/deps/cares/src/lib/dsa/ares_htable_asvp.c
similarity index 62%
rename from deps/cares/src/lib/dsa/ares__htable_asvp.c
rename to deps/cares/src/lib/dsa/ares_htable_asvp.c
index 4b9267ff6c14ee..32f4d2c9949962 100644
--- a/deps/cares/src/lib/dsa/ares__htable_asvp.c
+++ b/deps/cares/src/lib/dsa/ares_htable_asvp.c
@@ -24,46 +24,45 @@
  * SPDX-License-Identifier: MIT
  */
 #include "ares_private.h"
-#include "ares__htable.h"
-#include "ares__htable_asvp.h"
+#include "ares_htable.h"
+#include "ares_htable_asvp.h"
 
-struct ares__htable_asvp {
-  ares__htable_asvp_val_free_t free_val;
-  ares__htable_t              *hash;
+struct ares_htable_asvp {
+  ares_htable_asvp_val_free_t free_val;
+  ares_htable_t              *hash;
 };
 
 typedef struct {
-  ares_socket_t        key;
-  void                *val;
-  ares__htable_asvp_t *parent;
-} ares__htable_asvp_bucket_t;
+  ares_socket_t       key;
+  void               *val;
+  ares_htable_asvp_t *parent;
+} ares_htable_asvp_bucket_t;
 
-void ares__htable_asvp_destroy(ares__htable_asvp_t *htable)
+void ares_htable_asvp_destroy(ares_htable_asvp_t *htable)
 {
   if (htable == NULL) {
     return;
   }
 
-  ares__htable_destroy(htable->hash);
+  ares_htable_destroy(htable->hash);
   ares_free(htable);
 }
 
 static unsigned int hash_func(const void *key, unsigned int seed)
 {
   const ares_socket_t *arg = key;
-  return ares__htable_hash_FNV1a((const unsigned char *)arg, sizeof(*arg),
-                                 seed);
+  return ares_htable_hash_FNV1a((const unsigned char *)arg, sizeof(*arg), seed);
 }
 
 static const void *bucket_key(const void *bucket)
 {
-  const ares__htable_asvp_bucket_t *arg = bucket;
+  const ares_htable_asvp_bucket_t *arg = bucket;
   return &arg->key;
 }
 
 static void bucket_free(void *bucket)
 {
-  ares__htable_asvp_bucket_t *arg = bucket;
+  ares_htable_asvp_bucket_t *arg = bucket;
 
   if (arg->parent->free_val) {
     arg->parent->free_val(arg->val);
@@ -84,16 +83,15 @@ static ares_bool_t key_eq(const void *key1, const void *key2)
   return ARES_FALSE;
 }
 
-ares__htable_asvp_t *
-  ares__htable_asvp_create(ares__htable_asvp_val_free_t val_free)
+ares_htable_asvp_t *
+  ares_htable_asvp_create(ares_htable_asvp_val_free_t val_free)
 {
-  ares__htable_asvp_t *htable = ares_malloc(sizeof(*htable));
+  ares_htable_asvp_t *htable = ares_malloc(sizeof(*htable));
   if (htable == NULL) {
     goto fail;
   }
 
-  htable->hash =
-    ares__htable_create(hash_func, bucket_key, bucket_free, key_eq);
+  htable->hash = ares_htable_create(hash_func, bucket_key, bucket_free, key_eq);
   if (htable->hash == NULL) {
     goto fail;
   }
@@ -104,14 +102,14 @@ ares__htable_asvp_t *
 
 fail:
   if (htable) {
-    ares__htable_destroy(htable->hash);
+    ares_htable_destroy(htable->hash);
     ares_free(htable);
   }
   return NULL;
 }
 
-ares_socket_t *ares__htable_asvp_keys(const ares__htable_asvp_t *htable,
-                                      size_t                    *num)
+ares_socket_t *ares_htable_asvp_keys(const ares_htable_asvp_t *htable,
+                                     size_t                   *num)
 {
   const void   **buckets = NULL;
   size_t         cnt     = 0;
@@ -124,7 +122,7 @@ ares_socket_t *ares__htable_asvp_keys(const ares__htable_asvp_t *htable,
 
   *num = 0;
 
-  buckets = ares__htable_all_buckets(htable->hash, &cnt);
+  buckets = ares_htable_all_buckets(htable->hash, &cnt);
   if (buckets == NULL || cnt == 0) {
     return NULL;
   }
@@ -136,7 +134,7 @@ ares_socket_t *ares__htable_asvp_keys(const ares__htable_asvp_t *htable,
   }
 
   for (i = 0; i < cnt; i++) {
-    out[i] = ((const ares__htable_asvp_bucket_t *)buckets[i])->key;
+    out[i] = ((const ares_htable_asvp_bucket_t *)buckets[i])->key;
   }
 
   ares_free(buckets);
@@ -144,10 +142,10 @@ ares_socket_t *ares__htable_asvp_keys(const ares__htable_asvp_t *htable,
   return out;
 }
 
-ares_bool_t ares__htable_asvp_insert(ares__htable_asvp_t *htable,
-                                     ares_socket_t key, void *val)
+ares_bool_t ares_htable_asvp_insert(ares_htable_asvp_t *htable,
+                                    ares_socket_t key, void *val)
 {
-  ares__htable_asvp_bucket_t *bucket = NULL;
+  ares_htable_asvp_bucket_t *bucket = NULL;
 
   if (htable == NULL) {
     goto fail;
@@ -162,7 +160,7 @@ ares_bool_t ares__htable_asvp_insert(ares__htable_asvp_t *htable,
   bucket->key    = key;
   bucket->val    = val;
 
-  if (!ares__htable_insert(htable->hash, bucket)) {
+  if (!ares_htable_insert(htable->hash, bucket)) {
     goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
@@ -175,10 +173,10 @@ ares_bool_t ares__htable_asvp_insert(ares__htable_asvp_t *htable,
   return ARES_FALSE;
 }
 
-ares_bool_t ares__htable_asvp_get(const ares__htable_asvp_t *htable,
-                                  ares_socket_t key, void **val)
+ares_bool_t ares_htable_asvp_get(const ares_htable_asvp_t *htable,
+                                 ares_socket_t key, void **val)
 {
-  ares__htable_asvp_bucket_t *bucket = NULL;
+  ares_htable_asvp_bucket_t *bucket = NULL;
 
   if (val) {
     *val = NULL;
@@ -188,7 +186,7 @@ ares_bool_t ares__htable_asvp_get(const ares__htable_asvp_t *htable,
     return ARES_FALSE;
   }
 
-  bucket = ares__htable_get(htable->hash, &key);
+  bucket = ares_htable_get(htable->hash, &key);
   if (bucket == NULL) {
     return ARES_FALSE;
   }
@@ -199,28 +197,28 @@ ares_bool_t ares__htable_asvp_get(const ares__htable_asvp_t *htable,
   return ARES_TRUE;
 }
 
-void *ares__htable_asvp_get_direct(const ares__htable_asvp_t *htable,
-                                   ares_socket_t              key)
+void *ares_htable_asvp_get_direct(const ares_htable_asvp_t *htable,
+                                  ares_socket_t             key)
 {
   void *val = NULL;
-  ares__htable_asvp_get(htable, key, &val);
+  ares_htable_asvp_get(htable, key, &val);
   return val;
 }
 
-ares_bool_t ares__htable_asvp_remove(ares__htable_asvp_t *htable,
-                                     ares_socket_t        key)
+ares_bool_t ares_htable_asvp_remove(ares_htable_asvp_t *htable,
+                                    ares_socket_t       key)
 {
   if (htable == NULL) {
     return ARES_FALSE;
   }
 
-  return ares__htable_remove(htable->hash, &key);
+  return ares_htable_remove(htable->hash, &key);
 }
 
-size_t ares__htable_asvp_num_keys(const ares__htable_asvp_t *htable)
+size_t ares_htable_asvp_num_keys(const ares_htable_asvp_t *htable)
 {
   if (htable == NULL) {
     return 0;
   }
-  return ares__htable_num_keys(htable->hash);
+  return ares_htable_num_keys(htable->hash);
 }
diff --git a/deps/cares/src/lib/dsa/ares_htable_dict.c b/deps/cares/src/lib/dsa/ares_htable_dict.c
new file mode 100644
index 00000000000000..93d7a20137c8db
--- /dev/null
+++ b/deps/cares/src/lib/dsa/ares_htable_dict.c
@@ -0,0 +1,228 @@
+/* MIT License
+ *
+ * Copyright (c) 2024 Brad House
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+#include "ares_private.h"
+#include "ares_htable.h"
+#include "ares_htable_dict.h"
+
+struct ares_htable_dict {
+  ares_htable_t *hash;
+};
+
+typedef struct {
+  char               *key;
+  char               *val;
+  ares_htable_dict_t *parent;
+} ares_htable_dict_bucket_t;
+
+void ares_htable_dict_destroy(ares_htable_dict_t *htable)
+{
+  if (htable == NULL) {
+    return; /* LCOV_EXCL_LINE: DefensiveCoding */
+  }
+
+  ares_htable_destroy(htable->hash);
+  ares_free(htable);
+}
+
+static unsigned int hash_func(const void *key, unsigned int seed)
+{
+  return ares_htable_hash_FNV1a_casecmp(key, ares_strlen(key), seed);
+}
+
+static const void *bucket_key(const void *bucket)
+{
+  const ares_htable_dict_bucket_t *arg = bucket;
+  return arg->key;
+}
+
+static void bucket_free(void *bucket)
+{
+  ares_htable_dict_bucket_t *arg = bucket;
+
+  ares_free(arg->key);
+  ares_free(arg->val);
+
+  ares_free(arg);
+}
+
+static ares_bool_t key_eq(const void *key1, const void *key2)
+{
+  return ares_strcaseeq(key1, key2);
+}
+
+ares_htable_dict_t *ares_htable_dict_create(void)
+{
+  ares_htable_dict_t *htable = ares_malloc(sizeof(*htable));
+  if (htable == NULL) {
+    goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
+  }
+
+  htable->hash = ares_htable_create(hash_func, bucket_key, bucket_free, key_eq);
+  if (htable->hash == NULL) {
+    goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
+  }
+
+  return htable;
+
+/* LCOV_EXCL_START: OutOfMemory */
+fail:
+  if (htable) {
+    ares_htable_destroy(htable->hash);
+    ares_free(htable);
+  }
+  return NULL;
+  /* LCOV_EXCL_STOP */
+}
+
+ares_bool_t ares_htable_dict_insert(ares_htable_dict_t *htable, const char *key,
+                                    const char *val)
+{
+  ares_htable_dict_bucket_t *bucket = NULL;
+
+  if (htable == NULL || ares_strlen(key) == 0) {
+    goto fail;
+  }
+
+  bucket = ares_malloc_zero(sizeof(*bucket));
+  if (bucket == NULL) {
+    goto fail;
+  }
+
+  bucket->parent = htable;
+  bucket->key    = ares_strdup(key);
+  if (bucket->key == NULL) {
+    goto fail;
+  }
+
+  if (val != NULL) {
+    bucket->val = ares_strdup(val);
+    if (bucket->val == NULL) {
+      goto fail;
+    }
+  }
+
+  if (!ares_htable_insert(htable->hash, bucket)) {
+    goto fail;
+  }
+
+  return ARES_TRUE;
+
+fail:
+  if (bucket) {
+    ares_free(bucket->val);
+    ares_free(bucket);
+  }
+  return ARES_FALSE;
+}
+
+ares_bool_t ares_htable_dict_get(const ares_htable_dict_t *htable,
+                                 const char *key, const char **val)
+{
+  const ares_htable_dict_bucket_t *bucket = NULL;
+
+  if (val) {
+    *val = NULL;
+  }
+
+  if (htable == NULL) {
+    return ARES_FALSE;
+  }
+
+  bucket = ares_htable_get(htable->hash, key);
+  if (bucket == NULL) {
+    return ARES_FALSE;
+  }
+
+  if (val) {
+    *val = bucket->val;
+  }
+  return ARES_TRUE;
+}
+
+const char *ares_htable_dict_get_direct(const ares_htable_dict_t *htable,
+                                        const char               *key)
+{
+  const char *val = NULL;
+  ares_htable_dict_get(htable, key, &val);
+  return val;
+}
+
+ares_bool_t ares_htable_dict_remove(ares_htable_dict_t *htable, const char *key)
+{
+  if (htable == NULL) {
+    return ARES_FALSE;
+  }
+
+  return ares_htable_remove(htable->hash, key);
+}
+
+size_t ares_htable_dict_num_keys(const ares_htable_dict_t *htable)
+{
+  if (htable == NULL) {
+    return 0;
+  }
+  return ares_htable_num_keys(htable->hash);
+}
+
+char **ares_htable_dict_keys(const ares_htable_dict_t *htable, size_t *num)
+{
+  const void **buckets = NULL;
+  size_t       cnt     = 0;
+  char       **out     = NULL;
+  size_t       i;
+
+  if (htable == NULL || num == NULL) {
+    return NULL; /* LCOV_EXCL_LINE: DefensiveCoding */
+  }
+
+  *num = 0;
+
+  buckets = ares_htable_all_buckets(htable->hash, &cnt);
+  if (buckets == NULL || cnt == 0) {
+    return NULL;
+  }
+
+  out = ares_malloc_zero(sizeof(*out) * cnt);
+  if (out == NULL) {
+    goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
+  }
+
+  for (i = 0; i < cnt; i++) {
+    out[i] = ares_strdup(((const ares_htable_dict_bucket_t *)buckets[i])->key);
+    if (out[i] == NULL) {
+      goto fail;
+    }
+  }
+
+  ares_free(buckets);
+  *num = cnt;
+  return out;
+
+fail:
+  *num = 0;
+  ares_free_array(out, cnt, ares_free);
+  return NULL;
+}
diff --git a/deps/cares/src/lib/dsa/ares__htable_strvp.c b/deps/cares/src/lib/dsa/ares_htable_strvp.c
similarity index 54%
rename from deps/cares/src/lib/dsa/ares__htable_strvp.c
rename to deps/cares/src/lib/dsa/ares_htable_strvp.c
index d73a1928a75f95..daca117e80f3bb 100644
--- a/deps/cares/src/lib/dsa/ares__htable_strvp.c
+++ b/deps/cares/src/lib/dsa/ares_htable_strvp.c
@@ -24,46 +24,46 @@
  * SPDX-License-Identifier: MIT
  */
 #include "ares_private.h"
-#include "ares__htable.h"
-#include "ares__htable_strvp.h"
+#include "ares_htable.h"
+#include "ares_htable_strvp.h"
 
-struct ares__htable_strvp {
-  ares__htable_strvp_val_free_t free_val;
-  ares__htable_t               *hash;
+struct ares_htable_strvp {
+  ares_htable_strvp_val_free_t free_val;
+  ares_htable_t               *hash;
 };
 
 typedef struct {
-  char                 *key;
-  void                 *val;
-  ares__htable_strvp_t *parent;
-} ares__htable_strvp_bucket_t;
+  char                *key;
+  void                *val;
+  ares_htable_strvp_t *parent;
+} ares_htable_strvp_bucket_t;
 
-void ares__htable_strvp_destroy(ares__htable_strvp_t *htable)
+void ares_htable_strvp_destroy(ares_htable_strvp_t *htable)
 {
   if (htable == NULL) {
     return;
   }
 
-  ares__htable_destroy(htable->hash);
+  ares_htable_destroy(htable->hash);
   ares_free(htable);
 }
 
 static unsigned int hash_func(const void *key, unsigned int seed)
 {
   const char *arg = key;
-  return ares__htable_hash_FNV1a_casecmp((const unsigned char *)arg,
-                                         ares_strlen(arg), seed);
+  return ares_htable_hash_FNV1a_casecmp((const unsigned char *)arg,
+                                        ares_strlen(arg), seed);
 }
 
 static const void *bucket_key(const void *bucket)
 {
-  const ares__htable_strvp_bucket_t *arg = bucket;
+  const ares_htable_strvp_bucket_t *arg = bucket;
   return arg->key;
 }
 
 static void bucket_free(void *bucket)
 {
-  ares__htable_strvp_bucket_t *arg = bucket;
+  ares_htable_strvp_bucket_t *arg = bucket;
 
   if (arg->parent->free_val) {
     arg->parent->free_val(arg->val);
@@ -74,26 +74,18 @@ static void bucket_free(void *bucket)
 
 static ares_bool_t key_eq(const void *key1, const void *key2)
 {
-  const char *k1 = key1;
-  const char *k2 = key2;
-
-  if (strcasecmp(k1, k2) == 0) {
-    return ARES_TRUE;
-  }
-
-  return ARES_FALSE;
+  return ares_strcaseeq(key1, key2);
 }
 
-ares__htable_strvp_t *
-  ares__htable_strvp_create(ares__htable_strvp_val_free_t val_free)
+ares_htable_strvp_t *
+  ares_htable_strvp_create(ares_htable_strvp_val_free_t val_free)
 {
-  ares__htable_strvp_t *htable = ares_malloc(sizeof(*htable));
+  ares_htable_strvp_t *htable = ares_malloc(sizeof(*htable));
   if (htable == NULL) {
     goto fail;
   }
 
-  htable->hash =
-    ares__htable_create(hash_func, bucket_key, bucket_free, key_eq);
+  htable->hash = ares_htable_create(hash_func, bucket_key, bucket_free, key_eq);
   if (htable->hash == NULL) {
     goto fail;
   }
@@ -104,16 +96,16 @@ ares__htable_strvp_t *
 
 fail:
   if (htable) {
-    ares__htable_destroy(htable->hash);
+    ares_htable_destroy(htable->hash);
     ares_free(htable);
   }
   return NULL;
 }
 
-ares_bool_t ares__htable_strvp_insert(ares__htable_strvp_t *htable,
-                                      const char *key, void *val)
+ares_bool_t ares_htable_strvp_insert(ares_htable_strvp_t *htable,
+                                     const char *key, void *val)
 {
-  ares__htable_strvp_bucket_t *bucket = NULL;
+  ares_htable_strvp_bucket_t *bucket = NULL;
 
   if (htable == NULL || key == NULL) {
     goto fail;
@@ -131,7 +123,7 @@ ares_bool_t ares__htable_strvp_insert(ares__htable_strvp_t *htable,
   }
   bucket->val = val;
 
-  if (!ares__htable_insert(htable->hash, bucket)) {
+  if (!ares_htable_insert(htable->hash, bucket)) {
     goto fail;
   }
 
@@ -145,10 +137,10 @@ ares_bool_t ares__htable_strvp_insert(ares__htable_strvp_t *htable,
   return ARES_FALSE;
 }
 
-ares_bool_t ares__htable_strvp_get(const ares__htable_strvp_t *htable,
-                                   const char *key, void **val)
+ares_bool_t ares_htable_strvp_get(const ares_htable_strvp_t *htable,
+                                  const char *key, void **val)
 {
-  ares__htable_strvp_bucket_t *bucket = NULL;
+  ares_htable_strvp_bucket_t *bucket = NULL;
 
   if (val) {
     *val = NULL;
@@ -158,7 +150,7 @@ ares_bool_t ares__htable_strvp_get(const ares__htable_strvp_t *htable,
     return ARES_FALSE;
   }
 
-  bucket = ares__htable_get(htable->hash, key);
+  bucket = ares_htable_get(htable->hash, key);
   if (bucket == NULL) {
     return ARES_FALSE;
   }
@@ -169,28 +161,50 @@ ares_bool_t ares__htable_strvp_get(const ares__htable_strvp_t *htable,
   return ARES_TRUE;
 }
 
-void *ares__htable_strvp_get_direct(const ares__htable_strvp_t *htable,
-                                    const char                 *key)
+void *ares_htable_strvp_get_direct(const ares_htable_strvp_t *htable,
+                                   const char                *key)
 {
   void *val = NULL;
-  ares__htable_strvp_get(htable, key, &val);
+  ares_htable_strvp_get(htable, key, &val);
   return val;
 }
 
-ares_bool_t ares__htable_strvp_remove(ares__htable_strvp_t *htable,
-                                      const char           *key)
+ares_bool_t ares_htable_strvp_remove(ares_htable_strvp_t *htable,
+                                     const char          *key)
 {
   if (htable == NULL) {
     return ARES_FALSE;
   }
 
-  return ares__htable_remove(htable->hash, key);
+  return ares_htable_remove(htable->hash, key);
+}
+
+void *ares_htable_strvp_claim(ares_htable_strvp_t *htable, const char *key)
+{
+  ares_htable_strvp_bucket_t *bucket = NULL;
+  void                       *val;
+
+  if (htable == NULL || key == NULL) {
+    return NULL;
+  }
+
+  bucket = ares_htable_get(htable->hash, key);
+  if (bucket == NULL) {
+    return NULL;
+  }
+
+  /* Unassociate value from bucket */
+  val         = bucket->val;
+  bucket->val = NULL;
+
+  ares_htable_strvp_remove(htable, key);
+  return val;
 }
 
-size_t ares__htable_strvp_num_keys(const ares__htable_strvp_t *htable)
+size_t ares_htable_strvp_num_keys(const ares_htable_strvp_t *htable)
 {
   if (htable == NULL) {
     return 0;
   }
-  return ares__htable_num_keys(htable->hash);
+  return ares_htable_num_keys(htable->hash);
 }
diff --git a/deps/cares/src/lib/dsa/ares__htable_szvp.c b/deps/cares/src/lib/dsa/ares_htable_szvp.c
similarity index 61%
rename from deps/cares/src/lib/dsa/ares__htable_szvp.c
rename to deps/cares/src/lib/dsa/ares_htable_szvp.c
index b3e88d8b9a434a..fdaae0a571c80c 100644
--- a/deps/cares/src/lib/dsa/ares__htable_szvp.c
+++ b/deps/cares/src/lib/dsa/ares_htable_szvp.c
@@ -24,46 +24,45 @@
  * SPDX-License-Identifier: MIT
  */
 #include "ares_private.h"
-#include "ares__htable.h"
-#include "ares__htable_szvp.h"
+#include "ares_htable.h"
+#include "ares_htable_szvp.h"
 
-struct ares__htable_szvp {
-  ares__htable_szvp_val_free_t free_val;
-  ares__htable_t              *hash;
+struct ares_htable_szvp {
+  ares_htable_szvp_val_free_t free_val;
+  ares_htable_t              *hash;
 };
 
 typedef struct {
-  size_t               key;
-  void                *val;
-  ares__htable_szvp_t *parent;
-} ares__htable_szvp_bucket_t;
+  size_t              key;
+  void               *val;
+  ares_htable_szvp_t *parent;
+} ares_htable_szvp_bucket_t;
 
-void ares__htable_szvp_destroy(ares__htable_szvp_t *htable)
+void ares_htable_szvp_destroy(ares_htable_szvp_t *htable)
 {
   if (htable == NULL) {
     return;
   }
 
-  ares__htable_destroy(htable->hash);
+  ares_htable_destroy(htable->hash);
   ares_free(htable);
 }
 
 static unsigned int hash_func(const void *key, unsigned int seed)
 {
   const size_t *arg = key;
-  return ares__htable_hash_FNV1a((const unsigned char *)arg, sizeof(*arg),
-                                 seed);
+  return ares_htable_hash_FNV1a((const unsigned char *)arg, sizeof(*arg), seed);
 }
 
 static const void *bucket_key(const void *bucket)
 {
-  const ares__htable_szvp_bucket_t *arg = bucket;
+  const ares_htable_szvp_bucket_t *arg = bucket;
   return &arg->key;
 }
 
 static void bucket_free(void *bucket)
 {
-  ares__htable_szvp_bucket_t *arg = bucket;
+  ares_htable_szvp_bucket_t *arg = bucket;
 
   if (arg->parent->free_val) {
     arg->parent->free_val(arg->val);
@@ -84,16 +83,15 @@ static ares_bool_t key_eq(const void *key1, const void *key2)
   return ARES_FALSE;
 }
 
-ares__htable_szvp_t *
-  ares__htable_szvp_create(ares__htable_szvp_val_free_t val_free)
+ares_htable_szvp_t *
+  ares_htable_szvp_create(ares_htable_szvp_val_free_t val_free)
 {
-  ares__htable_szvp_t *htable = ares_malloc(sizeof(*htable));
+  ares_htable_szvp_t *htable = ares_malloc(sizeof(*htable));
   if (htable == NULL) {
     goto fail;
   }
 
-  htable->hash =
-    ares__htable_create(hash_func, bucket_key, bucket_free, key_eq);
+  htable->hash = ares_htable_create(hash_func, bucket_key, bucket_free, key_eq);
   if (htable->hash == NULL) {
     goto fail;
   }
@@ -104,16 +102,16 @@ ares__htable_szvp_t *
 
 fail:
   if (htable) {
-    ares__htable_destroy(htable->hash);
+    ares_htable_destroy(htable->hash);
     ares_free(htable);
   }
   return NULL;
 }
 
-ares_bool_t ares__htable_szvp_insert(ares__htable_szvp_t *htable, size_t key,
-                                     void *val)
+ares_bool_t ares_htable_szvp_insert(ares_htable_szvp_t *htable, size_t key,
+                                    void *val)
 {
-  ares__htable_szvp_bucket_t *bucket = NULL;
+  ares_htable_szvp_bucket_t *bucket = NULL;
 
   if (htable == NULL) {
     goto fail;
@@ -128,7 +126,7 @@ ares_bool_t ares__htable_szvp_insert(ares__htable_szvp_t *htable, size_t key,
   bucket->key    = key;
   bucket->val    = val;
 
-  if (!ares__htable_insert(htable->hash, bucket)) {
+  if (!ares_htable_insert(htable->hash, bucket)) {
     goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
@@ -141,10 +139,10 @@ ares_bool_t ares__htable_szvp_insert(ares__htable_szvp_t *htable, size_t key,
   return ARES_FALSE;
 }
 
-ares_bool_t ares__htable_szvp_get(const ares__htable_szvp_t *htable, size_t key,
-                                  void **val)
+ares_bool_t ares_htable_szvp_get(const ares_htable_szvp_t *htable, size_t key,
+                                 void **val)
 {
-  ares__htable_szvp_bucket_t *bucket = NULL;
+  ares_htable_szvp_bucket_t *bucket = NULL;
 
   if (val) {
     *val = NULL;
@@ -154,7 +152,7 @@ ares_bool_t ares__htable_szvp_get(const ares__htable_szvp_t *htable, size_t key,
     return ARES_FALSE;
   }
 
-  bucket = ares__htable_get(htable->hash, &key);
+  bucket = ares_htable_get(htable->hash, &key);
   if (bucket == NULL) {
     return ARES_FALSE;
   }
@@ -165,27 +163,26 @@ ares_bool_t ares__htable_szvp_get(const ares__htable_szvp_t *htable, size_t key,
   return ARES_TRUE;
 }
 
-void *ares__htable_szvp_get_direct(const ares__htable_szvp_t *htable,
-                                   size_t                     key)
+void *ares_htable_szvp_get_direct(const ares_htable_szvp_t *htable, size_t key)
 {
   void *val = NULL;
-  ares__htable_szvp_get(htable, key, &val);
+  ares_htable_szvp_get(htable, key, &val);
   return val;
 }
 
-ares_bool_t ares__htable_szvp_remove(ares__htable_szvp_t *htable, size_t key)
+ares_bool_t ares_htable_szvp_remove(ares_htable_szvp_t *htable, size_t key)
 {
   if (htable == NULL) {
     return ARES_FALSE;
   }
 
-  return ares__htable_remove(htable->hash, &key);
+  return ares_htable_remove(htable->hash, &key);
 }
 
-size_t ares__htable_szvp_num_keys(const ares__htable_szvp_t *htable)
+size_t ares_htable_szvp_num_keys(const ares_htable_szvp_t *htable)
 {
   if (htable == NULL) {
     return 0;
   }
-  return ares__htable_num_keys(htable->hash);
+  return ares_htable_num_keys(htable->hash);
 }
diff --git a/deps/cares/src/lib/dsa/ares_htable_vpstr.c b/deps/cares/src/lib/dsa/ares_htable_vpstr.c
new file mode 100644
index 00000000000000..86c881f768613b
--- /dev/null
+++ b/deps/cares/src/lib/dsa/ares_htable_vpstr.c
@@ -0,0 +1,186 @@
+/* MIT License
+ *
+ * Copyright (c) 2024 Brad House
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+#include "ares_private.h"
+#include "ares_htable.h"
+#include "ares_htable_vpstr.h"
+
+struct ares_htable_vpstr {
+  ares_htable_t *hash;
+};
+
+typedef struct {
+  void                *key;
+  char                *val;
+  ares_htable_vpstr_t *parent;
+} ares_htable_vpstr_bucket_t;
+
+void ares_htable_vpstr_destroy(ares_htable_vpstr_t *htable)
+{
+  if (htable == NULL) {
+    return; /* LCOV_EXCL_LINE: DefensiveCoding */
+  }
+
+  ares_htable_destroy(htable->hash);
+  ares_free(htable);
+}
+
+static unsigned int hash_func(const void *key, unsigned int seed)
+{
+  return ares_htable_hash_FNV1a((const unsigned char *)&key, sizeof(key), seed);
+}
+
+static const void *bucket_key(const void *bucket)
+{
+  const ares_htable_vpstr_bucket_t *arg = bucket;
+  return arg->key;
+}
+
+static void bucket_free(void *bucket)
+{
+  ares_htable_vpstr_bucket_t *arg = bucket;
+
+  ares_free(arg->val);
+
+  ares_free(arg);
+}
+
+static ares_bool_t key_eq(const void *key1, const void *key2)
+{
+  if (key1 == key2) {
+    return ARES_TRUE;
+  }
+
+  return ARES_FALSE;
+}
+
+ares_htable_vpstr_t *ares_htable_vpstr_create(void)
+{
+  ares_htable_vpstr_t *htable = ares_malloc(sizeof(*htable));
+  if (htable == NULL) {
+    goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
+  }
+
+  htable->hash = ares_htable_create(hash_func, bucket_key, bucket_free, key_eq);
+  if (htable->hash == NULL) {
+    goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
+  }
+
+  return htable;
+
+/* LCOV_EXCL_START: OutOfMemory */
+fail:
+  if (htable) {
+    ares_htable_destroy(htable->hash);
+    ares_free(htable);
+  }
+  return NULL;
+  /* LCOV_EXCL_STOP */
+}
+
+ares_bool_t ares_htable_vpstr_insert(ares_htable_vpstr_t *htable, void *key,
+                                     const char *val)
+{
+  ares_htable_vpstr_bucket_t *bucket = NULL;
+
+  if (htable == NULL) {
+    goto fail;
+  }
+
+  bucket = ares_malloc(sizeof(*bucket));
+  if (bucket == NULL) {
+    goto fail;
+  }
+
+  bucket->parent = htable;
+  bucket->key    = key;
+  bucket->val    = ares_strdup(val);
+  if (bucket->val == NULL) {
+    goto fail;
+  }
+
+  if (!ares_htable_insert(htable->hash, bucket)) {
+    goto fail;
+  }
+
+  return ARES_TRUE;
+
+fail:
+  if (bucket) {
+    ares_free(bucket->val);
+    ares_free(bucket);
+  }
+  return ARES_FALSE;
+}
+
+ares_bool_t ares_htable_vpstr_get(const ares_htable_vpstr_t *htable,
+                                  const void *key, const char **val)
+{
+  const ares_htable_vpstr_bucket_t *bucket = NULL;
+
+  if (val) {
+    *val = NULL;
+  }
+
+  if (htable == NULL) {
+    return ARES_FALSE;
+  }
+
+  bucket = ares_htable_get(htable->hash, key);
+  if (bucket == NULL) {
+    return ARES_FALSE;
+  }
+
+  if (val) {
+    *val = bucket->val;
+  }
+  return ARES_TRUE;
+}
+
+const char *ares_htable_vpstr_get_direct(const ares_htable_vpstr_t *htable,
+                                         const void                *key)
+{
+  const char *val = NULL;
+  ares_htable_vpstr_get(htable, key, &val);
+  return val;
+}
+
+ares_bool_t ares_htable_vpstr_remove(ares_htable_vpstr_t *htable,
+                                     const void          *key)
+{
+  if (htable == NULL) {
+    return ARES_FALSE;
+  }
+
+  return ares_htable_remove(htable->hash, key);
+}
+
+size_t ares_htable_vpstr_num_keys(const ares_htable_vpstr_t *htable)
+{
+  if (htable == NULL) {
+    return 0;
+  }
+  return ares_htable_num_keys(htable->hash);
+}
diff --git a/deps/cares/src/lib/dsa/ares__htable_vpvp.c b/deps/cares/src/lib/dsa/ares_htable_vpvp.c
similarity index 60%
rename from deps/cares/src/lib/dsa/ares__htable_vpvp.c
rename to deps/cares/src/lib/dsa/ares_htable_vpvp.c
index 9042c48dd7fba2..14fd6e9da097ed 100644
--- a/deps/cares/src/lib/dsa/ares__htable_vpvp.c
+++ b/deps/cares/src/lib/dsa/ares_htable_vpvp.c
@@ -24,46 +24,45 @@
  * SPDX-License-Identifier: MIT
  */
 #include "ares_private.h"
-#include "ares__htable.h"
-#include "ares__htable_vpvp.h"
+#include "ares_htable.h"
+#include "ares_htable_vpvp.h"
 
-struct ares__htable_vpvp {
-  ares__htable_vpvp_key_free_t free_key;
-  ares__htable_vpvp_val_free_t free_val;
-  ares__htable_t              *hash;
+struct ares_htable_vpvp {
+  ares_htable_vpvp_key_free_t free_key;
+  ares_htable_vpvp_val_free_t free_val;
+  ares_htable_t              *hash;
 };
 
 typedef struct {
-  void                *key;
-  void                *val;
-  ares__htable_vpvp_t *parent;
-} ares__htable_vpvp_bucket_t;
+  void               *key;
+  void               *val;
+  ares_htable_vpvp_t *parent;
+} ares_htable_vpvp_bucket_t;
 
-void ares__htable_vpvp_destroy(ares__htable_vpvp_t *htable)
+void ares_htable_vpvp_destroy(ares_htable_vpvp_t *htable)
 {
   if (htable == NULL) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  ares__htable_destroy(htable->hash);
+  ares_htable_destroy(htable->hash);
   ares_free(htable);
 }
 
 static unsigned int hash_func(const void *key, unsigned int seed)
 {
-  return ares__htable_hash_FNV1a((const unsigned char *)&key, sizeof(key),
-                                 seed);
+  return ares_htable_hash_FNV1a((const unsigned char *)&key, sizeof(key), seed);
 }
 
 static const void *bucket_key(const void *bucket)
 {
-  const ares__htable_vpvp_bucket_t *arg = bucket;
+  const ares_htable_vpvp_bucket_t *arg = bucket;
   return arg->key;
 }
 
 static void bucket_free(void *bucket)
 {
-  ares__htable_vpvp_bucket_t *arg = bucket;
+  ares_htable_vpvp_bucket_t *arg = bucket;
 
   if (arg->parent->free_key) {
     arg->parent->free_key(arg->key);
@@ -85,17 +84,16 @@ static ares_bool_t key_eq(const void *key1, const void *key2)
   return ARES_FALSE;
 }
 
-ares__htable_vpvp_t *
-  ares__htable_vpvp_create(ares__htable_vpvp_key_free_t key_free,
-                           ares__htable_vpvp_val_free_t val_free)
+ares_htable_vpvp_t *
+  ares_htable_vpvp_create(ares_htable_vpvp_key_free_t key_free,
+                          ares_htable_vpvp_val_free_t val_free)
 {
-  ares__htable_vpvp_t *htable = ares_malloc(sizeof(*htable));
+  ares_htable_vpvp_t *htable = ares_malloc(sizeof(*htable));
   if (htable == NULL) {
     goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  htable->hash =
-    ares__htable_create(hash_func, bucket_key, bucket_free, key_eq);
+  htable->hash = ares_htable_create(hash_func, bucket_key, bucket_free, key_eq);
   if (htable->hash == NULL) {
     goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -108,17 +106,17 @@ ares__htable_vpvp_t *
 /* LCOV_EXCL_START: OutOfMemory */
 fail:
   if (htable) {
-    ares__htable_destroy(htable->hash);
+    ares_htable_destroy(htable->hash);
     ares_free(htable);
   }
   return NULL;
   /* LCOV_EXCL_STOP */
 }
 
-ares_bool_t ares__htable_vpvp_insert(ares__htable_vpvp_t *htable, void *key,
-                                     void *val)
+ares_bool_t ares_htable_vpvp_insert(ares_htable_vpvp_t *htable, void *key,
+                                    void *val)
 {
-  ares__htable_vpvp_bucket_t *bucket = NULL;
+  ares_htable_vpvp_bucket_t *bucket = NULL;
 
   if (htable == NULL) {
     goto fail;
@@ -133,7 +131,7 @@ ares_bool_t ares__htable_vpvp_insert(ares__htable_vpvp_t *htable, void *key,
   bucket->key    = key;
   bucket->val    = val;
 
-  if (!ares__htable_insert(htable->hash, bucket)) {
+  if (!ares_htable_insert(htable->hash, bucket)) {
     goto fail;
   }
 
@@ -146,10 +144,10 @@ ares_bool_t ares__htable_vpvp_insert(ares__htable_vpvp_t *htable, void *key,
   return ARES_FALSE;
 }
 
-ares_bool_t ares__htable_vpvp_get(const ares__htable_vpvp_t *htable,
-                                  const void *key, void **val)
+ares_bool_t ares_htable_vpvp_get(const ares_htable_vpvp_t *htable,
+                                 const void *key, void **val)
 {
-  ares__htable_vpvp_bucket_t *bucket = NULL;
+  ares_htable_vpvp_bucket_t *bucket = NULL;
 
   if (val) {
     *val = NULL;
@@ -159,7 +157,7 @@ ares_bool_t ares__htable_vpvp_get(const ares__htable_vpvp_t *htable,
     return ARES_FALSE;
   }
 
-  bucket = ares__htable_get(htable->hash, key);
+  bucket = ares_htable_get(htable->hash, key);
   if (bucket == NULL) {
     return ARES_FALSE;
   }
@@ -170,28 +168,27 @@ ares_bool_t ares__htable_vpvp_get(const ares__htable_vpvp_t *htable,
   return ARES_TRUE;
 }
 
-void *ares__htable_vpvp_get_direct(const ares__htable_vpvp_t *htable,
-                                   const void                *key)
+void *ares_htable_vpvp_get_direct(const ares_htable_vpvp_t *htable,
+                                  const void               *key)
 {
   void *val = NULL;
-  ares__htable_vpvp_get(htable, key, &val);
+  ares_htable_vpvp_get(htable, key, &val);
   return val;
 }
 
-ares_bool_t ares__htable_vpvp_remove(ares__htable_vpvp_t *htable,
-                                     const void          *key)
+ares_bool_t ares_htable_vpvp_remove(ares_htable_vpvp_t *htable, const void *key)
 {
   if (htable == NULL) {
     return ARES_FALSE;
   }
 
-  return ares__htable_remove(htable->hash, key);
+  return ares_htable_remove(htable->hash, key);
 }
 
-size_t ares__htable_vpvp_num_keys(const ares__htable_vpvp_t *htable)
+size_t ares_htable_vpvp_num_keys(const ares_htable_vpvp_t *htable)
 {
   if (htable == NULL) {
     return 0;
   }
-  return ares__htable_num_keys(htable->hash);
+  return ares_htable_num_keys(htable->hash);
 }
diff --git a/deps/cares/src/lib/dsa/ares__llist.c b/deps/cares/src/lib/dsa/ares_llist.c
similarity index 52%
rename from deps/cares/src/lib/dsa/ares__llist.c
rename to deps/cares/src/lib/dsa/ares_llist.c
index 96936c1abe7c07..6bd7de269a43fb 100644
--- a/deps/cares/src/lib/dsa/ares__llist.c
+++ b/deps/cares/src/lib/dsa/ares_llist.c
@@ -24,25 +24,25 @@
  * SPDX-License-Identifier: MIT
  */
 #include "ares_private.h"
-#include "ares__llist.h"
+#include "ares_llist.h"
 
-struct ares__llist {
-  ares__llist_node_t      *head;
-  ares__llist_node_t      *tail;
-  ares__llist_destructor_t destruct;
-  size_t                   cnt;
+struct ares_llist {
+  ares_llist_node_t      *head;
+  ares_llist_node_t      *tail;
+  ares_llist_destructor_t destruct;
+  size_t                  cnt;
 };
 
-struct ares__llist_node {
-  void               *data;
-  ares__llist_node_t *prev;
-  ares__llist_node_t *next;
-  ares__llist_t      *parent;
+struct ares_llist_node {
+  void              *data;
+  ares_llist_node_t *prev;
+  ares_llist_node_t *next;
+  ares_llist_t      *parent;
 };
 
-ares__llist_t *ares__llist_create(ares__llist_destructor_t destruct)
+ares_llist_t *ares_llist_create(ares_llist_destructor_t destruct)
 {
-  ares__llist_t *list = ares_malloc_zero(sizeof(*list));
+  ares_llist_t *list = ares_malloc_zero(sizeof(*list));
 
   if (list == NULL) {
     return NULL;
@@ -53,8 +53,8 @@ ares__llist_t *ares__llist_create(ares__llist_destructor_t destruct)
   return list;
 }
 
-void ares__llist_replace_destructor(ares__llist_t           *list,
-                                    ares__llist_destructor_t destruct)
+void ares_llist_replace_destructor(ares_llist_t           *list,
+                                   ares_llist_destructor_t destruct)
 {
   if (list == NULL) {
     return;
@@ -67,12 +67,11 @@ typedef enum {
   ARES__LLIST_INSERT_HEAD,
   ARES__LLIST_INSERT_TAIL,
   ARES__LLIST_INSERT_BEFORE
-} ares__llist_insert_type_t;
+} ares_llist_insert_type_t;
 
-static void ares__llist_attach_at(ares__llist_t            *list,
-                                  ares__llist_insert_type_t type,
-                                  ares__llist_node_t       *at,
-                                  ares__llist_node_t       *node)
+static void ares_llist_attach_at(ares_llist_t            *list,
+                                 ares_llist_insert_type_t type,
+                                 ares_llist_node_t *at, ares_llist_node_t *node)
 {
   if (list == NULL || node == NULL) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
@@ -117,12 +116,11 @@ static void ares__llist_attach_at(ares__llist_t            *list,
   list->cnt++;
 }
 
-static ares__llist_node_t *ares__llist_insert_at(ares__llist_t            *list,
-                                                 ares__llist_insert_type_t type,
-                                                 ares__llist_node_t       *at,
-                                                 void                     *val)
+static ares_llist_node_t *ares_llist_insert_at(ares_llist_t            *list,
+                                               ares_llist_insert_type_t type,
+                                               ares_llist_node_t *at, void *val)
 {
-  ares__llist_node_t *node = NULL;
+  ares_llist_node_t *node = NULL;
 
   if (list == NULL || val == NULL) {
     return NULL; /* LCOV_EXCL_LINE: DefensiveCoding */
@@ -135,48 +133,46 @@ static ares__llist_node_t *ares__llist_insert_at(ares__llist_t            *list,
   }
 
   node->data = val;
-  ares__llist_attach_at(list, type, at, node);
+  ares_llist_attach_at(list, type, at, node);
 
   return node;
 }
 
-ares__llist_node_t *ares__llist_insert_first(ares__llist_t *list, void *val)
+ares_llist_node_t *ares_llist_insert_first(ares_llist_t *list, void *val)
 {
-  return ares__llist_insert_at(list, ARES__LLIST_INSERT_HEAD, NULL, val);
+  return ares_llist_insert_at(list, ARES__LLIST_INSERT_HEAD, NULL, val);
 }
 
-ares__llist_node_t *ares__llist_insert_last(ares__llist_t *list, void *val)
+ares_llist_node_t *ares_llist_insert_last(ares_llist_t *list, void *val)
 {
-  return ares__llist_insert_at(list, ARES__LLIST_INSERT_TAIL, NULL, val);
+  return ares_llist_insert_at(list, ARES__LLIST_INSERT_TAIL, NULL, val);
 }
 
-ares__llist_node_t *ares__llist_insert_before(ares__llist_node_t *node,
-                                              void               *val)
+ares_llist_node_t *ares_llist_insert_before(ares_llist_node_t *node, void *val)
 {
   if (node == NULL) {
     return NULL;
   }
 
-  return ares__llist_insert_at(node->parent, ARES__LLIST_INSERT_BEFORE, node,
-                               val);
+  return ares_llist_insert_at(node->parent, ARES__LLIST_INSERT_BEFORE, node,
+                              val);
 }
 
-ares__llist_node_t *ares__llist_insert_after(ares__llist_node_t *node,
-                                             void               *val)
+ares_llist_node_t *ares_llist_insert_after(ares_llist_node_t *node, void *val)
 {
   if (node == NULL) {
     return NULL;
   }
 
   if (node->next == NULL) {
-    return ares__llist_insert_last(node->parent, val);
+    return ares_llist_insert_last(node->parent, val);
   }
 
-  return ares__llist_insert_at(node->parent, ARES__LLIST_INSERT_BEFORE,
-                               node->next, val);
+  return ares_llist_insert_at(node->parent, ARES__LLIST_INSERT_BEFORE,
+                              node->next, val);
 }
 
-ares__llist_node_t *ares__llist_node_first(ares__llist_t *list)
+ares_llist_node_t *ares_llist_node_first(ares_llist_t *list)
 {
   if (list == NULL) {
     return NULL;
@@ -184,10 +180,10 @@ ares__llist_node_t *ares__llist_node_first(ares__llist_t *list)
   return list->head;
 }
 
-ares__llist_node_t *ares__llist_node_idx(ares__llist_t *list, size_t idx)
+ares_llist_node_t *ares_llist_node_idx(ares_llist_t *list, size_t idx)
 {
-  ares__llist_node_t *node;
-  size_t              cnt;
+  ares_llist_node_t *node;
+  size_t             cnt;
 
   if (list == NULL) {
     return NULL;
@@ -204,7 +200,7 @@ ares__llist_node_t *ares__llist_node_idx(ares__llist_t *list, size_t idx)
   return node;
 }
 
-ares__llist_node_t *ares__llist_node_last(ares__llist_t *list)
+ares_llist_node_t *ares_llist_node_last(ares_llist_t *list)
 {
   if (list == NULL) {
     return NULL;
@@ -212,7 +208,7 @@ ares__llist_node_t *ares__llist_node_last(ares__llist_t *list)
   return list->tail;
 }
 
-ares__llist_node_t *ares__llist_node_next(ares__llist_node_t *node)
+ares_llist_node_t *ares_llist_node_next(ares_llist_node_t *node)
 {
   if (node == NULL) {
     return NULL;
@@ -220,7 +216,7 @@ ares__llist_node_t *ares__llist_node_next(ares__llist_node_t *node)
   return node->next;
 }
 
-ares__llist_node_t *ares__llist_node_prev(ares__llist_node_t *node)
+ares_llist_node_t *ares_llist_node_prev(ares_llist_node_t *node)
 {
   if (node == NULL) {
     return NULL;
@@ -228,7 +224,7 @@ ares__llist_node_t *ares__llist_node_prev(ares__llist_node_t *node)
   return node->prev;
 }
 
-void *ares__llist_node_val(ares__llist_node_t *node)
+void *ares_llist_node_val(ares_llist_node_t *node)
 {
   if (node == NULL) {
     return NULL;
@@ -237,7 +233,7 @@ void *ares__llist_node_val(ares__llist_node_t *node)
   return node->data;
 }
 
-size_t ares__llist_len(const ares__llist_t *list)
+size_t ares_llist_len(const ares_llist_t *list)
 {
   if (list == NULL) {
     return 0;
@@ -245,7 +241,7 @@ size_t ares__llist_len(const ares__llist_t *list)
   return list->cnt;
 }
 
-ares__llist_t *ares__llist_node_parent(ares__llist_node_t *node)
+ares_llist_t *ares_llist_node_parent(ares_llist_node_t *node)
 {
   if (node == NULL) {
     return NULL;
@@ -253,19 +249,19 @@ ares__llist_t *ares__llist_node_parent(ares__llist_node_t *node)
   return node->parent;
 }
 
-void *ares__llist_first_val(ares__llist_t *list)
+void *ares_llist_first_val(ares_llist_t *list)
 {
-  return ares__llist_node_val(ares__llist_node_first(list));
+  return ares_llist_node_val(ares_llist_node_first(list));
 }
 
-void *ares__llist_last_val(ares__llist_t *list)
+void *ares_llist_last_val(ares_llist_t *list)
 {
-  return ares__llist_node_val(ares__llist_node_last(list));
+  return ares_llist_node_val(ares_llist_node_last(list));
 }
 
-static void ares__llist_node_detach(ares__llist_node_t *node)
+static void ares_llist_node_detach(ares_llist_node_t *node)
 {
-  ares__llist_t *list;
+  ares_llist_t *list;
 
   if (node == NULL) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
@@ -293,7 +289,7 @@ static void ares__llist_node_detach(ares__llist_node_t *node)
   list->cnt--;
 }
 
-void *ares__llist_node_claim(ares__llist_node_t *node)
+void *ares_llist_node_claim(ares_llist_node_t *node)
 {
   void *val;
 
@@ -302,16 +298,16 @@ void *ares__llist_node_claim(ares__llist_node_t *node)
   }
 
   val = node->data;
-  ares__llist_node_detach(node);
+  ares_llist_node_detach(node);
   ares_free(node);
 
   return val;
 }
 
-void ares__llist_node_destroy(ares__llist_node_t *node)
+void ares_llist_node_destroy(ares_llist_node_t *node)
 {
-  ares__llist_destructor_t destruct;
-  void                    *val;
+  ares_llist_destructor_t destruct;
+  void                   *val;
 
   if (node == NULL) {
     return;
@@ -319,15 +315,15 @@ void ares__llist_node_destroy(ares__llist_node_t *node)
 
   destruct = node->parent->destruct;
 
-  val = ares__llist_node_claim(node);
+  val = ares_llist_node_claim(node);
   if (val != NULL && destruct != NULL) {
     destruct(val);
   }
 }
 
-void ares__llist_node_replace(ares__llist_node_t *node, void *val)
+void ares_llist_node_replace(ares_llist_node_t *node, void *val)
 {
-  ares__llist_destructor_t destruct;
+  ares_llist_destructor_t destruct;
 
   if (node == NULL) {
     return;
@@ -341,46 +337,46 @@ void ares__llist_node_replace(ares__llist_node_t *node, void *val)
   node->data = val;
 }
 
-void ares__llist_clear(ares__llist_t *list)
+void ares_llist_clear(ares_llist_t *list)
 {
-  ares__llist_node_t *node;
+  ares_llist_node_t *node;
 
   if (list == NULL) {
     return;
   }
 
-  while ((node = ares__llist_node_first(list)) != NULL) {
-    ares__llist_node_destroy(node);
+  while ((node = ares_llist_node_first(list)) != NULL) {
+    ares_llist_node_destroy(node);
   }
 }
 
-void ares__llist_destroy(ares__llist_t *list)
+void ares_llist_destroy(ares_llist_t *list)
 {
   if (list == NULL) {
     return;
   }
-  ares__llist_clear(list);
+  ares_llist_clear(list);
   ares_free(list);
 }
 
-void ares__llist_node_move_parent_last(ares__llist_node_t *node,
-                                       ares__llist_t      *new_parent)
+void ares_llist_node_mvparent_last(ares_llist_node_t *node,
+                                   ares_llist_t      *new_parent)
 {
   if (node == NULL || new_parent == NULL) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  ares__llist_node_detach(node);
-  ares__llist_attach_at(new_parent, ARES__LLIST_INSERT_TAIL, NULL, node);
+  ares_llist_node_detach(node);
+  ares_llist_attach_at(new_parent, ARES__LLIST_INSERT_TAIL, NULL, node);
 }
 
-void ares__llist_node_move_parent_first(ares__llist_node_t *node,
-                                        ares__llist_t      *new_parent)
+void ares_llist_node_mvparent_first(ares_llist_node_t *node,
+                                    ares_llist_t      *new_parent)
 {
   if (node == NULL || new_parent == NULL) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  ares__llist_node_detach(node);
-  ares__llist_attach_at(new_parent, ARES__LLIST_INSERT_HEAD, NULL, node);
+  ares_llist_node_detach(node);
+  ares_llist_attach_at(new_parent, ARES__LLIST_INSERT_HEAD, NULL, node);
 }
diff --git a/deps/cares/src/lib/dsa/ares__slist.c b/deps/cares/src/lib/dsa/ares_slist.c
similarity index 70%
rename from deps/cares/src/lib/dsa/ares__slist.c
rename to deps/cares/src/lib/dsa/ares_slist.c
index f0e3f8b14a9885..7e68347994ce4c 100644
--- a/deps/cares/src/lib/dsa/ares__slist.c
+++ b/deps/cares/src/lib/dsa/ares_slist.c
@@ -24,39 +24,39 @@
  * SPDX-License-Identifier: MIT
  */
 #include "ares_private.h"
-#include "ares__slist.h"
+#include "ares_slist.h"
 
 /* SkipList implementation */
 
 #define ARES__SLIST_START_LEVELS 4
 
-struct ares__slist {
-  ares_rand_state         *rand_state;
-  unsigned char            rand_data[8];
-  size_t                   rand_bits;
+struct ares_slist {
+  ares_rand_state        *rand_state;
+  unsigned char           rand_data[8];
+  size_t                  rand_bits;
 
-  ares__slist_node_t     **head;
-  size_t                   levels;
-  ares__slist_node_t      *tail;
+  ares_slist_node_t     **head;
+  size_t                  levels;
+  ares_slist_node_t      *tail;
 
-  ares__slist_cmp_t        cmp;
-  ares__slist_destructor_t destruct;
-  size_t                   cnt;
+  ares_slist_cmp_t        cmp;
+  ares_slist_destructor_t destruct;
+  size_t                  cnt;
 };
 
-struct ares__slist_node {
-  void                *data;
-  ares__slist_node_t **prev;
-  ares__slist_node_t **next;
-  size_t               levels;
-  ares__slist_t       *parent;
+struct ares_slist_node {
+  void               *data;
+  ares_slist_node_t **prev;
+  ares_slist_node_t **next;
+  size_t              levels;
+  ares_slist_t       *parent;
 };
 
-ares__slist_t *ares__slist_create(ares_rand_state         *rand_state,
-                                  ares__slist_cmp_t        cmp,
-                                  ares__slist_destructor_t destruct)
+ares_slist_t *ares_slist_create(ares_rand_state        *rand_state,
+                                ares_slist_cmp_t        cmp,
+                                ares_slist_destructor_t destruct)
 {
-  ares__slist_t *list;
+  ares_slist_t *list;
 
   if (rand_state == NULL || cmp == NULL) {
     return NULL;
@@ -82,18 +82,17 @@ ares__slist_t *ares__slist_create(ares_rand_state         *rand_state,
   return list;
 }
 
-static ares_bool_t ares__slist_coin_flip(ares__slist_t *list)
+static ares_bool_t ares_slist_coin_flip(ares_slist_t *list)
 {
   size_t total_bits = sizeof(list->rand_data) * 8;
   size_t bit;
 
   /* Refill random data used for coin flips.  We pull this in 8 byte chunks.
-   * ares__rand_bytes() has some built-in caching of its own so we don't need
+   * ares_rand_bytes() has some built-in caching of its own so we don't need
    * to be excessive in caching ourselves.  Prefer to require less memory per
    * skiplist */
   if (list->rand_bits == 0) {
-    ares__rand_bytes(list->rand_state, list->rand_data,
-                     sizeof(list->rand_data));
+    ares_rand_bytes(list->rand_state, list->rand_data, sizeof(list->rand_data));
     list->rand_bits = total_bits;
   }
 
@@ -103,8 +102,8 @@ static ares_bool_t ares__slist_coin_flip(ares__slist_t *list)
   return (list->rand_data[bit / 8] & (1 << (bit % 8))) ? ARES_TRUE : ARES_FALSE;
 }
 
-void ares__slist_replace_destructor(ares__slist_t           *list,
-                                    ares__slist_destructor_t destruct)
+void ares_slist_replace_destructor(ares_slist_t           *list,
+                                   ares_slist_destructor_t destruct)
 {
   if (list == NULL) {
     return;
@@ -113,14 +112,14 @@ void ares__slist_replace_destructor(ares__slist_t           *list,
   list->destruct = destruct;
 }
 
-static size_t ares__slist_max_level(const ares__slist_t *list)
+static size_t ares_slist_max_level(const ares_slist_t *list)
 {
   size_t max_level = 0;
 
   if (list->cnt + 1 <= (1 << ARES__SLIST_START_LEVELS)) {
     max_level = ARES__SLIST_START_LEVELS;
   } else {
-    max_level = ares__log2(ares__round_up_pow2(list->cnt + 1));
+    max_level = ares_log2(ares_round_up_pow2(list->cnt + 1));
   }
 
   if (list->levels > max_level) {
@@ -130,21 +129,21 @@ static size_t ares__slist_max_level(const ares__slist_t *list)
   return max_level;
 }
 
-static size_t ares__slist_calc_level(ares__slist_t *list)
+static size_t ares_slist_calc_level(ares_slist_t *list)
 {
-  size_t max_level = ares__slist_max_level(list);
+  size_t max_level = ares_slist_max_level(list);
   size_t level;
 
-  for (level = 1; ares__slist_coin_flip(list) && level < max_level; level++)
+  for (level = 1; ares_slist_coin_flip(list) && level < max_level; level++)
     ;
 
   return level;
 }
 
-static void ares__slist_node_push(ares__slist_t *list, ares__slist_node_t *node)
+static void ares_slist_node_push(ares_slist_t *list, ares_slist_node_t *node)
 {
-  size_t              i;
-  ares__slist_node_t *left = NULL;
+  size_t             i;
+  ares_slist_node_t *left = NULL;
 
   /* Scan from highest level in the slist, even if we're not using that number
    * of levels for this entry as this is what makes it O(log n) */
@@ -193,9 +192,9 @@ static void ares__slist_node_push(ares__slist_t *list, ares__slist_node_t *node)
   }
 }
 
-ares__slist_node_t *ares__slist_insert(ares__slist_t *list, void *val)
+ares_slist_node_t *ares_slist_insert(ares_slist_t *list, void *val)
 {
-  ares__slist_node_t *node = NULL;
+  ares_slist_node_t *node = NULL;
 
   if (list == NULL || val == NULL) {
     return NULL;
@@ -211,7 +210,7 @@ ares__slist_node_t *ares__slist_insert(ares__slist_t *list, void *val)
   node->parent = list;
 
   /* Randomly determine the number of levels we want to use */
-  node->levels = ares__slist_calc_level(list);
+  node->levels = ares_slist_calc_level(list);
 
   /* Allocate array of next and prev nodes for linking each level */
   node->next = ares_malloc_zero(sizeof(*node->next) * node->levels);
@@ -238,7 +237,7 @@ ares__slist_node_t *ares__slist_insert(ares__slist_t *list, void *val)
     list->levels = node->levels;
   }
 
-  ares__slist_node_push(list, node);
+  ares_slist_node_push(list, node);
 
   list->cnt++;
 
@@ -255,10 +254,10 @@ ares__slist_node_t *ares__slist_insert(ares__slist_t *list, void *val)
   /* LCOV_EXCL_STOP */
 }
 
-static void ares__slist_node_pop(ares__slist_node_t *node)
+static void ares_slist_node_pop(ares_slist_node_t *node)
 {
-  ares__slist_t *list = node->parent;
-  size_t         i;
+  ares_slist_t *list = node->parent;
+  size_t        i;
 
   /* relink each node at each level */
   for (i = node->levels; i-- > 0;) {
@@ -281,10 +280,10 @@ static void ares__slist_node_pop(ares__slist_node_t *node)
   memset(node->prev, 0, sizeof(*node->prev) * node->levels);
 }
 
-void *ares__slist_node_claim(ares__slist_node_t *node)
+void *ares_slist_node_claim(ares_slist_node_t *node)
 {
-  ares__slist_t *list;
-  void          *val;
+  ares_slist_t *list;
+  void         *val;
 
   if (node == NULL) {
     return NULL;
@@ -293,7 +292,7 @@ void *ares__slist_node_claim(ares__slist_node_t *node)
   list = node->parent;
   val  = node->data;
 
-  ares__slist_node_pop(node);
+  ares_slist_node_pop(node);
 
   ares_free(node->next);
   ares_free(node->prev);
@@ -304,9 +303,9 @@ void *ares__slist_node_claim(ares__slist_node_t *node)
   return val;
 }
 
-void ares__slist_node_reinsert(ares__slist_node_t *node)
+void ares_slist_node_reinsert(ares_slist_node_t *node)
 {
-  ares__slist_t *list;
+  ares_slist_t *list;
 
   if (node == NULL) {
     return;
@@ -314,15 +313,16 @@ void ares__slist_node_reinsert(ares__slist_node_t *node)
 
   list = node->parent;
 
-  ares__slist_node_pop(node);
-  ares__slist_node_push(list, node);
+  ares_slist_node_pop(node);
+  ares_slist_node_push(list, node);
 }
 
-ares__slist_node_t *ares__slist_node_find(ares__slist_t *list, const void *val)
+ares_slist_node_t *ares_slist_node_find(const ares_slist_t *list,
+                                        const void         *val)
 {
-  size_t              i;
-  ares__slist_node_t *node = NULL;
-  int                 rv   = -1;
+  size_t             i;
+  ares_slist_node_t *node = NULL;
+  int                rv   = -1;
 
   if (list == NULL || val == NULL) {
     return NULL;
@@ -377,7 +377,7 @@ ares__slist_node_t *ares__slist_node_find(ares__slist_t *list, const void *val)
   return node;
 }
 
-ares__slist_node_t *ares__slist_node_first(ares__slist_t *list)
+ares_slist_node_t *ares_slist_node_first(const ares_slist_t *list)
 {
   if (list == NULL) {
     return NULL;
@@ -386,7 +386,7 @@ ares__slist_node_t *ares__slist_node_first(ares__slist_t *list)
   return list->head[0];
 }
 
-ares__slist_node_t *ares__slist_node_last(ares__slist_t *list)
+ares_slist_node_t *ares_slist_node_last(const ares_slist_t *list)
 {
   if (list == NULL) {
     return NULL;
@@ -394,7 +394,7 @@ ares__slist_node_t *ares__slist_node_last(ares__slist_t *list)
   return list->tail;
 }
 
-ares__slist_node_t *ares__slist_node_next(ares__slist_node_t *node)
+ares_slist_node_t *ares_slist_node_next(const ares_slist_node_t *node)
 {
   if (node == NULL) {
     return NULL;
@@ -402,7 +402,7 @@ ares__slist_node_t *ares__slist_node_next(ares__slist_node_t *node)
   return node->next[0];
 }
 
-ares__slist_node_t *ares__slist_node_prev(ares__slist_node_t *node)
+ares_slist_node_t *ares_slist_node_prev(const ares_slist_node_t *node)
 {
   if (node == NULL) {
     return NULL;
@@ -410,7 +410,7 @@ ares__slist_node_t *ares__slist_node_prev(ares__slist_node_t *node)
   return node->prev[0];
 }
 
-void *ares__slist_node_val(ares__slist_node_t *node)
+void *ares_slist_node_val(ares_slist_node_t *node)
 {
   if (node == NULL) {
     return NULL;
@@ -419,7 +419,7 @@ void *ares__slist_node_val(ares__slist_node_t *node)
   return node->data;
 }
 
-size_t ares__slist_len(const ares__slist_t *list)
+size_t ares_slist_len(const ares_slist_t *list)
 {
   if (list == NULL) {
     return 0;
@@ -427,7 +427,7 @@ size_t ares__slist_len(const ares__slist_t *list)
   return list->cnt;
 }
 
-ares__slist_t *ares__slist_node_parent(ares__slist_node_t *node)
+ares_slist_t *ares_slist_node_parent(ares_slist_node_t *node)
 {
   if (node == NULL) {
     return NULL;
@@ -435,43 +435,43 @@ ares__slist_t *ares__slist_node_parent(ares__slist_node_t *node)
   return node->parent;
 }
 
-void *ares__slist_first_val(ares__slist_t *list)
+void *ares_slist_first_val(const ares_slist_t *list)
 {
-  return ares__slist_node_val(ares__slist_node_first(list));
+  return ares_slist_node_val(ares_slist_node_first(list));
 }
 
-void *ares__slist_last_val(ares__slist_t *list)
+void *ares_slist_last_val(const ares_slist_t *list)
 {
-  return ares__slist_node_val(ares__slist_node_last(list));
+  return ares_slist_node_val(ares_slist_node_last(list));
 }
 
-void ares__slist_node_destroy(ares__slist_node_t *node)
+void ares_slist_node_destroy(ares_slist_node_t *node)
 {
-  ares__slist_destructor_t destruct;
-  void                    *val;
+  ares_slist_destructor_t destruct;
+  void                   *val;
 
   if (node == NULL) {
     return;
   }
 
   destruct = node->parent->destruct;
-  val      = ares__slist_node_claim(node);
+  val      = ares_slist_node_claim(node);
 
   if (val != NULL && destruct != NULL) {
     destruct(val);
   }
 }
 
-void ares__slist_destroy(ares__slist_t *list)
+void ares_slist_destroy(ares_slist_t *list)
 {
-  ares__slist_node_t *node;
+  ares_slist_node_t *node;
 
   if (list == NULL) {
     return;
   }
 
-  while ((node = ares__slist_node_first(list)) != NULL) {
-    ares__slist_node_destroy(node);
+  while ((node = ares_slist_node_first(list)) != NULL) {
+    ares_slist_node_destroy(node);
   }
 
   ares_free(list->head);
diff --git a/deps/cares/src/lib/dsa/ares__slist.h b/deps/cares/src/lib/dsa/ares_slist.h
similarity index 75%
rename from deps/cares/src/lib/dsa/ares__slist.h
rename to deps/cares/src/lib/dsa/ares_slist.h
index 26af88fa782499..a89c2652f2d4d4 100644
--- a/deps/cares/src/lib/dsa/ares__slist.h
+++ b/deps/cares/src/lib/dsa/ares_slist.h
@@ -27,7 +27,7 @@
 #define __ARES__SLIST_H
 
 
-/*! \addtogroup ares__slist SkipList Data Structure
+/*! \addtogroup ares_slist SkipList Data Structure
  *
  * This data structure is known as a Skip List, which in essence is a sorted
  * linked list with multiple levels of linkage to gain some algorithmic
@@ -49,21 +49,21 @@
  *
  * @{
  */
-struct ares__slist;
+struct ares_slist;
 
 /*! SkipList Object, opaque */
-typedef struct ares__slist ares__slist_t;
+typedef struct ares_slist ares_slist_t;
 
-struct ares__slist_node;
+struct ares_slist_node;
 
 /*! SkipList Node Object, opaque */
-typedef struct ares__slist_node ares__slist_node_t;
+typedef struct ares_slist_node ares_slist_node_t;
 
 /*! SkipList Node Value destructor callback
  *
  *  \param[in] data  User-defined data to destroy
  */
-typedef void (*ares__slist_destructor_t)(void *data);
+typedef void (*ares_slist_destructor_t)(void *data);
 
 /*! SkipList comparison function
  *
@@ -71,7 +71,7 @@ typedef void (*ares__slist_destructor_t)(void *data);
  *  \param[in] data2 Second user-defined data object
  *  \return < 0 if data1 < data1, > 0 if data1 > data2, 0 if data1 == data2
  */
-typedef int (*ares__slist_cmp_t)(const void *data1, const void *data2);
+typedef int (*ares_slist_cmp_t)(const void *data1, const void *data2);
 
 /*! Create SkipList
  *
@@ -80,17 +80,17 @@ typedef int (*ares__slist_cmp_t)(const void *data1, const void *data2);
  *  \param[in] destruct     SkipList Node Value Destructor. Optional, use NULL.
  *  \return Initialized SkipList Object or NULL on misuse or ENOMEM
  */
-ares__slist_t      *ares__slist_create(ares_rand_state         *rand_state,
-                                       ares__slist_cmp_t        cmp,
-                                       ares__slist_destructor_t destruct);
+ares_slist_t      *ares_slist_create(ares_rand_state        *rand_state,
+                                     ares_slist_cmp_t        cmp,
+                                     ares_slist_destructor_t destruct);
 
 /*! Replace SkipList Node Value Destructor
  *
  *  \param[in] list      Initialized SkipList Object
  *  \param[in] destruct  Replacement destructor. May be NULL.
  */
-void                ares__slist_replace_destructor(ares__slist_t           *list,
-                                                   ares__slist_destructor_t destruct);
+void               ares_slist_replace_destructor(ares_slist_t           *list,
+                                                 ares_slist_destructor_t destruct);
 
 /*! Insert Value into SkipList
  *
@@ -99,35 +99,35 @@ void                ares__slist_replace_destructor(ares__slist_t           *list
  *                    and will have destructor called.
  *  \return SkipList Node Object or NULL on misuse or ENOMEM
  */
-ares__slist_node_t *ares__slist_insert(ares__slist_t *list, void *val);
+ares_slist_node_t *ares_slist_insert(ares_slist_t *list, void *val);
 
 /*! Fetch first node in SkipList
  *
  *  \param[in] list  Initialized SkipList Object
  *  \return SkipList Node Object or NULL if none
  */
-ares__slist_node_t *ares__slist_node_first(ares__slist_t *list);
+ares_slist_node_t *ares_slist_node_first(const ares_slist_t *list);
 
 /*! Fetch last node in SkipList
  *
  *  \param[in] list  Initialized SkipList Object
  *  \return SkipList Node Object or NULL if none
  */
-ares__slist_node_t *ares__slist_node_last(ares__slist_t *list);
+ares_slist_node_t *ares_slist_node_last(const ares_slist_t *list);
 
 /*! Fetch next node in SkipList
  *
  *  \param[in] node  SkipList Node Object
  *  \return SkipList Node Object or NULL if none
  */
-ares__slist_node_t *ares__slist_node_next(ares__slist_node_t *node);
+ares_slist_node_t *ares_slist_node_next(const ares_slist_node_t *node);
 
 /*! Fetch previous node in SkipList
  *
  *  \param[in] node  SkipList Node Object
  *  \return SkipList Node Object or NULL if none
  */
-ares__slist_node_t *ares__slist_node_prev(ares__slist_node_t *node);
+ares_slist_node_t *ares_slist_node_prev(const ares_slist_node_t *node);
 
 /*! Fetch SkipList Node Object by Value
  *
@@ -135,7 +135,8 @@ ares__slist_node_t *ares__slist_node_prev(ares__slist_node_t *node);
  *  \param[in] val   Object to use for comparison
  *  \return SkipList Node Object or NULL if not found
  */
-ares__slist_node_t *ares__slist_node_find(ares__slist_t *list, const void *val);
+ares_slist_node_t *ares_slist_node_find(const ares_slist_t *list,
+                                        const void         *val);
 
 
 /*! Fetch Node Value
@@ -143,42 +144,42 @@ ares__slist_node_t *ares__slist_node_find(ares__slist_t *list, const void *val);
  *  \param[in] node  SkipList Node Object
  *  \return user defined node value
  */
-void               *ares__slist_node_val(ares__slist_node_t *node);
+void              *ares_slist_node_val(ares_slist_node_t *node);
 
 /*! Fetch number of entries in SkipList Object
  *
  *  \param[in] list  Initialized SkipList Object
  *  \return number of entries
  */
-size_t              ares__slist_len(const ares__slist_t *list);
+size_t             ares_slist_len(const ares_slist_t *list);
 
 /*! Fetch SkipList Object from SkipList Node
  *
  *  \param[in] node  SkipList Node Object
  *  \return SkipList Object
  */
-ares__slist_t      *ares__slist_node_parent(ares__slist_node_t *node);
+ares_slist_t      *ares_slist_node_parent(ares_slist_node_t *node);
 
 /*! Fetch first Node Value in SkipList
  *
  *  \param[in] list  Initialized SkipList Object
  *  \return user defined node value or NULL if none
  */
-void               *ares__slist_first_val(ares__slist_t *list);
+void              *ares_slist_first_val(const ares_slist_t *list);
 
 /*! Fetch last Node Value in SkipList
  *
  *  \param[in] list  Initialized SkipList Object
  *  \return user defined node value or NULL if none
  */
-void               *ares__slist_last_val(ares__slist_t *list);
+void              *ares_slist_last_val(const ares_slist_t *list);
 
 /*! Take back ownership of Node Value in SkipList, remove from SkipList.
  *
  *  \param[in] node  SkipList Node Object
  *  \return user defined node value
  */
-void               *ares__slist_node_claim(ares__slist_node_t *node);
+void              *ares_slist_node_claim(ares_slist_node_t *node);
 
 /*! The internals of the node have changed, thus its position in the sorted
  *  list is no longer valid.  This function will remove it and re-add it to
@@ -187,19 +188,19 @@ void               *ares__slist_node_claim(ares__slist_node_t *node);
  *
  *  \param[in] node  SkipList Node Object
  */
-void                ares__slist_node_reinsert(ares__slist_node_t *node);
+void               ares_slist_node_reinsert(ares_slist_node_t *node);
 
 /*! Remove Node from SkipList, calling destructor for Node Value.
  *
  *  \param[in] node  SkipList Node Object
  */
-void                ares__slist_node_destroy(ares__slist_node_t *node);
+void               ares_slist_node_destroy(ares_slist_node_t *node);
 
 /*! Destroy SkipList Object.  If there are any nodes, they will be destroyed.
  *
  *  \param[in] list  Initialized SkipList Object
  */
-void                ares__slist_destroy(ares__slist_t *list);
+void               ares_slist_destroy(ares_slist_t *list);
 
 /*! @} */
 
diff --git a/deps/cares/src/lib/event/ares_event.h b/deps/cares/src/lib/event/ares_event.h
index 317731fc4289ad..36cd10dcf89152 100644
--- a/deps/cares/src/lib/event/ares_event.h
+++ b/deps/cares/src/lib/event/ares_event.h
@@ -90,21 +90,23 @@ struct ares_event_thread {
    *  event before sleeping. */
   ares_bool_t             isup;
   /*! Handle to the thread for joining during shutdown */
-  ares__thread_t         *thread;
+  ares_thread_t          *thread;
   /*! Lock to protect the data contained within the event thread itself */
-  ares__thread_mutex_t   *mutex;
+  ares_thread_mutex_t    *mutex;
   /*! Reference to the ares channel, for being able to call things like
    *  ares_timeout() and ares_process_fd(). */
   ares_channel_t         *channel;
+  /*! Whether or not on the next loop we should process a pending write */
+  ares_bool_t             process_pending_write;
   /*! Not-yet-processed event handle updates.  These will get enqueued by a
    *  thread other than the event thread itself. The event thread will then
    *  be woken then process these updates itself */
-  ares__llist_t          *ev_updates;
+  ares_llist_t           *ev_updates;
   /*! Registered socket event handles */
-  ares__htable_asvp_t    *ev_sock_handles;
+  ares_htable_asvp_t     *ev_sock_handles;
   /*! Registered custom event handles. Typically used for external triggering.
    */
-  ares__htable_vpvp_t    *ev_cust_handles;
+  ares_htable_vpvp_t     *ev_cust_handles;
   /*! Pointer to the event handle which is used to signal and wake the event
    *  thread itself.  This is needed to be able to do things like update the
    *  file descriptors being waited on and to wake the event subsystem during
diff --git a/deps/cares/src/lib/event/ares_event_configchg.c b/deps/cares/src/lib/event/ares_event_configchg.c
index 10f0e21dde77fa..e3e665bd165523 100644
--- a/deps/cares/src/lib/event/ares_event_configchg.c
+++ b/deps/cares/src/lib/event/ares_event_configchg.c
@@ -116,8 +116,8 @@ static void ares_event_configchg_cb(ares_event_thread_t *e, ares_socket_t fd,
         continue;
       }
 
-      if (strcasecmp(event->name, "resolv.conf") == 0 ||
-          strcasecmp(event->name, "nsswitch.conf") == 0) {
+      if (ares_strcaseeq(event->name, "resolv.conf") ||
+          ares_strcaseeq(event->name, "nsswitch.conf")) {
         triggered = ARES_TRUE;
       }
     }
@@ -545,17 +545,17 @@ typedef struct {
 } fileinfo_t;
 
 struct ares_event_configchg {
-  ares_bool_t           isup;
-  ares__thread_t       *thread;
-  ares__htable_strvp_t *filestat;
-  ares__thread_mutex_t *lock;
-  ares__thread_cond_t  *wake;
-  const char           *resolvconf_path;
-  ares_event_thread_t  *e;
+  ares_bool_t          isup;
+  ares_thread_t       *thread;
+  ares_htable_strvp_t *filestat;
+  ares_thread_mutex_t *lock;
+  ares_thread_cond_t  *wake;
+  const char          *resolvconf_path;
+  ares_event_thread_t *e;
 };
 
-static ares_status_t config_change_check(ares__htable_strvp_t *filestat,
-                                         const char           *resolvconf_path)
+static ares_status_t config_change_check(ares_htable_strvp_t *filestat,
+                                         const char          *resolvconf_path)
 {
   size_t      i;
   const char *configfiles[5];
@@ -568,7 +568,7 @@ static ares_status_t config_change_check(ares__htable_strvp_t *filestat,
   configfiles[4] = NULL;
 
   for (i = 0; configfiles[i] != NULL; i++) {
-    fileinfo_t *fi = ares__htable_strvp_get_direct(filestat, configfiles[i]);
+    fileinfo_t *fi = ares_htable_strvp_get_direct(filestat, configfiles[i]);
     struct stat st;
 
     if (stat(configfiles[i], &st) == 0) {
@@ -577,7 +577,7 @@ static ares_status_t config_change_check(ares__htable_strvp_t *filestat,
         if (fi == NULL) {
           return ARES_ENOMEM;
         }
-        if (!ares__htable_strvp_insert(filestat, configfiles[i], fi)) {
+        if (!ares_htable_strvp_insert(filestat, configfiles[i], fi)) {
           ares_free(fi);
           return ARES_ENOMEM;
         }
@@ -589,7 +589,7 @@ static ares_status_t config_change_check(ares__htable_strvp_t *filestat,
       fi->mtime = (time_t)st.st_mtime;
     } else if (fi != NULL) {
       /* File no longer exists, remove */
-      ares__htable_strvp_remove(filestat, configfiles[i]);
+      ares_htable_strvp_remove(filestat, configfiles[i]);
       changed = ARES_TRUE;
     }
   }
@@ -604,11 +604,11 @@ static void *ares_event_configchg_thread(void *arg)
 {
   ares_event_configchg_t *c = arg;
 
-  ares__thread_mutex_lock(c->lock);
+  ares_thread_mutex_lock(c->lock);
   while (c->isup) {
     ares_status_t status;
 
-    if (ares__thread_cond_timedwait(c->wake, c->lock, 30000) != ARES_ETIMEOUT) {
+    if (ares_thread_cond_timedwait(c->wake, c->lock, 30000) != ARES_ETIMEOUT) {
       continue;
     }
 
@@ -623,7 +623,7 @@ static void *ares_event_configchg_thread(void *arg)
     }
   }
 
-  ares__thread_mutex_unlock(c->lock);
+  ares_thread_mutex_unlock(c->lock);
   return NULL;
 }
 
@@ -643,13 +643,13 @@ ares_status_t ares_event_configchg_init(ares_event_configchg_t **configchg,
 
   c->e = e;
 
-  c->filestat = ares__htable_strvp_create(ares_free);
+  c->filestat = ares_htable_strvp_create(ares_free);
   if (c->filestat == NULL) {
     status = ARES_ENOMEM;
     goto done;
   }
 
-  c->wake = ares__thread_cond_create();
+  c->wake = ares_thread_cond_create();
   if (c->wake == NULL) {
     status = ARES_ENOMEM;
     goto done;
@@ -666,7 +666,7 @@ ares_status_t ares_event_configchg_init(ares_event_configchg_t **configchg,
   }
 
   c->isup = ARES_TRUE;
-  status  = ares__thread_create(&c->thread, ares_event_configchg_thread, c);
+  status  = ares_thread_create(&c->thread, ares_event_configchg_thread, c);
 
 done:
   if (status != ARES_SUCCESS) {
@@ -684,26 +684,26 @@ void ares_event_configchg_destroy(ares_event_configchg_t *configchg)
   }
 
   if (configchg->lock) {
-    ares__thread_mutex_lock(configchg->lock);
+    ares_thread_mutex_lock(configchg->lock);
   }
 
   configchg->isup = ARES_FALSE;
   if (configchg->wake) {
-    ares__thread_cond_signal(configchg->wake);
+    ares_thread_cond_signal(configchg->wake);
   }
 
   if (configchg->lock) {
-    ares__thread_mutex_unlock(configchg->lock);
+    ares_thread_mutex_unlock(configchg->lock);
   }
 
   if (configchg->thread) {
     void *rv = NULL;
-    ares__thread_join(configchg->thread, &rv);
+    ares_thread_join(configchg->thread, &rv);
   }
 
-  ares__thread_mutex_destroy(configchg->lock);
-  ares__thread_cond_destroy(configchg->wake);
-  ares__htable_strvp_destroy(configchg->filestat);
+  ares_thread_mutex_destroy(configchg->lock);
+  ares_thread_cond_destroy(configchg->wake);
+  ares_htable_strvp_destroy(configchg->filestat);
   ares_free(configchg);
 }
 
diff --git a/deps/cares/src/lib/event/ares_event_epoll.c b/deps/cares/src/lib/event/ares_event_epoll.c
index 5eb25cccc9ed2a..538c38b4f94ab4 100644
--- a/deps/cares/src/lib/event/ares_event_epoll.c
+++ b/deps/cares/src/lib/event/ares_event_epoll.c
@@ -161,8 +161,8 @@ static size_t ares_evsys_epoll_wait(ares_event_thread_t *e,
     ares_event_t      *ev;
     ares_event_flags_t flags = 0;
 
-    ev = ares__htable_asvp_get_direct(e->ev_sock_handles,
-                                      (ares_socket_t)events[i].data.fd);
+    ev = ares_htable_asvp_get_direct(e->ev_sock_handles,
+                                     (ares_socket_t)events[i].data.fd);
     if (ev == NULL || ev->cb == NULL) {
       continue; /* LCOV_EXCL_LINE: DefensiveCoding */
     }
diff --git a/deps/cares/src/lib/event/ares_event_kqueue.c b/deps/cares/src/lib/event/ares_event_kqueue.c
index 1c35c14f165690..dbbd0dbd9f76a6 100644
--- a/deps/cares/src/lib/event/ares_event_kqueue.c
+++ b/deps/cares/src/lib/event/ares_event_kqueue.c
@@ -217,8 +217,8 @@ static size_t ares_evsys_kqueue_wait(ares_event_thread_t *e,
     ares_event_t      *ev;
     ares_event_flags_t flags = 0;
 
-    ev = ares__htable_asvp_get_direct(e->ev_sock_handles,
-                                      (ares_socket_t)events[i].ident);
+    ev = ares_htable_asvp_get_direct(e->ev_sock_handles,
+                                     (ares_socket_t)events[i].ident);
     if (ev == NULL || ev->cb == NULL) {
       continue;
     }
diff --git a/deps/cares/src/lib/event/ares_event_poll.c b/deps/cares/src/lib/event/ares_event_poll.c
index 42ffd912e95c4b..c6ab4b62072b36 100644
--- a/deps/cares/src/lib/event/ares_event_poll.c
+++ b/deps/cares/src/lib/event/ares_event_poll.c
@@ -67,7 +67,7 @@ static size_t ares_evsys_poll_wait(ares_event_thread_t *e,
                                    unsigned long        timeout_ms)
 {
   size_t         num_fds = 0;
-  ares_socket_t *fdlist  = ares__htable_asvp_keys(e->ev_sock_handles, &num_fds);
+  ares_socket_t *fdlist  = ares_htable_asvp_keys(e->ev_sock_handles, &num_fds);
   struct pollfd *pollfd  = NULL;
   int            rv;
   size_t         cnt = 0;
@@ -80,7 +80,7 @@ static size_t ares_evsys_poll_wait(ares_event_thread_t *e,
     }
     for (i = 0; i < num_fds; i++) {
       const ares_event_t *ev =
-        ares__htable_asvp_get_direct(e->ev_sock_handles, fdlist[i]);
+        ares_htable_asvp_get_direct(e->ev_sock_handles, fdlist[i]);
       pollfd[i].fd = ev->fd;
       if (ev->flags & ARES_EVENT_FLAG_READ) {
         pollfd[i].events |= POLLIN;
@@ -107,7 +107,7 @@ static size_t ares_evsys_poll_wait(ares_event_thread_t *e,
 
     cnt++;
 
-    ev = ares__htable_asvp_get_direct(e->ev_sock_handles, pollfd[i].fd);
+    ev = ares_htable_asvp_get_direct(e->ev_sock_handles, pollfd[i].fd);
     if (ev == NULL || ev->cb == NULL) {
       continue; /* LCOV_EXCL_LINE: DefensiveCoding */
     }
diff --git a/deps/cares/src/lib/event/ares_event_select.c b/deps/cares/src/lib/event/ares_event_select.c
index e1266ea99056a3..4d7c085d872088 100644
--- a/deps/cares/src/lib/event/ares_event_select.c
+++ b/deps/cares/src/lib/event/ares_event_select.c
@@ -75,7 +75,7 @@ static size_t ares_evsys_select_wait(ares_event_thread_t *e,
                                      unsigned long        timeout_ms)
 {
   size_t          num_fds = 0;
-  ares_socket_t  *fdlist = ares__htable_asvp_keys(e->ev_sock_handles, &num_fds);
+  ares_socket_t  *fdlist  = ares_htable_asvp_keys(e->ev_sock_handles, &num_fds);
   int             rv;
   size_t          cnt = 0;
   size_t          i;
@@ -92,7 +92,7 @@ static size_t ares_evsys_select_wait(ares_event_thread_t *e,
 
   for (i = 0; i < num_fds; i++) {
     const ares_event_t *ev =
-      ares__htable_asvp_get_direct(e->ev_sock_handles, fdlist[i]);
+      ares_htable_asvp_get_direct(e->ev_sock_handles, fdlist[i]);
     if (ev->flags & ARES_EVENT_FLAG_READ) {
       FD_SET(ev->fd, &read_fds);
     }
@@ -117,7 +117,7 @@ static size_t ares_evsys_select_wait(ares_event_thread_t *e,
       ares_event_t      *ev;
       ares_event_flags_t flags = 0;
 
-      ev = ares__htable_asvp_get_direct(e->ev_sock_handles, fdlist[i]);
+      ev = ares_htable_asvp_get_direct(e->ev_sock_handles, fdlist[i]);
       if (ev == NULL || ev->cb == NULL) {
         continue; /* LCOV_EXCL_LINE: DefensiveCoding */
       }
diff --git a/deps/cares/src/lib/event/ares_event_thread.c b/deps/cares/src/lib/event/ares_event_thread.c
index 8b332e9b0193b7..24b55d6945728f 100644
--- a/deps/cares/src/lib/event/ares_event_thread.c
+++ b/deps/cares/src/lib/event/ares_event_thread.c
@@ -77,11 +77,11 @@ static void ares_event_thread_wake(const ares_event_thread_t *e)
 static ares_event_t *ares_event_update_find(ares_event_thread_t *e,
                                             ares_socket_t fd, const void *data)
 {
-  ares__llist_node_t *node;
+  ares_llist_node_t *node;
 
-  for (node = ares__llist_node_first(e->ev_updates); node != NULL;
-       node = ares__llist_node_next(node)) {
-    ares_event_t *ev = ares__llist_node_val(node);
+  for (node = ares_llist_node_first(e->ev_updates); node != NULL;
+       node = ares_llist_node_next(node)) {
+    ares_event_t *ev = ares_llist_node_val(node);
 
     if (fd != ARES_SOCKET_BAD && fd == ev->fd && ev->flags != 0) {
       return ev;
@@ -134,7 +134,7 @@ ares_status_t ares_event_update(ares_event_t **event, ares_event_thread_t *e,
 
   /* That's all the validation we can really do */
 
-  ares__thread_mutex_lock(e->mutex);
+  ares_thread_mutex_lock(e->mutex);
 
   /* See if we have a queued update already */
   ev = ares_event_update_find(e, fd, data);
@@ -146,7 +146,7 @@ ares_status_t ares_event_update(ares_event_t **event, ares_event_thread_t *e,
       goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
-    if (ares__llist_insert_last(e->ev_updates, ev) == NULL) {
+    if (ares_llist_insert_last(e->ev_updates, ev) == NULL) {
       ares_free(ev);        /* LCOV_EXCL_LINE: OutOfMemory */
       status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
       goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
@@ -180,7 +180,7 @@ ares_status_t ares_event_update(ares_event_t **event, ares_event_thread_t *e,
     ares_event_thread_wake(e);
   }
 
-  ares__thread_mutex_unlock(e->mutex);
+  ares_thread_mutex_unlock(e->mutex);
 
   return status;
 }
@@ -189,11 +189,18 @@ static void ares_event_thread_process_fd(ares_event_thread_t *e,
                                          ares_socket_t fd, void *data,
                                          ares_event_flags_t flags)
 {
+  ares_fd_events_t event;
   (void)data;
 
-  ares_process_fd(e->channel,
-                  (flags & ARES_EVENT_FLAG_READ) ? fd : ARES_SOCKET_BAD,
-                  (flags & ARES_EVENT_FLAG_WRITE) ? fd : ARES_SOCKET_BAD);
+  event.fd     = fd;
+  event.events = 0;
+  if (flags & ARES_EVENT_FLAG_READ) {
+    event.events |= ARES_FD_EVENT_READ;
+  }
+  if (flags & ARES_EVENT_FLAG_WRITE) {
+    event.events |= ARES_FD_EVENT_WRITE;
+  }
+  ares_process_fds(e->channel, &event, 1, ARES_PROCESS_FLAG_SKIP_NON_FD);
 }
 
 static void ares_event_thread_sockstate_cb(void *data, ares_socket_t socket_fd,
@@ -216,20 +223,31 @@ static void ares_event_thread_sockstate_cb(void *data, ares_socket_t socket_fd,
                     NULL, NULL, NULL);
 }
 
+static void notifywrite_cb(void *data)
+{
+  ares_event_thread_t *e = data;
+
+  ares_thread_mutex_lock(e->mutex);
+  e->process_pending_write = ARES_TRUE;
+  ares_thread_mutex_unlock(e->mutex);
+
+  ares_event_thread_wake(e);
+}
+
 static void ares_event_process_updates(ares_event_thread_t *e)
 {
-  ares__llist_node_t *node;
+  ares_llist_node_t *node;
 
   /* Iterate across all updates and apply to internal list, removing from update
    * list */
-  while ((node = ares__llist_node_first(e->ev_updates)) != NULL) {
-    ares_event_t *newev = ares__llist_node_claim(node);
+  while ((node = ares_llist_node_first(e->ev_updates)) != NULL) {
+    ares_event_t *newev = ares_llist_node_claim(node);
     ares_event_t *oldev;
 
     if (newev->fd == ARES_SOCKET_BAD) {
-      oldev = ares__htable_vpvp_get_direct(e->ev_cust_handles, newev->data);
+      oldev = ares_htable_vpvp_get_direct(e->ev_cust_handles, newev->data);
     } else {
-      oldev = ares__htable_asvp_get_direct(e->ev_sock_handles, newev->fd);
+      oldev = ares_htable_asvp_get_direct(e->ev_sock_handles, newev->fd);
     }
 
     /* Adding new */
@@ -244,9 +262,9 @@ static void ares_event_process_updates(ares_event_thread_t *e)
         ares_event_destroy_cb(newev);
       } else {
         if (newev->fd == ARES_SOCKET_BAD) {
-          ares__htable_vpvp_insert(e->ev_cust_handles, newev->data, newev);
+          ares_htable_vpvp_insert(e->ev_cust_handles, newev->data, newev);
         } else {
-          ares__htable_asvp_insert(e->ev_sock_handles, newev->fd, newev);
+          ares_htable_asvp_insert(e->ev_sock_handles, newev->fd, newev);
         }
       }
       continue;
@@ -257,9 +275,9 @@ static void ares_event_process_updates(ares_event_thread_t *e)
       /* the callback for the removal will call e->ev_sys->event_del(e, event)
        */
       if (newev->fd == ARES_SOCKET_BAD) {
-        ares__htable_vpvp_remove(e->ev_cust_handles, newev->data);
+        ares_htable_vpvp_remove(e->ev_cust_handles, newev->data);
       } else {
-        ares__htable_asvp_remove(e->ev_sock_handles, newev->fd);
+        ares_htable_asvp_remove(e->ev_sock_handles, newev->fd);
       }
       ares_free(newev);
       continue;
@@ -276,22 +294,22 @@ static void ares_event_thread_cleanup(ares_event_thread_t *e)
 {
   /* Manually free any updates that weren't processed */
   if (e->ev_updates != NULL) {
-    ares__llist_node_t *node;
+    ares_llist_node_t *node;
 
-    while ((node = ares__llist_node_first(e->ev_updates)) != NULL) {
-      ares_event_destroy_cb(ares__llist_node_claim(node));
+    while ((node = ares_llist_node_first(e->ev_updates)) != NULL) {
+      ares_event_destroy_cb(ares_llist_node_claim(node));
     }
-    ares__llist_destroy(e->ev_updates);
+    ares_llist_destroy(e->ev_updates);
     e->ev_updates = NULL;
   }
 
   if (e->ev_sock_handles != NULL) {
-    ares__htable_asvp_destroy(e->ev_sock_handles);
+    ares_htable_asvp_destroy(e->ev_sock_handles);
     e->ev_sock_handles = NULL;
   }
 
   if (e->ev_cust_handles != NULL) {
-    ares__htable_vpvp_destroy(e->ev_cust_handles);
+    ares_htable_vpvp_destroy(e->ev_cust_handles);
     e->ev_cust_handles = NULL;
   }
 
@@ -304,19 +322,20 @@ static void ares_event_thread_cleanup(ares_event_thread_t *e)
 static void *ares_event_thread(void *arg)
 {
   ares_event_thread_t *e = arg;
-  ares__thread_mutex_lock(e->mutex);
+  ares_thread_mutex_lock(e->mutex);
 
   while (e->isup) {
     struct timeval        tv;
     const struct timeval *tvout;
     unsigned long         timeout_ms = 0; /* 0 = unlimited */
+    ares_bool_t           process_pending_write;
 
     ares_event_process_updates(e);
 
     /* Don't hold a mutex while waiting on events or calling into anything
      * that might require a c-ares channel lock since a callback could be
      * triggered cross-thread */
-    ares__thread_mutex_unlock(e->mutex);
+    ares_thread_mutex_unlock(e->mutex);
 
     tvout = ares_timeout(e->channel, NULL, &tv);
     if (tvout != NULL) {
@@ -326,19 +345,29 @@ static void *ares_event_thread(void *arg)
 
     e->ev_sys->wait(e, timeout_ms);
 
-    /* Each iteration should do timeout processing */
+    /* Process pending write operation */
+    ares_thread_mutex_lock(e->mutex);
+    process_pending_write    = e->process_pending_write;
+    e->process_pending_write = ARES_FALSE;
+    ares_thread_mutex_unlock(e->mutex);
+    if (process_pending_write) {
+      ares_process_pending_write(e->channel);
+    }
+
+    /* Each iteration should do timeout processing and any other cleanup
+     * that may not have been performed */
     if (e->isup) {
-      ares_process_fd(e->channel, ARES_SOCKET_BAD, ARES_SOCKET_BAD);
+      ares_process_fds(e->channel, NULL, 0, ARES_PROCESS_FLAG_NONE);
     }
 
     /* Relock before we loop again */
-    ares__thread_mutex_lock(e->mutex);
+    ares_thread_mutex_lock(e->mutex);
   }
 
   /* Lets cleanup while we're in the thread itself */
   ares_event_thread_cleanup(e);
 
-  ares__thread_mutex_unlock(e->mutex);
+  ares_thread_mutex_unlock(e->mutex);
 
   return NULL;
 }
@@ -346,17 +375,17 @@ static void *ares_event_thread(void *arg)
 static void ares_event_thread_destroy_int(ares_event_thread_t *e)
 {
   /* Wake thread and tell it to shutdown if it exists */
-  ares__thread_mutex_lock(e->mutex);
+  ares_thread_mutex_lock(e->mutex);
   if (e->isup) {
     e->isup = ARES_FALSE;
     ares_event_thread_wake(e);
   }
-  ares__thread_mutex_unlock(e->mutex);
+  ares_thread_mutex_unlock(e->mutex);
 
   /* Wait for thread to shutdown */
   if (e->thread) {
     void *rv = NULL;
-    ares__thread_join(e->thread, &rv);
+    ares_thread_join(e->thread, &rv);
     e->thread = NULL;
   }
 
@@ -364,7 +393,7 @@ static void ares_event_thread_destroy_int(ares_event_thread_t *e)
    * as it runs this same cleanup when it shuts down */
   ares_event_thread_cleanup(e);
 
-  ares__thread_mutex_destroy(e->mutex);
+  ares_thread_mutex_destroy(e->mutex);
   e->mutex = NULL;
 
   ares_free(e);
@@ -379,8 +408,10 @@ void ares_event_thread_destroy(ares_channel_t *channel)
   }
 
   ares_event_thread_destroy_int(e);
-  channel->sock_state_cb_data = NULL;
-  channel->sock_state_cb      = NULL;
+  channel->sock_state_cb_data           = NULL;
+  channel->sock_state_cb                = NULL;
+  channel->notify_pending_write_cb      = NULL;
+  channel->notify_pending_write_cb_data = NULL;
 }
 
 static const ares_event_sys_t *ares_event_fetch_sys(ares_evsys_t evsys)
@@ -451,25 +482,25 @@ ares_status_t ares_event_thread_init(ares_channel_t *channel)
     return ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  e->mutex = ares__thread_mutex_create();
+  e->mutex = ares_thread_mutex_create();
   if (e->mutex == NULL) {
     ares_event_thread_destroy_int(e); /* LCOV_EXCL_LINE: OutOfMemory */
     return ARES_ENOMEM;               /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  e->ev_updates = ares__llist_create(NULL);
+  e->ev_updates = ares_llist_create(NULL);
   if (e->ev_updates == NULL) {
     ares_event_thread_destroy_int(e); /* LCOV_EXCL_LINE: OutOfMemory */
     return ARES_ENOMEM;               /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  e->ev_sock_handles = ares__htable_asvp_create(ares_event_destroy_cb);
+  e->ev_sock_handles = ares_htable_asvp_create(ares_event_destroy_cb);
   if (e->ev_sock_handles == NULL) {
     ares_event_thread_destroy_int(e); /* LCOV_EXCL_LINE: OutOfMemory */
     return ARES_ENOMEM;               /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  e->ev_cust_handles = ares__htable_vpvp_create(NULL, ares_event_destroy_cb);
+  e->ev_cust_handles = ares_htable_vpvp_create(NULL, ares_event_destroy_cb);
   if (e->ev_cust_handles == NULL) {
     ares_event_thread_destroy_int(e); /* LCOV_EXCL_LINE: OutOfMemory */
     return ARES_ENOMEM;               /* LCOV_EXCL_LINE: OutOfMemory */
@@ -483,8 +514,10 @@ ares_status_t ares_event_thread_init(ares_channel_t *channel)
     return ARES_ENOTIMP;              /* LCOV_EXCL_LINE: UntestablePath */
   }
 
-  channel->sock_state_cb      = ares_event_thread_sockstate_cb;
-  channel->sock_state_cb_data = e;
+  channel->sock_state_cb                = ares_event_thread_sockstate_cb;
+  channel->sock_state_cb_data           = e;
+  channel->notify_pending_write_cb      = notifywrite_cb;
+  channel->notify_pending_write_cb_data = e;
 
   if (!e->ev_sys->init(e)) {
     /* LCOV_EXCL_START: UntestablePath */
@@ -503,7 +536,7 @@ ares_status_t ares_event_thread_init(ares_channel_t *channel)
   ares_event_process_updates(e);
 
   /* Start thread */
-  if (ares__thread_create(&e->thread, ares_event_thread, e) != ARES_SUCCESS) {
+  if (ares_thread_create(&e->thread, ares_event_thread, e) != ARES_SUCCESS) {
     /* LCOV_EXCL_START: UntestablePath */
     ares_event_thread_destroy_int(e);
     channel->sock_state_cb      = NULL;
diff --git a/deps/cares/src/lib/event/ares_event_win32.c b/deps/cares/src/lib/event/ares_event_win32.c
index 0b7e535bbbf538..1531b6d81ddca4 100644
--- a/deps/cares/src/lib/event/ares_event_win32.c
+++ b/deps/cares/src/lib/event/ares_event_win32.c
@@ -204,14 +204,14 @@ typedef struct {
   NtCancelIoFileEx_t      NtCancelIoFileEx;
 
   /* Implementation details */
-  ares__slist_t          *afd_handles;
+  ares_slist_t           *afd_handles;
   HANDLE                  iocp_handle;
 
   /* IO_STATUS_BLOCK * -> ares_evsys_win32_eventdata_t * mapping.  There is
    * no completion key passed to IOCP with this method so we have to look
    * up based on the lpOverlapped returned (which is mapped to IO_STATUS_BLOCK)
    */
-  ares__htable_vpvp_t    *sockets;
+  ares_htable_vpvp_t     *sockets;
 
   /* Flag about whether or not we are shutting down */
   ares_bool_t             is_shutdown;
@@ -226,24 +226,24 @@ typedef enum {
 
 typedef struct {
   /*! Pointer to parent event container */
-  ares_event_t         *event;
+  ares_event_t        *event;
   /*! Socket passed in to monitor */
-  SOCKET                socket;
+  SOCKET               socket;
   /*! Base socket derived from provided socket */
-  SOCKET                base_socket;
+  SOCKET               base_socket;
   /*! Structure for submitting AFD POLL requests (Internals!) */
-  AFD_POLL_INFO         afd_poll_info;
+  AFD_POLL_INFO        afd_poll_info;
   /*! Status of current polling operation */
-  poll_status_t         poll_status;
+  poll_status_t        poll_status;
   /*! IO Status Block structure submitted with AFD POLL requests and returned
    *  with IOCP results as lpOverlapped (even though its a different structure)
    */
-  IO_STATUS_BLOCK       iosb;
+  IO_STATUS_BLOCK      iosb;
   /*! AFD handle node an outstanding poll request is associated with */
-  ares__slist_node_t   *afd_handle_node;
+  ares_slist_node_t   *afd_handle_node;
   /* Lock is only for PostQueuedCompletionStatus() to prevent multiple
    * signals. Tracking via POLL_STATUS_PENDING/POLL_STATUS_NONE */
-  ares__thread_mutex_t *lock;
+  ares_thread_mutex_t *lock;
 } ares_evsys_win32_eventdata_t;
 
 static size_t ares_evsys_win32_wait(ares_event_thread_t *e,
@@ -256,12 +256,12 @@ static void   ares_iocpevent_signal(const ares_event_t *event)
   ares_evsys_win32_eventdata_t *ed          = event->data;
   ares_bool_t                   queue_event = ARES_FALSE;
 
-  ares__thread_mutex_lock(ed->lock);
+  ares_thread_mutex_lock(ed->lock);
   if (ed->poll_status != POLL_STATUS_PENDING) {
     ed->poll_status = POLL_STATUS_PENDING;
     queue_event     = ARES_TRUE;
   }
-  ares__thread_mutex_unlock(ed->lock);
+  ares_thread_mutex_unlock(ed->lock);
 
   if (!queue_event) {
     return;
@@ -277,9 +277,9 @@ static void ares_iocpevent_cb(ares_event_thread_t *e, ares_socket_t fd,
   (void)e;
   (void)fd;
   (void)flags;
-  ares__thread_mutex_lock(ed->lock);
+  ares_thread_mutex_lock(ed->lock);
   ed->poll_status = POLL_STATUS_NONE;
-  ares__thread_mutex_unlock(ed->lock);
+  ares_thread_mutex_unlock(ed->lock);
 }
 
 static ares_event_t *ares_iocpevent_create(ares_event_thread_t *e)
@@ -314,8 +314,8 @@ static void ares_evsys_win32_destroy(ares_event_thread_t *e)
 
   ew->is_shutdown = ARES_TRUE;
   CARES_DEBUG_LOG("  ** waiting on %lu remaining sockets to be destroyed\n",
-                  (unsigned long)ares__htable_vpvp_num_keys(ew->sockets));
-  while (ares__htable_vpvp_num_keys(ew->sockets)) {
+                  (unsigned long)ares_htable_vpvp_num_keys(ew->sockets));
+  while (ares_htable_vpvp_num_keys(ew->sockets)) {
     ares_evsys_win32_wait(e, 0);
   }
   CARES_DEBUG_LOG("  ** all sockets cleaned up\n");
@@ -325,9 +325,9 @@ static void ares_evsys_win32_destroy(ares_event_thread_t *e)
     CloseHandle(ew->iocp_handle);
   }
 
-  ares__slist_destroy(ew->afd_handles);
+  ares_slist_destroy(ew->afd_handles);
 
-  ares__htable_vpvp_destroy(ew->sockets);
+  ares_htable_vpvp_destroy(ew->sockets);
 
   ares_free(ew);
   e->ev_sys_data = NULL;
@@ -373,14 +373,14 @@ static void fill_object_attributes(OBJECT_ATTRIBUTES *attr,
 #  define UNICODE_STRING_CONSTANT(s) \
     { (sizeof(s) - 1) * sizeof(wchar_t), sizeof(s) * sizeof(wchar_t), L##s }
 
-static ares__slist_node_t *ares_afd_handle_create(ares_evsys_win32_t *ew)
+static ares_slist_node_t *ares_afd_handle_create(ares_evsys_win32_t *ew)
 {
   UNICODE_STRING     afd_device_name = UNICODE_STRING_CONSTANT("\\Device\\Afd");
   OBJECT_ATTRIBUTES  afd_attributes;
   NTSTATUS           status;
   IO_STATUS_BLOCK    iosb;
-  ares_afd_handle_t *afd   = ares_malloc_zero(sizeof(*afd));
-  ares__slist_node_t *node = NULL;
+  ares_afd_handle_t *afd  = ares_malloc_zero(sizeof(*afd));
+  ares_slist_node_t *node = NULL;
   if (afd == NULL) {
     goto fail;
   }
@@ -407,7 +407,7 @@ static ares__slist_node_t *ares_afd_handle_create(ares_evsys_win32_t *ew)
     goto fail;
   }
 
-  node = ares__slist_insert(ew->afd_handles, afd);
+  node = ares_slist_insert(ew->afd_handles, afd);
   if (node == NULL) {
     goto fail;
   }
@@ -422,10 +422,10 @@ static ares__slist_node_t *ares_afd_handle_create(ares_evsys_win32_t *ew)
 
 /* Fetch the lowest poll count entry, but if it exceeds the limit, create a
  * new one and return that */
-static ares__slist_node_t *ares_afd_handle_fetch(ares_evsys_win32_t *ew)
+static ares_slist_node_t *ares_afd_handle_fetch(ares_evsys_win32_t *ew)
 {
-  ares__slist_node_t *node = ares__slist_node_first(ew->afd_handles);
-  ares_afd_handle_t  *afd  = ares__slist_node_val(node);
+  ares_slist_node_t *node = ares_slist_node_first(ew->afd_handles);
+  ares_afd_handle_t *afd  = ares_slist_node_val(node);
 
   if (afd != NULL && afd->poll_cnt < AFD_POLL_PER_HANDLE) {
     return node;
@@ -488,7 +488,7 @@ static ares_bool_t ares_evsys_win32_init(ares_event_thread_t *e)
     goto fail;
   }
 
-  ew->afd_handles = ares__slist_create(
+  ew->afd_handles = ares_slist_create(
     e->channel->rand_state, ares_afd_handle_cmp, ares_afd_handle_destroy);
   if (ew->afd_handles == NULL) {
     goto fail;
@@ -505,7 +505,7 @@ static ares_bool_t ares_evsys_win32_init(ares_event_thread_t *e)
     goto fail;
   }
 
-  ew->sockets = ares__htable_vpvp_create(NULL, NULL);
+  ew->sockets = ares_htable_vpvp_create(NULL, NULL);
   if (ew->sockets == NULL) {
     goto fail;
   }
@@ -582,7 +582,7 @@ static ares_bool_t ares_evsys_win32_afd_enqueue(ares_event_t      *event,
     return ARES_FALSE;
   }
 
-  afd = ares__slist_node_val(ed->afd_handle_node);
+  afd = ares_slist_node_val(ed->afd_handle_node);
 
   /* Enqueue AFD Poll */
   ed->afd_poll_info.Exclusive         = FALSE;
@@ -621,7 +621,7 @@ static ares_bool_t ares_evsys_win32_afd_enqueue(ares_event_t      *event,
   /* Record that we submitted a poll request to this handle and tell it to
    * re-sort the node since we changed its sort value */
   afd->poll_cnt++;
-  ares__slist_node_reinsert(ed->afd_handle_node);
+  ares_slist_node_reinsert(ed->afd_handle_node);
 
   ed->poll_status = POLL_STATUS_PENDING;
   CARES_DEBUG_LOG("++ afd_enqueue ed=%p flags=%X\n", (void *)ed,
@@ -643,7 +643,7 @@ static ares_bool_t ares_evsys_win32_afd_cancel(ares_evsys_win32_eventdata_t *ed)
     return ARES_FALSE;
   }
 
-  afd = ares__slist_node_val(ed->afd_handle_node);
+  afd = ares_slist_node_val(ed->afd_handle_node);
 
   /* Misuse */
   if (afd == NULL) {
@@ -685,10 +685,10 @@ static void ares_evsys_win32_eventdata_destroy(ares_evsys_win32_t           *ew,
                   (ed->socket == ARES_SOCKET_BAD) ? "data" : "socket");
   /* These type of handles are deferred destroy. Update tracking. */
   if (ed->socket != ARES_SOCKET_BAD) {
-    ares__htable_vpvp_remove(ew->sockets, &ed->iosb);
+    ares_htable_vpvp_remove(ew->sockets, &ed->iosb);
   }
 
-  ares__thread_mutex_destroy(ed->lock);
+  ares_thread_mutex_destroy(ed->lock);
 
   if (ed->event != NULL) {
     ed->event->data = NULL;
@@ -718,7 +718,7 @@ static ares_bool_t ares_evsys_win32_event_add(ares_event_t *event)
    * the ares_evsys_win32_eventdata_t as the placeholder to use as the
    * IOCP Completion Key */
   if (ed->socket == ARES_SOCKET_BAD) {
-    ed->lock = ares__thread_mutex_create();
+    ed->lock = ares_thread_mutex_create();
     if (ed->lock == NULL) {
       goto done;
     }
@@ -731,7 +731,7 @@ static ares_bool_t ares_evsys_win32_event_add(ares_event_t *event)
     goto done;
   }
 
-  if (!ares__htable_vpvp_insert(ew->sockets, &ed->iosb, ed)) {
+  if (!ares_htable_vpvp_insert(ew->sockets, &ed->iosb, ed)) {
     goto done;
   }
 
@@ -859,9 +859,9 @@ static ares_bool_t ares_evsys_win32_process_socket_event(
 
   /* Decrement poll count for AFD handle then resort, also disassociate
    * with socket */
-  afd = ares__slist_node_val(ed->afd_handle_node);
+  afd = ares_slist_node_val(ed->afd_handle_node);
   afd->poll_cnt--;
-  ares__slist_node_reinsert(ed->afd_handle_node);
+  ares_slist_node_reinsert(ed->afd_handle_node);
   ed->afd_handle_node = NULL;
 
   /* Pending destroy, go ahead and kill it */
@@ -946,7 +946,7 @@ static size_t ares_evsys_win32_wait(ares_event_thread_t *e,
         ed = (ares_evsys_win32_eventdata_t *)entries[i].lpCompletionKey;
         rc = ares_evsys_win32_process_other_event(ew, ed, i);
       } else {
-        ed = ares__htable_vpvp_get_direct(ew->sockets, entries[i].lpOverlapped);
+        ed = ares_htable_vpvp_get_direct(ew->sockets, entries[i].lpOverlapped);
         rc = ares_evsys_win32_process_socket_event(ew, ed, i);
       }
 
diff --git a/deps/cares/src/lib/dsa/ares__array.h b/deps/cares/src/lib/include/ares_array.h
similarity index 61%
rename from deps/cares/src/lib/dsa/ares__array.h
rename to deps/cares/src/lib/include/ares_array.h
index 6fa1c0e15e9301..f1a2e155f37f82 100644
--- a/deps/cares/src/lib/dsa/ares__array.h
+++ b/deps/cares/src/lib/include/ares_array.h
@@ -26,7 +26,9 @@
 #ifndef __ARES__ARRAY_H
 #define __ARES__ARRAY_H
 
-/*! \addtogroup ares__array Array Data Structure
+#include "ares.h"
+
+/*! \addtogroup ares_array Array Data Structure
  *
  * This is an array with helpers.  It is meant to have as little overhead
  * as possible over direct array management by applications but to provide
@@ -36,17 +38,17 @@
  * @{
  */
 
-struct ares__array;
+struct ares_array;
 
 /*! Opaque data structure for array */
-typedef struct ares__array ares__array_t;
+typedef struct ares_array ares_array_t;
 
 /*! Callback to free user-defined member data
  *
  *  \param[in] data  pointer to member of array to be destroyed. The pointer
  *                   itself must not be destroyed, just the data it contains.
  */
-typedef void (*ares__array_destructor_t)(void *data);
+typedef void (*ares_array_destructor_t)(void *data);
 
 /*! Callback to compare two array elements used for sorting
  *
@@ -54,7 +56,7 @@ typedef void (*ares__array_destructor_t)(void *data);
  *  \param[in] data2 array member 2
  *  \return < 0 if data1 < data2, > 0 if data1 > data2, 0 if data1 == data2
  */
-typedef int (*ares__array_cmp_t)(const void *data1, const void *data2);
+typedef int (*ares_array_cmp_t)(const void *data1, const void *data2);
 
 /*! Create an array object
  *
@@ -72,8 +74,8 @@ typedef int (*ares__array_cmp_t)(const void *data1, const void *data2);
  *
  *  \return array object or NULL on out of memory
  */
-ares__array_t *ares__array_create(size_t                   member_size,
-                                  ares__array_destructor_t destruct);
+CARES_EXTERN ares_array_t *ares_array_create(size_t member_size,
+                                             ares_array_destructor_t destruct);
 
 
 /*! Request the array be at least the requested size.  Useful if the desired
@@ -83,7 +85,7 @@ ares__array_t *ares__array_create(size_t                   member_size,
  *  \param[in] size Minimum number of members
  *  \return ARES_SUCCESS on success, ARES_EFORMERR on misuse,
  *    ARES_ENOMEM on out of memory */
-ares_status_t  ares__array_set_size(ares__array_t *arr, size_t size);
+CARES_EXTERN ares_status_t ares_array_set_size(ares_array_t *arr, size_t size);
 
 /*! Sort the array using the given comparison function.  This is not
  *  persistent, any future elements inserted will not maintain this sort.
@@ -92,14 +94,15 @@ ares_status_t  ares__array_set_size(ares__array_t *arr, size_t size);
  *  \param[in]  cb       Sort callback
  *  \return ARES_SUCCESS on success
  */
-ares_status_t  ares__array_sort(ares__array_t *arr, ares__array_cmp_t cmp);
+CARES_EXTERN ares_status_t ares_array_sort(ares_array_t    *arr,
+                                           ares_array_cmp_t cmp);
 
 /*! Destroy an array object.  If a destructor is set, will be called on each
  *  member of the array.
  *
  *  \param[in] arr     Initialized array object.
  */
-void           ares__array_destroy(ares__array_t *arr);
+CARES_EXTERN void          ares_array_destroy(ares_array_t *arr);
 
 /*! Retrieve the array in the native format.  This will also destroy the
  *  container.  It is the responsibility of the caller to free the returned
@@ -109,14 +112,14 @@ void           ares__array_destroy(ares__array_t *arr);
  *  \param[out] num_members the number of members in the returned array
  *  \return pointer to native array on success, NULL on failure.
  */
-void          *ares__array_finish(ares__array_t *arr, size_t *num_members);
+CARES_EXTERN void  *ares_array_finish(ares_array_t *arr, size_t *num_members);
 
 /*! Retrieve the number of members in the array
  *
  *  \param[in] arr     Initialized array object.
  *  \return numbrer of members
  */
-size_t         ares__array_len(const ares__array_t *arr);
+CARES_EXTERN size_t ares_array_len(const ares_array_t *arr);
 
 /*! Insert a new array member at the given index
  *
@@ -127,8 +130,8 @@ size_t         ares__array_len(const ares__array_t *arr);
  *  \return ARES_SUCCESS on success, ARES_EFORMERR on bad index,
  *          ARES_ENOMEM on out of memory.
  */
-ares_status_t  ares__array_insert_at(void **elem_ptr, ares__array_t *arr,
-                                     size_t idx);
+CARES_EXTERN ares_status_t ares_array_insert_at(void        **elem_ptr,
+                                                ares_array_t *arr, size_t idx);
 
 /*! Insert a new array member at the end of the array
  *
@@ -136,7 +139,8 @@ ares_status_t  ares__array_insert_at(void **elem_ptr, ares__array_t *arr,
  *  \param[in]  arr      Initialized array object.
  *  \return ARES_SUCCESS on success, ARES_ENOMEM on out of memory.
  */
-ares_status_t  ares__array_insert_last(void **elem_ptr, ares__array_t *arr);
+CARES_EXTERN ares_status_t ares_array_insert_last(void        **elem_ptr,
+                                                  ares_array_t *arr);
 
 /*! Insert a new array member at the beginning of the array
  *
@@ -144,39 +148,87 @@ ares_status_t  ares__array_insert_last(void **elem_ptr, ares__array_t *arr);
  *  \param[in]  arr      Initialized array object.
  *  \return ARES_SUCCESS on success, ARES_ENOMEM on out of memory.
  */
-ares_status_t  ares__array_insert_first(void **elem_ptr, ares__array_t *arr);
+CARES_EXTERN ares_status_t ares_array_insert_first(void        **elem_ptr,
+                                                   ares_array_t *arr);
+
+
+/*! Insert a new array member at the given index and copy the data pointed
+ *  to by the data pointer into the array.  This will copy member_size bytes
+ *  from the provided pointer; this may not be safe for some data types
+ *  that may have a smaller size than the provided member_size which includes
+ *  padding as discussed in ares_array_create().
+ *
+ *  \param[in]  arr      Initialized array object.
+ *  \param[in]  idx      Index in array to place new element, will shift any
+ *                       elements down that exist after this point.
+ *  \param[in]  data_ptr Pointer to data to copy into array.
+ *  \return ARES_SUCCESS on success, ARES_EFORMERR on bad index or null data
+ * ptr, ARES_ENOMEM on out of memory.
+ */
+CARES_EXTERN ares_status_t ares_array_insertdata_at(ares_array_t *arr,
+                                                    size_t        idx,
+                                                    const void   *data_ptr);
+
+/*! Insert a new array member at the end of the array and copy the data pointed
+ *  to by the data pointer into the array.  This will copy member_size bytes
+ *  from the provided pointer; this may not be safe for some data types
+ *  that may have a smaller size than the provided member_size which includes
+ *  padding as discussed in ares_array_create().
+ *
+ *  \param[in]  arr      Initialized array object.
+ *  \param[in]  data_ptr Pointer to data to copy into array.
+ *  \return ARES_SUCCESS on success, ARES_EFORMERR on bad index or null data
+ * ptr, ARES_ENOMEM on out of memory.
+ */
+CARES_EXTERN ares_status_t ares_array_insertdata_last(ares_array_t *arr,
+                                                      const void   *data_ptr);
+
+/*! Insert a new array member at the beginning of the array and copy the data
+ * pointed to by the data pointer into the array.  This will copy member_size
+ * bytes from the provided pointer, this may not be safe for some data types
+ *  that may have a smaller size than the provided member_size which includes
+ *  padding as discussed in ares_array_create().
+ *
+ *  \param[in]  arr      Initialized array object.
+ *  \param[in]  data_ptr Pointer to data to copy into array.
+ *  \return ARES_SUCCESS on success, ARES_EFORMERR on bad index or null data
+ * ptr, ARES_ENOMEM on out of memory.
+ */
+CARES_EXTERN ares_status_t ares_array_insertdata_first(ares_array_t *arr,
+                                                       const void   *data_ptr);
 
 /*! Fetch a pointer to the given element in the array
  *  \param[in]  array  Initialized array object
  *  \param[in]  idx    Index to fetch
  *  \return pointer on success, NULL on failure */
-void          *ares__array_at(ares__array_t *arr, size_t idx);
+CARES_EXTERN void         *ares_array_at(ares_array_t *arr, size_t idx);
 
 /*! Fetch a pointer to the first element in the array
  *  \param[in]  array  Initialized array object
  *  \return pointer on success, NULL on failure */
-void          *ares__array_first(ares__array_t *arr);
+CARES_EXTERN void         *ares_array_first(ares_array_t *arr);
 
 /*! Fetch a pointer to the last element in the array
  *  \param[in]  array  Initialized array object
  *  \return pointer on success, NULL on failure */
-void          *ares__array_last(ares__array_t *arr);
+CARES_EXTERN void         *ares_array_last(ares_array_t *arr);
 
 /*! Fetch a constant pointer to the given element in the array
  *  \param[in]  array  Initialized array object
  *  \param[in]  idx    Index to fetch
  *  \return pointer on success, NULL on failure */
-const void    *ares__array_at_const(const ares__array_t *arr, size_t idx);
+CARES_EXTERN const void   *ares_array_at_const(const ares_array_t *arr,
+                                               size_t              idx);
 
 /*! Fetch a constant pointer to the first element in the array
  *  \param[in]  array  Initialized array object
  *  \return pointer on success, NULL on failure */
-const void    *ares__array_first_const(const ares__array_t *arr);
+CARES_EXTERN const void   *ares_array_first_const(const ares_array_t *arr);
 
 /*! Fetch a constant pointer to the last element in the array
  *  \param[in]  array  Initialized array object
  *  \return pointer on success, NULL on failure */
-const void    *ares__array_last_const(const ares__array_t *arr);
+CARES_EXTERN const void   *ares_array_last_const(const ares_array_t *arr);
 
 /*! Claim the data from the specified array index, copying it to the buffer
  *  provided by the caller.  The index specified in the array will then be
@@ -187,13 +239,13 @@ const void    *ares__array_last_const(const ares__array_t *arr);
  *                           member needs destructor if not provided.
  *  \param[in]     dest_size Size of buffer provided, used as a sanity check.
  *                           Must match member_size provided to
- *                           ares__array_create() if dest_size specified.
+ *                           ares_array_create() if dest_size specified.
  *  \param[in]     arr       Initialized array object
  *  \param[in]     idx       Index to claim
  *  \return ARES_SUCCESS on success, ARES_EFORMERR on usage failure.
  */
-ares_status_t  ares__array_claim_at(void *dest, size_t dest_size,
-                                    ares__array_t *arr, size_t idx);
+CARES_EXTERN ares_status_t ares_array_claim_at(void *dest, size_t dest_size,
+                                               ares_array_t *arr, size_t idx);
 
 /*! Remove the member at the specified array index.  The destructor will be
  *  called.
@@ -202,21 +254,22 @@ ares_status_t  ares__array_claim_at(void *dest, size_t dest_size,
  *  \param[in] idx  Index to remove
  *  \return ARES_SUCCESS if removed, ARES_EFORMERR on invalid use
  */
-ares_status_t  ares__array_remove_at(ares__array_t *arr, size_t idx);
+CARES_EXTERN ares_status_t ares_array_remove_at(ares_array_t *arr, size_t idx);
 
 /*! Remove the first member of the array.
  *
  *  \param[in] arr  Initialized array object
  *  \return ARES_SUCCESS if removed, ARES_EFORMERR on invalid use
  */
-ares_status_t  ares__array_remove_first(ares__array_t *arr);
+CARES_EXTERN ares_status_t ares_array_remove_first(ares_array_t *arr);
 
 /*! Remove the last member of the array.
  *
  *  \param[in] arr  Initialized array object
  *  \return ARES_SUCCESS if removed, ARES_EFORMERR on invalid use
  */
-ares_status_t  ares__array_remove_last(ares__array_t *arr);
+CARES_EXTERN ares_status_t ares_array_remove_last(ares_array_t *arr);
+
 
 /*! @} */
 
diff --git a/deps/cares/src/lib/str/ares__buf.h b/deps/cares/src/lib/include/ares_buf.h
similarity index 60%
rename from deps/cares/src/lib/str/ares__buf.h
rename to deps/cares/src/lib/include/ares_buf.h
index cb887aa27edcdf..7836a313e066d1 100644
--- a/deps/cares/src/lib/str/ares__buf.h
+++ b/deps/cares/src/lib/include/ares_buf.h
@@ -26,7 +26,10 @@
 #ifndef __ARES__BUF_H
 #define __ARES__BUF_H
 
-/*! \addtogroup ares__buf Safe Data Builder and buffer
+#include "ares.h"
+#include "ares_array.h"
+
+/*! \addtogroup ares_buf Safe Data Builder and buffer
  *
  * This is a buffer building and parsing framework with a focus on security over
  * performance. All data to be read from the buffer will perform explicit length
@@ -42,16 +45,16 @@
  *
  * @{
  */
-struct ares__buf;
+struct ares_buf;
 
 /*! Opaque data type for generic hash table implementation */
-typedef struct ares__buf ares__buf_t;
+typedef struct ares_buf     ares_buf_t;
 
 /*! Create a new buffer object that dynamically allocates buffers for data.
  *
  *  \return initialized buffer object or NULL if out of memory.
  */
-ares__buf_t             *ares__buf_create(void);
+CARES_EXTERN ares_buf_t    *ares_buf_create(void);
 
 /*! Create a new buffer object that uses a user-provided data pointer.  The
  *  data provided will not be manipulated, and cannot be appended to.  This
@@ -62,14 +65,15 @@ ares__buf_t             *ares__buf_create(void);
  *
  *  \return initialized buffer object or NULL if out of memory or misuse.
  */
-ares__buf_t *ares__buf_create_const(const unsigned char *data, size_t data_len);
+CARES_EXTERN ares_buf_t    *ares_buf_create_const(const unsigned char *data,
+                                                  size_t               data_len);
 
 
 /*! Destroy an initialized buffer object.
  *
  *  \param[in] buf  Initialized buf object
  */
-void         ares__buf_destroy(ares__buf_t *buf);
+CARES_EXTERN void           ares_buf_destroy(ares_buf_t *buf);
 
 
 /*! Append multiple bytes to a dynamic buffer object
@@ -79,8 +83,9 @@ void         ares__buf_destroy(ares__buf_t *buf);
  *  \param[in] data_len Length of data to copy to buffer object.
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t  ares__buf_append(ares__buf_t *buf, const unsigned char *data,
-                                size_t data_len);
+CARES_EXTERN ares_status_t  ares_buf_append(ares_buf_t          *buf,
+                                            const unsigned char *data,
+                                            size_t               data_len);
 
 /*! Append a single byte to the dynamic buffer object
  *
@@ -88,7 +93,8 @@ ares_status_t  ares__buf_append(ares__buf_t *buf, const unsigned char *data,
  *  \param[in] b        Single byte to append to buffer object.
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t  ares__buf_append_byte(ares__buf_t *buf, unsigned char b);
+CARES_EXTERN ares_status_t  ares_buf_append_byte(ares_buf_t   *buf,
+                                                 unsigned char b);
 
 /*! Append a null-terminated string to the dynamic buffer object
  *
@@ -96,7 +102,8 @@ ares_status_t  ares__buf_append_byte(ares__buf_t *buf, unsigned char b);
  *  \param[in] str      String to append to buffer object.
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t  ares__buf_append_str(ares__buf_t *buf, const char *str);
+CARES_EXTERN ares_status_t  ares_buf_append_str(ares_buf_t *buf,
+                                                const char *str);
 
 /*! Append a 16bit Big Endian number to the buffer.
  *
@@ -104,7 +111,8 @@ ares_status_t  ares__buf_append_str(ares__buf_t *buf, const char *str);
  *  \param[out] u16     16bit integer
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t  ares__buf_append_be16(ares__buf_t *buf, unsigned short u16);
+CARES_EXTERN ares_status_t  ares_buf_append_be16(ares_buf_t    *buf,
+                                                 unsigned short u16);
 
 /*! Append a 32bit Big Endian number to the buffer.
  *
@@ -112,7 +120,8 @@ ares_status_t  ares__buf_append_be16(ares__buf_t *buf, unsigned short u16);
  *  \param[out] u32     32bit integer
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t  ares__buf_append_be32(ares__buf_t *buf, unsigned int u32);
+CARES_EXTERN ares_status_t  ares_buf_append_be32(ares_buf_t  *buf,
+                                                 unsigned int u32);
 
 /*! Append a number in ASCII decimal form.
  *
@@ -121,8 +130,8 @@ ares_status_t  ares__buf_append_be32(ares__buf_t *buf, unsigned int u32);
  *  \param[in] len  Length to output, use 0 for no padding
  *  \return ARES_SUCCESS on success
  */
-ares_status_t  ares__buf_append_num_dec(ares__buf_t *buf, size_t num,
-                                        size_t len);
+CARES_EXTERN ares_status_t  ares_buf_append_num_dec(ares_buf_t *buf, size_t num,
+                                                    size_t len);
 
 /*! Append a number in ASCII hexadecimal form.
  *
@@ -131,8 +140,8 @@ ares_status_t  ares__buf_append_num_dec(ares__buf_t *buf, size_t num,
  *  \param[in] len  Length to output, use 0 for no padding
  *  \return ARES_SUCCESS on success
  */
-ares_status_t  ares__buf_append_num_hex(ares__buf_t *buf, size_t num,
-                                        size_t len);
+CARES_EXTERN ares_status_t  ares_buf_append_num_hex(ares_buf_t *buf, size_t num,
+                                                    size_t len);
 
 /*! Sets the current buffer length.  This *may* be used if there is a need to
  *  override a prior position in the buffer, such as if there is a length
@@ -147,13 +156,13 @@ ares_status_t  ares__buf_append_num_hex(ares__buf_t *buf, size_t num,
  *  \param[in]  len  Length to set
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t  ares__buf_set_length(ares__buf_t *buf, size_t len);
+CARES_EXTERN ares_status_t  ares_buf_set_length(ares_buf_t *buf, size_t len);
 
 
 /*! Start a dynamic append operation that returns a buffer suitable for
  *  writing.  A desired minimum length is passed in, and the actual allocated
  *  buffer size is returned which may be greater than the requested size.
- *  No operation other than ares__buf_append_finish() is allowed on the
+ *  No operation other than ares_buf_append_finish() is allowed on the
  *  buffer after this request.
  *
  *  \param[in]     buf     Initialized buffer object
@@ -161,17 +170,17 @@ ares_status_t  ares__buf_set_length(ares__buf_t *buf, size_t len);
  *                         returned.
  *  \return Pointer to writable buffer or NULL on failure (usage, out of mem)
  */
-unsigned char *ares__buf_append_start(ares__buf_t *buf, size_t *len);
+CARES_EXTERN unsigned char *ares_buf_append_start(ares_buf_t *buf, size_t *len);
 
 /*! Finish a dynamic append operation.  Called after
- *  ares__buf_append_start() once desired data is written.
+ *  ares_buf_append_start() once desired data is written.
  *
  *  \param[in] buf    Initialized buffer object.
  *  \param[in] len    Length of data written.  May be zero to terminate
  *                    operation. Must not be greater than returned from
- *                    ares__buf_append_start().
+ *                    ares_buf_append_start().
  */
-void           ares__buf_append_finish(ares__buf_t *buf, size_t len);
+CARES_EXTERN void           ares_buf_append_finish(ares_buf_t *buf, size_t len);
 
 /*! Write the data provided to the buffer in a hexdump format.
  *
@@ -180,10 +189,11 @@ void           ares__buf_append_finish(ares__buf_t *buf, size_t len);
  *  \param[in] len      Length of data to hexdump
  *  \return ARES_SUCCESS on success.
  */
-ares_status_t  ares__buf_hexdump(ares__buf_t *buf, const unsigned char *data,
-                                 size_t len);
+CARES_EXTERN ares_status_t  ares_buf_hexdump(ares_buf_t          *buf,
+                                             const unsigned char *data,
+                                             size_t               len);
 
-/*! Clean up ares__buf_t and return allocated pointer to unprocessed data.  It
+/*! Clean up ares_buf_t and return allocated pointer to unprocessed data.  It
  *  is the responsibility of the  caller to ares_free() the returned buffer.
  *  The passed in buf parameter is invalidated by this call.
  *
@@ -191,9 +201,9 @@ ares_status_t  ares__buf_hexdump(ares__buf_t *buf, const unsigned char *data,
  * \param[out] len    Length of data returned
  * \return pointer to unprocessed data (may be zero length) or NULL on error.
  */
-unsigned char *ares__buf_finish_bin(ares__buf_t *buf, size_t *len);
+CARES_EXTERN unsigned char *ares_buf_finish_bin(ares_buf_t *buf, size_t *len);
 
-/*! Clean up ares__buf_t and return allocated pointer to unprocessed data and
+/*! Clean up ares_buf_t and return allocated pointer to unprocessed data and
  *  return it as a string (null terminated).  It is the responsibility of the
  *  caller to ares_free() the returned buffer. The passed in buf parameter is
  *  invalidated by this call.
@@ -207,7 +217,7 @@ unsigned char *ares__buf_finish_bin(ares__buf_t *buf, size_t *len);
  * \param[out] len    Optional. Length of data returned, or NULL if not needed.
  * \return pointer to unprocessed data or NULL on error.
  */
-char          *ares__buf_finish_str(ares__buf_t *buf, size_t *len);
+CARES_EXTERN char          *ares_buf_finish_str(ares_buf_t *buf, size_t *len);
 
 /*! Tag a position to save in the buffer in case parsing needs to rollback,
  *  such as if insufficient data is available, but more data may be added in
@@ -216,14 +226,14 @@ char          *ares__buf_finish_str(ares__buf_t *buf, size_t *len);
  *
  *  \param[in] buf Initialized buffer object
  */
-void           ares__buf_tag(ares__buf_t *buf);
+CARES_EXTERN void           ares_buf_tag(ares_buf_t *buf);
 
 /*! Rollback to a tagged position.  Will automatically clear the tag.
  *
  *  \param[in] buf Initialized buffer object
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t  ares__buf_tag_rollback(ares__buf_t *buf);
+CARES_EXTERN ares_status_t  ares_buf_tag_rollback(ares_buf_t *buf);
 
 /*! Clear the tagged position without rolling back.  You should do this any
  *  time a tag is no longer needed as future append operations can reclaim
@@ -232,25 +242,26 @@ ares_status_t  ares__buf_tag_rollback(ares__buf_t *buf);
  *  \param[in] buf Initialized buffer object
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t  ares__buf_tag_clear(ares__buf_t *buf);
+CARES_EXTERN ares_status_t  ares_buf_tag_clear(ares_buf_t *buf);
 
 /*! Fetch the buffer and length of data starting from the tagged position up
  *  to the _current_ position.  It will not unset the tagged position.  The
- *  data may be invalidated by any future ares__buf_*() calls.
+ *  data may be invalidated by any future ares_buf_*() calls.
  *
  *  \param[in]  buf    Initialized buffer object
  *  \param[out] len    Length between tag and current offset in buffer
  *  \return NULL on failure (such as no tag), otherwise pointer to start of
  *          buffer
  */
-const unsigned char *ares__buf_tag_fetch(const ares__buf_t *buf, size_t *len);
+CARES_EXTERN const unsigned char *ares_buf_tag_fetch(const ares_buf_t *buf,
+                                                     size_t           *len);
 
 /*! Get the length of the current tag offset to the current position.
  *
  *  \param[in]  buf    Initialized buffer object
  *  \return length
  */
-size_t               ares__buf_tag_length(const ares__buf_t *buf);
+CARES_EXTERN size_t               ares_buf_tag_length(const ares_buf_t *buf);
 
 /*! Fetch the bytes starting from the tagged position up to the _current_
  *  position using the provided buffer.  It will not unset the tagged position.
@@ -261,23 +272,50 @@ size_t               ares__buf_tag_length(const ares__buf_t *buf);
  *                        buffer.
  *  \return ARES_SUCCESS if fetched, ARES_EFORMERR if insufficient buffer size
  */
-ares_status_t        ares__buf_tag_fetch_bytes(const ares__buf_t *buf,
-                                               unsigned char *bytes, size_t *len);
+CARES_EXTERN ares_status_t ares_buf_tag_fetch_bytes(const ares_buf_t *buf,
+                                                    unsigned char    *bytes,
+                                                    size_t           *len);
 
 /*! Fetch the bytes starting from the tagged position up to the _current_
  *  position as a NULL-terminated string using the provided buffer.  The data
  *  is validated to be ASCII-printable data.  It will not unset the tagged
- *  poition.
+ *  position.
  *
  *  \param[in]     buf    Initialized buffer object
  *  \param[in,out] str    Buffer to hold data
- *  \param[in]     len    On input, buffer size, on output, bytes place in
- *                        buffer.
+ *  \param[in]     len    buffer size
+ *  \return ARES_SUCCESS if fetched, ARES_EFORMERR if insufficient buffer size,
+ *          ARES_EBADSTR if not printable ASCII
+ */
+CARES_EXTERN ares_status_t ares_buf_tag_fetch_string(const ares_buf_t *buf,
+                                                     char *str, size_t len);
+
+/*! Fetch the bytes starting from the tagged position up to the _current_
+ *  position as a NULL-terminated string and placed into a newly allocated
+ *  buffer.  The data is validated to be ASCII-printable data.  It will not
+ *  unset the tagged position.
+ *
+ *  \param[in]  buf    Initialized buffer object
+ *  \param[out] str    New buffer to hold output, free with ares_free()
+ *
  *  \return ARES_SUCCESS if fetched, ARES_EFORMERR if insufficient buffer size,
  *          ARES_EBADSTR if not printable ASCII
  */
-ares_status_t ares__buf_tag_fetch_string(const ares__buf_t *buf, char *str,
-                                         size_t len);
+CARES_EXTERN ares_status_t ares_buf_tag_fetch_strdup(const ares_buf_t *buf,
+                                                     char            **str);
+
+/*! Fetch the bytes starting from the tagged position up to the _current_
+ *  position as const buffer.  Care must be taken to not append or destroy the
+ *  passed in buffer until the newly fetched buffer is no longer needed since
+ *  it points to memory inside the passed in buffer which could be invalidated.
+ *
+ *  \param[in]     buf    Initialized buffer object
+ *  \param[out]    newbuf New const buffer object, must be destroyed when done.
+ *
+ *  \return ARES_SUCCESS if fetched
+ */
+CARES_EXTERN ares_status_t ares_buf_tag_fetch_constbuf(const ares_buf_t *buf,
+                                                       ares_buf_t **newbuf);
 
 /*! Consume the given number of bytes without reading them.
  *
@@ -285,7 +323,7 @@ ares_status_t ares__buf_tag_fetch_string(const ares__buf_t *buf, char *str,
  *  \param[in] len    Length to consume
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t ares__buf_consume(ares__buf_t *buf, size_t len);
+CARES_EXTERN ares_status_t ares_buf_consume(ares_buf_t *buf, size_t len);
 
 /*! Fetch a 16bit Big Endian number from the buffer.
  *
@@ -293,7 +331,8 @@ ares_status_t ares__buf_consume(ares__buf_t *buf, size_t len);
  *  \param[out] u16     Buffer to hold 16bit integer
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t ares__buf_fetch_be16(ares__buf_t *buf, unsigned short *u16);
+CARES_EXTERN ares_status_t ares_buf_fetch_be16(ares_buf_t     *buf,
+                                               unsigned short *u16);
 
 /*! Fetch a 32bit Big Endian number from the buffer.
  *
@@ -301,7 +340,8 @@ ares_status_t ares__buf_fetch_be16(ares__buf_t *buf, unsigned short *u16);
  *  \param[out] u32     Buffer to hold 32bit integer
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t ares__buf_fetch_be32(ares__buf_t *buf, unsigned int *u32);
+CARES_EXTERN ares_status_t ares_buf_fetch_be32(ares_buf_t   *buf,
+                                               unsigned int *u32);
 
 
 /*! Fetch the requested number of bytes into the provided buffer
@@ -311,8 +351,9 @@ ares_status_t ares__buf_fetch_be32(ares__buf_t *buf, unsigned int *u32);
  *  \param[in]  len     Requested number of bytes (must be > 0)
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t ares__buf_fetch_bytes(ares__buf_t *buf, unsigned char *bytes,
-                                    size_t len);
+CARES_EXTERN ares_status_t ares_buf_fetch_bytes(ares_buf_t    *buf,
+                                                unsigned char *bytes,
+                                                size_t         len);
 
 
 /*! Fetch the requested number of bytes and return a new buffer that must be
@@ -326,9 +367,9 @@ ares_status_t ares__buf_fetch_bytes(ares__buf_t *buf, unsigned char *bytes,
  *  \param[out] bytes     Pointer passed by reference. Will be allocated.
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t ares__buf_fetch_bytes_dup(ares__buf_t *buf, size_t len,
-                                        ares_bool_t     null_term,
-                                        unsigned char **bytes);
+CARES_EXTERN ares_status_t ares_buf_fetch_bytes_dup(ares_buf_t *buf, size_t len,
+                                                    ares_bool_t     null_term,
+                                                    unsigned char **bytes);
 
 /*! Fetch the requested number of bytes and place them into the provided
  *  dest buffer object.
@@ -338,19 +379,21 @@ ares_status_t ares__buf_fetch_bytes_dup(ares__buf_t *buf, size_t len,
  *  \param[in]  len     Requested number of bytes (must be > 0)
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t ares__buf_fetch_bytes_into_buf(ares__buf_t *buf,
-                                             ares__buf_t *dest, size_t len);
+CARES_EXTERN ares_status_t ares_buf_fetch_bytes_into_buf(ares_buf_t *buf,
+                                                         ares_buf_t *dest,
+                                                         size_t      len);
 
 /*! Fetch the requested number of bytes and return a new buffer that must be
  *  ares_free()'d by the caller.  The returned buffer is a null terminated
- *  string.
+ *  string.  The data is validated to be ASCII-printable.
  *
  *  \param[in]  buf     Initialized buffer object
  *  \param[in]  len     Requested number of bytes (must be > 0)
  *  \param[out] str     Pointer passed by reference. Will be allocated.
  *  \return ARES_SUCCESS or one of the c-ares error codes
  */
-ares_status_t ares__buf_fetch_str_dup(ares__buf_t *buf, size_t len, char **str);
+CARES_EXTERN ares_status_t ares_buf_fetch_str_dup(ares_buf_t *buf, size_t len,
+                                                  char **str);
 
 /*! Consume whitespace characters (0x09, 0x0B, 0x0C, 0x0D, 0x20, and optionally
  *  0x0A).
@@ -360,8 +403,8 @@ ares_status_t ares__buf_fetch_str_dup(ares__buf_t *buf, size_t len, char **str);
  *                                ARES_FALSE otherwise.
  *  \return number of whitespace characters consumed
  */
-size_t        ares__buf_consume_whitespace(ares__buf_t *buf,
-                                           ares_bool_t  include_linefeed);
+CARES_EXTERN size_t        ares_buf_consume_whitespace(ares_buf_t *buf,
+                                                       ares_bool_t include_linefeed);
 
 
 /*! Consume any non-whitespace character (anything other than 0x09, 0x0B, 0x0C,
@@ -370,7 +413,7 @@ size_t        ares__buf_consume_whitespace(ares__buf_t *buf,
  *  \param[in]  buf               Initialized buffer object
  *  \return number of characters consumed
  */
-size_t        ares__buf_consume_nonwhitespace(ares__buf_t *buf);
+CARES_EXTERN size_t        ares_buf_consume_nonwhitespace(ares_buf_t *buf);
 
 
 /*! Consume until a character in the character set provided is reached.  Does
@@ -382,13 +425,31 @@ size_t        ares__buf_consume_nonwhitespace(ares__buf_t *buf);
  *  \param[in] require_charset    require we find a character from the charset.
  *                                if ARES_FALSE it will simply consume the
  *                                rest of the buffer.  If ARES_TRUE will return
- *                                0 if not found.
+ *                                SIZE_MAX if not found.
  *  \return number of characters consumed
  */
-size_t        ares__buf_consume_until_charset(ares__buf_t         *buf,
-                                              const unsigned char *charset, size_t len,
-                                              ares_bool_t require_charset);
+CARES_EXTERN size_t        ares_buf_consume_until_charset(ares_buf_t          *buf,
+                                                          const unsigned char *charset,
+                                                          size_t               len,
+                                                          ares_bool_t require_charset);
+
 
+/*! Consume until a sequence of bytes is encountered.  Does not include the
+ *  sequence of characters itself.
+ *
+ *  \param[in] buf                Initialized buffer object
+ *  \param[in] seq                sequence of bytes
+ *  \param[in] len                length of sequence
+ *  \param[in] require_seq        require we find the sequence.
+ *                                if ARES_FALSE it will simply consume the
+ *                                rest of the buffer.  If ARES_TRUE will return
+ *                                SIZE_MAX if not found.
+ *  \return number of characters consumed
+ */
+CARES_EXTERN size_t        ares_buf_consume_until_seq(ares_buf_t          *buf,
+                                                      const unsigned char *seq,
+                                                      size_t               len,
+                                                      ares_bool_t require_seq);
 
 /*! Consume while the characters match the characters in the provided set.
  *
@@ -397,8 +458,9 @@ size_t        ares__buf_consume_until_charset(ares__buf_t         *buf,
  *  \param[in] len                length of character set
  *  \return number of characters consumed
  */
-size_t ares__buf_consume_charset(ares__buf_t *buf, const unsigned char *charset,
-                                 size_t len);
+CARES_EXTERN size_t        ares_buf_consume_charset(ares_buf_t          *buf,
+                                                    const unsigned char *charset,
+                                                    size_t               len);
 
 
 /*! Consume from the current position until the end of the line, and optionally
@@ -409,7 +471,8 @@ size_t ares__buf_consume_charset(ares__buf_t *buf, const unsigned char *charset,
  *                                ARES_FALSE otherwise.
  *  \return number of characters consumed
  */
-size_t ares__buf_consume_line(ares__buf_t *buf, ares_bool_t include_linefeed);
+CARES_EXTERN size_t        ares_buf_consume_line(ares_buf_t *buf,
+                                                 ares_bool_t include_linefeed);
 
 typedef enum {
   /*! No flags */
@@ -419,9 +482,9 @@ typedef enum {
    *  incompatible with ARES_BUF_SPLIT_LTRIM since the delimiter is always
    *  the first character.
    */
-  ARES_BUF_SPLIT_DONT_CONSUME_DELIMS = 1 << 0,
+  ARES_BUF_SPLIT_KEEP_DELIMS = 1 << 0,
   /*! Allow blank sections, by default blank sections are not emitted.  If using
-   *  ARES_BUF_SPLIT_DONT_CONSUME_DELIMS, the delimiter is not counted as part
+   *  ARES_BUF_SPLIT_KEEP_DELIMS, the delimiter is not counted as part
    *  of the section */
   ARES_BUF_SPLIT_ALLOW_BLANK = 1 << 1,
   /*! Remove duplicate entries */
@@ -434,7 +497,7 @@ typedef enum {
   ARES_BUF_SPLIT_RTRIM = 1 << 5,
   /*! Trim leading and trailing whitespace from buffer */
   ARES_BUF_SPLIT_TRIM = (ARES_BUF_SPLIT_LTRIM | ARES_BUF_SPLIT_RTRIM)
-} ares__buf_split_t;
+} ares_buf_split_t;
 
 /*! Split the provided buffer into multiple sub-buffers stored in the variable
  *  pointed to by the linked list.  The sub buffers are const buffers pointing
@@ -450,17 +513,60 @@ typedef enum {
  *                                character in the value.  A value of 1 would
  *                                have little usefulness and would effectively
  *                                ignore the delimiter itself.
- *  \param[out] list              Result. Depending on flags, this may be a
- *                                valid list with no elements.  Use
- *                                ares__llist_destroy() to free the memory which
- *                                will also free the contained ares__buf_t
- *                                objects.
+ *  \param[out] arr               Result. Depending on flags, this may be a
+ *                                valid array with no elements.  Use
+ *                                ares_array_destroy() to free the memory which
+ *                                will also free the contained ares_buf_t *
+ *                                objects. Each buf object returned by
+ *                                ares_array_at() or similar is a pointer to
+ *                                an ares_buf_t * object, meaning you need to
+ *                                accept it as "ares_buf_t **" then dereference.
+ *  \return ARES_SUCCESS on success, or error like ARES_ENOMEM.
+ */
+CARES_EXTERN ares_status_t ares_buf_split(
+  ares_buf_t *buf, const unsigned char *delims, size_t delims_len,
+  ares_buf_split_t flags, size_t max_sections, ares_array_t **arr);
+
+/*! Split the provided buffer into an ares_array_t of C strings.
+ *
+ *  \param[in]  buf               Initialized buffer object
+ *  \param[in]  delims            Possible delimiters
+ *  \param[in]  delims_len        Length of possible delimiters
+ *  \param[in]  flags             One or more flags
+ *  \param[in]  max_sections      Maximum number of sections.  Use 0 for
+ *                                unlimited. Useful for splitting key/value
+ *                                pairs where the delimiter may be a valid
+ *                                character in the value.  A value of 1 would
+ *                                have little usefulness and would effectively
+ *                                ignore the delimiter itself.
+ *  \param[out] arr               Array of strings. Free using
+ *                                ares_array_destroy().
  *  \return ARES_SUCCESS on success, or error like ARES_ENOMEM.
  */
-ares_status_t ares__buf_split(ares__buf_t *buf, const unsigned char *delims,
-                              size_t delims_len, ares__buf_split_t flags,
-                              size_t max_sections, ares__llist_t **list);
+CARES_EXTERN ares_status_t ares_buf_split_str_array(
+  ares_buf_t *buf, const unsigned char *delims, size_t delims_len,
+  ares_buf_split_t flags, size_t max_sections, ares_array_t **arr);
 
+/*! Split the provided buffer into a C array of C strings.
+ *
+ *  \param[in]  buf               Initialized buffer object
+ *  \param[in]  delims            Possible delimiters
+ *  \param[in]  delims_len        Length of possible delimiters
+ *  \param[in]  flags             One or more flags
+ *  \param[in]  max_sections      Maximum number of sections.  Use 0 for
+ *                                unlimited. Useful for splitting key/value
+ *                                pairs where the delimiter may be a valid
+ *                                character in the value.  A value of 1 would
+ *                                have little usefulness and would effectively
+ *                                ignore the delimiter itself.
+ *  \param[out] strs              Array of strings. Free using
+ *                                ares_free_array(strs, nstrs, ares_free)
+ *  \param[out] nstrs             Number of elements in the array.
+ *  \return ARES_SUCCESS on success, or error like ARES_ENOMEM.
+ */
+CARES_EXTERN ares_status_t ares_buf_split_str(
+  ares_buf_t *buf, const unsigned char *delims, size_t delims_len,
+  ares_buf_split_t flags, size_t max_sections, char ***strs, size_t *nstrs);
 
 /*! Check the unprocessed buffer to see if it begins with the sequence of
  *  characters provided.
@@ -470,8 +576,9 @@ ares_status_t ares__buf_split(ares__buf_t *buf, const unsigned char *delims,
  *  \param[in] data_len     Length of data to compare.
  *  \return ARES_TRUE on match, ARES_FALSE otherwise.
  */
-ares_bool_t   ares__buf_begins_with(const ares__buf_t   *buf,
-                                    const unsigned char *data, size_t data_len);
+CARES_EXTERN ares_bool_t          ares_buf_begins_with(const ares_buf_t    *buf,
+                                                       const unsigned char *data,
+                                                       size_t               data_len);
 
 
 /*! Size of unprocessed remaining data length
@@ -479,18 +586,27 @@ ares_bool_t   ares__buf_begins_with(const ares__buf_t   *buf,
  *  \param[in] buf Initialized buffer object
  *  \return length remaining
  */
-size_t        ares__buf_len(const ares__buf_t *buf);
+CARES_EXTERN size_t               ares_buf_len(const ares_buf_t *buf);
 
 /*! Retrieve a pointer to the currently unprocessed data.  Generally this isn't
  *  recommended to be used in practice.  The returned pointer may be invalidated
- *  by any future ares__buf_*() calls.
+ *  by any future ares_buf_*() calls.
  *
  *  \param[in]  buf    Initialized buffer object
  *  \param[out] len    Length of available data
  *  \return Pointer to buffer of unprocessed data
  */
-const unsigned char *ares__buf_peek(const ares__buf_t *buf, size_t *len);
+CARES_EXTERN const unsigned char *ares_buf_peek(const ares_buf_t *buf,
+                                                size_t           *len);
 
+/*! Retrieve the next byte in the buffer without moving forward.
+ *
+ *  \param[in]  buf  Initialized buffer object
+ *  \param[out] b    Single byte
+ *  \return ARES_SUCCESS on success, or error
+ */
+CARES_EXTERN ares_status_t        ares_buf_peek_byte(const ares_buf_t *buf,
+                                                     unsigned char    *b);
 
 /*! Wipe any processed data from the beginning of the buffer.  This will
  *  move any remaining data to the front of the internally allocated buffer.
@@ -503,46 +619,46 @@ const unsigned char *ares__buf_peek(const ares__buf_t *buf, size_t *len);
  *
  *  It may be useful to call in order to ensure the current message being
  *  processed is in the beginning of the buffer if there is an intent to use
- *  ares__buf_set_position() and ares__buf_get_position() as may be necessary
+ *  ares_buf_set_position() and ares_buf_get_position() as may be necessary
  *  when processing DNS compressed names.
  *
  *  If there is an active tag, it will NOT clear the tag, it will use the tag
  *  as the start of the unprocessed data rather than the current offset.  If
- *  a prior tag is no longer needed, may be wise to call ares__buf_tag_clear().
+ *  a prior tag is no longer needed, may be wise to call ares_buf_tag_clear().
  *
  *  \param[in]  buf    Initialized buffer object
  */
-void                 ares__buf_reclaim(ares__buf_t *buf);
+CARES_EXTERN void                 ares_buf_reclaim(ares_buf_t *buf);
 
 /*! Set the current offset within the internal buffer.
  *
- *  Typically this should not be used, if possible, use the ares__buf_tag*()
+ *  Typically this should not be used; if possible, use the ares_buf_tag*()
  *  operations instead.
  *
  *  One exception is DNS name compression which may backwards reference to
  *  an index in the message.  It may be necessary in such a case to call
- *  ares__buf_reclaim() if using a dynamic (non-const) buffer before processing
+ *  ares_buf_reclaim() if using a dynamic (non-const) buffer before processing
  *  such a message.
  *
  *  \param[in] buf  Initialized buffer object
  *  \param[in] idx  Index to set position
  *  \return ARES_SUCCESS if valid index
  */
-ares_status_t        ares__buf_set_position(ares__buf_t *buf, size_t idx);
+CARES_EXTERN ares_status_t ares_buf_set_position(ares_buf_t *buf, size_t idx);
 
 /*! Get the current offset within the internal buffer.
  *
- *  Typically this should not be used, if possible, use the ares__buf_tag*()
+ *  Typically this should not be used; if possible, use the ares_buf_tag*()
  *  operations instead.
  *
  *  This can be used to get the current position, useful for saving if a
- *  jump via ares__buf_set_position() is performed and need to restore the
+ *  jump via ares_buf_set_position() is performed and need to restore the
  *  current position for future operations.
  *
  *  \param[in] buf Initialized buffer object
  *  \return index of current position
  */
-size_t               ares__buf_get_position(const ares__buf_t *buf);
+CARES_EXTERN size_t        ares_buf_get_position(const ares_buf_t *buf);
 
 /*! Parse a character-string as defined in RFC1035, as a null-terminated
  *  string.
@@ -556,29 +672,9 @@ size_t               ares__buf_get_position(const ares__buf_t *buf);
  *                             ares_free()'d by the caller.
  *  \return ARES_SUCCESS on success
  */
-ares_status_t ares__buf_parse_dns_str(ares__buf_t *buf, size_t remaining_len,
-                                      char **name);
-
-/*! Parse an array of character strings as defined in RFC1035, as binary,
- *  however, for convenience this does guarantee a NULL terminator (that is
- *  not included in the length for each value).
- *
- *  \param[in]  buf                initialized buffer object
- *  \param[in]  remaining_len      maximum length that should be used for
- *                                 parsing the string, this is often less than
- *                                 the remaining buffer and is based on the RR
- *                                 record length.
- *  \param[out] strs               Pointer passed by reference to be filled in
- *                                 with
- *                                 the array of values.
- *  \param[out] validate_printable Validate the strings contain only printable
- *                                 data.
- *  \return ARES_SUCCESS on success
- */
-ares_status_t ares__buf_parse_dns_abinstr(ares__buf_t *buf,
-                                          size_t       remaining_len,
-                                          ares__dns_multistring_t **strs,
-                                          ares_bool_t validate_printable);
+CARES_EXTERN ares_status_t ares_buf_parse_dns_str(ares_buf_t *buf,
+                                                  size_t      remaining_len,
+                                                  char      **name);
 
 /*! Parse a character-string as defined in RFC1035, as binary, however for
  *  convenience this does guarantee a NULL terminator (that is not included
@@ -594,8 +690,10 @@ ares_status_t ares__buf_parse_dns_abinstr(ares__buf_t *buf,
  *  \param[out] bin_len        Length of returned string.
  *  \return ARES_SUCCESS on success
  */
-ares_status_t ares__buf_parse_dns_binstr(ares__buf_t *buf, size_t remaining_len,
-                                         unsigned char **bin, size_t *bin_len);
+CARES_EXTERN ares_status_t ares_buf_parse_dns_binstr(ares_buf_t *buf,
+                                                     size_t      remaining_len,
+                                                     unsigned char **bin,
+                                                     size_t         *bin_len);
 
 /*! Load data from specified file path into provided buffer.  The entire file
  *  is loaded into memory.
@@ -606,7 +704,8 @@ ares_status_t ares__buf_parse_dns_binstr(ares__buf_t *buf, size_t remaining_len,
  *  \return ARES_ENOTFOUND if file not found, ARES_EFILE if issues reading
  *          file, ARES_ENOMEM if out of memory, ARES_SUCCESS on success.
  */
-ares_status_t ares__buf_load_file(const char *filename, ares__buf_t *buf);
+CARES_EXTERN ares_status_t ares_buf_load_file(const char *filename,
+                                              ares_buf_t *buf);
 
 /*! @} */
 
diff --git a/deps/cares/src/lib/dsa/ares__htable_asvp.h b/deps/cares/src/lib/include/ares_htable_asvp.h
similarity index 72%
rename from deps/cares/src/lib/dsa/ares__htable_asvp.h
rename to deps/cares/src/lib/include/ares_htable_asvp.h
index 49a766d023091e..89a99fc9eec541 100644
--- a/deps/cares/src/lib/dsa/ares__htable_asvp.h
+++ b/deps/cares/src/lib/include/ares_htable_asvp.h
@@ -26,10 +26,10 @@
 #ifndef __ARES__HTABLE_ASVP_H
 #define __ARES__HTABLE_ASVP_H
 
-/*! \addtogroup ares__htable_asvp HashTable with ares_socket_t Key and
+/*! \addtogroup ares_htable_asvp HashTable with ares_socket_t Key and
  *                                void pointer Value
  *
- * This data structure wraps the base ares__htable data structure in order to
+ * This data structure wraps the base ares_htable data structure in order to
  * split the key and value data types as ares_socket_t and void pointer,
  * respectively.
  *
@@ -41,23 +41,23 @@
  * @{
  */
 
-struct ares__htable_asvp;
+struct ares_htable_asvp;
 
 /*! Opaque data type for ares_socket_t key, void pointer hash table
  *  implementation */
-typedef struct ares__htable_asvp ares__htable_asvp_t;
+typedef struct ares_htable_asvp ares_htable_asvp_t;
 
 /*! Callback to free value stored in hashtable
  *
  *  \param[in] val  user-supplied value
  */
-typedef void (*ares__htable_asvp_val_free_t)(void *val);
+typedef void (*ares_htable_asvp_val_free_t)(void *val);
 
 /*! Destroy hashtable
  *
  *  \param[in] htable  Initialized hashtable
  */
-void ares__htable_asvp_destroy(ares__htable_asvp_t *htable);
+CARES_EXTERN void ares_htable_asvp_destroy(ares_htable_asvp_t *htable);
 
 /*! Create size_t key, void pointer value hash table
  *
@@ -65,8 +65,8 @@ void ares__htable_asvp_destroy(ares__htable_asvp_t *htable);
  *                       NULL it is expected the caller will clean up any user
  *                       supplied values.
  */
-ares__htable_asvp_t              *
-  ares__htable_asvp_create(ares__htable_asvp_val_free_t val_free);
+CARES_EXTERN ares_htable_asvp_t *
+  ares_htable_asvp_create(ares_htable_asvp_val_free_t val_free);
 
 /*! Retrieve an array of keys from the hashtable.
  *
@@ -74,8 +74,8 @@ ares__htable_asvp_t              *
  *  \param[out] num      Count of returned keys
  *  \return Array of keys in the hashtable. Must be free'd with ares_free().
  */
-ares_socket_t *ares__htable_asvp_keys(const ares__htable_asvp_t *htable,
-                                      size_t                    *num);
+CARES_EXTERN ares_socket_t *
+  ares_htable_asvp_keys(const ares_htable_asvp_t *htable, size_t *num);
 
 
 /*! Insert key/value into hash table
@@ -85,8 +85,8 @@ ares_socket_t *ares__htable_asvp_keys(const ares__htable_asvp_t *htable,
  *  \param[in] val    value to store (takes ownership). May be NULL.
  *  \return ARES_TRUE on success, ARES_FALSE on out of memory or misuse
  */
-ares_bool_t    ares__htable_asvp_insert(ares__htable_asvp_t *htable,
-                                        ares_socket_t key, void *val);
+CARES_EXTERN ares_bool_t ares_htable_asvp_insert(ares_htable_asvp_t *htable,
+                                                 ares_socket_t key, void *val);
 
 /*! Retrieve value from hashtable based on key
  *
@@ -95,19 +95,19 @@ ares_bool_t    ares__htable_asvp_insert(ares__htable_asvp_t *htable,
  *  \param[out] val     Optional.  Pointer to store value.
  *  \return ARES_TRUE on success, ARES_FALSE on failure
  */
-ares_bool_t    ares__htable_asvp_get(const ares__htable_asvp_t *htable,
-                                     ares_socket_t key, void **val);
+CARES_EXTERN ares_bool_t ares_htable_asvp_get(const ares_htable_asvp_t *htable,
+                                              ares_socket_t key, void **val);
 
 /*! Retrieve value from hashtable directly as return value.  Caveat to this
- *  function over ares__htable_asvp_get() is that if a NULL value is stored
+ *  function over ares_htable_asvp_get() is that if a NULL value is stored
  *  you cannot determine if the key is not found or the value is NULL.
  *
  *  \param[in] htable  Initialized hash table
  *  \param[in] key     key to use to search
  *  \return value associated with key in hashtable or NULL
  */
-void          *ares__htable_asvp_get_direct(const ares__htable_asvp_t *htable,
-                                            ares_socket_t              key);
+CARES_EXTERN void *ares_htable_asvp_get_direct(const ares_htable_asvp_t *htable,
+                                               ares_socket_t             key);
 
 /*! Remove a value from the hashtable by key
  *
@@ -115,15 +115,15 @@ void          *ares__htable_asvp_get_direct(const ares__htable_asvp_t *htable,
  *  \param[in] key     key to use to search
  *  \return ARES_TRUE if found, ARES_FALSE if not found
  */
-ares_bool_t    ares__htable_asvp_remove(ares__htable_asvp_t *htable,
-                                        ares_socket_t        key);
+CARES_EXTERN ares_bool_t ares_htable_asvp_remove(ares_htable_asvp_t *htable,
+                                                 ares_socket_t       key);
 
 /*! Retrieve the number of keys stored in the hash table
  *
  *  \param[in] htable  Initialized hash table
  *  \return count
  */
-size_t         ares__htable_asvp_num_keys(const ares__htable_asvp_t *htable);
+CARES_EXTERN size_t ares_htable_asvp_num_keys(const ares_htable_asvp_t *htable);
 
 /*! @} */
 
diff --git a/deps/cares/src/lib/include/ares_htable_dict.h b/deps/cares/src/lib/include/ares_htable_dict.h
new file mode 100644
index 00000000000000..cb6f1f048ca023
--- /dev/null
+++ b/deps/cares/src/lib/include/ares_htable_dict.h
@@ -0,0 +1,123 @@
+/* MIT License
+ *
+ * Copyright (c) 2024 Brad House
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+#ifndef __ARES__HTABLE_DICT_H
+#define __ARES__HTABLE_DICT_H
+
+/*! \addtogroup ares_htable_dict HashTable with case-insensitive string Key and
+ *  string value
+ *
+ * This data structure wraps the base ares_htable data structure in order to
+ * split the key and value data types as string and string, respectively.
+ *
+ * Average time complexity:
+ *  - Insert: O(1)
+ *  - Search: O(1)
+ *  - Delete: O(1)
+ *
+ * @{
+ */
+
+struct ares_htable_dict;
+
+/*! Opaque data type for string key, string value hash table
+ * implementation */
+typedef struct ares_htable_dict ares_htable_dict_t;
+
+/*! Destroy hashtable
+ *
+ *  \param[in] htable  Initialized hashtable
+ */
+CARES_EXTERN void ares_htable_dict_destroy(ares_htable_dict_t *htable);
+
+/*! Create case-insensitive string key, string value hash table
+ *
+ */
+CARES_EXTERN ares_htable_dict_t *ares_htable_dict_create(void);
+
+/*! Insert key/value into hash table
+ *
+ *  \param[in] htable Initialized hash table
+ *  \param[in] key    key to associate with value
+ *  \param[in] val    value to store (duplicates).
+ *  \return ARES_TRUE on success, ARES_FALSE on failure or out of memory
+ */
+CARES_EXTERN ares_bool_t ares_htable_dict_insert(ares_htable_dict_t *htable,
+                                                 const char         *key,
+                                                 const char         *val);
+
+/*! Retrieve value from hashtable based on key
+ *
+ *  \param[in]  htable  Initialized hash table
+ *  \param[in]  key     key to use to search
+ *  \param[out] val     Optional.  Pointer to store value.
+ *  \return ARES_TRUE on success, ARES_FALSE on failure
+ */
+CARES_EXTERN ares_bool_t ares_htable_dict_get(const ares_htable_dict_t *htable,
+                                              const char               *key,
+                                              const char              **val);
+
+/*! Retrieve value from hashtable directly as return value.  Caveat to this
+ *  function over ares_htable_dict_get() is that if a NULL value is stored
+ *  you cannot determine if the key is not found or the value is NULL.
+ *
+ *  \param[in] htable  Initialized hash table
+ *  \param[in] key     key to use to search
+ *  \return value associated with key in hashtable or NULL
+ */
+CARES_EXTERN const char *
+  ares_htable_dict_get_direct(const ares_htable_dict_t *htable,
+                              const char               *key);
+
+/*! Remove a value from the hashtable by key
+ *
+ *  \param[in] htable  Initialized hash table
+ *  \param[in] key     key to use to search
+ *  \return ARES_TRUE if found, ARES_FALSE if not
+ */
+CARES_EXTERN ares_bool_t ares_htable_dict_remove(ares_htable_dict_t *htable,
+                                                 const char         *key);
+
+/*! Retrieve the number of keys stored in the hash table
+ *
+ *  \param[in] htable  Initialized hash table
+ *  \return count
+ */
+CARES_EXTERN size_t ares_htable_dict_num_keys(const ares_htable_dict_t *htable);
+
+/*! Retrieve an array of keys from the hashtable.
+ *
+ *  \param[in]  htable   Initialized hashtable
+ *  \param[out] num      Count of returned keys
+ *  \return Array of keys in the hashtable. Must be free'd with
+ *          ares_free_array(strs, num, ares_free);
+ */
+CARES_EXTERN char **ares_htable_dict_keys(const ares_htable_dict_t *htable,
+                                          size_t                   *num);
+
+
+/*! @} */
+
+#endif /* __ARES__HTABLE_DICT_H */
diff --git a/deps/cares/src/lib/dsa/ares__htable_strvp.h b/deps/cares/src/lib/include/ares_htable_strvp.h
similarity index 66%
rename from deps/cares/src/lib/dsa/ares__htable_strvp.h
rename to deps/cares/src/lib/include/ares_htable_strvp.h
index 878c71869a92dc..eaaf6d3be0d188 100644
--- a/deps/cares/src/lib/dsa/ares__htable_strvp.h
+++ b/deps/cares/src/lib/include/ares_htable_strvp.h
@@ -26,10 +26,10 @@
 #ifndef __ARES__HTABLE_STRVP_H
 #define __ARES__HTABLE_STRVP_H
 
-/*! \addtogroup ares__htable_strvp HashTable with string Key and void pointer
+/*! \addtogroup ares_htable_strvp HashTable with string Key and void pointer
  * Value
  *
- * This data structure wraps the base ares__htable data structure in order to
+ * This data structure wraps the base ares_htable data structure in order to
  * split the key and value data types as string and void pointer, respectively.
  *
  * Average time complexity:
@@ -40,22 +40,22 @@
  * @{
  */
 
-struct ares__htable_strvp;
+struct ares_htable_strvp;
 
 /*! Opaque data type for size_t key, void pointer hash table implementation */
-typedef struct ares__htable_strvp ares__htable_strvp_t;
+typedef struct ares_htable_strvp ares_htable_strvp_t;
 
 /*! Callback to free value stored in hashtable
  *
  *  \param[in] val  user-supplied value
  */
-typedef void (*ares__htable_strvp_val_free_t)(void *val);
+typedef void (*ares_htable_strvp_val_free_t)(void *val);
 
 /*! Destroy hashtable
  *
  *  \param[in] htable  Initialized hashtable
  */
-void ares__htable_strvp_destroy(ares__htable_strvp_t *htable);
+CARES_EXTERN void ares_htable_strvp_destroy(ares_htable_strvp_t *htable);
 
 /*! Create string, void pointer value hash table
  *
@@ -63,8 +63,8 @@ void ares__htable_strvp_destroy(ares__htable_strvp_t *htable);
  *                       NULL it is expected the caller will clean up any user
  *                       supplied values.
  */
-ares__htable_strvp_t           *
-  ares__htable_strvp_create(ares__htable_strvp_val_free_t val_free);
+CARES_EXTERN ares_htable_strvp_t *
+  ares_htable_strvp_create(ares_htable_strvp_val_free_t val_free);
 
 /*! Insert key/value into hash table
  *
@@ -73,8 +73,8 @@ ares__htable_strvp_t           *
  *  \param[in] val    value to store (takes ownership). May be NULL.
  *  \return ARES_TRUE on success, ARES_FALSE on failure or out of memory
  */
-ares_bool_t ares__htable_strvp_insert(ares__htable_strvp_t *htable,
-                                      const char *key, void *val);
+CARES_EXTERN ares_bool_t ares_htable_strvp_insert(ares_htable_strvp_t *htable,
+                                                  const char *key, void *val);
 
 /*! Retrieve value from hashtable based on key
  *
@@ -83,19 +83,20 @@ ares_bool_t ares__htable_strvp_insert(ares__htable_strvp_t *htable,
  *  \param[out] val     Optional.  Pointer to store value.
  *  \return ARES_TRUE on success, ARES_FALSE on failure
  */
-ares_bool_t ares__htable_strvp_get(const ares__htable_strvp_t *htable,
-                                   const char *key, void **val);
+CARES_EXTERN ares_bool_t ares_htable_strvp_get(
+  const ares_htable_strvp_t *htable, const char *key, void **val);
 
 /*! Retrieve value from hashtable directly as return value.  Caveat to this
- *  function over ares__htable_strvp_get() is that if a NULL value is stored
+ *  function over ares_htable_strvp_get() is that if a NULL value is stored
  *  you cannot determine if the key is not found or the value is NULL.
  *
  *  \param[in] htable  Initialized hash table
  *  \param[in] key     key to use to search
  *  \return value associated with key in hashtable or NULL
  */
-void       *ares__htable_strvp_get_direct(const ares__htable_strvp_t *htable,
-                                          const char                 *key);
+CARES_EXTERN void *
+  ares_htable_strvp_get_direct(const ares_htable_strvp_t *htable,
+                               const char                *key);
 
 /*! Remove a value from the hashtable by key
  *
@@ -103,15 +104,26 @@ void       *ares__htable_strvp_get_direct(const ares__htable_strvp_t *htable,
  *  \param[in] key     key to use to search
  *  \return ARES_TRUE if found, ARES_FALSE if not
  */
-ares_bool_t ares__htable_strvp_remove(ares__htable_strvp_t *htable,
-                                      const char           *key);
+CARES_EXTERN ares_bool_t ares_htable_strvp_remove(ares_htable_strvp_t *htable,
+                                                  const char          *key);
+
+/*! Remove the value from the hashtable, and return the value instead of
+ *  calling the val_free passed to ares_htable_strvp_create().
+ *
+ *  \param[in] htable  Initialized hash table
+ *  \param[in] key     key to use to search
+ *  \return value in hashtable or NULL on error
+ */
+CARES_EXTERN void       *ares_htable_strvp_claim(ares_htable_strvp_t *htable,
+                                                 const char          *key);
 
 /*! Retrieve the number of keys stored in the hash table
  *
  *  \param[in] htable  Initialized hash table
  *  \return count
  */
-size_t      ares__htable_strvp_num_keys(const ares__htable_strvp_t *htable);
+CARES_EXTERN size_t
+  ares_htable_strvp_num_keys(const ares_htable_strvp_t *htable);
 
 /*! @} */
 
diff --git a/deps/cares/src/lib/dsa/ares__htable_szvp.h b/deps/cares/src/lib/include/ares_htable_szvp.h
similarity index 72%
rename from deps/cares/src/lib/dsa/ares__htable_szvp.h
rename to deps/cares/src/lib/include/ares_htable_szvp.h
index 62b1776be92b5b..927b9a5ec9f780 100644
--- a/deps/cares/src/lib/dsa/ares__htable_szvp.h
+++ b/deps/cares/src/lib/include/ares_htable_szvp.h
@@ -26,10 +26,10 @@
 #ifndef __ARES__HTABLE_STVP_H
 #define __ARES__HTABLE_STVP_H
 
-/*! \addtogroup ares__htable_szvp HashTable with size_t Key and void pointer
+/*! \addtogroup ares_htable_szvp HashTable with size_t Key and void pointer
  * Value
  *
- * This data structure wraps the base ares__htable data structure in order to
+ * This data structure wraps the base ares_htable data structure in order to
  * split the key and value data types as size_t and void pointer, respectively.
  *
  * Average time complexity:
@@ -40,22 +40,22 @@
  * @{
  */
 
-struct ares__htable_szvp;
+struct ares_htable_szvp;
 
 /*! Opaque data type for size_t key, void pointer hash table implementation */
-typedef struct ares__htable_szvp ares__htable_szvp_t;
+typedef struct ares_htable_szvp ares_htable_szvp_t;
 
 /*! Callback to free value stored in hashtable
  *
  *  \param[in] val  user-supplied value
  */
-typedef void (*ares__htable_szvp_val_free_t)(void *val);
+typedef void (*ares_htable_szvp_val_free_t)(void *val);
 
 /*! Destroy hashtable
  *
  *  \param[in] htable  Initialized hashtable
  */
-void ares__htable_szvp_destroy(ares__htable_szvp_t *htable);
+CARES_EXTERN void ares_htable_szvp_destroy(ares_htable_szvp_t *htable);
 
 /*! Create size_t key, void pointer value hash table
  *
@@ -63,8 +63,8 @@ void ares__htable_szvp_destroy(ares__htable_szvp_t *htable);
  *                       NULL it is expected the caller will clean up any user
  *                       supplied values.
  */
-ares__htable_szvp_t           *
-  ares__htable_szvp_create(ares__htable_szvp_val_free_t val_free);
+CARES_EXTERN ares_htable_szvp_t *
+  ares_htable_szvp_create(ares_htable_szvp_val_free_t val_free);
 
 /*! Insert key/value into hash table
  *
@@ -73,8 +73,8 @@ ares__htable_szvp_t           *
  *  \param[in] val    value to store (takes ownership). May be NULL.
  *  \return ARES_TRUE on success, ARES_FALSE on failure or out of memory
  */
-ares_bool_t ares__htable_szvp_insert(ares__htable_szvp_t *htable, size_t key,
-                                     void *val);
+CARES_EXTERN ares_bool_t ares_htable_szvp_insert(ares_htable_szvp_t *htable,
+                                                 size_t key, void *val);
 
 /*! Retrieve value from hashtable based on key
  *
@@ -83,19 +83,19 @@ ares_bool_t ares__htable_szvp_insert(ares__htable_szvp_t *htable, size_t key,
  *  \param[out] val     Optional.  Pointer to store value.
  *  \return ARES_TRUE on success, ARES_FALSE on failure
  */
-ares_bool_t ares__htable_szvp_get(const ares__htable_szvp_t *htable, size_t key,
-                                  void **val);
+CARES_EXTERN ares_bool_t ares_htable_szvp_get(const ares_htable_szvp_t *htable,
+                                              size_t key, void **val);
 
 /*! Retrieve value from hashtable directly as return value.  Caveat to this
- *  function over ares__htable_szvp_get() is that if a NULL value is stored
+ *  function over ares_htable_szvp_get() is that if a NULL value is stored
  *  you cannot determine if the key is not found or the value is NULL.
  *
  *  \param[in] htable  Initialized hash table
  *  \param[in] key     key to use to search
  *  \return value associated with key in hashtable or NULL
  */
-void       *ares__htable_szvp_get_direct(const ares__htable_szvp_t *htable,
-                                         size_t                     key);
+CARES_EXTERN void *ares_htable_szvp_get_direct(const ares_htable_szvp_t *htable,
+                                               size_t                    key);
 
 /*! Remove a value from the hashtable by key
  *
@@ -103,14 +103,15 @@ void       *ares__htable_szvp_get_direct(const ares__htable_szvp_t *htable,
  *  \param[in] key     key to use to search
  *  \return ARES_TRUE if found, ARES_FALSE if not
  */
-ares_bool_t ares__htable_szvp_remove(ares__htable_szvp_t *htable, size_t key);
+CARES_EXTERN ares_bool_t ares_htable_szvp_remove(ares_htable_szvp_t *htable,
+                                                 size_t              key);
 
 /*! Retrieve the number of keys stored in the hash table
  *
  *  \param[in] htable  Initialized hash table
  *  \return count
  */
-size_t      ares__htable_szvp_num_keys(const ares__htable_szvp_t *htable);
+CARES_EXTERN size_t ares_htable_szvp_num_keys(const ares_htable_szvp_t *htable);
 
 /*! @} */
 
diff --git a/deps/cares/src/lib/include/ares_htable_vpstr.h b/deps/cares/src/lib/include/ares_htable_vpstr.h
new file mode 100644
index 00000000000000..9f51b877452b72
--- /dev/null
+++ b/deps/cares/src/lib/include/ares_htable_vpstr.h
@@ -0,0 +1,111 @@
+/* MIT License
+ *
+ * Copyright (c) 2024 Brad House
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+#ifndef __ARES__HTABLE_VPSTR_H
+#define __ARES__HTABLE_VPSTR_H
+
+/*! \addtogroup ares_htable_vpstr HashTable with void pointer Key and string
+ *  value
+ *
+ * This data structure wraps the base ares_htable data structure in order to
+ * split the key and value data types as void pointer and string, respectively.
+ *
+ * Average time complexity:
+ *  - Insert: O(1)
+ *  - Search: O(1)
+ *  - Delete: O(1)
+ *
+ * @{
+ */
+
+struct ares_htable_vpstr;
+
+/*! Opaque data type for void pointer key, string value hash table
+ * implementation */
+typedef struct ares_htable_vpstr ares_htable_vpstr_t;
+
+/*! Destroy hashtable
+ *
+ *  \param[in] htable  Initialized hashtable
+ */
+CARES_EXTERN void ares_htable_vpstr_destroy(ares_htable_vpstr_t *htable);
+
+/*! Create void pointer key, string value hash table
+ *
+ */
+CARES_EXTERN ares_htable_vpstr_t *ares_htable_vpstr_create(void);
+
+/*! Insert key/value into hash table
+ *
+ *  \param[in] htable Initialized hash table
+ *  \param[in] key    key to associate with value
+ *  \param[in] val    value to store (the string is duplicated).
+ *  \return ARES_TRUE on success, ARES_FALSE on failure or out of memory
+ */
+CARES_EXTERN ares_bool_t ares_htable_vpstr_insert(ares_htable_vpstr_t *htable,
+                                                  void *key, const char *val);
+
+/*! Retrieve value from hashtable based on key
+ *
+ *  \param[in]  htable  Initialized hash table
+ *  \param[in]  key     key to use to search
+ *  \param[out] val     Optional.  Pointer to store value.
+ *  \return ARES_TRUE on success, ARES_FALSE on failure
+ */
+CARES_EXTERN ares_bool_t ares_htable_vpstr_get(
+  const ares_htable_vpstr_t *htable, const void *key, const char **val);
+
+/*! Retrieve value from hashtable directly as return value.  Caveat to this
+ *  function over ares_htable_vpstr_get() is that if a NULL value is stored
+ *  you cannot determine if the key is not found or the value is NULL.
+ *
+ *  \param[in] htable  Initialized hash table
+ *  \param[in] key     key to use to search
+ *  \return value associated with key in hashtable or NULL
+ */
+CARES_EXTERN const char *
+  ares_htable_vpstr_get_direct(const ares_htable_vpstr_t *htable,
+                               const void                *key);
+
+/*! Remove a value from the hashtable by key
+ *
+ *  \param[in] htable  Initialized hash table
+ *  \param[in] key     key to use to search
+ *  \return ARES_TRUE if found, ARES_FALSE if not
+ */
+CARES_EXTERN ares_bool_t ares_htable_vpstr_remove(ares_htable_vpstr_t *htable,
+                                                  const void          *key);
+
+/*! Retrieve the number of keys stored in the hash table
+ *
+ *  \param[in] htable  Initialized hash table
+ *  \return count
+ */
+CARES_EXTERN size_t
+  ares_htable_vpstr_num_keys(const ares_htable_vpstr_t *htable);
+
+/*! @} */
+
+#endif /* __ARES__HTABLE_VPSTR_H */
diff --git a/deps/cares/src/lib/dsa/ares__htable_vpvp.h b/deps/cares/src/lib/include/ares_htable_vpvp.h
similarity index 72%
rename from deps/cares/src/lib/dsa/ares__htable_vpvp.h
rename to deps/cares/src/lib/include/ares_htable_vpvp.h
index 1e0c750d862c64..1ebe6145765793 100644
--- a/deps/cares/src/lib/dsa/ares__htable_vpvp.h
+++ b/deps/cares/src/lib/include/ares_htable_vpvp.h
@@ -26,10 +26,10 @@
 #ifndef __ARES__HTABLE_VPVP_H
 #define __ARES__HTABLE_VPVP_H
 
-/*! \addtogroup ares__htable_vpvp HashTable with void pointer Key and void
+/*! \addtogroup ares_htable_vpvp HashTable with void pointer Key and void
  * pointer Value
  *
- * This data structure wraps the base ares__htable data structure in order to
+ * This data structure wraps the base ares_htable data structure in order to
  * split the key and value data types as size_t and void pointer, respectively.
  *
  * Average time complexity:
@@ -40,28 +40,28 @@
  * @{
  */
 
-struct ares__htable_vpvp;
+struct ares_htable_vpvp;
 
 /*! Opaque data type for size_t key, void pointer hash table implementation */
-typedef struct ares__htable_vpvp ares__htable_vpvp_t;
+typedef struct ares_htable_vpvp ares_htable_vpvp_t;
 
 /*! Callback to free key stored in hashtable
  *
  *  \param[in] key  user-supplied key
  */
-typedef void (*ares__htable_vpvp_key_free_t)(void *key);
+typedef void (*ares_htable_vpvp_key_free_t)(void *key);
 
 /*! Callback to free value stored in hashtable
  *
  *  \param[in] val  user-supplied value
  */
-typedef void (*ares__htable_vpvp_val_free_t)(void *val);
+typedef void (*ares_htable_vpvp_val_free_t)(void *val);
 
 /*! Destroy hashtable
  *
  *  \param[in] htable  Initialized hashtable
  */
-void ares__htable_vpvp_destroy(ares__htable_vpvp_t *htable);
+CARES_EXTERN void ares_htable_vpvp_destroy(ares_htable_vpvp_t *htable);
 
 /*! Create size_t key, void pointer value hash table
  *
@@ -72,9 +72,9 @@ void ares__htable_vpvp_destroy(ares__htable_vpvp_t *htable);
  *                       NULL it is expected the caller will clean up any user
  *                       supplied values.
  */
-ares__htable_vpvp_t           *
-  ares__htable_vpvp_create(ares__htable_vpvp_key_free_t key_free,
-                                     ares__htable_vpvp_val_free_t val_free);
+CARES_EXTERN ares_htable_vpvp_t *
+  ares_htable_vpvp_create(ares_htable_vpvp_key_free_t key_free,
+                          ares_htable_vpvp_val_free_t val_free);
 
 /*! Insert key/value into hash table
  *
@@ -83,8 +83,8 @@ ares__htable_vpvp_t           *
  *  \param[in] val    value to store (takes ownership). May be NULL.
  *  \return ARES_TRUE on success, ARES_FALSE on failure or out of memory
  */
-ares_bool_t ares__htable_vpvp_insert(ares__htable_vpvp_t *htable, void *key,
-                                     void *val);
+CARES_EXTERN ares_bool_t ares_htable_vpvp_insert(ares_htable_vpvp_t *htable,
+                                                 void *key, void *val);
 
 /*! Retrieve value from hashtable based on key
  *
@@ -93,19 +93,19 @@ ares_bool_t ares__htable_vpvp_insert(ares__htable_vpvp_t *htable, void *key,
  *  \param[out] val     Optional.  Pointer to store value.
  *  \return ARES_TRUE on success, ARES_FALSE on failure
  */
-ares_bool_t ares__htable_vpvp_get(const ares__htable_vpvp_t *htable,
-                                  const void *key, void **val);
+CARES_EXTERN ares_bool_t ares_htable_vpvp_get(const ares_htable_vpvp_t *htable,
+                                              const void *key, void **val);
 
 /*! Retrieve value from hashtable directly as return value.  Caveat to this
- *  function over ares__htable_vpvp_get() is that if a NULL value is stored
+ *  function over ares_htable_vpvp_get() is that if a NULL value is stored
  *  you cannot determine if the key is not found or the value is NULL.
  *
  *  \param[in] htable  Initialized hash table
  *  \param[in] key     key to use to search
  *  \return value associated with key in hashtable or NULL
  */
-void       *ares__htable_vpvp_get_direct(const ares__htable_vpvp_t *htable,
-                                         const void                *key);
+CARES_EXTERN void *ares_htable_vpvp_get_direct(const ares_htable_vpvp_t *htable,
+                                               const void               *key);
 
 /*! Remove a value from the hashtable by key
  *
@@ -113,15 +113,15 @@ void       *ares__htable_vpvp_get_direct(const ares__htable_vpvp_t *htable,
  *  \param[in] key     key to use to search
  *  \return ARES_TRUE if found, ARES_FALSE if not
  */
-ares_bool_t ares__htable_vpvp_remove(ares__htable_vpvp_t *htable,
-                                     const void          *key);
+CARES_EXTERN ares_bool_t ares_htable_vpvp_remove(ares_htable_vpvp_t *htable,
+                                                 const void         *key);
 
 /*! Retrieve the number of keys stored in the hash table
  *
  *  \param[in] htable  Initialized hash table
  *  \return count
  */
-size_t      ares__htable_vpvp_num_keys(const ares__htable_vpvp_t *htable);
+CARES_EXTERN size_t ares_htable_vpvp_num_keys(const ares_htable_vpvp_t *htable);
 
 /*! @} */
 
diff --git a/deps/cares/src/lib/dsa/ares__llist.h b/deps/cares/src/lib/include/ares_llist.h
similarity index 68%
rename from deps/cares/src/lib/dsa/ares__llist.h
rename to deps/cares/src/lib/include/ares_llist.h
index 213f54134bcb55..6aa0c78370401f 100644
--- a/deps/cares/src/lib/dsa/ares__llist.h
+++ b/deps/cares/src/lib/include/ares_llist.h
@@ -26,7 +26,7 @@
 #ifndef __ARES__LLIST_H
 #define __ARES__LLIST_H
 
-/*! \addtogroup ares__llist LinkedList Data Structure
+/*! \addtogroup ares_llist LinkedList Data Structure
  *
  * This is a doubly-linked list data structure.
  *
@@ -38,28 +38,28 @@
  * @{
  */
 
-struct ares__llist;
+struct ares_llist;
 
 /*! Opaque data structure for linked list */
-typedef struct ares__llist ares__llist_t;
+typedef struct ares_llist ares_llist_t;
 
-struct ares__llist_node;
+struct ares_llist_node;
 
 /*! Opaque data structure for a node in a linked list */
-typedef struct ares__llist_node ares__llist_node_t;
+typedef struct ares_llist_node ares_llist_node_t;
 
 /*! Callback to free user-defined node data
  *
  *  \param[in] data  user supplied data
  */
-typedef void (*ares__llist_destructor_t)(void *data);
+typedef void (*ares_llist_destructor_t)(void *data);
 
 /*! Create a linked list object
  *
  *  \param[in] destruct  Optional. Destructor to call on all removed nodes
  *  \return linked list object or NULL on out of memory
  */
-ares__llist_t      *ares__llist_create(ares__llist_destructor_t destruct);
+CARES_EXTERN ares_llist_t *ares_llist_create(ares_llist_destructor_t destruct);
 
 /*! Replace destructor for linked list nodes.  Typically this is used
  *  when wanting to disable the destructor by using NULL.
@@ -67,8 +67,9 @@ ares__llist_t      *ares__llist_create(ares__llist_destructor_t destruct);
  *  \param[in] list      Initialized linked list object
  *  \param[in] destruct  replacement destructor, NULL is allowed
  */
-void                ares__llist_replace_destructor(ares__llist_t           *list,
-                                                   ares__llist_destructor_t destruct);
+CARES_EXTERN void
+  ares_llist_replace_destructor(ares_llist_t           *list,
+                                ares_llist_destructor_t destruct);
 
 /*! Insert value as the first node in the linked list
  *
@@ -77,7 +78,8 @@ void                ares__llist_replace_destructor(ares__llist_t           *list
  *  \return node object referencing place in list, or null if out of memory or
  *   misuse
  */
-ares__llist_node_t *ares__llist_insert_first(ares__llist_t *list, void *val);
+CARES_EXTERN ares_llist_node_t *ares_llist_insert_first(ares_llist_t *list,
+                                                        void         *val);
 
 /*! Insert value as the last node in the linked list
  *
@@ -86,7 +88,8 @@ ares__llist_node_t *ares__llist_insert_first(ares__llist_t *list, void *val);
  *  \return node object referencing place in list, or null if out of memory or
  *   misuse
  */
-ares__llist_node_t *ares__llist_insert_last(ares__llist_t *list, void *val);
+CARES_EXTERN ares_llist_node_t *ares_llist_insert_last(ares_llist_t *list,
+                                                       void         *val);
 
 /*! Insert value before specified node in the linked list
  *
@@ -95,8 +98,8 @@ ares__llist_node_t *ares__llist_insert_last(ares__llist_t *list, void *val);
  *  \return node object referencing place in list, or null if out of memory or
  *   misuse
  */
-ares__llist_node_t *ares__llist_insert_before(ares__llist_node_t *node,
-                                              void               *val);
+CARES_EXTERN ares_llist_node_t *
+  ares_llist_insert_before(ares_llist_node_t *node, void *val);
 
 /*! Insert value after specified node in the linked list
  *
@@ -105,22 +108,22 @@ ares__llist_node_t *ares__llist_insert_before(ares__llist_node_t *node,
  *  \return node object referencing place in list, or null if out of memory or
  *   misuse
  */
-ares__llist_node_t *ares__llist_insert_after(ares__llist_node_t *node,
-                                             void               *val);
+CARES_EXTERN ares_llist_node_t *ares_llist_insert_after(ares_llist_node_t *node,
+                                                        void              *val);
 
 /*! Obtain first node in list
  *
  *  \param[in] list  Initialized list object
  *  \return first node in list or NULL if none
  */
-ares__llist_node_t *ares__llist_node_first(ares__llist_t *list);
+CARES_EXTERN ares_llist_node_t *ares_llist_node_first(ares_llist_t *list);
 
 /*! Obtain last node in list
  *
  *  \param[in] list  Initialized list object
  *  \return last node in list or NULL if none
  */
-ares__llist_node_t *ares__llist_node_last(ares__llist_t *list);
+CARES_EXTERN ares_llist_node_t *ares_llist_node_last(ares_llist_t *list);
 
 /*! Obtain a node based on its index.  This is an O(n) operation.
  *
@@ -128,21 +131,22 @@ ares__llist_node_t *ares__llist_node_last(ares__llist_t *list);
  *  \param[in] idx  Index of node to retrieve
  *  \return node at index or NULL if invalid index
  */
-ares__llist_node_t *ares__llist_node_idx(ares__llist_t *list, size_t idx);
+CARES_EXTERN ares_llist_node_t *ares_llist_node_idx(ares_llist_t *list,
+                                                    size_t        idx);
 
 /*! Obtain next node in respect to specified node
  *
  *  \param[in] node  Node referenced
  *  \return node or NULL if none
  */
-ares__llist_node_t *ares__llist_node_next(ares__llist_node_t *node);
+CARES_EXTERN ares_llist_node_t *ares_llist_node_next(ares_llist_node_t *node);
 
 /*! Obtain previous node in respect to specified node
  *
  *  \param[in] node  Node referenced
  *  \return node or NULL if none
  */
-ares__llist_node_t *ares__llist_node_prev(ares__llist_node_t *node);
+CARES_EXTERN ares_llist_node_t *ares_llist_node_prev(ares_llist_node_t *node);
 
 
 /*! Obtain value from node
@@ -150,41 +154,41 @@ ares__llist_node_t *ares__llist_node_prev(ares__llist_node_t *node);
  *  \param[in] node  Node referenced
  *  \return user provided value from node
  */
-void               *ares__llist_node_val(ares__llist_node_t *node);
+CARES_EXTERN void              *ares_llist_node_val(ares_llist_node_t *node);
 
 /*! Obtain the number of entries in the list
  *
  *  \param[in] list  Initialized list object
  *  \return count
  */
-size_t              ares__llist_len(const ares__llist_t *list);
+CARES_EXTERN size_t             ares_llist_len(const ares_llist_t *list);
 
 /*! Clear all entries in the list, but don't destroy the list object.
  *
  *  \param[in] list  Initialized list object
  */
-void                ares__llist_clear(ares__llist_t *list);
+CARES_EXTERN void               ares_llist_clear(ares_llist_t *list);
 
 /*! Obtain list object from referenced node
  *
  *  \param[in] node  Node referenced
  *  \return list object node belongs to
  */
-ares__llist_t      *ares__llist_node_parent(ares__llist_node_t *node);
+CARES_EXTERN ares_llist_t      *ares_llist_node_parent(ares_llist_node_t *node);
 
 /*! Obtain the first user-supplied value in the list
  *
  *  \param[in] list Initialized list object
  *  \return first user supplied value or NULL if none
  */
-void               *ares__llist_first_val(ares__llist_t *list);
+CARES_EXTERN void              *ares_llist_first_val(ares_llist_t *list);
 
 /*! Obtain the last user-supplied value in the list
  *
  *  \param[in] list Initialized list object
  *  \return last user supplied value or NULL if none
  */
-void               *ares__llist_last_val(ares__llist_t *list);
+CARES_EXTERN void              *ares_llist_last_val(ares_llist_t *list);
 
 /*! Take ownership of user-supplied value in list without calling destructor.
  *  Will unchain entry from list.
@@ -192,26 +196,26 @@ void               *ares__llist_last_val(ares__llist_t *list);
  *  \param[in] node Node referenced
  *  \return user supplied value
  */
-void               *ares__llist_node_claim(ares__llist_node_t *node);
+CARES_EXTERN void              *ares_llist_node_claim(ares_llist_node_t *node);
 
 /*! Replace user-supplied value for node
  *
  *  \param[in] node Node referenced
  *  \param[in] val  new user-supplied value
  */
-void ares__llist_node_replace(ares__llist_node_t *node, void *val);
+CARES_EXTERN void ares_llist_node_replace(ares_llist_node_t *node, void *val);
 
 /*! Destroy the node, removing it from the list and calling destructor.
  *
  *  \param[in] node  Node referenced
  */
-void ares__llist_node_destroy(ares__llist_node_t *node);
+CARES_EXTERN void ares_llist_node_destroy(ares_llist_node_t *node);
 
 /*! Destroy the list object and all nodes in the list.
  *
  *  \param[in] list Initialized list object
  */
-void ares__llist_destroy(ares__llist_t *list);
+CARES_EXTERN void ares_llist_destroy(ares_llist_t *list);
 
 /*! Detach node from the current list and re-attach it to the new list as the
  *  last entry.
@@ -219,8 +223,8 @@ void ares__llist_destroy(ares__llist_t *list);
  * \param[in] node       node to move
  * \param[in] new_parent new list
  */
-void ares__llist_node_move_parent_last(ares__llist_node_t *node,
-                                       ares__llist_t      *new_parent);
+CARES_EXTERN void ares_llist_node_mvparent_last(ares_llist_node_t *node,
+                                                ares_llist_t      *new_parent);
 
 /*! Detach node from the current list and re-attach it to the new list as the
  *  first entry.
@@ -228,8 +232,8 @@ void ares__llist_node_move_parent_last(ares__llist_node_t *node,
  * \param[in] node       node to move
  * \param[in] new_parent new list
  */
-void ares__llist_node_move_parent_first(ares__llist_node_t *node,
-                                        ares__llist_t      *new_parent);
+CARES_EXTERN void ares_llist_node_mvparent_first(ares_llist_node_t *node,
+                                                 ares_llist_t      *new_parent);
 /*! @} */
 
 #endif /* __ARES__LLIST_H */
diff --git a/deps/cares/src/lib/str/ares_strcasecmp.h b/deps/cares/src/lib/include/ares_mem.h
similarity index 75%
rename from deps/cares/src/lib/str/ares_strcasecmp.h
rename to deps/cares/src/lib/include/ares_mem.h
index a8097d2219e309..371cd4266dc720 100644
--- a/deps/cares/src/lib/str/ares_strcasecmp.h
+++ b/deps/cares/src/lib/include/ares_mem.h
@@ -1,6 +1,5 @@
 /* MIT License
  *
- * Copyright (c) 1998 Massachusetts Institute of Technology
  * Copyright (c) The c-ares project and its contributors
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -24,15 +23,16 @@
  *
  * SPDX-License-Identifier: MIT
  */
-#ifndef HEADER_CARES_STRCASECMP_H
-#define HEADER_CARES_STRCASECMP_H
 
-#ifndef HAVE_STRCASECMP
-extern int ares_strcasecmp(const char *a, const char *b);
-#endif
+#ifndef __ARES_MEM_H
+#define __ARES_MEM_H
 
-#ifndef HAVE_STRNCASECMP
-extern int ares_strncasecmp(const char *a, const char *b, size_t n);
-#endif
+/* Memory management functions */
+CARES_EXTERN void *ares_malloc(size_t size);
+CARES_EXTERN void *ares_realloc(void *ptr, size_t size);
+CARES_EXTERN void  ares_free(void *ptr);
+CARES_EXTERN void *ares_malloc_zero(size_t size);
+CARES_EXTERN void *ares_realloc_zero(void *ptr, size_t orig_size,
+                                     size_t new_size);
 
-#endif /* HEADER_CARES_STRCASECMP_H */
+#endif
diff --git a/deps/cares/src/lib/include/ares_str.h b/deps/cares/src/lib/include/ares_str.h
new file mode 100644
index 00000000000000..ea75b3b3e7441d
--- /dev/null
+++ b/deps/cares/src/lib/include/ares_str.h
@@ -0,0 +1,230 @@
+/* MIT License
+ *
+ * Copyright (c) 1998 Massachusetts Institute of Technology
+ * Copyright (c) The c-ares project and its contributors
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+#ifndef __ARES_STR_H
+#define __ARES_STR_H
+
+CARES_EXTERN char  *ares_strdup(const char *s1);
+
+CARES_EXTERN size_t ares_strlen(const char *str);
+
+/*! Copy string from source to destination with destination buffer size
+ *  provided.  The destination is guaranteed to be null terminated.  If the
+ *  provided buffer isn't large enough, only those bytes from the source that
+ *  will fit will be copied.
+ *
+ *  \param[out] dest       Destination buffer
+ *  \param[in]  src        Source to copy
+ *  \param[in]  dest_size  Size of destination buffer
+ *  \return String length.  Will be at most dest_size-1
+ */
+CARES_EXTERN size_t ares_strcpy(char *dest, const char *src, size_t dest_size);
+
+CARES_EXTERN ares_bool_t    ares_str_isnum(const char *str);
+CARES_EXTERN ares_bool_t    ares_str_isalnum(const char *str);
+
+CARES_EXTERN void           ares_str_ltrim(char *str);
+CARES_EXTERN void           ares_str_rtrim(char *str);
+CARES_EXTERN void           ares_str_trim(char *str);
+CARES_EXTERN void           ares_str_lower(char *str);
+
+CARES_EXTERN unsigned char  ares_tolower(unsigned char c);
+CARES_EXTERN unsigned char *ares_memmem(const unsigned char *big,
+                                        size_t               big_len,
+                                        const unsigned char *little,
+                                        size_t               little_len);
+CARES_EXTERN ares_bool_t    ares_memeq(const unsigned char *ptr,
+                                       const unsigned char *val, size_t len);
+CARES_EXTERN ares_bool_t    ares_memeq_ci(const unsigned char *ptr,
+                                          const unsigned char *val, size_t len);
+CARES_EXTERN ares_bool_t    ares_is_hostname(const char *str);
+
+/*! Validate the string provided is printable.  The length specified must be
+ *  at most the size of the buffer provided.  If a NUL terminator is hit
+ *  before the length provided is hit, this will not be considered a valid
+ *  printable string.  This does not validate that the string is actually
+ *  NUL terminated.
+ *
+ *  \param[in] str  Buffer containing string to evaluate.
+ *  \param[in] len  Number of characters to evaluate within provided buffer.
+ *                  If 0, will return TRUE since it did not hit an exception.
+ *  \return ARES_TRUE if the entire string is printable, ARES_FALSE if not.
+ */
+CARES_EXTERN ares_bool_t    ares_str_isprint(const char *str, size_t len);
+
+/* We only care about ASCII rules */
+#define ares_isascii(x) (((unsigned char)x) <= 127)
+
+#define ares_isdigit(x) (((unsigned char)x) >= '0' && ((unsigned char)x) <= '9')
+
+#define ares_isxdigit(x)                                       \
+  (ares_isdigit(x) ||                                          \
+   (((unsigned char)x) >= 'a' && ((unsigned char)x) <= 'f') || \
+   (((unsigned char)x) >= 'A' && ((unsigned char)x) <= 'F'))
+
+#define ares_isupper(x) (((unsigned char)x) >= 'A' && ((unsigned char)x) <= 'Z')
+
+#define ares_islower(x) (((unsigned char)x) >= 'a' && ((unsigned char)x) <= 'z')
+
+#define ares_isalpha(x) (ares_islower(x) || ares_isupper(x))
+
+#define ares_isspace(x)                                            \
+  (((unsigned char)(x)) == '\r' || ((unsigned char)(x)) == '\t' || \
+   ((unsigned char)(x)) == ' ' || ((unsigned char)(x)) == '\v' ||  \
+   ((unsigned char)(x)) == '\f' || ((unsigned char)(x)) == '\n')
+
+#define ares_isprint(x) \
+  (((unsigned char)(x)) >= 0x20 && ((unsigned char)(x)) <= 0x7E)
+
+/* Character set allowed by hostnames.  This is to include the normal
+ * domain name character set plus:
+ *  - Underscores, which are used in SRV records.
+ *  - Forward slashes such as are used for classless in-addr.arpa
+ *    delegation (CNAMEs)
+ *  - Asterisks may be used for wildcard domains in CNAMEs as seen in the
+ *    real world.
+ * While RFC 2181 section 11 does state not to do validation,
+ * that applies to servers, not clients.  Vulnerabilities have been
+ * reported when this validation is not performed.  Security is more
+ * important than edge-case compatibility (which is probably invalid
+ * anyhow).
+ * [A-Za-z0-9-*._/]
+ */
+#define ares_is_hostnamech(x)                                           \
+  (ares_isalpha(x) || ares_isdigit(x) || ((unsigned char)(x)) == '-' || \
+   ((unsigned char)(x)) == '.' || ((unsigned char)(x)) == '_' ||        \
+   ((unsigned char)(x)) == '/' || ((unsigned char)(x)) == '*')
+
+
+/*! Compare two strings (for sorting)
+ *
+ *  Treats NULL and "" strings as equivalent
+ *
+ *  \param[in] a First String
+ *  \param[in] b Second String
+ *  \return < 0 if First String less than Second String,
+ *            0 if First String equal to Second String,
+ *          > 0 if First String greater than Second String
+ */
+CARES_EXTERN int ares_strcmp(const char *a, const char *b);
+
+/*! Compare two strings up to specified length (for sorting)
+ *
+ *  Treats NULL and "" strings as equivalent
+ *
+ *  \param[in] a First String
+ *  \param[in] b Second String
+ *  \param[in] n Length
+ *  \return < 0 if First String less than Second String,
+ *            0 if First String equal to Second String,
+ *          > 0 if First String greater than Second String
+ */
+CARES_EXTERN int ares_strncmp(const char *a, const char *b, size_t n);
+
+
+/*! Compare two strings in a case-insensitive manner (for sorting)
+ *
+ *  Treats NULL and "" strings as equivalent
+ *
+ *  \param[in] a First String
+ *  \param[in] b Second String
+ *  \return < 0 if First String less than Second String,
+ *            0 if First String equal to Second String,
+ *          > 0 if First String greater than Second String
+ */
+CARES_EXTERN int ares_strcasecmp(const char *a, const char *b);
+
+/*! Compare two strings in a case-insensitive manner up to specified length
+ *  (for sorting)
+ *
+ *  Treats NULL and "" strings as equivalent
+ *
+ *  \param[in] a First String
+ *  \param[in] b Second String
+ *  \param[in] n Length
+ *  \return < 0 if First String less than Second String,
+ *            0 if First String equal to Second String,
+ *          > 0 if First String greater than Second String
+ */
+CARES_EXTERN int ares_strncasecmp(const char *a, const char *b, size_t n);
+
+/*! Compare two strings for equality
+ *
+ *  Treats NULL and "" strings as equivalent
+ *
+ *  \param[in] a First String
+ *  \param[in] b Second String
+ *  \return ARES_TRUE on match, or ARES_FALSE if no match
+ */
+CARES_EXTERN ares_bool_t ares_streq(const char *a, const char *b);
+
+/*! Compare two strings for equality up to specified length
+ *
+ *  Treats NULL and "" strings as equivalent
+ *
+ *  \param[in] a First String
+ *  \param[in] b Second String
+ *  \param[in] n Length
+ *  \return ARES_TRUE on match, or ARES_FALSE if no match
+ */
+CARES_EXTERN ares_bool_t ares_streq_max(const char *a, const char *b, size_t n);
+
+/*! Compare two strings for equality in a case insensitive manner
+ *
+ *  Treats NULL and "" strings as equivalent
+ *
+ *  \param[in] a First String
+ *  \param[in] b Second String
+ *  \return ARES_TRUE on match, or ARES_FALSE if no match
+ */
+CARES_EXTERN ares_bool_t ares_strcaseeq(const char *a, const char *b);
+
+/*! Compare two strings for equality up to specified length in a case
+ *  insensitive manner
+ *
+ *  Treats NULL and "" strings as equivalent
+ *
+ *  \param[in] a First String
+ *  \param[in] b Second String
+ *  \param[in] n Length
+ *  \return ARES_TRUE on match, or ARES_FALSE if no match
+ */
+CARES_EXTERN ares_bool_t ares_strcaseeq_max(const char *a, const char *b,
+                                            size_t n);
+
+/*! Free a C array.  Each element in the array will be freed by the provided
+ *  free function.  Both NULL-terminated arrays and known length arrays are
+ *  supported.
+ *
+ *  \param[in] arr      Array to be freed.
+ *  \param[in] nmembers Number of members in the array, or SIZE_MAX for
+ *                      NULL-terminated arrays
+ *  \param[in] freefunc Function to call on each array member (e.g. ares_free)
+ */
+CARES_EXTERN void        ares_free_array(void *arr, size_t nmembers,
+                                         void (*freefunc)(void *));
+
+#endif /* __ARES_STR_H */
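Reviewer note on ares_str.h: the sketch below illustrates two contracts documented in the header above, namely the bounded copy described for ares_strcpy() (always terminated, returns at most dest_size-1) and the hostname character set [A-Za-z0-9-*._/]. The names sketch_strcpy, sketch_is_hostname and the sketch_* macros are hypothetical stand-ins written for this note; they are not the c-ares implementation, and the NULL handling shown is an assumption rather than something the header states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the documented ares_strcpy() contract:
 * copy what fits, always terminate, return the copied length. */
static size_t sketch_strcpy(char *dest, const char *src, size_t dest_size)
{
  size_t len;

  if (dest == NULL || dest_size == 0) {
    return 0; /* assumption: nothing can be written, so report 0 */
  }

  len = (src == NULL) ? 0 : strlen(src);
  if (len >= dest_size) {
    len = dest_size - 1; /* truncate, leaving room for the terminator */
  }
  if (len > 0) {
    memcpy(dest, src, len);
  }
  dest[len] = '\0';
  return len;
}

/* Character classes mirroring the header's ASCII-only rules.  The macro
 * argument is parenthesized here so expressions can be passed safely. */
#define sketch_isdigit(x) \
  (((unsigned char)(x)) >= '0' && ((unsigned char)(x)) <= '9')

#define sketch_isalpha(x)                                          \
  ((((unsigned char)(x)) >= 'a' && ((unsigned char)(x)) <= 'z') || \
   (((unsigned char)(x)) >= 'A' && ((unsigned char)(x)) <= 'Z'))

#define sketch_is_hostnamech(x)                                             \
  (sketch_isalpha(x) || sketch_isdigit(x) || ((unsigned char)(x)) == '-' || \
   ((unsigned char)(x)) == '.' || ((unsigned char)(x)) == '_' ||            \
   ((unsigned char)(x)) == '/' || ((unsigned char)(x)) == '*')

/* Hypothetical whole-string check built on the per-character macro.
 * Returns 1 if every character is allowed, 0 otherwise. */
static int sketch_is_hostname(const char *str)
{
  size_t i;

  if (str == NULL) {
    return 0; /* assumption: NULL is rejected */
  }
  for (i = 0; str[i] != '\0'; i++) {
    if (!sketch_is_hostnamech(str[i])) {
      return 0;
    }
  }
  return 1;
}
```

With a 4-byte destination, copying "hostname" through sketch_strcpy() stores "hos" and returns 3. An SRV-style name such as "_sip._tcp.example.com" passes the character check, while a name containing a space does not.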
diff --git a/deps/cares/src/lib/inet_net_pton.c b/deps/cares/src/lib/inet_net_pton.c
index 5356778c47b86f..e1f76ef834a6f9 100644
--- a/deps/cares/src/lib/inet_net_pton.c
+++ b/deps/cares/src/lib/inet_net_pton.c
@@ -32,6 +32,20 @@
 #include "ares_ipv6.h"
 #include "ares_inet_net_pton.h"
 
+#ifdef USE_WINSOCK
+#  define SOCKERRNO        ((int)WSAGetLastError())
+#  define SET_SOCKERRNO(x) (WSASetLastError((int)(x)))
+#  undef EMSGSIZE
+#  define EMSGSIZE WSAEMSGSIZE
+#  undef ENOENT
+#  define ENOENT WSA_INVALID_PARAMETER
+#  undef EAFNOSUPPORT
+#  define EAFNOSUPPORT WSAEAFNOSUPPORT
+#else
+#  define SOCKERRNO        (errno)
+#  define SET_SOCKERRNO(x) (errno = (x))
+#endif
+
 const struct ares_in6_addr ares_in6addr_any = { { { 0, 0, 0, 0, 0, 0, 0, 0, 0,
                                                     0, 0, 0, 0, 0, 0, 0 } } };
 
@@ -69,17 +83,17 @@ static int ares_inet_net_pton_ipv4(const char *src, unsigned char *dst,
   const unsigned char *odst = dst;
 
   ch = *src++;
-  if (ch == '0' && (src[0] == 'x' || src[0] == 'X') && ares__isascii(src[1]) &&
-      ares__isxdigit(src[1])) {
+  if (ch == '0' && (src[0] == 'x' || src[0] == 'X') && ares_isascii(src[1]) &&
+      ares_isxdigit(src[1])) {
     /* Hexadecimal: Eat nybble string. */
     if (!size) {
       goto emsgsize;
     }
     dirty = 0;
     src++; /* skip x or X. */
-    while ((ch = *src++) != '\0' && ares__isascii(ch) && ares__isxdigit(ch)) {
-      if (ares__isupper(ch)) {
-        ch = ares__tolower((unsigned char)ch);
+    while ((ch = *src++) != '\0' && ares_isascii(ch) && ares_isxdigit(ch)) {
+      if (ares_isupper(ch)) {
+        ch = ares_tolower((unsigned char)ch);
       }
       n = (int)(strchr(xdigits, ch) - xdigits);
       if (dirty == 0) {
@@ -101,7 +115,7 @@ static int ares_inet_net_pton_ipv4(const char *src, unsigned char *dst,
       }
       *dst++ = (unsigned char)(tmp << 4);
     }
-  } else if (ares__isascii(ch) && ares__isdigit(ch)) {
+  } else if (ares_isascii(ch) && ares_isdigit(ch)) {
     /* Decimal: eat dotted digit string. */
     for (;;) {
       tmp = 0;
@@ -112,7 +126,7 @@ static int ares_inet_net_pton_ipv4(const char *src, unsigned char *dst,
         if (tmp > 255) {
           goto enoent;
         }
-      } while ((ch = *src++) != '\0' && ares__isascii(ch) && ares__isdigit(ch));
+      } while ((ch = *src++) != '\0' && ares_isascii(ch) && ares_isdigit(ch));
       if (!size--) {
         goto emsgsize;
       }
@@ -124,7 +138,7 @@ static int ares_inet_net_pton_ipv4(const char *src, unsigned char *dst,
         goto enoent;
       }
       ch = *src++;
-      if (!ares__isascii(ch) || !ares__isdigit(ch)) {
+      if (!ares_isascii(ch) || !ares_isdigit(ch)) {
         goto enoent;
       }
     }
@@ -133,8 +147,7 @@ static int ares_inet_net_pton_ipv4(const char *src, unsigned char *dst,
   }
 
   bits = -1;
-  if (ch == '/' && ares__isascii(src[0]) && ares__isdigit(src[0]) &&
-      dst > odst) {
+  if (ch == '/' && ares_isascii(src[0]) && ares_isdigit(src[0]) && dst > odst) {
     /* CIDR width specifier.  Nothing can follow it. */
     ch   = *src++; /* Skip over the /. */
     bits = 0;
@@ -145,7 +158,7 @@ static int ares_inet_net_pton_ipv4(const char *src, unsigned char *dst,
       if (bits > 32) {
         goto enoent;
       }
-    } while ((ch = *src++) != '\0' && ares__isascii(ch) && ares__isdigit(ch));
+    } while ((ch = *src++) != '\0' && ares_isascii(ch) && ares_isdigit(ch));
     if (ch != '\0') {
       goto enoent;
     }
@@ -195,11 +208,11 @@ static int ares_inet_net_pton_ipv4(const char *src, unsigned char *dst,
   return bits;
 
 enoent:
-  SET_ERRNO(ENOENT);
+  SET_SOCKERRNO(ENOENT);
   return -1;
 
 emsgsize:
-  SET_ERRNO(EMSGSIZE);
+  SET_SOCKERRNO(EMSGSIZE);
   return -1;
 }
 
@@ -343,7 +356,7 @@ static int ares_inet_pton6(const char *src, unsigned char *dst)
   return 1;
 
 enoent:
-  SET_ERRNO(ENOENT);
+  SET_SOCKERRNO(ENOENT);
   return -1;
 }
 
@@ -358,7 +371,7 @@ static int ares_inet_net_pton_ipv6(const char *src, unsigned char *dst,
   char                *sep;
 
   if (ares_strlen(src) >= sizeof buf) {
-    SET_ERRNO(EMSGSIZE);
+    SET_SOCKERRNO(EMSGSIZE);
     return -1;
   }
   ares_strcpy(buf, src, sizeof buf);
@@ -377,14 +390,14 @@ static int ares_inet_net_pton_ipv6(const char *src, unsigned char *dst,
     bits = 128;
   } else {
     if (!getbits(sep, &bits)) {
-      SET_ERRNO(ENOENT);
+      SET_SOCKERRNO(ENOENT);
       return -1;
     }
   }
 
   bytes = (bits + 7) / 8;
   if (bytes > size) {
-    SET_ERRNO(EMSGSIZE);
+    SET_SOCKERRNO(EMSGSIZE);
     return -1;
   }
   memcpy(dst, &in6, bytes);
@@ -401,13 +414,9 @@ static int ares_inet_net_pton_ipv6(const char *src, unsigned char *dst,
  *      number of bits, either imputed classfully or specified with /CIDR,
  *      or -1 if some failure occurred (check errno).  ENOENT means it was
  *      not a valid network specification.
- * note:
- *      On Windows we store the error in the thread errno, not
- *      in the winsock error code. This is to avoid losing the
- *      actual last winsock error. So use macro ERRNO to fetch the
- *      errno this function sets when returning (-1), not SOCKERRNO.
  * author:
  *      Paul Vixie (ISC), June 1996
+ *
  */
 int ares_inet_net_pton(int af, const char *src, void *dst, size_t size)
 {
@@ -417,7 +426,6 @@ int ares_inet_net_pton(int af, const char *src, void *dst, size_t size)
     case AF_INET6:
       return ares_inet_net_pton_ipv6(src, dst, size);
     default:
-      SET_ERRNO(EAFNOSUPPORT);
       return -1;
   }
 }
@@ -432,11 +440,11 @@ int ares_inet_pton(int af, const char *src, void *dst)
   } else if (af == AF_INET6) {
     size = sizeof(struct ares_in6_addr);
   } else {
-    SET_ERRNO(EAFNOSUPPORT);
+    SET_SOCKERRNO(EAFNOSUPPORT);
     return -1;
   }
   result = ares_inet_net_pton(af, src, dst, size);
-  if (result == -1 && ERRNO == ENOENT) {
+  if (result == -1 && SOCKERRNO == ENOENT) {
     return 0;
   }
   return (result > -1) ? 1 : -1;
diff --git a/deps/cares/src/lib/inet_ntop.c b/deps/cares/src/lib/inet_ntop.c
index 6f96b92ccccd13..79b6c0fa9393d7 100644
--- a/deps/cares/src/lib/inet_ntop.c
+++ b/deps/cares/src/lib/inet_ntop.c
@@ -29,6 +29,22 @@
 #include "ares_nameser.h"
 #include "ares_ipv6.h"
 
+#ifdef USE_WINSOCK
+#  define SOCKERRNO        ((int)WSAGetLastError())
+#  define SET_SOCKERRNO(x) (WSASetLastError((int)(x)))
+#  undef EMSGSIZE
+#  define EMSGSIZE WSAEMSGSIZE
+#  undef ENOENT
+#  define ENOENT WSA_INVALID_PARAMETER
+#  undef EAFNOSUPPORT
+#  define EAFNOSUPPORT WSAEAFNOSUPPORT
+#  undef ENOSPC
+#  define ENOSPC WSA_INVALID_PARAMETER
+#else
+#  define SOCKERRNO        (errno)
+#  define SET_SOCKERRNO(x) (errno = (x))
+#endif
+
 /*
  * WARNING: Don't even consider trying to compile this on a system where
  * sizeof(int) < 4.  sizeof(int) > 4 is fine; all the world's not a VAX.
@@ -42,11 +58,6 @@ static const char *inet_ntop6(const unsigned char *src, char *dst, size_t size);
  *     convert a network format address to presentation format.
  * return:
  *     pointer to presentation format address (`dst'), or NULL (see errno).
- * note:
- *     On Windows we store the error in the thread errno, not
- *     in the winsock error code. This is to avoid losing the
- *     actual last winsock error. So use macro ERRNO to fetch the
- *     errno this function sets when returning NULL, not SOCKERRNO.
  * author:
  *     Paul Vixie, 1996.
  */
@@ -61,7 +72,7 @@ const char        *ares_inet_ntop(int af, const void *src, char *dst,
     default:
       break;
   }
-  SET_ERRNO(EAFNOSUPPORT);
+  SET_SOCKERRNO(EAFNOSUPPORT);
   return NULL;
 }
 
@@ -82,13 +93,13 @@ static const char *inet_ntop4(const unsigned char *src, char *dst, size_t size)
   char              tmp[sizeof("255.255.255.255")];
 
   if (size < sizeof(tmp)) {
-    SET_ERRNO(ENOSPC);
+    SET_SOCKERRNO(ENOSPC);
     return NULL;
   }
 
   if ((size_t)snprintf(tmp, sizeof(tmp), fmt, src[0], src[1], src[2], src[3]) >=
       size) {
-    SET_ERRNO(ENOSPC);
+    SET_SOCKERRNO(ENOSPC);
     return NULL;
   }
   ares_strcpy(dst, tmp, size);
@@ -200,7 +211,7 @@ static const char *inet_ntop6(const unsigned char *src, char *dst, size_t size)
    * Check for overflow, copy, and we're done.
    */
   if ((size_t)(tp - tmp) > size) {
-    SET_ERRNO(ENOSPC);
+    SET_SOCKERRNO(ENOSPC);
     return NULL;
   }
   ares_strcpy(dst, tmp, size);
diff --git a/deps/cares/src/lib/legacy/ares_expand_name.c b/deps/cares/src/lib/legacy/ares_expand_name.c
index 63bd64516682e3..72668f4cb60a07 100644
--- a/deps/cares/src/lib/legacy/ares_expand_name.c
+++ b/deps/cares/src/lib/legacy/ares_expand_name.c
@@ -33,13 +33,13 @@
 
 #include "ares_nameser.h"
 
-ares_status_t ares__expand_name_validated(const unsigned char *encoded,
-                                          const unsigned char *abuf,
-                                          size_t alen, char **s, size_t *enclen,
-                                          ares_bool_t is_hostname)
+ares_status_t ares_expand_name_validated(const unsigned char *encoded,
+                                         const unsigned char *abuf, size_t alen,
+                                         char **s, size_t *enclen,
+                                         ares_bool_t is_hostname)
 {
   ares_status_t status;
-  ares__buf_t  *buf = NULL;
+  ares_buf_t   *buf = NULL;
   size_t        start_len;
 
   if (encoded == NULL || abuf == NULL || alen == 0 || enclen == NULL) {
@@ -57,27 +57,27 @@ ares_status_t ares__expand_name_validated(const unsigned char *encoded,
     *s = NULL;
   }
 
-  buf = ares__buf_create_const(abuf, alen);
+  buf = ares_buf_create_const(abuf, alen);
 
   if (buf == NULL) {
     return ARES_ENOMEM;
   }
 
-  status = ares__buf_set_position(buf, (size_t)(encoded - abuf));
+  status = ares_buf_set_position(buf, (size_t)(encoded - abuf));
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  start_len = ares__buf_len(buf);
-  status    = ares__dns_name_parse(buf, s, is_hostname);
+  start_len = ares_buf_len(buf);
+  status    = ares_dns_name_parse(buf, s, is_hostname);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  *enclen = start_len - ares__buf_len(buf);
+  *enclen = start_len - ares_buf_len(buf);
 
 done:
-  ares__buf_destroy(buf);
+  ares_buf_destroy(buf);
   return status;
 }
 
@@ -92,8 +92,8 @@ int ares_expand_name(const unsigned char *encoded, const unsigned char *abuf,
     return ARES_EBADNAME;
   }
 
-  status  = ares__expand_name_validated(encoded, abuf, (size_t)alen, s,
-                                        &enclen_temp, ARES_FALSE);
+  status  = ares_expand_name_validated(encoded, abuf, (size_t)alen, s,
+                                       &enclen_temp, ARES_FALSE);
   *enclen = (long)enclen_temp;
   return (int)status;
 }
diff --git a/deps/cares/src/lib/legacy/ares_expand_string.c b/deps/cares/src/lib/legacy/ares_expand_string.c
index b3e99daa54b325..e1deb1932f43b4 100644
--- a/deps/cares/src/lib/legacy/ares_expand_string.c
+++ b/deps/cares/src/lib/legacy/ares_expand_string.c
@@ -43,7 +43,7 @@ ares_status_t ares_expand_string_ex(const unsigned char *encoded,
                                     unsigned char **s, size_t *enclen)
 {
   ares_status_t status;
-  ares__buf_t  *buf = NULL;
+  ares_buf_t   *buf = NULL;
   size_t        start_len;
   size_t        len = 0;
 
@@ -62,28 +62,28 @@ ares_status_t ares_expand_string_ex(const unsigned char *encoded,
     *s = NULL;
   }
 
-  buf = ares__buf_create_const(abuf, alen);
+  buf = ares_buf_create_const(abuf, alen);
 
   if (buf == NULL) {
     return ARES_ENOMEM;
   }
 
-  status = ares__buf_set_position(buf, (size_t)(encoded - abuf));
+  status = ares_buf_set_position(buf, (size_t)(encoded - abuf));
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  start_len = ares__buf_len(buf);
-  status    = ares__buf_parse_dns_binstr(buf, ares__buf_len(buf), s, &len);
+  start_len = ares_buf_len(buf);
+  status    = ares_buf_parse_dns_binstr(buf, ares_buf_len(buf), s, &len);
   /* hrm, no way to pass back 'len' with the prototype */
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  *enclen = start_len - ares__buf_len(buf);
+  *enclen = start_len - ares_buf_len(buf);
 
 done:
-  ares__buf_destroy(buf);
+  ares_buf_destroy(buf);
   if (status == ARES_EBADNAME || status == ARES_EBADRESP) {
     status = ARES_EBADSTR;
   }
diff --git a/deps/cares/src/lib/legacy/ares_fds.c b/deps/cares/src/lib/legacy/ares_fds.c
index 3aedd2c90e23f3..112ebac60ad0a7 100644
--- a/deps/cares/src/lib/legacy/ares_fds.c
+++ b/deps/cares/src/lib/legacy/ares_fds.c
@@ -29,28 +29,28 @@
 
 int ares_fds(const ares_channel_t *channel, fd_set *read_fds, fd_set *write_fds)
 {
-  ares_socket_t       nfds;
-  ares__slist_node_t *snode;
+  ares_socket_t      nfds;
+  ares_slist_node_t *snode;
   /* Are there any active queries? */
-  size_t              active_queries;
+  size_t             active_queries;
 
   if (channel == NULL || read_fds == NULL || write_fds == NULL) {
     return 0;
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
-  active_queries = ares__llist_len(channel->all_queries);
+  active_queries = ares_llist_len(channel->all_queries);
 
   nfds = 0;
-  for (snode = ares__slist_node_first(channel->servers); snode != NULL;
-       snode = ares__slist_node_next(snode)) {
-    ares_server_t      *server = ares__slist_node_val(snode);
-    ares__llist_node_t *node;
+  for (snode = ares_slist_node_first(channel->servers); snode != NULL;
+       snode = ares_slist_node_next(snode)) {
+    ares_server_t     *server = ares_slist_node_val(snode);
+    ares_llist_node_t *node;
 
-    for (node = ares__llist_node_first(server->connections); node != NULL;
-         node = ares__llist_node_next(node)) {
-      const ares_conn_t *conn = ares__llist_node_val(node);
+    for (node = ares_llist_node_first(server->connections); node != NULL;
+         node = ares_llist_node_next(node)) {
+      const ares_conn_t *conn = ares_llist_node_val(node);
 
       if (!active_queries && !(conn->flags & ARES_CONN_FLAG_TCP)) {
         continue;
@@ -68,13 +68,13 @@ int ares_fds(const ares_channel_t *channel, fd_set *read_fds, fd_set *write_fds)
         nfds = conn->fd + 1;
       }
 
-      /* TCP only wait on write if we have buffered data */
-      if (conn->flags & ARES_CONN_FLAG_TCP && ares__buf_len(server->tcp_send)) {
+      /* Only wait on write if the connection has the write state flag set */
+      if (conn->state_flags & ARES_CONN_STATE_WRITE) {
         FD_SET(conn->fd, write_fds);
       }
     }
   }
 
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
   return (int)nfds;
 }
diff --git a/deps/cares/src/lib/legacy/ares_getsock.c b/deps/cares/src/lib/legacy/ares_getsock.c
index 8c8476fa95109e..cec9258abb095a 100644
--- a/deps/cares/src/lib/legacy/ares_getsock.c
+++ b/deps/cares/src/lib/legacy/ares_getsock.c
@@ -29,30 +29,30 @@
 int ares_getsock(const ares_channel_t *channel, ares_socket_t *socks,
                  int numsocks) /* size of the 'socks' array */
 {
-  ares__slist_node_t *snode;
-  size_t              sockindex = 0;
-  unsigned int        bitmap    = 0;
-  unsigned int        setbits   = 0xffffffff;
+  ares_slist_node_t *snode;
+  size_t             sockindex = 0;
+  unsigned int       bitmap    = 0;
+  unsigned int       setbits   = 0xffffffff;
 
   /* Are there any active queries? */
-  size_t              active_queries;
+  size_t             active_queries;
 
   if (channel == NULL || numsocks <= 0) {
     return 0;
   }
 
-  ares__channel_lock(channel);
+  ares_channel_lock(channel);
 
-  active_queries = ares__llist_len(channel->all_queries);
+  active_queries = ares_llist_len(channel->all_queries);
 
-  for (snode = ares__slist_node_first(channel->servers); snode != NULL;
-       snode = ares__slist_node_next(snode)) {
-    ares_server_t      *server = ares__slist_node_val(snode);
-    ares__llist_node_t *node;
+  for (snode = ares_slist_node_first(channel->servers); snode != NULL;
+       snode = ares_slist_node_next(snode)) {
+    ares_server_t     *server = ares_slist_node_val(snode);
+    ares_llist_node_t *node;
 
-    for (node = ares__llist_node_first(server->connections); node != NULL;
-         node = ares__llist_node_next(node)) {
-      const ares_conn_t *conn = ares__llist_node_val(node);
+    for (node = ares_llist_node_first(server->connections); node != NULL;
+         node = ares_llist_node_next(node)) {
+      const ares_conn_t *conn = ares_llist_node_val(node);
 
       if (sockindex >= (size_t)numsocks || sockindex >= ARES_GETSOCK_MAXNUM) {
         break;
@@ -71,7 +71,7 @@ int ares_getsock(const ares_channel_t *channel, ares_socket_t *socks,
         bitmap |= ARES_GETSOCK_READABLE(setbits, sockindex);
       }
 
-      if (conn->flags & ARES_CONN_FLAG_TCP && ares__buf_len(server->tcp_send)) {
+      if (conn->state_flags & ARES_CONN_STATE_WRITE) {
         /* then the tcp socket is also writable! */
         bitmap |= ARES_GETSOCK_WRITABLE(setbits, sockindex);
       }
@@ -80,6 +80,6 @@ int ares_getsock(const ares_channel_t *channel, ares_socket_t *socks,
     }
   }
 
-  ares__channel_unlock(channel);
+  ares_channel_unlock(channel);
   return (int)bitmap;
 }
diff --git a/deps/cares/src/lib/legacy/ares_parse_a_reply.c b/deps/cares/src/lib/legacy/ares_parse_a_reply.c
index 0981b90eeaf595..870aaccf76c0a8 100644
--- a/deps/cares/src/lib/legacy/ares_parse_a_reply.c
+++ b/deps/cares/src/lib/legacy/ares_parse_a_reply.c
@@ -71,13 +71,13 @@ int ares_parse_a_reply(const unsigned char *abuf, int alen,
     goto fail;
   }
 
-  status = ares__parse_into_addrinfo(dnsrec, 0, 0, &ai);
+  status = ares_parse_into_addrinfo(dnsrec, 0, 0, &ai);
   if (status != ARES_SUCCESS && status != ARES_ENODATA) {
     goto fail;
   }
 
   if (host != NULL) {
-    status = ares__addrinfo2hostent(&ai, AF_INET, host);
+    status = ares_addrinfo2hostent(&ai, AF_INET, host);
     if (status != ARES_SUCCESS && status != ARES_ENODATA) {
       goto fail; /* LCOV_EXCL_LINE: DefensiveCoding */
     }
@@ -85,15 +85,15 @@ int ares_parse_a_reply(const unsigned char *abuf, int alen,
 
   if (addrttls != NULL && req_naddrttls) {
     size_t temp_naddrttls = 0;
-    ares__addrinfo2addrttl(&ai, AF_INET, req_naddrttls, addrttls, NULL,
-                           &temp_naddrttls);
+    ares_addrinfo2addrttl(&ai, AF_INET, req_naddrttls, addrttls, NULL,
+                          &temp_naddrttls);
     *naddrttls = (int)temp_naddrttls;
   }
 
 
 fail:
-  ares__freeaddrinfo_cnames(ai.cnames);
-  ares__freeaddrinfo_nodes(ai.nodes);
+  ares_freeaddrinfo_cnames(ai.cnames);
+  ares_freeaddrinfo_nodes(ai.nodes);
   ares_free(ai.name);
   ares_free(question_hostname);
   ares_dns_record_destroy(dnsrec);
diff --git a/deps/cares/src/lib/legacy/ares_parse_aaaa_reply.c b/deps/cares/src/lib/legacy/ares_parse_aaaa_reply.c
index 3f6932643b2474..278642f0b3e0af 100644
--- a/deps/cares/src/lib/legacy/ares_parse_aaaa_reply.c
+++ b/deps/cares/src/lib/legacy/ares_parse_aaaa_reply.c
@@ -74,13 +74,13 @@ int ares_parse_aaaa_reply(const unsigned char *abuf, int alen,
     goto fail;
   }
 
-  status = ares__parse_into_addrinfo(dnsrec, 0, 0, &ai);
+  status = ares_parse_into_addrinfo(dnsrec, 0, 0, &ai);
   if (status != ARES_SUCCESS && status != ARES_ENODATA) {
     goto fail;
   }
 
   if (host != NULL) {
-    status = ares__addrinfo2hostent(&ai, AF_INET6, host);
+    status = ares_addrinfo2hostent(&ai, AF_INET6, host);
     if (status != ARES_SUCCESS && status != ARES_ENODATA) {
       goto fail; /* LCOV_EXCL_LINE: DefensiveCoding */
     }
@@ -88,14 +88,14 @@ int ares_parse_aaaa_reply(const unsigned char *abuf, int alen,
 
   if (addrttls != NULL && req_naddrttls) {
     size_t temp_naddrttls = 0;
-    ares__addrinfo2addrttl(&ai, AF_INET6, req_naddrttls, NULL, addrttls,
-                           &temp_naddrttls);
+    ares_addrinfo2addrttl(&ai, AF_INET6, req_naddrttls, NULL, addrttls,
+                          &temp_naddrttls);
     *naddrttls = (int)temp_naddrttls;
   }
 
 fail:
-  ares__freeaddrinfo_cnames(ai.cnames);
-  ares__freeaddrinfo_nodes(ai.nodes);
+  ares_freeaddrinfo_cnames(ai.cnames);
+  ares_freeaddrinfo_nodes(ai.nodes);
   ares_free(question_hostname);
   ares_free(ai.name);
   ares_dns_record_destroy(dnsrec);
diff --git a/deps/cares/src/lib/legacy/ares_parse_ptr_reply.c b/deps/cares/src/lib/legacy/ares_parse_ptr_reply.c
index 56a7b5f94eb331..0e52f9db09bbc7 100644
--- a/deps/cares/src/lib/legacy/ares_parse_ptr_reply.c
+++ b/deps/cares/src/lib/legacy/ares_parse_ptr_reply.c
@@ -135,7 +135,7 @@ ares_status_t ares_parse_ptr_reply_dnsrec(const ares_dns_record_t *dnsrec,
      *   status = ARES_EBADRESP;
      *   goto done;
      * }
-     * if (strcasecmp(ptrname, rname) != 0) {
+     * if (!ares_strcaseeq(ptrname, rname)) {
      *   continue;
      * }
      */
diff --git a/deps/cares/src/lib/legacy/ares_parse_txt_reply.c b/deps/cares/src/lib/legacy/ares_parse_txt_reply.c
index 71ee0841119b8b..d276f6ab390753 100644
--- a/deps/cares/src/lib/legacy/ares_parse_txt_reply.c
+++ b/deps/cares/src/lib/legacy/ares_parse_txt_reply.c
@@ -27,8 +27,8 @@
 #include "ares_private.h"
 #include "ares_data.h"
 
-static int ares__parse_txt_reply(const unsigned char *abuf, size_t alen,
-                                 ares_bool_t ex, void **txt_out)
+static int ares_parse_txt_reply_int(const unsigned char *abuf, size_t alen,
+                                    ares_bool_t ex, void **txt_out)
 {
   ares_status_t        status;
   struct ares_txt_ext *txt_head = NULL;
@@ -129,8 +129,8 @@ int ares_parse_txt_reply(const unsigned char *abuf, int alen,
   if (alen < 0) {
     return ARES_EBADRESP;
   }
-  return ares__parse_txt_reply(abuf, (size_t)alen, ARES_FALSE,
-                               (void **)txt_out);
+  return ares_parse_txt_reply_int(abuf, (size_t)alen, ARES_FALSE,
+                                  (void **)txt_out);
 }
 
 int ares_parse_txt_reply_ext(const unsigned char *abuf, int alen,
@@ -139,5 +139,6 @@ int ares_parse_txt_reply_ext(const unsigned char *abuf, int alen,
   if (alen < 0) {
     return ARES_EBADRESP;
   }
-  return ares__parse_txt_reply(abuf, (size_t)alen, ARES_TRUE, (void **)txt_out);
+  return ares_parse_txt_reply_int(abuf, (size_t)alen, ARES_TRUE,
+                                  (void **)txt_out);
 }
diff --git a/deps/cares/src/lib/record/ares_dns_mapping.c b/deps/cares/src/lib/record/ares_dns_mapping.c
index 738d2f3795ab3f..5a3ec28abf130b 100644
--- a/deps/cares/src/lib/record/ares_dns_mapping.c
+++ b/deps/cares/src/lib/record/ares_dns_mapping.c
@@ -111,7 +111,7 @@ ares_bool_t ares_dns_rec_type_isvalid(ares_dns_rec_type_t type,
   return is_query ? ARES_TRUE : ARES_FALSE;
 }
 
-ares_bool_t ares_dns_rec_type_allow_name_compression(ares_dns_rec_type_t type)
+ares_bool_t ares_dns_rec_allow_name_comp(ares_dns_rec_type_t type)
 {
   /* Only record types defined in RFC1035 allow name compression within the
    * RDATA.  Otherwise nameservers that don't understand an RR may not be
@@ -681,7 +681,7 @@ ares_bool_t ares_dns_class_fromstr(ares_dns_class_t *qclass, const char *str)
   }
 
   for (i = 0; list[i].name != NULL; i++) {
-    if (strcasecmp(list[i].name, str) == 0) {
+    if (ares_strcaseeq(list[i].name, str)) {
       *qclass = list[i].qclass;
       return ARES_TRUE;
     }
@@ -726,7 +726,7 @@ ares_bool_t ares_dns_rec_type_fromstr(ares_dns_rec_type_t *qtype,
   }
 
   for (i = 0; list[i].name != NULL; i++) {
-    if (strcasecmp(list[i].name, str) == 0) {
+    if (ares_strcaseeq(list[i].name, str)) {
       *qtype = list[i].type;
       return ARES_TRUE;
     }
diff --git a/deps/cares/src/lib/record/ares_dns_multistring.c b/deps/cares/src/lib/record/ares_dns_multistring.c
index bff5afb9f2a00f..57c0d1c0a803ec 100644
--- a/deps/cares/src/lib/record/ares_dns_multistring.c
+++ b/deps/cares/src/lib/record/ares_dns_multistring.c
@@ -31,7 +31,7 @@ typedef struct {
   size_t         len;
 } multistring_data_t;
 
-struct ares__dns_multistring {
+struct ares_dns_multistring {
   /*! whether or not cached concatenated string is valid */
   ares_bool_t    cache_invalidated;
   /*! combined/concatenated string cache */
@@ -39,10 +39,10 @@ struct ares__dns_multistring {
   /*! length of combined/concatenated string */
   size_t         cache_str_len;
   /*! Data making up strings */
-  ares__array_t *strs; /*!< multistring_data_t type */
+  ares_array_t  *strs; /*!< multistring_data_t type */
 };
 
-static void ares__dns_multistring_free_cb(void *arg)
+static void ares_dns_multistring_free_cb(void *arg)
 {
   multistring_data_t *data = arg;
   if (data == NULL) {
@@ -51,15 +51,15 @@ static void ares__dns_multistring_free_cb(void *arg)
   ares_free(data->data);
 }
 
-ares__dns_multistring_t *ares__dns_multistring_create(void)
+ares_dns_multistring_t *ares_dns_multistring_create(void)
 {
-  ares__dns_multistring_t *strs = ares_malloc_zero(sizeof(*strs));
+  ares_dns_multistring_t *strs = ares_malloc_zero(sizeof(*strs));
   if (strs == NULL) {
     return NULL;
   }
 
-  strs->strs = ares__array_create(sizeof(multistring_data_t),
-                                  ares__dns_multistring_free_cb);
+  strs->strs =
+    ares_array_create(sizeof(multistring_data_t), ares_dns_multistring_free_cb);
   if (strs->strs == NULL) {
     ares_free(strs);
     return NULL;
@@ -68,31 +68,31 @@ ares__dns_multistring_t *ares__dns_multistring_create(void)
   return strs;
 }
 
-void ares__dns_multistring_clear(ares__dns_multistring_t *strs)
+void ares_dns_multistring_clear(ares_dns_multistring_t *strs)
 {
   if (strs == NULL) {
     return;
   }
 
-  while (ares__array_len(strs->strs)) {
-    ares__array_remove_last(strs->strs);
+  while (ares_array_len(strs->strs)) {
+    ares_array_remove_last(strs->strs);
   }
 }
 
-void ares__dns_multistring_destroy(ares__dns_multistring_t *strs)
+void ares_dns_multistring_destroy(ares_dns_multistring_t *strs)
 {
   if (strs == NULL) {
     return;
   }
-  ares__dns_multistring_clear(strs);
-  ares__array_destroy(strs->strs);
+  ares_dns_multistring_clear(strs);
+  ares_array_destroy(strs->strs);
   ares_free(strs->cache_str);
   ares_free(strs);
 }
 
-ares_status_t ares__dns_multistring_replace_own(ares__dns_multistring_t *strs,
-                                                size_t idx, unsigned char *str,
-                                                size_t len)
+ares_status_t ares_dns_multistring_swap_own(ares_dns_multistring_t *strs,
+                                            size_t idx, unsigned char *str,
+                                            size_t len)
 {
   multistring_data_t *data;
 
@@ -102,7 +102,7 @@ ares_status_t ares__dns_multistring_replace_own(ares__dns_multistring_t *strs,
 
   strs->cache_invalidated = ARES_TRUE;
 
-  data = ares__array_at(strs->strs, idx);
+  data = ares_array_at(strs->strs, idx);
   if (data == NULL) {
     return ARES_EFORMERR;
   }
@@ -113,8 +113,7 @@ ares_status_t ares__dns_multistring_replace_own(ares__dns_multistring_t *strs,
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__dns_multistring_del(ares__dns_multistring_t *strs,
-                                        size_t                   idx)
+ares_status_t ares_dns_multistring_del(ares_dns_multistring_t *strs, size_t idx)
 {
   if (strs == NULL) {
     return ARES_EFORMERR;
@@ -122,11 +121,11 @@ ares_status_t ares__dns_multistring_del(ares__dns_multistring_t *strs,
 
   strs->cache_invalidated = ARES_TRUE;
 
-  return ares__array_remove_at(strs->strs, idx);
+  return ares_array_remove_at(strs->strs, idx);
 }
 
-ares_status_t ares__dns_multistring_add_own(ares__dns_multistring_t *strs,
-                                            unsigned char *str, size_t len)
+ares_status_t ares_dns_multistring_add_own(ares_dns_multistring_t *strs,
+                                           unsigned char *str, size_t len)
 {
   multistring_data_t *data;
   ares_status_t       status;
@@ -142,7 +141,7 @@ ares_status_t ares__dns_multistring_add_own(ares__dns_multistring_t *strs,
     return ARES_EFORMERR;
   }
 
-  status = ares__array_insert_last((void **)&data, strs->strs);
+  status = ares_array_insert_last((void **)&data, strs->strs);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -153,17 +152,17 @@ ares_status_t ares__dns_multistring_add_own(ares__dns_multistring_t *strs,
   return ARES_SUCCESS;
 }
 
-size_t ares__dns_multistring_cnt(const ares__dns_multistring_t *strs)
+size_t ares_dns_multistring_cnt(const ares_dns_multistring_t *strs)
 {
   if (strs == NULL) {
     return 0;
   }
-  return ares__array_len(strs->strs);
+  return ares_array_len(strs->strs);
 }
 
 const unsigned char *
-  ares__dns_multistring_get(const ares__dns_multistring_t *strs, size_t idx,
-                            size_t *len)
+  ares_dns_multistring_get(const ares_dns_multistring_t *strs, size_t idx,
+                           size_t *len)
 {
   const multistring_data_t *data;
 
@@ -171,7 +170,7 @@ const unsigned char *
     return NULL;
   }
 
-  data = ares__array_at_const(strs->strs, idx);
+  data = ares_array_at_const(strs->strs, idx);
   if (data == NULL) {
     return NULL;
   }
@@ -180,11 +179,11 @@ const unsigned char *
   return data->data;
 }
 
-const unsigned char *
-  ares__dns_multistring_get_combined(ares__dns_multistring_t *strs, size_t *len)
+const unsigned char *ares_dns_multistring_combined(ares_dns_multistring_t *strs,
+                                                   size_t                 *len)
 {
-  ares__buf_t *buf = NULL;
-  size_t       i;
+  ares_buf_t *buf = NULL;
+  size_t      i;
 
   if (strs == NULL || len == NULL) {
     return NULL;
@@ -203,22 +202,92 @@ const unsigned char *
   strs->cache_str     = NULL;
   strs->cache_str_len = 0;
 
-  buf = ares__buf_create();
+  buf = ares_buf_create();
 
-  for (i = 0; i < ares__array_len(strs->strs); i++) {
-    const multistring_data_t *data = ares__array_at_const(strs->strs, i);
+  for (i = 0; i < ares_array_len(strs->strs); i++) {
+    const multistring_data_t *data = ares_array_at_const(strs->strs, i);
     if (data == NULL ||
-        ares__buf_append(buf, data->data, data->len) != ARES_SUCCESS) {
-      ares__buf_destroy(buf);
+        ares_buf_append(buf, data->data, data->len) != ARES_SUCCESS) {
+      ares_buf_destroy(buf);
       return NULL;
     }
   }
 
   strs->cache_str =
-    (unsigned char *)ares__buf_finish_str(buf, &strs->cache_str_len);
+    (unsigned char *)ares_buf_finish_str(buf, &strs->cache_str_len);
   if (strs->cache_str != NULL) {
     strs->cache_invalidated = ARES_FALSE;
   }
   *len = strs->cache_str_len;
   return strs->cache_str;
 }
+
+ares_status_t ares_dns_multistring_parse_buf(ares_buf_t *buf,
+                                             size_t      remaining_len,
+                                             ares_dns_multistring_t **strs,
+                                             ares_bool_t validate_printable)
+{
+  unsigned char len;
+  ares_status_t status   = ARES_EBADRESP;
+  size_t        orig_len = ares_buf_len(buf);
+
+  if (buf == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  if (remaining_len == 0) {
+    return ARES_EBADRESP;
+  }
+
+  if (strs != NULL) {
+    *strs = ares_dns_multistring_create();
+    if (*strs == NULL) {
+      return ARES_ENOMEM;
+    }
+  }
+
+  while (orig_len - ares_buf_len(buf) < remaining_len) {
+    status = ares_buf_fetch_bytes(buf, &len, 1);
+    if (status != ARES_SUCCESS) {
+      break; /* LCOV_EXCL_LINE: DefensiveCoding */
+    }
+
+    if (len) {
+      /* When used by the _str() parser, it really needs to be validated to
+       * be a valid printable ascii string.  Do that here */
+      if (validate_printable && ares_buf_len(buf) >= len) {
+        size_t      mylen;
+        const char *data = (const char *)ares_buf_peek(buf, &mylen);
+        if (!ares_str_isprint(data, len)) {
+          status = ARES_EBADSTR;
+          break;
+        }
+      }
+
+      if (strs != NULL) {
+        unsigned char *data = NULL;
+        status = ares_buf_fetch_bytes_dup(buf, len, ARES_TRUE, &data);
+        if (status != ARES_SUCCESS) {
+          break;
+        }
+        status = ares_dns_multistring_add_own(*strs, data, len);
+        if (status != ARES_SUCCESS) {
+          ares_free(data);
+          break;
+        }
+      } else {
+        status = ares_buf_consume(buf, len);
+        if (status != ARES_SUCCESS) {
+          break;
+        }
+      }
+    }
+  }
+
+  if (status != ARES_SUCCESS && strs != NULL) {
+    ares_dns_multistring_destroy(*strs);
+    *strs = NULL;
+  }
+
+  return status;
+}
diff --git a/deps/cares/src/lib/record/ares_dns_multistring.h b/deps/cares/src/lib/record/ares_dns_multistring.h
index d9aa7ae37847eb..70834491b52116 100644
--- a/deps/cares/src/lib/record/ares_dns_multistring.h
+++ b/deps/cares/src/lib/record/ares_dns_multistring.h
@@ -26,25 +26,47 @@
 #ifndef __ARES_DNS_MULTISTRING_H
 #define __ARES_DNS_MULTISTRING_H
 
-struct ares__dns_multistring;
-typedef struct ares__dns_multistring ares__dns_multistring_t;
+#include "ares_buf.h"
 
-ares__dns_multistring_t             *ares__dns_multistring_create(void);
-void          ares__dns_multistring_clear(ares__dns_multistring_t *strs);
-void          ares__dns_multistring_destroy(ares__dns_multistring_t *strs);
-ares_status_t ares__dns_multistring_replace_own(ares__dns_multistring_t *strs,
-                                                size_t idx, unsigned char *str,
-                                                size_t len);
-ares_status_t ares__dns_multistring_del(ares__dns_multistring_t *strs,
-                                        size_t                   idx);
-ares_status_t ares__dns_multistring_add_own(ares__dns_multistring_t *strs,
-                                            unsigned char *str, size_t len);
-size_t        ares__dns_multistring_cnt(const ares__dns_multistring_t *strs);
-const unsigned char *
-  ares__dns_multistring_get(const ares__dns_multistring_t *strs, size_t idx,
-                            size_t *len);
+struct ares_dns_multistring;
+typedef struct ares_dns_multistring ares_dns_multistring_t;
+
+ares_dns_multistring_t             *ares_dns_multistring_create(void);
+void          ares_dns_multistring_clear(ares_dns_multistring_t *strs);
+void          ares_dns_multistring_destroy(ares_dns_multistring_t *strs);
+ares_status_t ares_dns_multistring_swap_own(ares_dns_multistring_t *strs,
+                                            size_t idx, unsigned char *str,
+                                            size_t len);
+ares_status_t ares_dns_multistring_del(ares_dns_multistring_t *strs,
+                                       size_t                  idx);
+ares_status_t ares_dns_multistring_add_own(ares_dns_multistring_t *strs,
+                                           unsigned char *str, size_t len);
+size_t        ares_dns_multistring_cnt(const ares_dns_multistring_t *strs);
 const unsigned char *
-  ares__dns_multistring_get_combined(ares__dns_multistring_t *strs,
-                                     size_t                  *len);
+  ares_dns_multistring_get(const ares_dns_multistring_t *strs, size_t idx,
+                           size_t *len);
+const unsigned char *ares_dns_multistring_combined(ares_dns_multistring_t *strs,
+                                                   size_t                 *len);
+
+/*! Parse an array of character strings, as defined in RFC1035, as binary
+ *  data.  For convenience, each value is guaranteed to be NUL terminated (the
+ *  terminator is not included in the length for each value).
+ *
+ *  \param[in]  buf                initialized buffer object
+ *  \param[in]  remaining_len      maximum length that should be used for
+ *                                 parsing the string, this is often less than
+ *                                 the remaining buffer and is based on the RR
+ *                                 record length.
+ *  \param[out] strs               Pointer passed by reference to be filled in
+ *                                 with
+ *                                 the array of values.
+ *  \param[in]  validate_printable Validate the strings contain only printable
+ *                                 data.
+ *  \return ARES_SUCCESS on success
+ */
+ares_status_t        ares_dns_multistring_parse_buf(ares_buf_t *buf,
+                                                    size_t      remaining_len,
+                                                    ares_dns_multistring_t **strs,
+                                                    ares_bool_t validate_printable);
 
 #endif
diff --git a/deps/cares/src/lib/record/ares_dns_name.c b/deps/cares/src/lib/record/ares_dns_name.c
index a437553b0f2917..a9b92e03ca9353 100644
--- a/deps/cares/src/lib/record/ares_dns_name.c
+++ b/deps/cares/src/lib/record/ares_dns_name.c
@@ -31,7 +31,7 @@ typedef struct {
   size_t idx;
 } ares_nameoffset_t;
 
-static void ares__nameoffset_free(void *arg)
+static void ares_nameoffset_free(void *arg)
 {
   ares_nameoffset_t *off = arg;
   if (off == NULL) {
@@ -41,8 +41,8 @@ static void ares__nameoffset_free(void *arg)
   ares_free(off);
 }
 
-static ares_status_t ares__nameoffset_create(ares__llist_t **list,
-                                             const char *name, size_t idx)
+static ares_status_t ares_nameoffset_create(ares_llist_t **list,
+                                            const char *name, size_t idx)
 {
   ares_status_t      status;
   ares_nameoffset_t *off = NULL;
@@ -53,7 +53,7 @@ static ares_status_t ares__nameoffset_create(ares__llist_t **list,
   }
 
   if (*list == NULL) {
-    *list = ares__llist_create(ares__nameoffset_free);
+    *list = ares_llist_create(ares_nameoffset_free);
   }
   if (*list == NULL) {
     status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
@@ -69,7 +69,7 @@ static ares_status_t ares__nameoffset_create(ares__llist_t **list,
   off->name_len = ares_strlen(off->name);
   off->idx      = idx;
 
-  if (ares__llist_insert_last(*list, off) == NULL) {
+  if (ares_llist_insert_last(*list, off) == NULL) {
     status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
     goto fail;            /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -78,25 +78,25 @@ static ares_status_t ares__nameoffset_create(ares__llist_t **list,
 
 /* LCOV_EXCL_START: OutOfMemory */
 fail:
-  ares__nameoffset_free(off);
+  ares_nameoffset_free(off);
   return status;
   /* LCOV_EXCL_STOP */
 }
 
-static const ares_nameoffset_t *ares__nameoffset_find(ares__llist_t *list,
-                                                      const char    *name)
+static const ares_nameoffset_t *ares_nameoffset_find(ares_llist_t *list,
+                                                     const char   *name)
 {
   size_t                   name_len = ares_strlen(name);
-  ares__llist_node_t      *node;
+  ares_llist_node_t       *node;
   const ares_nameoffset_t *longest_match = NULL;
 
   if (list == NULL || name == NULL || name_len == 0) {
     return NULL;
   }
 
-  for (node = ares__llist_node_first(list); node != NULL;
-       node = ares__llist_node_next(node)) {
-    const ares_nameoffset_t *val = ares__llist_node_val(node);
+  for (node = ares_llist_node_first(list); node != NULL;
+       node = ares_llist_node_next(node)) {
+    const ares_nameoffset_t *val = ares_llist_node_val(node);
     size_t                   prefix_len;
 
     /* Can't be a match if the stored name is longer */
@@ -114,7 +114,7 @@ static const ares_nameoffset_t *ares__nameoffset_find(ares__llist_t *list,
     /* Due to DNS 0x20, lets not inadvertently mangle things, use case-sensitive
      * matching instead of case-insensitive.  This may result in slightly
      * larger DNS queries overall. */
-    if (strcmp(val->name, name + prefix_len) != 0) {
+    if (!ares_streq(val->name, name + prefix_len)) {
       continue;
     }
 
@@ -133,38 +133,38 @@ static const ares_nameoffset_t *ares__nameoffset_find(ares__llist_t *list,
 
 static void ares_dns_labels_free_cb(void *arg)
 {
-  ares__buf_t **buf = arg;
+  ares_buf_t **buf = arg;
   if (buf == NULL) {
     return;
   }
 
-  ares__buf_destroy(*buf);
+  ares_buf_destroy(*buf);
 }
 
-static ares__buf_t *ares_dns_labels_add(ares__array_t *labels)
+static ares_buf_t *ares_dns_labels_add(ares_array_t *labels)
 {
-  ares__buf_t **buf;
+  ares_buf_t **buf;
 
   if (labels == NULL) {
     return NULL; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  if (ares__array_insert_last((void **)&buf, labels) != ARES_SUCCESS) {
+  if (ares_array_insert_last((void **)&buf, labels) != ARES_SUCCESS) {
     return NULL;
   }
 
-  *buf = ares__buf_create();
+  *buf = ares_buf_create();
   if (*buf == NULL) {
-    ares__array_remove_last(labels);
+    ares_array_remove_last(labels);
     return NULL;
   }
 
   return *buf;
 }
 
-static ares__buf_t *ares_dns_labels_get_last(ares__array_t *labels)
+static ares_buf_t *ares_dns_labels_get_last(ares_array_t *labels)
 {
-  ares__buf_t **buf = ares__array_last(labels);
+  ares_buf_t **buf = ares_array_last(labels);
 
   if (buf == NULL) {
     return NULL;
@@ -173,9 +173,9 @@ static ares__buf_t *ares_dns_labels_get_last(ares__array_t *labels)
   return *buf;
 }
 
-static ares__buf_t *ares_dns_labels_get_at(ares__array_t *labels, size_t idx)
+static ares_buf_t *ares_dns_labels_get_at(ares_array_t *labels, size_t idx)
 {
-  ares__buf_t **buf = ares__array_at(labels, idx);
+  ares_buf_t **buf = ares_array_at(labels, idx);
 
   if (buf == NULL) {
     return NULL;
@@ -184,37 +184,37 @@ static ares__buf_t *ares_dns_labels_get_at(ares__array_t *labels, size_t idx)
   return *buf;
 }
 
-static void ares_dns_name_labels_del_last(ares__array_t *labels)
+static void ares_dns_name_labels_del_last(ares_array_t *labels)
 {
-  ares__array_remove_last(labels);
+  ares_array_remove_last(labels);
 }
 
-static ares_status_t ares_parse_dns_name_escape(ares__buf_t *namebuf,
-                                                ares__buf_t *label,
-                                                ares_bool_t  validate_hostname)
+static ares_status_t ares_parse_dns_name_escape(ares_buf_t *namebuf,
+                                                ares_buf_t *label,
+                                                ares_bool_t validate_hostname)
 {
   ares_status_t status;
   unsigned char c;
 
-  status = ares__buf_fetch_bytes(namebuf, &c, 1);
+  status = ares_buf_fetch_bytes(namebuf, &c, 1);
   if (status != ARES_SUCCESS) {
     return ARES_EBADNAME;
   }
 
   /* If next character is a digit, read 2 more digits */
-  if (ares__isdigit(c)) {
+  if (ares_isdigit(c)) {
     size_t       i;
     unsigned int val = 0;
 
     val = c - '0';
 
     for (i = 0; i < 2; i++) {
-      status = ares__buf_fetch_bytes(namebuf, &c, 1);
+      status = ares_buf_fetch_bytes(namebuf, &c, 1);
       if (status != ARES_SUCCESS) {
         return ARES_EBADNAME;
       }
 
-      if (!ares__isdigit(c)) {
+      if (!ares_isdigit(c)) {
         return ARES_EBADNAME;
       }
       val *= 10;
@@ -226,28 +226,28 @@ static ares_status_t ares_parse_dns_name_escape(ares__buf_t *namebuf,
       return ARES_EBADNAME;
     }
 
-    if (validate_hostname && !ares__is_hostnamech((unsigned char)val)) {
+    if (validate_hostname && !ares_is_hostnamech((unsigned char)val)) {
       return ARES_EBADNAME;
     }
 
-    return ares__buf_append_byte(label, (unsigned char)val);
+    return ares_buf_append_byte(label, (unsigned char)val);
   }
 
   /* We can just output the character */
-  if (validate_hostname && !ares__is_hostnamech(c)) {
+  if (validate_hostname && !ares_is_hostnamech(c)) {
     return ARES_EBADNAME;
   }
 
-  return ares__buf_append_byte(label, c);
+  return ares_buf_append_byte(label, c);
 }
 
-static ares_status_t ares_split_dns_name(ares__array_t *labels,
-                                         ares_bool_t    validate_hostname,
-                                         const char    *name)
+static ares_status_t ares_split_dns_name(ares_array_t *labels,
+                                         ares_bool_t   validate_hostname,
+                                         const char   *name)
 {
   ares_status_t status;
-  ares__buf_t  *label   = NULL;
-  ares__buf_t  *namebuf = NULL;
+  ares_buf_t   *label   = NULL;
+  ares_buf_t   *namebuf = NULL;
   size_t        i;
   size_t        total_len = 0;
   unsigned char c;
@@ -257,7 +257,7 @@ static ares_status_t ares_split_dns_name(ares__array_t *labels,
   }
 
   /* Put name into a buffer for parsing */
-  namebuf = ares__buf_create();
+  namebuf = ares_buf_create();
   if (namebuf == NULL) {
     status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
     goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
@@ -265,7 +265,7 @@ static ares_status_t ares_split_dns_name(ares__array_t *labels,
 
   if (*name != '\0') {
     status =
-      ares__buf_append(namebuf, (const unsigned char *)name, ares_strlen(name));
+      ares_buf_append(namebuf, (const unsigned char *)name, ares_strlen(name));
     if (status != ARES_SUCCESS) {
       goto done; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -278,7 +278,7 @@ static ares_status_t ares_split_dns_name(ares__array_t *labels,
     goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  while (ares__buf_fetch_bytes(namebuf, &c, 1) == ARES_SUCCESS) {
+  while (ares_buf_fetch_bytes(namebuf, &c, 1) == ARES_SUCCESS) {
     /* New label */
     if (c == '.') {
       label = ares_dns_labels_add(labels);
@@ -299,33 +299,33 @@ static ares_status_t ares_split_dns_name(ares__array_t *labels,
     }
 
     /* Output direct character */
-    if (validate_hostname && !ares__is_hostnamech(c)) {
+    if (validate_hostname && !ares_is_hostnamech(c)) {
       status = ARES_EBADNAME;
       goto done;
     }
 
-    status = ares__buf_append_byte(label, c);
+    status = ares_buf_append_byte(label, c);
     if (status != ARES_SUCCESS) {
       goto done; /* LCOV_EXCL_LINE: OutOfMemory */
     }
   }
 
   /* Remove trailing blank label */
-  if (ares__buf_len(ares_dns_labels_get_last(labels)) == 0) {
+  if (ares_buf_len(ares_dns_labels_get_last(labels)) == 0) {
     ares_dns_name_labels_del_last(labels);
   }
 
   /* If someone passed in "." there could have been 2 blank labels, check for
    * that */
-  if (ares__array_len(labels) == 1 &&
-      ares__buf_len(ares_dns_labels_get_last(labels)) == 0) {
+  if (ares_array_len(labels) == 1 &&
+      ares_buf_len(ares_dns_labels_get_last(labels)) == 0) {
     ares_dns_name_labels_del_last(labels);
   }
 
   /* Scan to make sure label lengths are valid */
-  for (i = 0; i < ares__array_len(labels); i++) {
-    const ares__buf_t *buf = ares_dns_labels_get_at(labels, i);
-    size_t             len = ares__buf_len(buf);
+  for (i = 0; i < ares_array_len(labels); i++) {
+    const ares_buf_t *buf = ares_dns_labels_get_at(labels, i);
+    size_t            len = ares_buf_len(buf);
     /* No 0-length labels, and no labels over 63 bytes */
     if (len == 0 || len > 63) {
       status = ARES_EBADNAME;
@@ -335,8 +335,7 @@ static ares_status_t ares_split_dns_name(ares__array_t *labels,
   }
 
   /* Can't exceed maximum (unescaped) length */
-  if (ares__array_len(labels) &&
-      total_len + ares__array_len(labels) - 1 > 255) {
+  if (ares_array_len(labels) && total_len + ares_array_len(labels) - 1 > 255) {
     status = ARES_EBADNAME;
     goto done;
   }
@@ -344,19 +343,19 @@ static ares_status_t ares_split_dns_name(ares__array_t *labels,
   status = ARES_SUCCESS;
 
 done:
-  ares__buf_destroy(namebuf);
+  ares_buf_destroy(namebuf);
   return status;
 }
 
-ares_status_t ares__dns_name_write(ares__buf_t *buf, ares__llist_t **list,
-                                   ares_bool_t validate_hostname,
-                                   const char *name)
+ares_status_t ares_dns_name_write(ares_buf_t *buf, ares_llist_t **list,
+                                  ares_bool_t validate_hostname,
+                                  const char *name)
 {
   const ares_nameoffset_t *off = NULL;
   size_t                   name_len;
   size_t                   orig_name_len;
-  size_t                   pos    = ares__buf_len(buf);
-  ares__array_t           *labels = NULL;
+  size_t                   pos    = ares_buf_len(buf);
+  ares_array_t            *labels = NULL;
   char                     name_copy[512];
   ares_status_t            status;
 
@@ -364,7 +363,7 @@ ares_status_t ares__dns_name_write(ares__buf_t *buf, ares__llist_t **list,
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  labels = ares__array_create(sizeof(ares__buf_t *), ares_dns_labels_free_cb);
+  labels = ares_array_create(sizeof(ares_buf_t *), ares_dns_labels_free_cb);
   if (labels == NULL) {
     return ARES_ENOMEM;
   }
@@ -376,7 +375,7 @@ ares_status_t ares__dns_name_write(ares__buf_t *buf, ares__llist_t **list,
 
   /* Find longest match */
   if (list != NULL) {
-    off = ares__nameoffset_find(*list, name_copy);
+    off = ares_nameoffset_find(*list, name_copy);
     if (off != NULL && off->name_len != name_len) {
       /* truncate */
       name_len            -= (off->name_len + 1);
@@ -393,17 +392,17 @@ ares_status_t ares__dns_name_write(ares__buf_t *buf, ares__llist_t **list,
       goto done;
     }
 
-    for (i = 0; i < ares__array_len(labels); i++) {
+    for (i = 0; i < ares_array_len(labels); i++) {
       size_t               len  = 0;
-      const ares__buf_t   *lbuf = ares_dns_labels_get_at(labels, i);
-      const unsigned char *ptr  = ares__buf_peek(lbuf, &len);
+      const ares_buf_t    *lbuf = ares_dns_labels_get_at(labels, i);
+      const unsigned char *ptr  = ares_buf_peek(lbuf, &len);
 
-      status = ares__buf_append_byte(buf, (unsigned char)(len & 0xFF));
+      status = ares_buf_append_byte(buf, (unsigned char)(len & 0xFF));
       if (status != ARES_SUCCESS) {
         goto done; /* LCOV_EXCL_LINE: OutOfMemory */
       }
 
-      status = ares__buf_append(buf, ptr, len);
+      status = ares_buf_append(buf, ptr, len);
       if (status != ARES_SUCCESS) {
         goto done; /* LCOV_EXCL_LINE: OutOfMemory */
       }
@@ -411,7 +410,7 @@ ares_status_t ares__dns_name_write(ares__buf_t *buf, ares__llist_t **list,
 
     /* If we are NOT jumping to another label, output terminator */
     if (off == NULL) {
-      status = ares__buf_append_byte(buf, 0);
+      status = ares_buf_append_byte(buf, 0);
       if (status != ARES_SUCCESS) {
         goto done; /* LCOV_EXCL_LINE: OutOfMemory */
       }
@@ -422,7 +421,7 @@ ares_status_t ares__dns_name_write(ares__buf_t *buf, ares__llist_t **list,
   if (off != NULL) {
     unsigned short u16 =
       (unsigned short)0xC000 | (unsigned short)(off->idx & 0x3FFF);
-    status = ares__buf_append_be16(buf, u16);
+    status = ares_buf_append_be16(buf, u16);
     if (status != ARES_SUCCESS) {
       goto done; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -432,7 +431,7 @@ ares_status_t ares__dns_name_write(ares__buf_t *buf, ares__llist_t **list,
    * a prior entry */
   if (list != NULL && (off == NULL || off->name_len != orig_name_len) &&
       name_len > 0) {
-    status = ares__nameoffset_create(list, name /* not truncated copy! */, pos);
+    status = ares_nameoffset_create(list, name /* not truncated copy! */, pos);
     if (status != ARES_SUCCESS) {
       goto done; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -441,7 +440,7 @@ ares_status_t ares__dns_name_write(ares__buf_t *buf, ares__llist_t **list,
   status = ARES_SUCCESS;
 
 done:
-  ares__array_destroy(labels);
+  ares_array_destroy(labels);
   return status;
 }
 
@@ -465,12 +464,12 @@ static ares_bool_t is_reservedch(int ch)
   return ARES_FALSE;
 }
 
-static ares_status_t ares__fetch_dnsname_into_buf(ares__buf_t *buf,
-                                                  ares__buf_t *dest, size_t len,
-                                                  ares_bool_t is_hostname)
+static ares_status_t ares_fetch_dnsname_into_buf(ares_buf_t *buf,
+                                                 ares_buf_t *dest, size_t len,
+                                                 ares_bool_t is_hostname)
 {
   size_t               remaining_len;
-  const unsigned char *ptr = ares__buf_peek(buf, &remaining_len);
+  const unsigned char *ptr = ares_buf_peek(buf, &remaining_len);
   ares_status_t        status;
   size_t               i;
 
@@ -483,7 +482,7 @@ static ares_status_t ares__fetch_dnsname_into_buf(ares__buf_t *buf,
 
     /* Hostnames have a very specific allowed character set.  Anything outside
      * of that (non-printable and reserved included) are disallowed */
-    if (is_hostname && !ares__is_hostnamech(c)) {
+    if (is_hostname && !ares_is_hostnamech(c)) {
       status = ARES_EBADRESP;
       goto fail;
     }
@@ -495,7 +494,7 @@ static ares_status_t ares__fetch_dnsname_into_buf(ares__buf_t *buf,
     }
 
     /* Non-printable characters need to be output as \DDD */
-    if (!ares__isprint(c)) {
+    if (!ares_isprint(c)) {
       unsigned char escape[4];
 
       escape[0] = '\\';
@@ -503,7 +502,7 @@ static ares_status_t ares__fetch_dnsname_into_buf(ares__buf_t *buf,
       escape[2] = '0' + ((c % 100) / 10);
       escape[3] = '0' + (c % 10);
 
-      status = ares__buf_append(dest, escape, sizeof(escape));
+      status = ares_buf_append(dest, escape, sizeof(escape));
       if (status != ARES_SUCCESS) {
         goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
       }
@@ -513,39 +512,39 @@ static ares_status_t ares__fetch_dnsname_into_buf(ares__buf_t *buf,
 
     /* Reserved characters need to be escaped, otherwise normal */
     if (is_reservedch(c)) {
-      status = ares__buf_append_byte(dest, '\\');
+      status = ares_buf_append_byte(dest, '\\');
       if (status != ARES_SUCCESS) {
         goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
       }
     }
 
-    status = ares__buf_append_byte(dest, c);
+    status = ares_buf_append_byte(dest, c);
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
   }
 
-  return ares__buf_consume(buf, len);
+  return ares_buf_consume(buf, len);
 
 fail:
   return status;
 }
 
-ares_status_t ares__dns_name_parse(ares__buf_t *buf, char **name,
-                                   ares_bool_t is_hostname)
+ares_status_t ares_dns_name_parse(ares_buf_t *buf, char **name,
+                                  ares_bool_t is_hostname)
 {
   size_t        save_offset = 0;
   unsigned char c;
   ares_status_t status;
-  ares__buf_t  *namebuf     = NULL;
-  size_t        label_start = ares__buf_get_position(buf);
+  ares_buf_t   *namebuf     = NULL;
+  size_t        label_start = ares_buf_get_position(buf);
 
   if (buf == NULL) {
     return ARES_EFORMERR;
   }
 
   if (name != NULL) {
-    namebuf = ares__buf_create();
+    namebuf = ares_buf_create();
     if (namebuf == NULL) {
       status = ARES_ENOMEM;
       goto fail;
@@ -562,11 +561,11 @@ ares_status_t ares__dns_name_parse(ares__buf_t *buf, char **name,
   while (1) {
     /* Keep track of the minimum label starting position to prevent forward
      * jumping */
-    if (label_start > ares__buf_get_position(buf)) {
-      label_start = ares__buf_get_position(buf);
+    if (label_start > ares_buf_get_position(buf)) {
+      label_start = ares_buf_get_position(buf);
     }
 
-    status = ares__buf_fetch_bytes(buf, &c, 1);
+    status = ares_buf_fetch_bytes(buf, &c, 1);
     if (status != ARES_SUCCESS) {
       goto fail;
     }
@@ -590,7 +589,7 @@ ares_status_t ares__dns_name_parse(ares__buf_t *buf, char **name,
       size_t offset = (size_t)((c & 0x3F) << 8);
 
       /* Fetch second byte of the redirect length */
-      status = ares__buf_fetch_bytes(buf, &c, 1);
+      status = ares_buf_fetch_bytes(buf, &c, 1);
       if (status != ARES_SUCCESS) {
         goto fail;
       }
@@ -612,10 +611,10 @@ ares_status_t ares__dns_name_parse(ares__buf_t *buf, char **name,
 
       /* First time we make a jump, save the current position */
       if (save_offset == 0) {
-        save_offset = ares__buf_get_position(buf);
+        save_offset = ares_buf_get_position(buf);
       }
 
-      status = ares__buf_set_position(buf, offset);
+      status = ares_buf_set_position(buf, offset);
       if (status != ARES_SUCCESS) {
         status = ARES_EBADNAME;
         goto fail;
@@ -634,14 +633,14 @@ ares_status_t ares__dns_name_parse(ares__buf_t *buf, char **name,
     /* New label */
 
     /* Labels are separated by periods */
-    if (ares__buf_len(namebuf) != 0 && name != NULL) {
-      status = ares__buf_append_byte(namebuf, '.');
+    if (ares_buf_len(namebuf) != 0 && name != NULL) {
+      status = ares_buf_append_byte(namebuf, '.');
       if (status != ARES_SUCCESS) {
         goto fail; /* LCOV_EXCL_LINE: OutOfMemory */
       }
     }
 
-    status = ares__fetch_dnsname_into_buf(buf, namebuf, c, is_hostname);
+    status = ares_fetch_dnsname_into_buf(buf, namebuf, c, is_hostname);
     if (status != ARES_SUCCESS) {
       goto fail;
     }
@@ -650,11 +649,11 @@ ares_status_t ares__dns_name_parse(ares__buf_t *buf, char **name,
   /* Restore offset read after first redirect/pointer as this is where the DNS
    * message continues */
   if (save_offset) {
-    ares__buf_set_position(buf, save_offset);
+    ares_buf_set_position(buf, save_offset);
   }
 
   if (name != NULL) {
-    *name = ares__buf_finish_str(namebuf, NULL);
+    *name = ares_buf_finish_str(namebuf, NULL);
     if (*name == NULL) {
       status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
       goto fail;            /* LCOV_EXCL_LINE: OutOfMemory */
@@ -669,6 +668,6 @@ ares_status_t ares__dns_name_parse(ares__buf_t *buf, char **name,
     status = ARES_EBADNAME;
   }
 
-  ares__buf_destroy(namebuf);
+  ares_buf_destroy(namebuf);
   return status;
 }
diff --git a/deps/cares/src/lib/record/ares_dns_parse.c b/deps/cares/src/lib/record/ares_dns_parse.c
index 57cb0f714e12e1..0c545d7aa18ada 100644
--- a/deps/cares/src/lib/record/ares_dns_parse.c
+++ b/deps/cares/src/lib/record/ares_dns_parse.c
@@ -29,17 +29,17 @@
 #  include <stdint.h>
 #endif
 
-static size_t ares_dns_rr_remaining_len(const ares__buf_t *buf, size_t orig_len,
+static size_t ares_dns_rr_remaining_len(const ares_buf_t *buf, size_t orig_len,
                                         size_t rdlength)
 {
-  size_t used_len = orig_len - ares__buf_len(buf);
+  size_t used_len = orig_len - ares_buf_len(buf);
   if (used_len >= rdlength) {
     return 0;
   }
   return rdlength - used_len;
 }
 
-static ares_status_t ares_dns_parse_and_set_dns_name(ares__buf_t   *buf,
+static ares_status_t ares_dns_parse_and_set_dns_name(ares_buf_t    *buf,
                                                      ares_bool_t    is_hostname,
                                                      ares_dns_rr_t *rr,
                                                      ares_dns_rr_key_t key)
@@ -47,7 +47,7 @@ static ares_status_t ares_dns_parse_and_set_dns_name(ares__buf_t   *buf,
   ares_status_t status;
   char         *name = NULL;
 
-  status = ares__dns_name_parse(buf, &name, is_hostname);
+  status = ares_dns_name_parse(buf, &name, is_hostname);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -60,7 +60,7 @@ static ares_status_t ares_dns_parse_and_set_dns_name(ares__buf_t   *buf,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_parse_and_set_dns_str(ares__buf_t      *buf,
+static ares_status_t ares_dns_parse_and_set_dns_str(ares_buf_t       *buf,
                                                     size_t            max_len,
                                                     ares_dns_rr_t    *rr,
                                                     ares_dns_rr_key_t key,
@@ -69,7 +69,7 @@ static ares_status_t ares_dns_parse_and_set_dns_str(ares__buf_t      *buf,
   ares_status_t status;
   char         *str = NULL;
 
-  status = ares__buf_parse_dns_str(buf, max_len, &str);
+  status = ares_buf_parse_dns_str(buf, max_len, &str);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -88,34 +88,35 @@ static ares_status_t ares_dns_parse_and_set_dns_str(ares__buf_t      *buf,
 }
 
 static ares_status_t
-  ares_dns_parse_and_set_dns_abin(ares__buf_t *buf, size_t max_len,
+  ares_dns_parse_and_set_dns_abin(ares_buf_t *buf, size_t max_len,
                                   ares_dns_rr_t *rr, ares_dns_rr_key_t key,
                                   ares_bool_t validate_printable)
 {
-  ares_status_t            status;
-  ares__dns_multistring_t *strs = NULL;
+  ares_status_t           status;
+  ares_dns_multistring_t *strs = NULL;
 
-  status = ares__buf_parse_dns_abinstr(buf, max_len, &strs, validate_printable);
+  status =
+    ares_dns_multistring_parse_buf(buf, max_len, &strs, validate_printable);
   if (status != ARES_SUCCESS) {
     return status;
   }
 
   status = ares_dns_rr_set_abin_own(rr, key, strs);
   if (status != ARES_SUCCESS) {
-    ares__dns_multistring_destroy(strs);
+    ares_dns_multistring_destroy(strs);
     return status;
   }
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_parse_and_set_be32(ares__buf_t      *buf,
+static ares_status_t ares_dns_parse_and_set_be32(ares_buf_t       *buf,
                                                  ares_dns_rr_t    *rr,
                                                  ares_dns_rr_key_t key)
 {
   ares_status_t status;
   unsigned int  u32;
 
-  status = ares__buf_fetch_be32(buf, &u32);
+  status = ares_buf_fetch_be32(buf, &u32);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -123,14 +124,14 @@ static ares_status_t ares_dns_parse_and_set_be32(ares__buf_t      *buf,
   return ares_dns_rr_set_u32(rr, key, u32);
 }
 
-static ares_status_t ares_dns_parse_and_set_be16(ares__buf_t      *buf,
+static ares_status_t ares_dns_parse_and_set_be16(ares_buf_t       *buf,
                                                  ares_dns_rr_t    *rr,
                                                  ares_dns_rr_key_t key)
 {
   ares_status_t  status;
   unsigned short u16;
 
-  status = ares__buf_fetch_be16(buf, &u16);
+  status = ares_buf_fetch_be16(buf, &u16);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -138,14 +139,14 @@ static ares_status_t ares_dns_parse_and_set_be16(ares__buf_t      *buf,
   return ares_dns_rr_set_u16(rr, key, u16);
 }
 
-static ares_status_t ares_dns_parse_and_set_u8(ares__buf_t      *buf,
+static ares_status_t ares_dns_parse_and_set_u8(ares_buf_t       *buf,
                                                ares_dns_rr_t    *rr,
                                                ares_dns_rr_key_t key)
 {
   ares_status_t status;
   unsigned char u8;
 
-  status = ares__buf_fetch_bytes(buf, &u8, 1);
+  status = ares_buf_fetch_bytes(buf, &u8, 1);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -153,7 +154,7 @@ static ares_status_t ares_dns_parse_and_set_u8(ares__buf_t      *buf,
   return ares_dns_rr_set_u8(rr, key, u8);
 }
 
-static ares_status_t ares_dns_parse_rr_a(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_a(ares_buf_t *buf, ares_dns_rr_t *rr,
                                          size_t rdlength)
 {
   struct in_addr addr;
@@ -161,7 +162,7 @@ static ares_status_t ares_dns_parse_rr_a(ares__buf_t *buf, ares_dns_rr_t *rr,
 
   (void)rdlength; /* Not needed */
 
-  status = ares__buf_fetch_bytes(buf, (unsigned char *)&addr, sizeof(addr));
+  status = ares_buf_fetch_bytes(buf, (unsigned char *)&addr, sizeof(addr));
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -169,7 +170,7 @@ static ares_status_t ares_dns_parse_rr_a(ares__buf_t *buf, ares_dns_rr_t *rr,
   return ares_dns_rr_set_addr(rr, ARES_RR_A_ADDR, &addr);
 }
 
-static ares_status_t ares_dns_parse_rr_ns(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_ns(ares_buf_t *buf, ares_dns_rr_t *rr,
                                           size_t rdlength)
 {
   (void)rdlength; /* Not needed */
@@ -178,8 +179,8 @@ static ares_status_t ares_dns_parse_rr_ns(ares__buf_t *buf, ares_dns_rr_t *rr,
                                          ARES_RR_NS_NSDNAME);
 }
 
-static ares_status_t ares_dns_parse_rr_cname(ares__buf_t   *buf,
-                                             ares_dns_rr_t *rr, size_t rdlength)
+static ares_status_t ares_dns_parse_rr_cname(ares_buf_t *buf, ares_dns_rr_t *rr,
+                                             size_t rdlength)
 {
   (void)rdlength; /* Not needed */
 
@@ -187,7 +188,7 @@ static ares_status_t ares_dns_parse_rr_cname(ares__buf_t   *buf,
                                          ARES_RR_CNAME_CNAME);
 }
 
-static ares_status_t ares_dns_parse_rr_soa(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_soa(ares_buf_t *buf, ares_dns_rr_t *rr,
                                            size_t rdlength)
 {
   ares_status_t status;
@@ -236,7 +237,7 @@ static ares_status_t ares_dns_parse_rr_soa(ares__buf_t *buf, ares_dns_rr_t *rr,
   return ares_dns_parse_and_set_be32(buf, rr, ARES_RR_SOA_MINIMUM);
 }
 
-static ares_status_t ares_dns_parse_rr_ptr(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_ptr(ares_buf_t *buf, ares_dns_rr_t *rr,
                                            size_t rdlength)
 {
   (void)rdlength; /* Not needed */
@@ -245,11 +246,11 @@ static ares_status_t ares_dns_parse_rr_ptr(ares__buf_t *buf, ares_dns_rr_t *rr,
                                          ARES_RR_PTR_DNAME);
 }
 
-static ares_status_t ares_dns_parse_rr_hinfo(ares__buf_t   *buf,
-                                             ares_dns_rr_t *rr, size_t rdlength)
+static ares_status_t ares_dns_parse_rr_hinfo(ares_buf_t *buf, ares_dns_rr_t *rr,
+                                             size_t rdlength)
 {
   ares_status_t status;
-  size_t        orig_len = ares__buf_len(buf);
+  size_t        orig_len = ares_buf_len(buf);
 
   (void)rdlength; /* Not needed */
 
@@ -269,7 +270,7 @@ static ares_status_t ares_dns_parse_rr_hinfo(ares__buf_t   *buf,
   return status;
 }
 
-static ares_status_t ares_dns_parse_rr_mx(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_mx(ares_buf_t *buf, ares_dns_rr_t *rr,
                                           size_t rdlength)
 {
   ares_status_t status;
@@ -287,18 +288,18 @@ static ares_status_t ares_dns_parse_rr_mx(ares__buf_t *buf, ares_dns_rr_t *rr,
                                          ARES_RR_MX_EXCHANGE);
 }
 
-static ares_status_t ares_dns_parse_rr_txt(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_txt(ares_buf_t *buf, ares_dns_rr_t *rr,
                                            size_t rdlength)
 {
   return ares_dns_parse_and_set_dns_abin(buf, rdlength, rr, ARES_RR_TXT_DATA,
                                          ARES_FALSE);
 }
 
-static ares_status_t ares_dns_parse_rr_sig(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_sig(ares_buf_t *buf, ares_dns_rr_t *rr,
                                            size_t rdlength)
 {
   ares_status_t  status;
-  size_t         orig_len = ares__buf_len(buf);
+  size_t         orig_len = ares_buf_len(buf);
   size_t         len;
   unsigned char *data;
 
@@ -348,7 +349,7 @@ static ares_status_t ares_dns_parse_rr_sig(ares__buf_t *buf, ares_dns_rr_t *rr,
     return ARES_EBADRESP;
   }
 
-  status = ares__buf_fetch_bytes_dup(buf, len, ARES_FALSE, &data);
+  status = ares_buf_fetch_bytes_dup(buf, len, ARES_FALSE, &data);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -362,7 +363,7 @@ static ares_status_t ares_dns_parse_rr_sig(ares__buf_t *buf, ares_dns_rr_t *rr,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_parse_rr_aaaa(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_aaaa(ares_buf_t *buf, ares_dns_rr_t *rr,
                                             size_t rdlength)
 {
   struct ares_in6_addr addr;
@@ -370,7 +371,7 @@ static ares_status_t ares_dns_parse_rr_aaaa(ares__buf_t *buf, ares_dns_rr_t *rr,
 
   (void)rdlength; /* Not needed */
 
-  status = ares__buf_fetch_bytes(buf, (unsigned char *)&addr, sizeof(addr));
+  status = ares_buf_fetch_bytes(buf, (unsigned char *)&addr, sizeof(addr));
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -378,7 +379,7 @@ static ares_status_t ares_dns_parse_rr_aaaa(ares__buf_t *buf, ares_dns_rr_t *rr,
   return ares_dns_rr_set_addr6(rr, ARES_RR_AAAA_ADDR, &addr);
 }
 
-static ares_status_t ares_dns_parse_rr_srv(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_srv(ares_buf_t *buf, ares_dns_rr_t *rr,
                                            size_t rdlength)
 {
   ares_status_t status;
@@ -408,11 +409,11 @@ static ares_status_t ares_dns_parse_rr_srv(ares__buf_t *buf, ares_dns_rr_t *rr,
                                          ARES_RR_SRV_TARGET);
 }
 
-static ares_status_t ares_dns_parse_rr_naptr(ares__buf_t   *buf,
-                                             ares_dns_rr_t *rr, size_t rdlength)
+static ares_status_t ares_dns_parse_rr_naptr(ares_buf_t *buf, ares_dns_rr_t *rr,
+                                             size_t rdlength)
 {
   ares_status_t status;
-  size_t        orig_len = ares__buf_len(buf);
+  size_t        orig_len = ares_buf_len(buf);
 
   /* ORDER */
   status = ares_dns_parse_and_set_be16(buf, rr, ARES_RR_NAPTR_ORDER);
@@ -455,13 +456,13 @@ static ares_status_t ares_dns_parse_rr_naptr(ares__buf_t   *buf,
                                          ARES_RR_NAPTR_REPLACEMENT);
 }
 
-static ares_status_t ares_dns_parse_rr_opt(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_opt(ares_buf_t *buf, ares_dns_rr_t *rr,
                                            size_t         rdlength,
                                            unsigned short raw_class,
                                            unsigned int   raw_ttl)
 {
   ares_status_t  status;
-  size_t         orig_len = ares__buf_len(buf);
+  size_t         orig_len = ares_buf_len(buf);
   unsigned short rcode_high;
 
   status = ares_dns_rr_set_u16(rr, ARES_RR_OPT_UDP_SIZE, raw_class);
@@ -493,19 +494,19 @@ static ares_status_t ares_dns_parse_rr_opt(ares__buf_t *buf, ares_dns_rr_t *rr,
     unsigned char *val = NULL;
 
     /* Fetch be16 option */
-    status = ares__buf_fetch_be16(buf, &opt);
+    status = ares_buf_fetch_be16(buf, &opt);
     if (status != ARES_SUCCESS) {
       return status;
     }
 
     /* Fetch be16 length */
-    status = ares__buf_fetch_be16(buf, &len);
+    status = ares_buf_fetch_be16(buf, &len);
     if (status != ARES_SUCCESS) {
       return status;
     }
 
     if (len) {
-      status = ares__buf_fetch_bytes_dup(buf, len, ARES_TRUE, &val);
+      status = ares_buf_fetch_bytes_dup(buf, len, ARES_TRUE, &val);
       if (status != ARES_SUCCESS) {
         return status;
       }
@@ -520,11 +521,11 @@ static ares_status_t ares_dns_parse_rr_opt(ares__buf_t *buf, ares_dns_rr_t *rr,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_parse_rr_tlsa(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_tlsa(ares_buf_t *buf, ares_dns_rr_t *rr,
                                             size_t rdlength)
 {
   ares_status_t  status;
-  size_t         orig_len = ares__buf_len(buf);
+  size_t         orig_len = ares_buf_len(buf);
   size_t         len;
   unsigned char *data;
 
@@ -548,7 +549,7 @@ static ares_status_t ares_dns_parse_rr_tlsa(ares__buf_t *buf, ares_dns_rr_t *rr,
     return ARES_EBADRESP;
   }
 
-  status = ares__buf_fetch_bytes_dup(buf, len, ARES_FALSE, &data);
+  status = ares_buf_fetch_bytes_dup(buf, len, ARES_FALSE, &data);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -562,11 +563,11 @@ static ares_status_t ares_dns_parse_rr_tlsa(ares__buf_t *buf, ares_dns_rr_t *rr,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_parse_rr_svcb(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_svcb(ares_buf_t *buf, ares_dns_rr_t *rr,
                                             size_t rdlength)
 {
   ares_status_t status;
-  size_t        orig_len = ares__buf_len(buf);
+  size_t        orig_len = ares_buf_len(buf);
 
   status = ares_dns_parse_and_set_be16(buf, rr, ARES_RR_SVCB_PRIORITY);
   if (status != ARES_SUCCESS) {
@@ -586,19 +587,19 @@ static ares_status_t ares_dns_parse_rr_svcb(ares__buf_t *buf, ares_dns_rr_t *rr,
     unsigned char *val = NULL;
 
     /* Fetch be16 option */
-    status = ares__buf_fetch_be16(buf, &opt);
+    status = ares_buf_fetch_be16(buf, &opt);
     if (status != ARES_SUCCESS) {
       return status;
     }
 
     /* Fetch be16 length */
-    status = ares__buf_fetch_be16(buf, &len);
+    status = ares_buf_fetch_be16(buf, &len);
     if (status != ARES_SUCCESS) {
       return status;
     }
 
     if (len) {
-      status = ares__buf_fetch_bytes_dup(buf, len, ARES_TRUE, &val);
+      status = ares_buf_fetch_bytes_dup(buf, len, ARES_TRUE, &val);
       if (status != ARES_SUCCESS) {
         return status;
       }
@@ -613,11 +614,11 @@ static ares_status_t ares_dns_parse_rr_svcb(ares__buf_t *buf, ares_dns_rr_t *rr,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_parse_rr_https(ares__buf_t   *buf,
-                                             ares_dns_rr_t *rr, size_t rdlength)
+static ares_status_t ares_dns_parse_rr_https(ares_buf_t *buf, ares_dns_rr_t *rr,
+                                             size_t rdlength)
 {
   ares_status_t status;
-  size_t        orig_len = ares__buf_len(buf);
+  size_t        orig_len = ares_buf_len(buf);
 
   status = ares_dns_parse_and_set_be16(buf, rr, ARES_RR_HTTPS_PRIORITY);
   if (status != ARES_SUCCESS) {
@@ -637,19 +638,19 @@ static ares_status_t ares_dns_parse_rr_https(ares__buf_t   *buf,
     unsigned char *val = NULL;
 
     /* Fetch be16 option */
-    status = ares__buf_fetch_be16(buf, &opt);
+    status = ares_buf_fetch_be16(buf, &opt);
     if (status != ARES_SUCCESS) {
       return status;
     }
 
     /* Fetch be16 length */
-    status = ares__buf_fetch_be16(buf, &len);
+    status = ares_buf_fetch_be16(buf, &len);
     if (status != ARES_SUCCESS) {
       return status;
     }
 
     if (len) {
-      status = ares__buf_fetch_bytes_dup(buf, len, ARES_TRUE, &val);
+      status = ares_buf_fetch_bytes_dup(buf, len, ARES_TRUE, &val);
       if (status != ARES_SUCCESS) {
         return status;
       }
@@ -664,12 +665,12 @@ static ares_status_t ares_dns_parse_rr_https(ares__buf_t   *buf,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_parse_rr_uri(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_uri(ares_buf_t *buf, ares_dns_rr_t *rr,
                                            size_t rdlength)
 {
   char         *name = NULL;
   ares_status_t status;
-  size_t        orig_len = ares__buf_len(buf);
+  size_t        orig_len = ares_buf_len(buf);
   size_t        remaining_len;
 
   /* PRIORITY */
@@ -693,12 +694,12 @@ static ares_status_t ares_dns_parse_rr_uri(ares__buf_t *buf, ares_dns_rr_t *rr,
   }
 
   /* NOTE: Not in DNS string format */
-  status = ares__buf_fetch_str_dup(buf, remaining_len, &name);
+  status = ares_buf_fetch_str_dup(buf, remaining_len, &name);
   if (status != ARES_SUCCESS) {
     return status;
   }
 
-  if (!ares__str_isprint(name, remaining_len)) {
+  if (!ares_str_isprint(name, remaining_len)) {
     ares_free(name);
     return ARES_EBADRESP;
   }
@@ -713,13 +714,13 @@ static ares_status_t ares_dns_parse_rr_uri(ares__buf_t *buf, ares_dns_rr_t *rr,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_parse_rr_caa(ares__buf_t *buf, ares_dns_rr_t *rr,
+static ares_status_t ares_dns_parse_rr_caa(ares_buf_t *buf, ares_dns_rr_t *rr,
                                            size_t rdlength)
 {
   unsigned char *data     = NULL;
   size_t         data_len = 0;
   ares_status_t  status;
-  size_t         orig_len = ares__buf_len(buf);
+  size_t         orig_len = ares_buf_len(buf);
 
   /* CRITICAL */
   status = ares_dns_parse_and_set_u8(buf, rr, ARES_RR_CAA_CRITICAL);
@@ -741,7 +742,7 @@ static ares_status_t ares_dns_parse_rr_caa(ares__buf_t *buf, ares_dns_rr_t *rr,
     status = ARES_EBADRESP;
     return status;
   }
-  status = ares__buf_fetch_bytes_dup(buf, data_len, ARES_TRUE, &data);
+  status = ares_buf_fetch_bytes_dup(buf, data_len, ARES_TRUE, &data);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -756,7 +757,7 @@ static ares_status_t ares_dns_parse_rr_caa(ares__buf_t *buf, ares_dns_rr_t *rr,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_parse_rr_raw_rr(ares__buf_t   *buf,
+static ares_status_t ares_dns_parse_rr_raw_rr(ares_buf_t    *buf,
                                               ares_dns_rr_t *rr,
                                               size_t         rdlength,
                                               unsigned short raw_type)
@@ -768,7 +769,7 @@ static ares_status_t ares_dns_parse_rr_raw_rr(ares__buf_t   *buf,
     return ARES_SUCCESS;
   }
 
-  status = ares__buf_fetch_bytes_dup(buf, rdlength, ARES_FALSE, &bytes);
+  status = ares_buf_fetch_bytes_dup(buf, rdlength, ARES_FALSE, &bytes);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -789,7 +790,7 @@ static ares_status_t ares_dns_parse_rr_raw_rr(ares__buf_t   *buf,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_parse_header(ares__buf_t *buf, unsigned int flags,
+static ares_status_t ares_dns_parse_header(ares_buf_t *buf, unsigned int flags,
                                            ares_dns_record_t **dnsrec,
                                            unsigned short     *qdcount,
                                            unsigned short     *ancount,
@@ -833,13 +834,13 @@ static ares_status_t ares_dns_parse_header(ares__buf_t *buf, unsigned int flags,
    */
 
   /* ID */
-  status = ares__buf_fetch_be16(buf, &id);
+  status = ares_buf_fetch_be16(buf, &id);
   if (status != ARES_SUCCESS) {
     goto fail;
   }
 
   /* Flags */
-  status = ares__buf_fetch_be16(buf, &u16);
+  status = ares_buf_fetch_be16(buf, &u16);
   if (status != ARES_SUCCESS) {
     goto fail;
   }
@@ -888,25 +889,25 @@ static ares_status_t ares_dns_parse_header(ares__buf_t *buf, unsigned int flags,
   rcode = u16 & 0xf;
 
   /* QDCOUNT */
-  status = ares__buf_fetch_be16(buf, qdcount);
+  status = ares_buf_fetch_be16(buf, qdcount);
   if (status != ARES_SUCCESS) {
     goto fail;
   }
 
   /* ANCOUNT */
-  status = ares__buf_fetch_be16(buf, ancount);
+  status = ares_buf_fetch_be16(buf, ancount);
   if (status != ARES_SUCCESS) {
     goto fail;
   }
 
   /* NSCOUNT */
-  status = ares__buf_fetch_be16(buf, nscount);
+  status = ares_buf_fetch_be16(buf, nscount);
   if (status != ARES_SUCCESS) {
     goto fail;
   }
 
   /* ARCOUNT */
-  status = ares__buf_fetch_be16(buf, arcount);
+  status = ares_buf_fetch_be16(buf, arcount);
   if (status != ARES_SUCCESS) {
     goto fail;
   }
@@ -957,7 +958,7 @@ static ares_status_t ares_dns_parse_header(ares__buf_t *buf, unsigned int flags,
 }
 
 static ares_status_t
-  ares_dns_parse_rr_data(ares__buf_t *buf, size_t rdlength, ares_dns_rr_t *rr,
+  ares_dns_parse_rr_data(ares_buf_t *buf, size_t rdlength, ares_dns_rr_t *rr,
                          ares_dns_rec_type_t type, unsigned short raw_type,
                          unsigned short raw_class, unsigned int raw_ttl)
 {
@@ -1006,7 +1007,7 @@ static ares_status_t
   return ARES_EFORMERR;
 }
 
-static ares_status_t ares_dns_parse_qd(ares__buf_t       *buf,
+static ares_status_t ares_dns_parse_qd(ares_buf_t        *buf,
                                        ares_dns_record_t *dnsrec)
 {
   char               *name = NULL;
@@ -1031,20 +1032,20 @@ static ares_status_t ares_dns_parse_qd(ares__buf_t       *buf,
    */
 
   /* Name */
-  status = ares__dns_name_parse(buf, &name, ARES_FALSE);
+  status = ares_dns_name_parse(buf, &name, ARES_FALSE);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
   /* Type */
-  status = ares__buf_fetch_be16(buf, &u16);
+  status = ares_buf_fetch_be16(buf, &u16);
   if (status != ARES_SUCCESS) {
     goto done;
   }
   type = u16;
 
   /* Class */
-  status = ares__buf_fetch_be16(buf, &u16);
+  status = ares_buf_fetch_be16(buf, &u16);
   if (status != ARES_SUCCESS) {
     goto done;
   }
@@ -1061,7 +1062,7 @@ static ares_status_t ares_dns_parse_qd(ares__buf_t       *buf,
   return status;
 }
 
-static ares_status_t ares_dns_parse_rr(ares__buf_t *buf, unsigned int flags,
+static ares_status_t ares_dns_parse_rr(ares_buf_t *buf, unsigned int flags,
                                        ares_dns_section_t sect,
                                        ares_dns_record_t *dnsrec)
 {
@@ -1102,13 +1103,13 @@ static ares_status_t ares_dns_parse_rr(ares__buf_t *buf, unsigned int flags,
    */
 
   /* Name */
-  status = ares__dns_name_parse(buf, &name, ARES_FALSE);
+  status = ares_dns_name_parse(buf, &name, ARES_FALSE);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
   /* Type */
-  status = ares__buf_fetch_be16(buf, &u16);
+  status = ares_buf_fetch_be16(buf, &u16);
   if (status != ARES_SUCCESS) {
     goto done;
   }
@@ -1116,20 +1117,20 @@ static ares_status_t ares_dns_parse_rr(ares__buf_t *buf, unsigned int flags,
   raw_type = u16; /* Only used for raw rr data */
 
   /* Class */
-  status = ares__buf_fetch_be16(buf, &u16);
+  status = ares_buf_fetch_be16(buf, &u16);
   if (status != ARES_SUCCESS) {
     goto done;
   }
   qclass = u16;
 
   /* TTL */
-  status = ares__buf_fetch_be32(buf, &ttl);
+  status = ares_buf_fetch_be32(buf, &ttl);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
   /* Length */
-  status = ares__buf_fetch_be16(buf, &u16);
+  status = ares_buf_fetch_be16(buf, &u16);
   if (status != ARES_SUCCESS) {
     goto done;
   }
@@ -1139,7 +1140,7 @@ static ares_status_t ares_dns_parse_rr(ares__buf_t *buf, unsigned int flags,
     type = ARES_REC_TYPE_RAW_RR;
   }
 
-  namecomp = ares_dns_rec_type_allow_name_compression(type);
+  namecomp = ares_dns_rec_allow_name_comp(type);
   if (sect == ARES_SECTION_ANSWER &&
       (flags &
        (namecomp ? ARES_DNS_PARSE_AN_BASE_RAW : ARES_DNS_PARSE_AN_EXT_RAW))) {
@@ -1157,7 +1158,7 @@ static ares_status_t ares_dns_parse_rr(ares__buf_t *buf, unsigned int flags,
   }
 
   /* Pull into another buffer for safety */
-  if (rdlength > ares__buf_len(buf)) {
+  if (rdlength > ares_buf_len(buf)) {
     status = ARES_EBADRESP;
     goto done;
   }
@@ -1173,7 +1174,7 @@ static ares_status_t ares_dns_parse_rr(ares__buf_t *buf, unsigned int flags,
 
   /* Record the current remaining length in the buffer so we can tell how
    * much was processed */
-  remaining_len = ares__buf_len(buf);
+  remaining_len = ares_buf_len(buf);
 
   /* Fill in the data for the rr */
   status = ares_dns_parse_rr_data(buf, rdlength, rr, type, raw_type,
@@ -1183,7 +1184,7 @@ static ares_status_t ares_dns_parse_rr(ares__buf_t *buf, unsigned int flags,
   }
 
   /* Determine how many bytes were processed */
-  processed_len = remaining_len - ares__buf_len(buf);
+  processed_len = remaining_len - ares_buf_len(buf);
 
   /* If too many bytes were processed, error! */
   if (processed_len > rdlength) {
@@ -1194,7 +1195,7 @@ static ares_status_t ares_dns_parse_rr(ares__buf_t *buf, unsigned int flags,
   /* If too few bytes were processed, consume the unprocessed data for this
    * record as the parser may not have wanted/needed to use it */
   if (processed_len < rdlength) {
-    ares__buf_consume(buf, rdlength - processed_len);
+    ares_buf_consume(buf, rdlength - processed_len);
   }
 
 
@@ -1203,7 +1204,7 @@ static ares_status_t ares_dns_parse_rr(ares__buf_t *buf, unsigned int flags,
   return status;
 }
 
-static ares_status_t ares_dns_parse_buf(ares__buf_t *buf, unsigned int flags,
+static ares_status_t ares_dns_parse_buf(ares_buf_t *buf, unsigned int flags,
                                         ares_dns_record_t **dnsrec)
 {
   ares_status_t  status;
@@ -1218,7 +1219,7 @@ static ares_status_t ares_dns_parse_buf(ares__buf_t *buf, unsigned int flags,
   }
 
   /* Maximum DNS packet size is 64k, even over TCP */
-  if (ares__buf_len(buf) > 0xFFFF) {
+  if (ares_buf_len(buf) > 0xFFFF) {
     return ARES_EFORMERR;
   }
 
@@ -1309,20 +1310,20 @@ static ares_status_t ares_dns_parse_buf(ares__buf_t *buf, unsigned int flags,
 ares_status_t ares_dns_parse(const unsigned char *buf, size_t buf_len,
                              unsigned int flags, ares_dns_record_t **dnsrec)
 {
-  ares__buf_t  *parser = NULL;
+  ares_buf_t   *parser = NULL;
   ares_status_t status;
 
   if (buf == NULL || buf_len == 0 || dnsrec == NULL) {
     return ARES_EFORMERR;
   }
 
-  parser = ares__buf_create_const(buf, buf_len);
+  parser = ares_buf_create_const(buf, buf_len);
   if (parser == NULL) {
     return ARES_ENOMEM;
   }
 
   status = ares_dns_parse_buf(parser, flags, dnsrec);
-  ares__buf_destroy(parser);
+  ares_buf_destroy(parser);
 
   return status;
 }
diff --git a/deps/cares/src/lib/record/ares_dns_private.h b/deps/cares/src/lib/record/ares_dns_private.h
index 5b86fed51f9851..e8fd600d1d2008 100644
--- a/deps/cares/src/lib/record/ares_dns_private.h
+++ b/deps/cares/src/lib/record/ares_dns_private.h
@@ -26,26 +26,26 @@
 #ifndef __ARES_DNS_PRIVATE_H
 #define __ARES_DNS_PRIVATE_H
 
-ares_status_t ares_dns_record_duplicate_ex(ares_dns_record_t      **dest,
-                                           const ares_dns_record_t *src);
-ares_bool_t ares_dns_rec_type_allow_name_compression(ares_dns_rec_type_t type);
-ares_bool_t ares_dns_opcode_isvalid(ares_dns_opcode_t opcode);
-ares_bool_t ares_dns_rcode_isvalid(ares_dns_rcode_t rcode);
-ares_bool_t ares_dns_flags_arevalid(unsigned short flags);
-ares_bool_t ares_dns_rec_type_isvalid(ares_dns_rec_type_t type,
-                                      ares_bool_t         is_query);
-ares_bool_t ares_dns_class_isvalid(ares_dns_class_t    qclass,
-                                   ares_dns_rec_type_t type,
-                                   ares_bool_t         is_query);
-ares_bool_t ares_dns_section_isvalid(ares_dns_section_t sect);
+ares_status_t        ares_dns_record_duplicate_ex(ares_dns_record_t      **dest,
+                                                  const ares_dns_record_t *src);
+ares_bool_t          ares_dns_rec_allow_name_comp(ares_dns_rec_type_t type);
+ares_bool_t          ares_dns_opcode_isvalid(ares_dns_opcode_t opcode);
+ares_bool_t          ares_dns_rcode_isvalid(ares_dns_rcode_t rcode);
+ares_bool_t          ares_dns_flags_arevalid(unsigned short flags);
+ares_bool_t          ares_dns_rec_type_isvalid(ares_dns_rec_type_t type,
+                                               ares_bool_t         is_query);
+ares_bool_t          ares_dns_class_isvalid(ares_dns_class_t    qclass,
+                                            ares_dns_rec_type_t type,
+                                            ares_bool_t         is_query);
+ares_bool_t          ares_dns_section_isvalid(ares_dns_section_t sect);
 ares_status_t        ares_dns_rr_set_str_own(ares_dns_rr_t    *dns_rr,
                                              ares_dns_rr_key_t key, char *val);
 ares_status_t        ares_dns_rr_set_bin_own(ares_dns_rr_t    *dns_rr,
                                              ares_dns_rr_key_t key, unsigned char *val,
                                              size_t len);
-ares_status_t        ares_dns_rr_set_abin_own(ares_dns_rr_t           *dns_rr,
-                                              ares_dns_rr_key_t        key,
-                                              ares__dns_multistring_t *strs);
+ares_status_t        ares_dns_rr_set_abin_own(ares_dns_rr_t          *dns_rr,
+                                              ares_dns_rr_key_t       key,
+                                              ares_dns_multistring_t *strs);
 ares_status_t        ares_dns_rr_set_opt_own(ares_dns_rr_t    *dns_rr,
                                              ares_dns_rr_key_t key, unsigned short opt,
                                              unsigned char *val, size_t val_len);
@@ -53,16 +53,16 @@ ares_status_t        ares_dns_record_rr_prealloc(ares_dns_record_t *dnsrec,
                                                  ares_dns_section_t sect, size_t cnt);
 ares_dns_rr_t       *ares_dns_get_opt_rr(ares_dns_record_t *rec);
 const ares_dns_rr_t *ares_dns_get_opt_rr_const(const ares_dns_record_t *rec);
-void          ares_dns_record_write_ttl_decrement(ares_dns_record_t *dnsrec,
-                                                  unsigned int       ttl_decrement);
+void                 ares_dns_record_ttl_decrement(ares_dns_record_t *dnsrec,
+                                                   unsigned int       ttl_decrement);
 
 /* Same as ares_dns_write() but appends to an existing buffer object */
-ares_status_t ares_dns_write_buf(const ares_dns_record_t *dnsrec,
-                                 ares__buf_t             *buf);
+ares_status_t        ares_dns_write_buf(const ares_dns_record_t *dnsrec,
+                                        ares_buf_t              *buf);
 
 /* Same as ares_dns_write_buf(), but prepends a 16bit length */
-ares_status_t ares_dns_write_buf_tcp(const ares_dns_record_t *dnsrec,
-                                     ares__buf_t             *buf);
+ares_status_t        ares_dns_write_buf_tcp(const ares_dns_record_t *dnsrec,
+                                            ares_buf_t              *buf);
 
 /*! Create a DNS record object for a query. The arguments are the same as
  *  those for ares_create_query().
@@ -99,15 +99,15 @@ struct ares_dns_qd {
 
 typedef struct {
   struct in_addr addr;
-} ares__dns_a_t;
+} ares_dns_a_t;
 
 typedef struct {
   char *nsdname;
-} ares__dns_ns_t;
+} ares_dns_ns_t;
 
 typedef struct {
   char *cname;
-} ares__dns_cname_t;
+} ares_dns_cname_t;
 
 typedef struct {
   char        *mname;
@@ -117,25 +117,25 @@ typedef struct {
   unsigned int retry;
   unsigned int expire;
   unsigned int minimum;
-} ares__dns_soa_t;
+} ares_dns_soa_t;
 
 typedef struct {
   char *dname;
-} ares__dns_ptr_t;
+} ares_dns_ptr_t;
 
 typedef struct {
   char *cpu;
   char *os;
-} ares__dns_hinfo_t;
+} ares_dns_hinfo_t;
 
 typedef struct {
   unsigned short preference;
   char          *exchange;
-} ares__dns_mx_t;
+} ares_dns_mx_t;
 
 typedef struct {
-  ares__dns_multistring_t *strs;
-} ares__dns_txt_t;
+  ares_dns_multistring_t *strs;
+} ares_dns_txt_t;
 
 typedef struct {
   unsigned short type_covered;
@@ -148,18 +148,18 @@ typedef struct {
   char          *signers_name;
   unsigned char *signature;
   size_t         signature_len;
-} ares__dns_sig_t;
+} ares_dns_sig_t;
 
 typedef struct {
   struct ares_in6_addr addr;
-} ares__dns_aaaa_t;
+} ares_dns_aaaa_t;
 
 typedef struct {
   unsigned short priority;
   unsigned short weight;
   unsigned short port;
   char          *target;
-} ares__dns_srv_t;
+} ares_dns_srv_t;
 
 typedef struct {
   unsigned short order;
@@ -168,21 +168,21 @@ typedef struct {
   char          *services;
   char          *regexp;
   char          *replacement;
-} ares__dns_naptr_t;
+} ares_dns_naptr_t;
 
 typedef struct {
   unsigned short opt;
   unsigned char *val;
   size_t         val_len;
-} ares__dns_optval_t;
+} ares_dns_optval_t;
 
 typedef struct {
   unsigned short udp_size; /*!< taken from class */
   unsigned char  version;  /*!< taken from bits 8-16 of ttl */
   unsigned short flags;    /*!< Flags, remaining 16 bits, though only
                             *   1 currently defined */
-  ares__array_t *options;  /*!< Type is ares__dns_optval_t */
-} ares__dns_opt_t;
+  ares_array_t  *options;  /*!< Type is ares_dns_optval_t */
+} ares_dns_opt_t;
 
 typedef struct {
   unsigned char  cert_usage;
@@ -190,26 +190,26 @@ typedef struct {
   unsigned char  match;
   unsigned char *data;
   size_t         data_len;
-} ares__dns_tlsa_t;
+} ares_dns_tlsa_t;
 
 typedef struct {
   unsigned short priority;
   char          *target;
-  ares__array_t *params; /*!< Type is ares__dns_optval_t */
-} ares__dns_svcb_t;
+  ares_array_t  *params; /*!< Type is ares_dns_optval_t */
+} ares_dns_svcb_t;
 
 typedef struct {
   unsigned short priority;
   unsigned short weight;
   char          *target;
-} ares__dns_uri_t;
+} ares_dns_uri_t;
 
 typedef struct {
   unsigned char  critical;
   char          *tag;
   unsigned char *value;
   size_t         value_len;
-} ares__dns_caa_t;
+} ares_dns_caa_t;
 
 /*! Raw, unparsed RR data */
 typedef struct {
@@ -217,7 +217,7 @@ typedef struct {
                           *   of those values since it wasn't parsed */
   unsigned char *data;   /*!< Raw RR data */
   size_t         length; /*!< Length of raw RR data */
-} ares__dns_raw_rr_t;
+} ares_dns_raw_rr_t;
 
 /*! DNS RR data structure */
 struct ares_dns_rr {
@@ -228,25 +228,25 @@ struct ares_dns_rr {
   unsigned int        ttl;
 
   union {
-    ares__dns_a_t      a;
-    ares__dns_ns_t     ns;
-    ares__dns_cname_t  cname;
-    ares__dns_soa_t    soa;
-    ares__dns_ptr_t    ptr;
-    ares__dns_hinfo_t  hinfo;
-    ares__dns_mx_t     mx;
-    ares__dns_txt_t    txt;
-    ares__dns_sig_t    sig;
-    ares__dns_aaaa_t   aaaa;
-    ares__dns_srv_t    srv;
-    ares__dns_naptr_t  naptr;
-    ares__dns_opt_t    opt;
-    ares__dns_tlsa_t   tlsa;
-    ares__dns_svcb_t   svcb;
-    ares__dns_svcb_t   https; /*!< https is a type of svcb, so this is right */
-    ares__dns_uri_t    uri;
-    ares__dns_caa_t    caa;
-    ares__dns_raw_rr_t raw_rr;
+    ares_dns_a_t      a;
+    ares_dns_ns_t     ns;
+    ares_dns_cname_t  cname;
+    ares_dns_soa_t    soa;
+    ares_dns_ptr_t    ptr;
+    ares_dns_hinfo_t  hinfo;
+    ares_dns_mx_t     mx;
+    ares_dns_txt_t    txt;
+    ares_dns_sig_t    sig;
+    ares_dns_aaaa_t   aaaa;
+    ares_dns_srv_t    srv;
+    ares_dns_naptr_t  naptr;
+    ares_dns_opt_t    opt;
+    ares_dns_tlsa_t   tlsa;
+    ares_dns_svcb_t   svcb;
+    ares_dns_svcb_t   https; /*!< https is a type of svcb, so this is right */
+    ares_dns_uri_t    uri;
+    ares_dns_caa_t    caa;
+    ares_dns_raw_rr_t raw_rr;
   } r;
 };
 
@@ -264,10 +264,10 @@ struct ares_dns_record {
                                     *   the ttl of any resource records by
                                     *   this amount.  Used for cache */
 
-  ares__array_t    *qd;            /*!< Type is ares_dns_qd_t */
-  ares__array_t    *an;            /*!< Type is ares_dns_rr_t */
-  ares__array_t    *ns;            /*!< Type is ares_dns_rr_t */
-  ares__array_t    *ar;            /*!< Type is ares_dns_rr_t */
+  ares_array_t     *qd;            /*!< Type is ares_dns_qd_t */
+  ares_array_t     *an;            /*!< Type is ares_dns_rr_t */
+  ares_array_t     *ns;            /*!< Type is ares_dns_rr_t */
+  ares_array_t     *ar;            /*!< Type is ares_dns_rr_t */
 };
 
 #endif
diff --git a/deps/cares/src/lib/record/ares_dns_record.c b/deps/cares/src/lib/record/ares_dns_record.c
index 147049490943a6..ec0dfbd13c49f3 100644
--- a/deps/cares/src/lib/record/ares_dns_record.c
+++ b/deps/cares/src/lib/record/ares_dns_record.c
@@ -29,7 +29,7 @@
 #  include <stdint.h>
 #endif
 
-static void ares__dns_rr_free(ares_dns_rr_t *rr);
+static void ares_dns_rr_free(ares_dns_rr_t *rr);
 
 static void ares_dns_qd_free_cb(void *arg)
 {
@@ -46,7 +46,7 @@ static void ares_dns_rr_free_cb(void *arg)
   if (rr == NULL) {
     return;
   }
-  ares__dns_rr_free(rr);
+  ares_dns_rr_free(rr);
 }
 
 ares_status_t ares_dns_record_create(ares_dns_record_t **dnsrec,
@@ -74,14 +74,10 @@ ares_status_t ares_dns_record_create(ares_dns_record_t **dnsrec,
   (*dnsrec)->flags  = flags;
   (*dnsrec)->opcode = opcode;
   (*dnsrec)->rcode  = rcode;
-  (*dnsrec)->qd =
-    ares__array_create(sizeof(ares_dns_qd_t), ares_dns_qd_free_cb);
-  (*dnsrec)->an =
-    ares__array_create(sizeof(ares_dns_rr_t), ares_dns_rr_free_cb);
-  (*dnsrec)->ns =
-    ares__array_create(sizeof(ares_dns_rr_t), ares_dns_rr_free_cb);
-  (*dnsrec)->ar =
-    ares__array_create(sizeof(ares_dns_rr_t), ares_dns_rr_free_cb);
+  (*dnsrec)->qd = ares_array_create(sizeof(ares_dns_qd_t), ares_dns_qd_free_cb);
+  (*dnsrec)->an = ares_array_create(sizeof(ares_dns_rr_t), ares_dns_rr_free_cb);
+  (*dnsrec)->ns = ares_array_create(sizeof(ares_dns_rr_t), ares_dns_rr_free_cb);
+  (*dnsrec)->ar = ares_array_create(sizeof(ares_dns_rr_t), ares_dns_rr_free_cb);
 
   if ((*dnsrec)->qd == NULL || (*dnsrec)->an == NULL || (*dnsrec)->ns == NULL ||
       (*dnsrec)->ar == NULL) {
@@ -134,7 +130,7 @@ ares_dns_rcode_t ares_dns_record_get_rcode(const ares_dns_record_t *dnsrec)
   return dnsrec->rcode;
 }
 
-static void ares__dns_rr_free(ares_dns_rr_t *rr)
+static void ares_dns_rr_free(ares_dns_rr_t *rr)
 {
   ares_free(rr->name);
 
@@ -172,7 +168,7 @@ static void ares__dns_rr_free(ares_dns_rr_t *rr)
       break;
 
     case ARES_REC_TYPE_TXT:
-      ares__dns_multistring_destroy(rr->r.txt.strs);
+      ares_dns_multistring_destroy(rr->r.txt.strs);
       break;
 
     case ARES_REC_TYPE_SIG:
@@ -192,7 +188,7 @@ static void ares__dns_rr_free(ares_dns_rr_t *rr)
       break;
 
     case ARES_REC_TYPE_OPT:
-      ares__array_destroy(rr->r.opt.options);
+      ares_array_destroy(rr->r.opt.options);
       break;
 
     case ARES_REC_TYPE_TLSA:
@@ -201,12 +197,12 @@ static void ares__dns_rr_free(ares_dns_rr_t *rr)
 
     case ARES_REC_TYPE_SVCB:
       ares_free(rr->r.svcb.target);
-      ares__array_destroy(rr->r.svcb.params);
+      ares_array_destroy(rr->r.svcb.params);
       break;
 
     case ARES_REC_TYPE_HTTPS:
       ares_free(rr->r.https.target);
-      ares__array_destroy(rr->r.https.params);
+      ares_array_destroy(rr->r.https.params);
       break;
 
     case ARES_REC_TYPE_URI:
@@ -231,16 +227,16 @@ void ares_dns_record_destroy(ares_dns_record_t *dnsrec)
   }
 
   /* Free questions */
-  ares__array_destroy(dnsrec->qd);
+  ares_array_destroy(dnsrec->qd);
 
   /* Free answers */
-  ares__array_destroy(dnsrec->an);
+  ares_array_destroy(dnsrec->an);
 
   /* Free authority */
-  ares__array_destroy(dnsrec->ns);
+  ares_array_destroy(dnsrec->ns);
 
   /* Free additional */
-  ares__array_destroy(dnsrec->ar);
+  ares_array_destroy(dnsrec->ar);
 
   ares_free(dnsrec);
 }
@@ -250,7 +246,7 @@ size_t ares_dns_record_query_cnt(const ares_dns_record_t *dnsrec)
   if (dnsrec == NULL) {
     return 0;
   }
-  return ares__array_len(dnsrec->qd);
+  return ares_array_len(dnsrec->qd);
 }
 
 ares_status_t ares_dns_record_query_add(ares_dns_record_t  *dnsrec,
@@ -268,15 +264,15 @@ ares_status_t ares_dns_record_query_add(ares_dns_record_t  *dnsrec,
     return ARES_EFORMERR;
   }
 
-  idx    = ares__array_len(dnsrec->qd);
-  status = ares__array_insert_last((void **)&qd, dnsrec->qd);
+  idx    = ares_array_len(dnsrec->qd);
+  status = ares_array_insert_last((void **)&qd, dnsrec->qd);
   if (status != ARES_SUCCESS) {
     return status;
   }
 
   qd->name = ares_strdup(name);
   if (qd->name == NULL) {
-    ares__array_remove_at(dnsrec->qd, idx);
+    ares_array_remove_at(dnsrec->qd, idx);
     return ARES_ENOMEM;
   }
   qd->qtype  = qtype;
@@ -290,11 +286,11 @@ ares_status_t ares_dns_record_query_set_name(ares_dns_record_t *dnsrec,
   char          *orig_name = NULL;
   ares_dns_qd_t *qd;
 
-  if (dnsrec == NULL || idx >= ares__array_len(dnsrec->qd) || name == NULL) {
+  if (dnsrec == NULL || idx >= ares_array_len(dnsrec->qd) || name == NULL) {
     return ARES_EFORMERR;
   }
 
-  qd = ares__array_at(dnsrec->qd, idx);
+  qd = ares_array_at(dnsrec->qd, idx);
 
   orig_name = qd->name;
   qd->name  = ares_strdup(name);
@@ -313,12 +309,12 @@ ares_status_t ares_dns_record_query_set_type(ares_dns_record_t  *dnsrec,
 {
   ares_dns_qd_t *qd;
 
-  if (dnsrec == NULL || idx >= ares__array_len(dnsrec->qd) ||
+  if (dnsrec == NULL || idx >= ares_array_len(dnsrec->qd) ||
       !ares_dns_rec_type_isvalid(qtype, ARES_TRUE)) {
     return ARES_EFORMERR;
   }
 
-  qd        = ares__array_at(dnsrec->qd, idx);
+  qd        = ares_array_at(dnsrec->qd, idx);
   qd->qtype = qtype;
 
   return ARES_SUCCESS;
@@ -330,11 +326,11 @@ ares_status_t ares_dns_record_query_get(const ares_dns_record_t *dnsrec,
                                         ares_dns_class_t    *qclass)
 {
   const ares_dns_qd_t *qd;
-  if (dnsrec == NULL || idx >= ares__array_len(dnsrec->qd)) {
+  if (dnsrec == NULL || idx >= ares_array_len(dnsrec->qd)) {
     return ARES_EFORMERR;
   }
 
-  qd = ares__array_at(dnsrec->qd, idx);
+  qd = ares_array_at(dnsrec->qd, idx);
   if (name != NULL) {
     *name = qd->name;
   }
@@ -359,11 +355,11 @@ size_t ares_dns_record_rr_cnt(const ares_dns_record_t *dnsrec,
 
   switch (sect) {
     case ARES_SECTION_ANSWER:
-      return ares__array_len(dnsrec->an);
+      return ares_array_len(dnsrec->an);
     case ARES_SECTION_AUTHORITY:
-      return ares__array_len(dnsrec->ns);
+      return ares_array_len(dnsrec->ns);
     case ARES_SECTION_ADDITIONAL:
-      return ares__array_len(dnsrec->ar);
+      return ares_array_len(dnsrec->ar);
   }
 
   return 0; /* LCOV_EXCL_LINE: DefensiveCoding */
@@ -372,7 +368,7 @@ size_t ares_dns_record_rr_cnt(const ares_dns_record_t *dnsrec,
 ares_status_t ares_dns_record_rr_prealloc(ares_dns_record_t *dnsrec,
                                           ares_dns_section_t sect, size_t cnt)
 {
-  ares__array_t *arr = NULL;
+  ares_array_t *arr = NULL;
 
   if (dnsrec == NULL || !ares_dns_section_isvalid(sect)) {
     return ARES_EFORMERR;
@@ -390,11 +386,11 @@ ares_status_t ares_dns_record_rr_prealloc(ares_dns_record_t *dnsrec,
       break;
   }
 
-  if (cnt < ares__array_len(arr)) {
+  if (cnt < ares_array_len(arr)) {
     return ARES_EFORMERR;
   }
 
-  return ares__array_set_size(arr, cnt);
+  return ares_array_set_size(arr, cnt);
 }
 
 ares_status_t ares_dns_record_rr_add(ares_dns_rr_t    **rr_out,
@@ -404,7 +400,7 @@ ares_status_t ares_dns_record_rr_add(ares_dns_rr_t    **rr_out,
                                      ares_dns_class_t rclass, unsigned int ttl)
 {
   ares_dns_rr_t *rr  = NULL;
-  ares__array_t *arr = NULL;
+  ares_array_t  *arr = NULL;
   ares_status_t  status;
   size_t         idx;
 
@@ -429,15 +425,15 @@ ares_status_t ares_dns_record_rr_add(ares_dns_rr_t    **rr_out,
       break;
   }
 
-  idx    = ares__array_len(arr);
-  status = ares__array_insert_last((void **)&rr, arr);
+  idx    = ares_array_len(arr);
+  status = ares_array_insert_last((void **)&rr, arr);
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   rr->name = ares_strdup(name);
   if (rr->name == NULL) {
-    ares__array_remove_at(arr, idx);
+    ares_array_remove_at(arr, idx);
     return ARES_ENOMEM;
   }
 
@@ -454,7 +450,7 @@ ares_status_t ares_dns_record_rr_add(ares_dns_rr_t    **rr_out,
 ares_status_t ares_dns_record_rr_del(ares_dns_record_t *dnsrec,
                                      ares_dns_section_t sect, size_t idx)
 {
-  ares__array_t *arr = NULL;
+  ares_array_t *arr = NULL;
 
   if (dnsrec == NULL || !ares_dns_section_isvalid(sect)) {
     return ARES_EFORMERR;
@@ -472,13 +468,13 @@ ares_status_t ares_dns_record_rr_del(ares_dns_record_t *dnsrec,
       break;
   }
 
-  return ares__array_remove_at(arr, idx);
+  return ares_array_remove_at(arr, idx);
 }
 
 ares_dns_rr_t *ares_dns_record_rr_get(ares_dns_record_t *dnsrec,
                                       ares_dns_section_t sect, size_t idx)
 {
-  ares__array_t *arr = NULL;
+  ares_array_t *arr = NULL;
 
   if (dnsrec == NULL || !ares_dns_section_isvalid(sect)) {
     return NULL;
@@ -496,7 +492,7 @@ ares_dns_rr_t *ares_dns_record_rr_get(ares_dns_record_t *dnsrec,
       break;
   }
 
-  return ares__array_at(arr, idx);
+  return ares_array_at(arr, idx);
 }
 
 const ares_dns_rr_t *
@@ -849,14 +845,14 @@ const unsigned char *ares_dns_rr_get_bin(const ares_dns_rr_t *dns_rr,
 
   /* Array of strings, return concatenated version */
   if (ares_dns_rr_key_datatype(key) == ARES_DATATYPE_ABINP) {
-    ares__dns_multistring_t * const *strs =
+    ares_dns_multistring_t * const *strs =
       ares_dns_rr_data_ptr_const(dns_rr, key, NULL);
 
     if (strs == NULL) {
       return NULL;
     }
 
-    return ares__dns_multistring_get_combined(*strs, len);
+    return ares_dns_multistring_combined(*strs, len);
   }
 
   /* Not a multi-string, just straight binary data */
@@ -877,7 +873,7 @@ const unsigned char *ares_dns_rr_get_bin(const ares_dns_rr_t *dns_rr,
 size_t ares_dns_rr_get_abin_cnt(const ares_dns_rr_t *dns_rr,
                                 ares_dns_rr_key_t    key)
 {
-  ares__dns_multistring_t * const *strs;
+  ares_dns_multistring_t * const *strs;
 
   if (ares_dns_rr_key_datatype(key) != ARES_DATATYPE_ABINP) {
     return 0;
@@ -888,14 +884,14 @@ size_t ares_dns_rr_get_abin_cnt(const ares_dns_rr_t *dns_rr,
     return 0;
   }
 
-  return ares__dns_multistring_cnt(*strs);
+  return ares_dns_multistring_cnt(*strs);
 }
 
 const unsigned char *ares_dns_rr_get_abin(const ares_dns_rr_t *dns_rr,
                                           ares_dns_rr_key_t key, size_t idx,
                                           size_t *len)
 {
-  ares__dns_multistring_t * const *strs;
+  ares_dns_multistring_t * const *strs;
 
   if (ares_dns_rr_key_datatype(key) != ARES_DATATYPE_ABINP) {
     return NULL;
@@ -906,13 +902,13 @@ const unsigned char *ares_dns_rr_get_abin(const ares_dns_rr_t *dns_rr,
     return NULL;
   }
 
-  return ares__dns_multistring_get(*strs, idx, len);
+  return ares_dns_multistring_get(*strs, idx, len);
 }
 
 ares_status_t ares_dns_rr_del_abin(ares_dns_rr_t *dns_rr, ares_dns_rr_key_t key,
                                    size_t idx)
 {
-  ares__dns_multistring_t **strs;
+  ares_dns_multistring_t **strs;
 
   if (ares_dns_rr_key_datatype(key) != ARES_DATATYPE_ABINP) {
     return ARES_EFORMERR;
@@ -923,7 +919,7 @@ ares_status_t ares_dns_rr_del_abin(ares_dns_rr_t *dns_rr, ares_dns_rr_key_t key,
     return ARES_EFORMERR;
   }
 
-  return ares__dns_multistring_del(*strs, idx);
+  return ares_dns_multistring_del(*strs, idx);
 }
 
 ares_status_t ares_dns_rr_add_abin(ares_dns_rr_t *dns_rr, ares_dns_rr_key_t key,
@@ -933,9 +929,9 @@ ares_status_t ares_dns_rr_add_abin(ares_dns_rr_t *dns_rr, ares_dns_rr_key_t key,
   ares_dns_datatype_t datatype = ares_dns_rr_key_datatype(key);
   ares_bool_t         is_nullterm =
     (datatype == ARES_DATATYPE_ABINP) ? ARES_TRUE : ARES_FALSE;
-  size_t                    alloclen = is_nullterm ? len + 1 : len;
-  unsigned char            *temp;
-  ares__dns_multistring_t **strs;
+  size_t                   alloclen = is_nullterm ? len + 1 : len;
+  unsigned char           *temp;
+  ares_dns_multistring_t **strs;
 
   if (ares_dns_rr_key_datatype(key) != ARES_DATATYPE_ABINP) {
     return ARES_EFORMERR;
@@ -947,7 +943,7 @@ ares_status_t ares_dns_rr_add_abin(ares_dns_rr_t *dns_rr, ares_dns_rr_key_t key,
   }
 
   if (*strs == NULL) {
-    *strs = ares__dns_multistring_create();
+    *strs = ares_dns_multistring_create();
     if (*strs == NULL) {
       return ARES_ENOMEM;
     }
@@ -965,7 +961,7 @@ ares_status_t ares_dns_rr_add_abin(ares_dns_rr_t *dns_rr, ares_dns_rr_key_t key,
     temp[len] = 0;
   }
 
-  status = ares__dns_multistring_add_own(*strs, temp, len);
+  status = ares_dns_multistring_add_own(*strs, temp, len);
   if (status != ARES_SUCCESS) {
     ares_free(temp);
   }
@@ -994,7 +990,7 @@ const char *ares_dns_rr_get_str(const ares_dns_rr_t *dns_rr,
 size_t ares_dns_rr_get_opt_cnt(const ares_dns_rr_t *dns_rr,
                                ares_dns_rr_key_t    key)
 {
-  ares__array_t * const *opts;
+  ares_array_t * const *opts;
 
   if (ares_dns_rr_key_datatype(key) != ARES_DATATYPE_OPT) {
     return 0;
@@ -1005,15 +1001,15 @@ size_t ares_dns_rr_get_opt_cnt(const ares_dns_rr_t *dns_rr,
     return 0;
   }
 
-  return ares__array_len(*opts);
+  return ares_array_len(*opts);
 }
 
 unsigned short ares_dns_rr_get_opt(const ares_dns_rr_t *dns_rr,
                                    ares_dns_rr_key_t key, size_t idx,
                                    const unsigned char **val, size_t *val_len)
 {
-  ares__array_t * const    *opts;
-  const ares__dns_optval_t *opt;
+  ares_array_t * const    *opts;
+  const ares_dns_optval_t *opt;
 
   if (val) {
     *val = NULL;
@@ -1031,7 +1027,7 @@ unsigned short ares_dns_rr_get_opt(const ares_dns_rr_t *dns_rr,
     return 65535;
   }
 
-  opt = ares__array_at(*opts, idx);
+  opt = ares_array_at(*opts, idx);
   if (opt == NULL) {
     return 65535;
   }
@@ -1050,10 +1046,10 @@ ares_bool_t ares_dns_rr_get_opt_byid(const ares_dns_rr_t *dns_rr,
                                      ares_dns_rr_key_t key, unsigned short opt,
                                      const unsigned char **val, size_t *val_len)
 {
-  ares__array_t * const    *opts;
-  size_t                    i;
-  size_t                    cnt;
-  const ares__dns_optval_t *optptr = NULL;
+  ares_array_t * const    *opts;
+  size_t                   i;
+  size_t                   cnt;
+  const ares_dns_optval_t *optptr = NULL;
 
   if (val) {
     *val = NULL;
@@ -1071,9 +1067,9 @@ ares_bool_t ares_dns_rr_get_opt_byid(const ares_dns_rr_t *dns_rr,
     return ARES_FALSE;
   }
 
-  cnt = ares__array_len(*opts);
+  cnt = ares_array_len(*opts);
   for (i = 0; i < cnt; i++) {
-    optptr = ares__array_at(*opts, i);
+    optptr = ares_array_at(*opts, i);
     if (optptr == NULL) {
       return ARES_FALSE;
     }
@@ -1200,22 +1196,22 @@ ares_status_t ares_dns_rr_set_bin_own(ares_dns_rr_t    *dns_rr,
   }
 
   if (ares_dns_rr_key_datatype(key) == ARES_DATATYPE_ABINP) {
-    ares__dns_multistring_t **strs = ares_dns_rr_data_ptr(dns_rr, key, NULL);
+    ares_dns_multistring_t **strs = ares_dns_rr_data_ptr(dns_rr, key, NULL);
     if (strs == NULL) {
       return ARES_EFORMERR;
     }
 
     if (*strs == NULL) {
-      *strs = ares__dns_multistring_create();
+      *strs = ares_dns_multistring_create();
       if (*strs == NULL) {
         return ARES_ENOMEM;
       }
     }
 
     /* Clear all existing entries as this is an override */
-    ares__dns_multistring_clear(*strs);
+    ares_dns_multistring_clear(*strs);
 
-    return ares__dns_multistring_add_own(*strs, val, len);
+    return ares_dns_multistring_add_own(*strs, val, len);
   }
 
   bin = ares_dns_rr_data_ptr(dns_rr, key, &bin_len);
@@ -1307,11 +1303,11 @@ ares_status_t ares_dns_rr_set_str(ares_dns_rr_t *dns_rr, ares_dns_rr_key_t key,
   return status;
 }
 
-ares_status_t ares_dns_rr_set_abin_own(ares_dns_rr_t           *dns_rr,
-                                       ares_dns_rr_key_t        key,
-                                       ares__dns_multistring_t *strs)
+ares_status_t ares_dns_rr_set_abin_own(ares_dns_rr_t          *dns_rr,
+                                       ares_dns_rr_key_t       key,
+                                       ares_dns_multistring_t *strs)
 {
-  ares__dns_multistring_t **strs_ptr;
+  ares_dns_multistring_t **strs_ptr;
 
   if (ares_dns_rr_key_datatype(key) != ARES_DATATYPE_ABINP) {
     return ARES_EFORMERR;
@@ -1323,16 +1319,16 @@ ares_status_t ares_dns_rr_set_abin_own(ares_dns_rr_t           *dns_rr,
   }
 
   if (*strs_ptr != NULL) {
-    ares__dns_multistring_destroy(*strs_ptr);
+    ares_dns_multistring_destroy(*strs_ptr);
   }
   *strs_ptr = strs;
 
   return ARES_SUCCESS;
 }
 
-static void ares__dns_opt_free_cb(void *arg)
+static void ares_dns_opt_free_cb(void *arg)
 {
-  ares__dns_optval_t *opt = arg;
+  ares_dns_optval_t *opt = arg;
   if (opt == NULL) {
     return;
   }
@@ -1343,11 +1339,11 @@ ares_status_t ares_dns_rr_set_opt_own(ares_dns_rr_t    *dns_rr,
                                       ares_dns_rr_key_t key, unsigned short opt,
                                       unsigned char *val, size_t val_len)
 {
-  ares__array_t     **options;
-  ares__dns_optval_t *optptr = NULL;
-  size_t              idx;
-  size_t              cnt;
-  ares_status_t       status;
+  ares_array_t     **options;
+  ares_dns_optval_t *optptr = NULL;
+  size_t             idx;
+  size_t             cnt;
+  ares_status_t      status;
 
   if (ares_dns_rr_key_datatype(key) != ARES_DATATYPE_OPT) {
     return ARES_EFORMERR;
@@ -1360,15 +1356,15 @@ ares_status_t ares_dns_rr_set_opt_own(ares_dns_rr_t    *dns_rr,
 
   if (*options == NULL) {
     *options =
-      ares__array_create(sizeof(ares__dns_optval_t), ares__dns_opt_free_cb);
+      ares_array_create(sizeof(ares_dns_optval_t), ares_dns_opt_free_cb);
   }
   if (*options == NULL) {
     return ARES_ENOMEM;
   }
 
-  cnt = ares__array_len(*options);
+  cnt = ares_array_len(*options);
   for (idx = 0; idx < cnt; idx++) {
-    optptr = ares__array_at(*options, idx);
+    optptr = ares_array_at(*options, idx);
     if (optptr == NULL) {
       return ARES_EFORMERR;
     }
@@ -1382,7 +1378,7 @@ ares_status_t ares_dns_rr_set_opt_own(ares_dns_rr_t    *dns_rr,
     goto done;
   }
 
-  status = ares__array_insert_last((void **)&optptr, *options);
+  status = ares_array_insert_last((void **)&optptr, *options);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -1424,10 +1420,10 @@ ares_status_t ares_dns_rr_del_opt_byid(ares_dns_rr_t    *dns_rr,
                                        ares_dns_rr_key_t key,
                                        unsigned short    opt)
 {
-  ares__array_t           **options;
-  const ares__dns_optval_t *optptr;
-  size_t                    idx;
-  size_t                    cnt;
+  ares_array_t           **options;
+  const ares_dns_optval_t *optptr;
+  size_t                   idx;
+  size_t                   cnt;
 
   if (ares_dns_rr_key_datatype(key) != ARES_DATATYPE_OPT) {
     return ARES_EFORMERR;
@@ -1443,14 +1439,14 @@ ares_status_t ares_dns_rr_del_opt_byid(ares_dns_rr_t    *dns_rr,
     return ARES_SUCCESS;
   }
 
-  cnt = ares__array_len(*options);
+  cnt = ares_array_len(*options);
   for (idx = 0; idx < cnt; idx++) {
-    optptr = ares__array_at_const(*options, idx);
+    optptr = ares_array_at_const(*options, idx);
     if (optptr == NULL) {
       return ARES_ENOTFOUND;
     }
     if (optptr->opt == opt) {
-      return ares__array_remove_at(*options, idx);
+      return ares_array_remove_at(*options, idx);
     }
   }
 
@@ -1459,7 +1455,7 @@ ares_status_t ares_dns_rr_del_opt_byid(ares_dns_rr_t    *dns_rr,
 
 char *ares_dns_addr_to_ptr(const struct ares_addr *addr)
 {
-  ares__buf_t               *buf     = NULL;
+  ares_buf_t                *buf     = NULL;
   const unsigned char       *ptr     = NULL;
   size_t                     ptr_len = 0;
   size_t                     i;
@@ -1470,7 +1466,7 @@ char *ares_dns_addr_to_ptr(const struct ares_addr *addr)
     goto fail;
   }
 
-  buf = ares__buf_create();
+  buf = ares_buf_create();
   if (buf == NULL) {
     goto fail;
   }
@@ -1485,47 +1481,47 @@ char *ares_dns_addr_to_ptr(const struct ares_addr *addr)
 
   for (i = ptr_len; i > 0; i--) {
     if (addr->family == AF_INET) {
-      status = ares__buf_append_num_dec(buf, (size_t)ptr[i - 1], 0);
+      status = ares_buf_append_num_dec(buf, (size_t)ptr[i - 1], 0);
     } else {
       unsigned char c;
 
       c      = ptr[i - 1] & 0xF;
-      status = ares__buf_append_byte(buf, hexbytes[c]);
+      status = ares_buf_append_byte(buf, hexbytes[c]);
       if (status != ARES_SUCCESS) {
         goto fail;
       }
 
-      status = ares__buf_append_byte(buf, '.');
+      status = ares_buf_append_byte(buf, '.');
       if (status != ARES_SUCCESS) {
         goto fail;
       }
 
       c      = (ptr[i - 1] >> 4) & 0xF;
-      status = ares__buf_append_byte(buf, hexbytes[c]);
+      status = ares_buf_append_byte(buf, hexbytes[c]);
     }
     if (status != ARES_SUCCESS) {
       goto fail;
     }
 
-    status = ares__buf_append_byte(buf, '.');
+    status = ares_buf_append_byte(buf, '.');
     if (status != ARES_SUCCESS) {
       goto fail;
     }
   }
 
   if (addr->family == AF_INET) {
-    status = ares__buf_append(buf, (const unsigned char *)"in-addr.arpa", 12);
+    status = ares_buf_append(buf, (const unsigned char *)"in-addr.arpa", 12);
   } else {
-    status = ares__buf_append(buf, (const unsigned char *)"ip6.arpa", 8);
+    status = ares_buf_append(buf, (const unsigned char *)"ip6.arpa", 8);
   }
   if (status != ARES_SUCCESS) {
     goto fail;
   }
 
-  return ares__buf_finish_str(buf, NULL);
+  return ares_buf_finish_str(buf, NULL);
 
 fail:
-  ares__buf_destroy(buf);
+  ares_buf_destroy(buf);
   return NULL;
 }
 
@@ -1575,7 +1571,7 @@ ares_status_t
   *dnsrec = NULL;
 
   /* Per RFC 7686, reject queries for ".onion" domain names with NXDOMAIN */
-  if (ares__is_onion_domain(name)) {
+  if (ares_is_onion_domain(name)) {
     status = ARES_ENOTFOUND;
     goto done;
   }
diff --git a/deps/cares/src/lib/record/ares_dns_write.c b/deps/cares/src/lib/record/ares_dns_write.c
index 8a3addd9f01b53..549017ffbc1768 100644
--- a/deps/cares/src/lib/record/ares_dns_write.c
+++ b/deps/cares/src/lib/record/ares_dns_write.c
@@ -31,7 +31,7 @@
 
 
 static ares_status_t ares_dns_write_header(const ares_dns_record_t *dnsrec,
-                                           ares__buf_t             *buf)
+                                           ares_buf_t              *buf)
 {
   unsigned short u16;
   unsigned short opcode;
@@ -40,7 +40,7 @@ static ares_status_t ares_dns_write_header(const ares_dns_record_t *dnsrec,
   ares_status_t  status;
 
   /* ID */
-  status = ares__buf_append_be16(buf, dnsrec->id);
+  status = ares_buf_append_be16(buf, dnsrec->id);
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -99,35 +99,35 @@ static ares_status_t ares_dns_write_header(const ares_dns_record_t *dnsrec,
   }
   u16 |= rcode;
 
-  status = ares__buf_append_be16(buf, u16);
+  status = ares_buf_append_be16(buf, u16);
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   /* QDCOUNT */
-  status = ares__buf_append_be16(
+  status = ares_buf_append_be16(
     buf, (unsigned short)ares_dns_record_query_cnt(dnsrec));
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   /* ANCOUNT */
-  status = ares__buf_append_be16(
+  status = ares_buf_append_be16(
     buf, (unsigned short)ares_dns_record_rr_cnt(dnsrec, ARES_SECTION_ANSWER));
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   /* NSCOUNT */
-  status = ares__buf_append_be16(buf, (unsigned short)ares_dns_record_rr_cnt(
-                                        dnsrec, ARES_SECTION_AUTHORITY));
+  status = ares_buf_append_be16(buf, (unsigned short)ares_dns_record_rr_cnt(
+                                       dnsrec, ARES_SECTION_AUTHORITY));
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   /* ARCOUNT */
-  status = ares__buf_append_be16(buf, (unsigned short)ares_dns_record_rr_cnt(
-                                        dnsrec, ARES_SECTION_ADDITIONAL));
+  status = ares_buf_append_be16(buf, (unsigned short)ares_dns_record_rr_cnt(
+                                       dnsrec, ARES_SECTION_ADDITIONAL));
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -136,8 +136,8 @@ static ares_status_t ares_dns_write_header(const ares_dns_record_t *dnsrec,
 }
 
 static ares_status_t ares_dns_write_questions(const ares_dns_record_t *dnsrec,
-                                              ares__llist_t          **namelist,
-                                              ares__buf_t             *buf)
+                                              ares_llist_t           **namelist,
+                                              ares_buf_t              *buf)
 {
   size_t i;
 
@@ -153,19 +153,19 @@ static ares_status_t ares_dns_write_questions(const ares_dns_record_t *dnsrec,
     }
 
     /* Name */
-    status = ares__dns_name_write(buf, namelist, ARES_TRUE, name);
+    status = ares_dns_name_write(buf, namelist, ARES_TRUE, name);
     if (status != ARES_SUCCESS) {
       return status;
     }
 
     /* Type */
-    status = ares__buf_append_be16(buf, (unsigned short)qtype);
+    status = ares_buf_append_be16(buf, (unsigned short)qtype);
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
     /* Class */
-    status = ares__buf_append_be16(buf, (unsigned short)qclass);
+    status = ares_buf_append_be16(buf, (unsigned short)qclass);
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -174,9 +174,9 @@ static ares_status_t ares_dns_write_questions(const ares_dns_record_t *dnsrec,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_write_rr_name(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_name(ares_buf_t          *buf,
                                             const ares_dns_rr_t *rr,
-                                            ares__llist_t      **namelist,
+                                            ares_llist_t       **namelist,
                                             ares_bool_t       validate_hostname,
                                             ares_dns_rr_key_t key)
 {
@@ -187,10 +187,10 @@ static ares_status_t ares_dns_write_rr_name(ares__buf_t         *buf,
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  return ares__dns_name_write(buf, namelist, validate_hostname, name);
+  return ares_dns_name_write(buf, namelist, validate_hostname, name);
 }
 
-static ares_status_t ares_dns_write_rr_str(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_str(ares_buf_t          *buf,
                                            const ares_dns_rr_t *rr,
                                            ares_dns_rr_key_t    key)
 {
@@ -209,7 +209,7 @@ static ares_status_t ares_dns_write_rr_str(ares__buf_t         *buf,
   }
 
   /* Write 1 byte length */
-  status = ares__buf_append_byte(buf, (unsigned char)(len & 0xFF));
+  status = ares_buf_append_byte(buf, (unsigned char)(len & 0xFF));
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -219,10 +219,10 @@ static ares_status_t ares_dns_write_rr_str(ares__buf_t         *buf,
   }
 
   /* Write string */
-  return ares__buf_append(buf, (const unsigned char *)str, len);
+  return ares_buf_append(buf, (const unsigned char *)str, len);
 }
 
-static ares_status_t ares_dns_write_binstr(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_binstr(ares_buf_t          *buf,
                                            const unsigned char *bin,
                                            size_t               bin_len)
 {
@@ -240,14 +240,14 @@ static ares_status_t ares_dns_write_binstr(ares__buf_t         *buf,
     }
 
     /* Length */
-    status = ares__buf_append_byte(buf, (unsigned char)(len & 0xFF));
+    status = ares_buf_append_byte(buf, (unsigned char)(len & 0xFF));
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
     /* String */
     if (len) {
-      status = ares__buf_append(buf, ptr, len);
+      status = ares_buf_append(buf, ptr, len);
       if (status != ARES_SUCCESS) {
         return status; /* LCOV_EXCL_LINE: OutOfMemory */
       }
@@ -260,7 +260,7 @@ static ares_status_t ares_dns_write_binstr(ares__buf_t         *buf,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_write_rr_abin(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_abin(ares_buf_t          *buf,
                                             const ares_dns_rr_t *rr,
                                             ares_dns_rr_key_t    key)
 {
@@ -287,39 +287,39 @@ static ares_status_t ares_dns_write_rr_abin(ares__buf_t         *buf,
   return status;
 }
 
-static ares_status_t ares_dns_write_rr_be32(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_be32(ares_buf_t          *buf,
                                             const ares_dns_rr_t *rr,
                                             ares_dns_rr_key_t    key)
 {
   if (ares_dns_rr_key_datatype(key) != ARES_DATATYPE_U32) {
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
-  return ares__buf_append_be32(buf, ares_dns_rr_get_u32(rr, key));
+  return ares_buf_append_be32(buf, ares_dns_rr_get_u32(rr, key));
 }
 
-static ares_status_t ares_dns_write_rr_be16(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_be16(ares_buf_t          *buf,
                                             const ares_dns_rr_t *rr,
                                             ares_dns_rr_key_t    key)
 {
   if (ares_dns_rr_key_datatype(key) != ARES_DATATYPE_U16) {
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
-  return ares__buf_append_be16(buf, ares_dns_rr_get_u16(rr, key));
+  return ares_buf_append_be16(buf, ares_dns_rr_get_u16(rr, key));
 }
 
-static ares_status_t ares_dns_write_rr_u8(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_u8(ares_buf_t          *buf,
                                           const ares_dns_rr_t *rr,
                                           ares_dns_rr_key_t    key)
 {
   if (ares_dns_rr_key_datatype(key) != ARES_DATATYPE_U8) {
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
-  return ares__buf_append_byte(buf, ares_dns_rr_get_u8(rr, key));
+  return ares_buf_append_byte(buf, ares_dns_rr_get_u8(rr, key));
 }
 
-static ares_status_t ares_dns_write_rr_a(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_a(ares_buf_t          *buf,
                                          const ares_dns_rr_t *rr,
-                                         ares__llist_t      **namelist)
+                                         ares_llist_t       **namelist)
 {
   const struct in_addr *addr;
   (void)namelist;
@@ -329,28 +329,28 @@ static ares_status_t ares_dns_write_rr_a(ares__buf_t         *buf,
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  return ares__buf_append(buf, (const unsigned char *)addr, sizeof(*addr));
+  return ares_buf_append(buf, (const unsigned char *)addr, sizeof(*addr));
 }
 
-static ares_status_t ares_dns_write_rr_ns(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_ns(ares_buf_t          *buf,
                                           const ares_dns_rr_t *rr,
-                                          ares__llist_t      **namelist)
+                                          ares_llist_t       **namelist)
 {
   return ares_dns_write_rr_name(buf, rr, namelist, ARES_FALSE,
                                 ARES_RR_NS_NSDNAME);
 }
 
-static ares_status_t ares_dns_write_rr_cname(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_cname(ares_buf_t          *buf,
                                              const ares_dns_rr_t *rr,
-                                             ares__llist_t      **namelist)
+                                             ares_llist_t       **namelist)
 {
   return ares_dns_write_rr_name(buf, rr, namelist, ARES_FALSE,
                                 ARES_RR_CNAME_CNAME);
 }
 
-static ares_status_t ares_dns_write_rr_soa(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_soa(ares_buf_t          *buf,
                                            const ares_dns_rr_t *rr,
-                                           ares__llist_t      **namelist)
+                                           ares_llist_t       **namelist)
 {
   ares_status_t status;
 
@@ -396,17 +396,17 @@ static ares_status_t ares_dns_write_rr_soa(ares__buf_t         *buf,
   return ares_dns_write_rr_be32(buf, rr, ARES_RR_SOA_MINIMUM);
 }
 
-static ares_status_t ares_dns_write_rr_ptr(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_ptr(ares_buf_t          *buf,
                                            const ares_dns_rr_t *rr,
-                                           ares__llist_t      **namelist)
+                                           ares_llist_t       **namelist)
 {
   return ares_dns_write_rr_name(buf, rr, namelist, ARES_FALSE,
                                 ARES_RR_PTR_DNAME);
 }
 
-static ares_status_t ares_dns_write_rr_hinfo(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_hinfo(ares_buf_t          *buf,
                                              const ares_dns_rr_t *rr,
-                                             ares__llist_t      **namelist)
+                                             ares_llist_t       **namelist)
 {
   ares_status_t status;
 
@@ -422,9 +422,9 @@ static ares_status_t ares_dns_write_rr_hinfo(ares__buf_t         *buf,
   return ares_dns_write_rr_str(buf, rr, ARES_RR_HINFO_OS);
 }
 
-static ares_status_t ares_dns_write_rr_mx(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_mx(ares_buf_t          *buf,
                                           const ares_dns_rr_t *rr,
-                                          ares__llist_t      **namelist)
+                                          ares_llist_t       **namelist)
 {
   ares_status_t status;
 
@@ -439,17 +439,17 @@ static ares_status_t ares_dns_write_rr_mx(ares__buf_t         *buf,
                                 ARES_RR_MX_EXCHANGE);
 }
 
-static ares_status_t ares_dns_write_rr_txt(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_txt(ares_buf_t          *buf,
                                            const ares_dns_rr_t *rr,
-                                           ares__llist_t      **namelist)
+                                           ares_llist_t       **namelist)
 {
   (void)namelist;
   return ares_dns_write_rr_abin(buf, rr, ARES_RR_TXT_DATA);
 }
 
-static ares_status_t ares_dns_write_rr_sig(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_sig(ares_buf_t          *buf,
                                            const ares_dns_rr_t *rr,
-                                           ares__llist_t      **namelist)
+                                           ares_llist_t       **namelist)
 {
   ares_status_t        status;
   const unsigned char *data;
@@ -512,12 +512,12 @@ static ares_status_t ares_dns_write_rr_sig(ares__buf_t         *buf,
     return ARES_EFORMERR;
   }
 
-  return ares__buf_append(buf, data, len);
+  return ares_buf_append(buf, data, len);
 }
 
-static ares_status_t ares_dns_write_rr_aaaa(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_aaaa(ares_buf_t          *buf,
                                             const ares_dns_rr_t *rr,
-                                            ares__llist_t      **namelist)
+                                            ares_llist_t       **namelist)
 {
   const struct ares_in6_addr *addr;
   (void)namelist;
@@ -527,12 +527,12 @@ static ares_status_t ares_dns_write_rr_aaaa(ares__buf_t         *buf,
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  return ares__buf_append(buf, (const unsigned char *)addr, sizeof(*addr));
+  return ares_buf_append(buf, (const unsigned char *)addr, sizeof(*addr));
 }
 
-static ares_status_t ares_dns_write_rr_srv(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_srv(ares_buf_t          *buf,
                                            const ares_dns_rr_t *rr,
-                                           ares__llist_t      **namelist)
+                                           ares_llist_t       **namelist)
 {
   ares_status_t status;
 
@@ -559,9 +559,9 @@ static ares_status_t ares_dns_write_rr_srv(ares__buf_t         *buf,
                                 ARES_RR_SRV_TARGET);
 }
 
-static ares_status_t ares_dns_write_rr_naptr(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_naptr(ares_buf_t          *buf,
                                              const ares_dns_rr_t *rr,
-                                             ares__llist_t      **namelist)
+                                             ares_llist_t       **namelist)
 {
   ares_status_t status;
 
@@ -600,11 +600,11 @@ static ares_status_t ares_dns_write_rr_naptr(ares__buf_t         *buf,
                                 ARES_RR_NAPTR_REPLACEMENT);
 }
 
-static ares_status_t ares_dns_write_rr_opt(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_opt(ares_buf_t          *buf,
                                            const ares_dns_rr_t *rr,
-                                           ares__llist_t      **namelist)
+                                           ares_llist_t       **namelist)
 {
-  size_t         len = ares__buf_len(buf);
+  size_t         len = ares_buf_len(buf);
   ares_status_t  status;
   unsigned int   ttl = 0;
   size_t         i;
@@ -620,9 +620,9 @@ static ares_status_t ares_dns_write_rr_opt(ares__buf_t         *buf,
 
   /* We need to go back and overwrite the class and ttl that were emitted as
    * the OPT record overloads them for its own use (yes, very strange!) */
-  status = ares__buf_set_length(buf, len - 2 /* RDLENGTH */
-                                       - 4   /* TTL */
-                                       - 2 /* CLASS */);
+  status = ares_buf_set_length(buf, len - 2 /* RDLENGTH */
+                                      - 4   /* TTL */
+                                      - 2 /* CLASS */);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -638,13 +638,13 @@ static ares_status_t ares_dns_write_rr_opt(ares__buf_t         *buf,
   ttl |= (unsigned int)ares_dns_rr_get_u8(rr, ARES_RR_OPT_VERSION) << 16;
   ttl |= (unsigned int)ares_dns_rr_get_u16(rr, ARES_RR_OPT_FLAGS);
 
-  status = ares__buf_append_be32(buf, ttl);
+  status = ares_buf_append_be32(buf, ttl);
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   /* Now go back to real end */
-  status = ares__buf_set_length(buf, len);
+  status = ares_buf_set_length(buf, len);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -658,20 +658,20 @@ static ares_status_t ares_dns_write_rr_opt(ares__buf_t         *buf,
     opt = ares_dns_rr_get_opt(rr, ARES_RR_OPT_OPTIONS, i, &val, &val_len);
 
     /* BE16 option */
-    status = ares__buf_append_be16(buf, opt);
+    status = ares_buf_append_be16(buf, opt);
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
     /* BE16 length */
-    status = ares__buf_append_be16(buf, (unsigned short)(val_len & 0xFFFF));
+    status = ares_buf_append_be16(buf, (unsigned short)(val_len & 0xFFFF));
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
     /* Value */
     if (val && val_len) {
-      status = ares__buf_append(buf, val, val_len);
+      status = ares_buf_append(buf, val, val_len);
       if (status != ARES_SUCCESS) {
         return status; /* LCOV_EXCL_LINE: OutOfMemory */
       }
@@ -681,9 +681,9 @@ static ares_status_t ares_dns_write_rr_opt(ares__buf_t         *buf,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_write_rr_tlsa(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_tlsa(ares_buf_t          *buf,
                                             const ares_dns_rr_t *rr,
-                                            ares__llist_t      **namelist)
+                                            ares_llist_t       **namelist)
 {
   ares_status_t        status;
   const unsigned char *data;
@@ -715,12 +715,12 @@ static ares_status_t ares_dns_write_rr_tlsa(ares__buf_t         *buf,
     return ARES_EFORMERR;
   }
 
-  return ares__buf_append(buf, data, len);
+  return ares_buf_append(buf, data, len);
 }
 
-static ares_status_t ares_dns_write_rr_svcb(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_svcb(ares_buf_t          *buf,
                                             const ares_dns_rr_t *rr,
-                                            ares__llist_t      **namelist)
+                                            ares_llist_t       **namelist)
 {
   ares_status_t status;
   size_t        i;
@@ -747,20 +747,20 @@ static ares_status_t ares_dns_write_rr_svcb(ares__buf_t         *buf,
     opt = ares_dns_rr_get_opt(rr, ARES_RR_SVCB_PARAMS, i, &val, &val_len);
 
     /* BE16 option */
-    status = ares__buf_append_be16(buf, opt);
+    status = ares_buf_append_be16(buf, opt);
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
     /* BE16 length */
-    status = ares__buf_append_be16(buf, (unsigned short)(val_len & 0xFFFF));
+    status = ares_buf_append_be16(buf, (unsigned short)(val_len & 0xFFFF));
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
     /* Value */
     if (val && val_len) {
-      status = ares__buf_append(buf, val, val_len);
+      status = ares_buf_append(buf, val, val_len);
       if (status != ARES_SUCCESS) {
         return status; /* LCOV_EXCL_LINE: OutOfMemory */
       }
@@ -769,9 +769,9 @@ static ares_status_t ares_dns_write_rr_svcb(ares__buf_t         *buf,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_write_rr_https(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_https(ares_buf_t          *buf,
                                              const ares_dns_rr_t *rr,
-                                             ares__llist_t      **namelist)
+                                             ares_llist_t       **namelist)
 {
   ares_status_t status;
   size_t        i;
@@ -798,20 +798,20 @@ static ares_status_t ares_dns_write_rr_https(ares__buf_t         *buf,
     opt = ares_dns_rr_get_opt(rr, ARES_RR_HTTPS_PARAMS, i, &val, &val_len);
 
     /* BE16 option */
-    status = ares__buf_append_be16(buf, opt);
+    status = ares_buf_append_be16(buf, opt);
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
     /* BE16 length */
-    status = ares__buf_append_be16(buf, (unsigned short)(val_len & 0xFFFF));
+    status = ares_buf_append_be16(buf, (unsigned short)(val_len & 0xFFFF));
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
     /* Value */
     if (val && val_len) {
-      status = ares__buf_append(buf, val, val_len);
+      status = ares_buf_append(buf, val, val_len);
       if (status != ARES_SUCCESS) {
         return status; /* LCOV_EXCL_LINE: OutOfMemory */
       }
@@ -820,9 +820,9 @@ static ares_status_t ares_dns_write_rr_https(ares__buf_t         *buf,
   return ARES_SUCCESS;
 }
 
-static ares_status_t ares_dns_write_rr_uri(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_uri(ares_buf_t          *buf,
                                            const ares_dns_rr_t *rr,
-                                           ares__llist_t      **namelist)
+                                           ares_llist_t       **namelist)
 {
   ares_status_t status;
   const char   *target;
@@ -848,13 +848,13 @@ static ares_status_t ares_dns_write_rr_uri(ares__buf_t         *buf,
     return ARES_EFORMERR;
   }
 
-  return ares__buf_append(buf, (const unsigned char *)target,
-                          ares_strlen(target));
+  return ares_buf_append(buf, (const unsigned char *)target,
+                         ares_strlen(target));
 }
 
-static ares_status_t ares_dns_write_rr_caa(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_caa(ares_buf_t          *buf,
                                            const ares_dns_rr_t *rr,
-                                           ares__llist_t      **namelist)
+                                           ares_llist_t       **namelist)
 {
   const unsigned char *data     = NULL;
   size_t               data_len = 0;
@@ -880,14 +880,14 @@ static ares_status_t ares_dns_write_rr_caa(ares__buf_t         *buf,
     return ARES_EFORMERR;
   }
 
-  return ares__buf_append(buf, data, data_len);
+  return ares_buf_append(buf, data, data_len);
 }
 
-static ares_status_t ares_dns_write_rr_raw_rr(ares__buf_t         *buf,
+static ares_status_t ares_dns_write_rr_raw_rr(ares_buf_t          *buf,
                                               const ares_dns_rr_t *rr,
-                                              ares__llist_t      **namelist)
+                                              ares_llist_t       **namelist)
 {
-  size_t               len = ares__buf_len(buf);
+  size_t               len = ares_buf_len(buf);
   ares_status_t        status;
   const unsigned char *data     = NULL;
   size_t               data_len = 0;
@@ -902,10 +902,10 @@ static ares_status_t ares_dns_write_rr_raw_rr(ares__buf_t         *buf,
 
   /* We need to go back and overwrite the type that was emitted by the parent
    * function */
-  status = ares__buf_set_length(buf, len - 2 /* RDLENGTH */
-                                       - 4   /* TTL */
-                                       - 2   /* CLASS */
-                                       - 2 /* TYPE */);
+  status = ares_buf_set_length(buf, len - 2 /* RDLENGTH */
+                                      - 4   /* TTL */
+                                      - 2   /* CLASS */
+                                      - 2 /* TYPE */);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -916,7 +916,7 @@ static ares_status_t ares_dns_write_rr_raw_rr(ares__buf_t         *buf,
   }
 
   /* Now go back to real end */
-  status = ares__buf_set_length(buf, len);
+  status = ares_buf_set_length(buf, len);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -931,13 +931,13 @@ static ares_status_t ares_dns_write_rr_raw_rr(ares__buf_t         *buf,
     return ARES_SUCCESS;
   }
 
-  return ares__buf_append(buf, data, data_len);
+  return ares_buf_append(buf, data, data_len);
 }
 
 static ares_status_t ares_dns_write_rr(const ares_dns_record_t *dnsrec,
-                                       ares__llist_t          **namelist,
+                                       ares_llist_t           **namelist,
                                        ares_dns_section_t       section,
-                                       ares__buf_t             *buf)
+                                       ares_buf_t              *buf)
 {
   size_t i;
 
@@ -945,7 +945,7 @@ static ares_status_t ares_dns_write_rr(const ares_dns_record_t *dnsrec,
     const ares_dns_rr_t *rr;
     ares_dns_rec_type_t  type;
     ares_bool_t          allow_compress;
-    ares__llist_t      **namelistptr = NULL;
+    ares_llist_t       **namelistptr = NULL;
     size_t               pos_len;
     ares_status_t        status;
     size_t               rdlength;
@@ -958,27 +958,27 @@ static ares_status_t ares_dns_write_rr(const ares_dns_record_t *dnsrec,
     }
 
     type           = ares_dns_rr_get_type(rr);
-    allow_compress = ares_dns_rec_type_allow_name_compression(type);
+    allow_compress = ares_dns_rec_allow_name_comp(type);
     if (allow_compress) {
       namelistptr = namelist;
     }
 
     /* Name */
     status =
-      ares__dns_name_write(buf, namelist, ARES_TRUE, ares_dns_rr_get_name(rr));
+      ares_dns_name_write(buf, namelist, ARES_TRUE, ares_dns_rr_get_name(rr));
     if (status != ARES_SUCCESS) {
       return status;
     }
 
     /* Type */
-    status = ares__buf_append_be16(buf, (unsigned short)type);
+    status = ares_buf_append_be16(buf, (unsigned short)type);
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
     /* Class */
     status =
-      ares__buf_append_be16(buf, (unsigned short)ares_dns_rr_get_class(rr));
+      ares_buf_append_be16(buf, (unsigned short)ares_dns_rr_get_class(rr));
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -990,14 +990,14 @@ static ares_status_t ares_dns_write_rr(const ares_dns_record_t *dnsrec,
     } else {
       ttl -= rr->parent->ttl_decrement;
     }
-    status = ares__buf_append_be32(buf, ttl);
+    status = ares_buf_append_be32(buf, ttl);
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
     /* Length */
-    pos_len = ares__buf_len(buf); /* Save to write real length later */
-    status  = ares__buf_append_be16(buf, 0);
+    pos_len = ares_buf_len(buf); /* Save to write real length later */
+    status  = ares_buf_append_be16(buf, 0);
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -1072,20 +1072,20 @@ static ares_status_t ares_dns_write_rr(const ares_dns_record_t *dnsrec,
 
     /* Back off write pointer, write real length, then go back to proper
      * position */
-    end_length = ares__buf_len(buf);
+    end_length = ares_buf_len(buf);
     rdlength   = end_length - pos_len - 2;
 
-    status = ares__buf_set_length(buf, pos_len);
+    status = ares_buf_set_length(buf, pos_len);
     if (status != ARES_SUCCESS) {
       return status;
     }
 
-    status = ares__buf_append_be16(buf, (unsigned short)(rdlength & 0xFFFF));
+    status = ares_buf_append_be16(buf, (unsigned short)(rdlength & 0xFFFF));
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
-    status = ares__buf_set_length(buf, end_length);
+    status = ares_buf_set_length(buf, end_length);
     if (status != ARES_SUCCESS) {
       return status;
     }
@@ -1095,17 +1095,17 @@ static ares_status_t ares_dns_write_rr(const ares_dns_record_t *dnsrec,
 }
 
 ares_status_t ares_dns_write_buf(const ares_dns_record_t *dnsrec,
-                                 ares__buf_t             *buf)
+                                 ares_buf_t              *buf)
 {
-  ares__llist_t *namelist = NULL;
-  size_t         orig_len;
-  ares_status_t  status;
+  ares_llist_t *namelist = NULL;
+  size_t        orig_len;
+  ares_status_t status;
 
   if (dnsrec == NULL || buf == NULL) {
     return ARES_EFORMERR;
   }
 
-  orig_len = ares__buf_len(buf);
+  orig_len = ares_buf_len(buf);
 
   status = ares_dns_write_header(dnsrec, buf);
   if (status != ARES_SUCCESS) {
@@ -1133,16 +1133,16 @@ ares_status_t ares_dns_write_buf(const ares_dns_record_t *dnsrec,
   }
 
 done:
-  ares__llist_destroy(namelist);
+  ares_llist_destroy(namelist);
   if (status != ARES_SUCCESS) {
-    ares__buf_set_length(buf, orig_len);
+    ares_buf_set_length(buf, orig_len);
   }
 
   return status;
 }
 
 ares_status_t ares_dns_write_buf_tcp(const ares_dns_record_t *dnsrec,
-                                     ares__buf_t             *buf)
+                                     ares_buf_t              *buf)
 {
   ares_status_t status;
   size_t        orig_len;
@@ -1153,10 +1153,10 @@ ares_status_t ares_dns_write_buf_tcp(const ares_dns_record_t *dnsrec,
     return ARES_EFORMERR;
   }
 
-  orig_len = ares__buf_len(buf);
+  orig_len = ares_buf_len(buf);
 
   /* Write placeholder for length */
-  status = ares__buf_append_be16(buf, 0);
+  status = ares_buf_append_be16(buf, 0);
   if (status != ARES_SUCCESS) {
     goto done; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -1167,7 +1167,7 @@ ares_status_t ares_dns_write_buf_tcp(const ares_dns_record_t *dnsrec,
     goto done; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  len     = ares__buf_len(buf);
+  len     = ares_buf_len(buf);
   msg_len = len - orig_len - 2;
   if (msg_len > 65535) {
     status = ARES_EBADQUERY;
@@ -1176,16 +1176,16 @@ ares_status_t ares_dns_write_buf_tcp(const ares_dns_record_t *dnsrec,
 
   /* Now we need to overwrite the length, so we jump back to the original
    * message length, overwrite the section and jump back */
-  ares__buf_set_length(buf, orig_len);
-  status = ares__buf_append_be16(buf, (unsigned short)(msg_len & 0xFFFF));
+  ares_buf_set_length(buf, orig_len);
+  status = ares_buf_append_be16(buf, (unsigned short)(msg_len & 0xFFFF));
   if (status != ARES_SUCCESS) {
     goto done; /* LCOV_EXCL_LINE: UntestablePath */
   }
-  ares__buf_set_length(buf, len);
+  ares_buf_set_length(buf, len);
 
 done:
   if (status != ARES_SUCCESS) {
-    ares__buf_set_length(buf, orig_len);
+    ares_buf_set_length(buf, orig_len);
   }
   return status;
 }
@@ -1193,7 +1193,7 @@ ares_status_t ares_dns_write_buf_tcp(const ares_dns_record_t *dnsrec,
 ares_status_t ares_dns_write(const ares_dns_record_t *dnsrec,
                              unsigned char **buf, size_t *buf_len)
 {
-  ares__buf_t  *b = NULL;
+  ares_buf_t   *b = NULL;
   ares_status_t status;
 
   if (buf == NULL || buf_len == NULL || dnsrec == NULL) {
@@ -1203,7 +1203,7 @@ ares_status_t ares_dns_write(const ares_dns_record_t *dnsrec,
   *buf     = NULL;
   *buf_len = 0;
 
-  b = ares__buf_create();
+  b = ares_buf_create();
   if (b == NULL) {
     return ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -1211,16 +1211,16 @@ ares_status_t ares_dns_write(const ares_dns_record_t *dnsrec,
   status = ares_dns_write_buf(dnsrec, b);
 
   if (status != ARES_SUCCESS) {
-    ares__buf_destroy(b);
+    ares_buf_destroy(b);
     return status;
   }
 
-  *buf = ares__buf_finish_bin(b, buf_len);
+  *buf = ares_buf_finish_bin(b, buf_len);
   return status;
 }
 
-void ares_dns_record_write_ttl_decrement(ares_dns_record_t *dnsrec,
-                                         unsigned int       ttl_decrement)
+void ares_dns_record_ttl_decrement(ares_dns_record_t *dnsrec,
+                                   unsigned int       ttl_decrement)
 {
   if (dnsrec == NULL) {
     return;
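
Reviewer note: several of the renamed writers above (`ares_dns_write_rr`, `ares_dns_write_buf_tcp`, `ares_dns_write_rr_opt`) share one pattern. They append a placeholder, write the payload, rewind the buffer length, overwrite the placeholder, then restore the length. The sketch below shows that pattern on its own, under stated assumptions: `demo_buf_t` and every `demo_*` function are hypothetical stand-ins, not c-ares API, and the buffer has a fixed capacity, whereas the patch shows `ares_buf_t` allocating space on demand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity buffer used only for this illustration. */
typedef struct {
  unsigned char data[64];
  size_t        len; /* number of bytes written so far */
} demo_buf_t;

/* Append n bytes; returns 0 on success, -1 if they do not fit. */
static int demo_append(demo_buf_t *b, const unsigned char *p, size_t n)
{
  if (n > sizeof(b->data) - b->len) {
    return -1;
  }
  memcpy(b->data + b->len, p, n);
  b->len += n;
  return 0;
}

/* Append a 16-bit value in big-endian (network) byte order. */
static int demo_append_be16(demo_buf_t *b, unsigned short v)
{
  unsigned char be[2];
  be[0] = (unsigned char)((v >> 8) & 0xff);
  be[1] = (unsigned char)(v & 0xff);
  return demo_append(b, be, sizeof(be));
}

/* Write a 2-byte big-endian length prefix followed by the payload.
 * The length is not known up front in the general case, so a zero
 * placeholder is written first and patched afterwards. On failure the
 * buffer is rolled back to its original length. */
static int demo_write_framed(demo_buf_t *b, const unsigned char *payload,
                             size_t n)
{
  size_t orig_len;
  size_t end_len;
  size_t msg_len;

  assert(b != NULL);
  orig_len = b->len;

  /* Placeholder for the length, then the payload itself */
  if (demo_append_be16(b, 0) != 0 || demo_append(b, payload, n) != 0) {
    b->len = orig_len;
    return -1;
  }

  end_len = b->len;
  msg_len = end_len - orig_len - 2;

  /* Rewind to the placeholder, overwrite it, then jump back to the end.
   * The two bytes were already reserved, so this append cannot fail. */
  b->len = orig_len;
  (void)demo_append_be16(b, (unsigned short)(msg_len & 0xFFFF));
  b->len = end_len;

  return 0;
}
```

Framing the three bytes `abc` into an empty `demo_buf_t` leaves five bytes in the buffer: `00 03` followed by the payload. A payload that does not fit leaves the buffer length unchanged, mirroring the `done:` rollback in `ares_dns_write_buf_tcp`.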
diff --git a/deps/cares/src/lib/str/ares__buf.c b/deps/cares/src/lib/str/ares_buf.c
similarity index 55%
rename from deps/cares/src/lib/str/ares__buf.c
rename to deps/cares/src/lib/str/ares_buf.c
index bf6d4a0e1d3712..69e6b38aac849e 100644
--- a/deps/cares/src/lib/str/ares__buf.c
+++ b/deps/cares/src/lib/str/ares_buf.c
@@ -24,13 +24,13 @@
  * SPDX-License-Identifier: MIT
  */
 #include "ares_private.h"
-#include "ares__buf.h"
+#include "ares_buf.h"
 #include <limits.h>
 #ifdef HAVE_STDINT_H
 #  include <stdint.h>
 #endif
 
-struct ares__buf {
+struct ares_buf {
   const unsigned char *data;          /*!< pointer to start of data buffer */
   size_t               data_len;      /*!< total size of data in buffer */
 
@@ -43,9 +43,9 @@ struct ares__buf {
                                        *   SIZE_MAX if not set. */
 };
 
-ares__buf_t *ares__buf_create(void)
+ares_buf_t *ares_buf_create(void)
 {
-  ares__buf_t *buf = ares_malloc_zero(sizeof(*buf));
+  ares_buf_t *buf = ares_malloc_zero(sizeof(*buf));
   if (buf == NULL) {
     return NULL;
   }
@@ -54,15 +54,15 @@ ares__buf_t *ares__buf_create(void)
   return buf;
 }
 
-ares__buf_t *ares__buf_create_const(const unsigned char *data, size_t data_len)
+ares_buf_t *ares_buf_create_const(const unsigned char *data, size_t data_len)
 {
-  ares__buf_t *buf;
+  ares_buf_t *buf;
 
   if (data == NULL || data_len == 0) {
     return NULL;
   }
 
-  buf = ares__buf_create();
+  buf = ares_buf_create();
   if (buf == NULL) {
     return NULL;
   }
@@ -73,7 +73,7 @@ ares__buf_t *ares__buf_create_const(const unsigned char *data, size_t data_len)
   return buf;
 }
 
-void ares__buf_destroy(ares__buf_t *buf)
+void ares_buf_destroy(ares_buf_t *buf)
 {
   if (buf == NULL) {
     return;
@@ -82,7 +82,7 @@ void ares__buf_destroy(ares__buf_t *buf)
   ares_free(buf);
 }
 
-static ares_bool_t ares__buf_is_const(const ares__buf_t *buf)
+static ares_bool_t ares_buf_is_const(const ares_buf_t *buf)
 {
   if (buf == NULL) {
     return ARES_FALSE; /* LCOV_EXCL_LINE: DefensiveCoding */
@@ -95,7 +95,7 @@ static ares_bool_t ares__buf_is_const(const ares__buf_t *buf)
   return ARES_FALSE;
 }
 
-void ares__buf_reclaim(ares__buf_t *buf)
+void ares_buf_reclaim(ares_buf_t *buf)
 {
   size_t prefix_size;
   size_t data_size;
@@ -104,7 +104,7 @@ void ares__buf_reclaim(ares__buf_t *buf)
     return;
   }
 
-  if (ares__buf_is_const(buf)) {
+  if (ares_buf_is_const(buf)) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
@@ -135,8 +135,7 @@ void ares__buf_reclaim(ares__buf_t *buf)
   }
 }
 
-static ares_status_t ares__buf_ensure_space(ares__buf_t *buf,
-                                            size_t       needed_size)
+static ares_status_t ares_buf_ensure_space(ares_buf_t *buf, size_t needed_size)
 {
   size_t         remaining_size;
   size_t         alloc_size;
@@ -146,11 +145,11 @@ static ares_status_t ares__buf_ensure_space(ares__buf_t *buf,
     return ARES_EFORMERR;
   }
 
-  if (ares__buf_is_const(buf)) {
+  if (ares_buf_is_const(buf)) {
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  /* When calling ares__buf_finish_str() we end up adding a null terminator,
+  /* When calling ares_buf_finish_str() we end up adding a null terminator,
    * so we want to ensure the size is always sufficient for this as we don't
    * want an ARES_ENOMEM at that point */
   needed_size++;
@@ -162,7 +161,7 @@ static ares_status_t ares__buf_ensure_space(ares__buf_t *buf,
   }
 
   /* See if just moving consumed data frees up enough space */
-  ares__buf_reclaim(buf);
+  ares_buf_reclaim(buf);
 
   remaining_size = buf->alloc_buf_len - buf->data_len;
   if (remaining_size >= needed_size) {
@@ -194,9 +193,9 @@ static ares_status_t ares__buf_ensure_space(ares__buf_t *buf,
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__buf_set_length(ares__buf_t *buf, size_t len)
+ares_status_t ares_buf_set_length(ares_buf_t *buf, size_t len)
 {
-  if (buf == NULL || ares__buf_is_const(buf)) {
+  if (buf == NULL || ares_buf_is_const(buf)) {
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
@@ -208,8 +207,8 @@ ares_status_t ares__buf_set_length(ares__buf_t *buf, size_t len)
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__buf_append(ares__buf_t *buf, const unsigned char *data,
-                               size_t data_len)
+ares_status_t ares_buf_append(ares_buf_t *buf, const unsigned char *data,
+                              size_t data_len)
 {
   ares_status_t status;
 
@@ -221,7 +220,7 @@ ares_status_t ares__buf_append(ares__buf_t *buf, const unsigned char *data,
     return ARES_SUCCESS;
   }
 
-  status = ares__buf_ensure_space(buf, data_len);
+  status = ares_buf_ensure_space(buf, data_len);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -231,21 +230,21 @@ ares_status_t ares__buf_append(ares__buf_t *buf, const unsigned char *data,
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__buf_append_byte(ares__buf_t *buf, unsigned char b)
+ares_status_t ares_buf_append_byte(ares_buf_t *buf, unsigned char b)
 {
-  return ares__buf_append(buf, &b, 1);
+  return ares_buf_append(buf, &b, 1);
 }
 
-ares_status_t ares__buf_append_be16(ares__buf_t *buf, unsigned short u16)
+ares_status_t ares_buf_append_be16(ares_buf_t *buf, unsigned short u16)
 {
   ares_status_t status;
 
-  status = ares__buf_append_byte(buf, (unsigned char)((u16 >> 8) & 0xff));
+  status = ares_buf_append_byte(buf, (unsigned char)((u16 >> 8) & 0xff));
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  status = ares__buf_append_byte(buf, (unsigned char)(u16 & 0xff));
+  status = ares_buf_append_byte(buf, (unsigned char)(u16 & 0xff));
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -253,26 +252,26 @@ ares_status_t ares__buf_append_be16(ares__buf_t *buf, unsigned short u16)
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__buf_append_be32(ares__buf_t *buf, unsigned int u32)
+ares_status_t ares_buf_append_be32(ares_buf_t *buf, unsigned int u32)
 {
   ares_status_t status;
 
-  status = ares__buf_append_byte(buf, ((unsigned char)(u32 >> 24) & 0xff));
+  status = ares_buf_append_byte(buf, ((unsigned char)(u32 >> 24) & 0xff));
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  status = ares__buf_append_byte(buf, ((unsigned char)(u32 >> 16) & 0xff));
+  status = ares_buf_append_byte(buf, ((unsigned char)(u32 >> 16) & 0xff));
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  status = ares__buf_append_byte(buf, ((unsigned char)(u32 >> 8) & 0xff));
+  status = ares_buf_append_byte(buf, ((unsigned char)(u32 >> 8) & 0xff));
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  status = ares__buf_append_byte(buf, ((unsigned char)u32 & 0xff));
+  status = ares_buf_append_byte(buf, ((unsigned char)u32 & 0xff));
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -280,7 +279,7 @@ ares_status_t ares__buf_append_be32(ares__buf_t *buf, unsigned int u32)
   return ARES_SUCCESS;
 }
 
-unsigned char *ares__buf_append_start(ares__buf_t *buf, size_t *len)
+unsigned char *ares_buf_append_start(ares_buf_t *buf, size_t *len)
 {
   ares_status_t status;
 
@@ -288,17 +287,17 @@ unsigned char *ares__buf_append_start(ares__buf_t *buf, size_t *len)
     return NULL;
   }
 
-  status = ares__buf_ensure_space(buf, *len);
+  status = ares_buf_ensure_space(buf, *len);
   if (status != ARES_SUCCESS) {
     return NULL;
   }
 
-  /* -1 for possible null terminator for ares__buf_finish_str() */
+  /* -1 for possible null terminator for ares_buf_finish_str() */
   *len = buf->alloc_buf_len - buf->data_len - 1;
   return buf->alloc_buf + buf->data_len;
 }
 
-void ares__buf_append_finish(ares__buf_t *buf, size_t len)
+void ares_buf_append_finish(ares_buf_t *buf, size_t len)
 {
   if (buf == NULL) {
     return;
@@ -307,18 +306,17 @@ void ares__buf_append_finish(ares__buf_t *buf, size_t len)
   buf->data_len += len;
 }
 
-unsigned char *ares__buf_finish_bin(ares__buf_t *buf, size_t *len)
+unsigned char *ares_buf_finish_bin(ares_buf_t *buf, size_t *len)
 {
   unsigned char *ptr = NULL;
-  if (buf == NULL || len == NULL || ares__buf_is_const(buf)) {
+  if (buf == NULL || len == NULL || ares_buf_is_const(buf)) {
     return NULL;
   }
 
-  ares__buf_reclaim(buf);
+  ares_buf_reclaim(buf);
 
   /* We don't want to return NULL except on failure, may be zero-length */
-  if (buf->alloc_buf == NULL &&
-      ares__buf_ensure_space(buf, 1) != ARES_SUCCESS) {
+  if (buf->alloc_buf == NULL && ares_buf_ensure_space(buf, 1) != ARES_SUCCESS) {
     return NULL; /* LCOV_EXCL_LINE: OutOfMemory */
   }
   ptr  = buf->alloc_buf;
@@ -327,12 +325,12 @@ unsigned char *ares__buf_finish_bin(ares__buf_t *buf, size_t *len)
   return ptr;
 }
 
-char *ares__buf_finish_str(ares__buf_t *buf, size_t *len)
+char *ares_buf_finish_str(ares_buf_t *buf, size_t *len)
 {
   char  *ptr;
   size_t mylen;
 
-  ptr = (char *)ares__buf_finish_bin(buf, &mylen);
+  ptr = (char *)ares_buf_finish_bin(buf, &mylen);
   if (ptr == NULL) {
     return NULL;
   }
@@ -341,14 +339,14 @@ char *ares__buf_finish_str(ares__buf_t *buf, size_t *len)
     *len = mylen;
   }
 
-  /* NOTE: ensured via ares__buf_ensure_space() that there is always at least
+  /* NOTE: ensured via ares_buf_ensure_space() that there is always at least
    *       1 extra byte available for this specific use-case */
   ptr[mylen] = 0;
 
   return ptr;
 }
 
-void ares__buf_tag(ares__buf_t *buf)
+void ares_buf_tag(ares_buf_t *buf)
 {
   if (buf == NULL) {
     return;
@@ -357,7 +355,7 @@ void ares__buf_tag(ares__buf_t *buf)
   buf->tag_offset = buf->offset;
 }
 
-ares_status_t ares__buf_tag_rollback(ares__buf_t *buf)
+ares_status_t ares_buf_tag_rollback(ares_buf_t *buf)
 {
   if (buf == NULL || buf->tag_offset == SIZE_MAX) {
     return ARES_EFORMERR;
@@ -368,7 +366,7 @@ ares_status_t ares__buf_tag_rollback(ares__buf_t *buf)
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__buf_tag_clear(ares__buf_t *buf)
+ares_status_t ares_buf_tag_clear(ares_buf_t *buf)
 {
   if (buf == NULL || buf->tag_offset == SIZE_MAX) {
     return ARES_EFORMERR;
@@ -378,7 +376,7 @@ ares_status_t ares__buf_tag_clear(ares__buf_t *buf)
   return ARES_SUCCESS;
 }
 
-const unsigned char *ares__buf_tag_fetch(const ares__buf_t *buf, size_t *len)
+const unsigned char *ares_buf_tag_fetch(const ares_buf_t *buf, size_t *len)
 {
   if (buf == NULL || buf->tag_offset == SIZE_MAX || len == NULL) {
     return NULL;
@@ -388,7 +386,7 @@ const unsigned char *ares__buf_tag_fetch(const ares__buf_t *buf, size_t *len)
   return buf->data + buf->tag_offset;
 }
 
-size_t ares__buf_tag_length(const ares__buf_t *buf)
+size_t ares_buf_tag_length(const ares_buf_t *buf)
 {
   if (buf == NULL || buf->tag_offset == SIZE_MAX) {
     return 0;
@@ -396,11 +394,11 @@ size_t ares__buf_tag_length(const ares__buf_t *buf)
   return buf->offset - buf->tag_offset;
 }
 
-ares_status_t ares__buf_tag_fetch_bytes(const ares__buf_t *buf,
-                                        unsigned char *bytes, size_t *len)
+ares_status_t ares_buf_tag_fetch_bytes(const ares_buf_t *buf,
+                                       unsigned char *bytes, size_t *len)
 {
   size_t               ptr_len = 0;
-  const unsigned char *ptr     = ares__buf_tag_fetch(buf, &ptr_len);
+  const unsigned char *ptr     = ares_buf_tag_fetch(buf, &ptr_len);
 
   if (ptr == NULL || bytes == NULL || len == NULL) {
     return ARES_EFORMERR;
@@ -418,8 +416,25 @@ ares_status_t ares__buf_tag_fetch_bytes(const ares__buf_t *buf,
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__buf_tag_fetch_string(const ares__buf_t *buf, char *str,
-                                         size_t len)
+ares_status_t ares_buf_tag_fetch_constbuf(const ares_buf_t *buf,
+                                          ares_buf_t      **newbuf)
+{
+  size_t               ptr_len = 0;
+  const unsigned char *ptr     = ares_buf_tag_fetch(buf, &ptr_len);
+
+  if (ptr == NULL || newbuf == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  *newbuf = ares_buf_create_const(ptr, ptr_len);
+  if (*newbuf == NULL) {
+    return ARES_ENOMEM;
+  }
+  return ARES_SUCCESS;
+}
+
+ares_status_t ares_buf_tag_fetch_string(const ares_buf_t *buf, char *str,
+                                        size_t len)
 {
   size_t        out_len;
   ares_status_t status;
@@ -432,7 +447,7 @@ ares_status_t ares__buf_tag_fetch_string(const ares__buf_t *buf, char *str,
   /* Space for NULL terminator */
   out_len = len - 1;
 
-  status = ares__buf_tag_fetch_bytes(buf, (unsigned char *)str, &out_len);
+  status = ares_buf_tag_fetch_bytes(buf, (unsigned char *)str, &out_len);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -442,7 +457,7 @@ ares_status_t ares__buf_tag_fetch_string(const ares__buf_t *buf, char *str,
 
   /* Validate string is printable */
   for (i = 0; i < out_len; i++) {
-    if (!ares__isprint(str[i])) {
+    if (!ares_isprint(str[i])) {
       return ARES_EBADSTR;
     }
   }
@@ -450,7 +465,32 @@ ares_status_t ares__buf_tag_fetch_string(const ares__buf_t *buf, char *str,
   return ARES_SUCCESS;
 }
 
-static const unsigned char *ares__buf_fetch(const ares__buf_t *buf, size_t *len)
+ares_status_t ares_buf_tag_fetch_strdup(const ares_buf_t *buf, char **str)
+{
+  size_t               ptr_len = 0;
+  const unsigned char *ptr     = ares_buf_tag_fetch(buf, &ptr_len);
+
+  if (ptr == NULL || str == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  if (!ares_str_isprint((const char *)ptr, ptr_len)) {
+    return ARES_EBADSTR;
+  }
+
+  *str = ares_malloc(ptr_len + 1);
+  if (*str == NULL) {
+    return ARES_ENOMEM;
+  }
+
+  if (ptr_len > 0) {
+    memcpy(*str, ptr, ptr_len);
+  }
+  (*str)[ptr_len] = 0;
+  return ARES_SUCCESS;
+}
+
+static const unsigned char *ares_buf_fetch(const ares_buf_t *buf, size_t *len)
 {
   if (len != NULL) {
     *len = 0;
@@ -468,9 +508,9 @@ static const unsigned char *ares__buf_fetch(const ares__buf_t *buf, size_t *len)
   return buf->data + buf->offset;
 }
 
-ares_status_t ares__buf_consume(ares__buf_t *buf, size_t len)
+ares_status_t ares_buf_consume(ares_buf_t *buf, size_t len)
 {
-  size_t remaining_len = ares__buf_len(buf);
+  size_t remaining_len = ares_buf_len(buf);
 
   if (remaining_len < len) {
     return ARES_EBADRESP;
@@ -480,10 +520,10 @@ ares_status_t ares__buf_consume(ares__buf_t *buf, size_t len)
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__buf_fetch_be16(ares__buf_t *buf, unsigned short *u16)
+ares_status_t ares_buf_fetch_be16(ares_buf_t *buf, unsigned short *u16)
 {
   size_t               remaining_len;
-  const unsigned char *ptr = ares__buf_fetch(buf, &remaining_len);
+  const unsigned char *ptr = ares_buf_fetch(buf, &remaining_len);
   unsigned int         u32;
 
   if (buf == NULL || u16 == NULL || remaining_len < sizeof(*u16)) {
@@ -495,13 +535,13 @@ ares_status_t ares__buf_fetch_be16(ares__buf_t *buf, unsigned short *u16)
   u32  = ((unsigned int)(ptr[0]) << 8 | (unsigned int)ptr[1]);
   *u16 = (unsigned short)(u32 & 0xFFFF);
 
-  return ares__buf_consume(buf, sizeof(*u16));
+  return ares_buf_consume(buf, sizeof(*u16));
 }
 
-ares_status_t ares__buf_fetch_be32(ares__buf_t *buf, unsigned int *u32)
+ares_status_t ares_buf_fetch_be32(ares_buf_t *buf, unsigned int *u32)
 {
   size_t               remaining_len;
-  const unsigned char *ptr = ares__buf_fetch(buf, &remaining_len);
+  const unsigned char *ptr = ares_buf_fetch(buf, &remaining_len);
 
   if (buf == NULL || u32 == NULL || remaining_len < sizeof(*u32)) {
     return ARES_EBADRESP;
@@ -510,29 +550,29 @@ ares_status_t ares__buf_fetch_be32(ares__buf_t *buf, unsigned int *u32)
   *u32 = ((unsigned int)(ptr[0]) << 24 | (unsigned int)(ptr[1]) << 16 |
           (unsigned int)(ptr[2]) << 8 | (unsigned int)(ptr[3]));
 
-  return ares__buf_consume(buf, sizeof(*u32));
+  return ares_buf_consume(buf, sizeof(*u32));
 }
 
-ares_status_t ares__buf_fetch_bytes(ares__buf_t *buf, unsigned char *bytes,
-                                    size_t len)
+ares_status_t ares_buf_fetch_bytes(ares_buf_t *buf, unsigned char *bytes,
+                                   size_t len)
 {
   size_t               remaining_len;
-  const unsigned char *ptr = ares__buf_fetch(buf, &remaining_len);
+  const unsigned char *ptr = ares_buf_fetch(buf, &remaining_len);
 
   if (buf == NULL || bytes == NULL || len == 0 || remaining_len < len) {
     return ARES_EBADRESP;
   }
 
   memcpy(bytes, ptr, len);
-  return ares__buf_consume(buf, len);
+  return ares_buf_consume(buf, len);
 }
 
-ares_status_t ares__buf_fetch_bytes_dup(ares__buf_t *buf, size_t len,
-                                        ares_bool_t     null_term,
-                                        unsigned char **bytes)
+ares_status_t ares_buf_fetch_bytes_dup(ares_buf_t *buf, size_t len,
+                                       ares_bool_t     null_term,
+                                       unsigned char **bytes)
 {
   size_t               remaining_len;
-  const unsigned char *ptr = ares__buf_fetch(buf, &remaining_len);
+  const unsigned char *ptr = ares_buf_fetch(buf, &remaining_len);
 
   if (buf == NULL || bytes == NULL || len == 0 || remaining_len < len) {
     return ARES_EBADRESP;
@@ -547,18 +587,26 @@ ares_status_t ares__buf_fetch_bytes_dup(ares__buf_t *buf, size_t len,
   if (null_term) {
     (*bytes)[len] = 0;
   }
-  return ares__buf_consume(buf, len);
+  return ares_buf_consume(buf, len);
 }
 
-ares_status_t ares__buf_fetch_str_dup(ares__buf_t *buf, size_t len, char **str)
+ares_status_t ares_buf_fetch_str_dup(ares_buf_t *buf, size_t len, char **str)
 {
   size_t               remaining_len;
-  const unsigned char *ptr = ares__buf_fetch(buf, &remaining_len);
+  size_t               i;
+  const unsigned char *ptr = ares_buf_fetch(buf, &remaining_len);
 
   if (buf == NULL || str == NULL || len == 0 || remaining_len < len) {
     return ARES_EBADRESP;
   }
 
+  /* Validate string is printable */
+  for (i = 0; i < len; i++) {
+    if (!ares_isprint(ptr[i])) {
+      return ARES_EBADSTR;
+    }
+  }
+
   *str = ares_malloc(len + 1);
   if (*str == NULL) {
     return ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
@@ -567,30 +615,30 @@ ares_status_t ares__buf_fetch_str_dup(ares__buf_t *buf, size_t len, char **str)
   memcpy(*str, ptr, len);
   (*str)[len] = 0;
 
-  return ares__buf_consume(buf, len);
+  return ares_buf_consume(buf, len);
 }
 
-ares_status_t ares__buf_fetch_bytes_into_buf(ares__buf_t *buf,
-                                             ares__buf_t *dest, size_t len)
+ares_status_t ares_buf_fetch_bytes_into_buf(ares_buf_t *buf, ares_buf_t *dest,
+                                            size_t len)
 {
   size_t               remaining_len;
-  const unsigned char *ptr = ares__buf_fetch(buf, &remaining_len);
+  const unsigned char *ptr = ares_buf_fetch(buf, &remaining_len);
   ares_status_t        status;
 
   if (buf == NULL || dest == NULL || len == 0 || remaining_len < len) {
     return ARES_EBADRESP;
   }
 
-  status = ares__buf_append(dest, ptr, len);
+  status = ares_buf_append(dest, ptr, len);
   if (status != ARES_SUCCESS) {
     return status;
   }
 
-  return ares__buf_consume(buf, len);
+  return ares_buf_consume(buf, len);
 }
 
-static ares_bool_t ares__is_whitespace(unsigned char c,
-                                       ares_bool_t   include_linefeed)
+static ares_bool_t ares_is_whitespace(unsigned char c,
+                                      ares_bool_t   include_linefeed)
 {
   switch (c) {
     case '\r':
@@ -607,11 +655,11 @@ static ares_bool_t ares__is_whitespace(unsigned char c,
   return ARES_FALSE;
 }
 
-size_t ares__buf_consume_whitespace(ares__buf_t *buf,
-                                    ares_bool_t  include_linefeed)
+size_t ares_buf_consume_whitespace(ares_buf_t *buf,
+                                   ares_bool_t include_linefeed)
 {
   size_t               remaining_len = 0;
-  const unsigned char *ptr           = ares__buf_fetch(buf, &remaining_len);
+  const unsigned char *ptr           = ares_buf_fetch(buf, &remaining_len);
   size_t               i;
 
   if (ptr == NULL) {
@@ -619,21 +667,21 @@ size_t ares__buf_consume_whitespace(ares__buf_t *buf,
   }
 
   for (i = 0; i < remaining_len; i++) {
-    if (!ares__is_whitespace(ptr[i], include_linefeed)) {
+    if (!ares_is_whitespace(ptr[i], include_linefeed)) {
       break;
     }
   }
 
   if (i > 0) {
-    ares__buf_consume(buf, i);
+    ares_buf_consume(buf, i);
   }
   return i;
 }
 
-size_t ares__buf_consume_nonwhitespace(ares__buf_t *buf)
+size_t ares_buf_consume_nonwhitespace(ares_buf_t *buf)
 {
   size_t               remaining_len = 0;
-  const unsigned char *ptr           = ares__buf_fetch(buf, &remaining_len);
+  const unsigned char *ptr           = ares_buf_fetch(buf, &remaining_len);
   size_t               i;
 
   if (ptr == NULL) {
@@ -641,21 +689,21 @@ size_t ares__buf_consume_nonwhitespace(ares__buf_t *buf)
   }
 
   for (i = 0; i < remaining_len; i++) {
-    if (ares__is_whitespace(ptr[i], ARES_TRUE)) {
+    if (ares_is_whitespace(ptr[i], ARES_TRUE)) {
       break;
     }
   }
 
   if (i > 0) {
-    ares__buf_consume(buf, i);
+    ares_buf_consume(buf, i);
   }
   return i;
 }
 
-size_t ares__buf_consume_line(ares__buf_t *buf, ares_bool_t include_linefeed)
+size_t ares_buf_consume_line(ares_buf_t *buf, ares_bool_t include_linefeed)
 {
   size_t               remaining_len = 0;
-  const unsigned char *ptr           = ares__buf_fetch(buf, &remaining_len);
+  const unsigned char *ptr           = ares_buf_fetch(buf, &remaining_len);
   size_t               i;
 
   if (ptr == NULL) {
@@ -674,28 +722,40 @@ size_t ares__buf_consume_line(ares__buf_t *buf, ares_bool_t include_linefeed)
   }
 
   if (i > 0) {
-    ares__buf_consume(buf, i);
+    ares_buf_consume(buf, i);
   }
   return i;
 }
 
-size_t ares__buf_consume_until_charset(ares__buf_t         *buf,
-                                       const unsigned char *charset, size_t len,
-                                       ares_bool_t require_charset)
+size_t ares_buf_consume_until_charset(ares_buf_t          *buf,
+                                      const unsigned char *charset, size_t len,
+                                      ares_bool_t require_charset)
 {
   size_t               remaining_len = 0;
-  const unsigned char *ptr           = ares__buf_fetch(buf, &remaining_len);
-  size_t               i;
+  const unsigned char *ptr           = ares_buf_fetch(buf, &remaining_len);
+  size_t               pos;
   ares_bool_t          found = ARES_FALSE;
 
   if (ptr == NULL || charset == NULL || len == 0) {
     return 0;
   }
 
-  for (i = 0; i < remaining_len; i++) {
+  /* Optimize for single character searches */
+  if (len == 1) {
+    const unsigned char *p = memchr(ptr, charset[0], remaining_len);
+    if (p != NULL) {
+      found = ARES_TRUE;
+      pos   = (size_t)(p - ptr);
+    } else {
+      pos = remaining_len;
+    }
+    goto done;
+  }
+
+  for (pos = 0; pos < remaining_len; pos++) {
     size_t j;
     for (j = 0; j < len; j++) {
-      if (ptr[i] == charset[j]) {
+      if (ptr[pos] == charset[j]) {
         found = ARES_TRUE;
         goto done;
       }
@@ -704,20 +764,50 @@ size_t ares__buf_consume_until_charset(ares__buf_t         *buf,
 
 done:
   if (require_charset && !found) {
+    return SIZE_MAX;
+  }
+
+  if (pos > 0) {
+    ares_buf_consume(buf, pos);
+  }
+  return pos;
+}
+
+size_t ares_buf_consume_until_seq(ares_buf_t *buf, const unsigned char *seq,
+                                  size_t len, ares_bool_t require_seq)
+{
+  size_t               remaining_len = 0;
+  const unsigned char *ptr           = ares_buf_fetch(buf, &remaining_len);
+  const unsigned char *p;
+  size_t               consume_len = 0;
+
+  if (ptr == NULL || seq == NULL || len == 0) {
     return 0;
   }
 
-  if (i > 0) {
-    ares__buf_consume(buf, i);
+  p = ares_memmem(ptr, remaining_len, seq, len);
+  if (require_seq && p == NULL) {
+    return SIZE_MAX;
   }
-  return i;
+
+  if (p != NULL) {
+    consume_len = (size_t)(p - ptr);
+  } else {
+    consume_len = remaining_len;
+  }
+
+  if (consume_len > 0) {
+    ares_buf_consume(buf, consume_len);
+  }
+
+  return consume_len;
 }
 
-size_t ares__buf_consume_charset(ares__buf_t *buf, const unsigned char *charset,
-                                 size_t len)
+size_t ares_buf_consume_charset(ares_buf_t *buf, const unsigned char *charset,
+                                size_t len)
 {
   size_t               remaining_len = 0;
-  const unsigned char *ptr           = ares__buf_fetch(buf, &remaining_len);
+  const unsigned char *ptr           = ares_buf_fetch(buf, &remaining_len);
   size_t               i;
 
   if (ptr == NULL || charset == NULL || len == 0) {
@@ -738,28 +828,30 @@ size_t ares__buf_consume_charset(ares__buf_t *buf, const unsigned char *charset,
   }
 
   if (i > 0) {
-    ares__buf_consume(buf, i);
+    ares_buf_consume(buf, i);
   }
   return i;
 }
 
-static void ares__buf_destroy_cb(void *arg)
+static void ares_buf_destroy_cb(void *arg)
 {
-  ares__buf_destroy(arg);
+  ares_buf_t **buf = arg;
+  ares_buf_destroy(*buf);
 }
 
-static ares_bool_t ares__buf_split_isduplicate(ares__llist_t       *list,
-                                               const unsigned char *val,
-                                               size_t               len,
-                                               ares__buf_split_t    flags)
+static ares_bool_t ares_buf_split_isduplicate(ares_array_t        *arr,
+                                              const unsigned char *val,
+                                              size_t               len,
+                                              ares_buf_split_t     flags)
 {
-  ares__llist_node_t *node;
+  size_t i;
+  size_t num = ares_array_len(arr);
 
-  for (node = ares__llist_node_first(list); node != NULL;
-       node = ares__llist_node_next(node)) {
-    const ares__buf_t   *buf  = ares__llist_node_val(node);
-    size_t               plen = 0;
-    const unsigned char *ptr  = ares__buf_peek(buf, &plen);
+  for (i = 0; i < num; i++) {
+    ares_buf_t         **bufptr = ares_array_at(arr, i);
+    const ares_buf_t    *buf    = *bufptr;
+    size_t               plen   = 0;
+    const unsigned char *ptr    = ares_buf_peek(buf, &plen);
 
     /* Can't be duplicate if lengths mismatch */
     if (plen != len) {
@@ -767,61 +859,62 @@ static ares_bool_t ares__buf_split_isduplicate(ares__llist_t       *list,
     }
 
     if (flags & ARES_BUF_SPLIT_CASE_INSENSITIVE) {
-      if (ares__memeq_ci(ptr, val, len)) {
+      if (ares_memeq_ci(ptr, val, len)) {
         return ARES_TRUE;
       }
     } else {
-      if (memcmp(ptr, val, len) == 0) {
+      if (ares_memeq(ptr, val, len)) {
         return ARES_TRUE;
       }
     }
   }
+
   return ARES_FALSE;
 }
 
-ares_status_t ares__buf_split(ares__buf_t *buf, const unsigned char *delims,
-                              size_t delims_len, ares__buf_split_t flags,
-                              size_t max_sections, ares__llist_t **list)
+ares_status_t ares_buf_split(ares_buf_t *buf, const unsigned char *delims,
+                             size_t delims_len, ares_buf_split_t flags,
+                             size_t max_sections, ares_array_t **arr)
 {
   ares_status_t status = ARES_SUCCESS;
   ares_bool_t   first  = ARES_TRUE;
 
-  if (buf == NULL || delims == NULL || delims_len == 0 || list == NULL) {
+  if (buf == NULL || delims == NULL || delims_len == 0 || arr == NULL) {
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  *list = ares__llist_create(ares__buf_destroy_cb);
-  if (*list == NULL) {
+  *arr = ares_array_create(sizeof(ares_buf_t *), ares_buf_destroy_cb);
+  if (*arr == NULL) {
     status = ARES_ENOMEM;
     goto done;
   }
 
-  while (ares__buf_len(buf)) {
+  while (ares_buf_len(buf)) {
     size_t               len = 0;
     const unsigned char *ptr;
 
     if (first) {
       /* No delimiter yet, just tag the start */
-      ares__buf_tag(buf);
+      ares_buf_tag(buf);
     } else {
-      if (flags & ARES_BUF_SPLIT_DONT_CONSUME_DELIMS) {
+      if (flags & ARES_BUF_SPLIT_KEEP_DELIMS) {
         /* tag then eat delimiter so its first byte in buffer */
-        ares__buf_tag(buf);
-        ares__buf_consume(buf, 1);
+        ares_buf_tag(buf);
+        ares_buf_consume(buf, 1);
       } else {
         /* throw away delimiter */
-        ares__buf_consume(buf, 1);
-        ares__buf_tag(buf);
+        ares_buf_consume(buf, 1);
+        ares_buf_tag(buf);
       }
     }
 
-    if (max_sections && ares__llist_len(*list) >= max_sections - 1) {
-      ares__buf_consume(buf, ares__buf_len(buf));
+    if (max_sections && ares_array_len(*arr) >= max_sections - 1) {
+      ares_buf_consume(buf, ares_buf_len(buf));
     } else {
-      ares__buf_consume_until_charset(buf, delims, delims_len, ARES_FALSE);
+      ares_buf_consume_until_charset(buf, delims, delims_len, ARES_FALSE);
     }
 
-    ptr = ares__buf_tag_fetch(buf, &len);
+    ptr = ares_buf_tag_fetch(buf, &len);
 
     /* Shouldn't be possible */
     if (ptr == NULL) {
@@ -832,7 +925,7 @@ ares_status_t ares__buf_split(ares__buf_t *buf, const unsigned char *delims,
     if (flags & ARES_BUF_SPLIT_LTRIM) {
       size_t i;
       for (i = 0; i < len; i++) {
-        if (!ares__is_whitespace(ptr[i], ARES_TRUE)) {
+        if (!ares_is_whitespace(ptr[i], ARES_TRUE)) {
           break;
         }
       }
@@ -841,22 +934,22 @@ ares_status_t ares__buf_split(ares__buf_t *buf, const unsigned char *delims,
     }
 
     if (flags & ARES_BUF_SPLIT_RTRIM) {
-      while (len > 0 && ares__is_whitespace(ptr[len - 1], ARES_TRUE)) {
+      while (len > 0 && ares_is_whitespace(ptr[len - 1], ARES_TRUE)) {
         len--;
       }
     }
 
     if (len != 0 || flags & ARES_BUF_SPLIT_ALLOW_BLANK) {
-      ares__buf_t *data;
+      ares_buf_t *data;
 
       if (!(flags & ARES_BUF_SPLIT_NO_DUPLICATES) ||
-          !ares__buf_split_isduplicate(*list, ptr, len, flags)) {
+          !ares_buf_split_isduplicate(*arr, ptr, len, flags)) {
         /* Since we don't allow const buffers of 0 length, and user wants
          * 0-length buffers, swap what we do here */
         if (len) {
-          data = ares__buf_create_const(ptr, len);
+          data = ares_buf_create_const(ptr, len);
         } else {
-          data = ares__buf_create();
+          data = ares_buf_create();
         }
 
         if (data == NULL) {
@@ -864,9 +957,9 @@ ares_status_t ares__buf_split(ares__buf_t *buf, const unsigned char *delims,
           goto done;
         }
 
-        if (ares__llist_insert_last(*list, data) == NULL) {
-          ares__buf_destroy(data);
-          status = ARES_ENOMEM;
+        status = ares_array_insertdata_last(*arr, &data);
+        if (status != ARES_SUCCESS) {
+          ares_buf_destroy(data);
           goto done;
         }
       }
@@ -877,18 +970,110 @@ ares_status_t ares__buf_split(ares__buf_t *buf, const unsigned char *delims,
 
 done:
   if (status != ARES_SUCCESS) {
-    ares__llist_destroy(*list);
-    *list = NULL;
+    ares_array_destroy(*arr);
+    *arr = NULL;
   }
 
   return status;
 }
 
-ares_bool_t ares__buf_begins_with(const ares__buf_t   *buf,
-                                  const unsigned char *data, size_t data_len)
+static void ares_free_split_array(void *arg)
+{
+  void **ptr = arg;
+  ares_free(*ptr);
+}
+
+ares_status_t ares_buf_split_str_array(ares_buf_t          *buf,
+                                       const unsigned char *delims,
+                                       size_t               delims_len,
+                                       ares_buf_split_t     flags,
+                                       size_t max_sections, ares_array_t **arr)
+{
+  ares_status_t status;
+  ares_array_t *split = NULL;
+  size_t        i;
+  size_t        len;
+
+  if (arr == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  *arr = NULL;
+
+  status = ares_buf_split(buf, delims, delims_len, flags, max_sections, &split);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  *arr = ares_array_create(sizeof(char *), ares_free_split_array);
+  if (*arr == NULL) {
+    status = ARES_ENOMEM;
+    goto done;
+  }
+
+  len = ares_array_len(split);
+  for (i = 0; i < len; i++) {
+    ares_buf_t **bufptr = ares_array_at(split, i);
+    ares_buf_t  *lbuf   = *bufptr;
+    char        *str    = NULL;
+
+    status = ares_buf_fetch_str_dup(lbuf, ares_buf_len(lbuf), &str);
+    if (status != ARES_SUCCESS) {
+      goto done;
+    }
+
+    status = ares_array_insertdata_last(*arr, &str);
+    if (status != ARES_SUCCESS) {
+      ares_free(str);
+      goto done;
+    }
+  }
+
+done:
+  ares_array_destroy(split);
+  if (status != ARES_SUCCESS) {
+    ares_array_destroy(*arr);
+    *arr = NULL;
+  }
+  return status;
+}
+
+ares_status_t ares_buf_split_str(ares_buf_t *buf, const unsigned char *delims,
+                                 size_t delims_len, ares_buf_split_t flags,
+                                 size_t max_sections, char ***strs,
+                                 size_t *nstrs)
+{
+  ares_status_t status;
+  ares_array_t *arr = NULL;
+
+  if (strs == NULL || nstrs == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  *strs  = NULL;
+  *nstrs = 0;
+
+  status = ares_buf_split_str_array(buf, delims, delims_len, flags,
+                                    max_sections, &arr);
+
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+done:
+  if (status == ARES_SUCCESS) {
+    *strs = ares_array_finish(arr, nstrs);
+  } else {
+    ares_array_destroy(arr);
+  }
+  return status;
+}
+
+ares_bool_t ares_buf_begins_with(const ares_buf_t    *buf,
+                                 const unsigned char *data, size_t data_len)
 {
   size_t               remaining_len = 0;
-  const unsigned char *ptr           = ares__buf_fetch(buf, &remaining_len);
+  const unsigned char *ptr           = ares_buf_fetch(buf, &remaining_len);
 
   if (ptr == NULL || data == NULL || data_len == 0) {
     return ARES_FALSE;
@@ -905,7 +1090,7 @@ ares_bool_t ares__buf_begins_with(const ares__buf_t   *buf,
   return ARES_TRUE;
 }
 
-size_t ares__buf_len(const ares__buf_t *buf)
+size_t ares_buf_len(const ares_buf_t *buf)
 {
   if (buf == NULL) {
     return 0;
@@ -914,12 +1099,28 @@ size_t ares__buf_len(const ares__buf_t *buf)
   return buf->data_len - buf->offset;
 }
 
-const unsigned char *ares__buf_peek(const ares__buf_t *buf, size_t *len)
+const unsigned char *ares_buf_peek(const ares_buf_t *buf, size_t *len)
 {
-  return ares__buf_fetch(buf, len);
+  return ares_buf_fetch(buf, len);
+}
+
+ares_status_t ares_buf_peek_byte(const ares_buf_t *buf, unsigned char *b)
+{
+  size_t               remaining_len = 0;
+  const unsigned char *ptr           = ares_buf_fetch(buf, &remaining_len);
+
+  if (buf == NULL || b == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  if (remaining_len == 0) {
+    return ARES_EBADRESP;
+  }
+  *b = ptr[0];
+  return ARES_SUCCESS;
 }
 
-size_t ares__buf_get_position(const ares__buf_t *buf)
+size_t ares_buf_get_position(const ares_buf_t *buf)
 {
   if (buf == NULL) {
     return 0;
@@ -927,7 +1128,7 @@ size_t ares__buf_get_position(const ares__buf_t *buf)
   return buf->offset;
 }
 
-ares_status_t ares__buf_set_position(ares__buf_t *buf, size_t idx)
+ares_status_t ares_buf_set_position(ares_buf_t *buf, size_t idx)
 {
   if (buf == NULL) {
     return ARES_EFORMERR;
@@ -941,84 +1142,14 @@ ares_status_t ares__buf_set_position(ares__buf_t *buf, size_t idx)
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__buf_parse_dns_abinstr(ares__buf_t *buf,
-                                          size_t       remaining_len,
-                                          ares__dns_multistring_t **strs,
-                                          ares_bool_t validate_printable)
-{
-  unsigned char len;
-  ares_status_t status   = ARES_EBADRESP;
-  size_t        orig_len = ares__buf_len(buf);
-
-  if (buf == NULL) {
-    return ARES_EFORMERR;
-  }
-
-  if (remaining_len == 0) {
-    return ARES_EBADRESP;
-  }
-
-  if (strs != NULL) {
-    *strs = ares__dns_multistring_create();
-    if (*strs == NULL) {
-      return ARES_ENOMEM;
-    }
-  }
-
-  while (orig_len - ares__buf_len(buf) < remaining_len) {
-    status = ares__buf_fetch_bytes(buf, &len, 1);
-    if (status != ARES_SUCCESS) {
-      break; /* LCOV_EXCL_LINE: DefensiveCoding */
-    }
-
-    if (len) {
-      /* When used by the _str() parser, it really needs to be validated to
-       * be a valid printable ascii string.  Do that here */
-      if (validate_printable && ares__buf_len(buf) >= len) {
-        size_t      mylen;
-        const char *data = (const char *)ares__buf_peek(buf, &mylen);
-        if (!ares__str_isprint(data, len)) {
-          status = ARES_EBADSTR;
-          break;
-        }
-      }
-
-      if (strs != NULL) {
-        unsigned char *data = NULL;
-        status = ares__buf_fetch_bytes_dup(buf, len, ARES_TRUE, &data);
-        if (status != ARES_SUCCESS) {
-          break;
-        }
-        status = ares__dns_multistring_add_own(*strs, data, len);
-        if (status != ARES_SUCCESS) {
-          ares_free(data);
-          break;
-        }
-      } else {
-        status = ares__buf_consume(buf, len);
-        if (status != ARES_SUCCESS) {
-          break;
-        }
-      }
-    }
-  }
-
-  if (status != ARES_SUCCESS && strs != NULL) {
-    ares__dns_multistring_destroy(*strs);
-    *strs = NULL;
-  }
-
-  return status;
-}
-
 static ares_status_t
-  ares__buf_parse_dns_binstr_int(ares__buf_t *buf, size_t remaining_len,
-                                 unsigned char **bin, size_t *bin_len,
-                                 ares_bool_t validate_printable)
+  ares_buf_parse_dns_binstr_int(ares_buf_t *buf, size_t remaining_len,
+                                unsigned char **bin, size_t *bin_len,
+                                ares_bool_t validate_printable)
 {
   unsigned char len;
   ares_status_t status = ARES_EBADRESP;
-  ares__buf_t  *binbuf = NULL;
+  ares_buf_t   *binbuf = NULL;
 
   if (buf == NULL) {
     return ARES_EFORMERR;
@@ -1028,12 +1159,12 @@ static ares_status_t
     return ARES_EBADRESP;
   }
 
-  binbuf = ares__buf_create();
+  binbuf = ares_buf_create();
   if (binbuf == NULL) {
     return ARES_ENOMEM;
   }
 
-  status = ares__buf_fetch_bytes(buf, &len, 1);
+  status = ares_buf_fetch_bytes(buf, &len, 1);
   if (status != ARES_SUCCESS) {
     goto done; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
@@ -1048,32 +1179,32 @@ static ares_status_t
   if (len) {
     /* When used by the _str() parser, it really needs to be validated to
      * be a valid printable ascii string.  Do that here */
-    if (validate_printable && ares__buf_len(buf) >= len) {
+    if (validate_printable && ares_buf_len(buf) >= len) {
       size_t      mylen;
-      const char *data = (const char *)ares__buf_peek(buf, &mylen);
-      if (!ares__str_isprint(data, len)) {
+      const char *data = (const char *)ares_buf_peek(buf, &mylen);
+      if (!ares_str_isprint(data, len)) {
         status = ARES_EBADSTR;
         goto done;
       }
     }
 
     if (bin != NULL) {
-      status = ares__buf_fetch_bytes_into_buf(buf, binbuf, len);
+      status = ares_buf_fetch_bytes_into_buf(buf, binbuf, len);
     } else {
-      status = ares__buf_consume(buf, len);
+      status = ares_buf_consume(buf, len);
     }
   }
 
 done:
   if (status != ARES_SUCCESS) {
-    ares__buf_destroy(binbuf);
+    ares_buf_destroy(binbuf);
   } else {
     if (bin != NULL) {
       size_t mylen = 0;
-      /* NOTE: we use ares__buf_finish_str() here as we guarantee NULL
+      /* NOTE: we use ares_buf_finish_str() here as we guarantee NULL
        *       Termination even though we are technically returning binary data.
        */
-      *bin     = (unsigned char *)ares__buf_finish_str(binbuf, &mylen);
+      *bin     = (unsigned char *)ares_buf_finish_str(binbuf, &mylen);
       *bin_len = mylen;
     }
   }
@@ -1081,32 +1212,32 @@ static ares_status_t
   return status;
 }
 
-ares_status_t ares__buf_parse_dns_binstr(ares__buf_t *buf, size_t remaining_len,
-                                         unsigned char **bin, size_t *bin_len)
+ares_status_t ares_buf_parse_dns_binstr(ares_buf_t *buf, size_t remaining_len,
+                                        unsigned char **bin, size_t *bin_len)
 {
-  return ares__buf_parse_dns_binstr_int(buf, remaining_len, bin, bin_len,
-                                        ARES_FALSE);
+  return ares_buf_parse_dns_binstr_int(buf, remaining_len, bin, bin_len,
+                                       ARES_FALSE);
 }
 
-ares_status_t ares__buf_parse_dns_str(ares__buf_t *buf, size_t remaining_len,
-                                      char **str)
+ares_status_t ares_buf_parse_dns_str(ares_buf_t *buf, size_t remaining_len,
+                                     char **str)
 {
   size_t len;
 
-  return ares__buf_parse_dns_binstr_int(buf, remaining_len,
-                                        (unsigned char **)str, &len, ARES_TRUE);
+  return ares_buf_parse_dns_binstr_int(buf, remaining_len,
+                                       (unsigned char **)str, &len, ARES_TRUE);
 }
 
-ares_status_t ares__buf_append_num_dec(ares__buf_t *buf, size_t num, size_t len)
+ares_status_t ares_buf_append_num_dec(ares_buf_t *buf, size_t num, size_t len)
 {
   size_t i;
   size_t mod;
 
   if (len == 0) {
-    len = ares__count_digits(num);
+    len = ares_count_digits(num);
   }
 
-  mod = ares__pow(10, len);
+  mod = ares_pow(10, len);
 
   for (i = len; i > 0; i--) {
     size_t        digit = (num % mod);
@@ -1120,7 +1251,7 @@ ares_status_t ares__buf_append_num_dec(ares__buf_t *buf, size_t num, size_t len)
     }
 
     digit  /= mod;
-    status  = ares__buf_append_byte(buf, '0' + (unsigned char)(digit & 0xFF));
+    status  = ares_buf_append_byte(buf, '0' + (unsigned char)(digit & 0xFF));
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -1128,18 +1259,18 @@ ares_status_t ares__buf_append_num_dec(ares__buf_t *buf, size_t num, size_t len)
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__buf_append_num_hex(ares__buf_t *buf, size_t num, size_t len)
+ares_status_t ares_buf_append_num_hex(ares_buf_t *buf, size_t num, size_t len)
 {
   size_t                     i;
   static const unsigned char hexbytes[] = "0123456789ABCDEF";
 
   if (len == 0) {
-    len = ares__count_hexdigits(num);
+    len = ares_count_hexdigits(num);
   }
 
   for (i = len; i > 0; i--) {
     ares_status_t status;
-    status = ares__buf_append_byte(buf, hexbytes[(num >> ((i - 1) * 4)) & 0xF]);
+    status = ares_buf_append_byte(buf, hexbytes[(num >> ((i - 1) * 4)) & 0xF]);
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -1147,48 +1278,48 @@ ares_status_t ares__buf_append_num_hex(ares__buf_t *buf, size_t num, size_t len)
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__buf_append_str(ares__buf_t *buf, const char *str)
+ares_status_t ares_buf_append_str(ares_buf_t *buf, const char *str)
 {
-  return ares__buf_append(buf, (const unsigned char *)str, ares_strlen(str));
+  return ares_buf_append(buf, (const unsigned char *)str, ares_strlen(str));
 }
 
-static ares_status_t ares__buf_hexdump_line(ares__buf_t *buf, size_t idx,
-                                            const unsigned char *data,
-                                            size_t               len)
+static ares_status_t ares_buf_hexdump_line(ares_buf_t *buf, size_t idx,
+                                           const unsigned char *data,
+                                           size_t               len)
 {
   size_t        i;
   ares_status_t status;
 
   /* Address */
-  status = ares__buf_append_num_hex(buf, idx, 6);
+  status = ares_buf_append_num_hex(buf, idx, 6);
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   /* | */
-  status = ares__buf_append_str(buf, " | ");
+  status = ares_buf_append_str(buf, " | ");
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   for (i = 0; i < 16; i++) {
     if (i >= len) {
-      status = ares__buf_append_str(buf, "  ");
+      status = ares_buf_append_str(buf, "  ");
     } else {
-      status = ares__buf_append_num_hex(buf, data[i], 2);
+      status = ares_buf_append_num_hex(buf, data[i], 2);
     }
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
 
-    status = ares__buf_append_byte(buf, ' ');
+    status = ares_buf_append_byte(buf, ' ');
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
   }
 
   /* | */
-  status = ares__buf_append_str(buf, " | ");
+  status = ares_buf_append_str(buf, " | ");
   if (status != ARES_SUCCESS) {
     return status; /* LCOV_EXCL_LINE: OutOfMemory */
   }
@@ -1197,24 +1328,24 @@ static ares_status_t ares__buf_hexdump_line(ares__buf_t *buf, size_t idx,
     if (i >= len) {
       break;
     }
-    status = ares__buf_append_byte(buf, ares__isprint(data[i]) ? data[i] : '.');
+    status = ares_buf_append_byte(buf, ares_isprint(data[i]) ? data[i] : '.');
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
   }
 
-  return ares__buf_append_byte(buf, '\n');
+  return ares_buf_append_byte(buf, '\n');
 }
 
-ares_status_t ares__buf_hexdump(ares__buf_t *buf, const unsigned char *data,
-                                size_t len)
+ares_status_t ares_buf_hexdump(ares_buf_t *buf, const unsigned char *data,
+                               size_t len)
 {
   size_t i;
 
   /* Each line is 16 bytes */
   for (i = 0; i < len; i += 16) {
     ares_status_t status;
-    status = ares__buf_hexdump_line(buf, i, data + i, len - i);
+    status = ares_buf_hexdump_line(buf, i, data + i, len - i);
     if (status != ARES_SUCCESS) {
       return status; /* LCOV_EXCL_LINE: OutOfMemory */
     }
@@ -1223,7 +1354,7 @@ ares_status_t ares__buf_hexdump(ares__buf_t *buf, const unsigned char *data,
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__buf_load_file(const char *filename, ares__buf_t *buf)
+ares_status_t ares_buf_load_file(const char *filename, ares_buf_t *buf)
 {
   FILE          *fp        = NULL;
   unsigned char *ptr       = NULL;
@@ -1238,7 +1369,7 @@ ares_status_t ares__buf_load_file(const char *filename, ares__buf_t *buf)
 
   fp = fopen(filename, "rb");
   if (fp == NULL) {
-    int error = ERRNO;
+    int error = errno;
     switch (error) {
       case ENOENT:
       case ESRCH:
@@ -1278,7 +1409,7 @@ ares_status_t ares__buf_load_file(const char *filename, ares__buf_t *buf)
 
   /* Read entire data into buffer */
   ptr_len = len;
-  ptr     = ares__buf_append_start(buf, &ptr_len);
+  ptr     = ares_buf_append_start(buf, &ptr_len);
   if (ptr == NULL) {
     status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
     goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
@@ -1290,7 +1421,7 @@ ares_status_t ares__buf_load_file(const char *filename, ares__buf_t *buf)
     goto done;           /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  ares__buf_append_finish(buf, len);
+  ares_buf_append_finish(buf, len);
   status = ARES_SUCCESS;
 
 done:
diff --git a/deps/cares/src/lib/str/ares_str.c b/deps/cares/src/lib/str/ares_str.c
index ade61041eb920b..f6bfabf11f4467 100644
--- a/deps/cares/src/lib/str/ares_str.c
+++ b/deps/cares/src/lib/str/ares_str.c
@@ -101,14 +101,30 @@ ares_bool_t ares_str_isnum(const char *str)
   }
 
   for (i = 0; str[i] != 0; i++) {
-    if (str[i] < '0' || str[i] > '9') {
+    if (!ares_isdigit(str[i])) {
       return ARES_FALSE;
     }
   }
   return ARES_TRUE;
 }
 
-void ares__str_rtrim(char *str)
+ares_bool_t ares_str_isalnum(const char *str)
+{
+  size_t i;
+
+  if (str == NULL || *str == 0) {
+    return ARES_FALSE;
+  }
+
+  for (i = 0; str[i] != 0; i++) {
+    if (!ares_isdigit(str[i]) && !ares_isalpha(str[i])) {
+      return ARES_FALSE;
+    }
+  }
+  return ARES_TRUE;
+}
+
+void ares_str_rtrim(char *str)
 {
   size_t len;
   size_t i;
@@ -119,14 +135,14 @@ void ares__str_rtrim(char *str)
 
   len = ares_strlen(str);
   for (i = len; i > 0; i--) {
-    if (!ares__isspace(str[i - 1])) {
+    if (!ares_isspace(str[i - 1])) {
       break;
     }
   }
   str[i] = 0;
 }
 
-void ares__str_ltrim(char *str)
+void ares_str_ltrim(char *str)
 {
   size_t i;
   size_t len;
@@ -135,7 +151,7 @@ void ares__str_ltrim(char *str)
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
   }
 
-  for (i = 0; str[i] != 0 && ares__isspace(str[i]); i++) {
+  for (i = 0; str[i] != 0 && ares_isspace(str[i]); i++) {
     /* Do nothing */
   }
 
@@ -150,15 +166,15 @@ void ares__str_ltrim(char *str)
   str[len - i] = 0;
 }
 
-void ares__str_trim(char *str)
+void ares_str_trim(char *str)
 {
-  ares__str_ltrim(str);
-  ares__str_rtrim(str);
+  ares_str_ltrim(str);
+  ares_str_rtrim(str);
 }
 
 /* tolower() is locale-specific.  Use a lookup table fast conversion that only
  * operates on ASCII */
-static const unsigned char ares__tolower_lookup[] = {
+static const unsigned char ares_tolower_lookup[] = {
   0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C,
   0x0D, 0x0E, 0x0F, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19,
   0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F, 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26,
@@ -181,81 +197,80 @@ static const unsigned char ares__tolower_lookup[] = {
   0xF7, 0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
 };
 
-unsigned char ares__tolower(unsigned char c)
+unsigned char ares_tolower(unsigned char c)
 {
-  return ares__tolower_lookup[c];
+  return ares_tolower_lookup[c];
 }
 
-ares_bool_t ares__memeq_ci(const unsigned char *ptr, const unsigned char *val,
-                           size_t len)
+void ares_str_lower(char *str)
 {
   size_t i;
-  for (i = 0; i < len; i++) {
-    if (ares__tolower_lookup[ptr[i]] != ares__tolower_lookup[val[i]]) {
-      return ARES_FALSE;
-    }
+
+  if (str == NULL) {
+    return;
+  }
+
+  for (i = 0; str[i] != 0; i++) {
+    str[i] = (char)ares_tolower((unsigned char)str[i]);
   }
-  return ARES_TRUE;
 }
 
-ares_bool_t ares__isspace(int ch)
+unsigned char *ares_memmem(const unsigned char *big, size_t big_len,
+                           const unsigned char *little, size_t little_len)
 {
-  switch (ch) {
-    case '\r':
-    case '\t':
-    case ' ':
-    case '\v':
-    case '\f':
-    case '\n':
-      return ARES_TRUE;
-    default:
+  unsigned char *ptr;
+
+  if (big == NULL || little == NULL || big_len == 0 || little_len == 0) {
+    return NULL;
+  }
+
+#ifdef HAVE_MEMMEM
+  ptr = memmem(big, big_len, little, little_len);
+  return ptr;
+#else
+  while (1) {
+    ptr = memchr(big, little[0], big_len);
+    if (ptr == NULL) {
+      break;
+    }
+
+    big_len -= (size_t)(ptr - big);
+    big      = ptr;
+    if (big_len < little_len) {
       break;
+    }
+
+    if (memcmp(big, little, little_len) == 0) {
+      return ptr;
+    }
+
+    big++;
+    big_len--;
   }
-  return ARES_FALSE;
+
+  return NULL;
+#endif
 }
 
-ares_bool_t ares__isprint(int ch)
+ares_bool_t ares_memeq(const unsigned char *ptr, const unsigned char *val,
+                       size_t len)
 {
-  if (ch >= 0x20 && ch <= 0x7E) {
-    return ARES_TRUE;
-  }
-  return ARES_FALSE;
+  return memcmp(ptr, val, len) == 0 ? ARES_TRUE : ARES_FALSE;
 }
 
-/* Character set allowed by hostnames.  This is to include the normal
- * domain name character set plus:
- *  - underscores which are used in SRV records.
- *  - Forward slashes such as are used for classless in-addr.arpa
- *    delegation (CNAMEs)
- *  - Asterisks may be used for wildcard domains in CNAMEs as seen in the
- *    real world.
- * While RFC 2181 section 11 does state not to do validation,
- * that applies to servers, not clients.  Vulnerabilities have been
- * reported when this validation is not performed.  Security is more
- * important than edge-case compatibility (which is probably invalid
- * anyhow). */
-ares_bool_t ares__is_hostnamech(int ch)
+ares_bool_t ares_memeq_ci(const unsigned char *ptr, const unsigned char *val,
+                          size_t len)
 {
-  /* [A-Za-z0-9-*._/]
-   * Don't use isalnum() as it is locale-specific
-   */
-  if (ch >= 'A' && ch <= 'Z') {
-    return ARES_TRUE;
-  }
-  if (ch >= 'a' && ch <= 'z') {
-    return ARES_TRUE;
-  }
-  if (ch >= '0' && ch <= '9') {
-    return ARES_TRUE;
-  }
-  if (ch == '-' || ch == '.' || ch == '_' || ch == '/' || ch == '*') {
-    return ARES_TRUE;
+  size_t i;
+  for (i = 0; i < len; i++) {
+    if (ares_tolower_lookup[ptr[i]] != ares_tolower_lookup[val[i]]) {
+      return ARES_FALSE;
+    }
   }
-
-  return ARES_FALSE;
+  return ARES_TRUE;
 }
 
-ares_bool_t ares__is_hostname(const char *str)
+ares_bool_t ares_is_hostname(const char *str)
 {
   size_t i;
 
@@ -264,14 +279,14 @@ ares_bool_t ares__is_hostname(const char *str)
   }
 
   for (i = 0; str[i] != 0; i++) {
-    if (!ares__is_hostnamech(str[i])) {
+    if (!ares_is_hostnamech(str[i])) {
       return ARES_FALSE;
     }
   }
   return ARES_TRUE;
 }
 
-ares_bool_t ares__str_isprint(const char *str, size_t len)
+ares_bool_t ares_str_isprint(const char *str, size_t len)
 {
   size_t i;
 
@@ -280,9 +295,197 @@ ares_bool_t ares__str_isprint(const char *str, size_t len)
   }
 
   for (i = 0; i < len; i++) {
-    if (!ares__isprint(str[i])) {
+    if (!ares_isprint(str[i])) {
       return ARES_FALSE;
     }
   }
   return ARES_TRUE;
 }
+
+int ares_strcmp(const char *a, const char *b)
+{
+  if (a == NULL && b == NULL) {
+    return 0;
+  }
+
+  if (a != NULL && b == NULL) {
+    if (*a == 0) {
+      return 0;
+    }
+    return 1;
+  }
+
+  if (a == NULL && b != NULL) {
+    if (*b == 0) {
+      return 0;
+    }
+    return -1;
+  }
+
+  return strcmp(a, b);
+}
+
+int ares_strncmp(const char *a, const char *b, size_t n)
+{
+  if (n == 0) {
+    return 0;
+  }
+
+  if (a == NULL && b == NULL) {
+    return 0;
+  }
+
+  if (a != NULL && b == NULL) {
+    if (*a == 0) {
+      return 0;
+    }
+    return 1;
+  }
+
+  if (a == NULL && b != NULL) {
+    if (*b == 0) {
+      return 0;
+    }
+    return -1;
+  }
+
+  return strncmp(a, b, n);
+}
+
+int ares_strcasecmp(const char *a, const char *b)
+{
+  if (a == NULL && b == NULL) {
+    return 0;
+  }
+
+  if (a != NULL && b == NULL) {
+    if (*a == 0) {
+      return 0;
+    }
+    return 1;
+  }
+
+  if (a == NULL && b != NULL) {
+    if (*b == 0) {
+      return 0;
+    }
+    return -1;
+  }
+
+#if defined(HAVE_STRCASECMP)
+  return strcasecmp(a, b);
+#elif defined(HAVE_STRCMPI)
+  return strcmpi(a, b);
+#elif defined(HAVE_STRICMP)
+  return stricmp(a, b);
+#else
+  {
+    size_t i;
+
+    for (i = 0; i < (size_t)-1; i++) {
+      int c1 = ares_tolower(a[i]);
+      int c2 = ares_tolower(b[i]);
+      if (c1 != c2) {
+        return c1 - c2;
+      }
+      if (!c1) {
+        break;
+      }
+    }
+  }
+  return 0;
+#endif
+}
+
+int ares_strncasecmp(const char *a, const char *b, size_t n)
+{
+  if (n == 0) {
+    return 0;
+  }
+
+  if (a == NULL && b == NULL) {
+    return 0;
+  }
+
+  if (a != NULL && b == NULL) {
+    if (*a == 0) {
+      return 0;
+    }
+    return 1;
+  }
+
+  if (a == NULL && b != NULL) {
+    if (*b == 0) {
+      return 0;
+    }
+    return -1;
+  }
+
+#if defined(HAVE_STRNCASECMP)
+  return strncasecmp(a, b, n);
+#elif defined(HAVE_STRNCMPI)
+  return strncmpi(a, b, n);
+#elif defined(HAVE_STRNICMP)
+  return strnicmp(a, b, n);
+#else
+  {
+    size_t i;
+
+    for (i = 0; i < n; i++) {
+      int c1 = ares_tolower(a[i]);
+      int c2 = ares_tolower(b[i]);
+      if (c1 != c2) {
+        return c1 - c2;
+      }
+      if (!c1) {
+        break;
+      }
+    }
+  }
+  return 0;
+#endif
+}
+
+ares_bool_t ares_strcaseeq(const char *a, const char *b)
+{
+  return ares_strcasecmp(a, b) == 0 ? ARES_TRUE : ARES_FALSE;
+}
+
+ares_bool_t ares_strcaseeq_max(const char *a, const char *b, size_t n)
+{
+  return ares_strncasecmp(a, b, n) == 0 ? ARES_TRUE : ARES_FALSE;
+}
+
+ares_bool_t ares_streq(const char *a, const char *b)
+{
+  return ares_strcmp(a, b) == 0 ? ARES_TRUE : ARES_FALSE;
+}
+
+ares_bool_t ares_streq_max(const char *a, const char *b, size_t n)
+{
+  return ares_strncmp(a, b, n) == 0 ? ARES_TRUE : ARES_FALSE;
+}
+
+void ares_free_array(void *arrp, size_t nmembers, void (*freefunc)(void *))
+{
+  size_t i;
+  void **arr = arrp;
+
+  if (arr == NULL) {
+    return;
+  }
+
+  if (freefunc != NULL) {
+    if (nmembers == SIZE_MAX) {
+      for (i = 0; arr[i] != NULL; i++) {
+        freefunc(arr[i]);
+      }
+    } else {
+      for (i = 0; i < nmembers; i++) {
+        freefunc(arr[i]);
+      }
+    }
+  }
+
+  ares_free(arr);
+}
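
Reviewer note on the new comparison helpers in ares_str.c: ares_strcmp(),
ares_strncmp(), ares_strcasecmp() and ares_strncasecmp() all treat a NULL
pointer and an empty string as equal, and order NULL before any non-empty
string. The sketch below condenses that rule into a standalone function. It is
not the patch's code: null_safe_strcmp is a hypothetical name, and it only
guarantees the same sign as the patch's result (the patch returns exactly 1 or
-1 in the NULL cases).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical condensation of the NULL handling in the patch's ares_strcmp():
 * a NULL argument is treated as the empty string before comparing. */
static int null_safe_strcmp(const char *a, const char *b)
{
  if (a == NULL) {
    a = "";
  }
  if (b == NULL) {
    b = "";
  }
  /* strcmp() only promises the sign of the result, not its magnitude. */
  return strcmp(a, b);
}
```

Callers that only need equality, as ares_streq() and ares_strcaseeq() do in
the patch, are unaffected by the difference in magnitude.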
diff --git a/deps/cares/src/lib/str/ares_str.h b/deps/cares/src/lib/str/ares_str.h
deleted file mode 100644
index 440758c21be61f..00000000000000
--- a/deps/cares/src/lib/str/ares_str.h
+++ /dev/null
@@ -1,89 +0,0 @@
-/* MIT License
- *
- * Copyright (c) 1998 Massachusetts Institute of Technology
- * Copyright (c) The c-ares project and its contributors
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- * SPDX-License-Identifier: MIT
- */
-#ifndef __ARES_STR_H
-#define __ARES_STR_H
-
-char         *ares_strdup(const char *s1);
-
-size_t        ares_strlen(const char *str);
-
-/*! Copy string from source to destination with destination buffer size
- *  provided.  The destination is guaranteed to be null terminated, if the
- *  provided buffer isn't large enough, only those bytes from the source that
- *  will fit will be copied.
- *
- *  \param[out] dest       Destination buffer
- *  \param[in]  src        Source to copy
- *  \param[in]  dest_size  Size of destination buffer
- *  \return String length.  Will be at most dest_size-1
- */
-size_t        ares_strcpy(char *dest, const char *src, size_t dest_size);
-
-ares_bool_t   ares_str_isnum(const char *str);
-
-void          ares__str_ltrim(char *str);
-void          ares__str_rtrim(char *str);
-void          ares__str_trim(char *str);
-
-unsigned char ares__tolower(unsigned char c);
-ares_bool_t   ares__memeq_ci(const unsigned char *ptr, const unsigned char *val,
-                             size_t len);
-
-ares_bool_t   ares__isspace(int ch);
-ares_bool_t   ares__isprint(int ch);
-ares_bool_t   ares__is_hostnamech(int ch);
-
-ares_bool_t   ares__is_hostname(const char *str);
-
-/*! Validate the string provided is printable.  The length specified must be
- *  at least the size of the buffer provided.  If a NULL-terminator is hit
- *  before the length provided is hit, this will not be considered a valid
- *  printable string.  This does not validate that the string is actually
- *  NULL terminated.
- *
- *  \param[in] str  Buffer containing string to evaluate.
- *  \param[in] len  Number of characters to evaluate within provided buffer.
- *                  If 0, will return TRUE since it did not hit an exception.
- *  \return ARES_TRUE if the entire string is printable, ARES_FALSE if not.
- */
-ares_bool_t   ares__str_isprint(const char *str, size_t len);
-
-/* We only care about ASCII rules */
-#define ares__isascii(x) (((unsigned char)x) <= 127)
-#define ares__isdigit(x) \
-  (((unsigned char)x) >= '0' && ((unsigned char)x) <= '9')
-#define ares__isxdigit(x)                                      \
-  (ares__isdigit(x) ||                                         \
-   (((unsigned char)x) >= 'a' && ((unsigned char)x) <= 'f') || \
-   (((unsigned char)x) >= 'A' && ((unsigned char)x) <= 'F'))
-#define ares__isupper(x) \
-  (((unsigned char)x) >= 'A' && ((unsigned char)x) <= 'Z')
-#define ares__islower(x) \
-  (((unsigned char)x) >= 'a' && ((unsigned char)x) <= 'z')
-#define ares__isalpha(x) (ares__islower(x) || ares__isupper(x))
-
-#endif /* __ARES_STR_H */
diff --git a/deps/cares/src/lib/str/ares_strsplit.c b/deps/cares/src/lib/str/ares_strsplit.c
index dee307d7799f51..4431c5044d5d3c 100644
--- a/deps/cares/src/lib/str/ares_strsplit.c
+++ b/deps/cares/src/lib/str/ares_strsplit.c
@@ -25,21 +25,12 @@
  */
 #include "ares_private.h"
 
-void ares__strsplit_free(char **elms, size_t num_elm)
+void ares_strsplit_free(char **elms, size_t num_elm)
 {
-  size_t i;
-
-  if (elms == NULL) {
-    return;
-  }
-
-  for (i = 0; i < num_elm; i++) {
-    ares_free(elms[i]);
-  }
-  ares_free(elms);
+  ares_free_array(elms, num_elm, ares_free);
 }
 
-char **ares__strsplit_duplicate(char **elms, size_t num_elm)
+char **ares_strsplit_duplicate(char **elms, size_t num_elm)
 {
   size_t i;
   char **out;
@@ -56,23 +47,19 @@ char **ares__strsplit_duplicate(char **elms, size_t num_elm)
   for (i = 0; i < num_elm; i++) {
     out[i] = ares_strdup(elms[i]);
     if (out[i] == NULL) {
-      ares__strsplit_free(out, num_elm); /* LCOV_EXCL_LINE: OutOfMemory */
-      return NULL;                       /* LCOV_EXCL_LINE: OutOfMemory */
+      ares_strsplit_free(out, num_elm); /* LCOV_EXCL_LINE: OutOfMemory */
+      return NULL;                      /* LCOV_EXCL_LINE: OutOfMemory */
     }
   }
 
   return out;
 }
 
-char **ares__strsplit(const char *in, const char *delms, size_t *num_elm)
+char **ares_strsplit(const char *in, const char *delms, size_t *num_elm)
 {
-  ares_status_t       status;
-  ares__buf_t        *buf   = NULL;
-  ares__llist_t      *llist = NULL;
-  ares__llist_node_t *node;
-  char              **out = NULL;
-  size_t              cnt = 0;
-  size_t              idx = 0;
+  ares_status_t status;
+  ares_buf_t   *buf = NULL;
+  char        **out = NULL;
 
   if (in == NULL || delms == NULL || num_elm == NULL) {
     return NULL; /* LCOV_EXCL_LINE: DefensiveCoding */
@@ -80,52 +67,22 @@ char **ares__strsplit(const char *in, const char *delms, size_t *num_elm)
 
   *num_elm = 0;
 
-  buf = ares__buf_create_const((const unsigned char *)in, ares_strlen(in));
+  buf = ares_buf_create_const((const unsigned char *)in, ares_strlen(in));
   if (buf == NULL) {
     return NULL;
   }
 
-  status = ares__buf_split(
+  status = ares_buf_split_str(
     buf, (const unsigned char *)delms, ares_strlen(delms),
-    ARES_BUF_SPLIT_NO_DUPLICATES | ARES_BUF_SPLIT_CASE_INSENSITIVE, 0, &llist);
+    ARES_BUF_SPLIT_NO_DUPLICATES | ARES_BUF_SPLIT_CASE_INSENSITIVE, 0, &out,
+    num_elm);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  cnt = ares__llist_len(llist);
-  if (cnt == 0) {
-    status = ARES_EFORMERR;
-    goto done;
-  }
-
-
-  out = ares_malloc_zero(cnt * sizeof(*out));
-  if (out == NULL) {
-    status = ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
-    goto done;            /* LCOV_EXCL_LINE: OutOfMemory */
-  }
-
-  for (node = ares__llist_node_first(llist); node != NULL;
-       node = ares__llist_node_next(node)) {
-    ares__buf_t *val  = ares__llist_node_val(node);
-    char        *temp = NULL;
-
-    status = ares__buf_fetch_str_dup(val, ares__buf_len(val), &temp);
-    if (status != ARES_SUCCESS) {
-      goto done;
-    }
-
-    out[idx++] = temp;
-  }
-
-  *num_elm = cnt;
-  status   = ARES_SUCCESS;
-
 done:
-  ares__llist_destroy(llist);
-  ares__buf_destroy(buf);
+  ares_buf_destroy(buf);
   if (status != ARES_SUCCESS) {
-    ares__strsplit_free(out, cnt);
     out = NULL;
   }
 
diff --git a/deps/cares/src/lib/str/ares_strsplit.h b/deps/cares/src/lib/str/ares_strsplit.h
index ee997804f012f1..0da090263af691 100644
--- a/deps/cares/src/lib/str/ares_strsplit.h
+++ b/deps/cares/src/lib/str/ares_strsplit.h
@@ -40,12 +40,12 @@
  * returns an allocated array of allocated string elements.
  *
  */
-char **ares__strsplit(const char *in, const char *delms, size_t *num_elm);
+char **ares_strsplit(const char *in, const char *delms, size_t *num_elm);
 
-/* Frees the result returned from ares__strsplit(). */
-void   ares__strsplit_free(char **elms, size_t num_elm);
+/* Frees the result returned from ares_strsplit(). */
+void   ares_strsplit_free(char **elms, size_t num_elm);
 
 /* Duplicate the array */
-char **ares__strsplit_duplicate(char **elms, size_t num_elm);
+char **ares_strsplit_duplicate(char **elms, size_t num_elm);
 
 #endif /* HEADER_CARES_STRSPLIT_H */
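
Reviewer note on ares_strsplit_free(): in this patch it delegates to the new
ares_free_array(), which frees either a counted array or, when nmembers is
SIZE_MAX, a NULL-terminated one. The sketch below mirrors that logic outside of
c-ares so it can be compiled on its own. free_array_sketch, counting_free and
freed_count are hypothetical names, and the C library's free() stands in for
ares_free().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's ares_free_array(). */
static void free_array_sketch(void *arrp, size_t nmembers,
                              void (*freefunc)(void *))
{
  size_t i;
  void **arr = arrp;

  if (arr == NULL) {
    return;
  }

  if (freefunc != NULL) {
    if (nmembers == SIZE_MAX) {
      /* Sentinel count: walk until the NULL terminator. */
      for (i = 0; arr[i] != NULL; i++) {
        freefunc(arr[i]);
      }
    } else {
      for (i = 0; i < nmembers; i++) {
        freefunc(arr[i]);
      }
    }
  }

  /* The array itself is always released, even without a member destructor. */
  free(arr);
}

/* Illustrative member destructor that counts how many members it released. */
static size_t freed_count = 0;

static void counting_free(void *ptr)
{
  freed_count++;
  free(ptr);
}
```

A counted call corresponds to what ares_strsplit_free(elms, num_elm) does in
the patch; the SIZE_MAX form is only safe when the array really ends with a
NULL entry.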
diff --git a/deps/cares/src/lib/util/ares__threads.h b/deps/cares/src/lib/util/ares__threads.h
deleted file mode 100644
index 108354dfc1e17f..00000000000000
--- a/deps/cares/src/lib/util/ares__threads.h
+++ /dev/null
@@ -1,60 +0,0 @@
-/* MIT License
- *
- * Copyright (c) 2023 Brad House
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- * SPDX-License-Identifier: MIT
- */
-#ifndef __ARES__THREADS_H
-#define __ARES__THREADS_H
-
-struct ares__thread_mutex;
-typedef struct ares__thread_mutex ares__thread_mutex_t;
-
-ares__thread_mutex_t             *ares__thread_mutex_create(void);
-void ares__thread_mutex_destroy(ares__thread_mutex_t *mut);
-void ares__thread_mutex_lock(ares__thread_mutex_t *mut);
-void ares__thread_mutex_unlock(ares__thread_mutex_t *mut);
-
-
-struct ares__thread_cond;
-typedef struct ares__thread_cond ares__thread_cond_t;
-
-ares__thread_cond_t             *ares__thread_cond_create(void);
-void          ares__thread_cond_destroy(ares__thread_cond_t *cond);
-void          ares__thread_cond_signal(ares__thread_cond_t *cond);
-void          ares__thread_cond_broadcast(ares__thread_cond_t *cond);
-ares_status_t ares__thread_cond_wait(ares__thread_cond_t  *cond,
-                                     ares__thread_mutex_t *mut);
-ares_status_t ares__thread_cond_timedwait(ares__thread_cond_t  *cond,
-                                          ares__thread_mutex_t *mut,
-                                          unsigned long         timeout_ms);
-
-
-struct ares__thread;
-typedef struct ares__thread ares__thread_t;
-
-typedef void *(*ares__thread_func_t)(void *arg);
-ares_status_t ares__thread_create(ares__thread_t    **thread,
-                                  ares__thread_func_t func, void *arg);
-ares_status_t ares__thread_join(ares__thread_t *thread, void **rv);
-
-#endif
diff --git a/deps/cares/src/lib/util/ares__iface_ips.c b/deps/cares/src/lib/util/ares_iface_ips.c
similarity index 69%
rename from deps/cares/src/lib/util/ares__iface_ips.c
rename to deps/cares/src/lib/util/ares_iface_ips.c
index 56dc25790412e4..46cb291e300ec2 100644
--- a/deps/cares/src/lib/util/ares__iface_ips.c
+++ b/deps/cares/src/lib/util/ares_iface_ips.c
@@ -59,41 +59,40 @@
 #endif
 
 
-static ares_status_t ares__iface_ips_enumerate(ares__iface_ips_t *ips,
-                                               const char        *name);
+static ares_status_t ares_iface_ips_enumerate(ares_iface_ips_t *ips,
+                                              const char       *name);
 
 typedef struct {
-  char                  *name;
-  struct ares_addr       addr;
-  unsigned char          netmask;
-  unsigned int           ll_scope;
-  ares__iface_ip_flags_t flags;
-} ares__iface_ip_t;
-
-struct ares__iface_ips {
-  ares__array_t         *ips; /*!< Type is ares__iface_ip_t */
-  ares__iface_ip_flags_t enum_flags;
+  char                 *name;
+  struct ares_addr      addr;
+  unsigned char         netmask;
+  unsigned int          ll_scope;
+  ares_iface_ip_flags_t flags;
+} ares_iface_ip_t;
+
+struct ares_iface_ips {
+  ares_array_t         *ips; /*!< Type is ares_iface_ip_t */
+  ares_iface_ip_flags_t enum_flags;
 };
 
-static void ares__iface_ip_free_cb(void *arg)
+static void ares_iface_ip_free_cb(void *arg)
 {
-  ares__iface_ip_t *ip = arg;
+  ares_iface_ip_t *ip = arg;
   if (ip == NULL) {
     return;
   }
   ares_free(ip->name);
 }
 
-static ares__iface_ips_t *ares__iface_ips_alloc(ares__iface_ip_flags_t flags)
+static ares_iface_ips_t *ares_iface_ips_alloc(ares_iface_ip_flags_t flags)
 {
-  ares__iface_ips_t *ips = ares_malloc_zero(sizeof(*ips));
+  ares_iface_ips_t *ips = ares_malloc_zero(sizeof(*ips));
   if (ips == NULL) {
     return NULL; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   ips->enum_flags = flags;
-  ips->ips =
-    ares__array_create(sizeof(ares__iface_ip_t), ares__iface_ip_free_cb);
+  ips->ips = ares_array_create(sizeof(ares_iface_ip_t), ares_iface_ip_free_cb);
   if (ips->ips == NULL) {
     ares_free(ips); /* LCOV_EXCL_LINE: OutOfMemory */
     return NULL;    /* LCOV_EXCL_LINE: OutOfMemory */
@@ -101,18 +100,18 @@ static ares__iface_ips_t *ares__iface_ips_alloc(ares__iface_ip_flags_t flags)
   return ips;
 }
 
-void ares__iface_ips_destroy(ares__iface_ips_t *ips)
+void ares_iface_ips_destroy(ares_iface_ips_t *ips)
 {
   if (ips == NULL) {
     return;
   }
 
-  ares__array_destroy(ips->ips);
+  ares_array_destroy(ips->ips);
   ares_free(ips);
 }
 
-ares_status_t ares__iface_ips(ares__iface_ips_t    **ips,
-                              ares__iface_ip_flags_t flags, const char *name)
+ares_status_t ares_iface_ips(ares_iface_ips_t    **ips,
+                             ares_iface_ip_flags_t flags, const char *name)
 {
   ares_status_t status;
 
@@ -120,15 +119,15 @@ ares_status_t ares__iface_ips(ares__iface_ips_t    **ips,
     return ARES_EFORMERR;
   }
 
-  *ips = ares__iface_ips_alloc(flags);
+  *ips = ares_iface_ips_alloc(flags);
   if (*ips == NULL) {
     return ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
-  status = ares__iface_ips_enumerate(*ips, name);
+  status = ares_iface_ips_enumerate(*ips, name);
   if (status != ARES_SUCCESS) {
     /* LCOV_EXCL_START: UntestablePath */
-    ares__iface_ips_destroy(*ips);
+    ares_iface_ips_destroy(*ips);
     *ips = NULL;
     return status;
     /* LCOV_EXCL_STOP */
@@ -138,12 +137,12 @@ ares_status_t ares__iface_ips(ares__iface_ips_t    **ips,
 }
 
 static ares_status_t
-  ares__iface_ips_add(ares__iface_ips_t *ips, ares__iface_ip_flags_t flags,
-                      const char *name, const struct ares_addr *addr,
-                      unsigned char netmask, unsigned int ll_scope)
+  ares_iface_ips_add(ares_iface_ips_t *ips, ares_iface_ip_flags_t flags,
+                     const char *name, const struct ares_addr *addr,
+                     unsigned char netmask, unsigned int ll_scope)
 {
-  ares__iface_ip_t *ip;
-  ares_status_t     status;
+  ares_iface_ip_t *ip;
+  ares_status_t    status;
 
   if (ips == NULL || name == NULL || addr == NULL) {
     return ARES_EFORMERR; /* LCOV_EXCL_LINE: DefensiveCoding */
@@ -162,7 +161,7 @@ static ares_status_t
   }
 
   /* Check for link-local */
-  if (ares__addr_is_linklocal(addr)) {
+  if (ares_addr_is_linklocal(addr)) {
     flags |= ARES_IFACE_IP_LINKLOCAL;
   }
   if (flags & ARES_IFACE_IP_LINKLOCAL &&
@@ -190,7 +189,7 @@ static ares_status_t
     }
   }
 
-  status = ares__array_insert_last((void **)&ip, ips->ips);
+  status = ares_array_insert_last((void **)&ip, ips->ips);
   if (status != ARES_SUCCESS) {
     return status;
   }
@@ -203,30 +202,30 @@ static ares_status_t
   memcpy(&ip->addr, addr, sizeof(*addr));
   ip->name = ares_strdup(name);
   if (ip->name == NULL) {
-    ares__array_remove_last(ips->ips);
+    ares_array_remove_last(ips->ips);
     return ARES_ENOMEM; /* LCOV_EXCL_LINE: OutOfMemory */
   }
 
   return ARES_SUCCESS;
 }
 
-size_t ares__iface_ips_cnt(const ares__iface_ips_t *ips)
+size_t ares_iface_ips_cnt(const ares_iface_ips_t *ips)
 {
   if (ips == NULL) {
     return 0;
   }
-  return ares__array_len(ips->ips);
+  return ares_array_len(ips->ips);
 }
 
-const char *ares__iface_ips_get_name(const ares__iface_ips_t *ips, size_t idx)
+const char *ares_iface_ips_get_name(const ares_iface_ips_t *ips, size_t idx)
 {
-  const ares__iface_ip_t *ip;
+  const ares_iface_ip_t *ip;
 
   if (ips == NULL) {
     return NULL;
   }
 
-  ip = ares__array_at_const(ips->ips, idx);
+  ip = ares_array_at_const(ips->ips, idx);
   if (ip == NULL) {
     return NULL;
   }
@@ -234,16 +233,16 @@ const char *ares__iface_ips_get_name(const ares__iface_ips_t *ips, size_t idx)
   return ip->name;
 }
 
-const struct ares_addr *ares__iface_ips_get_addr(const ares__iface_ips_t *ips,
-                                                 size_t                   idx)
+const struct ares_addr *ares_iface_ips_get_addr(const ares_iface_ips_t *ips,
+                                                size_t                  idx)
 {
-  const ares__iface_ip_t *ip;
+  const ares_iface_ip_t *ip;
 
   if (ips == NULL) {
     return NULL;
   }
 
-  ip = ares__array_at_const(ips->ips, idx);
+  ip = ares_array_at_const(ips->ips, idx);
   if (ip == NULL) {
     return NULL;
   }
@@ -251,16 +250,16 @@ const struct ares_addr *ares__iface_ips_get_addr(const ares__iface_ips_t *ips,
   return &ip->addr;
 }
 
-ares__iface_ip_flags_t ares__iface_ips_get_flags(const ares__iface_ips_t *ips,
-                                                 size_t                   idx)
+ares_iface_ip_flags_t ares_iface_ips_get_flags(const ares_iface_ips_t *ips,
+                                               size_t                  idx)
 {
-  const ares__iface_ip_t *ip;
+  const ares_iface_ip_t *ip;
 
   if (ips == NULL) {
     return 0;
   }
 
-  ip = ares__array_at_const(ips->ips, idx);
+  ip = ares_array_at_const(ips->ips, idx);
   if (ip == NULL) {
     return 0;
   }
@@ -268,16 +267,16 @@ ares__iface_ip_flags_t ares__iface_ips_get_flags(const ares__iface_ips_t *ips,
   return ip->flags;
 }
 
-unsigned char ares__iface_ips_get_netmask(const ares__iface_ips_t *ips,
-                                          size_t                   idx)
+unsigned char ares_iface_ips_get_netmask(const ares_iface_ips_t *ips,
+                                         size_t                  idx)
 {
-  const ares__iface_ip_t *ip;
+  const ares_iface_ip_t *ip;
 
   if (ips == NULL) {
     return 0;
   }
 
-  ip = ares__array_at_const(ips->ips, idx);
+  ip = ares_array_at_const(ips->ips, idx);
   if (ip == NULL) {
     return 0;
   }
@@ -285,16 +284,16 @@ unsigned char ares__iface_ips_get_netmask(const ares__iface_ips_t *ips,
   return ip->netmask;
 }
 
-unsigned int ares__iface_ips_get_ll_scope(const ares__iface_ips_t *ips,
-                                          size_t                   idx)
+unsigned int ares_iface_ips_get_ll_scope(const ares_iface_ips_t *ips,
+                                         size_t                  idx)
 {
-  const ares__iface_ip_t *ip;
+  const ares_iface_ip_t *ip;
 
   if (ips == NULL) {
     return 0;
   }
 
-  ip = ares__array_at_const(ips->ips, idx);
+  ip = ares_array_at_const(ips->ips, idx);
   if (ip == NULL) {
     return 0;
   }
@@ -334,7 +333,7 @@ static ares_bool_t name_match(const char *name, const char *adapter_name,
     return ARES_TRUE;
   }
 
-  if (strcasecmp(name, adapter_name) == 0) {
+  if (ares_strcaseeq(name, adapter_name)) {
     return ARES_TRUE;
   }
 
@@ -345,8 +344,8 @@ static ares_bool_t name_match(const char *name, const char *adapter_name,
   return ARES_FALSE;
 }
 
-static ares_status_t ares__iface_ips_enumerate(ares__iface_ips_t *ips,
-                                               const char        *name)
+static ares_status_t ares_iface_ips_enumerate(ares_iface_ips_t *ips,
+                                              const char       *name)
 {
   ULONG myflags = GAA_FLAG_INCLUDE_PREFIX /*|GAA_FLAG_INCLUDE_ALL_INTERFACES */;
   ULONG outBufLen = 0;
@@ -377,7 +376,7 @@ static ares_status_t ares__iface_ips_enumerate(ares__iface_ips_t *ips,
 
   for (address = addresses; address != NULL; address = address->Next) {
     IP_ADAPTER_UNICAST_ADDRESS *ipaddr     = NULL;
-    ares__iface_ip_flags_t      addrflag   = 0;
+    ares_iface_ip_flags_t       addrflag   = 0;
     char                        ifname[64] = "";
 
 #  if defined(HAVE_CONVERTINTERFACEINDEXTOLUID) && \
@@ -431,9 +430,9 @@ static ares_status_t ares__iface_ips_enumerate(ares__iface_ips_t *ips,
         continue;
       }
 
-      status = ares__iface_ips_add(ips, addrflag, ifname, &addr,
-                                   ipaddr->OnLinkPrefixLength /* netmask */,
-                                   address->Ipv6IfIndex /* ll_scope */);
+      status = ares_iface_ips_add(ips, addrflag, ifname, &addr,
+                                  ipaddr->OnLinkPrefixLength /* netmask */,
+                                  address->Ipv6IfIndex /* ll_scope */);
 
       if (status != ARES_SUCCESS) {
         goto done;
@@ -454,13 +453,13 @@ static unsigned char count_addr_bits(const unsigned char *addr, size_t addr_len)
   unsigned char count = 0;
 
   for (i = 0; i < addr_len; i++) {
-    count += ares__count_bits_u8(addr[i]);
+    count += ares_count_bits_u8(addr[i]);
   }
   return count;
 }
 
-static ares_status_t ares__iface_ips_enumerate(ares__iface_ips_t *ips,
-                                               const char        *name)
+static ares_status_t ares_iface_ips_enumerate(ares_iface_ips_t *ips,
+                                              const char       *name)
 {
   struct ifaddrs *ifap   = NULL;
   struct ifaddrs *ifa    = NULL;
@@ -472,10 +471,10 @@ static ares_status_t ares__iface_ips_enumerate(ares__iface_ips_t *ips,
   }
 
   for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) {
-    ares__iface_ip_flags_t addrflag = 0;
-    struct ares_addr       addr;
-    unsigned char          netmask  = 0;
-    unsigned int           ll_scope = 0;
+    ares_iface_ip_flags_t addrflag = 0;
+    struct ares_addr      addr;
+    unsigned char         netmask  = 0;
+    unsigned int          ll_scope = 0;
 
     if (ifa->ifa_addr == NULL) {
       continue;
@@ -515,12 +514,12 @@ static ares_status_t ares__iface_ips_enumerate(ares__iface_ips_t *ips,
     }
 
     /* Name mismatch */
-    if (name != NULL && strcasecmp(ifa->ifa_name, name) != 0) {
+    if (name != NULL && !ares_strcaseeq(ifa->ifa_name, name)) {
       continue;
     }
 
-    status = ares__iface_ips_add(ips, addrflag, ifa->ifa_name, &addr, netmask,
-                                 ll_scope);
+    status = ares_iface_ips_add(ips, addrflag, ifa->ifa_name, &addr, netmask,
+                                ll_scope);
     if (status != ARES_SUCCESS) {
       goto done;
     }
@@ -533,8 +532,8 @@ static ares_status_t ares__iface_ips_enumerate(ares__iface_ips_t *ips,
 
 #else
 
-static ares_status_t ares__iface_ips_enumerate(ares__iface_ips_t *ips,
-                                               const char        *name)
+static ares_status_t ares_iface_ips_enumerate(ares_iface_ips_t *ips,
+                                              const char       *name)
 {
   (void)ips;
   (void)name;
@@ -544,7 +543,7 @@ static ares_status_t ares__iface_ips_enumerate(ares__iface_ips_t *ips,
 #endif
 
 
-unsigned int ares__if_nametoindex(const char *name)
+unsigned int ares_os_if_nametoindex(const char *name)
 {
 #ifdef HAVE_IF_NAMETOINDEX
   if (name == NULL) {
@@ -552,36 +551,35 @@ unsigned int ares__if_nametoindex(const char *name)
   }
   return if_nametoindex(name);
 #else
-  ares_status_t      status;
-  ares__iface_ips_t *ips = NULL;
-  size_t             i;
-  unsigned int       index = 0;
+  ares_status_t     status;
+  ares_iface_ips_t *ips = NULL;
+  size_t            i;
+  unsigned int      index = 0;
 
   if (name == NULL) {
     return 0;
   }
 
   status =
-    ares__iface_ips(&ips, ARES_IFACE_IP_V6 | ARES_IFACE_IP_LINKLOCAL, name);
+    ares_iface_ips(&ips, ARES_IFACE_IP_V6 | ARES_IFACE_IP_LINKLOCAL, name);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  for (i = 0; i < ares__iface_ips_cnt(ips); i++) {
-    if (ares__iface_ips_get_flags(ips, i) & ARES_IFACE_IP_LINKLOCAL) {
-      index = ares__iface_ips_get_ll_scope(ips, i);
+  for (i = 0; i < ares_iface_ips_cnt(ips); i++) {
+    if (ares_iface_ips_get_flags(ips, i) & ARES_IFACE_IP_LINKLOCAL) {
+      index = ares_iface_ips_get_ll_scope(ips, i);
       goto done;
     }
   }
 
 done:
-  ares__iface_ips_destroy(ips);
+  ares_iface_ips_destroy(ips);
   return index;
 #endif
 }
 
-const char *ares__if_indextoname(unsigned int index, char *name,
-                                 size_t name_len)
+const char *ares_os_if_indextoname(unsigned int index, char *name, size_t name_len)
 {
 #ifdef HAVE_IF_INDEXTONAME
   if (name_len < IF_NAMESIZE) {
@@ -589,10 +587,10 @@ const char *ares__if_indextoname(unsigned int index, char *name,
   }
   return if_indextoname(index, name);
 #else
-  ares_status_t      status;
-  ares__iface_ips_t *ips = NULL;
-  size_t             i;
-  const char        *ptr = NULL;
+  ares_status_t     status;
+  ares_iface_ips_t *ips = NULL;
+  size_t            i;
+  const char       *ptr = NULL;
 
   if (name == NULL || name_len < IF_NAMESIZE) {
     goto done;
@@ -603,22 +601,22 @@ const char *ares__if_indextoname(unsigned int index, char *name,
   }
 
   status =
-    ares__iface_ips(&ips, ARES_IFACE_IP_V6 | ARES_IFACE_IP_LINKLOCAL, NULL);
+    ares_iface_ips(&ips, ARES_IFACE_IP_V6 | ARES_IFACE_IP_LINKLOCAL, NULL);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  for (i = 0; i < ares__iface_ips_cnt(ips); i++) {
-    if (ares__iface_ips_get_flags(ips, i) & ARES_IFACE_IP_LINKLOCAL &&
-        ares__iface_ips_get_ll_scope(ips, i) == index) {
-      ares_strcpy(name, ares__iface_ips_get_name(ips, i), name_len);
+  for (i = 0; i < ares_iface_ips_cnt(ips); i++) {
+    if (ares_iface_ips_get_flags(ips, i) & ARES_IFACE_IP_LINKLOCAL &&
+        ares_iface_ips_get_ll_scope(ips, i) == index) {
+      ares_strcpy(name, ares_iface_ips_get_name(ips, i), name_len);
       ptr = name;
       goto done;
     }
   }
 
 done:
-  ares__iface_ips_destroy(ips);
+  ares_iface_ips_destroy(ips);
   return ptr;
 #endif
 }
diff --git a/deps/cares/src/lib/util/ares__iface_ips.h b/deps/cares/src/lib/util/ares_iface_ips.h
similarity index 77%
rename from deps/cares/src/lib/util/ares__iface_ips.h
rename to deps/cares/src/lib/util/ares_iface_ips.h
index 61ff736a796361..f22e09046a065b 100644
--- a/deps/cares/src/lib/util/ares__iface_ips.h
+++ b/deps/cares/src/lib/util/ares_iface_ips.h
@@ -42,18 +42,18 @@ typedef enum {
   /*! Default, enumerate all ips for online interfaces, including loopback */
   ARES_IFACE_IP_DEFAULT = (ARES_IFACE_IP_V4 | ARES_IFACE_IP_V6 |
                            ARES_IFACE_IP_LOOPBACK | ARES_IFACE_IP_LINKLOCAL)
-} ares__iface_ip_flags_t;
+} ares_iface_ip_flags_t;
 
-struct ares__iface_ips;
+struct ares_iface_ips;
 
 /*! Opaque pointer for holding enumerated interface ip addresses */
-typedef struct ares__iface_ips ares__iface_ips_t;
+typedef struct ares_iface_ips ares_iface_ips_t;
 
-/*! Destroy ip address enumeration created by ares__iface_ips().
+/*! Destroy ip address enumeration created by ares_iface_ips().
  *
  *  \param[in]  ips   Initialized IP address enumeration structure
  */
-void                           ares__iface_ips_destroy(ares__iface_ips_t *ips);
+void                          ares_iface_ips_destroy(ares_iface_ips_t *ips);
 
 /*! Enumerate ip addresses on interfaces
  *
@@ -63,15 +63,15 @@ void                           ares__iface_ips_destroy(ares__iface_ips_t *ips);
  *  \return ARES_ENOMEM on out of memory, ARES_ENOTIMP if not supported on
  *          the system, ARES_SUCCESS on success
  */
-ares_status_t                  ares__iface_ips(ares__iface_ips_t    **ips,
-                                               ares__iface_ip_flags_t flags, const char *name);
+ares_status_t                 ares_iface_ips(ares_iface_ips_t    **ips,
+                                             ares_iface_ip_flags_t flags, const char *name);
 
 /*! Count of ips enumerated
  *
  * \param[in]  ips   Initialized IP address enumeration structure
  * \return count
  */
-size_t      ares__iface_ips_cnt(const ares__iface_ips_t *ips);
+size_t                        ares_iface_ips_cnt(const ares_iface_ips_t *ips);
 
 /*! Retrieve interface name
  *
@@ -79,7 +79,7 @@ size_t      ares__iface_ips_cnt(const ares__iface_ips_t *ips);
  * \param[in]  idx   Index of entry to pull
  * \return interface name
  */
-const char *ares__iface_ips_get_name(const ares__iface_ips_t *ips, size_t idx);
+const char *ares_iface_ips_get_name(const ares_iface_ips_t *ips, size_t idx);
 
 /*! Retrieve interface address
  *
@@ -87,8 +87,8 @@ const char *ares__iface_ips_get_name(const ares__iface_ips_t *ips, size_t idx);
  * \param[in]  idx   Index of entry to pull
  * \return interface address
  */
-const struct ares_addr *ares__iface_ips_get_addr(const ares__iface_ips_t *ips,
-                                                 size_t                   idx);
+const struct ares_addr *ares_iface_ips_get_addr(const ares_iface_ips_t *ips,
+                                                size_t                  idx);
 
 /*! Retrieve interface address flags
  *
@@ -96,8 +96,8 @@ const struct ares_addr *ares__iface_ips_get_addr(const ares__iface_ips_t *ips,
  * \param[in]  idx   Index of entry to pull
  * \return interface address flags
  */
-ares__iface_ip_flags_t  ares__iface_ips_get_flags(const ares__iface_ips_t *ips,
-                                                  size_t                   idx);
+ares_iface_ip_flags_t   ares_iface_ips_get_flags(const ares_iface_ips_t *ips,
+                                                 size_t                  idx);
 
 /*! Retrieve interface address netmask
  *
@@ -105,8 +105,8 @@ ares__iface_ip_flags_t  ares__iface_ips_get_flags(const ares__iface_ips_t *ips,
  * \param[in]  idx   Index of entry to pull
  * \return interface address netmask
  */
-unsigned char ares__iface_ips_get_netmask(const ares__iface_ips_t *ips,
-                                          size_t                   idx);
+unsigned char           ares_iface_ips_get_netmask(const ares_iface_ips_t *ips,
+                                                   size_t                  idx);
 
 /*! Retrieve interface ipv6 link local scope
  *
@@ -114,8 +114,8 @@ unsigned char ares__iface_ips_get_netmask(const ares__iface_ips_t *ips,
  * \param[in]  idx   Index of entry to pull
  * \return interface ipv6 link local scope
  */
-unsigned int  ares__iface_ips_get_ll_scope(const ares__iface_ips_t *ips,
-                                           size_t                   idx);
+unsigned int            ares_iface_ips_get_ll_scope(const ares_iface_ips_t *ips,
+                                                    size_t                  idx);
 
 
 /*! Retrieve the interface index (aka link local scope) from the interface
@@ -124,7 +124,7 @@ unsigned int  ares__iface_ips_get_ll_scope(const ares__iface_ips_t *ips,
  * \param[in] name  Interface name
  * \return 0 on failure, index otherwise
  */
-unsigned int  ares__if_nametoindex(const char *name);
+unsigned int            ares_os_if_nametoindex(const char *name);
 
 /*! Retrieves the interface name from the index (aka link local scope)
  *
@@ -133,7 +133,7 @@ unsigned int  ares__if_nametoindex(const char *name);
  * \param[in] name_len Length of provided buffer, must be at least IF_NAMESIZE
  * \return NULL on failure, or pointer to name on success
  */
-const char   *ares__if_indextoname(unsigned int index, char *name,
-                                   size_t name_len);
+const char             *ares_os_if_indextoname(unsigned int index, char *name,
+                                               size_t name_len);
 
 #endif
diff --git a/deps/cares/src/lib/util/ares_math.c b/deps/cares/src/lib/util/ares_math.c
index 45999bdeababa6..1106bf6bf151f8 100644
--- a/deps/cares/src/lib/util/ares_math.c
+++ b/deps/cares/src/lib/util/ares_math.c
@@ -29,7 +29,7 @@
 /* Uses public domain code snippets from
  * http://graphics.stanford.edu/~seander/bithacks.html */
 
-static unsigned int ares__round_up_pow2_u32(unsigned int n)
+static unsigned int ares_round_up_pow2_u32(unsigned int n)
 {
   /* NOTE: if already a power of 2, will return itself, not the next */
   n--;
@@ -42,7 +42,7 @@ static unsigned int ares__round_up_pow2_u32(unsigned int n)
   return n;
 }
 
-static ares_int64_t ares__round_up_pow2_u64(ares_int64_t n)
+static ares_int64_t ares_round_up_pow2_u64(ares_int64_t n)
 {
   /* NOTE: if already a power of 2, will return itself, not the next */
   n--;
@@ -56,7 +56,7 @@ static ares_int64_t ares__round_up_pow2_u64(ares_int64_t n)
   return n;
 }
 
-ares_bool_t ares__is_64bit(void)
+ares_bool_t ares_is_64bit(void)
 {
 #ifdef _MSC_VER
 #  pragma warning(push)
@@ -70,16 +70,16 @@ ares_bool_t ares__is_64bit(void)
 #endif
 }
 
-size_t ares__round_up_pow2(size_t n)
+size_t ares_round_up_pow2(size_t n)
 {
-  if (ares__is_64bit()) {
-    return (size_t)ares__round_up_pow2_u64((ares_int64_t)n);
+  if (ares_is_64bit()) {
+    return (size_t)ares_round_up_pow2_u64((ares_int64_t)n);
   }
 
-  return (size_t)ares__round_up_pow2_u32((unsigned int)n);
+  return (size_t)ares_round_up_pow2_u32((unsigned int)n);
 }
 
-size_t ares__log2(size_t n)
+size_t ares_log2(size_t n)
 {
   static const unsigned char tab32[32] = { 0,  1,  28, 2,  29, 14, 24, 3,
                                            30, 22, 20, 15, 25, 17, 4,  8,
@@ -92,7 +92,7 @@ size_t ares__log2(size_t n)
     56, 45, 25, 31, 35, 16, 9,  12, 44, 24, 15, 8,  23, 7,  6,  5
   };
 
-  if (!ares__is_64bit()) {
+  if (!ares_is_64bit()) {
     return tab32[(n * 0x077CB531) >> 27];
   }
 
@@ -100,7 +100,7 @@ size_t ares__log2(size_t n)
 }
 
 /* x^y */
-size_t ares__pow(size_t x, size_t y)
+size_t ares_pow(size_t x, size_t y)
 {
   size_t res = 1;
 
@@ -118,7 +118,7 @@ size_t ares__pow(size_t x, size_t y)
   return res;
 }
 
-size_t ares__count_digits(size_t n)
+size_t ares_count_digits(size_t n)
 {
   size_t digits;
 
@@ -132,7 +132,7 @@ size_t ares__count_digits(size_t n)
   return digits;
 }
 
-size_t ares__count_hexdigits(size_t n)
+size_t ares_count_hexdigits(size_t n)
 {
   size_t digits;
 
@@ -146,7 +146,7 @@ size_t ares__count_hexdigits(size_t n)
   return digits;
 }
 
-unsigned char ares__count_bits_u8(unsigned char x)
+unsigned char ares_count_bits_u8(unsigned char x)
 {
   /* Implementation obtained from:
    * http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable */
diff --git a/deps/cares/src/lib/str/ares_strcasecmp.c b/deps/cares/src/lib/util/ares_math.h
similarity index 53%
rename from deps/cares/src/lib/str/ares_strcasecmp.c
rename to deps/cares/src/lib/util/ares_math.h
index 76b835fd8e92f1..52fa1facf01e47 100644
--- a/deps/cares/src/lib/str/ares_strcasecmp.c
+++ b/deps/cares/src/lib/util/ares_math.h
@@ -1,7 +1,6 @@
 /* MIT License
  *
- * Copyright (c) 1998 Massachusetts Institute of Technology
- * Copyright (c) The c-ares project and its contributors
+ * Copyright (c) 2024 Brad House
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
@@ -24,56 +23,23 @@
  *
  * SPDX-License-Identifier: MIT
  */
+#ifndef __ARES_MATH_H
+#define __ARES_MATH_H
 
-#include "ares_private.h"
-#include "ares_strcasecmp.h"
-
-#ifndef HAVE_STRCASECMP
-int ares_strcasecmp(const char *a, const char *b)
-{
-#  if defined(HAVE_STRCMPI)
-  return strcmpi(a, b);
-#  elif defined(HAVE_STRICMP)
-  return stricmp(a, b);
-#  else
-  size_t i;
-
-  for (i = 0; i < (size_t)-1; i++) {
-    int c1 = ares__tolower(a[i]);
-    int c2 = ares__tolower(b[i]);
-    if (c1 != c2) {
-      return c1 - c2;
-    }
-    if (!c1) {
-      break;
-    }
-  }
-  return 0;
-#  endif
-}
+#ifdef _MSC_VER
+typedef __int64          ares_int64_t;
+typedef unsigned __int64 ares_uint64_t;
+#else
+typedef long long          ares_int64_t;
+typedef unsigned long long ares_uint64_t;
 #endif
 
-#ifndef HAVE_STRNCASECMP
-int ares_strncasecmp(const char *a, const char *b, size_t n)
-{
-#  if defined(HAVE_STRNCMPI)
-  return strncmpi(a, b, n);
-#  elif defined(HAVE_STRNICMP)
-  return strnicmp(a, b, n);
-#  else
-  size_t i;
+ares_bool_t   ares_is_64bit(void);
+size_t        ares_round_up_pow2(size_t n);
+size_t        ares_log2(size_t n);
+size_t        ares_pow(size_t x, size_t y);
+size_t        ares_count_digits(size_t n);
+size_t        ares_count_hexdigits(size_t n);
+unsigned char ares_count_bits_u8(unsigned char x);
 
-  for (i = 0; i < n; i++) {
-    int c1 = ares__tolower(a[i]);
-    int c2 = ares__tolower(b[i]);
-    if (c1 != c2) {
-      return c1 - c2;
-    }
-    if (!c1) {
-      break;
-    }
-  }
-  return 0;
-#  endif
-}
 #endif
diff --git a/deps/cares/src/lib/util/ares_rand.c b/deps/cares/src/lib/util/ares_rand.c
index c57bb706e68e5b..408999951a78ca 100644
--- a/deps/cares/src/lib/util/ares_rand.c
+++ b/deps/cares/src/lib/util/ares_rand.c
@@ -55,7 +55,7 @@ typedef struct ares_rand_rc4 {
 static unsigned int ares_u32_from_ptr(void *addr)
 {
   /* LCOV_EXCL_START: FallbackCode */
-  if (ares__is_64bit()) {
+  if (ares_is_64bit()) {
     return (unsigned int)((((ares_uint64_t)addr >> 32) & 0xFFFFFFFF) |
                           ((ares_uint64_t)addr & 0xFFFFFFFF));
   }
@@ -77,9 +77,13 @@ static void ares_rc4_generate_key(ares_rand_rc4 *rc4_state, unsigned char *key,
     return;
   }
 
+#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
+  /* For fuzzing, random should be deterministic */
+  srand(0);
+#else
   /* Randomness is hard to come by.  Maybe the system randomizes heap and stack
    * addresses. Maybe the current timestamp give us some randomness. Use
-   * rc4_state (heap), &i (stack), and ares__tvnow()
+   * rc4_state (heap), &i (stack), and ares_tvnow()
    */
   data = ares_u32_from_ptr(rc4_state);
   memcpy(key + len, &data, sizeof(data));
@@ -89,13 +93,14 @@ static void ares_rc4_generate_key(ares_rand_rc4 *rc4_state, unsigned char *key,
   memcpy(key + len, &data, sizeof(data));
   len += sizeof(data);
 
-  ares__tvnow(&tv);
+  ares_tvnow(&tv);
   data = (unsigned int)((tv.sec | tv.usec) & 0xFFFFFFFF);
   memcpy(key + len, &data, sizeof(data));
   len += sizeof(data);
 
   srand(ares_u32_from_ptr(rc4_state) | ares_u32_from_ptr(&i) |
         (unsigned int)((tv.sec | tv.usec) & 0xFFFFFFFF));
+#endif
 
   for (i = len; i < key_len; i++) {
     key[i] = (unsigned char)(rand() % 256); /* LCOV_EXCL_LINE */
@@ -188,10 +193,15 @@ BOOLEAN WINAPI SystemFunction036(PVOID RandomBuffer, ULONG RandomBufferLength);
 #endif
 
 
-static ares_bool_t ares__init_rand_engine(ares_rand_state *state)
+static ares_bool_t ares_init_rand_engine(ares_rand_state *state)
 {
   state->cache_remaining = 0;
 
+#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
+  /* For fuzzing, random should be deterministic */
+  state->bad_backends |= ARES_RAND_OS | ARES_RAND_FILE;
+#endif
+
 #if defined(HAVE_ARC4RANDOM_BUF) || defined(HAVE_GETRANDOM) || defined(_WIN32)
   if (!(state->bad_backends & ARES_RAND_OS)) {
     state->type = ARES_RAND_OS;
@@ -223,7 +233,7 @@ static ares_bool_t ares__init_rand_engine(ares_rand_state *state)
   return ARES_TRUE; /* LCOV_EXCL_LINE: UntestablePath */
 }
 
-ares_rand_state *ares__init_rand_state(void)
+ares_rand_state *ares_init_rand_state(void)
 {
   ares_rand_state *state = NULL;
 
@@ -232,7 +242,7 @@ ares_rand_state *ares__init_rand_state(void)
     return NULL;
   }
 
-  if (!ares__init_rand_engine(state)) {
+  if (!ares_init_rand_engine(state)) {
     ares_free(state); /* LCOV_EXCL_LINE: UntestablePath */
     return NULL;      /* LCOV_EXCL_LINE: UntestablePath */
   }
@@ -240,7 +250,7 @@ ares_rand_state *ares__init_rand_state(void)
   return state;
 }
 
-static void ares__clear_rand_state(ares_rand_state *state)
+static void ares_clear_rand_state(ares_rand_state *state)
 {
   if (!state) {
     return; /* LCOV_EXCL_LINE: DefensiveCoding */
@@ -259,26 +269,26 @@ static void ares__clear_rand_state(ares_rand_state *state)
   }
 }
 
-static void ares__reinit_rand(ares_rand_state *state)
+static void ares_reinit_rand(ares_rand_state *state)
 {
   /* LCOV_EXCL_START: UntestablePath */
-  ares__clear_rand_state(state);
-  ares__init_rand_engine(state);
+  ares_clear_rand_state(state);
+  ares_init_rand_engine(state);
   /* LCOV_EXCL_STOP */
 }
 
-void ares__destroy_rand_state(ares_rand_state *state)
+void ares_destroy_rand_state(ares_rand_state *state)
 {
   if (!state) {
     return;
   }
 
-  ares__clear_rand_state(state);
+  ares_clear_rand_state(state);
   ares_free(state);
 }
 
-static void ares__rand_bytes_fetch(ares_rand_state *state, unsigned char *buf,
-                                   size_t len)
+static void ares_rand_bytes_fetch(ares_rand_state *state, unsigned char *buf,
+                                  size_t len)
 {
   while (1) {
     size_t bytes_read = 0;
@@ -344,17 +354,17 @@ static void ares__rand_bytes_fetch(ares_rand_state *state, unsigned char *buf,
 
     /* If we didn't return before we got here, that means we had a critical rand
      * failure and need to reinitialized */
-    ares__reinit_rand(state); /* LCOV_EXCL_LINE: UntestablePath */
+    ares_reinit_rand(state); /* LCOV_EXCL_LINE: UntestablePath */
   }
 }
 
-void ares__rand_bytes(ares_rand_state *state, unsigned char *buf, size_t len)
+void ares_rand_bytes(ares_rand_state *state, unsigned char *buf, size_t len)
 {
   /* See if we need to refill the cache to serve the request, but if len is
    * excessive, we're not going to update our cache or serve from cache */
   if (len > state->cache_remaining && len < sizeof(state->cache)) {
     size_t fetch_size = sizeof(state->cache) - state->cache_remaining;
-    ares__rand_bytes_fetch(state, state->cache, fetch_size);
+    ares_rand_bytes_fetch(state, state->cache, fetch_size);
     state->cache_remaining = sizeof(state->cache);
   }
 
@@ -367,13 +377,13 @@ void ares__rand_bytes(ares_rand_state *state, unsigned char *buf, size_t len)
   }
 
   /* Serve direct due to excess size of request */
-  ares__rand_bytes_fetch(state, buf, len);
+  ares_rand_bytes_fetch(state, buf, len);
 }
 
-unsigned short ares__generate_new_id(ares_rand_state *state)
+unsigned short ares_generate_new_id(ares_rand_state *state)
 {
   unsigned short r = 0;
 
-  ares__rand_bytes(state, (unsigned char *)&r, sizeof(r));
+  ares_rand_bytes(state, (unsigned char *)&r, sizeof(r));
   return r;
 }
diff --git a/deps/cares/src/lib/ares_platform.h b/deps/cares/src/lib/util/ares_rand.h
similarity index 72%
rename from deps/cares/src/lib/ares_platform.h
rename to deps/cares/src/lib/util/ares_rand.h
index 768eaddddd9ce3..81c61bf46488ae 100644
--- a/deps/cares/src/lib/ares_platform.h
+++ b/deps/cares/src/lib/util/ares_rand.h
@@ -1,7 +1,6 @@
 /* MIT License
  *
- * Copyright (c) 1998 Massachusetts Institute of Technology
- * Copyright (c) 2004 Daniel Stenberg
+ * Copyright (c) 2024 Brad House
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
@@ -24,27 +23,14 @@
  *
  * SPDX-License-Identifier: MIT
  */
-#ifndef HEADER_CARES_PLATFORM_H
-#define HEADER_CARES_PLATFORM_H
+#ifndef __ARES_RAND_H
+#define __ARES_RAND_H
 
-#if defined(_WIN32) && !defined(MSDOS)
+struct ares_rand_state;
+typedef struct ares_rand_state ares_rand_state;
 
-typedef enum {
-  WIN_UNKNOWN,
-  WIN_3X,
-  WIN_9X,
-  WIN_NT,
-  WIN_CE
-} win_platform;
-
-win_platform ares__getplatform(void);
-
-#endif
-
-#if defined(_WIN32_WCE)
-
-struct servent *getservbyport(int port, const char *proto);
+ares_rand_state               *ares_init_rand_state(void);
+void                           ares_destroy_rand_state(ares_rand_state *state);
+void ares_rand_bytes(ares_rand_state *state, unsigned char *buf, size_t len);
 
 #endif
-
-#endif /* HEADER_CARES_PLATFORM_H */
diff --git a/deps/cares/src/lib/util/ares__threads.c b/deps/cares/src/lib/util/ares_threads.c
similarity index 64%
rename from deps/cares/src/lib/util/ares__threads.c
rename to deps/cares/src/lib/util/ares_threads.c
index b47544451d9d4f..ab0b51afb70577 100644
--- a/deps/cares/src/lib/util/ares__threads.c
+++ b/deps/cares/src/lib/util/ares_threads.c
@@ -28,13 +28,13 @@
 #ifdef CARES_THREADS
 #  ifdef _WIN32
 
-struct ares__thread_mutex {
+struct ares_thread_mutex {
   CRITICAL_SECTION mutex;
 };
 
-ares__thread_mutex_t *ares__thread_mutex_create(void)
+ares_thread_mutex_t *ares_thread_mutex_create(void)
 {
-  ares__thread_mutex_t *mut = ares_malloc_zero(sizeof(*mut));
+  ares_thread_mutex_t *mut = ares_malloc_zero(sizeof(*mut));
   if (mut == NULL) {
     return NULL;
   }
@@ -43,7 +43,7 @@ ares__thread_mutex_t *ares__thread_mutex_create(void)
   return mut;
 }
 
-void ares__thread_mutex_destroy(ares__thread_mutex_t *mut)
+void ares_thread_mutex_destroy(ares_thread_mutex_t *mut)
 {
   if (mut == NULL) {
     return;
@@ -52,7 +52,7 @@ void ares__thread_mutex_destroy(ares__thread_mutex_t *mut)
   ares_free(mut);
 }
 
-void ares__thread_mutex_lock(ares__thread_mutex_t *mut)
+void ares_thread_mutex_lock(ares_thread_mutex_t *mut)
 {
   if (mut == NULL) {
     return;
@@ -60,7 +60,7 @@ void ares__thread_mutex_lock(ares__thread_mutex_t *mut)
   EnterCriticalSection(&mut->mutex);
 }
 
-void ares__thread_mutex_unlock(ares__thread_mutex_t *mut)
+void ares_thread_mutex_unlock(ares_thread_mutex_t *mut)
 {
   if (mut == NULL) {
     return;
@@ -68,13 +68,13 @@ void ares__thread_mutex_unlock(ares__thread_mutex_t *mut)
   LeaveCriticalSection(&mut->mutex);
 }
 
-struct ares__thread_cond {
+struct ares_thread_cond {
   CONDITION_VARIABLE cond;
 };
 
-ares__thread_cond_t *ares__thread_cond_create(void)
+ares_thread_cond_t *ares_thread_cond_create(void)
 {
-  ares__thread_cond_t *cond = ares_malloc_zero(sizeof(*cond));
+  ares_thread_cond_t *cond = ares_malloc_zero(sizeof(*cond));
   if (cond == NULL) {
     return NULL;
   }
@@ -82,7 +82,7 @@ ares__thread_cond_t *ares__thread_cond_create(void)
   return cond;
 }
 
-void ares__thread_cond_destroy(ares__thread_cond_t *cond)
+void ares_thread_cond_destroy(ares_thread_cond_t *cond)
 {
   if (cond == NULL) {
     return;
@@ -90,7 +90,7 @@ void ares__thread_cond_destroy(ares__thread_cond_t *cond)
   ares_free(cond);
 }
 
-void ares__thread_cond_signal(ares__thread_cond_t *cond)
+void ares_thread_cond_signal(ares_thread_cond_t *cond)
 {
   if (cond == NULL) {
     return;
@@ -98,7 +98,7 @@ void ares__thread_cond_signal(ares__thread_cond_t *cond)
   WakeConditionVariable(&cond->cond);
 }
 
-void ares__thread_cond_broadcast(ares__thread_cond_t *cond)
+void ares_thread_cond_broadcast(ares_thread_cond_t *cond)
 {
   if (cond == NULL) {
     return;
@@ -106,8 +106,8 @@ void ares__thread_cond_broadcast(ares__thread_cond_t *cond)
   WakeAllConditionVariable(&cond->cond);
 }
 
-ares_status_t ares__thread_cond_wait(ares__thread_cond_t  *cond,
-                                     ares__thread_mutex_t *mut)
+ares_status_t ares_thread_cond_wait(ares_thread_cond_t  *cond,
+                                    ares_thread_mutex_t *mut)
 {
   if (cond == NULL || mut == NULL) {
     return ARES_EFORMERR;
@@ -117,9 +117,9 @@ ares_status_t ares__thread_cond_wait(ares__thread_cond_t  *cond,
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__thread_cond_timedwait(ares__thread_cond_t  *cond,
-                                          ares__thread_mutex_t *mut,
-                                          unsigned long         timeout_ms)
+ares_status_t ares_thread_cond_timedwait(ares_thread_cond_t  *cond,
+                                         ares_thread_mutex_t *mut,
+                                         unsigned long        timeout_ms)
 {
   if (cond == NULL || mut == NULL) {
     return ARES_EFORMERR;
@@ -132,7 +132,7 @@ ares_status_t ares__thread_cond_timedwait(ares__thread_cond_t  *cond,
   return ARES_SUCCESS;
 }
 
-struct ares__thread {
+struct ares_thread {
   HANDLE thread;
   DWORD  id;
 
@@ -142,18 +142,18 @@ struct ares__thread {
 };
 
 /* Wrap for pthread compatibility */
-static DWORD WINAPI ares__thread_func(LPVOID lpParameter)
+static DWORD WINAPI ares_thread_func(LPVOID lpParameter)
 {
-  ares__thread_t *thread = lpParameter;
+  ares_thread_t *thread = lpParameter;
 
   thread->rv = thread->func(thread->arg);
   return 0;
 }
 
-ares_status_t ares__thread_create(ares__thread_t    **thread,
-                                  ares__thread_func_t func, void *arg)
+ares_status_t ares_thread_create(ares_thread_t    **thread,
+                                 ares_thread_func_t func, void *arg)
 {
-  ares__thread_t *thr = NULL;
+  ares_thread_t *thr = NULL;
 
   if (func == NULL || thread == NULL) {
     return ARES_EFORMERR;
@@ -166,7 +166,7 @@ ares_status_t ares__thread_create(ares__thread_t    **thread,
 
   thr->func   = func;
   thr->arg    = arg;
-  thr->thread = CreateThread(NULL, 0, ares__thread_func, thr, 0, &thr->id);
+  thr->thread = CreateThread(NULL, 0, ares_thread_func, thr, 0, &thr->id);
   if (thr->thread == NULL) {
     ares_free(thr);
     return ARES_ESERVFAIL;
@@ -176,7 +176,7 @@ ares_status_t ares__thread_create(ares__thread_t    **thread,
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__thread_join(ares__thread_t *thread, void **rv)
+ares_status_t ares_thread_join(ares_thread_t *thread, void **rv)
 {
   ares_status_t status = ARES_SUCCESS;
 
@@ -211,14 +211,14 @@ ares_status_t ares__thread_join(ares__thread_t *thread, void **rv)
 #      include <sys/time.h>
 #    endif
 
-struct ares__thread_mutex {
+struct ares_thread_mutex {
   pthread_mutex_t mutex;
 };
 
-ares__thread_mutex_t *ares__thread_mutex_create(void)
+ares_thread_mutex_t *ares_thread_mutex_create(void)
 {
-  pthread_mutexattr_t   attr;
-  ares__thread_mutex_t *mut = ares_malloc_zero(sizeof(*mut));
+  pthread_mutexattr_t  attr;
+  ares_thread_mutex_t *mut = ares_malloc_zero(sizeof(*mut));
   if (mut == NULL) {
     return NULL;
   }
@@ -247,7 +247,7 @@ ares__thread_mutex_t *ares__thread_mutex_create(void)
   /* LCOV_EXCL_STOP */
 }
 
-void ares__thread_mutex_destroy(ares__thread_mutex_t *mut)
+void ares_thread_mutex_destroy(ares_thread_mutex_t *mut)
 {
   if (mut == NULL) {
     return;
@@ -256,7 +256,7 @@ void ares__thread_mutex_destroy(ares__thread_mutex_t *mut)
   ares_free(mut);
 }
 
-void ares__thread_mutex_lock(ares__thread_mutex_t *mut)
+void ares_thread_mutex_lock(ares_thread_mutex_t *mut)
 {
   if (mut == NULL) {
     return;
@@ -264,7 +264,7 @@ void ares__thread_mutex_lock(ares__thread_mutex_t *mut)
   pthread_mutex_lock(&mut->mutex);
 }
 
-void ares__thread_mutex_unlock(ares__thread_mutex_t *mut)
+void ares_thread_mutex_unlock(ares_thread_mutex_t *mut)
 {
   if (mut == NULL) {
     return;
@@ -272,13 +272,13 @@ void ares__thread_mutex_unlock(ares__thread_mutex_t *mut)
   pthread_mutex_unlock(&mut->mutex);
 }
 
-struct ares__thread_cond {
+struct ares_thread_cond {
   pthread_cond_t cond;
 };
 
-ares__thread_cond_t *ares__thread_cond_create(void)
+ares_thread_cond_t *ares_thread_cond_create(void)
 {
-  ares__thread_cond_t *cond = ares_malloc_zero(sizeof(*cond));
+  ares_thread_cond_t *cond = ares_malloc_zero(sizeof(*cond));
   if (cond == NULL) {
     return NULL;
   }
@@ -286,7 +286,7 @@ ares__thread_cond_t *ares__thread_cond_create(void)
   return cond;
 }
 
-void ares__thread_cond_destroy(ares__thread_cond_t *cond)
+void ares_thread_cond_destroy(ares_thread_cond_t *cond)
 {
   if (cond == NULL) {
     return;
@@ -295,7 +295,7 @@ void ares__thread_cond_destroy(ares__thread_cond_t *cond)
   ares_free(cond);
 }
 
-void ares__thread_cond_signal(ares__thread_cond_t *cond)
+void ares_thread_cond_signal(ares_thread_cond_t *cond)
 {
   if (cond == NULL) {
     return;
@@ -303,7 +303,7 @@ void ares__thread_cond_signal(ares__thread_cond_t *cond)
   pthread_cond_signal(&cond->cond);
 }
 
-void ares__thread_cond_broadcast(ares__thread_cond_t *cond)
+void ares_thread_cond_broadcast(ares_thread_cond_t *cond)
 {
   if (cond == NULL) {
     return;
@@ -311,8 +311,8 @@ void ares__thread_cond_broadcast(ares__thread_cond_t *cond)
   pthread_cond_broadcast(&cond->cond);
 }
 
-ares_status_t ares__thread_cond_wait(ares__thread_cond_t  *cond,
-                                     ares__thread_mutex_t *mut)
+ares_status_t ares_thread_cond_wait(ares_thread_cond_t  *cond,
+                                    ares_thread_mutex_t *mut)
 {
   if (cond == NULL || mut == NULL) {
     return ARES_EFORMERR;
@@ -322,7 +322,7 @@ ares_status_t ares__thread_cond_wait(ares__thread_cond_t  *cond,
   return ARES_SUCCESS;
 }
 
-static void ares__timespec_timeout(struct timespec *ts, unsigned long add_ms)
+static void ares_timespec_timeout(struct timespec *ts, unsigned long add_ms)
 {
 #    if defined(HAVE_CLOCK_GETTIME) && defined(CLOCK_REALTIME)
   clock_gettime(CLOCK_REALTIME, ts);
@@ -345,9 +345,9 @@ static void ares__timespec_timeout(struct timespec *ts, unsigned long add_ms)
   }
 }
 
-ares_status_t ares__thread_cond_timedwait(ares__thread_cond_t  *cond,
-                                          ares__thread_mutex_t *mut,
-                                          unsigned long         timeout_ms)
+ares_status_t ares_thread_cond_timedwait(ares_thread_cond_t  *cond,
+                                         ares_thread_mutex_t *mut,
+                                         unsigned long        timeout_ms)
 {
   struct timespec ts;
 
@@ -355,7 +355,7 @@ ares_status_t ares__thread_cond_timedwait(ares__thread_cond_t  *cond,
     return ARES_EFORMERR;
   }
 
-  ares__timespec_timeout(&ts, timeout_ms);
+  ares_timespec_timeout(&ts, timeout_ms);
 
   if (pthread_cond_timedwait(&cond->cond, &mut->mutex, &ts) != 0) {
     return ARES_ETIMEOUT;
@@ -364,14 +364,14 @@ ares_status_t ares__thread_cond_timedwait(ares__thread_cond_t  *cond,
   return ARES_SUCCESS;
 }
 
-struct ares__thread {
+struct ares_thread {
   pthread_t thread;
 };
 
-ares_status_t ares__thread_create(ares__thread_t    **thread,
-                                  ares__thread_func_t func, void *arg)
+ares_status_t ares_thread_create(ares_thread_t    **thread,
+                                 ares_thread_func_t func, void *arg)
 {
-  ares__thread_t *thr = NULL;
+  ares_thread_t *thr = NULL;
 
   if (func == NULL || thread == NULL) {
     return ARES_EFORMERR;
@@ -390,7 +390,7 @@ ares_status_t ares__thread_create(ares__thread_t    **thread,
   return ARES_SUCCESS;
 }
 
-ares_status_t ares__thread_join(ares__thread_t *thread, void **rv)
+ares_status_t ares_thread_join(ares_thread_t *thread, void **rv)
 {
   void         *ret    = NULL;
   ares_status_t status = ARES_SUCCESS;
@@ -420,57 +420,57 @@ ares_bool_t ares_threadsafety(void)
 #else /* !CARES_THREADS */
 
 /* NoOp */
-ares__thread_mutex_t *ares__thread_mutex_create(void)
+ares_thread_mutex_t *ares_thread_mutex_create(void)
 {
   return NULL;
 }
 
-void ares__thread_mutex_destroy(ares__thread_mutex_t *mut)
+void ares_thread_mutex_destroy(ares_thread_mutex_t *mut)
 {
   (void)mut;
 }
 
-void ares__thread_mutex_lock(ares__thread_mutex_t *mut)
+void ares_thread_mutex_lock(ares_thread_mutex_t *mut)
 {
   (void)mut;
 }
 
-void ares__thread_mutex_unlock(ares__thread_mutex_t *mut)
+void ares_thread_mutex_unlock(ares_thread_mutex_t *mut)
 {
   (void)mut;
 }
 
-ares__thread_cond_t *ares__thread_cond_create(void)
+ares_thread_cond_t *ares_thread_cond_create(void)
 {
   return NULL;
 }
 
-void ares__thread_cond_destroy(ares__thread_cond_t *cond)
+void ares_thread_cond_destroy(ares_thread_cond_t *cond)
 {
   (void)cond;
 }
 
-void ares__thread_cond_signal(ares__thread_cond_t *cond)
+void ares_thread_cond_signal(ares_thread_cond_t *cond)
 {
   (void)cond;
 }
 
-void ares__thread_cond_broadcast(ares__thread_cond_t *cond)
+void ares_thread_cond_broadcast(ares_thread_cond_t *cond)
 {
   (void)cond;
 }
 
-ares_status_t ares__thread_cond_wait(ares__thread_cond_t  *cond,
-                                     ares__thread_mutex_t *mut)
+ares_status_t ares_thread_cond_wait(ares_thread_cond_t  *cond,
+                                    ares_thread_mutex_t *mut)
 {
   (void)cond;
   (void)mut;
   return ARES_ENOTIMP;
 }
 
-ares_status_t ares__thread_cond_timedwait(ares__thread_cond_t  *cond,
-                                          ares__thread_mutex_t *mut,
-                                          unsigned long         timeout_ms)
+ares_status_t ares_thread_cond_timedwait(ares_thread_cond_t  *cond,
+                                         ares_thread_mutex_t *mut,
+                                         unsigned long        timeout_ms)
 {
   (void)cond;
   (void)mut;
@@ -478,8 +478,8 @@ ares_status_t ares__thread_cond_timedwait(ares__thread_cond_t  *cond,
   return ARES_ENOTIMP;
 }
 
-ares_status_t ares__thread_create(ares__thread_t    **thread,
-                                  ares__thread_func_t func, void *arg)
+ares_status_t ares_thread_create(ares_thread_t    **thread,
+                                 ares_thread_func_t func, void *arg)
 {
   (void)thread;
   (void)func;
@@ -487,7 +487,7 @@ ares_status_t ares__thread_create(ares__thread_t    **thread,
   return ARES_ENOTIMP;
 }
 
-ares_status_t ares__thread_join(ares__thread_t *thread, void **rv)
+ares_status_t ares_thread_join(ares_thread_t *thread, void **rv)
 {
   (void)thread;
   (void)rv;
@@ -501,7 +501,7 @@ ares_bool_t ares_threadsafety(void)
 #endif
 
 
-ares_status_t ares__channel_threading_init(ares_channel_t *channel)
+ares_status_t ares_channel_threading_init(ares_channel_t *channel)
 {
   ares_status_t status = ARES_SUCCESS;
 
@@ -510,13 +510,13 @@ ares_status_t ares__channel_threading_init(ares_channel_t *channel)
     return ARES_SUCCESS;
   }
 
-  channel->lock = ares__thread_mutex_create();
+  channel->lock = ares_thread_mutex_create();
   if (channel->lock == NULL) {
     status = ARES_ENOMEM;
     goto done;
   }
 
-  channel->cond_empty = ares__thread_cond_create();
+  channel->cond_empty = ares_thread_cond_create();
   if (channel->cond_empty == NULL) {
     status = ARES_ENOMEM;
     goto done;
@@ -524,27 +524,27 @@ ares_status_t ares__channel_threading_init(ares_channel_t *channel)
 
 done:
   if (status != ARES_SUCCESS) {
-    ares__channel_threading_destroy(channel);
+    ares_channel_threading_destroy(channel);
   }
   return status;
 }
 
-void ares__channel_threading_destroy(ares_channel_t *channel)
+void ares_channel_threading_destroy(ares_channel_t *channel)
 {
-  ares__thread_mutex_destroy(channel->lock);
+  ares_thread_mutex_destroy(channel->lock);
   channel->lock = NULL;
-  ares__thread_cond_destroy(channel->cond_empty);
+  ares_thread_cond_destroy(channel->cond_empty);
   channel->cond_empty = NULL;
 }
 
-void ares__channel_lock(const ares_channel_t *channel)
+void ares_channel_lock(const ares_channel_t *channel)
 {
-  ares__thread_mutex_lock(channel->lock);
+  ares_thread_mutex_lock(channel->lock);
 }
 
-void ares__channel_unlock(const ares_channel_t *channel)
+void ares_channel_unlock(const ares_channel_t *channel)
 {
-  ares__thread_mutex_unlock(channel->lock);
+  ares_thread_mutex_unlock(channel->lock);
 }
 
 /* Must not be holding a channel lock already, public function only */
@@ -562,29 +562,29 @@ ares_status_t ares_queue_wait_empty(ares_channel_t *channel, int timeout_ms)
   }
 
   if (timeout_ms >= 0) {
-    ares__tvnow(&tout);
+    ares_tvnow(&tout);
     tout.sec  += (ares_int64_t)(timeout_ms / 1000);
     tout.usec += (unsigned int)(timeout_ms % 1000) * 1000;
   }
 
-  ares__thread_mutex_lock(channel->lock);
-  while (ares__llist_len(channel->all_queries)) {
+  ares_thread_mutex_lock(channel->lock);
+  while (ares_llist_len(channel->all_queries)) {
     if (timeout_ms < 0) {
-      ares__thread_cond_wait(channel->cond_empty, channel->lock);
+      ares_thread_cond_wait(channel->cond_empty, channel->lock);
     } else {
       ares_timeval_t tv_remaining;
       ares_timeval_t tv_now;
       unsigned long  tms;
 
-      ares__tvnow(&tv_now);
-      ares__timeval_remaining(&tv_remaining, &tv_now, &tout);
+      ares_tvnow(&tv_now);
+      ares_timeval_remaining(&tv_remaining, &tv_now, &tout);
       tms =
         (unsigned long)((tv_remaining.sec * 1000) + (tv_remaining.usec / 1000));
       if (tms == 0) {
         status = ARES_ETIMEOUT;
       } else {
         status =
-          ares__thread_cond_timedwait(channel->cond_empty, channel->lock, tms);
+          ares_thread_cond_timedwait(channel->cond_empty, channel->lock, tms);
       }
 
       /* If there was a timeout, don't loop.  Otherwise, make sure this wasn't
@@ -594,7 +594,7 @@ ares_status_t ares_queue_wait_empty(ares_channel_t *channel, int timeout_ms)
       }
     }
   }
-  ares__thread_mutex_unlock(channel->lock);
+  ares_thread_mutex_unlock(channel->lock);
   return status;
 }
 
@@ -605,10 +605,10 @@ void ares_queue_notify_empty(ares_channel_t *channel)
   }
 
   /* We are guaranteed to be holding a channel lock already */
-  if (ares__llist_len(channel->all_queries)) {
+  if (ares_llist_len(channel->all_queries)) {
     return;
   }
 
   /* Notify all waiters of the conditional */
-  ares__thread_cond_broadcast(channel->cond_empty);
+  ares_thread_cond_broadcast(channel->cond_empty);
 }
diff --git a/deps/cares/src/lib/util/ares_threads.h b/deps/cares/src/lib/util/ares_threads.h
new file mode 100644
index 00000000000000..95c543e6e994f5
--- /dev/null
+++ b/deps/cares/src/lib/util/ares_threads.h
@@ -0,0 +1,60 @@
+/* MIT License
+ *
+ * Copyright (c) 2023 Brad House
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+#ifndef __ARES__THREADS_H
+#define __ARES__THREADS_H
+
+struct ares_thread_mutex;
+typedef struct ares_thread_mutex ares_thread_mutex_t;
+
+ares_thread_mutex_t             *ares_thread_mutex_create(void);
+void ares_thread_mutex_destroy(ares_thread_mutex_t *mut);
+void ares_thread_mutex_lock(ares_thread_mutex_t *mut);
+void ares_thread_mutex_unlock(ares_thread_mutex_t *mut);
+
+
+struct ares_thread_cond;
+typedef struct ares_thread_cond ares_thread_cond_t;
+
+ares_thread_cond_t             *ares_thread_cond_create(void);
+void          ares_thread_cond_destroy(ares_thread_cond_t *cond);
+void          ares_thread_cond_signal(ares_thread_cond_t *cond);
+void          ares_thread_cond_broadcast(ares_thread_cond_t *cond);
+ares_status_t ares_thread_cond_wait(ares_thread_cond_t  *cond,
+                                    ares_thread_mutex_t *mut);
+ares_status_t ares_thread_cond_timedwait(ares_thread_cond_t  *cond,
+                                         ares_thread_mutex_t *mut,
+                                         unsigned long        timeout_ms);
+
+
+struct ares_thread;
+typedef struct ares_thread ares_thread_t;
+
+typedef void *(*ares_thread_func_t)(void *arg);
+ares_status_t ares_thread_create(ares_thread_t    **thread,
+                                 ares_thread_func_t func, void *arg);
+ares_status_t ares_thread_join(ares_thread_t *thread, void **rv);
+
+#endif
diff --git a/deps/cares/src/lib/util/ares_time.h b/deps/cares/src/lib/util/ares_time.h
new file mode 100644
index 00000000000000..c6eaf97366379e
--- /dev/null
+++ b/deps/cares/src/lib/util/ares_time.h
@@ -0,0 +1,48 @@
+/* MIT License
+ *
+ * Copyright (c) 2024 Brad House
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+#ifndef __ARES_TIME_H
+#define __ARES_TIME_H
+
+/*! struct timeval on some systems, such as Windows, doesn't support 64-bit
+ *  time and therefore can't be used due to Y2K38 issues.  Define our own type
+ *  that does have 64-bit time. */
+typedef struct {
+  ares_int64_t sec;  /*!< Seconds */
+  unsigned int usec; /*!< Microseconds. Can't be negative. */
+} ares_timeval_t;
+
+/* Return true if `now` is at or after the `check` time */
+ares_bool_t ares_timedout(const ares_timeval_t *now,
+                          const ares_timeval_t *check);
+
+void        ares_tvnow(ares_timeval_t *now);
+void        ares_timeval_remaining(ares_timeval_t       *remaining,
+                                   const ares_timeval_t *now,
+                                   const ares_timeval_t *tout);
+void ares_timeval_diff(ares_timeval_t *tvdiff, const ares_timeval_t *tvstart,
+                       const ares_timeval_t *tvstop);
+
+#endif
diff --git a/deps/cares/src/lib/util/ares__timeval.c b/deps/cares/src/lib/util/ares_timeval.c
similarity index 96%
rename from deps/cares/src/lib/util/ares__timeval.c
rename to deps/cares/src/lib/util/ares_timeval.c
index e3a989dca87a1b..0b0845b6fb7ffe 100644
--- a/deps/cares/src/lib/util/ares__timeval.c
+++ b/deps/cares/src/lib/util/ares_timeval.c
@@ -28,7 +28,7 @@
 
 #if defined(_WIN32) && !defined(MSDOS)
 
-void ares__tvnow(ares_timeval_t *now)
+void ares_tvnow(ares_timeval_t *now)
 {
   /* QueryPerformanceCounters() has been around since Windows 2000, though
    * significant fixes were made in later versions.  Documentation states
@@ -52,7 +52,7 @@ void ares__tvnow(ares_timeval_t *now)
 
 #elif defined(HAVE_CLOCK_GETTIME_MONOTONIC)
 
-void ares__tvnow(ares_timeval_t *now)
+void ares_tvnow(ares_timeval_t *now)
 {
   /* clock_gettime() is guaranteed to be increased monotonically when the
    * monotonic clock is queried. Time starting point is unspecified, it
@@ -76,7 +76,7 @@ void ares__tvnow(ares_timeval_t *now)
 
 #elif defined(HAVE_GETTIMEOFDAY)
 
-void ares__tvnow(ares_timeval_t *now)
+void ares_tvnow(ares_timeval_t *now)
 {
   /* gettimeofday() is not granted to be increased monotonically, due to
    * clock drifting and external source time synchronization it can jump
diff --git a/deps/cares/src/lib/util/ares_uri.c b/deps/cares/src/lib/util/ares_uri.c
new file mode 100644
index 00000000000000..d656f8347f54c9
--- /dev/null
+++ b/deps/cares/src/lib/util/ares_uri.c
@@ -0,0 +1,1626 @@
+/* MIT License
+ *
+ * Copyright (c) 2024 Brad House
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+
+
+#include "ares_private.h"
+#include "ares_uri.h"
+#ifdef HAVE_STDINT_H
+#  include <stdint.h>
+#endif
+
+struct ares_uri {
+  char                scheme[16];
+  char               *username;
+  char               *password;
+  unsigned short      port;
+  char                host[256];
+  char               *path;
+  ares_htable_dict_t *query;
+  char               *fragment;
+};
+
+/* RFC3986 character set notes:
+ *    gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
+ *    sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
+ *                / "*" / "+" / "," / ";" / "="
+ *    reserved    = gen-delims / sub-delims
+ *    unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
+ *    scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
+ *    authority   = [ userinfo "@" ] host [ ":" port ]
+ *    userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
+ *    NOTE: Use of the format "user:password" in the userinfo field is
+ *          deprecated.  Applications should not render as clear text any data
+ *          after the first colon (":") character found within a userinfo
+ *          subcomponent unless the data after the colon is the empty string
+ *           (indicating no password).
+ *    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
+ *    query       = *( pchar / "/" / "?" )
+ *    fragment    = *( pchar / "/" / "?" )
+ *
+ *   NOTE: Due to ambiguity, "+" in a query must be percent-encoded, as old
+ *         URLs used that for spaces.
+ */
+
+
+static ares_bool_t ares_uri_chis_subdelim(char x)
+{
+  switch (x) {
+    case '!':
+      return ARES_TRUE;
+    case '$':
+      return ARES_TRUE;
+    case '&':
+      return ARES_TRUE;
+    case '\'':
+      return ARES_TRUE;
+    case '(':
+      return ARES_TRUE;
+    case ')':
+      return ARES_TRUE;
+    case '*':
+      return ARES_TRUE;
+    case '+':
+      return ARES_TRUE;
+    case ',':
+      return ARES_TRUE;
+    case ';':
+      return ARES_TRUE;
+    case '=':
+      return ARES_TRUE;
+    default:
+      break;
+  }
+  return ARES_FALSE;
+}
+
+/* These don't actually appear to be referenced in any logic */
+#if 0
+static ares_bool_t ares_uri_chis_gendelim(char x)
+{
+  switch (x) {
+    case ':':
+      return ARES_TRUE;
+    case '/':
+      return ARES_TRUE;
+    case '?':
+      return ARES_TRUE;
+    case '#':
+      return ARES_TRUE;
+    case '[':
+      return ARES_TRUE;
+    case ']':
+      return ARES_TRUE;
+    case '@':
+      return ARES_TRUE;
+    default:
+      break;
+  }
+  return ARES_FALSE;
+}
+
+
+static ares_bool_t ares_uri_chis_reserved(char x)
+{
+  return ares_uri_chis_gendelim(x) || ares_uri_chis_subdelim(x);
+}
+#endif
+
+static ares_bool_t ares_uri_chis_unreserved(char x)
+{
+  switch (x) {
+    case '-':
+      return ARES_TRUE;
+    case '.':
+      return ARES_TRUE;
+    case '_':
+      return ARES_TRUE;
+    case '~':
+      return ARES_TRUE;
+    default:
+      break;
+  }
+  return ares_isalpha(x) || ares_isdigit(x);
+}
+
+static ares_bool_t ares_uri_chis_scheme(char x)
+{
+  switch (x) {
+    case '+':
+      return ARES_TRUE;
+    case '-':
+      return ARES_TRUE;
+    case '.':
+      return ARES_TRUE;
+    default:
+      break;
+  }
+  return ares_isalpha(x) || ares_isdigit(x);
+}
+
+static ares_bool_t ares_uri_chis_authority(char x)
+{
+  /* This one here isn't well defined.  We are going to include the valid
+   * characters of the subfields plus known delimiters */
+  return ares_uri_chis_unreserved(x) || ares_uri_chis_subdelim(x) || x == '%' ||
+         x == '[' || x == ']' || x == '@' || x == ':';
+}
+
+static ares_bool_t ares_uri_chis_userinfo(char x)
+{
+  /* NOTE: we don't include ':' here since we are using that as our
+   *       username/password delimiter */
+  return ares_uri_chis_unreserved(x) || ares_uri_chis_subdelim(x);
+}
+
+static ares_bool_t ares_uri_chis_path(char x)
+{
+  switch (x) {
+    case ':':
+      return ARES_TRUE;
+    case '@':
+      return ARES_TRUE;
+    /* '/' isn't in the spec as a path character since it's technically a
+     * delimiter, but we're not splitting on '/' so we accept it as valid */
+    case '/':
+      return ARES_TRUE;
+    default:
+      break;
+  }
+  return ares_uri_chis_unreserved(x) || ares_uri_chis_subdelim(x);
+}
+
+static ares_bool_t ares_uri_chis_path_enc(char x)
+{
+  return ares_uri_chis_path(x) || x == '%';
+}
+
+static ares_bool_t ares_uri_chis_query(char x)
+{
+  switch (x) {
+    case '/':
+      return ARES_TRUE;
+    case '?':
+      return ARES_TRUE;
+    default:
+      break;
+  }
+
+  /* Exclude & and = used as delimiters, they're valid characters in the
+   * set, just not for the individual pieces */
+  return ares_uri_chis_path(x) && x != '&' && x != '=';
+}
+
+static ares_bool_t ares_uri_chis_query_enc(char x)
+{
+  return ares_uri_chis_query(x) || x == '%';
+}
+
+static ares_bool_t ares_uri_chis_fragment(char x)
+{
+  switch (x) {
+    case '/':
+      return ARES_TRUE;
+    case '?':
+      return ARES_TRUE;
+    default:
+      break;
+  }
+  return ares_uri_chis_path(x);
+}
+
+static ares_bool_t ares_uri_chis_fragment_enc(char x)
+{
+  return ares_uri_chis_fragment(x) || x == '%';
+}
+
+ares_uri_t *ares_uri_create(void)
+{
+  ares_uri_t *uri = ares_malloc_zero(sizeof(*uri));
+
+  if (uri == NULL) {
+    return NULL;
+  }
+
+  uri->query = ares_htable_dict_create();
+  if (uri->query == NULL) {
+    ares_free(uri);
+    return NULL;
+  }
+
+  return uri;
+}
+
+void ares_uri_destroy(ares_uri_t *uri)
+{
+  if (uri == NULL) {
+    return;
+  }
+
+  ares_free(uri->username);
+  ares_free(uri->password);
+  ares_free(uri->path);
+  ares_free(uri->fragment);
+  ares_htable_dict_destroy(uri->query);
+  ares_free(uri);
+}
+
+static ares_bool_t ares_uri_scheme_is_valid(const char *uri)
+{
+  size_t i;
+
+  if (ares_strlen(uri) == 0) {
+    return ARES_FALSE;
+  }
+
+  if (!ares_isalpha(*uri)) {
+    return ARES_FALSE;
+  }
+
+  for (i = 0; uri[i] != 0; i++) {
+    if (!ares_uri_chis_scheme(uri[i])) {
+      return ARES_FALSE;
+    }
+  }
+  return ARES_TRUE;
+}
+
+static ares_bool_t ares_uri_str_isvalid(const char *str, size_t max_len,
+                                        ares_bool_t (*ischr)(char))
+{
+  size_t i;
+
+  if (str == NULL) {
+    return ARES_FALSE;
+  }
+
+  for (i = 0; i != max_len && str[i] != 0; i++) {
+    if (!ischr(str[i])) {
+      return ARES_FALSE;
+    }
+  }
+  return ARES_TRUE;
+}
+
+ares_status_t ares_uri_set_scheme(ares_uri_t *uri, const char *scheme)
+{
+  if (uri == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  if (!ares_uri_scheme_is_valid(scheme)) {
+    return ARES_EBADSTR;
+  }
+
+  ares_strcpy(uri->scheme, scheme, sizeof(uri->scheme));
+  ares_str_lower(uri->scheme);
+
+  return ARES_SUCCESS;
+}
+
+const char *ares_uri_get_scheme(const ares_uri_t *uri)
+{
+  if (uri == NULL) {
+    return NULL;
+  }
+
+  return uri->scheme;
+}
+
+static ares_status_t ares_uri_set_username_own(ares_uri_t *uri, char *username)
+{
+  if (uri == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  if (username != NULL && (!ares_str_isprint(username, ares_strlen(username)) ||
+                           ares_strlen(username) == 0)) {
+    return ARES_EBADSTR;
+  }
+
+
+  ares_free(uri->username);
+  uri->username = username;
+  return ARES_SUCCESS;
+}
+
+ares_status_t ares_uri_set_username(ares_uri_t *uri, const char *username)
+{
+  ares_status_t status;
+  char         *temp = NULL;
+
+  if (uri == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  if (username != NULL) {
+    temp = ares_strdup(username);
+    if (temp == NULL) {
+      return ARES_ENOMEM;
+    }
+  }
+
+  status = ares_uri_set_username_own(uri, temp);
+  if (status != ARES_SUCCESS) {
+    ares_free(temp);
+  }
+
+  return status;
+}
+
+const char *ares_uri_get_username(const ares_uri_t *uri)
+{
+  if (uri == NULL) {
+    return NULL;
+  }
+
+  return uri->username;
+}
+
+static ares_status_t ares_uri_set_password_own(ares_uri_t *uri, char *password)
+{
+  if (uri == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  if (password != NULL && !ares_str_isprint(password, ares_strlen(password))) {
+    return ARES_EBADSTR;
+  }
+
+  ares_free(uri->password);
+  uri->password = password;
+  return ARES_SUCCESS;
+}
+
+ares_status_t ares_uri_set_password(ares_uri_t *uri, const char *password)
+{
+  ares_status_t status;
+  char         *temp = NULL;
+
+  if (uri == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  if (password != NULL) {
+    temp = ares_strdup(password);
+    if (temp == NULL) {
+      return ARES_ENOMEM;
+    }
+  }
+
+  status = ares_uri_set_password_own(uri, temp);
+  if (status != ARES_SUCCESS) {
+    ares_free(temp);
+  }
+
+  return status;
+}
+
+const char *ares_uri_get_password(const ares_uri_t *uri)
+{
+  if (uri == NULL) {
+    return NULL;
+  }
+
+  return uri->password;
+}
+
+ares_status_t ares_uri_set_host(ares_uri_t *uri, const char *host)
+{
+  struct ares_addr addr;
+  size_t           addrlen;
+  char             hoststr[256];
+  char            *ll_scope;
+
+  if (uri == NULL || ares_strlen(host) == 0 ||
+      ares_strlen(host) >= sizeof(hoststr)) {
+    return ARES_EFORMERR;
+  }
+
+  ares_strcpy(hoststr, host, sizeof(hoststr));
+
+  /* Look for '%' which could be a link-local scope for ipv6 addresses and
+   * parse it off */
+  ll_scope = strchr(hoststr, '%');
+  if (ll_scope != NULL) {
+    *ll_scope = 0;
+    ll_scope++;
+    if (!ares_str_isalnum(ll_scope)) {
+      return ARES_EBADNAME;
+    }
+  }
+
+  /* If it's an IP address, normalize it */
+  memset(&addr, 0, sizeof(addr));
+  addr.family = AF_UNSPEC;
+  if (ares_dns_pton(hoststr, &addr, &addrlen) != NULL) {
+    char ipaddr[INET6_ADDRSTRLEN];
+    ares_inet_ntop(addr.family, &addr.addr, ipaddr, sizeof(ipaddr));
+    /* Only IPv6 is allowed to have a scope */
+    if (ll_scope != NULL && addr.family != AF_INET6) {
+      return ARES_EBADNAME;
+    }
+
+    if (ll_scope != NULL) {
+      snprintf(uri->host, sizeof(uri->host), "%s%%%s", ipaddr, ll_scope);
+    } else {
+      ares_strcpy(uri->host, ipaddr, sizeof(uri->host));
+    }
+    return ARES_SUCCESS;
+  }
+
+  /* If it's a hostname, make sure it uses a valid charset */
+  if (!ares_is_hostname(host)) {
+    return ARES_EBADNAME;
+  }
+
+  ares_strcpy(uri->host, host, sizeof(uri->host));
+  return ARES_SUCCESS;
+}
+
+const char *ares_uri_get_host(const ares_uri_t *uri)
+{
+  if (uri == NULL) {
+    return NULL;
+  }
+
+  return uri->host;
+}
+
+ares_status_t ares_uri_set_port(ares_uri_t *uri, unsigned short port)
+{
+  if (uri == NULL) {
+    return ARES_EFORMERR;
+  }
+  uri->port = port;
+  return ARES_SUCCESS;
+}
+
+unsigned short ares_uri_get_port(const ares_uri_t *uri)
+{
+  if (uri == NULL) {
+    return 0;
+  }
+  return uri->port;
+}
+
+/* URI spec says path normalization is a requirement */
+static char *ares_uri_path_normalize(const char *path)
+{
+  ares_status_t status;
+  ares_array_t *arr     = NULL;
+  ares_buf_t   *outpath = NULL;
+  ares_buf_t   *inpath  = NULL;
+  ares_ssize_t  i;
+  size_t        j;
+  size_t        len;
+
+  inpath =
+    ares_buf_create_const((const unsigned char *)path, ares_strlen(path));
+  if (inpath == NULL) {
+    status = ARES_ENOMEM;
+    goto done;
+  }
+
+  outpath = ares_buf_create();
+  if (outpath == NULL) {
+    status = ARES_ENOMEM;
+    goto done;
+  }
+
+  status = ares_buf_split_str_array(inpath, (const unsigned char *)"/", 1,
+                                    ARES_BUF_SPLIT_TRIM, 0, &arr);
+  if (status != ARES_SUCCESS) {
+    goto done; /* release inpath and outpath instead of leaking them */
+  }
+
+  for (i = 0; i < (ares_ssize_t)ares_array_len(arr); i++) {
+    const char **strptr = ares_array_at(arr, (size_t)i);
+    const char  *str    = *strptr;
+
+    if (ares_streq(str, ".")) {
+      ares_array_remove_at(arr, (size_t)i);
+      i--;
+    } else if (ares_streq(str, "..")) {
+      if (i != 0) {
+        ares_array_remove_at(arr, (size_t)i - 1);
+        i--;
+      }
+      ares_array_remove_at(arr, (size_t)i);
+      i--;
+    }
+  }
+
+  status = ares_buf_append_byte(outpath, '/');
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  len = ares_array_len(arr);
+  for (j = 0; j < len; j++) {
+    const char **strptr = ares_array_at(arr, j);
+    const char  *str    = *strptr;
+    status              = ares_buf_append_str(outpath, str);
+    if (status != ARES_SUCCESS) {
+      goto done;
+    }
+
+    /* Path separator, but on the last entry, we need to check if it was
+     * originally terminated or not because they have different meanings */
+    if (j != len - 1 || path[ares_strlen(path) - 1] == '/') {
+      status = ares_buf_append_byte(outpath, '/');
+      if (status != ARES_SUCCESS) {
+        goto done;
+      }
+    }
+  }
+
+done:
+  ares_array_destroy(arr);
+  ares_buf_destroy(inpath);
+  if (status != ARES_SUCCESS) {
+    ares_buf_destroy(outpath);
+    return NULL;
+  }
+
+  return ares_buf_finish_str(outpath, NULL);
+}
+
+ares_status_t ares_uri_set_path(ares_uri_t *uri, const char *path)
+{
+  char *temp = NULL;
+
+  if (uri == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  if (path != NULL && !ares_str_isprint(path, ares_strlen(path))) {
+    return ARES_EBADSTR;
+  }
+
+  if (path != NULL) {
+    temp = ares_uri_path_normalize(path);
+    if (temp == NULL) {
+      return ARES_ENOMEM;
+    }
+  }
+
+  ares_free(uri->path);
+  uri->path = temp;
+
+  return ARES_SUCCESS;
+}
+
+const char *ares_uri_get_path(const ares_uri_t *uri)
+{
+  if (uri == NULL) {
+    return NULL;
+  }
+
+  return uri->path;
+}
+
+ares_status_t ares_uri_set_query_key(ares_uri_t *uri, const char *key,
+                                     const char *val)
+{
+  if (uri == NULL || key == NULL || *key == 0) {
+    return ARES_EFORMERR;
+  }
+
+  if (!ares_str_isprint(key, ares_strlen(key)) ||
+      (val != NULL && !ares_str_isprint(val, ares_strlen(val)))) {
+    return ARES_EBADSTR;
+  }
+
+  if (!ares_htable_dict_insert(uri->query, key, val)) {
+    return ARES_ENOMEM;
+  }
+  return ARES_SUCCESS;
+}
+
+ares_status_t ares_uri_del_query_key(ares_uri_t *uri, const char *key)
+{
+  if (uri == NULL || key == NULL || *key == 0 ||
+      !ares_str_isprint(key, ares_strlen(key))) {
+    return ARES_EFORMERR;
+  }
+
+  if (!ares_htable_dict_remove(uri->query, key)) {
+    return ARES_ENOTFOUND;
+  }
+
+  return ARES_SUCCESS;
+}
+
+const char *ares_uri_get_query_key(const ares_uri_t *uri, const char *key)
+{
+  if (uri == NULL || key == NULL || *key == 0 ||
+      !ares_str_isprint(key, ares_strlen(key))) {
+    return NULL;
+  }
+
+  return ares_htable_dict_get_direct(uri->query, key);
+}
+
+char **ares_uri_get_query_keys(const ares_uri_t *uri, size_t *num)
+{
+  if (uri == NULL || num == NULL) {
+    return NULL;
+  }
+
+  return ares_htable_dict_keys(uri->query, num);
+}
+
+static ares_status_t ares_uri_set_fragment_own(ares_uri_t *uri, char *fragment)
+{
+  if (uri == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  if (fragment != NULL && !ares_str_isprint(fragment, ares_strlen(fragment))) {
+    return ARES_EBADSTR;
+  }
+
+  ares_free(uri->fragment);
+  uri->fragment = fragment;
+  return ARES_SUCCESS;
+}
+
+ares_status_t ares_uri_set_fragment(ares_uri_t *uri, const char *fragment)
+{
+  ares_status_t status;
+  char         *temp = NULL;
+
+  if (uri == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  if (fragment != NULL) {
+    temp = ares_strdup(fragment);
+    if (temp == NULL) {
+      return ARES_ENOMEM;
+    }
+  }
+
+  status = ares_uri_set_fragment_own(uri, temp);
+  if (status != ARES_SUCCESS) {
+    ares_free(temp);
+  }
+
+  return status;
+}
+
+const char *ares_uri_get_fragment(const ares_uri_t *uri)
+{
+  if (uri == NULL) {
+    return NULL;
+  }
+  return uri->fragment;
+}
+
+static ares_status_t ares_uri_encode_buf(ares_buf_t *buf, const char *str,
+                                         ares_bool_t (*ischr)(char))
+{
+  size_t i;
+
+  if (buf == NULL || str == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  for (i = 0; str[i] != 0; i++) {
+    if (ischr(str[i])) {
+      if (ares_buf_append_byte(buf, (unsigned char)str[i]) != ARES_SUCCESS) {
+        return ARES_ENOMEM;
+      }
+    } else {
+      if (ares_buf_append_byte(buf, '%') != ARES_SUCCESS) {
+        return ARES_ENOMEM;
+      }
+      if (ares_buf_append_num_hex(buf, (size_t)str[i], 2) != ARES_SUCCESS) {
+        return ARES_ENOMEM;
+      }
+    }
+  }
+  return ARES_SUCCESS;
+}
+
+static ares_status_t ares_uri_write_scheme(const ares_uri_t *uri,
+                                           ares_buf_t       *buf)
+{
+  ares_status_t status;
+
+  status = ares_buf_append_str(buf, uri->scheme);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  status = ares_buf_append_str(buf, "://");
+
+  return status;
+}
+
+static ares_status_t ares_uri_write_authority(const ares_uri_t *uri,
+                                              ares_buf_t       *buf)
+{
+  ares_status_t status;
+  ares_bool_t   is_ipv6 = ARES_FALSE;
+
+  if (ares_strlen(uri->username)) {
+    status = ares_uri_encode_buf(buf, uri->username, ares_uri_chis_userinfo);
+    if (status != ARES_SUCCESS) {
+      return status;
+    }
+  }
+
+  if (ares_strlen(uri->password)) {
+    status = ares_buf_append_byte(buf, ':');
+    if (status != ARES_SUCCESS) {
+      return status;
+    }
+
+    status = ares_uri_encode_buf(buf, uri->password, ares_uri_chis_userinfo);
+    if (status != ARES_SUCCESS) {
+      return status;
+    }
+  }
+
+  if (ares_strlen(uri->username) || ares_strlen(uri->password)) {
+    status = ares_buf_append_byte(buf, '@');
+    if (status != ARES_SUCCESS) {
+      return status;
+    }
+  }
+
+  /* We need to write ipv6 addresses with [ ] */
+  if (strchr(uri->host, '%') != NULL) {
+    /* If we have a % in the name, it must be ipv6 link local scope, so we
+     * don't need to check anything else */
+    is_ipv6 = ARES_TRUE;
+  } else {
+    /* Parse the host to see if it is an ipv6 address */
+    struct ares_addr addr;
+    size_t           addrlen;
+    memset(&addr, 0, sizeof(addr));
+    addr.family = AF_INET6;
+    if (ares_dns_pton(uri->host, &addr, &addrlen) != NULL) {
+      is_ipv6 = ARES_TRUE;
+    }
+  }
+
+  if (is_ipv6) {
+    status = ares_buf_append_byte(buf, '[');
+    if (status != ARES_SUCCESS) {
+      return status;
+    }
+  }
+
+  status = ares_buf_append_str(buf, uri->host);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  if (is_ipv6) {
+    status = ares_buf_append_byte(buf, ']');
+    if (status != ARES_SUCCESS) {
+      return status;
+    }
+  }
+
+  if (uri->port > 0) {
+    status = ares_buf_append_byte(buf, ':');
+    if (status != ARES_SUCCESS) {
+      return status;
+    }
+    status = ares_buf_append_num_dec(buf, uri->port, 0);
+    if (status != ARES_SUCCESS) {
+      return status;
+    }
+  }
+
+  return status;
+}
+
+static ares_status_t ares_uri_write_path(const ares_uri_t *uri, ares_buf_t *buf)
+{
+  ares_status_t status;
+
+  if (ares_strlen(uri->path) == 0) {
+    return ARES_SUCCESS;
+  }
+
+  if (*uri->path != '/') {
+    status = ares_buf_append_byte(buf, '/');
+    if (status != ARES_SUCCESS) {
+      return status;
+    }
+  }
+
+  status = ares_uri_encode_buf(buf, uri->path, ares_uri_chis_path);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  return ARES_SUCCESS;
+}
+
+static ares_status_t ares_uri_write_query(const ares_uri_t *uri,
+                                          ares_buf_t       *buf)
+{
+  ares_status_t status;
+  char        **keys;
+  size_t        num_keys = 0;
+  size_t        i;
+
+  if (ares_htable_dict_num_keys(uri->query) == 0) {
+    return ARES_SUCCESS;
+  }
+
+  keys = ares_uri_get_query_keys(uri, &num_keys);
+  if (keys == NULL || num_keys == 0) {
+    return ARES_ENOMEM;
+  }
+
+  status = ares_buf_append_byte(buf, '?');
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  for (i = 0; i < num_keys; i++) {
+    const char *val;
+
+    if (i != 0) {
+      status = ares_buf_append_byte(buf, '&');
+      if (status != ARES_SUCCESS) {
+        goto done;
+      }
+    }
+
+    status = ares_uri_encode_buf(buf, keys[i], ares_uri_chis_query);
+    if (status != ARES_SUCCESS) {
+      goto done;
+    }
+
+    val = ares_uri_get_query_key(uri, keys[i]);
+    if (val != NULL) {
+      status = ares_buf_append_byte(buf, '=');
+      if (status != ARES_SUCCESS) {
+        goto done;
+      }
+
+      status = ares_uri_encode_buf(buf, val, ares_uri_chis_query);
+      if (status != ARES_SUCCESS) {
+        goto done;
+      }
+    }
+  }
+
+done:
+  ares_free_array(keys, num_keys, ares_free);
+  return status;
+}
+
+static ares_status_t ares_uri_write_fragment(const ares_uri_t *uri,
+                                             ares_buf_t       *buf)
+{
+  ares_status_t status;
+
+  if (!ares_strlen(uri->fragment)) {
+    return ARES_SUCCESS;
+  }
+
+  status = ares_buf_append_byte(buf, '#');
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  status = ares_uri_encode_buf(buf, uri->fragment, ares_uri_chis_fragment);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  return ARES_SUCCESS;
+}
+
+ares_status_t ares_uri_write_buf(const ares_uri_t *uri, ares_buf_t *buf)
+{
+  ares_status_t status;
+  size_t        orig_len;
+
+  if (uri == NULL || buf == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  if (ares_strlen(uri->scheme) == 0 || ares_strlen(uri->host) == 0) {
+    return ARES_ENODATA;
+  }
+
+  orig_len = ares_buf_len(buf);
+
+  status = ares_uri_write_scheme(uri, buf);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_write_authority(uri, buf);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_write_path(uri, buf);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_write_query(uri, buf);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_write_fragment(uri, buf);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+done:
+  if (status != ARES_SUCCESS) {
+    ares_buf_set_length(buf, orig_len);
+  }
+  return status;
+}
+
+ares_status_t ares_uri_write(char **out, const ares_uri_t *uri)
+{
+  ares_buf_t   *buf;
+  ares_status_t status;
+
+  if (out == NULL || uri == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  *out = NULL;
+
+  buf = ares_buf_create();
+  if (buf == NULL) {
+    return ARES_ENOMEM;
+  }
+
+  status = ares_uri_write_buf(uri, buf);
+  if (status != ARES_SUCCESS) {
+    ares_buf_destroy(buf);
+    return status;
+  }
+
+  *out = ares_buf_finish_str(buf, NULL);
+  return ARES_SUCCESS;
+}
+
+#define xdigit_val(x)     \
+  ((x >= '0' && x <= '9') \
+     ? (x - '0')          \
+     : ((x >= 'A' && x <= 'F') ? (x - 'A' + 10) : (x - 'a' + 10)))
+
+static ares_status_t ares_uri_decode_inplace(char *str, ares_bool_t is_query,
+                                             ares_bool_t must_be_printable,
+                                             size_t     *out_len)
+{
+  size_t i;
+  size_t len = 0;
+
+  for (i = 0; str[i] != 0; i++) {
+    if (is_query && str[i] == '+') {
+      str[len++] = ' ';
+      continue;
+    }
+
+    if (str[i] != '%') {
+      str[len++] = str[i];
+      continue;
+    }
+
+    if (!ares_isxdigit(str[i + 1]) || !ares_isxdigit(str[i + 2])) {
+      return ARES_EBADSTR;
+    }
+
+    str[len] = (char)(xdigit_val(str[i + 1]) << 4 | xdigit_val(str[i + 2]));
+
+    if (must_be_printable && !ares_isprint(str[len])) {
+      return ARES_EBADSTR;
+    }
+
+    len++;
+
+    i += 2;
+  }
+
+  str[len] = 0;
+
+  *out_len = len;
+  return ARES_SUCCESS;
+}
+
+static ares_status_t ares_uri_parse_scheme(ares_uri_t *uri, ares_buf_t *buf)
+{
+  ares_status_t status;
+  size_t        bytes;
+  char          scheme[sizeof(uri->scheme)];
+
+  ares_buf_tag(buf);
+
+  bytes =
+    ares_buf_consume_until_seq(buf, (const unsigned char *)"://", 3, ARES_TRUE);
+  if (bytes == SIZE_MAX || bytes > sizeof(uri->scheme)) {
+    return ARES_EBADSTR;
+  }
+
+  status = ares_buf_tag_fetch_string(buf, scheme, sizeof(scheme));
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  status = ares_uri_set_scheme(uri, scheme);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  /* Consume :// */
+  ares_buf_consume(buf, 3);
+
+  return ARES_SUCCESS;
+}
+
+static ares_status_t ares_uri_parse_userinfo(ares_uri_t *uri, ares_buf_t *buf)
+{
+  size_t        userinfo_len;
+  size_t        username_len;
+  ares_bool_t   has_password = ARES_FALSE;
+  char         *temp         = NULL;
+  ares_status_t status;
+  size_t        len;
+
+  ares_buf_tag(buf);
+
+  /* Search for @; if it's not found, return */
+  userinfo_len = ares_buf_consume_until_charset(buf, (const unsigned char *)"@",
+                                                1, ARES_TRUE);
+
+  if (userinfo_len == SIZE_MAX) {
+    return ARES_SUCCESS;
+  }
+
+  /* Rollback since now we know there really is userinfo */
+  ares_buf_tag_rollback(buf);
+
+  /* Search for ':'. If it isn't found or it's past the '@', then we only
+   * have a username and no password */
+  ares_buf_tag(buf);
+  username_len = ares_buf_consume_until_charset(buf, (const unsigned char *)":",
+                                                1, ARES_TRUE);
+  if (username_len < userinfo_len) {
+    has_password = ARES_TRUE;
+    status       = ares_buf_tag_fetch_strdup(buf, &temp);
+    if (status != ARES_SUCCESS) {
+      goto done;
+    }
+
+    status = ares_uri_decode_inplace(temp, ARES_FALSE, ARES_TRUE, &len);
+    if (status != ARES_SUCCESS) {
+      goto done;
+    }
+
+    status = ares_uri_set_username_own(uri, temp);
+    if (status != ARES_SUCCESS) {
+      goto done;
+    }
+    temp = NULL;
+
+    /* Consume : */
+    ares_buf_consume(buf, 1);
+  }
+
+  ares_buf_tag(buf);
+  ares_buf_consume_until_charset(buf, (const unsigned char *)"@", 1, ARES_TRUE);
+  status = ares_buf_tag_fetch_strdup(buf, &temp);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_decode_inplace(temp, ARES_FALSE, ARES_TRUE, &len);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  if (has_password) {
+    status = ares_uri_set_password_own(uri, temp);
+  } else {
+    status = ares_uri_set_username_own(uri, temp);
+  }
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+  temp = NULL;
+
+  /* Consume @ */
+  ares_buf_consume(buf, 1);
+
+done:
+  ares_free(temp);
+  return status;
+}
+
+static ares_status_t ares_uri_parse_hostport(ares_uri_t *uri, ares_buf_t *buf)
+{
+  unsigned char b;
+  char          host[256];
+  char          port[6];
+  size_t        len;
+  ares_status_t status;
+
+  status = ares_buf_peek_byte(buf, &b);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  /* Bracketed syntax for ipv6 addresses */
+  if (b == '[') {
+    ares_buf_consume(buf, 1);
+    ares_buf_tag(buf);
+    len = ares_buf_consume_until_charset(buf, (const unsigned char *)"]", 1,
+                                         ARES_TRUE);
+    if (len == SIZE_MAX) {
+      return ARES_EBADSTR;
+    }
+
+    status = ares_buf_tag_fetch_string(buf, host, sizeof(host));
+    if (status != ARES_SUCCESS) {
+      return status;
+    }
+    /* Consume ']' */
+    ares_buf_consume(buf, 1);
+  } else {
+    /* Either ipv4 or hostname */
+    ares_buf_tag(buf);
+    ares_buf_consume_until_charset(buf, (const unsigned char *)":", 1,
+                                   ARES_FALSE);
+
+    status = ares_buf_tag_fetch_string(buf, host, sizeof(host));
+    if (status != ARES_SUCCESS) {
+      return status;
+    }
+  }
+
+  status = ares_uri_set_host(uri, host);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  /* No port if nothing left to consume */
+  if (!ares_buf_len(buf)) {
+    return status;
+  }
+
+  status = ares_buf_peek_byte(buf, &b);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  /* Only valid extra character at this point is ':' */
+  if (b != ':') {
+    return ARES_EBADSTR;
+  }
+  ares_buf_consume(buf, 1);
+
+  len = ares_buf_len(buf);
+  if (len == 0 || len > sizeof(port) - 1) {
+    return ARES_EBADSTR;
+  }
+
+  status = ares_buf_fetch_bytes(buf, (unsigned char *)port, len);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+  port[len] = 0;
+
+  if (!ares_str_isnum(port)) {
+    return ARES_EBADSTR;
+  }
+
+  status = ares_uri_set_port(uri, (unsigned short)atoi(port));
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  return ARES_SUCCESS;
+}
+
+static ares_status_t ares_uri_parse_authority(ares_uri_t *uri, ares_buf_t *buf)
+{
+  ares_status_t        status;
+  size_t               bytes;
+  ares_buf_t          *auth = NULL;
+  const unsigned char *ptr;
+  size_t               ptr_len;
+
+  ares_buf_tag(buf);
+
+  bytes = ares_buf_consume_until_charset(buf, (const unsigned char *)"/?#", 3,
+                                         ARES_FALSE);
+  if (bytes == 0) {
+    return ARES_EBADSTR;
+  }
+
+  status = ares_buf_tag_fetch_constbuf(buf, &auth);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  ptr = ares_buf_peek(auth, &ptr_len);
+  if (!ares_uri_str_isvalid((const char *)ptr, ptr_len,
+                            ares_uri_chis_authority)) {
+    status = ARES_EBADSTR;
+    goto done;
+  }
+
+  status = ares_uri_parse_userinfo(uri, auth);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_parse_hostport(uri, auth);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  /* NOTE: the /, ?, or # is still in the buffer at this point so it can
+   *       be used to determine what parser should be called next */
+
+done:
+  ares_buf_destroy(auth);
+  return status;
+}
+
+static ares_status_t ares_uri_parse_path(ares_uri_t *uri, ares_buf_t *buf)
+{
+  unsigned char b;
+  char         *path = NULL;
+  ares_status_t status;
+  size_t        len;
+
+  if (ares_buf_len(buf) == 0) {
+    return ARES_SUCCESS;
+  }
+
+  status = ares_buf_peek_byte(buf, &b);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  /* Not a path, must be one of the others */
+  if (b != '/') {
+    return ARES_SUCCESS;
+  }
+
+  ares_buf_tag(buf);
+  ares_buf_consume_until_charset(buf, (const unsigned char *)"?#", 2,
+                                 ARES_FALSE);
+  status = ares_buf_tag_fetch_strdup(buf, &path);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  if (!ares_uri_str_isvalid(path, SIZE_MAX, ares_uri_chis_path_enc)) {
+    status = ARES_EBADSTR;
+    goto done;
+  }
+
+  status = ares_uri_decode_inplace(path, ARES_FALSE, ARES_TRUE, &len);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_set_path(uri, path);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+done:
+  ares_free(path);
+  return status;
+}
+
+static ares_status_t ares_uri_parse_query_buf(ares_uri_t *uri, ares_buf_t *buf)
+{
+  ares_status_t status = ARES_SUCCESS;
+  char         *key    = NULL;
+  char         *val    = NULL;
+
+  while (ares_buf_len(buf) > 0) {
+    unsigned char b = 0;
+    size_t        len;
+
+    ares_buf_tag(buf);
+
+    /* It's valid to have only a key with no value, so we search for both
+     * delims */
+    len = ares_buf_consume_until_charset(buf, (const unsigned char *)"&=", 2,
+                                         ARES_FALSE);
+    if (len == 0) {
+      /* If we're here, we have a zero length key which is invalid */
+      status = ARES_EBADSTR;
+      goto done;
+    }
+
+    if (ares_buf_len(buf) > 0) {
+      /* Determine if we stopped on & or = */
+      status = ares_buf_peek_byte(buf, &b);
+      if (status != ARES_SUCCESS) {
+        goto done;
+      }
+    }
+
+    status = ares_buf_tag_fetch_strdup(buf, &key);
+    if (status != ARES_SUCCESS) {
+      goto done;
+    }
+
+    if (!ares_uri_str_isvalid(key, SIZE_MAX, ares_uri_chis_query_enc)) {
+      status = ARES_EBADSTR;
+      goto done;
+    }
+
+    status = ares_uri_decode_inplace(key, ARES_TRUE, ARES_TRUE, &len);
+    if (status != ARES_SUCCESS) {
+      goto done;
+    }
+
+    /* Fetch Value */
+    if (b == '=') {
+      /* Skip delimiter */
+      ares_buf_consume(buf, 1);
+      ares_buf_tag(buf);
+      len = ares_buf_consume_until_charset(buf, (const unsigned char *)"&", 1,
+                                           ARES_FALSE);
+      if (len > 0) {
+        status = ares_buf_tag_fetch_strdup(buf, &val);
+        if (status != ARES_SUCCESS) {
+          goto done;
+        }
+
+        if (!ares_uri_str_isvalid(val, SIZE_MAX, ares_uri_chis_query_enc)) {
+          status = ARES_EBADSTR;
+          goto done;
+        }
+
+        status = ares_uri_decode_inplace(val, ARES_TRUE, ARES_TRUE, &len);
+        if (status != ARES_SUCCESS) {
+          goto done;
+        }
+      }
+    }
+
+    if (b != 0) {
+      /* Consume '&' */
+      ares_buf_consume(buf, 1);
+    }
+
+    status = ares_uri_set_query_key(uri, key, val);
+    if (status != ARES_SUCCESS) {
+      goto done;
+    }
+
+    ares_free(key);
+    key = NULL;
+    ares_free(val);
+    val = NULL;
+  }
+
+done:
+  ares_free(key);
+  ares_free(val);
+  return status;
+}
+
+static ares_status_t ares_uri_parse_query(ares_uri_t *uri, ares_buf_t *buf)
+{
+  unsigned char b;
+  ares_status_t status;
+  ares_buf_t   *query = NULL;
+  size_t        len;
+
+  if (ares_buf_len(buf) == 0) {
+    return ARES_SUCCESS;
+  }
+
+  status = ares_buf_peek_byte(buf, &b);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  /* Not a query, must be one of the others */
+  if (b != '?') {
+    return ARES_SUCCESS;
+  }
+
+  /* Only possible terminator is fragment indicator of '#' */
+  ares_buf_consume(buf, 1);
+  ares_buf_tag(buf);
+  len = ares_buf_consume_until_charset(buf, (const unsigned char *)"#", 1,
+                                       ARES_FALSE);
+  if (len == 0) {
+    /* No data, return */
+    return ARES_SUCCESS;
+  }
+
+  status = ares_buf_tag_fetch_constbuf(buf, &query);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  status = ares_uri_parse_query_buf(uri, query);
+  ares_buf_destroy(query);
+
+  return status;
+}
+
+static ares_status_t ares_uri_parse_fragment(ares_uri_t *uri, ares_buf_t *buf)
+{
+  unsigned char b;
+  char         *fragment = NULL;
+  ares_status_t status;
+  size_t        len;
+
+  if (ares_buf_len(buf) == 0) {
+    return ARES_SUCCESS;
+  }
+
+  status = ares_buf_peek_byte(buf, &b);
+  if (status != ARES_SUCCESS) {
+    return status;
+  }
+
+  /* Not a fragment, must be one of the others */
+  if (b != '#') {
+    return ARES_SUCCESS;
+  }
+
+  ares_buf_consume(buf, 1);
+
+  if (ares_buf_len(buf) == 0) {
+    return ARES_SUCCESS;
+  }
+
+  /* Rest of the buffer is the fragment */
+  status = ares_buf_fetch_str_dup(buf, ares_buf_len(buf), &fragment);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  if (!ares_uri_str_isvalid(fragment, SIZE_MAX, ares_uri_chis_fragment_enc)) {
+    status = ARES_EBADSTR;
+    goto done;
+  }
+
+  status = ares_uri_decode_inplace(fragment, ARES_FALSE, ARES_TRUE, &len);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_set_fragment_own(uri, fragment);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+  fragment = NULL;
+
+done:
+  ares_free(fragment);
+  return status;
+}
+
+ares_status_t ares_uri_parse_buf(ares_uri_t **out, ares_buf_t *buf)
+{
+  ares_status_t status;
+  ares_uri_t   *uri = NULL;
+  size_t        orig_pos;
+
+  if (out == NULL || buf == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  *out = NULL;
+
+  orig_pos = ares_buf_get_position(buf);
+
+  uri = ares_uri_create();
+  if (uri == NULL) {
+    status = ARES_ENOMEM;
+    goto done;
+  }
+
+  status = ares_uri_parse_scheme(uri, buf);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_parse_authority(uri, buf);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_parse_path(uri, buf);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_parse_query(uri, buf);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_parse_fragment(uri, buf);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+done:
+  if (status != ARES_SUCCESS) {
+    ares_buf_set_position(buf, orig_pos);
+    ares_uri_destroy(uri);
+  } else {
+    *out = uri;
+  }
+  return status;
+}
+
+ares_status_t ares_uri_parse(ares_uri_t **out, const char *str)
+{
+  ares_status_t status;
+  ares_buf_t   *buf = NULL;
+
+  if (out == NULL || str == NULL) {
+    return ARES_EFORMERR;
+  }
+
+  *out = NULL;
+
+  buf = ares_buf_create();
+  if (buf == NULL) {
+    status = ARES_ENOMEM;
+    goto done;
+  }
+
+  status = ares_buf_append_str(buf, str);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  status = ares_uri_parse_buf(out, buf);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+done:
+  ares_buf_destroy(buf);
+
+  return status;
+}
diff --git a/deps/cares/src/lib/util/ares_uri.h b/deps/cares/src/lib/util/ares_uri.h
new file mode 100644
index 00000000000000..6a703cba5b53c5
--- /dev/null
+++ b/deps/cares/src/lib/util/ares_uri.h
@@ -0,0 +1,252 @@
+/* MIT License
+ *
+ * Copyright (c) 2024 Brad House
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * SPDX-License-Identifier: MIT
+ */
+#ifndef __ARES_URI_H
+#define __ARES_URI_H
+
+/*! \addtogroup ares_uri URI parser and writer implementation
+ *
+ * This is a fairly complete URI parser and writer implementation (RFC 3986) for
+ * schemes which use the :// syntax. Does not currently support URIs without an
+ * authority section, such as "mailto:person@example.com".
+ *
+ * Its implementation is overkill for our current needs to be able to express
+ * DNS server configuration, but there was really no reason not to support
+ * a greater subset of the specification.
+ *
+ * @{
+ */
+
+
+struct ares_uri;
+
+/*! URI object */
+typedef struct ares_uri ares_uri_t;
+
+/*! Create a new URI object
+ *
+ *  \return new ares_uri_t, must be freed with ares_uri_destroy()
+ */
+ares_uri_t             *ares_uri_create(void);
+
+/*! Destroy an initialized URI object
+ *
+ *  \param[in] uri  Initialized URI object
+ */
+void                    ares_uri_destroy(ares_uri_t *uri);
+
+/*! Set the URI scheme.  Automatically lower-cases the scheme provided.
+ *  Only allows Alpha, Digit, +, -, and . characters.  Maximum length is
+ *  15 characters.  This is required to be set to write a URI.
+ *
+ *  \param[in] uri    Initialized URI object
+ *  \param[in] scheme Scheme to set the object to use
+ *  \return ARES_SUCCESS on success
+ */
+ares_status_t  ares_uri_set_scheme(ares_uri_t *uri, const char *scheme);
+
+/*! Retrieve the currently configured URI scheme.
+ *
+ *  \param[in] uri    Initialized URI object
+ *  \return string containing URI scheme
+ */
+const char    *ares_uri_get_scheme(const ares_uri_t *uri);
+
+/*! Set the username in the URI object
+ *
+ *  \param[in] uri      Initialized URI object
+ *  \param[in] username Username to set. May be NULL to unset existing username.
+ *  \return ARES_SUCCESS on success
+ */
+ares_status_t  ares_uri_set_username(ares_uri_t *uri, const char *username);
+
+/*! Retrieve the currently configured username.
+ *
+ *  \param[in] uri    Initialized URI object
+ *  \return string containing username, may be NULL if not set.
+ */
+const char    *ares_uri_get_username(const ares_uri_t *uri);
+
+/*! Set the password in the URI object
+ *
+ *  \param[in] uri      Initialized URI object
+ *  \param[in] password Password to set. May be NULL to unset existing password.
+ *  \return ARES_SUCCESS on success
+ */
+ares_status_t  ares_uri_set_password(ares_uri_t *uri, const char *password);
+
+/*! Retrieve the currently configured password.
+ *
+ *  \param[in] uri    Initialized URI object
+ *  \return string containing password, may be NULL if not set.
+ */
+const char    *ares_uri_get_password(const ares_uri_t *uri);
+
+/*! Set the host or ip address in the URI object.  This is required to be
+ *  set to write a URI.  The character set is strictly validated.
+ *
+ *  \param[in] uri      Initialized URI object
+ *  \param[in] host     IPv4, IPv6, or hostname to set.
+ *  \return ARES_SUCCESS on success
+ */
+ares_status_t  ares_uri_set_host(ares_uri_t *uri, const char *host);
+
+/*! Retrieve the currently configured host (or ip address).  IPv6 addresses
+ *  may include a link-local scope (e.g. fe80::b542:84df:1719:65e3%en0).
+ *
+ *  \param[in] uri    Initialized URI object
+ *  \return string containing host, may be NULL if not set.
+ */
+const char    *ares_uri_get_host(const ares_uri_t *uri);
+
+/*! Set the port to use in the URI object.  A port value of 0 will omit
+ *  the port from the URI when written, thus using the scheme's default.
+ *
+ *  \param[in] uri  Initialized URI object
+ *  \param[in] port Port to set. Use 0 to unset.
+ *  \return ARES_SUCCESS on success
+ */
+ares_status_t  ares_uri_set_port(ares_uri_t *uri, unsigned short port);
+
+/*! Retrieve the currently configured port
+ *
+ *  \param[in] uri    Initialized URI object
+ *  \return port number, or 0 if not set.
+ */
+unsigned short ares_uri_get_port(const ares_uri_t *uri);
+
+/*! Set the path in the URI object.  Unsupported characters will be URI-encoded
+ *  when written.
+ *
+ *  \param[in] uri  Initialized URI object
+ *  \param[in] path Path to set. May be NULL to unset existing path.
+ *  \return ARES_SUCCESS on success
+ */
+ares_status_t  ares_uri_set_path(ares_uri_t *uri, const char *path);
+
+/*! Retrieves the path in the URI object.  If retrieved after parse, this
+ *  value will be URI-decoded already.
+ *
+ *  \param[in] uri Initialized URI object
+ *  \return path string, or NULL if not set.
+ */
+const char    *ares_uri_get_path(const ares_uri_t *uri);
+
+/*! Set a new query key/value pair.  There is no set order for query keys
+ *  when output in the URI; they will be emitted in a random order.  Keys are
+ *  case-insensitive.  Query keys and values will be automatically URI-encoded
+ *  when written.
+ *
+ *  \param[in] uri  Initialized URI object
+ *  \param[in] key  Query key to use, must be non-zero length.
+ *  \param[in] val  Query value to use, may be NULL.
+ *  \return ARES_SUCCESS on success
+ */
+ares_status_t  ares_uri_set_query_key(ares_uri_t *uri, const char *key,
+                                      const char *val);
+
+/*! Delete a specific query key.
+ *
+ *  \param[in] uri Initialized URI object
+ *  \param[in] key Key to delete.
+ *  \return ARES_SUCCESS if deleted, ARES_ENOTFOUND if not found
+ */
+ares_status_t  ares_uri_del_query_key(ares_uri_t *uri, const char *key);
+
+/*! Retrieve the value associated with a query key. Keys are case-insensitive.
+ *
+ *  \param[in] uri Initialized URI object
+ *  \param[in] key Key to retrieve.
+ *  \return string representing value, may be NULL if either not found or
+ *          NULL value set.  There is currently no way to indicate the
+ *          difference.
+ */
+const char    *ares_uri_get_query_key(const ares_uri_t *uri, const char *key);
+
+/*! Retrieve a complete list of query keys.
+ *
+ *  \param[in]  uri Initialized URI object
+ *  \param[out] num Number of keys.
+ *  \return NULL on failure or no keys. Use
+ *          ares_free_array(keys, num, ares_free) when done with array.
+ */
+char         **ares_uri_get_query_keys(const ares_uri_t *uri, size_t *num);
+
+/*! Set the fragment in the URI object.  Unsupported characters will be
+ *  URI-encoded when written.
+ *
+ *  \param[in] uri      Initialized URI object
+ *  \param[in] fragment Fragment to set. May be NULL to unset existing fragment.
+ *  \return ARES_SUCCESS on success
+ */
+ares_status_t  ares_uri_set_fragment(ares_uri_t *uri, const char *fragment);
+
+/*! Retrieves the fragment in the URI object.  If retrieved after parse, this
+ *  value will be URI-decoded already.
+ *
+ *  \param[in] uri Initialized URI object
+ *  \return fragment string, or NULL if not set.
+ */
+const char    *ares_uri_get_fragment(const ares_uri_t *uri);
+
+/*! Parse the provided URI buffer into a new URI object.
+ *
+ *  \param[out] out  Returned new URI object. Free with ares_uri_destroy().
+ *  \param[in]  buf  Buffer object containing the URI
+ *  \return ARES_SUCCESS on successful parse. On failure the 'buf' object will
+ *          be restored to its initial state in case another parser needs to
+ *          be attempted.
+ */
+ares_status_t  ares_uri_parse_buf(ares_uri_t **out, ares_buf_t *buf);
+
+/*! Parse the provided URI string into a new URI object.
+ *
+ *  \param[out] out  Returned new URI object. Free with ares_uri_destroy().
+ *  \param[in]  uri  URI string to parse
+ *  \return ARES_SUCCESS on successful parse
+ */
+ares_status_t  ares_uri_parse(ares_uri_t **out, const char *uri);
+
+/*! Write URI object to a new string buffer.  Requires at least the scheme
+ *  and host to be set for this to succeed.
+ *
+ *  \param[out] out  Returned new URI string. Free with ares_free().
+ *  \param[in]  uri  Initialized URI object.
+ *  \return ARES_SUCCESS on successful write.
+ */
+ares_status_t  ares_uri_write(char **out, const ares_uri_t *uri);
+
+/*! Write URI object to an existing ares_buf_t object.  Requires at least the
+ *  scheme and host to be set for this to succeed.
+ *
+ *  \param[in]     uri  Initialized URI object.
+ *  \param[in,out] buf  Destination buf object.
+ *  \return ARES_SUCCESS on successful write.
+ */
+ares_status_t  ares_uri_write_buf(const ares_uri_t *uri, ares_buf_t *buf);
+
+/*! @} */
+
+#endif /* __ARES_URI_H */
diff --git a/deps/cares/src/tools/CMakeLists.txt b/deps/cares/src/tools/CMakeLists.txt
index e23d0f23c781f7..c8c0041e54de81 100644
--- a/deps/cares/src/tools/CMakeLists.txt
+++ b/deps/cares/src/tools/CMakeLists.txt
@@ -11,6 +11,7 @@ IF (CARES_BUILD_TOOLS)
 		PUBLIC "$<BUILD_INTERFACE:${PROJECT_BINARY_DIR}>"
 		       "$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}>"
 		       "$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/src/lib>"
+		       "$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/src/lib/include>"
 		       "$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>"
 		       "$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>"
 		PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}"
@@ -25,13 +26,18 @@ IF (CARES_BUILD_TOOLS)
 
 	TARGET_COMPILE_DEFINITIONS (ahost PRIVATE HAVE_CONFIG_H=1 CARES_NO_DEPRECATED)
 	TARGET_LINK_LIBRARIES (ahost PRIVATE ${PROJECT_NAME})
+
+	# Avoid "fatal error C1041: cannot open program database" due to multiple
+	# targets trying to use the same PDB.  /FS does NOT resolve this issue.
+	SET_TARGET_PROPERTIES(ahost PROPERTIES COMPILE_PDB_NAME ahost.pdb)
+
 	IF (CARES_INSTALL)
 		INSTALL (TARGETS ahost COMPONENT Tools ${TARGETS_INST_DEST})
 	ENDIF ()
 
 
 	# Build adig
-	ADD_EXECUTABLE (adig adig.c ${SAMPLESOURCES})
+	ADD_EXECUTABLE (adig adig.c)
 	# Don't build adig and ahost in parallel.  This is to prevent a Windows MSVC
 	# build error due to them both using the same source files.
 	ADD_DEPENDENCIES(adig ahost)
@@ -39,6 +45,7 @@ IF (CARES_BUILD_TOOLS)
 		PUBLIC "$<BUILD_INTERFACE:${PROJECT_BINARY_DIR}>"
 		       "$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}>"
 		       "$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/src/lib>"
+		       "$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/src/lib/include>"
 		       "$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>"
 		       "$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>"
 		PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}"
@@ -53,6 +60,11 @@ IF (CARES_BUILD_TOOLS)
 
 	TARGET_COMPILE_DEFINITIONS (adig PRIVATE HAVE_CONFIG_H=1 CARES_NO_DEPRECATED)
 	TARGET_LINK_LIBRARIES (adig PRIVATE ${PROJECT_NAME})
+
+	# Avoid "fatal error C1041: cannot open program database" due to multiple
+	# targets trying to use the same PDB.  /FS does NOT resolve this issue.
+	SET_TARGET_PROPERTIES(adig PROPERTIES COMPILE_PDB_NAME adig.pdb)
+
 	IF (CARES_INSTALL)
 		INSTALL (TARGETS adig COMPONENT Tools ${TARGETS_INST_DEST})
 	ENDIF ()
diff --git a/deps/cares/src/tools/Makefile.am b/deps/cares/src/tools/Makefile.am
index ba7a672f89faf5..439b40b192a880 100644
--- a/deps/cares/src/tools/Makefile.am
+++ b/deps/cares/src/tools/Makefile.am
@@ -16,6 +16,7 @@ AM_CPPFLAGS += -I$(top_builddir)/include \
                -I$(top_builddir)/src/lib \
                -I$(top_srcdir)/include \
                -I$(top_srcdir)/src/lib \
+               -I$(top_srcdir)/src/lib/include \
                -DCARES_NO_DEPRECATED
 
 include Makefile.inc
@@ -28,6 +29,6 @@ ahost_SOURCES = ahost.c $(SAMPLESOURCES) $(SAMPLEHEADERS)
 ahost_CFLAGS = $(AM_CFLAGS)
 ahost_CPPFLAGS = $(AM_CPPFLAGS)
 
-adig_SOURCES = adig.c $(SAMPLESOURCES) $(SAMPLEHEADERS)
+adig_SOURCES = adig.c
 adig_CFLAGS = $(AM_CFLAGS)
 adig_CPPFLAGS = $(AM_CPPFLAGS)
diff --git a/deps/cares/src/tools/Makefile.in b/deps/cares/src/tools/Makefile.in
index e1b661ec1d7cbf..ace5023f03cfb6 100644
--- a/deps/cares/src/tools/Makefile.in
+++ b/deps/cares/src/tools/Makefile.in
@@ -126,12 +126,7 @@ CONFIG_CLEAN_FILES =
 CONFIG_CLEAN_VPATH_FILES =
 am__EXEEXT_1 = ahost$(EXEEXT) adig$(EXEEXT)
 PROGRAMS = $(noinst_PROGRAMS)
-am__dirstamp = $(am__leading_dot)dirstamp
-am__objects_1 = adig-ares_getopt.$(OBJEXT) \
-	../lib/str/adig-ares_strcasecmp.$(OBJEXT)
-am__objects_2 =
-am_adig_OBJECTS = adig-adig.$(OBJEXT) $(am__objects_1) \
-	$(am__objects_2)
+am_adig_OBJECTS = adig-adig.$(OBJEXT)
 adig_OBJECTS = $(am_adig_OBJECTS)
 adig_LDADD = $(LDADD)
 am__DEPENDENCIES_1 =
@@ -144,9 +139,9 @@ am__v_lt_1 =
 adig_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) \
 	$(LIBTOOLFLAGS) --mode=link $(CCLD) $(adig_CFLAGS) $(CFLAGS) \
 	$(AM_LDFLAGS) $(LDFLAGS) -o $@
-am__objects_3 = ahost-ares_getopt.$(OBJEXT) \
-	../lib/str/ahost-ares_strcasecmp.$(OBJEXT)
-am_ahost_OBJECTS = ahost-ahost.$(OBJEXT) $(am__objects_3) \
+am__objects_1 = ahost-ares_getopt.$(OBJEXT)
+am__objects_2 =
+am_ahost_OBJECTS = ahost-ahost.$(OBJEXT) $(am__objects_1) \
 	$(am__objects_2)
 ahost_OBJECTS = $(am_ahost_OBJECTS)
 ahost_LDADD = $(LDADD)
@@ -170,9 +165,7 @@ am__v_at_1 =
 DEFAULT_INCLUDES = 
 depcomp = $(SHELL) $(top_srcdir)/config/depcomp
 am__maybe_remake_depfiles = depfiles
-am__depfiles_remade = ../lib/str/$(DEPDIR)/adig-ares_strcasecmp.Po \
-	../lib/str/$(DEPDIR)/ahost-ares_strcasecmp.Po \
-	./$(DEPDIR)/adig-adig.Po ./$(DEPDIR)/adig-ares_getopt.Po \
+am__depfiles_remade = ./$(DEPDIR)/adig-adig.Po \
 	./$(DEPDIR)/ahost-ahost.Po ./$(DEPDIR)/ahost-ares_getopt.Po
 am__mv = mv -f
 COMPILE = $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) \
@@ -230,7 +223,8 @@ AM_CFLAGS = @AM_CFLAGS@
 # might possibly already be installed in the system.
 AM_CPPFLAGS = @AM_CPPFLAGS@ -I$(top_builddir)/include \
 	-I$(top_builddir)/src/lib -I$(top_srcdir)/include \
-	-I$(top_srcdir)/src/lib -DCARES_NO_DEPRECATED
+	-I$(top_srcdir)/src/lib -I$(top_srcdir)/src/lib/include \
+	-DCARES_NO_DEPRECATED
 AM_DEFAULT_VERBOSITY = @AM_DEFAULT_VERBOSITY@
 AR = @AR@
 AS = @AS@
@@ -396,12 +390,8 @@ EXTRA_DIST = CMakeLists.txt Makefile.inc
 
 # Copyright (C) The c-ares project and its contributors
 # SPDX-License-Identifier: MIT
-SAMPLESOURCES = ares_getopt.c		\
-  ../lib/str/ares_strcasecmp.c
-
-SAMPLEHEADERS = ares_getopt.h		\
-  ../lib/str/ares_strcasecmp.h
-
+SAMPLESOURCES = ares_getopt.c
+SAMPLEHEADERS = ares_getopt.h
 
 # We're not interested in code coverage of the test apps themselves, but need
 # to link with gcov if building with code coverage enabled
@@ -409,7 +399,7 @@ LDADD = $(top_builddir)/src/lib/libcares.la $(CODE_COVERAGE_LIBS)
 ahost_SOURCES = ahost.c $(SAMPLESOURCES) $(SAMPLEHEADERS)
 ahost_CFLAGS = $(AM_CFLAGS)
 ahost_CPPFLAGS = $(AM_CPPFLAGS)
-adig_SOURCES = adig.c $(SAMPLESOURCES) $(SAMPLEHEADERS)
+adig_SOURCES = adig.c
 adig_CFLAGS = $(AM_CFLAGS)
 adig_CPPFLAGS = $(AM_CPPFLAGS)
 all: all-am
@@ -450,21 +440,10 @@ $(am__aclocal_m4_deps):
 clean-noinstPROGRAMS:
 	$(am__rm_f) $(noinst_PROGRAMS)
 	test -z "$(EXEEXT)" || $(am__rm_f) $(noinst_PROGRAMS:$(EXEEXT)=)
-../lib/str/$(am__dirstamp):
-	@$(MKDIR_P) ../lib/str
-	@: >>../lib/str/$(am__dirstamp)
-../lib/str/$(DEPDIR)/$(am__dirstamp):
-	@$(MKDIR_P) ../lib/str/$(DEPDIR)
-	@: >>../lib/str/$(DEPDIR)/$(am__dirstamp)
-../lib/str/adig-ares_strcasecmp.$(OBJEXT): ../lib/str/$(am__dirstamp) \
-	../lib/str/$(DEPDIR)/$(am__dirstamp)
 
 adig$(EXEEXT): $(adig_OBJECTS) $(adig_DEPENDENCIES) $(EXTRA_adig_DEPENDENCIES) 
 	@rm -f adig$(EXEEXT)
 	$(AM_V_CCLD)$(adig_LINK) $(adig_OBJECTS) $(adig_LDADD) $(LIBS)
-../lib/str/ahost-ares_strcasecmp.$(OBJEXT):  \
-	../lib/str/$(am__dirstamp) \
-	../lib/str/$(DEPDIR)/$(am__dirstamp)
 
 ahost$(EXEEXT): $(ahost_OBJECTS) $(ahost_DEPENDENCIES) $(EXTRA_ahost_DEPENDENCIES) 
 	@rm -f ahost$(EXEEXT)
@@ -472,15 +451,11 @@ ahost$(EXEEXT): $(ahost_OBJECTS) $(ahost_DEPENDENCIES) $(EXTRA_ahost_DEPENDENCIE
 
 mostlyclean-compile:
 	-rm -f *.$(OBJEXT)
-	-rm -f ../lib/str/*.$(OBJEXT)
 
 distclean-compile:
 	-rm -f *.tab.c
 
-@AMDEP_TRUE@@am__include@ @am__quote@../lib/str/$(DEPDIR)/adig-ares_strcasecmp.Po@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@../lib/str/$(DEPDIR)/ahost-ares_strcasecmp.Po@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/adig-adig.Po@am__quote@ # am--include-marker
-@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/adig-ares_getopt.Po@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ahost-ahost.Po@am__quote@ # am--include-marker
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ahost-ares_getopt.Po@am__quote@ # am--include-marker
 
@@ -528,34 +503,6 @@ adig-adig.obj: adig.c
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(adig_CPPFLAGS) $(CPPFLAGS) $(adig_CFLAGS) $(CFLAGS) -c -o adig-adig.obj `if test -f 'adig.c'; then $(CYGPATH_W) 'adig.c'; else $(CYGPATH_W) '$(srcdir)/adig.c'; fi`
 
-adig-ares_getopt.o: ares_getopt.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(adig_CPPFLAGS) $(CPPFLAGS) $(adig_CFLAGS) $(CFLAGS) -MT adig-ares_getopt.o -MD -MP -MF $(DEPDIR)/adig-ares_getopt.Tpo -c -o adig-ares_getopt.o `test -f 'ares_getopt.c' || echo '$(srcdir)/'`ares_getopt.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/adig-ares_getopt.Tpo $(DEPDIR)/adig-ares_getopt.Po
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares_getopt.c' object='adig-ares_getopt.o' libtool=no @AMDEPBACKSLASH@
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(adig_CPPFLAGS) $(CPPFLAGS) $(adig_CFLAGS) $(CFLAGS) -c -o adig-ares_getopt.o `test -f 'ares_getopt.c' || echo '$(srcdir)/'`ares_getopt.c
-
-adig-ares_getopt.obj: ares_getopt.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(adig_CPPFLAGS) $(CPPFLAGS) $(adig_CFLAGS) $(CFLAGS) -MT adig-ares_getopt.obj -MD -MP -MF $(DEPDIR)/adig-ares_getopt.Tpo -c -o adig-ares_getopt.obj `if test -f 'ares_getopt.c'; then $(CYGPATH_W) 'ares_getopt.c'; else $(CYGPATH_W) '$(srcdir)/ares_getopt.c'; fi`
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/adig-ares_getopt.Tpo $(DEPDIR)/adig-ares_getopt.Po
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='ares_getopt.c' object='adig-ares_getopt.obj' libtool=no @AMDEPBACKSLASH@
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(adig_CPPFLAGS) $(CPPFLAGS) $(adig_CFLAGS) $(CFLAGS) -c -o adig-ares_getopt.obj `if test -f 'ares_getopt.c'; then $(CYGPATH_W) 'ares_getopt.c'; else $(CYGPATH_W) '$(srcdir)/ares_getopt.c'; fi`
-
-../lib/str/adig-ares_strcasecmp.o: ../lib/str/ares_strcasecmp.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(adig_CPPFLAGS) $(CPPFLAGS) $(adig_CFLAGS) $(CFLAGS) -MT ../lib/str/adig-ares_strcasecmp.o -MD -MP -MF ../lib/str/$(DEPDIR)/adig-ares_strcasecmp.Tpo -c -o ../lib/str/adig-ares_strcasecmp.o `test -f '../lib/str/ares_strcasecmp.c' || echo '$(srcdir)/'`../lib/str/ares_strcasecmp.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) ../lib/str/$(DEPDIR)/adig-ares_strcasecmp.Tpo ../lib/str/$(DEPDIR)/adig-ares_strcasecmp.Po
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='../lib/str/ares_strcasecmp.c' object='../lib/str/adig-ares_strcasecmp.o' libtool=no @AMDEPBACKSLASH@
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(adig_CPPFLAGS) $(CPPFLAGS) $(adig_CFLAGS) $(CFLAGS) -c -o ../lib/str/adig-ares_strcasecmp.o `test -f '../lib/str/ares_strcasecmp.c' || echo '$(srcdir)/'`../lib/str/ares_strcasecmp.c
-
-../lib/str/adig-ares_strcasecmp.obj: ../lib/str/ares_strcasecmp.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(adig_CPPFLAGS) $(CPPFLAGS) $(adig_CFLAGS) $(CFLAGS) -MT ../lib/str/adig-ares_strcasecmp.obj -MD -MP -MF ../lib/str/$(DEPDIR)/adig-ares_strcasecmp.Tpo -c -o ../lib/str/adig-ares_strcasecmp.obj `if test -f '../lib/str/ares_strcasecmp.c'; then $(CYGPATH_W) '../lib/str/ares_strcasecmp.c'; else $(CYGPATH_W) '$(srcdir)/../lib/str/ares_strcasecmp.c'; fi`
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) ../lib/str/$(DEPDIR)/adig-ares_strcasecmp.Tpo ../lib/str/$(DEPDIR)/adig-ares_strcasecmp.Po
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='../lib/str/ares_strcasecmp.c' object='../lib/str/adig-ares_strcasecmp.obj' libtool=no @AMDEPBACKSLASH@
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(adig_CPPFLAGS) $(CPPFLAGS) $(adig_CFLAGS) $(CFLAGS) -c -o ../lib/str/adig-ares_strcasecmp.obj `if test -f '../lib/str/ares_strcasecmp.c'; then $(CYGPATH_W) '../lib/str/ares_strcasecmp.c'; else $(CYGPATH_W) '$(srcdir)/../lib/str/ares_strcasecmp.c'; fi`
-
 ahost-ahost.o: ahost.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(ahost_CPPFLAGS) $(CPPFLAGS) $(ahost_CFLAGS) $(CFLAGS) -MT ahost-ahost.o -MD -MP -MF $(DEPDIR)/ahost-ahost.Tpo -c -o ahost-ahost.o `test -f 'ahost.c' || echo '$(srcdir)/'`ahost.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/ahost-ahost.Tpo $(DEPDIR)/ahost-ahost.Po
@@ -584,20 +531,6 @@ ahost-ares_getopt.obj: ares_getopt.c
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(ahost_CPPFLAGS) $(CPPFLAGS) $(ahost_CFLAGS) $(CFLAGS) -c -o ahost-ares_getopt.obj `if test -f 'ares_getopt.c'; then $(CYGPATH_W) 'ares_getopt.c'; else $(CYGPATH_W) '$(srcdir)/ares_getopt.c'; fi`
 
-../lib/str/ahost-ares_strcasecmp.o: ../lib/str/ares_strcasecmp.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(ahost_CPPFLAGS) $(CPPFLAGS) $(ahost_CFLAGS) $(CFLAGS) -MT ../lib/str/ahost-ares_strcasecmp.o -MD -MP -MF ../lib/str/$(DEPDIR)/ahost-ares_strcasecmp.Tpo -c -o ../lib/str/ahost-ares_strcasecmp.o `test -f '../lib/str/ares_strcasecmp.c' || echo '$(srcdir)/'`../lib/str/ares_strcasecmp.c
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) ../lib/str/$(DEPDIR)/ahost-ares_strcasecmp.Tpo ../lib/str/$(DEPDIR)/ahost-ares_strcasecmp.Po
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='../lib/str/ares_strcasecmp.c' object='../lib/str/ahost-ares_strcasecmp.o' libtool=no @AMDEPBACKSLASH@
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(ahost_CPPFLAGS) $(CPPFLAGS) $(ahost_CFLAGS) $(CFLAGS) -c -o ../lib/str/ahost-ares_strcasecmp.o `test -f '../lib/str/ares_strcasecmp.c' || echo '$(srcdir)/'`../lib/str/ares_strcasecmp.c
-
-../lib/str/ahost-ares_strcasecmp.obj: ../lib/str/ares_strcasecmp.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(ahost_CPPFLAGS) $(CPPFLAGS) $(ahost_CFLAGS) $(CFLAGS) -MT ../lib/str/ahost-ares_strcasecmp.obj -MD -MP -MF ../lib/str/$(DEPDIR)/ahost-ares_strcasecmp.Tpo -c -o ../lib/str/ahost-ares_strcasecmp.obj `if test -f '../lib/str/ares_strcasecmp.c'; then $(CYGPATH_W) '../lib/str/ares_strcasecmp.c'; else $(CYGPATH_W) '$(srcdir)/../lib/str/ares_strcasecmp.c'; fi`
-@am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) ../lib/str/$(DEPDIR)/ahost-ares_strcasecmp.Tpo ../lib/str/$(DEPDIR)/ahost-ares_strcasecmp.Po
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='../lib/str/ares_strcasecmp.c' object='../lib/str/ahost-ares_strcasecmp.obj' libtool=no @AMDEPBACKSLASH@
-@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(ahost_CPPFLAGS) $(CPPFLAGS) $(ahost_CFLAGS) $(CFLAGS) -c -o ../lib/str/ahost-ares_strcasecmp.obj `if test -f '../lib/str/ares_strcasecmp.c'; then $(CYGPATH_W) '../lib/str/ares_strcasecmp.c'; else $(CYGPATH_W) '$(srcdir)/../lib/str/ares_strcasecmp.c'; fi`
-
 mostlyclean-libtool:
 	-rm -f *.lo
 
@@ -718,8 +651,6 @@ clean-generic:
 distclean-generic:
 	-$(am__rm_f) $(CONFIG_CLEAN_FILES)
 	-test . = "$(srcdir)" || $(am__rm_f) $(CONFIG_CLEAN_VPATH_FILES)
-	-$(am__rm_f) ../lib/str/$(DEPDIR)/$(am__dirstamp)
-	-$(am__rm_f) ../lib/str/$(am__dirstamp)
 
 maintainer-clean-generic:
 	@echo "This command is intended for maintainers to use"
@@ -730,10 +661,7 @@ clean-am: clean-generic clean-libtool clean-noinstPROGRAMS \
 	mostlyclean-am
 
 distclean: distclean-am
-	-rm -f ../lib/str/$(DEPDIR)/adig-ares_strcasecmp.Po
-	-rm -f ../lib/str/$(DEPDIR)/ahost-ares_strcasecmp.Po
 	-rm -f ./$(DEPDIR)/adig-adig.Po
-	-rm -f ./$(DEPDIR)/adig-ares_getopt.Po
 	-rm -f ./$(DEPDIR)/ahost-ahost.Po
 	-rm -f ./$(DEPDIR)/ahost-ares_getopt.Po
 	-rm -f Makefile
@@ -781,10 +709,7 @@ install-ps-am:
 installcheck-am:
 
 maintainer-clean: maintainer-clean-am
-	-rm -f ../lib/str/$(DEPDIR)/adig-ares_strcasecmp.Po
-	-rm -f ../lib/str/$(DEPDIR)/ahost-ares_strcasecmp.Po
 	-rm -f ./$(DEPDIR)/adig-adig.Po
-	-rm -f ./$(DEPDIR)/adig-ares_getopt.Po
 	-rm -f ./$(DEPDIR)/ahost-ahost.Po
 	-rm -f ./$(DEPDIR)/ahost-ares_getopt.Po
 	-rm -f Makefile
diff --git a/deps/cares/src/tools/Makefile.inc b/deps/cares/src/tools/Makefile.inc
index 4c6b6aaa978ce3..088c7d4e06206d 100644
--- a/deps/cares/src/tools/Makefile.inc
+++ b/deps/cares/src/tools/Makefile.inc
@@ -1,7 +1,5 @@
 # Copyright (C) The c-ares project and its contributors
 # SPDX-License-Identifier: MIT
-SAMPLESOURCES = ares_getopt.c		\
-  ../lib/str/ares_strcasecmp.c
+SAMPLESOURCES = ares_getopt.c
 
-SAMPLEHEADERS = ares_getopt.h		\
-  ../lib/str/ares_strcasecmp.h
+SAMPLEHEADERS = ares_getopt.h
diff --git a/deps/cares/src/tools/adig.c b/deps/cares/src/tools/adig.c
index 8b2ad2e96a6219..fce210a8053578 100644
--- a/deps/cares/src/tools/adig.c
+++ b/deps/cares/src/tools/adig.c
@@ -43,226 +43,159 @@
 #endif
 
 #include "ares.h"
+#include "ares_array.h"
+#include "ares_buf.h"
 #include "ares_dns.h"
+#include "ares_getopt.h"
+#include "ares_mem.h"
+#include "ares_str.h"
 
-#ifndef HAVE_STRDUP
-#  include "str/ares_str.h"
-#  define strdup(ptr) ares_strdup(ptr)
-#endif
+#include "limits.h"
 
-#ifndef HAVE_STRCASECMP
-#  include "str/ares_strcasecmp.h"
-#  define strcasecmp(p1, p2) ares_strcasecmp(p1, p2)
+#ifndef PATH_MAX
+#  define PATH_MAX 1024
 #endif
 
-#ifndef HAVE_STRNCASECMP
-#  include "str/ares_strcasecmp.h"
-#  define strncasecmp(p1, p2, n) ares_strncasecmp(p1, p2, n)
-#endif
-
-#include "ares_getopt.h"
+typedef struct {
+  unsigned short port;
+  size_t         tries;
+  size_t         ndots;
+  ares_bool_t    tcp;
+  ares_bool_t    ignore_tc;
+  char          *search;
+  ares_bool_t    do_search;
+  ares_bool_t    aa_flag;
+  ares_bool_t    ad_flag;
+  ares_bool_t    cd_flag;
+  ares_bool_t    rd_flag;
+  /* ares_bool_t do_flag; */
+  ares_bool_t    edns;
+  size_t         udp_size;
+  ares_bool_t    primary;
+  ares_bool_t    aliases;
+  ares_bool_t    stayopen;
+  ares_bool_t    dns0x20;
+  ares_bool_t    display_class;
+  ares_bool_t    display_ttl;
+  ares_bool_t    display_command;
+  ares_bool_t    display_stats;
+  ares_bool_t    display_query;
+  ares_bool_t    display_question;
+  ares_bool_t    display_answer;
+  ares_bool_t    display_authority;
+  ares_bool_t    display_additional;
+  ares_bool_t    display_comments;
+} dns_options_t;
 
 typedef struct {
+  dns_options_t       opts;
   ares_bool_t         is_help;
+  ares_bool_t         no_rcfile;
   struct ares_options options;
   int                 optmask;
   ares_dns_class_t    qclass;
   ares_dns_rec_type_t qtype;
-  int                 args_processed;
+  char               *name;
   char               *servers;
   char                error[256];
 } adig_config_t;
 
-typedef struct {
-  const char *name;
-  int         value;
-} nv_t;
-
-static const nv_t configflags[] = {
-  { "usevc",     ARES_FLAG_USEVC     },
-  { "primary",   ARES_FLAG_PRIMARY   },
-  { "igntc",     ARES_FLAG_IGNTC     },
-  { "norecurse", ARES_FLAG_NORECURSE },
-  { "stayopen",  ARES_FLAG_STAYOPEN  },
-  { "noaliases", ARES_FLAG_NOALIASES },
-  { "edns",      ARES_FLAG_EDNS      },
-  { "dns0x20",   ARES_FLAG_DNS0x20   }
+static adig_config_t global_config;
+
+
+static const char   *helpstr[] = {
+  "usage: adig [@server] [-c class] [-p port#] [-q name] [-t type] [-x addr]",
+  "       [name] [type] [class] [queryopt...]",
+  "",
+  "@server: server ip address.  May specify multiple in comma delimited "
+    "format.",
+  "         may be specified in URI format",
+  "name:    name of the resource record that is to be looked up",
+  "type:    what type of query is required.  e.g. - A, AAAA, MX, TXT, etc.  If",
+  "         not specified, A will be used.",
+  "class:   Sets the query class, defaults to IN.  May also be HS or CH.",
+  "",
+  "FLAGS",
+  "-c class: Sets the query class, defaults to IN.  May also be HS or CH.",
+  "-h:       Prints this help.",
+  "-p port:  Sends query to a port other than 53.  Often recommended to set",
+  "          the port using @server instead.",
+  "-q name:  Specifies the domain name to query. Useful to distinguish name",
+  "          from other arguments",
+  "-r:       Skip adigrc processing",
+  "-s:       Server (alias for @server syntax), compatibility with old cmdline",
+  "-t type:  Indicates resource record type to query. Useful to distinguish",
+  "          type from other arguments",
+  "-x addr:  Simplified reverse lookups.  Sets the type to PTR and forms a",
+  "          valid in-arpa query string",
+  "",
+  "QUERY OPTIONS",
+  "+[no]aaonly:      Sets the aa flag in the query. Default is off.",
+  "+[no]aaflag:      Alias for +[no]aaonly",
+  "+[no]additional:  Toggles printing the additional section. On by default.",
+  "+[no]adflag:      Sets the ad (authentic data) bit in the query. Default is",
+  "                  off.",
+  "+[no]aliases:     Whether or not to honor the HOSTALIASES file. Default is",
+  "                  on.",
+  "+[no]all:         Toggles all of +[no]cmd, +[no]stats, +[no]question,",
+  "                  +[no]answer, +[no]authority, +[no]additional, "
+    "+[no]comments",
+  "+[no]answer:      Toggles printing the answer. On by default.",
+  "+[no]authority:   Toggles printing the authority. On by default.",
+  "+bufsize=#:       UDP EDNS 0 packet size allowed. Defaults to 1232.",
+  "+[no]cdflag:      Sets the CD (checking disabled) bit in the query. Default",
+  "                  is off.",
+  "+[no]class:       Display the class when printing the record. On by "
+    "default.",
+  "+[no]cmd:         Toggles printing the command requested. On by default.",
+  "+[no]comments:    Toggles printing the comments. On by default.",
+  "+[no]defname:     Alias for +[no]search",
+  "+domain=somename: Sets the search list to a single domain.",
+  "+[no]dns0x20:     Whether or not to use DNS 0x20 case randomization when",
+  "                  sending queries.  Default is off.",
+  "+[no]edns[=#]:    Enable or disable EDNS.  Only allows a value of 0 if",
+  "                  specified. Default is to enable EDNS.",
+  "+[no]ignore:      Ignore truncation on UDP, by default retried on TCP.",
+  "+[no]keepopen:    Whether or not the server connection should be "
+    "persistent.",
+  "                  Default is off.",
+  "+ndots=#:         Sets the number of dots that must appear before being",
+  "                  considered absolute. Defaults to 1.",
+  "+[no]primary:     Whether or not to only use a single server if more than "
+    "one",
+  "                  server is available.  Defaults to using all servers.",
+  "+[no]qr:          Toggles printing the request query. Off by default.",
+  "+[no]question:    Toggles printing the question. On by default.",
+  "+[no]recurse:     Toggles the RD (Recursion Desired) bit. On by default.",
+  "+retry=#:         Same as +tries but does not include the initial attempt.",
+  "+[no]search:      To use or not use the search list. Search list is not "
+    "used",
+  "                  by default.",
+  "+[no]stats:       Toggles printing the statistics. On by default.",
+  "+[no]tcp:         Whether to use TCP when querying name servers. Default is",
+  "                  UDP.",
+  "+tries=#:         Number of query tries. Defaults to 3.",
+  "+[no]ttlid:       Display the TTL when printing the record. On by default.",
+  "+[no]vc:          Alias for +[no]tcp",
+  "",
+  NULL
 };
-static const size_t nconfigflags = sizeof(configflags) / sizeof(*configflags);
-
-static int          lookup_flag(const nv_t *nv, size_t num_nv, const char *name)
-{
-  size_t i;
-
-  if (name == NULL) {
-    return 0;
-  }
-
-  for (i = 0; i < num_nv; i++) {
-    if (strcasecmp(nv[i].name, name) == 0) {
-      return nv[i].value;
-    }
-  }
-
-  return 0;
-}
 
-static void free_config(adig_config_t *config)
+static void free_config(void)
 {
-  free(config->servers);
-  memset(config, 0, sizeof(*config));
+  free(global_config.servers);
+  free(global_config.name);
+  free(global_config.opts.search);
+  memset(&global_config, 0, sizeof(global_config));
 }
 
 static void print_help(void)
 {
-  /* Split due to maximum c89 string literal of 509 bytes */
+  size_t i;
   printf("adig version %s\n\n", ares_version(NULL));
-  printf(
-    "usage: adig [-h] [-d] [-f flag] [[-s server] ...] [-T|U port] [-c class]\n"
-    "            [-t type] name ...\n\n");
-  printf("  -h : Display this help and exit.\n");
-  printf("  -d : Print some extra debugging output.\n");
-  printf(
-    "  -f flag   : Add a behavior control flag. May be specified more than "
-    "once\n"
-    "              to add additional flags. Possible values are:\n"
-    "              igntc     - do not retry a truncated query as TCP, just\n"
-    "                          return the truncated answer\n"
-    "              noaliases - don't honor the HOSTALIASES environment\n"
-    "                          variable\n");
-  printf("              norecurse - don't query upstream servers recursively\n"
-         "              primary   - use the first server\n"
-         "              stayopen  - don't close the communication sockets\n"
-         "              usevc     - use TCP only\n"
-         "              edns      - use EDNS\n"
-         "              dns0x20   - enable DNS 0x20 support\n");
-  printf(
-    "  -s server : Connect to the specified DNS server, instead of the\n"
-    "              system's default one(s). Servers are tried in round-robin,\n"
-    "              if the previous one failed.\n");
-  printf("  -T port   : Connect to the specified TCP port of DNS server.\n");
-  printf("  -U port   : Connect to the specified UDP port of DNS server.\n");
-  printf("  -c class  : Set the query class. Possible values for class are:\n"
-         "              ANY, CHAOS, HS and IN (default)\n");
-  printf(
-    "  -t type   : Query records of the specified type. Possible values for\n"
-    "              type are:\n"
-    "              A (default), AAAA, ANY, CNAME, HINFO, MX, NAPTR, NS, PTR,\n"
-    "              SOA, SRV, TXT, TLSA, URI, CAA, SVCB, HTTPS\n\n");
-}
-
-static ares_bool_t read_cmdline(int argc, const char * const *argv,
-                                adig_config_t *config)
-{
-  ares_getopt_state_t state;
-  int                 c;
-
-  ares_getopt_init(&state, argc, argv);
-  state.opterr = 0;
-
-  while ((c = ares_getopt(&state, "dh?f:s:c:t:T:U:")) != -1) {
-    int f;
-
-    switch (c) {
-      case 'd':
-#ifdef WATT32
-        dbug_init();
-#endif
-        break;
-
-      case 'h':
-        config->is_help = ARES_TRUE;
-        return ARES_TRUE;
-
-      case 'f':
-        f = lookup_flag(configflags, nconfigflags, state.optarg);
-        if (f == 0) {
-          snprintf(config->error, sizeof(config->error), "flag %s unknown",
-                   state.optarg);
-        }
-
-        config->options.flags |= f;
-        config->optmask       |= ARES_OPT_FLAGS;
-        break;
-
-      case 's':
-        if (state.optarg == NULL) {
-          snprintf(config->error, sizeof(config->error), "%s",
-                   "missing servers");
-          return ARES_FALSE;
-        }
-        if (config->servers) {
-          free(config->servers);
-        }
-        config->servers = strdup(state.optarg);
-        break;
-
-      case 'c':
-        if (!ares_dns_class_fromstr(&config->qclass, state.optarg)) {
-          snprintf(config->error, sizeof(config->error),
-                   "unrecognized class %s", state.optarg);
-          return ARES_FALSE;
-        }
-        break;
-
-      case 't':
-        if (!ares_dns_rec_type_fromstr(&config->qtype, state.optarg)) {
-          snprintf(config->error, sizeof(config->error), "unrecognized type %s",
-                   state.optarg);
-          return ARES_FALSE;
-        }
-        break;
-
-      case 'T':
-        {
-          /* Set the TCP port number. */
-          long port = strtol(state.optarg, NULL, 0);
-
-          if (port <= 0 || port > 65535) {
-            snprintf(config->error, sizeof(config->error),
-                     "invalid port number");
-            return ARES_FALSE;
-          }
-          config->options.tcp_port  = (unsigned short)port;
-          config->options.flags    |= ARES_FLAG_USEVC;
-          config->optmask          |= ARES_OPT_TCP_PORT;
-        }
-        break;
-
-      case 'U':
-        {
-          /* Set the TCP port number. */
-          long port = strtol(state.optarg, NULL, 0);
-
-          if (port <= 0 || port > 65535) {
-            snprintf(config->error, sizeof(config->error),
-                     "invalid port number");
-            return ARES_FALSE;
-          }
-          config->options.udp_port  = (unsigned short)port;
-          config->options.flags    |= ARES_FLAG_USEVC;
-          config->optmask          |= ARES_OPT_UDP_PORT;
-        }
-        break;
-
-      case ':':
-        snprintf(config->error, sizeof(config->error),
-                 "%c requires an argument", state.optopt);
-        return ARES_FALSE;
-
-      default:
-        snprintf(config->error, sizeof(config->error),
-                 "unrecognized option: %c", state.optopt);
-        return ARES_FALSE;
-    }
+  for (i = 0; helpstr[i] != NULL; i++) {
+    printf("%s\n", helpstr[i]);
   }
-
-  config->args_processed = state.optind;
-  if (config->args_processed >= argc) {
-    snprintf(config->error, sizeof(config->error), "missing query name");
-    return ARES_FALSE;
-  }
-  return ARES_TRUE;
 }
 
 static void print_flags(ares_dns_flags_t flags)
@@ -308,7 +241,11 @@ static void print_header(const ares_dns_record_t *dnsrec)
 static void print_question(const ares_dns_record_t *dnsrec)
 {
   size_t i;
-  printf(";; QUESTION SECTION:\n");
+
+  if (global_config.opts.display_comments) {
+    printf(";; QUESTION SECTION:\n");
+  }
+
   for (i = 0; i < ares_dns_record_query_cnt(dnsrec); i++) {
     const char         *name;
     ares_dns_rec_type_t qtype;
@@ -329,10 +266,17 @@ static void print_question(const ares_dns_record_t *dnsrec)
     if (len + 1 < 16) {
       printf("\t");
     }
-    printf("%s\t%s\n", ares_dns_class_tostr(qclass),
-           ares_dns_rec_type_tostr(qtype));
+
+    if (global_config.opts.display_class) {
+      printf("%s\t", ares_dns_class_tostr(qclass));
+    }
+
+    printf("%s\n", ares_dns_rec_type_tostr(qtype));
+  }
+
+  if (global_config.opts.display_comments) {
+    printf("\n");
   }
-  printf("\n");
 }
 
 static void print_opt_none(const unsigned char *val, size_t val_len)
@@ -667,9 +611,15 @@ static void print_rr(const ares_dns_rr_t *rr)
     printf("\t");
   }
 
-  printf("%u\t%s\t%s\t", ares_dns_rr_get_ttl(rr),
-         ares_dns_class_tostr(ares_dns_rr_get_class(rr)),
-         ares_dns_rec_type_tostr(rtype));
+  if (global_config.opts.display_ttl) {
+    printf("%u\t", ares_dns_rr_get_ttl(rr));
+  }
+
+  if (global_config.opts.display_class) {
+    printf("%s\t", ares_dns_class_tostr(ares_dns_rr_get_class(rr)));
+  }
+
+  printf("%s\t", ares_dns_rec_type_tostr(rtype));
 
   /* Output params here */
   for (i = 0; i < keys_cnt; i++) {
@@ -718,12 +668,12 @@ static void print_rr(const ares_dns_rr_t *rr)
   printf("\n");
 }
 
-static const ares_dns_rr_t *has_opt(ares_dns_record_t *dnsrec,
-                                    ares_dns_section_t section)
+static const ares_dns_rr_t *has_opt(const ares_dns_record_t *dnsrec,
+                                    ares_dns_section_t       section)
 {
   size_t i;
   for (i = 0; i < ares_dns_record_rr_cnt(dnsrec, section); i++) {
-    const ares_dns_rr_t *rr = ares_dns_record_rr_get(dnsrec, section, i);
+    const ares_dns_rr_t *rr = ares_dns_record_rr_get_const(dnsrec, section, i);
     if (ares_dns_rr_get_type(rr) == ARES_REC_TYPE_OPT) {
       return rr;
     }
@@ -731,7 +681,8 @@ static const ares_dns_rr_t *has_opt(ares_dns_record_t *dnsrec,
   return NULL;
 }
 
-static void print_section(ares_dns_record_t *dnsrec, ares_dns_section_t section)
+static void print_section(const ares_dns_record_t *dnsrec,
+                          ares_dns_section_t       section)
 {
   size_t i;
 
@@ -741,18 +692,22 @@ static void print_section(ares_dns_record_t *dnsrec, ares_dns_section_t section)
     return;
   }
 
-  printf(";; %s SECTION:\n", ares_dns_section_tostr(section));
+  if (global_config.opts.display_comments) {
+    printf(";; %s SECTION:\n", ares_dns_section_tostr(section));
+  }
   for (i = 0; i < ares_dns_record_rr_cnt(dnsrec, section); i++) {
-    const ares_dns_rr_t *rr = ares_dns_record_rr_get(dnsrec, section, i);
+    const ares_dns_rr_t *rr = ares_dns_record_rr_get_const(dnsrec, section, i);
     if (ares_dns_rr_get_type(rr) == ARES_REC_TYPE_OPT) {
       continue;
     }
     print_rr(rr);
   }
-  printf("\n");
+  if (global_config.opts.display_comments) {
+    printf("\n");
+  }
 }
 
-static void print_opt_psuedosection(ares_dns_record_t *dnsrec)
+static void print_opt_psuedosection(const ares_dns_record_t *dnsrec)
 {
   const ares_dns_rr_t *rr         = has_opt(dnsrec, ARES_SECTION_ADDITIONAL);
   const unsigned char *cookie     = NULL;
@@ -767,7 +722,6 @@ static void print_opt_psuedosection(ares_dns_record_t *dnsrec)
     cookie = NULL;
   }
 
-
   printf(";; OPT PSEUDOSECTION:\n");
   printf("; EDNS: version: %u, flags: %u; udp: %u\n",
          (unsigned int)ares_dns_rr_get_u8(rr, ARES_RR_OPT_VERSION),
@@ -781,60 +735,82 @@ static void print_opt_psuedosection(ares_dns_record_t *dnsrec)
   }
 }
 
-static void callback(void *arg, int status, int timeouts, unsigned char *abuf,
-                     int alen)
+static void print_record(const ares_dns_record_t *dnsrec)
 {
-  ares_dns_record_t *dnsrec = NULL;
-  (void)arg;
-  (void)timeouts;
+  if (global_config.opts.display_comments) {
+    print_header(dnsrec);
+    print_opt_psuedosection(dnsrec);
+  }
 
-  /* We got a "Server status" */
-  if (status >= ARES_SUCCESS && status <= ARES_EREFUSED) {
-    printf(";; Got answer:");
-  } else {
-    printf(";;");
+  if (global_config.opts.display_question) {
+    print_question(dnsrec);
   }
 
-  if (status != ARES_SUCCESS) {
-    printf(" %s", ares_strerror(status));
+  if (global_config.opts.display_answer) {
+    print_section(dnsrec, ARES_SECTION_ANSWER);
   }
-  printf("\n");
 
-  if (abuf == NULL || alen == 0) {
-    return;
+  if (global_config.opts.display_additional) {
+    print_section(dnsrec, ARES_SECTION_ADDITIONAL);
   }
 
-  status = (int)ares_dns_parse(abuf, (size_t)alen, 0, &dnsrec);
-  if (status != ARES_SUCCESS) {
-    fprintf(stderr, ";; FAILED TO PARSE DNS PACKET: %s\n",
-            ares_strerror(status));
-    return;
+  if (global_config.opts.display_authority) {
+    print_section(dnsrec, ARES_SECTION_AUTHORITY);
   }
 
-  print_header(dnsrec);
-  print_opt_psuedosection(dnsrec);
-  print_question(dnsrec);
-  print_section(dnsrec, ARES_SECTION_ANSWER);
-  print_section(dnsrec, ARES_SECTION_ADDITIONAL);
-  print_section(dnsrec, ARES_SECTION_AUTHORITY);
+  if (global_config.opts.display_stats) {
+    unsigned char *abuf = NULL;
+    size_t         alen = 0;
+    ares_dns_write(dnsrec, &abuf, &alen);
+    printf(";; MSG SIZE  rcvd: %d\n\n", (int)alen);
+    ares_free_string(abuf);
+  }
+}
 
-  printf(";; MSG SIZE  rcvd: %d\n\n", alen);
-  ares_dns_record_destroy(dnsrec);
+static void callback(void *arg, ares_status_t status, size_t timeouts,
+                     const ares_dns_record_t *dnsrec)
+{
+  (void)arg;
+  (void)timeouts;
+
+  if (global_config.opts.display_comments) {
+    /* We got a "Server status" */
+    if (status >= ARES_SUCCESS && status <= ARES_EREFUSED) {
+      printf(";; Got answer:");
+    } else {
+      printf(";;");
+    }
+    if (status != ARES_SUCCESS) {
+      printf(" %s", ares_strerror((int)status));
+    }
+    printf("\n");
+  }
+
+  print_record(dnsrec);
 }
 
-static ares_status_t enqueue_query(ares_channel_t      *channel,
-                                   const adig_config_t *config,
-                                   const char          *name)
+static ares_status_t enqueue_query(ares_channel_t *channel)
 {
   ares_dns_record_t *dnsrec = NULL;
   ares_dns_rr_t     *rr     = NULL;
   ares_status_t      status;
-  unsigned char     *buf      = NULL;
-  size_t             buf_len  = 0;
   unsigned short     flags    = 0;
   char              *nametemp = NULL;
+  const char        *name     = global_config.name;
+
+  if (global_config.opts.aa_flag) {
+    flags |= ARES_FLAG_AA;
+  }
+
+  if (global_config.opts.ad_flag) {
+    flags |= ARES_FLAG_AD;
+  }
+
+  if (global_config.opts.cd_flag) {
+    flags |= ARES_FLAG_CD;
+  }
 
-  if (!(config->options.flags & ARES_FLAG_NORECURSE)) {
+  if (global_config.opts.rd_flag) {
     flags |= ARES_FLAG_RD;
   }
 
@@ -846,7 +822,7 @@ static ares_status_t enqueue_query(ares_channel_t      *channel,
 
   /* If it is a PTR record, convert from ip address into in-arpa form
    * automatically */
-  if (config->qtype == ARES_REC_TYPE_PTR) {
+  if (global_config.qtype == ARES_REC_TYPE_PTR) {
     struct ares_addr addr;
     size_t           len;
     addr.family = AF_UNSPEC;
@@ -857,27 +833,33 @@ static ares_status_t enqueue_query(ares_channel_t      *channel,
     }
   }
 
-  status =
-    ares_dns_record_query_add(dnsrec, name, config->qtype, config->qclass);
+  status = ares_dns_record_query_add(dnsrec, name, global_config.qtype,
+                                     global_config.qclass);
   if (status != ARES_SUCCESS) {
     goto done;
   }
 
-  status = ares_dns_record_rr_add(&rr, dnsrec, ARES_SECTION_ADDITIONAL, "",
-                                  ARES_REC_TYPE_OPT, ARES_CLASS_IN, 0);
-  if (status != ARES_SUCCESS) {
-    goto done;
+  if (global_config.opts.edns) {
+    status = ares_dns_record_rr_add(&rr, dnsrec, ARES_SECTION_ADDITIONAL, "",
+                                    ARES_REC_TYPE_OPT, ARES_CLASS_IN, 0);
+    if (status != ARES_SUCCESS) {
+      goto done;
+    }
+    ares_dns_rr_set_u16(rr, ARES_RR_OPT_UDP_SIZE,
+                        (unsigned short)global_config.opts.udp_size);
+    ares_dns_rr_set_u8(rr, ARES_RR_OPT_VERSION, 0);
   }
-  ares_dns_rr_set_u16(rr, ARES_RR_OPT_UDP_SIZE, 1280);
-  ares_dns_rr_set_u8(rr, ARES_RR_OPT_VERSION, 0);
 
-  status = ares_dns_write(dnsrec, &buf, &buf_len);
-  if (status != ARES_SUCCESS) {
-    goto done;
+  if (global_config.opts.display_query) {
+    printf(";; Sending:\n");
+    print_record(dnsrec);
   }
 
-  ares_send(channel, buf, (int)buf_len, callback, NULL);
-  ares_free_string(buf);
+  if (global_config.opts.do_search) {
+    status = ares_search_dnsrec(channel, dnsrec, callback, NULL);
+  } else {
+    status = ares_send_dnsrec(channel, dnsrec, callback, NULL, NULL);
+  }
 
 done:
   ares_free_string(nametemp);
@@ -924,12 +906,584 @@ static int event_loop(ares_channel_t *channel)
   return 0;
 }
 
+typedef enum {
+  OPT_TYPE_BOOL,
+  OPT_TYPE_STRING,
+  OPT_TYPE_SIZE_T,
+  OPT_TYPE_U16,
+  OPT_TYPE_FUNC
+} opt_type_t;
+
+/* Callback called with OPT_TYPE_FUNC when processing options.
+ * \param[in] prefix  prefix character for option
+ * \param[in] name    name for option
+ * \param[in] is_true ARES_TRUE unless option was prefixed with 'no'
+ * \param[in] value   value for option
+ * \return ARES_TRUE on success, ARES_FALSE on failure.  Should fill in
+ *         global_config.error on error */
+typedef ares_bool_t (*dig_opt_cb_t)(char prefix, const char *name,
+                                    ares_bool_t is_true, const char *value);
+
+static ares_bool_t opt_class_cb(char prefix, const char *name,
+                                ares_bool_t is_true, const char *value)
+{
+  (void)prefix;
+  (void)name;
+  (void)is_true;
+
+  if (!ares_dns_class_fromstr(&global_config.qclass, value)) {
+    snprintf(global_config.error, sizeof(global_config.error),
+             "unrecognized class %s", value);
+    return ARES_FALSE;
+  }
+
+  return ARES_TRUE;
+}
+
+static ares_bool_t opt_type_cb(char prefix, const char *name,
+                               ares_bool_t is_true, const char *value)
+{
+  (void)prefix;
+  (void)name;
+  (void)is_true;
+
+  if (!ares_dns_rec_type_fromstr(&global_config.qtype, value)) {
+    snprintf(global_config.error, sizeof(global_config.error),
+             "unrecognized record type %s", value);
+    return ARES_FALSE;
+  }
+  return ARES_TRUE;
+}
+
+static ares_bool_t opt_ptr_cb(char prefix, const char *name,
+                              ares_bool_t is_true, const char *value)
+{
+  (void)prefix;
+  (void)name;
+  (void)is_true;
+  global_config.qtype = ARES_REC_TYPE_PTR;
+  ares_free(global_config.name);
+  global_config.name = strdup(value);
+  return ARES_TRUE;
+}
+
+static ares_bool_t opt_all_cb(char prefix, const char *name,
+                              ares_bool_t is_true, const char *value)
+{
+  (void)prefix;
+  (void)name;
+  (void)value;
+
+  global_config.opts.display_command    = is_true;
+  global_config.opts.display_stats      = is_true;
+  global_config.opts.display_question   = is_true;
+  global_config.opts.display_answer     = is_true;
+  global_config.opts.display_authority  = is_true;
+  global_config.opts.display_additional = is_true;
+  global_config.opts.display_comments   = is_true;
+  return ARES_TRUE;
+}
+
+static ares_bool_t opt_edns_cb(char prefix, const char *name,
+                               ares_bool_t is_true, const char *value)
+{
+  (void)prefix;
+  (void)name;
+
+  global_config.opts.edns = is_true;
+  if (is_true && value != NULL && atoi(value) > 0) {
+    snprintf(global_config.error, sizeof(global_config.error),
+             "edns 0 only supported");
+    return ARES_FALSE;
+  }
+  return ARES_TRUE;
+}
+
+static ares_bool_t opt_retry_cb(char prefix, const char *name,
+                                ares_bool_t is_true, const char *value)
+{
+  (void)prefix;
+  (void)name;
+  (void)is_true;
+
+  if (!ares_str_isnum(value)) {
+    snprintf(global_config.error, sizeof(global_config.error),
+             "value not numeric");
+    return ARES_FALSE;
+  }
+
+  global_config.opts.tries = strtoul(value, NULL, 10) + 1;
+  return ARES_TRUE;
+}
+
+static ares_bool_t opt_dig_bare_cb(char prefix, const char *name,
+                                   ares_bool_t is_true, const char *value)
+{
+  (void)prefix;
+  (void)name;
+  (void)is_true;
+
+  /* Handle @servers */
+  if (*value == '@') {
+    free(global_config.servers);
+    global_config.servers = strdup(value + 1);
+    return ARES_TRUE;
+  }
+
+  /* Make sure we don't pass options */
+  if (*value == '-' || *value == '+') {
+    snprintf(global_config.error, sizeof(global_config.error),
+             "unrecognized argument %s", value);
+    return ARES_FALSE;
+  }
+
+  /* See if it is a DNS class */
+  if (ares_dns_class_fromstr(&global_config.qclass, value)) {
+    return ARES_TRUE;
+  }
+
+  /* See if it is a DNS record type */
+  if (ares_dns_rec_type_fromstr(&global_config.qtype, value)) {
+    return ARES_TRUE;
+  }
+
+  /* See if it is a domain name */
+  if (ares_is_hostname(value)) {
+    free(global_config.name);
+    global_config.name = strdup(value);
+    return ARES_TRUE;
+  }
+
+  snprintf(global_config.error, sizeof(global_config.error),
+           "unrecognized argument %s", value);
+  return ARES_FALSE;
+}
+
+static const struct {
+  /* Prefix for option.  If 0 then this param is a non-option and type must be
+   * OPT_TYPE_FUNC where the entire value for the param will be passed */
+  char         prefix;
+  /* Name of option.  If null, there is none and the value is expected to be
+   * immediately after the prefix character */
+  const char  *name;
+  /* Separator between key and value.  If 0 then uses the next argument as the
+   * value, otherwise splits on the separator.  BOOL types never take a
+   * value, so the separator is ignored for them. */
+  char         separator;
+  /* Type of parameter passed in.  If it is OPT_TYPE_FUNC, then it calls the
+   * dig_opt_cb_t callback */
+  opt_type_t   type;
+  /* Pointer to argument to fill in */
+  void        *opt;
+  /* Callback if OPT_TYPE_FUNC */
+  dig_opt_cb_t cb;
+} dig_options[] = {
+  /* -4 (ipv4 only) */
+  /* -6 (ipv6 only) */
+  /* { '-', "b",          0,   OPT_TYPE_FUNC,   NULL, opt_bind_address_cb },
+   */
+  { '-', "c",          0,   OPT_TYPE_FUNC,   NULL,                                   opt_class_cb    },
+  /* -f file */
+  { '-', "h",          0,   OPT_TYPE_BOOL,   &global_config.is_help,                 NULL            },
+  /* -k keyfile */
+  /* -m (memory usage debugging) */
+  { '-', "p",          0,   OPT_TYPE_U16,    &global_config.opts.port,               NULL            },
+  { '-', "q",          0,   OPT_TYPE_STRING, &global_config.name,                    NULL            },
+  { '-', "r",          0,   OPT_TYPE_BOOL,   &global_config.no_rcfile,               NULL            },
+  { '-', "s",          0,   OPT_TYPE_STRING, &global_config.servers,                 NULL            },
+  { '-', "t",          0,   OPT_TYPE_FUNC,   NULL,                                   opt_type_cb     },
+  /* -u (print microseconds instead of milliseconds) */
+  { '-', "x",          0,   OPT_TYPE_FUNC,   NULL,                                   opt_ptr_cb      },
+  /* -y [hmac:]keyname:secret */
+  { '+', "aaflag",     0,   OPT_TYPE_BOOL,   &global_config.opts.aa_flag,            NULL            },
+  { '+', "aaonly",     0,   OPT_TYPE_BOOL,   &global_config.opts.aa_flag,            NULL            },
+  { '+', "additional", 0,   OPT_TYPE_BOOL,   &global_config.opts.display_additional,
+   NULL                                                                                              },
+  { '+', "adflag",     0,   OPT_TYPE_BOOL,   &global_config.opts.ad_flag,            NULL            },
+  { '+', "aliases",    0,   OPT_TYPE_BOOL,   &global_config.opts.aliases,            NULL            },
+  { '+', "all",        '=', OPT_TYPE_FUNC,   NULL,                                   opt_all_cb      },
+  { '+', "answer",     0,   OPT_TYPE_BOOL,   &global_config.opts.display_answer,     NULL            },
+  { '+', "authority",  0,   OPT_TYPE_BOOL,   &global_config.opts.display_authority,
+   NULL                                                                                              },
+  { '+', "bufsize",    '=', OPT_TYPE_SIZE_T, &global_config.opts.udp_size,           NULL            },
+  { '+', "cdflag",     0,   OPT_TYPE_BOOL,   &global_config.opts.cd_flag,            NULL            },
+  { '+', "class",      0,   OPT_TYPE_BOOL,   &global_config.opts.display_class,      NULL            },
+  { '+', "cmd",        0,   OPT_TYPE_BOOL,   &global_config.opts.display_command,    NULL            },
+  { '+', "comments",   0,   OPT_TYPE_BOOL,   &global_config.opts.display_comments,
+   NULL                                                                                              },
+  { '+', "defname",    0,   OPT_TYPE_BOOL,   &global_config.opts.do_search,          NULL            },
+  { '+', "dns0x20",    0,   OPT_TYPE_BOOL,   &global_config.opts.dns0x20,            NULL            },
+  { '+', "domain",     '=', OPT_TYPE_STRING, &global_config.opts.search,             NULL            },
+  { '+', "edns",       '=', OPT_TYPE_FUNC,   NULL,                                   opt_edns_cb     },
+  { '+', "keepopen",   0,   OPT_TYPE_BOOL,   &global_config.opts.stayopen,           NULL            },
+  { '+', "ignore",     0,   OPT_TYPE_BOOL,   &global_config.opts.ignore_tc,          NULL            },
+  { '+', "ndots",      '=', OPT_TYPE_SIZE_T, &global_config.opts.ndots,              NULL            },
+  { '+', "primary",    0,   OPT_TYPE_BOOL,   &global_config.opts.primary,            NULL            },
+  { '+', "qr",         0,   OPT_TYPE_BOOL,   &global_config.opts.display_query,      NULL            },
+  { '+', "question",   0,   OPT_TYPE_BOOL,   &global_config.opts.display_question,
+   NULL                                                                                              },
+  { '+', "recurse",    0,   OPT_TYPE_BOOL,   &global_config.opts.rd_flag,            NULL            },
+  { '+', "retry",      '=', OPT_TYPE_FUNC,   NULL,                                   opt_retry_cb    },
+  { '+', "search",     0,   OPT_TYPE_BOOL,   &global_config.opts.do_search,          NULL            },
+  { '+', "stats",      0,   OPT_TYPE_BOOL,   &global_config.opts.display_stats,      NULL            },
+  { '+', "tcp",        0,   OPT_TYPE_BOOL,   &global_config.opts.tcp,                NULL            },
+  { '+', "tries",      '=', OPT_TYPE_SIZE_T, &global_config.opts.tries,              NULL            },
+  { '+', "ttlid",      0,   OPT_TYPE_BOOL,   &global_config.opts.display_ttl,        NULL            },
+  { '+', "vc",         0,   OPT_TYPE_BOOL,   &global_config.opts.tcp,                NULL            },
+  { 0,   NULL,         0,   OPT_TYPE_FUNC,   NULL,                                   opt_dig_bare_cb },
+  { 0,   NULL,         0,   0,               NULL,                                   NULL            }
+};
+
+static ares_bool_t read_cmdline(int argc, const char * const *argv,
+                                int start_idx)
+{
+  int    arg;
+  size_t opt;
+
+  for (arg = start_idx; arg < argc; arg++) {
+    ares_bool_t option_handled = ARES_FALSE;
+
+    for (opt = 0; !option_handled &&
+                  (dig_options[opt].opt != NULL || dig_options[opt].cb != NULL);
+         opt++) {
+      ares_bool_t is_true = ARES_TRUE;
+      const char *value   = NULL;
+      const char *nameptr = NULL;
+      size_t      namelen;
+
+      /* Match prefix character */
+      if (dig_options[opt].prefix != 0 &&
+          dig_options[opt].prefix != *(argv[arg])) {
+        continue;
+      }
+
+      nameptr = argv[arg];
+
+      /* skip prefix */
+      if (dig_options[opt].prefix != 0) {
+        nameptr++;
+      }
+
+      /* Negated option if it has a 'no' prefix */
+      if (ares_streq_max(nameptr, "no", 2)) {
+        is_true  = ARES_FALSE;
+        nameptr += 2;
+      }
+
+      if (dig_options[opt].separator != 0) {
+        const char *ptr = strchr(nameptr, dig_options[opt].separator);
+        if (ptr == NULL) {
+          namelen = ares_strlen(nameptr);
+        } else {
+          namelen = (size_t)(ptr - nameptr);
+          value   = ptr + 1;
+        }
+      } else {
+        namelen = ares_strlen(nameptr);
+      }
+
+      /* Match name */
+      if (dig_options[opt].name != NULL &&
+          !ares_streq_max(nameptr, dig_options[opt].name, namelen)) {
+        continue;
+      }
+
+      if (dig_options[opt].name == NULL) {
+        value = nameptr;
+      }
+
+      /* We need another argument for the value */
+      if (dig_options[opt].type != OPT_TYPE_BOOL &&
+          dig_options[opt].prefix != 0 && dig_options[opt].separator == 0) {
+        if (arg == argc - 1) {
+          snprintf(global_config.error, sizeof(global_config.error),
+                   "insufficient arguments for %c%s", dig_options[opt].prefix,
+                   dig_options[opt].name);
+          return ARES_FALSE;
+        }
+        arg++;
+        value = argv[arg];
+      }
+
+      switch (dig_options[opt].type) {
+        case OPT_TYPE_BOOL:
+          {
+            ares_bool_t *b = dig_options[opt].opt;
+            if (b == NULL) {
+              snprintf(global_config.error, sizeof(global_config.error),
+                       "invalid use for %c%s", dig_options[opt].prefix,
+                       dig_options[opt].name);
+              return ARES_FALSE;
+            }
+            *b = is_true;
+          }
+          break;
+        case OPT_TYPE_STRING:
+          {
+            char **str = dig_options[opt].opt;
+            if (str == NULL) {
+              snprintf(global_config.error, sizeof(global_config.error),
+                       "invalid use for %c%s", dig_options[opt].prefix,
+                       dig_options[opt].name);
+              return ARES_FALSE;
+            }
+            if (value == NULL) {
+              snprintf(global_config.error, sizeof(global_config.error),
+                       "missing value for %c%s", dig_options[opt].prefix,
+                       dig_options[opt].name);
+              return ARES_FALSE;
+            }
+            if (*str != NULL) {
+              free(*str);
+            }
+            *str = strdup(value);
+            break;
+          }
+        case OPT_TYPE_SIZE_T:
+          {
+            size_t *s = dig_options[opt].opt;
+            if (s == NULL) {
+              snprintf(global_config.error, sizeof(global_config.error),
+                       "invalid use for %c%s", dig_options[opt].prefix,
+                       dig_options[opt].name);
+              return ARES_FALSE;
+            }
+            if (value == NULL) {
+              snprintf(global_config.error, sizeof(global_config.error),
+                       "missing value for %c%s", dig_options[opt].prefix,
+                       dig_options[opt].name);
+              return ARES_FALSE;
+            }
+            if (!ares_str_isnum(value)) {
+              snprintf(global_config.error, sizeof(global_config.error),
+                       "%c%s is not a numeric value", dig_options[opt].prefix,
+                       dig_options[opt].name);
+              return ARES_FALSE;
+            }
+            *s = strtoul(value, NULL, 10);
+            break;
+          }
+        case OPT_TYPE_U16:
+          {
+            unsigned short *s = dig_options[opt].opt;
+            if (s == NULL) {
+              snprintf(global_config.error, sizeof(global_config.error),
+                       "invalid use for %c%s", dig_options[opt].prefix,
+                       dig_options[opt].name);
+              return ARES_FALSE;
+            }
+            if (value == NULL) {
+              snprintf(global_config.error, sizeof(global_config.error),
+                       "missing value for %c%s", dig_options[opt].prefix,
+                       dig_options[opt].name);
+              return ARES_FALSE;
+            }
+            if (!ares_str_isnum(value)) {
+              snprintf(global_config.error, sizeof(global_config.error),
+                       "%c%s is not a numeric value", dig_options[opt].prefix,
+                       dig_options[opt].name);
+              return ARES_FALSE;
+            }
+            *s = (unsigned short)strtoul(value, NULL, 10);
+            break;
+          }
+        case OPT_TYPE_FUNC:
+          if (dig_options[opt].cb == NULL) {
+            snprintf(global_config.error, sizeof(global_config.error),
+                     "missing callback");
+            return ARES_FALSE;
+          }
+          if (!dig_options[opt].cb(dig_options[opt].prefix,
+                                   dig_options[opt].name, is_true, value)) {
+            return ARES_FALSE;
+          }
+          break;
+      }
+      option_handled = ARES_TRUE;
+    }
+
+    if (!option_handled) {
+      snprintf(global_config.error, sizeof(global_config.error),
+               "unrecognized option %s", argv[arg]);
+      return ARES_FALSE;
+    }
+  }
+
+  return ARES_TRUE;
+}
+
+static ares_bool_t read_rcfile(void)
+{
+  char         configdir[PATH_MAX];
+  unsigned int cdlen = 0;
+
+#if !defined(WIN32)
+#  if !defined(__APPLE__)
+  char *configdir_xdg;
+#  endif
+  char *homedir;
+#endif
+
+  char          rcfile[PATH_MAX];
+  unsigned int  rclen;
+
+  size_t        rcargc;
+  char        **rcargv;
+  ares_buf_t   *rcbuf;
+  ares_status_t rcstatus;
+
+#if defined(WIN32)
+  cdlen = (unsigned int)snprintf(configdir, sizeof(configdir), "%s/%s",
+                                 getenv("APPDATA"), "c-ares");
+
+#elif defined(__APPLE__)
+  homedir = getenv("HOME");
+  if (homedir != NULL) {
+    cdlen = (unsigned int)snprintf(configdir, sizeof(configdir), "%s/%s/%s/%s",
+                                   homedir, "Library", "Application Support",
+                                   "c-ares");
+  }
+
+#else
+  configdir_xdg = getenv("XDG_CONFIG_HOME");
+
+  if (configdir_xdg == NULL) {
+    homedir = getenv("HOME");
+    if (homedir != NULL) {
+      cdlen = (unsigned int)snprintf(configdir, sizeof(configdir), "%s/%s",
+                                     homedir, ".config");
+    }
+  } else {
+    cdlen =
+      (unsigned int)snprintf(configdir, sizeof(configdir), "%s", configdir_xdg);
+  }
+
+#endif
+
+  DEBUGF(fprintf(stderr, "read_rcfile() configdir: %s\n", configdir));
+
+  if (cdlen == 0 || cdlen > sizeof(configdir)) {
+    DEBUGF(
+      fprintf(stderr, "read_rcfile() skipping rcfile parsing on directory\n"));
+    return ARES_TRUE;
+  }
+
+  rclen =
+    (unsigned int)snprintf(rcfile, sizeof(rcfile), "%s/adigrc", configdir);
+
+  if (rclen > sizeof(rcfile)) {
+    DEBUGF(fprintf(stderr, "read_rcfile() skipping rcfile parsing on file\n"));
+    return ARES_TRUE;
+  }
+
+  rcbuf = ares_buf_create();
+  if (ares_buf_load_file(rcfile, rcbuf) == ARES_SUCCESS) {
+    rcstatus = ares_buf_split_str(rcbuf, (const unsigned char *)"\n ", 2,
+                                  ARES_BUF_SPLIT_TRIM, 0, &rcargv, &rcargc);
+
+    if (rcstatus == ARES_SUCCESS) {
+      read_cmdline((int)rcargc, (const char * const *)rcargv, 0);
+
+    } else {
+      snprintf(global_config.error, sizeof(global_config.error),
+               "rcfile is invalid: %s", ares_strerror((int)rcstatus));
+    }
+
+    ares_free_array(rcargv, rcargc, ares_free);
+
+    if (rcstatus != ARES_SUCCESS) {
+      ares_buf_destroy(rcbuf);
+      return ARES_FALSE;
+    }
+
+  } else {
+    DEBUGF(fprintf(stderr, "read_rcfile() failed to load rcfile\n"));
+  }
+  ares_buf_destroy(rcbuf);
+
+  return ARES_TRUE;
+}
+
+static void config_defaults(void)
+{
+  memset(&global_config, 0, sizeof(global_config));
+
+  global_config.opts.tries              = 3;
+  global_config.opts.ndots              = 1;
+  global_config.opts.rd_flag            = ARES_TRUE;
+  global_config.opts.edns               = ARES_TRUE;
+  global_config.opts.udp_size           = 1232;
+  global_config.opts.aliases            = ARES_TRUE;
+  global_config.opts.display_class      = ARES_TRUE;
+  global_config.opts.display_ttl        = ARES_TRUE;
+  global_config.opts.display_command    = ARES_TRUE;
+  global_config.opts.display_stats      = ARES_TRUE;
+  global_config.opts.display_question   = ARES_TRUE;
+  global_config.opts.display_answer     = ARES_TRUE;
+  global_config.opts.display_authority  = ARES_TRUE;
+  global_config.opts.display_additional = ARES_TRUE;
+  global_config.opts.display_comments   = ARES_TRUE;
+  global_config.qclass                  = ARES_CLASS_IN;
+  global_config.qtype                   = ARES_REC_TYPE_A;
+}
+
+static void config_opts(void)
+{
+  global_config.optmask = ARES_OPT_FLAGS;
+  if (global_config.opts.tcp) {
+    global_config.options.flags |= ARES_FLAG_USEVC;
+  }
+  if (global_config.opts.primary) {
+    global_config.options.flags |= ARES_FLAG_PRIMARY;
+  }
+  if (global_config.opts.edns) {
+    global_config.options.flags |= ARES_FLAG_EDNS;
+  }
+  if (global_config.opts.stayopen) {
+    global_config.options.flags |= ARES_FLAG_STAYOPEN;
+  }
+  if (global_config.opts.dns0x20) {
+    global_config.options.flags |= ARES_FLAG_DNS0x20;
+  }
+  if (!global_config.opts.aliases) {
+    global_config.options.flags |= ARES_FLAG_NOALIASES;
+  }
+  if (!global_config.opts.rd_flag) {
+    global_config.options.flags |= ARES_FLAG_NORECURSE;
+  }
+  if (!global_config.opts.do_search) {
+    global_config.options.flags |= ARES_FLAG_NOSEARCH;
+  }
+  if (global_config.opts.ignore_tc) {
+    global_config.options.flags |= ARES_FLAG_IGNTC;
+  }
+  if (global_config.opts.port) {
+    global_config.optmask          |= ARES_OPT_UDP_PORT;
+    global_config.optmask          |= ARES_OPT_TCP_PORT;
+    global_config.options.udp_port  = global_config.opts.port;
+    global_config.options.tcp_port  = global_config.opts.port;
+  }
+
+  global_config.optmask       |= ARES_OPT_TRIES;
+  global_config.options.tries  = (int)global_config.opts.tries;
+
+  global_config.optmask       |= ARES_OPT_NDOTS;
+  global_config.options.ndots  = (int)global_config.opts.ndots;
+
+  global_config.optmask         |= ARES_OPT_EDNSPSZ;
+  global_config.options.ednspsz  = (int)global_config.opts.udp_size;
+
+  if (global_config.opts.search != NULL) {
+    global_config.optmask          |= ARES_OPT_DOMAINS;
+    global_config.options.domains   = &global_config.opts.search;
+    global_config.options.ndomains  = 1;
+  }
+}
+
 int main(int argc, char **argv)
 {
   ares_channel_t *channel = NULL;
   ares_status_t   status;
-  adig_config_t   config;
-  int             i;
   int             rv = 0;
 
 #ifdef USE_WINSOCK
@@ -944,62 +1498,73 @@ int main(int argc, char **argv)
     return 1;
   }
 
-  memset(&config, 0, sizeof(config));
-  config.qclass = ARES_CLASS_IN;
-  config.qtype  = ARES_REC_TYPE_A;
-  if (!read_cmdline(argc, (const char * const *)argv, &config)) {
-    printf("\n** ERROR: %s\n\n", config.error);
+  config_defaults();
+
+  if (!read_cmdline(argc, (const char * const *)argv, 1)) {
+    printf("\n** ERROR: %s\n\n", global_config.error);
     print_help();
     rv = 1;
     goto done;
   }
 
-  if (config.is_help) {
+  if (global_config.no_rcfile && !read_rcfile()) {
+    fprintf(stderr, "\n** ERROR: %s\n", global_config.error);
+  }
+
+  if (global_config.is_help) {
     print_help();
     goto done;
   }
 
-  status =
-    (ares_status_t)ares_init_options(&channel, &config.options, config.optmask);
+  if (global_config.name == NULL) {
+    printf("missing query name\n");
+    print_help();
+    rv = 1;
+    goto done;
+  }
+
+  config_opts();
+
+  status = (ares_status_t)ares_init_options(&channel, &global_config.options,
+                                            global_config.optmask);
   if (status != ARES_SUCCESS) {
     fprintf(stderr, "ares_init_options: %s\n", ares_strerror((int)status));
     rv = 1;
     goto done;
   }
 
-  if (config.servers) {
-    status = (ares_status_t)ares_set_servers_ports_csv(channel, config.servers);
+  if (global_config.servers) {
+    status =
+      (ares_status_t)ares_set_servers_ports_csv(channel, global_config.servers);
     if (status != ARES_SUCCESS) {
-      fprintf(stderr, "ares_set_servers_ports_csv: %s\n",
-              ares_strerror((int)status));
+      fprintf(stderr, "ares_set_servers_ports_csv: %s: %s\n",
+              ares_strerror((int)status), global_config.servers);
       rv = 1;
       goto done;
     }
   }
 
-  /* Enqueue a query for each separate name */
-  for (i = config.args_processed; i < argc; i++) {
-    status = enqueue_query(channel, &config, argv[i]);
-    if (status != ARES_SUCCESS) {
-      fprintf(stderr, "Failed to create query for %s: %s\n", argv[i],
-              ares_strerror((int)status));
-      rv = 1;
-      goto done;
-    }
+  /* Debug */
+  if (global_config.opts.display_command) {
+    printf("\n; <<>> c-ares DiG %s <<>>", ares_version(NULL));
+    printf(" %s", global_config.name);
+    printf("\n");
   }
 
-  /* Debug */
-  printf("\n; <<>> c-ares DiG %s <<>>", ares_version(NULL));
-  for (i = config.args_processed; i < argc; i++) {
-    printf(" %s", argv[i]);
+  /* Enqueue the query for the configured name */
+  status = enqueue_query(channel);
+  if (status != ARES_SUCCESS) {
+    fprintf(stderr, "Failed to create query for %s: %s\n", global_config.name,
+            ares_strerror((int)status));
+    rv = 1;
+    goto done;
   }
-  printf("\n");
 
   /* Process events */
   rv = event_loop(channel);
 
 done:
-  free_config(&config);
+  free_config();
   ares_destroy(channel);
   ares_library_cleanup();
 
diff --git a/deps/cares/src/tools/ahost.c b/deps/cares/src/tools/ahost.c
index 73f29002a98e92..7d1d4a86dc7a2d 100644
--- a/deps/cares/src/tools/ahost.c
+++ b/deps/cares/src/tools/ahost.c
@@ -42,20 +42,7 @@
 #include "ares_getopt.h"
 #include "ares_ipv6.h"
 
-#ifndef HAVE_STRDUP
-#  include "str/ares_str.h"
-#  define strdup(ptr) ares_strdup(ptr)
-#endif
-
-#ifndef HAVE_STRCASECMP
-#  include "str/ares_strcasecmp.h"
-#  define strcasecmp(p1, p2) ares_strcasecmp(p1, p2)
-#endif
-
-#ifndef HAVE_STRNCASECMP
-#  include "str/ares_strcasecmp.h"
-#  define strncasecmp(p1, p2, n) ares_strncasecmp(p1, p2, n)
-#endif
+#include "ares_str.h"
 
 static void callback(void *arg, int status, int timeouts, struct hostent *host);
 static void ai_callback(void *arg, int status, int timeouts,
@@ -111,11 +98,11 @@ int         main(int argc, char **argv)
         options.domains[options.ndomains - 1] = strdup(state.optarg);
         break;
       case 't':
-        if (!strcasecmp(state.optarg, "a")) {
+        if (ares_strcaseeq(state.optarg, "a")) {
           addr_family = AF_INET;
-        } else if (!strcasecmp(state.optarg, "aaaa")) {
+        } else if (ares_strcaseeq(state.optarg, "aaaa")) {
           addr_family = AF_INET6;
-        } else if (!strcasecmp(state.optarg, "u")) {
+        } else if (ares_strcaseeq(state.optarg, "u")) {
           addr_family = AF_UNSPEC;
         } else {
           usage();

From 72817072e2c2fb121cdbe80b44335859ec9d3e29 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?H=C3=BCseyin=20A=C3=A7acak?=
 <110401522+huseyinacacak-janea@users.noreply.github.com>
Date: Tue, 15 Oct 2024 09:41:38 +0300
Subject: [PATCH 015/216] src: fix winapi_strerror error string

Fixes: https://github.com/nodejs/node/issues/23191
PR-URL: https://github.com/nodejs/node/pull/55207
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 src/api/exceptions.cc               | 16 ++++++++--------
 test/parallel/test-debug-process.js | 21 +++++++++++++++++++++
 2 files changed, 29 insertions(+), 8 deletions(-)
 create mode 100644 test/parallel/test-debug-process.js

diff --git a/src/api/exceptions.cc b/src/api/exceptions.cc
index 1b9b308ad89fc6..871fe78de95154 100644
--- a/src/api/exceptions.cc
+++ b/src/api/exceptions.cc
@@ -157,14 +157,14 @@ Local<Value> UVException(Isolate* isolate,
 static const char* winapi_strerror(const int errorno, bool* must_free) {
   char* errmsg = nullptr;
 
-  FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM |
-                    FORMAT_MESSAGE_IGNORE_INSERTS,
-                nullptr,
-                errorno,
-                MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
-                reinterpret_cast<LPTSTR>(&errmsg),
-                0,
-                nullptr);
+  FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM |
+                     FORMAT_MESSAGE_IGNORE_INSERTS,
+                 nullptr,
+                 errorno,
+                 MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
+                 reinterpret_cast<LPSTR>(&errmsg),
+                 0,
+                 nullptr);
 
   if (errmsg) {
     *must_free = true;
diff --git a/test/parallel/test-debug-process.js b/test/parallel/test-debug-process.js
new file mode 100644
index 00000000000000..0d10a15e2eefa0
--- /dev/null
+++ b/test/parallel/test-debug-process.js
@@ -0,0 +1,21 @@
+'use strict';
+const common = require('../common');
+const assert = require('assert');
+const child = require('child_process');
+
+if (!common.isWindows) {
+  common.skip('This test is specific to Windows to test winapi_strerror');
+}
+
+// Ref: https://github.com/nodejs/node/issues/23191
+// This test is specific to Windows.
+
+const cp = child.spawn('pwd');
+
+cp.on('exit', common.mustCall(function() {
+  try {
+    process._debugProcess(cp.pid);
+  } catch (error) {
+    assert.strictEqual(error.message, 'The system cannot find the file specified.');
+  }
+}));

From f10857828f6e9e35e7d7aae9c55953d8495b80c3 Mon Sep 17 00:00:00 2001
From: BadKey <julian@gassner.online>
Date: Tue, 15 Oct 2024 15:43:24 +0200
Subject: [PATCH 016/216] lib: test_runner#mock:timers respects timeout_max
 behaviour

PR-URL: https://github.com/nodejs/node/pull/55375
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Claudio Wunder <cwunder@gnome.org>
---
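Reviewer note (not part of the commit message): a minimal sketch of the
clamping rule this patch adds to `#createTimer`, assuming `TIMEOUT_MAX`
is 2 ** 31 - 1 as in `internal/timers`. `clampDelay` is a hypothetical
helper name used only for illustration; the patch performs the check
inline in `MockTimers`.

```javascript
// Illustrative only: mirrors the check added to MockTimers#createTimer.
// TIMEOUT_MAX is assumed to match internal/timers (2 ** 31 - 1).
const TIMEOUT_MAX = 2 ** 31 - 1;

// Hypothetical helper; the real change is written inline in the class.
function clampDelay(delay) {
  // Delays above the maximum are reset to 1 ms; others pass through.
  return delay > TIMEOUT_MAX ? 1 : delay;
}

console.log(clampDelay(2 ** 31));     // 1
console.log(clampDelay(TIMEOUT_MAX)); // 2147483647
console.log(clampDelay(100));         // 100
```

With this rule, a mocked `setTimeout(fn, 2 ** 31)` should fire after
`tick(1)`, which is what the first added test checks.
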
 lib/internal/test_runner/mock/mock_timers.js |  6 ++++++
 test/parallel/test-runner-mock-timers.js     | 16 ++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/lib/internal/test_runner/mock/mock_timers.js b/lib/internal/test_runner/mock/mock_timers.js
index 147de4efa5e6a0..2d8ae186a9dfef 100644
--- a/lib/internal/test_runner/mock/mock_timers.js
+++ b/lib/internal/test_runner/mock/mock_timers.js
@@ -34,6 +34,8 @@ const {
   codes: { ERR_INVALID_STATE, ERR_INVALID_ARG_VALUE },
 } = require('internal/errors');
 
+const { TIMEOUT_MAX } = require('internal/timers');
+
 const PriorityQueue = require('internal/priority_queue');
 const nodeTimers = require('timers');
 const nodeTimersPromises = require('timers/promises');
@@ -284,6 +286,10 @@ class MockTimers {
   }
 
   #createTimer(isInterval, callback, delay, ...args) {
+    if (delay > TIMEOUT_MAX) {
+      delay = 1;
+    }
+
     const timerId = this.#currentTimer++;
     const opts = {
       __proto__: null,
diff --git a/test/parallel/test-runner-mock-timers.js b/test/parallel/test-runner-mock-timers.js
index e2a86a5263636a..9e1bc7e62cc5b2 100644
--- a/test/parallel/test-runner-mock-timers.js
+++ b/test/parallel/test-runner-mock-timers.js
@@ -251,6 +251,22 @@ describe('Mock Timers Test Suite', () => {
           done();
         }), timeout);
       });
+
+      it('should change timeout to 1ms when it is >= 2 ** 31', (t) => {
+        t.mock.timers.enable({ apis: ['setTimeout'] });
+        const fn = t.mock.fn();
+        global.setTimeout(fn, 2 ** 31);
+        t.mock.timers.tick(1);
+        assert.strictEqual(fn.mock.callCount(), 1);
+      });
+
+      it('should change the delay to one if timeout < 0', (t) => {
+        t.mock.timers.enable({ apis: ['setTimeout'] });
+        const fn = t.mock.fn();
+        global.setTimeout(fn, -1);
+        t.mock.timers.tick(1);
+        assert.strictEqual(fn.mock.callCount(), 1);
+      });
     });
 
     describe('clearTimeout Suite', () => {

From 33367cbd62bf5eb741f466b72d3da8b43cef3206 Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Tue, 15 Oct 2024 20:56:39 +0200
Subject: [PATCH 017/216] deps: update simdutf to 5.6.0

PR-URL: https://github.com/nodejs/node/pull/55379
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
---
 deps/simdutf/simdutf.cpp | 48315 +++++++++++++++++++------------------
 deps/simdutf/simdutf.h   |  2909 ++-
 2 files changed, 27252 insertions(+), 23972 deletions(-)

diff --git a/deps/simdutf/simdutf.cpp b/deps/simdutf/simdutf.cpp
index b04ec9773e9215..e6b25c7ce27c16 100644
--- a/deps/simdutf/simdutf.cpp
+++ b/deps/simdutf/simdutf.cpp
@@ -1,4 +1,4 @@
-/* auto-generated on 2024-09-04 18:13:32 +0200. Do not edit! */
+/* auto-generated on 2024-10-11 12:35:29 -0400. Do not edit! */
 /* begin file src/simdutf.cpp */
 #include "simdutf.h"
 // We include base64_tables once.
@@ -615,7 +615,7 @@ const unsigned char BitsSetTable256mul2[256] = {
     14, 10, 12, 12, 14, 12, 14, 14, 16};
 
 constexpr uint8_t to_base64_value[] = {
-    255, 255, 255, 255, 255, 255, 255, 255, 255, 64,  64,  255, 64, 64,  255,
+    255, 255, 255, 255, 255, 255, 255, 255, 255, 64,  64,  255, 64,  64,  255,
     255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
     255, 255, 64,  255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 62,  255,
     255, 255, 63,  52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  255, 255,
@@ -635,7 +635,7 @@ constexpr uint8_t to_base64_value[] = {
     255};
 
 constexpr uint8_t to_base64_url_value[] = {
-    255, 255, 255, 255, 255, 255, 255, 255, 255, 64,  64,  255, 64, 64,  255,
+    255, 255, 255, 255, 255, 255, 255, 255, 255, 64,  64,  255, 64,  64,  255,
     255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
     255, 255, 64,  255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
     62,  255, 255, 52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  255, 255,
@@ -653,22 +653,38 @@ constexpr uint8_t to_base64_url_value[] = {
     255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
     255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
     255};
-static_assert(sizeof(to_base64_value) == 256, "to_base64_value must have 256 elements");
-static_assert(sizeof(to_base64_url_value) == 256, "to_base64_url_value must have 256 elements");
-static_assert(to_base64_value[uint8_t(' ')] == 64, "space must be == 64 in to_base64_value");
-static_assert(to_base64_url_value[uint8_t(' ')] == 64, "space must be == 64 in to_base64_url_value");
-static_assert(to_base64_value[uint8_t('\t')] == 64, "tab must be == 64 in to_base64_value");
-static_assert(to_base64_url_value[uint8_t('\t')] == 64, "tab must be == 64 in to_base64_url_value");
-static_assert(to_base64_value[uint8_t('\r')] == 64, "cr must be == 64 in to_base64_value");
-static_assert(to_base64_url_value[uint8_t('\r')] == 64, "cr must be == 64 in to_base64_url_value");
-static_assert(to_base64_value[uint8_t('\n')] == 64, "lf must be == 64 in to_base64_value");
-static_assert(to_base64_url_value[uint8_t('\n')] == 64, "lf must be == 64 in to_base64_url_value");
-static_assert(to_base64_value[uint8_t('\f')] == 64, "ff must be == 64 in to_base64_value");
-static_assert(to_base64_url_value[uint8_t('\f')] == 64, "ff must be == 64 in to_base64_url_value");
-static_assert(to_base64_value[uint8_t('+')] == 62, "+ must be == 62 in to_base64_value");
-static_assert(to_base64_url_value[uint8_t('-')] == 62, "- must be == 62 in to_base64_url_value");
-static_assert(to_base64_value[uint8_t('/')] == 63, "/ must be == 62 in to_base64_value");
-static_assert(to_base64_url_value[uint8_t('_')] == 63, "_ must be == 62 in to_base64_url_value");
+static_assert(sizeof(to_base64_value) == 256,
+              "to_base64_value must have 256 elements");
+static_assert(sizeof(to_base64_url_value) == 256,
+              "to_base64_url_value must have 256 elements");
+static_assert(to_base64_value[uint8_t(' ')] == 64,
+              "space must be == 64 in to_base64_value");
+static_assert(to_base64_url_value[uint8_t(' ')] == 64,
+              "space must be == 64 in to_base64_url_value");
+static_assert(to_base64_value[uint8_t('\t')] == 64,
+              "tab must be == 64 in to_base64_value");
+static_assert(to_base64_url_value[uint8_t('\t')] == 64,
+              "tab must be == 64 in to_base64_url_value");
+static_assert(to_base64_value[uint8_t('\r')] == 64,
+              "cr must be == 64 in to_base64_value");
+static_assert(to_base64_url_value[uint8_t('\r')] == 64,
+              "cr must be == 64 in to_base64_url_value");
+static_assert(to_base64_value[uint8_t('\n')] == 64,
+              "lf must be == 64 in to_base64_value");
+static_assert(to_base64_url_value[uint8_t('\n')] == 64,
+              "lf must be == 64 in to_base64_url_value");
+static_assert(to_base64_value[uint8_t('\f')] == 64,
+              "ff must be == 64 in to_base64_value");
+static_assert(to_base64_url_value[uint8_t('\f')] == 64,
+              "ff must be == 64 in to_base64_url_value");
+static_assert(to_base64_value[uint8_t('+')] == 62,
+              "+ must be == 62 in to_base64_value");
+static_assert(to_base64_url_value[uint8_t('-')] == 62,
+              "- must be == 62 in to_base64_url_value");
+static_assert(to_base64_value[uint8_t('/')] == 63,
+              "/ must be == 63 in to_base64_value");
+static_assert(to_base64_url_value[uint8_t('_')] == 63,
+              "_ must be == 63 in to_base64_url_value");
 } // namespace base64
 } // namespace tables
 } // unnamed namespace
@@ -685,18 +701,17 @@ static_assert(to_base64_url_value[uint8_t('_')] == 63, "_ must be == 62 in to_ba
 namespace simdutf {
 namespace {
 
-template <typename T>
-std::string toBinaryString(T b) {
-   std::string binary = "";
-   T mask = T(1) << (sizeof(T) * CHAR_BIT - 1);
-   while (mask > 0) {
+template <typename T> std::string toBinaryString(T b) {
+  std::string binary = "";
+  T mask = T(1) << (sizeof(T) * CHAR_BIT - 1);
+  while (mask > 0) {
     binary += ((b & mask) == 0) ? '0' : '1';
     mask >>= 1;
   }
   return binary;
 }
-}
-}
+} // namespace
+} // namespace simdutf
 
 // Implementations
 // The best choice should always come first!
@@ -705,29 +720,27 @@ std::string toBinaryString(T b) {
 #define SIMDUTF_ARM64_H
 
 #ifdef SIMDUTF_FALLBACK_H
-#error "arm64.h must be included before fallback.h"
+  #error "arm64.h must be included before fallback.h"
 #endif
 
 
 #ifndef SIMDUTF_IMPLEMENTATION_ARM64
-#define SIMDUTF_IMPLEMENTATION_ARM64 (SIMDUTF_IS_ARM64)
+  #define SIMDUTF_IMPLEMENTATION_ARM64 (SIMDUTF_IS_ARM64)
 #endif
 #if SIMDUTF_IMPLEMENTATION_ARM64 && SIMDUTF_IS_ARM64
-#define SIMDUTF_CAN_ALWAYS_RUN_ARM64 1
+  #define SIMDUTF_CAN_ALWAYS_RUN_ARM64 1
 #else
-#define SIMDUTF_CAN_ALWAYS_RUN_ARM64 0
+  #define SIMDUTF_CAN_ALWAYS_RUN_ARM64 0
 #endif
 
 
-
 #if SIMDUTF_IMPLEMENTATION_ARM64
 
 namespace simdutf {
 /**
  * Implementation for NEON (ARMv8).
  */
-namespace arm64 {
-} // namespace arm64
+namespace arm64 {} // namespace arm64
 } // namespace simdutf
 
 /* begin file src/simdutf/arm64/implementation.h */
@@ -744,88 +757,197 @@ using namespace simdutf;
 
 class implementation final : public simdutf::implementation {
 public:
-  simdutf_really_inline implementation() : simdutf::implementation("arm64", "ARM NEON", internal::instruction_set::NEON) {}
-  simdutf_warn_unused int detect_encodings(const char * input, size_t length) const noexcept final;
-  simdutf_warn_unused bool validate_utf8(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_ascii(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16le(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16be(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf32(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf8(const char * buf, size_t len, char* utf8_output) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf16le(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf16be(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(const char * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_latin1(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  void change_endianness_utf16(const char16_t * buf, size_t length, char16_t * output) const noexcept final;
-  simdutf_warn_unused size_t count_utf16le(const char16_t * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t count_utf16be(const char16_t * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t count_utf8(const char * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf16(size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf32(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_latin1(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_latin1(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_latin1(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused size_t base64_length_from_binary(size_t length, base64_options options) const noexcept;
-  size_t binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept;
+  simdutf_really_inline implementation()
+      : simdutf::implementation("arm64", "ARM NEON",
+                                internal::instruction_set::NEON) {}
+  simdutf_warn_unused int detect_encodings(const char *input,
+                                           size_t length) const noexcept final;
+  simdutf_warn_unused bool validate_utf8(const char *buf,
+                                         size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_ascii(const char *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16le(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16le_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16be_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf32(const char32_t *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf32_with_errors(
+      const char32_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf8(
+      const char *buf, size_t len, char *utf8_output) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+      const char *buf, size_t len, char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                          char *latin1_output) const noexcept final;
+  simdutf_warn_unused result
+  convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                      char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_latin1(const char32_t *buf, size_t len,
+                                char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16le(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16be(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16le(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16be(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  void change_endianness_utf16(const char16_t *buf, size_t length,
+                               char16_t *output) const noexcept final;
+  simdutf_warn_unused size_t count_utf16le(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf16be(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf8(const char *buf,
+                                        size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16le(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16be(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16le(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16be(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf16(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf32(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_latin1(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char *input, size_t length) const noexcept;
+  simdutf_warn_unused result base64_to_binary(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused result
+  base64_to_binary(const char16_t *input, size_t length, char *output,
+                   base64_options options,
+                   last_chunk_handling_options last_chunk_options =
+                       last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused size_t base64_length_from_binary(
+      size_t length, base64_options options) const noexcept;
+  size_t binary_to_base64(const char *input, size_t length, char *output,
+                          base64_options options) const noexcept;
 };
 
 } // namespace arm64
@@ -839,7 +961,7 @@ class implementation final : public simdutf::implementation {
 // #define SIMDUTF_IMPLEMENTATION arm64
 /* end file src/simdutf/arm64/begin.h */
 
-// Declarations
+  // Declarations
 /* begin file src/simdutf/arm64/intrinsics.h */
 #ifndef SIMDUTF_ARM64_INTRINSICS_H
 #define SIMDUTF_ARM64_INTRINSICS_H
@@ -861,20 +983,20 @@ namespace {
 
 /* result might be undefined when input_num is zero */
 simdutf_really_inline int count_ones(uint64_t input_num) {
-   return vaddv_u8(vcnt_u8(vcreate_u8(input_num)));
+  return vaddv_u8(vcnt_u8(vcreate_u8(input_num)));
 }
 
 #if SIMDUTF_NEED_TRAILING_ZEROES
 simdutf_really_inline int trailing_zeroes(uint64_t input_num) {
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
   unsigned long ret;
   // Search the mask data from least significant bit (LSB)
   // to the most significant bit (MSB) for a set bit (1).
   _BitScanForward64(&ret, input_num);
   return (int)ret;
-#else // SIMDUTF_REGULAR_VISUAL_STUDIO
+  #else  // SIMDUTF_REGULAR_VISUAL_STUDIO
   return __builtin_ctzll(input_num);
-#endif // SIMDUTF_REGULAR_VISUAL_STUDIO
+  #endif // SIMDUTF_REGULAR_VISUAL_STUDIO
 }
 #endif
 
@@ -890,7 +1012,6 @@ simdutf_really_inline int trailing_zeroes(uint64_t input_num) {
 
 #include <type_traits>
 
-
 namespace simdutf {
 namespace arm64 {
 namespace {
@@ -898,1054 +1019,1327 @@ namespace simd {
 
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
 namespace {
-// Start of private section with Visual Studio workaround
-
-#ifndef simdutf_make_uint8x16_t
-#define simdutf_make_uint8x16_t(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, \
-                             x13, x14, x15, x16)                                   \
-   ([=]() {                                                                        \
-     uint8_t array[16] = {x1, x2,  x3,  x4,  x5,  x6,  x7,  x8,                    \
-                                 x9, x10, x11, x12, x13, x14, x15, x16};           \
-     return vld1q_u8(array);                                                       \
-   }())
-#endif
-#ifndef simdutf_make_int8x16_t
-#define simdutf_make_int8x16_t(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, \
-                             x13, x14, x15, x16)                                  \
-   ([=]() {                                                                       \
-     int8_t array[16] = {x1, x2,  x3,  x4,  x5,  x6,  x7,  x8,                    \
-                                 x9, x10, x11, x12, x13, x14, x15, x16};          \
-     return vld1q_s8(array);                                                      \
-   }())
-#endif
-
-#ifndef simdutf_make_uint8x8_t
-#define simdutf_make_uint8x8_t(x1, x2, x3, x4, x5, x6, x7, x8)                \
-   ([=]() {                                                                   \
-     uint8_t array[8] = {x1, x2,  x3,  x4,  x5,  x6,  x7,  x8};               \
-     return vld1_u8(array);                                                   \
-   }())
-#endif
-#ifndef simdutf_make_int8x8_t
-#define simdutf_make_int8x8_t(x1, x2, x3, x4, x5, x6, x7, x8)                 \
-   ([=]() {                                                                   \
-     int8_t array[8] = {x1, x2,  x3,  x4,  x5,  x6,  x7,  x8};                \
-     return vld1_s8(array);                                                   \
-   }())
-#endif
-#ifndef simdutf_make_uint16x8_t
-#define simdutf_make_uint16x8_t(x1, x2, x3, x4, x5, x6, x7, x8)                \
-   ([=]() {                                                                    \
-     uint16_t array[8] = {x1, x2,  x3,  x4,  x5,  x6,  x7,  x8};               \
-     return vld1q_u16(array);                                                  \
-   }())
-#endif
-#ifndef simdutf_make_int16x8_t
-#define simdutf_make_int16x8_t(x1, x2, x3, x4, x5, x6, x7, x8)                 \
-   ([=]() {                                                                    \
-     int16_t array[8] = {x1, x2,  x3,  x4,  x5,  x6,  x7,  x8};                \
-     return vld1q_s16(array);                                                  \
-   }())
-#endif
+  // Start of private section with Visual Studio workaround
+
+  #ifndef simdutf_make_uint8x16_t
+    #define simdutf_make_uint8x16_t(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10,   \
+                                    x11, x12, x13, x14, x15, x16)              \
+      ([=]() {                                                                 \
+        uint8_t array[16] = {x1, x2,  x3,  x4,  x5,  x6,  x7,  x8,             \
+                             x9, x10, x11, x12, x13, x14, x15, x16};           \
+        return vld1q_u8(array);                                                \
+      }())
+  #endif
+  #ifndef simdutf_make_int8x16_t
+    #define simdutf_make_int8x16_t(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10,    \
+                                   x11, x12, x13, x14, x15, x16)               \
+      ([=]() {                                                                 \
+        int8_t array[16] = {x1, x2,  x3,  x4,  x5,  x6,  x7,  x8,              \
+                            x9, x10, x11, x12, x13, x14, x15, x16};            \
+        return vld1q_s8(array);                                                \
+      }())
+  #endif
 
+  #ifndef simdutf_make_uint8x8_t
+    #define simdutf_make_uint8x8_t(x1, x2, x3, x4, x5, x6, x7, x8)             \
+      ([=]() {                                                                 \
+        uint8_t array[8] = {x1, x2, x3, x4, x5, x6, x7, x8};                   \
+        return vld1_u8(array);                                                 \
+      }())
+  #endif
+  #ifndef simdutf_make_int8x8_t
+    #define simdutf_make_int8x8_t(x1, x2, x3, x4, x5, x6, x7, x8)              \
+      ([=]() {                                                                 \
+        int8_t array[8] = {x1, x2, x3, x4, x5, x6, x7, x8};                    \
+        return vld1_s8(array);                                                 \
+      }())
+  #endif
+  #ifndef simdutf_make_uint16x8_t
+    #define simdutf_make_uint16x8_t(x1, x2, x3, x4, x5, x6, x7, x8)            \
+      ([=]() {                                                                 \
+        uint16_t array[8] = {x1, x2, x3, x4, x5, x6, x7, x8};                  \
+        return vld1q_u16(array);                                               \
+      }())
+  #endif
+  #ifndef simdutf_make_int16x8_t
+    #define simdutf_make_int16x8_t(x1, x2, x3, x4, x5, x6, x7, x8)             \
+      ([=]() {                                                                 \
+        int16_t array[8] = {x1, x2, x3, x4, x5, x6, x7, x8};                   \
+        return vld1q_s16(array);                                               \
+      }())
+  #endif
 
 // End of private section with Visual Studio workaround
 } // namespace
 #endif // SIMDUTF_REGULAR_VISUAL_STUDIO
 
+template <typename T> struct simd8;
 
-  template<typename T>
-  struct simd8;
-
-  //
-  // Base class of simd8<uint8_t> and simd8<bool>, both of which use uint8x16_t internally.
-  //
-  template<typename T, typename Mask=simd8<bool>>
-  struct base_u8 {
-    uint8x16_t value;
-    static const int SIZE = sizeof(value);
-
-    // Conversion from/to SIMD register
-    simdutf_really_inline base_u8(const uint8x16_t _value) : value(_value) {}
-    simdutf_really_inline operator const uint8x16_t&() const { return this->value; }
-    simdutf_really_inline operator uint8x16_t&() { return this->value; }
-    simdutf_really_inline T first() const { return vgetq_lane_u8(*this,0); }
-    simdutf_really_inline T last() const { return vgetq_lane_u8(*this,15); }
-
-    // Bit operations
-    simdutf_really_inline simd8<T> operator|(const simd8<T> other) const { return vorrq_u8(*this, other); }
-    simdutf_really_inline simd8<T> operator&(const simd8<T> other) const { return vandq_u8(*this, other); }
-    simdutf_really_inline simd8<T> operator^(const simd8<T> other) const { return veorq_u8(*this, other); }
-    simdutf_really_inline simd8<T> bit_andnot(const simd8<T> other) const { return vbicq_u8(*this, other); }
-    simdutf_really_inline simd8<T> operator~() const { return *this ^ 0xFFu; }
-    simdutf_really_inline simd8<T>& operator|=(const simd8<T> other) { auto this_cast = static_cast<simd8<T>*>(this); *this_cast = *this_cast | other; return *this_cast; }
-    simdutf_really_inline simd8<T>& operator&=(const simd8<T> other) { auto this_cast = static_cast<simd8<T>*>(this); *this_cast = *this_cast & other; return *this_cast; }
-    simdutf_really_inline simd8<T>& operator^=(const simd8<T> other) { auto this_cast = static_cast<simd8<T>*>(this); *this_cast = *this_cast ^ other; return *this_cast; }
-
-    friend simdutf_really_inline Mask operator==(const simd8<T> lhs, const simd8<T> rhs) { return vceqq_u8(lhs, rhs); }
-
-    template<int N=1>
-    simdutf_really_inline simd8<T> prev(const simd8<T> prev_chunk) const {
-      return vextq_u8(prev_chunk, *this, 16 - N);
-    }
-  };
-
-  // SIMD byte mask type (returned by things like eq and gt)
-  template<>
-  struct simd8<bool>: base_u8<bool> {
-    typedef uint16_t bitmask_t;
-    typedef uint32_t bitmask2_t;
+//
+// Base class of simd8<uint8_t> and simd8<bool>, both of which use uint8x16_t
+// internally.
+//
+template <typename T, typename Mask = simd8<bool>> struct base_u8 {
+  uint8x16_t value;
+  static const int SIZE = sizeof(value);
 
-    static simdutf_really_inline simd8<bool> splat(bool _value) { return vmovq_n_u8(uint8_t(-(!!_value))); }
+  // Conversion from/to SIMD register
+  simdutf_really_inline base_u8(const uint8x16_t _value) : value(_value) {}
+  simdutf_really_inline operator const uint8x16_t &() const {
+    return this->value;
+  }
+  simdutf_really_inline operator uint8x16_t &() { return this->value; }
+  simdutf_really_inline T first() const { return vgetq_lane_u8(*this, 0); }
+  simdutf_really_inline T last() const { return vgetq_lane_u8(*this, 15); }
 
-    simdutf_really_inline simd8(const uint8x16_t _value) : base_u8<bool>(_value) {}
-    // False constructor
-    simdutf_really_inline simd8() : simd8(vdupq_n_u8(0)) {}
-    // Splat constructor
-    simdutf_really_inline simd8(bool _value) : simd8(splat(_value)) {}
-    simdutf_really_inline void store(uint8_t dst[16]) const { return vst1q_u8(dst, *this); }
+  // Bit operations
+  simdutf_really_inline simd8<T> operator|(const simd8<T> other) const {
+    return vorrq_u8(*this, other);
+  }
+  simdutf_really_inline simd8<T> operator&(const simd8<T> other) const {
+    return vandq_u8(*this, other);
+  }
+  simdutf_really_inline simd8<T> operator^(const simd8<T> other) const {
+    return veorq_u8(*this, other);
+  }
+  simdutf_really_inline simd8<T> bit_andnot(const simd8<T> other) const {
+    return vbicq_u8(*this, other);
+  }
+  simdutf_really_inline simd8<T> operator~() const { return *this ^ 0xFFu; }
+  simdutf_really_inline simd8<T> &operator|=(const simd8<T> other) {
+    auto this_cast = static_cast<simd8<T> *>(this);
+    *this_cast = *this_cast | other;
+    return *this_cast;
+  }
+  simdutf_really_inline simd8<T> &operator&=(const simd8<T> other) {
+    auto this_cast = static_cast<simd8<T> *>(this);
+    *this_cast = *this_cast & other;
+    return *this_cast;
+  }
+  simdutf_really_inline simd8<T> &operator^=(const simd8<T> other) {
+    auto this_cast = static_cast<simd8<T> *>(this);
+    *this_cast = *this_cast ^ other;
+    return *this_cast;
+  }
 
-    // We return uint32_t instead of uint16_t because that seems to be more efficient for most
-    // purposes (cutting it down to uint16_t costs performance in some compilers).
-    simdutf_really_inline uint32_t to_bitmask() const {
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-      const uint8x16_t bit_mask =  simdutf_make_uint8x16_t(0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
-                                                   0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80);
-#else
-      const uint8x16_t bit_mask =  {0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
-                                    0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
-#endif
-      auto minput = *this & bit_mask;
-      uint8x16_t tmp = vpaddq_u8(minput, minput);
-      tmp = vpaddq_u8(tmp, tmp);
-      tmp = vpaddq_u8(tmp, tmp);
-      return vgetq_lane_u16(vreinterpretq_u16_u8(tmp), 0);
-    }
+  friend simdutf_really_inline Mask operator==(const simd8<T> lhs,
+                                               const simd8<T> rhs) {
+    return vceqq_u8(lhs, rhs);
+  }
 
-    // Returns 4-bit out of each byte, alternating between the high 4 bits and low bits
-    // result it is 64 bit.
-    // This method is expected to be faster than none() and is equivalent
-    // when the vector register is the result of a comparison, with byte
-    // values 0xff and 0x00.
-    simdutf_really_inline uint64_t to_bitmask64() const {
-      return vget_lane_u64(vreinterpret_u64_u8(vshrn_n_u16(vreinterpretq_u16_u8(*this), 4)), 0);
-    }
+  template <int N = 1>
+  simdutf_really_inline simd8<T> prev(const simd8<T> prev_chunk) const {
+    return vextq_u8(prev_chunk, *this, 16 - N);
+  }
+};
 
-    simdutf_really_inline bool any() const { return vmaxvq_u32(vreinterpretq_u32_u8(*this)) != 0; }
-    simdutf_really_inline bool none() const { return vmaxvq_u32(vreinterpretq_u32_u8(*this)) == 0; }
-    simdutf_really_inline bool all() const { return vminvq_u32(vreinterpretq_u32_u8(*this)) == 0xFFFFF; }
+// SIMD byte mask type (returned by things like eq and gt)
+template <> struct simd8<bool> : base_u8<bool> {
+  typedef uint16_t bitmask_t;
+  typedef uint32_t bitmask2_t;
 
+  static simdutf_really_inline simd8<bool> splat(bool _value) {
+    return vmovq_n_u8(uint8_t(-(!!_value)));
+  }
 
-  };
+  simdutf_really_inline simd8(const uint8x16_t _value)
+      : base_u8<bool>(_value) {}
+  // False constructor
+  simdutf_really_inline simd8() : simd8(vdupq_n_u8(0)) {}
+  // Splat constructor
+  simdutf_really_inline simd8(bool _value) : simd8(splat(_value)) {}
+  simdutf_really_inline void store(uint8_t dst[16]) const {
+    return vst1q_u8(dst, *this);
+  }
 
-  // Unsigned bytes
-  template<>
-  struct simd8<uint8_t>: base_u8<uint8_t> {
-    static simdutf_really_inline simd8<uint8_t> splat(uint8_t _value) { return vmovq_n_u8(_value); }
-    static simdutf_really_inline simd8<uint8_t> zero() { return vdupq_n_u8(0); }
-    static simdutf_really_inline simd8<uint8_t> load(const uint8_t* values) { return vld1q_u8(values); }
-    simdutf_really_inline simd8(const uint8x16_t _value) : base_u8<uint8_t>(_value) {}
-    // Zero constructor
-    simdutf_really_inline simd8() : simd8(zero()) {}
-    // Array constructor
-    simdutf_really_inline simd8(const uint8_t values[16]) : simd8(load(values)) {}
-    // Splat constructor
-    simdutf_really_inline simd8(uint8_t _value) : simd8(splat(_value)) {}
-    // Member-by-member initialization
+  // We return uint32_t instead of uint16_t because that seems to be more
+  // efficient for most purposes (cutting it down to uint16_t costs performance
+  // in some compilers).
+  simdutf_really_inline uint32_t to_bitmask() const {
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-    simdutf_really_inline simd8(
-      uint8_t v0,  uint8_t v1,  uint8_t v2,  uint8_t v3,  uint8_t v4,  uint8_t v5,  uint8_t v6,  uint8_t v7,
-      uint8_t v8,  uint8_t v9,  uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15
-    ) : simd8(simdutf_make_uint8x16_t(
-      v0, v1, v2, v3, v4, v5, v6, v7,
-      v8, v9, v10,v11,v12,v13,v14,v15
-    )) {}
+    const uint8x16_t bit_mask =
+        simdutf_make_uint8x16_t(0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
+                                0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80);
 #else
-    simdutf_really_inline simd8(
-      uint8_t v0,  uint8_t v1,  uint8_t v2,  uint8_t v3,  uint8_t v4,  uint8_t v5,  uint8_t v6,  uint8_t v7,
-      uint8_t v8,  uint8_t v9,  uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15
-    ) : simd8(uint8x16_t{
-      v0, v1, v2, v3, v4, v5, v6, v7,
-      v8, v9, v10,v11,v12,v13,v14,v15
-    }) {}
+    const uint8x16_t bit_mask = {0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
+                                 0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
 #endif
+    auto minput = *this & bit_mask;
+    uint8x16_t tmp = vpaddq_u8(minput, minput);
+    tmp = vpaddq_u8(tmp, tmp);
+    tmp = vpaddq_u8(tmp, tmp);
+    return vgetq_lane_u16(vreinterpretq_u16_u8(tmp), 0);
+  }
 
-    // Repeat 16 values as many times as necessary (usually for lookup tables)
-    simdutf_really_inline static simd8<uint8_t> repeat_16(
-      uint8_t v0,  uint8_t v1,  uint8_t v2,  uint8_t v3,  uint8_t v4,  uint8_t v5,  uint8_t v6,  uint8_t v7,
-      uint8_t v8,  uint8_t v9,  uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15
-    ) {
-      return simd8<uint8_t>(
-        v0, v1, v2, v3, v4, v5, v6, v7,
-        v8, v9, v10,v11,v12,v13,v14,v15
-      );
-    }
-
-    // Store to array
-    simdutf_really_inline void store(uint8_t dst[16]) const { return vst1q_u8(dst, *this); }
-
-    // Saturated math
-    simdutf_really_inline simd8<uint8_t> saturating_add(const simd8<uint8_t> other) const { return vqaddq_u8(*this, other); }
-    simdutf_really_inline simd8<uint8_t> saturating_sub(const simd8<uint8_t> other) const { return vqsubq_u8(*this, other); }
-
-    // Addition/subtraction are the same for signed and unsigned
-    simdutf_really_inline simd8<uint8_t> operator+(const simd8<uint8_t> other) const { return vaddq_u8(*this, other); }
-    simdutf_really_inline simd8<uint8_t> operator-(const simd8<uint8_t> other) const { return vsubq_u8(*this, other); }
-    simdutf_really_inline simd8<uint8_t>& operator+=(const simd8<uint8_t> other) { *this = *this + other; return *this; }
-    simdutf_really_inline simd8<uint8_t>& operator-=(const simd8<uint8_t> other) { *this = *this - other; return *this; }
-
-    // Order-specific operations
-    simdutf_really_inline uint8_t max_val() const { return vmaxvq_u8(*this); }
-    simdutf_really_inline uint8_t min_val() const { return vminvq_u8(*this); }
-    simdutf_really_inline simd8<uint8_t> max_val(const simd8<uint8_t> other) const { return vmaxq_u8(*this, other); }
-    simdutf_really_inline simd8<uint8_t> min_val(const simd8<uint8_t> other) const { return vminq_u8(*this, other); }
-    simdutf_really_inline simd8<bool> operator<=(const simd8<uint8_t> other) const { return vcleq_u8(*this, other); }
-    simdutf_really_inline simd8<bool> operator>=(const simd8<uint8_t> other) const { return vcgeq_u8(*this, other); }
-    simdutf_really_inline simd8<bool> operator<(const simd8<uint8_t> other) const { return vcltq_u8(*this, other); }
-    simdutf_really_inline simd8<bool> operator>(const simd8<uint8_t> other) const { return vcgtq_u8(*this, other); }
-    // Same as >, but instead of guaranteeing all 1's == true, false = 0 and true = nonzero. For ARM, returns all 1's.
-    simdutf_really_inline simd8<uint8_t> gt_bits(const simd8<uint8_t> other) const { return simd8<uint8_t>(*this > other); }
-    // Same as <, but instead of guaranteeing all 1's == true, false = 0 and true = nonzero. For ARM, returns all 1's.
-    simdutf_really_inline simd8<uint8_t> lt_bits(const simd8<uint8_t> other) const { return simd8<uint8_t>(*this < other); }
-
-    // Bit-specific operations
-    simdutf_really_inline simd8<bool> any_bits_set(simd8<uint8_t> bits) const { return vtstq_u8(*this, bits); }
-    simdutf_really_inline bool is_ascii() const { return this->max_val() < 0b10000000u; }
-
-    simdutf_really_inline bool any_bits_set_anywhere() const { return this->max_val() != 0; }
-    simdutf_really_inline bool any_bits_set_anywhere(simd8<uint8_t> bits) const { return (*this & bits).any_bits_set_anywhere(); }
-    template<int N>
-    simdutf_really_inline simd8<uint8_t> shr() const { return vshrq_n_u8(*this, N); }
-    template<int N>
-    simdutf_really_inline simd8<uint8_t> shl() const { return vshlq_n_u8(*this, N); }
-
-    // Perform a lookup assuming the value is between 0 and 16 (undefined behavior for out of range values)
-    template<typename L>
-    simdutf_really_inline simd8<L> lookup_16(simd8<L> lookup_table) const {
-      return lookup_table.apply_lookup_16_to(*this);
-    }
-
-
-    template<typename L>
-    simdutf_really_inline simd8<L> lookup_16(
-        L replace0,  L replace1,  L replace2,  L replace3,
-        L replace4,  L replace5,  L replace6,  L replace7,
-        L replace8,  L replace9,  L replace10, L replace11,
-        L replace12, L replace13, L replace14, L replace15) const {
-      return lookup_16(simd8<L>::repeat_16(
-        replace0,  replace1,  replace2,  replace3,
-        replace4,  replace5,  replace6,  replace7,
-        replace8,  replace9,  replace10, replace11,
-        replace12, replace13, replace14, replace15
-      ));
-    }
-
-    template<typename T>
-    simdutf_really_inline simd8<uint8_t> apply_lookup_16_to(const simd8<T> original) const {
-      return vqtbl1q_u8(*this, simd8<uint8_t>(original));
-    }
-  };
-
-  // Signed bytes
-  template<>
-  struct simd8<int8_t> {
-    int8x16_t value;
+  // Returns 4 bits out of each byte, alternating between the high 4 bits and
+  // the low 4 bits; the result is 64 bits wide. This method is expected to be
+  // faster than none() and is equivalent when the vector register is the
+  // result of a comparison, with byte values 0xff and 0x00.
+  simdutf_really_inline uint64_t to_bitmask64() const {
+    return vget_lane_u64(
+        vreinterpret_u64_u8(vshrn_n_u16(vreinterpretq_u16_u8(*this), 4)), 0);
+  }
 
-    static simdutf_really_inline simd8<int8_t> splat(int8_t _value) { return vmovq_n_s8(_value); }
-    static simdutf_really_inline simd8<int8_t> zero() { return vdupq_n_s8(0); }
-    static simdutf_really_inline simd8<int8_t> load(const int8_t values[16]) { return vld1q_s8(values); }
+  simdutf_really_inline bool any() const {
+    return vmaxvq_u32(vreinterpretq_u32_u8(*this)) != 0;
+  }
+  simdutf_really_inline bool none() const {
+    return vmaxvq_u32(vreinterpretq_u32_u8(*this)) == 0;
+  }
+  simdutf_really_inline bool all() const {
+    return vminvq_u32(vreinterpretq_u32_u8(*this)) == 0xFFFFFFFF;
+  }
+};
 
-    // Use ST2 instead of UXTL+UXTL2 to interleave zeroes. UXTL is actually a USHLL #0,
-    // and shifting in NEON is actually quite slow.
-    //
-    // While this needs the registers to be in a specific order, bigger cores can interleave
-    // these with no overhead, and it still performs decently on little cores.
-    //    movi  v1.3d, #0
-    //      mov   v0.16b, value[0]
-    //    st2   {v0.16b, v1.16b}, [ptr], #32
-    //      mov   v0.16b, value[1]
-    //    st2   {v0.16b, v1.16b}, [ptr], #32
-    //    ...
-    template <endianness big_endian>
-    simdutf_really_inline void store_ascii_as_utf16(char16_t * p) const {
-      int8x16x2_t pair = match_system(big_endian)
-          ? int8x16x2_t{{this->value, vmovq_n_s8(0)}}
-          : int8x16x2_t{{vmovq_n_s8(0), this->value}};
-      vst2q_s8(reinterpret_cast<int8_t *>(p), pair);
-    }
-
-    // currently unused
-    // Technically this could be done with ST4 like in store_ascii_as_utf16, but it is
-    // very much not worth it, as explicitly mentioned in the ARM Cortex-X1 Core Software
-    // Optimization Guide:
-    //   4.18 Complex ASIMD instructions
-    //     The bandwidth of [ST4 with element size less than 64b] is limited by decode
-    //     constraints and it is advisable to avoid them when high performing code is desired.
-    // Instead, it is better to use ZIP1+ZIP2 and two ST2.
-    simdutf_really_inline void store_ascii_as_utf32(char32_t * p) const {
-      const uint16x8_t low = vreinterpretq_u16_s8(vzip1q_s8(this->value, vmovq_n_s8(0)));
-      const uint16x8_t high = vreinterpretq_u16_s8(vzip2q_s8(this->value, vmovq_n_s8(0)));
-      const uint16x8x2_t low_pair{{ low, vmovq_n_u16(0) }};
-      vst2q_u16(reinterpret_cast<uint16_t *>(p), low_pair);
-      const uint16x8x2_t high_pair{{ high, vmovq_n_u16(0) }};
-      vst2q_u16(reinterpret_cast<uint16_t *>(p + 8), high_pair);
-    }
-
-    // In places where the table can be reused, which is most uses in simdutf, it is worth it to do
-    // 4 table lookups, as there is no direct zero extension from u8 to u32.
-    simdutf_really_inline void store_ascii_as_utf32_tbl(char32_t * p) const {
-      const simd8<uint8_t> tb1{  0,255,255,255,  1,255,255,255,  2,255,255,255,  3,255,255,255 };
-      const simd8<uint8_t> tb2{  4,255,255,255,  5,255,255,255,  6,255,255,255,  7,255,255,255 };
-      const simd8<uint8_t> tb3{  8,255,255,255,  9,255,255,255, 10,255,255,255, 11,255,255,255 };
-      const simd8<uint8_t> tb4{ 12,255,255,255, 13,255,255,255, 14,255,255,255, 15,255,255,255 };
-
-      // encourage store pairing and interleaving
-      const auto shuf1 = this->apply_lookup_16_to(tb1);
-      const auto shuf2 = this->apply_lookup_16_to(tb2);
-      shuf1.store(reinterpret_cast<int8_t *>(p));
-      shuf2.store(reinterpret_cast<int8_t *>(p + 4));
-
-      const auto shuf3 = this->apply_lookup_16_to(tb3);
-      const auto shuf4 = this->apply_lookup_16_to(tb4);
-      shuf3.store(reinterpret_cast<int8_t *>(p + 8));
-      shuf4.store(reinterpret_cast<int8_t *>(p + 12));
-    }
-    // Conversion from/to SIMD register
-    simdutf_really_inline simd8(const int8x16_t _value) : value{_value} {}
-    simdutf_really_inline operator const int8x16_t&() const { return this->value; }
-#ifndef SIMDUTF_REGULAR_VISUAL_STUDIO
-    simdutf_really_inline operator const uint8x16_t() const { return vreinterpretq_u8_s8(this->value); }
-#endif
-    simdutf_really_inline operator int8x16_t&() { return this->value; }
-
-    // Zero constructor
-    simdutf_really_inline simd8() : simd8(zero()) {}
-    // Splat constructor
-    simdutf_really_inline simd8(int8_t _value) : simd8(splat(_value)) {}
-    // Array constructor
-    simdutf_really_inline simd8(const int8_t* values) : simd8(load(values)) {}
-    // Member-by-member initialization
+// Unsigned bytes
+template <> struct simd8<uint8_t> : base_u8<uint8_t> {
+  static simdutf_really_inline simd8<uint8_t> splat(uint8_t _value) {
+    return vmovq_n_u8(_value);
+  }
+  static simdutf_really_inline simd8<uint8_t> zero() { return vdupq_n_u8(0); }
+  static simdutf_really_inline simd8<uint8_t> load(const uint8_t *values) {
+    return vld1q_u8(values);
+  }
+  simdutf_really_inline simd8(const uint8x16_t _value)
+      : base_u8<uint8_t>(_value) {}
+  // Zero constructor
+  simdutf_really_inline simd8() : simd8(zero()) {}
+  // Array constructor
+  simdutf_really_inline simd8(const uint8_t values[16]) : simd8(load(values)) {}
+  // Splat constructor
+  simdutf_really_inline simd8(uint8_t _value) : simd8(splat(_value)) {}
+  // Member-by-member initialization
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-    simdutf_really_inline simd8(
-      int8_t v0,  int8_t v1,  int8_t v2,  int8_t v3, int8_t v4,  int8_t v5,  int8_t v6,  int8_t v7,
-      int8_t v8,  int8_t v9,  int8_t v10, int8_t v11, int8_t v12, int8_t v13, int8_t v14, int8_t v15
-    ) : simd8(simdutf_make_int8x16_t(
-      v0, v1, v2, v3, v4, v5, v6, v7,
-      v8, v9, v10,v11,v12,v13,v14,v15
-    )) {}
+  simdutf_really_inline
+  simd8(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4, uint8_t v5,
+        uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9, uint8_t v10,
+        uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15)
+      : simd8(simdutf_make_uint8x16_t(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9,
+                                      v10, v11, v12, v13, v14, v15)) {}
 #else
-    simdutf_really_inline simd8(
-      int8_t v0,  int8_t v1,  int8_t v2,  int8_t v3, int8_t v4,  int8_t v5,  int8_t v6,  int8_t v7,
-      int8_t v8,  int8_t v9,  int8_t v10, int8_t v11, int8_t v12, int8_t v13, int8_t v14, int8_t v15
-    ) : simd8(int8x16_t{
-      v0, v1, v2, v3, v4, v5, v6, v7,
-      v8, v9, v10,v11,v12,v13,v14,v15
-    }) {}
-#endif
-    // Repeat 16 values as many times as necessary (usually for lookup tables)
-    simdutf_really_inline static simd8<int8_t> repeat_16(
-      int8_t v0,  int8_t v1,  int8_t v2,  int8_t v3,  int8_t v4,  int8_t v5,  int8_t v6,  int8_t v7,
-      int8_t v8,  int8_t v9,  int8_t v10, int8_t v11, int8_t v12, int8_t v13, int8_t v14, int8_t v15
-    ) {
-      return simd8<int8_t>(
-        v0, v1, v2, v3, v4, v5, v6, v7,
-        v8, v9, v10,v11,v12,v13,v14,v15
-      );
-    }
-
-    // Store to array
-    simdutf_really_inline void store(int8_t dst[16]) const { return vst1q_s8(dst, value); }
-    // Explicit conversion to/from unsigned
-    //
-    // Under Visual Studio/ARM64 uint8x16_t and int8x16_t are apparently the same type.
-    // In theory, we could check this occurrence with std::same_as and std::enabled_if but it is C++14
-    // and relatively ugly and hard to read.
-#ifndef SIMDUTF_REGULAR_VISUAL_STUDIO
-    simdutf_really_inline explicit simd8(const uint8x16_t other): simd8(vreinterpretq_s8_u8(other)) {}
+  simdutf_really_inline
+  simd8(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4, uint8_t v5,
+        uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9, uint8_t v10,
+        uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15)
+      : simd8(uint8x16_t{v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                         v13, v14, v15}) {}
 #endif
-    simdutf_really_inline operator simd8<uint8_t>() const { return vreinterpretq_u8_s8(this->value); }
-
-    simdutf_really_inline simd8<int8_t> operator|(const simd8<int8_t> other) const { return vorrq_s8(value, other.value); }
-    simdutf_really_inline simd8<int8_t> operator&(const simd8<int8_t> other) const { return vandq_s8(value, other.value); }
-    simdutf_really_inline simd8<int8_t> operator^(const simd8<int8_t> other) const { return veorq_s8(value, other.value); }
-    simdutf_really_inline simd8<int8_t> bit_andnot(const simd8<int8_t> other) const { return vbicq_s8(value, other.value); }
-
-    // Math
-    simdutf_really_inline simd8<int8_t> operator+(const simd8<int8_t> other) const { return vaddq_s8(value, other.value); }
-    simdutf_really_inline simd8<int8_t> operator-(const simd8<int8_t> other) const { return vsubq_s8(value, other.value); }
-    simdutf_really_inline simd8<int8_t>& operator+=(const simd8<int8_t> other) { *this = *this + other; return *this; }
-    simdutf_really_inline simd8<int8_t>& operator-=(const simd8<int8_t> other) { *this = *this - other; return *this; }
-
-    simdutf_really_inline int8_t max_val() const { return vmaxvq_s8(value); }
-    simdutf_really_inline int8_t min_val() const { return vminvq_s8(value); }
-    simdutf_really_inline bool is_ascii() const { return this->min_val() >= 0; }
-
-    // Order-sensitive comparisons
-    simdutf_really_inline simd8<int8_t> max_val(const simd8<int8_t> other) const { return vmaxq_s8(value, other.value); }
-    simdutf_really_inline simd8<int8_t> min_val(const simd8<int8_t> other) const { return vminq_s8(value, other.value); }
-    simdutf_really_inline simd8<bool> operator>(const simd8<int8_t> other) const { return vcgtq_s8(value, other.value); }
-    simdutf_really_inline simd8<bool> operator<(const simd8<int8_t> other) const { return vcltq_s8(value, other.value); }
-    simdutf_really_inline simd8<bool> operator==(const simd8<int8_t> other) const { return vceqq_s8(value, other.value); }
-
-    template<int N=1>
-    simdutf_really_inline simd8<int8_t> prev(const simd8<int8_t> prev_chunk) const {
-      return vextq_s8(prev_chunk, *this, 16 - N);
-    }
-
-    // Perform a lookup assuming no value is larger than 16
-    template<typename L>
-    simdutf_really_inline simd8<L> lookup_16(simd8<L> lookup_table) const {
-      return lookup_table.apply_lookup_16_to(*this);
-    }
-    template<typename L>
-    simdutf_really_inline simd8<L> lookup_16(
-        L replace0,  L replace1,  L replace2,  L replace3,
-        L replace4,  L replace5,  L replace6,  L replace7,
-        L replace8,  L replace9,  L replace10, L replace11,
-        L replace12, L replace13, L replace14, L replace15) const {
-      return lookup_16(simd8<L>::repeat_16(
-        replace0,  replace1,  replace2,  replace3,
-        replace4,  replace5,  replace6,  replace7,
-        replace8,  replace9,  replace10, replace11,
-        replace12, replace13, replace14, replace15
-      ));
-    }
-
-    template<typename T>
-    simdutf_really_inline simd8<int8_t> apply_lookup_16_to(const simd8<T> original) const {
-      return vqtbl1q_s8(*this, simd8<uint8_t>(original));
-    }
-  };
-
-  template<typename T>
-  struct simd8x64 {
-    static constexpr int NUM_CHUNKS = 64 / sizeof(simd8<T>);
-    static_assert(NUM_CHUNKS == 4, "ARM kernel should use four registers per 64-byte block.");
-    simd8<T> chunks[NUM_CHUNKS];
-
-    simd8x64(const simd8x64<T>& o) = delete; // no copy allowed
-    simd8x64<T>& operator=(const simd8<T> other) = delete; // no assignment allowed
-    simd8x64() = delete; // no default constructor allowed
-
-    simdutf_really_inline simd8x64(const simd8<T> chunk0, const simd8<T> chunk1, const simd8<T> chunk2, const simd8<T> chunk3) : chunks{chunk0, chunk1, chunk2, chunk3} {}
-    simdutf_really_inline simd8x64(const T* ptr) : chunks{simd8<T>::load(ptr), simd8<T>::load(ptr+sizeof(simd8<T>)/sizeof(T)), simd8<T>::load(ptr+2*sizeof(simd8<T>)/sizeof(T)), simd8<T>::load(ptr+3*sizeof(simd8<T>)/sizeof(T))} {}
-
-    simdutf_really_inline void store(T* ptr) const {
-      this->chunks[0].store(ptr+sizeof(simd8<T>)*0/sizeof(T));
-      this->chunks[1].store(ptr+sizeof(simd8<T>)*1/sizeof(T));
-      this->chunks[2].store(ptr+sizeof(simd8<T>)*2/sizeof(T));
-      this->chunks[3].store(ptr+sizeof(simd8<T>)*3/sizeof(T));
-    }
-
-
-    simdutf_really_inline simd8x64<T>& operator |=(const simd8x64<T> &other) {
-      this->chunks[0] |= other.chunks[0];
-      this->chunks[1] |= other.chunks[1];
-      this->chunks[2] |= other.chunks[2];
-      this->chunks[3] |= other.chunks[3];
-      return *this;
-    }
-
-    simdutf_really_inline simd8<T> reduce_or() const {
-      return (this->chunks[0] | this->chunks[1]) | (this->chunks[2] | this->chunks[3]);
-    }
 
-    simdutf_really_inline bool is_ascii() const {
-      return reduce_or().is_ascii();
-    }
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  simdutf_really_inline static simd8<uint8_t>
+  repeat_16(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4,
+            uint8_t v5, uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9,
+            uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14,
+            uint8_t v15) {
+    return simd8<uint8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                          v13, v14, v15);
+  }
 
-    template <endianness endian>
-    simdutf_really_inline void store_ascii_as_utf16(char16_t * ptr) const {
-      this->chunks[0].template store_ascii_as_utf16<endian>(ptr+sizeof(simd8<T>)*0);
-      this->chunks[1].template store_ascii_as_utf16<endian>(ptr+sizeof(simd8<T>)*1);
-      this->chunks[2].template store_ascii_as_utf16<endian>(ptr+sizeof(simd8<T>)*2);
-      this->chunks[3].template store_ascii_as_utf16<endian>(ptr+sizeof(simd8<T>)*3);
-    }
+  // Store to array
+  simdutf_really_inline void store(uint8_t dst[16]) const {
+    return vst1q_u8(dst, *this);
+  }
 
-    simdutf_really_inline void store_ascii_as_utf32(char32_t * ptr) const {
-      this->chunks[0].store_ascii_as_utf32_tbl(ptr+sizeof(simd8<T>)*0);
-      this->chunks[1].store_ascii_as_utf32_tbl(ptr+sizeof(simd8<T>)*1);
-      this->chunks[2].store_ascii_as_utf32_tbl(ptr+sizeof(simd8<T>)*2);
-      this->chunks[3].store_ascii_as_utf32_tbl(ptr+sizeof(simd8<T>)*3);
-    }
+  // Saturated math
+  simdutf_really_inline simd8<uint8_t>
+  saturating_add(const simd8<uint8_t> other) const {
+    return vqaddq_u8(*this, other);
+  }
+  simdutf_really_inline simd8<uint8_t>
+  saturating_sub(const simd8<uint8_t> other) const {
+    return vqsubq_u8(*this, other);
+  }
 
-    simdutf_really_inline uint64_t to_bitmask() const {
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-      const uint8x16_t bit_mask = simdutf_make_uint8x16_t(
-        0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
-        0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80
-      );
-#else
-      const uint8x16_t bit_mask = {
-        0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
-        0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80
-      };
-#endif
-      // Add each of the elements next to each other, successively, to stuff each 8 byte mask into one.
-      uint8x16_t sum0 = vpaddq_u8(vandq_u8(uint8x16_t(this->chunks[0]), bit_mask), vandq_u8(uint8x16_t(this->chunks[1]), bit_mask));
-      uint8x16_t sum1 = vpaddq_u8(vandq_u8(uint8x16_t(this->chunks[2]), bit_mask), vandq_u8(uint8x16_t(this->chunks[3]), bit_mask));
-      sum0 = vpaddq_u8(sum0, sum1);
-      sum0 = vpaddq_u8(sum0, sum0);
-      return vgetq_lane_u64(vreinterpretq_u64_u8(sum0), 0);
-    }
+  // Addition/subtraction are the same for signed and unsigned
+  simdutf_really_inline simd8<uint8_t>
+  operator+(const simd8<uint8_t> other) const {
+    return vaddq_u8(*this, other);
+  }
+  simdutf_really_inline simd8<uint8_t>
+  operator-(const simd8<uint8_t> other) const {
+    return vsubq_u8(*this, other);
+  }
+  simdutf_really_inline simd8<uint8_t> &operator+=(const simd8<uint8_t> other) {
+    *this = *this + other;
+    return *this;
+  }
+  simdutf_really_inline simd8<uint8_t> &operator-=(const simd8<uint8_t> other) {
+    *this = *this - other;
+    return *this;
+  }
 
-    simdutf_really_inline uint64_t eq(const T m) const {
-    const simd8<T> mask = simd8<T>::splat(m);
-    return  simd8x64<bool>(
-      this->chunks[0] == mask,
-      this->chunks[1] == mask,
-      this->chunks[2] == mask,
-      this->chunks[3] == mask
-    ).to_bitmask();
+  // Order-specific operations
+  simdutf_really_inline uint8_t max_val() const { return vmaxvq_u8(*this); }
+  simdutf_really_inline uint8_t min_val() const { return vminvq_u8(*this); }
+  simdutf_really_inline simd8<uint8_t>
+  max_val(const simd8<uint8_t> other) const {
+    return vmaxq_u8(*this, other);
+  }
+  simdutf_really_inline simd8<uint8_t>
+  min_val(const simd8<uint8_t> other) const {
+    return vminq_u8(*this, other);
+  }
+  simdutf_really_inline simd8<bool>
+  operator<=(const simd8<uint8_t> other) const {
+    return vcleq_u8(*this, other);
+  }
+  simdutf_really_inline simd8<bool>
+  operator>=(const simd8<uint8_t> other) const {
+    return vcgeq_u8(*this, other);
+  }
+  simdutf_really_inline simd8<bool>
+  operator<(const simd8<uint8_t> other) const {
+    return vcltq_u8(*this, other);
+  }
+  simdutf_really_inline simd8<bool>
+  operator>(const simd8<uint8_t> other) const {
+    return vcgtq_u8(*this, other);
+  }
+  // Same as >, but instead of guaranteeing all 1's == true, false = 0 and true
+  // = nonzero. For ARM, returns all 1's.
+  simdutf_really_inline simd8<uint8_t>
+  gt_bits(const simd8<uint8_t> other) const {
+    return simd8<uint8_t>(*this > other);
+  }
+  // Same as <, but instead of guaranteeing all 1's == true, false = 0 and true
+  // = nonzero. For ARM, returns all 1's.
+  simdutf_really_inline simd8<uint8_t>
+  lt_bits(const simd8<uint8_t> other) const {
+    return simd8<uint8_t>(*this < other);
   }
 
-  simdutf_really_inline uint64_t lteq(const T m) const {
-    const simd8<T> mask = simd8<T>::splat(m);
-    return  simd8x64<bool>(
-      this->chunks[0] <= mask,
-      this->chunks[1] <= mask,
-      this->chunks[2] <= mask,
-      this->chunks[3] <= mask
-    ).to_bitmask();
-  }
-
-    simdutf_really_inline uint64_t in_range(const T low, const T high) const {
-      const simd8<T> mask_low = simd8<T>::splat(low);
-      const simd8<T> mask_high = simd8<T>::splat(high);
-
-      return  simd8x64<bool>(
-        (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
-        (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low),
-        (this->chunks[2] <= mask_high) & (this->chunks[2] >= mask_low),
-        (this->chunks[3] <= mask_high) & (this->chunks[3] >= mask_low)
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
-      const simd8<T> mask_low = simd8<T>::splat(low);
-      const simd8<T> mask_high = simd8<T>::splat(high);
-      return  simd8x64<bool>(
-        (this->chunks[0] > mask_high) | (this->chunks[0] < mask_low),
-        (this->chunks[1] > mask_high) | (this->chunks[1] < mask_low),
-        (this->chunks[2] > mask_high) | (this->chunks[2] < mask_low),
-        (this->chunks[3] > mask_high) | (this->chunks[3] < mask_low)
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t lt(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] < mask,
-        this->chunks[1] < mask,
-        this->chunks[2] < mask,
-        this->chunks[3] < mask
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t gt(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] > mask,
-        this->chunks[1] > mask,
-        this->chunks[2] > mask,
-        this->chunks[3] > mask
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t gteq(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] >= mask,
-        this->chunks[1] >= mask,
-        this->chunks[2] >= mask,
-        this->chunks[3] >= mask
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t gteq_unsigned(const uint8_t m) const {
-      const simd8<uint8_t> mask = simd8<uint8_t>::splat(m);
-      return  simd8x64<bool>(
-        simd8<uint8_t>(uint8x16_t(this->chunks[0])) >= mask,
-        simd8<uint8_t>(uint8x16_t(this->chunks[1])) >= mask,
-        simd8<uint8_t>(uint8x16_t(this->chunks[2])) >= mask,
-        simd8<uint8_t>(uint8x16_t(this->chunks[3])) >= mask
-      ).to_bitmask();
-    }
-  }; // struct simd8x64<T>
-/* begin file src/simdutf/arm64/simd16-inl.h */
-template<typename T>
-struct simd16;
-
-  template<typename T, typename Mask=simd16<bool>>
-  struct base_u16 {
-    uint16x8_t value;
-    static const int SIZE = sizeof(value);
-
-    // Conversion from/to SIMD register
-    simdutf_really_inline base_u16() = default;
-    simdutf_really_inline base_u16(const uint16x8_t _value) : value(_value) {}
-    simdutf_really_inline operator const uint16x8_t&() const { return this->value; }
-    simdutf_really_inline operator uint16x8_t&() { return this->value; }
-    // Bit operations
-    simdutf_really_inline simd16<T> operator|(const simd16<T> other) const { return vorrq_u16(*this, other); }
-    simdutf_really_inline simd16<T> operator&(const simd16<T> other) const { return vandq_u16(*this, other); }
-    simdutf_really_inline simd16<T> operator^(const simd16<T> other) const { return veorq_u16(*this, other); }
-    simdutf_really_inline simd16<T> bit_andnot(const simd16<T> other) const { return vbicq_u16(*this, other); }
-    simdutf_really_inline simd16<T> operator~() const { return *this ^ 0xFFu; }
-    simdutf_really_inline simd16<T>& operator|=(const simd16<T> other) { auto this_cast = static_cast<simd16<T>*>(this); *this_cast = *this_cast | other; return *this_cast; }
-    simdutf_really_inline simd16<T>& operator&=(const simd16<T> other) { auto this_cast = static_cast<simd16<T>*>(this); *this_cast = *this_cast & other; return *this_cast; }
-    simdutf_really_inline simd16<T>& operator^=(const simd16<T> other) { auto this_cast = static_cast<simd16<T>*>(this); *this_cast = *this_cast ^ other; return *this_cast; }
-
-    friend simdutf_really_inline Mask operator==(const simd16<T> lhs, const simd16<T> rhs) { return vceqq_u16(lhs, rhs); }
-
-    template<int N=1>
-    simdutf_really_inline simd16<T> prev(const simd16<T> prev_chunk) const {
-      return vextq_u18(prev_chunk, *this, 8 - N);
-    }
-  };
+  // Bit-specific operations
+  simdutf_really_inline simd8<bool> any_bits_set(simd8<uint8_t> bits) const {
+    return vtstq_u8(*this, bits);
+  }
+  simdutf_really_inline bool is_ascii() const {
+    return this->max_val() < 0b10000000u;
+  }
 
-template<typename T, typename Mask=simd16<bool>>
-struct base16: base_u16<T> {
-  typedef uint16_t bitmask_t;
-  typedef uint32_t bitmask2_t;
+  simdutf_really_inline bool any_bits_set_anywhere() const {
+    return this->max_val() != 0;
+  }
+  simdutf_really_inline bool any_bits_set_anywhere(simd8<uint8_t> bits) const {
+    return (*this & bits).any_bits_set_anywhere();
+  }
+  template <int N> simdutf_really_inline simd8<uint8_t> shr() const {
+    return vshrq_n_u8(*this, N);
+  }
+  template <int N> simdutf_really_inline simd8<uint8_t> shl() const {
+    return vshlq_n_u8(*this, N);
+  }
 
-  simdutf_really_inline base16() : base_u16<T>() {}
-  simdutf_really_inline base16(const uint16x8_t _value) : base_u16<T>(_value) {}
-  template <typename Pointer>
-  simdutf_really_inline base16(const Pointer* ptr) : base16(vld1q_u16(ptr)) {}
+  // Perform a lookup assuming the value is between 0 and 15 (undefined behavior
+  // for out of range values)
+  template <typename L>
+  simdutf_really_inline simd8<L> lookup_16(simd8<L> lookup_table) const {
+    return lookup_table.apply_lookup_16_to(*this);
+  }
 
-  static const int SIZE = sizeof(base_u16<T>::value);
+  template <typename L>
+  simdutf_really_inline simd8<L>
+  lookup_16(L replace0, L replace1, L replace2, L replace3, L replace4,
+            L replace5, L replace6, L replace7, L replace8, L replace9,
+            L replace10, L replace11, L replace12, L replace13, L replace14,
+            L replace15) const {
+    return lookup_16(simd8<L>::repeat_16(
+        replace0, replace1, replace2, replace3, replace4, replace5, replace6,
+        replace7, replace8, replace9, replace10, replace11, replace12,
+        replace13, replace14, replace15));
+  }
 
-  template<int N=1>
-  simdutf_really_inline simd16<T> prev(const simd16<T> prev_chunk) const {
-    return vextq_u18(prev_chunk, *this, 8 - N);
+  template <typename T>
+  simdutf_really_inline simd8<uint8_t>
+  apply_lookup_16_to(const simd8<T> original) const {
+    return vqtbl1q_u8(*this, simd8<uint8_t>(original));
   }
 };
 
-// SIMD byte mask type (returned by things like eq and gt)
-template<>
-struct simd16<bool>: base16<bool> {
-  static simdutf_really_inline simd16<bool> splat(bool _value) { return vmovq_n_u16(uint16_t(-(!!_value))); }
-
-  simdutf_really_inline simd16() : base16() {}
-  simdutf_really_inline simd16(const uint16x8_t _value) : base16<bool>(_value) {}
-  // Splat constructor
-  simdutf_really_inline simd16(bool _value) : base16<bool>(splat(_value)) {}
-
-};
+// Signed bytes
+template <> struct simd8<int8_t> {
+  int8x16_t value;
 
-template<typename T>
-struct base16_numeric: base16<T> {
-  static simdutf_really_inline simd16<T> splat(T _value) { return vmovq_n_u16(_value); }
-  static simdutf_really_inline simd16<T> zero() { return vdupq_n_u16(0); }
-  static simdutf_really_inline simd16<T> load(const T values[8]) {
-    return vld1q_u16(reinterpret_cast<const uint16_t*>(values));
+  static simdutf_really_inline simd8<int8_t> splat(int8_t _value) {
+    return vmovq_n_s8(_value);
+  }
+  static simdutf_really_inline simd8<int8_t> zero() { return vdupq_n_s8(0); }
+  static simdutf_really_inline simd8<int8_t> load(const int8_t values[16]) {
+    return vld1q_s8(values);
   }
 
-  simdutf_really_inline base16_numeric() : base16<T>() {}
-  simdutf_really_inline base16_numeric(const uint16x8_t _value) : base16<T>(_value) {}
-
-  // Store to array
-  simdutf_really_inline void store(T dst[8]) const { return vst1q_u16(dst, *this); }
-
-  // Override to distinguish from bool version
-  simdutf_really_inline simd16<T> operator~() const { return *this ^ 0xFFu; }
-
-  // Addition/subtraction are the same for signed and unsigned
-  simdutf_really_inline simd16<T> operator+(const simd16<T> other) const { return vaddq_u8(*this, other); }
-  simdutf_really_inline simd16<T> operator-(const simd16<T> other) const { return vsubq_u8(*this, other); }
-  simdutf_really_inline simd16<T>& operator+=(const simd16<T> other) { *this = *this + other; return *static_cast<simd16<T>*>(this); }
-  simdutf_really_inline simd16<T>& operator-=(const simd16<T> other) { *this = *this - other; return *static_cast<simd16<T>*>(this); }
-};
-
-// Signed code units
-template<>
-struct simd16<int16_t> : base16_numeric<int16_t> {
-  simdutf_really_inline simd16() : base16_numeric<int16_t>() {}
+  // Use ST2 instead of UXTL+UXTL2 to interleave zeroes. UXTL is actually a
+  // USHLL #0, and shifting in NEON is actually quite slow.
+  //
+  // While this needs the registers to be in a specific order, bigger cores can
+  // interleave these with no overhead, and it still performs decently on little
+  // cores.
+  //    movi  v1.3d, #0
+  //      mov   v0.16b, value[0]
+  //    st2   {v0.16b, v1.16b}, [ptr], #32
+  //      mov   v0.16b, value[1]
+  //    st2   {v0.16b, v1.16b}, [ptr], #32
+  //    ...
+  template <endianness big_endian>
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *p) const {
+    int8x16x2_t pair = match_system(big_endian)
+                           ? int8x16x2_t{{this->value, vmovq_n_s8(0)}}
+                           : int8x16x2_t{{vmovq_n_s8(0), this->value}};
+    vst2q_s8(reinterpret_cast<int8_t *>(p), pair);
+  }
+
+  // currently unused
+  // Technically this could be done with ST4 like in store_ascii_as_utf16, but
+  // it is very much not worth it, as explicitly mentioned in the ARM Cortex-X1
+  // Core Software Optimization Guide:
+  //   4.18 Complex ASIMD instructions
+  //     The bandwidth of [ST4 with element size less than 64b] is limited by
+  //     decode constraints and it is advisable to avoid them when high
+  //     performing code is desired.
+  // Instead, it is better to use ZIP1+ZIP2 and two ST2.
+  simdutf_really_inline void store_ascii_as_utf32(char32_t *p) const {
+    const uint16x8_t low =
+        vreinterpretq_u16_s8(vzip1q_s8(this->value, vmovq_n_s8(0)));
+    const uint16x8_t high =
+        vreinterpretq_u16_s8(vzip2q_s8(this->value, vmovq_n_s8(0)));
+    const uint16x8x2_t low_pair{{low, vmovq_n_u16(0)}};
+    vst2q_u16(reinterpret_cast<uint16_t *>(p), low_pair);
+    const uint16x8x2_t high_pair{{high, vmovq_n_u16(0)}};
+    vst2q_u16(reinterpret_cast<uint16_t *>(p + 8), high_pair);
+  }
+
+  // In places where the table can be reused, which is most uses in simdutf, it
+  // is worth it to do 4 table lookups, as there is no direct zero extension
+  // from u8 to u32.
+  simdutf_really_inline void store_ascii_as_utf32_tbl(char32_t *p) const {
+    const simd8<uint8_t> tb1{0, 255, 255, 255, 1, 255, 255, 255,
+                             2, 255, 255, 255, 3, 255, 255, 255};
+    const simd8<uint8_t> tb2{4, 255, 255, 255, 5, 255, 255, 255,
+                             6, 255, 255, 255, 7, 255, 255, 255};
+    const simd8<uint8_t> tb3{8,  255, 255, 255, 9,  255, 255, 255,
+                             10, 255, 255, 255, 11, 255, 255, 255};
+    const simd8<uint8_t> tb4{12, 255, 255, 255, 13, 255, 255, 255,
+                             14, 255, 255, 255, 15, 255, 255, 255};
+
+    // encourage store pairing and interleaving
+    const auto shuf1 = this->apply_lookup_16_to(tb1);
+    const auto shuf2 = this->apply_lookup_16_to(tb2);
+    shuf1.store(reinterpret_cast<int8_t *>(p));
+    shuf2.store(reinterpret_cast<int8_t *>(p + 4));
+
+    const auto shuf3 = this->apply_lookup_16_to(tb3);
+    const auto shuf4 = this->apply_lookup_16_to(tb4);
+    shuf3.store(reinterpret_cast<int8_t *>(p + 8));
+    shuf4.store(reinterpret_cast<int8_t *>(p + 12));
+  }
+  // Conversion from/to SIMD register
+  simdutf_really_inline simd8(const int8x16_t _value) : value{_value} {}
+  simdutf_really_inline operator const int8x16_t &() const {
+    return this->value;
+  }
 #ifndef SIMDUTF_REGULAR_VISUAL_STUDIO
-  simdutf_really_inline simd16(const uint16x8_t _value) : base16_numeric<int16_t>(_value) {}
+  simdutf_really_inline operator const uint8x16_t() const {
+    return vreinterpretq_u8_s8(this->value);
+  }
 #endif
-  simdutf_really_inline simd16(const int16x8_t _value) : base16_numeric<int16_t>(vreinterpretq_u16_s16(_value)) {}
+  simdutf_really_inline operator int8x16_t &() { return this->value; }
 
+  // Zero constructor
+  simdutf_really_inline simd8() : simd8(zero()) {}
   // Splat constructor
-  simdutf_really_inline simd16(int16_t _value) : simd16(splat(_value)) {}
+  simdutf_really_inline simd8(int8_t _value) : simd8(splat(_value)) {}
   // Array constructor
-  simdutf_really_inline simd16(const int16_t* values) : simd16(load(values)) {}
-  simdutf_really_inline simd16(const char16_t* values) : simd16(load(reinterpret_cast<const int16_t*>(values))) {}
-  simdutf_really_inline operator simd16<uint16_t>() const;
-  simdutf_really_inline operator const uint16x8_t&() const { return this->value; }
-  simdutf_really_inline operator const int16x8_t() const { return vreinterpretq_s16_u16(this->value); }
-
-  simdutf_really_inline int16_t max_val() const { return vmaxvq_s16(vreinterpretq_s16_u16(this->value)); }
-  simdutf_really_inline int16_t min_val() const { return vminvq_s16(vreinterpretq_s16_u16(this->value)); }
-  // Order-sensitive comparisons
-  simdutf_really_inline simd16<int16_t> max_val(const simd16<int16_t> other) const { return vmaxq_s16(vreinterpretq_s16_u16(this->value), vreinterpretq_s16_u16(other.value)); }
-  simdutf_really_inline simd16<int16_t> min_val(const simd16<int16_t> other) const { return vmaxq_s16(vreinterpretq_s16_u16(this->value), vreinterpretq_s16_u16(other.value)); }
-  simdutf_really_inline simd16<bool> operator>(const simd16<int16_t> other) const { return vcgtq_s16(vreinterpretq_s16_u16(this->value), vreinterpretq_s16_u16(other.value)); }
-  simdutf_really_inline simd16<bool> operator<(const simd16<int16_t> other) const { return vcltq_s16(vreinterpretq_s16_u16(this->value), vreinterpretq_s16_u16(other.value)); }
-};
-
-
-
-
-// Unsigned code units
-template<>
-struct simd16<uint16_t>: base16_numeric<uint16_t>  {
-  simdutf_really_inline simd16() : base16_numeric<uint16_t>() {}
-  simdutf_really_inline simd16(const uint16x8_t _value) : base16_numeric<uint16_t>(_value) {}
-
-  // Splat constructor
-  simdutf_really_inline simd16(uint16_t _value) : simd16(splat(_value)) {}
-  // Array constructor
-  simdutf_really_inline simd16(const uint16_t* values) : simd16(load(values)) {}
-  simdutf_really_inline simd16(const char16_t* values) : simd16(load(reinterpret_cast<const uint16_t*>(values))) {}
+  simdutf_really_inline simd8(const int8_t *values) : simd8(load(values)) {}
+  // Member-by-member initialization
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  simdutf_really_inline simd8(int8_t v0, int8_t v1, int8_t v2, int8_t v3,
+                              int8_t v4, int8_t v5, int8_t v6, int8_t v7,
+                              int8_t v8, int8_t v9, int8_t v10, int8_t v11,
+                              int8_t v12, int8_t v13, int8_t v14, int8_t v15)
+      : simd8(simdutf_make_int8x16_t(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9,
+                                     v10, v11, v12, v13, v14, v15)) {}
+#else
+  simdutf_really_inline simd8(int8_t v0, int8_t v1, int8_t v2, int8_t v3,
+                              int8_t v4, int8_t v5, int8_t v6, int8_t v7,
+                              int8_t v8, int8_t v9, int8_t v10, int8_t v11,
+                              int8_t v12, int8_t v13, int8_t v14, int8_t v15)
+      : simd8(int8x16_t{v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                        v13, v14, v15}) {}
+#endif
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  simdutf_really_inline static simd8<int8_t>
+  repeat_16(int8_t v0, int8_t v1, int8_t v2, int8_t v3, int8_t v4, int8_t v5,
+            int8_t v6, int8_t v7, int8_t v8, int8_t v9, int8_t v10, int8_t v11,
+            int8_t v12, int8_t v13, int8_t v14, int8_t v15) {
+    return simd8<int8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                         v13, v14, v15);
+  }
 
+  // Store to array
+  simdutf_really_inline void store(int8_t dst[16]) const {
+    return vst1q_s8(dst, value);
+  }
+  // Explicit conversion to/from unsigned
+  //
+  // Under Visual Studio/ARM64 uint8x16_t and int8x16_t are apparently the same
+  // type. In theory, we could check this occurrence with std::same_as and
+  // std::enable_if but it is C++14 and relatively ugly and hard to read.
+#ifndef SIMDUTF_REGULAR_VISUAL_STUDIO
+  simdutf_really_inline explicit simd8(const uint8x16_t other)
+      : simd8(vreinterpretq_s8_u8(other)) {}
+#endif
+  simdutf_really_inline operator simd8<uint8_t>() const {
+    return vreinterpretq_u8_s8(this->value);
+  }
 
-  simdutf_really_inline int16_t max_val() const { return vmaxvq_u16(*this); }
-  simdutf_really_inline int16_t min_val() const { return vminvq_u16(*this); }
-  // Saturated math
-  simdutf_really_inline simd16<uint16_t> saturating_add(const simd16<uint16_t> other) const { return vqaddq_u16(*this, other); }
-  simdutf_really_inline simd16<uint16_t> saturating_sub(const simd16<uint16_t> other) const { return vqsubq_u16(*this, other); }
+  simdutf_really_inline simd8<int8_t>
+  operator|(const simd8<int8_t> other) const {
+    return vorrq_s8(value, other.value);
+  }
+  simdutf_really_inline simd8<int8_t>
+  operator&(const simd8<int8_t> other) const {
+    return vandq_s8(value, other.value);
+  }
+  simdutf_really_inline simd8<int8_t>
+  operator^(const simd8<int8_t> other) const {
+    return veorq_s8(value, other.value);
+  }
+  simdutf_really_inline simd8<int8_t>
+  bit_andnot(const simd8<int8_t> other) const {
+    return vbicq_s8(value, other.value);
+  }
 
-  // Order-specific operations
-  simdutf_really_inline simd16<uint16_t> max_val(const simd16<uint16_t> other) const { return vmaxq_u16(*this, other); }
-  simdutf_really_inline simd16<uint16_t> min_val(const simd16<uint16_t> other) const { return vminq_u16(*this, other); }
-  // Same as >, but only guarantees true is nonzero (< guarantees true = -1)
-  simdutf_really_inline simd16<uint16_t> gt_bits(const simd16<uint16_t> other) const { return this->saturating_sub(other); }
-  // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
-  simdutf_really_inline simd16<uint16_t> lt_bits(const simd16<uint16_t> other) const { return other.saturating_sub(*this); }
-  simdutf_really_inline simd16<bool> operator<=(const simd16<uint16_t> other) const { return vcleq_u16(*this, other); }
-  simdutf_really_inline simd16<bool> operator>=(const simd16<uint16_t> other) const { return vcgeq_u16(*this, other); }
-  simdutf_really_inline simd16<bool> operator>(const simd16<uint16_t> other) const { return  vcgtq_u16(*this, other); }
-  simdutf_really_inline simd16<bool> operator<(const simd16<uint16_t> other) const { return vcltq_u16(*this, other); }
+  // Math
+  simdutf_really_inline simd8<int8_t>
+  operator+(const simd8<int8_t> other) const {
+    return vaddq_s8(value, other.value);
+  }
+  simdutf_really_inline simd8<int8_t>
+  operator-(const simd8<int8_t> other) const {
+    return vsubq_s8(value, other.value);
+  }
+  simdutf_really_inline simd8<int8_t> &operator+=(const simd8<int8_t> other) {
+    *this = *this + other;
+    return *this;
+  }
+  simdutf_really_inline simd8<int8_t> &operator-=(const simd8<int8_t> other) {
+    *this = *this - other;
+    return *this;
+  }
 
-  // Bit-specific operations
-  simdutf_really_inline simd16<bool> bits_not_set() const { return *this == uint16_t(0); }
-  template<int N>
-  simdutf_really_inline simd16<uint16_t> shr() const { return simd16<uint16_t>(vshrq_n_u16(*this, N)); }
-  template<int N>
-  simdutf_really_inline simd16<uint16_t> shl() const { return simd16<uint16_t>(vshlq_n_u16(*this, N)); }
+  simdutf_really_inline int8_t max_val() const { return vmaxvq_s8(value); }
+  simdutf_really_inline int8_t min_val() const { return vminvq_s8(value); }
+  simdutf_really_inline bool is_ascii() const { return this->min_val() >= 0; }
 
-  // logical operations
-  simdutf_really_inline simd16<uint16_t> operator|(const simd16<uint16_t> other) const { return vorrq_u16(*this, other); }
-  simdutf_really_inline simd16<uint16_t> operator&(const simd16<uint16_t> other) const { return vandq_u16(*this, other); }
-  simdutf_really_inline simd16<uint16_t> operator^(const simd16<uint16_t> other) const { return veorq_u16(*this, other); }
+  // Order-sensitive comparisons
+  simdutf_really_inline simd8<int8_t> max_val(const simd8<int8_t> other) const {
+    return vmaxq_s8(value, other.value);
+  }
+  simdutf_really_inline simd8<int8_t> min_val(const simd8<int8_t> other) const {
+    return vminq_s8(value, other.value);
+  }
+  simdutf_really_inline simd8<bool> operator>(const simd8<int8_t> other) const {
+    return vcgtq_s8(value, other.value);
+  }
+  simdutf_really_inline simd8<bool> operator<(const simd8<int8_t> other) const {
+    return vcltq_s8(value, other.value);
+  }
+  simdutf_really_inline simd8<bool>
+  operator==(const simd8<int8_t> other) const {
+    return vceqq_s8(value, other.value);
+  }
 
-  // Pack with the unsigned saturation of two uint16_t code units into single uint8_t vector
-  static simdutf_really_inline simd8<uint8_t> pack(const simd16<uint16_t>& v0, const simd16<uint16_t>& v1) {
-    return vqmovn_high_u16(vqmovn_u16(v0), v1);
+  template <int N = 1>
+  simdutf_really_inline simd8<int8_t>
+  prev(const simd8<int8_t> prev_chunk) const {
+    return vextq_s8(prev_chunk, *this, 16 - N);
   }
 
-  // Change the endianness
-  simdutf_really_inline simd16<uint16_t> swap_bytes() const {
-    return vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(*this)));
+  // Perform a lookup assuming every index is in the range [0, 15]
+  template <typename L>
+  simdutf_really_inline simd8<L> lookup_16(simd8<L> lookup_table) const {
+    return lookup_table.apply_lookup_16_to(*this);
+  }
+  template <typename L>
+  simdutf_really_inline simd8<L>
+  lookup_16(L replace0, L replace1, L replace2, L replace3, L replace4,
+            L replace5, L replace6, L replace7, L replace8, L replace9,
+            L replace10, L replace11, L replace12, L replace13, L replace14,
+            L replace15) const {
+    return lookup_16(simd8<L>::repeat_16(
+        replace0, replace1, replace2, replace3, replace4, replace5, replace6,
+        replace7, replace8, replace9, replace10, replace11, replace12,
+        replace13, replace14, replace15));
+  }
+
+  template <typename T>
+  simdutf_really_inline simd8<int8_t>
+  apply_lookup_16_to(const simd8<T> original) const {
+    return vqtbl1q_s8(*this, simd8<uint8_t>(original));
   }
 };
-simdutf_really_inline simd16<int16_t>::operator simd16<uint16_t>() const { return this->value; }
 
+template <typename T> struct simd8x64 {
+  static constexpr int NUM_CHUNKS = 64 / sizeof(simd8<T>);
+  static_assert(NUM_CHUNKS == 4,
+                "ARM kernel should use four registers per 64-byte block.");
+  simd8<T> chunks[NUM_CHUNKS];
 
-  template<typename T>
-  struct simd16x32 {
-    static constexpr int NUM_CHUNKS = 64 / sizeof(simd16<T>);
-    static_assert(NUM_CHUNKS == 4, "ARM kernel should use four registers per 64-byte block.");
-    simd16<T> chunks[NUM_CHUNKS];
+  simd8x64(const simd8x64<T> &o) = delete; // no copy allowed
+  simd8x64<T> &
+  operator=(const simd8<T> other) = delete; // no assignment allowed
+  simd8x64() = delete;                      // no default constructor allowed
 
-    simd16x32(const simd16x32<T>& o) = delete; // no copy allowed
-    simd16x32<T>& operator=(const simd16<T> other) = delete; // no assignment allowed
-    simd16x32() = delete; // no default constructor allowed
+  simdutf_really_inline simd8x64(const simd8<T> chunk0, const simd8<T> chunk1,
+                                 const simd8<T> chunk2, const simd8<T> chunk3)
+      : chunks{chunk0, chunk1, chunk2, chunk3} {}
+  simdutf_really_inline simd8x64(const T *ptr)
+      : chunks{simd8<T>::load(ptr),
+               simd8<T>::load(ptr + sizeof(simd8<T>) / sizeof(T)),
+               simd8<T>::load(ptr + 2 * sizeof(simd8<T>) / sizeof(T)),
+               simd8<T>::load(ptr + 3 * sizeof(simd8<T>) / sizeof(T))} {}
 
-    simdutf_really_inline simd16x32(const simd16<T> chunk0, const simd16<T> chunk1, const simd16<T> chunk2, const simd16<T> chunk3) : chunks{chunk0, chunk1, chunk2, chunk3} {}
-    simdutf_really_inline simd16x32(const T* ptr) : chunks{simd16<T>::load(ptr), simd16<T>::load(ptr+sizeof(simd16<T>)/sizeof(T)), simd16<T>::load(ptr+2*sizeof(simd16<T>)/sizeof(T)), simd16<T>::load(ptr+3*sizeof(simd16<T>)/sizeof(T))} {}
+  simdutf_really_inline void store(T *ptr) const {
+    this->chunks[0].store(ptr + sizeof(simd8<T>) * 0 / sizeof(T));
+    this->chunks[1].store(ptr + sizeof(simd8<T>) * 1 / sizeof(T));
+    this->chunks[2].store(ptr + sizeof(simd8<T>) * 2 / sizeof(T));
+    this->chunks[3].store(ptr + sizeof(simd8<T>) * 3 / sizeof(T));
+  }
 
-    simdutf_really_inline void store(T* ptr) const {
-      this->chunks[0].store(ptr+sizeof(simd16<T>)*0/sizeof(T));
-      this->chunks[1].store(ptr+sizeof(simd16<T>)*1/sizeof(T));
-      this->chunks[2].store(ptr+sizeof(simd16<T>)*2/sizeof(T));
-      this->chunks[3].store(ptr+sizeof(simd16<T>)*3/sizeof(T));
-    }
+  simdutf_really_inline simd8x64<T> &operator|=(const simd8x64<T> &other) {
+    this->chunks[0] |= other.chunks[0];
+    this->chunks[1] |= other.chunks[1];
+    this->chunks[2] |= other.chunks[2];
+    this->chunks[3] |= other.chunks[3];
+    return *this;
+  }
 
-    simdutf_really_inline simd16<T> reduce_or() const {
-      return (this->chunks[0] | this->chunks[1]) | (this->chunks[2] | this->chunks[3]);
-    }
+  simdutf_really_inline simd8<T> reduce_or() const {
+    return (this->chunks[0] | this->chunks[1]) |
+           (this->chunks[2] | this->chunks[3]);
+  }
 
-    simdutf_really_inline bool is_ascii() const {
-      return reduce_or().is_ascii();
-    }
+  simdutf_really_inline bool is_ascii() const { return reduce_or().is_ascii(); }
 
-    simdutf_really_inline void store_ascii_as_utf16(char16_t * ptr) const {
-      this->chunks[0].store_ascii_as_utf16(ptr+sizeof(simd16<T>)*0);
-      this->chunks[1].store_ascii_as_utf16(ptr+sizeof(simd16<T>)*1);
-      this->chunks[2].store_ascii_as_utf16(ptr+sizeof(simd16<T>)*2);
-      this->chunks[3].store_ascii_as_utf16(ptr+sizeof(simd16<T>)*3);
-    }
+  template <endianness endian>
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *ptr) const {
+    this->chunks[0].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 0);
+    this->chunks[1].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 1);
+    this->chunks[2].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 2);
+    this->chunks[3].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 3);
+  }
+
+  simdutf_really_inline void store_ascii_as_utf32(char32_t *ptr) const {
+    this->chunks[0].store_ascii_as_utf32_tbl(ptr + sizeof(simd8<T>) * 0);
+    this->chunks[1].store_ascii_as_utf32_tbl(ptr + sizeof(simd8<T>) * 1);
+    this->chunks[2].store_ascii_as_utf32_tbl(ptr + sizeof(simd8<T>) * 2);
+    this->chunks[3].store_ascii_as_utf32_tbl(ptr + sizeof(simd8<T>) * 3);
+  }
 
-    simdutf_really_inline uint64_t to_bitmask() const {
+  simdutf_really_inline uint64_t to_bitmask() const {
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-      const uint8x16_t bit_mask = simdutf_make_uint8x16_t(
-        0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
-        0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80
-      );
+    const uint8x16_t bit_mask =
+        simdutf_make_uint8x16_t(0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
+                                0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80);
 #else
-      const uint8x16_t bit_mask = {
-        0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
-        0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80
-      };
+    const uint8x16_t bit_mask = {0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
+                                 0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
 #endif
-      // Add each of the elements next to each other, successively, to stuff each 8 byte mask into one.
-      uint8x16_t sum0 = vpaddq_u8(vreinterpretq_u8_u16(this->chunks[0] & vreinterpretq_u16_u8(bit_mask)), vreinterpretq_u8_u16(this->chunks[1] & vreinterpretq_u16_u8(bit_mask)));
-      uint8x16_t sum1 = vpaddq_u8(vreinterpretq_u8_u16(this->chunks[2] & vreinterpretq_u16_u8(bit_mask)), vreinterpretq_u8_u16(this->chunks[3] & vreinterpretq_u16_u8(bit_mask)));
-      sum0 = vpaddq_u8(sum0, sum1);
-      sum0 = vpaddq_u8(sum0, sum0);
-      return vgetq_lane_u64(vreinterpretq_u64_u8(sum0), 0);
-    }
-
-    simdutf_really_inline void swap_bytes() {
-      this->chunks[0] = this->chunks[0].swap_bytes();
-      this->chunks[1] = this->chunks[1].swap_bytes();
-      this->chunks[2] = this->chunks[2].swap_bytes();
-      this->chunks[3] = this->chunks[3].swap_bytes();
-    }
+    // Add each of the elements next to each other, successively, to stuff each
+    // 8 byte mask into one.
+    uint8x16_t sum0 =
+        vpaddq_u8(vandq_u8(uint8x16_t(this->chunks[0]), bit_mask),
+                  vandq_u8(uint8x16_t(this->chunks[1]), bit_mask));
+    uint8x16_t sum1 =
+        vpaddq_u8(vandq_u8(uint8x16_t(this->chunks[2]), bit_mask),
+                  vandq_u8(uint8x16_t(this->chunks[3]), bit_mask));
+    sum0 = vpaddq_u8(sum0, sum1);
+    sum0 = vpaddq_u8(sum0, sum0);
+    return vgetq_lane_u64(vreinterpretq_u64_u8(sum0), 0);
+  }
 
-    simdutf_really_inline uint64_t eq(const T m) const {
-    const simd16<T> mask = simd16<T>::splat(m);
-    return  simd16x32<bool>(
-      this->chunks[0] == mask,
-      this->chunks[1] == mask,
-      this->chunks[2] == mask,
-      this->chunks[3] == mask
-    ).to_bitmask();
+  simdutf_really_inline uint64_t eq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] == mask, this->chunks[1] == mask,
+                          this->chunks[2] == mask, this->chunks[3] == mask)
+        .to_bitmask();
   }
 
   simdutf_really_inline uint64_t lteq(const T m) const {
-    const simd16<T> mask = simd16<T>::splat(m);
-    return  simd16x32<bool>(
-      this->chunks[0] <= mask,
-      this->chunks[1] <= mask,
-      this->chunks[2] <= mask,
-      this->chunks[3] <= mask
-    ).to_bitmask();
-  }
-
-    simdutf_really_inline uint64_t in_range(const T low, const T high) const {
-      const simd16<T> mask_low = simd16<T>::splat(low);
-      const simd16<T> mask_high = simd16<T>::splat(high);
-
-      return  simd16x32<bool>(
-        (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
-        (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low),
-        (this->chunks[2] <= mask_high) & (this->chunks[2] >= mask_low),
-        (this->chunks[3] <= mask_high) & (this->chunks[3] >= mask_low)
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
-      const simd16<T> mask_low = simd16<T>::splat(low);
-      const simd16<T> mask_high = simd16<T>::splat(high);
-      return  simd16x32<bool>(
-        (this->chunks[0] > mask_high) | (this->chunks[0] < mask_low),
-        (this->chunks[1] > mask_high) | (this->chunks[1] < mask_low),
-        (this->chunks[2] > mask_high) | (this->chunks[2] < mask_low),
-        (this->chunks[3] > mask_high) | (this->chunks[3] < mask_low)
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t lt(const T m) const {
-      const simd16<T> mask = simd16<T>::splat(m);
-      return  simd16x32<bool>(
-        this->chunks[0] < mask,
-        this->chunks[1] < mask,
-        this->chunks[2] < mask,
-        this->chunks[3] < mask
-      ).to_bitmask();
-    }
-
-  }; // struct simd16x32<T>
-  template<>
-  simdutf_really_inline uint64_t simd16x32<uint16_t>::not_in_range(const uint16_t low, const uint16_t high) const {
-      const simd16<uint16_t> mask_low = simd16<uint16_t>::splat(low);
-      const simd16<uint16_t> mask_high = simd16<uint16_t>::splat(high);
-      simd16x32<uint16_t> x(
-        simd16<uint16_t>((this->chunks[0] > mask_high) | (this->chunks[0] < mask_low)),
-        simd16<uint16_t>((this->chunks[1] > mask_high) | (this->chunks[1] < mask_low)),
-        simd16<uint16_t>((this->chunks[2] > mask_high) | (this->chunks[2] < mask_low)),
-        simd16<uint16_t>((this->chunks[3] > mask_high) | (this->chunks[3] < mask_low))
-      );
-      return  x.to_bitmask();
-    }
-/* end file src/simdutf/arm64/simd16-inl.h */
-} // namespace simd
-} // unnamed namespace
-} // namespace arm64
-} // namespace simdutf
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] <= mask, this->chunks[1] <= mask,
+                          this->chunks[2] <= mask, this->chunks[3] <= mask)
+        .to_bitmask();
+  }
 
-#endif // SIMDUTF_ARM64_SIMD_H
-/* end file src/simdutf/arm64/simd.h */
+  simdutf_really_inline uint64_t in_range(const T low, const T high) const {
+    const simd8<T> mask_low = simd8<T>::splat(low);
+    const simd8<T> mask_high = simd8<T>::splat(high);
+
+    return simd8x64<bool>(
+               (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
+               (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low),
+               (this->chunks[2] <= mask_high) & (this->chunks[2] >= mask_low),
+               (this->chunks[3] <= mask_high) & (this->chunks[3] >= mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
+    const simd8<T> mask_low = simd8<T>::splat(low);
+    const simd8<T> mask_high = simd8<T>::splat(high);
+    return simd8x64<bool>(
+               (this->chunks[0] > mask_high) | (this->chunks[0] < mask_low),
+               (this->chunks[1] > mask_high) | (this->chunks[1] < mask_low),
+               (this->chunks[2] > mask_high) | (this->chunks[2] < mask_low),
+               (this->chunks[3] > mask_high) | (this->chunks[3] < mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t lt(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] < mask, this->chunks[1] < mask,
+                          this->chunks[2] < mask, this->chunks[3] < mask)
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t gt(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] > mask, this->chunks[1] > mask,
+                          this->chunks[2] > mask, this->chunks[3] > mask)
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t gteq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] >= mask, this->chunks[1] >= mask,
+                          this->chunks[2] >= mask, this->chunks[3] >= mask)
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t gteq_unsigned(const uint8_t m) const {
+    const simd8<uint8_t> mask = simd8<uint8_t>::splat(m);
+    return simd8x64<bool>(simd8<uint8_t>(uint8x16_t(this->chunks[0])) >= mask,
+                          simd8<uint8_t>(uint8x16_t(this->chunks[1])) >= mask,
+                          simd8<uint8_t>(uint8x16_t(this->chunks[2])) >= mask,
+                          simd8<uint8_t>(uint8x16_t(this->chunks[3])) >= mask)
+        .to_bitmask();
+  }
+}; // struct simd8x64<T>
+/* begin file src/simdutf/arm64/simd16-inl.h */
+template <typename T> struct simd16;
 
-/* begin file src/simdutf/arm64/end.h */
-/* end file src/simdutf/arm64/end.h */
+template <typename T, typename Mask = simd16<bool>> struct base_u16 {
+  uint16x8_t value;
+  static const int SIZE = sizeof(value);
 
-#endif // SIMDUTF_IMPLEMENTATION_ARM64
+  // Conversion from/to SIMD register
+  simdutf_really_inline base_u16() = default;
+  simdutf_really_inline base_u16(const uint16x8_t _value) : value(_value) {}
+  simdutf_really_inline operator const uint16x8_t &() const {
+    return this->value;
+  }
+  simdutf_really_inline operator uint16x8_t &() { return this->value; }
+  // Bit operations
+  simdutf_really_inline simd16<T> operator|(const simd16<T> other) const {
+    return vorrq_u16(*this, other);
+  }
+  simdutf_really_inline simd16<T> operator&(const simd16<T> other) const {
+    return vandq_u16(*this, other);
+  }
+  simdutf_really_inline simd16<T> operator^(const simd16<T> other) const {
+    return veorq_u16(*this, other);
+  }
+  simdutf_really_inline simd16<T> bit_andnot(const simd16<T> other) const {
+    return vbicq_u16(*this, other);
+  }
+  simdutf_really_inline simd16<T> operator~() const { return *this ^ 0xFFFFu; }
+  simdutf_really_inline simd16<T> &operator|=(const simd16<T> other) {
+    auto this_cast = static_cast<simd16<T> *>(this);
+    *this_cast = *this_cast | other;
+    return *this_cast;
+  }
+  simdutf_really_inline simd16<T> &operator&=(const simd16<T> other) {
+    auto this_cast = static_cast<simd16<T> *>(this);
+    *this_cast = *this_cast & other;
+    return *this_cast;
+  }
+  simdutf_really_inline simd16<T> &operator^=(const simd16<T> other) {
+    auto this_cast = static_cast<simd16<T> *>(this);
+    *this_cast = *this_cast ^ other;
+    return *this_cast;
+  }
 
-#endif // SIMDUTF_ARM64_H
-/* end file src/simdutf/arm64.h */
-/* begin file src/simdutf/icelake.h */
-#ifndef SIMDUTF_ICELAKE_H
-#define SIMDUTF_ICELAKE_H
+  friend simdutf_really_inline Mask operator==(const simd16<T> lhs,
+                                               const simd16<T> rhs) {
+    return vceqq_u16(lhs, rhs);
+  }
 
+  template <int N = 1>
+  simdutf_really_inline simd16<T> prev(const simd16<T> prev_chunk) const {
+    return vextq_u16(prev_chunk, *this, 8 - N);
+  }
+};
 
+template <typename T, typename Mask = simd16<bool>>
+struct base16 : base_u16<T> {
+  typedef uint16_t bitmask_t;
+  typedef uint32_t bitmask2_t;
 
-#ifdef __has_include
-// How do we detect that a compiler supports vbmi2?
-// For sure if the following header is found, we are ok?
-#if __has_include(<avx512vbmi2intrin.h>)
-#define SIMDUTF_COMPILER_SUPPORTS_VBMI2 1
-#endif
-#endif
+  simdutf_really_inline base16() : base_u16<T>() {}
+  simdutf_really_inline base16(const uint16x8_t _value) : base_u16<T>(_value) {}
+  template <typename Pointer>
+  simdutf_really_inline base16(const Pointer *ptr) : base16(vld1q_u16(ptr)) {}
 
-#ifdef _MSC_VER
-#if _MSC_VER >= 1930
-// Visual Studio 2022 and up support VBMI2 under x64 even if the header
-// avx512vbmi2intrin.h is not found.
-// Visual Studio 2019 technically supports VBMI2, but the implementation
-// might be unreliable. Search for visualstudio2019icelakeissue in our
-// tests.
-#define SIMDUTF_COMPILER_SUPPORTS_VBMI2 1
-#endif
-#endif
+  static const int SIZE = sizeof(base_u16<T>::value);
 
-// We allow icelake on x64 as long as the compiler is known to support VBMI2.
-#ifndef SIMDUTF_IMPLEMENTATION_ICELAKE
-#define SIMDUTF_IMPLEMENTATION_ICELAKE ((SIMDUTF_IS_X86_64) && (SIMDUTF_COMPILER_SUPPORTS_VBMI2))
-#endif
+  template <int N = 1>
+  simdutf_really_inline simd16<T> prev(const simd16<T> prev_chunk) const {
+    return vextq_u16(prev_chunk, *this, 8 - N);
+  }
+};
 
-// To see why  (__BMI__) && (__LZCNT__) are not part of this next line, see
-// https://github.com/simdutf/simdutf/issues/1247
-#if ((SIMDUTF_IMPLEMENTATION_ICELAKE) && (SIMDUTF_IS_X86_64) && (__AVX2__) && (SIMDUTF_HAS_AVX512F && \
-                                         SIMDUTF_HAS_AVX512DQ && \
-                                         SIMDUTF_HAS_AVX512VL && \
-                                           SIMDUTF_HAS_AVX512VBMI2) && (!SIMDUTF_IS_32BITS))
-#define SIMDUTF_CAN_ALWAYS_RUN_ICELAKE 1
-#else
-#define SIMDUTF_CAN_ALWAYS_RUN_ICELAKE 0
-#endif
+// SIMD 16-bit mask type (returned by things like eq and gt)
+template <> struct simd16<bool> : base16<bool> {
+  static simdutf_really_inline simd16<bool> splat(bool _value) {
+    return vmovq_n_u16(uint16_t(-(!!_value)));
+  }
 
-#if SIMDUTF_IMPLEMENTATION_ICELAKE
-#if SIMDUTF_CAN_ALWAYS_RUN_ICELAKE
-#define SIMDUTF_TARGET_ICELAKE
-#else
-#define SIMDUTF_TARGET_ICELAKE SIMDUTF_TARGET_REGION("avx512f,avx512dq,avx512cd,avx512bw,avx512vbmi,avx512vbmi2,avx512vl,avx2,bmi,bmi2,pclmul,lzcnt,popcnt,avx512vpopcntdq")
-#endif
+  simdutf_really_inline simd16() : base16() {}
+  simdutf_really_inline simd16(const uint16x8_t _value)
+      : base16<bool>(_value) {}
+  // Splat constructor
+  simdutf_really_inline simd16(bool _value) : base16<bool>(splat(_value)) {}
+};
 
-namespace simdutf {
-namespace icelake {
-} // namespace icelake
-} // namespace simdutf
+template <typename T> struct base16_numeric : base16<T> {
+  static simdutf_really_inline simd16<T> splat(T _value) {
+    return vmovq_n_u16(_value);
+  }
+  static simdutf_really_inline simd16<T> zero() { return vdupq_n_u16(0); }
+  static simdutf_really_inline simd16<T> load(const T values[8]) {
+    return vld1q_u16(reinterpret_cast<const uint16_t *>(values));
+  }
 
+  simdutf_really_inline base16_numeric() : base16<T>() {}
+  simdutf_really_inline base16_numeric(const uint16x8_t _value)
+      : base16<T>(_value) {}
 
+  // Store to array
+  simdutf_really_inline void store(T dst[8]) const {
+    vst1q_u16(reinterpret_cast<uint16_t *>(dst), *this);
+  }
 
-//
-// These two need to be included outside SIMDUTF_TARGET_REGION
-//
-/* begin file src/simdutf/icelake/intrinsics.h */
-#ifndef SIMDUTF_ICELAKE_INTRINSICS_H
-#define SIMDUTF_ICELAKE_INTRINSICS_H
+  // Override to distinguish from bool version
+  simdutf_really_inline simd16<T> operator~() const { return *this ^ 0xFFFFu; }
 
+  // Addition/subtraction are the same for signed and unsigned
+  simdutf_really_inline simd16<T> operator+(const simd16<T> other) const {
+    return vaddq_u16(*this, other);
+  }
+  simdutf_really_inline simd16<T> operator-(const simd16<T> other) const {
+    return vsubq_u16(*this, other);
+  }
+  simdutf_really_inline simd16<T> &operator+=(const simd16<T> other) {
+    *this = *this + other;
+    return *static_cast<simd16<T> *>(this);
+  }
+  simdutf_really_inline simd16<T> &operator-=(const simd16<T> other) {
+    *this = *this - other;
+    return *static_cast<simd16<T> *>(this);
+  }
+};
 
-#ifdef SIMDUTF_VISUAL_STUDIO
-// under clang within visual studio, this will include <x86intrin.h>
-#include <intrin.h>  // visual studio or clang
-#include <immintrin.h>
-#else
+// Signed code units
+template <> struct simd16<int16_t> : base16_numeric<int16_t> {
+  simdutf_really_inline simd16() : base16_numeric<int16_t>() {}
+#ifndef SIMDUTF_REGULAR_VISUAL_STUDIO
+  simdutf_really_inline simd16(const uint16x8_t _value)
+      : base16_numeric<int16_t>(_value) {}
+#endif
+  simdutf_really_inline simd16(const int16x8_t _value)
+      : base16_numeric<int16_t>(vreinterpretq_u16_s16(_value)) {}
 
-#if SIMDUTF_GCC11ORMORE
-// We should not get warnings while including <x86intrin.h> yet we do
-// under some versions of GCC.
-// If the x86intrin.h header has uninitialized values that are problematic,
-// it is a GCC issue, we want to ignore these warnings.
-SIMDUTF_DISABLE_GCC_WARNING(-Wuninitialized)
+  // Splat constructor
+  simdutf_really_inline simd16(int16_t _value) : simd16(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd16(const int16_t *values) : simd16(load(values)) {}
+  simdutf_really_inline simd16(const char16_t *values)
+      : simd16(load(reinterpret_cast<const int16_t *>(values))) {}
+  simdutf_really_inline operator simd16<uint16_t>() const;
+  simdutf_really_inline operator const uint16x8_t &() const {
+    return this->value;
+  }
+  simdutf_really_inline operator const int16x8_t() const {
+    return vreinterpretq_s16_u16(this->value);
+  }
+
+  simdutf_really_inline int16_t max_val() const {
+    return vmaxvq_s16(vreinterpretq_s16_u16(this->value));
+  }
+  simdutf_really_inline int16_t min_val() const {
+    return vminvq_s16(vreinterpretq_s16_u16(this->value));
+  }
+  // Order-sensitive comparisons
+  simdutf_really_inline simd16<int16_t>
+  max_val(const simd16<int16_t> other) const {
+    return vmaxq_s16(vreinterpretq_s16_u16(this->value),
+                     vreinterpretq_s16_u16(other.value));
+  }
+  simdutf_really_inline simd16<int16_t>
+  min_val(const simd16<int16_t> other) const {
+    return vminq_s16(vreinterpretq_s16_u16(this->value),
+                     vreinterpretq_s16_u16(other.value));
+  }
+  simdutf_really_inline simd16<bool>
+  operator>(const simd16<int16_t> other) const {
+    return vcgtq_s16(vreinterpretq_s16_u16(this->value),
+                     vreinterpretq_s16_u16(other.value));
+  }
+  simdutf_really_inline simd16<bool>
+  operator<(const simd16<int16_t> other) const {
+    return vcltq_s16(vreinterpretq_s16_u16(this->value),
+                     vreinterpretq_s16_u16(other.value));
+  }
+};
+
+// Unsigned code units
+template <> struct simd16<uint16_t> : base16_numeric<uint16_t> {
+  simdutf_really_inline simd16() : base16_numeric<uint16_t>() {}
+  simdutf_really_inline simd16(const uint16x8_t _value)
+      : base16_numeric<uint16_t>(_value) {}
+
+  // Splat constructor
+  simdutf_really_inline simd16(uint16_t _value) : simd16(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd16(const uint16_t *values) : simd16(load(values)) {}
+  simdutf_really_inline simd16(const char16_t *values)
+      : simd16(load(reinterpret_cast<const uint16_t *>(values))) {}
+
+  simdutf_really_inline uint16_t max_val() const { return vmaxvq_u16(*this); }
+  simdutf_really_inline uint16_t min_val() const { return vminvq_u16(*this); }
+  // Saturated math
+  simdutf_really_inline simd16<uint16_t>
+  saturating_add(const simd16<uint16_t> other) const {
+    return vqaddq_u16(*this, other);
+  }
+  simdutf_really_inline simd16<uint16_t>
+  saturating_sub(const simd16<uint16_t> other) const {
+    return vqsubq_u16(*this, other);
+  }
+
+  // Order-specific operations
+  simdutf_really_inline simd16<uint16_t>
+  max_val(const simd16<uint16_t> other) const {
+    return vmaxq_u16(*this, other);
+  }
+  simdutf_really_inline simd16<uint16_t>
+  min_val(const simd16<uint16_t> other) const {
+    return vminq_u16(*this, other);
+  }
+  // Same as >, but only guarantees true is nonzero (> guarantees true = -1)
+  simdutf_really_inline simd16<uint16_t>
+  gt_bits(const simd16<uint16_t> other) const {
+    return this->saturating_sub(other);
+  }
+  // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
+  simdutf_really_inline simd16<uint16_t>
+  lt_bits(const simd16<uint16_t> other) const {
+    return other.saturating_sub(*this);
+  }
+  simdutf_really_inline simd16<bool>
+  operator<=(const simd16<uint16_t> other) const {
+    return vcleq_u16(*this, other);
+  }
+  simdutf_really_inline simd16<bool>
+  operator>=(const simd16<uint16_t> other) const {
+    return vcgeq_u16(*this, other);
+  }
+  simdutf_really_inline simd16<bool>
+  operator>(const simd16<uint16_t> other) const {
+    return vcgtq_u16(*this, other);
+  }
+  simdutf_really_inline simd16<bool>
+  operator<(const simd16<uint16_t> other) const {
+    return vcltq_u16(*this, other);
+  }
+
+  // Bit-specific operations
+  simdutf_really_inline simd16<bool> bits_not_set() const {
+    return *this == uint16_t(0);
+  }
+  template <int N> simdutf_really_inline simd16<uint16_t> shr() const {
+    return simd16<uint16_t>(vshrq_n_u16(*this, N));
+  }
+  template <int N> simdutf_really_inline simd16<uint16_t> shl() const {
+    return simd16<uint16_t>(vshlq_n_u16(*this, N));
+  }
+
+  // logical operations
+  simdutf_really_inline simd16<uint16_t>
+  operator|(const simd16<uint16_t> other) const {
+    return vorrq_u16(*this, other);
+  }
+  simdutf_really_inline simd16<uint16_t>
+  operator&(const simd16<uint16_t> other) const {
+    return vandq_u16(*this, other);
+  }
+  simdutf_really_inline simd16<uint16_t>
+  operator^(const simd16<uint16_t> other) const {
+    return veorq_u16(*this, other);
+  }
+
+  // Pack with the unsigned saturation of two uint16_t code units into single
+  // uint8_t vector
+  static simdutf_really_inline simd8<uint8_t> pack(const simd16<uint16_t> &v0,
+                                                   const simd16<uint16_t> &v1) {
+    return vqmovn_high_u16(vqmovn_u16(v0), v1);
+  }
+
+  // Change the endianness
+  simdutf_really_inline simd16<uint16_t> swap_bytes() const {
+    return vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(*this)));
+  }
+};
+simdutf_really_inline simd16<int16_t>::operator simd16<uint16_t>() const {
+  return this->value;
+}
+
+template <typename T> struct simd16x32 {
+  static constexpr int NUM_CHUNKS = 64 / sizeof(simd16<T>);
+  static_assert(NUM_CHUNKS == 4,
+                "ARM kernel should use four registers per 64-byte block.");
+  simd16<T> chunks[NUM_CHUNKS];
+
+  simd16x32(const simd16x32<T> &o) = delete; // no copy allowed
+  simd16x32<T> &
+  operator=(const simd16<T> other) = delete; // no assignment allowed
+  simd16x32() = delete;                      // no default constructor allowed
+
+  simdutf_really_inline
+  simd16x32(const simd16<T> chunk0, const simd16<T> chunk1,
+            const simd16<T> chunk2, const simd16<T> chunk3)
+      : chunks{chunk0, chunk1, chunk2, chunk3} {}
+  simdutf_really_inline simd16x32(const T *ptr)
+      : chunks{simd16<T>::load(ptr),
+               simd16<T>::load(ptr + sizeof(simd16<T>) / sizeof(T)),
+               simd16<T>::load(ptr + 2 * sizeof(simd16<T>) / sizeof(T)),
+               simd16<T>::load(ptr + 3 * sizeof(simd16<T>) / sizeof(T))} {}
+
+  simdutf_really_inline void store(T *ptr) const {
+    this->chunks[0].store(ptr + sizeof(simd16<T>) * 0 / sizeof(T));
+    this->chunks[1].store(ptr + sizeof(simd16<T>) * 1 / sizeof(T));
+    this->chunks[2].store(ptr + sizeof(simd16<T>) * 2 / sizeof(T));
+    this->chunks[3].store(ptr + sizeof(simd16<T>) * 3 / sizeof(T));
+  }
+
+  simdutf_really_inline simd16<T> reduce_or() const {
+    return (this->chunks[0] | this->chunks[1]) |
+           (this->chunks[2] | this->chunks[3]);
+  }
+
+  simdutf_really_inline bool is_ascii() const { return reduce_or().is_ascii(); }
+
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *ptr) const {
+    this->chunks[0].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 0);
+    this->chunks[1].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 1);
+    this->chunks[2].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 2);
+    this->chunks[3].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 3);
+  }
+
+  simdutf_really_inline uint64_t to_bitmask() const {
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+    const uint8x16_t bit_mask =
+        simdutf_make_uint8x16_t(0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
+                                0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80);
+#else
+    const uint8x16_t bit_mask = {0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
+                                 0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
 #endif
+    // Add each of the elements next to each other, successively, to stuff each
+    // 8 byte mask into one.
+    uint8x16_t sum0 = vpaddq_u8(
+        vreinterpretq_u8_u16(this->chunks[0] & vreinterpretq_u16_u8(bit_mask)),
+        vreinterpretq_u8_u16(this->chunks[1] & vreinterpretq_u16_u8(bit_mask)));
+    uint8x16_t sum1 = vpaddq_u8(
+        vreinterpretq_u8_u16(this->chunks[2] & vreinterpretq_u16_u8(bit_mask)),
+        vreinterpretq_u8_u16(this->chunks[3] & vreinterpretq_u16_u8(bit_mask)));
+    sum0 = vpaddq_u8(sum0, sum1);
+    sum0 = vpaddq_u8(sum0, sum0);
+    return vgetq_lane_u64(vreinterpretq_u64_u8(sum0), 0);
+  }
 
-#include <x86intrin.h> // elsewhere
+  simdutf_really_inline void swap_bytes() {
+    this->chunks[0] = this->chunks[0].swap_bytes();
+    this->chunks[1] = this->chunks[1].swap_bytes();
+    this->chunks[2] = this->chunks[2].swap_bytes();
+    this->chunks[3] = this->chunks[3].swap_bytes();
+  }
 
+  simdutf_really_inline uint64_t eq(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] == mask, this->chunks[1] == mask,
+                           this->chunks[2] == mask, this->chunks[3] == mask)
+        .to_bitmask();
+  }
 
-#if SIMDUTF_GCC11ORMORE
-// cancels the suppression of the -Wuninitialized
-SIMDUTF_POP_DISABLE_WARNINGS
+  simdutf_really_inline uint64_t lteq(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] <= mask, this->chunks[1] <= mask,
+                           this->chunks[2] <= mask, this->chunks[3] <= mask)
+        .to_bitmask();
+  }
+
+  simdutf_really_inline uint64_t in_range(const T low, const T high) const {
+    const simd16<T> mask_low = simd16<T>::splat(low);
+    const simd16<T> mask_high = simd16<T>::splat(high);
+
+    return simd16x32<bool>(
+               (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
+               (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low),
+               (this->chunks[2] <= mask_high) & (this->chunks[2] >= mask_low),
+               (this->chunks[3] <= mask_high) & (this->chunks[3] >= mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
+    const simd16<T> mask_low = simd16<T>::splat(low);
+    const simd16<T> mask_high = simd16<T>::splat(high);
+    return simd16x32<bool>(
+               (this->chunks[0] > mask_high) | (this->chunks[0] < mask_low),
+               (this->chunks[1] > mask_high) | (this->chunks[1] < mask_low),
+               (this->chunks[2] > mask_high) | (this->chunks[2] < mask_low),
+               (this->chunks[3] > mask_high) | (this->chunks[3] < mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t lt(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] < mask, this->chunks[1] < mask,
+                           this->chunks[2] < mask, this->chunks[3] < mask)
+        .to_bitmask();
+  }
+
+}; // struct simd16x32<T>
+template <>
+simdutf_really_inline uint64_t simd16x32<uint16_t>::not_in_range(
+    const uint16_t low, const uint16_t high) const {
+  const simd16<uint16_t> mask_low = simd16<uint16_t>::splat(low);
+  const simd16<uint16_t> mask_high = simd16<uint16_t>::splat(high);
+  simd16x32<uint16_t> x(simd16<uint16_t>((this->chunks[0] > mask_high) |
+                                         (this->chunks[0] < mask_low)),
+                        simd16<uint16_t>((this->chunks[1] > mask_high) |
+                                         (this->chunks[1] < mask_low)),
+                        simd16<uint16_t>((this->chunks[2] > mask_high) |
+                                         (this->chunks[2] < mask_low)),
+                        simd16<uint16_t>((this->chunks[3] > mask_high) |
+                                         (this->chunks[3] < mask_low)));
+  return x.to_bitmask();
+}
+/* end file src/simdutf/arm64/simd16-inl.h */
+} // namespace simd
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdutf
+
+#endif // SIMDUTF_ARM64_SIMD_H
+/* end file src/simdutf/arm64/simd.h */
+
+/* begin file src/simdutf/arm64/end.h */
+/* end file src/simdutf/arm64/end.h */
+
+#endif // SIMDUTF_IMPLEMENTATION_ARM64
+
+#endif // SIMDUTF_ARM64_H
+/* end file src/simdutf/arm64.h */
+/* begin file src/simdutf/icelake.h */
+#ifndef SIMDUTF_ICELAKE_H
+#define SIMDUTF_ICELAKE_H
+
+
+#ifdef __has_include
+  // How do we detect that a compiler supports VBMI2?
+  // If the following header is found, we assume that it does.
+  #if __has_include(<avx512vbmi2intrin.h>)
+    #define SIMDUTF_COMPILER_SUPPORTS_VBMI2 1
+  #endif
 #endif
 
-#ifndef _tzcnt_u64
-#define _tzcnt_u64(x) __tzcnt_u64(x)
-#endif // _tzcnt_u64
-#endif // SIMDUTF_VISUAL_STUDIO
+#ifdef _MSC_VER
+  #if _MSC_VER >= 1930
+    // Visual Studio 2022 and up support VBMI2 under x64 even if the header
+    // avx512vbmi2intrin.h is not found.
+    // Visual Studio 2019 technically supports VBMI2, but the implementation
+    // might be unreliable. Search for visualstudio2019icelakeissue in our
+    // tests.
+    #define SIMDUTF_COMPILER_SUPPORTS_VBMI2 1
+  #endif
+#endif
 
-#ifdef SIMDUTF_CLANG_VISUAL_STUDIO
-/**
- * You are not supposed, normally, to include these
- * headers directly. Instead you should either include intrin.h
- * or x86intrin.h. However, when compiling with clang
- * under Windows (i.e., when _MSC_VER is set), these headers
- * only get included *if* the corresponding features are detected
- * from macros:
- * e.g., if __AVX2__ is set... in turn,  we normally set these
- * macros by compiling against the corresponding architecture
- * (e.g., arch:AVX2, -mavx2, etc.) which compiles the whole
- * software with these advanced instructions. In simdutf, we
- * want to compile the whole program for a generic target,
- * and only target our specific kernels. As a workaround,
- * we directly include the needed headers. These headers would
- * normally guard against such usage, but we carefully included
- * <x86intrin.h>  (or <intrin.h>) before, so the headers
- * are fooled.
- */
-#include <bmiintrin.h>   // for _blsr_u64
-#include <bmi2intrin.h>  // for _pext_u64, _pdep_u64
-#include <lzcntintrin.h> // for  __lzcnt64
-#include <immintrin.h>   // for most things (AVX2, AVX512, _popcnt64)
-#include <smmintrin.h>
-#include <tmmintrin.h>
-#include <avxintrin.h>
-#include <avx2intrin.h>
-// Important: we need the AVX-512 headers:
-#include <avx512fintrin.h>
-#include <avx512dqintrin.h>
-#include <avx512cdintrin.h>
-#include <avx512bwintrin.h>
-#include <avx512vlintrin.h>
-#include <avx512vlbwintrin.h>
-#include <avx512vbmiintrin.h>
-#include <avx512vbmi2intrin.h>
-#include <avx512vpopcntdqintrin.h>
-#include <avx512vpopcntdqvlintrin.h>
-// unfortunately, we may not get _blsr_u64, but, thankfully, clang
-// has it as a macro.
-#ifndef _blsr_u64
-// we roll our own
-#define _blsr_u64(n) ((n - 1) & n)
-#endif //  _blsr_u64
-#endif // SIMDUTF_CLANG_VISUAL_STUDIO
+// We allow icelake on x64 as long as the compiler is known to support VBMI2.
+#ifndef SIMDUTF_IMPLEMENTATION_ICELAKE
+  #define SIMDUTF_IMPLEMENTATION_ICELAKE                                       \
+    ((SIMDUTF_IS_X86_64) && (SIMDUTF_COMPILER_SUPPORTS_VBMI2))
+#endif
+
+// To see why (__BMI__) && (__LZCNT__) are not part of this next line, see
+// https://github.com/simdutf/simdutf/issues/1247
+#if ((SIMDUTF_IMPLEMENTATION_ICELAKE) && (SIMDUTF_IS_X86_64) && (__AVX2__) &&  \
+     (SIMDUTF_HAS_AVX512F && SIMDUTF_HAS_AVX512DQ && SIMDUTF_HAS_AVX512VL &&   \
+      SIMDUTF_HAS_AVX512VBMI2) &&                                              \
+     (!SIMDUTF_IS_32BITS))
+  #define SIMDUTF_CAN_ALWAYS_RUN_ICELAKE 1
+#else
+  #define SIMDUTF_CAN_ALWAYS_RUN_ICELAKE 0
+#endif
+
+#if SIMDUTF_IMPLEMENTATION_ICELAKE
+  #if SIMDUTF_CAN_ALWAYS_RUN_ICELAKE
+    #define SIMDUTF_TARGET_ICELAKE
+  #else
+    #define SIMDUTF_TARGET_ICELAKE                                             \
+      SIMDUTF_TARGET_REGION(                                                   \
+          "avx512f,avx512dq,avx512cd,avx512bw,avx512vbmi,avx512vbmi2,"         \
+          "avx512vl,avx2,bmi,bmi2,pclmul,lzcnt,popcnt,avx512vpopcntdq")
+  #endif
+
+namespace simdutf {
+namespace icelake {} // namespace icelake
+} // namespace simdutf
+
+  //
+  // These two need to be included outside SIMDUTF_TARGET_REGION
+  //
+/* begin file src/simdutf/icelake/intrinsics.h */
+#ifndef SIMDUTF_ICELAKE_INTRINSICS_H
+#define SIMDUTF_ICELAKE_INTRINSICS_H
+
+
+#ifdef SIMDUTF_VISUAL_STUDIO
+  // under clang within visual studio, this will include <x86intrin.h>
+  #include <intrin.h> // visual studio or clang
+  #include <immintrin.h>
+#else
+
+  #if SIMDUTF_GCC11ORMORE
+// We should not get warnings while including <x86intrin.h>, yet we do
+// under some versions of GCC.
+// If the x86intrin.h header has uninitialized values that are problematic,
+// it is a GCC issue, so we ignore these warnings.
+SIMDUTF_DISABLE_GCC_WARNING(-Wuninitialized)
+  #endif
+
+  #include <x86intrin.h> // elsewhere
+
+  #if SIMDUTF_GCC11ORMORE
+// Restore the -Wuninitialized warning suppressed above.
+SIMDUTF_POP_DISABLE_WARNINGS
+  #endif
 
+  #ifndef _tzcnt_u64
+    #define _tzcnt_u64(x) __tzcnt_u64(x)
+  #endif // _tzcnt_u64
+#endif   // SIMDUTF_VISUAL_STUDIO
 
+#ifdef SIMDUTF_CLANG_VISUAL_STUDIO
+  /**
+   * You are not supposed, normally, to include these
+   * headers directly. Instead you should either include intrin.h
+   * or x86intrin.h. However, when compiling with clang
+   * under Windows (i.e., when _MSC_VER is set), these headers
+   * only get included *if* the corresponding features are detected
+   * from macros:
+   * e.g., if __AVX2__ is set... in turn,  we normally set these
+   * macros by compiling against the corresponding architecture
+   * (e.g., arch:AVX2, -mavx2, etc.) which compiles the whole
+   * software with these advanced instructions. In simdutf, we
+   * want to compile the whole program for a generic target,
+   * and only target our specific kernels. As a workaround,
+   * we directly include the needed headers. These headers would
+   * normally guard against such usage, but we carefully included
+   * <x86intrin.h>  (or <intrin.h>) before, so the headers
+   * are fooled.
+   */
+  #include <bmiintrin.h>   // for _blsr_u64
+  #include <bmi2intrin.h>  // for _pext_u64, _pdep_u64
+  #include <lzcntintrin.h> // for  __lzcnt64
+  #include <immintrin.h>   // for most things (AVX2, AVX512, _popcnt64)
+  #include <smmintrin.h>
+  #include <tmmintrin.h>
+  #include <avxintrin.h>
+  #include <avx2intrin.h>
+  // Important: we need the AVX-512 headers:
+  #include <avx512fintrin.h>
+  #include <avx512dqintrin.h>
+  #include <avx512cdintrin.h>
+  #include <avx512bwintrin.h>
+  #include <avx512vlintrin.h>
+  #include <avx512vlbwintrin.h>
+  #include <avx512vbmiintrin.h>
+  #include <avx512vbmi2intrin.h>
+  #include <avx512vpopcntdqintrin.h>
+  #include <avx512vpopcntdqvlintrin.h>
+  // unfortunately, we may not get _blsr_u64, but, thankfully, clang
+  // has it as a macro.
+  #ifndef _blsr_u64
+    // we roll our own
+    #define _blsr_u64(n) ((n - 1) & n)
+  #endif //  _blsr_u64
+#endif   // SIMDUTF_CLANG_VISUAL_STUDIO
 
 #if defined(__GNUC__) && !defined(__clang__)
 
-#if __GNUC__ == 8
-#define SIMDUTF_GCC8 1
-#elif __GNUC__ == 9
-#define SIMDUTF_GCC9 1
-#endif //  __GNUC__ == 8 || __GNUC__ == 9
+  #if __GNUC__ == 8
+    #define SIMDUTF_GCC8 1
+  #elif __GNUC__ == 9
+    #define SIMDUTF_GCC9 1
+  #endif //  __GNUC__ == 8 || __GNUC__ == 9
 
 #endif // defined(__GNUC__) && !defined(__clang__)
 
 #if SIMDUTF_GCC8
-#pragma GCC push_options
-#pragma GCC target("avx512f")
+  #pragma GCC push_options
+  #pragma GCC target("avx512f")
 /**
  * GCC 8 fails to provide _mm512_set_epi8. We roll our own.
  */
-inline __m512i _mm512_set_epi8(uint8_t a0, uint8_t a1, uint8_t a2, uint8_t a3, uint8_t a4, uint8_t a5, uint8_t a6, uint8_t a7, uint8_t a8, uint8_t a9, uint8_t a10, uint8_t a11, uint8_t a12, uint8_t a13, uint8_t a14, uint8_t a15, uint8_t a16, uint8_t a17, uint8_t a18, uint8_t a19, uint8_t a20, uint8_t a21, uint8_t a22, uint8_t a23, uint8_t a24, uint8_t a25, uint8_t a26, uint8_t a27, uint8_t a28, uint8_t a29, uint8_t a30, uint8_t a31, uint8_t a32, uint8_t a33, uint8_t a34, uint8_t a35, uint8_t a36, uint8_t a37, uint8_t a38, uint8_t a39, uint8_t a40, uint8_t a41, uint8_t a42, uint8_t a43, uint8_t a44, uint8_t a45, uint8_t a46, uint8_t a47, uint8_t a48, uint8_t a49, uint8_t a50, uint8_t a51, uint8_t a52, uint8_t a53, uint8_t a54, uint8_t a55, uint8_t a56, uint8_t a57, uint8_t a58, uint8_t a59, uint8_t a60, uint8_t a61, uint8_t a62, uint8_t a63) {
-  return _mm512_set_epi64(uint64_t(a7) + (uint64_t(a6) << 8) + (uint64_t(a5) << 16) + (uint64_t(a4) << 24) + (uint64_t(a3) << 32) + (uint64_t(a2) << 40) + (uint64_t(a1) << 48) + (uint64_t(a0) << 56),
-                          uint64_t(a15) + (uint64_t(a14) << 8) + (uint64_t(a13) << 16) + (uint64_t(a12) << 24) + (uint64_t(a11) << 32) + (uint64_t(a10) << 40) + (uint64_t(a9) << 48) + (uint64_t(a8) << 56),
-                          uint64_t(a23) + (uint64_t(a22) << 8) + (uint64_t(a21) << 16) + (uint64_t(a20) << 24) + (uint64_t(a19) << 32) + (uint64_t(a18) << 40) + (uint64_t(a17) << 48) + (uint64_t(a16) << 56),
-                          uint64_t(a31) + (uint64_t(a30) << 8) + (uint64_t(a29) << 16) + (uint64_t(a28) << 24) + (uint64_t(a27) << 32) + (uint64_t(a26) << 40) + (uint64_t(a25) << 48) + (uint64_t(a24) << 56),
-                          uint64_t(a39) + (uint64_t(a38) << 8) + (uint64_t(a37) << 16) + (uint64_t(a36) << 24) + (uint64_t(a35) << 32) + (uint64_t(a34) << 40) + (uint64_t(a33) << 48) + (uint64_t(a32) << 56),
-                          uint64_t(a47) + (uint64_t(a46) << 8) + (uint64_t(a45) << 16) + (uint64_t(a44) << 24) + (uint64_t(a43) << 32) + (uint64_t(a42) << 40) + (uint64_t(a41) << 48) + (uint64_t(a40) << 56),
-                          uint64_t(a55) + (uint64_t(a54) << 8) + (uint64_t(a53) << 16) + (uint64_t(a52) << 24) + (uint64_t(a51) << 32) + (uint64_t(a50) << 40) + (uint64_t(a49) << 48) + (uint64_t(a48) << 56),
-                          uint64_t(a63) + (uint64_t(a62) << 8) + (uint64_t(a61) << 16) + (uint64_t(a60) << 24) + (uint64_t(a59) << 32) + (uint64_t(a58) << 40) + (uint64_t(a57) << 48) + (uint64_t(a56) << 56));
-}
-#pragma GCC pop_options
+inline __m512i
+_mm512_set_epi8(uint8_t a0, uint8_t a1, uint8_t a2, uint8_t a3, uint8_t a4,
+                uint8_t a5, uint8_t a6, uint8_t a7, uint8_t a8, uint8_t a9,
+                uint8_t a10, uint8_t a11, uint8_t a12, uint8_t a13, uint8_t a14,
+                uint8_t a15, uint8_t a16, uint8_t a17, uint8_t a18, uint8_t a19,
+                uint8_t a20, uint8_t a21, uint8_t a22, uint8_t a23, uint8_t a24,
+                uint8_t a25, uint8_t a26, uint8_t a27, uint8_t a28, uint8_t a29,
+                uint8_t a30, uint8_t a31, uint8_t a32, uint8_t a33, uint8_t a34,
+                uint8_t a35, uint8_t a36, uint8_t a37, uint8_t a38, uint8_t a39,
+                uint8_t a40, uint8_t a41, uint8_t a42, uint8_t a43, uint8_t a44,
+                uint8_t a45, uint8_t a46, uint8_t a47, uint8_t a48, uint8_t a49,
+                uint8_t a50, uint8_t a51, uint8_t a52, uint8_t a53, uint8_t a54,
+                uint8_t a55, uint8_t a56, uint8_t a57, uint8_t a58, uint8_t a59,
+                uint8_t a60, uint8_t a61, uint8_t a62, uint8_t a63) {
+  return _mm512_set_epi64(
+      uint64_t(a7) + (uint64_t(a6) << 8) + (uint64_t(a5) << 16) +
+          (uint64_t(a4) << 24) + (uint64_t(a3) << 32) + (uint64_t(a2) << 40) +
+          (uint64_t(a1) << 48) + (uint64_t(a0) << 56),
+      uint64_t(a15) + (uint64_t(a14) << 8) + (uint64_t(a13) << 16) +
+          (uint64_t(a12) << 24) + (uint64_t(a11) << 32) +
+          (uint64_t(a10) << 40) + (uint64_t(a9) << 48) + (uint64_t(a8) << 56),
+      uint64_t(a23) + (uint64_t(a22) << 8) + (uint64_t(a21) << 16) +
+          (uint64_t(a20) << 24) + (uint64_t(a19) << 32) +
+          (uint64_t(a18) << 40) + (uint64_t(a17) << 48) + (uint64_t(a16) << 56),
+      uint64_t(a31) + (uint64_t(a30) << 8) + (uint64_t(a29) << 16) +
+          (uint64_t(a28) << 24) + (uint64_t(a27) << 32) +
+          (uint64_t(a26) << 40) + (uint64_t(a25) << 48) + (uint64_t(a24) << 56),
+      uint64_t(a39) + (uint64_t(a38) << 8) + (uint64_t(a37) << 16) +
+          (uint64_t(a36) << 24) + (uint64_t(a35) << 32) +
+          (uint64_t(a34) << 40) + (uint64_t(a33) << 48) + (uint64_t(a32) << 56),
+      uint64_t(a47) + (uint64_t(a46) << 8) + (uint64_t(a45) << 16) +
+          (uint64_t(a44) << 24) + (uint64_t(a43) << 32) +
+          (uint64_t(a42) << 40) + (uint64_t(a41) << 48) + (uint64_t(a40) << 56),
+      uint64_t(a55) + (uint64_t(a54) << 8) + (uint64_t(a53) << 16) +
+          (uint64_t(a52) << 24) + (uint64_t(a51) << 32) +
+          (uint64_t(a50) << 40) + (uint64_t(a49) << 48) + (uint64_t(a48) << 56),
+      uint64_t(a63) + (uint64_t(a62) << 8) + (uint64_t(a61) << 16) +
+          (uint64_t(a60) << 24) + (uint64_t(a59) << 32) +
+          (uint64_t(a58) << 40) + (uint64_t(a57) << 48) +
+          (uint64_t(a56) << 56));
+}
+  #pragma GCC pop_options
 #endif // SIMDUTF_GCC8
 
 #endif // SIMDUTF_HASWELL_INTRINSICS_H
@@ -1964,91 +2358,206 @@ using namespace simdutf;
 
 class implementation final : public simdutf::implementation {
 public:
-  simdutf_really_inline implementation() : simdutf::implementation(
-      "icelake",
-      "Intel AVX512 (AVX-512BW, AVX-512CD, AVX-512VL, AVX-512VBMI2 extensions)",
-      internal::instruction_set::AVX2 | internal::instruction_set::BMI1 | internal::instruction_set::BMI2 | internal::instruction_set::AVX512BW | internal::instruction_set::AVX512CD | internal::instruction_set::AVX512VL | internal::instruction_set::AVX512VBMI2 | internal::instruction_set::AVX512VPOPCNTDQ ) {}
-  simdutf_warn_unused int detect_encodings(const char * input, size_t length) const noexcept final;
-  simdutf_warn_unused bool validate_utf8(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_ascii(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16le(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16be(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf32(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf8(const char * buf, size_t len, char* utf8_output) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf16le(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf16be(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(const char * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_latin1(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  void change_endianness_utf16(const char16_t * buf, size_t length, char16_t * output) const noexcept final;
-  simdutf_warn_unused size_t count_utf16le(const char16_t * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t count_utf16be(const char16_t * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t count_utf8(const char * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf16(size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf32(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_latin1(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_latin1(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_latin1(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused size_t base64_length_from_binary(size_t length, base64_options options) const noexcept;
-  size_t binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept;
+  simdutf_really_inline implementation()
+      : simdutf::implementation(
+            "icelake",
+            "Intel AVX512 (AVX-512BW, AVX-512CD, AVX-512VL, AVX-512VBMI2 "
+            "extensions)",
+            internal::instruction_set::AVX2 | internal::instruction_set::BMI1 |
+                internal::instruction_set::BMI2 |
+                internal::instruction_set::AVX512BW |
+                internal::instruction_set::AVX512CD |
+                internal::instruction_set::AVX512VL |
+                internal::instruction_set::AVX512VBMI2 |
+                internal::instruction_set::AVX512VPOPCNTDQ) {}
+  simdutf_warn_unused int detect_encodings(const char *input,
+                                           size_t length) const noexcept final;
+  simdutf_warn_unused bool validate_utf8(const char *buf,
+                                         size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_ascii(const char *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16le(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16le_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16be_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf32(const char32_t *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf32_with_errors(
+      const char32_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf8(
+      const char *buf, size_t len, char *utf8_output) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+      const char *buf, size_t len, char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                          char *latin1_output) const noexcept final;
+  simdutf_warn_unused result
+  convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                      char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_latin1(const char32_t *buf, size_t len,
+                                char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16le(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16be(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16le(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16be(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  void change_endianness_utf16(const char16_t *buf, size_t length,
+                               char16_t *output) const noexcept final;
+  simdutf_warn_unused size_t count_utf16le(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf16be(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf8(const char *buf,
+                                        size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16le(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16be(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16le(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16be(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf16(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf32(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_latin1(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char *input, size_t length) const noexcept;
+  simdutf_warn_unused result base64_to_binary(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused result
+  base64_to_binary(const char16_t *input, size_t length, char *output,
+                   base64_options options,
+                   last_chunk_handling_options last_chunk_options =
+                       last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused size_t base64_length_from_binary(
+      size_t length, base64_options options) const noexcept;
+  size_t binary_to_base64(const char *input, size_t length, char *output,
+                          base64_options options) const noexcept;
 };
 
 } // namespace icelake
@@ -2057,9 +2566,9 @@ class implementation final : public simdutf::implementation {
 #endif // SIMDUTF_ICELAKE_IMPLEMENTATION_H
 /* end file src/simdutf/icelake/implementation.h */
 
-//
-// The rest need to be inside the region
-//
+  //
+  // The rest need to be inside the region
+  //
 /* begin file src/simdutf/icelake/begin.h */
 // redefining SIMDUTF_IMPLEMENTATION to "icelake"
 // #define SIMDUTF_IMPLEMENTATION icelake
@@ -2070,11 +2579,14 @@ class implementation final : public simdutf::implementation {
 SIMDUTF_TARGET_ICELAKE
 #endif
 
-#if SIMDUTF_GCC11ORMORE // workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+#if SIMDUTF_GCC11ORMORE // workaround for
+                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+// clang-format off
 SIMDUTF_DISABLE_GCC_WARNING(-Wmaybe-uninitialized)
+// clang-format on
 #endif // end of workaround
 /* end file src/simdutf/icelake/begin.h */
-// Declarations
+  // Declarations
 /* begin file src/simdutf/icelake/bitmanipulation.h */
 #ifndef SIMDUTF_ICELAKE_BITMANIPULATION_H
 #define SIMDUTF_ICELAKE_BITMANIPULATION_H
@@ -2086,7 +2598,7 @@ namespace {
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
 simdutf_really_inline unsigned __int64 count_ones(uint64_t input_num) {
   // note: we do not support legacy 32-bit Windows
-  return __popcnt64(input_num);// Visual Studio wants two underscores
+  return __popcnt64(input_num); // Visual Studio wants two underscores
 }
 #else
 simdutf_really_inline long long int count_ones(uint64_t input_num) {
@@ -2096,11 +2608,11 @@ simdutf_really_inline long long int count_ones(uint64_t input_num) {
 
 #if SIMDUTF_NEED_TRAILING_ZEROES
 simdutf_really_inline int trailing_zeroes(uint64_t input_num) {
-#if SIMDUTF_REGULAR_VISUAL_STUDIO
+  #if SIMDUTF_REGULAR_VISUAL_STUDIO
   return (int)_tzcnt_u64(input_num);
-#else // SIMDUTF_REGULAR_VISUAL_STUDIO
+  #else  // SIMDUTF_REGULAR_VISUAL_STUDIO
   return __builtin_ctzll(input_num);
-#endif // SIMDUTF_REGULAR_VISUAL_STUDIO
+  #endif // SIMDUTF_REGULAR_VISUAL_STUDIO
 }
 #endif
 
@@ -2118,13 +2630,12 @@ SIMDUTF_UNTARGET_REGION
 #endif
 
 
-#if SIMDUTF_GCC11ORMORE // workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+#if SIMDUTF_GCC11ORMORE // workaround for
+                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
 SIMDUTF_POP_DISABLE_WARNINGS
 #endif // end of workaround
 /* end file src/simdutf/icelake/end.h */
 
-
-
 #endif // SIMDUTF_IMPLEMENTATION_ICELAKE
 #endif // SIMDUTF_ICELAKE_H
 /* end file src/simdutf/icelake.h */
@@ -2133,56 +2644,56 @@ SIMDUTF_POP_DISABLE_WARNINGS
 #define SIMDUTF_HASWELL_H
 
 #ifdef SIMDUTF_WESTMERE_H
-#error "haswell.h must be included before westmere.h"
+  #error "haswell.h must be included before westmere.h"
 #endif
 #ifdef SIMDUTF_FALLBACK_H
-#error "haswell.h must be included before fallback.h"
+  #error "haswell.h must be included before fallback.h"
 #endif
 
 
-// Default Haswell to on if this is x86-64. Even if we are not compiled for it, it could be selected
-// at runtime.
+// Default Haswell to on if this is x86-64. Even if we are not compiled for it,
+// it could be selected at runtime.
 #ifndef SIMDUTF_IMPLEMENTATION_HASWELL
-//
-// You do not want to restrict it like so: SIMDUTF_IS_X86_64 && __AVX2__
-// because we want to rely on *runtime dispatch*.
-//
-#if SIMDUTF_CAN_ALWAYS_RUN_ICELAKE
-#define SIMDUTF_IMPLEMENTATION_HASWELL 0
-#else
-#define SIMDUTF_IMPLEMENTATION_HASWELL (SIMDUTF_IS_X86_64)
-#endif
+  //
+  // You do not want to restrict it like so: SIMDUTF_IS_X86_64 && __AVX2__
+  // because we want to rely on *runtime dispatch*.
+  //
+  #if SIMDUTF_CAN_ALWAYS_RUN_ICELAKE
+    #define SIMDUTF_IMPLEMENTATION_HASWELL 0
+  #else
+    #define SIMDUTF_IMPLEMENTATION_HASWELL (SIMDUTF_IS_X86_64)
+  #endif
 
 #endif
 // To see why  (__BMI__) && (__LZCNT__) are not part of this next line, see
 // https://github.com/simdutf/simdutf/issues/1247
 #if ((SIMDUTF_IMPLEMENTATION_HASWELL) && (SIMDUTF_IS_X86_64) && (__AVX2__))
-#define SIMDUTF_CAN_ALWAYS_RUN_HASWELL 1
+  #define SIMDUTF_CAN_ALWAYS_RUN_HASWELL 1
 #else
-#define SIMDUTF_CAN_ALWAYS_RUN_HASWELL 0
+  #define SIMDUTF_CAN_ALWAYS_RUN_HASWELL 0
 #endif
 
 #if SIMDUTF_IMPLEMENTATION_HASWELL
 
-#define SIMDUTF_TARGET_HASWELL SIMDUTF_TARGET_REGION("avx2,bmi,lzcnt,popcnt")
+  #define SIMDUTF_TARGET_HASWELL SIMDUTF_TARGET_REGION("avx2,bmi,lzcnt,popcnt")
 
 namespace simdutf {
 /**
  * Implementation for Haswell (Intel AVX2).
  */
-namespace haswell {
-} // namespace haswell
+namespace haswell {} // namespace haswell
 } // namespace simdutf
 
-//
-// These two need to be included outside SIMDUTF_TARGET_REGION
-//
+  //
+  // These two need to be included outside SIMDUTF_TARGET_REGION
+  //
 /* begin file src/simdutf/haswell/implementation.h */
 #ifndef SIMDUTF_HASWELL_IMPLEMENTATION_H
 #define SIMDUTF_HASWELL_IMPLEMENTATION_H
 
 
-// The constructor may be executed on any host, so we take care not to use SIMDUTF_TARGET_REGION
+// The constructor may be executed on any host, so we take care not to use
+// SIMDUTF_TARGET_REGION
 namespace simdutf {
 namespace haswell {
 
@@ -2190,92 +2701,203 @@ using namespace simdutf;
 
 class implementation final : public simdutf::implementation {
 public:
-  simdutf_really_inline implementation() : simdutf::implementation(
-      "haswell",
-      "Intel/AMD AVX2",
-      internal::instruction_set::AVX2 | internal::instruction_set::BMI1 | internal::instruction_set::BMI2
-  ) {}
-  simdutf_warn_unused int detect_encodings(const char * input, size_t length) const noexcept final;
-  simdutf_warn_unused bool validate_utf8(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_ascii(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16le(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16be(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf32(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf8(const char * buf, size_t len, char* utf8_output) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf16le(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf16be(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(const char * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_latin1(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  void change_endianness_utf16(const char16_t * buf, size_t length, char16_t * output) const noexcept final;
-  simdutf_warn_unused size_t count_utf16le(const char16_t * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t count_utf16be(const char16_t * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t count_utf8(const char * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf16(size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf32(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_latin1(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_latin1(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_latin1(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused virtual size_t maximal_binary_length_from_base64(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused virtual result base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused virtual size_t maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused virtual result base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused virtual size_t base64_length_from_binary(size_t length, base64_options options) const noexcept;
-  size_t binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept;
+  simdutf_really_inline implementation()
+      : simdutf::implementation("haswell", "Intel/AMD AVX2",
+                                internal::instruction_set::AVX2 |
+                                    internal::instruction_set::BMI1 |
+                                    internal::instruction_set::BMI2) {}
+  simdutf_warn_unused int detect_encodings(const char *input,
+                                           size_t length) const noexcept final;
+  simdutf_warn_unused bool validate_utf8(const char *buf,
+                                         size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_ascii(const char *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16le(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16le_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16be_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf32(const char32_t *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf32_with_errors(
+      const char32_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf8(
+      const char *buf, size_t len, char *utf8_output) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+      const char *buf, size_t len, char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                          char *latin1_output) const noexcept final;
+  simdutf_warn_unused result
+  convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                      char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_latin1(const char32_t *buf, size_t len,
+                                char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16le(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16be(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16le(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16be(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  void change_endianness_utf16(const char16_t *buf, size_t length,
+                               char16_t *output) const noexcept final;
+  simdutf_warn_unused size_t count_utf16le(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf16be(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf8(const char *buf,
+                                        size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16le(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16be(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16le(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16be(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf16(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf32(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_latin1(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused virtual size_t
+  maximal_binary_length_from_base64(const char *input,
+                                    size_t length) const noexcept;
+  simdutf_warn_unused virtual result
+  base64_to_binary(const char *input, size_t length, char *output,
+                   base64_options options,
+                   last_chunk_handling_options last_chunk_options =
+                       last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused virtual size_t
+  maximal_binary_length_from_base64(const char16_t *input,
+                                    size_t length) const noexcept;
+  simdutf_warn_unused virtual result
+  base64_to_binary(const char16_t *input, size_t length, char *output,
+                   base64_options options,
+                   last_chunk_handling_options last_chunk_options =
+                       last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused virtual size_t
+  base64_length_from_binary(size_t length,
+                            base64_options options) const noexcept;
+  size_t binary_to_base64(const char *input, size_t length, char *output,
+                          base64_options options) const noexcept;
 };
 
 } // namespace haswell
@@ -2289,68 +2911,67 @@ class implementation final : public simdutf::implementation {
 
 
 #ifdef SIMDUTF_VISUAL_STUDIO
-// under clang within visual studio, this will include <x86intrin.h>
-#include <intrin.h>  // visual studio or clang
+  // under clang within visual studio, this will include <x86intrin.h>
+  #include <intrin.h> // visual studio or clang
 #else
 
-#if SIMDUTF_GCC11ORMORE
+  #if SIMDUTF_GCC11ORMORE
 // We should not get warnings while including <x86intrin.h> yet we do
 // under some versions of GCC.
 // If the x86intrin.h header has uninitialized values that are problematic,
 // it is a GCC issue, we want to ignore these warnings.
 SIMDUTF_DISABLE_GCC_WARNING(-Wuninitialized)
-#endif
-
-#include <x86intrin.h> // elsewhere
+  #endif
 
+  #include <x86intrin.h> // elsewhere
 
-#if SIMDUTF_GCC11ORMORE
+  #if SIMDUTF_GCC11ORMORE
 // cancels the suppression of the -Wuninitialized
 SIMDUTF_POP_DISABLE_WARNINGS
-#endif
+  #endif
 
 #endif // SIMDUTF_VISUAL_STUDIO
 
 #ifdef SIMDUTF_CLANG_VISUAL_STUDIO
-/**
- * You are not supposed, normally, to include these
- * headers directly. Instead you should either include intrin.h
- * or x86intrin.h. However, when compiling with clang
- * under Windows (i.e., when _MSC_VER is set), these headers
- * only get included *if* the corresponding features are detected
- * from macros:
- * e.g., if __AVX2__ is set... in turn,  we normally set these
- * macros by compiling against the corresponding architecture
- * (e.g., arch:AVX2, -mavx2, etc.) which compiles the whole
- * software with these advanced instructions. In simdutf, we
- * want to compile the whole program for a generic target,
- * and only target our specific kernels. As a workaround,
- * we directly include the needed headers. These headers would
- * normally guard against such usage, but we carefully included
- * <x86intrin.h>  (or <intrin.h>) before, so the headers
- * are fooled.
- */
-#include <bmiintrin.h>   // for _blsr_u64
-#include <lzcntintrin.h> // for  __lzcnt64
-#include <immintrin.h>   // for most things (AVX2, AVX512, _popcnt64)
-#include <smmintrin.h>
-#include <tmmintrin.h>
-#include <avxintrin.h>
-#include <avx2intrin.h>
-// unfortunately, we may not get _blsr_u64, but, thankfully, clang
-// has it as a macro.
-#ifndef _blsr_u64
-// we roll our own
-#define _blsr_u64(n) ((n - 1) & n)
-#endif //  _blsr_u64
-#endif // SIMDUTF_CLANG_VISUAL_STUDIO
+  /**
+   * You are not supposed, normally, to include these
+   * headers directly. Instead you should either include intrin.h
+   * or x86intrin.h. However, when compiling with clang
+   * under Windows (i.e., when _MSC_VER is set), these headers
+   * only get included *if* the corresponding features are detected
+   * from macros:
+   * e.g., if __AVX2__ is set... in turn,  we normally set these
+   * macros by compiling against the corresponding architecture
+   * (e.g., arch:AVX2, -mavx2, etc.) which compiles the whole
+   * software with these advanced instructions. In simdutf, we
+   * want to compile the whole program for a generic target,
+   * and only target our specific kernels. As a workaround,
+   * we directly include the needed headers. These headers would
+   * normally guard against such usage, but we carefully included
+   * <x86intrin.h>  (or <intrin.h>) before, so the headers
+   * are fooled.
+   */
+  #include <bmiintrin.h>   // for _blsr_u64
+  #include <lzcntintrin.h> // for  __lzcnt64
+  #include <immintrin.h>   // for most things (AVX2, AVX512, _popcnt64)
+  #include <smmintrin.h>
+  #include <tmmintrin.h>
+  #include <avxintrin.h>
+  #include <avx2intrin.h>
+  // unfortunately, we may not get _blsr_u64, but, thankfully, clang
+  // has it as a macro.
+  #ifndef _blsr_u64
+    // we roll our own
+    #define _blsr_u64(n) ((n - 1) & n)
+  #endif //  _blsr_u64
+#endif   // SIMDUTF_CLANG_VISUAL_STUDIO
 
 #endif // SIMDUTF_HASWELL_INTRINSICS_H
 /* end file src/simdutf/haswell/intrinsics.h */
 
-//
-// The rest need to be inside the region
-//
+  //
+  // The rest need to be inside the region
+  //
 /* begin file src/simdutf/haswell/begin.h */
 // redefining SIMDUTF_IMPLEMENTATION to "haswell"
 // #define SIMDUTF_IMPLEMENTATION haswell
@@ -2361,11 +2982,14 @@ SIMDUTF_POP_DISABLE_WARNINGS
 SIMDUTF_TARGET_HASWELL
 #endif
 
-#if SIMDUTF_GCC11ORMORE // workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+#if SIMDUTF_GCC11ORMORE // workaround for
+                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+// clang-format off
 SIMDUTF_DISABLE_GCC_WARNING(-Wmaybe-uninitialized)
+// clang-format on
 #endif // end of workaround
 /* end file src/simdutf/haswell/begin.h */
-// Declarations
+  // Declarations
 /* begin file src/simdutf/haswell/bitmanipulation.h */
 #ifndef SIMDUTF_HASWELL_BITMANIPULATION_H
 #define SIMDUTF_HASWELL_BITMANIPULATION_H
@@ -2377,7 +3001,7 @@ namespace {
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
 simdutf_really_inline unsigned __int64 count_ones(uint64_t input_num) {
   // note: we do not support legacy 32-bit Windows
-  return __popcnt64(input_num);// Visual Studio wants two underscores
+  return __popcnt64(input_num); // Visual Studio wants two underscores
 }
 #else
 simdutf_really_inline long long int count_ones(uint64_t input_num) {
@@ -2387,11 +3011,11 @@ simdutf_really_inline long long int count_ones(uint64_t input_num) {
 
 #if SIMDUTF_NEED_TRAILING_ZEROES
 simdutf_inline int trailing_zeroes(uint64_t input_num) {
-#if SIMDUTF_REGULAR_VISUAL_STUDIO
+  #if SIMDUTF_REGULAR_VISUAL_STUDIO
   return (int)_tzcnt_u64(input_num);
-#else // SIMDUTF_REGULAR_VISUAL_STUDIO
+  #else  // SIMDUTF_REGULAR_VISUAL_STUDIO
   return __builtin_ctzll(input_num);
-#endif // SIMDUTF_REGULAR_VISUAL_STUDIO
+  #endif // SIMDUTF_REGULAR_VISUAL_STUDIO
 }
 #endif
 
@@ -2405,547 +3029,745 @@ simdutf_inline int trailing_zeroes(uint64_t input_num) {
 #ifndef SIMDUTF_HASWELL_SIMD_H
 #define SIMDUTF_HASWELL_SIMD_H
 
-
 namespace simdutf {
 namespace haswell {
 namespace {
 namespace simd {
 
-  // Forward-declared so they can be used by splat and friends.
-  template<typename Child>
-  struct base {
-    __m256i value;
-
-    // Zero constructor
-    simdutf_really_inline base() : value{__m256i()} {}
-
-    // Conversion from SIMD register
-    simdutf_really_inline base(const __m256i _value) : value(_value) {}
-    // Conversion to SIMD register
-    simdutf_really_inline operator const __m256i&() const { return this->value; }
-    simdutf_really_inline operator __m256i&() { return this->value; }
-    template <endianness big_endian>
-    simdutf_really_inline void store_ascii_as_utf16(char16_t * ptr) const {
-      __m256i first = _mm256_cvtepu8_epi16(_mm256_castsi256_si128(*this));
-      __m256i second = _mm256_cvtepu8_epi16(_mm256_extractf128_si256(*this,1));
-      if (big_endian) {
-        const __m256i swap = _mm256_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14,
-                                  17, 16, 19, 18, 21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
-        first = _mm256_shuffle_epi8(first, swap);
-        second = _mm256_shuffle_epi8(second, swap);
-      }
-      _mm256_storeu_si256(reinterpret_cast<__m256i *>(ptr), first);
-      _mm256_storeu_si256(reinterpret_cast<__m256i *>(ptr + 16), second);
-    }
-    simdutf_really_inline void store_ascii_as_utf32(char32_t * ptr) const {
-      _mm256_storeu_si256(reinterpret_cast<__m256i *>(ptr), _mm256_cvtepu8_epi32(_mm256_castsi256_si128(*this)));
-      _mm256_storeu_si256(reinterpret_cast<__m256i *>(ptr+8), _mm256_cvtepu8_epi32(_mm256_castsi256_si128(_mm256_srli_si256(*this,8))));
-      _mm256_storeu_si256(reinterpret_cast<__m256i *>(ptr + 16), _mm256_cvtepu8_epi32(_mm256_extractf128_si256(*this,1)));
-      _mm256_storeu_si256(reinterpret_cast<__m256i *>(ptr + 24), _mm256_cvtepu8_epi32(_mm_srli_si128(_mm256_extractf128_si256(*this,1),8)));
-    }
-    // Bit operations
-    simdutf_really_inline Child operator|(const Child other) const { return _mm256_or_si256(*this, other); }
-    simdutf_really_inline Child operator&(const Child other) const { return _mm256_and_si256(*this, other); }
-    simdutf_really_inline Child operator^(const Child other) const { return _mm256_xor_si256(*this, other); }
-    simdutf_really_inline Child bit_andnot(const Child other) const { return _mm256_andnot_si256(other, *this); }
-    simdutf_really_inline Child& operator|=(const Child other) { auto this_cast = static_cast<Child*>(this); *this_cast = *this_cast | other; return *this_cast; }
-    simdutf_really_inline Child& operator&=(const Child other) { auto this_cast = static_cast<Child*>(this); *this_cast = *this_cast & other; return *this_cast; }
-    simdutf_really_inline Child& operator^=(const Child other) { auto this_cast = static_cast<Child*>(this); *this_cast = *this_cast ^ other; return *this_cast; }
-  };
+// Forward-declared so they can be used by splat and friends.
+template <typename Child> struct base {
+  __m256i value;
 
-  // Forward-declared so they can be used by splat and friends.
-  template<typename T>
-  struct simd8;
+  // Zero constructor
+  simdutf_really_inline base() : value{__m256i()} {}
 
-  template<typename T, typename Mask=simd8<bool>>
-  struct base8: base<simd8<T>> {
-    typedef uint32_t bitmask_t;
-    typedef uint64_t bitmask2_t;
+  // Conversion from SIMD register
+  simdutf_really_inline base(const __m256i _value) : value(_value) {}
+  // Conversion to SIMD register
+  simdutf_really_inline operator const __m256i &() const { return this->value; }
+  simdutf_really_inline operator __m256i &() { return this->value; }
+  template <endianness big_endian>
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *ptr) const {
+    __m256i first = _mm256_cvtepu8_epi16(_mm256_castsi256_si128(*this));
+    __m256i second = _mm256_cvtepu8_epi16(_mm256_extractf128_si256(*this, 1));
+    if (big_endian) {
+      const __m256i swap = _mm256_setr_epi8(
+          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+      first = _mm256_shuffle_epi8(first, swap);
+      second = _mm256_shuffle_epi8(second, swap);
+    }
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(ptr), first);
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(ptr + 16), second);
+  }
+  simdutf_really_inline void store_ascii_as_utf32(char32_t *ptr) const {
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(ptr),
+                        _mm256_cvtepu8_epi32(_mm256_castsi256_si128(*this)));
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(ptr + 8),
+                        _mm256_cvtepu8_epi32(_mm256_castsi256_si128(
+                            _mm256_srli_si256(*this, 8))));
+    _mm256_storeu_si256(
+        reinterpret_cast<__m256i *>(ptr + 16),
+        _mm256_cvtepu8_epi32(_mm256_extractf128_si256(*this, 1)));
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(ptr + 24),
+                        _mm256_cvtepu8_epi32(_mm_srli_si128(
+                            _mm256_extractf128_si256(*this, 1), 8)));
+  }
+  // Bit operations
+  simdutf_really_inline Child operator|(const Child other) const {
+    return _mm256_or_si256(*this, other);
+  }
+  simdutf_really_inline Child operator&(const Child other) const {
+    return _mm256_and_si256(*this, other);
+  }
+  simdutf_really_inline Child operator^(const Child other) const {
+    return _mm256_xor_si256(*this, other);
+  }
+  simdutf_really_inline Child bit_andnot(const Child other) const {
+    return _mm256_andnot_si256(other, *this);
+  }
+  simdutf_really_inline Child &operator|=(const Child other) {
+    auto this_cast = static_cast<Child *>(this);
+    *this_cast = *this_cast | other;
+    return *this_cast;
+  }
+  simdutf_really_inline Child &operator&=(const Child other) {
+    auto this_cast = static_cast<Child *>(this);
+    *this_cast = *this_cast & other;
+    return *this_cast;
+  }
+  simdutf_really_inline Child &operator^=(const Child other) {
+    auto this_cast = static_cast<Child *>(this);
+    *this_cast = *this_cast ^ other;
+    return *this_cast;
+  }
+};
 
-    simdutf_really_inline base8() : base<simd8<T>>() {}
-    simdutf_really_inline base8(const __m256i _value) : base<simd8<T>>(_value) {}
-    simdutf_really_inline T first() const { return _mm256_extract_epi8(*this,0); }
-    simdutf_really_inline T last() const { return _mm256_extract_epi8(*this,31); }
-    friend simdutf_really_inline Mask operator==(const simd8<T> lhs, const simd8<T> rhs) { return _mm256_cmpeq_epi8(lhs, rhs); }
-
-    static const int SIZE = sizeof(base<T>::value);
-
-    template<int N=1>
-    simdutf_really_inline simd8<T> prev(const simd8<T> prev_chunk) const {
-      return _mm256_alignr_epi8(*this, _mm256_permute2x128_si256(prev_chunk, *this, 0x21), 16 - N);
-    }
-  };
-
-  // SIMD byte mask type (returned by things like eq and gt)
-  template<>
-  struct simd8<bool>: base8<bool> {
-    static simdutf_really_inline simd8<bool> splat(bool _value) { return _mm256_set1_epi8(uint8_t(-(!!_value))); }
-
-    simdutf_really_inline simd8() : base8() {}
-    simdutf_really_inline simd8(const __m256i _value) : base8<bool>(_value) {}
-    // Splat constructor
-    simdutf_really_inline simd8(bool _value) : base8<bool>(splat(_value)) {}
-
-    simdutf_really_inline uint32_t to_bitmask() const { return uint32_t(_mm256_movemask_epi8(*this)); }
-    simdutf_really_inline bool any() const { return !_mm256_testz_si256(*this, *this); }
-    simdutf_really_inline bool none() const { return _mm256_testz_si256(*this, *this); }
-    simdutf_really_inline bool all() const { return static_cast<uint32_t>(_mm256_movemask_epi8(*this)) == 0xFFFFFFFF; }
-    simdutf_really_inline simd8<bool> operator~() const { return *this ^ true; }
-  };
-
-  template<typename T>
-  struct base8_numeric: base8<T> {
-    static simdutf_really_inline simd8<T> splat(T _value) { return _mm256_set1_epi8(_value); }
-    static simdutf_really_inline simd8<T> zero() { return _mm256_setzero_si256(); }
-    static simdutf_really_inline simd8<T> load(const T values[32]) {
-      return _mm256_loadu_si256(reinterpret_cast<const __m256i *>(values));
-    }
-    // Repeat 16 values as many times as necessary (usually for lookup tables)
-    static simdutf_really_inline simd8<T> repeat_16(
-      T v0,  T v1,  T v2,  T v3,  T v4,  T v5,  T v6,  T v7,
-      T v8,  T v9,  T v10, T v11, T v12, T v13, T v14, T v15
-    ) {
-      return simd8<T>(
-        v0, v1, v2, v3, v4, v5, v6, v7,
-        v8, v9, v10,v11,v12,v13,v14,v15,
-        v0, v1, v2, v3, v4, v5, v6, v7,
-        v8, v9, v10,v11,v12,v13,v14,v15
-      );
-    }
-
-    simdutf_really_inline base8_numeric() : base8<T>() {}
-    simdutf_really_inline base8_numeric(const __m256i _value) : base8<T>(_value) {}
-
-    // Store to array
-    simdutf_really_inline void store(T dst[32]) const { return _mm256_storeu_si256(reinterpret_cast<__m256i *>(dst), *this); }
-
-    // Addition/subtraction are the same for signed and unsigned
-    simdutf_really_inline simd8<T> operator+(const simd8<T> other) const { return _mm256_add_epi8(*this, other); }
-    simdutf_really_inline simd8<T> operator-(const simd8<T> other) const { return _mm256_sub_epi8(*this, other); }
-    simdutf_really_inline simd8<T>& operator+=(const simd8<T> other) { *this = *this + other; return *static_cast<simd8<T>*>(this); }
-    simdutf_really_inline simd8<T>& operator-=(const simd8<T> other) { *this = *this - other; return *static_cast<simd8<T>*>(this); }
-
-    // Override to distinguish from bool version
-    simdutf_really_inline simd8<T> operator~() const { return *this ^ 0xFFu; }
-
-    // Perform a lookup assuming the value is between 0 and 16 (undefined behavior for out of range values)
-    template<typename L>
-    simdutf_really_inline simd8<L> lookup_16(simd8<L> lookup_table) const {
-      return _mm256_shuffle_epi8(lookup_table, *this);
-    }
-
-    template<typename L>
-    simdutf_really_inline simd8<L> lookup_16(
-        L replace0,  L replace1,  L replace2,  L replace3,
-        L replace4,  L replace5,  L replace6,  L replace7,
-        L replace8,  L replace9,  L replace10, L replace11,
-        L replace12, L replace13, L replace14, L replace15) const {
-      return lookup_16(simd8<L>::repeat_16(
-        replace0,  replace1,  replace2,  replace3,
-        replace4,  replace5,  replace6,  replace7,
-        replace8,  replace9,  replace10, replace11,
-        replace12, replace13, replace14, replace15
-      ));
-    }
-  };
-
-
-  // Signed bytes
-  template<>
-  struct simd8<int8_t> : base8_numeric<int8_t> {
-    simdutf_really_inline simd8() : base8_numeric<int8_t>() {}
-    simdutf_really_inline simd8(const __m256i _value) : base8_numeric<int8_t>(_value) {}
-
-    // Splat constructor
-    simdutf_really_inline simd8(int8_t _value) : simd8(splat(_value)) {}
-    // Array constructor
-    simdutf_really_inline simd8(const int8_t values[32]) : simd8(load(values)) {}
-    simdutf_really_inline operator simd8<uint8_t>() const;
-    // Member-by-member initialization
-    simdutf_really_inline simd8(
-      int8_t v0,  int8_t v1,  int8_t v2,  int8_t v3,  int8_t v4,  int8_t v5,  int8_t v6,  int8_t v7,
-      int8_t v8,  int8_t v9,  int8_t v10, int8_t v11, int8_t v12, int8_t v13, int8_t v14, int8_t v15,
-      int8_t v16, int8_t v17, int8_t v18, int8_t v19, int8_t v20, int8_t v21, int8_t v22, int8_t v23,
-      int8_t v24, int8_t v25, int8_t v26, int8_t v27, int8_t v28, int8_t v29, int8_t v30, int8_t v31
-    ) : simd8(_mm256_setr_epi8(
-      v0, v1, v2, v3, v4, v5, v6, v7,
-      v8, v9, v10,v11,v12,v13,v14,v15,
-      v16,v17,v18,v19,v20,v21,v22,v23,
-      v24,v25,v26,v27,v28,v29,v30,v31
-    )) {}
-    // Repeat 16 values as many times as necessary (usually for lookup tables)
-    simdutf_really_inline static simd8<int8_t> repeat_16(
-      int8_t v0,  int8_t v1,  int8_t v2,  int8_t v3,  int8_t v4,  int8_t v5,  int8_t v6,  int8_t v7,
-      int8_t v8,  int8_t v9,  int8_t v10, int8_t v11, int8_t v12, int8_t v13, int8_t v14, int8_t v15
-    ) {
-      return simd8<int8_t>(
-        v0, v1, v2, v3, v4, v5, v6, v7,
-        v8, v9, v10,v11,v12,v13,v14,v15,
-        v0, v1, v2, v3, v4, v5, v6, v7,
-        v8, v9, v10,v11,v12,v13,v14,v15
-      );
-    }
-    simdutf_really_inline bool is_ascii() const { return _mm256_movemask_epi8(*this) == 0; }
-    // Order-sensitive comparisons
-    simdutf_really_inline simd8<int8_t> max_val(const simd8<int8_t> other) const { return _mm256_max_epi8(*this, other); }
-    simdutf_really_inline simd8<int8_t> min_val(const simd8<int8_t> other) const { return _mm256_min_epi8(*this, other); }
-    simdutf_really_inline simd8<bool> operator>(const simd8<int8_t> other) const { return _mm256_cmpgt_epi8(*this, other); }
-    simdutf_really_inline simd8<bool> operator<(const simd8<int8_t> other) const { return _mm256_cmpgt_epi8(other, *this); }
-  };
-
-  // Unsigned bytes
-  template<>
-  struct simd8<uint8_t>: base8_numeric<uint8_t> {
-    simdutf_really_inline simd8() : base8_numeric<uint8_t>() {}
-    simdutf_really_inline simd8(const __m256i _value) : base8_numeric<uint8_t>(_value) {}
-    // Splat constructor
-    simdutf_really_inline simd8(uint8_t _value) : simd8(splat(_value)) {}
-    // Array constructor
-    simdutf_really_inline simd8(const uint8_t values[32]) : simd8(load(values)) {}
-    // Member-by-member initialization
-    simdutf_really_inline simd8(
-      uint8_t v0,  uint8_t v1,  uint8_t v2,  uint8_t v3,  uint8_t v4,  uint8_t v5,  uint8_t v6,  uint8_t v7,
-      uint8_t v8,  uint8_t v9,  uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15,
-      uint8_t v16, uint8_t v17, uint8_t v18, uint8_t v19, uint8_t v20, uint8_t v21, uint8_t v22, uint8_t v23,
-      uint8_t v24, uint8_t v25, uint8_t v26, uint8_t v27, uint8_t v28, uint8_t v29, uint8_t v30, uint8_t v31
-    ) : simd8(_mm256_setr_epi8(
-      v0, v1, v2, v3, v4, v5, v6, v7,
-      v8, v9, v10,v11,v12,v13,v14,v15,
-      v16,v17,v18,v19,v20,v21,v22,v23,
-      v24,v25,v26,v27,v28,v29,v30,v31
-    )) {}
-    // Repeat 16 values as many times as necessary (usually for lookup tables)
-    simdutf_really_inline static simd8<uint8_t> repeat_16(
-      uint8_t v0,  uint8_t v1,  uint8_t v2,  uint8_t v3,  uint8_t v4,  uint8_t v5,  uint8_t v6,  uint8_t v7,
-      uint8_t v8,  uint8_t v9,  uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15
-    ) {
-      return simd8<uint8_t>(
-        v0, v1, v2, v3, v4, v5, v6, v7,
-        v8, v9, v10,v11,v12,v13,v14,v15,
-        v0, v1, v2, v3, v4, v5, v6, v7,
-        v8, v9, v10,v11,v12,v13,v14,v15
-      );
-    }
-
-
-    // Saturated math
-    simdutf_really_inline simd8<uint8_t> saturating_add(const simd8<uint8_t> other) const { return _mm256_adds_epu8(*this, other); }
-    simdutf_really_inline simd8<uint8_t> saturating_sub(const simd8<uint8_t> other) const { return _mm256_subs_epu8(*this, other); }
-
-    // Order-specific operations
-    simdutf_really_inline simd8<uint8_t> max_val(const simd8<uint8_t> other) const { return _mm256_max_epu8(*this, other); }
-    simdutf_really_inline simd8<uint8_t> min_val(const simd8<uint8_t> other) const { return _mm256_min_epu8(other, *this); }
-    // Same as >, but only guarantees true is nonzero (< guarantees true = -1)
-    simdutf_really_inline simd8<uint8_t> gt_bits(const simd8<uint8_t> other) const { return this->saturating_sub(other); }
-    // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
-    simdutf_really_inline simd8<uint8_t> lt_bits(const simd8<uint8_t> other) const { return other.saturating_sub(*this); }
-    simdutf_really_inline simd8<bool> operator<=(const simd8<uint8_t> other) const { return other.max_val(*this) == other; }
-    simdutf_really_inline simd8<bool> operator>=(const simd8<uint8_t> other) const { return other.min_val(*this) == other; }
-    simdutf_really_inline simd8<bool> operator>(const simd8<uint8_t> other) const { return this->gt_bits(other).any_bits_set(); }
-    simdutf_really_inline simd8<bool> operator<(const simd8<uint8_t> other) const { return this->lt_bits(other).any_bits_set(); }
-
-    // Bit-specific operations
-    simdutf_really_inline simd8<bool> bits_not_set() const { return *this == uint8_t(0); }
-    simdutf_really_inline simd8<bool> bits_not_set(simd8<uint8_t> bits) const { return (*this & bits).bits_not_set(); }
-    simdutf_really_inline simd8<bool> any_bits_set() const { return ~this->bits_not_set(); }
-    simdutf_really_inline simd8<bool> any_bits_set(simd8<uint8_t> bits) const { return ~this->bits_not_set(bits); }
-    simdutf_really_inline bool is_ascii() const { return _mm256_movemask_epi8(*this) == 0; }
-    simdutf_really_inline bool bits_not_set_anywhere() const { return _mm256_testz_si256(*this, *this); }
-    simdutf_really_inline bool any_bits_set_anywhere() const { return !bits_not_set_anywhere(); }
-    simdutf_really_inline bool bits_not_set_anywhere(simd8<uint8_t> bits) const { return _mm256_testz_si256(*this, bits); }
-    simdutf_really_inline bool any_bits_set_anywhere(simd8<uint8_t> bits) const { return !bits_not_set_anywhere(bits); }
-    template<int N>
-    simdutf_really_inline simd8<uint8_t> shr() const { return simd8<uint8_t>(_mm256_srli_epi16(*this, N)) & uint8_t(0xFFu >> N); }
-    template<int N>
-    simdutf_really_inline simd8<uint8_t> shl() const { return simd8<uint8_t>(_mm256_slli_epi16(*this, N)) & uint8_t(0xFFu << N); }
-    // Get one of the bits and make a bitmask out of it.
-    // e.g. value.get_bit<7>() gets the high bit
-    template<int N>
-    simdutf_really_inline int get_bit() const { return _mm256_movemask_epi8(_mm256_slli_epi16(*this, 7-N)); }
-  };
-  simdutf_really_inline simd8<int8_t>::operator simd8<uint8_t>() const { return this->value; }
-
-
-  template<typename T>
-  struct simd8x64 {
-    static constexpr int NUM_CHUNKS = 64 / sizeof(simd8<T>);
-    static_assert(NUM_CHUNKS == 2, "Haswell kernel should use two registers per 64-byte block.");
-    simd8<T> chunks[NUM_CHUNKS];
-
-    simd8x64(const simd8x64<T>& o) = delete; // no copy allowed
-    simd8x64<T>& operator=(const simd8<T> other) = delete; // no assignment allowed
-    simd8x64() = delete; // no default constructor allowed
-
-    simdutf_really_inline simd8x64(const simd8<T> chunk0, const simd8<T> chunk1) : chunks{chunk0, chunk1} {}
-    simdutf_really_inline simd8x64(const T* ptr) : chunks{simd8<T>::load(ptr), simd8<T>::load(ptr+sizeof(simd8<T>)/sizeof(T))} {}
-
-    simdutf_really_inline void store(T* ptr) const {
-      this->chunks[0].store(ptr+sizeof(simd8<T>)*0/sizeof(T));
-      this->chunks[1].store(ptr+sizeof(simd8<T>)*1/sizeof(T));
-    }
-
-    simdutf_really_inline uint64_t to_bitmask() const {
-      uint64_t r_lo = uint32_t(this->chunks[0].to_bitmask());
-      uint64_t r_hi =                       this->chunks[1].to_bitmask();
-      return r_lo | (r_hi << 32);
-    }
-
-    simdutf_really_inline simd8x64<T>& operator|=(const simd8x64<T> &other) {
-      this->chunks[0] |= other.chunks[0];
-      this->chunks[1] |= other.chunks[1];
-      return *this;
-    }
-
-    simdutf_really_inline simd8<T> reduce_or() const {
-      return this->chunks[0] | this->chunks[1];
-    }
-
-    simdutf_really_inline bool is_ascii() const {
-      return this->reduce_or().is_ascii();
-    }
-
-    template <endianness endian>
-    simdutf_really_inline void store_ascii_as_utf16(char16_t * ptr) const {
-      this->chunks[0].template store_ascii_as_utf16<endian>(ptr+sizeof(simd8<T>)*0);
-      this->chunks[1].template store_ascii_as_utf16<endian>(ptr+sizeof(simd8<T>)*1);
-    }
-
-    simdutf_really_inline void store_ascii_as_utf32(char32_t * ptr) const {
-      this->chunks[0].store_ascii_as_utf32(ptr+sizeof(simd8<T>)*0);
-      this->chunks[1].store_ascii_as_utf32(ptr+sizeof(simd8<T>)*1);
-    }
-
-    simdutf_really_inline simd8x64<T> bit_or(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return simd8x64<T>(
-        this->chunks[0] | mask,
-        this->chunks[1] | mask
-      );
-    }
-
-    simdutf_really_inline uint64_t eq(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] == mask,
-        this->chunks[1] == mask
-      ).to_bitmask();
-    }
-
-    simdutf_really_inline uint64_t eq(const simd8x64<uint8_t> &other) const {
-      return  simd8x64<bool>(
-        this->chunks[0] == other.chunks[0],
-        this->chunks[1] == other.chunks[1]
-      ).to_bitmask();
-    }
-
-    simdutf_really_inline uint64_t lteq(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] <= mask,
-        this->chunks[1] <= mask
-      ).to_bitmask();
-    }
-
-    simdutf_really_inline uint64_t in_range(const T low, const T high) const {
-      const simd8<T> mask_low = simd8<T>::splat(low);
-      const simd8<T> mask_high = simd8<T>::splat(high);
-
-      return  simd8x64<bool>(
-        (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
-        (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low)
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
-      const simd8<T> mask_low = simd8<T>::splat(low);
-      const simd8<T> mask_high = simd8<T>::splat(high);
-      return  simd8x64<bool>(
-        (this->chunks[0] > mask_high) | (this->chunks[0] < mask_low),
-        (this->chunks[1] > mask_high) | (this->chunks[1] < mask_low)
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t lt(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] < mask,
-        this->chunks[1] < mask
-      ).to_bitmask();
-    }
-
-    simdutf_really_inline uint64_t gt(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] > mask,
-        this->chunks[1] > mask
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t gteq(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] >= mask,
-        this->chunks[1] >= mask
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t gteq_unsigned(const uint8_t m) const {
-      const simd8<uint8_t> mask = simd8<uint8_t>::splat(m);
-      return  simd8x64<bool>(
-        (simd8<uint8_t>(__m256i(this->chunks[0])) >= mask),
-        (simd8<uint8_t>(__m256i(this->chunks[1])) >= mask)
-      ).to_bitmask();
-    }
-  }; // struct simd8x64<T>
-
-/* begin file src/simdutf/haswell/simd16-inl.h */
-#ifdef __GNUC__
-#if __GNUC__ < 8
-#define _mm256_set_m128i(xmm1, xmm2) _mm256_permute2f128_si256(_mm256_castsi128_si256(xmm1), _mm256_castsi128_si256(xmm2), 2)
-#define _mm256_setr_m128i(xmm2, xmm1)  _mm256_permute2f128_si256(_mm256_castsi128_si256(xmm1), _mm256_castsi128_si256(xmm2), 2)
-#endif
-#endif
-
-template<typename T>
-struct simd16;
-
-template<typename T, typename Mask=simd16<bool>>
-struct base16: base<simd16<T>> {
-  using bitmask_type = uint32_t;
+// Forward-declared so they can be used by splat and friends.
+template <typename T> struct simd8;
 
-  simdutf_really_inline base16() : base<simd16<T>>() {}
-  simdutf_really_inline base16(const __m256i _value) : base<simd16<T>>(_value) {}
-  template <typename Pointer>
-  simdutf_really_inline base16(const Pointer* ptr) : base16(_mm256_loadu_si256(reinterpret_cast<const __m256i*>(ptr))) {}
-  friend simdutf_really_inline Mask operator==(const simd16<T> lhs, const simd16<T> rhs) { return _mm256_cmpeq_epi16(lhs, rhs); }
+template <typename T, typename Mask = simd8<bool>>
+struct base8 : base<simd8<T>> {
+  typedef uint32_t bitmask_t;
+  typedef uint64_t bitmask2_t;
 
-  /// the size of vector in bytes
-  static const int SIZE = sizeof(base<simd16<T>>::value);
+  simdutf_really_inline base8() : base<simd8<T>>() {}
+  simdutf_really_inline base8(const __m256i _value) : base<simd8<T>>(_value) {}
+  simdutf_really_inline T first() const {
+    return _mm256_extract_epi8(*this, 0);
+  }
+  simdutf_really_inline T last() const {
+    return _mm256_extract_epi8(*this, 31);
+  }
+  friend simdutf_always_inline Mask operator==(const simd8<T> lhs,
+                                               const simd8<T> rhs) {
+    return _mm256_cmpeq_epi8(lhs, rhs);
+  }
 
-  /// the number of elements of type T a vector can hold
-  static const int ELEMENTS = SIZE / sizeof(T);
+  static const int SIZE = sizeof(base<T>::value);
 
-  template<int N=1>
-  simdutf_really_inline simd16<T> prev(const simd16<T> prev_chunk) const {
-    return _mm256_alignr_epi8(*this, prev_chunk, 16 - N);
+  template <int N = 1>
+  simdutf_really_inline simd8<T> prev(const simd8<T> prev_chunk) const {
+    return _mm256_alignr_epi8(
+        *this, _mm256_permute2x128_si256(prev_chunk, *this, 0x21), 16 - N);
   }
 };
 
 // SIMD byte mask type (returned by things like eq and gt)
-template<>
-struct simd16<bool>: base16<bool> {
-  static simdutf_really_inline simd16<bool> splat(bool _value) { return _mm256_set1_epi16(uint16_t(-(!!_value))); }
+template <> struct simd8<bool> : base8<bool> {
+  static simdutf_really_inline simd8<bool> splat(bool _value) {
+    return _mm256_set1_epi8(uint8_t(-(!!_value)));
+  }
 
-  simdutf_really_inline simd16() : base16() {}
-  simdutf_really_inline simd16(const __m256i _value) : base16<bool>(_value) {}
+  simdutf_really_inline simd8() : base8() {}
+  simdutf_really_inline simd8(const __m256i _value) : base8<bool>(_value) {}
   // Splat constructor
-  simdutf_really_inline simd16(bool _value) : base16<bool>(splat(_value)) {}
+  simdutf_really_inline simd8(bool _value) : base8<bool>(splat(_value)) {}
 
-  simdutf_really_inline bitmask_type to_bitmask() const { return _mm256_movemask_epi8(*this); }
-  simdutf_really_inline bool any() const { return !_mm256_testz_si256(*this, *this); }
-  simdutf_really_inline simd16<bool> operator~() const { return *this ^ true; }
+  simdutf_really_inline uint32_t to_bitmask() const {
+    return uint32_t(_mm256_movemask_epi8(*this));
+  }
+  simdutf_really_inline bool any() const {
+    return !_mm256_testz_si256(*this, *this);
+  }
+  simdutf_really_inline bool none() const {
+    return _mm256_testz_si256(*this, *this);
+  }
+  simdutf_really_inline bool all() const {
+    return static_cast<uint32_t>(_mm256_movemask_epi8(*this)) == 0xFFFFFFFF;
+  }
+  simdutf_really_inline simd8<bool> operator~() const { return *this ^ true; }
 };
 
-template<typename T>
-struct base16_numeric: base16<T> {
-  static simdutf_really_inline simd16<T> splat(T _value) { return _mm256_set1_epi16(_value); }
-  static simdutf_really_inline simd16<T> zero() { return _mm256_setzero_si256(); }
-  static simdutf_really_inline simd16<T> load(const T values[8]) {
+template <typename T> struct base8_numeric : base8<T> {
+  static simdutf_really_inline simd8<T> splat(T _value) {
+    return _mm256_set1_epi8(_value);
+  }
+  static simdutf_really_inline simd8<T> zero() {
+    return _mm256_setzero_si256();
+  }
+  static simdutf_really_inline simd8<T> load(const T values[32]) {
     return _mm256_loadu_si256(reinterpret_cast<const __m256i *>(values));
   }
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  static simdutf_really_inline simd8<T> repeat_16(T v0, T v1, T v2, T v3, T v4,
+                                                  T v5, T v6, T v7, T v8, T v9,
+                                                  T v10, T v11, T v12, T v13,
+                                                  T v14, T v15) {
+    return simd8<T>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13,
+                    v14, v15, v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11,
+                    v12, v13, v14, v15);
+  }
 
-  simdutf_really_inline base16_numeric() : base16<T>() {}
-  simdutf_really_inline base16_numeric(const __m256i _value) : base16<T>(_value) {}
+  simdutf_really_inline base8_numeric() : base8<T>() {}
+  simdutf_really_inline base8_numeric(const __m256i _value)
+      : base8<T>(_value) {}
 
   // Store to array
-  simdutf_really_inline void store(T dst[8]) const { return _mm256_storeu_si256(reinterpret_cast<__m256i *>(dst), *this); }
+  simdutf_really_inline void store(T dst[32]) const {
+    return _mm256_storeu_si256(reinterpret_cast<__m256i *>(dst), *this);
+  }
+
+  // Addition/subtraction are the same for signed and unsigned
+  simdutf_really_inline simd8<T> operator+(const simd8<T> other) const {
+    return _mm256_add_epi8(*this, other);
+  }
+  simdutf_really_inline simd8<T> operator-(const simd8<T> other) const {
+    return _mm256_sub_epi8(*this, other);
+  }
+  simdutf_really_inline simd8<T> &operator+=(const simd8<T> other) {
+    *this = *this + other;
+    return *static_cast<simd8<T> *>(this);
+  }
+  simdutf_really_inline simd8<T> &operator-=(const simd8<T> other) {
+    *this = *this - other;
+    return *static_cast<simd8<T> *>(this);
+  }
 
   // Override to distinguish from bool version
-  simdutf_really_inline simd16<T> operator~() const { return *this ^ 0xFFFFu; }
+  simdutf_really_inline simd8<T> operator~() const { return *this ^ 0xFFu; }
 
-  // Addition/subtraction are the same for signed and unsigned
-  simdutf_really_inline simd16<T> operator+(const simd16<T> other) const { return _mm256_add_epi16(*this, other); }
-  simdutf_really_inline simd16<T> operator-(const simd16<T> other) const { return _mm256_sub_epi16(*this, other); }
-  simdutf_really_inline simd16<T>& operator+=(const simd16<T> other) { *this = *this + other; return *static_cast<simd16<T>*>(this); }
-  simdutf_really_inline simd16<T>& operator-=(const simd16<T> other) { *this = *this - other; return *static_cast<simd16<T>*>(this); }
+  // Perform a lookup assuming the value is between 0 and 15 (undefined behavior
+  // for out of range values)
+  template <typename L>
+  simdutf_really_inline simd8<L> lookup_16(simd8<L> lookup_table) const {
+    return _mm256_shuffle_epi8(lookup_table, *this);
+  }
+
+  template <typename L>
+  simdutf_really_inline simd8<L>
+  lookup_16(L replace0, L replace1, L replace2, L replace3, L replace4,
+            L replace5, L replace6, L replace7, L replace8, L replace9,
+            L replace10, L replace11, L replace12, L replace13, L replace14,
+            L replace15) const {
+    return lookup_16(simd8<L>::repeat_16(
+        replace0, replace1, replace2, replace3, replace4, replace5, replace6,
+        replace7, replace8, replace9, replace10, replace11, replace12,
+        replace13, replace14, replace15));
+  }
 };
 
-// Signed code units
-template<>
-struct simd16<int16_t> : base16_numeric<int16_t> {
-  simdutf_really_inline simd16() : base16_numeric<int16_t>() {}
-  simdutf_really_inline simd16(const __m256i _value) : base16_numeric<int16_t>(_value) {}
+// Signed bytes
+template <> struct simd8<int8_t> : base8_numeric<int8_t> {
+  simdutf_really_inline simd8() : base8_numeric<int8_t>() {}
+  simdutf_really_inline simd8(const __m256i _value)
+      : base8_numeric<int8_t>(_value) {}
+
   // Splat constructor
-  simdutf_really_inline simd16(int16_t _value) : simd16(splat(_value)) {}
+  simdutf_really_inline simd8(int8_t _value) : simd8(splat(_value)) {}
   // Array constructor
-  simdutf_really_inline simd16(const int16_t* values) : simd16(load(values)) {}
-  simdutf_really_inline simd16(const char16_t* values) : simd16(load(reinterpret_cast<const int16_t*>(values))) {}
+  simdutf_really_inline simd8(const int8_t values[32]) : simd8(load(values)) {}
+  simdutf_really_inline operator simd8<uint8_t>() const;
+  // Member-by-member initialization
+  simdutf_really_inline
+  simd8(int8_t v0, int8_t v1, int8_t v2, int8_t v3, int8_t v4, int8_t v5,
+        int8_t v6, int8_t v7, int8_t v8, int8_t v9, int8_t v10, int8_t v11,
+        int8_t v12, int8_t v13, int8_t v14, int8_t v15, int8_t v16, int8_t v17,
+        int8_t v18, int8_t v19, int8_t v20, int8_t v21, int8_t v22, int8_t v23,
+        int8_t v24, int8_t v25, int8_t v26, int8_t v27, int8_t v28, int8_t v29,
+        int8_t v30, int8_t v31)
+      : simd8(_mm256_setr_epi8(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11,
+                               v12, v13, v14, v15, v16, v17, v18, v19, v20, v21,
+                               v22, v23, v24, v25, v26, v27, v28, v29, v30,
+                               v31)) {}
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  simdutf_really_inline static simd8<int8_t>
+  repeat_16(int8_t v0, int8_t v1, int8_t v2, int8_t v3, int8_t v4, int8_t v5,
+            int8_t v6, int8_t v7, int8_t v8, int8_t v9, int8_t v10, int8_t v11,
+            int8_t v12, int8_t v13, int8_t v14, int8_t v15) {
+    return simd8<int8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                         v13, v14, v15, v0, v1, v2, v3, v4, v5, v6, v7, v8, v9,
+                         v10, v11, v12, v13, v14, v15);
+  }
+  simdutf_really_inline bool is_ascii() const {
+    return _mm256_movemask_epi8(*this) == 0;
+  }
   // Order-sensitive comparisons
-  simdutf_really_inline simd16<int16_t> max_val(const simd16<int16_t> other) const { return _mm256_max_epi16(*this, other); }
-  simdutf_really_inline simd16<int16_t> min_val(const simd16<int16_t> other) const { return _mm256_min_epi16(*this, other); }
-  simdutf_really_inline simd16<bool> operator>(const simd16<int16_t> other) const { return _mm256_cmpgt_epi16(*this, other); }
-  simdutf_really_inline simd16<bool> operator<(const simd16<int16_t> other) const { return _mm256_cmpgt_epi16(other, *this); }
+  simdutf_really_inline simd8<int8_t> max_val(const simd8<int8_t> other) const {
+    return _mm256_max_epi8(*this, other);
+  }
+  simdutf_really_inline simd8<int8_t> min_val(const simd8<int8_t> other) const {
+    return _mm256_min_epi8(*this, other);
+  }
+  simdutf_really_inline simd8<bool> operator>(const simd8<int8_t> other) const {
+    return _mm256_cmpgt_epi8(*this, other);
+  }
+  simdutf_really_inline simd8<bool> operator<(const simd8<int8_t> other) const {
+    return _mm256_cmpgt_epi8(other, *this);
+  }
 };
 
-// Unsigned code units
-template<>
-struct simd16<uint16_t>: base16_numeric<uint16_t>  {
-  simdutf_really_inline simd16() : base16_numeric<uint16_t>() {}
-  simdutf_really_inline simd16(const __m256i _value) : base16_numeric<uint16_t>(_value) {}
-
+// Unsigned bytes
+template <> struct simd8<uint8_t> : base8_numeric<uint8_t> {
+  simdutf_really_inline simd8() : base8_numeric<uint8_t>() {}
+  simdutf_really_inline simd8(const __m256i _value)
+      : base8_numeric<uint8_t>(_value) {}
   // Splat constructor
-  simdutf_really_inline simd16(uint16_t _value) : simd16(splat(_value)) {}
+  simdutf_really_inline simd8(uint8_t _value) : simd8(splat(_value)) {}
   // Array constructor
-  simdutf_really_inline simd16(const uint16_t* values) : simd16(load(values)) {}
-  simdutf_really_inline simd16(const char16_t* values) : simd16(load(reinterpret_cast<const uint16_t*>(values))) {}
+  simdutf_really_inline simd8(const uint8_t values[32]) : simd8(load(values)) {}
+  // Member-by-member initialization
+  simdutf_really_inline
+  simd8(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4, uint8_t v5,
+        uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9, uint8_t v10,
+        uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15,
+        uint8_t v16, uint8_t v17, uint8_t v18, uint8_t v19, uint8_t v20,
+        uint8_t v21, uint8_t v22, uint8_t v23, uint8_t v24, uint8_t v25,
+        uint8_t v26, uint8_t v27, uint8_t v28, uint8_t v29, uint8_t v30,
+        uint8_t v31)
+      : simd8(_mm256_setr_epi8(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11,
+                               v12, v13, v14, v15, v16, v17, v18, v19, v20, v21,
+                               v22, v23, v24, v25, v26, v27, v28, v29, v30,
+                               v31)) {}
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  simdutf_really_inline static simd8<uint8_t>
+  repeat_16(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4,
+            uint8_t v5, uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9,
+            uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14,
+            uint8_t v15) {
+    return simd8<uint8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                          v13, v14, v15, v0, v1, v2, v3, v4, v5, v6, v7, v8, v9,
+                          v10, v11, v12, v13, v14, v15);
+  }
 
   // Saturated math
-  simdutf_really_inline simd16<uint16_t> saturating_add(const simd16<uint16_t> other) const { return _mm256_adds_epu16(*this, other); }
-  simdutf_really_inline simd16<uint16_t> saturating_sub(const simd16<uint16_t> other) const { return _mm256_subs_epu16(*this, other); }
+  simdutf_really_inline simd8<uint8_t>
+  saturating_add(const simd8<uint8_t> other) const {
+    return _mm256_adds_epu8(*this, other);
+  }
+  simdutf_really_inline simd8<uint8_t>
+  saturating_sub(const simd8<uint8_t> other) const {
+    return _mm256_subs_epu8(*this, other);
+  }
 
   // Order-specific operations
-  simdutf_really_inline simd16<uint16_t> max_val(const simd16<uint16_t> other) const { return _mm256_max_epu16(*this, other); }
-  simdutf_really_inline simd16<uint16_t> min_val(const simd16<uint16_t> other) const { return _mm256_min_epu16(*this, other); }
+  simdutf_really_inline simd8<uint8_t>
+  max_val(const simd8<uint8_t> other) const {
+    return _mm256_max_epu8(*this, other);
+  }
+  simdutf_really_inline simd8<uint8_t>
+  min_val(const simd8<uint8_t> other) const {
+    return _mm256_min_epu8(other, *this);
+  }
   // Same as >, but only guarantees true is nonzero (< guarantees true = -1)
-  simdutf_really_inline simd16<uint16_t> gt_bits(const simd16<uint16_t> other) const { return this->saturating_sub(other); }
+  simdutf_really_inline simd8<uint8_t>
+  gt_bits(const simd8<uint8_t> other) const {
+    return this->saturating_sub(other);
+  }
   // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
-  simdutf_really_inline simd16<uint16_t> lt_bits(const simd16<uint16_t> other) const { return other.saturating_sub(*this); }
-  simdutf_really_inline simd16<bool> operator<=(const simd16<uint16_t> other) const { return other.max_val(*this) == other; }
-  simdutf_really_inline simd16<bool> operator>=(const simd16<uint16_t> other) const { return other.min_val(*this) == other; }
-  simdutf_really_inline simd16<bool> operator>(const simd16<uint16_t> other) const { return this->gt_bits(other).any_bits_set(); }
-  simdutf_really_inline simd16<bool> operator<(const simd16<uint16_t> other) const { return this->gt_bits(other).any_bits_set(); }
+  simdutf_really_inline simd8<uint8_t>
+  lt_bits(const simd8<uint8_t> other) const {
+    return other.saturating_sub(*this);
+  }
+  simdutf_really_inline simd8<bool>
+  operator<=(const simd8<uint8_t> other) const {
+    return other.max_val(*this) == other;
+  }
+  simdutf_really_inline simd8<bool>
+  operator>=(const simd8<uint8_t> other) const {
+    return other.min_val(*this) == other;
+  }
+  simdutf_really_inline simd8<bool>
+  operator>(const simd8<uint8_t> other) const {
+    return this->gt_bits(other).any_bits_set();
+  }
+  simdutf_really_inline simd8<bool>
+  operator<(const simd8<uint8_t> other) const {
+    return this->lt_bits(other).any_bits_set();
+  }
 
   // Bit-specific operations
-  simdutf_really_inline simd16<bool> bits_not_set() const { return *this == uint16_t(0); }
-  simdutf_really_inline simd16<bool> bits_not_set(simd16<uint16_t> bits) const { return (*this & bits).bits_not_set(); }
-  simdutf_really_inline simd16<bool> any_bits_set() const { return ~this->bits_not_set(); }
-  simdutf_really_inline simd16<bool> any_bits_set(simd16<uint16_t> bits) const { return ~this->bits_not_set(bits); }
-
-  simdutf_really_inline bool bits_not_set_anywhere() const { return _mm256_testz_si256(*this, *this); }
-  simdutf_really_inline bool any_bits_set_anywhere() const { return !bits_not_set_anywhere(); }
-  simdutf_really_inline bool bits_not_set_anywhere(simd16<uint16_t> bits) const { return _mm256_testz_si256(*this, bits); }
-  simdutf_really_inline bool any_bits_set_anywhere(simd16<uint16_t> bits) const { return !bits_not_set_anywhere(bits); }
-  template<int N>
-  simdutf_really_inline simd16<uint16_t> shr() const { return simd16<uint16_t>(_mm256_srli_epi16(*this, N)); }
-  template<int N>
-  simdutf_really_inline simd16<uint16_t> shl() const { return simd16<uint16_t>(_mm256_slli_epi16(*this, N)); }
+  simdutf_really_inline simd8<bool> bits_not_set() const {
+    return *this == uint8_t(0);
+  }
+  simdutf_really_inline simd8<bool> bits_not_set(simd8<uint8_t> bits) const {
+    return (*this & bits).bits_not_set();
+  }
+  simdutf_really_inline simd8<bool> any_bits_set() const {
+    return ~this->bits_not_set();
+  }
+  simdutf_really_inline simd8<bool> any_bits_set(simd8<uint8_t> bits) const {
+    return ~this->bits_not_set(bits);
+  }
+  simdutf_really_inline bool is_ascii() const {
+    return _mm256_movemask_epi8(*this) == 0;
+  }
+  simdutf_really_inline bool bits_not_set_anywhere() const {
+    return _mm256_testz_si256(*this, *this);
+  }
+  simdutf_really_inline bool any_bits_set_anywhere() const {
+    return !bits_not_set_anywhere();
+  }
+  simdutf_really_inline bool bits_not_set_anywhere(simd8<uint8_t> bits) const {
+    return _mm256_testz_si256(*this, bits);
+  }
+  simdutf_really_inline bool any_bits_set_anywhere(simd8<uint8_t> bits) const {
+    return !bits_not_set_anywhere(bits);
+  }
+  template <int N> simdutf_really_inline simd8<uint8_t> shr() const {
+    return simd8<uint8_t>(_mm256_srli_epi16(*this, N)) & uint8_t(0xFFu >> N);
+  }
+  template <int N> simdutf_really_inline simd8<uint8_t> shl() const {
+    return simd8<uint8_t>(_mm256_slli_epi16(*this, N)) & uint8_t(0xFFu << N);
+  }
   // Get one of the bits and make a bitmask out of it.
   // e.g. value.get_bit<7>() gets the high bit
-  template<int N>
-  simdutf_really_inline int get_bit() const { return _mm256_movemask_epi8(_mm256_slli_epi16(*this, 15-N)); }
-
-  // Change the endianness
-  simdutf_really_inline simd16<uint16_t> swap_bytes() const {
-    const __m256i swap = _mm256_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14,
-                                  17, 16, 19, 18, 21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
-    return _mm256_shuffle_epi8(*this, swap);
+  template <int N> simdutf_really_inline int get_bit() const {
+    return _mm256_movemask_epi8(_mm256_slli_epi16(*this, 7 - N));
   }
+};
+simdutf_really_inline simd8<int8_t>::operator simd8<uint8_t>() const {
+  return this->value;
+}
 
-  // Pack with the unsigned saturation of two uint16_t code units into single uint8_t vector
-  static simdutf_really_inline simd8<uint8_t> pack(const simd16<uint16_t>& v0, const simd16<uint16_t>& v1) {
-    // Note: the AVX2 variant of pack operates on 128-bit lanes, thus
-    //       we have to shuffle lanes in order to produce bytes in the
-    //       correct order.
-
-    // get the 0th lanes
-    const __m128i lo_0 = _mm256_extracti128_si256(v0, 0);
-    const __m128i lo_1 = _mm256_extracti128_si256(v1, 0);
+template <typename T> struct simd8x64 {
+  static constexpr int NUM_CHUNKS = 64 / sizeof(simd8<T>);
+  static_assert(NUM_CHUNKS == 2,
+                "Haswell kernel should use two registers per 64-byte block.");
+  simd8<T> chunks[NUM_CHUNKS];
+
+  simd8x64(const simd8x64<T> &o) = delete; // no copy allowed
+  simd8x64<T> &
+  operator=(const simd8<T> other) = delete; // no assignment allowed
+  simd8x64() = delete;                      // no default constructor allowed
+
+  simdutf_really_inline simd8x64(const simd8<T> chunk0, const simd8<T> chunk1)
+      : chunks{chunk0, chunk1} {}
+  simdutf_really_inline simd8x64(const T *ptr)
+      : chunks{simd8<T>::load(ptr),
+               simd8<T>::load(ptr + sizeof(simd8<T>) / sizeof(T))} {}
+
+  simdutf_really_inline void store(T *ptr) const {
+    this->chunks[0].store(ptr + sizeof(simd8<T>) * 0 / sizeof(T));
+    this->chunks[1].store(ptr + sizeof(simd8<T>) * 1 / sizeof(T));
+  }
+
+  simdutf_really_inline uint64_t to_bitmask() const {
+    uint64_t r_lo = uint32_t(this->chunks[0].to_bitmask());
+    uint64_t r_hi = this->chunks[1].to_bitmask();
+    return r_lo | (r_hi << 32);
+  }
+
+  simdutf_really_inline simd8x64<T> &operator|=(const simd8x64<T> &other) {
+    this->chunks[0] |= other.chunks[0];
+    this->chunks[1] |= other.chunks[1];
+    return *this;
+  }
+
+  simdutf_really_inline simd8<T> reduce_or() const {
+    return this->chunks[0] | this->chunks[1];
+  }
+
+  simdutf_really_inline bool is_ascii() const {
+    return this->reduce_or().is_ascii();
+  }
+
+  template <endianness endian>
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *ptr) const {
+    this->chunks[0].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 0);
+    this->chunks[1].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 1);
+  }
+
+  simdutf_really_inline void store_ascii_as_utf32(char32_t *ptr) const {
+    this->chunks[0].store_ascii_as_utf32(ptr + sizeof(simd8<T>) * 0);
+    this->chunks[1].store_ascii_as_utf32(ptr + sizeof(simd8<T>) * 1);
+  }
+
+  simdutf_really_inline simd8x64<T> bit_or(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<T>(this->chunks[0] | mask, this->chunks[1] | mask);
+  }
+
+  simdutf_really_inline uint64_t eq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] == mask, this->chunks[1] == mask)
+        .to_bitmask();
+  }
+
+  simdutf_really_inline uint64_t eq(const simd8x64<uint8_t> &other) const {
+    return simd8x64<bool>(this->chunks[0] == other.chunks[0],
+                          this->chunks[1] == other.chunks[1])
+        .to_bitmask();
+  }
+
+  simdutf_really_inline uint64_t lteq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] <= mask, this->chunks[1] <= mask)
+        .to_bitmask();
+  }
+
+  simdutf_really_inline uint64_t in_range(const T low, const T high) const {
+    const simd8<T> mask_low = simd8<T>::splat(low);
+    const simd8<T> mask_high = simd8<T>::splat(high);
+
+    return simd8x64<bool>(
+               (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
+               (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
+    const simd8<T> mask_low = simd8<T>::splat(low);
+    const simd8<T> mask_high = simd8<T>::splat(high);
+    return simd8x64<bool>(
+               (this->chunks[0] > mask_high) | (this->chunks[0] < mask_low),
+               (this->chunks[1] > mask_high) | (this->chunks[1] < mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t lt(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] < mask, this->chunks[1] < mask)
+        .to_bitmask();
+  }
+
+  simdutf_really_inline uint64_t gt(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] > mask, this->chunks[1] > mask)
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t gteq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] >= mask, this->chunks[1] >= mask)
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t gteq_unsigned(const uint8_t m) const {
+    const simd8<uint8_t> mask = simd8<uint8_t>::splat(m);
+    return simd8x64<bool>((simd8<uint8_t>(__m256i(this->chunks[0])) >= mask),
+                          (simd8<uint8_t>(__m256i(this->chunks[1])) >= mask))
+        .to_bitmask();
+  }
+}; // struct simd8x64<T>
+
+/* begin file src/simdutf/haswell/simd16-inl.h */
+#ifdef __GNUC__
+  #if __GNUC__ < 8
+    #define _mm256_set_m128i(xmm1, xmm2)                                       \
+      _mm256_permute2f128_si256(_mm256_castsi128_si256(xmm1),                  \
+                                _mm256_castsi128_si256(xmm2), 2)
+    #define _mm256_setr_m128i(xmm2, xmm1)                                      \
+      _mm256_permute2f128_si256(_mm256_castsi128_si256(xmm1),                  \
+                                _mm256_castsi128_si256(xmm2), 2)
+  #endif
+#endif
+
+template <typename T> struct simd16;
+
+template <typename T, typename Mask = simd16<bool>>
+struct base16 : base<simd16<T>> {
+  using bitmask_type = uint32_t;
+
+  simdutf_really_inline base16() : base<simd16<T>>() {}
+  simdutf_really_inline base16(const __m256i _value)
+      : base<simd16<T>>(_value) {}
+  template <typename Pointer>
+  simdutf_really_inline base16(const Pointer *ptr)
+      : base16(_mm256_loadu_si256(reinterpret_cast<const __m256i *>(ptr))) {}
+  friend simdutf_always_inline Mask operator==(const simd16<T> lhs,
+                                               const simd16<T> rhs) {
+    return _mm256_cmpeq_epi16(lhs, rhs);
+  }
+
+  /// the size of vector in bytes
+  static const int SIZE = sizeof(base<simd16<T>>::value);
+
+  /// the number of elements of type T a vector can hold
+  static const int ELEMENTS = SIZE / sizeof(T);
+
+  template <int N = 1>
+  simdutf_really_inline simd16<T> prev(const simd16<T> prev_chunk) const {
+    return _mm256_alignr_epi8(*this, prev_chunk, 16 - N);
+  }
+};
+
+// SIMD byte mask type (returned by things like eq and gt)
+template <> struct simd16<bool> : base16<bool> {
+  static simdutf_really_inline simd16<bool> splat(bool _value) {
+    return _mm256_set1_epi16(uint16_t(-(!!_value)));
+  }
+
+  simdutf_really_inline simd16() : base16() {}
+  simdutf_really_inline simd16(const __m256i _value) : base16<bool>(_value) {}
+  // Splat constructor
+  simdutf_really_inline simd16(bool _value) : base16<bool>(splat(_value)) {}
+
+  simdutf_really_inline bitmask_type to_bitmask() const {
+    return _mm256_movemask_epi8(*this);
+  }
+  simdutf_really_inline bool any() const {
+    return !_mm256_testz_si256(*this, *this);
+  }
+  simdutf_really_inline simd16<bool> operator~() const { return *this ^ true; }
+};
+
+template <typename T> struct base16_numeric : base16<T> {
+  static simdutf_really_inline simd16<T> splat(T _value) {
+    return _mm256_set1_epi16(_value);
+  }
+  static simdutf_really_inline simd16<T> zero() {
+    return _mm256_setzero_si256();
+  }
+  static simdutf_really_inline simd16<T> load(const T values[16]) {
+    return _mm256_loadu_si256(reinterpret_cast<const __m256i *>(values));
+  }
+
+  simdutf_really_inline base16_numeric() : base16<T>() {}
+  simdutf_really_inline base16_numeric(const __m256i _value)
+      : base16<T>(_value) {}
+
+  // Store to array
+  simdutf_really_inline void store(T dst[16]) const {
+    return _mm256_storeu_si256(reinterpret_cast<__m256i *>(dst), *this);
+  }
+
+  // Override to distinguish from bool version
+  simdutf_really_inline simd16<T> operator~() const { return *this ^ 0xFFFFu; }
+
+  // Addition/subtraction are the same for signed and unsigned
+  simdutf_really_inline simd16<T> operator+(const simd16<T> other) const {
+    return _mm256_add_epi16(*this, other);
+  }
+  simdutf_really_inline simd16<T> operator-(const simd16<T> other) const {
+    return _mm256_sub_epi16(*this, other);
+  }
+  simdutf_really_inline simd16<T> &operator+=(const simd16<T> other) {
+    *this = *this + other;
+    return *static_cast<simd16<T> *>(this);
+  }
+  simdutf_really_inline simd16<T> &operator-=(const simd16<T> other) {
+    *this = *this - other;
+    return *static_cast<simd16<T> *>(this);
+  }
+};
+
+// Signed code units
+template <> struct simd16<int16_t> : base16_numeric<int16_t> {
+  simdutf_really_inline simd16() : base16_numeric<int16_t>() {}
+  simdutf_really_inline simd16(const __m256i _value)
+      : base16_numeric<int16_t>(_value) {}
+  // Splat constructor
+  simdutf_really_inline simd16(int16_t _value) : simd16(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd16(const int16_t *values) : simd16(load(values)) {}
+  simdutf_really_inline simd16(const char16_t *values)
+      : simd16(load(reinterpret_cast<const int16_t *>(values))) {}
+  // Order-sensitive comparisons
+  simdutf_really_inline simd16<int16_t>
+  max_val(const simd16<int16_t> other) const {
+    return _mm256_max_epi16(*this, other);
+  }
+  simdutf_really_inline simd16<int16_t>
+  min_val(const simd16<int16_t> other) const {
+    return _mm256_min_epi16(*this, other);
+  }
+  simdutf_really_inline simd16<bool>
+  operator>(const simd16<int16_t> other) const {
+    return _mm256_cmpgt_epi16(*this, other);
+  }
+  simdutf_really_inline simd16<bool>
+  operator<(const simd16<int16_t> other) const {
+    return _mm256_cmpgt_epi16(other, *this);
+  }
+};
+
+// Unsigned code units
+template <> struct simd16<uint16_t> : base16_numeric<uint16_t> {
+  simdutf_really_inline simd16() : base16_numeric<uint16_t>() {}
+  simdutf_really_inline simd16(const __m256i _value)
+      : base16_numeric<uint16_t>(_value) {}
+
+  // Splat constructor
+  simdutf_really_inline simd16(uint16_t _value) : simd16(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd16(const uint16_t *values) : simd16(load(values)) {}
+  simdutf_really_inline simd16(const char16_t *values)
+      : simd16(load(reinterpret_cast<const uint16_t *>(values))) {}
+
+  // Saturated math
+  simdutf_really_inline simd16<uint16_t>
+  saturating_add(const simd16<uint16_t> other) const {
+    return _mm256_adds_epu16(*this, other);
+  }
+  simdutf_really_inline simd16<uint16_t>
+  saturating_sub(const simd16<uint16_t> other) const {
+    return _mm256_subs_epu16(*this, other);
+  }
+
+  // Order-specific operations
+  simdutf_really_inline simd16<uint16_t>
+  max_val(const simd16<uint16_t> other) const {
+    return _mm256_max_epu16(*this, other);
+  }
+  simdutf_really_inline simd16<uint16_t>
+  min_val(const simd16<uint16_t> other) const {
+    return _mm256_min_epu16(*this, other);
+  }
+  // Same as >, but only guarantees true is nonzero (> guarantees true = -1)
+  simdutf_really_inline simd16<uint16_t>
+  gt_bits(const simd16<uint16_t> other) const {
+    return this->saturating_sub(other);
+  }
+  // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
+  simdutf_really_inline simd16<uint16_t>
+  lt_bits(const simd16<uint16_t> other) const {
+    return other.saturating_sub(*this);
+  }
+  simdutf_really_inline simd16<bool>
+  operator<=(const simd16<uint16_t> other) const {
+    return other.max_val(*this) == other;
+  }
+  simdutf_really_inline simd16<bool>
+  operator>=(const simd16<uint16_t> other) const {
+    return other.min_val(*this) == other;
+  }
+  simdutf_really_inline simd16<bool>
+  operator>(const simd16<uint16_t> other) const {
+    return this->gt_bits(other).any_bits_set();
+  }
+  simdutf_really_inline simd16<bool>
+  operator<(const simd16<uint16_t> other) const {
+    return this->lt_bits(other).any_bits_set();
+  }
+
+  // Bit-specific operations
+  simdutf_really_inline simd16<bool> bits_not_set() const {
+    return *this == uint16_t(0);
+  }
+  simdutf_really_inline simd16<bool> bits_not_set(simd16<uint16_t> bits) const {
+    return (*this & bits).bits_not_set();
+  }
+  simdutf_really_inline simd16<bool> any_bits_set() const {
+    return ~this->bits_not_set();
+  }
+  simdutf_really_inline simd16<bool> any_bits_set(simd16<uint16_t> bits) const {
+    return ~this->bits_not_set(bits);
+  }
+
+  simdutf_really_inline bool bits_not_set_anywhere() const {
+    return _mm256_testz_si256(*this, *this);
+  }
+  simdutf_really_inline bool any_bits_set_anywhere() const {
+    return !bits_not_set_anywhere();
+  }
+  simdutf_really_inline bool
+  bits_not_set_anywhere(simd16<uint16_t> bits) const {
+    return _mm256_testz_si256(*this, bits);
+  }
+  simdutf_really_inline bool
+  any_bits_set_anywhere(simd16<uint16_t> bits) const {
+    return !bits_not_set_anywhere(bits);
+  }
+  template <int N> simdutf_really_inline simd16<uint16_t> shr() const {
+    return simd16<uint16_t>(_mm256_srli_epi16(*this, N));
+  }
+  template <int N> simdutf_really_inline simd16<uint16_t> shl() const {
+    return simd16<uint16_t>(_mm256_slli_epi16(*this, N));
+  }
+  // Get one of the bits and make a bitmask out of it.
+  // e.g. value.get_bit<15>() gets the high bit of each 16-bit lane
+  template <int N> simdutf_really_inline int get_bit() const {
+    return _mm256_movemask_epi8(_mm256_slli_epi16(*this, 15 - N));
+  }
+
+  // Change the endianness
+  simdutf_really_inline simd16<uint16_t> swap_bytes() const {
+    const __m256i swap = _mm256_setr_epi8(
+        1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+        21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+    return _mm256_shuffle_epi8(*this, swap);
+  }
+
+  // Pack two vectors of uint16_t code units into a single uint8_t vector,
+  // using unsigned saturation
+  static simdutf_really_inline simd8<uint8_t> pack(const simd16<uint16_t> &v0,
+                                                   const simd16<uint16_t> &v1) {
+    // Note: the AVX2 variant of pack operates on 128-bit lanes, thus
+    //       we have to shuffle lanes in order to produce bytes in the
+    //       correct order.
+
+    // get the 0th lanes
+    const __m128i lo_0 = _mm256_extracti128_si256(v0, 0);
+    const __m128i lo_1 = _mm256_extracti128_si256(v1, 0);
 
     // get the 1st lanes
     const __m128i hi_0 = _mm256_extracti128_si256(v0, 1);
@@ -2960,105 +3782,99 @@ struct simd16<uint16_t>: base16_numeric<uint16_t>  {
   }
 };
 
+template <typename T> struct simd16x32 {
+  static constexpr int NUM_CHUNKS = 64 / sizeof(simd16<T>);
+  static_assert(NUM_CHUNKS == 2,
+                "Haswell kernel should use two registers per 64-byte block.");
+  simd16<T> chunks[NUM_CHUNKS];
 
-  template<typename T>
-  struct simd16x32 {
-    static constexpr int NUM_CHUNKS = 64 / sizeof(simd16<T>);
-    static_assert(NUM_CHUNKS == 2, "Haswell kernel should use two registers per 64-byte block.");
-    simd16<T> chunks[NUM_CHUNKS];
+  simd16x32(const simd16x32<T> &o) = delete; // no copy allowed
+  simd16x32<T> &
+  operator=(const simd16<T> other) = delete; // no assignment allowed
+  simd16x32() = delete;                      // no default constructor allowed
 
-    simd16x32(const simd16x32<T>& o) = delete; // no copy allowed
-    simd16x32<T>& operator=(const simd16<T> other) = delete; // no assignment allowed
-    simd16x32() = delete; // no default constructor allowed
+  simdutf_really_inline simd16x32(const simd16<T> chunk0,
+                                  const simd16<T> chunk1)
+      : chunks{chunk0, chunk1} {}
+  simdutf_really_inline simd16x32(const T *ptr)
+      : chunks{simd16<T>::load(ptr),
+               simd16<T>::load(ptr + sizeof(simd16<T>) / sizeof(T))} {}
 
-    simdutf_really_inline simd16x32(const simd16<T> chunk0, const simd16<T> chunk1) : chunks{chunk0, chunk1} {}
-    simdutf_really_inline simd16x32(const T* ptr) : chunks{simd16<T>::load(ptr), simd16<T>::load(ptr+sizeof(simd16<T>)/sizeof(T))} {}
-
-    simdutf_really_inline void store(T* ptr) const {
-      this->chunks[0].store(ptr+sizeof(simd16<T>)*0/sizeof(T));
-      this->chunks[1].store(ptr+sizeof(simd16<T>)*1/sizeof(T));
-    }
+  simdutf_really_inline void store(T *ptr) const {
+    this->chunks[0].store(ptr + sizeof(simd16<T>) * 0 / sizeof(T));
+    this->chunks[1].store(ptr + sizeof(simd16<T>) * 1 / sizeof(T));
+  }
 
-    simdutf_really_inline uint64_t to_bitmask() const {
-      uint64_t r_lo = uint32_t(this->chunks[0].to_bitmask());
-      uint64_t r_hi =                       this->chunks[1].to_bitmask();
-      return r_lo | (r_hi << 32);
-    }
+  simdutf_really_inline uint64_t to_bitmask() const {
+    uint64_t r_lo = uint32_t(this->chunks[0].to_bitmask());
+    uint64_t r_hi = this->chunks[1].to_bitmask();
+    return r_lo | (r_hi << 32);
+  }
 
-    simdutf_really_inline simd16<T> reduce_or() const {
-      return this->chunks[0] | this->chunks[1];
-    }
+  simdutf_really_inline simd16<T> reduce_or() const {
+    return this->chunks[0] | this->chunks[1];
+  }
 
-    simdutf_really_inline bool is_ascii() const {
-      return this->reduce_or().is_ascii();
-    }
+  simdutf_really_inline bool is_ascii() const {
+    return this->reduce_or().is_ascii();
+  }
 
-    simdutf_really_inline void store_ascii_as_utf16(char16_t * ptr) const {
-      this->chunks[0].store_ascii_as_utf16(ptr+sizeof(simd16<T>)*0);
-      this->chunks[1].store_ascii_as_utf16(ptr+sizeof(simd16<T>));
-    }
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *ptr) const {
+    this->chunks[0].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 0);
+    this->chunks[1].store_ascii_as_utf16(ptr + sizeof(simd16<T>));
+  }
 
-    simdutf_really_inline simd16x32<T> bit_or(const T m) const {
-      const simd16<T> mask = simd16<T>::splat(m);
-      return simd16x32<T>(
-        this->chunks[0] | mask,
-        this->chunks[1] | mask
-      );
-    }
+  simdutf_really_inline simd16x32<T> bit_or(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<T>(this->chunks[0] | mask, this->chunks[1] | mask);
+  }
 
-    simdutf_really_inline void swap_bytes() {
-      this->chunks[0] = this->chunks[0].swap_bytes();
-      this->chunks[1] = this->chunks[1].swap_bytes();
-    }
+  simdutf_really_inline void swap_bytes() {
+    this->chunks[0] = this->chunks[0].swap_bytes();
+    this->chunks[1] = this->chunks[1].swap_bytes();
+  }
 
-    simdutf_really_inline uint64_t eq(const T m) const {
-      const simd16<T> mask = simd16<T>::splat(m);
-      return  simd16x32<bool>(
-        this->chunks[0] == mask,
-        this->chunks[1] == mask
-      ).to_bitmask();
-    }
+  simdutf_really_inline uint64_t eq(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] == mask, this->chunks[1] == mask)
+        .to_bitmask();
+  }
 
-    simdutf_really_inline uint64_t eq(const simd16x32<uint16_t> &other) const {
-      return  simd16x32<bool>(
-        this->chunks[0] == other.chunks[0],
-        this->chunks[1] == other.chunks[1]
-      ).to_bitmask();
-    }
+  simdutf_really_inline uint64_t eq(const simd16x32<uint16_t> &other) const {
+    return simd16x32<bool>(this->chunks[0] == other.chunks[0],
+                           this->chunks[1] == other.chunks[1])
+        .to_bitmask();
+  }
 
-    simdutf_really_inline uint64_t lteq(const T m) const {
-      const simd16<T> mask = simd16<T>::splat(m);
-      return  simd16x32<bool>(
-        this->chunks[0] <= mask,
-        this->chunks[1] <= mask
-      ).to_bitmask();
-    }
+  simdutf_really_inline uint64_t lteq(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] <= mask, this->chunks[1] <= mask)
+        .to_bitmask();
+  }
 
-    simdutf_really_inline uint64_t in_range(const T low, const T high) const {
-      const simd16<T> mask_low = simd16<T>::splat(low);
-      const simd16<T> mask_high = simd16<T>::splat(high);
+  simdutf_really_inline uint64_t in_range(const T low, const T high) const {
+    const simd16<T> mask_low = simd16<T>::splat(low);
+    const simd16<T> mask_high = simd16<T>::splat(high);
 
-      return  simd16x32<bool>(
-        (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
-        (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low)
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
-      const simd16<T> mask_low = simd16<T>::splat(static_cast<T>(low-1));
-      const simd16<T> mask_high = simd16<T>::splat(static_cast<T>(high+1));
-      return simd16x32<bool>(
-        (this->chunks[0] >= mask_high) | (this->chunks[0] <= mask_low),
-        (this->chunks[1] >= mask_high) | (this->chunks[1] <= mask_low)
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t lt(const T m) const {
-      const simd16<T> mask = simd16<T>::splat(m);
-      return  simd16x32<bool>(
-        this->chunks[0] < mask,
-        this->chunks[1] < mask
-      ).to_bitmask();
-    }
-  }; // struct simd16x32<T>
+    return simd16x32<bool>(
+               (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
+               (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
+    const simd16<T> mask_low = simd16<T>::splat(static_cast<T>(low - 1));
+    const simd16<T> mask_high = simd16<T>::splat(static_cast<T>(high + 1));
+    return simd16x32<bool>(
+               (this->chunks[0] >= mask_high) | (this->chunks[0] <= mask_low),
+               (this->chunks[1] >= mask_high) | (this->chunks[1] <= mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t lt(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] < mask, this->chunks[1] < mask)
+        .to_bitmask();
+  }
+}; // struct simd16x32<T>
 /* end file src/simdutf/haswell/simd16-inl.h */
 
 } // namespace simd
@@ -3078,7 +3894,8 @@ SIMDUTF_UNTARGET_REGION
 #endif
 
 
-#if SIMDUTF_GCC11ORMORE // workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+#if SIMDUTF_GCC11ORMORE // workaround for
+                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
 SIMDUTF_POP_DISABLE_WARNINGS
 #endif // end of workaround
 /* end file src/simdutf/haswell/end.h */
@@ -3091,51 +3908,51 @@ SIMDUTF_POP_DISABLE_WARNINGS
 #define SIMDUTF_WESTMERE_H
 
 #ifdef SIMDUTF_FALLBACK_H
-#error "westmere.h must be included before fallback.h"
+  #error "westmere.h must be included before fallback.h"
 #endif
 
 
 // Default Westmere to on if this is x86-64, unless we'll always select Haswell.
 #ifndef SIMDUTF_IMPLEMENTATION_WESTMERE
-//
-// You do not want to set it to (SIMDUTF_IS_X86_64 && !SIMDUTF_REQUIRES_HASWELL)
-// because you want to rely on runtime dispatch!
-//
-#if SIMDUTF_CAN_ALWAYS_RUN_ICELAKE || SIMDUTF_CAN_ALWAYS_RUN_HASWELL
-#define SIMDUTF_IMPLEMENTATION_WESTMERE 0
-#else
-#define SIMDUTF_IMPLEMENTATION_WESTMERE (SIMDUTF_IS_X86_64)
-#endif
+  //
+  // You do not want to set it to (SIMDUTF_IS_X86_64 &&
+  // !SIMDUTF_REQUIRES_HASWELL) because you want to rely on runtime dispatch!
+  //
+  #if SIMDUTF_CAN_ALWAYS_RUN_ICELAKE || SIMDUTF_CAN_ALWAYS_RUN_HASWELL
+    #define SIMDUTF_IMPLEMENTATION_WESTMERE 0
+  #else
+    #define SIMDUTF_IMPLEMENTATION_WESTMERE (SIMDUTF_IS_X86_64)
+  #endif
 
 #endif
 
 #if (SIMDUTF_IMPLEMENTATION_WESTMERE && SIMDUTF_IS_X86_64 && __SSE4_2__)
-#define SIMDUTF_CAN_ALWAYS_RUN_WESTMERE 1
+  #define SIMDUTF_CAN_ALWAYS_RUN_WESTMERE 1
 #else
-#define SIMDUTF_CAN_ALWAYS_RUN_WESTMERE 0
+  #define SIMDUTF_CAN_ALWAYS_RUN_WESTMERE 0
 #endif
 
 #if SIMDUTF_IMPLEMENTATION_WESTMERE
 
-#define SIMDUTF_TARGET_WESTMERE SIMDUTF_TARGET_REGION("sse4.2,popcnt")
+  #define SIMDUTF_TARGET_WESTMERE SIMDUTF_TARGET_REGION("sse4.2,popcnt")
 
 namespace simdutf {
 /**
  * Implementation for Westmere (Intel SSE4.2).
  */
-namespace westmere {
-} // namespace westmere
+namespace westmere {} // namespace westmere
 } // namespace simdutf
 
-//
-// These two need to be included outside SIMDUTF_TARGET_REGION
-//
+  //
+  // These two need to be included outside SIMDUTF_TARGET_REGION
+  //
 /* begin file src/simdutf/westmere/implementation.h */
 #ifndef SIMDUTF_WESTMERE_IMPLEMENTATION_H
 #define SIMDUTF_WESTMERE_IMPLEMENTATION_H
 
 
-// The constructor may be executed on any host, so we take care not to use SIMDUTF_TARGET_REGION
+// The constructor may be executed on any host, so we take care not to use
+// SIMDUTF_TARGET_REGION
 namespace simdutf {
 namespace westmere {
 
@@ -3145,88 +3962,197 @@ using namespace simdutf;
 
 class implementation final : public simdutf::implementation {
 public:
-  simdutf_really_inline implementation() : simdutf::implementation("westmere", "Intel/AMD SSE4.2", internal::instruction_set::SSE42) {}
-  simdutf_warn_unused int detect_encodings(const char * input, size_t length) const noexcept final;
-  simdutf_warn_unused bool validate_utf8(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_ascii(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16le(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16be(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf32(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf8(const char * buf, size_t len, char* utf8_output) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf16le(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf16be(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(const char * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_latin1(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  void change_endianness_utf16(const char16_t * buf, size_t length, char16_t * output) const noexcept final;
-  simdutf_warn_unused size_t count_utf16le(const char16_t * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t count_utf16be(const char16_t * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t count_utf8(const char * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf16(size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf32(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_latin1(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_latin1(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_latin1(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused size_t base64_length_from_binary(size_t length, base64_options options) const noexcept;
-  size_t binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept;
+  simdutf_really_inline implementation()
+      : simdutf::implementation("westmere", "Intel/AMD SSE4.2",
+                                internal::instruction_set::SSE42) {}
+  simdutf_warn_unused int detect_encodings(const char *input,
+                                           size_t length) const noexcept final;
+  simdutf_warn_unused bool validate_utf8(const char *buf,
+                                         size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_ascii(const char *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16le(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16le_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16be_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf32(const char32_t *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf32_with_errors(
+      const char32_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf8(
+      const char *buf, size_t len, char *utf8_output) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+      const char *buf, size_t len, char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                          char *latin1_output) const noexcept final;
+  simdutf_warn_unused result
+  convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                      char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_latin1(const char32_t *buf, size_t len,
+                                char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16le(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16be(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16le(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16be(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  void change_endianness_utf16(const char16_t *buf, size_t length,
+                               char16_t *output) const noexcept final;
+  simdutf_warn_unused size_t count_utf16le(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf16be(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf8(const char *buf,
+                                        size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16le(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16be(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16le(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16be(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf16(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf32(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_latin1(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char *input, size_t length) const noexcept;
+  simdutf_warn_unused result base64_to_binary(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused result
+  base64_to_binary(const char16_t *input, size_t length, char *output,
+                   base64_options options,
+                   last_chunk_handling_options last_chunk_options =
+                       last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused size_t base64_length_from_binary(
+      size_t length, base64_options options) const noexcept;
+  size_t binary_to_base64(const char *input, size_t length, char *output,
+                          base64_options options) const noexcept;
 };
 
 } // namespace westmere
@@ -3239,49 +4165,45 @@ class implementation final : public simdutf::implementation {
 #define SIMDUTF_WESTMERE_INTRINSICS_H
 
 #ifdef SIMDUTF_VISUAL_STUDIO
-// under clang within visual studio, this will include <x86intrin.h>
-#include <intrin.h> // visual studio or clang
+  // under clang within visual studio, this will include <x86intrin.h>
+  #include <intrin.h> // visual studio or clang
 #else
 
-#if SIMDUTF_GCC11ORMORE
+  #if SIMDUTF_GCC11ORMORE
 // We should not get warnings while including <x86intrin.h> yet we do
 // under some versions of GCC.
 // If the x86intrin.h header has uninitialized values that are problematic,
 // it is a GCC issue, we want to ignore these warnings.
 SIMDUTF_DISABLE_GCC_WARNING(-Wuninitialized)
-#endif
-
-#include <x86intrin.h> // elsewhere
+  #endif
 
+  #include <x86intrin.h> // elsewhere
 
-#if SIMDUTF_GCC11ORMORE
+  #if SIMDUTF_GCC11ORMORE
 // cancels the suppression of the -Wuninitialized
 SIMDUTF_POP_DISABLE_WARNINGS
-#endif
+  #endif
 
 #endif // SIMDUTF_VISUAL_STUDIO
 
-
 #ifdef SIMDUTF_CLANG_VISUAL_STUDIO
-/**
- * You are not supposed, normally, to include these
- * headers directly. Instead you should either include intrin.h
- * or x86intrin.h. However, when compiling with clang
- * under Windows (i.e., when _MSC_VER is set), these headers
- * only get included *if* the corresponding features are detected
- * from macros:
- */
-#include <smmintrin.h>  // for _mm_alignr_epi8
+  /**
+   * You are not supposed, normally, to include these
+   * headers directly. Instead you should either include intrin.h
+   * or x86intrin.h. However, when compiling with clang
+   * under Windows (i.e., when _MSC_VER is set), these headers
+   * only get included *if* the corresponding features are detected
+   * from macros:
+   */
+  #include <smmintrin.h> // for _mm_alignr_epi8
 #endif
 
-
-
 #endif // SIMDUTF_WESTMERE_INTRINSICS_H
 /* end file src/simdutf/westmere/intrinsics.h */
 
-//
-// The rest need to be inside the region
-//
+  //
+  // The rest need to be inside the region
+  //
 /* begin file src/simdutf/westmere/begin.h */
 // redefining SIMDUTF_IMPLEMENTATION to "westmere"
 // #define SIMDUTF_IMPLEMENTATION westmere
@@ -3293,7 +4215,7 @@ SIMDUTF_TARGET_WESTMERE
 #endif
 /* end file src/simdutf/westmere/begin.h */
 
-// Declarations
+  // Declarations
 /* begin file src/simdutf/westmere/bitmanipulation.h */
 #ifndef SIMDUTF_WESTMERE_BITMANIPULATION_H
 #define SIMDUTF_WESTMERE_BITMANIPULATION_H
@@ -3305,7 +4227,7 @@ namespace {
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
 simdutf_really_inline unsigned __int64 count_ones(uint64_t input_num) {
   // note: we do not support legacy 32-bit Windows
-  return __popcnt64(input_num);// Visual Studio wants two underscores
+  return __popcnt64(input_num); // Visual Studio wants two underscores
 }
 #else
 simdutf_really_inline long long int count_ones(uint64_t input_num) {
@@ -3315,13 +4237,13 @@ simdutf_really_inline long long int count_ones(uint64_t input_num) {
 
 #if SIMDUTF_NEED_TRAILING_ZEROES
 simdutf_really_inline int trailing_zeroes(uint64_t input_num) {
-#if SIMDUTF_REGULAR_VISUAL_STUDIO
+  #if SIMDUTF_REGULAR_VISUAL_STUDIO
   unsigned long ret;
   _BitScanForward64(&ret, input_num);
   return (int)ret;
-#else // SIMDUTF_REGULAR_VISUAL_STUDIO
+  #else  // SIMDUTF_REGULAR_VISUAL_STUDIO
   return __builtin_ctzll(input_num);
-#endif // SIMDUTF_REGULAR_VISUAL_STUDIO
+  #endif // SIMDUTF_REGULAR_VISUAL_STUDIO
 }
 #endif
 
@@ -3340,1283 +4262,875 @@ namespace westmere {
 namespace {
 namespace simd {
 
-  template<typename Child>
-  struct base {
-    __m128i value;
-
-    // Zero constructor
-    simdutf_really_inline base() : value{__m128i()} {}
-
-    // Conversion from SIMD register
-    simdutf_really_inline base(const __m128i _value) : value(_value) {}
-    // Conversion to SIMD register
-    simdutf_really_inline operator const __m128i&() const { return this->value; }
-    simdutf_really_inline operator __m128i&() { return this->value; }
-    template <endianness big_endian>
-    simdutf_really_inline void store_ascii_as_utf16(char16_t * p) const {
-      __m128i first = _mm_cvtepu8_epi16(*this);
-      __m128i second = _mm_cvtepu8_epi16(_mm_srli_si128(*this,8));
-      if (big_endian) {
-        const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-        first = _mm_shuffle_epi8(first, swap);
-        second = _mm_shuffle_epi8(second, swap);
-      }
-      _mm_storeu_si128(reinterpret_cast<__m128i *>(p), first);
-      _mm_storeu_si128(reinterpret_cast<__m128i *>(p+8), second);
-    }
-    simdutf_really_inline void store_ascii_as_utf32(char32_t * p) const {
-      _mm_storeu_si128(reinterpret_cast<__m128i *>(p), _mm_cvtepu8_epi32(*this));
-      _mm_storeu_si128(reinterpret_cast<__m128i *>(p+4), _mm_cvtepu8_epi32(_mm_srli_si128(*this,4)));
-      _mm_storeu_si128(reinterpret_cast<__m128i *>(p+8), _mm_cvtepu8_epi32(_mm_srli_si128(*this,8)));
-      _mm_storeu_si128(reinterpret_cast<__m128i *>(p+12), _mm_cvtepu8_epi32(_mm_srli_si128(*this,12)));
-    }
-    // Bit operations
-    simdutf_really_inline Child operator|(const Child other) const { return _mm_or_si128(*this, other); }
-    simdutf_really_inline Child operator&(const Child other) const { return _mm_and_si128(*this, other); }
-    simdutf_really_inline Child operator^(const Child other) const { return _mm_xor_si128(*this, other); }
-    simdutf_really_inline Child bit_andnot(const Child other) const { return _mm_andnot_si128(other, *this); }
-    simdutf_really_inline Child& operator|=(const Child other) { auto this_cast = static_cast<Child*>(this); *this_cast = *this_cast | other; return *this_cast; }
-    simdutf_really_inline Child& operator&=(const Child other) { auto this_cast = static_cast<Child*>(this); *this_cast = *this_cast & other; return *this_cast; }
-    simdutf_really_inline Child& operator^=(const Child other) { auto this_cast = static_cast<Child*>(this); *this_cast = *this_cast ^ other; return *this_cast; }
-  };
-
-  // Forward-declared so they can be used by splat and friends.
-  template<typename T>
-  struct simd8;
-
-  template<typename T, typename Mask=simd8<bool>>
-  struct base8: base<simd8<T>> {
-    typedef uint16_t bitmask_t;
-    typedef uint32_t bitmask2_t;
-
-    simdutf_really_inline T first() const { return _mm_extract_epi8(*this,0); }
-    simdutf_really_inline T last() const { return _mm_extract_epi8(*this,15); }
-    simdutf_really_inline base8() : base<simd8<T>>() {}
-    simdutf_really_inline base8(const __m128i _value) : base<simd8<T>>(_value) {}
-
-    friend simdutf_really_inline Mask operator==(const simd8<T> lhs, const simd8<T> rhs) { return _mm_cmpeq_epi8(lhs, rhs); }
-
-    static const int SIZE = sizeof(base<simd8<T>>::value);
-
-    template<int N=1>
-    simdutf_really_inline simd8<T> prev(const simd8<T> prev_chunk) const {
-      return _mm_alignr_epi8(*this, prev_chunk, 16 - N);
-    }
-  };
-
-  // SIMD byte mask type (returned by things like eq and gt)
-  template<>
-  struct simd8<bool>: base8<bool> {
-    static simdutf_really_inline simd8<bool> splat(bool _value) { return _mm_set1_epi8(uint8_t(-(!!_value))); }
-
-    simdutf_really_inline simd8() : base8() {}
-    simdutf_really_inline simd8(const __m128i _value) : base8<bool>(_value) {}
-    // Splat constructor
-    simdutf_really_inline simd8(bool _value) : base8<bool>(splat(_value)) {}
-
-    simdutf_really_inline int to_bitmask() const { return _mm_movemask_epi8(*this); }
-    simdutf_really_inline bool any() const { return !_mm_testz_si128(*this, *this); }
-    simdutf_really_inline bool none() const { return _mm_testz_si128(*this, *this); }
-    simdutf_really_inline bool all() const { return _mm_movemask_epi8(*this) == 0xFFFF; }
-    simdutf_really_inline simd8<bool> operator~() const { return *this ^ true; }
-  };
-
-  template<typename T>
-  struct base8_numeric: base8<T> {
-    static simdutf_really_inline simd8<T> splat(T _value) { return _mm_set1_epi8(_value); }
-    static simdutf_really_inline simd8<T> zero() { return _mm_setzero_si128(); }
-    static simdutf_really_inline simd8<T> load(const T values[16]) {
-      return _mm_loadu_si128(reinterpret_cast<const __m128i *>(values));
-    }
-    // Repeat 16 values as many times as necessary (usually for lookup tables)
-    static simdutf_really_inline simd8<T> repeat_16(
-      T v0,  T v1,  T v2,  T v3,  T v4,  T v5,  T v6,  T v7,
-      T v8,  T v9,  T v10, T v11, T v12, T v13, T v14, T v15
-    ) {
-      return simd8<T>(
-        v0, v1, v2, v3, v4, v5, v6, v7,
-        v8, v9, v10,v11,v12,v13,v14,v15
-      );
-    }
-
-    simdutf_really_inline base8_numeric() : base8<T>() {}
-    simdutf_really_inline base8_numeric(const __m128i _value) : base8<T>(_value) {}
-
-    // Store to array
-    simdutf_really_inline void store(T dst[16]) const { return _mm_storeu_si128(reinterpret_cast<__m128i *>(dst), *this); }
-
-    // Override to distinguish from bool version
-    simdutf_really_inline simd8<T> operator~() const { return *this ^ 0xFFu; }
-
-    // Addition/subtraction are the same for signed and unsigned
-    simdutf_really_inline simd8<T> operator+(const simd8<T> other) const { return _mm_add_epi8(*this, other); }
-    simdutf_really_inline simd8<T> operator-(const simd8<T> other) const { return _mm_sub_epi8(*this, other); }
-    simdutf_really_inline simd8<T>& operator+=(const simd8<T> other) { *this = *this + other; return *static_cast<simd8<T>*>(this); }
-    simdutf_really_inline simd8<T>& operator-=(const simd8<T> other) { *this = *this - other; return *static_cast<simd8<T>*>(this); }
-
-    // Perform a lookup assuming the value is between 0 and 16 (undefined behavior for out of range values)
-    template<typename L>
-    simdutf_really_inline simd8<L> lookup_16(simd8<L> lookup_table) const {
-      return _mm_shuffle_epi8(lookup_table, *this);
-    }
-
-    template<typename L>
-    simdutf_really_inline simd8<L> lookup_16(
-        L replace0,  L replace1,  L replace2,  L replace3,
-        L replace4,  L replace5,  L replace6,  L replace7,
-        L replace8,  L replace9,  L replace10, L replace11,
-        L replace12, L replace13, L replace14, L replace15) const {
-      return lookup_16(simd8<L>::repeat_16(
-        replace0,  replace1,  replace2,  replace3,
-        replace4,  replace5,  replace6,  replace7,
-        replace8,  replace9,  replace10, replace11,
-        replace12, replace13, replace14, replace15
-      ));
-    }
-  };
+template <typename Child> struct base {
+  __m128i value;
 
-  // Signed bytes
-  template<>
-  struct simd8<int8_t> : base8_numeric<int8_t> {
-    simdutf_really_inline simd8() : base8_numeric<int8_t>() {}
-    simdutf_really_inline simd8(const __m128i _value) : base8_numeric<int8_t>(_value) {}
-    // Splat constructor
-    simdutf_really_inline simd8(int8_t _value) : simd8(splat(_value)) {}
-    // Array constructor
-    simdutf_really_inline simd8(const int8_t* values) : simd8(load(values)) {}
-    // Member-by-member initialization
-    simdutf_really_inline simd8(
-      int8_t v0,  int8_t v1,  int8_t v2,  int8_t v3,  int8_t v4,  int8_t v5,  int8_t v6,  int8_t v7,
-      int8_t v8,  int8_t v9,  int8_t v10, int8_t v11, int8_t v12, int8_t v13, int8_t v14, int8_t v15
-    ) : simd8(_mm_setr_epi8(
-      v0, v1, v2, v3, v4, v5, v6, v7,
-      v8, v9, v10,v11,v12,v13,v14,v15
-    )) {}
-    // Repeat 16 values as many times as necessary (usually for lookup tables)
-    simdutf_really_inline static simd8<int8_t> repeat_16(
-      int8_t v0,  int8_t v1,  int8_t v2,  int8_t v3,  int8_t v4,  int8_t v5,  int8_t v6,  int8_t v7,
-      int8_t v8,  int8_t v9,  int8_t v10, int8_t v11, int8_t v12, int8_t v13, int8_t v14, int8_t v15
-    ) {
-      return simd8<int8_t>(
-        v0, v1, v2, v3, v4, v5, v6, v7,
-        v8, v9, v10,v11,v12,v13,v14,v15
-      );
-    }
-    simdutf_really_inline operator simd8<uint8_t>() const;
-    simdutf_really_inline bool is_ascii() const { return _mm_movemask_epi8(*this) == 0; }
-
-    // Order-sensitive comparisons
-    simdutf_really_inline simd8<int8_t> max_val(const simd8<int8_t> other) const { return _mm_max_epi8(*this, other); }
-    simdutf_really_inline simd8<int8_t> min_val(const simd8<int8_t> other) const { return _mm_min_epi8(*this, other); }
-    simdutf_really_inline simd8<bool> operator>(const simd8<int8_t> other) const { return _mm_cmpgt_epi8(*this, other); }
-    simdutf_really_inline simd8<bool> operator<(const simd8<int8_t> other) const { return _mm_cmpgt_epi8(other, *this); }
-  };
+  // Zero constructor
+  simdutf_really_inline base() : value{__m128i()} {}
 
-  // Unsigned bytes
-  template<>
-  struct simd8<uint8_t>: base8_numeric<uint8_t>  {
-    simdutf_really_inline simd8() : base8_numeric<uint8_t>() {}
-    simdutf_really_inline simd8(const __m128i _value) : base8_numeric<uint8_t>(_value) {}
-
-    // Splat constructor
-    simdutf_really_inline simd8(uint8_t _value) : simd8(splat(_value)) {}
-    // Array constructor
-    simdutf_really_inline simd8(const uint8_t* values) : simd8(load(values)) {}
-    // Member-by-member initialization
-    simdutf_really_inline simd8(
-      uint8_t v0,  uint8_t v1,  uint8_t v2,  uint8_t v3,  uint8_t v4,  uint8_t v5,  uint8_t v6,  uint8_t v7,
-      uint8_t v8,  uint8_t v9,  uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15
-    ) : simd8(_mm_setr_epi8(
-      v0, v1, v2, v3, v4, v5, v6, v7,
-      v8, v9, v10,v11,v12,v13,v14,v15
-    )) {}
-    // Repeat 16 values as many times as necessary (usually for lookup tables)
-    simdutf_really_inline static simd8<uint8_t> repeat_16(
-      uint8_t v0,  uint8_t v1,  uint8_t v2,  uint8_t v3,  uint8_t v4,  uint8_t v5,  uint8_t v6,  uint8_t v7,
-      uint8_t v8,  uint8_t v9,  uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15
-    ) {
-      return simd8<uint8_t>(
-        v0, v1, v2, v3, v4, v5, v6, v7,
-        v8, v9, v10,v11,v12,v13,v14,v15
-      );
-    }
-
-    // Saturated math
-    simdutf_really_inline simd8<uint8_t> saturating_add(const simd8<uint8_t> other) const { return _mm_adds_epu8(*this, other); }
-    simdutf_really_inline simd8<uint8_t> saturating_sub(const simd8<uint8_t> other) const { return _mm_subs_epu8(*this, other); }
-
-    // Order-specific operations
-    simdutf_really_inline simd8<uint8_t> max_val(const simd8<uint8_t> other) const { return _mm_max_epu8(*this, other); }
-    simdutf_really_inline simd8<uint8_t> min_val(const simd8<uint8_t> other) const { return _mm_min_epu8(*this, other); }
-    // Same as >, but only guarantees true is nonzero (< guarantees true = -1)
-    simdutf_really_inline simd8<uint8_t> gt_bits(const simd8<uint8_t> other) const { return this->saturating_sub(other); }
-    // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
-    simdutf_really_inline simd8<uint8_t> lt_bits(const simd8<uint8_t> other) const { return other.saturating_sub(*this); }
-    simdutf_really_inline simd8<bool> operator<=(const simd8<uint8_t> other) const { return other.max_val(*this) == other; }
-    simdutf_really_inline simd8<bool> operator>=(const simd8<uint8_t> other) const { return other.min_val(*this) == other; }
-    simdutf_really_inline simd8<bool> operator>(const simd8<uint8_t> other) const { return this->gt_bits(other).any_bits_set(); }
-    simdutf_really_inline simd8<bool> operator<(const simd8<uint8_t> other) const { return this->gt_bits(other).any_bits_set(); }
-
-    // Bit-specific operations
-    simdutf_really_inline simd8<bool> bits_not_set() const { return *this == uint8_t(0); }
-    simdutf_really_inline simd8<bool> bits_not_set(simd8<uint8_t> bits) const { return (*this & bits).bits_not_set(); }
-    simdutf_really_inline simd8<bool> any_bits_set() const { return ~this->bits_not_set(); }
-    simdutf_really_inline simd8<bool> any_bits_set(simd8<uint8_t> bits) const { return ~this->bits_not_set(bits); }
-    simdutf_really_inline bool is_ascii() const { return _mm_movemask_epi8(*this) == 0; }
-
-    simdutf_really_inline bool bits_not_set_anywhere() const { return _mm_testz_si128(*this, *this); }
-    simdutf_really_inline bool any_bits_set_anywhere() const { return !bits_not_set_anywhere(); }
-    simdutf_really_inline bool bits_not_set_anywhere(simd8<uint8_t> bits) const { return _mm_testz_si128(*this, bits); }
-    simdutf_really_inline bool any_bits_set_anywhere(simd8<uint8_t> bits) const { return !bits_not_set_anywhere(bits); }
-    template<int N>
-    simdutf_really_inline simd8<uint8_t> shr() const { return simd8<uint8_t>(_mm_srli_epi16(*this, N)) & uint8_t(0xFFu >> N); }
-    template<int N>
-    simdutf_really_inline simd8<uint8_t> shl() const { return simd8<uint8_t>(_mm_slli_epi16(*this, N)) & uint8_t(0xFFu << N); }
-    // Get one of the bits and make a bitmask out of it.
-    // e.g. value.get_bit<7>() gets the high bit
-    template<int N>
-    simdutf_really_inline int get_bit() const { return _mm_movemask_epi8(_mm_slli_epi16(*this, 7-N)); }
-  };
-  simdutf_really_inline simd8<int8_t>::operator simd8<uint8_t>() const { return this->value; }
-
-  // Unsigned bytes
-  template<>
-  struct simd8<uint16_t>: base<uint16_t> {
-    static simdutf_really_inline simd8<uint16_t> splat(uint16_t _value) { return _mm_set1_epi16(_value); }
-    static simdutf_really_inline simd8<uint16_t> load(const uint16_t values[8]) {
-      return _mm_loadu_si128(reinterpret_cast<const __m128i *>(values));
-    }
-
-    simdutf_really_inline simd8() : base<uint16_t>() {}
-    simdutf_really_inline simd8(const __m128i _value) : base<uint16_t>(_value) {}
-    // Splat constructor
-    simdutf_really_inline simd8(uint16_t _value) : simd8(splat(_value)) {}
-    // Array constructor
-    simdutf_really_inline simd8(const uint16_t* values) : simd8(load(values)) {}
-    // Member-by-member initialization
-    simdutf_really_inline simd8(
-      uint16_t v0,  uint16_t v1,  uint16_t v2,  uint16_t v3,  uint16_t v4,  uint16_t v5,  uint16_t v6,  uint16_t v7
-    ) : simd8(_mm_setr_epi16(
-      v0, v1, v2, v3, v4, v5, v6, v7
-    )) {}
-
-    // Saturated math
-    simdutf_really_inline simd8<uint16_t> saturating_add(const simd8<uint16_t> other) const { return _mm_adds_epu16(*this, other); }
-    simdutf_really_inline simd8<uint16_t> saturating_sub(const simd8<uint16_t> other) const { return _mm_subs_epu16(*this, other); }
-
-    // Order-specific operations
-    simdutf_really_inline simd8<uint16_t> max_val(const simd8<uint16_t> other) const { return _mm_max_epu16(*this, other); }
-    simdutf_really_inline simd8<uint16_t> min_val(const simd8<uint16_t> other) const { return _mm_min_epu16(*this, other); }
-    // Same as >, but only guarantees true is nonzero (< guarantees true = -1)
-    simdutf_really_inline simd8<uint16_t> gt_bits(const simd8<uint16_t> other) const { return this->saturating_sub(other); }
-    // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
-    simdutf_really_inline simd8<uint16_t> lt_bits(const simd8<uint16_t> other) const { return other.saturating_sub(*this); }
-    simdutf_really_inline simd8<bool> operator<=(const simd8<uint16_t> other) const { return other.max_val(*this) == other; }
-    simdutf_really_inline simd8<bool> operator>=(const simd8<uint16_t> other) const { return other.min_val(*this) == other; }
-    simdutf_really_inline simd8<bool> operator==(const simd8<uint16_t> other) const { return _mm_cmpeq_epi16(*this, other); }
-    simdutf_really_inline simd8<bool> operator&(const simd8<uint16_t> other) const { return _mm_and_si128(*this, other); }
-    simdutf_really_inline simd8<bool> operator|(const simd8<uint16_t> other) const { return _mm_or_si128(*this, other); }
-
-    // Bit-specific operations
-    simdutf_really_inline simd8<bool> bits_not_set() const { return *this == uint16_t(0); }
-    simdutf_really_inline simd8<bool> any_bits_set() const { return ~this->bits_not_set(); }
-
-    simdutf_really_inline bool bits_not_set_anywhere() const { return _mm_testz_si128(*this, *this); }
-    simdutf_really_inline bool any_bits_set_anywhere() const { return !bits_not_set_anywhere(); }
-    simdutf_really_inline bool bits_not_set_anywhere(simd8<uint16_t> bits) const { return _mm_testz_si128(*this, bits); }
-    simdutf_really_inline bool any_bits_set_anywhere(simd8<uint16_t> bits) const { return !bits_not_set_anywhere(bits); }
-     };
-  template<typename T>
-  struct simd8x64 {
-    static constexpr int NUM_CHUNKS = 64 / sizeof(simd8<T>);
-    static_assert(NUM_CHUNKS == 4, "Westmere kernel should use four registers per 64-byte block.");
-    simd8<T> chunks[NUM_CHUNKS];
-
-    simd8x64(const simd8x64<T>& o) = delete; // no copy allowed
-    simd8x64<T>& operator=(const simd8<T> other) = delete; // no assignment allowed
-    simd8x64() = delete; // no default constructor allowed
-
-    simdutf_really_inline simd8x64(const simd8<T> chunk0, const simd8<T> chunk1, const simd8<T> chunk2, const simd8<T> chunk3) : chunks{chunk0, chunk1, chunk2, chunk3} {}
-    simdutf_really_inline simd8x64(const T* ptr) : chunks{simd8<T>::load(ptr), simd8<T>::load(ptr+sizeof(simd8<T>)/sizeof(T)), simd8<T>::load(ptr+2*sizeof(simd8<T>)/sizeof(T)), simd8<T>::load(ptr+3*sizeof(simd8<T>)/sizeof(T))} {}
-
-    simdutf_really_inline void store(T* ptr) const {
-      this->chunks[0].store(ptr+sizeof(simd8<T>)*0/sizeof(T));
-      this->chunks[1].store(ptr+sizeof(simd8<T>)*1/sizeof(T));
-      this->chunks[2].store(ptr+sizeof(simd8<T>)*2/sizeof(T));
-      this->chunks[3].store(ptr+sizeof(simd8<T>)*3/sizeof(T));
-    }
-
-    simdutf_really_inline simd8x64<T>& operator |=(const simd8x64<T> &other) {
-      this->chunks[0] |= other.chunks[0];
-      this->chunks[1] |= other.chunks[1];
-      this->chunks[2] |= other.chunks[2];
-      this->chunks[3] |= other.chunks[3];
-      return *this;
-    }
-
-    simdutf_really_inline simd8<T> reduce_or() const {
-      return (this->chunks[0] | this->chunks[1]) | (this->chunks[2] | this->chunks[3]);
-    }
-
-    simdutf_really_inline bool is_ascii() const {
-      return this->reduce_or().is_ascii();
-    }
-
-    template <endianness endian>
-    simdutf_really_inline void store_ascii_as_utf16(char16_t * ptr) const {
-      this->chunks[0].template store_ascii_as_utf16<endian>(ptr+sizeof(simd8<T>)*0);
-      this->chunks[1].template store_ascii_as_utf16<endian>(ptr+sizeof(simd8<T>)*1);
-      this->chunks[2].template store_ascii_as_utf16<endian>(ptr+sizeof(simd8<T>)*2);
-      this->chunks[3].template store_ascii_as_utf16<endian>(ptr+sizeof(simd8<T>)*3);
-    }
-
-    simdutf_really_inline void store_ascii_as_utf32(char32_t * ptr) const {
-      this->chunks[0].store_ascii_as_utf32(ptr+sizeof(simd8<T>)*0);
-      this->chunks[1].store_ascii_as_utf32(ptr+sizeof(simd8<T>)*1);
-      this->chunks[2].store_ascii_as_utf32(ptr+sizeof(simd8<T>)*2);
-      this->chunks[3].store_ascii_as_utf32(ptr+sizeof(simd8<T>)*3);
-    }
-
-    simdutf_really_inline uint64_t to_bitmask() const {
-      uint64_t r0 = uint32_t(this->chunks[0].to_bitmask());
-      uint64_t r1 =          this->chunks[1].to_bitmask();
-      uint64_t r2 =          this->chunks[2].to_bitmask();
-      uint64_t r3 =          this->chunks[3].to_bitmask();
-      return r0 | (r1 << 16) | (r2 << 32) | (r3 << 48);
-    }
-
-    simdutf_really_inline uint64_t eq(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] == mask,
-        this->chunks[1] == mask,
-        this->chunks[2] == mask,
-        this->chunks[3] == mask
-      ).to_bitmask();
-    }
-
-    simdutf_really_inline uint64_t eq(const simd8x64<uint8_t> &other) const {
-      return  simd8x64<bool>(
-        this->chunks[0] == other.chunks[0],
-        this->chunks[1] == other.chunks[1],
-        this->chunks[2] == other.chunks[2],
-        this->chunks[3] == other.chunks[3]
-      ).to_bitmask();
-    }
-
-    simdutf_really_inline uint64_t lteq(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] <= mask,
-        this->chunks[1] <= mask,
-        this->chunks[2] <= mask,
-        this->chunks[3] <= mask
-      ).to_bitmask();
-    }
-
-    simdutf_really_inline uint64_t in_range(const T low, const T high) const {
-      const simd8<T> mask_low = simd8<T>::splat(low);
-      const simd8<T> mask_high = simd8<T>::splat(high);
-
-      return  simd8x64<bool>(
-        (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
-        (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low),
-        (this->chunks[2] <= mask_high) & (this->chunks[2] >= mask_low),
-        (this->chunks[3] <= mask_high) & (this->chunks[3] >= mask_low)
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
-      const simd8<T> mask_low = simd8<T>::splat(low-1);
-      const simd8<T> mask_high = simd8<T>::splat(high+1);
-      return simd8x64<bool>(
-        (this->chunks[0] >= mask_high) | (this->chunks[0] <= mask_low),
-        (this->chunks[1] >= mask_high) | (this->chunks[1] <= mask_low),
-        (this->chunks[2] >= mask_high) | (this->chunks[2] <= mask_low),
-        (this->chunks[3] >= mask_high) | (this->chunks[3] <= mask_low)
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t lt(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] < mask,
-        this->chunks[1] < mask,
-        this->chunks[2] < mask,
-        this->chunks[3] < mask
-      ).to_bitmask();
-    }
-
-    simdutf_really_inline uint64_t gt(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] > mask,
-        this->chunks[1] > mask,
-        this->chunks[2] > mask,
-        this->chunks[3] > mask
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t gteq(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] >= mask,
-        this->chunks[1] >= mask,
-        this->chunks[2] >= mask,
-        this->chunks[3] >= mask
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t gteq_unsigned(const uint8_t m) const {
-      const simd8<uint8_t> mask = simd8<uint8_t>::splat(m);
-      return  simd8x64<bool>(
-        simd8<uint8_t>(__m128i(this->chunks[0])) >= mask,
-        simd8<uint8_t>(__m128i(this->chunks[1])) >= mask,
-        simd8<uint8_t>(__m128i(this->chunks[2])) >= mask,
-        simd8<uint8_t>(__m128i(this->chunks[3])) >= mask
-      ).to_bitmask();
-    }
-  }; // struct simd8x64<T>
+  // Conversion from SIMD register
+  simdutf_really_inline base(const __m128i _value) : value(_value) {}
+  // Conversion to SIMD register
+  simdutf_really_inline operator const __m128i &() const { return this->value; }
+  simdutf_really_inline operator __m128i &() { return this->value; }
+  template <endianness big_endian>
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *p) const {
+    __m128i first = _mm_cvtepu8_epi16(*this);
+    __m128i second = _mm_cvtepu8_epi16(_mm_srli_si128(*this, 8));
+    if (big_endian) {
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      first = _mm_shuffle_epi8(first, swap);
+      second = _mm_shuffle_epi8(second, swap);
+    }
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(p), first);
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(p + 8), second);
+  }
+  simdutf_really_inline void store_ascii_as_utf32(char32_t *p) const {
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(p), _mm_cvtepu8_epi32(*this));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(p + 4),
+                     _mm_cvtepu8_epi32(_mm_srli_si128(*this, 4)));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(p + 8),
+                     _mm_cvtepu8_epi32(_mm_srli_si128(*this, 8)));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(p + 12),
+                     _mm_cvtepu8_epi32(_mm_srli_si128(*this, 12)));
+  }
+  // Bit operations
+  simdutf_really_inline Child operator|(const Child other) const {
+    return _mm_or_si128(*this, other);
+  }
+  simdutf_really_inline Child operator&(const Child other) const {
+    return _mm_and_si128(*this, other);
+  }
+  simdutf_really_inline Child operator^(const Child other) const {
+    return _mm_xor_si128(*this, other);
+  }
+  simdutf_really_inline Child bit_andnot(const Child other) const {
+    return _mm_andnot_si128(other, *this);
+  }
+  simdutf_really_inline Child &operator|=(const Child other) {
+    auto this_cast = static_cast<Child *>(this);
+    *this_cast = *this_cast | other;
+    return *this_cast;
+  }
+  simdutf_really_inline Child &operator&=(const Child other) {
+    auto this_cast = static_cast<Child *>(this);
+    *this_cast = *this_cast & other;
+    return *this_cast;
+  }
+  simdutf_really_inline Child &operator^=(const Child other) {
+    auto this_cast = static_cast<Child *>(this);
+    *this_cast = *this_cast ^ other;
+    return *this_cast;
+  }
+};
 
-/* begin file src/simdutf/westmere/simd16-inl.h */
-template<typename T>
-struct simd16;
+// Forward-declared so they can be used by splat and friends.
+template <typename T> struct simd8;
 
-template<typename T, typename Mask=simd16<bool>>
-struct base16: base<simd16<T>> {
+template <typename T, typename Mask = simd8<bool>>
+struct base8 : base<simd8<T>> {
   typedef uint16_t bitmask_t;
   typedef uint32_t bitmask2_t;
 
-  simdutf_really_inline base16() : base<simd16<T>>() {}
-  simdutf_really_inline base16(const __m128i _value) : base<simd16<T>>(_value) {}
-  template <typename Pointer>
-  simdutf_really_inline base16(const Pointer* ptr) : base16(_mm_loadu_si128(reinterpret_cast<const __m128i*>(ptr))) {}
+  simdutf_really_inline T first() const { return _mm_extract_epi8(*this, 0); }
+  simdutf_really_inline T last() const { return _mm_extract_epi8(*this, 15); }
+  simdutf_really_inline base8() : base<simd8<T>>() {}
+  simdutf_really_inline base8(const __m128i _value) : base<simd8<T>>(_value) {}
 
-  friend simdutf_really_inline Mask operator==(const simd16<T> lhs, const simd16<T> rhs) { return _mm_cmpeq_epi16(lhs, rhs); }
+  friend simdutf_really_inline Mask operator==(const simd8<T> lhs,
+                                               const simd8<T> rhs) {
+    return _mm_cmpeq_epi8(lhs, rhs);
+  }
 
-  static const int SIZE = sizeof(base<simd16<T>>::value);
+  static const int SIZE = sizeof(base<simd8<T>>::value);
 
-  template<int N=1>
-  simdutf_really_inline simd16<T> prev(const simd16<T> prev_chunk) const {
+  template <int N = 1>
+  simdutf_really_inline simd8<T> prev(const simd8<T> prev_chunk) const {
     return _mm_alignr_epi8(*this, prev_chunk, 16 - N);
   }
 };
 
 // SIMD byte mask type (returned by things like eq and gt)
-template<>
-struct simd16<bool>: base16<bool> {
-  static simdutf_really_inline simd16<bool> splat(bool _value) { return _mm_set1_epi16(uint16_t(-(!!_value))); }
+template <> struct simd8<bool> : base8<bool> {
+  static simdutf_really_inline simd8<bool> splat(bool _value) {
+    return _mm_set1_epi8(uint8_t(-(!!_value)));
+  }
 
-  simdutf_really_inline simd16() : base16() {}
-  simdutf_really_inline simd16(const __m128i _value) : base16<bool>(_value) {}
+  simdutf_really_inline simd8() : base8() {}
+  simdutf_really_inline simd8(const __m128i _value) : base8<bool>(_value) {}
   // Splat constructor
-  simdutf_really_inline simd16(bool _value) : base16<bool>(splat(_value)) {}
+  simdutf_really_inline simd8(bool _value) : base8<bool>(splat(_value)) {}
 
-  simdutf_really_inline int to_bitmask() const { return _mm_movemask_epi8(*this); }
-  simdutf_really_inline bool any() const { return !_mm_testz_si128(*this, *this); }
-  simdutf_really_inline simd16<bool> operator~() const { return *this ^ true; }
+  simdutf_really_inline int to_bitmask() const {
+    return _mm_movemask_epi8(*this);
+  }
+  simdutf_really_inline bool any() const {
+    return !_mm_testz_si128(*this, *this);
+  }
+  simdutf_really_inline bool none() const {
+    return _mm_testz_si128(*this, *this);
+  }
+  simdutf_really_inline bool all() const {
+    return _mm_movemask_epi8(*this) == 0xFFFF;
+  }
+  simdutf_really_inline simd8<bool> operator~() const { return *this ^ true; }
 };
 
-template<typename T>
-struct base16_numeric: base16<T> {
-  static simdutf_really_inline simd16<T> splat(T _value) { return _mm_set1_epi16(_value); }
-  static simdutf_really_inline simd16<T> zero() { return _mm_setzero_si128(); }
-  static simdutf_really_inline simd16<T> load(const T values[8]) {
+template <typename T> struct base8_numeric : base8<T> {
+  static simdutf_really_inline simd8<T> splat(T _value) {
+    return _mm_set1_epi8(_value);
+  }
+  static simdutf_really_inline simd8<T> zero() { return _mm_setzero_si128(); }
+  static simdutf_really_inline simd8<T> load(const T values[16]) {
     return _mm_loadu_si128(reinterpret_cast<const __m128i *>(values));
   }
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  static simdutf_really_inline simd8<T> repeat_16(T v0, T v1, T v2, T v3, T v4,
+                                                  T v5, T v6, T v7, T v8, T v9,
+                                                  T v10, T v11, T v12, T v13,
+                                                  T v14, T v15) {
+    return simd8<T>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13,
+                    v14, v15);
+  }
 
-  simdutf_really_inline base16_numeric() : base16<T>() {}
-  simdutf_really_inline base16_numeric(const __m128i _value) : base16<T>(_value) {}
+  simdutf_really_inline base8_numeric() : base8<T>() {}
+  simdutf_really_inline base8_numeric(const __m128i _value)
+      : base8<T>(_value) {}
 
   // Store to array
-  simdutf_really_inline void store(T dst[8]) const { return _mm_storeu_si128(reinterpret_cast<__m128i *>(dst), *this); }
+  simdutf_really_inline void store(T dst[16]) const {
+    return _mm_storeu_si128(reinterpret_cast<__m128i *>(dst), *this);
+  }
 
   // Override to distinguish from bool version
-  simdutf_really_inline simd16<T> operator~() const { return *this ^ 0xFFu; }
+  simdutf_really_inline simd8<T> operator~() const { return *this ^ 0xFFu; }
 
   // Addition/subtraction are the same for signed and unsigned
-  simdutf_really_inline simd16<T> operator+(const simd16<T> other) const { return _mm_add_epi16(*this, other); }
-  simdutf_really_inline simd16<T> operator-(const simd16<T> other) const { return _mm_sub_epi16(*this, other); }
-  simdutf_really_inline simd16<T>& operator+=(const simd16<T> other) { *this = *this + other; return *static_cast<simd16<T>*>(this); }
-  simdutf_really_inline simd16<T>& operator-=(const simd16<T> other) { *this = *this - other; return *static_cast<simd16<T>*>(this); }
+  simdutf_really_inline simd8<T> operator+(const simd8<T> other) const {
+    return _mm_add_epi8(*this, other);
+  }
+  simdutf_really_inline simd8<T> operator-(const simd8<T> other) const {
+    return _mm_sub_epi8(*this, other);
+  }
+  simdutf_really_inline simd8<T> &operator+=(const simd8<T> other) {
+    *this = *this + other;
+    return *static_cast<simd8<T> *>(this);
+  }
+  simdutf_really_inline simd8<T> &operator-=(const simd8<T> other) {
+    *this = *this - other;
+    return *static_cast<simd8<T> *>(this);
+  }
+
+  // Perform a lookup assuming the value is between 0 and 15 (undefined behavior
+  // for out-of-range values)
+  template <typename L>
+  simdutf_really_inline simd8<L> lookup_16(simd8<L> lookup_table) const {
+    return _mm_shuffle_epi8(lookup_table, *this);
+  }
+
+  template <typename L>
+  simdutf_really_inline simd8<L>
+  lookup_16(L replace0, L replace1, L replace2, L replace3, L replace4,
+            L replace5, L replace6, L replace7, L replace8, L replace9,
+            L replace10, L replace11, L replace12, L replace13, L replace14,
+            L replace15) const {
+    return lookup_16(simd8<L>::repeat_16(
+        replace0, replace1, replace2, replace3, replace4, replace5, replace6,
+        replace7, replace8, replace9, replace10, replace11, replace12,
+        replace13, replace14, replace15));
+  }
 };
 
-// Signed code units
-template<>
-struct simd16<int16_t> : base16_numeric<int16_t> {
-  simdutf_really_inline simd16() : base16_numeric<int16_t>() {}
-  simdutf_really_inline simd16(const __m128i _value) : base16_numeric<int16_t>(_value) {}
+// Signed bytes
+template <> struct simd8<int8_t> : base8_numeric<int8_t> {
+  simdutf_really_inline simd8() : base8_numeric<int8_t>() {}
+  simdutf_really_inline simd8(const __m128i _value)
+      : base8_numeric<int8_t>(_value) {}
   // Splat constructor
-  simdutf_really_inline simd16(int16_t _value) : simd16(splat(_value)) {}
+  simdutf_really_inline simd8(int8_t _value) : simd8(splat(_value)) {}
   // Array constructor
-  simdutf_really_inline simd16(const int16_t* values) : simd16(load(values)) {}
-  simdutf_really_inline simd16(const char16_t* values) : simd16(load(reinterpret_cast<const int16_t*>(values))) {}
+  simdutf_really_inline simd8(const int8_t *values) : simd8(load(values)) {}
   // Member-by-member initialization
-  simdutf_really_inline simd16(
-    int16_t v0, int16_t v1, int16_t v2, int16_t v3, int16_t v4, int16_t v5, int16_t v6, int16_t v7)
-    : simd16(_mm_setr_epi16(v0, v1, v2, v3, v4, v5, v6, v7)) {}
-  simdutf_really_inline operator simd16<uint16_t>() const;
+  simdutf_really_inline simd8(int8_t v0, int8_t v1, int8_t v2, int8_t v3,
+                              int8_t v4, int8_t v5, int8_t v6, int8_t v7,
+                              int8_t v8, int8_t v9, int8_t v10, int8_t v11,
+                              int8_t v12, int8_t v13, int8_t v14, int8_t v15)
+      : simd8(_mm_setr_epi8(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11,
+                            v12, v13, v14, v15)) {}
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  simdutf_really_inline static simd8<int8_t>
+  repeat_16(int8_t v0, int8_t v1, int8_t v2, int8_t v3, int8_t v4, int8_t v5,
+            int8_t v6, int8_t v7, int8_t v8, int8_t v9, int8_t v10, int8_t v11,
+            int8_t v12, int8_t v13, int8_t v14, int8_t v15) {
+    return simd8<int8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                         v13, v14, v15);
+  }
+  simdutf_really_inline operator simd8<uint8_t>() const;
+  simdutf_really_inline bool is_ascii() const {
+    return _mm_movemask_epi8(*this) == 0;
+  }
 
   // Order-sensitive comparisons
-  simdutf_really_inline simd16<int16_t> max_val(const simd16<int16_t> other) const { return _mm_max_epi16(*this, other); }
-  simdutf_really_inline simd16<int16_t> min_val(const simd16<int16_t> other) const { return _mm_min_epi16(*this, other); }
-  simdutf_really_inline simd16<bool> operator>(const simd16<int16_t> other) const { return _mm_cmpgt_epi16(*this, other); }
-  simdutf_really_inline simd16<bool> operator<(const simd16<int16_t> other) const { return _mm_cmpgt_epi16(other, *this); }
+  simdutf_really_inline simd8<int8_t> max_val(const simd8<int8_t> other) const {
+    return _mm_max_epi8(*this, other);
+  }
+  simdutf_really_inline simd8<int8_t> min_val(const simd8<int8_t> other) const {
+    return _mm_min_epi8(*this, other);
+  }
+  simdutf_really_inline simd8<bool> operator>(const simd8<int8_t> other) const {
+    return _mm_cmpgt_epi8(*this, other);
+  }
+  simdutf_really_inline simd8<bool> operator<(const simd8<int8_t> other) const {
+    return _mm_cmpgt_epi8(other, *this);
+  }
 };
 
-// Unsigned code units
-template<>
-struct simd16<uint16_t>: base16_numeric<uint16_t>  {
-  simdutf_really_inline simd16() : base16_numeric<uint16_t>() {}
-  simdutf_really_inline simd16(const __m128i _value) : base16_numeric<uint16_t>(_value) {}
+// Unsigned bytes
+template <> struct simd8<uint8_t> : base8_numeric<uint8_t> {
+  simdutf_really_inline simd8() : base8_numeric<uint8_t>() {}
+  simdutf_really_inline simd8(const __m128i _value)
+      : base8_numeric<uint8_t>(_value) {}
 
   // Splat constructor
-  simdutf_really_inline simd16(uint16_t _value) : simd16(splat(_value)) {}
+  simdutf_really_inline simd8(uint8_t _value) : simd8(splat(_value)) {}
   // Array constructor
-  simdutf_really_inline simd16(const uint16_t* values) : simd16(load(values)) {}
-  simdutf_really_inline simd16(const char16_t* values) : simd16(load(reinterpret_cast<const uint16_t*>(values))) {}
+  simdutf_really_inline simd8(const uint8_t *values) : simd8(load(values)) {}
   // Member-by-member initialization
-  simdutf_really_inline simd16(
-    uint16_t v0, uint16_t v1, uint16_t v2, uint16_t v3, uint16_t v4, uint16_t v5, uint16_t v6, uint16_t v7)
-  : simd16(_mm_setr_epi16(v0, v1, v2, v3, v4, v5, v6, v7)) {}
+  simdutf_really_inline
+  simd8(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4, uint8_t v5,
+        uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9, uint8_t v10,
+        uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15)
+      : simd8(_mm_setr_epi8(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11,
+                            v12, v13, v14, v15)) {}
   // Repeat 16 values as many times as necessary (usually for lookup tables)
-  simdutf_really_inline static simd16<uint16_t> repeat_16(
-    uint16_t v0, uint16_t v1, uint16_t v2, uint16_t v3, uint16_t v4, uint16_t v5, uint16_t v6, uint16_t v7
-  ) {
-    return simd16<uint16_t>(v0, v1, v2, v3, v4, v5, v6, v7);
+  simdutf_really_inline static simd8<uint8_t>
+  repeat_16(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4,
+            uint8_t v5, uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9,
+            uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14,
+            uint8_t v15) {
+    return simd8<uint8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                          v13, v14, v15);
   }
 
   // Saturated math
-  simdutf_really_inline simd16<uint16_t> saturating_add(const simd16<uint16_t> other) const { return _mm_adds_epu16(*this, other); }
-  simdutf_really_inline simd16<uint16_t> saturating_sub(const simd16<uint16_t> other) const { return _mm_subs_epu16(*this, other); }
+  simdutf_really_inline simd8<uint8_t>
+  saturating_add(const simd8<uint8_t> other) const {
+    return _mm_adds_epu8(*this, other);
+  }
+  simdutf_really_inline simd8<uint8_t>
+  saturating_sub(const simd8<uint8_t> other) const {
+    return _mm_subs_epu8(*this, other);
+  }
 
   // Order-specific operations
-  simdutf_really_inline simd16<uint16_t> max_val(const simd16<uint16_t> other) const { return _mm_max_epu16(*this, other); }
-  simdutf_really_inline simd16<uint16_t> min_val(const simd16<uint16_t> other) const { return _mm_min_epu16(*this, other); }
+  simdutf_really_inline simd8<uint8_t>
+  max_val(const simd8<uint8_t> other) const {
+    return _mm_max_epu8(*this, other);
+  }
+  simdutf_really_inline simd8<uint8_t>
+  min_val(const simd8<uint8_t> other) const {
+    return _mm_min_epu8(*this, other);
+  }
   // Same as >, but only guarantees true is nonzero (< guarantees true = -1)
-  simdutf_really_inline simd16<uint16_t> gt_bits(const simd16<uint16_t> other) const { return this->saturating_sub(other); }
+  simdutf_really_inline simd8<uint8_t>
+  gt_bits(const simd8<uint8_t> other) const {
+    return this->saturating_sub(other);
+  }
   // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
-  simdutf_really_inline simd16<uint16_t> lt_bits(const simd16<uint16_t> other) const { return other.saturating_sub(*this); }
-  simdutf_really_inline simd16<bool> operator<=(const simd16<uint16_t> other) const { return other.max_val(*this) == other; }
-  simdutf_really_inline simd16<bool> operator>=(const simd16<uint16_t> other) const { return other.min_val(*this) == other; }
-  simdutf_really_inline simd16<bool> operator>(const simd16<uint16_t> other) const { return this->gt_bits(other).any_bits_set(); }
-  simdutf_really_inline simd16<bool> operator<(const simd16<uint16_t> other) const { return this->gt_bits(other).any_bits_set(); }
+  simdutf_really_inline simd8<uint8_t>
+  lt_bits(const simd8<uint8_t> other) const {
+    return other.saturating_sub(*this);
+  }
+  simdutf_really_inline simd8<bool>
+  operator<=(const simd8<uint8_t> other) const {
+    return other.max_val(*this) == other;
+  }
+  simdutf_really_inline simd8<bool>
+  operator>=(const simd8<uint8_t> other) const {
+    return other.min_val(*this) == other;
+  }
+  simdutf_really_inline simd8<bool>
+  operator>(const simd8<uint8_t> other) const {
+    return this->gt_bits(other).any_bits_set();
+  }
+  simdutf_really_inline simd8<bool>
+  operator<(const simd8<uint8_t> other) const {
+    return this->lt_bits(other).any_bits_set();
+  }
 
   // Bit-specific operations
-  simdutf_really_inline simd16<bool> bits_not_set() const { return *this == uint16_t(0); }
-  simdutf_really_inline simd16<bool> bits_not_set(simd16<uint16_t> bits) const { return (*this & bits).bits_not_set(); }
-  simdutf_really_inline simd16<bool> any_bits_set() const { return ~this->bits_not_set(); }
-  simdutf_really_inline simd16<bool> any_bits_set(simd16<uint16_t> bits) const { return ~this->bits_not_set(bits); }
-
-  simdutf_really_inline bool bits_not_set_anywhere() const { return _mm_testz_si128(*this, *this); }
-  simdutf_really_inline bool any_bits_set_anywhere() const { return !bits_not_set_anywhere(); }
-  simdutf_really_inline bool bits_not_set_anywhere(simd16<uint16_t> bits) const { return _mm_testz_si128(*this, bits); }
-  simdutf_really_inline bool any_bits_set_anywhere(simd16<uint16_t> bits) const { return !bits_not_set_anywhere(bits); }
-  template<int N>
-  simdutf_really_inline simd16<uint16_t> shr() const { return simd16<uint16_t>(_mm_srli_epi16(*this, N)); }
-  template<int N>
-  simdutf_really_inline simd16<uint16_t> shl() const { return simd16<uint16_t>(_mm_slli_epi16(*this, N)); }
-  // Get one of the bits and make a bitmask out of it.
-  // e.g. value.get_bit<7>() gets the high bit
-  template<int N>
-  simdutf_really_inline int get_bit() const { return _mm_movemask_epi8(_mm_slli_epi16(*this, 7-N)); }
-
-  // Change the endianness
-  simdutf_really_inline simd16<uint16_t> swap_bytes() const {
-    const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-    return _mm_shuffle_epi8(*this, swap);
+  simdutf_really_inline simd8<bool> bits_not_set() const {
+    return *this == uint8_t(0);
   }
-
-  // Pack with the unsigned saturation of two uint16_t code units into single uint8_t vector
-  static simdutf_really_inline simd8<uint8_t> pack(const simd16<uint16_t>& v0, const simd16<uint16_t>& v1) {
-    return _mm_packus_epi16(v0, v1);
+  simdutf_really_inline simd8<bool> bits_not_set(simd8<uint8_t> bits) const {
+    return (*this & bits).bits_not_set();
+  }
+  simdutf_really_inline simd8<bool> any_bits_set() const {
+    return ~this->bits_not_set();
+  }
+  simdutf_really_inline simd8<bool> any_bits_set(simd8<uint8_t> bits) const {
+    return ~this->bits_not_set(bits);
+  }
+  simdutf_really_inline bool is_ascii() const {
+    return _mm_movemask_epi8(*this) == 0;
   }
-};
-simdutf_really_inline simd16<int16_t>::operator simd16<uint16_t>() const { return this->value; }
-
-template<typename T>
-  struct simd16x32 {
-    static constexpr int NUM_CHUNKS = 64 / sizeof(simd16<T>);
-    static_assert(NUM_CHUNKS == 4, "Westmere kernel should use four registers per 64-byte block.");
-    simd16<T> chunks[NUM_CHUNKS];
-
-    simd16x32(const simd16x32<T>& o) = delete; // no copy allowed
-    simd16x32<T>& operator=(const simd16<T> other) = delete; // no assignment allowed
-    simd16x32() = delete; // no default constructor allowed
-
-    simdutf_really_inline simd16x32(const simd16<T> chunk0, const simd16<T> chunk1, const simd16<T> chunk2, const simd16<T> chunk3) : chunks{chunk0, chunk1, chunk2, chunk3} {}
-    simdutf_really_inline simd16x32(const T* ptr) : chunks{simd16<T>::load(ptr), simd16<T>::load(ptr+sizeof(simd16<T>)/sizeof(T)), simd16<T>::load(ptr+2*sizeof(simd16<T>)/sizeof(T)), simd16<T>::load(ptr+3*sizeof(simd16<T>)/sizeof(T))} {}
-
-    simdutf_really_inline void store(T* ptr) const {
-      this->chunks[0].store(ptr+sizeof(simd16<T>)*0/sizeof(T));
-      this->chunks[1].store(ptr+sizeof(simd16<T>)*1/sizeof(T));
-      this->chunks[2].store(ptr+sizeof(simd16<T>)*2/sizeof(T));
-      this->chunks[3].store(ptr+sizeof(simd16<T>)*3/sizeof(T));
-    }
-
-    simdutf_really_inline simd16<T> reduce_or() const {
-      return (this->chunks[0] | this->chunks[1]) | (this->chunks[2] | this->chunks[3]);
-    }
-
-    simdutf_really_inline bool is_ascii() const {
-      return this->reduce_or().is_ascii();
-    }
-
-    simdutf_really_inline void store_ascii_as_utf16(char16_t * ptr) const {
-      this->chunks[0].store_ascii_as_utf16(ptr+sizeof(simd16<T>)*0);
-      this->chunks[1].store_ascii_as_utf16(ptr+sizeof(simd16<T>)*1);
-      this->chunks[2].store_ascii_as_utf16(ptr+sizeof(simd16<T>)*2);
-      this->chunks[3].store_ascii_as_utf16(ptr+sizeof(simd16<T>)*3);
-    }
-
-    simdutf_really_inline uint64_t to_bitmask() const {
-      uint64_t r0 = uint32_t(this->chunks[0].to_bitmask());
-      uint64_t r1 =          this->chunks[1].to_bitmask();
-      uint64_t r2 =          this->chunks[2].to_bitmask();
-      uint64_t r3 =          this->chunks[3].to_bitmask();
-      return r0 | (r1 << 16) | (r2 << 32) | (r3 << 48);
-    }
-
-    simdutf_really_inline void swap_bytes() {
-      this->chunks[0] = this->chunks[0].swap_bytes();
-      this->chunks[1] = this->chunks[1].swap_bytes();
-      this->chunks[2] = this->chunks[2].swap_bytes();
-      this->chunks[3] = this->chunks[3].swap_bytes();
-    }
-
-    simdutf_really_inline uint64_t eq(const T m) const {
-      const simd16<T> mask = simd16<T>::splat(m);
-      return  simd16x32<bool>(
-        this->chunks[0] == mask,
-        this->chunks[1] == mask,
-        this->chunks[2] == mask,
-        this->chunks[3] == mask
-      ).to_bitmask();
-    }
-
-    simdutf_really_inline uint64_t eq(const simd16x32<uint16_t> &other) const {
-      return  simd16x32<bool>(
-        this->chunks[0] == other.chunks[0],
-        this->chunks[1] == other.chunks[1],
-        this->chunks[2] == other.chunks[2],
-        this->chunks[3] == other.chunks[3]
-      ).to_bitmask();
-    }
-
-    simdutf_really_inline uint64_t lteq(const T m) const {
-      const simd16<T> mask = simd16<T>::splat(m);
-      return  simd16x32<bool>(
-        this->chunks[0] <= mask,
-        this->chunks[1] <= mask,
-        this->chunks[2] <= mask,
-        this->chunks[3] <= mask
-      ).to_bitmask();
-    }
-
-    simdutf_really_inline uint64_t in_range(const T low, const T high) const {
-      const simd16<T> mask_low = simd16<T>::splat(low);
-      const simd16<T> mask_high = simd16<T>::splat(high);
-
-      return  simd16x32<bool>(
-        (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
-        (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low),
-        (this->chunks[2] <= mask_high) & (this->chunks[2] >= mask_low),
-        (this->chunks[3] <= mask_high) & (this->chunks[3] >= mask_low)
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
-      const simd16<T> mask_low = simd16<T>::splat(static_cast<T>(low-1));
-      const simd16<T> mask_high = simd16<T>::splat(static_cast<T>(high+1));
-      return simd16x32<bool>(
-        (this->chunks[0] >= mask_high) | (this->chunks[0] <= mask_low),
-        (this->chunks[1] >= mask_high) | (this->chunks[1] <= mask_low),
-        (this->chunks[2] >= mask_high) | (this->chunks[2] <= mask_low),
-        (this->chunks[3] >= mask_high) | (this->chunks[3] <= mask_low)
-      ).to_bitmask();
-    }
-    simdutf_really_inline uint64_t lt(const T m) const {
-      const simd16<T> mask = simd16<T>::splat(m);
-      return  simd16x32<bool>(
-        this->chunks[0] < mask,
-        this->chunks[1] < mask,
-        this->chunks[2] < mask,
-        this->chunks[3] < mask
-      ).to_bitmask();
-    }
-  }; // struct simd16x32<T>
-/* end file src/simdutf/westmere/simd16-inl.h */
 
-} // namespace simd
-} // unnamed namespace
-} // namespace westmere
-} // namespace simdutf
+  simdutf_really_inline bool bits_not_set_anywhere() const {
+    return _mm_testz_si128(*this, *this);
+  }
+  simdutf_really_inline bool any_bits_set_anywhere() const {
+    return !bits_not_set_anywhere();
+  }
+  simdutf_really_inline bool bits_not_set_anywhere(simd8<uint8_t> bits) const {
+    return _mm_testz_si128(*this, bits);
+  }
+  simdutf_really_inline bool any_bits_set_anywhere(simd8<uint8_t> bits) const {
+    return !bits_not_set_anywhere(bits);
+  }
+  template <int N> simdutf_really_inline simd8<uint8_t> shr() const {
+    return simd8<uint8_t>(_mm_srli_epi16(*this, N)) & uint8_t(0xFFu >> N);
+  }
+  template <int N> simdutf_really_inline simd8<uint8_t> shl() const {
+    return simd8<uint8_t>(_mm_slli_epi16(*this, N)) & uint8_t(0xFFu << N);
+  }
+  // Get one of the bits and make a bitmask out of it.
+  // e.g. value.get_bit<7>() gets the high bit
+  template <int N> simdutf_really_inline int get_bit() const {
+    return _mm_movemask_epi8(_mm_slli_epi16(*this, 7 - N));
+  }
+};
+simdutf_really_inline simd8<int8_t>::operator simd8<uint8_t>() const {
+  return this->value;
+}
 
-#endif // SIMDUTF_WESTMERE_SIMD_INPUT_H
-/* end file src/simdutf/westmere/simd.h */
+// Unsigned 16-bit code units (eight lanes per register)
+template <> struct simd8<uint16_t> : base<uint16_t> {
+  static simdutf_really_inline simd8<uint16_t> splat(uint16_t _value) {
+    return _mm_set1_epi16(_value);
+  }
+  static simdutf_really_inline simd8<uint16_t> load(const uint16_t values[8]) {
+    return _mm_loadu_si128(reinterpret_cast<const __m128i *>(values));
+  }
 
-/* begin file src/simdutf/westmere/end.h */
-#if SIMDUTF_CAN_ALWAYS_RUN_WESTMERE
-// nothing needed.
-#else
-SIMDUTF_UNTARGET_REGION
-#endif
+  simdutf_really_inline simd8() : base<uint16_t>() {}
+  simdutf_really_inline simd8(const __m128i _value) : base<uint16_t>(_value) {}
+  // Splat constructor
+  simdutf_really_inline simd8(uint16_t _value) : simd8(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd8(const uint16_t *values) : simd8(load(values)) {}
+  // Member-by-member initialization
+  simdutf_really_inline simd8(uint16_t v0, uint16_t v1, uint16_t v2,
+                              uint16_t v3, uint16_t v4, uint16_t v5,
+                              uint16_t v6, uint16_t v7)
+      : simd8(_mm_setr_epi16(v0, v1, v2, v3, v4, v5, v6, v7)) {}
 
-/* end file src/simdutf/westmere/end.h */
+  // Saturated math
+  simdutf_really_inline simd8<uint16_t>
+  saturating_add(const simd8<uint16_t> other) const {
+    return _mm_adds_epu16(*this, other);
+  }
+  simdutf_really_inline simd8<uint16_t>
+  saturating_sub(const simd8<uint16_t> other) const {
+    return _mm_subs_epu16(*this, other);
+  }
 
-#endif // SIMDUTF_IMPLEMENTATION_WESTMERE
-#endif // SIMDUTF_WESTMERE_COMMON_H
-/* end file src/simdutf/westmere.h */
-/* begin file src/simdutf/ppc64.h */
-#ifndef SIMDUTF_PPC64_H
-#define SIMDUTF_PPC64_H
+  // Order-specific operations
+  simdutf_really_inline simd8<uint16_t>
+  max_val(const simd8<uint16_t> other) const {
+    return _mm_max_epu16(*this, other);
+  }
+  simdutf_really_inline simd8<uint16_t>
+  min_val(const simd8<uint16_t> other) const {
+    return _mm_min_epu16(*this, other);
+  }
+  // Same as >, but only guarantees true is nonzero (> guarantees true = -1)
+  simdutf_really_inline simd8<uint16_t>
+  gt_bits(const simd8<uint16_t> other) const {
+    return this->saturating_sub(other);
+  }
+  // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
+  simdutf_really_inline simd8<uint16_t>
+  lt_bits(const simd8<uint16_t> other) const {
+    return other.saturating_sub(*this);
+  }
+  simdutf_really_inline simd8<bool>
+  operator<=(const simd8<uint16_t> other) const {
+    return other.max_val(*this) == other;
+  }
+  simdutf_really_inline simd8<bool>
+  operator>=(const simd8<uint16_t> other) const {
+    return other.min_val(*this) == other;
+  }
+  simdutf_really_inline simd8<bool>
+  operator==(const simd8<uint16_t> other) const {
+    return _mm_cmpeq_epi16(*this, other);
+  }
+  simdutf_really_inline simd8<bool>
+  operator&(const simd8<uint16_t> other) const {
+    return _mm_and_si128(*this, other);
+  }
+  simdutf_really_inline simd8<bool>
+  operator|(const simd8<uint16_t> other) const {
+    return _mm_or_si128(*this, other);
+  }
 
-#ifdef SIMDUTF_FALLBACK_H
-#error "ppc64.h must be included before fallback.h"
-#endif
+  // Bit-specific operations
+  simdutf_really_inline simd8<bool> bits_not_set() const {
+    return *this == uint16_t(0);
+  }
+  simdutf_really_inline simd8<bool> any_bits_set() const {
+    return ~this->bits_not_set();
+  }
 
+  simdutf_really_inline bool bits_not_set_anywhere() const {
+    return _mm_testz_si128(*this, *this);
+  }
+  simdutf_really_inline bool any_bits_set_anywhere() const {
+    return !bits_not_set_anywhere();
+  }
+  simdutf_really_inline bool bits_not_set_anywhere(simd8<uint16_t> bits) const {
+    return _mm_testz_si128(*this, bits);
+  }
+  simdutf_really_inline bool any_bits_set_anywhere(simd8<uint16_t> bits) const {
+    return !bits_not_set_anywhere(bits);
+  }
+};
+template <typename T> struct simd8x64 {
+  static constexpr int NUM_CHUNKS = 64 / sizeof(simd8<T>);
+  static_assert(NUM_CHUNKS == 4,
+                "Westmere kernel should use four registers per 64-byte block.");
+  simd8<T> chunks[NUM_CHUNKS];
 
-#ifndef SIMDUTF_IMPLEMENTATION_PPC64
-#define SIMDUTF_IMPLEMENTATION_PPC64 (SIMDUTF_IS_PPC64)
-#endif
-#define SIMDUTF_CAN_ALWAYS_RUN_PPC64 SIMDUTF_IMPLEMENTATION_PPC64 && SIMDUTF_IS_PPC64
+  simd8x64(const simd8x64<T> &o) = delete; // no copy allowed
+  simd8x64<T> &
+  operator=(const simd8<T> other) = delete; // no assignment allowed
+  simd8x64() = delete;                      // no default constructor allowed
 
+  simdutf_really_inline simd8x64(const simd8<T> chunk0, const simd8<T> chunk1,
+                                 const simd8<T> chunk2, const simd8<T> chunk3)
+      : chunks{chunk0, chunk1, chunk2, chunk3} {}
+  simdutf_really_inline simd8x64(const T *ptr)
+      : chunks{simd8<T>::load(ptr),
+               simd8<T>::load(ptr + sizeof(simd8<T>) / sizeof(T)),
+               simd8<T>::load(ptr + 2 * sizeof(simd8<T>) / sizeof(T)),
+               simd8<T>::load(ptr + 3 * sizeof(simd8<T>) / sizeof(T))} {}
 
+  simdutf_really_inline void store(T *ptr) const {
+    this->chunks[0].store(ptr + sizeof(simd8<T>) * 0 / sizeof(T));
+    this->chunks[1].store(ptr + sizeof(simd8<T>) * 1 / sizeof(T));
+    this->chunks[2].store(ptr + sizeof(simd8<T>) * 2 / sizeof(T));
+    this->chunks[3].store(ptr + sizeof(simd8<T>) * 3 / sizeof(T));
+  }
 
-#if SIMDUTF_IMPLEMENTATION_PPC64
+  simdutf_really_inline simd8x64<T> &operator|=(const simd8x64<T> &other) {
+    this->chunks[0] |= other.chunks[0];
+    this->chunks[1] |= other.chunks[1];
+    this->chunks[2] |= other.chunks[2];
+    this->chunks[3] |= other.chunks[3];
+    return *this;
+  }
 
-namespace simdutf {
-/**
- * Implementation for ALTIVEC (PPC64).
- */
-namespace ppc64 {
-} // namespace ppc64
-} // namespace simdutf
+  simdutf_really_inline simd8<T> reduce_or() const {
+    return (this->chunks[0] | this->chunks[1]) |
+           (this->chunks[2] | this->chunks[3]);
+  }
 
-/* begin file src/simdutf/ppc64/implementation.h */
-#ifndef SIMDUTF_PPC64_IMPLEMENTATION_H
-#define SIMDUTF_PPC64_IMPLEMENTATION_H
+  simdutf_really_inline bool is_ascii() const {
+    return this->reduce_or().is_ascii();
+  }
 
+  template <endianness endian>
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *ptr) const {
+    this->chunks[0].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 0);
+    this->chunks[1].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 1);
+    this->chunks[2].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 2);
+    this->chunks[3].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 3);
+  }
 
-namespace simdutf {
-namespace ppc64 {
+  simdutf_really_inline void store_ascii_as_utf32(char32_t *ptr) const {
+    this->chunks[0].store_ascii_as_utf32(ptr + sizeof(simd8<T>) * 0);
+    this->chunks[1].store_ascii_as_utf32(ptr + sizeof(simd8<T>) * 1);
+    this->chunks[2].store_ascii_as_utf32(ptr + sizeof(simd8<T>) * 2);
+    this->chunks[3].store_ascii_as_utf32(ptr + sizeof(simd8<T>) * 3);
+  }
 
-namespace {
-using namespace simdutf;
-} // namespace
+  simdutf_really_inline uint64_t to_bitmask() const {
+    uint64_t r0 = uint32_t(this->chunks[0].to_bitmask());
+    uint64_t r1 = this->chunks[1].to_bitmask();
+    uint64_t r2 = this->chunks[2].to_bitmask();
+    uint64_t r3 = this->chunks[3].to_bitmask();
+    return r0 | (r1 << 16) | (r2 << 32) | (r3 << 48);
+  }
 
-class implementation final : public simdutf::implementation {
-public:
-  simdutf_really_inline implementation()
-      : simdutf::implementation("ppc64", "PPC64 ALTIVEC",
-                                 internal::instruction_set::ALTIVEC) {}
-  simdutf_warn_unused int detect_encodings(const char * input, size_t length) const noexcept final;
-  simdutf_warn_unused bool validate_utf8(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_ascii(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16le(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16be(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf32(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  void change_endianness_utf16(const char16_t * buf, size_t length, char16_t * output) const noexcept final;
-  simdutf_warn_unused size_t count_utf16le(const char16_t * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t count_utf16be(const char16_t * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t count_utf8(const char * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused size_t base64_length_from_binary(size_t length, base64_options options) const noexcept;
-  size_t binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept;
-};
+  simdutf_really_inline uint64_t eq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] == mask, this->chunks[1] == mask,
+                          this->chunks[2] == mask, this->chunks[3] == mask)
+        .to_bitmask();
+  }
 
-} // namespace ppc64
-} // namespace simdutf
+  simdutf_really_inline uint64_t eq(const simd8x64<uint8_t> &other) const {
+    return simd8x64<bool>(this->chunks[0] == other.chunks[0],
+                          this->chunks[1] == other.chunks[1],
+                          this->chunks[2] == other.chunks[2],
+                          this->chunks[3] == other.chunks[3])
+        .to_bitmask();
+  }
 
-#endif // SIMDUTF_PPC64_IMPLEMENTATION_H
-/* end file src/simdutf/ppc64/implementation.h */
+  simdutf_really_inline uint64_t lteq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] <= mask, this->chunks[1] <= mask,
+                          this->chunks[2] <= mask, this->chunks[3] <= mask)
+        .to_bitmask();
+  }
 
-/* begin file src/simdutf/ppc64/begin.h */
-// redefining SIMDUTF_IMPLEMENTATION to "ppc64"
-// #define SIMDUTF_IMPLEMENTATION ppc64
-/* end file src/simdutf/ppc64/begin.h */
+  simdutf_really_inline uint64_t in_range(const T low, const T high) const {
+    const simd8<T> mask_low = simd8<T>::splat(low);
+    const simd8<T> mask_high = simd8<T>::splat(high);
+
+    return simd8x64<bool>(
+               (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
+               (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low),
+               (this->chunks[2] <= mask_high) & (this->chunks[2] >= mask_low),
+               (this->chunks[3] <= mask_high) & (this->chunks[3] >= mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
+    const simd8<T> mask_low = simd8<T>::splat(low - 1);
+    const simd8<T> mask_high = simd8<T>::splat(high + 1);
+    return simd8x64<bool>(
+               (this->chunks[0] >= mask_high) | (this->chunks[0] <= mask_low),
+               (this->chunks[1] >= mask_high) | (this->chunks[1] <= mask_low),
+               (this->chunks[2] >= mask_high) | (this->chunks[2] <= mask_low),
+               (this->chunks[3] >= mask_high) | (this->chunks[3] <= mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t lt(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] < mask, this->chunks[1] < mask,
+                          this->chunks[2] < mask, this->chunks[3] < mask)
+        .to_bitmask();
+  }
 
-// Declarations
-/* begin file src/simdutf/ppc64/intrinsics.h */
-#ifndef SIMDUTF_PPC64_INTRINSICS_H
-#define SIMDUTF_PPC64_INTRINSICS_H
+  simdutf_really_inline uint64_t gt(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] > mask, this->chunks[1] > mask,
+                          this->chunks[2] > mask, this->chunks[3] > mask)
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t gteq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] >= mask, this->chunks[1] >= mask,
+                          this->chunks[2] >= mask, this->chunks[3] >= mask)
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t gteq_unsigned(const uint8_t m) const {
+    const simd8<uint8_t> mask = simd8<uint8_t>::splat(m);
+    return simd8x64<bool>(simd8<uint8_t>(__m128i(this->chunks[0])) >= mask,
+                          simd8<uint8_t>(__m128i(this->chunks[1])) >= mask,
+                          simd8<uint8_t>(__m128i(this->chunks[2])) >= mask,
+                          simd8<uint8_t>(__m128i(this->chunks[3])) >= mask)
+        .to_bitmask();
+  }
+}; // struct simd8x64<T>
 
+/* begin file src/simdutf/westmere/simd16-inl.h */
+template <typename T> struct simd16;
 
-// This should be the correct header whether
-// you use visual studio or other compilers.
-#include <altivec.h>
+template <typename T, typename Mask = simd16<bool>>
+struct base16 : base<simd16<T>> {
+  typedef uint16_t bitmask_t;
+  typedef uint32_t bitmask2_t;
 
-// These are defined by altivec.h in GCC toolchain, it is safe to undef them.
-#ifdef bool
-#undef bool
-#endif
+  simdutf_really_inline base16() : base<simd16<T>>() {}
+  simdutf_really_inline base16(const __m128i _value)
+      : base<simd16<T>>(_value) {}
+  template <typename Pointer>
+  simdutf_really_inline base16(const Pointer *ptr)
+      : base16(_mm_loadu_si128(reinterpret_cast<const __m128i *>(ptr))) {}
 
-#ifdef vector
-#undef vector
-#endif
+  friend simdutf_really_inline Mask operator==(const simd16<T> lhs,
+                                               const simd16<T> rhs) {
+    return _mm_cmpeq_epi16(lhs, rhs);
+  }
 
-#endif //  SIMDUTF_PPC64_INTRINSICS_H
-/* end file src/simdutf/ppc64/intrinsics.h */
-/* begin file src/simdutf/ppc64/bitmanipulation.h */
-#ifndef SIMDUTF_PPC64_BITMANIPULATION_H
-#define SIMDUTF_PPC64_BITMANIPULATION_H
+  static const int SIZE = sizeof(base<simd16<T>>::value);
 
-namespace simdutf {
-namespace ppc64 {
-namespace {
+  template <int N = 1>
+  simdutf_really_inline simd16<T> prev(const simd16<T> prev_chunk) const {
+    return _mm_alignr_epi8(*this, prev_chunk, 16 - N);
+  }
+};
 
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-simdutf_really_inline int count_ones(uint64_t input_num) {
-  // note: we do not support legacy 32-bit Windows
-  return __popcnt64(input_num); // Visual Studio wants two underscores
-}
-#else
-simdutf_really_inline int count_ones(uint64_t input_num) {
-  return __builtin_popcountll(input_num);
-}
-#endif
+// SIMD 16-bit lane mask type (returned by things like eq and gt)
+template <> struct simd16<bool> : base16<bool> {
+  static simdutf_really_inline simd16<bool> splat(bool _value) {
+    return _mm_set1_epi16(uint16_t(-(!!_value)));
+  }
 
-} // unnamed namespace
-} // namespace ppc64
-} // namespace simdutf
+  simdutf_really_inline simd16() : base16() {}
+  simdutf_really_inline simd16(const __m128i _value) : base16<bool>(_value) {}
+  // Splat constructor
+  simdutf_really_inline simd16(bool _value) : base16<bool>(splat(_value)) {}
 
-#endif // SIMDUTF_PPC64_BITMANIPULATION_H
-/* end file src/simdutf/ppc64/bitmanipulation.h */
-/* begin file src/simdutf/ppc64/simd.h */
-#ifndef SIMDUTF_PPC64_SIMD_H
-#define SIMDUTF_PPC64_SIMD_H
+  simdutf_really_inline int to_bitmask() const {
+    return _mm_movemask_epi8(*this);
+  }
+  simdutf_really_inline bool any() const {
+    return !_mm_testz_si128(*this, *this);
+  }
+  simdutf_really_inline simd16<bool> operator~() const { return *this ^ true; }
+};
 
-#include <type_traits>
+template <typename T> struct base16_numeric : base16<T> {
+  static simdutf_really_inline simd16<T> splat(T _value) {
+    return _mm_set1_epi16(_value);
+  }
+  static simdutf_really_inline simd16<T> zero() { return _mm_setzero_si128(); }
+  static simdutf_really_inline simd16<T> load(const T values[8]) {
+    return _mm_loadu_si128(reinterpret_cast<const __m128i *>(values));
+  }
 
-namespace simdutf {
-namespace ppc64 {
-namespace {
-namespace simd {
-
-using __m128i = __vector unsigned char;
-
-template <typename Child> struct base {
-  __m128i value;
-
-  // Zero constructor
-  simdutf_really_inline base() : value{__m128i()} {}
-
-  // Conversion from SIMD register
-  simdutf_really_inline base(const __m128i _value) : value(_value) {}
-
-  // Conversion to SIMD register
-  simdutf_really_inline operator const __m128i &() const {
-    return this->value;
-  }
-  simdutf_really_inline operator __m128i &() { return this->value; }
-
-  // Bit operations
-  simdutf_really_inline Child operator|(const Child other) const {
-    return vec_or(this->value, (__m128i)other);
-  }
-  simdutf_really_inline Child operator&(const Child other) const {
-    return vec_and(this->value, (__m128i)other);
-  }
-  simdutf_really_inline Child operator^(const Child other) const {
-    return vec_xor(this->value, (__m128i)other);
-  }
-  simdutf_really_inline Child bit_andnot(const Child other) const {
-    return vec_andc(this->value, (__m128i)other);
-  }
-  simdutf_really_inline Child &operator|=(const Child other) {
-    auto this_cast = static_cast<Child*>(this);
-    *this_cast = *this_cast | other;
-    return *this_cast;
-  }
-  simdutf_really_inline Child &operator&=(const Child other) {
-    auto this_cast = static_cast<Child*>(this);
-    *this_cast = *this_cast & other;
-    return *this_cast;
-  }
-  simdutf_really_inline Child &operator^=(const Child other) {
-    auto this_cast = static_cast<Child*>(this);
-    *this_cast = *this_cast ^ other;
-    return *this_cast;
-  }
-};
-
-// Forward-declared so they can be used by splat and friends.
-template <typename T> struct simd8;
-
-template <typename T, typename Mask = simd8<bool>>
-struct base8 : base<simd8<T>> {
-  typedef uint16_t bitmask_t;
-  typedef uint32_t bitmask2_t;
-
-  simdutf_really_inline base8() : base<simd8<T>>() {}
-  simdutf_really_inline base8(const __m128i _value) : base<simd8<T>>(_value) {}
-
-  friend simdutf_really_inline Mask operator==(const simd8<T> lhs, const simd8<T> rhs) {
-    return (__m128i)vec_cmpeq(lhs.value, (__m128i)rhs);
-  }
-
-  static const int SIZE = sizeof(base<simd8<T>>::value);
-
-  template <int N = 1>
-  simdutf_really_inline simd8<T> prev(simd8<T> prev_chunk) const {
-    __m128i chunk = this->value;
-#ifdef __LITTLE_ENDIAN__
-    chunk = (__m128i)vec_reve(this->value);
-    prev_chunk = (__m128i)vec_reve((__m128i)prev_chunk);
-#endif
-    chunk = (__m128i)vec_sld((__m128i)prev_chunk, (__m128i)chunk, 16 - N);
-#ifdef __LITTLE_ENDIAN__
-    chunk = (__m128i)vec_reve((__m128i)chunk);
-#endif
-    return chunk;
-  }
-};
-
-// SIMD byte mask type (returned by things like eq and gt)
-template <> struct simd8<bool> : base8<bool> {
-  static simdutf_really_inline simd8<bool> splat(bool _value) {
-    return (__m128i)vec_splats((unsigned char)(-(!!_value)));
-  }
-
-  simdutf_really_inline simd8() : base8() {}
-  simdutf_really_inline simd8(const __m128i _value)
-      : base8<bool>(_value) {}
-  // Splat constructor
-  simdutf_really_inline simd8(bool _value)
-      : base8<bool>(splat(_value)) {}
-
-  simdutf_really_inline int to_bitmask() const {
-    __vector unsigned long long result;
-    const __m128i perm_mask = {0x78, 0x70, 0x68, 0x60, 0x58, 0x50, 0x48, 0x40,
-                               0x38, 0x30, 0x28, 0x20, 0x18, 0x10, 0x08, 0x00};
-
-    result = ((__vector unsigned long long)vec_vbpermq((__m128i)this->value,
-                                                       (__m128i)perm_mask));
-#ifdef __LITTLE_ENDIAN__
-    return static_cast<int>(result[1]);
-#else
-    return static_cast<int>(result[0]);
-#endif
-  }
-  simdutf_really_inline bool any() const {
-    return !vec_all_eq(this->value, (__m128i)vec_splats(0));
-  }
-  simdutf_really_inline simd8<bool> operator~() const {
-    return this->value ^ (__m128i)splat(true);
-  }
-};
-
-template <typename T> struct base8_numeric : base8<T> {
-  static simdutf_really_inline simd8<T> splat(T value) {
-    (void)value;
-    return (__m128i)vec_splats(value);
-  }
-  static simdutf_really_inline simd8<T> zero() { return splat(0); }
-  static simdutf_really_inline simd8<T> load(const T values[16]) {
-    return (__m128i)(vec_vsx_ld(0, reinterpret_cast<const uint8_t *>(values)));
-  }
-  // Repeat 16 values as many times as necessary (usually for lookup tables)
-  static simdutf_really_inline simd8<T> repeat_16(T v0, T v1, T v2, T v3, T v4,
-                                                   T v5, T v6, T v7, T v8, T v9,
-                                                   T v10, T v11, T v12, T v13,
-                                                   T v14, T v15) {
-    return simd8<T>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13,
-                    v14, v15);
-  }
-
-  simdutf_really_inline base8_numeric() : base8<T>() {}
-  simdutf_really_inline base8_numeric(const __m128i _value)
-      : base8<T>(_value) {}
+  simdutf_really_inline base16_numeric() : base16<T>() {}
+  simdutf_really_inline base16_numeric(const __m128i _value)
+      : base16<T>(_value) {}
 
   // Store to array
-  simdutf_really_inline void store(T dst[16]) const {
-    vec_vsx_st(this->value, 0, reinterpret_cast<__m128i *>(dst));
+  simdutf_really_inline void store(T dst[8]) const {
+    return _mm_storeu_si128(reinterpret_cast<__m128i *>(dst), *this);
   }
 
   // Override to distinguish from bool version
-  simdutf_really_inline simd8<T> operator~() const { return *this ^ 0xFFu; }
+  simdutf_really_inline simd16<T> operator~() const { return *this ^ 0xFFFFu; }
 
   // Addition/subtraction are the same for signed and unsigned
-  simdutf_really_inline simd8<T> operator+(const simd8<T> other) const {
-    return (__m128i)((__m128i)this->value + (__m128i)other);
+  simdutf_really_inline simd16<T> operator+(const simd16<T> other) const {
+    return _mm_add_epi16(*this, other);
   }
-  simdutf_really_inline simd8<T> operator-(const simd8<T> other) const {
-    return (__m128i)((__m128i)this->value - (__m128i)other);
+  simdutf_really_inline simd16<T> operator-(const simd16<T> other) const {
+    return _mm_sub_epi16(*this, other);
   }
-  simdutf_really_inline simd8<T> &operator+=(const simd8<T> other) {
+  simdutf_really_inline simd16<T> &operator+=(const simd16<T> other) {
     *this = *this + other;
-    return *static_cast<simd8<T> *>(this);
+    return *static_cast<simd16<T> *>(this);
   }
-  simdutf_really_inline simd8<T> &operator-=(const simd8<T> other) {
+  simdutf_really_inline simd16<T> &operator-=(const simd16<T> other) {
     *this = *this - other;
-    return *static_cast<simd8<T> *>(this);
-  }
-
-  // Perform a lookup assuming the value is between 0 and 16 (undefined behavior
-  // for out of range values)
-  template <typename L>
-  simdutf_really_inline simd8<L> lookup_16(simd8<L> lookup_table) const {
-    return (__m128i)vec_perm((__m128i)lookup_table, (__m128i)lookup_table, this->value);
-  }
-
-  template <typename L>
-  simdutf_really_inline simd8<L>
-  lookup_16(L replace0, L replace1, L replace2, L replace3, L replace4,
-            L replace5, L replace6, L replace7, L replace8, L replace9,
-            L replace10, L replace11, L replace12, L replace13, L replace14,
-            L replace15) const {
-    return lookup_16(simd8<L>::repeat_16(
-        replace0, replace1, replace2, replace3, replace4, replace5, replace6,
-        replace7, replace8, replace9, replace10, replace11, replace12,
-        replace13, replace14, replace15));
+    return *static_cast<simd16<T> *>(this);
   }
 };
 
-// Signed bytes
-template <> struct simd8<int8_t> : base8_numeric<int8_t> {
-  simdutf_really_inline simd8() : base8_numeric<int8_t>() {}
-  simdutf_really_inline simd8(const __m128i _value)
-      : base8_numeric<int8_t>(_value) {}
-
+// Signed code units
+template <> struct simd16<int16_t> : base16_numeric<int16_t> {
+  simdutf_really_inline simd16() : base16_numeric<int16_t>() {}
+  simdutf_really_inline simd16(const __m128i _value)
+      : base16_numeric<int16_t>(_value) {}
   // Splat constructor
-  simdutf_really_inline simd8(int8_t _value) : simd8(splat(_value)) {}
+  simdutf_really_inline simd16(int16_t _value) : simd16(splat(_value)) {}
   // Array constructor
-  simdutf_really_inline simd8(const int8_t *values) : simd8(load(values)) {}
+  simdutf_really_inline simd16(const int16_t *values) : simd16(load(values)) {}
+  simdutf_really_inline simd16(const char16_t *values)
+      : simd16(load(reinterpret_cast<const int16_t *>(values))) {}
   // Member-by-member initialization
-  simdutf_really_inline simd8(int8_t v0, int8_t v1, int8_t v2, int8_t v3,
-                               int8_t v4, int8_t v5, int8_t v6, int8_t v7,
-                               int8_t v8, int8_t v9, int8_t v10, int8_t v11,
-                               int8_t v12, int8_t v13, int8_t v14, int8_t v15)
-      : simd8((__m128i)(__vector signed char){v0, v1, v2, v3, v4, v5, v6, v7,
-                                              v8, v9, v10, v11, v12, v13, v14,
-                                              v15}) {}
-  // Repeat 16 values as many times as necessary (usually for lookup tables)
-  simdutf_really_inline static simd8<int8_t>
-  repeat_16(int8_t v0, int8_t v1, int8_t v2, int8_t v3, int8_t v4, int8_t v5,
-            int8_t v6, int8_t v7, int8_t v8, int8_t v9, int8_t v10, int8_t v11,
-            int8_t v12, int8_t v13, int8_t v14, int8_t v15) {
-    return simd8<int8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
-                         v13, v14, v15);
-  }
+  simdutf_really_inline simd16(int16_t v0, int16_t v1, int16_t v2, int16_t v3,
+                               int16_t v4, int16_t v5, int16_t v6, int16_t v7)
+      : simd16(_mm_setr_epi16(v0, v1, v2, v3, v4, v5, v6, v7)) {}
+  simdutf_really_inline operator simd16<uint16_t>() const;
 
   // Order-sensitive comparisons
-  simdutf_really_inline simd8<int8_t>
-  max_val(const simd8<int8_t> other) const {
-    return (__m128i)vec_max((__vector signed char)this->value,
-                            (__vector signed char)(__m128i)other);
+  simdutf_really_inline simd16<int16_t>
+  max_val(const simd16<int16_t> other) const {
+    return _mm_max_epi16(*this, other);
   }
-  simdutf_really_inline simd8<int8_t>
-  min_val(const simd8<int8_t> other) const {
-    return (__m128i)vec_min((__vector signed char)this->value,
-                            (__vector signed char)(__m128i)other);
+  simdutf_really_inline simd16<int16_t>
+  min_val(const simd16<int16_t> other) const {
+    return _mm_min_epi16(*this, other);
   }
-  simdutf_really_inline simd8<bool>
-  operator>(const simd8<int8_t> other) const {
-    return (__m128i)vec_cmpgt((__vector signed char)this->value,
-                              (__vector signed char)(__m128i)other);
+  simdutf_really_inline simd16<bool>
+  operator>(const simd16<int16_t> other) const {
+    return _mm_cmpgt_epi16(*this, other);
   }
-  simdutf_really_inline simd8<bool>
-  operator<(const simd8<int8_t> other) const {
-    return (__m128i)vec_cmplt((__vector signed char)this->value,
-                              (__vector signed char)(__m128i)other);
+  simdutf_really_inline simd16<bool>
+  operator<(const simd16<int16_t> other) const {
+    return _mm_cmpgt_epi16(other, *this);
   }
 };
 
-// Unsigned bytes
-template <> struct simd8<uint8_t> : base8_numeric<uint8_t> {
-  simdutf_really_inline simd8() : base8_numeric<uint8_t>() {}
-  simdutf_really_inline simd8(const __m128i _value)
-      : base8_numeric<uint8_t>(_value) {}
+// Unsigned code units
+template <> struct simd16<uint16_t> : base16_numeric<uint16_t> {
+  simdutf_really_inline simd16() : base16_numeric<uint16_t>() {}
+  simdutf_really_inline simd16(const __m128i _value)
+      : base16_numeric<uint16_t>(_value) {}
+
   // Splat constructor
-  simdutf_really_inline simd8(uint8_t _value) : simd8(splat(_value)) {}
+  simdutf_really_inline simd16(uint16_t _value) : simd16(splat(_value)) {}
   // Array constructor
-  simdutf_really_inline simd8(const uint8_t *values) : simd8(load(values)) {}
+  simdutf_really_inline simd16(const uint16_t *values) : simd16(load(values)) {}
+  simdutf_really_inline simd16(const char16_t *values)
+      : simd16(load(reinterpret_cast<const uint16_t *>(values))) {}
   // Member-by-member initialization
-  simdutf_really_inline
-  simd8(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4, uint8_t v5,
-        uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9, uint8_t v10,
-        uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15)
-      : simd8((__m128i){v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
-                        v13, v14, v15}) {}
+  simdutf_really_inline simd16(uint16_t v0, uint16_t v1, uint16_t v2,
+                               uint16_t v3, uint16_t v4, uint16_t v5,
+                               uint16_t v6, uint16_t v7)
+      : simd16(_mm_setr_epi16(v0, v1, v2, v3, v4, v5, v6, v7)) {}
   // Repeat 16 values as many times as necessary (usually for lookup tables)
-  simdutf_really_inline static simd8<uint8_t>
-  repeat_16(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4,
-            uint8_t v5, uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9,
-            uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14,
-            uint8_t v15) {
-    return simd8<uint8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
-                          v13, v14, v15);
+  simdutf_really_inline static simd16<uint16_t>
+  repeat_16(uint16_t v0, uint16_t v1, uint16_t v2, uint16_t v3, uint16_t v4,
+            uint16_t v5, uint16_t v6, uint16_t v7) {
+    return simd16<uint16_t>(v0, v1, v2, v3, v4, v5, v6, v7);
   }
 
   // Saturated math
-  simdutf_really_inline simd8<uint8_t>
-  saturating_add(const simd8<uint8_t> other) const {
-    return (__m128i)vec_adds(this->value, (__m128i)other);
+  simdutf_really_inline simd16<uint16_t>
+  saturating_add(const simd16<uint16_t> other) const {
+    return _mm_adds_epu16(*this, other);
   }
-  simdutf_really_inline simd8<uint8_t>
-  saturating_sub(const simd8<uint8_t> other) const {
-    return (__m128i)vec_subs(this->value, (__m128i)other);
+  simdutf_really_inline simd16<uint16_t>
+  saturating_sub(const simd16<uint16_t> other) const {
+    return _mm_subs_epu16(*this, other);
   }
 
   // Order-specific operations
-  simdutf_really_inline simd8<uint8_t>
-  max_val(const simd8<uint8_t> other) const {
-    return (__m128i)vec_max(this->value, (__m128i)other);
+  simdutf_really_inline simd16<uint16_t>
+  max_val(const simd16<uint16_t> other) const {
+    return _mm_max_epu16(*this, other);
   }
-  simdutf_really_inline simd8<uint8_t>
-  min_val(const simd8<uint8_t> other) const {
-    return (__m128i)vec_min(this->value, (__m128i)other);
+  simdutf_really_inline simd16<uint16_t>
+  min_val(const simd16<uint16_t> other) const {
+    return _mm_min_epu16(*this, other);
   }
   // Same as >, but only guarantees true is nonzero (< guarantees true = -1)
-  simdutf_really_inline simd8<uint8_t>
-  gt_bits(const simd8<uint8_t> other) const {
+  simdutf_really_inline simd16<uint16_t>
+  gt_bits(const simd16<uint16_t> other) const {
     return this->saturating_sub(other);
   }
   // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
-  simdutf_really_inline simd8<uint8_t>
-  lt_bits(const simd8<uint8_t> other) const {
+  simdutf_really_inline simd16<uint16_t>
+  lt_bits(const simd16<uint16_t> other) const {
     return other.saturating_sub(*this);
   }
-  simdutf_really_inline simd8<bool>
-  operator<=(const simd8<uint8_t> other) const {
+  simdutf_really_inline simd16<bool>
+  operator<=(const simd16<uint16_t> other) const {
     return other.max_val(*this) == other;
   }
-  simdutf_really_inline simd8<bool>
-  operator>=(const simd8<uint8_t> other) const {
+  simdutf_really_inline simd16<bool>
+  operator>=(const simd16<uint16_t> other) const {
     return other.min_val(*this) == other;
   }
-  simdutf_really_inline simd8<bool>
-  operator>(const simd8<uint8_t> other) const {
+  simdutf_really_inline simd16<bool>
+  operator>(const simd16<uint16_t> other) const {
     return this->gt_bits(other).any_bits_set();
   }
-  simdutf_really_inline simd8<bool>
-  operator<(const simd8<uint8_t> other) const {
+  simdutf_really_inline simd16<bool>
+  operator<(const simd16<uint16_t> other) const {
-    return this->gt_bits(other).any_bits_set();
+    return this->lt_bits(other).any_bits_set();
   }
 
   // Bit-specific operations
-  simdutf_really_inline simd8<bool> bits_not_set() const {
-    return (__m128i)vec_cmpeq(this->value, (__m128i)vec_splats(uint8_t(0)));
+  simdutf_really_inline simd16<bool> bits_not_set() const {
+    return *this == uint16_t(0);
   }
-  simdutf_really_inline simd8<bool> bits_not_set(simd8<uint8_t> bits) const {
+  simdutf_really_inline simd16<bool> bits_not_set(simd16<uint16_t> bits) const {
     return (*this & bits).bits_not_set();
   }
-  simdutf_really_inline simd8<bool> any_bits_set() const {
+  simdutf_really_inline simd16<bool> any_bits_set() const {
     return ~this->bits_not_set();
   }
-  simdutf_really_inline simd8<bool> any_bits_set(simd8<uint8_t> bits) const {
+  simdutf_really_inline simd16<bool> any_bits_set(simd16<uint16_t> bits) const {
     return ~this->bits_not_set(bits);
   }
 
-  simdutf_really_inline bool is_ascii() const {
-      return this->saturating_sub(0b01111111u).bits_not_set_anywhere();
-  }
-
   simdutf_really_inline bool bits_not_set_anywhere() const {
-    return vec_all_eq(this->value, (__m128i)vec_splats(0));
+    return _mm_testz_si128(*this, *this);
   }
   simdutf_really_inline bool any_bits_set_anywhere() const {
     return !bits_not_set_anywhere();
   }
-  simdutf_really_inline bool bits_not_set_anywhere(simd8<uint8_t> bits) const {
-    return vec_all_eq(vec_and(this->value, (__m128i)bits),
-                      (__m128i)vec_splats(0));
+  simdutf_really_inline bool
+  bits_not_set_anywhere(simd16<uint16_t> bits) const {
+    return _mm_testz_si128(*this, bits);
   }
-  simdutf_really_inline bool any_bits_set_anywhere(simd8<uint8_t> bits) const {
+  simdutf_really_inline bool
+  any_bits_set_anywhere(simd16<uint16_t> bits) const {
     return !bits_not_set_anywhere(bits);
   }
-  template <int N> simdutf_really_inline simd8<uint8_t> shr() const {
-    return simd8<uint8_t>(
-        (__m128i)vec_sr(this->value, (__m128i)vec_splat_u8(N)));
+  template <int N> simdutf_really_inline simd16<uint16_t> shr() const {
+    return simd16<uint16_t>(_mm_srli_epi16(*this, N));
   }
-  template <int N> simdutf_really_inline simd8<uint8_t> shl() const {
-    return simd8<uint8_t>(
-        (__m128i)vec_sl(this->value, (__m128i)vec_splat_u8(N)));
+  template <int N> simdutf_really_inline simd16<uint16_t> shl() const {
+    return simd16<uint16_t>(_mm_slli_epi16(*this, N));
+  }
+  // Get one of the bits and make a bitmask out of it.
+  // e.g. value.get_bit<7>() gets the high bit
+  template <int N> simdutf_really_inline int get_bit() const {
+    return _mm_movemask_epi8(_mm_slli_epi16(*this, 7 - N));
+  }
+
+  // Change the endianness
+  simdutf_really_inline simd16<uint16_t> swap_bytes() const {
+    const __m128i swap =
+        _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+    return _mm_shuffle_epi8(*this, swap);
+  }
+
+  // Pack two vectors of uint16_t code units into a single uint8_t vector,
+  // using unsigned saturation.
+  static simdutf_really_inline simd8<uint8_t> pack(const simd16<uint16_t> &v0,
+                                                   const simd16<uint16_t> &v1) {
+    return _mm_packus_epi16(v0, v1);
   }
 };
+simdutf_really_inline simd16<int16_t>::operator simd16<uint16_t>() const {
+  return this->value;
+}
 
-template <typename T> struct simd8x64 {
-  static constexpr int NUM_CHUNKS = 64 / sizeof(simd8<T>);
+template <typename T> struct simd16x32 {
+  static constexpr int NUM_CHUNKS = 64 / sizeof(simd16<T>);
   static_assert(NUM_CHUNKS == 4,
-                "PPC64 kernel should use four registers per 64-byte block.");
-  simd8<T> chunks[NUM_CHUNKS];
+                "Westmere kernel should use four registers per 64-byte block.");
+  simd16<T> chunks[NUM_CHUNKS];
 
-  simd8x64(const simd8x64<T> &o) = delete; // no copy allowed
-  simd8x64<T> &
-  operator=(const simd8<T> other) = delete; // no assignment allowed
-  simd8x64() = delete;                      // no default constructor allowed
+  simd16x32(const simd16x32<T> &o) = delete; // no copy allowed
+  simd16x32<T> &
+  operator=(const simd16<T> other) = delete; // no assignment allowed
+  simd16x32() = delete;                      // no default constructor allowed
 
-  simdutf_really_inline simd8x64(const simd8<T> chunk0, const simd8<T> chunk1,
-                                  const simd8<T> chunk2, const simd8<T> chunk3)
+  simdutf_really_inline
+  simd16x32(const simd16<T> chunk0, const simd16<T> chunk1,
+            const simd16<T> chunk2, const simd16<T> chunk3)
       : chunks{chunk0, chunk1, chunk2, chunk3} {}
+  simdutf_really_inline simd16x32(const T *ptr)
+      : chunks{simd16<T>::load(ptr),
+               simd16<T>::load(ptr + sizeof(simd16<T>) / sizeof(T)),
+               simd16<T>::load(ptr + 2 * sizeof(simd16<T>) / sizeof(T)),
+               simd16<T>::load(ptr + 3 * sizeof(simd16<T>) / sizeof(T))} {}
 
-  simdutf_really_inline simd8x64(const T* ptr) : chunks{simd8<T>::load(ptr), simd8<T>::load(ptr+sizeof(simd8<T>)/sizeof(T)), simd8<T>::load(ptr+2*sizeof(simd8<T>)/sizeof(T)), simd8<T>::load(ptr+3*sizeof(simd8<T>)/sizeof(T))} {}
-
-  simdutf_really_inline void store(T* ptr) const {
-    this->chunks[0].store(ptr + sizeof(simd8<T>) * 0/sizeof(T));
-    this->chunks[1].store(ptr + sizeof(simd8<T>) * 1/sizeof(T));
-    this->chunks[2].store(ptr + sizeof(simd8<T>) * 2/sizeof(T));
-    this->chunks[3].store(ptr + sizeof(simd8<T>) * 3/sizeof(T));
+  simdutf_really_inline void store(T *ptr) const {
+    this->chunks[0].store(ptr + sizeof(simd16<T>) * 0 / sizeof(T));
+    this->chunks[1].store(ptr + sizeof(simd16<T>) * 1 / sizeof(T));
+    this->chunks[2].store(ptr + sizeof(simd16<T>) * 2 / sizeof(T));
+    this->chunks[3].store(ptr + sizeof(simd16<T>) * 3 / sizeof(T));
   }
 
-
-  simdutf_really_inline simd8x64<T>& operator |=(const simd8x64<T> &other) {
-      this->chunks[0] |= other.chunks[0];
-      this->chunks[1] |= other.chunks[1];
-      this->chunks[2] |= other.chunks[2];
-      this->chunks[3] |= other.chunks[3];
-      return *this;
-    }
-
-  simdutf_really_inline simd8<T> reduce_or() const {
+  simdutf_really_inline simd16<T> reduce_or() const {
     return (this->chunks[0] | this->chunks[1]) |
            (this->chunks[2] | this->chunks[3]);
   }
 
-
   simdutf_really_inline bool is_ascii() const {
-    return input.reduce_or().is_ascii();
+    return this->reduce_or().is_ascii();
+  }
+
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *ptr) const {
+    this->chunks[0].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 0);
+    this->chunks[1].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 1);
+    this->chunks[2].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 2);
+    this->chunks[3].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 3);
   }
 
   simdutf_really_inline uint64_t to_bitmask() const {
@@ -4627,138 +5141,117 @@ template <typename T> struct simd8x64 {
     return r0 | (r1 << 16) | (r2 << 32) | (r3 << 48);
   }
 
+  simdutf_really_inline void swap_bytes() {
+    this->chunks[0] = this->chunks[0].swap_bytes();
+    this->chunks[1] = this->chunks[1].swap_bytes();
+    this->chunks[2] = this->chunks[2].swap_bytes();
+    this->chunks[3] = this->chunks[3].swap_bytes();
+  }
+
   simdutf_really_inline uint64_t eq(const T m) const {
-    const simd8<T> mask = simd8<T>::splat(m);
-    return simd8x64<bool>(this->chunks[0] == mask, this->chunks[1] == mask,
-                          this->chunks[2] == mask, this->chunks[3] == mask)
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] == mask, this->chunks[1] == mask,
+                           this->chunks[2] == mask, this->chunks[3] == mask)
         .to_bitmask();
   }
 
-  simdutf_really_inline uint64_t eq(const simd8x64<uint8_t> &other) const {
-    return simd8x64<bool>(this->chunks[0] == other.chunks[0],
-                          this->chunks[1] == other.chunks[1],
-                          this->chunks[2] == other.chunks[2],
-                          this->chunks[3] == other.chunks[3])
+  simdutf_really_inline uint64_t eq(const simd16x32<uint16_t> &other) const {
+    return simd16x32<bool>(this->chunks[0] == other.chunks[0],
+                           this->chunks[1] == other.chunks[1],
+                           this->chunks[2] == other.chunks[2],
+                           this->chunks[3] == other.chunks[3])
         .to_bitmask();
   }
 
   simdutf_really_inline uint64_t lteq(const T m) const {
-    const simd8<T> mask = simd8<T>::splat(m);
-    return simd8x64<bool>(this->chunks[0] <= mask, this->chunks[1] <= mask,
-                          this->chunks[2] <= mask, this->chunks[3] <= mask)
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] <= mask, this->chunks[1] <= mask,
+                           this->chunks[2] <= mask, this->chunks[3] <= mask)
         .to_bitmask();
   }
 
   simdutf_really_inline uint64_t in_range(const T low, const T high) const {
-      const simd8<T> mask_low = simd8<T>::splat(low);
-      const simd8<T> mask_high = simd8<T>::splat(high);
-
-      return  simd8x64<bool>(
-        (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
-        (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low),
-        (this->chunks[2] <= mask_high) & (this->chunks[2] >= mask_low),
-        (this->chunks[3] <= mask_high) & (this->chunks[3] >= mask_low)
-      ).to_bitmask();
+    const simd16<T> mask_low = simd16<T>::splat(low);
+    const simd16<T> mask_high = simd16<T>::splat(high);
+
+    return simd16x32<bool>(
+               (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
+               (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low),
+               (this->chunks[2] <= mask_high) & (this->chunks[2] >= mask_low),
+               (this->chunks[3] <= mask_high) & (this->chunks[3] >= mask_low))
+        .to_bitmask();
   }
   simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
-      const simd8<T> mask_low = simd8<T>::splat(low);
-      const simd8<T> mask_high = simd8<T>::splat(high);
-      return  simd8x64<bool>(
-        (this->chunks[0] > mask_high) | (this->chunks[0] < mask_low),
-        (this->chunks[1] > mask_high) | (this->chunks[1] < mask_low),
-        (this->chunks[2] > mask_high) | (this->chunks[2] < mask_low),
-        (this->chunks[3] > mask_high) | (this->chunks[3] < mask_low)
-      ).to_bitmask();
+    const simd16<T> mask_low = simd16<T>::splat(static_cast<T>(low - 1));
+    const simd16<T> mask_high = simd16<T>::splat(static_cast<T>(high + 1));
+    return simd16x32<bool>(
+               (this->chunks[0] >= mask_high) | (this->chunks[0] <= mask_low),
+               (this->chunks[1] >= mask_high) | (this->chunks[1] <= mask_low),
+               (this->chunks[2] >= mask_high) | (this->chunks[2] <= mask_low),
+               (this->chunks[3] >= mask_high) | (this->chunks[3] <= mask_low))
+        .to_bitmask();
   }
   simdutf_really_inline uint64_t lt(const T m) const {
-    const simd8<T> mask = simd8<T>::splat(m);
-    return simd8x64<bool>(this->chunks[0] < mask, this->chunks[1] < mask,
-                          this->chunks[2] < mask, this->chunks[3] < mask)
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] < mask, this->chunks[1] < mask,
+                           this->chunks[2] < mask, this->chunks[3] < mask)
         .to_bitmask();
   }
-
-  simdutf_really_inline uint64_t gt(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] > mask,
-        this->chunks[1] > mask,
-        this->chunks[2] > mask,
-        this->chunks[3] > mask
-      ).to_bitmask();
-  }
-  simdutf_really_inline uint64_t gteq(const T m) const {
-      const simd8<T> mask = simd8<T>::splat(m);
-      return  simd8x64<bool>(
-        this->chunks[0] >= mask,
-        this->chunks[1] >= mask,
-        this->chunks[2] >= mask,
-        this->chunks[3] >= mask
-      ).to_bitmask();
-  }
-  simdutf_really_inline uint64_t gteq_unsigned(const uint8_t m) const {
-      const simd8<uint8_t> mask = simd8<uint8_t>::splat(m);
-      return  simd8x64<bool>(
-        simd8<uint8_t>(this->chunks[0]) >= mask,
-        simd8<uint8_t>(this->chunks[1]) >= mask,
-        simd8<uint8_t>(this->chunks[2]) >= mask,
-        simd8<uint8_t>(this->chunks[3]) >= mask
-      ).to_bitmask();
-  }
-}; // struct simd8x64<T>
+}; // struct simd16x32<T>
+/* end file src/simdutf/westmere/simd16-inl.h */
 
 } // namespace simd
 } // unnamed namespace
-} // namespace ppc64
+} // namespace westmere
 } // namespace simdutf
 
-#endif // SIMDUTF_PPC64_SIMD_INPUT_H
-/* end file src/simdutf/ppc64/simd.h */
+#endif // SIMDUTF_WESTMERE_SIMD_INPUT_H
+/* end file src/simdutf/westmere/simd.h */
 
-/* begin file src/simdutf/ppc64/end.h */
-/* end file src/simdutf/ppc64/end.h */
+/* begin file src/simdutf/westmere/end.h */
+#if SIMDUTF_CAN_ALWAYS_RUN_WESTMERE
+// nothing needed.
+#else
+SIMDUTF_UNTARGET_REGION
+#endif
 
-#endif // SIMDUTF_IMPLEMENTATION_PPC64
+/* end file src/simdutf/westmere/end.h */
 
-#endif // SIMDUTF_PPC64_H
-/* end file src/simdutf/ppc64.h */
-/* begin file src/simdutf/rvv.h */
-#ifndef SIMDUTF_RVV_H
-#define SIMDUTF_RVV_H
+#endif // SIMDUTF_IMPLEMENTATION_WESTMERE
+#endif // SIMDUTF_WESTMERE_COMMON_H
+/* end file src/simdutf/westmere.h */
+/* begin file src/simdutf/ppc64.h */
+#ifndef SIMDUTF_PPC64_H
+#define SIMDUTF_PPC64_H
 
 #ifdef SIMDUTF_FALLBACK_H
-#error "rvv.h must be included before fallback.h"
+  #error "ppc64.h must be included before fallback.h"
 #endif
 
 
-#define SIMDUTF_CAN_ALWAYS_RUN_RVV SIMDUTF_IS_RVV
-
-#ifndef SIMDUTF_IMPLEMENTATION_RVV
-#define SIMDUTF_IMPLEMENTATION_RVV (SIMDUTF_CAN_ALWAYS_RUN_RVV || (SIMDUTF_IS_RISCV64 && SIMDUTF_HAS_RVV_INTRINSICS && SIMDUTF_HAS_RVV_TARGET_REGION))
+#ifndef SIMDUTF_IMPLEMENTATION_PPC64
+  #define SIMDUTF_IMPLEMENTATION_PPC64 (SIMDUTF_IS_PPC64)
 #endif
+#define SIMDUTF_CAN_ALWAYS_RUN_PPC64                                           \
+  SIMDUTF_IMPLEMENTATION_PPC64 && SIMDUTF_IS_PPC64
 
-#if SIMDUTF_IMPLEMENTATION_RVV
 
-#if SIMDUTF_CAN_ALWAYS_RUN_RVV
-#define SIMDUTF_TARGET_RVV
-#else
-#define SIMDUTF_TARGET_RVV SIMDUTF_TARGET_REGION("arch=+v")
-#endif
-#if !SIMDUTF_IS_ZVBB && SIMDUTF_HAS_ZVBB_INTRINSICS
-#define SIMDUTF_TARGET_ZVBB SIMDUTF_TARGET_REGION("arch=+v,+zvbb")
-#endif
+#if SIMDUTF_IMPLEMENTATION_PPC64
 
 namespace simdutf {
-namespace rvv {
-} // namespace rvv
+/**
+ * Implementation for ALTIVEC (PPC64).
+ */
+namespace ppc64 {} // namespace ppc64
 } // namespace simdutf
 
-/* begin file src/simdutf/rvv/implementation.h */
-#ifndef SIMDUTF_RVV_IMPLEMENTATION_H
-#define SIMDUTF_RVV_IMPLEMENTATION_H
+/* begin file src/simdutf/ppc64/implementation.h */
+#ifndef SIMDUTF_PPC64_IMPLEMENTATION_H
+#define SIMDUTF_PPC64_IMPLEMENTATION_H
 
 
 namespace simdutf {
-namespace rvv {
+namespace ppc64 {
 
 namespace {
 using namespace simdutf;
@@ -4767,2088 +5260,3523 @@ using namespace simdutf;
 class implementation final : public simdutf::implementation {
 public:
   simdutf_really_inline implementation()
-      : simdutf::implementation("rvv", "RISC-V Vector Extension",
-                                 internal::instruction_set::RVV)
-      , _supports_zvbb(internal::detect_supported_architectures() & internal::instruction_set::ZVBB)
-  {}
-  simdutf_warn_unused int detect_encodings(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf8(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_ascii(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16le(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16be(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf32(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf8(const char *buf, size_t len, char *utf8_output) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf16le(const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf16be(const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf32(const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_latin1(const char *buf, size_t len, char *latin1_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(const char *buf, size_t len, char *latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(const char *buf, size_t len, char *latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16le(const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16be(const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf32(const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(const char *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_latin1(const char16_t *buf, size_t len, char *latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_latin1(const char16_t *buf, size_t len, char *latin1_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(const char16_t *buf, size_t len, char *latin1_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(const char16_t *buf, size_t len, char *latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(const char16_t *buf, size_t len, char *latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(const char16_t *buf, size_t len, char *latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_latin1(const char32_t *buf, size_t len, char *latin1_output) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len, char *latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(const char32_t *buf, size_t len, char *latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16le(const char32_t *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16be(const char32_t *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(const char32_t *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(const char32_t *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(const char32_t *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(const char32_t *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf32(const char16_t *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf32(const char16_t *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(const char16_t *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(const char16_t *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(const char16_t *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(const char16_t *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
-  void change_endianness_utf16(const char16_t *buf, size_t len, char16_t *output) const noexcept final;
-  simdutf_warn_unused size_t count_utf16le(const char16_t *buf, size_t len) const noexcept;
-  simdutf_warn_unused size_t count_utf16be(const char16_t *buf, size_t len) const noexcept;
-  simdutf_warn_unused size_t count_utf8(const char *buf, size_t len) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t *buf, size_t len) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t *buf, size_t len) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t *buf, size_t len) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t *buf, size_t len) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf8(const char *buf, size_t len) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t *buf, size_t len) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t *buf, size_t len) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf8(const char *buf, size_t len) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf8(const char *buf, size_t len) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf16(size_t len) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf32(size_t len) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_latin1(size_t len) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_latin1(size_t len) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_latin1(const char *buf, size_t len) const noexcept;
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused size_t base64_length_from_binary(size_t length, base64_options options) const noexcept;
-  size_t binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept;
-private:
-  const bool _supports_zvbb;
-
-#if SIMDUTF_IS_ZVBB
-  bool supports_zvbb() const { return true; }
-#elif SIMDUTF_HAS_ZVBB_INTRINSICS
-  bool supports_zvbb() const { return _supports_zvbb; }
-#else
-  bool supports_zvbb() const { return false; }
-#endif
+      : simdutf::implementation("ppc64", "PPC64 ALTIVEC",
+                                internal::instruction_set::ALTIVEC) {}
+  simdutf_warn_unused int detect_encodings(const char *input,
+                                           size_t length) const noexcept final;
+  simdutf_warn_unused bool validate_utf8(const char *buf,
+                                         size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_ascii(const char *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16le(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16le_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16be_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf32(const char32_t *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf32_with_errors(
+      const char32_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16le(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16be(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16le(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16be(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  void change_endianness_utf16(const char16_t *buf, size_t length,
+                               char16_t *output) const noexcept final;
+  simdutf_warn_unused size_t count_utf16le(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf16be(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf8(const char *buf,
+                                        size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16le(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16be(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16le(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16be(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char *input, size_t length) const noexcept;
+  simdutf_warn_unused result base64_to_binary(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused result
+  base64_to_binary(const char16_t *input, size_t length, char *output,
+                   base64_options options,
+                   last_chunk_handling_options last_chunk_options =
+                       last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused size_t base64_length_from_binary(
+      size_t length, base64_options options) const noexcept;
+  size_t binary_to_base64(const char *input, size_t length, char *output,
+                          base64_options options) const noexcept;
 };
 
-} // namespace rvv
+} // namespace ppc64
 } // namespace simdutf
 
-#endif // SIMDUTF_RVV_IMPLEMENTATION_H
-/* end file src/simdutf/rvv/implementation.h */
-/* begin file src/simdutf/rvv/begin.h */
-// redefining SIMDUTF_IMPLEMENTATION to "rvv"
-// #define SIMDUTF_IMPLEMENTATION rvv
+#endif // SIMDUTF_PPC64_IMPLEMENTATION_H
+/* end file src/simdutf/ppc64/implementation.h */
 
-#if SIMDUTF_CAN_ALWAYS_RUN_RVV
-// nothing needed.
-#else
-SIMDUTF_TARGET_RVV
-#endif
-/* end file src/simdutf/rvv/begin.h */
-/* begin file src/simdutf/rvv/intrinsics.h */
-#ifndef SIMDUTF_RVV_INTRINSICS_H
-#define SIMDUTF_RVV_INTRINSICS_H
+/* begin file src/simdutf/ppc64/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "ppc64"
+// #define SIMDUTF_IMPLEMENTATION ppc64
+/* end file src/simdutf/ppc64/begin.h */
 
+  // Declarations
+/* begin file src/simdutf/ppc64/intrinsics.h */
+#ifndef SIMDUTF_PPC64_INTRINSICS_H
+#define SIMDUTF_PPC64_INTRINSICS_H
 
-#include <riscv_vector.h>
 
-#if __riscv_v_intrinsic >= 1000000 ||  __GCC__ >= 14
-#define simdutf_vrgather_u8m1x2(tbl, idx) __riscv_vcreate_v_u8m1_u8m2( \
-        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m2_u8m1(idx, 0), __riscv_vsetvlmax_e8m1()), \
-        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m2_u8m1(idx, 1), __riscv_vsetvlmax_e8m1()));
+// This should be the correct header whether
+// you use Visual Studio or another compiler.
+#include <altivec.h>
 
-#define simdutf_vrgather_u8m1x4(tbl, idx) __riscv_vcreate_v_u8m1_u8m4( \
-        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 0), __riscv_vsetvlmax_e8m1()), \
-        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 1), __riscv_vsetvlmax_e8m1()), \
-        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 2), __riscv_vsetvlmax_e8m1()), \
-        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 3), __riscv_vsetvlmax_e8m1()));
-#else
-// This has worse codegen on gcc
-#define simdutf_vrgather_u8m1x2(tbl, idx) \
-        __riscv_vset_v_u8m1_u8m2(__riscv_vlmul_ext_v_u8m1_u8m2( \
-        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m2_u8m1(idx, 0), __riscv_vsetvlmax_e8m1())), 1, \
-        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m2_u8m1(idx, 1), __riscv_vsetvlmax_e8m1()))
-
-#define simdutf_vrgather_u8m1x4(tbl, idx) \
-        __riscv_vset_v_u8m1_u8m4(__riscv_vset_v_u8m1_u8m4(\
-        __riscv_vset_v_u8m1_u8m4(__riscv_vlmul_ext_v_u8m1_u8m4( \
-        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 0), __riscv_vsetvlmax_e8m1())), 1, \
-        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 1), __riscv_vsetvlmax_e8m1())), 2, \
-        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 2), __riscv_vsetvlmax_e8m1())), 3, \
-        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 3), __riscv_vsetvlmax_e8m1()))
+// altivec.h in the GCC toolchain defines these macros; it is safe to undef them.
+#ifdef bool
+  #undef bool
 #endif
 
-/* Zvbb adds dedicated support for endianness swaps with vrev8, but if we can't
- * use that, we have to emulate it with the standard V extension.
- * Using LMUL=1 vrgathers could be faster than the srl+macc variant, but that
- * would increase register pressure, and vrgather implementations performance
- * varies a lot. */
-enum class simdutf_ByteFlip { NONE, V, ZVBB };
-
-template<simdutf_ByteFlip method>
-simdutf_really_inline static uint16_t simdutf_byteflip(uint16_t v) {
-  if (method != simdutf_ByteFlip::NONE)
-    return (uint16_t)((v*1u) << 8 | (v*1u) >> 8);
-  return v;
-}
-
-#ifdef SIMDUTF_TARGET_ZVBB
-SIMDUTF_UNTARGET_REGION
-SIMDUTF_TARGET_ZVBB
+#ifdef vector
+  #undef vector
 #endif
 
-template<simdutf_ByteFlip method>
-simdutf_really_inline static vuint16m1_t simdutf_byteflip(vuint16m1_t v, size_t vl) {
-#if SIMDUTF_HAS_ZVBB_INTRINSICS
-  if (method == simdutf_ByteFlip::ZVBB)
-    return __riscv_vrev8_v_u16m1(v, vl);
-#endif
-  if (method == simdutf_ByteFlip::V)
-    return __riscv_vmacc_vx_u16m1(__riscv_vsrl_vx_u16m1(v, 8, vl), 0x100, v, vl);
-  return v;
-}
+#endif //  SIMDUTF_PPC64_INTRINSICS_H
+/* end file src/simdutf/ppc64/intrinsics.h */
+/* begin file src/simdutf/ppc64/bitmanipulation.h */
+#ifndef SIMDUTF_PPC64_BITMANIPULATION_H
+#define SIMDUTF_PPC64_BITMANIPULATION_H
 
-template<simdutf_ByteFlip method>
-simdutf_really_inline static vuint16m2_t simdutf_byteflip(vuint16m2_t v, size_t vl) {
-#if SIMDUTF_HAS_ZVBB_INTRINSICS
-  if (method == simdutf_ByteFlip::ZVBB)
-    return __riscv_vrev8_v_u16m2(v, vl);
-#endif
-  if (method == simdutf_ByteFlip::V)
-    return __riscv_vmacc_vx_u16m2(__riscv_vsrl_vx_u16m2(v, 8, vl), 0x100, v, vl);
-  return v;
-}
+namespace simdutf {
+namespace ppc64 {
+namespace {
 
-template<simdutf_ByteFlip method>
-simdutf_really_inline static vuint16m4_t simdutf_byteflip(vuint16m4_t v, size_t vl) {
-#if SIMDUTF_HAS_ZVBB_INTRINSICS
-  if (method == simdutf_ByteFlip::ZVBB)
-    return __riscv_vrev8_v_u16m4(v, vl);
-#endif
-  if (method == simdutf_ByteFlip::V)
-    return __riscv_vmacc_vx_u16m4(__riscv_vsrl_vx_u16m4(v, 8, vl), 0x100, v, vl);
-  return v;
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+simdutf_really_inline int count_ones(uint64_t input_num) {
+  // note: we do not support legacy 32-bit Windows
+  return __popcnt64(input_num); // Visual Studio wants two underscores
 }
-
-template<simdutf_ByteFlip method>
-simdutf_really_inline static vuint16m8_t simdutf_byteflip(vuint16m8_t v, size_t vl) {
-#if SIMDUTF_HAS_ZVBB_INTRINSICS
-  if (method == simdutf_ByteFlip::ZVBB)
-    return __riscv_vrev8_v_u16m8(v, vl);
-#endif
-  if (method == simdutf_ByteFlip::V)
-    return __riscv_vmacc_vx_u16m8(__riscv_vsrl_vx_u16m8(v, 8, vl), 0x100, v, vl);
-  return v;
+#else
+simdutf_really_inline int count_ones(uint64_t input_num) {
+  return __builtin_popcountll(input_num);
 }
-
-#ifdef SIMDUTF_TARGET_ZVBB
-SIMDUTF_UNTARGET_REGION
-SIMDUTF_TARGET_RVV
 #endif
 
-#endif //  SIMDUTF_RVV_INTRINSICS_H
-/* end file src/simdutf/rvv/intrinsics.h */
-/* begin file src/simdutf/rvv/end.h */
-#if SIMDUTF_CAN_ALWAYS_RUN_RVV
-// nothing needed.
-#else
-SIMDUTF_UNTARGET_REGION
-#endif
+} // unnamed namespace
+} // namespace ppc64
+} // namespace simdutf
 
-/* end file src/simdutf/rvv/end.h */
+#endif // SIMDUTF_PPC64_BITMANIPULATION_H
+/* end file src/simdutf/ppc64/bitmanipulation.h */
+/* begin file src/simdutf/ppc64/simd.h */
+#ifndef SIMDUTF_PPC64_SIMD_H
+#define SIMDUTF_PPC64_SIMD_H
 
-#endif // SIMDUTF_IMPLEMENTATION_RVV
+#include <type_traits>
 
-#endif // SIMDUTF_RVV_H
-/* end file src/simdutf/rvv.h */
-/* begin file src/simdutf/fallback.h */
-#ifndef SIMDUTF_FALLBACK_H
-#define SIMDUTF_FALLBACK_H
+namespace simdutf {
+namespace ppc64 {
+namespace {
+namespace simd {
 
+using __m128i = __vector unsigned char;
 
-// Note that fallback.h is always imported last.
+template <typename Child> struct base {
+  __m128i value;
 
-// Default Fallback to on unless a builtin implementation has already been selected.
-#ifndef SIMDUTF_IMPLEMENTATION_FALLBACK
-#if SIMDUTF_CAN_ALWAYS_RUN_ARM64 || SIMDUTF_CAN_ALWAYS_RUN_ICELAKE || SIMDUTF_CAN_ALWAYS_RUN_HASWELL || SIMDUTF_CAN_ALWAYS_RUN_WESTMERE || SIMDUTF_CAN_ALWAYS_RUN_PPC64 || SIMDUTF_CAN_ALWAYS_RUN_RVV
-#define SIMDUTF_IMPLEMENTATION_FALLBACK 0
-#else
-#define SIMDUTF_IMPLEMENTATION_FALLBACK 1
-#endif
-#endif
-
-#define SIMDUTF_CAN_ALWAYS_RUN_FALLBACK (SIMDUTF_IMPLEMENTATION_FALLBACK)
+  // Zero constructor
+  simdutf_really_inline base() : value{__m128i()} {}
 
-#if SIMDUTF_IMPLEMENTATION_FALLBACK
+  // Conversion from SIMD register
+  simdutf_really_inline base(const __m128i _value) : value(_value) {}
 
-namespace simdutf {
-/**
- * Fallback implementation (runs on any machine).
- */
-namespace fallback {
-} // namespace fallback
-} // namespace simdutf
+  // Conversion to SIMD register
+  simdutf_really_inline operator const __m128i &() const { return this->value; }
+  simdutf_really_inline operator __m128i &() { return this->value; }
 
-/* begin file src/simdutf/fallback/implementation.h */
-#ifndef SIMDUTF_FALLBACK_IMPLEMENTATION_H
-#define SIMDUTF_FALLBACK_IMPLEMENTATION_H
+  // Bit operations
+  simdutf_really_inline Child operator|(const Child other) const {
+    return vec_or(this->value, (__m128i)other);
+  }
+  simdutf_really_inline Child operator&(const Child other) const {
+    return vec_and(this->value, (__m128i)other);
+  }
+  simdutf_really_inline Child operator^(const Child other) const {
+    return vec_xor(this->value, (__m128i)other);
+  }
+  simdutf_really_inline Child bit_andnot(const Child other) const {
+    return vec_andc(this->value, (__m128i)other);
+  }
+  simdutf_really_inline Child &operator|=(const Child other) {
+    auto this_cast = static_cast<Child *>(this);
+    *this_cast = *this_cast | other;
+    return *this_cast;
+  }
+  simdutf_really_inline Child &operator&=(const Child other) {
+    auto this_cast = static_cast<Child *>(this);
+    *this_cast = *this_cast & other;
+    return *this_cast;
+  }
+  simdutf_really_inline Child &operator^=(const Child other) {
+    auto this_cast = static_cast<Child *>(this);
+    *this_cast = *this_cast ^ other;
+    return *this_cast;
+  }
+};
 
+// Forward-declared so they can be used by splat and friends.
+template <typename T> struct simd8;
 
-namespace simdutf {
-namespace fallback {
+template <typename T, typename Mask = simd8<bool>>
+struct base8 : base<simd8<T>> {
+  typedef uint16_t bitmask_t;
+  typedef uint32_t bitmask2_t;
 
-namespace {
-using namespace simdutf;
-}
+  simdutf_really_inline base8() : base<simd8<T>>() {}
+  simdutf_really_inline base8(const __m128i _value) : base<simd8<T>>(_value) {}
 
-class implementation final : public simdutf::implementation {
-public:
-  simdutf_really_inline implementation() : simdutf::implementation(
-      "fallback",
-      "Generic fallback implementation",
-      0
-  ) {}
-  simdutf_warn_unused int detect_encodings(const char * input, size_t length) const noexcept final;
-  simdutf_warn_unused bool validate_utf8(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_ascii(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16le(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf16be(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused bool validate_utf32(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused result validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf8(const char * buf, size_t len, char* utf8_output) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf16le(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf16be(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_latin1_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(const char * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(const char * buf, size_t len, char32_t* utf32_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_latin1(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(const char32_t * buf, size_t len, char* latin1_output) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) const noexcept final;
-  void change_endianness_utf16(const char16_t * buf, size_t length, char16_t * output) const noexcept final;
-  simdutf_warn_unused size_t count_utf16le(const char16_t * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t count_utf16be(const char16_t * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t count_utf8(const char * buf, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf8(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf16(size_t length) const noexcept;
-  simdutf_warn_unused size_t latin1_length_from_utf32(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf32_length_from_latin1(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf16_length_from_latin1(size_t length) const noexcept;
-  simdutf_warn_unused size_t utf8_length_from_latin1(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char * input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept;
-  simdutf_warn_unused size_t base64_length_from_binary(size_t length, base64_options options) const noexcept;
-  size_t binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept;
-};
-} // namespace fallback
-} // namespace simdutf
+  friend simdutf_really_inline Mask operator==(const simd8<T> lhs,
+                                               const simd8<T> rhs) {
+    return (__m128i)vec_cmpeq(lhs.value, (__m128i)rhs);
+  }
 
-#endif // SIMDUTF_FALLBACK_IMPLEMENTATION_H
-/* end file src/simdutf/fallback/implementation.h */
+  static const int SIZE = sizeof(base<simd8<T>>::value);
 
-/* begin file src/simdutf/fallback/begin.h */
-// redefining SIMDUTF_IMPLEMENTATION to "fallback"
-// #define SIMDUTF_IMPLEMENTATION fallback
-/* end file src/simdutf/fallback/begin.h */
+  template <int N = 1>
+  simdutf_really_inline simd8<T> prev(simd8<T> prev_chunk) const {
+    __m128i chunk = this->value;
+#ifdef __LITTLE_ENDIAN__
+    chunk = (__m128i)vec_reve(this->value);
+    prev_chunk = (__m128i)vec_reve((__m128i)prev_chunk);
+#endif
+    chunk = (__m128i)vec_sld((__m128i)prev_chunk, (__m128i)chunk, 16 - N);
+#ifdef __LITTLE_ENDIAN__
+    chunk = (__m128i)vec_reve((__m128i)chunk);
+#endif
+    return chunk;
+  }
+};
 
-// Declarations
-/* begin file src/simdutf/fallback/bitmanipulation.h */
-#ifndef SIMDUTF_FALLBACK_BITMANIPULATION_H
-#define SIMDUTF_FALLBACK_BITMANIPULATION_H
+// SIMD byte mask type (returned by things like eq and gt)
+template <> struct simd8<bool> : base8<bool> {
+  static simdutf_really_inline simd8<bool> splat(bool _value) {
+    return (__m128i)vec_splats((unsigned char)(-(!!_value)));
+  }
 
-#include <limits>
+  simdutf_really_inline simd8() : base8() {}
+  simdutf_really_inline simd8(const __m128i _value) : base8<bool>(_value) {}
+  // Splat constructor
+  simdutf_really_inline simd8(bool _value) : base8<bool>(splat(_value)) {}
 
-namespace simdutf {
-namespace fallback {
-namespace {
+  simdutf_really_inline int to_bitmask() const {
+    __vector unsigned long long result;
+    const __m128i perm_mask = {0x78, 0x70, 0x68, 0x60, 0x58, 0x50, 0x48, 0x40,
+                               0x38, 0x30, 0x28, 0x20, 0x18, 0x10, 0x08, 0x00};
 
-} // unnamed namespace
-} // namespace fallback
-} // namespace simdutf
+    result = ((__vector unsigned long long)vec_vbpermq((__m128i)this->value,
+                                                       (__m128i)perm_mask));
+#ifdef __LITTLE_ENDIAN__
+    return static_cast<int>(result[1]);
+#else
+    return static_cast<int>(result[0]);
+#endif
+  }
+  simdutf_really_inline bool any() const {
+    return !vec_all_eq(this->value, (__m128i)vec_splats(0));
+  }
+  simdutf_really_inline simd8<bool> operator~() const {
+    return this->value ^ (__m128i)splat(true);
+  }
+};
 
-#endif // SIMDUTF_FALLBACK_BITMANIPULATION_H
-/* end file src/simdutf/fallback/bitmanipulation.h */
+template <typename T> struct base8_numeric : base8<T> {
+  static simdutf_really_inline simd8<T> splat(T value) {
+    (void)value;
+    return (__m128i)vec_splats(value);
+  }
+  static simdutf_really_inline simd8<T> zero() { return splat(0); }
+  static simdutf_really_inline simd8<T> load(const T values[16]) {
+    return (__m128i)(vec_vsx_ld(0, reinterpret_cast<const uint8_t *>(values)));
+  }
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  static simdutf_really_inline simd8<T> repeat_16(T v0, T v1, T v2, T v3, T v4,
+                                                  T v5, T v6, T v7, T v8, T v9,
+                                                  T v10, T v11, T v12, T v13,
+                                                  T v14, T v15) {
+    return simd8<T>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13,
+                    v14, v15);
+  }
 
-/* begin file src/simdutf/fallback/end.h */
-/* end file src/simdutf/fallback/end.h */
+  simdutf_really_inline base8_numeric() : base8<T>() {}
+  simdutf_really_inline base8_numeric(const __m128i _value)
+      : base8<T>(_value) {}
 
-#endif // SIMDUTF_IMPLEMENTATION_FALLBACK
-#endif // SIMDUTF_FALLBACK_H
-/* end file src/simdutf/fallback.h */
+  // Store to array
+  simdutf_really_inline void store(T dst[16]) const {
+    vec_vsx_st(this->value, 0, reinterpret_cast<__m128i *>(dst));
+  }
 
-/* begin file src/scalar/utf8.h */
-#ifndef SIMDUTF_UTF8_H
-#define SIMDUTF_UTF8_H
+  // Override to distinguish from bool version
+  simdutf_really_inline simd8<T> operator~() const { return *this ^ 0xFFu; }
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf8 {
-#if SIMDUTF_IMPLEMENTATION_FALLBACK || SIMDUTF_IMPLEMENTATION_RVV
-// only used by the fallback kernel.
-// credit: based on code from Google Fuchsia (Apache Licensed)
-inline simdutf_warn_unused bool validate(const char *buf, size_t len) noexcept {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  uint64_t pos = 0;
-  uint32_t code_point = 0;
-  while (pos < len) {
-    // check of the next 16 bytes are ascii.
-    uint64_t next_pos = pos + 16;
-    if (next_pos <= len) { // if it is safe to read 16 more bytes, check that they are ascii
-      uint64_t v1;
-      std::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2};
-      if ((v & 0x8080808080808080) == 0) {
-        pos = next_pos;
-        continue;
-      }
-    }
-    unsigned char byte = data[pos];
+  // Addition/subtraction are the same for signed and unsigned
+  simdutf_really_inline simd8<T> operator+(const simd8<T> other) const {
+    return (__m128i)((__m128i)this->value + (__m128i)other);
+  }
+  simdutf_really_inline simd8<T> operator-(const simd8<T> other) const {
+    return (__m128i)((__m128i)this->value - (__m128i)other);
+  }
+  simdutf_really_inline simd8<T> &operator+=(const simd8<T> other) {
+    *this = *this + other;
+    return *static_cast<simd8<T> *>(this);
+  }
+  simdutf_really_inline simd8<T> &operator-=(const simd8<T> other) {
+    *this = *this - other;
+    return *static_cast<simd8<T> *>(this);
+  }
 
-    while (byte < 0b10000000) {
-      if (++pos == len) { return true; }
-      byte = data[pos];
-    }
+  // Perform a lookup assuming the value is between 0 and 15 (undefined behavior
+  // for out-of-range values)
+  template <typename L>
+  simdutf_really_inline simd8<L> lookup_16(simd8<L> lookup_table) const {
+    return (__m128i)vec_perm((__m128i)lookup_table, (__m128i)lookup_table,
+                             this->value);
+  }
 
-    if ((byte & 0b11100000) == 0b11000000) {
-      next_pos = pos + 2;
-      if (next_pos > len) { return false; }
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return false; }
-      // range check
-      code_point = (byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
-      if ((code_point < 0x80) || (0x7ff < code_point)) { return false; }
-    } else if ((byte & 0b11110000) == 0b11100000) {
-      next_pos = pos + 3;
-      if (next_pos > len) { return false; }
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return false; }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) { return false; }
-      // range check
-      code_point = (byte & 0b00001111) << 12 |
-                   (data[pos + 1] & 0b00111111) << 6 |
-                   (data[pos + 2] & 0b00111111);
-      if ((code_point < 0x800) || (0xffff < code_point) ||
-          (0xd7ff < code_point && code_point < 0xe000)) {
-        return false;
-      }
-    } else if ((byte & 0b11111000) == 0b11110000) { // 0b11110000
-      next_pos = pos + 4;
-      if (next_pos > len) { return false; }
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return false; }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) { return false; }
-      if ((data[pos + 3] & 0b11000000) != 0b10000000) { return false; }
-      // range check
-      code_point =
-          (byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 |
-          (data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111);
-      if (code_point <= 0xffff || 0x10ffff < code_point) { return false; }
-    } else {
-      // we may have a continuation
-      return false;
-    }
-    pos = next_pos;
+  template <typename L>
+  simdutf_really_inline simd8<L>
+  lookup_16(L replace0, L replace1, L replace2, L replace3, L replace4,
+            L replace5, L replace6, L replace7, L replace8, L replace9,
+            L replace10, L replace11, L replace12, L replace13, L replace14,
+            L replace15) const {
+    return lookup_16(simd8<L>::repeat_16(
+        replace0, replace1, replace2, replace3, replace4, replace5, replace6,
+        replace7, replace8, replace9, replace10, replace11, replace12,
+        replace13, replace14, replace15));
   }
-  return true;
-}
-#endif
+};
 
-inline simdutf_warn_unused result validate_with_errors(const char *buf, size_t len) noexcept {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  size_t pos = 0;
-  uint32_t code_point = 0;
-  while (pos < len) {
-    // check of the next 16 bytes are ascii.
-    size_t next_pos = pos + 16;
-    if (next_pos <= len) { // if it is safe to read 16 more bytes, check that they are ascii
-      uint64_t v1;
-      std::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2};
-      if ((v & 0x8080808080808080) == 0) {
-        pos = next_pos;
-        continue;
-      }
-    }
-    unsigned char byte = data[pos];
+// Signed bytes
+template <> struct simd8<int8_t> : base8_numeric<int8_t> {
+  simdutf_really_inline simd8() : base8_numeric<int8_t>() {}
+  simdutf_really_inline simd8(const __m128i _value)
+      : base8_numeric<int8_t>(_value) {}
 
-    while (byte < 0b10000000) {
-      if (++pos == len) { return result(error_code::SUCCESS, len); }
-      byte = data[pos];
-    }
+  // Splat constructor
+  simdutf_really_inline simd8(int8_t _value) : simd8(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd8(const int8_t *values) : simd8(load(values)) {}
+  // Member-by-member initialization
+  simdutf_really_inline simd8(int8_t v0, int8_t v1, int8_t v2, int8_t v3,
+                              int8_t v4, int8_t v5, int8_t v6, int8_t v7,
+                              int8_t v8, int8_t v9, int8_t v10, int8_t v11,
+                              int8_t v12, int8_t v13, int8_t v14, int8_t v15)
+      : simd8((__m128i)(__vector signed char){v0, v1, v2, v3, v4, v5, v6, v7,
+                                              v8, v9, v10, v11, v12, v13, v14,
+                                              v15}) {}
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  simdutf_really_inline static simd8<int8_t>
+  repeat_16(int8_t v0, int8_t v1, int8_t v2, int8_t v3, int8_t v4, int8_t v5,
+            int8_t v6, int8_t v7, int8_t v8, int8_t v9, int8_t v10, int8_t v11,
+            int8_t v12, int8_t v13, int8_t v14, int8_t v15) {
+    return simd8<int8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                         v13, v14, v15);
+  }
 
-    if ((byte & 0b11100000) == 0b11000000) {
-      next_pos = pos + 2;
-      if (next_pos > len) { return result(error_code::TOO_SHORT, pos); }
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
-      // range check
-      code_point = (byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
-      if ((code_point < 0x80) || (0x7ff < code_point)) { return result(error_code::OVERLONG, pos); }
-    } else if ((byte & 0b11110000) == 0b11100000) {
-      next_pos = pos + 3;
-      if (next_pos > len) { return result(error_code::TOO_SHORT, pos); }
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
-      // range check
-      code_point = (byte & 0b00001111) << 12 |
-                   (data[pos + 1] & 0b00111111) << 6 |
-                   (data[pos + 2] & 0b00111111);
-      if ((code_point < 0x800) || (0xffff < code_point)) { return result(error_code::OVERLONG, pos);}
-      if (0xd7ff < code_point && code_point < 0xe000) { return result(error_code::SURROGATE, pos); }
-    } else if ((byte & 0b11111000) == 0b11110000) { // 0b11110000
-      next_pos = pos + 4;
-      if (next_pos > len) { return result(error_code::TOO_SHORT, pos); }
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
-      if ((data[pos + 3] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
-      // range check
-      code_point =
-          (byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 |
-          (data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111);
-      if (code_point <= 0xffff) { return result(error_code::OVERLONG, pos); }
-      if (0x10ffff < code_point) { return result(error_code::TOO_LARGE, pos); }
-    } else {
-      // we either have too many continuation bytes or an invalid leading byte
-      if ((byte & 0b11000000) == 0b10000000) { return result(error_code::TOO_LONG, pos); }
-      else { return result(error_code::HEADER_BITS, pos); }
-    }
-    pos = next_pos;
+  // Order-sensitive comparisons
+  simdutf_really_inline simd8<int8_t> max_val(const simd8<int8_t> other) const {
+    return (__m128i)vec_max((__vector signed char)this->value,
+                            (__vector signed char)(__m128i)other);
   }
-  return result(error_code::SUCCESS, len);
-}
+  simdutf_really_inline simd8<int8_t> min_val(const simd8<int8_t> other) const {
+    return (__m128i)vec_min((__vector signed char)this->value,
+                            (__vector signed char)(__m128i)other);
+  }
+  simdutf_really_inline simd8<bool> operator>(const simd8<int8_t> other) const {
+    return (__m128i)vec_cmpgt((__vector signed char)this->value,
+                              (__vector signed char)(__m128i)other);
+  }
+  simdutf_really_inline simd8<bool> operator<(const simd8<int8_t> other) const {
+    return (__m128i)vec_cmplt((__vector signed char)this->value,
+                              (__vector signed char)(__m128i)other);
+  }
+};
 
-// Finds the previous leading byte starting backward from buf and validates with errors from there
-// Used to pinpoint the location of an error when an invalid chunk is detected
-// We assume that the stream starts with a leading byte, and to check that it is the case, we
-// ask that you pass a pointer to the start of the stream (start).
-inline simdutf_warn_unused result rewind_and_validate_with_errors(const char *start, const char *buf, size_t len) noexcept {
-    // First check that we start with a leading byte
-  if ((*start & 0b11000000) == 0b10000000) {
-    return result(error_code::TOO_LONG, 0);
+// Unsigned bytes
+template <> struct simd8<uint8_t> : base8_numeric<uint8_t> {
+  simdutf_really_inline simd8() : base8_numeric<uint8_t>() {}
+  simdutf_really_inline simd8(const __m128i _value)
+      : base8_numeric<uint8_t>(_value) {}
+  // Splat constructor
+  simdutf_really_inline simd8(uint8_t _value) : simd8(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd8(const uint8_t *values) : simd8(load(values)) {}
+  // Member-by-member initialization
+  simdutf_really_inline
+  simd8(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4, uint8_t v5,
+        uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9, uint8_t v10,
+        uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15)
+      : simd8((__m128i){v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                        v13, v14, v15}) {}
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  simdutf_really_inline static simd8<uint8_t>
+  repeat_16(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4,
+            uint8_t v5, uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9,
+            uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14,
+            uint8_t v15) {
+    return simd8<uint8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                          v13, v14, v15);
   }
-  size_t extra_len{0};
-  // A leading byte cannot be further than 4 bytes away
-  for(int i = 0; i < 5; i++) {
-    unsigned char byte = *buf;
-    if ((byte & 0b11000000) != 0b10000000) {
-      break;
-    } else {
-      buf--;
-      extra_len++;
-    }
+
+  // Saturated math
+  simdutf_really_inline simd8<uint8_t>
+  saturating_add(const simd8<uint8_t> other) const {
+    return (__m128i)vec_adds(this->value, (__m128i)other);
+  }
+  simdutf_really_inline simd8<uint8_t>
+  saturating_sub(const simd8<uint8_t> other) const {
+    return (__m128i)vec_subs(this->value, (__m128i)other);
   }
 
-  result res = validate_with_errors(buf, len + extra_len);
-  res.count -= extra_len;
-  return res;
-}
+  // Order-specific operations
+  simdutf_really_inline simd8<uint8_t>
+  max_val(const simd8<uint8_t> other) const {
+    return (__m128i)vec_max(this->value, (__m128i)other);
+  }
+  simdutf_really_inline simd8<uint8_t>
+  min_val(const simd8<uint8_t> other) const {
+    return (__m128i)vec_min(this->value, (__m128i)other);
+  }
+  // Same as >, but only guarantees true is nonzero (> guarantees true = -1)
+  simdutf_really_inline simd8<uint8_t>
+  gt_bits(const simd8<uint8_t> other) const {
+    return this->saturating_sub(other);
+  }
+  // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
+  simdutf_really_inline simd8<uint8_t>
+  lt_bits(const simd8<uint8_t> other) const {
+    return other.saturating_sub(*this);
+  }
+  simdutf_really_inline simd8<bool>
+  operator<=(const simd8<uint8_t> other) const {
+    return other.max_val(*this) == other;
+  }
+  simdutf_really_inline simd8<bool>
+  operator>=(const simd8<uint8_t> other) const {
+    return other.min_val(*this) == other;
+  }
+  simdutf_really_inline simd8<bool>
+  operator>(const simd8<uint8_t> other) const {
+    return this->gt_bits(other).any_bits_set();
+  }
+  simdutf_really_inline simd8<bool>
+  operator<(const simd8<uint8_t> other) const {
+    return this->lt_bits(other).any_bits_set();
+  }
 
-inline size_t count_code_points(const char* buf, size_t len) {
-    const int8_t * p = reinterpret_cast<const int8_t *>(buf);
-    size_t counter{0};
-    for(size_t i = 0; i < len; i++) {
-        // -65 is 0b10111111, anything larger in two-complement's should start a new code point.
-        if(p[i] > -65) { counter++; }
-    }
-    return counter;
-}
+  // Bit-specific operations
+  simdutf_really_inline simd8<bool> bits_not_set() const {
+    return (__m128i)vec_cmpeq(this->value, (__m128i)vec_splats(uint8_t(0)));
+  }
+  simdutf_really_inline simd8<bool> bits_not_set(simd8<uint8_t> bits) const {
+    return (*this & bits).bits_not_set();
+  }
+  simdutf_really_inline simd8<bool> any_bits_set() const {
+    return ~this->bits_not_set();
+  }
+  simdutf_really_inline simd8<bool> any_bits_set(simd8<uint8_t> bits) const {
+    return ~this->bits_not_set(bits);
+  }
 
-inline size_t utf16_length_from_utf8(const char* buf, size_t len) {
-    const int8_t * p = reinterpret_cast<const int8_t *>(buf);
-    size_t counter{0};
-    for(size_t i = 0; i < len; i++) {
-        if(p[i] > -65) { counter++; }
-        if(uint8_t(p[i]) >= 240) { counter++; }
-    }
-    return counter;
-}
+  simdutf_really_inline bool is_ascii() const {
+    return this->saturating_sub(0b01111111u).bits_not_set_anywhere();
+  }
 
-simdutf_warn_unused inline size_t trim_partial_utf8(const char *input, size_t length) {
-  if (length < 3) {
-    switch (length) {
-      case 2:
-        if (uint8_t(input[length-1]) >= 0xc0) { return length-1; } // 2-, 3- and 4-byte characters with only 1 byte left
-        if (uint8_t(input[length-2]) >= 0xe0) { return length-2; } // 3- and 4-byte characters with only 2 bytes left
-        return length;
-      case 1:
-        if (uint8_t(input[length-1]) >= 0xc0) { return length-1; } // 2-, 3- and 4-byte characters with only 1 byte left
-        return length;
-      case 0:
-        return length;
-    }
-  }
-  if (uint8_t(input[length-1]) >= 0xc0) { return length-1; } // 2-, 3- and 4-byte characters with only 1 byte left
-  if (uint8_t(input[length-2]) >= 0xe0) { return length-2; } // 3- and 4-byte characters with only 1 byte left
-  if (uint8_t(input[length-3]) >= 0xf0) { return length-3; } // 4-byte characters with only 3 bytes left
-  return length;
-}
+  simdutf_really_inline bool bits_not_set_anywhere() const {
+    return vec_all_eq(this->value, (__m128i)vec_splats(0));
+  }
+  simdutf_really_inline bool any_bits_set_anywhere() const {
+    return !bits_not_set_anywhere();
+  }
+  simdutf_really_inline bool bits_not_set_anywhere(simd8<uint8_t> bits) const {
+    return vec_all_eq(vec_and(this->value, (__m128i)bits),
+                      (__m128i)vec_splats(0));
+  }
+  simdutf_really_inline bool any_bits_set_anywhere(simd8<uint8_t> bits) const {
+    return !bits_not_set_anywhere(bits);
+  }
+  template <int N> simdutf_really_inline simd8<uint8_t> shr() const {
+    return simd8<uint8_t>(
+        (__m128i)vec_sr(this->value, (__m128i)vec_splat_u8(N)));
+  }
+  template <int N> simdutf_really_inline simd8<uint8_t> shl() const {
+    return simd8<uint8_t>(
+        (__m128i)vec_sl(this->value, (__m128i)vec_splat_u8(N)));
+  }
+};
 
-} // utf8 namespace
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+template <typename T> struct simd8x64 {
+  static constexpr int NUM_CHUNKS = 64 / sizeof(simd8<T>);
+  static_assert(NUM_CHUNKS == 4,
+                "PPC64 kernel should use four registers per 64-byte block.");
+  simd8<T> chunks[NUM_CHUNKS];
 
-#endif
-/* end file src/scalar/utf8.h */
-/* begin file src/scalar/utf16.h */
-#ifndef SIMDUTF_UTF16_H
-#define SIMDUTF_UTF16_H
+  simd8x64(const simd8x64<T> &o) = delete; // no copy allowed
+  simd8x64<T> &
+  operator=(const simd8<T> other) = delete; // no assignment allowed
+  simd8x64() = delete;                      // no default constructor allowed
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf16 {
+  simdutf_really_inline simd8x64(const simd8<T> chunk0, const simd8<T> chunk1,
+                                 const simd8<T> chunk2, const simd8<T> chunk3)
+      : chunks{chunk0, chunk1, chunk2, chunk3} {}
 
-inline simdutf_warn_unused uint16_t swap_bytes(const uint16_t word) {
-  return uint16_t((word >> 8) | (word << 8));
-}
+  simdutf_really_inline simd8x64(const T *ptr)
+      : chunks{simd8<T>::load(ptr),
+               simd8<T>::load(ptr + sizeof(simd8<T>) / sizeof(T)),
+               simd8<T>::load(ptr + 2 * sizeof(simd8<T>) / sizeof(T)),
+               simd8<T>::load(ptr + 3 * sizeof(simd8<T>) / sizeof(T))} {}
 
-template <endianness big_endian>
-inline simdutf_warn_unused bool validate(const char16_t *buf, size_t len) noexcept {
-  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
-  uint64_t pos = 0;
-  while (pos < len) {
-    uint16_t word = !match_system(big_endian) ? swap_bytes(data[pos]) : data[pos];
-    if((word &0xF800) == 0xD800) {
-        if(pos + 1 >= len) { return false; }
-        uint16_t diff = uint16_t(word - 0xD800);
-        if(diff > 0x3FF) { return false; }
-        uint16_t next_word = !match_system(big_endian) ? swap_bytes(data[pos + 1]) : data[pos + 1];
-        uint16_t diff2 = uint16_t(next_word - 0xDC00);
-        if(diff2 > 0x3FF) { return false; }
-        pos += 2;
-    } else {
-        pos++;
-    }
+  simdutf_really_inline void store(T *ptr) const {
+    this->chunks[0].store(ptr + sizeof(simd8<T>) * 0 / sizeof(T));
+    this->chunks[1].store(ptr + sizeof(simd8<T>) * 1 / sizeof(T));
+    this->chunks[2].store(ptr + sizeof(simd8<T>) * 2 / sizeof(T));
+    this->chunks[3].store(ptr + sizeof(simd8<T>) * 3 / sizeof(T));
   }
-  return true;
-}
 
-template <endianness big_endian>
-inline simdutf_warn_unused result validate_with_errors(const char16_t *buf, size_t len) noexcept {
-  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
-  size_t pos = 0;
-  while (pos < len) {
-    uint16_t word = !match_system(big_endian) ? swap_bytes(data[pos]) : data[pos];
-    if((word & 0xF800) == 0xD800) {
-        if(pos + 1 >= len) { return result(error_code::SURROGATE, pos); }
-        uint16_t diff = uint16_t(word - 0xD800);
-        if(diff > 0x3FF) { return result(error_code::SURROGATE, pos); }
-        uint16_t next_word = !match_system(big_endian) ? swap_bytes(data[pos + 1]) : data[pos + 1];
-        uint16_t diff2 = uint16_t(next_word - 0xDC00);
-        if(diff2 > 0x3FF) { return result(error_code::SURROGATE, pos); }
-        pos += 2;
-    } else {
-        pos++;
-    }
+  simdutf_really_inline simd8x64<T> &operator|=(const simd8x64<T> &other) {
+    this->chunks[0] |= other.chunks[0];
+    this->chunks[1] |= other.chunks[1];
+    this->chunks[2] |= other.chunks[2];
+    this->chunks[3] |= other.chunks[3];
+    return *this;
   }
-  return result(error_code::SUCCESS, pos);
-}
 
-template <endianness big_endian>
-inline size_t count_code_points(const char16_t* buf, size_t len) {
-  // We are not BOM aware.
-  const uint16_t * p = reinterpret_cast<const uint16_t *>(buf);
-  size_t counter{0};
-  for(size_t i = 0; i < len; i++) {
-    uint16_t word = !match_system(big_endian) ? swap_bytes(p[i]) : p[i];
-    counter += ((word & 0xFC00) != 0xDC00);
+  simdutf_really_inline simd8<T> reduce_or() const {
+    return (this->chunks[0] | this->chunks[1]) |
+           (this->chunks[2] | this->chunks[3]);
   }
-  return counter;
-}
 
-template <endianness big_endian>
-inline size_t utf8_length_from_utf16(const char16_t* buf, size_t len) {
-  // We are not BOM aware.
-  const uint16_t * p = reinterpret_cast<const uint16_t *>(buf);
-  size_t counter{0};
-  for(size_t i = 0; i < len; i++) {
-    uint16_t word = !match_system(big_endian) ? swap_bytes(p[i]) : p[i];
-    counter++;                                      // ASCII
-    counter += static_cast<size_t>(word > 0x7F);    // non-ASCII is at least 2 bytes, surrogates are 2*2 == 4 bytes
-    counter += static_cast<size_t>((word > 0x7FF && word <= 0xD7FF) || (word >= 0xE000));   // three-byte
+  simdutf_really_inline bool is_ascii() const {
+    return this->reduce_or().is_ascii();
   }
-  return counter;
-}
 
-template <endianness big_endian>
-inline size_t utf32_length_from_utf16(const char16_t* buf, size_t len) {
-  // We are not BOM aware.
-  const uint16_t * p = reinterpret_cast<const uint16_t *>(buf);
-  size_t counter{0};
-  for(size_t i = 0; i < len; i++) {
-    uint16_t word = !match_system(big_endian) ? swap_bytes(p[i]) : p[i];
-    counter += ((word & 0xFC00) != 0xDC00);
+  simdutf_really_inline uint64_t to_bitmask() const {
+    uint64_t r0 = uint32_t(this->chunks[0].to_bitmask());
+    uint64_t r1 = this->chunks[1].to_bitmask();
+    uint64_t r2 = this->chunks[2].to_bitmask();
+    uint64_t r3 = this->chunks[3].to_bitmask();
+    return r0 | (r1 << 16) | (r2 << 32) | (r3 << 48);
   }
-  return counter;
-}
 
+  simdutf_really_inline uint64_t eq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] == mask, this->chunks[1] == mask,
+                          this->chunks[2] == mask, this->chunks[3] == mask)
+        .to_bitmask();
+  }
 
-inline size_t latin1_length_from_utf16(size_t len) {
-  return len;
-}
+  simdutf_really_inline uint64_t eq(const simd8x64<uint8_t> &other) const {
+    return simd8x64<bool>(this->chunks[0] == other.chunks[0],
+                          this->chunks[1] == other.chunks[1],
+                          this->chunks[2] == other.chunks[2],
+                          this->chunks[3] == other.chunks[3])
+        .to_bitmask();
+  }
 
-simdutf_really_inline void change_endianness_utf16(const char16_t* in, size_t size, char16_t* out) {
-  const uint16_t * input = reinterpret_cast<const uint16_t *>(in);
-  uint16_t * output = reinterpret_cast<uint16_t *>(out);
-  for (size_t i = 0; i < size; i++) {
-    *output++ = uint16_t(input[i] >> 8 | input[i] << 8);
+  simdutf_really_inline uint64_t lteq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] <= mask, this->chunks[1] <= mask,
+                          this->chunks[2] <= mask, this->chunks[3] <= mask)
+        .to_bitmask();
   }
-}
 
+  simdutf_really_inline uint64_t in_range(const T low, const T high) const {
+    const simd8<T> mask_low = simd8<T>::splat(low);
+    const simd8<T> mask_high = simd8<T>::splat(high);
+
+    return simd8x64<bool>(
+               (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
+               (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low),
+               (this->chunks[2] <= mask_high) & (this->chunks[2] >= mask_low),
+               (this->chunks[3] <= mask_high) & (this->chunks[3] >= mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
+    const simd8<T> mask_low = simd8<T>::splat(low);
+    const simd8<T> mask_high = simd8<T>::splat(high);
+    return simd8x64<bool>(
+               (this->chunks[0] > mask_high) | (this->chunks[0] < mask_low),
+               (this->chunks[1] > mask_high) | (this->chunks[1] < mask_low),
+               (this->chunks[2] > mask_high) | (this->chunks[2] < mask_low),
+               (this->chunks[3] > mask_high) | (this->chunks[3] < mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t lt(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] < mask, this->chunks[1] < mask,
+                          this->chunks[2] < mask, this->chunks[3] < mask)
+        .to_bitmask();
+  }
 
-template <endianness big_endian>
-simdutf_warn_unused inline size_t trim_partial_utf16(const char16_t* input, size_t length) {
-  if (length <= 1) {
-    return length;
+  simdutf_really_inline uint64_t gt(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] > mask, this->chunks[1] > mask,
+                          this->chunks[2] > mask, this->chunks[3] > mask)
+        .to_bitmask();
   }
-  uint16_t last_word = uint16_t(input[length-1]);
-  last_word = !match_system(big_endian) ? swap_bytes(last_word) : last_word;
-  length -= ((last_word & 0xFC00) == 0xD800);
-  return length;
-}
+  simdutf_really_inline uint64_t gteq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] >= mask, this->chunks[1] >= mask,
+                          this->chunks[2] >= mask, this->chunks[3] >= mask)
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t gteq_unsigned(const uint8_t m) const {
+    const simd8<uint8_t> mask = simd8<uint8_t>::splat(m);
+    return simd8x64<bool>(simd8<uint8_t>(this->chunks[0]) >= mask,
+                          simd8<uint8_t>(this->chunks[1]) >= mask,
+                          simd8<uint8_t>(this->chunks[2]) >= mask,
+                          simd8<uint8_t>(this->chunks[3]) >= mask)
+        .to_bitmask();
+  }
+}; // struct simd8x64<T>
 
-} // utf16 namespace
+} // namespace simd
 } // unnamed namespace
-} // namespace scalar
+} // namespace ppc64
 } // namespace simdutf
 
+#endif // SIMDUTF_PPC64_SIMD_INPUT_H
+/* end file src/simdutf/ppc64/simd.h */
+
+/* begin file src/simdutf/ppc64/end.h */
+/* end file src/simdutf/ppc64/end.h */
+
+#endif // SIMDUTF_IMPLEMENTATION_PPC64
+
+#endif // SIMDUTF_PPC64_H
+/* end file src/simdutf/ppc64.h */
+/* begin file src/simdutf/rvv.h */
+#ifndef SIMDUTF_RVV_H
+#define SIMDUTF_RVV_H
+
+#ifdef SIMDUTF_FALLBACK_H
+  #error "rvv.h must be included before fallback.h"
+#endif
+
+
+#define SIMDUTF_CAN_ALWAYS_RUN_RVV SIMDUTF_IS_RVV
+
+#ifndef SIMDUTF_IMPLEMENTATION_RVV
+  #define SIMDUTF_IMPLEMENTATION_RVV                                           \
+    (SIMDUTF_CAN_ALWAYS_RUN_RVV ||                                             \
+     (SIMDUTF_IS_RISCV64 && SIMDUTF_HAS_RVV_INTRINSICS &&                      \
+      SIMDUTF_HAS_RVV_TARGET_REGION))
 #endif
-/* end file src/scalar/utf16.h */
-/* begin file src/scalar/utf32.h */
-#ifndef SIMDUTF_UTF32_H
-#define SIMDUTF_UTF32_H
+
+#if SIMDUTF_IMPLEMENTATION_RVV
+
+  #if SIMDUTF_CAN_ALWAYS_RUN_RVV
+    #define SIMDUTF_TARGET_RVV
+  #else
+    #define SIMDUTF_TARGET_RVV SIMDUTF_TARGET_REGION("arch=+v")
+  #endif
+  #if !SIMDUTF_IS_ZVBB && SIMDUTF_HAS_ZVBB_INTRINSICS
+    #define SIMDUTF_TARGET_ZVBB SIMDUTF_TARGET_REGION("arch=+v,+zvbb")
+  #endif
 
 namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf32 {
+namespace rvv {} // namespace rvv
+} // namespace simdutf
 
-inline simdutf_warn_unused bool validate(const char32_t *buf, size_t len) noexcept {
-  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  uint64_t pos = 0;
-  for(;pos < len; pos++) {
-    uint32_t word = data[pos];
-    if(word > 0x10FFFF || (word >= 0xD800 && word <= 0xDFFF)) {
-        return false;
-    }
-  }
-  return true;
-}
+/* begin file src/simdutf/rvv/implementation.h */
+#ifndef SIMDUTF_RVV_IMPLEMENTATION_H
+#define SIMDUTF_RVV_IMPLEMENTATION_H
 
-inline simdutf_warn_unused result validate_with_errors(const char32_t *buf, size_t len) noexcept {
-  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  size_t pos = 0;
-  for(;pos < len; pos++) {
-    uint32_t word = data[pos];
-    if(word > 0x10FFFF) {
-        return result(error_code::TOO_LARGE, pos);
-    }
-    if(word >= 0xD800 && word <= 0xDFFF) {
-        return result(error_code::SURROGATE, pos);
-    }
-  }
-  return result(error_code::SUCCESS, pos);
-}
 
-inline size_t utf8_length_from_utf32(const char32_t* buf, size_t len) {
-  // We are not BOM aware.
-  const uint32_t * p = reinterpret_cast<const uint32_t *>(buf);
-  size_t counter{0};
-  for(size_t i = 0; i < len; i++) {
-    // credit: @ttsugriy  for the vectorizable approach
-    counter++;                                      // ASCII
-    counter += static_cast<size_t>(p[i] > 0x7F);    // two-byte
-    counter += static_cast<size_t>(p[i] > 0x7FF);   // three-byte
-    counter += static_cast<size_t>(p[i] > 0xFFFF);  // four-bytes
-  }
-  return counter;
-}
+namespace simdutf {
+namespace rvv {
 
-inline size_t utf16_length_from_utf32(const char32_t* buf, size_t len) {
-  // We are not BOM aware.
-  const uint32_t * p = reinterpret_cast<const uint32_t *>(buf);
-  size_t counter{0};
-  for(size_t i = 0; i < len; i++) {
-    counter++;                                      // non-surrogate word
-    counter += static_cast<size_t>(p[i] > 0xFFFF);  // surrogate pair
-  }
-  return counter;
-}
+namespace {
+using namespace simdutf;
+} // namespace
 
-inline size_t latin1_length_from_utf32(size_t len) {
-  // We are not BOM aware.
-  return len; // a utf32 codepoint will always represent 1 latin1 character
-}
+class implementation final : public simdutf::implementation {
+public:
+  simdutf_really_inline implementation()
+      : simdutf::implementation("rvv", "RISC-V Vector Extension",
+                                internal::instruction_set::RVV),
+        _supports_zvbb(internal::detect_supported_architectures() &
+                       internal::instruction_set::ZVBB) {}
+  simdutf_warn_unused int detect_encodings(const char *buf,
+                                           size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf8(const char *buf,
+                                         size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_ascii(const char *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16le(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16le_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16be_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf32(const char32_t *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf32_with_errors(
+      const char32_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf8(
+      const char *buf, size_t len, char *utf8_output) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+      const char *buf, size_t len, char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                          char *latin1_output) const noexcept final;
+  simdutf_warn_unused result
+  convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                      char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_latin1(const char32_t *buf, size_t len,
+                                char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16le(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16be(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16le(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16be(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  void change_endianness_utf16(const char16_t *buf, size_t len,
+                               char16_t *output) const noexcept final;
+  simdutf_warn_unused size_t count_utf16le(const char16_t *buf,
+                                           size_t len) const noexcept;
+  simdutf_warn_unused size_t count_utf16be(const char16_t *buf,
+                                           size_t len) const noexcept;
+  simdutf_warn_unused size_t count_utf8(const char *buf,
+                                        size_t len) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16le(const char16_t *buf, size_t len) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16be(const char16_t *buf, size_t len) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_utf16le(const char16_t *buf, size_t len) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_utf16be(const char16_t *buf, size_t len) const noexcept;
+  simdutf_warn_unused size_t utf16_length_from_utf8(const char *buf,
+                                                    size_t len) const noexcept;
+  simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t *buf,
+                                                    size_t len) const noexcept;
+  simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t *buf,
+                                                     size_t len) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf8(const char *buf,
+                                                    size_t len) const noexcept;
+  simdutf_warn_unused size_t latin1_length_from_utf8(const char *buf,
+                                                     size_t len) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf16(size_t len) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf32(size_t len) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_latin1(size_t len) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_latin1(size_t len) const noexcept;
+  simdutf_warn_unused size_t utf8_length_from_latin1(const char *buf,
+                                                     size_t len) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char *input, size_t length) const noexcept;
+  simdutf_warn_unused result base64_to_binary(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused result
+  base64_to_binary(const char16_t *input, size_t length, char *output,
+                   base64_options options,
+                   last_chunk_handling_options last_chunk_options =
+                       last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused size_t base64_length_from_binary(
+      size_t length, base64_options options) const noexcept;
+  size_t binary_to_base64(const char *input, size_t length, char *output,
+                          base64_options options) const noexcept;
 
-inline simdutf_warn_unused uint32_t swap_bytes(const uint32_t word) {
-  return ((word >> 24) & 0xff) |      // move byte 3 to byte 0
-         ((word << 8) & 0xff0000) |   // move byte 1 to byte 2
-         ((word >> 8) & 0xff00) |     // move byte 2 to byte 1
-         ((word << 24) & 0xff000000); // byte 0 to byte 3
-}
+private:
+  const bool _supports_zvbb;
 
-} // utf32 namespace
-} // unnamed namespace
-} // namespace scalar
+#if SIMDUTF_IS_ZVBB
+  bool supports_zvbb() const { return true; }
+#elif SIMDUTF_HAS_ZVBB_INTRINSICS
+  bool supports_zvbb() const { return _supports_zvbb; }
+#else
+  bool supports_zvbb() const { return false; }
+#endif
+};
+
+} // namespace rvv
 } // namespace simdutf
 
+#endif // SIMDUTF_RVV_IMPLEMENTATION_H
+/* end file src/simdutf/rvv/implementation.h */
+/* begin file src/simdutf/rvv/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "rvv"
+// #define SIMDUTF_IMPLEMENTATION rvv
+
+#if SIMDUTF_CAN_ALWAYS_RUN_RVV
+// nothing needed.
+#else
+SIMDUTF_TARGET_RVV
 #endif
-/* end file src/scalar/utf32.h */
-/* begin file src/scalar/base64.h */
-#ifndef SIMDUTF_BASE64_H
-#define SIMDUTF_BASE64_H
+/* end file src/simdutf/rvv/begin.h */
+/* begin file src/simdutf/rvv/intrinsics.h */
+#ifndef SIMDUTF_RVV_INTRINSICS_H
+#define SIMDUTF_RVV_INTRINSICS_H
 
-#include <cstddef>
-#include <cstdint>
-#include <cstring>
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace base64 {
 
-// This function is not expected to be fast. Do not use in long loops.
-template <class char_type>
-bool is_ascii_white_space(char_type c) {
-  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
+#include <riscv_vector.h>
+
+#if __riscv_v_intrinsic >= 1000000 || __GNUC__ >= 14
+  #define simdutf_vrgather_u8m1x2(tbl, idx)                                    \
+    __riscv_vcreate_v_u8m1_u8m2(                                               \
+        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m2_u8m1(idx, 0),        \
+                                 __riscv_vsetvlmax_e8m1()),                    \
+        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m2_u8m1(idx, 1),        \
+                                 __riscv_vsetvlmax_e8m1()));
+
+  #define simdutf_vrgather_u8m1x4(tbl, idx)                                    \
+    __riscv_vcreate_v_u8m1_u8m4(                                               \
+        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 0),        \
+                                 __riscv_vsetvlmax_e8m1()),                    \
+        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 1),        \
+                                 __riscv_vsetvlmax_e8m1()),                    \
+        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 2),        \
+                                 __riscv_vsetvlmax_e8m1()),                    \
+        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 3),        \
+                                 __riscv_vsetvlmax_e8m1()));
+#else
+  // This has worse codegen on gcc
+  #define simdutf_vrgather_u8m1x2(tbl, idx)                                    \
+    __riscv_vset_v_u8m1_u8m2(                                                  \
+        __riscv_vlmul_ext_v_u8m1_u8m2(__riscv_vrgather_vv_u8m1(                \
+            tbl, __riscv_vget_v_u8m2_u8m1(idx, 0), __riscv_vsetvlmax_e8m1())), \
+        1,                                                                     \
+        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m2_u8m1(idx, 1),        \
+                                 __riscv_vsetvlmax_e8m1()))
+
+  #define simdutf_vrgather_u8m1x4(tbl, idx)                                    \
+    __riscv_vset_v_u8m1_u8m4(                                                  \
+        __riscv_vset_v_u8m1_u8m4(                                              \
+            __riscv_vset_v_u8m1_u8m4(                                          \
+                __riscv_vlmul_ext_v_u8m1_u8m4(__riscv_vrgather_vv_u8m1(        \
+                    tbl, __riscv_vget_v_u8m4_u8m1(idx, 0),                     \
+                    __riscv_vsetvlmax_e8m1())),                                \
+                1,                                                             \
+                __riscv_vrgather_vv_u8m1(tbl,                                  \
+                                         __riscv_vget_v_u8m4_u8m1(idx, 1),     \
+                                         __riscv_vsetvlmax_e8m1())),           \
+            2,                                                                 \
+            __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 2),    \
+                                     __riscv_vsetvlmax_e8m1())),               \
+        3,                                                                     \
+        __riscv_vrgather_vv_u8m1(tbl, __riscv_vget_v_u8m4_u8m1(idx, 3),        \
+                                 __riscv_vsetvlmax_e8m1()))
+#endif
+
+/* Zvbb adds dedicated support for endianness swaps with vrev8, but if we can't
+ * use that, we have to emulate it with the standard V extension.
+ * Using LMUL=1 vrgathers could be faster than the srl+macc variant, but that
+ * would increase register pressure, and the performance of vrgather
+ * implementations varies a lot. */
+enum class simdutf_ByteFlip { NONE, V, ZVBB };
+
+template <simdutf_ByteFlip method>
+simdutf_really_inline static uint16_t simdutf_byteflip(uint16_t v) {
+  if (method != simdutf_ByteFlip::NONE)
+    return (uint16_t)((v * 1u) << 8 | (v * 1u) >> 8);
+  return v;
 }
 
-template <class char_type>
-bool is_eight_byte(char_type c) {
-  if(sizeof(char_type) == 1) {
-    return true;
-  }
-  return uint8_t(c) == c;
+#ifdef SIMDUTF_TARGET_ZVBB
+SIMDUTF_UNTARGET_REGION
+SIMDUTF_TARGET_ZVBB
+#endif
+
+template <simdutf_ByteFlip method>
+simdutf_really_inline static vuint16m1_t simdutf_byteflip(vuint16m1_t v,
+                                                          size_t vl) {
+#if SIMDUTF_HAS_ZVBB_INTRINSICS
+  if (method == simdutf_ByteFlip::ZVBB)
+    return __riscv_vrev8_v_u16m1(v, vl);
+#endif
+  if (method == simdutf_ByteFlip::V)
+    return __riscv_vmacc_vx_u16m1(__riscv_vsrl_vx_u16m1(v, 8, vl), 0x100, v,
+                                  vl);
+  return v;
 }
 
-// Returns true upon success. The destination buffer must be large enough.
-// This functions assumes that the padding (=) has been removed.
-template <class char_type>
-result base64_tail_decode(char *dst, const char_type *src, size_t length, base64_options options) {
-  // This looks like 5 branches, but we expect the compiler to resolve this to a single branch:
-  const uint8_t *to_base64 = (options & base64_url) ? tables::base64::to_base64_url_value : tables::base64::to_base64_value;
-  const uint32_t *d0 = (options & base64_url) ? tables::base64::base64_url::d0 : tables::base64::base64_default::d0;
-  const uint32_t *d1 = (options & base64_url) ? tables::base64::base64_url::d1 : tables::base64::base64_default::d1;
-  const uint32_t *d2 = (options & base64_url) ? tables::base64::base64_url::d2 : tables::base64::base64_default::d2;
-  const uint32_t *d3 = (options & base64_url) ? tables::base64::base64_url::d3 : tables::base64::base64_default::d3;
+template <simdutf_ByteFlip method>
+simdutf_really_inline static vuint16m2_t simdutf_byteflip(vuint16m2_t v,
+                                                          size_t vl) {
+#if SIMDUTF_HAS_ZVBB_INTRINSICS
+  if (method == simdutf_ByteFlip::ZVBB)
+    return __riscv_vrev8_v_u16m2(v, vl);
+#endif
+  if (method == simdutf_ByteFlip::V)
+    return __riscv_vmacc_vx_u16m2(__riscv_vsrl_vx_u16m2(v, 8, vl), 0x100, v,
+                                  vl);
+  return v;
+}
 
-  const char_type *srcend = src + length;
-  const char_type *srcinit = src;
-  const char *dstinit = dst;
+template <simdutf_ByteFlip method>
+simdutf_really_inline static vuint16m4_t simdutf_byteflip(vuint16m4_t v,
+                                                          size_t vl) {
+#if SIMDUTF_HAS_ZVBB_INTRINSICS
+  if (method == simdutf_ByteFlip::ZVBB)
+    return __riscv_vrev8_v_u16m4(v, vl);
+#endif
+  if (method == simdutf_ByteFlip::V)
+    return __riscv_vmacc_vx_u16m4(__riscv_vsrl_vx_u16m4(v, 8, vl), 0x100, v,
+                                  vl);
+  return v;
+}
 
-  uint32_t x;
-  size_t idx;
-  uint8_t buffer[4];
-  while (true) {
-    while (src + 4 <= srcend && is_eight_byte(src[0]) && is_eight_byte(src[1]) && is_eight_byte(src[2]) && is_eight_byte(src[3]) &&
-           (x = d0[uint8_t(src[0])] | d1[uint8_t(src[1])] |
-                d2[uint8_t(src[2])] | d3[uint8_t(src[3])]) < 0x01FFFFFF) {
-      if(match_system(endianness::BIG)) {
-        x = scalar::utf32::swap_bytes(x);
-      }
-      std::memcpy(dst, &x, 3); // optimization opportunity: copy 4 bytes
-      dst += 3;
-      src += 4;
-    }
-    idx = 0;
-    // we need at least four characters.
-    while (idx < 4 && src < srcend) {
-      char_type c = *src;
-      uint8_t code = to_base64[uint8_t(c)];
-      buffer[idx] = uint8_t(code);
-      if (is_eight_byte(c) && code <= 63) {
-        idx++;
-      } else if (code > 64 || !scalar::base64::is_eight_byte(c)) {
-        return {INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
-      } else {
-        // We have a space or a newline. We ignore it.
-      }
-      src++;
-    }
-    if (idx != 4) {
-      if (idx == 2) {
-        uint32_t triple =
-            (uint32_t(buffer[0]) << 3 * 6) + (uint32_t(buffer[1]) << 2 * 6);
-        if(match_system(endianness::BIG)) {
-          triple <<= 8;
-          std::memcpy(dst, &triple, 1);
-        } else {
-          triple = scalar::utf32::swap_bytes(triple);
-          triple >>= 8;
-          std::memcpy(dst, &triple, 1);
-        }
-        dst += 1;
-
-      } else if (idx == 3) {
-        uint32_t triple = (uint32_t(buffer[0]) << 3 * 6) +
-                          (uint32_t(buffer[1]) << 2 * 6) +
-                          (uint32_t(buffer[2]) << 1 * 6);
-        if(match_system(endianness::BIG)) {
-          triple <<= 8;
-          std::memcpy(dst, &triple, 2);
-        } else {
-          triple = scalar::utf32::swap_bytes(triple);
-          triple >>= 8;
-          std::memcpy(dst, &triple, 2);
-        }
-        dst += 2;
-      } else if (idx == 1) {
-        return {BASE64_INPUT_REMAINDER, size_t(dst - dstinit)};
-      }
-      return {SUCCESS, size_t(dst - dstinit)};
-    }
-
-    uint32_t triple =
-        (uint32_t(buffer[0]) << 3 * 6) + (uint32_t(buffer[1]) << 2 * 6) +
-        (uint32_t(buffer[2]) << 1 * 6) + (uint32_t(buffer[3]) << 0 * 6);
-    if(match_system(endianness::BIG)) {
-      triple <<= 8;
-      std::memcpy(dst, &triple, 3);
-    } else {
-      triple = scalar::utf32::swap_bytes(triple);
-      triple >>= 8;
-      std::memcpy(dst, &triple, 3);
-    }
-    dst += 3;
-  }
+template <simdutf_ByteFlip method>
+simdutf_really_inline static vuint16m8_t simdutf_byteflip(vuint16m8_t v,
+                                                          size_t vl) {
+#if SIMDUTF_HAS_ZVBB_INTRINSICS
+  if (method == simdutf_ByteFlip::ZVBB)
+    return __riscv_vrev8_v_u16m8(v, vl);
+#endif
+  if (method == simdutf_ByteFlip::V)
+    return __riscv_vmacc_vx_u16m8(__riscv_vsrl_vx_u16m8(v, 8, vl), 0x100, v,
+                                  vl);
+  return v;
 }
 
-// like base64_tail_decode, but it will not write past the end of the output buffer.
-// outlen is modified to reflect the number of bytes written.
-// This functions assumes that the padding (=) has been removed.
-template <class char_type>
-result base64_tail_decode_safe(char *dst, size_t& outlen, const char_type *src, size_t length, base64_options options) {
-  if(length == 0) {
-    outlen = 0;
-    return {SUCCESS, 0};
-  }
-  // This looks like 5 branches, but we expect the compiler to resolve this to a single branch:
-  const uint8_t *to_base64 = (options & base64_url) ? tables::base64::to_base64_url_value : tables::base64::to_base64_value;
-  const uint32_t *d0 = (options & base64_url) ? tables::base64::base64_url::d0 : tables::base64::base64_default::d0;
-  const uint32_t *d1 = (options & base64_url) ? tables::base64::base64_url::d1 : tables::base64::base64_default::d1;
-  const uint32_t *d2 = (options & base64_url) ? tables::base64::base64_url::d2 : tables::base64::base64_default::d2;
-  const uint32_t *d3 = (options & base64_url) ? tables::base64::base64_url::d3 : tables::base64::base64_default::d3;
+#ifdef SIMDUTF_TARGET_ZVBB
+SIMDUTF_UNTARGET_REGION
+SIMDUTF_TARGET_RVV
+#endif
 
-  const char_type *srcend = src + length;
-  const char_type *srcinit = src;
-  const char *dstinit = dst;
-  const char *dstend = dst + outlen;
+#endif //  SIMDUTF_RVV_INTRINSICS_H
+/* end file src/simdutf/rvv/intrinsics.h */
+/* begin file src/simdutf/rvv/end.h */
+#if SIMDUTF_CAN_ALWAYS_RUN_RVV
+// nothing needed.
+#else
+SIMDUTF_UNTARGET_REGION
+#endif
 
-  uint32_t x;
-  size_t idx;
-  uint8_t buffer[4];
-  while (true) {
-    while (src + 4 <= srcend && is_eight_byte(src[0]) && is_eight_byte(src[1]) && is_eight_byte(src[2]) && is_eight_byte(src[3]) &&
-           (x = d0[uint8_t(src[0])] | d1[uint8_t(src[1])] |
-                d2[uint8_t(src[2])] | d3[uint8_t(src[3])]) < 0x01FFFFFF) {
-      if(match_system(endianness::BIG)) {
-        x = scalar::utf32::swap_bytes(x);
-      }
-      if(dstend - dst < 3) {
-        outlen = size_t(dst - dstinit);
-        return {OUTPUT_BUFFER_TOO_SMALL, size_t(src - srcinit)};
-      }
-      std::memcpy(dst, &x, 3); // optimization opportunity: copy 4 bytes
-      dst += 3;
-      src += 4;
-    }
-    idx = 0;
-    const char_type *srccur = src;
+/* end file src/simdutf/rvv/end.h */
 
-    // we need at least four characters.
-    while (idx < 4 && src < srcend) {
-      char_type c = *src;
-      uint8_t code = to_base64[uint8_t(c)];
-      buffer[idx] = uint8_t(code);
-      if (is_eight_byte(c) && code <= 63) {
-        idx++;
-      } else if (code > 64 || !scalar::base64::is_eight_byte(c)) {
-        outlen = size_t(dst - dstinit);
-        return {INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
-      } else {
-        // We have a space or a newline. We ignore it.
-      }
-      src++;
-    }
-    if (idx != 4) {
-      if (idx == 2) {
-        if(dst == dstend) {
-          outlen = size_t(dst - dstinit);
-          return {OUTPUT_BUFFER_TOO_SMALL, size_t(srccur - srcinit)};
-        }
-        uint32_t triple =
-            (uint32_t(buffer[0]) << 3 * 6) + (uint32_t(buffer[1]) << 2 * 6);
-        if(match_system(endianness::BIG)) {
-          triple <<= 8;
-          std::memcpy(dst, &triple, 1);
-        } else {
-          triple = scalar::utf32::swap_bytes(triple);
-          triple >>= 8;
-          std::memcpy(dst, &triple, 1);
-        }
-        dst += 1;
+#endif // SIMDUTF_IMPLEMENTATION_RVV
 
-      } else if (idx == 3) {
-        if(dstend - dst < 2) {
-          outlen = size_t(dst - dstinit);
-          return {OUTPUT_BUFFER_TOO_SMALL, size_t(srccur - srcinit)};
-        }
-        uint32_t triple = (uint32_t(buffer[0]) << 3 * 6) +
-                          (uint32_t(buffer[1]) << 2 * 6) +
-                          (uint32_t(buffer[2]) << 1 * 6);
-        if(match_system(endianness::BIG)) {
-          triple <<= 8;
-          std::memcpy(dst, &triple, 2);
-        } else {
-          triple = scalar::utf32::swap_bytes(triple);
-          triple >>= 8;
-          std::memcpy(dst, &triple, 2);
-        }
-        dst += 2;
-      } else if (idx == 1) {
-        outlen = size_t(dst - dstinit);
-        return {BASE64_INPUT_REMAINDER, size_t(dst - dstinit)};
-      }
-      outlen = size_t(dst - dstinit);
-      return {SUCCESS, size_t(dst - dstinit)};
-    }
-    if(dstend - dst < 3) {
-      outlen = size_t(dst - dstinit);
-      return {OUTPUT_BUFFER_TOO_SMALL, size_t(srccur - srcinit)};
-    }
-    uint32_t triple =
-        (uint32_t(buffer[0]) << 3 * 6) + (uint32_t(buffer[1]) << 2 * 6) +
-        (uint32_t(buffer[2]) << 1 * 6) + (uint32_t(buffer[3]) << 0 * 6);
-    if(match_system(endianness::BIG)) {
-      triple <<= 8;
-      std::memcpy(dst, &triple, 3);
-    } else {
-      triple = scalar::utf32::swap_bytes(triple);
-      triple >>= 8;
-      std::memcpy(dst, &triple, 3);
-    }
-    dst += 3;
-  }
-}
+#endif // SIMDUTF_RVV_H
+/* end file src/simdutf/rvv.h */
+/* begin file src/simdutf/fallback.h */
+#ifndef SIMDUTF_FALLBACK_H
+#define SIMDUTF_FALLBACK_H
 
-// Returns the number of bytes written. The destination buffer must be large
-// enough. It will add padding (=) if needed.
-size_t tail_encode_base64(char *dst, const char *src, size_t srclen, base64_options options) {
-  // By default, we use padding if we are not using the URL variant.
-  // This is check with ((options & base64_url) == 0) which returns true if we are not using the URL variant.
-  // However, we also allow 'inversion' of the convention with the base64_reverse_padding option.
-  // If the base64_reverse_padding option is set, we use padding if we are using the URL variant,
-  // and we omit it if we are not using the URL variant. This is checked with
-  // ((options & base64_reverse_padding) == base64_reverse_padding).
-  bool use_padding = ((options & base64_url) == 0) ^ ((options & base64_reverse_padding) == base64_reverse_padding);
-  // This looks like 3 branches, but we expect the compiler to resolve this to a single branch:
-  const char *e0 = (options & base64_url) ? tables::base64::base64_url::e0 : tables::base64::base64_default::e0;
-  const char *e1 = (options & base64_url) ? tables::base64::base64_url::e1 : tables::base64::base64_default::e1;
-  const char *e2 = (options & base64_url) ? tables::base64::base64_url::e2 : tables::base64::base64_default::e2;
-  char *out = dst;
-  size_t i = 0;
-  uint8_t t1, t2, t3;
-  for (; i + 2 < srclen; i += 3) {
-    t1 = uint8_t(src[i]);
-    t2 = uint8_t(src[i + 1]);
-    t3 = uint8_t(src[i + 2]);
-    *out++ = e0[t1];
-    *out++ = e1[((t1 & 0x03) << 4) | ((t2 >> 4) & 0x0F)];
-    *out++ = e1[((t2 & 0x0F) << 2) | ((t3 >> 6) & 0x03)];
-    *out++ = e2[t3];
-  }
-  switch (srclen - i) {
-  case 0:
-    break;
-  case 1:
-    t1 = uint8_t(src[i]);
-    *out++ = e0[t1];
-    *out++ = e1[(t1 & 0x03) << 4];
-    if(use_padding) {
-      *out++ = '=';
-      *out++ = '=';
-    }
-    break;
-  default: /* case 2 */
-    t1 = uint8_t(src[i]);
-    t2 = uint8_t(src[i + 1]);
-    *out++ = e0[t1];
-    *out++ = e1[((t1 & 0x03) << 4) | ((t2 >> 4) & 0x0F)];
-    *out++ = e2[(t2 & 0x0F) << 2];
-    if(use_padding) {
-      *out++ = '=';
-    }
-  }
-  return (size_t)(out - dst);
-}
 
-template <class char_type>
-simdutf_warn_unused size_t maximal_binary_length_from_base64(const char_type * input, size_t length) noexcept {
-  // We follow https://infra.spec.whatwg.org/#forgiving-base64-decode
-  size_t padding = 0;
-  if(length > 0) {
-    if(input[length - 1] == '=') {
-      padding++;
-      if(length > 1 && input[length - 2] == '=') {
-        padding++;
-      }
-    }
-  }
-  size_t actual_length = length - padding;
-  if(actual_length % 4 <= 1) {
-    return actual_length / 4 * 3;
-  }
-  // if we have a valid input, then the remainder must be 2 or 3 adding one or two extra bytes.
-  return  actual_length / 4 * 3 + (actual_length %4)  - 1;
-}
+// Note that fallback.h is always imported last.
 
-simdutf_warn_unused size_t base64_length_from_binary(size_t length, base64_options options) noexcept {
-  // By default, we use padding if we are not using the URL variant.
-  // This is check with ((options & base64_url) == 0) which returns true if we are not using the URL variant.
-  // However, we also allow 'inversion' of the convention with the base64_reverse_padding option.
-  // If the base64_reverse_padding option is set, we use padding if we are using the URL variant,
-  // and we omit it if we are not using the URL variant. This is checked with
-  // ((options & base64_reverse_padding) == base64_reverse_padding).
-  bool use_padding = ((options & base64_url) == 0) ^ ((options & base64_reverse_padding) == base64_reverse_padding);
-  if(!use_padding) {
-    return length/3 * 4 + ((length % 3) ? (length % 3) + 1 : 0);
-  }
-  return (length + 2)/3 * 4; // We use padding to make the length a multiple of 4.
-}
+// Enable the fallback by default unless a builtin implementation has already
+// been selected.
+#ifndef SIMDUTF_IMPLEMENTATION_FALLBACK
+  #if SIMDUTF_CAN_ALWAYS_RUN_ARM64 || SIMDUTF_CAN_ALWAYS_RUN_ICELAKE ||        \
+      SIMDUTF_CAN_ALWAYS_RUN_HASWELL || SIMDUTF_CAN_ALWAYS_RUN_WESTMERE ||     \
+      SIMDUTF_CAN_ALWAYS_RUN_PPC64 || SIMDUTF_CAN_ALWAYS_RUN_RVV
+    #define SIMDUTF_IMPLEMENTATION_FALLBACK 0
+  #else
+    #define SIMDUTF_IMPLEMENTATION_FALLBACK 1
+  #endif
+#endif
 
-} // namespace base64
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+#define SIMDUTF_CAN_ALWAYS_RUN_FALLBACK (SIMDUTF_IMPLEMENTATION_FALLBACK)
 
-#endif
-/* end file src/scalar/base64.h */
-/* begin file src/scalar/latin1_to_utf8/latin1_to_utf8.h */
-#ifndef SIMDUTF_LATIN1_TO_UTF8_H
-#define SIMDUTF_LATIN1_TO_UTF8_H
+#if SIMDUTF_IMPLEMENTATION_FALLBACK
 
 namespace simdutf {
-namespace scalar {
-namespace {
-namespace latin1_to_utf8 {
+/**
+ * Fallback implementation (runs on any machine).
+ */
+namespace fallback {} // namespace fallback
+} // namespace simdutf
 
-inline size_t convert(const char* buf, size_t len, char* utf8_output) {
-  const unsigned char *data = reinterpret_cast<const unsigned char *>(buf);
-  size_t pos = 0;
-  size_t utf8_pos = 0;
-  while (pos < len) {
-    // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <= len) { // if it is safe to read 16 more bytes, check that they are ascii
-      uint64_t v1;
-      ::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2}; // We are only interested in these bits: 1000 1000 1000 1000, so it makes sense to concatenate everything
-      if ((v & 0x8080808080808080) == 0) { // if NONE of these are set, e.g. all of them are zero, then everything is ASCII
-        size_t final_pos = pos + 16;
-        while(pos < final_pos) {
-          utf8_output[utf8_pos++] = char(buf[pos]);
-          pos++;
-        }
-        continue;
-      }
-    }
+/* begin file src/simdutf/fallback/implementation.h */
+#ifndef SIMDUTF_FALLBACK_IMPLEMENTATION_H
+#define SIMDUTF_FALLBACK_IMPLEMENTATION_H
 
-    unsigned char byte = data[pos];
-    if((byte & 0x80) == 0) { // if ASCII
-      // will generate one UTF-8 bytes
-      utf8_output[utf8_pos++] = char(byte);
-      pos++;
-    } else {
-      // will generate two UTF-8 bytes
-      utf8_output[utf8_pos++] = char((byte>>6) | 0b11000000);
-      utf8_output[utf8_pos++] = char((byte & 0b111111) | 0b10000000);
-      pos++;
-    }
-  }
-  return utf8_pos;
+
+namespace simdutf {
+namespace fallback {
+
+namespace {
+using namespace simdutf;
 }
 
-inline size_t convert_safe(const char* buf, size_t len, char* utf8_output, size_t utf8_len) {
-  const unsigned char *data = reinterpret_cast<const unsigned char *>(buf);
-  size_t pos = 0;
-  size_t skip_pos = 0;
-  size_t utf8_pos = 0;
-  while (pos < len && utf8_pos < utf8_len) {
-    // try to convert the next block of 16 ASCII bytes
-    if (pos >= skip_pos && pos + 16 <= len && utf8_pos + 16 <= utf8_len) { // if it is safe to read 16 more bytes, check that they are ascii
+class implementation final : public simdutf::implementation {
+public:
+  simdutf_really_inline implementation()
+      : simdutf::implementation("fallback", "Generic fallback implementation",
+                                0) {}
+  simdutf_warn_unused int detect_encodings(const char *input,
+                                           size_t length) const noexcept final;
+  simdutf_warn_unused bool validate_utf8(const char *buf,
+                                         size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_ascii(const char *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16le(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16le_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16be_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf32(const char32_t *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf32_with_errors(
+      const char32_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf8(
+      const char *buf, size_t len, char *utf8_output) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+      const char *buf, size_t len, char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                          char *latin1_output) const noexcept final;
+  simdutf_warn_unused result
+  convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                      char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_latin1(const char32_t *buf, size_t len,
+                                char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16le(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16be(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16le(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16be(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  void change_endianness_utf16(const char16_t *buf, size_t length,
+                               char16_t *output) const noexcept final;
+  simdutf_warn_unused size_t count_utf16le(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf16be(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf8(const char *buf,
+                                        size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16le(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16be(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16le(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16be(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf16(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf32(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_latin1(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char *input, size_t length) const noexcept;
+  simdutf_warn_unused result base64_to_binary(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused result base64_to_binary(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_options) const noexcept;
+  simdutf_warn_unused size_t base64_length_from_binary(
+      size_t length, base64_options options) const noexcept;
+  size_t binary_to_base64(const char *input, size_t length, char *output,
+                          base64_options options) const noexcept;
+};
+} // namespace fallback
+} // namespace simdutf
+
+#endif // SIMDUTF_FALLBACK_IMPLEMENTATION_H
+/* end file src/simdutf/fallback/implementation.h */
+
+/* begin file src/simdutf/fallback/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "fallback"
+// #define SIMDUTF_IMPLEMENTATION fallback
+/* end file src/simdutf/fallback/begin.h */
+
+  // Declarations
+/* begin file src/simdutf/fallback/bitmanipulation.h */
+#ifndef SIMDUTF_FALLBACK_BITMANIPULATION_H
+#define SIMDUTF_FALLBACK_BITMANIPULATION_H
+
+#include <limits>
+
+namespace simdutf {
+namespace fallback {
+namespace {} // unnamed namespace
+} // namespace fallback
+} // namespace simdutf
+
+#endif // SIMDUTF_FALLBACK_BITMANIPULATION_H
+/* end file src/simdutf/fallback/bitmanipulation.h */
+
+/* begin file src/simdutf/fallback/end.h */
+/* end file src/simdutf/fallback/end.h */
+
+#endif // SIMDUTF_IMPLEMENTATION_FALLBACK
+#endif // SIMDUTF_FALLBACK_H
+/* end file src/simdutf/fallback.h */
+
+/* begin file src/scalar/utf8.h */
+#ifndef SIMDUTF_UTF8_H
+#define SIMDUTF_UTF8_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf8 {
+#if SIMDUTF_IMPLEMENTATION_FALLBACK || SIMDUTF_IMPLEMENTATION_RVV
+// only used by the fallback and RVV kernels.
+// credit: based on code from Google Fuchsia (Apache Licensed)
+inline simdutf_warn_unused bool validate(const char *buf, size_t len) noexcept {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  uint64_t pos = 0;
+  uint32_t code_point = 0;
+  while (pos < len) {
+    // check if the next 16 bytes are ASCII.
+    uint64_t next_pos = pos + 16;
+    if (next_pos <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
       uint64_t v1;
-      ::memcpy(&v1, data + pos, sizeof(uint64_t));
+      std::memcpy(&v1, data + pos, sizeof(uint64_t));
       uint64_t v2;
-      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2}; // We are only interested in these bits: 1000 1000 1000 1000, so it makes sense to concatenate everything
-      if ((v & 0x8080808080808080) == 0) { // if NONE of these are set, e.g. all of them are zero, then everything is ASCII
-        ::memcpy(utf8_output + utf8_pos, buf + pos, 16);
-        utf8_pos += 16;
-        pos += 16;
-      } else {
-				// At least one of the next 16 bytes are not ASCII, we will process them one by one
-        skip_pos = pos + 16;
+      std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 | v2};
+      if ((v & 0x8080808080808080) == 0) {
+        pos = next_pos;
+        continue;
+      }
+    }
+    unsigned char byte = data[pos];
+
+    while (byte < 0b10000000) {
+      if (++pos == len) {
+        return true;
+      }
+      byte = data[pos];
+    }
+
+    if ((byte & 0b11100000) == 0b11000000) {
+      next_pos = pos + 2;
+      if (next_pos > len) {
+        return false;
+      }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return false;
+      }
+      // range check
+      code_point = (byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
+      if ((code_point < 0x80) || (0x7ff < code_point)) {
+        return false;
+      }
+    } else if ((byte & 0b11110000) == 0b11100000) {
+      next_pos = pos + 3;
+      if (next_pos > len) {
+        return false;
+      }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return false;
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return false;
+      }
+      // range check
+      code_point = (byte & 0b00001111) << 12 |
+                   (data[pos + 1] & 0b00111111) << 6 |
+                   (data[pos + 2] & 0b00111111);
+      if ((code_point < 0x800) || (0xffff < code_point) ||
+          (0xd7ff < code_point && code_point < 0xe000)) {
+        return false;
+      }
+    } else if ((byte & 0b11111000) == 0b11110000) { // 0b11110000
+      next_pos = pos + 4;
+      if (next_pos > len) {
+        return false;
+      }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return false;
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return false;
+      }
+      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
+        return false;
+      }
+      // range check
+      code_point =
+          (byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 |
+          (data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111);
+      if (code_point <= 0xffff || 0x10ffff < code_point) {
+        return false;
       }
     } else {
-      const auto byte = data[pos];
-      if((byte & 0x80) == 0) { // if ASCII
-        // will generate one UTF-8 bytes
-        utf8_output[utf8_pos++] = char(byte);
-        pos++;
-      } else if (utf8_pos + 2 <= utf8_len) {
-        // will generate two UTF-8 bytes
-        utf8_output[utf8_pos++] = char((byte>>6) | 0b11000000);
-        utf8_output[utf8_pos++] = char((byte & 0b111111) | 0b10000000);
-        pos++;
+      // we may have a continuation
+      return false;
+    }
+    pos = next_pos;
+  }
+  return true;
+}
+#endif
+
+inline simdutf_warn_unused result validate_with_errors(const char *buf,
+                                                       size_t len) noexcept {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  uint32_t code_point = 0;
+  while (pos < len) {
+    // check if the next 16 bytes are ASCII.
+    size_t next_pos = pos + 16;
+    if (next_pos <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
+      uint64_t v1;
+      std::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 | v2};
+      if ((v & 0x8080808080808080) == 0) {
+        pos = next_pos;
+        continue;
+      }
+    }
+    unsigned char byte = data[pos];
+
+    while (byte < 0b10000000) {
+      if (++pos == len) {
+        return result(error_code::SUCCESS, len);
+      }
+      byte = data[pos];
+    }
+
+    if ((byte & 0b11100000) == 0b11000000) {
+      next_pos = pos + 2;
+      if (next_pos > len) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      // range check
+      code_point = (byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
+      if ((code_point < 0x80) || (0x7ff < code_point)) {
+        return result(error_code::OVERLONG, pos);
+      }
+    } else if ((byte & 0b11110000) == 0b11100000) {
+      next_pos = pos + 3;
+      if (next_pos > len) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      // range check
+      code_point = (byte & 0b00001111) << 12 |
+                   (data[pos + 1] & 0b00111111) << 6 |
+                   (data[pos + 2] & 0b00111111);
+      if ((code_point < 0x800) || (0xffff < code_point)) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (0xd7ff < code_point && code_point < 0xe000) {
+        return result(error_code::SURROGATE, pos);
+      }
+    } else if ((byte & 0b11111000) == 0b11110000) { // 0b11110000
+      next_pos = pos + 4;
+      if (next_pos > len) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      // range check
+      code_point =
+          (byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 |
+          (data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111);
+      if (code_point <= 0xffff) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (0x10ffff < code_point) {
+        return result(error_code::TOO_LARGE, pos);
+      }
+    } else {
+      // we either have too many continuation bytes or an invalid leading byte
+      if ((byte & 0b11000000) == 0b10000000) {
+        return result(error_code::TOO_LONG, pos);
       } else {
-        break;
+        return result(error_code::HEADER_BITS, pos);
       }
     }
+    pos = next_pos;
   }
-  return utf8_pos;
+  return result(error_code::SUCCESS, len);
 }
 
-} // latin1_to_utf8 namespace
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+// Finds the previous leading byte starting backward from buf and validates with
+// errors from there. Used to pinpoint the location of an error when an invalid
+// chunk is detected. We assume that the stream starts with a leading byte, and
+// to check that it is the case, we ask that you pass a pointer to the start of
+// the stream (start).
+inline simdutf_warn_unused result rewind_and_validate_with_errors(
+    const char *start, const char *buf, size_t len) noexcept {
+  // First check that we start with a leading byte
+  if ((*start & 0b11000000) == 0b10000000) {
+    return result(error_code::TOO_LONG, 0);
+  }
+  size_t extra_len{0};
+  // A leading byte cannot be further than 4 bytes away
+  for (int i = 0; i < 5; i++) {
+    unsigned char byte = *buf;
+    if ((byte & 0b11000000) != 0b10000000) {
+      break;
+    } else {
+      buf--;
+      extra_len++;
+    }
+  }
 
-#endif
-/* end file src/scalar/latin1_to_utf8/latin1_to_utf8.h */
+  result res = validate_with_errors(buf, len + extra_len);
+  res.count -= extra_len;
+  return res;
+}
 
-namespace simdutf {
-bool implementation::supported_by_runtime_system() const {
-  uint32_t required_instruction_sets = this->required_instruction_sets();
-  uint32_t supported_instruction_sets = internal::detect_supported_architectures();
-  return ((supported_instruction_sets & required_instruction_sets) == required_instruction_sets);
-}
-
-simdutf_warn_unused encoding_type implementation::autodetect_encoding(const char * input, size_t length) const noexcept {
-    // If there is a BOM, then we trust it.
-    auto bom_encoding = simdutf::BOM::check_bom(input, length);
-    if(bom_encoding != encoding_type::unspecified) { return bom_encoding; }
-    // UTF8 is common, it includes ASCII, and is commonly represented
-    // without a BOM, so if it fits, go with that. Note that it is still
-    // possible to get it wrong, we are only 'guessing'. If some has UTF-16
-    // data without a BOM, it could pass as UTF-8.
-    //
-    // An interesting twist might be to check for UTF-16 ASCII first (every
-    // other byte is zero).
-    if(validate_utf8(input, length)) { return encoding_type::UTF8; }
-    // The next most common encoding that might appear without BOM is probably
-    // UTF-16LE, so try that next.
-    if((length % 2) == 0) {
-      // important: we need to divide by two
-      if(validate_utf16le(reinterpret_cast<const char16_t*>(input), length/2)) { return encoding_type::UTF16_LE; }
-    }
-    if((length % 4) == 0) {
-      if(validate_utf32(reinterpret_cast<const char32_t*>(input), length/4)) { return encoding_type::UTF32_LE; }
+inline size_t count_code_points(const char *buf, size_t len) {
+  const int8_t *p = reinterpret_cast<const int8_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    // -65 is 0b10111111; anything larger in two's complement should start a new
+    // code point.
+    if (p[i] > -65) {
+      counter++;
     }
-    return encoding_type::unspecified;
+  }
+  return counter;
 }
 
-namespace internal {
-// When there is a single implementation, we should not pay a price
- // for dispatching to the best implementation. We should just use the
- // one we have. This is a compile-time check.
- #define SIMDUTF_SINGLE_IMPLEMENTATION (SIMDUTF_IMPLEMENTATION_ICELAKE \
-              + SIMDUTF_IMPLEMENTATION_HASWELL + SIMDUTF_IMPLEMENTATION_WESTMERE \
-              + SIMDUTF_IMPLEMENTATION_ARM64 + SIMDUTF_IMPLEMENTATION_PPC64 \
-              + SIMDUTF_IMPLEMENTATION_FALLBACK == 1)
+inline size_t utf16_length_from_utf8(const char *buf, size_t len) {
+  const int8_t *p = reinterpret_cast<const int8_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    if (p[i] > -65) {
+      counter++;
+    }
+    if (uint8_t(p[i]) >= 240) {
+      counter++;
+    }
+  }
+  return counter;
+}
 
-// Static array of known implementations. We are hoping these get baked into the executable
-// without requiring a static initializer.
+simdutf_warn_unused inline size_t trim_partial_utf8(const char *input,
+                                                    size_t length) {
+  if (length < 3) {
+    switch (length) {
+    case 2:
+      if (uint8_t(input[length - 1]) >= 0xc0) {
+        return length - 1;
+      } // 2-, 3- and 4-byte characters with only 1 byte left
+      if (uint8_t(input[length - 2]) >= 0xe0) {
+        return length - 2;
+      } // 3- and 4-byte characters with only 2 bytes left
+      return length;
+    case 1:
+      if (uint8_t(input[length - 1]) >= 0xc0) {
+        return length - 1;
+      } // 2-, 3- and 4-byte characters with only 1 byte left
+      return length;
+    case 0:
+      return length;
+    }
+  }
+  if (uint8_t(input[length - 1]) >= 0xc0) {
+    return length - 1;
+  } // 2-, 3- and 4-byte characters with only 1 byte left
+  if (uint8_t(input[length - 2]) >= 0xe0) {
+    return length - 2;
+  } // 3- and 4-byte characters with only 2 bytes left
+  if (uint8_t(input[length - 3]) >= 0xf0) {
+    return length - 3;
+  } // 4-byte characters with only 3 bytes left
+  return length;
+}
 
+} // namespace utf8
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
 
-#if SIMDUTF_IMPLEMENTATION_ICELAKE
-static const icelake::implementation* get_icelake_singleton() {
-  static const icelake::implementation icelake_singleton{};
-  return &icelake_singleton;
-}
-#endif
-#if SIMDUTF_IMPLEMENTATION_HASWELL
-static const haswell::implementation* get_haswell_singleton() {
-  static const haswell::implementation haswell_singleton{};
-  return &haswell_singleton;
-}
-#endif
-#if SIMDUTF_IMPLEMENTATION_WESTMERE
-static const westmere::implementation* get_westmere_singleton() {
-  static const westmere::implementation westmere_singleton{};
-  return &westmere_singleton;
-}
-#endif
-#if SIMDUTF_IMPLEMENTATION_ARM64
-static const arm64::implementation* get_arm64_singleton() {
-  static const arm64::implementation arm64_singleton{};
-  return &arm64_singleton;
-}
-#endif
-#if SIMDUTF_IMPLEMENTATION_PPC64
-static const ppc64::implementation* get_ppc64_singleton() {
-  static const ppc64::implementation ppc64_singleton{};
-  return &ppc64_singleton;
-}
-#endif
-#if SIMDUTF_IMPLEMENTATION_RVV
-static const rvv::implementation* get_rvv_singleton() {
-  static const rvv::implementation rvv_singleton{};
-  return &rvv_singleton;
-}
-#endif
-#if SIMDUTF_IMPLEMENTATION_FALLBACK
-static const fallback::implementation* get_fallback_singleton() {
-  static const fallback::implementation fallback_singleton{};
-  return &fallback_singleton;
-}
-#endif
-
-#if SIMDUTF_SINGLE_IMPLEMENTATION
-static const implementation* get_single_implementation() {
-    return
-#if SIMDUTF_IMPLEMENTATION_ICELAKE
-    get_icelake_singleton();
-#endif
-#if SIMDUTF_IMPLEMENTATION_HASWELL
-    get_haswell_singleton();
-#endif
-#if SIMDUTF_IMPLEMENTATION_WESTMERE
-    get_westmere_singleton();
-#endif
-#if SIMDUTF_IMPLEMENTATION_ARM64
-    get_arm64_singleton();
-#endif
-#if SIMDUTF_IMPLEMENTATION_PPC64
-    get_ppc64_singleton();
-#endif
-#if SIMDUTF_IMPLEMENTATION_FALLBACK
-    get_fallback_singleton();
-#endif
-}
 #endif
+/* end file src/scalar/utf8.h */
+/* begin file src/scalar/utf16.h */
+#ifndef SIMDUTF_UTF16_H
+#define SIMDUTF_UTF16_H
 
-/**
- * @private Detects best supported implementation on first use, and sets it
- */
-class detect_best_supported_implementation_on_first_use final : public implementation {
-public:
-  std::string name() const noexcept final { return set_best()->name(); }
-  std::string description() const noexcept final { return set_best()->description(); }
-  uint32_t required_instruction_sets() const noexcept final { return set_best()->required_instruction_sets(); }
-
-  simdutf_warn_unused int detect_encodings(const char * input, size_t length) const noexcept override {
-    return set_best()->detect_encodings(input, length);
-  }
-
-  simdutf_warn_unused bool validate_utf8(const char * buf, size_t len) const noexcept final override {
-    return set_best()->validate_utf8(buf, len);
-  }
-
-  simdutf_warn_unused result validate_utf8_with_errors(const char * buf, size_t len) const noexcept final override {
-    return set_best()->validate_utf8_with_errors(buf, len);
-  }
-
-  simdutf_warn_unused bool validate_ascii(const char * buf, size_t len) const noexcept final override {
-    return set_best()->validate_ascii(buf, len);
-  }
-
-  simdutf_warn_unused result validate_ascii_with_errors(const char * buf, size_t len) const noexcept final override {
-    return set_best()->validate_ascii_with_errors(buf, len);
-  }
-
-  simdutf_warn_unused bool validate_utf16le(const char16_t * buf, size_t len) const noexcept final override {
-    return set_best()->validate_utf16le(buf, len);
-  }
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf16 {
 
-  simdutf_warn_unused bool validate_utf16be(const char16_t * buf, size_t len) const noexcept final override {
-    return set_best()->validate_utf16be(buf, len);
-  }
+inline simdutf_warn_unused uint16_t swap_bytes(const uint16_t word) {
+  return uint16_t((word >> 8) | (word << 8));
+}
 
-  simdutf_warn_unused result validate_utf16le_with_errors(const char16_t * buf, size_t len) const noexcept final override {
-    return set_best()->validate_utf16le_with_errors(buf, len);
+template <endianness big_endian>
+inline simdutf_warn_unused bool validate(const char16_t *buf,
+                                         size_t len) noexcept {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  uint64_t pos = 0;
+  while (pos < len) {
+    uint16_t word =
+        !match_system(big_endian) ? swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xF800) == 0xD800) {
+      if (pos + 1 >= len) {
+        return false;
+      }
+      uint16_t diff = uint16_t(word - 0xD800);
+      if (diff > 0x3FF) {
+        return false;
+      }
+      uint16_t next_word =
+          !match_system(big_endian) ? swap_bytes(data[pos + 1]) : data[pos + 1];
+      uint16_t diff2 = uint16_t(next_word - 0xDC00);
+      if (diff2 > 0x3FF) {
+        return false;
+      }
+      pos += 2;
+    } else {
+      pos++;
+    }
   }
+  return true;
+}
 
-  simdutf_warn_unused result validate_utf16be_with_errors(const char16_t * buf, size_t len) const noexcept final override {
-    return set_best()->validate_utf16be_with_errors(buf, len);
+template <endianness big_endian>
+inline simdutf_warn_unused result validate_with_errors(const char16_t *buf,
+                                                       size_t len) noexcept {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  size_t pos = 0;
+  while (pos < len) {
+    uint16_t word =
+        !match_system(big_endian) ? swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xF800) == 0xD800) {
+      if (pos + 1 >= len) {
+        return result(error_code::SURROGATE, pos);
+      }
+      uint16_t diff = uint16_t(word - 0xD800);
+      if (diff > 0x3FF) {
+        return result(error_code::SURROGATE, pos);
+      }
+      uint16_t next_word =
+          !match_system(big_endian) ? swap_bytes(data[pos + 1]) : data[pos + 1];
+      uint16_t diff2 = uint16_t(next_word - 0xDC00);
+      if (diff2 > 0x3FF) {
+        return result(error_code::SURROGATE, pos);
+      }
+      pos += 2;
+    } else {
+      pos++;
+    }
   }
+  return result(error_code::SUCCESS, pos);
+}
 
-  simdutf_warn_unused bool validate_utf32(const char32_t * buf, size_t len) const noexcept final override {
-    return set_best()->validate_utf32(buf, len);
+template <endianness big_endian>
+inline size_t count_code_points(const char16_t *buf, size_t len) {
+  // We are not BOM aware.
+  const uint16_t *p = reinterpret_cast<const uint16_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    uint16_t word = !match_system(big_endian) ? swap_bytes(p[i]) : p[i];
+    counter += ((word & 0xFC00) != 0xDC00);
   }
+  return counter;
+}
 
-  simdutf_warn_unused result validate_utf32_with_errors(const char32_t * buf, size_t len) const noexcept final override {
-    return set_best()->validate_utf32_with_errors(buf, len);
+template <endianness big_endian>
+inline size_t utf8_length_from_utf16(const char16_t *buf, size_t len) {
+  // We are not BOM aware.
+  const uint16_t *p = reinterpret_cast<const uint16_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    uint16_t word = !match_system(big_endian) ? swap_bytes(p[i]) : p[i];
+    counter++; // ASCII
+    counter += static_cast<size_t>(
+        word >
+        0x7F); // non-ASCII is at least 2 bytes, surrogates are 2*2 == 4 bytes
+    counter += static_cast<size_t>((word > 0x7FF && word <= 0xD7FF) ||
+                                   (word >= 0xE000)); // three-byte
   }
+  return counter;
+}
 
-  simdutf_warn_unused size_t convert_latin1_to_utf8(const char * buf, size_t len, char* utf8_output) const noexcept final override {
-    return set_best()->convert_latin1_to_utf8(buf, len, utf8_output);
+template <endianness big_endian>
+inline size_t utf32_length_from_utf16(const char16_t *buf, size_t len) {
+  // We are not BOM aware.
+  const uint16_t *p = reinterpret_cast<const uint16_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    uint16_t word = !match_system(big_endian) ? swap_bytes(p[i]) : p[i];
+    counter += ((word & 0xFC00) != 0xDC00);
   }
+  return counter;
+}
 
-  simdutf_warn_unused size_t convert_latin1_to_utf16le(const char * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_latin1_to_utf16le(buf, len, utf16_output);
-  }
+inline size_t latin1_length_from_utf16(size_t len) { return len; }
 
-  simdutf_warn_unused size_t convert_latin1_to_utf16be(const char * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_latin1_to_utf16be(buf, len, utf16_output);
+simdutf_really_inline void change_endianness_utf16(const char16_t *in,
+                                                   size_t size, char16_t *out) {
+  const uint16_t *input = reinterpret_cast<const uint16_t *>(in);
+  uint16_t *output = reinterpret_cast<uint16_t *>(out);
+  for (size_t i = 0; i < size; i++) {
+    *output++ = uint16_t(input[i] >> 8 | input[i] << 8);
   }
+}
 
-  simdutf_warn_unused size_t convert_latin1_to_utf32(const char * buf, size_t len, char32_t * latin1_output) const noexcept final override {
-    return set_best()->convert_latin1_to_utf32(buf, len,latin1_output);
+template <endianness big_endian>
+simdutf_warn_unused inline size_t trim_partial_utf16(const char16_t *input,
+                                                     size_t length) {
+  if (length <= 1) {
+    return length;
   }
+  uint16_t last_word = uint16_t(input[length - 1]);
+  last_word = !match_system(big_endian) ? swap_bytes(last_word) : last_word;
+  length -= ((last_word & 0xFC00) == 0xD800);
+  return length;
+}
 
-  simdutf_warn_unused size_t convert_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) const noexcept final override {
-    return set_best()->convert_utf8_to_latin1(buf, len,latin1_output);
-  }
+} // namespace utf16
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
 
-  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(const char* buf, size_t len, char* latin1_output) const noexcept  final override {
-  return set_best()->convert_utf8_to_latin1_with_errors(buf, len, latin1_output);
-  }
+#endif
+/* end file src/scalar/utf16.h */
+/* begin file src/scalar/utf32.h */
+#ifndef SIMDUTF_UTF32_H
+#define SIMDUTF_UTF32_H
 
-  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) const noexcept final override {
-    return set_best()->convert_valid_utf8_to_latin1(buf, len,latin1_output);
-  }
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf32 {
 
-  simdutf_warn_unused size_t convert_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_utf8_to_utf16le(buf, len, utf16_output);
+inline simdutf_warn_unused bool validate(const char32_t *buf,
+                                         size_t len) noexcept {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+  uint64_t pos = 0;
+  for (; pos < len; pos++) {
+    uint32_t word = data[pos];
+    if (word > 0x10FFFF || (word >= 0xD800 && word <= 0xDFFF)) {
+      return false;
+    }
   }
+  return true;
+}
 
-  simdutf_warn_unused size_t convert_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_utf8_to_utf16be(buf, len, utf16_output);
+inline simdutf_warn_unused result validate_with_errors(const char32_t *buf,
+                                                       size_t len) noexcept {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+  size_t pos = 0;
+  for (; pos < len; pos++) {
+    uint32_t word = data[pos];
+    if (word > 0x10FFFF) {
+      return result(error_code::TOO_LARGE, pos);
+    }
+    if (word >= 0xD800 && word <= 0xDFFF) {
+      return result(error_code::SURROGATE, pos);
+    }
   }
+  return result(error_code::SUCCESS, pos);
+}
 
-  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_utf8_to_utf16le_with_errors(buf, len, utf16_output);
+inline size_t utf8_length_from_utf32(const char32_t *buf, size_t len) {
+  // We are not BOM aware.
+  const uint32_t *p = reinterpret_cast<const uint32_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    // credit: @ttsugriy  for the vectorizable approach
+    counter++;                                     // ASCII
+    counter += static_cast<size_t>(p[i] > 0x7F);   // two-byte
+    counter += static_cast<size_t>(p[i] > 0x7FF);  // three-byte
+    counter += static_cast<size_t>(p[i] > 0xFFFF); // four-byte
   }
+  return counter;
+}
 
-  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(const char * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_utf8_to_utf16be_with_errors(buf, len, utf16_output);
+inline size_t utf16_length_from_utf32(const char32_t *buf, size_t len) {
+  // We are not BOM aware.
+  const uint32_t *p = reinterpret_cast<const uint32_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    counter++;                                     // non-surrogate word
+    counter += static_cast<size_t>(p[i] > 0xFFFF); // surrogate pair
   }
+  return counter;
+}
 
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(const char * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_valid_utf8_to_utf16le(buf, len, utf16_output);
-  }
+inline size_t latin1_length_from_utf32(size_t len) {
+  // We are not BOM aware.
+  return len; // a utf32 codepoint will always represent 1 latin1 character
+}
 
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(const char * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_valid_utf8_to_utf16be(buf, len, utf16_output);
-  }
+inline simdutf_warn_unused uint32_t swap_bytes(const uint32_t word) {
+  return ((word >> 24) & 0xff) |      // move byte 3 to byte 0
+         ((word << 8) & 0xff0000) |   // move byte 1 to byte 2
+         ((word >> 8) & 0xff00) |     // move byte 2 to byte 1
+         ((word << 24) & 0xff000000); // byte 0 to byte 3
+}
 
-  simdutf_warn_unused size_t convert_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept final override {
-    return set_best()->convert_utf8_to_utf32(buf, len, utf32_output);
-  }
+} // namespace utf32
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
 
-  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(const char * buf, size_t len, char32_t* utf32_output) const noexcept final override {
-    return set_best()->convert_utf8_to_utf32_with_errors(buf, len, utf32_output);
-  }
+#endif
+/* end file src/scalar/utf32.h */
+/* begin file src/scalar/base64.h */
+#ifndef SIMDUTF_BASE64_H
+#define SIMDUTF_BASE64_H
 
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept final override {
-    return set_best()->convert_valid_utf8_to_utf32(buf, len, utf32_output);
-  }
+#include <cstddef>
+#include <cstdint>
+#include <cstring>
+#include <iostream>
 
-  simdutf_warn_unused size_t convert_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_output) const noexcept final override {
-    return set_best()->convert_utf16le_to_latin1(buf, len, latin1_output);
-  }
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace base64 {
 
-  simdutf_warn_unused size_t convert_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_output) const noexcept final override {
-    return set_best()->convert_utf16be_to_latin1(buf, len, latin1_output);
-  }
+// This function is not expected to be fast. Do not use in long loops.
+template <class char_type> bool is_ascii_white_space(char_type c) {
+  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
+}
 
-  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_output) const noexcept final override {
-    return set_best()->convert_utf16le_to_latin1_with_errors(buf, len, latin1_output);
+template <class char_type> bool is_eight_byte(char_type c) {
+  if (sizeof(char_type) == 1) {
+    return true;
   }
+  return uint8_t(c) == c;
+}
 
-  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_output) const noexcept final override {
-    return set_best()->convert_utf16be_to_latin1_with_errors(buf, len, latin1_output);
-  }
+// Returns a result whose error code is SUCCESS upon success. The destination
+// buffer must be large enough. Assumes the padding (=) has been removed.
+template <class char_type>
+result
+base64_tail_decode(char *dst, const char_type *src, size_t length,
+                   size_t padded_characters, // number of padding characters
+                                             // '=', typically 0, 1, 2.
+                   base64_options options,
+                   last_chunk_handling_options last_chunk_options) {
+  // This looks like 5 branches, but we expect the compiler to resolve this to a
+  // single branch:
+  const uint8_t *to_base64 = (options & base64_url)
+                                 ? tables::base64::to_base64_url_value
+                                 : tables::base64::to_base64_value;
+  const uint32_t *d0 = (options & base64_url)
+                           ? tables::base64::base64_url::d0
+                           : tables::base64::base64_default::d0;
+  const uint32_t *d1 = (options & base64_url)
+                           ? tables::base64::base64_url::d1
+                           : tables::base64::base64_default::d1;
+  const uint32_t *d2 = (options & base64_url)
+                           ? tables::base64::base64_url::d2
+                           : tables::base64::base64_default::d2;
+  const uint32_t *d3 = (options & base64_url)
+                           ? tables::base64::base64_url::d3
+                           : tables::base64::base64_default::d3;
 
-  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_output) const noexcept final override {
-    return set_best()->convert_valid_utf16le_to_latin1(buf, len, latin1_output);
-  }
+  const char_type *srcend = src + length;
+  const char_type *srcinit = src;
+  const char *dstinit = dst;
 
-  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_output) const noexcept final override {
-    return set_best()->convert_valid_utf16be_to_latin1(buf, len, latin1_output);
-  }
+  uint32_t x;
+  size_t idx;
+  uint8_t buffer[4];
+  while (true) {
+    while (src + 4 <= srcend && is_eight_byte(src[0]) &&
+           is_eight_byte(src[1]) && is_eight_byte(src[2]) &&
+           is_eight_byte(src[3]) &&
+           (x = d0[uint8_t(src[0])] | d1[uint8_t(src[1])] |
+                d2[uint8_t(src[2])] | d3[uint8_t(src[3])]) < 0x01FFFFFF) {
+      if (match_system(endianness::BIG)) {
+        x = scalar::utf32::swap_bytes(x);
+      }
+      std::memcpy(dst, &x, 3); // optimization opportunity: copy 4 bytes
+      dst += 3;
+      src += 4;
+    }
+    idx = 0;
+    // we need at least four characters.
+    while (idx < 4 && src < srcend) {
+      char_type c = *src;
+      uint8_t code = to_base64[uint8_t(c)];
+      buffer[idx] = uint8_t(code);
+      if (is_eight_byte(c) && code <= 63) {
+        idx++;
+      } else if (code > 64 || !scalar::base64::is_eight_byte(c)) {
+        return {INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+      } else {
+        // We have a space or a newline. We ignore it.
+      }
+      src++;
+    }
+    if (idx != 4) {
+      if (last_chunk_options == last_chunk_handling_options::strict &&
+          (idx != 1) && ((idx + padded_characters) & 3) != 0) {
+        // The partial chunk was at src - idx
+        return {BASE64_INPUT_REMAINDER, size_t(dst - dstinit)};
+      } else if (last_chunk_options ==
+                     last_chunk_handling_options::stop_before_partial &&
+                 (idx != 1) && ((idx + padded_characters) & 3) != 0) {
+        // Rewind src to before partial chunk
+        src -= idx;
+        return {SUCCESS, size_t(dst - dstinit)};
+      } else {
+        if (idx == 2) {
+          uint32_t triple =
+              (uint32_t(buffer[0]) << 3 * 6) + (uint32_t(buffer[1]) << 2 * 6);
+          if (match_system(endianness::BIG)) {
+            triple <<= 8;
+            std::memcpy(dst, &triple, 1);
+          } else {
+            triple = scalar::utf32::swap_bytes(triple);
+            triple >>= 8;
+            std::memcpy(dst, &triple, 1);
+          }
+          dst += 1;
+        } else if (idx == 3) {
+          uint32_t triple = (uint32_t(buffer[0]) << 3 * 6) +
+                            (uint32_t(buffer[1]) << 2 * 6) +
+                            (uint32_t(buffer[2]) << 1 * 6);
+          if (match_system(endianness::BIG)) {
+            triple <<= 8;
+            std::memcpy(dst, &triple, 2);
+          } else {
+            triple = scalar::utf32::swap_bytes(triple);
+            triple >>= 8;
+            std::memcpy(dst, &triple, 2);
+          }
+          dst += 2;
+        } else if (idx == 1) {
+          return {BASE64_INPUT_REMAINDER, size_t(dst - dstinit)};
+        }
+        return {SUCCESS, size_t(dst - dstinit)};
+      }
+    }
 
-  simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_output) const noexcept final override {
-    return set_best()->convert_utf16le_to_utf8(buf, len, utf8_output);
+    uint32_t triple =
+        (uint32_t(buffer[0]) << 3 * 6) + (uint32_t(buffer[1]) << 2 * 6) +
+        (uint32_t(buffer[2]) << 1 * 6) + (uint32_t(buffer[3]) << 0 * 6);
+    if (match_system(endianness::BIG)) {
+      triple <<= 8;
+      std::memcpy(dst, &triple, 3);
+    } else {
+      triple = scalar::utf32::swap_bytes(triple);
+      triple >>= 8;
+      std::memcpy(dst, &triple, 3);
+    }
+    dst += 3;
   }
+}
 
-  simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_output) const noexcept final override {
-    return set_best()->convert_utf16be_to_utf8(buf, len, utf8_output);
+// Like base64_tail_decode, but it will not write past the end of the output
+// buffer. On entry, outlen is the capacity of dst in bytes; on return, it is
+// set to the number of bytes written. This function assumes that the padding
+// (=) has been removed. On success, the count in the returned result is the
+// number of bytes written; on INVALID_BASE64_CHARACTER or
+// OUTPUT_BUFFER_TOO_SMALL, it is a position in the input.
+template <class char_type>
+result base64_tail_decode_safe(
+    char *dst, size_t &outlen, const char_type *src, size_t length,
+    size_t padded_characters, // number of padding characters '=', typically 0,
+                              // 1, 2.
+    base64_options options, last_chunk_handling_options last_chunk_options) {
+  if (length == 0) {
+    outlen = 0;
+    return {SUCCESS, 0};
   }
+  // This looks like 5 branches, but we expect the compiler to resolve this to a
+  // single branch:
+  const uint8_t *to_base64 = (options & base64_url)
+                                 ? tables::base64::to_base64_url_value
+                                 : tables::base64::to_base64_value;
+  const uint32_t *d0 = (options & base64_url)
+                           ? tables::base64::base64_url::d0
+                           : tables::base64::base64_default::d0;
+  const uint32_t *d1 = (options & base64_url)
+                           ? tables::base64::base64_url::d1
+                           : tables::base64::base64_default::d1;
+  const uint32_t *d2 = (options & base64_url)
+                           ? tables::base64::base64_url::d2
+                           : tables::base64::base64_default::d2;
+  const uint32_t *d3 = (options & base64_url)
+                           ? tables::base64::base64_url::d3
+                           : tables::base64::base64_default::d3;
 
-  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_output) const noexcept final override {
-    return set_best()->convert_utf16le_to_utf8_with_errors(buf, len, utf8_output);
-  }
+  const char_type *srcend = src + length;
+  const char_type *srcinit = src;
+  const char *dstinit = dst;
+  const char *dstend = dst + outlen;
 
-  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_output) const noexcept final override {
-    return set_best()->convert_utf16be_to_utf8_with_errors(buf, len, utf8_output);
-  }
+  uint32_t x;
+  size_t idx;
+  uint8_t buffer[4];
+  while (true) {
+    while (src + 4 <= srcend && is_eight_byte(src[0]) &&
+           is_eight_byte(src[1]) && is_eight_byte(src[2]) &&
+           is_eight_byte(src[3]) &&
+           (x = d0[uint8_t(src[0])] | d1[uint8_t(src[1])] |
+                d2[uint8_t(src[2])] | d3[uint8_t(src[3])]) < 0x01FFFFFF) {
+      if (dstend - dst < 3) {
+        outlen = size_t(dst - dstinit);
+        return {OUTPUT_BUFFER_TOO_SMALL, size_t(src - srcinit)};
+      }
+      if (match_system(endianness::BIG)) {
+        x = scalar::utf32::swap_bytes(x);
+      }
+      std::memcpy(dst, &x, 3); // optimization opportunity: copy 4 bytes
+      dst += 3;
+      src += 4;
+    }
+    idx = 0;
+    const char_type *srccur = src;
+    // We need at least four characters.
+    while (idx < 4 && src < srcend) {
+      char_type c = *src;
+      uint8_t code = to_base64[uint8_t(c)];
 
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_output) const noexcept final override {
-    return set_best()->convert_valid_utf16le_to_utf8(buf, len, utf8_output);
-  }
+      buffer[idx] = uint8_t(code);
+      if (is_eight_byte(c) && code <= 63) {
+        idx++;
+      } else if (code > 64 || !scalar::base64::is_eight_byte(c)) {
+        outlen = size_t(dst - dstinit);
+        return {INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+      } else {
+        // We have a space or a newline. We ignore it.
+      }
+      src++;
+    }
+    if (idx != 4) {
+      if (last_chunk_options == last_chunk_handling_options::strict &&
+          ((idx + padded_characters) & 3) != 0) {
+        outlen = size_t(dst - dstinit);
+        return {BASE64_INPUT_REMAINDER, size_t(src - srcinit)};
+      } else if (last_chunk_options ==
+                     last_chunk_handling_options::stop_before_partial &&
+                 ((idx + padded_characters) & 3) != 0) {
+        // Rewind src to before partial chunk
+        src = srccur;
+        outlen = size_t(dst - dstinit);
+        return {SUCCESS, size_t(dst - dstinit)};
+      } else { // loose mode
+        if (idx == 0) {
+          // No data left; return success
+          outlen = size_t(dst - dstinit);
+          return {SUCCESS, size_t(dst - dstinit)};
+        } else if (idx == 1) {
+          // Error: Incomplete chunk of length 1 is invalid in loose mode
+          outlen = size_t(dst - dstinit);
+          return {BASE64_INPUT_REMAINDER, size_t(src - srcinit)};
+        } else if (idx == 2 || idx == 3) {
+          // Check if there's enough space in the destination buffer
+          size_t required_space = (idx == 2) ? 1 : 2;
+          if (size_t(dstend - dst) < required_space) {
+            outlen = size_t(dst - dstinit);
+            return {OUTPUT_BUFFER_TOO_SMALL, size_t(srccur - srcinit)};
+          }
+          uint32_t triple = 0;
+          if (idx == 2) {
+            triple = (uint32_t(buffer[0]) << 18) + (uint32_t(buffer[1]) << 12);
+            // Extract the first byte
+            triple >>= 16;
+            dst[0] = static_cast<char>(triple & 0xFF);
+            dst += 1;
+          } else if (idx == 3) {
+            triple = (uint32_t(buffer[0]) << 18) + (uint32_t(buffer[1]) << 12) +
+                     (uint32_t(buffer[2]) << 6);
+            // Extract the first two bytes
+            triple >>= 8;
+            dst[0] = static_cast<char>((triple >> 8) & 0xFF);
+            dst[1] = static_cast<char>(triple & 0xFF);
+            dst += 2;
+          }
+          outlen = size_t(dst - dstinit);
+          return {SUCCESS, size_t(dst - dstinit)};
+        }
+      }
+    }
 
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_output) const noexcept final override {
-    return set_best()->convert_valid_utf16be_to_utf8(buf, len, utf8_output);
+    if (dstend - dst < 3) {
+      outlen = size_t(dst - dstinit);
+      return {OUTPUT_BUFFER_TOO_SMALL, size_t(srccur - srcinit)};
+    }
+    uint32_t triple = (uint32_t(buffer[0]) << 18) +
+                      (uint32_t(buffer[1]) << 12) + (uint32_t(buffer[2]) << 6) +
+                      (uint32_t(buffer[3]));
+    if (match_system(endianness::BIG)) {
+      triple <<= 8;
+      std::memcpy(dst, &triple, 3);
+    } else {
+      triple = scalar::utf32::swap_bytes(triple);
+      triple >>= 8;
+      std::memcpy(dst, &triple, 3);
+    }
+    dst += 3;
   }
+}
 
-  simdutf_warn_unused size_t convert_utf32_to_latin1(const char32_t * buf, size_t len, char* latin1_output) const noexcept final override {
-    return set_best()->convert_utf32_to_latin1(buf, len,latin1_output);
+// Returns the number of bytes written. The destination buffer must be large
+// enough. It will add padding (=) if needed.
+size_t tail_encode_base64(char *dst, const char *src, size_t srclen,
+                          base64_options options) {
+  // By default, we use padding if we are not using the URL variant.
+  // This is checked with ((options & base64_url) == 0) which is true if we
+  // are not using the URL variant. However, we also allow 'inversion' of the
+  // convention with the base64_reverse_padding option. If the
+  // base64_reverse_padding option is set, we use padding if we are using the
+  // URL variant, and we omit it if we are not using the URL variant. This is
+  // checked with
+  // ((options & base64_reverse_padding) == base64_reverse_padding).
+  bool use_padding =
+      ((options & base64_url) == 0) ^
+      ((options & base64_reverse_padding) == base64_reverse_padding);
+  // This looks like 3 branches, but we expect the compiler to resolve this to
+  // a single branch:
+  const char *e0 = (options & base64_url) ? tables::base64::base64_url::e0
+                                          : tables::base64::base64_default::e0;
+  const char *e1 = (options & base64_url) ? tables::base64::base64_url::e1
+                                          : tables::base64::base64_default::e1;
+  const char *e2 = (options & base64_url) ? tables::base64::base64_url::e2
+                                          : tables::base64::base64_default::e2;
+  char *out = dst;
+  size_t i = 0;
+  uint8_t t1, t2, t3;
+  for (; i + 2 < srclen; i += 3) {
+    t1 = uint8_t(src[i]);
+    t2 = uint8_t(src[i + 1]);
+    t3 = uint8_t(src[i + 2]);
+    *out++ = e0[t1];
+    *out++ = e1[((t1 & 0x03) << 4) | ((t2 >> 4) & 0x0F)];
+    *out++ = e1[((t2 & 0x0F) << 2) | ((t3 >> 6) & 0x03)];
+    *out++ = e2[t3];
   }
-
-  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(const char32_t * buf, size_t len, char* latin1_output) const noexcept final override {
-    return set_best()->convert_utf32_to_latin1_with_errors(buf, len,latin1_output);
+  switch (srclen - i) {
+  case 0:
+    break;
+  case 1:
+    t1 = uint8_t(src[i]);
+    *out++ = e0[t1];
+    *out++ = e1[(t1 & 0x03) << 4];
+    if (use_padding) {
+      *out++ = '=';
+      *out++ = '=';
+    }
+    break;
+  default: /* case 2 */
+    t1 = uint8_t(src[i]);
+    t2 = uint8_t(src[i + 1]);
+    *out++ = e0[t1];
+    *out++ = e1[((t1 & 0x03) << 4) | ((t2 >> 4) & 0x0F)];
+    *out++ = e2[(t2 & 0x0F) << 2];
+    if (use_padding) {
+      *out++ = '=';
+    }
   }
+  return (size_t)(out - dst);
+}
 
-  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(const char32_t * buf, size_t len, char* latin1_output) const noexcept final override {
-    return set_best()->convert_utf32_to_latin1(buf, len,latin1_output);
+template <class char_type>
+simdutf_warn_unused size_t maximal_binary_length_from_base64(
+    const char_type *input, size_t length) noexcept {
+  // We follow https://infra.spec.whatwg.org/#forgiving-base64-decode
+  size_t padding = 0;
+  if (length > 0) {
+    if (input[length - 1] == '=') {
+      padding++;
+      if (length > 1 && input[length - 2] == '=') {
+        padding++;
+      }
+    }
   }
-
-  simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_output) const noexcept final override {
-    return set_best()->convert_utf32_to_utf8(buf, len, utf8_output);
+  size_t actual_length = length - padding;
+  if (actual_length % 4 <= 1) {
+    return actual_length / 4 * 3;
   }
+  // if we have a valid input, then the remainder must be 2 or 3 adding one or
+  // two extra bytes.
+  return actual_length / 4 * 3 + (actual_length % 4) - 1;
+}
 
-  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(const char32_t * buf, size_t len, char* utf8_output) const noexcept final override {
-    return set_best()->convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
+simdutf_warn_unused size_t
+base64_length_from_binary(size_t length, base64_options options) noexcept {
+  // By default, we use padding if we are not using the URL variant.
+  // This is checked with ((options & base64_url) == 0) which is true if we
+  // are not using the URL variant. However, we also allow 'inversion' of the
+  // convention with the base64_reverse_padding option. If the
+  // base64_reverse_padding option is set, we use padding if we are using the
+  // URL variant, and we omit it if we are not using the URL variant. This is
+  // checked with
+  // ((options & base64_reverse_padding) == base64_reverse_padding).
+  bool use_padding =
+      ((options & base64_url) == 0) ^
+      ((options & base64_reverse_padding) == base64_reverse_padding);
+  if (!use_padding) {
+    return length / 3 * 4 + ((length % 3) ? (length % 3) + 1 : 0);
   }
+  return (length + 2) / 3 *
+         4; // We use padding to make the length a multiple of 4.
+}
 
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_output) const noexcept final override {
-    return set_best()->convert_valid_utf32_to_utf8(buf, len, utf8_output);
-  }
+} // namespace base64
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
 
-  simdutf_warn_unused size_t convert_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_utf32_to_utf16le(buf, len, utf16_output);
-  }
+#endif
+/* end file src/scalar/base64.h */
+/* begin file src/scalar/latin1_to_utf8/latin1_to_utf8.h */
+#ifndef SIMDUTF_LATIN1_TO_UTF8_H
+#define SIMDUTF_LATIN1_TO_UTF8_H
 
-  simdutf_warn_unused size_t convert_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_utf32_to_utf16be(buf, len, utf16_output);
-  }
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace latin1_to_utf8 {
 
-  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(const char32_t * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_utf32_to_utf16le_with_errors(buf, len, utf16_output);
-  }
+inline size_t convert(const char *buf, size_t len, char *utf8_output) {
+  const unsigned char *data = reinterpret_cast<const unsigned char *>(buf);
+  size_t pos = 0;
+  size_t utf8_pos = 0;
+  while (pos < len) {
+    // try to convert the next block of 16 ASCII bytes
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
+      uint64_t v1;
+      ::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 |
+                 v2}; // We are only interested in these bits: 1000 1000 1000
+                      // 1000, so it makes sense to OR everything together
+      if ((v & 0x8080808080808080) ==
+          0) { // if NONE of these are set, i.e. all of them are zero, then
+               // everything is ASCII
+        size_t final_pos = pos + 16;
+        while (pos < final_pos) {
+          utf8_output[utf8_pos++] = char(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
 
-  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(const char32_t * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_utf32_to_utf16be_with_errors(buf, len, utf16_output);
+    unsigned char byte = data[pos];
+    if ((byte & 0x80) == 0) { // if ASCII
+      // will generate one UTF-8 byte
+      utf8_output[utf8_pos++] = char(byte);
+      pos++;
+    } else {
+      // will generate two UTF-8 bytes
+      utf8_output[utf8_pos++] = char((byte >> 6) | 0b11000000);
+      utf8_output[utf8_pos++] = char((byte & 0b111111) | 0b10000000);
+      pos++;
+    }
   }
+  return utf8_pos;
+}
 
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_valid_utf32_to_utf16le(buf, len, utf16_output);
+inline size_t convert_safe(const char *buf, size_t len, char *utf8_output,
+                           size_t utf8_len) {
+  const unsigned char *data = reinterpret_cast<const unsigned char *>(buf);
+  size_t pos = 0;
+  size_t skip_pos = 0;
+  size_t utf8_pos = 0;
+  while (pos < len && utf8_pos < utf8_len) {
+    // try to convert the next block of 16 ASCII bytes
+    if (pos >= skip_pos && pos + 16 <= len &&
+        utf8_pos + 16 <= utf8_len) { // if it is safe to read 16 more bytes,
+                                     // check that they are ascii
+      uint64_t v1;
+      ::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 |
+                 v2}; // We are only interested in these bits: 1000 1000 1000
+                      // 1000, so it makes sense to OR everything together
+      if ((v & 0x8080808080808080) ==
+          0) { // if NONE of these are set, i.e. all of them are zero, then
+               // everything is ASCII
+        ::memcpy(utf8_output + utf8_pos, buf + pos, 16);
+        utf8_pos += 16;
+        pos += 16;
+      } else {
+        // At least one of the next 16 bytes is not ASCII, we will process them
+        // one by one
+        skip_pos = pos + 16;
+      }
+    } else {
+      const auto byte = data[pos];
+      if ((byte & 0x80) == 0) { // if ASCII
+        // will generate one UTF-8 byte
+        utf8_output[utf8_pos++] = char(byte);
+        pos++;
+      } else if (utf8_pos + 2 <= utf8_len) {
+        // will generate two UTF-8 bytes
+        utf8_output[utf8_pos++] = char((byte >> 6) | 0b11000000);
+        utf8_output[utf8_pos++] = char((byte & 0b111111) | 0b10000000);
+        pos++;
+      } else {
+        break;
+      }
+    }
   }
+  return utf8_pos;
+}
 
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_output) const noexcept final override {
-    return set_best()->convert_valid_utf32_to_utf16be(buf, len, utf16_output);
-  }
+} // namespace latin1_to_utf8
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
 
-  simdutf_warn_unused size_t convert_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_output) const noexcept final override {
+#endif
+/* end file src/scalar/latin1_to_utf8/latin1_to_utf8.h */
+
+namespace simdutf {
+bool implementation::supported_by_runtime_system() const {
+  uint32_t required_instruction_sets = this->required_instruction_sets();
+  uint32_t supported_instruction_sets =
+      internal::detect_supported_architectures();
+  return ((supported_instruction_sets & required_instruction_sets) ==
+          required_instruction_sets);
+}
+
+simdutf_warn_unused encoding_type implementation::autodetect_encoding(
+    const char *input, size_t length) const noexcept {
+  // If there is a BOM, then we trust it.
+  auto bom_encoding = simdutf::BOM::check_bom(input, length);
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
+  }
+  // UTF8 is common, it includes ASCII, and is commonly represented
+  // without a BOM, so if it fits, go with that. Note that it is still
+  // possible to get it wrong, we are only 'guessing'. If someone has UTF-16
+  // data without a BOM, it could pass as UTF-8.
+  //
+  // An interesting twist might be to check for UTF-16 ASCII first (every
+  // other byte is zero).
+  if (validate_utf8(input, length)) {
+    return encoding_type::UTF8;
+  }
+  // The next most common encoding that might appear without BOM is probably
+  // UTF-16LE, so try that next.
+  if ((length % 2) == 0) {
+    // important: we need to divide by two
+    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
+                         length / 2)) {
+      return encoding_type::UTF16_LE;
+    }
+  }
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      return encoding_type::UTF32_LE;
+    }
+  }
+  return encoding_type::unspecified;
+}
+
+namespace internal {
+// When there is a single implementation, we should not pay a price
+// for dispatching to the best implementation. We should just use the
+// one we have. This is a compile-time check.
+#define SIMDUTF_SINGLE_IMPLEMENTATION                                          \
+  (SIMDUTF_IMPLEMENTATION_ICELAKE + SIMDUTF_IMPLEMENTATION_HASWELL +           \
+       SIMDUTF_IMPLEMENTATION_WESTMERE + SIMDUTF_IMPLEMENTATION_ARM64 +        \
+       SIMDUTF_IMPLEMENTATION_PPC64 + SIMDUTF_IMPLEMENTATION_FALLBACK ==       \
+   1)
+
+// Static array of known implementations. We are hoping these get baked into the
+// executable without requiring a static initializer.
+
+#if SIMDUTF_IMPLEMENTATION_ICELAKE
+static const icelake::implementation *get_icelake_singleton() {
+  static const icelake::implementation icelake_singleton{};
+  return &icelake_singleton;
+}
+#endif
+#if SIMDUTF_IMPLEMENTATION_HASWELL
+static const haswell::implementation *get_haswell_singleton() {
+  static const haswell::implementation haswell_singleton{};
+  return &haswell_singleton;
+}
+#endif
+#if SIMDUTF_IMPLEMENTATION_WESTMERE
+static const westmere::implementation *get_westmere_singleton() {
+  static const westmere::implementation westmere_singleton{};
+  return &westmere_singleton;
+}
+#endif
+#if SIMDUTF_IMPLEMENTATION_ARM64
+static const arm64::implementation *get_arm64_singleton() {
+  static const arm64::implementation arm64_singleton{};
+  return &arm64_singleton;
+}
+#endif
+#if SIMDUTF_IMPLEMENTATION_PPC64
+static const ppc64::implementation *get_ppc64_singleton() {
+  static const ppc64::implementation ppc64_singleton{};
+  return &ppc64_singleton;
+}
+#endif
+#if SIMDUTF_IMPLEMENTATION_RVV
+static const rvv::implementation *get_rvv_singleton() {
+  static const rvv::implementation rvv_singleton{};
+  return &rvv_singleton;
+}
+#endif
+#if SIMDUTF_IMPLEMENTATION_FALLBACK
+static const fallback::implementation *get_fallback_singleton() {
+  static const fallback::implementation fallback_singleton{};
+  return &fallback_singleton;
+}
+#endif
+
+#if SIMDUTF_SINGLE_IMPLEMENTATION
+static const implementation *get_single_implementation() {
+  return
+  #if SIMDUTF_IMPLEMENTATION_ICELAKE
+      get_icelake_singleton();
+  #endif
+  #if SIMDUTF_IMPLEMENTATION_HASWELL
+  get_haswell_singleton();
+  #endif
+  #if SIMDUTF_IMPLEMENTATION_WESTMERE
+  get_westmere_singleton();
+  #endif
+  #if SIMDUTF_IMPLEMENTATION_ARM64
+  get_arm64_singleton();
+  #endif
+  #if SIMDUTF_IMPLEMENTATION_PPC64
+  get_ppc64_singleton();
+  #endif
+  #if SIMDUTF_IMPLEMENTATION_FALLBACK
+  get_fallback_singleton();
+  #endif
+}
+#endif
+
+/**
+ * @private Detects best supported implementation on first use, and sets it
+ */
+class detect_best_supported_implementation_on_first_use final
+    : public implementation {
+public:
+  std::string name() const noexcept final { return set_best()->name(); }
+  std::string description() const noexcept final {
+    return set_best()->description();
+  }
+  uint32_t required_instruction_sets() const noexcept final {
+    return set_best()->required_instruction_sets();
+  }
+
+  simdutf_warn_unused int
+  detect_encodings(const char *input, size_t length) const noexcept override {
+    return set_best()->detect_encodings(input, length);
+  }
+
+  simdutf_warn_unused bool
+  validate_utf8(const char *buf, size_t len) const noexcept final override {
+    return set_best()->validate_utf8(buf, len);
+  }
+
+  simdutf_warn_unused result validate_utf8_with_errors(
+      const char *buf, size_t len) const noexcept final override {
+    return set_best()->validate_utf8_with_errors(buf, len);
+  }
+
+  simdutf_warn_unused bool
+  validate_ascii(const char *buf, size_t len) const noexcept final override {
+    return set_best()->validate_ascii(buf, len);
+  }
+
+  simdutf_warn_unused result validate_ascii_with_errors(
+      const char *buf, size_t len) const noexcept final override {
+    return set_best()->validate_ascii_with_errors(buf, len);
+  }
+
+  simdutf_warn_unused bool
+  validate_utf16le(const char16_t *buf,
+                   size_t len) const noexcept final override {
+    return set_best()->validate_utf16le(buf, len);
+  }
+
+  simdutf_warn_unused bool
+  validate_utf16be(const char16_t *buf,
+                   size_t len) const noexcept final override {
+    return set_best()->validate_utf16be(buf, len);
+  }
+
+  simdutf_warn_unused result validate_utf16le_with_errors(
+      const char16_t *buf, size_t len) const noexcept final override {
+    return set_best()->validate_utf16le_with_errors(buf, len);
+  }
+
+  simdutf_warn_unused result validate_utf16be_with_errors(
+      const char16_t *buf, size_t len) const noexcept final override {
+    return set_best()->validate_utf16be_with_errors(buf, len);
+  }
+
+  simdutf_warn_unused bool
+  validate_utf32(const char32_t *buf,
+                 size_t len) const noexcept final override {
+    return set_best()->validate_utf32(buf, len);
+  }
+
+  simdutf_warn_unused result validate_utf32_with_errors(
+      const char32_t *buf, size_t len) const noexcept final override {
+    return set_best()->validate_utf32_with_errors(buf, len);
+  }
+
+  simdutf_warn_unused size_t
+  convert_latin1_to_utf8(const char *buf, size_t len,
+                         char *utf8_output) const noexcept final override {
+    return set_best()->convert_latin1_to_utf8(buf, len, utf8_output);
+  }
+
+  simdutf_warn_unused size_t convert_latin1_to_utf16le(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_latin1_to_utf16le(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused size_t convert_latin1_to_utf16be(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_latin1_to_utf16be(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused size_t convert_latin1_to_utf32(
+      const char *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_latin1_to_utf32(buf, len, utf32_output);
+  }
+
+  simdutf_warn_unused size_t
+  convert_utf8_to_latin1(const char *buf, size_t len,
+                         char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf8_to_latin1(buf, len, latin1_output);
+  }
+
+  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+      const char *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf8_to_latin1_with_errors(buf, len,
+                                                          latin1_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+      const char *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_valid_utf8_to_latin1(buf, len, latin1_output);
+  }
+
+  simdutf_warn_unused size_t convert_utf8_to_utf16le(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf8_to_utf16le(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused size_t convert_utf8_to_utf16be(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf8_to_utf16be(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf8_to_utf16le_with_errors(buf, len,
+                                                           utf16_output);
+  }
+
+  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf8_to_utf16be_with_errors(buf, len,
+                                                           utf16_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_valid_utf8_to_utf16le(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_valid_utf8_to_utf16be(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused size_t
+  convert_utf8_to_utf32(const char *buf, size_t len,
+                        char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_utf8_to_utf32(buf, len, utf32_output);
+  }
+
+  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+      const char *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_utf8_to_utf32_with_errors(buf, len,
+                                                         utf32_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+      const char *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_valid_utf8_to_utf32(buf, len, utf32_output);
+  }
+
+  simdutf_warn_unused size_t
+  convert_utf16le_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf16le_to_latin1(buf, len, latin1_output);
+  }
+
+  simdutf_warn_unused size_t
+  convert_utf16be_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf16be_to_latin1(buf, len, latin1_output);
+  }
+
+  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf16le_to_latin1_with_errors(buf, len,
+                                                             latin1_output);
+  }
+
+  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf16be_to_latin1_with_errors(buf, len,
+                                                             latin1_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(
+      const char16_t *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_valid_utf16le_to_latin1(buf, len, latin1_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(
+      const char16_t *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_valid_utf16be_to_latin1(buf, len, latin1_output);
+  }
+
+  simdutf_warn_unused size_t
+  convert_utf16le_to_utf8(const char16_t *buf, size_t len,
+                          char *utf8_output) const noexcept final override {
+    return set_best()->convert_utf16le_to_utf8(buf, len, utf8_output);
+  }
+
+  simdutf_warn_unused size_t
+  convert_utf16be_to_utf8(const char16_t *buf, size_t len,
+                          char *utf8_output) const noexcept final override {
+    return set_best()->convert_utf16be_to_utf8(buf, len, utf8_output);
+  }
+
+  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+      const char16_t *buf, size_t len,
+      char *utf8_output) const noexcept final override {
+    return set_best()->convert_utf16le_to_utf8_with_errors(buf, len,
+                                                           utf8_output);
+  }
+
+  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+      const char16_t *buf, size_t len,
+      char *utf8_output) const noexcept final override {
+    return set_best()->convert_utf16be_to_utf8_with_errors(buf, len,
+                                                           utf8_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+      const char16_t *buf, size_t len,
+      char *utf8_output) const noexcept final override {
+    return set_best()->convert_valid_utf16le_to_utf8(buf, len, utf8_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+      const char16_t *buf, size_t len,
+      char *utf8_output) const noexcept final override {
+    return set_best()->convert_valid_utf16be_to_utf8(buf, len, utf8_output);
+  }
+
+  simdutf_warn_unused size_t
+  convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                          char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf32_to_latin1(buf, len, latin1_output);
+  }
+
+  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(
+      const char32_t *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf32_to_latin1_with_errors(buf, len,
+                                                           latin1_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(
+      const char32_t *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_valid_utf32_to_latin1(buf, len, latin1_output);
+  }
+
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf8(const char32_t *buf, size_t len,
+                        char *utf8_output) const noexcept final override {
+    return set_best()->convert_utf32_to_utf8(buf, len, utf8_output);
+  }
+
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *buf, size_t len,
+      char *utf8_output) const noexcept final override {
+    return set_best()->convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
+  }
+
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf8(const char32_t *buf, size_t len,
+                              char *utf8_output) const noexcept final override {
+    return set_best()->convert_valid_utf32_to_utf8(buf, len, utf8_output);
+  }
+
+  simdutf_warn_unused size_t convert_utf32_to_utf16le(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf32_to_utf16le(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused size_t convert_utf32_to_utf16be(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf32_to_utf16be(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf32_to_utf16le_with_errors(buf, len,
+                                                            utf16_output);
+  }
+
+  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf32_to_utf16be_with_errors(buf, len,
+                                                            utf16_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_valid_utf32_to_utf16le(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_valid_utf32_to_utf16be(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused size_t convert_utf16le_to_utf32(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
     return set_best()->convert_utf16le_to_utf32(buf, len, utf32_output);
   }
 
-  simdutf_warn_unused size_t convert_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_output) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf16be_to_utf32(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
     return set_best()->convert_utf16be_to_utf32(buf, len, utf32_output);
   }
 
-  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_output) const noexcept final override {
-    return set_best()->convert_utf16le_to_utf32_with_errors(buf, len, utf32_output);
+  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_utf16le_to_utf32_with_errors(buf, len,
+                                                            utf32_output);
   }
 
-  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_output) const noexcept final override {
-    return set_best()->convert_utf16be_to_utf32_with_errors(buf, len, utf32_output);
+  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_utf16be_to_utf32_with_errors(buf, len,
+                                                            utf32_output);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_output) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
     return set_best()->convert_valid_utf16le_to_utf32(buf, len, utf32_output);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_output) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
     return set_best()->convert_valid_utf16be_to_utf32(buf, len, utf32_output);
   }
 
-  void change_endianness_utf16(const char16_t * buf, size_t len, char16_t * output) const noexcept final override {
+  void change_endianness_utf16(const char16_t *buf, size_t len,
+                               char16_t *output) const noexcept final override {
     set_best()->change_endianness_utf16(buf, len, output);
   }
 
-  simdutf_warn_unused size_t count_utf16le(const char16_t * buf, size_t len) const noexcept final override {
+  simdutf_warn_unused size_t
+  count_utf16le(const char16_t *buf, size_t len) const noexcept final override {
     return set_best()->count_utf16le(buf, len);
   }
 
-  simdutf_warn_unused size_t count_utf16be(const char16_t * buf, size_t len) const noexcept final override {
+  simdutf_warn_unused size_t
+  count_utf16be(const char16_t *buf, size_t len) const noexcept final override {
     return set_best()->count_utf16be(buf, len);
   }
 
-  simdutf_warn_unused size_t count_utf8(const char * buf, size_t len) const noexcept final override {
+  simdutf_warn_unused size_t
+  count_utf8(const char *buf, size_t len) const noexcept final override {
     return set_best()->count_utf8(buf, len);
   }
 
-  simdutf_warn_unused size_t latin1_length_from_utf8(const char * buf, size_t len) const noexcept override {
+  simdutf_warn_unused size_t
+  latin1_length_from_utf8(const char *buf, size_t len) const noexcept override {
     return set_best()->latin1_length_from_utf8(buf, len);
   }
 
-  simdutf_warn_unused size_t latin1_length_from_utf16(size_t len) const noexcept override {
+  simdutf_warn_unused size_t
+  latin1_length_from_utf16(size_t len) const noexcept override {
     return set_best()->latin1_length_from_utf16(len);
   }
 
-  simdutf_warn_unused size_t latin1_length_from_utf32(size_t len) const noexcept override {
+  simdutf_warn_unused size_t
+  latin1_length_from_utf32(size_t len) const noexcept override {
     return set_best()->latin1_length_from_utf32(len);
   }
 
-  simdutf_warn_unused size_t utf8_length_from_latin1(const char * buf, size_t len) const noexcept override {
+  simdutf_warn_unused size_t
+  utf8_length_from_latin1(const char *buf, size_t len) const noexcept override {
     return set_best()->utf8_length_from_latin1(buf, len);
   }
 
-  simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t * buf, size_t len) const noexcept override {
+  simdutf_warn_unused size_t utf8_length_from_utf16le(
+      const char16_t *buf, size_t len) const noexcept override {
     return set_best()->utf8_length_from_utf16le(buf, len);
   }
 
-  simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t * buf, size_t len) const noexcept override {
+  simdutf_warn_unused size_t utf8_length_from_utf16be(
+      const char16_t *buf, size_t len) const noexcept override {
     return set_best()->utf8_length_from_utf16be(buf, len);
   }
 
-  simdutf_warn_unused size_t utf16_length_from_latin1(size_t len) const noexcept override {
+  simdutf_warn_unused size_t
+  utf16_length_from_latin1(size_t len) const noexcept override {
     return set_best()->utf16_length_from_latin1(len);
   }
 
-  simdutf_warn_unused size_t utf32_length_from_latin1(size_t len) const noexcept override {
+  simdutf_warn_unused size_t
+  utf32_length_from_latin1(size_t len) const noexcept override {
     return set_best()->utf32_length_from_latin1(len);
   }
 
-  simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t * buf, size_t len) const noexcept override {
+  simdutf_warn_unused size_t utf32_length_from_utf16le(
+      const char16_t *buf, size_t len) const noexcept override {
     return set_best()->utf32_length_from_utf16le(buf, len);
   }
 
-  simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t * buf, size_t len) const noexcept override {
+  simdutf_warn_unused size_t utf32_length_from_utf16be(
+      const char16_t *buf, size_t len) const noexcept override {
     return set_best()->utf32_length_from_utf16be(buf, len);
   }
 
-  simdutf_warn_unused size_t utf16_length_from_utf8(const char * buf, size_t len) const noexcept override {
+  simdutf_warn_unused size_t
+  utf16_length_from_utf8(const char *buf, size_t len) const noexcept override {
     return set_best()->utf16_length_from_utf8(buf, len);
   }
 
-  simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t * buf, size_t len) const noexcept override {
+  simdutf_warn_unused size_t utf8_length_from_utf32(
+      const char32_t *buf, size_t len) const noexcept override {
     return set_best()->utf8_length_from_utf32(buf, len);
   }
 
-  simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t * buf, size_t len) const noexcept override {
+  simdutf_warn_unused size_t utf16_length_from_utf32(
+      const char32_t *buf, size_t len) const noexcept override {
     return set_best()->utf16_length_from_utf32(buf, len);
   }
 
-  simdutf_warn_unused size_t utf32_length_from_utf8(const char * buf, size_t len) const noexcept override {
+  simdutf_warn_unused size_t
+  utf32_length_from_utf8(const char *buf, size_t len) const noexcept override {
     return set_best()->utf32_length_from_utf8(buf, len);
   }
 
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char * input, size_t length) const noexcept override {
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char *input, size_t length) const noexcept override {
     return set_best()->maximal_binary_length_from_base64(input, length);
   }
 
-  simdutf_warn_unused result base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept override {
-    return set_best()->base64_to_binary(input, length, output, options);
+  simdutf_warn_unused result base64_to_binary(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_handling_options =
+          last_chunk_handling_options::loose) const noexcept override {
+    return set_best()->base64_to_binary(input, length, output, options,
+                                        last_chunk_handling_options);
   }
 
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept override {
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char16_t *input, size_t length) const noexcept override {
     return set_best()->maximal_binary_length_from_base64(input, length);
   }
 
-  simdutf_warn_unused result base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept override {
-    return set_best()->base64_to_binary(input, length, output, options);
+  simdutf_warn_unused result base64_to_binary(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_handling_options =
+          last_chunk_handling_options::loose) const noexcept override {
+    return set_best()->base64_to_binary(input, length, output, options,
+                                        last_chunk_handling_options);
   }
 
-  simdutf_warn_unused size_t base64_length_from_binary(size_t length, base64_options options) const noexcept override {
+  simdutf_warn_unused size_t base64_length_from_binary(
+      size_t length, base64_options options) const noexcept override {
     return set_best()->base64_length_from_binary(length, options);
   }
 
-  size_t binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept override {
+  size_t binary_to_base64(const char *input, size_t length, char *output,
+                          base64_options options) const noexcept override {
     return set_best()->binary_to_base64(input, length, output, options);
   }
 
-  simdutf_really_inline detect_best_supported_implementation_on_first_use() noexcept : implementation("best_supported_detector", "Detects the best supported implementation and sets it", 0) {}
+  simdutf_really_inline
+  detect_best_supported_implementation_on_first_use() noexcept
+      : implementation("best_supported_detector",
+                       "Detects the best supported implementation and sets it",
+                       0) {}
 
 private:
   const implementation *set_best() const noexcept;
 };
 
-static_assert(std::is_trivially_destructible<detect_best_supported_implementation_on_first_use>::value, "detect_best_supported_implementation_on_first_use should be trivially destructible");
+static_assert(std::is_trivially_destructible<
+                  detect_best_supported_implementation_on_first_use>::value,
+              "detect_best_supported_implementation_on_first_use should be "
+              "trivially destructible");
 
-static const std::initializer_list<const implementation *>& get_available_implementation_pointers() {
-  static const std::initializer_list<const implementation *> available_implementation_pointers {
+static const std::initializer_list<const implementation *> &
+get_available_implementation_pointers() {
+  static const std::initializer_list<const implementation *>
+      available_implementation_pointers{
 #if SIMDUTF_IMPLEMENTATION_ICELAKE
-    get_icelake_singleton(),
+          get_icelake_singleton(),
 #endif
 #if SIMDUTF_IMPLEMENTATION_HASWELL
-    get_haswell_singleton(),
+          get_haswell_singleton(),
 #endif
 #if SIMDUTF_IMPLEMENTATION_WESTMERE
-    get_westmere_singleton(),
+          get_westmere_singleton(),
 #endif
 #if SIMDUTF_IMPLEMENTATION_ARM64
-    get_arm64_singleton(),
+          get_arm64_singleton(),
 #endif
 #if SIMDUTF_IMPLEMENTATION_PPC64
-    get_ppc64_singleton(),
+          get_ppc64_singleton(),
 #endif
 #if SIMDUTF_IMPLEMENTATION_RVV
-    get_rvv_singleton(),
+          get_rvv_singleton(),
 #endif
 #if SIMDUTF_IMPLEMENTATION_FALLBACK
-    get_fallback_singleton(),
+          get_fallback_singleton(),
 #endif
-  }; // available_implementation_pointers
+      }; // available_implementation_pointers
   return available_implementation_pointers;
 }
 
-// So we can return UNSUPPORTED_ARCHITECTURE from the parser when there is no support
+// So we can return UNSUPPORTED_ARCHITECTURE from the parser when there is no
+// support
 class unsupported_implementation final : public implementation {
 public:
-  simdutf_warn_unused int detect_encodings(const char *, size_t) const noexcept override {
+  simdutf_warn_unused int detect_encodings(const char *,
+                                           size_t) const noexcept override {
     return encoding_type::unspecified;
   }
 
-  simdutf_warn_unused bool validate_utf8(const char *, size_t) const noexcept final override {
-    return false; // Just refuse to validate. Given that we have a fallback implementation
-    // it seems unlikely that unsupported_implementation will ever be used. If it is used,
-    // then it will flag all strings as invalid. The alternative is to return an error_code
-    // from which the user has to figure out whether the string is valid UTF-8... which seems
-    // like a lot of work just to handle the very unlikely case that we have an unsupported
-    // implementation. And, when it does happen (that we have an unsupported implementation),
-    // what are the chances that the programmer has a fallback? Given that *we* provide the
-    // fallback, it implies that the programmer would need a fallback for our fallback.
+  simdutf_warn_unused bool validate_utf8(const char *,
+                                         size_t) const noexcept final override {
+    // Just refuse to validate. Given that we have a fallback implementation,
+    // it seems unlikely that unsupported_implementation will ever be used. If
+    // it is used, then it will flag all strings as invalid. The alternative is
+    // to return an error_code from which the user has to figure out whether the
+    // string is valid UTF-8... which seems like a lot of work just to handle
+    // the very unlikely case that we have an unsupported implementation. And,
+    // when it does happen (that we have an unsupported implementation), what
+    // are the chances that the programmer has a fallback? Given that *we*
+    // provide the fallback, it implies that the programmer would need a
+    // fallback for our fallback.
+    return false;
   }
 
-  simdutf_warn_unused result validate_utf8_with_errors(const char *, size_t) const noexcept final override {
+  simdutf_warn_unused result validate_utf8_with_errors(
+      const char *, size_t) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused bool validate_ascii(const char *, size_t) const noexcept final override {
+  simdutf_warn_unused bool
+  validate_ascii(const char *, size_t) const noexcept final override {
     return false;
   }
 
-  simdutf_warn_unused result validate_ascii_with_errors(const char *, size_t) const noexcept final override {
+  simdutf_warn_unused result validate_ascii_with_errors(
+      const char *, size_t) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused bool validate_utf16le(const char16_t*, size_t) const noexcept final override {
+  simdutf_warn_unused bool
+  validate_utf16le(const char16_t *, size_t) const noexcept final override {
     return false;
   }
 
-  simdutf_warn_unused bool validate_utf16be(const char16_t*, size_t) const noexcept final override {
+  simdutf_warn_unused bool
+  validate_utf16be(const char16_t *, size_t) const noexcept final override {
     return false;
   }
 
-  simdutf_warn_unused result validate_utf16le_with_errors(const char16_t*, size_t) const noexcept final override {
+  simdutf_warn_unused result validate_utf16le_with_errors(
+      const char16_t *, size_t) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused result validate_utf16be_with_errors(const char16_t*, size_t) const noexcept final override {
+  simdutf_warn_unused result validate_utf16be_with_errors(
+      const char16_t *, size_t) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused bool validate_utf32(const char32_t*, size_t) const noexcept final override {
+  simdutf_warn_unused bool
+  validate_utf32(const char32_t *, size_t) const noexcept final override {
     return false;
   }
 
-  simdutf_warn_unused result validate_utf32_with_errors(const char32_t*, size_t) const noexcept final override {
+  simdutf_warn_unused result validate_utf32_with_errors(
+      const char32_t *, size_t) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused size_t convert_latin1_to_utf8(const char*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused size_t convert_latin1_to_utf8(
+      const char *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_latin1_to_utf16le(const char*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_latin1_to_utf16le(
+      const char *, size_t, char16_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_latin1_to_utf16be(const char*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_latin1_to_utf16be(
+      const char *, size_t, char16_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_latin1_to_utf32(const char*, size_t, char32_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_latin1_to_utf32(
+      const char *, size_t, char32_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf8_to_latin1(const char*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf8_to_latin1(
+      const char *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(const char*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+      const char *, size_t, char *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(const char*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+      const char *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf8_to_utf16le(const char*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf8_to_utf16le(
+      const char *, size_t, char16_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf8_to_utf16be(const char*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf8_to_utf16be(
+      const char *, size_t, char16_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(const char*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+      const char *, size_t, char16_t *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(const char*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+      const char *, size_t, char16_t *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(const char*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+      const char *, size_t, char16_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(const char*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+      const char *, size_t, char16_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf8_to_utf32(const char*, size_t, char32_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf8_to_utf32(
+      const char *, size_t, char32_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(const char*, size_t, char32_t*) const noexcept final override {
+  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+      const char *, size_t, char32_t *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(const char*, size_t, char32_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+      const char *, size_t, char32_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf16le_to_latin1(const char16_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf16le_to_latin1(
+      const char16_t *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf16be_to_latin1(const char16_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf16be_to_latin1(
+      const char16_t *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(const char16_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+      const char16_t *, size_t, char *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(const char16_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+      const char16_t *, size_t, char *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(const char16_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(
+      const char16_t *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(const char16_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(
+      const char16_t *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf16le_to_utf8(
+      const char16_t *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf16be_to_utf8(
+      const char16_t *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(const char16_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+      const char16_t *, size_t, char *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(const char16_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+      const char16_t *, size_t, char *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(const char16_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+      const char16_t *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(const char16_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+      const char16_t *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf32_to_latin1(const char32_t *, size_t, char* ) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf32_to_latin1(
+      const char32_t *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(const char32_t *, size_t, char* ) const noexcept final override {
+  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(
+      const char32_t *, size_t, char *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(const char32_t *, size_t, char* ) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(
+      const char32_t *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf32_to_utf8(
+      const char32_t *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(const char32_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *, size_t, char *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(const char32_t*, size_t, char*) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+      const char32_t *, size_t, char *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf32_to_utf16le(const char32_t*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf32_to_utf16le(
+      const char32_t *, size_t, char16_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf32_to_utf16be(const char32_t*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf32_to_utf16be(
+      const char32_t *, size_t, char16_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(const char32_t*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+      const char32_t *, size_t, char16_t *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(const char32_t*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+      const char32_t *, size_t, char16_t *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(const char32_t*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(
+      const char32_t *, size_t, char16_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(const char32_t*, size_t, char16_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(
+      const char32_t *, size_t, char16_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf16le_to_utf32(const char16_t*, size_t, char32_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf16le_to_utf32(
+      const char16_t *, size_t, char32_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_utf16be_to_utf32(const char16_t*, size_t, char32_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_utf16be_to_utf32(
+      const char16_t *, size_t, char32_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(const char16_t*, size_t, char32_t*) const noexcept final override {
+  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+      const char16_t *, size_t, char32_t *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(const char16_t*, size_t, char32_t*) const noexcept final override {
+  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+      const char16_t *, size_t, char32_t *) const noexcept final override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(const char16_t*, size_t, char32_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(
+      const char16_t *, size_t, char32_t *) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(const char16_t*, size_t, char32_t*) const noexcept final override {
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(
+      const char16_t *, size_t, char32_t *) const noexcept final override {
     return 0;
   }
 
-  void change_endianness_utf16(const char16_t *, size_t, char16_t *) const noexcept final override {
+  void change_endianness_utf16(const char16_t *, size_t,
+                               char16_t *) const noexcept final override {}
 
-  }
-
-  simdutf_warn_unused size_t count_utf16le(const char16_t *, size_t) const noexcept final override {
+  simdutf_warn_unused size_t
+  count_utf16le(const char16_t *, size_t) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t count_utf16be(const char16_t *, size_t) const noexcept final override {
+  simdutf_warn_unused size_t
+  count_utf16be(const char16_t *, size_t) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t count_utf8(const char *, size_t) const noexcept final override {
+  simdutf_warn_unused size_t count_utf8(const char *,
+                                        size_t) const noexcept final override {
     return 0;
   }
 
-  simdutf_warn_unused size_t latin1_length_from_utf8(const char *, size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  latin1_length_from_utf8(const char *, size_t) const noexcept override {
     return 0;
   }
 
-  simdutf_warn_unused size_t latin1_length_from_utf16(size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  latin1_length_from_utf16(size_t) const noexcept override {
     return 0;
   }
 
-  simdutf_warn_unused size_t latin1_length_from_utf32(size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  latin1_length_from_utf32(size_t) const noexcept override {
     return 0;
   }
-  simdutf_warn_unused size_t utf8_length_from_latin1(const char *, size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  utf8_length_from_latin1(const char *, size_t) const noexcept override {
     return 0;
   }
 
-  simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t *, size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16le(const char16_t *, size_t) const noexcept override {
     return 0;
   }
 
-  simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t *, size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16be(const char16_t *, size_t) const noexcept override {
     return 0;
   }
 
-  simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t *, size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  utf32_length_from_utf16le(const char16_t *, size_t) const noexcept override {
     return 0;
   }
 
-  simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t *, size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  utf32_length_from_utf16be(const char16_t *, size_t) const noexcept override {
     return 0;
   }
 
-  simdutf_warn_unused size_t utf32_length_from_latin1(size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  utf32_length_from_latin1(size_t) const noexcept override {
     return 0;
   }
 
-  simdutf_warn_unused size_t utf16_length_from_utf8(const char *, size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  utf16_length_from_utf8(const char *, size_t) const noexcept override {
     return 0;
   }
-  simdutf_warn_unused size_t utf16_length_from_latin1(size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  utf16_length_from_latin1(size_t) const noexcept override {
     return 0;
   }
-  simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t *, size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  utf8_length_from_utf32(const char32_t *, size_t) const noexcept override {
     return 0;
   }
 
-  simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t *, size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  utf16_length_from_utf32(const char32_t *, size_t) const noexcept override {
     return 0;
   }
 
-  simdutf_warn_unused size_t utf32_length_from_utf8(const char *, size_t) const noexcept override {
+  simdutf_warn_unused size_t
+  utf32_length_from_utf8(const char *, size_t) const noexcept override {
     return 0;
   }
 
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char *, size_t) const noexcept override {
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char *, size_t) const noexcept override {
     return 0;
   }
 
-  simdutf_warn_unused result base64_to_binary(const char *, size_t, char*, base64_options) const noexcept override {
+  simdutf_warn_unused result
+  base64_to_binary(const char *, size_t, char *, base64_options,
+                   last_chunk_handling_options) const noexcept override {
     return result(error_code::OTHER, 0);
   }
 
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(const char16_t *, size_t) const noexcept override {
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char16_t *, size_t) const noexcept override {
     return 0;
   }
 
-  simdutf_warn_unused result base64_to_binary(const char16_t *, size_t, char*, base64_options) const noexcept override {
+  simdutf_warn_unused result
+  base64_to_binary(const char16_t *, size_t, char *, base64_options,
+                   last_chunk_handling_options) const noexcept override {
     return result(error_code::OTHER, 0);
   }
 
-
-  simdutf_warn_unused size_t base64_length_from_binary(size_t, base64_options) const noexcept override {
+  simdutf_warn_unused size_t
+  base64_length_from_binary(size_t, base64_options) const noexcept override {
     return 0;
   }
 
-  size_t binary_to_base64(const char *, size_t, char*, base64_options) const noexcept override {
+  size_t binary_to_base64(const char *, size_t, char *,
+                          base64_options) const noexcept override {
     return 0;
   }
 
-  unsupported_implementation() : implementation("unsupported", "Unsupported CPU (no detected SIMD instructions)", 0) {}
+  unsupported_implementation()
+      : implementation("unsupported",
+                       "Unsupported CPU (no detected SIMD instructions)", 0) {}
 };
 
-const unsupported_implementation* get_unsupported_singleton() {
-    static const unsupported_implementation unsupported_singleton{};
-    return &unsupported_singleton;
+const unsupported_implementation *get_unsupported_singleton() {
+  static const unsupported_implementation unsupported_singleton{};
+  return &unsupported_singleton;
 }
-static_assert(std::is_trivially_destructible<unsupported_implementation>::value, "unsupported_singleton should be trivially destructible");
+static_assert(std::is_trivially_destructible<unsupported_implementation>::value,
+              "unsupported_singleton should be trivially destructible");
 
 size_t available_implementation_list::size() const noexcept {
   return internal::get_available_implementation_pointers().size();
 }
-const implementation * const *available_implementation_list::begin() const noexcept {
+const implementation *const *
+available_implementation_list::begin() const noexcept {
   return internal::get_available_implementation_pointers().begin();
 }
-const implementation * const *available_implementation_list::end() const noexcept {
+const implementation *const *
+available_implementation_list::end() const noexcept {
   return internal::get_available_implementation_pointers().end();
 }
-const implementation *available_implementation_list::detect_best_supported() const noexcept {
+const implementation *
+available_implementation_list::detect_best_supported() const noexcept {
   // They are prelisted in priority order, so we just go down the list
-  uint32_t supported_instruction_sets = internal::detect_supported_architectures();
-  for (const implementation *impl : internal::get_available_implementation_pointers()) {
+  uint32_t supported_instruction_sets =
+      internal::detect_supported_architectures();
+  for (const implementation *impl :
+       internal::get_available_implementation_pointers()) {
     uint32_t required_instruction_sets = impl->required_instruction_sets();
-    if ((supported_instruction_sets & required_instruction_sets) == required_instruction_sets) { return impl; }
+    if ((supported_instruction_sets & required_instruction_sets) ==
+        required_instruction_sets) {
+      return impl;
+    }
   }
   return get_unsupported_singleton(); // this should never happen?
 }
 
-const implementation *detect_best_supported_implementation_on_first_use::set_best() const noexcept {
+const implementation *
+detect_best_supported_implementation_on_first_use::set_best() const noexcept {
   SIMDUTF_PUSH_DISABLE_WARNINGS
-  SIMDUTF_DISABLE_DEPRECATED_WARNING // Disable CRT_SECURE warning on MSVC: manually verified this is safe
-  char *force_implementation_name = getenv("SIMDUTF_FORCE_IMPLEMENTATION");
+  SIMDUTF_DISABLE_DEPRECATED_WARNING // Disable CRT_SECURE warning on MSVC:
+                                     // manually verified this is safe
+      char *force_implementation_name = getenv("SIMDUTF_FORCE_IMPLEMENTATION");
   SIMDUTF_POP_DISABLE_WARNINGS
 
   if (force_implementation_name) {
-    auto force_implementation = get_available_implementations()[force_implementation_name];
+    auto force_implementation =
+        get_available_implementations()[force_implementation_name];
     if (force_implementation) {
       return get_active_implementation() = force_implementation;
     } else {
@@ -6856,43 +8784,47 @@ const implementation *detect_best_supported_implementation_on_first_use::set_bes
       return get_active_implementation() = get_unsupported_singleton();
     }
   }
-  return get_active_implementation() = get_available_implementations().detect_best_supported();
+  return get_active_implementation() =
+             get_available_implementations().detect_best_supported();
 }
 
 } // namespace internal
 
-
-
 /**
  * The list of available implementations compiled into simdutf.
  */
-SIMDUTF_DLLIMPORTEXPORT const internal::available_implementation_list& get_available_implementations() {
-  static const internal::available_implementation_list available_implementations{};
+SIMDUTF_DLLIMPORTEXPORT const internal::available_implementation_list &
+get_available_implementations() {
+  static const internal::available_implementation_list
+      available_implementations{};
   return available_implementations;
 }
 
 /**
  * The active implementation.
  */
-SIMDUTF_DLLIMPORTEXPORT internal::atomic_ptr<const implementation>& get_active_implementation() {
+SIMDUTF_DLLIMPORTEXPORT internal::atomic_ptr<const implementation> &
+get_active_implementation() {
 #if SIMDUTF_SINGLE_IMPLEMENTATION
-    // skip runtime detection
-    static internal::atomic_ptr<const implementation> active_implementation{internal::get_single_implementation()};
-    return active_implementation;
+  // skip runtime detection
+  static internal::atomic_ptr<const implementation> active_implementation{
+      internal::get_single_implementation()};
+  return active_implementation;
 #else
-    static const internal::detect_best_supported_implementation_on_first_use detect_best_supported_implementation_on_first_use_singleton;
-    static internal::atomic_ptr<const implementation> active_implementation{&detect_best_supported_implementation_on_first_use_singleton};
-    return active_implementation;
+  static const internal::detect_best_supported_implementation_on_first_use
+      detect_best_supported_implementation_on_first_use_singleton;
+  static internal::atomic_ptr<const implementation> active_implementation{
+      &detect_best_supported_implementation_on_first_use_singleton};
+  return active_implementation;
 #endif
 }
 
-
 #if SIMDUTF_SINGLE_IMPLEMENTATION
-const implementation * get_default_implementation() {
+const implementation *get_default_implementation() {
   return internal::get_single_implementation();
 }
 #else
-internal::atomic_ptr<const implementation>& get_default_implementation() {
+internal::atomic_ptr<const implementation> &get_default_implementation() {
   return get_active_implementation();
 }
 #endif
@@ -6901,311 +8833,435 @@ internal::atomic_ptr<const implementation>& get_default_implementation() {
 simdutf_warn_unused bool validate_utf8(const char *buf, size_t len) noexcept {
   return get_default_implementation()->validate_utf8(buf, len);
 }
-simdutf_warn_unused result validate_utf8_with_errors(const char *buf, size_t len) noexcept {
+simdutf_warn_unused result validate_utf8_with_errors(const char *buf,
+                                                     size_t len) noexcept {
   return get_default_implementation()->validate_utf8_with_errors(buf, len);
 }
 simdutf_warn_unused bool validate_ascii(const char *buf, size_t len) noexcept {
   return get_default_implementation()->validate_ascii(buf, len);
 }
-simdutf_warn_unused result validate_ascii_with_errors(const char *buf, size_t len) noexcept {
+simdutf_warn_unused result validate_ascii_with_errors(const char *buf,
+                                                      size_t len) noexcept {
   return get_default_implementation()->validate_ascii_with_errors(buf, len);
 }
-simdutf_warn_unused size_t convert_utf8_to_utf16(const char * input, size_t length, char16_t* utf16_output) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t convert_utf8_to_utf16(
+    const char *input, size_t length, char16_t *utf16_output) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_utf8_to_utf16be(input, length, utf16_output);
-  #else
+#else
   return convert_utf8_to_utf16le(input, length, utf16_output);
-  #endif
-}
-simdutf_warn_unused size_t convert_latin1_to_utf8(const char * buf, size_t len, char* utf8_output) noexcept {
-  return get_default_implementation()->convert_latin1_to_utf8(buf, len,utf8_output);
-}
-simdutf_warn_unused size_t convert_latin1_to_utf16le(const char * buf, size_t len, char16_t* utf16_output) noexcept {
-  return get_default_implementation()->convert_latin1_to_utf16le(buf, len, utf16_output);
-}
-simdutf_warn_unused size_t convert_latin1_to_utf16be(const char * buf, size_t len, char16_t* utf16_output) noexcept{
-  return get_default_implementation()->convert_latin1_to_utf16be(buf, len, utf16_output);
-}
-simdutf_warn_unused size_t convert_latin1_to_utf32(const char * buf, size_t len, char32_t * latin1_output) noexcept {
-  return get_default_implementation()->convert_latin1_to_utf32(buf, len,latin1_output);
-}
-simdutf_warn_unused size_t convert_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_latin1(buf, len,latin1_output);
-}
-simdutf_warn_unused result convert_utf8_to_latin1_with_errors(const char* buf, size_t len, char* latin1_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_latin1_with_errors(buf, len, latin1_output);
-}
-simdutf_warn_unused size_t convert_valid_utf8_to_latin1(const char * buf, size_t len, char* latin1_output) noexcept {
-  return get_default_implementation()->convert_valid_utf8_to_latin1(buf, len,latin1_output);
-}
-simdutf_warn_unused size_t convert_utf8_to_utf16le(const char * input, size_t length, char16_t* utf16_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_utf16le(input, length, utf16_output);
-}
-simdutf_warn_unused size_t convert_utf8_to_utf16be(const char * input, size_t length, char16_t* utf16_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_utf16be(input, length, utf16_output);
+#endif
 }
-simdutf_warn_unused result convert_utf8_to_utf16_with_errors(const char * input, size_t length, char16_t* utf16_output) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t convert_latin1_to_utf8(const char *buf, size_t len,
+                                                  char *utf8_output) noexcept {
+  return get_default_implementation()->convert_latin1_to_utf8(buf, len,
+                                                              utf8_output);
+}
+simdutf_warn_unused size_t convert_latin1_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) noexcept {
+  return get_default_implementation()->convert_latin1_to_utf16le(buf, len,
+                                                                 utf16_output);
+}
+simdutf_warn_unused size_t convert_latin1_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) noexcept {
+  return get_default_implementation()->convert_latin1_to_utf16be(buf, len,
+                                                                 utf16_output);
+}
+simdutf_warn_unused size_t convert_latin1_to_utf32(
+    const char *buf, size_t len, char32_t *latin1_output) noexcept {
+  return get_default_implementation()->convert_latin1_to_utf32(buf, len,
+                                                               latin1_output);
+}
+simdutf_warn_unused size_t convert_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_latin1(buf, len,
+                                                              latin1_output);
+}
+simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+    const char *buf, size_t len, char *latin1_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_latin1_with_errors(
+      buf, len, latin1_output);
+}
+simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) noexcept {
+  return get_default_implementation()->convert_valid_utf8_to_latin1(
+      buf, len, latin1_output);
+}
+simdutf_warn_unused size_t convert_utf8_to_utf16le(
+    const char *input, size_t length, char16_t *utf16_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_utf16le(input, length,
+                                                               utf16_output);
+}
+simdutf_warn_unused size_t convert_utf8_to_utf16be(
+    const char *input, size_t length, char16_t *utf16_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_utf16be(input, length,
+                                                               utf16_output);
+}
+simdutf_warn_unused result convert_utf8_to_utf16_with_errors(
+    const char *input, size_t length, char16_t *utf16_output) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_utf8_to_utf16be_with_errors(input, length, utf16_output);
-  #else
+#else
   return convert_utf8_to_utf16le_with_errors(input, length, utf16_output);
-  #endif
+#endif
 }
-simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(const char * input, size_t length, char16_t* utf16_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_utf16le_with_errors(input, length, utf16_output);
+simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+    const char *input, size_t length, char16_t *utf16_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_utf16le_with_errors(
+      input, length, utf16_output);
 }
-simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(const char * input, size_t length, char16_t* utf16_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_utf16be_with_errors(input, length, utf16_output);
+simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+    const char *input, size_t length, char16_t *utf16_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_utf16be_with_errors(
+      input, length, utf16_output);
 }
-simdutf_warn_unused size_t convert_utf8_to_utf32(const char * input, size_t length, char32_t* utf32_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_utf32(input, length, utf32_output);
+simdutf_warn_unused size_t convert_utf8_to_utf32(
+    const char *input, size_t length, char32_t *utf32_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_utf32(input, length,
+                                                             utf32_output);
 }
-simdutf_warn_unused result convert_utf8_to_utf32_with_errors(const char * input, size_t length, char32_t* utf32_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_utf32_with_errors(input, length, utf32_output);
+simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+    const char *input, size_t length, char32_t *utf32_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_utf32_with_errors(
+      input, length, utf32_output);
 }
-simdutf_warn_unused bool validate_utf16(const char16_t * buf, size_t len) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused bool validate_utf16(const char16_t *buf,
+                                        size_t len) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return validate_utf16be(buf, len);
-  #else
+#else
   return validate_utf16le(buf, len);
-  #endif
+#endif
 }
-simdutf_warn_unused bool validate_utf16le(const char16_t * buf, size_t len) noexcept {
+simdutf_warn_unused bool validate_utf16le(const char16_t *buf,
+                                          size_t len) noexcept {
   return get_default_implementation()->validate_utf16le(buf, len);
 }
-simdutf_warn_unused bool validate_utf16be(const char16_t * buf, size_t len) noexcept {
+simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
+                                          size_t len) noexcept {
   return get_default_implementation()->validate_utf16be(buf, len);
 }
-simdutf_warn_unused result validate_utf16_with_errors(const char16_t * buf, size_t len) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused result validate_utf16_with_errors(const char16_t *buf,
+                                                      size_t len) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return validate_utf16be_with_errors(buf, len);
-  #else
+#else
   return validate_utf16le_with_errors(buf, len);
-  #endif
+#endif
 }
-simdutf_warn_unused result validate_utf16le_with_errors(const char16_t * buf, size_t len) noexcept {
+simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf,
+                                                        size_t len) noexcept {
   return get_default_implementation()->validate_utf16le_with_errors(buf, len);
 }
-simdutf_warn_unused result validate_utf16be_with_errors(const char16_t * buf, size_t len) noexcept {
+simdutf_warn_unused result validate_utf16be_with_errors(const char16_t *buf,
+                                                        size_t len) noexcept {
   return get_default_implementation()->validate_utf16be_with_errors(buf, len);
 }
-simdutf_warn_unused bool validate_utf32(const char32_t * buf, size_t len) noexcept {
+simdutf_warn_unused bool validate_utf32(const char32_t *buf,
+                                        size_t len) noexcept {
   return get_default_implementation()->validate_utf32(buf, len);
 }
-simdutf_warn_unused result validate_utf32_with_errors(const char32_t * buf, size_t len) noexcept {
+simdutf_warn_unused result validate_utf32_with_errors(const char32_t *buf,
+                                                      size_t len) noexcept {
   return get_default_implementation()->validate_utf32_with_errors(buf, len);
 }
-simdutf_warn_unused size_t convert_valid_utf8_to_utf16(const char * input, size_t length, char16_t* utf16_buffer) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t convert_valid_utf8_to_utf16(
+    const char *input, size_t length, char16_t *utf16_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_valid_utf8_to_utf16be(input, length, utf16_buffer);
-  #else
+#else
   return convert_valid_utf8_to_utf16le(input, length, utf16_buffer);
-  #endif
+#endif
 }
-simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(const char * input, size_t length, char16_t* utf16_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf8_to_utf16le(input, length, utf16_buffer);
+simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+    const char *input, size_t length, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf8_to_utf16le(
+      input, length, utf16_buffer);
 }
-simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(const char * input, size_t length, char16_t* utf16_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf8_to_utf16be(input, length, utf16_buffer);
+simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+    const char *input, size_t length, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf8_to_utf16be(
+      input, length, utf16_buffer);
 }
-simdutf_warn_unused size_t convert_valid_utf8_to_utf32(const char * input, size_t length, char32_t* utf32_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf8_to_utf32(input, length, utf32_buffer);
+simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+    const char *input, size_t length, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf8_to_utf32(
+      input, length, utf32_buffer);
 }
-simdutf_warn_unused size_t convert_utf16_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t convert_utf16_to_utf8(const char16_t *buf,
+                                                 size_t len,
+                                                 char *utf8_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_utf16be_to_utf8(buf, len, utf8_buffer);
-  #else
+#else
   return convert_utf16le_to_utf8(buf, len, utf8_buffer);
-  #endif
+#endif
 }
-simdutf_warn_unused size_t convert_utf16_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t convert_utf16_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_utf16be_to_latin1(buf, len, latin1_buffer);
-  #else
+#else
   return convert_utf16le_to_latin1(buf, len, latin1_buffer);
-  #endif
+#endif
 }
-simdutf_warn_unused size_t convert_latin1_to_utf16(const char * buf, size_t len, char16_t* utf16_output) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t convert_latin1_to_utf16(
+    const char *buf, size_t len, char16_t *utf16_output) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_latin1_to_utf16be(buf, len, utf16_output);
-  #else
+#else
   return convert_latin1_to_utf16le(buf, len, utf16_output);
-  #endif
-}
-simdutf_warn_unused size_t convert_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) noexcept {
-  return get_default_implementation()->convert_utf16be_to_latin1(buf, len, latin1_buffer);
-}
-simdutf_warn_unused size_t convert_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) noexcept {
-  return get_default_implementation()->convert_utf16le_to_latin1(buf, len, latin1_buffer);
-}
-simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf16be_to_latin1(buf, len, latin1_buffer);
-}
-simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf16le_to_latin1(buf, len, latin1_buffer);
-}
-simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_buffer) noexcept {
-  return get_default_implementation()->convert_utf16le_to_latin1_with_errors(buf, len, latin1_buffer);
-}
-simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_buffer) noexcept {
-  return get_default_implementation()->convert_utf16be_to_latin1_with_errors(buf, len, latin1_buffer);
-}
-simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) noexcept {
-  return get_default_implementation()->convert_utf16le_to_utf8(buf, len, utf8_buffer);
-}
-simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) noexcept {
-  return get_default_implementation()->convert_utf16be_to_utf8(buf, len, utf8_buffer);
+#endif
 }
-simdutf_warn_unused result convert_utf16_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t convert_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+  return get_default_implementation()->convert_utf16be_to_latin1(buf, len,
+                                                                 latin1_buffer);
+}
+simdutf_warn_unused size_t convert_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+  return get_default_implementation()->convert_utf16le_to_latin1(buf, len,
+                                                                 latin1_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf16be_to_latin1(
+      buf, len, latin1_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf16le_to_latin1(
+      buf, len, latin1_buffer);
+}
+simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+  return get_default_implementation()->convert_utf16le_to_latin1_with_errors(
+      buf, len, latin1_buffer);
+}
+simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+  return get_default_implementation()->convert_utf16be_to_latin1_with_errors(
+      buf, len, latin1_buffer);
+}
+simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t *buf,
+                                                   size_t len,
+                                                   char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_utf16le_to_utf8(buf, len,
+                                                               utf8_buffer);
+}
+simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t *buf,
+                                                   size_t len,
+                                                   char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_utf16be_to_utf8(buf, len,
+                                                               utf8_buffer);
+}
+simdutf_warn_unused result convert_utf16_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_utf16be_to_utf8_with_errors(buf, len, utf8_buffer);
-  #else
+#else
   return convert_utf16le_to_utf8_with_errors(buf, len, utf8_buffer);
-  #endif
+#endif
 }
-simdutf_warn_unused result convert_utf16_to_latin1_with_errors(const char16_t * buf, size_t len, char* latin1_buffer) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused result convert_utf16_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_utf16be_to_latin1_with_errors(buf, len, latin1_buffer);
-  #else
+#else
   return convert_utf16le_to_latin1_with_errors(buf, len, latin1_buffer);
-  #endif
+#endif
 }
-simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) noexcept {
-  return get_default_implementation()->convert_utf16le_to_utf8_with_errors(buf, len, utf8_buffer);
+simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_utf16le_to_utf8_with_errors(
+      buf, len, utf8_buffer);
 }
-simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(const char16_t * buf, size_t len, char* utf8_buffer) noexcept {
-  return get_default_implementation()->convert_utf16be_to_utf8_with_errors(buf, len, utf8_buffer);
+simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_utf16be_to_utf8_with_errors(
+      buf, len, utf8_buffer);
 }
-simdutf_warn_unused size_t convert_valid_utf16_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t convert_valid_utf16_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_valid_utf16be_to_utf8(buf, len, utf8_buffer);
-  #else
+#else
   return convert_valid_utf16le_to_utf8(buf, len, utf8_buffer);
-  #endif
+#endif
 }
-simdutf_warn_unused size_t convert_valid_utf16_to_latin1(const char16_t * buf, size_t len, char* latin1_buffer) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t convert_valid_utf16_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_valid_utf16be_to_latin1(buf, len, latin1_buffer);
-  #else
+#else
   return convert_valid_utf16le_to_latin1(buf, len, latin1_buffer);
-  #endif
-}
-simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf16le_to_utf8(buf, len, utf8_buffer);
-}
-simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(const char16_t * buf, size_t len, char* utf8_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf16be_to_utf8(buf, len, utf8_buffer);
-}
-simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) noexcept {
-  return get_default_implementation()->convert_utf32_to_utf8(buf, len, utf8_buffer);
-}
-simdutf_warn_unused result convert_utf32_to_utf8_with_errors(const char32_t * buf, size_t len, char* utf8_buffer) noexcept {
-  return get_default_implementation()->convert_utf32_to_utf8_with_errors(buf, len, utf8_buffer);
-}
-simdutf_warn_unused size_t convert_valid_utf32_to_utf8(const char32_t * buf, size_t len, char* utf8_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf32_to_utf8(buf, len, utf8_buffer);
+#endif
 }
-simdutf_warn_unused size_t convert_utf32_to_utf16(const char32_t * buf, size_t len, char16_t* utf16_buffer) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf16le_to_utf8(
+      buf, len, utf8_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf16be_to_utf8(
+      buf, len, utf8_buffer);
+}
+simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t *buf,
+                                                 size_t len,
+                                                 char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_utf32_to_utf8(buf, len,
+                                                             utf8_buffer);
+}
+simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_utf32_to_utf8_with_errors(
+      buf, len, utf8_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf32_to_utf8(buf, len,
+                                                                   utf8_buffer);
+}
+simdutf_warn_unused size_t convert_utf32_to_utf16(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_utf32_to_utf16be(buf, len, utf16_buffer);
-  #else
+#else
   return convert_utf32_to_utf16le(buf, len, utf16_buffer);
-  #endif
+#endif
 }
-simdutf_warn_unused size_t convert_utf32_to_latin1(const char32_t * input, size_t length, char* latin1_output) noexcept {
-  return get_default_implementation()->convert_utf32_to_latin1(input, length, latin1_output);
+simdutf_warn_unused size_t convert_utf32_to_latin1(
+    const char32_t *input, size_t length, char *latin1_output) noexcept {
+  return get_default_implementation()->convert_utf32_to_latin1(input, length,
+                                                               latin1_output);
 }
-simdutf_warn_unused size_t convert_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) noexcept {
-  return get_default_implementation()->convert_utf32_to_utf16le(buf, len, utf16_buffer);
+simdutf_warn_unused size_t convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_utf32_to_utf16le(buf, len,
+                                                                utf16_buffer);
 }
-simdutf_warn_unused size_t convert_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) noexcept {
-  return get_default_implementation()->convert_utf32_to_utf16be(buf, len, utf16_buffer);
+simdutf_warn_unused size_t convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_utf32_to_utf16be(buf, len,
+                                                                utf16_buffer);
 }
-simdutf_warn_unused result convert_utf32_to_utf16_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused result convert_utf32_to_utf16_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_utf32_to_utf16be_with_errors(buf, len, utf16_buffer);
-  #else
+#else
   return convert_utf32_to_utf16le_with_errors(buf, len, utf16_buffer);
-  #endif
+#endif
 }
-simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) noexcept {
-  return get_default_implementation()->convert_utf32_to_utf16le_with_errors(buf, len, utf16_buffer);
+simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_utf32_to_utf16le_with_errors(
+      buf, len, utf16_buffer);
 }
-simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(const char32_t * buf, size_t len, char16_t* utf16_buffer) noexcept {
-  return get_default_implementation()->convert_utf32_to_utf16be_with_errors(buf, len, utf16_buffer);
+simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_utf32_to_utf16be_with_errors(
+      buf, len, utf16_buffer);
 }
-simdutf_warn_unused size_t convert_valid_utf32_to_utf16(const char32_t * buf, size_t len, char16_t* utf16_buffer) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t convert_valid_utf32_to_utf16(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_valid_utf32_to_utf16be(buf, len, utf16_buffer);
-  #else
+#else
   return convert_valid_utf32_to_utf16le(buf, len, utf16_buffer);
-  #endif
+#endif
 }
-simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(const char32_t * buf, size_t len, char16_t* utf16_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf32_to_utf16le(buf, len, utf16_buffer);
+simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf32_to_utf16le(
+      buf, len, utf16_buffer);
 }
-simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(const char32_t * buf, size_t len, char16_t* utf16_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf32_to_utf16be(buf, len, utf16_buffer);
+simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf32_to_utf16be(
+      buf, len, utf16_buffer);
 }
-simdutf_warn_unused size_t convert_utf16_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t convert_utf16_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_utf16be_to_utf32(buf, len, utf32_buffer);
-  #else
+#else
   return convert_utf16le_to_utf32(buf, len, utf32_buffer);
-  #endif
+#endif
 }
-simdutf_warn_unused size_t convert_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) noexcept {
-  return get_default_implementation()->convert_utf16le_to_utf32(buf, len, utf32_buffer);
+simdutf_warn_unused size_t convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_utf16le_to_utf32(buf, len,
+                                                                utf32_buffer);
 }
-simdutf_warn_unused size_t convert_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) noexcept {
-  return get_default_implementation()->convert_utf16be_to_utf32(buf, len, utf32_buffer);
+simdutf_warn_unused size_t convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_utf16be_to_utf32(buf, len,
+                                                                utf32_buffer);
 }
-simdutf_warn_unused result convert_utf16_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused result convert_utf16_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_utf16be_to_utf32_with_errors(buf, len, utf32_buffer);
-  #else
+#else
   return convert_utf16le_to_utf32_with_errors(buf, len, utf32_buffer);
-  #endif
+#endif
 }
-simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) noexcept {
-  return get_default_implementation()->convert_utf16le_to_utf32_with_errors(buf, len, utf32_buffer);
+simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_utf16le_to_utf32_with_errors(
+      buf, len, utf32_buffer);
 }
-simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(const char16_t * buf, size_t len, char32_t* utf32_buffer) noexcept {
-  return get_default_implementation()->convert_utf16be_to_utf32_with_errors(buf, len, utf32_buffer);
+simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_utf16be_to_utf32_with_errors(
+      buf, len, utf32_buffer);
 }
-simdutf_warn_unused size_t convert_valid_utf16_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t convert_valid_utf16_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return convert_valid_utf16be_to_utf32(buf, len, utf32_buffer);
-  #else
+#else
   return convert_valid_utf16le_to_utf32(buf, len, utf32_buffer);
-  #endif
+#endif
 }
-simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf16le_to_utf32(buf, len, utf32_buffer);
+simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf16le_to_utf32(
+      buf, len, utf32_buffer);
 }
-simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(const char16_t * buf, size_t len, char32_t* utf32_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf16be_to_utf32(buf, len, utf32_buffer);
+simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf16be_to_utf32(
+      buf, len, utf32_buffer);
 }
-void change_endianness_utf16(const char16_t * input, size_t length, char16_t * output) noexcept {
+void change_endianness_utf16(const char16_t *input, size_t length,
+                             char16_t *output) noexcept {
   get_default_implementation()->change_endianness_utf16(input, length, output);
 }
-simdutf_warn_unused size_t count_utf16(const char16_t * input, size_t length) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t count_utf16(const char16_t *input,
+                                       size_t length) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return count_utf16be(input, length);
-  #else
+#else
   return count_utf16le(input, length);
-  #endif
+#endif
 }
-simdutf_warn_unused size_t count_utf16le(const char16_t * input, size_t length) noexcept {
+simdutf_warn_unused size_t count_utf16le(const char16_t *input,
+                                         size_t length) noexcept {
   return get_default_implementation()->count_utf16le(input, length);
 }
-simdutf_warn_unused size_t count_utf16be(const char16_t * input, size_t length) noexcept {
+simdutf_warn_unused size_t count_utf16be(const char16_t *input,
+                                         size_t length) noexcept {
   return get_default_implementation()->count_utf16be(input, length);
 }
-simdutf_warn_unused size_t count_utf8(const char * input, size_t length) noexcept {
+simdutf_warn_unused size_t count_utf8(const char *input,
+                                      size_t length) noexcept {
   return get_default_implementation()->count_utf8(input, length);
 }
-simdutf_warn_unused size_t latin1_length_from_utf8(const char * buf, size_t len) noexcept {
+simdutf_warn_unused size_t latin1_length_from_utf8(const char *buf,
+                                                   size_t len) noexcept {
   return get_default_implementation()->latin1_length_from_utf8(buf, len);
 }
 simdutf_warn_unused size_t latin1_length_from_utf16(size_t len) noexcept {
@@ -7214,86 +9270,120 @@ simdutf_warn_unused size_t latin1_length_from_utf16(size_t len) noexcept {
 simdutf_warn_unused size_t latin1_length_from_utf32(size_t len) noexcept {
   return get_default_implementation()->latin1_length_from_utf32(len);
 }
-simdutf_warn_unused size_t utf8_length_from_latin1(const char * buf, size_t len) noexcept {
+simdutf_warn_unused size_t utf8_length_from_latin1(const char *buf,
+                                                   size_t len) noexcept {
   return get_default_implementation()->utf8_length_from_latin1(buf, len);
 }
-simdutf_warn_unused size_t utf8_length_from_utf16(const char16_t * input, size_t length) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t utf8_length_from_utf16(const char16_t *input,
+                                                  size_t length) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return utf8_length_from_utf16be(input, length);
-  #else
+#else
   return utf8_length_from_utf16le(input, length);
-  #endif
+#endif
 }
-simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t * input, size_t length) noexcept {
+simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t *input,
+                                                    size_t length) noexcept {
   return get_default_implementation()->utf8_length_from_utf16le(input, length);
 }
-simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t * input, size_t length) noexcept {
+simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t *input,
+                                                    size_t length) noexcept {
   return get_default_implementation()->utf8_length_from_utf16be(input, length);
 }
-simdutf_warn_unused size_t utf32_length_from_utf16(const char16_t * input, size_t length) noexcept {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t utf32_length_from_utf16(const char16_t *input,
+                                                   size_t length) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
   return utf32_length_from_utf16be(input, length);
-  #else
+#else
   return utf32_length_from_utf16le(input, length);
-  #endif
+#endif
 }
-simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t * input, size_t length) noexcept {
+simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t *input,
+                                                     size_t length) noexcept {
   return get_default_implementation()->utf32_length_from_utf16le(input, length);
 }
-simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t * input, size_t length) noexcept {
+simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t *input,
+                                                     size_t length) noexcept {
   return get_default_implementation()->utf32_length_from_utf16be(input, length);
 }
-simdutf_warn_unused size_t utf16_length_from_utf8(const char * input, size_t length) noexcept {
+simdutf_warn_unused size_t utf16_length_from_utf8(const char *input,
+                                                  size_t length) noexcept {
   return get_default_implementation()->utf16_length_from_utf8(input, length);
 }
 simdutf_warn_unused size_t utf16_length_from_latin1(size_t length) noexcept {
   return get_default_implementation()->utf16_length_from_latin1(length);
 }
-simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t * input, size_t length) noexcept {
+simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t *input,
+                                                  size_t length) noexcept {
   return get_default_implementation()->utf8_length_from_utf32(input, length);
 }
-simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t * input, size_t length) noexcept {
+simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t *input,
+                                                   size_t length) noexcept {
   return get_default_implementation()->utf16_length_from_utf32(input, length);
 }
-simdutf_warn_unused size_t utf32_length_from_utf8(const char * input, size_t length) noexcept {
+simdutf_warn_unused size_t utf32_length_from_utf8(const char *input,
+                                                  size_t length) noexcept {
   return get_default_implementation()->utf32_length_from_utf8(input, length);
 }
 
-simdutf_warn_unused size_t maximal_binary_length_from_base64(const char * input, size_t length) noexcept {
-  return get_default_implementation()->maximal_binary_length_from_base64(input, length);
+simdutf_warn_unused size_t
+maximal_binary_length_from_base64(const char *input, size_t length) noexcept {
+  return get_default_implementation()->maximal_binary_length_from_base64(
+      input, length);
 }
 
-simdutf_warn_unused result base64_to_binary(const char * input, size_t length, char* output, base64_options options) noexcept {
-  return get_default_implementation()->base64_to_binary(input, length, output, options);
+simdutf_warn_unused result base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_handling_options) noexcept {
+  return get_default_implementation()->base64_to_binary(
+      input, length, output, options, last_chunk_handling_options);
 }
 
-simdutf_warn_unused size_t maximal_binary_length_from_base64(const char16_t * input, size_t length) noexcept {
-  return get_default_implementation()->maximal_binary_length_from_base64(input, length);
+simdutf_warn_unused size_t maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) noexcept {
+  return get_default_implementation()->maximal_binary_length_from_base64(
+      input, length);
 }
 
-simdutf_warn_unused result base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) noexcept {
-  return get_default_implementation()->base64_to_binary(input, length, output, options);
+simdutf_warn_unused result base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_handling_options) noexcept {
+  return get_default_implementation()->base64_to_binary(
+      input, length, output, options, last_chunk_handling_options);
 }
 
 template <typename chartype>
-simdutf_warn_unused result base64_to_binary_safe_impl(const chartype * input, size_t length, char* output, size_t& outlen, base64_options options) noexcept {
-  static_assert(std::is_same<chartype, char>::value || std::is_same<chartype, char16_t>::value, "Only char and char16_t are supported.");
+simdutf_warn_unused result base64_to_binary_safe_impl(
+    const chartype *input, size_t length, char *output, size_t &outlen,
+    base64_options options,
+    last_chunk_handling_options last_chunk_handling_options) noexcept {
+  static_assert(std::is_same<chartype, char>::value ||
+                    std::is_same<chartype, char16_t>::value,
+                "Only char and char16_t are supported.");
   // The implementation could be nicer, but we expect that most times, the user
   // will provide us with a buffer that is large enough.
   size_t max_length = maximal_binary_length_from_base64(input, length);
-  if(outlen >= max_length) {
+  if (outlen >= max_length) {
     // fast path
-    result r = base64_to_binary(input, length, output, options);
-    if(r.error != error_code::INVALID_BASE64_CHARACTER) { outlen = r.count; r.count = length; }
+    result r = base64_to_binary(input, length, output, options,
+                                last_chunk_handling_options);
+    if (r.error != error_code::INVALID_BASE64_CHARACTER) {
+      outlen = r.count;
+      r.count = length;
+    }
     return r;
   }
-  // The output buffer is maybe too small. We will decode a truncated version of the input.
+  // The output buffer may be too small. We will decode a truncated version of
+  // the input.
   size_t outlen3 = outlen / 3 * 3; // round down to multiple of 3
   size_t safe_input = base64_length_from_binary(outlen3, options);
-  result r = base64_to_binary(input, safe_input, output, options);
-  if(r.error == error_code::INVALID_BASE64_CHARACTER) { return r; }
-  size_t offset = (r.error == error_code::BASE64_INPUT_REMAINDER) ? 1 :
-    ((r.count % 3) == 0 ? 0 : (r.count % 3) + 1);
+  result r = base64_to_binary(input, safe_input, output, options, loose);
+  if (r.error == error_code::INVALID_BASE64_CHARACTER) {
+    return r;
+  }
+  size_t offset = (r.error == error_code::BASE64_INPUT_REMAINDER)
+                      ? 1
+                      : ((r.count % 3) == 0 ? 0 : (r.count % 3) + 1);
   size_t output_index = r.count - (r.count % 3);
   size_t input_index = safe_input;
   // offset is a value that is no larger than 3. We backtrack
@@ -7301,37 +9391,42 @@ simdutf_warn_unused result base64_to_binary_safe_impl(const chartype * input, si
   // white space characters. It is expected that the next loop
   // runs at most 3 times + the number of white space characters
   // in between them, so we are not worried about performance.
-  while(offset > 0 && input_index > 0) {
+  while (offset > 0 && input_index > 0) {
     chartype c = input[--input_index];
-    if(scalar::base64::is_ascii_white_space(c)){
+    if (scalar::base64::is_ascii_white_space(c)) {
       // skipping
     } else {
       offset--;
     }
   }
   size_t remaining_out = outlen - output_index;
-  const chartype * tail_input = input + input_index;
+  const chartype *tail_input = input + input_index;
   size_t tail_length = length - input_index;
-  while(tail_length > 0 && scalar::base64::is_ascii_white_space(tail_input[tail_length - 1])) {
+  while (tail_length > 0 &&
+         scalar::base64::is_ascii_white_space(tail_input[tail_length - 1])) {
     tail_length--;
   }
   size_t padding_characts = 0;
-  if(tail_length > 0 && tail_input[tail_length - 1] == '=') {
+  if (tail_length > 0 && tail_input[tail_length - 1] == '=') {
     tail_length--;
     padding_characts++;
-    while(tail_length > 0 && scalar::base64::is_ascii_white_space(tail_input[tail_length - 1])) {
+    while (tail_length > 0 &&
+           scalar::base64::is_ascii_white_space(tail_input[tail_length - 1])) {
       tail_length--;
     }
-    if(tail_length > 0 && tail_input[tail_length - 1] == '=') {
+    if (tail_length > 0 && tail_input[tail_length - 1] == '=') {
       tail_length--;
       padding_characts++;
     }
   }
-  r = scalar::base64::base64_tail_decode_safe(output + output_index, remaining_out, tail_input, tail_length, options);
+  r = scalar::base64::base64_tail_decode_safe(
+      output + output_index, remaining_out, tail_input, tail_length,
+      padding_characts, options, last_chunk_handling_options);
   outlen = output_index + remaining_out;
-  if(r.error == error_code::SUCCESS && padding_characts > 0) {
+  if (last_chunk_handling_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && padding_characts > 0) {
     // additional checks
-    if((outlen % 3 == 0) || ((outlen % 3) + 1 + padding_characts != 4)) {
+    if ((outlen % 3 == 0) || ((outlen % 3) + 1 + padding_characts != 4)) {
       r.error = error_code::INVALID_BASE64_CHARACTER;
     }
   }
@@ -7339,9 +9434,8 @@ simdutf_warn_unused result base64_to_binary_safe_impl(const chartype * input, si
   return r;
 }
 
-
-
-simdutf_warn_unused size_t convert_latin1_to_utf8_safe(const char * buf, size_t len, char* utf8_output, size_t utf8_len) noexcept {
+simdutf_warn_unused size_t convert_latin1_to_utf8_safe(
+    const char *buf, size_t len, char *utf8_output, size_t utf8_len) noexcept {
   const auto start{utf8_output};
 
   while (true) {
@@ -7351,7 +9445,8 @@ simdutf_warn_unused size_t convert_latin1_to_utf8_safe(const char * buf, size_t
       break;
     }
 
-    const auto write_len = simdutf::convert_latin1_to_utf8(buf, read_len, utf8_output);
+    const auto write_len =
+        simdutf::convert_latin1_to_utf8(buf, read_len, utf8_output);
 
     utf8_output += write_len;
     utf8_len -= write_len;
@@ -7359,35 +9454,51 @@ simdutf_warn_unused size_t convert_latin1_to_utf8_safe(const char * buf, size_t
     len -= read_len;
   }
 
-  utf8_output += scalar::latin1_to_utf8::convert_safe(buf, len, utf8_output, utf8_len);
+  utf8_output +=
+      scalar::latin1_to_utf8::convert_safe(buf, len, utf8_output, utf8_len);
 
   return utf8_output - start;
 }
 
-
-simdutf_warn_unused result base64_to_binary_safe(const char * input, size_t length, char* output, size_t& outlen, base64_options options) noexcept {
-  return base64_to_binary_safe_impl<char>(input, length, output, outlen, options);
+simdutf_warn_unused result base64_to_binary_safe(
+    const char *input, size_t length, char *output, size_t &outlen,
+    base64_options options,
+    last_chunk_handling_options last_chunk_handling_options) noexcept {
+  return base64_to_binary_safe_impl<char>(input, length, output, outlen,
+                                          options, last_chunk_handling_options);
 }
-simdutf_warn_unused result base64_to_binary_safe(const char16_t * input, size_t length, char* output, size_t& outlen, base64_options options) noexcept {
-  return base64_to_binary_safe_impl<char16_t>(input, length, output, outlen, options);
+simdutf_warn_unused result base64_to_binary_safe(
+    const char16_t *input, size_t length, char *output, size_t &outlen,
+    base64_options options,
+    last_chunk_handling_options last_chunk_handling_options) noexcept {
+  return base64_to_binary_safe_impl<char16_t>(
+      input, length, output, outlen, options, last_chunk_handling_options);
 }
 
-simdutf_warn_unused size_t base64_length_from_binary(size_t length, base64_options options) noexcept {
-  return get_default_implementation()->base64_length_from_binary(length, options);
+simdutf_warn_unused size_t
+base64_length_from_binary(size_t length, base64_options options) noexcept {
+  return get_default_implementation()->base64_length_from_binary(length,
+                                                                 options);
 }
 
-size_t binary_to_base64(const char * input, size_t length, char* output, base64_options options) noexcept {
-  return get_default_implementation()->binary_to_base64(input, length, output, options);
+size_t binary_to_base64(const char *input, size_t length, char *output,
+                        base64_options options) noexcept {
+  return get_default_implementation()->binary_to_base64(input, length, output,
+                                                        options);
 }
 
-simdutf_warn_unused simdutf::encoding_type autodetect_encoding(const char * buf, size_t length) noexcept {
+simdutf_warn_unused simdutf::encoding_type
+autodetect_encoding(const char *buf, size_t length) noexcept {
   return get_default_implementation()->autodetect_encoding(buf, length);
 }
-simdutf_warn_unused int detect_encodings(const char * buf, size_t length) noexcept {
+simdutf_warn_unused int detect_encodings(const char *buf,
+                                         size_t length) noexcept {
   return get_default_implementation()->detect_encodings(buf, length);
 }
-const implementation * builtin_implementation() {
-  static const implementation * builtin_impl = get_available_implementations()[SIMDUTF_STRINGIFY(SIMDUTF_BUILTIN_IMPLEMENTATION)];
+const implementation *builtin_implementation() {
+  static const implementation *builtin_impl =
+      get_available_implementations()[SIMDUTF_STRINGIFY(
+          SIMDUTF_BUILTIN_IMPLEMENTATION)];
   return builtin_impl;
 }
 
@@ -7395,85 +9506,103 @@ simdutf_warn_unused size_t trim_partial_utf8(const char *input, size_t length) {
   return scalar::utf8::trim_partial_utf8(input, length);
 }
 
-simdutf_warn_unused size_t trim_partial_utf16be(const char16_t* input, size_t length) {
+simdutf_warn_unused size_t trim_partial_utf16be(const char16_t *input,
+                                                size_t length) {
   return scalar::utf16::trim_partial_utf16<BIG>(input, length);
 }
 
-simdutf_warn_unused size_t trim_partial_utf16le(const char16_t* input, size_t length) {
+simdutf_warn_unused size_t trim_partial_utf16le(const char16_t *input,
+                                                size_t length) {
   return scalar::utf16::trim_partial_utf16<LITTLE>(input, length);
 }
 
-simdutf_warn_unused size_t trim_partial_utf16(const char16_t* input, size_t length) {
-  #if SIMDUTF_IS_BIG_ENDIAN
+simdutf_warn_unused size_t trim_partial_utf16(const char16_t *input,
+                                              size_t length) {
+#if SIMDUTF_IS_BIG_ENDIAN
   return trim_partial_utf16be(input, length);
-  #else
+#else
   return trim_partial_utf16le(input, length);
-  #endif
+#endif
 }
 
 } // namespace simdutf
-
 /* end file src/implementation.cpp */
 /* begin file src/encoding_types.cpp */
 
 namespace simdutf {
 bool match_system(endianness e) {
 #if SIMDUTF_IS_BIG_ENDIAN
-    return e == endianness::BIG;
+  return e == endianness::BIG;
 #else
-    return e == endianness::LITTLE;
+  return e == endianness::LITTLE;
 #endif
 }
 
 std::string to_string(encoding_type bom) {
   switch (bom) {
-      case UTF16_LE:     return "UTF16 little-endian";
-      case UTF16_BE:     return "UTF16 big-endian";
-      case UTF32_LE:     return "UTF32 little-endian";
-      case UTF32_BE:     return "UTF32 big-endian";
-      case UTF8:         return "UTF8";
-      case unspecified:  return "unknown";
-      default:           return "error";
+  case UTF16_LE:
+    return "UTF16 little-endian";
+  case UTF16_BE:
+    return "UTF16 big-endian";
+  case UTF32_LE:
+    return "UTF32 little-endian";
+  case UTF32_BE:
+    return "UTF32 big-endian";
+  case UTF8:
+    return "UTF8";
+  case unspecified:
+    return "unknown";
+  default:
+    return "error";
   }
 }
 
 namespace BOM {
 // Note that BOM for UTF8 is discouraged.
-encoding_type check_bom(const uint8_t* byte, size_t length) {
-        if (length >= 2 && byte[0] == 0xff and byte[1] == 0xfe) {
-            if (length >= 4 && byte[2] == 0x00 and byte[3] == 0x0) {
-                return encoding_type::UTF32_LE;
-            } else {
-                return encoding_type::UTF16_LE;
-            }
-        } else if (length >= 2 && byte[0] == 0xfe and byte[1] == 0xff) {
-            return encoding_type::UTF16_BE;
-        } else if (length >= 4 && byte[0] == 0x00 and byte[1] == 0x00 and byte[2] == 0xfe and byte[3] == 0xff) {
-            return encoding_type::UTF32_BE;
-        } else if (length >= 4 && byte[0] == 0xef and byte[1] == 0xbb and byte[2] == 0xbf) {
-            return encoding_type::UTF8;
-        }
-        return encoding_type::unspecified;
-    }
-
-encoding_type check_bom(const char* byte, size_t length) {
-      return check_bom(reinterpret_cast<const uint8_t*>(byte), length);
- }
-
- size_t bom_byte_size(encoding_type bom) {
-        switch (bom) {
-            case UTF16_LE:     return 2;
-            case UTF16_BE:     return 2;
-            case UTF32_LE:     return 4;
-            case UTF32_BE:     return 4;
-            case UTF8:         return 3;
-            case unspecified:  return 0;
-            default:           return 0;
-        }
+encoding_type check_bom(const uint8_t *byte, size_t length) {
+  if (length >= 2 && byte[0] == 0xff and byte[1] == 0xfe) {
+    if (length >= 4 && byte[2] == 0x00 and byte[3] == 0x0) {
+      return encoding_type::UTF32_LE;
+    } else {
+      return encoding_type::UTF16_LE;
+    }
+  } else if (length >= 2 && byte[0] == 0xfe and byte[1] == 0xff) {
+    return encoding_type::UTF16_BE;
+  } else if (length >= 4 && byte[0] == 0x00 and byte[1] == 0x00 and
+             byte[2] == 0xfe and byte[3] == 0xff) {
+    return encoding_type::UTF32_BE;
+  } else if (length >= 3 && byte[0] == 0xef and byte[1] == 0xbb and
+             byte[2] == 0xbf) {
+    return encoding_type::UTF8;
+  }
+  return encoding_type::unspecified;
 }
 
+encoding_type check_bom(const char *byte, size_t length) {
+  return check_bom(reinterpret_cast<const uint8_t *>(byte), length);
 }
+
+size_t bom_byte_size(encoding_type bom) {
+  switch (bom) {
+  case UTF16_LE:
+    return 2;
+  case UTF16_BE:
+    return 2;
+  case UTF32_LE:
+    return 4;
+  case UTF32_BE:
+    return 4;
+  case UTF8:
+    return 3;
+  case unspecified:
+    return 0;
+  default:
+    return 0;
+  }
 }
+
+} // namespace BOM
+} // namespace simdutf
 /* end file src/encoding_types.cpp */
 /* begin file src/error.cpp */
 namespace simdutf {
@@ -7503,4318 +9632,808 @@ namespace utf8_to_utf16 {
  * performance penalty.
  */
 
-const uint8_t shufutf8[209][16] =
-{	{0, 255, 1, 255, 2, 255, 3, 255, 4, 255, 5, 255, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 2, 255, 3, 255, 4, 255, 6, 5, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 2, 255, 3, 255, 5, 4, 6, 255, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 2, 255, 3, 255, 5, 4, 7, 6, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 2, 255, 4, 3, 5, 255, 6, 255, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 2, 255, 4, 3, 5, 255, 7, 6, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 2, 255, 4, 3, 6, 5, 7, 255, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 2, 255, 4, 3, 6, 5, 8, 7, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 3, 2, 4, 255, 5, 255, 6, 255, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 3, 2, 4, 255, 5, 255, 7, 6, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 3, 2, 4, 255, 6, 5, 7, 255, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 3, 2, 4, 255, 6, 5, 8, 7, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 3, 2, 5, 4, 6, 255, 7, 255, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 3, 2, 5, 4, 6, 255, 8, 7, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 3, 2, 5, 4, 7, 6, 8, 255, 0, 0, 0, 0},
- 	{0, 255, 1, 255, 3, 2, 5, 4, 7, 6, 9, 8, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 3, 255, 4, 255, 5, 255, 6, 255, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 3, 255, 4, 255, 5, 255, 7, 6, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 3, 255, 4, 255, 6, 5, 7, 255, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 3, 255, 4, 255, 6, 5, 8, 7, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 3, 255, 5, 4, 6, 255, 7, 255, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 3, 255, 5, 4, 6, 255, 8, 7, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 3, 255, 5, 4, 7, 6, 8, 255, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 3, 255, 5, 4, 7, 6, 9, 8, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 4, 3, 5, 255, 6, 255, 7, 255, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 4, 3, 5, 255, 6, 255, 8, 7, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 4, 3, 5, 255, 7, 6, 8, 255, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 4, 3, 5, 255, 7, 6, 9, 8, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 4, 3, 6, 5, 7, 255, 8, 255, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 4, 3, 6, 5, 7, 255, 9, 8, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 4, 3, 6, 5, 8, 7, 9, 255, 0, 0, 0, 0},
- 	{0, 255, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 3, 255, 4, 255, 5, 255, 6, 255, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 3, 255, 4, 255, 5, 255, 7, 6, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 3, 255, 4, 255, 6, 5, 7, 255, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 3, 255, 4, 255, 6, 5, 8, 7, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 3, 255, 5, 4, 6, 255, 7, 255, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 3, 255, 5, 4, 6, 255, 8, 7, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 3, 255, 5, 4, 7, 6, 8, 255, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 3, 255, 5, 4, 7, 6, 9, 8, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 4, 3, 5, 255, 6, 255, 7, 255, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 4, 3, 5, 255, 6, 255, 8, 7, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 4, 3, 5, 255, 7, 6, 8, 255, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 4, 3, 5, 255, 7, 6, 9, 8, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 4, 3, 6, 5, 7, 255, 8, 255, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 4, 3, 6, 5, 7, 255, 9, 8, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 4, 3, 6, 5, 8, 7, 9, 255, 0, 0, 0, 0},
- 	{1, 0, 2, 255, 4, 3, 6, 5, 8, 7, 10, 9, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 4, 255, 5, 255, 6, 255, 7, 255, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 4, 255, 5, 255, 6, 255, 8, 7, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 4, 255, 5, 255, 7, 6, 8, 255, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 4, 255, 5, 255, 7, 6, 9, 8, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 4, 255, 6, 5, 7, 255, 8, 255, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 4, 255, 6, 5, 7, 255, 9, 8, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 4, 255, 6, 5, 8, 7, 9, 255, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 4, 255, 6, 5, 8, 7, 10, 9, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 5, 4, 6, 255, 7, 255, 8, 255, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 5, 4, 6, 255, 7, 255, 9, 8, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 5, 4, 6, 255, 8, 7, 9, 255, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 5, 4, 6, 255, 8, 7, 10, 9, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 5, 4, 7, 6, 8, 255, 9, 255, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 5, 4, 7, 6, 8, 255, 10, 9, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 255, 0, 0, 0, 0},
- 	{1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255},
- 	{0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255},
- 	{0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255},
- 	{0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255},
- 	{0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255},
- 	{0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255},
- 	{0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255},
- 	{0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255},
- 	{0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255},
- 	{0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 4, 255, 255, 255},
- 	{0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 5, 4, 255, 255},
- 	{0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 6, 5, 4, 255},
- 	{0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 5, 255, 255, 255},
- 	{0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 6, 5, 255, 255},
- 	{0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 7, 6, 5, 255},
- 	{0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 6, 255, 255, 255},
- 	{0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 7, 6, 255, 255},
- 	{0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 8, 7, 6, 255},
- 	{0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 5, 255, 255, 255},
- 	{0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 6, 5, 255, 255},
- 	{0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 7, 6, 5, 255},
- 	{0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 6, 255, 255, 255},
- 	{0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 7, 6, 255, 255},
- 	{0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 8, 7, 6, 255},
- 	{0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 7, 255, 255, 255},
- 	{0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 8, 7, 255, 255},
- 	{0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 9, 8, 7, 255},
- 	{1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 4, 255, 255, 255},
- 	{1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 5, 4, 255, 255},
- 	{1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 6, 5, 4, 255},
- 	{1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 5, 255, 255, 255},
- 	{1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 6, 5, 255, 255},
- 	{1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 7, 6, 5, 255},
- 	{1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 6, 255, 255, 255},
- 	{1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 7, 6, 255, 255},
- 	{1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 8, 7, 6, 255},
- 	{1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 5, 255, 255, 255},
- 	{1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 6, 5, 255, 255},
- 	{1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 7, 6, 5, 255},
- 	{1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 6, 255, 255, 255},
- 	{1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 7, 6, 255, 255},
- 	{1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 8, 7, 6, 255},
- 	{1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 7, 255, 255, 255},
- 	{1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 8, 7, 255, 255},
- 	{1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 9, 8, 7, 255},
- 	{1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 6, 255, 255, 255},
- 	{1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 7, 6, 255, 255},
- 	{1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 8, 7, 6, 255},
- 	{1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 7, 255, 255, 255},
- 	{1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 8, 7, 255, 255},
- 	{1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 9, 8, 7, 255},
- 	{1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 8, 255, 255, 255},
- 	{1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 9, 8, 255, 255},
- 	{1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 10, 9, 8, 255},
- 	{2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 5, 255, 255, 255},
- 	{2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 6, 5, 255, 255},
- 	{2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 7, 6, 5, 255},
- 	{2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 6, 255, 255, 255},
- 	{2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 7, 6, 255, 255},
- 	{2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 8, 7, 6, 255},
- 	{2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 7, 255, 255, 255},
- 	{2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 8, 7, 255, 255},
- 	{2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 9, 8, 7, 255},
- 	{2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 6, 255, 255, 255},
- 	{2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 7, 6, 255, 255},
- 	{2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 8, 7, 6, 255},
- 	{2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 7, 255, 255, 255},
- 	{2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 8, 7, 255, 255},
- 	{2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 9, 8, 7, 255},
- 	{2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 8, 255, 255, 255},
- 	{2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 9, 8, 255, 255},
- 	{2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 10, 9, 8, 255},
- 	{2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 7, 255, 255, 255},
- 	{2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 8, 7, 255, 255},
- 	{2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 9, 8, 7, 255},
- 	{2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 8, 255, 255, 255},
- 	{2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 9, 8, 255, 255},
- 	{2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 10, 9, 8, 255},
- 	{2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 9, 255, 255, 255},
- 	{2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 10, 9, 255, 255},
- 	{2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 11, 10, 9, 255},
- 	{0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 1, 255, 255, 255, 5, 4, 3, 2, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 2, 1, 255, 255, 6, 5, 4, 3, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 3, 2, 1, 255, 7, 6, 5, 4, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 4, 3, 2, 1, 5, 255, 255, 255, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 4, 3, 2, 1, 6, 5, 255, 255, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 4, 3, 2, 1, 7, 6, 5, 255, 0, 0, 0, 0},
- 	{0, 255, 255, 255, 4, 3, 2, 1, 8, 7, 6, 5, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 2, 255, 255, 255, 6, 5, 4, 3, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 3, 2, 255, 255, 7, 6, 5, 4, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 4, 3, 2, 255, 8, 7, 6, 5, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 5, 4, 3, 2, 6, 255, 255, 255, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 5, 4, 3, 2, 7, 6, 255, 255, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 5, 4, 3, 2, 8, 7, 6, 255, 0, 0, 0, 0},
- 	{1, 0, 255, 255, 5, 4, 3, 2, 9, 8, 7, 6, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 3, 255, 255, 255, 7, 6, 5, 4, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 4, 3, 255, 255, 8, 7, 6, 5, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 5, 4, 3, 255, 9, 8, 7, 6, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 6, 5, 4, 3, 7, 255, 255, 255, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 6, 5, 4, 3, 8, 7, 255, 255, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 6, 5, 4, 3, 9, 8, 7, 255, 0, 0, 0, 0},
- 	{2, 1, 0, 255, 6, 5, 4, 3, 10, 9, 8, 7, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 4, 255, 255, 255, 5, 255, 255, 255, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 4, 255, 255, 255, 6, 5, 255, 255, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 4, 255, 255, 255, 7, 6, 5, 255, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 4, 255, 255, 255, 8, 7, 6, 5, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 5, 4, 255, 255, 6, 255, 255, 255, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 5, 4, 255, 255, 7, 6, 255, 255, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 5, 4, 255, 255, 8, 7, 6, 255, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 5, 4, 255, 255, 9, 8, 7, 6, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 6, 5, 4, 255, 7, 255, 255, 255, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 6, 5, 4, 255, 8, 7, 255, 255, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 6, 5, 4, 255, 9, 8, 7, 255, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 6, 5, 4, 255, 10, 9, 8, 7, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 7, 6, 5, 4, 8, 255, 255, 255, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 7, 6, 5, 4, 9, 8, 255, 255, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 7, 6, 5, 4, 10, 9, 8, 255, 0, 0, 0, 0},
- 	{3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 0, 0, 0, 0}};
+const uint8_t shufutf8[209][16] = {
+    {0, 255, 1, 255, 2, 255, 3, 255, 4, 255, 5, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 3, 255, 4, 255, 6, 5, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 3, 255, 5, 4, 6, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 3, 255, 5, 4, 7, 6, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 4, 3, 5, 255, 6, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 4, 3, 5, 255, 7, 6, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 4, 3, 6, 5, 7, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 4, 3, 6, 5, 8, 7, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 4, 255, 5, 255, 6, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 4, 255, 5, 255, 7, 6, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 4, 255, 6, 5, 7, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 4, 255, 6, 5, 8, 7, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 5, 4, 6, 255, 7, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 5, 4, 6, 255, 8, 7, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 5, 4, 7, 6, 8, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 5, 4, 7, 6, 9, 8, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 4, 255, 5, 255, 6, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 4, 255, 5, 255, 7, 6, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 4, 255, 6, 5, 7, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 4, 255, 6, 5, 8, 7, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 5, 4, 6, 255, 7, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 5, 4, 6, 255, 8, 7, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 5, 4, 7, 6, 8, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 5, 4, 7, 6, 9, 8, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 5, 255, 6, 255, 7, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 5, 255, 6, 255, 8, 7, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 5, 255, 7, 6, 8, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 5, 255, 7, 6, 9, 8, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 6, 5, 7, 255, 8, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 6, 5, 7, 255, 9, 8, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 6, 5, 8, 7, 9, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 4, 255, 5, 255, 6, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 4, 255, 5, 255, 7, 6, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 4, 255, 6, 5, 7, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 4, 255, 6, 5, 8, 7, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 5, 4, 6, 255, 7, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 5, 4, 6, 255, 8, 7, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 5, 4, 7, 6, 8, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 5, 4, 7, 6, 9, 8, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 5, 255, 6, 255, 7, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 5, 255, 6, 255, 8, 7, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 5, 255, 7, 6, 8, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 5, 255, 7, 6, 9, 8, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 6, 5, 7, 255, 8, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 6, 5, 7, 255, 9, 8, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 6, 5, 8, 7, 9, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 6, 5, 8, 7, 10, 9, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 5, 255, 6, 255, 7, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 5, 255, 6, 255, 8, 7, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 5, 255, 7, 6, 8, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 5, 255, 7, 6, 9, 8, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 6, 5, 7, 255, 8, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 6, 5, 7, 255, 9, 8, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 6, 5, 8, 7, 9, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 6, 5, 8, 7, 10, 9, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 6, 255, 7, 255, 8, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 6, 255, 7, 255, 9, 8, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 6, 255, 8, 7, 9, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 6, 255, 8, 7, 10, 9, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 7, 6, 8, 255, 9, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 7, 6, 8, 255, 10, 9, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 0, 0, 0, 0},
+    {0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 4, 255, 255, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 5, 4, 255, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 6, 5, 4, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 5, 255, 255, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 6, 5, 255, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 7, 6, 5, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 6, 255, 255, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 7, 6, 255, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 8, 7, 6, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 5, 255, 255, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 6, 5, 255, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 7, 6, 5, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 6, 255, 255, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 7, 6, 255, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 8, 7, 6, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 7, 255, 255, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 8, 7, 255, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 9, 8, 7, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 4, 255, 255, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 5, 4, 255, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 6, 5, 4, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 5, 255, 255, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 6, 5, 255, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 7, 6, 5, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 6, 255, 255, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 7, 6, 255, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 8, 7, 6, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 5, 255, 255, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 6, 5, 255, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 7, 6, 5, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 6, 255, 255, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 7, 6, 255, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 8, 7, 6, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 7, 255, 255, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 8, 7, 255, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 9, 8, 7, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 6, 255, 255, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 7, 6, 255, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 8, 7, 6, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 7, 255, 255, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 8, 7, 255, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 9, 8, 7, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 8, 255, 255, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 9, 8, 255, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 10, 9, 8, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 5, 255, 255, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 6, 5, 255, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 7, 6, 5, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 6, 255, 255, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 7, 6, 255, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 8, 7, 6, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 7, 255, 255, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 8, 7, 255, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 9, 8, 7, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 6, 255, 255, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 7, 6, 255, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 8, 7, 6, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 7, 255, 255, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 8, 7, 255, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 9, 8, 7, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 8, 255, 255, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 9, 8, 255, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 10, 9, 8, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 7, 255, 255, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 8, 7, 255, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 9, 8, 7, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 8, 255, 255, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 9, 8, 255, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 10, 9, 8, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 9, 255, 255, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 10, 9, 255, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 11, 10, 9, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 1, 255, 255, 255, 5, 4, 3, 2, 0, 0, 0, 0},
+    {0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 2, 1, 255, 255, 6, 5, 4, 3, 0, 0, 0, 0},
+    {0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 3, 2, 1, 255, 7, 6, 5, 4, 0, 0, 0, 0},
+    {0, 255, 255, 255, 4, 3, 2, 1, 5, 255, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 4, 3, 2, 1, 6, 5, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 4, 3, 2, 1, 7, 6, 5, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 4, 3, 2, 1, 8, 7, 6, 5, 0, 0, 0, 0},
+    {1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 2, 255, 255, 255, 6, 5, 4, 3, 0, 0, 0, 0},
+    {1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 3, 2, 255, 255, 7, 6, 5, 4, 0, 0, 0, 0},
+    {1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 4, 3, 2, 255, 8, 7, 6, 5, 0, 0, 0, 0},
+    {1, 0, 255, 255, 5, 4, 3, 2, 6, 255, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 5, 4, 3, 2, 7, 6, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 5, 4, 3, 2, 8, 7, 6, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 5, 4, 3, 2, 9, 8, 7, 6, 0, 0, 0, 0},
+    {2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 3, 255, 255, 255, 7, 6, 5, 4, 0, 0, 0, 0},
+    {2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 4, 3, 255, 255, 8, 7, 6, 5, 0, 0, 0, 0},
+    {2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 5, 4, 3, 255, 9, 8, 7, 6, 0, 0, 0, 0},
+    {2, 1, 0, 255, 6, 5, 4, 3, 7, 255, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 6, 5, 4, 3, 8, 7, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 6, 5, 4, 3, 9, 8, 7, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 6, 5, 4, 3, 10, 9, 8, 7, 0, 0, 0, 0},
+    {3, 2, 1, 0, 4, 255, 255, 255, 5, 255, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 4, 255, 255, 255, 6, 5, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 4, 255, 255, 255, 7, 6, 5, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 4, 255, 255, 255, 8, 7, 6, 5, 0, 0, 0, 0},
+    {3, 2, 1, 0, 5, 4, 255, 255, 6, 255, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 5, 4, 255, 255, 7, 6, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 5, 4, 255, 255, 8, 7, 6, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 5, 4, 255, 255, 9, 8, 7, 6, 0, 0, 0, 0},
+    {3, 2, 1, 0, 6, 5, 4, 255, 7, 255, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 6, 5, 4, 255, 8, 7, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 6, 5, 4, 255, 9, 8, 7, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 6, 5, 4, 255, 10, 9, 8, 7, 0, 0, 0, 0},
+    {3, 2, 1, 0, 7, 6, 5, 4, 8, 255, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 7, 6, 5, 4, 9, 8, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 7, 6, 5, 4, 10, 9, 8, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 0, 0, 0, 0}};
 /* number of two bytes : 64 */
 /* number of two + three bytes : 145 */
 /* number of two + three + four bytes : 209 */
-const uint8_t utf8bigindex[4096][2] =
-{	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{147, 5},
- 	{209, 12},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{209, 12},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{148, 6},
- 	{209, 12},
- 	{151, 6},
- 	{163, 6},
- 	{66, 6},
- 	{209, 12},
- 	{154, 6},
- 	{166, 6},
- 	{68, 6},
- 	{178, 6},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{169, 6},
- 	{70, 6},
- 	{181, 6},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{209, 12},
- 	{155, 7},
- 	{167, 7},
- 	{69, 7},
- 	{179, 7},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{170, 7},
- 	{71, 7},
- 	{182, 7},
- 	{77, 7},
- 	{95, 7},
- 	{65, 5},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{67, 5},
- 	{119, 7},
- 	{73, 5},
- 	{91, 5},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{185, 7},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{68, 6},
- 	{121, 7},
- 	{74, 6},
- 	{92, 6},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{76, 6},
- 	{94, 6},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{171, 8},
- 	{72, 8},
- 	{183, 8},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{186, 8},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{104, 8},
- 	{68, 6},
- 	{122, 8},
- 	{74, 6},
- 	{92, 6},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{76, 6},
- 	{94, 6},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{77, 7},
- 	{95, 7},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{160, 9},
- 	{172, 9},
- 	{147, 5},
- 	{184, 9},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{196, 9},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{175, 9},
- 	{148, 6},
- 	{187, 9},
- 	{81, 9},
- 	{99, 9},
- 	{66, 6},
- 	{199, 9},
- 	{87, 9},
- 	{105, 9},
- 	{68, 6},
- 	{123, 9},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{111, 9},
- 	{70, 6},
- 	{129, 9},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{190, 9},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{202, 9},
- 	{89, 9},
- 	{107, 9},
- 	{69, 7},
- 	{125, 9},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{113, 9},
- 	{71, 7},
- 	{131, 9},
- 	{77, 7},
- 	{95, 7},
- 	{7, 9},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{11, 9},
- 	{119, 7},
- 	{19, 9},
- 	{35, 9},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{137, 9},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{13, 9},
- 	{121, 7},
- 	{21, 9},
- 	{37, 9},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{25, 9},
- 	{41, 9},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{49, 9},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{205, 9},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{115, 9},
- 	{72, 8},
- 	{133, 9},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{139, 9},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{104, 8},
- 	{14, 9},
- 	{122, 8},
- 	{22, 9},
- 	{38, 9},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{26, 9},
- 	{42, 9},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{50, 9},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{28, 9},
- 	{44, 9},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{52, 9},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{56, 9},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{147, 5},
- 	{209, 12},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{209, 12},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{176, 10},
- 	{148, 6},
- 	{188, 10},
- 	{151, 6},
- 	{163, 6},
- 	{66, 6},
- 	{200, 10},
- 	{154, 6},
- 	{166, 6},
- 	{68, 6},
- 	{178, 6},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{169, 6},
- 	{70, 6},
- 	{181, 6},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{191, 10},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{203, 10},
- 	{90, 10},
- 	{108, 10},
- 	{69, 7},
- 	{126, 10},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{114, 10},
- 	{71, 7},
- 	{132, 10},
- 	{77, 7},
- 	{95, 7},
- 	{65, 5},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{67, 5},
- 	{119, 7},
- 	{73, 5},
- 	{91, 5},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{138, 10},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{68, 6},
- 	{121, 7},
- 	{74, 6},
- 	{92, 6},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{76, 6},
- 	{94, 6},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{206, 10},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{116, 10},
- 	{72, 8},
- 	{134, 10},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{140, 10},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{104, 8},
- 	{15, 10},
- 	{122, 8},
- 	{23, 10},
- 	{39, 10},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{27, 10},
- 	{43, 10},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{51, 10},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{29, 10},
- 	{45, 10},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{53, 10},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{57, 10},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{160, 9},
- 	{172, 9},
- 	{147, 5},
- 	{184, 9},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{196, 9},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{175, 9},
- 	{148, 6},
- 	{142, 10},
- 	{81, 9},
- 	{99, 9},
- 	{66, 6},
- 	{199, 9},
- 	{87, 9},
- 	{105, 9},
- 	{68, 6},
- 	{123, 9},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{111, 9},
- 	{70, 6},
- 	{129, 9},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{190, 9},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{202, 9},
- 	{89, 9},
- 	{107, 9},
- 	{69, 7},
- 	{125, 9},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{113, 9},
- 	{71, 7},
- 	{131, 9},
- 	{30, 10},
- 	{46, 10},
- 	{7, 9},
- 	{194, 7},
- 	{83, 7},
- 	{54, 10},
- 	{11, 9},
- 	{119, 7},
- 	{19, 9},
- 	{35, 9},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{137, 9},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{58, 10},
- 	{13, 9},
- 	{121, 7},
- 	{21, 9},
- 	{37, 9},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{25, 9},
- 	{41, 9},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{49, 9},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{205, 9},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{115, 9},
- 	{72, 8},
- 	{133, 9},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{139, 9},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{60, 10},
- 	{14, 9},
- 	{122, 8},
- 	{22, 9},
- 	{38, 9},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{26, 9},
- 	{42, 9},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{50, 9},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{28, 9},
- 	{44, 9},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{52, 9},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{56, 9},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{147, 5},
- 	{209, 12},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{209, 12},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{148, 6},
- 	{209, 12},
- 	{151, 6},
- 	{163, 6},
- 	{66, 6},
- 	{209, 12},
- 	{154, 6},
- 	{166, 6},
- 	{68, 6},
- 	{178, 6},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{169, 6},
- 	{70, 6},
- 	{181, 6},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{192, 11},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{204, 11},
- 	{155, 7},
- 	{167, 7},
- 	{69, 7},
- 	{179, 7},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{170, 7},
- 	{71, 7},
- 	{182, 7},
- 	{77, 7},
- 	{95, 7},
- 	{65, 5},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{67, 5},
- 	{119, 7},
- 	{73, 5},
- 	{91, 5},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{185, 7},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{68, 6},
- 	{121, 7},
- 	{74, 6},
- 	{92, 6},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{76, 6},
- 	{94, 6},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{207, 11},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{117, 11},
- 	{72, 8},
- 	{135, 11},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{141, 11},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{104, 8},
- 	{68, 6},
- 	{122, 8},
- 	{74, 6},
- 	{92, 6},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{76, 6},
- 	{94, 6},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{77, 7},
- 	{95, 7},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{160, 9},
- 	{172, 9},
- 	{147, 5},
- 	{184, 9},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{196, 9},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{175, 9},
- 	{148, 6},
- 	{143, 11},
- 	{81, 9},
- 	{99, 9},
- 	{66, 6},
- 	{199, 9},
- 	{87, 9},
- 	{105, 9},
- 	{68, 6},
- 	{123, 9},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{111, 9},
- 	{70, 6},
- 	{129, 9},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{190, 9},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{202, 9},
- 	{89, 9},
- 	{107, 9},
- 	{69, 7},
- 	{125, 9},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{113, 9},
- 	{71, 7},
- 	{131, 9},
- 	{31, 11},
- 	{47, 11},
- 	{7, 9},
- 	{194, 7},
- 	{83, 7},
- 	{55, 11},
- 	{11, 9},
- 	{119, 7},
- 	{19, 9},
- 	{35, 9},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{137, 9},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{59, 11},
- 	{13, 9},
- 	{121, 7},
- 	{21, 9},
- 	{37, 9},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{25, 9},
- 	{41, 9},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{49, 9},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{205, 9},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{115, 9},
- 	{72, 8},
- 	{133, 9},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{139, 9},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{61, 11},
- 	{14, 9},
- 	{122, 8},
- 	{22, 9},
- 	{38, 9},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{26, 9},
- 	{42, 9},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{50, 9},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{28, 9},
- 	{44, 9},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{52, 9},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{56, 9},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{147, 5},
- 	{209, 12},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{209, 12},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{176, 10},
- 	{148, 6},
- 	{188, 10},
- 	{151, 6},
- 	{163, 6},
- 	{66, 6},
- 	{200, 10},
- 	{154, 6},
- 	{166, 6},
- 	{68, 6},
- 	{178, 6},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{169, 6},
- 	{70, 6},
- 	{181, 6},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{191, 10},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{203, 10},
- 	{90, 10},
- 	{108, 10},
- 	{69, 7},
- 	{126, 10},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{114, 10},
- 	{71, 7},
- 	{132, 10},
- 	{77, 7},
- 	{95, 7},
- 	{65, 5},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{67, 5},
- 	{119, 7},
- 	{73, 5},
- 	{91, 5},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{138, 10},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{68, 6},
- 	{121, 7},
- 	{74, 6},
- 	{92, 6},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{76, 6},
- 	{94, 6},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{206, 10},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{116, 10},
- 	{72, 8},
- 	{134, 10},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{140, 10},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{62, 11},
- 	{15, 10},
- 	{122, 8},
- 	{23, 10},
- 	{39, 10},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{27, 10},
- 	{43, 10},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{51, 10},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{29, 10},
- 	{45, 10},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{53, 10},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{57, 10},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{160, 9},
- 	{172, 9},
- 	{147, 5},
- 	{184, 9},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{196, 9},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{175, 9},
- 	{148, 6},
- 	{142, 10},
- 	{81, 9},
- 	{99, 9},
- 	{66, 6},
- 	{199, 9},
- 	{87, 9},
- 	{105, 9},
- 	{68, 6},
- 	{123, 9},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{111, 9},
- 	{70, 6},
- 	{129, 9},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{190, 9},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{202, 9},
- 	{89, 9},
- 	{107, 9},
- 	{69, 7},
- 	{125, 9},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{113, 9},
- 	{71, 7},
- 	{131, 9},
- 	{30, 10},
- 	{46, 10},
- 	{7, 9},
- 	{194, 7},
- 	{83, 7},
- 	{54, 10},
- 	{11, 9},
- 	{119, 7},
- 	{19, 9},
- 	{35, 9},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{137, 9},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{58, 10},
- 	{13, 9},
- 	{121, 7},
- 	{21, 9},
- 	{37, 9},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{25, 9},
- 	{41, 9},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{49, 9},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{205, 9},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{115, 9},
- 	{72, 8},
- 	{133, 9},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{139, 9},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{60, 10},
- 	{14, 9},
- 	{122, 8},
- 	{22, 9},
- 	{38, 9},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{26, 9},
- 	{42, 9},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{50, 9},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{28, 9},
- 	{44, 9},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{52, 9},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{56, 9},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{147, 5},
- 	{209, 12},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{209, 12},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{148, 6},
- 	{209, 12},
- 	{151, 6},
- 	{163, 6},
- 	{66, 6},
- 	{209, 12},
- 	{154, 6},
- 	{166, 6},
- 	{68, 6},
- 	{178, 6},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{169, 6},
- 	{70, 6},
- 	{181, 6},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{209, 12},
- 	{155, 7},
- 	{167, 7},
- 	{69, 7},
- 	{179, 7},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{170, 7},
- 	{71, 7},
- 	{182, 7},
- 	{77, 7},
- 	{95, 7},
- 	{65, 5},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{67, 5},
- 	{119, 7},
- 	{73, 5},
- 	{91, 5},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{185, 7},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{68, 6},
- 	{121, 7},
- 	{74, 6},
- 	{92, 6},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{76, 6},
- 	{94, 6},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{208, 12},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{171, 8},
- 	{72, 8},
- 	{183, 8},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{186, 8},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{104, 8},
- 	{68, 6},
- 	{122, 8},
- 	{74, 6},
- 	{92, 6},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{76, 6},
- 	{94, 6},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{77, 7},
- 	{95, 7},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{160, 9},
- 	{172, 9},
- 	{147, 5},
- 	{184, 9},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{196, 9},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{175, 9},
- 	{148, 6},
- 	{144, 12},
- 	{81, 9},
- 	{99, 9},
- 	{66, 6},
- 	{199, 9},
- 	{87, 9},
- 	{105, 9},
- 	{68, 6},
- 	{123, 9},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{111, 9},
- 	{70, 6},
- 	{129, 9},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{190, 9},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{202, 9},
- 	{89, 9},
- 	{107, 9},
- 	{69, 7},
- 	{125, 9},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{113, 9},
- 	{71, 7},
- 	{131, 9},
- 	{77, 7},
- 	{95, 7},
- 	{7, 9},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{11, 9},
- 	{119, 7},
- 	{19, 9},
- 	{35, 9},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{137, 9},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{13, 9},
- 	{121, 7},
- 	{21, 9},
- 	{37, 9},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{25, 9},
- 	{41, 9},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{49, 9},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{205, 9},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{115, 9},
- 	{72, 8},
- 	{133, 9},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{139, 9},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{104, 8},
- 	{14, 9},
- 	{122, 8},
- 	{22, 9},
- 	{38, 9},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{26, 9},
- 	{42, 9},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{50, 9},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{28, 9},
- 	{44, 9},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{52, 9},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{56, 9},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{147, 5},
- 	{209, 12},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{209, 12},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{176, 10},
- 	{148, 6},
- 	{188, 10},
- 	{151, 6},
- 	{163, 6},
- 	{66, 6},
- 	{200, 10},
- 	{154, 6},
- 	{166, 6},
- 	{68, 6},
- 	{178, 6},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{169, 6},
- 	{70, 6},
- 	{181, 6},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{191, 10},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{203, 10},
- 	{90, 10},
- 	{108, 10},
- 	{69, 7},
- 	{126, 10},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{114, 10},
- 	{71, 7},
- 	{132, 10},
- 	{77, 7},
- 	{95, 7},
- 	{65, 5},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{67, 5},
- 	{119, 7},
- 	{73, 5},
- 	{91, 5},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{138, 10},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{68, 6},
- 	{121, 7},
- 	{74, 6},
- 	{92, 6},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{76, 6},
- 	{94, 6},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{206, 10},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{116, 10},
- 	{72, 8},
- 	{134, 10},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{140, 10},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{63, 12},
- 	{15, 10},
- 	{122, 8},
- 	{23, 10},
- 	{39, 10},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{27, 10},
- 	{43, 10},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{51, 10},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{29, 10},
- 	{45, 10},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{53, 10},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{57, 10},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{160, 9},
- 	{172, 9},
- 	{147, 5},
- 	{184, 9},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{196, 9},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{175, 9},
- 	{148, 6},
- 	{142, 10},
- 	{81, 9},
- 	{99, 9},
- 	{66, 6},
- 	{199, 9},
- 	{87, 9},
- 	{105, 9},
- 	{68, 6},
- 	{123, 9},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{111, 9},
- 	{70, 6},
- 	{129, 9},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{190, 9},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{202, 9},
- 	{89, 9},
- 	{107, 9},
- 	{69, 7},
- 	{125, 9},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{113, 9},
- 	{71, 7},
- 	{131, 9},
- 	{30, 10},
- 	{46, 10},
- 	{7, 9},
- 	{194, 7},
- 	{83, 7},
- 	{54, 10},
- 	{11, 9},
- 	{119, 7},
- 	{19, 9},
- 	{35, 9},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{137, 9},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{58, 10},
- 	{13, 9},
- 	{121, 7},
- 	{21, 9},
- 	{37, 9},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{25, 9},
- 	{41, 9},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{49, 9},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{205, 9},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{115, 9},
- 	{72, 8},
- 	{133, 9},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{139, 9},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{60, 10},
- 	{14, 9},
- 	{122, 8},
- 	{22, 9},
- 	{38, 9},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{26, 9},
- 	{42, 9},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{50, 9},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{28, 9},
- 	{44, 9},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{52, 9},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{56, 9},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{147, 5},
- 	{209, 12},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{209, 12},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{148, 6},
- 	{209, 12},
- 	{151, 6},
- 	{163, 6},
- 	{66, 6},
- 	{209, 12},
- 	{154, 6},
- 	{166, 6},
- 	{68, 6},
- 	{178, 6},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{169, 6},
- 	{70, 6},
- 	{181, 6},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{192, 11},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{204, 11},
- 	{155, 7},
- 	{167, 7},
- 	{69, 7},
- 	{179, 7},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{170, 7},
- 	{71, 7},
- 	{182, 7},
- 	{77, 7},
- 	{95, 7},
- 	{65, 5},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{67, 5},
- 	{119, 7},
- 	{73, 5},
- 	{91, 5},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{185, 7},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{68, 6},
- 	{121, 7},
- 	{74, 6},
- 	{92, 6},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{76, 6},
- 	{94, 6},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{207, 11},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{117, 11},
- 	{72, 8},
- 	{135, 11},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{141, 11},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{104, 8},
- 	{68, 6},
- 	{122, 8},
- 	{74, 6},
- 	{92, 6},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{76, 6},
- 	{94, 6},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{77, 7},
- 	{95, 7},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{160, 9},
- 	{172, 9},
- 	{147, 5},
- 	{184, 9},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{196, 9},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{175, 9},
- 	{148, 6},
- 	{143, 11},
- 	{81, 9},
- 	{99, 9},
- 	{66, 6},
- 	{199, 9},
- 	{87, 9},
- 	{105, 9},
- 	{68, 6},
- 	{123, 9},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{111, 9},
- 	{70, 6},
- 	{129, 9},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{190, 9},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{202, 9},
- 	{89, 9},
- 	{107, 9},
- 	{69, 7},
- 	{125, 9},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{113, 9},
- 	{71, 7},
- 	{131, 9},
- 	{31, 11},
- 	{47, 11},
- 	{7, 9},
- 	{194, 7},
- 	{83, 7},
- 	{55, 11},
- 	{11, 9},
- 	{119, 7},
- 	{19, 9},
- 	{35, 9},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{137, 9},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{59, 11},
- 	{13, 9},
- 	{121, 7},
- 	{21, 9},
- 	{37, 9},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{25, 9},
- 	{41, 9},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{49, 9},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{205, 9},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{115, 9},
- 	{72, 8},
- 	{133, 9},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{139, 9},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{61, 11},
- 	{14, 9},
- 	{122, 8},
- 	{22, 9},
- 	{38, 9},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{26, 9},
- 	{42, 9},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{50, 9},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{28, 9},
- 	{44, 9},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{52, 9},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{56, 9},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{147, 5},
- 	{209, 12},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{209, 12},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{176, 10},
- 	{148, 6},
- 	{188, 10},
- 	{151, 6},
- 	{163, 6},
- 	{66, 6},
- 	{200, 10},
- 	{154, 6},
- 	{166, 6},
- 	{68, 6},
- 	{178, 6},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{169, 6},
- 	{70, 6},
- 	{181, 6},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{191, 10},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{203, 10},
- 	{90, 10},
- 	{108, 10},
- 	{69, 7},
- 	{126, 10},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{114, 10},
- 	{71, 7},
- 	{132, 10},
- 	{77, 7},
- 	{95, 7},
- 	{65, 5},
- 	{194, 7},
- 	{83, 7},
- 	{101, 7},
- 	{67, 5},
- 	{119, 7},
- 	{73, 5},
- 	{91, 5},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{138, 10},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{103, 7},
- 	{68, 6},
- 	{121, 7},
- 	{74, 6},
- 	{92, 6},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{76, 6},
- 	{94, 6},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{206, 10},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{116, 10},
- 	{72, 8},
- 	{134, 10},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{140, 10},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{62, 11},
- 	{15, 10},
- 	{122, 8},
- 	{23, 10},
- 	{39, 10},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{27, 10},
- 	{43, 10},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{51, 10},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{29, 10},
- 	{45, 10},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{53, 10},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{57, 10},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{146, 4},
- 	{209, 12},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{160, 9},
- 	{172, 9},
- 	{147, 5},
- 	{184, 9},
- 	{150, 5},
- 	{162, 5},
- 	{65, 5},
- 	{196, 9},
- 	{153, 5},
- 	{165, 5},
- 	{67, 5},
- 	{177, 5},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{175, 9},
- 	{148, 6},
- 	{142, 10},
- 	{81, 9},
- 	{99, 9},
- 	{66, 6},
- 	{199, 9},
- 	{87, 9},
- 	{105, 9},
- 	{68, 6},
- 	{123, 9},
- 	{74, 6},
- 	{92, 6},
- 	{64, 4},
- 	{209, 12},
- 	{157, 6},
- 	{111, 9},
- 	{70, 6},
- 	{129, 9},
- 	{76, 6},
- 	{94, 6},
- 	{65, 5},
- 	{193, 6},
- 	{82, 6},
- 	{100, 6},
- 	{67, 5},
- 	{118, 6},
- 	{73, 5},
- 	{91, 5},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{190, 9},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{202, 9},
- 	{89, 9},
- 	{107, 9},
- 	{69, 7},
- 	{125, 9},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{113, 9},
- 	{71, 7},
- 	{131, 9},
- 	{30, 10},
- 	{46, 10},
- 	{7, 9},
- 	{194, 7},
- 	{83, 7},
- 	{54, 10},
- 	{11, 9},
- 	{119, 7},
- 	{19, 9},
- 	{35, 9},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{137, 9},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{58, 10},
- 	{13, 9},
- 	{121, 7},
- 	{21, 9},
- 	{37, 9},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{25, 9},
- 	{41, 9},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{49, 9},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{145, 3},
- 	{205, 9},
- 	{156, 8},
- 	{168, 8},
- 	{146, 4},
- 	{180, 8},
- 	{149, 4},
- 	{161, 4},
- 	{64, 4},
- 	{209, 12},
- 	{159, 8},
- 	{115, 9},
- 	{72, 8},
- 	{133, 9},
- 	{78, 8},
- 	{96, 8},
- 	{65, 5},
- 	{195, 8},
- 	{84, 8},
- 	{102, 8},
- 	{67, 5},
- 	{120, 8},
- 	{73, 5},
- 	{91, 5},
- 	{64, 4},
- 	{209, 12},
- 	{209, 12},
- 	{174, 8},
- 	{148, 6},
- 	{139, 9},
- 	{80, 8},
- 	{98, 8},
- 	{66, 6},
- 	{198, 8},
- 	{86, 8},
- 	{60, 10},
- 	{14, 9},
- 	{122, 8},
- 	{22, 9},
- 	{38, 9},
- 	{3, 8},
- 	{209, 12},
- 	{157, 6},
- 	{110, 8},
- 	{70, 6},
- 	{128, 8},
- 	{26, 9},
- 	{42, 9},
- 	{5, 8},
- 	{193, 6},
- 	{82, 6},
- 	{50, 9},
- 	{9, 8},
- 	{118, 6},
- 	{17, 8},
- 	{33, 8},
- 	{0, 6},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{209, 12},
- 	{189, 8},
- 	{152, 7},
- 	{164, 7},
- 	{145, 3},
- 	{201, 8},
- 	{88, 8},
- 	{106, 8},
- 	{69, 7},
- 	{124, 8},
- 	{75, 7},
- 	{93, 7},
- 	{64, 4},
- 	{209, 12},
- 	{158, 7},
- 	{112, 8},
- 	{71, 7},
- 	{130, 8},
- 	{28, 9},
- 	{44, 9},
- 	{6, 8},
- 	{194, 7},
- 	{83, 7},
- 	{52, 9},
- 	{10, 8},
- 	{119, 7},
- 	{18, 8},
- 	{34, 8},
- 	{1, 7},
- 	{209, 12},
- 	{209, 12},
- 	{173, 7},
- 	{148, 6},
- 	{136, 8},
- 	{79, 7},
- 	{97, 7},
- 	{66, 6},
- 	{197, 7},
- 	{85, 7},
- 	{56, 9},
- 	{12, 8},
- 	{121, 7},
- 	{20, 8},
- 	{36, 8},
- 	{2, 7},
- 	{209, 12},
- 	{157, 6},
- 	{109, 7},
- 	{70, 6},
- 	{127, 7},
- 	{24, 8},
- 	{40, 8},
- 	{4, 7},
- 	{193, 6},
- 	{82, 6},
- 	{48, 8},
- 	{8, 7},
- 	{118, 6},
- 	{16, 7},
- 	{32, 7},
- 	{0, 6}};
-} // utf8_to_utf16 namespace
-} // tables namespace
+const uint8_t utf8bigindex[4096][2] = {
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},
+    {161, 4},  {64, 4},   {209, 12}, {209, 12}, {209, 12}, {147, 5},  {209, 12},
+    {150, 5},  {162, 5},  {65, 5},   {209, 12}, {153, 5},  {165, 5},  {67, 5},
+    {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {209, 12},
+    {148, 6},  {209, 12}, {151, 6},  {163, 6},  {66, 6},   {209, 12}, {154, 6},
+    {166, 6},  {68, 6},   {178, 6},  {74, 6},   {92, 6},   {64, 4},   {209, 12},
+    {157, 6},  {169, 6},  {70, 6},   {181, 6},  {76, 6},   {94, 6},   {65, 5},
+    {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {152, 7},
+    {164, 7},  {145, 3},  {209, 12}, {155, 7},  {167, 7},  {69, 7},   {179, 7},
+    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {170, 7},  {71, 7},
+    {182, 7},  {77, 7},   {95, 7},   {65, 5},   {194, 7},  {83, 7},   {101, 7},
+    {67, 5},   {119, 7},  {73, 5},   {91, 5},   {1, 7},    {209, 12}, {209, 12},
+    {173, 7},  {148, 6},  {185, 7},  {79, 7},   {97, 7},   {66, 6},   {197, 7},
+    {85, 7},   {103, 7},  {68, 6},   {121, 7},  {74, 6},   {92, 6},   {2, 7},
+    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {76, 6},   {94, 6},
+    {4, 7},    {193, 6},  {82, 6},   {100, 6},  {8, 7},    {118, 6},  {16, 7},
+    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {145, 3},  {209, 12}, {156, 8},  {168, 8},  {146, 4},
+    {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {171, 8},
+    {72, 8},   {183, 8},  {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},
+    {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
+    {209, 12}, {174, 8},  {148, 6},  {186, 8},  {80, 8},   {98, 8},   {66, 6},
+    {198, 8},  {86, 8},   {104, 8},  {68, 6},   {122, 8},  {74, 6},   {92, 6},
+    {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {76, 6},
+    {94, 6},   {5, 8},    {193, 6},  {82, 6},   {100, 6},  {9, 8},    {118, 6},
+    {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},
+    {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
+    {112, 8},  {71, 7},   {130, 8},  {77, 7},   {95, 7},   {6, 8},    {194, 7},
+    {83, 7},   {101, 7},  {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},
+    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},
+    {66, 6},   {197, 7},  {85, 7},   {103, 7},  {12, 8},   {121, 7},  {20, 8},
+    {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
+    {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},
+    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12},
+    {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12},
+    {160, 9},  {172, 9},  {147, 5},  {184, 9},  {150, 5},  {162, 5},  {65, 5},
+    {196, 9},  {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},
+    {64, 4},   {209, 12}, {209, 12}, {175, 9},  {148, 6},  {187, 9},  {81, 9},
+    {99, 9},   {66, 6},   {199, 9},  {87, 9},   {105, 9},  {68, 6},   {123, 9},
+    {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {111, 9},  {70, 6},
+    {129, 9},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},
+    {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {190, 9},  {152, 7},  {164, 7},  {145, 3},  {202, 9},
+    {89, 9},   {107, 9},  {69, 7},   {125, 9},  {75, 7},   {93, 7},   {64, 4},
+    {209, 12}, {158, 7},  {113, 9},  {71, 7},   {131, 9},  {77, 7},   {95, 7},
+    {7, 9},    {194, 7},  {83, 7},   {101, 7},  {11, 9},   {119, 7},  {19, 9},
+    {35, 9},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {137, 9},
+    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},  {13, 9},
+    {121, 7},  {21, 9},   {37, 9},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
+    {70, 6},   {127, 7},  {25, 9},   {41, 9},   {4, 7},    {193, 6},  {82, 6},
+    {49, 9},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
+    {205, 9},  {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},
+    {64, 4},   {209, 12}, {159, 8},  {115, 9},  {72, 8},   {133, 9},  {78, 8},
+    {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},
+    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},
+    {139, 9},  {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {104, 8},
+    {14, 9},   {122, 8},  {22, 9},   {38, 9},   {3, 8},    {209, 12}, {157, 6},
+    {110, 8},  {70, 6},   {128, 8},  {26, 9},   {42, 9},   {5, 8},    {193, 6},
+    {82, 6},   {50, 9},   {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},
+    {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},
+    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},
+    {28, 9},   {44, 9},   {6, 8},    {194, 7},  {83, 7},   {52, 9},   {10, 8},
+    {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
+    {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
+    {56, 9},   {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12},
+    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},
+    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12},
+    {149, 4},  {161, 4},  {64, 4},   {209, 12}, {209, 12}, {209, 12}, {147, 5},
+    {209, 12}, {150, 5},  {162, 5},  {65, 5},   {209, 12}, {153, 5},  {165, 5},
+    {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12},
+    {176, 10}, {148, 6},  {188, 10}, {151, 6},  {163, 6},  {66, 6},   {200, 10},
+    {154, 6},  {166, 6},  {68, 6},   {178, 6},  {74, 6},   {92, 6},   {64, 4},
+    {209, 12}, {157, 6},  {169, 6},  {70, 6},   {181, 6},  {76, 6},   {94, 6},
+    {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},
+    {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {191, 10},
+    {152, 7},  {164, 7},  {145, 3},  {203, 10}, {90, 10},  {108, 10}, {69, 7},
+    {126, 10}, {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {114, 10},
+    {71, 7},   {132, 10}, {77, 7},   {95, 7},   {65, 5},   {194, 7},  {83, 7},
+    {101, 7},  {67, 5},   {119, 7},  {73, 5},   {91, 5},   {1, 7},    {209, 12},
+    {209, 12}, {173, 7},  {148, 6},  {138, 10}, {79, 7},   {97, 7},   {66, 6},
+    {197, 7},  {85, 7},   {103, 7},  {68, 6},   {121, 7},  {74, 6},   {92, 6},
+    {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {76, 6},
+    {94, 6},   {4, 7},    {193, 6},  {82, 6},   {100, 6},  {8, 7},    {118, 6},
+    {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {145, 3},  {206, 10}, {156, 8},  {168, 8},
+    {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},
+    {116, 10}, {72, 8},   {134, 10}, {78, 8},   {96, 8},   {65, 5},   {195, 8},
+    {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},
+    {209, 12}, {209, 12}, {174, 8},  {148, 6},  {140, 10}, {80, 8},   {98, 8},
+    {66, 6},   {198, 8},  {86, 8},   {104, 8},  {15, 10},  {122, 8},  {23, 10},
+    {39, 10},  {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},
+    {27, 10},  {43, 10},  {5, 8},    {193, 6},  {82, 6},   {51, 10},  {9, 8},
+    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},
+    {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12},
+    {158, 7},  {112, 8},  {71, 7},   {130, 8},  {29, 10},  {45, 10},  {6, 8},
+    {194, 7},  {83, 7},   {53, 10},  {10, 8},   {119, 7},  {18, 8},   {34, 8},
+    {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},
+    {97, 7},   {66, 6},   {197, 7},  {85, 7},   {57, 10},  {12, 8},   {121, 7},
+    {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},
+    {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},
+    {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12},
+    {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},
+    {209, 12}, {160, 9},  {172, 9},  {147, 5},  {184, 9},  {150, 5},  {162, 5},
+    {65, 5},   {196, 9},  {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},
+    {91, 5},   {64, 4},   {209, 12}, {209, 12}, {175, 9},  {148, 6},  {142, 10},
+    {81, 9},   {99, 9},   {66, 6},   {199, 9},  {87, 9},   {105, 9},  {68, 6},
+    {123, 9},  {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {111, 9},
+    {70, 6},   {129, 9},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},
+    {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {190, 9},  {152, 7},  {164, 7},  {145, 3},
+    {202, 9},  {89, 9},   {107, 9},  {69, 7},   {125, 9},  {75, 7},   {93, 7},
+    {64, 4},   {209, 12}, {158, 7},  {113, 9},  {71, 7},   {131, 9},  {30, 10},
+    {46, 10},  {7, 9},    {194, 7},  {83, 7},   {54, 10},  {11, 9},   {119, 7},
+    {19, 9},   {35, 9},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},
+    {137, 9},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {58, 10},
+    {13, 9},   {121, 7},  {21, 9},   {37, 9},   {2, 7},    {209, 12}, {157, 6},
+    {109, 7},  {70, 6},   {127, 7},  {25, 9},   {41, 9},   {4, 7},    {193, 6},
+    {82, 6},   {49, 9},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {145, 3},  {205, 9},  {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},
+    {161, 4},  {64, 4},   {209, 12}, {159, 8},  {115, 9},  {72, 8},   {133, 9},
+    {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},
+    {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},
+    {148, 6},  {139, 9},  {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},
+    {60, 10},  {14, 9},   {122, 8},  {22, 9},   {38, 9},   {3, 8},    {209, 12},
+    {157, 6},  {110, 8},  {70, 6},   {128, 8},  {26, 9},   {42, 9},   {5, 8},
+    {193, 6},  {82, 6},   {50, 9},   {9, 8},    {118, 6},  {17, 8},   {33, 8},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},
+    {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},
+    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},
+    {130, 8},  {28, 9},   {44, 9},   {6, 8},    {194, 7},  {83, 7},   {52, 9},
+    {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12},
+    {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},
+    {85, 7},   {56, 9},   {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},
+    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},
+    {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},
+    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},
+    {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12}, {209, 12}, {209, 12},
+    {147, 5},  {209, 12}, {150, 5},  {162, 5},  {65, 5},   {209, 12}, {153, 5},
+    {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
+    {209, 12}, {209, 12}, {148, 6},  {209, 12}, {151, 6},  {163, 6},  {66, 6},
+    {209, 12}, {154, 6},  {166, 6},  {68, 6},   {178, 6},  {74, 6},   {92, 6},
+    {64, 4},   {209, 12}, {157, 6},  {169, 6},  {70, 6},   {181, 6},  {76, 6},
+    {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},
+    {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {192, 11}, {152, 7},  {164, 7},  {145, 3},  {204, 11}, {155, 7},  {167, 7},
+    {69, 7},   {179, 7},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
+    {170, 7},  {71, 7},   {182, 7},  {77, 7},   {95, 7},   {65, 5},   {194, 7},
+    {83, 7},   {101, 7},  {67, 5},   {119, 7},  {73, 5},   {91, 5},   {1, 7},
+    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {185, 7},  {79, 7},   {97, 7},
+    {66, 6},   {197, 7},  {85, 7},   {103, 7},  {68, 6},   {121, 7},  {74, 6},
+    {92, 6},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
+    {76, 6},   {94, 6},   {4, 7},    {193, 6},  {82, 6},   {100, 6},  {8, 7},
+    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {207, 11}, {156, 8},
+    {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12},
+    {159, 8},  {117, 11}, {72, 8},   {135, 11}, {78, 8},   {96, 8},   {65, 5},
+    {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},
+    {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},  {141, 11}, {80, 8},
+    {98, 8},   {66, 6},   {198, 8},  {86, 8},   {104, 8},  {68, 6},   {122, 8},
+    {74, 6},   {92, 6},   {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},
+    {128, 8},  {76, 6},   {94, 6},   {5, 8},    {193, 6},  {82, 6},   {100, 6},
+    {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},
+    {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},
+    {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},  {77, 7},   {95, 7},
+    {6, 8},    {194, 7},  {83, 7},   {101, 7},  {10, 8},   {119, 7},  {18, 8},
+    {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},
+    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},  {12, 8},
+    {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
+    {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},
+    {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
+    {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},
+    {64, 4},   {209, 12}, {160, 9},  {172, 9},  {147, 5},  {184, 9},  {150, 5},
+    {162, 5},  {65, 5},   {196, 9},  {153, 5},  {165, 5},  {67, 5},   {177, 5},
+    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {175, 9},  {148, 6},
+    {143, 11}, {81, 9},   {99, 9},   {66, 6},   {199, 9},  {87, 9},   {105, 9},
+    {68, 6},   {123, 9},  {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},
+    {111, 9},  {70, 6},   {129, 9},  {76, 6},   {94, 6},   {65, 5},   {193, 6},
+    {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {190, 9},  {152, 7},  {164, 7},
+    {145, 3},  {202, 9},  {89, 9},   {107, 9},  {69, 7},   {125, 9},  {75, 7},
+    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {113, 9},  {71, 7},   {131, 9},
+    {31, 11},  {47, 11},  {7, 9},    {194, 7},  {83, 7},   {55, 11},  {11, 9},
+    {119, 7},  {19, 9},   {35, 9},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
+    {148, 6},  {137, 9},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
+    {59, 11},  {13, 9},   {121, 7},  {21, 9},   {37, 9},   {2, 7},    {209, 12},
+    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {25, 9},   {41, 9},   {4, 7},
+    {193, 6},  {82, 6},   {49, 9},   {8, 7},    {118, 6},  {16, 7},   {32, 7},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {145, 3},  {205, 9},  {156, 8},  {168, 8},  {146, 4},  {180, 8},
+    {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {115, 9},  {72, 8},
+    {133, 9},  {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},
+    {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12},
+    {174, 8},  {148, 6},  {139, 9},  {80, 8},   {98, 8},   {66, 6},   {198, 8},
+    {86, 8},   {61, 11},  {14, 9},   {122, 8},  {22, 9},   {38, 9},   {3, 8},
+    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {26, 9},   {42, 9},
+    {5, 8},    {193, 6},  {82, 6},   {50, 9},   {9, 8},    {118, 6},  {17, 8},
+    {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},
+    {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},
+    {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},
+    {71, 7},   {130, 8},  {28, 9},   {44, 9},   {6, 8},    {194, 7},  {83, 7},
+    {52, 9},   {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12},
+    {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},
+    {197, 7},  {85, 7},   {56, 9},   {12, 8},   {121, 7},  {20, 8},   {36, 8},
+    {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},
+    {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},
+    {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12},
+    {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12}, {209, 12},
+    {209, 12}, {147, 5},  {209, 12}, {150, 5},  {162, 5},  {65, 5},   {209, 12},
+    {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},
+    {209, 12}, {209, 12}, {176, 10}, {148, 6},  {188, 10}, {151, 6},  {163, 6},
+    {66, 6},   {200, 10}, {154, 6},  {166, 6},  {68, 6},   {178, 6},  {74, 6},
+    {92, 6},   {64, 4},   {209, 12}, {157, 6},  {169, 6},  {70, 6},   {181, 6},
+    {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},
+    {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {191, 10}, {152, 7},  {164, 7},  {145, 3},  {203, 10}, {90, 10},
+    {108, 10}, {69, 7},   {126, 10}, {75, 7},   {93, 7},   {64, 4},   {209, 12},
+    {158, 7},  {114, 10}, {71, 7},   {132, 10}, {77, 7},   {95, 7},   {65, 5},
+    {194, 7},  {83, 7},   {101, 7},  {67, 5},   {119, 7},  {73, 5},   {91, 5},
+    {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {138, 10}, {79, 7},
+    {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},  {68, 6},   {121, 7},
+    {74, 6},   {92, 6},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},
+    {127, 7},  {76, 6},   {94, 6},   {4, 7},    {193, 6},  {82, 6},   {100, 6},
+    {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {206, 10},
+    {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},
+    {209, 12}, {159, 8},  {116, 10}, {72, 8},   {134, 10}, {78, 8},   {96, 8},
+    {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},
+    {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},  {140, 10},
+    {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {62, 11},  {15, 10},
+    {122, 8},  {23, 10},  {39, 10},  {3, 8},    {209, 12}, {157, 6},  {110, 8},
+    {70, 6},   {128, 8},  {27, 10},  {43, 10},  {5, 8},    {193, 6},  {82, 6},
+    {51, 10},  {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},
+    {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},
+    {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},  {29, 10},
+    {45, 10},  {6, 8},    {194, 7},  {83, 7},   {53, 10},  {10, 8},   {119, 7},
+    {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},
+    {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {57, 10},
+    {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},
+    {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},
+    {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},
+    {161, 4},  {64, 4},   {209, 12}, {160, 9},  {172, 9},  {147, 5},  {184, 9},
+    {150, 5},  {162, 5},  {65, 5},   {196, 9},  {153, 5},  {165, 5},  {67, 5},
+    {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {175, 9},
+    {148, 6},  {142, 10}, {81, 9},   {99, 9},   {66, 6},   {199, 9},  {87, 9},
+    {105, 9},  {68, 6},   {123, 9},  {74, 6},   {92, 6},   {64, 4},   {209, 12},
+    {157, 6},  {111, 9},  {70, 6},   {129, 9},  {76, 6},   {94, 6},   {65, 5},
+    {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {190, 9},  {152, 7},
+    {164, 7},  {145, 3},  {202, 9},  {89, 9},   {107, 9},  {69, 7},   {125, 9},
+    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {113, 9},  {71, 7},
+    {131, 9},  {30, 10},  {46, 10},  {7, 9},    {194, 7},  {83, 7},   {54, 10},
+    {11, 9},   {119, 7},  {19, 9},   {35, 9},   {1, 7},    {209, 12}, {209, 12},
+    {173, 7},  {148, 6},  {137, 9},  {79, 7},   {97, 7},   {66, 6},   {197, 7},
+    {85, 7},   {58, 10},  {13, 9},   {121, 7},  {21, 9},   {37, 9},   {2, 7},
+    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {25, 9},   {41, 9},
+    {4, 7},    {193, 6},  {82, 6},   {49, 9},   {8, 7},    {118, 6},  {16, 7},
+    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {145, 3},  {205, 9},  {156, 8},  {168, 8},  {146, 4},
+    {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {115, 9},
+    {72, 8},   {133, 9},  {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},
+    {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
+    {209, 12}, {174, 8},  {148, 6},  {139, 9},  {80, 8},   {98, 8},   {66, 6},
+    {198, 8},  {86, 8},   {60, 10},  {14, 9},   {122, 8},  {22, 9},   {38, 9},
+    {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {26, 9},
+    {42, 9},   {5, 8},    {193, 6},  {82, 6},   {50, 9},   {9, 8},    {118, 6},
+    {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},
+    {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
+    {112, 8},  {71, 7},   {130, 8},  {28, 9},   {44, 9},   {6, 8},    {194, 7},
+    {83, 7},   {52, 9},   {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},
+    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},
+    {66, 6},   {197, 7},  {85, 7},   {56, 9},   {12, 8},   {121, 7},  {20, 8},
+    {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
+    {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},
+    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12},
+    {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12},
+    {209, 12}, {209, 12}, {147, 5},  {209, 12}, {150, 5},  {162, 5},  {65, 5},
+    {209, 12}, {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},
+    {64, 4},   {209, 12}, {209, 12}, {209, 12}, {148, 6},  {209, 12}, {151, 6},
+    {163, 6},  {66, 6},   {209, 12}, {154, 6},  {166, 6},  {68, 6},   {178, 6},
+    {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {169, 6},  {70, 6},
+    {181, 6},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},
+    {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {152, 7},  {164, 7},  {145, 3},  {209, 12},
+    {155, 7},  {167, 7},  {69, 7},   {179, 7},  {75, 7},   {93, 7},   {64, 4},
+    {209, 12}, {158, 7},  {170, 7},  {71, 7},   {182, 7},  {77, 7},   {95, 7},
+    {65, 5},   {194, 7},  {83, 7},   {101, 7},  {67, 5},   {119, 7},  {73, 5},
+    {91, 5},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {185, 7},
+    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},  {68, 6},
+    {121, 7},  {74, 6},   {92, 6},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
+    {70, 6},   {127, 7},  {76, 6},   {94, 6},   {4, 7},    {193, 6},  {82, 6},
+    {100, 6},  {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
+    {208, 12}, {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},
+    {64, 4},   {209, 12}, {159, 8},  {171, 8},  {72, 8},   {183, 8},  {78, 8},
+    {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},
+    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},
+    {186, 8},  {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {104, 8},
+    {68, 6},   {122, 8},  {74, 6},   {92, 6},   {3, 8},    {209, 12}, {157, 6},
+    {110, 8},  {70, 6},   {128, 8},  {76, 6},   {94, 6},   {5, 8},    {193, 6},
+    {82, 6},   {100, 6},  {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},
+    {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},
+    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},
+    {77, 7},   {95, 7},   {6, 8},    {194, 7},  {83, 7},   {101, 7},  {10, 8},
+    {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
+    {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
+    {103, 7},  {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12},
+    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},
+    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12},
+    {149, 4},  {161, 4},  {64, 4},   {209, 12}, {160, 9},  {172, 9},  {147, 5},
+    {184, 9},  {150, 5},  {162, 5},  {65, 5},   {196, 9},  {153, 5},  {165, 5},
+    {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12},
+    {175, 9},  {148, 6},  {144, 12}, {81, 9},   {99, 9},   {66, 6},   {199, 9},
+    {87, 9},   {105, 9},  {68, 6},   {123, 9},  {74, 6},   {92, 6},   {64, 4},
+    {209, 12}, {157, 6},  {111, 9},  {70, 6},   {129, 9},  {76, 6},   {94, 6},
+    {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},
+    {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {190, 9},
+    {152, 7},  {164, 7},  {145, 3},  {202, 9},  {89, 9},   {107, 9},  {69, 7},
+    {125, 9},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {113, 9},
+    {71, 7},   {131, 9},  {77, 7},   {95, 7},   {7, 9},    {194, 7},  {83, 7},
+    {101, 7},  {11, 9},   {119, 7},  {19, 9},   {35, 9},   {1, 7},    {209, 12},
+    {209, 12}, {173, 7},  {148, 6},  {137, 9},  {79, 7},   {97, 7},   {66, 6},
+    {197, 7},  {85, 7},   {103, 7},  {13, 9},   {121, 7},  {21, 9},   {37, 9},
+    {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {25, 9},
+    {41, 9},   {4, 7},    {193, 6},  {82, 6},   {49, 9},   {8, 7},    {118, 6},
+    {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {145, 3},  {205, 9},  {156, 8},  {168, 8},
+    {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},
+    {115, 9},  {72, 8},   {133, 9},  {78, 8},   {96, 8},   {65, 5},   {195, 8},
+    {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},
+    {209, 12}, {209, 12}, {174, 8},  {148, 6},  {139, 9},  {80, 8},   {98, 8},
+    {66, 6},   {198, 8},  {86, 8},   {104, 8},  {14, 9},   {122, 8},  {22, 9},
+    {38, 9},   {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},
+    {26, 9},   {42, 9},   {5, 8},    {193, 6},  {82, 6},   {50, 9},   {9, 8},
+    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},
+    {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12},
+    {158, 7},  {112, 8},  {71, 7},   {130, 8},  {28, 9},   {44, 9},   {6, 8},
+    {194, 7},  {83, 7},   {52, 9},   {10, 8},   {119, 7},  {18, 8},   {34, 8},
+    {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},
+    {97, 7},   {66, 6},   {197, 7},  {85, 7},   {56, 9},   {12, 8},   {121, 7},
+    {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},
+    {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},
+    {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12},
+    {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},
+    {209, 12}, {209, 12}, {209, 12}, {147, 5},  {209, 12}, {150, 5},  {162, 5},
+    {65, 5},   {209, 12}, {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},
+    {91, 5},   {64, 4},   {209, 12}, {209, 12}, {176, 10}, {148, 6},  {188, 10},
+    {151, 6},  {163, 6},  {66, 6},   {200, 10}, {154, 6},  {166, 6},  {68, 6},
+    {178, 6},  {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {169, 6},
+    {70, 6},   {181, 6},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},
+    {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {191, 10}, {152, 7},  {164, 7},  {145, 3},
+    {203, 10}, {90, 10},  {108, 10}, {69, 7},   {126, 10}, {75, 7},   {93, 7},
+    {64, 4},   {209, 12}, {158, 7},  {114, 10}, {71, 7},   {132, 10}, {77, 7},
+    {95, 7},   {65, 5},   {194, 7},  {83, 7},   {101, 7},  {67, 5},   {119, 7},
+    {73, 5},   {91, 5},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},
+    {138, 10}, {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},
+    {68, 6},   {121, 7},  {74, 6},   {92, 6},   {2, 7},    {209, 12}, {157, 6},
+    {109, 7},  {70, 6},   {127, 7},  {76, 6},   {94, 6},   {4, 7},    {193, 6},
+    {82, 6},   {100, 6},  {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {145, 3},  {206, 10}, {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},
+    {161, 4},  {64, 4},   {209, 12}, {159, 8},  {116, 10}, {72, 8},   {134, 10},
+    {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},
+    {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},
+    {148, 6},  {140, 10}, {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},
+    {63, 12},  {15, 10},  {122, 8},  {23, 10},  {39, 10},  {3, 8},    {209, 12},
+    {157, 6},  {110, 8},  {70, 6},   {128, 8},  {27, 10},  {43, 10},  {5, 8},
+    {193, 6},  {82, 6},   {51, 10},  {9, 8},    {118, 6},  {17, 8},   {33, 8},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},
+    {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},
+    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},
+    {130, 8},  {29, 10},  {45, 10},  {6, 8},    {194, 7},  {83, 7},   {53, 10},
+    {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12},
+    {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},
+    {85, 7},   {57, 10},  {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},
+    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},
+    {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},
+    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},
+    {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12}, {160, 9},  {172, 9},
+    {147, 5},  {184, 9},  {150, 5},  {162, 5},  {65, 5},   {196, 9},  {153, 5},
+    {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
+    {209, 12}, {175, 9},  {148, 6},  {142, 10}, {81, 9},   {99, 9},   {66, 6},
+    {199, 9},  {87, 9},   {105, 9},  {68, 6},   {123, 9},  {74, 6},   {92, 6},
+    {64, 4},   {209, 12}, {157, 6},  {111, 9},  {70, 6},   {129, 9},  {76, 6},
+    {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},
+    {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {190, 9},  {152, 7},  {164, 7},  {145, 3},  {202, 9},  {89, 9},   {107, 9},
+    {69, 7},   {125, 9},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
+    {113, 9},  {71, 7},   {131, 9},  {30, 10},  {46, 10},  {7, 9},    {194, 7},
+    {83, 7},   {54, 10},  {11, 9},   {119, 7},  {19, 9},   {35, 9},   {1, 7},
+    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {137, 9},  {79, 7},   {97, 7},
+    {66, 6},   {197, 7},  {85, 7},   {58, 10},  {13, 9},   {121, 7},  {21, 9},
+    {37, 9},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
+    {25, 9},   {41, 9},   {4, 7},    {193, 6},  {82, 6},   {49, 9},   {8, 7},
+    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {205, 9},  {156, 8},
+    {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12},
+    {159, 8},  {115, 9},  {72, 8},   {133, 9},  {78, 8},   {96, 8},   {65, 5},
+    {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},
+    {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},  {139, 9},  {80, 8},
+    {98, 8},   {66, 6},   {198, 8},  {86, 8},   {60, 10},  {14, 9},   {122, 8},
+    {22, 9},   {38, 9},   {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},
+    {128, 8},  {26, 9},   {42, 9},   {5, 8},    {193, 6},  {82, 6},   {50, 9},
+    {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},
+    {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},
+    {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},  {28, 9},   {44, 9},
+    {6, 8},    {194, 7},  {83, 7},   {52, 9},   {10, 8},   {119, 7},  {18, 8},
+    {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},
+    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {56, 9},   {12, 8},
+    {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
+    {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},
+    {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
+    {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},
+    {64, 4},   {209, 12}, {209, 12}, {209, 12}, {147, 5},  {209, 12}, {150, 5},
+    {162, 5},  {65, 5},   {209, 12}, {153, 5},  {165, 5},  {67, 5},   {177, 5},
+    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {209, 12}, {148, 6},
+    {209, 12}, {151, 6},  {163, 6},  {66, 6},   {209, 12}, {154, 6},  {166, 6},
+    {68, 6},   {178, 6},  {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},
+    {169, 6},  {70, 6},   {181, 6},  {76, 6},   {94, 6},   {65, 5},   {193, 6},
+    {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {192, 11}, {152, 7},  {164, 7},
+    {145, 3},  {204, 11}, {155, 7},  {167, 7},  {69, 7},   {179, 7},  {75, 7},
+    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {170, 7},  {71, 7},   {182, 7},
+    {77, 7},   {95, 7},   {65, 5},   {194, 7},  {83, 7},   {101, 7},  {67, 5},
+    {119, 7},  {73, 5},   {91, 5},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
+    {148, 6},  {185, 7},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
+    {103, 7},  {68, 6},   {121, 7},  {74, 6},   {92, 6},   {2, 7},    {209, 12},
+    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {76, 6},   {94, 6},   {4, 7},
+    {193, 6},  {82, 6},   {100, 6},  {8, 7},    {118, 6},  {16, 7},   {32, 7},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {145, 3},  {207, 11}, {156, 8},  {168, 8},  {146, 4},  {180, 8},
+    {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {117, 11}, {72, 8},
+    {135, 11}, {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},
+    {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12},
+    {174, 8},  {148, 6},  {141, 11}, {80, 8},   {98, 8},   {66, 6},   {198, 8},
+    {86, 8},   {104, 8},  {68, 6},   {122, 8},  {74, 6},   {92, 6},   {3, 8},
+    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {76, 6},   {94, 6},
+    {5, 8},    {193, 6},  {82, 6},   {100, 6},  {9, 8},    {118, 6},  {17, 8},
+    {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},
+    {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},
+    {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},
+    {71, 7},   {130, 8},  {77, 7},   {95, 7},   {6, 8},    {194, 7},  {83, 7},
+    {101, 7},  {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12},
+    {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},
+    {197, 7},  {85, 7},   {103, 7},  {12, 8},   {121, 7},  {20, 8},   {36, 8},
+    {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},
+    {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},
+    {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12},
+    {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12}, {160, 9},
+    {172, 9},  {147, 5},  {184, 9},  {150, 5},  {162, 5},  {65, 5},   {196, 9},
+    {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},
+    {209, 12}, {209, 12}, {175, 9},  {148, 6},  {143, 11}, {81, 9},   {99, 9},
+    {66, 6},   {199, 9},  {87, 9},   {105, 9},  {68, 6},   {123, 9},  {74, 6},
+    {92, 6},   {64, 4},   {209, 12}, {157, 6},  {111, 9},  {70, 6},   {129, 9},
+    {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},
+    {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {190, 9},  {152, 7},  {164, 7},  {145, 3},  {202, 9},  {89, 9},
+    {107, 9},  {69, 7},   {125, 9},  {75, 7},   {93, 7},   {64, 4},   {209, 12},
+    {158, 7},  {113, 9},  {71, 7},   {131, 9},  {31, 11},  {47, 11},  {7, 9},
+    {194, 7},  {83, 7},   {55, 11},  {11, 9},   {119, 7},  {19, 9},   {35, 9},
+    {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {137, 9},  {79, 7},
+    {97, 7},   {66, 6},   {197, 7},  {85, 7},   {59, 11},  {13, 9},   {121, 7},
+    {21, 9},   {37, 9},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},
+    {127, 7},  {25, 9},   {41, 9},   {4, 7},    {193, 6},  {82, 6},   {49, 9},
+    {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {205, 9},
+    {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},
+    {209, 12}, {159, 8},  {115, 9},  {72, 8},   {133, 9},  {78, 8},   {96, 8},
+    {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},
+    {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},  {139, 9},
+    {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {61, 11},  {14, 9},
+    {122, 8},  {22, 9},   {38, 9},   {3, 8},    {209, 12}, {157, 6},  {110, 8},
+    {70, 6},   {128, 8},  {26, 9},   {42, 9},   {5, 8},    {193, 6},  {82, 6},
+    {50, 9},   {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},
+    {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},
+    {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},  {28, 9},
+    {44, 9},   {6, 8},    {194, 7},  {83, 7},   {52, 9},   {10, 8},   {119, 7},
+    {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},
+    {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {56, 9},
+    {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},
+    {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},
+    {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},
+    {161, 4},  {64, 4},   {209, 12}, {209, 12}, {209, 12}, {147, 5},  {209, 12},
+    {150, 5},  {162, 5},  {65, 5},   {209, 12}, {153, 5},  {165, 5},  {67, 5},
+    {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {176, 10},
+    {148, 6},  {188, 10}, {151, 6},  {163, 6},  {66, 6},   {200, 10}, {154, 6},
+    {166, 6},  {68, 6},   {178, 6},  {74, 6},   {92, 6},   {64, 4},   {209, 12},
+    {157, 6},  {169, 6},  {70, 6},   {181, 6},  {76, 6},   {94, 6},   {65, 5},
+    {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {191, 10}, {152, 7},
+    {164, 7},  {145, 3},  {203, 10}, {90, 10},  {108, 10}, {69, 7},   {126, 10},
+    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {114, 10}, {71, 7},
+    {132, 10}, {77, 7},   {95, 7},   {65, 5},   {194, 7},  {83, 7},   {101, 7},
+    {67, 5},   {119, 7},  {73, 5},   {91, 5},   {1, 7},    {209, 12}, {209, 12},
+    {173, 7},  {148, 6},  {138, 10}, {79, 7},   {97, 7},   {66, 6},   {197, 7},
+    {85, 7},   {103, 7},  {68, 6},   {121, 7},  {74, 6},   {92, 6},   {2, 7},
+    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {76, 6},   {94, 6},
+    {4, 7},    {193, 6},  {82, 6},   {100, 6},  {8, 7},    {118, 6},  {16, 7},
+    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {145, 3},  {206, 10}, {156, 8},  {168, 8},  {146, 4},
+    {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {116, 10},
+    {72, 8},   {134, 10}, {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},
+    {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
+    {209, 12}, {174, 8},  {148, 6},  {140, 10}, {80, 8},   {98, 8},   {66, 6},
+    {198, 8},  {86, 8},   {62, 11},  {15, 10},  {122, 8},  {23, 10},  {39, 10},
+    {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {27, 10},
+    {43, 10},  {5, 8},    {193, 6},  {82, 6},   {51, 10},  {9, 8},    {118, 6},
+    {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},
+    {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
+    {112, 8},  {71, 7},   {130, 8},  {29, 10},  {45, 10},  {6, 8},    {194, 7},
+    {83, 7},   {53, 10},  {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},
+    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},
+    {66, 6},   {197, 7},  {85, 7},   {57, 10},  {12, 8},   {121, 7},  {20, 8},
+    {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
+    {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},
+    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12},
+    {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12},
+    {160, 9},  {172, 9},  {147, 5},  {184, 9},  {150, 5},  {162, 5},  {65, 5},
+    {196, 9},  {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},
+    {64, 4},   {209, 12}, {209, 12}, {175, 9},  {148, 6},  {142, 10}, {81, 9},
+    {99, 9},   {66, 6},   {199, 9},  {87, 9},   {105, 9},  {68, 6},   {123, 9},
+    {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {111, 9},  {70, 6},
+    {129, 9},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},
+    {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {190, 9},  {152, 7},  {164, 7},  {145, 3},  {202, 9},
+    {89, 9},   {107, 9},  {69, 7},   {125, 9},  {75, 7},   {93, 7},   {64, 4},
+    {209, 12}, {158, 7},  {113, 9},  {71, 7},   {131, 9},  {30, 10},  {46, 10},
+    {7, 9},    {194, 7},  {83, 7},   {54, 10},  {11, 9},   {119, 7},  {19, 9},
+    {35, 9},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {137, 9},
+    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {58, 10},  {13, 9},
+    {121, 7},  {21, 9},   {37, 9},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
+    {70, 6},   {127, 7},  {25, 9},   {41, 9},   {4, 7},    {193, 6},  {82, 6},
+    {49, 9},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
+    {205, 9},  {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},
+    {64, 4},   {209, 12}, {159, 8},  {115, 9},  {72, 8},   {133, 9},  {78, 8},
+    {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},
+    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},
+    {139, 9},  {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {60, 10},
+    {14, 9},   {122, 8},  {22, 9},   {38, 9},   {3, 8},    {209, 12}, {157, 6},
+    {110, 8},  {70, 6},   {128, 8},  {26, 9},   {42, 9},   {5, 8},    {193, 6},
+    {82, 6},   {50, 9},   {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},
+    {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},
+    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},
+    {28, 9},   {44, 9},   {6, 8},    {194, 7},  {83, 7},   {52, 9},   {10, 8},
+    {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
+    {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
+    {56, 9},   {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12},
+    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},
+    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},
+    {0, 6}};
+} // namespace utf8_to_utf16
+} // namespace tables
 } // unnamed namespace
 } // namespace simdutf
 
@@ -11830,528 +10449,761 @@ namespace {
 namespace tables {
 namespace utf16_to_utf8 {
 
-  // 1 byte for length, 16 bytes for mask
-  const uint8_t pack_1_2_utf8_bytes[256][17] = {
-    {16,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14},
-    {15,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14,0x80},
-    {15,1,0,3,2,5,4,7,6,8,11,10,13,12,15,14,0x80},
-    {14,0,3,2,5,4,7,6,8,11,10,13,12,15,14,0x80,0x80},
-    {15,1,0,2,5,4,7,6,9,8,11,10,13,12,15,14,0x80},
-    {14,0,2,5,4,7,6,9,8,11,10,13,12,15,14,0x80,0x80},
-    {14,1,0,2,5,4,7,6,8,11,10,13,12,15,14,0x80,0x80},
-    {13,0,2,5,4,7,6,8,11,10,13,12,15,14,0x80,0x80,0x80},
-    {15,1,0,3,2,5,4,7,6,9,8,10,13,12,15,14,0x80},
-    {14,0,3,2,5,4,7,6,9,8,10,13,12,15,14,0x80,0x80},
-    {14,1,0,3,2,5,4,7,6,8,10,13,12,15,14,0x80,0x80},
-    {13,0,3,2,5,4,7,6,8,10,13,12,15,14,0x80,0x80,0x80},
-    {14,1,0,2,5,4,7,6,9,8,10,13,12,15,14,0x80,0x80},
-    {13,0,2,5,4,7,6,9,8,10,13,12,15,14,0x80,0x80,0x80},
-    {13,1,0,2,5,4,7,6,8,10,13,12,15,14,0x80,0x80,0x80},
-    {12,0,2,5,4,7,6,8,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {15,1,0,3,2,4,7,6,9,8,11,10,13,12,15,14,0x80},
-    {14,0,3,2,4,7,6,9,8,11,10,13,12,15,14,0x80,0x80},
-    {14,1,0,3,2,4,7,6,8,11,10,13,12,15,14,0x80,0x80},
-    {13,0,3,2,4,7,6,8,11,10,13,12,15,14,0x80,0x80,0x80},
-    {14,1,0,2,4,7,6,9,8,11,10,13,12,15,14,0x80,0x80},
-    {13,0,2,4,7,6,9,8,11,10,13,12,15,14,0x80,0x80,0x80},
-    {13,1,0,2,4,7,6,8,11,10,13,12,15,14,0x80,0x80,0x80},
-    {12,0,2,4,7,6,8,11,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {14,1,0,3,2,4,7,6,9,8,10,13,12,15,14,0x80,0x80},
-    {13,0,3,2,4,7,6,9,8,10,13,12,15,14,0x80,0x80,0x80},
-    {13,1,0,3,2,4,7,6,8,10,13,12,15,14,0x80,0x80,0x80},
-    {12,0,3,2,4,7,6,8,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {13,1,0,2,4,7,6,9,8,10,13,12,15,14,0x80,0x80,0x80},
-    {12,0,2,4,7,6,9,8,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {12,1,0,2,4,7,6,8,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,2,4,7,6,8,10,13,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {15,1,0,3,2,5,4,7,6,9,8,11,10,12,15,14,0x80},
-    {14,0,3,2,5,4,7,6,9,8,11,10,12,15,14,0x80,0x80},
-    {14,1,0,3,2,5,4,7,6,8,11,10,12,15,14,0x80,0x80},
-    {13,0,3,2,5,4,7,6,8,11,10,12,15,14,0x80,0x80,0x80},
-    {14,1,0,2,5,4,7,6,9,8,11,10,12,15,14,0x80,0x80},
-    {13,0,2,5,4,7,6,9,8,11,10,12,15,14,0x80,0x80,0x80},
-    {13,1,0,2,5,4,7,6,8,11,10,12,15,14,0x80,0x80,0x80},
-    {12,0,2,5,4,7,6,8,11,10,12,15,14,0x80,0x80,0x80,0x80},
-    {14,1,0,3,2,5,4,7,6,9,8,10,12,15,14,0x80,0x80},
-    {13,0,3,2,5,4,7,6,9,8,10,12,15,14,0x80,0x80,0x80},
-    {13,1,0,3,2,5,4,7,6,8,10,12,15,14,0x80,0x80,0x80},
-    {12,0,3,2,5,4,7,6,8,10,12,15,14,0x80,0x80,0x80,0x80},
-    {13,1,0,2,5,4,7,6,9,8,10,12,15,14,0x80,0x80,0x80},
-    {12,0,2,5,4,7,6,9,8,10,12,15,14,0x80,0x80,0x80,0x80},
-    {12,1,0,2,5,4,7,6,8,10,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,2,5,4,7,6,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {14,1,0,3,2,4,7,6,9,8,11,10,12,15,14,0x80,0x80},
-    {13,0,3,2,4,7,6,9,8,11,10,12,15,14,0x80,0x80,0x80},
-    {13,1,0,3,2,4,7,6,8,11,10,12,15,14,0x80,0x80,0x80},
-    {12,0,3,2,4,7,6,8,11,10,12,15,14,0x80,0x80,0x80,0x80},
-    {13,1,0,2,4,7,6,9,8,11,10,12,15,14,0x80,0x80,0x80},
-    {12,0,2,4,7,6,9,8,11,10,12,15,14,0x80,0x80,0x80,0x80},
-    {12,1,0,2,4,7,6,8,11,10,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,2,4,7,6,8,11,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {13,1,0,3,2,4,7,6,9,8,10,12,15,14,0x80,0x80,0x80},
-    {12,0,3,2,4,7,6,9,8,10,12,15,14,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,4,7,6,8,10,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,4,7,6,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,2,4,7,6,9,8,10,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,2,4,7,6,9,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,4,7,6,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,4,7,6,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {15,1,0,3,2,5,4,6,9,8,11,10,13,12,15,14,0x80},
-    {14,0,3,2,5,4,6,9,8,11,10,13,12,15,14,0x80,0x80},
-    {14,1,0,3,2,5,4,6,8,11,10,13,12,15,14,0x80,0x80},
-    {13,0,3,2,5,4,6,8,11,10,13,12,15,14,0x80,0x80,0x80},
-    {14,1,0,2,5,4,6,9,8,11,10,13,12,15,14,0x80,0x80},
-    {13,0,2,5,4,6,9,8,11,10,13,12,15,14,0x80,0x80,0x80},
-    {13,1,0,2,5,4,6,8,11,10,13,12,15,14,0x80,0x80,0x80},
-    {12,0,2,5,4,6,8,11,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {14,1,0,3,2,5,4,6,9,8,10,13,12,15,14,0x80,0x80},
-    {13,0,3,2,5,4,6,9,8,10,13,12,15,14,0x80,0x80,0x80},
-    {13,1,0,3,2,5,4,6,8,10,13,12,15,14,0x80,0x80,0x80},
-    {12,0,3,2,5,4,6,8,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {13,1,0,2,5,4,6,9,8,10,13,12,15,14,0x80,0x80,0x80},
-    {12,0,2,5,4,6,9,8,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {12,1,0,2,5,4,6,8,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,2,5,4,6,8,10,13,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {14,1,0,3,2,4,6,9,8,11,10,13,12,15,14,0x80,0x80},
-    {13,0,3,2,4,6,9,8,11,10,13,12,15,14,0x80,0x80,0x80},
-    {13,1,0,3,2,4,6,8,11,10,13,12,15,14,0x80,0x80,0x80},
-    {12,0,3,2,4,6,8,11,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {13,1,0,2,4,6,9,8,11,10,13,12,15,14,0x80,0x80,0x80},
-    {12,0,2,4,6,9,8,11,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {12,1,0,2,4,6,8,11,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,2,4,6,8,11,10,13,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {13,1,0,3,2,4,6,9,8,10,13,12,15,14,0x80,0x80,0x80},
-    {12,0,3,2,4,6,9,8,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,4,6,8,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,4,6,8,10,13,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,2,4,6,9,8,10,13,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,2,4,6,9,8,10,13,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,4,6,8,10,13,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,4,6,8,10,13,12,15,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {14,1,0,3,2,5,4,6,9,8,11,10,12,15,14,0x80,0x80},
-    {13,0,3,2,5,4,6,9,8,11,10,12,15,14,0x80,0x80,0x80},
-    {13,1,0,3,2,5,4,6,8,11,10,12,15,14,0x80,0x80,0x80},
-    {12,0,3,2,5,4,6,8,11,10,12,15,14,0x80,0x80,0x80,0x80},
-    {13,1,0,2,5,4,6,9,8,11,10,12,15,14,0x80,0x80,0x80},
-    {12,0,2,5,4,6,9,8,11,10,12,15,14,0x80,0x80,0x80,0x80},
-    {12,1,0,2,5,4,6,8,11,10,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,2,5,4,6,8,11,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {13,1,0,3,2,5,4,6,9,8,10,12,15,14,0x80,0x80,0x80},
-    {12,0,3,2,5,4,6,9,8,10,12,15,14,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,5,4,6,8,10,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,5,4,6,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,2,5,4,6,9,8,10,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,2,5,4,6,9,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,5,4,6,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,5,4,6,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {13,1,0,3,2,4,6,9,8,11,10,12,15,14,0x80,0x80,0x80},
-    {12,0,3,2,4,6,9,8,11,10,12,15,14,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,4,6,8,11,10,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,4,6,8,11,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,2,4,6,9,8,11,10,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,2,4,6,9,8,11,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,4,6,8,11,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,4,6,8,11,10,12,15,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,4,6,9,8,10,12,15,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,4,6,9,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,3,2,4,6,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,3,2,4,6,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,4,6,9,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,4,6,9,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,1,0,2,4,6,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,0,2,4,6,8,10,12,15,14,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {15,1,0,3,2,5,4,7,6,9,8,11,10,13,12,14,0x80},
-    {14,0,3,2,5,4,7,6,9,8,11,10,13,12,14,0x80,0x80},
-    {14,1,0,3,2,5,4,7,6,8,11,10,13,12,14,0x80,0x80},
-    {13,0,3,2,5,4,7,6,8,11,10,13,12,14,0x80,0x80,0x80},
-    {14,1,0,2,5,4,7,6,9,8,11,10,13,12,14,0x80,0x80},
-    {13,0,2,5,4,7,6,9,8,11,10,13,12,14,0x80,0x80,0x80},
-    {13,1,0,2,5,4,7,6,8,11,10,13,12,14,0x80,0x80,0x80},
-    {12,0,2,5,4,7,6,8,11,10,13,12,14,0x80,0x80,0x80,0x80},
-    {14,1,0,3,2,5,4,7,6,9,8,10,13,12,14,0x80,0x80},
-    {13,0,3,2,5,4,7,6,9,8,10,13,12,14,0x80,0x80,0x80},
-    {13,1,0,3,2,5,4,7,6,8,10,13,12,14,0x80,0x80,0x80},
-    {12,0,3,2,5,4,7,6,8,10,13,12,14,0x80,0x80,0x80,0x80},
-    {13,1,0,2,5,4,7,6,9,8,10,13,12,14,0x80,0x80,0x80},
-    {12,0,2,5,4,7,6,9,8,10,13,12,14,0x80,0x80,0x80,0x80},
-    {12,1,0,2,5,4,7,6,8,10,13,12,14,0x80,0x80,0x80,0x80},
-    {11,0,2,5,4,7,6,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {14,1,0,3,2,4,7,6,9,8,11,10,13,12,14,0x80,0x80},
-    {13,0,3,2,4,7,6,9,8,11,10,13,12,14,0x80,0x80,0x80},
-    {13,1,0,3,2,4,7,6,8,11,10,13,12,14,0x80,0x80,0x80},
-    {12,0,3,2,4,7,6,8,11,10,13,12,14,0x80,0x80,0x80,0x80},
-    {13,1,0,2,4,7,6,9,8,11,10,13,12,14,0x80,0x80,0x80},
-    {12,0,2,4,7,6,9,8,11,10,13,12,14,0x80,0x80,0x80,0x80},
-    {12,1,0,2,4,7,6,8,11,10,13,12,14,0x80,0x80,0x80,0x80},
-    {11,0,2,4,7,6,8,11,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {13,1,0,3,2,4,7,6,9,8,10,13,12,14,0x80,0x80,0x80},
-    {12,0,3,2,4,7,6,9,8,10,13,12,14,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,4,7,6,8,10,13,12,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,4,7,6,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,2,4,7,6,9,8,10,13,12,14,0x80,0x80,0x80,0x80},
-    {11,0,2,4,7,6,9,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,4,7,6,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,4,7,6,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {14,1,0,3,2,5,4,7,6,9,8,11,10,12,14,0x80,0x80},
-    {13,0,3,2,5,4,7,6,9,8,11,10,12,14,0x80,0x80,0x80},
-    {13,1,0,3,2,5,4,7,6,8,11,10,12,14,0x80,0x80,0x80},
-    {12,0,3,2,5,4,7,6,8,11,10,12,14,0x80,0x80,0x80,0x80},
-    {13,1,0,2,5,4,7,6,9,8,11,10,12,14,0x80,0x80,0x80},
-    {12,0,2,5,4,7,6,9,8,11,10,12,14,0x80,0x80,0x80,0x80},
-    {12,1,0,2,5,4,7,6,8,11,10,12,14,0x80,0x80,0x80,0x80},
-    {11,0,2,5,4,7,6,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {13,1,0,3,2,5,4,7,6,9,8,10,12,14,0x80,0x80,0x80},
-    {12,0,3,2,5,4,7,6,9,8,10,12,14,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,5,4,7,6,8,10,12,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,5,4,7,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,2,5,4,7,6,9,8,10,12,14,0x80,0x80,0x80,0x80},
-    {11,0,2,5,4,7,6,9,8,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,5,4,7,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,5,4,7,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {13,1,0,3,2,4,7,6,9,8,11,10,12,14,0x80,0x80,0x80},
-    {12,0,3,2,4,7,6,9,8,11,10,12,14,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,4,7,6,8,11,10,12,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,4,7,6,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,2,4,7,6,9,8,11,10,12,14,0x80,0x80,0x80,0x80},
-    {11,0,2,4,7,6,9,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,4,7,6,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,4,7,6,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,4,7,6,9,8,10,12,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,4,7,6,9,8,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,3,2,4,7,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,3,2,4,7,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,4,7,6,9,8,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,4,7,6,9,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,1,0,2,4,7,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,0,2,4,7,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {14,1,0,3,2,5,4,6,9,8,11,10,13,12,14,0x80,0x80},
-    {13,0,3,2,5,4,6,9,8,11,10,13,12,14,0x80,0x80,0x80},
-    {13,1,0,3,2,5,4,6,8,11,10,13,12,14,0x80,0x80,0x80},
-    {12,0,3,2,5,4,6,8,11,10,13,12,14,0x80,0x80,0x80,0x80},
-    {13,1,0,2,5,4,6,9,8,11,10,13,12,14,0x80,0x80,0x80},
-    {12,0,2,5,4,6,9,8,11,10,13,12,14,0x80,0x80,0x80,0x80},
-    {12,1,0,2,5,4,6,8,11,10,13,12,14,0x80,0x80,0x80,0x80},
-    {11,0,2,5,4,6,8,11,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {13,1,0,3,2,5,4,6,9,8,10,13,12,14,0x80,0x80,0x80},
-    {12,0,3,2,5,4,6,9,8,10,13,12,14,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,5,4,6,8,10,13,12,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,5,4,6,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,2,5,4,6,9,8,10,13,12,14,0x80,0x80,0x80,0x80},
-    {11,0,2,5,4,6,9,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,5,4,6,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,5,4,6,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {13,1,0,3,2,4,6,9,8,11,10,13,12,14,0x80,0x80,0x80},
-    {12,0,3,2,4,6,9,8,11,10,13,12,14,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,4,6,8,11,10,13,12,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,4,6,8,11,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,2,4,6,9,8,11,10,13,12,14,0x80,0x80,0x80,0x80},
-    {11,0,2,4,6,9,8,11,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,4,6,8,11,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,4,6,8,11,10,13,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,4,6,9,8,10,13,12,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,4,6,9,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,3,2,4,6,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,3,2,4,6,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,4,6,9,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,4,6,9,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,1,0,2,4,6,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,0,2,4,6,8,10,13,12,14,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {13,1,0,3,2,5,4,6,9,8,11,10,12,14,0x80,0x80,0x80},
-    {12,0,3,2,5,4,6,9,8,11,10,12,14,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,5,4,6,8,11,10,12,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,5,4,6,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,2,5,4,6,9,8,11,10,12,14,0x80,0x80,0x80,0x80},
-    {11,0,2,5,4,6,9,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,5,4,6,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,5,4,6,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,5,4,6,9,8,10,12,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,5,4,6,9,8,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,3,2,5,4,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,3,2,5,4,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,5,4,6,9,8,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,5,4,6,9,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,1,0,2,5,4,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,0,2,5,4,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {12,1,0,3,2,4,6,9,8,11,10,12,14,0x80,0x80,0x80,0x80},
-    {11,0,3,2,4,6,9,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,3,2,4,6,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,3,2,4,6,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,2,4,6,9,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,2,4,6,9,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,1,0,2,4,6,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,0,2,4,6,8,11,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {11,1,0,3,2,4,6,9,8,10,12,14,0x80,0x80,0x80,0x80,0x80},
-    {10,0,3,2,4,6,9,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,1,0,3,2,4,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,0,3,2,4,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,1,0,2,4,6,9,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,0,2,4,6,9,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,1,0,2,4,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,0,2,4,6,8,10,12,14,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80}
-  };
-
-  // 1 byte for length, 16 bytes for mask
-  const uint8_t pack_1_2_3_utf8_bytes[256][17] = {
-    {12,2,3,1,6,7,5,10,11,9,14,15,13,0x80,0x80,0x80,0x80},
-    {9,6,7,5,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {11,3,1,6,7,5,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80},
-    {10,0,6,7,5,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,2,3,1,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,3,1,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,0,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {11,2,3,1,7,5,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80},
-    {8,7,5,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,3,1,7,5,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,0,7,5,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,2,3,1,4,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,4,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,3,1,4,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,0,4,10,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,2,3,1,6,7,5,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,6,7,5,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,3,1,6,7,5,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,0,6,7,5,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,2,3,1,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,3,1,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,0,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,2,3,1,7,5,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,7,5,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,3,1,7,5,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,0,7,5,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,2,3,1,4,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,4,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,3,1,4,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,0,4,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {11,2,3,1,6,7,5,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80},
-    {8,6,7,5,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,3,1,6,7,5,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,0,6,7,5,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,2,3,1,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,3,1,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,0,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,2,3,1,7,5,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,7,5,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,3,1,7,5,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,0,7,5,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,2,3,1,4,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,4,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,3,1,4,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,0,4,11,9,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,2,3,1,6,7,5,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,6,7,5,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,3,1,6,7,5,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,0,6,7,5,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,2,3,1,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,3,1,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,0,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,2,3,1,7,5,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,7,5,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,3,1,7,5,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,0,7,5,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,2,3,1,4,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,4,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,3,1,4,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,0,4,8,14,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,2,3,1,6,7,5,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,6,7,5,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,3,1,6,7,5,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,0,6,7,5,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,2,3,1,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,3,1,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,0,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,2,3,1,7,5,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,7,5,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,3,1,7,5,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,0,7,5,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,2,3,1,4,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,4,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,3,1,4,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,0,4,10,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,2,3,1,6,7,5,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,6,7,5,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,3,1,6,7,5,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,0,6,7,5,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,2,3,1,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {0,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {2,3,1,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {1,0,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,2,3,1,7,5,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {2,7,5,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,3,1,7,5,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,0,7,5,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,2,3,1,4,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {1,4,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,3,1,4,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {2,0,4,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,2,3,1,6,7,5,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,6,7,5,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,3,1,6,7,5,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,0,6,7,5,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,2,3,1,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {2,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,3,1,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,0,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,2,3,1,7,5,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,7,5,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,3,1,7,5,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,0,7,5,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,2,3,1,4,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,4,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,3,1,4,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,0,4,11,9,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,2,3,1,6,7,5,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,6,7,5,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,3,1,6,7,5,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,0,6,7,5,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,2,3,1,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {1,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,3,1,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {2,0,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,2,3,1,7,5,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,7,5,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,3,1,7,5,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,0,7,5,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,2,3,1,4,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {2,4,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,3,1,4,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,0,4,8,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {11,2,3,1,6,7,5,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80},
-    {8,6,7,5,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,3,1,6,7,5,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,0,6,7,5,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,2,3,1,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,3,1,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,0,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,2,3,1,7,5,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,7,5,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,3,1,7,5,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,0,7,5,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,2,3,1,4,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,4,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,3,1,4,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,0,4,10,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,2,3,1,6,7,5,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,6,7,5,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,3,1,6,7,5,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,0,6,7,5,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,2,3,1,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {2,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,3,1,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,0,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,2,3,1,7,5,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,7,5,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,3,1,7,5,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,0,7,5,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,2,3,1,4,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,4,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,3,1,4,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,0,4,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,2,3,1,6,7,5,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,6,7,5,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,3,1,6,7,5,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,0,6,7,5,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,2,3,1,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,3,1,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,0,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,2,3,1,7,5,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,7,5,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,3,1,7,5,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,0,7,5,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,2,3,1,4,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,4,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,3,1,4,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,0,4,11,9,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,2,3,1,6,7,5,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,6,7,5,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,3,1,6,7,5,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,0,6,7,5,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,2,3,1,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,3,1,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,0,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,2,3,1,7,5,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,7,5,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,3,1,7,5,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,0,7,5,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,2,3,1,4,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,4,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,3,1,4,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,0,4,8,15,13,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {10,2,3,1,6,7,5,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,6,7,5,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,3,1,6,7,5,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,0,6,7,5,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,2,3,1,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,3,1,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,0,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,2,3,1,7,5,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,7,5,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,3,1,7,5,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,0,7,5,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,2,3,1,4,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,4,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,3,1,4,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,0,4,10,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,2,3,1,6,7,5,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,6,7,5,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,3,1,6,7,5,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,0,6,7,5,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,2,3,1,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {1,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,3,1,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {2,0,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,2,3,1,7,5,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,7,5,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,3,1,7,5,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,0,7,5,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,2,3,1,4,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {2,4,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,3,1,4,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,0,4,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {9,2,3,1,6,7,5,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,6,7,5,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,3,1,6,7,5,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,0,6,7,5,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,2,3,1,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,3,1,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,0,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,2,3,1,7,5,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,7,5,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,3,1,7,5,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,0,7,5,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,2,3,1,4,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,4,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,3,1,4,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,0,4,11,9,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {8,2,3,1,6,7,5,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,6,7,5,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,3,1,6,7,5,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,0,6,7,5,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,2,3,1,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {2,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,3,1,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,0,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {7,2,3,1,7,5,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,7,5,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,3,1,7,5,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,0,7,5,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {6,2,3,1,4,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {3,4,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {5,3,1,4,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80},
-    {4,0,4,8,12,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80,0x80}
-  };
-
-} // utf16_to_utf8 namespace
-} // tables namespace
+// 1 byte for length, 16 bytes for mask
+const uint8_t pack_1_2_utf8_bytes[256][17] = {
+    {16, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14},
+    {15, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80},
+    {15, 1, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 15, 14, 0x80},
+    {14, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {15, 1, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80},
+    {14, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {14, 1, 0, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {15, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 15, 14, 0x80},
+    {14, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 7, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 7, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {14, 1, 0, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 7, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 7, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {15, 1, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80},
+    {14, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 4, 7, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 4, 7, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {14, 1, 0, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 4, 7, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 4, 7, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 4, 7, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 4, 7, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 7, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 7, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 4, 7, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 4, 7, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 7, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 7, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {15, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 15, 14, 0x80},
+    {14, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {14, 1, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 7, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 7, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 7, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 7, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 7, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 7, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 7, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 7, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 7, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 7, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 4, 7, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 4, 7, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 7, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 7, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 7, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 7, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 7, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 7, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 7, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 7, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 7, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 7, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {15, 1, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80},
+    {14, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {14, 1, 0, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 4, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 4, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 4, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 4, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 6, 8, 11, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 6, 9, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 6, 8, 10, 13, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 5, 4, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 5, 4, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 5, 4, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 5, 4, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 6, 9, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 6, 8, 11, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 3, 2, 4, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 3, 2, 4, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 6, 9, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 1, 0, 2, 4, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 2, 4, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {15, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80},
+    {14, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {14, 1, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
+    {13, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 5, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 5, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 5, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 5, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 3, 2, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 3, 2, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 1, 0, 2, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 2, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 5, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 5, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 5, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 5, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 3, 2, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 3, 2, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 1, 0, 2, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 2, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {13, 1, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 5, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 5, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 5, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 5, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 5, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 5, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 3, 2, 5, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 3, 2, 5, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 5, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 5, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 1, 0, 2, 5, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 2, 5, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 3, 2, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 3, 2, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 1, 0, 2, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 2, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {11, 1, 0, 3, 2, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 3, 2, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 1, 0, 3, 2, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 3, 2, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 1, 0, 2, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 2, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 1, 0, 2, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 0, 2, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80}};
+
+// 1 byte for length, 16 bytes for mask
+const uint8_t pack_1_2_3_utf8_bytes[256][17] = {
+    {12, 2, 3, 1, 6, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80},
+    {9, 6, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {11, 3, 1, 6, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 6, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 2, 3, 1, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {11, 2, 3, 1, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {10, 3, 1, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {10, 2, 3, 1, 4, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {7, 4, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {9, 3, 1, 4, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 0, 4, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 2, 3, 1, 6, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 6, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 6, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 6, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 2, 3, 1, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {8, 2, 3, 1, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 2, 3, 1, 4, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 4, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 4, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 4, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {11, 2, 3, 1, 6, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 6, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {10, 3, 1, 6, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 6, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 2, 3, 1, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {10, 2, 3, 1, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {7, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 3, 1, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 0, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 2, 3, 1, 4, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 4, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 4, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 4, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {10, 2, 3, 1, 6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {7, 6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 3, 1, 6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 0, 6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 2, 3, 1, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {9, 2, 3, 1, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 2, 3, 1, 4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {9, 2, 3, 1, 6, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 6, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 6, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 6, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 2, 3, 1, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {8, 2, 3, 1, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 2, 3, 1, 4, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 4, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 4, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 4, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 2, 3, 1, 6, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {3, 6, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 6, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 6, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 2, 3, 1, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {0, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80, 0x80},
+    {2, 3, 1, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {1, 0, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80, 0x80},
+    {5, 2, 3, 1, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {2, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {4, 3, 1, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 0, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {4, 2, 3, 1, 4, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {1, 4, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80, 0x80},
+    {3, 3, 1, 4, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {2, 0, 4, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {8, 2, 3, 1, 6, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 6, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 6, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 6, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 2, 3, 1, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {2, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {4, 3, 1, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 0, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {7, 2, 3, 1, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 3, 1, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 0, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 2, 3, 1, 4, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {3, 4, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 4, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 4, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 2, 3, 1, 6, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 6, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 3, 1, 6, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 0, 6, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 2, 3, 1, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {1, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80, 0x80},
+    {3, 3, 1, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {2, 0, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 2, 3, 1, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {3, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 2, 3, 1, 4, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {2, 4, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {4, 3, 1, 4, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 0, 4, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {11, 2, 3, 1, 6, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 6, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {10, 3, 1, 6, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 6, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 2, 3, 1, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {10, 2, 3, 1, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {7, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 3, 1, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 0, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 2, 3, 1, 4, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 4, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 4, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 4, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 2, 3, 1, 6, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 6, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 6, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 6, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 2, 3, 1, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {2, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80, 0x80},
+    {4, 3, 1, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {3, 0, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {7, 2, 3, 1, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 2, 3, 1, 4, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 4, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 4, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 4, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {10, 2, 3, 1, 6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {7, 6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 3, 1, 6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 0, 6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 2, 3, 1, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {9, 2, 3, 1, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 2, 3, 1, 4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {9, 2, 3, 1, 6, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 6, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 6, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 6, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 2, 3, 1, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {8, 2, 3, 1, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 2, 3, 1, 4, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 4, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 4, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 4, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {10, 2, 3, 1, 6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {7, 6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 3, 1, 6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 0, 6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 2, 3, 1, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {9, 2, 3, 1, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 2, 3, 1, 4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 2, 3, 1, 6, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 6, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 3, 1, 6, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 0, 6, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 2, 3, 1, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {1, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80, 0x80},
+    {3, 3, 1, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {2, 0, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 2, 3, 1, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {3, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 2, 3, 1, 4, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {2, 4, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {4, 3, 1, 4, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 0, 4, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {9, 2, 3, 1, 6, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 6, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 6, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 6, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 2, 3, 1, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {8, 2, 3, 1, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 2, 3, 1, 4, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 4, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 4, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 4, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 2, 3, 1, 6, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 6, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 6, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 6, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 2, 3, 1, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {2, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {4, 3, 1, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 0, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {7, 2, 3, 1, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 3, 1, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 0, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 2, 3, 1, 4, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {3, 4, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 4, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 4, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80}};
+
+} // namespace utf16_to_utf8
+} // namespace tables
 } // unnamed namespace
 } // namespace simdutf
 
@@ -12371,49 +11223,58 @@ namespace ascii {
 #if SIMDUTF_IMPLEMENTATION_FALLBACK
 // Only used by the fallback kernel.
 inline simdutf_warn_unused bool validate(const char *buf, size_t len) noexcept {
-    const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-    uint64_t pos = 0;
-    // process in blocks of 16 bytes when possible
-    for (;pos + 16 <= len; pos += 16) {
-        uint64_t v1;
-        std::memcpy(&v1, data + pos, sizeof(uint64_t));
-        uint64_t v2;
-        std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-        uint64_t v{v1 | v2};
-        if ((v & 0x8080808080808080) != 0) { return false; }
-    }
-    // process the tail byte-by-byte
-    for (;pos < len; pos ++) {
-        if (data[pos] >= 0b10000000) { return false; }
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  uint64_t pos = 0;
+  // process in blocks of 16 bytes when possible
+  for (; pos + 16 <= len; pos += 16) {
+    uint64_t v1;
+    std::memcpy(&v1, data + pos, sizeof(uint64_t));
+    uint64_t v2;
+    std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+    uint64_t v{v1 | v2};
+    if ((v & 0x8080808080808080) != 0) {
+      return false;
     }
-    return true;
+  }
+  // process the tail byte-by-byte
+  for (; pos < len; pos++) {
+    if (data[pos] >= 0b10000000) {
+      return false;
+    }
+  }
+  return true;
 }
 #endif
 
-inline simdutf_warn_unused result validate_with_errors(const char *buf, size_t len) noexcept {
-    const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-    size_t pos = 0;
-    // process in blocks of 16 bytes when possible
-    for (;pos + 16 <= len; pos += 16) {
-        uint64_t v1;
-        std::memcpy(&v1, data + pos, sizeof(uint64_t));
-        uint64_t v2;
-        std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-        uint64_t v{v1 | v2};
-        if ((v & 0x8080808080808080) != 0) {
-            for (;pos < len; pos ++) {
-                if (data[pos] >= 0b10000000) { return result(error_code::TOO_LARGE, pos); }
-            }
+inline simdutf_warn_unused result validate_with_errors(const char *buf,
+                                                       size_t len) noexcept {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  // process in blocks of 16 bytes when possible
+  for (; pos + 16 <= len; pos += 16) {
+    uint64_t v1;
+    std::memcpy(&v1, data + pos, sizeof(uint64_t));
+    uint64_t v2;
+    std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+    uint64_t v{v1 | v2};
+    if ((v & 0x8080808080808080) != 0) {
+      for (; pos < len; pos++) {
+        if (data[pos] >= 0b10000000) {
+          return result(error_code::TOO_LARGE, pos);
         }
+      }
     }
-    // process the tail byte-by-byte
-    for (;pos < len; pos ++) {
-        if (data[pos] >= 0b10000000) { return result(error_code::TOO_LARGE, pos); }
+  }
+  // process the tail byte-by-byte
+  for (; pos < len; pos++) {
+    if (data[pos] >= 0b10000000) {
+      return result(error_code::TOO_LARGE, pos);
     }
-    return result(error_code::SUCCESS, pos);
+  }
+  return result(error_code::SUCCESS, pos);
 }
 
-} // ascii namespace
+} // namespace ascii
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -12435,19 +11296,19 @@ inline size_t utf32_length_from_latin1(size_t len) {
 }
 
 inline size_t utf8_length_from_latin1(const char *buf, size_t len) {
-  const uint8_t * c = reinterpret_cast<const uint8_t *>(buf);
+  const uint8_t *c = reinterpret_cast<const uint8_t *>(buf);
   size_t answer = 0;
-  for(size_t i = 0; i<len; i++) {
-    if((c[i]>>7)) { answer++; }
+  for (size_t i = 0; i < len; i++) {
+    if ((c[i] >> 7)) {
+      answer++;
+    }
   }
   return answer + len;
 }
 
-inline size_t utf16_length_from_latin1(size_t len) {
-  return len;
-}
+inline size_t utf16_length_from_latin1(size_t len) { return len; }
 
-} // utf32 namespace
+} // namespace latin1
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -12466,55 +11327,57 @@ namespace utf32_to_utf8 {
 
 #if SIMDUTF_IMPLEMENTATION_FALLBACK || SIMDUTF_IMPLEMENTATION_PPC64
 // only used by the fallback and POWER kernel
-inline size_t convert_valid(const char32_t* buf, size_t len, char* utf8_output) {
-	const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+inline size_t convert_valid(const char32_t *buf, size_t len,
+                            char *utf8_output) {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
   size_t pos = 0;
-  char* start{utf8_output};
+  char *start{utf8_output};
   while (pos < len) {
     // try to convert the next block of 2 ASCII characters
-    if (pos + 2 <= len) { // if it is safe to read 8 more bytes, check that they are ascii
+    if (pos + 2 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
       uint64_t v;
       ::memcpy(&v, data + pos, sizeof(uint64_t));
       if ((v & 0xFFFFFF80FFFFFF80) == 0) {
         *utf8_output++ = char(buf[pos]);
-				*utf8_output++ = char(buf[pos+1]);
+        *utf8_output++ = char(buf[pos + 1]);
         pos += 2;
         continue;
       }
     }
     uint32_t word = data[pos];
-    if((word & 0xFFFFFF80)==0) {
+    if ((word & 0xFFFFFF80) == 0) {
       // will generate one UTF-8 bytes
       *utf8_output++ = char(word);
       pos++;
-    } else if((word & 0xFFFFF800)==0) {
+    } else if ((word & 0xFFFFF800) == 0) {
       // will generate two UTF-8 bytes
       // we have 0b110XXXXX 0b10XXXXXX
-      *utf8_output++ = char((word>>6) | 0b11000000);
+      *utf8_output++ = char((word >> 6) | 0b11000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
       pos++;
-    } else if((word & 0xFFFF0000)==0) {
+    } else if ((word & 0xFFFF0000) == 0) {
       // will generate three UTF-8 bytes
       // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((word>>12) | 0b11100000);
-      *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
       pos++;
     } else {
       // will generate four UTF-8 bytes
       // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((word>>18) | 0b11110000);
-      *utf8_output++ = char(((word>>12) & 0b111111) | 0b10000000);
-      *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word >> 18) | 0b11110000);
+      *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos ++;
+      pos++;
     }
   }
   return utf8_output - start;
 }
 #endif // SIMDUTF_IMPLEMENTATION_FALLBACK || SIMDUTF_IMPLEMENTATION_PPC64
 
-} // utf32_to_utf8 namespace
+} // namespace utf32_to_utf8
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -12530,105 +11393,116 @@ namespace scalar {
 namespace {
 namespace utf32_to_utf8 {
 
-inline size_t convert(const char32_t* buf, size_t len, char* utf8_output) {
+inline size_t convert(const char32_t *buf, size_t len, char *utf8_output) {
   const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
   size_t pos = 0;
-  char* start{utf8_output};
+  char *start{utf8_output};
   while (pos < len) {
     // try to convert the next block of 2 ASCII characters
-    if (pos + 2 <= len) { // if it is safe to read 8 more bytes, check that they are ascii
+    if (pos + 2 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
       uint64_t v;
       ::memcpy(&v, data + pos, sizeof(uint64_t));
       if ((v & 0xFFFFFF80FFFFFF80) == 0) {
         *utf8_output++ = char(buf[pos]);
-				*utf8_output++ = char(buf[pos+1]);
+        *utf8_output++ = char(buf[pos + 1]);
         pos += 2;
         continue;
       }
     }
     uint32_t word = data[pos];
-    if((word & 0xFFFFFF80)==0) {
+    if ((word & 0xFFFFFF80) == 0) {
       // will generate one UTF-8 bytes
       *utf8_output++ = char(word);
       pos++;
-    } else if((word & 0xFFFFF800)==0) {
+    } else if ((word & 0xFFFFF800) == 0) {
       // will generate two UTF-8 bytes
       // we have 0b110XXXXX 0b10XXXXXX
-      *utf8_output++ = char((word>>6) | 0b11000000);
+      *utf8_output++ = char((word >> 6) | 0b11000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
       pos++;
-    } else if((word & 0xFFFF0000)==0) {
+    } else if ((word & 0xFFFF0000) == 0) {
       // will generate three UTF-8 bytes
       // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
-			if (word >= 0xD800 && word <= 0xDFFF) { return 0; }
-      *utf8_output++ = char((word>>12) | 0b11100000);
-      *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+      if (word >= 0xD800 && word <= 0xDFFF) {
+        return 0;
+      }
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
       pos++;
     } else {
       // will generate four UTF-8 bytes
       // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-			if (word > 0x10FFFF) { return 0; }
-      *utf8_output++ = char((word>>18) | 0b11110000);
-      *utf8_output++ = char(((word>>12) & 0b111111) | 0b10000000);
-      *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+      if (word > 0x10FFFF) {
+        return 0;
+      }
+      *utf8_output++ = char((word >> 18) | 0b11110000);
+      *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos ++;
+      pos++;
     }
   }
   return utf8_output - start;
 }
 
-inline result convert_with_errors(const char32_t* buf, size_t len, char* utf8_output) {
+inline result convert_with_errors(const char32_t *buf, size_t len,
+                                  char *utf8_output) {
   const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
   size_t pos = 0;
-  char* start{utf8_output};
+  char *start{utf8_output};
   while (pos < len) {
     // try to convert the next block of 2 ASCII characters
-    if (pos + 2 <= len) { // if it is safe to read 8 more bytes, check that they are ascii
+    if (pos + 2 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
       uint64_t v;
       ::memcpy(&v, data + pos, sizeof(uint64_t));
       if ((v & 0xFFFFFF80FFFFFF80) == 0) {
         *utf8_output++ = char(buf[pos]);
-				*utf8_output++ = char(buf[pos+1]);
+        *utf8_output++ = char(buf[pos + 1]);
         pos += 2;
         continue;
       }
     }
     uint32_t word = data[pos];
-    if((word & 0xFFFFFF80)==0) {
+    if ((word & 0xFFFFFF80) == 0) {
       // will generate one UTF-8 bytes
       *utf8_output++ = char(word);
       pos++;
-    } else if((word & 0xFFFFF800)==0) {
+    } else if ((word & 0xFFFFF800) == 0) {
       // will generate two UTF-8 bytes
       // we have 0b110XXXXX 0b10XXXXXX
-      *utf8_output++ = char((word>>6) | 0b11000000);
+      *utf8_output++ = char((word >> 6) | 0b11000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
       pos++;
-    } else if((word & 0xFFFF0000)==0) {
+    } else if ((word & 0xFFFF0000) == 0) {
       // will generate three UTF-8 bytes
       // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
-			if (word >= 0xD800 && word <= 0xDFFF) { return result(error_code::SURROGATE, pos); }
-      *utf8_output++ = char((word>>12) | 0b11100000);
-      *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+      if (word >= 0xD800 && word <= 0xDFFF) {
+        return result(error_code::SURROGATE, pos);
+      }
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
       pos++;
     } else {
       // will generate four UTF-8 bytes
       // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-			if (word > 0x10FFFF) { return result(error_code::TOO_LARGE, pos); }
-      *utf8_output++ = char((word>>18) | 0b11110000);
-      *utf8_output++ = char(((word>>12) & 0b111111) | 0b10000000);
-      *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+      if (word > 0x10FFFF) {
+        return result(error_code::TOO_LARGE, pos);
+      }
+      *utf8_output++ = char((word >> 18) | 0b11110000);
+      *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos ++;
+      pos++;
     }
   }
   return result(error_code::SUCCESS, utf8_output - start);
 }
 
-} // utf32_to_utf8 namespace
+} // namespace utf32_to_utf8
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -12646,15 +11520,18 @@ namespace {
 namespace utf32_to_utf16 {
 
 template <endianness big_endian>
-inline size_t convert_valid(const char32_t* buf, size_t len, char16_t* utf16_output) {
+inline size_t convert_valid(const char32_t *buf, size_t len,
+                            char16_t *utf16_output) {
   const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
   size_t pos = 0;
-  char16_t* start{utf16_output};
+  char16_t *start{utf16_output};
   while (pos < len) {
     uint32_t word = data[pos];
-    if((word & 0xFFFF0000)==0) {
+    if ((word & 0xFFFF0000) == 0) {
       // will not generate a surrogate pair
-      *utf16_output++ = !match_system(big_endian) ? char16_t(utf16::swap_bytes(uint16_t(word))) : char16_t(word);
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(utf16::swap_bytes(uint16_t(word)))
+                            : char16_t(word);
       pos++;
     } else {
       // will generate a surrogate pair
@@ -12673,7 +11550,7 @@ inline size_t convert_valid(const char32_t* buf, size_t len, char16_t* utf16_out
   return utf16_output - start;
 }
 
-} // utf32_to_utf16 namespace
+} // namespace utf32_to_utf16
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -12690,19 +11567,25 @@ namespace {
 namespace utf32_to_utf16 {
 
 template <endianness big_endian>
-inline size_t convert(const char32_t* buf, size_t len, char16_t* utf16_output) {
+inline size_t convert(const char32_t *buf, size_t len, char16_t *utf16_output) {
   const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
   size_t pos = 0;
-  char16_t* start{utf16_output};
+  char16_t *start{utf16_output};
   while (pos < len) {
     uint32_t word = data[pos];
-    if((word & 0xFFFF0000)==0) {
-      if (word >= 0xD800 && word <= 0xDFFF) { return 0; }
+    if ((word & 0xFFFF0000) == 0) {
+      if (word >= 0xD800 && word <= 0xDFFF) {
+        return 0;
+      }
       // will not generate a surrogate pair
-      *utf16_output++ = !match_system(big_endian) ? char16_t(utf16::swap_bytes(uint16_t(word))) : char16_t(word);
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(utf16::swap_bytes(uint16_t(word)))
+                            : char16_t(word);
     } else {
       // will generate a surrogate pair
-      if (word > 0x10FFFF) { return 0; }
+      if (word > 0x10FFFF) {
+        return 0;
+      }
       word -= 0x10000;
       uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
       uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
@@ -12719,19 +11602,26 @@ inline size_t convert(const char32_t* buf, size_t len, char16_t* utf16_output) {
 }
 
 template <endianness big_endian>
-inline result convert_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) {
+inline result convert_with_errors(const char32_t *buf, size_t len,
+                                  char16_t *utf16_output) {
   const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
   size_t pos = 0;
-  char16_t* start{utf16_output};
+  char16_t *start{utf16_output};
   while (pos < len) {
     uint32_t word = data[pos];
-    if((word & 0xFFFF0000)==0) {
-      if (word >= 0xD800 && word <= 0xDFFF) { return result(error_code::SURROGATE, pos); }
+    if ((word & 0xFFFF0000) == 0) {
+      if (word >= 0xD800 && word <= 0xDFFF) {
+        return result(error_code::SURROGATE, pos);
+      }
       // will not generate a surrogate pair
-      *utf16_output++ = !match_system(big_endian) ? char16_t(utf16::swap_bytes(uint16_t(word))) : char16_t(word);
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(utf16::swap_bytes(uint16_t(word)))
+                            : char16_t(word);
     } else {
       // will generate a surrogate pair
-      if (word > 0x10FFFF) { return result(error_code::TOO_LARGE, pos); }
+      if (word > 0x10FFFF) {
+        return result(error_code::TOO_LARGE, pos);
+      }
       word -= 0x10000;
       uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
       uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
@@ -12747,7 +11637,7 @@ inline result convert_with_errors(const char32_t* buf, size_t len, char16_t* utf
   return result(error_code::SUCCESS, utf16_output - start);
 }
 
-} // utf32_to_utf16 namespace
+} // namespace utf32_to_utf16
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -12765,56 +11655,67 @@ namespace {
 namespace utf16_to_utf8 {
 
 template <endianness big_endian>
-inline size_t convert_valid(const char16_t* buf, size_t len, char* utf8_output) {
- const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+inline size_t convert_valid(const char16_t *buf, size_t len,
+                            char *utf8_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
   size_t pos = 0;
-  char* start{utf8_output};
+  char *start{utf8_output};
   while (pos < len) {
     // try to convert the next block of 4 ASCII characters
-    if (pos + 4 <= len) { // if it is safe to read 8 more bytes, check that they are ascii
+    if (pos + 4 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
       uint64_t v;
       ::memcpy(&v, data + pos, sizeof(uint64_t));
-      if (!match_system(big_endian)) { v = (v >> 8) | (v << (64 - 8)); }
+      if (!match_system(big_endian)) {
+        v = (v >> 8) | (v << (64 - 8));
+      }
       if ((v & 0xFF80FF80FF80FF80) == 0) {
         size_t final_pos = pos + 4;
-        while(pos < final_pos) {
-          *utf8_output++ = !match_system(big_endian) ? char(utf16::swap_bytes(buf[pos])) : char(buf[pos]);
+        while (pos < final_pos) {
+          *utf8_output++ = !match_system(big_endian)
+                               ? char(utf16::swap_bytes(buf[pos]))
+                               : char(buf[pos]);
           pos++;
         }
         continue;
       }
     }
 
-    uint16_t word = !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if((word & 0xFF80)==0) {
+    uint16_t word =
+        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xFF80) == 0) {
       // will generate one UTF-8 bytes
       *utf8_output++ = char(word);
       pos++;
-    } else if((word & 0xF800)==0) {
+    } else if ((word & 0xF800) == 0) {
       // will generate two UTF-8 bytes
       // we have 0b110XXXXX 0b10XXXXXX
-      *utf8_output++ = char((word>>6) | 0b11000000);
+      *utf8_output++ = char((word >> 6) | 0b11000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
       pos++;
-    } else if((word &0xF800 ) != 0xD800) {
+    } else if ((word & 0xF800) != 0xD800) {
       // will generate three UTF-8 bytes
       // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((word>>12) | 0b11100000);
-      *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
       pos++;
     } else {
       // must be a surrogate pair
       uint16_t diff = uint16_t(word - 0xD800);
-      if(pos + 1 >= len) { return 0; } // minimal bound checking
-      uint16_t next_word = !match_system(big_endian) ? utf16::swap_bytes(data[pos + 1]) : data[pos + 1];
+      if (pos + 1 >= len) {
+        return 0;
+      } // minimal bound checking
+      uint16_t next_word = !match_system(big_endian)
+                               ? utf16::swap_bytes(data[pos + 1])
+                               : data[pos + 1];
       uint16_t diff2 = uint16_t(next_word - 0xDC00);
       uint32_t value = (diff << 10) + diff2 + 0x10000;
       // will generate four UTF-8 bytes
       // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((value>>18) | 0b11110000);
-      *utf8_output++ = char(((value>>12) & 0b111111) | 0b10000000);
-      *utf8_output++ = char(((value>>6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((value >> 18) | 0b11110000);
+      *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
       *utf8_output++ = char((value & 0b111111) | 0b10000000);
       pos += 2;
     }
@@ -12822,7 +11723,7 @@ inline size_t convert_valid(const char16_t* buf, size_t len, char* utf8_output)
   return utf8_output - start;
 }
 
-} // utf16_to_utf8 namespace
+} // namespace utf16_to_utf8
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -12839,57 +11740,71 @@ namespace {
 namespace utf16_to_utf8 {
 
 template <endianness big_endian>
-inline size_t convert(const char16_t* buf, size_t len, char* utf8_output) {
- const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+inline size_t convert(const char16_t *buf, size_t len, char *utf8_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
   size_t pos = 0;
-  char* start{utf8_output};
+  char *start{utf8_output};
   while (pos < len) {
     // try to convert the next block of 8 bytes
-    if (pos + 4 <= len) { // if it is safe to read 8 more bytes, check that they are ascii
+    if (pos + 4 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
       uint64_t v;
       ::memcpy(&v, data + pos, sizeof(uint64_t));
-      if (!match_system(big_endian)) { v = (v >> 8) | (v << (64 - 8)); }
+      if (!match_system(big_endian)) {
+        v = (v >> 8) | (v << (64 - 8));
+      }
       if ((v & 0xFF80FF80FF80FF80) == 0) {
         size_t final_pos = pos + 4;
-        while(pos < final_pos) {
-          *utf8_output++ = !match_system(big_endian) ? char(utf16::swap_bytes(buf[pos])) : char(buf[pos]);
+        while (pos < final_pos) {
+          *utf8_output++ = !match_system(big_endian)
+                               ? char(utf16::swap_bytes(buf[pos]))
+                               : char(buf[pos]);
           pos++;
         }
         continue;
       }
     }
-    uint16_t word = !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if((word & 0xFF80)==0) {
+    uint16_t word =
+        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xFF80) == 0) {
       // will generate one UTF-8 bytes
       *utf8_output++ = char(word);
       pos++;
-    } else if((word & 0xF800)==0) {
+    } else if ((word & 0xF800) == 0) {
       // will generate two UTF-8 bytes
       // we have 0b110XXXXX 0b10XXXXXX
-      *utf8_output++ = char((word>>6) | 0b11000000);
+      *utf8_output++ = char((word >> 6) | 0b11000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
       pos++;
-    } else if((word &0xF800 ) != 0xD800) {
+    } else if ((word & 0xF800) != 0xD800) {
       // will generate three UTF-8 bytes
       // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((word>>12) | 0b11100000);
-      *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
       pos++;
     } else {
       // must be a surrogate pair
-      if(pos + 1 >= len) { return 0; }
+      if (pos + 1 >= len) {
+        return 0;
+      }
       uint16_t diff = uint16_t(word - 0xD800);
-      if(diff > 0x3FF) { return 0; }
-      uint16_t next_word = !match_system(big_endian) ? utf16::swap_bytes(data[pos + 1]) : data[pos + 1];
+      if (diff > 0x3FF) {
+        return 0;
+      }
+      uint16_t next_word = !match_system(big_endian)
+                               ? utf16::swap_bytes(data[pos + 1])
+                               : data[pos + 1];
       uint16_t diff2 = uint16_t(next_word - 0xDC00);
-      if(diff2 > 0x3FF) { return 0; }
+      if (diff2 > 0x3FF) {
+        return 0;
+      }
       uint32_t value = (diff << 10) + diff2 + 0x10000;
       // will generate four UTF-8 bytes
       // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((value>>18) | 0b11110000);
-      *utf8_output++ = char(((value>>12) & 0b111111) | 0b10000000);
-      *utf8_output++ = char(((value>>6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((value >> 18) | 0b11110000);
+      *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
       *utf8_output++ = char((value & 0b111111) | 0b10000000);
       pos += 2;
     }
@@ -12898,57 +11813,71 @@ inline size_t convert(const char16_t* buf, size_t len, char* utf8_output) {
 }
 
 template <endianness big_endian>
-inline result convert_with_errors(const char16_t* buf, size_t len, char* utf8_output) {
- const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+inline result convert_with_errors(const char16_t *buf, size_t len,
+                                  char *utf8_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
   size_t pos = 0;
-  char* start{utf8_output};
+  char *start{utf8_output};
   while (pos < len) {
     // try to convert the next block of 8 bytes
-    if (pos + 4 <= len) { // if it is safe to read 8 more bytes, check that they are ascii
+    if (pos + 4 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
       uint64_t v;
       ::memcpy(&v, data + pos, sizeof(uint64_t));
-      if (!match_system(big_endian)) v = (v >> 8) | (v << (64 - 8));
+      if (!match_system(big_endian))
+        v = (v >> 8) | (v << (64 - 8));
       if ((v & 0xFF80FF80FF80FF80) == 0) {
         size_t final_pos = pos + 4;
-        while(pos < final_pos) {
-          *utf8_output++ = !match_system(big_endian) ? char(utf16::swap_bytes(buf[pos])) : char(buf[pos]);
+        while (pos < final_pos) {
+          *utf8_output++ = !match_system(big_endian)
+                               ? char(utf16::swap_bytes(buf[pos]))
+                               : char(buf[pos]);
           pos++;
         }
         continue;
       }
     }
-    uint16_t word = !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if((word & 0xFF80)==0) {
+    uint16_t word =
+        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xFF80) == 0) {
       // will generate one UTF-8 bytes
       *utf8_output++ = char(word);
       pos++;
-    } else if((word & 0xF800)==0) {
+    } else if ((word & 0xF800) == 0) {
       // will generate two UTF-8 bytes
       // we have 0b110XXXXX 0b10XXXXXX
-      *utf8_output++ = char((word>>6) | 0b11000000);
+      *utf8_output++ = char((word >> 6) | 0b11000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
       pos++;
-    } else if((word &0xF800 ) != 0xD800) {
+    } else if ((word & 0xF800) != 0xD800) {
       // will generate three UTF-8 bytes
       // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((word>>12) | 0b11100000);
-      *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
       *utf8_output++ = char((word & 0b111111) | 0b10000000);
       pos++;
     } else {
       // must be a surrogate pair
-      if(pos + 1 >= len) { return result(error_code::SURROGATE, pos); }
+      if (pos + 1 >= len) {
+        return result(error_code::SURROGATE, pos);
+      }
       uint16_t diff = uint16_t(word - 0xD800);
-      if(diff > 0x3FF) { return result(error_code::SURROGATE, pos); }
-      uint16_t next_word = !match_system(big_endian) ? utf16::swap_bytes(data[pos + 1]) : data[pos + 1];
+      if (diff > 0x3FF) {
+        return result(error_code::SURROGATE, pos);
+      }
+      uint16_t next_word = !match_system(big_endian)
+                               ? utf16::swap_bytes(data[pos + 1])
+                               : data[pos + 1];
       uint16_t diff2 = uint16_t(next_word - 0xDC00);
-      if(diff2 > 0x3FF) { return result(error_code::SURROGATE, pos); }
+      if (diff2 > 0x3FF) {
+        return result(error_code::SURROGATE, pos);
+      }
       uint32_t value = (diff << 10) + diff2 + 0x10000;
       // will generate four UTF-8 bytes
       // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((value>>18) | 0b11110000);
-      *utf8_output++ = char(((value>>12) & 0b111111) | 0b10000000);
-      *utf8_output++ = char(((value>>6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((value >> 18) | 0b11110000);
+      *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
       *utf8_output++ = char((value & 0b111111) | 0b10000000);
       pos += 2;
     }
@@ -12956,7 +11885,7 @@ inline result convert_with_errors(const char16_t* buf, size_t len, char* utf8_ou
   return result(error_code::SUCCESS, utf8_output - start);
 }
 
-} // utf16_to_utf8 namespace
+} // namespace utf16_to_utf8
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -12974,21 +11903,27 @@ namespace {
 namespace utf16_to_utf32 {
 
 template <endianness big_endian>
-inline size_t convert_valid(const char16_t* buf, size_t len, char32_t* utf32_output) {
- const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+inline size_t convert_valid(const char16_t *buf, size_t len,
+                            char32_t *utf32_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
   size_t pos = 0;
-  char32_t* start{utf32_output};
+  char32_t *start{utf32_output};
   while (pos < len) {
-    uint16_t word = !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if((word &0xF800 ) != 0xD800) {
+    uint16_t word =
+        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xF800) != 0xD800) {
       // No surrogate pair, extend 16-bit word to 32-bit word
       *utf32_output++ = char32_t(word);
       pos++;
     } else {
       // must be a surrogate pair
       uint16_t diff = uint16_t(word - 0xD800);
-      if(pos + 1 >= len) { return 0; } // minimal bound checking
-      uint16_t next_word = !match_system(big_endian) ? utf16::swap_bytes(data[pos + 1]) : data[pos + 1];
+      if (pos + 1 >= len) {
+        return 0;
+      } // minimal bound checking
+      uint16_t next_word = !match_system(big_endian)
+                               ? utf16::swap_bytes(data[pos + 1])
+                               : data[pos + 1];
       uint16_t diff2 = uint16_t(next_word - 0xDC00);
       uint32_t value = (diff << 10) + diff2 + 0x10000;
       *utf32_output++ = char32_t(value);
@@ -12998,7 +11933,7 @@ inline size_t convert_valid(const char16_t* buf, size_t len, char32_t* utf32_out
   return utf32_output - start;
 }
 
-} // utf16_to_utf32 namespace
+} // namespace utf16_to_utf32
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -13015,24 +11950,33 @@ namespace {
 namespace utf16_to_utf32 {
 
 template <endianness big_endian>
-inline size_t convert(const char16_t* buf, size_t len, char32_t* utf32_output) {
- const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+inline size_t convert(const char16_t *buf, size_t len, char32_t *utf32_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
   size_t pos = 0;
-  char32_t* start{utf32_output};
+  char32_t *start{utf32_output};
   while (pos < len) {
-    uint16_t word = !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if((word &0xF800 ) != 0xD800) {
+    uint16_t word =
+        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xF800) != 0xD800) {
       // No surrogate pair, extend 16-bit word to 32-bit word
       *utf32_output++ = char32_t(word);
       pos++;
     } else {
       // must be a surrogate pair
       uint16_t diff = uint16_t(word - 0xD800);
-      if(diff > 0x3FF) { return 0; }
-      if(pos + 1 >= len) { return 0; } // minimal bound checking
-      uint16_t next_word = !match_system(big_endian) ? utf16::swap_bytes(data[pos + 1]) : data[pos + 1];
+      if (diff > 0x3FF) {
+        return 0;
+      }
+      if (pos + 1 >= len) {
+        return 0;
+      } // minimal bound checking
+      uint16_t next_word = !match_system(big_endian)
+                               ? utf16::swap_bytes(data[pos + 1])
+                               : data[pos + 1];
       uint16_t diff2 = uint16_t(next_word - 0xDC00);
-      if(diff2 > 0x3FF) { return 0; }
+      if (diff2 > 0x3FF) {
+        return 0;
+      }
       uint32_t value = (diff << 10) + diff2 + 0x10000;
       *utf32_output++ = char32_t(value);
       pos += 2;
@@ -13042,24 +11986,34 @@ inline size_t convert(const char16_t* buf, size_t len, char32_t* utf32_output) {
 }
 
 template <endianness big_endian>
-inline result convert_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) {
- const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+inline result convert_with_errors(const char16_t *buf, size_t len,
+                                  char32_t *utf32_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
   size_t pos = 0;
-  char32_t* start{utf32_output};
+  char32_t *start{utf32_output};
   while (pos < len) {
-    uint16_t word = !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if((word &0xF800 ) != 0xD800) {
+    uint16_t word =
+        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xF800) != 0xD800) {
       // No surrogate pair, extend 16-bit word to 32-bit word
       *utf32_output++ = char32_t(word);
       pos++;
     } else {
       // must be a surrogate pair
       uint16_t diff = uint16_t(word - 0xD800);
-      if(diff > 0x3FF) { return result(error_code::SURROGATE, pos); }
-      if(pos + 1 >= len) { return result(error_code::SURROGATE, pos); } // minimal bound checking
-      uint16_t next_word = !match_system(big_endian) ? utf16::swap_bytes(data[pos + 1]) : data[pos + 1];
+      if (diff > 0x3FF) {
+        return result(error_code::SURROGATE, pos);
+      }
+      if (pos + 1 >= len) {
+        return result(error_code::SURROGATE, pos);
+      } // minimal bound checking
+      uint16_t next_word = !match_system(big_endian)
+                               ? utf16::swap_bytes(data[pos + 1])
+                               : data[pos + 1];
       uint16_t diff2 = uint16_t(next_word - 0xDC00);
-      if(diff2 > 0x3FF) { return result(error_code::SURROGATE, pos); }
+      if (diff2 > 0x3FF) {
+        return result(error_code::SURROGATE, pos);
+      }
       uint32_t value = (diff << 10) + diff2 + 0x10000;
       *utf32_output++ = char32_t(value);
       pos += 2;
@@ -13068,7 +12022,7 @@ inline result convert_with_errors(const char16_t* buf, size_t len, char32_t* utf
   return result(error_code::SUCCESS, utf32_output - start);
 }
 
-} // utf16_to_utf32 namespace
+} // namespace utf16_to_utf32
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -13086,19 +12040,23 @@ namespace {
 namespace utf8_to_utf16 {
 
 template <endianness big_endian>
-inline size_t convert_valid(const char* buf, size_t len, char16_t* utf16_output) {
- const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+inline size_t convert_valid(const char *buf, size_t len,
+                            char16_t *utf16_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
   size_t pos = 0;
-  char16_t* start{utf16_output};
+  char16_t *start{utf16_output};
   while (pos < len) {
     // try to convert the next block of 8 ASCII bytes
-    if (pos + 8 <= len) { // if it is safe to read 8 more bytes, check that they are ascii
+    if (pos + 8 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
       uint64_t v;
       ::memcpy(&v, data + pos, sizeof(uint64_t));
       if ((v & 0x8080808080808080) == 0) {
         size_t final_pos = pos + 8;
-        while(pos < final_pos) {
-          *utf16_output++ = !match_system(big_endian) ? char16_t(utf16::swap_bytes(buf[pos])) : char16_t(buf[pos]);
+        while (pos < final_pos) {
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(utf16::swap_bytes(buf[pos]))
+                                : char16_t(buf[pos]);
           pos++;
         }
         continue;
@@ -13107,13 +12065,18 @@ inline size_t convert_valid(const char* buf, size_t len, char16_t* utf16_output)
     uint8_t leading_byte = data[pos]; // leading byte
     if (leading_byte < 0b10000000) {
       // converting one ASCII byte !!!
-      *utf16_output++ = !match_system(big_endian) ? char16_t(utf16::swap_bytes(leading_byte)) : char16_t(leading_byte);
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(utf16::swap_bytes(leading_byte))
+                            : char16_t(leading_byte);
       pos++;
     } else if ((leading_byte & 0b11100000) == 0b11000000) {
       // We have a two-byte UTF-8, it should become
       // a single UTF-16 word.
-      if(pos + 1 >= len) { break; } // minimal bound checking
-      uint16_t code_point = uint16_t(((leading_byte &0b00011111) << 6) | (data[pos + 1] &0b00111111));
+      if (pos + 1 >= len) {
+        break;
+      } // minimal bound checking
+      uint16_t code_point = uint16_t(((leading_byte & 0b00011111) << 6) |
+                                     (data[pos + 1] & 0b00111111));
       if (!match_system(big_endian)) {
         code_point = utf16::swap_bytes(uint16_t(code_point));
       }
@@ -13122,8 +12085,12 @@ inline size_t convert_valid(const char* buf, size_t len, char16_t* utf16_output)
     } else if ((leading_byte & 0b11110000) == 0b11100000) {
       // We have a three-byte UTF-8, it should become
       // a single UTF-16 word.
-      if(pos + 2 >= len) { break; } // minimal bound checking
-      uint16_t code_point = uint16_t(((leading_byte &0b00001111) << 12) | ((data[pos + 1] &0b00111111) << 6) | (data[pos + 2] &0b00111111));
+      if (pos + 2 >= len) {
+        break;
+      } // minimal bound checking
+      uint16_t code_point = uint16_t(((leading_byte & 0b00001111) << 12) |
+                                     ((data[pos + 1] & 0b00111111) << 6) |
+                                     (data[pos + 2] & 0b00111111));
       if (!match_system(big_endian)) {
         code_point = utf16::swap_bytes(uint16_t(code_point));
       }
@@ -13131,9 +12098,13 @@ inline size_t convert_valid(const char* buf, size_t len, char16_t* utf16_output)
       pos += 3;
     } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
       // we have a 4-byte UTF-8 word.
-      if(pos + 3 >= len) { break; } // minimal bound checking
-      uint32_t code_point = ((leading_byte & 0b00000111) << 18 )| ((data[pos + 1] &0b00111111) << 12)
-                           | ((data[pos + 2] &0b00111111) << 6) | (data[pos + 3] &0b00111111);
+      if (pos + 3 >= len) {
+        break;
+      } // minimal bound checking
+      uint32_t code_point = ((leading_byte & 0b00000111) << 18) |
+                            ((data[pos + 1] & 0b00111111) << 12) |
+                            ((data[pos + 2] & 0b00111111) << 6) |
+                            (data[pos + 3] & 0b00111111);
       code_point -= 0x10000;
       uint16_t high_surrogate = uint16_t(0xD800 + (code_point >> 10));
       uint16_t low_surrogate = uint16_t(0xDC00 + (code_point & 0x3FF));
@@ -13152,7 +12123,6 @@ inline size_t convert_valid(const char* buf, size_t len, char16_t* utf16_output)
   return utf16_output - start;
 }
 
-
 } // namespace utf8_to_utf16
 } // unnamed namespace
 } // namespace scalar
@@ -13170,13 +12140,14 @@ namespace {
 namespace utf8_to_utf16 {
 
 template <endianness big_endian>
-inline size_t convert(const char* buf, size_t len, char16_t* utf16_output) {
- const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+inline size_t convert(const char *buf, size_t len, char16_t *utf16_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
   size_t pos = 0;
-  char16_t* start{utf16_output};
+  char16_t *start{utf16_output};
   while (pos < len) {
     // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <= len) { // if it is safe to read 16 more bytes, check that they are ascii
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
       uint64_t v1;
       ::memcpy(&v1, data + pos, sizeof(uint64_t));
       uint64_t v2;
@@ -13184,8 +12155,10 @@ inline size_t convert(const char* buf, size_t len, char16_t* utf16_output) {
       uint64_t v{v1 | v2};
       if ((v & 0x8080808080808080) == 0) {
         size_t final_pos = pos + 16;
-        while(pos < final_pos) {
-          *utf16_output++ = !match_system(big_endian) ? char16_t(utf16::swap_bytes(buf[pos])) : char16_t(buf[pos]);
+        while (pos < final_pos) {
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(utf16::swap_bytes(buf[pos]))
+                                : char16_t(buf[pos]);
           pos++;
         }
         continue;
@@ -13195,16 +12168,25 @@ inline size_t convert(const char* buf, size_t len, char16_t* utf16_output) {
     uint8_t leading_byte = data[pos]; // leading byte
     if (leading_byte < 0b10000000) {
       // converting one ASCII byte !!!
-      *utf16_output++ = !match_system(big_endian) ? char16_t(utf16::swap_bytes(leading_byte)): char16_t(leading_byte);
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(utf16::swap_bytes(leading_byte))
+                            : char16_t(leading_byte);
       pos++;
     } else if ((leading_byte & 0b11100000) == 0b11000000) {
       // We have a two-byte UTF-8, it should become
       // a single UTF-16 word.
-      if(pos + 1 >= len) { return 0; } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return 0; }
+      if (pos + 1 >= len) {
+        return 0;
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
       // range check
-      uint32_t code_point = (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
-      if (code_point < 0x80 || 0x7ff < code_point) { return 0; }
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
+      if (code_point < 0x80 || 0x7ff < code_point) {
+        return 0;
+      }
       if (!match_system(big_endian)) {
         code_point = uint32_t(utf16::swap_bytes(uint16_t(code_point)));
       }
@@ -13213,14 +12195,20 @@ inline size_t convert(const char* buf, size_t len, char16_t* utf16_output) {
     } else if ((leading_byte & 0b11110000) == 0b11100000) {
       // We have a three-byte UTF-8, it should become
       // a single UTF-16 word.
-      if(pos + 2 >= len) { return 0; } // minimal bound checking
+      if (pos + 2 >= len) {
+        return 0;
+      } // minimal bound checking
 
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return 0; }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) { return 0; }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
       // range check
       uint32_t code_point = (leading_byte & 0b00001111) << 12 |
-                   (data[pos + 1] & 0b00111111) << 6 |
-                   (data[pos + 2] & 0b00111111);
+                            (data[pos + 1] & 0b00111111) << 6 |
+                            (data[pos + 2] & 0b00111111);
       if (code_point < 0x800 || 0xffff < code_point ||
           (0xd7ff < code_point && code_point < 0xe000)) {
         return 0;
@@ -13232,16 +12220,27 @@ inline size_t convert(const char* buf, size_t len, char16_t* utf16_output) {
       pos += 3;
     } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
       // we have a 4-byte UTF-8 word.
-      if(pos + 3 >= len) { return 0; } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return 0; }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) { return 0; }
-      if ((data[pos + 3] & 0b11000000) != 0b10000000) { return 0; }
+      if (pos + 3 >= len) {
+        return 0;
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
 
       // range check
-      uint32_t code_point =
-          (leading_byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 |
-          (data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111);
-      if (code_point <= 0xffff || 0x10ffff < code_point) { return 0; }
+      uint32_t code_point = (leading_byte & 0b00000111) << 18 |
+                            (data[pos + 1] & 0b00111111) << 12 |
+                            (data[pos + 2] & 0b00111111) << 6 |
+                            (data[pos + 3] & 0b00111111);
+      if (code_point <= 0xffff || 0x10ffff < code_point) {
+        return 0;
+      }
       code_point -= 0x10000;
       uint16_t high_surrogate = uint16_t(0xD800 + (code_point >> 10));
       uint16_t low_surrogate = uint16_t(0xDC00 + (code_point & 0x3FF));
@@ -13260,13 +12259,15 @@ inline size_t convert(const char* buf, size_t len, char16_t* utf16_output) {
 }
 
 template <endianness big_endian>
-inline result convert_with_errors(const char* buf, size_t len, char16_t* utf16_output) {
- const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+inline result convert_with_errors(const char *buf, size_t len,
+                                  char16_t *utf16_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
   size_t pos = 0;
-  char16_t* start{utf16_output};
+  char16_t *start{utf16_output};
   while (pos < len) {
     // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <= len) { // if it is safe to read 16 more bytes, check that they are ascii
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
       uint64_t v1;
       ::memcpy(&v1, data + pos, sizeof(uint64_t));
       uint64_t v2;
@@ -13274,8 +12275,10 @@ inline result convert_with_errors(const char* buf, size_t len, char16_t* utf16_o
       uint64_t v{v1 | v2};
       if ((v & 0x8080808080808080) == 0) {
         size_t final_pos = pos + 16;
-        while(pos < final_pos) {
-          *utf16_output++ = !match_system(big_endian) ? char16_t(utf16::swap_bytes(buf[pos])) : char16_t(buf[pos]);
+        while (pos < final_pos) {
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(utf16::swap_bytes(buf[pos]))
+                                : char16_t(buf[pos]);
           pos++;
         }
         continue;
@@ -13284,16 +12287,25 @@ inline result convert_with_errors(const char* buf, size_t len, char16_t* utf16_o
     uint8_t leading_byte = data[pos]; // leading byte
     if (leading_byte < 0b10000000) {
       // converting one ASCII byte !!!
-      *utf16_output++ = !match_system(big_endian) ? char16_t(utf16::swap_bytes(leading_byte)): char16_t(leading_byte);
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(utf16::swap_bytes(leading_byte))
+                            : char16_t(leading_byte);
       pos++;
     } else if ((leading_byte & 0b11100000) == 0b11000000) {
       // We have a two-byte UTF-8, it should become
       // a single UTF-16 word.
-      if(pos + 1 >= len) { return result(error_code::TOO_SHORT, pos); } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
+      if (pos + 1 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
       // range check
-      uint32_t code_point = (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
-      if (code_point < 0x80 || 0x7ff < code_point) { return result(error_code::OVERLONG, pos); }
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
+      if (code_point < 0x80 || 0x7ff < code_point) {
+        return result(error_code::OVERLONG, pos);
+      }
       if (!match_system(big_endian)) {
         code_point = uint32_t(utf16::swap_bytes(uint16_t(code_point)));
       }
@@ -13302,16 +12314,26 @@ inline result convert_with_errors(const char* buf, size_t len, char16_t* utf16_o
     } else if ((leading_byte & 0b11110000) == 0b11100000) {
       // We have a three-byte UTF-8, it should become
       // a single UTF-16 word.
-      if(pos + 2 >= len) { return result(error_code::TOO_SHORT, pos); } // minimal bound checking
+      if (pos + 2 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
 
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
       // range check
       uint32_t code_point = (leading_byte & 0b00001111) << 12 |
-                   (data[pos + 1] & 0b00111111) << 6 |
-                   (data[pos + 2] & 0b00111111);
-      if ((code_point < 0x800) || (0xffff < code_point)) { return result(error_code::OVERLONG, pos);}
-      if (0xd7ff < code_point && code_point < 0xe000) { return result(error_code::SURROGATE, pos); }
+                            (data[pos + 1] & 0b00111111) << 6 |
+                            (data[pos + 2] & 0b00111111);
+      if ((code_point < 0x800) || (0xffff < code_point)) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (0xd7ff < code_point && code_point < 0xe000) {
+        return result(error_code::SURROGATE, pos);
+      }
       if (!match_system(big_endian)) {
         code_point = uint32_t(utf16::swap_bytes(uint16_t(code_point)));
       }
@@ -13319,17 +12341,30 @@ inline result convert_with_errors(const char* buf, size_t len, char16_t* utf16_o
       pos += 3;
     } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
       // we have a 4-byte UTF-8 word.
-      if(pos + 3 >= len) { return result(error_code::TOO_SHORT, pos); } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
-      if ((data[pos + 3] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
+      if (pos + 3 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
 
       // range check
-      uint32_t code_point =
-          (leading_byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 |
-          (data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111);
-      if (code_point <= 0xffff) { return result(error_code::OVERLONG, pos); }
-      if (0x10ffff < code_point) { return result(error_code::TOO_LARGE, pos); }
+      uint32_t code_point = (leading_byte & 0b00000111) << 18 |
+                            (data[pos + 1] & 0b00111111) << 12 |
+                            (data[pos + 2] & 0b00111111) << 6 |
+                            (data[pos + 3] & 0b00111111);
+      if (code_point <= 0xffff) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (0x10ffff < code_point) {
+        return result(error_code::TOO_LARGE, pos);
+      }
       code_point -= 0x10000;
       uint16_t high_surrogate = uint16_t(0xD800 + (code_point >> 10));
       uint16_t low_surrogate = uint16_t(0xDC00 + (code_point & 0x3FF));
@@ -13342,44 +12377,52 @@ inline result convert_with_errors(const char* buf, size_t len, char16_t* utf16_o
       pos += 4;
     } else {
       // we either have too many continuation bytes or an invalid leading byte
-      if ((leading_byte & 0b11000000) == 0b10000000) { return result(error_code::TOO_LONG, pos); }
-      else { return result(error_code::HEADER_BITS, pos); }
+      if ((leading_byte & 0b11000000) == 0b10000000) {
+        return result(error_code::TOO_LONG, pos);
+      } else {
+        return result(error_code::HEADER_BITS, pos);
+      }
     }
   }
   return result(error_code::SUCCESS, utf16_output - start);
 }
 
 /**
- * When rewind_and_convert_with_errors is called, we are pointing at 'buf' and we have
- * up to len input bytes left, and we encountered some error. It is possible that
- * the error is at 'buf' exactly, but it could also be in the previous bytes  (up to 3 bytes back).
+ * When rewind_and_convert_with_errors is called, we are pointing at 'buf' and
+ * we have up to len input bytes left, and we encountered some error. It is
+ * possible that the error is at 'buf' exactly, but it could also be in the
+ * previous bytes (up to 3 bytes back).
  *
- * prior_bytes indicates how many bytes, prior to 'buf' may belong to the current memory section
- * and can be safely accessed. We prior_bytes to access safely up to three bytes before 'buf'.
+ * prior_bytes indicates how many bytes prior to 'buf' may belong to the
+ * current memory section and can be safely accessed. We use prior_bytes to
+ * safely access up to three bytes before 'buf'.
  *
  * The caller is responsible to ensure that len > 0.
  *
- * If the error is believed to have occurred prior to 'buf', the count value contain in the result
- * will be SIZE_T - 1, SIZE_T - 2, or SIZE_T - 3.
+ * If the error is believed to have occurred prior to 'buf', the count value
+ * contained in the result will be SIZE_T - 1, SIZE_T - 2, or SIZE_T - 3.
  */
 template <endianness endian>
-inline result rewind_and_convert_with_errors(size_t prior_bytes, const char* buf, size_t len, char16_t* utf16_output) {
+inline result rewind_and_convert_with_errors(size_t prior_bytes,
+                                             const char *buf, size_t len,
+                                             char16_t *utf16_output) {
   size_t extra_len{0};
   // We potentially need to go back in time and find a leading byte.
-  // In theory '3' would be sufficient, but sometimes the error can go back quite far.
+  // In theory '3' would be sufficient, but sometimes the error can go back
+  // quite far.
   size_t how_far_back = prior_bytes;
   // size_t how_far_back = 3; // 3 bytes in the past + current position
   // if(how_far_back >= prior_bytes) { how_far_back = prior_bytes; }
   bool found_leading_bytes{false};
   // important: it is i <= how_far_back and not 'i < how_far_back'.
-  for(size_t i = 0; i <= how_far_back; i++) {
+  for (size_t i = 0; i <= how_far_back; i++) {
     unsigned char byte = buf[-static_cast<std::ptrdiff_t>(i)];
     found_leading_bytes = ((byte & 0b11000000) != 0b10000000);
-    if(found_leading_bytes) {
-      if(i > 0 && byte < 128) {
+    if (found_leading_bytes) {
+      if (i > 0 && byte < 128) {
         // If we had to go back and the leading byte is ascii
         // then we can stop right away.
-        return result(error_code::TOO_LONG, 0-i+1);
+        return result(error_code::TOO_LONG, 0 - i + 1);
       }
       buf -= i;
       extra_len = i;
@@ -13388,16 +12431,18 @@ inline result rewind_and_convert_with_errors(size_t prior_bytes, const char* buf
   }
   //
   // It is possible for this function to return a negative count in its result.
-  // C++ Standard Section 18.1 defines size_t is in <cstddef> which is described in C Standard as <stddef.h>.
-  // C Standard Section 4.1.5 defines size_t as an unsigned integral type of the result of the sizeof operator
+  // C++ Standard Section 18.1 defines size_t in <cstddef>, which is described
+  // in C Standard as <stddef.h>. C Standard Section 4.1.5 defines size_t as an
+  // unsigned integral type of the result of the sizeof operator
   //
   // An unsigned type will simply wrap round arithmetically (well defined).
   //
-  if(!found_leading_bytes) {
+  if (!found_leading_bytes) {
     // If how_far_back == 3, we may have four consecutive continuation bytes!!!
-    // [....] [continuation] [continuation] [continuation] | [buf is continuation]
-    // Or we possibly have a stream that does not start with a leading byte.
-    return result(error_code::TOO_LONG, 0-how_far_back);
+    // [....] [continuation] [continuation] [continuation] | [buf]
+    // where buf is itself a continuation byte. Or we possibly have a stream
+    // that does not start with a leading byte.
+    return result(error_code::TOO_LONG, 0 - how_far_back);
   }
   result res = convert_with_errors<endian>(buf, len + extra_len, utf16_output);
   if (res.error) {
@@ -13406,7 +12451,7 @@ inline result rewind_and_convert_with_errors(size_t prior_bytes, const char* buf
   return res;
 }
 
-} // utf8_to_utf16 namespace
+} // namespace utf8_to_utf16
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -13423,18 +12468,20 @@ namespace scalar {
 namespace {
 namespace utf8_to_utf32 {
 
-inline size_t convert_valid(const char* buf, size_t len, char32_t* utf32_output) {
- const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+inline size_t convert_valid(const char *buf, size_t len,
+                            char32_t *utf32_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
   size_t pos = 0;
-  char32_t* start{utf32_output};
+  char32_t *start{utf32_output};
   while (pos < len) {
     // try to convert the next block of 8 ASCII bytes
-    if (pos + 8 <= len) { // if it is safe to read 8 more bytes, check that they are ascii
+    if (pos + 8 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
       uint64_t v;
       ::memcpy(&v, data + pos, sizeof(uint64_t));
       if ((v & 0x8080808080808080) == 0) {
         size_t final_pos = pos + 8;
-        while(pos < final_pos) {
+        while (pos < final_pos) {
           *utf32_output++ = char32_t(buf[pos]);
           pos++;
         }
@@ -13448,19 +12495,30 @@ inline size_t convert_valid(const char* buf, size_t len, char32_t* utf32_output)
       pos++;
     } else if ((leading_byte & 0b11100000) == 0b11000000) {
       // We have a two-byte UTF-8
-      if(pos + 1 >= len) { break; } // minimal bound checking
-      *utf32_output++ = char32_t(((leading_byte &0b00011111) << 6) | (data[pos + 1] &0b00111111));
+      if (pos + 1 >= len) {
+        break;
+      } // minimal bound checking
+      *utf32_output++ = char32_t(((leading_byte & 0b00011111) << 6) |
+                                 (data[pos + 1] & 0b00111111));
       pos += 2;
     } else if ((leading_byte & 0b11110000) == 0b11100000) {
       // We have a three-byte UTF-8
-      if(pos + 2 >= len) { break; } // minimal bound checking
-      *utf32_output++ = char32_t(((leading_byte &0b00001111) << 12) | ((data[pos + 1] &0b00111111) << 6) | (data[pos + 2] &0b00111111));
+      if (pos + 2 >= len) {
+        break;
+      } // minimal bound checking
+      *utf32_output++ = char32_t(((leading_byte & 0b00001111) << 12) |
+                                 ((data[pos + 1] & 0b00111111) << 6) |
+                                 (data[pos + 2] & 0b00111111));
       pos += 3;
     } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
       // we have a 4-byte UTF-8 word.
-      if(pos + 3 >= len) { break; } // minimal bound checking
-      uint32_t code_word = ((leading_byte & 0b00000111) << 18 )| ((data[pos + 1] &0b00111111) << 12)
-                           | ((data[pos + 2] &0b00111111) << 6) | (data[pos + 3] &0b00111111);
+      if (pos + 3 >= len) {
+        break;
+      } // minimal bound checking
+      uint32_t code_word = ((leading_byte & 0b00000111) << 18) |
+                           ((data[pos + 1] & 0b00111111) << 12) |
+                           ((data[pos + 2] & 0b00111111) << 6) |
+                           (data[pos + 3] & 0b00111111);
       *utf32_output++ = char32_t(code_word);
       pos += 4;
     } else {
@@ -13471,7 +12529,6 @@ inline size_t convert_valid(const char* buf, size_t len, char32_t* utf32_output)
   return utf32_output - start;
 }
 
-
 } // namespace utf8_to_utf32
 } // unnamed namespace
 } // namespace scalar
@@ -13488,13 +12545,14 @@ namespace scalar {
 namespace {
 namespace utf8_to_utf32 {
 
-inline size_t convert(const char* buf, size_t len, char32_t* utf32_output) {
- const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+inline size_t convert(const char *buf, size_t len, char32_t *utf32_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
   size_t pos = 0;
-  char32_t* start{utf32_output};
+  char32_t *start{utf32_output};
   while (pos < len) {
     // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <= len) { // if it is safe to read 16 more bytes, check that they are ascii
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
       uint64_t v1;
       ::memcpy(&v1, data + pos, sizeof(uint64_t));
       uint64_t v2;
@@ -13502,7 +12560,7 @@ inline size_t convert(const char* buf, size_t len, char32_t* utf32_output) {
       uint64_t v{v1 | v2};
       if ((v & 0x8080808080808080) == 0) {
         size_t final_pos = pos + 16;
-        while(pos < final_pos) {
+        while (pos < final_pos) {
           *utf32_output++ = char32_t(buf[pos]);
           pos++;
         }
@@ -13516,23 +12574,36 @@ inline size_t convert(const char* buf, size_t len, char32_t* utf32_output) {
       pos++;
     } else if ((leading_byte & 0b11100000) == 0b11000000) {
       // We have a two-byte UTF-8
-      if(pos + 1 >= len) { return 0; } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return 0; }
+      if (pos + 1 >= len) {
+        return 0;
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
       // range check
-      uint32_t code_point = (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
-      if (code_point < 0x80 || 0x7ff < code_point) { return 0; }
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
+      if (code_point < 0x80 || 0x7ff < code_point) {
+        return 0;
+      }
       *utf32_output++ = char32_t(code_point);
       pos += 2;
     } else if ((leading_byte & 0b11110000) == 0b11100000) {
       // We have a three-byte UTF-8
-      if(pos + 2 >= len) { return 0; } // minimal bound checking
+      if (pos + 2 >= len) {
+        return 0;
+      } // minimal bound checking
 
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return 0; }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) { return 0; }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
       // range check
       uint32_t code_point = (leading_byte & 0b00001111) << 12 |
-                   (data[pos + 1] & 0b00111111) << 6 |
-                   (data[pos + 2] & 0b00111111);
+                            (data[pos + 1] & 0b00111111) << 6 |
+                            (data[pos + 2] & 0b00111111);
       if (code_point < 0x800 || 0xffff < code_point ||
           (0xd7ff < code_point && code_point < 0xe000)) {
         return 0;
@@ -13541,16 +12612,27 @@ inline size_t convert(const char* buf, size_t len, char32_t* utf32_output) {
       pos += 3;
     } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
       // we have a 4-byte UTF-8 word.
-      if(pos + 3 >= len) { return 0; } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return 0; }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) { return 0; }
-      if ((data[pos + 3] & 0b11000000) != 0b10000000) { return 0; }
+      if (pos + 3 >= len) {
+        return 0;
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
 
       // range check
-      uint32_t code_point =
-          (leading_byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 |
-          (data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111);
-      if (code_point <= 0xffff || 0x10ffff < code_point) { return 0; }
+      uint32_t code_point = (leading_byte & 0b00000111) << 18 |
+                            (data[pos + 1] & 0b00111111) << 12 |
+                            (data[pos + 2] & 0b00111111) << 6 |
+                            (data[pos + 3] & 0b00111111);
+      if (code_point <= 0xffff || 0x10ffff < code_point) {
+        return 0;
+      }
       *utf32_output++ = char32_t(code_point);
       pos += 4;
     } else {
@@ -13560,13 +12642,15 @@ inline size_t convert(const char* buf, size_t len, char32_t* utf32_output) {
   return utf32_output - start;
 }
 
-inline result convert_with_errors(const char* buf, size_t len, char32_t* utf32_output) {
- const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+inline result convert_with_errors(const char *buf, size_t len,
+                                  char32_t *utf32_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
   size_t pos = 0;
-  char32_t* start{utf32_output};
+  char32_t *start{utf32_output};
   while (pos < len) {
     // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <= len) { // if it is safe to read 16 more bytes, check that they are ascii
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
       uint64_t v1;
       ::memcpy(&v1, data + pos, sizeof(uint64_t));
       uint64_t v2;
@@ -13574,7 +12658,7 @@ inline result convert_with_errors(const char* buf, size_t len, char32_t* utf32_o
       uint64_t v{v1 | v2};
       if ((v & 0x8080808080808080) == 0) {
         size_t final_pos = pos + 16;
-        while(pos < final_pos) {
+        while (pos < final_pos) {
           *utf32_output++ = char32_t(buf[pos]);
           pos++;
         }
@@ -13588,79 +12672,118 @@ inline result convert_with_errors(const char* buf, size_t len, char32_t* utf32_o
       pos++;
     } else if ((leading_byte & 0b11100000) == 0b11000000) {
       // We have a two-byte UTF-8
-      if(pos + 1 >= len) { return result(error_code::TOO_SHORT, pos); } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
+      if (pos + 1 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
       // range check
-      uint32_t code_point = (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
-      if (code_point < 0x80 || 0x7ff < code_point) { return result(error_code::OVERLONG, pos); }
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
+      if (code_point < 0x80 || 0x7ff < code_point) {
+        return result(error_code::OVERLONG, pos);
+      }
       *utf32_output++ = char32_t(code_point);
       pos += 2;
     } else if ((leading_byte & 0b11110000) == 0b11100000) {
       // We have a three-byte UTF-8
-      if(pos + 2 >= len) { return result(error_code::TOO_SHORT, pos); } // minimal bound checking
+      if (pos + 2 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
 
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
       // range check
       uint32_t code_point = (leading_byte & 0b00001111) << 12 |
-                   (data[pos + 1] & 0b00111111) << 6 |
-                   (data[pos + 2] & 0b00111111);
-      if (code_point < 0x800 || 0xffff < code_point) { return result(error_code::OVERLONG, pos); }
-      if (0xd7ff < code_point && code_point < 0xe000) { return result(error_code::SURROGATE, pos); }
+                            (data[pos + 1] & 0b00111111) << 6 |
+                            (data[pos + 2] & 0b00111111);
+      if (code_point < 0x800 || 0xffff < code_point) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (0xd7ff < code_point && code_point < 0xe000) {
+        return result(error_code::SURROGATE, pos);
+      }
       *utf32_output++ = char32_t(code_point);
       pos += 3;
     } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
       // we have a 4-byte UTF-8 word.
-      if(pos + 3 >= len) { return result(error_code::TOO_SHORT, pos); } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos);}
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
-      if ((data[pos + 3] & 0b11000000) != 0b10000000) { return result(error_code::TOO_SHORT, pos); }
+      if (pos + 3 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
 
       // range check
-      uint32_t code_point =
-          (leading_byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 |
-          (data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111);
-      if (code_point <= 0xffff) { return result(error_code::OVERLONG, pos); }
-      if (0x10ffff < code_point) { return result(error_code::TOO_LARGE, pos); }
+      uint32_t code_point = (leading_byte & 0b00000111) << 18 |
+                            (data[pos + 1] & 0b00111111) << 12 |
+                            (data[pos + 2] & 0b00111111) << 6 |
+                            (data[pos + 3] & 0b00111111);
+      if (code_point <= 0xffff) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (0x10ffff < code_point) {
+        return result(error_code::TOO_LARGE, pos);
+      }
       *utf32_output++ = char32_t(code_point);
       pos += 4;
     } else {
       // we either have too many continuation bytes or an invalid leading byte
-      if ((leading_byte & 0b11000000) == 0b10000000) { return result(error_code::TOO_LONG, pos); }
-      else { return result(error_code::HEADER_BITS, pos); }
+      if ((leading_byte & 0b11000000) == 0b10000000) {
+        return result(error_code::TOO_LONG, pos);
+      } else {
+        return result(error_code::HEADER_BITS, pos);
+      }
     }
   }
   return result(error_code::SUCCESS, utf32_output - start);
 }
 
 /**
- * When rewind_and_convert_with_errors is called, we are pointing at 'buf' and we have
- * up to len input bytes left, and we encountered some error. It is possible that
- * the error is at 'buf' exactly, but it could also be in the previous bytes location (up to 3 bytes back).
+ * When rewind_and_convert_with_errors is called, we are pointing at 'buf' and
+ * we have up to len input bytes left, and we encountered some error. It is
+ * possible that the error is at 'buf' exactly, but it could also be in the
+ * previous bytes location (up to 3 bytes back).
  *
- * prior_bytes indicates how many bytes, prior to 'buf' may belong to the current memory section
- * and can be safely accessed. We prior_bytes to access safely up to three bytes before 'buf'.
+ * prior_bytes indicates how many bytes prior to 'buf' may belong to the
+ * current memory section and can be safely accessed. We use prior_bytes to
+ * safely access up to three bytes before 'buf'.
  *
  * The caller is responsible to ensure that len > 0.
  *
- * If the error is believed to have occurred prior to 'buf', the count value contain in the result
- * will be SIZE_T - 1, SIZE_T - 2, or SIZE_T - 3.
+ * If the error is believed to have occurred prior to 'buf', the count value
+ * contained in the result will be SIZE_T - 1, SIZE_T - 2, or SIZE_T - 3.
  */
-inline result rewind_and_convert_with_errors(size_t prior_bytes, const char* buf, size_t len, char32_t* utf32_output) {
+inline result rewind_and_convert_with_errors(size_t prior_bytes,
+                                             const char *buf, size_t len,
+                                             char32_t *utf32_output) {
   size_t extra_len{0};
   // We potentially need to go back in time and find a leading byte.
   size_t how_far_back = 3; // 3 bytes in the past + current position
-  if(how_far_back > prior_bytes) { how_far_back = prior_bytes; }
+  if (how_far_back > prior_bytes) {
+    how_far_back = prior_bytes;
+  }
   bool found_leading_bytes{false};
   // important: it is i <= how_far_back and not 'i < how_far_back'.
-  for(size_t i = 0; i <= how_far_back; i++) {
+  for (size_t i = 0; i <= how_far_back; i++) {
     unsigned char byte = buf[-static_cast<std::ptrdiff_t>(i)];
     found_leading_bytes = ((byte & 0b11000000) != 0b10000000);
-    if(found_leading_bytes) {
-      if(i > 0 && byte < 128) {
+    if (found_leading_bytes) {
+      if (i > 0 && byte < 128) {
         // If we had to go back and the leading byte is ascii
         // then we can stop right away.
-        return result(error_code::TOO_LONG, 0-i+1);
+        return result(error_code::TOO_LONG, 0 - i + 1);
       }
       buf -= i;
       extra_len = i;
@@ -13669,16 +12792,18 @@ inline result rewind_and_convert_with_errors(size_t prior_bytes, const char* buf
   }
   //
   // It is possible for this function to return a negative count in its result.
-  // C++ Standard Section 18.1 defines size_t is in <cstddef> which is described in C Standard as <stddef.h>.
-  // C Standard Section 4.1.5 defines size_t as an unsigned integral type of the result of the sizeof operator
+  // C++ Standard Section 18.1 defines size_t in <cstddef>, which is described
+  // in the C Standard as <stddef.h>. C Standard Section 4.1.5 defines size_t
+  // as the unsigned integral type of the result of the sizeof operator.
   //
   // An unsigned type will simply wrap round arithmetically (well defined).
   //
-  if(!found_leading_bytes) {
+  if (!found_leading_bytes) {
     // If how_far_back == 3, we may have four consecutive continuation bytes!!!
-    // [....] [continuation] [continuation] [continuation] | [buf is continuation]
-    // Or we possibly have a stream that does not start with a leading byte.
-    return result(error_code::TOO_LONG, 0-how_far_back);
+    // [....] [continuation] [continuation] [continuation] |
+    // [buf is continuation]
+    // Or we possibly have a stream that does not start with a leading byte.
+    return result(error_code::TOO_LONG, 0 - how_far_back);
   }
 
   result res = convert_with_errors(buf, len + extra_len, utf32_output);
@@ -13688,7 +12813,7 @@ inline result rewind_and_convert_with_errors(size_t prior_bytes, const char* buf
   return res;
 }
 
-} // utf8_to_utf32 namespace
+} // namespace utf8_to_utf32
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -13706,14 +12831,16 @@ namespace {
 namespace latin1_to_utf16 {
 
 template <endianness big_endian>
-inline size_t convert(const char* buf, size_t len, char16_t* utf16_output) {
-  const uint8_t* data = reinterpret_cast<const uint8_t*>(buf);
+inline size_t convert(const char *buf, size_t len, char16_t *utf16_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
   size_t pos = 0;
-  char16_t* start{ utf16_output };
+  char16_t *start{utf16_output};
 
   while (pos < len) {
-    uint16_t word = uint16_t(data[pos]); // extend Latin-1 char to 16-bit Unicode code point
-    *utf16_output++ = char16_t(match_system(big_endian) ? word : utf16::swap_bytes(word));
+    uint16_t word =
+        uint16_t(data[pos]); // extend Latin-1 char to 16-bit Unicode code point
+    *utf16_output++ =
+        char16_t(match_system(big_endian) ? word : utf16::swap_bytes(word));
     pos++;
   }
 
@@ -13721,21 +12848,24 @@ inline size_t convert(const char* buf, size_t len, char16_t* utf16_output) {
 }
 
 template <endianness big_endian>
-inline result convert_with_errors(const char* buf, size_t len, char16_t* utf16_output) {
-  const uint8_t* data = reinterpret_cast<const uint8_t*>(buf);
+inline result convert_with_errors(const char *buf, size_t len,
+                                  char16_t *utf16_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
   size_t pos = 0;
-  char16_t* start{ utf16_output };
+  char16_t *start{utf16_output};
 
   while (pos < len) {
-    uint16_t word = uint16_t(data[pos]); // extend Latin-1 char to 16-bit Unicode code point
-    *utf16_output++ = char16_t(match_system(big_endian) ? word : utf16::swap_bytes(word));
+    uint16_t word =
+        uint16_t(data[pos]); // extend Latin-1 char to 16-bit Unicode code point
+    *utf16_output++ =
+        char16_t(match_system(big_endian) ? word : utf16::swap_bytes(word));
     pos++;
   }
 
   return result(error_code::SUCCESS, utf16_output - start);
 }
 
-} // latin1_to_utf16 namespace
+} // namespace latin1_to_utf16
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -13751,17 +12881,16 @@ namespace scalar {
 namespace {
 namespace latin1_to_utf32 {
 
-
 inline size_t convert(const char *buf, size_t len, char32_t *utf32_output) {
   const unsigned char *data = reinterpret_cast<const unsigned char *>(buf);
-  char32_t* start{utf32_output};
+  char32_t *start{utf32_output};
   for (size_t i = 0; i < len; i++) {
     *utf32_output++ = (char32_t)data[i];
   }
   return utf32_output - start;
 }
 
-} // latin1_to_utf32 namespace
+} // namespace latin1_to_utf32
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -13778,22 +12907,26 @@ namespace scalar {
 namespace {
 namespace utf8_to_latin1 {
 
-inline size_t convert(const char* buf, size_t len, char* latin_output) {
- const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+inline size_t convert(const char *buf, size_t len, char *latin_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
   size_t pos = 0;
-  char* start{latin_output};
+  char *start{latin_output};
 
   while (pos < len) {
     // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <= len) { // if it is safe to read 16 more bytes, check that they are ascii
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
       uint64_t v1;
       ::memcpy(&v1, data + pos, sizeof(uint64_t));
       uint64_t v2;
       ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2}; // We are only interested in these bits: 1000 1000 1000 1000 .... etc
-      if ((v & 0x8080808080808080) == 0) { // if NONE of these are set, e.g. all of them are zero, then everything is ASCII
+      uint64_t v{v1 | v2}; // We are only interested in these bits: 1000 1000
+                           // 1000 1000 .... etc
+      if ((v & 0x8080808080808080) ==
+          0) { // if NONE of these are set, i.e. all of them are zero, then
+               // everything is ASCII
         size_t final_pos = pos + 16;
-        while(pos < final_pos) {
+        while (pos < final_pos) {
           *latin_output++ = char(buf[pos]);
           pos++;
         }
@@ -13807,16 +12940,29 @@ inline size_t convert(const char* buf, size_t len, char* latin_output) {
       // converting one ASCII byte !!!
       *latin_output++ = char(leading_byte);
       pos++;
-    } else if ((leading_byte & 0b11100000) == 0b11000000) { // the first three bits indicate:
+    } else if ((leading_byte & 0b11100000) ==
+               0b11000000) { // the first three bits indicate:
       // We have a two-byte UTF-8
-      if(pos + 1 >= len) {
+      if (pos + 1 >= len) {
         return 0;
       } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return 0; } // checks if the next byte is a valid continuation byte in UTF-8. A valid continuation byte starts with 10.
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      } // checks if the next byte is a valid continuation byte in UTF-8. A
+        // valid continuation byte starts with 10.
       // range check -
-      uint32_t code_point = (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111); // assembles the Unicode code point from the two bytes. It does this by discarding the leading 110 and 10 bits from the two bytes, shifting the remaining bits of the first byte, and then combining the results with a bitwise OR operation.
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 |
+          (data[pos + 1] &
+           0b00111111); // assembles the Unicode code point from the two bytes.
+                        // It does this by discarding the leading 110 and 10
+                        // bits from the two bytes, shifting the remaining bits
+                        // of the first byte, and then combining the results
+                        // with a bitwise OR operation.
       if (code_point < 0x80 || 0xFF < code_point) {
-        return 0; // We only care about the range 129-255 which is Non-ASCII latin1 characters. A code_point beneath 0x80 is invalid as it is already covered by bytes whose leading bit is zero.
+        return 0; // We only care about the range 128-255 which is Non-ASCII
+                  // latin1 characters. A code_point beneath 0x80 is invalid as
+                  // it is already covered by bytes whose leading bit is zero.
       }
       *latin_output++ = char(code_point);
       pos += 2;
@@ -13827,22 +12973,27 @@ inline size_t convert(const char* buf, size_t len, char* latin_output) {
   return latin_output - start;
 }
 
-inline result convert_with_errors(const char* buf, size_t len, char* latin_output) {
- const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+inline result convert_with_errors(const char *buf, size_t len,
+                                  char *latin_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
   size_t pos = 0;
-  char* start{latin_output};
+  char *start{latin_output};
 
   while (pos < len) {
     // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <= len) { // if it is safe to read 16 more bytes, check that they are ascii
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
       uint64_t v1;
       ::memcpy(&v1, data + pos, sizeof(uint64_t));
       uint64_t v2;
       ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2}; // We are only interested in these bits: 1000 1000 1000 1000...etc
-      if ((v & 0x8080808080808080) == 0) { // if NONE of these are set, e.g. all of them are zero, then everything is ASCII
+      uint64_t v{v1 | v2}; // We are only interested in these bits: 1000 1000
+                           // 1000 1000...etc
+      if ((v & 0x8080808080808080) ==
+          0) { // if NONE of these are set, i.e. all of them are zero, then
+               // everything is ASCII
         size_t final_pos = pos + 16;
-        while(pos < final_pos) {
+        while (pos < final_pos) {
           *latin_output++ = char(buf[pos]);
           pos++;
         }
@@ -13855,20 +13006,32 @@ inline result convert_with_errors(const char* buf, size_t len, char* latin_outpu
       // converting one ASCII byte !!!
       *latin_output++ = char(leading_byte);
       pos++;
-    } else if ((leading_byte & 0b11100000) == 0b11000000) { // the first three bits indicate:
+    } else if ((leading_byte & 0b11100000) ==
+               0b11000000) { // the first three bits indicate:
       // We have a two-byte UTF-8
-      if(pos + 1 >= len) {
-        return result(error_code::TOO_SHORT, pos); } // minimal bound checking
+      if (pos + 1 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
       if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos); } // checks if the next byte is a valid continuation byte in UTF-8. A valid continuation byte starts with 10.
+        return result(error_code::TOO_SHORT, pos);
+      } // checks if the next byte is a valid continuation byte in UTF-8. A
+        // valid continuation byte starts with 10.
       // range check -
-      uint32_t code_point = (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111); // assembles the Unicode code point from the two bytes. It does this by discarding the leading 110 and 10 bits from the two bytes, shifting the remaining bits of the first byte, and then combining the results with a bitwise OR operation.
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 |
+          (data[pos + 1] &
+           0b00111111); // assembles the Unicode code point from the two bytes.
+                        // It does this by discarding the leading 110 and 10
+                        // bits from the two bytes, shifting the remaining bits
+                        // of the first byte, and then combining the results
+                        // with a bitwise OR operation.
       if (code_point < 0x80) {
         return result(error_code::OVERLONG, pos);
       }
       if (0xFF < code_point) {
         return result(error_code::TOO_LARGE, pos);
-      } // We only care about the range 129-255 which is Non-ASCII latin1 characters
+      } // We only care about the range 128-255 which is Non-ASCII latin1
+        // characters
       *latin_output++ = char(code_point);
       pos += 2;
     } else if ((leading_byte & 0b11110000) == 0b11100000) {
@@ -13884,30 +13047,31 @@ inline result convert_with_errors(const char* buf, size_t len, char* latin_outpu
       }
 
       return result(error_code::HEADER_BITS, pos);
-
     }
   }
   return result(error_code::SUCCESS, latin_output - start);
 }
 
-
-inline result rewind_and_convert_with_errors(size_t prior_bytes, const char* buf, size_t len, char* latin1_output) {
+inline result rewind_and_convert_with_errors(size_t prior_bytes,
+                                             const char *buf, size_t len,
+                                             char *latin1_output) {
   size_t extra_len{0};
   // We potentially need to go back in time and find a leading byte.
-  // In theory '3' would be sufficient, but sometimes the error can go back quite far.
+  // In theory '3' would be sufficient, but sometimes the error can go back
+  // quite far.
   size_t how_far_back = prior_bytes;
   // size_t how_far_back = 3; // 3 bytes in the past + current position
   // if(how_far_back >= prior_bytes) { how_far_back = prior_bytes; }
   bool found_leading_bytes{false};
   // important: it is i <= how_far_back and not 'i < how_far_back'.
-  for(size_t i = 0; i <= how_far_back; i++) {
+  for (size_t i = 0; i <= how_far_back; i++) {
     unsigned char byte = buf[-static_cast<std::ptrdiff_t>(i)];
     found_leading_bytes = ((byte & 0b11000000) != 0b10000000);
-    if(found_leading_bytes) {
-      if(i > 0 && byte < 128) {
+    if (found_leading_bytes) {
+      if (i > 0 && byte < 128) {
         // If we had to go back and the leading byte is ascii
         // then we can stop right away.
-        return result(error_code::TOO_LONG, 0-i+1);
+        return result(error_code::TOO_LONG, 0 - i + 1);
       }
       buf -= i;
       extra_len = i;
@@ -13916,16 +13080,18 @@ inline result rewind_and_convert_with_errors(size_t prior_bytes, const char* buf
   }
   //
   // It is possible for this function to return a negative count in its result.
-  // C++ Standard Section 18.1 defines size_t is in <cstddef> which is described in C Standard as <stddef.h>.
-  // C Standard Section 4.1.5 defines size_t as an unsigned integral type of the result of the sizeof operator
+  // C++ Standard Section 18.1 defines size_t in <cstddef>, which is described
+  // in the C Standard as <stddef.h>. C Standard Section 4.1.5 defines size_t
+  // as the unsigned integral type of the result of the sizeof operator.
   //
   // An unsigned type will simply wrap round arithmetically (well defined).
   //
-  if(!found_leading_bytes) {
+  if (!found_leading_bytes) {
     // If how_far_back == 3, we may have four consecutive continuation bytes!!!
-    // [....] [continuation] [continuation] [continuation] | [buf is continuation]
-    // Or we possibly have a stream that does not start with a leading byte.
-    return result(error_code::TOO_LONG, 0-how_far_back);
+    // [....] [continuation] [continuation] [continuation] |
+    // [buf is continuation]
+    // Or we possibly have a stream that does not start with a leading byte.
+    return result(error_code::TOO_LONG, 0 - how_far_back);
   }
   result res = convert_with_errors(buf, len + extra_len, latin1_output);
   if (res.error) {
@@ -13934,8 +13100,7 @@ inline result rewind_and_convert_with_errors(size_t prior_bytes, const char* buf
   return res;
 }
 
-
-} // utf8_to_latin1 namespace
+} // namespace utf8_to_latin1
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -13951,15 +13116,16 @@ namespace scalar {
 namespace {
 namespace utf16_to_latin1 {
 
-#include <cstring>  // for std::memcpy
+#include <cstring> // for std::memcpy
 
 template <endianness big_endian>
-inline size_t convert(const char16_t* buf, size_t len, char* latin_output) {
-  if(len == 0) { return 0; }
+inline size_t convert(const char16_t *buf, size_t len, char *latin_output) {
+  if (len == 0) {
+    return 0;
+  }
   const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
   size_t pos = 0;
-  std::vector<char> temp_output(len);
-  char* current_write = temp_output.data();
+  char *current_write = latin_output;
   uint16_t word = 0;
   uint16_t too_large = 0;
 
@@ -13969,55 +13135,69 @@ inline size_t convert(const char16_t* buf, size_t len, char* latin_output) {
     *current_write++ = char(word & 0xFF);
     pos++;
   }
-  if((too_large & 0xFF00) != 0) { return 0; }
-
-  // Only copy to latin_output if there were no errors
-  std::memcpy(latin_output, temp_output.data(), len);
+  if ((too_large & 0xFF00) != 0) {
+    return 0;
+  }
 
-  return current_write - temp_output.data();
+  return current_write - latin_output;
 }
 
 template <endianness big_endian>
-inline result convert_with_errors(const char16_t* buf, size_t len, char* latin_output) {
-  if(len == 0) { return result(error_code::SUCCESS,0); }
- const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+inline result convert_with_errors(const char16_t *buf, size_t len,
+                                  char *latin_output) {
+  if (len == 0) {
+    return result(error_code::SUCCESS, 0);
+  }
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
   size_t pos = 0;
-  char* start{latin_output};
+  char *start{latin_output};
   uint16_t word;
 
   while (pos < len) {
-    if (pos + 16 <= len) { // if it is safe to read 32 more bytes, check that they are Latin1
+    if (pos + 16 <= len) { // if it is safe to read 32 more bytes, check that
+                           // they are Latin1
       uint64_t v1, v2, v3, v4;
       ::memcpy(&v1, data + pos, sizeof(uint64_t));
       ::memcpy(&v2, data + pos + 4, sizeof(uint64_t));
       ::memcpy(&v3, data + pos + 8, sizeof(uint64_t));
-      ::memcpy(&v4, data + pos  + 12, sizeof(uint64_t));
+      ::memcpy(&v4, data + pos + 12, sizeof(uint64_t));
 
-      if (!match_system(big_endian)) { v1 = (v1 >> 8) | (v1 << (64 - 8)); }
-      if (!match_system(big_endian)) { v2 = (v2 >> 8) | (v2 << (64 - 8)); }
-      if (!match_system(big_endian)) { v3 = (v3 >> 8) | (v3 << (64 - 8)); }
-      if (!match_system(big_endian)) { v4 = (v4 >> 8) | (v4 << (64 - 8)); }
+      if (!match_system(big_endian)) {
+        v1 = (v1 >> 8) | (v1 << (64 - 8));
+      }
+      if (!match_system(big_endian)) {
+        v2 = (v2 >> 8) | (v2 << (64 - 8));
+      }
+      if (!match_system(big_endian)) {
+        v3 = (v3 >> 8) | (v3 << (64 - 8));
+      }
+      if (!match_system(big_endian)) {
+        v4 = (v4 >> 8) | (v4 << (64 - 8));
+      }
 
       if (((v1 | v2 | v3 | v4) & 0xFF00FF00FF00FF00) == 0) {
         size_t final_pos = pos + 16;
-        while(pos < final_pos) {
-          *latin_output++ = !match_system(big_endian) ? char(utf16::swap_bytes(data[pos])) : char(data[pos]);
+        while (pos < final_pos) {
+          *latin_output++ = !match_system(big_endian)
+                                ? char(utf16::swap_bytes(data[pos]))
+                                : char(data[pos]);
           pos++;
         }
         continue;
       }
     }
     word = !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if((word & 0xFF00 ) == 0) {
-        *latin_output++ = char(word & 0xFF);
-        pos++;
-    } else { return result(error_code::TOO_LARGE, pos); }
+    if ((word & 0xFF00) == 0) {
+      *latin_output++ = char(word & 0xFF);
+      pos++;
+    } else {
+      return result(error_code::TOO_LARGE, pos);
+    }
   }
-  return result(error_code::SUCCESS,latin_output - start);
+  return result(error_code::SUCCESS, latin_output - start);
 }
 
-
-} // utf16_to_latin1 namespace
+} // namespace utf16_to_latin1
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -14035,7 +13215,7 @@ namespace utf32_to_latin1 {
 
 inline size_t convert(const char32_t *buf, size_t len, char *latin1_output) {
   const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  char* start = latin1_output;
+  char *start = latin1_output;
   uint32_t utf32_char;
   size_t pos = 0;
   uint32_t too_large = 0;
@@ -14046,35 +13226,42 @@ inline size_t convert(const char32_t *buf, size_t len, char *latin1_output) {
     *latin1_output++ = (char)(utf32_char & 0xFF);
     pos++;
   }
-  if((too_large & 0xFFFFFF00) != 0) { return 0; }
+  if ((too_large & 0xFFFFFF00) != 0) {
+    return 0;
+  }
   return latin1_output - start;
 }
 
-inline result convert_with_errors(const char32_t *buf, size_t len, char *latin1_output) {
+inline result convert_with_errors(const char32_t *buf, size_t len,
+                                  char *latin1_output) {
   const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  char* start{latin1_output};
+  char *start{latin1_output};
   size_t pos = 0;
   while (pos < len) {
-    if (pos + 2 <= len) { // if it is safe to read 8 more bytes, check that they are Latin1
+    if (pos + 2 <=
+        len) { // if it is safe to read 8 more bytes, check that they are Latin1
       uint64_t v;
       ::memcpy(&v, data + pos, sizeof(uint64_t));
       if ((v & 0xFFFFFF00FFFFFF00) == 0) {
         *latin1_output++ = char(buf[pos]);
-        *latin1_output++ = char(buf[pos+1]);
+        *latin1_output++ = char(buf[pos + 1]);
         pos += 2;
         continue;
       }
     }
     uint32_t utf32_char = data[pos];
-    if ((utf32_char & 0xFFFFFF00) == 0) { // Check if the character can be represented in Latin-1
+    if ((utf32_char & 0xFFFFFF00) ==
+        0) { // Check if the character can be represented in Latin-1
       *latin1_output++ = (char)(utf32_char & 0xFF);
       pos++;
-    } else { return result(error_code::TOO_LARGE, pos); };
+    } else {
+      return result(error_code::TOO_LARGE, pos);
+    }
   }
   return result(error_code::SUCCESS, latin1_output - start);
 }
 
-} // utf32_to_latin1 namespace
+} // namespace utf32_to_latin1
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -14091,23 +13278,28 @@ namespace scalar {
 namespace {
 namespace utf8_to_latin1 {
 
-inline size_t convert_valid(const char* buf, size_t len, char* latin_output) {
- const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+inline size_t convert_valid(const char *buf, size_t len, char *latin_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
 
   size_t pos = 0;
-  char* start{latin_output};
+  char *start{latin_output};
 
   while (pos < len) {
     // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <= len) { // if it is safe to read 16 more bytes, check that they are ascii
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
       uint64_t v1;
       ::memcpy(&v1, data + pos, sizeof(uint64_t));
       uint64_t v2;
       ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2}; // We are only interested in these bits: 1000 1000 1000 1000, so it makes sense to concatenate everything
-      if ((v & 0x8080808080808080) == 0) { // if NONE of these are set, e.g. all of them are zero, then everything is ASCII
+      uint64_t v{v1 |
+                 v2}; // We are only interested in these bits: 1000 1000 1000
+                      // 1000, so it makes sense to concatenate everything
+      if ((v & 0x8080808080808080) ==
+          0) { // if NONE of these are set, i.e. all of them are zero, then
+               // everything is ASCII
         size_t final_pos = pos + 16;
-        while(pos < final_pos) {
+        while (pos < final_pos) {
           *latin_output++ = char(buf[pos]);
           pos++;
         }
@@ -14121,12 +13313,25 @@ inline size_t convert_valid(const char* buf, size_t len, char* latin_output) {
       // converting one ASCII byte !!!
       *latin_output++ = char(leading_byte);
       pos++;
-    } else if ((leading_byte & 0b11100000) == 0b11000000) { // the first three bits indicate:
+    } else if ((leading_byte & 0b11100000) ==
+               0b11000000) { // the first three bits indicate:
       // We have a two-byte UTF-8
-      if(pos + 1 >= len) { break; } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) { return 0; } // checks if the next byte is a valid continuation byte in UTF-8. A valid continuation byte starts with 10.
+      if (pos + 1 >= len) {
+        break;
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      } // checks if the next byte is a valid continuation byte in UTF-8. A
+        // valid continuation byte starts with 10.
       // range check -
-      uint32_t code_point = (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111); // assembles the Unicode code point from the two bytes. It does this by discarding the leading 110 and 10 bits from the two bytes, shifting the remaining bits of the first byte, and then combining the results with a bitwise OR operation.
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 |
+          (data[pos + 1] &
+           0b00111111); // assembles the Unicode code point from the two bytes.
+                        // It does this by discarding the leading 110 and 10
+                        // bits from the two bytes, shifting the remaining bits
+                        // of the first byte, and then combining the results
+                        // with a bitwise OR operation.
       *latin_output++ = char(code_point);
       pos += 2;
     } else {
@@ -14137,7 +13342,7 @@ inline size_t convert_valid(const char* buf, size_t len, char* latin_output) {
   return latin_output - start;
 }
 
-} // utf8_to_latin1 namespace
+} // namespace utf8_to_latin1
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -14154,10 +13359,11 @@ namespace {
 namespace utf16_to_latin1 {
 
 template <endianness big_endian>
-inline size_t convert_valid(const char16_t* buf, size_t len, char* latin_output) {
- const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+inline size_t convert_valid(const char16_t *buf, size_t len,
+                            char *latin_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
   size_t pos = 0;
-  char* start{latin_output};
+  char *start{latin_output};
   uint16_t word = 0;
 
   while (pos < len) {
@@ -14169,7 +13375,7 @@ inline size_t convert_valid(const char16_t* buf, size_t len, char* latin_output)
   return latin_output - start;
 }
 
-} // utf16_to_latin1 namespace
+} // namespace utf16_to_latin1
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -14185,40 +13391,42 @@ namespace scalar {
 namespace {
 namespace utf32_to_latin1 {
 
-inline size_t convert_valid(const char32_t *buf, size_t len, char *latin1_output) {
+inline size_t convert_valid(const char32_t *buf, size_t len,
+                            char *latin1_output) {
   const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  char* start = latin1_output;
+  char *start = latin1_output;
   uint32_t utf32_char;
   size_t pos = 0;
 
   while (pos < len) {
-      utf32_char = (uint32_t) data[pos];
-
-      if (pos + 2 <= len) { // if it is safe to read 8 more bytes, check that they are Latin1
-          uint64_t v;
-          ::memcpy(&v, data + pos, sizeof(uint64_t));
-          if ((v & 0xFFFFFF00FFFFFF00) == 0) {
-              *latin1_output++ = char(buf[pos]);
-              *latin1_output++ = char(buf[pos + 1]);
-              pos += 2;
-              continue;
-          } else {
-              // output can not be represented in latin1
-              return 0;
-          }
-      }
-      if ((utf32_char & 0xFFFFFF00) == 0) {
-          *latin1_output++ = char(utf32_char);
+    utf32_char = (uint32_t)data[pos];
+
+    if (pos + 2 <=
+        len) { // if it is safe to read 8 more bytes, check that they are Latin1
+      uint64_t v;
+      ::memcpy(&v, data + pos, sizeof(uint64_t));
+      if ((v & 0xFFFFFF00FFFFFF00) == 0) {
+        *latin1_output++ = char(buf[pos]);
+        *latin1_output++ = char(buf[pos + 1]);
+        pos += 2;
+        continue;
       } else {
-          // output can not be represented in latin1
-          return 0;
+        // output can not be represented in latin1
+        return 0;
       }
-      pos++;
+    }
+    if ((utf32_char & 0xFFFFFF00) == 0) {
+      *latin1_output++ = char(utf32_char);
+    } else {
+      // output can not be represented in latin1
+      return 0;
+    }
+    pos++;
   }
   return latin1_output - start;
 }
 
-} // utf32_to_latin1 namespace
+} // namespace utf32_to_latin1
 } // unnamed namespace
 } // namespace scalar
 } // namespace simdutf
@@ -14226,12 +13434,9 @@ inline size_t convert_valid(const char32_t *buf, size_t len, char *latin1_output
 #endif
 /* end file src/scalar/utf32_to_latin1/valid_utf32_to_latin1.h */
 
-
-
 SIMDUTF_PUSH_DISABLE_WARNINGS
 SIMDUTF_DISABLE_UNDESIRED_WARNINGS
 
-
 #if SIMDUTF_IMPLEMENTATION_ARM64
 /* begin file src/arm64/implementation.cpp */
 /* begin file src/simdutf/arm64/begin.h */
@@ -14242,31 +13447,36 @@ namespace simdutf {
 namespace arm64 {
 namespace {
 #ifndef SIMDUTF_ARM64_H
-#error "arm64.h must be included"
+  #error "arm64.h must be included"
 #endif
 using namespace simd;
 
-simdutf_really_inline bool is_ascii(const simd8x64<uint8_t>& input) {
-    simd8<uint8_t> bits = input.reduce_or();
-    return bits.max_val() < 0b10000000u;
+simdutf_really_inline bool is_ascii(const simd8x64<uint8_t> &input) {
+  simd8<uint8_t> bits = input.reduce_or();
+  return bits.max_val() < 0b10000000u;
 }
 
-simdutf_unused simdutf_really_inline simd8<bool> must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2, const simd8<uint8_t> prev3) {
-    simd8<bool> is_second_byte = prev1 >= uint8_t(0b11000000u);
-    simd8<bool> is_third_byte  = prev2 >= uint8_t(0b11100000u);
-    simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
-    // Use ^ instead of | for is_*_byte, because ^ is commutative, and the caller is using ^ as well.
-    // This will work fine because we only have to report errors for cases with 0-1 lead bytes.
-    // Multiple lead bytes implies 2 overlapping multibyte characters, and if that happens, there is
-    // guaranteed to be at least *one* lead byte that is part of only 1 other multibyte character.
-    // The error will be detected there.
-    return is_second_byte ^ is_third_byte ^ is_fourth_byte;
+simdutf_unused simdutf_really_inline simd8<bool>
+must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2,
+                     const simd8<uint8_t> prev3) {
+  simd8<bool> is_second_byte = prev1 >= uint8_t(0b11000000u);
+  simd8<bool> is_third_byte = prev2 >= uint8_t(0b11100000u);
+  simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
+  // Use ^ instead of | for is_*_byte, because ^ is commutative, and the caller
+  // is using ^ as well. This will work fine because we only have to report
+  // errors for cases with 0-1 lead bytes. Multiple lead bytes implies 2
+  // overlapping multibyte characters, and if that happens, there is guaranteed
+  // to be at least *one* lead byte that is part of only 1 other multibyte
+  // character. The error will be detected there.
+  return is_second_byte ^ is_third_byte ^ is_fourth_byte;
 }
 
-simdutf_really_inline simd8<bool> must_be_2_3_continuation(const simd8<uint8_t> prev2, const simd8<uint8_t> prev3) {
-    simd8<bool> is_third_byte  = prev2 >= uint8_t(0b11100000u);
-    simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
-    return is_third_byte ^ is_fourth_byte;
+simdutf_really_inline simd8<bool>
+must_be_2_3_continuation(const simd8<uint8_t> prev2,
+                         const simd8<uint8_t> prev3) {
+  simd8<bool> is_third_byte = prev2 >= uint8_t(0b11100000u);
+  simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
+  return is_third_byte ^ is_fourth_byte;
 }
 
 // common functions for utf8 conversions
@@ -14274,7 +13484,8 @@ simdutf_really_inline uint16x4_t convert_utf8_3_byte_to_utf16(uint8x16_t in) {
   // Low half contains  10cccccc|1110aaaa
   // High half contains 10bbbbbb|10bbbbbb
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  const uint8x16_t sh = simdutf_make_uint8x16_t(0, 2, 3, 5, 6, 8, 9, 11, 1, 1, 4, 4, 7, 7, 10, 10);
+  const uint8x16_t sh = simdutf_make_uint8x16_t(0, 2, 3, 5, 6, 8, 9, 11, 1, 1,
+                                                4, 4, 7, 7, 10, 10);
 #else
   const uint8x16_t sh = {0, 2, 3, 5, 6, 8, 9, 11, 1, 1, 4, 4, 7, 7, 10, 10};
 #endif
@@ -14303,8 +13514,8 @@ simdutf_really_inline uint16x4_t convert_utf8_3_byte_to_utf16(uint8x16_t in) {
 simdutf_really_inline uint16x8_t convert_utf8_2_byte_to_utf16(uint8x16_t in) {
   // Converts 6 2 byte UTF-8 characters to 6 UTF-16 characters.
   // Technically this calculates 8, but 6 does better and happens more often
-  // (The languages which use these codepoints use ASCII spaces so 8 would need to be
-  // in the middle of a very long word).
+  // (The languages which use these codepoints use ASCII spaces so 8 would need
+  // to be in the middle of a very long word).
 
   // 10bbbbbb 110aaaaa
   uint16x8_t upper = vreinterpretq_u16_u8(in);
@@ -14319,12 +13530,14 @@ simdutf_really_inline uint16x8_t convert_utf8_2_byte_to_utf16(uint8x16_t in) {
   return composed;
 }
 
-simdutf_really_inline uint16x8_t convert_utf8_1_to_2_byte_to_utf16(uint8x16_t in, size_t shufutf8_idx) {
+simdutf_really_inline uint16x8_t
+convert_utf8_1_to_2_byte_to_utf16(uint8x16_t in, size_t shufutf8_idx) {
   // Converts 6 1-2 byte UTF-8 characters to 6 UTF-16 characters.
   // This is a relatively easy scenario
-  // we process SIX (6) input code-code units. The max length in bytes of six code
-  // code units spanning between 1 and 2 bytes each is 12 bytes.
-  uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t*>(simdutf::tables::utf8_to_utf16::shufutf8[shufutf8_idx]));
+  // we process SIX (6) input code units. The max length in bytes of six
+  // code units spanning between 1 and 2 bytes each is 12 bytes.
+  uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
+      simdutf::tables::utf8_to_utf16::shufutf8[shufutf8_idx]));
   // Shuffle
   // 1 byte: 00000000 0bbbbbbb
   // 2 byte: 110aaaaa 10bbbbbb
@@ -14345,207 +13558,269 @@ simdutf_really_inline uint16x8_t convert_utf8_1_to_2_byte_to_utf16(uint8x16_t in
 
 /* begin file src/arm64/arm_validate_utf16.cpp */
 template <endianness big_endian>
-const char16_t* arm_validate_utf16(const char16_t* input, size_t size) {
-    const char16_t* end = input + size;
-    const auto v_d8 = simd8<uint8_t>::splat(0xd8);
-    const auto v_f8 = simd8<uint8_t>::splat(0xf8);
-    const auto v_fc = simd8<uint8_t>::splat(0xfc);
-    const auto v_dc = simd8<uint8_t>::splat(0xdc);
-    while (end - input >= 16) {
-        // 0. Load data: since the validation takes into account only higher
-        //    byte of each word, we compress the two vectors into one which
-        //    consists only the higher bytes.
-        auto in0 = simd16<uint16_t>(input);
-        auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
-        if (!match_system(big_endian)) {
-            in0 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in0)));
-            in1 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in1)));
-        }
-        const auto t0 = in0.shr<8>();
-        const auto t1 = in1.shr<8>();
-        const simd8<uint8_t> in = simd16<uint16_t>::pack(t0, t1);
-        // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
-        const uint64_t surrogates_wordmask = ((in & v_f8) == v_d8).to_bitmask64();
-        if(surrogates_wordmask == 0) {
-            input += 16;
-        } else {
-            // 2. We have some surrogates that have to be distinguished:
-            //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
-            //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
-            //
-            //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
-
-            // V - non-surrogate code units
-            //     V = not surrogates_wordmask
-            const uint64_t V = ~surrogates_wordmask;
-
-            // H - word-mask for high surrogates: the six highest bits are 0b1101'11
-            const auto vH = ((in & v_fc) ==  v_dc);
-            const uint64_t H = vH.to_bitmask64();
-
-            // L - word mask for low surrogates
-            //     L = not H and surrogates_wordmask
-            const uint64_t L = ~H & surrogates_wordmask;
-
-            const uint64_t a = L & (H >> 4); // A low surrogate must be followed by high one.
-                              // (A low surrogate placed in the 7th register's word
-                              // is an exception we handle.)
-            const uint64_t b = a << 4; // Just mark that the opposite fact is hold,
-                          // thanks to that we have only two masks for valid case.
-            const uint64_t c = V | a | b;      // Combine all the masks into the final one.
-            if (c == ~0ull) {
-                // The whole input register contains valid UTF-16, i.e.,
-                // either single code units or proper surrogate pairs.
-                input += 16;
-            } else if (c == 0xfffffffffffffffull) {
-                // The 15 lower code units of the input register contains valid UTF-16.
-                // The 15th word may be either a low or high surrogate. It the next
-                // iteration we 1) check if the low surrogate is followed by a high
-                // one, 2) reject sole high surrogate.
-                input += 15;
-            } else {
-                return nullptr;
-            }
-        }
+const char16_t *arm_validate_utf16(const char16_t *input, size_t size) {
+  const char16_t *end = input + size;
+  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
+  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
+  const auto v_fc = simd8<uint8_t>::splat(0xfc);
+  const auto v_dc = simd8<uint8_t>::splat(0xdc);
+  while (end - input >= 16) {
+    // 0. Load data: since the validation takes into account only higher
+    //    byte of each word, we compress the two vectors into one which
+    //    consists only of the higher bytes.
+    auto in0 = simd16<uint16_t>(input);
+    auto in1 =
+        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
+    if (!match_system(big_endian)) {
+      in0 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in0)));
+      in1 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in1)));
+    }
+    const auto t0 = in0.shr<8>();
+    const auto t1 = in1.shr<8>();
+    const simd8<uint8_t> in = simd16<uint16_t>::pack(t0, t1);
+    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
+    const uint64_t surrogates_wordmask = ((in & v_f8) == v_d8).to_bitmask64();
+    if (surrogates_wordmask == 0) {
+      input += 16;
+    } else {
+      // 2. We have some surrogates that have to be distinguished:
+      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
+      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
+      //
+      //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
+
+      // V - non-surrogate code units
+      //     V = not surrogates_wordmask
+      const uint64_t V = ~surrogates_wordmask;
+
+      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
+      const auto vH = ((in & v_fc) == v_dc);
+      const uint64_t H = vH.to_bitmask64();
+
+      // L - word mask for low surrogates
+      //     L = not H and surrogates_wordmask
+      const uint64_t L = ~H & surrogates_wordmask;
+
+      const uint64_t a =
+          L & (H >> 4); // A low surrogate must be followed by a high one.
+                        // (A low surrogate placed in the 7th register's word
+                        // is an exception we handle.)
+      const uint64_t b =
+          a << 4; // Just mark that the opposite fact also holds,
+                  // thanks to that we have only two masks for valid case.
+      const uint64_t c = V | a | b; // Combine all the masks into the final one.
+      if (c == ~0ull) {
+        // The whole input register contains valid UTF-16, i.e.,
+        // either single code units or proper surrogate pairs.
+        input += 16;
+      } else if (c == 0xfffffffffffffffull) {
+        // The 15 lower code units of the input register contain valid UTF-16.
+        // The 15th word may be either a low or high surrogate. In the next
+        // iteration we 1) check if the low surrogate is followed by a high
+        // one, 2) reject sole high surrogate.
+        input += 15;
+      } else {
+        return nullptr;
+      }
     }
-    return input;
+  }
+  return input;
 }
 
-
 template <endianness big_endian>
-const result arm_validate_utf16_with_errors(const char16_t* input, size_t size) {
-    const char16_t* start = input;
-    const char16_t* end = input + size;
-
-    const auto v_d8 = simd8<uint8_t>::splat(0xd8);
-    const auto v_f8 = simd8<uint8_t>::splat(0xf8);
-    const auto v_fc = simd8<uint8_t>::splat(0xfc);
-    const auto v_dc = simd8<uint8_t>::splat(0xdc);
-    while (input + 16 < end) {
-        // 0. Load data: since the validation takes into account only higher
-        //    byte of each word, we compress the two vectors into one which
-        //    consists only the higher bytes.
-        auto in0 = simd16<uint16_t>(input);
-        auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
-
-        if (!match_system(big_endian)) {
-            in0 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in0)));
-            in1 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in1)));
-        }
-        const auto t0 = in0.shr<8>();
-        const auto t1 = in1.shr<8>();
-        const simd8<uint8_t> in = simd16<uint16_t>::pack(t0, t1);
-        // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
-        const uint64_t surrogates_wordmask = ((in & v_f8) == v_d8).to_bitmask64();
-        if(surrogates_wordmask == 0) {
-            input += 16;
-        } else {
-            // 2. We have some surrogates that have to be distinguished:
-            //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
-            //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
-            //
-            //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
-
-            // V - non-surrogate code units
-            //     V = not surrogates_wordmask
-            const uint64_t V = ~surrogates_wordmask;
-
-            // H - word-mask for high surrogates: the six highest bits are 0b1101'11
-            const auto vH = ((in & v_fc) ==  v_dc);
-            const uint64_t H = vH.to_bitmask64();
-
-            // L - word mask for low surrogates
-            //     L = not H and surrogates_wordmask
-            const uint64_t L = ~H & surrogates_wordmask;
-
-            const uint64_t a = L & (H >> 4); // A low surrogate must be followed by high one.
-                              // (A low surrogate placed in the 7th register's word
-                              // is an exception we handle.)
-            const uint64_t b = a << 4; // Just mark that the opposite fact is hold,
-                          // thanks to that we have only two masks for valid case.
-            const uint64_t c = V | a | b;      // Combine all the masks into the final one.
-            if (c == ~0ull) {
-                // The whole input register contains valid UTF-16, i.e.,
-                // either single code units or proper surrogate pairs.
-                input += 16;
-            } else if (c == 0xfffffffffffffffull) {
-                // The 15 lower code units of the input register contains valid UTF-16.
-                // The 15th word may be either a low or high surrogate. It the next
-                // iteration we 1) check if the low surrogate is followed by a high
-                // one, 2) reject sole high surrogate.
-                input += 15;
-            } else {
-                return result(error_code::SURROGATE, input - start);
-            }
-        }
+const result arm_validate_utf16_with_errors(const char16_t *input,
+                                            size_t size) {
+  const char16_t *start = input;
+  const char16_t *end = input + size;
+
+  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
+  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
+  const auto v_fc = simd8<uint8_t>::splat(0xfc);
+  const auto v_dc = simd8<uint8_t>::splat(0xdc);
+  while (input + 16 < end) {
+    // 0. Load data: since the validation takes into account only higher
+    //    byte of each word, we compress the two vectors into one which
+    //    consists only of the higher bytes.
+    auto in0 = simd16<uint16_t>(input);
+    auto in1 =
+        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
+
+    if (!match_system(big_endian)) {
+      in0 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in0)));
+      in1 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in1)));
+    }
+    const auto t0 = in0.shr<8>();
+    const auto t1 = in1.shr<8>();
+    const simd8<uint8_t> in = simd16<uint16_t>::pack(t0, t1);
+    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
+    const uint64_t surrogates_wordmask = ((in & v_f8) == v_d8).to_bitmask64();
+    if (surrogates_wordmask == 0) {
+      input += 16;
+    } else {
+      // 2. We have some surrogates that have to be distinguished:
+      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
+      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
+      //
+      //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
+
+      // V - non-surrogate code units
+      //     V = not surrogates_wordmask
+      const uint64_t V = ~surrogates_wordmask;
+
+      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
+      const auto vH = ((in & v_fc) == v_dc);
+      const uint64_t H = vH.to_bitmask64();
+
+      // L - word mask for low surrogates
+      //     L = not H and surrogates_wordmask
+      const uint64_t L = ~H & surrogates_wordmask;
+
+      const uint64_t a =
+          L & (H >> 4); // A low surrogate must be followed by a high one.
+                        // (A low surrogate placed in the 7th register's word
+                        // is an exception we handle.)
+      const uint64_t b =
+          a << 4; // Just mark that the opposite fact also holds,
+                  // thanks to that we have only two masks for valid case.
+      const uint64_t c = V | a | b; // Combine all the masks into the final one.
+      if (c == ~0ull) {
+        // The whole input register contains valid UTF-16, i.e.,
+        // either single code units or proper surrogate pairs.
+        input += 16;
+      } else if (c == 0xfffffffffffffffull) {
+        // The 15 lower code units of the input register contain valid UTF-16.
+        // The 15th word may be either a low or high surrogate. In the next
+        // iteration we 1) check if the low surrogate is followed by a high
+        // one, 2) reject sole high surrogate.
+        input += 15;
+      } else {
+        return result(error_code::SURROGATE, input - start);
+      }
     }
-    return result(error_code::SUCCESS, input - start);
+  }
+  return result(error_code::SUCCESS, input - start);
 }
 /* end file src/arm64/arm_validate_utf16.cpp */
 /* begin file src/arm64/arm_validate_utf32le.cpp */
 
-const char32_t* arm_validate_utf32le(const char32_t* input, size_t size) {
-    const char32_t* end = input + size;
+const char32_t *arm_validate_utf32le(const char32_t *input, size_t size) {
+  const char32_t *end = input + size;
 
-    const uint32x4_t standardmax = vmovq_n_u32(0x10ffff);
-    const uint32x4_t offset = vmovq_n_u32(0xffff2000);
-    const uint32x4_t standardoffsetmax = vmovq_n_u32(0xfffff7ff);
-    uint32x4_t currentmax = vmovq_n_u32(0x0);
-    uint32x4_t currentoffsetmax = vmovq_n_u32(0x0);
+  const uint32x4_t standardmax = vmovq_n_u32(0x10ffff);
+  const uint32x4_t offset = vmovq_n_u32(0xffff2000);
+  const uint32x4_t standardoffsetmax = vmovq_n_u32(0xfffff7ff);
+  uint32x4_t currentmax = vmovq_n_u32(0x0);
+  uint32x4_t currentoffsetmax = vmovq_n_u32(0x0);
 
-    while (end - input >= 4) {
-        const uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t*>(input));
-        currentmax = vmaxq_u32(in,currentmax);
-        currentoffsetmax = vmaxq_u32(vaddq_u32(in, offset), currentoffsetmax);
-        input += 4;
-    }
+  while (end - input >= 4) {
+    const uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(input));
+    currentmax = vmaxq_u32(in, currentmax);
+    currentoffsetmax = vmaxq_u32(vaddq_u32(in, offset), currentoffsetmax);
+    input += 4;
+  }
 
-    uint32x4_t is_zero = veorq_u32(vmaxq_u32(currentmax, standardmax), standardmax);
-    if(vmaxvq_u32(is_zero) != 0) {
-        return nullptr;
-    }
+  uint32x4_t is_zero =
+      veorq_u32(vmaxq_u32(currentmax, standardmax), standardmax);
+  if (vmaxvq_u32(is_zero) != 0) {
+    return nullptr;
+  }
 
-    is_zero = veorq_u32(vmaxq_u32(currentoffsetmax, standardoffsetmax), standardoffsetmax);
-    if(vmaxvq_u32(is_zero) != 0) {
-        return nullptr;
-    }
+  is_zero = veorq_u32(vmaxq_u32(currentoffsetmax, standardoffsetmax),
+                      standardoffsetmax);
+  if (vmaxvq_u32(is_zero) != 0) {
+    return nullptr;
+  }
 
-    return input;
+  return input;
 }
 
+const result arm_validate_utf32le_with_errors(const char32_t *input,
+                                              size_t size) {
+  const char32_t *start = input;
+  const char32_t *end = input + size;
 
-const result arm_validate_utf32le_with_errors(const char32_t* input, size_t size) {
-    const char32_t* start = input;
-    const char32_t* end = input + size;
+  const uint32x4_t standardmax = vmovq_n_u32(0x10ffff);
+  const uint32x4_t offset = vmovq_n_u32(0xffff2000);
+  const uint32x4_t standardoffsetmax = vmovq_n_u32(0xfffff7ff);
+  uint32x4_t currentmax = vmovq_n_u32(0x0);
+  uint32x4_t currentoffsetmax = vmovq_n_u32(0x0);
 
-    const uint32x4_t standardmax = vmovq_n_u32(0x10ffff);
-    const uint32x4_t offset = vmovq_n_u32(0xffff2000);
-    const uint32x4_t standardoffsetmax = vmovq_n_u32(0xfffff7ff);
-    uint32x4_t currentmax = vmovq_n_u32(0x0);
-    uint32x4_t currentoffsetmax = vmovq_n_u32(0x0);
+  while (end - input >= 4) {
+    const uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(input));
+    currentmax = vmaxq_u32(in, currentmax);
+    currentoffsetmax = vmaxq_u32(vaddq_u32(in, offset), currentoffsetmax);
 
-    while (end - input >= 4) {
-        const uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t*>(input));
-        currentmax = vmaxq_u32(in,currentmax);
-        currentoffsetmax = vmaxq_u32(vaddq_u32(in, offset), currentoffsetmax);
+    uint32x4_t is_zero =
+        veorq_u32(vmaxq_u32(currentmax, standardmax), standardmax);
+    if (vmaxvq_u32(is_zero) != 0) {
+      return result(error_code::TOO_LARGE, input - start);
+    }
 
-        uint32x4_t is_zero = veorq_u32(vmaxq_u32(currentmax, standardmax), standardmax);
-        if(vmaxvq_u32(is_zero) != 0) {
-            return result(error_code::TOO_LARGE, input - start);
-        }
+    is_zero = veorq_u32(vmaxq_u32(currentoffsetmax, standardoffsetmax),
+                        standardoffsetmax);
+    if (vmaxvq_u32(is_zero) != 0) {
+      return result(error_code::SURROGATE, input - start);
+    }
 
-        is_zero = veorq_u32(vmaxq_u32(currentoffsetmax, standardoffsetmax), standardoffsetmax);
-        if(vmaxvq_u32(is_zero) != 0) {
-            return result(error_code::SURROGATE, input - start);
-        }
+    input += 4;
+  }
 
-        input += 4;
+  return result(error_code::SUCCESS, input - start);
+}
+/* end file src/arm64/arm_validate_utf32le.cpp */
+
+/* begin file src/arm64/arm_convert_latin1_to_utf16.cpp */
+template <endianness big_endian>
+std::pair<const char *, char16_t *>
+arm_convert_latin1_to_utf16(const char *buf, size_t len,
+                            char16_t *utf16_output) {
+  const char *end = buf + len;
+
+  while (end - buf >= 16) {
+    uint8x16_t in8 = vld1q_u8(reinterpret_cast<const uint8_t *>(buf));
+    uint16x8_t inlow = vmovl_u8(vget_low_u8(in8));
+    if (!match_system(big_endian)) {
+      inlow = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(inlow)));
     }
+    vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output), inlow);
+    uint16x8_t inhigh = vmovl_u8(vget_high_u8(in8));
+    if (!match_system(big_endian)) {
+      inhigh = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(inhigh)));
+    }
+    vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output + 8), inhigh);
+    utf16_output += 16;
+    buf += 16;
+  }
 
-    return result(error_code::SUCCESS, input - start);
+  return std::make_pair(buf, utf16_output);
 }
-/* end file src/arm64/arm_validate_utf32le.cpp */
+/* end file src/arm64/arm_convert_latin1_to_utf16.cpp */
+/* begin file src/arm64/arm_convert_latin1_to_utf32.cpp */
+std::pair<const char *, char32_t *>
+arm_convert_latin1_to_utf32(const char *buf, size_t len,
+                            char32_t *utf32_output) {
+  const char *end = buf + len;
 
+  while (end - buf >= 16) {
+    uint8x16_t in8 = vld1q_u8(reinterpret_cast<const uint8_t *>(buf));
+    uint16x8_t in8low = vmovl_u8(vget_low_u8(in8));
+    uint32x4_t in16lowlow = vmovl_u16(vget_low_u16(in8low));
+    uint32x4_t in16lowhigh = vmovl_u16(vget_high_u16(in8low));
+    uint16x8_t in8high = vmovl_u8(vget_high_u8(in8));
+    uint32x4_t in8highlow = vmovl_u16(vget_low_u16(in8high));
+    uint32x4_t in8highhigh = vmovl_u16(vget_high_u16(in8high));
+    vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output), in16lowlow);
+    vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output + 4), in16lowhigh);
+    vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output + 8), in8highlow);
+    vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output + 12), in8highhigh);
+
+    utf32_output += 16;
+    buf += 16;
+  }
+
+  return std::make_pair(buf, utf32_output);
+}
+/* end file src/arm64/arm_convert_latin1_to_utf32.cpp */
 /* begin file src/arm64/arm_convert_latin1_to_utf8.cpp */
 /*
   Returns a pair: the first unprocessed byte from buf and utf8_output
@@ -14594,8 +13869,8 @@ arm_convert_latin1_to_utf8(const char *latin1_input, size_t len,
         vreinterpretq_u8_u16(vbslq_u16(one_byte_bytemask, in16, t4));
     // 3. prepare bitmask for 8-bit lookup
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-    const uint16x8_t mask = simdutf_make_uint16x8_t(0x0001, 0x0004, 0x0010, 0x0040,
-                                            0x0002, 0x0008, 0x0020, 0x0080);
+    const uint16x8_t mask = simdutf_make_uint16x8_t(
+        0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
 #else
     const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
                              0x0002, 0x0008, 0x0020, 0x0080};
@@ -14618,117 +13893,149 @@ arm_convert_latin1_to_utf8(const char *latin1_input, size_t len,
   return std::make_pair(latin1_input, reinterpret_cast<char *>(utf8_output));
 }
 /* end file src/arm64/arm_convert_latin1_to_utf8.cpp */
-/* begin file src/arm64/arm_convert_latin1_to_utf16.cpp */
-template <endianness big_endian>
-std::pair<const char*, char16_t*> arm_convert_latin1_to_utf16(const char* buf, size_t len, char16_t* utf16_output) {
-    const char* end = buf + len;
-
-    while (end - buf >= 16) {
-        uint8x16_t in8 = vld1q_u8(reinterpret_cast<const uint8_t *>(buf));
-        uint16x8_t inlow = vmovl_u8(vget_low_u8(in8));
-        if (!match_system(big_endian)) { inlow = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(inlow))); }
-        vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output), inlow);
-        uint16x8_t inhigh = vmovl_u8(vget_high_u8(in8));
-        if (!match_system(big_endian)) { inhigh = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(inhigh))); }
-        vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output+8), inhigh);
-        utf16_output += 16;
-        buf += 16;
-    }
-
-    return std::make_pair(buf, utf16_output);
-}
-/* end file src/arm64/arm_convert_latin1_to_utf16.cpp */
-/* begin file src/arm64/arm_convert_latin1_to_utf32.cpp */
-std::pair<const char*, char32_t*> arm_convert_latin1_to_utf32(const char* buf, size_t len, char32_t* utf32_output) {
-    const char* end = buf + len;
-
-    while (end - buf >= 16) {
-        uint8x16_t in8 = vld1q_u8(reinterpret_cast<const uint8_t *>(buf));
-        uint16x8_t in8low = vmovl_u8(vget_low_u8(in8));
-        uint32x4_t in16lowlow = vmovl_u16(vget_low_u16(in8low));
-        uint32x4_t in16lowhigh = vmovl_u16(vget_high_u16(in8low));
-        uint16x8_t in8high = vmovl_u8(vget_high_u8(in8));
-        uint32x4_t in8highlow = vmovl_u16(vget_low_u16(in8high));
-        uint32x4_t in8highhigh = vmovl_u16(vget_high_u16(in8high));
-        vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output), in16lowlow);
-        vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output+4), in16lowhigh);
-        vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output+8), in8highlow);
-        vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output+12), in8highhigh);
-
-        utf32_output += 16;
-        buf += 16;
-    }
-
-    return std::make_pair(buf, utf32_output);
-}
-/* end file src/arm64/arm_convert_latin1_to_utf32.cpp */
 
-/* begin file src/arm64/arm_convert_utf8_to_utf16.cpp */
+/* begin file src/arm64/arm_convert_utf8_to_latin1.cpp */
 // Convert up to 16 bytes from utf8 to utf16 using a mask indicating the
 // end of the code points. Only the least significant 12 bits of the mask
 // are accessed.
 // It returns how many bytes were consumed (up to 16, usually 12).
-template <endianness big_endian>
-size_t convert_masked_utf8_to_utf16(const char *input,
-                           uint64_t utf8_end_of_code_point_mask,
-                           char16_t *&utf16_output) {
+size_t convert_masked_utf8_to_latin1(const char *input,
+                                     uint64_t utf8_end_of_code_point_mask,
+                                     char *&latin1_output) {
   // we use an approach where we try to process up to 12 input bytes.
   // Why 12 input bytes and not 16? Because we are concerned with the size of
   // the lookup tables. Also 12 is nicely divisible by two and three.
   //
-  uint8x16_t in = vld1q_u8(reinterpret_cast<const uint8_t*>(input));
+  uint8x16_t in = vld1q_u8(reinterpret_cast<const uint8_t *>(input));
   const uint16_t input_utf8_end_of_code_point_mask =
       utf8_end_of_code_point_mask & 0xfff;
   //
-  // Optimization note: our main path below is load-latency dependent. Thus it is maybe
-  // beneficial to have fast paths that depend on branch prediction but have less latency.
-  // This results in more instructions but, potentially, also higher speeds.
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // is maybe beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
 
   // We first try a few fast paths.
   // The obvious first test is ASCII, which actually consumes the full 16.
-  if((utf8_end_of_code_point_mask & 0xFFFF) == 0xffff) {
-    // We process in chunks of 16 bytes
-    // The routine in simd.h is reused.
-    simd8<int8_t> temp{vreinterpretq_s8_u8(in)};
-    temp.store_ascii_as_utf16<big_endian>(utf16_output);
-    utf16_output += 16; // We wrote 16 16-bit characters.
-    return 16; // We consumed 16 bytes.
+  if (utf8_end_of_code_point_mask == 0xfff) {
+    // We process in chunks of 12 bytes
+    vst1q_u8(reinterpret_cast<uint8_t *>(latin1_output), in);
+    latin1_output += 12; // We wrote 12 8-bit characters.
+    return 12;           // We consumed 12 bytes.
   }
+  /// We do not have a fast path available, or the fast path is unimportant, so
+  /// we fall back.
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
 
-  // 3 byte sequences are the next most common, as seen in CJK, which has long sequences
-  // of these.
-  if (input_utf8_end_of_code_point_mask == 0x924) {
-    // We want to take 4 3-byte UTF-8 code units and turn them into 4 2-byte UTF-16 code units.
-    uint16x4_t composed = convert_utf8_3_byte_to_utf16(in);
-    // Byte swap if necessary
-    if (!match_system(big_endian)) {
-      composed = vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(composed)));
-    }
-    vst1_u16(reinterpret_cast<uint16_t*>(utf16_output), composed);
-    utf16_output += 4; // We wrote 4 16-bit characters.
-    return 12; // We consumed 12 bytes.
-  }
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
+  // this indicates an invalid input:
+  if (idx >= 64) {
+    return consumed;
+  }
+  // Here we should have (idx < 64); if not, there is a bug in the validation or
+  // elsewhere.
+  //
+  // SIX (6) input code units: this is a relatively easy scenario. The max
+  // length in bytes of six code points spanning between 1 and 2 bytes each is
+  // 12 bytes. We convert 6 1-2 byte UTF-8 characters to 6 16-bit values, then
+  // narrow them to 6 Latin-1 bytes.
+  uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
+      simdutf::tables::utf8_to_utf16::shufutf8[idx]));
+  // Shuffle
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 110aaaaa 10bbbbbb
+  uint16x8_t perm = vreinterpretq_u16_u8(vqtbl1q_u8(in, sh));
+  // Mask
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 00000000 00bbbbbb
+  uint16x8_t ascii = vandq_u16(perm, vmovq_n_u16(0x7f)); // 6 or 7 bits
+  // 1 byte: 00000000 00000000
+  // 2 byte: 000aaaaa 00000000
+  uint16x8_t highbyte = vandq_u16(perm, vmovq_n_u16(0x1f00)); // 5 bits
+  // Combine with a shift right accumulate
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 00000aaa aabbbbbb
+  uint16x8_t composed = vsraq_n_u16(ascii, highbyte, 2);
+  // writing 8 bytes even though we only care about the first 6 bytes.
+  uint8x8_t latin1_packed = vmovn_u16(composed);
+  vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
+  latin1_output += 6; // We wrote 6 bytes.
+  return consumed;
+}
+/* end file src/arm64/arm_convert_utf8_to_latin1.cpp */
+/* begin file src/arm64/arm_convert_utf8_to_utf16.cpp */
+// Convert up to 16 bytes from utf8 to utf16 using a mask indicating the
+// end of the code points. Only the least significant 12 bits of the mask
+// are accessed.
+// It returns how many bytes were consumed (up to 16, usually 12).
+template <endianness big_endian>
+size_t convert_masked_utf8_to_utf16(const char *input,
+                                    uint64_t utf8_end_of_code_point_mask,
+                                    char16_t *&utf16_output) {
+  // we use an approach where we try to process up to 12 input bytes.
+  // Why 12 input bytes and not 16? Because we are concerned with the size of
+  // the lookup tables. Also 12 is nicely divisible by two and three.
+  //
+  uint8x16_t in = vld1q_u8(reinterpret_cast<const uint8_t *>(input));
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask & 0xfff;
+  //
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // is maybe beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
+
+  // We first try a few fast paths.
+  // The obvious first test is ASCII, which actually consumes the full 16.
+  if ((utf8_end_of_code_point_mask & 0xFFFF) == 0xffff) {
+    // We process in chunks of 16 bytes
+    // The routine in simd.h is reused.
+    simd8<int8_t> temp{vreinterpretq_s8_u8(in)};
+    temp.store_ascii_as_utf16<big_endian>(utf16_output);
+    utf16_output += 16; // We wrote 16 16-bit characters.
+    return 16;          // We consumed 16 bytes.
+  }
+
+  // 3 byte sequences are the next most common, as seen in CJK, which has long
+  // sequences of these.
+  if (input_utf8_end_of_code_point_mask == 0x924) {
+    // We want to take 4 3-byte UTF-8 code units and turn them into 4 2-byte
+    // UTF-16 code units.
+    uint16x4_t composed = convert_utf8_3_byte_to_utf16(in);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      composed = vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(composed)));
+    }
+    vst1_u16(reinterpret_cast<uint16_t *>(utf16_output), composed);
+    utf16_output += 4; // We wrote 4 16-bit characters.
+    return 12;         // We consumed 12 bytes.
+  }
 
   // 2 byte sequences occur in short bursts in languages like Greek and Russian.
   if ((utf8_end_of_code_point_mask & 0xFFF) == 0xaaa) {
-    // We want to take 6 2-byte UTF-8 code units and turn them into 6 2-byte UTF-16 code units.
+    // We want to take 6 2-byte UTF-8 code units and turn them into 6 2-byte
+    // UTF-16 code units.
     uint16x8_t composed = convert_utf8_2_byte_to_utf16(in);
     // Byte swap if necessary
     if (!match_system(big_endian)) {
-      composed = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(composed)));
+      composed =
+          vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(composed)));
     }
     vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output), composed);
 
     utf16_output += 6; // We wrote 6 16-bit characters.
-    return 12; // We consumed 12 bytes.
+    return 12;         // We consumed 12 bytes.
   }
 
-  /// We do not have a fast path available, or the fast path is unimportant, so we fallback.
-  const uint8_t idx =
-      simdutf::tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][0];
+  /// We do not have a fast path available, or the fast path is unimportant, so
+  /// we fall back.
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
 
-  const uint8_t consumed =
-      simdutf::tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
 
   if (idx < 64) {
     // SIX (6) input code-code units
@@ -14736,16 +14043,18 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     uint16x8_t composed = convert_utf8_1_to_2_byte_to_utf16(in, idx);
     // Byte swap if necessary
     if (!match_system(big_endian)) {
-      composed = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(composed)));
+      composed =
+          vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(composed)));
     }
     // Store
-    vst1q_u16(reinterpret_cast<uint16_t*>(utf16_output), composed);
+    vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output), composed);
     utf16_output += 6; // We wrote 6 16-bit characters.
     return consumed;
   } else if (idx < 145) {
     // FOUR (4) input code-code units
     // UTF-16 and UTF-32 use similar algorithms, but UTF-32 skips the narrowing.
-    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t*>(simdutf::tables::utf8_to_utf16::shufutf8[idx]));
+    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
+        simdutf::tables::utf8_to_utf16::shufutf8[idx]));
     // XXX: depending on the system scalar instructions might be faster.
     // 1 byte: 00000000 00000000 0ccccccc
     // 2 byte: 00000000 110bbbbb 10cccccc
@@ -14783,19 +14092,19 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     if (!match_system(big_endian)) {
       composed = vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(composed)));
     }
-    vst1_u16(reinterpret_cast<uint16_t*>(utf16_output), composed);
+    vst1_u16(reinterpret_cast<uint16_t *>(utf16_output), composed);
 
     utf16_output += 4; // We wrote 4 16-bit codepoints
     return consumed;
   } else if (idx < 209) {
     // THREE (3) input code-code units
     if (input_utf8_end_of_code_point_mask == 0x888) {
-      // We want to take 3 4-byte UTF-8 code units and turn them into 3 4-byte UTF-16 pairs.
-      // Generating surrogate pairs is a little tricky though, but it is easier when we
-      // can assume they are all pairs.
-      // This version does not use the LUT, but 4 byte sequences are less common and the
-      // overhead of the extra memory access is less important than the early branch overhead
-      // in shorter sequences.
+      // We want to take 3 4-byte UTF-8 code units and turn them into 3 4-byte
+      // UTF-16 pairs. Generating surrogate pairs is a little tricky though, but
+      // it is easier when we can assume they are all pairs. This version does
+      // not use the LUT, but 4 byte sequences are less common and the overhead
+      // of the extra memory access is less important than the early branch
+      // overhead in shorter sequences.
 
       // Swap byte pairs
       // 10dddddd 10cccccc|10bbbbbb 11110aaa
@@ -14804,10 +14113,9 @@ size_t convert_masked_utf8_to_utf16(const char *input,
       // Shift left 2 bits
       // cccccc00 dddddd00 xxxxxxxx bbbbbb00
       uint32x4_t shift = vreinterpretq_u32_u8(vshlq_n_u8(swap, 2));
-      // Create a magic number containing the low 2 bits of the trail surrogate and all the
-      // corrections needed to create the pair.
-      // UTF-8 4b prefix   = -0x0000|0xF000
-      // surrogate offset  = -0x0000|0x0040 (0x10000 << 6)
+      // Magic number: trail surrogate's low 2 bits plus all pair corrections.
+      // UTF-8 4b prefix   = -0x0000|0xF000
+      // surrogate offset  = -0x0000|0x0040 (0x10000 << 6)
       // surrogate high    = +0x0000|0xD800
       // surrogate low     = +0xDC00|0x0000
       // -------------------------------
@@ -14815,34 +14123,42 @@ size_t convert_masked_utf8_to_utf16(const char *input,
       uint32x4_t magic = vmovq_n_u32(0xDC00E7C0);
       // Generate unadjusted trail surrogate minus lowest 2 bits
       // xxxxxxxx xxxxxxxx|11110aaa bbbbbb00
-      uint32x4_t trail = vbslq_u32(vmovq_n_u32(0x0000FF00), vreinterpretq_u32_u8(swap), shift);
+      uint32x4_t trail =
+          vbslq_u32(vmovq_n_u32(0x0000FF00), vreinterpretq_u32_u8(swap), shift);
       // Insert low 2 bits of trail surrogate to magic number for later
       // 11011100 00000000 11100111 110000cc
-      uint16x8_t magic_with_low_2 = vreinterpretq_u16_u32(vsraq_n_u32(magic, shift, 30));
+      uint16x8_t magic_with_low_2 =
+          vreinterpretq_u16_u32(vsraq_n_u32(magic, shift, 30));
       // Generate lead surrogate
       // xxxxcccc ccdddddd|xxxxxxxx xxxxxxxx
-      uint32x4_t lead = vreinterpretq_u32_u16(vsliq_n_u16(vreinterpretq_u16_u8(swap), vreinterpretq_u16_u8(in), 6));
+      uint32x4_t lead = vreinterpretq_u32_u16(
+          vsliq_n_u16(vreinterpretq_u16_u8(swap), vreinterpretq_u16_u8(in), 6));
       // Mask out lead
       // 000000cc ccdddddd|xxxxxxxx xxxxxxxx
       lead = vbicq_u32(lead, vmovq_n_u32(uint32_t(~0x03FFFFFF)));
       // Blend pairs
       // 000000cc ccdddddd|11110aaa bbbbbb00
-      uint16x8_t blend = vreinterpretq_u16_u32(vbslq_u32(vmovq_n_u32(0x0000FFFF), trail, lead));
+      uint16x8_t blend = vreinterpretq_u16_u32(
+          vbslq_u32(vmovq_n_u32(0x0000FFFF), trail, lead));
       // Add magic number to finish the result
       // 110111CC CCDDDDDD|110110AA BBBBBBCC
       uint16x8_t composed = vaddq_u16(blend, magic_with_low_2);
       // Byte swap if necessary
       if (!match_system(big_endian)) {
-        composed = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(composed)));
+        composed =
+            vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(composed)));
       }
       uint16_t buffer[8];
       vst1q_u16(reinterpret_cast<uint16_t *>(buffer), composed);
-      for(int k = 0; k < 6; k++) { utf16_output[k] = buffer[k]; } // the loop might compiler to a couple of instructions.
+      for (int k = 0; k < 6; k++) {
+        utf16_output[k] = buffer[k];
+      } // the loop might compile to a couple of instructions.
       utf16_output += 6; // We wrote 3 32-bit surrogate pairs.
-      return 12; // We consumed 12 bytes.
+      return 12;         // We consumed 12 bytes.
     }
     // 3 1-4 byte sequences
-    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t*>(simdutf::tables::utf8_to_utf16::shufutf8[idx]));
+    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
+        simdutf::tables::utf8_to_utf16::shufutf8[idx]));
 
     // 1 byte: 00000000 00000000 00000000 0ddddddd
     // 3 byte: 00000000 00000000 110ccccc 10dddddd
@@ -14850,52 +14166,56 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     // 4 byte: 11110aaa 10bbbbbb 10cccccc 10dddddd
     uint32x4_t perm = vreinterpretq_u32_u8(vqtbl1q_u8(in, sh));
     // added to fix issue https://github.com/simdutf/simdutf/issues/514
-    // We only want to write 2 * 16-bit code units when that is actually what we have.
-    // Unfortunately, we cannot trust the input. So it is possible to get 0xff as an input byte
-    // and it should not result in a surrogate pair. We need to check for that.
+    // We only want to write 2 * 16-bit code units when that is actually what we
+    // have. Unfortunately, we cannot trust the input. So it is possible to get
+    // 0xff as an input byte and it should not result in a surrogate pair. We
+    // need to check for that.
     uint32_t permbuffer[4];
     vst1q_u32(permbuffer, perm);
     // Mask the low and middle bytes
     // 00000000 00000000 00000000 0ddddddd
     uint32x4_t ascii = vandq_u32(perm, vmovq_n_u32(0x7f));
-    // Because the surrogates need more work, the high surrogate is computed first.
+    // Because the surrogates need more work, the high surrogate is computed
+    // first.
     uint32x4_t middlehigh = vshlq_n_u32(perm, 2);
     // 00000000 00000000 00cccccc 00000000
     uint32x4_t middlebyte = vandq_u32(perm, vmovq_n_u32(0x3F00));
-    // Start assembling the sequence. Since the 4th byte is in the same position as it
-    // would be in a surrogate and there is no dependency, shift left instead of right.
-    // 3 byte: 00000000 10bbbbxx xxxxxxxx xxxxxxxx
-    // 4 byte: 11110aaa bbbbbbxx xxxxxxxx xxxxxxxx
+    // Start assembling the sequence. Since the 4th byte is in the same position as it
+    // would be in a surrogate and there is no dependency, shift left instead of right.
+    // 3 byte: 00000000 10bbbbxx xxxxxxxx xxxxxxxx
+    // 4 byte: 11110aaa bbbbbbxx xxxxxxxx xxxxxxxx
     uint32x4_t ab = vbslq_u32(vmovq_n_u32(0xFF000000), perm, middlehigh);
-    // Top 16 bits contains the high ten bits of the surrogate pair before correction
-    // 3 byte: 00000000 10bbbbcc|cccc0000 00000000
-    // 4 byte: 11110aaa bbbbbbcc|cccc0000 00000000 - high 10 bits correct w/o correction
-    uint32x4_t abc = vbslq_u32(vmovq_n_u32(0xFFFC0000), ab, vshlq_n_u32(middlebyte, 4));
+    // Top 16 bits contain the high ten bits of the surrogate pair before correction
+    // 3 byte: 00000000 10bbbbcc|cccc0000 00000000
+    // 4 byte: 11110aaa bbbbbbcc|cccc0000 00000000 - high 10 bits correct w/o correction
+    uint32x4_t abc =
+        vbslq_u32(vmovq_n_u32(0xFFFC0000), ab, vshlq_n_u32(middlebyte, 4));
     // Combine the low 6 or 7 bits by a shift right accumulate
     // 3 byte: 00000000 00000010|bbbbcccc ccdddddd - low 16 bits correct
-    // 4 byte: 00000011 110aaabb|bbbbcccc ccdddddd - low 10 bits correct w/o correction
+    // 4 byte: 00000011 110aaabb|bbbbcccc ccdddddd - low 10 bits correct w/o
+    // correction
     uint32x4_t composed = vsraq_n_u32(ascii, abc, 6);
     // After this is for surrogates
     // Blend the low and high surrogates
     // 4 byte: 11110aaa bbbbbbcc|bbbbcccc ccdddddd
     uint32x4_t mixed = vbslq_u32(vmovq_n_u32(0xFFFF0000), abc, composed);
-    // Clear the upper 6 bits of the low surrogate. Don't clear the upper bits yet as
-    // 0x10000 was not subtracted from the codepoint yet.
-    // 4 byte: 11110aaa bbbbbbcc|000000cc ccdddddd
-    uint16x8_t masked_pair =
-        vreinterpretq_u16_u32(vbicq_u32(mixed, vmovq_n_u32(uint32_t(~0xFFFF03FF))));
-    // Correct the remaining UTF-8 prefix, surrogate offset, and add the surrogate prefixes
-    // in one magic 16-bit addition.
-    // similar magic number but without the continue byte adjust and halfword swapped
-    // UTF-8 4b prefix   = -0xF000|0x0000
-    // surrogate offset  = -0x0040|0x0000 (0x10000 << 6)
+    // Clear the upper 6 bits of the low surrogate. Don't clear the upper bits yet as
+    // 0x10000 was not subtracted from the codepoint yet.
+    // 4 byte: 11110aaa bbbbbbcc|000000cc ccdddddd
+    uint16x8_t masked_pair = vreinterpretq_u16_u32(
+        vbicq_u32(mixed, vmovq_n_u32(uint32_t(~0xFFFF03FF))));
+    // Correct the remaining UTF-8 prefix and surrogate offset, and add the surrogate
+    // prefixes, in one 16-bit add. Similar magic, no continue byte adjust, halfword swapped.
+    // UTF-8 4b prefix   = -0xF000|0x0000
+    // surrogate offset  = -0x0040|0x0000 (0x10000 << 6)
     // surrogate high    = +0xD800|0x0000
     // surrogate low     = +0x0000|0xDC00
     // -----------------------------------
     //                   = +0xE7C0|0xDC00
     uint16x8_t magic = vreinterpretq_u16_u32(vmovq_n_u32(0xE7C0DC00));
     // 4 byte: 110110AA BBBBBBCC|110111CC CCDDDDDD - surrogate pair complete
-    uint32x4_t surrogates = vreinterpretq_u32_u16(vaddq_u16(masked_pair, magic));
+    uint32x4_t surrogates =
+        vreinterpretq_u32_u16(vaddq_u16(masked_pair, magic));
     // If the high bit is 1 (s32 less than zero), this needs a surrogate pair
     uint32x4_t is_pair = vcltzq_s32(vreinterpretq_s32_u32(perm));
 
@@ -14905,19 +14225,21 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     uint32x4_t selected = vbslq_u32(is_pair, surrogates, composed);
     // Byte swap if necessary
     if (!match_system(big_endian)) {
-      selected = vreinterpretq_u32_u8(vrev16q_u8(vreinterpretq_u8_u32(selected)));
+      selected =
+          vreinterpretq_u32_u8(vrev16q_u8(vreinterpretq_u8_u32(selected)));
     }
     // Attempting to shuffle and store would be complex, just scalarize.
     uint32_t buffer[4];
     vst1q_u32(buffer, selected);
     // Test for the top bit of the surrogate mask. Remove due to issue 514
-    // const uint32_t SURROGATE_MASK = match_system(big_endian) ? 0x80000000 : 0x00800000;
+    // const uint32_t SURROGATE_MASK = match_system(big_endian) ? 0x80000000 :
+    // 0x00800000;
     for (size_t i = 0; i < 3; i++) {
       // Surrogate
       // Used to be if (buffer[i] & SURROGATE_MASK) {
       // See discussion above.
       // patch for issue https://github.com/simdutf/simdutf/issues/514
-      if((permbuffer[i] & 0xf8000000) == 0xf0000000) {
+      if ((permbuffer[i] & 0xf8000000) == 0xf0000000) {
         utf16_output[0] = uint16_t(buffer[i] >> 16);
         utf16_output[1] = uint16_t(buffer[i] & 0xFFFF);
         utf16_output += 2;
@@ -14932,7 +14254,6 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     return 12;
   }
 }
-
 /* end file src/arm64/arm_convert_utf8_to_utf16.cpp */
 /* begin file src/arm64/arm_convert_utf8_to_utf32.cpp */
 // Convert up to 12 bytes from utf8 to utf32 using a mask indicating the
@@ -14940,74 +14261,75 @@ size_t convert_masked_utf8_to_utf16(const char *input,
 // are accessed.
 // It returns how many bytes were consumed (up to 12).
 size_t convert_masked_utf8_to_utf32(const char *input,
-                           uint64_t utf8_end_of_code_point_mask,
-                           char32_t *&utf32_out) {
+                                    uint64_t utf8_end_of_code_point_mask,
+                                    char32_t *&utf32_out) {
   // we use an approach where we try to process up to 12 input bytes.
   // Why 12 input bytes and not 16? Because we are concerned with the size of
   // the lookup tables. Also 12 is nicely divisible by two and three.
   //
-  uint32_t*& utf32_output = reinterpret_cast<uint32_t*&>(utf32_out);
-  uint8x16_t in = vld1q_u8(reinterpret_cast<const uint8_t*>(input));
+  uint32_t *&utf32_output = reinterpret_cast<uint32_t *&>(utf32_out);
+  uint8x16_t in = vld1q_u8(reinterpret_cast<const uint8_t *>(input));
   const uint16_t input_utf8_end_of_code_point_mask =
       utf8_end_of_code_point_mask & 0xFFF;
   //
-  // Optimization note: our main path below is load-latency dependent. Thus it is maybe
-  // beneficial to have fast paths that depend on branch prediction but have less latency.
-  // This results in more instructions but, potentially, also higher speeds.
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // is maybe beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
   //
   // We first try a few fast paths.
-  if(utf8_end_of_code_point_mask == 0xfff) {
+  if (utf8_end_of_code_point_mask == 0xfff) {
     // We process in chunks of 12 bytes.
     // use fast implementation in src/simdutf/arm64/simd.h
     // Ideally the compiler can keep the tables in registers.
     simd8<int8_t> temp{vreinterpretq_s8_u8(in)};
     temp.store_ascii_as_utf32_tbl(utf32_out);
     utf32_output += 12; // We wrote 12 32-bit characters.
-    return 12; // We consumed 12 bytes.
+    return 12;          // We consumed 12 bytes.
   }
-  if(input_utf8_end_of_code_point_mask == 0x924) {
-    // We want to take 4 3-byte UTF-8 code units and turn them into 4 4-byte UTF-32 code units.
-    // Convert to UTF-16
+  if (input_utf8_end_of_code_point_mask == 0x924) {
+    // We want to take 4 3-byte UTF-8 code units and turn them into 4 4-byte UTF-32 code units.
+    // Convert to UTF-16
     uint16x4_t composed_utf16 = convert_utf8_3_byte_to_utf16(in);
     // Zero extend and store via ST2 with a zero.
-    uint16x4x2_t interleaver = {{ composed_utf16, vmov_n_u16(0) }};
+    uint16x4x2_t interleaver = {{composed_utf16, vmov_n_u16(0)}};
     vst2_u16(reinterpret_cast<uint16_t *>(utf32_output), interleaver);
     utf32_output += 4; // We wrote 4 32-bit characters.
-    return 12; // We consumed 12 bytes.
+    return 12;         // We consumed 12 bytes.
   }
 
   // 2 byte sequences occur in short bursts in languages like Greek and Russian.
-  if(input_utf8_end_of_code_point_mask == 0xaaa) {
-    // We want to take 6 2-byte UTF-8 code units and turn them into 6 4-byte UTF-32 code units.
-    // Convert to UTF-16
+  if (input_utf8_end_of_code_point_mask == 0xaaa) {
+    // We want to take 6 2-byte UTF-8 code units and turn them into 6 4-byte UTF-32 code units.
+    // Convert to UTF-16
     uint16x8_t composed_utf16 = convert_utf8_2_byte_to_utf16(in);
     // Zero extend and store via ST2 with a zero.
-    uint16x8x2_t interleaver = {{ composed_utf16, vmovq_n_u16(0) }};
+    uint16x8x2_t interleaver = {{composed_utf16, vmovq_n_u16(0)}};
     vst2q_u16(reinterpret_cast<uint16_t *>(utf32_output), interleaver);
     utf32_output += 6; // We wrote 6 32-bit characters.
-    return 12; // We consumed 12 bytes.
+    return 12;         // We consumed 12 bytes.
   }
   /// Either no fast path or an unimportant fast path.
 
-  const uint8_t idx =
-      simdutf::tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][0];
-  const uint8_t consumed =
-      simdutf::tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
-
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
 
   if (idx < 64) {
     // SIX (6) input code-code units
     // Convert to UTF-16
     uint16x8_t composed_utf16 = convert_utf8_1_to_2_byte_to_utf16(in, idx);
     // Zero extend and store with ST2 and zero
-    uint16x8x2_t interleaver = {{ composed_utf16, vmovq_n_u16(0) }};
+    uint16x8x2_t interleaver = {{composed_utf16, vmovq_n_u16(0)}};
     vst2q_u16(reinterpret_cast<uint16_t *>(utf32_output), interleaver);
     utf32_output += 6; // We wrote 6 32-bit characters.
     return consumed;
   } else if (idx < 145) {
     // FOUR (4) input code-code units
     // UTF-16 and UTF-32 use similar algorithms, but UTF-32 skips the narrowing.
-    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t*>(simdutf::tables::utf8_to_utf16::shufutf8[idx]));
+    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
+        simdutf::tables::utf8_to_utf16::shufutf8[idx]));
     // Shuffle
     // 1 byte: 00000000 00000000 0ccccccc
     // 2 byte: 00000000 110bbbbb 10cccccc
@@ -15015,15 +14337,16 @@ size_t convert_masked_utf8_to_utf32(const char *input,
     uint32x4_t perm = vreinterpretq_u32_u8(vqtbl1q_u8(in, sh));
     // Split
     // 00000000 00000000 0ccccccc
-    uint32x4_t ascii = vandq_u32(perm, vmovq_n_u32(0x7F));    // 6 or 7 bits
+    uint32x4_t ascii = vandq_u32(perm, vmovq_n_u32(0x7F)); // 6 or 7 bits
     // Note: unmasked
     // xxxxxxxx aaaaxxxx xxxxxxxx
-    uint32x4_t high = vshrq_n_u32(perm, 4);                   // 4 bits
+    uint32x4_t high = vshrq_n_u32(perm, 4); // 4 bits
     // Use 16 bit bic instead of and.
     // The top bits will be corrected later in the bsl
     // 00000000 10bbbbbb 00000000
-    uint32x4_t middle =
-        vreinterpretq_u32_u16(vbicq_u16(vreinterpretq_u16_u32(perm), vmovq_n_u16(uint16_t(~0xff00)))); // 5 or 6 bits
+    uint32x4_t middle = vreinterpretq_u32_u16(
+        vbicq_u16(vreinterpretq_u16_u32(perm),
+                  vmovq_n_u16(uint16_t(~0xff00)))); // 5 or 6 bits
     // Combine low and middle with shift right accumulate
     // 00000000 00xxbbbb bbcccccc
     uint32x4_t lowmid = vsraq_n_u32(ascii, middle, 2);
@@ -15036,13 +14359,14 @@ size_t convert_masked_utf8_to_utf32(const char *input,
   } else if (idx < 209) {
     // THREE (3) input code-code units
     if (input_utf8_end_of_code_point_mask == 0x888) {
-      // We want to take 3 4-byte UTF-8 code units and turn them into 3 4-byte UTF-32 code units.
-      // This uses the same method as the fixed 3 byte version, reversing and shift left insert.
-      // However, there is no need for a shuffle mask now, just rev16 and rev32.
+      // We want to take 3 4-byte UTF-8 code units and turn them into 3 4-byte
+      // UTF-32 code units. This uses the same method as the fixed 3 byte
+      // version, reversing and shift left insert. However, there is no need for
+      // a shuffle mask now, just rev16 and rev32.
       //
-      // This version does not use the LUT, but 4 byte sequences are less common and the
-      // overhead of the extra memory access is less important than the early branch overhead
-      // in shorter sequences, so it comes last.
+      // This version does not use the LUT, but 4 byte sequences are less common
+      // and the overhead of the extra memory access is less important than the
+      // early branch overhead in shorter sequences, so it comes last.
 
       // Swap pairs of bytes
       // 10dddddd|10cccccc|10bbbbbb|11110aaa
@@ -15065,11 +14389,12 @@ size_t convert_masked_utf8_to_utf32(const char *input,
       vst1q_u32(utf32_output, composed);
 
       utf32_output += 3; // We wrote 3 32-bit characters.
-      return 12; // We consumed 12 bytes.
+      return 12;         // We consumed 12 bytes.
     }
-    // Unlike UTF-16, doing a fast codepath doesn't have nearly as much benefit due to
-    // surrogates no longer being involved.
-    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t*>(simdutf::tables::utf8_to_utf16::shufutf8[idx]));
+    // Unlike UTF-16, doing a fast codepath doesn't have nearly as much benefit
+    // due to surrogates no longer being involved.
+    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
+        simdutf::tables::utf8_to_utf16::shufutf8[idx]));
     // 1 byte: 00000000 00000000 00000000 0ddddddd
     // 2 byte: 00000000 00000000 110ccccc 10dddddd
     // 3 byte: 00000000 1110bbbb 10cccccc 10dddddd
@@ -15078,11 +14403,11 @@ size_t convert_masked_utf8_to_utf32(const char *input,
     // Ascii
     uint32x4_t ascii = vandq_u32(perm, vmovq_n_u32(0x7F));
     uint32x4_t middle = vandq_u32(perm, vmovq_n_u32(0x3f00));
-    // When converting the way we do, the 3 byte prefix will be interpreted as the
-    // 18th bit being set, since the code would interpret the lead byte (0b1110bbbb)
-    // as a continuation byte (0b10bbbbbb). To fix this, we can either xor or do an
-    // 8 bit add of the 6th bit shifted right by 1. Since NEON has shift right accumulate,
-    // we use that.
+    // When converting the way we do, the 3 byte prefix will be interpreted as
+    // the 18th bit being set, since the code would interpret the lead byte
+    // (0b1110bbbb) as a continuation byte (0b10bbbbbb). To fix this, we can
+    // either xor or do an 8 bit add of the 6th bit shifted right by 1. Since
+    // NEON has shift right accumulate, we use that.
     //  4 byte   3 byte
     // 10bbbbbb 1110bbbb
     // 00000000 01000000 6th bit
@@ -15091,13 +14416,14 @@ size_t convert_masked_utf8_to_utf32(const char *input,
     // 00bbbbbb 0000bbbb mask
     uint8x16_t correction =
         vreinterpretq_u8_u32(vandq_u32(perm, vmovq_n_u32(0x00400000)));
-    uint32x4_t corrected =
-        vreinterpretq_u32_u8(vsraq_n_u8(vreinterpretq_u8_u32(perm), correction, 1));
+    uint32x4_t corrected = vreinterpretq_u32_u8(
+        vsraq_n_u8(vreinterpretq_u8_u32(perm), correction, 1));
     // 00000000 00000000 0000cccc ccdddddd
     uint32x4_t cd = vsraq_n_u32(ascii, middle, 2);
     // Insert twice
     // xxxxxxxx xxxaaabb bbbbxxxx xxxxxxxx
-    uint32x4_t ab = vbslq_u32(vmovq_n_u32(0x01C0000), vshrq_n_u32(corrected, 6), vshrq_n_u32(corrected, 4));
+    uint32x4_t ab = vbslq_u32(vmovq_n_u32(0x01C0000), vshrq_n_u32(corrected, 6),
+                              vshrq_n_u32(corrected, 4));
     // 00000000 000aaabb bbbbcccc ccdddddd
     uint32x4_t composed = vbslq_u32(vmovq_n_u32(0xFFE00FFF), cd, ab);
     // Store
@@ -15110,130 +14436,73 @@ size_t convert_masked_utf8_to_utf32(const char *input,
   }
 }
 /* end file src/arm64/arm_convert_utf8_to_utf32.cpp */
-/* begin file src/arm64/arm_convert_utf8_to_latin1.cpp */
-// Convert up to 16 bytes from utf8 to utf16 using a mask indicating the
-// end of the code points. Only the least significant 12 bits of the mask
-// are accessed.
-// It returns how many bytes were consumed (up to 16, usually 12).
-size_t convert_masked_utf8_to_latin1(const char *input,
-                           uint64_t utf8_end_of_code_point_mask,
-                           char *&latin1_output) {
-  // we use an approach where we try to process up to 12 input bytes.
-  // Why 12 input bytes and not 16? Because we are concerned with the size of
-  // the lookup tables. Also 12 is nicely divisible by two and three.
-  //
-  uint8x16_t in = vld1q_u8(reinterpret_cast<const uint8_t*>(input));
-  const uint16_t input_utf8_end_of_code_point_mask =
-      utf8_end_of_code_point_mask & 0xfff;
-  //
-  // Optimization note: our main path below is load-latency dependent. Thus it is maybe
-  // beneficial to have fast paths that depend on branch prediction but have less latency.
-  // This results in more instructions but, potentially, also higher speeds.
-
-  // We first try a few fast paths.
-  // The obvious first test is ASCII, which actually consumes the full 16.
-  if(utf8_end_of_code_point_mask == 0xfff) {
-    // We process in chunks of 12 bytes
-    vst1q_u8(reinterpret_cast<uint8_t*>(latin1_output), in);
-    latin1_output += 12; // We wrote 12 18-bit characters.
-    return 12; // We consumed 12 bytes.
-  }
-  /// We do not have a fast path available, or the fast path is unimportant, so we fallback.
-  const uint8_t idx =
-      simdutf::tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][0];
-
-  const uint8_t consumed =
-      simdutf::tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
-  // this indicates an invalid input:
-  if(idx >= 64) { return consumed; }
-  // Here we should have (idx < 64), if not, there is a bug in the validation or elsewhere.
-  // SIX (6) input code-code units
-  // this is a relatively easy scenario
-  // we process SIX (6) input code-code units. The max length in bytes of six code
-  // code units spanning between 1 and 2 bytes each is 12 bytes.
-  // Converts 6 1-2 byte UTF-8 characters to 6 UTF-16 characters.
-  // This is a relatively easy scenario
-  // we process SIX (6) input code-code units. The max length in bytes of six code
-  // code units spanning between 1 and 2 bytes each is 12 bytes.
-  uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t*>(simdutf::tables::utf8_to_utf16::shufutf8[idx]));
-  // Shuffle
-  // 1 byte: 00000000 0bbbbbbb
-  // 2 byte: 110aaaaa 10bbbbbb
-  uint16x8_t perm = vreinterpretq_u16_u8(vqtbl1q_u8(in, sh));
-  // Mask
-  // 1 byte: 00000000 0bbbbbbb
-  // 2 byte: 00000000 00bbbbbb
-  uint16x8_t ascii = vandq_u16(perm, vmovq_n_u16(0x7f)); // 6 or 7 bits
-  // 1 byte: 00000000 00000000
-  // 2 byte: 000aaaaa 00000000
-  uint16x8_t highbyte = vandq_u16(perm, vmovq_n_u16(0x1f00)); // 5 bits
-  // Combine with a shift right accumulate
-  // 1 byte: 00000000 0bbbbbbb
-  // 2 byte: 00000aaa aabbbbbb
-  uint16x8_t composed = vsraq_n_u16(ascii, highbyte, 2);
-  // writing 8 bytes even though we only care about the first 6 bytes.
-  uint8x8_t latin1_packed = vmovn_u16(composed);
-  vst1_u8(reinterpret_cast<uint8_t*>(latin1_output), latin1_packed);
-  latin1_output += 6; // We wrote 6 bytes.
-  return consumed;
-}
-
-/* end file src/arm64/arm_convert_utf8_to_latin1.cpp */
 
 /* begin file src/arm64/arm_convert_utf16_to_latin1.cpp */
 
 template <endianness big_endian>
-std::pair<const char16_t*, char*> arm_convert_utf16_to_latin1(const char16_t* buf, size_t len, char* latin1_output) {
-  const char16_t* end = buf + len;
+std::pair<const char16_t *, char *>
+arm_convert_utf16_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_output) {
+  const char16_t *end = buf + len;
   while (end - buf >= 8) {
     uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
-    if (!match_system(big_endian)) { in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in))); }
+    if (!match_system(big_endian)) {
+      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
+    }
     if (vmaxvq_u16(in) <= 0xff) {
-        // 1. pack the bytes
-        uint8x8_t latin1_packed = vmovn_u16(in);
-        // 2. store (8 bytes)
-        vst1_u8(reinterpret_cast<uint8_t*>(latin1_output), latin1_packed);
-        // 3. adjust pointers
-        buf += 8;
-        latin1_output += 8;
+      // 1. pack the bytes
+      uint8x8_t latin1_packed = vmovn_u16(in);
+      // 2. store (8 bytes)
+      vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
+      // 3. adjust pointers
+      buf += 8;
+      latin1_output += 8;
     } else {
-      return std::make_pair(nullptr, reinterpret_cast<char*>(latin1_output));
+      return std::make_pair(nullptr, reinterpret_cast<char *>(latin1_output));
     }
   } // while
   return std::make_pair(buf, latin1_output);
 }
 
 template <endianness big_endian>
-std::pair<result, char*> arm_convert_utf16_to_latin1_with_errors(const char16_t* buf, size_t len, char* latin1_output) {
-  const char16_t* start = buf;
-  const char16_t* end = buf + len;
+std::pair<result, char *>
+arm_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
+                                        char *latin1_output) {
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
   while (end - buf >= 8) {
     uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
-    if (!match_system(big_endian)) { in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in))); }
+    if (!match_system(big_endian)) {
+      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
+    }
     if (vmaxvq_u16(in) <= 0xff) {
-        // 1. pack the bytes
-        uint8x8_t latin1_packed = vmovn_u16(in);
-        // 2. store (8 bytes)
-        vst1_u8(reinterpret_cast<uint8_t*>(latin1_output), latin1_packed);
-        // 3. adjust pointers
-        buf += 8;
-        latin1_output += 8;
+      // 1. pack the bytes
+      uint8x8_t latin1_packed = vmovn_u16(in);
+      // 2. store (8 bytes)
+      vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
+      // 3. adjust pointers
+      buf += 8;
+      latin1_output += 8;
     } else {
       // Let us do a scalar fallback.
-      for(int k = 0; k < 8; k++) {
-        uint16_t word = !match_system(big_endian) ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if(word <= 0xff) {
+      for (int k = 0; k < 8; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if (word <= 0xff) {
           *latin1_output++ = char(word);
         } else {
-          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k), latin1_output);
+          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
+                                latin1_output);
         }
       }
     }
   } // while
-  return std::make_pair(result(error_code::SUCCESS, buf - start), latin1_output);
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        latin1_output);
 }
 /* end file src/arm64/arm_convert_utf16_to_latin1.cpp */
-/* begin file src/arm64/arm_convert_utf16_to_utf8.cpp */
+/* begin file src/arm64/arm_convert_utf16_to_utf32.cpp */
 /*
     The vectorized algorithm works on single SSE register i.e., it
     loads eight 16-bit code units.
@@ -15287,240 +14556,450 @@ std::pair<result, char*> arm_convert_utf16_to_latin1_with_errors(const char16_t*
   A scalar routing should carry on the conversion of the tail.
 */
 template <endianness big_endian>
-std::pair<const char16_t*, char*> arm_convert_utf16_to_utf8(const char16_t* buf, size_t len, char* utf8_out) {
-  uint8_t * utf8_output = reinterpret_cast<uint8_t*>(utf8_out);
-  const char16_t* end = buf + len;
+std::pair<const char16_t *, char32_t *>
+arm_convert_utf16_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_out) {
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
+  const char16_t *end = buf + len;
 
   const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
   const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
-  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
-  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
+
+  while (end - buf >= 8) {
     uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
-    if (!match_system(big_endian)) { in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in))); }
-    if(vmaxvq_u16(in) <= 0x7F) { // ASCII fast path!!!!
-        // It is common enough that we have sequences of 16 consecutive ASCII characters.
-        uint16x8_t nextin = vld1q_u16(reinterpret_cast<const uint16_t *>(buf) + 8);
-        if (!match_system(big_endian)) { nextin = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(nextin))); }
-        if(vmaxvq_u16(nextin) > 0x7F) {
-          // 1. pack the bytes
-          // obviously suboptimal.
-          uint8x8_t utf8_packed = vmovn_u16(in);
-          // 2. store (8 bytes)
-          vst1_u8(utf8_output, utf8_packed);
-          // 3. adjust pointers
-          buf += 8;
-          utf8_output += 8;
-          in = nextin;
+    if (!match_system(big_endian)) {
+      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
+    }
+
+    const uint16x8_t surrogates_bytemask =
+        vceqq_u16(vandq_u16(in, v_f800), v_d800);
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
+    if (vmaxvq_u16(surrogates_bytemask) == 0) {
+      // case: no surrogate pairs, extend all 16-bit code units to 32-bit code
+      // units
+      vst1q_u32(utf32_output, vmovl_u16(vget_low_u16(in)));
+      vst1q_u32(utf32_output + 4, vmovl_high_u16(in));
+      utf32_output += 8;
+      buf += 8;
+      // surrogate pair(s) in a register
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if ((word & 0xF800) != 0xD800) {
+          *utf32_output++ = char32_t(word);
         } else {
-          // 1. pack the bytes
-          // obviously suboptimal.
-          uint8x16_t utf8_packed = vmovn_high_u16(vmovn_u16(in), nextin);
-          // 2. store (16 bytes)
-          vst1q_u8(utf8_output, utf8_packed);
-          // 3. adjust pointers
-          buf += 16;
-          utf8_output += 16;
-          continue; // we are done for this round!
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char32_t *>(utf32_output));
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf32_output++ = char32_t(value);
         }
+      }
+      buf += k;
     }
+  } // while
+  return std::make_pair(buf, reinterpret_cast<char32_t *>(utf32_output));
+}
 
-    if (vmaxvq_u16(in) <= 0x7FF) {
-          // 1. prepare 2-byte values
-          // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-          // expected output   : [110a|aaaa|10bb|bbbb] x 8
-          const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
-          const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
-
-          // t0 = [000a|aaaa|bbbb|bb00]
-          const uint16x8_t t0 = vshlq_n_u16(in, 2);
-          // t1 = [000a|aaaa|0000|0000]
-          const uint16x8_t t1 = vandq_u16(t0, v_1f00);
-          // t2 = [0000|0000|00bb|bbbb]
-          const uint16x8_t t2 = vandq_u16(in, v_003f);
-          // t3 = [000a|aaaa|00bb|bbbb]
-          const uint16x8_t t3 = vorrq_u16(t1, t2);
-          // t4 = [110a|aaaa|10bb|bbbb]
-          const uint16x8_t t4 = vorrq_u16(t3, v_c080);
-          // 2. merge ASCII and 2-byte codewords
-          const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-          const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
-          const uint8x16_t utf8_unpacked = vreinterpretq_u8_u16(vbslq_u16(one_byte_bytemask, in, t4));
-          // 3. prepare bitmask for 8-bit lookup
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-          const uint16x8_t mask = simdutf_make_uint16x8_t(0x0001, 0x0004,
-                                    0x0010, 0x0040,
-                                    0x0002, 0x0008,
-                                    0x0020, 0x0080);
-#else
-          const uint16x8_t mask = { 0x0001, 0x0004,
-                                    0x0010, 0x0040,
-                                    0x0002, 0x0008,
-                                    0x0020, 0x0080 };
-#endif
-          uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
-          // 4. pack the bytes
-          const uint8_t* row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-          const uint8x16_t shuffle = vld1q_u8(row + 1);
-          const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
-
-          // 5. store bytes
-          vst1q_u8(utf8_output, utf8_packed);
+/*
+  Returns a pair: a result struct and utf32_output.
+  If there is an error, the count field of the result is the position of the
+  error. Otherwise, it is the position of the first unprocessed byte in buf
+  (even if finished). A scalar routine should carry on the conversion of the
+  tail if needed.
+*/
+template <endianness big_endian>
+std::pair<result, char32_t *>
+arm_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
+                                       char32_t *utf32_out) {
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
 
-          // 6. adjust pointers
-          buf += 8;
-          utf8_output += row[0];
-          continue;
+  const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
+  const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
 
+  while ((end - buf) >= 8) {
+    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
+    if (!match_system(big_endian)) {
+      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
     }
-    const uint16x8_t surrogates_bytemask = vceqq_u16(vandq_u16(in, v_f800), v_d800);
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help. However,
-    // it is likely an uncommon occurrence.
+
+    const uint16x8_t surrogates_bytemask =
+        vceqq_u16(vandq_u16(in, v_f800), v_d800);
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
     if (vmaxvq_u16(surrogates_bytemask) == 0) {
-        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-        const uint16x8_t dup_even = simdutf_make_uint16x8_t(0x0000, 0x0202, 0x0404, 0x0606,
-                                     0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
-#else
-        const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
-                                     0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
-#endif
-        /* In this branch we handle three cases:
-           1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-           2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-           3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
+      // case: no surrogate pairs, extend all 16-bit code units to 32-bit code
+      // units
+      vst1q_u32(utf32_output, vmovl_u16(vget_low_u16(in)));
+      vst1q_u32(utf32_output + 4, vmovl_high_u16(in));
+      utf32_output += 8;
+      buf += 8;
+      // surrogate pair(s) in a register
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if ((word & 0xF800) != 0xD800) {
+          *utf32_output++ = char32_t(word);
+        } else {
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k - 1),
+                reinterpret_cast<char32_t *>(utf32_output));
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf32_output++ = char32_t(value);
+        }
+      }
+      buf += k;
+    }
+  } // while
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char32_t *>(utf32_output));
+}
+/* end file src/arm64/arm_convert_utf16_to_utf32.cpp */
+/* begin file src/arm64/arm_convert_utf16_to_utf8.cpp */
+/*
+    The vectorized algorithm works on single SSE register i.e., it
+    loads eight 16-bit code units.
 
-          We expand the input word (16-bit) into two code units (32-bit), thus
-          we have room for four bytes. However, we need five distinct bit
-          layouts. Note that the last byte in cases #2 and #3 is the same.
+    We consider three cases:
+    1. an input register contains no surrogates and each value
+       is in range 0x0000 .. 0x07ff.
+    2. an input register contains no surrogates and values are
+       in range 0x0000 .. 0xffff.
+    3. an input register contains surrogates --- i.e. codepoints
+       can have 16 or 32 bits.
 
-          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-          in register t2.
+    Ad 1.
 
-          We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-          either byte 1 for case #2 or byte 2 for case #3. Note that they
-          differ by exactly one bit.
+    When values are less than 0x0800, it means that a 16-bit code unit
+    can be converted into: 1) single UTF8 byte (when it is an ASCII
+    char) or 2) two UTF8 bytes.
 
-          Finally from these two code units we build proper UTF-8 sequence, taking
-          into account the case (i.e, the number of bytes to write).
-        */
-        /**
-         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-         * t2 => [0ccc|cccc] [10cc|cccc]
-         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-         */
-#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
-        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-        const uint16x8_t t0 = vreinterpretq_u16_u8(vqtbl1q_u8(vreinterpretq_u8_u16(in), vreinterpretq_u8_u16(dup_even)));
-        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-        const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
-        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-        const uint16x8_t t2 = vorrq_u16 (t1, simdutf_vec(0b1000000000000000));
+    For this case we do only some shuffle to obtain these 2-byte
+    codes and finally compress the whole SSE register with a single
+    shuffle.
 
-        // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
-        const uint16x8_t s0 = vshrq_n_u16(in, 12);
-        // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
-        const uint16x8_t s1 = vandq_u16(in, simdutf_vec(0b0000111111000000));
-        // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
-        const uint16x8_t s1s = vshlq_n_u16(s1, 2);
-        // [00bb|bbbb|0000|aaaa]
-        const uint16x8_t s2 = vorrq_u16(s0, s1s);
-        // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-        const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
-        const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
-        const uint16x8_t one_or_two_bytes_bytemask = vcleq_u16(in, v_07ff);
-        const uint16x8_t m0 = vbicq_u16(simdutf_vec(0b0100000000000000), one_or_two_bytes_bytemask);
-        const uint16x8_t s4 = veorq_u16(s3, m0);
-#undef simdutf_vec
+    We need a 256-entry lookup table to get a compression pattern
+    and the number of output bytes in the compressed vector register.
+    Each entry occupies 17 bytes.
 
-        // 4. expand code units 16-bit => 32-bit
-        const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
-        const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
+    Ad 2.
 
-        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-        const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
+    When values fit in 16-bit code units, but are above 0x07ff, then
+    a single word may produce one, two or three UTF8 bytes.
+
+    We prepare data for all these three cases in two registers.
+    The first register contains lower two UTF8 bytes (used in all
+    cases), while the second one contains just the third byte for
+    the three-UTF8-bytes case.
+
+    Finally these two registers are interleaved forming eight-element
+    array of 32-bit values. The array spans two SSE registers.
+    The bytes from the registers are compressed using two shuffles.
+
+    We need a 256-entry lookup table to get a compression pattern
+    and the number of output bytes in the compressed vector register.
+    Each entry occupies 17 bytes.
+
+
+    To summarize:
+    - We need two 256-entry tables that have 8704 bytes in total.
+*/
+/*
+  Returns a pair: the first unprocessed byte from buf and utf8_output
+  A scalar routine should carry on the conversion of the tail.
+*/
+template <endianness big_endian>
+std::pair<const char16_t *, char *>
+arm_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char16_t *end = buf + len;
+
+  const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
+  const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
+  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
+    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
+    if (!match_system(big_endian)) {
+      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
+    }
+    if (vmaxvq_u16(in) <= 0x7F) { // ASCII fast path!!!!
+      // It is common enough that we have sequences of 16 consecutive ASCII
+      // characters.
+      uint16x8_t nextin =
+          vld1q_u16(reinterpret_cast<const uint16_t *>(buf) + 8);
+      if (!match_system(big_endian)) {
+        nextin = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(nextin)));
+      }
+      if (vmaxvq_u16(nextin) > 0x7F) {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        uint8x8_t utf8_packed = vmovn_u16(in);
+        // 2. store (8 bytes)
+        vst1_u8(utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        in = nextin;
+      } else {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        uint8x16_t utf8_packed = vmovn_high_u16(vmovn_u16(in), nextin);
+        // 2. store (16 bytes)
+        vst1q_u8(utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 16;
+        utf8_output += 16;
+        continue; // we are done for this round!
+      }
+    }
+
+    if (vmaxvq_u16(in) <= 0x7FF) {
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+      // expected output   : [110a|aaaa|10bb|bbbb] x 8
+      const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
+      const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
+
+      // t0 = [000a|aaaa|bbbb|bb00]
+      const uint16x8_t t0 = vshlq_n_u16(in, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      const uint16x8_t t1 = vandq_u16(t0, v_1f00);
+      // t2 = [0000|0000|00bb|bbbb]
+      const uint16x8_t t2 = vandq_u16(in, v_003f);
+      // t3 = [000a|aaaa|00bb|bbbb]
+      const uint16x8_t t3 = vorrq_u16(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      const uint16x8_t t4 = vorrq_u16(t3, v_c080);
+      // 2. merge ASCII and 2-byte codewords
+      const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+      const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
+      const uint8x16_t utf8_unpacked =
+          vreinterpretq_u8_u16(vbslq_u16(one_byte_bytemask, in, t4));
+      // 3. prepare bitmask for 8-bit lookup
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-        const uint16x8_t onemask = simdutf_make_uint16x8_t(0x0001, 0x0004,
-                                    0x0010, 0x0040,
-                                    0x0100, 0x0400,
-                                    0x1000, 0x4000 );
-        const uint16x8_t twomask = simdutf_make_uint16x8_t(0x0002, 0x0008,
-                                    0x0020, 0x0080,
-                                    0x0200, 0x0800,
-                                    0x2000, 0x8000 );
+      const uint16x8_t mask = simdutf_make_uint16x8_t(
+          0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
 #else
-        const uint16x8_t onemask = { 0x0001, 0x0004,
-                                    0x0010, 0x0040,
-                                    0x0100, 0x0400,
-                                    0x1000, 0x4000 };
-        const uint16x8_t twomask = { 0x0002, 0x0008,
-                                    0x0020, 0x0080,
-                                    0x0200, 0x0800,
-                                    0x2000, 0x8000 };
+      const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
+                               0x0002, 0x0008, 0x0020, 0x0080};
 #endif
-        const uint16x8_t combined = vorrq_u16(vandq_u16(one_byte_bytemask, onemask), vandq_u16(one_or_two_bytes_bytemask, twomask));
-        const uint16_t mask = vaddvq_u16(combined);
-        // The following fast path may or may not be beneficial.
-        /*if(mask == 0) {
-          // We only have three-byte code units. Use fast path.
-          const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
-          const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
-          const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
-          vst1q_u8(utf8_output, utf8_0);
-          utf8_output += 12;
-          vst1q_u8(utf8_output, utf8_1);
-          utf8_output += 12;
-          buf += 8;
-          continue;
-        }*/
-        const uint8_t mask0 = uint8_t(mask);
+      uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
+      // 4. pack the bytes
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
+      const uint8x16_t shuffle = vld1q_u8(row + 1);
+      const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
 
-        const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-        const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
-        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
+      // 5. store bytes
+      vst1q_u8(utf8_output, utf8_packed);
 
-        const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-        const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-        const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
-        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
+      // 6. adjust pointers
+      buf += 8;
+      utf8_output += row[0];
+      continue;
+    }
+    const uint16x8_t surrogates_bytemask =
+        vceqq_u16(vandq_u16(in, v_f800), v_d800);
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
+    if (vmaxvq_u16(surrogates_bytemask) == 0) {
+      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+      const uint16x8_t dup_even = simdutf_make_uint16x8_t(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+#else
+      const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
+                                   0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
+#endif
+      /* In this branch we handle three cases:
+         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+        single UTF-8 byte
+         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
+        UTF-8 bytes
+         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+        three UTF-8 bytes
+
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
+
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
+
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
+
+        Finally, from these two code units we build the proper UTF-8 sequence,
+        taking into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
+#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const uint16x8_t t0 = vreinterpretq_u16_u8(
+          vqtbl1q_u8(vreinterpretq_u8_u16(in), vreinterpretq_u8_u16(dup_even)));
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const uint16x8_t t2 = vorrq_u16(t1, simdutf_vec(0b1000000000000000));
+
+      // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+      const uint16x8_t s0 = vshrq_n_u16(in, 12);
+      // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
+      const uint16x8_t s1 = vandq_u16(in, simdutf_vec(0b0000111111000000));
+      // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
+      const uint16x8_t s1s = vshlq_n_u16(s1, 2);
+      // [00bb|bbbb|0000|aaaa]
+      const uint16x8_t s2 = vorrq_u16(s0, s1s);
+      // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
+      const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
+      const uint16x8_t one_or_two_bytes_bytemask = vcleq_u16(in, v_07ff);
+      const uint16x8_t m0 =
+          vbicq_u16(simdutf_vec(0b0100000000000000), one_or_two_bytes_bytemask);
+      const uint16x8_t s4 = veorq_u16(s3, m0);
+#undef simdutf_vec
+
+      // 4. expand code units 16-bit => 32-bit
+      const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
+      const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
 
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+      const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+      const uint16x8_t onemask = simdutf_make_uint16x8_t(
+          0x0001, 0x0004, 0x0010, 0x0040, 0x0100, 0x0400, 0x1000, 0x4000);
+      const uint16x8_t twomask = simdutf_make_uint16x8_t(
+          0x0002, 0x0008, 0x0020, 0x0080, 0x0200, 0x0800, 0x2000, 0x8000);
+#else
+      const uint16x8_t onemask = {0x0001, 0x0004, 0x0010, 0x0040,
+                                  0x0100, 0x0400, 0x1000, 0x4000};
+      const uint16x8_t twomask = {0x0002, 0x0008, 0x0020, 0x0080,
+                                  0x0200, 0x0800, 0x2000, 0x8000};
+#endif
+      const uint16x8_t combined =
+          vorrq_u16(vandq_u16(one_byte_bytemask, onemask),
+                    vandq_u16(one_or_two_bytes_bytemask, twomask));
+      const uint16_t mask = vaddvq_u16(combined);
+      // The following fast path may or may not be beneficial.
+      /*if(mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
+        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
+        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
         vst1q_u8(utf8_output, utf8_0);
-        utf8_output += row0[0];
+        utf8_output += 12;
         vst1q_u8(utf8_output, utf8_1);
-        utf8_output += row1[0];
-
+        utf8_output += 12;
         buf += 8;
-    // surrogate pair(s) in a register
+        continue;
+      }*/
+      const uint8_t mask0 = uint8_t(mask);
+
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
+      const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
+
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
+      const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
+
+      vst1q_u8(utf8_output, utf8_0);
+      utf8_output += row0[0];
+      vst1q_u8(utf8_output, utf8_1);
+      utf8_output += row1[0];
+
+      buf += 8;
+      // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
       // It may seem wasteful to use scalar code, but being efficient with SIMD
       // in the presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
-        uint16_t word = !match_system(big_endian) ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if((word & 0xFF80)==0) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if ((word & 0xFF80) == 0) {
           *utf8_output++ = char(word);
-        } else if((word & 0xF800)==0) {
-          *utf8_output++ = char((word>>6) | 0b11000000);
+        } else if ((word & 0xF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word &0xF800 ) != 0xD800) {
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else if ((word & 0xF800) != 0xD800) {
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = !match_system(big_endian) ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if((diff | diff2) > 0x3FF)  { return std::make_pair(nullptr, reinterpret_cast<char*>(utf8_output)); }
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char *>(utf8_output));
+          }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
-          *utf8_output++ = char((value>>18) | 0b11110000);
-          *utf8_output++ = char(((value>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((value>>6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((value >> 18) | 0b11110000);
+          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((value & 0b111111) | 0b10000000);
         }
       }
@@ -15528,253 +15007,272 @@ std::pair<const char16_t*, char*> arm_convert_utf16_to_utf8(const char16_t* buf,
     }
   } // while
 
-  return std::make_pair(buf, reinterpret_cast<char*>(utf8_output));
+  return std::make_pair(buf, reinterpret_cast<char *>(utf8_output));
 }
 
-
 /*
   Returns a pair: a result struct and utf8_output.
-  If there is an error, the count field of the result is the position of the error.
-  Otherwise, it is the position of the first unprocessed byte in buf (even if finished).
-  A scalar routing should carry on the conversion of the tail if needed.
+  If there is an error, the count field of the result is the position of the
+  error. Otherwise, it is the position of the first unprocessed code unit in buf
+  (even if finished). A scalar routine should carry on the conversion of the
+  tail if needed.
 */
 template <endianness big_endian>
-std::pair<result, char*> arm_convert_utf16_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_out) {
-  uint8_t * utf8_output = reinterpret_cast<uint8_t*>(utf8_out);
-    const char16_t* start = buf;
-  const char16_t* end = buf + len;
+std::pair<result, char *>
+arm_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
+                                      char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
 
   const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
   const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
   const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
   while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
     uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
-    if (!match_system(big_endian)) { in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in))); }
-    if(vmaxvq_u16(in) <= 0x7F) { // ASCII fast path!!!!
-        // It is common enough that we have sequences of 16 consecutive ASCII characters.
-        uint16x8_t nextin = vld1q_u16(reinterpret_cast<const uint16_t *>(buf) + 8);
-        if (!match_system(big_endian)) { nextin = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(nextin))); }
-        if(vmaxvq_u16(nextin) > 0x7F) {
-          // 1. pack the bytes
-          // obviously suboptimal.
-          uint8x8_t utf8_packed = vmovn_u16(in);
-          // 2. store (8 bytes)
-          vst1_u8(utf8_output, utf8_packed);
-          // 3. adjust pointers
-          buf += 8;
-          utf8_output += 8;
-          in = nextin;
-        } else {
-          // 1. pack the bytes
-          // obviously suboptimal.
-          uint8x16_t utf8_packed = vmovn_high_u16(vmovn_u16(in), nextin);
-          // 2. store (16 bytes)
-          vst1q_u8(utf8_output, utf8_packed);
-          // 3. adjust pointers
-          buf += 16;
-          utf8_output += 16;
-          continue; // we are done for this round!
-        }
+    if (!match_system(big_endian)) {
+      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
+    }
+    if (vmaxvq_u16(in) <= 0x7F) { // ASCII fast path!!!!
+      // It is common enough that we have sequences of 16 consecutive ASCII
+      // characters.
+      uint16x8_t nextin =
+          vld1q_u16(reinterpret_cast<const uint16_t *>(buf) + 8);
+      if (!match_system(big_endian)) {
+        nextin = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(nextin)));
+      }
+      if (vmaxvq_u16(nextin) > 0x7F) {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        uint8x8_t utf8_packed = vmovn_u16(in);
+        // 2. store (8 bytes)
+        vst1_u8(utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        in = nextin;
+      } else {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        uint8x16_t utf8_packed = vmovn_high_u16(vmovn_u16(in), nextin);
+        // 2. store (16 bytes)
+        vst1q_u8(utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 16;
+        utf8_output += 16;
+        continue; // we are done for this round!
+      }
     }
 
     if (vmaxvq_u16(in) <= 0x7FF) {
-          // 1. prepare 2-byte values
-          // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-          // expected output   : [110a|aaaa|10bb|bbbb] x 8
-          const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
-          const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
-
-          // t0 = [000a|aaaa|bbbb|bb00]
-          const uint16x8_t t0 = vshlq_n_u16(in, 2);
-          // t1 = [000a|aaaa|0000|0000]
-          const uint16x8_t t1 = vandq_u16(t0, v_1f00);
-          // t2 = [0000|0000|00bb|bbbb]
-          const uint16x8_t t2 = vandq_u16(in, v_003f);
-          // t3 = [000a|aaaa|00bb|bbbb]
-          const uint16x8_t t3 = vorrq_u16(t1, t2);
-          // t4 = [110a|aaaa|10bb|bbbb]
-          const uint16x8_t t4 = vorrq_u16(t3, v_c080);
-          // 2. merge ASCII and 2-byte codewords
-          const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-          const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
-          const uint8x16_t utf8_unpacked = vreinterpretq_u8_u16(vbslq_u16(one_byte_bytemask, in, t4));
-          // 3. prepare bitmask for 8-bit lookup
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+      // expected output   : [110a|aaaa|10bb|bbbb] x 8
+      const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
+      const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
+
+      // t0 = [000a|aaaa|bbbb|bb00]
+      const uint16x8_t t0 = vshlq_n_u16(in, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      const uint16x8_t t1 = vandq_u16(t0, v_1f00);
+      // t2 = [0000|0000|00bb|bbbb]
+      const uint16x8_t t2 = vandq_u16(in, v_003f);
+      // t3 = [000a|aaaa|00bb|bbbb]
+      const uint16x8_t t3 = vorrq_u16(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      const uint16x8_t t4 = vorrq_u16(t3, v_c080);
+      // 2. merge ASCII and 2-byte codewords
+      const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+      const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
+      const uint8x16_t utf8_unpacked =
+          vreinterpretq_u8_u16(vbslq_u16(one_byte_bytemask, in, t4));
+      // 3. prepare bitmask for 8-bit lookup
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-          const uint16x8_t mask = simdutf_make_uint16x8_t(0x0001, 0x0004,
-                                    0x0010, 0x0040,
-                                    0x0002, 0x0008,
-                                    0x0020, 0x0080);
+      const uint16x8_t mask = simdutf_make_uint16x8_t(
+          0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
 #else
-          const uint16x8_t mask = { 0x0001, 0x0004,
-                                    0x0010, 0x0040,
-                                    0x0002, 0x0008,
-                                    0x0020, 0x0080 };
+      const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
+                               0x0002, 0x0008, 0x0020, 0x0080};
 #endif
-          uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
-          // 4. pack the bytes
-          const uint8_t* row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-          const uint8x16_t shuffle = vld1q_u8(row + 1);
-          const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
-
-          // 5. store bytes
-          vst1q_u8(utf8_output, utf8_packed);
+      uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
+      // 4. pack the bytes
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
+      const uint8x16_t shuffle = vld1q_u8(row + 1);
+      const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
 
-          // 6. adjust pointers
-          buf += 8;
-          utf8_output += row[0];
-          continue;
+      // 5. store bytes
+      vst1q_u8(utf8_output, utf8_packed);
 
+      // 6. adjust pointers
+      buf += 8;
+      utf8_output += row[0];
+      continue;
     }
-    const uint16x8_t surrogates_bytemask = vceqq_u16(vandq_u16(in, v_f800), v_d800);
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help. However,
-    // it is likely an uncommon occurrence.
+    const uint16x8_t surrogates_bytemask =
+        vceqq_u16(vandq_u16(in, v_f800), v_d800);
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
     if (vmaxvq_u16(surrogates_bytemask) == 0) {
-        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-        const uint16x8_t dup_even = simdutf_make_uint16x8_t(0x0000, 0x0202, 0x0404, 0x0606,
-                                     0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+      const uint16x8_t dup_even = simdutf_make_uint16x8_t(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 #else
-        const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
-                                     0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
+      const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
+                                   0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
 #endif
-        /* In this branch we handle three cases:
-           1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-           2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-           3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
-
-          We expand the input word (16-bit) into two code units (32-bit), thus
-          we have room for four bytes. However, we need five distinct bit
-          layouts. Note that the last byte in cases #2 and #3 is the same.
+      /* In this branch we handle three cases:
+         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+        single UTF-8 byte
+         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
+        UTF-8 bytes
+         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+        three UTF-8 bytes
 
-          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-          in register t2.
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
 
-          We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-          either byte 1 for case #2 or byte 2 for case #3. Note that they
-          differ by exactly one bit.
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
 
-          Finally from these two code units we build proper UTF-8 sequence, taking
-          into account the case (i.e, the number of bytes to write).
-        */
-        /**
-         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-         * t2 => [0ccc|cccc] [10cc|cccc]
-         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-         */
-#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
-        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-        const uint16x8_t t0 = vreinterpretq_u16_u8(vqtbl1q_u8(vreinterpretq_u8_u16(in), vreinterpretq_u8_u16(dup_even)));
-        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-        const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
-        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-        const uint16x8_t t2 = vorrq_u16 (t1, simdutf_vec(0b1000000000000000));
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
 
-        // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
-        const uint16x8_t s0 = vshrq_n_u16(in, 12);
-        // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
-        const uint16x8_t s1 = vandq_u16(in, simdutf_vec(0b0000111111000000));
-        // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
-        const uint16x8_t s1s = vshlq_n_u16(s1, 2);
-        // [00bb|bbbb|0000|aaaa]
-        const uint16x8_t s2 = vorrq_u16(s0, s1s);
-        // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-        const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
-        const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
-        const uint16x8_t one_or_two_bytes_bytemask = vcleq_u16(in, v_07ff);
-        const uint16x8_t m0 = vbicq_u16(simdutf_vec(0b0100000000000000), one_or_two_bytes_bytemask);
-        const uint16x8_t s4 = veorq_u16(s3, m0);
+        Finally, from these two code units we build the proper UTF-8 sequence,
+        taking into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
+#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const uint16x8_t t0 = vreinterpretq_u16_u8(
+          vqtbl1q_u8(vreinterpretq_u8_u16(in), vreinterpretq_u8_u16(dup_even)));
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const uint16x8_t t2 = vorrq_u16(t1, simdutf_vec(0b1000000000000000));
+
+      // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+      const uint16x8_t s0 = vshrq_n_u16(in, 12);
+      // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
+      const uint16x8_t s1 = vandq_u16(in, simdutf_vec(0b0000111111000000));
+      // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
+      const uint16x8_t s1s = vshlq_n_u16(s1, 2);
+      // [00bb|bbbb|0000|aaaa]
+      const uint16x8_t s2 = vorrq_u16(s0, s1s);
+      // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
+      const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
+      const uint16x8_t one_or_two_bytes_bytemask = vcleq_u16(in, v_07ff);
+      const uint16x8_t m0 =
+          vbicq_u16(simdutf_vec(0b0100000000000000), one_or_two_bytes_bytemask);
+      const uint16x8_t s4 = veorq_u16(s3, m0);
 #undef simdutf_vec
 
-        // 4. expand code units 16-bit => 32-bit
-        const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
-        const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
+      // 4. expand code units 16-bit => 32-bit
+      const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
+      const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
 
-        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-        const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+      const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-        const uint16x8_t onemask = simdutf_make_uint16x8_t(0x0001, 0x0004,
-                                    0x0010, 0x0040,
-                                    0x0100, 0x0400,
-                                    0x1000, 0x4000 );
-        const uint16x8_t twomask = simdutf_make_uint16x8_t(0x0002, 0x0008,
-                                    0x0020, 0x0080,
-                                    0x0200, 0x0800,
-                                    0x2000, 0x8000 );
+      const uint16x8_t onemask = simdutf_make_uint16x8_t(
+          0x0001, 0x0004, 0x0010, 0x0040, 0x0100, 0x0400, 0x1000, 0x4000);
+      const uint16x8_t twomask = simdutf_make_uint16x8_t(
+          0x0002, 0x0008, 0x0020, 0x0080, 0x0200, 0x0800, 0x2000, 0x8000);
 #else
-        const uint16x8_t onemask = { 0x0001, 0x0004,
-                                    0x0010, 0x0040,
-                                    0x0100, 0x0400,
-                                    0x1000, 0x4000 };
-        const uint16x8_t twomask = { 0x0002, 0x0008,
-                                    0x0020, 0x0080,
-                                    0x0200, 0x0800,
-                                    0x2000, 0x8000 };
+      const uint16x8_t onemask = {0x0001, 0x0004, 0x0010, 0x0040,
+                                  0x0100, 0x0400, 0x1000, 0x4000};
+      const uint16x8_t twomask = {0x0002, 0x0008, 0x0020, 0x0080,
+                                  0x0200, 0x0800, 0x2000, 0x8000};
 #endif
-        const uint16x8_t combined = vorrq_u16(vandq_u16(one_byte_bytemask, onemask), vandq_u16(one_or_two_bytes_bytemask, twomask));
-        const uint16_t mask = vaddvq_u16(combined);
-        // The following fast path may or may not be beneficial.
-        /*if(mask == 0) {
-          // We only have three-byte code units. Use fast path.
-          const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
-          const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
-          const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
-          vst1q_u8(utf8_output, utf8_0);
-          utf8_output += 12;
-          vst1q_u8(utf8_output, utf8_1);
-          utf8_output += 12;
-          buf += 8;
-          continue;
-        }*/
-        const uint8_t mask0 = uint8_t(mask);
+      const uint16x8_t combined =
+          vorrq_u16(vandq_u16(one_byte_bytemask, onemask),
+                    vandq_u16(one_or_two_bytes_bytemask, twomask));
+      const uint16_t mask = vaddvq_u16(combined);
+      // The following fast path may or may not be beneficial.
+      /*if(mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
+        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
+        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
+        vst1q_u8(utf8_output, utf8_0);
+        utf8_output += 12;
+        vst1q_u8(utf8_output, utf8_1);
+        utf8_output += 12;
+        buf += 8;
+        continue;
+      }*/
+      const uint8_t mask0 = uint8_t(mask);
 
-        const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-        const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
-        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
+      const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
 
-        const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-        const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-        const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
-        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
+      const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
 
-        vst1q_u8(utf8_output, utf8_0);
-        utf8_output += row0[0];
-        vst1q_u8(utf8_output, utf8_1);
-        utf8_output += row1[0];
+      vst1q_u8(utf8_output, utf8_0);
+      utf8_output += row0[0];
+      vst1q_u8(utf8_output, utf8_1);
+      utf8_output += row1[0];
 
-        buf += 8;
-    // surrogate pair(s) in a register
+      buf += 8;
+      // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
       // It may seem wasteful to use scalar code, but being efficient with SIMD
       // in the presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
-        uint16_t word = !match_system(big_endian) ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if((word & 0xFF80)==0) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if ((word & 0xFF80) == 0) {
           *utf8_output++ = char(word);
-        } else if((word & 0xF800)==0) {
-          *utf8_output++ = char((word>>6) | 0b11000000);
+        } else if ((word & 0xF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word &0xF800 ) != 0xD800) {
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else if ((word & 0xF800) != 0xD800) {
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = !match_system(big_endian) ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if((diff | diff2) > 0x3FF)  { return std::make_pair(result(error_code::SURROGATE, buf - start + k - 1), reinterpret_cast<char*>(utf8_output)); }
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k - 1),
+                reinterpret_cast<char *>(utf8_output));
+          }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
-          *utf8_output++ = char((value>>18) | 0b11110000);
-          *utf8_output++ = char(((value>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((value>>6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((value >> 18) | 0b11110000);
+          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((value & 0b111111) | 0b10000000);
         }
       }
@@ -15782,740 +15280,630 @@ std::pair<result, char*> arm_convert_utf16_to_utf8_with_errors(const char16_t* b
     }
   } // while
 
-  return std::make_pair(result(error_code::SUCCESS, buf - start), reinterpret_cast<char*>(utf8_output));
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char *>(utf8_output));
 }
 /* end file src/arm64/arm_convert_utf16_to_utf8.cpp */
-/* begin file src/arm64/arm_convert_utf16_to_utf32.cpp */
-/*
-    The vectorized algorithm works on single SSE register i.e., it
-    loads eight 16-bit code units.
-
-    We consider three cases:
-    1. an input register contains no surrogates and each value
-       is in range 0x0000 .. 0x07ff.
-    2. an input register contains no surrogates and values are
-       is in range 0x0000 .. 0xffff.
-    3. an input register contains surrogates --- i.e. codepoints
-       can have 16 or 32 bits.
 
-    Ad 1.
+/* begin file src/arm64/arm_base64.cpp */
+/**
+ * References and further reading:
+ *
+ * Wojciech Muła, Daniel Lemire, Base64 encoding and decoding at almost the
+ * speed of a memory copy, Software: Practice and Experience 50 (2), 2020.
+ * https://arxiv.org/abs/1910.05109
+ *
+ * Wojciech Muła, Daniel Lemire, Faster Base64 Encoding and Decoding using AVX2
+ * Instructions, ACM Transactions on the Web 12 (3), 2018.
+ * https://arxiv.org/abs/1704.00605
+ *
+ * Simon Josefsson. 2006. The Base16, Base32, and Base64 Data Encodings.
+ * https://tools.ietf.org/html/rfc4648. (2006). Internet Engineering Task Force,
+ * Request for Comments: 4648.
+ *
+ * Alfred Klomp. 2014a. Fast Base64 encoding/decoding with SSE vectorization.
+ * http://www.alfredklomp.com/programming/sse-base64/. (2014).
+ *
+ * Alfred Klomp. 2014b. Fast Base64 stream encoder/decoder in C99, with SIMD
+ * acceleration. https://github.com/aklomp/base64. (2014).
+ *
+ * Hanson Char. 2014. A Fast and Correct Base 64 Codec. (2014).
+ * https://aws.amazon.com/blogs/developer/a-fast-and-correct-base-64-codec/
+ *
+ * Nick Kopp. 2013. Base64 Encoding on a GPU.
+ * https://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU. (2013).
+ */
 
-    When values are less than 0x0800, it means that a 16-bit code unit
-    can be converted into: 1) single UTF8 byte (when it is an ASCII
-    char) or 2) two UTF8 bytes.
+size_t encode_base64(char *dst, const char *src, size_t srclen,
+                     base64_options options) {
+  // credit: Wojciech Muła
+  uint8_t *out = (uint8_t *)dst;
+  constexpr static uint8_t source_table[64] = {
+      'A', 'Q', 'g', 'w', 'B', 'R', 'h', 'x', 'C', 'S', 'i', 'y', 'D',
+      'T', 'j', 'z', 'E', 'U', 'k', '0', 'F', 'V', 'l', '1', 'G', 'W',
+      'm', '2', 'H', 'X', 'n', '3', 'I', 'Y', 'o', '4', 'J', 'Z', 'p',
+      '5', 'K', 'a', 'q', '6', 'L', 'b', 'r', '7', 'M', 'c', 's', '8',
+      'N', 'd', 't', '9', 'O', 'e', 'u', '+', 'P', 'f', 'v', '/',
+  };
+  constexpr static uint8_t source_table_url[64] = {
+      'A', 'Q', 'g', 'w', 'B', 'R', 'h', 'x', 'C', 'S', 'i', 'y', 'D',
+      'T', 'j', 'z', 'E', 'U', 'k', '0', 'F', 'V', 'l', '1', 'G', 'W',
+      'm', '2', 'H', 'X', 'n', '3', 'I', 'Y', 'o', '4', 'J', 'Z', 'p',
+      '5', 'K', 'a', 'q', '6', 'L', 'b', 'r', '7', 'M', 'c', 's', '8',
+      'N', 'd', 't', '9', 'O', 'e', 'u', '-', 'P', 'f', 'v', '_',
+  };
+  const uint8x16_t v3f = vdupq_n_u8(0x3f);
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  // When trying to load a uint8_t array, Visual Studio might
+  // error with: error C2664: '__n128x4 neon_ld4m_q8(const char *)':
+  // cannot convert argument 1 from 'const uint8_t [64]' to 'const char *'
+  const uint8x16x4_t table = vld4q_u8(
+      (reinterpret_cast<const char *>(options & base64_url) ? source_table_url
+                                                            : source_table));
+#else
+  const uint8x16x4_t table =
+      vld4q_u8((options & base64_url) ? source_table_url : source_table);
+#endif
+  size_t i = 0;
+  for (; i + 16 * 3 <= srclen; i += 16 * 3) {
+    const uint8x16x3_t in = vld3q_u8((const uint8_t *)src + i);
+    uint8x16x4_t result;
+    result.val[0] = vshrq_n_u8(in.val[0], 2);
+    result.val[1] =
+        vandq_u8(vsliq_n_u8(vshrq_n_u8(in.val[1], 4), in.val[0], 4), v3f);
+    result.val[2] =
+        vandq_u8(vsliq_n_u8(vshrq_n_u8(in.val[2], 6), in.val[1], 2), v3f);
+    result.val[3] = vandq_u8(in.val[2], v3f);
+    result.val[0] = vqtbl4q_u8(table, result.val[0]);
+    result.val[1] = vqtbl4q_u8(table, result.val[1]);
+    result.val[2] = vqtbl4q_u8(table, result.val[2]);
+    result.val[3] = vqtbl4q_u8(table, result.val[3]);
+    vst4q_u8(out, result);
+    out += 64;
+  }
+  out += scalar::base64::tail_encode_base64((char *)out, src + i, srclen - i,
+                                            options);
 
-    For this case we do only some shuffle to obtain these 2-byte
-    codes and finally compress the whole SSE register with a single
-    shuffle.
+  return size_t((char *)out - dst);
+}
 
-    We need 256-entry lookup table to get a compression pattern
-    and the number of output bytes in the compressed vector register.
-    Each entry occupies 17 bytes.
+static inline void compress(uint8x16_t data, uint16_t mask, char *output) {
+  if (mask == 0) {
+    vst1q_u8((uint8_t *)output, data);
+    return;
+  }
+  uint8_t mask1 = uint8_t(mask);      // least significant 8 bits
+  uint8_t mask2 = uint8_t(mask >> 8); // most significant 8 bits
+  uint64x2_t compactmasku64 = {tables::base64::thintable_epi8[mask1],
+                               tables::base64::thintable_epi8[mask2]};
+  uint8x16_t compactmask = vreinterpretq_u8_u64(compactmasku64);
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  const uint8x16_t off =
+      simdutf_make_uint8x16_t(0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8);
+#else
+  const uint8x16_t off = {0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8};
+#endif
 
-    Ad 2.
+  compactmask = vaddq_u8(compactmask, off);
+  uint8x16_t pruned = vqtbl1q_u8(data, compactmask);
 
-    When values fit in 16-bit code units, but are above 0x07ff, then
-    a single word may produce one, two or three UTF8 bytes.
+  int pop1 = tables::base64::BitsSetTable256mul2[mask1];
+  // Then load the corresponding mask: it writes only the first
+  // pop1 bytes from the first 8 bytes, and then fills in with
+  // the bytes from the second 8 bytes, plus some filler
+  // at the end.
+  compactmask = vld1q_u8(tables::base64::pshufb_combine_table + pop1 * 8);
+  uint8x16_t answer = vqtbl1q_u8(pruned, compactmask);
+  vst1q_u8((uint8_t *)output, answer);
+}
 
-    We prepare data for all these three cases in two registers.
-    The first register contains lower two UTF8 bytes (used in all
-    cases), while the second one contains just the third byte for
-    the three-UTF8-bytes case.
+struct block64 {
+  uint8x16_t chunks[4];
+};
 
-    Finally these two registers are interleaved forming eight-element
-    array of 32-bit values. The array spans two SSE registers.
-    The bytes from the registers are compressed using two shuffles.
+static_assert(sizeof(block64) == 64, "block64 is not 64 bytes");
+template <bool base64_url> uint64_t to_base64_mask(block64 *b, bool *error) {
+  uint8x16_t v0f = vdupq_n_u8(0xf);
 
-    We need 256-entry lookup table to get a compression pattern
-    and the number of output bytes in the compressed vector register.
-    Each entry occupies 17 bytes.
+  uint8x16_t underscore0, underscore1, underscore2, underscore3;
+  if (base64_url) {
+    underscore0 = vceqq_u8(b->chunks[0], vdupq_n_u8(0x5f));
+    underscore1 = vceqq_u8(b->chunks[1], vdupq_n_u8(0x5f));
+    underscore2 = vceqq_u8(b->chunks[2], vdupq_n_u8(0x5f));
+    underscore3 = vceqq_u8(b->chunks[3], vdupq_n_u8(0x5f));
+  } else {
+    (void)underscore0;
+    (void)underscore1;
+    (void)underscore2;
+    (void)underscore3;
+  }
 
+  uint8x16_t lo_nibbles0 = vandq_u8(b->chunks[0], v0f);
+  uint8x16_t lo_nibbles1 = vandq_u8(b->chunks[1], v0f);
+  uint8x16_t lo_nibbles2 = vandq_u8(b->chunks[2], v0f);
+  uint8x16_t lo_nibbles3 = vandq_u8(b->chunks[3], v0f);
 
-    To summarize:
-    - We need two 256-entry tables that have 8704 bytes in total.
-*/
-/*
-  Returns a pair: the first unprocessed byte from buf and utf8_output
-  A scalar routing should carry on the conversion of the tail.
-*/
-template <endianness big_endian>
-std::pair<const char16_t*, char32_t*> arm_convert_utf16_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_out) {
-  uint32_t * utf32_output = reinterpret_cast<uint32_t*>(utf32_out);
-  const char16_t* end = buf + len;
+  // Needed by the decoding step.
+  uint8x16_t hi_nibbles0 = vshrq_n_u8(b->chunks[0], 4);
+  uint8x16_t hi_nibbles1 = vshrq_n_u8(b->chunks[1], 4);
+  uint8x16_t hi_nibbles2 = vshrq_n_u8(b->chunks[2], 4);
+  uint8x16_t hi_nibbles3 = vshrq_n_u8(b->chunks[3], 4);
+  uint8x16_t lut_lo;
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  if (base64_url) {
+    lut_lo =
+        simdutf_make_uint8x16_t(0x3a, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70,
+                                0x70, 0x61, 0xe1, 0xf4, 0xe5, 0xa5, 0xf4, 0xf4);
+  } else {
+    lut_lo =
+        simdutf_make_uint8x16_t(0x3a, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70,
+                                0x70, 0x61, 0xe1, 0xb4, 0xe5, 0xe5, 0xf4, 0xb4);
+  }
+#else
+  if (base64_url) {
+    lut_lo = uint8x16_t{0x3a, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70,
+                        0x70, 0x61, 0xe1, 0xf4, 0xe5, 0xa5, 0xf4, 0xf4};
+  } else {
+    lut_lo = uint8x16_t{0x3a, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70,
+                        0x70, 0x61, 0xe1, 0xb4, 0xe5, 0xe5, 0xf4, 0xb4};
+  }
+#endif
+  uint8x16_t lo0 = vqtbl1q_u8(lut_lo, lo_nibbles0);
+  uint8x16_t lo1 = vqtbl1q_u8(lut_lo, lo_nibbles1);
+  uint8x16_t lo2 = vqtbl1q_u8(lut_lo, lo_nibbles2);
+  uint8x16_t lo3 = vqtbl1q_u8(lut_lo, lo_nibbles3);
+  uint8x16_t lut_hi;
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  if (base64_url) {
+    lut_hi =
+        simdutf_make_uint8x16_t(0x11, 0x20, 0x42, 0x80, 0x8, 0x4, 0x8, 0x4,
+                                0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20);
+  } else {
+    lut_hi =
+        simdutf_make_uint8x16_t(0x11, 0x20, 0x42, 0x80, 0x8, 0x4, 0x8, 0x4,
+                                0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20);
+  }
+#else
+  if (base64_url) {
+    lut_hi = uint8x16_t{0x11, 0x20, 0x42, 0x80, 0x8,  0x4,  0x8,  0x4,
+                        0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20};
+  } else {
+    lut_hi = uint8x16_t{0x11, 0x20, 0x42, 0x80, 0x8,  0x4,  0x8,  0x4,
+                        0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20};
+  }
+#endif
+  uint8x16_t hi0 = vqtbl1q_u8(lut_hi, hi_nibbles0);
+  uint8x16_t hi1 = vqtbl1q_u8(lut_hi, hi_nibbles1);
+  uint8x16_t hi2 = vqtbl1q_u8(lut_hi, hi_nibbles2);
+  uint8x16_t hi3 = vqtbl1q_u8(lut_hi, hi_nibbles3);
 
-  const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
-  const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
+  if (base64_url) {
+    hi0 = vbicq_u8(hi0, underscore0);
+    hi1 = vbicq_u8(hi1, underscore1);
+    hi2 = vbicq_u8(hi2, underscore2);
+    hi3 = vbicq_u8(hi3, underscore3);
+  }
 
-  while (end - buf >= 8) {
-    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
-    if (!match_system(big_endian)) { in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in))); }
+  uint8_t checks =
+      vmaxvq_u8(vorrq_u8(vorrq_u8(vandq_u8(lo0, hi0), vandq_u8(lo1, hi1)),
+                         vorrq_u8(vandq_u8(lo2, hi2), vandq_u8(lo3, hi3))));
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  const uint8x16_t bit_mask =
+      simdutf_make_uint8x16_t(0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
+                              0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80);
+#else
+  const uint8x16_t bit_mask = {0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
+                               0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+#endif
+  uint64_t badcharmask = 0;
+  *error = checks > 0x3;
+  if (checks) {
+    // Add each of the elements next to each other, successively, to stuff each
+    // 8 byte mask into one.
+    uint8x16_t test0 = vtstq_u8(lo0, hi0);
+    uint8x16_t test1 = vtstq_u8(lo1, hi1);
+    uint8x16_t test2 = vtstq_u8(lo2, hi2);
+    uint8x16_t test3 = vtstq_u8(lo3, hi3);
+    uint8x16_t sum0 =
+        vpaddq_u8(vandq_u8(test0, bit_mask), vandq_u8(test1, bit_mask));
+    uint8x16_t sum1 =
+        vpaddq_u8(vandq_u8(test2, bit_mask), vandq_u8(test3, bit_mask));
+    sum0 = vpaddq_u8(sum0, sum1);
+    sum0 = vpaddq_u8(sum0, sum0);
+    badcharmask = vgetq_lane_u64(vreinterpretq_u64_u8(sum0), 0);
+  }
+  // This is the transformation step that can be done while we are waiting for
+  // sum0
+  uint8x16_t roll_lut;
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  if (base64_url) {
+    roll_lut =
+        simdutf_make_uint8x16_t(0xe0, 0x11, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
+                                0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0);
+  } else {
+    roll_lut =
+        simdutf_make_uint8x16_t(0x0, 0x10, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
+                                0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0);
+  }
+#else
+  if (base64_url) {
+    roll_lut = uint8x16_t{0xe0, 0x11, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
+                          0x0,  0x0,  0x0,  0x0, 0x0,  0x0,  0x0,  0x0};
+  } else {
+    roll_lut = uint8x16_t{0x0, 0x10, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
+                          0x0, 0x0,  0x0,  0x0, 0x0,  0x0,  0x0,  0x0};
+  }
+#endif
+  uint8x16_t vsecond_last = base64_url ? vdupq_n_u8(0x2d) : vdupq_n_u8(0x2f);
+  if (base64_url) {
+    hi_nibbles0 = vbicq_u8(hi_nibbles0, underscore0);
+    hi_nibbles1 = vbicq_u8(hi_nibbles1, underscore1);
+    hi_nibbles2 = vbicq_u8(hi_nibbles2, underscore2);
+    hi_nibbles3 = vbicq_u8(hi_nibbles3, underscore3);
+  }
+  uint8x16_t roll0 = vqtbl1q_u8(
+      roll_lut, vaddq_u8(vceqq_u8(b->chunks[0], vsecond_last), hi_nibbles0));
+  uint8x16_t roll1 = vqtbl1q_u8(
+      roll_lut, vaddq_u8(vceqq_u8(b->chunks[1], vsecond_last), hi_nibbles1));
+  uint8x16_t roll2 = vqtbl1q_u8(
+      roll_lut, vaddq_u8(vceqq_u8(b->chunks[2], vsecond_last), hi_nibbles2));
+  uint8x16_t roll3 = vqtbl1q_u8(
+      roll_lut, vaddq_u8(vceqq_u8(b->chunks[3], vsecond_last), hi_nibbles3));
+  b->chunks[0] = vaddq_u8(b->chunks[0], roll0);
+  b->chunks[1] = vaddq_u8(b->chunks[1], roll1);
+  b->chunks[2] = vaddq_u8(b->chunks[2], roll2);
+  b->chunks[3] = vaddq_u8(b->chunks[3], roll3);
+  return badcharmask;
+}
 
-    const uint16x8_t surrogates_bytemask = vceqq_u16(vandq_u16(in, v_f800), v_d800);
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help. However,
-    // it is likely an uncommon occurrence.
-    if (vmaxvq_u16(surrogates_bytemask) == 0) {
-      // case: no surrogate pairs, extend all 16-bit code units to 32-bit code units
-      vst1q_u32(utf32_output,  vmovl_u16(vget_low_u16(in)));
-      vst1q_u32(utf32_output+4,  vmovl_high_u16(in));
-      utf32_output += 8;
-      buf += 8;
-    // surrogate pair(s) in a register
-    } else {
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // in the presence of surrogate pairs may require non-trivial tables.
-      size_t forward = 15;
-      size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
-        uint16_t word = !match_system(big_endian) ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if((word &0xF800 ) != 0xD800) {
-          *utf32_output++ = char32_t(word);
-        } else {
-          // must be a surrogate pair
-          uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = !match_system(big_endian) ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
-          k++;
-          uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if((diff | diff2) > 0x3FF)  { return std::make_pair(nullptr, reinterpret_cast<char32_t*>(utf32_output)); }
-          uint32_t value = (diff << 10) + diff2 + 0x10000;
-          *utf32_output++ = char32_t(value);
-        }
-      }
-      buf += k;
-    }
-  } // while
-  return std::make_pair(buf, reinterpret_cast<char32_t*>(utf32_output));
+void copy_block(block64 *b, char *output) {
+  vst1q_u8((uint8_t *)output, b->chunks[0]);
+  vst1q_u8((uint8_t *)output + 16, b->chunks[1]);
+  vst1q_u8((uint8_t *)output + 32, b->chunks[2]);
+  vst1q_u8((uint8_t *)output + 48, b->chunks[3]);
 }
 
+uint64_t compress_block(block64 *b, uint64_t mask, char *output) {
+  uint64_t popcounts =
+      vget_lane_u64(vreinterpret_u64_u8(vcnt_u8(vcreate_u8(~mask))), 0);
+  uint64_t offsets = popcounts * 0x0101010101010101;
+  compress(b->chunks[0], uint16_t(mask), output);
+  compress(b->chunks[1], uint16_t(mask >> 16), &output[(offsets >> 8) & 0xFF]);
+  compress(b->chunks[2], uint16_t(mask >> 32), &output[(offsets >> 24) & 0xFF]);
+  compress(b->chunks[3], uint16_t(mask >> 48), &output[(offsets >> 40) & 0xFF]);
+  return offsets >> 56;
+}
 
-/*
-  Returns a pair: a result struct and utf8_output.
-  If there is an error, the count field of the result is the position of the error.
-  Otherwise, it is the position of the first unprocessed byte in buf (even if finished).
-  A scalar routing should carry on the conversion of the tail if needed.
-*/
-template <endianness big_endian>
-std::pair<result, char32_t*> arm_convert_utf16_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_out) {
-  uint32_t * utf32_output = reinterpret_cast<uint32_t*>(utf32_out);
-  const char16_t* start = buf;
-  const char16_t* end = buf + len;
+// The caller of this function is responsible for ensuring that at least 64
+// bytes can be read from src. The data is read into a block64 structure.
+void load_block(block64 *b, const char *src) {
+  b->chunks[0] = vld1q_u8(reinterpret_cast<const uint8_t *>(src));
+  b->chunks[1] = vld1q_u8(reinterpret_cast<const uint8_t *>(src) + 16);
+  b->chunks[2] = vld1q_u8(reinterpret_cast<const uint8_t *>(src) + 32);
+  b->chunks[3] = vld1q_u8(reinterpret_cast<const uint8_t *>(src) + 48);
+}
 
-  const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
-  const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
+// The caller of this function is responsible for ensuring that at least 32
+// bytes can be read from data. It returns a 16-byte value, narrowing the
+// sixteen 16-bit words to bytes with saturation.
+inline uint8x16_t load_satured(const uint16_t *data) {
+  uint16x8_t in1 = vld1q_u16(data);
+  uint16x8_t in2 = vld1q_u16(data + 8);
+  return vqmovn_high_u16(vqmovn_u16(in1), in2);
+}
 
-  while ((end - buf) >= 8) {
-    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
-    if (!match_system(big_endian)) { in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in))); }
+// The caller of this function is responsible for ensuring that at least 128
+// bytes can be read from src. The data is read into a block64 structure.
+void load_block(block64 *b, const char16_t *src) {
+  b->chunks[0] = load_satured(reinterpret_cast<const uint16_t *>(src));
+  b->chunks[1] = load_satured(reinterpret_cast<const uint16_t *>(src) + 16);
+  b->chunks[2] = load_satured(reinterpret_cast<const uint16_t *>(src) + 32);
+  b->chunks[3] = load_satured(reinterpret_cast<const uint16_t *>(src) + 48);
+}
 
-    const uint16x8_t surrogates_bytemask = vceqq_u16(vandq_u16(in, v_f800), v_d800);
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help. However,
-    // it is likely an uncommon occurrence.
-    if (vmaxvq_u16(surrogates_bytemask) == 0) {
-      // case: no surrogate pairs, extend all 16-bit code units to 32-bit code units
-      vst1q_u32(utf32_output,  vmovl_u16(vget_low_u16(in)));
-      vst1q_u32(utf32_output+4,  vmovl_high_u16(in));
-      utf32_output += 8;
-      buf += 8;
-    // surrogate pair(s) in a register
-    } else {
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // in the presence of surrogate pairs may require non-trivial tables.
-      size_t forward = 15;
-      size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
-        uint16_t word = !match_system(big_endian) ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if((word &0xF800 ) != 0xD800) {
-          *utf32_output++ = char32_t(word);
-        } else {
-          // must be a surrogate pair
-          uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = !match_system(big_endian) ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
-          k++;
-          uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if((diff | diff2) > 0x3FF)  { return std::make_pair(result(error_code::SURROGATE, buf - start + k - 1), reinterpret_cast<char32_t*>(utf32_output)); }
-          uint32_t value = (diff << 10) + diff2 + 0x10000;
-          *utf32_output++ = char32_t(value);
+// decode 64 bytes and output 48 bytes
+void base64_decode_block(char *out, const char *src) {
+  uint8x16x4_t str = vld4q_u8((uint8_t *)src);
+  uint8x16x3_t outvec;
+  outvec.val[0] =
+      vorrq_u8(vshlq_n_u8(str.val[0], 2), vshrq_n_u8(str.val[1], 4));
+  outvec.val[1] =
+      vorrq_u8(vshlq_n_u8(str.val[1], 4), vshrq_n_u8(str.val[2], 2));
+  outvec.val[2] = vorrq_u8(vshlq_n_u8(str.val[2], 6), str.val[3]);
+  vst3q_u8((uint8_t *)out, outvec);
+}
+
+template <bool base64_url, typename char_type>
+result compress_decode_base64(char *dst, const char_type *src, size_t srclen,
+                              base64_options options,
+                              last_chunk_handling_options last_chunk_options) {
+  const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
+                                        : tables::base64::to_base64_value;
+  size_t equallocation =
+      srclen; // location of the first padding character if any
+  // skip trailing spaces
+  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+         to_base64[uint8_t(src[srclen - 1])] == 64) {
+    srclen--;
+  }
+  size_t equalsigns = 0;
+  if (srclen > 0 && src[srclen - 1] == '=') {
+    equallocation = srclen - 1;
+    srclen--;
+    equalsigns = 1;
+    // skip trailing spaces
+    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+           to_base64[uint8_t(src[srclen - 1])] == 64) {
+      srclen--;
+    }
+    if (srclen > 0 && src[srclen - 1] == '=') {
+      equallocation = srclen - 1;
+      srclen--;
+      equalsigns = 2;
+    }
+  }
+  if (srclen == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
+    }
+    return {SUCCESS, 0};
+  }
+  const char_type *const srcinit = src;
+  const char *const dstinit = dst;
+  const char_type *const srcend = src + srclen;
+
+  constexpr size_t block_size = 10;
+  char buffer[block_size * 64];
+  char *bufferptr = buffer;
+  if (srclen >= 64) {
+    const char_type *const srcend64 = src + srclen - 64;
+    while (src <= srcend64) {
+      block64 b;
+      load_block(&b, src);
+      src += 64;
+      bool error = false;
+      uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
+      if (badcharmask) {
+        if (error) {
+          src -= 64;
+          while (src < srcend && scalar::base64::is_eight_byte(*src) &&
+                 to_base64[uint8_t(*src)] <= 64) {
+            src++;
+          }
+          if (src < srcend) {
+            // should never happen
+          }
+          return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
         }
       }
-      buf += k;
+
+      if (badcharmask != 0) {
+        // optimization opportunity: check for simple masks like those made of
+        // continuous 1s followed by continuous 0s. And masks containing a
+        // single bad character.
+        bufferptr += compress_block(&b, badcharmask, bufferptr);
+      } else {
+        // optimization opportunity: if bufferptr == buffer and mask == 0, we
+        // can avoid the call to compress_block and decode directly.
+        copy_block(&b, bufferptr);
+        bufferptr += 64;
+      }
+      if (bufferptr >= (block_size - 1) * 64 + buffer) {
+        for (size_t i = 0; i < (block_size - 1); i++) {
+          base64_decode_block(dst, buffer + i * 64);
+          dst += 48;
+        }
+        std::memcpy(buffer, buffer + (block_size - 1) * 64,
+                    64); // 64 might be too much
+        bufferptr -= (block_size - 1) * 64;
+      }
+    }
+  }
+  char *buffer_start = buffer;
+  // Optimization note: if this is almost full, then it is worth our
+  // time, otherwise, we should just decode directly.
+  int last_block = (int)((bufferptr - buffer_start) % 64);
+  if (last_block != 0 && srcend - src + last_block >= 64) {
+    while ((bufferptr - buffer_start) % 64 != 0 && src < srcend) {
+      uint8_t val = to_base64[uint8_t(*src)];
+      *bufferptr = char(val);
+      if (!scalar::base64::is_eight_byte(*src) || val > 64) {
+        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+      }
+      bufferptr += (val <= 63);
+      src++;
+    }
+  }
+
+  for (; buffer_start + 64 <= bufferptr; buffer_start += 64) {
+    base64_decode_block(dst, buffer_start);
+    dst += 48;
+  }
+  if ((bufferptr - buffer_start) % 64 != 0) {
+    while (buffer_start + 4 < bufferptr) {
+      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
+                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
+                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
+                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
+                        << 8;
+      triple = scalar::utf32::swap_bytes(triple);
+      std::memcpy(dst, &triple, 4);
+
+      dst += 3;
+      buffer_start += 4;
+    }
+    if (buffer_start + 4 <= bufferptr) {
+      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
+                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
+                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
+                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
+                        << 8;
+      triple = scalar::utf32::swap_bytes(triple);
+      std::memcpy(dst, &triple, 3);
+
+      dst += 3;
+      buffer_start += 4;
+    }
+    // we may have 1, 2 or 3 bytes left and we need to decode them so let us
+    // backtrack
+    int leftover = int(bufferptr - buffer_start);
+    while (leftover > 0) {
+      while (to_base64[uint8_t(*(src - 1))] == 64) {
+        src--;
+      }
+      src--;
+      leftover--;
+    }
+  }
+  if (src < srcend + equalsigns) {
+    result r = scalar::base64::base64_tail_decode(
+        dst, src, srcend - src, equalsigns, options, last_chunk_options);
+    if (r.error == error_code::INVALID_BASE64_CHARACTER) {
+      r.count += size_t(src - srcinit);
+      return r;
+    } else {
+      r.count += size_t(dst - dstinit);
+    }
+    if (last_chunk_options != stop_before_partial &&
+        r.error == error_code::SUCCESS && equalsigns > 0) {
+      // additional checks
+      if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+        r.error = error_code::INVALID_BASE64_CHARACTER;
+        r.count = equallocation;
+      }
     }
-  } // while
-  return std::make_pair(result(error_code::SUCCESS, buf - start), reinterpret_cast<char32_t*>(utf32_output));
+    return r;
+  }
+  if (equalsigns > 0) {
+    if ((size_t(dst - dstinit) % 3 == 0) ||
+        ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
+    }
+  }
+  return {SUCCESS, size_t(dst - dstinit)};
 }
-/* end file src/arm64/arm_convert_utf16_to_utf32.cpp */
-
+/* end file src/arm64/arm_base64.cpp */
 /* begin file src/arm64/arm_convert_utf32_to_latin1.cpp */
-std::pair<const char32_t*, char*> arm_convert_utf32_to_latin1(const char32_t* buf, size_t len, char* latin1_output) {
-  const char32_t* end = buf + len;
+std::pair<const char32_t *, char *>
+arm_convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                            char *latin1_output) {
+  const char32_t *end = buf + len;
   while (end - buf >= 8) {
     uint32x4_t in1 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
-    uint32x4_t in2 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf+4));
+    uint32x4_t in2 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf + 4));
 
     uint16x8_t utf16_packed = vcombine_u16(vqmovn_u32(in1), vqmovn_u32(in2));
     if (vmaxvq_u16(utf16_packed) <= 0xff) {
-        // 1. pack the bytes
-        uint8x8_t latin1_packed = vmovn_u16(utf16_packed);
-        // 2. store (8 bytes)
-        vst1_u8(reinterpret_cast<uint8_t*>(latin1_output), latin1_packed);
-        // 3. adjust pointers
-        buf += 8;
-        latin1_output += 8;
+      // 1. pack the bytes
+      uint8x8_t latin1_packed = vmovn_u16(utf16_packed);
+      // 2. store (8 bytes)
+      vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
+      // 3. adjust pointers
+      buf += 8;
+      latin1_output += 8;
     } else {
-      return std::make_pair(nullptr, reinterpret_cast<char*>(latin1_output));
+      return std::make_pair(nullptr, reinterpret_cast<char *>(latin1_output));
     }
   } // while
   return std::make_pair(buf, latin1_output);
 }
 
-
-std::pair<result, char*> arm_convert_utf32_to_latin1_with_errors(const char32_t* buf, size_t len, char* latin1_output) {
-  const char32_t* start = buf;
-  const char32_t* end = buf + len;
+std::pair<result, char *>
+arm_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                        char *latin1_output) {
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
 
   while (end - buf >= 8) {
     uint32x4_t in1 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
-    uint32x4_t in2 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf+4));
+    uint32x4_t in2 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf + 4));
 
     uint16x8_t utf16_packed = vcombine_u16(vqmovn_u32(in1), vqmovn_u32(in2));
 
     if (vmaxvq_u16(utf16_packed) <= 0xff) {
-        // 1. pack the bytes
-        uint8x8_t latin1_packed = vmovn_u16(utf16_packed);
-        // 2. store (8 bytes)
-        vst1_u8(reinterpret_cast<uint8_t*>(latin1_output), latin1_packed);
-        // 3. adjust pointers
-        buf += 8;
-        latin1_output += 8;
+      // 1. pack the bytes
+      uint8x8_t latin1_packed = vmovn_u16(utf16_packed);
+      // 2. store (8 bytes)
+      vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
+      // 3. adjust pointers
+      buf += 8;
+      latin1_output += 8;
     } else {
       // Let us do a scalar fallback.
-      for(int k = 0; k < 8; k++) {
+      for (int k = 0; k < 8; k++) {
         uint32_t word = buf[k];
-        if(word <= 0xff) {
+        if (word <= 0xff) {
           *latin1_output++ = char(word);
         } else {
-          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k), latin1_output);
+          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
+                                latin1_output);
         }
       }
     }
   } // while
-  return std::make_pair(result(error_code::SUCCESS, buf - start), latin1_output);
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        latin1_output);
 }
 /* end file src/arm64/arm_convert_utf32_to_latin1.cpp */
-/* begin file src/arm64/arm_convert_utf32_to_utf8.cpp */
-std::pair<const char32_t*, char*> arm_convert_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_out) {
-  uint8_t * utf8_output = reinterpret_cast<uint8_t*>(utf8_out);
-  const char32_t* end = buf + len;
-
-  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
-
-  uint16x8_t forbidden_bytemask = vmovq_n_u16(0x0);
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
-
-  while (buf + 16 + safety_margin < end) {
-    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
-    uint32x4_t nextin = vld1q_u32(reinterpret_cast<const uint32_t *>(buf+4));
-
-    // Check if no bits set above 16th
-    if(vmaxvq_u32(vorrq_u32(in, nextin)) <= 0xFFFF) {
-      // Pack UTF-32 to UTF-16 safely (without surrogate pairs)
-      // Apply UTF-16 => UTF-8 routine (arm_convert_utf16_to_utf8.cpp)
-      uint16x8_t utf16_packed = vcombine_u16(vmovn_u32(in), vmovn_u32(nextin));
-      if(vmaxvq_u16(utf16_packed) <= 0x7F) { // ASCII fast path!!!!
-        // 1. pack the bytes
-        // obviously suboptimal.
-        uint8x8_t utf8_packed = vmovn_u16(utf16_packed);
-        // 2. store (8 bytes)
-        vst1_u8(utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 8;
-        utf8_output += 8;
-        continue; // we are done for this round!
-      }
-
-      if (vmaxvq_u16(utf16_packed) <= 0x7FF) {
-        // 1. prepare 2-byte values
-        // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-        // expected output   : [110a|aaaa|10bb|bbbb] x 8
-        const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
-        const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
-
-        // t0 = [000a|aaaa|bbbb|bb00]
-        const uint16x8_t t0 = vshlq_n_u16(utf16_packed, 2);
-        // t1 = [000a|aaaa|0000|0000]
-        const uint16x8_t t1 = vandq_u16(t0, v_1f00);
-        // t2 = [0000|0000|00bb|bbbb]
-        const uint16x8_t t2 = vandq_u16(utf16_packed, v_003f);
-        // t3 = [000a|aaaa|00bb|bbbb]
-        const uint16x8_t t3 = vorrq_u16(t1, t2);
-        // t4 = [110a|aaaa|10bb|bbbb]
-        const uint16x8_t t4 = vorrq_u16(t3, v_c080);
-        // 2. merge ASCII and 2-byte codewords
-        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
-        const uint8x16_t utf8_unpacked = vreinterpretq_u8_u16(vbslq_u16(one_byte_bytemask, utf16_packed, t4));
-        // 3. prepare bitmask for 8-bit lookup
-  #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-        const uint16x8_t mask = simdutf_make_uint16x8_t(0x0001, 0x0004,
-                                  0x0010, 0x0040,
-                                  0x0002, 0x0008,
-                                  0x0020, 0x0080);
-  #else
-        const uint16x8_t mask = { 0x0001, 0x0004,
-                                  0x0010, 0x0040,
-                                  0x0002, 0x0008,
-                                  0x0020, 0x0080 };
-  #endif
-        uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
-        // 4. pack the bytes
-        const uint8_t* row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-        const uint8x16_t shuffle = vld1q_u8(row + 1);
-        const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
-
-        // 5. store bytes
-        vst1q_u8(utf8_output, utf8_packed);
-
-        // 6. adjust pointers
-        buf += 8;
-        utf8_output += row[0];
-        continue;
-      } else {
-        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-        const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
-        const uint16x8_t v_dfff = vmovq_n_u16((uint16_t)0xdfff);
-        forbidden_bytemask = vorrq_u16(vandq_u16(vcleq_u16(utf16_packed, v_dfff), vcgeq_u16(utf16_packed, v_d800)), forbidden_bytemask);
-
-  #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-          const uint16x8_t dup_even = simdutf_make_uint16x8_t(0x0000, 0x0202, 0x0404, 0x0606,
-                                      0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
-  #else
-          const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
-                                      0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
-  #endif
-          /* In this branch we handle three cases:
-            1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-            2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-            3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
-
-            We expand the input word (16-bit) into two code units (32-bit), thus
-            we have room for four bytes. However, we need five distinct bit
-            layouts. Note that the last byte in cases #2 and #3 is the same.
-
-            We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-            in register t2.
-
-            We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-            either byte 1 for case #2 or byte 2 for case #3. Note that they
-            differ by exactly one bit.
-
-            Finally from these two code units we build proper UTF-8 sequence, taking
-            into account the case (i.e, the number of bytes to write).
-          */
-          /**
-           * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-           * t2 => [0ccc|cccc] [10cc|cccc]
-           * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-           */
-  #define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
-          // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-          const uint16x8_t t0 = vreinterpretq_u16_u8(vqtbl1q_u8(vreinterpretq_u8_u16(utf16_packed), vreinterpretq_u8_u16(dup_even)));
-          // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-          const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
-          // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-          const uint16x8_t t2 = vorrq_u16 (t1, simdutf_vec(0b1000000000000000));
-
-          // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
-          const uint16x8_t s0 = vshrq_n_u16(utf16_packed, 12);
-          // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
-          const uint16x8_t s1 = vandq_u16(utf16_packed, simdutf_vec(0b0000111111000000));
-          // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
-          const uint16x8_t s1s = vshlq_n_u16(s1, 2);
-          // [00bb|bbbb|0000|aaaa]
-          const uint16x8_t s2 = vorrq_u16(s0, s1s);
-          // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-          const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
-          const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
-          const uint16x8_t one_or_two_bytes_bytemask = vcleq_u16(utf16_packed, v_07ff);
-          const uint16x8_t m0 = vbicq_u16(simdutf_vec(0b0100000000000000), one_or_two_bytes_bytemask);
-          const uint16x8_t s4 = veorq_u16(s3, m0);
-  #undef simdutf_vec
-
-          // 4. expand code units 16-bit => 32-bit
-          const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
-          const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
-
-          // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-          const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-          const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
-  #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-          const uint16x8_t onemask = simdutf_make_uint16x8_t(0x0001, 0x0004,
-                                      0x0010, 0x0040,
-                                      0x0100, 0x0400,
-                                      0x1000, 0x4000 );
-          const uint16x8_t twomask = simdutf_make_uint16x8_t(0x0002, 0x0008,
-                                      0x0020, 0x0080,
-                                      0x0200, 0x0800,
-                                      0x2000, 0x8000 );
-  #else
-          const uint16x8_t onemask = { 0x0001, 0x0004,
-                                      0x0010, 0x0040,
-                                      0x0100, 0x0400,
-                                      0x1000, 0x4000 };
-          const uint16x8_t twomask = { 0x0002, 0x0008,
-                                      0x0020, 0x0080,
-                                      0x0200, 0x0800,
-                                      0x2000, 0x8000 };
-  #endif
-          const uint16x8_t combined = vorrq_u16(vandq_u16(one_byte_bytemask, onemask), vandq_u16(one_or_two_bytes_bytemask, twomask));
-          const uint16_t mask = vaddvq_u16(combined);
-          // The following fast path may or may not be beneficial.
-          /*if(mask == 0) {
-            // We only have three-byte code units. Use fast path.
-            const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
-            const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
-            const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
-            vst1q_u8(utf8_output, utf8_0);
-            utf8_output += 12;
-            vst1q_u8(utf8_output, utf8_1);
-            utf8_output += 12;
-            buf += 8;
-            continue;
-          }*/
-          const uint8_t mask0 = uint8_t(mask);
-          const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-          const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
-          const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
-
-          const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-          const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-          const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
-          const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
-
-          vst1q_u8(utf8_output, utf8_0);
-          utf8_output += row0[0];
-          vst1q_u8(utf8_output, utf8_1);
-          utf8_output += row1[0];
-
-          buf += 8;
-      }
-    // At least one 32-bit word will produce a surrogate pair in UTF-16 <=> will produce four UTF-8 bytes.
-    } else {
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // in the presence of surrogate pairs may require non-trivial tables.
-      size_t forward = 15;
-      size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
-        uint32_t word = buf[k];
-        if((word & 0xFFFFFF80)==0) {
-          *utf8_output++ = char(word);
-        } else if((word & 0xFFFFF800)==0) {
-          *utf8_output++ = char((word>>6) | 0b11000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word & 0xFFFF0000)==0) {
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(nullptr, reinterpret_cast<char*>(utf8_output)); }
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else {
-          if (word > 0x10FFFF) { return std::make_pair(nullptr, reinterpret_cast<char*>(utf8_output)); }
-          *utf8_output++ = char((word>>18) | 0b11110000);
-          *utf8_output++ = char(((word>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        }
-      }
-      buf += k;
-    }
-  } // while
-
-  // check for invalid input
-  if (vmaxvq_u16(forbidden_bytemask) != 0) {
-    return std::make_pair(nullptr, reinterpret_cast<char*>(utf8_output));
-  }
-  return std::make_pair(buf, reinterpret_cast<char*>(utf8_output));
-}
-
-
-std::pair<result, char*> arm_convert_utf32_to_utf8_with_errors(const char32_t* buf, size_t len, char* utf8_out) {
-  uint8_t * utf8_output = reinterpret_cast<uint8_t*>(utf8_out);
-  const char32_t* start = buf;
-  const char32_t* end = buf + len;
-
-  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
-
-  while (buf + 16 + safety_margin < end) {
-    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
-    uint32x4_t nextin = vld1q_u32(reinterpret_cast<const uint32_t *>(buf+4));
-
-    // Check if no bits set above 16th
-    if(vmaxvq_u32(vorrq_u32(in, nextin)) <= 0xFFFF) {
-      // Pack UTF-32 to UTF-16 safely (without surrogate pairs)
-      // Apply UTF-16 => UTF-8 routine (arm_convert_utf16_to_utf8.cpp)
-      uint16x8_t utf16_packed = vcombine_u16(vmovn_u32(in), vmovn_u32(nextin));
-      if(vmaxvq_u16(utf16_packed) <= 0x7F) { // ASCII fast path!!!!
-          // 1. pack the bytes
-          // obviously suboptimal.
-          uint8x8_t utf8_packed = vmovn_u16(utf16_packed);
-          // 2. store (8 bytes)
-          vst1_u8(utf8_output, utf8_packed);
-          // 3. adjust pointers
-          buf += 8;
-          utf8_output += 8;
-          continue; // we are done for this round!
-      }
-
-      if (vmaxvq_u16(utf16_packed) <= 0x7FF) {
-        // 1. prepare 2-byte values
-        // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-        // expected output   : [110a|aaaa|10bb|bbbb] x 8
-        const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
-        const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
-
-        // t0 = [000a|aaaa|bbbb|bb00]
-        const uint16x8_t t0 = vshlq_n_u16(utf16_packed, 2);
-        // t1 = [000a|aaaa|0000|0000]
-        const uint16x8_t t1 = vandq_u16(t0, v_1f00);
-        // t2 = [0000|0000|00bb|bbbb]
-        const uint16x8_t t2 = vandq_u16(utf16_packed, v_003f);
-        // t3 = [000a|aaaa|00bb|bbbb]
-        const uint16x8_t t3 = vorrq_u16(t1, t2);
-        // t4 = [110a|aaaa|10bb|bbbb]
-        const uint16x8_t t4 = vorrq_u16(t3, v_c080);
-        // 2. merge ASCII and 2-byte codewords
-        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
-        const uint8x16_t utf8_unpacked = vreinterpretq_u8_u16(vbslq_u16(one_byte_bytemask, utf16_packed, t4));
-        // 3. prepare bitmask for 8-bit lookup
-  #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-        const uint16x8_t mask = simdutf_make_uint16x8_t(0x0001, 0x0004,
-                                  0x0010, 0x0040,
-                                  0x0002, 0x0008,
-                                  0x0020, 0x0080);
-  #else
-        const uint16x8_t mask = { 0x0001, 0x0004,
-                                  0x0010, 0x0040,
-                                  0x0002, 0x0008,
-                                  0x0020, 0x0080 };
-  #endif
-        uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
-        // 4. pack the bytes
-        const uint8_t* row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-        const uint8x16_t shuffle = vld1q_u8(row + 1);
-        const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
-
-        // 5. store bytes
-        vst1q_u8(utf8_output, utf8_packed);
-
-        // 6. adjust pointers
-        buf += 8;
-        utf8_output += row[0];
-        continue;
-      } else {
-        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-
-        // check for invalid input
-        const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
-        const uint16x8_t v_dfff = vmovq_n_u16((uint16_t)0xdfff);
-        const uint16x8_t forbidden_bytemask = vandq_u16(vcleq_u16(utf16_packed, v_dfff), vcgeq_u16(utf16_packed, v_d800));
-        if (vmaxvq_u16(forbidden_bytemask) != 0) {
-          return std::make_pair(result(error_code::SURROGATE, buf - start), reinterpret_cast<char*>(utf8_output));
-        }
-
-  #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-          const uint16x8_t dup_even = simdutf_make_uint16x8_t(0x0000, 0x0202, 0x0404, 0x0606,
-                                      0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
-  #else
-          const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
-                                      0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
-  #endif
-          /* In this branch we handle three cases:
-            1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-            2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-            3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
-
-            We expand the input word (16-bit) into two code units (32-bit), thus
-            we have room for four bytes. However, we need five distinct bit
-            layouts. Note that the last byte in cases #2 and #3 is the same.
-
-            We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-            in register t2.
-
-            We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-            either byte 1 for case #2 or byte 2 for case #3. Note that they
-            differ by exactly one bit.
-
-            Finally from these two code units we build proper UTF-8 sequence, taking
-            into account the case (i.e, the number of bytes to write).
-          */
-          /**
-           * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-           * t2 => [0ccc|cccc] [10cc|cccc]
-           * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-           */
-  #define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
-          // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-          const uint16x8_t t0 = vreinterpretq_u16_u8(vqtbl1q_u8(vreinterpretq_u8_u16(utf16_packed), vreinterpretq_u8_u16(dup_even)));
-          // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-          const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
-          // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-          const uint16x8_t t2 = vorrq_u16 (t1, simdutf_vec(0b1000000000000000));
-
-          // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
-          const uint16x8_t s0 = vshrq_n_u16(utf16_packed, 12);
-          // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
-          const uint16x8_t s1 = vandq_u16(utf16_packed, simdutf_vec(0b0000111111000000));
-          // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
-          const uint16x8_t s1s = vshlq_n_u16(s1, 2);
-          // [00bb|bbbb|0000|aaaa]
-          const uint16x8_t s2 = vorrq_u16(s0, s1s);
-          // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-          const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
-          const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
-          const uint16x8_t one_or_two_bytes_bytemask = vcleq_u16(utf16_packed, v_07ff);
-          const uint16x8_t m0 = vbicq_u16(simdutf_vec(0b0100000000000000), one_or_two_bytes_bytemask);
-          const uint16x8_t s4 = veorq_u16(s3, m0);
-  #undef simdutf_vec
-
-          // 4. expand code units 16-bit => 32-bit
-          const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
-          const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
-
-          // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-          const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-          const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
-  #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-          const uint16x8_t onemask = simdutf_make_uint16x8_t(0x0001, 0x0004,
-                                      0x0010, 0x0040,
-                                      0x0100, 0x0400,
-                                      0x1000, 0x4000 );
-          const uint16x8_t twomask = simdutf_make_uint16x8_t(0x0002, 0x0008,
-                                      0x0020, 0x0080,
-                                      0x0200, 0x0800,
-                                      0x2000, 0x8000 );
-  #else
-          const uint16x8_t onemask = { 0x0001, 0x0004,
-                                      0x0010, 0x0040,
-                                      0x0100, 0x0400,
-                                      0x1000, 0x4000 };
-          const uint16x8_t twomask = { 0x0002, 0x0008,
-                                      0x0020, 0x0080,
-                                      0x0200, 0x0800,
-                                      0x2000, 0x8000 };
-  #endif
-          const uint16x8_t combined = vorrq_u16(vandq_u16(one_byte_bytemask, onemask), vandq_u16(one_or_two_bytes_bytemask, twomask));
-          const uint16_t mask = vaddvq_u16(combined);
-          // The following fast path may or may not be beneficial.
-          /*if(mask == 0) {
-            // We only have three-byte code units. Use fast path.
-            const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
-            const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
-            const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
-            vst1q_u8(utf8_output, utf8_0);
-            utf8_output += 12;
-            vst1q_u8(utf8_output, utf8_1);
-            utf8_output += 12;
-            buf += 8;
-            continue;
-          }*/
-          const uint8_t mask0 = uint8_t(mask);
-
-          const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-          const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
-          const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
-
-          const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-          const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-          const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
-          const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
-
-          vst1q_u8(utf8_output, utf8_0);
-          utf8_output += row0[0];
-          vst1q_u8(utf8_output, utf8_1);
-          utf8_output += row1[0];
-
-          buf += 8;
-      }
-    // At least one 32-bit word will produce a surrogate pair in UTF-16 <=> will produce four UTF-8 bytes.
-    } else {
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // in the presence of surrogate pairs may require non-trivial tables.
-      size_t forward = 15;
-      size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
-        uint32_t word = buf[k];
-        if((word & 0xFFFFFF80)==0) {
-          *utf8_output++ = char(word);
-        } else if((word & 0xFFFFF800)==0) {
-          *utf8_output++ = char((word>>6) | 0b11000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word & 0xFFFF0000)==0) {
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(result(error_code::SURROGATE, buf - start + k), reinterpret_cast<char*>(utf8_output)); }
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else {
-          if (word > 0x10FFFF) { return std::make_pair(result(error_code::TOO_LARGE, buf - start + k), reinterpret_cast<char*>(utf8_output)); }
-          *utf8_output++ = char((word>>18) | 0b11110000);
-          *utf8_output++ = char(((word>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        }
-      }
-      buf += k;
-    }
-  } // while
-
-  return std::make_pair(result(error_code::SUCCESS, buf - start), reinterpret_cast<char*>(utf8_output));
-}
-/* end file src/arm64/arm_convert_utf32_to_utf8.cpp */
 /* begin file src/arm64/arm_convert_utf32_to_utf16.cpp */
 template <endianness big_endian>
-std::pair<const char32_t*, char16_t*> arm_convert_utf32_to_utf16(const char32_t* buf, size_t len, char16_t* utf16_out) {
-  uint16_t * utf16_output = reinterpret_cast<uint16_t*>(utf16_out);
-  const char32_t* end = buf + len;
+std::pair<const char32_t *, char16_t *>
+arm_convert_utf32_to_utf16(const char32_t *buf, size_t len,
+                           char16_t *utf16_out) {
+  uint16_t *utf16_output = reinterpret_cast<uint16_t *>(utf16_out);
+  const char32_t *end = buf + len;
 
   uint16x4_t forbidden_bytemask = vmov_n_u16(0x0);
 
-  while(end - buf >= 4) {
+  while (end - buf >= 4) {
     uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
 
     // Check if no bits set above 16th
-    if(vmaxvq_u32(in) <= 0xFFFF) {
+    if (vmaxvq_u32(in) <= 0xFFFF) {
       uint16x4_t utf16_packed = vmovn_u32(in);
 
       const uint16x4_t v_d800 = vmov_n_u16((uint16_t)0xd800);
       const uint16x4_t v_dfff = vmov_n_u16((uint16_t)0xdfff);
-      forbidden_bytemask = vorr_u16(vand_u16(vcle_u16(utf16_packed, v_dfff), vcge_u16(utf16_packed, v_d800)), forbidden_bytemask);
+      forbidden_bytemask = vorr_u16(vand_u16(vcle_u16(utf16_packed, v_dfff),
+                                             vcge_u16(utf16_packed, v_d800)),
+                                    forbidden_bytemask);
 
-      if (!match_system(big_endian)) { utf16_packed = vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(utf16_packed))); }
+      if (!match_system(big_endian)) {
+        utf16_packed =
+            vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(utf16_packed)));
+      }
       vst1_u16(utf16_output, utf16_packed);
       utf16_output += 4;
       buf += 4;
     } else {
       size_t forward = 3;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFF0000)==0) {
+        if ((word & 0xFFFF0000) == 0) {
           // will not generate a surrogate pair
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(nullptr, reinterpret_cast<char16_t*>(utf16_output)); }
-          *utf16_output++ = !match_system(big_endian) ? char16_t(word >> 8 | word << 8) : char16_t(word);
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char16_t *>(utf16_output));
+          }
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(word >> 8 | word << 8)
+                                : char16_t(word);
         } else {
           // will generate a surrogate pair
-          if (word > 0x10FFFF) { return std::make_pair(nullptr, reinterpret_cast<char16_t*>(utf16_output)); }
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char16_t *>(utf16_output));
+          }
           word -= 0x10000;
           uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
           uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
           if (!match_system(big_endian)) {
-            high_surrogate = uint16_t(high_surrogate >> 8 | high_surrogate << 8);
+            high_surrogate =
+                uint16_t(high_surrogate >> 8 | high_surrogate << 8);
             low_surrogate = uint16_t(low_surrogate << 8 | low_surrogate >> 8);
           }
           *utf16_output++ = char16_t(high_surrogate);
@@ -16528,55 +15916,74 @@ std::pair<const char32_t*, char16_t*> arm_convert_utf32_to_utf16(const char32_t*
 
   // check for invalid input
   if (vmaxv_u16(forbidden_bytemask) != 0) {
-    return std::make_pair(nullptr, reinterpret_cast<char16_t*>(utf16_output));
+    return std::make_pair(nullptr, reinterpret_cast<char16_t *>(utf16_output));
   }
 
-  return std::make_pair(buf, reinterpret_cast<char16_t*>(utf16_output));
+  return std::make_pair(buf, reinterpret_cast<char16_t *>(utf16_output));
 }
 
-
 template <endianness big_endian>
-std::pair<result, char16_t*> arm_convert_utf32_to_utf16_with_errors(const char32_t* buf, size_t len, char16_t* utf16_out) {
-  uint16_t * utf16_output = reinterpret_cast<uint16_t*>(utf16_out);
-  const char32_t* start = buf;
-  const char32_t* end = buf + len;
+std::pair<result, char16_t *>
+arm_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
+                                       char16_t *utf16_out) {
+  uint16_t *utf16_output = reinterpret_cast<uint16_t *>(utf16_out);
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
 
-  while(end - buf >= 4) {
+  while (end - buf >= 4) {
     uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
 
     // Check if no bits set above 16th
-    if(vmaxvq_u32(in) <= 0xFFFF) {
+    if (vmaxvq_u32(in) <= 0xFFFF) {
       uint16x4_t utf16_packed = vmovn_u32(in);
 
       const uint16x4_t v_d800 = vmov_n_u16((uint16_t)0xd800);
       const uint16x4_t v_dfff = vmov_n_u16((uint16_t)0xdfff);
-      const uint16x4_t forbidden_bytemask = vand_u16(vcle_u16(utf16_packed, v_dfff), vcge_u16(utf16_packed, v_d800));
+      const uint16x4_t forbidden_bytemask = vand_u16(
+          vcle_u16(utf16_packed, v_dfff), vcge_u16(utf16_packed, v_d800));
       if (vmaxv_u16(forbidden_bytemask) != 0) {
-        return std::make_pair(result(error_code::SURROGATE, buf - start), reinterpret_cast<char16_t*>(utf16_output));
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              reinterpret_cast<char16_t *>(utf16_output));
       }
 
-      if (!match_system(big_endian)) { utf16_packed = vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(utf16_packed))); }
+      if (!match_system(big_endian)) {
+        utf16_packed =
+            vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(utf16_packed)));
+      }
       vst1_u16(utf16_output, utf16_packed);
       utf16_output += 4;
       buf += 4;
     } else {
       size_t forward = 3;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFF0000)==0) {
+        if ((word & 0xFFFF0000) == 0) {
           // will not generate a surrogate pair
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(result(error_code::SURROGATE, buf - start + k), reinterpret_cast<char16_t*>(utf16_output)); }
-          *utf16_output++ = !match_system(big_endian) ? char16_t(word >> 8 | word << 8) : char16_t(word);
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k),
+                reinterpret_cast<char16_t *>(utf16_output));
+          }
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(word >> 8 | word << 8)
+                                : char16_t(word);
         } else {
           // will generate a surrogate pair
-          if (word > 0x10FFFF) { return std::make_pair(result(error_code::TOO_LARGE, buf - start + k), reinterpret_cast<char16_t*>(utf16_output)); }
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k),
+                reinterpret_cast<char16_t *>(utf16_output));
+          }
           word -= 0x10000;
           uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
           uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
           if (!match_system(big_endian)) {
-            high_surrogate = uint16_t(high_surrogate >> 8 | high_surrogate << 8);
+            high_surrogate =
+                uint16_t(high_surrogate >> 8 | high_surrogate << 8);
             low_surrogate = uint16_t(low_surrogate << 8 | low_surrogate >> 8);
           }
           *utf16_output++ = char16_t(high_surrogate);
@@ -16587,523 +15994,517 @@ std::pair<result, char16_t*> arm_convert_utf32_to_utf16_with_errors(const char32
     }
   }
 
-  return std::make_pair(result(error_code::SUCCESS, buf - start), reinterpret_cast<char16_t*>(utf16_output));
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char16_t *>(utf16_output));
 }
 /* end file src/arm64/arm_convert_utf32_to_utf16.cpp */
-/* begin file src/arm64/arm_base64.cpp */
-/**
- * References and further reading:
- *
- * Wojciech Muła, Daniel Lemire, Base64 encoding and decoding at almost the
- * speed of a memory copy, Software: Practice and Experience 50 (2), 2020.
- * https://arxiv.org/abs/1910.05109
- *
- * Wojciech Muła, Daniel Lemire, Faster Base64 Encoding and Decoding using AVX2
- * Instructions, ACM Transactions on the Web 12 (3), 2018.
- * https://arxiv.org/abs/1704.00605
- *
- * Simon Josefsson. 2006. The Base16, Base32, and Base64 Data Encodings.
- * https://tools.ietf.org/html/rfc4648. (2006). Internet Engineering Task Force,
- * Request for Comments: 4648.
- *
- * Alfred Klomp. 2014a. Fast Base64 encoding/decoding with SSE vectorization.
- * http://www.alfredklomp.com/programming/sse-base64/. (2014).
- *
- * Alfred Klomp. 2014b. Fast Base64 stream encoder/decoder in C99, with SIMD
- * acceleration. https://github.com/aklomp/base64. (2014).
- *
- * Hanson Char. 2014. A Fast and Correct Base 64 Codec. (2014).
- * https://aws.amazon.com/blogs/developer/a-fast-and-correct-base-64-codec/
- *
- * Nick Kopp. 2013. Base64 Encoding on a GPU.
- * https://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU. (2013).
- */
-
-size_t encode_base64(char *dst, const char *src, size_t srclen,
-                     base64_options options) {
-  // credit: Wojciech Muła
-  uint8_t *out = (uint8_t *)dst;
-  constexpr static uint8_t source_table[64] = {
-      'A', 'Q', 'g', 'w', 'B', 'R', 'h', 'x', 'C', 'S', 'i', 'y', 'D',
-      'T', 'j', 'z', 'E', 'U', 'k', '0', 'F', 'V', 'l', '1', 'G', 'W',
-      'm', '2', 'H', 'X', 'n', '3', 'I', 'Y', 'o', '4', 'J', 'Z', 'p',
-      '5', 'K', 'a', 'q', '6', 'L', 'b', 'r', '7', 'M', 'c', 's', '8',
-      'N', 'd', 't', '9', 'O', 'e', 'u', '+', 'P', 'f', 'v', '/',
-  };
-  constexpr static uint8_t source_table_url[64] = {
-      'A', 'Q', 'g', 'w', 'B', 'R', 'h', 'x', 'C', 'S', 'i', 'y', 'D',
-      'T', 'j', 'z', 'E', 'U', 'k', '0', 'F', 'V', 'l', '1', 'G', 'W',
-      'm', '2', 'H', 'X', 'n', '3', 'I', 'Y', 'o', '4', 'J', 'Z', 'p',
-      '5', 'K', 'a', 'q', '6', 'L', 'b', 'r', '7', 'M', 'c', 's', '8',
-      'N', 'd', 't', '9', 'O', 'e', 'u', '-', 'P', 'f', 'v', '_',
-  };
-  const uint8x16_t v3f = vdupq_n_u8(0x3f);
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  // When trying to load a uint8_t array, Visual Studio might
-  // error with: error C2664: '__n128x4 neon_ld4m_q8(const char *)':
-  // cannot convert argument 1 from 'const uint8_t [64]' to 'const char *
-  const uint8x16x4_t table =
-      vld4q_u8((reinterpret_cast<const char *>(
-        options & base64_url) ? source_table_url : source_table));
-#else
-  const uint8x16x4_t table =
-      vld4q_u8((options & base64_url) ? source_table_url : source_table);
-#endif
-  size_t i = 0;
-  for (; i + 16 * 3 <= srclen; i += 16 * 3) {
-    const uint8x16x3_t in = vld3q_u8((const uint8_t *)src + i);
-    uint8x16x4_t result;
-    result.val[0] = vshrq_n_u8(in.val[0], 2);
-    result.val[1] =
-        vandq_u8(vsliq_n_u8(vshrq_n_u8(in.val[1], 4), in.val[0], 4), v3f);
-    result.val[2] =
-        vandq_u8(vsliq_n_u8(vshrq_n_u8(in.val[2], 6), in.val[1], 2), v3f);
-    result.val[3] = vandq_u8(in.val[2], v3f);
-    result.val[0] = vqtbl4q_u8(table, result.val[0]);
-    result.val[1] = vqtbl4q_u8(table, result.val[1]);
-    result.val[2] = vqtbl4q_u8(table, result.val[2]);
-    result.val[3] = vqtbl4q_u8(table, result.val[3]);
-    vst4q_u8(out, result);
-    out += 64;
-  }
-  out += scalar::base64::tail_encode_base64((char *)out, src + i, srclen - i,
-                                            options);
-
-  return size_t((char *)out - dst);
-}
-
-static inline void compress(uint8x16_t data, uint16_t mask, char *output) {
-  if (mask == 0) {
-    vst1q_u8((uint8_t *)output, data);
-    return;
-  }
-  uint8_t mask1 = uint8_t(mask);      // least significant 8 bits
-  uint8_t mask2 = uint8_t(mask >> 8); // most significant 8 bits
-  uint64x2_t compactmasku64 = {tables::base64::thintable_epi8[mask1],
-                               tables::base64::thintable_epi8[mask2]};
-  uint8x16_t compactmask = vreinterpretq_u8_u64(compactmasku64);
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  const uint8x16_t off =
-      simdutf_make_uint8x16_t(0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8);
-#else
-  const uint8x16_t off = {0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8};
-#endif
-
-  compactmask = vaddq_u8(compactmask, off);
-  uint8x16_t pruned = vqtbl1q_u8(data, compactmask);
-
-  int pop1 = tables::base64::BitsSetTable256mul2[mask1];
-  // then load the corresponding mask, what it does is to write
-  // only the first pop1 bytes from the first 8 bytes, and then
-  // it fills in with the bytes from the second 8 bytes + some filling
-  // at the end.
-  compactmask = vld1q_u8(tables::base64::pshufb_combine_table + pop1 * 8);
-  uint8x16_t answer = vqtbl1q_u8(pruned, compactmask);
-  vst1q_u8((uint8_t *)output, answer);
-}
-
-struct block64 {
-  uint8x16_t chunks[4];
-};
+/* begin file src/arm64/arm_convert_utf32_to_utf8.cpp */
+std::pair<const char32_t *, char *>
+arm_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char32_t *end = buf + len;
 
-static_assert(sizeof(block64) == 64, "block64 is not 64 bytes");
-template <bool base64_url> uint64_t to_base64_mask(block64 *b, bool *error) {
-  uint8x16_t v0f = vdupq_n_u8(0xf);
+  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
 
-  uint8x16_t underscore0, underscore1, underscore2, underscore3;
-  if (base64_url) {
-    underscore0 = vceqq_u8(b->chunks[0], vdupq_n_u8(0x5f));
-    underscore1 = vceqq_u8(b->chunks[1], vdupq_n_u8(0x5f));
-    underscore2 = vceqq_u8(b->chunks[2], vdupq_n_u8(0x5f));
-    underscore3 = vceqq_u8(b->chunks[3], vdupq_n_u8(0x5f));
-  } else {
-    (void)underscore0;
-    (void)underscore1;
-    (void)underscore2;
-    (void)underscore3;
-  }
+  uint16x8_t forbidden_bytemask = vmovq_n_u16(0x0);
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
-  uint8x16_t lo_nibbles0 = vandq_u8(b->chunks[0], v0f);
-  uint8x16_t lo_nibbles1 = vandq_u8(b->chunks[1], v0f);
-  uint8x16_t lo_nibbles2 = vandq_u8(b->chunks[2], v0f);
-  uint8x16_t lo_nibbles3 = vandq_u8(b->chunks[3], v0f);
+  while (buf + 16 + safety_margin < end) {
+    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
+    uint32x4_t nextin = vld1q_u32(reinterpret_cast<const uint32_t *>(buf + 4));
 
-  // Needed by the decoding step.
-  uint8x16_t hi_nibbles0 = vshrq_n_u8(b->chunks[0], 4);
-  uint8x16_t hi_nibbles1 = vshrq_n_u8(b->chunks[1], 4);
-  uint8x16_t hi_nibbles2 = vshrq_n_u8(b->chunks[2], 4);
-  uint8x16_t hi_nibbles3 = vshrq_n_u8(b->chunks[3], 4);
-  uint8x16_t lut_lo;
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  if (base64_url) {
-    lut_lo =
-        simdutf_make_uint8x16_t(0x3a,0x70,0x70,0x70,0x70,0x70,0x70,0x70,0x70,0x61,0xe1,0xf4,0xe5,0xa5,0xf4,0xf4);
-  } else {
-    lut_lo =
-        simdutf_make_uint8x16_t(0x3a,0x70,0x70,0x70,0x70,0x70,0x70,0x70,0x70,0x61,0xe1,0xb4,0xe5,0xe5,0xf4,0xb4);
-  }
-#else
-  if (base64_url) {
-    lut_lo = uint8x16_t{0x3a,0x70,0x70,0x70,0x70,0x70,0x70,0x70,0x70,0x61,0xe1,0xf4,0xe5,0xa5,0xf4,0xf4};
-  } else {
-    lut_lo = uint8x16_t{0x3a,0x70,0x70,0x70,0x70,0x70,0x70,0x70,0x70,0x61,0xe1,0xb4,0xe5,0xe5,0xf4,0xb4};
-  }
-#endif
-  uint8x16_t lo0 = vqtbl1q_u8(lut_lo, lo_nibbles0);
-  uint8x16_t lo1 = vqtbl1q_u8(lut_lo, lo_nibbles1);
-  uint8x16_t lo2 = vqtbl1q_u8(lut_lo, lo_nibbles2);
-  uint8x16_t lo3 = vqtbl1q_u8(lut_lo, lo_nibbles3);
-  uint8x16_t lut_hi;
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  if (base64_url) {
-    lut_hi =
-        simdutf_make_uint8x16_t(0x11, 0x20, 0x42, 0x80, 0x8, 0x4, 0x8, 0x4,
-                                0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20);
-  } else {
-    lut_hi =
-        simdutf_make_uint8x16_t(0x11, 0x20, 0x42, 0x80, 0x8, 0x4, 0x8, 0x4,
-                                0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20);
-  }
-#else
-  if (base64_url) {
-    lut_hi = uint8x16_t{0x11, 0x20, 0x42, 0x80, 0x8,  0x4,  0x8,  0x4,
-              0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20};
-  } else {
-    lut_hi = uint8x16_t{0x11, 0x20, 0x42, 0x80, 0x8,  0x4,  0x8,  0x4,
-              0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20};
-  }
-#endif
-  uint8x16_t hi0 = vqtbl1q_u8(lut_hi, hi_nibbles0);
-  uint8x16_t hi1 = vqtbl1q_u8(lut_hi, hi_nibbles1);
-  uint8x16_t hi2 = vqtbl1q_u8(lut_hi, hi_nibbles2);
-  uint8x16_t hi3 = vqtbl1q_u8(lut_hi, hi_nibbles3);
+    // Check if no bits set above 16th
+    if (vmaxvq_u32(vorrq_u32(in, nextin)) <= 0xFFFF) {
+      // Pack UTF-32 to UTF-16 safely (without surrogate pairs)
+      // Apply UTF-16 => UTF-8 routine (arm_convert_utf16_to_utf8.cpp)
+      uint16x8_t utf16_packed = vcombine_u16(vmovn_u32(in), vmovn_u32(nextin));
+      if (vmaxvq_u16(utf16_packed) <= 0x7F) { // ASCII fast path!!!!
+        // 1. pack the bytes
+        // obviously suboptimal.
+        uint8x8_t utf8_packed = vmovn_u16(utf16_packed);
+        // 2. store (8 bytes)
+        vst1_u8(utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        continue; // we are done for this round!
+      }
 
-  if (base64_url) {
-    hi0 = vbicq_u8(hi0, underscore0);
-    hi1 = vbicq_u8(hi1, underscore1);
-    hi2 = vbicq_u8(hi2, underscore2);
-    hi3 = vbicq_u8(hi3, underscore3);
-  }
+      if (vmaxvq_u16(utf16_packed) <= 0x7FF) {
+        // 1. prepare 2-byte values
+        // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+        // expected output   : [110a|aaaa|10bb|bbbb] x 8
+        const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
+        const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
 
-  uint8_t checks =
-      vmaxvq_u8(vorrq_u8(vorrq_u8(vandq_u8(lo0, hi0), vandq_u8(lo1, hi1)),
-                         vorrq_u8(vandq_u8(lo2, hi2), vandq_u8(lo3, hi3))));
+        // t0 = [000a|aaaa|bbbb|bb00]
+        const uint16x8_t t0 = vshlq_n_u16(utf16_packed, 2);
+        // t1 = [000a|aaaa|0000|0000]
+        const uint16x8_t t1 = vandq_u16(t0, v_1f00);
+        // t2 = [0000|0000|00bb|bbbb]
+        const uint16x8_t t2 = vandq_u16(utf16_packed, v_003f);
+        // t3 = [000a|aaaa|00bb|bbbb]
+        const uint16x8_t t3 = vorrq_u16(t1, t2);
+        // t4 = [110a|aaaa|10bb|bbbb]
+        const uint16x8_t t4 = vorrq_u16(t3, v_c080);
+        // 2. merge ASCII and 2-byte codewords
+        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
+        const uint8x16_t utf8_unpacked = vreinterpretq_u8_u16(
+            vbslq_u16(one_byte_bytemask, utf16_packed, t4));
+        // 3. prepare bitmask for 8-bit lookup
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  const uint8x16_t bit_mask =
-      simdutf_make_uint8x16_t(0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
-                              0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80);
+        const uint16x8_t mask = simdutf_make_uint16x8_t(
+            0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
 #else
-  const uint8x16_t bit_mask = {0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
-                               0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+        const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
+                                 0x0002, 0x0008, 0x0020, 0x0080};
 #endif
-  uint64_t badcharmask = 0;
-  *error = checks > 0x3;
-  if (checks) {
-    // Add each of the elements next to each other, successively, to stuff each
-    // 8 byte mask into one.
-    uint8x16_t test0 = vtstq_u8(lo0, hi0);
-    uint8x16_t test1 = vtstq_u8(lo1, hi1);
-    uint8x16_t test2 = vtstq_u8(lo2, hi2);
-    uint8x16_t test3 = vtstq_u8(lo3, hi3);
-    uint8x16_t sum0 =
-        vpaddq_u8(vandq_u8(test0, bit_mask), vandq_u8(test1, bit_mask));
-    uint8x16_t sum1 =
-        vpaddq_u8(vandq_u8(test2, bit_mask), vandq_u8(test3, bit_mask));
-    sum0 = vpaddq_u8(sum0, sum1);
-    sum0 = vpaddq_u8(sum0, sum0);
-    badcharmask = vgetq_lane_u64(vreinterpretq_u64_u8(sum0), 0);
-  }
-  // This is the transformation step that can be done while we are waiting for
-  // sum0
-  uint8x16_t roll_lut;
+        uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
+        // 4. pack the bytes
+        const uint8_t *row =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
+        const uint8x16_t shuffle = vld1q_u8(row + 1);
+        const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
+
+        // 5. store bytes
+        vst1q_u8(utf8_output, utf8_packed);
+
+        // 6. adjust pointers
+        buf += 8;
+        utf8_output += row[0];
+        continue;
+      } else {
+        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+        const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
+        const uint16x8_t v_dfff = vmovq_n_u16((uint16_t)0xdfff);
+        forbidden_bytemask =
+            vorrq_u16(vandq_u16(vcleq_u16(utf16_packed, v_dfff),
+                                vcgeq_u16(utf16_packed, v_d800)),
+                      forbidden_bytemask);
+
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  if (base64_url) {
-    roll_lut =
-        simdutf_make_uint8x16_t(0xe0, 0x11, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
-                                0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0);
-  } else {
-    roll_lut =
-        simdutf_make_uint8x16_t(0x0, 0x10, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
-                                0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0);
-  }
+        const uint16x8_t dup_even = simdutf_make_uint16x8_t(
+            0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 #else
-  if (base64_url) {
-    roll_lut = uint8x16_t{0xe0, 0x11, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
-                0x0,  0x0,  0x0,  0x0, 0x0,  0x0,  0x0,  0x0};
-  } else {
-    roll_lut = uint8x16_t{0x0, 0x10, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
-                0x0, 0x0,  0x0,  0x0, 0x0,  0x0,  0x0,  0x0};
-  }
+        const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
+                                     0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
 #endif
-  uint8x16_t vsecond_last = base64_url ? vdupq_n_u8(0x2d) : vdupq_n_u8(0x2f);
-  if (base64_url) {
-    hi_nibbles0 = vbicq_u8(hi_nibbles0, underscore0);
-    hi_nibbles1 = vbicq_u8(hi_nibbles1, underscore1);
-    hi_nibbles2 = vbicq_u8(hi_nibbles2, underscore2);
-    hi_nibbles3 = vbicq_u8(hi_nibbles3, underscore3);
-  }
-  uint8x16_t roll0 = vqtbl1q_u8(
-      roll_lut, vaddq_u8(vceqq_u8(b->chunks[0], vsecond_last), hi_nibbles0));
-  uint8x16_t roll1 = vqtbl1q_u8(
-      roll_lut, vaddq_u8(vceqq_u8(b->chunks[1], vsecond_last), hi_nibbles1));
-  uint8x16_t roll2 = vqtbl1q_u8(
-      roll_lut, vaddq_u8(vceqq_u8(b->chunks[2], vsecond_last), hi_nibbles2));
-  uint8x16_t roll3 = vqtbl1q_u8(
-      roll_lut, vaddq_u8(vceqq_u8(b->chunks[3], vsecond_last), hi_nibbles3));
-  b->chunks[0] = vaddq_u8(b->chunks[0], roll0);
-  b->chunks[1] = vaddq_u8(b->chunks[1], roll1);
-  b->chunks[2] = vaddq_u8(b->chunks[2], roll2);
-  b->chunks[3] = vaddq_u8(b->chunks[3], roll3);
-  return badcharmask;
-}
+        /* In this branch we handle three cases:
+          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+          single UTF-8 byte
+          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+          two UTF-8 bytes
+          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+          three UTF-8 bytes
 
-void copy_block(block64 *b, char *output) {
-  vst1q_u8((uint8_t *)output, b->chunks[0]);
-  vst1q_u8((uint8_t *)output + 16, b->chunks[1]);
-  vst1q_u8((uint8_t *)output + 32, b->chunks[2]);
-  vst1q_u8((uint8_t *)output + 48, b->chunks[3]);
-}
+          We expand the input word (16-bit) into two code units (32-bit), thus
+          we have room for four bytes. However, we need five distinct bit
+          layouts. Note that the last byte in cases #2 and #3 is the same.
 
-uint64_t compress_block(block64 *b, uint64_t mask, char *output) {
-  uint64_t popcounts =
-      vget_lane_u64(vreinterpret_u64_u8(vcnt_u8(vcreate_u8(~mask))), 0);
-  uint64_t offsets = popcounts * 0x0101010101010101;
-  compress(b->chunks[0], uint16_t(mask), output);
-  compress(b->chunks[1], uint16_t(mask >> 16), &output[(offsets >> 8) & 0xFF]);
-  compress(b->chunks[2], uint16_t(mask >> 32), &output[(offsets >> 24) & 0xFF]);
-  compress(b->chunks[3], uint16_t(mask >> 48), &output[(offsets >> 40) & 0xFF]);
-  return offsets >> 56;
-}
+          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+          in register t2.
 
-// The caller of this function is responsible to ensure that there are 64 bytes available
-// from reading at src. The data is read into a block64 structure.
-void load_block(block64 *b, const char *src) {
-  b->chunks[0] = vld1q_u8(reinterpret_cast<const uint8_t *>(src));
-  b->chunks[1] = vld1q_u8(reinterpret_cast<const uint8_t *>(src) + 16);
-  b->chunks[2] = vld1q_u8(reinterpret_cast<const uint8_t *>(src) + 32);
-  b->chunks[3] = vld1q_u8(reinterpret_cast<const uint8_t *>(src) + 48);
-}
+          We precompute byte 1 for case #3 and -- **conditionally** --
+          precompute either byte 1 for case #2 or byte 2 for case #3. Note that
+          they differ by exactly one bit.
 
-// The caller of this function is responsible to ensure that there are 32 bytes available
-// from reading at data. It returns a 16-byte value, narrowing with saturation the 16-bit words.
-inline uint8x16_t load_satured(const uint16_t *data) {
-  uint16x8_t in1 = vld1q_u16(data);
-  uint16x8_t in2 = vld1q_u16(data + 8);
-  return vqmovn_high_u16(vqmovn_u16(in1), in2);
-}
+          Finally, from these two code units we build a proper UTF-8 sequence,
+          taking into account the case (i.e., the number of bytes to write).
+        */
+        /**
+         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+         * t2 => [0ccc|cccc] [10cc|cccc]
+         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+         */
+#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
+        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+        const uint16x8_t t0 =
+            vreinterpretq_u16_u8(vqtbl1q_u8(vreinterpretq_u8_u16(utf16_packed),
+                                            vreinterpretq_u8_u16(dup_even)));
+        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+        const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
+        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+        const uint16x8_t t2 = vorrq_u16(t1, simdutf_vec(0b1000000000000000));
 
-// The caller of this function is responsible to ensure that there are 128 bytes available
-// from reading at src. The data is read into a block64 structure.
-void load_block(block64 *b, const char16_t *src) {
-  b->chunks[0] = load_satured(reinterpret_cast<const uint16_t *>(src));
-  b->chunks[1] = load_satured(reinterpret_cast<const uint16_t *>(src) + 16);
-  b->chunks[2] = load_satured(reinterpret_cast<const uint16_t *>(src) + 32);
-  b->chunks[3] = load_satured(reinterpret_cast<const uint16_t *>(src) + 48);
-}
+        // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+        const uint16x8_t s0 = vshrq_n_u16(utf16_packed, 12);
+        // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
+        const uint16x8_t s1 =
+            vandq_u16(utf16_packed, simdutf_vec(0b0000111111000000));
+        // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
+        const uint16x8_t s1s = vshlq_n_u16(s1, 2);
+        // [00bb|bbbb|0000|aaaa]
+        const uint16x8_t s2 = vorrq_u16(s0, s1s);
+        // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+        const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
+        const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
+        const uint16x8_t one_or_two_bytes_bytemask =
+            vcleq_u16(utf16_packed, v_07ff);
+        const uint16x8_t m0 = vbicq_u16(simdutf_vec(0b0100000000000000),
+                                        one_or_two_bytes_bytemask);
+        const uint16x8_t s4 = veorq_u16(s3, m0);
+#undef simdutf_vec
 
-// decode 64 bytes and output 48 bytes
-void base64_decode_block(char *out, const char *src) {
-  uint8x16x4_t str = vld4q_u8((uint8_t *)src);
-  uint8x16x3_t outvec;
-  outvec.val[0] =
-      vorrq_u8(vshlq_n_u8(str.val[0], 2), vshrq_n_u8(str.val[1], 4));
-  outvec.val[1] =
-      vorrq_u8(vshlq_n_u8(str.val[1], 4), vshrq_n_u8(str.val[2], 2));
-  outvec.val[2] = vorrq_u8(vshlq_n_u8(str.val[2], 6), str.val[3]);
-  vst3q_u8((uint8_t *)out, outvec);
-}
+        // 4. expand code units 16-bit => 32-bit
+        const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
+        const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
 
-template <bool base64_url, typename char_type>
-result compress_decode_base64(char *dst, const char_type *src, size_t srclen,
-                              base64_options options) {
-  const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
-                                        : tables::base64::to_base64_value;
-  size_t equallocation = srclen; // location of the first padding character if any
-  // skip trailing spaces
-  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) && to_base64[uint8_t(src[srclen - 1])] == 64) {
-    srclen--;
-  }
-  size_t equalsigns = 0;
-  if (srclen > 0 && src[srclen - 1] == '=') {
-    equallocation = srclen - 1;
-    srclen--;
-    equalsigns = 1;
-    // skip trailing spaces
-    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) && to_base64[uint8_t(src[srclen - 1])] == 64) {
-      srclen--;
-    }
-    if (srclen > 0 && src[srclen - 1] == '=') {
-      equallocation = srclen - 1;
-      srclen--;
-      equalsigns = 2;
-    }
-  }
-  const char_type *const srcinit = src;
-  const char *const dstinit = dst;
-  const char_type *const srcend = src + srclen;
+        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+        const uint16x8_t onemask = simdutf_make_uint16x8_t(
+            0x0001, 0x0004, 0x0010, 0x0040, 0x0100, 0x0400, 0x1000, 0x4000);
+        const uint16x8_t twomask = simdutf_make_uint16x8_t(
+            0x0002, 0x0008, 0x0020, 0x0080, 0x0200, 0x0800, 0x2000, 0x8000);
+#else
+        const uint16x8_t onemask = {0x0001, 0x0004, 0x0010, 0x0040,
+                                    0x0100, 0x0400, 0x1000, 0x4000};
+        const uint16x8_t twomask = {0x0002, 0x0008, 0x0020, 0x0080,
+                                    0x0200, 0x0800, 0x2000, 0x8000};
+#endif
+        const uint16x8_t combined =
+            vorrq_u16(vandq_u16(one_byte_bytemask, onemask),
+                      vandq_u16(one_or_two_bytes_bytemask, twomask));
+        const uint16_t mask = vaddvq_u16(combined);
+        // The following fast path may or may not be beneficial.
+        /*if(mask == 0) {
+          // We only have three-byte code units. Use fast path.
+          const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
+          const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
+          const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
+          vst1q_u8(utf8_output, utf8_0);
+          utf8_output += 12;
+          vst1q_u8(utf8_output, utf8_1);
+          utf8_output += 12;
+          buf += 8;
+          continue;
+        }*/
+        const uint8_t mask0 = uint8_t(mask);
+        const uint8_t *row0 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+        const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
+        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
 
-  constexpr size_t block_size = 10;
-  char buffer[block_size * 64];
-  char *bufferptr = buffer;
-  if (srclen >= 64) {
-    const char_type *const srcend64 = src + srclen - 64;
-    while (src <= srcend64) {
-      block64 b;
-      load_block(&b, src);
-      src += 64;
-      bool error = false;
-      uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
-      if(badcharmask){
-        if (error) {
-          src -= 64;
-          while (src < srcend && scalar::base64::is_eight_byte(*src) && to_base64[uint8_t(*src)] <= 64) {
-            src++;
+        const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+        const uint8_t *row1 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+        const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
+        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
+
+        vst1q_u8(utf8_output, utf8_0);
+        utf8_output += row0[0];
+        vst1q_u8(utf8_output, utf8_1);
+        utf8_output += row1[0];
+
+        buf += 8;
+      }
+      // At least one 32-bit word will produce a surrogate pair in UTF-16 <=>
+      // will produce four UTF-8 bytes.
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFFFF80) == 0) {
+          *utf8_output++ = char(word);
+        } else if ((word & 0xFFFFF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) {
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char *>(utf8_output));
           }
-          if(src < srcend){
-            // should never happen
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else {
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char *>(utf8_output));
           }
-          return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
         }
       }
+      buf += k;
+    }
+  } // while
+
+  // check for invalid input
+  if (vmaxvq_u16(forbidden_bytemask) != 0) {
+    return std::make_pair(nullptr, reinterpret_cast<char *>(utf8_output));
+  }
+  return std::make_pair(buf, reinterpret_cast<char *>(utf8_output));
+}
+
+std::pair<result, char *>
+arm_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
+                                      char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
+
+  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+
+  while (buf + 16 + safety_margin < end) {
+    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
+    uint32x4_t nextin = vld1q_u32(reinterpret_cast<const uint32_t *>(buf + 4));
+
+    // Check if no bits set above 16th
+    if (vmaxvq_u32(vorrq_u32(in, nextin)) <= 0xFFFF) {
+      // Pack UTF-32 to UTF-16 safely (without surrogate pairs)
+      // Apply UTF-16 => UTF-8 routine (arm_convert_utf16_to_utf8.cpp)
+      uint16x8_t utf16_packed = vcombine_u16(vmovn_u32(in), vmovn_u32(nextin));
+      if (vmaxvq_u16(utf16_packed) <= 0x7F) { // ASCII fast path!!!!
+        // 1. pack the bytes
+        // obviously suboptimal.
+        uint8x8_t utf8_packed = vmovn_u16(utf16_packed);
+        // 2. store (8 bytes)
+        vst1_u8(utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        continue; // we are done for this round!
+      }
+
+      if (vmaxvq_u16(utf16_packed) <= 0x7FF) {
+        // 1. prepare 2-byte values
+        // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+        // expected output   : [110a|aaaa|10bb|bbbb] x 8
+        const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
+        const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
+
+        // t0 = [000a|aaaa|bbbb|bb00]
+        const uint16x8_t t0 = vshlq_n_u16(utf16_packed, 2);
+        // t1 = [000a|aaaa|0000|0000]
+        const uint16x8_t t1 = vandq_u16(t0, v_1f00);
+        // t2 = [0000|0000|00bb|bbbb]
+        const uint16x8_t t2 = vandq_u16(utf16_packed, v_003f);
+        // t3 = [000a|aaaa|00bb|bbbb]
+        const uint16x8_t t3 = vorrq_u16(t1, t2);
+        // t4 = [110a|aaaa|10bb|bbbb]
+        const uint16x8_t t4 = vorrq_u16(t3, v_c080);
+        // 2. merge ASCII and 2-byte codewords
+        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
+        const uint8x16_t utf8_unpacked = vreinterpretq_u8_u16(
+            vbslq_u16(one_byte_bytemask, utf16_packed, t4));
+        // 3. prepare bitmask for 8-bit lookup
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+        const uint16x8_t mask = simdutf_make_uint16x8_t(
+            0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
+#else
+        const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
+                                 0x0002, 0x0008, 0x0020, 0x0080};
+#endif
+        uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
+        // 4. pack the bytes
+        const uint8_t *row =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
+        const uint8x16_t shuffle = vld1q_u8(row + 1);
+        const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
 
-      if (badcharmask != 0) {
-        // optimization opportunity: check for simple masks like those made of
-        // continuous 1s followed by continuous 0s. And masks containing a
-        // single bad character.
-        bufferptr += compress_block(&b, badcharmask, bufferptr);
+        // 5. store bytes
+        vst1q_u8(utf8_output, utf8_packed);
+
+        // 6. adjust pointers
+        buf += 8;
+        utf8_output += row[0];
+        continue;
       } else {
-        // optimization opportunity: if bufferptr == buffer and mask == 0, we
-        // can avoid the call to compress_block and decode directly.
-        copy_block(&b, bufferptr);
-        bufferptr += 64;
-      }
-      if (bufferptr >= (block_size - 1) * 64 + buffer) {
-        for (size_t i = 0; i < (block_size - 1); i++) {
-          base64_decode_block(dst, buffer + i * 64);
-          dst += 48;
+        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+
+        // check for invalid input
+        const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
+        const uint16x8_t v_dfff = vmovq_n_u16((uint16_t)0xdfff);
+        const uint16x8_t forbidden_bytemask = vandq_u16(
+            vcleq_u16(utf16_packed, v_dfff), vcgeq_u16(utf16_packed, v_d800));
+        if (vmaxvq_u16(forbidden_bytemask) != 0) {
+          return std::make_pair(result(error_code::SURROGATE, buf - start),
+                                reinterpret_cast<char *>(utf8_output));
         }
-        std::memcpy(buffer, buffer + (block_size - 1) * 64,
-                    64); // 64 might be too much
-        bufferptr -= (block_size - 1) * 64;
-      }
-    }
-  }
-  char *buffer_start = buffer;
-  // Optimization note: if this is almost full, then it is worth our
-  // time, otherwise, we should just decode directly.
-  int last_block = (int)((bufferptr - buffer_start) % 64);
-  if (last_block != 0 && srcend - src + last_block >= 64) {
-    while ((bufferptr - buffer_start) % 64 != 0 && src < srcend) {
-      uint8_t val = to_base64[uint8_t(*src)];
-      *bufferptr = char(val);
-      if (!scalar::base64::is_eight_byte(*src) || val > 64) {
-        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
-      }
-      bufferptr += (val <= 63);
-      src++;
-    }
-  }
 
-  for (; buffer_start + 64 <= bufferptr; buffer_start += 64) {
-    base64_decode_block(dst, buffer_start);
-    dst += 48;
-  }
-  if ((bufferptr - buffer_start) % 64 != 0) {
-    while (buffer_start + 4 < bufferptr) {
-      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
-                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
-                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
-                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
-                        << 8;
-      triple = scalar::utf32::swap_bytes(triple);
-      std::memcpy(dst, &triple, 4);
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+        const uint16x8_t dup_even = simdutf_make_uint16x8_t(
+            0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+#else
+        const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
+                                     0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
+#endif
+        /* In this branch we handle three cases:
+          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+          single UTF-8 byte
+          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+          two UTF-8 bytes
+          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+          three UTF-8 bytes
 
-      dst += 3;
-      buffer_start += 4;
-    }
-    if (buffer_start + 4 <= bufferptr) {
-      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
-                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
-                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
-                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
-                        << 8;
-      triple = scalar::utf32::swap_bytes(triple);
-      std::memcpy(dst, &triple, 3);
+          We expand the input word (16-bit) into two code units (32-bit), thus
+          we have room for four bytes. However, we need five distinct bit
+          layouts. Note that the last byte in cases #2 and #3 is the same.
 
-      dst += 3;
-      buffer_start += 4;
-    }
-    // we may have 1, 2 or 3 bytes left and we need to decode them so let us
-    // bring in src content
-    int leftover = int(bufferptr - buffer_start);
-    if (leftover > 0) {
-      while (leftover < 4 && src < srcend) {
-        uint8_t val = to_base64[uint8_t(*src)];
-        if (!scalar::base64::is_eight_byte(*src) || val > 64) {
-          return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
-        }
-        buffer_start[leftover] = char(val);
-        leftover += (val <= 63);
-        src++;
-      }
+          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+          in register t2.
 
-      if (leftover == 1) {
-        return {BASE64_INPUT_REMAINDER, size_t(dst - dstinit)};
-      }
-      if (leftover == 2) {
-        uint32_t triple = (uint32_t(buffer_start[0]) << 3 * 6) +
-                          (uint32_t(buffer_start[1]) << 2 * 6);
-        triple = scalar::utf32::swap_bytes(triple);
-        triple >>= 8;
-        std::memcpy(dst, &triple, 1);
-        dst += 1;
-      } else if (leftover == 3) {
-        uint32_t triple = (uint32_t(buffer_start[0]) << 3 * 6) +
-                          (uint32_t(buffer_start[1]) << 2 * 6) +
-                          (uint32_t(buffer_start[2]) << 1 * 6);
-        triple = scalar::utf32::swap_bytes(triple);
-        triple >>= 8;
-
-        std::memcpy(dst, &triple, 2);
-        dst += 2;
-      } else {
-        uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
-                           (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
-                           (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
-                           (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
-                          << 8;
-        triple = scalar::utf32::swap_bytes(triple);
-        std::memcpy(dst, &triple, 3);
-        dst += 3;
+          We precompute byte 1 for case #3 and -- **conditionally** --
+          precompute either byte 1 for case #2 or byte 2 for case #3. Note that
+          they differ by exactly one bit.
+
+          Finally, from these two code units we build a proper UTF-8 sequence,
+          taking into account the case (i.e., the number of bytes to write).
+        */
+        /**
+         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+         * t2 => [0ccc|cccc] [10cc|cccc]
+         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+         */
+#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
+        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+        const uint16x8_t t0 =
+            vreinterpretq_u16_u8(vqtbl1q_u8(vreinterpretq_u8_u16(utf16_packed),
+                                            vreinterpretq_u8_u16(dup_even)));
+        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+        const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
+        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+        const uint16x8_t t2 = vorrq_u16(t1, simdutf_vec(0b1000000000000000));
+
+        // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+        const uint16x8_t s0 = vshrq_n_u16(utf16_packed, 12);
+        // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
+        const uint16x8_t s1 =
+            vandq_u16(utf16_packed, simdutf_vec(0b0000111111000000));
+        // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
+        const uint16x8_t s1s = vshlq_n_u16(s1, 2);
+        // [00bb|bbbb|0000|aaaa]
+        const uint16x8_t s2 = vorrq_u16(s0, s1s);
+        // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+        const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
+        const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
+        const uint16x8_t one_or_two_bytes_bytemask =
+            vcleq_u16(utf16_packed, v_07ff);
+        const uint16x8_t m0 = vbicq_u16(simdutf_vec(0b0100000000000000),
+                                        one_or_two_bytes_bytemask);
+        const uint16x8_t s4 = veorq_u16(s3, m0);
+#undef simdutf_vec
+
+        // 4. expand code units 16-bit => 32-bit
+        const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
+        const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
+
+        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+        const uint16x8_t onemask = simdutf_make_uint16x8_t(
+            0x0001, 0x0004, 0x0010, 0x0040, 0x0100, 0x0400, 0x1000, 0x4000);
+        const uint16x8_t twomask = simdutf_make_uint16x8_t(
+            0x0002, 0x0008, 0x0020, 0x0080, 0x0200, 0x0800, 0x2000, 0x8000);
+#else
+        const uint16x8_t onemask = {0x0001, 0x0004, 0x0010, 0x0040,
+                                    0x0100, 0x0400, 0x1000, 0x4000};
+        const uint16x8_t twomask = {0x0002, 0x0008, 0x0020, 0x0080,
+                                    0x0200, 0x0800, 0x2000, 0x8000};
+#endif
+        const uint16x8_t combined =
+            vorrq_u16(vandq_u16(one_byte_bytemask, onemask),
+                      vandq_u16(one_or_two_bytes_bytemask, twomask));
+        const uint16_t mask = vaddvq_u16(combined);
+        // The following fast path may or may not be beneficial.
+        /*if(mask == 0) {
+          // We only have three-byte code units. Use fast path.
+          const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
+          const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
+          const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
+          vst1q_u8(utf8_output, utf8_0);
+          utf8_output += 12;
+          vst1q_u8(utf8_output, utf8_1);
+          utf8_output += 12;
+          buf += 8;
+          continue;
+        }*/
+        const uint8_t mask0 = uint8_t(mask);
+
+        const uint8_t *row0 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+        const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
+        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
+
+        const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+        const uint8_t *row1 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+        const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
+        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
+
+        vst1q_u8(utf8_output, utf8_0);
+        utf8_output += row0[0];
+        vst1q_u8(utf8_output, utf8_1);
+        utf8_output += row1[0];
+
+        buf += 8;
       }
-    }
-  }
-  if (src < srcend + equalsigns) {
-    result r =
-        scalar::base64::base64_tail_decode(dst, src, srcend - src, options);
-    if (r.error == error_code::INVALID_BASE64_CHARACTER) {
-      r.count += size_t(src - srcinit);
-      return r;
+      // At least one 32-bit word will produce a surrogate pair in UTF-16 <=>
+      // will produce four UTF-8 bytes.
     } else {
-      r.count += size_t(dst - dstinit);
-    }
-    if(r.error == error_code::SUCCESS && equalsigns > 0) {
-      // additional checks
-      if((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
-        r.error = error_code::INVALID_BASE64_CHARACTER;
-        r.count = equallocation;
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
       }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFFFF80) == 0) {
+          *utf8_output++ = char(word);
+        } else if ((word & 0xFFFFF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) {
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k),
+                reinterpret_cast<char *>(utf8_output));
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else {
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k),
+                reinterpret_cast<char *>(utf8_output));
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        }
+      }
+      buf += k;
     }
-    return r;
-  }
-  if(equalsigns > 0) {
-    if((size_t(dst - dstinit) % 3 == 0) || ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
-    }
-  }
-  return {SUCCESS, size_t(dst - dstinit)};
+  } // while
+
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char *>(utf8_output));
 }
-/* end file src/arm64/arm_base64.cpp */
+/* end file src/arm64/arm_convert_utf32_to_utf8.cpp */
 
 } // unnamed namespace
 } // namespace arm64
@@ -17113,9 +16514,9 @@ namespace simdutf {
 namespace arm64 {
 namespace {
 
-// Walks through a buffer in block-sized increments, loading the last part with spaces
-template<size_t STEP_SIZE>
-struct buf_block_reader {
+// Walks through a buffer in block-sized increments, loading the last part with
+// spaces
+template <size_t STEP_SIZE> struct buf_block_reader {
 public:
   simdutf_really_inline buf_block_reader(const uint8_t *_buf, size_t _len);
   simdutf_really_inline size_t block_index();
@@ -17124,14 +16525,16 @@ struct buf_block_reader {
   /**
    * Get the last block, padded with spaces.
    *
-   * There will always be a last block, with at least 1 byte, unless len == 0 (in which case this
-   * function fills the buffer with spaces and returns 0. In particular, if len == STEP_SIZE there
-   * will be 0 full_blocks and 1 remainder block with STEP_SIZE bytes and no spaces for padding.
+   * There will always be a last block, with at least 1 byte, unless len == 0
+   * (in which case this function fills the buffer with spaces and returns 0). In
+   * particular, if len == STEP_SIZE there will be 0 full_blocks and 1 remainder
+   * block with STEP_SIZE bytes and no spaces for padding.
    *
    * @return the number of effective characters in the last block.
    */
   simdutf_really_inline size_t get_remainder(uint8_t *dst) const;
   simdutf_really_inline void advance();
+
 private:
   const uint8_t *buf;
   const size_t len;
@@ -17140,9 +16543,10 @@ struct buf_block_reader {
 };
 
 // Routines to print masks and text for debugging bitmask operations
-simdutf_unused static char * format_input_text_64(const uint8_t *text) {
-  static char *buf = reinterpret_cast<char*>(malloc(sizeof(simd8x64<uint8_t>) + 1));
-  for (size_t i=0; i<sizeof(simd8x64<uint8_t>); i++) {
+simdutf_unused static char *format_input_text_64(const uint8_t *text) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
     buf[i] = int8_t(text[i]) < ' ' ? '_' : int8_t(text[i]);
   }
   buf[sizeof(simd8x64<uint8_t>)] = '\0';
@@ -17150,50 +16554,64 @@ simdutf_unused static char * format_input_text_64(const uint8_t *text) {
 }
 
 // Routines to print masks and text for debugging bitmask operations
-simdutf_unused static char * format_input_text(const simd8x64<uint8_t>& in) {
-  static char *buf = reinterpret_cast<char*>(malloc(sizeof(simd8x64<uint8_t>) + 1));
-  in.store(reinterpret_cast<uint8_t*>(buf));
-  for (size_t i=0; i<sizeof(simd8x64<uint8_t>); i++) {
-    if (buf[i] < ' ') { buf[i] = '_'; }
+simdutf_unused static char *format_input_text(const simd8x64<uint8_t> &in) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  in.store(reinterpret_cast<uint8_t *>(buf));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
+    if (buf[i] < ' ') {
+      buf[i] = '_';
+    }
   }
   buf[sizeof(simd8x64<uint8_t>)] = '\0';
   return buf;
 }
 
-simdutf_unused static char * format_mask(uint64_t mask) {
-  static char *buf = reinterpret_cast<char*>(malloc(64 + 1));
-  for (size_t i=0; i<64; i++) {
+simdutf_unused static char *format_mask(uint64_t mask) {
+  static char *buf = reinterpret_cast<char *>(malloc(64 + 1));
+  for (size_t i = 0; i < 64; i++) {
     buf[i] = (mask & (size_t(1) << i)) ? 'X' : ' ';
   }
   buf[64] = '\0';
   return buf;
 }
 
-template<size_t STEP_SIZE>
-simdutf_really_inline buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len) : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE}, idx{0} {}
+template <size_t STEP_SIZE>
+simdutf_really_inline
+buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len)
+    : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE},
+      idx{0} {}
 
-template<size_t STEP_SIZE>
-simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() { return idx; }
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() {
+  return idx;
+}
 
-template<size_t STEP_SIZE>
+template <size_t STEP_SIZE>
 simdutf_really_inline bool buf_block_reader<STEP_SIZE>::has_full_block() const {
   return idx < lenminusstep;
 }
 
-template<size_t STEP_SIZE>
-simdutf_really_inline const uint8_t *buf_block_reader<STEP_SIZE>::full_block() const {
+template <size_t STEP_SIZE>
+simdutf_really_inline const uint8_t *
+buf_block_reader<STEP_SIZE>::full_block() const {
   return &buf[idx];
 }
 
-template<size_t STEP_SIZE>
-simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
-  if(len == idx) { return 0; } // memcpy(dst, null, 0) will trigger an error with some sanitizers
-  std::memset(dst, 0x20, STEP_SIZE); // std::memset STEP_SIZE because it is more efficient to write out 8 or 16 bytes at once.
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t
+buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
+  if (len == idx) {
+    return 0;
+  } // memcpy(dst, null, 0) will trigger an error with some sanitizers
+  std::memset(dst, 0x20,
+              STEP_SIZE); // std::memset STEP_SIZE because it is more efficient
+                          // to write out 8 or 16 bytes at once.
   std::memcpy(dst, buf + idx, len - idx);
   return len - idx;
 }
 
-template<size_t STEP_SIZE>
+template <size_t STEP_SIZE>
 simdutf_really_inline void buf_block_reader<STEP_SIZE>::advance() {
   idx += STEP_SIZE;
 }
@@ -17210,38 +16628,39 @@ namespace utf8_validation {
 
 using namespace simd;
 
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -17251,137 +16670,173 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       TOO_SHORT | OVERLONG_3 | SURROGATE,
       // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      CARRY | TOO_LARGE,
-      // ____0101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____011_ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-
-      // ____1___ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____1101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
-  simdutf_really_inline simd8<uint8_t> check_multibyte_lengths(const simd8<uint8_t> input,
-      const simd8<uint8_t> prev_input, const simd8<uint8_t> sc) {
-    simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-    simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-    simd8<uint8_t> must23 = simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-    simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-    return must23_80 ^ sc;
-  }
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+//
+// Return nonzero if there are incomplete multibyte characters at the end of the
+// block: e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+//
+simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
+  // If the previous input's last 3 bytes match this, they're too short (they
+  // ended at EOF):
+  // ... 1111____ 111_____ 11______
+  static const uint8_t max_array[32] = {255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        0b11110000u - 1,
+                                        0b11100000u - 1,
+                                        0b11000000u - 1};
+  const simd8<uint8_t> max_value(
+      &max_array[sizeof(max_array) - sizeof(simd8<uint8_t>)]);
+  return input.gt_bits(max_value);
+}
+
+struct utf8_checker {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+  // The last input we received
+  simd8<uint8_t> prev_input_block;
+  // Whether the last input we received was incomplete (used for ASCII fast
+  // path)
+  simd8<uint8_t> prev_incomplete;
 
   //
-  // Return nonzero if there are incomplete multibyte characters at the end of the block:
-  // e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+  // Check whether the current bytes are valid UTF-8.
   //
-  simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
-    // If the previous input's last 3 bytes match this, they're too short (they ended at EOF):
-    // ... 1111____ 111_____ 11______
-    static const uint8_t max_array[32] = {
-      255, 255, 255, 255, 255, 255, 255, 255,
-      255, 255, 255, 255, 255, 255, 255, 255,
-      255, 255, 255, 255, 255, 255, 255, 255,
-      255, 255, 255, 255, 255, 0b11110000u-1, 0b11100000u-1, 0b11000000u-1
-    };
-    const simd8<uint8_t> max_value(&max_array[sizeof(max_array)-sizeof(simd8<uint8_t>)]);
-    return input.gt_bits(max_value);
-  }
-
-  struct utf8_checker {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
-    // The last input we received
-    simd8<uint8_t> prev_input_block;
-    // Whether the last input we received was incomplete (used for ASCII fast path)
-    simd8<uint8_t> prev_incomplete;
-
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      simd8<uint8_t> sc = check_special_cases(input, prev1);
-      this->error |= check_multibyte_lengths(input, prev_input, sc);
-    }
-
-    // The only problem that can happen at EOF is that a multibyte character is too short
-    // or a byte value too large in the last bytes: check_special_cases only checks for bytes
-    // too large in the first of two bytes.
-    simdutf_really_inline void check_eof() {
-      // If the previous block had incomplete UTF-8 characters at the end, an ASCII block can't
-      // possibly finish them.
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  // The only problem that can happen at EOF is that a multibyte character is
+  // too short or a byte value too large in the last bytes: check_special_cases
+  // only checks for bytes too large in the first of two bytes.
+  simdutf_really_inline void check_eof() {
+    // If the previous block had incomplete UTF-8 characters at the end, an
+    // ASCII block can't possibly finish them.
+    this->error |= this->prev_incomplete;
+  }
+
+  simdutf_really_inline void check_next_input(const simd8x64<uint8_t> &input) {
+    if (simdutf_likely(is_ascii(input))) {
       this->error |= this->prev_incomplete;
-    }
-
-    simdutf_really_inline void check_next_input(const simd8x64<uint8_t>& input) {
-      if(simdutf_likely(is_ascii(input))) {
-        this->error |= this->prev_incomplete;
-      } else {
-        // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-        static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        this->prev_incomplete = is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS-1]);
-        this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS-1];
-
+    } else {
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                        (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+                    "We support either two or four chunks per 64-byte block.");
+      if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+      } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+        this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
       }
+      this->prev_incomplete =
+          is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1]);
+      this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1];
     }
+  }
 
-    // do not forget to call check_eof!
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
-    }
+  // do not forget to call check_eof!
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-  }; // struct utf8_checker
+}; // struct utf8_checker
 } // namespace utf8_validation
 
 using utf8_validation::utf8_checker;
@@ -17399,97 +16854,109 @@ namespace utf8_validation {
 /**
  * Validates that the string is actual UTF-8.
  */
-template<class checker>
-bool generic_validate_utf8(const uint8_t * input, size_t length) {
-    checker c{};
-    buf_block_reader<64> reader(input, length);
-    while (reader.has_full_block()) {
-      simd::simd8x64<uint8_t> in(reader.full_block());
-      c.check_next_input(in);
-      reader.advance();
-    }
-    uint8_t block[64]{};
-    reader.get_remainder(block);
-    simd::simd8x64<uint8_t> in(block);
+template <class checker>
+bool generic_validate_utf8(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
     c.check_next_input(in);
     reader.advance();
-    c.check_eof();
-    return !c.errors();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  return !c.errors();
 }
 
-bool generic_validate_utf8(const char * input, size_t length) {
-  return generic_validate_utf8<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
+bool generic_validate_utf8(const char *input, size_t length) {
+  return generic_validate_utf8<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
 /**
  * Validates that the string is actual UTF-8 and stops on errors.
  */
-template<class checker>
-result generic_validate_utf8_with_errors(const uint8_t * input, size_t length) {
-    checker c{};
-    buf_block_reader<64> reader(input, length);
-    size_t count{0};
-    while (reader.has_full_block()) {
-      simd::simd8x64<uint8_t> in(reader.full_block());
-      c.check_next_input(in);
-      if(c.errors()) {
-        if (count != 0) { count--; } // Sometimes the error is only detected in the next chunk
-        result res = scalar::utf8::rewind_and_validate_with_errors(reinterpret_cast<const char*>(input), reinterpret_cast<const char*>(input + count), length - count);
-        res.count += count;
-        return res;
-      }
-      reader.advance();
-      count += 64;
-    }
-    uint8_t block[64]{};
-    reader.get_remainder(block);
-    simd::simd8x64<uint8_t> in(block);
+template <class checker>
+result generic_validate_utf8_with_errors(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  size_t count{0};
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
     c.check_next_input(in);
-    reader.advance();
-    c.check_eof();
     if (c.errors()) {
-      if (count != 0) { count--; } // Sometimes the error is only detected in the next chunk
-      result res = scalar::utf8::rewind_and_validate_with_errors(reinterpret_cast<const char*>(input), reinterpret_cast<const char*>(input) + count, length - count);
+      if (count != 0) {
+        count--;
+      } // Sometimes the error is only detected in the next chunk
+      result res = scalar::utf8::rewind_and_validate_with_errors(
+          reinterpret_cast<const char *>(input),
+          reinterpret_cast<const char *>(input + count), length - count);
       res.count += count;
       return res;
-    } else {
-      return result(error_code::SUCCESS, length);
     }
+    reader.advance();
+    count += 64;
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  if (c.errors()) {
+    if (count != 0) {
+      count--;
+    } // Sometimes the error is only detected in the next chunk
+    result res = scalar::utf8::rewind_and_validate_with_errors(
+        reinterpret_cast<const char *>(input),
+        reinterpret_cast<const char *>(input) + count, length - count);
+    res.count += count;
+    return res;
+  } else {
+    return result(error_code::SUCCESS, length);
+  }
 }
 
-result generic_validate_utf8_with_errors(const char * input, size_t length) {
-  return generic_validate_utf8_with_errors<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
+result generic_validate_utf8_with_errors(const char *input, size_t length) {
+  return generic_validate_utf8_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
-template<class checker>
-bool generic_validate_ascii(const uint8_t * input, size_t length) {
-    buf_block_reader<64> reader(input, length);
-    uint8_t blocks[64]{};
-    simd::simd8x64<uint8_t> running_or(blocks);
-    while (reader.has_full_block()) {
-      simd::simd8x64<uint8_t> in(reader.full_block());
-      running_or |= in;
-      reader.advance();
-    }
-    uint8_t block[64]{};
-    reader.get_remainder(block);
-    simd::simd8x64<uint8_t> in(block);
+template <class checker>
+bool generic_validate_ascii(const uint8_t *input, size_t length) {
+  buf_block_reader<64> reader(input, length);
+  uint8_t blocks[64]{};
+  simd::simd8x64<uint8_t> running_or(blocks);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
     running_or |= in;
-    return running_or.is_ascii();
+    reader.advance();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  running_or |= in;
+  return running_or.is_ascii();
 }
 
-bool generic_validate_ascii(const char * input, size_t length) {
-  return generic_validate_ascii<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
+bool generic_validate_ascii(const char *input, size_t length) {
+  return generic_validate_ascii<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
-template<class checker>
-result generic_validate_ascii_with_errors(const uint8_t * input, size_t length) {
+template <class checker>
+result generic_validate_ascii_with_errors(const uint8_t *input, size_t length) {
   buf_block_reader<64> reader(input, length);
   size_t count{0};
   while (reader.has_full_block()) {
     simd::simd8x64<uint8_t> in(reader.full_block());
     if (!in.is_ascii()) {
-      result res = scalar::ascii::validate_with_errors(reinterpret_cast<const char*>(input + count), length - count);
+      result res = scalar::ascii::validate_with_errors(
+          reinterpret_cast<const char *>(input + count), length - count);
       return result(res.error, count + res.count);
     }
     reader.advance();
@@ -17500,15 +16967,17 @@ result generic_validate_ascii_with_errors(const uint8_t * input, size_t length)
   reader.get_remainder(block);
   simd::simd8x64<uint8_t> in(block);
   if (!in.is_ascii()) {
-    result res = scalar::ascii::validate_with_errors(reinterpret_cast<const char*>(input + count), length - count);
+    result res = scalar::ascii::validate_with_errors(
+        reinterpret_cast<const char *>(input + count), length - count);
     return result(res.error, count + res.count);
   } else {
     return result(error_code::SUCCESS, length);
   }
 }
 
-result generic_validate_ascii_with_errors(const char * input, size_t length) {
-  return generic_validate_ascii_with_errors<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
+result generic_validate_ascii_with_errors(const char *input, size_t length) {
+  return generic_validate_ascii_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
 } // namespace utf8_validation
@@ -17517,122 +16986,47 @@ result generic_validate_ascii_with_errors(const char * input, size_t length) {
 } // namespace simdutf
 /* end file src/generic/utf8_validation/utf8_validator.h */
 // transcoding from UTF-8 to UTF-16
-/* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
-
-
-namespace simdutf {
-namespace arm64 {
-namespace {
-namespace utf8_to_utf16 {
-
-using namespace simd;
-
-template <endianness endian>
-simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
-    char16_t* utf16_output) noexcept {
-  // The implementation is not specific to haswell and should be moved to the generic directory.
-  size_t pos = 0;
-  char16_t* start{utf16_output};
-  const size_t safety_margin = 16; // to avoid overruns!
-  while(pos + 64 + safety_margin <= size) {
-    // this loop could be unrolled further. For example, we could process the mask
-    // far more than 64 bytes.
-    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
-    if(in.is_ascii()) {
-      in.store_ascii_as_utf16<endian>(utf16_output);
-      utf16_output += 64;
-      pos += 64;
-    } else {
-      // Slow path. We hope that the compiler will recognize that this is a slow path.
-      // Anything that is not a continuation mask is a 'leading byte', that is, the
-      // start of a new code point.
-      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
-      // -65 is 0b10111111 in two-complement's, so largest possible continuation byte
-      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-      // The *start* of code points is not so useful, rather, we want the *end* of code points.
-      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-      // We process in blocks of up to 12 bytes except possibly
-      // for fast paths which may process up to 16 bytes. For the
-      // slow path to work, we should have at least 12 input bytes left.
-      size_t max_starting_point = (pos + 64) - 12;
-      // Next loop is going to run at least five times when using solely
-      // the slow/regular path, and at least four times if there are fast paths.
-      while(pos < max_starting_point) {
-        // Performance note: our ability to compute 'consumed' and
-        // then shift and recompute is critical. If there is a
-        // latency of, say, 4 cycles on getting 'consumed', then
-        // the inner loop might have a total latency of about 6 cycles.
-        // Yet we process between 6 to 12 inputs bytes, thus we get
-        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-        // for this section of the code. Hence, there is a limit
-        // to how much we can further increase this latency before
-        // it seriously harms performance.
-        //
-        // Thus we may allow convert_masked_utf8_to_utf16 to process
-        // more bytes at a time under a fast-path mode where 16 bytes
-        // are consumed at once (e.g., when encountering ASCII).
-        size_t consumed = convert_masked_utf8_to_utf16<endian>(input + pos,
-                            utf8_end_of_code_point_mask, utf16_output);
-        pos += consumed;
-        utf8_end_of_code_point_mask >>= consumed;
-      }
-      // At this point there may remain between 0 and 12 bytes in the
-      // 64-byte block. These bytes will be processed again. So we have an
-      // 80% efficiency (in the worst case). In practice we expect an
-      // 85% to 90% efficiency.
-    }
-  }
-  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(input + pos, size - pos, utf16_output);
-  return utf16_output - start;
-}
-
-} // namespace utf8_to_utf16
-} // unnamed namespace
-} // namespace arm64
-} // namespace simdutf
-/* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
 /* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
 
-
 namespace simdutf {
 namespace arm64 {
 namespace {
 namespace utf8_to_utf16 {
 using namespace simd;
 
-
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -17642,352 +17036,410 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       TOO_SHORT | OVERLONG_3 | SURROGATE,
       // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      CARRY | TOO_LARGE,
-      // ____0101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____011_ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-
-      // ____1___ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____1101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
-  simdutf_really_inline simd8<uint8_t> check_multibyte_lengths(const simd8<uint8_t> input,
-      const simd8<uint8_t> prev_input, const simd8<uint8_t> sc) {
-    simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-    simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-    simd8<uint8_t> must23 = simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-    simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-    return must23_80 ^ sc;
-  }
-
-
-  struct validating_transcoder {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
-
-    validating_transcoder() : error(uint8_t(0)) {}
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      simd8<uint8_t> sc = check_special_cases(input, prev1);
-      this->error |= check_multibyte_lengths(input, prev_input, sc);
-    }
-
-
-    template <endianness endian>
-    simdutf_really_inline size_t convert(const char* in, size_t size, char16_t* utf16_output) {
-      size_t pos = 0;
-      char16_t* start{utf16_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf16<endian>(utf16_output);
-          utf16_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if(utf8_continuation_mask & 1) {
-            return 0; // error
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf16<endian>(in + pos,
-                            utf8_end_of_code_point_mask, utf16_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  template <endianness endian>
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then margin-1 is the eighth-last leading
+    // byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-      }
-      if(errors()) { return 0; }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_utf16::convert<endian>(in + pos, size - pos, utf16_output);
-        if(howmany == 0) { return 0; }
-        utf16_output += howmany;
-      }
-      return utf16_output - start;
-    }
-
-    template <endianness endian>
-    simdutf_really_inline result convert_with_errors(const char* in, size_t size, char16_t* utf16_output) {
-      size_t pos = 0;
-      char16_t* start{utf16_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf16<endian>(utf16_output);
-          utf16_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if (errors() || (utf8_continuation_mask & 1)) {
-            // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-            // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-            result res = scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(pos, in + pos, size - pos, utf16_output);
-            res.count += pos;
-            return res;
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf16<endian>(in + pos,
-                            utf8_end_of_code_point_mask, utf16_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
         }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      if(errors()) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(pos, in + pos, size - pos, utf16_output);
-        res.count += pos;
-        return res;
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany = scalar::utf8_to_utf16::convert<endian>(
+          in + pos, size - pos, utf16_output);
+      if (howmany == 0) {
+        return 0;
       }
-      if(pos < size) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(pos, in + pos, size - pos, utf16_output);
-        if (res.error) {    // In case of error, we want the error position
+      utf16_output += howmany;
+    }
+    return utf16_output - start;
+  }
+
+  template <endianness endian>
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the eighth
+    // last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res =
+              scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+                  pos, in + pos, size - pos, utf16_output);
           res.count += pos;
           return res;
-        } else {    // In case of success, we want the number of word written
-          utf16_output += res.count;
         }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf16_output += res.count;
       }
-      return result(error_code::SUCCESS, utf16_output - start);
     }
+    return result(error_code::SUCCESS, utf16_output - start);
+  }
 
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
-    }
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-  }; // struct utf8_checker
-} // utf8_to_utf16 namespace
+}; // struct utf8_checker
+} // namespace utf8_to_utf16
 } // unnamed namespace
 } // namespace arm64
 } // namespace simdutf
 /* end file src/generic/utf8_to_utf16/utf8_to_utf16.h */
-// transcoding from UTF-8 to UTF-32
-/* begin file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
+/* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
 
 namespace simdutf {
 namespace arm64 {
 namespace {
-namespace utf8_to_utf32 {
+namespace utf8_to_utf16 {
 
 using namespace simd;
 
-
-simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
-    char32_t* utf32_output) noexcept {
+template <endianness endian>
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char16_t *utf16_output) noexcept {
+  // The implementation is not specific to haswell and should be moved to the
+  // generic directory.
   size_t pos = 0;
-  char32_t* start{utf32_output};
+  char16_t *start{utf16_output};
   const size_t safety_margin = 16; // to avoid overruns!
-  while(pos + 64 + safety_margin <= size) {
+  while (pos + 64 + safety_margin <= size) {
+    // this loop could be unrolled further. For example, we could process the
+    // mask far more than 64 bytes.
     simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
-    if(in.is_ascii()) {
-      in.store_ascii_as_utf32(utf32_output);
-      utf32_output += 64;
+    if (in.is_ascii()) {
+      in.store_ascii_as_utf16<endian>(utf16_output);
+      utf16_output += 64;
       pos += 64;
     } else {
-    // -65 is 0b10111111 in two-complement's, so largest possible continuation byte
-    uint64_t utf8_continuation_mask = in.lt(-65 + 1);
-    uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-    uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-    size_t max_starting_point = (pos + 64) - 12;
-    while(pos < max_starting_point) {
-      size_t consumed = convert_masked_utf8_to_utf32(input + pos,
-                          utf8_end_of_code_point_mask, utf32_output);
-      pos += consumed;
-      utf8_end_of_code_point_mask >>= consumed;
+      // Slow path. We hope that the compiler will recognize that this is a slow
+      // path. Anything that is not a continuation byte is a 'leading byte',
+      // that is, the start of a new code point.
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      // The *start* of code points is not so useful, rather, we want the *end*
+      // of code points.
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      // We process in blocks of up to 12 bytes except possibly
+      // for fast paths which may process up to 16 bytes. For the
+      // slow path to work, we should have at least 12 input bytes left.
+      size_t max_starting_point = (pos + 64) - 12;
+      // Next loop is going to run at least five times when using solely
+      // the slow/regular path, and at least four times if there are fast paths.
+      while (pos < max_starting_point) {
+        // Performance note: our ability to compute 'consumed' and
+        // then shift and recompute is critical. If there is a
+        // latency of, say, 4 cycles on getting 'consumed', then
+        // the inner loop might have a total latency of about 6 cycles.
+        // Yet we process between 6 and 12 input bytes, thus we get
+        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+        // for this section of the code. Hence, there is a limit
+        // to how much we can further increase this latency before
+        // it seriously harms performance.
+        //
+        // Thus we may allow convert_masked_utf8_to_utf16 to process
+        // more bytes at a time under a fast-path mode where 16 bytes
+        // are consumed at once (e.g., when encountering ASCII).
+        size_t consumed = convert_masked_utf8_to_utf16<endian>(
+            input + pos, utf8_end_of_code_point_mask, utf16_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
       }
+      // At this point there may remain between 0 and 12 bytes in the
+      // 64-byte block. These bytes will be processed again. So we have an
+      // 80% efficiency (in the worst case). In practice we expect an
+      // 85% to 90% efficiency.
     }
   }
-  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos, utf32_output);
-  return utf32_output - start;
+  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(
+      input + pos, size - pos, utf16_output);
+  return utf16_output - start;
 }
 
-
-} // namespace utf8_to_utf32
+} // namespace utf8_to_utf16
 } // unnamed namespace
 } // namespace arm64
 } // namespace simdutf
-/* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
+/* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
+// transcoding from UTF-8 to UTF-32
 /* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
 
-
 namespace simdutf {
 namespace arm64 {
 namespace {
 namespace utf8_to_utf32 {
 using namespace simd;
 
-
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -17997,298 +17449,323 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       TOO_SHORT | OVERLONG_3 | SURROGATE,
       // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      CARRY | TOO_LARGE,
-      // ____0101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____011_ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-
-      // ____1___ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____1101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
-  simdutf_really_inline simd8<uint8_t> check_multibyte_lengths(const simd8<uint8_t> input,
-      const simd8<uint8_t> prev_input, const simd8<uint8_t> sc) {
-    simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-    simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-    simd8<uint8_t> must23 = simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-    simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-    return must23_80 ^ sc;
-  }
-
-
-  struct validating_transcoder {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
-
-    validating_transcoder() : error(uint8_t(0)) {}
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      simd8<uint8_t> sc = check_special_cases(input, prev1);
-      this->error |= check_multibyte_lengths(input, prev_input, sc);
-    }
-
-
-
-    simdutf_really_inline size_t convert(const char* in, size_t size, char32_t* utf32_output) {
-      size_t pos = 0;
-      char32_t* start{utf32_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 words when calling convert_masked_utf8_to_utf32. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 16 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the fourth last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf32(utf32_output);
-          utf32_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if(utf8_continuation_mask & 1) {
-            return 0; // we have an error
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf32(in + pos,
-                            utf8_end_of_code_point_mask, utf32_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 words when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the eighth
+    // last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-      }
-      if(errors()) { return 0; }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
-        if(howmany == 0) { return 0; }
-        utf32_output += howmany;
-      }
-      return utf32_output - start;
-    }
-
-    simdutf_really_inline result convert_with_errors(const char* in, size_t size, char32_t* utf32_output) {
-      size_t pos = 0;
-      char32_t* start{utf32_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the fourth last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf32(utf32_output);
-          utf32_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if (errors() || (utf8_continuation_mask & 1)) {
-            result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(pos, in + pos, size - pos, utf32_output);
-            res.count += pos;
-            return res;
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf32(in + pos,
-                            utf8_end_of_code_point_mask, utf32_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // we have an error
         }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      if(errors()) {
-        result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(pos, in + pos, size - pos, utf32_output);
-        res.count += pos;
-        return res;
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
+      if (howmany == 0) {
+        return 0;
       }
-      if(pos < size) {
-        result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(pos, in + pos, size - pos, utf32_output);
-        if (res.error) {    // In case of error, we want the error position
+      utf32_output += howmany;
+    }
+    return utf32_output - start;
+  }
+
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the eighth
+    // last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, utf32_output);
           res.count += pos;
           return res;
-        } else {    // In case of success, we want the number of word written
-          utf32_output += res.count;
         }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      return result(error_code::SUCCESS, utf32_output - start);
     }
-
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
+    if (errors()) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf32_output += res.count;
+      }
     }
+    return result(error_code::SUCCESS, utf32_output - start);
+  }
 
-  }; // struct utf8_checker
-} // utf8_to_utf32 namespace
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
+
+}; // struct validating_transcoder
+} // namespace utf8_to_utf32
 } // unnamed namespace
 } // namespace arm64
 } // namespace simdutf
 /* end file src/generic/utf8_to_utf32/utf8_to_utf32.h */
-// other functions
-/* begin file src/generic/utf8.h */
+/* begin file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
 
 namespace simdutf {
 namespace arm64 {
 namespace {
-namespace utf8 {
+namespace utf8_to_utf32 {
 
 using namespace simd;
 
-simdutf_really_inline size_t count_code_points(const char* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    for(;pos + 64 <= size; pos += 64) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      uint64_t utf8_continuation_mask = input.gt(-65);
-      count += count_ones(utf8_continuation_mask);
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char32_t *utf32_output) noexcept {
+  size_t pos = 0;
+  char32_t *start{utf32_output};
+  const size_t safety_margin = 16; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
+    if (in.is_ascii()) {
+      in.store_ascii_as_utf32(utf32_output);
+      utf32_output += 64;
+      pos += 64;
+    } else {
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      size_t max_starting_point = (pos + 64) - 12;
+      while (pos < max_starting_point) {
+        size_t consumed = convert_masked_utf8_to_utf32(
+            input + pos, utf8_end_of_code_point_mask, utf32_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
+      }
     }
-    return count + scalar::utf8::count_code_points(in + pos, size - pos);
+  }
+  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos,
+                                                       utf32_output);
+  return utf32_output - start;
 }
 
-simdutf_really_inline size_t utf16_length_from_utf8(const char* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    // This algorithm could no doubt be improved!
-    for(;pos + 64 <= size; pos += 64) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-      // We count one word for anything that is not a continuation (so
-      // leading bytes).
-      count += 64 - count_ones(utf8_continuation_mask);
-      int64_t utf8_4byte = input.gteq_unsigned(240);
-      count += count_ones(utf8_4byte);
-    }
-    return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
-}
-} // utf8 namespace
+} // namespace utf8_to_utf32
 } // unnamed namespace
 } // namespace arm64
 } // namespace simdutf
-/* end file src/generic/utf8.h */
+/* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
+// other functions
 /* begin file src/generic/utf16.h */
 namespace simdutf {
 namespace arm64 {
@@ -18296,48 +17773,59 @@ namespace {
 namespace utf16 {
 
 template <endianness big_endian>
-simdutf_really_inline size_t count_code_points(const char16_t* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    for(;pos < size/32*32; pos += 32) {
-      simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-      if (!match_system(big_endian)) { input.swap_bytes(); }
-      uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
-      count += count_ones(not_pair) / 2;
+simdutf_really_inline size_t count_code_points(const char16_t *in,
+                                               size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
     }
-    return count + scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
+    uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
+    count += count_ones(not_pair) / 2;
+  }
+  return count +
+         scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
 }
 
 template <endianness big_endian>
-simdutf_really_inline size_t utf8_length_from_utf16(const char16_t* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    // This algorithm could no doubt be improved!
-    for(;pos < size/32*32; pos += 32) {
-      simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-      if (!match_system(big_endian)) { input.swap_bytes(); }
-      uint64_t ascii_mask = input.lteq(0x7F);
-      uint64_t twobyte_mask = input.lteq(0x7FF);
-      uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
-
-      size_t ascii_count = count_ones(ascii_mask) / 2;
-      size_t twobyte_count = count_ones(twobyte_mask & ~ ascii_mask) / 2;
-      size_t threebyte_count = count_ones(not_pair_mask & ~ twobyte_mask) / 2;
-      size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
-      count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count + ascii_count;
+simdutf_really_inline size_t utf8_length_from_utf16(const char16_t *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
     }
-    return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos, size - pos);
+    uint64_t ascii_mask = input.lteq(0x7F);
+    uint64_t twobyte_mask = input.lteq(0x7FF);
+    uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
+
+    size_t ascii_count = count_ones(ascii_mask) / 2;
+    size_t twobyte_count = count_ones(twobyte_mask & ~ascii_mask) / 2;
+    size_t threebyte_count = count_ones(not_pair_mask & ~twobyte_mask) / 2;
+    size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
+    count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count +
+             ascii_count;
+  }
+  return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos,
+                                                                   size - pos);
 }
 
 template <endianness big_endian>
-simdutf_really_inline size_t utf32_length_from_utf16(const char16_t* in, size_t size) {
-    return count_code_points<big_endian>(in, size);
+simdutf_really_inline size_t utf32_length_from_utf16(const char16_t *in,
+                                                     size_t size) {
+  return count_code_points<big_endian>(in, size);
 }
 
-simdutf_really_inline void change_endianness_utf16(const char16_t* in, size_t size, char16_t* output) {
+simdutf_really_inline void
+change_endianness_utf16(const char16_t *in, size_t size, char16_t *output) {
   size_t pos = 0;
 
-  while (pos < size/32*32) {
+  while (pos < size / 32 * 32) {
     simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
     input.swap_bytes();
     input.store(reinterpret_cast<uint16_t *>(output));
@@ -18348,58 +17836,99 @@ simdutf_really_inline void change_endianness_utf16(const char16_t* in, size_t si
   scalar::utf16::change_endianness_utf16(in + pos, size - pos, output);
 }
 
-} // utf16
+} // namespace utf16
 } // unnamed namespace
 } // namespace arm64
 } // namespace simdutf
 /* end file src/generic/utf16.h */
+/* begin file src/generic/utf8.h */
+
+namespace simdutf {
+namespace arm64 {
+namespace {
+namespace utf8 {
+
+using namespace simd;
+
+simdutf_really_inline size_t count_code_points(const char *in, size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.gt(-65);
+    count += count_ones(utf8_continuation_mask);
+  }
+  return count + scalar::utf8::count_code_points(in + pos, size - pos);
+}
+
+simdutf_really_inline size_t utf16_length_from_utf8(const char *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+    // We count one word for anything that is not a continuation (so
+    // leading bytes).
+    count += 64 - count_ones(utf8_continuation_mask);
+    int64_t utf8_4byte = input.gteq_unsigned(240);
+    count += count_ones(utf8_4byte);
+  }
+  return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
+}
+} // namespace utf8
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdutf
+/* end file src/generic/utf8.h */
 // transcoding from UTF-8 to Latin 1
 /* begin file src/generic/utf8_to_latin1/utf8_to_latin1.h */
 
-
 namespace simdutf {
 namespace arm64 {
 namespace {
 namespace utf8_to_latin1 {
 using namespace simd;
 
-
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// For UTF-8 to Latin 1, we can allow any ASCII character, and any continuation byte,
-// but the non-ASCII leading bytes must be 0b11000011 or 0b11000010 and nothing else.
-//
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-    constexpr const uint8_t FORBIDDEN  = 0xff;
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // For UTF-8 to Latin 1, we can allow any ASCII character, and any
+  // continuation byte, but the non-ASCII leading bytes must be 0b11000011 or
+  // 0b11000010 and nothing else.
+  //
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+  constexpr const uint8_t FORBIDDEN = 0xff;
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -18409,324 +17938,347 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       FORBIDDEN,
       // 1111____ ________ <four+ byte lead in byte 1>
-      FORBIDDEN
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      FORBIDDEN,
-      // ____0101 ________
-      FORBIDDEN,
-      // ____011_ ________
-      FORBIDDEN,
-      FORBIDDEN,
-
-      // ____1___ ________
-      FORBIDDEN,
-      FORBIDDEN,
-      FORBIDDEN,
-      FORBIDDEN,
-      FORBIDDEN,
-      // ____1101 ________
-      FORBIDDEN,
-      FORBIDDEN,
-      FORBIDDEN
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      FORBIDDEN);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              FORBIDDEN,
+              // ____0101 ________
+              FORBIDDEN,
+              // ____011_ ________
+              FORBIDDEN, FORBIDDEN,
+
+              // ____1___ ________
+              FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN,
+              // ____1101 ________
+              FORBIDDEN, FORBIDDEN, FORBIDDEN);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
 
-  struct validating_transcoder {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
 
-    validating_transcoder() : error(uint8_t(0)) {}
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      this->error |= check_special_cases(input, prev1);
-    }
-
-
-    simdutf_really_inline size_t convert(const char* in, size_t size, char* latin1_output) {
-      size_t pos = 0;
-      char* start{latin1_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65); //twos complement of -65 is 1011 1111 ...
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store((int8_t*)latin1_output);
-          latin1_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in this case, we also have ASCII to account for.
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_latin1(in + pos,
-                            utf8_end_of_code_point_mask, latin1_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    this->error |= check_special_cases(input, prev1);
+  }
+
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char *latin1_output) {
+    size_t pos = 0;
+    char *start{latin1_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 16 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 16; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) >
+                       -65); // twos complement of -65 is 1011 1111 ...
+    }
+    // If the input is long enough, then margin-1 is the sixteenth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-      }
-      if(errors()) { return 0; }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_latin1::convert(in + pos, size - pos, latin1_output);
-        if(howmany == 0) { return 0; }
-        latin1_output += howmany;
-      }
-      return latin1_output - start;
-    }
-
-    simdutf_really_inline result convert_with_errors(const char* in, size_t size, char* latin1_output) {
-      size_t pos = 0;
-      char* start{latin1_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store((int8_t*)latin1_output);
-          latin1_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          if (errors()) {
-            // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-            // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-            result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(pos, in + pos, size - pos, latin1_output);
-            res.count += pos;
-            return res;
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_latin1(in + pos,
-                            utf8_end_of_code_point_mask, latin1_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+        uint64_t utf8_continuation_mask =
+            input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
+                               // this case, we also have ASCII to account for.
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
         }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      if(errors()) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(pos, in + pos, size - pos, latin1_output);
-        res.count += pos;
-        return res;
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_latin1::convert(in + pos, size - pos, latin1_output);
+      if (howmany == 0) {
+        return 0;
       }
-      if(pos < size) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(pos, in + pos, size - pos, latin1_output);
-        if (res.error) {    // In case of error, we want the error position
+      latin1_output += howmany;
+    }
+    return latin1_output - start;
+  }
+
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char *latin1_output) {
+    size_t pos = 0;
+    char *start{latin1_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then margin-1 is the eighth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        if (errors()) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, latin1_output);
           res.count += pos;
           return res;
-        } else {    // In case of success, we want the number of word written
-          latin1_output += res.count;
         }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        latin1_output += res.count;
       }
-      return result(error_code::SUCCESS, latin1_output - start);
     }
+    return result(error_code::SUCCESS, latin1_output - start);
+  }
 
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
-    }
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-  }; // struct utf8_checker
-} // utf8_to_latin1 namespace
+}; // struct validating_transcoder
+} // namespace utf8_to_latin1
 } // unnamed namespace
 } // namespace arm64
 } // namespace simdutf
 /* end file src/generic/utf8_to_latin1/utf8_to_latin1.h */
 /* begin file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
 
-
 namespace simdutf {
 namespace arm64 {
 namespace {
 namespace utf8_to_latin1 {
 using namespace simd;
 
-
-    simdutf_really_inline size_t convert_valid(const char* in, size_t size, char* latin1_output) {
-      size_t pos = 0;
-      char* start{latin1_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65); //twos complement of -65 is 1011 1111 ...
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store((int8_t*)latin1_output);
-          latin1_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in this case, we also have ASCII to account for.
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_latin1(in + pos,
-                            utf8_end_of_code_point_mask, latin1_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
-        }
-      }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_latin1::convert_valid(in + pos, size - pos, latin1_output);
-        latin1_output += howmany;
+simdutf_really_inline size_t convert_valid(const char *in, size_t size,
+                                           char *latin1_output) {
+  size_t pos = 0;
+  char *start{latin1_output};
+  // In the worst case, we have the haswell kernel which can cause an overflow
+  // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last
+  // 16 bytes, and if the data is valid, then it is entirely safe because 16
+  // UTF-8 bytes generate much more than 8 bytes. However, you cannot generally
+  // assume that you have valid UTF-8 input, so we are going to go back from the
+  // end counting 8 leading bytes, to give us a good margin.
+  size_t leading_byte = 0;
+  size_t margin = size;
+  for (; margin > 0 && leading_byte < 8; margin--) {
+    leading_byte += (int8_t(in[margin - 1]) >
+                     -65); // twos complement of -65 is 1011 1111 ...
+  }
+  // If the input is long enough, then margin-1 is the eighth-to-last
+  // leading byte.
+  const size_t safety_margin = size - margin + 1; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    if (input.is_ascii()) {
+      input.store((int8_t *)latin1_output);
+      latin1_output += 64;
+      pos += 64;
+    } else {
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      uint64_t utf8_continuation_mask =
+          input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
+                             // this case, we also have ASCII to account for.
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      // We process in blocks of up to 12 bytes except possibly
+      // for fast paths which may process up to 16 bytes. For the
+      // slow path to work, we should have at least 12 input bytes left.
+      size_t max_starting_point = (pos + 64) - 12;
+      // Next loop is going to run at least five times.
+      while (pos < max_starting_point) {
+        // Performance note: our ability to compute 'consumed' and
+        // then shift and recompute is critical. If there is a
+        // latency of, say, 4 cycles on getting 'consumed', then
+        // the inner loop might have a total latency of about 6 cycles.
+        // Yet we process between 6 and 12 input bytes, thus we get
+        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+        // for this section of the code. Hence, there is a limit
+        // to how much we can further increase this latency before
+        // it seriously harms performance.
+        size_t consumed = convert_masked_utf8_to_latin1(
+            in + pos, utf8_end_of_code_point_mask, latin1_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
       }
-      return latin1_output - start;
+      // At this point there may remain between 0 and 12 bytes in the
+      // 64-byte block. These bytes will be processed again. So we have an
+      // 80% efficiency (in the worst case). In practice we expect an
+      // 85% to 90% efficiency.
     }
-
   }
-}   // utf8_to_latin1 namespace
-}   // unnamed namespace
-}   // namespace arm64
- // namespace simdutf
+  if (pos < size) {
+    size_t howmany = scalar::utf8_to_latin1::convert_valid(in + pos, size - pos,
+                                                           latin1_output);
+    latin1_output += howmany;
+  }
+  return latin1_output - start;
+}
+
+} // namespace utf8_to_latin1
+} // namespace
+} // namespace arm64
+} // namespace simdutf
+  // namespace simdutf
 /* end file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
 
 // placeholder scalars
@@ -18737,52 +18289,72 @@ using namespace simd;
 namespace simdutf {
 namespace arm64 {
 
-simdutf_warn_unused int implementation::detect_encodings(const char * input, size_t length) const noexcept {
+simdutf_warn_unused int
+implementation::detect_encodings(const char *input,
+                                 size_t length) const noexcept {
   // If there is a BOM, then we trust it.
   auto bom_encoding = simdutf::BOM::check_bom(input, length);
-  if(bom_encoding != encoding_type::unspecified) { return bom_encoding; }
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
+  }
   // todo: reimplement as a one-pass algorithm.
   int out = 0;
-  if(validate_utf8(input, length)) { out |= encoding_type::UTF8; }
-  if((length % 2) == 0) {
-    if(validate_utf16le(reinterpret_cast<const char16_t*>(input), length/2)) { out |= encoding_type::UTF16_LE; }
+  if (validate_utf8(input, length)) {
+    out |= encoding_type::UTF8;
+  }
+  if ((length % 2) == 0) {
+    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
+                         length / 2)) {
+      out |= encoding_type::UTF16_LE;
+    }
   }
-  if((length % 4) == 0) {
-    if(validate_utf32(reinterpret_cast<const char32_t*>(input), length/4)) { out |= encoding_type::UTF32_LE; }
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      out |= encoding_type::UTF32_LE;
+    }
   }
   return out;
 }
 
-simdutf_warn_unused bool implementation::validate_utf8(const char *buf, size_t len) const noexcept {
-  return arm64::utf8_validation::generic_validate_utf8(buf,len);
+simdutf_warn_unused bool
+implementation::validate_utf8(const char *buf, size_t len) const noexcept {
+  return arm64::utf8_validation::generic_validate_utf8(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_utf8_with_errors(const char *buf, size_t len) const noexcept {
-  return arm64::utf8_validation::generic_validate_utf8_with_errors(buf,len);
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return arm64::utf8_validation::generic_validate_utf8_with_errors(buf, len);
 }
 
-simdutf_warn_unused bool implementation::validate_ascii(const char *buf, size_t len) const noexcept {
-  return arm64::utf8_validation::generic_validate_ascii(buf,len);
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *buf, size_t len) const noexcept {
+  return arm64::utf8_validation::generic_validate_ascii(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_ascii_with_errors(const char *buf, size_t len) const noexcept {
-  return arm64::utf8_validation::generic_validate_ascii_with_errors(buf,len);
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return arm64::utf8_validation::generic_validate_ascii_with_errors(buf, len);
 }
 
-simdutf_warn_unused bool implementation::validate_utf16le(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf16le(const char16_t *buf,
+                                 size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     // empty input is valid. protected the implementation from nullptr.
     return true;
   }
-  const char16_t* tail = arm_validate_utf16<endianness::LITTLE>(buf, len);
+  const char16_t *tail = arm_validate_utf16<endianness::LITTLE>(buf, len);
   if (tail) {
-    return scalar::utf16::validate<endianness::LITTLE>(tail, len - (tail - buf));
+    return scalar::utf16::validate<endianness::LITTLE>(tail,
+                                                       len - (tail - buf));
   } else {
     return false;
   }
 }
 
-simdutf_warn_unused bool implementation::validate_utf16be(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf16be(const char16_t *buf,
+                                 size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     // empty input is valid. protected the implementation from nullptr.
     return true;
@@ -18795,38 +18367,43 @@ simdutf_warn_unused bool implementation::validate_utf16be(const char16_t *buf, s
   }
 }
 
-simdutf_warn_unused result implementation::validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf16le_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     return result(error_code::SUCCESS, 0);
   }
   result res = arm_validate_utf16_with_errors<endianness::LITTLE>(buf, len);
   if (res.count != len) {
-    result scalar_res = scalar::utf16::validate_with_errors<endianness::LITTLE>(buf + res.count, len - res.count);
+    result scalar_res = scalar::utf16::validate_with_errors<endianness::LITTLE>(
+        buf + res.count, len - res.count);
     return result(scalar_res.error, res.count + scalar_res.count);
   } else {
     return res;
   }
 }
 
-simdutf_warn_unused result implementation::validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf16be_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     return result(error_code::SUCCESS, 0);
   }
   result res = arm_validate_utf16_with_errors<endianness::BIG>(buf, len);
   if (res.count != len) {
-    result scalar_res = scalar::utf16::validate_with_errors<endianness::BIG>(buf + res.count, len - res.count);
+    result scalar_res = scalar::utf16::validate_with_errors<endianness::BIG>(
+        buf + res.count, len - res.count);
     return result(scalar_res.error, res.count + scalar_res.count);
   } else {
     return res;
   }
 }
 
-simdutf_warn_unused bool implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     // empty input is valid. protected the implementation from nullptr.
     return true;
   }
-  const char32_t* tail = arm_validate_utf32le(buf, len);
+  const char32_t *tail = arm_validate_utf32le(buf, len);
   if (tail) {
     return scalar::utf32::validate(tail, len - (tail - buf));
   } else {
@@ -18834,157 +18411,203 @@ simdutf_warn_unused bool implementation::validate_utf32(const char32_t *buf, siz
   }
 }
 
-simdutf_warn_unused result implementation::validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf32_with_errors(
+    const char32_t *buf, size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     return result(error_code::SUCCESS, 0);
   }
   result res = arm_validate_utf32le_with_errors(buf, len);
   if (res.count != len) {
-    result scalar_res = scalar::utf32::validate_with_errors(buf + res.count, len - res.count);
+    result scalar_res =
+        scalar::utf32::validate_with_errors(buf + res.count, len - res.count);
     return result(scalar_res.error, res.count + scalar_res.count);
   } else {
     return res;
   }
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(const char * buf, size_t len, char* utf8_output) const noexcept {
-  std::pair<const char*, char*> ret = arm_convert_latin1_to_utf8(buf, len, utf8_output);
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
+    const char *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char *, char *> ret =
+      arm_convert_latin1_to_utf8(buf, len, utf8_output);
   size_t converted_chars = ret.second - utf8_output;
 
   if (ret.first != buf + len) {
     const size_t scalar_converted_chars = scalar::latin1_to_utf8::convert(
-      ret.first, len - (ret.first - buf), ret.second);
+        ret.first, len - (ret.first - buf), ret.second);
     converted_chars += scalar_converted_chars;
   }
   return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  std::pair<const char*, char16_t*> ret = arm_convert_latin1_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char *, char16_t *> ret =
+      arm_convert_latin1_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
   size_t converted_chars = ret.second - utf16_output;
   if (ret.first != buf + len) {
-    const size_t scalar_converted_chars = scalar::latin1_to_utf16::convert<endianness::LITTLE>(
-      ret.first, len - (ret.first - buf), ret.second);
+    const size_t scalar_converted_chars =
+        scalar::latin1_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
     converted_chars += scalar_converted_chars;
   }
   return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  std::pair<const char*, char16_t*> ret = arm_convert_latin1_to_utf16<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char *, char16_t *> ret =
+      arm_convert_latin1_to_utf16<endianness::BIG>(buf, len, utf16_output);
   size_t converted_chars = ret.second - utf16_output;
   if (ret.first != buf + len) {
-    const size_t scalar_converted_chars = scalar::latin1_to_utf16::convert<endianness::BIG>(
-      ret.first, len - (ret.first - buf), ret.second);
+    const size_t scalar_converted_chars =
+        scalar::latin1_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
     converted_chars += scalar_converted_chars;
   }
   return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(const char* buf, size_t len, char32_t* utf32_output) const noexcept {
-  std::pair<const char*, char32_t*> ret = arm_convert_latin1_to_utf32(buf, len, utf32_output);
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char *, char32_t *> ret =
+      arm_convert_latin1_to_utf32(buf, len, utf32_output);
   size_t converted_chars = ret.second - utf32_output;
   if (ret.first != buf + len) {
     const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
-      ret.first, len - (ret.first - buf), ret.second);
+        ret.first, len - (ret.first - buf), ret.second);
     converted_chars += scalar_converted_chars;
   }
   return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(const char* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
   utf8_to_latin1::validating_transcoder converter;
   return converter.convert(buf, len, latin1_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(const char* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
   utf8_to_latin1::validating_transcoder converter;
   return converter.convert_with_errors(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(const char* buf, size_t len, char* latin1_output) const noexcept {
-  return arm64::utf8_to_latin1::convert_valid(buf,len,latin1_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  return arm64::utf8_to_latin1::convert_valid(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   utf8_to_utf16::validating_transcoder converter;
   return converter.convert<endianness::LITTLE>(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   utf8_to_utf16::validating_transcoder converter;
   return converter.convert<endianness::BIG>(buf, len, utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   utf8_to_utf16::validating_transcoder converter;
-  return converter.convert_with_errors<endianness::LITTLE>(buf, len, utf16_output);
+  return converter.convert_with_errors<endianness::LITTLE>(buf, len,
+                                                           utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   utf8_to_utf16::validating_transcoder converter;
   return converter.convert_with_errors<endianness::BIG>(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(const char* input, size_t size,
-    char16_t* utf16_output) const noexcept {
-  return utf8_to_utf16::convert_valid<endianness::LITTLE>(input, size,  utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char *input, size_t size, char16_t *utf16_output) const noexcept {
+  return utf8_to_utf16::convert_valid<endianness::LITTLE>(input, size,
+                                                          utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(const char* input, size_t size,
-    char16_t* utf16_output) const noexcept {
-  return utf8_to_utf16::convert_valid<endianness::BIG>(input, size,  utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char *input, size_t size, char16_t *utf16_output) const noexcept {
+  return utf8_to_utf16::convert_valid<endianness::BIG>(input, size,
+                                                       utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(const char* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
   utf8_to_utf32::validating_transcoder converter;
   return converter.convert(buf, len, utf32_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(const char* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
   utf8_to_utf32::validating_transcoder converter;
   return converter.convert_with_errors(buf, len, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(const char* input, size_t size,
-    char32_t* utf32_output) const noexcept {
-  return utf8_to_utf32::convert_valid(input, size,  utf32_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char *input, size_t size, char32_t *utf32_output) const noexcept {
+  return utf8_to_utf32::convert_valid(input, size, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<const char16_t*, char*> ret = arm_convert_utf16_to_latin1<endianness::LITTLE>(buf, len, latin1_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      arm_convert_utf16_to_latin1<endianness::LITTLE>(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - latin1_output;
 
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_latin1::convert<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_latin1::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<const char16_t*, char*> ret = arm_convert_utf16_to_latin1<endianness::BIG>(buf, len, latin1_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      arm_convert_utf16_to_latin1<endianness::BIG>(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - latin1_output;
 
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_latin1::convert<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_latin1::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_latin1_with_errors(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<result, char*> ret = arm_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(buf, len, latin1_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result
+implementation::convert_utf16le_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      arm_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
+          buf, len, latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -18992,16 +18615,26 @@ simdutf_warn_unused result implementation::convert_utf16le_to_latin1_with_errors
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - latin1_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_latin1_with_errors(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<result, char*> ret = arm_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len, latin1_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result
+implementation::convert_utf16be_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      arm_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len,
+                                                               latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -19009,53 +18642,79 @@ simdutf_warn_unused result implementation::convert_utf16be_to_latin1_with_errors
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - latin1_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   // optimization opportunity: implement a custom function.
   return convert_utf16be_to_latin1(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   // optimization opportunity: implement a custom function.
   return convert_utf16le_to_latin1(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  std::pair<const char16_t*, char*> ret = arm_convert_utf16_to_utf8<endianness::LITTLE>(buf, len, utf8_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      arm_convert_utf16_to_utf8<endianness::LITTLE>(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf8_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf8::convert<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf8::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  std::pair<const char16_t*, char*> ret = arm_convert_utf16_to_utf8<endianness::BIG>(buf, len, utf8_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      arm_convert_utf16_to_utf8<endianness::BIG>(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf8_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf8::convert<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf8::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char*> ret = arm_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(buf, len, utf8_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      arm_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(buf, len,
+                                                                utf8_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -19063,17 +18722,27 @@ simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(c
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf8_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char*> ret = arm_convert_utf16_to_utf8_with_errors<endianness::BIG>(buf, len, utf8_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      arm_convert_utf16_to_utf8_with_errors<endianness::BIG>(buf, len,
+                                                             utf8_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -19081,43 +18750,56 @@ simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(c
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf8_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   return convert_utf16le_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   return convert_utf16be_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     return 0;
   }
-  std::pair<const char32_t*, char*> ret = arm_convert_utf32_to_utf8(buf, len, utf8_output);
-  if (ret.first == nullptr) { return 0; }
+  std::pair<const char32_t *, char *> ret =
+      arm_convert_utf32_to_utf8(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf8_output;
   if (ret.first != buf + len) {
     const size_t scalar_saved_bytes = scalar::utf32_to_utf8::convert(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     return result(error_code::SUCCESS, 0);
   }
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char*> ret = arm_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      arm_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
   if (ret.first.count != len) {
     result scalar_res = scalar::utf32_to_utf8::convert_with_errors(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+        buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -19125,43 +18807,67 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(con
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf8_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  std::pair<const char16_t*, char32_t*> ret = arm_convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char16_t *, char32_t *> ret =
+      arm_convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf32_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf32::convert<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  std::pair<const char16_t*, char32_t*> ret = arm_convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char16_t *, char32_t *> ret =
+      arm_convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf32_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf32::convert<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char32_t*> ret = arm_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(buf, len, utf32_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char32_t *> ret =
+      arm_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(buf, len,
+                                                                 utf32_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -19169,17 +18875,27 @@ simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf32_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf32_output; // Set count to the number of 32-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char32_t*> ret = arm_convert_utf16_to_utf32_with_errors<endianness::BIG>(buf, len, utf32_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char32_t *> ret =
+      arm_convert_utf16_to_utf32_with_errors<endianness::BIG>(buf, len,
+                                                              utf32_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -19187,30 +18903,43 @@ simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf32_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf32_output; // Set count to the number of 32-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<const char32_t*, char*> ret = arm_convert_utf32_to_latin1(buf, len, latin1_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      arm_convert_utf32_to_latin1(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - latin1_output;
 
   if (ret.first != buf + len) {
     const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<result, char*> ret = arm_convert_utf32_to_latin1_with_errors(buf, len, latin1_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      arm_convert_utf32_to_latin1_with_errors(buf, len, latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
   if (ret.first.count != len) { // All good so far, but not finished
     result scalar_res = scalar::utf32_to_latin1::convert_with_errors(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+        buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -19218,60 +18947,86 @@ simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(c
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - latin1_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<const char32_t*, char*> ret = arm_convert_utf32_to_latin1(buf, len, latin1_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      arm_convert_utf32_to_latin1(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - latin1_output;
 
   if (ret.first != buf + len) {
     const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert_valid(
-                                        ret.first, len - (ret.first - buf), ret.second);
+        ret.first, len - (ret.first - buf), ret.second);
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
   // optimization opportunity: implement a custom function.
   return convert_utf32_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  std::pair<const char32_t*, char16_t*> ret = arm_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      arm_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf16_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_utf16::convert<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  std::pair<const char32_t*, char16_t*> ret = arm_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      arm_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf16_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_utf16::convert<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char16_t*> ret = arm_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      arm_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(buf, len,
+                                                                 utf16_output);
   if (ret.first.count != len) {
-    result scalar_res = scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -19279,16 +19034,23 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf16_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char16_t*> ret = arm_convert_utf32_to_utf16_with_errors<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      arm_convert_utf32_to_utf16_with_errors<endianness::BIG>(buf, len,
+                                                              utf16_output);
   if (ret.first.count != len) {
-    result scalar_res = scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -19296,56 +19058,72 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf16_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
   return convert_utf32_to_utf16le(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
   return convert_utf32_to_utf16be(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
   return convert_utf16le_to_utf32(buf, len, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
   return convert_utf16be_to_utf32(buf, len, utf32_output);
 }
 
-void implementation::change_endianness_utf16(const char16_t * input, size_t length, char16_t * output) const noexcept {
+void implementation::change_endianness_utf16(const char16_t *input,
+                                             size_t length,
+                                             char16_t *output) const noexcept {
   utf16::change_endianness_utf16(input, length, output);
 }
 
-simdutf_warn_unused size_t implementation::count_utf16le(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::count_utf16le(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::count_code_points<endianness::LITTLE>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::count_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::count_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::count_code_points<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::count_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *input, size_t length) const noexcept {
   return utf8::count_code_points(input, length);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf8(const char* buf, size_t len) const noexcept {
-  return count_utf8(buf,len);
+simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
+    const char *buf, size_t len) const noexcept {
+  return count_utf8(buf, len);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf16(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf16(size_t length) const noexcept {
   return scalar::utf16::latin1_length_from_utf16(length);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf32(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf32(size_t length) const noexcept {
   return scalar::utf32::latin1_length_from_utf32(length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_latin1(const char * input, size_t length) const noexcept {
-  // See https://lemire.me/blog/2023/05/15/computing-the-utf-8-size-of-a-latin-1-string-quickly-arm-neon-edition/
+simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
+    const char *input, size_t length) const noexcept {
+  // See
+  // https://lemire.me/blog/2023/05/15/computing-the-utf-8-size-of-a-latin-1-string-quickly-arm-neon-edition/
   // credit to Pete Cawley
   const uint8_t *data = reinterpret_cast<const uint8_t *>(input);
   uint64_t result = 0;
@@ -19361,116 +19139,154 @@ simdutf_warn_unused size_t implementation::utf8_length_from_latin1(const char *
     // vertical addition
     result -= vaddvq_s8(vreinterpretq_s8_u8(withhighbit));
   }
-  return result + (length / lanes) * lanes + scalar::latin1::utf8_length_from_latin1((const char*)simd_end, rem);
+  return result + (length / lanes) * lanes +
+         scalar::latin1::utf8_length_from_latin1((const char *)simd_end, rem);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::utf8_length_from_utf16<endianness::LITTLE>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
 }
 
-
-simdutf_warn_unused size_t implementation::utf16_length_from_latin1(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::utf16_length_from_latin1(size_t length) const noexcept {
   return scalar::latin1::utf16_length_from_latin1(length);
 }
 
-
-simdutf_warn_unused size_t implementation::utf32_length_from_latin1(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::utf32_length_from_latin1(size_t length) const noexcept {
   return scalar::latin1::utf32_length_from_latin1(length);
 }
 
-
-
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::utf32_length_from_utf16<endianness::LITTLE>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *input, size_t length) const noexcept {
   return utf8::utf16_length_from_utf8(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf32(const char32_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
   const uint32x4_t v_7f = vmovq_n_u32((uint32_t)0x7f);
   const uint32x4_t v_7ff = vmovq_n_u32((uint32_t)0x7ff);
   const uint32x4_t v_ffff = vmovq_n_u32((uint32_t)0xffff);
   const uint32x4_t v_1 = vmovq_n_u32((uint32_t)0x1);
   size_t pos = 0;
   size_t count = 0;
-  for(;pos + 4 <= length; pos += 4) {
+  for (; pos + 4 <= length; pos += 4) {
     uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(input + pos));
     const uint32x4_t ascii_bytes_bytemask = vcleq_u32(in, v_7f);
     const uint32x4_t one_two_bytes_bytemask = vcleq_u32(in, v_7ff);
-    const uint32x4_t two_bytes_bytemask = veorq_u32(one_two_bytes_bytemask, ascii_bytes_bytemask);
-    const uint32x4_t three_bytes_bytemask = veorq_u32(vcleq_u32(in, v_ffff), one_two_bytes_bytemask);
-
-    const uint16x8_t reduced_ascii_bytes_bytemask = vreinterpretq_u16_u32(vandq_u32(ascii_bytes_bytemask, v_1));
-    const uint16x8_t reduced_two_bytes_bytemask = vreinterpretq_u16_u32(vandq_u32(two_bytes_bytemask, v_1));
-    const uint16x8_t reduced_three_bytes_bytemask = vreinterpretq_u16_u32(vandq_u32(three_bytes_bytemask, v_1));
-
-    const uint16x8_t compressed_bytemask0 = vpaddq_u16(reduced_ascii_bytes_bytemask, reduced_two_bytes_bytemask);
-    const uint16x8_t compressed_bytemask1 = vpaddq_u16(reduced_three_bytes_bytemask, reduced_three_bytes_bytemask);
-
-    size_t ascii_count = count_ones(vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask0), 0));
-    size_t two_bytes_count = count_ones(vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask0), 1));
-    size_t three_bytes_count = count_ones(vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask1), 0));
-
-    count += 16 - 3*ascii_count - 2*two_bytes_count - three_bytes_count;
-  }
-  return count + scalar::utf32::utf8_length_from_utf32(input + pos, length - pos);
-}
-
-simdutf_warn_unused size_t implementation::utf16_length_from_utf32(const char32_t * input, size_t length) const noexcept {
+    const uint32x4_t two_bytes_bytemask =
+        veorq_u32(one_two_bytes_bytemask, ascii_bytes_bytemask);
+    const uint32x4_t three_bytes_bytemask =
+        veorq_u32(vcleq_u32(in, v_ffff), one_two_bytes_bytemask);
+
+    const uint16x8_t reduced_ascii_bytes_bytemask =
+        vreinterpretq_u16_u32(vandq_u32(ascii_bytes_bytemask, v_1));
+    const uint16x8_t reduced_two_bytes_bytemask =
+        vreinterpretq_u16_u32(vandq_u32(two_bytes_bytemask, v_1));
+    const uint16x8_t reduced_three_bytes_bytemask =
+        vreinterpretq_u16_u32(vandq_u32(three_bytes_bytemask, v_1));
+
+    const uint16x8_t compressed_bytemask0 =
+        vpaddq_u16(reduced_ascii_bytes_bytemask, reduced_two_bytes_bytemask);
+    const uint16x8_t compressed_bytemask1 =
+        vpaddq_u16(reduced_three_bytes_bytemask, reduced_three_bytes_bytemask);
+
+    size_t ascii_count = count_ones(
+        vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask0), 0));
+    size_t two_bytes_count = count_ones(
+        vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask0), 1));
+    size_t three_bytes_count = count_ones(
+        vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask1), 0));
+
+    count += 16 - 3 * ascii_count - 2 * two_bytes_count - three_bytes_count;
+  }
+  return count +
+         scalar::utf32::utf8_length_from_utf32(input + pos, length - pos);
+}
+
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
   const uint32x4_t v_ffff = vmovq_n_u32((uint32_t)0xffff);
   const uint32x4_t v_1 = vmovq_n_u32((uint32_t)0x1);
   size_t pos = 0;
   size_t count = 0;
-  for(;pos + 4 <= length; pos += 4) {
+  for (; pos + 4 <= length; pos += 4) {
     uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(input + pos));
     const uint32x4_t surrogate_bytemask = vcgtq_u32(in, v_ffff);
-    const uint16x8_t reduced_bytemask = vreinterpretq_u16_u32(vandq_u32(surrogate_bytemask, v_1));
-    const uint16x8_t compressed_bytemask = vpaddq_u16(reduced_bytemask, reduced_bytemask);
-    size_t surrogate_count = count_ones(vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask), 0));
+    const uint16x8_t reduced_bytemask =
+        vreinterpretq_u16_u32(vandq_u32(surrogate_bytemask, v_1));
+    const uint16x8_t compressed_bytemask =
+        vpaddq_u16(reduced_bytemask, reduced_bytemask);
+    size_t surrogate_count = count_ones(
+        vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask), 0));
     count += 4 + surrogate_count;
   }
-  return count + scalar::utf32::utf16_length_from_utf32(input + pos, length - pos);
+  return count +
+         scalar::utf32::utf16_length_from_utf32(input + pos, length - pos);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *input, size_t length) const noexcept {
   return utf8::count_code_points(input, length);
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept {
-  return (options & base64_url) ? compress_decode_base64<true>(output, input, length, options) : compress_decode_base64<false>(output, input, length, options);
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept {
-  return (options & base64_url) ? compress_decode_base64<true>(output, input, length, options) : compress_decode_base64<false>(output, input, length, options);
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
-simdutf_warn_unused size_t implementation::base64_length_from_binary(size_t length, base64_options options) const noexcept {
+simdutf_warn_unused size_t implementation::base64_length_from_binary(
+    size_t length, base64_options options) const noexcept {
   return scalar::base64::base64_length_from_binary(length, options);
 }
 
-size_t implementation::binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept {
+size_t implementation::binary_to_base64(const char *input, size_t length,
+                                        char *output,
+                                        base64_options options) const noexcept {
   return encode_base64(output, input, length, options);
 }
 
-
 } // namespace arm64
 } // namespace simdutf
 
@@ -19498,282 +19314,394 @@ size_t implementation::binary_to_base64(const char * input, size_t length, char*
 namespace simdutf {
 namespace fallback {
 
-simdutf_warn_unused int implementation::detect_encodings(const char * input, size_t length) const noexcept {
+simdutf_warn_unused int
+implementation::detect_encodings(const char *input,
+                                 size_t length) const noexcept {
   // If there is a BOM, then we trust it.
   auto bom_encoding = simdutf::BOM::check_bom(input, length);
-  if(bom_encoding != encoding_type::unspecified) { return bom_encoding; }
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
+  }
   // todo: reimplement as a one-pass algorithm.
   int out = 0;
-  if(validate_utf8(input, length)) { out |= encoding_type::UTF8; }
-  if((length % 2) == 0) {
-    if(validate_utf16le(reinterpret_cast<const char16_t*>(input), length/2)) { out |= encoding_type::UTF16_LE; }
+  if (validate_utf8(input, length)) {
+    out |= encoding_type::UTF8;
+  }
+  if ((length % 2) == 0) {
+    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
+                         length / 2)) {
+      out |= encoding_type::UTF16_LE;
+    }
   }
-  if((length % 4) == 0) {
-    if(validate_utf32(reinterpret_cast<const char32_t*>(input), length/4)) { out |= encoding_type::UTF32_LE; }
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      out |= encoding_type::UTF32_LE;
+    }
   }
   return out;
 }
 
-simdutf_warn_unused bool implementation::validate_utf8(const char *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf8(const char *buf, size_t len) const noexcept {
   return scalar::utf8::validate(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_utf8_with_errors(const char *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *buf, size_t len) const noexcept {
   return scalar::utf8::validate_with_errors(buf, len);
 }
 
-simdutf_warn_unused bool implementation::validate_ascii(const char *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *buf, size_t len) const noexcept {
   return scalar::ascii::validate(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_ascii_with_errors(const char *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *buf, size_t len) const noexcept {
   return scalar::ascii::validate_with_errors(buf, len);
 }
 
-simdutf_warn_unused bool implementation::validate_utf16le(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf16le(const char16_t *buf,
+                                 size_t len) const noexcept {
   return scalar::utf16::validate<endianness::LITTLE>(buf, len);
 }
 
-simdutf_warn_unused bool implementation::validate_utf16be(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf16be(const char16_t *buf,
+                                 size_t len) const noexcept {
   return scalar::utf16::validate<endianness::BIG>(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf16le_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
   return scalar::utf16::validate_with_errors<endianness::LITTLE>(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf16be_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
   return scalar::utf16::validate_with_errors<endianness::BIG>(buf, len);
 }
 
-simdutf_warn_unused bool implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
   return scalar::utf32::validate(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf32_with_errors(
+    const char32_t *buf, size_t len) const noexcept {
   return scalar::utf32::validate_with_errors(buf, len);
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(const char * buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
+    const char *buf, size_t len, char *utf8_output) const noexcept {
   return scalar::latin1_to_utf8::convert(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::latin1_to_utf16::convert<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::latin1_to_utf16::convert<endianness::LITTLE>(buf, len,
+                                                              utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::latin1_to_utf16::convert<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::latin1_to_utf16::convert<endianness::BIG>(buf, len,
+                                                           utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(const char * buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
   return scalar::latin1_to_utf32::convert(buf, len, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(const char* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
   return scalar::utf8_to_latin1::convert(buf, len, latin1_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(const char* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
   return scalar::utf8_to_latin1::convert_with_errors(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(const char* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
   return scalar::utf8_to_latin1::convert_valid(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf8_to_utf16::convert<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf8_to_utf16::convert<endianness::LITTLE>(buf, len,
+                                                            utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf8_to_utf16::convert<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf8_to_utf16::convert<endianness::BIG>(buf, len,
+                                                         utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf8_to_utf16::convert_with_errors<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf8_to_utf16::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf8_to_utf16::convert_with_errors<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf8_to_utf16::convert_with_errors<endianness::BIG>(
+      buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf8_to_utf16::convert_valid<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf8_to_utf16::convert_valid<endianness::LITTLE>(buf, len,
+                                                                  utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf8_to_utf16::convert_valid<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf8_to_utf16::convert_valid<endianness::BIG>(buf, len,
+                                                               utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(const char* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
   return scalar::utf8_to_utf32::convert(buf, len, utf32_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(const char* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
   return scalar::utf8_to_utf32::convert_with_errors(buf, len, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(const char* input, size_t size,
-    char32_t* utf32_output) const noexcept {
-  return scalar::utf8_to_utf32::convert_valid(input, size,  utf32_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char *input, size_t size, char32_t *utf32_output) const noexcept {
+  return scalar::utf8_to_utf32::convert_valid(input, size, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  return scalar::utf16_to_latin1::convert<endianness::LITTLE>(buf, len, latin1_output);
+simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf16_to_latin1::convert<endianness::LITTLE>(buf, len,
+                                                              latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  return scalar::utf16_to_latin1::convert<endianness::BIG>(buf, len, latin1_output);
+simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf16_to_latin1::convert<endianness::BIG>(buf, len,
+                                                           latin1_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_latin1_with_errors(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  return scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(buf, len, latin1_output);
+simdutf_warn_unused result
+implementation::convert_utf16le_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(
+      buf, len, latin1_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_latin1_with_errors(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  return scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(buf, len, latin1_output);
+simdutf_warn_unused result
+implementation::convert_utf16be_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(
+      buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  return scalar::utf16_to_latin1::convert_valid<endianness::LITTLE>(buf, len, latin1_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf16_to_latin1::convert_valid<endianness::LITTLE>(
+      buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  return scalar::utf16_to_latin1::convert_valid<endianness::BIG>(buf, len, latin1_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf16_to_latin1::convert_valid<endianness::BIG>(buf, len,
+                                                                 latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert<endianness::LITTLE>(buf, len, utf8_output);
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert<endianness::LITTLE>(buf, len,
+                                                            utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   return scalar::utf16_to_utf8::convert<endianness::BIG>(buf, len, utf8_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(buf, len, utf8_output);
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf8_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(buf, len, utf8_output);
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
+      buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_valid<endianness::LITTLE>(buf, len, utf8_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_valid<endianness::LITTLE>(buf, len,
+                                                                  utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_valid<endianness::BIG>(buf, len, utf8_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_valid<endianness::BIG>(buf, len,
+                                                               utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
   return scalar::utf32_to_latin1::convert(buf, len, latin1_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
   return scalar::utf32_to_latin1::convert_with_errors(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
   return scalar::utf32_to_latin1::convert_valid(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
   return scalar::utf32_to_utf8::convert(buf, len, utf8_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
   return scalar::utf32_to_utf8::convert_with_errors(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
   return scalar::utf32_to_utf8::convert_valid(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert<endianness::LITTLE>(buf, len,
+                                                             utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert<endianness::BIG>(buf, len,
+                                                          utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
+      buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_valid<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_valid<endianness::LITTLE>(
+      buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_valid<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_valid<endianness::BIG>(buf, len,
+                                                                utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert<endianness::LITTLE>(buf, len, utf32_output);
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert<endianness::LITTLE>(buf, len,
+                                                             utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert<endianness::BIG>(buf, len, utf32_output);
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert<endianness::BIG>(buf, len,
+                                                          utf32_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(buf, len, utf32_output);
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf32_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(buf, len, utf32_output);
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+      buf, len, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_valid<endianness::LITTLE>(buf, len, utf32_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_valid<endianness::LITTLE>(
+      buf, len, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_valid<endianness::BIG>(buf, len, utf32_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_valid<endianness::BIG>(buf, len,
+                                                                utf32_output);
 }
 
-void implementation::change_endianness_utf16(const char16_t * input, size_t length, char16_t * output) const noexcept {
+void implementation::change_endianness_utf16(const char16_t *input,
+                                             size_t length,
+                                             char16_t *output) const noexcept {
   scalar::utf16::change_endianness_utf16(input, length, output);
 }
 
-simdutf_warn_unused size_t implementation::count_utf16le(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::count_utf16le(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::utf16::count_code_points<endianness::LITTLE>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::count_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::count_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::utf16::count_code_points<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::count_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *input, size_t length) const noexcept {
   return scalar::utf8::count_code_points(input, length);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf8(const char* buf, size_t len) const noexcept {
-  return scalar::utf8::count_code_points(buf,len);
+simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
+    const char *buf, size_t len) const noexcept {
+  return scalar::utf8::count_code_points(buf, len);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf16(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf16(size_t length) const noexcept {
   return scalar::utf16::latin1_length_from_utf16(length);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf32(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf32(size_t length) const noexcept {
   return length;
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_latin1(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
+    const char *input, size_t length) const noexcept {
   size_t answer = length;
   size_t i = 0;
   auto pop = [](uint64_t v) {
-    return (size_t)(((v>>7) & UINT64_C(0x0101010101010101)) * UINT64_C(0x0101010101010101) >> 56);
+    return (size_t)(((v >> 7) & UINT64_C(0x0101010101010101)) *
+                        UINT64_C(0x0101010101010101) >>
+                    56);
   };
-  for(; i + 32 <= length; i += 32) {
+  for (; i + 32 <= length; i += 32) {
     uint64_t v;
     memcpy(&v, input + i, 8);
     answer += pop(v);
@@ -19784,140 +19712,171 @@ simdutf_warn_unused size_t implementation::utf8_length_from_latin1(const char *
     memcpy(&v, input + i + 24, sizeof(v));
     answer += pop(v);
   }
-  for(; i + 8 <= length; i += 8) {
+  for (; i + 8 <= length; i += 8) {
     uint64_t v;
     memcpy(&v, input + i, sizeof(v));
     answer += pop(v);
   }
-  for(; i + 1 <= length; i += 1) {
-      answer += static_cast<uint8_t>(input[i]) >> 7;
+  for (; i + 1 <= length; i += 1) {
+    answer += static_cast<uint8_t>(input[i]) >> 7;
   }
   return answer;
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(const char16_t * input, size_t length) const noexcept {
-  return scalar::utf16::utf8_length_from_utf16<endianness::LITTLE>(input, length);
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::utf8_length_from_utf16<endianness::LITTLE>(input,
+                                                                   length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(const char16_t * input, size_t length) const noexcept {
-  return scalar::utf16::utf32_length_from_utf16<endianness::LITTLE>(input, length);
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::utf32_length_from_utf16<endianness::LITTLE>(input,
+                                                                    length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_latin1(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::utf16_length_from_latin1(size_t length) const noexcept {
   return scalar::latin1::utf16_length_from_latin1(length);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *input, size_t length) const noexcept {
   return scalar::utf8::utf16_length_from_utf8(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf32(const char32_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
   return scalar::utf32::utf8_length_from_utf32(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf32(const char32_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
   return scalar::utf32::utf16_length_from_utf32(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_latin1(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::utf32_length_from_latin1(size_t length) const noexcept {
   return scalar::latin1::utf32_length_from_latin1(length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *input, size_t length) const noexcept {
   return scalar::utf8::count_code_points(input, length);
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept {
-  while(length > 0 && scalar::base64::is_ascii_white_space(input[length - 1])) {
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
     length--;
   }
-  size_t equallocation = length; // location of the first padding character if any
+  size_t equallocation =
+      length; // location of the first padding character if any
   size_t equalsigns = 0;
-  if(length > 0 && input[length - 1] == '=') {
+  if (length > 0 && input[length - 1] == '=') {
     equallocation = length - 1;
     length -= 1;
     equalsigns++;
-    while(length > 0 && scalar::base64::is_ascii_white_space(input[length - 1])) {
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
       length--;
     }
-    if(length > 0 && input[length - 1] == '=') {
+    if (length > 0 && input[length - 1] == '=') {
       equallocation = length - 1;
       equalsigns++;
       length -= 1;
     }
   }
-  if(length == 0) {
-    if(equalsigns > 0) {
+  if (length == 0) {
+    if (equalsigns > 0) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
     return {SUCCESS, 0};
   }
-  result r = scalar::base64::base64_tail_decode(output, input, length, options);
-  if(r.error == error_code::SUCCESS && equalsigns > 0) {
+  result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
     // additional checks
-    if((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
   }
   return r;
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept {
-  while(length > 0 && scalar::base64::is_ascii_white_space(input[length - 1])) {
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
     length--;
   }
-  size_t equallocation = length; // location of the first padding character if any
+  size_t equallocation =
+      length; // location of the first padding character if any
   size_t equalsigns = 0;
-  if(length > 0 && input[length - 1] == '=') {
+  if (length > 0 && input[length - 1] == '=') {
     equallocation = length - 1;
     length -= 1;
     equalsigns++;
-    while(length > 0 && scalar::base64::is_ascii_white_space(input[length - 1])) {
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
       length--;
     }
-    if(length > 0 && input[length - 1] == '=') {
+    if (length > 0 && input[length - 1] == '=') {
       equallocation = length - 1;
       equalsigns++;
       length -= 1;
     }
   }
-  if(length == 0) {
-    if(equalsigns > 0) {
+  if (length == 0) {
+    if (equalsigns > 0) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
     return {SUCCESS, 0};
   }
-  result r = scalar::base64::base64_tail_decode(output, input, length, options);
-  if(r.error == error_code::SUCCESS && equalsigns > 0) {
+  result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
     // additional checks
-    if((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
   }
   return r;
 }
 
-simdutf_warn_unused size_t implementation::base64_length_from_binary(size_t length, base64_options options) const noexcept {
+simdutf_warn_unused size_t implementation::base64_length_from_binary(
+    size_t length, base64_options options) const noexcept {
   return scalar::base64::base64_length_from_binary(length, options);
 }
 
-size_t implementation::binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept {
+size_t implementation::binary_to_base64(const char *input, size_t length,
+                                        char *output,
+                                        base64_options options) const noexcept {
   return scalar::base64::tail_encode_base64(output, input, length, options);
 }
 } // namespace fallback
@@ -19941,22 +19900,26 @@ size_t implementation::binary_to_base64(const char * input, size_t length, char*
 SIMDUTF_TARGET_ICELAKE
 #endif
 
-#if SIMDUTF_GCC11ORMORE // workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+#if SIMDUTF_GCC11ORMORE // workaround for
+                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+// clang-format off
 SIMDUTF_DISABLE_GCC_WARNING(-Wmaybe-uninitialized)
+// clang-format on
 #endif // end of workaround
 /* end file src/simdutf/icelake/begin.h */
 namespace simdutf {
 namespace icelake {
 namespace {
 #ifndef SIMDUTF_ICELAKE_H
-#error "icelake.h must be included"
+  #error "icelake.h must be included"
 #endif
 /* begin file src/icelake/icelake_utf8_common.inl.cpp */
-// Common procedures for both validating and non-validating conversions from UTF-8.
-enum block_processing_mode { SIMDUTF_FULL, SIMDUTF_TAIL};
+// Common procedures for both validating and non-validating conversions from
+// UTF-8.
+enum block_processing_mode { SIMDUTF_FULL, SIMDUTF_TAIL };
 
-using utf8_to_utf16_result = std::pair<const char*, char16_t*>;
-using utf8_to_utf32_result = std::pair<const char*, uint32_t*>;
+using utf8_to_utf16_result = std::pair<const char *, char16_t *>;
+using utf8_to_utf32_result = std::pair<const char *, uint32_t *>;
 
 /*
     process_block_utf8_to_utf16 converts up to 64 bytes from 'in' from UTF-8
@@ -19970,42 +19933,54 @@ using utf8_to_utf32_result = std::pair<const char*, uint32_t*>;
     bytes have been processed, upon success.
 */
 template <block_processing_mode tail, endianness big_endian>
-simdutf_really_inline bool process_block_utf8_to_utf16(const char *&in, char16_t *&out, size_t gap) {
+simdutf_really_inline bool
+process_block_utf8_to_utf16(const char *&in, char16_t *&out, size_t gap) {
   // constants
-  __m512i mask_identity = _mm512_set_epi8(63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
+  __m512i mask_identity = _mm512_set_epi8(
+      63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46,
+      45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28,
+      27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9,
+      8, 7, 6, 5, 4, 3, 2, 1, 0);
   __m512i mask_c0c0c0c0 = _mm512_set1_epi32(0xc0c0c0c0);
   __m512i mask_80808080 = _mm512_set1_epi32(0x80808080);
   __m512i mask_f0f0f0f0 = _mm512_set1_epi32(0xf0f0f0f0);
-  __m512i mask_dfdfdfdf_tail = _mm512_set_epi64(0xffffdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf);
+  __m512i mask_dfdfdfdf_tail = _mm512_set_epi64(
+      0xffffdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf,
+      0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf,
+      0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf);
   __m512i mask_c2c2c2c2 = _mm512_set1_epi32(0xc2c2c2c2);
   __m512i mask_ffffffff = _mm512_set1_epi32(0xffffffff);
   __m512i mask_d7c0d7c0 = _mm512_set1_epi32(0xd7c0d7c0);
   __m512i mask_dc00dc00 = _mm512_set1_epi32(0xdc00dc00);
-  __m512i byteflip = _mm512_setr_epi64(
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809
-        );
+  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
   // Note that 'tail' is a compile-time constant !
-  __mmask64 b = (tail == SIMDUTF_FULL) ? 0xFFFFFFFFFFFFFFFF : (uint64_t(1) << gap) - 1;
-  __m512i input = (tail == SIMDUTF_FULL) ? _mm512_loadu_si512(in) : _mm512_maskz_loadu_epi8(b, in);
-  __mmask64 m1 = (tail == SIMDUTF_FULL) ? _mm512_cmplt_epu8_mask(input, mask_80808080) : _mm512_mask_cmplt_epu8_mask(b, input, mask_80808080);
-  if(_ktestc_mask64_u8(m1, b)) {// NOT(m1) AND b -- if all zeroes, then all ASCII
-  // alternatively, we could do 'if (m1 == b) { '
+  __mmask64 b =
+      (tail == SIMDUTF_FULL) ? 0xFFFFFFFFFFFFFFFF : (uint64_t(1) << gap) - 1;
+  __m512i input = (tail == SIMDUTF_FULL) ? _mm512_loadu_si512(in)
+                                         : _mm512_maskz_loadu_epi8(b, in);
+  __mmask64 m1 = (tail == SIMDUTF_FULL)
+                     ? _mm512_cmplt_epu8_mask(input, mask_80808080)
+                     : _mm512_mask_cmplt_epu8_mask(b, input, mask_80808080);
+  if (_ktestc_mask64_u8(m1,
+                        b)) { // NOT(m1) AND b -- if all zeroes, then all ASCII
+                              // alternatively, we could do 'if (m1 == b) { '
     if (tail == SIMDUTF_FULL) {
-      in += 64;          // consumed 64 bytes
+      in += 64; // consumed 64 bytes
       // we convert a full 64-byte block, writing 128 bytes.
       __m512i input1 = _mm512_cvtepu8_epi16(_mm512_castsi512_si256(input));
-      if(big_endian) { input1 = _mm512_shuffle_epi8(input1, byteflip); }
+      if (big_endian) {
+        input1 = _mm512_shuffle_epi8(input1, byteflip);
+      }
       _mm512_storeu_si512(out, input1);
       out += 32;
-      __m512i input2 = _mm512_cvtepu8_epi16(_mm512_extracti64x4_epi64(input, 1));
-      if(big_endian) { input2 = _mm512_shuffle_epi8(input2, byteflip); }
+      __m512i input2 =
+          _mm512_cvtepu8_epi16(_mm512_extracti64x4_epi64(input, 1));
+      if (big_endian) {
+        input2 = _mm512_shuffle_epi8(input2, byteflip);
+      }
       _mm512_storeu_si512(out, input2);
       out += 32;
       return true; // we are done
@@ -20013,60 +19988,85 @@ simdutf_really_inline bool process_block_utf8_to_utf16(const char *&in, char16_t
       in += gap;
       if (gap <= 32) {
         __m512i input1 = _mm512_cvtepu8_epi16(_mm512_castsi512_si256(input));
-        if(big_endian) { input1 = _mm512_shuffle_epi8(input1, byteflip); }
-        _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << (gap)) - 1), input1);
+        if (big_endian) {
+          input1 = _mm512_shuffle_epi8(input1, byteflip);
+        }
+        _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << (gap)) - 1),
+                                 input1);
         out += gap;
       } else {
         __m512i input1 = _mm512_cvtepu8_epi16(_mm512_castsi512_si256(input));
-        if(big_endian) { input1 = _mm512_shuffle_epi8(input1, byteflip); }
+        if (big_endian) {
+          input1 = _mm512_shuffle_epi8(input1, byteflip);
+        }
         _mm512_storeu_si512(out, input1);
         out += 32;
-        __m512i input2 = _mm512_cvtepu8_epi16(_mm512_extracti64x4_epi64(input, 1));
-        if(big_endian) { input2 = _mm512_shuffle_epi8(input2, byteflip); }
-        _mm512_mask_storeu_epi16(out, __mmask32((uint32_t(1) << (gap - 32)) - 1), input2);
+        __m512i input2 =
+            _mm512_cvtepu8_epi16(_mm512_extracti64x4_epi64(input, 1));
+        if (big_endian) {
+          input2 = _mm512_shuffle_epi8(input2, byteflip);
+        }
+        _mm512_mask_storeu_epi16(
+            out, __mmask32((uint32_t(1) << (gap - 32)) - 1), input2);
         out += gap - 32;
       }
       return true; // we are done
     }
   }
   // classify characters further
-  __mmask64 m234 = _mm512_cmp_epu8_mask(mask_c0c0c0c0, input,
-                                        _MM_CMPINT_LE); // 0xc0 <= input, 2, 3, or 4 leading byte
-  __mmask64 m34 = _mm512_cmp_epu8_mask(mask_dfdfdfdf_tail, input,
-                                       _MM_CMPINT_LT); // 0xdf < input,  3 or 4 leading byte
-
-  __mmask64 milltwobytes = _mm512_mask_cmp_epu8_mask(m234, input, mask_c2c2c2c2,
-                                                     _MM_CMPINT_LT); // 0xc0 <= input < 0xc2 (illegal two byte sequence)
-                                                                     // Overlong 2-byte sequence
+  __mmask64 m234 = _mm512_cmp_epu8_mask(
+      mask_c0c0c0c0, input,
+      _MM_CMPINT_LE); // 0xc0 <= input, 2, 3, or 4 leading byte
+  __mmask64 m34 =
+      _mm512_cmp_epu8_mask(mask_dfdfdfdf_tail, input,
+                           _MM_CMPINT_LT); // 0xdf < input,  3 or 4 leading byte
+
+  __mmask64 milltwobytes = _mm512_mask_cmp_epu8_mask(
+      m234, input, mask_c2c2c2c2,
+      _MM_CMPINT_LT); // 0xc0 <= input < 0xc2 (illegal two byte sequence)
+                      // Overlong 2-byte sequence
   if (_ktestz_mask64_u8(milltwobytes, milltwobytes) == 0) {
     // Overlong 2-byte sequence
     return false;
   }
   if (_ktestz_mask64_u8(m34, m34) == 0) {
-    // We have a 3-byte sequence and/or a 2-byte sequence, or possibly even a 4-byte sequence!
-    __mmask64 m4 = _mm512_cmp_epu8_mask(input, mask_f0f0f0f0,
-                                        _MM_CMPINT_NLT); // 0xf0 <= zmm0 (4 byte start bytes)
+    // We have a 3-byte sequence and/or a 2-byte sequence, or possibly even a
+    // 4-byte sequence!
+    __mmask64 m4 = _mm512_cmp_epu8_mask(
+        input, mask_f0f0f0f0,
+        _MM_CMPINT_NLT); // 0xf0 <= zmm0 (4 byte start bytes)
 
-    __mmask64 mask_not_ascii = (tail == SIMDUTF_FULL) ? _knot_mask64(m1) : _kand_mask64(_knot_mask64(m1), b);
+    __mmask64 mask_not_ascii = (tail == SIMDUTF_FULL)
+                                   ? _knot_mask64(m1)
+                                   : _kand_mask64(_knot_mask64(m1), b);
 
     __mmask64 mp1 = _kshiftli_mask64(m234, 1);
     __mmask64 mp2 = _kshiftli_mask64(m34, 2);
     // We could do it as follows...
-    // if (_kortestz_mask64_u8(m4,m4)) { // compute the bitwise OR of the 64-bit masks a and b and return 1 if all zeroes
-    // but GCC generates better code when we do:
-    if (m4 == 0) { // compute the bitwise OR of the 64-bit masks a and b and return 1 if all zeroes
+    // if (_kortestz_mask64_u8(m4,m4)) { // compute the bitwise OR of the 64-bit
+    // masks a and b and return 1 if all zeroes
+    // but GCC generates better code when we do:
+    if (m4 == 0) { // same test as above: true when m4 is all zeroes, i.e. no
+                   // 4-byte leading bytes
       // Fast path with 1,2,3 bytes
       __mmask64 mc = _kor_mask64(mp1, mp2); // expected continuation bytes
       __mmask64 m1234 = _kor_mask64(m1, m234);
       // mismatched continuation bytes:
       if (tail == SIMDUTF_FULL) {
-        __mmask64 xnormcm1234 = _kxnor_mask64(mc, m1234); // XNOR of mc and m1234 should be all zero if they differ
+        __mmask64 xnormcm1234 = _kxnor_mask64(
+            mc,
+            m1234); // XNOR of mc and m1234 should be all zero if they differ
         // the presence of a 1 bit indicates that they overlap.
-        // _kortestz_mask64_u8: compute the bitwise OR of 64-bit masksand return 1 if all zeroes.
-        if (!_kortestz_mask64_u8(xnormcm1234, xnormcm1234)) { return false; }
+        // _kortestz_mask64_u8: compute the bitwise OR of 64-bit masks and
+        // return 1 if all zeroes.
+        if (!_kortestz_mask64_u8(xnormcm1234, xnormcm1234)) {
+          return false;
+        }
       } else {
         __mmask64 bxorm1234 = _kxor_mask64(b, m1234);
-        if (mc != bxorm1234) { return false; }
+        if (mc != bxorm1234) {
+          return false;
+        }
       }
       // mend: identifying the last bytes of each sequence to be decoded
       __mmask64 mend = _kshiftri_mask64(m1234, 1);
@@ -20074,36 +20074,56 @@ simdutf_really_inline bool process_block_utf8_to_utf16(const char *&in, char16_t
         mend = _kor_mask64(mend, (uint64_t(1) << (gap - 1)));
       }
 
-
       __m512i last_and_third = _mm512_maskz_compress_epi8(mend, mask_identity);
-      __m512i last_and_thirdu16 = _mm512_cvtepu8_epi16(_mm512_castsi512_si256(last_and_third));
-
-      __m512i nonasciitags = _mm512_maskz_mov_epi8(mask_not_ascii, mask_c0c0c0c0); // ASCII: 00000000  other: 11000000
-      __m512i clearedbytes = _mm512_andnot_si512(nonasciitags, input);             // high two bits cleared where not ASCII
-      __m512i lastbytes = _mm512_maskz_permutexvar_epi8(0x5555555555555555, last_and_thirdu16,
-                                                        clearedbytes); // the last byte of each character
-
-      __mmask64 mask_before_non_ascii = _kshiftri_mask64(mask_not_ascii, 1);               // bytes that precede non-ASCII bytes
-      __m512i indexofsecondlastbytes = _mm512_add_epi16(mask_ffffffff, last_and_thirdu16); // indices of the second last bytes
-      __m512i beforeasciibytes = _mm512_maskz_mov_epi8(mask_before_non_ascii, clearedbytes);
-      __m512i secondlastbytes = _mm512_maskz_permutexvar_epi8(0x5555555555555555, indexofsecondlastbytes,
-                                                              beforeasciibytes); // the second last bytes (of two, three byte seq,
-                                                                                 // surrogates)
-      secondlastbytes = _mm512_slli_epi16(secondlastbytes, 6);                   // shifted into position
-
-      __m512i indexofthirdlastbytes = _mm512_add_epi16(mask_ffffffff,
-                                                       indexofsecondlastbytes); // indices of the second last bytes
-      __m512i thirdlastbyte = _mm512_maskz_mov_epi8(m34,
-                                                    clearedbytes); // only those that are the third last byte of a sequence
-      __m512i thirdlastbytes = _mm512_maskz_permutexvar_epi8(0x5555555555555555, indexofthirdlastbytes,
-                                                             thirdlastbyte); // the third last bytes (of three byte sequences, hi
-                                                                             // surrogate)
-      thirdlastbytes = _mm512_slli_epi16(thirdlastbytes, 12);                // shifted into position
-      __m512i Wout = _mm512_ternarylogic_epi32(lastbytes, secondlastbytes, thirdlastbytes, 254);
-      // the elements of Wout excluding the last element if it happens to be a high surrogate:
-
-      __mmask64 mprocessed = (tail == SIMDUTF_FULL) ? _pdep_u64(0xFFFFFFFF, mend) : _pdep_u64(0xFFFFFFFF, _kand_mask64(mend, b)); // we adjust mend at the end of the output.
-
+      __m512i last_and_thirdu16 =
+          _mm512_cvtepu8_epi16(_mm512_castsi512_si256(last_and_third));
+
+      __m512i nonasciitags = _mm512_maskz_mov_epi8(
+          mask_not_ascii, mask_c0c0c0c0); // ASCII: 00000000  other: 11000000
+      __m512i clearedbytes = _mm512_andnot_si512(
+          nonasciitags, input); // high two bits cleared where not ASCII
+      __m512i lastbytes = _mm512_maskz_permutexvar_epi8(
+          0x5555555555555555, last_and_thirdu16,
+          clearedbytes); // the last byte of each character
+
+      __mmask64 mask_before_non_ascii = _kshiftri_mask64(
+          mask_not_ascii, 1); // bytes that precede non-ASCII bytes
+      __m512i indexofsecondlastbytes = _mm512_add_epi16(
+          mask_ffffffff, last_and_thirdu16); // indices of the second last bytes
+      __m512i beforeasciibytes =
+          _mm512_maskz_mov_epi8(mask_before_non_ascii, clearedbytes);
+      __m512i secondlastbytes = _mm512_maskz_permutexvar_epi8(
+          0x5555555555555555, indexofsecondlastbytes,
+          beforeasciibytes); // the second last bytes (of two, three byte seq,
+                             // surrogates)
+      secondlastbytes =
+          _mm512_slli_epi16(secondlastbytes, 6); // shifted into position
+
+      __m512i indexofthirdlastbytes = _mm512_add_epi16(
+          mask_ffffffff,
+          indexofsecondlastbytes); // indices of the third last bytes
+      __m512i thirdlastbyte =
+          _mm512_maskz_mov_epi8(m34,
+                                clearedbytes); // only those that are the third
+                                               // last byte of a sequence
+      __m512i thirdlastbytes = _mm512_maskz_permutexvar_epi8(
+          0x5555555555555555, indexofthirdlastbytes,
+          thirdlastbyte); // the third last bytes (of three byte sequences, hi
+                          // surrogate)
+      thirdlastbytes =
+          _mm512_slli_epi16(thirdlastbytes, 12); // shifted into position
+      __m512i Wout = _mm512_ternarylogic_epi32(lastbytes, secondlastbytes,
+                                               thirdlastbytes, 254);
+      // the elements of Wout excluding the last element if it happens to be a
+      // high surrogate:
+
+      __mmask64 mprocessed =
+          (tail == SIMDUTF_FULL)
+              ? _pdep_u64(0xFFFFFFFF, mend)
+              : _pdep_u64(
+                    0xFFFFFFFF,
+                    _kand_mask64(
+                        mend, b)); // we adjust mend at the end of the output.
 
       // Encodings out of range...
       {
@@ -20112,15 +20132,21 @@ simdutf_really_inline bool process_block_utf8_to_utf16(const char *&in, char16_t
         // code units in Wout corresponding to 3-byte sequences.
         __mmask32 M3 = __mmask32(_pext_u64(m3 << 2, mend));
         __m512i mask_08000800 = _mm512_set1_epi32(0x08000800);
-        __mmask32 Msmall800 = _mm512_mask_cmplt_epu16_mask(M3, Wout, mask_08000800);
+        __mmask32 Msmall800 =
+            _mm512_mask_cmplt_epu16_mask(M3, Wout, mask_08000800);
         __m512i mask_d800d800 = _mm512_set1_epi32(0xd800d800);
         __m512i Moutminusd800 = _mm512_sub_epi16(Wout, mask_d800d800);
-        __mmask32 M3s = _mm512_mask_cmplt_epu16_mask(M3, Moutminusd800, mask_08000800);
-        if (_kor_mask32(Msmall800, M3s)) { return false; }
+        __mmask32 M3s =
+            _mm512_mask_cmplt_epu16_mask(M3, Moutminusd800, mask_08000800);
+        if (_kor_mask32(Msmall800, M3s)) {
+          return false;
+        }
       }
       int64_t nout = _mm_popcnt_u64(mprocessed);
-      in +=  64 - _lzcnt_u64(mprocessed);
-      if(big_endian) { Wout = _mm512_shuffle_epi8(Wout, byteflip); }
+      in += 64 - _lzcnt_u64(mprocessed);
+      if (big_endian) {
+        Wout = _mm512_shuffle_epi8(Wout, byteflip);
+      }
       _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << nout) - 1), Wout);
       out += nout;
       return true; // ok
@@ -20129,64 +20155,96 @@ simdutf_really_inline bool process_block_utf8_to_utf16(const char *&in, char16_t
     // We have a 4-byte sequence, this is the general case.
     // Slow!
     __mmask64 mp3 = _kshiftli_mask64(m4, 3);
-    __mmask64 mc = _kor_mask64(_kor_mask64(mp1, mp2), mp3); // expected continuation bytes
+    __mmask64 mc =
+        _kor_mask64(_kor_mask64(mp1, mp2), mp3); // expected continuation bytes
     __mmask64 m1234 = _kor_mask64(m1, m234);
 
     // mend: identifying the last bytes of each sequence to be decoded
-    __mmask64 mend = _kor_mask64(_kshiftri_mask64(_kor_mask64(mp3, m1234), 1), mp3);
+    __mmask64 mend =
+        _kor_mask64(_kshiftri_mask64(_kor_mask64(mp3, m1234), 1), mp3);
     if (tail != SIMDUTF_FULL) {
       mend = _kor_mask64(mend, __mmask64(uint64_t(1) << (gap - 1)));
     }
     __m512i last_and_third = _mm512_maskz_compress_epi8(mend, mask_identity);
-    __m512i last_and_thirdu16 = _mm512_cvtepu8_epi16(_mm512_castsi512_si256(last_and_third));
-
-    __m512i nonasciitags = _mm512_maskz_mov_epi8(mask_not_ascii, mask_c0c0c0c0); // ASCII: 00000000  other: 11000000
-    __m512i clearedbytes = _mm512_andnot_si512(nonasciitags, input);             // high two bits cleared where not ASCII
-    __m512i lastbytes = _mm512_maskz_permutexvar_epi8(0x5555555555555555, last_and_thirdu16,
-                                                      clearedbytes); // the last byte of each character
-
-    __mmask64 mask_before_non_ascii = _kshiftri_mask64(mask_not_ascii, 1);               // bytes that precede non-ASCII bytes
-    __m512i indexofsecondlastbytes = _mm512_add_epi16(mask_ffffffff, last_and_thirdu16); // indices of the second last bytes
-    __m512i beforeasciibytes = _mm512_maskz_mov_epi8(mask_before_non_ascii, clearedbytes);
-    __m512i secondlastbytes = _mm512_maskz_permutexvar_epi8(0x5555555555555555, indexofsecondlastbytes,
-                                                            beforeasciibytes); // the second last bytes (of two, three byte seq,
-                                                                               // surrogates)
-    secondlastbytes = _mm512_slli_epi16(secondlastbytes, 6);                   // shifted into position
-
-    __m512i indexofthirdlastbytes = _mm512_add_epi16(mask_ffffffff,
-                                                     indexofsecondlastbytes); // indices of the second last bytes
-    __m512i thirdlastbyte = _mm512_maskz_mov_epi8(m34,
-                                                  clearedbytes); // only those that are the third last byte of a sequence
-    __m512i thirdlastbytes = _mm512_maskz_permutexvar_epi8(0x5555555555555555, indexofthirdlastbytes,
-                                                           thirdlastbyte); // the third last bytes (of three byte sequences, hi
-                                                                           // surrogate)
-    thirdlastbytes = _mm512_slli_epi16(thirdlastbytes, 12);                // shifted into position
-    __m512i thirdsecondandlastbytes = _mm512_ternarylogic_epi32(lastbytes, secondlastbytes, thirdlastbytes, 254);
+    __m512i last_and_thirdu16 =
+        _mm512_cvtepu8_epi16(_mm512_castsi512_si256(last_and_third));
+
+    __m512i nonasciitags = _mm512_maskz_mov_epi8(
+        mask_not_ascii, mask_c0c0c0c0); // ASCII: 00000000  other: 11000000
+    __m512i clearedbytes = _mm512_andnot_si512(
+        nonasciitags, input); // high two bits cleared where not ASCII
+    __m512i lastbytes = _mm512_maskz_permutexvar_epi8(
+        0x5555555555555555, last_and_thirdu16,
+        clearedbytes); // the last byte of each character
+
+    __mmask64 mask_before_non_ascii = _kshiftri_mask64(
+        mask_not_ascii, 1); // bytes that precede non-ASCII bytes
+    __m512i indexofsecondlastbytes = _mm512_add_epi16(
+        mask_ffffffff, last_and_thirdu16); // indices of the second last bytes
+    __m512i beforeasciibytes =
+        _mm512_maskz_mov_epi8(mask_before_non_ascii, clearedbytes);
+    __m512i secondlastbytes = _mm512_maskz_permutexvar_epi8(
+        0x5555555555555555, indexofsecondlastbytes,
+        beforeasciibytes); // the second last bytes (of two, three byte seq,
+                           // surrogates)
+    secondlastbytes =
+        _mm512_slli_epi16(secondlastbytes, 6); // shifted into position
+
+    __m512i indexofthirdlastbytes = _mm512_add_epi16(
+        mask_ffffffff,
+        indexofsecondlastbytes); // indices of the third last bytes
+    __m512i thirdlastbyte = _mm512_maskz_mov_epi8(
+        m34,
+        clearedbytes); // only those that are the third last byte of a sequence
+    __m512i thirdlastbytes = _mm512_maskz_permutexvar_epi8(
+        0x5555555555555555, indexofthirdlastbytes,
+        thirdlastbyte); // the third last bytes (of three byte sequences, hi
+                        // surrogate)
+    thirdlastbytes =
+        _mm512_slli_epi16(thirdlastbytes, 12); // shifted into position
+    __m512i thirdsecondandlastbytes = _mm512_ternarylogic_epi32(
+        lastbytes, secondlastbytes, thirdlastbytes, 254);
     uint64_t Mlo_uint64 = _pext_u64(mp3, mend);
     __mmask32 Mlo = __mmask32(Mlo_uint64);
     __mmask32 Mhi = __mmask32(Mlo_uint64 >> 1);
-    __m512i lo_surr_mask = _mm512_maskz_mov_epi16(Mlo,
-                                                  mask_dc00dc00); // lo surr: 1101110000000000, other:  0000000000000000
-    __m512i shifted4_thirdsecondandlastbytes = _mm512_srli_epi16(thirdsecondandlastbytes,
-                                                                 4); // hi surr: 00000WVUTSRQPNML  vuts = WVUTS - 1
-    __m512i tagged_lo_surrogates = _mm512_or_si512(thirdsecondandlastbytes,
-                                                   lo_surr_mask); // lo surr: 110111KJHGFEDCBA, other:  unchanged
-    __m512i Wout = _mm512_mask_add_epi16(tagged_lo_surrogates, Mhi, shifted4_thirdsecondandlastbytes,
-                                         mask_d7c0d7c0); // hi sur: 110110vutsRQPNML, other:  unchanged
-    // the elements of Wout excluding the last element if it happens to be a high surrogate:
+    __m512i lo_surr_mask = _mm512_maskz_mov_epi16(
+        Mlo,
+        mask_dc00dc00); // lo surr: 1101110000000000, other:  0000000000000000
+    __m512i shifted4_thirdsecondandlastbytes =
+        _mm512_srli_epi16(thirdsecondandlastbytes,
+                          4); // hi surr: 00000WVUTSRQPNML  vuts = WVUTS - 1
+    __m512i tagged_lo_surrogates = _mm512_or_si512(
+        thirdsecondandlastbytes,
+        lo_surr_mask); // lo surr: 110111KJHGFEDCBA, other:  unchanged
+    __m512i Wout = _mm512_mask_add_epi16(
+        tagged_lo_surrogates, Mhi, shifted4_thirdsecondandlastbytes,
+        mask_d7c0d7c0); // hi sur: 110110vutsRQPNML, other:  unchanged
+    // the elements of Wout excluding the last element if it happens to be a
+    // high surrogate:
     __mmask32 Mout = ~(Mhi & 0x80000000);
-    __mmask64 mprocessed = (tail == SIMDUTF_FULL) ? _pdep_u64(Mout, mend) : _pdep_u64(Mout, _kand_mask64(mend, b)); // we adjust mend at the end of the output.
-
+    __mmask64 mprocessed =
+        (tail == SIMDUTF_FULL)
+            ? _pdep_u64(Mout, mend)
+            : _pdep_u64(
+                  Mout,
+                  _kand_mask64(mend,
+                               b)); // we adjust mend at the end of the output.
 
     // mismatched continuation bytes:
     if (tail == SIMDUTF_FULL) {
-      __mmask64 xnormcm1234 = _kxnor_mask64(mc, m1234); // XNOR of mc and m1234 should be all zero if they differ
+      __mmask64 xnormcm1234 = _kxnor_mask64(
+          mc, m1234); // XNOR of mc and m1234 should be all zero if they differ
       // the presence of a 1 bit indicates that they overlap.
-      // _kortestz_mask64_u8: compute the bitwise OR of 64-bit masksand return 1 if all zeroes.
-      if (!_kortestz_mask64_u8(xnormcm1234, xnormcm1234)) { return false; }
+      // _kortestz_mask64_u8: compute the bitwise OR of 64-bit masks and
+      // return 1 if all zeroes.
+      if (!_kortestz_mask64_u8(xnormcm1234, xnormcm1234)) {
+        return false;
+      }
     } else {
       __mmask64 bxorm1234 = _kxor_mask64(b, m1234);
-      if (mc != bxorm1234) { return false; }
+      if (mc != bxorm1234) {
+        return false;
+      }
     }
     // Encodings out of range...
     {
@@ -20195,33 +20253,50 @@ simdutf_really_inline bool process_block_utf8_to_utf16(const char *&in, char16_t
       // code units in Wout corresponding to 3-byte sequences.
       __mmask32 M3 = __mmask32(_pext_u64(m3 << 2, mend));
       __m512i mask_08000800 = _mm512_set1_epi32(0x08000800);
-      __mmask32 Msmall800 = _mm512_mask_cmplt_epu16_mask(M3, Wout, mask_08000800);
+      __mmask32 Msmall800 =
+          _mm512_mask_cmplt_epu16_mask(M3, Wout, mask_08000800);
       __m512i mask_d800d800 = _mm512_set1_epi32(0xd800d800);
       __m512i Moutminusd800 = _mm512_sub_epi16(Wout, mask_d800d800);
-      __mmask32 M3s = _mm512_mask_cmplt_epu16_mask(M3, Moutminusd800, mask_08000800);
+      __mmask32 M3s =
+          _mm512_mask_cmplt_epu16_mask(M3, Moutminusd800, mask_08000800);
       __m512i mask_04000400 = _mm512_set1_epi32(0x04000400);
-      __mmask32 M4s = _mm512_mask_cmpge_epu16_mask(Mhi, Moutminusd800, mask_04000400);
-      if (!_kortestz_mask32_u8(M4s, _kor_mask32(Msmall800, M3s))) { return false; }
+      __mmask32 M4s =
+          _mm512_mask_cmpge_epu16_mask(Mhi, Moutminusd800, mask_04000400);
+      if (!_kortestz_mask32_u8(M4s, _kor_mask32(Msmall800, M3s))) {
+        return false;
+      }
     }
     in += 64 - _lzcnt_u64(mprocessed);
     int64_t nout = _mm_popcnt_u64(mprocessed);
-    if(big_endian) { Wout = _mm512_shuffle_epi8(Wout, byteflip); }
+    if (big_endian) {
+      Wout = _mm512_shuffle_epi8(Wout, byteflip);
+    }
     _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << nout) - 1), Wout);
     out += nout;
     return true; // ok
   }
   // Fast path 2: all ASCII or 2 byte
-  __mmask64 continuation_or_ascii = (tail == SIMDUTF_FULL) ? _knot_mask64(m234) : _kand_mask64(_knot_mask64(m234), b);
+  __mmask64 continuation_or_ascii = (tail == SIMDUTF_FULL)
+                                        ? _knot_mask64(m234)
+                                        : _kand_mask64(_knot_mask64(m234), b);
   // on top of -0xc0 we subtract -2 which we get back later of the
   // continuation byte tags
   __m512i leading2byte = _mm512_maskz_sub_epi8(m234, input, mask_c2c2c2c2);
-  __mmask64 leading = tail == (tail == SIMDUTF_FULL) ? _kor_mask64(m1, m234) : _kand_mask64(_kor_mask64(m1, m234), b); // first bytes of each sequence
+  __mmask64 leading = (tail == SIMDUTF_FULL)
+                          ? _kor_mask64(m1, m234)
+                          : _kand_mask64(_kor_mask64(m1, m234),
+                                         b); // first bytes of each sequence
   if (tail == SIMDUTF_FULL) {
-    __mmask64 xnor234leading = _kxnor_mask64(_kshiftli_mask64(m234, 1), leading);
-    if (!_kortestz_mask64_u8(xnor234leading, xnor234leading)) { return false; }
+    __mmask64 xnor234leading =
+        _kxnor_mask64(_kshiftli_mask64(m234, 1), leading);
+    if (!_kortestz_mask64_u8(xnor234leading, xnor234leading)) {
+      return false;
+    }
   } else {
     __mmask64 bxorleading = _kxor_mask64(b, leading);
-    if (_kshiftli_mask64(m234, 1) != bxorleading) { return false; }
+    if (_kshiftli_mask64(m234, 1) != bxorleading) {
+      return false;
+    }
   }
   //
   if (tail == SIMDUTF_FULL) {
@@ -20232,23 +20307,30 @@ simdutf_really_inline bool process_block_utf8_to_utf16(const char *&in, char16_t
     // Note that if x is an ASCII byte, then the following is false:
     // int8_t(x) <= int8_t(0xc0) under two's complement.
     in += 32;
-    if(int8_t(*in) <= int8_t(0xc0)) in++;
+    if (int8_t(*in) <= int8_t(0xc0))
+      in++;
     // The alternative is to do
     // in += 64 - _lzcnt_u64(_pdep_u64(0xFFFFFFFF, continuation_or_ascii));
-    // but it requires loading the input, doing the mask computation, and converting
-    // back the mask to a general register. It just takes too long, leaving the
-    // processor likely to be idle.
+    // but it requires loading the input, doing the mask computation, and
+    // converting back the mask to a general register. It just takes too long,
+    // leaving the processor likely to be idle.
   } else {
     in += 64 - _lzcnt_u64(_pdep_u64(0xFFFFFFFF, continuation_or_ascii));
   }
-  __m512i lead = _mm512_maskz_compress_epi8(leading, leading2byte);          // will contain zero for ascii, and the data
-  lead = _mm512_cvtepu8_epi16(_mm512_castsi512_si256(lead));                 // ... zero extended into code units
-  __m512i follow = _mm512_maskz_compress_epi8(continuation_or_ascii, input); // the last bytes of each sequence
-  follow = _mm512_cvtepu8_epi16(_mm512_castsi512_si256(follow));             // ... zero extended into code units
-  lead = _mm512_slli_epi16(lead, 6);                                         // shifted into position
-  __m512i final = _mm512_add_epi16(follow, lead);                            // combining lead and follow
+  __m512i lead = _mm512_maskz_compress_epi8(
+      leading, leading2byte); // will contain zero for ascii, and the data
+  lead = _mm512_cvtepu8_epi16(
+      _mm512_castsi512_si256(lead)); // ... zero extended into code units
+  __m512i follow = _mm512_maskz_compress_epi8(
+      continuation_or_ascii, input); // the last bytes of each sequence
+  follow = _mm512_cvtepu8_epi16(
+      _mm512_castsi512_si256(follow)); // ... zero extended into code units
+  lead = _mm512_slli_epi16(lead, 6);   // shifted into position
+  __m512i final = _mm512_add_epi16(follow, lead); // combining lead and follow
 
-  if(big_endian) { final = _mm512_shuffle_epi8(final, byteflip); }
+  if (big_endian) {
+    final = _mm512_shuffle_epi8(final, byteflip);
+  }
   if (tail == SIMDUTF_FULL) {
     // Next part is UTF-16 specific and can be generalized to UTF-32.
     int nout = _mm_popcnt_u32(uint32_t(leading));
@@ -20263,9 +20345,6 @@ simdutf_really_inline bool process_block_utf8_to_utf16(const char *&in, char16_t
   return true; // we are fine.
 }
 
-
-
-
 /*
     utf32_to_utf16_masked converts `count` lower UTF-32 code units
     from input `utf32` into UTF-16. It differs from utf32_to_utf16
@@ -20288,58 +20367,77 @@ simdutf_really_inline bool process_block_utf8_to_utf16(const char *&in, char16_t
     keep the value in a (constant) register.
 */
 template <endianness big_endian>
-simdutf_really_inline size_t utf32_to_utf16_masked(const __m512i byteflip, __m512i utf32, unsigned int count, char16_t* output) {
-
-    const __mmask16 valid = uint16_t((1 << count) - 1);
-    // 1. check if we have any surrogate pairs
-    const __m512i v_0000_ffff = _mm512_set1_epi32(0x0000ffff);
-    const __mmask16 sp_mask = _mm512_mask_cmpgt_epu32_mask(valid, utf32, v_0000_ffff);
+simdutf_really_inline size_t utf32_to_utf16_masked(const __m512i byteflip,
+                                                   __m512i utf32,
+                                                   unsigned int count,
+                                                   char16_t *output) {
+
+  const __mmask16 valid = uint16_t((1 << count) - 1);
+  // 1. check if we have any surrogate pairs
+  const __m512i v_0000_ffff = _mm512_set1_epi32(0x0000ffff);
+  const __mmask16 sp_mask =
+      _mm512_mask_cmpgt_epu32_mask(valid, utf32, v_0000_ffff);
+
+  if (sp_mask == 0) {
+    if (big_endian) {
+      _mm256_mask_storeu_epi16(
+          (__m256i *)output, valid,
+          _mm256_shuffle_epi8(_mm512_cvtepi32_epi16(utf32),
+                              _mm512_castsi512_si256(byteflip)));
 
-    if (sp_mask == 0) {
-        if(big_endian) {
-          _mm256_mask_storeu_epi16((__m256i*)output, valid, _mm256_shuffle_epi8(_mm512_cvtepi32_epi16(utf32), _mm512_castsi512_si256(byteflip)));
+    } else {
+      _mm256_mask_storeu_epi16((__m256i *)output, valid,
+                               _mm512_cvtepi32_epi16(utf32));
+    }
+    return count;
+  }
 
-        } else {
-          _mm256_mask_storeu_epi16((__m256i*)output, valid, _mm512_cvtepi32_epi16(utf32));
-        }
-        return count;
+  {
+    // build surrogate pair code units in 32-bit lanes
+
+    //    t0 = 8 x [000000000000aaaa|aaaaaabbbbbbbbbb]
+    const __m512i v_0001_0000 = _mm512_set1_epi32(0x00010000);
+    const __m512i t0 = _mm512_sub_epi32(utf32, v_0001_0000);
+
+    //    t1 = 8 x [000000aaaaaaaaaa|bbbbbbbbbb000000]
+    const __m512i t1 = _mm512_slli_epi32(t0, 6);
+
+    //    t2 = 8 x [000000aaaaaaaaaa|aaaaaabbbbbbbbbb] -- copy hi word from t1
+    //    to t0
+    //         0xe4 = (t1 and v_ffff_0000) or (t0 and not v_ffff_0000)
+    const __m512i v_ffff_0000 = _mm512_set1_epi32(0xffff0000);
+    const __m512i t2 = _mm512_ternarylogic_epi32(t1, t0, v_ffff_0000, 0xe4);
+
+    //    t3 = 8 x [110110aaaaaaaaaa|110111bbbbbbbbbb] -- replace the top six
+    //    bits of each 16-bit half with the surrogate tags
+    //         0xba = (t2 and not v_fc00_fc00) or v_d800_dc00
+    const __m512i v_fc00_fc00 = _mm512_set1_epi32(0xfc00fc00);
+    const __m512i v_d800_dc00 = _mm512_set1_epi32(0xd800dc00);
+    const __m512i t3 =
+        _mm512_ternarylogic_epi32(t2, v_fc00_fc00, v_d800_dc00, 0xba);
+    const __m512i t4 = _mm512_mask_blend_epi32(sp_mask, utf32, t3);
+    __m512i t5 = _mm512_ror_epi32(t4, 16);
+    // Here we want to trim all of the upper 16-bit code units from the 2-byte
+    // characters represented as 4-byte values. We can compute it from
+    // sp_mask or the following... It can be more optimized!
+    const __mmask32 nonzero = _kor_mask32(
+        0xaaaaaaaa, _mm512_cmpneq_epi16_mask(t5, _mm512_setzero_si512()));
+    const __mmask32 nonzero_masked =
+        _kand_mask32(nonzero, __mmask32((uint64_t(1) << (2 * count)) - 1));
+    if (big_endian) {
+      t5 = _mm512_shuffle_epi8(t5, byteflip);
     }
+    // we deliberately avoid _mm512_mask_compressstoreu_epi16 for portability
+    // (zen4)
+    __m512i compressed = _mm512_maskz_compress_epi16(nonzero_masked, t5);
+    _mm512_mask_storeu_epi16(
+        output,
+        (1 << (count + static_cast<unsigned int>(count_ones(sp_mask)))) - 1,
+        compressed);
+    //_mm512_mask_compressstoreu_epi16(output, nonzero_masked, t5);
+  }
 
-    {
-        // build surrogate pair code units in 32-bit lanes
-
-        //    t0 = 8 x [000000000000aaaa|aaaaaabbbbbbbbbb]
-        const __m512i v_0001_0000 = _mm512_set1_epi32(0x00010000);
-        const __m512i t0 = _mm512_sub_epi32(utf32, v_0001_0000);
-
-        //    t1 = 8 x [000000aaaaaaaaaa|bbbbbbbbbb000000]
-        const __m512i t1 = _mm512_slli_epi32(t0, 6);
-
-        //    t2 = 8 x [000000aaaaaaaaaa|aaaaaabbbbbbbbbb] -- copy hi word from t1 to t0
-        //         0xe4 = (t1 and v_ffff_0000) or (t0 and not v_ffff_0000)
-        const __m512i v_ffff_0000 = _mm512_set1_epi32(0xffff0000);
-        const __m512i t2 = _mm512_ternarylogic_epi32(t1, t0, v_ffff_0000, 0xe4);
-
-        //    t2 = 8 x [110110aaaaaaaaaa|110111bbbbbbbbbb] -- copy hi word from t1 to t0
-        //         0xba = (t2 and not v_fc00_fc000) or v_d800_dc00
-        const __m512i v_fc00_fc00 = _mm512_set1_epi32(0xfc00fc00);
-        const __m512i v_d800_dc00 = _mm512_set1_epi32(0xd800dc00);
-        const __m512i t3 = _mm512_ternarylogic_epi32(t2, v_fc00_fc00, v_d800_dc00, 0xba);
-        const __m512i t4 = _mm512_mask_blend_epi32(sp_mask, utf32, t3);
-        __m512i t5 = _mm512_ror_epi32(t4, 16);
-        // Here we want to trim all of the upper 16-bit code units from the 2-byte
-        // characters represented as 4-byte values. We can compute it from
-        // sp_mask or the following... It can be more optimized!
-        const  __mmask32 nonzero = _kor_mask32(0xaaaaaaaa,_mm512_cmpneq_epi16_mask(t5, _mm512_setzero_si512()));
-        const  __mmask32 nonzero_masked = _kand_mask32(nonzero, __mmask32((uint64_t(1) << (2*count)) - 1));
-        if(big_endian) { t5 = _mm512_shuffle_epi8(t5, byteflip); }
-        // we deliberately avoid _mm512_mask_compressstoreu_epi16 for portability (zen4)
-        __m512i compressed = _mm512_maskz_compress_epi16(nonzero_masked, t5);
-        _mm512_mask_storeu_epi16(output, (1<<(count + static_cast<unsigned int>(count_ones(sp_mask)))) - 1, compressed);
-        //_mm512_mask_compressstoreu_epi16(output, nonzero_masked, t5);
-    }
-
-    return count + static_cast<unsigned int>(count_ones(sp_mask));
+  return count + static_cast<unsigned int>(count_ones(sp_mask));
 }
 
 /*
@@ -20363,96 +20461,109 @@ simdutf_really_inline size_t utf32_to_utf16_masked(const __m512i byteflip, __m51
     keep the value in a (constant) register.
 */
 template <endianness big_endian>
-simdutf_really_inline size_t utf32_to_utf16(const __m512i byteflip, __m512i utf32, unsigned int count, char16_t* output) {
-    // check if we have any surrogate pairs
-    const __m512i v_0000_ffff = _mm512_set1_epi32(0x0000ffff);
-    const __mmask16 sp_mask = _mm512_cmpgt_epu32_mask(utf32, v_0000_ffff);
-
-    if (sp_mask == 0) {
-        // technically, it should be _mm256_storeu_epi16
-        if(big_endian) {
-          _mm256_storeu_si256((__m256i*)output, _mm256_shuffle_epi8(_mm512_cvtepi32_epi16(utf32),_mm512_castsi512_si256(byteflip)));
-        } else {
-          _mm256_storeu_si256((__m256i*)output, _mm512_cvtepi32_epi16(utf32));
-        }
-        return count;
+simdutf_really_inline size_t utf32_to_utf16(const __m512i byteflip,
+                                            __m512i utf32, unsigned int count,
+                                            char16_t *output) {
+  // check if we have any surrogate pairs
+  const __m512i v_0000_ffff = _mm512_set1_epi32(0x0000ffff);
+  const __mmask16 sp_mask = _mm512_cmpgt_epu32_mask(utf32, v_0000_ffff);
+
+  if (sp_mask == 0) {
+    // technically, it should be _mm256_storeu_epi16
+    if (big_endian) {
+      _mm256_storeu_si256(
+          (__m256i *)output,
+          _mm256_shuffle_epi8(_mm512_cvtepi32_epi16(utf32),
+                              _mm512_castsi512_si256(byteflip)));
+    } else {
+      _mm256_storeu_si256((__m256i *)output, _mm512_cvtepi32_epi16(utf32));
     }
+    return count;
+  }
 
-    {
-        // build surrogate pair code units in 32-bit lanes
-
-        //    t0 = 8 x [000000000000aaaa|aaaaaabbbbbbbbbb]
-        const __m512i v_0001_0000 = _mm512_set1_epi32(0x00010000);
-        const __m512i t0 = _mm512_sub_epi32(utf32, v_0001_0000);
-
-        //    t1 = 8 x [000000aaaaaaaaaa|bbbbbbbbbb000000]
-        const __m512i t1 = _mm512_slli_epi32(t0, 6);
-
-        //    t2 = 8 x [000000aaaaaaaaaa|aaaaaabbbbbbbbbb] -- copy hi word from t1 to t0
-        //         0xe4 = (t1 and v_ffff_0000) or (t0 and not v_ffff_0000)
-        const __m512i v_ffff_0000 = _mm512_set1_epi32(0xffff0000);
-        const __m512i t2 = _mm512_ternarylogic_epi32(t1, t0, v_ffff_0000, 0xe4);
-
-        //    t2 = 8 x [110110aaaaaaaaaa|110111bbbbbbbbbb] -- copy hi word from t1 to t0
-        //         0xba = (t2 and not v_fc00_fc000) or v_d800_dc00
-        const __m512i v_fc00_fc00 = _mm512_set1_epi32(0xfc00fc00);
-        const __m512i v_d800_dc00 = _mm512_set1_epi32(0xd800dc00);
-        const __m512i t3 = _mm512_ternarylogic_epi32(t2, v_fc00_fc00, v_d800_dc00, 0xba);
-        const __m512i t4 = _mm512_mask_blend_epi32(sp_mask, utf32, t3);
-        __m512i t5 = _mm512_ror_epi32(t4, 16);
-        const  __mmask32 nonzero = _kor_mask32(0xaaaaaaaa,_mm512_cmpneq_epi16_mask(t5, _mm512_setzero_si512()));
-        if(big_endian) { t5 = _mm512_shuffle_epi8(t5, byteflip); }
-        // we deliberately avoid _mm512_mask_compressstoreu_epi16 for portability (zen4)
-        __m512i compressed = _mm512_maskz_compress_epi16(nonzero, t5);
-        _mm512_mask_storeu_epi16(output, (1<<(count + static_cast<unsigned int>(count_ones(sp_mask)))) - 1, compressed);
-        //_mm512_mask_compressstoreu_epi16(output, nonzero, t5);
+  {
+    // build surrogate pair code units in 32-bit lanes
+
+    //    t0 = 8 x [000000000000aaaa|aaaaaabbbbbbbbbb]
+    const __m512i v_0001_0000 = _mm512_set1_epi32(0x00010000);
+    const __m512i t0 = _mm512_sub_epi32(utf32, v_0001_0000);
+
+    //    t1 = 8 x [000000aaaaaaaaaa|bbbbbbbbbb000000]
+    const __m512i t1 = _mm512_slli_epi32(t0, 6);
+
+    //    t2 = 8 x [000000aaaaaaaaaa|aaaaaabbbbbbbbbb] -- copy hi word from t1
+    //    to t0
+    //         0xe4 = (t1 and v_ffff_0000) or (t0 and not v_ffff_0000)
+    const __m512i v_ffff_0000 = _mm512_set1_epi32(0xffff0000);
+    const __m512i t2 = _mm512_ternarylogic_epi32(t1, t0, v_ffff_0000, 0xe4);
+
+    //    t3 = 8 x [110110aaaaaaaaaa|110111bbbbbbbbbb] -- replace the top six
+    //    bits of each 16-bit half with the surrogate tags
+    //         0xba = (t2 and not v_fc00_fc00) or v_d800_dc00
+    const __m512i v_fc00_fc00 = _mm512_set1_epi32(0xfc00fc00);
+    const __m512i v_d800_dc00 = _mm512_set1_epi32(0xd800dc00);
+    const __m512i t3 =
+        _mm512_ternarylogic_epi32(t2, v_fc00_fc00, v_d800_dc00, 0xba);
+    const __m512i t4 = _mm512_mask_blend_epi32(sp_mask, utf32, t3);
+    __m512i t5 = _mm512_ror_epi32(t4, 16);
+    const __mmask32 nonzero = _kor_mask32(
+        0xaaaaaaaa, _mm512_cmpneq_epi16_mask(t5, _mm512_setzero_si512()));
+    if (big_endian) {
+      t5 = _mm512_shuffle_epi8(t5, byteflip);
     }
+    // we deliberately avoid _mm512_mask_compressstoreu_epi16 for portability
+    // (zen4)
+    __m512i compressed = _mm512_maskz_compress_epi16(nonzero, t5);
+    _mm512_mask_storeu_epi16(
+        output,
+        (1 << (count + static_cast<unsigned int>(count_ones(sp_mask)))) - 1,
+        compressed);
+    //_mm512_mask_compressstoreu_epi16(output, nonzero, t5);
+  }
 
-    return count + static_cast<unsigned int>(count_ones(sp_mask));
+  return count + static_cast<unsigned int>(count_ones(sp_mask));
 }
 
 /**
  * Store the last N bytes of previous followed by 512-N bytes from input.
  */
-template <int N>
-__m512i prev(__m512i input, __m512i previous) {
-    static_assert(N<=32, "N must be no larger than 32");
-    const __m512i movemask = _mm512_setr_epi32(28,29,30,31,0,1,2,3,4,5,6,7,8,9,10,11);
-    const __m512i rotated = _mm512_permutex2var_epi32(input, movemask, previous);
+template <int N> __m512i prev(__m512i input, __m512i previous) {
+  static_assert(N <= 32, "N must be no larger than 32");
+  const __m512i movemask =
+      _mm512_setr_epi32(28, 29, 30, 31, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+  const __m512i rotated = _mm512_permutex2var_epi32(input, movemask, previous);
 #if SIMDUTF_GCC8 || SIMDUTF_GCC9
-    constexpr int shift = 16-N; // workaround for GCC8,9
-    return _mm512_alignr_epi8(input, rotated, shift);
+  constexpr int shift = 16 - N; // workaround for GCC8,9
+  return _mm512_alignr_epi8(input, rotated, shift);
 #else
-    return _mm512_alignr_epi8(input, rotated, 16-N);
+  return _mm512_alignr_epi8(input, rotated, 16 - N);
 #endif // SIMDUTF_GCC8 || SIMDUTF_GCC9
 }
 
 template <unsigned idx0, unsigned idx1, unsigned idx2, unsigned idx3>
 __m512i shuffle_epi128(__m512i v) {
-    static_assert((idx0 >= 0 && idx0 <= 3), "idx0 must be in range 0..3");
-    static_assert((idx1 >= 0 && idx1 <= 3), "idx1 must be in range 0..3");
-    static_assert((idx2 >= 0 && idx2 <= 3), "idx2 must be in range 0..3");
-    static_assert((idx3 >= 0 && idx3 <= 3), "idx3 must be in range 0..3");
+  static_assert((idx0 >= 0 && idx0 <= 3), "idx0 must be in range 0..3");
+  static_assert((idx1 >= 0 && idx1 <= 3), "idx1 must be in range 0..3");
+  static_assert((idx2 >= 0 && idx2 <= 3), "idx2 must be in range 0..3");
+  static_assert((idx3 >= 0 && idx3 <= 3), "idx3 must be in range 0..3");
 
-    constexpr unsigned shuffle = idx0 | (idx1 << 2) | (idx2 << 4) | (idx3 << 6);
-    return _mm512_shuffle_i32x4(v, v, shuffle);
+  constexpr unsigned shuffle = idx0 | (idx1 << 2) | (idx2 << 4) | (idx3 << 6);
+  return _mm512_shuffle_i32x4(v, v, shuffle);
 }
 
-template <unsigned idx>
-constexpr __m512i broadcast_epi128(__m512i v) {
-    return shuffle_epi128<idx, idx, idx, idx>(v);
+template <unsigned idx> constexpr __m512i broadcast_epi128(__m512i v) {
+  return shuffle_epi128<idx, idx, idx, idx>(v);
 }
 
 /**
  * Current unused.
  */
-template <int N>
-__m512i rotate_by_N_epi8(const __m512i input) {
+template <int N> __m512i rotate_by_N_epi8(const __m512i input) {
 
-    // lanes order: 1, 2, 3, 0 => 0b00_11_10_01
-    const __m512i permuted = _mm512_shuffle_i32x4(input, input, 0x39);
+  // lanes order: 1, 2, 3, 0 => 0b00_11_10_01
+  const __m512i permuted = _mm512_shuffle_i32x4(input, input, 0x39);
 
-    return _mm512_alignr_epi8(permuted, input, N);
+  return _mm512_alignr_epi8(permuted, input, N);
 }
 
 /*
@@ -20463,163 +20574,149 @@ __m512i rotate_by_N_epi8(const __m512i input) {
     0x8080800N, where N is 4 highest bits from the leading byte; 0x80 resets
     corresponding bytes during pshufb.
 */
-simdutf_really_inline __m512i expanded_utf8_to_utf32(__m512i char_class, __m512i utf8) {
-    /*
-        Input:
-        - utf8: bytes stored at separate 32-bit code units
-        - valid: which code units have valid UTF-8 characters
-
-        Bit layout of single word. We show 4 cases for each possible
-        UTF-8 character encoding. The `?` denotes bits we must not
-        assume their value.
-
-        |10dd.dddd|10cc.cccc|10bb.bbbb|1111.0aaa| 4-byte char
-        |????.????|10cc.cccc|10bb.bbbb|1110.aaaa| 3-byte char
-        |????.????|????.????|10bb.bbbb|110a.aaaa| 2-byte char
-        |????.????|????.????|????.????|0aaa.aaaa| ASCII char
-          byte 3    byte 2    byte 1     byte 0
-    */
-
-    /* 1. Reset control bits of continuation bytes and the MSB
-          of the leading byte; this makes all bytes unsigned (and
-          does not alter ASCII char).
-
-        |00dd.dddd|00cc.cccc|00bb.bbbb|0111.0aaa| 4-byte char
-        |00??.????|00cc.cccc|00bb.bbbb|0110.aaaa| 3-byte char
-        |00??.????|00??.????|00bb.bbbb|010a.aaaa| 2-byte char
-        |00??.????|00??.????|00??.????|0aaa.aaaa| ASCII char
-         ^^        ^^        ^^        ^
-    */
-    __m512i values;
-    const __m512i v_3f3f_3f7f = _mm512_set1_epi32(0x3f3f3f7f);
-    values = _mm512_and_si512(utf8, v_3f3f_3f7f);
-
-    /* 2. Swap and join fields A-B and C-D
-
-        |0000.cccc|ccdd.dddd|0001.110a|aabb.bbbb| 4-byte char
-        |0000.cccc|cc??.????|0001.10aa|aabb.bbbb| 3-byte char
-        |0000.????|????.????|0001.0aaa|aabb.bbbb| 2-byte char
-        |0000.????|????.????|000a.aaaa|aa??.????| ASCII char */
-    const __m512i v_0140_0140 = _mm512_set1_epi32(0x01400140);
-    values = _mm512_maddubs_epi16(values, v_0140_0140);
-
-    /* 3. Swap and join fields AB & CD
-
-        |0000.0001|110a.aabb|bbbb.cccc|ccdd.dddd| 4-byte char
-        |0000.0001|10aa.aabb|bbbb.cccc|cc??.????| 3-byte char
-        |0000.0001|0aaa.aabb|bbbb.????|????.????| 2-byte char
-        |0000.000a|aaaa.aa??|????.????|????.????| ASCII char */
-    const __m512i v_0001_1000 = _mm512_set1_epi32(0x00011000);
-    values = _mm512_madd_epi16(values, v_0001_1000);
-
-    /* 4. Shift left the values by variable amounts to reset highest UTF-8 bits
-        |aaab.bbbb|bccc.cccd|dddd.d000|0000.0000| 4-byte char -- by 11
-        |aaaa.bbbb|bbcc.cccc|????.??00|0000.0000| 3-byte char -- by 10
-        |aaaa.abbb|bbb?.????|????.???0|0000.0000| 2-byte char -- by 9
-        |aaaa.aaa?|????.????|????.????|?000.0000| ASCII char -- by 7 */
-    {
-        /** pshufb
-
-        continuation = 0
-        ascii    = 7
-        _2_bytes = 9
-        _3_bytes = 10
-        _4_bytes = 11
-
-        shift_left_v3 = 4 * [
-            ascii, # 0000
-            ascii, # 0001
-            ascii, # 0010
-            ascii, # 0011
-            ascii, # 0100
-            ascii, # 0101
-            ascii, # 0110
-            ascii, # 0111
-            continuation, # 1000
-            continuation, # 1001
-            continuation, # 1010
-            continuation, # 1011
-            _2_bytes, # 1100
-            _2_bytes, # 1101
-            _3_bytes, # 1110
-            _4_bytes, # 1111
-        ] */
-        const __m512i shift_left_v3 = _mm512_setr_epi64(
-            0x0707070707070707,
-            0x0b0a090900000000,
-            0x0707070707070707,
-            0x0b0a090900000000,
-            0x0707070707070707,
-            0x0b0a090900000000,
-            0x0707070707070707,
-            0x0b0a090900000000
-        );
-
-        const __m512i shift = _mm512_shuffle_epi8(shift_left_v3, char_class);
-        values = _mm512_sllv_epi32(values, shift);
-    }
-
-    /* 5. Shift right the values by variable amounts to reset lowest bits
-        |0000.0000|000a.aabb|bbbb.cccc|ccdd.dddd| 4-byte char -- by 11
-        |0000.0000|0000.0000|aaaa.bbbb|bbcc.cccc| 3-byte char -- by 16
-        |0000.0000|0000.0000|0000.0aaa|aabb.bbbb| 2-byte char -- by 21
-        |0000.0000|0000.0000|0000.0000|0aaa.aaaa| ASCII char -- by 25 */
-    {
-        // 4 * [25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 21, 21, 16, 11]
-        const __m512i shift_right = _mm512_setr_epi64(
-            0x1919191919191919,
-            0x0b10151500000000,
-            0x1919191919191919,
-            0x0b10151500000000,
-            0x1919191919191919,
-            0x0b10151500000000,
-            0x1919191919191919,
-            0x0b10151500000000
-        );
+simdutf_really_inline __m512i expanded_utf8_to_utf32(__m512i char_class,
+                                                     __m512i utf8) {
+  /*
+      Input:
+      - utf8: bytes stored at separate 32-bit code units
+      - valid: which code units have valid UTF-8 characters
+
+      Bit layout of single word. We show 4 cases for each possible
+      UTF-8 character encoding. The `?` denotes bits we must not
+      assume their value.
+
+      |10dd.dddd|10cc.cccc|10bb.bbbb|1111.0aaa| 4-byte char
+      |????.????|10cc.cccc|10bb.bbbb|1110.aaaa| 3-byte char
+      |????.????|????.????|10bb.bbbb|110a.aaaa| 2-byte char
+      |????.????|????.????|????.????|0aaa.aaaa| ASCII char
+        byte 3    byte 2    byte 1     byte 0
+  */
+
+  /* 1. Reset control bits of continuation bytes and the MSB
+        of the leading byte; this makes all bytes unsigned (and
+        does not alter ASCII char).
+
+      |00dd.dddd|00cc.cccc|00bb.bbbb|0111.0aaa| 4-byte char
+      |00??.????|00cc.cccc|00bb.bbbb|0110.aaaa| 3-byte char
+      |00??.????|00??.????|00bb.bbbb|010a.aaaa| 2-byte char
+      |00??.????|00??.????|00??.????|0aaa.aaaa| ASCII char
+       ^^        ^^        ^^        ^
+  */
+  __m512i values;
+  const __m512i v_3f3f_3f7f = _mm512_set1_epi32(0x3f3f3f7f);
+  values = _mm512_and_si512(utf8, v_3f3f_3f7f);
+
+  /* 2. Swap and join fields A-B and C-D
+
+      |0000.cccc|ccdd.dddd|0001.110a|aabb.bbbb| 4-byte char
+      |0000.cccc|cc??.????|0001.10aa|aabb.bbbb| 3-byte char
+      |0000.????|????.????|0001.0aaa|aabb.bbbb| 2-byte char
+      |0000.????|????.????|000a.aaaa|aa??.????| ASCII char */
+  const __m512i v_0140_0140 = _mm512_set1_epi32(0x01400140);
+  values = _mm512_maddubs_epi16(values, v_0140_0140);
+
+  /* 3. Swap and join fields AB & CD
+
+      |0000.0001|110a.aabb|bbbb.cccc|ccdd.dddd| 4-byte char
+      |0000.0001|10aa.aabb|bbbb.cccc|cc??.????| 3-byte char
+      |0000.0001|0aaa.aabb|bbbb.????|????.????| 2-byte char
+      |0000.000a|aaaa.aa??|????.????|????.????| ASCII char */
+  const __m512i v_0001_1000 = _mm512_set1_epi32(0x00011000);
+  values = _mm512_madd_epi16(values, v_0001_1000);
+
+  /* 4. Shift left the values by variable amounts to reset highest UTF-8 bits
+      |aaab.bbbb|bccc.cccd|dddd.d000|0000.0000| 4-byte char -- by 11
+      |aaaa.bbbb|bbcc.cccc|????.??00|0000.0000| 3-byte char -- by 10
+      |aaaa.abbb|bbb?.????|????.???0|0000.0000| 2-byte char -- by 9
+      |aaaa.aaa?|????.????|????.????|?000.0000| ASCII char -- by 7 */
+  {
+    /** pshufb
+
+    continuation = 0
+    ascii    = 7
+    _2_bytes = 9
+    _3_bytes = 10
+    _4_bytes = 11
+
+    shift_left_v3 = 4 * [
+        ascii, # 0000
+        ascii, # 0001
+        ascii, # 0010
+        ascii, # 0011
+        ascii, # 0100
+        ascii, # 0101
+        ascii, # 0110
+        ascii, # 0111
+        continuation, # 1000
+        continuation, # 1001
+        continuation, # 1010
+        continuation, # 1011
+        _2_bytes, # 1100
+        _2_bytes, # 1101
+        _3_bytes, # 1110
+        _4_bytes, # 1111
+    ] */
+    const __m512i shift_left_v3 = _mm512_setr_epi64(
+        0x0707070707070707, 0x0b0a090900000000, 0x0707070707070707,
+        0x0b0a090900000000, 0x0707070707070707, 0x0b0a090900000000,
+        0x0707070707070707, 0x0b0a090900000000);
+
+    const __m512i shift = _mm512_shuffle_epi8(shift_left_v3, char_class);
+    values = _mm512_sllv_epi32(values, shift);
+  }
+
+  /* 5. Shift right the values by variable amounts to reset lowest bits
+      |0000.0000|000a.aabb|bbbb.cccc|ccdd.dddd| 4-byte char -- by 11
+      |0000.0000|0000.0000|aaaa.bbbb|bbcc.cccc| 3-byte char -- by 16
+      |0000.0000|0000.0000|0000.0aaa|aabb.bbbb| 2-byte char -- by 21
+      |0000.0000|0000.0000|0000.0000|0aaa.aaaa| ASCII char -- by 25 */
+  {
+    // 4 * [25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 21, 21, 16, 11]
+    const __m512i shift_right = _mm512_setr_epi64(
+        0x1919191919191919, 0x0b10151500000000, 0x1919191919191919,
+        0x0b10151500000000, 0x1919191919191919, 0x0b10151500000000,
+        0x1919191919191919, 0x0b10151500000000);
 
-        const __m512i shift = _mm512_shuffle_epi8(shift_right, char_class);
-        values = _mm512_srlv_epi32(values, shift);
-    }
+    const __m512i shift = _mm512_shuffle_epi8(shift_right, char_class);
+    values = _mm512_srlv_epi32(values, shift);
+  }
 
-    return values;
+  return values;
 }
 
-
-simdutf_really_inline __m512i expand_and_identify(__m512i lane0, __m512i lane1, int &count) {
-    const __m512i merged = _mm512_mask_mov_epi32(lane0, 0x1000, lane1);
-    const __m512i expand_ver2 = _mm512_setr_epi64(
-                0x0403020103020100,
-                0x0605040305040302,
-                0x0807060507060504,
-                0x0a09080709080706,
-                0x0c0b0a090b0a0908,
-                0x0e0d0c0b0d0c0b0a,
-                0x000f0e0d0f0e0d0c,
-                0x0201000f01000f0e
-    );
-    const __m512i input = _mm512_shuffle_epi8(merged, expand_ver2);
-    const __m512i v_0000_00c0 = _mm512_set1_epi32(0xc0);
-    const __m512i t0 = _mm512_and_si512(input, v_0000_00c0);
-    const __m512i v_0000_0080 = _mm512_set1_epi32(0x80);
-    const __mmask16 leading_bytes = _mm512_cmpneq_epu32_mask(t0, v_0000_0080);
-    count = static_cast<int>(count_ones(leading_bytes));
-    return  _mm512_mask_compress_epi32(_mm512_setzero_si512(), leading_bytes, input);
+simdutf_really_inline __m512i expand_and_identify(__m512i lane0, __m512i lane1,
+                                                  int &count) {
+  const __m512i merged = _mm512_mask_mov_epi32(lane0, 0x1000, lane1);
+  const __m512i expand_ver2 = _mm512_setr_epi64(
+      0x0403020103020100, 0x0605040305040302, 0x0807060507060504,
+      0x0a09080709080706, 0x0c0b0a090b0a0908, 0x0e0d0c0b0d0c0b0a,
+      0x000f0e0d0f0e0d0c, 0x0201000f01000f0e);
+  const __m512i input = _mm512_shuffle_epi8(merged, expand_ver2);
+  const __m512i v_0000_00c0 = _mm512_set1_epi32(0xc0);
+  const __m512i t0 = _mm512_and_si512(input, v_0000_00c0);
+  const __m512i v_0000_0080 = _mm512_set1_epi32(0x80);
+  const __mmask16 leading_bytes = _mm512_cmpneq_epu32_mask(t0, v_0000_0080);
+  count = static_cast<int>(count_ones(leading_bytes));
+  return _mm512_mask_compress_epi32(_mm512_setzero_si512(), leading_bytes,
+                                    input);
 }
 
 simdutf_really_inline __m512i expand_utf8_to_utf32(__m512i input) {
-    __m512i char_class = _mm512_srli_epi32(input, 4);
-    /*  char_class = ((input >> 4) & 0x0f) | 0x80808000 */
-    const __m512i v_0000_000f = _mm512_set1_epi32(0x0f);
-    const __m512i v_8080_8000 = _mm512_set1_epi32(0x80808000);
-    char_class = _mm512_ternarylogic_epi32(char_class, v_0000_000f, v_8080_8000, 0xea);
-    return expanded_utf8_to_utf32(char_class, input);
+  __m512i char_class = _mm512_srli_epi32(input, 4);
+  /*  char_class = ((input >> 4) & 0x0f) | 0x80808000 */
+  const __m512i v_0000_000f = _mm512_set1_epi32(0x0f);
+  const __m512i v_8080_8000 = _mm512_set1_epi32(0x80808000);
+  char_class =
+      _mm512_ternarylogic_epi32(char_class, v_0000_000f, v_8080_8000, 0xea);
+  return expanded_utf8_to_utf32(char_class, input);
 }
 /* end file src/icelake/icelake_utf8_common.inl.cpp */
 /* begin file src/icelake/icelake_macros.inl.cpp */
 
 /*
-    This upcoming macro (SIMDUTF_ICELAKE_TRANSCODE16) takes 16 + 4 bytes (of a UTF-8 string)
-    and loads all possible 4-byte substring into an AVX512 register.
+    This upcoming macro (SIMDUTF_ICELAKE_TRANSCODE16) takes 16 + 4 bytes (of a
+    UTF-8 string) and loads all possible 4-byte substrings into an AVX512
+    register.
 
     For example if we have bytes abcdefgh... we create following 32-bit lanes
 
@@ -20628,8 +20725,9 @@ simdutf_really_inline __m512i expand_utf8_to_utf32(__m512i input) {
      byte 0 of reg              byte 63 of reg
 */
 /** pshufb
-        # lane{0,1,2} have got bytes: [  0,  1,  2,  3,  4,  5,  6,  8,  9, 10, 11, 12, 13, 14, 15]
-        # lane3 has got bytes:        [ 16, 17, 18, 19,  4,  5,  6,  8,  9, 10, 11, 12, 13, 14, 15]
+        # lane{0,1,2} have got bytes:
+        #   [  0,  1,  2,  3,  4,  5,  6,  8,  9, 10, 11, 12, 13, 14, 15]
+        # lane3 has got bytes: [ 16, 17, 18, 19,  4,  5,  6,  8,  9, 10, 11, 12, 13, 14, 15]
 
         expand_ver2 = [
             # lane 0:
@@ -20650,105 +20748,113 @@ simdutf_really_inline __m512i expand_utf8_to_utf32(__m512i input) {
             10, 11, 12, 13,
             11, 12, 13, 14,
 
-            # lane 3 order: 13, 14, 15, 16 14, 15, 16, 17, 15, 16, 17, 18, 16, 17, 18, 19
-            12, 13, 14, 15,
-            13, 14, 15,  0,
-            14, 15,  0,  1,
-            15,  0,  1,  2,
+            # lane 3 order: 13, 14, 15, 16 14, 15, 16, 17, 15, 16, 17, 18, 16, 17, 18, 19
+            12, 13, 14, 15, 13, 14, 15,  0, 14, 15,  0,  1, 15,  0,  1,  2,
         ]
 */
 
-#define SIMDUTF_ICELAKE_TRANSCODE16(LANE0, LANE1, MASKED)                                                    \
-        {                                                                                                    \
-            const __m512i merged = _mm512_mask_mov_epi32(LANE0, 0x1000, LANE1);                              \
-            const __m512i expand_ver2 = _mm512_setr_epi64(                                                   \
-                0x0403020103020100,                                                                          \
-                0x0605040305040302,                                                                          \
-                0x0807060507060504,                                                                          \
-                0x0a09080709080706,                                                                          \
-                0x0c0b0a090b0a0908,                                                                          \
-                0x0e0d0c0b0d0c0b0a,                                                                          \
-                0x000f0e0d0f0e0d0c,                                                                          \
-                0x0201000f01000f0e                                                                           \
-            );                                                                                               \
-            const __m512i input = _mm512_shuffle_epi8(merged, expand_ver2);                                  \
-                                                                                                             \
-            __mmask16 leading_bytes;                                                                         \
-            const __m512i v_0000_00c0 = _mm512_set1_epi32(0xc0);                                             \
-            const __m512i t0 = _mm512_and_si512(input, v_0000_00c0);                                         \
-            const __m512i v_0000_0080 = _mm512_set1_epi32(0x80);                                             \
-            leading_bytes = _mm512_cmpneq_epu32_mask(t0, v_0000_0080);                                       \
-                                                                                                             \
-            __m512i char_class;                                                                              \
-            char_class = _mm512_srli_epi32(input, 4);                                                        \
-            /*  char_class = ((input >> 4) & 0x0f) | 0x80808000 */                                           \
-            const __m512i v_0000_000f = _mm512_set1_epi32(0x0f);                                             \
-            const __m512i v_8080_8000 = _mm512_set1_epi32(0x80808000);                                       \
-            char_class = _mm512_ternarylogic_epi32(char_class, v_0000_000f, v_8080_8000, 0xea);              \
-                                                                                                             \
-            const int valid_count = static_cast<int>(count_ones(leading_bytes));                             \
-            const __m512i utf32 = expanded_utf8_to_utf32(char_class, input);                                 \
-                                                                                                             \
-            const __m512i out = _mm512_mask_compress_epi32(_mm512_setzero_si512(), leading_bytes, utf32);    \
-                                                                                                             \
-            if (UTF32) {                                                                                     \
-                if(MASKED) {                                                                                 \
-                    const __mmask16 valid = uint16_t((1 << valid_count) - 1);                                \
-                    _mm512_mask_storeu_epi32((__m512i*)output, valid, out);                                  \
-                } else {                                                                                     \
-                    _mm512_storeu_si512((__m512i*)output, out);                                              \
-                }                                                                                            \
-                output += valid_count;                                                                       \
-            } else {                                                                                         \
-                if(MASKED) {                                                                                 \
-                    output += utf32_to_utf16_masked<big_endian>(byteflip, out, valid_count, reinterpret_cast<char16_t *>(output)); \
-                } else {                                                                                     \
-                    output += utf32_to_utf16<big_endian>(byteflip, out, valid_count, reinterpret_cast<char16_t *>(output));        \
-                }                                                                                            \
-            }                                                                                                \
-        }
-
-#define SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(INPUT, VALID_COUNT, MASKED)                                    \
-{                                                                                                           \
-    if (UTF32) {                                                                                            \
-        if(MASKED) {                                                                                        \
-            const __mmask16 valid_mask = uint16_t((1 << VALID_COUNT) - 1);                                  \
-            _mm512_mask_storeu_epi32((__m512i*)output, valid_mask, INPUT);                                  \
-        } else {                                                                                            \
-            _mm512_storeu_si512((__m512i*)output, INPUT);                                              \
-        }                                                                                                   \
-        output += VALID_COUNT;                                                                              \
-    } else {                                                                                                \
-        if(MASKED) {                                                                                        \
-            output += utf32_to_utf16_masked<big_endian>(byteflip, INPUT, VALID_COUNT, reinterpret_cast<char16_t *>(output));      \
-        } else {                                                                                            \
-            output += utf32_to_utf16<big_endian>(byteflip, INPUT, VALID_COUNT, reinterpret_cast<char16_t *>(output));             \
-        }                                                                                                   \
-    }                                                                                                       \
-}
-
-
-#define SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)                                  \
-        if (UTF32) {                                                                      \
-                const __m128i t0 = _mm512_castsi512_si128(utf8);                          \
-                const __m128i t1 = _mm512_extracti32x4_epi32(utf8, 1);                    \
-                const __m128i t2 = _mm512_extracti32x4_epi32(utf8, 2);                    \
-                const __m128i t3 = _mm512_extracti32x4_epi32(utf8, 3);                    \
-                _mm512_storeu_si512((__m512i*)(output + 0*16), _mm512_cvtepu8_epi32(t0)); \
-                _mm512_storeu_si512((__m512i*)(output + 1*16), _mm512_cvtepu8_epi32(t1)); \
-                _mm512_storeu_si512((__m512i*)(output + 2*16), _mm512_cvtepu8_epi32(t2)); \
-                _mm512_storeu_si512((__m512i*)(output + 3*16), _mm512_cvtepu8_epi32(t3)); \
-        } else {                                                                          \
-                const __m256i h0 = _mm512_castsi512_si256(utf8);                          \
-                const __m256i h1 = _mm512_extracti64x4_epi64(utf8, 1);                    \
-                if(big_endian) {                                                          \
-                _mm512_storeu_si512((__m512i*)(output + 0*16), _mm512_shuffle_epi8(_mm512_cvtepu8_epi16(h0), byteflip)); \
-                _mm512_storeu_si512((__m512i*)(output + 2*16), _mm512_shuffle_epi8(_mm512_cvtepu8_epi16(h1), byteflip)); \
-                } else {                                                                  \
-                _mm512_storeu_si512((__m512i*)(output + 0*16), _mm512_cvtepu8_epi16(h0)); \
-                _mm512_storeu_si512((__m512i*)(output + 2*16), _mm512_cvtepu8_epi16(h1)); \
-                }                                                                         \
-        }
+#define SIMDUTF_ICELAKE_TRANSCODE16(LANE0, LANE1, MASKED)                      \
+  {                                                                            \
+    const __m512i merged = _mm512_mask_mov_epi32(LANE0, 0x1000, LANE1);        \
+    const __m512i expand_ver2 = _mm512_setr_epi64(                             \
+        0x0403020103020100, 0x0605040305040302, 0x0807060507060504,            \
+        0x0a09080709080706, 0x0c0b0a090b0a0908, 0x0e0d0c0b0d0c0b0a,            \
+        0x000f0e0d0f0e0d0c, 0x0201000f01000f0e);                               \
+    const __m512i input = _mm512_shuffle_epi8(merged, expand_ver2);            \
+                                                                               \
+    __mmask16 leading_bytes;                                                   \
+    const __m512i v_0000_00c0 = _mm512_set1_epi32(0xc0);                       \
+    const __m512i t0 = _mm512_and_si512(input, v_0000_00c0);                   \
+    const __m512i v_0000_0080 = _mm512_set1_epi32(0x80);                       \
+    leading_bytes = _mm512_cmpneq_epu32_mask(t0, v_0000_0080);                 \
+                                                                               \
+    __m512i char_class;                                                        \
+    char_class = _mm512_srli_epi32(input, 4);                                  \
+    /*  char_class = ((input >> 4) & 0x0f) | 0x80808000 */                     \
+    const __m512i v_0000_000f = _mm512_set1_epi32(0x0f);                       \
+    const __m512i v_8080_8000 = _mm512_set1_epi32(0x80808000);                 \
+    char_class =                                                               \
+        _mm512_ternarylogic_epi32(char_class, v_0000_000f, v_8080_8000, 0xea); \
+                                                                               \
+    const int valid_count = static_cast<int>(count_ones(leading_bytes));       \
+    const __m512i utf32 = expanded_utf8_to_utf32(char_class, input);           \
+                                                                               \
+    const __m512i out = _mm512_mask_compress_epi32(_mm512_setzero_si512(),     \
+                                                   leading_bytes, utf32);      \
+                                                                               \
+    if (UTF32) {                                                               \
+      if (MASKED) {                                                            \
+        const __mmask16 valid = uint16_t((1 << valid_count) - 1);              \
+        _mm512_mask_storeu_epi32((__m512i *)output, valid, out);               \
+      } else {                                                                 \
+        _mm512_storeu_si512((__m512i *)output, out);                           \
+      }                                                                        \
+      output += valid_count;                                                   \
+    } else {                                                                   \
+      if (MASKED) {                                                            \
+        output += utf32_to_utf16_masked<big_endian>(                           \
+            byteflip, out, valid_count, reinterpret_cast<char16_t *>(output)); \
+      } else {                                                                 \
+        output += utf32_to_utf16<big_endian>(                                  \
+            byteflip, out, valid_count, reinterpret_cast<char16_t *>(output)); \
+      }                                                                        \
+    }                                                                          \
+  }
+
+#define SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(INPUT, VALID_COUNT, MASKED)       \
+  {                                                                            \
+    if (UTF32) {                                                               \
+      if (MASKED) {                                                            \
+        const __mmask16 valid_mask = uint16_t((1 << VALID_COUNT) - 1);         \
+        _mm512_mask_storeu_epi32((__m512i *)output, valid_mask, INPUT);        \
+      } else {                                                                 \
+        _mm512_storeu_si512((__m512i *)output, INPUT);                         \
+      }                                                                        \
+      output += VALID_COUNT;                                                   \
+    } else {                                                                   \
+      if (MASKED) {                                                            \
+        output += utf32_to_utf16_masked<big_endian>(                           \
+            byteflip, INPUT, VALID_COUNT,                                      \
+            reinterpret_cast<char16_t *>(output));                             \
+      } else {                                                                 \
+        output +=                                                              \
+            utf32_to_utf16<big_endian>(byteflip, INPUT, VALID_COUNT,           \
+                                       reinterpret_cast<char16_t *>(output));  \
+      }                                                                        \
+    }                                                                          \
+  }
+
+#define SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)                       \
+  if (UTF32) {                                                                 \
+    const __m128i t0 = _mm512_castsi512_si128(utf8);                           \
+    const __m128i t1 = _mm512_extracti32x4_epi32(utf8, 1);                     \
+    const __m128i t2 = _mm512_extracti32x4_epi32(utf8, 2);                     \
+    const __m128i t3 = _mm512_extracti32x4_epi32(utf8, 3);                     \
+    _mm512_storeu_si512((__m512i *)(output + 0 * 16),                          \
+                        _mm512_cvtepu8_epi32(t0));                             \
+    _mm512_storeu_si512((__m512i *)(output + 1 * 16),                          \
+                        _mm512_cvtepu8_epi32(t1));                             \
+    _mm512_storeu_si512((__m512i *)(output + 2 * 16),                          \
+                        _mm512_cvtepu8_epi32(t2));                             \
+    _mm512_storeu_si512((__m512i *)(output + 3 * 16),                          \
+                        _mm512_cvtepu8_epi32(t3));                             \
+  } else {                                                                     \
+    const __m256i h0 = _mm512_castsi512_si256(utf8);                           \
+    const __m256i h1 = _mm512_extracti64x4_epi64(utf8, 1);                     \
+    if (big_endian) {                                                          \
+      _mm512_storeu_si512(                                                     \
+          (__m512i *)(output + 0 * 16),                                        \
+          _mm512_shuffle_epi8(_mm512_cvtepu8_epi16(h0), byteflip));            \
+      _mm512_storeu_si512(                                                     \
+          (__m512i *)(output + 2 * 16),                                        \
+          _mm512_shuffle_epi8(_mm512_cvtepu8_epi16(h1), byteflip));            \
+    } else {                                                                   \
+      _mm512_storeu_si512((__m512i *)(output + 0 * 16),                        \
+                          _mm512_cvtepu8_epi16(h0));                           \
+      _mm512_storeu_si512((__m512i *)(output + 2 * 16),                        \
+                          _mm512_cvtepu8_epi16(h1));                           \
+    }                                                                          \
+  }
 /* end file src/icelake/icelake_macros.inl.cpp */
 /* begin file src/icelake/icelake_from_valid_utf8.inl.cpp */
 // file included directly
@@ -20771,250 +20877,240 @@ simdutf_really_inline __m512i expand_utf8_to_utf32(__m512i input) {
     - pair.second   - the first unprocessed output word
 */
 template <endianness big_endian, typename OUTPUT>
-std::pair<const char*, OUTPUT*> valid_utf8_to_fixed_length(const char* str, size_t len, OUTPUT* dwords) {
-    constexpr bool UTF32 = std::is_same<OUTPUT, uint32_t>::value;
-    constexpr bool UTF16 = std::is_same<OUTPUT, char16_t>::value;
-    static_assert(UTF32 or UTF16, "output type has to be uint32_t (for UTF-32) or char16_t (for UTF-16)");
-    static_assert(!(UTF32 and big_endian), "we do not currently support big-endian UTF-32");
+std::pair<const char *, OUTPUT *>
+valid_utf8_to_fixed_length(const char *str, size_t len, OUTPUT *dwords) {
+  constexpr bool UTF32 = std::is_same<OUTPUT, uint32_t>::value;
+  constexpr bool UTF16 = std::is_same<OUTPUT, char16_t>::value;
+  static_assert(
+      UTF32 or UTF16,
+      "output type has to be uint32_t (for UTF-32) or char16_t (for UTF-16)");
+  static_assert(!(UTF32 and big_endian),
+                "we do not currently support big-endian UTF-32");
 
-    __m512i byteflip = _mm512_setr_epi64(
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809
-        );
-    const char* ptr = str;
-    const char* end = ptr + len;
-
-    OUTPUT* output = dwords;
-    /**
-     * In the main loop, we consume 64 bytes per iteration,
-     * but we access 64 + 4 bytes.
-     * We check for ptr + 64 + 64 <= end because
-     * we want to be do maskless writes without overruns.
-     */
-    while (end - ptr  >= 64 + 4) {
-        const __m512i utf8 = _mm512_loadu_si512((const __m512i*)ptr);
-        const __m512i v_80 = _mm512_set1_epi8(char(0x80));
-        const __mmask64 ascii = _mm512_test_epi8_mask(utf8, v_80);
-        if(ascii == 0) {
-            SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
-            output += 64;
-            ptr += 64;
-            continue;
-        }
+  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  const char *ptr = str;
+  const char *end = ptr + len;
 
-        const __m512i lane0 = broadcast_epi128<0>(utf8);
-        const __m512i lane1 = broadcast_epi128<1>(utf8);
-        int valid_count0;
-        __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
-        const __m512i lane2 = broadcast_epi128<2>(utf8);
-        int valid_count1;
-        __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
-        if(valid_count0 + valid_count1 <= 16) {
-            vec0 = _mm512_mask_expand_epi32(vec0, __mmask16(((1<<valid_count1)-1)<<valid_count0), vec1);
-            valid_count0 += valid_count1;
-            vec0 = expand_utf8_to_utf32(vec0);
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-        } else {
-            vec0 = expand_utf8_to_utf32(vec0);
-            vec1 = expand_utf8_to_utf32(vec1);
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
-        }
-        const __m512i lane3 = broadcast_epi128<3>(utf8);
-        int valid_count2;
-        __m512i vec2 = expand_and_identify(lane2, lane3, valid_count2);
-        uint32_t tmp1;
-        ::memcpy(&tmp1, ptr + 64, sizeof(tmp1));
-        const __m512i lane4 = _mm512_set1_epi32(tmp1);
-        int valid_count3;
-        __m512i vec3 = expand_and_identify(lane3, lane4, valid_count3);
-        if(valid_count2 + valid_count3 <= 16) {
-            vec2 = _mm512_mask_expand_epi32(vec2, __mmask16(((1<<valid_count3)-1)<<valid_count2), vec3);
-            valid_count2 += valid_count3;
-            vec2 = expand_utf8_to_utf32(vec2);
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
-        } else {
-            vec2 = expand_utf8_to_utf32(vec2);
-            vec3 = expand_utf8_to_utf32(vec3);
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec3, valid_count3, true)
-        }
-        ptr += 4*16;
+  OUTPUT *output = dwords;
+  /**
+   * In the main loop, we consume 64 bytes per iteration,
+   * but we access 64 + 4 bytes.
+   * We check for ptr + 64 + 4 <= end so that the 4-byte
+   * read past the block (at ptr + 64) stays in bounds.
+   */
+  while (end - ptr >= 64 + 4) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    const __m512i v_80 = _mm512_set1_epi8(char(0x80));
+    const __mmask64 ascii = _mm512_test_epi8_mask(utf8, v_80);
+    if (ascii == 0) {
+      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
+      output += 64;
+      ptr += 64;
+      continue;
     }
 
-    if (end - ptr >= 64) {
-        const __m512i utf8 = _mm512_loadu_si512((const __m512i*)ptr);
-        const __m512i v_80 = _mm512_set1_epi8(char(0x80));
-        const __mmask64 ascii = _mm512_test_epi8_mask(utf8, v_80);
-        if(ascii == 0) {
-            SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
-            output += 64;
-            ptr += 64;
-        } else {
-            const __m512i lane0 = broadcast_epi128<0>(utf8);
-            const __m512i lane1 = broadcast_epi128<1>(utf8);
-            int valid_count0;
-            __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
-            const __m512i lane2 = broadcast_epi128<2>(utf8);
-            int valid_count1;
-            __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
-            if(valid_count0 + valid_count1 <= 16) {
-                vec0 = _mm512_mask_expand_epi32(vec0, __mmask16(((1<<valid_count1)-1)<<valid_count0), vec1);
-                valid_count0 += valid_count1;
-                vec0 = expand_utf8_to_utf32(vec0);
-                SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-            } else {
-                vec0 = expand_utf8_to_utf32(vec0);
-                vec1 = expand_utf8_to_utf32(vec1);
-                SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-                SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
-            }
-
-            const __m512i lane3 = broadcast_epi128<3>(utf8);
-            SIMDUTF_ICELAKE_TRANSCODE16(lane2, lane3, true)
-
-            ptr += 3*16;
-        }
+    const __m512i lane0 = broadcast_epi128<0>(utf8);
+    const __m512i lane1 = broadcast_epi128<1>(utf8);
+    int valid_count0;
+    __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
+    const __m512i lane2 = broadcast_epi128<2>(utf8);
+    int valid_count1;
+    __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
+    if (valid_count0 + valid_count1 <= 16) {
+      vec0 = _mm512_mask_expand_epi32(
+          vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
+      valid_count0 += valid_count1;
+      vec0 = expand_utf8_to_utf32(vec0);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+    } else {
+      vec0 = expand_utf8_to_utf32(vec0);
+      vec1 = expand_utf8_to_utf32(vec1);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
+    }
+    const __m512i lane3 = broadcast_epi128<3>(utf8);
+    int valid_count2;
+    __m512i vec2 = expand_and_identify(lane2, lane3, valid_count2);
+    uint32_t tmp1;
+    ::memcpy(&tmp1, ptr + 64, sizeof(tmp1));
+    const __m512i lane4 = _mm512_set1_epi32(tmp1);
+    int valid_count3;
+    __m512i vec3 = expand_and_identify(lane3, lane4, valid_count3);
+    if (valid_count2 + valid_count3 <= 16) {
+      vec2 = _mm512_mask_expand_epi32(
+          vec2, __mmask16(((1 << valid_count3) - 1) << valid_count2), vec3);
+      valid_count2 += valid_count3;
+      vec2 = expand_utf8_to_utf32(vec2);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
+    } else {
+      vec2 = expand_utf8_to_utf32(vec2);
+      vec3 = expand_utf8_to_utf32(vec3);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec3, valid_count3, true)
     }
-    return {ptr, output};
-}
+    ptr += 4 * 16;
+  }
+
+  if (end - ptr >= 64) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    const __m512i v_80 = _mm512_set1_epi8(char(0x80));
+    const __mmask64 ascii = _mm512_test_epi8_mask(utf8, v_80);
+    if (ascii == 0) {
+      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
+      output += 64;
+      ptr += 64;
+    } else {
+      const __m512i lane0 = broadcast_epi128<0>(utf8);
+      const __m512i lane1 = broadcast_epi128<1>(utf8);
+      int valid_count0;
+      __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
+      const __m512i lane2 = broadcast_epi128<2>(utf8);
+      int valid_count1;
+      __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
+      if (valid_count0 + valid_count1 <= 16) {
+        vec0 = _mm512_mask_expand_epi32(
+            vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
+        valid_count0 += valid_count1;
+        vec0 = expand_utf8_to_utf32(vec0);
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+      } else {
+        vec0 = expand_utf8_to_utf32(vec0);
+        vec1 = expand_utf8_to_utf32(vec1);
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
+      }
+
+      const __m512i lane3 = broadcast_epi128<3>(utf8);
+      SIMDUTF_ICELAKE_TRANSCODE16(lane2, lane3, true)
 
+      ptr += 3 * 16;
+    }
+  }
+  return {ptr, output};
+}
 
-using utf8_to_utf16_result = std::pair<const char*, char16_t*>;
+using utf8_to_utf16_result = std::pair<const char *, char16_t *>;
 /* end file src/icelake/icelake_from_valid_utf8.inl.cpp */
 /* begin file src/icelake/icelake_utf8_validation.inl.cpp */
 // file included directly
 
+simdutf_really_inline __m512i check_special_cases(__m512i input,
+                                                  const __m512i prev1) {
+  __m512i mask1 = _mm512_setr_epi64(0x0202020202020202, 0x4915012180808080,
+                                    0x0202020202020202, 0x4915012180808080,
+                                    0x0202020202020202, 0x4915012180808080,
+                                    0x0202020202020202, 0x4915012180808080);
+  const __m512i v_0f = _mm512_set1_epi8(0x0f);
+  __m512i index1 = _mm512_and_si512(_mm512_srli_epi16(prev1, 4), v_0f);
+
+  __m512i byte_1_high = _mm512_shuffle_epi8(mask1, index1);
+  __m512i mask2 = _mm512_setr_epi64(0xcbcbcb8b8383a3e7, 0xcbcbdbcbcbcbcbcb,
+                                    0xcbcbcb8b8383a3e7, 0xcbcbdbcbcbcbcbcb,
+                                    0xcbcbcb8b8383a3e7, 0xcbcbdbcbcbcbcbcb,
+                                    0xcbcbcb8b8383a3e7, 0xcbcbdbcbcbcbcbcb);
+  __m512i index2 = _mm512_and_si512(prev1, v_0f);
+
+  __m512i byte_1_low = _mm512_shuffle_epi8(mask2, index2);
+  __m512i mask3 =
+      _mm512_setr_epi64(0x101010101010101, 0x1010101babaaee6, 0x101010101010101,
+                        0x1010101babaaee6, 0x101010101010101, 0x1010101babaaee6,
+                        0x101010101010101, 0x1010101babaaee6);
+  __m512i index3 = _mm512_and_si512(_mm512_srli_epi16(input, 4), v_0f);
+  __m512i byte_2_high = _mm512_shuffle_epi8(mask3, index3);
+  return _mm512_ternarylogic_epi64(byte_1_high, byte_1_low, byte_2_high, 128);
+}
+
+simdutf_really_inline __m512i check_multibyte_lengths(const __m512i input,
+                                                      const __m512i prev_input,
+                                                      const __m512i sc) {
+  __m512i prev2 = prev<2>(input, prev_input);
+  __m512i prev3 = prev<3>(input, prev_input);
+  __m512i is_third_byte = _mm512_subs_epu8(
+      prev2, _mm512_set1_epi8(0b11100000u - 1)); // Only 111_____ will be > 0
+  __m512i is_fourth_byte = _mm512_subs_epu8(
+      prev3, _mm512_set1_epi8(0b11110000u - 1)); // Only 1111____ will be > 0
+  __m512i is_third_or_fourth_byte =
+      _mm512_or_si512(is_third_byte, is_fourth_byte);
+  const __m512i v_7f = _mm512_set1_epi8(char(0x7f));
+  is_third_or_fourth_byte = _mm512_adds_epu8(v_7f, is_third_or_fourth_byte);
+  // We want to compute (is_third_or_fourth_byte AND v80) XOR sc.
+  const __m512i v_80 = _mm512_set1_epi8(char(0x80));
+  return _mm512_ternarylogic_epi32(is_third_or_fourth_byte, v_80, sc,
+                                   0b1101010);
+  // __m512i is_third_or_fourth_byte_mask =
+  //     _mm512_and_si512(is_third_or_fourth_byte, v_80);
+  // return _mm512_xor_si512(is_third_or_fourth_byte_mask, sc);
+}
+//
+// Return nonzero if there are incomplete multibyte characters at the end of the
+// block: e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+//
+simdutf_really_inline __m512i is_incomplete(const __m512i input) {
+  // If the previous input's last 3 bytes match this, they're too short (they
+  // ended at EOF):
+  // ... 1111____ 111_____ 11______
+  __m512i max_value = _mm512_setr_epi64(0xffffffffffffffff, 0xffffffffffffffff,
+                                        0xffffffffffffffff, 0xffffffffffffffff,
+                                        0xffffffffffffffff, 0xffffffffffffffff,
+                                        0xffffffffffffffff, 0xbfdfefffffffffff);
+  return _mm512_subs_epu8(input, max_value);
+}
+
+struct avx512_utf8_checker {
+  // If this is nonzero, there has been a UTF-8 error.
+  __m512i error{};
+
+  // The last input we received
+  __m512i prev_input_block{};
+  // Whether the last input we received was incomplete (used for ASCII fast
+  // path)
+  __m512i prev_incomplete{};
 
-simdutf_really_inline __m512i check_special_cases(__m512i input, const __m512i prev1) {
-  __m512i mask1 = _mm512_setr_epi64(
-        0x0202020202020202,
-        0x4915012180808080,
-        0x0202020202020202,
-        0x4915012180808080,
-        0x0202020202020202,
-        0x4915012180808080,
-        0x0202020202020202,
-        0x4915012180808080);
-    const __m512i v_0f = _mm512_set1_epi8(0x0f);
-    __m512i index1 = _mm512_and_si512(_mm512_srli_epi16(prev1, 4), v_0f);
-
-    __m512i byte_1_high = _mm512_shuffle_epi8(mask1, index1);
-    __m512i mask2 = _mm512_setr_epi64(
-        0xcbcbcb8b8383a3e7,
-        0xcbcbdbcbcbcbcbcb,
-        0xcbcbcb8b8383a3e7,
-        0xcbcbdbcbcbcbcbcb,
-        0xcbcbcb8b8383a3e7,
-        0xcbcbdbcbcbcbcbcb,
-        0xcbcbcb8b8383a3e7,
-        0xcbcbdbcbcbcbcbcb);
-     __m512i index2 = _mm512_and_si512(prev1, v_0f);
-
-    __m512i byte_1_low = _mm512_shuffle_epi8(mask2, index2);
-    __m512i mask3 = _mm512_setr_epi64(
-        0x101010101010101,
-        0x1010101babaaee6,
-        0x101010101010101,
-        0x1010101babaaee6,
-        0x101010101010101,
-        0x1010101babaaee6,
-        0x101010101010101,
-        0x1010101babaaee6
-    );
-    __m512i index3 = _mm512_and_si512(_mm512_srli_epi16(input, 4), v_0f);
-    __m512i byte_2_high = _mm512_shuffle_epi8(mask3, index3);
-    return _mm512_ternarylogic_epi64(byte_1_high, byte_1_low, byte_2_high, 128);
-  }
-
-  simdutf_really_inline __m512i check_multibyte_lengths(const __m512i input,
-      const __m512i prev_input, const __m512i sc) {
-    __m512i prev2 = prev<2>(input, prev_input);
-    __m512i prev3 = prev<3>(input, prev_input);
-    __m512i is_third_byte  = _mm512_subs_epu8(prev2, _mm512_set1_epi8(0b11100000u-1)); // Only 111_____ will be > 0
-    __m512i is_fourth_byte  = _mm512_subs_epu8(prev3, _mm512_set1_epi8(0b11110000u-1)); // Only 1111____ will be > 0
-    __m512i is_third_or_fourth_byte = _mm512_or_si512(is_third_byte, is_fourth_byte);
-    const __m512i v_7f = _mm512_set1_epi8(char(0x7f));
-    is_third_or_fourth_byte = _mm512_adds_epu8(v_7f, is_third_or_fourth_byte);
-    // We want to compute (is_third_or_fourth_byte AND v80) XOR sc.
-    const __m512i v_80 = _mm512_set1_epi8(char(0x80));
-    return _mm512_ternarylogic_epi32(is_third_or_fourth_byte, v_80, sc, 0b1101010);
-    //__m512i is_third_or_fourth_byte_mask = _mm512_and_si512(is_third_or_fourth_byte, v_80);
-    //return _mm512_xor_si512(is_third_or_fourth_byte_mask, sc);
-  }
   //
-  // Return nonzero if there are incomplete multibyte characters at the end of the block:
-  // e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+  // Check whether the current bytes are valid UTF-8.
   //
-  simdutf_really_inline __m512i is_incomplete(const __m512i input) {
-    // If the previous input's last 3 bytes match this, they're too short (they ended at EOF):
-    // ... 1111____ 111_____ 11______
-    __m512i max_value = _mm512_setr_epi64(
-        0xffffffffffffffff,
-        0xffffffffffffffff,
-        0xffffffffffffffff,
-        0xffffffffffffffff,
-        0xffffffffffffffff,
-        0xffffffffffffffff,
-        0xffffffffffffffff,
-        0xbfdfefffffffffff);
-    return _mm512_subs_epu8(input, max_value);
-  }
-
-  struct avx512_utf8_checker {
-    // If this is nonzero, there has been a UTF-8 error.
-    __m512i error{};
-
-    // The last input we received
-    __m512i prev_input_block{};
-    // Whether the last input we received was incomplete (used for ASCII fast path)
-    __m512i prev_incomplete{};
-
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const __m512i input, const __m512i prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      __m512i prev1 = prev<1>(input, prev_input);
-      __m512i sc = check_special_cases(input, prev1);
-      this->error = _mm512_or_si512(check_multibyte_lengths(input, prev_input, sc), this->error);
-    }
-
-    // The only problem that can happen at EOF is that a multibyte character is too short
-    // or a byte value too large in the last bytes: check_special_cases only checks for bytes
-    // too large in the first of two bytes.
-    simdutf_really_inline void check_eof() {
-      // If the previous block had incomplete UTF-8 characters at the end, an ASCII block can't
-      // possibly finish them.
+  simdutf_really_inline void check_utf8_bytes(const __m512i input,
+                                              const __m512i prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    __m512i prev1 = prev<1>(input, prev_input);
+    __m512i sc = check_special_cases(input, prev1);
+    this->error = _mm512_or_si512(
+        check_multibyte_lengths(input, prev_input, sc), this->error);
+  }
+
+  // The only problem that can happen at EOF is that a multibyte character is
+  // too short or a byte value too large in the last bytes: check_special_cases
+  // only checks for bytes too large in the first of two bytes.
+  simdutf_really_inline void check_eof() {
+    // If the previous block had incomplete UTF-8 characters at the end, an
+    // ASCII block can't possibly finish them.
+    this->error = _mm512_or_si512(this->error, this->prev_incomplete);
+  }
+
+  // returns true if ASCII.
+  simdutf_really_inline bool check_next_input(const __m512i input) {
+    const __m512i v_80 = _mm512_set1_epi8(char(0x80));
+    const __mmask64 ascii = _mm512_test_epi8_mask(input, v_80);
+    if (ascii == 0) {
       this->error = _mm512_or_si512(this->error, this->prev_incomplete);
+      return true;
+    } else {
+      this->check_utf8_bytes(input, this->prev_input_block);
+      this->prev_incomplete = is_incomplete(input);
+      this->prev_input_block = input;
+      return false;
     }
-
-    // returns true if ASCII.
-    simdutf_really_inline bool check_next_input(const __m512i input) {
-      const __m512i v_80 = _mm512_set1_epi8(char(0x80));
-      const __mmask64 ascii = _mm512_test_epi8_mask(input, v_80);
-      if(ascii == 0) {
-        this->error = _mm512_or_si512(this->error, this->prev_incomplete);
-        return true;
-      } else {
-        this->check_utf8_bytes(input, this->prev_input_block);
-        this->prev_incomplete = is_incomplete(input);
-        this->prev_input_block = input;
-        return false;
-      }
-    }
-    // do not forget to call check_eof!
-    simdutf_really_inline bool errors() const {
-        return _mm512_test_epi8_mask(this->error, this->error) != 0;
-    }
-  }; // struct avx512_utf8_checker
+  }
+  // do not forget to call check_eof!
+  simdutf_really_inline bool errors() const {
+    return _mm512_test_epi8_mask(this->error, this->error) != 0;
+  }
+}; // struct avx512_utf8_checker
 /* end file src/icelake/icelake_utf8_validation.inl.cpp */
 /* begin file src/icelake/icelake_from_utf8.inl.cpp */
 // file included directly
@@ -21029,305 +21125,331 @@ simdutf_really_inline __m512i check_special_cases(__m512i input, const __m512i p
  */
 
 template <endianness big_endian>
-utf8_to_utf16_result fast_avx512_convert_utf8_to_utf16(const char *in, size_t len, char16_t *out) {
+utf8_to_utf16_result
+fast_avx512_convert_utf8_to_utf16(const char *in, size_t len, char16_t *out) {
   const char *const final_in = in + len;
   bool result = true;
   while (result) {
-    if (final_in - in >= 64 ) {
-        result = process_block_utf8_to_utf16<SIMDUTF_FULL, big_endian>(in, out, final_in - in);
-    } else if(in < final_in) {
-        result = process_block_utf8_to_utf16<SIMDUTF_TAIL, big_endian>(in, out, final_in - in);
-    } else { break; }
+    if (final_in - in >= 64) {
+      result = process_block_utf8_to_utf16<SIMDUTF_FULL, big_endian>(
+          in, out, final_in - in);
+    } else if (in < final_in) {
+      result = process_block_utf8_to_utf16<SIMDUTF_TAIL, big_endian>(
+          in, out, final_in - in);
+    } else {
+      break;
+    }
+  }
+  if (!result) {
+    out = nullptr;
   }
-  if(!result) { out = nullptr; }
   return std::make_pair(in, out);
 }
 
 template <endianness big_endian>
-simdutf::result fast_avx512_convert_utf8_to_utf16_with_errors(const char *in, size_t len, char16_t *out) {
+simdutf::result fast_avx512_convert_utf8_to_utf16_with_errors(const char *in,
+                                                              size_t len,
+                                                              char16_t *out) {
   const char *const init_in = in;
   const char16_t *const init_out = out;
   const char *const final_in = in + len;
-  bool  result = true;
+  bool result = true;
   while (result) {
-    if (final_in - in >= 64 ) {
-        result = process_block_utf8_to_utf16<SIMDUTF_FULL, big_endian>(in, out, final_in - in);
-    } else if(in < final_in) {
-        result = process_block_utf8_to_utf16<SIMDUTF_TAIL, big_endian>(in, out, final_in - in);
-    } else { break; }
+    if (final_in - in >= 64) {
+      result = process_block_utf8_to_utf16<SIMDUTF_FULL, big_endian>(
+          in, out, final_in - in);
+    } else if (in < final_in) {
+      result = process_block_utf8_to_utf16<SIMDUTF_TAIL, big_endian>(
+          in, out, final_in - in);
+    } else {
+      break;
+    }
   }
-  if(!result) {
+  if (!result) {
     size_t pos = size_t(in - init_in);
     if (pos < len && (init_in[pos] & 0xc0) == 0x80 && pos >= 64) {
       // We must check whether we are the fourth continuation byte
       bool c1 = (init_in[pos - 1] & 0xc0) == 0x80;
       bool c2 = (init_in[pos - 2] & 0xc0) == 0x80;
       bool c3 = (init_in[pos - 3] & 0xc0) == 0x80;
-      if(c1 && c2 && c3) {
+      if (c1 && c2 && c3) {
         return {simdutf::TOO_LONG, pos};
       }
     }
-    // rewind_and_convert_with_errors will seek a potential error from in onward,
-    // with the ability to go back up to in - init_in bytes, and read final_in - in bytes forward.
-    simdutf::result res = scalar::utf8_to_utf16::rewind_and_convert_with_errors<big_endian>(in - init_in, in, final_in - in, out);
+    // rewind_and_convert_with_errors will seek a potential error from in
+    // onward, with the ability to go back up to in - init_in bytes, and read
+    // final_in - in bytes forward.
+    simdutf::result res =
+        scalar::utf8_to_utf16::rewind_and_convert_with_errors<big_endian>(
+            in - init_in, in, final_in - in, out);
     res.count += (in - init_in);
     return res;
   } else {
-    return simdutf::result(error_code::SUCCESS,out - init_out);
+    return simdutf::result(error_code::SUCCESS, out - init_out);
   }
 }
 
-
 template <endianness big_endian, typename OUTPUT>
-// todo: replace with the utf-8 to utf-16 routine adapted to utf-32. This code is legacy.
-std::pair<const char*, OUTPUT*> validating_utf8_to_fixed_length(const char* str, size_t len, OUTPUT* dwords) {
-    constexpr bool UTF32 = std::is_same<OUTPUT, uint32_t>::value;
-    constexpr bool UTF16 = std::is_same<OUTPUT, char16_t>::value;
-    static_assert(UTF32 or UTF16, "output type has to be uint32_t (for UTF-32) or char16_t (for UTF-16)");
-    static_assert(!(UTF32 and big_endian), "we do not currently support big-endian UTF-32");
-
-    const char* ptr = str;
-    const char* end = ptr + len;
-    __m512i byteflip = _mm512_setr_epi64(
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809
-        );
-    OUTPUT* output = dwords;
-    avx512_utf8_checker checker{};
-    /**
-     * In the main loop, we consume 64 bytes per iteration,
-     * but we access 64 + 4 bytes.
-     * We use masked writes to avoid overruns, see https://github.com/simdutf/simdutf/issues/471
-     */
-    while (end - ptr >= 64 + 4) {
-        const __m512i utf8 = _mm512_loadu_si512((const __m512i*)ptr);
-        if(checker.check_next_input(utf8)) {
-            SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
-            output += 64;
-            ptr += 64;
-            continue;
-        }
-        const __m512i lane0 = broadcast_epi128<0>(utf8);
-        const __m512i lane1 = broadcast_epi128<1>(utf8);
-        int valid_count0;
-        __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
-        const __m512i lane2 = broadcast_epi128<2>(utf8);
-        int valid_count1;
-        __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
-        if(valid_count0 + valid_count1 <= 16) {
-            vec0 = _mm512_mask_expand_epi32(vec0, __mmask16(((1<<valid_count1)-1)<<valid_count0), vec1);
-            valid_count0 += valid_count1;
-            vec0 = expand_utf8_to_utf32(vec0);
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-        } else {
-            vec0 = expand_utf8_to_utf32(vec0);
-            vec1 = expand_utf8_to_utf32(vec1);
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
-        }
-        const __m512i lane3 = broadcast_epi128<3>(utf8);
-        int valid_count2;
-        __m512i vec2 = expand_and_identify(lane2, lane3, valid_count2);
-        uint32_t tmp1;
-        ::memcpy(&tmp1, ptr + 64, sizeof(tmp1));
-        const __m512i lane4 = _mm512_set1_epi32(tmp1);
-        int valid_count3;
-        __m512i vec3 = expand_and_identify(lane3, lane4, valid_count3);
-        if(valid_count2 + valid_count3 <= 16) {
-            vec2 = _mm512_mask_expand_epi32(vec2, __mmask16(((1<<valid_count3)-1)<<valid_count2), vec3);
-            valid_count2 += valid_count3;
-            vec2 = expand_utf8_to_utf32(vec2);
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
-        } else {
-            vec2 = expand_utf8_to_utf32(vec2);
-            vec3 = expand_utf8_to_utf32(vec3);
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec3, valid_count3, true)
-        }
-        ptr += 4*16;
-    }
-    const char* validatedptr = ptr; // validated up to ptr
-
-    // For the final pass, we validate 64 bytes, but we only transcode
-    // 3*16 bytes, so we may end up double-validating 16 bytes.
-    if (end - ptr >= 64) {
-        const __m512i utf8 = _mm512_loadu_si512((const __m512i*)ptr);
-        if(checker.check_next_input(utf8)) {
-            SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
-            output += 64;
-            ptr += 64;
-        } else {
-            const __m512i lane0 = broadcast_epi128<0>(utf8);
-            const __m512i lane1 = broadcast_epi128<1>(utf8);
-            int valid_count0;
-            __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
-            const __m512i lane2 = broadcast_epi128<2>(utf8);
-            int valid_count1;
-            __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
-            if(valid_count0 + valid_count1 <= 16) {
-                vec0 = _mm512_mask_expand_epi32(vec0, __mmask16(((1<<valid_count1)-1)<<valid_count0), vec1);
-                valid_count0 += valid_count1;
-                vec0 = expand_utf8_to_utf32(vec0);
-                SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-            } else {
-                vec0 = expand_utf8_to_utf32(vec0);
-                vec1 = expand_utf8_to_utf32(vec1);
-                SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-                SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
-            }
-
-            const __m512i lane3 = broadcast_epi128<3>(utf8);
-            SIMDUTF_ICELAKE_TRANSCODE16(lane2, lane3, true)
-
-            ptr += 3*16;
-        }
-        validatedptr += 4*16;
-    }
-    if (end != validatedptr) {
-       const __m512i utf8 = _mm512_maskz_loadu_epi8(~UINT64_C(0) >> (64 - (end - validatedptr)), (const __m512i*)validatedptr);
-       checker.check_next_input(utf8);
+// todo: replace with the utf-8 to utf-16 routine adapted to utf-32. This code
+// is legacy.
+std::pair<const char *, OUTPUT *>
+validating_utf8_to_fixed_length(const char *str, size_t len, OUTPUT *dwords) {
+  constexpr bool UTF32 = std::is_same<OUTPUT, uint32_t>::value;
+  constexpr bool UTF16 = std::is_same<OUTPUT, char16_t>::value;
+  static_assert(
+      UTF32 or UTF16,
+      "output type has to be uint32_t (for UTF-32) or char16_t (for UTF-16)");
+  static_assert(!(UTF32 and big_endian),
+                "we do not currently support big-endian UTF-32");
+
+  const char *ptr = str;
+  const char *end = ptr + len;
+  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  OUTPUT *output = dwords;
+  avx512_utf8_checker checker{};
+  /**
+   * In the main loop, we consume 64 bytes per iteration,
+   * but we access 64 + 4 bytes.
+   * We use masked writes to avoid overruns, see
+   * https://github.com/simdutf/simdutf/issues/471
+   */
+  while (end - ptr >= 64 + 4) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    if (checker.check_next_input(utf8)) {
+      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
+      output += 64;
+      ptr += 64;
+      continue;
     }
-    checker.check_eof();
-    if(checker.errors()) {
-        return {ptr, nullptr}; // We found an error.
+    const __m512i lane0 = broadcast_epi128<0>(utf8);
+    const __m512i lane1 = broadcast_epi128<1>(utf8);
+    int valid_count0;
+    __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
+    const __m512i lane2 = broadcast_epi128<2>(utf8);
+    int valid_count1;
+    __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
+    if (valid_count0 + valid_count1 <= 16) {
+      vec0 = _mm512_mask_expand_epi32(
+          vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
+      valid_count0 += valid_count1;
+      vec0 = expand_utf8_to_utf32(vec0);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+    } else {
+      vec0 = expand_utf8_to_utf32(vec0);
+      vec1 = expand_utf8_to_utf32(vec1);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
+    }
+    const __m512i lane3 = broadcast_epi128<3>(utf8);
+    int valid_count2;
+    __m512i vec2 = expand_and_identify(lane2, lane3, valid_count2);
+    uint32_t tmp1;
+    ::memcpy(&tmp1, ptr + 64, sizeof(tmp1));
+    const __m512i lane4 = _mm512_set1_epi32(tmp1);
+    int valid_count3;
+    __m512i vec3 = expand_and_identify(lane3, lane4, valid_count3);
+    if (valid_count2 + valid_count3 <= 16) {
+      vec2 = _mm512_mask_expand_epi32(
+          vec2, __mmask16(((1 << valid_count3) - 1) << valid_count2), vec3);
+      valid_count2 += valid_count3;
+      vec2 = expand_utf8_to_utf32(vec2);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
+    } else {
+      vec2 = expand_utf8_to_utf32(vec2);
+      vec3 = expand_utf8_to_utf32(vec3);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec3, valid_count3, true)
+    }
+    ptr += 4 * 16;
+  }
+  const char *validatedptr = ptr; // validated up to ptr
+
+  // For the final pass, we validate 64 bytes, but we only transcode
+  // 3*16 bytes, so we may end up double-validating 16 bytes.
+  if (end - ptr >= 64) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    if (checker.check_next_input(utf8)) {
+      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
+      output += 64;
+      ptr += 64;
+    } else {
+      const __m512i lane0 = broadcast_epi128<0>(utf8);
+      const __m512i lane1 = broadcast_epi128<1>(utf8);
+      int valid_count0;
+      __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
+      const __m512i lane2 = broadcast_epi128<2>(utf8);
+      int valid_count1;
+      __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
+      if (valid_count0 + valid_count1 <= 16) {
+        vec0 = _mm512_mask_expand_epi32(
+            vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
+        valid_count0 += valid_count1;
+        vec0 = expand_utf8_to_utf32(vec0);
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+      } else {
+        vec0 = expand_utf8_to_utf32(vec0);
+        vec1 = expand_utf8_to_utf32(vec1);
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
+      }
+
+      const __m512i lane3 = broadcast_epi128<3>(utf8);
+      SIMDUTF_ICELAKE_TRANSCODE16(lane2, lane3, true)
+
+      ptr += 3 * 16;
     }
-    return {ptr, output};
+    validatedptr += 4 * 16;
+  }
+  if (end != validatedptr) {
+    const __m512i utf8 =
+        _mm512_maskz_loadu_epi8(~UINT64_C(0) >> (64 - (end - validatedptr)),
+                                (const __m512i *)validatedptr);
+    checker.check_next_input(utf8);
+  }
+  checker.check_eof();
+  if (checker.errors()) {
+    return {ptr, nullptr}; // We found an error.
+  }
+  return {ptr, output};
 }
 
-// Like validating_utf8_to_fixed_length but returns as soon as an error is identified
-// todo: replace with the utf-8 to utf-16 routine adapted to utf-32. This code is legacy.
+// Like validating_utf8_to_fixed_length but returns as soon as an error is
+// identified. todo: replace with the utf-8 to utf-16 routine adapted to
+// utf-32. This code is legacy.
 template <endianness big_endian, typename OUTPUT>
-std::tuple<const char*, OUTPUT*, bool> validating_utf8_to_fixed_length_with_constant_checks(const char* str, size_t len, OUTPUT* dwords) {
-    constexpr bool UTF32 = std::is_same<OUTPUT, uint32_t>::value;
-    constexpr bool UTF16 = std::is_same<OUTPUT, char16_t>::value;
-    static_assert(UTF32 or UTF16, "output type has to be uint32_t (for UTF-32) or char16_t (for UTF-16)");
-    static_assert(!(UTF32 and big_endian), "we do not currently support big-endian UTF-32");
-
-    const char* ptr = str;
-    const char* end = ptr + len;
-    __m512i byteflip = _mm512_setr_epi64(
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809
-        );
-    OUTPUT* output = dwords;
-    avx512_utf8_checker checker{};
-    /**
-     * In the main loop, we consume 64 bytes per iteration,
-     * but we access 64 + 4 bytes.
-     */
-    while (end - ptr >= 4 + 64) {
-        const __m512i utf8 = _mm512_loadu_si512((const __m512i*)ptr);
-        bool ascii = checker.check_next_input(utf8);
-        if(checker.errors()) {
-            return {ptr, output, false}; // We found an error.
-        }
-        if(ascii) {
-            SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
-            output += 64;
-            ptr += 64;
-            continue;
-        }
-        const __m512i lane0 = broadcast_epi128<0>(utf8);
-        const __m512i lane1 = broadcast_epi128<1>(utf8);
-        int valid_count0;
-        __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
-        const __m512i lane2 = broadcast_epi128<2>(utf8);
-        int valid_count1;
-        __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
-        if(valid_count0 + valid_count1 <= 16) {
-            vec0 = _mm512_mask_expand_epi32(vec0, __mmask16(((1<<valid_count1)-1)<<valid_count0), vec1);
-            valid_count0 += valid_count1;
-            vec0 = expand_utf8_to_utf32(vec0);
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-        } else {
-            vec0 = expand_utf8_to_utf32(vec0);
-            vec1 = expand_utf8_to_utf32(vec1);
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
-        }
-        const __m512i lane3 = broadcast_epi128<3>(utf8);
-        int valid_count2;
-        __m512i vec2 = expand_and_identify(lane2, lane3, valid_count2);
-        uint32_t tmp1;
-        ::memcpy(&tmp1, ptr + 64, sizeof(tmp1));
-        const __m512i lane4 = _mm512_set1_epi32(tmp1);
-        int valid_count3;
-        __m512i vec3 = expand_and_identify(lane3, lane4, valid_count3);
-        if(valid_count2 + valid_count3 <= 16) {
-            vec2 = _mm512_mask_expand_epi32(vec2, __mmask16(((1<<valid_count3)-1)<<valid_count2), vec3);
-            valid_count2 += valid_count3;
-            vec2 = expand_utf8_to_utf32(vec2);
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
-        } else {
-            vec2 = expand_utf8_to_utf32(vec2);
-            vec3 = expand_utf8_to_utf32(vec3);
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
-            SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec3, valid_count3, true)
-        }
-        ptr += 4*16;
+std::tuple<const char *, OUTPUT *, bool>
+validating_utf8_to_fixed_length_with_constant_checks(const char *str,
+                                                     size_t len,
+                                                     OUTPUT *dwords) {
+  constexpr bool UTF32 = std::is_same<OUTPUT, uint32_t>::value;
+  constexpr bool UTF16 = std::is_same<OUTPUT, char16_t>::value;
+  static_assert(
+      UTF32 or UTF16,
+      "output type has to be uint32_t (for UTF-32) or char16_t (for UTF-16)");
+  static_assert(!(UTF32 and big_endian),
+                "we do not currently support big-endian UTF-32");
+
+  const char *ptr = str;
+  const char *end = ptr + len;
+  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  OUTPUT *output = dwords;
+  avx512_utf8_checker checker{};
+  /**
+   * In the main loop, we consume 64 bytes per iteration,
+   * but we access 64 + 4 bytes.
+   */
+  while (end - ptr >= 4 + 64) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    bool ascii = checker.check_next_input(utf8);
+    if (checker.errors()) {
+      return {ptr, output, false}; // We found an error.
+    }
+    if (ascii) {
+      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
+      output += 64;
+      ptr += 64;
+      continue;
     }
-    const char* validatedptr = ptr; // validated up to ptr
+    const __m512i lane0 = broadcast_epi128<0>(utf8);
+    const __m512i lane1 = broadcast_epi128<1>(utf8);
+    int valid_count0;
+    __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
+    const __m512i lane2 = broadcast_epi128<2>(utf8);
+    int valid_count1;
+    __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
+    if (valid_count0 + valid_count1 <= 16) {
+      vec0 = _mm512_mask_expand_epi32(
+          vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
+      valid_count0 += valid_count1;
+      vec0 = expand_utf8_to_utf32(vec0);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+    } else {
+      vec0 = expand_utf8_to_utf32(vec0);
+      vec1 = expand_utf8_to_utf32(vec1);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
+    }
+    const __m512i lane3 = broadcast_epi128<3>(utf8);
+    int valid_count2;
+    __m512i vec2 = expand_and_identify(lane2, lane3, valid_count2);
+    uint32_t tmp1;
+    ::memcpy(&tmp1, ptr + 64, sizeof(tmp1));
+    const __m512i lane4 = _mm512_set1_epi32(tmp1);
+    int valid_count3;
+    __m512i vec3 = expand_and_identify(lane3, lane4, valid_count3);
+    if (valid_count2 + valid_count3 <= 16) {
+      vec2 = _mm512_mask_expand_epi32(
+          vec2, __mmask16(((1 << valid_count3) - 1) << valid_count2), vec3);
+      valid_count2 += valid_count3;
+      vec2 = expand_utf8_to_utf32(vec2);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
+    } else {
+      vec2 = expand_utf8_to_utf32(vec2);
+      vec3 = expand_utf8_to_utf32(vec3);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec3, valid_count3, true)
+    }
+    ptr += 4 * 16;
+  }
+  const char *validatedptr = ptr; // validated up to ptr
+
+  // For the final pass, we validate 64 bytes, but we only transcode
+  // 3*16 bytes, so we may end up double-validating 16 bytes.
+  if (end - ptr >= 64) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    bool ascii = checker.check_next_input(utf8);
+    if (checker.errors()) {
+      return {ptr, output, false}; // We found an error.
+    }
+    if (ascii) {
+      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
+      output += 64;
+      ptr += 64;
+    } else {
+      const __m512i lane0 = broadcast_epi128<0>(utf8);
+      const __m512i lane1 = broadcast_epi128<1>(utf8);
+      int valid_count0;
+      __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
+      const __m512i lane2 = broadcast_epi128<2>(utf8);
+      int valid_count1;
+      __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
+      if (valid_count0 + valid_count1 <= 16) {
+        vec0 = _mm512_mask_expand_epi32(
+            vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
+        valid_count0 += valid_count1;
+        vec0 = expand_utf8_to_utf32(vec0);
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+      } else {
+        vec0 = expand_utf8_to_utf32(vec0);
+        vec1 = expand_utf8_to_utf32(vec1);
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
+      }
 
-    // For the final pass, we validate 64 bytes, but we only transcode
-    // 3*16 bytes, so we may end up double-validating 16 bytes.
-    if (end - ptr >= 64) {
-        const __m512i utf8 = _mm512_loadu_si512((const __m512i*)ptr);
-        bool ascii = checker.check_next_input(utf8);
-        if(checker.errors()) {        
-            return {ptr, output, false}; // We found an error.
-        }
-        if(ascii) {
-            SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
-            output += 64;
-            ptr += 64;
-        } else {
-            const __m512i lane0 = broadcast_epi128<0>(utf8);
-            const __m512i lane1 = broadcast_epi128<1>(utf8);
-            int valid_count0;
-            __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
-            const __m512i lane2 = broadcast_epi128<2>(utf8);
-            int valid_count1;
-            __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
-            if(valid_count0 + valid_count1 <= 16) {
-                vec0 = _mm512_mask_expand_epi32(vec0, __mmask16(((1<<valid_count1)-1)<<valid_count0), vec1);
-                valid_count0 += valid_count1;
-                vec0 = expand_utf8_to_utf32(vec0);
-                SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-            } else {
-                vec0 = expand_utf8_to_utf32(vec0);
-                vec1 = expand_utf8_to_utf32(vec1);
-                SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-                SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
-            }
-
-            const __m512i lane3 = broadcast_epi128<3>(utf8);
-            SIMDUTF_ICELAKE_TRANSCODE16(lane2, lane3, true)
-
-            ptr += 3*16;
-        }
-        validatedptr += 4*16;
-    }
-    if (end != validatedptr) {
-       const __m512i utf8 = _mm512_maskz_loadu_epi8(~UINT64_C(0) >> (64 - (end - validatedptr)), (const __m512i*)validatedptr);
-       checker.check_next_input(utf8);
-    }
-    checker.check_eof();
-    if(checker.errors()) {
-        return {ptr, output, false}; // We found an error.
+      const __m512i lane3 = broadcast_epi128<3>(utf8);
+      SIMDUTF_ICELAKE_TRANSCODE16(lane2, lane3, true)
+
+      ptr += 3 * 16;
     }
-    return {ptr, output, true};
+    validatedptr += 4 * 16;
+  }
+  if (end != validatedptr) {
+    const __m512i utf8 =
+        _mm512_maskz_loadu_epi8(~UINT64_C(0) >> (64 - (end - validatedptr)),
+                                (const __m512i *)validatedptr);
+    checker.check_next_input(utf8);
+  }
+  checker.check_eof();
+  if (checker.errors()) {
+    return {ptr, output, false}; // We found an error.
+  }
+  return {ptr, output, true};
 }
 /* end file src/icelake/icelake_from_utf8.inl.cpp */
 /* begin file src/icelake/icelake_convert_utf8_to_latin1.inl.cpp */
@@ -21336,18 +21458,16 @@ std::tuple<const char*, OUTPUT*, bool> validating_utf8_to_fixed_length_with_cons
 // File contains conversion procedure from possibly invalid UTF-8 strings.
 
 template <bool is_remaining>
-simdutf_really_inline size_t process_block_from_utf8_to_latin1(const char *buf, size_t len,
-                                           char *latin_output, __m512i minus64,
-                                           __m512i one,
-                                           __mmask64 *next_leading_ptr,
-                                           __mmask64 *next_bit6_ptr) {
+simdutf_really_inline size_t process_block_from_utf8_to_latin1(
+    const char *buf, size_t len, char *latin_output, __m512i minus64,
+    __m512i one, __mmask64 *next_leading_ptr, __mmask64 *next_bit6_ptr) {
   __mmask64 load_mask =
       is_remaining ? _bzhi_u64(~0ULL, (unsigned int)len) : ~0ULL;
   __m512i input = _mm512_maskz_loadu_epi8(load_mask, (__m512i *)buf);
   __mmask64 nonascii = _mm512_movepi8_mask(input);
   if (nonascii == 0) {
-    if(*next_leading_ptr) { // If we ended with a leading byte, it is an error.
-      return 0; // Indicates error
+    if (*next_leading_ptr) { // If we ended with a leading byte, it is an error.
+      return 0;              // Indicates error
     }
     is_remaining
         ? _mm512_mask_storeu_epi8((__m512i *)latin_output, load_mask, input)
@@ -21378,7 +21498,7 @@ simdutf_really_inline size_t process_block_from_utf8_to_latin1(const char *buf,
   __mmask64 retain = ~leading & load_mask;
   __m512i output = _mm512_maskz_compress_epi8(retain, input);
   int64_t written_out = count_ones(retain);
-  if(written_out == 0) {
+  if (written_out == 0) {
     return 0; // Indicates error
   }
   *next_bit6_ptr = bit6 >> 63;
@@ -21391,7 +21511,8 @@ simdutf_really_inline size_t process_block_from_utf8_to_latin1(const char *buf,
   return written_out;
 }
 
-size_t utf8_to_latin1_avx512(const char *&inbuf, size_t len, char *&inlatin_output) {
+size_t utf8_to_latin1_avx512(const char *&inbuf, size_t len,
+                             char *&inlatin_output) {
   const char *buf = inbuf;
   char *latin_output = inlatin_output;
   char *start = latin_output;
@@ -21402,12 +21523,13 @@ size_t utf8_to_latin1_avx512(const char *&inbuf, size_t len, char *&inlatin_outp
   __mmask64 next_bit6 = 0;
 
   while (pos + 64 <= len) {
-    size_t written = process_block_from_utf8_to_latin1<false>(buf + pos, 64, latin_output, minus64,
-                                          one, &next_leading, &next_bit6);
+    size_t written = process_block_from_utf8_to_latin1<false>(
+        buf + pos, 64, latin_output, minus64, one, &next_leading, &next_bit6);
     if (written == 0) {
       inlatin_output = latin_output;
       inbuf = buf + pos - next_leading;
-      return 0; // Indicates error at pos or after, or just before pos (too short error)
+      return 0; // Indicates error at pos or after, or just before pos (too
+                // short error)
     }
     latin_output += written;
     pos += 64;
@@ -21415,17 +21537,18 @@ size_t utf8_to_latin1_avx512(const char *&inbuf, size_t len, char *&inlatin_outp
 
   if (pos < len) {
     size_t remaining = len - pos;
-    size_t written =
-        process_block_from_utf8_to_latin1<true>(buf + pos, remaining, latin_output, minus64, one,
-                            &next_leading, &next_bit6);
+    size_t written = process_block_from_utf8_to_latin1<true>(
+        buf + pos, remaining, latin_output, minus64, one, &next_leading,
+        &next_bit6);
     if (written == 0) {
       inbuf = buf + pos - next_leading;
       inlatin_output = latin_output;
-      return 0; // Indicates error at pos or after, or just before pos (too short error)
+      return 0; // Indicates error at pos or after, or just before pos (too
+                // short error)
     }
     latin_output += written;
   }
-  if(next_leading) {
+  if (next_leading) {
     inbuf = buf + len - next_leading;
     inlatin_output = latin_output;
     return 0; // Indicates error at end of buffer
@@ -21441,11 +21564,9 @@ size_t utf8_to_latin1_avx512(const char *&inbuf, size_t len, char *&inlatin_outp
 // File contains conversion procedure from valid UTF-8 strings.
 
 template <bool is_remaining>
-simdutf_really_inline size_t process_valid_block_from_utf8_to_latin1(const char *buf, size_t len,
-                                                 char *latin_output,
-                                                 __m512i minus64, __m512i one,
-                                                 __mmask64 *next_leading_ptr,
-                                                 __mmask64 *next_bit6_ptr) {
+simdutf_really_inline size_t process_valid_block_from_utf8_to_latin1(
+    const char *buf, size_t len, char *latin_output, __m512i minus64,
+    __m512i one, __mmask64 *next_leading_ptr, __mmask64 *next_bit6_ptr) {
   __mmask64 load_mask =
       is_remaining ? _bzhi_u64(~0ULL, (unsigned int)len) : ~0ULL;
   __m512i input = _mm512_maskz_loadu_epi8(load_mask, (__m512i *)buf);
@@ -21472,7 +21593,7 @@ simdutf_really_inline size_t process_valid_block_from_utf8_to_latin1(const char
   __mmask64 retain = ~leading & load_mask;
   __m512i output = _mm512_maskz_compress_epi8(retain, input);
   int64_t written_out = count_ones(retain);
-  if(written_out == 0) {
+  if (written_out == 0) {
     return 0; // Indicates error
   }
   __mmask64 store_mask = ~UINT64_C(0) >> (64 - written_out);
@@ -21499,9 +21620,9 @@ size_t valid_utf8_to_latin1_avx512(const char *buf, size_t len,
 
   if (pos < len) {
     size_t remaining = len - pos;
-    size_t written =
-        process_valid_block_from_utf8_to_latin1<true>(buf + pos, remaining, latin_output, minus64,
-                                  one, &next_leading, &next_bit6);
+    size_t written = process_valid_block_from_utf8_to_latin1<true>(
+        buf + pos, remaining, latin_output, minus64, one, &next_leading,
+        &next_bit6);
     latin_output += written;
   }
 
@@ -21618,40 +21739,36 @@ icelake_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
 
 /**
  * This function converts the input (inbuf, inlen), assumed to be valid
- * UTF16 (little endian) into UTF-8 (to outbuf). The number of code units written
- * is written to 'outlen' and the function reports the number of input word
- * consumed.
+ * UTF16 (little endian) into UTF-8 (to outbuf). The number of code units
+ * written is written to 'outlen' and the function reports the number of input
+ * words consumed.
  */
 template <endianness big_endian>
 size_t utf16_to_utf8_avx512i(const char16_t *inbuf, size_t inlen,
-                               unsigned char *outbuf, size_t *outlen) {
+                             unsigned char *outbuf, size_t *outlen) {
   __m512i in;
   __mmask32 inmask = _cvtu32_mask32(0x7fffffff);
-  __m512i byteflip = _mm512_setr_epi64(
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809
-        );
-  const char16_t * const inbuf_orig = inbuf;
-  const unsigned char * const outbuf_orig = outbuf;
+  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  const char16_t *const inbuf_orig = inbuf;
+  const unsigned char *const outbuf_orig = outbuf;
   int adjust = 0;
   int carry = 0;
 
   while (inlen >= 32) {
     in = _mm512_loadu_si512(inbuf);
-    if(big_endian) { in = _mm512_shuffle_epi8(in, byteflip); }
+    if (big_endian) {
+      in = _mm512_shuffle_epi8(in, byteflip);
+    }
     inlen -= 31;
   lastiteration:
     inbuf += 31;
 
   failiteration:
     const __mmask32 is234byte = _mm512_mask_cmp_epu16_mask(
-      inmask, in, _mm512_set1_epi16(0x0080), _MM_CMPINT_NLT);
+        inmask, in, _mm512_set1_epi16(0x0080), _MM_CMPINT_NLT);
 
     if (_ktestz_mask32_u8(inmask, is234byte)) {
       // fast path for ASCII only
@@ -21680,9 +21797,12 @@ size_t utf16_to_utf8_avx512i(const char16_t *inbuf, size_t inlen,
       const __m512i cmpmask =
           _mm512_mask_blend_epi16(inmask, _mm512_set1_epi16(int16_t(0xffff)),
                                   _mm512_set1_epi16(0x0800));
-      const __mmask64 smoosh = _mm512_cmp_epu8_mask(in, cmpmask, _MM_CMPINT_NLT);
+      const __mmask64 smoosh =
+          _mm512_cmp_epu8_mask(in, cmpmask, _MM_CMPINT_NLT);
       const __m512i out = _mm512_maskz_compress_epi8(smoosh, in);
-      _mm512_mask_storeu_epi8(outbuf, _cvtu64_mask64(_pext_u64(_cvtmask64_u64(smoosh), _cvtmask64_u64(smoosh))),
+      _mm512_mask_storeu_epi8(outbuf,
+                              _cvtu64_mask64(_pext_u64(_cvtmask64_u64(smoosh),
+                                                       _cvtmask64_u64(smoosh))),
                               out);
       outbuf += 31 + _mm_popcnt_u32(_cvtmask32_u32(is234byte));
       carry = 0;
@@ -21696,11 +21816,11 @@ size_t utf16_to_utf8_avx512i(const char16_t *inbuf, size_t inlen,
     __m512i lo = _mm512_cvtepu16_epi32(_mm512_castsi512_si256(in));
     __m512i hi = _mm512_cvtepu16_epi32(_mm512_extracti32x8_epi32(in, 1));
 
-
     __m512i taglo = _mm512_set1_epi32(0x8080e000);
     __m512i taghi = taglo;
 
-    const __m512i fc00masked = _mm512_and_epi32(in, _mm512_set1_epi16(int16_t(0xfc00)));
+    const __m512i fc00masked =
+        _mm512_and_epi32(in, _mm512_set1_epi16(int16_t(0xfc00)));
     const __mmask32 hisurr = _mm512_mask_cmp_epu16_mask(
         inmask, fc00masked, _mm512_set1_epi16(int16_t(0xd800)), _MM_CMPINT_EQ);
     const __mmask32 losurr = _mm512_cmp_epu16_mask(
@@ -21714,10 +21834,10 @@ size_t utf16_to_utf8_avx512i(const char16_t *inbuf, size_t inlen,
       __m512i his = _mm512_alignr_epi32(lo, hi, 1);
 
       const __mmask32 hisurrhi = _kshiftri_mask32(hisurr, 16);
-      taglo =
-          _mm512_mask_mov_epi32(taglo,__mmask16(hisurr), _mm512_set1_epi32(0x808080f0));
-      taghi =
-          _mm512_mask_mov_epi32(taghi, __mmask16(hisurrhi), _mm512_set1_epi32(0x808080f0));
+      taglo = _mm512_mask_mov_epi32(taglo, __mmask16(hisurr),
+                                    _mm512_set1_epi32(0x808080f0));
+      taghi = _mm512_mask_mov_epi32(taghi, __mmask16(hisurrhi),
+                                    _mm512_set1_epi32(0x808080f0));
 
       lo = _mm512_mask_slli_epi32(lo, __mmask16(hisurr), lo, 10);
       hi = _mm512_mask_slli_epi32(hi, __mmask16(hisurrhi), hi, 10);
@@ -21728,8 +21848,8 @@ size_t utf16_to_utf8_avx512i(const char16_t *inbuf, size_t inlen,
 
       carryout = _cvtu32_mask32(_kshiftri_mask32(hisurr, 30));
 
-      const uint32_t  h = _cvtmask32_u32(hisurr);
-      const uint32_t  l = _cvtmask32_u32(losurr);
+      const uint32_t h = _cvtmask32_u32(hisurr);
+      const uint32_t l = _cvtmask32_u32(losurr);
       // check for mismatched surrogates
       if ((h + h + carry) ^ l) {
         const uint32_t lonohi = l & ~(h + h + carry);
@@ -21743,7 +21863,7 @@ size_t utf16_to_utf8_avx512i(const char16_t *inbuf, size_t inlen,
       }
     }
 
-    hi = _mm512_maskz_mov_epi32(_cvtu32_mask16(0x7fff),hi);
+    hi = _mm512_maskz_mov_epi32(_cvtu32_mask16(0x7fff), hi);
     carry = carryout;
 
     __m512i mslo =
@@ -21759,19 +21879,22 @@ size_t utf16_to_utf8_avx512i(const char16_t *inbuf, size_t inlen,
     const __mmask64 is1bhi = _kshiftri_mask64(is1byte, 16);
     const __mmask64 is12bhi = _kshiftri_mask64(is12byte, 16);
 
-    taglo =
-        _mm512_mask_mov_epi32(taglo, __mmask16(is12byte), _mm512_set1_epi32(0x80c00000));
-    taghi =
-        _mm512_mask_mov_epi32(taghi, __mmask16(is12bhi), _mm512_set1_epi32(0x80c00000));
-    __m512i magiclo = _mm512_mask_blend_epi32(__mmask16(outmask), _mm512_set1_epi32(0xffffffff),
-                                      _mm512_set1_epi32(0x00010101));
-    __m512i magichi = _mm512_mask_blend_epi32(__mmask16(outmhi), _mm512_set1_epi32(0xffffffff),
-                                      _mm512_set1_epi32(0x00010101));
-
-
-    magiclo = _mm512_mask_blend_epi32(__mmask16(outmask), _mm512_set1_epi32(0xffffffff),
+    taglo = _mm512_mask_mov_epi32(taglo, __mmask16(is12byte),
+                                  _mm512_set1_epi32(0x80c00000));
+    taghi = _mm512_mask_mov_epi32(taghi, __mmask16(is12bhi),
+                                  _mm512_set1_epi32(0x80c00000));
+    __m512i magiclo = _mm512_mask_blend_epi32(__mmask16(outmask),
+                                              _mm512_set1_epi32(0xffffffff),
+                                              _mm512_set1_epi32(0x00010101));
+    __m512i magichi = _mm512_mask_blend_epi32(__mmask16(outmhi),
+                                              _mm512_set1_epi32(0xffffffff),
+                                              _mm512_set1_epi32(0x00010101));
+
+    magiclo = _mm512_mask_blend_epi32(__mmask16(outmask),
+                                      _mm512_set1_epi32(0xffffffff),
                                       _mm512_set1_epi32(0x00010101));
-    magichi = _mm512_mask_blend_epi32(__mmask16(outmhi), _mm512_set1_epi32(0xffffffff),
+    magichi = _mm512_mask_blend_epi32(__mmask16(outmhi),
+                                      _mm512_set1_epi32(0xffffffff),
                                       _mm512_set1_epi32(0x00010101));
 
     mslo = _mm512_ternarylogic_epi32(mslo, _mm512_set1_epi32(0x3f3f3f3f), taglo,
@@ -21782,8 +21905,10 @@ size_t utf16_to_utf8_avx512i(const char16_t *inbuf, size_t inlen,
 
     mshi = _mm512_mask_slli_epi32(mshi, __mmask16(is1bhi), hi, 24);
 
-    const __mmask64 wantlo = _mm512_cmp_epu8_mask(mslo, magiclo, _MM_CMPINT_NLT);
-    const __mmask64 wanthi = _mm512_cmp_epu8_mask(mshi, magichi, _MM_CMPINT_NLT);
+    const __mmask64 wantlo =
+        _mm512_cmp_epu8_mask(mslo, magiclo, _MM_CMPINT_NLT);
+    const __mmask64 wanthi =
+        _mm512_cmp_epu8_mask(mshi, magichi, _MM_CMPINT_NLT);
     const __m512i outlo = _mm512_maskz_compress_epi8(wantlo, mslo);
     const __m512i outhi = _mm512_maskz_compress_epi8(wanthi, mshi);
     const uint64_t wantlo_uint64 = _cvtmask64_u64(wantlo);
@@ -21792,8 +21917,11 @@ size_t utf16_to_utf8_avx512i(const char16_t *inbuf, size_t inlen,
     uint64_t advlo = _mm_popcnt_u64(wantlo_uint64);
     uint64_t advhi = _mm_popcnt_u64(wanthi_uint64);
 
-    _mm512_mask_storeu_epi8(outbuf, _cvtu64_mask64(_pext_u64(wantlo_uint64, wantlo_uint64)), outlo);
-    _mm512_mask_storeu_epi8(outbuf + advlo, _cvtu64_mask64(_pext_u64(wanthi_uint64, wanthi_uint64)), outhi);
+    _mm512_mask_storeu_epi8(
+        outbuf, _cvtu64_mask64(_pext_u64(wantlo_uint64, wantlo_uint64)), outlo);
+    _mm512_mask_storeu_epi8(
+        outbuf + advlo, _cvtu64_mask64(_pext_u64(wanthi_uint64, wanthi_uint64)),
+        outhi);
     outbuf += advlo + advhi;
   }
   outbuf += -adjust;
@@ -21803,7 +21931,9 @@ size_t utf16_to_utf8_avx512i(const char16_t *inbuf, size_t inlen,
     // We must have inlen < 31.
     inmask = _cvtu32_mask32((1U << inlen) - 1);
     in = _mm512_maskz_loadu_epi16(inmask, inbuf);
-    if(big_endian) { in = _mm512_shuffle_epi8(in, byteflip); }
+    if (big_endian) {
+      in = _mm512_shuffle_epi8(in, byteflip);
+    }
     adjust = (int)inlen - 31;
     inlen = 0;
     goto lastiteration;
@@ -21820,38 +21950,41 @@ size_t utf16_to_utf8_avx512i(const char16_t *inbuf, size_t inlen,
   A scalar routing should carry on the conversion of the tail.
 */
 template <endianness big_endian>
-std::tuple<const char16_t*, char32_t*, bool> convert_utf16_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) {
-  const char16_t* end = buf + len;
+std::tuple<const char16_t *, char32_t *, bool>
+convert_utf16_to_utf32(const char16_t *buf, size_t len,
+                       char32_t *utf32_output) {
+  const char16_t *end = buf + len;
   const __m512i v_fc00 = _mm512_set1_epi16((uint16_t)0xfc00);
   const __m512i v_d800 = _mm512_set1_epi16((uint16_t)0xd800);
   const __m512i v_dc00 = _mm512_set1_epi16((uint16_t)0xdc00);
   __mmask32 carry{0};
   const __m512i byteflip = _mm512_setr_epi64(
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809
-        );
-  while (std::distance(buf,end) >= 32) {
+      0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
+      0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
+      0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  while (std::distance(buf, end) >= 32) {
     // Always safe because buf + 32 <= end so that end - buf >= 32 bytes:
-    __m512i in = _mm512_loadu_si512((__m512i*)buf);
-    if(big_endian) { in = _mm512_shuffle_epi8(in, byteflip); }
+    __m512i in = _mm512_loadu_si512((__m512i *)buf);
+    if (big_endian) {
+      in = _mm512_shuffle_epi8(in, byteflip);
+    }
 
     // H - bitmask for high surrogates
-    const __mmask32 H = _mm512_cmpeq_epi16_mask(_mm512_and_si512(in, v_fc00), v_d800);
+    const __mmask32 H =
+        _mm512_cmpeq_epi16_mask(_mm512_and_si512(in, v_fc00), v_d800);
     // H - bitmask for low surrogates
-    const __mmask32 L = _mm512_cmpeq_epi16_mask(_mm512_and_si512(in, v_fc00), v_dc00);
+    const __mmask32 L =
+        _mm512_cmpeq_epi16_mask(_mm512_and_si512(in, v_fc00), v_dc00);
 
-    if ((H|L)) {
+    if ((H | L)) {
       // surrogate pair(s) in a register
-      const __mmask32 V = (L ^ (carry | (H << 1)));   // A high surrogate must be followed by low one and a low one must be preceded by a high one.
-                                                      // If valid, V should be equal to 0
+      const __mmask32 V =
+          (L ^
+           (carry | (H << 1))); // A high surrogate must be followed by low one
+                                // and a low one must be preceded by a high one.
+                                // If valid, V should be equal to 0
 
-      if(V == 0) {
+      if (V == 0) {
         // valid case
         /*
             Input surrogate pair:
@@ -21859,68 +21992,92 @@ std::tuple<const char16_t*, char32_t*, bool> convert_utf16_to_utf32(const char16
                 low surrogate      high surrogate
         */
         /*  1. Expand all code units to 32-bit code units
-            in  |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0000.0000.0000.1101.10bb.bbbb.bbbb|
+            in
+           |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0000.0000.0000.1101.10bb.bbbb.bbbb|
         */
         const __m512i first = _mm512_cvtepu16_epi32(_mm512_castsi512_si256(in));
-        const __m512i second = _mm512_cvtepu16_epi32(_mm512_extracti32x8_epi32(in,1));
-
-        /*  2. Shift by one 16-bit word to align low surrogates with high surrogates
-            in      |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0000.0000.0000.1101.10bb.bbbb.bbbb|
-            shifted |????.????.????.????.????.????.????.????|0000.0000.0000.0000.1101.11aa.aaaa.aaaa|
+        const __m512i second =
+            _mm512_cvtepu16_epi32(_mm512_extracti32x8_epi32(in, 1));
+
+        /*  2. Shift by one 16-bit word to align low surrogates with high
+           surrogates in
+           |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0000.0000.0000.1101.10bb.bbbb.bbbb|
+            shifted
+           |????.????.????.????.????.????.????.????|0000.0000.0000.0000.1101.11aa.aaaa.aaaa|
         */
         const __m512i shifted_first = _mm512_alignr_epi32(second, first, 1);
-        const __m512i shifted_second = _mm512_alignr_epi32(_mm512_setzero_si512(), second, 1);
+        const __m512i shifted_second =
+            _mm512_alignr_epi32(_mm512_setzero_si512(), second, 1);
 
-        /*  3. Align all high surrogates in first and second by shifting to the left by 10 bits
+        /*  3. Align all high surrogates in first and second by shifting to the
+           left by 10 bits
             |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0011.0110.bbbb.bbbb.bb00.0000.0000|
         */
-        const __m512i aligned_first = _mm512_mask_slli_epi32(first, (__mmask16)H, first, 10);
-        const __m512i aligned_second = _mm512_mask_slli_epi32(second, (__mmask16)(H>>16), second, 10);
-
-        /*  4. Remove surrogate prefixes and add offset 0x10000 by adding in, shifted and constant
-            in      |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0011.0110.bbbb.bbbb.bb00.0000.0000|
-            shifted |????.????.????.????.????.????.????.????|0000.0000.0000.0000.1101.11aa.aaaa.aaaa|
+        const __m512i aligned_first =
+            _mm512_mask_slli_epi32(first, (__mmask16)H, first, 10);
+        const __m512i aligned_second =
+            _mm512_mask_slli_epi32(second, (__mmask16)(H >> 16), second, 10);
+
+        /*  4. Remove surrogate prefixes and add offset 0x10000 by adding in,
+           shifted and constant in
+           |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0011.0110.bbbb.bbbb.bb00.0000.0000|
+            shifted
+           |????.????.????.????.????.????.????.????|0000.0000.0000.0000.1101.11aa.aaaa.aaaa|
             constant|1111.1100.1010.0000.0010.0100.0000.0000|1111.1100.1010.0000.0010.0100.0000.0000|
         */
         const __m512i constant = _mm512_set1_epi32((uint32_t)0xfca02400);
-        const __m512i added_first = _mm512_mask_add_epi32(aligned_first, (__mmask16)H, aligned_first, shifted_first);
-        const __m512i utf32_first = _mm512_mask_add_epi32(added_first, (__mmask16)H, added_first, constant);
-
-        const __m512i added_second = _mm512_mask_add_epi32(aligned_second, (__mmask16)(H>>16), aligned_second, shifted_second);
-        const __m512i utf32_second = _mm512_mask_add_epi32(added_second, (__mmask16)(H>>16), added_second, constant);
-
-        //  5. Store all valid UTF-32 code units (low surrogate positions and 32nd word are invalid)
+        const __m512i added_first = _mm512_mask_add_epi32(
+            aligned_first, (__mmask16)H, aligned_first, shifted_first);
+        const __m512i utf32_first = _mm512_mask_add_epi32(
+            added_first, (__mmask16)H, added_first, constant);
+
+        const __m512i added_second =
+            _mm512_mask_add_epi32(aligned_second, (__mmask16)(H >> 16),
+                                  aligned_second, shifted_second);
+        const __m512i utf32_second = _mm512_mask_add_epi32(
+            added_second, (__mmask16)(H >> 16), added_second, constant);
+
+        //  5. Store all valid UTF-32 code units (low surrogate positions and
+        //  32nd word are invalid)
         const __mmask32 valid = ~L & 0x7fffffff;
-        // We deliberately do a _mm512_maskz_compress_epi32 followed by storeu_epi32
-        // to ease performance portability to Zen 4.
-        const __m512i compressed_first = _mm512_maskz_compress_epi32((__mmask16)(valid), utf32_first);
+        // We deliberately do a _mm512_maskz_compress_epi32 followed by
+        // storeu_epi32 to ease performance portability to Zen 4.
+        const __m512i compressed_first =
+            _mm512_maskz_compress_epi32((__mmask16)(valid), utf32_first);
         const size_t howmany1 = count_ones((uint16_t)(valid));
-        _mm512_storeu_si512((__m512i *) utf32_output,  compressed_first);
+        _mm512_storeu_si512((__m512i *)utf32_output, compressed_first);
         utf32_output += howmany1;
-        const __m512i compressed_second = _mm512_maskz_compress_epi32((__mmask16)(valid >> 16), utf32_second);
+        const __m512i compressed_second =
+            _mm512_maskz_compress_epi32((__mmask16)(valid >> 16), utf32_second);
         const size_t howmany2 = count_ones((uint16_t)(valid >> 16));
         // The following could be unsafe in some cases?
         //_mm512_storeu_epi32((__m512i *) utf32_output, compressed_second);
-        _mm512_mask_storeu_epi32((__m512i *) utf32_output, __mmask16((1<<howmany2)-1), compressed_second);
+        _mm512_mask_storeu_epi32((__m512i *)utf32_output,
+                                 __mmask16((1 << howmany2) - 1),
+                                 compressed_second);
         utf32_output += howmany2;
-        // Only process 31 code units, but keep track if the 31st word is a high surrogate as a carry
+        // Only process 31 code units, but keep track if the 31st word is a high
+        // surrogate as a carry
         buf += 31;
         carry = (H >> 30) & 0x1;
       } else {
         // invalid case
-        return std::make_tuple(buf+carry, utf32_output, false);
+        return std::make_tuple(buf + carry, utf32_output, false);
       }
     } else {
       // no surrogates
       // extend all thirty-two 16-bit code units to thirty-two 32-bit code units
-      _mm512_storeu_si512((__m512i *)(utf32_output), _mm512_cvtepu16_epi32(_mm512_castsi512_si256(in)));
-      _mm512_storeu_si512((__m512i *)(utf32_output) + 1, _mm512_cvtepu16_epi32(_mm512_extracti32x8_epi32(in,1)));
+      _mm512_storeu_si512((__m512i *)(utf32_output),
+                          _mm512_cvtepu16_epi32(_mm512_castsi512_si256(in)));
+      _mm512_storeu_si512(
+          (__m512i *)(utf32_output) + 1,
+          _mm512_cvtepu16_epi32(_mm512_extracti32x8_epi32(in, 1)));
       utf32_output += 32;
       buf += 32;
       carry = 0;
     }
   } // while
-  return std::make_tuple(buf+carry, utf32_output, true);
+  return std::make_tuple(buf + carry, utf32_output, true);
 }
 /* end file src/icelake/icelake_convert_utf16_to_utf32.inl.cpp */
 /* begin file src/icelake/icelake_convert_utf32_to_latin1.inl.cpp */
@@ -21938,8 +22095,9 @@ size_t icelake_convert_utf32_to_latin1(const char32_t *buf, size_t len,
     if (_mm512_cmpgt_epu32_mask(in, v_0xFF)) {
       return 0;
     }
-    _mm_storeu_si128((__m128i *)latin1_output,
-                     _mm512_castsi512_si128(_mm512_permutexvar_epi8(shufmask, in)));
+    _mm_storeu_si128(
+        (__m128i *)latin1_output,
+        _mm512_castsi512_si128(_mm512_permutexvar_epi8(shufmask, in)));
     latin1_output += 16;
     buf += 16;
   }
@@ -21975,8 +22133,9 @@ icelake_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
       return std::make_pair(result(error_code::TOO_LARGE, buf - start),
                             latin1_output);
     }
-    _mm_storeu_si128((__m128i *)latin1_output,
-                     _mm512_castsi512_si128(_mm512_permutexvar_epi8(shufmask, in)));
+    _mm_storeu_si128(
+        (__m128i *)latin1_output,
+        _mm512_castsi512_si128(_mm512_permutexvar_epi8(shufmask, in)));
     latin1_output += 16;
     buf += 16;
   }
@@ -22001,8 +22160,10 @@ icelake_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
 // file included directly
 
 // Todo: currently, this is just the haswell code, optimize for icelake kernel.
-std::pair<const char32_t*, char*> avx512_convert_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) {
-  const char32_t* end = buf + len;
+std::pair<const char32_t *, char *>
+avx512_convert_utf32_to_utf8(const char32_t *buf, size_t len,
+                             char *utf8_output) {
+  const char32_t *end = buf + len;
   const __m256i v_0000 = _mm256_setzero_si256();
   const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
   const __m256i v_ff80 = _mm256_set1_epi16((uint16_t)0xff80);
@@ -22012,36 +22173,46 @@ std::pair<const char32_t*, char*> avx512_convert_utf32_to_utf8(const char32_t* b
   __m256i running_max = _mm256_setzero_si256();
   __m256i forbidden_bytemask = _mm256_setzero_si256();
 
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
   while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i*)buf);
-    __m256i nextin = _mm256_loadu_si256((__m256i*)buf+1);
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    __m256i nextin = _mm256_loadu_si256((__m256i *)buf + 1);
     running_max = _mm256_max_epu32(_mm256_max_epu32(in, running_max), nextin);
 
-    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned saturation
-    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff), _mm256_and_si256(nextin, v_7fffffff));
+    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
+    // saturation
+    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff),
+                                        _mm256_and_si256(nextin, v_7fffffff));
     in_16 = _mm256_permute4x64_epi64(in_16, 0b11011000);
 
-    // Try to apply UTF-16 => UTF-8 routine on 256 bits (haswell/avx2_convert_utf16_to_utf8.cpp)
+    // Try to apply UTF-16 => UTF-8 routine on 256 bits
+    // (haswell/avx2_convert_utf16_to_utf8.cpp)
 
-    if(_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
+    if (_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
       // 1. pack the bytes
-      const __m128i utf8_packed = _mm_packus_epi16(_mm256_castsi256_si128(in_16),_mm256_extractf128_si256(in_16,1));
+      const __m128i utf8_packed = _mm_packus_epi16(
+          _mm256_castsi256_si128(in_16), _mm256_extractf128_si256(in_16, 1));
       // 2. store (16 bytes)
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
       // 3. adjust pointers
       buf += 16;
       utf8_output += 16;
       continue; // we are done for this round!
     }
     // no bits set above 7th bit
-    const __m256i one_byte_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
 
     // no bits set above 11th bit
-    const __m256i one_or_two_bytes_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
-    const uint32_t one_or_two_bytes_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
+    const __m256i one_or_two_bytes_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
+    const uint32_t one_or_two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
     if (one_or_two_bytes_bitmask == 0xffffffff) {
       // 1. prepare 2-byte values
       // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
@@ -22061,25 +22232,32 @@ std::pair<const char32_t*, char*> avx512_convert_utf32_to_utf8(const char32_t* b
       const __m256i t4 = _mm256_or_si256(t3, v_c080);
 
       // 2. merge ASCII and 2-byte codewords
-      const __m256i utf8_unpacked = _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
+      const __m256i utf8_unpacked =
+          _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
 
       // 3. prepare bitmask for 8-bit lookup
       const uint32_t M0 = one_byte_bitmask & 0x55555555;
       const uint32_t M1 = M0 >> 7;
-      const uint32_t M2 = (M1 | M0)  & 0x00ff00ff;
+      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
       // 4. pack the bytes
 
-      const uint8_t* row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
-      const uint8_t* row_2 = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2>>16)][0];
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
+      const uint8_t *row_2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
+                                                                       16)][0];
 
-      const __m128i shuffle = _mm_loadu_si128((__m128i*)(row + 1));
-      const __m128i shuffle_2 = _mm_loadu_si128((__m128i*)(row_2 + 1));
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
 
-      const __m256i utf8_packed = _mm256_shuffle_epi8(utf8_unpacked, _mm256_setr_m128i(shuffle,shuffle_2));
+      const __m256i utf8_packed = _mm256_shuffle_epi8(
+          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
       // 5. store bytes
-      _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_packed));
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_castsi256_si128(utf8_packed));
       utf8_output += row[0];
-      _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_packed,1));
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_extractf128_si256(utf8_packed, 1));
       utf8_output += row_2[0];
 
       // 6. adjust pointers
@@ -22087,22 +22265,28 @@ std::pair<const char32_t*, char*> avx512_convert_utf32_to_utf8(const char32_t* b
       continue;
     }
     // Must check for overflow in packing
-    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(
+        _mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
     if (saturation_bitmask == 0xffffffff) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
       const __m256i v_d800 = _mm256_set1_epi16((uint16_t)0xd800);
-      forbidden_bytemask = _mm256_or_si256(forbidden_bytemask, _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800));
+      forbidden_bytemask = _mm256_or_si256(
+          forbidden_bytemask,
+          _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800));
 
-      const __m256i dup_even = _mm256_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
-                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
-                                              0x0000, 0x0202, 0x0404, 0x0606,
-                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+      const __m256i dup_even = _mm256_setr_epi16(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
       /* In this branch we handle three cases:
-        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
+        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+        single UTF-8 byte
+        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
+        UTF-8 bytes
+        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+        three UTF-8 bytes
 
         We expand the input word (16-bit) into two code units (32-bit), thus
         we have room for four bytes. However, we need five distinct bit
@@ -22129,7 +22313,7 @@ std::pair<const char32_t*, char*> avx512_convert_utf32_to_utf8(const char32_t* b
       // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
       const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
       // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m256i t2 = _mm256_or_si256 (t1, simdutf_vec(0b1000000000000000));
+      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
 
       // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
       const __m256i s0 = _mm256_srli_epi16(in_16, 4);
@@ -22139,7 +22323,8 @@ std::pair<const char32_t*, char*> avx512_convert_utf32_to_utf8(const char32_t* b
       const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
       // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
       const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
-      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask, simdutf_vec(0b0100000000000000));
+      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
+                                             simdutf_vec(0b0100000000000000));
       const __m256i s4 = _mm256_xor_si256(s3, m0);
 #undef simdutf_vec
 
@@ -22150,78 +22335,93 @@ std::pair<const char32_t*, char*> avx512_convert_utf32_to_utf8(const char32_t* b
       // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
       const uint32_t mask = (one_byte_bitmask & 0x55555555) |
                             (one_or_two_bytes_bitmask & 0xaaaaaaaa);
-      // Due to the wider registers, the following path is less likely to be useful.
+      // Due to the wider registers, the following path is less likely to be
+      // useful.
       /*if(mask == 0) {
         // We only have three-byte code units. Use fast path.
-        const __m256i shuffle = _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1, 2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
-        const __m256i utf8_0 = _mm256_shuffle_epi8(out0, shuffle);
-        const __m256i utf8_1 = _mm256_shuffle_epi8(out1, shuffle);
+        const __m256i shuffle =
+      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
+      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
+      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
+      _mm256_shuffle_epi8(out1, shuffle);
         _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
         utf8_output += 12;
         _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_0,1));
-        utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_1,1));
-        utf8_output += 12;
-        buf += 16;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
         continue;
       }*/
       const uint8_t mask0 = uint8_t(mask);
-      const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const __m128i shuffle0 = _mm_loadu_si128((__m128i*)(row0 + 1));
-      const __m128i utf8_0 = _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
 
       const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-      const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const __m128i shuffle1 = _mm_loadu_si128((__m128i*)(row1 + 1));
-      const __m128i utf8_1 = _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
 
       const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
-      const uint8_t* row2 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
-      const __m128i shuffle2 = _mm_loadu_si128((__m128i*)(row2 + 1));
-      const __m128i utf8_2 = _mm_shuffle_epi8(_mm256_extractf128_si256(out0,1), shuffle2);
-
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
+      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
+      const __m128i utf8_2 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
 
       const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
-      const uint8_t* row3 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
-      const __m128i shuffle3 = _mm_loadu_si128((__m128i*)(row3 + 1));
-      const __m128i utf8_3 = _mm_shuffle_epi8(_mm256_extractf128_si256(out1,1), shuffle3);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
+      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
+      const __m128i utf8_3 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
 
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
       utf8_output += row0[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
       utf8_output += row1[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_2);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
       utf8_output += row2[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_3);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
       utf8_output += row3[0];
       buf += 16;
     } else {
-      // case: at least one 32-bit word is larger than 0xFFFF <=> it will produce four UTF-8 bytes.
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // may require large, non-trivial tables?
+      // case: at least one 32-bit word is larger than 0xFFFF <=> it will
+      // produce four UTF-8 bytes. Let us do a scalar fallback. It may seem
+      // wasteful to use scalar code, but being efficient with SIMD may require
+      // large, non-trivial tables?
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFFFF80)==0) {  // 1-byte (ASCII)
+        if ((word & 0xFFFFFF80) == 0) { // 1-byte (ASCII)
           *utf8_output++ = char(word);
-        } else if((word & 0xFFFFF800)==0) { // 2-byte
-          *utf8_output++ = char((word>>6) | 0b11000000);
+        } else if ((word & 0xFFFFF800) == 0) { // 2-byte
+          *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word & 0xFFFF0000 )==0) {  // 3-byte
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(nullptr, utf8_output); }
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) { // 3-byte
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr, utf8_output);
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else {  // 4-byte
-          if (word > 0x10FFFF) { return std::make_pair(nullptr, utf8_output); }
-          *utf8_output++ = char((word>>18) | 0b11110000);
-          *utf8_output++ = char(((word>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else { // 4-byte
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr, utf8_output);
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         }
       }
@@ -22231,19 +22431,24 @@ std::pair<const char32_t*, char*> avx512_convert_utf32_to_utf8(const char32_t* b
 
   // check for invalid input
   const __m256i v_10ffff = _mm256_set1_epi32((uint32_t)0x10ffff);
-  if(static_cast<uint32_t>(_mm256_movemask_epi8(_mm256_cmpeq_epi32(_mm256_max_epu32(running_max, v_10ffff), v_10ffff))) != 0xffffffff) {
+  if (static_cast<uint32_t>(_mm256_movemask_epi8(_mm256_cmpeq_epi32(
+          _mm256_max_epu32(running_max, v_10ffff), v_10ffff))) != 0xffffffff) {
     return std::make_pair(nullptr, utf8_output);
   }
 
-  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) { return std::make_pair(nullptr, utf8_output); }
+  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) {
+    return std::make_pair(nullptr, utf8_output);
+  }
 
   return std::make_pair(buf, utf8_output);
 }
 
 // Todo: currently, this is just the haswell code, optimize for icelake kernel.
-std::pair<result, char*> avx512_convert_utf32_to_utf8_with_errors(const char32_t* buf, size_t len, char* utf8_output) {
-  const char32_t* end = buf + len;
-  const char32_t* start = buf;
+std::pair<result, char *>
+avx512_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
+                                         char *utf8_output) {
+  const char32_t *end = buf + len;
+  const char32_t *start = buf;
 
   const __m256i v_0000 = _mm256_setzero_si256();
   const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
@@ -22253,40 +22458,53 @@ std::pair<result, char*> avx512_convert_utf32_to_utf8_with_errors(const char32_t
   const __m256i v_7fffffff = _mm256_set1_epi32((uint32_t)0x7fffffff);
   const __m256i v_10ffff = _mm256_set1_epi32((uint32_t)0x10ffff);
 
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
   while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i*)buf);
-    __m256i nextin = _mm256_loadu_si256((__m256i*)buf+1);
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    __m256i nextin = _mm256_loadu_si256((__m256i *)buf + 1);
     // Check for too large input
-    const __m256i max_input = _mm256_max_epu32(_mm256_max_epu32(in, nextin), v_10ffff);
-    if(static_cast<uint32_t>(_mm256_movemask_epi8(_mm256_cmpeq_epi32(max_input, v_10ffff))) != 0xffffffff) {
-      return std::make_pair(result(error_code::TOO_LARGE, buf - start), utf8_output);
+    const __m256i max_input =
+        _mm256_max_epu32(_mm256_max_epu32(in, nextin), v_10ffff);
+    if (static_cast<uint32_t>(_mm256_movemask_epi8(
+            _mm256_cmpeq_epi32(max_input, v_10ffff))) != 0xffffffff) {
+      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
+                            utf8_output);
     }
 
-    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned saturation
-    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff), _mm256_and_si256(nextin, v_7fffffff));
+    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
+    // saturation
+    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff),
+                                        _mm256_and_si256(nextin, v_7fffffff));
     in_16 = _mm256_permute4x64_epi64(in_16, 0b11011000);
 
-    // Try to apply UTF-16 => UTF-8 routine on 256 bits (haswell/avx2_convert_utf16_to_utf8.cpp)
+    // Try to apply UTF-16 => UTF-8 routine on 256 bits
+    // (haswell/avx2_convert_utf16_to_utf8.cpp)
 
-    if(_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
+    if (_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
       // 1. pack the bytes
-      const __m128i utf8_packed = _mm_packus_epi16(_mm256_castsi256_si128(in_16),_mm256_extractf128_si256(in_16,1));
+      const __m128i utf8_packed = _mm_packus_epi16(
+          _mm256_castsi256_si128(in_16), _mm256_extractf128_si256(in_16, 1));
       // 2. store (16 bytes)
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
       // 3. adjust pointers
       buf += 16;
       utf8_output += 16;
       continue; // we are done for this round!
     }
     // no bits set above 7th bit
-    const __m256i one_byte_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
 
     // no bits set above 11th bit
-    const __m256i one_or_two_bytes_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
-    const uint32_t one_or_two_bytes_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
+    const __m256i one_or_two_bytes_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
+    const uint32_t one_or_two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
     if (one_or_two_bytes_bitmask == 0xffffffff) {
       // 1. prepare 2-byte values
       // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
@@ -22306,25 +22524,32 @@ std::pair<result, char*> avx512_convert_utf32_to_utf8_with_errors(const char32_t
       const __m256i t4 = _mm256_or_si256(t3, v_c080);
 
       // 2. merge ASCII and 2-byte codewords
-      const __m256i utf8_unpacked = _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
+      const __m256i utf8_unpacked =
+          _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
 
       // 3. prepare bitmask for 8-bit lookup
       const uint32_t M0 = one_byte_bitmask & 0x55555555;
       const uint32_t M1 = M0 >> 7;
-      const uint32_t M2 = (M1 | M0)  & 0x00ff00ff;
+      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
       // 4. pack the bytes
 
-      const uint8_t* row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
-      const uint8_t* row_2 = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2>>16)][0];
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
+      const uint8_t *row_2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
+                                                                       16)][0];
 
-      const __m128i shuffle = _mm_loadu_si128((__m128i*)(row + 1));
-      const __m128i shuffle_2 = _mm_loadu_si128((__m128i*)(row_2 + 1));
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
 
-      const __m256i utf8_packed = _mm256_shuffle_epi8(utf8_unpacked, _mm256_setr_m128i(shuffle,shuffle_2));
+      const __m256i utf8_packed = _mm256_shuffle_epi8(
+          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
       // 5. store bytes
-      _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_packed));
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_castsi256_si128(utf8_packed));
       utf8_output += row[0];
-      _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_packed,1));
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_extractf128_si256(utf8_packed, 1));
       utf8_output += row_2[0];
 
       // 6. adjust pointers
@@ -22332,27 +22557,34 @@ std::pair<result, char*> avx512_convert_utf32_to_utf8_with_errors(const char32_t
       continue;
     }
     // Must check for overflow in packing
-    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(
+        _mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
     if (saturation_bitmask == 0xffffffff) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
 
       // Check for illegal surrogate code units
       const __m256i v_d800 = _mm256_set1_epi16((uint16_t)0xd800);
-      const __m256i forbidden_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800);
-      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0x0) {
-        return std::make_pair(result(error_code::SURROGATE, buf - start), utf8_output);
+      const __m256i forbidden_bytemask =
+          _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800);
+      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) !=
+          0x0) {
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              utf8_output);
       }
 
-      const __m256i dup_even = _mm256_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
-                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
-                                              0x0000, 0x0202, 0x0404, 0x0606,
-                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+      const __m256i dup_even = _mm256_setr_epi16(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
       /* In this branch we handle three cases:
-        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
+        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+        single UTF-8 byte
+        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
+        UTF-8 bytes
+        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+        three UTF-8 bytes
 
         We expand the input word (16-bit) into two code units (32-bit), thus
         we have room for four bytes. However, we need five distinct bit
@@ -22379,7 +22611,7 @@ std::pair<result, char*> avx512_convert_utf32_to_utf8_with_errors(const char32_t
       // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
       const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
       // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m256i t2 = _mm256_or_si256 (t1, simdutf_vec(0b1000000000000000));
+      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
 
       // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
       const __m256i s0 = _mm256_srli_epi16(in_16, 4);
@@ -22389,7 +22621,8 @@ std::pair<result, char*> avx512_convert_utf32_to_utf8_with_errors(const char32_t
       const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
       // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
       const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
-      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask, simdutf_vec(0b0100000000000000));
+      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
+                                             simdutf_vec(0b0100000000000000));
       const __m256i s4 = _mm256_xor_si256(s3, m0);
 #undef simdutf_vec
 
@@ -22400,78 +22633,95 @@ std::pair<result, char*> avx512_convert_utf32_to_utf8_with_errors(const char32_t
       // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
       const uint32_t mask = (one_byte_bitmask & 0x55555555) |
                             (one_or_two_bytes_bitmask & 0xaaaaaaaa);
-      // Due to the wider registers, the following path is less likely to be useful.
+      // Due to the wider registers, the following path is less likely to be
+      // useful.
       /*if(mask == 0) {
         // We only have three-byte code units. Use fast path.
-        const __m256i shuffle = _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1, 2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
-        const __m256i utf8_0 = _mm256_shuffle_epi8(out0, shuffle);
-        const __m256i utf8_1 = _mm256_shuffle_epi8(out1, shuffle);
+        const __m256i shuffle =
+      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
+      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
+      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
+      _mm256_shuffle_epi8(out1, shuffle);
         _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
         utf8_output += 12;
         _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_0,1));
-        utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_1,1));
-        utf8_output += 12;
-        buf += 16;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
         continue;
       }*/
       const uint8_t mask0 = uint8_t(mask);
-      const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const __m128i shuffle0 = _mm_loadu_si128((__m128i*)(row0 + 1));
-      const __m128i utf8_0 = _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
 
       const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-      const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const __m128i shuffle1 = _mm_loadu_si128((__m128i*)(row1 + 1));
-      const __m128i utf8_1 = _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
 
       const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
-      const uint8_t* row2 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
-      const __m128i shuffle2 = _mm_loadu_si128((__m128i*)(row2 + 1));
-      const __m128i utf8_2 = _mm_shuffle_epi8(_mm256_extractf128_si256(out0,1), shuffle2);
-
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
+      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
+      const __m128i utf8_2 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
 
       const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
-      const uint8_t* row3 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
-      const __m128i shuffle3 = _mm_loadu_si128((__m128i*)(row3 + 1));
-      const __m128i utf8_3 = _mm_shuffle_epi8(_mm256_extractf128_si256(out1,1), shuffle3);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
+      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
+      const __m128i utf8_3 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
 
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
       utf8_output += row0[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
       utf8_output += row1[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_2);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
       utf8_output += row2[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_3);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
       utf8_output += row3[0];
       buf += 16;
     } else {
-      // case: at least one 32-bit word is larger than 0xFFFF <=> it will produce four UTF-8 bytes.
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // may require large, non-trivial tables?
+      // case: at least one 32-bit word is larger than 0xFFFF <=> it will
+      // produce four UTF-8 bytes. Let us do a scalar fallback. It may seem
+      // wasteful to use scalar code, but being efficient with SIMD may require
+      // large, non-trivial tables?
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFFFF80)==0) {  // 1-byte (ASCII)
+        if ((word & 0xFFFFFF80) == 0) { // 1-byte (ASCII)
           *utf8_output++ = char(word);
-        } else if((word & 0xFFFFF800)==0) { // 2-byte
-          *utf8_output++ = char((word>>6) | 0b11000000);
+        } else if ((word & 0xFFFFF800) == 0) { // 2-byte
+          *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word & 0xFFFF0000 )==0) {  // 3-byte
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(result(error_code::SURROGATE, buf - start + k), utf8_output); }
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) { // 3-byte
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k), utf8_output);
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else {  // 4-byte
-          if (word > 0x10FFFF) { return std::make_pair(result(error_code::TOO_LARGE, buf - start + k), utf8_output); }
-          *utf8_output++ = char((word>>18) | 0b11110000);
-          *utf8_output++ = char(((word>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else { // 4-byte
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k), utf8_output);
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         }
       }
@@ -22487,55 +22737,75 @@ std::pair<result, char*> avx512_convert_utf32_to_utf8_with_errors(const char32_t
 
 // Todo: currently, this is just the haswell code, optimize for icelake kernel.
 template <endianness big_endian>
-std::pair<const char32_t*, char16_t*> avx512_convert_utf32_to_utf16(const char32_t* buf, size_t len, char16_t* utf16_output) {
-  const char32_t* end = buf + len;
+std::pair<const char32_t *, char16_t *>
+avx512_convert_utf32_to_utf16(const char32_t *buf, size_t len,
+                              char16_t *utf16_output) {
+  const char32_t *end = buf + len;
 
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
   __m256i forbidden_bytemask = _mm256_setzero_si256();
 
-
   while (end - buf >= std::ptrdiff_t(8 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i*)buf);
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
 
     const __m256i v_00000000 = _mm256_setzero_si256();
     const __m256i v_ffff0000 = _mm256_set1_epi32((int32_t)0xffff0000);
 
     // no bits set above 16th bit <=> can pack to UTF16 without surrogate pairs
-    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
-    const uint32_t saturation_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+    const __m256i saturation_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
 
     if (saturation_bitmask == 0xffffffff) {
       const __m256i v_f800 = _mm256_set1_epi32((uint32_t)0xf800);
       const __m256i v_d800 = _mm256_set1_epi32((uint32_t)0xd800);
-      forbidden_bytemask = _mm256_or_si256(forbidden_bytemask, _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800));
+      forbidden_bytemask = _mm256_or_si256(
+          forbidden_bytemask,
+          _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800));
 
-      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),_mm256_extractf128_si256(in,1));
+      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),
+                                              _mm256_extractf128_si256(in, 1));
       if (big_endian) {
-        const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
         utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
       }
-      _mm_storeu_si128((__m128i*)utf16_output, utf16_packed);
+      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
       utf16_output += 8;
       buf += 8;
     } else {
       size_t forward = 7;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFF0000)==0) {
+        if ((word & 0xFFFF0000) == 0) {
           // will not generate a surrogate pair
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(nullptr, utf16_output); }
-          *utf16_output++ = big_endian ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8)) : char16_t(word);
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr, utf16_output);
+          }
+          *utf16_output++ =
+              big_endian
+                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
+                  : char16_t(word);
         } else {
           // will generate a surrogate pair
-          if (word > 0x10FFFF) { return std::make_pair(nullptr, utf16_output); }
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr, utf16_output);
+          }
           word -= 0x10000;
           uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
           uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
           if (big_endian) {
-            high_surrogate = uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
-            low_surrogate = uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
+            high_surrogate =
+                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
+            low_surrogate =
+                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
           }
           *utf16_output++ = char16_t(high_surrogate);
           *utf16_output++ = char16_t(low_surrogate);
@@ -22546,64 +22816,90 @@ std::pair<const char32_t*, char16_t*> avx512_convert_utf32_to_utf16(const char32
   }
 
   // check for invalid input
-  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) { return std::make_pair(nullptr, utf16_output); }
+  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) {
+    return std::make_pair(nullptr, utf16_output);
+  }
 
   return std::make_pair(buf, utf16_output);
 }
 
 // Todo: currently, this is just the haswell code, optimize for icelake kernel.
 template <endianness big_endian>
-std::pair<result, char16_t*> avx512_convert_utf32_to_utf16_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) {
-  const char32_t* start = buf;
-  const char32_t* end = buf + len;
+std::pair<result, char16_t *>
+avx512_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
+                                          char16_t *utf16_output) {
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
 
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
   while (end - buf >= std::ptrdiff_t(8 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i*)buf);
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
 
     const __m256i v_00000000 = _mm256_setzero_si256();
     const __m256i v_ffff0000 = _mm256_set1_epi32((int32_t)0xffff0000);
 
     // no bits set above 16th bit <=> can pack to UTF16 without surrogate pairs
-    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
-    const uint32_t saturation_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+    const __m256i saturation_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
 
     if (saturation_bitmask == 0xffffffff) {
       const __m256i v_f800 = _mm256_set1_epi32((uint32_t)0xf800);
       const __m256i v_d800 = _mm256_set1_epi32((uint32_t)0xd800);
-      const __m256i forbidden_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800);
-      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0x0) {
-        return std::make_pair(result(error_code::SURROGATE, buf - start), utf16_output);
+      const __m256i forbidden_bytemask =
+          _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800);
+      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) !=
+          0x0) {
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              utf16_output);
       }
 
-      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),_mm256_extractf128_si256(in,1));
+      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),
+                                              _mm256_extractf128_si256(in, 1));
       if (big_endian) {
-        const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
         utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
       }
-      _mm_storeu_si128((__m128i*)utf16_output, utf16_packed);
+      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
       utf16_output += 8;
       buf += 8;
     } else {
       size_t forward = 7;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFF0000)==0) {
+        if ((word & 0xFFFF0000) == 0) {
           // will not generate a surrogate pair
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(result(error_code::SURROGATE, buf - start + k), utf16_output); }
-          *utf16_output++ = big_endian ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8)) : char16_t(word);
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k), utf16_output);
+          }
+          *utf16_output++ =
+              big_endian
+                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
+                  : char16_t(word);
         } else {
           // will generate a surrogate pair
-          if (word > 0x10FFFF) { return std::make_pair(result(error_code::TOO_LARGE, buf - start + k), utf16_output); }
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k), utf16_output);
+          }
           word -= 0x10000;
           uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
           uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
           if (big_endian) {
-            high_surrogate = uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
-            low_surrogate = uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
+            high_surrogate =
+                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
+            low_surrogate =
+                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
           }
           *utf16_output++ = char16_t(high_surrogate);
           *utf16_output++ = char16_t(low_surrogate);
@@ -22619,17 +22915,20 @@ std::pair<result, char16_t*> avx512_convert_utf32_to_utf16_with_errors(const cha
 /* begin file src/icelake/icelake_ascii_validation.inl.cpp */
 // file included directly
 
-bool validate_ascii(const char* buf, size_t len) {
-  const char* end = buf + len;
+bool validate_ascii(const char *buf, size_t len) {
+  const char *end = buf + len;
   const __m512i ascii = _mm512_set1_epi8((uint8_t)0x80);
   __m512i running_or = _mm512_setzero_si512();
   for (; end - buf >= 64; buf += 64) {
-    const __m512i utf8 = _mm512_loadu_si512((const __m512i*)buf);
-    running_or = _mm512_ternarylogic_epi32(running_or, utf8, ascii, 0xf8); // running_or | (utf8 & ascii)
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)buf);
+    running_or = _mm512_ternarylogic_epi32(running_or, utf8, ascii,
+                                           0xf8); // running_or | (utf8 & ascii)
   }
-  if(buf < end) {
-     const __m512i utf8 = _mm512_maskz_loadu_epi8((uint64_t(1) << (end-buf)) - 1,(const __m512i*)buf);
-    running_or = _mm512_ternarylogic_epi32(running_or, utf8, ascii, 0xf8); // running_or | (utf8 & ascii)
+  if (buf < end) {
+    const __m512i utf8 = _mm512_maskz_loadu_epi8(
+        (uint64_t(1) << (end - buf)) - 1, (const __m512i *)buf);
+    running_or = _mm512_ternarylogic_epi32(running_or, utf8, ascii,
+                                           0xf8); // running_or | (utf8 & ascii)
   }
   return (_mm512_test_epi8_mask(running_or, running_or) == 0);
 }
@@ -22637,92 +22936,97 @@ bool validate_ascii(const char* buf, size_t len) {
 /* begin file src/icelake/icelake_utf32_validation.inl.cpp */
 // file included directly
 
-const char32_t* validate_utf32(const char32_t* buf, size_t len) {
-    if(len < 16) { return buf; }
-    const char32_t* end = buf + len - 16;
+const char32_t *validate_utf32(const char32_t *buf, size_t len) {
+  if (len < 16) {
+    return buf;
+  }
+  const char32_t *end = buf + len - 16;
 
-    const __m512i offset = _mm512_set1_epi32((uint32_t)0xffff2000);
-    __m512i currentmax = _mm512_setzero_si512();
-    __m512i currentoffsetmax = _mm512_setzero_si512();
+  const __m512i offset = _mm512_set1_epi32((uint32_t)0xffff2000);
+  __m512i currentmax = _mm512_setzero_si512();
+  __m512i currentoffsetmax = _mm512_setzero_si512();
 
-    while (buf <= end) {
-      __m512i utf32 = _mm512_loadu_si512((const __m512i*)buf);
-      buf += 16;
-      currentoffsetmax = _mm512_max_epu32(_mm512_add_epi32(utf32, offset), currentoffsetmax);
-      currentmax = _mm512_max_epu32(utf32, currentmax);
-    }
+  while (buf <= end) {
+    __m512i utf32 = _mm512_loadu_si512((const __m512i *)buf);
+    buf += 16;
+    currentoffsetmax =
+        _mm512_max_epu32(_mm512_add_epi32(utf32, offset), currentoffsetmax);
+    currentmax = _mm512_max_epu32(utf32, currentmax);
+  }
 
-    const __m512i standardmax = _mm512_set1_epi32((uint32_t)0x10ffff);
-    const __m512i standardoffsetmax = _mm512_set1_epi32((uint32_t)0xfffff7ff);
-    __m512i is_zero = _mm512_xor_si512(_mm512_max_epu32(currentmax, standardmax), standardmax);
-    if (_mm512_test_epi8_mask(is_zero, is_zero) != 0) {
-      return nullptr;
-    }
-    is_zero = _mm512_xor_si512(_mm512_max_epu32(currentoffsetmax, standardoffsetmax), standardoffsetmax);
-    if (_mm512_test_epi8_mask(is_zero, is_zero) != 0) {
-      return nullptr;
-    }
+  const __m512i standardmax = _mm512_set1_epi32((uint32_t)0x10ffff);
+  const __m512i standardoffsetmax = _mm512_set1_epi32((uint32_t)0xfffff7ff);
+  __m512i is_zero =
+      _mm512_xor_si512(_mm512_max_epu32(currentmax, standardmax), standardmax);
+  if (_mm512_test_epi8_mask(is_zero, is_zero) != 0) {
+    return nullptr;
+  }
+  is_zero = _mm512_xor_si512(
+      _mm512_max_epu32(currentoffsetmax, standardoffsetmax), standardoffsetmax);
+  if (_mm512_test_epi8_mask(is_zero, is_zero) != 0) {
+    return nullptr;
+  }
 
-    return buf;
+  return buf;
 }
 /* end file src/icelake/icelake_utf32_validation.inl.cpp */
 /* begin file src/icelake/icelake_convert_latin1_to_utf8.inl.cpp */
 // file included directly
 
-static inline size_t latin1_to_utf8_avx512_vec(__m512i input, size_t input_len, char *utf8_output, int mask_output) {
+static inline size_t latin1_to_utf8_avx512_vec(__m512i input, size_t input_len,
+                                               char *utf8_output,
+                                               int mask_output) {
   __mmask64 nonascii = _mm512_movepi8_mask(input);
   size_t output_size = input_len + (size_t)count_ones(nonascii);
 
   // Mask to denote whether the byte is a leading byte that is not ascii
-  __mmask64 sixth =
-      _mm512_cmpge_epu8_mask(input, _mm512_set1_epi8(-64)); //binary representation of -64: 1100 0000
+  __mmask64 sixth = _mm512_cmpge_epu8_mask(
+      input, _mm512_set1_epi8(-64)); // binary representation of -64: 1100 0000
 
   const uint64_t alternate_bits = UINT64_C(0x5555555555555555);
   uint64_t ascii = ~nonascii;
   // the bits in ascii are inverted and zeros are interspersed in between them
   uint64_t maskA = ~_pdep_u64(ascii, alternate_bits);
-  uint64_t maskB = ~_pdep_u64(ascii>>32, alternate_bits);
+  uint64_t maskB = ~_pdep_u64(ascii >> 32, alternate_bits);
 
   // interleave bytes from top and bottom halves (abcd...ABCD -> aAbBcCdD)
-  __m512i input_interleaved = _mm512_permutexvar_epi8(_mm512_set_epi32(
-    0x3f1f3e1e, 0x3d1d3c1c, 0x3b1b3a1a, 0x39193818,
-    0x37173616, 0x35153414, 0x33133212, 0x31113010,
-    0x2f0f2e0e, 0x2d0d2c0c, 0x2b0b2a0a, 0x29092808,
-    0x27072606, 0x25052404, 0x23032202, 0x21012000
-  ), input);
+  __m512i input_interleaved = _mm512_permutexvar_epi8(
+      _mm512_set_epi32(0x3f1f3e1e, 0x3d1d3c1c, 0x3b1b3a1a, 0x39193818,
+                       0x37173616, 0x35153414, 0x33133212, 0x31113010,
+                       0x2f0f2e0e, 0x2d0d2c0c, 0x2b0b2a0a, 0x29092808,
+                       0x27072606, 0x25052404, 0x23032202, 0x21012000),
+      input);
 
   // double size of each byte, and insert the leading byte 1100 0010
 
-/*
-upscale the bytes to 16-bit value, adding the 0b11000000 leading byte in the process.
-We adjust for the bytes that have their two most significant bits. This takes care of the first 32 bytes, assuming we interleaved the bytes. */
-  __m512i outputA = _mm512_shldi_epi16(input_interleaved, _mm512_set1_epi8(-62), 8);
+  /*
+  upscale the bytes to 16-bit values, adding the 0b11000010 leading byte in the
+  process. We adjust for bytes whose two most significant bits are set. This
+  takes care of the first 32 bytes, assuming we interleaved the bytes. */
+  __m512i outputA =
+      _mm512_shldi_epi16(input_interleaved, _mm512_set1_epi8(-62), 8);
   outputA = _mm512_mask_add_epi16(
-                                  outputA,
-                                 (__mmask32)sixth,
-                                  outputA,
-                                  _mm512_set1_epi16(1 - 0x4000)); // 1- 0x4000 = 1100 0000 0000 0001????
-
-  // in the second 32-bit half, set first or second option based on whether original input is leading byte (second case) or not (first case)
-  __m512i leadingB = _mm512_mask_blend_epi16(
-                                              (__mmask32)(sixth>>32),
-                                              _mm512_set1_epi16(0x00c2), // 0000 0000 1101 0010
-                                              _mm512_set1_epi16(0x40c3));// 0100 0000 1100 0011
+      outputA, (__mmask32)sixth, outputA,
+      _mm512_set1_epi16(1 - 0x4000)); // 1 - 0x4000 = 1100 0000 0000 0001
+
+  // in the second 32-bit half, set first or second option based on whether
+  // original input is leading byte (second case) or not (first case)
+  __m512i leadingB =
+      _mm512_mask_blend_epi16((__mmask32)(sixth >> 32),
+                              _mm512_set1_epi16(0x00c2),  // 0000 0000 1100 0010
+                              _mm512_set1_epi16(0x40c3)); // 0100 0000 1100 0011
   __m512i outputB = _mm512_ternarylogic_epi32(
-                                              input_interleaved,
-                                              leadingB,
-                                              _mm512_set1_epi16((short)0xff00),
-                                              (240 & 170) ^ 204); // (input_interleaved & 0xff00) ^ leadingB
+      input_interleaved, leadingB, _mm512_set1_epi16((short)0xff00),
+      (240 & 170) ^ 204); // (input_interleaved & 0xff00) ^ leadingB
 
   // prune redundant bytes
   outputA = _mm512_maskz_compress_epi8(maskA, outputA);
   outputB = _mm512_maskz_compress_epi8(maskB, outputB);
 
-
   size_t output_sizeA = (size_t)count_ones((uint32_t)nonascii) + 32;
 
-  if(mask_output) {
-    if(input_len > 32) { // is the second half of the input vector used?
+  if (mask_output) {
+    if (input_len > 32) { // is the second half of the input vector used?
       __mmask64 write_mask = _bzhi_u64(~0ULL, (unsigned int)output_sizeA);
       _mm512_mask_storeu_epi8(utf8_output, write_mask, outputA);
       utf8_output += output_sizeA;
@@ -22740,9 +23044,10 @@ We adjust for the bytes that have their two most significant bits. This takes ca
   return output_size;
 }
 
-static inline size_t latin1_to_utf8_avx512_branch(__m512i input, char *utf8_output) {
+static inline size_t latin1_to_utf8_avx512_branch(__m512i input,
+                                                  char *utf8_output) {
   __mmask64 nonascii = _mm512_movepi8_mask(input);
-  if(nonascii) {
+  if (nonascii) {
     return latin1_to_utf8_avx512_vec(input, 64, utf8_output, 0);
   } else {
     _mm512_storeu_si512(utf8_output, input);
@@ -22750,7 +23055,8 @@ static inline size_t latin1_to_utf8_avx512_branch(__m512i input, char *utf8_outp
   }
 }
 
-size_t latin1_to_utf8_avx512_start(const char *buf, size_t len, char *utf8_output) {
+size_t latin1_to_utf8_avx512_start(const char *buf, size_t len,
+                                   char *utf8_output) {
   char *start = utf8_output;
   size_t pos = 0;
   // if there's at least 128 bytes remaining, we don't need to mask the output
@@ -22812,22 +23118,25 @@ size_t icelake_convert_latin1_to_utf16(const char *latin1_input, size_t len,
 }
 /* end file src/icelake/icelake_convert_latin1_to_utf16.inl.cpp */
 /* begin file src/icelake/icelake_convert_latin1_to_utf32.inl.cpp */
-std::pair<const char*, char32_t*> avx512_convert_latin1_to_utf32(const char* buf, size_t len, char32_t* utf32_output) {
-    size_t rounded_len = len & ~0xF;  // Round down to nearest multiple of 16
+std::pair<const char *, char32_t *>
+avx512_convert_latin1_to_utf32(const char *buf, size_t len,
+                               char32_t *utf32_output) {
+  size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 16
 
-    for (size_t i = 0; i < rounded_len; i += 16) {
-        // Load 16 Latin1 characters into a 128-bit register
-        __m128i in = _mm_loadu_si128((__m128i*)&buf[i]);
+  for (size_t i = 0; i < rounded_len; i += 16) {
+    // Load 16 Latin1 characters into a 128-bit register
+    __m128i in = _mm_loadu_si128((__m128i *)&buf[i]);
 
-        // Zero extend each set of 8 Latin1 characters to 16 32-bit integers using vpmovzxbd
-        __m512i out = _mm512_cvtepu8_epi32(in);
+    // Zero extend the 16 Latin1 characters to 16 32-bit integers using
+    // vpmovzxbd
+    __m512i out = _mm512_cvtepu8_epi32(in);
 
-        // Store the results back to memory
-        _mm512_storeu_si512((__m512i*)&utf32_output[i], out);
-    }
+    // Store the results back to memory
+    _mm512_storeu_si512((__m512i *)&utf32_output[i], out);
+  }
 
-    // Return pointers pointing to where we left off
-    return std::make_pair(buf + rounded_len, utf32_output + rounded_len);
+  // Return pointers pointing to where we left off
+  return std::make_pair(buf + rounded_len, utf32_output + rounded_len);
 }
 /* end file src/icelake/icelake_convert_latin1_to_utf32.inl.cpp */
 /* begin file src/icelake/icelake_base64.inl.cpp */
@@ -22901,9 +23210,9 @@ template <bool base64_url>
 static inline uint64_t to_base64_mask(block64 *b, bool *error) {
   __m512i input = b->chunks[0];
   const __m512i ascii_space_tbl = _mm512_set_epi8(
-      0, 0, 13, 12, 0, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 13, 12, 0, 10, 9,
-      0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 13, 12, 0, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0,
-      32, 0, 0, 13, 12, 0, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0, 32);
+      0, 0, 13, 12, 0, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 13, 12, 0, 10,
+      9, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 13, 12, 0, 10, 9, 0, 0, 0, 0, 0, 0,
+      0, 0, 32, 0, 0, 13, 12, 0, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0, 32);
   __m512i lookup0;
   if (base64_url) {
     lookup0 = _mm512_set_epi8(
@@ -22959,14 +23268,14 @@ static inline uint64_t compress_block(block64 *b, uint64_t mask, char *output) {
   return _mm_popcnt_u64(nmask);
 }
 
-// The caller of this function is responsible to ensure that there are 64 bytes available
-// from reading at src. The data is read into a block64 structure.
+// The caller of this function is responsible for ensuring that 64 bytes are
+// available to read at src. The data is read into a block64 structure.
 static inline void load_block(block64 *b, const char *src) {
   b->chunks[0] = _mm512_loadu_si512(reinterpret_cast<const __m512i *>(src));
 }
 
-// The caller of this function is responsible to ensure that there are 128 bytes available
-// from reading at src. The data is read into a block64 structure.
+// The caller of this function is responsible for ensuring that 128 bytes are
+// available to read at src. The data is read into a block64 structure.
 static inline void load_block(block64 *b, const char16_t *src) {
   __m512i m1 = _mm512_loadu_si512(reinterpret_cast<const __m512i *>(src));
   __m512i m2 = _mm512_loadu_si512(reinterpret_cast<const __m512i *>(src + 32));
@@ -23001,13 +23310,16 @@ static inline void base64_decode_block(char *out, block64 *b) {
 
 template <bool base64_url, typename chartype>
 result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
-                              base64_options options) {
+                              base64_options options,
+                              last_chunk_handling_options last_chunk_options) {
   const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
                                         : tables::base64::to_base64_value;
-  size_t equallocation = srclen; // location of the first padding character if any
+  size_t equallocation =
+      srclen; // location of the first padding character if any
   size_t equalsigns = 0;
   // skip trailing spaces
-  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) && to_base64[uint8_t(src[srclen - 1])] == 64) {
+  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+         to_base64[uint8_t(src[srclen - 1])] == 64) {
     srclen--;
   }
   if (srclen > 0 && src[srclen - 1] == '=') {
@@ -23015,7 +23327,8 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
     srclen--;
     equalsigns = 1;
     // skip trailing spaces
-    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) && to_base64[uint8_t(src[srclen - 1])] == 64) {
+    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+           to_base64[uint8_t(src[srclen - 1])] == 64) {
       srclen--;
     }
     if (srclen > 0 && src[srclen - 1] == '=') {
@@ -23024,6 +23337,12 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       equalsigns = 2;
     }
   }
+  if (srclen == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
+    }
+    return {SUCCESS, 0};
+  }
   const chartype *const srcinit = src;
   const char *const dstinit = dst;
   const chartype *const srcend = src + srclen;
@@ -23042,7 +23361,8 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
       if (error) {
         src -= 64;
-        while (src < srcend && scalar::base64::is_eight_byte(*src) && to_base64[uint8_t(*src)] <= 64) {
+        while (src < srcend && scalar::base64::is_eight_byte(*src) &&
+               to_base64[uint8_t(*src)] <= 64) {
           src++;
         }
         return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
@@ -23116,70 +23436,38 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       buffer_start += 4;
     }
     // we may have 1, 2 or 3 bytes left and we need to decode them so let us
-    // bring in src content
+    // backtrack
     int leftover = int(bufferptr - buffer_start);
-    if (leftover > 0) {
-      while (leftover < 4 && src < srcend) {
-        uint8_t val = to_base64[uint8_t(*src)];
-        if (!scalar::base64::is_eight_byte(*src) || val > 64) {
-          return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
-        }
-        buffer_start[leftover] = char(val);
-        leftover += (val <= 63);
-        src++;
-      }
-
-      if (leftover == 1) {
-        return {BASE64_INPUT_REMAINDER, size_t(dst - dstinit)};
-      }
-      if (leftover == 2) {
-        uint32_t triple = (uint32_t(buffer_start[0]) << 3 * 6) +
-                          (uint32_t(buffer_start[1]) << 2 * 6);
-        triple = scalar::utf32::swap_bytes(triple);
-        triple >>= 8;
-        std::memcpy(dst, &triple, 1);
-        dst += 1;
-      } else if (leftover == 3) {
-        uint32_t triple = (uint32_t(buffer_start[0]) << 3 * 6) +
-                          (uint32_t(buffer_start[1]) << 2 * 6) +
-                          (uint32_t(buffer_start[2]) << 1 * 6);
-        triple = scalar::utf32::swap_bytes(triple);
-        triple >>= 8;
-
-        std::memcpy(dst, &triple, 2);
-        dst += 2;
-      } else {
-        uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
-                           (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
-                           (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
-                           (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
-                          << 8;
-        triple = scalar::utf32::swap_bytes(triple);
-        std::memcpy(dst, &triple, 3);
-        dst += 3;
+    while (leftover > 0) {
+      while (to_base64[uint8_t(*(src - 1))] == 64) {
+        src--;
       }
+      src--;
+      leftover--;
     }
   }
   if (src < srcend + equalsigns) {
-    result r =
-        scalar::base64::base64_tail_decode(dst, src, srcend - src, options);
+    result r = scalar::base64::base64_tail_decode(
+        dst, src, srcend - src, equalsigns, options, last_chunk_options);
     if (r.error == error_code::INVALID_BASE64_CHARACTER) {
       r.count += size_t(src - srcinit);
       return r;
     } else {
       r.count += size_t(dst - dstinit);
     }
-    if(r.error == error_code::SUCCESS && equalsigns > 0) {
+    if (last_chunk_options != stop_before_partial &&
+        r.error == error_code::SUCCESS && equalsigns > 0) {
       // additional checks
-      if((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+      if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
         r.error = error_code::INVALID_BASE64_CHARACTER;
         r.count = equallocation;
       }
     }
     return r;
   }
-  if(equalsigns > 0) {
-    if((size_t(dst - dstinit) % 3 == 0) || ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
+  if (equalsigns > 0) {
+    if ((size_t(dst - dstinit) % 3 == 0) ||
+        ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
   }
@@ -23187,7 +23475,6 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
 }
 /* end file src/icelake/icelake_base64.inl.cpp */
 
-
 #include <cstdint>
 
 } // namespace
@@ -23203,306 +23490,370 @@ implementation::detect_encodings(const char *input,
   // If there is a BOM, then we trust it.
   auto bom_encoding = simdutf::BOM::check_bom(input, length);
   // todo: convert to a one-pass algorithm
-  if(bom_encoding != encoding_type::unspecified) { return bom_encoding; }
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
+  }
   int out = 0;
-  if(validate_utf8(input, length)) { out |= encoding_type::UTF8; }
-  if((length % 2) == 0) {
-    if(validate_utf16le(reinterpret_cast<const char16_t*>(input), length/2)) { out |= encoding_type::UTF16_LE; }
+  if (validate_utf8(input, length)) {
+    out |= encoding_type::UTF8;
+  }
+  if ((length % 2) == 0) {
+    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
+                         length / 2)) {
+      out |= encoding_type::UTF16_LE;
+    }
   }
-  if((length % 4) == 0) {
-    if(validate_utf32(reinterpret_cast<const char32_t*>(input), length/4)) { out |= encoding_type::UTF32_LE; }
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      out |= encoding_type::UTF32_LE;
+    }
   }
   return out;
 }
 
-simdutf_warn_unused bool implementation::validate_utf8(const char *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf8(const char *buf, size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     return true;
   }
-    avx512_utf8_checker checker{};
-    const char* ptr = buf;
-    const char* end = ptr + len;
-    for (; end - ptr >= 64 ; ptr += 64) {
-        const __m512i utf8 = _mm512_loadu_si512((const __m512i*)ptr);
-        checker.check_next_input(utf8);
-    }
-    if(end != ptr) {
-       const __m512i utf8 = _mm512_maskz_loadu_epi8(~UINT64_C(0) >> (64 - (end - ptr)), (const __m512i*)ptr);
-       checker.check_next_input(utf8);
-    }
-    checker.check_eof();
-    return ! checker.errors();
-}
-
-simdutf_warn_unused result implementation::validate_utf8_with_errors(const char *buf, size_t len) const noexcept {
-    if (simdutf_unlikely(len == 0)) {
-       return result(error_code::SUCCESS, len);
-    }
-    avx512_utf8_checker checker{};
-    const char* ptr = buf;
-    const char* end = ptr + len;
-    size_t count{0};
-    for (; end - ptr >= 64 ; ptr += 64) {
-      const __m512i utf8 = _mm512_loadu_si512((const __m512i*)ptr);
-      checker.check_next_input(utf8);
-      if(checker.errors()) {
-        if (count != 0) { count--; } // Sometimes the error is only detected in the next chunk
-        result res = scalar::utf8::rewind_and_validate_with_errors(reinterpret_cast<const char*>(buf), reinterpret_cast<const char*>(buf + count), len - count);
-        res.count += count;
-        return res;
-      }
-      count += 64;
-    }
-    if (end != ptr) {
-      const __m512i utf8 = _mm512_maskz_loadu_epi8(~UINT64_C(0) >> (64 - (end - ptr)), (const __m512i*)ptr);
-      checker.check_next_input(utf8);
-    }
-    checker.check_eof();
-    if(checker.errors()) {
-      if (count != 0) { count--; } // Sometimes the error is only detected in the next chunk
-      result res = scalar::utf8::rewind_and_validate_with_errors(reinterpret_cast<const char*>(buf), reinterpret_cast<const char*>(buf + count), len - count);
+  avx512_utf8_checker checker{};
+  const char *ptr = buf;
+  const char *end = ptr + len;
+  for (; end - ptr >= 64; ptr += 64) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    checker.check_next_input(utf8);
+  }
+  if (end != ptr) {
+    const __m512i utf8 = _mm512_maskz_loadu_epi8(
+        ~UINT64_C(0) >> (64 - (end - ptr)), (const __m512i *)ptr);
+    checker.check_next_input(utf8);
+  }
+  checker.check_eof();
+  return !checker.errors();
+}
+
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *buf, size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return result(error_code::SUCCESS, len);
+  }
+  avx512_utf8_checker checker{};
+  const char *ptr = buf;
+  const char *end = ptr + len;
+  size_t count{0};
+  for (; end - ptr >= 64; ptr += 64) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    checker.check_next_input(utf8);
+    if (checker.errors()) {
+      if (count != 0) {
+        count--;
+      } // Sometimes the error is only detected in the next chunk
+      result res = scalar::utf8::rewind_and_validate_with_errors(
+          reinterpret_cast<const char *>(buf),
+          reinterpret_cast<const char *>(buf + count), len - count);
       res.count += count;
       return res;
     }
-    return result(error_code::SUCCESS, len);
+    count += 64;
+  }
+  if (end != ptr) {
+    const __m512i utf8 = _mm512_maskz_loadu_epi8(
+        ~UINT64_C(0) >> (64 - (end - ptr)), (const __m512i *)ptr);
+    checker.check_next_input(utf8);
+  }
+  checker.check_eof();
+  if (checker.errors()) {
+    if (count != 0) {
+      count--;
+    } // Sometimes the error is only detected in the next chunk
+    result res = scalar::utf8::rewind_and_validate_with_errors(
+        reinterpret_cast<const char *>(buf),
+        reinterpret_cast<const char *>(buf + count), len - count);
+    res.count += count;
+    return res;
+  }
+  return result(error_code::SUCCESS, len);
 }
 
-simdutf_warn_unused bool implementation::validate_ascii(const char *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *buf, size_t len) const noexcept {
   return icelake::validate_ascii(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_ascii_with_errors(const char *buf, size_t len) const noexcept {
-  const char* buf_orig = buf;
-  const char* end = buf + len;
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *buf, size_t len) const noexcept {
+  const char *buf_orig = buf;
+  const char *end = buf + len;
   const __m512i ascii = _mm512_set1_epi8((uint8_t)0x80);
-  for (; end - buf >= 64 ; buf += 64) {
-    const __m512i input = _mm512_loadu_si512((const __m512i*)buf);
+  for (; end - buf >= 64; buf += 64) {
+    const __m512i input = _mm512_loadu_si512((const __m512i *)buf);
     __mmask64 notascii = _mm512_cmp_epu8_mask(input, ascii, _MM_CMPINT_NLT);
-    if(notascii) {
-      return result(error_code::TOO_LARGE, buf - buf_orig + _tzcnt_u64(notascii));
+    if (notascii) {
+      return result(error_code::TOO_LARGE,
+                    buf - buf_orig + _tzcnt_u64(notascii));
     }
   }
   if (end != buf) {
-    const __m512i input = _mm512_maskz_loadu_epi8(~UINT64_C(0) >> (64 - (end - buf)), (const __m512i*)buf);
+    const __m512i input = _mm512_maskz_loadu_epi8(
+        ~UINT64_C(0) >> (64 - (end - buf)), (const __m512i *)buf);
     __mmask64 notascii = _mm512_cmp_epu8_mask(input, ascii, _MM_CMPINT_NLT);
-    if(notascii) {
-      return result(error_code::TOO_LARGE, buf - buf_orig + _tzcnt_u64(notascii));
+    if (notascii) {
+      return result(error_code::TOO_LARGE,
+                    buf - buf_orig + _tzcnt_u64(notascii));
     }
   }
   return result(error_code::SUCCESS, len);
 }
 
-simdutf_warn_unused bool implementation::validate_utf16le(const char16_t *buf, size_t len) const noexcept {
-    const char16_t *end = buf + len;
+simdutf_warn_unused bool
+implementation::validate_utf16le(const char16_t *buf,
+                                 size_t len) const noexcept {
+  const char16_t *end = buf + len;
 
-    for(;end - buf >= 32; ) {
-      __m512i in = _mm512_loadu_si512((__m512i*)buf);
-      __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-      __mmask32 surrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-      if(surrogates) {
-        __mmask32 highsurrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-        __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-        // high must be followed by low
-        if ((highsurrogates << 1) != lowsurrogates) {
-           return false;
-        }
-        bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
-        if(ends_with_high) {
-          buf += 31; // advance only by 31 code units so that we start with the high surrogate on the next round.
-        } else {
-          buf += 32;
-        }
+  for (; end - buf >= 32;) {
+    __m512i in = _mm512_loadu_si512((__m512i *)buf);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        return false;
+      }
+      bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
+      if (ends_with_high) {
+        buf += 31; // advance only by 31 code units so that we start with the
+                   // high surrogate on the next round.
       } else {
         buf += 32;
       }
+    } else {
+      buf += 32;
     }
-    if(buf < end) {
-      __m512i in = _mm512_maskz_loadu_epi16((1U<<(end-buf))-1,(__m512i*)buf);
-      __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-      __mmask32 surrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-      if(surrogates) {
-        __mmask32 highsurrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-        __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-        // high must be followed by low
-        if ((highsurrogates << 1) != lowsurrogates) {
-           return false;
-        }
+  }
+  if (buf < end) {
+    __m512i in =
+        _mm512_maskz_loadu_epi16((1U << (end - buf)) - 1, (__m512i *)buf);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        return false;
       }
     }
-    return true;
+  }
+  return true;
 }
 
-simdutf_warn_unused bool implementation::validate_utf16be(const char16_t *buf, size_t len) const noexcept {
-   const char16_t *end = buf + len;
-   const __m512i byteflip = _mm512_setr_epi64(
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809
-        );
-    for(;end - buf >= 32; ) {
-      __m512i in = _mm512_shuffle_epi8(_mm512_loadu_si512((__m512i*)buf), byteflip);
-      __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-      __mmask32 surrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-      if(surrogates) {
-        __mmask32 highsurrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-        __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-        // high must be followed by low
-        if ((highsurrogates << 1) != lowsurrogates) {
-           return false;
-        }
-        bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
-        if(ends_with_high) {
-          buf += 31; // advance only by 31 code units so that we start with the high surrogate on the next round.
-        } else {
-          buf += 32;
-        }
+simdutf_warn_unused bool
+implementation::validate_utf16be(const char16_t *buf,
+                                 size_t len) const noexcept {
+  const char16_t *end = buf + len;
+  const __m512i byteflip = _mm512_setr_epi64(
+      0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
+      0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
+      0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  for (; end - buf >= 32;) {
+    __m512i in =
+        _mm512_shuffle_epi8(_mm512_loadu_si512((__m512i *)buf), byteflip);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        return false;
+      }
+      bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
+      if (ends_with_high) {
+        buf += 31; // advance only by 31 code units so that we start with the
+                   // high surrogate on the next round.
       } else {
         buf += 32;
       }
+    } else {
+      buf += 32;
     }
-    if(buf < end) {
-      __m512i in = _mm512_shuffle_epi8(_mm512_maskz_loadu_epi16((1U<<(end-buf))-1,(__m512i*)buf), byteflip);
-      __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-      __mmask32 surrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-      if(surrogates) {
-        __mmask32 highsurrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-        __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-        // high must be followed by low
-        if ((highsurrogates << 1) != lowsurrogates) {
-           return false;
-        }
+  }
+  if (buf < end) {
+    __m512i in = _mm512_shuffle_epi8(
+        _mm512_maskz_loadu_epi16((1U << (end - buf)) - 1, (__m512i *)buf),
+        byteflip);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        return false;
       }
     }
-    return true;
+  }
+  return true;
 }
 
-simdutf_warn_unused result implementation::validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept {
-    const char16_t *start_buf = buf;
-    const char16_t *end = buf + len;
-    for(;end - buf >= 32; ) {
-      __m512i in = _mm512_loadu_si512((__m512i*)buf);
-      __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-      __mmask32 surrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-      if(surrogates) {
-        __mmask32 highsurrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-        __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-        // high must be followed by low
-        if ((highsurrogates << 1) != lowsurrogates) {
-          uint32_t extra_low = _tzcnt_u32(lowsurrogates &~(highsurrogates << 1));
-          uint32_t extra_high = _tzcnt_u32(highsurrogates &~(lowsurrogates >> 1));
-          return result(error_code::SURROGATE, (buf - start_buf) + (extra_low < extra_high ? extra_low : extra_high));
-        }
-        bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
-        if(ends_with_high) {
-          buf += 31; // advance only by 31 code units so that we start with the high surrogate on the next round.
-        } else {
-          buf += 32;
-        }
+simdutf_warn_unused result implementation::validate_utf16le_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  const char16_t *start_buf = buf;
+  const char16_t *end = buf + len;
+  for (; end - buf >= 32;) {
+    __m512i in = _mm512_loadu_si512((__m512i *)buf);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        uint32_t extra_low = _tzcnt_u32(lowsurrogates & ~(highsurrogates << 1));
+        uint32_t extra_high =
+            _tzcnt_u32(highsurrogates & ~(lowsurrogates >> 1));
+        return result(error_code::SURROGATE,
+                      (buf - start_buf) +
+                          (extra_low < extra_high ? extra_low : extra_high));
+      }
+      bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
+      if (ends_with_high) {
+        buf += 31; // advance only by 31 code units so that we start with the
+                   // high surrogate on the next round.
       } else {
         buf += 32;
       }
+    } else {
+      buf += 32;
     }
-    if(buf < end) {
-      __m512i in = _mm512_maskz_loadu_epi16((1U<<(end-buf))-1,(__m512i*)buf);
-      __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-      __mmask32 surrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-      if(surrogates) {
-        __mmask32 highsurrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-        __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-        // high must be followed by low
-        if ((highsurrogates << 1) != lowsurrogates) {
-          uint32_t extra_low = _tzcnt_u32(lowsurrogates &~(highsurrogates << 1));
-          uint32_t extra_high = _tzcnt_u32(highsurrogates &~(lowsurrogates >> 1));
-          return result(error_code::SURROGATE, (buf - start_buf) + (extra_low < extra_high ? extra_low : extra_high));
-        }
+  }
+  if (buf < end) {
+    __m512i in =
+        _mm512_maskz_loadu_epi16((1U << (end - buf)) - 1, (__m512i *)buf);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        uint32_t extra_low = _tzcnt_u32(lowsurrogates & ~(highsurrogates << 1));
+        uint32_t extra_high =
+            _tzcnt_u32(highsurrogates & ~(lowsurrogates >> 1));
+        return result(error_code::SURROGATE,
+                      (buf - start_buf) +
+                          (extra_low < extra_high ? extra_low : extra_high));
       }
     }
-    return result(error_code::SUCCESS, len);
+  }
+  return result(error_code::SUCCESS, len);
 }
 
-simdutf_warn_unused result implementation::validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept {
-    const char16_t *start_buf = buf;
-    const char16_t *end = buf + len;
-    const __m512i byteflip = _mm512_setr_epi64(
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809
-        );
-    for(;end - buf >= 32; ) {
-      __m512i in = _mm512_shuffle_epi8(_mm512_loadu_si512((__m512i*)buf), byteflip);
-      __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-      __mmask32 surrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-      if(surrogates) {
-        __mmask32 highsurrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-        __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-        // high must be followed by low
-        if ((highsurrogates << 1) != lowsurrogates) {
-          uint32_t extra_low = _tzcnt_u32(lowsurrogates &~(highsurrogates << 1));
-          uint32_t extra_high = _tzcnt_u32(highsurrogates &~(lowsurrogates >> 1));
-          return result(error_code::SURROGATE, (buf - start_buf) + (extra_low < extra_high ? extra_low : extra_high));
-        }
-        bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
-        if(ends_with_high) {
-          buf += 31; // advance only by 31 code units so that we start with the high surrogate on the next round.
-        } else {
-          buf += 32;
-        }
+simdutf_warn_unused result implementation::validate_utf16be_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  const char16_t *start_buf = buf;
+  const char16_t *end = buf + len;
+  const __m512i byteflip = _mm512_setr_epi64(
+      0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
+      0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
+      0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  for (; end - buf >= 32;) {
+    __m512i in =
+        _mm512_shuffle_epi8(_mm512_loadu_si512((__m512i *)buf), byteflip);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        uint32_t extra_low = _tzcnt_u32(lowsurrogates & ~(highsurrogates << 1));
+        uint32_t extra_high =
+            _tzcnt_u32(highsurrogates & ~(lowsurrogates >> 1));
+        return result(error_code::SURROGATE,
+                      (buf - start_buf) +
+                          (extra_low < extra_high ? extra_low : extra_high));
+      }
+      bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
+      if (ends_with_high) {
+        buf += 31; // advance only by 31 code units so that we start with the
+                   // high surrogate on the next round.
       } else {
         buf += 32;
       }
+    } else {
+      buf += 32;
     }
-    if(buf < end) {
-      __m512i in = _mm512_shuffle_epi8(_mm512_maskz_loadu_epi16((1U<<(end-buf))-1,(__m512i*)buf), byteflip);
-      __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-      __mmask32 surrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-      if(surrogates) {
-        __mmask32 highsurrogates = _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-        __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-        // high must be followed by low
-        if ((highsurrogates << 1) != lowsurrogates) {
-          uint32_t extra_low = _tzcnt_u32(lowsurrogates &~(highsurrogates << 1));
-          uint32_t extra_high = _tzcnt_u32(highsurrogates &~(lowsurrogates >> 1));
-          return result(error_code::SURROGATE, (buf - start_buf) + (extra_low < extra_high ? extra_low : extra_high));
-        }
+  }
+  if (buf < end) {
+    __m512i in = _mm512_shuffle_epi8(
+        _mm512_maskz_loadu_epi16((1U << (end - buf)) - 1, (__m512i *)buf),
+        byteflip);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        uint32_t extra_low = _tzcnt_u32(lowsurrogates & ~(highsurrogates << 1));
+        uint32_t extra_high =
+            _tzcnt_u32(highsurrogates & ~(lowsurrogates >> 1));
+        return result(error_code::SURROGATE,
+                      (buf - start_buf) +
+                          (extra_low < extra_high ? extra_low : extra_high));
       }
     }
-    return result(error_code::SUCCESS, len);
+  }
+  return result(error_code::SUCCESS, len);
 }
 
-simdutf_warn_unused bool implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
-  const char32_t * tail = icelake::validate_utf32(buf, len);
+simdutf_warn_unused bool
+implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
+  const char32_t *tail = icelake::validate_utf32(buf, len);
   if (tail) {
     return scalar::utf32::validate(tail, len - (tail - buf));
   } else {
-    // we come here if there was an error, or buf was nullptr which may happen for empty input.
+    // we come here if there was an error, or buf was nullptr which may happen
+    // for empty input.
     return len == 0;
   }
 }
 
-simdutf_warn_unused result implementation::validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept {
-  const char32_t* buf_orig = buf;
-  if(len >= 16) {
-    const char32_t* end = buf + len - 16;
+simdutf_warn_unused result implementation::validate_utf32_with_errors(
+    const char32_t *buf, size_t len) const noexcept {
+  const char32_t *buf_orig = buf;
+  if (len >= 16) {
+    const char32_t *end = buf + len - 16;
     while (buf <= end) {
-      __m512i utf32 = _mm512_loadu_si512((const __m512i*)buf);
-      __mmask16 outside_range = _mm512_cmp_epu32_mask(utf32, _mm512_set1_epi32(0x10ffff),
-                                _MM_CMPINT_GT);
+      __m512i utf32 = _mm512_loadu_si512((const __m512i *)buf);
+      __mmask16 outside_range = _mm512_cmp_epu32_mask(
+          utf32, _mm512_set1_epi32(0x10ffff), _MM_CMPINT_GT);
 
-      __m512i utf32_off = _mm512_add_epi32(utf32, _mm512_set1_epi32(0xffff2000));
+      __m512i utf32_off =
+          _mm512_add_epi32(utf32, _mm512_set1_epi32(0xffff2000));
 
-      __mmask16 surrogate_range = _mm512_cmp_epu32_mask(utf32_off, _mm512_set1_epi32(0xfffff7ff),
-                                _MM_CMPINT_GT);
-      if((outside_range | surrogate_range)) {
+      __mmask16 surrogate_range = _mm512_cmp_epu32_mask(
+          utf32_off, _mm512_set1_epi32(0xfffff7ff), _MM_CMPINT_GT);
+      if ((outside_range | surrogate_range)) {
         auto outside_idx = _tzcnt_u32(outside_range);
         auto surrogate_idx = _tzcnt_u32(surrogate_range);
 
@@ -23516,15 +23867,16 @@ simdutf_warn_unused result implementation::validate_utf32_with_errors(const char
       buf += 16;
     }
   }
-  if(len > 0) {
-    __m512i utf32 = _mm512_maskz_loadu_epi32(__mmask16((1U<<(buf_orig + len - buf))-1),(const __m512i*)buf);
-    __mmask16 outside_range = _mm512_cmp_epu32_mask(utf32, _mm512_set1_epi32(0x10ffff),
-                              _MM_CMPINT_GT);
+  if (len > 0) {
+    __m512i utf32 = _mm512_maskz_loadu_epi32(
+        __mmask16((1U << (buf_orig + len - buf)) - 1), (const __m512i *)buf);
+    __mmask16 outside_range = _mm512_cmp_epu32_mask(
+        utf32, _mm512_set1_epi32(0x10ffff), _MM_CMPINT_GT);
     __m512i utf32_off = _mm512_add_epi32(utf32, _mm512_set1_epi32(0xffff2000));
 
-    __mmask16 surrogate_range = _mm512_cmp_epu32_mask(utf32_off, _mm512_set1_epi32(0xfffff7ff),
-                              _MM_CMPINT_GT);
-    if((outside_range | surrogate_range)) {
+    __mmask16 surrogate_range = _mm512_cmp_epu32_mask(
+        utf32_off, _mm512_set1_epi32(0xfffff7ff), _MM_CMPINT_GT);
+    if ((outside_range | surrogate_range)) {
       auto outside_idx = _tzcnt_u32(outside_range);
       auto surrogate_idx = _tzcnt_u32(surrogate_range);
 
@@ -23534,91 +23886,115 @@ simdutf_warn_unused result implementation::validate_utf32_with_errors(const char
 
       return result(error_code::SURROGATE, buf - buf_orig + surrogate_idx);
     }
-
   }
 
   return result(error_code::SUCCESS, len);
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(const char * buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
+    const char *buf, size_t len, char *utf8_output) const noexcept {
   return icelake::latin1_to_utf8_avx512_start(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return icelake_convert_latin1_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return icelake_convert_latin1_to_utf16<endianness::LITTLE>(buf, len,
+                                                             utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return icelake_convert_latin1_to_utf16<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return icelake_convert_latin1_to_utf16<endianness::BIG>(buf, len,
+                                                          utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(const char* buf, size_t len, char32_t* utf32_output) const noexcept {
-    std::pair<const char*, char32_t*> ret = avx512_convert_latin1_to_utf32(buf, len, utf32_output);
-    if (ret.first == nullptr) { return 0; }
-    size_t converted_chars = ret.second - utf32_output;
-    if (ret.first != buf + len) {
-        const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
-                                              ret.first, len - (ret.first - buf), ret.second);
-        if (scalar_converted_chars == 0) { return 0; }
-        converted_chars += scalar_converted_chars;
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char *, char32_t *> ret =
+      avx512_convert_latin1_to_utf32(buf, len, utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t converted_chars = ret.second - utf32_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_converted_chars == 0) {
+      return 0;
     }
-    return converted_chars;
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(const char* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
   return icelake::utf8_to_latin1_avx512(buf, len, latin1_output);
 }
 
-
-simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(const char* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
   // First, try to convert as much as possible using the SIMD implementation.
-  const char * obuf = buf;
-  char * olatin1_output = latin1_output;
+  const char *obuf = buf;
+  char *olatin1_output = latin1_output;
   size_t written = icelake::utf8_to_latin1_avx512(obuf, len, olatin1_output);
 
   // If we have completely converted the string
-  if(obuf == buf + len) {
+  if (obuf == buf + len) {
     return {simdutf::SUCCESS, written};
   }
   size_t pos = obuf - buf;
-  result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(pos, buf + pos, len - pos, latin1_output);
+  result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+      pos, buf + pos, len - pos, latin1_output);
   res.count += pos;
   return res;
 }
 
-
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(const char* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
   return icelake::valid_utf8_to_latin1_avx512(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  utf8_to_utf16_result ret = fast_avx512_convert_utf8_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16_result ret =
+      fast_avx512_convert_utf8_to_utf16<endianness::LITTLE>(buf, len,
+                                                            utf16_output);
   if (ret.second == nullptr) {
     return 0;
   }
   return ret.second - utf16_output;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  utf8_to_utf16_result ret = fast_avx512_convert_utf8_to_utf16<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16_result ret = fast_avx512_convert_utf8_to_utf16<endianness::BIG>(
+      buf, len, utf16_output);
   if (ret.second == nullptr) {
     return 0;
   }
   return ret.second - utf16_output;
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-   return fast_avx512_convert_utf8_to_utf16_with_errors<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return fast_avx512_convert_utf8_to_utf16_with_errors<endianness::LITTLE>(
+      buf, len, utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-   return fast_avx512_convert_utf8_to_utf16_with_errors<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return fast_avx512_convert_utf8_to_utf16_with_errors<endianness::BIG>(
+      buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  utf8_to_utf16_result ret = icelake::valid_utf8_to_fixed_length<endianness::LITTLE, char16_t>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16_result ret =
+      icelake::valid_utf8_to_fixed_length<endianness::LITTLE, char16_t>(
+          buf, len, utf16_output);
   size_t saved_bytes = ret.second - utf16_output;
-  const char* end = buf + len;
+  const char *end = buf + len;
   if (ret.first == end) {
     return saved_bytes;
   }
@@ -23629,23 +24005,29 @@ simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(const c
   //       It meas, we have to skip continuation bytes from
   //       the beginning ret.first, as they were already consumed.
   while (ret.first != end && ((uint8_t(*ret.first) & 0xc0) == 0x80)) {
-      ret.first += 1;
+    ret.first += 1;
   }
 
   if (ret.first != end) {
-    const size_t scalar_saved_bytes = scalar::utf8_to_utf16::convert_valid<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf8_to_utf16::convert_valid<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
 
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-  utf8_to_utf16_result ret = icelake::valid_utf8_to_fixed_length<endianness::BIG, char16_t>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16_result ret =
+      icelake::valid_utf8_to_fixed_length<endianness::BIG, char16_t>(
+          buf, len, utf16_output);
   size_t saved_bytes = ret.second - utf16_output;
-  const char* end = buf + len;
+  const char *end = buf + len;
   if (ret.first == end) {
     return saved_bytes;
   }
@@ -23656,28 +24038,33 @@ simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(const c
   //       It meas, we have to skip continuation bytes from
   //       the beginning ret.first, as they were already consumed.
   while (ret.first != end && ((uint8_t(*ret.first) & 0xc0) == 0x80)) {
-      ret.first += 1;
+    ret.first += 1;
   }
 
   if (ret.first != end) {
-    const size_t scalar_saved_bytes = scalar::utf8_to_utf16::convert_valid<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf8_to_utf16::convert_valid<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
 
   return saved_bytes;
 }
 
-
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(const char* buf, size_t len, char32_t* utf32_out) const noexcept {
-  uint32_t * utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
-  utf8_to_utf32_result ret = icelake::validating_utf8_to_fixed_length<endianness::LITTLE, uint32_t>(buf, len, utf32_output);
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_out) const noexcept {
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
+  utf8_to_utf32_result ret =
+      icelake::validating_utf8_to_fixed_length<endianness::LITTLE, uint32_t>(
+          buf, len, utf32_output);
   if (ret.second == nullptr)
     return 0;
 
   size_t saved_bytes = ret.second - utf32_output;
-  const char* end = buf + len;
+  const char *end = buf + len;
   if (ret.first == end) {
     return saved_bytes;
   }
@@ -23688,49 +24075,55 @@ simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(const char* buf
   //       It means, we have to skip continuation bytes from
   //       the beginning ret.first, as they were already consumed.
   while (ret.first != end && ((uint8_t(*ret.first) & 0xc0) == 0x80)) {
-      ret.first += 1;
+    ret.first += 1;
   }
   if (ret.first != end) {
     const size_t scalar_saved_bytes = scalar::utf8_to_utf32::convert(
-                                        ret.first, len - (ret.first - buf), utf32_out + saved_bytes);
-    if (scalar_saved_bytes == 0) { return 0; }
+        ret.first, len - (ret.first - buf), utf32_out + saved_bytes);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
 
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(const char* buf, size_t len, char32_t* utf32) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char *buf, size_t len, char32_t *utf32) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     return {error_code::SUCCESS, 0};
   }
-  uint32_t * utf32_output = reinterpret_cast<uint32_t *>(utf32);
-  auto ret = icelake::validating_utf8_to_fixed_length_with_constant_checks<endianness::LITTLE, uint32_t>(buf, len, utf32_output);
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32);
+  auto ret = icelake::validating_utf8_to_fixed_length_with_constant_checks<
+      endianness::LITTLE, uint32_t>(buf, len, utf32_output);
 
   if (!std::get<2>(ret)) {
     size_t pos = std::get<0>(ret) - buf;
     // We might have an error that occurs right before  pos.
     // This is only a concern if buf[pos] is not a continuation byte.
-    if((buf[pos] & 0xc0) != 0x80 && pos >= 64) {
+    if ((buf[pos] & 0xc0) != 0x80 && pos >= 64) {
       pos -= 1;
     } else if ((buf[pos] & 0xc0) == 0x80 && pos >= 64) {
       // We must check whether we are the fourth continuation byte
       bool c1 = (buf[pos - 1] & 0xc0) == 0x80;
       bool c2 = (buf[pos - 2] & 0xc0) == 0x80;
       bool c3 = (buf[pos - 3] & 0xc0) == 0x80;
-      if(c1 && c2 && c3) {
+      if (c1 && c2 && c3) {
         return {simdutf::TOO_LONG, pos};
       }
     }
-    // todo: we reset the output to utf32 instead of using std::get<2.(ret) as you'd expect.
-    // that is because validating_utf8_to_fixed_length_with_constant_checks may have processed
+    // todo: we reset the output to utf32 instead of using std::get<2>(ret) as
+    // you'd expect. that is because
+    // validating_utf8_to_fixed_length_with_constant_checks may have processed
     // data beyond the error.
-    result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(pos, buf + pos, len - pos, utf32);
-      res.count += pos;
-      return res;
+    result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+        pos, buf + pos, len - pos, utf32);
+    res.count += pos;
+    return res;
   }
   size_t saved_bytes = std::get<1>(ret) - utf32_output;
-  const char* end = buf + len;
+  const char *end = buf + len;
   if (std::get<0>(ret) == end) {
     return {simdutf::SUCCESS, saved_bytes};
   }
@@ -23740,15 +24133,17 @@ simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(con
   //       continuation bytes lie outside 16-byte window.
   //       It means, we have to skip continuation bytes from
   //       the beginning ret.first, as they were already consumed.
-  while (std::get<0>(ret) != end and ((uint8_t(*std::get<0>(ret)) & 0xc0) == 0x80)) {
-      std::get<0>(ret) += 1;
+  while (std::get<0>(ret) != end and
+         ((uint8_t(*std::get<0>(ret)) & 0xc0) == 0x80)) {
+    std::get<0>(ret) += 1;
   }
 
   if (std::get<0>(ret) != end) {
     auto scalar_result = scalar::utf8_to_utf32::convert_with_errors(
-                                        std::get<0>(ret), len - (std::get<0>(ret) - buf), reinterpret_cast<char32_t *>(utf32_output) + saved_bytes);
+        std::get<0>(ret), len - (std::get<0>(ret) - buf),
+        reinterpret_cast<char32_t *>(utf32_output) + saved_bytes);
     if (scalar_result.error != simdutf::SUCCESS) {
-      scalar_result.count +=  (std::get<0>(ret) - buf);
+      scalar_result.count += (std::get<0>(ret) - buf);
     } else {
       scalar_result.count += saved_bytes;
     }
@@ -23758,12 +24153,14 @@ simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(con
   return {simdutf::SUCCESS, size_t(std::get<1>(ret) - utf32_output)};
 }
 
-
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(const char* buf, size_t len, char32_t* utf32_out) const noexcept {
-  uint32_t * utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
-  utf8_to_utf32_result ret = icelake::valid_utf8_to_fixed_length<endianness::LITTLE, uint32_t>(buf, len, utf32_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_out) const noexcept {
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
+  utf8_to_utf32_result ret =
+      icelake::valid_utf8_to_fixed_length<endianness::LITTLE, uint32_t>(
+          buf, len, utf32_output);
   size_t saved_bytes = ret.second - utf32_output;
-  const char* end = buf + len;
+  const char *end = buf + len;
   if (ret.first == end) {
     return saved_bytes;
   }
@@ -23774,122 +24171,165 @@ simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(const cha
   //       It meas, we have to skip continuation bytes from
   //       the beginning ret.first, as they were already consumed.
   while (ret.first != end && ((uint8_t(*ret.first) & 0xc0) == 0x80)) {
-      ret.first += 1;
+    ret.first += 1;
   }
 
   if (ret.first != end) {
     const size_t scalar_saved_bytes = scalar::utf8_to_utf32::convert_valid(
-                                        ret.first, len - (ret.first - buf), utf32_out + saved_bytes);
-    if (scalar_saved_bytes == 0) { return 0; }
+        ret.first, len - (ret.first - buf), utf32_out + saved_bytes);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
 
   return saved_bytes;
 }
 
-
-simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  return icelake_convert_utf16_to_latin1<endianness::LITTLE>(buf,len,latin1_output);
+simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf16_to_latin1<endianness::LITTLE>(buf, len,
+                                                             latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  return icelake_convert_utf16_to_latin1<endianness::BIG>(buf,len,latin1_output);
+simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf16_to_latin1<endianness::BIG>(buf, len,
+                                                          latin1_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_latin1_with_errors(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  return icelake_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(buf,len,latin1_output).first;
+simdutf_warn_unused result
+implementation::convert_utf16le_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
+             buf, len, latin1_output)
+      .first;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_latin1_with_errors(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  return icelake_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf,len,latin1_output).first;
+simdutf_warn_unused result
+implementation::convert_utf16be_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf16_to_latin1_with_errors<endianness::BIG>(
+             buf, len, latin1_output)
+      .first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   // optimization opportunity: implement custom function
   return convert_utf16be_to_latin1(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   // optimization opportunity: implement custom function
   return convert_utf16le_to_latin1(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   size_t outlen;
-  size_t inlen = utf16_to_utf8_avx512i<endianness::LITTLE>(buf, len, (unsigned char*)utf8_output, &outlen);
-  if(inlen != len) { return 0; }
+  size_t inlen = utf16_to_utf8_avx512i<endianness::LITTLE>(
+      buf, len, (unsigned char *)utf8_output, &outlen);
+  if (inlen != len) {
+    return 0;
+  }
   return outlen;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   size_t outlen;
-  size_t inlen = utf16_to_utf8_avx512i<endianness::BIG>(buf, len, (unsigned char*)utf8_output, &outlen);
-  if(inlen != len) { return 0; }
+  size_t inlen = utf16_to_utf8_avx512i<endianness::BIG>(
+      buf, len, (unsigned char *)utf8_output, &outlen);
+  if (inlen != len) {
+    return 0;
+  }
   return outlen;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   size_t outlen;
-  size_t inlen = utf16_to_utf8_avx512i<endianness::LITTLE>(buf, len, (unsigned char*)utf8_output, &outlen);
-  if(inlen != len) {
-    result res = scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(buf + inlen, len - inlen, utf8_output + outlen);
+  size_t inlen = utf16_to_utf8_avx512i<endianness::LITTLE>(
+      buf, len, (unsigned char *)utf8_output, &outlen);
+  if (inlen != len) {
+    result res = scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
+        buf + inlen, len - inlen, utf8_output + outlen);
     res.count += inlen;
     return res;
   }
   return {simdutf::SUCCESS, outlen};
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   size_t outlen;
-  size_t inlen = utf16_to_utf8_avx512i<endianness::BIG>(buf, len, (unsigned char*)utf8_output, &outlen);
-  if(inlen != len) {
-    result res = scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(buf + inlen, len - inlen, utf8_output + outlen);
+  size_t inlen = utf16_to_utf8_avx512i<endianness::BIG>(
+      buf, len, (unsigned char *)utf8_output, &outlen);
+  if (inlen != len) {
+    result res = scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
+        buf + inlen, len - inlen, utf8_output + outlen);
     res.count += inlen;
     return res;
   }
   return {simdutf::SUCCESS, outlen};
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   return convert_utf16le_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   return convert_utf16be_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
-  return icelake_convert_utf32_to_latin1(buf,len,latin1_output);
+simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf32_to_latin1(buf, len, latin1_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
-  return icelake_convert_utf32_to_latin1_with_errors(buf,len,latin1_output).first;
+simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf32_to_latin1_with_errors(buf, len, latin1_output)
+      .first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
-  return icelake_convert_utf32_to_latin1(buf,len,latin1_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf32_to_latin1(buf, len, latin1_output);
 }
 
-
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
-  std::pair<const char32_t*, char*> ret = avx512_convert_utf32_to_utf8(buf, len, utf8_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      avx512_convert_utf32_to_utf8(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf8_output;
   if (ret.first != buf + len) {
     const size_t scalar_saved_bytes = scalar::utf32_to_utf8::convert(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char*> ret = icelake::avx512_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      icelake::avx512_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
   if (ret.first.count != len) {
     result scalar_res = scalar::utf32_to_utf8::convert_with_errors(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+        buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -23897,46 +24337,68 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(con
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf8_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
   return convert_utf32_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  std::pair<const char32_t*, char16_t*> ret = avx512_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      avx512_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf16_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_utf16::convert<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  std::pair<const char32_t*, char16_t*> ret = avx512_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      avx512_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf16_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_utf16::convert<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char16_t*> ret = avx512_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      avx512_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(
+          buf, len, utf16_output);
   if (ret.first.count != len) {
-    result scalar_res = scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -23944,16 +24406,23 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf16_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char16_t*> ret = avx512_convert_utf32_to_utf16_with_errors<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      avx512_convert_utf32_to_utf16_with_errors<endianness::BIG>(buf, len,
+                                                                 utf16_output);
   if (ret.first.count != len) {
-    result scalar_res = scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -23961,56 +24430,80 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf16_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
   return convert_utf32_to_utf16le(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
   return convert_utf32_to_utf16be(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  std::tuple<const char16_t*, char32_t*, bool> ret = icelake::convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
-  if (!std::get<2>(ret)) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::tuple<const char16_t *, char32_t *, bool> ret =
+      icelake::convert_utf16_to_utf32<endianness::LITTLE>(buf, len,
+                                                          utf32_output);
+  if (!std::get<2>(ret)) {
+    return 0;
+  }
   size_t saved_bytes = std::get<1>(ret) - utf32_output;
   if (std::get<0>(ret) != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf32::convert<endianness::LITTLE>(
-                                        std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::LITTLE>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  std::tuple<const char16_t*, char32_t*, bool> ret = icelake::convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
-  if (!std::get<2>(ret)) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::tuple<const char16_t *, char32_t *, bool> ret =
+      icelake::convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
+  if (!std::get<2>(ret)) {
+    return 0;
+  }
   size_t saved_bytes = std::get<1>(ret) - utf32_output;
   if (std::get<0>(ret) != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf32::convert<endianness::BIG>(
-                                        std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::BIG>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  std::tuple<const char16_t*, char32_t*, bool> ret = icelake::convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::tuple<const char16_t *, char32_t *, bool> ret =
+      icelake::convert_utf16_to_utf32<endianness::LITTLE>(buf, len,
+                                                          utf32_output);
   if (!std::get<2>(ret)) {
-    result scalar_res = scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
-                                        std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
     scalar_res.count += (std::get<0>(ret) - buf);
     return scalar_res;
   }
   size_t saved_bytes = std::get<1>(ret) - utf32_output;
   if (std::get<0>(ret) != buf + len) {
-    result scalar_res = scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
-                                        std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
     if (scalar_res.error) {
       scalar_res.count += (std::get<0>(ret) - buf);
       return scalar_res;
@@ -24022,18 +24515,22 @@ simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
   return simdutf::result(simdutf::SUCCESS, saved_bytes);
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  std::tuple<const char16_t*, char32_t*, bool> ret = icelake::convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::tuple<const char16_t *, char32_t *, bool> ret =
+      icelake::convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
   if (!std::get<2>(ret)) {
-    result scalar_res = scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
-                                        std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
     scalar_res.count += (std::get<0>(ret) - buf);
     return scalar_res;
   }
   size_t saved_bytes = std::get<1>(ret) - utf32_output;
   if (std::get<0>(ret) != buf + len) {
-    result scalar_res = scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
-                                        std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
     if (scalar_res.error) {
       scalar_res.count += (std::get<0>(ret) - buf);
       return scalar_res;
@@ -24045,116 +24542,130 @@ simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
   return simdutf::result(simdutf::SUCCESS, saved_bytes);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  std::tuple<const char16_t*, char32_t*, bool> ret = icelake::convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
-  if (!std::get<2>(ret)) { return 0; }
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::tuple<const char16_t *, char32_t *, bool> ret =
+      icelake::convert_utf16_to_utf32<endianness::LITTLE>(buf, len,
+                                                          utf32_output);
+  if (!std::get<2>(ret)) {
+    return 0;
+  }
   size_t saved_bytes = std::get<1>(ret) - utf32_output;
   if (std::get<0>(ret) != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf32::convert<endianness::LITTLE>(
-                                        std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::LITTLE>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  std::tuple<const char16_t*, char32_t*, bool> ret = icelake::convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
-  if (!std::get<2>(ret)) { return 0; }
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::tuple<const char16_t *, char32_t *, bool> ret =
+      icelake::convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
+  if (!std::get<2>(ret)) {
+    return 0;
+  }
   size_t saved_bytes = std::get<1>(ret) - utf32_output;
   if (std::get<0>(ret) != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf32::convert<endianness::BIG>(
-                                        std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::BIG>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-void implementation::change_endianness_utf16(const char16_t * input, size_t length, char16_t * output) const noexcept {
+void implementation::change_endianness_utf16(const char16_t *input,
+                                             size_t length,
+                                             char16_t *output) const noexcept {
   size_t pos = 0;
   const __m512i byteflip = _mm512_setr_epi64(
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809
-        );
+      0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
+      0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
+      0x0607040502030001, 0x0e0f0c0d0a0b0809);
   while (pos + 32 <= length) {
-    __m512i utf16 = _mm512_loadu_si512((const __m512i*)(input + pos));
+    __m512i utf16 = _mm512_loadu_si512((const __m512i *)(input + pos));
     utf16 = _mm512_shuffle_epi8(utf16, byteflip);
     _mm512_storeu_si512(output + pos, utf16);
     pos += 32;
   }
-  if(pos < length) {
-    __mmask32 m((1U<< (length - pos))-1);
-    __m512i utf16 = _mm512_maskz_loadu_epi16(m, (const __m512i*)(input + pos));
+  if (pos < length) {
+    __mmask32 m((1U << (length - pos)) - 1);
+    __m512i utf16 = _mm512_maskz_loadu_epi16(m, (const __m512i *)(input + pos));
     utf16 = _mm512_shuffle_epi8(utf16, byteflip);
     _mm512_mask_storeu_epi16(output + pos, m, utf16);
   }
 }
 
-
-simdutf_warn_unused size_t implementation::count_utf16le(const char16_t * input, size_t length) const noexcept {
-  const char16_t* ptr = input;
+simdutf_warn_unused size_t implementation::count_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  const char16_t *ptr = input;
   size_t count{0};
 
-  if(length >= 32) {
-    const char16_t* end = input + length - 32;
+  if (length >= 32) {
+    const char16_t *end = input + length - 32;
 
     const __m512i low = _mm512_set1_epi16((uint16_t)0xdc00);
     const __m512i high = _mm512_set1_epi16((uint16_t)0xdfff);
 
-
     while (ptr <= end) {
-      __m512i utf16 = _mm512_loadu_si512((const __m512i*)ptr);
+      __m512i utf16 = _mm512_loadu_si512((const __m512i *)ptr);
       ptr += 32;
-      uint64_t not_high_surrogate = static_cast<uint64_t>(_mm512_cmpgt_epu16_mask(utf16, high) | _mm512_cmplt_epu16_mask(utf16, low));
+      uint64_t not_high_surrogate =
+          static_cast<uint64_t>(_mm512_cmpgt_epu16_mask(utf16, high) |
+                                _mm512_cmplt_epu16_mask(utf16, low));
       count += count_ones(not_high_surrogate);
     }
   }
 
-  return count + scalar::utf16::count_code_points<endianness::LITTLE>(ptr, length - (ptr - input));
+  return count + scalar::utf16::count_code_points<endianness::LITTLE>(
+                     ptr, length - (ptr - input));
 }
 
-simdutf_warn_unused size_t implementation::count_utf16be(const char16_t * input, size_t length) const noexcept {
-  const char16_t* ptr = input;
+simdutf_warn_unused size_t implementation::count_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  const char16_t *ptr = input;
   size_t count{0};
-  if(length >= 32) {
+  if (length >= 32) {
 
-    const char16_t* end = input + length - 32;
+    const char16_t *end = input + length - 32;
 
     const __m512i low = _mm512_set1_epi16((uint16_t)0xdc00);
     const __m512i high = _mm512_set1_epi16((uint16_t)0xdfff);
 
     const __m512i byteflip = _mm512_setr_epi64(
-              0x0607040502030001,
-              0x0e0f0c0d0a0b0809,
-              0x0607040502030001,
-              0x0e0f0c0d0a0b0809,
-              0x0607040502030001,
-              0x0e0f0c0d0a0b0809,
-              0x0607040502030001,
-              0x0e0f0c0d0a0b0809
-          );
+        0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
+        0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
+        0x0607040502030001, 0x0e0f0c0d0a0b0809);
     while (ptr <= end) {
-      __m512i utf16 = _mm512_shuffle_epi8(_mm512_loadu_si512((__m512i*)ptr), byteflip);
+      __m512i utf16 =
+          _mm512_shuffle_epi8(_mm512_loadu_si512((__m512i *)ptr), byteflip);
       ptr += 32;
-      uint64_t not_high_surrogate = static_cast<uint64_t>(_mm512_cmpgt_epu16_mask(utf16, high) | _mm512_cmplt_epu16_mask(utf16, low));
+      uint64_t not_high_surrogate =
+          static_cast<uint64_t>(_mm512_cmpgt_epu16_mask(utf16, high) |
+                                _mm512_cmplt_epu16_mask(utf16, low));
       count += count_ones(not_high_surrogate);
     }
   }
 
-  return count + scalar::utf16::count_code_points<endianness::BIG>(ptr, length - (ptr - input));
+  return count + scalar::utf16::count_code_points<endianness::BIG>(
+                     ptr, length - (ptr - input));
 }
 
-
-simdutf_warn_unused size_t implementation::count_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *input, size_t length) const noexcept {
   const uint8_t *str = reinterpret_cast<const uint8_t *>(input);
-  size_t answer =  length / sizeof(__m512i) * sizeof(__m512i); // Number of 512-bit chunks that fits into the length.
+  size_t answer =
+      length / sizeof(__m512i) *
+      sizeof(__m512i); // Number of bytes covered by whole 512-bit chunks.
   size_t i = 0;
   __m512i unrolled_popcount{0};
 
@@ -24164,35 +24675,43 @@ simdutf_warn_unused size_t implementation::count_utf8(const char * input, size_t
     size_t iterations = (length - i) / sizeof(__m512i);
 
     size_t max_i = i + iterations * sizeof(__m512i) - sizeof(__m512i);
-    for (; i + 8*sizeof(__m512i) <= max_i; i += 8*sizeof(__m512i)) {
-        __m512i input1 = _mm512_loadu_si512((const __m512i *)(str + i));
-        __m512i input2 = _mm512_loadu_si512((const __m512i *)(str + i + sizeof(__m512i)));
-        __m512i input3 = _mm512_loadu_si512((const __m512i *)(str + i + 2*sizeof(__m512i)));
-        __m512i input4 = _mm512_loadu_si512((const __m512i *)(str + i + 3*sizeof(__m512i)));
-        __m512i input5 = _mm512_loadu_si512((const __m512i *)(str + i + 4*sizeof(__m512i)));
-        __m512i input6 = _mm512_loadu_si512((const __m512i *)(str + i + 5*sizeof(__m512i)));
-        __m512i input7 = _mm512_loadu_si512((const __m512i *)(str + i + 6*sizeof(__m512i)));
-        __m512i input8 = _mm512_loadu_si512((const __m512i *)(str + i + 7*sizeof(__m512i)));
-
-
-        __mmask64 mask1 = _mm512_cmple_epi8_mask(input1, continuation);
-        __mmask64 mask2 = _mm512_cmple_epi8_mask(input2, continuation);
-        __mmask64 mask3 = _mm512_cmple_epi8_mask(input3, continuation);
-        __mmask64 mask4 = _mm512_cmple_epi8_mask(input4, continuation);
-        __mmask64 mask5 = _mm512_cmple_epi8_mask(input5, continuation);
-        __mmask64 mask6 = _mm512_cmple_epi8_mask(input6, continuation);
-        __mmask64 mask7 = _mm512_cmple_epi8_mask(input7, continuation);
-        __mmask64 mask8 = _mm512_cmple_epi8_mask(input8, continuation);
-
-        __m512i mask_register = _mm512_set_epi64(mask8, mask7, mask6, mask5, mask4, mask3, mask2, mask1);
-
-
-        unrolled_popcount = _mm512_add_epi64(unrolled_popcount, _mm512_popcnt_epi64(mask_register));
+    for (; i + 8 * sizeof(__m512i) <= max_i; i += 8 * sizeof(__m512i)) {
+      __m512i input1 = _mm512_loadu_si512((const __m512i *)(str + i));
+      __m512i input2 =
+          _mm512_loadu_si512((const __m512i *)(str + i + sizeof(__m512i)));
+      __m512i input3 =
+          _mm512_loadu_si512((const __m512i *)(str + i + 2 * sizeof(__m512i)));
+      __m512i input4 =
+          _mm512_loadu_si512((const __m512i *)(str + i + 3 * sizeof(__m512i)));
+      __m512i input5 =
+          _mm512_loadu_si512((const __m512i *)(str + i + 4 * sizeof(__m512i)));
+      __m512i input6 =
+          _mm512_loadu_si512((const __m512i *)(str + i + 5 * sizeof(__m512i)));
+      __m512i input7 =
+          _mm512_loadu_si512((const __m512i *)(str + i + 6 * sizeof(__m512i)));
+      __m512i input8 =
+          _mm512_loadu_si512((const __m512i *)(str + i + 7 * sizeof(__m512i)));
+
+      __mmask64 mask1 = _mm512_cmple_epi8_mask(input1, continuation);
+      __mmask64 mask2 = _mm512_cmple_epi8_mask(input2, continuation);
+      __mmask64 mask3 = _mm512_cmple_epi8_mask(input3, continuation);
+      __mmask64 mask4 = _mm512_cmple_epi8_mask(input4, continuation);
+      __mmask64 mask5 = _mm512_cmple_epi8_mask(input5, continuation);
+      __mmask64 mask6 = _mm512_cmple_epi8_mask(input6, continuation);
+      __mmask64 mask7 = _mm512_cmple_epi8_mask(input7, continuation);
+      __mmask64 mask8 = _mm512_cmple_epi8_mask(input8, continuation);
+
+      __m512i mask_register = _mm512_set_epi64(mask8, mask7, mask6, mask5,
+                                               mask4, mask3, mask2, mask1);
+
+      unrolled_popcount = _mm512_add_epi64(unrolled_popcount,
+                                           _mm512_popcnt_epi64(mask_register));
     }
 
     for (; i <= max_i; i += sizeof(__m512i)) {
       __m512i more_input = _mm512_loadu_si512((const __m512i *)(str + i));
-      uint64_t continuation_bitmask = static_cast<uint64_t>(_mm512_cmple_epi8_mask(more_input, continuation));
+      uint64_t continuation_bitmask = static_cast<uint64_t>(
+          _mm512_cmple_epi8_mask(more_input, continuation));
       answer -= count_ones(continuation_bitmask);
     }
   }
@@ -24208,59 +24727,70 @@ simdutf_warn_unused size_t implementation::count_utf8(const char * input, size_t
             (size_t)_mm256_extract_epi64(second_half, 2) +
             (size_t)_mm256_extract_epi64(second_half, 3);
 
-  return answer + scalar::utf8::count_code_points(reinterpret_cast<const char *>(str + i), length - i);
+  return answer + scalar::utf8::count_code_points(
+                      reinterpret_cast<const char *>(str + i), length - i);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf8(const char* buf, size_t len) const noexcept {
-  return count_utf8(buf,len);
+simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
+    const char *buf, size_t len) const noexcept {
+  return count_utf8(buf, len);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf16(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf16(size_t length) const noexcept {
   return scalar::utf16::latin1_length_from_utf16(length);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf32(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf32(size_t length) const noexcept {
   return scalar::utf32::latin1_length_from_utf32(length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(const char16_t * input, size_t length) const noexcept {
-  const char16_t* ptr = input;
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  const char16_t *ptr = input;
   size_t count{0};
-  if(length >= 32) {
-    const char16_t* end = input + length - 32;
+  if (length >= 32) {
+    const char16_t *end = input + length - 32;
 
     const __m512i v_007f = _mm512_set1_epi16((uint16_t)0x007f);
     const __m512i v_07ff = _mm512_set1_epi16((uint16_t)0x07ff);
     const __m512i v_dfff = _mm512_set1_epi16((uint16_t)0xdfff);
     const __m512i v_d800 = _mm512_set1_epi16((uint16_t)0xd800);
 
-
     while (ptr <= end) {
-      __m512i utf16 = _mm512_loadu_si512((const __m512i*)ptr);
+      __m512i utf16 = _mm512_loadu_si512((const __m512i *)ptr);
       ptr += 32;
       __mmask32 ascii_bitmask = _mm512_cmple_epu16_mask(utf16, v_007f);
-      __mmask32 two_bytes_bitmask = _mm512_mask_cmple_epu16_mask(~ascii_bitmask, utf16, v_07ff);
+      __mmask32 two_bytes_bitmask =
+          _mm512_mask_cmple_epu16_mask(~ascii_bitmask, utf16, v_07ff);
       __mmask32 not_one_two_bytes = ~(ascii_bitmask | two_bytes_bitmask);
-      __mmask32 surrogates_bitmask = _mm512_mask_cmple_epu16_mask(not_one_two_bytes, utf16, v_dfff) & _mm512_mask_cmpge_epu16_mask(not_one_two_bytes, utf16, v_d800);
+      __mmask32 surrogates_bitmask =
+          _mm512_mask_cmple_epu16_mask(not_one_two_bytes, utf16, v_dfff) &
+          _mm512_mask_cmpge_epu16_mask(not_one_two_bytes, utf16, v_d800);
 
       size_t ascii_count = count_ones(ascii_bitmask);
       size_t two_bytes_count = count_ones(two_bytes_bitmask);
       size_t surrogate_bytes_count = count_ones(surrogates_bitmask);
-      size_t three_bytes_count = 32 - ascii_count - two_bytes_count - surrogate_bytes_count;
+      size_t three_bytes_count =
+          32 - ascii_count - two_bytes_count - surrogate_bytes_count;
 
-      count += ascii_count + 2*two_bytes_count + 3*three_bytes_count + 2*surrogate_bytes_count;
+      count += ascii_count + 2 * two_bytes_count + 3 * three_bytes_count +
+               2 * surrogate_bytes_count;
     }
   }
 
-  return count + scalar::utf16::utf8_length_from_utf16<endianness::LITTLE>(ptr, length - (ptr - input));
+  return count + scalar::utf16::utf8_length_from_utf16<endianness::LITTLE>(
+                     ptr, length - (ptr - input));
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(const char16_t * input, size_t length) const noexcept {
-  const char16_t* ptr = input;
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  const char16_t *ptr = input;
   size_t count{0};
 
-  if(length >= 32) {
-    const char16_t* end = input + length - 32;
+  if (length >= 32) {
+    const char16_t *end = input + length - 32;
 
     const __m512i v_007f = _mm512_set1_epi16((uint16_t)0x007f);
     const __m512i v_07ff = _mm512_set1_epi16((uint16_t)0x07ff);
@@ -24268,57 +24798,61 @@ simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(const char16
     const __m512i v_d800 = _mm512_set1_epi16((uint16_t)0xd800);
 
     const __m512i byteflip = _mm512_setr_epi64(
-              0x0607040502030001,
-              0x0e0f0c0d0a0b0809,
-              0x0607040502030001,
-              0x0e0f0c0d0a0b0809,
-              0x0607040502030001,
-              0x0e0f0c0d0a0b0809,
-              0x0607040502030001,
-              0x0e0f0c0d0a0b0809
-          );
+        0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
+        0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
+        0x0607040502030001, 0x0e0f0c0d0a0b0809);
     while (ptr <= end) {
-      __m512i utf16 = _mm512_loadu_si512((const __m512i*)ptr);
+      __m512i utf16 = _mm512_loadu_si512((const __m512i *)ptr);
       utf16 = _mm512_shuffle_epi8(utf16, byteflip);
       ptr += 32;
       __mmask32 ascii_bitmask = _mm512_cmple_epu16_mask(utf16, v_007f);
-      __mmask32 two_bytes_bitmask = _mm512_mask_cmple_epu16_mask(~ascii_bitmask, utf16, v_07ff);
+      __mmask32 two_bytes_bitmask =
+          _mm512_mask_cmple_epu16_mask(~ascii_bitmask, utf16, v_07ff);
       __mmask32 not_one_two_bytes = ~(ascii_bitmask | two_bytes_bitmask);
-      __mmask32 surrogates_bitmask = _mm512_mask_cmple_epu16_mask(not_one_two_bytes, utf16, v_dfff) & _mm512_mask_cmpge_epu16_mask(not_one_two_bytes, utf16, v_d800);
+      __mmask32 surrogates_bitmask =
+          _mm512_mask_cmple_epu16_mask(not_one_two_bytes, utf16, v_dfff) &
+          _mm512_mask_cmpge_epu16_mask(not_one_two_bytes, utf16, v_d800);
 
       size_t ascii_count = count_ones(ascii_bitmask);
       size_t two_bytes_count = count_ones(two_bytes_bitmask);
       size_t surrogate_bytes_count = count_ones(surrogates_bitmask);
-      size_t three_bytes_count = 32 - ascii_count - two_bytes_count - surrogate_bytes_count;
-      count += ascii_count + 2*two_bytes_count + 3*three_bytes_count + 2*surrogate_bytes_count;
+      size_t three_bytes_count =
+          32 - ascii_count - two_bytes_count - surrogate_bytes_count;
+      count += ascii_count + 2 * two_bytes_count + 3 * three_bytes_count +
+               2 * surrogate_bytes_count;
     }
   }
 
-  return count + scalar::utf16::utf8_length_from_utf16<endianness::BIG>(ptr, length - (ptr - input));
+  return count + scalar::utf16::utf8_length_from_utf16<endianness::BIG>(
+                     ptr, length - (ptr - input));
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
   return implementation::count_utf16le(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return implementation::count_utf16be(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_latin1(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::utf16_length_from_latin1(size_t length) const noexcept {
   return scalar::latin1::utf16_length_from_latin1(length);
 }
 
-
-simdutf_warn_unused size_t implementation::utf32_length_from_latin1(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::utf32_length_from_latin1(size_t length) const noexcept {
   return scalar::latin1::utf32_length_from_latin1(length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_latin1(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
+    const char *input, size_t length) const noexcept {
   const uint8_t *str = reinterpret_cast<const uint8_t *>(input);
   size_t answer = length / sizeof(__m512i) * sizeof(__m512i);
   size_t i = 0;
-  if(answer >= 2048) { // long strings optimization
+  if (answer >= 2048) { // long strings optimization
     unsigned char v_0xFF = 0xff;
     __m512i eight_64bits = _mm512_setzero_si512();
     while (i + sizeof(__m512i) <= length) {
@@ -24328,39 +24862,53 @@ simdutf_warn_unused size_t implementation::utf8_length_from_latin1(const char *
         iterations = 255;
       }
       size_t max_i = i + iterations * sizeof(__m512i) - sizeof(__m512i);
-      for (; i + 4*sizeof(__m512i) <= max_i; i += 4*sizeof(__m512i)) {
-              // Load four __m512i vectors
-              __m512i input1 = _mm512_loadu_si512((const __m512i *)(str + i));
-              __m512i input2 = _mm512_loadu_si512((const __m512i *)(str + i + sizeof(__m512i)));
-              __m512i input3 = _mm512_loadu_si512((const __m512i *)(str + i + 2*sizeof(__m512i)));
-              __m512i input4 = _mm512_loadu_si512((const __m512i *)(str + i + 3*sizeof(__m512i)));
-
-              // Generate four masks
-              __mmask64 mask1 = _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input1);
-              __mmask64 mask2 = _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input2);
-              __mmask64 mask3 = _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input3);
-              __mmask64 mask4 = _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input4);
-              // Apply the masks and subtract from the runner
-              __m512i not_ascii1 = _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask1, v_0xFF);
-              __m512i not_ascii2 = _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask2, v_0xFF);
-              __m512i not_ascii3 = _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask3, v_0xFF);
-              __m512i not_ascii4 = _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask4, v_0xFF);
-
-              runner = _mm512_sub_epi8(runner, not_ascii1);
-              runner = _mm512_sub_epi8(runner, not_ascii2);
-              runner = _mm512_sub_epi8(runner, not_ascii3);
-              runner = _mm512_sub_epi8(runner, not_ascii4);
+      for (; i + 4 * sizeof(__m512i) <= max_i; i += 4 * sizeof(__m512i)) {
+        // Load four __m512i vectors
+        __m512i input1 = _mm512_loadu_si512((const __m512i *)(str + i));
+        __m512i input2 =
+            _mm512_loadu_si512((const __m512i *)(str + i + sizeof(__m512i)));
+        __m512i input3 = _mm512_loadu_si512(
+            (const __m512i *)(str + i + 2 * sizeof(__m512i)));
+        __m512i input4 = _mm512_loadu_si512(
+            (const __m512i *)(str + i + 3 * sizeof(__m512i)));
+
+        // Generate four masks
+        __mmask64 mask1 =
+            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input1);
+        __mmask64 mask2 =
+            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input2);
+        __mmask64 mask3 =
+            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input3);
+        __mmask64 mask4 =
+            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input4);
+        // Apply the masks and subtract from the runner
+        __m512i not_ascii1 =
+            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask1, v_0xFF);
+        __m512i not_ascii2 =
+            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask2, v_0xFF);
+        __m512i not_ascii3 =
+            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask3, v_0xFF);
+        __m512i not_ascii4 =
+            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask4, v_0xFF);
+
+        runner = _mm512_sub_epi8(runner, not_ascii1);
+        runner = _mm512_sub_epi8(runner, not_ascii2);
+        runner = _mm512_sub_epi8(runner, not_ascii3);
+        runner = _mm512_sub_epi8(runner, not_ascii4);
       }
 
       for (; i <= max_i; i += sizeof(__m512i)) {
         __m512i more_input = _mm512_loadu_si512((const __m512i *)(str + i));
 
-        __mmask64 mask = _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), more_input);
-        __m512i not_ascii = _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask, v_0xFF);
+        __mmask64 mask =
+            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), more_input);
+        __m512i not_ascii =
+            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask, v_0xFF);
         runner = _mm512_sub_epi8(runner, not_ascii);
       }
 
-      eight_64bits = _mm512_add_epi64(eight_64bits, _mm512_sad_epu8(runner, _mm512_setzero_si512()));
+      eight_64bits = _mm512_add_epi64(
+          eight_64bits, _mm512_sad_epu8(runner, _mm512_setzero_si512()));
     }
 
     __m256i first_half = _mm512_extracti64x4_epi64(eight_64bits, 0);
@@ -24374,110 +24922,140 @@ simdutf_warn_unused size_t implementation::utf8_length_from_latin1(const char *
               (size_t)_mm256_extract_epi64(second_half, 2) +
               (size_t)_mm256_extract_epi64(second_half, 3);
   } else if (answer > 0) {
-    for(; i + sizeof(__m512i) <= length; i += sizeof(__m512i)) {
-      __m512i latin = _mm512_loadu_si512((const __m512i*)(str + i));
+    for (; i + sizeof(__m512i) <= length; i += sizeof(__m512i)) {
+      __m512i latin = _mm512_loadu_si512((const __m512i *)(str + i));
       uint64_t non_ascii = _mm512_movepi8_mask(latin);
       answer += count_ones(non_ascii);
     }
   }
-  return answer + scalar::latin1::utf8_length_from_latin1(reinterpret_cast<const char *>(str + i), length - i);
+  return answer + scalar::latin1::utf8_length_from_latin1(
+                      reinterpret_cast<const char *>(str + i), length - i);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf8(const char * input, size_t length) const noexcept {
-    size_t pos = 0;
-    size_t count = 0;
-    // This algorithm could no doubt be improved!
-    for(;pos + 64 <= length; pos += 64) {
-      __m512i utf8 = _mm512_loadu_si512((const __m512i*)(input+pos));
-      uint64_t utf8_continuation_mask = _mm512_cmplt_epi8_mask(utf8, _mm512_set1_epi8(-65+1));
-      // We count one word for anything that is not a continuation (so
-      // leading bytes).
-      count += 64 - count_ones(utf8_continuation_mask);
-      uint64_t utf8_4byte = _mm512_cmpge_epu8_mask(utf8, _mm512_set1_epi8(int8_t(240)));
-      count += count_ones(utf8_4byte);
-    }
-    return count + scalar::utf8::utf16_length_from_utf8(input + pos, length - pos);
-}
-
-simdutf_warn_unused size_t implementation::utf8_length_from_utf32(const char32_t * input, size_t length) const noexcept {
-  const char32_t* ptr = input;
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos + 64 <= length; pos += 64) {
+    __m512i utf8 = _mm512_loadu_si512((const __m512i *)(input + pos));
+    uint64_t utf8_continuation_mask =
+        _mm512_cmplt_epi8_mask(utf8, _mm512_set1_epi8(-65 + 1));
+    // We count one word for anything that is not a continuation (so
+    // leading bytes).
+    count += 64 - count_ones(utf8_continuation_mask);
+    uint64_t utf8_4byte =
+        _mm512_cmpge_epu8_mask(utf8, _mm512_set1_epi8(int8_t(240)));
+    count += count_ones(utf8_4byte);
+  }
+  return count +
+         scalar::utf8::utf16_length_from_utf8(input + pos, length - pos);
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  const char32_t *ptr = input;
   size_t count{0};
 
-  if(length >= 16) {
-    const char32_t* end = input + length - 16;
+  if (length >= 16) {
+    const char32_t *end = input + length - 16;
 
     const __m512i v_0000_007f = _mm512_set1_epi32((uint32_t)0x7f);
     const __m512i v_0000_07ff = _mm512_set1_epi32((uint32_t)0x7ff);
     const __m512i v_0000_ffff = _mm512_set1_epi32((uint32_t)0x0000ffff);
 
-
     while (ptr <= end) {
-      __m512i utf32 = _mm512_loadu_si512((const __m512i*)ptr);
+      __m512i utf32 = _mm512_loadu_si512((const __m512i *)ptr);
       ptr += 16;
       __mmask16 ascii_bitmask = _mm512_cmple_epu32_mask(utf32, v_0000_007f);
-      __mmask16 two_bytes_bitmask = _mm512_mask_cmple_epu32_mask(_knot_mask16(ascii_bitmask), utf32, v_0000_07ff);
-      __mmask16 three_bytes_bitmask = _mm512_mask_cmple_epu32_mask(_knot_mask16(_mm512_kor(ascii_bitmask, two_bytes_bitmask)), utf32, v_0000_ffff);
+      __mmask16 two_bytes_bitmask = _mm512_mask_cmple_epu32_mask(
+          _knot_mask16(ascii_bitmask), utf32, v_0000_07ff);
+      __mmask16 three_bytes_bitmask = _mm512_mask_cmple_epu32_mask(
+          _knot_mask16(_mm512_kor(ascii_bitmask, two_bytes_bitmask)), utf32,
+          v_0000_ffff);
 
       size_t ascii_count = count_ones(ascii_bitmask);
       size_t two_bytes_count = count_ones(two_bytes_bitmask);
       size_t three_bytes_count = count_ones(three_bytes_bitmask);
-      size_t four_bytes_count = 16 - ascii_count - two_bytes_count - three_bytes_count;
-      count += ascii_count + 2*two_bytes_count + 3*three_bytes_count + 4*four_bytes_count;
+      size_t four_bytes_count =
+          16 - ascii_count - two_bytes_count - three_bytes_count;
+      count += ascii_count + 2 * two_bytes_count + 3 * three_bytes_count +
+               4 * four_bytes_count;
     }
   }
 
-  return count + scalar::utf32::utf8_length_from_utf32(ptr, length - (ptr - input));
+  return count +
+         scalar::utf32::utf8_length_from_utf32(ptr, length - (ptr - input));
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf32(const char32_t * input, size_t length) const noexcept {
-  const char32_t* ptr = input;
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  const char32_t *ptr = input;
   size_t count{0};
 
-  if(length >= 16) {
-    const char32_t* end = input + length - 16;
+  if (length >= 16) {
+    const char32_t *end = input + length - 16;
 
     const __m512i v_0000_ffff = _mm512_set1_epi32((uint32_t)0x0000ffff);
 
-
     while (ptr <= end) {
-      __m512i utf32 = _mm512_loadu_si512((const __m512i*)ptr);
+      __m512i utf32 = _mm512_loadu_si512((const __m512i *)ptr);
       ptr += 16;
-      __mmask16 surrogates_bitmask = _mm512_cmpgt_epu32_mask(utf32, v_0000_ffff);
+      __mmask16 surrogates_bitmask =
+          _mm512_cmpgt_epu32_mask(utf32, v_0000_ffff);
 
       count += 16 + count_ones(surrogates_bitmask);
     }
   }
 
-  return count + scalar::utf32::utf16_length_from_utf32(ptr, length - (ptr - input));
+  return count +
+         scalar::utf32::utf16_length_from_utf32(ptr, length - (ptr - input));
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *input, size_t length) const noexcept {
   return implementation::count_utf8(input, length);
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept {
-  return (options & base64_url) ? compress_decode_base64<true>(output, input, length, options) : compress_decode_base64<false>(output, input, length, options);
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept {
-  return (options & base64_url) ? compress_decode_base64<true>(output, input, length, options) : compress_decode_base64<false>(output, input, length, options);
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
-
-simdutf_warn_unused size_t implementation::base64_length_from_binary(size_t length, base64_options options) const noexcept {
+simdutf_warn_unused size_t implementation::base64_length_from_binary(
+    size_t length, base64_options options) const noexcept {
   return scalar::base64::base64_length_from_binary(length, options);
 }
 
-size_t implementation::binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept {
-  if(options & base64_url) {
+size_t implementation::binary_to_base64(const char *input, size_t length,
+                                        char *output,
+                                        base64_options options) const noexcept {
+  if (options & base64_url) {
     return encode_base64<true>(output, input, length, options);
   } else {
     return encode_base64<false>(output, input, length, options);
@@ -24495,7 +25073,8 @@ SIMDUTF_UNTARGET_REGION
 #endif
 
 
-#if SIMDUTF_GCC11ORMORE // workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+#if SIMDUTF_GCC11ORMORE // workaround for
+                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
 SIMDUTF_POP_DISABLE_WARNINGS
 #endif // end of workaround
 /* end file src/simdutf/icelake/end.h */
@@ -24514,34 +25093,47 @@ SIMDUTF_POP_DISABLE_WARNINGS
 SIMDUTF_TARGET_HASWELL
 #endif
 
-#if SIMDUTF_GCC11ORMORE // workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+#if SIMDUTF_GCC11ORMORE // workaround for
+                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+// clang-format off
 SIMDUTF_DISABLE_GCC_WARNING(-Wmaybe-uninitialized)
+// clang-format on
 #endif // end of workaround
 /* end file src/simdutf/haswell/begin.h */
 namespace simdutf {
 namespace haswell {
 namespace {
 #ifndef SIMDUTF_HASWELL_H
-#error "haswell.h must be included"
+  #error "haswell.h must be included"
 #endif
 using namespace simd;
 
-
-simdutf_really_inline bool is_ascii(const simd8x64<uint8_t>& input) {
+simdutf_really_inline bool is_ascii(const simd8x64<uint8_t> &input) {
   return input.reduce_or().is_ascii();
 }
 
-simdutf_unused simdutf_really_inline simd8<bool> must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2, const simd8<uint8_t> prev3) {
-  simd8<uint8_t> is_second_byte = prev1.saturating_sub(0b11000000u-1); // Only 11______ will be > 0
-  simd8<uint8_t> is_third_byte  = prev2.saturating_sub(0b11100000u-1); // Only 111_____ will be > 0
-  simd8<uint8_t> is_fourth_byte = prev3.saturating_sub(0b11110000u-1); // Only 1111____ will be > 0
-  // Caller requires a bool (all 1's). All values resulting from the subtraction will be <= 64, so signed comparison is fine.
-  return simd8<int8_t>(is_second_byte | is_third_byte | is_fourth_byte) > int8_t(0);
-}
-
-simdutf_really_inline simd8<bool> must_be_2_3_continuation(const simd8<uint8_t> prev2, const simd8<uint8_t> prev3) {
-  simd8<uint8_t> is_third_byte  = prev2.saturating_sub(0xe0u-0x80); // Only 111_____ will be > 0x80
-  simd8<uint8_t> is_fourth_byte = prev3.saturating_sub(0xf0u-0x80); // Only 1111____ will be > 0x80
+simdutf_unused simdutf_really_inline simd8<bool>
+must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2,
+                     const simd8<uint8_t> prev3) {
+  simd8<uint8_t> is_second_byte =
+      prev1.saturating_sub(0b11000000u - 1); // Only 11______ will be > 0
+  simd8<uint8_t> is_third_byte =
+      prev2.saturating_sub(0b11100000u - 1); // Only 111_____ will be > 0
+  simd8<uint8_t> is_fourth_byte =
+      prev3.saturating_sub(0b11110000u - 1); // Only 1111____ will be > 0
+  // Caller requires a bool (all 1's). All values resulting from the subtraction
+  // will be <= 64, so signed comparison is fine.
+  return simd8<int8_t>(is_second_byte | is_third_byte | is_fourth_byte) >
+         int8_t(0);
+}
+
+simdutf_really_inline simd8<bool>
+must_be_2_3_continuation(const simd8<uint8_t> prev2,
+                         const simd8<uint8_t> prev3) {
+  simd8<uint8_t> is_third_byte =
+      prev2.saturating_sub(0xe0u - 0x80); // Only 111_____ will be > 0x80
+  simd8<uint8_t> is_fourth_byte =
+      prev3.saturating_sub(0xf0u - 0x80); // Only 1111____ will be > 0x80
   return simd8<bool>(is_third_byte | is_fourth_byte);
 }
 
@@ -24583,239 +25175,253 @@ simdutf_really_inline simd8<bool> must_be_2_3_continuation(const simd8<uint8_t>
       0   0   1   0   1   0   0   0   b = a << 1
       1   1   1   1   1   1   1   0   c = V | a | b
                                   ^
-                                  the last bit can be zero, we just consume 7 code units
-                                  and recheck this word in the next iteration
+                                  the last bit can be zero, we just consume 7
+   code units and recheck this word in the next iteration
 */
 
 /* Returns:
-   - pointer to the last unprocessed character (a scalar fallback should check the rest);
+   - pointer to the last unprocessed character (a scalar fallback should check
+   the rest);
    - nullptr if an error was detected.
 */
 template <endianness big_endian>
-const char16_t* avx2_validate_utf16(const char16_t* input, size_t size) {
-    const char16_t* end = input + size;
+const char16_t *avx2_validate_utf16(const char16_t *input, size_t size) {
+  const char16_t *end = input + size;
 
-    const auto v_d8 = simd8<uint8_t>::splat(0xd8);
-    const auto v_f8 = simd8<uint8_t>::splat(0xf8);
-    const auto v_fc = simd8<uint8_t>::splat(0xfc);
-    const auto v_dc = simd8<uint8_t>::splat(0xdc);
+  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
+  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
+  const auto v_fc = simd8<uint8_t>::splat(0xfc);
+  const auto v_dc = simd8<uint8_t>::splat(0xdc);
 
-    while (input + simd16<uint16_t>::ELEMENTS * 2 < end) {
-        // 0. Load data: since the validation takes into account only higher
-        //    byte of each word, we compress the two vectors into one which
-        //    consists only the higher bytes.
-        auto in0 = simd16<uint16_t>(input);
-        auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::ELEMENTS);
+  while (input + simd16<uint16_t>::ELEMENTS * 2 < end) {
+    // 0. Load data: since the validation takes into account only higher
+    //    byte of each word, we compress the two vectors into one which
+    //    consists only of the higher bytes.
+    auto in0 = simd16<uint16_t>(input);
+    auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::ELEMENTS);
 
-        if (big_endian) {
-            in0 = in0.swap_bytes();
-            in1 = in1.swap_bytes();
-        }
+    if (big_endian) {
+      in0 = in0.swap_bytes();
+      in1 = in1.swap_bytes();
+    }
 
-        const auto t0 = in0.shr<8>();
-        const auto t1 = in1.shr<8>();
+    const auto t0 = in0.shr<8>();
+    const auto t1 = in1.shr<8>();
 
-        const auto in = simd16<uint16_t>::pack(t0, t1);
+    const auto in = simd16<uint16_t>::pack(t0, t1);
 
-        // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
-        const auto surrogates_wordmask = (in & v_f8) == v_d8;
-        const uint32_t surrogates_bitmask = surrogates_wordmask.to_bitmask();
-        if (surrogates_bitmask == 0x0) {
-            input += simd16<uint16_t>::ELEMENTS * 2;
-        } else {
-            // 2. We have some surrogates that have to be distinguished:
-            //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
-            //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
-            //
-            //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
-
-            // V - non-surrogate code units
-            //     V = not surrogates_wordmask
-            const uint32_t V = ~surrogates_bitmask;
-
-            // H - word-mask for high surrogates: the six highest bits are 0b1101'11
-            const auto    vH = (in & v_fc) == v_dc;
-            const uint32_t H = vH.to_bitmask();
-
-            // L - word mask for low surrogates
-            //     L = not H and surrogates_wordmask
-            const uint32_t L = ~H & surrogates_bitmask;
-
-            const uint32_t a = L & (H >> 1);  // A low surrogate must be followed by high one.
-                                              // (A low surrogate placed in the 7th register's word
-                                              // is an exception we handle.)
-            const uint32_t b = a << 1;        // Just mark that the opposite fact is hold,
-                                              // thanks to that we have only two masks for valid case.
-            const uint32_t c = V | a | b;     // Combine all the masks into the final one.
-
-            if (c == 0xffffffff) {
-                // The whole input register contains valid UTF-16, i.e.,
-                // either single code units or proper surrogate pairs.
-                input += simd16<uint16_t>::ELEMENTS * 2;
-            } else if (c == 0x7fffffff) {
-                // The 31 lower code units of the input register contains valid UTF-16.
-                // The 31 word may be either a low or high surrogate. It the next
-                // iteration we 1) check if the low surrogate is followed by a high
-                // one, 2) reject sole high surrogate.
-                input += simd16<uint16_t>::ELEMENTS * 2 - 1;
-            } else {
-                return nullptr;
-            }
-        }
+    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
+    const auto surrogates_wordmask = (in & v_f8) == v_d8;
+    const uint32_t surrogates_bitmask = surrogates_wordmask.to_bitmask();
+    if (surrogates_bitmask == 0x0) {
+      input += simd16<uint16_t>::ELEMENTS * 2;
+    } else {
+      // 2. We have some surrogates that have to be distinguished:
+      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
+      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
+      //
+      //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
+
+      // V - non-surrogate code units
+      //     V = not surrogates_wordmask
+      const uint32_t V = ~surrogates_bitmask;
+
+      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
+      const auto vH = (in & v_fc) == v_dc;
+      const uint32_t H = vH.to_bitmask();
+
+      // L - word mask for low surrogates
+      //     L = not H and surrogates_wordmask
+      const uint32_t L = ~H & surrogates_bitmask;
+
+      const uint32_t a =
+          L & (H >> 1); // A low surrogate must be followed by high one.
+                        // (A low surrogate placed in the 7th register's word
+                        // is an exception we handle.)
+      const uint32_t b =
+          a << 1; // Just mark that the opposite fact holds;
+                  // thanks to that we have only two masks for the valid case.
+      const uint32_t c = V | a | b; // Combine all the masks into the final one.
+
+      if (c == 0xffffffff) {
+        // The whole input register contains valid UTF-16, i.e.,
+        // either single code units or proper surrogate pairs.
+        input += simd16<uint16_t>::ELEMENTS * 2;
+      } else if (c == 0x7fffffff) {
+        // The 31 lower code units of the input register contain valid UTF-16.
+        // The 31 word may be either a low or high surrogate. In the next
+        // iteration we 1) check if the low surrogate is followed by a high
+        // one, 2) reject sole high surrogate.
+        input += simd16<uint16_t>::ELEMENTS * 2 - 1;
+      } else {
+        return nullptr;
+      }
     }
+  }
 
-    return input;
+  return input;
 }
 
-
 template <endianness big_endian>
-const result avx2_validate_utf16_with_errors(const char16_t* input, size_t size) {
-    if (simdutf_unlikely(size == 0)) {
-        return result(error_code::SUCCESS, 0);
-    }
-    const char16_t *start = input;
-    const char16_t* end = input + size;
+const result avx2_validate_utf16_with_errors(const char16_t *input,
+                                             size_t size) {
+  if (simdutf_unlikely(size == 0)) {
+    return result(error_code::SUCCESS, 0);
+  }
+  const char16_t *start = input;
+  const char16_t *end = input + size;
 
-    const auto v_d8 = simd8<uint8_t>::splat(0xd8);
-    const auto v_f8 = simd8<uint8_t>::splat(0xf8);
-    const auto v_fc = simd8<uint8_t>::splat(0xfc);
-    const auto v_dc = simd8<uint8_t>::splat(0xdc);
+  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
+  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
+  const auto v_fc = simd8<uint8_t>::splat(0xfc);
+  const auto v_dc = simd8<uint8_t>::splat(0xdc);
 
-    while (input + simd16<uint16_t>::ELEMENTS * 2 < end) {
-        // 0. Load data: since the validation takes into account only higher
-        //    byte of each word, we compress the two vectors into one which
-        //    consists only the higher bytes.
-        auto in0 = simd16<uint16_t>(input);
-        auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::ELEMENTS);
+  while (input + simd16<uint16_t>::ELEMENTS * 2 < end) {
+    // 0. Load data: since the validation takes into account only higher
+    //    byte of each word, we compress the two vectors into one which
+    //    consists only of the higher bytes.
+    auto in0 = simd16<uint16_t>(input);
+    auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::ELEMENTS);
 
-        if (big_endian) {
-            in0 = in0.swap_bytes();
-            in1 = in1.swap_bytes();
-        }
+    if (big_endian) {
+      in0 = in0.swap_bytes();
+      in1 = in1.swap_bytes();
+    }
 
-        const auto t0 = in0.shr<8>();
-        const auto t1 = in1.shr<8>();
+    const auto t0 = in0.shr<8>();
+    const auto t1 = in1.shr<8>();
 
-        const auto in = simd16<uint16_t>::pack(t0, t1);
+    const auto in = simd16<uint16_t>::pack(t0, t1);
 
-        // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
-        const auto surrogates_wordmask = (in & v_f8) == v_d8;
-        const uint32_t surrogates_bitmask = surrogates_wordmask.to_bitmask();
-        if (surrogates_bitmask == 0x0) {
-            input += simd16<uint16_t>::ELEMENTS * 2;
-        } else {
-            // 2. We have some surrogates that have to be distinguished:
-            //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
-            //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
-            //
-            //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
-
-            // V - non-surrogate code units
-            //     V = not surrogates_wordmask
-            const uint32_t V = ~surrogates_bitmask;
-
-            // H - word-mask for high surrogates: the six highest bits are 0b1101'11
-            const auto    vH = (in & v_fc) == v_dc;
-            const uint32_t H = vH.to_bitmask();
-
-            // L - word mask for low surrogates
-            //     L = not H and surrogates_wordmask
-            const uint32_t L = ~H & surrogates_bitmask;
-
-            const uint32_t a = L & (H >> 1);  // A low surrogate must be followed by high one.
-                                              // (A low surrogate placed in the 7th register's word
-                                              // is an exception we handle.)
-            const uint32_t b = a << 1;        // Just mark that the opposite fact is hold,
-                                              // thanks to that we have only two masks for valid case.
-            const uint32_t c = V | a | b;     // Combine all the masks into the final one.
-
-            if (c == 0xffffffff) {
-                // The whole input register contains valid UTF-16, i.e.,
-                // either single code units or proper surrogate pairs.
-                input += simd16<uint16_t>::ELEMENTS * 2;
-            } else if (c == 0x7fffffff) {
-                // The 31 lower code units of the input register contains valid UTF-16.
-                // The 31 word may be either a low or high surrogate. It the next
-                // iteration we 1) check if the low surrogate is followed by a high
-                // one, 2) reject sole high surrogate.
-                input += simd16<uint16_t>::ELEMENTS * 2 - 1;
-            } else {
-                return result(error_code::SURROGATE, input - start);
-            }
-        }
+    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
+    const auto surrogates_wordmask = (in & v_f8) == v_d8;
+    const uint32_t surrogates_bitmask = surrogates_wordmask.to_bitmask();
+    if (surrogates_bitmask == 0x0) {
+      input += simd16<uint16_t>::ELEMENTS * 2;
+    } else {
+      // 2. We have some surrogates that have to be distinguished:
+      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
+      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
+      //
+      //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
+
+      // V - non-surrogate code units
+      //     V = not surrogates_wordmask
+      const uint32_t V = ~surrogates_bitmask;
+
+      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
+      const auto vH = (in & v_fc) == v_dc;
+      const uint32_t H = vH.to_bitmask();
+
+      // L - word mask for low surrogates
+      //     L = not H and surrogates_wordmask
+      const uint32_t L = ~H & surrogates_bitmask;
+
+      const uint32_t a =
+          L & (H >> 1); // A low surrogate must be followed by high one.
+                        // (A low surrogate placed in the 7th register's word
+                        // is an exception we handle.)
+      const uint32_t b =
+          a << 1; // Just mark that the opposite fact holds;
+                  // thanks to that we have only two masks for the valid case.
+      const uint32_t c = V | a | b; // Combine all the masks into the final one.
+
+      if (c == 0xffffffff) {
+        // The whole input register contains valid UTF-16, i.e.,
+        // either single code units or proper surrogate pairs.
+        input += simd16<uint16_t>::ELEMENTS * 2;
+      } else if (c == 0x7fffffff) {
+        // The 31 lower code units of the input register contain valid UTF-16.
+        // The 31 word may be either a low or high surrogate. In the next
+        // iteration we 1) check if the low surrogate is followed by a high
+        // one, 2) reject sole high surrogate.
+        input += simd16<uint16_t>::ELEMENTS * 2 - 1;
+      } else {
+        return result(error_code::SURROGATE, input - start);
+      }
     }
+  }
 
-    return result(error_code::SUCCESS, input - start);
+  return result(error_code::SUCCESS, input - start);
 }
 /* end file src/haswell/avx2_validate_utf16.cpp */
 /* begin file src/haswell/avx2_validate_utf32le.cpp */
 /* Returns:
-   - pointer to the last unprocessed character (a scalar fallback should check the rest);
+   - pointer to the last unprocessed character (a scalar fallback should check
+   the rest);
    - nullptr if an error was detected.
 */
-const char32_t* avx2_validate_utf32le(const char32_t* input, size_t size) {
-    const char32_t* end = input + size;
-
-    const __m256i standardmax = _mm256_set1_epi32(0x10ffff);
-    const __m256i offset = _mm256_set1_epi32(0xffff2000);
-    const __m256i standardoffsetmax = _mm256_set1_epi32(0xfffff7ff);
-    __m256i currentmax = _mm256_setzero_si256();
-    __m256i currentoffsetmax = _mm256_setzero_si256();
-
-    while (input + 8 < end) {
-        const __m256i in = _mm256_loadu_si256((__m256i *)input);
-        currentmax = _mm256_max_epu32(in,currentmax);
-        currentoffsetmax = _mm256_max_epu32(_mm256_add_epi32(in, offset), currentoffsetmax);
-        input += 8;
-    }
-    __m256i is_zero = _mm256_xor_si256(_mm256_max_epu32(currentmax, standardmax), standardmax);
-    if(_mm256_testz_si256(is_zero, is_zero) == 0) {
-        return nullptr;
-    }
+const char32_t *avx2_validate_utf32le(const char32_t *input, size_t size) {
+  const char32_t *end = input + size;
 
-    is_zero = _mm256_xor_si256(_mm256_max_epu32(currentoffsetmax, standardoffsetmax), standardoffsetmax);
-    if(_mm256_testz_si256(is_zero, is_zero) == 0) {
-        return nullptr;
-    }
+  const __m256i standardmax = _mm256_set1_epi32(0x10ffff);
+  const __m256i offset = _mm256_set1_epi32(0xffff2000);
+  const __m256i standardoffsetmax = _mm256_set1_epi32(0xfffff7ff);
+  __m256i currentmax = _mm256_setzero_si256();
+  __m256i currentoffsetmax = _mm256_setzero_si256();
 
-    return input;
-}
+  while (input + 8 < end) {
+    const __m256i in = _mm256_loadu_si256((__m256i *)input);
+    currentmax = _mm256_max_epu32(in, currentmax);
+    currentoffsetmax =
+        _mm256_max_epu32(_mm256_add_epi32(in, offset), currentoffsetmax);
+    input += 8;
+  }
+  __m256i is_zero =
+      _mm256_xor_si256(_mm256_max_epu32(currentmax, standardmax), standardmax);
+  if (_mm256_testz_si256(is_zero, is_zero) == 0) {
+    return nullptr;
+  }
 
+  is_zero = _mm256_xor_si256(
+      _mm256_max_epu32(currentoffsetmax, standardoffsetmax), standardoffsetmax);
+  if (_mm256_testz_si256(is_zero, is_zero) == 0) {
+    return nullptr;
+  }
 
-const result avx2_validate_utf32le_with_errors(const char32_t* input, size_t size) {
-    const char32_t* start = input;
-    const char32_t* end = input + size;
+  return input;
+}
 
-    const __m256i standardmax = _mm256_set1_epi32(0x10ffff);
-    const __m256i offset = _mm256_set1_epi32(0xffff2000);
-    const __m256i standardoffsetmax = _mm256_set1_epi32(0xfffff7ff);
-    __m256i currentmax = _mm256_setzero_si256();
-    __m256i currentoffsetmax = _mm256_setzero_si256();
+const result avx2_validate_utf32le_with_errors(const char32_t *input,
+                                               size_t size) {
+  const char32_t *start = input;
+  const char32_t *end = input + size;
 
-    while (input + 8 < end) {
-        const __m256i in = _mm256_loadu_si256((__m256i *)input);
-        currentmax = _mm256_max_epu32(in,currentmax);
-        currentoffsetmax = _mm256_max_epu32(_mm256_add_epi32(in, offset), currentoffsetmax);
+  const __m256i standardmax = _mm256_set1_epi32(0x10ffff);
+  const __m256i offset = _mm256_set1_epi32(0xffff2000);
+  const __m256i standardoffsetmax = _mm256_set1_epi32(0xfffff7ff);
+  __m256i currentmax = _mm256_setzero_si256();
+  __m256i currentoffsetmax = _mm256_setzero_si256();
 
-        __m256i is_zero = _mm256_xor_si256(_mm256_max_epu32(currentmax, standardmax), standardmax);
-        if(_mm256_testz_si256(is_zero, is_zero) == 0) {
-            return result(error_code::TOO_LARGE, input - start);
-        }
+  while (input + 8 < end) {
+    const __m256i in = _mm256_loadu_si256((__m256i *)input);
+    currentmax = _mm256_max_epu32(in, currentmax);
+    currentoffsetmax =
+        _mm256_max_epu32(_mm256_add_epi32(in, offset), currentoffsetmax);
 
-        is_zero = _mm256_xor_si256(_mm256_max_epu32(currentoffsetmax, standardoffsetmax), standardoffsetmax);
-        if(_mm256_testz_si256(is_zero, is_zero) == 0) {
-            return result(error_code::SURROGATE, input - start);
-        }
-        input += 8;
+    __m256i is_zero = _mm256_xor_si256(
+        _mm256_max_epu32(currentmax, standardmax), standardmax);
+    if (_mm256_testz_si256(is_zero, is_zero) == 0) {
+      return result(error_code::TOO_LARGE, input - start);
+    }
+
+    is_zero =
+        _mm256_xor_si256(_mm256_max_epu32(currentoffsetmax, standardoffsetmax),
+                         standardoffsetmax);
+    if (_mm256_testz_si256(is_zero, is_zero) == 0) {
+      return result(error_code::SURROGATE, input - start);
     }
+    input += 8;
+  }
 
-    return result(error_code::SUCCESS, input - start);
+  return result(error_code::SUCCESS, input - start);
 }
 /* end file src/haswell/avx2_validate_utf32le.cpp */
 
 /* begin file src/haswell/avx2_convert_latin1_to_utf8.cpp */
-std::pair<const char *, char *> avx2_convert_latin1_to_utf8(const char *latin1_input, size_t len,
-                           char *utf8_output) {
+std::pair<const char *, char *>
+avx2_convert_latin1_to_utf8(const char *latin1_input, size_t len,
+                            char *utf8_output) {
   const char *end = latin1_input + len;
   const __m256i v_0000 = _mm256_setzero_si256();
   const __m256i v_c080 = _mm256_set1_epi16((int16_t)0xc080);
@@ -24857,8 +25463,10 @@ std::pair<const char *, char *> avx2_convert_latin1_to_utf8(const char *latin1_i
     // 2. merge ASCII and 2-byte codewords
 
     // no bits set above 7th bit
-    const __m256i one_byte_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
 
     const __m256i utf8_unpacked = _mm256_blendv_epi8(t4, in, one_byte_bytemask);
 
@@ -24897,116 +25505,128 @@ std::pair<const char *, char *> avx2_convert_latin1_to_utf8(const char *latin1_i
 /* end file src/haswell/avx2_convert_latin1_to_utf8.cpp */
 /* begin file src/haswell/avx2_convert_latin1_to_utf16.cpp */
 template <endianness big_endian>
-std::pair<const char*, char16_t*> avx2_convert_latin1_to_utf16(const char* latin1_input, size_t len, char16_t* utf16_output) {
-    size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 32
-
-    size_t i = 0;
-    for (; i < rounded_len; i += 16) {
-        // Load 16 bytes from the address (input + i) into a xmm register
-        __m128i xmm0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(latin1_input + i));
+std::pair<const char *, char16_t *>
+avx2_convert_latin1_to_utf16(const char *latin1_input, size_t len,
+                             char16_t *utf16_output) {
+  size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 16
 
-        // Zero extend each byte in xmm0 to word and put it in another xmm register
-        __m128i xmm1 = _mm_cvtepu8_epi16(xmm0);
-
-        // Shift xmm0 to the right by 8 bytes
-        xmm0 = _mm_srli_si128(xmm0, 8);
+  size_t i = 0;
+  for (; i < rounded_len; i += 16) {
+    // Load 16 bytes from the address (input + i) into an xmm register
+    __m128i xmm0 =
+        _mm_loadu_si128(reinterpret_cast<const __m128i *>(latin1_input + i));
 
-        // Zero extend each byte in the shifted xmm0 to word in xmm0
-        xmm0 = _mm_cvtepu8_epi16(xmm0);
+    // Zero extend each byte in xmm0 to word and put it in another xmm register
+    __m128i xmm1 = _mm_cvtepu8_epi16(xmm0);
 
-        if (big_endian) {
-            const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-            xmm0 = _mm_shuffle_epi8(xmm0, swap);
-            xmm1 = _mm_shuffle_epi8(xmm1, swap);
-        }
+    // Shift xmm0 to the right by 8 bytes
+    xmm0 = _mm_srli_si128(xmm0, 8);
 
-        // Store the contents of xmm1 into the address pointed by (output + i)
-        _mm_storeu_si128(reinterpret_cast<__m128i*>(utf16_output + i), xmm1);
+    // Zero extend each byte in the shifted xmm0 to word in xmm0
+    xmm0 = _mm_cvtepu8_epi16(xmm0);
 
-        // Store the contents of xmm0 into the address pointed by (output + i + 8)
-        _mm_storeu_si128(reinterpret_cast<__m128i*>(utf16_output + i + 8), xmm0);
+    if (big_endian) {
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      xmm0 = _mm_shuffle_epi8(xmm0, swap);
+      xmm1 = _mm_shuffle_epi8(xmm1, swap);
     }
 
-    return std::make_pair(latin1_input + rounded_len, utf16_output + rounded_len);
+    // Store the contents of xmm1 into the address pointed by (output + i)
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf16_output + i), xmm1);
+
+    // Store the contents of xmm0 into the address pointed by (output + i + 8)
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf16_output + i + 8), xmm0);
+  }
 
+  return std::make_pair(latin1_input + rounded_len, utf16_output + rounded_len);
 }
 /* end file src/haswell/avx2_convert_latin1_to_utf16.cpp */
 /* begin file src/haswell/avx2_convert_latin1_to_utf32.cpp */
-std::pair<const char*, char32_t*> avx2_convert_latin1_to_utf32(const char* buf, size_t len, char32_t* utf32_output) {
-    size_t rounded_len = ((len | 7) ^ 7);  // Round down to nearest multiple of 8
+std::pair<const char *, char32_t *>
+avx2_convert_latin1_to_utf32(const char *buf, size_t len,
+                             char32_t *utf32_output) {
+  size_t rounded_len = ((len | 7) ^ 7); // Round down to nearest multiple of 8
 
-    for (size_t i = 0; i < rounded_len; i += 8) {
-        // Load 8 Latin1 characters into a 64-bit register
-        __m128i in = _mm_loadl_epi64((__m128i*)&buf[i]);
+  for (size_t i = 0; i < rounded_len; i += 8) {
+    // Load 8 Latin1 characters into a 64-bit register
+    __m128i in = _mm_loadl_epi64((__m128i *)&buf[i]);
 
-        // Zero extend each set of 8 Latin1 characters to 8 32-bit integers using vpmovzxbd
-        __m256i out = _mm256_cvtepu8_epi32(in);
+    // Zero extend each set of 8 Latin1 characters to 8 32-bit integers using
+    // vpmovzxbd
+    __m256i out = _mm256_cvtepu8_epi32(in);
 
-        // Store the results back to memory
-        _mm256_storeu_si256((__m256i*)&utf32_output[i], out);
-    }
+    // Store the results back to memory
+    _mm256_storeu_si256((__m256i *)&utf32_output[i], out);
+  }
 
-    // return pointers pointing to where we left off
-    return std::make_pair(buf + rounded_len, utf32_output + rounded_len);
+  // return pointers pointing to where we left off
+  return std::make_pair(buf + rounded_len, utf32_output + rounded_len);
 }
-
 /* end file src/haswell/avx2_convert_latin1_to_utf32.cpp */
 
 /* begin file src/haswell/avx2_convert_utf8_to_utf16.cpp */
 // depends on "tables/utf8_to_utf16_tables.h"
 
-
 // Convert up to 12 bytes from utf8 to utf16 using a mask indicating the
 // end of the code points. Only the least significant 12 bits of the mask
 // are accessed.
 // It returns how many bytes were consumed (up to 12).
 template <endianness big_endian>
 size_t convert_masked_utf8_to_utf16(const char *input,
-                           uint64_t utf8_end_of_code_point_mask,
-                           char16_t *&utf16_output) {
+                                    uint64_t utf8_end_of_code_point_mask,
+                                    char16_t *&utf16_output) {
   // we use an approach where we try to process up to 12 input bytes.
   // Why 12 input bytes and not 16? Because we are concerned with the size of
   // the lookup tables. Also 12 is nicely divisible by two and three.
   //
   //
-  // Optimization note: our main path below is load-latency dependent. Thus it is maybe
-  // beneficial to have fast paths that depend on branch prediction but have less latency.
-  // This results in more instructions but, potentially, also higher speeds.
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // may be beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
   //
   // We first try a few fast paths.
-  const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+  const __m128i swap =
+      _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
   const __m128i in = _mm_loadu_si128((__m128i *)input);
   const uint16_t input_utf8_end_of_code_point_mask =
       utf8_end_of_code_point_mask & 0xfff;
-  if(utf8_end_of_code_point_mask  == 0xfff) {
+  if (utf8_end_of_code_point_mask == 0xfff) {
     // We process the data in chunks of 12 bytes.
     __m256i ascii = _mm256_cvtepu8_epi16(in);
     if (big_endian) {
-      const __m256i swap256 = _mm256_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14,
-                                  17, 16, 19, 18, 21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+      const __m256i swap256 = _mm256_setr_epi8(
+          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
       ascii = _mm256_shuffle_epi8(ascii, swap256);
     }
     _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf16_output), ascii);
     utf16_output += 12; // We wrote 12 16-bit characters.
-    return 12; // We consumed 12 bytes.
+    return 12;          // We consumed 12 bytes.
   }
-  if(((utf8_end_of_code_point_mask & 0xffff) == 0xaaaa)) {
-    // We want to take 8 2-byte UTF-8 code units and turn them into 8 2-byte UTF-16 code units.
-    // There is probably a more efficient sequence, but the following might do.
-    const __m128i sh = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+  if (((utf8_end_of_code_point_mask & 0xffff) == 0xaaaa)) {
+    // We want to take 8 2-byte UTF-8 code units and turn them into 8 2-byte
+    // UTF-16 code units. There is probably a more efficient sequence, but the
+    // following might do.
+    const __m128i sh =
+        _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
     const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
     __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    if (big_endian) composed = _mm_shuffle_epi8(composed, swap);
+    if (big_endian)
+      composed = _mm_shuffle_epi8(composed, swap);
     _mm_storeu_si128((__m128i *)utf16_output, composed);
     utf16_output += 8; // We wrote 16 bytes, 8 code points.
     return 16;
   }
-  if(input_utf8_end_of_code_point_mask == 0x924) {
-    // We want to take 4 3-byte UTF-8 code units and turn them into 4 2-byte UTF-16 code units.
-    // There is probably a more efficient sequence, but the following might do.
-    const __m128i sh = _mm_setr_epi8(2, 1, 0, -1, 5, 4, 3, -1, 8, 7, 6, -1, 11, 10, 9, -1);
+  if (input_utf8_end_of_code_point_mask == 0x924) {
+    // We want to take 4 3-byte UTF-8 code units and turn them into 4 2-byte
+    // UTF-16 code units. There is probably a more efficient sequence, but the
+    // following might do.
+    const __m128i sh =
+        _mm_setr_epi8(2, 1, 0, -1, 5, 4, 3, -1, 8, 7, 6, -1, 11, 10, 9, -1);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii =
         _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
@@ -25019,35 +25639,39 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     const __m128i composed =
         _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted), highbyte_shifted);
     __m128i composed_repacked = _mm_packus_epi32(composed, composed);
-    if (big_endian) composed_repacked = _mm_shuffle_epi8(composed_repacked, swap);
+    if (big_endian)
+      composed_repacked = _mm_shuffle_epi8(composed_repacked, swap);
     _mm_storeu_si128((__m128i *)utf16_output, composed_repacked);
     utf16_output += 4;
     return 12;
   }
 
-  const uint8_t idx =
-      simdutf::tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][0];
-  const uint8_t consumed =
-      simdutf::tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
   if (idx < 64) {
     // SIX (6) input code-code units
     // this is a relatively easy scenario
-    // we process SIX (6) input code-code units. The max length in bytes of six code
-    // code units spanning between 1 and 2 bytes each is 12 bytes. On processors
-    // where pdep/pext is fast, we might be able to use a small lookup table.
-    const __m128i sh =
-        _mm_loadu_si128((const __m128i *)simdutf::tables::utf8_to_utf16::shufutf8[idx]);
+    // we process SIX (6) input code-code units. The max length in bytes of six
+    // code units spanning between 1 and 2 bytes each is 12 bytes. On
+    // processors where pdep/pext is fast, we might be able to use a small
+    // lookup table.
+    const __m128i sh = _mm_loadu_si128(
+        (const __m128i *)simdutf::tables::utf8_to_utf16::shufutf8[idx]);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
     const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
     __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    if (big_endian) composed = _mm_shuffle_epi8(composed, swap);
+    if (big_endian)
+      composed = _mm_shuffle_epi8(composed, swap);
     _mm_storeu_si128((__m128i *)utf16_output, composed);
-    utf16_output += 6; // We wrote 12 bytes, 6 code points. There is a potential overflow of 4 bytes.
+    utf16_output += 6; // We wrote 12 bytes, 6 code points. There is a potential
+                       // overflow of 4 bytes.
   } else if (idx < 145) {
     // FOUR (4) input code-code units
-    const __m128i sh =
-        _mm_loadu_si128((const __m128i *)simdutf::tables::utf8_to_utf16::shufutf8[idx]);
+    const __m128i sh = _mm_loadu_si128(
+        (const __m128i *)simdutf::tables::utf8_to_utf16::shufutf8[idx]);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii =
         _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
@@ -25060,21 +25684,23 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     const __m128i composed =
         _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted), highbyte_shifted);
     __m128i composed_repacked = _mm_packus_epi32(composed, composed);
-    if (big_endian) composed_repacked = _mm_shuffle_epi8(composed_repacked, swap);
+    if (big_endian)
+      composed_repacked = _mm_shuffle_epi8(composed_repacked, swap);
     _mm_storeu_si128((__m128i *)utf16_output, composed_repacked);
     utf16_output += 4; // Here we overflow by 8 bytes.
   } else if (idx < 209) {
     // TWO (2) input code-code units
     //////////////
-    // There might be garbage inputs where a leading byte mascarades as a four-byte
-    // leading byte (by being followed by 3 continuation byte), but is not greater than
-    // 0xf0. This could trigger a buffer overflow if we only counted leading
-    // bytes of the form 0xf0 as generating surrogate pairs, without further UTF-8 validation.
-    // Thus we must be careful to ensure that only leading bytes at least as large as 0xf0 generate surrogate pairs.
-    // We do as at the cost of an extra mask.
+    // There might be garbage inputs where a leading byte masquerades as a
+    // four-byte leading byte (by being followed by 3 continuation bytes), but
+    // is not greater than 0xf0. This could trigger a buffer overflow if we only
+    // counted leading bytes of the form 0xf0 as generating surrogate pairs,
+    // without further UTF-8 validation. Thus we must be careful to ensure that
+    // only leading bytes at least as large as 0xf0 generate surrogate pairs. We
+    // do so at the cost of an extra mask.
     /////////////
-    const __m128i sh =
-        _mm_loadu_si128((const __m128i *)simdutf::tables::utf8_to_utf16::shufutf8[idx]);
+    const __m128i sh = _mm_loadu_si128(
+        (const __m128i *)simdutf::tables::utf8_to_utf16::shufutf8[idx]);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi32(0x7f));
     const __m128i middlebyte = _mm_and_si128(perm, _mm_set1_epi32(0x3f00));
@@ -25085,8 +25711,8 @@ size_t convert_masked_utf8_to_utf16(const char *input,
         _mm_srli_epi32(_mm_and_si128(perm, _mm_set1_epi32(0x400000)), 1);
     middlehighbyte = _mm_xor_si128(correct, middlehighbyte);
     const __m128i middlehighbyte_shifted = _mm_srli_epi32(middlehighbyte, 4);
-    // We deliberately carry the leading four bits in highbyte if they are present,
-    // we remove them later when computing hightenbits.
+    // We deliberately carry the leading four bits in highbyte if they are
+    // present; we remove them later when computing hightenbits.
     const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi32(0xff000000));
     const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 6);
     // When we need to generate a surrogate pair (leading byte > 0xF0), then
@@ -25101,30 +25727,32 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     const __m128i lowtenbits =
         _mm_and_si128(composedminus, _mm_set1_epi32(0x3ff));
     // Notice the 0x3ff mask:
-    const __m128i hightenbits = _mm_and_si128(_mm_srli_epi32(composedminus, 10), _mm_set1_epi32(0x3ff));
+    const __m128i hightenbits =
+        _mm_and_si128(_mm_srli_epi32(composedminus, 10), _mm_set1_epi32(0x3ff));
     const __m128i lowtenbitsadd =
         _mm_add_epi32(lowtenbits, _mm_set1_epi32(0xDC00));
     const __m128i hightenbitsadd =
         _mm_add_epi32(hightenbits, _mm_set1_epi32(0xD800));
     const __m128i lowtenbitsaddshifted = _mm_slli_epi32(lowtenbitsadd, 16);
-    __m128i surrogates =
-        _mm_or_si128(hightenbitsadd, lowtenbitsaddshifted);
+    __m128i surrogates = _mm_or_si128(hightenbitsadd, lowtenbitsaddshifted);
     uint32_t basic_buffer[4];
     uint32_t basic_buffer_swap[4];
     if (big_endian) {
-      _mm_storeu_si128((__m128i *)basic_buffer_swap, _mm_shuffle_epi8(composed, swap));
+      _mm_storeu_si128((__m128i *)basic_buffer_swap,
+                       _mm_shuffle_epi8(composed, swap));
       surrogates = _mm_shuffle_epi8(surrogates, swap);
     }
     _mm_storeu_si128((__m128i *)basic_buffer, composed);
     uint32_t surrogate_buffer[4];
     _mm_storeu_si128((__m128i *)surrogate_buffer, surrogates);
     for (size_t i = 0; i < 3; i++) {
-      if(basic_buffer[i] > 0x3c00000) {
+      if (basic_buffer[i] > 0x3c00000) {
         utf16_output[0] = uint16_t(surrogate_buffer[i] & 0xffff);
         utf16_output[1] = uint16_t(surrogate_buffer[i] >> 16);
         utf16_output += 2;
-      } else  {
-        utf16_output[0] = big_endian ? uint16_t(basic_buffer_swap[i]) : uint16_t(basic_buffer[i]);
+      } else {
+        utf16_output[0] = big_endian ? uint16_t(basic_buffer_swap[i])
+                                     : uint16_t(basic_buffer[i]);
         utf16_output++;
       }
     }
@@ -25137,50 +25765,57 @@ size_t convert_masked_utf8_to_utf16(const char *input,
 /* begin file src/haswell/avx2_convert_utf8_to_utf32.cpp */
 // depends on "tables/utf8_to_utf16_tables.h"
 
-
 // Convert up to 12 bytes from utf8 to utf32 using a mask indicating the
 // end of the code points. Only the least significant 12 bits of the mask
 // are accessed.
 // It returns how many bytes were consumed (up to 12).
 size_t convert_masked_utf8_to_utf32(const char *input,
-                           uint64_t utf8_end_of_code_point_mask,
-                           char32_t *&utf32_output) {
+                                    uint64_t utf8_end_of_code_point_mask,
+                                    char32_t *&utf32_output) {
   // we use an approach where we try to process up to 12 input bytes.
   // Why 12 input bytes and not 16? Because we are concerned with the size of
   // the lookup tables. Also 12 is nicely divisible by two and three.
   //
   //
-  // Optimization note: our main path below is load-latency dependent. Thus it is maybe
-  // beneficial to have fast paths that depend on branch prediction but have less latency.
-  // This results in more instructions but, potentially, also higher speeds.
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // may be beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
   //
   // We first try a few fast paths.
   const __m128i in = _mm_loadu_si128((__m128i *)input);
   const uint16_t input_utf8_end_of_code_point_mask =
       utf8_end_of_code_point_mask & 0xfff;
-  if(utf8_end_of_code_point_mask == 0xfff) {
+  if (utf8_end_of_code_point_mask == 0xfff) {
     // We process the data in chunks of 12 bytes.
-    _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output), _mm256_cvtepu8_epi32(in));
-    _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output+8), _mm256_cvtepu8_epi32(_mm_srli_si128(in,8)));
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output),
+                        _mm256_cvtepu8_epi32(in));
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output + 8),
+                        _mm256_cvtepu8_epi32(_mm_srli_si128(in, 8)));
     utf32_output += 12; // We wrote 12 32-bit characters.
-    return 12; // We consumed 12 bytes.
+    return 12;          // We consumed 12 bytes.
   }
-  if(((utf8_end_of_code_point_mask & 0xffff) == 0xaaaa)) {
-    // We want to take 8 2-byte UTF-8 code units and turn them into 8 4-byte UTF-32 code units.
-    // There is probably a more efficient sequence, but the following might do.
-    const __m128i sh = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+  if (((utf8_end_of_code_point_mask & 0xffff) == 0xaaaa)) {
+    // We want to take 8 2-byte UTF-8 code units and turn them into 8 4-byte
+    // UTF-32 code units. There is probably a more efficient sequence, but the
+    // following might do.
+    const __m128i sh =
+        _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
     const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
     const __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    _mm256_storeu_si256((__m256i *)utf32_output, _mm256_cvtepu16_epi32(composed));
+    _mm256_storeu_si256((__m256i *)utf32_output,
+                        _mm256_cvtepu16_epi32(composed));
     utf32_output += 8; // We wrote 16 bytes, 8 code points.
     return 16;
   }
-  if(input_utf8_end_of_code_point_mask == 0x924) {
-    // We want to take 4 3-byte UTF-8 code units and turn them into 4 4-byte UTF-32 code units.
-    // There is probably a more efficient sequence, but the following might do.
-    const __m128i sh = _mm_setr_epi8(2, 1, 0, -1, 5, 4, 3, -1, 8, 7, 6, -1, 11, 10, 9, -1);
+  if (input_utf8_end_of_code_point_mask == 0x924) {
+    // We want to take 4 3-byte UTF-8 code units and turn them into 4 4-byte
+    // UTF-32 code units. There is probably a more efficient sequence, but the
+    // following might do.
+    const __m128i sh =
+        _mm_setr_epi8(2, 1, 0, -1, 5, 4, 3, -1, 8, 7, 6, -1, 11, 10, 9, -1);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii =
         _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
@@ -25205,16 +25840,18 @@ size_t convert_masked_utf8_to_utf32(const char *input,
   if (idx < 64) {
     // SIX (6) input code-code units
     // this is a relatively easy scenario
-    // we process SIX (6) input code-code units. The max length in bytes of six code
-    // code units spanning between 1 and 2 bytes each is 12 bytes. On processors
-    // where pdep/pext is fast, we might be able to use a small lookup table.
+    // we process SIX (6) input code-code units. The max length in bytes of six
+    // code units spanning between 1 and 2 bytes each is 12 bytes. On
+    // processors where pdep/pext is fast, we might be able to use a small
+    // lookup table.
     const __m128i sh =
         _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
     const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
     const __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    _mm256_storeu_si256((__m256i *)utf32_output, _mm256_cvtepu16_epi32(composed));
+    _mm256_storeu_si256((__m256i *)utf32_output,
+                        _mm256_cvtepu16_epi32(composed));
     utf32_output += 6; // We wrote 24 bytes, 6 code points. There is a potential
     // overflow of 32 - 24 = 8 bytes.
   } else if (idx < 145) {
@@ -25254,7 +25891,8 @@ size_t convert_masked_utf8_to_utf32(const char *input,
         _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted),
                      _mm_or_si128(highbyte_shifted, middlehighbyte_shifted));
     _mm_storeu_si128((__m128i *)utf32_output, composed);
-    utf32_output += 3; // We wrote 3 * 4 bytes, there is a potential overflow of 4 bytes.
+    utf32_output +=
+        3; // We wrote 3 * 4 bytes, there is a potential overflow of 4 bytes.
   } else {
     // here we know that there is an error but we do not handle errors
   }
@@ -25399,236 +26037,270 @@ avx2_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
     - We need two 256-entry tables that have 8704 bytes in total.
 */
 
-
 /*
   Returns a pair: the first unprocessed byte from buf and utf8_output
   A scalar routing should carry on the conversion of the tail.
 */
 template <endianness big_endian>
-std::pair<const char16_t*, char*> avx2_convert_utf16_to_utf8(const char16_t* buf, size_t len, char* utf8_output) {
-  const char16_t* end = buf + len;
+std::pair<const char16_t *, char *>
+avx2_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_output) {
+  const char16_t *end = buf + len;
   const __m256i v_0000 = _mm256_setzero_si256();
   const __m256i v_f800 = _mm256_set1_epi16((int16_t)0xf800);
   const __m256i v_d800 = _mm256_set1_epi16((int16_t)0xd800);
   const __m256i v_c080 = _mm256_set1_epi16((int16_t)0xc080);
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
-  while (end -  buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i*)buf);
+  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
     if (big_endian) {
-      const __m256i swap = _mm256_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14,
-                                  17, 16, 19, 18, 21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+      const __m256i swap = _mm256_setr_epi8(
+          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
       in = _mm256_shuffle_epi8(in, swap);
     }
     // a single 16-bit UTF-16 word can yield 1, 2 or 3 UTF-8 bytes
     const __m256i v_ff80 = _mm256_set1_epi16((int16_t)0xff80);
-    if(_mm256_testz_si256(in, v_ff80)) { // ASCII fast path!!!!
-        // 1. pack the bytes
-        const __m128i utf8_packed = _mm_packus_epi16(_mm256_castsi256_si128(in),_mm256_extractf128_si256(in,1));
-        // 2. store (16 bytes)
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 16;
-        utf8_output += 16;
-        continue; // we are done for this round!
+    if (_mm256_testz_si256(in, v_ff80)) { // ASCII fast path!!!!
+      // 1. pack the bytes
+      const __m128i utf8_packed = _mm_packus_epi16(
+          _mm256_castsi256_si128(in), _mm256_extractf128_si256(in, 1));
+      // 2. store (16 bytes)
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+      // 3. adjust pointers
+      buf += 16;
+      utf8_output += 16;
+      continue; // we are done for this round!
     }
     // no bits set above 7th bit
-    const __m256i one_byte_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
 
     // no bits set above 11th bit
-    const __m256i one_or_two_bytes_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_0000);
-    const uint32_t one_or_two_bytes_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
+    const __m256i one_or_two_bytes_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_0000);
+    const uint32_t one_or_two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
     if (one_or_two_bytes_bitmask == 0xffffffff) {
 
-          // 1. prepare 2-byte values
-          // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-          // expected output   : [110a|aaaa|10bb|bbbb] x 8
-          const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
-          const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
-
-          // t0 = [000a|aaaa|bbbb|bb00]
-          const __m256i t0 = _mm256_slli_epi16(in, 2);
-          // t1 = [000a|aaaa|0000|0000]
-          const __m256i t1 = _mm256_and_si256(t0, v_1f00);
-          // t2 = [0000|0000|00bb|bbbb]
-          const __m256i t2 = _mm256_and_si256(in, v_003f);
-          // t3 = [000a|aaaa|00bb|bbbb]
-          const __m256i t3 = _mm256_or_si256(t1, t2);
-          // t4 = [110a|aaaa|10bb|bbbb]
-          const __m256i t4 = _mm256_or_si256(t3, v_c080);
-
-          // 2. merge ASCII and 2-byte codewords
-          const __m256i utf8_unpacked = _mm256_blendv_epi8(t4, in, one_byte_bytemask);
-
-          // 3. prepare bitmask for 8-bit lookup
-          const uint32_t M0 = one_byte_bitmask & 0x55555555;
-          const uint32_t M1 = M0 >> 7;
-          const uint32_t M2 = (M1 | M0)  & 0x00ff00ff;
-          // 4. pack the bytes
-
-          const uint8_t* row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
-          const uint8_t* row_2 = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2>>16)][0];
-
-          const __m128i shuffle = _mm_loadu_si128((__m128i*)(row + 1));
-          const __m128i shuffle_2 = _mm_loadu_si128((__m128i*)(row_2 + 1));
-
-          const __m256i utf8_packed = _mm256_shuffle_epi8(utf8_unpacked, _mm256_setr_m128i(shuffle,shuffle_2));
-          // 5. store bytes
-          _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_packed));
-          utf8_output += row[0];
-          _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_packed,1));
-          utf8_output += row_2[0];
-
-          // 6. adjust pointers
-          buf += 16;
-          continue;
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+      // expected output   : [110a|aaaa|10bb|bbbb] x 8
+      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
+      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
+
+      // t0 = [000a|aaaa|bbbb|bb00]
+      const __m256i t0 = _mm256_slli_epi16(in, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
+      // t2 = [0000|0000|00bb|bbbb]
+      const __m256i t2 = _mm256_and_si256(in, v_003f);
+      // t3 = [000a|aaaa|00bb|bbbb]
+      const __m256i t3 = _mm256_or_si256(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      const __m256i t4 = _mm256_or_si256(t3, v_c080);
+
+      // 2. merge ASCII and 2-byte codewords
+      const __m256i utf8_unpacked =
+          _mm256_blendv_epi8(t4, in, one_byte_bytemask);
+
+      // 3. prepare bitmask for 8-bit lookup
+      const uint32_t M0 = one_byte_bitmask & 0x55555555;
+      const uint32_t M1 = M0 >> 7;
+      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
+      // 4. pack the bytes
+
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
+      const uint8_t *row_2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
+                                                                       16)][0];
+
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
+
+      const __m256i utf8_packed = _mm256_shuffle_epi8(
+          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
+      // 5. store bytes
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_castsi256_si128(utf8_packed));
+      utf8_output += row[0];
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_extractf128_si256(utf8_packed, 1));
+      utf8_output += row_2[0];
+
+      // 6. adjust pointers
+      buf += 16;
+      continue;
     }
     // 1. Check if there are any surrogate word in the input chunk.
     //    We have also deal with situation when there is a surrogate word
     //    at the end of a chunk.
-    const __m256i surrogates_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
+    const __m256i surrogates_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
 
     // bitmask = 0x0000 if there are no surrogates
     //         = 0xc000 if the last word is a surrogate
-    const uint32_t surrogates_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help. However,
-    // it is likely an uncommon occurrence.
+    const uint32_t surrogates_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
     if (surrogates_bitmask == 0x00000000) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-        const __m256i dup_even = _mm256_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
-                                                0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
-                                                0x0000, 0x0202, 0x0404, 0x0606,
-                                                0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+      const __m256i dup_even = _mm256_setr_epi16(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
-        /* In this branch we handle three cases:
-           1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-           2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-           3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
+      /* In this branch we handle three cases:
+         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]
+            - single UTF-8 byte
+         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]
+            - two UTF-8 bytes
+         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc]
+            - three UTF-8 bytes
 
-          We expand the input word (16-bit) into two code units (32-bit), thus
-          we have room for four bytes. However, we need five distinct bit
-          layouts. Note that the last byte in cases #2 and #3 is the same.
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
 
-          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-          in register t2.
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
 
-          We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-          either byte 1 for case #2 or byte 2 for case #3. Note that they
-          differ by exactly one bit.
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
 
-          Finally from these two code units we build proper UTF-8 sequence, taking
-          into account the case (i.e, the number of bytes to write).
-        */
-        /**
-         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-         * t2 => [0ccc|cccc] [10cc|cccc]
-         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-         */
+        Finally, from these two code units we build a proper UTF-8 sequence,
+        taking into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
 #define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
-        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-        const __m256i t0 = _mm256_shuffle_epi8(in, dup_even);
-        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-        const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
-        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-        const __m256i t2 = _mm256_or_si256 (t1, simdutf_vec(0b1000000000000000));
-
-        // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-        const __m256i s0 = _mm256_srli_epi16(in, 4);
-        // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-        const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
-        // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-        const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
-        // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-        const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
-        const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask, simdutf_vec(0b0100000000000000));
-        const __m256i s4 = _mm256_xor_si256(s3, m0);
-#undef simdutf_vec
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const __m256i t0 = _mm256_shuffle_epi8(in, dup_even);
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
 
-        // 4. expand code units 16-bit => 32-bit
-        const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
-        const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
+      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
+      const __m256i s0 = _mm256_srli_epi16(in, 4);
+      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
+      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
+      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
+      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
+      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
+      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
+                                             simdutf_vec(0b0100000000000000));
+      const __m256i s4 = _mm256_xor_si256(s3, m0);
+#undef simdutf_vec
 
-        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-        const uint32_t mask = (one_byte_bitmask & 0x55555555) |
-                              (one_or_two_bytes_bitmask & 0xaaaaaaaa);
-        // Due to the wider registers, the following path is less likely to be useful.
-        /*if(mask == 0) {
-          // We only have three-byte code units. Use fast path.
-          const __m256i shuffle = _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1, 2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
-          const __m256i utf8_0 = _mm256_shuffle_epi8(out0, shuffle);
-          const __m256i utf8_1 = _mm256_shuffle_epi8(out1, shuffle);
-          _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
-          utf8_output += 12;
-          _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
-          utf8_output += 12;
-          _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_0,1));
-          utf8_output += 12;
-          _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_1,1));
-          utf8_output += 12;
-          buf += 16;
-          continue;
-        }*/
-        const uint8_t mask0 = uint8_t(mask);
-        const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-        const __m128i shuffle0 = _mm_loadu_si128((__m128i*)(row0 + 1));
-        const __m128i utf8_0 = _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
+      // 4. expand code units 16-bit => 32-bit
+      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
+      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
 
-        const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-        const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-        const __m128i shuffle1 = _mm_loadu_si128((__m128i*)(row1 + 1));
-        const __m128i utf8_1 = _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
+                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
+      // Due to the wider registers, the following path is less likely to be
+      // useful.
+      /*if(mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const __m256i shuffle = _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
+                                                 2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
+        const __m256i utf8_0 = _mm256_shuffle_epi8(out0, shuffle);
+        const __m256i utf8_1 = _mm256_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_0,1));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_1,1));
+        utf8_output += 12;
+        buf += 16;
+        continue;
+      }*/
+      const uint8_t mask0 = uint8_t(mask);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
 
-        const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
-        const uint8_t* row2 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
-        const __m128i shuffle2 = _mm_loadu_si128((__m128i*)(row2 + 1));
-        const __m128i utf8_2 = _mm_shuffle_epi8(_mm256_extractf128_si256(out0,1), shuffle2);
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
 
+      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
+      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
+      const __m128i utf8_2 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
 
-        const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
-        const uint8_t* row3 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
-        const __m128i shuffle3 = _mm_loadu_si128((__m128i*)(row3 + 1));
-        const __m128i utf8_3 = _mm_shuffle_epi8(_mm256_extractf128_si256(out1,1), shuffle3);
+      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
+      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
+      const __m128i utf8_3 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
 
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
-        utf8_output += row0[0];
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
-        utf8_output += row1[0];
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_2);
-        utf8_output += row2[0];
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_3);
-        utf8_output += row3[0];
-        buf += 16;
-    // surrogate pair(s) in a register
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
+      utf8_output += row0[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+      utf8_output += row1[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
+      utf8_output += row2[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
+      utf8_output += row3[0];
+      buf += 16;
+      // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
       // It may seem wasteful to use scalar code, but being efficient with SIMD
       // in the presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if((word & 0xFF80)==0) {
+        if ((word & 0xFF80) == 0) {
           *utf8_output++ = char(word);
-        } else if((word & 0xF800)==0) {
-          *utf8_output++ = char((word>>6) | 0b11000000);
+        } else if ((word & 0xF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word &0xF800 ) != 0xD800) {
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else if ((word & 0xF800) != 0xD800) {
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = big_endian ? scalar::utf16::swap_bytes(buf[k+1]) : buf[k+1];
+          uint16_t next_word =
+              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if((diff | diff2) > 0x3FF)  { return std::make_pair(nullptr, utf8_output); }
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(nullptr, utf8_output);
+          }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
-          *utf8_output++ = char((value>>18) | 0b11110000);
-          *utf8_output++ = char(((value>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((value>>6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((value >> 18) | 0b11110000);
+          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((value & 0b111111) | 0b10000000);
         }
       }
@@ -25638,240 +26310,278 @@ std::pair<const char16_t*, char*> avx2_convert_utf16_to_utf8(const char16_t* buf
   return std::make_pair(buf, utf8_output);
 }
 
-
 /*
   Returns a pair: a result struct and utf8_output.
-  If there is an error, the count field of the result is the position of the error.
-  Otherwise, it is the position of the first unprocessed byte in buf (even if finished).
-  A scalar routing should carry on the conversion of the tail if needed.
+  If there is an error, the count field of the result is the position of the
+  error. Otherwise, it is the position of the first unprocessed byte in buf
+  (even if finished). A scalar routine should carry on the conversion of the
+  tail if needed.
 */
 template <endianness big_endian>
-std::pair<result, char*> avx2_convert_utf16_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) {
-  const char16_t* start = buf;
-  const char16_t* end = buf + len;
+std::pair<result, char *>
+avx2_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
+                                       char *utf8_output) {
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
 
   const __m256i v_0000 = _mm256_setzero_si256();
   const __m256i v_f800 = _mm256_set1_epi16((int16_t)0xf800);
   const __m256i v_d800 = _mm256_set1_epi16((int16_t)0xd800);
   const __m256i v_c080 = _mm256_set1_epi16((int16_t)0xc080);
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  // to avoid overruns, see issue
+  // https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin = 12;
 
   while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i*)buf);
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
     if (big_endian) {
-      const __m256i swap = _mm256_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14,
-                                  17, 16, 19, 18, 21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+      const __m256i swap = _mm256_setr_epi8(
+          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
       in = _mm256_shuffle_epi8(in, swap);
     }
     // a single 16-bit UTF-16 word can yield 1, 2 or 3 UTF-8 bytes
     const __m256i v_ff80 = _mm256_set1_epi16((int16_t)0xff80);
-    if(_mm256_testz_si256(in, v_ff80)) { // ASCII fast path!!!!
-        // 1. pack the bytes
-        const __m128i utf8_packed = _mm_packus_epi16(_mm256_castsi256_si128(in),_mm256_extractf128_si256(in,1));
-        // 2. store (16 bytes)
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 16;
-        utf8_output += 16;
-        continue; // we are done for this round!
+    if (_mm256_testz_si256(in, v_ff80)) { // ASCII fast path!!!!
+      // 1. pack the bytes
+      const __m128i utf8_packed = _mm_packus_epi16(
+          _mm256_castsi256_si128(in), _mm256_extractf128_si256(in, 1));
+      // 2. store (16 bytes)
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+      // 3. adjust pointers
+      buf += 16;
+      utf8_output += 16;
+      continue; // we are done for this round!
     }
     // no bits set above 7th bit
-    const __m256i one_byte_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
 
     // no bits set above 11th bit
-    const __m256i one_or_two_bytes_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_0000);
-    const uint32_t one_or_two_bytes_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
+    const __m256i one_or_two_bytes_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_0000);
+    const uint32_t one_or_two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
     if (one_or_two_bytes_bitmask == 0xffffffff) {
 
-          // 1. prepare 2-byte values
-          // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-          // expected output   : [110a|aaaa|10bb|bbbb] x 8
-          const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
-          const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
-
-          // t0 = [000a|aaaa|bbbb|bb00]
-          const __m256i t0 = _mm256_slli_epi16(in, 2);
-          // t1 = [000a|aaaa|0000|0000]
-          const __m256i t1 = _mm256_and_si256(t0, v_1f00);
-          // t2 = [0000|0000|00bb|bbbb]
-          const __m256i t2 = _mm256_and_si256(in, v_003f);
-          // t3 = [000a|aaaa|00bb|bbbb]
-          const __m256i t3 = _mm256_or_si256(t1, t2);
-          // t4 = [110a|aaaa|10bb|bbbb]
-          const __m256i t4 = _mm256_or_si256(t3, v_c080);
-
-          // 2. merge ASCII and 2-byte codewords
-          const __m256i utf8_unpacked = _mm256_blendv_epi8(t4, in, one_byte_bytemask);
-
-          // 3. prepare bitmask for 8-bit lookup
-          const uint32_t M0 = one_byte_bitmask & 0x55555555;
-          const uint32_t M1 = M0 >> 7;
-          const uint32_t M2 = (M1 | M0)  & 0x00ff00ff;
-          // 4. pack the bytes
-
-          const uint8_t* row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
-          const uint8_t* row_2 = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2>>16)][0];
-
-          const __m128i shuffle = _mm_loadu_si128((__m128i*)(row + 1));
-          const __m128i shuffle_2 = _mm_loadu_si128((__m128i*)(row_2 + 1));
-
-          const __m256i utf8_packed = _mm256_shuffle_epi8(utf8_unpacked, _mm256_setr_m128i(shuffle,shuffle_2));
-          // 5. store bytes
-          _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_packed));
-          utf8_output += row[0];
-          _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_packed,1));
-          utf8_output += row_2[0];
-
-          // 6. adjust pointers
-          buf += 16;
-          continue;
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+      // expected output   : [110a|aaaa|10bb|bbbb] x 8
+      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
+      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
+
+      // t0 = [000a|aaaa|bbbb|bb00]
+      const __m256i t0 = _mm256_slli_epi16(in, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
+      // t2 = [0000|0000|00bb|bbbb]
+      const __m256i t2 = _mm256_and_si256(in, v_003f);
+      // t3 = [000a|aaaa|00bb|bbbb]
+      const __m256i t3 = _mm256_or_si256(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      const __m256i t4 = _mm256_or_si256(t3, v_c080);
+
+      // 2. merge ASCII and 2-byte codewords
+      const __m256i utf8_unpacked =
+          _mm256_blendv_epi8(t4, in, one_byte_bytemask);
+
+      // 3. prepare bitmask for 8-bit lookup
+      const uint32_t M0 = one_byte_bitmask & 0x55555555;
+      const uint32_t M1 = M0 >> 7;
+      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
+      // 4. pack the bytes
+
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
+      const uint8_t *row_2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
+                                                                       16)][0];
+
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
+
+      const __m256i utf8_packed = _mm256_shuffle_epi8(
+          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
+      // 5. store bytes
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_castsi256_si128(utf8_packed));
+      utf8_output += row[0];
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_extractf128_si256(utf8_packed, 1));
+      utf8_output += row_2[0];
+
+      // 6. adjust pointers
+      buf += 16;
+      continue;
     }
     // 1. Check if there are any surrogate word in the input chunk.
     //    We have also deal with situation when there is a surrogate word
     //    at the end of a chunk.
-    const __m256i surrogates_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
+    const __m256i surrogates_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
 
     // bitmask = 0x0000 if there are no surrogates
     //         = 0xc000 if the last word is a surrogate
-    const uint32_t surrogates_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help. However,
-    // it is likely an uncommon occurrence.
+    const uint32_t surrogates_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
     if (surrogates_bitmask == 0x00000000) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-        const __m256i dup_even = _mm256_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
-                                                0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
-                                                0x0000, 0x0202, 0x0404, 0x0606,
-                                                0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+      const __m256i dup_even = _mm256_setr_epi16(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
-        /* In this branch we handle three cases:
-           1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-           2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-           3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
+      /* In this branch we handle three cases:
+         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]
+            - single UTF-8 byte
+         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]
+            - two UTF-8 bytes
+         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc]
+            - three UTF-8 bytes
 
-          We expand the input word (16-bit) into two code units (32-bit), thus
-          we have room for four bytes. However, we need five distinct bit
-          layouts. Note that the last byte in cases #2 and #3 is the same.
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
 
-          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-          in register t2.
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
 
-          We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-          either byte 1 for case #2 or byte 2 for case #3. Note that they
-          differ by exactly one bit.
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
 
-          Finally from these two code units we build proper UTF-8 sequence, taking
-          into account the case (i.e, the number of bytes to write).
-        */
-        /**
-         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-         * t2 => [0ccc|cccc] [10cc|cccc]
-         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-         */
+        Finally, from these two code units we build a proper UTF-8 sequence,
+        taking into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
 #define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
-        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-        const __m256i t0 = _mm256_shuffle_epi8(in, dup_even);
-        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-        const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
-        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-        const __m256i t2 = _mm256_or_si256 (t1, simdutf_vec(0b1000000000000000));
-
-        // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-        const __m256i s0 = _mm256_srli_epi16(in, 4);
-        // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-        const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
-        // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-        const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
-        // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-        const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
-        const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask, simdutf_vec(0b0100000000000000));
-        const __m256i s4 = _mm256_xor_si256(s3, m0);
-#undef simdutf_vec
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const __m256i t0 = _mm256_shuffle_epi8(in, dup_even);
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
 
-        // 4. expand code units 16-bit => 32-bit
-        const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
-        const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
+      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
+      const __m256i s0 = _mm256_srli_epi16(in, 4);
+      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
+      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
+      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
+      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
+      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
+      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
+                                             simdutf_vec(0b0100000000000000));
+      const __m256i s4 = _mm256_xor_si256(s3, m0);
+#undef simdutf_vec
 
-        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-        const uint32_t mask = (one_byte_bitmask & 0x55555555) |
-                              (one_or_two_bytes_bitmask & 0xaaaaaaaa);
-        // Due to the wider registers, the following path is less likely to be useful.
-        /*if(mask == 0) {
-          // We only have three-byte code units. Use fast path.
-          const __m256i shuffle = _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1, 2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
-          const __m256i utf8_0 = _mm256_shuffle_epi8(out0, shuffle);
-          const __m256i utf8_1 = _mm256_shuffle_epi8(out1, shuffle);
-          _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
-          utf8_output += 12;
-          _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
-          utf8_output += 12;
-          _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_0,1));
-          utf8_output += 12;
-          _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_1,1));
-          utf8_output += 12;
-          buf += 16;
-          continue;
-        }*/
-        const uint8_t mask0 = uint8_t(mask);
-        const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-        const __m128i shuffle0 = _mm_loadu_si128((__m128i*)(row0 + 1));
-        const __m128i utf8_0 = _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
+      // 4. expand code units 16-bit => 32-bit
+      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
+      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
 
-        const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-        const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-        const __m128i shuffle1 = _mm_loadu_si128((__m128i*)(row1 + 1));
-        const __m128i utf8_1 = _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
+                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
+      // Due to the wider registers, the following path is less likely to be
+      // useful.
+      /*if(mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const __m256i shuffle = _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
+                                                 2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
+        const __m256i utf8_0 = _mm256_shuffle_epi8(out0, shuffle);
+        const __m256i utf8_1 = _mm256_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_0,1));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_1,1));
+        utf8_output += 12;
+        buf += 16;
+        continue;
+      }*/
+      const uint8_t mask0 = uint8_t(mask);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
 
-        const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
-        const uint8_t* row2 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
-        const __m128i shuffle2 = _mm_loadu_si128((__m128i*)(row2 + 1));
-        const __m128i utf8_2 = _mm_shuffle_epi8(_mm256_extractf128_si256(out0,1), shuffle2);
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
 
+      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
+      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
+      const __m128i utf8_2 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
 
-        const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
-        const uint8_t* row3 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
-        const __m128i shuffle3 = _mm_loadu_si128((__m128i*)(row3 + 1));
-        const __m128i utf8_3 = _mm_shuffle_epi8(_mm256_extractf128_si256(out1,1), shuffle3);
+      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
+      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
+      const __m128i utf8_3 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
 
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
-        utf8_output += row0[0];
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
-        utf8_output += row1[0];
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_2);
-        utf8_output += row2[0];
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_3);
-        utf8_output += row3[0];
-        buf += 16;
-    // surrogate pair(s) in a register
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
+      utf8_output += row0[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+      utf8_output += row1[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
+      utf8_output += row2[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
+      utf8_output += row3[0];
+      buf += 16;
+      // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
       // It may seem wasteful to use scalar code, but being efficient with SIMD
       // in the presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if((word & 0xFF80)==0) {
+        if ((word & 0xFF80) == 0) {
           *utf8_output++ = char(word);
-        } else if((word & 0xF800)==0) {
-          *utf8_output++ = char((word>>6) | 0b11000000);
+        } else if ((word & 0xF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word &0xF800 ) != 0xD800) {
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else if ((word & 0xF800) != 0xD800) {
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = big_endian ? scalar::utf16::swap_bytes(buf[k+1]) : buf[k+1];
+          uint16_t next_word =
+              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if((diff | diff2) > 0x3FF)  { return std::make_pair(result(error_code::SURROGATE, buf - start + k - 1), utf8_output); }
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k - 1),
+                utf8_output);
+          }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
-          *utf8_output++ = char((value>>18) | 0b11110000);
-          *utf8_output++ = char(((value>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((value>>6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((value >> 18) | 0b11110000);
+          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((value & 0b111111) | 0b10000000);
         }
       }
@@ -25931,61 +26641,74 @@ std::pair<result, char*> avx2_convert_utf16_to_utf8_with_errors(const char16_t*
     - We need two 256-entry tables that have 8704 bytes in total.
 */
 
-
 /*
   Returns a pair: the first unprocessed byte from buf and utf32_output
   A scalar routing should carry on the conversion of the tail.
 */
 template <endianness big_endian>
-std::pair<const char16_t*, char32_t*> avx2_convert_utf16_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) {
-  const char16_t* end = buf + len;
+std::pair<const char16_t *, char32_t *>
+avx2_convert_utf16_to_utf32(const char16_t *buf, size_t len,
+                            char32_t *utf32_output) {
+  const char16_t *end = buf + len;
   const __m256i v_f800 = _mm256_set1_epi16((int16_t)0xf800);
   const __m256i v_d800 = _mm256_set1_epi16((int16_t)0xd800);
 
   while (end - buf >= 16) {
-    __m256i in = _mm256_loadu_si256((__m256i*)buf);
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
     if (big_endian) {
-      const __m256i swap = _mm256_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14,
-                                  17, 16, 19, 18, 21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+      const __m256i swap = _mm256_setr_epi8(
+          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
       in = _mm256_shuffle_epi8(in, swap);
     }
 
     // 1. Check if there are any surrogate word in the input chunk.
     //    We have also deal with situation when there is a surrogate word
     //    at the end of a chunk.
-    const __m256i surrogates_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
+    const __m256i surrogates_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
 
     // bitmask = 0x0000 if there are no surrogates
     //         = 0xc000 if the last word is a surrogate
-    const uint32_t surrogates_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help. However,
-    // it is likely an uncommon occurrence.
+    const uint32_t surrogates_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
     if (surrogates_bitmask == 0x00000000) {
-      // case: we extend all sixteen 16-bit code units to sixteen 32-bit code units
-        _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output), _mm256_cvtepu16_epi32(_mm256_castsi256_si128(in)));
-        _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output + 8), _mm256_cvtepu16_epi32(_mm256_extractf128_si256(in,1)));
-        utf32_output += 16;
-        buf += 16;
-    // surrogate pair(s) in a register
+      // case: we extend all sixteen 16-bit code units to sixteen 32-bit code
+      // units
+      _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output),
+                          _mm256_cvtepu16_epi32(_mm256_castsi256_si128(in)));
+      _mm256_storeu_si256(
+          reinterpret_cast<__m256i *>(utf32_output + 8),
+          _mm256_cvtepu16_epi32(_mm256_extractf128_si256(in, 1)));
+      utf32_output += 16;
+      buf += 16;
+      // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
       // It may seem wasteful to use scalar code, but being efficient with SIMD
       // in the presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if((word &0xF800 ) != 0xD800) {
+        if ((word & 0xF800) != 0xD800) {
           // No surrogate pair
           *utf32_output++ = char32_t(word);
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = big_endian ? scalar::utf16::swap_bytes(buf[k+1]) : buf[k+1];
+          uint16_t next_word =
+              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if((diff | diff2) > 0x3FF)  { return std::make_pair(nullptr, utf32_output); }
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(nullptr, utf32_output);
+          }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
           *utf32_output++ = char32_t(value);
         }
@@ -25996,64 +26719,80 @@ std::pair<const char16_t*, char32_t*> avx2_convert_utf16_to_utf32(const char16_t
   return std::make_pair(buf, utf32_output);
 }
 
-
 /*
   Returns a pair: a result struct and utf8_output.
-  If there is an error, the count field of the result is the position of the error.
-  Otherwise, it is the position of the first unprocessed byte in buf (even if finished).
-  A scalar routing should carry on the conversion of the tail if needed.
+  If there is an error, the count field of the result is the position of the
+  error. Otherwise, it is the position of the first unprocessed byte in buf
+  (even if finished). A scalar routine should carry on the conversion of the
+  tail if needed.
 */
 template <endianness big_endian>
-std::pair<result, char32_t*> avx2_convert_utf16_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) {
-  const char16_t* start = buf;
-  const char16_t* end = buf + len;
+std::pair<result, char32_t *>
+avx2_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
+                                        char32_t *utf32_output) {
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
   const __m256i v_f800 = _mm256_set1_epi16((int16_t)0xf800);
   const __m256i v_d800 = _mm256_set1_epi16((int16_t)0xd800);
 
   while (end - buf >= 16) {
-    __m256i in = _mm256_loadu_si256((__m256i*)buf);
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
     if (big_endian) {
-      const __m256i swap = _mm256_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14,
-                                  17, 16, 19, 18, 21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+      const __m256i swap = _mm256_setr_epi8(
+          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
       in = _mm256_shuffle_epi8(in, swap);
     }
 
     // 1. Check if there are any surrogate word in the input chunk.
     //    We have also deal with situation when there is a surrogate word
     //    at the end of a chunk.
-    const __m256i surrogates_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
+    const __m256i surrogates_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
 
     // bitmask = 0x0000 if there are no surrogates
     //         = 0xc000 if the last word is a surrogate
-    const uint32_t surrogates_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help. However,
-    // it is likely an uncommon occurrence.
+    const uint32_t surrogates_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
     if (surrogates_bitmask == 0x00000000) {
-      // case: we extend all sixteen 16-bit code units to sixteen 32-bit code units
-        _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output), _mm256_cvtepu16_epi32(_mm256_castsi256_si128(in)));
-        _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output + 8), _mm256_cvtepu16_epi32(_mm256_extractf128_si256(in,1)));
-        utf32_output += 16;
-        buf += 16;
-    // surrogate pair(s) in a register
+      // case: we extend all sixteen 16-bit code units to sixteen 32-bit code
+      // units
+      _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output),
+                          _mm256_cvtepu16_epi32(_mm256_castsi256_si128(in)));
+      _mm256_storeu_si256(
+          reinterpret_cast<__m256i *>(utf32_output + 8),
+          _mm256_cvtepu16_epi32(_mm256_extractf128_si256(in, 1)));
+      utf32_output += 16;
+      buf += 16;
+      // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
       // It may seem wasteful to use scalar code, but being efficient with SIMD
       // in the presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if((word &0xF800 ) != 0xD800) {
+        if ((word & 0xF800) != 0xD800) {
           // No surrogate pair
           *utf32_output++ = char32_t(word);
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = big_endian ? scalar::utf16::swap_bytes(buf[k+1]) : buf[k+1];
+          uint16_t next_word =
+              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if((diff | diff2) > 0x3FF)  { return std::make_pair(result(error_code::SURROGATE, buf - start + k - 1), utf32_output); }
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k - 1),
+                utf32_output);
+          }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
           *utf32_output++ = char32_t(value);
         }
@@ -26070,13 +26809,13 @@ std::pair<const char32_t *, char *>
 avx2_convert_utf32_to_latin1(const char32_t *buf, size_t len,
                              char *latin1_output) {
   const size_t rounded_len =
-  len & ~0x1F; // Round down to nearest multiple of 32
+      len & ~0x1F; // Round down to nearest multiple of 32
 
   __m256i high_bytes_mask = _mm256_set1_epi32(0xFFFFFF00);
 
   __m256i shufmask = _mm256_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-                                    -1, 12, 8, 4, 0, -1, -1, -1, -1, -1, -1,
-                                    -1, -1, -1, -1, -1, -1, 12, 8, 4, 0);
+                                     -1, 12, 8, 4, 0, -1, -1, -1, -1, -1, -1,
+                                     -1, -1, -1, -1, -1, -1, 12, 8, 4, 0);
 
   for (size_t i = 0; i < rounded_len; i += 16) {
     __m256i in1 = _mm256_loadu_si256((__m256i *)buf);
@@ -26088,19 +26827,18 @@ avx2_convert_utf32_to_latin1(const char32_t *buf, size_t len,
       return std::make_pair(nullptr, latin1_output);
     }
 
-    //Turn UTF32 bytes into latin 1 bytes
+    // Turn UTF32 bytes into latin 1 bytes
     __m256i shuffled1 = _mm256_shuffle_epi8(in1, shufmask);
     __m256i shuffled2 = _mm256_shuffle_epi8(in2, shufmask);
 
-    //move Latin1 bytes to their correct spot
-    __m256i idx1 = _mm256_set_epi32(-1, -1,-1,-1,-1,-1,4,0);
-    __m256i idx2 = _mm256_set_epi32(-1, -1,-1,-1,4,0,-1,-1);
+    // move Latin1 bytes to their correct spot
+    __m256i idx1 = _mm256_set_epi32(-1, -1, -1, -1, -1, -1, 4, 0);
+    __m256i idx2 = _mm256_set_epi32(-1, -1, -1, -1, 4, 0, -1, -1);
     __m256i reshuffled1 = _mm256_permutevar8x32_epi32(shuffled1, idx1);
     __m256i reshuffled2 = _mm256_permutevar8x32_epi32(shuffled2, idx2);
 
     __m256i result = _mm256_or_si256(reshuffled1, reshuffled2);
-    _mm_storeu_si128((__m128i *)latin1_output,
-                     _mm256_castsi256_si128(result));
+    _mm_storeu_si128((__m128i *)latin1_output, _mm256_castsi256_si128(result));
 
     latin1_output += 16;
     buf += 16;
@@ -26111,57 +26849,60 @@ avx2_convert_utf32_to_latin1(const char32_t *buf, size_t len,
 std::pair<result, char *>
 avx2_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
                                          char *latin1_output) {
-    const size_t rounded_len =
-        len & ~0x1F; // Round down to nearest multiple of 32
-
-    __m256i high_bytes_mask = _mm256_set1_epi32(0xFFFFFF00);
-    __m256i shufmask = _mm256_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-                                       -1, 12, 8, 4, 0, -1, -1, -1, -1, -1, -1,
-                                       -1, -1, -1, -1, -1, -1, 12, 8, 4, 0);
-
-    const char32_t *start = buf;
-
-    for (size_t i = 0; i < rounded_len; i += 16) {
-        __m256i in1 = _mm256_loadu_si256((__m256i *)buf);
-        __m256i in2 = _mm256_loadu_si256((__m256i *)(buf + 8));
-
-        __m256i check_combined = _mm256_or_si256(in1, in2);
-
-        if (!_mm256_testz_si256(check_combined, high_bytes_mask)) {
-            // Fallback to scalar code for handling errors
-            for (int k = 0; k < 8; k++) {
-                char32_t codepoint = buf[k];
-                if (codepoint <= 0xFF) {
-                    *latin1_output++ = static_cast<char>(codepoint);
-                } else {
-                    return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
-                                          latin1_output);
-                }
-            }
-            buf += 8;
-        } else {
-            __m256i shuffled1 = _mm256_shuffle_epi8(in1, shufmask);
-            __m256i shuffled2 = _mm256_shuffle_epi8(in2, shufmask);
+  const size_t rounded_len =
+      len & ~0x1F; // Round down to nearest multiple of 32
+
+  __m256i high_bytes_mask = _mm256_set1_epi32(0xFFFFFF00);
+  __m256i shufmask = _mm256_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+                                     -1, 12, 8, 4, 0, -1, -1, -1, -1, -1, -1,
+                                     -1, -1, -1, -1, -1, -1, 12, 8, 4, 0);
+
+  const char32_t *start = buf;
 
-            __m256i idx1 = _mm256_set_epi32(-1, -1, -1, -1, -1, -1, 4, 0);
-            __m256i idx2 = _mm256_set_epi32(-1, -1, -1, -1, 4, 0, -1, -1);
-            __m256i reshuffled1 = _mm256_permutevar8x32_epi32(shuffled1, idx1);
-            __m256i reshuffled2 = _mm256_permutevar8x32_epi32(shuffled2, idx2);
+  for (size_t i = 0; i < rounded_len; i += 16) {
+    __m256i in1 = _mm256_loadu_si256((__m256i *)buf);
+    __m256i in2 = _mm256_loadu_si256((__m256i *)(buf + 8));
 
-            __m256i result = _mm256_or_si256(reshuffled1, reshuffled2);
-            _mm_storeu_si128((__m128i *)latin1_output, _mm256_castsi256_si128(result));
+    __m256i check_combined = _mm256_or_si256(in1, in2);
 
-            latin1_output += 16;
-            buf += 16;
+    if (!_mm256_testz_si256(check_combined, high_bytes_mask)) {
+      // Fallback to scalar code for handling errors
+      for (int k = 0; k < 8; k++) {
+        char32_t codepoint = buf[k];
+        if (codepoint <= 0xFF) {
+          *latin1_output++ = static_cast<char>(codepoint);
+        } else {
+          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
+                                latin1_output);
         }
+      }
+      buf += 8;
+    } else {
+      __m256i shuffled1 = _mm256_shuffle_epi8(in1, shufmask);
+      __m256i shuffled2 = _mm256_shuffle_epi8(in2, shufmask);
+
+      __m256i idx1 = _mm256_set_epi32(-1, -1, -1, -1, -1, -1, 4, 0);
+      __m256i idx2 = _mm256_set_epi32(-1, -1, -1, -1, 4, 0, -1, -1);
+      __m256i reshuffled1 = _mm256_permutevar8x32_epi32(shuffled1, idx1);
+      __m256i reshuffled2 = _mm256_permutevar8x32_epi32(shuffled2, idx2);
+
+      __m256i result = _mm256_or_si256(reshuffled1, reshuffled2);
+      _mm_storeu_si128((__m128i *)latin1_output,
+                       _mm256_castsi256_si128(result));
+
+      latin1_output += 16;
+      buf += 16;
     }
+  }
 
-    return std::make_pair(result(error_code::SUCCESS, buf - start), latin1_output);
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        latin1_output);
 }
 /* end file src/haswell/avx2_convert_utf32_to_latin1.cpp */
 /* begin file src/haswell/avx2_convert_utf32_to_utf8.cpp */
-std::pair<const char32_t*, char*> avx2_convert_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) {
-  const char32_t* end = buf + len;
+std::pair<const char32_t *, char *>
+avx2_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_output) {
+  const char32_t *end = buf + len;
   const __m256i v_0000 = _mm256_setzero_si256();
   const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
   const __m256i v_ff80 = _mm256_set1_epi16((uint16_t)0xff80);
@@ -26171,36 +26912,46 @@ std::pair<const char32_t*, char*> avx2_convert_utf32_to_utf8(const char32_t* buf
   __m256i running_max = _mm256_setzero_si256();
   __m256i forbidden_bytemask = _mm256_setzero_si256();
 
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  // To avoid overruns, see issue
+  // https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin = 12;
 
   while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i*)buf);
-    __m256i nextin = _mm256_loadu_si256((__m256i*)buf+1);
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    __m256i nextin = _mm256_loadu_si256((__m256i *)buf + 1);
     running_max = _mm256_max_epu32(_mm256_max_epu32(in, running_max), nextin);
 
-    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned saturation
-    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff), _mm256_and_si256(nextin, v_7fffffff));
+    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
+    // saturation
+    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff),
+                                        _mm256_and_si256(nextin, v_7fffffff));
     in_16 = _mm256_permute4x64_epi64(in_16, 0b11011000);
 
-    // Try to apply UTF-16 => UTF-8 routine on 256 bits (haswell/avx2_convert_utf16_to_utf8.cpp)
+    // Try to apply UTF-16 => UTF-8 routine on 256 bits
+    // (haswell/avx2_convert_utf16_to_utf8.cpp)
 
-    if(_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
+    if (_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
       // 1. pack the bytes
-      const __m128i utf8_packed = _mm_packus_epi16(_mm256_castsi256_si128(in_16),_mm256_extractf128_si256(in_16,1));
+      const __m128i utf8_packed = _mm_packus_epi16(
+          _mm256_castsi256_si128(in_16), _mm256_extractf128_si256(in_16, 1));
       // 2. store (16 bytes)
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
       // 3. adjust pointers
       buf += 16;
       utf8_output += 16;
       continue; // we are done for this round!
     }
     // no bits set above 7th bit
-    const __m256i one_byte_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
 
     // no bits set above 11th bit
-    const __m256i one_or_two_bytes_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
-    const uint32_t one_or_two_bytes_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
+    const __m256i one_or_two_bytes_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
+    const uint32_t one_or_two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
     if (one_or_two_bytes_bitmask == 0xffffffff) {
       // 1. prepare 2-byte values
       // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
@@ -26220,25 +26971,32 @@ std::pair<const char32_t*, char*> avx2_convert_utf32_to_utf8(const char32_t* buf
       const __m256i t4 = _mm256_or_si256(t3, v_c080);
 
       // 2. merge ASCII and 2-byte codewords
-      const __m256i utf8_unpacked = _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
+      const __m256i utf8_unpacked =
+          _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
 
       // 3. prepare bitmask for 8-bit lookup
       const uint32_t M0 = one_byte_bitmask & 0x55555555;
       const uint32_t M1 = M0 >> 7;
-      const uint32_t M2 = (M1 | M0)  & 0x00ff00ff;
+      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
       // 4. pack the bytes
 
-      const uint8_t* row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
-      const uint8_t* row_2 = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2>>16)][0];
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
+      const uint8_t *row_2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
+                                                                       16)][0];
 
-      const __m128i shuffle = _mm_loadu_si128((__m128i*)(row + 1));
-      const __m128i shuffle_2 = _mm_loadu_si128((__m128i*)(row_2 + 1));
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
 
-      const __m256i utf8_packed = _mm256_shuffle_epi8(utf8_unpacked, _mm256_setr_m128i(shuffle,shuffle_2));
+      const __m256i utf8_packed = _mm256_shuffle_epi8(
+          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
       // 5. store bytes
-      _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_packed));
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_castsi256_si128(utf8_packed));
       utf8_output += row[0];
-      _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_packed,1));
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_extractf128_si256(utf8_packed, 1));
       utf8_output += row_2[0];
 
       // 6. adjust pointers
@@ -26246,22 +27004,28 @@ std::pair<const char32_t*, char*> avx2_convert_utf32_to_utf8(const char32_t* buf
       continue;
     }
     // Must check for overflow in packing
-    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(
+        _mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
     if (saturation_bitmask == 0xffffffff) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
       const __m256i v_d800 = _mm256_set1_epi16((uint16_t)0xd800);
-      forbidden_bytemask = _mm256_or_si256(forbidden_bytemask, _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800));
+      forbidden_bytemask = _mm256_or_si256(
+          forbidden_bytemask,
+          _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800));
 
-      const __m256i dup_even = _mm256_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
-                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
-                                              0x0000, 0x0202, 0x0404, 0x0606,
-                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+      const __m256i dup_even = _mm256_setr_epi16(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
       /* In this branch we handle three cases:
-        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
+        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]
+           - single UTF-8 byte
+        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]
+           - two UTF-8 bytes
+        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc]
+           - three UTF-8 bytes
 
         We expand the input word (16-bit) into two code units (32-bit), thus
         we have room for four bytes. However, we need five distinct bit
@@ -26288,7 +27052,7 @@ std::pair<const char32_t*, char*> avx2_convert_utf32_to_utf8(const char32_t* buf
       // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
       const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
       // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m256i t2 = _mm256_or_si256 (t1, simdutf_vec(0b1000000000000000));
+      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
 
       // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
       const __m256i s0 = _mm256_srli_epi16(in_16, 4);
@@ -26298,7 +27062,8 @@ std::pair<const char32_t*, char*> avx2_convert_utf32_to_utf8(const char32_t* buf
       const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
       // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
       const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
-      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask, simdutf_vec(0b0100000000000000));
+      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
+                                             simdutf_vec(0b0100000000000000));
       const __m256i s4 = _mm256_xor_si256(s3, m0);
 #undef simdutf_vec
 
@@ -26309,78 +27074,93 @@ std::pair<const char32_t*, char*> avx2_convert_utf32_to_utf8(const char32_t* buf
       // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
       const uint32_t mask = (one_byte_bitmask & 0x55555555) |
                             (one_or_two_bytes_bitmask & 0xaaaaaaaa);
-      // Due to the wider registers, the following path is less likely to be useful.
+      // Due to the wider registers, the following path is less likely to be
+      // useful.
       /*if(mask == 0) {
         // We only have three-byte code units. Use fast path.
-        const __m256i shuffle = _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1, 2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
-        const __m256i utf8_0 = _mm256_shuffle_epi8(out0, shuffle);
-        const __m256i utf8_1 = _mm256_shuffle_epi8(out1, shuffle);
+        const __m256i shuffle = _mm256_setr_epi8(
+            2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
+            2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
+        const __m256i utf8_0 = _mm256_shuffle_epi8(out0, shuffle);
+        const __m256i utf8_1 = _mm256_shuffle_epi8(out1, shuffle);
         _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
         utf8_output += 12;
         _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_0,1));
-        utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_1,1));
-        utf8_output += 12;
-        buf += 16;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_0,1));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_1,1));
+        utf8_output += 12; buf += 16;
         continue;
       }*/
       const uint8_t mask0 = uint8_t(mask);
-      const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const __m128i shuffle0 = _mm_loadu_si128((__m128i*)(row0 + 1));
-      const __m128i utf8_0 = _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
 
       const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-      const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const __m128i shuffle1 = _mm_loadu_si128((__m128i*)(row1 + 1));
-      const __m128i utf8_1 = _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
 
       const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
-      const uint8_t* row2 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
-      const __m128i shuffle2 = _mm_loadu_si128((__m128i*)(row2 + 1));
-      const __m128i utf8_2 = _mm_shuffle_epi8(_mm256_extractf128_si256(out0,1), shuffle2);
-
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
+      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
+      const __m128i utf8_2 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
 
       const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
-      const uint8_t* row3 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
-      const __m128i shuffle3 = _mm_loadu_si128((__m128i*)(row3 + 1));
-      const __m128i utf8_3 = _mm_shuffle_epi8(_mm256_extractf128_si256(out1,1), shuffle3);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
+      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
+      const __m128i utf8_3 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
 
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
       utf8_output += row0[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
       utf8_output += row1[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_2);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
       utf8_output += row2[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_3);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
       utf8_output += row3[0];
       buf += 16;
     } else {
-      // case: at least one 32-bit word is larger than 0xFFFF <=> it will produce four UTF-8 bytes.
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // may require large, non-trivial tables?
+      // case: at least one 32-bit word is larger than 0xFFFF, so it needs
+      // four UTF-8 bytes. Fall back to scalar code. This may seem wasteful,
+      // but an efficient SIMD version may require large, non-trivial
+      // tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFFFF80)==0) {  // 1-byte (ASCII)
+        if ((word & 0xFFFFFF80) == 0) { // 1-byte (ASCII)
           *utf8_output++ = char(word);
-        } else if((word & 0xFFFFF800)==0) { // 2-byte
-          *utf8_output++ = char((word>>6) | 0b11000000);
+        } else if ((word & 0xFFFFF800) == 0) { // 2-byte
+          *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word & 0xFFFF0000 )==0) {  // 3-byte
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(nullptr, utf8_output); }
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) { // 3-byte
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr, utf8_output);
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else {  // 4-byte
-          if (word > 0x10FFFF) { return std::make_pair(nullptr, utf8_output); }
-          *utf8_output++ = char((word>>18) | 0b11110000);
-          *utf8_output++ = char(((word>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else { // 4-byte
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr, utf8_output);
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         }
       }
@@ -26390,19 +27170,23 @@ std::pair<const char32_t*, char*> avx2_convert_utf32_to_utf8(const char32_t* buf
 
   // check for invalid input
   const __m256i v_10ffff = _mm256_set1_epi32((uint32_t)0x10ffff);
-  if(static_cast<uint32_t>(_mm256_movemask_epi8(_mm256_cmpeq_epi32(_mm256_max_epu32(running_max, v_10ffff), v_10ffff))) != 0xffffffff) {
+  if (static_cast<uint32_t>(_mm256_movemask_epi8(_mm256_cmpeq_epi32(
+          _mm256_max_epu32(running_max, v_10ffff), v_10ffff))) != 0xffffffff) {
     return std::make_pair(nullptr, utf8_output);
   }
 
-  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) { return std::make_pair(nullptr, utf8_output); }
+  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) {
+    return std::make_pair(nullptr, utf8_output);
+  }
 
   return std::make_pair(buf, utf8_output);
 }
 
-
-std::pair<result, char*> avx2_convert_utf32_to_utf8_with_errors(const char32_t* buf, size_t len, char* utf8_output) {
-  const char32_t* end = buf + len;
-  const char32_t* start = buf;
+std::pair<result, char *>
+avx2_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
+                                       char *utf8_output) {
+  const char32_t *end = buf + len;
+  const char32_t *start = buf;
 
   const __m256i v_0000 = _mm256_setzero_si256();
   const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
@@ -26412,40 +27196,53 @@ std::pair<result, char*> avx2_convert_utf32_to_utf8_with_errors(const char32_t*
   const __m256i v_7fffffff = _mm256_set1_epi32((uint32_t)0x7fffffff);
   const __m256i v_10ffff = _mm256_set1_epi32((uint32_t)0x10ffff);
 
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  // to avoid overruns, see issue
+  // https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin = 12;
 
   while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i*)buf);
-    __m256i nextin = _mm256_loadu_si256((__m256i*)buf+1);
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    __m256i nextin = _mm256_loadu_si256((__m256i *)buf + 1);
     // Check for too large input
-    const __m256i max_input = _mm256_max_epu32(_mm256_max_epu32(in, nextin), v_10ffff);
-    if(static_cast<uint32_t>(_mm256_movemask_epi8(_mm256_cmpeq_epi32(max_input, v_10ffff))) != 0xffffffff) {
-      return std::make_pair(result(error_code::TOO_LARGE, buf - start), utf8_output);
+    const __m256i max_input =
+        _mm256_max_epu32(_mm256_max_epu32(in, nextin), v_10ffff);
+    if (static_cast<uint32_t>(_mm256_movemask_epi8(
+            _mm256_cmpeq_epi32(max_input, v_10ffff))) != 0xffffffff) {
+      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
+                            utf8_output);
     }
 
-    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned saturation
-    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff), _mm256_and_si256(nextin, v_7fffffff));
+    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
+    // saturation
+    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff),
+                                        _mm256_and_si256(nextin, v_7fffffff));
     in_16 = _mm256_permute4x64_epi64(in_16, 0b11011000);
 
-    // Try to apply UTF-16 => UTF-8 routine on 256 bits (haswell/avx2_convert_utf16_to_utf8.cpp)
+    // Try to apply UTF-16 => UTF-8 routine on 256 bits
+    // (haswell/avx2_convert_utf16_to_utf8.cpp)
 
-    if(_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
+    if (_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
       // 1. pack the bytes
-      const __m128i utf8_packed = _mm_packus_epi16(_mm256_castsi256_si128(in_16),_mm256_extractf128_si256(in_16,1));
+      const __m128i utf8_packed = _mm_packus_epi16(
+          _mm256_castsi256_si128(in_16), _mm256_extractf128_si256(in_16, 1));
       // 2. store (16 bytes)
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
       // 3. adjust pointers
       buf += 16;
       utf8_output += 16;
       continue; // we are done for this round!
     }
     // no bits set above 7th bit
-    const __m256i one_byte_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
 
     // no bits set above 11th bit
-    const __m256i one_or_two_bytes_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
-    const uint32_t one_or_two_bytes_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
+    const __m256i one_or_two_bytes_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
+    const uint32_t one_or_two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
     if (one_or_two_bytes_bitmask == 0xffffffff) {
       // 1. prepare 2-byte values
       // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
@@ -26465,25 +27262,32 @@ std::pair<result, char*> avx2_convert_utf32_to_utf8_with_errors(const char32_t*
       const __m256i t4 = _mm256_or_si256(t3, v_c080);
 
       // 2. merge ASCII and 2-byte codewords
-      const __m256i utf8_unpacked = _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
+      const __m256i utf8_unpacked =
+          _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
 
       // 3. prepare bitmask for 8-bit lookup
       const uint32_t M0 = one_byte_bitmask & 0x55555555;
       const uint32_t M1 = M0 >> 7;
-      const uint32_t M2 = (M1 | M0)  & 0x00ff00ff;
+      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
       // 4. pack the bytes
 
-      const uint8_t* row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
-      const uint8_t* row_2 = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2>>16)][0];
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
+      const uint8_t *row_2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+              [uint8_t(M2 >> 16)][0];
 
-      const __m128i shuffle = _mm_loadu_si128((__m128i*)(row + 1));
-      const __m128i shuffle_2 = _mm_loadu_si128((__m128i*)(row_2 + 1));
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
 
-      const __m256i utf8_packed = _mm256_shuffle_epi8(utf8_unpacked, _mm256_setr_m128i(shuffle,shuffle_2));
+      const __m256i utf8_packed = _mm256_shuffle_epi8(
+          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
       // 5. store bytes
-      _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_packed));
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_castsi256_si128(utf8_packed));
       utf8_output += row[0];
-      _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_packed,1));
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_extractf128_si256(utf8_packed, 1));
       utf8_output += row_2[0];
 
       // 6. adjust pointers
@@ -26491,27 +27295,34 @@ std::pair<result, char*> avx2_convert_utf32_to_utf8_with_errors(const char32_t*
       continue;
     }
     // Must check for overflow in packing
-    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(
+        _mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
     if (saturation_bitmask == 0xffffffff) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
 
       // Check for illegal surrogate code units
       const __m256i v_d800 = _mm256_set1_epi16((uint16_t)0xd800);
-      const __m256i forbidden_bytemask = _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800);
-      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0x0) {
-        return std::make_pair(result(error_code::SURROGATE, buf - start), utf8_output);
+      const __m256i forbidden_bytemask =
+          _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800);
+      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) !=
+          0x0) {
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              utf8_output);
       }
 
-      const __m256i dup_even = _mm256_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
-                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
-                                              0x0000, 0x0202, 0x0404, 0x0606,
-                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+      const __m256i dup_even = _mm256_setr_epi16(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
       /* In this branch we handle three cases:
-        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
+        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]
+           - single UTF-8 byte
+        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]
+           - two UTF-8 bytes
+        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc]
+           - three UTF-8 bytes
 
         We expand the input word (16-bit) into two code units (32-bit), thus
         we have room for four bytes. However, we need five distinct bit
@@ -26538,7 +27349,7 @@ std::pair<result, char*> avx2_convert_utf32_to_utf8_with_errors(const char32_t*
       // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
       const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
       // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m256i t2 = _mm256_or_si256 (t1, simdutf_vec(0b1000000000000000));
+      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
 
       // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
       const __m256i s0 = _mm256_srli_epi16(in_16, 4);
@@ -26548,7 +27359,8 @@ std::pair<result, char*> avx2_convert_utf32_to_utf8_with_errors(const char32_t*
       const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
       // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
       const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
-      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask, simdutf_vec(0b0100000000000000));
+      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
+                                             simdutf_vec(0b0100000000000000));
       const __m256i s4 = _mm256_xor_si256(s3, m0);
 #undef simdutf_vec
 
@@ -26559,78 +27371,95 @@ std::pair<result, char*> avx2_convert_utf32_to_utf8_with_errors(const char32_t*
       // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
       const uint32_t mask = (one_byte_bitmask & 0x55555555) |
                             (one_or_two_bytes_bitmask & 0xaaaaaaaa);
-      // Due to the wider registers, the following path is less likely to be useful.
+      // Due to the wider registers, the following path is less likely to be
+      // useful.
       /*if(mask == 0) {
         // We only have three-byte code units. Use fast path.
-        const __m256i shuffle = _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1, 2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
-        const __m256i utf8_0 = _mm256_shuffle_epi8(out0, shuffle);
-        const __m256i utf8_1 = _mm256_shuffle_epi8(out1, shuffle);
+        const __m256i shuffle = _mm256_setr_epi8(
+            2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
+            2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
+        const __m256i utf8_0 = _mm256_shuffle_epi8(out0, shuffle);
+        const __m256i utf8_1 = _mm256_shuffle_epi8(out1, shuffle);
         _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
         utf8_output += 12;
         _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_0,1));
-        utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_1,1));
-        utf8_output += 12;
-        buf += 16;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_0,1));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_extractf128_si256(utf8_1,1));
+        utf8_output += 12; buf += 16;
         continue;
       }*/
       const uint8_t mask0 = uint8_t(mask);
-      const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const __m128i shuffle0 = _mm_loadu_si128((__m128i*)(row0 + 1));
-      const __m128i utf8_0 = _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
 
       const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-      const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const __m128i shuffle1 = _mm_loadu_si128((__m128i*)(row1 + 1));
-      const __m128i utf8_1 = _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
 
       const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
-      const uint8_t* row2 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
-      const __m128i shuffle2 = _mm_loadu_si128((__m128i*)(row2 + 1));
-      const __m128i utf8_2 = _mm_shuffle_epi8(_mm256_extractf128_si256(out0,1), shuffle2);
-
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
+      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
+      const __m128i utf8_2 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
 
       const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
-      const uint8_t* row3 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
-      const __m128i shuffle3 = _mm_loadu_si128((__m128i*)(row3 + 1));
-      const __m128i utf8_3 = _mm_shuffle_epi8(_mm256_extractf128_si256(out1,1), shuffle3);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
+      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
+      const __m128i utf8_3 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
 
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
       utf8_output += row0[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
       utf8_output += row1[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_2);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
       utf8_output += row2[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_3);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
       utf8_output += row3[0];
       buf += 16;
     } else {
-      // case: at least one 32-bit word is larger than 0xFFFF <=> it will produce four UTF-8 bytes.
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // may require large, non-trivial tables?
+      // case: at least one 32-bit word is larger than 0xFFFF, so it needs
+      // four UTF-8 bytes. Fall back to scalar code. This may seem wasteful,
+      // but an efficient SIMD version may require large, non-trivial
+      // tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFFFF80)==0) {  // 1-byte (ASCII)
+        if ((word & 0xFFFFFF80) == 0) { // 1-byte (ASCII)
           *utf8_output++ = char(word);
-        } else if((word & 0xFFFFF800)==0) { // 2-byte
-          *utf8_output++ = char((word>>6) | 0b11000000);
+        } else if ((word & 0xFFFFF800) == 0) { // 2-byte
+          *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word & 0xFFFF0000 )==0) {  // 3-byte
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(result(error_code::SURROGATE, buf - start + k), utf8_output); }
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) { // 3-byte
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k), utf8_output);
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else {  // 4-byte
-          if (word > 0x10FFFF) { return std::make_pair(result(error_code::TOO_LARGE, buf - start + k), utf8_output); }
-          *utf8_output++ = char((word>>18) | 0b11110000);
-          *utf8_output++ = char(((word>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else { // 4-byte
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k), utf8_output);
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         }
       }
@@ -26643,55 +27472,75 @@ std::pair<result, char*> avx2_convert_utf32_to_utf8_with_errors(const char32_t*
 /* end file src/haswell/avx2_convert_utf32_to_utf8.cpp */
 /* begin file src/haswell/avx2_convert_utf32_to_utf16.cpp */
 template <endianness big_endian>
-std::pair<const char32_t*, char16_t*> avx2_convert_utf32_to_utf16(const char32_t* buf, size_t len, char16_t* utf16_output) {
-  const char32_t* end = buf + len;
+std::pair<const char32_t *, char16_t *>
+avx2_convert_utf32_to_utf16(const char32_t *buf, size_t len,
+                            char16_t *utf16_output) {
+  const char32_t *end = buf + len;
 
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  // to avoid overruns, see issue
+  // https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin = 12;
   __m256i forbidden_bytemask = _mm256_setzero_si256();
 
-
   while (end - buf >= std::ptrdiff_t(8 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i*)buf);
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
 
     const __m256i v_00000000 = _mm256_setzero_si256();
     const __m256i v_ffff0000 = _mm256_set1_epi32((int32_t)0xffff0000);
 
     // no bits set above 16th bit <=> can pack to UTF16 without surrogate pairs
-    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
-    const uint32_t saturation_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+    const __m256i saturation_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
 
     if (saturation_bitmask == 0xffffffff) {
       const __m256i v_f800 = _mm256_set1_epi32((uint32_t)0xf800);
       const __m256i v_d800 = _mm256_set1_epi32((uint32_t)0xd800);
-      forbidden_bytemask = _mm256_or_si256(forbidden_bytemask, _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800));
+      forbidden_bytemask = _mm256_or_si256(
+          forbidden_bytemask,
+          _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800));
 
-      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),_mm256_extractf128_si256(in,1));
+      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),
+                                              _mm256_extractf128_si256(in, 1));
       if (big_endian) {
-        const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
         utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
       }
-      _mm_storeu_si128((__m128i*)utf16_output, utf16_packed);
+      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
       utf16_output += 8;
       buf += 8;
     } else {
       size_t forward = 7;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFF0000)==0) {
+        if ((word & 0xFFFF0000) == 0) {
           // will not generate a surrogate pair
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(nullptr, utf16_output); }
-          *utf16_output++ = big_endian ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8)) : char16_t(word);
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr, utf16_output);
+          }
+          *utf16_output++ =
+              big_endian
+                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
+                  : char16_t(word);
         } else {
           // will generate a surrogate pair
-          if (word > 0x10FFFF) { return std::make_pair(nullptr, utf16_output); }
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr, utf16_output);
+          }
           word -= 0x10000;
           uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
           uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
           if (big_endian) {
-            high_surrogate = uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
-            low_surrogate = uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
+            high_surrogate =
+                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
+            low_surrogate =
+                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
           }
           *utf16_output++ = char16_t(high_surrogate);
           *utf16_output++ = char16_t(low_surrogate);
@@ -26702,64 +27551,89 @@ std::pair<const char32_t*, char16_t*> avx2_convert_utf32_to_utf16(const char32_t
   }
 
   // check for invalid input
-  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) { return std::make_pair(nullptr, utf16_output); }
+  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) {
+    return std::make_pair(nullptr, utf16_output);
+  }
 
   return std::make_pair(buf, utf16_output);
 }
 
-
 template <endianness big_endian>
-std::pair<result, char16_t*> avx2_convert_utf32_to_utf16_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) {
-  const char32_t* start = buf;
-  const char32_t* end = buf + len;
+std::pair<result, char16_t *>
+avx2_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
+                                        char16_t *utf16_output) {
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
 
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
   while (end - buf >= std::ptrdiff_t(8 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i*)buf);
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
 
     const __m256i v_00000000 = _mm256_setzero_si256();
     const __m256i v_ffff0000 = _mm256_set1_epi32((int32_t)0xffff0000);
 
     // no bits set above 16th bit <=> can pack to UTF16 without surrogate pairs
-    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
-    const uint32_t saturation_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+    const __m256i saturation_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
 
     if (saturation_bitmask == 0xffffffff) {
       const __m256i v_f800 = _mm256_set1_epi32((uint32_t)0xf800);
       const __m256i v_d800 = _mm256_set1_epi32((uint32_t)0xd800);
-      const __m256i forbidden_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800);
-      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0x0) {
-        return std::make_pair(result(error_code::SURROGATE, buf - start), utf16_output);
+      const __m256i forbidden_bytemask =
+          _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800);
+      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) !=
+          0x0) {
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              utf16_output);
       }
 
-      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),_mm256_extractf128_si256(in,1));
+      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),
+                                              _mm256_extractf128_si256(in, 1));
       if (big_endian) {
-        const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
         utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
       }
-      _mm_storeu_si128((__m128i*)utf16_output, utf16_packed);
+      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
       utf16_output += 8;
       buf += 8;
     } else {
       size_t forward = 7;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFF0000)==0) {
+        if ((word & 0xFFFF0000) == 0) {
           // will not generate a surrogate pair
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(result(error_code::SURROGATE, buf - start + k), utf16_output); }
-          *utf16_output++ = big_endian ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8)) : char16_t(word);
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k), utf16_output);
+          }
+          *utf16_output++ =
+              big_endian
+                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
+                  : char16_t(word);
         } else {
           // will generate a surrogate pair
-          if (word > 0x10FFFF) { return std::make_pair(result(error_code::TOO_LARGE, buf - start + k), utf16_output); }
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k), utf16_output);
+          }
           word -= 0x10000;
           uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
           uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
           if (big_endian) {
-            high_surrogate = uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
-            low_surrogate = uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
+            high_surrogate =
+                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
+            low_surrogate =
+                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
           }
           *utf16_output++ = char16_t(high_surrogate);
           *utf16_output++ = char16_t(low_surrogate);
@@ -26781,27 +27655,29 @@ std::pair<result, char16_t*> avx2_convert_utf32_to_utf16_with_errors(const char3
 // are accessed.
 // It returns how many bytes were consumed (up to 12).
 size_t convert_masked_utf8_to_latin1(const char *input,
-                           uint64_t utf8_end_of_code_point_mask,
-                           char *&latin1_output) {
+                                     uint64_t utf8_end_of_code_point_mask,
+                                     char *&latin1_output) {
   // we use an approach where we try to process up to 12 input bytes.
   // Why 12 input bytes and not 16? Because we are concerned with the size of
   // the lookup tables. Also 12 is nicely divisible by two and three.
   //
   //
-  // Optimization note: our main path below is load-latency dependent. Thus it is maybe
-  // beneficial to have fast paths that depend on branch prediction but have less latency.
-  // This results in more instructions but, potentially, also higher speeds.
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // may be beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
   //
   const __m128i in = _mm_loadu_si128((__m128i *)input);
 
   const uint16_t input_utf8_end_of_code_point_mask =
-      utf8_end_of_code_point_mask & 0xfff; // we are only processing 12 bytes in case it is not all ASCII
+      utf8_end_of_code_point_mask &
+      0xfff; // we are only processing 12 bytes in case it is not all ASCII
 
-  if(utf8_end_of_code_point_mask == 0xfff) {
+  if (utf8_end_of_code_point_mask == 0xfff) {
     // We process the data in chunks of 12 bytes.
     _mm_storeu_si128(reinterpret_cast<__m128i *>(latin1_output), in);
     latin1_output += 12; // We wrote 12 characters.
-    return 12; // We consumed 1 bytes.
+    return 12;           // We consumed 12 bytes.
   }
   /// We do not have a fast path available, so we fallback.
   const uint8_t idx =
@@ -26809,22 +27685,25 @@ size_t convert_masked_utf8_to_latin1(const char *input,
   const uint8_t consumed =
       tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
   // this indicates an invalid input:
-  if(idx >= 64) { return consumed; }
-  // Here we should have (idx < 64), if not, there is a bug in the validation or elsewhere.
-  // SIX (6) input code-code units
-  // this is a relatively easy scenario
-  // we process SIX (6) input code-code units. The max length in bytes of six code
-  // code units spanning between 1 and 2 bytes each is 12 bytes. On processors
-  // where pdep/pext is fast, we might be able to use a small lookup table.
+  if (idx >= 64) {
+    return consumed;
+  }
+  // Here we should have (idx < 64). If not, there is a bug in the validation
+  // or elsewhere.
+  // This is a relatively easy scenario: we process SIX (6) input code units.
+  // The maximum length in bytes of six code units spanning between 1 and 2
+  // bytes each is 12 bytes. On processors where pdep/pext is fast, we might be
+  // able to use a small lookup table.
   const __m128i sh =
-        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
+      _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
   const __m128i perm = _mm_shuffle_epi8(in, sh);
   const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
   const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
   __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-  const __m128i latin1_packed = _mm_packus_epi16(composed,composed);
+  const __m128i latin1_packed = _mm_packus_epi16(composed, composed);
   // writing 8 bytes even though we only care about the first 6 bytes.
-  // performance note: it would be faster to use _mm_storeu_si128, we should investigate.
+  // performance note: it would be faster to use _mm_storeu_si128, we should
+  // investigate.
   _mm_storel_epi64((__m128i *)latin1_output, latin1_packed);
   latin1_output += 6; // We wrote 6 bytes.
   return consumed;
@@ -26889,7 +27768,8 @@ simdutf_really_inline __m256i lookup_pshufb_improved(const __m256i input) {
 }
 
 template <bool isbase64url>
-size_t encode_base64(char *dst, const char *src, size_t srclen, base64_options options) {
+size_t encode_base64(char *dst, const char *src, size_t srclen,
+                     base64_options options) {
   // credit: Wojciech Muła
   const uint8_t *input = (const uint8_t *)src;
 
@@ -27088,8 +27968,9 @@ static inline uint32_t to_base64_mask(__m256i *src, bool *error) {
 
   if (base64_url) {
     check_asso =
-        _mm256_setr_epi8(0xD,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x3,0x7,0xB,0xE,0xB,0x6,
-        0xD,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x3,0x7,0xB,0xE,0xB,0x6);
+        _mm256_setr_epi8(0xD, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x3,
+                         0x7, 0xB, 0xE, 0xB, 0x6, 0xD, 0x1, 0x1, 0x1, 0x1, 0x1,
+                         0x1, 0x1, 0x1, 0x1, 0x3, 0x7, 0xB, 0xE, 0xB, 0x6);
   } else {
 
     check_asso = _mm256_setr_epi8(
@@ -27100,8 +27981,13 @@ static inline uint32_t to_base64_mask(__m256i *src, bool *error) {
   __m256i check_values;
   if (base64_url) {
     check_values = _mm256_setr_epi8(
-        uint8_t(0x80),uint8_t(0x80),uint8_t(0x80),uint8_t(0x80),uint8_t(0xCF),uint8_t(0xBF),uint8_t(0xB6),uint8_t(0xA6),uint8_t(0xB5),uint8_t(0xA1),0x0,uint8_t(0x80),0x0,uint8_t(0x80),0x0,uint8_t(0x80),
-        uint8_t(0x80),uint8_t(0x80),uint8_t(0x80),uint8_t(0x80),uint8_t(0xCF),uint8_t(0xBF),uint8_t(0xB6),uint8_t(0xA6),uint8_t(0xB5),uint8_t(0xA1),0x0,uint8_t(0x80),0x0,uint8_t(0x80),0x0,uint8_t(0x80));
+        uint8_t(0x80), uint8_t(0x80), uint8_t(0x80), uint8_t(0x80),
+        uint8_t(0xCF), uint8_t(0xBF), uint8_t(0xB6), uint8_t(0xA6),
+        uint8_t(0xB5), uint8_t(0xA1), 0x0, uint8_t(0x80), 0x0, uint8_t(0x80),
+        0x0, uint8_t(0x80), uint8_t(0x80), uint8_t(0x80), uint8_t(0x80),
+        uint8_t(0x80), uint8_t(0xCF), uint8_t(0xBF), uint8_t(0xB6),
+        uint8_t(0xA6), uint8_t(0xB5), uint8_t(0xA1), 0x0, uint8_t(0x80), 0x0,
+        uint8_t(0x80), 0x0, uint8_t(0x80));
   } else {
     check_values = _mm256_setr_epi8(
         int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0xCF),
@@ -27112,7 +27998,7 @@ static inline uint32_t to_base64_mask(__m256i *src, bool *error) {
         int8_t(0x86), int8_t(0xD1), int8_t(0x80), int8_t(0xB1), int8_t(0x80),
         int8_t(0x91), int8_t(0x80));
   }
-  const __m256i shifted =_mm256_srli_epi32(*src, 3);
+  const __m256i shifted = _mm256_srli_epi32(*src, 3);
   const __m256i delta_hash =
       _mm256_avg_epu8(_mm256_shuffle_epi8(delta_asso, *src), shifted);
   const __m256i check_hash =
@@ -27152,16 +28038,16 @@ static inline uint64_t compress_block(block64 *b, uint64_t mask, char *output) {
   return _mm_popcnt_u64(nmask);
 }
 
-// The caller of this function is responsible to ensure that there are 64 bytes available
-// from reading at src. The data is read into a block64 structure.
+// The caller of this function is responsible for ensuring that at least 64
+// bytes can be read starting at src. The data is read into a block64 structure.
 static inline void load_block(block64 *b, const char *src) {
   b->chunks[0] = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src));
   b->chunks[1] =
       _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src + 32));
 }
 
-// The caller of this function is responsible to ensure that there are 128 bytes available
-// from reading at src. The data is read into a block64 structure.
+// The caller of this function is responsible for ensuring that at least 128
+// bytes can be read starting at src. The data is read into a block64 structure.
 static inline void load_block(block64 *b, const char16_t *src) {
   __m256i m1 = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src));
   __m256i m2 = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src + 16));
@@ -27216,12 +28102,15 @@ static inline void base64_decode_block_safe(char *out, block64 *b) {
 
 template <bool base64_url, typename chartype>
 result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
-                              base64_options options) {
+                              base64_options options,
+                              last_chunk_handling_options last_chunk_options) {
   const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
                                         : tables::base64::to_base64_value;
-  size_t equallocation = srclen; // location of the first padding character if any
+  size_t equallocation =
+      srclen; // location of the first padding character if any
   // skip trailing spaces
-  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) && to_base64[uint8_t(src[srclen - 1])] == 64) {
+  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+         to_base64[uint8_t(src[srclen - 1])] == 64) {
     srclen--;
   }
   size_t equalsigns = 0;
@@ -27230,7 +28119,8 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
     srclen--;
     equalsigns = 1;
     // skip trailing spaces
-    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) && to_base64[uint8_t(src[srclen - 1])] == 64) {
+    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+           to_base64[uint8_t(src[srclen - 1])] == 64) {
       srclen--;
     }
     if (srclen > 0 && src[srclen - 1] == '=') {
@@ -27239,6 +28129,12 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       equalsigns = 2;
     }
   }
+  if (srclen == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
+    }
+    return {SUCCESS, 0};
+  }
   char *end_of_safe_64byte_zone =
       (srclen + 3) / 4 * 3 >= 63 ? dst + (srclen + 3) / 4 * 3 - 63 : dst;
 
@@ -27260,7 +28156,8 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
       if (error) {
         src -= 64;
-        while (src < srcend && scalar::base64::is_eight_byte(*src) && to_base64[uint8_t(*src)] <= 64) {
+        while (src < srcend && scalar::base64::is_eight_byte(*src) &&
+               to_base64[uint8_t(*src)] <= 64) {
           src++;
         }
         return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
@@ -27350,69 +28247,38 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       buffer_start += 4;
     }
     // we may have 1, 2 or 3 bytes left and we need to decode them so let us
-    // bring in src content
+    // backtrack
     int leftover = int(bufferptr - buffer_start);
-    if (leftover > 0) {
-      while (leftover < 4 && src < srcend) {
-        uint8_t val = to_base64[uint8_t(*src)];
-        if (!scalar::base64::is_eight_byte(*src) || val > 64) {
-          return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
-        }
-        buffer_start[leftover] = char(val);
-        leftover += (val <= 63);
-        src++;
-      }
-
-      if (leftover == 1) {
-        return {BASE64_INPUT_REMAINDER, size_t(dst - dstinit)};
-      }
-      if (leftover == 2) {
-        uint32_t triple = (uint32_t(buffer_start[0]) << 3 * 6) +
-                          (uint32_t(buffer_start[1]) << 2 * 6);
-        triple = scalar::utf32::swap_bytes(triple);
-        triple >>= 8;
-        std::memcpy(dst, &triple, 1);
-        dst += 1;
-      } else if (leftover == 3) {
-        uint32_t triple = (uint32_t(buffer_start[0]) << 3 * 6) +
-                          (uint32_t(buffer_start[1]) << 2 * 6) +
-                          (uint32_t(buffer_start[2]) << 1 * 6);
-        triple = scalar::utf32::swap_bytes(triple);
-        triple >>= 8;
-        std::memcpy(dst, &triple, 2);
-        dst += 2;
-      } else {
-        uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
-                           (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
-                           (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
-                           (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
-                          << 8;
-        triple = scalar::utf32::swap_bytes(triple);
-        std::memcpy(dst, &triple, 3);
-        dst += 3;
+    while (leftover > 0) {
+      while (to_base64[uint8_t(*(src - 1))] == 64) {
+        src--;
       }
+      src--;
+      leftover--;
     }
   }
   if (src < srcend + equalsigns) {
-    result r =
-        scalar::base64::base64_tail_decode(dst, src, srcend - src, options);
+    result r = scalar::base64::base64_tail_decode(
+        dst, src, srcend - src, equalsigns, options, last_chunk_options);
     if (r.error == error_code::INVALID_BASE64_CHARACTER) {
       r.count += size_t(src - srcinit);
       return r;
     } else {
       r.count += size_t(dst - dstinit);
     }
-    if(r.error == error_code::SUCCESS && equalsigns > 0) {
+    if (last_chunk_options != stop_before_partial &&
+        r.error == error_code::SUCCESS && equalsigns > 0) {
       // additional checks
-      if((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+      if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
         r.error = error_code::INVALID_BASE64_CHARACTER;
         r.count = equallocation;
       }
     }
     return r;
   }
-  if(equalsigns > 0) {
-    if((size_t(dst - dstinit) % 3 == 0) || ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
+  if (equalsigns > 0) {
+    if ((size_t(dst - dstinit) % 3 == 0) ||
+        ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
   }
@@ -27429,9 +28295,9 @@ namespace simdutf {
 namespace haswell {
 namespace {
 
-// Walks through a buffer in block-sized increments, loading the last part with spaces
-template<size_t STEP_SIZE>
-struct buf_block_reader {
+// Walks through a buffer in block-sized increments, loading the last part with
+// spaces
+template <size_t STEP_SIZE> struct buf_block_reader {
 public:
   simdutf_really_inline buf_block_reader(const uint8_t *_buf, size_t _len);
   simdutf_really_inline size_t block_index();
@@ -27440,14 +28306,16 @@ struct buf_block_reader {
   /**
    * Get the last block, padded with spaces.
    *
-   * There will always be a last block, with at least 1 byte, unless len == 0 (in which case this
-   * function fills the buffer with spaces and returns 0. In particular, if len == STEP_SIZE there
-   * will be 0 full_blocks and 1 remainder block with STEP_SIZE bytes and no spaces for padding.
+   * There will always be a last block, with at least 1 byte, unless len == 0
+   * (in which case this function fills the buffer with spaces and returns 0).
+   * In particular, if len == STEP_SIZE there will be 0 full_blocks and 1
+   * remainder block with STEP_SIZE bytes and no spaces for padding.
    *
    * @return the number of effective characters in the last block.
    */
   simdutf_really_inline size_t get_remainder(uint8_t *dst) const;
   simdutf_really_inline void advance();
+
 private:
   const uint8_t *buf;
   const size_t len;
@@ -27456,9 +28324,10 @@ struct buf_block_reader {
 };
 
 // Routines to print masks and text for debugging bitmask operations
-simdutf_unused static char * format_input_text_64(const uint8_t *text) {
-  static char *buf = reinterpret_cast<char*>(malloc(sizeof(simd8x64<uint8_t>) + 1));
-  for (size_t i=0; i<sizeof(simd8x64<uint8_t>); i++) {
+simdutf_unused static char *format_input_text_64(const uint8_t *text) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
     buf[i] = int8_t(text[i]) < ' ' ? '_' : int8_t(text[i]);
   }
   buf[sizeof(simd8x64<uint8_t>)] = '\0';
@@ -27466,50 +28335,64 @@ simdutf_unused static char * format_input_text_64(const uint8_t *text) {
 }
 
 // Routines to print masks and text for debugging bitmask operations
-simdutf_unused static char * format_input_text(const simd8x64<uint8_t>& in) {
-  static char *buf = reinterpret_cast<char*>(malloc(sizeof(simd8x64<uint8_t>) + 1));
-  in.store(reinterpret_cast<uint8_t*>(buf));
-  for (size_t i=0; i<sizeof(simd8x64<uint8_t>); i++) {
-    if (buf[i] < ' ') { buf[i] = '_'; }
+simdutf_unused static char *format_input_text(const simd8x64<uint8_t> &in) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  in.store(reinterpret_cast<uint8_t *>(buf));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
+    if (buf[i] < ' ') {
+      buf[i] = '_';
+    }
   }
   buf[sizeof(simd8x64<uint8_t>)] = '\0';
   return buf;
 }
 
-simdutf_unused static char * format_mask(uint64_t mask) {
-  static char *buf = reinterpret_cast<char*>(malloc(64 + 1));
-  for (size_t i=0; i<64; i++) {
+simdutf_unused static char *format_mask(uint64_t mask) {
+  static char *buf = reinterpret_cast<char *>(malloc(64 + 1));
+  for (size_t i = 0; i < 64; i++) {
     buf[i] = (mask & (size_t(1) << i)) ? 'X' : ' ';
   }
   buf[64] = '\0';
   return buf;
 }
 
-template<size_t STEP_SIZE>
-simdutf_really_inline buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len) : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE}, idx{0} {}
+template <size_t STEP_SIZE>
+simdutf_really_inline
+buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len)
+    : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE},
+      idx{0} {}
 
-template<size_t STEP_SIZE>
-simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() { return idx; }
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() {
+  return idx;
+}
 
-template<size_t STEP_SIZE>
+template <size_t STEP_SIZE>
 simdutf_really_inline bool buf_block_reader<STEP_SIZE>::has_full_block() const {
   return idx < lenminusstep;
 }
 
-template<size_t STEP_SIZE>
-simdutf_really_inline const uint8_t *buf_block_reader<STEP_SIZE>::full_block() const {
+template <size_t STEP_SIZE>
+simdutf_really_inline const uint8_t *
+buf_block_reader<STEP_SIZE>::full_block() const {
   return &buf[idx];
 }
 
-template<size_t STEP_SIZE>
-simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
-  if(len == idx) { return 0; } // memcpy(dst, null, 0) will trigger an error with some sanitizers
-  std::memset(dst, 0x20, STEP_SIZE); // std::memset STEP_SIZE because it is more efficient to write out 8 or 16 bytes at once.
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t
+buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
+  if (len == idx) {
+    return 0;
+  } // memcpy(dst, null, 0) will trigger an error with some sanitizers
+  std::memset(dst, 0x20,
+              STEP_SIZE); // std::memset STEP_SIZE because it is more efficient
+                          // to write out 8 or 16 bytes at once.
   std::memcpy(dst, buf + idx, len - idx);
   return len - idx;
 }
 
-template<size_t STEP_SIZE>
+template <size_t STEP_SIZE>
 simdutf_really_inline void buf_block_reader<STEP_SIZE>::advance() {
   idx += STEP_SIZE;
 }
@@ -27526,38 +28409,39 @@ namespace utf8_validation {
 
 using namespace simd;
 
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -27567,137 +28451,173 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       TOO_SHORT | OVERLONG_3 | SURROGATE,
       // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      CARRY | TOO_LARGE,
-      // ____0101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____011_ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-
-      // ____1___ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____1101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
-  simdutf_really_inline simd8<uint8_t> check_multibyte_lengths(const simd8<uint8_t> input,
-      const simd8<uint8_t> prev_input, const simd8<uint8_t> sc) {
-    simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-    simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-    simd8<uint8_t> must23 = simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-    simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-    return must23_80 ^ sc;
-  }
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+//
+// Return nonzero if there are incomplete multibyte characters at the end of the
+// block: e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+//
+simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
+  // If the previous input's last 3 bytes match this, they're too short (they
+  // ended at EOF):
+  // ... 1111____ 111_____ 11______
+  static const uint8_t max_array[32] = {255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        0b11110000u - 1,
+                                        0b11100000u - 1,
+                                        0b11000000u - 1};
+  const simd8<uint8_t> max_value(
+      &max_array[sizeof(max_array) - sizeof(simd8<uint8_t>)]);
+  return input.gt_bits(max_value);
+}
+
+struct utf8_checker {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+  // The last input we received
+  simd8<uint8_t> prev_input_block;
+  // Whether the last input we received was incomplete (used for ASCII fast
+  // path)
+  simd8<uint8_t> prev_incomplete;
 
   //
-  // Return nonzero if there are incomplete multibyte characters at the end of the block:
-  // e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+  // Check whether the current bytes are valid UTF-8.
   //
-  simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
-    // If the previous input's last 3 bytes match this, they're too short (they ended at EOF):
-    // ... 1111____ 111_____ 11______
-    static const uint8_t max_array[32] = {
-      255, 255, 255, 255, 255, 255, 255, 255,
-      255, 255, 255, 255, 255, 255, 255, 255,
-      255, 255, 255, 255, 255, 255, 255, 255,
-      255, 255, 255, 255, 255, 0b11110000u-1, 0b11100000u-1, 0b11000000u-1
-    };
-    const simd8<uint8_t> max_value(&max_array[sizeof(max_array)-sizeof(simd8<uint8_t>)]);
-    return input.gt_bits(max_value);
-  }
-
-  struct utf8_checker {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
-    // The last input we received
-    simd8<uint8_t> prev_input_block;
-    // Whether the last input we received was incomplete (used for ASCII fast path)
-    simd8<uint8_t> prev_incomplete;
-
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      simd8<uint8_t> sc = check_special_cases(input, prev1);
-      this->error |= check_multibyte_lengths(input, prev_input, sc);
-    }
-
-    // The only problem that can happen at EOF is that a multibyte character is too short
-    // or a byte value too large in the last bytes: check_special_cases only checks for bytes
-    // too large in the first of two bytes.
-    simdutf_really_inline void check_eof() {
-      // If the previous block had incomplete UTF-8 characters at the end, an ASCII block can't
-      // possibly finish them.
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  // The only problem that can happen at EOF is that a multibyte character is
+  // too short or a byte value too large in the last bytes: check_special_cases
+  // only checks for bytes too large in the first of two bytes.
+  simdutf_really_inline void check_eof() {
+    // If the previous block had incomplete UTF-8 characters at the end, an
+    // ASCII block can't possibly finish them.
+    this->error |= this->prev_incomplete;
+  }
+
+  simdutf_really_inline void check_next_input(const simd8x64<uint8_t> &input) {
+    if (simdutf_likely(is_ascii(input))) {
       this->error |= this->prev_incomplete;
-    }
-
-    simdutf_really_inline void check_next_input(const simd8x64<uint8_t>& input) {
-      if(simdutf_likely(is_ascii(input))) {
-        this->error |= this->prev_incomplete;
-      } else {
-        // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-        static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        this->prev_incomplete = is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS-1]);
-        this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS-1];
-
+    } else {
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                        (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+                    "We support either two or four chunks per 64-byte block.");
+      if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+      } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+        this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
       }
+      this->prev_incomplete =
+          is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1]);
+      this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1];
     }
+  }
 
-    // do not forget to call check_eof!
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
-    }
+  // do not forget to call check_eof!
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-  }; // struct utf8_checker
+}; // struct utf8_checker
 } // namespace utf8_validation
 
 using utf8_validation::utf8_checker;
@@ -27715,97 +28635,109 @@ namespace utf8_validation {
 /**
  * Validates that the string is actual UTF-8.
  */
-template<class checker>
-bool generic_validate_utf8(const uint8_t * input, size_t length) {
-    checker c{};
-    buf_block_reader<64> reader(input, length);
-    while (reader.has_full_block()) {
-      simd::simd8x64<uint8_t> in(reader.full_block());
-      c.check_next_input(in);
-      reader.advance();
-    }
-    uint8_t block[64]{};
-    reader.get_remainder(block);
-    simd::simd8x64<uint8_t> in(block);
+template <class checker>
+bool generic_validate_utf8(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
     c.check_next_input(in);
     reader.advance();
-    c.check_eof();
-    return !c.errors();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  return !c.errors();
 }
 
-bool generic_validate_utf8(const char * input, size_t length) {
-  return generic_validate_utf8<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
+bool generic_validate_utf8(const char *input, size_t length) {
+  return generic_validate_utf8<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
 /**
  * Validates that the string is actual UTF-8 and stops on errors.
  */
-template<class checker>
-result generic_validate_utf8_with_errors(const uint8_t * input, size_t length) {
-    checker c{};
-    buf_block_reader<64> reader(input, length);
-    size_t count{0};
-    while (reader.has_full_block()) {
-      simd::simd8x64<uint8_t> in(reader.full_block());
-      c.check_next_input(in);
-      if(c.errors()) {
-        if (count != 0) { count--; } // Sometimes the error is only detected in the next chunk
-        result res = scalar::utf8::rewind_and_validate_with_errors(reinterpret_cast<const char*>(input), reinterpret_cast<const char*>(input + count), length - count);
-        res.count += count;
-        return res;
-      }
-      reader.advance();
-      count += 64;
-    }
-    uint8_t block[64]{};
-    reader.get_remainder(block);
-    simd::simd8x64<uint8_t> in(block);
+template <class checker>
+result generic_validate_utf8_with_errors(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  size_t count{0};
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
     c.check_next_input(in);
-    reader.advance();
-    c.check_eof();
     if (c.errors()) {
-      if (count != 0) { count--; } // Sometimes the error is only detected in the next chunk
-      result res = scalar::utf8::rewind_and_validate_with_errors(reinterpret_cast<const char*>(input), reinterpret_cast<const char*>(input) + count, length - count);
+      if (count != 0) {
+        count--;
+      } // Sometimes the error is only detected in the next chunk
+      result res = scalar::utf8::rewind_and_validate_with_errors(
+          reinterpret_cast<const char *>(input),
+          reinterpret_cast<const char *>(input + count), length - count);
       res.count += count;
       return res;
-    } else {
-      return result(error_code::SUCCESS, length);
     }
+    reader.advance();
+    count += 64;
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  if (c.errors()) {
+    if (count != 0) {
+      count--;
+    } // Sometimes the error is only detected in the next chunk
+    result res = scalar::utf8::rewind_and_validate_with_errors(
+        reinterpret_cast<const char *>(input),
+        reinterpret_cast<const char *>(input) + count, length - count);
+    res.count += count;
+    return res;
+  } else {
+    return result(error_code::SUCCESS, length);
+  }
 }
 
-result generic_validate_utf8_with_errors(const char * input, size_t length) {
-  return generic_validate_utf8_with_errors<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
+result generic_validate_utf8_with_errors(const char *input, size_t length) {
+  return generic_validate_utf8_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
-template<class checker>
-bool generic_validate_ascii(const uint8_t * input, size_t length) {
-    buf_block_reader<64> reader(input, length);
-    uint8_t blocks[64]{};
-    simd::simd8x64<uint8_t> running_or(blocks);
-    while (reader.has_full_block()) {
-      simd::simd8x64<uint8_t> in(reader.full_block());
-      running_or |= in;
-      reader.advance();
-    }
-    uint8_t block[64]{};
-    reader.get_remainder(block);
-    simd::simd8x64<uint8_t> in(block);
+template <class checker>
+bool generic_validate_ascii(const uint8_t *input, size_t length) {
+  buf_block_reader<64> reader(input, length);
+  uint8_t blocks[64]{};
+  simd::simd8x64<uint8_t> running_or(blocks);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
     running_or |= in;
-    return running_or.is_ascii();
+    reader.advance();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  running_or |= in;
+  return running_or.is_ascii();
 }
 
-bool generic_validate_ascii(const char * input, size_t length) {
-  return generic_validate_ascii<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
+bool generic_validate_ascii(const char *input, size_t length) {
+  return generic_validate_ascii<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
-template<class checker>
-result generic_validate_ascii_with_errors(const uint8_t * input, size_t length) {
+template <class checker>
+result generic_validate_ascii_with_errors(const uint8_t *input, size_t length) {
   buf_block_reader<64> reader(input, length);
   size_t count{0};
   while (reader.has_full_block()) {
     simd::simd8x64<uint8_t> in(reader.full_block());
     if (!in.is_ascii()) {
-      result res = scalar::ascii::validate_with_errors(reinterpret_cast<const char*>(input + count), length - count);
+      result res = scalar::ascii::validate_with_errors(
+          reinterpret_cast<const char *>(input + count), length - count);
       return result(res.error, count + res.count);
     }
     reader.advance();
@@ -27816,15 +28748,17 @@ result generic_validate_ascii_with_errors(const uint8_t * input, size_t length)
   reader.get_remainder(block);
   simd::simd8x64<uint8_t> in(block);
   if (!in.is_ascii()) {
-    result res = scalar::ascii::validate_with_errors(reinterpret_cast<const char*>(input + count), length - count);
+    result res = scalar::ascii::validate_with_errors(
+        reinterpret_cast<const char *>(input + count), length - count);
     return result(res.error, count + res.count);
   } else {
     return result(error_code::SUCCESS, length);
   }
 }
 
-result generic_validate_ascii_with_errors(const char * input, size_t length) {
-  return generic_validate_ascii_with_errors<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
+result generic_validate_ascii_with_errors(const char *input, size_t length) {
+  return generic_validate_ascii_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
 } // namespace utf8_validation
@@ -27835,7 +28769,6 @@ result generic_validate_ascii_with_errors(const char * input, size_t length) {
 // transcoding from UTF-8 to UTF-16
 /* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
 
-
 namespace simdutf {
 namespace haswell {
 namespace {
@@ -27844,36 +28777,39 @@ namespace utf8_to_utf16 {
 using namespace simd;
 
 template <endianness endian>
-simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
-    char16_t* utf16_output) noexcept {
-  // The implementation is not specific to haswell and should be moved to the generic directory.
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char16_t *utf16_output) noexcept {
+  // The implementation is not specific to haswell and should be moved to the
+  // generic directory.
   size_t pos = 0;
-  char16_t* start{utf16_output};
+  char16_t *start{utf16_output};
   const size_t safety_margin = 16; // to avoid overruns!
-  while(pos + 64 + safety_margin <= size) {
-    // this loop could be unrolled further. For example, we could process the mask
-    // far more than 64 bytes.
+  while (pos + 64 + safety_margin <= size) {
+    // this loop could be unrolled further. For example, we could process the
+    // mask far more than 64 bytes.
     simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
-    if(in.is_ascii()) {
+    if (in.is_ascii()) {
       in.store_ascii_as_utf16<endian>(utf16_output);
       utf16_output += 64;
       pos += 64;
     } else {
-      // Slow path. We hope that the compiler will recognize that this is a slow path.
-      // Anything that is not a continuation mask is a 'leading byte', that is, the
-      // start of a new code point.
+      // Slow path. We hope that the compiler will recognize that this is a slow
+      // path. Anything that is not a continuation byte is a 'leading byte',
+      // that is, the start of a new code point.
       uint64_t utf8_continuation_mask = in.lt(-65 + 1);
-      // -65 is 0b10111111 in two-complement's, so largest possible continuation byte
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
       uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-      // The *start* of code points is not so useful, rather, we want the *end* of code points.
-      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
+      // The *start* of code points is not so useful, rather, we want the *end*
+      // of code points.
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
       // We process in blocks of up to 12 bytes except possibly
       // for fast paths which may process up to 16 bytes. For the
       // slow path to work, we should have at least 12 input bytes left.
       size_t max_starting_point = (pos + 64) - 12;
       // Next loop is going to run at least five times when using solely
       // the slow/regular path, and at least four times if there are fast paths.
-      while(pos < max_starting_point) {
+      while (pos < max_starting_point) {
         // Performance note: our ability to compute 'consumed' and
         // then shift and recompute is critical. If there is a
         // latency of, say, 4 cycles on getting 'consumed', then
@@ -27887,8 +28823,8 @@ simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
         // Thus we may allow convert_masked_utf8_to_utf16 to process
         // more bytes at a time under a fast-path mode where 16 bytes
         // are consumed at once (e.g., when encountering ASCII).
-        size_t consumed = convert_masked_utf8_to_utf16<endian>(input + pos,
-                            utf8_end_of_code_point_mask, utf16_output);
+        size_t consumed = convert_masked_utf8_to_utf16<endian>(
+            input + pos, utf8_end_of_code_point_mask, utf16_output);
         pos += consumed;
         utf8_end_of_code_point_mask >>= consumed;
       }
@@ -27898,7 +28834,8 @@ simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
       // 85% to 90% efficiency.
     }
   }
-  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(input + pos, size - pos, utf16_output);
+  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(
+      input + pos, size - pos, utf16_output);
   return utf16_output - start;
 }
 
@@ -27909,46 +28846,45 @@ simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
 /* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
 /* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
 
-
 namespace simdutf {
 namespace haswell {
 namespace {
 namespace utf8_to_utf16 {
 using namespace simd;
 
-
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -27958,260 +28894,287 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       TOO_SHORT | OVERLONG_3 | SURROGATE,
       // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      CARRY | TOO_LARGE,
-      // ____0101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____011_ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-
-      // ____1___ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____1101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
-  simdutf_really_inline simd8<uint8_t> check_multibyte_lengths(const simd8<uint8_t> input,
-      const simd8<uint8_t> prev_input, const simd8<uint8_t> sc) {
-    simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-    simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-    simd8<uint8_t> must23 = simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-    simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-    return must23_80 ^ sc;
-  }
-
-
-  struct validating_transcoder {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
-
-    validating_transcoder() : error(uint8_t(0)) {}
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      simd8<uint8_t> sc = check_special_cases(input, prev1);
-      this->error |= check_multibyte_lengths(input, prev_input, sc);
-    }
-
-
-    template <endianness endian>
-    simdutf_really_inline size_t convert(const char* in, size_t size, char16_t* utf16_output) {
-      size_t pos = 0;
-      char16_t* start{utf16_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf16<endian>(utf16_output);
-          utf16_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if(utf8_continuation_mask & 1) {
-            return 0; // error
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf16<endian>(in + pos,
-                            utf8_end_of_code_point_mask, utf16_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  template <endianness endian>
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then margin-1 is the eighth-last leading
+    // byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-      }
-      if(errors()) { return 0; }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_utf16::convert<endian>(in + pos, size - pos, utf16_output);
-        if(howmany == 0) { return 0; }
-        utf16_output += howmany;
-      }
-      return utf16_output - start;
-    }
-
-    template <endianness endian>
-    simdutf_really_inline result convert_with_errors(const char* in, size_t size, char16_t* utf16_output) {
-      size_t pos = 0;
-      char16_t* start{utf16_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf16<endian>(utf16_output);
-          utf16_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if (errors() || (utf8_continuation_mask & 1)) {
-            // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-            // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-            result res = scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(pos, in + pos, size - pos, utf16_output);
-            res.count += pos;
-            return res;
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf16<endian>(in + pos,
-                            utf8_end_of_code_point_mask, utf16_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
         }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      if(errors()) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(pos, in + pos, size - pos, utf16_output);
-        res.count += pos;
-        return res;
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany = scalar::utf8_to_utf16::convert<endian>(
+          in + pos, size - pos, utf16_output);
+      if (howmany == 0) {
+        return 0;
       }
-      if(pos < size) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(pos, in + pos, size - pos, utf16_output);
-        if (res.error) {    // In case of error, we want the error position
+      utf16_output += howmany;
+    }
+    return utf16_output - start;
+  }
+
+  template <endianness endian>
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then margin-1 is the eighth-last leading
+    // byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res =
+              scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+                  pos, in + pos, size - pos, utf16_output);
           res.count += pos;
           return res;
-        } else {    // In case of success, we want the number of word written
-          utf16_output += res.count;
         }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf16_output += res.count;
       }
-      return result(error_code::SUCCESS, utf16_output - start);
     }
+    return result(error_code::SUCCESS, utf16_output - start);
+  }
 
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
-    }
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-  }; // struct utf8_checker
-} // utf8_to_utf16 namespace
+}; // struct validating_transcoder
+} // namespace utf8_to_utf16
 } // unnamed namespace
 } // namespace haswell
 } // namespace simdutf
@@ -28226,37 +29189,37 @@ namespace utf8_to_utf32 {
 
 using namespace simd;
 
-
-simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
-    char32_t* utf32_output) noexcept {
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char32_t *utf32_output) noexcept {
   size_t pos = 0;
-  char32_t* start{utf32_output};
+  char32_t *start{utf32_output};
   const size_t safety_margin = 16; // to avoid overruns!
-  while(pos + 64 + safety_margin <= size) {
+  while (pos + 64 + safety_margin <= size) {
     simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
-    if(in.is_ascii()) {
+    if (in.is_ascii()) {
       in.store_ascii_as_utf32(utf32_output);
       utf32_output += 64;
       pos += 64;
     } else {
-    // -65 is 0b10111111 in two-complement's, so largest possible continuation byte
-    uint64_t utf8_continuation_mask = in.lt(-65 + 1);
-    uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-    uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-    size_t max_starting_point = (pos + 64) - 12;
-    while(pos < max_starting_point) {
-      size_t consumed = convert_masked_utf8_to_utf32(input + pos,
-                          utf8_end_of_code_point_mask, utf32_output);
-      pos += consumed;
-      utf8_end_of_code_point_mask >>= consumed;
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      size_t max_starting_point = (pos + 64) - 12;
+      while (pos < max_starting_point) {
+        size_t consumed = convert_masked_utf8_to_utf32(
+            input + pos, utf8_end_of_code_point_mask, utf32_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
       }
     }
   }
-  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos, utf32_output);
+  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos,
+                                                       utf32_output);
   return utf32_output - start;
 }
 
-
 } // namespace utf8_to_utf32
 } // unnamed namespace
 } // namespace haswell
@@ -28264,46 +29227,45 @@ simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
 /* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
 /* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
 
-
 namespace simdutf {
 namespace haswell {
 namespace {
 namespace utf8_to_utf32 {
 using namespace simd;
 
-
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -28313,253 +29275,273 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       TOO_SHORT | OVERLONG_3 | SURROGATE,
       // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      CARRY | TOO_LARGE,
-      // ____0101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____011_ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-
-      // ____1___ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____1101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
-  simdutf_really_inline simd8<uint8_t> check_multibyte_lengths(const simd8<uint8_t> input,
-      const simd8<uint8_t> prev_input, const simd8<uint8_t> sc) {
-    simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-    simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-    simd8<uint8_t> must23 = simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-    simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-    return must23_80 ^ sc;
-  }
-
-
-  struct validating_transcoder {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
-
-    validating_transcoder() : error(uint8_t(0)) {}
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      simd8<uint8_t> sc = check_special_cases(input, prev1);
-      this->error |= check_multibyte_lengths(input, prev_input, sc);
-    }
-
-
-
-    simdutf_really_inline size_t convert(const char* in, size_t size, char32_t* utf32_output) {
-      size_t pos = 0;
-      char32_t* start{utf32_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 words when calling convert_masked_utf8_to_utf32. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 16 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the fourth last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf32(utf32_output);
-          utf32_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if(utf8_continuation_mask & 1) {
-            return 0; // we have an error
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf32(in + pos,
-                            utf8_end_of_code_point_mask, utf32_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 words when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then in[margin] is the eighth-last leading
+    // byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-      }
-      if(errors()) { return 0; }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
-        if(howmany == 0) { return 0; }
-        utf32_output += howmany;
-      }
-      return utf32_output - start;
-    }
-
-    simdutf_really_inline result convert_with_errors(const char* in, size_t size, char32_t* utf32_output) {
-      size_t pos = 0;
-      char32_t* start{utf32_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the fourth last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf32(utf32_output);
-          utf32_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if (errors() || (utf8_continuation_mask & 1)) {
-            result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(pos, in + pos, size - pos, utf32_output);
-            res.count += pos;
-            return res;
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf32(in + pos,
-                            utf8_end_of_code_point_mask, utf32_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // we have an error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
         }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      if(errors()) {
-        result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(pos, in + pos, size - pos, utf32_output);
-        res.count += pos;
-        return res;
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
+      if (howmany == 0) {
+        return 0;
       }
-      if(pos < size) {
-        result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(pos, in + pos, size - pos, utf32_output);
-        if (res.error) {    // In case of error, we want the error position
+      utf32_output += howmany;
+    }
+    return utf32_output - start;
+  }
+
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then in[margin] is the eighth-last leading
+    // byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, utf32_output);
           res.count += pos;
           return res;
-        } else {    // In case of success, we want the number of word written
-          utf32_output += res.count;
         }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      return result(error_code::SUCCESS, utf32_output - start);
     }
-
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
+    if (errors()) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf32_output += res.count;
+      }
     }
+    return result(error_code::SUCCESS, utf32_output - start);
+  }
+
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-  }; // struct utf8_checker
-} // utf8_to_utf32 namespace
+}; // struct validating_transcoder
+} // namespace utf8_to_utf32
 } // unnamed namespace
 } // namespace haswell
 } // namespace simdutf
@@ -28574,33 +29556,34 @@ namespace utf8 {
 
 using namespace simd;
 
-simdutf_really_inline size_t count_code_points(const char* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    for(;pos + 64 <= size; pos += 64) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      uint64_t utf8_continuation_mask = input.gt(-65);
-      count += count_ones(utf8_continuation_mask);
-    }
-    return count + scalar::utf8::count_code_points(in + pos, size - pos);
+simdutf_really_inline size_t count_code_points(const char *in, size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.gt(-65);
+    count += count_ones(utf8_continuation_mask);
+  }
+  return count + scalar::utf8::count_code_points(in + pos, size - pos);
 }
 
-simdutf_really_inline size_t utf16_length_from_utf8(const char* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    // This algorithm could no doubt be improved!
-    for(;pos + 64 <= size; pos += 64) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-      // We count one word for anything that is not a continuation (so
-      // leading bytes).
-      count += 64 - count_ones(utf8_continuation_mask);
-      int64_t utf8_4byte = input.gteq_unsigned(240);
-      count += count_ones(utf8_4byte);
-    }
-    return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
-}
-} // utf8 namespace
+simdutf_really_inline size_t utf16_length_from_utf8(const char *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+    // We count one word for anything that is not a continuation (so
+    // leading bytes).
+    count += 64 - count_ones(utf8_continuation_mask);
+    int64_t utf8_4byte = input.gteq_unsigned(240);
+    count += count_ones(utf8_4byte);
+  }
+  return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
+}
+} // namespace utf8
 } // unnamed namespace
 } // namespace haswell
 } // namespace simdutf
@@ -28612,48 +29595,59 @@ namespace {
 namespace utf16 {
 
 template <endianness big_endian>
-simdutf_really_inline size_t count_code_points(const char16_t* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    for(;pos < size/32*32; pos += 32) {
-      simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-      if (!match_system(big_endian)) { input.swap_bytes(); }
-      uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
-      count += count_ones(not_pair) / 2;
+simdutf_really_inline size_t count_code_points(const char16_t *in,
+                                               size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
     }
-    return count + scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
+    uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
+    count += count_ones(not_pair) / 2;
+  }
+  return count +
+         scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
 }
 
 template <endianness big_endian>
-simdutf_really_inline size_t utf8_length_from_utf16(const char16_t* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    // This algorithm could no doubt be improved!
-    for(;pos < size/32*32; pos += 32) {
-      simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-      if (!match_system(big_endian)) { input.swap_bytes(); }
-      uint64_t ascii_mask = input.lteq(0x7F);
-      uint64_t twobyte_mask = input.lteq(0x7FF);
-      uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
-
-      size_t ascii_count = count_ones(ascii_mask) / 2;
-      size_t twobyte_count = count_ones(twobyte_mask & ~ ascii_mask) / 2;
-      size_t threebyte_count = count_ones(not_pair_mask & ~ twobyte_mask) / 2;
-      size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
-      count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count + ascii_count;
+simdutf_really_inline size_t utf8_length_from_utf16(const char16_t *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
     }
-    return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos, size - pos);
+    uint64_t ascii_mask = input.lteq(0x7F);
+    uint64_t twobyte_mask = input.lteq(0x7FF);
+    uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
+
+    size_t ascii_count = count_ones(ascii_mask) / 2;
+    size_t twobyte_count = count_ones(twobyte_mask & ~ascii_mask) / 2;
+    size_t threebyte_count = count_ones(not_pair_mask & ~twobyte_mask) / 2;
+    size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
+    count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count +
+             ascii_count;
+  }
+  return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos,
+                                                                   size - pos);
 }
 
 template <endianness big_endian>
-simdutf_really_inline size_t utf32_length_from_utf16(const char16_t* in, size_t size) {
-    return count_code_points<big_endian>(in, size);
+simdutf_really_inline size_t utf32_length_from_utf16(const char16_t *in,
+                                                     size_t size) {
+  return count_code_points<big_endian>(in, size);
 }
 
-simdutf_really_inline void change_endianness_utf16(const char16_t* in, size_t size, char16_t* output) {
+simdutf_really_inline void
+change_endianness_utf16(const char16_t *in, size_t size, char16_t *output) {
   size_t pos = 0;
 
-  while (pos < size/32*32) {
+  while (pos < size / 32 * 32) {
     simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
     input.swap_bytes();
     input.store(reinterpret_cast<uint16_t *>(output));
@@ -28664,60 +29658,59 @@ simdutf_really_inline void change_endianness_utf16(const char16_t* in, size_t si
   scalar::utf16::change_endianness_utf16(in + pos, size - pos, output);
 }
 
-} // utf16
+} // namespace utf16
 } // unnamed namespace
 } // namespace haswell
 } // namespace simdutf
 /* end file src/generic/utf16.h */
 
-
 // transcoding from UTF-8 to Latin 1
 /* begin file src/generic/utf8_to_latin1/utf8_to_latin1.h */
 
-
 namespace simdutf {
 namespace haswell {
 namespace {
 namespace utf8_to_latin1 {
 using namespace simd;
 
-
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// For UTF-8 to Latin 1, we can allow any ASCII character, and any continuation byte,
-// but the non-ASCII leading bytes must be 0b11000011 or 0b11000010 and nothing else.
-//
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-    constexpr const uint8_t FORBIDDEN  = 0xff;
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // For UTF-8 to Latin 1, we can allow any ASCII character, and any
+  // continuation byte, but the non-ASCII leading bytes must be 0b11000011 or
+  // 0b11000010 and nothing else.
+  //
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+  constexpr const uint8_t FORBIDDEN = 0xff;
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -28727,361 +29720,401 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       FORBIDDEN,
       // 1111____ ________ <four+ byte lead in byte 1>
-      FORBIDDEN
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      FORBIDDEN,
-      // ____0101 ________
-      FORBIDDEN,
-      // ____011_ ________
-      FORBIDDEN,
-      FORBIDDEN,
-
-      // ____1___ ________
-      FORBIDDEN,
-      FORBIDDEN,
-      FORBIDDEN,
-      FORBIDDEN,
-      FORBIDDEN,
-      // ____1101 ________
-      FORBIDDEN,
-      FORBIDDEN,
-      FORBIDDEN
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      FORBIDDEN);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              FORBIDDEN,
+              // ____0101 ________
+              FORBIDDEN,
+              // ____011_ ________
+              FORBIDDEN, FORBIDDEN,
+
+              // ____1___ ________
+              FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN,
+              // ____1101 ________
+              FORBIDDEN, FORBIDDEN, FORBIDDEN);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
 
-  struct validating_transcoder {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
 
-    validating_transcoder() : error(uint8_t(0)) {}
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      this->error |= check_special_cases(input, prev1);
-    }
-
-
-    simdutf_really_inline size_t convert(const char* in, size_t size, char* latin1_output) {
-      size_t pos = 0;
-      char* start{latin1_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65); //twos complement of -65 is 1011 1111 ...
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store((int8_t*)latin1_output);
-          latin1_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in this case, we also have ASCII to account for.
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_latin1(in + pos,
-                            utf8_end_of_code_point_mask, latin1_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    this->error |= check_special_cases(input, prev1);
+  }
+
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char *latin1_output) {
+    size_t pos = 0;
+    char *start{latin1_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 16 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 16; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) >
+                       -65); // twos complement of -65 is 1011 1111 ...
+    }
+    // If the input is long enough, then margin-1 is the sixteenth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-      }
-      if(errors()) { return 0; }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_latin1::convert(in + pos, size - pos, latin1_output);
-        if(howmany == 0) { return 0; }
-        latin1_output += howmany;
-      }
-      return latin1_output - start;
-    }
-
-    simdutf_really_inline result convert_with_errors(const char* in, size_t size, char* latin1_output) {
-      size_t pos = 0;
-      char* start{latin1_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store((int8_t*)latin1_output);
-          latin1_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          if (errors()) {
-            // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-            // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-            result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(pos, in + pos, size - pos, latin1_output);
-            res.count += pos;
-            return res;
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_latin1(in + pos,
-                            utf8_end_of_code_point_mask, latin1_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+        uint64_t utf8_continuation_mask =
+            input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
+                               // this case, we also have ASCII to account for.
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
         }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      if(errors()) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(pos, in + pos, size - pos, latin1_output);
-        res.count += pos;
-        return res;
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_latin1::convert(in + pos, size - pos, latin1_output);
+      if (howmany == 0) {
+        return 0;
       }
-      if(pos < size) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(pos, in + pos, size - pos, latin1_output);
-        if (res.error) {    // In case of error, we want the error position
+      latin1_output += howmany;
+    }
+    return latin1_output - start;
+  }
+
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char *latin1_output) {
+    size_t pos = 0;
+    char *start{latin1_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then margin-1 is the eighth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        if (errors()) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, latin1_output);
           res.count += pos;
           return res;
-        } else {    // In case of success, we want the number of word written
-          latin1_output += res.count;
         }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        latin1_output += res.count;
       }
-      return result(error_code::SUCCESS, latin1_output - start);
     }
+    return result(error_code::SUCCESS, latin1_output - start);
+  }
 
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
-    }
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-  }; // struct utf8_checker
-} // utf8_to_latin1 namespace
+}; // struct validating_transcoder
+} // namespace utf8_to_latin1
 } // unnamed namespace
 } // namespace haswell
 } // namespace simdutf
 /* end file src/generic/utf8_to_latin1/utf8_to_latin1.h */
 /* begin file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
 
-
 namespace simdutf {
 namespace haswell {
 namespace {
 namespace utf8_to_latin1 {
 using namespace simd;
 
-
-    simdutf_really_inline size_t convert_valid(const char* in, size_t size, char* latin1_output) {
-      size_t pos = 0;
-      char* start{latin1_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65); //twos complement of -65 is 1011 1111 ...
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store((int8_t*)latin1_output);
-          latin1_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in this case, we also have ASCII to account for.
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_latin1(in + pos,
-                            utf8_end_of_code_point_mask, latin1_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
-        }
-      }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_latin1::convert_valid(in + pos, size - pos, latin1_output);
-        latin1_output += howmany;
+simdutf_really_inline size_t convert_valid(const char *in, size_t size,
+                                           char *latin1_output) {
+  size_t pos = 0;
+  char *start{latin1_output};
+  // In the worst case, we have the haswell kernel which can cause an overflow
+  // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last
+  // 16 bytes, and if the data is valid, then it is entirely safe because 16
+  // UTF-8 bytes generate much more than 8 bytes. However, you cannot generally
+  // assume that you have valid UTF-8 input, so we are going to go back from the
+  // end counting 8 leading bytes, to give us a good margin.
+  size_t leading_byte = 0;
+  size_t margin = size;
+  for (; margin > 0 && leading_byte < 8; margin--) {
+    leading_byte += (int8_t(in[margin - 1]) >
+                     -65); // twos complement of -65 is 1011 1111 ...
+  }
+  // If the input is long enough, then margin-1 is the eighth-to-last
+  // leading byte.
+  const size_t safety_margin = size - margin + 1; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    if (input.is_ascii()) {
+      input.store((int8_t *)latin1_output);
+      latin1_output += 64;
+      pos += 64;
+    } else {
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      uint64_t utf8_continuation_mask =
+          input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
+                             // this case, we also have ASCII to account for.
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      // We process in blocks of up to 12 bytes except possibly
+      // for fast paths which may process up to 16 bytes. For the
+      // slow path to work, we should have at least 12 input bytes left.
+      size_t max_starting_point = (pos + 64) - 12;
+      // Next loop is going to run at least five times.
+      while (pos < max_starting_point) {
+        // Performance note: our ability to compute 'consumed' and
+        // then shift and recompute is critical. If there is a
+        // latency of, say, 4 cycles on getting 'consumed', then
+        // the inner loop might have a total latency of about 6 cycles.
+        // Yet we process between 6 and 12 input bytes, thus we get
+        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+        // for this section of the code. Hence, there is a limit
+        // to how much we can further increase this latency before
+        // it seriously harms performance.
+        size_t consumed = convert_masked_utf8_to_latin1(
+            in + pos, utf8_end_of_code_point_mask, latin1_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
       }
-      return latin1_output - start;
+      // At this point there may remain between 0 and 12 bytes in the
+      // 64-byte block. These bytes will be processed again. So we have an
+      // 80% efficiency (in the worst case). In practice we expect an
+      // 85% to 90% efficiency.
     }
-
   }
-}   // utf8_to_latin1 namespace
-}   // unnamed namespace
-}   // namespace haswell
- // namespace simdutf
+  if (pos < size) {
+    size_t howmany = scalar::utf8_to_latin1::convert_valid(in + pos, size - pos,
+                                                           latin1_output);
+    latin1_output += howmany;
+  }
+  return latin1_output - start;
+}
+
+} // namespace utf8_to_latin1
+} // namespace
+} // namespace haswell
+} // namespace simdutf
+  // namespace simdutf
 /* end file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
 
 namespace simdutf {
 namespace haswell {
 
-simdutf_warn_unused int implementation::detect_encodings(const char * input, size_t length) const noexcept {
+simdutf_warn_unused int
+implementation::detect_encodings(const char *input,
+                                 size_t length) const noexcept {
   // If there is a BOM, then we trust it.
   auto bom_encoding = simdutf::BOM::check_bom(input, length);
-  if(bom_encoding != encoding_type::unspecified) { return bom_encoding; }
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
+  }
   int out = 0;
-  if(validate_utf8(input, length)) { out |= encoding_type::UTF8; }
-  if((length % 2) == 0) {
-    if(validate_utf16le(reinterpret_cast<const char16_t*>(input), length/2)) { out |= encoding_type::UTF16_LE; }
+  if (validate_utf8(input, length)) {
+    out |= encoding_type::UTF8;
   }
-  if((length % 4) == 0) {
-    if(validate_utf32(reinterpret_cast<const char32_t*>(input), length/4)) { out |= encoding_type::UTF32_LE; }
+  if ((length % 2) == 0) {
+    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
+                         length / 2)) {
+      out |= encoding_type::UTF16_LE;
+    }
+  }
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      out |= encoding_type::UTF32_LE;
+    }
   }
   return out;
 }
 
-simdutf_warn_unused bool implementation::validate_utf8(const char *buf, size_t len) const noexcept {
-  return haswell::utf8_validation::generic_validate_utf8(buf,len);
+simdutf_warn_unused bool
+implementation::validate_utf8(const char *buf, size_t len) const noexcept {
+  return haswell::utf8_validation::generic_validate_utf8(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_utf8_with_errors(const char *buf, size_t len) const noexcept {
-  return haswell::utf8_validation::generic_validate_utf8_with_errors(buf,len);
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return haswell::utf8_validation::generic_validate_utf8_with_errors(buf, len);
 }
 
-simdutf_warn_unused bool implementation::validate_ascii(const char *buf, size_t len) const noexcept {
-  return haswell::utf8_validation::generic_validate_ascii(buf,len);
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *buf, size_t len) const noexcept {
+  return haswell::utf8_validation::generic_validate_ascii(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_ascii_with_errors(const char *buf, size_t len) const noexcept {
-  return haswell::utf8_validation::generic_validate_ascii_with_errors(buf,len);
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return haswell::utf8_validation::generic_validate_ascii_with_errors(buf, len);
 }
 
-simdutf_warn_unused bool implementation::validate_utf16le(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf16le(const char16_t *buf,
+                                 size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     // empty input is valid UTF-16. protect the implementation from
     // handling nullptr
@@ -29089,19 +30122,22 @@ simdutf_warn_unused bool implementation::validate_utf16le(const char16_t *buf, s
   }
   const char16_t *tail = avx2_validate_utf16<endianness::LITTLE>(buf, len);
   if (tail) {
-    return scalar::utf16::validate<endianness::LITTLE>(tail, len - (tail - buf));
+    return scalar::utf16::validate<endianness::LITTLE>(tail,
+                                                       len - (tail - buf));
   } else {
     return false;
   }
 }
 
-simdutf_warn_unused bool implementation::validate_utf16be(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf16be(const char16_t *buf,
+                                 size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     // empty input is valid UTF-16. protect the implementation from
     // handling nullptr
     return true;
   }
-  const char16_t* tail = avx2_validate_utf16<endianness::BIG>(buf, len);
+  const char16_t *tail = avx2_validate_utf16<endianness::BIG>(buf, len);
   if (tail) {
     return scalar::utf16::validate<endianness::BIG>(tail, len - (tail - buf));
   } else {
@@ -29109,33 +30145,38 @@ simdutf_warn_unused bool implementation::validate_utf16be(const char16_t *buf, s
   }
 }
 
-simdutf_warn_unused result implementation::validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf16le_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
   result res = avx2_validate_utf16_with_errors<endianness::LITTLE>(buf, len);
   if (res.count != len) {
-    result scalar_res = scalar::utf16::validate_with_errors<endianness::LITTLE>(buf + res.count, len - res.count);
+    result scalar_res = scalar::utf16::validate_with_errors<endianness::LITTLE>(
+        buf + res.count, len - res.count);
     return result(scalar_res.error, res.count + scalar_res.count);
   } else {
     return res;
   }
 }
 
-simdutf_warn_unused result implementation::validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf16be_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
   result res = avx2_validate_utf16_with_errors<endianness::BIG>(buf, len);
   if (res.count != len) {
-    result scalar_res = scalar::utf16::validate_with_errors<endianness::BIG>(buf + res.count, len - res.count);
+    result scalar_res = scalar::utf16::validate_with_errors<endianness::BIG>(
+        buf + res.count, len - res.count);
     return result(scalar_res.error, res.count + scalar_res.count);
   } else {
     return res;
   }
 }
 
-simdutf_warn_unused bool implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     // empty input is valid UTF-32. protect the implementation from
     // handling nullptr
     return true;
   }
-  const char32_t* tail = avx2_validate_utf32le(buf, len);
+  const char32_t *tail = avx2_validate_utf32le(buf, len);
   if (tail) {
     return scalar::utf32::validate(tail, len - (tail - buf));
   } else {
@@ -29143,7 +30184,8 @@ simdutf_warn_unused bool implementation::validate_utf32(const char32_t *buf, siz
   }
 }
 
-simdutf_warn_unused result implementation::validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf32_with_errors(
+    const char32_t *buf, size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     // empty input is valid UTF-32. protect the implementation from
     // handling nullptr
@@ -29151,158 +30193,215 @@ simdutf_warn_unused result implementation::validate_utf32_with_errors(const char
   }
   result res = avx2_validate_utf32le_with_errors(buf, len);
   if (res.count != len) {
-    result scalar_res = scalar::utf32::validate_with_errors(buf + res.count, len - res.count);
+    result scalar_res =
+        scalar::utf32::validate_with_errors(buf + res.count, len - res.count);
     return result(scalar_res.error, res.count + scalar_res.count);
   } else {
     return res;
   }
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(const char * buf, size_t len, char* utf8_output) const noexcept {
-  std::pair<const char*, char*> ret = avx2_convert_latin1_to_utf8(buf, len, utf8_output);
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
+    const char *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char *, char *> ret =
+      avx2_convert_latin1_to_utf8(buf, len, utf8_output);
   size_t converted_chars = ret.second - utf8_output;
 
   if (ret.first != buf + len) {
     const size_t scalar_converted_chars = scalar::latin1_to_utf8::convert(
-      ret.first, len - (ret.first - buf), ret.second);
+        ret.first, len - (ret.first - buf), ret.second);
     converted_chars += scalar_converted_chars;
   }
 
   return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-    std::pair<const char*, char16_t*> ret = avx2_convert_latin1_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
-    if (ret.first == nullptr) { return 0; }
-    size_t converted_chars = ret.second - utf16_output;
-    if (ret.first != buf + len) {
-        const size_t scalar_converted_chars = scalar::latin1_to_utf16::convert<endianness::LITTLE>(
-                                              ret.first, len - (ret.first - buf), ret.second);
-        if (scalar_converted_chars == 0) { return 0; }
-        converted_chars += scalar_converted_chars;
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char *, char16_t *> ret =
+      avx2_convert_latin1_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t converted_chars = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars =
+        scalar::latin1_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_converted_chars == 0) {
+      return 0;
     }
-    return converted_chars;
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-    std::pair<const char*, char16_t*> ret = avx2_convert_latin1_to_utf16<endianness::BIG>(buf, len, utf16_output);
-    if (ret.first == nullptr) { return 0; }
-    size_t converted_chars = ret.second - utf16_output;
-    if (ret.first != buf + len) {
-        const size_t scalar_converted_chars = scalar::latin1_to_utf16::convert<endianness::BIG>(
-                                              ret.first, len - (ret.first - buf), ret.second);
-        if (scalar_converted_chars == 0) { return 0; }
-        converted_chars += scalar_converted_chars;
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char *, char16_t *> ret =
+      avx2_convert_latin1_to_utf16<endianness::BIG>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t converted_chars = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars =
+        scalar::latin1_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_converted_chars == 0) {
+      return 0;
     }
-    return converted_chars;
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(const char* buf, size_t len, char32_t* utf32_output) const noexcept {
-    std::pair<const char*, char32_t*> ret = avx2_convert_latin1_to_utf32(buf, len, utf32_output);
-    if (ret.first == nullptr) { return 0; }
-    size_t converted_chars = ret.second - utf32_output;
-    if (ret.first != buf + len) {
-        const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
-                                              ret.first, len - (ret.first - buf), ret.second);
-        if (scalar_converted_chars == 0) { return 0; }
-        converted_chars += scalar_converted_chars;
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char *, char32_t *> ret =
+      avx2_convert_latin1_to_utf32(buf, len, utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t converted_chars = ret.second - utf32_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_converted_chars == 0) {
+      return 0;
     }
-    return converted_chars;
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(const char* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
   utf8_to_latin1::validating_transcoder converter;
   return converter.convert(buf, len, latin1_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(const char* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
   utf8_to_latin1::validating_transcoder converter;
   return converter.convert_with_errors(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(const char* input, size_t size,
-    char* latin1_output) const noexcept {
-   return utf8_to_latin1::convert_valid(input, size,  latin1_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
+    const char *input, size_t size, char *latin1_output) const noexcept {
+  return utf8_to_latin1::convert_valid(input, size, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   utf8_to_utf16::validating_transcoder converter;
   return converter.convert<endianness::LITTLE>(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   utf8_to_utf16::validating_transcoder converter;
   return converter.convert<endianness::BIG>(buf, len, utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   utf8_to_utf16::validating_transcoder converter;
-  return converter.convert_with_errors<endianness::LITTLE>(buf, len, utf16_output);
+  return converter.convert_with_errors<endianness::LITTLE>(buf, len,
+                                                           utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   utf8_to_utf16::validating_transcoder converter;
   return converter.convert_with_errors<endianness::BIG>(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(const char* input, size_t size,
-    char16_t* utf16_output) const noexcept {
-   return utf8_to_utf16::convert_valid<endianness::LITTLE>(input, size,  utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char *input, size_t size, char16_t *utf16_output) const noexcept {
+  return utf8_to_utf16::convert_valid<endianness::LITTLE>(input, size,
+                                                          utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(const char* input, size_t size,
-    char16_t* utf16_output) const noexcept {
-   return utf8_to_utf16::convert_valid<endianness::BIG>(input, size,  utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char *input, size_t size, char16_t *utf16_output) const noexcept {
+  return utf8_to_utf16::convert_valid<endianness::BIG>(input, size,
+                                                       utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(const char* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
   utf8_to_utf32::validating_transcoder converter;
   return converter.convert(buf, len, utf32_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(const char* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
   utf8_to_utf32::validating_transcoder converter;
   return converter.convert_with_errors(buf, len, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(const char* input, size_t size,
-    char32_t* utf32_output) const noexcept {
-  return utf8_to_utf32::convert_valid(input, size,  utf32_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char *input, size_t size, char32_t *utf32_output) const noexcept {
+  return utf8_to_utf32::convert_valid(input, size, utf32_output);
 }
 
-
-simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<const char16_t*, char*> ret = haswell::avx2_convert_utf16_to_latin1<endianness::LITTLE>(buf, len, latin1_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      haswell::avx2_convert_utf16_to_latin1<endianness::LITTLE>(buf, len,
+                                                                latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - latin1_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_latin1::convert<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_latin1::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<const char16_t*, char*> ret = haswell::avx2_convert_utf16_to_latin1<endianness::BIG>(buf, len, latin1_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      haswell::avx2_convert_utf16_to_latin1<endianness::BIG>(buf, len,
+                                                             latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - latin1_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_latin1::convert<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_latin1::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_latin1_with_errors(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<result, char*> ret = avx2_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(buf, len, latin1_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result
+implementation::convert_utf16le_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      avx2_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
+          buf, len, latin1_output);
+  if (ret.first.error) {
+    // Can return directly since scalar fallback already found correct
+    // ret.first.count
+    return ret.first;
+  }
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -29310,16 +30409,26 @@ simdutf_warn_unused result implementation::convert_utf16le_to_latin1_with_errors
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - latin1_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_latin1_with_errors(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<result, char*> ret = avx2_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len, latin1_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result
+implementation::convert_utf16be_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      avx2_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len,
+                                                                latin1_output);
+  if (ret.first.error) {
+    // Can return directly since scalar fallback already found correct
+    // ret.first.count
+    return ret.first;
+  }
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -29327,53 +30436,81 @@ simdutf_warn_unused result implementation::convert_utf16be_to_latin1_with_errors
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - latin1_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   // optimization opportunity: implement a custom function
   return convert_utf16be_to_latin1(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   // optimization opportunity: implement a custom function
   return convert_utf16le_to_latin1(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  std::pair<const char16_t*, char*> ret = haswell::avx2_convert_utf16_to_utf8<endianness::LITTLE>(buf, len, utf8_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      haswell::avx2_convert_utf16_to_utf8<endianness::LITTLE>(buf, len,
+                                                              utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf8_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf8::convert<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf8::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  std::pair<const char16_t*, char*> ret = haswell::avx2_convert_utf16_to_utf8<endianness::BIG>(buf, len, utf8_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      haswell::avx2_convert_utf16_to_utf8<endianness::BIG>(buf, len,
+                                                           utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf8_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf8::convert<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf8::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char*> ret = haswell::avx2_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(buf, len, utf8_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      haswell::avx2_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(
+          buf, len, utf8_output);
+  if (ret.first.error) {
+    // Can return directly since scalar fallback already found correct
+    // ret.first.count
+    return ret.first;
+  }
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -29381,17 +30518,27 @@ simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(c
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf8_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char*> ret = haswell::avx2_convert_utf16_to_utf8_with_errors<endianness::BIG>(buf, len, utf8_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      haswell::avx2_convert_utf16_to_utf8_with_errors<endianness::BIG>(
+          buf, len, utf8_output);
+  if (ret.first.error) {
+    // Can return directly since scalar fallback already found correct
+    // ret.first.count
+    return ret.first;
+  }
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -29399,50 +30546,69 @@ simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(c
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf8_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   return convert_utf16le_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   return convert_utf16be_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
-  std::pair<const char32_t*, char*> ret = avx2_convert_utf32_to_utf8(buf, len, utf8_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      avx2_convert_utf32_to_utf8(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf8_output;
   if (ret.first != buf + len) {
     const size_t scalar_saved_bytes = scalar::utf32_to_utf8::convert(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<const char32_t*, char*> ret = avx2_convert_utf32_to_latin1(buf, len, latin1_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      avx2_convert_utf32_to_latin1(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - latin1_output;
   if (ret.first != buf + len) {
     const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char*> ret = avx2_convert_utf32_to_latin1_with_errors(buf, len, latin1_output);
+simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      avx2_convert_utf32_to_latin1_with_errors(buf, len, latin1_output);
   if (ret.first.count != len) {
     result scalar_res = scalar::utf32_to_latin1::convert_with_errors(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+        buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -29450,20 +30616,26 @@ simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(c
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - latin1_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
-  return convert_utf32_to_latin1(buf,len,latin1_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  return convert_utf32_to_latin1(buf, len, latin1_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char*> ret = haswell::avx2_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      haswell::avx2_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
   if (ret.first.count != len) {
     result scalar_res = scalar::utf32_to_utf8::convert_with_errors(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+        buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -29471,43 +30643,69 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(con
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf8_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  std::pair<const char16_t*, char32_t*> ret = haswell::avx2_convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char16_t *, char32_t *> ret =
+      haswell::avx2_convert_utf16_to_utf32<endianness::LITTLE>(buf, len,
+                                                               utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf32_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf32::convert<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  std::pair<const char16_t*, char32_t*> ret = haswell::avx2_convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char16_t *, char32_t *> ret =
+      haswell::avx2_convert_utf16_to_utf32<endianness::BIG>(buf, len,
+                                                            utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf32_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf32::convert<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char32_t*> ret = haswell::avx2_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(buf, len, utf32_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char32_t *> ret =
+      haswell::avx2_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(
+          buf, len, utf32_output);
+  // On error, return directly: the scalar fallback already set the count.
+  if (ret.first.error) {
+    return ret.first;
+  }
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -29515,17 +30713,27 @@ simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf32_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf32_output; // Set count to the number of 32-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char32_t*> ret = haswell::avx2_convert_utf16_to_utf32_with_errors<endianness::BIG>(buf, len, utf32_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char32_t *> ret =
+      haswell::avx2_convert_utf16_to_utf32_with_errors<endianness::BIG>(
+          buf, len, utf32_output);
+  // On error, return directly: the scalar fallback already set the count.
+  if (ret.first.error) {
+    return ret.first;
+  }
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -29533,46 +30741,68 @@ simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf32_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf32_output; // Set count to the number of 32-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
   return convert_utf32_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  std::pair<const char32_t*, char16_t*> ret = avx2_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      avx2_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf16_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_utf16::convert<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  std::pair<const char32_t*, char16_t*> ret = avx2_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      avx2_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf16_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_utf16::convert<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char16_t*> ret = haswell::avx2_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      haswell::avx2_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(
+          buf, len, utf16_output);
   if (ret.first.count != len) {
-    result scalar_res = scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -29580,16 +30810,23 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf16_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char16_t*> ret = haswell::avx2_convert_utf32_to_utf16_with_errors<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      haswell::avx2_convert_utf32_to_utf16_with_errors<endianness::BIG>(
+          buf, len, utf16_output);
   if (ret.first.count != len) {
-    result scalar_res = scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -29597,89 +30834,109 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf16_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
   return convert_utf32_to_utf16le(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
   return convert_utf32_to_utf16be(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
   return convert_utf16le_to_utf32(buf, len, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
   return convert_utf16be_to_utf32(buf, len, utf32_output);
 }
 
-void implementation::change_endianness_utf16(const char16_t * input, size_t length, char16_t * output) const noexcept {
+void implementation::change_endianness_utf16(const char16_t *input,
+                                             size_t length,
+                                             char16_t *output) const noexcept {
   utf16::change_endianness_utf16(input, length, output);
 }
 
-simdutf_warn_unused size_t implementation::count_utf16le(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::count_utf16le(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::count_code_points<endianness::LITTLE>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::count_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::count_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::count_code_points<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::count_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *input, size_t length) const noexcept {
   return utf8::count_code_points(input, length);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf8(const char* buf, size_t len) const noexcept {
-  return count_utf8(buf,len);
+simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
+    const char *buf, size_t len) const noexcept {
+  return count_utf8(buf, len);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf16(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf16(size_t length) const noexcept {
   return scalar::utf16::latin1_length_from_utf16(length);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf32(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf32(size_t length) const noexcept {
   return scalar::utf32::latin1_length_from_utf32(length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::utf8_length_from_utf16<endianness::LITTLE>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::utf32_length_from_utf16<endianness::LITTLE>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
 }
 
-
-simdutf_warn_unused size_t implementation::utf16_length_from_latin1(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::utf16_length_from_latin1(size_t length) const noexcept {
   return scalar::latin1::utf16_length_from_latin1(length);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *input, size_t length) const noexcept {
   return utf8::utf16_length_from_utf8(input, length);
 }
 
-
-simdutf_warn_unused size_t implementation::utf32_length_from_latin1(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::utf32_length_from_latin1(size_t length) const noexcept {
   return scalar::latin1::utf32_length_from_latin1(length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_latin1(const char *input, size_t len) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
+    const char *input, size_t len) const noexcept {
   const uint8_t *data = reinterpret_cast<const uint8_t *>(input);
   size_t answer = len / sizeof(__m256i) * sizeof(__m256i);
   size_t i = 0;
-  if(answer >= 2048) { // long strings optimization
+  if (answer >= 2048) { // long strings optimization
     __m256i four_64bits = _mm256_setzero_si256();
     while (i + sizeof(__m256i) <= len) {
       __m256i runner = _mm256_setzero_si256();
@@ -29689,21 +30946,26 @@ simdutf_warn_unused size_t implementation::utf8_length_from_latin1(const char *i
         iterations = 255;
       }
       size_t max_i = i + iterations * sizeof(__m256i) - sizeof(__m256i);
-      for (; i + 4*sizeof(__m256i) <= max_i; i += 4*sizeof(__m256i)) {
+      for (; i + 4 * sizeof(__m256i) <= max_i; i += 4 * sizeof(__m256i)) {
         __m256i input1 = _mm256_loadu_si256((const __m256i *)(data + i));
-        __m256i input2 = _mm256_loadu_si256((const __m256i *)(data + i + sizeof(__m256i)));
-        __m256i input3 = _mm256_loadu_si256((const __m256i *)(data + i + 2*sizeof(__m256i)));
-        __m256i input4 = _mm256_loadu_si256((const __m256i *)(data + i + 3*sizeof(__m256i)));
-        __m256i input12 = _mm256_add_epi8(_mm256_cmpgt_epi8(_mm256_setzero_si256(), input1),
-                _mm256_cmpgt_epi8(_mm256_setzero_si256(), input2));
-        __m256i input23 = _mm256_add_epi8(_mm256_cmpgt_epi8(_mm256_setzero_si256(), input3),
-                _mm256_cmpgt_epi8(_mm256_setzero_si256(), input4));
+        __m256i input2 =
+            _mm256_loadu_si256((const __m256i *)(data + i + sizeof(__m256i)));
+        __m256i input3 = _mm256_loadu_si256(
+            (const __m256i *)(data + i + 2 * sizeof(__m256i)));
+        __m256i input4 = _mm256_loadu_si256(
+            (const __m256i *)(data + i + 3 * sizeof(__m256i)));
+        __m256i input12 =
+            _mm256_add_epi8(_mm256_cmpgt_epi8(_mm256_setzero_si256(), input1),
+                            _mm256_cmpgt_epi8(_mm256_setzero_si256(), input2));
+        __m256i input23 =
+            _mm256_add_epi8(_mm256_cmpgt_epi8(_mm256_setzero_si256(), input3),
+                            _mm256_cmpgt_epi8(_mm256_setzero_si256(), input4));
         __m256i input1234 = _mm256_add_epi8(input12, input23);
-        runner = _mm256_sub_epi8(
-            runner, input1234);
+        runner = _mm256_sub_epi8(runner, input1234);
       }
       for (; i <= max_i; i += sizeof(__m256i)) {
-        __m256i input_256_chunk = _mm256_loadu_si256((const __m256i *)(data + i));
+        __m256i input_256_chunk =
+            _mm256_loadu_si256((const __m256i *)(data + i));
         runner = _mm256_sub_epi8(
             runner, _mm256_cmpgt_epi8(_mm256_setzero_si256(), input_256_chunk));
       }
@@ -29715,82 +30977,115 @@ simdutf_warn_unused size_t implementation::utf8_length_from_latin1(const char *i
               _mm256_extract_epi64(four_64bits, 2) +
               _mm256_extract_epi64(four_64bits, 3);
   } else if (answer > 0) {
-    for(; i + sizeof(__m256i) <= len; i += sizeof(__m256i)) {
-      __m256i latin = _mm256_loadu_si256((const __m256i*)(data + i));
+    for (; i + sizeof(__m256i) <= len; i += sizeof(__m256i)) {
+      __m256i latin = _mm256_loadu_si256((const __m256i *)(data + i));
       uint32_t non_ascii = _mm256_movemask_epi8(latin);
       answer += count_ones(non_ascii);
     }
   }
-  return answer + scalar::latin1::utf8_length_from_latin1(reinterpret_cast<const char *>(data + i), len - i);
+  return answer + scalar::latin1::utf8_length_from_latin1(
+                      reinterpret_cast<const char *>(data + i), len - i);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf32(const char32_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
   const __m256i v_00000000 = _mm256_setzero_si256();
   const __m256i v_ffffff80 = _mm256_set1_epi32((uint32_t)0xffffff80);
   const __m256i v_fffff800 = _mm256_set1_epi32((uint32_t)0xfffff800);
   const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
   size_t pos = 0;
   size_t count = 0;
-  for(;pos + 8 <= length; pos += 8) {
-    __m256i in = _mm256_loadu_si256((__m256i*)(input + pos));
-    const __m256i ascii_bytes_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffffff80), v_00000000);
-    const __m256i one_two_bytes_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(in, v_fffff800), v_00000000);
-    const __m256i two_bytes_bytemask = _mm256_xor_si256(one_two_bytes_bytemask, ascii_bytes_bytemask);
-    const __m256i one_two_three_bytes_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
-    const __m256i three_bytes_bytemask = _mm256_xor_si256(one_two_three_bytes_bytemask, one_two_bytes_bytemask);
-    const uint32_t ascii_bytes_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(ascii_bytes_bytemask));
-    const uint32_t two_bytes_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(two_bytes_bytemask));
-    const uint32_t three_bytes_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(three_bytes_bytemask));
+  for (; pos + 8 <= length; pos += 8) {
+    __m256i in = _mm256_loadu_si256((__m256i *)(input + pos));
+    const __m256i ascii_bytes_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffffff80), v_00000000);
+    const __m256i one_two_bytes_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_fffff800), v_00000000);
+    const __m256i two_bytes_bytemask =
+        _mm256_xor_si256(one_two_bytes_bytemask, ascii_bytes_bytemask);
+    const __m256i one_two_three_bytes_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+    const __m256i three_bytes_bytemask =
+        _mm256_xor_si256(one_two_three_bytes_bytemask, one_two_bytes_bytemask);
+    const uint32_t ascii_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(ascii_bytes_bytemask));
+    const uint32_t two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(two_bytes_bytemask));
+    const uint32_t three_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(three_bytes_bytemask));
 
     size_t ascii_count = count_ones(ascii_bytes_bitmask) / 4;
     size_t two_bytes_count = count_ones(two_bytes_bitmask) / 4;
     size_t three_bytes_count = count_ones(three_bytes_bitmask) / 4;
-    count += 32 - 3*ascii_count - 2*two_bytes_count - three_bytes_count;
+    count += 32 - 3 * ascii_count - 2 * two_bytes_count - three_bytes_count;
   }
-  return count + scalar::utf32::utf8_length_from_utf32(input + pos, length - pos);
+  return count +
+         scalar::utf32::utf8_length_from_utf32(input + pos, length - pos);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf32(const char32_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
   const __m256i v_00000000 = _mm256_setzero_si256();
   const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
   size_t pos = 0;
   size_t count = 0;
-  for(;pos + 8 <= length; pos += 8) {
-    __m256i in = _mm256_loadu_si256((__m256i*)(input + pos));
-    const __m256i surrogate_bytemask = _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
-    const uint32_t surrogate_bitmask = static_cast<uint32_t>(_mm256_movemask_epi8(surrogate_bytemask));
-    size_t surrogate_count = (32-count_ones(surrogate_bitmask))/4;
+  for (; pos + 8 <= length; pos += 8) {
+    __m256i in = _mm256_loadu_si256((__m256i *)(input + pos));
+    const __m256i surrogate_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+    const uint32_t surrogate_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(surrogate_bytemask));
+    size_t surrogate_count = (32 - count_ones(surrogate_bitmask)) / 4;
     count += 8 + surrogate_count;
   }
-  return count + scalar::utf32::utf16_length_from_utf32(input + pos, length - pos);
+  return count +
+         scalar::utf32::utf16_length_from_utf32(input + pos, length - pos);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *input, size_t length) const noexcept {
   return utf8::count_code_points(input, length);
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept {
-  return (options & base64_url) ? compress_decode_base64<true>(output, input, length, options) : compress_decode_base64<false>(output, input, length, options);
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept {
-  return (options & base64_url) ? compress_decode_base64<true>(output, input, length, options) : compress_decode_base64<false>(output, input, length, options);
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
-simdutf_warn_unused size_t implementation::base64_length_from_binary(size_t length, base64_options options) const noexcept {
+simdutf_warn_unused size_t implementation::base64_length_from_binary(
+    size_t length, base64_options options) const noexcept {
   return scalar::base64::base64_length_from_binary(length, options);
 }
 
-size_t implementation::binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept {
-  if(options & base64_url) {
+size_t implementation::binary_to_base64(const char *input, size_t length,
+                                        char *output,
+                                        base64_options options) const noexcept {
+  if (options & base64_url) {
     return encode_base64<true>(output, input, length, options);
   } else {
     return encode_base64<false>(output, input, length, options);
@@ -29807,7 +31102,8 @@ SIMDUTF_UNTARGET_REGION
 #endif
 
 
-#if SIMDUTF_GCC11ORMORE // workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+#if SIMDUTF_GCC11ORMORE // workaround for
+                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
 SIMDUTF_POP_DISABLE_WARNINGS
 #endif // end of workaround
 /* end file src/simdutf/haswell/end.h */
@@ -29828,28 +31124,39 @@ namespace simdutf {
 namespace ppc64 {
 namespace {
 #ifndef SIMDUTF_PPC64_H
-#error "ppc64.h must be included"
+  #error "ppc64.h must be included"
 #endif
 using namespace simd;
 
-
-simdutf_really_inline bool is_ascii(const simd8x64<uint8_t>& input) {
+simdutf_really_inline bool is_ascii(const simd8x64<uint8_t> &input) {
   // careful: 0x80 is not ascii.
   return input.reduce_or().saturating_sub(0b01111111u).bits_not_set_anywhere();
 }
 
-simdutf_unused simdutf_really_inline simd8<bool> must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2, const simd8<uint8_t> prev3) {
-  simd8<uint8_t> is_second_byte = prev1.saturating_sub(0b11000000u-1); // Only 11______ will be > 0
-  simd8<uint8_t> is_third_byte  = prev2.saturating_sub(0b11100000u-1); // Only 111_____ will be > 0
-  simd8<uint8_t> is_fourth_byte = prev3.saturating_sub(0b11110000u-1); // Only 1111____ will be > 0
-  // Caller requires a bool (all 1's). All values resulting from the subtraction will be <= 64, so signed comparison is fine.
-  return simd8<int8_t>(is_second_byte | is_third_byte | is_fourth_byte) > int8_t(0);
-}
-
-simdutf_really_inline simd8<bool> must_be_2_3_continuation(const simd8<uint8_t> prev2, const simd8<uint8_t> prev3) {
-  simd8<uint8_t> is_third_byte  = prev2.saturating_sub(0xe0u-0x80); // Only 111_____ will be >= 0x80
-  simd8<uint8_t> is_fourth_byte = prev3.saturating_sub(0xf0u-0x80); // Only 1111____ will be >= 0x80
-  // Caller requires a bool (all 1's). All values resulting from the subtraction will be <= 64, so signed comparison is fine.
+simdutf_unused simdutf_really_inline simd8<bool>
+must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2,
+                     const simd8<uint8_t> prev3) {
+  simd8<uint8_t> is_second_byte =
+      prev1.saturating_sub(0b11000000u - 1); // Only 11______ will be > 0
+  simd8<uint8_t> is_third_byte =
+      prev2.saturating_sub(0b11100000u - 1); // Only 111_____ will be > 0
+  simd8<uint8_t> is_fourth_byte =
+      prev3.saturating_sub(0b11110000u - 1); // Only 1111____ will be > 0
+  // Caller requires a bool (all 1's). All values resulting from the subtraction
+  // will be <= 64, so signed comparison is fine.
+  return simd8<int8_t>(is_second_byte | is_third_byte | is_fourth_byte) >
+         int8_t(0);
+}
+
+simdutf_really_inline simd8<bool>
+must_be_2_3_continuation(const simd8<uint8_t> prev2,
+                         const simd8<uint8_t> prev3) {
+  simd8<uint8_t> is_third_byte =
+      prev2.saturating_sub(0xe0u - 0x80); // Only 111_____ will be >= 0x80
+  simd8<uint8_t> is_fourth_byte =
+      prev3.saturating_sub(0xf0u - 0x80); // Only 1111____ will be >= 0x80
+  // Caller requires a bool (all 1's). All values resulting from the subtraction
+  // will be <= 64, so signed comparison is fine.
   return simd8<bool>(is_third_byte | is_fourth_byte);
 }
 
@@ -29862,9 +31169,9 @@ namespace simdutf {
 namespace ppc64 {
 namespace {
 
-// Walks through a buffer in block-sized increments, loading the last part with spaces
-template<size_t STEP_SIZE>
-struct buf_block_reader {
+// Walks through a buffer in block-sized increments, loading the last part with
+// spaces
+template <size_t STEP_SIZE> struct buf_block_reader {
 public:
   simdutf_really_inline buf_block_reader(const uint8_t *_buf, size_t _len);
   simdutf_really_inline size_t block_index();
@@ -29873,14 +31180,16 @@ struct buf_block_reader {
   /**
    * Get the last block, padded with spaces.
    *
-   * There will always be a last block, with at least 1 byte, unless len == 0 (in which case this
-   * function fills the buffer with spaces and returns 0. In particular, if len == STEP_SIZE there
-   * will be 0 full_blocks and 1 remainder block with STEP_SIZE bytes and no spaces for padding.
+   * There will always be a last block, with at least 1 byte, unless len == 0
+   * (in which case this function fills the buffer with spaces and returns 0).
+   * In particular, if len == STEP_SIZE there will be 0 full_blocks and 1
+   * remainder block with STEP_SIZE bytes and no spaces for padding.
    *
    * @return the number of effective characters in the last block.
    */
   simdutf_really_inline size_t get_remainder(uint8_t *dst) const;
   simdutf_really_inline void advance();
+
 private:
   const uint8_t *buf;
   const size_t len;
@@ -29889,9 +31198,10 @@ struct buf_block_reader {
 };
 
 // Routines to print masks and text for debugging bitmask operations
-simdutf_unused static char * format_input_text_64(const uint8_t *text) {
-  static char *buf = reinterpret_cast<char*>(malloc(sizeof(simd8x64<uint8_t>) + 1));
-  for (size_t i=0; i<sizeof(simd8x64<uint8_t>); i++) {
+simdutf_unused static char *format_input_text_64(const uint8_t *text) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
     buf[i] = int8_t(text[i]) < ' ' ? '_' : int8_t(text[i]);
   }
   buf[sizeof(simd8x64<uint8_t>)] = '\0';
@@ -29899,50 +31209,64 @@ simdutf_unused static char * format_input_text_64(const uint8_t *text) {
 }
 
 // Routines to print masks and text for debugging bitmask operations
-simdutf_unused static char * format_input_text(const simd8x64<uint8_t>& in) {
-  static char *buf = reinterpret_cast<char*>(malloc(sizeof(simd8x64<uint8_t>) + 1));
-  in.store(reinterpret_cast<uint8_t*>(buf));
-  for (size_t i=0; i<sizeof(simd8x64<uint8_t>); i++) {
-    if (buf[i] < ' ') { buf[i] = '_'; }
+simdutf_unused static char *format_input_text(const simd8x64<uint8_t> &in) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  in.store(reinterpret_cast<uint8_t *>(buf));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
+    if (buf[i] < ' ') {
+      buf[i] = '_';
+    }
   }
   buf[sizeof(simd8x64<uint8_t>)] = '\0';
   return buf;
 }
 
-simdutf_unused static char * format_mask(uint64_t mask) {
-  static char *buf = reinterpret_cast<char*>(malloc(64 + 1));
-  for (size_t i=0; i<64; i++) {
+simdutf_unused static char *format_mask(uint64_t mask) {
+  static char *buf = reinterpret_cast<char *>(malloc(64 + 1));
+  for (size_t i = 0; i < 64; i++) {
     buf[i] = (mask & (size_t(1) << i)) ? 'X' : ' ';
   }
   buf[64] = '\0';
   return buf;
 }
 
-template<size_t STEP_SIZE>
-simdutf_really_inline buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len) : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE}, idx{0} {}
+template <size_t STEP_SIZE>
+simdutf_really_inline
+buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len)
+    : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE},
+      idx{0} {}
 
-template<size_t STEP_SIZE>
-simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() { return idx; }
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() {
+  return idx;
+}
 
-template<size_t STEP_SIZE>
+template <size_t STEP_SIZE>
 simdutf_really_inline bool buf_block_reader<STEP_SIZE>::has_full_block() const {
   return idx < lenminusstep;
 }
 
-template<size_t STEP_SIZE>
-simdutf_really_inline const uint8_t *buf_block_reader<STEP_SIZE>::full_block() const {
+template <size_t STEP_SIZE>
+simdutf_really_inline const uint8_t *
+buf_block_reader<STEP_SIZE>::full_block() const {
   return &buf[idx];
 }
 
-template<size_t STEP_SIZE>
-simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
-  if(len == idx) { return 0; } // memcpy(dst, null, 0) will trigger an error with some sanitizers
-  std::memset(dst, 0x20, STEP_SIZE); // std::memset STEP_SIZE because it is more efficient to write out 8 or 16 bytes at once.
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t
+buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
+  if (len == idx) {
+    return 0;
+  } // memcpy(dst, null, 0) will trigger an error with some sanitizers
+  std::memset(dst, 0x20,
+              STEP_SIZE); // std::memset STEP_SIZE because it is more efficient
+                          // to write out 8 or 16 bytes at once.
   std::memcpy(dst, buf + idx, len - idx);
   return len - idx;
 }
 
-template<size_t STEP_SIZE>
+template <size_t STEP_SIZE>
 simdutf_really_inline void buf_block_reader<STEP_SIZE>::advance() {
   idx += STEP_SIZE;
 }
@@ -29959,38 +31283,39 @@ namespace utf8_validation {
 
 using namespace simd;
 
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -30000,137 +31325,173 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       TOO_SHORT | OVERLONG_3 | SURROGATE,
       // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      CARRY | TOO_LARGE,
-      // ____0101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____011_ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-
-      // ____1___ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____1101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
-  simdutf_really_inline simd8<uint8_t> check_multibyte_lengths(const simd8<uint8_t> input,
-      const simd8<uint8_t> prev_input, const simd8<uint8_t> sc) {
-    simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-    simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-    simd8<uint8_t> must23 = simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-    simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-    return must23_80 ^ sc;
-  }
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+//
+// Return nonzero if there are incomplete multibyte characters at the end of the
+// block: e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+//
+simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
+  // If the previous input's last 3 bytes match this, they're too short (they
+  // ended at EOF):
+  // ... 1111____ 111_____ 11______
+  static const uint8_t max_array[32] = {255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        0b11110000u - 1,
+                                        0b11100000u - 1,
+                                        0b11000000u - 1};
+  const simd8<uint8_t> max_value(
+      &max_array[sizeof(max_array) - sizeof(simd8<uint8_t>)]);
+  return input.gt_bits(max_value);
+}
+
+struct utf8_checker {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+  // The last input we received
+  simd8<uint8_t> prev_input_block;
+  // Whether the last input we received was incomplete (used for ASCII fast
+  // path)
+  simd8<uint8_t> prev_incomplete;
 
   //
-  // Return nonzero if there are incomplete multibyte characters at the end of the block:
-  // e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+  // Check whether the current bytes are valid UTF-8.
   //
-  simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
-    // If the previous input's last 3 bytes match this, they're too short (they ended at EOF):
-    // ... 1111____ 111_____ 11______
-    static const uint8_t max_array[32] = {
-      255, 255, 255, 255, 255, 255, 255, 255,
-      255, 255, 255, 255, 255, 255, 255, 255,
-      255, 255, 255, 255, 255, 255, 255, 255,
-      255, 255, 255, 255, 255, 0b11110000u-1, 0b11100000u-1, 0b11000000u-1
-    };
-    const simd8<uint8_t> max_value(&max_array[sizeof(max_array)-sizeof(simd8<uint8_t>)]);
-    return input.gt_bits(max_value);
-  }
-
-  struct utf8_checker {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
-    // The last input we received
-    simd8<uint8_t> prev_input_block;
-    // Whether the last input we received was incomplete (used for ASCII fast path)
-    simd8<uint8_t> prev_incomplete;
-
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      simd8<uint8_t> sc = check_special_cases(input, prev1);
-      this->error |= check_multibyte_lengths(input, prev_input, sc);
-    }
-
-    // The only problem that can happen at EOF is that a multibyte character is too short
-    // or a byte value too large in the last bytes: check_special_cases only checks for bytes
-    // too large in the first of two bytes.
-    simdutf_really_inline void check_eof() {
-      // If the previous block had incomplete UTF-8 characters at the end, an ASCII block can't
-      // possibly finish them.
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  // The only problem that can happen at EOF is that a multibyte character is
+  // too short or a byte value too large in the last bytes: check_special_cases
+  // only checks for bytes too large in the first of two bytes.
+  simdutf_really_inline void check_eof() {
+    // If the previous block had incomplete UTF-8 characters at the end, an
+    // ASCII block can't possibly finish them.
+    this->error |= this->prev_incomplete;
+  }
+
+  simdutf_really_inline void check_next_input(const simd8x64<uint8_t> &input) {
+    if (simdutf_likely(is_ascii(input))) {
       this->error |= this->prev_incomplete;
-    }
-
-    simdutf_really_inline void check_next_input(const simd8x64<uint8_t>& input) {
-      if(simdutf_likely(is_ascii(input))) {
-        this->error |= this->prev_incomplete;
-      } else {
-        // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-        static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        this->prev_incomplete = is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS-1]);
-        this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS-1];
-
+    } else {
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                        (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+                    "We support either two or four chunks per 64-byte block.");
+      if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+      } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+        this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
       }
+      this->prev_incomplete =
+          is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1]);
+      this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1];
     }
+  }
 
-    // do not forget to call check_eof!
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
-    }
+  // do not forget to call check_eof!
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-  }; // struct utf8_checker
+}; // struct utf8_checker
 } // namespace utf8_validation
 
 using utf8_validation::utf8_checker;
@@ -30148,127 +31509,475 @@ namespace utf8_validation {
 /**
  * Validates that the string is actual UTF-8.
  */
-template<class checker>
-bool generic_validate_utf8(const uint8_t * input, size_t length) {
-    checker c{};
-    buf_block_reader<64> reader(input, length);
-    while (reader.has_full_block()) {
-      simd::simd8x64<uint8_t> in(reader.full_block());
-      c.check_next_input(in);
-      reader.advance();
-    }
-    uint8_t block[64]{};
-    reader.get_remainder(block);
-    simd::simd8x64<uint8_t> in(block);
+template <class checker>
+bool generic_validate_utf8(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
     c.check_next_input(in);
     reader.advance();
-    c.check_eof();
-    return !c.errors();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  return !c.errors();
 }
 
-bool generic_validate_utf8(const char * input, size_t length) {
-  return generic_validate_utf8<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
+bool generic_validate_utf8(const char *input, size_t length) {
+  return generic_validate_utf8<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
 /**
  * Validates that the string is actual UTF-8 and stops on errors.
  */
-template<class checker>
-result generic_validate_utf8_with_errors(const uint8_t * input, size_t length) {
-    checker c{};
-    buf_block_reader<64> reader(input, length);
-    size_t count{0};
-    while (reader.has_full_block()) {
-      simd::simd8x64<uint8_t> in(reader.full_block());
-      c.check_next_input(in);
-      if(c.errors()) {
-        if (count != 0) { count--; } // Sometimes the error is only detected in the next chunk
-        result res = scalar::utf8::rewind_and_validate_with_errors(reinterpret_cast<const char*>(input), reinterpret_cast<const char*>(input + count), length - count);
-        res.count += count;
-        return res;
-      }
-      reader.advance();
-      count += 64;
-    }
-    uint8_t block[64]{};
-    reader.get_remainder(block);
-    simd::simd8x64<uint8_t> in(block);
+template <class checker>
+result generic_validate_utf8_with_errors(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  size_t count{0};
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
     c.check_next_input(in);
-    reader.advance();
-    c.check_eof();
     if (c.errors()) {
-      if (count != 0) { count--; } // Sometimes the error is only detected in the next chunk
-      result res = scalar::utf8::rewind_and_validate_with_errors(reinterpret_cast<const char*>(input), reinterpret_cast<const char*>(input) + count, length - count);
+      if (count != 0) {
+        count--;
+      } // Sometimes the error is only detected in the next chunk
+      result res = scalar::utf8::rewind_and_validate_with_errors(
+          reinterpret_cast<const char *>(input),
+          reinterpret_cast<const char *>(input + count), length - count);
       res.count += count;
       return res;
-    } else {
-      return result(error_code::SUCCESS, length);
-    }
-}
-
-result generic_validate_utf8_with_errors(const char * input, size_t length) {
-  return generic_validate_utf8_with_errors<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
-}
-
-template<class checker>
-bool generic_validate_ascii(const uint8_t * input, size_t length) {
-    buf_block_reader<64> reader(input, length);
-    uint8_t blocks[64]{};
-    simd::simd8x64<uint8_t> running_or(blocks);
-    while (reader.has_full_block()) {
-      simd::simd8x64<uint8_t> in(reader.full_block());
-      running_or |= in;
-      reader.advance();
     }
-    uint8_t block[64]{};
-    reader.get_remainder(block);
-    simd::simd8x64<uint8_t> in(block);
-    running_or |= in;
-    return running_or.is_ascii();
-}
-
-bool generic_validate_ascii(const char * input, size_t length) {
-  return generic_validate_ascii<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
-}
-
-template<class checker>
-result generic_validate_ascii_with_errors(const uint8_t * input, size_t length) {
-  buf_block_reader<64> reader(input, length);
-  size_t count{0};
-  while (reader.has_full_block()) {
-    simd::simd8x64<uint8_t> in(reader.full_block());
-    if (!in.is_ascii()) {
-      result res = scalar::ascii::validate_with_errors(reinterpret_cast<const char*>(input + count), length - count);
-      return result(res.error, count + res.count);
+    reader.advance();
+    count += 64;
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  if (c.errors()) {
+    if (count != 0) {
+      count--;
+    } // Sometimes the error is only detected in the next chunk
+    result res = scalar::utf8::rewind_and_validate_with_errors(
+        reinterpret_cast<const char *>(input),
+        reinterpret_cast<const char *>(input) + count, length - count);
+    res.count += count;
+    return res;
+  } else {
+    return result(error_code::SUCCESS, length);
+  }
+}
+
+result generic_validate_utf8_with_errors(const char *input, size_t length) {
+  return generic_validate_utf8_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
+}
+
+template <class checker>
+bool generic_validate_ascii(const uint8_t *input, size_t length) {
+  buf_block_reader<64> reader(input, length);
+  uint8_t blocks[64]{};
+  simd::simd8x64<uint8_t> running_or(blocks);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    running_or |= in;
+    reader.advance();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  running_or |= in;
+  return running_or.is_ascii();
+}
+
+bool generic_validate_ascii(const char *input, size_t length) {
+  return generic_validate_ascii<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
+}
+
+template <class checker>
+result generic_validate_ascii_with_errors(const uint8_t *input, size_t length) {
+  buf_block_reader<64> reader(input, length);
+  size_t count{0};
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    if (!in.is_ascii()) {
+      result res = scalar::ascii::validate_with_errors(
+          reinterpret_cast<const char *>(input + count), length - count);
+      return result(res.error, count + res.count);
+    }
+    reader.advance();
+
+    count += 64;
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  if (!in.is_ascii()) {
+    result res = scalar::ascii::validate_with_errors(
+        reinterpret_cast<const char *>(input + count), length - count);
+    return result(res.error, count + res.count);
+  } else {
+    return result(error_code::SUCCESS, length);
+  }
+}
+
+result generic_validate_ascii_with_errors(const char *input, size_t length) {
+  return generic_validate_ascii_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
+}
+
+} // namespace utf8_validation
+} // unnamed namespace
+} // namespace ppc64
+} // namespace simdutf
+/* end file src/generic/utf8_validation/utf8_validator.h */
+// transcoding from UTF-8 to UTF-16
+/* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
+
+namespace simdutf {
+namespace ppc64 {
+namespace {
+namespace utf8_to_utf16 {
+using namespace simd;
+
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      TOO_SHORT,
+      // 1110____ ________ <three byte lead in byte 1>
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
+
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  template <endianness endian>
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the eighth
+    // last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany = scalar::utf8_to_utf16::convert<endian>(
+          in + pos, size - pos, utf16_output);
+      if (howmany == 0) {
+        return 0;
+      }
+      utf16_output += howmany;
+    }
+    return utf16_output - start;
+  }
+
+  template <endianness endian>
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the eighth
+    // last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res =
+              scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+                  pos, in + pos, size - pos, utf16_output);
+          res.count += pos;
+          return res;
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf16_output += res.count;
+      }
     }
-    reader.advance();
-
-    count += 64;
-  }
-  uint8_t block[64]{};
-  reader.get_remainder(block);
-  simd::simd8x64<uint8_t> in(block);
-  if (!in.is_ascii()) {
-    result res = scalar::ascii::validate_with_errors(reinterpret_cast<const char*>(input + count), length - count);
-    return result(res.error, count + res.count);
-  } else {
-    return result(error_code::SUCCESS, length);
+    return result(error_code::SUCCESS, utf16_output - start);
   }
-}
 
-result generic_validate_ascii_with_errors(const char * input, size_t length) {
-  return generic_validate_ascii_with_errors<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
-}
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-} // namespace utf8_validation
+}; // struct validating_transcoder
+} // namespace utf8_to_utf16
 } // unnamed namespace
 } // namespace ppc64
 } // namespace simdutf
-/* end file src/generic/utf8_validation/utf8_validator.h */
-// transcoding from UTF-8 to UTF-16
+/* end file src/generic/utf8_to_utf16/utf8_to_utf16.h */
 /* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
 
-
 namespace simdutf {
 namespace ppc64 {
 namespace {
@@ -30277,36 +31986,39 @@ namespace utf8_to_utf16 {
 using namespace simd;
 
 template <endianness endian>
-simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
-    char16_t* utf16_output) noexcept {
-  // The implementation is not specific to haswell and should be moved to the generic directory.
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char16_t *utf16_output) noexcept {
+  // The implementation is not specific to haswell and should be moved to the
+  // generic directory.
   size_t pos = 0;
-  char16_t* start{utf16_output};
+  char16_t *start{utf16_output};
   const size_t safety_margin = 16; // to avoid overruns!
-  while(pos + 64 + safety_margin <= size) {
-    // this loop could be unrolled further. For example, we could process the mask
-    // far more than 64 bytes.
+  while (pos + 64 + safety_margin <= size) {
+    // this loop could be unrolled further. For example, we could process the
+    // mask far more than 64 bytes.
     simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
-    if(in.is_ascii()) {
+    if (in.is_ascii()) {
       in.store_ascii_as_utf16<endian>(utf16_output);
       utf16_output += 64;
       pos += 64;
     } else {
-      // Slow path. We hope that the compiler will recognize that this is a slow path.
-      // Anything that is not a continuation mask is a 'leading byte', that is, the
-      // start of a new code point.
+      // Slow path. We hope that the compiler will recognize that this is a slow
+      // path. Anything that is not a continuation byte is a 'leading byte',
+      // that is, the start of a new code point.
       uint64_t utf8_continuation_mask = in.lt(-65 + 1);
-      // -65 is 0b10111111 in two-complement's, so largest possible continuation byte
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
       uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-      // The *start* of code points is not so useful, rather, we want the *end* of code points.
-      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
+      // The *start* of code points is not so useful, rather, we want the *end*
+      // of code points.
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
       // We process in blocks of up to 12 bytes except possibly
       // for fast paths which may process up to 16 bytes. For the
       // slow path to work, we should have at least 12 input bytes left.
       size_t max_starting_point = (pos + 64) - 12;
       // Next loop is going to run at least five times when using solely
       // the slow/regular path, and at least four times if there are fast paths.
-      while(pos < max_starting_point) {
+      while (pos < max_starting_point) {
         // Performance note: our ability to compute 'consumed' and
         // then shift and recompute is critical. If there is a
         // latency of, say, 4 cycles on getting 'consumed', then
@@ -30320,8 +32032,8 @@ simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
         // Thus we may allow convert_masked_utf8_to_utf16 to process
         // more bytes at a time under a fast-path mode where 16 bytes
         // are consumed at once (e.g., when encountering ASCII).
-        size_t consumed = convert_masked_utf8_to_utf16<endian>(input + pos,
-                            utf8_end_of_code_point_mask, utf16_output);
+        size_t consumed = convert_masked_utf8_to_utf16<endian>(
+            input + pos, utf8_end_of_code_point_mask, utf16_output);
         pos += consumed;
         utf8_end_of_code_point_mask >>= consumed;
       }
@@ -30331,7 +32043,8 @@ simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
       // 85% to 90% efficiency.
     }
   }
-  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(input + pos, size - pos, utf16_output);
+  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(
+      input + pos, size - pos, utf16_output);
   return utf16_output - start;
 }
 
@@ -30340,48 +32053,48 @@ simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
 } // namespace ppc64
 } // namespace simdutf
 /* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
-/* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
-
+// transcoding from UTF-8 to UTF-32
+/* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
 
 namespace simdutf {
 namespace ppc64 {
 namespace {
-namespace utf8_to_utf16 {
+namespace utf8_to_utf32 {
 using namespace simd;
 
-
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -30391,265 +32104,277 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       TOO_SHORT | OVERLONG_3 | SURROGATE,
       // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      CARRY | TOO_LARGE,
-      // ____0101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____011_ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-
-      // ____1___ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____1101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
-  simdutf_really_inline simd8<uint8_t> check_multibyte_lengths(const simd8<uint8_t> input,
-      const simd8<uint8_t> prev_input, const simd8<uint8_t> sc) {
-    simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-    simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-    simd8<uint8_t> must23 = simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-    simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-    return must23_80 ^ sc;
-  }
-
-
-  struct validating_transcoder {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
-
-    validating_transcoder() : error(uint8_t(0)) {}
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      simd8<uint8_t> sc = check_special_cases(input, prev1);
-      this->error |= check_multibyte_lengths(input, prev_input, sc);
-    }
-
-
-    template <endianness endian>
-    simdutf_really_inline size_t convert(const char* in, size_t size, char16_t* utf16_output) {
-      size_t pos = 0;
-      char16_t* start{utf16_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf16<endian>(utf16_output);
-          utf16_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if(utf8_continuation_mask & 1) {
-            return 0; // error
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf16<endian>(in + pos,
-                            utf8_end_of_code_point_mask, utf16_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 words when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then in[margin] is the eighth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-      }
-      if(errors()) { return 0; }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_utf16::convert<endian>(in + pos, size - pos, utf16_output);
-        if(howmany == 0) { return 0; }
-        utf16_output += howmany;
-      }
-      return utf16_output - start;
-    }
-
-    template <endianness endian>
-    simdutf_really_inline result convert_with_errors(const char* in, size_t size, char16_t* utf16_output) {
-      size_t pos = 0;
-      char16_t* start{utf16_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf16<endian>(utf16_output);
-          utf16_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if (errors() || (utf8_continuation_mask & 1)) {
-            // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-            // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-            result res = scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(pos, in + pos, size - pos, utf16_output);
-            res.count += pos;
-            return res;
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf16<endian>(in + pos,
-                            utf8_end_of_code_point_mask, utf16_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // we have an error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
         }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      if(errors()) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(pos, in + pos, size - pos, utf16_output);
-        res.count += pos;
-        return res;
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
+      if (howmany == 0) {
+        return 0;
       }
-      if(pos < size) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(pos, in + pos, size - pos, utf16_output);
-        if (res.error) {    // In case of error, we want the error position
+      utf32_output += howmany;
+    }
+    return utf32_output - start;
+  }
+
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then in[margin] is the eighth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, utf32_output);
           res.count += pos;
           return res;
-        } else {    // In case of success, we want the number of word written
-          utf16_output += res.count;
         }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      return result(error_code::SUCCESS, utf16_output - start);
     }
-
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
+    if (errors()) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf32_output += res.count;
+      }
     }
+    return result(error_code::SUCCESS, utf32_output - start);
+  }
+
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-  }; // struct utf8_checker
-} // utf8_to_utf16 namespace
+}; // struct validating_transcoder
+} // namespace utf8_to_utf32
 } // unnamed namespace
 } // namespace ppc64
 } // namespace simdutf
-/* end file src/generic/utf8_to_utf16/utf8_to_utf16.h */
-// transcoding from UTF-8 to UTF-32
+/* end file src/generic/utf8_to_utf32/utf8_to_utf32.h */
 /* begin file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
 
 namespace simdutf {
@@ -30659,385 +32384,43 @@ namespace utf8_to_utf32 {
 
 using namespace simd;
 
-
-simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
-    char32_t* utf32_output) noexcept {
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char32_t *utf32_output) noexcept {
   size_t pos = 0;
-  char32_t* start{utf32_output};
+  char32_t *start{utf32_output};
   const size_t safety_margin = 16; // to avoid overruns!
-  while(pos + 64 + safety_margin <= size) {
+  while (pos + 64 + safety_margin <= size) {
     simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
-    if(in.is_ascii()) {
+    if (in.is_ascii()) {
       in.store_ascii_as_utf32(utf32_output);
       utf32_output += 64;
       pos += 64;
     } else {
-    // -65 is 0b10111111 in two-complement's, so largest possible continuation byte
-    uint64_t utf8_continuation_mask = in.lt(-65 + 1);
-    uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-    uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-    size_t max_starting_point = (pos + 64) - 12;
-    while(pos < max_starting_point) {
-      size_t consumed = convert_masked_utf8_to_utf32(input + pos,
-                          utf8_end_of_code_point_mask, utf32_output);
-      pos += consumed;
-      utf8_end_of_code_point_mask >>= consumed;
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      size_t max_starting_point = (pos + 64) - 12;
+      while (pos < max_starting_point) {
+        size_t consumed = convert_masked_utf8_to_utf32(
+            input + pos, utf8_end_of_code_point_mask, utf32_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
       }
     }
   }
-  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos, utf32_output);
+  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos,
+                                                       utf32_output);
   return utf32_output - start;
 }
 
-
 } // namespace utf8_to_utf32
 } // unnamed namespace
 } // namespace ppc64
 } // namespace simdutf
 /* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
-/* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
-
-
-namespace simdutf {
-namespace ppc64 {
-namespace {
-namespace utf8_to_utf32 {
-using namespace simd;
-
-
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
-      // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      // 10______ ________ <continuation in byte 1>
-      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
-      // 1100____ ________ <two byte lead in byte 1>
-      TOO_SHORT | OVERLONG_2,
-      // 1101____ ________ <two byte lead in byte 1>
-      TOO_SHORT,
-      // 1110____ ________ <three byte lead in byte 1>
-      TOO_SHORT | OVERLONG_3 | SURROGATE,
-      // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      CARRY | TOO_LARGE,
-      // ____0101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____011_ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-
-      // ____1___ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____1101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
-      // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-
-      // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
-      // ________ 1001____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
-      // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-
-      // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
-  simdutf_really_inline simd8<uint8_t> check_multibyte_lengths(const simd8<uint8_t> input,
-      const simd8<uint8_t> prev_input, const simd8<uint8_t> sc) {
-    simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-    simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-    simd8<uint8_t> must23 = simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-    simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-    return must23_80 ^ sc;
-  }
-
-
-  struct validating_transcoder {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
-
-    validating_transcoder() : error(uint8_t(0)) {}
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      simd8<uint8_t> sc = check_special_cases(input, prev1);
-      this->error |= check_multibyte_lengths(input, prev_input, sc);
-    }
-
-
-
-    simdutf_really_inline size_t convert(const char* in, size_t size, char32_t* utf32_output) {
-      size_t pos = 0;
-      char32_t* start{utf32_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 words when calling convert_masked_utf8_to_utf32. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 16 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the fourth last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf32(utf32_output);
-          utf32_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if(utf8_continuation_mask & 1) {
-            return 0; // we have an error
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf32(in + pos,
-                            utf8_end_of_code_point_mask, utf32_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
-        }
-      }
-      if(errors()) { return 0; }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
-        if(howmany == 0) { return 0; }
-        utf32_output += howmany;
-      }
-      return utf32_output - start;
-    }
-
-    simdutf_really_inline result convert_with_errors(const char* in, size_t size, char32_t* utf32_output) {
-      size_t pos = 0;
-      char32_t* start{utf32_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the fourth last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf32(utf32_output);
-          utf32_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if (errors() || (utf8_continuation_mask & 1)) {
-            result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(pos, in + pos, size - pos, utf32_output);
-            res.count += pos;
-            return res;
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf32(in + pos,
-                            utf8_end_of_code_point_mask, utf32_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
-        }
-      }
-      if(errors()) {
-        result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(pos, in + pos, size - pos, utf32_output);
-        res.count += pos;
-        return res;
-      }
-      if(pos < size) {
-        result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(pos, in + pos, size - pos, utf32_output);
-        if (res.error) {    // In case of error, we want the error position
-          res.count += pos;
-          return res;
-        } else {    // In case of success, we want the number of word written
-          utf32_output += res.count;
-        }
-      }
-      return result(error_code::SUCCESS, utf32_output - start);
-    }
-
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
-    }
-
-  }; // struct utf8_checker
-} // utf8_to_utf32 namespace
-} // unnamed namespace
-} // namespace ppc64
-} // namespace simdutf
-/* end file src/generic/utf8_to_utf32/utf8_to_utf32.h */
 // other functions
-/* begin file src/generic/utf8.h */
-
-namespace simdutf {
-namespace ppc64 {
-namespace {
-namespace utf8 {
-
-using namespace simd;
-
-simdutf_really_inline size_t count_code_points(const char* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    for(;pos + 64 <= size; pos += 64) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      uint64_t utf8_continuation_mask = input.gt(-65);
-      count += count_ones(utf8_continuation_mask);
-    }
-    return count + scalar::utf8::count_code_points(in + pos, size - pos);
-}
-
-simdutf_really_inline size_t utf16_length_from_utf8(const char* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    // This algorithm could no doubt be improved!
-    for(;pos + 64 <= size; pos += 64) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-      // We count one word for anything that is not a continuation (so
-      // leading bytes).
-      count += 64 - count_ones(utf8_continuation_mask);
-      int64_t utf8_4byte = input.gteq_unsigned(240);
-      count += count_ones(utf8_4byte);
-    }
-    return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
-}
-} // utf8 namespace
-} // unnamed namespace
-} // namespace ppc64
-} // namespace simdutf
-/* end file src/generic/utf8.h */
 /* begin file src/generic/utf16.h */
 namespace simdutf {
 namespace ppc64 {
@@ -31045,48 +32428,59 @@ namespace {
 namespace utf16 {
 
 template <endianness big_endian>
-simdutf_really_inline size_t count_code_points(const char16_t* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    for(;pos < size/32*32; pos += 32) {
-      simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-      if (!match_system(big_endian)) { input.swap_bytes(); }
-      uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
-      count += count_ones(not_pair) / 2;
+simdutf_really_inline size_t count_code_points(const char16_t *in,
+                                               size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
     }
-    return count + scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
+    uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
+    count += count_ones(not_pair) / 2;
+  }
+  return count +
+         scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
 }
 
 template <endianness big_endian>
-simdutf_really_inline size_t utf8_length_from_utf16(const char16_t* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    // This algorithm could no doubt be improved!
-    for(;pos < size/32*32; pos += 32) {
-      simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-      if (!match_system(big_endian)) { input.swap_bytes(); }
-      uint64_t ascii_mask = input.lteq(0x7F);
-      uint64_t twobyte_mask = input.lteq(0x7FF);
-      uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
-
-      size_t ascii_count = count_ones(ascii_mask) / 2;
-      size_t twobyte_count = count_ones(twobyte_mask & ~ ascii_mask) / 2;
-      size_t threebyte_count = count_ones(not_pair_mask & ~ twobyte_mask) / 2;
-      size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
-      count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count + ascii_count;
+simdutf_really_inline size_t utf8_length_from_utf16(const char16_t *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
     }
-    return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos, size - pos);
+    uint64_t ascii_mask = input.lteq(0x7F);
+    uint64_t twobyte_mask = input.lteq(0x7FF);
+    uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
+
+    size_t ascii_count = count_ones(ascii_mask) / 2;
+    size_t twobyte_count = count_ones(twobyte_mask & ~ascii_mask) / 2;
+    size_t threebyte_count = count_ones(not_pair_mask & ~twobyte_mask) / 2;
+    size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
+    count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count +
+             ascii_count;
+  }
+  return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos,
+                                                                   size - pos);
 }
 
 template <endianness big_endian>
-simdutf_really_inline size_t utf32_length_from_utf16(const char16_t* in, size_t size) {
-    return count_code_points<big_endian>(in, size);
+simdutf_really_inline size_t utf32_length_from_utf16(const char16_t *in,
+                                                     size_t size) {
+  return count_code_points<big_endian>(in, size);
 }
 
-simdutf_really_inline void change_endianness_utf16(const char16_t* in, size_t size, char16_t* output) {
+simdutf_really_inline void
+change_endianness_utf16(const char16_t *in, size_t size, char16_t *output) {
   size_t pos = 0;
 
-  while (pos < size/32*32) {
+  while (pos < size / 32 * 32) {
     simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
     input.swap_bytes();
     input.store(reinterpret_cast<uint16_t *>(output));
@@ -31097,11 +32491,52 @@ simdutf_really_inline void change_endianness_utf16(const char16_t* in, size_t si
   scalar::utf16::change_endianness_utf16(in + pos, size - pos, output);
 }
 
-} // utf16
+} // namespace utf16
 } // unnamed namespace
 } // namespace ppc64
 } // namespace simdutf
 /* end file src/generic/utf16.h */
+/* begin file src/generic/utf8.h */
+
+namespace simdutf {
+namespace ppc64 {
+namespace {
+namespace utf8 {
+
+using namespace simd;
+
+simdutf_really_inline size_t count_code_points(const char *in, size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.gt(-65);
+    count += count_ones(utf8_continuation_mask);
+  }
+  return count + scalar::utf8::count_code_points(in + pos, size - pos);
+}
+
+simdutf_really_inline size_t utf16_length_from_utf8(const char *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+    // We count one UTF-16 code unit for every byte that is not a
+    // continuation byte (that is, every ASCII or leading byte).
+    count += 64 - count_ones(utf8_continuation_mask);
+    uint64_t utf8_4byte = input.gteq_unsigned(240);
+    count += count_ones(utf8_4byte);
+  }
+  return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
+}
+} // namespace utf8
+} // unnamed namespace
+} // namespace ppc64
+} // namespace simdutf
+/* end file src/generic/utf8.h */
 
 //
 // Implementation-specific overrides
@@ -31109,317 +32544,428 @@ simdutf_really_inline void change_endianness_utf16(const char16_t* in, size_t si
 namespace simdutf {
 namespace ppc64 {
 
-simdutf_warn_unused int implementation::detect_encodings(const char * input, size_t length) const noexcept {
+simdutf_warn_unused int
+implementation::detect_encodings(const char *input,
+                                 size_t length) const noexcept {
   // If there is a BOM, then we trust it.
   auto bom_encoding = simdutf::BOM::check_bom(input, length);
-  if(bom_encoding != encoding_type::unspecified) { return bom_encoding; }
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
+  }
   // todo: reimplement as a one-pass algorithm.
   int out = 0;
-  if(validate_utf8(input, length)) { out |= encoding_type::UTF8; }
-  if((length % 2) == 0) {
-    if(validate_utf16(reinterpret_cast<const char16_t*>(input), length/2)) { out |= encoding_type::UTF16_LE; }
+  if (validate_utf8(input, length)) {
+    out |= encoding_type::UTF8;
+  }
+  if ((length % 2) == 0) {
+    if (validate_utf16(reinterpret_cast<const char16_t *>(input), length / 2)) {
+      out |= encoding_type::UTF16_LE;
+    }
   }
-  if((length % 4) == 0) {
-    if(validate_utf32(reinterpret_cast<const char32_t*>(input), length/4)) { out |= encoding_type::UTF32_LE; }
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      out |= encoding_type::UTF32_LE;
+    }
   }
 
   return out;
 }
 
-simdutf_warn_unused bool implementation::validate_utf8(const char *buf, size_t len) const noexcept {
-  return ppc64::utf8_validation::generic_validate_utf8(buf,len);
+simdutf_warn_unused bool
+implementation::validate_utf8(const char *buf, size_t len) const noexcept {
+  return ppc64::utf8_validation::generic_validate_utf8(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_utf8_with_errors(const char *buf, size_t len) const noexcept {
-  return ppc64::utf8_validation::generic_validate_utf8_with_errors(buf,len);
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return ppc64::utf8_validation::generic_validate_utf8_with_errors(buf, len);
 }
 
-simdutf_warn_unused bool implementation::validate_ascii(const char *buf, size_t len) const noexcept {
-  return ppc64::utf8_validation::generic_validate_ascii(buf,len);
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *buf, size_t len) const noexcept {
+  return ppc64::utf8_validation::generic_validate_ascii(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_ascii_with_errors(const char *buf, size_t len) const noexcept {
-  return ppc64::utf8_validation::generic_validate_ascii_with_errors(buf,len);
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return ppc64::utf8_validation::generic_validate_ascii_with_errors(buf, len);
 }
 
-simdutf_warn_unused bool implementation::validate_utf16le(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf16le(const char16_t *buf,
+                                 size_t len) const noexcept {
   return scalar::utf16::validate<endianness::LITTLE>(buf, len);
 }
 
-simdutf_warn_unused bool implementation::validate_utf16be(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf16be(const char16_t *buf,
+                                 size_t len) const noexcept {
   return scalar::utf16::validate<endianness::BIG>(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf16le_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
   return scalar::utf16::validate_with_errors<endianness::LITTLE>(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf16be_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
   return scalar::utf16::validate_with_errors<endianness::BIG>(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf32_with_errors(
+    const char32_t *buf, size_t len) const noexcept {
   return scalar::utf32::validate_with_errors(buf, len);
 }
 
-simdutf_warn_unused bool implementation::validate_utf32(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
   return scalar::utf32::validate(buf, len);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(const char* /*buf*/, size_t /*len*/, char16_t* /*utf16_output*/) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char * /*buf*/, size_t /*len*/,
+    char16_t * /*utf16_output*/) const noexcept {
   return 0; // stub
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(const char* /*buf*/, size_t /*len*/, char16_t* /*utf16_output*/) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char * /*buf*/, size_t /*len*/,
+    char16_t * /*utf16_output*/) const noexcept {
   return 0; // stub
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(const char* /*buf*/, size_t /*len*/, char16_t* /*utf16_output*/) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char * /*buf*/, size_t /*len*/,
+    char16_t * /*utf16_output*/) const noexcept {
   return result(error_code::OTHER, 0); // stub
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(const char* /*buf*/, size_t /*len*/, char16_t* /*utf16_output*/) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char * /*buf*/, size_t /*len*/,
+    char16_t * /*utf16_output*/) const noexcept {
   return result(error_code::OTHER, 0); // stub
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(const char* /*buf*/, size_t /*len*/, char16_t* /*utf16_output*/) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char * /*buf*/, size_t /*len*/,
+    char16_t * /*utf16_output*/) const noexcept {
   return 0; // stub
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(const char* /*buf*/, size_t /*len*/, char16_t* /*utf16_output*/) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char * /*buf*/, size_t /*len*/,
+    char16_t * /*utf16_output*/) const noexcept {
   return 0; // stub
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(const char* /*buf*/, size_t /*len*/, char32_t* /*utf16_output*/) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char * /*buf*/, size_t /*len*/,
+    char32_t * /*utf32_output*/) const noexcept {
   return 0; // stub
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(const char* /*buf*/, size_t /*len*/, char32_t* /*utf16_output*/) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char * /*buf*/, size_t /*len*/,
+    char32_t * /*utf32_output*/) const noexcept {
   return result(error_code::OTHER, 0); // stub
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(const char* /*buf*/, size_t /*len*/, char32_t* /*utf16_output*/) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char * /*buf*/, size_t /*len*/,
+    char32_t * /*utf32_output*/) const noexcept {
   return 0; // stub
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert<endianness::LITTLE>(buf, len, utf8_output);
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert<endianness::LITTLE>(buf, len,
+                                                            utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   return scalar::utf16_to_utf8::convert<endianness::BIG>(buf, len, utf8_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(buf, len, utf8_output);
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf8_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(buf, len, utf8_output);
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
+      buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_valid<endianness::LITTLE>(buf, len, utf8_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_valid<endianness::LITTLE>(buf, len,
+                                                                  utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_valid<endianness::BIG>(buf, len, utf8_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_valid<endianness::BIG>(buf, len,
+                                                               utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
   return scalar::utf32_to_utf8::convert(buf, len, utf8_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
   return scalar::utf32_to_utf8::convert_with_errors(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
   return scalar::utf32_to_utf8::convert_valid(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert<endianness::LITTLE>(buf, len,
+                                                             utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert<endianness::BIG>(buf, len,
+                                                          utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
+      buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_valid<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_valid<endianness::LITTLE>(
+      buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_valid<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_valid<endianness::BIG>(buf, len,
+                                                                utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert<endianness::LITTLE>(buf, len, utf32_output);
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert<endianness::LITTLE>(buf, len,
+                                                             utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert<endianness::BIG>(buf, len, utf32_output);
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert<endianness::BIG>(buf, len,
+                                                          utf32_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(buf, len, utf32_output);
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf32_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(buf, len, utf32_output);
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+      buf, len, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_valid<endianness::LITTLE>(buf, len, utf32_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_valid<endianness::LITTLE>(
+      buf, len, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_valid<endianness::BIG>(buf, len, utf32_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_valid<endianness::BIG>(buf, len,
+                                                                utf32_output);
 }
 
-void implementation::change_endianness_utf16(const char16_t * input, size_t length, char16_t * output) const noexcept {
+void implementation::change_endianness_utf16(const char16_t *input,
+                                             size_t length,
+                                             char16_t *output) const noexcept {
   scalar::utf16::change_endianness_utf16(input, length, output);
 }
 
-simdutf_warn_unused size_t implementation::count_utf16le(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::count_utf16le(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::utf16::count_code_points<endianness::LITTLE>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::count_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::count_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::utf16::count_code_points<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::count_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *input, size_t length) const noexcept {
   return utf8::count_code_points(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(const char16_t * input, size_t length) const noexcept {
-  return scalar::utf16::utf8_length_from_utf16<endianness::LITTLE>(input, length);
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::utf8_length_from_utf16<endianness::LITTLE>(input,
+                                                                   length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(const char16_t * input, size_t length) const noexcept {
-  return scalar::utf16::utf32_length_from_utf16<endianness::LITTLE>(input, length);
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::utf32_length_from_utf16<endianness::LITTLE>(input,
+                                                                    length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *input, size_t length) const noexcept {
   return scalar::utf8::utf16_length_from_utf8(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf32(const char32_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
   return scalar::utf32::utf8_length_from_utf32(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf32(const char32_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
   return scalar::utf32::utf16_length_from_utf32(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *input, size_t length) const noexcept {
   return scalar::utf8::count_code_points(input, length);
 }
 
-
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept {
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
   // skip trailing spaces
-  while(length > 0 && scalar::base64::is_ascii_white_space(input[length - 1])) {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
     length--;
   }
-  size_t equallocation = length; // location of the first padding character if any
+  size_t equallocation =
+      length; // location of the first padding character if any
   size_t equalsigns = 0;
-  if(length > 0 && input[length - 1] == '=') {
+  if (length > 0 && input[length - 1] == '=') {
     equallocation = length - 1;
     length -= 1;
     equalsigns++;
-    while(length > 0 && scalar::base64::is_ascii_white_space(input[length - 1])) {
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
       length--;
     }
-    if(length > 0 && input[length - 1] == '=') {
+    if (length > 0 && input[length - 1] == '=') {
       equallocation = length - 1;
       equalsigns++;
       length -= 1;
     }
   }
-  if(length == 0) {
-    if(equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation};;
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
     }
     return {SUCCESS, 0};
   }
-  result r = scalar::base64::base64_tail_decode(output, input, length, options);
-  if(r.error == error_code::SUCCESS && equalsigns > 0) {
+  result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
     // additional checks
-    if((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
   }
   return r;
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept {
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
   // skip trailing spaces
-  while(length > 0 && scalar::base64::is_ascii_white_space(input[length - 1])) {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
     length--;
   }
-  size_t equallocation = length; // location of the first padding character if any
+  size_t equallocation =
+      length; // location of the first padding character if any
   size_t equalsigns = 0;
-  if(length > 0 && input[length - 1] == '=') {
+  if (length > 0 && input[length - 1] == '=') {
     equallocation = length - 1;
     length -= 1;
     equalsigns++;
-    while(length > 0 && scalar::base64::is_ascii_white_space(input[length - 1])) {
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
       length--;
     }
-    if(length > 0 && input[length - 1] == '=') {
+    if (length > 0 && input[length - 1] == '=') {
       equallocation = length - 1;
       equalsigns++;
       length -= 1;
     }
   }
-  if(length == 0) {
-    if(equalsigns > 0) {
+  if (length == 0) {
+    if (equalsigns > 0) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
     return {SUCCESS, 0};
   }
-  result r = scalar::base64::base64_tail_decode(output, input, length, options);
-  if(r.error == error_code::SUCCESS && equalsigns > 0) {
+  result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
     // additional checks
-    if((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
   }
   return r;
 }
 
-simdutf_warn_unused size_t implementation::base64_length_from_binary(size_t length, base64_options options) const noexcept {
+simdutf_warn_unused size_t implementation::base64_length_from_binary(
+    size_t length, base64_options options) const noexcept {
   return scalar::base64::base64_length_from_binary(length, options);
 }
 
-size_t implementation::binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept {
+size_t implementation::binary_to_base64(const char *input, size_t length,
+                                        char *output,
+                                        base64_options options) const noexcept {
   return scalar::base64::binary_to_base64(input, length, output, options);
 }
 } // namespace ppc64
@@ -31450,7 +32996,7 @@ namespace simdutf {
 namespace rvv {
 namespace {
 #ifndef SIMDUTF_RVV_H
-#error "rvv.h must be included"
+  #error "rvv.h must be included"
 #endif
 
 } // unnamed namespace
@@ -31462,216 +33008,271 @@ namespace {
 //
 namespace simdutf {
 namespace rvv {
+/* begin file src/rvv/rvv_helpers.inl.cpp */
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static size_t
+rvv_utf32_store_utf16_m4(uint16_t *dst, vuint32m4_t utf32, size_t vl,
+                         vbool4_t m4even) {
+  /* convert [000000000000aaaa|aaaaaabbbbbbbbbb]
+   * to      [110111bbbbbbbbbb|110110aaaaaaaaaa] */
+  vuint32m4_t sur = __riscv_vsub_vx_u32m4(utf32, 0x10000, vl);
+  sur = __riscv_vor_vv_u32m4(__riscv_vsll_vx_u32m4(sur, 16, vl),
+                             __riscv_vsrl_vx_u32m4(sur, 10, vl), vl);
+  sur = __riscv_vand_vx_u32m4(sur, 0x3FF03FF, vl);
+  sur = __riscv_vor_vx_u32m4(sur, 0xDC00D800, vl);
+  /* merge 1 byte utf32 and 2 byte sur */
+  vbool8_t m4 = __riscv_vmsgtu_vx_u32m4_b8(utf32, 0xFFFF, vl);
+  vuint16m4_t utf32_16 = __riscv_vreinterpret_v_u32m4_u16m4(
+      __riscv_vmerge_vvm_u32m4(utf32, sur, m4, vl));
+  /* compress and store */
+  vbool4_t mOut = __riscv_vmor_mm_b4(
+      __riscv_vmsne_vx_u16m4_b4(utf32_16, 0, vl * 2), m4even, vl * 2);
+  vuint16m4_t vout = __riscv_vcompress_vm_u16m4(utf32_16, mOut, vl * 2);
+  vl = __riscv_vcpop_m_b4(mOut, vl * 2);
+  __riscv_vse16_v_u16m4(dst, simdutf_byteflip<bflip>(vout, vl), vl);
+  return vl;
+};
+/* end file src/rvv/rvv_helpers.inl.cpp */
 
 /* begin file src/rvv/rvv_length_from.inl.cpp */
 
-simdutf_warn_unused size_t implementation::count_utf16le(const char16_t *src, size_t len) const noexcept {
+simdutf_warn_unused size_t
+implementation::count_utf16le(const char16_t *src, size_t len) const noexcept {
   return utf32_length_from_utf16le(src, len);
 }
 
-simdutf_warn_unused size_t implementation::count_utf16be(const char16_t *src, size_t len) const noexcept {
+simdutf_warn_unused size_t
+implementation::count_utf16be(const char16_t *src, size_t len) const noexcept {
   return utf32_length_from_utf16be(src, len);
 }
 
-simdutf_warn_unused size_t implementation::count_utf8(const char *src, size_t len) const noexcept {
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *src, size_t len) const noexcept {
   return utf32_length_from_utf8(src, len);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf8(const char *src, size_t len) const noexcept {
+simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
+    const char *src, size_t len) const noexcept {
   return utf32_length_from_utf8(src, len);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf16(size_t len) const noexcept {
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf16(size_t len) const noexcept {
   return len;
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf32(size_t len) const noexcept {
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf32(size_t len) const noexcept {
   return len;
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_latin1(size_t len) const noexcept {
+simdutf_warn_unused size_t
+implementation::utf16_length_from_latin1(size_t len) const noexcept {
   return len;
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_latin1(size_t len) const noexcept {
+simdutf_warn_unused size_t
+implementation::utf32_length_from_latin1(size_t len) const noexcept {
   return len;
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf8(const char *src, size_t len) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *src, size_t len) const noexcept {
   size_t count = 0;
   for (size_t vl; len > 0; len -= vl, src += vl) {
     vl = __riscv_vsetvl_e8m8(len);
-    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t*)src, vl);
+    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
     vbool1_t mask = __riscv_vmsgt_vx_i8m8_b1(v, -65, vl);
     count += __riscv_vcpop_m_b1(mask, vl);
   }
   return count;
 }
 
-template<simdutf_ByteFlip bflip>
-simdutf_really_inline static size_t rvv_utf32_length_from_utf16(const char16_t *src, size_t len) {
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static size_t
+rvv_utf32_length_from_utf16(const char16_t *src, size_t len) {
   size_t count = 0;
   for (size_t vl; len > 0; len -= vl, src += vl) {
     vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t*)src, vl);
+    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
     v = simdutf_byteflip<bflip>(v, vl);
-    vbool2_t notHigh = __riscv_vmor_mm_b2(
-            __riscv_vmsgtu_vx_u16m8_b2(v, 0xDFFF, vl),
-            __riscv_vmsltu_vx_u16m8_b2(v, 0xDC00, vl), vl);
+    vbool2_t notHigh =
+        __riscv_vmor_mm_b2(__riscv_vmsgtu_vx_u16m8_b2(v, 0xDFFF, vl),
+                           __riscv_vmsltu_vx_u16m8_b2(v, 0xDC00, vl), vl);
     count += __riscv_vcpop_m_b2(notHigh, vl);
   }
   return count;
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(const char16_t *src, size_t len) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *src, size_t len) const noexcept {
   return rvv_utf32_length_from_utf16<simdutf_ByteFlip::NONE>(src, len);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(const char16_t *src, size_t len) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *src, size_t len) const noexcept {
   if (supports_zvbb())
     return rvv_utf32_length_from_utf16<simdutf_ByteFlip::ZVBB>(src, len);
   else
     return rvv_utf32_length_from_utf16<simdutf_ByteFlip::V>(src, len);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_latin1(const char *src, size_t len) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
+    const char *src, size_t len) const noexcept {
   size_t count = len;
   for (size_t vl; len > 0; len -= vl, src += vl) {
     vl = __riscv_vsetvl_e8m8(len);
-    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t*)src, vl);
+    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
     count += __riscv_vcpop_m_b1(__riscv_vmslt_vx_i8m8_b1(v, 0, vl), vl);
   }
   return count;
 }
 
-template<simdutf_ByteFlip bflip>
-simdutf_really_inline static size_t rvv_utf8_length_from_utf16(const char16_t *src, size_t len) {
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static size_t
+rvv_utf8_length_from_utf16(const char16_t *src, size_t len) {
   size_t count = 0;
   for (size_t vl; len > 0; len -= vl, src += vl) {
     vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t*)src, vl);
+    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
     v = simdutf_byteflip<bflip>(v, vl);
     vbool2_t m234 = __riscv_vmsgtu_vx_u16m8_b2(v, 0x7F, vl);
     vbool2_t m34 = __riscv_vmsgtu_vx_u16m8_b2(v, 0x7FF, vl);
-    vbool2_t notSur = __riscv_vmor_mm_b2(
-            __riscv_vmsltu_vx_u16m8_b2(v, 0xD800, vl),
-            __riscv_vmsgtu_vx_u16m8_b2(v, 0xDFFF, vl), vl);
+    vbool2_t notSur =
+        __riscv_vmor_mm_b2(__riscv_vmsltu_vx_u16m8_b2(v, 0xD800, vl),
+                           __riscv_vmsgtu_vx_u16m8_b2(v, 0xDFFF, vl), vl);
     vbool2_t m3 = __riscv_vmand_mm_b2(m34, notSur, vl);
     count += vl + __riscv_vcpop_m_b2(m234, vl) + __riscv_vcpop_m_b2(m3, vl);
   }
   return count;
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(const char16_t *src, size_t len) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *src, size_t len) const noexcept {
   return rvv_utf8_length_from_utf16<simdutf_ByteFlip::NONE>(src, len);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(const char16_t *src, size_t len) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *src, size_t len) const noexcept {
   if (supports_zvbb())
     return rvv_utf8_length_from_utf16<simdutf_ByteFlip::ZVBB>(src, len);
   else
     return rvv_utf8_length_from_utf16<simdutf_ByteFlip::V>(src, len);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf32(const char32_t *src, size_t len) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *src, size_t len) const noexcept {
   size_t count = 0;
   for (size_t vl; len > 0; len -= vl, src += vl) {
     vl = __riscv_vsetvl_e32m8(len);
-    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t*)src, vl);
+    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
     vbool4_t m234 = __riscv_vmsgtu_vx_u32m8_b4(v, 0x7F, vl);
-    vbool4_t m34  = __riscv_vmsgtu_vx_u32m8_b4(v, 0x7FF, vl);
-    vbool4_t m4   = __riscv_vmsgtu_vx_u32m8_b4(v, 0xFFFF, vl);
-    count += vl + __riscv_vcpop_m_b4(m234, vl) + __riscv_vcpop_m_b4(m34, vl) + __riscv_vcpop_m_b4(m4, vl);
+    vbool4_t m34 = __riscv_vmsgtu_vx_u32m8_b4(v, 0x7FF, vl);
+    vbool4_t m4 = __riscv_vmsgtu_vx_u32m8_b4(v, 0xFFFF, vl);
+    count += vl + __riscv_vcpop_m_b4(m234, vl) + __riscv_vcpop_m_b4(m34, vl) +
+             __riscv_vcpop_m_b4(m4, vl);
   }
   return count;
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf8(const char *src, size_t len) const noexcept {
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *src, size_t len) const noexcept {
   size_t count = 0;
   for (size_t vl; len > 0; len -= vl, src += vl) {
     vl = __riscv_vsetvl_e8m8(len);
-    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t*)src, vl);
+    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
     vbool1_t m1234 = __riscv_vmsgt_vx_i8m8_b1(v, -65, vl);
-    vbool1_t m4 = __riscv_vmsgtu_vx_u8m8_b1(
-            __riscv_vreinterpret_u8m8(v), (uint8_t)0b11101111, vl);
+    vbool1_t m4 = __riscv_vmsgtu_vx_u8m8_b1(__riscv_vreinterpret_u8m8(v),
+                                            (uint8_t)0b11101111, vl);
     count += __riscv_vcpop_m_b1(m1234, vl) + __riscv_vcpop_m_b1(m4, vl);
   }
   return count;
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf32(const char32_t *src, size_t len) const noexcept {
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *src, size_t len) const noexcept {
   size_t count = 0;
   for (size_t vl; len > 0; len -= vl, src += vl) {
     vl = __riscv_vsetvl_e32m8(len);
-    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t*)src, vl);
+    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
     vbool4_t m4 = __riscv_vmsgtu_vx_u32m8_b4(v, 0xFFFF, vl);
     count += vl + __riscv_vcpop_m_b4(m4, vl);
   }
   return count;
 }
-
 /* end file src/rvv/rvv_length_from.inl.cpp */
 /* begin file src/rvv/rvv_validate.inl.cpp */
 
 
-simdutf_warn_unused bool implementation::validate_ascii(const char *src, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *src, size_t len) const noexcept {
   size_t vlmax = __riscv_vsetvlmax_e8m8();
   vint8m8_t mask = __riscv_vmv_v_x_i8m8(0, vlmax);
   for (size_t vl; len > 0; len -= vl, src += vl) {
     vl = __riscv_vsetvl_e8m8(len);
-    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t*)src, vl);
+    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
     mask = __riscv_vor_vv_i8m8_tu(mask, mask, v, vl);
   }
-  return __riscv_vfirst_m_b1(__riscv_vmslt_vx_i8m8_b1(mask, 0, vlmax), vlmax) < 0;
+  return __riscv_vfirst_m_b1(__riscv_vmslt_vx_i8m8_b1(mask, 0, vlmax), vlmax) <
+         0;
 }
 
-simdutf_warn_unused result implementation::validate_ascii_with_errors(const char *src, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *src, size_t len) const noexcept {
   const char *beg = src;
   for (size_t vl; len > 0; len -= vl, src += vl) {
     vl = __riscv_vsetvl_e8m8(len);
-    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t*)src, vl);
+    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
     long idx = __riscv_vfirst_m_b1(__riscv_vmslt_vx_i8m8_b1(v, 0, vl), vl);
-    if (idx >= 0) return result(error_code::TOO_LARGE, src - beg + idx);
+    if (idx >= 0)
+      return result(error_code::TOO_LARGE, src - beg + idx);
   }
   return result(error_code::SUCCESS, src - beg);
 }
 
 /* Returns a close estimation of the number of valid UTF-8 bytes up to the
  * first invalid one, but never overestimating. */
-simdutf_really_inline static size_t rvv_count_valid_utf8(const char *src, size_t len) {
+simdutf_really_inline static size_t rvv_count_valid_utf8(const char *src,
+                                                         size_t len) {
   const char *beg = src;
-  if (len < 32) return 0;
+  if (len < 32)
+    return 0;
 
   /* validate first three bytes */
   {
     size_t idx = 3;
     while (idx < len && (src[idx] >> 6) == 0b10)
       ++idx;
-    if (idx > 3+3 || !scalar::utf8::validate(src, idx))
+    if (idx > 3 + 3 || !scalar::utf8::validate(src, idx))
       return 0;
   }
 
-  static const uint64_t err1m[] = { 0x0202020202020202, 0x4915012180808080 };
-  static const uint64_t err2m[] = { 0xCBCBCB8B8383A3E7, 0xCBCBDBCBCBCBCBCB };
-  static const uint64_t err3m[] = { 0x0101010101010101, 0X01010101BABAAEE6 };
+  static const uint64_t err1m[] = {0x0202020202020202, 0x4915012180808080};
+  static const uint64_t err2m[] = {0xCBCBCB8B8383A3E7, 0xCBCBDBCBCBCBCBCB};
+  static const uint64_t err3m[] = {0x0101010101010101, 0X01010101BABAAEE6};
 
-  const vuint8m1_t err1tbl = __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err1m, 2));
-  const vuint8m1_t err2tbl = __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err2m, 2));
-  const vuint8m1_t err3tbl = __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err3m, 2));
+  const vuint8m1_t err1tbl =
+      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err1m, 2));
+  const vuint8m1_t err2tbl =
+      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err2m, 2));
+  const vuint8m1_t err3tbl =
+      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err3m, 2));
 
   size_t tail = 3;
   size_t n = len - tail;
 
   for (size_t vl; n > 0; n -= vl, src += vl) {
     vl = __riscv_vsetvl_e8m4(n);
-    vuint8m4_t v0 = __riscv_vle8_v_u8m4((uint8_t const*)src, vl);
+    vuint8m4_t v0 = __riscv_vle8_v_u8m4((uint8_t const *)src, vl);
 
-    uint8_t next0 = src[vl+0];
-    uint8_t next1 = src[vl+1];
-    uint8_t next2 = src[vl+2];
+    uint8_t next0 = src[vl + 0];
+    uint8_t next1 = src[vl + 1];
+    uint8_t next2 = src[vl + 2];
 
     /* fast path: ASCII */
-    if (__riscv_vfirst_m_b2(__riscv_vmsgtu_vx_u8m4_b2(v0, 0b01111111, vl), vl) < 0 && (next0|next1|next2) < 0b10000000)
+    if (__riscv_vfirst_m_b2(__riscv_vmsgtu_vx_u8m4_b2(v0, 0b01111111, vl), vl) <
+            0 &&
+        (next0 | next1 | next2) < 0b10000000)
       continue;
 
     /* see "Validating UTF-8 In Less Than One Instruction Per Byte"
@@ -31680,8 +33281,10 @@ simdutf_really_inline static size_t rvv_count_valid_utf8(const char *src, size_t
     vuint8m4_t v2 = __riscv_vslide1down_vx_u8m4(v1, next1, vl);
     vuint8m4_t v3 = __riscv_vslide1down_vx_u8m4(v2, next2, vl);
 
-    vuint8m4_t s1 = __riscv_vreinterpret_v_u16m4_u8m4(__riscv_vsrl_vx_u16m4(__riscv_vreinterpret_v_u8m4_u16m4(v2), 4, __riscv_vsetvlmax_e16m4()));
-    vuint8m4_t s3 = __riscv_vreinterpret_v_u16m4_u8m4(__riscv_vsrl_vx_u16m4(__riscv_vreinterpret_v_u8m4_u16m4(v3), 4, __riscv_vsetvlmax_e16m4()));
+    vuint8m4_t s1 = __riscv_vreinterpret_v_u16m4_u8m4(__riscv_vsrl_vx_u16m4(
+        __riscv_vreinterpret_v_u8m4_u16m4(v2), 4, __riscv_vsetvlmax_e16m4()));
+    vuint8m4_t s3 = __riscv_vreinterpret_v_u16m4_u8m4(__riscv_vsrl_vx_u16m4(
+        __riscv_vreinterpret_v_u8m4_u16m4(v3), 4, __riscv_vsetvlmax_e16m4()));
 
     vuint8m4_t idx2 = __riscv_vand_vx_u8m4(v2, 0xF, vl);
     vuint8m4_t idx1 = __riscv_vand_vx_u8m4(s1, 0xF, vl);
@@ -31690,614 +33293,310 @@ simdutf_really_inline static size_t rvv_count_valid_utf8(const char *src, size_t
     vuint8m4_t err1 = simdutf_vrgather_u8m1x4(err1tbl, idx1);
     vuint8m4_t err2 = simdutf_vrgather_u8m1x4(err2tbl, idx2);
     vuint8m4_t err3 = simdutf_vrgather_u8m1x4(err3tbl, idx3);
-    vint8m4_t errs = __riscv_vreinterpret_v_u8m4_i8m4(__riscv_vand_vv_u8m4(__riscv_vand_vv_u8m4(err1, err2, vl), err3, vl));
+    vint8m4_t errs = __riscv_vreinterpret_v_u8m4_i8m4(
+        __riscv_vand_vv_u8m4(__riscv_vand_vv_u8m4(err1, err2, vl), err3, vl));
 
-    vbool2_t is_3 = __riscv_vmsgtu_vx_u8m4_b2(v1, 0b11100000-1, vl);
-    vbool2_t is_4 = __riscv_vmsgtu_vx_u8m4_b2(v0, 0b11110000-1, vl);
+    vbool2_t is_3 = __riscv_vmsgtu_vx_u8m4_b2(v1, 0b11100000 - 1, vl);
+    vbool2_t is_4 = __riscv_vmsgtu_vx_u8m4_b2(v0, 0b11110000 - 1, vl);
     vbool2_t is_34 = __riscv_vmor_mm_b2(is_3, is_4, vl);
-    vbool2_t err34 = __riscv_vmxor_mm_b2(is_34, __riscv_vmslt_vx_i8m4_b2(errs, 0, vl), vl);
-    vbool2_t errm = __riscv_vmor_mm_b2(__riscv_vmsgt_vx_i8m4_b2(errs, 0, vl), err34, vl);
-    if (__riscv_vfirst_m_b2(errm , vl) >= 0)
+    vbool2_t err34 =
+        __riscv_vmxor_mm_b2(is_34, __riscv_vmslt_vx_i8m4_b2(errs, 0, vl), vl);
+    vbool2_t errm =
+        __riscv_vmor_mm_b2(__riscv_vmsgt_vx_i8m4_b2(errs, 0, vl), err34, vl);
+    if (__riscv_vfirst_m_b2(errm, vl) >= 0)
       break;
   }
 
   /* we need to validate the last character */
-  while (tail < len && (src[0] >> 6) == 0b10) --src, ++tail;
+  while (tail < len && (src[0] >> 6) == 0b10)
+    --src, ++tail;
   return src - beg;
 }
 
-simdutf_warn_unused bool implementation::validate_utf8(const char *src, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf8(const char *src, size_t len) const noexcept {
   size_t count = rvv_count_valid_utf8(src, len);
   return scalar::utf8::validate(src + count, len - count);
 }
 
-simdutf_warn_unused result implementation::validate_utf8_with_errors(const char *src, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *src, size_t len) const noexcept {
   size_t count = rvv_count_valid_utf8(src, len);
   result res = scalar::utf8::validate_with_errors(src + count, len - count);
   return result(res.error, count + res.count);
 }
 
-simdutf_warn_unused bool implementation::validate_utf16le(const char16_t *src, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf16le(const char16_t *src,
+                                 size_t len) const noexcept {
   return validate_utf16le_with_errors(src, len).error == error_code::SUCCESS;
 }
 
-simdutf_warn_unused bool implementation::validate_utf16be(const char16_t *src, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf16be(const char16_t *src,
+                                 size_t len) const noexcept {
   return validate_utf16be_with_errors(src, len).error == error_code::SUCCESS;
 }
 
-template<simdutf_ByteFlip bflip>
-simdutf_really_inline static result rvv_validate_utf16_with_errors(const char16_t *src, size_t len) {
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static result
+rvv_validate_utf16_with_errors(const char16_t *src, size_t len) {
   const char16_t *beg = src;
   uint16_t last = 0;
-  for (size_t vl; len > 0; len -= vl, src += vl, last = simdutf_byteflip<bflip>(src[-1])) {
+  for (size_t vl; len > 0;
+       len -= vl, src += vl, last = simdutf_byteflip<bflip>(src[-1])) {
     vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v1 = __riscv_vle16_v_u16m8((const uint16_t*)src, vl);
+    vuint16m8_t v1 = __riscv_vle16_v_u16m8((const uint16_t *)src, vl);
     v1 = simdutf_byteflip<bflip>(v1, vl);
     vuint16m8_t v0 = __riscv_vslide1up_vx_u16m8(v1, last, vl);
 
-    vbool2_t surhi = __riscv_vmseq_vx_u16m8_b2(__riscv_vand_vx_u16m8(v0, 0xFC00, vl), 0xD800, vl);
-    vbool2_t surlo = __riscv_vmseq_vx_u16m8_b2(__riscv_vand_vx_u16m8(v1, 0xFC00, vl), 0xDC00, vl);
+    vbool2_t surhi = __riscv_vmseq_vx_u16m8_b2(
+        __riscv_vand_vx_u16m8(v0, 0xFC00, vl), 0xD800, vl);
+    vbool2_t surlo = __riscv_vmseq_vx_u16m8_b2(
+        __riscv_vand_vx_u16m8(v1, 0xFC00, vl), 0xDC00, vl);
 
     long idx = __riscv_vfirst_m_b2(__riscv_vmxor_mm_b2(surhi, surlo, vl), vl);
     if (idx >= 0) {
-      last = idx > 0 ? simdutf_byteflip<bflip>(src[idx-1]) : last;
-      return result(error_code::SURROGATE, src - beg + idx - (last - 0xD800u < 0x400u));
+      last = idx > 0 ? simdutf_byteflip<bflip>(src[idx - 1]) : last;
+      return result(error_code::SURROGATE,
+                    src - beg + idx - (last - 0xD800u < 0x400u));
       break;
     }
   }
   if (last - 0xD800u < 0x400u) {
-    return result(error_code::SURROGATE, src - beg - 1); /* end on high surrogate */
+    return result(error_code::SURROGATE,
+                  src - beg - 1); /* end on high surrogate */
   } else {
     return result(error_code::SUCCESS, src - beg);
   }
 }
 
-simdutf_warn_unused result implementation::validate_utf16le_with_errors(const char16_t *src, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf16le_with_errors(
+    const char16_t *src, size_t len) const noexcept {
   return rvv_validate_utf16_with_errors<simdutf_ByteFlip::NONE>(src, len);
 }
 
-simdutf_warn_unused result implementation::validate_utf16be_with_errors(const char16_t *src, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf16be_with_errors(
+    const char16_t *src, size_t len) const noexcept {
   if (supports_zvbb())
     return rvv_validate_utf16_with_errors<simdutf_ByteFlip::ZVBB>(src, len);
   else
     return rvv_validate_utf16_with_errors<simdutf_ByteFlip::V>(src, len);
 }
 
-simdutf_warn_unused bool implementation::validate_utf32(const char32_t *src, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf32(const char32_t *src, size_t len) const noexcept {
   size_t vlmax = __riscv_vsetvlmax_e32m8();
-  vuint32m8_t max    = __riscv_vmv_v_x_u32m8(0x10FFFF, vlmax);
+  vuint32m8_t max = __riscv_vmv_v_x_u32m8(0x10FFFF, vlmax);
   vuint32m8_t maxOff = __riscv_vmv_v_x_u32m8(0xFFFFF7FF, vlmax);
   for (size_t vl; len > 0; len -= vl, src += vl) {
     vl = __riscv_vsetvl_e32m8(len);
-    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t*)src, vl);
+    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
     vuint32m8_t off = __riscv_vadd_vx_u32m8(v, 0xFFFF2000, vl);
-    max    = __riscv_vmaxu_vv_u32m8_tu(max, max, v, vl);
+    max = __riscv_vmaxu_vv_u32m8_tu(max, max, v, vl);
     maxOff = __riscv_vmaxu_vv_u32m8_tu(maxOff, maxOff, off, vl);
   }
-  return __riscv_vfirst_m_b4(__riscv_vmor_mm_b4(
-             __riscv_vmsne_vx_u32m8_b4(max, 0x10FFFF, vlmax),
-             __riscv_vmsne_vx_u32m8_b4(maxOff, 0xFFFFF7FF, vlmax), vlmax), vlmax) < 0;
+  return __riscv_vfirst_m_b4(
+             __riscv_vmor_mm_b4(
+                 __riscv_vmsne_vx_u32m8_b4(max, 0x10FFFF, vlmax),
+                 __riscv_vmsne_vx_u32m8_b4(maxOff, 0xFFFFF7FF, vlmax), vlmax),
+             vlmax) < 0;
 }
 
-simdutf_warn_unused result implementation::validate_utf32_with_errors(const char32_t *src, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf32_with_errors(
+    const char32_t *src, size_t len) const noexcept {
   const char32_t *beg = src;
   for (size_t vl; len > 0; len -= vl, src += vl) {
     vl = __riscv_vsetvl_e32m8(len);
-    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t*)src, vl);
+    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
     vuint32m8_t off = __riscv_vadd_vx_u32m8(v, 0xFFFF2000, vl);
-    long idx1 = __riscv_vfirst_m_b4(__riscv_vmsgtu_vx_u32m8_b4(v, 0x10FFFF, vl), vl);
-    long idx2 = __riscv_vfirst_m_b4(__riscv_vmsgtu_vx_u32m8_b4(off, 0xFFFFF7FF, vl), vl);
-    if(idx1 >= 0 && idx2 >= 0) {
-      if(idx1 <= idx2) { return result(error_code::TOO_LARGE, src - beg + idx1); }
-      else { return result(error_code::SURROGATE, src - beg + idx2); }
+    long idx1 =
+        __riscv_vfirst_m_b4(__riscv_vmsgtu_vx_u32m8_b4(v, 0x10FFFF, vl), vl);
+    long idx2 = __riscv_vfirst_m_b4(
+        __riscv_vmsgtu_vx_u32m8_b4(off, 0xFFFFF7FF, vl), vl);
+    if (idx1 >= 0 && idx2 >= 0) {
+      if (idx1 <= idx2) {
+        return result(error_code::TOO_LARGE, src - beg + idx1);
+      } else {
+        return result(error_code::SURROGATE, src - beg + idx2);
+      }
+    }
+    if (idx1 >= 0) {
+      return result(error_code::TOO_LARGE, src - beg + idx1);
+    }
+    if (idx2 >= 0) {
+      return result(error_code::SURROGATE, src - beg + idx2);
     }
-    if (idx1 >= 0) { return result(error_code::TOO_LARGE, src - beg + idx1); }
-    if (idx2 >= 0) { return result(error_code::SURROGATE, src - beg + idx2); }
   }
   return result(error_code::SUCCESS, src - beg);
 }
-
 /* end file src/rvv/rvv_validate.inl.cpp */
 
 /* begin file src/rvv/rvv_latin1_to.inl.cpp */
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(const char *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
+    const char *src, size_t len, char *dst) const noexcept {
   char *beg = dst;
   for (size_t vl, vlOut; len > 0; len -= vl, src += vl, dst += vlOut) {
     vl = __riscv_vsetvl_e8m2(len);
-    vuint8m2_t v1 = __riscv_vle8_v_u8m2((uint8_t*)src, vl);
-    vbool4_t nascii = __riscv_vmslt_vx_i8m2_b4(__riscv_vreinterpret_v_u8m2_i8m2(v1), 0, vl);
+    vuint8m2_t v1 = __riscv_vle8_v_u8m2((uint8_t *)src, vl);
+    vbool4_t nascii =
+        __riscv_vmslt_vx_i8m2_b4(__riscv_vreinterpret_v_u8m2_i8m2(v1), 0, vl);
     size_t cnt = __riscv_vcpop_m_b4(nascii, vl);
     vlOut = vl + cnt;
     if (cnt == 0) {
-      __riscv_vse8_v_u8m2((uint8_t*)dst, v1, vlOut);
+      __riscv_vse8_v_u8m2((uint8_t *)dst, v1, vlOut);
       continue;
     }
 
-    vuint8m2_t v0 = __riscv_vor_vx_u8m2(__riscv_vsrl_vx_u8m2(v1, 6, vl), 0b11000000, vl);
+    vuint8m2_t v0 =
+        __riscv_vor_vx_u8m2(__riscv_vsrl_vx_u8m2(v1, 6, vl), 0b11000000, vl);
     v1 = __riscv_vand_vx_u8m2_mu(nascii, v1, v1, 0b10111111, vl);
 
-    vuint8m4_t wide = __riscv_vreinterpret_v_u16m4_u8m4(__riscv_vwmaccu_vx_u16m4(__riscv_vwaddu_vv_u16m4(v0, v1, vl), 0xFF, v1, vl));
-    vbool2_t mask = __riscv_vmsgtu_vx_u8m4_b2(__riscv_vsub_vx_u8m4(wide, 0b11000000, vl*2), 1, vl*2);
-    vuint8m4_t comp = __riscv_vcompress_vm_u8m4(wide, mask, vl*2);
+    vuint8m4_t wide =
+        __riscv_vreinterpret_v_u16m4_u8m4(__riscv_vwmaccu_vx_u16m4(
+            __riscv_vwaddu_vv_u16m4(v0, v1, vl), 0xFF, v1, vl));
+    vbool2_t mask = __riscv_vmsgtu_vx_u8m4_b2(
+        __riscv_vsub_vx_u8m4(wide, 0b11000000, vl * 2), 1, vl * 2);
+    vuint8m4_t comp = __riscv_vcompress_vm_u8m4(wide, mask, vl * 2);
 
-    __riscv_vse8_v_u8m4((uint8_t*)dst, comp, vlOut);
+    __riscv_vse8_v_u8m4((uint8_t *)dst, comp, vlOut);
   }
   return dst - beg;
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(const char *src, size_t len, char16_t *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
+    const char *src, size_t len, char16_t *dst) const noexcept {
   char16_t *beg = dst;
   for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
     vl = __riscv_vsetvl_e8m4(len);
-    vuint8m4_t v = __riscv_vle8_v_u8m4((uint8_t*)src, vl);
-    __riscv_vse16_v_u16m8((uint16_t*)dst, __riscv_vzext_vf2_u16m8(v, vl), vl);
+    vuint8m4_t v = __riscv_vle8_v_u8m4((uint8_t *)src, vl);
+    __riscv_vse16_v_u16m8((uint16_t *)dst, __riscv_vzext_vf2_u16m8(v, vl), vl);
   }
   return dst - beg;
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(const char *src, size_t len, char16_t *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
+    const char *src, size_t len, char16_t *dst) const noexcept {
   char16_t *beg = dst;
   for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
     vl = __riscv_vsetvl_e8m4(len);
-    vuint8m4_t v = __riscv_vle8_v_u8m4((uint8_t*)src, vl);
-    __riscv_vse16_v_u16m8((uint16_t*)dst, __riscv_vsll_vx_u16m8(__riscv_vzext_vf2_u16m8(v, vl), 8, vl), vl);
+    vuint8m4_t v = __riscv_vle8_v_u8m4((uint8_t *)src, vl);
+    __riscv_vse16_v_u16m8(
+        (uint16_t *)dst,
+        __riscv_vsll_vx_u16m8(__riscv_vzext_vf2_u16m8(v, vl), 8, vl), vl);
   }
   return dst - beg;
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(const char *src, size_t len, char32_t *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
+    const char *src, size_t len, char32_t *dst) const noexcept {
   char32_t *beg = dst;
   for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
     vl = __riscv_vsetvl_e8m2(len);
-    vuint8m2_t v = __riscv_vle8_v_u8m2((uint8_t*)src, vl);
-    __riscv_vse32_v_u32m8((uint32_t*)dst, __riscv_vzext_vf4_u32m8(v, vl), vl);
+    vuint8m2_t v = __riscv_vle8_v_u8m2((uint8_t *)src, vl);
+    __riscv_vse32_v_u32m8((uint32_t *)dst, __riscv_vzext_vf4_u32m8(v, vl), vl);
   }
   return dst - beg;
 }
-
 /* end file src/rvv/rvv_latin1_to.inl.cpp */
-/* begin file src/rvv/rvv_utf8_to.inl.cpp */
-template<simdutf_ByteFlip bflip>
-simdutf_really_inline static size_t rvv_utf32_store_utf16_m4(uint16_t *dst, vuint32m4_t utf32, size_t vl, vbool4_t m4even) {
-  /* convert [000000000000aaaa|aaaaaabbbbbbbbbb]
-   * to      [110111bbbbbbbbbb|110110aaaaaaaaaa] */
-  vuint32m4_t sur = __riscv_vsub_vx_u32m4(utf32, 0x10000, vl);
-  sur = __riscv_vor_vv_u32m4(__riscv_vsll_vx_u32m4(sur, 16, vl),
-                             __riscv_vsrl_vx_u32m4(sur, 10, vl), vl);
-  sur = __riscv_vand_vx_u32m4(sur, 0x3FF03FF, vl);
-  sur = __riscv_vor_vx_u32m4(sur, 0xDC00D800, vl);
-  /* merge 1 byte utf32 and 2 byte sur */
-  vbool8_t m4 = __riscv_vmsgtu_vx_u32m4_b8(utf32, 0xFFFF, vl);
-  vuint16m4_t utf32_16 = __riscv_vreinterpret_v_u32m4_u16m4(__riscv_vmerge_vvm_u32m4(utf32, sur, m4, vl));
-  /* compress and store */
-  vbool4_t mOut = __riscv_vmor_mm_b4(__riscv_vmsne_vx_u16m4_b4(utf32_16, 0, vl*2), m4even, vl*2);
-  vuint16m4_t vout = __riscv_vcompress_vm_u16m4(utf32_16, mOut, vl*2);
-  vl = __riscv_vcpop_m_b4(mOut, vl*2);
-  __riscv_vse16_v_u16m4(dst, simdutf_byteflip<bflip>(vout, vl), vl);
-  return vl;
-};
-
-template<typename Tdst, simdutf_ByteFlip bflip, bool validate=true>
-simdutf_really_inline static size_t rvv_utf8_to_common(char const *src, size_t len, Tdst *dst) {
-  static_assert(std::is_same<Tdst, uint16_t>() || std::is_same<Tdst, uint32_t>(), "invalid type");
-  constexpr bool is16 = std::is_same<Tdst, uint16_t>();
-  constexpr endianness endian = bflip == simdutf_ByteFlip::NONE ? endianness::LITTLE : endianness::BIG;
-  const auto scalar = [](char const *in, size_t count, Tdst *out) {
-    return is16 ? scalar::utf8_to_utf16::convert<endian>(in, count, (char16_t*)out)
-                : scalar::utf8_to_utf32::convert(in, count, (char32_t*)out);
-  };
-
-  if (len < 32) return scalar(src, len, dst);
-
-  /* validate first three bytes */
-  if (validate) {
-    size_t idx = 3;
-    while (idx < len && (src[idx] >> 6) == 0b10)
-      ++idx;
-    if (idx > 3+3 || !scalar::utf8::validate(src, idx))
-      return 0;
-  }
-
-  size_t tail = 3;
-  size_t n = len - tail;
-  Tdst *beg = dst;
-
-  static const uint64_t err1m[] = { 0x0202020202020202, 0x4915012180808080 };
-  static const uint64_t err2m[] = { 0xCBCBCB8B8383A3E7, 0xCBCBDBCBCBCBCBCB };
-  static const uint64_t err3m[] = { 0x0101010101010101, 0X01010101BABAAEE6 };
-
-  const vuint8m1_t err1tbl = __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err1m, 2));
-  const vuint8m1_t err2tbl = __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err2m, 2));
-  const vuint8m1_t err3tbl = __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err3m, 2));
-
-  size_t vl8m2 = __riscv_vsetvlmax_e8m2();
-  vbool4_t m4even = __riscv_vmseq_vx_u8m2_b4(__riscv_vand_vx_u8m2(__riscv_vid_v_u8m2(vl8m2), 1, vl8m2), 0, vl8m2);
-
-  for (size_t vl, vlOut; n > 0; n -= vl, src += vl, dst += vlOut) {
-    vl = __riscv_vsetvl_e8m2(n);
-
-    vuint8m2_t v0 = __riscv_vle8_v_u8m2((uint8_t const*)src, vl);
-    uint64_t max = __riscv_vmv_x_s_u8m1_u8(__riscv_vredmaxu_vs_u8m2_u8m1(v0, __riscv_vmv_s_x_u8m1(0, vl), vl));
-
-    uint8_t next0 = src[vl+0];
-    uint8_t next1 = src[vl+1];
-    uint8_t next2 = src[vl+2];
-
-    /* fast path: ASCII */
-    if ((max|next0|next1|next2) < 0b10000000) {
-      vlOut = vl;
-      if (is16) __riscv_vse16_v_u16m4((uint16_t*)dst, simdutf_byteflip<bflip>(__riscv_vzext_vf2_u16m4(v0, vlOut), vlOut), vlOut);
-      else      __riscv_vse32_v_u32m8((uint32_t*)dst, __riscv_vzext_vf4_u32m8(v0, vlOut), vlOut);
-      continue;
-    }
-
-    /* see "Validating UTF-8 In Less Than One Instruction Per Byte"
-     * https://arxiv.org/abs/2010.03090 */
-    vuint8m2_t v1 = __riscv_vslide1down_vx_u8m2(v0, next0, vl);
-    vuint8m2_t v2 = __riscv_vslide1down_vx_u8m2(v1, next1, vl);
-    vuint8m2_t v3 = __riscv_vslide1down_vx_u8m2(v2, next2, vl);
-
-    if (validate) {
-      vuint8m2_t s1 = __riscv_vreinterpret_v_u16m2_u8m2(__riscv_vsrl_vx_u16m2(__riscv_vreinterpret_v_u8m2_u16m2(v2), 4, __riscv_vsetvlmax_e16m2()));
-      vuint8m2_t s3 = __riscv_vreinterpret_v_u16m2_u8m2(__riscv_vsrl_vx_u16m2(__riscv_vreinterpret_v_u8m2_u16m2(v3), 4, __riscv_vsetvlmax_e16m2()));
-
-      vuint8m2_t idx2 = __riscv_vand_vx_u8m2(v2, 0xF, vl);
-      vuint8m2_t idx1 = __riscv_vand_vx_u8m2(s1, 0xF, vl);
-      vuint8m2_t idx3 = __riscv_vand_vx_u8m2(s3, 0xF, vl);
-
-      vuint8m2_t err1 = simdutf_vrgather_u8m1x2(err1tbl, idx1);
-      vuint8m2_t err2 = simdutf_vrgather_u8m1x2(err2tbl, idx2);
-      vuint8m2_t err3 = simdutf_vrgather_u8m1x2(err3tbl, idx3);
-      vint8m2_t errs = __riscv_vreinterpret_v_u8m2_i8m2(__riscv_vand_vv_u8m2(__riscv_vand_vv_u8m2(err1, err2, vl), err3, vl));
-
-      vbool4_t is_3 = __riscv_vmsgtu_vx_u8m2_b4(v1, 0b11100000-1, vl);
-      vbool4_t is_4 = __riscv_vmsgtu_vx_u8m2_b4(v0, 0b11110000-1, vl);
-      vbool4_t is_34 = __riscv_vmor_mm_b4(is_3, is_4, vl);
-      vbool4_t err34 = __riscv_vmxor_mm_b4(is_34, __riscv_vmslt_vx_i8m2_b4(errs, 0, vl), vl);
-      vbool4_t errm = __riscv_vmor_mm_b4(__riscv_vmsgt_vx_i8m2_b4(errs, 0, vl), err34, vl);
-      if (__riscv_vfirst_m_b4(errm , vl) >= 0)
-        return 0;
-    }
-
-    /* decoding */
-
-    /* mask of non continuation bytes */
-    vbool4_t m = __riscv_vmsgt_vx_i8m2_b4(__riscv_vreinterpret_v_u8m2_i8m2(v0), -65, vl);
-    vlOut = __riscv_vcpop_m_b4(m, vl);
-
-    /* extract first and second bytes */
-    vuint8m2_t b1 = __riscv_vcompress_vm_u8m2(v0, m, vl);
-    vuint8m2_t b2 = __riscv_vcompress_vm_u8m2(v1, m, vl);
-
-    /* fast path: one and two byte */
-    if (max < 0b11100000) {
-      b2 = __riscv_vand_vx_u8m2(b2, 0b00111111, vlOut);
-
-      vbool4_t m1 = __riscv_vmsgtu_vx_u8m2_b4(b1, 0b10111111, vlOut);
-      b1 = __riscv_vand_vx_u8m2_mu(m1, b1, b1, 63, vlOut);
-
-      vuint16m4_t b12 = __riscv_vwmulu_vv_u16m4(b1, __riscv_vmerge_vxm_u8m2(__riscv_vmv_v_x_u8m2(1, vlOut), 1<<6, m1, vlOut), vlOut);
-      b12 = __riscv_vwaddu_wv_u16m4_mu(m1, b12, b12, b2, vlOut);
-      if (is16) __riscv_vse16_v_u16m4((uint16_t*)dst, simdutf_byteflip<bflip>(b12, vlOut), vlOut);
-      else      __riscv_vse32_v_u32m8((uint32_t*)dst, __riscv_vzext_vf2_u32m8(b12, vlOut), vlOut);
-      continue;
-    }
-
-    /* fast path: one, two and three byte */
-    if (max < 0b11110000) {
-      vuint8m2_t b3 = __riscv_vcompress_vm_u8m2(v2, m, vl);
-
-      b2 = __riscv_vand_vx_u8m2(b2, 0b00111111, vlOut);
-      b3 = __riscv_vand_vx_u8m2(b3, 0b00111111, vlOut);
-
-      vbool4_t m1 = __riscv_vmsgtu_vx_u8m2_b4(b1, 0b10111111, vlOut);
-      vbool4_t m3 = __riscv_vmsgtu_vx_u8m2_b4(b1, 0b11011111, vlOut);
-
-      vuint8m2_t t1 = __riscv_vand_vx_u8m2_mu(m1, b1, b1, 63, vlOut);
-      b1 = __riscv_vand_vx_u8m2_mu(m3, t1, b1, 15, vlOut);
-
-      vuint16m4_t b12 = __riscv_vwmulu_vv_u16m4(b1, __riscv_vmerge_vxm_u8m2(__riscv_vmv_v_x_u8m2(1, vlOut), 1<<6, m1, vlOut), vlOut);
-      b12 = __riscv_vwaddu_wv_u16m4_mu(m1, b12, b12, b2, vlOut);
-      vuint16m4_t b123 = __riscv_vwaddu_wv_u16m4_mu(m3, b12, __riscv_vsll_vx_u16m4_mu(m3, b12, b12, 6, vlOut), b3, vlOut);
-      if (is16) __riscv_vse16_v_u16m4((uint16_t*)dst, simdutf_byteflip<bflip>(b123, vlOut), vlOut);
-      else      __riscv_vse32_v_u32m8((uint32_t*)dst, __riscv_vzext_vf2_u32m8(b123, vlOut), vlOut);
-      continue;
-    }
-
-    /* extract third and fourth bytes */
-    vuint8m2_t b3 = __riscv_vcompress_vm_u8m2(v2, m, vl);
-    vuint8m2_t b4 = __riscv_vcompress_vm_u8m2(v3, m, vl);
-
-     /* remove prefix from leading bytes
-      *
-      * We could also use vrgather here, but it increases register pressure,
-      * and its performance varies widely on current platforms. It might be
-      * worth reconsidering, though, once there is more hardware available.
-      * Same goes for the __riscv_vsrl_vv_u32m4 correction step.
-      *
-      * We shift left and then right by the number of bytes in the prefix,
-      * which can be calculated as follows:
-      *         x                                max(x-10, 0)
-      * 0xxx -> 0000-0111 -> sift by 0 or 1   -> 0
-      * 10xx -> 1000-1011 -> don't care
-      * 110x -> 1100,1101 -> sift by 3        -> 2,3
-      * 1110 -> 1110      -> sift by 4        -> 4
-      * 1111 -> 1111      -> sift by 5        -> 5
-      *
-      * vssubu.vx v, 10, (max(x-10, 0)) almost gives us what we want, we
-      * just need to manually detect and handle the one special case:
-      */
-    #define SIMDUTF_RVV_UTF8_TO_COMMON_M1(idx) \
-      vuint8m1_t c1 = __riscv_vget_v_u8m2_u8m1(b1, idx); \
-      vuint8m1_t c2 = __riscv_vget_v_u8m2_u8m1(b2, idx); \
-      vuint8m1_t c3 = __riscv_vget_v_u8m2_u8m1(b3, idx); \
-      vuint8m1_t c4 = __riscv_vget_v_u8m2_u8m1(b4, idx); \
-      /* remove prefix from trailing bytes */ \
-      c2 = __riscv_vand_vx_u8m1(c2, 0b00111111, vlOut); \
-      c3 = __riscv_vand_vx_u8m1(c3, 0b00111111, vlOut); \
-      c4 = __riscv_vand_vx_u8m1(c4, 0b00111111, vlOut);  \
-      vuint8m1_t shift = __riscv_vsrl_vx_u8m1(c1, 4, vlOut); \
-      shift = __riscv_vmerge_vxm_u8m1(__riscv_vssubu_vx_u8m1(shift, 10, vlOut), 3, __riscv_vmseq_vx_u8m1_b8(shift, 12, vlOut), vlOut); \
-      c1 = __riscv_vsll_vv_u8m1(c1, shift, vlOut); \
-      c1 = __riscv_vsrl_vv_u8m1(c1, shift, vlOut); \
-      /* unconditionally widen and combine to c1234 */ \
-      vuint16m2_t c34 = __riscv_vwaddu_wv_u16m2(__riscv_vwmulu_vx_u16m2(c3, 1<<6, vlOut), c4, vlOut); \
-      vuint16m2_t c12 = __riscv_vwaddu_wv_u16m2(__riscv_vwmulu_vx_u16m2(c1, 1<<6, vlOut), c2, vlOut); \
-      vuint32m4_t c1234 = __riscv_vwaddu_wv_u32m4(__riscv_vwmulu_vx_u32m4(c12, 1 << 12, vlOut), c34, vlOut); \
-      /* derive required right-shift amount from `shift` to reduce
-       * c1234 to the required number of bytes */ \
-      c1234 = __riscv_vsrl_vv_u32m4(c1234, __riscv_vzext_vf4_u32m4(__riscv_vmul_vx_u8m1( \
-              __riscv_vrsub_vx_u8m1( __riscv_vssubu_vx_u8m1(shift, 2, vlOut), 3, vlOut), 6, vlOut), vlOut), vlOut); \
-      /* store result in desired format */ \
-      if (is16) vlDst = rvv_utf32_store_utf16_m4<bflip>((uint16_t*)dst, c1234, vlOut, m4even); \
-      else      vlDst = vlOut, __riscv_vse32_v_u32m4((uint32_t*)dst, c1234, vlOut);
-
-    /* Unrolling this manually reduces register pressure and allows
-     * us to terminate early. */
-    {
-      size_t vlOutm2 = vlOut, vlDst;
-      vlOut = __riscv_vsetvl_e8m1(vlOut);
-      SIMDUTF_RVV_UTF8_TO_COMMON_M1(0)
-      if (vlOutm2 == vlOut) {
-        vlOut = vlDst;
-        continue;
-      }
-
-      dst += vlDst;
-      vlOut = vlOutm2 - vlOut;
-    }
-    {
-      size_t vlDst;
-      SIMDUTF_RVV_UTF8_TO_COMMON_M1(1)
-      vlOut = vlDst;
-    }
-
-#undef SIMDUTF_RVV_UTF8_TO_COMMON_M1
-  }
-
-  /* validate the last character and reparse it + tail */
-  if (len > tail) {
-    if ((src[0] >> 6) == 0b10)
-      --dst;
-    while ((src[0] >> 6) == 0b10 && tail < len)
-      --src, ++tail;
-    if (is16) {
-      /* go back one more, when on high surrogate */
-      if (simdutf_byteflip<bflip>((uint16_t)dst[-1]) >= 0xD800 && simdutf_byteflip<bflip>((uint16_t)dst[-1]) <= 0xDBFF)
-        --dst;
-    }
-  }
-  size_t ret = scalar(src, tail, dst);
-  if (ret == 0) return 0;
-  return (size_t)(dst - beg) + ret;
-}
-
-
-simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(const char *src, size_t len, char *dst) const noexcept {
-  const char *beg = dst;
-  uint8_t last = 0;
-  for (size_t vl, vlOut; len > 0; len -= vl, src += vl, dst += vlOut, last = src[-1]) {
-    vl = __riscv_vsetvl_e8m2(len);
-    vuint8m2_t v1 = __riscv_vle8_v_u8m2((uint8_t*)src, vl);
-    // check which bytes are ASCII
-    vbool4_t ascii = __riscv_vmsltu_vx_u8m2_b4(v1, 0b10000000, vl);
-    // count ASCII bytes
-    vlOut = __riscv_vcpop_m_b4(ascii, vl);
-    // The original code would only enter the next block after this check:
-    //   vbool4_t m = __riscv_vmsltu_vx_u8m2_b4(v1, 0b11000000, vl);
-    //   vlOut = __riscv_vcpop_m_b4(m, vl);
-    //   if (vlOut != vl || last > 0b01111111) {...}q
-    // So that everything is ASCII or continuation bytes, we just proceeded
-    // without any processing, going straight to __riscv_vse8_v_u8m2.
-    // But you need the __riscv_vslide1up_vx_u8m2 whenever there is a non-ASCII byte.
-    if (vlOut != vl) { // If not pure ASCII
-      // Non-ASCII characters
-      // We now want to mark the ascii and continuation bytes
-      vbool4_t m = __riscv_vmsltu_vx_u8m2_b4(v1, 0b11000000, vl);
-      // We count them, that's our new vlOut (output vector length)
-      vlOut = __riscv_vcpop_m_b4(m, vl);
-
-      vuint8m2_t v0 = __riscv_vslide1up_vx_u8m2(v1, last, vl);
-
-      vbool4_t leading0  = __riscv_vmsgtu_vx_u8m2_b4(v0, 0b10111111, vl);
-      vbool4_t trailing1 = __riscv_vmslt_vx_i8m2_b4(__riscv_vreinterpret_v_u8m2_i8m2(v1), (uint8_t)0b11000000, vl);
-      // -62 i 0b11000010, so we check whether any of v0 is too big
-      vbool4_t tobig = __riscv_vmand_mm_b4(leading0, __riscv_vmsgtu_vx_u8m2_b4(__riscv_vxor_vx_u8m2(v0, (uint8_t)-62, vl), 1, vl), vl);
-      if (__riscv_vfirst_m_b4(__riscv_vmor_mm_b4(tobig, __riscv_vmxor_mm_b4(leading0, trailing1, vl), vl), vl) >= 0)
-        return 0;
-
-      v1 = __riscv_vor_vx_u8m2_mu(__riscv_vmseq_vx_u8m2_b4(v0, 0b11000011, vl), v1, v1, 0b01000000, vl);
-      v1 = __riscv_vcompress_vm_u8m2(v1, m, vl);
-    } else if (last >= 0b11000000) { // If last byte is a leading  byte and we got only ASCII, error!
-      return 0;
-    }
-    __riscv_vse8_v_u8m2((uint8_t*)dst, v1, vlOut);
-  }
-  if (last > 0b10111111)
-    return 0;
-  return dst - beg;
-}
-
-simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(const char *src, size_t len, char *dst) const noexcept {
-  size_t res = convert_utf8_to_latin1(src, len, dst);
-  if (res) return result(error_code::SUCCESS, res);
-  return scalar::utf8_to_latin1::convert_with_errors(src, len, dst);
-}
-
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(const char *src, size_t len, char *dst) const noexcept {
-  const char *beg = dst;
-  uint8_t last = 0;
-  for (size_t vl, vlOut; len > 0; len -= vl, src += vl, dst += vlOut, last = src[-1]) {
-    vl = __riscv_vsetvl_e8m2(len);
-    vuint8m2_t v1 = __riscv_vle8_v_u8m2((uint8_t*)src, vl);
-    vbool4_t ascii = __riscv_vmsltu_vx_u8m2_b4(v1, 0b10000000, vl);
-    vlOut = __riscv_vcpop_m_b4(ascii, vl);
-    if (vlOut != vl) { // If not pure ASCII
-      vbool4_t m = __riscv_vmsltu_vx_u8m2_b4(v1, 0b11000000, vl);
-      vlOut = __riscv_vcpop_m_b4(m, vl);
-      vuint8m2_t v0 = __riscv_vslide1up_vx_u8m2(v1, last, vl);
-      v1 = __riscv_vor_vx_u8m2_mu(__riscv_vmseq_vx_u8m2_b4(v0, 0b11000011, vl), v1, v1, 0b01000000, vl);
-      v1 = __riscv_vcompress_vm_u8m2(v1, m, vl);
-    }
-    __riscv_vse8_v_u8m2((uint8_t*)dst, v1, vlOut);
-  }
-  return dst - beg;
-}
-
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(const char *src, size_t len, char16_t *dst) const noexcept {
-  return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::NONE>(src, len, (uint16_t*)dst);
-}
-
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(const char *src, size_t len, char16_t *dst) const noexcept {
-  if (supports_zvbb())
-    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::ZVBB>(src, len, (uint16_t*)dst);
-  else
-    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::V>(src, len, (uint16_t*)dst);
-}
-
-simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(const char *src, size_t len, char16_t *dst) const noexcept {
-  size_t res = convert_utf8_to_utf16le(src, len, dst);
-  if (res) return result(error_code::SUCCESS, res);
-  return scalar::utf8_to_utf16::convert_with_errors<endianness::LITTLE>(src, len, dst);
-}
-
-simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(const char *src, size_t len, char16_t *dst) const noexcept {
-  size_t res = convert_utf8_to_utf16be(src, len, dst);
-  if (res) return result(error_code::SUCCESS, res);
-  return scalar::utf8_to_utf16::convert_with_errors<endianness::BIG>(src, len, dst);
-}
-
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(const char *src, size_t len, char16_t *dst) const noexcept {
-  return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::NONE, false>(src, len, (uint16_t*)dst);
-}
-
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(const char *src, size_t len, char16_t *dst) const noexcept {
-  if (supports_zvbb())
-    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::ZVBB, false>(src, len, (uint16_t*)dst);
-  else
-    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::V, false>(src, len, (uint16_t*)dst);
-}
-
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(const char *src, size_t len, char32_t *dst) const noexcept {
-  return rvv_utf8_to_common<uint32_t, simdutf_ByteFlip::NONE>(src, len, (uint32_t*)dst);
-}
-
-simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(const char *src, size_t len, char32_t *dst) const noexcept {
-  size_t res = convert_utf8_to_utf32(src, len, dst);
-  if (res) return result(error_code::SUCCESS, res);
-  return scalar::utf8_to_utf32::convert_with_errors(src, len, dst);
-}
-
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(const char *src, size_t len, char32_t *dst) const noexcept {
-  return rvv_utf8_to_common<uint32_t, simdutf_ByteFlip::NONE, false>(src, len, (uint32_t*)dst);
-}
-
-/* end file src/rvv/rvv_utf8_to.inl.cpp */
 /* begin file src/rvv/rvv_utf16_to.inl.cpp */
 #include <cstdio>
 
-template<simdutf_ByteFlip bflip>
-simdutf_really_inline static result rvv_utf16_to_latin1_with_errors(const char16_t *src, size_t len, char *dst) {
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static result
+rvv_utf16_to_latin1_with_errors(const char16_t *src, size_t len, char *dst) {
   const char16_t *const beg = src;
   for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
     vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t*)src, vl);
+    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
     v = simdutf_byteflip<bflip>(v, vl);
     long idx = __riscv_vfirst_m_b2(__riscv_vmsgtu_vx_u16m8_b2(v, 255, vl), vl);
     if (idx >= 0)
       return result(error_code::TOO_LARGE, src - beg + idx);
-    __riscv_vse8_v_u8m4((uint8_t*)dst, __riscv_vncvt_x_x_w_u8m4(v, vl), vl);
+    __riscv_vse8_v_u8m4((uint8_t *)dst, __riscv_vncvt_x_x_w_u8m4(v, vl), vl);
   }
   return result(error_code::SUCCESS, src - beg);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(const char16_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
+    const char16_t *src, size_t len, char *dst) const noexcept {
   result res = convert_utf16le_to_latin1_with_errors(src, len, dst);
   return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(const char16_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
+    const char16_t *src, size_t len, char *dst) const noexcept {
   result res = convert_utf16be_to_latin1_with_errors(src, len, dst);
   return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_latin1_with_errors(const char16_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused result
+implementation::convert_utf16le_to_latin1_with_errors(
+    const char16_t *src, size_t len, char *dst) const noexcept {
   return rvv_utf16_to_latin1_with_errors<simdutf_ByteFlip::NONE>(src, len, dst);
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_latin1_with_errors(const char16_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused result
+implementation::convert_utf16be_to_latin1_with_errors(
+    const char16_t *src, size_t len, char *dst) const noexcept {
   if (supports_zvbb())
-    return rvv_utf16_to_latin1_with_errors<simdutf_ByteFlip::ZVBB>(src, len, dst);
+    return rvv_utf16_to_latin1_with_errors<simdutf_ByteFlip::ZVBB>(src, len,
+                                                                   dst);
   else
     return rvv_utf16_to_latin1_with_errors<simdutf_ByteFlip::V>(src, len, dst);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(const char16_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
+    const char16_t *src, size_t len, char *dst) const noexcept {
   const char16_t *const beg = src;
   for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
     vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t*)src, vl);
-    __riscv_vse8_v_u8m4((uint8_t*)dst, __riscv_vncvt_x_x_w_u8m4(v, vl), vl);
+    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
+    __riscv_vse8_v_u8m4((uint8_t *)dst, __riscv_vncvt_x_x_w_u8m4(v, vl), vl);
   }
   return src - beg;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(const char16_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
+    const char16_t *src, size_t len, char *dst) const noexcept {
   const char16_t *const beg = src;
   for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
     vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t*)src, vl);
-    __riscv_vse8_v_u8m4((uint8_t*)dst, __riscv_vnsrl_wx_u8m4(v, 8, vl), vl);
+    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
+    __riscv_vse8_v_u8m4((uint8_t *)dst, __riscv_vnsrl_wx_u8m4(v, 8, vl), vl);
   }
   return src - beg;
 }
 
-template<simdutf_ByteFlip bflip>
-simdutf_really_inline static result rvv_utf16_to_utf8_with_errors(const char16_t *src, size_t len, char *dst) {
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static result
+rvv_utf16_to_utf8_with_errors(const char16_t *src, size_t len, char *dst) {
   size_t n = len;
   const char16_t *srcBeg = src;
   const char *dstBeg = dst;
   size_t vl8m4 = __riscv_vsetvlmax_e8m4();
-  vbool2_t m4mulp2 = __riscv_vmseq_vx_u8m4_b2(__riscv_vand_vx_u8m4(__riscv_vid_v_u8m4(vl8m4), 3, vl8m4), 2, vl8m4);
+  vbool2_t m4mulp2 = __riscv_vmseq_vx_u8m4_b2(
+      __riscv_vand_vx_u8m4(__riscv_vid_v_u8m4(vl8m4), 3, vl8m4), 2, vl8m4);
 
-  for (size_t vl, vlOut; n > 0; ) {
+  for (size_t vl, vlOut; n > 0;) {
     vl = __riscv_vsetvl_e16m2(n);
 
-    vuint16m2_t v = __riscv_vle16_v_u16m2((uint16_t const*)src, vl);
+    vuint16m2_t v = __riscv_vle16_v_u16m2((uint16_t const *)src, vl);
     v = simdutf_byteflip<bflip>(v, vl);
-    vbool8_t m234 = __riscv_vmsgtu_vx_u16m2_b8(v, 0x80-1, vl);
+    vbool8_t m234 = __riscv_vmsgtu_vx_u16m2_b8(v, 0x80 - 1, vl);
 
-    if (__riscv_vfirst_m_b8(m234,vl) < 0) { /* 1 byte utf8 */
+    if (__riscv_vfirst_m_b8(m234, vl) < 0) { /* 1 byte utf8 */
       vlOut = vl;
-      __riscv_vse8_v_u8m1((uint8_t*)dst, __riscv_vncvt_x_x_w_u8m1(v, vlOut), vlOut);
+      __riscv_vse8_v_u8m1((uint8_t *)dst, __riscv_vncvt_x_x_w_u8m1(v, vlOut),
+                          vlOut);
       n -= vl, src += vl, dst += vlOut;
       continue;
     }
 
-    vbool8_t m34  = __riscv_vmsgtu_vx_u16m2_b8(v, 0x800-1, vl);
+    vbool8_t m34 = __riscv_vmsgtu_vx_u16m2_b8(v, 0x800 - 1, vl);
 
-    if (__riscv_vfirst_m_b8(m34,vl) < 0) { /* 1/2 byte utf8 */
+    if (__riscv_vfirst_m_b8(m34, vl) < 0) { /* 1/2 byte utf8 */
       /* 0: [     aaa|aabbbbbb]
        * 1: [aabbbbbb|        ] vsll 8
        * 2: [        |   aaaaa] vsrl 6
@@ -32305,28 +33604,32 @@ simdutf_really_inline static result rvv_utf16_to_utf8_with_errors(const char16_t
        * 4: [  bbbbbb|000aaaaa] (1|2)&3
        * 5: [11000000|11000000]
        * 6: [10bbbbbb|110aaaaa] 4|5 */
-      vuint16m2_t twoByte  =
-        __riscv_vand_vx_u16m2(__riscv_vor_vv_u16m2(
-          __riscv_vsll_vx_u16m2(v, 8, vl),
-          __riscv_vsrl_vx_u16m2(v, 6, vl),
-        vl), 0b0011111100011111, vl);
-      vuint16m2_t vout16 = __riscv_vor_vx_u16m2_mu(m234, v, twoByte, 0b1000000011000000, vl);
+      vuint16m2_t twoByte = __riscv_vand_vx_u16m2(
+          __riscv_vor_vv_u16m2(__riscv_vsll_vx_u16m2(v, 8, vl),
+                               __riscv_vsrl_vx_u16m2(v, 6, vl), vl),
+          0b0011111100011111, vl);
+      vuint16m2_t vout16 =
+          __riscv_vor_vx_u16m2_mu(m234, v, twoByte, 0b1000000011000000, vl);
       vuint8m2_t vout = __riscv_vreinterpret_v_u16m2_u8m2(vout16);
 
       /* Every high byte that is zero should be compressed
        * low bytes should never be compressed, so we set them
        * to all ones, and then create a non-zero bytes mask */
-      vbool4_t mcomp = __riscv_vmsne_vx_u8m2_b4(__riscv_vreinterpret_v_u16m2_u8m2(__riscv_vor_vx_u16m2(vout16, 0xFF, vl)), 0, vl*2);
-      vlOut = __riscv_vcpop_m_b4(mcomp, vl*2);
+      vbool4_t mcomp =
+          __riscv_vmsne_vx_u8m2_b4(__riscv_vreinterpret_v_u16m2_u8m2(
+                                       __riscv_vor_vx_u16m2(vout16, 0xFF, vl)),
+                                   0, vl * 2);
+      vlOut = __riscv_vcpop_m_b4(mcomp, vl * 2);
 
-      vout = __riscv_vcompress_vm_u8m2(vout, mcomp, vl*2);
-      __riscv_vse8_v_u8m2((uint8_t*)dst, vout, vlOut);
+      vout = __riscv_vcompress_vm_u8m2(vout, mcomp, vl * 2);
+      __riscv_vse8_v_u8m2((uint8_t *)dst, vout, vlOut);
 
       n -= vl, src += vl, dst += vlOut;
       continue;
     }
 
-    vbool8_t sur = __riscv_vmseq_vx_u16m2_b8(__riscv_vand_vx_u16m2(v, 0xF800, vl), 0xD800, vl);
+    vbool8_t sur = __riscv_vmseq_vx_u16m2_b8(
+        __riscv_vand_vx_u16m2(v, 0xF800, vl), 0xD800, vl);
     long first = __riscv_vfirst_m_b8(sur, vl);
     size_t tail = vl - first;
     vl = first < 0 ? vl : first;
@@ -32340,124 +33643,146 @@ simdutf_really_inline static result rvv_utf16_to_utf8_with_errors(const char16_t
        * v3: [        |1110aaaa] vsrl 12 | 0b11100000
        *  1: [00000000|0bcccccc|00000000|00000000] => [0bcccccc]
        *  2: [00000000|10cccccc|110bbbbb|00000000] => [110bbbbb] [10cccccc]
-       *  3: [00000000|10cccccc|10bbbbbb|1110aaaa] => [1110aaaa] [10bbbbbb] [10cccccc]
+       *  3: [00000000|10cccccc|10bbbbbb|1110aaaa] => [1110aaaa] [10bbbbbb]
+       *                                              [10cccccc]
        */
       vuint16m2_t v1, v2, v3, v12;
-      v1 = __riscv_vor_vx_u16m2_mu(m234, v, __riscv_vand_vx_u16m2(v, 0b00111111, vl), 0b10000000, vl);
+      v1 = __riscv_vor_vx_u16m2_mu(
+          m234, v, __riscv_vand_vx_u16m2(v, 0b00111111, vl), 0b10000000, vl);
       v1 = __riscv_vsll_vx_u16m2(v1, 8, vl);
 
-      v2 = __riscv_vor_vx_u16m2(__riscv_vand_vx_u16m2(__riscv_vsrl_vx_u16m2(v, 6, vl), 0b00111111, vl), 0b10000000, vl);
-      v2 = __riscv_vor_vx_u16m2_mu(__riscv_vmnot_m_b8(m34,vl), v2, v2, 0b01000000, vl);
-      v3 = __riscv_vor_vx_u16m2(__riscv_vsrl_vx_u16m2(v, 12, vl), 0b11100000, vl);
+      v2 = __riscv_vor_vx_u16m2(
+          __riscv_vand_vx_u16m2(__riscv_vsrl_vx_u16m2(v, 6, vl), 0b00111111,
+                                vl),
+          0b10000000, vl);
+      v2 = __riscv_vor_vx_u16m2_mu(__riscv_vmnot_m_b8(m34, vl), v2, v2,
+                                   0b01000000, vl);
+      v3 = __riscv_vor_vx_u16m2(__riscv_vsrl_vx_u16m2(v, 12, vl), 0b11100000,
+                                vl);
       v12 = __riscv_vor_vv_u16m2_mu(m234, v1, v1, v2, vl);
 
-      vuint32m4_t w12 = __riscv_vwmulu_vx_u32m4(v12, 1<<8, vl);
+      vuint32m4_t w12 = __riscv_vwmulu_vx_u32m4(v12, 1 << 8, vl);
       vuint32m4_t w123 = __riscv_vwaddu_wv_u32m4_mu(m34, w12, w12, v3, vl);
       vuint8m4_t vout = __riscv_vreinterpret_v_u32m4_u8m4(w123);
 
-      vbool2_t mcomp = __riscv_vmor_mm_b2(m4mulp2, __riscv_vmsne_vx_u8m4_b2(vout, 0, vl*4), vl*4);
-      vlOut = __riscv_vcpop_m_b2(mcomp, vl*4);
+      vbool2_t mcomp = __riscv_vmor_mm_b2(
+          m4mulp2, __riscv_vmsne_vx_u8m4_b2(vout, 0, vl * 4), vl * 4);
+      vlOut = __riscv_vcpop_m_b2(mcomp, vl * 4);
 
-      vout = __riscv_vcompress_vm_u8m4(vout, mcomp, vl*4);
-      __riscv_vse8_v_u8m4((uint8_t*)dst, vout, vlOut);
+      vout = __riscv_vcompress_vm_u8m4(vout, mcomp, vl * 4);
+      __riscv_vse8_v_u8m4((uint8_t *)dst, vout, vlOut);
 
       n -= vl, src += vl, dst += vlOut;
     }
 
-    if (tail) while (n) {
-      uint16_t word = simdutf_byteflip<bflip>(src[0]);
-      if((word & 0xFF80)==0) {
-        break;
-      } else if((word & 0xF800)==0) {
-        break;
-      } else if ((word & 0xF800) != 0xD800) {
-        break;
-      } else {
-        // must be a surrogate pair
-        if (n <= 1) return result(error_code::SURROGATE, src - srcBeg);
-        uint16_t diff = word - 0xD800;
-        if (diff > 0x3FF) return result(error_code::SURROGATE, src - srcBeg);
-        uint16_t diff2 = simdutf_byteflip<bflip>(src[1]) - 0xDC00;
-        if (diff2 > 0x3FF) return result(error_code::SURROGATE, src - srcBeg);
-
-        uint32_t value = ((diff + 0x40) << 10) + diff2 ;
-
-        // will generate four UTF-8 bytes
-        // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-        *dst++ = (char)( (value>>18)             | 0b11110000);
-        *dst++ = (char)(((value>>12) & 0b111111) | 0b10000000);
-        *dst++ = (char)(((value>> 6) & 0b111111) | 0b10000000);
-        *dst++ = (char)(( value      & 0b111111) | 0b10000000);
-        src += 2;
-        n -= 2;
+    if (tail)
+      while (n) {
+        uint16_t word = simdutf_byteflip<bflip>(src[0]);
+        if ((word & 0xFF80) == 0) {
+          break;
+        } else if ((word & 0xF800) == 0) {
+          break;
+        } else if ((word & 0xF800) != 0xD800) {
+          break;
+        } else {
+          // must be a surrogate pair
+          if (n <= 1)
+            return result(error_code::SURROGATE, src - srcBeg);
+          uint16_t diff = word - 0xD800;
+          if (diff > 0x3FF)
+            return result(error_code::SURROGATE, src - srcBeg);
+          uint16_t diff2 = simdutf_byteflip<bflip>(src[1]) - 0xDC00;
+          if (diff2 > 0x3FF)
+            return result(error_code::SURROGATE, src - srcBeg);
+
+          uint32_t value = ((diff + 0x40) << 10) + diff2;
+
+          // will generate four UTF-8 bytes
+          // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
+          *dst++ = (char)((value >> 18) | 0b11110000);
+          *dst++ = (char)(((value >> 12) & 0b111111) | 0b10000000);
+          *dst++ = (char)(((value >> 6) & 0b111111) | 0b10000000);
+          *dst++ = (char)((value & 0b111111) | 0b10000000);
+          src += 2;
+          n -= 2;
+        }
       }
-    }
   }
 
   return result(error_code::SUCCESS, dst - dstBeg);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(const char16_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *src, size_t len, char *dst) const noexcept {
   result res = convert_utf16le_to_utf8_with_errors(src, len, dst);
   return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(const char16_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *src, size_t len, char *dst) const noexcept {
   result res = convert_utf16be_to_utf8_with_errors(src, len, dst);
   return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(const char16_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *src, size_t len, char *dst) const noexcept {
   return rvv_utf16_to_utf8_with_errors<simdutf_ByteFlip::NONE>(src, len, dst);
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(const char16_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *src, size_t len, char *dst) const noexcept {
   if (supports_zvbb())
     return rvv_utf16_to_utf8_with_errors<simdutf_ByteFlip::ZVBB>(src, len, dst);
   else
     return rvv_utf16_to_utf8_with_errors<simdutf_ByteFlip::V>(src, len, dst);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(const char16_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *src, size_t len, char *dst) const noexcept {
   return convert_utf16le_to_utf8(src, len, dst);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(const char16_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *src, size_t len, char *dst) const noexcept {
   return convert_utf16be_to_utf8(src, len, dst);
 }
 
-template<simdutf_ByteFlip bflip>
-simdutf_really_inline static result rvv_utf16_to_utf32_with_errors(const char16_t *src, size_t len, char32_t *dst) {
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static result
+rvv_utf16_to_utf32_with_errors(const char16_t *src, size_t len, char32_t *dst) {
   const char16_t *const srcBeg = src;
   char32_t *const dstBeg = dst;
 
-  constexpr const uint16_t ANY_SURROGATE_MASK  = 0xf800;
+  constexpr const uint16_t ANY_SURROGATE_MASK = 0xf800;
   constexpr const uint16_t ANY_SURROGATE_VALUE = 0xd800;
-  constexpr const uint16_t LO_SURROGATE_MASK  = 0xfc00;
+  constexpr const uint16_t LO_SURROGATE_MASK = 0xfc00;
   constexpr const uint16_t LO_SURROGATE_VALUE = 0xdc00;
-  constexpr const uint16_t HI_SURROGATE_MASK  = 0xfc00;
+  constexpr const uint16_t HI_SURROGATE_MASK = 0xfc00;
   constexpr const uint16_t HI_SURROGATE_VALUE = 0xd800;
 
   uint16_t last = 0;
   while (len > 0) {
     size_t vl = __riscv_vsetvl_e16m2(len);
-    vuint16m2_t v0 = __riscv_vle16_v_u16m2((uint16_t const*)src, vl);
+    vuint16m2_t v0 = __riscv_vle16_v_u16m2((uint16_t const *)src, vl);
     v0 = simdutf_byteflip<bflip>(v0, vl);
 
-    {   // check fast-path
-        const vuint16m2_t v = __riscv_vand_vx_u16m2(v0, ANY_SURROGATE_MASK, vl);
-        const vbool8_t any_surrogate = __riscv_vmseq_vx_u16m2_b8(v, ANY_SURROGATE_VALUE, vl);
-        if (__riscv_vfirst_m_b8(any_surrogate, vl) < 0) {
-            /* no surrogates */
-            __riscv_vse32_v_u32m4((uint32_t*)dst, __riscv_vzext_vf2_u32m4(v0, vl), vl);
-            len -= vl;
-            src += vl;
-            dst += vl;
-            continue;
-        }
+    { // check fast-path
+      const vuint16m2_t v = __riscv_vand_vx_u16m2(v0, ANY_SURROGATE_MASK, vl);
+      const vbool8_t any_surrogate =
+          __riscv_vmseq_vx_u16m2_b8(v, ANY_SURROGATE_VALUE, vl);
+      if (__riscv_vfirst_m_b8(any_surrogate, vl) < 0) {
+        /* no surrogates */
+        __riscv_vse32_v_u32m4((uint32_t *)dst, __riscv_vzext_vf2_u32m4(v0, vl),
+                              vl);
+        len -= vl;
+        src += vl;
+        dst += vl;
+        continue;
+      }
     }
 
-    if ((simdutf_byteflip<bflip>(src[0]) & LO_SURROGATE_MASK) == LO_SURROGATE_VALUE) {
+    if ((simdutf_byteflip<bflip>(src[0]) & LO_SURROGATE_MASK) ==
+        LO_SURROGATE_VALUE) {
       return result(error_code::SURROGATE, src - srcBeg);
     }
 
@@ -32465,25 +33790,31 @@ simdutf_really_inline static result rvv_utf16_to_utf32_with_errors(const char16_
     vuint16m2_t v1 = __riscv_vslide1down_vx_u16m2(v0, 0, vl);
     vl = __riscv_vsetvl_e16m2(vl - 1);
     if (vl == 0) {
-        return result(error_code::SURROGATE, src - srcBeg);
+      return result(error_code::SURROGATE, src - srcBeg);
     }
 
-    const vbool8_t surhi  = __riscv_vmseq_vx_u16m2_b8(__riscv_vand_vx_u16m2(v0, HI_SURROGATE_MASK, vl), HI_SURROGATE_VALUE, vl);
-    const vbool8_t surlo  = __riscv_vmseq_vx_u16m2_b8(__riscv_vand_vx_u16m2(v1, LO_SURROGATE_MASK, vl), LO_SURROGATE_VALUE, vl);
+    const vbool8_t surhi = __riscv_vmseq_vx_u16m2_b8(
+        __riscv_vand_vx_u16m2(v0, HI_SURROGATE_MASK, vl), HI_SURROGATE_VALUE,
+        vl);
+    const vbool8_t surlo = __riscv_vmseq_vx_u16m2_b8(
+        __riscv_vand_vx_u16m2(v1, LO_SURROGATE_MASK, vl), LO_SURROGATE_VALUE,
+        vl);
 
     // compress everything but lo surrogates
-    const vbool8_t compress = __riscv_vmsne_vx_u16m2_b8(__riscv_vand_vx_u16m2(v0, LO_SURROGATE_MASK, vl), LO_SURROGATE_VALUE, vl);
+    const vbool8_t compress = __riscv_vmsne_vx_u16m2_b8(
+        __riscv_vand_vx_u16m2(v0, LO_SURROGATE_MASK, vl), LO_SURROGATE_VALUE,
+        vl);
 
     {
-        const vbool8_t diff = __riscv_vmxor_mm_b8(surhi, surlo, vl);
-        const long idx = __riscv_vfirst_m_b8(diff, vl);
-        if (idx >= 0) {
-          uint16_t word = simdutf_byteflip<bflip>(src[idx]);
-          if(word < 0xD800 || word > 0xDBFF) {
-            return result(error_code::SURROGATE, src - srcBeg + idx + 1);
-          }
-          return result(error_code::SURROGATE, src - srcBeg + idx);
+      const vbool8_t diff = __riscv_vmxor_mm_b8(surhi, surlo, vl);
+      const long idx = __riscv_vfirst_m_b8(diff, vl);
+      if (idx >= 0) {
+        uint16_t word = simdutf_byteflip<bflip>(src[idx]);
+        if (word < 0xD800 || word > 0xDBFF) {
+          return result(error_code::SURROGATE, src - srcBeg + idx + 1);
         }
+        return result(error_code::SURROGATE, src - srcBeg + idx);
+      }
     }
 
     last = simdutf_byteflip<bflip>(src[vl]);
@@ -32493,12 +33824,14 @@ simdutf_really_inline static result rvv_utf16_to_utf32_with_errors(const char16_
     // v1 = 110111xxxxxxxxxx (0xdc00 + xxxxxxxxxx) --- lo surrogate
 
     // t0 = u16(                    0000_00yy_yyyy_yyyy)
-    const vuint32m4_t t0 = __riscv_vzext_vf2_u32m4(__riscv_vand_vx_u16m2(v0, 0x03ff, vl), vl);
+    const vuint32m4_t t0 =
+        __riscv_vzext_vf2_u32m4(__riscv_vand_vx_u16m2(v0, 0x03ff, vl), vl);
     // t1 = u32(0000_0000_0000_yyyy_yyyy_yy00_0000_0000)
     const vuint32m4_t t1 = __riscv_vsll_vx_u32m4(t0, 10, vl);
 
     // t2 = u32(0000_0000_0000_0000_0000_00xx_xxxx_xxxx)
-    const vuint32m4_t t2   = __riscv_vzext_vf2_u32m4(__riscv_vand_vx_u16m2(v1, 0x03ff, vl), vl);
+    const vuint32m4_t t2 =
+        __riscv_vzext_vf2_u32m4(__riscv_vand_vx_u16m2(v1, 0x03ff, vl), vl);
 
     // t3 = u32(0000_0000_0000_yyyy_yyyy_yyxx_xxxx_xxxx)
     const vuint32m4_t t3 = __riscv_vor_vv_u32m4(t1, t2, vl);
@@ -32510,101 +33843,117 @@ simdutf_really_inline static result rvv_utf16_to_utf32_with_errors(const char16_
 
     const vuint32m4_t comp = __riscv_vcompress_vm_u32m4(result, compress, vl);
     const size_t vlOut = __riscv_vcpop_m_b8(compress, vl);
-    __riscv_vse32_v_u32m4((uint32_t*)dst, comp, vlOut);
+    __riscv_vse32_v_u32m4((uint32_t *)dst, comp, vlOut);
 
     len -= vl;
     src += vl;
     dst += vlOut;
 
     if ((last & LO_SURROGATE_MASK) == LO_SURROGATE_VALUE) {
-        // last item is lo surrogate and got already consumed
-        len -= 1;
-        src += 1;
+      // last item is a lo surrogate and has already been consumed
+      len -= 1;
+      src += 1;
     }
   }
 
   return result(error_code::SUCCESS, dst - dstBeg);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(const char16_t *src, size_t len, char32_t *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *src, size_t len, char32_t *dst) const noexcept {
   result res = convert_utf16le_to_utf32_with_errors(src, len, dst);
   return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(const char16_t *src, size_t len, char32_t *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *src, size_t len, char32_t *dst) const noexcept {
   result res = convert_utf16be_to_utf32_with_errors(src, len, dst);
   return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(const char16_t *src, size_t len, char32_t *dst) const noexcept {
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *src, size_t len, char32_t *dst) const noexcept {
   return rvv_utf16_to_utf32_with_errors<simdutf_ByteFlip::NONE>(src, len, dst);
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(const char16_t *src, size_t len, char32_t *dst) const noexcept {
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *src, size_t len, char32_t *dst) const noexcept {
   if (supports_zvbb())
-    return rvv_utf16_to_utf32_with_errors<simdutf_ByteFlip::ZVBB>(src, len, dst);
+    return rvv_utf16_to_utf32_with_errors<simdutf_ByteFlip::ZVBB>(src, len,
+                                                                  dst);
   else
     return rvv_utf16_to_utf32_with_errors<simdutf_ByteFlip::V>(src, len, dst);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(const char16_t *src, size_t len, char32_t *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *src, size_t len, char32_t *dst) const noexcept {
   return convert_utf16le_to_utf32(src, len, dst);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(const char16_t *src, size_t len, char32_t *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *src, size_t len, char32_t *dst) const noexcept {
   return convert_utf16be_to_utf32(src, len, dst);
 }
 /* end file src/rvv/rvv_utf16_to.inl.cpp */
 /* begin file src/rvv/rvv_utf32_to.inl.cpp */
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(const char32_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
+    const char32_t *src, size_t len, char *dst) const noexcept {
   result res = convert_utf32_to_latin1_with_errors(src, len, dst);
   return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(const char32_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
+    const char32_t *src, size_t len, char *dst) const noexcept {
   const char32_t *const beg = src;
   for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
     vl = __riscv_vsetvl_e32m8(len);
-    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t*)src, vl);
+    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
     long idx = __riscv_vfirst_m_b4(__riscv_vmsgtu_vx_u32m8_b4(v, 255, vl), vl);
     if (idx >= 0)
       return result(error_code::TOO_LARGE, src - beg + idx);
-      /* We don't use vcompress here, because its performance varies widely on current platforms.
-       * This might be worth reconsidering once there is more hardware available. */
-    __riscv_vse8_v_u8m2((uint8_t*)dst, __riscv_vncvt_x_x_w_u8m2(__riscv_vncvt_x_x_w_u16m4(v, vl), vl), vl);
+    /* We don't use vcompress here, because its performance varies widely on
+     * current platforms. This might be worth reconsidering once there is more
+     * hardware available. */
+    __riscv_vse8_v_u8m2(
+        (uint8_t *)dst,
+        __riscv_vncvt_x_x_w_u8m2(__riscv_vncvt_x_x_w_u16m4(v, vl), vl), vl);
   }
   return result(error_code::SUCCESS, src - beg);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(const char32_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
+    const char32_t *src, size_t len, char *dst) const noexcept {
   return convert_utf32_to_latin1(src, len, dst);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(const char32_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *src, size_t len, char *dst) const noexcept {
   size_t n = len;
   const char32_t *srcBeg = src;
   const char *dstBeg = dst;
   size_t vl8m4 = __riscv_vsetvlmax_e8m4();
-  vbool2_t m4mulp2 = __riscv_vmseq_vx_u8m4_b2(__riscv_vand_vx_u8m4(__riscv_vid_v_u8m4(vl8m4), 3, vl8m4), 2, vl8m4);
+  vbool2_t m4mulp2 = __riscv_vmseq_vx_u8m4_b2(
+      __riscv_vand_vx_u8m4(__riscv_vid_v_u8m4(vl8m4), 3, vl8m4), 2, vl8m4);
 
-  for (size_t vl, vlOut; n > 0; ) {
+  for (size_t vl, vlOut; n > 0;) {
     vl = __riscv_vsetvl_e32m4(n);
 
-    vuint32m4_t v = __riscv_vle32_v_u32m4((uint32_t const*)src, vl);
-    vbool8_t m234 = __riscv_vmsgtu_vx_u32m4_b8(v, 0x80-1, vl);
+    vuint32m4_t v = __riscv_vle32_v_u32m4((uint32_t const *)src, vl);
+    vbool8_t m234 = __riscv_vmsgtu_vx_u32m4_b8(v, 0x80 - 1, vl);
     vuint16m2_t vn = __riscv_vncvt_x_x_w_u16m2(v, vl);
 
     if (__riscv_vfirst_m_b8(m234, vl) < 0) { /* 1 byte utf8 */
       vlOut = vl;
-      __riscv_vse8_v_u8m1((uint8_t*)dst, __riscv_vncvt_x_x_w_u8m1(vn, vlOut), vlOut);
+      __riscv_vse8_v_u8m1((uint8_t *)dst, __riscv_vncvt_x_x_w_u8m1(vn, vlOut),
+                          vlOut);
       n -= vl, src += vl, dst += vlOut;
       continue;
     }
 
-    vbool8_t m34  = __riscv_vmsgtu_vx_u32m4_b8(v, 0x800-1, vl);
+    vbool8_t m34 = __riscv_vmsgtu_vx_u32m4_b8(v, 0x800 - 1, vl);
 
-    if (__riscv_vfirst_m_b8(m34,vl) < 0) { /* 1/2 byte utf8 */
+    if (__riscv_vfirst_m_b8(m34, vl) < 0) { /* 1/2 byte utf8 */
       /* 0: [     aaa|aabbbbbb]
        * 1: [aabbbbbb|        ] vsll 8
        * 2: [        |   aaaaa] vsrl 6
@@ -32612,37 +33961,49 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(con
        * 4: [  bbbbbb|000aaaaa] (1|2)&3
        * 5: [10000000|11000000]
        * 6: [10bbbbbb|110aaaaa] 4|5 */
-      vuint16m2_t twoByte  =
-        __riscv_vand_vx_u16m2(__riscv_vor_vv_u16m2(
-          __riscv_vsll_vx_u16m2(vn, 8, vl),
-          __riscv_vsrl_vx_u16m2(vn, 6, vl),
-        vl), 0b0011111100111111, vl);
-      vuint16m2_t vout16 = __riscv_vor_vx_u16m2_mu(m234, vn, twoByte, 0b1000000011000000, vl);
+      vuint16m2_t twoByte = __riscv_vand_vx_u16m2(
+          __riscv_vor_vv_u16m2(__riscv_vsll_vx_u16m2(vn, 8, vl),
+                               __riscv_vsrl_vx_u16m2(vn, 6, vl), vl),
+          0b0011111100111111, vl);
+      vuint16m2_t vout16 =
+          __riscv_vor_vx_u16m2_mu(m234, vn, twoByte, 0b1000000011000000, vl);
       vuint8m2_t vout = __riscv_vreinterpret_v_u16m2_u8m2(vout16);
 
       /* Every high byte that is zero should be compressed
        * low bytes should never be compressed, so we set them
        * to all ones, and then create a non-zero bytes mask */
-      vbool4_t mcomp = __riscv_vmsne_vx_u8m2_b4(__riscv_vreinterpret_v_u16m2_u8m2(__riscv_vor_vx_u16m2(vout16, 0xFF, vl)), 0, vl*2);
-      vlOut = __riscv_vcpop_m_b4(mcomp, vl*2);
+      vbool4_t mcomp =
+          __riscv_vmsne_vx_u8m2_b4(__riscv_vreinterpret_v_u16m2_u8m2(
+                                       __riscv_vor_vx_u16m2(vout16, 0xFF, vl)),
+                                   0, vl * 2);
+      vlOut = __riscv_vcpop_m_b4(mcomp, vl * 2);
 
-      vout = __riscv_vcompress_vm_u8m2(vout, mcomp, vl*2);
-      __riscv_vse8_v_u8m2((uint8_t*)dst, vout, vlOut);
+      vout = __riscv_vcompress_vm_u8m2(vout, mcomp, vl * 2);
+      __riscv_vse8_v_u8m2((uint8_t *)dst, vout, vlOut);
 
       n -= vl, src += vl, dst += vlOut;
       continue;
     }
-    long idx1 = __riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0x10FFFF, vl), vl);
-    vbool8_t sur = __riscv_vmseq_vx_u32m4_b8(__riscv_vand_vx_u32m4(v, 0xFFFFF800, vl), 0xD800, vl);
+    long idx1 =
+        __riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0x10FFFF, vl), vl);
+    vbool8_t sur = __riscv_vmseq_vx_u32m4_b8(
+        __riscv_vand_vx_u32m4(v, 0xFFFFF800, vl), 0xD800, vl);
     long idx2 = __riscv_vfirst_m_b8(sur, vl);
-    if(idx1 >= 0 && idx2 >= 0) {
-      if(idx1 <= idx2) { return result(error_code::TOO_LARGE, src - srcBeg + idx1); }
-      else { return result(error_code::SURROGATE, src - srcBeg + idx2); }
+    if (idx1 >= 0 && idx2 >= 0) {
+      if (idx1 <= idx2) {
+        return result(error_code::TOO_LARGE, src - srcBeg + idx1);
+      } else {
+        return result(error_code::SURROGATE, src - srcBeg + idx2);
+      }
+    }
+    if (idx1 >= 0) {
+      return result(error_code::TOO_LARGE, src - srcBeg + idx1);
+    }
+    if (idx2 >= 0) {
+      return result(error_code::SURROGATE, src - srcBeg + idx2);
     }
-    if (idx1 >= 0) { return result(error_code::TOO_LARGE, src - srcBeg + idx1); }
-    if (idx2 >= 0) { return result(error_code::SURROGATE, src - srcBeg + idx2); }
 
-    vbool8_t m4 = __riscv_vmsgtu_vx_u32m4_b8(v, 0x10000-1, vl);
+    vbool8_t m4 = __riscv_vmsgtu_vx_u32m4_b8(v, 0x10000 - 1, vl);
     long first = __riscv_vfirst_m_b8(m4, vl);
     size_t tail = vl - first;
     vl = first < 0 ? vl : first;
@@ -32656,138 +34017,611 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(con
        * v3: [        |1110aaaa] vsrl 12 | 0b11100000
        *  1: [00000000|0bcccccc|00000000|00000000] => [0bcccccc]
        *  2: [00000000|10cccccc|110bbbbb|00000000] => [110bbbbb] [10cccccc]
-       *  3: [00000000|10cccccc|10bbbbbb|1110aaaa] => [1110aaaa] [10bbbbbb] [10cccccc]
+       *  3: [00000000|10cccccc|10bbbbbb|1110aaaa] => [1110aaaa] [10bbbbbb]
+       *                                              [10cccccc]
        */
       vuint16m2_t v1, v2, v3, v12;
-      v1 = __riscv_vor_vx_u16m2_mu(m234, vn, __riscv_vand_vx_u16m2(vn, 0b00111111, vl), 0b10000000, vl);
+      v1 = __riscv_vor_vx_u16m2_mu(
+          m234, vn, __riscv_vand_vx_u16m2(vn, 0b00111111, vl), 0b10000000, vl);
       v1 = __riscv_vsll_vx_u16m2(v1, 8, vl);
 
-      v2 = __riscv_vor_vx_u16m2(__riscv_vand_vx_u16m2(__riscv_vsrl_vx_u16m2(vn, 6, vl), 0b00111111, vl), 0b10000000, vl);
-      v2 = __riscv_vor_vx_u16m2_mu(__riscv_vmnot_m_b8(m34,vl), v2, v2, 0b01000000, vl);
-      v3 = __riscv_vor_vx_u16m2(__riscv_vsrl_vx_u16m2(vn, 12, vl), 0b11100000, vl);
+      v2 = __riscv_vor_vx_u16m2(
+          __riscv_vand_vx_u16m2(__riscv_vsrl_vx_u16m2(vn, 6, vl), 0b00111111,
+                                vl),
+          0b10000000, vl);
+      v2 = __riscv_vor_vx_u16m2_mu(__riscv_vmnot_m_b8(m34, vl), v2, v2,
+                                   0b01000000, vl);
+      v3 = __riscv_vor_vx_u16m2(__riscv_vsrl_vx_u16m2(vn, 12, vl), 0b11100000,
+                                vl);
       v12 = __riscv_vor_vv_u16m2_mu(m234, v1, v1, v2, vl);
 
-      vuint32m4_t w12 = __riscv_vwmulu_vx_u32m4(v12, 1<<8, vl);
+      vuint32m4_t w12 = __riscv_vwmulu_vx_u32m4(v12, 1 << 8, vl);
       vuint32m4_t w123 = __riscv_vwaddu_wv_u32m4_mu(m34, w12, w12, v3, vl);
       vuint8m4_t vout = __riscv_vreinterpret_v_u32m4_u8m4(w123);
 
-      vbool2_t mcomp = __riscv_vmor_mm_b2(m4mulp2, __riscv_vmsne_vx_u8m4_b2(vout, 0, vl*4), vl*4);
-      vlOut = __riscv_vcpop_m_b2(mcomp, vl*4);
+      vbool2_t mcomp = __riscv_vmor_mm_b2(
+          m4mulp2, __riscv_vmsne_vx_u8m4_b2(vout, 0, vl * 4), vl * 4);
+      vlOut = __riscv_vcpop_m_b2(mcomp, vl * 4);
 
-      vout = __riscv_vcompress_vm_u8m4(vout, mcomp, vl*4);
-      __riscv_vse8_v_u8m4((uint8_t*)dst, vout, vlOut);
+      vout = __riscv_vcompress_vm_u8m4(vout, mcomp, vl * 4);
+      __riscv_vse8_v_u8m4((uint8_t *)dst, vout, vlOut);
 
       n -= vl, src += vl, dst += vlOut;
     }
 
-    if (tail) while (n) {
-      uint32_t word = src[0];
-      if (word < 0x10000) break;
-      if (word > 0x10FFFF) return result(error_code::TOO_LARGE, src - srcBeg);
-      *dst++ = (uint8_t)(( word>>18)             | 0b11110000);
-      *dst++ = (uint8_t)(((word>>12) & 0b111111) | 0b10000000);
-      *dst++ = (uint8_t)(((word>> 6) & 0b111111) | 0b10000000);
-      *dst++ = (uint8_t)(( word      & 0b111111) | 0b10000000);
-      ++src;
-      --n;
-    }
+    if (tail)
+      while (n) {
+        uint32_t word = src[0];
+        if (word < 0x10000)
+          break;
+        if (word > 0x10FFFF)
+          return result(error_code::TOO_LARGE, src - srcBeg);
+        *dst++ = (uint8_t)((word >> 18) | 0b11110000);
+        *dst++ = (uint8_t)(((word >> 12) & 0b111111) | 0b10000000);
+        *dst++ = (uint8_t)(((word >> 6) & 0b111111) | 0b10000000);
+        *dst++ = (uint8_t)((word & 0b111111) | 0b10000000);
+        ++src;
+        --n;
+      }
   }
 
   return result(error_code::SUCCESS, dst - dstBeg);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(const char32_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *src, size_t len, char *dst) const noexcept {
   result res = convert_utf32_to_utf8_with_errors(src, len, dst);
   return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(const char32_t *src, size_t len, char *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *src, size_t len, char *dst) const noexcept {
   return convert_utf32_to_utf8(src, len, dst);
 }
 
-template<simdutf_ByteFlip bflip>
-simdutf_really_inline static result rvv_convert_utf32_to_utf16_with_errors(const char32_t *src, size_t len, char16_t *dst) {
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static result
+rvv_convert_utf32_to_utf16_with_errors(const char32_t *src, size_t len,
+                                       char16_t *dst) {
   size_t vl8m2 = __riscv_vsetvlmax_e8m2();
-  vbool4_t m4even = __riscv_vmseq_vx_u8m2_b4(__riscv_vand_vx_u8m2(__riscv_vid_v_u8m2(vl8m2), 1, vl8m2), 0, vl8m2);
+  vbool4_t m4even = __riscv_vmseq_vx_u8m2_b4(
+      __riscv_vand_vx_u8m2(__riscv_vid_v_u8m2(vl8m2), 1, vl8m2), 0, vl8m2);
   const char16_t *dstBeg = dst;
   const char32_t *srcBeg = src;
   for (size_t vl, vlOut; len > 0; len -= vl, src += vl, dst += vlOut) {
     vl = __riscv_vsetvl_e32m4(len);
-    vuint32m4_t v = __riscv_vle32_v_u32m4((uint32_t*)src, vl);
+    vuint32m4_t v = __riscv_vle32_v_u32m4((uint32_t *)src, vl);
     vuint32m4_t off = __riscv_vadd_vx_u32m4(v, 0xFFFF2000, vl);
-    long idx1 = __riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0x10FFFF, vl), vl);
-    long idx2 = __riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(off, 0xFFFFF7FF, vl), vl);
+    long idx1 =
+        __riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0x10FFFF, vl), vl);
+    long idx2 = __riscv_vfirst_m_b8(
+        __riscv_vmsgtu_vx_u32m4_b8(off, 0xFFFFF7FF, vl), vl);
     if (idx1 >= 0 && idx2 >= 0) {
-      if (idx1 <= idx2) return result(error_code::TOO_LARGE, src - srcBeg + idx1);
+      if (idx1 <= idx2)
+        return result(error_code::TOO_LARGE, src - srcBeg + idx1);
       return result(error_code::SURROGATE, src - srcBeg + idx2);
     }
-    if (idx1 >= 0) return result(error_code::TOO_LARGE, src - srcBeg + idx1);
-    if (idx2 >= 0) return result(error_code::SURROGATE, src - srcBeg + idx2);
-    long idx = __riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0xFFFF, vl), vl);
+    if (idx1 >= 0)
+      return result(error_code::TOO_LARGE, src - srcBeg + idx1);
+    if (idx2 >= 0)
+      return result(error_code::SURROGATE, src - srcBeg + idx2);
+    long idx =
+        __riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0xFFFF, vl), vl);
     if (idx < 0) {
       vlOut = vl;
-      vuint16m2_t n = simdutf_byteflip<bflip>(__riscv_vncvt_x_x_w_u16m2(v, vlOut), vlOut);
-      __riscv_vse16_v_u16m2((uint16_t*)dst, n, vlOut);
+      vuint16m2_t n =
+          simdutf_byteflip<bflip>(__riscv_vncvt_x_x_w_u16m2(v, vlOut), vlOut);
+      __riscv_vse16_v_u16m2((uint16_t *)dst, n, vlOut);
       continue;
     }
-    vlOut = rvv_utf32_store_utf16_m4<bflip>((uint16_t*)dst, v, vl, m4even);
+    vlOut = rvv_utf32_store_utf16_m4<bflip>((uint16_t *)dst, v, vl, m4even);
   }
   return result(error_code::SUCCESS, dst - dstBeg);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(const char32_t *src, size_t len, char16_t *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *src, size_t len, char16_t *dst) const noexcept {
   result res = convert_utf32_to_utf16le_with_errors(src, len, dst);
   return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(const char32_t *src, size_t len, char16_t *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *src, size_t len, char16_t *dst) const noexcept {
   result res = convert_utf32_to_utf16be_with_errors(src, len, dst);
   return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(const char32_t *src, size_t len, char16_t *dst) const noexcept {
-  return rvv_convert_utf32_to_utf16_with_errors<simdutf_ByteFlip::NONE>(src, len, dst);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *src, size_t len, char16_t *dst) const noexcept {
+  return rvv_convert_utf32_to_utf16_with_errors<simdutf_ByteFlip::NONE>(
+      src, len, dst);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(const char32_t *src, size_t len, char16_t *dst) const noexcept {
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *src, size_t len, char16_t *dst) const noexcept {
   if (supports_zvbb())
-    return rvv_convert_utf32_to_utf16_with_errors<simdutf_ByteFlip::ZVBB>(src, len, dst);
+    return rvv_convert_utf32_to_utf16_with_errors<simdutf_ByteFlip::ZVBB>(
+        src, len, dst);
   else
-    return rvv_convert_utf32_to_utf16_with_errors<simdutf_ByteFlip::V>(src, len, dst);
+    return rvv_convert_utf32_to_utf16_with_errors<simdutf_ByteFlip::V>(src, len,
+                                                                       dst);
 }
 
-template<simdutf_ByteFlip bflip>
-simdutf_really_inline static size_t rvv_convert_valid_utf32_to_utf16(const char32_t *src, size_t len, char16_t *dst) {
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static size_t
+rvv_convert_valid_utf32_to_utf16(const char32_t *src, size_t len,
+                                 char16_t *dst) {
   size_t vl8m2 = __riscv_vsetvlmax_e8m2();
-  vbool4_t m4even = __riscv_vmseq_vx_u8m2_b4(__riscv_vand_vx_u8m2(__riscv_vid_v_u8m2(vl8m2), 1, vl8m2), 0, vl8m2);
+  vbool4_t m4even = __riscv_vmseq_vx_u8m2_b4(
+      __riscv_vand_vx_u8m2(__riscv_vid_v_u8m2(vl8m2), 1, vl8m2), 0, vl8m2);
   char16_t *dstBeg = dst;
   for (size_t vl, vlOut; len > 0; len -= vl, src += vl, dst += vlOut) {
     vl = __riscv_vsetvl_e32m4(len);
-    vuint32m4_t v = __riscv_vle32_v_u32m4((uint32_t*)src, vl);
-    if (__riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0xFFFF, vl), vl) < 0) {
+    vuint32m4_t v = __riscv_vle32_v_u32m4((uint32_t *)src, vl);
+    if (__riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0xFFFF, vl), vl) <
+        0) {
       vlOut = vl;
-      vuint16m2_t n = simdutf_byteflip<bflip>(__riscv_vncvt_x_x_w_u16m2(v, vlOut), vlOut);
-      __riscv_vse16_v_u16m2((uint16_t*)dst, n, vlOut);
+      vuint16m2_t n =
+          simdutf_byteflip<bflip>(__riscv_vncvt_x_x_w_u16m2(v, vlOut), vlOut);
+      __riscv_vse16_v_u16m2((uint16_t *)dst, n, vlOut);
       continue;
     }
-    vlOut = rvv_utf32_store_utf16_m4<bflip>((uint16_t*)dst, v, vl, m4even);
+    vlOut = rvv_utf32_store_utf16_m4<bflip>((uint16_t *)dst, v, vl, m4even);
   }
   return dst - dstBeg;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(const char32_t *src, size_t len, char16_t *dst) const noexcept {
-  return rvv_convert_valid_utf32_to_utf16<simdutf_ByteFlip::NONE>(src, len, dst);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *src, size_t len, char16_t *dst) const noexcept {
+  return rvv_convert_valid_utf32_to_utf16<simdutf_ByteFlip::NONE>(src, len,
+                                                                  dst);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(const char32_t *src, size_t len, char16_t *dst) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *src, size_t len, char16_t *dst) const noexcept {
   if (supports_zvbb())
-    return rvv_convert_valid_utf32_to_utf16<simdutf_ByteFlip::ZVBB>(src, len, dst);
+    return rvv_convert_valid_utf32_to_utf16<simdutf_ByteFlip::ZVBB>(src, len,
+                                                                    dst);
   else
     return rvv_convert_valid_utf32_to_utf16<simdutf_ByteFlip::V>(src, len, dst);
 }
 /* end file src/rvv/rvv_utf32_to.inl.cpp */
+/* begin file src/rvv/rvv_utf8_to.inl.cpp */
+template <typename Tdst, simdutf_ByteFlip bflip, bool validate = true>
+simdutf_really_inline static size_t rvv_utf8_to_common(char const *src,
+                                                       size_t len, Tdst *dst) {
+  static_assert(std::is_same<Tdst, uint16_t>() ||
+                    std::is_same<Tdst, uint32_t>(),
+                "invalid type");
+  constexpr bool is16 = std::is_same<Tdst, uint16_t>();
+  constexpr endianness endian =
+      bflip == simdutf_ByteFlip::NONE ? endianness::LITTLE : endianness::BIG;
+  const auto scalar = [](char const *in, size_t count, Tdst *out) {
+    return is16 ? scalar::utf8_to_utf16::convert<endian>(in, count,
+                                                         (char16_t *)out)
+                : scalar::utf8_to_utf32::convert(in, count, (char32_t *)out);
+  };
+
+  if (len < 32)
+    return scalar(src, len, dst);
+
+  /* validate first three bytes */
+  if (validate) {
+    size_t idx = 3;
+    while (idx < len && (src[idx] >> 6) == 0b10)
+      ++idx;
+    if (idx > 3 + 3 || !scalar::utf8::validate(src, idx))
+      return 0;
+  }
+
+  size_t tail = 3;
+  size_t n = len - tail;
+  Tdst *beg = dst;
+
+  static const uint64_t err1m[] = {0x0202020202020202, 0x4915012180808080};
+  static const uint64_t err2m[] = {0xCBCBCB8B8383A3E7, 0xCBCBDBCBCBCBCBCB};
+  static const uint64_t err3m[] = {0x0101010101010101, 0X01010101BABAAEE6};
+
+  const vuint8m1_t err1tbl =
+      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err1m, 2));
+  const vuint8m1_t err2tbl =
+      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err2m, 2));
+  const vuint8m1_t err3tbl =
+      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err3m, 2));
+
+  size_t vl8m2 = __riscv_vsetvlmax_e8m2();
+  vbool4_t m4even = __riscv_vmseq_vx_u8m2_b4(
+      __riscv_vand_vx_u8m2(__riscv_vid_v_u8m2(vl8m2), 1, vl8m2), 0, vl8m2);
+
+  for (size_t vl, vlOut; n > 0; n -= vl, src += vl, dst += vlOut) {
+    vl = __riscv_vsetvl_e8m2(n);
+
+    vuint8m2_t v0 = __riscv_vle8_v_u8m2((uint8_t const *)src, vl);
+    uint64_t max = __riscv_vmv_x_s_u8m1_u8(
+        __riscv_vredmaxu_vs_u8m2_u8m1(v0, __riscv_vmv_s_x_u8m1(0, vl), vl));
+
+    uint8_t next0 = src[vl + 0];
+    uint8_t next1 = src[vl + 1];
+    uint8_t next2 = src[vl + 2];
+
+    /* fast path: ASCII */
+    if ((max | next0 | next1 | next2) < 0b10000000) {
+      vlOut = vl;
+      if (is16)
+        __riscv_vse16_v_u16m4(
+            (uint16_t *)dst,
+            simdutf_byteflip<bflip>(__riscv_vzext_vf2_u16m4(v0, vlOut), vlOut),
+            vlOut);
+      else
+        __riscv_vse32_v_u32m8((uint32_t *)dst,
+                              __riscv_vzext_vf4_u32m8(v0, vlOut), vlOut);
+      continue;
+    }
+
+    /* see "Validating UTF-8 In Less Than One Instruction Per Byte"
+     * https://arxiv.org/abs/2010.03090 */
+    vuint8m2_t v1 = __riscv_vslide1down_vx_u8m2(v0, next0, vl);
+    vuint8m2_t v2 = __riscv_vslide1down_vx_u8m2(v1, next1, vl);
+    vuint8m2_t v3 = __riscv_vslide1down_vx_u8m2(v2, next2, vl);
+
+    if (validate) {
+      vuint8m2_t s1 = __riscv_vreinterpret_v_u16m2_u8m2(__riscv_vsrl_vx_u16m2(
+          __riscv_vreinterpret_v_u8m2_u16m2(v2), 4, __riscv_vsetvlmax_e16m2()));
+      vuint8m2_t s3 = __riscv_vreinterpret_v_u16m2_u8m2(__riscv_vsrl_vx_u16m2(
+          __riscv_vreinterpret_v_u8m2_u16m2(v3), 4, __riscv_vsetvlmax_e16m2()));
+
+      vuint8m2_t idx2 = __riscv_vand_vx_u8m2(v2, 0xF, vl);
+      vuint8m2_t idx1 = __riscv_vand_vx_u8m2(s1, 0xF, vl);
+      vuint8m2_t idx3 = __riscv_vand_vx_u8m2(s3, 0xF, vl);
+
+      vuint8m2_t err1 = simdutf_vrgather_u8m1x2(err1tbl, idx1);
+      vuint8m2_t err2 = simdutf_vrgather_u8m1x2(err2tbl, idx2);
+      vuint8m2_t err3 = simdutf_vrgather_u8m1x2(err3tbl, idx3);
+      vint8m2_t errs = __riscv_vreinterpret_v_u8m2_i8m2(
+          __riscv_vand_vv_u8m2(__riscv_vand_vv_u8m2(err1, err2, vl), err3, vl));
+
+      vbool4_t is_3 = __riscv_vmsgtu_vx_u8m2_b4(v1, 0b11100000 - 1, vl);
+      vbool4_t is_4 = __riscv_vmsgtu_vx_u8m2_b4(v0, 0b11110000 - 1, vl);
+      vbool4_t is_34 = __riscv_vmor_mm_b4(is_3, is_4, vl);
+      vbool4_t err34 =
+          __riscv_vmxor_mm_b4(is_34, __riscv_vmslt_vx_i8m2_b4(errs, 0, vl), vl);
+      vbool4_t errm =
+          __riscv_vmor_mm_b4(__riscv_vmsgt_vx_i8m2_b4(errs, 0, vl), err34, vl);
+      if (__riscv_vfirst_m_b4(errm, vl) >= 0)
+        return 0;
+    }
+
+    /* decoding */
+
+    /* mask of non-continuation bytes */
+    vbool4_t m =
+        __riscv_vmsgt_vx_i8m2_b4(__riscv_vreinterpret_v_u8m2_i8m2(v0), -65, vl);
+    vlOut = __riscv_vcpop_m_b4(m, vl);
+
+    /* extract first and second bytes */
+    vuint8m2_t b1 = __riscv_vcompress_vm_u8m2(v0, m, vl);
+    vuint8m2_t b2 = __riscv_vcompress_vm_u8m2(v1, m, vl);
+
+    /* fast path: one and two byte */
+    if (max < 0b11100000) {
+      b2 = __riscv_vand_vx_u8m2(b2, 0b00111111, vlOut);
+
+      vbool4_t m1 = __riscv_vmsgtu_vx_u8m2_b4(b1, 0b10111111, vlOut);
+      b1 = __riscv_vand_vx_u8m2_mu(m1, b1, b1, 63, vlOut);
+
+      vuint16m4_t b12 = __riscv_vwmulu_vv_u16m4(
+          b1,
+          __riscv_vmerge_vxm_u8m2(__riscv_vmv_v_x_u8m2(1, vlOut), 1 << 6, m1,
+                                  vlOut),
+          vlOut);
+      b12 = __riscv_vwaddu_wv_u16m4_mu(m1, b12, b12, b2, vlOut);
+      if (is16)
+        __riscv_vse16_v_u16m4((uint16_t *)dst,
+                              simdutf_byteflip<bflip>(b12, vlOut), vlOut);
+      else
+        __riscv_vse32_v_u32m8((uint32_t *)dst,
+                              __riscv_vzext_vf2_u32m8(b12, vlOut), vlOut);
+      continue;
+    }
+
+    /* fast path: one, two and three byte */
+    if (max < 0b11110000) {
+      vuint8m2_t b3 = __riscv_vcompress_vm_u8m2(v2, m, vl);
+
+      b2 = __riscv_vand_vx_u8m2(b2, 0b00111111, vlOut);
+      b3 = __riscv_vand_vx_u8m2(b3, 0b00111111, vlOut);
+
+      vbool4_t m1 = __riscv_vmsgtu_vx_u8m2_b4(b1, 0b10111111, vlOut);
+      vbool4_t m3 = __riscv_vmsgtu_vx_u8m2_b4(b1, 0b11011111, vlOut);
+
+      vuint8m2_t t1 = __riscv_vand_vx_u8m2_mu(m1, b1, b1, 63, vlOut);
+      b1 = __riscv_vand_vx_u8m2_mu(m3, t1, b1, 15, vlOut);
+
+      vuint16m4_t b12 = __riscv_vwmulu_vv_u16m4(
+          b1,
+          __riscv_vmerge_vxm_u8m2(__riscv_vmv_v_x_u8m2(1, vlOut), 1 << 6, m1,
+                                  vlOut),
+          vlOut);
+      b12 = __riscv_vwaddu_wv_u16m4_mu(m1, b12, b12, b2, vlOut);
+      vuint16m4_t b123 = __riscv_vwaddu_wv_u16m4_mu(
+          m3, b12, __riscv_vsll_vx_u16m4_mu(m3, b12, b12, 6, vlOut), b3, vlOut);
+      if (is16)
+        __riscv_vse16_v_u16m4((uint16_t *)dst,
+                              simdutf_byteflip<bflip>(b123, vlOut), vlOut);
+      else
+        __riscv_vse32_v_u32m8((uint32_t *)dst,
+                              __riscv_vzext_vf2_u32m8(b123, vlOut), vlOut);
+      continue;
+    }
+
+    /* extract third and fourth bytes */
+    vuint8m2_t b3 = __riscv_vcompress_vm_u8m2(v2, m, vl);
+    vuint8m2_t b4 = __riscv_vcompress_vm_u8m2(v3, m, vl);
+
+    /* remove prefix from leading bytes
+     *
+     * We could also use vrgather here, but it increases register pressure,
+     * and its performance varies widely on current platforms. It might be
+     * worth reconsidering, though, once there is more hardware available.
+     * Same goes for the __riscv_vsrl_vv_u32m4 correction step.
+     *
+     * We shift left and then right by the number of bits in the prefix,
+     * which can be calculated as follows:
+     *         x                                max(x-10, 0)
+     * 0xxx -> 0000-0111 -> shift by 0 or 1  -> 0
+     * 10xx -> 1000-1011 -> don't care
+     * 110x -> 1100,1101 -> shift by 3       -> 2,3
+     * 1110 -> 1110      -> shift by 4       -> 4
+     * 1111 -> 1111      -> shift by 5       -> 5
+     *
+     * vssubu.vx v, 10, (max(x-10, 0)) almost gives us what we want, we
+     * just need to manually detect and handle the one special case:
+     */
+#define SIMDUTF_RVV_UTF8_TO_COMMON_M1(idx)                                     \
+  vuint8m1_t c1 = __riscv_vget_v_u8m2_u8m1(b1, idx);                           \
+  vuint8m1_t c2 = __riscv_vget_v_u8m2_u8m1(b2, idx);                           \
+  vuint8m1_t c3 = __riscv_vget_v_u8m2_u8m1(b3, idx);                           \
+  vuint8m1_t c4 = __riscv_vget_v_u8m2_u8m1(b4, idx);                           \
+  /* remove prefix from trailing bytes */                                      \
+  c2 = __riscv_vand_vx_u8m1(c2, 0b00111111, vlOut);                            \
+  c3 = __riscv_vand_vx_u8m1(c3, 0b00111111, vlOut);                            \
+  c4 = __riscv_vand_vx_u8m1(c4, 0b00111111, vlOut);                            \
+  vuint8m1_t shift = __riscv_vsrl_vx_u8m1(c1, 4, vlOut);                       \
+  shift = __riscv_vmerge_vxm_u8m1(__riscv_vssubu_vx_u8m1(shift, 10, vlOut), 3, \
+                                  __riscv_vmseq_vx_u8m1_b8(shift, 12, vlOut),  \
+                                  vlOut);                                      \
+  c1 = __riscv_vsll_vv_u8m1(c1, shift, vlOut);                                 \
+  c1 = __riscv_vsrl_vv_u8m1(c1, shift, vlOut);                                 \
+  /* unconditionally widen and combine to c1234 */                             \
+  vuint16m2_t c34 = __riscv_vwaddu_wv_u16m2(                                   \
+      __riscv_vwmulu_vx_u16m2(c3, 1 << 6, vlOut), c4, vlOut);                  \
+  vuint16m2_t c12 = __riscv_vwaddu_wv_u16m2(                                   \
+      __riscv_vwmulu_vx_u16m2(c1, 1 << 6, vlOut), c2, vlOut);                  \
+  vuint32m4_t c1234 = __riscv_vwaddu_wv_u32m4(                                 \
+      __riscv_vwmulu_vx_u32m4(c12, 1 << 12, vlOut), c34, vlOut);               \
+  /* derive required right-shift amount from `shift` to reduce                 \
+   * c1234 to the required number of bytes */                                  \
+  c1234 = __riscv_vsrl_vv_u32m4(                                               \
+      c1234,                                                                   \
+      __riscv_vzext_vf4_u32m4(                                                 \
+          __riscv_vmul_vx_u8m1(                                                \
+              __riscv_vrsub_vx_u8m1(__riscv_vssubu_vx_u8m1(shift, 2, vlOut),   \
+                                    3, vlOut),                                 \
+              6, vlOut),                                                       \
+          vlOut),                                                              \
+      vlOut);                                                                  \
+  /* store result in desired format */                                         \
+  if (is16)                                                                    \
+    vlDst = rvv_utf32_store_utf16_m4<bflip>((uint16_t *)dst, c1234, vlOut,     \
+                                            m4even);                           \
+  else                                                                         \
+    vlDst = vlOut, __riscv_vse32_v_u32m4((uint32_t *)dst, c1234, vlOut);
+
+    /* Unrolling this manually reduces register pressure and allows
+     * us to terminate early. */
+    {
+      size_t vlOutm2 = vlOut, vlDst;
+      vlOut = __riscv_vsetvl_e8m1(vlOut);
+      SIMDUTF_RVV_UTF8_TO_COMMON_M1(0)
+      if (vlOutm2 == vlOut) {
+        vlOut = vlDst;
+        continue;
+      }
+
+      dst += vlDst;
+      vlOut = vlOutm2 - vlOut;
+    }
+    {
+      size_t vlDst;
+      SIMDUTF_RVV_UTF8_TO_COMMON_M1(1)
+      vlOut = vlDst;
+    }
+
+#undef SIMDUTF_RVV_UTF8_TO_COMMON_M1
+  }
+
+  /* validate the last character and reparse it + tail */
+  if (len > tail) {
+    if ((src[0] >> 6) == 0b10)
+      --dst;
+    while ((src[0] >> 6) == 0b10 && tail < len)
+      --src, ++tail;
+    if (is16) {
+      /* go back one more, when on high surrogate */
+      if (simdutf_byteflip<bflip>((uint16_t)dst[-1]) >= 0xD800 &&
+          simdutf_byteflip<bflip>((uint16_t)dst[-1]) <= 0xDBFF)
+        --dst;
+    }
+  }
+  size_t ret = scalar(src, tail, dst);
+  if (ret == 0)
+    return 0;
+  return (size_t)(dst - beg) + ret;
+}
+
+simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
+    const char *src, size_t len, char *dst) const noexcept {
+  const char *beg = dst;
+  uint8_t last = 0;
+  for (size_t vl, vlOut; len > 0;
+       len -= vl, src += vl, dst += vlOut, last = src[-1]) {
+    vl = __riscv_vsetvl_e8m2(len);
+    vuint8m2_t v1 = __riscv_vle8_v_u8m2((uint8_t *)src, vl);
+    // check which bytes are ASCII
+    vbool4_t ascii = __riscv_vmsltu_vx_u8m2_b4(v1, 0b10000000, vl);
+    // count ASCII bytes
+    vlOut = __riscv_vcpop_m_b4(ascii, vl);
+    // The original code would only enter the next block after this check:
+    //   vbool4_t m = __riscv_vmsltu_vx_u8m2_b4(v1, 0b11000000, vl);
+    //   vlOut = __riscv_vcpop_m_b4(m, vl);
+    //   if (vlOut != vl || last > 0b01111111) {...}
+    // So when everything was ASCII or continuation bytes, we proceeded
+    // without any processing, going straight to __riscv_vse8_v_u8m2.
+    // But the __riscv_vslide1up_vx_u8m2 is needed whenever there is a
+    // non-ASCII byte.
+    if (vlOut != vl) { // If not pure ASCII
+      // Non-ASCII characters
+      // We now want to mark the ASCII and continuation bytes
+      vbool4_t m = __riscv_vmsltu_vx_u8m2_b4(v1, 0b11000000, vl);
+      // We count them, that's our new vlOut (output vector length)
+      vlOut = __riscv_vcpop_m_b4(m, vl);
+
+      vuint8m2_t v0 = __riscv_vslide1up_vx_u8m2(v1, last, vl);
+
+      vbool4_t leading0 = __riscv_vmsgtu_vx_u8m2_b4(v0, 0b10111111, vl);
+      vbool4_t trailing1 = __riscv_vmslt_vx_i8m2_b4(
+          __riscv_vreinterpret_v_u8m2_i8m2(v1), (uint8_t)0b11000000, vl);
+      // -62 is 0b11000010, so we check whether any of v0 is too big
+      vbool4_t tobig = __riscv_vmand_mm_b4(
+          leading0,
+          __riscv_vmsgtu_vx_u8m2_b4(__riscv_vxor_vx_u8m2(v0, (uint8_t)-62, vl),
+                                    1, vl),
+          vl);
+      if (__riscv_vfirst_m_b4(
+              __riscv_vmor_mm_b4(
+                  tobig, __riscv_vmxor_mm_b4(leading0, trailing1, vl), vl),
+              vl) >= 0)
+        return 0;
+
+      v1 = __riscv_vor_vx_u8m2_mu(__riscv_vmseq_vx_u8m2_b4(v0, 0b11000011, vl),
+                                  v1, v1, 0b01000000, vl);
+      v1 = __riscv_vcompress_vm_u8m2(v1, m, vl);
+    } else if (last >= 0b11000000) { // If last byte is a leading byte and we
+                                     // got only ASCII, error!
+      return 0;
+    }
+    __riscv_vse8_v_u8m2((uint8_t *)dst, v1, vlOut);
+  }
+  if (last > 0b10111111)
+    return 0;
+  return dst - beg;
+}
+
+simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
+    const char *src, size_t len, char *dst) const noexcept {
+  size_t res = convert_utf8_to_latin1(src, len, dst);
+  if (res)
+    return result(error_code::SUCCESS, res);
+  return scalar::utf8_to_latin1::convert_with_errors(src, len, dst);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
+    const char *src, size_t len, char *dst) const noexcept {
+  const char *beg = dst;
+  uint8_t last = 0;
+  for (size_t vl, vlOut; len > 0;
+       len -= vl, src += vl, dst += vlOut, last = src[-1]) {
+    vl = __riscv_vsetvl_e8m2(len);
+    vuint8m2_t v1 = __riscv_vle8_v_u8m2((uint8_t *)src, vl);
+    vbool4_t ascii = __riscv_vmsltu_vx_u8m2_b4(v1, 0b10000000, vl);
+    vlOut = __riscv_vcpop_m_b4(ascii, vl);
+    if (vlOut != vl) { // If not pure ASCII
+      vbool4_t m = __riscv_vmsltu_vx_u8m2_b4(v1, 0b11000000, vl);
+      vlOut = __riscv_vcpop_m_b4(m, vl);
+      vuint8m2_t v0 = __riscv_vslide1up_vx_u8m2(v1, last, vl);
+      v1 = __riscv_vor_vx_u8m2_mu(__riscv_vmseq_vx_u8m2_b4(v0, 0b11000011, vl),
+                                  v1, v1, 0b01000000, vl);
+      v1 = __riscv_vcompress_vm_u8m2(v1, m, vl);
+    }
+    __riscv_vse8_v_u8m2((uint8_t *)dst, v1, vlOut);
+  }
+  return dst - beg;
+}
+
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::NONE>(src, len,
+                                                              (uint16_t *)dst);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  if (supports_zvbb())
+    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::ZVBB>(
+        src, len, (uint16_t *)dst);
+  else
+    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::V>(src, len,
+                                                             (uint16_t *)dst);
+}
+
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  size_t res = convert_utf8_to_utf16le(src, len, dst);
+  if (res)
+    return result(error_code::SUCCESS, res);
+  return scalar::utf8_to_utf16::convert_with_errors<endianness::LITTLE>(
+      src, len, dst);
+}
+
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  size_t res = convert_utf8_to_utf16be(src, len, dst);
+  if (res)
+    return result(error_code::SUCCESS, res);
+  return scalar::utf8_to_utf16::convert_with_errors<endianness::BIG>(src, len,
+                                                                     dst);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::NONE, false>(
+      src, len, (uint16_t *)dst);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  if (supports_zvbb())
+    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::ZVBB, false>(
+        src, len, (uint16_t *)dst);
+  else
+    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::V, false>(
+        src, len, (uint16_t *)dst);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char *src, size_t len, char32_t *dst) const noexcept {
+  return rvv_utf8_to_common<uint32_t, simdutf_ByteFlip::NONE>(src, len,
+                                                              (uint32_t *)dst);
+}
+
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char *src, size_t len, char32_t *dst) const noexcept {
+  size_t res = convert_utf8_to_utf32(src, len, dst);
+  if (res)
+    return result(error_code::SUCCESS, res);
+  return scalar::utf8_to_utf32::convert_with_errors(src, len, dst);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char *src, size_t len, char32_t *dst) const noexcept {
+  return rvv_utf8_to_common<uint32_t, simdutf_ByteFlip::NONE, false>(
+      src, len, (uint32_t *)dst);
+}
+/* end file src/rvv/rvv_utf8_to.inl.cpp */
 
-simdutf_warn_unused int implementation::detect_encodings(const char *input, size_t length) const noexcept {
+simdutf_warn_unused int
+implementation::detect_encodings(const char *input,
+                                 size_t length) const noexcept {
   // If there is a BOM, then we trust it.
   auto bom_encoding = simdutf::BOM::check_bom(input, length);
   if (bom_encoding != encoding_type::unspecified)
@@ -32797,117 +34631,137 @@ simdutf_warn_unused int implementation::detect_encodings(const char *input, size
   if (validate_utf8(input, length))
     out |= encoding_type::UTF8;
   if (length % 2 == 0) {
-    if (validate_utf16(reinterpret_cast<const char16_t*>(input), length/2))
+    if (validate_utf16(reinterpret_cast<const char16_t *>(input), length / 2))
       out |= encoding_type::UTF16_LE;
   }
   if (length % 4 == 0) {
-    if (validate_utf32(reinterpret_cast<const char32_t*>(input), length/4))
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4))
       out |= encoding_type::UTF32_LE;
   }
 
   return out;
 }
 
-template<simdutf_ByteFlip bflip>
-simdutf_really_inline static void rvv_change_endianness_utf16(const char16_t *src, size_t len, char16_t *dst) {
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static void
+rvv_change_endianness_utf16(const char16_t *src, size_t len, char16_t *dst) {
   for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
     vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t*)src, vl);
+    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
     __riscv_vse16_v_u16m8((uint16_t *)dst, simdutf_byteflip<bflip>(v, vl), vl);
   }
 }
 
-void implementation::change_endianness_utf16(const char16_t *src, size_t len, char16_t *dst) const noexcept {
+void implementation::change_endianness_utf16(const char16_t *src, size_t len,
+                                             char16_t *dst) const noexcept {
   if (supports_zvbb())
     return rvv_change_endianness_utf16<simdutf_ByteFlip::ZVBB>(src, len, dst);
   else
     return rvv_change_endianness_utf16<simdutf_ByteFlip::V>(src, len, dst);
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept {
-  while(length > 0 && scalar::base64::is_ascii_white_space(input[length - 1])) {
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
     length--;
   }
-  size_t equallocation = length; // location of the first padding character if any
+  size_t equallocation =
+      length; // location of the first padding character if any
   size_t equalsigns = 0;
-  if(length > 0 && input[length - 1] == '=') {
+  if (length > 0 && input[length - 1] == '=') {
     equallocation = length - 1;
     length -= 1;
     equalsigns++;
-    while(length > 0 && scalar::base64::is_ascii_white_space(input[length - 1])) {
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
       length--;
     }
-    if(length > 0 && input[length - 1] == '=') {
+    if (length > 0 && input[length - 1] == '=') {
       equallocation = length - 1;
       equalsigns++;
       length -= 1;
     }
   }
-  if(length == 0) {
-    if(equalsigns > 0) {
+  if (length == 0) {
+    if (equalsigns > 0) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
     return {SUCCESS, 0};
   }
-  result r = scalar::base64::base64_tail_decode(output, input, length, options);
-  if(r.error == error_code::SUCCESS && equalsigns > 0) {
+  result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
     // additional checks
-    if((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
   }
   return r;
 }
 
-
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept {
-  while(length > 0 && scalar::base64::is_ascii_white_space(input[length - 1])) {
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
     length--;
   }
-  size_t equallocation = length; // location of the first padding character if any
+  size_t equallocation =
+      length; // location of the first padding character if any
   auto equalsigns = 0;
-  if(length > 0 && input[length - 1] == '=') {
+  if (length > 0 && input[length - 1] == '=') {
     equallocation = length - 1;
     length -= 1;
     equalsigns++;
-    while(length > 0 && scalar::base64::is_ascii_white_space(input[length - 1])) {
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
       length--;
     }
-    if(length > 0 && input[length - 1] == '=') {
+    if (length > 0 && input[length - 1] == '=') {
       equallocation = length - 1;
       equalsigns++;
       length -= 1;
     }
   }
-  if(length == 0) {
-    if(equalsigns > 0) {
+  if (length == 0) {
+    if (equalsigns > 0) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
     return {SUCCESS, 0};
   }
-  result r = scalar::base64::base64_tail_decode(output, input, length, options);
-  if(r.error == error_code::SUCCESS && equalsigns > 0) {
+  result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
     // additional checks
-    if((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
   }
   return r;
 }
 
-simdutf_warn_unused size_t implementation::base64_length_from_binary(size_t length, base64_options options) const noexcept {
+simdutf_warn_unused size_t implementation::base64_length_from_binary(
+    size_t length, base64_options options) const noexcept {
   return scalar::base64::base64_length_from_binary(length, options);
 }
 
-size_t implementation::binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept {
+size_t implementation::binary_to_base64(const char *input, size_t length,
+                                        char *output,
+                                        base64_options options) const noexcept {
   return scalar::base64::tail_encode_base64(output, input, length, options);
 }
 } // namespace rvv
@@ -32939,25 +34793,36 @@ namespace simdutf {
 namespace westmere {
 namespace {
 #ifndef SIMDUTF_WESTMERE_H
-#error "westmere.h must be included"
+  #error "westmere.h must be included"
 #endif
 using namespace simd;
 
-simdutf_really_inline bool is_ascii(const simd8x64<uint8_t>& input) {
+simdutf_really_inline bool is_ascii(const simd8x64<uint8_t> &input) {
   return input.reduce_or().is_ascii();
 }
 
-simdutf_unused simdutf_really_inline simd8<bool> must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2, const simd8<uint8_t> prev3) {
-  simd8<uint8_t> is_second_byte = prev1.saturating_sub(0b11000000u-1); // Only 11______ will be > 0
-  simd8<uint8_t> is_third_byte  = prev2.saturating_sub(0b11100000u-1); // Only 111_____ will be > 0
-  simd8<uint8_t> is_fourth_byte = prev3.saturating_sub(0b11110000u-1); // Only 1111____ will be > 0
-  // Caller requires a bool (all 1's). All values resulting from the subtraction will be <= 64, so signed comparison is fine.
-  return simd8<int8_t>(is_second_byte | is_third_byte | is_fourth_byte) > int8_t(0);
-}
-
-simdutf_really_inline simd8<bool> must_be_2_3_continuation(const simd8<uint8_t> prev2, const simd8<uint8_t> prev3) {
-  simd8<uint8_t> is_third_byte  = prev2.saturating_sub(0xe0u-0x80); // Only 111_____ will be >= 0x80
-  simd8<uint8_t> is_fourth_byte = prev3.saturating_sub(0xf0u-0x80); // Only 1111____ will be >= 0x80
+simdutf_unused simdutf_really_inline simd8<bool>
+must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2,
+                     const simd8<uint8_t> prev3) {
+  simd8<uint8_t> is_second_byte =
+      prev1.saturating_sub(0b11000000u - 1); // Only 11______ will be > 0
+  simd8<uint8_t> is_third_byte =
+      prev2.saturating_sub(0b11100000u - 1); // Only 111_____ will be > 0
+  simd8<uint8_t> is_fourth_byte =
+      prev3.saturating_sub(0b11110000u - 1); // Only 1111____ will be > 0
+  // Caller requires a bool (all 1's). All values resulting from the subtraction
+  // will be <= 64, so signed comparison is fine.
+  return simd8<int8_t>(is_second_byte | is_third_byte | is_fourth_byte) >
+         int8_t(0);
+}
+
+simdutf_really_inline simd8<bool>
+must_be_2_3_continuation(const simd8<uint8_t> prev2,
+                         const simd8<uint8_t> prev3) {
+  simd8<uint8_t> is_third_byte =
+      prev2.saturating_sub(0xe0u - 0x80); // Only 111_____ will be >= 0x80
+  simd8<uint8_t> is_fourth_byte =
+      prev3.saturating_sub(0xf0u - 0x80); // Only 1111____ will be >= 0x80
   return simd8<bool>(is_third_byte | is_fourth_byte);
 }
 
@@ -32967,18 +34832,15 @@ namespace westmere {
 
 /* begin file src/westmere/internal/write_v_u16_11bits_to_utf8.cpp */
 /*
-* reads a vector of uint16 values
-* bits after 11th are ignored
-* first 11 bits are encoded into utf8
-* !important! utf8_output must have at least 16 writable bytes
-*/
+ * reads a vector of uint16 values
+ * bits after 11th are ignored
+ * first 11 bits are encoded into utf8
+ * !important! utf8_output must have at least 16 writable bytes
+ */
 
-inline void write_v_u16_11bits_to_utf8(
-  const __m128i v_u16,
-  char*& utf8_output,
-  const __m128i one_byte_bytemask,
-  const uint16_t one_byte_bitmask
-) {
+inline void write_v_u16_11bits_to_utf8(const __m128i v_u16, char *&utf8_output,
+                                       const __m128i one_byte_bytemask,
+                                       const uint16_t one_byte_bitmask) {
   // 0b1100_0000_1000_0000
   const __m128i v_c080 = _mm_set1_epi16((int16_t)0xc080);
   // 0b0001_1111_0000_0000
@@ -32987,8 +34849,8 @@ inline void write_v_u16_11bits_to_utf8(
   const __m128i v_003f = _mm_set1_epi16((int16_t)0x003f);
 
   // 1. prepare 2-byte values
-          // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-          // expected output   : [110a|aaaa|10bb|bbbb] x 8
+  // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+  // expected output   : [110a|aaaa|10bb|bbbb] x 8
 
   // t0 = [000a|aaaa|bbbb|bb00]
   const __m128i t0 = _mm_slli_epi16(v_u16, 2);
@@ -33005,34 +34867,35 @@ inline void write_v_u16_11bits_to_utf8(
   const __m128i utf8_unpacked = _mm_blendv_epi8(t4, v_u16, one_byte_bytemask);
 
   // 3. prepare bitmask for 8-bit lookup
-  //    one_byte_bitmask = hhggffeeddccbbaa -- the bits are doubled (h - MSB, a - LSB)
-  const uint16_t m0 = one_byte_bitmask & 0x5555;  // m0 = 0h0g0f0e0d0c0b0a
-  const uint16_t m1 = static_cast<uint16_t>(m0 >> 7);                    // m1 = 00000000h0g0f0e0
-  const uint8_t  m2 = static_cast<uint8_t>((m0 | m1) & 0xff);           // m2 =         hdgcfbea
+  //    one_byte_bitmask = hhggffeeddccbbaa -- the bits are doubled
+  //    (h - MSB, a - LSB)
+  const uint16_t m0 = one_byte_bitmask & 0x5555;      // m0 = 0h0g0f0e0d0c0b0a
+  const uint16_t m1 = static_cast<uint16_t>(m0 >> 7); // m1 = 00000000h0g0f0e0
+  const uint8_t m2 = static_cast<uint8_t>((m0 | m1) & 0xff); // m2 = hdgcfbea
   // 4. pack the bytes
-  const uint8_t* row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-  const __m128i shuffle = _mm_loadu_si128((__m128i*)(row + 1));
+  const uint8_t *row =
+      &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
+  const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
   const __m128i utf8_packed = _mm_shuffle_epi8(utf8_unpacked, shuffle);
 
   // 5. store bytes
-  _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
+  _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
 
   // 6. adjust pointers
   utf8_output += row[0];
 }
 
-inline void write_v_u16_11bits_to_utf8(
-  const __m128i v_u16,
-  char*& utf8_output,
-  const __m128i v_0000,
-  const __m128i v_ff80
-) {
+inline void write_v_u16_11bits_to_utf8(const __m128i v_u16, char *&utf8_output,
+                                       const __m128i v_0000,
+                                       const __m128i v_ff80) {
   // no bits set above 7th bit
-  const __m128i one_byte_bytemask = _mm_cmpeq_epi16(_mm_and_si128(v_u16, v_ff80), v_0000);
-  const uint16_t one_byte_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
+  const __m128i one_byte_bytemask =
+      _mm_cmpeq_epi16(_mm_and_si128(v_u16, v_ff80), v_0000);
+  const uint16_t one_byte_bitmask =
+      static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
 
-  write_v_u16_11bits_to_utf8(
-    v_u16, utf8_output, one_byte_bytemask, one_byte_bitmask);
+  write_v_u16_11bits_to_utf8(v_u16, utf8_output, one_byte_bytemask,
+                             one_byte_bitmask);
 }
 /* end file src/westmere/internal/write_v_u16_11bits_to_utf8.cpp */
 
@@ -33078,241 +34941,258 @@ inline void write_v_u16_11bits_to_utf8(
       0   0   1   0   1   0   0   0   b = a << 1
       1   1   1   1   1   1   1   0   c = V | a | b
                                   ^
-                                  the last bit can be zero, we just consume 7 code units
-                                  and recheck this word in the next iteration
+                                  the last bit can be zero, we just consume 7 code units
+                                  and recheck this word in the next iteration
 */
 
 /* Returns:
-   - pointer to the last unprocessed character (a scalar fallback should check the rest);
+   - pointer to the last unprocessed character (a scalar fallback should check
+   the rest);
    - nullptr if an error was detected.
 */
 template <endianness big_endian>
-const char16_t* sse_validate_utf16(const char16_t* input, size_t size) {
-    const char16_t* end = input + size;
-
-    const auto v_d8 = simd8<uint8_t>::splat(0xd8);
-    const auto v_f8 = simd8<uint8_t>::splat(0xf8);
-    const auto v_fc = simd8<uint8_t>::splat(0xfc);
-    const auto v_dc = simd8<uint8_t>::splat(0xdc);
-
-    while (input + simd16<uint16_t>::SIZE * 2 < end) {
-        // 0. Load data: since the validation takes into account only higher
-        //    byte of each word, we compress the two vectors into one which
-        //    consists only the higher bytes.
-        auto in0 = simd16<uint16_t>(input);
-        auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
-        if (big_endian) {
-            in0 = in0.swap_bytes();
-            in1 = in1.swap_bytes();
-        }
+const char16_t *sse_validate_utf16(const char16_t *input, size_t size) {
+  const char16_t *end = input + size;
+
+  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
+  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
+  const auto v_fc = simd8<uint8_t>::splat(0xfc);
+  const auto v_dc = simd8<uint8_t>::splat(0xdc);
+
+  while (input + simd16<uint16_t>::SIZE * 2 < end) {
+    // 0. Load data: since the validation takes into account only higher
+    //    byte of each word, we compress the two vectors into one which
+    //    consists only the higher bytes.
+    auto in0 = simd16<uint16_t>(input);
+    auto in1 =
+        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
+    if (big_endian) {
+      in0 = in0.swap_bytes();
+      in1 = in1.swap_bytes();
+    }
 
-        const auto t0 = in0.shr<8>();
-        const auto t1 = in1.shr<8>();
+    const auto t0 = in0.shr<8>();
+    const auto t1 = in1.shr<8>();
 
-        const auto in = simd16<uint16_t>::pack(t0, t1);
+    const auto in = simd16<uint16_t>::pack(t0, t1);
 
-        // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
-        const auto surrogates_wordmask = (in & v_f8) == v_d8;
-        const uint16_t surrogates_bitmask = static_cast<uint16_t>(surrogates_wordmask.to_bitmask());
-        if (surrogates_bitmask == 0x0000) {
-            input += 16;
-        } else {
-            // 2. We have some surrogates that have to be distinguished:
-            //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
-            //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
-            //
-            //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
-
-            // V - non-surrogate code units
-            //     V = not surrogates_wordmask
-            const uint16_t V = static_cast<uint16_t>(~surrogates_bitmask);
-
-            // H - word-mask for high surrogates: the six highest bits are 0b1101'11
-            const auto    vH = (in & v_fc) == v_dc;
-            const uint16_t H = static_cast<uint16_t>(vH.to_bitmask());
-
-            // L - word mask for low surrogates
-            //     L = not H and surrogates_wordmask
-            const uint16_t L = static_cast<uint16_t>(~H & surrogates_bitmask);
-
-            const uint16_t a = static_cast<uint16_t>(L & (H >> 1));  // A low surrogate must be followed by high one.
-                                              // (A low surrogate placed in the 7th register's word
-                                              // is an exception we handle.)
-            const uint16_t b = static_cast<uint16_t>(a << 1);        // Just mark that the opinput - startite fact is hold,
-                                              // thanks to that we have only two masks for valid case.
-            const uint16_t c = static_cast<uint16_t>(V | a | b);     // Combine all the masks into the final one.
-
-            if (c == 0xffff) {
-                // The whole input register contains valid UTF-16, i.e.,
-                // either single code units or proper surrogate pairs.
-                input += 16;
-            } else if (c == 0x7fff) {
-                // The 15 lower code units of the input register contains valid UTF-16.
-                // The 15th word may be either a low or high surrogate. It the next
-                // iteration we 1) check if the low surrogate is followed by a high
-                // one, 2) reject sole high surrogate.
-                input += 15;
-            } else {
-                return nullptr;
-            }
-        }
+    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
+    const auto surrogates_wordmask = (in & v_f8) == v_d8;
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(surrogates_wordmask.to_bitmask());
+    if (surrogates_bitmask == 0x0000) {
+      input += 16;
+    } else {
+      // 2. We have some surrogates that have to be distinguished:
+      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
+      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
+      //
+      //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
+
+      // V - non-surrogate code units
+      //     V = not surrogates_wordmask
+      const uint16_t V = static_cast<uint16_t>(~surrogates_bitmask);
+
+      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
+      const auto vH = (in & v_fc) == v_dc;
+      const uint16_t H = static_cast<uint16_t>(vH.to_bitmask());
+
+      // L - word mask for low surrogates
+      //     L = not H and surrogates_wordmask
+      const uint16_t L = static_cast<uint16_t>(~H & surrogates_bitmask);
+
+      const uint16_t a = static_cast<uint16_t>(
+          L & (H >> 1)); // A low surrogate must be followed by a high one.
+                         // (A low surrogate placed in the 7th register's word
+                         // is an exception we handle.)
+      const uint16_t b = static_cast<uint16_t>(
+          a << 1); // Just mark that the opposite fact holds,
+                   // thanks to that we have only two masks for valid case.
+      const uint16_t c = static_cast<uint16_t>(
+          V | a | b); // Combine all the masks into the final one.
+
+      if (c == 0xffff) {
+        // The whole input register contains valid UTF-16, i.e.,
+        // either single code units or proper surrogate pairs.
+        input += 16;
+      } else if (c == 0x7fff) {
+        // The 15 lower code units of the input register contain valid UTF-16.
+        // The 15th word may be either a low or high surrogate. In the next
+        // iteration we 1) check if the low surrogate is followed by a high
+        // one, 2) reject a sole high surrogate.
+        input += 15;
+      } else {
+        return nullptr;
+      }
     }
+  }
 
-    return input;
+  return input;
 }
 
-
 template <endianness big_endian>
-const result sse_validate_utf16_with_errors(const char16_t* input, size_t size) {
-    if (simdutf_unlikely(size == 0)) {
-        return result(error_code::SUCCESS, 0);
-    }
-    const char16_t* start = input;
-    const char16_t* end = input + size;
+const result sse_validate_utf16_with_errors(const char16_t *input,
+                                            size_t size) {
+  if (simdutf_unlikely(size == 0)) {
+    return result(error_code::SUCCESS, 0);
+  }
+  const char16_t *start = input;
+  const char16_t *end = input + size;
 
-    const auto v_d8 = simd8<uint8_t>::splat(0xd8);
-    const auto v_f8 = simd8<uint8_t>::splat(0xf8);
-    const auto v_fc = simd8<uint8_t>::splat(0xfc);
-    const auto v_dc = simd8<uint8_t>::splat(0xdc);
+  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
+  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
+  const auto v_fc = simd8<uint8_t>::splat(0xfc);
+  const auto v_dc = simd8<uint8_t>::splat(0xdc);
 
-    while (input + simd16<uint16_t>::SIZE * 2 < end) {
-        // 0. Load data: since the validation takes into account only higher
-        //    byte of each word, we compress the two vectors into one which
-        //    consists only the higher bytes.
-        auto in0 = simd16<uint16_t>(input);
-        auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
+  while (input + simd16<uint16_t>::SIZE * 2 < end) {
+    // 0. Load data: since the validation takes into account only higher
+    //    byte of each word, we compress the two vectors into one which
+    //    consists only the higher bytes.
+    auto in0 = simd16<uint16_t>(input);
+    auto in1 =
+        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
 
-        if (big_endian) {
-            in0 = in0.swap_bytes();
-            in1 = in1.swap_bytes();
-        }
+    if (big_endian) {
+      in0 = in0.swap_bytes();
+      in1 = in1.swap_bytes();
+    }
 
-        const auto t0 = in0.shr<8>();
-        const auto t1 = in1.shr<8>();
+    const auto t0 = in0.shr<8>();
+    const auto t1 = in1.shr<8>();
 
-        const auto in = simd16<uint16_t>::pack(t0, t1);
+    const auto in = simd16<uint16_t>::pack(t0, t1);
 
-        // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
-        const auto surrogates_wordmask = (in & v_f8) == v_d8;
-        const uint16_t surrogates_bitmask = static_cast<uint16_t>(surrogates_wordmask.to_bitmask());
-        if (surrogates_bitmask == 0x0000) {
-            input += 16;
-        } else {
-            // 2. We have some surrogates that have to be distinguished:
-            //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
-            //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
-            //
-            //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
-
-            // V - non-surrogate code units
-            //     V = not surrogates_wordmask
-            const uint16_t V = static_cast<uint16_t>(~surrogates_bitmask);
-
-            // H - word-mask for high surrogates: the six highest bits are 0b1101'11
-            const auto    vH = (in & v_fc) == v_dc;
-            const uint16_t H = static_cast<uint16_t>(vH.to_bitmask());
-
-            // L - word mask for low surrogates
-            //     L = not H and surrogates_wordmask
-            const uint16_t L = static_cast<uint16_t>(~H & surrogates_bitmask);
-
-            const uint16_t a = static_cast<uint16_t>(L & (H >> 1));  // A low surrogate must be followed by high one.
-                                              // (A low surrogate placed in the 7th register's word
-                                              // is an exception we handle.)
-            const uint16_t b = static_cast<uint16_t>(a << 1);        // Just mark that the opinput - startite fact is hold,
-                                              // thanks to that we have only two masks for valid case.
-            const uint16_t c = static_cast<uint16_t>(V | a | b);     // Combine all the masks into the final one.
-
-            if (c == 0xffff) {
-                // The whole input register contains valid UTF-16, i.e.,
-                // either single code units or proper surrogate pairs.
-                input += 16;
-            } else if (c == 0x7fff) {
-                // The 15 lower code units of the input register contains valid UTF-16.
-                // The 15th word may be either a low or high surrogate. It the next
-                // iteration we 1) check if the low surrogate is followed by a high
-                // one, 2) reject sole high surrogate.
-                input += 15;
-            } else {
-                return result(error_code::SURROGATE, input - start);
-            }
-        }
+    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
+    const auto surrogates_wordmask = (in & v_f8) == v_d8;
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(surrogates_wordmask.to_bitmask());
+    if (surrogates_bitmask == 0x0000) {
+      input += 16;
+    } else {
+      // 2. We have some surrogates that have to be distinguished:
+      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
+      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
+      //
+      //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
+
+      // V - non-surrogate code units
+      //     V = not surrogates_wordmask
+      const uint16_t V = static_cast<uint16_t>(~surrogates_bitmask);
+
+      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
+      const auto vH = (in & v_fc) == v_dc;
+      const uint16_t H = static_cast<uint16_t>(vH.to_bitmask());
+
+      // L - word mask for low surrogates
+      //     L = not H and surrogates_wordmask
+      const uint16_t L = static_cast<uint16_t>(~H & surrogates_bitmask);
+
+      const uint16_t a = static_cast<uint16_t>(
+          L & (H >> 1)); // A low surrogate must be followed by a high one.
+                         // (A low surrogate placed in the 7th register's word
+                         // is an exception we handle.)
+      const uint16_t b = static_cast<uint16_t>(
+          a << 1); // Just mark that the opposite fact holds,
+                   // thanks to that we have only two masks for valid case.
+      const uint16_t c = static_cast<uint16_t>(
+          V | a | b); // Combine all the masks into the final one.
+
+      if (c == 0xffff) {
+        // The whole input register contains valid UTF-16, i.e.,
+        // either single code units or proper surrogate pairs.
+        input += 16;
+      } else if (c == 0x7fff) {
+        // The 15 lower code units of the input register contain valid UTF-16.
+        // The 15th word may be either a low or high surrogate. In the next
+        // iteration we 1) check if the low surrogate is followed by a high
+        // one, 2) reject a sole high surrogate.
+        input += 15;
+      } else {
+        return result(error_code::SURROGATE, input - start);
+      }
     }
+  }
 
-    return result(error_code::SUCCESS, input - start);
+  return result(error_code::SUCCESS, input - start);
 }
 /* end file src/westmere/sse_validate_utf16.cpp */
 /* begin file src/westmere/sse_validate_utf32le.cpp */
 /* Returns:
-   - pointer to the last unprocessed character (a scalar fallback should check the rest);
+   - pointer to the last unprocessed character (a scalar fallback should check
+   the rest);
    - nullptr if an error was detected.
 */
-const char32_t* sse_validate_utf32le(const char32_t* input, size_t size) {
-    const char32_t* end = input + size;
-
-    const __m128i standardmax = _mm_set1_epi32(0x10ffff);
-    const __m128i offset = _mm_set1_epi32(0xffff2000);
-    const __m128i standardoffsetmax = _mm_set1_epi32(0xfffff7ff);
-    __m128i currentmax = _mm_setzero_si128();
-    __m128i currentoffsetmax = _mm_setzero_si128();
-
-    while (input + 4 < end) {
-        const __m128i in = _mm_loadu_si128((__m128i *)input);
-        currentmax = _mm_max_epu32(in,currentmax);
-        currentoffsetmax = _mm_max_epu32(_mm_add_epi32(in, offset), currentoffsetmax);
-        input += 4;
-    }
-    __m128i is_zero = _mm_xor_si128(_mm_max_epu32(currentmax, standardmax), standardmax);
-    if(_mm_test_all_zeros(is_zero, is_zero) == 0) {
-        return nullptr;
-    }
+const char32_t *sse_validate_utf32le(const char32_t *input, size_t size) {
+  const char32_t *end = input + size;
 
-    is_zero = _mm_xor_si128(_mm_max_epu32(currentoffsetmax, standardoffsetmax), standardoffsetmax);
-    if(_mm_test_all_zeros(is_zero, is_zero) == 0) {
-        return nullptr;
-    }
+  const __m128i standardmax = _mm_set1_epi32(0x10ffff);
+  const __m128i offset = _mm_set1_epi32(0xffff2000);
+  const __m128i standardoffsetmax = _mm_set1_epi32(0xfffff7ff);
+  __m128i currentmax = _mm_setzero_si128();
+  __m128i currentoffsetmax = _mm_setzero_si128();
 
-    return input;
-}
+  while (input + 4 < end) {
+    const __m128i in = _mm_loadu_si128((__m128i *)input);
+    currentmax = _mm_max_epu32(in, currentmax);
+    currentoffsetmax =
+        _mm_max_epu32(_mm_add_epi32(in, offset), currentoffsetmax);
+    input += 4;
+  }
+  __m128i is_zero =
+      _mm_xor_si128(_mm_max_epu32(currentmax, standardmax), standardmax);
+  if (_mm_test_all_zeros(is_zero, is_zero) == 0) {
+    return nullptr;
+  }
 
+  is_zero = _mm_xor_si128(_mm_max_epu32(currentoffsetmax, standardoffsetmax),
+                          standardoffsetmax);
+  if (_mm_test_all_zeros(is_zero, is_zero) == 0) {
+    return nullptr;
+  }
 
-const result sse_validate_utf32le_with_errors(const char32_t* input, size_t size) {
-    const char32_t* start = input;
-    const char32_t* end = input + size;
+  return input;
+}
 
-    const __m128i standardmax = _mm_set1_epi32(0x10ffff);
-    const __m128i offset = _mm_set1_epi32(0xffff2000);
-    const __m128i standardoffsetmax = _mm_set1_epi32(0xfffff7ff);
-    __m128i currentmax = _mm_setzero_si128();
-    __m128i currentoffsetmax = _mm_setzero_si128();
+const result sse_validate_utf32le_with_errors(const char32_t *input,
+                                              size_t size) {
+  const char32_t *start = input;
+  const char32_t *end = input + size;
 
-    while (input + 4 < end) {
-        const __m128i in = _mm_loadu_si128((__m128i *)input);
-        currentmax = _mm_max_epu32(in,currentmax);
-        currentoffsetmax = _mm_max_epu32(_mm_add_epi32(in, offset), currentoffsetmax);
+  const __m128i standardmax = _mm_set1_epi32(0x10ffff);
+  const __m128i offset = _mm_set1_epi32(0xffff2000);
+  const __m128i standardoffsetmax = _mm_set1_epi32(0xfffff7ff);
+  __m128i currentmax = _mm_setzero_si128();
+  __m128i currentoffsetmax = _mm_setzero_si128();
 
-        __m128i is_zero = _mm_xor_si128(_mm_max_epu32(currentmax, standardmax), standardmax);
-        if(_mm_test_all_zeros(is_zero, is_zero) == 0) {
-            return result(error_code::TOO_LARGE, input - start);
-        }
+  while (input + 4 < end) {
+    const __m128i in = _mm_loadu_si128((__m128i *)input);
+    currentmax = _mm_max_epu32(in, currentmax);
+    currentoffsetmax =
+        _mm_max_epu32(_mm_add_epi32(in, offset), currentoffsetmax);
 
-        is_zero = _mm_xor_si128(_mm_max_epu32(currentoffsetmax, standardoffsetmax), standardoffsetmax);
-        if(_mm_test_all_zeros(is_zero, is_zero) == 0) {
-            return result(error_code::SURROGATE, input - start);
-        }
-        input += 4;
+    __m128i is_zero =
+        _mm_xor_si128(_mm_max_epu32(currentmax, standardmax), standardmax);
+    if (_mm_test_all_zeros(is_zero, is_zero) == 0) {
+      return result(error_code::TOO_LARGE, input - start);
+    }
+
+    is_zero = _mm_xor_si128(_mm_max_epu32(currentoffsetmax, standardoffsetmax),
+                            standardoffsetmax);
+    if (_mm_test_all_zeros(is_zero, is_zero) == 0) {
+      return result(error_code::SURROGATE, input - start);
     }
+    input += 4;
+  }
 
-    return result(error_code::SUCCESS, input - start);
+  return result(error_code::SUCCESS, input - start);
 }
 /* end file src/westmere/sse_validate_utf32le.cpp */
 
 /* begin file src/westmere/sse_convert_latin1_to_utf8.cpp */
-std::pair<const char* const, char* const> sse_convert_latin1_to_utf8(
-  const char* latin_input,
-  const size_t latin_input_length,
-  char* utf8_output) {
-  const char* end = latin_input + latin_input_length;
+std::pair<const char *const, char *const>
+sse_convert_latin1_to_utf8(const char *latin_input,
+                           const size_t latin_input_length, char *utf8_output) {
+  const char *end = latin_input + latin_input_length;
 
   const __m128i v_0000 = _mm_setzero_si128();
   // 0b1000_0000
@@ -33320,70 +35200,60 @@ std::pair<const char* const, char* const> sse_convert_latin1_to_utf8(
   // 0b1111_1111_1000_0000
   const __m128i v_ff80 = _mm_set1_epi16((uint16_t)0xff80);
 
-  const __m128i latin_1_half_into_u16_byte_mask = _mm_setr_epi8(
-      0, '\x80',
-      1, '\x80',
-      2, '\x80',
-      3, '\x80',
-      4, '\x80',
-      5, '\x80',
-      6, '\x80',
-      7, '\x80'
-    );
-
-  const __m128i latin_2_half_into_u16_byte_mask = _mm_setr_epi8(
-      8, '\x80',
-      9, '\x80',
-      10, '\x80',
-      11, '\x80',
-      12, '\x80',
-      13, '\x80',
-      14, '\x80',
-      15, '\x80'
-    );
+  const __m128i latin_1_half_into_u16_byte_mask =
+      _mm_setr_epi8(0, '\x80', 1, '\x80', 2, '\x80', 3, '\x80', 4, '\x80', 5,
+                    '\x80', 6, '\x80', 7, '\x80');
+
+  const __m128i latin_2_half_into_u16_byte_mask =
+      _mm_setr_epi8(8, '\x80', 9, '\x80', 10, '\x80', 11, '\x80', 12, '\x80',
+                    13, '\x80', 14, '\x80', 15, '\x80');
 
   // each latin1 takes 1-2 utf8 bytes
-  // slow path writes useful 8-15 bytes twice (eagerly writes 16 bytes and then adjust the pointer)
-  // so the last write can exceed the utf8_output size by 8-1 bytes
-  // by reserving 8 extra input bytes, we expect the output to have 8-16 bytes free
+  // slow path writes useful 8-15 bytes twice (eagerly writes 16 bytes and then
+  // adjusts the pointer), so the last write can exceed the utf8_output size by
+  // 1-8 bytes; by reserving 8 extra input bytes, we expect the output to have
+  // 8-16 bytes free
   while (end - latin_input >= 16 + 8) {
     // Load 16 Latin1 characters (16 bytes) into a 128-bit register
-    __m128i v_latin = _mm_loadu_si128((__m128i*)latin_input);
-
+    __m128i v_latin = _mm_loadu_si128((__m128i *)latin_input);
 
-    if (_mm_testz_si128(v_latin, v_80)) {// ASCII fast path!!!!
-      _mm_storeu_si128((__m128i*)utf8_output, v_latin);
+    if (_mm_testz_si128(v_latin, v_80)) { // ASCII fast path!!!!
+      _mm_storeu_si128((__m128i *)utf8_output, v_latin);
       latin_input += 16;
       utf8_output += 16;
       continue;
     }
 
-
     // assuming a/b are bytes and A/B are uint16 of the same value
     // aaaa_aaaa_bbbb_bbbb -> AAAA_AAAA
-    __m128i v_u16_latin_1_half = _mm_shuffle_epi8(v_latin, latin_1_half_into_u16_byte_mask);
+    __m128i v_u16_latin_1_half =
+        _mm_shuffle_epi8(v_latin, latin_1_half_into_u16_byte_mask);
     // aaaa_aaaa_bbbb_bbbb -> BBBB_BBBB
-    __m128i v_u16_latin_2_half = _mm_shuffle_epi8(v_latin, latin_2_half_into_u16_byte_mask);
-
+    __m128i v_u16_latin_2_half =
+        _mm_shuffle_epi8(v_latin, latin_2_half_into_u16_byte_mask);
 
-    internal::westmere::write_v_u16_11bits_to_utf8(v_u16_latin_1_half, utf8_output, v_0000, v_ff80);
-    internal::westmere::write_v_u16_11bits_to_utf8(v_u16_latin_2_half, utf8_output, v_0000, v_ff80);
+    internal::westmere::write_v_u16_11bits_to_utf8(v_u16_latin_1_half,
+                                                   utf8_output, v_0000, v_ff80);
+    internal::westmere::write_v_u16_11bits_to_utf8(v_u16_latin_2_half,
+                                                   utf8_output, v_0000, v_ff80);
     latin_input += 16;
   }
 
   if (end - latin_input >= 16) {
     // Load 16 Latin1 characters (16 bytes) into a 128-bit register
-    __m128i v_latin = _mm_loadu_si128((__m128i*)latin_input);
+    __m128i v_latin = _mm_loadu_si128((__m128i *)latin_input);
 
-    if (_mm_testz_si128(v_latin, v_80)) {// ASCII fast path!!!!
-      _mm_storeu_si128((__m128i*)utf8_output, v_latin);
+    if (_mm_testz_si128(v_latin, v_80)) { // ASCII fast path!!!!
+      _mm_storeu_si128((__m128i *)utf8_output, v_latin);
       latin_input += 16;
       utf8_output += 16;
     } else {
       // assuming a/b are bytes and A/B are uint16 of the same value
       // aaaa_aaaa_bbbb_bbbb -> AAAA_AAAA
-      __m128i v_u16_latin_1_half = _mm_shuffle_epi8(v_latin, latin_1_half_into_u16_byte_mask);
-      internal::westmere::write_v_u16_11bits_to_utf8(v_u16_latin_1_half, utf8_output, v_0000, v_ff80);
+      __m128i v_u16_latin_1_half =
+          _mm_shuffle_epi8(v_latin, latin_1_half_into_u16_byte_mask);
+      internal::westmere::write_v_u16_11bits_to_utf8(
+          v_u16_latin_1_half, utf8_output, v_0000, v_ff80);
       latin_input += 8;
     }
   }
@@ -33393,115 +35263,125 @@ std::pair<const char* const, char* const> sse_convert_latin1_to_utf8(
 /* end file src/westmere/sse_convert_latin1_to_utf8.cpp */
 /* begin file src/westmere/sse_convert_latin1_to_utf16.cpp */
 template <endianness big_endian>
-std::pair<const char*, char16_t*> sse_convert_latin1_to_utf16(const char *latin1_input, size_t len,
-                                                              char16_t *utf16_output) {
-    size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 16
-    for (size_t i = 0; i < rounded_len; i += 16) {
-        // Load 16 Latin1 characters into a 128-bit register
-        __m128i in = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&latin1_input[i]));
-        __m128i out1 = big_endian ? _mm_unpacklo_epi8(_mm_setzero_si128(), in)
-                         : _mm_unpacklo_epi8(in, _mm_setzero_si128());
-        __m128i out2 = big_endian ? _mm_unpackhi_epi8(_mm_setzero_si128(), in)
-                         : _mm_unpackhi_epi8(in, _mm_setzero_si128());
-        // Zero extend each Latin1 character to 16-bit integers and store the results back to memory
-        _mm_storeu_si128(reinterpret_cast<__m128i*>(&utf16_output[i]), out1);
-        _mm_storeu_si128(reinterpret_cast<__m128i*>(&utf16_output[i + 8]), out2);
-    }
-    // return pointers pointing to where we left off
-    return std::make_pair(latin1_input + rounded_len, utf16_output + rounded_len);
+std::pair<const char *, char16_t *>
+sse_convert_latin1_to_utf16(const char *latin1_input, size_t len,
+                            char16_t *utf16_output) {
+  size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 16
+  for (size_t i = 0; i < rounded_len; i += 16) {
+    // Load 16 Latin1 characters into a 128-bit register
+    __m128i in =
+        _mm_loadu_si128(reinterpret_cast<const __m128i *>(&latin1_input[i]));
+    __m128i out1 = big_endian ? _mm_unpacklo_epi8(_mm_setzero_si128(), in)
+                              : _mm_unpacklo_epi8(in, _mm_setzero_si128());
+    __m128i out2 = big_endian ? _mm_unpackhi_epi8(_mm_setzero_si128(), in)
+                              : _mm_unpackhi_epi8(in, _mm_setzero_si128());
+    // Zero extend each Latin1 character to 16-bit integers and store the
+    // results back to memory
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(&utf16_output[i]), out1);
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(&utf16_output[i + 8]), out2);
+  }
+  // return pointers pointing to where we left off
+  return std::make_pair(latin1_input + rounded_len, utf16_output + rounded_len);
 }
 /* end file src/westmere/sse_convert_latin1_to_utf16.cpp */
 /* begin file src/westmere/sse_convert_latin1_to_utf32.cpp */
-std::pair<const char*, char32_t*> sse_convert_latin1_to_utf32(const char* buf, size_t len, char32_t* utf32_output) {
-    const char* end = buf + len;
-
-    while (end - buf >= 16) {
-        // Load 16 Latin1 characters (16 bytes) into a 128-bit register
-        __m128i in = _mm_loadu_si128((__m128i*)buf);
-
-        // Shift input to process next 4 bytes
-        __m128i in_shifted1 = _mm_srli_si128(in, 4);
-        __m128i in_shifted2 = _mm_srli_si128(in, 8);
-        __m128i in_shifted3 = _mm_srli_si128(in, 12);
-
-        // expand 8-bit to 32-bit unit
-        __m128i out1 = _mm_cvtepu8_epi32(in);
-        __m128i out2 = _mm_cvtepu8_epi32(in_shifted1);
-        __m128i out3 = _mm_cvtepu8_epi32(in_shifted2);
-        __m128i out4 = _mm_cvtepu8_epi32(in_shifted3);
-
-        _mm_storeu_si128((__m128i*)utf32_output, out1);
-        _mm_storeu_si128((__m128i*)(utf32_output + 4), out2);
-        _mm_storeu_si128((__m128i*)(utf32_output + 8), out3);
-        _mm_storeu_si128((__m128i*)(utf32_output + 12), out4);
-
-        utf32_output += 16;
-        buf += 16;
-    }
+std::pair<const char *, char32_t *>
+sse_convert_latin1_to_utf32(const char *buf, size_t len,
+                            char32_t *utf32_output) {
+  const char *end = buf + len;
 
-    return std::make_pair(buf, utf32_output);
-}
+  while (end - buf >= 16) {
+    // Load 16 Latin1 characters (16 bytes) into a 128-bit register
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
 
-/* end file src/westmere/sse_convert_latin1_to_utf32.cpp */
+    // Shift input to process next 4 bytes
+    __m128i in_shifted1 = _mm_srli_si128(in, 4);
+    __m128i in_shifted2 = _mm_srli_si128(in, 8);
+    __m128i in_shifted3 = _mm_srli_si128(in, 12);
+
+    // expand 8-bit to 32-bit unit
+    __m128i out1 = _mm_cvtepu8_epi32(in);
+    __m128i out2 = _mm_cvtepu8_epi32(in_shifted1);
+    __m128i out3 = _mm_cvtepu8_epi32(in_shifted2);
+    __m128i out4 = _mm_cvtepu8_epi32(in_shifted3);
+
+    _mm_storeu_si128((__m128i *)utf32_output, out1);
+    _mm_storeu_si128((__m128i *)(utf32_output + 4), out2);
+    _mm_storeu_si128((__m128i *)(utf32_output + 8), out3);
+    _mm_storeu_si128((__m128i *)(utf32_output + 12), out4);
+
+    utf32_output += 16;
+    buf += 16;
+  }
 
+  return std::make_pair(buf, utf32_output);
+}
+/* end file src/westmere/sse_convert_latin1_to_utf32.cpp */
 
 /* begin file src/westmere/sse_convert_utf8_to_utf16.cpp */
 // depends on "tables/utf8_to_utf16_tables.h"
 
-
 // Convert up to 12 bytes from utf8 to utf16 using a mask indicating the
 // end of the code points. Only the least significant 12 bits of the mask
 // are accessed.
 // It returns how many bytes were consumed (up to 12).
 template <endianness big_endian>
 size_t convert_masked_utf8_to_utf16(const char *input,
-                           uint64_t utf8_end_of_code_point_mask,
-                           char16_t *&utf16_output) {
+                                    uint64_t utf8_end_of_code_point_mask,
+                                    char16_t *&utf16_output) {
   // we use an approach where we try to process up to 12 input bytes.
   // Why 12 input bytes and not 16? Because we are concerned with the size of
   // the lookup tables. Also 12 is nicely divisible by two and three.
   //
   //
-  // Optimization note: our main path below is load-latency dependent. Thus it is maybe
-  // beneficial to have fast paths that depend on branch prediction but have less latency.
-  // This results in more instructions but, potentially, also higher speeds.
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // may be beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
   //
   // We first try a few fast paths.
-  const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+  const __m128i swap =
+      _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
   const __m128i in = _mm_loadu_si128((__m128i *)input);
   const uint16_t input_utf8_end_of_code_point_mask =
       utf8_end_of_code_point_mask & 0xfff;
-  if(utf8_end_of_code_point_mask == 0xfff) {
+  if (utf8_end_of_code_point_mask == 0xfff) {
     // We process the data in chunks of 12 bytes.
     // Note: using 16 bytes is unsafe, see issue_ossfuzz_71218
     __m128i ascii_first = _mm_cvtepu8_epi16(in);
-    __m128i ascii_second = _mm_cvtepu8_epi16(_mm_srli_si128(in,8));
+    __m128i ascii_second = _mm_cvtepu8_epi16(_mm_srli_si128(in, 8));
     if (big_endian) {
       ascii_first = _mm_shuffle_epi8(ascii_first, swap);
       ascii_second = _mm_shuffle_epi8(ascii_second, swap);
     }
     _mm_storeu_si128(reinterpret_cast<__m128i *>(utf16_output), ascii_first);
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf16_output + 8), ascii_second);
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf16_output + 8),
+                     ascii_second);
     utf16_output += 12; // We wrote 12 16-bit characters.
-    return 12; // We consumed 12 bytes.
+    return 12;          // We consumed 12 bytes.
   }
-  if(((utf8_end_of_code_point_mask & 0xFFFF) == 0xaaaa)) {
-    // We want to take 8 2-byte UTF-8 code units and turn them into 8 2-byte UTF-16 code units.
-    // There is probably a more efficient sequence, but the following might do.
-    const __m128i sh = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+  if (((utf8_end_of_code_point_mask & 0xFFFF) == 0xaaaa)) {
+    // We want to take 8 2-byte UTF-8 code units and turn them into 8 2-byte
+    // UTF-16 code units. There is probably a more efficient sequence, but the
+    // following might do.
+    const __m128i sh =
+        _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
     const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
     __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    if (big_endian) composed = _mm_shuffle_epi8(composed, swap);
+    if (big_endian)
+      composed = _mm_shuffle_epi8(composed, swap);
     _mm_storeu_si128((__m128i *)utf16_output, composed);
     utf16_output += 8; // We wrote 16 bytes, 8 code points.
     return 16;
   }
-  if(input_utf8_end_of_code_point_mask == 0x924) {
-    // We want to take 4 3-byte UTF-8 code units and turn them into 4 2-byte UTF-16 code units.
-    // There is probably a more efficient sequence, but the following might do.
-    const __m128i sh = _mm_setr_epi8(2, 1, 0, -1, 5, 4, 3, -1, 8, 7, 6, -1, 11, 10, 9, -1);
+  if (input_utf8_end_of_code_point_mask == 0x924) {
+    // We want to take 4 3-byte UTF-8 code units and turn them into 4 2-byte
+    // UTF-16 code units. There is probably a more efficient sequence, but the
+    // following might do.
+    const __m128i sh =
+        _mm_setr_epi8(2, 1, 0, -1, 5, 4, 3, -1, 8, 7, 6, -1, 11, 10, 9, -1);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii =
         _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
@@ -33514,7 +35394,8 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     const __m128i composed =
         _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted), highbyte_shifted);
     __m128i composed_repacked = _mm_packus_epi32(composed, composed);
-    if (big_endian) composed_repacked = _mm_shuffle_epi8(composed_repacked, swap);
+    if (big_endian)
+      composed_repacked = _mm_shuffle_epi8(composed_repacked, swap);
     _mm_storeu_si128((__m128i *)utf16_output, composed_repacked);
     utf16_output += 4;
     return 12;
@@ -33528,16 +35409,18 @@ size_t convert_masked_utf8_to_utf16(const char *input,
   if (idx < 64) {
     // SIX (6) input code-code units
     // this is a relatively easy scenario
-    // we process SIX (6) input code-code units. The max length in bytes of six code
-    // code units spanning between 1 and 2 bytes each is 12 bytes. On processors
-    // where pdep/pext is fast, we might be able to use a small lookup table.
+    // we process SIX (6) input code-code units. The max length in bytes of six
+    // code units spanning between 1 and 2 bytes each is 12 bytes. On
+    // processors where pdep/pext is fast, we might be able to use a small
+    // lookup table.
     const __m128i sh =
         _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
     const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
     __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    if (big_endian) composed = _mm_shuffle_epi8(composed, swap);
+    if (big_endian)
+      composed = _mm_shuffle_epi8(composed, swap);
     _mm_storeu_si128((__m128i *)utf16_output, composed);
     utf16_output += 6; // We wrote 12 bytes, 6 code points.
   } else if (idx < 145) {
@@ -33555,19 +35438,21 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 4);
     const __m128i composed =
         _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted), highbyte_shifted);
-     __m128i composed_repacked = _mm_packus_epi32(composed, composed);
-    if (big_endian) composed_repacked = _mm_shuffle_epi8(composed_repacked, swap);
+    __m128i composed_repacked = _mm_packus_epi32(composed, composed);
+    if (big_endian)
+      composed_repacked = _mm_shuffle_epi8(composed_repacked, swap);
     _mm_storeu_si128((__m128i *)utf16_output, composed_repacked);
     utf16_output += 4;
   } else if (idx < 209) {
     // TWO (2) input code-code units
     //////////////
-    // There might be garbage inputs where a leading byte mascarades as a four-byte
-    // leading byte (by being followed by 3 continuation byte), but is not greater than
-    // 0xf0. This could trigger a buffer overflow if we only counted leading
-    // bytes of the form 0xf0 as generating surrogate pairs, without further UTF-8 validation.
-    // Thus we must be careful to ensure that only leading bytes at least as large as 0xf0 generate surrogate pairs.
-    // We do as at the cost of an extra mask.
+    // There might be garbage inputs where a leading byte masquerades as a
+    // four-byte leading byte (by being followed by 3 continuation bytes), but
+    // is not greater than 0xf0. This could trigger a buffer overflow if we only
+    // counted leading bytes of the form 0xf0 as generating surrogate pairs,
+    // without further UTF-8 validation. Thus we must be careful to ensure that
+    // only leading bytes at least as large as 0xf0 generate surrogate pairs. We
+    // do so at the cost of an extra mask.
     /////////////
     const __m128i sh =
         _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
@@ -33581,8 +35466,8 @@ size_t convert_masked_utf8_to_utf16(const char *input,
         _mm_srli_epi32(_mm_and_si128(perm, _mm_set1_epi32(0x400000)), 1);
     middlehighbyte = _mm_xor_si128(correct, middlehighbyte);
     const __m128i middlehighbyte_shifted = _mm_srli_epi32(middlehighbyte, 4);
-    // We deliberately carry the leading four bits in highbyte if they are present,
-    // we remove them later when computing hightenbits.
+    // We deliberately carry the leading four bits in highbyte if they are
+    // present; we remove them later when computing hightenbits.
     const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi32(0xff000000));
     const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 6);
     // When we need to generate a surrogate pair (leading byte > 0xF0), then
@@ -33597,30 +35482,32 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     const __m128i lowtenbits =
         _mm_and_si128(composedminus, _mm_set1_epi32(0x3ff));
     // Notice the 0x3ff mask:
-    const __m128i hightenbits = _mm_and_si128(_mm_srli_epi32(composedminus, 10), _mm_set1_epi32(0x3ff));
+    const __m128i hightenbits =
+        _mm_and_si128(_mm_srli_epi32(composedminus, 10), _mm_set1_epi32(0x3ff));
     const __m128i lowtenbitsadd =
         _mm_add_epi32(lowtenbits, _mm_set1_epi32(0xDC00));
     const __m128i hightenbitsadd =
         _mm_add_epi32(hightenbits, _mm_set1_epi32(0xD800));
     const __m128i lowtenbitsaddshifted = _mm_slli_epi32(lowtenbitsadd, 16);
-    __m128i surrogates =
-        _mm_or_si128(hightenbitsadd, lowtenbitsaddshifted);
+    __m128i surrogates = _mm_or_si128(hightenbitsadd, lowtenbitsaddshifted);
     uint32_t basic_buffer[4];
     uint32_t basic_buffer_swap[4];
     if (big_endian) {
-      _mm_storeu_si128((__m128i *)basic_buffer_swap, _mm_shuffle_epi8(composed, swap));
+      _mm_storeu_si128((__m128i *)basic_buffer_swap,
+                       _mm_shuffle_epi8(composed, swap));
       surrogates = _mm_shuffle_epi8(surrogates, swap);
     }
     _mm_storeu_si128((__m128i *)basic_buffer, composed);
     uint32_t surrogate_buffer[4];
     _mm_storeu_si128((__m128i *)surrogate_buffer, surrogates);
     for (size_t i = 0; i < 3; i++) {
-      if(basic_buffer[i] > 0x3c00000) {
+      if (basic_buffer[i] > 0x3c00000) {
         utf16_output[0] = uint16_t(surrogate_buffer[i] & 0xffff);
         utf16_output[1] = uint16_t(surrogate_buffer[i] >> 16);
         utf16_output += 2;
       } else {
-        utf16_output[0] = big_endian ? uint16_t(basic_buffer_swap[i]) : uint16_t(basic_buffer[i]);
+        utf16_output[0] = big_endian ? uint16_t(basic_buffer_swap[i])
+                                     : uint16_t(basic_buffer[i]);
         utf16_output++;
       }
     }
@@ -33633,53 +35520,63 @@ size_t convert_masked_utf8_to_utf16(const char *input,
 /* begin file src/westmere/sse_convert_utf8_to_utf32.cpp */
 // depends on "tables/utf8_to_utf16_tables.h"
 
-
 // Convert up to 12 bytes from utf8 to utf32 using a mask indicating the
 // end of the code points. Only the least significant 12 bits of the mask
 // are accessed.
 // It returns how many bytes were consumed (up to 12).
 size_t convert_masked_utf8_to_utf32(const char *input,
-                           uint64_t utf8_end_of_code_point_mask,
-                           char32_t *&utf32_output) {
+                                    uint64_t utf8_end_of_code_point_mask,
+                                    char32_t *&utf32_output) {
   // we use an approach where we try to process up to 12 input bytes.
   // Why 12 input bytes and not 16? Because we are concerned with the size of
   // the lookup tables. Also 12 is nicely divisible by two and three.
   //
   //
-  // Optimization note: our main path below is load-latency dependent. Thus it is maybe
-  // beneficial to have fast paths that depend on branch prediction but have less latency.
-  // This results in more instructions but, potentially, also higher speeds.
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // may be beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
   //
   // We first try a few fast paths.
   const __m128i in = _mm_loadu_si128((__m128i *)input);
   const uint16_t input_utf8_end_of_code_point_mask =
       utf8_end_of_code_point_mask & 0xfff;
-  if(utf8_end_of_code_point_mask == 0xfff) {
+  if (utf8_end_of_code_point_mask == 0xfff) {
     // We process the data in chunks of 12 bytes.
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output), _mm_cvtepu8_epi32(in));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output+4), _mm_cvtepu8_epi32(_mm_srli_si128(in,4)));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output+8), _mm_cvtepu8_epi32(_mm_srli_si128(in,8)));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output+12), _mm_cvtepu8_epi32(_mm_srli_si128(in,12)));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
+                     _mm_cvtepu8_epi32(in));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
+                     _mm_cvtepu8_epi32(_mm_srli_si128(in, 4)));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 8),
+                     _mm_cvtepu8_epi32(_mm_srli_si128(in, 8)));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 12),
+                     _mm_cvtepu8_epi32(_mm_srli_si128(in, 12)));
     utf32_output += 12; // We wrote 12 32-bit characters.
-    return 12; // We consumed 12 bytes.
+    return 12;          // We consumed 12 bytes.
   }
-  if(((utf8_end_of_code_point_mask & 0xffff) == 0xaaaa)) {
-    // We want to take 8 2-byte UTF-8 code units and turn them into 8 4-byte UTF-32 code units.
-    // There is probably a more efficient sequence, but the following might do.
-    const __m128i sh = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+  if (((utf8_end_of_code_point_mask & 0xffff) == 0xaaaa)) {
+    // We want to take 8 2-byte UTF-8 code units and turn them into 8 4-byte
+    // UTF-32 code units. There is probably a more efficient sequence, but the
+    // following might do.
+    const __m128i sh =
+        _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
     const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
     const __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output), _mm_cvtepu16_epi32(composed));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output+4), _mm_cvtepu16_epi32(_mm_srli_si128(composed,8)));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
+                     _mm_cvtepu16_epi32(composed));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
+                     _mm_cvtepu16_epi32(_mm_srli_si128(composed, 8)));
     utf32_output += 8; // We wrote 32 bytes, 8 code points.
     return 16;
   }
-  if(input_utf8_end_of_code_point_mask == 0x924) {
-    // We want to take 4 3-byte UTF-8 code units and turn them into 4 4-byte UTF-32 code units.
-    // There is probably a more efficient sequence, but the following might do.
-    const __m128i sh = _mm_setr_epi8(2, 1, 0, -1, 5, 4, 3, -1, 8, 7, 6, -1, 11, 10, 9, -1);
+  if (input_utf8_end_of_code_point_mask == 0x924) {
+    // We want to take 4 3-byte UTF-8 code units and turn them into 4 4-byte
+    // UTF-32 code units. There is probably a more efficient sequence, but the
+    // following might do.
+    const __m128i sh =
+        _mm_setr_epi8(2, 1, 0, -1, 5, 4, 3, -1, 8, 7, 6, -1, 11, 10, 9, -1);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii =
         _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
@@ -33704,17 +35601,20 @@ size_t convert_masked_utf8_to_utf32(const char *input,
   if (idx < 64) {
     // SIX (6) input code-code units
     // this is a relatively easy scenario
-    // we process SIX (6) input code-code units. The max length in bytes of six code
-    // code units spanning between 1 and 2 bytes each is 12 bytes. On processors
-    // where pdep/pext is fast, we might be able to use a small lookup table.
+    // we process SIX (6) input code-code units. The max length in bytes of six
+    // code units spanning between 1 and 2 bytes each is 12 bytes. On
+    // processors where pdep/pext is fast, we might be able to use a small
+    // lookup table.
     const __m128i sh =
         _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
     const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
     const __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output), _mm_cvtepu16_epi32(composed));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output+4), _mm_cvtepu16_epi32(_mm_srli_si128(composed,8)));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
+                     _mm_cvtepu16_epi32(composed));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
+                     _mm_cvtepu16_epi32(_mm_srli_si128(composed, 8)));
     utf32_output += 6; // We wrote 12 bytes, 6 code points.
   } else if (idx < 145) {
     // FOUR (4) input code-code units
@@ -33763,31 +35663,32 @@ size_t convert_masked_utf8_to_utf32(const char *input,
 /* begin file src/westmere/sse_convert_utf8_to_latin1.cpp */
 // depends on "tables/utf8_to_utf16_tables.h"
 
-
 // Convert up to 12 bytes from utf8 to latin1 using a mask indicating the
 // end of the code points. Only the least significant 12 bits of the mask
 // are accessed.
 // It returns how many bytes were consumed (up to 12).
 size_t convert_masked_utf8_to_latin1(const char *input,
-                           uint64_t utf8_end_of_code_point_mask,
-                           char *&latin1_output) {
+                                     uint64_t utf8_end_of_code_point_mask,
+                                     char *&latin1_output) {
   // we use an approach where we try to process up to 12 input bytes.
   // Why 12 input bytes and not 16? Because we are concerned with the size of
   // the lookup tables. Also 12 is nicely divisible by two and three.
   //
   //
-  // Optimization note: our main path below is load-latency dependent. Thus it is maybe
-  // beneficial to have fast paths that depend on branch prediction but have less latency.
-  // This results in more instructions but, potentially, also higher speeds.
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // may be beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
   //
   const __m128i in = _mm_loadu_si128((__m128i *)input);
   const uint16_t input_utf8_end_of_code_point_mask =
-      utf8_end_of_code_point_mask & 0xfff; // we are only processing 12 bytes in case it is not all ASCII
-  if(utf8_end_of_code_point_mask == 0xfff) {
+      utf8_end_of_code_point_mask &
+      0xfff; // we are only processing 12 bytes in case it is not all ASCII
+  if (utf8_end_of_code_point_mask == 0xfff) {
     // We process the data in chunks of 12 bytes.
     _mm_storeu_si128(reinterpret_cast<__m128i *>(latin1_output), in);
     latin1_output += 12; // We wrote 12 characters.
-    return 12; // We consumed 12 bytes.
+    return 12;           // We consumed 12 bytes.
   }
   /// We do not have a fast path available, so we fallback.
   const uint8_t idx =
@@ -33795,22 +35696,25 @@ size_t convert_masked_utf8_to_latin1(const char *input,
   const uint8_t consumed =
       tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
   // this indicates an invalid input:
-  if(idx >= 64) { return consumed; }
-  // Here we should have (idx < 64), if not, there is a bug in the validation or elsewhere.
-  // SIX (6) input code-code units
-  // this is a relatively easy scenario
-  // we process SIX (6) input code-code units. The max length in bytes of six code
-  // code units spanning between 1 and 2 bytes each is 12 bytes. On processors
-  // where pdep/pext is fast, we might be able to use a small lookup table.
+  if (idx >= 64) {
+    return consumed;
+  }
+  // Here we should have (idx < 64); if not, there is a bug in the validation or
+  // elsewhere. SIX (6) input code-code units: this is a relatively easy
+  // scenario. We process SIX (6) input code-code units. The max length in bytes
+  // of six code units spanning between 1 and 2 bytes each is 12 bytes. On
+  // processors where pdep/pext is fast, we might be able to use a small lookup
+  // table.
   const __m128i sh =
-        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
+      _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
   const __m128i perm = _mm_shuffle_epi8(in, sh);
   const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
   const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
   __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-  const __m128i latin1_packed = _mm_packus_epi16(composed,composed);
+  const __m128i latin1_packed = _mm_packus_epi16(composed, composed);
   // writing 8 bytes even though we only care about the first 6 bytes.
-  // performance note: it would be faster to use _mm_storeu_si128, we should investigate.
+  // performance note: it would be faster to use _mm_storeu_si128, we should
+  // investigate.
   _mm_storel_epi64((__m128i *)latin1_output, latin1_packed);
   latin1_output += 6; // We wrote 6 bytes.
   return consumed;
@@ -33819,14 +35723,17 @@ size_t convert_masked_utf8_to_latin1(const char *input,
 
 /* begin file src/westmere/sse_convert_utf16_to_latin1.cpp */
 template <endianness big_endian>
-std::pair<const char16_t*, char*> sse_convert_utf16_to_latin1(const char16_t* buf, size_t len, char* latin1_output) {
-  const char16_t* end = buf + len;
+std::pair<const char16_t *, char *>
+sse_convert_utf16_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_output) {
+  const char16_t *end = buf + len;
   while (end - buf >= 8) {
     // Load 8 UTF-16 characters into 128-bit SSE register
-    __m128i in = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf));
+    __m128i in = _mm_loadu_si128(reinterpret_cast<const __m128i *>(buf));
 
     if (!match_system(big_endian)) {
-      const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
       in = _mm_shuffle_epi8(in, swap);
     }
 
@@ -33834,49 +35741,58 @@ std::pair<const char16_t*, char*> sse_convert_utf16_to_latin1(const char16_t* bu
     if (_mm_testz_si128(in, high_byte_mask)) {
       // Pack 16-bit characters into 8-bit and store in latin1_output
       __m128i latin1_packed = _mm_packus_epi16(in, in);
-      _mm_storel_epi64(reinterpret_cast<__m128i*>(latin1_output), latin1_packed);
+      _mm_storel_epi64(reinterpret_cast<__m128i *>(latin1_output),
+                       latin1_packed);
       // Adjust pointers for next iteration
       buf += 8;
       latin1_output += 8;
     } else {
-      return std::make_pair(nullptr, reinterpret_cast<char*>(latin1_output));
+      return std::make_pair(nullptr, reinterpret_cast<char *>(latin1_output));
     }
   } // while
   return std::make_pair(buf, latin1_output);
 }
 
 template <endianness big_endian>
-std::pair<result, char*> sse_convert_utf16_to_latin1_with_errors(const char16_t* buf, size_t len, char* latin1_output) {
-  const char16_t* start = buf;
-  const char16_t* end = buf + len;
+std::pair<result, char *>
+sse_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
+                                        char *latin1_output) {
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
   while (end - buf >= 8) {
-    __m128i in = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf));
+    __m128i in = _mm_loadu_si128(reinterpret_cast<const __m128i *>(buf));
 
     if (!match_system(big_endian)) {
-      const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
       in = _mm_shuffle_epi8(in, swap);
     }
 
     __m128i high_byte_mask = _mm_set1_epi16((int16_t)0xFF00);
     if (_mm_testz_si128(in, high_byte_mask)) {
       __m128i latin1_packed = _mm_packus_epi16(in, in);
-      _mm_storel_epi64(reinterpret_cast<__m128i*>(latin1_output), latin1_packed);
+      _mm_storel_epi64(reinterpret_cast<__m128i *>(latin1_output),
+                       latin1_packed);
       buf += 8;
       latin1_output += 8;
     } else {
       // Fallback to scalar code for handling errors
-      for(int k = 0; k < 8; k++) {
-        uint16_t word = !match_system(big_endian) ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if(word <= 0xff) {
+      for (int k = 0; k < 8; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if (word <= 0xff) {
           *latin1_output++ = char(word);
         } else {
-          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k), latin1_output);
+          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
+                                latin1_output);
         }
       }
       buf += 8;
     }
   } // while
-  return std::make_pair(result(error_code::SUCCESS, buf - start), latin1_output);
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        latin1_output);
 }
 /* end file src/westmere/sse_convert_utf16_to_latin1.cpp */
 /* begin file src/westmere/sse_convert_utf16_to_utf8.cpp */
@@ -33934,62 +35850,72 @@ std::pair<result, char*> sse_convert_utf16_to_latin1_with_errors(const char16_t*
   A scalar routing should carry on the conversion of the tail.
 */
 template <endianness big_endian>
-std::pair<const char16_t*, char*> sse_convert_utf16_to_utf8(const char16_t* buf, size_t len, char* utf8_output) {
+std::pair<const char16_t *, char *>
+sse_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_output) {
 
-  const char16_t* end = buf + len;
+  const char16_t *end = buf + len;
 
   const __m128i v_0000 = _mm_setzero_si128();
   const __m128i v_f800 = _mm_set1_epi16((int16_t)0xf800);
   const __m128i v_d800 = _mm_set1_epi16((int16_t)0xd800);
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  // Safety margin to avoid overruns; see
+  // https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin = 12;
 
   while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m128i in = _mm_loadu_si128((__m128i*)buf);
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
     if (big_endian) {
-      const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
       in = _mm_shuffle_epi8(in, swap);
     }
     // a single 16-bit UTF-16 word can yield 1, 2 or 3 UTF-8 bytes
     const __m128i v_ff80 = _mm_set1_epi16((int16_t)0xff80);
-    if(_mm_testz_si128(in, v_ff80)) { // ASCII fast path!!!!
-        __m128i nextin = _mm_loadu_si128((__m128i*)buf+1);
-        if (big_endian) {
-          const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-          nextin = _mm_shuffle_epi8(nextin, swap);
-        }
-        if(!_mm_testz_si128(nextin, v_ff80)) {
-          // 1. pack the bytes
-          // obviously suboptimal.
-          const __m128i utf8_packed = _mm_packus_epi16(in,in);
-          // 2. store (16 bytes)
-          _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
-          // 3. adjust pointers
-          buf += 8;
-          utf8_output += 8;
-          in = nextin;
-        } else {
-          // 1. pack the bytes
-          // obviously suboptimal.
-          const __m128i utf8_packed = _mm_packus_epi16(in,nextin);
-          // 2. store (16 bytes)
-          _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
-          // 3. adjust pointers
-          buf += 16;
-          utf8_output += 16;
-          continue; // we are done for this round!
-        }
+    if (_mm_testz_si128(in, v_ff80)) { // ASCII fast path!!!!
+      __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
+      if (big_endian) {
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        nextin = _mm_shuffle_epi8(nextin, swap);
+      }
+      if (!_mm_testz_si128(nextin, v_ff80)) {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        const __m128i utf8_packed = _mm_packus_epi16(in, in);
+        // 2. store (16 bytes)
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        in = nextin;
+      } else {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        const __m128i utf8_packed = _mm_packus_epi16(in, nextin);
+        // 2. store (16 bytes)
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 16;
+        utf8_output += 16;
+        continue; // we are done for this round!
+      }
     }
 
     // no bits set above 7th bit
-    const __m128i one_byte_bytemask = _mm_cmpeq_epi16(_mm_and_si128(in, v_ff80), v_0000);
-    const uint16_t one_byte_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
+    const __m128i one_byte_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_ff80), v_0000);
+    const uint16_t one_byte_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
 
     // no bits set above 11th bit
-    const __m128i one_or_two_bytes_bytemask = _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_0000);
-    const uint16_t one_or_two_bytes_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
+    const __m128i one_or_two_bytes_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_0000);
+    const uint16_t one_or_two_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
 
     if (one_or_two_bytes_bitmask == 0xffff) {
-      internal::westmere::write_v_u16_11bits_to_utf8(in, utf8_output, one_byte_bytemask, one_byte_bitmask);
+      internal::westmere::write_v_u16_11bits_to_utf8(
+          in, utf8_output, one_byte_bytemask, one_byte_bitmask);
       buf += 8;
       continue;
     }
@@ -33997,129 +35923,143 @@ std::pair<const char16_t*, char*> sse_convert_utf16_to_utf8(const char16_t* buf,
     // 1. Check if there are any surrogate word in the input chunk.
     //    We have also deal with situation when there is a surrogate word
     //    at the end of a chunk.
-    const __m128i surrogates_bytemask = _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
+    const __m128i surrogates_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
 
     // bitmask = 0x0000 if there are no surrogates
     //         = 0xc000 if the last word is a surrogate
-    const uint16_t surrogates_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help. However,
-    // it is likely an uncommon occurrence.
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
     if (surrogates_bitmask == 0x0000) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-        const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
-                                                0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+      const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
+                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
-        /* In this branch we handle three cases:
-           1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-           2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-           3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
+      /* In this branch we handle three cases:
+         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]
+            - single UTF-8 byte
+         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]
+            - two UTF-8 bytes
+         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc]
+            - three UTF-8 bytes
 
-          We expand the input word (16-bit) into two code units (32-bit), thus
-          we have room for four bytes. However, we need five distinct bit
-          layouts. Note that the last byte in cases #2 and #3 is the same.
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
 
-          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-          in register t2.
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
 
-          We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-          either byte 1 for case #2 or byte 2 for case #3. Note that they
-          differ by exactly one bit.
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
 
-          Finally from these two code units we build proper UTF-8 sequence, taking
-          into account the case (i.e, the number of bytes to write).
-        */
-        /**
-         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-         * t2 => [0ccc|cccc] [10cc|cccc]
-         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-         */
+        Finally, from these two code units we build a proper UTF-8 sequence,
+        taking into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
 #define simdutf_vec(x) _mm_set1_epi16(static_cast<uint16_t>(x))
-        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-        const __m128i t0 = _mm_shuffle_epi8(in, dup_even);
-        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-        const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
-        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-        const __m128i t2 = _mm_or_si128 (t1, simdutf_vec(0b1000000000000000));
-
-        // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-        const __m128i s0 = _mm_srli_epi16(in, 4);
-        // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-        const __m128i s1 = _mm_and_si128(s0, simdutf_vec(0b0000111111111100));
-        // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-        const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
-        // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-        const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
-        const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask, simdutf_vec(0b0100000000000000));
-        const __m128i s4 = _mm_xor_si128(s3, m0);
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const __m128i t0 = _mm_shuffle_epi8(in, dup_even);
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const __m128i t2 = _mm_or_si128(t1, simdutf_vec(0b1000000000000000));
+
+      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
+      const __m128i s0 = _mm_srli_epi16(in, 4);
+      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
+      const __m128i s1 = _mm_and_si128(s0, simdutf_vec(0b0000111111111100));
+      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
+      const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
+      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
+      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask,
+                                          simdutf_vec(0b0100000000000000));
+      const __m128i s4 = _mm_xor_si128(s3, m0);
 #undef simdutf_vec
 
-        // 4. expand code units 16-bit => 32-bit
-        const __m128i out0 = _mm_unpacklo_epi16(t2, s4);
-        const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
+      // 4. expand code units 16-bit => 32-bit
+      const __m128i out0 = _mm_unpacklo_epi16(t2, s4);
+      const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
 
-        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-        const uint16_t mask = (one_byte_bitmask & 0x5555) |
-                              (one_or_two_bytes_bitmask & 0xaaaa);
-        if(mask == 0) {
-          // We only have three-byte code units. Use fast path.
-          const __m128i shuffle = _mm_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
-          const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
-          const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
-          _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
-          utf8_output += 12;
-          _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
-          utf8_output += 12;
-          buf += 8;
-          continue;
-        }
-        const uint8_t mask0 = uint8_t(mask);
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint16_t mask =
+          (one_byte_bitmask & 0x5555) | (one_or_two_bytes_bitmask & 0xaaaa);
+      if (mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const __m128i shuffle = _mm_setr_epi8(2, 3, 1, 6, 7, 5, 10, 11, 9, 14,
+                                              15, 13, -1, -1, -1, -1);
+        const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
+        const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+        utf8_output += 12;
+        buf += 8;
+        continue;
+      }
+      const uint8_t mask0 = uint8_t(mask);
 
-        const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-        const __m128i shuffle0 = _mm_loadu_si128((__m128i*)(row0 + 1));
-        const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
 
-        const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
 
-        const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-        const __m128i shuffle1 = _mm_loadu_si128((__m128i*)(row1 + 1));
-        const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
 
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
-        utf8_output += row0[0];
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
-        utf8_output += row1[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
+      utf8_output += row0[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+      utf8_output += row1[0];
 
-        buf += 8;
-    // surrogate pair(s) in a register
+      buf += 8;
+      // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
       // It may seem wasteful to use scalar code, but being efficient with SIMD
       // in the presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if((word & 0xFF80)==0) {
+        if ((word & 0xFF80) == 0) {
           *utf8_output++ = char(word);
-        } else if((word & 0xF800)==0) {
-          *utf8_output++ = char((word>>6) | 0b11000000);
+        } else if ((word & 0xF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word &0xF800 ) != 0xD800) {
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else if ((word & 0xF800) != 0xD800) {
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = big_endian ? scalar::utf16::swap_bytes(buf[k+1]) : buf[k+1];
+          uint16_t next_word =
+              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if((diff | diff2) > 0x3FF)  { return std::make_pair(nullptr, utf8_output); }
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(nullptr, utf8_output);
+          }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
-          *utf8_output++ = char((value>>18) | 0b11110000);
-          *utf8_output++ = char(((value>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((value>>6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((value >> 18) | 0b11110000);
+          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((value & 0b111111) | 0b10000000);
         }
       }
@@ -34130,70 +36070,81 @@ std::pair<const char16_t*, char*> sse_convert_utf16_to_utf8(const char16_t* buf,
   return std::make_pair(buf, utf8_output);
 }
 
-
 /*
   Returns a pair: a result struct and utf8_output.
-  If there is an error, the count field of the result is the position of the error.
-  Otherwise, it is the position of the first unprocessed byte in buf (even if finished).
-  A scalar routing should carry on the conversion of the tail if needed.
+  If there is an error, the count field of the result is the position of the
+  error. Otherwise, it is the position of the first unprocessed byte in buf
+  (even if finished). A scalar routine should carry on the conversion of the
+  tail if needed.
 */
 template <endianness big_endian>
-std::pair<result, char*> sse_convert_utf16_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) {
-  const char16_t* start = buf;
-  const char16_t* end = buf + len;
+std::pair<result, char *>
+sse_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
+                                      char *utf8_output) {
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
 
   const __m128i v_0000 = _mm_setzero_si128();
   const __m128i v_f800 = _mm_set1_epi16((int16_t)0xf800);
   const __m128i v_d800 = _mm_set1_epi16((int16_t)0xd800);
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  // Safety margin to avoid overruns; see
+  // https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin = 12;
 
   while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m128i in = _mm_loadu_si128((__m128i*)buf);
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
     if (big_endian) {
-      const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
       in = _mm_shuffle_epi8(in, swap);
     }
     // a single 16-bit UTF-16 word can yield 1, 2 or 3 UTF-8 bytes
     const __m128i v_ff80 = _mm_set1_epi16((int16_t)0xff80);
-    if(_mm_testz_si128(in, v_ff80)) { // ASCII fast path!!!!
-        __m128i nextin = _mm_loadu_si128((__m128i*)buf+1);
-        if (big_endian) {
-          const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-          nextin = _mm_shuffle_epi8(nextin, swap);
-        }
-        if(!_mm_testz_si128(nextin, v_ff80)) {
-          // 1. pack the bytes
-          // obviously suboptimal.
-          const __m128i utf8_packed = _mm_packus_epi16(in,in);
-          // 2. store (16 bytes)
-          _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
-          // 3. adjust pointers
-          buf += 8;
-          utf8_output += 8;
-          in = nextin;
-        } else {
-          // 1. pack the bytes
-          // obviously suboptimal.
-          const __m128i utf8_packed = _mm_packus_epi16(in,nextin);
-          // 2. store (16 bytes)
-          _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
-          // 3. adjust pointers
-          buf += 16;
-          utf8_output += 16;
-          continue; // we are done for this round!
-        }
+    if (_mm_testz_si128(in, v_ff80)) { // ASCII fast path!!!!
+      __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
+      if (big_endian) {
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        nextin = _mm_shuffle_epi8(nextin, swap);
+      }
+      if (!_mm_testz_si128(nextin, v_ff80)) {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        const __m128i utf8_packed = _mm_packus_epi16(in, in);
+        // 2. store (16 bytes)
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        in = nextin;
+      } else {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        const __m128i utf8_packed = _mm_packus_epi16(in, nextin);
+        // 2. store (16 bytes)
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 16;
+        utf8_output += 16;
+        continue; // we are done for this round!
+      }
     }
 
     // no bits set above 7th bit
-    const __m128i one_byte_bytemask = _mm_cmpeq_epi16(_mm_and_si128(in, v_ff80), v_0000);
-    const uint16_t one_byte_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
+    const __m128i one_byte_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_ff80), v_0000);
+    const uint16_t one_byte_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
 
     // no bits set above 11th bit
-    const __m128i one_or_two_bytes_bytemask = _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_0000);
-    const uint16_t one_or_two_bytes_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
+    const __m128i one_or_two_bytes_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_0000);
+    const uint16_t one_or_two_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
 
     if (one_or_two_bytes_bitmask == 0xffff) {
-      internal::westmere::write_v_u16_11bits_to_utf8(in, utf8_output, one_byte_bytemask, one_byte_bitmask);
+      internal::westmere::write_v_u16_11bits_to_utf8(
+          in, utf8_output, one_byte_bytemask, one_byte_bitmask);
       buf += 8;
       continue;
     }
@@ -34201,129 +36152,145 @@ std::pair<result, char*> sse_convert_utf16_to_utf8_with_errors(const char16_t* b
     // 1. Check if there are any surrogate word in the input chunk.
     //    We have also deal with situation when there is a surrogate word
     //    at the end of a chunk.
-    const __m128i surrogates_bytemask = _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
+    const __m128i surrogates_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
 
     // bitmask = 0x0000 if there are no surrogates
     //         = 0xc000 if the last word is a surrogate
-    const uint16_t surrogates_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help. However,
-    // it is likely an uncommon occurrence.
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
     if (surrogates_bitmask == 0x0000) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-        const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
-                                                0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+      const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
+                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
-        /* In this branch we handle three cases:
-           1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-           2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-           3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
+      /* In this branch we handle three cases:
+         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]
+            - single UTF-8 byte
+         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]
+            - two UTF-8 bytes
+         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc]
+            - three UTF-8 bytes
 
-          We expand the input word (16-bit) into two code units (32-bit), thus
-          we have room for four bytes. However, we need five distinct bit
-          layouts. Note that the last byte in cases #2 and #3 is the same.
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
 
-          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-          in register t2.
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
 
-          We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-          either byte 1 for case #2 or byte 2 for case #3. Note that they
-          differ by exactly one bit.
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
 
-          Finally from these two code units we build proper UTF-8 sequence, taking
-          into account the case (i.e, the number of bytes to write).
-        */
-        /**
-         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-         * t2 => [0ccc|cccc] [10cc|cccc]
-         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-         */
+        Finally, from these two code units we build a proper UTF-8 sequence,
+        taking into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
 #define simdutf_vec(x) _mm_set1_epi16(static_cast<uint16_t>(x))
-        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-        const __m128i t0 = _mm_shuffle_epi8(in, dup_even);
-        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-        const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
-        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-        const __m128i t2 = _mm_or_si128 (t1, simdutf_vec(0b1000000000000000));
-
-        // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-        const __m128i s0 = _mm_srli_epi16(in, 4);
-        // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-        const __m128i s1 = _mm_and_si128(s0, simdutf_vec(0b0000111111111100));
-        // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-        const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
-        // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-        const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
-        const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask, simdutf_vec(0b0100000000000000));
-        const __m128i s4 = _mm_xor_si128(s3, m0);
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const __m128i t0 = _mm_shuffle_epi8(in, dup_even);
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const __m128i t2 = _mm_or_si128(t1, simdutf_vec(0b1000000000000000));
+
+      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
+      const __m128i s0 = _mm_srli_epi16(in, 4);
+      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
+      const __m128i s1 = _mm_and_si128(s0, simdutf_vec(0b0000111111111100));
+      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
+      const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
+      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
+      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask,
+                                          simdutf_vec(0b0100000000000000));
+      const __m128i s4 = _mm_xor_si128(s3, m0);
 #undef simdutf_vec
 
-        // 4. expand code units 16-bit => 32-bit
-        const __m128i out0 = _mm_unpacklo_epi16(t2, s4);
-        const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
+      // 4. expand code units 16-bit => 32-bit
+      const __m128i out0 = _mm_unpacklo_epi16(t2, s4);
+      const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
 
-        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-        const uint16_t mask = (one_byte_bitmask & 0x5555) |
-                              (one_or_two_bytes_bitmask & 0xaaaa);
-        if(mask == 0) {
-          // We only have three-byte code units. Use fast path.
-          const __m128i shuffle = _mm_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
-          const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
-          const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
-          _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
-          utf8_output += 12;
-          _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
-          utf8_output += 12;
-          buf += 8;
-          continue;
-        }
-        const uint8_t mask0 = uint8_t(mask);
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint16_t mask =
+          (one_byte_bitmask & 0x5555) | (one_or_two_bytes_bitmask & 0xaaaa);
+      if (mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const __m128i shuffle = _mm_setr_epi8(2, 3, 1, 6, 7, 5, 10, 11, 9, 14,
+                                              15, 13, -1, -1, -1, -1);
+        const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
+        const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+        utf8_output += 12;
+        buf += 8;
+        continue;
+      }
+      const uint8_t mask0 = uint8_t(mask);
 
-        const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-        const __m128i shuffle0 = _mm_loadu_si128((__m128i*)(row0 + 1));
-        const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
 
-        const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
 
-        const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-        const __m128i shuffle1 = _mm_loadu_si128((__m128i*)(row1 + 1));
-        const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
 
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
-        utf8_output += row0[0];
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
-        utf8_output += row1[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
+      utf8_output += row0[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+      utf8_output += row1[0];
 
-        buf += 8;
-    // surrogate pair(s) in a register
+      buf += 8;
+      // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
       // It may seem wasteful to use scalar code, but being efficient with SIMD
       // in the presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if((word & 0xFF80)==0) {
+        if ((word & 0xFF80) == 0) {
           *utf8_output++ = char(word);
-        } else if((word & 0xF800)==0) {
-          *utf8_output++ = char((word>>6) | 0b11000000);
+        } else if ((word & 0xF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word &0xF800 ) != 0xD800) {
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else if ((word & 0xF800) != 0xD800) {
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = big_endian ? scalar::utf16::swap_bytes(buf[k+1]) : buf[k+1];
+          uint16_t next_word =
+              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if((diff | diff2) > 0x3FF)  { return std::make_pair(result(error_code::SURROGATE, buf - start + k - 1), utf8_output); }
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k - 1),
+                utf8_output);
+          }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
-          *utf8_output++ = char((value>>18) | 0b11110000);
-          *utf8_output++ = char(((value>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((value>>6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((value >> 18) | 0b11110000);
+          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((value & 0b111111) | 0b10000000);
         }
       }
@@ -34389,55 +36356,67 @@ std::pair<result, char*> sse_convert_utf16_to_utf8_with_errors(const char16_t* b
   A scalar routing should carry on the conversion of the tail.
 */
 template <endianness big_endian>
-std::pair<const char16_t*, char32_t*> sse_convert_utf16_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) {
-  const char16_t* end = buf + len;
+std::pair<const char16_t *, char32_t *>
+sse_convert_utf16_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_output) {
+  const char16_t *end = buf + len;
 
   const __m128i v_f800 = _mm_set1_epi16((int16_t)0xf800);
   const __m128i v_d800 = _mm_set1_epi16((int16_t)0xd800);
 
   while (end - buf >= 8) {
-    __m128i in = _mm_loadu_si128((__m128i*)buf);
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
 
     if (big_endian) {
-      const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
       in = _mm_shuffle_epi8(in, swap);
     }
 
     // 1. Check if there are any surrogate word in the input chunk.
     //    We have also deal with situation when there is a surrogate word
     //    at the end of a chunk.
-    const __m128i surrogates_bytemask = _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
+    const __m128i surrogates_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
 
     // bitmask = 0x0000 if there are no surrogates
     //         = 0xc000 if the last word is a surrogate
-    const uint16_t surrogates_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help. However,
-    // it is likely an uncommon occurrence.
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
     if (surrogates_bitmask == 0x0000) {
       // case: no surrogate pair, extend 16-bit code units to 32-bit code units
-        _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output), _mm_cvtepu16_epi32(in));
-        _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output+4), _mm_cvtepu16_epi32(_mm_srli_si128(in,8)));
-        utf32_output += 8;
-        buf += 8;
-    // surrogate pair(s) in a register
+      _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
+                       _mm_cvtepu16_epi32(in));
+      _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
+                       _mm_cvtepu16_epi32(_mm_srli_si128(in, 8)));
+      utf32_output += 8;
+      buf += 8;
+      // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
       // It may seem wasteful to use scalar code, but being efficient with SIMD
       // in the presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if((word &0xF800 ) != 0xD800) {
+        if ((word & 0xF800) != 0xD800) {
           *utf32_output++ = char32_t(word);
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = big_endian ? scalar::utf16::swap_bytes(buf[k+1]) : buf[k+1];
+          uint16_t next_word =
+              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if((diff | diff2) > 0x3FF)  { return std::make_pair(nullptr, utf32_output); }
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(nullptr, utf32_output);
+          }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
           *utf32_output++ = char32_t(value);
         }
@@ -34448,64 +36427,78 @@ std::pair<const char16_t*, char32_t*> sse_convert_utf16_to_utf32(const char16_t*
   return std::make_pair(buf, utf32_output);
 }
 
-
 /*
   Returns a pair: a result struct and utf8_output.
-  If there is an error, the count field of the result is the position of the error.
-  Otherwise, it is the position of the first unprocessed byte in buf (even if finished).
-  A scalar routing should carry on the conversion of the tail if needed.
+  If there is an error, the count field of the result is the position of the
+  error. Otherwise, it is the position of the first unprocessed byte in buf
+  (even if finished). A scalar routine should carry on the conversion of the
+  tail if needed.
 */
 template <endianness big_endian>
-std::pair<result, char32_t*> sse_convert_utf16_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) {
-  const char16_t* start = buf;
-  const char16_t* end = buf + len;
+std::pair<result, char32_t *>
+sse_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
+                                       char32_t *utf32_output) {
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
 
   const __m128i v_f800 = _mm_set1_epi16((int16_t)0xf800);
   const __m128i v_d800 = _mm_set1_epi16((int16_t)0xd800);
 
   while (end - buf >= 8) {
-    __m128i in = _mm_loadu_si128((__m128i*)buf);
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
 
     if (big_endian) {
-      const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
       in = _mm_shuffle_epi8(in, swap);
     }
 
     // 1. Check if there are any surrogate word in the input chunk.
     //    We have also deal with situation when there is a surrogate word
     //    at the end of a chunk.
-    const __m128i surrogates_bytemask = _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
+    const __m128i surrogates_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
 
     // bitmask = 0x0000 if there are no surrogates
     //         = 0xc000 if the last word is a surrogate
-    const uint16_t surrogates_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help. However,
-    // it is likely an uncommon occurrence.
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
     if (surrogates_bitmask == 0x0000) {
       // case: no surrogate pair, extend 16-bit code units to 32-bit code units
-        _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output), _mm_cvtepu16_epi32(in));
-        _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output+4), _mm_cvtepu16_epi32(_mm_srli_si128(in,8)));
-        utf32_output += 8;
-        buf += 8;
-    // surrogate pair(s) in a register
+      _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
+                       _mm_cvtepu16_epi32(in));
+      _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
+                       _mm_cvtepu16_epi32(_mm_srli_si128(in, 8)));
+      utf32_output += 8;
+      buf += 8;
+      // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
       // It may seem wasteful to use scalar code, but being efficient with SIMD
       // in the presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
-        if((word &0xF800 ) != 0xD800) {
+        if ((word & 0xF800) != 0xD800) {
           *utf32_output++ = char32_t(word);
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = big_endian ? scalar::utf16::swap_bytes(buf[k+1]) : buf[k+1];
+          uint16_t next_word =
+              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if((diff | diff2) > 0x3FF)  { return std::make_pair(result(error_code::SURROGATE, buf - start + k - 1), utf32_output); }
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k - 1),
+                utf32_output);
+          }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
           *utf32_output++ = char32_t(value);
         }
@@ -34540,8 +36533,10 @@ sse_convert_utf32_to_latin1(const char32_t *buf, size_t len,
     if (!_mm_testz_si128(check_combined, high_bytes_mask)) {
       return std::make_pair(nullptr, latin1_output);
     }
-    __m128i pack1 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in1, shufmask), _mm_shuffle_epi8(in2, shufmask));
-    __m128i pack2 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in3, shufmask), _mm_shuffle_epi8(in4, shufmask));
+    __m128i pack1 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in1, shufmask),
+                                       _mm_shuffle_epi8(in2, shufmask));
+    __m128i pack2 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in3, shufmask),
+                                       _mm_shuffle_epi8(in4, shufmask));
     __m128i pack = _mm_unpacklo_epi64(pack1, pack2);
     _mm_storeu_si128((__m128i *)latin1_output, pack);
     latin1_output += 16;
@@ -34585,8 +36580,10 @@ sse_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
       buf += 16;
       continue;
     }
-    __m128i pack1 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in1, shufmask), _mm_shuffle_epi8(in2, shufmask));
-    __m128i pack2 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in3, shufmask), _mm_shuffle_epi8(in4, shufmask));
+    __m128i pack1 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in1, shufmask),
+                                       _mm_shuffle_epi8(in2, shufmask));
+    __m128i pack2 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in3, shufmask),
+                                       _mm_shuffle_epi8(in4, shufmask));
     __m128i pack = _mm_unpacklo_epi64(pack1, pack2);
     _mm_storeu_si128((__m128i *)latin1_output, pack);
     latin1_output += 16;
@@ -34598,58 +36595,90 @@ sse_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
 }
 /* end file src/westmere/sse_convert_utf32_to_latin1.cpp */
 /* begin file src/westmere/sse_convert_utf32_to_utf8.cpp */
-std::pair<const char32_t*, char*> sse_convert_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) {
-  const char32_t* end = buf + len;
-
-  const __m128i v_0000 = _mm_setzero_si128();//__m128 = 128 bits
-  const __m128i v_f800 = _mm_set1_epi16((uint16_t)0xf800); //1111 1000 0000 0000
-  const __m128i v_c080 = _mm_set1_epi16((uint16_t)0xc080); //1100 0000 1000 0000
-  const __m128i v_ff80 = _mm_set1_epi16((uint16_t)0xff80); //1111 1111 1000 0000
-  const __m128i v_ffff0000 = _mm_set1_epi32((uint32_t)0xffff0000); //1111 1111 1111 1111 0000 0000 0000 0000
-  const __m128i v_7fffffff = _mm_set1_epi32((uint32_t)0x7fffffff); //0111 1111 1111 1111 1111 1111 1111 1111
+std::pair<const char32_t *, char *>
+sse_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_output) {
+  const char32_t *end = buf + len;
+
+  const __m128i v_0000 = _mm_setzero_si128();              // __m128i = 128 bits
+  const __m128i v_f800 = _mm_set1_epi16((uint16_t)0xf800); // 1111 1000 0000
+                                                           // 0000
+  const __m128i v_c080 = _mm_set1_epi16((uint16_t)0xc080); // 1100 0000 1000
+                                                           // 0000
+  const __m128i v_ff80 = _mm_set1_epi16((uint16_t)0xff80); // 1111 1111 1000
+                                                           // 0000
+  const __m128i v_ffff0000 = _mm_set1_epi32(
+      (uint32_t)0xffff0000); // 1111 1111 1111 1111 0000 0000 0000 0000
+  const __m128i v_7fffffff = _mm_set1_epi32(
+      (uint32_t)0x7fffffff); // 0111 1111 1111 1111 1111 1111 1111 1111
   __m128i running_max = _mm_setzero_si128();
   __m128i forbidden_bytemask = _mm_setzero_si128();
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
-
-  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) { // buf is a char32_t pointer, each char32_t has 4 bytes or 32 bits, thus buf + 16 * char_32t = 512 bits = 64 bytes
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+
+  while (end - buf >=
+         std::ptrdiff_t(
+             16 + safety_margin)) { // buf is a char32_t pointer and each
+                                    // char32_t is 4 bytes (32 bits), so 16
+                                    // code units span 64 bytes (512 bits)
     // We load two 16 bytes registers for a total of 32 bytes or 16 characters.
-    __m128i in = _mm_loadu_si128((__m128i*)buf);
-    __m128i nextin = _mm_loadu_si128((__m128i*)buf+1);//These two values can hold only 8 UTF32 chars
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
+    __m128i nextin = _mm_loadu_si128(
+        (__m128i *)buf + 1); // These two values can hold only 8 UTF32 chars
     running_max = _mm_max_epu32(
-                                _mm_max_epu32(in, running_max), //take element-wise max char32_t from in and running_max vector
-                                 nextin); //and take element-wise max element from nextin and running_max vector
+        _mm_max_epu32(in, running_max), // take element-wise max char32_t from
+                                        // in and running_max vector
+        nextin); // and take element-wise max element from nextin and
+                 // running_max vector
 
-    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned saturation
+    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
+    // saturation
     __m128i in_16 = _mm_packus_epi32(
-                                      _mm_and_si128(in, v_7fffffff),
-                                      _mm_and_si128(nextin, v_7fffffff)
-                                      );//in this context pack the two __m128 into a single
-    //By ensuring the highest bit is set to 0(&v_7fffffff), we are making sure all values are interpreted as non-negative, or specifically, the values are within the range of valid Unicode code points.
-    //remember : having leading byte 0 means a positive number by the two complements system. Unicode is well beneath the range where you'll start getting issues so that's OK.
+        _mm_and_si128(in, v_7fffffff),
+        _mm_and_si128(
+            nextin,
+            v_7fffffff)); // in this context pack the two __m128 into a single
+    // Clearing the highest bit (& v_7fffffff) makes every lane non-negative
+    // when read as a signed 32-bit integer, which is what _mm_packus_epi32
+    // expects: it saturates signed inputs to the unsigned 16-bit range.
+    // Remember: a leading 0 bit means a non-negative number in two's
+    // complement. Valid Unicode code points are far below 0x80000000, so the
+    // mask leaves them as is; running_max above uses the unmasked values.
 
     // Try to apply UTF-16 => UTF-8 from ./sse_convert_utf16_to_utf8.cpp
 
     // Check for ASCII fast path
 
     // ASCII fast path!!!!
-      // We eagerly load another 32 bytes, hoping that they will be ASCII too.
-      // The intuition is that we try to collect 16 ASCII characters which requires
-      // a total of 64 bytes of input. If we fail, we just pass thirdin and fourthin
-      // as our new inputs.
-    if(_mm_testz_si128(in_16, v_ff80)) {  //if the first two blocks are ASCII
-      __m128i thirdin = _mm_loadu_si128((__m128i*)buf+2);
-      __m128i fourthin = _mm_loadu_si128((__m128i*)buf+3);
-      running_max = _mm_max_epu32(_mm_max_epu32(thirdin, running_max), fourthin);//take the running max of all 4 vectors thus far
-      __m128i nextin_16 = _mm_packus_epi32(_mm_and_si128(thirdin, v_7fffffff), _mm_and_si128(fourthin, v_7fffffff));//pack into 1 vector, now you have two
-      if(!_mm_testz_si128(nextin_16, v_ff80)) {  //checks if the second packed vector is ASCII, if not:
+    // We eagerly load another 32 bytes, hoping that they will be ASCII too.
+    // The intuition is that we try to collect 16 ASCII characters which
+    // requires a total of 64 bytes of input. If we fail, we just pass thirdin
+    // and fourthin as our new inputs.
+    if (_mm_testz_si128(in_16, v_ff80)) { // if the first two blocks are ASCII
+      __m128i thirdin = _mm_loadu_si128((__m128i *)buf + 2);
+      __m128i fourthin = _mm_loadu_si128((__m128i *)buf + 3);
+      running_max = _mm_max_epu32(
+          _mm_max_epu32(thirdin, running_max),
+          fourthin); // take the running max of all 4 vectors thus far
+      __m128i nextin_16 = _mm_packus_epi32(
+          _mm_and_si128(thirdin, v_7fffffff),
+          _mm_and_si128(fourthin,
+                        v_7fffffff)); // pack into 1 vector, now you have two
+      if (!_mm_testz_si128(
+              nextin_16,
+              v_ff80)) { // checks if the second packed vector is ASCII, if not:
         // 1. pack the bytes
         // obviously suboptimal.
-        const __m128i utf8_packed = _mm_packus_epi16(in_16,in_16); //creates two copy of in_16 in 1 vector
+        const __m128i utf8_packed = _mm_packus_epi16(
+            in_16, in_16); // creates two copies of in_16 in 1 vector
         // 2. store (16 bytes)
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_packed); //put them into the output
+        _mm_storeu_si128((__m128i *)utf8_output,
+                         utf8_packed); // put them into the output
         // 3. adjust pointers
-        buf += 8; //the char32_t buffer pointer goes up 8 char32_t chars* 32 bits =  256 bits
-        utf8_output += 8; //same with output, e.g. lift the first two blocks alone.
+        buf += 8; // advance by 8 char32_t code units: 8 * 32 bits = 256
+                  // bits
+        utf8_output +=
+            8; // same for the output: only the first two blocks are consumed.
         // Proceed with next input
         in_16 = nextin_16;
         // We need to update in and nextin because they are used later.
@@ -34659,7 +36688,7 @@ std::pair<const char32_t*, char*> sse_convert_utf32_to_utf8(const char32_t* buf,
         // 1. pack the bytes
         const __m128i utf8_packed = _mm_packus_epi16(in_16, nextin_16);
         // 2. store (16 bytes)
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
         // 3. adjust pointers
         buf += 16;
         utf8_output += 16;
@@ -34668,51 +36697,72 @@ std::pair<const char32_t*, char*> sse_convert_utf32_to_utf8(const char32_t* buf,
     }
 
     // no bits set above 7th bit -- find out all the ASCII characters
-    const __m128i one_byte_bytemask = _mm_cmpeq_epi16( // this takes four bytes at a time and compares:
-                                                      _mm_and_si128(in_16, v_ff80), // the vector that get only the first 9 bits of each 16-bit/2-byte units
-                                                       v_0000 //
-                                                       ); // they should be all zero if they are ASCII. E.g. ASCII in UTF32 is of format 0000 0000 0000 0XXX XXXX
-    // _mm_cmpeq_epi16 should now return a 1111 1111 1111 1111 for equals, and 0000 0000 0000 0000 if not for each 16-bit/2-byte units
-    const uint16_t one_byte_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask)); // collect the MSB from previous vector and put them into uint16_t mas
+    const __m128i one_byte_bytemask =
+        _mm_cmpeq_epi16( // compares two bytes (one 16-bit unit) at a time:
+            _mm_and_si128(in_16, v_ff80), // keeps only the top 9 bits of each
+                                          // 16-bit (2-byte) unit
+            v_0000                        //
+        ); // the top 9 bits are all zero for ASCII, i.e. a packed unit has
+           // the form 0000 0000 0XXX XXXX
+    // _mm_cmpeq_epi16 returns 1111 1111 1111 1111 for each 16-bit unit that
+    // is equal, and 0000 0000 0000 0000 otherwise
+    const uint16_t one_byte_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(
+        one_byte_bytemask)); // collect the MSB of every byte of that vector
+                             // into a uint16_t mask
 
     // no bits set above 11th bit
-    const __m128i one_or_two_bytes_bytemask = _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_0000);
-    const uint16_t one_or_two_bytes_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
+    const __m128i one_or_two_bytes_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_0000);
+    const uint16_t one_or_two_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
 
     if (one_or_two_bytes_bitmask == 0xffff) {
-      // case: all code units either produce 1 or 2 UTF-8 bytes (at least one produces 2 bytes)
+      // case: all code units either produce 1 or 2 UTF-8 bytes (at least one
+      // produces 2 bytes)
       // 1. prepare 2-byte values
       // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
       // expected output   : [110a|aaaa|10bb|bbbb] x 8
-      const __m128i v_1f00 = _mm_set1_epi16((int16_t)0x1f00); // 0001 1111 0000 0000
-      const __m128i v_003f = _mm_set1_epi16((int16_t)0x003f); // 0000 0000 0011 1111
+      const __m128i v_1f00 =
+          _mm_set1_epi16((int16_t)0x1f00); // 0001 1111 0000 0000
+      const __m128i v_003f =
+          _mm_set1_epi16((int16_t)0x003f); // 0000 0000 0011 1111
 
       // t0 = [000a|aaaa|bbbb|bb00]
       const __m128i t0 = _mm_slli_epi16(in_16, 2); // shift packed vector by two
       // t1 = [000a|aaaa|0000|0000]
-      const __m128i t1 = _mm_and_si128(t0, v_1f00); // potentital first utf8 byte
+      const __m128i t1 =
+          _mm_and_si128(t0, v_1f00); // potential first utf8 byte
       // t2 = [0000|0000|00bb|bbbb]
-      const __m128i t2 = _mm_and_si128(in_16, v_003f);// potential second utf8 byte
+      const __m128i t2 =
+          _mm_and_si128(in_16, v_003f); // potential second utf8 byte
       // t3 = [000a|aaaa|00bb|bbbb]
-      const __m128i t3 = _mm_or_si128(t1, t2); // first and second potential utf8 byte together
+      const __m128i t3 =
+          _mm_or_si128(t1, t2); // first and second potential utf8 byte together
       // t4 = [110a|aaaa|10bb|bbbb]
-      const __m128i t4 = _mm_or_si128(t3, v_c080); // t3 | 1100 0000 1000 0000 = full potential 2-byte utf8 unit
+      const __m128i t4 = _mm_or_si128(
+          t3,
+          v_c080); // t3 | 1100 0000 1000 0000 = full potential 2-byte utf8 unit
 
       // 2. merge ASCII and 2-byte codewords
-      const __m128i utf8_unpacked = _mm_blendv_epi8(t4, in_16, one_byte_bytemask);
+      const __m128i utf8_unpacked =
+          _mm_blendv_epi8(t4, in_16, one_byte_bytemask);
 
       // 3. prepare bitmask for 8-bit lookup
-      //    one_byte_bitmask = hhggffeeddccbbaa -- the bits are doubled (h - MSB, a - LSB)
-      const uint16_t m0 = one_byte_bitmask & 0x5555;  // m0 = 0h0g0f0e0d0c0b0a
-      const uint16_t m1 = static_cast<uint16_t>(m0 >> 7);                    // m1 = 00000000h0g0f0e0
-      const uint8_t  m2 = static_cast<uint8_t>((m0 | m1) & 0xff);           // m2 =         hdgcfbea
+      //    one_byte_bitmask = hhggffeeddccbbaa -- the bits are doubled (h -
+      //    MSB, a - LSB)
+      const uint16_t m0 = one_byte_bitmask & 0x5555; // m0 = 0h0g0f0e0d0c0b0a
+      const uint16_t m1 =
+          static_cast<uint16_t>(m0 >> 7); // m1 = 00000000h0g0f0e0
+      const uint8_t m2 =
+          static_cast<uint8_t>((m0 | m1) & 0xff); // m2 =         hdgcfbea
       // 4. pack the bytes
-      const uint8_t* row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-      const __m128i shuffle = _mm_loadu_si128((__m128i*)(row + 1));
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
       const __m128i utf8_packed = _mm_shuffle_epi8(utf8_unpacked, shuffle);
 
       // 5. store bytes
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
 
       // 6. adjust pointers
       buf += 8;
@@ -34722,20 +36772,27 @@ std::pair<const char32_t*, char*> sse_convert_utf32_to_utf8(const char32_t* buf,
 
     // Check for overflow in packing
 
-    const __m128i saturation_bytemask = _mm_cmpeq_epi32(_mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask = static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
+    const __m128i saturation_bytemask = _mm_cmpeq_epi32(
+        _mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
     if (saturation_bitmask == 0xffff) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
       const __m128i v_d800 = _mm_set1_epi16((uint16_t)0xd800);
-      forbidden_bytemask = _mm_or_si128(forbidden_bytemask, _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_d800));
+      forbidden_bytemask =
+          _mm_or_si128(forbidden_bytemask,
+                       _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_d800));
 
       const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
                                               0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
       /* In this branch we handle three cases:
-          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
+          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+        single UTF-8 byte
+          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+        two UTF-8 bytes
+          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+        three UTF-8 bytes
 
         We expand the input word (16-bit) into two code units (32-bit), thus
         we have room for four bytes. However, we need five distinct bit
@@ -34762,7 +36819,7 @@ std::pair<const char32_t*, char*> sse_convert_utf32_to_utf8(const char32_t* buf,
       // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
       const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
       // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m128i t2 = _mm_or_si128 (t1, simdutf_vec(0b1000000000000000));
+      const __m128i t2 = _mm_or_si128(t1, simdutf_vec(0b1000000000000000));
 
       // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
       const __m128i s0 = _mm_srli_epi16(in_16, 4);
@@ -34772,7 +36829,8 @@ std::pair<const char32_t*, char*> sse_convert_utf32_to_utf8(const char32_t* buf,
       const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
       // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
       const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
-      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask, simdutf_vec(0b0100000000000000));
+      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask,
+                                          simdutf_vec(0b0100000000000000));
       const __m128i s4 = _mm_xor_si128(s3, m0);
 #undef simdutf_vec
 
@@ -34781,63 +36839,72 @@ std::pair<const char32_t*, char*> sse_convert_utf32_to_utf8(const char32_t* buf,
       const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
 
       // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint16_t mask = (one_byte_bitmask & 0x5555) |
-                            (one_or_two_bytes_bitmask & 0xaaaa);
-      if(mask == 0) {
+      const uint16_t mask =
+          (one_byte_bitmask & 0x5555) | (one_or_two_bytes_bitmask & 0xaaaa);
+      if (mask == 0) {
         // We only have three-byte code units. Use fast path.
-        const __m128i shuffle = _mm_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
+        const __m128i shuffle = _mm_setr_epi8(2, 3, 1, 6, 7, 5, 10, 11, 9, 14,
+                                              15, 13, -1, -1, -1, -1);
         const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
         const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
         utf8_output += 12;
         buf += 8;
         continue;
       }
       const uint8_t mask0 = uint8_t(mask);
 
-      const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const __m128i shuffle0 = _mm_loadu_si128((__m128i*)(row0 + 1));
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
       const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
 
       const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
 
-      const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const __m128i shuffle1 = _mm_loadu_si128((__m128i*)(row1 + 1));
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
       const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
 
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
       utf8_output += row0[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
       utf8_output += row1[0];
 
       buf += 8;
     } else {
-      // case: at least one 32-bit word produce a surrogate pair in UTF-16 <=> will produce four UTF-8 bytes
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // in the presence of surrogate pairs may require non-trivial tables.
+      // case: at least one 32-bit word produces a surrogate pair in UTF-16 <=>
+      // will produce four UTF-8 bytes. Let us do a scalar fallback. It may seem
+      // wasteful to use scalar code, but being efficient with SIMD in the
+      // presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFFFF80)==0) {
+        if ((word & 0xFFFFFF80) == 0) {
           *utf8_output++ = char(word);
-        } else if((word & 0xFFFFF800)==0) {
-          *utf8_output++ = char((word>>6) | 0b11000000);
+        } else if ((word & 0xFFFFF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word &0xFFFF0000 )==0) {
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(nullptr, utf8_output); }
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) {
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr, utf8_output);
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         } else {
-          if (word > 0x10FFFF) { return std::make_pair(nullptr, utf8_output); }
-          *utf8_output++ = char((word>>18) | 0b11110000);
-          *utf8_output++ = char(((word>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr, utf8_output);
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         }
       }
@@ -34847,19 +36914,23 @@ std::pair<const char32_t*, char*> sse_convert_utf32_to_utf8(const char32_t* buf,
 
   // check for invalid input
   const __m128i v_10ffff = _mm_set1_epi32((uint32_t)0x10ffff);
-  if(static_cast<uint16_t>(_mm_movemask_epi8(_mm_cmpeq_epi32(_mm_max_epu32(running_max, v_10ffff), v_10ffff))) != 0xffff) {
+  if (static_cast<uint16_t>(_mm_movemask_epi8(_mm_cmpeq_epi32(
+          _mm_max_epu32(running_max, v_10ffff), v_10ffff))) != 0xffff) {
     return std::make_pair(nullptr, utf8_output);
   }
 
-  if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) { return std::make_pair(nullptr, utf8_output); }
+  if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) {
+    return std::make_pair(nullptr, utf8_output);
+  }
 
   return std::make_pair(buf, utf8_output);
 }
 
-
-std::pair<result, char*> sse_convert_utf32_to_utf8_with_errors(const char32_t* buf, size_t len, char* utf8_output) {
-  const char32_t* end = buf + len;
-  const char32_t* start = buf;
+std::pair<result, char *>
+sse_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
+                                      char *utf8_output) {
+  const char32_t *end = buf + len;
+  const char32_t *start = buf;
 
   const __m128i v_0000 = _mm_setzero_si128();
   const __m128i v_f800 = _mm_set1_epi16((uint16_t)0xf800);
@@ -34869,46 +36940,57 @@ std::pair<result, char*> sse_convert_utf32_to_utf8_with_errors(const char32_t* b
   const __m128i v_7fffffff = _mm_set1_epi32((uint32_t)0x7fffffff);
   const __m128i v_10ffff = _mm_set1_epi32((uint32_t)0x10ffff);
 
-  const size_t safety_margin = 12; // to avoid overruns, see issue https://github.com/simdutf/simdutf/issues/92
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
   while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
     // We load two 16 bytes registers for a total of 32 bytes or 8 characters.
-    __m128i in = _mm_loadu_si128((__m128i*)buf);
-    __m128i nextin = _mm_loadu_si128((__m128i*)buf+1);
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
+    __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
     // Check for too large input
     __m128i max_input = _mm_max_epu32(_mm_max_epu32(in, nextin), v_10ffff);
-    if(static_cast<uint16_t>(_mm_movemask_epi8(_mm_cmpeq_epi32(max_input, v_10ffff))) != 0xffff) {
-      return std::make_pair(result(error_code::TOO_LARGE, buf - start), utf8_output);
+    if (static_cast<uint16_t>(_mm_movemask_epi8(
+            _mm_cmpeq_epi32(max_input, v_10ffff))) != 0xffff) {
+      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
+                            utf8_output);
     }
 
-    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned saturation
-    __m128i in_16 = _mm_packus_epi32(_mm_and_si128(in, v_7fffffff), _mm_and_si128(nextin, v_7fffffff));
+    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
+    // saturation
+    __m128i in_16 = _mm_packus_epi32(_mm_and_si128(in, v_7fffffff),
+                                     _mm_and_si128(nextin, v_7fffffff));
 
     // Try to apply UTF-16 => UTF-8 from ./sse_convert_utf16_to_utf8.cpp
 
     // Check for ASCII fast path
-    if(_mm_testz_si128(in_16, v_ff80)) { // ASCII fast path!!!!
-        // 1. pack the bytes
-        // obviously suboptimal.
-        const __m128i utf8_packed = _mm_packus_epi16(in_16,in_16);
-        // 2. store (16 bytes)
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 8;
-        utf8_output += 8;
-        continue;
+    if (_mm_testz_si128(in_16, v_ff80)) { // ASCII fast path!!!!
+      // 1. pack the bytes
+      // obviously suboptimal.
+      const __m128i utf8_packed = _mm_packus_epi16(in_16, in_16);
+      // 2. store (16 bytes)
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+      // 3. adjust pointers
+      buf += 8;
+      utf8_output += 8;
+      continue;
     }
 
     // no bits set above 7th bit
-    const __m128i one_byte_bytemask = _mm_cmpeq_epi16(_mm_and_si128(in_16, v_ff80), v_0000);
-    const uint16_t one_byte_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
+    const __m128i one_byte_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in_16, v_ff80), v_0000);
+    const uint16_t one_byte_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
 
     // no bits set above 11th bit
-    const __m128i one_or_two_bytes_bytemask = _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_0000);
-    const uint16_t one_or_two_bytes_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
+    const __m128i one_or_two_bytes_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_0000);
+    const uint16_t one_or_two_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
 
     if (one_or_two_bytes_bitmask == 0xffff) {
-      // case: all code units either produce 1 or 2 UTF-8 bytes (at least one produces 2 bytes)
+      // case: all code units either produce 1 or 2 UTF-8 bytes (at least one
+      // produces 2 bytes)
       // 1. prepare 2-byte values
       // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
       // expected output   : [110a|aaaa|10bb|bbbb] x 8
@@ -34927,20 +37009,25 @@ std::pair<result, char*> sse_convert_utf32_to_utf8_with_errors(const char32_t* b
       const __m128i t4 = _mm_or_si128(t3, v_c080);
 
       // 2. merge ASCII and 2-byte codewords
-      const __m128i utf8_unpacked = _mm_blendv_epi8(t4, in_16, one_byte_bytemask);
+      const __m128i utf8_unpacked =
+          _mm_blendv_epi8(t4, in_16, one_byte_bytemask);
 
       // 3. prepare bitmask for 8-bit lookup
-      //    one_byte_bitmask = hhggffeeddccbbaa -- the bits are doubled (h - MSB, a - LSB)
-      const uint16_t m0 = one_byte_bitmask & 0x5555;  // m0 = 0h0g0f0e0d0c0b0a
-      const uint16_t m1 = static_cast<uint16_t>(m0 >> 7);                    // m1 = 00000000h0g0f0e0
-      const uint8_t  m2 = static_cast<uint8_t>((m0 | m1) & 0xff);           // m2 =         hdgcfbea
+      //    one_byte_bitmask = hhggffeeddccbbaa -- the bits are doubled (h -
+      //    MSB, a - LSB)
+      const uint16_t m0 = one_byte_bitmask & 0x5555; // m0 = 0h0g0f0e0d0c0b0a
+      const uint16_t m1 =
+          static_cast<uint16_t>(m0 >> 7); // m1 = 00000000h0g0f0e0
+      const uint8_t m2 =
+          static_cast<uint8_t>((m0 | m1) & 0xff); // m2 =         hdgcfbea
       // 4. pack the bytes
-      const uint8_t* row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-      const __m128i shuffle = _mm_loadu_si128((__m128i*)(row + 1));
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
       const __m128i utf8_packed = _mm_shuffle_epi8(utf8_unpacked, shuffle);
 
       // 5. store bytes
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_packed);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
 
       // 6. adjust pointers
       buf += 8;
@@ -34948,28 +37035,34 @@ std::pair<result, char*> sse_convert_utf32_to_utf8_with_errors(const char32_t* b
       continue;
     }
 
-
     // Check for overflow in packing
-    const __m128i saturation_bytemask = _mm_cmpeq_epi32(_mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask = static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
+    const __m128i saturation_bytemask = _mm_cmpeq_epi32(
+        _mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
 
     if (saturation_bitmask == 0xffff) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
 
       // Check for illegal surrogate code units
       const __m128i v_d800 = _mm_set1_epi16((uint16_t)0xd800);
-      const __m128i forbidden_bytemask = _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_d800);
+      const __m128i forbidden_bytemask =
+          _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_d800);
       if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) {
-        return std::make_pair(result(error_code::SURROGATE, buf - start), utf8_output);
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              utf8_output);
       }
 
       const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
                                               0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
       /* In this branch we handle three cases:
-          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single UFT-8 byte
-          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two UTF-8 bytes
-          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three UTF-8 bytes
+          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+        single UTF-8 byte
+          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+        two UTF-8 bytes
+          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+        three UTF-8 bytes
 
         We expand the input word (16-bit) into two code units (32-bit), thus
         we have room for four bytes. However, we need five distinct bit
@@ -34996,7 +37089,7 @@ std::pair<result, char*> sse_convert_utf32_to_utf8_with_errors(const char32_t* b
       // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
       const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
       // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m128i t2 = _mm_or_si128 (t1, simdutf_vec(0b1000000000000000));
+      const __m128i t2 = _mm_or_si128(t1, simdutf_vec(0b1000000000000000));
 
       // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
       const __m128i s0 = _mm_srli_epi16(in_16, 4);
@@ -35006,7 +37099,8 @@ std::pair<result, char*> sse_convert_utf32_to_utf8_with_errors(const char32_t* b
       const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
       // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
       const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
-      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask, simdutf_vec(0b0100000000000000));
+      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask,
+                                          simdutf_vec(0b0100000000000000));
       const __m128i s4 = _mm_xor_si128(s3, m0);
 #undef simdutf_vec
 
@@ -35015,63 +37109,74 @@ std::pair<result, char*> sse_convert_utf32_to_utf8_with_errors(const char32_t* b
       const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
 
       // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint16_t mask = (one_byte_bitmask & 0x5555) |
-                            (one_or_two_bytes_bitmask & 0xaaaa);
-      if(mask == 0) {
+      const uint16_t mask =
+          (one_byte_bitmask & 0x5555) | (one_or_two_bytes_bitmask & 0xaaaa);
+      if (mask == 0) {
         // We only have three-byte code units. Use fast path.
-        const __m128i shuffle = _mm_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1);
+        const __m128i shuffle = _mm_setr_epi8(2, 3, 1, 6, 7, 5, 10, 11, 9, 14,
+                                              15, 13, -1, -1, -1, -1);
         const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
         const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
         utf8_output += 12;
         buf += 8;
         continue;
       }
       const uint8_t mask0 = uint8_t(mask);
 
-      const uint8_t* row0 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const __m128i shuffle0 = _mm_loadu_si128((__m128i*)(row0 + 1));
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
       const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
 
       const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
 
-      const uint8_t* row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const __m128i shuffle1 = _mm_loadu_si128((__m128i*)(row1 + 1));
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
       const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
 
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_0);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
       utf8_output += row0[0];
-      _mm_storeu_si128((__m128i*)utf8_output, utf8_1);
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
       utf8_output += row1[0];
 
       buf += 8;
     } else {
-      // case: at least one 32-bit word produce a surrogate pair in UTF-16 <=> will produce four UTF-8 bytes
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // in the presence of surrogate pairs may require non-trivial tables.
+      // case: at least one 32-bit word produces a surrogate pair in UTF-16 <=>
+      // will produce four UTF-8 bytes. Let us do a scalar fallback. It may seem
+      // wasteful to use scalar code, but being efficient with SIMD in the
+      // presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFFFF80)==0) {
+        if ((word & 0xFFFFFF80) == 0) {
           *utf8_output++ = char(word);
-        } else if((word & 0xFFFFF800)==0) {
-          *utf8_output++ = char((word>>6) | 0b11000000);
+        } else if ((word & 0xFFFFF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if((word &0xFFFF0000 )==0) {
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(result(error_code::SURROGATE, buf - start + k), utf8_output); }
-          *utf8_output++ = char((word>>12) | 0b11100000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) {
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k), utf8_output);
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         } else {
-          if (word > 0x10FFFF) { return std::make_pair(result(error_code::TOO_LARGE, buf- start + k), utf8_output); }
-          *utf8_output++ = char((word>>18) | 0b11110000);
-          *utf8_output++ = char(((word>>12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((word>>6) & 0b111111) | 0b10000000);
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k), utf8_output);
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         }
       }
@@ -35083,19 +37188,23 @@ std::pair<result, char*> sse_convert_utf32_to_utf8_with_errors(const char32_t* b
 /* end file src/westmere/sse_convert_utf32_to_utf8.cpp */
 /* begin file src/westmere/sse_convert_utf32_to_utf16.cpp */
 template <endianness big_endian>
-std::pair<const char32_t*, char16_t*> sse_convert_utf32_to_utf16(const char32_t* buf, size_t len, char16_t* utf16_output) {
+std::pair<const char32_t *, char16_t *>
+sse_convert_utf32_to_utf16(const char32_t *buf, size_t len,
+                           char16_t *utf16_output) {
 
-  const char32_t* end = buf + len;
+  const char32_t *end = buf + len;
 
   const __m128i v_0000 = _mm_setzero_si128();
   const __m128i v_ffff0000 = _mm_set1_epi32((int32_t)0xffff0000);
   __m128i forbidden_bytemask = _mm_setzero_si128();
 
   while (end - buf >= 8) {
-    __m128i in = _mm_loadu_si128((__m128i*)buf);
-    __m128i nextin = _mm_loadu_si128((__m128i*)buf+1);
-    const __m128i saturation_bytemask = _mm_cmpeq_epi32(_mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask = static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
+    __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
+    const __m128i saturation_bytemask = _mm_cmpeq_epi32(
+        _mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
 
     // Check if no bits set above 16th
     if (saturation_bitmask == 0xffff) {
@@ -35104,35 +37213,49 @@ std::pair<const char32_t*, char16_t*> sse_convert_utf32_to_utf16(const char32_t*
 
       const __m128i v_f800 = _mm_set1_epi16((uint16_t)0xf800);
       const __m128i v_d800 = _mm_set1_epi16((uint16_t)0xd800);
-      forbidden_bytemask = _mm_or_si128(forbidden_bytemask, _mm_cmpeq_epi16(_mm_and_si128(utf16_packed, v_f800), v_d800));
+      forbidden_bytemask = _mm_or_si128(
+          forbidden_bytemask,
+          _mm_cmpeq_epi16(_mm_and_si128(utf16_packed, v_f800), v_d800));
 
       if (big_endian) {
-        const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
         utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
       }
 
-      _mm_storeu_si128((__m128i*)utf16_output, utf16_packed);
+      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
       utf16_output += 8;
       buf += 8;
     } else {
       size_t forward = 7;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFF0000)==0) {
+        if ((word & 0xFFFF0000) == 0) {
           // will not generate a surrogate pair
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(nullptr, utf16_output); }
-          *utf16_output++ = big_endian ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8)) : char16_t(word);
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr, utf16_output);
+          }
+          *utf16_output++ =
+              big_endian
+                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
+                  : char16_t(word);
         } else {
           // will generate a surrogate pair
-          if (word > 0x10FFFF) { return std::make_pair(nullptr, utf16_output); }
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr, utf16_output);
+          }
           word -= 0x10000;
           uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
           uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
           if (big_endian) {
-            high_surrogate = uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
-            low_surrogate = uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
+            high_surrogate =
+                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
+            low_surrogate =
+                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
           }
           *utf16_output++ = char16_t(high_surrogate);
           *utf16_output++ = char16_t(low_surrogate);
@@ -35143,25 +37266,30 @@ std::pair<const char32_t*, char16_t*> sse_convert_utf32_to_utf16(const char32_t*
   }
 
   // check for invalid input
-  if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) { return std::make_pair(nullptr, utf16_output); }
+  if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) {
+    return std::make_pair(nullptr, utf16_output);
+  }
 
   return std::make_pair(buf, utf16_output);
 }
 
-
 template <endianness big_endian>
-std::pair<result, char16_t*> sse_convert_utf32_to_utf16_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) {
-  const char32_t* start = buf;
-  const char32_t* end = buf + len;
+std::pair<result, char16_t *>
+sse_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
+                                       char16_t *utf16_output) {
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
 
   const __m128i v_0000 = _mm_setzero_si128();
   const __m128i v_ffff0000 = _mm_set1_epi32((int32_t)0xffff0000);
 
   while (end - buf >= 8) {
-    __m128i in = _mm_loadu_si128((__m128i*)buf);
-    __m128i nextin = _mm_loadu_si128((__m128i*)buf+1);
-    const __m128i saturation_bytemask = _mm_cmpeq_epi32(_mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask = static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
+    __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
+    const __m128i saturation_bytemask = _mm_cmpeq_epi32(
+        _mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
 
     // Check if no bits set above 16th
     if (saturation_bitmask == 0xffff) {
@@ -35170,38 +37298,54 @@ std::pair<result, char16_t*> sse_convert_utf32_to_utf16_with_errors(const char32
 
       const __m128i v_f800 = _mm_set1_epi16((uint16_t)0xf800);
       const __m128i v_d800 = _mm_set1_epi16((uint16_t)0xd800);
-      const __m128i forbidden_bytemask = _mm_cmpeq_epi16(_mm_and_si128(utf16_packed, v_f800), v_d800);
+      const __m128i forbidden_bytemask =
+          _mm_cmpeq_epi16(_mm_and_si128(utf16_packed, v_f800), v_d800);
       if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) {
-        return std::make_pair(result(error_code::SURROGATE, buf - start), utf16_output);
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              utf16_output);
       }
 
       if (big_endian) {
-        const __m128i swap = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
         utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
       }
 
-      _mm_storeu_si128((__m128i*)utf16_output, utf16_packed);
+      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
       utf16_output += 8;
       buf += 8;
     } else {
       size_t forward = 7;
       size_t k = 0;
-      if(size_t(end - buf) < forward + 1) { forward = size_t(end - buf - 1);}
-      for(; k < forward; k++) {
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if((word & 0xFFFF0000)==0) {
+        if ((word & 0xFFFF0000) == 0) {
           // will not generate a surrogate pair
-          if (word >= 0xD800 && word <= 0xDFFF) { return std::make_pair(result(error_code::SURROGATE, buf - start + k), utf16_output); }
-          *utf16_output++ = big_endian ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8)) : char16_t(word);
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k), utf16_output);
+          }
+          *utf16_output++ =
+              big_endian
+                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
+                  : char16_t(word);
         } else {
           // will generate a surrogate pair
-          if (word > 0x10FFFF) { return std::make_pair(result(error_code::TOO_LARGE, buf - start + k), utf16_output); }
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k), utf16_output);
+          }
           word -= 0x10000;
           uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
           uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
           if (big_endian) {
-            high_surrogate = uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
-            low_surrogate = uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
+            high_surrogate =
+                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
+            low_surrogate =
+                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
           }
           *utf16_output++ = char16_t(high_surrogate);
           *utf16_output++ = char16_t(low_surrogate);
@@ -35274,7 +37418,8 @@ template <bool base64_url> __m128i lookup_pshufb_improved(const __m128i input) {
 }
 
 template <bool isbase64url>
-size_t encode_base64(char *dst, const char *src, size_t srclen, base64_options options) {
+size_t encode_base64(char *dst, const char *src, size_t srclen,
+                     base64_options options) {
   // credit: Wojciech Muła
   // SSE (lookup: pshufb improved unrolled)
   const uint8_t *input = (const uint8_t *)src;
@@ -35453,7 +37598,8 @@ static inline uint16_t to_base64_mask(__m128i *src, bool *error) {
   }
   __m128i check_asso;
   if (base64_url) {
-    check_asso = _mm_setr_epi8(0xD,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x3,0x7,0xB,0xE,0xB,0x6);
+    check_asso = _mm_setr_epi8(0xD, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1,
+                               0x3, 0x7, 0xB, 0xE, 0xB, 0x6);
   } else {
 
     check_asso = _mm_setr_epi8(0x0D, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
@@ -35461,9 +37607,11 @@ static inline uint16_t to_base64_mask(__m128i *src, bool *error) {
   }
   __m128i check_values;
   if (base64_url) {
-    check_values = _mm_setr_epi8(uint8_t(0x80),uint8_t(0x80),uint8_t(0x80),uint8_t(0x80),uint8_t(0xCF),
-                   uint8_t(0xBF),uint8_t(0xB6),uint8_t(0xA6),uint8_t(0xB5),uint8_t(0xA1),0x0,uint8_t(0x80),
-                   0x0,uint8_t(0x80),0x0,uint8_t(0x80));
+    check_values = _mm_setr_epi8(uint8_t(0x80), uint8_t(0x80), uint8_t(0x80),
+                                 uint8_t(0x80), uint8_t(0xCF), uint8_t(0xBF),
+                                 uint8_t(0xB6), uint8_t(0xA6), uint8_t(0xB5),
+                                 uint8_t(0xA1), 0x0, uint8_t(0x80), 0x0,
+                                 uint8_t(0x80), 0x0, uint8_t(0x80));
   } else {
 
     check_values =
@@ -35472,7 +37620,7 @@ static inline uint16_t to_base64_mask(__m128i *src, bool *error) {
                       int8_t(0xB5), int8_t(0x86), int8_t(0xD1), int8_t(0x80),
                       int8_t(0xB1), int8_t(0x80), int8_t(0x91), int8_t(0x80));
   }
-  const __m128i shifted =_mm_srli_epi32(*src, 3);
+  const __m128i shifted = _mm_srli_epi32(*src, 3);
 
   const __m128i delta_hash =
       _mm_avg_epu8(_mm_shuffle_epi8(delta_asso, *src), shifted);
@@ -35522,8 +37670,8 @@ static inline uint64_t compress_block(block64 *b, uint64_t mask, char *output) {
   return _mm_popcnt_u64(nmask);
 }
 
-// The caller of this function is responsible to ensure that there are 64 bytes available
-// from reading at src. The data is read into a block64 structure.
+// The caller of this function is responsible for ensuring that at least 64
+// bytes are readable at src. The data is read into a block64 structure.
 static inline void load_block(block64 *b, const char *src) {
   b->chunks[0] = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));
   b->chunks[1] = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 16));
@@ -35531,8 +37679,8 @@ static inline void load_block(block64 *b, const char *src) {
   b->chunks[3] = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 48));
 }
 
-// The caller of this function is responsible to ensure that there are 128 bytes available
-// from reading at src. The data is read into a block64 structure.
+// The caller of this function is responsible for ensuring that at least 128
+// bytes are readable at src. The data is read into a block64 structure.
 static inline void load_block(block64 *b, const char16_t *src) {
   __m128i m1 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));
   __m128i m2 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 8));
@@ -35599,12 +37747,15 @@ static inline void base64_decode_block_safe(char *out, block64 *b) {
 
 template <bool base64_url, typename chartype>
 result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
-                              base64_options options) {
+                              base64_options options,
+                              last_chunk_handling_options last_chunk_options) {
   const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
                                         : tables::base64::to_base64_value;
-  size_t equallocation = srclen; // location of the first padding character if any
+  size_t equallocation =
+      srclen; // location of the first padding character if any
   // skip trailing spaces
-  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) && to_base64[uint8_t(src[srclen - 1])] == 64) {
+  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+         to_base64[uint8_t(src[srclen - 1])] == 64) {
     srclen--;
   }
   size_t equalsigns = 0;
@@ -35613,7 +37764,8 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
     srclen--;
     equalsigns = 1;
     // skip trailing spaces
-    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) && to_base64[uint8_t(src[srclen - 1])] == 64) {
+    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+           to_base64[uint8_t(src[srclen - 1])] == 64) {
       srclen--;
     }
     if (srclen > 0 && src[srclen - 1] == '=') {
@@ -35622,6 +37774,12 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       equalsigns = 2;
     }
   }
+  if (srclen == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
+    }
+    return {SUCCESS, 0};
+  }
   char *end_of_safe_64byte_zone =
       (srclen + 3) / 4 * 3 >= 63 ? dst + (srclen + 3) / 4 * 3 - 63 : dst;
 
@@ -35643,7 +37801,8 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
       if (error) {
         src -= 64;
-        while (src < srcend && scalar::base64::is_eight_byte(*src) && to_base64[uint8_t(*src)] <= 64) {
+        while (src < srcend && scalar::base64::is_eight_byte(*src) &&
+               to_base64[uint8_t(*src)] <= 64) {
           src++;
         }
         return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
@@ -35732,71 +37891,38 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       buffer_start += 4;
     }
     // we may have 1, 2 or 3 bytes left and we need to decode them so let us
-    // bring in src content
+    // backtrack
     int leftover = int(bufferptr - buffer_start);
-    if (leftover > 0) {
-      while (leftover < 4 && src < srcend) {
-        uint8_t val = to_base64[uint8_t(*src)];
-        if (!scalar::base64::is_eight_byte(*src) || val > 64) {
-          return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
-        }
-        buffer_start[leftover] = char(val);
-        leftover += (val <= 63);
-        src++;
-      }
-
-      if (leftover == 1) {
-        return {BASE64_INPUT_REMAINDER, size_t(dst - dstinit)};
-      }
-      if (leftover == 2) {
-        uint32_t triple = (uint32_t(buffer_start[0]) << 3 * 6) +
-                          (uint32_t(buffer_start[1]) << 2 * 6);
-        triple = scalar::utf32::swap_bytes(triple);
-        triple >>= 8;
-        std::memcpy(dst, &triple, 1);
-        dst += 1;
-      } else if (leftover == 3) {
-        uint32_t triple = (uint32_t(buffer_start[0]) << 3 * 6) +
-                          (uint32_t(buffer_start[1]) << 2 * 6) +
-                          (uint32_t(buffer_start[2]) << 1 * 6);
-        triple = scalar::utf32::swap_bytes(triple);
-
-        triple >>= 8;
-
-        std::memcpy(dst, &triple, 2);
-        dst += 2;
-      } else {
-        uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
-                           (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
-                           (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
-                           (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
-                          << 8;
-        triple = scalar::utf32::swap_bytes(triple);
-        std::memcpy(dst, &triple, 3);
-        dst += 3;
+    while (leftover > 0) {
+      while (to_base64[uint8_t(*(src - 1))] == 64) {
+        src--;
       }
+      src--;
+      leftover--;
     }
   }
   if (src < srcend + equalsigns) {
-    result r =
-        scalar::base64::base64_tail_decode(dst, src, srcend - src, options);
+    result r = scalar::base64::base64_tail_decode(
+        dst, src, srcend - src, equalsigns, options, last_chunk_options);
     if (r.error == error_code::INVALID_BASE64_CHARACTER) {
       r.count += size_t(src - srcinit);
       return r;
     } else {
       r.count += size_t(dst - dstinit);
     }
-    if(r.error == error_code::SUCCESS && equalsigns > 0) {
+    if (last_chunk_options != stop_before_partial &&
+        r.error == error_code::SUCCESS && equalsigns > 0) {
       // additional checks
-      if((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+      if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
         r.error = error_code::INVALID_BASE64_CHARACTER;
         r.count = equallocation;
       }
     }
     return r;
   }
-  if(equalsigns > 0) {
-    if((size_t(dst - dstinit) % 3 == 0) || ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
+  if (equalsigns > 0) {
+    if ((size_t(dst - dstinit) % 3 == 0) ||
+        ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
       return {INVALID_BASE64_CHARACTER, equallocation};
     }
   }
@@ -35813,9 +37939,9 @@ namespace simdutf {
 namespace westmere {
 namespace {
 
-// Walks through a buffer in block-sized increments, loading the last part with spaces
-template<size_t STEP_SIZE>
-struct buf_block_reader {
+// Walks through a buffer in block-sized increments, loading the last part with
+// spaces
+template <size_t STEP_SIZE> struct buf_block_reader {
 public:
   simdutf_really_inline buf_block_reader(const uint8_t *_buf, size_t _len);
   simdutf_really_inline size_t block_index();
@@ -35824,14 +37950,16 @@ struct buf_block_reader {
   /**
    * Get the last block, padded with spaces.
    *
-   * There will always be a last block, with at least 1 byte, unless len == 0 (in which case this
-   * function fills the buffer with spaces and returns 0. In particular, if len == STEP_SIZE there
-   * will be 0 full_blocks and 1 remainder block with STEP_SIZE bytes and no spaces for padding.
+   * There will always be a last block, with at least 1 byte, unless len == 0
+   * (in which case this function fills the buffer with spaces and returns 0).
+   * In particular, if len == STEP_SIZE there will be 0 full_blocks and 1
+   * remainder block with STEP_SIZE bytes and no spaces for padding.
    *
    * @return the number of effective characters in the last block.
    */
   simdutf_really_inline size_t get_remainder(uint8_t *dst) const;
   simdutf_really_inline void advance();
+
 private:
   const uint8_t *buf;
   const size_t len;
@@ -35840,9 +37968,10 @@ struct buf_block_reader {
 };
 
 // Routines to print masks and text for debugging bitmask operations
-simdutf_unused static char * format_input_text_64(const uint8_t *text) {
-  static char *buf = reinterpret_cast<char*>(malloc(sizeof(simd8x64<uint8_t>) + 1));
-  for (size_t i=0; i<sizeof(simd8x64<uint8_t>); i++) {
+simdutf_unused static char *format_input_text_64(const uint8_t *text) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
     buf[i] = int8_t(text[i]) < ' ' ? '_' : int8_t(text[i]);
   }
   buf[sizeof(simd8x64<uint8_t>)] = '\0';
@@ -35850,50 +37979,64 @@ simdutf_unused static char * format_input_text_64(const uint8_t *text) {
 }
 
 // Routines to print masks and text for debugging bitmask operations
-simdutf_unused static char * format_input_text(const simd8x64<uint8_t>& in) {
-  static char *buf = reinterpret_cast<char*>(malloc(sizeof(simd8x64<uint8_t>) + 1));
-  in.store(reinterpret_cast<uint8_t*>(buf));
-  for (size_t i=0; i<sizeof(simd8x64<uint8_t>); i++) {
-    if (buf[i] < ' ') { buf[i] = '_'; }
+simdutf_unused static char *format_input_text(const simd8x64<uint8_t> &in) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  in.store(reinterpret_cast<uint8_t *>(buf));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
+    if (buf[i] < ' ') {
+      buf[i] = '_';
+    }
   }
   buf[sizeof(simd8x64<uint8_t>)] = '\0';
   return buf;
 }
 
-simdutf_unused static char * format_mask(uint64_t mask) {
-  static char *buf = reinterpret_cast<char*>(malloc(64 + 1));
-  for (size_t i=0; i<64; i++) {
+simdutf_unused static char *format_mask(uint64_t mask) {
+  static char *buf = reinterpret_cast<char *>(malloc(64 + 1));
+  for (size_t i = 0; i < 64; i++) {
     buf[i] = (mask & (size_t(1) << i)) ? 'X' : ' ';
   }
   buf[64] = '\0';
   return buf;
 }
 
-template<size_t STEP_SIZE>
-simdutf_really_inline buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len) : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE}, idx{0} {}
+template <size_t STEP_SIZE>
+simdutf_really_inline
+buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len)
+    : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE},
+      idx{0} {}
 
-template<size_t STEP_SIZE>
-simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() { return idx; }
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() {
+  return idx;
+}
 
-template<size_t STEP_SIZE>
+template <size_t STEP_SIZE>
 simdutf_really_inline bool buf_block_reader<STEP_SIZE>::has_full_block() const {
   return idx < lenminusstep;
 }
 
-template<size_t STEP_SIZE>
-simdutf_really_inline const uint8_t *buf_block_reader<STEP_SIZE>::full_block() const {
+template <size_t STEP_SIZE>
+simdutf_really_inline const uint8_t *
+buf_block_reader<STEP_SIZE>::full_block() const {
   return &buf[idx];
 }
 
-template<size_t STEP_SIZE>
-simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
-  if(len == idx) { return 0; } // memcpy(dst, null, 0) will trigger an error with some sanitizers
-  std::memset(dst, 0x20, STEP_SIZE); // std::memset STEP_SIZE because it is more efficient to write out 8 or 16 bytes at once.
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t
+buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
+  if (len == idx) {
+    return 0;
+  } // memcpy(dst, null, 0) will trigger an error with some sanitizers
+  std::memset(dst, 0x20,
+              STEP_SIZE); // std::memset STEP_SIZE because it is more efficient
+                          // to write out 8 or 16 bytes at once.
   std::memcpy(dst, buf + idx, len - idx);
   return len - idx;
 }
 
-template<size_t STEP_SIZE>
+template <size_t STEP_SIZE>
 simdutf_really_inline void buf_block_reader<STEP_SIZE>::advance() {
   idx += STEP_SIZE;
 }
@@ -35910,38 +38053,39 @@ namespace utf8_validation {
 
 using namespace simd;
 
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -35951,137 +38095,173 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       TOO_SHORT | OVERLONG_3 | SURROGATE,
       // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      CARRY | TOO_LARGE,
-      // ____0101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____011_ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-
-      // ____1___ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____1101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
-  simdutf_really_inline simd8<uint8_t> check_multibyte_lengths(const simd8<uint8_t> input,
-      const simd8<uint8_t> prev_input, const simd8<uint8_t> sc) {
-    simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-    simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-    simd8<uint8_t> must23 = simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-    simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-    return must23_80 ^ sc;
-  }
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+//
+// Return nonzero if there are incomplete multibyte characters at the end of the
+// block: e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+//
+simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
+  // If the previous input's last 3 bytes match this, they're too short (they
+  // ended at EOF):
+  // ... 1111____ 111_____ 11______
+  static const uint8_t max_array[32] = {255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        0b11110000u - 1,
+                                        0b11100000u - 1,
+                                        0b11000000u - 1};
+  const simd8<uint8_t> max_value(
+      &max_array[sizeof(max_array) - sizeof(simd8<uint8_t>)]);
+  return input.gt_bits(max_value);
+}
+
+struct utf8_checker {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+  // The last input we received
+  simd8<uint8_t> prev_input_block;
+  // Whether the last input we received was incomplete (used for ASCII fast
+  // path)
+  simd8<uint8_t> prev_incomplete;
 
   //
-  // Return nonzero if there are incomplete multibyte characters at the end of the block:
-  // e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+  // Check whether the current bytes are valid UTF-8.
   //
-  simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
-    // If the previous input's last 3 bytes match this, they're too short (they ended at EOF):
-    // ... 1111____ 111_____ 11______
-    static const uint8_t max_array[32] = {
-      255, 255, 255, 255, 255, 255, 255, 255,
-      255, 255, 255, 255, 255, 255, 255, 255,
-      255, 255, 255, 255, 255, 255, 255, 255,
-      255, 255, 255, 255, 255, 0b11110000u-1, 0b11100000u-1, 0b11000000u-1
-    };
-    const simd8<uint8_t> max_value(&max_array[sizeof(max_array)-sizeof(simd8<uint8_t>)]);
-    return input.gt_bits(max_value);
-  }
-
-  struct utf8_checker {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
-    // The last input we received
-    simd8<uint8_t> prev_input_block;
-    // Whether the last input we received was incomplete (used for ASCII fast path)
-    simd8<uint8_t> prev_incomplete;
-
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      simd8<uint8_t> sc = check_special_cases(input, prev1);
-      this->error |= check_multibyte_lengths(input, prev_input, sc);
-    }
-
-    // The only problem that can happen at EOF is that a multibyte character is too short
-    // or a byte value too large in the last bytes: check_special_cases only checks for bytes
-    // too large in the first of two bytes.
-    simdutf_really_inline void check_eof() {
-      // If the previous block had incomplete UTF-8 characters at the end, an ASCII block can't
-      // possibly finish them.
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  // The only problem that can happen at EOF is that a multibyte character is
+  // too short or a byte value too large in the last bytes: check_special_cases
+  // only checks for bytes too large in the first of two bytes.
+  simdutf_really_inline void check_eof() {
+    // If the previous block had incomplete UTF-8 characters at the end, an
+    // ASCII block can't possibly finish them.
+    this->error |= this->prev_incomplete;
+  }
+
+  simdutf_really_inline void check_next_input(const simd8x64<uint8_t> &input) {
+    if (simdutf_likely(is_ascii(input))) {
       this->error |= this->prev_incomplete;
-    }
-
-    simdutf_really_inline void check_next_input(const simd8x64<uint8_t>& input) {
-      if(simdutf_likely(is_ascii(input))) {
-        this->error |= this->prev_incomplete;
-      } else {
-        // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-        static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        this->prev_incomplete = is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS-1]);
-        this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS-1];
-
+    } else {
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                        (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+                    "We support either two or four chunks per 64-byte block.");
+      if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+      } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+        this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
       }
+      this->prev_incomplete =
+          is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1]);
+      this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1];
     }
+  }
 
-    // do not forget to call check_eof!
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
-    }
+  // do not forget to call check_eof!
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-  }; // struct utf8_checker
+}; // struct utf8_checker
 } // namespace utf8_validation
 
 using utf8_validation::utf8_checker;
@@ -36099,97 +38279,109 @@ namespace utf8_validation {
 /**
  * Validates that the string is actual UTF-8.
  */
-template<class checker>
-bool generic_validate_utf8(const uint8_t * input, size_t length) {
-    checker c{};
-    buf_block_reader<64> reader(input, length);
-    while (reader.has_full_block()) {
-      simd::simd8x64<uint8_t> in(reader.full_block());
-      c.check_next_input(in);
-      reader.advance();
-    }
-    uint8_t block[64]{};
-    reader.get_remainder(block);
-    simd::simd8x64<uint8_t> in(block);
+template <class checker>
+bool generic_validate_utf8(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
     c.check_next_input(in);
     reader.advance();
-    c.check_eof();
-    return !c.errors();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  return !c.errors();
 }
 
-bool generic_validate_utf8(const char * input, size_t length) {
-  return generic_validate_utf8<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
+bool generic_validate_utf8(const char *input, size_t length) {
+  return generic_validate_utf8<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
 /**
  * Validates that the string is actual UTF-8 and stops on errors.
  */
-template<class checker>
-result generic_validate_utf8_with_errors(const uint8_t * input, size_t length) {
-    checker c{};
-    buf_block_reader<64> reader(input, length);
-    size_t count{0};
-    while (reader.has_full_block()) {
-      simd::simd8x64<uint8_t> in(reader.full_block());
-      c.check_next_input(in);
-      if(c.errors()) {
-        if (count != 0) { count--; } // Sometimes the error is only detected in the next chunk
-        result res = scalar::utf8::rewind_and_validate_with_errors(reinterpret_cast<const char*>(input), reinterpret_cast<const char*>(input + count), length - count);
-        res.count += count;
-        return res;
-      }
-      reader.advance();
-      count += 64;
-    }
-    uint8_t block[64]{};
-    reader.get_remainder(block);
-    simd::simd8x64<uint8_t> in(block);
+template <class checker>
+result generic_validate_utf8_with_errors(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  size_t count{0};
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
     c.check_next_input(in);
-    reader.advance();
-    c.check_eof();
     if (c.errors()) {
-      if (count != 0) { count--; } // Sometimes the error is only detected in the next chunk
-      result res = scalar::utf8::rewind_and_validate_with_errors(reinterpret_cast<const char*>(input), reinterpret_cast<const char*>(input) + count, length - count);
+      if (count != 0) {
+        count--;
+      } // Sometimes the error is only detected in the next chunk
+      result res = scalar::utf8::rewind_and_validate_with_errors(
+          reinterpret_cast<const char *>(input),
+          reinterpret_cast<const char *>(input + count), length - count);
       res.count += count;
       return res;
-    } else {
-      return result(error_code::SUCCESS, length);
     }
+    reader.advance();
+    count += 64;
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  if (c.errors()) {
+    if (count != 0) {
+      count--;
+    } // Sometimes the error is only detected in the next chunk
+    result res = scalar::utf8::rewind_and_validate_with_errors(
+        reinterpret_cast<const char *>(input),
+        reinterpret_cast<const char *>(input) + count, length - count);
+    res.count += count;
+    return res;
+  } else {
+    return result(error_code::SUCCESS, length);
+  }
 }
 
-result generic_validate_utf8_with_errors(const char * input, size_t length) {
-  return generic_validate_utf8_with_errors<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
+result generic_validate_utf8_with_errors(const char *input, size_t length) {
+  return generic_validate_utf8_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
-template<class checker>
-bool generic_validate_ascii(const uint8_t * input, size_t length) {
-    buf_block_reader<64> reader(input, length);
-    uint8_t blocks[64]{};
-    simd::simd8x64<uint8_t> running_or(blocks);
-    while (reader.has_full_block()) {
-      simd::simd8x64<uint8_t> in(reader.full_block());
-      running_or |= in;
-      reader.advance();
-    }
-    uint8_t block[64]{};
-    reader.get_remainder(block);
-    simd::simd8x64<uint8_t> in(block);
+template <class checker>
+bool generic_validate_ascii(const uint8_t *input, size_t length) {
+  buf_block_reader<64> reader(input, length);
+  uint8_t blocks[64]{};
+  simd::simd8x64<uint8_t> running_or(blocks);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
     running_or |= in;
-    return running_or.is_ascii();
+    reader.advance();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  running_or |= in;
+  return running_or.is_ascii();
 }
 
-bool generic_validate_ascii(const char * input, size_t length) {
-  return generic_validate_ascii<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
+bool generic_validate_ascii(const char *input, size_t length) {
+  return generic_validate_ascii<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
-template<class checker>
-result generic_validate_ascii_with_errors(const uint8_t * input, size_t length) {
+template <class checker>
+result generic_validate_ascii_with_errors(const uint8_t *input, size_t length) {
   buf_block_reader<64> reader(input, length);
   size_t count{0};
   while (reader.has_full_block()) {
     simd::simd8x64<uint8_t> in(reader.full_block());
     if (!in.is_ascii()) {
-      result res = scalar::ascii::validate_with_errors(reinterpret_cast<const char*>(input + count), length - count);
+      result res = scalar::ascii::validate_with_errors(
+          reinterpret_cast<const char *>(input + count), length - count);
       return result(res.error, count + res.count);
     }
     reader.advance();
@@ -36200,15 +38392,17 @@ result generic_validate_ascii_with_errors(const uint8_t * input, size_t length)
   reader.get_remainder(block);
   simd::simd8x64<uint8_t> in(block);
   if (!in.is_ascii()) {
-    result res = scalar::ascii::validate_with_errors(reinterpret_cast<const char*>(input + count), length - count);
+    result res = scalar::ascii::validate_with_errors(
+        reinterpret_cast<const char *>(input + count), length - count);
     return result(res.error, count + res.count);
   } else {
     return result(error_code::SUCCESS, length);
   }
 }
 
-result generic_validate_ascii_with_errors(const char * input, size_t length) {
-  return generic_validate_ascii_with_errors<utf8_checker>(reinterpret_cast<const uint8_t *>(input),length);
+result generic_validate_ascii_with_errors(const char *input, size_t length) {
+  return generic_validate_ascii_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
 } // namespace utf8_validation
@@ -36219,7 +38413,6 @@ result generic_validate_ascii_with_errors(const char * input, size_t length) {
 // transcoding from UTF-8 to UTF-16
 /* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
 
-
 namespace simdutf {
 namespace westmere {
 namespace {
@@ -36228,36 +38421,39 @@ namespace utf8_to_utf16 {
 using namespace simd;
 
 template <endianness endian>
-simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
-    char16_t* utf16_output) noexcept {
-  // The implementation is not specific to haswell and should be moved to the generic directory.
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char16_t *utf16_output) noexcept {
+  // The implementation is not specific to haswell and should be moved to the
+  // generic directory.
   size_t pos = 0;
-  char16_t* start{utf16_output};
+  char16_t *start{utf16_output};
   const size_t safety_margin = 16; // to avoid overruns!
-  while(pos + 64 + safety_margin <= size) {
-    // this loop could be unrolled further. For example, we could process the mask
-    // far more than 64 bytes.
+  while (pos + 64 + safety_margin <= size) {
+    // this loop could be unrolled further. For example, we could process the
+    // mask far more than 64 bytes.
     simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
-    if(in.is_ascii()) {
+    if (in.is_ascii()) {
       in.store_ascii_as_utf16<endian>(utf16_output);
       utf16_output += 64;
       pos += 64;
     } else {
-      // Slow path. We hope that the compiler will recognize that this is a slow path.
-      // Anything that is not a continuation mask is a 'leading byte', that is, the
-      // start of a new code point.
+      // Slow path. We hope that the compiler will recognize that this is a slow
+      // path. Anything that is not a continuation mask is a 'leading byte',
+      // that is, the start of a new code point.
       uint64_t utf8_continuation_mask = in.lt(-65 + 1);
-      // -65 is 0b10111111 in two-complement's, so largest possible continuation byte
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
       uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-      // The *start* of code points is not so useful, rather, we want the *end* of code points.
-      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
+      // The *start* of code points is not so useful, rather, we want the *end*
+      // of code points.
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
       // We process in blocks of up to 12 bytes except possibly
       // for fast paths which may process up to 16 bytes. For the
       // slow path to work, we should have at least 12 input bytes left.
       size_t max_starting_point = (pos + 64) - 12;
       // Next loop is going to run at least five times when using solely
       // the slow/regular path, and at least four times if there are fast paths.
-      while(pos < max_starting_point) {
+      while (pos < max_starting_point) {
         // Performance note: our ability to compute 'consumed' and
         // then shift and recompute is critical. If there is a
         // latency of, say, 4 cycles on getting 'consumed', then
@@ -36271,8 +38467,8 @@ simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
         // Thus we may allow convert_masked_utf8_to_utf16 to process
         // more bytes at a time under a fast-path mode where 16 bytes
         // are consumed at once (e.g., when encountering ASCII).
-        size_t consumed = convert_masked_utf8_to_utf16<endian>(input + pos,
-                            utf8_end_of_code_point_mask, utf16_output);
+        size_t consumed = convert_masked_utf8_to_utf16<endian>(
+            input + pos, utf8_end_of_code_point_mask, utf16_output);
         pos += consumed;
         utf8_end_of_code_point_mask >>= consumed;
       }
@@ -36282,7 +38478,8 @@ simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
       // 85% to 90% efficiency.
     }
   }
-  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(input + pos, size - pos, utf16_output);
+  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(
+      input + pos, size - pos, utf16_output);
   return utf16_output - start;
 }
 
@@ -36293,46 +38490,45 @@ simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
 /* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
 /* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
 
-
 namespace simdutf {
 namespace westmere {
 namespace {
 namespace utf8_to_utf16 {
 using namespace simd;
 
-
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -36342,260 +38538,287 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       TOO_SHORT | OVERLONG_3 | SURROGATE,
       // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      CARRY | TOO_LARGE,
-      // ____0101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____011_ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-
-      // ____1___ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____1101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
-  simdutf_really_inline simd8<uint8_t> check_multibyte_lengths(const simd8<uint8_t> input,
-      const simd8<uint8_t> prev_input, const simd8<uint8_t> sc) {
-    simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-    simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-    simd8<uint8_t> must23 = simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-    simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-    return must23_80 ^ sc;
-  }
-
-
-  struct validating_transcoder {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
-
-    validating_transcoder() : error(uint8_t(0)) {}
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      simd8<uint8_t> sc = check_special_cases(input, prev1);
-      this->error |= check_multibyte_lengths(input, prev_input, sc);
-    }
-
-
-    template <endianness endian>
-    simdutf_really_inline size_t convert(const char* in, size_t size, char16_t* utf16_output) {
-      size_t pos = 0;
-      char16_t* start{utf16_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf16<endian>(utf16_output);
-          utf16_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if(utf8_continuation_mask & 1) {
-            return 0; // error
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf16<endian>(in + pos,
-                            utf8_end_of_code_point_mask, utf16_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  template <endianness endian>
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the eighth
+    // last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-      }
-      if(errors()) { return 0; }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_utf16::convert<endian>(in + pos, size - pos, utf16_output);
-        if(howmany == 0) { return 0; }
-        utf16_output += howmany;
-      }
-      return utf16_output - start;
-    }
-
-    template <endianness endian>
-    simdutf_really_inline result convert_with_errors(const char* in, size_t size, char16_t* utf16_output) {
-      size_t pos = 0;
-      char16_t* start{utf16_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf16<endian>(utf16_output);
-          utf16_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if (errors() || (utf8_continuation_mask & 1)) {
-            // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-            // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-            result res = scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(pos, in + pos, size - pos, utf16_output);
-            res.count += pos;
-            return res;
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf16<endian>(in + pos,
-                            utf8_end_of_code_point_mask, utf16_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
         }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      if(errors()) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(pos, in + pos, size - pos, utf16_output);
-        res.count += pos;
-        return res;
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany = scalar::utf8_to_utf16::convert<endian>(
+          in + pos, size - pos, utf16_output);
+      if (howmany == 0) {
+        return 0;
       }
-      if(pos < size) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(pos, in + pos, size - pos, utf16_output);
-        if (res.error) {    // In case of error, we want the error position
+      utf16_output += howmany;
+    }
+    return utf16_output - start;
+  }
+
+  template <endianness endian>
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then margin-1 is the eighth-last leading
+    // byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res =
+              scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+                  pos, in + pos, size - pos, utf16_output);
           res.count += pos;
           return res;
-        } else {    // In case of success, we want the number of word written
-          utf16_output += res.count;
         }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf16_output += res.count;
       }
-      return result(error_code::SUCCESS, utf16_output - start);
     }
+    return result(error_code::SUCCESS, utf16_output - start);
+  }
 
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
-    }
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-  }; // struct utf8_checker
-} // utf8_to_utf16 namespace
+}; // struct utf8_checker
+} // namespace utf8_to_utf16
 } // unnamed namespace
 } // namespace westmere
 } // namespace simdutf
@@ -36610,37 +38833,37 @@ namespace utf8_to_utf32 {
 
 using namespace simd;
 
-
-simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
-    char32_t* utf32_output) noexcept {
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char32_t *utf32_output) noexcept {
   size_t pos = 0;
-  char32_t* start{utf32_output};
+  char32_t *start{utf32_output};
   const size_t safety_margin = 16; // to avoid overruns!
-  while(pos + 64 + safety_margin <= size) {
+  while (pos + 64 + safety_margin <= size) {
     simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
-    if(in.is_ascii()) {
+    if (in.is_ascii()) {
       in.store_ascii_as_utf32(utf32_output);
       utf32_output += 64;
       pos += 64;
     } else {
-    // -65 is 0b10111111 in two-complement's, so largest possible continuation byte
-    uint64_t utf8_continuation_mask = in.lt(-65 + 1);
-    uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-    uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-    size_t max_starting_point = (pos + 64) - 12;
-    while(pos < max_starting_point) {
-      size_t consumed = convert_masked_utf8_to_utf32(input + pos,
-                          utf8_end_of_code_point_mask, utf32_output);
-      pos += consumed;
-      utf8_end_of_code_point_mask >>= consumed;
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      size_t max_starting_point = (pos + 64) - 12;
+      while (pos < max_starting_point) {
+        size_t consumed = convert_masked_utf8_to_utf32(
+            input + pos, utf8_end_of_code_point_mask, utf32_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
       }
     }
   }
-  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos, utf32_output);
+  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos,
+                                                       utf32_output);
   return utf32_output - start;
 }
 
-
 } // namespace utf8_to_utf32
 } // unnamed namespace
 } // namespace westmere
@@ -36648,46 +38871,45 @@ simdutf_warn_unused size_t convert_valid(const char* input, size_t size,
 /* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
 /* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
 
-
 namespace simdutf {
 namespace westmere {
 namespace {
 namespace utf8_to_utf32 {
 using namespace simd;
 
-
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -36697,253 +38919,273 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       TOO_SHORT | OVERLONG_3 | SURROGATE,
       // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      CARRY | TOO_LARGE,
-      // ____0101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____011_ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-
-      // ____1___ ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      // ____1101 ________
-      CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-      CARRY | TOO_LARGE | TOO_LARGE_1000,
-      CARRY | TOO_LARGE | TOO_LARGE_1000
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
-  simdutf_really_inline simd8<uint8_t> check_multibyte_lengths(const simd8<uint8_t> input,
-      const simd8<uint8_t> prev_input, const simd8<uint8_t> sc) {
-    simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-    simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-    simd8<uint8_t> must23 = simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-    simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-    return must23_80 ^ sc;
-  }
-
-
-  struct validating_transcoder {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
-
-    validating_transcoder() : error(uint8_t(0)) {}
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      simd8<uint8_t> sc = check_special_cases(input, prev1);
-      this->error |= check_multibyte_lengths(input, prev_input, sc);
-    }
-
-
-
-    simdutf_really_inline size_t convert(const char* in, size_t size, char32_t* utf32_output) {
-      size_t pos = 0;
-      char32_t* start{utf32_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 words when calling convert_masked_utf8_to_utf32. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 16 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the fourth last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf32(utf32_output);
-          utf32_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if(utf8_continuation_mask & 1) {
-            return 0; // we have an error
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf32(in + pos,
-                            utf8_end_of_code_point_mask, utf32_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 words when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then margin-1 is the eighth-last leading
+    // byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-      }
-      if(errors()) { return 0; }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
-        if(howmany == 0) { return 0; }
-        utf32_output += howmany;
-      }
-      return utf32_output - start;
-    }
-
-    simdutf_really_inline result convert_with_errors(const char* in, size_t size, char32_t* utf32_output) {
-      size_t pos = 0;
-      char32_t* start{utf32_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the fourth last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store_ascii_as_utf32(utf32_output);
-          utf32_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          if (errors() || (utf8_continuation_mask & 1)) {
-            result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(pos, in + pos, size - pos, utf32_output);
-            res.count += pos;
-            return res;
-          }
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_utf32(in + pos,
-                            utf8_end_of_code_point_mask, utf32_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // we have an error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
         }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      if(errors()) {
-        result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(pos, in + pos, size - pos, utf32_output);
-        res.count += pos;
-        return res;
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
+      if (howmany == 0) {
+        return 0;
       }
-      if(pos < size) {
-        result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(pos, in + pos, size - pos, utf32_output);
-        if (res.error) {    // In case of error, we want the error position
+      utf32_output += howmany;
+    }
+    return utf32_output - start;
+  }
+
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the eighth
+    // last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, utf32_output);
           res.count += pos;
           return res;
-        } else {    // In case of success, we want the number of word written
-          utf32_output += res.count;
         }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      return result(error_code::SUCCESS, utf32_output - start);
     }
-
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
+    if (errors()) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf32_output += res.count;
+      }
     }
+    return result(error_code::SUCCESS, utf32_output - start);
+  }
+
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-  }; // struct utf8_checker
-} // utf8_to_utf32 namespace
+}; // struct utf8_checker
+} // namespace utf8_to_utf32
 } // unnamed namespace
 } // namespace westmere
 } // namespace simdutf
@@ -36958,33 +39200,34 @@ namespace utf8 {
 
 using namespace simd;
 
-simdutf_really_inline size_t count_code_points(const char* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    for(;pos + 64 <= size; pos += 64) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      uint64_t utf8_continuation_mask = input.gt(-65);
-      count += count_ones(utf8_continuation_mask);
-    }
-    return count + scalar::utf8::count_code_points(in + pos, size - pos);
+simdutf_really_inline size_t count_code_points(const char *in, size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.gt(-65);
+    count += count_ones(utf8_continuation_mask);
+  }
+  return count + scalar::utf8::count_code_points(in + pos, size - pos);
 }
 
-simdutf_really_inline size_t utf16_length_from_utf8(const char* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    // This algorithm could no doubt be improved!
-    for(;pos + 64 <= size; pos += 64) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-      // We count one word for anything that is not a continuation (so
-      // leading bytes).
-      count += 64 - count_ones(utf8_continuation_mask);
-      int64_t utf8_4byte = input.gteq_unsigned(240);
-      count += count_ones(utf8_4byte);
-    }
-    return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
-}
-} // utf8 namespace
+simdutf_really_inline size_t utf16_length_from_utf8(const char *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+    // We count one word for anything that is not a continuation (so
+    // leading bytes).
+    count += 64 - count_ones(utf8_continuation_mask);
+    int64_t utf8_4byte = input.gteq_unsigned(240);
+    count += count_ones(utf8_4byte);
+  }
+  return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
+}
+} // namespace utf8
 } // unnamed namespace
 } // namespace westmere
 } // namespace simdutf
@@ -36996,48 +39239,59 @@ namespace {
 namespace utf16 {
 
 template <endianness big_endian>
-simdutf_really_inline size_t count_code_points(const char16_t* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    for(;pos < size/32*32; pos += 32) {
-      simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-      if (!match_system(big_endian)) { input.swap_bytes(); }
-      uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
-      count += count_ones(not_pair) / 2;
+simdutf_really_inline size_t count_code_points(const char16_t *in,
+                                               size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
     }
-    return count + scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
+    uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
+    count += count_ones(not_pair) / 2;
+  }
+  return count +
+         scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
 }
 
 template <endianness big_endian>
-simdutf_really_inline size_t utf8_length_from_utf16(const char16_t* in, size_t size) {
-    size_t pos = 0;
-    size_t count = 0;
-    // This algorithm could no doubt be improved!
-    for(;pos < size/32*32; pos += 32) {
-      simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-      if (!match_system(big_endian)) { input.swap_bytes(); }
-      uint64_t ascii_mask = input.lteq(0x7F);
-      uint64_t twobyte_mask = input.lteq(0x7FF);
-      uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
-
-      size_t ascii_count = count_ones(ascii_mask) / 2;
-      size_t twobyte_count = count_ones(twobyte_mask & ~ ascii_mask) / 2;
-      size_t threebyte_count = count_ones(not_pair_mask & ~ twobyte_mask) / 2;
-      size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
-      count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count + ascii_count;
+simdutf_really_inline size_t utf8_length_from_utf16(const char16_t *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
     }
-    return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos, size - pos);
+    uint64_t ascii_mask = input.lteq(0x7F);
+    uint64_t twobyte_mask = input.lteq(0x7FF);
+    uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
+
+    size_t ascii_count = count_ones(ascii_mask) / 2;
+    size_t twobyte_count = count_ones(twobyte_mask & ~ascii_mask) / 2;
+    size_t threebyte_count = count_ones(not_pair_mask & ~twobyte_mask) / 2;
+    size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
+    count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count +
+             ascii_count;
+  }
+  return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos,
+                                                                   size - pos);
 }
 
 template <endianness big_endian>
-simdutf_really_inline size_t utf32_length_from_utf16(const char16_t* in, size_t size) {
-    return count_code_points<big_endian>(in, size);
+simdutf_really_inline size_t utf32_length_from_utf16(const char16_t *in,
+                                                     size_t size) {
+  return count_code_points<big_endian>(in, size);
 }
 
-simdutf_really_inline void change_endianness_utf16(const char16_t* in, size_t size, char16_t* output) {
+simdutf_really_inline void
+change_endianness_utf16(const char16_t *in, size_t size, char16_t *output) {
   size_t pos = 0;
 
-  while (pos < size/32*32) {
+  while (pos < size / 32 * 32) {
     simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
     input.swap_bytes();
     input.store(reinterpret_cast<uint16_t *>(output));
@@ -37048,7 +39302,7 @@ simdutf_really_inline void change_endianness_utf16(const char16_t* in, size_t si
   scalar::utf16::change_endianness_utf16(in + pos, size - pos, output);
 }
 
-} // utf16
+} // namespace utf16
 } // unnamed namespace
 } // namespace westmere
 } // namespace simdutf
@@ -37056,50 +39310,50 @@ simdutf_really_inline void change_endianness_utf16(const char16_t* in, size_t si
 // transcoding from UTF-8 to Latin 1
 /* begin file src/generic/utf8_to_latin1/utf8_to_latin1.h */
 
-
 namespace simdutf {
 namespace westmere {
 namespace {
 namespace utf8_to_latin1 {
 using namespace simd;
 
-
-  simdutf_really_inline simd8<uint8_t> check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-// For UTF-8 to Latin 1, we can allow any ASCII character, and any continuation byte,
-// but the non-ASCII leading bytes must be 0b11000011 or 0b11000010 and nothing else.
-//
-// Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-// Bit 1 = Too Long (ASCII followed by continuation)
-// Bit 2 = Overlong 3-byte
-// Bit 4 = Surrogate
-// Bit 5 = Overlong 2-byte
-// Bit 7 = Two Continuations
-    constexpr const uint8_t TOO_SHORT   = 1<<0; // 11______ 0_______
-                                                // 11______ 11______
-    constexpr const uint8_t TOO_LONG    = 1<<1; // 0_______ 10______
-    constexpr const uint8_t OVERLONG_3  = 1<<2; // 11100000 100_____
-    constexpr const uint8_t SURROGATE   = 1<<4; // 11101101 101_____
-    constexpr const uint8_t OVERLONG_2  = 1<<5; // 1100000_ 10______
-    constexpr const uint8_t TWO_CONTS   = 1<<7; // 10______ 10______
-    constexpr const uint8_t TOO_LARGE   = 1<<3; // 11110100 1001____
-                                                // 11110100 101_____
-                                                // 11110101 1001____
-                                                // 11110101 101_____
-                                                // 1111011_ 1001____
-                                                // 1111011_ 101_____
-                                                // 11111___ 1001____
-                                                // 11111___ 101_____
-    constexpr const uint8_t TOO_LARGE_1000 = 1<<6;
-                                                // 11110101 1000____
-                                                // 1111011_ 1000____
-                                                // 11111___ 1000____
-    constexpr const uint8_t OVERLONG_4  = 1<<6; // 11110000 1000____
-    constexpr const uint8_t FORBIDDEN  = 0xff;
-
-    const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // For UTF-8 to Latin 1, we can allow any ASCII character, and any
+  // continuation byte, but the non-ASCII leading bytes must be 0b11000011 or
+  // 0b11000010 and nothing else.
+  //
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+  constexpr const uint8_t FORBIDDEN = 0xff;
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
       // 10______ ________ <continuation in byte 1>
       TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
       // 1100____ ________ <two byte lead in byte 1>
@@ -37109,326 +39363,348 @@ using namespace simd;
       // 1110____ ________ <three byte lead in byte 1>
       FORBIDDEN,
       // 1111____ ________ <four+ byte lead in byte 1>
-      FORBIDDEN
-    );
-    constexpr const uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-    const simd8<uint8_t> byte_1_low = (prev1 & 0x0F).lookup_16<uint8_t>(
-      // ____0000 ________
-      CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-      // ____0001 ________
-      CARRY | OVERLONG_2,
-      // ____001_ ________
-      CARRY,
-      CARRY,
-
-      // ____0100 ________
-      FORBIDDEN,
-      // ____0101 ________
-      FORBIDDEN,
-      // ____011_ ________
-      FORBIDDEN,
-      FORBIDDEN,
-
-      // ____1___ ________
-      FORBIDDEN,
-      FORBIDDEN,
-      FORBIDDEN,
-      FORBIDDEN,
-      FORBIDDEN,
-      // ____1101 ________
-      FORBIDDEN,
-      FORBIDDEN,
-      FORBIDDEN
-    );
-    const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      FORBIDDEN);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              FORBIDDEN,
+              // ____0101 ________
+              FORBIDDEN,
+              // ____011_ ________
+              FORBIDDEN, FORBIDDEN,
+
+              // ____1___ ________
+              FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN,
+              // ____1101 ________
+              FORBIDDEN, FORBIDDEN, FORBIDDEN);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
       // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
       // ________ 1001____
       TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
       // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE  | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
       // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT
-    );
-    return (byte_1_high & byte_1_low & byte_2_high);
-  }
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
 
-  struct validating_transcoder {
-    // If this is nonzero, there has been a UTF-8 error.
-    simd8<uint8_t> error;
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
 
-    validating_transcoder() : error(uint8_t(0)) {}
-    //
-    // Check whether the current bytes are valid UTF-8.
-    //
-    simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input, const simd8<uint8_t> prev_input) {
-      // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+ lead bytes
-      // (2, 3, 4-byte leads become large positive numbers instead of small negative numbers)
-      simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-      this->error |= check_special_cases(input, prev1);
-    }
-
-
-    simdutf_really_inline size_t convert(const char* in, size_t size, char* latin1_output) {
-      size_t pos = 0;
-      char* start{latin1_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65); //twos complement of -65 is 1011 1111 ...
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store((int8_t*)latin1_output);
-          latin1_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in this case, we also have ASCII to account for.
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_latin1(in + pos,
-                            utf8_end_of_code_point_mask, latin1_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    this->error |= check_special_cases(input, prev1);
+  }
+
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char *latin1_output) {
+    size_t pos = 0;
+    char *start{latin1_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 16 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 16; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) >
+                       -65); // twos complement of -65 is 1011 1111 ...
+    }
+    // If the input is long enough, then we have that margin-1 is the sixteenth
+    // last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-      }
-      if(errors()) { return 0; }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_latin1::convert(in + pos, size - pos, latin1_output);
-        if(howmany == 0) { return 0; }
-        latin1_output += howmany;
-      }
-      return latin1_output - start;
-    }
-
-    simdutf_really_inline result convert_with_errors(const char* in, size_t size, char* latin1_output) {
-      size_t pos = 0;
-      char* start{latin1_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65);
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store((int8_t*)latin1_output);
-          latin1_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) || (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-              "We support either two or four chunks per 64-byte block.");
-          auto zero = simd8<uint8_t>{uint8_t(0)};
-          if(simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          } else if(simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-            this->check_utf8_bytes(input.chunks[0], zero);
-            this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-            this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-            this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-          }
-          if (errors()) {
-            // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-            // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-            result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(pos, in + pos, size - pos, latin1_output);
-            res.count += pos;
-            return res;
-          }
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_latin1(in + pos,
-                            utf8_end_of_code_point_mask, latin1_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
+        uint64_t utf8_continuation_mask =
+            input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
+                               // this case, we also have ASCII to account for.
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
         }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      if(errors()) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(pos, in + pos, size - pos, latin1_output);
-        res.count += pos;
-        return res;
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_latin1::convert(in + pos, size - pos, latin1_output);
+      if (howmany == 0) {
+        return 0;
       }
-      if(pos < size) {
-        // rewind_and_convert_with_errors will seek a potential error from in+pos onward,
-        // with the ability to go back up to pos bytes, and read size-pos bytes forward.
-        result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(pos, in + pos, size - pos, latin1_output);
-        if (res.error) {    // In case of error, we want the error position
+      latin1_output += howmany;
+    }
+    return latin1_output - start;
+  }
+
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char *latin1_output) {
+    size_t pos = 0;
+    char *start{latin1_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then in[margin] is the eighth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        if (errors()) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, latin1_output);
           res.count += pos;
           return res;
-        } else {    // In case of success, we want the number of word written
-          latin1_output += res.count;
         }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        latin1_output += res.count;
       }
-      return result(error_code::SUCCESS, latin1_output - start);
     }
+    return result(error_code::SUCCESS, latin1_output - start);
+  }
 
-    simdutf_really_inline bool errors() const {
-      return this->error.any_bits_set_anywhere();
-    }
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-  }; // struct utf8_checker
-} // utf8_to_latin1 namespace
+}; // struct utf8_checker
+} // namespace utf8_to_latin1
 } // unnamed namespace
 } // namespace westmere
 } // namespace simdutf
 /* end file src/generic/utf8_to_latin1/utf8_to_latin1.h */
 /* begin file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
 
-
 namespace simdutf {
 namespace westmere {
 namespace {
 namespace utf8_to_latin1 {
 using namespace simd;
 
-
-    simdutf_really_inline size_t convert_valid(const char* in, size_t size, char* latin1_output) {
-      size_t pos = 0;
-      char* start{latin1_output};
-      // In the worst case, we have the haswell kernel which can cause an overflow of
-      // 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last 16 bytes,
-      // and if the data is valid, then it is entirely safe because 16 UTF-8 bytes generate
-      // much more than 8 bytes. However, you cannot generally assume that you have valid
-      // UTF-8 input, so we are going to go back from the end counting 8 leading bytes,
-      // to give us a good margin.
-      size_t leading_byte = 0;
-      size_t margin = size;
-      for(; margin > 0 && leading_byte < 8; margin--) {
-        leading_byte += (int8_t(in[margin-1]) > -65); //twos complement of -65 is 1011 1111 ...
-      }
-      // If the input is long enough, then we have that margin-1 is the eight last leading byte.
-      const size_t safety_margin = size - margin + 1; // to avoid overruns!
-      while(pos + 64 + safety_margin <= size) {
-        simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-        if(input.is_ascii()) {
-          input.store((int8_t*)latin1_output);
-          latin1_output += 64;
-          pos += 64;
-        } else {
-          // you might think that a for-loop would work, but under Visual Studio, it is not good enough.
-          uint64_t utf8_continuation_mask = input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in this case, we also have ASCII to account for.
-          uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-          uint64_t utf8_end_of_code_point_mask = utf8_leading_mask>>1;
-          // We process in blocks of up to 12 bytes except possibly
-          // for fast paths which may process up to 16 bytes. For the
-          // slow path to work, we should have at least 12 input bytes left.
-          size_t max_starting_point = (pos + 64) - 12;
-          // Next loop is going to run at least five times.
-          while(pos < max_starting_point) {
-            // Performance note: our ability to compute 'consumed' and
-            // then shift and recompute is critical. If there is a
-            // latency of, say, 4 cycles on getting 'consumed', then
-            // the inner loop might have a total latency of about 6 cycles.
-            // Yet we process between 6 to 12 inputs bytes, thus we get
-            // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-            // for this section of the code. Hence, there is a limit
-            // to how much we can further increase this latency before
-            // it seriously harms performance.
-            size_t consumed = convert_masked_utf8_to_latin1(in + pos,
-                            utf8_end_of_code_point_mask, latin1_output);
-            pos += consumed;
-            utf8_end_of_code_point_mask >>= consumed;
-          }
-          // At this point there may remain between 0 and 12 bytes in the
-          // 64-byte block. These bytes will be processed again. So we have an
-          // 80% efficiency (in the worst case). In practice we expect an
-          // 85% to 90% efficiency.
-        }
-      }
-      if(pos < size) {
-        size_t howmany  = scalar::utf8_to_latin1::convert_valid(in + pos, size - pos, latin1_output);
-        latin1_output += howmany;
+simdutf_really_inline size_t convert_valid(const char *in, size_t size,
+                                           char *latin1_output) {
+  size_t pos = 0;
+  char *start{latin1_output};
+  // In the worst case, we have the haswell kernel which can cause an overflow
+  // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last
+  // 16 bytes, and if the data is valid, then it is entirely safe because 16
+  // UTF-8 bytes generate much more than 8 bytes. However, you cannot generally
+  // assume that you have valid UTF-8 input, so we are going to go back from the
+  // end counting 8 leading bytes, to give us a good margin.
+  size_t leading_byte = 0;
+  size_t margin = size;
+  for (; margin > 0 && leading_byte < 8; margin--) {
+    leading_byte += (int8_t(in[margin - 1]) >
+                     -65); // twos complement of -65 is 1011 1111 ...
+  }
+  // If the input is long enough, then in[margin] is the eighth-to-last
+  // leading byte.
+  const size_t safety_margin = size - margin + 1; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    if (input.is_ascii()) {
+      input.store((int8_t *)latin1_output);
+      latin1_output += 64;
+      pos += 64;
+    } else {
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      uint64_t utf8_continuation_mask =
+          input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
+                             // this case, we also have ASCII to account for.
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      // We process in blocks of up to 12 bytes except possibly
+      // for fast paths which may process up to 16 bytes. For the
+      // slow path to work, we should have at least 12 input bytes left.
+      size_t max_starting_point = (pos + 64) - 12;
+      // Next loop is going to run at least five times.
+      while (pos < max_starting_point) {
+        // Performance note: our ability to compute 'consumed' and
+        // then shift and recompute is critical. If there is a
+        // latency of, say, 4 cycles on getting 'consumed', then
+        // the inner loop might have a total latency of about 6 cycles.
+        // Yet we process between 6 and 12 input bytes, thus we get
+        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+        // for this section of the code. Hence, there is a limit
+        // to how much we can further increase this latency before
+        // it seriously harms performance.
+        size_t consumed = convert_masked_utf8_to_latin1(
+            in + pos, utf8_end_of_code_point_mask, latin1_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
       }
-      return latin1_output - start;
+      // At this point there may remain between 0 and 12 bytes in the
+      // 64-byte block. These bytes will be processed again. So we have an
+      // 80% efficiency (in the worst case). In practice we expect an
+      // 85% to 90% efficiency.
     }
-
   }
-}   // utf8_to_latin1 namespace
-}   // unnamed namespace
-}   // namespace westmere
- // namespace simdutf
-/* end file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
+  if (pos < size) {
+    size_t howmany = scalar::utf8_to_latin1::convert_valid(in + pos, size - pos,
+                                                           latin1_output);
+    latin1_output += howmany;
+  }
+  return latin1_output - start;
+}
 
+} // namespace utf8_to_latin1
+} // namespace
+} // namespace westmere
+} // namespace simdutf
+  // namespace simdutf
+/* end file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
 
 //
 // Implementation-specific overrides
@@ -37437,39 +39713,57 @@ using namespace simd;
 namespace simdutf {
 namespace westmere {
 
-simdutf_warn_unused int implementation::detect_encodings(const char * input, size_t length) const noexcept {
+simdutf_warn_unused int
+implementation::detect_encodings(const char *input,
+                                 size_t length) const noexcept {
   // If there is a BOM, then we trust it.
   auto bom_encoding = simdutf::BOM::check_bom(input, length);
   // todo: reimplement as a one-pass algorithm.
-  if(bom_encoding != encoding_type::unspecified) { return bom_encoding; }
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
+  }
   int out = 0;
-  if(validate_utf8(input, length)) { out |= encoding_type::UTF8; }
-  if((length % 2) == 0) {
-    if(validate_utf16le(reinterpret_cast<const char16_t*>(input), length/2)) { out |= encoding_type::UTF16_LE; }
+  if (validate_utf8(input, length)) {
+    out |= encoding_type::UTF8;
+  }
+  if ((length % 2) == 0) {
+    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
+                         length / 2)) {
+      out |= encoding_type::UTF16_LE;
+    }
   }
-  if((length % 4) == 0) {
-    if(validate_utf32(reinterpret_cast<const char32_t*>(input), length/4)) { out |= encoding_type::UTF32_LE; }
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      out |= encoding_type::UTF32_LE;
+    }
   }
   return out;
 }
 
-simdutf_warn_unused bool implementation::validate_utf8(const char *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf8(const char *buf, size_t len) const noexcept {
   return westmere::utf8_validation::generic_validate_utf8(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_utf8_with_errors(const char *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *buf, size_t len) const noexcept {
   return westmere::utf8_validation::generic_validate_utf8_with_errors(buf, len);
 }
 
-simdutf_warn_unused bool implementation::validate_ascii(const char *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *buf, size_t len) const noexcept {
   return westmere::utf8_validation::generic_validate_ascii(buf, len);
 }
 
-simdutf_warn_unused result implementation::validate_ascii_with_errors(const char *buf, size_t len) const noexcept {
-  return westmere::utf8_validation::generic_validate_ascii_with_errors(buf,len);
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return westmere::utf8_validation::generic_validate_ascii_with_errors(buf,
+                                                                       len);
 }
 
-simdutf_warn_unused bool implementation::validate_utf16le(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf16le(const char16_t *buf,
+                                 size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     // empty input is valid UTF-16. protect the implementation from
     // handling nullptr
@@ -37477,13 +39771,16 @@ simdutf_warn_unused bool implementation::validate_utf16le(const char16_t *buf, s
   }
   const char16_t *tail = sse_validate_utf16<endianness::LITTLE>(buf, len);
   if (tail) {
-    return scalar::utf16::validate<endianness::LITTLE>(tail, len - (tail - buf));
+    return scalar::utf16::validate<endianness::LITTLE>(tail,
+                                                       len - (tail - buf));
   } else {
     return false;
   }
 }
 
-simdutf_warn_unused bool implementation::validate_utf16be(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf16be(const char16_t *buf,
+                                 size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     // empty input is valid UTF-16. protect the implementation from
     // handling nullptr
@@ -37497,27 +39794,32 @@ simdutf_warn_unused bool implementation::validate_utf16be(const char16_t *buf, s
   }
 }
 
-simdutf_warn_unused result implementation::validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf16le_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
   result res = sse_validate_utf16_with_errors<endianness::LITTLE>(buf, len);
   if (res.count != len) {
-    result scalar_res = scalar::utf16::validate_with_errors<endianness::LITTLE>(buf + res.count, len - res.count);
+    result scalar_res = scalar::utf16::validate_with_errors<endianness::LITTLE>(
+        buf + res.count, len - res.count);
     return result(scalar_res.error, res.count + scalar_res.count);
   } else {
     return res;
   }
 }
 
-simdutf_warn_unused result implementation::validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf16be_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
   result res = sse_validate_utf16_with_errors<endianness::BIG>(buf, len);
   if (res.count != len) {
-    result scalar_res = scalar::utf16::validate_with_errors<endianness::BIG>(buf + res.count, len - res.count);
+    result scalar_res = scalar::utf16::validate_with_errors<endianness::BIG>(
+        buf + res.count, len - res.count);
     return result(scalar_res.error, res.count + scalar_res.count);
   } else {
     return res;
   }
 }
 
-simdutf_warn_unused bool implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
+simdutf_warn_unused bool
+implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
     // empty input is valid UTF-32. protect the implementation from
     // handling nullptr
@@ -37531,7 +39833,8 @@ simdutf_warn_unused bool implementation::validate_utf32(const char32_t *buf, siz
   }
 }
 
-simdutf_warn_unused result implementation::validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept {
+simdutf_warn_unused result implementation::validate_utf32_with_errors(
+    const char32_t *buf, size_t len) const noexcept {
   if (len == 0) {
     // empty input is valid UTF-32. protect the implementation from
     // handling nullptr
@@ -37539,161 +39842,216 @@ simdutf_warn_unused result implementation::validate_utf32_with_errors(const char
   }
   result res = sse_validate_utf32le_with_errors(buf, len);
   if (res.count != len) {
-    result scalar_res = scalar::utf32::validate_with_errors(buf + res.count, len - res.count);
+    result scalar_res =
+        scalar::utf32::validate_with_errors(buf + res.count, len - res.count);
     return result(scalar_res.error, res.count + scalar_res.count);
   } else {
     return res;
   }
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(const char * buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
+    const char *buf, size_t len, char *utf8_output) const noexcept {
 
-  std::pair<const char*, char*> ret = sse_convert_latin1_to_utf8(buf, len, utf8_output);
+  std::pair<const char *, char *> ret =
+      sse_convert_latin1_to_utf8(buf, len, utf8_output);
   size_t converted_chars = ret.second - utf8_output;
 
   if (ret.first != buf + len) {
     const size_t scalar_converted_chars = scalar::latin1_to_utf8::convert(
-      ret.first, len - (ret.first - buf), ret.second);
+        ret.first, len - (ret.first - buf), ret.second);
     converted_chars += scalar_converted_chars;
   }
 
   return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-    std::pair<const char*, char16_t*> ret = sse_convert_latin1_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
-    if (ret.first == nullptr) { return 0; }
-    size_t converted_chars = ret.second - utf16_output;
-    if (ret.first != buf + len) {
-        const size_t scalar_converted_chars = scalar::latin1_to_utf16::convert<endianness::LITTLE>(
-                                              ret.first, len - (ret.first - buf), ret.second);
-        if (scalar_converted_chars == 0) { return 0; }
-        converted_chars += scalar_converted_chars;
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char *, char16_t *> ret =
+      sse_convert_latin1_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t converted_chars = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars =
+        scalar::latin1_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_converted_chars == 0) {
+      return 0;
     }
-    return converted_chars;
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
-    std::pair<const char*, char16_t*> ret = sse_convert_latin1_to_utf16<endianness::BIG>(buf, len, utf16_output);
-    if (ret.first == nullptr) { return 0; }
-    size_t converted_chars = ret.second - utf16_output;
-    if (ret.first != buf + len) {
-        const size_t scalar_converted_chars = scalar::latin1_to_utf16::convert<endianness::BIG>(
-                                              ret.first, len - (ret.first - buf), ret.second);
-        if (scalar_converted_chars == 0) { return 0; }
-        converted_chars += scalar_converted_chars;
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char *, char16_t *> ret =
+      sse_convert_latin1_to_utf16<endianness::BIG>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t converted_chars = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars =
+        scalar::latin1_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_converted_chars == 0) {
+      return 0;
     }
-    return converted_chars;
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(const char* buf, size_t len, char32_t* utf32_output) const noexcept {
-    std::pair<const char*, char32_t*> ret = sse_convert_latin1_to_utf32(buf, len, utf32_output);
-    if (ret.first == nullptr) { return 0; }
-    size_t converted_chars = ret.second - utf32_output;
-    if (ret.first != buf + len) {
-        const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
-                                              ret.first, len - (ret.first - buf), ret.second);
-        if (scalar_converted_chars == 0) { return 0; }
-        converted_chars += scalar_converted_chars;
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char *, char32_t *> ret =
+      sse_convert_latin1_to_utf32(buf, len, utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t converted_chars = ret.second - utf32_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_converted_chars == 0) {
+      return 0;
     }
-    return converted_chars;
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
 }
 
-
-simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(const char* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
   utf8_to_latin1::validating_transcoder converter;
   return converter.convert(buf, len, latin1_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(const char* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
   utf8_to_latin1::validating_transcoder converter;
   return converter.convert_with_errors(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(const char* buf, size_t len, char* latin1_output) const noexcept {
-  return westmere::utf8_to_latin1::convert_valid(buf,len,latin1_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  return westmere::utf8_to_latin1::convert_valid(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   utf8_to_utf16::validating_transcoder converter;
   return converter.convert<endianness::LITTLE>(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   utf8_to_utf16::validating_transcoder converter;
   return converter.convert<endianness::BIG>(buf, len, utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   utf8_to_utf16::validating_transcoder converter;
-  return converter.convert_with_errors<endianness::LITTLE>(buf, len, utf16_output);
+  return converter.convert_with_errors<endianness::LITTLE>(buf, len,
+                                                           utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(const char* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   utf8_to_utf16::validating_transcoder converter;
   return converter.convert_with_errors<endianness::BIG>(buf, len, utf16_output);
 }
 
-
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(const char* input, size_t size,
-    char16_t* utf16_output) const noexcept {
-  return utf8_to_utf16::convert_valid<endianness::LITTLE>(input, size,  utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char *input, size_t size, char16_t *utf16_output) const noexcept {
+  return utf8_to_utf16::convert_valid<endianness::LITTLE>(input, size,
+                                                          utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(const char* input, size_t size,
-    char16_t* utf16_output) const noexcept {
-  return utf8_to_utf16::convert_valid<endianness::BIG>(input, size,  utf16_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char *input, size_t size, char16_t *utf16_output) const noexcept {
+  return utf8_to_utf16::convert_valid<endianness::BIG>(input, size,
+                                                       utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(const char* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
   utf8_to_utf32::validating_transcoder converter;
   return converter.convert(buf, len, utf32_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(const char* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
   utf8_to_utf32::validating_transcoder converter;
   return converter.convert_with_errors(buf, len, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(const char* input, size_t size,
-    char32_t* utf32_output) const noexcept {
-  return utf8_to_utf32::convert_valid(input, size,  utf32_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char *input, size_t size, char32_t *utf32_output) const noexcept {
+  return utf8_to_utf32::convert_valid(input, size, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<const char16_t*, char*> ret = sse_convert_utf16_to_latin1<endianness::LITTLE>(buf, len, latin1_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      sse_convert_utf16_to_latin1<endianness::LITTLE>(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - latin1_output;
 
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_latin1::convert<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_latin1::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<const char16_t*, char*> ret = sse_convert_utf16_to_latin1<endianness::BIG>(buf, len, latin1_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      sse_convert_utf16_to_latin1<endianness::BIG>(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - latin1_output;
 
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_latin1::convert<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_latin1::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_latin1_with_errors(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<result, char*> ret = sse_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(buf, len, latin1_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result
+implementation::convert_utf16le_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      sse_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
+          buf, len, latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -37701,16 +40059,26 @@ simdutf_warn_unused result implementation::convert_utf16le_to_latin1_with_errors
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - latin1_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_latin1_with_errors(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<result, char*> ret = sse_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len, latin1_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result
+implementation::convert_utf16be_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      sse_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len,
+                                                               latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -37718,54 +40086,79 @@ simdutf_warn_unused result implementation::convert_utf16be_to_latin1_with_errors
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - latin1_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   // optimization opportunity: we could provide an optimized function.
   return convert_utf16be_to_latin1(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(const char16_t* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   // optimization opportunity: we could provide an optimized function.
   return convert_utf16le_to_latin1(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  std::pair<const char16_t*, char*> ret = sse_convert_utf16_to_utf8<endianness::LITTLE>(buf, len, utf8_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      sse_convert_utf16_to_utf8<endianness::LITTLE>(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf8_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf8::convert<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf8::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  std::pair<const char16_t*, char*> ret = sse_convert_utf16_to_utf8<endianness::BIG>(buf, len, utf8_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      sse_convert_utf16_to_utf8<endianness::BIG>(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf8_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf8::convert<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf8::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char*> ret = westmere::sse_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(buf, len, utf8_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      westmere::sse_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(
+          buf, len, utf8_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -37773,17 +40166,27 @@ simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(c
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf8_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char*> ret = westmere::sse_convert_utf16_to_utf8_with_errors<endianness::BIG>(buf, len, utf8_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      westmere::sse_convert_utf16_to_utf8_with_errors<endianness::BIG>(
+          buf, len, utf8_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -37791,39 +40194,52 @@ simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(c
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf8_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   return convert_utf16le_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(const char16_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   return convert_utf16be_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
-  std::pair<const char32_t*, char*> ret = sse_convert_utf32_to_latin1(buf, len, latin1_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      sse_convert_utf32_to_latin1(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - latin1_output;
   // if (ret.first != buf + len) {
   if (ret.first < buf + len) {
     const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-
-simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char*> ret = westmere::sse_convert_utf32_to_latin1_with_errors(buf, len, latin1_output);
+simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      westmere::sse_convert_utf32_to_latin1_with_errors(buf, len,
+                                                        latin1_output);
   if (ret.first.count != len) {
     result scalar_res = scalar::utf32_to_latin1::convert_with_errors(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+        buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -37831,34 +40247,46 @@ simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(c
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - latin1_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(const char32_t* buf, size_t len, char* latin1_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
   // optimization opportunity: we could provide an optimized function.
-  return convert_utf32_to_latin1(buf,len,latin1_output);
+  return convert_utf32_to_latin1(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
-  std::pair<const char32_t*, char*> ret = sse_convert_utf32_to_utf8(buf, len, utf8_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      sse_convert_utf32_to_utf8(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf8_output;
   if (ret.first != buf + len) {
     const size_t scalar_saved_bytes = scalar::utf32_to_utf8::convert(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char*> ret = westmere::sse_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      westmere::sse_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
   if (ret.first.count != len) {
     result scalar_res = scalar::utf32_to_utf8::convert_with_errors(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+        buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -37866,43 +40294,67 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(con
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf8_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  std::pair<const char16_t*, char32_t*> ret = sse_convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char16_t *, char32_t *> ret =
+      sse_convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf32_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf32::convert<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  std::pair<const char16_t*, char32_t*> ret = sse_convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char16_t *, char32_t *> ret =
+      sse_convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf32_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf16_to_utf32::convert<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char32_t*> ret = westmere::sse_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(buf, len, utf32_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char32_t *> ret =
+      westmere::sse_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(
+          buf, len, utf32_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -37910,17 +40362,27 @@ simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf32_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf32_output; // Set count to the number of 32-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char32_t*> ret = westmere::sse_convert_utf16_to_utf32_with_errors<endianness::BIG>(buf, len, utf32_output);
-  if (ret.first.error) { return ret.first; }  // Can return directly since scalar fallback already found correct ret.first.count
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char32_t *> ret =
+      westmere::sse_convert_utf16_to_utf32_with_errors<endianness::BIG>(
+          buf, len, utf32_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
   if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -37928,46 +40390,68 @@ simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf32_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf32_output; // Set count to the number of 32-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(const char32_t* buf, size_t len, char* utf8_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
   return convert_utf32_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  std::pair<const char32_t*, char16_t*> ret = sse_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      sse_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf16_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_utf16::convert<endianness::LITTLE>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  std::pair<const char32_t*, char16_t*> ret = sse_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
-  if (ret.first == nullptr) { return 0; }
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      sse_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
   size_t saved_bytes = ret.second - utf16_output;
   if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_utf16::convert<endianness::BIG>(
-                                        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) { return 0; }
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
     saved_bytes += scalar_saved_bytes;
   }
   return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char16_t*> ret = westmere::sse_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      westmere::sse_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(
+          buf, len, utf16_output);
   if (ret.first.count != len) {
-    result scalar_res = scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -37975,16 +40459,23 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf16_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of code units written even if finished
-  std::pair<result, char16_t*> ret = westmere::sse_convert_utf32_to_utf16_with_errors<endianness::BIG>(buf, len, utf16_output);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      westmere::sse_convert_utf32_to_utf16_with_errors<endianness::BIG>(
+          buf, len, utf16_output);
   if (ret.first.count != len) {
-    result scalar_res = scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
-                                        buf + ret.first.count, len - ret.first.count, ret.second);
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
     if (scalar_res.error) {
       scalar_res.count += ret.first.count;
       return scalar_res;
@@ -37992,75 +40483,94 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
       ret.second += scalar_res.count;
     }
   }
-  ret.first.count = ret.second - utf16_output;   // Set count to the number of 8-bit code units written
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
   return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
   return convert_utf32_to_utf16le(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(const char32_t* buf, size_t len, char16_t* utf16_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
   return convert_utf32_to_utf16be(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
   return convert_utf16le_to_utf32(buf, len, utf32_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(const char16_t* buf, size_t len, char32_t* utf32_output) const noexcept {
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
   return convert_utf16be_to_utf32(buf, len, utf32_output);
 }
 
-void implementation::change_endianness_utf16(const char16_t * input, size_t length, char16_t * output) const noexcept {
+void implementation::change_endianness_utf16(const char16_t *input,
+                                             size_t length,
+                                             char16_t *output) const noexcept {
   utf16::change_endianness_utf16(input, length, output);
 }
 
-simdutf_warn_unused size_t implementation::count_utf16le(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::count_utf16le(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::count_code_points<endianness::LITTLE>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::count_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::count_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::count_code_points<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::count_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *input, size_t length) const noexcept {
   return utf8::count_code_points(input, length);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf8(const char* buf, size_t len) const noexcept {
-  return count_utf8(buf,len);
+simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
+    const char *buf, size_t len) const noexcept {
+  return count_utf8(buf, len);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf16(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf16(size_t length) const noexcept {
   return scalar::utf16::latin1_length_from_utf16(length);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf32(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf32(size_t length) const noexcept {
   return scalar::utf32::latin1_length_from_utf32(length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::utf8_length_from_utf16<endianness::LITTLE>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_latin1(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::utf16_length_from_latin1(size_t length) const noexcept {
   return scalar::latin1::utf16_length_from_latin1(length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_latin1(size_t length) const noexcept {
+simdutf_warn_unused size_t
+implementation::utf32_length_from_latin1(size_t length) const noexcept {
   return scalar::latin1::utf32_length_from_latin1(length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_latin1(const char * input, size_t len) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
+    const char *input, size_t len) const noexcept {
   const uint8_t *str = reinterpret_cast<const uint8_t *>(input);
   size_t answer = len / sizeof(__m128i) * sizeof(__m128i);
   size_t i = 0;
-  if(answer >= 2048) { // long strings optimization
+  if (answer >= 2048) { // long strings optimization
     __m128i two_64bits = _mm_setzero_si128();
     while (i + sizeof(__m128i) <= len) {
       __m128i runner = _mm_setzero_si128();
@@ -38069,135 +40579,166 @@ simdutf_warn_unused size_t implementation::utf8_length_from_latin1(const char *
         iterations = 255;
       }
       size_t max_i = i + iterations * sizeof(__m128i) - sizeof(__m128i);
-      for (; i + 4*sizeof(__m128i) <= max_i; i += 4*sizeof(__m128i)) {
+      for (; i + 4 * sizeof(__m128i) <= max_i; i += 4 * sizeof(__m128i)) {
         __m128i input1 = _mm_loadu_si128((const __m128i *)(str + i));
-        __m128i input2 = _mm_loadu_si128((const __m128i *)(str + i + sizeof(__m128i)));
-        __m128i input3 = _mm_loadu_si128((const __m128i *)(str + i + 2*sizeof(__m128i)));
-        __m128i input4 = _mm_loadu_si128((const __m128i *)(str + i + 3*sizeof(__m128i)));
-        __m128i input12 = _mm_add_epi8(
-                                        _mm_cmpgt_epi8(
-                                                      _mm_setzero_si128(),
-                                                      input1),
-                                        _mm_cmpgt_epi8(
-                                                      _mm_setzero_si128(),
-                                                      input2));
-        __m128i input34 = _mm_add_epi8(
-                                        _mm_cmpgt_epi8(
-                                                      _mm_setzero_si128(),
-                                                      input3),
-                                        _mm_cmpgt_epi8(
-                                                      _mm_setzero_si128(),
-                                                      input4));
+        __m128i input2 =
+            _mm_loadu_si128((const __m128i *)(str + i + sizeof(__m128i)));
+        __m128i input3 =
+            _mm_loadu_si128((const __m128i *)(str + i + 2 * sizeof(__m128i)));
+        __m128i input4 =
+            _mm_loadu_si128((const __m128i *)(str + i + 3 * sizeof(__m128i)));
+        __m128i input12 =
+            _mm_add_epi8(_mm_cmpgt_epi8(_mm_setzero_si128(), input1),
+                         _mm_cmpgt_epi8(_mm_setzero_si128(), input2));
+        __m128i input34 =
+            _mm_add_epi8(_mm_cmpgt_epi8(_mm_setzero_si128(), input3),
+                         _mm_cmpgt_epi8(_mm_setzero_si128(), input4));
         __m128i input1234 = _mm_add_epi8(input12, input34);
         runner = _mm_sub_epi8(runner, input1234);
       }
       for (; i <= max_i; i += sizeof(__m128i)) {
         __m128i more_input = _mm_loadu_si128((const __m128i *)(str + i));
-        runner = _mm_sub_epi8(
-            runner, _mm_cmpgt_epi8(_mm_setzero_si128(), more_input));
+        runner = _mm_sub_epi8(runner,
+                              _mm_cmpgt_epi8(_mm_setzero_si128(), more_input));
       }
-      two_64bits = _mm_add_epi64(
-          two_64bits, _mm_sad_epu8(runner, _mm_setzero_si128()));
+      two_64bits =
+          _mm_add_epi64(two_64bits, _mm_sad_epu8(runner, _mm_setzero_si128()));
     }
-    answer += _mm_extract_epi64(two_64bits, 0) +
-              _mm_extract_epi64(two_64bits, 1);
+    answer +=
+        _mm_extract_epi64(two_64bits, 0) + _mm_extract_epi64(two_64bits, 1);
   } else if (answer > 0) { // short string optimization
-    for(; i + 2*sizeof(__m128i) <= len; i += 2*sizeof(__m128i)) {
-      __m128i latin = _mm_loadu_si128((const __m128i*)(input + i));
+    for (; i + 2 * sizeof(__m128i) <= len; i += 2 * sizeof(__m128i)) {
+      __m128i latin = _mm_loadu_si128((const __m128i *)(input + i));
       uint16_t non_ascii = (uint16_t)_mm_movemask_epi8(latin);
       answer += count_ones(non_ascii);
-      latin = _mm_loadu_si128((const __m128i*)(input + i)+1);
+      latin = _mm_loadu_si128((const __m128i *)(input + i) + 1);
       non_ascii = (uint16_t)_mm_movemask_epi8(latin);
       answer += count_ones(non_ascii);
     }
-    for(; i + sizeof(__m128i) <= len; i += sizeof(__m128i)) {
-      __m128i latin = _mm_loadu_si128((const __m128i*)(input + i));
+    for (; i + sizeof(__m128i) <= len; i += sizeof(__m128i)) {
+      __m128i latin = _mm_loadu_si128((const __m128i *)(input + i));
       uint16_t non_ascii = (uint16_t)_mm_movemask_epi8(latin);
       answer += count_ones(non_ascii);
     }
   }
-  return answer + scalar::latin1::utf8_length_from_latin1(reinterpret_cast<const char *>(str + i), len - i);
+  return answer + scalar::latin1::utf8_length_from_latin1(
+                      reinterpret_cast<const char *>(str + i), len - i);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::utf32_length_from_utf16<endianness::LITTLE>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
   return utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *input, size_t length) const noexcept {
   return utf8::utf16_length_from_utf8(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf32(const char32_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
   const __m128i v_00000000 = _mm_setzero_si128();
   const __m128i v_ffffff80 = _mm_set1_epi32((uint32_t)0xffffff80);
   const __m128i v_fffff800 = _mm_set1_epi32((uint32_t)0xfffff800);
   const __m128i v_ffff0000 = _mm_set1_epi32((uint32_t)0xffff0000);
   size_t pos = 0;
   size_t count = 0;
-  for(;pos + 4 <= length; pos += 4) {
-    __m128i in = _mm_loadu_si128((__m128i*)(input + pos));
-    const __m128i ascii_bytes_bytemask = _mm_cmpeq_epi32(_mm_and_si128(in, v_ffffff80), v_00000000);
-    const __m128i one_two_bytes_bytemask = _mm_cmpeq_epi32(_mm_and_si128(in, v_fffff800), v_00000000);
-    const __m128i two_bytes_bytemask = _mm_xor_si128(one_two_bytes_bytemask, ascii_bytes_bytemask);
-    const __m128i one_two_three_bytes_bytemask = _mm_cmpeq_epi32(_mm_and_si128(in, v_ffff0000), v_00000000);
-    const __m128i three_bytes_bytemask = _mm_xor_si128(one_two_three_bytes_bytemask, one_two_bytes_bytemask);
-    const uint16_t ascii_bytes_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(ascii_bytes_bytemask));
-    const uint16_t two_bytes_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(two_bytes_bytemask));
-    const uint16_t three_bytes_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(three_bytes_bytemask));
+  for (; pos + 4 <= length; pos += 4) {
+    __m128i in = _mm_loadu_si128((__m128i *)(input + pos));
+    const __m128i ascii_bytes_bytemask =
+        _mm_cmpeq_epi32(_mm_and_si128(in, v_ffffff80), v_00000000);
+    const __m128i one_two_bytes_bytemask =
+        _mm_cmpeq_epi32(_mm_and_si128(in, v_fffff800), v_00000000);
+    const __m128i two_bytes_bytemask =
+        _mm_xor_si128(one_two_bytes_bytemask, ascii_bytes_bytemask);
+    const __m128i one_two_three_bytes_bytemask =
+        _mm_cmpeq_epi32(_mm_and_si128(in, v_ffff0000), v_00000000);
+    const __m128i three_bytes_bytemask =
+        _mm_xor_si128(one_two_three_bytes_bytemask, one_two_bytes_bytemask);
+    const uint16_t ascii_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(ascii_bytes_bytemask));
+    const uint16_t two_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(two_bytes_bytemask));
+    const uint16_t three_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(three_bytes_bytemask));
 
     size_t ascii_count = count_ones(ascii_bytes_bitmask) / 4;
     size_t two_bytes_count = count_ones(two_bytes_bitmask) / 4;
     size_t three_bytes_count = count_ones(three_bytes_bitmask) / 4;
-    count += 16 - 3*ascii_count - 2*two_bytes_count - three_bytes_count;
+    count += 16 - 3 * ascii_count - 2 * two_bytes_count - three_bytes_count;
   }
-  return count + scalar::utf32::utf8_length_from_utf32(input + pos, length - pos);
+  return count +
+         scalar::utf32::utf8_length_from_utf32(input + pos, length - pos);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf32(const char32_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
   const __m128i v_00000000 = _mm_setzero_si128();
   const __m128i v_ffff0000 = _mm_set1_epi32((uint32_t)0xffff0000);
   size_t pos = 0;
   size_t count = 0;
-  for(;pos + 4 <= length; pos += 4) {
-    __m128i in = _mm_loadu_si128((__m128i*)(input + pos));
-    const __m128i surrogate_bytemask = _mm_cmpeq_epi32(_mm_and_si128(in, v_ffff0000), v_00000000);
-    const uint16_t surrogate_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(surrogate_bytemask));
-    size_t surrogate_count = (16-count_ones(surrogate_bitmask))/4;
+  for (; pos + 4 <= length; pos += 4) {
+    __m128i in = _mm_loadu_si128((__m128i *)(input + pos));
+    const __m128i surrogate_bytemask =
+        _mm_cmpeq_epi32(_mm_and_si128(in, v_ffff0000), v_00000000);
+    const uint16_t surrogate_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(surrogate_bytemask));
+    size_t surrogate_count = (16 - count_ones(surrogate_bitmask)) / 4;
     count += 4 + surrogate_count;
   }
-  return count + scalar::utf32::utf16_length_from_utf32(input + pos, length - pos);
+  return count +
+         scalar::utf32::utf16_length_from_utf32(input + pos, length - pos);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf8(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *input, size_t length) const noexcept {
   return utf8::count_code_points(input, length);
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char * input, size_t length, char* output, base64_options options) const noexcept {
-  return (options & base64_url) ? compress_decode_base64<true>(output, input, length, options) : compress_decode_base64<false>(output, input, length, options);
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept {
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options) const noexcept {
-  return (options & base64_url) ? compress_decode_base64<true>(output, input, length, options) : compress_decode_base64<false>(output, input, length, options);
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
-simdutf_warn_unused size_t implementation::base64_length_from_binary(size_t length, base64_options options) const noexcept {
+simdutf_warn_unused size_t implementation::base64_length_from_binary(
+    size_t length, base64_options options) const noexcept {
   return scalar::base64::base64_length_from_binary(length, options);
 }
 
-size_t implementation::binary_to_base64(const char * input, size_t length, char* output, base64_options options) const noexcept {
-  if(options & base64_url) {
+size_t implementation::binary_to_base64(const char *input, size_t length,
+                                        char *output,
+                                        base64_options options) const noexcept {
+  if (options & base64_url) {
     return encode_base64<true>(output, input, length, options);
   } else {
     return encode_base64<false>(output, input, length, options);
diff --git a/deps/simdutf/simdutf.h b/deps/simdutf/simdutf.h
index 4b534c12563072..4ed08d542b0ac3 100644
--- a/deps/simdutf/simdutf.h
+++ b/deps/simdutf/simdutf.h
@@ -1,4 +1,4 @@
-/* auto-generated on 2024-09-04 18:13:32 +0200. Do not edit! */
+/* auto-generated on 2024-10-11 12:35:29 -0400. Do not edit! */
 /* begin file include/simdutf.h */
 #ifndef SIMDUTF_H
 #define SIMDUTF_H
@@ -9,46 +9,44 @@
 #define SIMDUTF_COMPILER_CHECK_H
 
 #ifndef __cplusplus
-#error simdutf requires a C++ compiler
+  #error simdutf requires a C++ compiler
 #endif
 
 #ifndef SIMDUTF_CPLUSPLUS
-#if defined(_MSVC_LANG) && !defined(__clang__)
-#define SIMDUTF_CPLUSPLUS (_MSC_VER == 1900 ? 201103L : _MSVC_LANG)
-#else
-#define SIMDUTF_CPLUSPLUS __cplusplus
-#endif
+  #if defined(_MSVC_LANG) && !defined(__clang__)
+    #define SIMDUTF_CPLUSPLUS (_MSC_VER == 1900 ? 201103L : _MSVC_LANG)
+  #else
+    #define SIMDUTF_CPLUSPLUS __cplusplus
+  #endif
 #endif
 
-
 // C++ 23
 #if !defined(SIMDUTF_CPLUSPLUS23) && (SIMDUTF_CPLUSPLUS >= 202302L)
-#define SIMDUTF_CPLUSPLUS23 1
+  #define SIMDUTF_CPLUSPLUS23 1
 #endif
 
 // C++ 20
 #if !defined(SIMDUTF_CPLUSPLUS20) && (SIMDUTF_CPLUSPLUS >= 202002L)
-#define SIMDUTF_CPLUSPLUS20 1
+  #define SIMDUTF_CPLUSPLUS20 1
 #endif
 
-
 // C++ 17
 #if !defined(SIMDUTF_CPLUSPLUS17) && (SIMDUTF_CPLUSPLUS >= 201703L)
-#define SIMDUTF_CPLUSPLUS17 1
+  #define SIMDUTF_CPLUSPLUS17 1
 #endif
 
 // C++ 14
 #if !defined(SIMDUTF_CPLUSPLUS14) && (SIMDUTF_CPLUSPLUS >= 201402L)
-#define SIMDUTF_CPLUSPLUS14 1
+  #define SIMDUTF_CPLUSPLUS14 1
 #endif
 
 // C++ 11
 #if !defined(SIMDUTF_CPLUSPLUS11) && (SIMDUTF_CPLUSPLUS >= 201103L)
-#define SIMDUTF_CPLUSPLUS11 1
+  #define SIMDUTF_CPLUSPLUS11 1
 #endif
 
 #ifndef SIMDUTF_CPLUSPLUS11
-#error simdutf requires a compiler compliant with the C++11 standard
+  #error simdutf requires a compiler compliant with the C++11 standard
 #endif
 
 #endif // SIMDUTF_COMPILER_CHECK_H
@@ -68,8 +66,8 @@
 #include <cfloat>
 #include <cassert>
 #ifndef _WIN32
-// strcasecmp, strncasecmp
-#include <strings.h>
+  // strcasecmp, strncasecmp
+  #include <strings.h>
 #endif
 
 /**
@@ -78,131 +76,134 @@
  */
 
 #if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__)
-#define SIMDUTF_IS_BIG_ENDIAN (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+  #define SIMDUTF_IS_BIG_ENDIAN (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
 #elif defined(_WIN32)
-#define SIMDUTF_IS_BIG_ENDIAN 0
+  #define SIMDUTF_IS_BIG_ENDIAN 0
 #else
-#if defined(__APPLE__) || defined(__FreeBSD__) // defined __BYTE_ORDER__ && defined __ORDER_BIG_ENDIAN__
-#include <machine/endian.h>
-#elif defined(sun) || defined(__sun) // defined(__APPLE__) || defined(__FreeBSD__)
-#include <sys/byteorder.h>
-#else  // defined(__APPLE__) || defined(__FreeBSD__)
-
-#ifdef __has_include
-#if __has_include(<endian.h>)
-#include <endian.h>
-#endif //__has_include(<endian.h>)
-#endif //__has_include
-
-#endif // defined(__APPLE__) || defined(__FreeBSD__)
-
-
-#ifndef !defined(__BYTE_ORDER__) || !defined(__ORDER_LITTLE_ENDIAN__)
-#define SIMDUTF_IS_BIG_ENDIAN 0
-#endif
+  #if defined(__APPLE__) ||                                                    \
+      defined(__FreeBSD__) // defined __BYTE_ORDER__ && defined
+                           // __ORDER_BIG_ENDIAN__
+    #include <machine/endian.h>
+  #elif defined(sun) ||                                                        \
+      defined(__sun) // defined(__APPLE__) || defined(__FreeBSD__)
+    #include <sys/byteorder.h>
+  #else // defined(__APPLE__) || defined(__FreeBSD__)
+
+    #ifdef __has_include
+      #if __has_include(<endian.h>)
+        #include <endian.h>
+      #endif //__has_include(<endian.h>)
+    #endif   //__has_include
+
+  #endif // defined(__APPLE__) || defined(__FreeBSD__)
+
+  #ifndef !defined(__BYTE_ORDER__) || !defined(__ORDER_LITTLE_ENDIAN__)
+    #define SIMDUTF_IS_BIG_ENDIAN 0
+  #endif
 
-#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
-#define SIMDUTF_IS_BIG_ENDIAN 0
-#else // __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
-#define SIMDUTF_IS_BIG_ENDIAN 1
-#endif // __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+    #define SIMDUTF_IS_BIG_ENDIAN 0
+  #else // __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+    #define SIMDUTF_IS_BIG_ENDIAN 1
+  #endif // __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
 
 #endif // defined __BYTE_ORDER__ && defined __ORDER_BIG_ENDIAN__
 
-
 /**
  * At this point in time, SIMDUTF_IS_BIG_ENDIAN is defined.
  */
 
 #ifdef _MSC_VER
-#define SIMDUTF_VISUAL_STUDIO 1
-/**
- * We want to differentiate carefully between
- * clang under visual studio and regular visual
- * studio.
- *
- * Under clang for Windows, we enable:
- *  * target pragmas so that part and only part of the
- *     code gets compiled for advanced instructions.
- *
- */
-#ifdef __clang__
-// clang under visual studio
-#define SIMDUTF_CLANG_VISUAL_STUDIO 1
-#else
-// just regular visual studio (best guess)
-#define SIMDUTF_REGULAR_VISUAL_STUDIO 1
-#endif // __clang__
-#endif // _MSC_VER
+  #define SIMDUTF_VISUAL_STUDIO 1
+  /**
+   * We want to differentiate carefully between
+   * clang under visual studio and regular visual
+   * studio.
+   *
+   * Under clang for Windows, we enable:
+   *  * target pragmas so that part and only part of the
+   *     code gets compiled for advanced instructions.
+   *
+   */
+  #ifdef __clang__
+    // clang under visual studio
+    #define SIMDUTF_CLANG_VISUAL_STUDIO 1
+  #else
+    // just regular visual studio (best guess)
+    #define SIMDUTF_REGULAR_VISUAL_STUDIO 1
+  #endif // __clang__
+#endif   // _MSC_VER
 
 #ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-// https://en.wikipedia.org/wiki/C_alternative_tokens
-// This header should have no effect, except maybe
-// under Visual Studio.
-#include <iso646.h>
+  // https://en.wikipedia.org/wiki/C_alternative_tokens
+  // This header should have no effect, except maybe
+  // under Visual Studio.
+  #include <iso646.h>
 #endif
 
 #if (defined(__x86_64__) || defined(_M_AMD64)) && !defined(_M_ARM64EC)
-#define SIMDUTF_IS_X86_64 1
+  #define SIMDUTF_IS_X86_64 1
 #elif defined(__aarch64__) || defined(_M_ARM64) || defined(_M_ARM64EC)
-#define SIMDUTF_IS_ARM64 1
+  #define SIMDUTF_IS_ARM64 1
 #elif defined(__PPC64__) || defined(_M_PPC64)
-//#define SIMDUTF_IS_PPC64 1
-// The simdutf library does yet support SIMD acceleration under
-// POWER processors. Please see https://github.com/lemire/simdutf/issues/51
+// #define SIMDUTF_IS_PPC64 1
+//  The simdutf library does not yet support SIMD acceleration under
+//  POWER processors. Please see https://github.com/lemire/simdutf/issues/51
 #elif defined(__s390__)
 // s390 IBM system. Big endian.
 #elif (defined(__riscv) || defined(__riscv__)) && __riscv_xlen == 64
-// RISC-V 64-bit
-#define SIMDUTF_IS_RISCV64 1
+  // RISC-V 64-bit
+  #define SIMDUTF_IS_RISCV64 1
 
-#if __clang_major__ >= 19
-// Does the compiler support target regions for RISC-V
-#define SIMDUTF_HAS_RVV_TARGET_REGION 1
-#endif
+  #if __clang_major__ >= 19
+    // Does the compiler support target regions for RISC-V
+    #define SIMDUTF_HAS_RVV_TARGET_REGION 1
+  #endif
 
-#if __riscv_v_intrinsic >= 11000
-#define SIMDUTF_HAS_RVV_INTRINSICS 1
-#endif
+  #if __riscv_v_intrinsic >= 11000
+    #define SIMDUTF_HAS_RVV_INTRINSICS 1
+  #endif
 
-#define SIMDUTF_HAS_ZVBB_INTRINSICS 0 // there is currently no way to detect this
+  #define SIMDUTF_HAS_ZVBB_INTRINSICS                                          \
+    0 // there is currently no way to detect this
 
-#if SIMDUTF_HAS_RVV_INTRINSICS && __riscv_vector && __riscv_v_min_vlen >= 128 && __riscv_v_elen >= 64
-// RISC-V V extension
-#define SIMDUTF_IS_RVV 1
-#if SIMDUTF_HAS_ZVBB_INTRINSICS && __riscv_zvbb >= 1000000
-// RISC-V Vector Basic Bit-manipulation
-#define SIMDUTF_IS_ZVBB 1
-#endif
-#endif
+  #if SIMDUTF_HAS_RVV_INTRINSICS && __riscv_vector &&                          \
+      __riscv_v_min_vlen >= 128 && __riscv_v_elen >= 64
+    // RISC-V V extension
+    #define SIMDUTF_IS_RVV 1
+    #if SIMDUTF_HAS_ZVBB_INTRINSICS && __riscv_zvbb >= 1000000
+      // RISC-V Vector Basic Bit-manipulation
+      #define SIMDUTF_IS_ZVBB 1
+    #endif
+  #endif
 
 #elif defined(__loongarch_lp64)
 // LoongArch 64-bit
 #else
-// The simdutf library is designed
-// for 64-bit processors and it seems that you are not
-// compiling for a known 64-bit platform. Please
-// use a 64-bit target such as x64 or 64-bit ARM for best performance.
-#define SIMDUTF_IS_32BITS 1
-
-// We do not support 32-bit platforms, but it can be
-// handy to identify them.
-#if defined(_M_IX86) || defined(__i386__)
-#define SIMDUTF_IS_X86_32BITS 1
-#elif defined(__arm__) || defined(_M_ARM)
-#define SIMDUTF_IS_ARM_32BITS 1
-#elif defined(__PPC__) || defined(_M_PPC)
-#define SIMDUTF_IS_PPC_32BITS 1
-#endif
+  // The simdutf library is designed
+  // for 64-bit processors and it seems that you are not
+  // compiling for a known 64-bit platform. Please
+  // use a 64-bit target such as x64 or 64-bit ARM for best performance.
+  #define SIMDUTF_IS_32BITS 1
+
+  // We do not support 32-bit platforms, but it can be
+  // handy to identify them.
+  #if defined(_M_IX86) || defined(__i386__)
+    #define SIMDUTF_IS_X86_32BITS 1
+  #elif defined(__arm__) || defined(_M_ARM)
+    #define SIMDUTF_IS_ARM_32BITS 1
+  #elif defined(__PPC__) || defined(_M_PPC)
+    #define SIMDUTF_IS_PPC_32BITS 1
+  #endif
 
 #endif // defined(__x86_64__) || defined(_M_AMD64)
 
 #ifdef SIMDUTF_IS_32BITS
-#ifndef SIMDUTF_NO_PORTABILITY_WARNING
-// In the future, we may want to warn users of 32-bit systems that
-// the simdutf does not support accelerated kernels for such systems.
-#endif // SIMDUTF_NO_PORTABILITY_WARNING
-#endif // SIMDUTF_IS_32BITS
+  #ifndef SIMDUTF_NO_PORTABILITY_WARNING
+  // In the future, we may want to warn users of 32-bit systems that
+  // simdutf does not support accelerated kernels for such systems.
+  #endif // SIMDUTF_NO_PORTABILITY_WARNING
+#endif   // SIMDUTF_IS_32BITS
 
 // this is almost standard?
 #define SIMDUTF_STRINGIFY_IMPLEMENTATION_(a) #a
@@ -219,93 +220,98 @@
 // slower, but it should run everywhere.
 
 //
-// Enable valid runtime implementations, and select SIMDUTF_BUILTIN_IMPLEMENTATION
+// Enable valid runtime implementations, and select
+// SIMDUTF_BUILTIN_IMPLEMENTATION
 //
 
 // We are going to use runtime dispatch.
 #ifdef SIMDUTF_IS_X86_64
-#ifdef __clang__
-// clang does not have GCC push pop
-// warning: clang attribute push can't be used within a namespace in clang up
-// til 8.0 so SIMDUTF_TARGET_REGION and SIMDUTF_UNTARGET_REGION must be *outside* of a
-// namespace.
-#define SIMDUTF_TARGET_REGION(T)                                                       \
-  _Pragma(SIMDUTF_STRINGIFY(                                                           \
-      clang attribute push(__attribute__((target(T))), apply_to = function)))
-#define SIMDUTF_UNTARGET_REGION _Pragma("clang attribute pop")
-#elif defined(__GNUC__)
-// GCC is easier
-#define SIMDUTF_TARGET_REGION(T)                                                       \
-  _Pragma("GCC push_options") _Pragma(SIMDUTF_STRINGIFY(GCC target(T)))
-#define SIMDUTF_UNTARGET_REGION _Pragma("GCC pop_options")
-#endif // clang then gcc
+  #ifdef __clang__
+    // clang does not have GCC push pop
+    // warning: clang attribute push can't be used within a namespace in clang
+    // up til 8.0 so SIMDUTF_TARGET_REGION and SIMDUTF_UNTARGET_REGION must be
+    // *outside* of a namespace.
+    #define SIMDUTF_TARGET_REGION(T)                                           \
+      _Pragma(SIMDUTF_STRINGIFY(clang attribute push(                          \
+          __attribute__((target(T))), apply_to = function)))
+    #define SIMDUTF_UNTARGET_REGION _Pragma("clang attribute pop")
+  #elif defined(__GNUC__)
+    // GCC is easier
+    #define SIMDUTF_TARGET_REGION(T)                                           \
+      _Pragma("GCC push_options") _Pragma(SIMDUTF_STRINGIFY(GCC target(T)))
+    #define SIMDUTF_UNTARGET_REGION _Pragma("GCC pop_options")
+  #endif // clang then gcc
 
 #endif // x86
 
 // Default target region macros don't do anything.
 #ifndef SIMDUTF_TARGET_REGION
-#define SIMDUTF_TARGET_REGION(T)
-#define SIMDUTF_UNTARGET_REGION
+  #define SIMDUTF_TARGET_REGION(T)
+  #define SIMDUTF_UNTARGET_REGION
 #endif
 
 // Is threading enabled?
 #if defined(_REENTRANT) || defined(_MT)
-#ifndef SIMDUTF_THREADS_ENABLED
-#define SIMDUTF_THREADS_ENABLED
-#endif
+  #ifndef SIMDUTF_THREADS_ENABLED
+    #define SIMDUTF_THREADS_ENABLED
+  #endif
 #endif
 
 // workaround for large stack sizes under -O0.
 // https://github.com/simdutf/simdutf/issues/691
 #ifdef __APPLE__
-#ifndef __OPTIMIZE__
-// Apple systems have small stack sizes in secondary threads.
-// Lack of compiler optimization may generate high stack usage.
-// Users may want to disable threads for safety, but only when
-// in debug mode which we detect by the fact that the __OPTIMIZE__
-// macro is not defined.
-#undef SIMDUTF_THREADS_ENABLED
-#endif
+  #ifndef __OPTIMIZE__
+    // Apple systems have small stack sizes in secondary threads.
+    // Lack of compiler optimization may generate high stack usage.
+    // Users may want to disable threads for safety, but only when
+    // in debug mode which we detect by the fact that the __OPTIMIZE__
+    // macro is not defined.
+    #undef SIMDUTF_THREADS_ENABLED
+  #endif
 #endif
 
 #ifdef SIMDUTF_VISUAL_STUDIO
-// This is one case where we do not distinguish between
-// regular visual studio and clang under visual studio.
-// clang under Windows has _stricmp (like visual studio) but not strcasecmp (as clang normally has)
-#define simdutf_strcasecmp _stricmp
-#define simdutf_strncasecmp _strnicmp
+  // This is one case where we do not distinguish between
+  // regular visual studio and clang under visual studio.
+  // clang under Windows has _stricmp (like visual studio) but not strcasecmp
+  // (as clang normally has)
+  #define simdutf_strcasecmp _stricmp
+  #define simdutf_strncasecmp _strnicmp
 #else
-// The strcasecmp, strncasecmp, and strcasestr functions do not work with multibyte strings (e.g. UTF-8).
-// So they are only useful for ASCII in our context.
-// https://www.gnu.org/software/libunistring/manual/libunistring.html#char-_002a-strings
-#define simdutf_strcasecmp strcasecmp
-#define simdutf_strncasecmp strncasecmp
+  // The strcasecmp, strncasecmp, and strcasestr functions do not work with
+  // multibyte strings (e.g. UTF-8). So they are only useful for ASCII in our
+  // context.
+  // https://www.gnu.org/software/libunistring/manual/libunistring.html#char-_002a-strings
+  #define simdutf_strcasecmp strcasecmp
+  #define simdutf_strncasecmp strncasecmp
 #endif
 
 #ifdef NDEBUG
 
-#ifdef SIMDUTF_VISUAL_STUDIO
-#define SIMDUTF_UNREACHABLE() __assume(0)
-#define SIMDUTF_ASSUME(COND) __assume(COND)
-#else
-#define SIMDUTF_UNREACHABLE() __builtin_unreachable();
-#define SIMDUTF_ASSUME(COND) do { if (!(COND)) __builtin_unreachable(); } while (0)
-#endif
+  #ifdef SIMDUTF_VISUAL_STUDIO
+    #define SIMDUTF_UNREACHABLE() __assume(0)
+    #define SIMDUTF_ASSUME(COND) __assume(COND)
+  #else
+    #define SIMDUTF_UNREACHABLE() __builtin_unreachable();
+    #define SIMDUTF_ASSUME(COND)                                               \
+      do {                                                                     \
+        if (!(COND))                                                           \
+          __builtin_unreachable();                                             \
+      } while (0)
+  #endif
 
 #else // NDEBUG
 
-#define SIMDUTF_UNREACHABLE() assert(0);
-#define SIMDUTF_ASSUME(COND) assert(COND)
+  #define SIMDUTF_UNREACHABLE() assert(0);
+  #define SIMDUTF_ASSUME(COND) assert(COND)
 
 #endif
 
-
 #if defined(__GNUC__) && !defined(__clang__)
-#if __GNUC__ >= 11
-#define SIMDUTF_GCC11ORMORE 1
-#endif //  __GNUC__ >= 11
-#endif // defined(__GNUC__) && !defined(__clang__)
-
+  #if __GNUC__ >= 11
+    #define SIMDUTF_GCC11ORMORE 1
+  #endif //  __GNUC__ >= 11
+#endif   // defined(__GNUC__) && !defined(__clang__)
 
 #endif // SIMDUTF_PORTABILITY_H
 /* end file include/simdutf/portability.h */
@@ -323,80 +329,83 @@
 */
 
 #ifndef SIMDUTF_HAS_AVX512F
-# if defined(__AVX512F__) && __AVX512F__ == 1
-#   define SIMDUTF_HAS_AVX512F 1
-# endif
+  #if defined(__AVX512F__) && __AVX512F__ == 1
+    #define SIMDUTF_HAS_AVX512F 1
+  #endif
 #endif
 
 #ifndef SIMDUTF_HAS_AVX512DQ
-# if defined(__AVX512DQ__) && __AVX512DQ__ == 1
-#   define SIMDUTF_HAS_AVX512DQ 1
-# endif
+  #if defined(__AVX512DQ__) && __AVX512DQ__ == 1
+    #define SIMDUTF_HAS_AVX512DQ 1
+  #endif
 #endif
 
 #ifndef SIMDUTF_HAS_AVX512IFMA
-# if defined(__AVX512IFMA__) && __AVX512IFMA__ == 1
-#   define SIMDUTF_HAS_AVX512IFMA 1
-# endif
+  #if defined(__AVX512IFMA__) && __AVX512IFMA__ == 1
+    #define SIMDUTF_HAS_AVX512IFMA 1
+  #endif
 #endif
 
 #ifndef SIMDUTF_HAS_AVX512CD
-# if defined(__AVX512CD__) && __AVX512CD__ == 1
-#   define SIMDUTF_HAS_AVX512CD 1
-# endif
+  #if defined(__AVX512CD__) && __AVX512CD__ == 1
+    #define SIMDUTF_HAS_AVX512CD 1
+  #endif
 #endif
 
 #ifndef SIMDUTF_HAS_AVX512BW
-# if defined(__AVX512BW__) && __AVX512BW__ == 1
-#   define SIMDUTF_HAS_AVX512BW 1
-# endif
+  #if defined(__AVX512BW__) && __AVX512BW__ == 1
+    #define SIMDUTF_HAS_AVX512BW 1
+  #endif
 #endif
 
 #ifndef SIMDUTF_HAS_AVX512VL
-# if defined(__AVX512VL__) && __AVX512VL__ == 1
-#   define SIMDUTF_HAS_AVX512VL 1
-# endif
+  #if defined(__AVX512VL__) && __AVX512VL__ == 1
+    #define SIMDUTF_HAS_AVX512VL 1
+  #endif
 #endif
 
 #ifndef SIMDUTF_HAS_AVX512VBMI
-# if defined(__AVX512VBMI__) && __AVX512VBMI__ == 1
-#   define SIMDUTF_HAS_AVX512VBMI 1
-# endif
+  #if defined(__AVX512VBMI__) && __AVX512VBMI__ == 1
+    #define SIMDUTF_HAS_AVX512VBMI 1
+  #endif
 #endif
 
 #ifndef SIMDUTF_HAS_AVX512VBMI2
-# if defined(__AVX512VBMI2__) && __AVX512VBMI2__ == 1
-#   define SIMDUTF_HAS_AVX512VBMI2 1
-# endif
+  #if defined(__AVX512VBMI2__) && __AVX512VBMI2__ == 1
+    #define SIMDUTF_HAS_AVX512VBMI2 1
+  #endif
 #endif
 
 #ifndef SIMDUTF_HAS_AVX512VNNI
-# if defined(__AVX512VNNI__) && __AVX512VNNI__ == 1
-#   define SIMDUTF_HAS_AVX512VNNI 1
-# endif
+  #if defined(__AVX512VNNI__) && __AVX512VNNI__ == 1
+    #define SIMDUTF_HAS_AVX512VNNI 1
+  #endif
 #endif
 
 #ifndef SIMDUTF_HAS_AVX512BITALG
-# if defined(__AVX512BITALG__) && __AVX512BITALG__ == 1
-#   define SIMDUTF_HAS_AVX512BITALG 1
-# endif
+  #if defined(__AVX512BITALG__) && __AVX512BITALG__ == 1
+    #define SIMDUTF_HAS_AVX512BITALG 1
+  #endif
 #endif
 
 #ifndef SIMDUTF_HAS_AVX512VPOPCNTDQ
-# if defined(__AVX512VPOPCNTDQ__) && __AVX512VPOPCNTDQ__ == 1
-#   define SIMDUTF_HAS_AVX512VPOPCNTDQ 1
-# endif
+  #if defined(__AVX512VPOPCNTDQ__) && __AVX512VPOPCNTDQ__ == 1
+    #define SIMDUTF_HAS_AVX512VPOPCNTDQ 1
+  #endif
 #endif
 
 #endif // SIMDUTF_AVX512_H_
 /* end file include/simdutf/avx512.h */
 
-
 #if defined(__GNUC__)
   // Marks a block with a name so that MCA analysis can see it.
-  #define SIMDUTF_BEGIN_DEBUG_BLOCK(name) __asm volatile("# LLVM-MCA-BEGIN " #name);
+  #define SIMDUTF_BEGIN_DEBUG_BLOCK(name)                                      \
+    __asm volatile("# LLVM-MCA-BEGIN " #name);
   #define SIMDUTF_END_DEBUG_BLOCK(name) __asm volatile("# LLVM-MCA-END " #name);
-  #define SIMDUTF_DEBUG_BLOCK(name, block) BEGIN_DEBUG_BLOCK(name); block; END_DEBUG_BLOCK(name);
+  #define SIMDUTF_DEBUG_BLOCK(name, block)                                     \
+    BEGIN_DEBUG_BLOCK(name);                                                   \
+    block;                                                                     \
+    END_DEBUG_BLOCK(name);
 #else
   #define SIMDUTF_BEGIN_DEBUG_BLOCK(name)
   #define SIMDUTF_END_DEBUG_BLOCK(name)
@@ -404,55 +413,59 @@
 #endif
 
 // Align to N-byte boundary
-#define SIMDUTF_ROUNDUP_N(a, n) (((a) + ((n)-1)) & ~((n)-1))
-#define SIMDUTF_ROUNDDOWN_N(a, n) ((a) & ~((n)-1))
+#define SIMDUTF_ROUNDUP_N(a, n) (((a) + ((n) - 1)) & ~((n) - 1))
+#define SIMDUTF_ROUNDDOWN_N(a, n) ((a) & ~((n) - 1))
 
-#define SIMDUTF_ISALIGNED_N(ptr, n) (((uintptr_t)(ptr) & ((n)-1)) == 0)
+#define SIMDUTF_ISALIGNED_N(ptr, n) (((uintptr_t)(ptr) & ((n) - 1)) == 0)
 
 #if defined(SIMDUTF_REGULAR_VISUAL_STUDIO)
   #define SIMDUTF_DEPRECATED __declspec(deprecated)
 
-
-  #define simdutf_really_inline __forceinline
+  #define simdutf_really_inline __forceinline // really inline in release mode
+  #define simdutf_always_inline __forceinline // always inline, no matter what
   #define simdutf_never_inline __declspec(noinline)
 
   #define simdutf_unused
   #define simdutf_warn_unused
 
   #ifndef simdutf_likely
-  #define simdutf_likely(x) x
+    #define simdutf_likely(x) x
   #endif
   #ifndef simdutf_unlikely
-  #define simdutf_unlikely(x) x
+    #define simdutf_unlikely(x) x
   #endif
 
-  #define SIMDUTF_PUSH_DISABLE_WARNINGS __pragma(warning( push ))
-  #define SIMDUTF_PUSH_DISABLE_ALL_WARNINGS __pragma(warning( push, 0 ))
-  #define SIMDUTF_DISABLE_VS_WARNING(WARNING_NUMBER) __pragma(warning( disable : WARNING_NUMBER ))
+  #define SIMDUTF_PUSH_DISABLE_WARNINGS __pragma(warning(push))
+  #define SIMDUTF_PUSH_DISABLE_ALL_WARNINGS __pragma(warning(push, 0))
+  #define SIMDUTF_DISABLE_VS_WARNING(WARNING_NUMBER)                           \
+    __pragma(warning(disable : WARNING_NUMBER))
   // Get rid of Intellisense-only warnings (Code Analysis)
-  // Though __has_include is C++17, it is supported in Visual Studio 2017 or better (_MSC_VER>=1910).
+  // Though __has_include is C++17, it is supported in Visual Studio 2017 or
+  // better (_MSC_VER>=1910).
   #ifdef __has_include
-  #if __has_include(<CppCoreCheck\Warnings.h>)
-  #include <CppCoreCheck\Warnings.h>
-  #define SIMDUTF_DISABLE_UNDESIRED_WARNINGS SIMDUTF_DISABLE_VS_WARNING(ALL_CPPCORECHECK_WARNINGS)
-  #endif
+    #if __has_include(<CppCoreCheck\Warnings.h>)
+      #include <CppCoreCheck\Warnings.h>
+      #define SIMDUTF_DISABLE_UNDESIRED_WARNINGS                               \
+        SIMDUTF_DISABLE_VS_WARNING(ALL_CPPCORECHECK_WARNINGS)
+    #endif
   #endif
 
   #ifndef SIMDUTF_DISABLE_UNDESIRED_WARNINGS
-  #define SIMDUTF_DISABLE_UNDESIRED_WARNINGS
+    #define SIMDUTF_DISABLE_UNDESIRED_WARNINGS
   #endif
 
   #define SIMDUTF_DISABLE_DEPRECATED_WARNING SIMDUTF_DISABLE_VS_WARNING(4996)
   #define SIMDUTF_DISABLE_STRICT_OVERFLOW_WARNING
-  #define SIMDUTF_POP_DISABLE_WARNINGS __pragma(warning( pop ))
+  #define SIMDUTF_POP_DISABLE_WARNINGS __pragma(warning(pop))
 
 #else // SIMDUTF_REGULAR_VISUAL_STUDIO
-#if defined(__OPTIMIZE__) || defined(NDEBUG)
-  #define simdutf_really_inline inline __attribute__((always_inline))
-#else
-  #define simdutf_really_inline inline
-#endif
-
+  #if defined(__OPTIMIZE__) || defined(NDEBUG)
+    #define simdutf_really_inline inline __attribute__((always_inline))
+  #else
+    #define simdutf_really_inline inline
+  #endif
+  #define simdutf_always_inline                                                \
+    inline __attribute__((always_inline)) // always inline, no matter what
   #define SIMDUTF_DEPRECATED __attribute__((deprecated))
   #define simdutf_never_inline inline __attribute__((noinline))
 
@@ -460,61 +473,72 @@
   #define simdutf_warn_unused __attribute__((warn_unused_result))
 
   #ifndef simdutf_likely
-  #define simdutf_likely(x) __builtin_expect(!!(x), 1)
+    #define simdutf_likely(x) __builtin_expect(!!(x), 1)
   #endif
   #ifndef simdutf_unlikely
-  #define simdutf_unlikely(x) __builtin_expect(!!(x), 0)
+    #define simdutf_unlikely(x) __builtin_expect(!!(x), 0)
   #endif
 
+  // clang-format off
   #define SIMDUTF_PUSH_DISABLE_WARNINGS _Pragma("GCC diagnostic push")
-  // gcc doesn't seem to disable all warnings with all and extra, add warnings here as necessary
-  #define SIMDUTF_PUSH_DISABLE_ALL_WARNINGS SIMDUTF_PUSH_DISABLE_WARNINGS \
-    SIMDUTF_DISABLE_GCC_WARNING(-Weffc++) \
-    SIMDUTF_DISABLE_GCC_WARNING(-Wall) \
-    SIMDUTF_DISABLE_GCC_WARNING(-Wconversion) \
-    SIMDUTF_DISABLE_GCC_WARNING(-Wextra) \
-    SIMDUTF_DISABLE_GCC_WARNING(-Wattributes) \
-    SIMDUTF_DISABLE_GCC_WARNING(-Wimplicit-fallthrough) \
-    SIMDUTF_DISABLE_GCC_WARNING(-Wnon-virtual-dtor) \
-    SIMDUTF_DISABLE_GCC_WARNING(-Wreturn-type) \
-    SIMDUTF_DISABLE_GCC_WARNING(-Wshadow) \
-    SIMDUTF_DISABLE_GCC_WARNING(-Wunused-parameter) \
+  // gcc doesn't seem to disable all warnings with all and extra, add warnings
+  // here as necessary
+  #define SIMDUTF_PUSH_DISABLE_ALL_WARNINGS                                    \
+    SIMDUTF_PUSH_DISABLE_WARNINGS                                              \
+    SIMDUTF_DISABLE_GCC_WARNING(-Weffc++)                                      \
+    SIMDUTF_DISABLE_GCC_WARNING(-Wall)                                         \
+    SIMDUTF_DISABLE_GCC_WARNING(-Wconversion)                                  \
+    SIMDUTF_DISABLE_GCC_WARNING(-Wextra)                                       \
+    SIMDUTF_DISABLE_GCC_WARNING(-Wattributes)                                  \
+    SIMDUTF_DISABLE_GCC_WARNING(-Wimplicit-fallthrough)                        \
+    SIMDUTF_DISABLE_GCC_WARNING(-Wnon-virtual-dtor)                            \
+    SIMDUTF_DISABLE_GCC_WARNING(-Wreturn-type)                                 \
+    SIMDUTF_DISABLE_GCC_WARNING(-Wshadow)                                      \
+    SIMDUTF_DISABLE_GCC_WARNING(-Wunused-parameter)                            \
     SIMDUTF_DISABLE_GCC_WARNING(-Wunused-variable)
   #define SIMDUTF_PRAGMA(P) _Pragma(#P)
-  #define SIMDUTF_DISABLE_GCC_WARNING(WARNING) SIMDUTF_PRAGMA(GCC diagnostic ignored #WARNING)
+  #define SIMDUTF_DISABLE_GCC_WARNING(WARNING)                                 \
+    SIMDUTF_PRAGMA(GCC diagnostic ignored #WARNING)
   #if defined(SIMDUTF_CLANG_VISUAL_STUDIO)
-  #define SIMDUTF_DISABLE_UNDESIRED_WARNINGS SIMDUTF_DISABLE_GCC_WARNING(-Wmicrosoft-include)
+    #define SIMDUTF_DISABLE_UNDESIRED_WARNINGS                                 \
+      SIMDUTF_DISABLE_GCC_WARNING(-Wmicrosoft-include)
   #else
-  #define SIMDUTF_DISABLE_UNDESIRED_WARNINGS
+    #define SIMDUTF_DISABLE_UNDESIRED_WARNINGS
   #endif
-  #define SIMDUTF_DISABLE_DEPRECATED_WARNING SIMDUTF_DISABLE_GCC_WARNING(-Wdeprecated-declarations)
-  #define SIMDUTF_DISABLE_STRICT_OVERFLOW_WARNING SIMDUTF_DISABLE_GCC_WARNING(-Wstrict-overflow)
+  #define SIMDUTF_DISABLE_DEPRECATED_WARNING                                   \
+    SIMDUTF_DISABLE_GCC_WARNING(-Wdeprecated-declarations)
+  #define SIMDUTF_DISABLE_STRICT_OVERFLOW_WARNING                              \
+    SIMDUTF_DISABLE_GCC_WARNING(-Wstrict-overflow)
   #define SIMDUTF_POP_DISABLE_WARNINGS _Pragma("GCC diagnostic pop")
-
-
+  // clang-format on
 
 #endif // MSC_VER
 
 #ifndef SIMDUTF_DLLIMPORTEXPORT
-    #if defined(SIMDUTF_VISUAL_STUDIO)
-      /**
-       * It does not matter here whether you are using
-       * the regular visual studio or clang under visual
-       * studio.
-       */
-      #if SIMDUTF_USING_LIBRARY
+  #if defined(SIMDUTF_VISUAL_STUDIO)
+    /**
+     * It does not matter here whether you are using
+     * the regular visual studio or clang under visual
+     * studio.
+     */
+    #if SIMDUTF_USING_LIBRARY
       #define SIMDUTF_DLLIMPORTEXPORT __declspec(dllimport)
-      #else
-      #define SIMDUTF_DLLIMPORTEXPORT __declspec(dllexport)
-      #endif
     #else
-      #define SIMDUTF_DLLIMPORTEXPORT
+      #define SIMDUTF_DLLIMPORTEXPORT __declspec(dllexport)
     #endif
+  #else
+    #define SIMDUTF_DLLIMPORTEXPORT
+  #endif
 #endif
 
 /// If EXPR is an error, returns it.
-#define SIMDUTF_TRY(EXPR) { auto _err = (EXPR); if (_err) { return _err; } }
-
+#define SIMDUTF_TRY(EXPR)                                                      \
+  {                                                                            \
+    auto _err = (EXPR);                                                        \
+    if (_err) {                                                                \
+      return _err;                                                             \
+    }                                                                          \
+  }
 
 #endif // SIMDUTF_COMMON_DEFS_H
 /* end file include/simdutf/common_defs.h */
@@ -524,20 +548,17 @@
 namespace simdutf {
 
 enum encoding_type {
-        UTF8 = 1,       // BOM 0xef 0xbb 0xbf
-        UTF16_LE = 2,   // BOM 0xff 0xfe
-        UTF16_BE = 4,   // BOM 0xfe 0xff
-        UTF32_LE = 8,   // BOM 0xff 0xfe 0x00 0x00
-        UTF32_BE = 16,   // BOM 0x00 0x00 0xfe 0xff
-        Latin1 = 32,
-
-        unspecified = 0
+  UTF8 = 1,      // BOM 0xef 0xbb 0xbf
+  UTF16_LE = 2,  // BOM 0xff 0xfe
+  UTF16_BE = 4,  // BOM 0xfe 0xff
+  UTF32_LE = 8,  // BOM 0xff 0xfe 0x00 0x00
+  UTF32_BE = 16, // BOM 0x00 0x00 0xfe 0xff
+  Latin1 = 32,
+
+  unspecified = 0
 };
 
-enum endianness {
-        LITTLE = 0,
-        BIG = 1
-};
+enum endianness { LITTLE = 0, BIG = 1 };
 
 bool match_system(endianness e);
 
@@ -553,8 +574,8 @@ namespace BOM {
  * @return the corresponding encoding
  */
 
-encoding_type check_bom(const uint8_t* byte, size_t length);
-encoding_type check_bom(const char* byte, size_t length);
+encoding_type check_bom(const uint8_t *byte, size_t length);
+encoding_type check_bom(const char *byte, size_t length);
 /**
  * Returns the size, in bytes, of the BOM for a given encoding type.
  * Note that UTF8 BOM are discouraged.
@@ -563,8 +584,8 @@ encoding_type check_bom(const char* byte, size_t length);
  */
 size_t bom_byte_size(encoding_type bom);
 
-} // BOM namespace
-} // simdutf namespace
+} // namespace BOM
+} // namespace simdutf
 /* end file include/simdutf/encoding_types.h */
 /* begin file include/simdutf/error.h */
 #ifndef SIMDUTF_ERROR_H
@@ -573,32 +594,43 @@ namespace simdutf {
 
 enum error_code {
   SUCCESS = 0,
-  HEADER_BITS,  // Any byte must have fewer than 5 header bits.
-  TOO_SHORT,    // The leading byte must be followed by N-1 continuation bytes, where N is the UTF-8 character length
-                // This is also the error when the input is truncated.
-  TOO_LONG,     // We either have too many consecutive continuation bytes or the string starts with a continuation byte.
-  OVERLONG,     // The decoded character must be above U+7F for two-byte characters, U+7FF for three-byte characters,
-                // and U+FFFF for four-byte characters.
-  TOO_LARGE,    // The decoded character must be less than or equal to U+10FFFF,less than or equal than U+7F for ASCII OR less than equal than U+FF for Latin1
-  SURROGATE,    // The decoded character must be not be in U+D800...DFFF (UTF-8 or UTF-32) OR
-                // a high surrogate must be followed by a low surrogate and a low surrogate must be preceded by a high surrogate (UTF-16) OR
-                // there must be no surrogate at all (Latin1)
-  INVALID_BASE64_CHARACTER, // Found a character that cannot be part of a valid base64 string.
-  BASE64_INPUT_REMAINDER, // The base64 input terminates with a single character, excluding padding (=).
-  OUTPUT_BUFFER_TOO_SMALL, // The provided buffer is too small.
-  OTHER         // Not related to validation/transcoding.
+  HEADER_BITS, // Any byte must have fewer than 5 header bits.
+  TOO_SHORT,   // The leading byte must be followed by N-1 continuation bytes,
+               // where N is the UTF-8 character length. This is also the
+               // error when the input is truncated.
+  TOO_LONG,    // We either have too many consecutive continuation bytes or the
+               // string starts with a continuation byte.
+  OVERLONG, // The decoded character must be above U+7F for two-byte characters,
+            // U+7FF for three-byte characters, and U+FFFF for four-byte
+            // characters.
+  TOO_LARGE, // The decoded character must be less than or equal to
+             // U+10FFFF, less than or equal to U+7F for ASCII, OR less than
+             // or equal to U+FF for Latin1.
+  SURROGATE, // The decoded character must not be in U+D800...DFFF (UTF-8 or
+             // UTF-32) OR a high surrogate must be followed by a low surrogate
+             // and a low surrogate must be preceded by a high surrogate
+             // (UTF-16) OR there must be no surrogate at all (Latin1)
+  INVALID_BASE64_CHARACTER, // Found a character that cannot be part of a valid
+                            // base64 string.
+  BASE64_INPUT_REMAINDER,   // The base64 input terminates with a single
+                            // character, excluding padding (=).
+  OUTPUT_BUFFER_TOO_SMALL,  // The provided buffer is too small.
+  OTHER                     // Not related to validation/transcoding.
 };
 
 struct result {
   error_code error;
-  size_t count;     // In case of error, indicates the position of the error. In case of success, indicates the number of code units validated/written.
+  size_t count; // In case of error, indicates the position of the error. In
+                // case of success, indicates the number of code units
+                // validated/written.
 
   simdutf_really_inline result() : error{error_code::SUCCESS}, count{0} {}
 
-  simdutf_really_inline result(error_code _err, size_t _pos) : error{_err}, count{_pos} {}
+  simdutf_really_inline result(error_code _err, size_t _pos)
+      : error{_err}, count{_pos} {}
 };
 
-}
+} // namespace simdutf
 #endif
 /* end file include/simdutf/error.h */
 
@@ -613,7 +645,7 @@ SIMDUTF_DISABLE_UNDESIRED_WARNINGS
 #define SIMDUTF_SIMDUTF_VERSION_H
 
 /** The version of simdutf being used (major.minor.revision) */
-#define SIMDUTF_VERSION "5.5.0"
+#define SIMDUTF_VERSION "5.6.0"
 
 namespace simdutf {
 enum {
@@ -624,7 +656,7 @@ enum {
   /**
    * The minor version (major.MINOR.revision) of simdutf being used.
    */
-  SIMDUTF_VERSION_MINOR = 5,
+  SIMDUTF_VERSION_MINOR = 6,
   /**
    * The revision (major.minor.REVISION) of simdutf being used.
    */
@@ -639,10 +671,10 @@ enum {
 #define SIMDUTF_IMPLEMENTATION_H
 #include <string>
 #if !defined(SIMDUTF_NO_THREADS)
-#include <atomic>
+  #include <atomic>
 #endif
-#include <vector>
 #include <tuple>
+#include <vector>
 /* begin file include/simdutf/internal/isadetection.h */
 /* From
 https://github.com/endorno/pytorch/blob/master/torch/lib/TH/generic/simd/simd.h
@@ -695,21 +727,24 @@ POSSIBILITY OF SUCH DAMAGE.
 #include <cstdint>
 #include <cstdlib>
 #if defined(_MSC_VER)
-#include <intrin.h>
+  #include <intrin.h>
 #elif defined(HAVE_GCC_GET_CPUID) && defined(USE_GCC_GET_CPUID)
-#include <cpuid.h>
+  #include <cpuid.h>
 #endif
 
 
 // RISC-V ISA detection utilities
 #if SIMDUTF_IS_RISCV64 && defined(__linux__)
-#include <unistd.h> // for syscall
+  #include <unistd.h> // for syscall
 // We define these ourselves, for backwards compatibility
-struct simdutf_riscv_hwprobe { int64_t key; uint64_t value; };
-#define simdutf_riscv_hwprobe(...) syscall(258, __VA_ARGS__)
-#define SIMDUTF_RISCV_HWPROBE_KEY_IMA_EXT_0 4
-#define SIMDUTF_RISCV_HWPROBE_IMA_V    (1 << 2)
-#define SIMDUTF_RISCV_HWPROBE_EXT_ZVBB (1 << 17)
+struct simdutf_riscv_hwprobe {
+  int64_t key;
+  uint64_t value;
+};
+  #define simdutf_riscv_hwprobe(...) syscall(258, __VA_ARGS__)
+  #define SIMDUTF_RISCV_HWPROBE_KEY_IMA_EXT_0 4
+  #define SIMDUTF_RISCV_HWPROBE_IMA_V (1 << 2)
+  #define SIMDUTF_RISCV_HWPROBE_EXT_ZVBB (1 << 17)
 #endif // SIMDUTF_IS_RISCV64 && defined(__linux__)
 
 namespace simdutf {
@@ -748,15 +783,16 @@ static inline uint32_t detect_supported_architectures() {
 
 static inline uint32_t detect_supported_architectures() {
   uint32_t host_isa = instruction_set::DEFAULT;
-#if SIMDUTF_IS_RVV
+  #if SIMDUTF_IS_RVV
   host_isa |= instruction_set::RVV;
-#endif
-#if SIMDUTF_IS_ZVBB
+  #endif
+  #if SIMDUTF_IS_ZVBB
   host_isa |= instruction_set::ZVBB;
-#endif
-#if defined(__linux__)
-  simdutf_riscv_hwprobe probes[] = { { SIMDUTF_RISCV_HWPROBE_KEY_IMA_EXT_0, 0 } };
-  long ret = simdutf_riscv_hwprobe(&probes, sizeof probes/sizeof *probes, 0, nullptr, 0);
+  #endif
+  #if defined(__linux__)
+  simdutf_riscv_hwprobe probes[] = {{SIMDUTF_RISCV_HWPROBE_KEY_IMA_EXT_0, 0}};
+  long ret = simdutf_riscv_hwprobe(&probes, sizeof probes / sizeof *probes, 0,
+                                   nullptr, 0);
   if (ret == 0) {
     uint64_t extensions = probes[0].value;
     if (extensions & SIMDUTF_RISCV_HWPROBE_IMA_V)
@@ -764,11 +800,11 @@ static inline uint32_t detect_supported_architectures() {
     if (extensions & SIMDUTF_RISCV_HWPROBE_EXT_ZVBB)
       host_isa |= instruction_set::ZVBB;
   }
-#endif
-#if defined(RUN_IN_SPIKE_SIMULATOR)
+  #endif
+  #if defined(RUN_IN_SPIKE_SIMULATOR)
   // Proxy Kernel does not implement yet hwprobe syscall
   host_isa |= instruction_set::RVV;
-#endif
+  #endif
   return host_isa;
 }
 
@@ -780,80 +816,82 @@ static inline uint32_t detect_supported_architectures() {
 
 #elif defined(__x86_64__) || defined(_M_AMD64) // x64
 
-
 namespace {
 namespace cpuid_bit {
-    // Can be found on Intel ISA Reference for CPUID
-
-    // EAX = 0x01
-    constexpr uint32_t pclmulqdq = uint32_t(1) << 1; ///< @private bit  1 of ECX for EAX=0x1
-    constexpr uint32_t sse42 = uint32_t(1) << 20;    ///< @private bit 20 of ECX for EAX=0x1
-    constexpr uint32_t osxsave = (uint32_t(1) << 26) | (uint32_t(1) << 27); ///< @private bits 26+27 of ECX for EAX=0x1
-
-    // EAX = 0x7f (Structured Extended Feature Flags), ECX = 0x00 (Sub-leaf)
-    // See: "Table 3-8. Information Returned by CPUID Instruction"
-    namespace ebx {
-      constexpr uint32_t bmi1 = uint32_t(1) << 3;
-      constexpr uint32_t avx2 = uint32_t(1) << 5;
-      constexpr uint32_t bmi2 = uint32_t(1) << 8;
-      constexpr uint32_t avx512f = uint32_t(1) << 16;
-      constexpr uint32_t avx512dq = uint32_t(1) << 17;
-      constexpr uint32_t avx512ifma = uint32_t(1) << 21;
-      constexpr uint32_t avx512cd = uint32_t(1) << 28;
-      constexpr uint32_t avx512bw = uint32_t(1) << 30;
-      constexpr uint32_t avx512vl = uint32_t(1) << 31;
-    }
-
-    namespace ecx {
-      constexpr uint32_t avx512vbmi = uint32_t(1) << 1;
-      constexpr uint32_t avx512vbmi2 = uint32_t(1) << 6;
-      constexpr uint32_t avx512vnni = uint32_t(1) << 11;
-      constexpr uint32_t avx512bitalg = uint32_t(1) << 12;
-      constexpr uint32_t avx512vpopcnt = uint32_t(1) << 14;
-    }
-    namespace edx {
-      constexpr uint32_t avx512vp2intersect = uint32_t(1) << 8;
-    }
-    namespace xcr0_bit {
-     constexpr uint64_t avx256_saved = uint64_t(1) << 2; ///< @private bit 2 = AVX
-     constexpr uint64_t avx512_saved = uint64_t(7) << 5; ///< @private bits 5,6,7 = opmask, ZMM_hi256, hi16_ZMM
-   }
-  }
+// These bit positions are documented in the Intel ISA reference for CPUID
+
+// EAX = 0x01
+constexpr uint32_t pclmulqdq = uint32_t(1)
+                               << 1; ///< @private bit  1 of ECX for EAX=0x1
+constexpr uint32_t sse42 = uint32_t(1)
+                           << 20; ///< @private bit 20 of ECX for EAX=0x1
+constexpr uint32_t osxsave =
+    (uint32_t(1) << 26) |
+    (uint32_t(1) << 27); ///< @private bits 26+27 of ECX for EAX=0x1
+
+// EAX = 0x07 (Structured Extended Feature Flags), ECX = 0x00 (Sub-leaf)
+// See: "Table 3-8. Information Returned by CPUID Instruction"
+namespace ebx {
+constexpr uint32_t bmi1 = uint32_t(1) << 3;
+constexpr uint32_t avx2 = uint32_t(1) << 5;
+constexpr uint32_t bmi2 = uint32_t(1) << 8;
+constexpr uint32_t avx512f = uint32_t(1) << 16;
+constexpr uint32_t avx512dq = uint32_t(1) << 17;
+constexpr uint32_t avx512ifma = uint32_t(1) << 21;
+constexpr uint32_t avx512cd = uint32_t(1) << 28;
+constexpr uint32_t avx512bw = uint32_t(1) << 30;
+constexpr uint32_t avx512vl = uint32_t(1) << 31;
+} // namespace ebx
+
+namespace ecx {
+constexpr uint32_t avx512vbmi = uint32_t(1) << 1;
+constexpr uint32_t avx512vbmi2 = uint32_t(1) << 6;
+constexpr uint32_t avx512vnni = uint32_t(1) << 11;
+constexpr uint32_t avx512bitalg = uint32_t(1) << 12;
+constexpr uint32_t avx512vpopcnt = uint32_t(1) << 14;
+} // namespace ecx
+namespace edx {
+constexpr uint32_t avx512vp2intersect = uint32_t(1) << 8;
 }
-
-
+namespace xcr0_bit {
+constexpr uint64_t avx256_saved = uint64_t(1) << 2; ///< @private bit 2 = AVX
+constexpr uint64_t avx512_saved =
+    uint64_t(7) << 5; ///< @private bits 5,6,7 = opmask, ZMM_hi256, hi16_ZMM
+} // namespace xcr0_bit
+} // namespace cpuid_bit
+} // namespace
 
 static inline void cpuid(uint32_t *eax, uint32_t *ebx, uint32_t *ecx,
                          uint32_t *edx) {
-#if defined(_MSC_VER)
+  #if defined(_MSC_VER)
   int cpu_info[4];
   __cpuidex(cpu_info, *eax, *ecx);
   *eax = cpu_info[0];
   *ebx = cpu_info[1];
   *ecx = cpu_info[2];
   *edx = cpu_info[3];
-#elif defined(HAVE_GCC_GET_CPUID) && defined(USE_GCC_GET_CPUID)
+  #elif defined(HAVE_GCC_GET_CPUID) && defined(USE_GCC_GET_CPUID)
   uint32_t level = *eax;
   __get_cpuid(level, eax, ebx, ecx, edx);
-#else
+  #else
   uint32_t a = *eax, b, c = *ecx, d;
   asm volatile("cpuid\n\t" : "+a"(a), "=b"(b), "+c"(c), "=d"(d));
   *eax = a;
   *ebx = b;
   *ecx = c;
   *edx = d;
-#endif
+  #endif
 }
 
 static inline uint64_t xgetbv() {
- #if defined(_MSC_VER)
-   return _xgetbv(0);
- #else
-   uint32_t xcr0_lo, xcr0_hi;
-   asm volatile("xgetbv\n\t" : "=a" (xcr0_lo), "=d" (xcr0_hi) : "c" (0));
-   return xcr0_lo | ((uint64_t)xcr0_hi << 32);
- #endif
- }
+  #if defined(_MSC_VER)
+  return _xgetbv(0);
+  #else
+  uint32_t xcr0_lo, xcr0_hi;
+  asm volatile("xgetbv\n\t" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
+  return xcr0_lo | ((uint64_t)xcr0_hi << 32);
+  #endif
+}
 
 static inline uint32_t detect_supported_architectures() {
   uint32_t eax;
@@ -897,7 +935,8 @@ static inline uint32_t detect_supported_architectures() {
   if (ebx & cpuid_bit::ebx::bmi2) {
     host_isa |= instruction_set::BMI2;
   }
-  if (!((xcr0 & cpuid_bit::xcr0_bit::avx512_saved) == cpuid_bit::xcr0_bit::avx512_saved)) {
+  if (!((xcr0 & cpuid_bit::xcr0_bit::avx512_saved) ==
+        cpuid_bit::xcr0_bit::avx512_saved)) {
     return host_isa;
   }
   if (ebx & cpuid_bit::ebx::avx512f) {
@@ -930,7 +969,6 @@ static inline uint32_t detect_supported_architectures() {
   return instruction_set::DEFAULT;
 }
 
-
 #endif // end SIMD extension detection code
 
 } // namespace internal
@@ -939,7 +977,6 @@ static inline uint32_t detect_supported_architectures() {
 #endif // SIMDutf_INTERNAL_ISADETECTION_H
 /* end file include/simdutf/internal/isadetection.h */
 
-
 namespace simdutf {
 
 /**
@@ -952,8 +989,10 @@ namespace simdutf {
  * @param length the length of the string in bytes.
  * @return the detected encoding type
  */
-simdutf_warn_unused simdutf::encoding_type autodetect_encoding(const char * input, size_t length) noexcept;
-simdutf_really_inline simdutf_warn_unused simdutf::encoding_type autodetect_encoding(const uint8_t * input, size_t length) noexcept {
+simdutf_warn_unused simdutf::encoding_type
+autodetect_encoding(const char *input, size_t length) noexcept;
+simdutf_really_inline simdutf_warn_unused simdutf::encoding_type
+autodetect_encoding(const uint8_t *input, size_t length) noexcept {
   return autodetect_encoding(reinterpret_cast<const char *>(input), length);
 }
 
@@ -968,8 +1007,10 @@ simdutf_really_inline simdutf_warn_unused simdutf::encoding_type autodetect_enco
  * @param length the length of the string in bytes.
  * @return the detected encoding type
  */
-simdutf_warn_unused int detect_encodings(const char * input, size_t length) noexcept;
-simdutf_really_inline simdutf_warn_unused int detect_encodings(const uint8_t * input, size_t length) noexcept {
+simdutf_warn_unused int detect_encodings(const char *input,
+                                         size_t length) noexcept;
+simdutf_really_inline simdutf_warn_unused int
+detect_encodings(const uint8_t *input, size_t length) noexcept {
   return detect_encodings(reinterpret_cast<const char *>(input), length);
 }
 
@@ -993,9 +1034,13 @@ simdutf_warn_unused bool validate_utf8(const char *buf, size_t len) noexcept;
  *
  * @param buf the UTF-8 string to validate.
  * @param len the length of the string in bytes.
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of code units validated if
+ * successful.
  */
-simdutf_warn_unused result validate_utf8_with_errors(const char *buf, size_t len) noexcept;
+simdutf_warn_unused result validate_utf8_with_errors(const char *buf,
+                                                     size_t len) noexcept;
 
 /**
  * Validate the ASCII string.
@@ -1016,24 +1061,30 @@ simdutf_warn_unused bool validate_ascii(const char *buf, size_t len) noexcept;
  *
  * @param buf the ASCII string to validate.
  * @param len the length of the string in bytes.
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of code units validated if
+ * successful.
  */
-simdutf_warn_unused result validate_ascii_with_errors(const char *buf, size_t len) noexcept;
+simdutf_warn_unused result validate_ascii_with_errors(const char *buf,
+                                                      size_t len) noexcept;
 
 /**
  * Using native endianness; Validate the UTF-16 string.
- * This function may be best when you expect the input to be almost always valid.
- * Otherwise, consider using validate_utf16_with_errors.
+ * This function may be best when you expect the input to be almost always
+ * valid. Otherwise, consider using validate_utf16_with_errors.
  *
  * Overridden by each implementation.
  *
  * This function is not BOM-aware.
  *
  * @param buf the UTF-16 string to validate.
- * @param len the length of the string in number of 2-byte code units (char16_t).
+ * @param len the length of the string in number of 2-byte code units
+ * (char16_t).
  * @return true if and only if the string is valid UTF-16.
  */
-simdutf_warn_unused bool validate_utf16(const char16_t *buf, size_t len) noexcept;
+simdutf_warn_unused bool validate_utf16(const char16_t *buf,
+                                        size_t len) noexcept;
 
 /**
  * Validate the UTF-16LE string. This function may be best when you expect
@@ -1045,10 +1096,12 @@ simdutf_warn_unused bool validate_utf16(const char16_t *buf, size_t len) noexcep
  * This function is not BOM-aware.
  *
  * @param buf the UTF-16LE string to validate.
- * @param len the length of the string in number of 2-byte code units (char16_t).
+ * @param len the length of the string in number of 2-byte code units
+ * (char16_t).
  * @return true if and only if the string is valid UTF-16LE.
  */
-simdutf_warn_unused bool validate_utf16le(const char16_t *buf, size_t len) noexcept;
+simdutf_warn_unused bool validate_utf16le(const char16_t *buf,
+                                          size_t len) noexcept;
 
 /**
  * Validate the UTF-16BE string. This function may be best when you expect
@@ -1060,24 +1113,32 @@ simdutf_warn_unused bool validate_utf16le(const char16_t *buf, size_t len) noexc
  * This function is not BOM-aware.
  *
  * @param buf the UTF-16BE string to validate.
- * @param len the length of the string in number of 2-byte code units (char16_t).
+ * @param len the length of the string in number of 2-byte code units
+ * (char16_t).
  * @return true if and only if the string is valid UTF-16BE.
  */
-simdutf_warn_unused bool validate_utf16be(const char16_t *buf, size_t len) noexcept;
+simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
+                                          size_t len) noexcept;
 
 /**
  * Using native endianness; Validate the UTF-16 string and stop on error.
- * It might be faster than validate_utf16 when an error is expected to occur early.
+ * It might be faster than validate_utf16 when an error is expected to occur
+ * early.
  *
  * Overridden by each implementation.
  *
  * This function is not BOM-aware.
  *
  * @param buf the UTF-16 string to validate.
- * @param len the length of the string in number of 2-byte code units (char16_t).
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+ * @param len the length of the string in number of 2-byte code units
+ * (char16_t).
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of code units validated if
+ * successful.
  */
-simdutf_warn_unused result validate_utf16_with_errors(const char16_t *buf, size_t len) noexcept;
+simdutf_warn_unused result validate_utf16_with_errors(const char16_t *buf,
+                                                      size_t len) noexcept;
 
 /**
  * Validate the UTF-16LE string and stop on error. It might be faster than
@@ -1088,10 +1149,15 @@ simdutf_warn_unused result validate_utf16_with_errors(const char16_t *buf, size_
  * This function is not BOM-aware.
  *
  * @param buf the UTF-16LE string to validate.
- * @param len the length of the string in number of 2-byte code units (char16_t).
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+ * @param len the length of the string in number of 2-byte code units
+ * (char16_t).
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of code units validated if
+ * successful.
  */
-simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf, size_t len) noexcept;
+simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf,
+                                                        size_t len) noexcept;
 
 /**
  * Validate the UTF-16BE string and stop on error. It might be faster than
@@ -1102,10 +1168,15 @@ simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf, siz
  * This function is not BOM-aware.
  *
  * @param buf the UTF-16BE string to validate.
- * @param len the length of the string in number of 2-byte code units (char16_t).
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+ * @param len the length of the string in number of 2-byte code units
+ * (char16_t).
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of code units validated if
+ * successful.
  */
-simdutf_warn_unused result validate_utf16be_with_errors(const char16_t *buf, size_t len) noexcept;
+simdutf_warn_unused result validate_utf16be_with_errors(const char16_t *buf,
+                                                        size_t len) noexcept;
 
 /**
  * Validate the UTF-32 string. This function may be best when you expect
@@ -1117,10 +1188,12 @@ simdutf_warn_unused result validate_utf16be_with_errors(const char16_t *buf, siz
  * This function is not BOM-aware.
  *
  * @param buf the UTF-32 string to validate.
- * @param len the length of the string in number of 4-byte code units (char32_t).
+ * @param len the length of the string in number of 4-byte code units
+ * (char32_t).
  * @return true if and only if the string is valid UTF-32.
  */
-simdutf_warn_unused bool validate_utf32(const char32_t *buf, size_t len) noexcept;
+simdutf_warn_unused bool validate_utf32(const char32_t *buf,
+                                        size_t len) noexcept;
 
 /**
  * Validate the UTF-32 string and stop on error. It might be faster than
@@ -1131,10 +1204,15 @@ simdutf_warn_unused bool validate_utf32(const char32_t *buf, size_t len) noexcep
  * This function is not BOM-aware.
  *
  * @param buf the UTF-32 string to validate.
- * @param len the length of the string in number of 4-byte code units (char32_t).
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+ * @param len the length of the string in number of 4-byte code units
+ * (char32_t).
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of code units validated if
+ * successful.
  */
-simdutf_warn_unused result validate_utf32_with_errors(const char32_t *buf, size_t len) noexcept;
+simdutf_warn_unused result validate_utf32_with_errors(const char32_t *buf,
+                                                      size_t len) noexcept;
 
 /**
  * Convert Latin1 string into UTF8 string.
@@ -1146,7 +1224,9 @@ simdutf_warn_unused result validate_utf32_with_errors(const char32_t *buf, size_
  * @param utf8_output   the pointer to buffer that can hold conversion result
  * @return the number of written char; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_latin1_to_utf8(const char * input, size_t length, char* utf8_output) noexcept;
+simdutf_warn_unused size_t convert_latin1_to_utf8(const char *input,
+                                                  size_t length,
+                                                  char *utf8_output) noexcept;
 
 /**
  * Convert Latin1 string into UTF8 string with output limit.
@@ -1159,7 +1239,9 @@ simdutf_warn_unused size_t convert_latin1_to_utf8(const char * input, size_t len
  * @param utf8_len      the maximum output length
  * @return the number of written char; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_latin1_to_utf8_safe(const char * input, size_t length, char* utf8_output, size_t utf8_len) noexcept;
+simdutf_warn_unused size_t
+convert_latin1_to_utf8_safe(const char *input, size_t length, char *utf8_output,
+                            size_t utf8_len) noexcept;
 
 /**
  * Convert possibly Latin1 string into UTF-16LE string.
@@ -1171,7 +1253,8 @@ simdutf_warn_unused size_t convert_latin1_to_utf8_safe(const char * input, size_
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
  * @return the number of written char16_t; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_latin1_to_utf16le(const char * input, size_t length, char16_t* utf16_output) noexcept;
+simdutf_warn_unused size_t convert_latin1_to_utf16le(
+    const char *input, size_t length, char16_t *utf16_output) noexcept;
 
 /**
  * Convert Latin1 string into UTF-16BE string.
@@ -1183,7 +1266,8 @@ simdutf_warn_unused size_t convert_latin1_to_utf16le(const char * input, size_t
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
  * @return the number of written char16_t; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_latin1_to_utf16be(const char * input, size_t length, char16_t* utf16_output) noexcept;
+simdutf_warn_unused size_t convert_latin1_to_utf16be(
+    const char *input, size_t length, char16_t *utf16_output) noexcept;
 
 /**
  * Convert Latin1 string into UTF-32 string.
@@ -1195,7 +1279,8 @@ simdutf_warn_unused size_t convert_latin1_to_utf16be(const char * input, size_t
  * @param utf32_buffer  the pointer to buffer that can hold conversion result
  * @return the number of written char32_t; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_latin1_to_utf32(const char * input, size_t length, char32_t* utf32_buffer) noexcept;
+simdutf_warn_unused size_t convert_latin1_to_utf32(
+    const char *input, size_t length, char32_t *utf32_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-8 string into latin1 string.
@@ -1206,12 +1291,16 @@ simdutf_warn_unused size_t convert_latin1_to_utf32(const char * input, size_t le
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param latin1_output  the pointer to buffer that can hold conversion result
- * @return the number of written char; 0 if the input was not valid UTF-8 string or if it cannot be represented as Latin1
+ * @return the number of written char; 0 if the input was not valid UTF-8 string
+ * or if it cannot be represented as Latin1
  */
-simdutf_warn_unused size_t convert_utf8_to_latin1(const char * input, size_t length, char* latin1_output) noexcept;
+simdutf_warn_unused size_t convert_utf8_to_latin1(const char *input,
+                                                  size_t length,
+                                                  char *latin1_output) noexcept;
 
 /**
- * Using native endianness, convert possibly broken UTF-8 string into a UTF-16 string.
+ * Using native endianness, convert possibly broken UTF-8 string into a UTF-16
+ * string.
  *
  * During the conversion also validation of the input string is done.
  * This function is suitable to work with inputs from untrusted sources.
@@ -1219,9 +1308,11 @@ simdutf_warn_unused size_t convert_utf8_to_latin1(const char * input, size_t len
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
- * @return the number of written char16_t; 0 if the input was not valid UTF-8 string
+ * @return the number of written char16_t; 0 if the input was not valid UTF-8
+ * string
  */
-simdutf_warn_unused size_t convert_utf8_to_utf16(const char * input, size_t length, char16_t* utf16_output) noexcept;
+simdutf_warn_unused size_t convert_utf8_to_utf16(
+    const char *input, size_t length, char16_t *utf16_output) noexcept;
 
 /**
  * Using native endianness, convert a Latin1 string into a UTF-16 string.
@@ -1231,7 +1322,8 @@ simdutf_warn_unused size_t convert_utf8_to_utf16(const char * input, size_t leng
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
  * @return the number of written char16_t.
  */
-simdutf_warn_unused size_t convert_latin1_to_utf16(const char * input, size_t length, char16_t* utf16_output) noexcept;
+simdutf_warn_unused size_t convert_latin1_to_utf16(
+    const char *input, size_t length, char16_t *utf16_output) noexcept;
 
 /**
  * Convert possibly broken UTF-8 string into UTF-16LE string.
@@ -1242,9 +1334,11 @@ simdutf_warn_unused size_t convert_latin1_to_utf16(const char * input, size_t le
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
- * @return the number of written char16_t; 0 if the input was not valid UTF-8 string
+ * @return the number of written char16_t; 0 if the input was not valid UTF-8
+ * string
  */
-simdutf_warn_unused size_t convert_utf8_to_utf16le(const char * input, size_t length, char16_t* utf16_output) noexcept;
+simdutf_warn_unused size_t convert_utf8_to_utf16le(
+    const char *input, size_t length, char16_t *utf16_output) noexcept;
 
 /**
  * Convert possibly broken UTF-8 string into UTF-16BE string.
@@ -1255,25 +1349,30 @@ simdutf_warn_unused size_t convert_utf8_to_utf16le(const char * input, size_t le
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
- * @return the number of written char16_t; 0 if the input was not valid UTF-8 string
+ * @return the number of written char16_t; 0 if the input was not valid UTF-8
+ * string
  */
-simdutf_warn_unused size_t convert_utf8_to_utf16be(const char * input, size_t length, char16_t* utf16_output) noexcept;
+simdutf_warn_unused size_t convert_utf8_to_utf16be(
+    const char *input, size_t length, char16_t *utf16_output) noexcept;
 
-
-  /**
-   * Convert possibly broken UTF-8 string into latin1 string with errors.
-   * If the string cannot be represented as Latin1, an error
-   * code is returned.
-   *
-   * During the conversion also validation of the input string is done.
-   * This function is suitable to work with inputs from untrusted sources.
-   *
-   * @param input         the UTF-8 string to convert
-   * @param length        the length of the string in bytes
-   * @param latin1_output  the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
-   */
-  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(const char * input, size_t length, char* latin1_output) noexcept;
+/**
+ * Convert possibly broken UTF-8 string into latin1 string with errors.
+ * If the string cannot be represented as Latin1, an error
+ * code is returned.
+ *
+ * During the conversion also validation of the input string is done.
+ * This function is suitable to work with inputs from untrusted sources.
+ *
+ * @param input         the UTF-8 string to convert
+ * @param length        the length of the string in bytes
+ * @param latin1_output  the pointer to buffer that can hold conversion result
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of code units validated if
+ * successful.
+ */
+simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+    const char *input, size_t length, char *latin1_output) noexcept;
 
 /**
  * Using native endianness, convert possibly broken UTF-8 string into UTF-16
@@ -1285,9 +1384,13 @@ simdutf_warn_unused size_t convert_utf8_to_utf16be(const char * input, size_t le
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char16_t written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char16_t written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf8_to_utf16_with_errors(const char * input, size_t length, char16_t* utf16_output) noexcept;
+simdutf_warn_unused result convert_utf8_to_utf16_with_errors(
+    const char *input, size_t length, char16_t *utf16_output) noexcept;
 
 /**
  * Convert possibly broken UTF-8 string into UTF-16LE string and stop on error.
@@ -1298,9 +1401,13 @@ simdutf_warn_unused result convert_utf8_to_utf16_with_errors(const char * input,
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char16_t written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char16_t written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(const char * input, size_t length, char16_t* utf16_output) noexcept;
+simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+    const char *input, size_t length, char16_t *utf16_output) noexcept;
 
 /**
  * Convert possibly broken UTF-8 string into UTF-16BE string and stop on error.
@@ -1311,9 +1418,13 @@ simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(const char * inpu
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char16_t written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char16_t written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(const char * input, size_t length, char16_t* utf16_output) noexcept;
+simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+    const char *input, size_t length, char16_t *utf16_output) noexcept;
 
 /**
  * Convert possibly broken UTF-8 string into UTF-32 string.
@@ -1324,9 +1435,11 @@ simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(const char * inpu
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param utf32_buffer  the pointer to buffer that can hold conversion result
- * @return the number of written char32_t; 0 if the input was not valid UTF-8 string
+ * @return the number of written char32_t; 0 if the input was not valid UTF-8
+ * string
  */
-simdutf_warn_unused size_t convert_utf8_to_utf32(const char * input, size_t length, char32_t* utf32_output) noexcept;
+simdutf_warn_unused size_t convert_utf8_to_utf32(
+    const char *input, size_t length, char32_t *utf32_output) noexcept;
 
 /**
  * Convert possibly broken UTF-8 string into UTF-32 string and stop on error.
@@ -1337,18 +1450,25 @@ simdutf_warn_unused size_t convert_utf8_to_utf32(const char * input, size_t leng
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param utf32_buffer  the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char32_t written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char32_t written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf8_to_utf32_with_errors(const char * input, size_t length, char32_t* utf32_output) noexcept;
+simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+    const char *input, size_t length, char32_t *utf32_output) noexcept;
 
 /**
  * Convert valid UTF-8 string into latin1 string.
  *
- * This function assumes that the input string is valid UTF-8 and that it can be represented as Latin1.
- * If you violate this assumption, the result is implementation defined and may include system-dependent behavior such as crashes.
+ * This function assumes that the input string is valid UTF-8 and that it can be
+ * represented as Latin1. If you violate this assumption, the result is
+ * implementation defined and may include system-dependent behavior such as
+ * crashes.
  *
- * This function is for expert users only and not part of our public API. Use convert_utf8_to_latin1 instead.
- * The function may be removed from the library in the future.
+ * This function is for expert users only and not part of our public API. Use
+ * convert_utf8_to_latin1 instead. The function may be removed from the library
+ * in the future.
  *
  * This function is not BOM-aware.
  *
@@ -1357,8 +1477,8 @@ simdutf_warn_unused result convert_utf8_to_utf32_with_errors(const char * input,
  * @param latin1_output  the pointer to buffer that can hold conversion result
  * @return the number of written char; 0 if the input was not valid UTF-8 string
  */
-simdutf_warn_unused size_t convert_valid_utf8_to_latin1(const char * input, size_t length, char* latin1_output) noexcept;
-
+simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+    const char *input, size_t length, char *latin1_output) noexcept;
 
 /**
  * Using native endianness, convert valid UTF-8 string into a UTF-16 string.
@@ -1370,7 +1490,8 @@ simdutf_warn_unused size_t convert_valid_utf8_to_latin1(const char * input, size
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
  * @return the number of written char16_t
  */
-simdutf_warn_unused size_t convert_valid_utf8_to_utf16(const char * input, size_t length, char16_t* utf16_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf8_to_utf16(
+    const char *input, size_t length, char16_t *utf16_buffer) noexcept;
 
 /**
  * Convert valid UTF-8 string into UTF-16LE string.
@@ -1382,7 +1503,8 @@ simdutf_warn_unused size_t convert_valid_utf8_to_utf16(const char * input, size_
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
  * @return the number of written char16_t
  */
-simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(const char * input, size_t length, char16_t* utf16_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+    const char *input, size_t length, char16_t *utf16_buffer) noexcept;
 
 /**
  * Convert valid UTF-8 string into UTF-16BE string.
@@ -1394,7 +1516,8 @@ simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(const char * input, siz
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
  * @return the number of written char16_t
  */
-simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(const char * input, size_t length, char16_t* utf16_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+    const char *input, size_t length, char16_t *utf16_buffer) noexcept;
 
 /**
  * Convert valid UTF-8 string into UTF-32 string.
@@ -1406,23 +1529,26 @@ simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(const char * input, siz
  * @param utf32_buffer  the pointer to buffer that can hold conversion result
  * @return the number of written char32_t
  */
-simdutf_warn_unused size_t convert_valid_utf8_to_utf32(const char * input, size_t length, char32_t* utf32_buffer) noexcept;
-
+simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+    const char *input, size_t length, char32_t *utf32_buffer) noexcept;
 
 /**
- * Return the number of bytes that this Latin1 string would require in UTF-8 format.
+ * Return the number of bytes that this Latin1 string would require in UTF-8
+ * format.
  *
  * @param input         the Latin1 string to convert
  * @param length        the length of the string bytes
  * @return the number of bytes required to encode the Latin1 string as UTF-8
  */
-simdutf_warn_unused size_t utf8_length_from_latin1(const char * input, size_t length) noexcept;
+simdutf_warn_unused size_t utf8_length_from_latin1(const char *input,
+                                                   size_t length) noexcept;
 
 /**
- * Compute the number of bytes that this UTF-8 string would require in Latin1 format.
+ * Compute the number of bytes that this UTF-8 string would require in Latin1
+ * format.
  *
- * This function does not validate the input. It is acceptable to pass invalid UTF-8 strings but in such cases
-   * the result is implementation defined.
+ * This function does not validate the input. It is acceptable to pass invalid
+ * UTF-8 strings but in such cases the result is implementation defined.
  *
  * This function is not BOM-aware.
  *
@@ -1430,40 +1556,48 @@ simdutf_warn_unused size_t utf8_length_from_latin1(const char * input, size_t le
  * @param length        the length of the string in byte
  * @return the number of bytes required to encode the UTF-8 string as Latin1
  */
-simdutf_warn_unused size_t latin1_length_from_utf8(const char * input, size_t length) noexcept;
+simdutf_warn_unused size_t latin1_length_from_utf8(const char *input,
+                                                   size_t length) noexcept;
 
 /**
- * Compute the number of 2-byte code units that this UTF-8 string would require in UTF-16LE format.
+ * Compute the number of 2-byte code units that this UTF-8 string would require
+ * in UTF-16LE format.
  *
- * This function does not validate the input. It is acceptable to pass invalid UTF-8 strings but in such cases
- * the result is implementation defined.
+ * This function does not validate the input. It is acceptable to pass invalid
+ * UTF-8 strings but in such cases the result is implementation defined.
  *
  * This function is not BOM-aware.
  *
  * @param input         the UTF-8 string to process
  * @param length        the length of the string in bytes
- * @return the number of char16_t code units required to encode the UTF-8 string as UTF-16LE
+ * @return the number of char16_t code units required to encode the UTF-8 string
+ * as UTF-16LE
  */
-simdutf_warn_unused size_t utf16_length_from_utf8(const char * input, size_t length) noexcept;
+simdutf_warn_unused size_t utf16_length_from_utf8(const char *input,
+                                                  size_t length) noexcept;
 
 /**
- * Compute the number of 4-byte code units that this UTF-8 string would require in UTF-32 format.
+ * Compute the number of 4-byte code units that this UTF-8 string would require
+ * in UTF-32 format.
  *
  * This function is equivalent to count_utf8
  *
- * This function does not validate the input. It is acceptable to pass invalid UTF-8 strings but in such cases
- * the result is implementation defined.
+ * This function does not validate the input. It is acceptable to pass invalid
+ * UTF-8 strings but in such cases the result is implementation defined.
  *
  * This function is not BOM-aware.
  *
  * @param input         the UTF-8 string to process
  * @param length        the length of the string in bytes
- * @return the number of char32_t code units required to encode the UTF-8 string as UTF-32
+ * @return the number of char32_t code units required to encode the UTF-8 string
+ * as UTF-32
  */
-simdutf_warn_unused size_t utf32_length_from_utf8(const char * input, size_t length) noexcept;
+simdutf_warn_unused size_t utf32_length_from_utf8(const char *input,
+                                                  size_t length) noexcept;
 
 /**
- * Using native endianness, convert possibly broken UTF-16 string into UTF-8 string.
+ * Using native endianness, convert possibly broken UTF-16 string into UTF-8
+ * string.
  *
  * During the conversion also validation of the input string is done.
  * This function is suitable to work with inputs from untrusted sources.
@@ -1473,14 +1607,16 @@ simdutf_warn_unused size_t utf32_length_from_utf8(const char * input, size_t len
  * @param input         the UTF-16 string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf8_buffer   the pointer to buffer that can hold conversion result
- * @return number of written code units; 0 if input is not a valid UTF-16LE string
+ * @return number of written code units; 0 if input is not a valid UTF-16
+ * string
  */
-simdutf_warn_unused size_t convert_utf16_to_utf8(const char16_t * input, size_t length, char* utf8_buffer) noexcept;
-
-
+simdutf_warn_unused size_t convert_utf16_to_utf8(const char16_t *input,
+                                                 size_t length,
+                                                 char *utf8_buffer) noexcept;
 
 /**
- * Using native endianness, convert possibly broken UTF-16 string into Latin1 string.
+ * Using native endianness, convert possibly broken UTF-16 string into Latin1
+ * string.
  *
  * During the conversion also validation of the input string is done.
  * This function is suitable to work with inputs from untrusted sources.
@@ -1490,9 +1626,11 @@ simdutf_warn_unused size_t convert_utf16_to_utf8(const char16_t * input, size_t
  * @param input         the UTF-16 string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
- * @return number of written code units; 0 if input is not a valid UTF-16 string  or if it cannot be represented as Latin1
+ * @return number of written code units; 0 if input is not a valid UTF-16 string
+ * or if it cannot be represented as Latin1
  */
-simdutf_warn_unused size_t convert_utf16_to_latin1(const char16_t * input, size_t length, char* latin1_buffer) noexcept;
+simdutf_warn_unused size_t convert_utf16_to_latin1(
+    const char16_t *input, size_t length, char *latin1_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-16LE string into Latin1 string.
@@ -1507,9 +1645,11 @@ simdutf_warn_unused size_t convert_utf16_to_latin1(const char16_t * input, size_
  * @param input         the UTF-16LE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
- * @return number of written code units; 0 if input is not a valid UTF-16LE string or if it cannot be represented as Latin1
+ * @return number of written code units; 0 if input is not a valid UTF-16LE
+ * string or if it cannot be represented as Latin1
  */
-simdutf_warn_unused size_t convert_utf16le_to_latin1(const char16_t * input, size_t length, char* latin1_buffer) noexcept;
+simdutf_warn_unused size_t convert_utf16le_to_latin1(
+    const char16_t *input, size_t length, char *latin1_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-16BE string into Latin1 string.
@@ -1522,10 +1662,11 @@ simdutf_warn_unused size_t convert_utf16le_to_latin1(const char16_t * input, siz
  * @param input         the UTF-16BE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
- * @return number of written code units; 0 if input is not a valid UTF-16BE string or if it cannot be represented as Latin1
+ * @return number of written code units; 0 if input is not a valid UTF-16BE
+ * string or if it cannot be represented as Latin1
  */
-simdutf_warn_unused size_t convert_utf16be_to_latin1(const char16_t * input, size_t length, char* latin1_buffer) noexcept;
-
+simdutf_warn_unused size_t convert_utf16be_to_latin1(
+    const char16_t *input, size_t length, char *latin1_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-16LE string into UTF-8 string.
@@ -1538,9 +1679,12 @@ simdutf_warn_unused size_t convert_utf16be_to_latin1(const char16_t * input, siz
  * @param input         the UTF-16LE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf8_buffer   the pointer to buffer that can hold conversion result
- * @return number of written code units; 0 if input is not a valid UTF-16LE string
+ * @return number of written code units; 0 if input is not a valid UTF-16LE
+ * string
  */
-simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t * input, size_t length, char* utf8_buffer) noexcept;
+simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t *input,
+                                                   size_t length,
+                                                   char *utf8_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-16BE string into UTF-8 string.
@@ -1553,12 +1697,16 @@ simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t * input, size_
  * @param input         the UTF-16BE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf8_buffer   the pointer to buffer that can hold conversion result
- * @return number of written code units; 0 if input is not a valid UTF-16LE string
+ * @return number of written code units; 0 if input is not a valid UTF-16BE
+ * string
  */
-simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t * input, size_t length, char* utf8_buffer) noexcept;
+simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t *input,
+                                                   size_t length,
+                                                   char *utf8_buffer) noexcept;
 
 /**
- * Using native endianness, convert possibly broken UTF-16 string into Latin1 string.
+ * Using native endianness, convert possibly broken UTF-16 string into Latin1
+ * string.
  *
  * During the conversion also validation of the input string is done.
  * This function is suitable to work with inputs from untrusted sources.
@@ -1567,9 +1715,13 @@ simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t * input, size_
  * @param input         the UTF-16 string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf16_to_latin1_with_errors(const char16_t * input, size_t length, char* latin1_buffer) noexcept;
+simdutf_warn_unused result convert_utf16_to_latin1_with_errors(
+    const char16_t *input, size_t length, char *latin1_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-16LE string into Latin1 string.
@@ -1581,9 +1733,13 @@ simdutf_warn_unused result convert_utf16_to_latin1_with_errors(const char16_t *
  * @param input         the UTF-16LE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(const char16_t * input, size_t length, char* latin1_buffer) noexcept;
+simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+    const char16_t *input, size_t length, char *latin1_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-16BE string into Latin1 string.
@@ -1597,13 +1753,17 @@ simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(const char16_t
  * @param input         the UTF-16BE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(const char16_t * input, size_t length, char* latin1_buffer) noexcept;
-
+simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+    const char16_t *input, size_t length, char *latin1_buffer) noexcept;
 
 /**
- * Using native endianness, convert possibly broken UTF-16 string into UTF-8 string and stop on error.
+ * Using native endianness, convert possibly broken UTF-16 string into UTF-8
+ * string and stop on error.
  *
  * During the conversion also validation of the input string is done.
  * This function is suitable to work with inputs from untrusted sources.
@@ -1613,9 +1773,13 @@ simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(const char16_t
  * @param input         the UTF-16 string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf8_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf16_to_utf8_with_errors(const char16_t * input, size_t length, char* utf8_buffer) noexcept;
+simdutf_warn_unused result convert_utf16_to_utf8_with_errors(
+    const char16_t *input, size_t length, char *utf8_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-16LE string into UTF-8 string and stop on error.
@@ -1628,9 +1792,13 @@ simdutf_warn_unused result convert_utf16_to_utf8_with_errors(const char16_t * in
  * @param input         the UTF-16LE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf8_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(const char16_t * input, size_t length, char* utf8_buffer) noexcept;
+simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+    const char16_t *input, size_t length, char *utf8_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-16BE string into UTF-8 string and stop on error.
@@ -1643,9 +1811,13 @@ simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(const char16_t *
  * @param input         the UTF-16BE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf8_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(const char16_t * input, size_t length, char* utf8_buffer) noexcept;
+simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+    const char16_t *input, size_t length, char *utf8_buffer) noexcept;
 
 /**
  * Using native endianness, convert valid UTF-16 string into UTF-8 string.
@@ -1656,20 +1828,24 @@ simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(const char16_t *
  *
  * @param input         the UTF-16 string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
- * @param utf8_buffer   the pointer to buffer that can hold the conversion result
+ * @param utf8_buffer   the pointer to buffer that can hold the conversion
+ * result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf16_to_utf8(const char16_t * input, size_t length, char* utf8_buffer) noexcept;
-
+simdutf_warn_unused size_t convert_valid_utf16_to_utf8(
+    const char16_t *input, size_t length, char *utf8_buffer) noexcept;
 
 /**
  * Using native endianness, convert UTF-16 string into Latin1 string.
  *
- * This function assumes that the input string is valid UTF-16 and that it can be represented as Latin1.
- * If you violate this assumption, the result is implementation defined and may include system-dependent behavior such as crashes.
+ * This function assumes that the input string is valid UTF-16 and that it can
+ * be represented as Latin1. If you violate this assumption, the result is
+ * implementation defined and may include system-dependent behavior such as
+ * crashes.
  *
- * This function is for expert users only and not part of our public API. Use convert_utf16_to_latin1 instead.
- * The function may be removed from the library in the future.
+ * This function is for expert users only and not part of our public API. Use
+ * convert_utf16_to_latin1 instead. The function may be removed from the library
+ * in the future.
  *
  * This function is not BOM-aware.
  *
@@ -1678,16 +1854,20 @@ simdutf_warn_unused size_t convert_valid_utf16_to_utf8(const char16_t * input, s
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf16_to_latin1(const char16_t * input, size_t length, char* latin1_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf16_to_latin1(
+    const char16_t *input, size_t length, char *latin1_buffer) noexcept;
 
 /**
  * Convert valid UTF-16LE string into Latin1 string.
  *
- * This function assumes that the input string is valid UTF-16LE and that it can be represented as Latin1.
- * If you violate this assumption, the result is implementation defined and may include system-dependent behavior such as crashes.
+ * This function assumes that the input string is valid UTF-16LE and that it can
+ * be represented as Latin1. If you violate this assumption, the result is
+ * implementation defined and may include system-dependent behavior such as
+ * crashes.
  *
- * This function is for expert users only and not part of our public API. Use convert_utf16le_to_latin1 instead.
- * The function may be removed from the library in the future.
+ * This function is for expert users only and not part of our public API. Use
+ * convert_utf16le_to_latin1 instead. The function may be removed from the
+ * library in the future.
  *
  * This function is not BOM-aware.
  *
@@ -1696,16 +1876,20 @@ simdutf_warn_unused size_t convert_valid_utf16_to_latin1(const char16_t * input,
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(const char16_t * input, size_t length, char* latin1_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(
+    const char16_t *input, size_t length, char *latin1_buffer) noexcept;
 
 /**
  * Convert valid UTF-16BE string into Latin1 string.
  *
- * This function assumes that the input string is valid UTF-16BE and that it can be represented as Latin1.
- * If you violate this assumption, the result is implementation defined and may include system-dependent behavior such as crashes.
+ * This function assumes that the input string is valid UTF-16BE and that it can
+ * be represented as Latin1. If you violate this assumption, the result is
+ * implementation defined and may include system-dependent behavior such as
+ * crashes.
  *
- * This function is for expert users only and not part of our public API. Use convert_utf16be_to_latin1 instead.
- * The function may be removed from the library in the future.
+ * This function is for expert users only and not part of our public API. Use
+ * convert_utf16be_to_latin1 instead. The function may be removed from the
+ * library in the future.
  *
  * This function is not BOM-aware.
  *
@@ -1714,22 +1898,25 @@ simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(const char16_t * inpu
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(const char16_t * input, size_t length, char* latin1_buffer) noexcept;
-
+simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(
+    const char16_t *input, size_t length, char *latin1_buffer) noexcept;
 
 /**
  * Convert valid UTF-16LE string into UTF-8 string.
  *
- * This function assumes that the input string is valid UTF-16LE and that it can be represented as Latin1.
+ * This function assumes that the input string is valid
+ * UTF-16LE.
  *
  * This function is not BOM-aware.
  *
  * @param input         the UTF-16LE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
- * @param utf8_buffer   the pointer to buffer that can hold the conversion result
+ * @param utf8_buffer   the pointer to buffer that can hold the conversion
+ * result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(const char16_t * input, size_t length, char* utf8_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+    const char16_t *input, size_t length, char *utf8_buffer) noexcept;
 
 /**
  * Convert valid UTF-16BE string into UTF-8 string.
@@ -1740,13 +1927,16 @@ simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(const char16_t * input,
  *
  * @param input         the UTF-16BE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
- * @param utf8_buffer   the pointer to buffer that can hold the conversion result
+ * @param utf8_buffer   the pointer to buffer that can hold the conversion
+ * result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(const char16_t * input, size_t length, char* utf8_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+    const char16_t *input, size_t length, char *utf8_buffer) noexcept;
 
 /**
- * Using native endianness, convert possibly broken UTF-16 string into UTF-32 string.
+ * Using native endianness, convert possibly broken UTF-16 string into UTF-32
+ * string.
  *
  * During the conversion also validation of the input string is done.
  * This function is suitable to work with inputs from untrusted sources.
@@ -1756,9 +1946,11 @@ simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(const char16_t * input,
  * @param input         the UTF-16 string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf32_buffer   the pointer to buffer that can hold conversion result
- * @return number of written code units; 0 if input is not a valid UTF-16LE string
+ * @return number of written code units; 0 if input is not a valid UTF-16
+ * string
  */
-simdutf_warn_unused size_t convert_utf16_to_utf32(const char16_t * input, size_t length, char32_t* utf32_buffer) noexcept;
+simdutf_warn_unused size_t convert_utf16_to_utf32(
+    const char16_t *input, size_t length, char32_t *utf32_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-16LE string into UTF-32 string.
@@ -1771,9 +1963,11 @@ simdutf_warn_unused size_t convert_utf16_to_utf32(const char16_t * input, size_t
  * @param input         the UTF-16LE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf32_buffer   the pointer to buffer that can hold conversion result
- * @return number of written code units; 0 if input is not a valid UTF-16LE string
+ * @return number of written code units; 0 if input is not a valid UTF-16LE
+ * string
  */
-simdutf_warn_unused size_t convert_utf16le_to_utf32(const char16_t * input, size_t length, char32_t* utf32_buffer) noexcept;
+simdutf_warn_unused size_t convert_utf16le_to_utf32(
+    const char16_t *input, size_t length, char32_t *utf32_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-16BE string into UTF-32 string.
@@ -1786,9 +1980,11 @@ simdutf_warn_unused size_t convert_utf16le_to_utf32(const char16_t * input, size
  * @param input         the UTF-16BE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf32_buffer   the pointer to buffer that can hold conversion result
- * @return number of written code units; 0 if input is not a valid UTF-16LE string
+ * @return number of written code units; 0 if input is not a valid UTF-16BE
+ * string
  */
-simdutf_warn_unused size_t convert_utf16be_to_utf32(const char16_t * input, size_t length, char32_t* utf32_buffer) noexcept;
+simdutf_warn_unused size_t convert_utf16be_to_utf32(
+    const char16_t *input, size_t length, char32_t *utf32_buffer) noexcept;
 
 /**
  * Using native endianness, convert possibly broken UTF-16 string into
@@ -1802,9 +1998,13 @@ simdutf_warn_unused size_t convert_utf16be_to_utf32(const char16_t * input, size
  * @param input         the UTF-16 string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf32_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char32_t written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char32_t written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf16_to_utf32_with_errors(const char16_t * input, size_t length, char32_t* utf32_buffer) noexcept;
+simdutf_warn_unused result convert_utf16_to_utf32_with_errors(
+    const char16_t *input, size_t length, char32_t *utf32_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-16LE string into UTF-32 string and stop on error.
@@ -1817,9 +2017,13 @@ simdutf_warn_unused result convert_utf16_to_utf32_with_errors(const char16_t * i
  * @param input         the UTF-16LE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf32_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char32_t written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char32_t written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(const char16_t * input, size_t length, char32_t* utf32_buffer) noexcept;
+simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+    const char16_t *input, size_t length, char32_t *utf32_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-16BE string into UTF-32 string and stop on error.
@@ -1832,23 +2036,30 @@ simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(const char16_t *
  * @param input         the UTF-16BE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf32_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char32_t written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char32_t written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(const char16_t * input, size_t length, char32_t* utf32_buffer) noexcept;
+simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+    const char16_t *input, size_t length, char32_t *utf32_buffer) noexcept;
 
 /**
  * Using native endianness, convert valid UTF-16 string into UTF-32 string.
  *
- * This function assumes that the input string is valid UTF-16 (native endianness).
+ * This function assumes that the input string is valid UTF-16 (native
+ * endianness).
  *
  * This function is not BOM-aware.
  *
  * @param input         the UTF-16 string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
- * @param utf32_buffer   the pointer to buffer that can hold the conversion result
+ * @param utf32_buffer   the pointer to buffer that can hold the conversion
+ * result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf16_to_utf32(const char16_t * input, size_t length, char32_t* utf32_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf16_to_utf32(
+    const char16_t *input, size_t length, char32_t *utf32_buffer) noexcept;
 
 /**
  * Convert valid UTF-16LE string into UTF-32 string.
@@ -1859,10 +2070,12 @@ simdutf_warn_unused size_t convert_valid_utf16_to_utf32(const char16_t * input,
  *
  * @param input         the UTF-16LE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
- * @param utf32_buffer   the pointer to buffer that can hold the conversion result
+ * @param utf32_buffer   the pointer to buffer that can hold the conversion
+ * result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(const char16_t * input, size_t length, char32_t* utf32_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(
+    const char16_t *input, size_t length, char32_t *utf32_buffer) noexcept;
 
 /**
  * Convert valid UTF-16BE string into UTF-32 string.
@@ -1873,17 +2086,19 @@ simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(const char16_t * input
  *
  * @param input         the UTF-16BE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
- * @param utf32_buffer   the pointer to buffer that can hold the conversion result
+ * @param utf32_buffer   the pointer to buffer that can hold the conversion
+ * result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(const char16_t * input, size_t length, char32_t* utf32_buffer) noexcept;
-
+simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(
+    const char16_t *input, size_t length, char32_t *utf32_buffer) noexcept;
 
 /*
- * Compute the number of bytes that this UTF-16LE/BE string would require in Latin1 format.
+ * Compute the number of bytes that this UTF-16LE/BE string would require in
+ * Latin1 format.
  *
- * This function does not validate the input. It is acceptable to pass invalid UTF-16 strings but in such cases
- * the result is implementation defined.
+ * This function does not validate the input. It is acceptable to pass invalid
+ * UTF-16 strings but in such cases the result is implementation defined.
  *
  * This function is not BOM-aware.
  *
@@ -1892,43 +2107,47 @@ simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(const char16_t * input
  */
 simdutf_warn_unused size_t latin1_length_from_utf16(size_t length) noexcept;
 
-
 /**
  * Using native endianness; Compute the number of bytes that this UTF-16
  * string would require in UTF-8 format.
  *
- * This function does not validate the input. It is acceptable to pass invalid UTF-16 strings but in such cases
- * the result is implementation defined.
+ * This function does not validate the input. It is acceptable to pass invalid
+ * UTF-16 strings but in such cases the result is implementation defined.
  *
  * @param input         the UTF-16 string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @return the number of bytes required to encode the UTF-16LE string as UTF-8
  */
-simdutf_warn_unused size_t utf8_length_from_utf16(const char16_t * input, size_t length) noexcept;
+simdutf_warn_unused size_t utf8_length_from_utf16(const char16_t *input,
+                                                  size_t length) noexcept;
 
 /**
- * Compute the number of bytes that this UTF-16LE string would require in UTF-8 format.
+ * Compute the number of bytes that this UTF-16LE string would require in UTF-8
+ * format.
  *
- * This function does not validate the input. It is acceptable to pass invalid UTF-16 strings but in such cases
- * the result is implementation defined.
+ * This function does not validate the input. It is acceptable to pass invalid
+ * UTF-16 strings but in such cases the result is implementation defined.
  *
  * @param input         the UTF-16LE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @return the number of bytes required to encode the UTF-16LE string as UTF-8
  */
-simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t * input, size_t length) noexcept;
+simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t *input,
+                                                    size_t length) noexcept;
 
 /**
- * Compute the number of bytes that this UTF-16BE string would require in UTF-8 format.
+ * Compute the number of bytes that this UTF-16BE string would require in UTF-8
+ * format.
  *
- * This function does not validate the input. It is acceptable to pass invalid UTF-16 strings but in such cases
- * the result is implementation defined.
+ * This function does not validate the input. It is acceptable to pass invalid
+ * UTF-16 strings but in such cases the result is implementation defined.
  *
  * @param input         the UTF-16BE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @return the number of bytes required to encode the UTF-16BE string as UTF-8
  */
-simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t * input, size_t length) noexcept;
+simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t *input,
+                                                    size_t length) noexcept;
 
 /**
  * Convert possibly broken UTF-32 string into UTF-8 string.
@@ -1943,7 +2162,9 @@ simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t * input, size
  * @param utf8_buffer   the pointer to buffer that can hold conversion result
  * @return number of written code units; 0 if input is not a valid UTF-32 string
  */
-simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t * input, size_t length, char* utf8_buffer) noexcept;
+simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t *input,
+                                                 size_t length,
+                                                 char *utf8_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-32 string into UTF-8 string and stop on error.
@@ -1956,9 +2177,13 @@ simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t * input, size_t
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
  * @param utf8_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf32_to_utf8_with_errors(const char32_t * input, size_t length, char* utf8_buffer) noexcept;
+simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+    const char32_t *input, size_t length, char *utf8_buffer) noexcept;
 
 /**
  * Convert valid UTF-32 string into UTF-8 string.
@@ -1969,13 +2194,16 @@ simdutf_warn_unused result convert_utf32_to_utf8_with_errors(const char32_t * in
  *
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
- * @param utf8_buffer   the pointer to buffer that can hold the conversion result
+ * @param utf8_buffer   the pointer to buffer that can hold the conversion
+ * result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf32_to_utf8(const char32_t * input, size_t length, char* utf8_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+    const char32_t *input, size_t length, char *utf8_buffer) noexcept;
 
 /**
- * Using native endianness, convert possibly broken UTF-32 string into a UTF-16 string.
+ * Using native endianness, convert possibly broken UTF-32 string into a UTF-16
+ * string.
  *
  * During the conversion also validation of the input string is done.
  * This function is suitable to work with inputs from untrusted sources.
@@ -1987,7 +2215,8 @@ simdutf_warn_unused size_t convert_valid_utf32_to_utf8(const char32_t * input, s
  * @param utf16_buffer   the pointer to buffer that can hold conversion result
  * @return number of written code units; 0 if input is not a valid UTF-32 string
  */
-simdutf_warn_unused size_t convert_utf32_to_utf16(const char32_t * input, size_t length, char16_t* utf16_buffer) noexcept;
+simdutf_warn_unused size_t convert_utf32_to_utf16(
+    const char32_t *input, size_t length, char16_t *utf16_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-32 string into UTF-16LE string.
@@ -2002,7 +2231,8 @@ simdutf_warn_unused size_t convert_utf32_to_utf16(const char32_t * input, size_t
  * @param utf16_buffer   the pointer to buffer that can hold conversion result
  * @return number of written code units; 0 if input is not a valid UTF-32 string
  */
-simdutf_warn_unused size_t convert_utf32_to_utf16le(const char32_t * input, size_t length, char16_t* utf16_buffer) noexcept;
+simdutf_warn_unused size_t convert_utf32_to_utf16le(
+    const char32_t *input, size_t length, char16_t *utf16_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-32 string into Latin1 string.
@@ -2015,10 +2245,11 @@ simdutf_warn_unused size_t convert_utf32_to_utf16le(const char32_t * input, size
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
- * @return number of written code units; 0 if input is not a valid UTF-32 string or if it cannot be represented as Latin1
+ * @return number of written code units; 0 if input is not a valid UTF-32 string
+ * or if it cannot be represented as Latin1
  */
-simdutf_warn_unused size_t convert_utf32_to_latin1(const char32_t * input, size_t length, char* latin1_buffer) noexcept;
-
+simdutf_warn_unused size_t convert_utf32_to_latin1(
+    const char32_t *input, size_t length, char *latin1_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-32 string into Latin1 string and stop on error.
@@ -2032,27 +2263,36 @@ simdutf_warn_unused size_t convert_utf32_to_latin1(const char32_t * input, size_
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf32_to_latin1_with_errors(const char32_t * input, size_t length, char* latin1_buffer) noexcept;
+simdutf_warn_unused result convert_utf32_to_latin1_with_errors(
+    const char32_t *input, size_t length, char *latin1_buffer) noexcept;
 
 /**
  * Convert valid UTF-32 string into Latin1 string.
  *
- * This function assumes that the input string is valid UTF-32 and that it can be represented as Latin1.
- * If you violate this assumption, the result is implementation defined and may include system-dependent behavior such as crashes.
+ * This function assumes that the input string is valid UTF-32 and that it can
+ * be represented as Latin1. If you violate this assumption, the result is
+ * implementation defined and may include system-dependent behavior such as
+ * crashes.
  *
- * This function is for expert users only and not part of our public API. Use convert_utf32_to_latin1 instead.
- * The function may be removed from the library in the future.
+ * This function is for expert users only and not part of our public API. Use
+ * convert_utf32_to_latin1 instead. The function may be removed from the library
+ * in the future.
  *
  * This function is not BOM-aware.
  *
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
- * @param latin1_buffer   the pointer to buffer that can hold the conversion result
+ * @param latin1_buffer   the pointer to buffer that can hold the conversion
+ * result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf32_to_latin1(const char32_t * input, size_t length, char* latin1_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf32_to_latin1(
+    const char32_t *input, size_t length, char *latin1_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-32 string into UTF-16BE string.
@@ -2067,7 +2307,8 @@ simdutf_warn_unused size_t convert_valid_utf32_to_latin1(const char32_t * input,
  * @param utf16_buffer   the pointer to buffer that can hold conversion result
  * @return number of written code units; 0 if input is not a valid UTF-32 string
  */
-simdutf_warn_unused size_t convert_utf32_to_utf16be(const char32_t * input, size_t length, char16_t* utf16_buffer) noexcept;
+simdutf_warn_unused size_t convert_utf32_to_utf16be(
+    const char32_t *input, size_t length, char16_t *utf16_buffer) noexcept;
 
 /**
  * Using native endianness, convert possibly broken UTF-32 string into UTF-16
@@ -2081,9 +2322,13 @@ simdutf_warn_unused size_t convert_utf32_to_utf16be(const char32_t * input, size
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
  * @param utf16_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char16_t written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char16_t written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf32_to_utf16_with_errors(const char32_t * input, size_t length, char16_t* utf16_buffer) noexcept;
+simdutf_warn_unused result convert_utf32_to_utf16_with_errors(
+    const char32_t *input, size_t length, char16_t *utf16_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-32 string into UTF-16LE string and stop on error.
@@ -2096,9 +2341,13 @@ simdutf_warn_unused result convert_utf32_to_utf16_with_errors(const char32_t * i
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
  * @param utf16_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char16_t written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char16_t written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(const char32_t * input, size_t length, char16_t* utf16_buffer) noexcept;
+simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+    const char32_t *input, size_t length, char16_t *utf16_buffer) noexcept;
 
 /**
  * Convert possibly broken UTF-32 string into UTF-16BE string and stop on error.
@@ -2111,9 +2360,13 @@ simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(const char32_t *
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
  * @param utf16_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char16_t written if successful.
+ * @return a result pair struct (of type simdutf::result containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in code units) if any, or the number of char16_t written if
+ * successful.
  */
-simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(const char32_t * input, size_t length, char16_t* utf16_buffer) noexcept;
+simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+    const char32_t *input, size_t length, char16_t *utf16_buffer) noexcept;
 
 /**
  * Using native endianness, convert valid UTF-32 string into a UTF-16 string.
@@ -2124,10 +2377,12 @@ simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(const char32_t *
  *
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
- * @param utf16_buffer   the pointer to buffer that can hold the conversion result
+ * @param utf16_buffer   the pointer to buffer that can hold the conversion
+ * result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf32_to_utf16(const char32_t * input, size_t length, char16_t* utf16_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf32_to_utf16(
+    const char32_t *input, size_t length, char16_t *utf16_buffer) noexcept;
 
 /**
  * Convert valid UTF-32 string into UTF-16LE string.
@@ -2138,10 +2393,12 @@ simdutf_warn_unused size_t convert_valid_utf32_to_utf16(const char32_t * input,
  *
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
- * @param utf16_buffer   the pointer to buffer that can hold the conversion result
+ * @param utf16_buffer   the pointer to buffer that can hold the conversion
+ * result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(const char32_t * input, size_t length, char16_t* utf16_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(
+    const char32_t *input, size_t length, char16_t *utf16_buffer) noexcept;
 
 /**
  * Convert valid UTF-32 string into UTF-16BE string.
@@ -2152,14 +2409,16 @@ simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(const char32_t * input
  *
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
- * @param utf16_buffer   the pointer to buffer that can hold the conversion result
+ * @param utf16_buffer   the pointer to buffer that can hold the conversion
+ * result
  * @return number of written code units; 0 if conversion is not possible
  */
-simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(const char32_t * input, size_t length, char16_t* utf16_buffer) noexcept;
+simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(
+    const char32_t *input, size_t length, char16_t *utf16_buffer) noexcept;
 
 /**
- * Change the endianness of the input. Can be used to go from UTF-16LE to UTF-16BE or
- * from UTF-16BE to UTF-16LE.
+ * Change the endianness of the input. Can be used to go from UTF-16LE to
+ * UTF-16BE or from UTF-16BE to UTF-16LE.
  *
  * This function does not validate the input.
  *
@@ -2167,33 +2426,39 @@ simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(const char32_t * input
  *
  * @param input         the UTF-16 string to process
  * @param length        the length of the string in 2-byte code units (char16_t)
- * @param output        the pointer to buffer that can hold the conversion result
+ * @param output        the pointer to buffer that can hold the conversion
+ * result
  */
-void change_endianness_utf16(const char16_t * input, size_t length, char16_t * output) noexcept;
+void change_endianness_utf16(const char16_t *input, size_t length,
+                             char16_t *output) noexcept;
 
 /**
- * Compute the number of bytes that this UTF-32 string would require in UTF-8 format.
+ * Compute the number of bytes that this UTF-32 string would require in UTF-8
+ * format.
  *
- * This function does not validate the input. It is acceptable to pass invalid UTF-32 strings but in such cases
- * the result is implementation defined.
+ * This function does not validate the input. It is acceptable to pass invalid
+ * UTF-32 strings but in such cases the result is implementation defined.
  *
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
  * @return the number of bytes required to encode the UTF-32 string as UTF-8
  */
-simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t * input, size_t length) noexcept;
+simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t *input,
+                                                  size_t length) noexcept;
 
 /**
- * Compute the number of two-byte code units that this UTF-32 string would require in UTF-16 format.
+ * Compute the number of two-byte code units that this UTF-32 string would
+ * require in UTF-16 format.
  *
- * This function does not validate the input. It is acceptable to pass invalid UTF-32 strings but in such cases
- * the result is implementation defined.
+ * This function does not validate the input. It is acceptable to pass invalid
+ * UTF-32 strings but in such cases the result is implementation defined.
  *
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
  * @return the number of bytes required to encode the UTF-32 string as UTF-16
  */
-simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t * input, size_t length) noexcept;
+simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t *input,
+                                                   size_t length) noexcept;
 
 /**
  * Using native endianness; Compute the number of bytes that this UTF-16
@@ -2201,8 +2466,8 @@ simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t * input, size_
  *
  * This function is equivalent to count_utf16.
  *
- * This function does not validate the input. It is acceptable to pass invalid UTF-16 strings but in such cases
- * the result is implementation defined.
+ * This function does not validate the input. It is acceptable to pass invalid
+ * UTF-16 strings but in such cases the result is implementation defined.
  *
  * This function is not BOM-aware.
  *
@@ -2210,15 +2475,17 @@ simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t * input, size_
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @return the number of bytes required to encode the UTF-16LE string as UTF-32
  */
-simdutf_warn_unused size_t utf32_length_from_utf16(const char16_t * input, size_t length) noexcept;
+simdutf_warn_unused size_t utf32_length_from_utf16(const char16_t *input,
+                                                   size_t length) noexcept;
 
 /**
- * Compute the number of bytes that this UTF-16LE string would require in UTF-32 format.
+ * Compute the number of bytes that this UTF-16LE string would require in UTF-32
+ * format.
  *
  * This function is equivalent to count_utf16le.
  *
- * This function does not validate the input. It is acceptable to pass invalid UTF-16 strings but in such cases
- * the result is implementation defined.
+ * This function does not validate the input. It is acceptable to pass invalid
+ * UTF-16 strings but in such cases the result is implementation defined.
  *
  * This function is not BOM-aware.
  *
@@ -2226,15 +2493,17 @@ simdutf_warn_unused size_t utf32_length_from_utf16(const char16_t * input, size_
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @return the number of bytes required to encode the UTF-16LE string as UTF-32
  */
-simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t * input, size_t length) noexcept;
+simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t *input,
+                                                     size_t length) noexcept;
 
 /**
- * Compute the number of bytes that this UTF-16BE string would require in UTF-32 format.
+ * Compute the number of bytes that this UTF-16BE string would require in UTF-32
+ * format.
  *
  * This function is equivalent to count_utf16be.
  *
- * This function does not validate the input. It is acceptable to pass invalid UTF-16 strings but in such cases
- * the result is implementation defined.
+ * This function does not validate the input. It is acceptable to pass invalid
+ * UTF-16 strings but in such cases the result is implementation defined.
  *
  * This function is not BOM-aware.
  *
@@ -2242,15 +2511,16 @@ simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t * input, siz
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @return the number of bytes required to encode the UTF-16BE string as UTF-32
  */
-simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t * input, size_t length) noexcept;
+simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t *input,
+                                                     size_t length) noexcept;
 
 /**
  * Count the number of code points (characters) in the string assuming that
  * it is valid.
  *
- * This function assumes that the input string is valid UTF-16 (native endianness).
- * It is acceptable to pass invalid UTF-16 strings but in such cases
- * the result is implementation defined.
+ * This function assumes that the input string is valid UTF-16 (native
+ * endianness). It is acceptable to pass invalid UTF-16 strings but in such
+ * cases the result is implementation defined.
  *
  * This function is not BOM-aware.
  *
@@ -2258,7 +2528,8 @@ simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t * input, siz
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @return number of code points
  */
-simdutf_warn_unused size_t count_utf16(const char16_t * input, size_t length) noexcept;
+simdutf_warn_unused size_t count_utf16(const char16_t *input,
+                                       size_t length) noexcept;
 
 /**
  * Count the number of code points (characters) in the string assuming that
@@ -2274,7 +2545,8 @@ simdutf_warn_unused size_t count_utf16(const char16_t * input, size_t length) no
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @return number of code points
  */
-simdutf_warn_unused size_t count_utf16le(const char16_t * input, size_t length) noexcept;
+simdutf_warn_unused size_t count_utf16le(const char16_t *input,
+                                         size_t length) noexcept;
 
 /**
  * Count the number of code points (characters) in the string assuming that
@@ -2290,7 +2562,8 @@ simdutf_warn_unused size_t count_utf16le(const char16_t * input, size_t length)
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @return number of code points
  */
-simdutf_warn_unused size_t count_utf16be(const char16_t * input, size_t length) noexcept;
+simdutf_warn_unused size_t count_utf16be(const char16_t *input,
+                                         size_t length) noexcept;
 
 /**
  * Count the number of code points (characters) in the string assuming that
@@ -2304,16 +2577,18 @@ simdutf_warn_unused size_t count_utf16be(const char16_t * input, size_t length)
  * @param length        the length of the string in bytes
  * @return number of code points
  */
-simdutf_warn_unused size_t count_utf8(const char * input, size_t length) noexcept;
+simdutf_warn_unused size_t count_utf8(const char *input,
+                                      size_t length) noexcept;
 
 /**
  * Given a valid UTF-8 string having a possibly truncated last character,
- * this function checks the end of string. If the last character is truncated (or partial),
- * then it returns a shorter length (shorter by 1 to 3 bytes) so that the short UTF-8
- * strings only contain complete characters. If there is no truncated character,
- * the original length is returned.
+ * this function checks the end of the string. If the last character is
+ * truncated (or partial), it returns a shorter length (shorter by 1 to 3
+ * bytes) so that the shortened UTF-8 string contains only complete characters.
+ * If there is no truncated character, the original length is returned.
  *
- * This function assumes that the input string is valid UTF-8, but possibly truncated.
+ * This function assumes that the input string is valid UTF-8, but possibly
+ * truncated.
  *
  * @param input         the UTF-8 string to process
  * @param length        the length of the string in bytes
@@ -2323,59 +2598,77 @@ simdutf_warn_unused size_t trim_partial_utf8(const char *input, size_t length);
 
 /**
  * Given a valid UTF-16BE string having a possibly truncated last character,
- * this function checks the end of string. If the last character is truncated (or partial),
- * then it returns a shorter length (shorter by 1 unit) so that the short UTF-16BE
- * strings only contain complete characters. If there is no truncated character,
- * the original length is returned.
+ * this function checks the end of the string. If the last character is
+ * truncated (or partial), it returns a shorter length (shorter by 1 unit) so
+ * that the shortened UTF-16BE string contains only complete characters. If
+ * there is no truncated character, the original length is returned.
  *
- * This function assumes that the input string is valid UTF-16BE, but possibly truncated.
+ * This function assumes that the input string is valid UTF-16BE, but possibly
+ * truncated.
  *
  * @param input         the UTF-16BE string to process
  * @param length        the length of the string in bytes
  * @return the length of the string in bytes, possibly shorter by 1 unit
  */
-simdutf_warn_unused size_t trim_partial_utf16be(const char16_t* input, size_t length);
+simdutf_warn_unused size_t trim_partial_utf16be(const char16_t *input,
+                                                size_t length);
 
 /**
  * Given a valid UTF-16LE string having a possibly truncated last character,
- * this function checks the end of string. If the last character is truncated (or partial),
- * then it returns a shorter length (shorter by 1 unit) so that the short UTF-16LE
- * strings only contain complete characters. If there is no truncated character,
- * the original length is returned.
+ * this function checks the end of string. If the last character is truncated
+ * (or partial), then it returns a shorter length (shorter by 1 unit) so that
+ * the short UTF-16LE strings only contain complete characters. If there is no
+ * truncated character, the original length is returned.
  *
- * This function assumes that the input string is valid UTF-16LE, but possibly truncated.
+ * This function assumes that the input string is valid UTF-16LE, but possibly
+ * truncated.
  *
  * @param input         the UTF-16LE string to process
  * @param length        the length of the string in bytes
  * @return the length of the string in unit, possibly shorter by 1 unit
  */
-simdutf_warn_unused size_t trim_partial_utf16le(const char16_t* input, size_t length);
-
+simdutf_warn_unused size_t trim_partial_utf16le(const char16_t *input,
+                                                size_t length);
 
 /**
  * Given a valid UTF-16 string having a possibly truncated last character,
- * this function checks the end of string. If the last character is truncated (or partial),
- * then it returns a shorter length (shorter by 1 unit) so that the short UTF-16
- * strings only contain complete characters. If there is no truncated character,
- * the original length is returned.
+ * this function checks the end of string. If the last character is truncated
+ * (or partial), then it returns a shorter length (shorter by 1 unit) so that
+ * the short UTF-16 strings only contain complete characters. If there is no
+ * truncated character, the original length is returned.
  *
- * This function assumes that the input string is valid UTF-16, but possibly truncated.
- * We use the native endianness.
+ * This function assumes that the input string is valid UTF-16, but possibly
+ * truncated. We use the native endianness.
  *
  * @param input         the UTF-16 string to process
  * @param length        the length of the string in bytes
  * @return the length of the string in unit, possibly shorter by 1 unit
  */
-simdutf_warn_unused size_t trim_partial_utf16(const char16_t* input, size_t length);
+simdutf_warn_unused size_t trim_partial_utf16(const char16_t *input,
+                                              size_t length);
 
 // base64_options are used to specify the base64 encoding options.
 using base64_options = uint64_t;
 enum : base64_options {
-  base64_default = 0, /* standard base64 format (with padding) */
-  base64_url = 1, /* base64url format (no padding) */
+  base64_default = 0,         /* standard base64 format (with padding) */
+  base64_url = 1,             /* base64url format (no padding) */
   base64_reverse_padding = 2, /* modifier for base64_default and base64_url */
-  base64_default_no_padding = base64_default | base64_reverse_padding, /* standard base64 format without padding */
-  base64_url_with_padding = base64_url | base64_reverse_padding, /* base64url with padding */
+  base64_default_no_padding =
+      base64_default |
+      base64_reverse_padding, /* standard base64 format without padding */
+  base64_url_with_padding =
+      base64_url | base64_reverse_padding, /* base64url with padding */
+};
+
+// last_chunk_handling_options are used to specify the handling of the last
+// chunk in base64 decoding.
+// https://tc39.es/proposal-arraybuffer-base64/spec/#sec-frombase64
+enum last_chunk_handling_options : uint64_t {
+  loose = 0, /* standard base64 format, decode partial final chunk */
+  strict =
+      1, /* error when the last chunk is partial, 2 or 3 chars, and unpadded */
+  stop_before_partial =
+      2, /* if the last chunk is partial (2 or 3 chars), ignore it (no error) */
 };
 
 /**
@@ -2387,53 +2680,80 @@ enum : base64_options {
  * @param length        the length of the base64 input in bytes
  * @return maximum number of binary bytes
  */
-simdutf_warn_unused size_t maximal_binary_length_from_base64(const char * input, size_t length) noexcept;
+simdutf_warn_unused size_t
+maximal_binary_length_from_base64(const char *input, size_t length) noexcept;
 
 /**
  * Provide the maximal binary length in bytes given the base64 input.
  * In general, if the input contains ASCII spaces, the result will be less than
  * the maximum length.
  *
- * @param input         the base64 input to process, in ASCII stored as 16-bit units
+ * @param input         the base64 input to process, in ASCII stored as 16-bit
+ * units
  * @param length        the length of the base64 input in 16-bit units
  * @return maximal number of binary bytes
  */
-simdutf_warn_unused size_t maximal_binary_length_from_base64(const char16_t * input, size_t length) noexcept;
+simdutf_warn_unused size_t maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) noexcept;
 
 /**
  * Convert a base64 input to a binary output.
  *
- * This function follows the WHATWG forgiving-base64 format, which means that it will
- * ignore any ASCII spaces in the input. You may provide a padded input (with one or two
- * equal signs at the end) or an unpadded input (without any equal signs at the end).
+ * This function follows the WHATWG forgiving-base64 format, which means that it
+ * will ignore any ASCII spaces in the input. You may provide a padded input
+ * (with one or two equal signs at the end) or an unpadded input (without any
+ * equal signs at the end).
  *
  * See https://infra.spec.whatwg.org/#forgiving-base64-decode
  *
- * This function will fail in case of invalid input. There are two possible reasons for
- * failure: the input contains a number of base64 characters that when divided by 4, leaves
- * a single remainder character (BASE64_INPUT_REMAINDER), or the input contains a character
- * that is not a valid base64 character (INVALID_BASE64_CHARACTER).
+ * This function will fail in case of invalid input. There are two possible
+ * reasons for failure: the input contains a number of base64 characters that
+ * when divided by 4, leaves a single remainder character
+ * (BASE64_INPUT_REMAINDER), or the input contains a character that is not a
+ * valid base64 character (INVALID_BASE64_CHARACTER).
+ *
+ * When the error is INVALID_BASE64_CHARACTER, r.count contains the index in the
+ * input where the invalid character was found. When the error is
+ * BASE64_INPUT_REMAINDER, then r.count contains the number of bytes decoded.
  *
- * When the error is INVALID_BASE64_CHARACTER, r.count contains the index in the input
- * where the invalid character was found. When the error is BASE64_INPUT_REMAINDER, then
- * r.count contains the number of bytes decoded.
+ * The default option (simdutf::base64_default) expects the characters `+` and
+ * `/` as part of its alphabet. The URL option (simdutf::base64_url) expects the
+ * characters `-` and `_` as part of its alphabet.
  *
- * The default option (simdutf::base64_default) expects the characters `+` and `/` as part of its alphabet.
- * The URL option (simdutf::base64_url) expects the characters `-` and `_` as part of its alphabet.
+ * The padding (`=`) is validated if present. There may be at most two padding
+ * characters at the end of the input. If there are any padding characters, the
+ * total number of characters (excluding spaces but including padding
+ * characters) must be divisible by four.
  *
- * The padding (`=`) is validated if present. There may be at most two padding characters at the end of the input.
- * If there are any padding characters, the total number of characters (excluding spaces but including padding characters) must be divisible by four.
+ * You should call this function with a buffer that is at least
+ * maximal_binary_length_from_base64(input, length) bytes long. If you fail to
+ * provide that much space, the function may cause a buffer overflow.
  *
- * You should call this function with a buffer that is at least maximal_binary_length_from_base64(input, length) bytes long.
- * If you fail to provide that much space, the function may cause a buffer overflow.
+ * Advanced users may want to tailor how the last chunk is handled. By default,
+ * we use a loose (forgiving) approach but we also support a strict approach
+ * as well as a stop_before_partial approach, as per the following proposal:
+ *
+ * https://tc39.es/proposal-arraybuffer-base64/spec/#sec-frombase64
  *
  * @param input         the base64 string to process
  * @param length        the length of the string in bytes
- * @param output        the pointer to buffer that can hold the conversion result (should be at least maximal_binary_length_from_base64(input, length) bytes long).
- * @param options       the base64 options to use, can be base64_default or base64_url, is base64_default by default.
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in bytes) if any, or the number of bytes written if successful.
+ * @param output        the pointer to buffer that can hold the conversion
+ * result (should be at least maximal_binary_length_from_base64(input, length)
+ * bytes long).
+ * @param options       the base64 options to use, usually base64_default or
+ * base64_url, and base64_default by default.
+ * @param last_chunk_options the last chunk handling options,
+ * last_chunk_handling_options::loose by default
+ * but can also be last_chunk_handling_options::strict or
+ * last_chunk_handling_options::stop_before_partial.
+ * @return a result pair struct (of type simdutf::error containing the two
+ * fields error and count) with an error code and either position of the error
+ * (in the input in bytes) if any, or the number of bytes written if successful.
  */
-simdutf_warn_unused result base64_to_binary(const char * input, size_t length, char* output, base64_options options = base64_default) noexcept;
+simdutf_warn_unused result base64_to_binary(
+    const char *input, size_t length, char *output,
+    base64_options options = base64_default,
+    last_chunk_handling_options last_chunk_options = loose) noexcept;
 
 /**
  * Provide the base64 length in bytes given the length of a binary input.
@@ -2441,117 +2761,182 @@ simdutf_warn_unused result base64_to_binary(const char * input, size_t length, c
  * @param length        the length of the input in bytes
  * @return number of base64 bytes
  */
-simdutf_warn_unused size_t base64_length_from_binary(size_t length, base64_options options = base64_default) noexcept;
+simdutf_warn_unused size_t base64_length_from_binary(
+    size_t length, base64_options options = base64_default) noexcept;
 
 /**
  * Convert a binary input to a base64 output.
  *
- * The default option (simdutf::base64_default) uses the characters `+` and `/` as part of its alphabet.
- * Further, it adds padding (`=`) at the end of the output to ensure that the output length is a multiple of four.
+ * The default option (simdutf::base64_default) uses the characters `+` and `/`
+ * as part of its alphabet. Further, it adds padding (`=`) at the end of the
+ * output to ensure that the output length is a multiple of four.
  *
- * The URL option (simdutf::base64_url) uses the characters `-` and `_` as part of its alphabet. No padding
- * is added at the end of the output.
+ * The URL option (simdutf::base64_url) uses the characters `-` and `_` as part
+ * of its alphabet. No padding is added at the end of the output.
  *
  * This function always succeeds.
  *
  * @param input         the binary to process
  * @param length        the length of the input in bytes
- * @param output        the pointer to buffer that can hold the conversion result (should be at least base64_length_from_binary(length) bytes long)
- * @param options       the base64 options to use, can be base64_default or base64_url, is base64_default by default.
- * @return number of written bytes, will be equal to base64_length_from_binary(length, options)
+ * @param output        the pointer to buffer that can hold the conversion
+ * result (should be at least base64_length_from_binary(length) bytes long)
+ * @param options       the base64 options to use, can be base64_default or
+ * base64_url, is base64_default by default.
+ * @return number of written bytes, will be equal to
+ * base64_length_from_binary(length, options)
  */
-size_t binary_to_base64(const char * input, size_t length, char* output, base64_options options = base64_default) noexcept;
+size_t binary_to_base64(const char *input, size_t length, char *output,
+                        base64_options options = base64_default) noexcept;
 
 /**
  * Convert a base64 input to a binary output.
  *
- * This function follows the WHATWG forgiving-base64 format, which means that it will
- * ignore any ASCII spaces in the input. You may provide a padded input (with one or two
- * equal signs at the end) or an unpadded input (without any equal signs at the end).
+ * This function follows the WHATWG forgiving-base64 format, which means that it
+ * will ignore any ASCII spaces in the input. You may provide a padded input
+ * (with one or two equal signs at the end) or an unpadded input (without any
+ * equal signs at the end).
  *
  * See https://infra.spec.whatwg.org/#forgiving-base64-decode
  *
- * This function will fail in case of invalid input. There are two possible reasons for
- * failure: the input contains a number of base64 characters that when divided by 4, leaves
- * a single remainder character (BASE64_INPUT_REMAINDER), or the input contains a character
- * that is not a valid base64 character (INVALID_BASE64_CHARACTER).
+ * This function will fail in case of invalid input. There are two possible
+ * reasons for failure: the input contains a number of base64 characters that
+ * when divided by 4, leaves a single remainder character
+ * (BASE64_INPUT_REMAINDER), or the input contains a character that is not a
+ * valid base64 character (INVALID_BASE64_CHARACTER).
+ *
+ * When the error is INVALID_BASE64_CHARACTER, r.count contains the index in the
+ * input where the invalid character was found. When the error is
+ * BASE64_INPUT_REMAINDER, then r.count contains the number of bytes decoded.
+ *
+ * The default option (simdutf::base64_default) expects the characters `+` and
+ * `/` as part of its alphabet. The URL option (simdutf::base64_url) expects the
+ * characters `-` and `_` as part of its alphabet.
  *
- * When the error is INVALID_BASE64_CHARACTER, r.count contains the index in the input
- * where the invalid character was found. When the error is BASE64_INPUT_REMAINDER, then
- * r.count contains the number of bytes decoded.
+ * The padding (`=`) is validated if present. There may be at most two padding
+ * characters at the end of the input. If there are any padding characters, the
+ * total number of characters (excluding spaces but including padding
+ * characters) must be divisible by four.
  *
- * The default option (simdutf::base64_default) expects the characters `+` and `/` as part of its alphabet.
- * The URL option (simdutf::base64_url) expects the characters `-` and `_` as part of its alphabet.
+ * You should call this function with a buffer that is at least
+ * maximal_binary_length_from_base64(input, length) bytes long. If you fail
+ * to provide that much space, the function may cause a buffer overflow.
  *
- * The padding (`=`) is validated if present. There may be at most two padding characters at the end of the input.
- * If there are any padding characters, the total number of characters (excluding spaces but including padding characters) must be divisible by four.
+ * Advanced users may want to tailor how the last chunk is handled. By default,
+ * we use a loose (forgiving) approach but we also support a strict approach
+ * as well as a stop_before_partial approach, as per the following proposal:
  *
- * You should call this function with a buffer that is at least maximal_binary_length_from_utf6_base64(input, length) bytes long.
- * If you fail to provide that much space, the function may cause a buffer overflow.
+ * https://tc39.es/proposal-arraybuffer-base64/spec/#sec-frombase64
  *
- * @param input         the base64 string to process, in ASCII stored as 16-bit units
+ * @param input         the base64 string to process, in ASCII stored as 16-bit
+ * units
  * @param length        the length of the string in 16-bit units
- * @param output        the pointer to buffer that can hold the conversion result (should be at least maximal_binary_length_from_base64(input, length) bytes long).
- * @param options       the base64 options to use, can be base64_default or base64_url, is base64_default by default.
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and position of the INVALID_BASE64_CHARACTER error (in the input in units) if any, or the number of bytes written if successful.
+ * @param output        the pointer to buffer that can hold the conversion
+ * result (should be at least maximal_binary_length_from_base64(input, length)
+ * bytes long).
+ * @param options       the base64 options to use, can be base64_default or
+ * base64_url, is base64_default by default.
+ * @param last_chunk_options the last chunk handling options,
+ * last_chunk_handling_options::loose by default
+ * but can also be last_chunk_handling_options::strict or
+ * last_chunk_handling_options::stop_before_partial.
+ * @return a result pair struct (of type simdutf::error containing the two
+ * fields error and count) with an error code and position of the
+ * INVALID_BASE64_CHARACTER error (in the input in units) if any, or the number
+ * of bytes written if successful.
  */
-simdutf_warn_unused result base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options = base64_default)  noexcept;
+simdutf_warn_unused result
+base64_to_binary(const char16_t *input, size_t length, char *output,
+                 base64_options options = base64_default,
+                 last_chunk_handling_options last_chunk_options =
+                     last_chunk_handling_options::loose) noexcept;
 
 /**
  * Convert a base64 input to a binary output.
  *
- * This function follows the WHATWG forgiving-base64 format, which means that it will
- * ignore any ASCII spaces in the input. You may provide a padded input (with one or two
- * equal signs at the end) or an unpadded input (without any equal signs at the end).
+ * This function follows the WHATWG forgiving-base64 format, which means that it
+ * will ignore any ASCII spaces in the input. You may provide a padded input
+ * (with one or two equal signs at the end) or an unpadded input (without any
+ * equal signs at the end).
  *
  * See https://infra.spec.whatwg.org/#forgiving-base64-decode
  *
- * This function will fail in case of invalid input. There are three possible reasons for
- * failure: the input contains a number of base64 characters that when divided by 4, leaves
- * a single remainder character (BASE64_INPUT_REMAINDER), the input contains a character
- * that is not a valid base64 character (INVALID_BASE64_CHARACTER), or the output buffer
- * is too small (OUTPUT_BUFFER_TOO_SMALL).
+ * This function will fail in case of invalid input. There are three possible
+ * reasons for failure: the input contains a number of base64 characters that
+ * when divided by 4, leaves a single remainder character
+ * (BASE64_INPUT_REMAINDER), the input contains a character that is not a valid
+ * base64 character (INVALID_BASE64_CHARACTER), or the output buffer is too
+ * small (OUTPUT_BUFFER_TOO_SMALL).
  *
  * When OUTPUT_BUFFER_TOO_SMALL, we return both the number of bytes written
- * and the number of units processed, see description of the parameters and returned value.
+ * and the number of units processed, see description of the parameters and
+ * returned value.
  *
- * When the error is INVALID_BASE64_CHARACTER, r.count contains the index in the input
- * where the invalid character was found. When the error is BASE64_INPUT_REMAINDER, then
- * r.count contains the number of bytes decoded.
+ * When the error is INVALID_BASE64_CHARACTER, r.count contains the index in the
+ * input where the invalid character was found. When the error is
+ * BASE64_INPUT_REMAINDER, then r.count contains the number of bytes decoded.
  *
- * The default option (simdutf::base64_default) expects the characters `+` and `/` as part of its alphabet.
- * The URL option (simdutf::base64_url) expects the characters `-` and `_` as part of its alphabet.
+ * The default option (simdutf::base64_default) expects the characters `+` and
+ * `/` as part of its alphabet. The URL option (simdutf::base64_url) expects the
+ * characters `-` and `_` as part of its alphabet.
  *
- * The padding (`=`) is validated if present. There may be at most two padding characters at the end of the input.
- * If there are any padding characters, the total number of characters (excluding spaces but including padding characters) must be divisible by four.
+ * The padding (`=`) is validated if present. There may be at most two padding
+ * characters at the end of the input. If there are any padding characters, the
+ * total number of characters (excluding spaces but including padding
+ * characters) must be divisible by four.
  *
- * The INVALID_BASE64_CHARACTER cases are considered fatal and you are expected to discard
- * the output.
+ * The INVALID_BASE64_CHARACTER cases are considered fatal and you are expected
+ * to discard the output.
  *
- * @param input         the base64 string to process, in ASCII stored as 8-bit or 16-bit units
+ * Advanced users may want to tailor how the last chunk is handled. By default,
+ * we use a loose (forgiving) approach but we also support a strict approach
+ * as well as a stop_before_partial approach, as per the following proposal:
+ *
+ * https://tc39.es/proposal-arraybuffer-base64/spec/#sec-frombase64
+ *
+ * @param input         the base64 string to process, in ASCII stored as 8-bit
+ * or 16-bit units
  * @param length        the length of the string in 8-bit or 16-bit units.
- * @param output        the pointer to buffer that can hold the conversion result.
- * @param outlen        the number of bytes that can be written in the output buffer. Upon return, it is modified to reflect how many bytes were written.
- * @param options       the base64 options to use, can be base64_default or base64_url, is base64_default by default.
- * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and position of the INVALID_BASE64_CHARACTER error (in the input in units) if any, or the number of units processed if successful.
+ * @param output        the pointer to buffer that can hold the conversion
+ * result.
+ * @param outlen        the number of bytes that can be written in the output
+ * buffer. Upon return, it is modified to reflect how many bytes were written.
+ * @param options       the base64 options to use, can be base64_default or
+ * base64_url, is base64_default by default.
+ * @param last_chunk_options the last chunk handling options,
+ * last_chunk_handling_options::loose by default
+ * but can also be last_chunk_handling_options::strict or
+ * last_chunk_handling_options::stop_before_partial.
+ * @return a result pair struct (of type simdutf::error containing the two
+ * fields error and count) with an error code and position of the
+ * INVALID_BASE64_CHARACTER error (in the input in units) if any, or the number
+ * of units processed if successful.
  */
-simdutf_warn_unused result base64_to_binary_safe(const char * input, size_t length, char* output, size_t& outlen, base64_options options = base64_default) noexcept;
-simdutf_warn_unused result base64_to_binary_safe(const char16_t * input, size_t length, char* output, size_t& outlen, base64_options options = base64_default) noexcept;
+simdutf_warn_unused result
+base64_to_binary_safe(const char *input, size_t length, char *output,
+                      size_t &outlen, base64_options options = base64_default,
+                      last_chunk_handling_options last_chunk_options =
+                          last_chunk_handling_options::loose) noexcept;
+simdutf_warn_unused result
+base64_to_binary_safe(const char16_t *input, size_t length, char *output,
+                      size_t &outlen, base64_options options = base64_default,
+                      last_chunk_handling_options last_chunk_options =
+                          last_chunk_handling_options::loose) noexcept;
 
 /**
  * An implementation of simdutf for a particular CPU architecture.
  *
- * Also used to maintain the currently active implementation. The active implementation is
- * automatically initialized on first use to the most advanced implementation supported by the host.
+ * Also used to maintain the currently active implementation. The active
+ * implementation is automatically initialized on first use to the most advanced
+ * implementation supported by the host.
  */
 class implementation {
 public:
-
   /**
    * The name of this implementation.
    *
    *     const implementation *impl = simdutf::active_implementation;
-   *     cout << "simdutf is optimized for " << impl->name() << "(" << impl->description() << ")" << endl;
+   *     cout << "simdutf is optimized for " << impl->name() << "(" <<
+   * impl->description() << ")" << endl;
    *
    * @return the name of the implementation, e.g. "haswell", "westmere", "arm64"
    */
@@ -2561,7 +2946,8 @@ class implementation {
    * The description of this implementation.
    *
    *     const implementation *impl = simdutf::active_implementation;
-   *     cout << "simdutf is optimized for " << impl->name() << "(" << impl->description() << ")" << endl;
+   *     cout << "simdutf is optimized for " << impl->name() << "(" <<
+   * impl->description() << ")" << endl;
    *
    * @return the name of the implementation, e.g. "haswell", "westmere", "arm64"
    */
@@ -2573,7 +2959,8 @@ class implementation {
    * and should therefore not be called too often if performance is a concern.
    *
    *
-   * @return true if the implementation can be safely used on the current system (determined at runtime)
+   * @return true if the implementation can be safely used on the current system
+   * (determined at runtime)
    */
   bool supported_by_runtime_system() const;
 
@@ -2583,7 +2970,8 @@ class implementation {
    * @param length the length of the string in bytes.
    * @return the encoding type detected
    */
-  virtual encoding_type autodetect_encoding(const char * input, size_t length) const noexcept;
+  virtual encoding_type autodetect_encoding(const char *input,
+                                            size_t length) const noexcept;
 
   /**
    * This function will try to detect the possible encodings in one pass
@@ -2591,7 +2979,8 @@ class implementation {
    * @param length the length of the string in bytes.
    * @return the encoding type detected
    */
-  virtual int detect_encodings(const char * input, size_t length) const noexcept = 0;
+  virtual int detect_encodings(const char *input,
+                               size_t length) const noexcept = 0;
 
   /**
    * @private For internal implementation use
@@ -2600,8 +2989,9 @@ class implementation {
    *
    * @return a mask of all required `internal::instruction_set::` values
    */
-  virtual uint32_t required_instruction_sets() const { return _required_instruction_sets; }
-
+  virtual uint32_t required_instruction_sets() const {
+    return _required_instruction_sets;
+  }
 
   /**
    * Validate the UTF-8 string.
@@ -2612,7 +3002,8 @@ class implementation {
    * @param len the length of the string in bytes.
    * @return true if and only if the string is valid UTF-8.
    */
-  simdutf_warn_unused virtual bool validate_utf8(const char *buf, size_t len) const noexcept = 0;
+  simdutf_warn_unused virtual bool validate_utf8(const char *buf,
+                                                 size_t len) const noexcept = 0;
 
   /**
    * Validate the UTF-8 string and stop on errors.
@@ -2621,9 +3012,13 @@ class implementation {
    *
    * @param buf the UTF-8 string to validate.
    * @param len the length of the string in bytes.
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+   * @return a result pair struct (of type simdutf::error containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of code units validated
+   * if successful.
    */
-  simdutf_warn_unused virtual result validate_utf8_with_errors(const char *buf, size_t len) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  validate_utf8_with_errors(const char *buf, size_t len) const noexcept = 0;
 
   /**
    * Validate the ASCII string.
@@ -2634,7 +3029,8 @@ class implementation {
    * @param len the length of the string in bytes.
    * @return true if and only if the string is valid ASCII.
    */
-  simdutf_warn_unused virtual bool validate_ascii(const char *buf, size_t len) const noexcept = 0;
+  simdutf_warn_unused virtual bool
+  validate_ascii(const char *buf, size_t len) const noexcept = 0;
 
   /**
    * Validate the ASCII string and stop on error.
@@ -2643,9 +3039,13 @@ class implementation {
    *
    * @param buf the ASCII string to validate.
    * @param len the length of the string in bytes.
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+   * @return a result pair struct (of type simdutf::error containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of code units validated
+   * if successful.
    */
-  simdutf_warn_unused virtual result validate_ascii_with_errors(const char *buf, size_t len) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  validate_ascii_with_errors(const char *buf, size_t len) const noexcept = 0;
 
   /**
    * Validate the UTF-16LE string.This function may be best when you expect
@@ -2657,10 +3057,12 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param buf the UTF-16LE string to validate.
-   * @param len the length of the string in number of 2-byte code units (char16_t).
+   * @param len the length of the string in number of 2-byte code units
+   * (char16_t).
    * @return true if and only if the string is valid UTF-16LE.
    */
-  simdutf_warn_unused virtual bool validate_utf16le(const char16_t *buf, size_t len) const noexcept = 0;
+  simdutf_warn_unused virtual bool
+  validate_utf16le(const char16_t *buf, size_t len) const noexcept = 0;
 
   /**
    * Validate the UTF-16BE string. This function may be best when you expect
@@ -2672,24 +3074,32 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param buf the UTF-16BE string to validate.
-   * @param len the length of the string in number of 2-byte code units (char16_t).
+   * @param len the length of the string in number of 2-byte code units
+   * (char16_t).
    * @return true if and only if the string is valid UTF-16BE.
    */
-  simdutf_warn_unused virtual bool validate_utf16be(const char16_t *buf, size_t len) const noexcept = 0;
+  simdutf_warn_unused virtual bool
+  validate_utf16be(const char16_t *buf, size_t len) const noexcept = 0;
 
   /**
    * Validate the UTF-16LE string and stop on error.  It might be faster than
- * validate_utf16le when an error is expected to occur early.
+   * validate_utf16le when an error is expected to occur early.
    *
    * Overridden by each implementation.
    *
    * This function is not BOM-aware.
    *
    * @param buf the UTF-16LE string to validate.
-   * @param len the length of the string in number of 2-byte code units (char16_t).
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+   * @param len the length of the string in number of 2-byte code units
+   * (char16_t).
+   * @return a result pair struct (of type simdutf::error containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of code units validated
+   * if successful.
    */
-  simdutf_warn_unused virtual result validate_utf16le_with_errors(const char16_t *buf, size_t len) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  validate_utf16le_with_errors(const char16_t *buf,
+                               size_t len) const noexcept = 0;
 
   /**
    * Validate the UTF-16BE string and stop on error. It might be faster than
@@ -2700,10 +3110,16 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param buf the UTF-16BE string to validate.
-   * @param len the length of the string in number of 2-byte code units (char16_t).
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+   * @param len the length of the string in number of 2-byte code units
+   * (char16_t).
+   * @return a result pair struct (of type simdutf::error containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of code units validated
+   * if successful.
    */
-  simdutf_warn_unused virtual result validate_utf16be_with_errors(const char16_t *buf, size_t len) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  validate_utf16be_with_errors(const char16_t *buf,
+                               size_t len) const noexcept = 0;
 
   /**
    * Validate the UTF-32 string.
@@ -2713,10 +3129,12 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param buf the UTF-32 string to validate.
-   * @param len the length of the string in number of 4-byte code units (char32_t).
+   * @param len the length of the string in number of 4-byte code units
+   * (char32_t).
    * @return true if and only if the string is valid UTF-32.
    */
-  simdutf_warn_unused virtual bool validate_utf32(const char32_t *buf, size_t len) const noexcept = 0;
+  simdutf_warn_unused virtual bool
+  validate_utf32(const char32_t *buf, size_t len) const noexcept = 0;
 
   /**
    * Validate the UTF-32 string and stop on error.
@@ -2726,10 +3144,16 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param buf the UTF-32 string to validate.
-   * @param len the length of the string in number of 4-byte code units (char32_t).
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+   * @param len the length of the string in number of 4-byte code units
+   * (char32_t).
+   * @return a result pair struct (of type simdutf::error containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of code units validated
+   * if successful.
    */
-  simdutf_warn_unused virtual result validate_utf32_with_errors(const char32_t *buf, size_t len) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  validate_utf32_with_errors(const char32_t *buf,
+                             size_t len) const noexcept = 0;
 
   /**
    * Convert Latin1 string into UTF8 string.
@@ -2741,7 +3165,9 @@ class implementation {
    * @param utf8_output  the pointer to buffer that can hold conversion result
    * @return the number of written char; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_latin1_to_utf8(const char * input, size_t length, char* utf8_output) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_latin1_to_utf8(const char *input, size_t length,
+                         char *utf8_output) const noexcept = 0;
 
   /**
    * Convert possibly Latin1 string into UTF-16LE string.
@@ -2753,7 +3179,9 @@ class implementation {
    * @param utf16_buffer  the pointer to buffer that can hold conversion result
    * @return the number of written char16_t; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_latin1_to_utf16le(const char * input, size_t length, char16_t* utf16_output) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_latin1_to_utf16le(const char *input, size_t length,
+                            char16_t *utf16_output) const noexcept = 0;
 
   /**
    * Convert Latin1 string into UTF-16BE string.
@@ -2765,7 +3193,9 @@ class implementation {
    * @param utf16_buffer  the pointer to buffer that can hold conversion result
    * @return the number of written char16_t; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_latin1_to_utf16be(const char * input, size_t length, char16_t* utf16_output) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_latin1_to_utf16be(const char *input, size_t length,
+                            char16_t *utf16_output) const noexcept = 0;
 
   /**
    * Convert Latin1 string into UTF-32 string.
@@ -2777,9 +3207,11 @@ class implementation {
    * @param utf32_buffer  the pointer to buffer that can hold conversion result
    * @return the number of written char32_t; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_latin1_to_utf32(const char * input, size_t length, char32_t* utf32_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_latin1_to_utf32(const char *input, size_t length,
+                          char32_t *utf32_buffer) const noexcept = 0;
 
- /**
+  /**
    * Convert possibly broken UTF-8 string into latin1 string.
    *
    * During the conversion also validation of the input string is done.
@@ -2788,9 +3220,12 @@ class implementation {
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in bytes
    * @param latin1_output  the pointer to buffer that can hold conversion result
-   * @return the number of written char; 0 if the input was not valid UTF-8 string or if it cannot be represented as Latin1
+   * @return the number of written char; 0 if the input was not valid UTF-8
+   * string or if it cannot be represented as Latin1
    */
-  simdutf_warn_unused virtual size_t convert_utf8_to_latin1(const char * input, size_t length, char* latin1_output) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf8_to_latin1(const char *input, size_t length,
+                         char *latin1_output) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-8 string into latin1 string with errors.
@@ -2803,27 +3238,37 @@ class implementation {
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in bytes
    * @param latin1_output  the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+   * @return a result pair struct (of type simdutf::error containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of code units validated
+   * if successful.
    */
-  simdutf_warn_unused virtual result convert_utf8_to_latin1_with_errors(const char * input, size_t length, char* latin1_output) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  convert_utf8_to_latin1_with_errors(const char *input, size_t length,
+                                     char *latin1_output) const noexcept = 0;
 
   /**
    * Convert valid UTF-8 string into latin1 string.
    *
-   * This function assumes that the input string is valid UTF-8 and that it can be represented as Latin1.
-   * If you violate this assumption, the result is implementation defined and may include system-dependent behavior such as crashes.
+   * This function assumes that the input string is valid UTF-8 and that it can
+   * be represented as Latin1. If you violate this assumption, the result is
+   * implementation defined and may include system-dependent behavior such as
+   * crashes.
    *
-   * This function is for expert users only and not part of our public API. Use convert_utf8_to_latin1 instead.
+   * This function is for expert users only and not part of our public API. Use
+   * convert_utf8_to_latin1 instead.
    *
    * This function is not BOM-aware.
    *
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in bytes
    * @param latin1_output  the pointer to buffer that can hold conversion result
-   * @return the number of written char; 0 if the input was not valid UTF-8 string
+   * @return the number of written char; 0 if the input was not valid UTF-8
+   * string
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf8_to_latin1(const char * input, size_t length, char* latin1_output) const noexcept = 0;
-
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf8_to_latin1(const char *input, size_t length,
+                               char *latin1_output) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-8 string into UTF-16LE string.
@@ -2834,9 +3279,12 @@ class implementation {
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in bytes
    * @param utf16_buffer  the pointer to buffer that can hold conversion result
-   * @return the number of written char16_t; 0 if the input was not valid UTF-8 string
+   * @return the number of written char16_t; 0 if the input was not valid UTF-8
+   * string
    */
-  simdutf_warn_unused virtual size_t convert_utf8_to_utf16le(const char * input, size_t length, char16_t* utf16_output) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf8_to_utf16le(const char *input, size_t length,
+                          char16_t *utf16_output) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-8 string into UTF-16BE string.
@@ -2847,12 +3295,16 @@ class implementation {
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in bytes
    * @param utf16_buffer  the pointer to buffer that can hold conversion result
-   * @return the number of written char16_t; 0 if the input was not valid UTF-8 string
+   * @return the number of written char16_t; 0 if the input was not valid UTF-8
+   * string
    */
-  simdutf_warn_unused virtual size_t convert_utf8_to_utf16be(const char * input, size_t length, char16_t* utf16_output) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf8_to_utf16be(const char *input, size_t length,
+                          char16_t *utf16_output) const noexcept = 0;
 
   /**
-   * Convert possibly broken UTF-8 string into UTF-16LE string and stop on error.
+   * Convert possibly broken UTF-8 string into UTF-16LE string and stop on
+   * error.
    *
    * During the conversion also validation of the input string is done.
    * This function is suitable to work with inputs from untrusted sources.
@@ -2860,12 +3312,18 @@ class implementation {
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in bytes
    * @param utf16_buffer  the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+   * @return a result pair struct (of type simdutf::error containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of code units validated
+   * if successful.
    */
-  simdutf_warn_unused virtual result convert_utf8_to_utf16le_with_errors(const char * input, size_t length, char16_t* utf16_output) const noexcept = 0;
+  simdutf_warn_unused virtual result convert_utf8_to_utf16le_with_errors(
+      const char *input, size_t length,
+      char16_t *utf16_output) const noexcept = 0;
 
   /**
-   * Convert possibly broken UTF-8 string into UTF-16BE string and stop on error.
+   * Convert possibly broken UTF-8 string into UTF-16BE string and stop on
+   * error.
    *
    * During the conversion also validation of the input string is done.
    * This function is suitable to work with inputs from untrusted sources.
@@ -2873,9 +3331,14 @@ class implementation {
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in bytes
    * @param utf16_buffer  the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of code units validated if successful.
+   * @return a result pair struct (of type simdutf::error containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of code units validated
+   * if successful.
    */
-  simdutf_warn_unused virtual result convert_utf8_to_utf16be_with_errors(const char * input, size_t length, char16_t* utf16_output) const noexcept = 0;
+  simdutf_warn_unused virtual result convert_utf8_to_utf16be_with_errors(
+      const char *input, size_t length,
+      char16_t *utf16_output) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-8 string into UTF-32 string.
@@ -2886,9 +3349,12 @@ class implementation {
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in bytes
    * @param utf32_buffer  the pointer to buffer that can hold conversion result
-   * @return the number of written char16_t; 0 if the input was not valid UTF-8 string
+   * @return the number of written char32_t; 0 if the input was not valid UTF-8
+   * string
    */
-  simdutf_warn_unused virtual size_t convert_utf8_to_utf32(const char * input, size_t length, char32_t* utf32_output) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf8_to_utf32(const char *input, size_t length,
+                        char32_t *utf32_output) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-8 string into UTF-32 string and stop on error.
@@ -2899,9 +3365,14 @@ class implementation {
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in bytes
    * @param utf32_buffer  the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char32_t written if successful.
+   * @return a result pair struct (of type simdutf::error containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of char32_t written if
+   * successful.
    */
-  simdutf_warn_unused virtual result convert_utf8_to_utf32_with_errors(const char * input, size_t length, char32_t* utf32_output) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  convert_utf8_to_utf32_with_errors(const char *input, size_t length,
+                                    char32_t *utf32_output) const noexcept = 0;
 
   /**
    * Convert valid UTF-8 string into UTF-16LE string.
@@ -2913,9 +3384,11 @@ class implementation {
    * @param utf16_buffer  the pointer to buffer that can hold conversion result
    * @return the number of written char16_t
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf8_to_utf16le(const char * input, size_t length, char16_t* utf16_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf8_to_utf16le(const char *input, size_t length,
+                                char16_t *utf16_buffer) const noexcept = 0;
 
-/**
+  /**
    * Convert valid UTF-8 string into UTF-16BE string.
    *
    * This function assumes that the input string is valid UTF-8.
@@ -2925,7 +3398,9 @@ class implementation {
    * @param utf16_buffer  the pointer to buffer that can hold conversion result
    * @return the number of written char16_t
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf8_to_utf16be(const char * input, size_t length, char16_t* utf16_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf8_to_utf16be(const char *input, size_t length,
+                                char16_t *utf16_buffer) const noexcept = 0;
 
   /**
    * Convert valid UTF-8 string into UTF-32 string.
@@ -2937,33 +3412,41 @@ class implementation {
    * @param utf16_buffer  the pointer to buffer that can hold conversion result
    * @return the number of written char32_t
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf8_to_utf32(const char * input, size_t length, char32_t* utf32_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf8_to_utf32(const char *input, size_t length,
+                              char32_t *utf32_buffer) const noexcept = 0;
 
   /**
-   * Compute the number of 2-byte code units that this UTF-8 string would require in UTF-16LE format.
+   * Compute the number of 2-byte code units that this UTF-8 string would
+   * require in UTF-16LE format.
    *
-   * This function does not validate the input. It is acceptable to pass invalid UTF-8 strings but in such cases
-   * the result is implementation defined.
+   * This function does not validate the input. It is acceptable to pass invalid
+   * UTF-8 strings but in such cases the result is implementation defined.
    *
    * @param input         the UTF-8 string to process
    * @param length        the length of the string in bytes
-   * @return the number of char16_t code units required to encode the UTF-8 string as UTF-16LE
+   * @return the number of char16_t code units required to encode the UTF-8
+   * string as UTF-16LE
    */
-  simdutf_warn_unused virtual size_t utf16_length_from_utf8(const char * input, size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  utf16_length_from_utf8(const char *input, size_t length) const noexcept = 0;
 
-   /**
-   * Compute the number of 4-byte code units that this UTF-8 string would require in UTF-32 format.
+  /**
+   * Compute the number of 4-byte code units that this UTF-8 string would
+   * require in UTF-32 format.
    *
-   * This function is equivalent to count_utf8. It is acceptable to pass invalid UTF-8 strings but in such cases
-   * the result is implementation defined.
+   * This function is equivalent to count_utf8. It is acceptable to pass invalid
+   * UTF-8 strings but in such cases the result is implementation defined.
    *
    * This function does not validate the input.
    *
    * @param input         the UTF-8 string to process
    * @param length        the length of the string in bytes
-   * @return the number of char32_t code units required to encode the UTF-8 string as UTF-32
+   * @return the number of char32_t code units required to encode the UTF-8
+   * string as UTF-32
    */
-  simdutf_warn_unused virtual size_t utf32_length_from_utf8(const char * input, size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  utf32_length_from_utf8(const char *input, size_t length) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-16LE string into Latin1 string.
@@ -2974,11 +3457,16 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16LE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @param latin1_buffer   the pointer to buffer that can hold conversion result
-   * @return number of written code units; 0 if input is not a valid UTF-16LE string or if it cannot be represented as Latin1
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @param latin1_buffer   the pointer to buffer that can hold conversion
+   * result
+   * @return number of written code units; 0 if input is not a valid UTF-16LE
+   * string or if it cannot be represented as Latin1
    */
-  simdutf_warn_unused virtual size_t convert_utf16le_to_latin1(const char16_t * input, size_t length, char* latin1_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf16le_to_latin1(const char16_t *input, size_t length,
+                            char *latin1_buffer) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-16BE string into Latin1 string.
@@ -2989,11 +3477,16 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16BE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @param latin1_buffer   the pointer to buffer that can hold conversion result
-   * @return number of written code units; 0 if input is not a valid UTF-16BE string or if it cannot be represented as Latin1
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @param latin1_buffer   the pointer to buffer that can hold conversion
+   * result
+   * @return number of written code units; 0 if input is not a valid UTF-16BE
+   * string or if it cannot be represented as Latin1
    */
-  simdutf_warn_unused virtual size_t convert_utf16be_to_latin1(const char16_t * input, size_t length, char* latin1_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf16be_to_latin1(const char16_t *input, size_t length,
+                            char *latin1_buffer) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-16LE string into Latin1 string.
@@ -3005,11 +3498,18 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16LE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @param latin1_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @param latin1_buffer   the pointer to buffer that can hold conversion
+   * result
+   * @return a result pair struct (of type simdutf::error containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of char written if
+   * successful.
    */
-  simdutf_warn_unused virtual result convert_utf16le_to_latin1_with_errors(const char16_t * input, size_t length, char* latin1_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  convert_utf16le_to_latin1_with_errors(const char16_t *input, size_t length,
+                                        char *latin1_buffer) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-16BE string into Latin1 string.
@@ -3021,45 +3521,66 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16BE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @param latin1_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @param latin1_buffer   the pointer to buffer that can hold conversion
+   * result
+   * @return a result pair struct (of type simdutf::error containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of char written if
+   * successful.
    */
-  simdutf_warn_unused virtual result convert_utf16be_to_latin1_with_errors(const char16_t * input, size_t length, char* latin1_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  convert_utf16be_to_latin1_with_errors(const char16_t *input, size_t length,
+                                        char *latin1_buffer) const noexcept = 0;
 
   /**
    * Convert valid UTF-16LE string into Latin1 string.
    *
-   * This function assumes that the input string is valid UTF-L16LE and that it can be represented as Latin1.
-   * If you violate this assumption, the result is implementation defined and may include system-dependent behavior such as crashes.
+   * This function assumes that the input string is valid UTF-16LE and that it
+   * can be represented as Latin1. If you violate this assumption, the result is
+   * implementation defined and may include system-dependent behavior such as
+   * crashes.
    *
-   * This function is for expert users only and not part of our public API. Use convert_utf16le_to_latin1 instead.
+   * This function is for expert users only and not part of our public API. Use
+   * convert_utf16le_to_latin1 instead.
    *
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16LE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @param latin1_buffer   the pointer to buffer that can hold conversion result
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @param latin1_buffer   the pointer to buffer that can hold conversion
+   * result
    * @return number of written code units; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf16le_to_latin1(const char16_t * input, size_t length, char* latin1_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf16le_to_latin1(const char16_t *input, size_t length,
+                                  char *latin1_buffer) const noexcept = 0;
 
   /**
    * Convert valid UTF-16BE string into Latin1 string.
    *
-   * This function assumes that the input string is valid UTF16-BE and that it can be represented as Latin1.
-   * If you violate this assumption, the result is implementation defined and may include system-dependent behavior such as crashes.
+   * This function assumes that the input string is valid UTF-16BE and that it
+   * can be represented as Latin1. If you violate this assumption, the result is
+   * implementation defined and may include system-dependent behavior such as
+   * crashes.
    *
-   * This function is for expert users only and not part of our public API. Use convert_utf16be_to_latin1 instead.
+   * This function is for expert users only and not part of our public API. Use
+   * convert_utf16be_to_latin1 instead.
    *
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16BE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @param latin1_buffer   the pointer to buffer that can hold conversion result
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @param latin1_buffer   the pointer to buffer that can hold conversion
+   * result
    * @return number of written code units; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf16be_to_latin1(const char16_t * input, size_t length, char* latin1_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf16be_to_latin1(const char16_t *input, size_t length,
+                                  char *latin1_buffer) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-16LE string into UTF-8 string.
@@ -3070,11 +3591,15 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16LE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
    * @param utf8_buffer   the pointer to buffer that can hold conversion result
-   * @return number of written code units; 0 if input is not a valid UTF-16LE string
+   * @return number of written code units; 0 if input is not a valid UTF-16LE
+   * string
    */
-  simdutf_warn_unused virtual size_t convert_utf16le_to_utf8(const char16_t * input, size_t length, char* utf8_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf16le_to_utf8(const char16_t *input, size_t length,
+                          char *utf8_buffer) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-16BE string into UTF-8 string.
@@ -3085,14 +3610,19 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16BE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
    * @param utf8_buffer   the pointer to buffer that can hold conversion result
-   * @return number of written code units; 0 if input is not a valid UTF-16BE string
+   * @return number of written code units; 0 if input is not a valid UTF-16BE
+   * string
    */
-  simdutf_warn_unused virtual size_t convert_utf16be_to_utf8(const char16_t * input, size_t length, char* utf8_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf16be_to_utf8(const char16_t *input, size_t length,
+                          char *utf8_buffer) const noexcept = 0;
 
   /**
-   * Convert possibly broken UTF-16LE string into UTF-8 string and stop on error.
+   * Convert possibly broken UTF-16LE string into UTF-8 string and stop on
+   * error.
    *
    * During the conversion also validation of the input string is done.
    * This function is suitable to work with inputs from untrusted sources.
@@ -3100,14 +3630,21 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16LE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
    * @param utf8_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+   * @return a result pair struct (of type simdutf::result containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of char written if
+   * successful.
    */
-  simdutf_warn_unused virtual result convert_utf16le_to_utf8_with_errors(const char16_t * input, size_t length, char* utf8_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  convert_utf16le_to_utf8_with_errors(const char16_t *input, size_t length,
+                                      char *utf8_buffer) const noexcept = 0;
 
   /**
-   * Convert possibly broken UTF-16BE string into UTF-8 string and stop on error.
+   * Convert possibly broken UTF-16BE string into UTF-8 string and stop on
+   * error.
    *
    * During the conversion also validation of the input string is done.
    * This function is suitable to work with inputs from untrusted sources.
@@ -3115,11 +3652,17 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16BE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
    * @param utf8_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+   * @return a result pair struct (of type simdutf::result containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of char written if
+   * successful.
    */
-  simdutf_warn_unused virtual result convert_utf16be_to_utf8_with_errors(const char16_t * input, size_t length, char* utf8_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  convert_utf16be_to_utf8_with_errors(const char16_t *input, size_t length,
+                                      char *utf8_buffer) const noexcept = 0;
 
   /**
    * Convert valid UTF-16LE string into UTF-8 string.
@@ -3129,11 +3672,15 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16LE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @param utf8_buffer   the pointer to buffer that can hold the conversion result
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @param utf8_buffer   the pointer to buffer that can hold the conversion
+   * result
    * @return number of written code units; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf16le_to_utf8(const char16_t * input, size_t length, char* utf8_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf16le_to_utf8(const char16_t *input, size_t length,
+                                char *utf8_buffer) const noexcept = 0;
 
   /**
    * Convert valid UTF-16BE string into UTF-8 string.
@@ -3143,11 +3690,15 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16BE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @param utf8_buffer   the pointer to buffer that can hold the conversion result
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @param utf8_buffer   the pointer to buffer that can hold the conversion
+   * result
    * @return number of written code units; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf16be_to_utf8(const char16_t * input, size_t length, char* utf8_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf16be_to_utf8(const char16_t *input, size_t length,
+                                char *utf8_buffer) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-16LE string into UTF-32 string.
@@ -3158,11 +3709,15 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16LE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
    * @param utf32_buffer   the pointer to buffer that can hold conversion result
-   * @return number of written code units; 0 if input is not a valid UTF-16LE string
+   * @return number of written code units; 0 if input is not a valid UTF-16LE
+   * string
    */
-  simdutf_warn_unused virtual size_t convert_utf16le_to_utf32(const char16_t * input, size_t length, char32_t* utf32_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf16le_to_utf32(const char16_t *input, size_t length,
+                           char32_t *utf32_buffer) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-16BE string into UTF-32 string.
@@ -3173,14 +3728,19 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16BE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
    * @param utf32_buffer   the pointer to buffer that can hold conversion result
-   * @return number of written code units; 0 if input is not a valid UTF-16BE string
+   * @return number of written code units; 0 if input is not a valid UTF-16BE
+   * string
    */
-  simdutf_warn_unused virtual size_t convert_utf16be_to_utf32(const char16_t * input, size_t length, char32_t* utf32_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf16be_to_utf32(const char16_t *input, size_t length,
+                           char32_t *utf32_buffer) const noexcept = 0;
 
   /**
-   * Convert possibly broken UTF-16LE string into UTF-32 string and stop on error.
+   * Convert possibly broken UTF-16LE string into UTF-32 string and stop on
+   * error.
    *
    * During the conversion also validation of the input string is done.
    * This function is suitable to work with inputs from untrusted sources.
@@ -3188,14 +3748,21 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16LE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
    * @param utf32_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char32_t written if successful.
+   * @return a result pair struct (of type simdutf::result containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of char32_t written if
+   * successful.
    */
-  simdutf_warn_unused virtual result convert_utf16le_to_utf32_with_errors(const char16_t * input, size_t length, char32_t* utf32_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual result convert_utf16le_to_utf32_with_errors(
+      const char16_t *input, size_t length,
+      char32_t *utf32_buffer) const noexcept = 0;
 
   /**
-   * Convert possibly broken UTF-16BE string into UTF-32 string and stop on error.
+   * Convert possibly broken UTF-16BE string into UTF-32 string and stop on
+   * error.
    *
    * During the conversion also validation of the input string is done.
    * This function is suitable to work with inputs from untrusted sources.
@@ -3203,11 +3770,17 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16BE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
    * @param utf32_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char32_t written if successful.
+   * @return a result pair struct (of type simdutf::result containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of char32_t written if
+   * successful.
    */
-  simdutf_warn_unused virtual result convert_utf16be_to_utf32_with_errors(const char16_t * input, size_t length, char32_t* utf32_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual result convert_utf16be_to_utf32_with_errors(
+      const char16_t *input, size_t length,
+      char32_t *utf32_buffer) const noexcept = 0;
 
   /**
    * Convert valid UTF-16LE string into UTF-32 string.
@@ -3217,11 +3790,15 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16LE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @param utf32_buffer   the pointer to buffer that can hold the conversion result
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @param utf32_buffer   the pointer to buffer that can hold the conversion
+   * result
    * @return number of written code units; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf16le_to_utf32(const char16_t * input, size_t length, char32_t* utf32_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf16le_to_utf32(const char16_t *input, size_t length,
+                                 char32_t *utf32_buffer) const noexcept = 0;
 
   /**
    * Convert valid UTF-16LE string into UTF-32BE string.
@@ -3231,39 +3808,51 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16BE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @param utf32_buffer   the pointer to buffer that can hold the conversion result
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @param utf32_buffer   the pointer to buffer that can hold the conversion
+   * result
    * @return number of written code units; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf16be_to_utf32(const char16_t * input, size_t length, char32_t* utf32_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf16be_to_utf32(const char16_t *input, size_t length,
+                                 char32_t *utf32_buffer) const noexcept = 0;
 
   /**
-   * Compute the number of bytes that this UTF-16LE string would require in UTF-8 format.
+   * Compute the number of bytes that this UTF-16LE string would require in
+   * UTF-8 format.
    *
-   * This function does not validate the input. It is acceptable to pass invalid UTF-16 strings but in such cases
-   * the result is implementation defined.
+   * This function does not validate the input. It is acceptable to pass invalid
+   * UTF-16 strings but in such cases the result is implementation defined.
    *
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16LE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
    * @return the number of bytes required to encode the UTF-16LE string as UTF-8
    */
-  simdutf_warn_unused virtual size_t utf8_length_from_utf16le(const char16_t * input, size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  utf8_length_from_utf16le(const char16_t *input,
+                           size_t length) const noexcept = 0;
 
   /**
-   * Compute the number of bytes that this UTF-16BE string would require in UTF-8 format.
+   * Compute the number of bytes that this UTF-16BE string would require in
+   * UTF-8 format.
    *
-   * This function does not validate the input. It is acceptable to pass invalid UTF-16 strings but in such cases
-   * the result is implementation defined.
+   * This function does not validate the input. It is acceptable to pass invalid
+   * UTF-16 strings but in such cases the result is implementation defined.
    *
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16BE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
    * @return the number of bytes required to encode the UTF-16BE string as UTF-8
    */
-  simdutf_warn_unused virtual size_t utf8_length_from_utf16be(const char16_t * input, size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  utf8_length_from_utf16be(const char16_t *input,
+                           size_t length) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-32 string into Latin1 string.
@@ -3274,12 +3863,17 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
-   * @param latin1_buffer   the pointer to buffer that can hold conversion result
-   * @return number of written code units; 0 if input is not a valid UTF-32 string
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
+   * @param latin1_buffer   the pointer to buffer that can hold conversion
+   * result
+   * @return number of written code units; 0 if input is not a valid UTF-32
+   * string
    */
 
-  simdutf_warn_unused virtual size_t convert_utf32_to_latin1(const char32_t * input, size_t length, char* latin1_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf32_to_latin1(const char32_t *input, size_t length,
+                          char *latin1_buffer) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-32 string into Latin1 string and stop on error.
@@ -3291,28 +3885,42 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
-   * @param latin1_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
+   * @param latin1_buffer   the pointer to buffer that can hold conversion
+   * result
+   * @return a result pair struct (of type simdutf::result containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of char written if
+   * successful.
    */
-  simdutf_warn_unused virtual result convert_utf32_to_latin1_with_errors(const char32_t * input, size_t length, char* latin1_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  convert_utf32_to_latin1_with_errors(const char32_t *input, size_t length,
+                                      char *latin1_buffer) const noexcept = 0;
 
   /**
    * Convert valid UTF-32 string into Latin1 string.
    *
-   * This function assumes that the input string is valid UTF-32 and can be represented as Latin1.
-   * If you violate this assumption, the result is implementation defined and may include system-dependent behavior such as crashes.
+   * This function assumes that the input string is valid UTF-32 and can be
+   * represented as Latin1. If you violate this assumption, the result is
+   * implementation defined and may include system-dependent behavior such as
+   * crashes.
    *
-   * This function is for expert users only and not part of our public API. Use convert_utf32_to_latin1 instead.
+   * This function is for expert users only and not part of our public API. Use
+   * convert_utf32_to_latin1 instead.
    *
    * This function is not BOM-aware.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
-   * @param latin1_buffer   the pointer to buffer that can hold the conversion result
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
+   * @param latin1_buffer   the pointer to buffer that can hold the conversion
+   * result
    * @return number of written code units; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf32_to_latin1(const char32_t * input, size_t length, char* latin1_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf32_to_latin1(const char32_t *input, size_t length,
+                                char *latin1_buffer) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-32 string into UTF-8 string.
@@ -3323,11 +3931,15 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
    * @param utf8_buffer   the pointer to buffer that can hold conversion result
-   * @return number of written code units; 0 if input is not a valid UTF-32 string
+   * @return number of written code units; 0 if input is not a valid UTF-32
+   * string
    */
-  simdutf_warn_unused virtual size_t convert_utf32_to_utf8(const char32_t * input, size_t length, char* utf8_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf32_to_utf8(const char32_t *input, size_t length,
+                        char *utf8_buffer) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-32 string into UTF-8 string and stop on error.
@@ -3338,11 +3950,17 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
    * @param utf8_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char written if successful.
+   * @return a result pair struct (of type simdutf::result containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of char written if
+   * successful.
    */
-  simdutf_warn_unused virtual result convert_utf32_to_utf8_with_errors(const char32_t * input, size_t length, char* utf8_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  convert_utf32_to_utf8_with_errors(const char32_t *input, size_t length,
+                                    char *utf8_buffer) const noexcept = 0;
 
   /**
    * Convert valid UTF-32 string into UTF-8 string.
@@ -3352,22 +3970,28 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
-   * @param utf8_buffer   the pointer to buffer that can hold the conversion result
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
+   * @param utf8_buffer   the pointer to buffer that can hold the conversion
+   * result
    * @return number of written code units; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf32_to_utf8(const char32_t * input, size_t length, char* utf8_buffer) const noexcept = 0;
-
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf32_to_utf8(const char32_t *input, size_t length,
+                              char *utf8_buffer) const noexcept = 0;
 
   /**
-   * Return the number of bytes that this UTF-16 string would require in Latin1 format.
+   * Return the number of bytes that this UTF-16 string would require in Latin1
+   * format.
    *
    *
    * @param input         the UTF-16 string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
    * @return the number of bytes required to encode the UTF-16 string as Latin1
    */
-    simdutf_warn_unused virtual size_t utf16_length_from_latin1(size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  utf16_length_from_latin1(size_t length) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-32 string into UTF-16LE string.
@@ -3378,11 +4002,15 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
    * @param utf16_buffer   the pointer to buffer that can hold conversion result
-   * @return number of written code units; 0 if input is not a valid UTF-32 string
+   * @return number of written code units; 0 if input is not a valid UTF-32
+   * string
    */
-  simdutf_warn_unused virtual size_t convert_utf32_to_utf16le(const char32_t * input, size_t length, char16_t* utf16_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf32_to_utf16le(const char32_t *input, size_t length,
+                           char16_t *utf16_buffer) const noexcept = 0;
 
   /**
    * Convert possibly broken UTF-32 string into UTF-16BE string.
@@ -3393,14 +4021,19 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
    * @param utf16_buffer   the pointer to buffer that can hold conversion result
-   * @return number of written code units; 0 if input is not a valid UTF-32 string
+   * @return number of written code units; 0 if input is not a valid UTF-32
+   * string
    */
-  simdutf_warn_unused virtual size_t convert_utf32_to_utf16be(const char32_t * input, size_t length, char16_t* utf16_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_utf32_to_utf16be(const char32_t *input, size_t length,
+                           char16_t *utf16_buffer) const noexcept = 0;
 
   /**
-   * Convert possibly broken UTF-32 string into UTF-16LE string and stop on error.
+   * Convert possibly broken UTF-32 string into UTF-16LE string and stop on
+   * error.
    *
    * During the conversion also validation of the input string is done.
    * This function is suitable to work with inputs from untrusted sources.
@@ -3408,14 +4041,21 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
    * @param utf16_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char16_t written if successful.
+   * @return a result pair struct (of type simdutf::result containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of char16_t written if
+   * successful.
    */
-  simdutf_warn_unused virtual result convert_utf32_to_utf16le_with_errors(const char32_t * input, size_t length, char16_t* utf16_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual result convert_utf32_to_utf16le_with_errors(
+      const char32_t *input, size_t length,
+      char16_t *utf16_buffer) const noexcept = 0;
 
   /**
-   * Convert possibly broken UTF-32 string into UTF-16BE string and stop on error.
+   * Convert possibly broken UTF-32 string into UTF-16BE string and stop on
+   * error.
    *
    * During the conversion also validation of the input string is done.
    * This function is suitable to work with inputs from untrusted sources.
@@ -3423,11 +4063,17 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
    * @param utf16_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in code units) if any, or the number of char16_t written if successful.
+   * @return a result pair struct (of type simdutf::result containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in code units) if any, or the number of char16_t written if
+   * successful.
    */
-  simdutf_warn_unused virtual result convert_utf32_to_utf16be_with_errors(const char32_t * input, size_t length, char16_t* utf16_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual result convert_utf32_to_utf16be_with_errors(
+      const char32_t *input, size_t length,
+      char16_t *utf16_buffer) const noexcept = 0;
 
   /**
    * Convert valid UTF-32 string into UTF-16LE string.
@@ -3437,11 +4083,15 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
-   * @param utf16_buffer   the pointer to buffer that can hold the conversion result
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
+   * @param utf16_buffer   the pointer to buffer that can hold the conversion
+   * result
    * @return number of written code units; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf32_to_utf16le(const char32_t * input, size_t length, char16_t* utf16_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf32_to_utf16le(const char32_t *input, size_t length,
+                                 char16_t *utf16_buffer) const noexcept = 0;
 
   /**
    * Convert valid UTF-32 string into UTF-16BE string.
@@ -3451,137 +4101,175 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
-   * @param utf16_buffer   the pointer to buffer that can hold the conversion result
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
+   * @param utf16_buffer   the pointer to buffer that can hold the conversion
+   * result
    * @return number of written code units; 0 if conversion is not possible
    */
-  simdutf_warn_unused virtual size_t convert_valid_utf32_to_utf16be(const char32_t * input, size_t length, char16_t* utf16_buffer) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  convert_valid_utf32_to_utf16be(const char32_t *input, size_t length,
+                                 char16_t *utf16_buffer) const noexcept = 0;
 
   /**
-   * Change the endianness of the input. Can be used to go from UTF-16LE to UTF-16BE or
-   * from UTF-16BE to UTF-16LE.
+   * Change the endianness of the input. Can be used to go from UTF-16LE to
+   * UTF-16BE or from UTF-16BE to UTF-16LE.
    *
    * This function does not validate the input.
    *
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16 string to process
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @param output        the pointer to buffer that can hold the conversion result
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @param output        the pointer to buffer that can hold the conversion
+   * result
    */
-  virtual void change_endianness_utf16(const char16_t * input, size_t length, char16_t * output) const noexcept = 0;
+  virtual void change_endianness_utf16(const char16_t *input, size_t length,
+                                       char16_t *output) const noexcept = 0;
 
- /**
-   * Return the number of bytes that this Latin1 string would require in UTF-8 format.
+  /**
+   * Return the number of bytes that this Latin1 string would require in UTF-8
+   * format.
    *
    * @param input         the Latin1 string to convert
    * @param length        the length of the string bytes
    * @return the number of bytes required to encode the Latin1 string as UTF-8
    */
-    simdutf_warn_unused virtual size_t utf8_length_from_latin1(const char * input, size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  utf8_length_from_latin1(const char *input, size_t length) const noexcept = 0;
 
   /**
-   * Compute the number of bytes that this UTF-32 string would require in UTF-8 format.
+   * Compute the number of bytes that this UTF-32 string would require in UTF-8
+   * format.
    *
-   * This function does not validate the input. It is acceptable to pass invalid UTF-32 strings but in such cases
-   * the result is implementation defined.
+   * This function does not validate the input. It is acceptable to pass invalid
+   * UTF-32 strings but in such cases the result is implementation defined.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
    * @return the number of bytes required to encode the UTF-32 string as UTF-8
    */
-  simdutf_warn_unused virtual size_t utf8_length_from_utf32(const char32_t * input, size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  utf8_length_from_utf32(const char32_t *input,
+                         size_t length) const noexcept = 0;
 
   /**
-   * Compute the number of bytes that this UTF-32 string would require in Latin1 format.
+   * Compute the number of bytes that this UTF-32 string would require in Latin1
+   * format.
    *
-   * This function does not validate the input. It is acceptable to pass invalid UTF-32 strings but in such cases
-   * the result is implementation defined.
+   * This function does not validate the input. It is acceptable to pass invalid
+   * UTF-32 strings but in such cases the result is implementation defined.
    *
-   * @param length        the length of the string in 4-byte code units (char32_t)
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
    * @return the number of bytes required to encode the UTF-32 string as Latin1
    */
-  simdutf_warn_unused virtual size_t latin1_length_from_utf32(size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  latin1_length_from_utf32(size_t length) const noexcept = 0;
 
   /**
-   * Compute the number of bytes that this UTF-8 string would require in Latin1 format.
+   * Compute the number of bytes that this UTF-8 string would require in Latin1
+   * format.
    *
-   * This function does not validate the input. It is acceptable to pass invalid UTF-8 strings but in such cases
-   * the result is implementation defined.
+   * This function does not validate the input. It is acceptable to pass invalid
+   * UTF-8 strings but in such cases the result is implementation defined.
    *
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in byte
    * @return the number of bytes required to encode the UTF-8 string as Latin1
    */
-  simdutf_warn_unused virtual size_t latin1_length_from_utf8(const char * input, size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  latin1_length_from_utf8(const char *input, size_t length) const noexcept = 0;
 
   /*
-   * Compute the number of bytes that this UTF-16LE/BE string would require in Latin1 format.
+   * Compute the number of bytes that this UTF-16LE/BE string would require in
+   * Latin1 format.
    *
-   * This function does not validate the input. It is acceptable to pass invalid UTF-16 strings but in such cases
-   * the result is implementation defined.
+   * This function does not validate the input. It is acceptable to pass invalid
+   * UTF-16 strings but in such cases the result is implementation defined.
    *
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16LE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @return the number of bytes required to encode the UTF-16LE string as Latin1
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @return the number of bytes required to encode the UTF-16LE string as
+   * Latin1
    */
-  simdutf_warn_unused virtual size_t latin1_length_from_utf16(size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  latin1_length_from_utf16(size_t length) const noexcept = 0;
 
   /**
-   * Compute the number of two-byte code units that this UTF-32 string would require in UTF-16 format.
+   * Compute the number of two-byte code units that this UTF-32 string would
+   * require in UTF-16 format.
    *
-   * This function does not validate the input. It is acceptable to pass invalid UTF-32 strings but in such cases
-   * the result is implementation defined.
+   * This function does not validate the input. It is acceptable to pass invalid
+   * UTF-32 strings but in such cases the result is implementation defined.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
    * @return the number of bytes required to encode the UTF-32 string as UTF-16
    */
-  simdutf_warn_unused virtual size_t utf16_length_from_utf32(const char32_t * input, size_t length) const noexcept = 0;
-
+  simdutf_warn_unused virtual size_t
+  utf16_length_from_utf32(const char32_t *input,
+                          size_t length) const noexcept = 0;
 
   /**
-   * Return the number of bytes that this UTF-32 string would require in Latin1 format.
+   * Return the number of bytes that this UTF-32 string would require in Latin1
+   * format.
    *
    * @param input         the UTF-32 string to convert
-   * @param length        the length of the string in 4-byte code units (char32_t)
+   * @param length        the length of the string in 4-byte code units
+   * (char32_t)
    * @return the number of bytes required to encode the UTF-32 string as Latin1
    */
-    simdutf_warn_unused virtual size_t utf32_length_from_latin1(size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  utf32_length_from_latin1(size_t length) const noexcept = 0;
 
   /*
-   * Compute the number of bytes that this UTF-16LE string would require in UTF-32 format.
+   * Compute the number of bytes that this UTF-16LE string would require in
+   * UTF-32 format.
    *
    * This function is equivalent to count_utf16le.
    *
-   * This function does not validate the input. It is acceptable to pass invalid UTF-16 strings but in such cases
-   * the result is implementation defined.
+   * This function does not validate the input. It is acceptable to pass invalid
+   * UTF-16 strings but in such cases the result is implementation defined.
    *
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16LE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @return the number of bytes required to encode the UTF-16LE string as UTF-32
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @return the number of bytes required to encode the UTF-16LE string as
+   * UTF-32
    */
-  simdutf_warn_unused virtual size_t utf32_length_from_utf16le(const char16_t * input, size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  utf32_length_from_utf16le(const char16_t *input,
+                            size_t length) const noexcept = 0;
 
   /*
-   * Compute the number of bytes that this UTF-16BE string would require in UTF-32 format.
+   * Compute the number of bytes that this UTF-16BE string would require in
+   * UTF-32 format.
    *
    * This function is equivalent to count_utf16be.
    *
-   * This function does not validate the input. It is acceptable to pass invalid UTF-16 strings but in such cases
-   * the result is implementation defined.
+   * This function does not validate the input. It is acceptable to pass invalid
+   * UTF-16 strings but in such cases the result is implementation defined.
    *
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16BE string to convert
-   * @param length        the length of the string in 2-byte code units (char16_t)
-   * @return the number of bytes required to encode the UTF-16BE string as UTF-32
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
+   * @return the number of bytes required to encode the UTF-16BE string as
+   * UTF-32
    */
-  simdutf_warn_unused virtual size_t utf32_length_from_utf16be(const char16_t * input, size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  utf32_length_from_utf16be(const char16_t *input,
+                            size_t length) const noexcept = 0;
 
   /**
    * Count the number of code points (characters) in the string assuming that
@@ -3594,10 +4282,12 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16LE string to process
-   * @param length        the length of the string in 2-byte code units (char16_t)
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
    * @return number of code points
    */
-  simdutf_warn_unused virtual size_t count_utf16le(const char16_t * input, size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  count_utf16le(const char16_t *input, size_t length) const noexcept = 0;
 
   /**
    * Count the number of code points (characters) in the string assuming that
@@ -3610,11 +4300,12 @@ class implementation {
    * This function is not BOM-aware.
    *
    * @param input         the UTF-16BE string to process
-   * @param length        the length of the string in 2-byte code units (char16_t)
+   * @param length        the length of the string in 2-byte code units
+   * (char16_t)
    * @return number of code points
    */
-  simdutf_warn_unused virtual size_t count_utf16be(const char16_t * input, size_t length) const noexcept = 0;
-
+  simdutf_warn_unused virtual size_t
+  count_utf16be(const char16_t *input, size_t length) const noexcept = 0;
 
   /**
    * Count the number of code points (characters) in the string assuming that
@@ -3628,135 +4319,174 @@ class implementation {
    * @param length        the length of the string in bytes
    * @return number of code points
    */
-  simdutf_warn_unused virtual size_t count_utf8(const char * input, size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  count_utf8(const char *input, size_t length) const noexcept = 0;
 
   /**
    * Provide the maximal binary length in bytes given the base64 input.
-   * In general, if the input contains ASCII spaces, the result will be less than
-   * the maximum length. It is acceptable to pass invalid base64 strings but in such cases
-   * the result is implementation defined.
+   * In general, if the input contains ASCII spaces, the result will be less
+   * than the maximum length. It is acceptable to pass invalid base64 strings
+   * but in such cases the result is implementation defined.
    *
    * @param input         the base64 input to process
    * @param length        the length of the base64 input in bytes
    * @return maximal number of binary bytes
    */
-  simdutf_warn_unused virtual size_t maximal_binary_length_from_base64(const char * input, size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  maximal_binary_length_from_base64(const char *input,
+                                    size_t length) const noexcept = 0;
 
   /**
    * Provide the maximal binary length in bytes given the base64 input.
-   * In general, if the input contains ASCII spaces, the result will be less than
-   * the maximum length. It is acceptable to pass invalid base64 strings but in such cases
-   * the result is implementation defined.
+   * In general, if the input contains ASCII spaces, the result will be less
+   * than the maximum length. It is acceptable to pass invalid base64 strings
+   * but in such cases the result is implementation defined.
    *
-   * @param input         the base64 input to process, in ASCII stored as 16-bit units
+   * @param input         the base64 input to process, in ASCII stored as 16-bit
+   * units
    * @param length        the length of the base64 input in 16-bit units
    * @return maximal number of binary bytes
    */
-  simdutf_warn_unused virtual size_t maximal_binary_length_from_base64(const char16_t * input, size_t length) const noexcept = 0;
+  simdutf_warn_unused virtual size_t
+  maximal_binary_length_from_base64(const char16_t *input,
+                                    size_t length) const noexcept = 0;
 
   /**
    * Convert a base64 input to a binary output.
    *
-   * This function follows the WHATWG forgiving-base64 format, which means that it will
-   * ignore any ASCII spaces in the input. You may provide a padded input (with one or two
-   * equal signs at the end) or an unpadded input (without any equal signs at the end).
+   * This function follows the WHATWG forgiving-base64 format, which means that
+   * it will ignore any ASCII spaces in the input. You may provide a padded
+   * input (with one or two equal signs at the end) or an unpadded input
+   * (without any equal signs at the end).
    *
    * See https://infra.spec.whatwg.org/#forgiving-base64-decode
    *
-   * This function will fail in case of invalid input. There are two possible reasons for
-   * failure: the input contains a number of base64 characters that when divided by 4, leaves
-   * a single remainder character (BASE64_INPUT_REMAINDER), or the input contains a character
-   * that is not a valid base64 character (INVALID_BASE64_CHARACTER).
+   * This function will fail in case of invalid input. There are two possible
+   * reasons for failure: the input contains a number of base64 characters that
+   * when divided by 4, leaves a single remainder character
+   * (BASE64_INPUT_REMAINDER), or the input contains a character that is not a
+   * valid base64 character (INVALID_BASE64_CHARACTER).
    *
-   * You should call this function with a buffer that is at least maximal_binary_length_from_base64(input, length) bytes long.
-   * If you fail to provide that much space, the function may cause a buffer overflow.
+   * You should call this function with a buffer that is at least
+   * maximal_binary_length_from_base64(input, length) bytes long. If you fail to
+   * provide that much space, the function may cause a buffer overflow.
    *
    * @param input         the base64 string to process
    * @param length        the length of the string in bytes
-   * @param output        the pointer to buffer that can hold the conversion result (should be at least maximal_binary_length_from_base64(input, length) bytes long).
-   * @param options       the base64 options to use, can be base64_default or base64_url, is base64_default by default.
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and either position of the error (in the input in bytes) if any, or the number of bytes written if successful.
+   * @param output        the pointer to buffer that can hold the conversion
+   * result (should be at least maximal_binary_length_from_base64(input, length)
+   * bytes long).
+   * @param options       the base64 options to use, can be base64_default or
+   * base64_url, is base64_default by default.
+   * @return a result pair struct (of type simdutf::error containing the two
+   * fields error and count) with an error code and either position of the error
+   * (in the input in bytes) if any, or the number of bytes written if
+   * successful.
    */
-  simdutf_warn_unused virtual result base64_to_binary(const char * input, size_t length, char* output, base64_options options = base64_default) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  base64_to_binary(const char *input, size_t length, char *output,
+                   base64_options options = base64_default,
+                   last_chunk_handling_options last_chunk_options =
+                       last_chunk_handling_options::loose) const noexcept = 0;
 
   /**
    * Convert a base64 input to a binary output.
    *
-   * This function follows the WHATWG forgiving-base64 format, which means that it will
-   * ignore any ASCII spaces in the input. You may provide a padded input (with one or two
-   * equal signs at the end) or an unpadded input (without any equal signs at the end).
+   * This function follows the WHATWG forgiving-base64 format, which means that
+   * it will ignore any ASCII spaces in the input. You may provide a padded
+   * input (with one or two equal signs at the end) or an unpadded input
+   * (without any equal signs at the end).
    *
    * See https://infra.spec.whatwg.org/#forgiving-base64-decode
    *
-   * This function will fail in case of invalid input. There are two possible reasons for
-   * failure: the input contains a number of base64 characters that when divided by 4, leaves
-   * a single remainder character (BASE64_INPUT_REMAINDER), or the input contains a character
-   * that is not a valid base64 character (INVALID_BASE64_CHARACTER).
+   * This function will fail in case of invalid input. There are two possible
+   * reasons for failure: the input contains a number of base64 characters that
+   * when divided by 4, leaves a single remainder character
+   * (BASE64_INPUT_REMAINDER), or the input contains a character that is not a
+   * valid base64 character (INVALID_BASE64_CHARACTER).
    *
-   * You should call this function with a buffer that is at least maximal_binary_length_from_utf6_base64(input, length) bytes long.
-   * If you fail to provide that much space, the function may cause a buffer overflow.
+   * You should call this function with a buffer that is at least
+   * maximal_binary_length_from_base64(input, length) bytes long. If you
+   * fail to provide that much space, the function may cause a buffer overflow.
    *
-   * @param input         the base64 string to process, in ASCII stored as 16-bit units
+   * @param input         the base64 string to process, in ASCII stored as
+   * 16-bit units
    * @param length        the length of the string in 16-bit units
-   * @param output        the pointer to buffer that can hold the conversion result (should be at least maximal_binary_length_from_base64(input, length) bytes long).
-   * @param options       the base64 options to use, can be base64_default or base64_url, is base64_default by default.
-   * @return a result pair struct (of type simdutf::error containing the two fields error and count) with an error code and position of the INVALID_BASE64_CHARACTER error (in the input in units) if any, or the number of bytes written if successful.
+   * @param output        the pointer to buffer that can hold the conversion
+   * result (should be at least maximal_binary_length_from_base64(input, length)
+   * bytes long).
+   * @param options       the base64 options to use, can be base64_default or
+   * base64_url, is base64_default by default.
+   * @return a result pair struct (of type simdutf::error containing the two
+   * fields error and count) with an error code and position of the
+   * INVALID_BASE64_CHARACTER error (in the input in units) if any, or the
+   * number of bytes written if successful.
    */
-  simdutf_warn_unused virtual result base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options = base64_default) const noexcept = 0;
+  simdutf_warn_unused virtual result
+  base64_to_binary(const char16_t *input, size_t length, char *output,
+                   base64_options options = base64_default,
+                   last_chunk_handling_options last_chunk_options =
+                       last_chunk_handling_options::loose) const noexcept = 0;
 
   /**
    * Provide the base64 length in bytes given the length of a binary input.
    *
    * @param length        the length of the input in bytes
-   * @parem options       the base64 options to use, can be base64_default or base64_url, is base64_default by default.
+   * @param options       the base64 options to use, can be base64_default or
+   * base64_url, is base64_default by default.
    * @return number of base64 bytes
    */
-  simdutf_warn_unused virtual size_t base64_length_from_binary(size_t length, base64_options options = base64_default) const noexcept = 0;
+  simdutf_warn_unused virtual size_t base64_length_from_binary(
+      size_t length,
+      base64_options options = base64_default) const noexcept = 0;
 
   /**
    * Convert a binary input to a base64 output.
    *
-   * The default option (simdutf::base64_default) uses the characters `+` and `/` as part of its alphabet.
-   * Further, it adds padding (`=`) at the end of the output to ensure that the output length is a multiple of four.
+   * The default option (simdutf::base64_default) uses the characters `+` and
+   * `/` as part of its alphabet. Further, it adds padding (`=`) at the end of
+   * the output to ensure that the output length is a multiple of four.
    *
-   * The URL option (simdutf::base64_url) uses the characters `-` and `_` as part of its alphabet. No padding
-   * is added at the end of the output.
+   * The URL option (simdutf::base64_url) uses the characters `-` and `_` as
+   * part of its alphabet. No padding is added at the end of the output.
    *
    * This function always succeeds.
    *
    * @param input         the binary to process
    * @param length        the length of the input in bytes
-   * @param output        the pointer to buffer that can hold the conversion result (should be at least base64_length_from_binary(length) bytes long)
-   * @param options       the base64 options to use, can be base64_default or base64_url, is base64_default by default.
-   * @return number of written bytes, will be equal to base64_length_from_binary(length, options)
+   * @param output        the pointer to buffer that can hold the conversion
+   * result (should be at least base64_length_from_binary(length) bytes long)
+   * @param options       the base64 options to use, can be base64_default or
+   * base64_url, is base64_default by default.
+   * @return number of written bytes, will be equal to
+   * base64_length_from_binary(length, options)
    */
-  virtual size_t binary_to_base64(const char * input, size_t length, char* output, base64_options options = base64_default) const noexcept = 0;
-
+  virtual size_t
+  binary_to_base64(const char *input, size_t length, char *output,
+                   base64_options options = base64_default) const noexcept = 0;
 
 protected:
-  /** @private Construct an implementation with the given name and description. For subclasses. */
-  simdutf_really_inline implementation(
-    const char* name,
-    const char* description,
-    uint32_t required_instruction_sets
-  ) :
-    _name(name),
-    _description(description),
-    _required_instruction_sets(required_instruction_sets)
-  {
-  }
+  /** @private Construct an implementation with the given name and description.
+   * For subclasses. */
+  simdutf_really_inline implementation(const char *name,
+                                       const char *description,
+                                       uint32_t required_instruction_sets)
+      : _name(name), _description(description),
+        _required_instruction_sets(required_instruction_sets) {}
+
 protected:
   ~implementation() = default;
+
 private:
   /**
    * The name of this implementation.
    */
-  const char* _name;
+  const char *_name;
 
   /**
    * The description of this implementation.
    */
-  const char* _description;
+  const char *_description;
 
   /**
    * Instruction sets required for this implementation.
@@ -3777,26 +4507,28 @@ class available_implementation_list {
   /** Number of implementations */
   size_t size() const noexcept;
   /** STL const begin() iterator */
-  const implementation * const *begin() const noexcept;
+  const implementation *const *begin() const noexcept;
   /** STL const end() iterator */
-  const implementation * const *end() const noexcept;
+  const implementation *const *end() const noexcept;
 
   /**
    * Get the implementation with the given name.
    *
    * Case sensitive.
    *
-   *     const implementation *impl = simdutf::available_implementations["westmere"];
-   *     if (!impl) { exit(1); }
-   *     if (!imp->supported_by_runtime_system()) { exit(1); }
+   *     const implementation *impl =
+   *         simdutf::available_implementations["westmere"];
+   *     if (!impl || !impl->supported_by_runtime_system()) { exit(1); }
    *     simdutf::active_implementation = impl;
    *
    * @param name the implementation to find, e.g. "westmere", "haswell", "arm64"
    * @return the implementation, or nullptr if the parse failed.
    */
-  const implementation * operator[](const std::string &name) const noexcept {
-    for (const implementation * impl : *this) {
-      if (impl->name() == name) { return impl; }
+  const implementation *operator[](const std::string &name) const noexcept {
+    for (const implementation *impl : *this) {
+      if (impl->name() == name) {
+        return impl;
+      }
     }
     return nullptr;
   }
@@ -3806,48 +4538,54 @@ class available_implementation_list {
    *
    * This is used to initialize the implementation on startup.
    *
-   *     const implementation *impl = simdutf::available_implementation::detect_best_supported();
+   *     const implementation *impl =
+   *         simdutf::available_implementation::detect_best_supported();
    *     simdutf::active_implementation = impl;
    *
-   * @return the most advanced supported implementation for the current host, or an
-   *         implementation that returns UNSUPPORTED_ARCHITECTURE if there is no supported
-   *         implementation. Will never return nullptr.
+   * @return the most advanced supported implementation for the current host, or
+   * an implementation that returns UNSUPPORTED_ARCHITECTURE if there is no
+   * supported implementation. Will never return nullptr.
    */
   const implementation *detect_best_supported() const noexcept;
 };
 
-template<typename T>
-class atomic_ptr {
+template <typename T> class atomic_ptr {
 public:
   atomic_ptr(T *_ptr) : ptr{_ptr} {}
 
 #if defined(SIMDUTF_NO_THREADS)
-  operator const T*() const { return ptr; }
-  const T& operator*() const { return *ptr; }
-  const T* operator->() const { return ptr; }
-
-  operator T*() { return ptr; }
-  T& operator*() { return *ptr; }
-  T* operator->() { return ptr; }
-  atomic_ptr& operator=(T *_ptr) { ptr = _ptr; return *this; }
+  operator const T *() const { return ptr; }
+  const T &operator*() const { return *ptr; }
+  const T *operator->() const { return ptr; }
+
+  operator T *() { return ptr; }
+  T &operator*() { return *ptr; }
+  T *operator->() { return ptr; }
+  atomic_ptr &operator=(T *_ptr) {
+    ptr = _ptr;
+    return *this;
+  }
 
 #else
-  operator const T*() const { return ptr.load(); }
-  const T& operator*() const { return *ptr; }
-  const T* operator->() const { return ptr.load(); }
-
-  operator T*() { return ptr.load(); }
-  T& operator*() { return *ptr; }
-  T* operator->() { return ptr.load(); }
-  atomic_ptr& operator=(T *_ptr) { ptr = _ptr; return *this; }
+  operator const T *() const { return ptr.load(); }
+  const T &operator*() const { return *ptr; }
+  const T *operator->() const { return ptr.load(); }
+
+  operator T *() { return ptr.load(); }
+  T &operator*() { return *ptr; }
+  T *operator->() { return ptr.load(); }
+  atomic_ptr &operator=(T *_ptr) {
+    ptr = _ptr;
+    return *this;
+  }
 
 #endif
 
 private:
 #if defined(SIMDUTF_NO_THREADS)
-  T* ptr;
+  T *ptr;
 #else
-  std::atomic<T*> ptr;
+  std::atomic<T *> ptr;
 #endif
 };
 
@@ -3858,27 +4596,28 @@ class detect_best_supported_implementation_on_first_use;
 /**
  * The list of available implementations compiled into simdutf.
  */
-extern SIMDUTF_DLLIMPORTEXPORT const internal::available_implementation_list& get_available_implementations();
+extern SIMDUTF_DLLIMPORTEXPORT const internal::available_implementation_list &
+get_available_implementations();
 
 /**
-  * The active implementation.
-  *
-  * Automatically initialized on first use to the most advanced implementation supported by this hardware.
-  */
-extern SIMDUTF_DLLIMPORTEXPORT internal::atomic_ptr<const implementation>& get_active_implementation();
-
+ * The active implementation.
+ *
+ * Automatically initialized on first use to the most advanced implementation
+ * supported by this hardware.
+ */
+extern SIMDUTF_DLLIMPORTEXPORT internal::atomic_ptr<const implementation> &
+get_active_implementation();
 
 } // namespace simdutf
 
 #endif // SIMDUTF_IMPLEMENTATION_H
 /* end file include/simdutf/implementation.h */
 
-
-// Implementation-internal files (must be included before the implementations themselves, to keep
-// amalgamation working--otherwise, the first time a file is included, it might be put inside the
-// #ifdef SIMDUTF_IMPLEMENTATION_ARM64/FALLBACK/etc., which means the other implementations can't
-// compile unless that implementation is turned on).
-
+// Implementation-internal files (must be included before the implementations
+// themselves, to keep amalgamation working--otherwise, the first time a file is
+// included, it might be put inside the #ifdef
+// SIMDUTF_IMPLEMENTATION_ARM64/FALLBACK/etc., which means the other
+// implementations can't compile unless that implementation is turned on).
 
 SIMDUTF_POP_DISABLE_WARNINGS
 

From 8c8de30680a181ba1fbeaa58c1d5736fd2134347 Mon Sep 17 00:00:00 2001
From: Wei Zhu <yesmeck@gmail.com>
Date: Wed, 16 Oct 2024 05:32:07 +1030
Subject: [PATCH 018/216] esm: fix inconsistency with `importAssertion` in
 `resolve` hook

As the documentation states, `context.importAssertions` should still be
supported and emit a warning. This holds for the `load` hook, but not
for the context passed to the `resolve` hook.

This commit fixes the inconsistency.

PR-URL: https://github.com/nodejs/node/pull/55365
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
---
 lib/internal/modules/esm/hooks.js                    | 2 +-
 test/es-module/test-esm-import-assertion-warning.mjs | 9 +++++++--
 test/fixtures/es-module-loaders/hooks-input.mjs      | 4 +++-
 3 files changed, 11 insertions(+), 4 deletions(-)
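
Reviewer note: the sketch below illustrates the kind of aliasing this
patch relies on. It assumes `defineImportAssertionAlias` installs a
non-enumerable accessor that forwards to `importAttributes` and warns on
use. That assumption is inferred from the tests switching to
`Reflect.ownKeys`, and the actual helper in hooks.js may differ. The
`emitImportAssertionWarning` counter is a hypothetical stand-in for
Node's internal warning helper.

```javascript
'use strict';

// Hypothetical stand-in: counts warnings instead of emitting them.
let warningCount = 0;
function emitImportAssertionWarning() {
  warningCount++;
}

// Sketch only: expose `importAssertions` as a non-enumerable accessor
// that reads from and writes to `importAttributes`.
function defineImportAssertionAlias(context) {
  return Object.defineProperty(context, 'importAssertions', {
    configurable: true,
    get() {
      emitImportAssertionWarning();
      return this.importAttributes;
    },
    set(value) {
      emitImportAssertionWarning();
      this.importAttributes = value;
    },
  });
}

const context = defineImportAssertionAlias({
  conditions: ['node', 'import'],
  importAttributes: { type: 'json' },
  parentURL: undefined,
});

// Reading the legacy name yields the same object as the new name.
console.log(context.importAssertions === context.importAttributes);
// The alias is hidden from Object.keys() but visible to Reflect.ownKeys().
console.log(Object.keys(context));
console.log(Reflect.ownKeys(context));
```

Because the alias is non-enumerable in this sketch, a hook that spreads
or iterates `context` would not see the legacy key, while a direct
property read would still work.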

diff --git a/lib/internal/modules/esm/hooks.js b/lib/internal/modules/esm/hooks.js
index aca37333581596..88c66f89a83c66 100644
--- a/lib/internal/modules/esm/hooks.js
+++ b/lib/internal/modules/esm/hooks.js
@@ -303,7 +303,7 @@ class Hooks {
 
     const nextResolve = nextHookFactory(chain[chain.length - 1], meta, { validateArgs, validateOutput });
 
-    const resolution = await nextResolve(originalSpecifier, context);
+    const resolution = await nextResolve(originalSpecifier, defineImportAssertionAlias(context));
     const { hookErrIdentifier } = meta; // Retrieve the value after all settled
 
     validateOutput(hookErrIdentifier, resolution);
diff --git a/test/es-module/test-esm-import-assertion-warning.mjs b/test/es-module/test-esm-import-assertion-warning.mjs
index a11b5164cebffc..2f9b7348a2d97a 100644
--- a/test/es-module/test-esm-import-assertion-warning.mjs
+++ b/test/es-module/test-esm-import-assertion-warning.mjs
@@ -7,6 +7,11 @@ await Promise.all([
   `data:text/javascript,export ${encodeURIComponent(function resolve() {
     return { shortCircuit: true, url: 'data:application/json,1', importAssertions: { type: 'json' } };
   })}`,
+  // Using importAssertions on the context object of the resolve hook should warn but still work.
+  `data:text/javascript,export ${encodeURIComponent(function resolve(s, c, n) {
+    const type = c.importAssertions.type;
+    return { shortCircuit: true, url: 'data:application/json,1', importAttributes: { type: type ?? 'json' } };
+  })}`,
   // Setting importAssertions on the context object of the load hook should warn but still work.
   `data:text/javascript,export ${encodeURIComponent(function load(u, c, n) {
     c.importAssertions = { type: 'json' };
@@ -22,9 +27,9 @@ await Promise.all([
     '--eval', `
     import assert from 'node:assert';
     import { register } from 'node:module';
-    
+
     register(${JSON.stringify(loaderURL)});
-    
+
     assert.deepStrictEqual(
       { ...await import('data:') },
       { default: 1 }
diff --git a/test/fixtures/es-module-loaders/hooks-input.mjs b/test/fixtures/es-module-loaders/hooks-input.mjs
index 1d3759f458224e..854b8e619281e4 100644
--- a/test/fixtures/es-module-loaders/hooks-input.mjs
+++ b/test/fixtures/es-module-loaders/hooks-input.mjs
@@ -37,6 +37,7 @@ export async function resolve(specifier, context, next) {
     'conditions',
     'importAttributes',
     'parentURL',
+    'importAssertions',
   ]);
   assert.ok(Array.isArray(context.conditions));
   assert.strictEqual(typeof next, 'function');
@@ -71,9 +72,10 @@ export async function load(url, context, next) {
 
   assert.ok(new URL(url));
   // Ensure `context` has all and only the properties it's supposed to
-  assert.deepStrictEqual(Object.keys(context), [
+  assert.deepStrictEqual(Reflect.ownKeys(context), [
     'format',
     'importAttributes',
+    'importAssertions',
   ]);
   assert.strictEqual(context.format, 'test');
   assert.strictEqual(typeof next, 'function');

From 3fcca163747b1a2d9f1ecc700f4c29d14731f114 Mon Sep 17 00:00:00 2001
From: Erick Wendel <erick.workspace@gmail.com>
Date: Tue, 15 Oct 2024 17:36:09 -0300
Subject: [PATCH 019/216] test_runner: add support for scheduler.wait on mock
 timers

This adds support for `scheduler.wait` from `node:timers/promises` to MockTimers.

Refs: https://github.com/nodejs/node/pull/55244
PR-URL: https://github.com/nodejs/node/pull/55244
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Chemi Atlow <chemi@atlow.co.il>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
---
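Reviewer note: the change follows a swap-and-restore pattern. Enabling the mock replaces `scheduler.wait` with a function that only settles when `tick()` is called, and the `toReal` handler puts the original back. Below is a minimal sketch of that idea only; `FakeClock`, `pendingCount`, and the plain `scheduler` object are hypothetical stand-ins, not the `MockTimers` implementation.

```javascript
// Hypothetical miniature of the swap-and-restore idea; not the MockTimers code.
class FakeClock {
  #now = 0;
  #pending = [];
  #original;

  // Number of waiters that have not been resolved yet.
  get pendingCount() {
    return this.#pending.length;
  }

  // Replace target.wait with a version that only settles when tick() is called.
  enable(target) {
    this.#original = target.wait;
    target.wait = (delay) => new Promise((resolve) => {
      this.#pending.push({ runAt: this.#now + delay, resolve });
    });
  }

  // Advance fake time and resolve every waiter that is now due.
  tick(ms) {
    this.#now += ms;
    const due = this.#pending.filter((w) => w.runAt <= this.#now);
    this.#pending = this.#pending.filter((w) => w.runAt > this.#now);
    for (const w of due) w.resolve(undefined);
  }

  // Put the original function back and drop any leftover waiters.
  reset(target) {
    target.wait = this.#original;
    this.#pending = [];
  }
}

// Stand-in for the object that owns wait(); assumed here for illustration.
const scheduler = { wait: (delay) => new Promise((r) => setTimeout(r, delay)) };
const realWait = scheduler.wait;
const clock = new FakeClock();
clock.enable(scheduler);
const done = scheduler.wait(9999);
clock.tick(8999); // the waiter is still pending
clock.tick(1000); // the waiter is now due and gets resolved
clock.reset(scheduler);
```

The sketch leaves out `AbortSignal` handling and argument validation, which the added tests in this patch do exercise.
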
 lib/internal/test_runner/mock/mock_timers.js |  49 ++++++--
 test/parallel/test-runner-mock-timers.js     | 120 +++++++++++++++++++
 2 files changed, 157 insertions(+), 12 deletions(-)

diff --git a/lib/internal/test_runner/mock/mock_timers.js b/lib/internal/test_runner/mock/mock_timers.js
index 2d8ae186a9dfef..059d3fc96ac86c 100644
--- a/lib/internal/test_runner/mock/mock_timers.js
+++ b/lib/internal/test_runner/mock/mock_timers.js
@@ -60,9 +60,9 @@ function abortIt(signal) {
 }
 
 /**
- * @enum {('setTimeout'|'setInterval'|'setImmediate'|'Date')[]} Supported timers
+ * @enum {('setTimeout'|'setInterval'|'setImmediate'|'Date'|'scheduler.wait')[]} Supported timers
  */
-const SUPPORTED_APIS = ['setTimeout', 'setInterval', 'setImmediate', 'Date'];
+const SUPPORTED_APIS = ['setTimeout', 'setInterval', 'setImmediate', 'Date', 'scheduler.wait'];
 const TIMERS_DEFAULT_INTERVAL = {
   __proto__: null,
   setImmediate: -1,
@@ -104,6 +104,7 @@ class MockTimers {
 
   #realPromisifiedSetTimeout;
   #realPromisifiedSetInterval;
+  #realTimersPromisifiedSchedulerWait;
 
   #realTimersSetTimeout;
   #realTimersClearTimeout;
@@ -188,6 +189,13 @@ class MockTimers {
     );
   }
 
+  #restoreOriginalSchedulerWait() {
+    nodeTimersPromises.scheduler.wait = FunctionPrototypeBind(
+      this.#realTimersPromisifiedSchedulerWait,
+      this,
+    );
+  }
+
   #restoreOriginalSetTimeout() {
     ObjectDefineProperty(
       globalThis,
@@ -262,6 +270,14 @@ class MockTimers {
     );
   }
 
+  #storeOriginalSchedulerWait() {
+
+    this.#realTimersPromisifiedSchedulerWait = FunctionPrototypeBind(
+      nodeTimersPromises.scheduler.wait,
+      this,
+    );
+  }
+
   #storeOriginalSetTimeout() {
     this.#realSetTimeout = ObjectGetOwnPropertyDescriptor(
       globalThis,
@@ -558,8 +574,14 @@ class MockTimers {
     const options = {
       __proto__: null,
       toFake: {
-        __proto__: null,
-        setTimeout: () => {
+        '__proto__': null,
+        'scheduler.wait': () => {
+          this.#storeOriginalSchedulerWait();
+
+          nodeTimersPromises.scheduler.wait = (delay, options) =>
+            this.#setTimeoutPromisified(delay, undefined, options);
+        },
+        'setTimeout': () => {
           this.#storeOriginalSetTimeout();
 
           globalThis.setTimeout = this.#setTimeout;
@@ -573,7 +595,7 @@ class MockTimers {
             this,
           );
         },
-        setInterval: () => {
+        'setInterval': () => {
           this.#storeOriginalSetInterval();
 
           globalThis.setInterval = this.#setInterval;
@@ -587,7 +609,7 @@ class MockTimers {
             this,
           );
         },
-        setImmediate: () => {
+        'setImmediate': () => {
           this.#storeOriginalSetImmediate();
 
           // setImmediate functions needs to bind MockTimers
@@ -611,23 +633,26 @@ class MockTimers {
             this,
           );
         },
-        Date: () => {
+        'Date': () => {
           this.#nativeDateDescriptor = ObjectGetOwnPropertyDescriptor(globalThis, 'Date');
           globalThis.Date = this.#createDate();
         },
       },
       toReal: {
-        __proto__: null,
-        setTimeout: () => {
+        '__proto__': null,
+        'scheduler.wait': () => {
+          this.#restoreOriginalSchedulerWait();
+        },
+        'setTimeout': () => {
           this.#restoreOriginalSetTimeout();
         },
-        setInterval: () => {
+        'setInterval': () => {
           this.#restoreOriginalSetInterval();
         },
-        setImmediate: () => {
+        'setImmediate': () => {
           this.#restoreSetImmediate();
         },
-        Date: () => {
+        'Date': () => {
           ObjectDefineProperty(globalThis, 'Date', this.#nativeDateDescriptor);
         },
       },
diff --git a/test/parallel/test-runner-mock-timers.js b/test/parallel/test-runner-mock-timers.js
index 9e1bc7e62cc5b2..e438b2636b832a 100644
--- a/test/parallel/test-runner-mock-timers.js
+++ b/test/parallel/test-runner-mock-timers.js
@@ -791,6 +791,126 @@ describe('Mock Timers Test Suite', () => {
     });
   });
 
+  describe('scheduler Suite', () => {
+    describe('scheduler.wait', () => {
+      it('should advance in time and trigger timers when calling the .tick function', (t) => {
+        t.mock.timers.enable({ apis: ['scheduler.wait'] });
+
+        const now = Date.now();
+        const durationAtMost = 100;
+
+        const p = nodeTimersPromises.scheduler.wait(4000);
+        t.mock.timers.tick(4000);
+
+        return p.then(common.mustCall((result) => {
+          assert.strictEqual(result, undefined);
+          assert.ok(
+            Date.now() - now < durationAtMost,
+            `time should be advanced less than the ${durationAtMost}ms`
+          );
+        }));
+      });
+
+      it('should advance in time and trigger timers when calling the .tick function multiple times', async (t) => {
+        t.mock.timers.enable({ apis: ['scheduler.wait'] });
+
+        const fn = t.mock.fn();
+
+        nodeTimersPromises.scheduler.wait(9999).then(fn);
+
+        t.mock.timers.tick(8999);
+        assert.strictEqual(fn.mock.callCount(), 0);
+        t.mock.timers.tick(500);
+
+        await nodeTimersPromises.setImmediate();
+
+        assert.strictEqual(fn.mock.callCount(), 0);
+        t.mock.timers.tick(500);
+
+        await nodeTimersPromises.setImmediate();
+        assert.strictEqual(fn.mock.callCount(), 1);
+      });
+
+      it('should work with the same params as the original timers/promises/scheduler.wait', async (t) => {
+        t.mock.timers.enable({ apis: ['scheduler.wait'] });
+        const controller = new AbortController();
+        const p = nodeTimersPromises.scheduler.wait(2000, {
+          ref: true,
+          signal: controller.signal,
+        });
+
+        t.mock.timers.tick(1000);
+        t.mock.timers.tick(500);
+        t.mock.timers.tick(500);
+        t.mock.timers.tick(500);
+
+        const result = await p;
+        assert.strictEqual(result, undefined);
+      });
+
+      it('should abort operation if timers/promises/scheduler.wait received an aborted signal', async (t) => {
+        t.mock.timers.enable({ apis: ['scheduler.wait'] });
+        const controller = new AbortController();
+        const p = nodeTimersPromises.scheduler.wait(2000, {
+          ref: true,
+          signal: controller.signal,
+        });
+
+        t.mock.timers.tick(1000);
+        controller.abort();
+        t.mock.timers.tick(500);
+        t.mock.timers.tick(500);
+        t.mock.timers.tick(500);
+
+        await assert.rejects(() => p, {
+          name: 'AbortError',
+        });
+      });
+      it('should abort operation even if the .tick was not called', async (t) => {
+        t.mock.timers.enable({ apis: ['scheduler.wait'] });
+        const controller = new AbortController();
+        const p = nodeTimersPromises.scheduler.wait(2000, {
+          ref: true,
+          signal: controller.signal,
+        });
+
+        controller.abort();
+
+        await assert.rejects(() => p, {
+          name: 'AbortError',
+        });
+      });
+
+      it('should abort operation when .abort is called before calling scheduler.wait', async (t) => {
+        t.mock.timers.enable({ apis: ['scheduler.wait'] });
+        const controller = new AbortController();
+        controller.abort();
+        const p = nodeTimersPromises.scheduler.wait(2000, {
+          ref: true,
+          signal: controller.signal,
+        });
+
+        await assert.rejects(() => p, {
+          name: 'AbortError',
+        });
+      });
+
+      it('should reject given an invalid signal instance', async (t) => {
+        t.mock.timers.enable({ apis: ['scheduler.wait'] });
+        const p = nodeTimersPromises.scheduler.wait(2000, {
+          ref: true,
+          signal: {},
+        });
+
+        await assert.rejects(() => p, {
+          name: 'TypeError',
+          code: 'ERR_INVALID_ARG_TYPE',
+        });
+      });
+
+    });
+  });
+
   describe('Date Suite', () => {
     it('should return the initial UNIX epoch if not specified', (t) => {
       t.mock.timers.enable({ apis: ['Date'] });

From 85d8eb397c4662ca211d39867480f8ad1d37819d Mon Sep 17 00:00:00 2001
From: Jan Martin <jan.krems@gmail.com>
Date: Mon, 30 Sep 2024 11:08:35 -0700
Subject: [PATCH 020/216] doc: spell out condition restrictions

PR-URL: https://github.com/nodejs/node/pull/55187
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
---
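Reviewer note: the four restrictions spelled out in this patch can be read as a simple predicate. The sketch below is an illustration only; `isValidCondition` is a hypothetical helper, and the integer-key check is an approximation that rejects canonical non-negative integer strings. It is not claimed to be how Node.js validates conditions.

```javascript
// Hypothetical helper: checks the four restrictions listed in the doc text.
function isValidCondition(name) {
  // 1. Must contain at least one character.
  if (typeof name !== 'string' || name.length === 0) return false;
  // 2. Must not start with ".", which could be mistaken for a relative path.
  if (name.startsWith('.')) return false;
  // 3. Must not contain ",", which some CLI tools treat as a list separator.
  if (name.includes(',')) return false;
  // 4. Must not look like an integer property key such as "10"
  //    (approximated here as a canonical non-negative integer string).
  if (/^(0|[1-9]\d*)$/.test(name)) return false;
  return true;
}

const samples = ['production', 'my-env:dev', '', './local', 'a,b', '10'];
const accepted = samples.filter(isValidCondition);
```

With these samples, only `'production'` and `'my-env:dev'` pass all four checks.
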
 doc/api/packages.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/doc/api/packages.md b/doc/api/packages.md
index 4e3414c66f7f7a..8e07bb60c1e7f7 100644
--- a/doc/api/packages.md
+++ b/doc/api/packages.md
@@ -715,6 +715,20 @@ exports, while resolving the existing `"node"`, `"node-addons"`, `"default"`,
 
 Any number of custom conditions can be set with repeat flags.
 
+Typical conditions should only contain alphanumeric characters,
+using ":", "-", or "=" as separators if necessary. Anything else may run
+into compatibility issues outside of Node.js.
+
+In Node.js, conditions have very few restrictions, but specifically these include:
+
+1. They must contain at least one character.
+2. They cannot start with "." since they may appear in places that also
+   allow relative paths.
+3. They cannot contain "," since they may be parsed as a comma-separated
+   list by some CLI tools.
+4. They cannot be integer property keys like "10" since that can have
+   unexpected effects on property key ordering for JS objects.
+
 ### Community Conditions Definitions
 
 Condition strings other than the `"import"`, `"require"`, `"node"`,

From 5eb6c94851c57d5da29a2b31c4fa12988ac3462c Mon Sep 17 00:00:00 2001
From: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
Date: Thu, 17 Oct 2024 16:35:02 +0330
Subject: [PATCH 021/216] build: fix path concatenation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- The `Path` class does not support concatenation with the `+`
operator, so use the `/` operator instead.
- When concatenating paths, if the operand is an absolute path the
previous path is ignored, so change `/include` to `include`.

PR-URL: https://github.com/nodejs/node/pull/55387
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 configure.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.py b/configure.py
index 0d089c35d1720e..6cfb95eee95690 100755
--- a/configure.py
+++ b/configure.py
@@ -1308,7 +1308,7 @@ def configure_zos(o):
   o['variables']['node_static_zoslib'] = b(True)
   if options.static_zoslib_gyp:
     # Apply to all Node.js components for now
-    o['variables']['zoslib_include_dir'] = Path(options.static_zoslib_gyp).parent + '/include'
+    o['variables']['zoslib_include_dir'] = Path(options.static_zoslib_gyp).parent / 'include'
     o['include_dirs'] += [o['variables']['zoslib_include_dir']]
   else:
     raise Exception('--static-zoslib-gyp=<path to zoslib.gyp file> is required.')

From 44f3b23749825879db68c70c4e9a48209060278d Mon Sep 17 00:00:00 2001
From: Luigi Pinca <luigipinca@gmail.com>
Date: Thu, 17 Oct 2024 15:18:28 +0200
Subject: [PATCH 022/216] dns: honor the order option

Fixes: https://github.com/nodejs/node/issues/55391
PR-URL: https://github.com/nodejs/node/pull/55392
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Tim Perry <pimterry@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Feng Yu <F3n67u@outlook.com>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
---
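Reviewer note: the one-line fix is easier to see in isolation. The sketch below mirrors the shape of the option handling touched in `lookup()`; `pickOrder` and `VALID_ORDERS` are hypothetical names introduced for illustration, and the plain `TypeError` stands in for the real validator.

```javascript
// Hypothetical stand-in for the option handling in lookup(); names are illustrative.
const VALID_ORDERS = ['ipv4first', 'ipv6first', 'verbatim'];

function pickOrder(options, defaultOrder) {
  let dnsOrder = defaultOrder;
  if (options?.order != null) {
    if (!VALID_ORDERS.includes(options.order)) {
      throw new TypeError(`invalid options.order: ${options.order}`);
    }
    // Before the fix this line read `options.dnsOrder`, a property callers
    // never set, so the requested order was lost.
    dnsOrder = options.order;
  }
  return dnsOrder;
}

const requested = pickOrder({ order: 'ipv6first' }, 'verbatim');
const fallback = pickOrder({}, 'verbatim');
```

Here `requested` carries the caller's choice, while `fallback` keeps the default because no `order` was given.
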
 lib/dns.js                                       |  2 +-
 test/parallel/test-dns-default-order-verbatim.js | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/lib/dns.js b/lib/dns.js
index 9924f07f41947b..448258dfedc10f 100644
--- a/lib/dns.js
+++ b/lib/dns.js
@@ -193,7 +193,7 @@ function lookup(hostname, options, callback) {
     }
     if (options?.order != null) {
       validateOneOf(options.order, 'options.order', ['ipv4first', 'ipv6first', 'verbatim']);
-      dnsOrder = options.dnsOrder;
+      dnsOrder = options.order;
     }
   }
 
diff --git a/test/parallel/test-dns-default-order-verbatim.js b/test/parallel/test-dns-default-order-verbatim.js
index 0c45728782d20a..12b666191b9ef5 100644
--- a/test/parallel/test-dns-default-order-verbatim.js
+++ b/test/parallel/test-dns-default-order-verbatim.js
@@ -46,4 +46,14 @@ function allowFailed(fn) {
 
   await allowFailed(dnsPromises.lookup('example.org', {}));
   checkParameter(cares.DNS_ORDER_VERBATIM);
+
+  await allowFailed(
+    promisify(dns.lookup)('example.org', { order: 'ipv4first' })
+  );
+  checkParameter(cares.DNS_ORDER_IPV4_FIRST);
+
+  await allowFailed(
+    promisify(dns.lookup)('example.org', { order: 'ipv6first' })
+  );
+  checkParameter(cares.DNS_ORDER_IPV6_FIRST);
 })().then(common.mustCall());

From 22c07867d1f353f1303cf635e8577a77d61772d7 Mon Sep 17 00:00:00 2001
From: Luigi Pinca <luigipinca@gmail.com>
Date: Thu, 17 Oct 2024 15:43:02 +0200
Subject: [PATCH 023/216] test: remove duplicate tests

`test/parallel/test-dns-default-verbatim-false.js` is a duplicate of
`test/parallel/test-dns-default-order-ipv4.js` and
`test/parallel/test-dns-default-verbatim-true.js` is a duplicate of
`test/parallel/test-dns-default-order-verbatim.js`.

PR-URL: https://github.com/nodejs/node/pull/55393
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
---
 .../test-dns-default-verbatim-false.js        | 51 -------------------
 .../test-dns-default-verbatim-true.js         | 51 -------------------
 2 files changed, 102 deletions(-)
 delete mode 100644 test/parallel/test-dns-default-verbatim-false.js
 delete mode 100644 test/parallel/test-dns-default-verbatim-true.js

diff --git a/test/parallel/test-dns-default-verbatim-false.js b/test/parallel/test-dns-default-verbatim-false.js
deleted file mode 100644
index 76f6ef0bcabd82..00000000000000
--- a/test/parallel/test-dns-default-verbatim-false.js
+++ /dev/null
@@ -1,51 +0,0 @@
-// Flags: --expose-internals --dns-result-order=ipv4first
-'use strict';
-const common = require('../common');
-const assert = require('assert');
-const { internalBinding } = require('internal/test/binding');
-const cares = internalBinding('cares_wrap');
-const { promisify } = require('util');
-
-// Test that --dns-result-order=ipv4first works as expected.
-
-const originalGetaddrinfo = cares.getaddrinfo;
-const calls = [];
-cares.getaddrinfo = common.mustCallAtLeast((...args) => {
-  calls.push(args);
-  originalGetaddrinfo(...args);
-}, 1);
-
-const dns = require('dns');
-const dnsPromises = dns.promises;
-
-let verbatim;
-
-// We want to test the parameter of verbatim only so that we
-// ignore possible errors here.
-function allowFailed(fn) {
-  return fn.catch((_err) => {
-    //
-  });
-}
-
-(async () => {
-  let callsLength = 0;
-  const checkParameter = (expected) => {
-    assert.strictEqual(calls.length, callsLength + 1);
-    verbatim = calls[callsLength][4];
-    assert.strictEqual(verbatim, expected);
-    callsLength += 1;
-  };
-
-  await allowFailed(promisify(dns.lookup)('example.org'));
-  checkParameter(cares.DNS_ORDER_IPV4_FIRST);
-
-  await allowFailed(dnsPromises.lookup('example.org'));
-  checkParameter(cares.DNS_ORDER_IPV4_FIRST);
-
-  await allowFailed(promisify(dns.lookup)('example.org', {}));
-  checkParameter(cares.DNS_ORDER_IPV4_FIRST);
-
-  await allowFailed(dnsPromises.lookup('example.org', {}));
-  checkParameter(cares.DNS_ORDER_IPV4_FIRST);
-})().then(common.mustCall());
diff --git a/test/parallel/test-dns-default-verbatim-true.js b/test/parallel/test-dns-default-verbatim-true.js
deleted file mode 100644
index dfa0640f446412..00000000000000
--- a/test/parallel/test-dns-default-verbatim-true.js
+++ /dev/null
@@ -1,51 +0,0 @@
-// Flags: --expose-internals --dns-result-order=verbatim
-'use strict';
-const common = require('../common');
-const assert = require('assert');
-const { internalBinding } = require('internal/test/binding');
-const cares = internalBinding('cares_wrap');
-const { promisify } = require('util');
-
-// Test that --dns-result-order=verbatim works as expected.
-
-const originalGetaddrinfo = cares.getaddrinfo;
-const calls = [];
-cares.getaddrinfo = common.mustCallAtLeast((...args) => {
-  calls.push(args);
-  originalGetaddrinfo(...args);
-}, 1);
-
-const dns = require('dns');
-const dnsPromises = dns.promises;
-
-let verbatim;
-
-// We want to test the parameter of verbatim only so that we
-// ignore possible errors here.
-function allowFailed(fn) {
-  return fn.catch((_err) => {
-    //
-  });
-}
-
-(async () => {
-  let callsLength = 0;
-  const checkParameter = (expected) => {
-    assert.strictEqual(calls.length, callsLength + 1);
-    verbatim = calls[callsLength][4];
-    assert.strictEqual(verbatim, expected);
-    callsLength += 1;
-  };
-
-  await allowFailed(promisify(dns.lookup)('example.org'));
-  checkParameter(cares.DNS_ORDER_VERBATIM);
-
-  await allowFailed(dnsPromises.lookup('example.org'));
-  checkParameter(cares.DNS_ORDER_VERBATIM);
-
-  await allowFailed(promisify(dns.lookup)('example.org', {}));
-  checkParameter(cares.DNS_ORDER_VERBATIM);
-
-  await allowFailed(dnsPromises.lookup('example.org', {}));
-  checkParameter(cares.DNS_ORDER_VERBATIM);
-})().then(common.mustCall());

From 6d7b78c3d8e7e24d1bb0cfd58362b556c762a9c7 Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Thu, 17 Oct 2024 12:45:46 -0300
Subject: [PATCH 024/216] meta: change color to blue notify review-wanted

The current colour suggests that something went wrong when in fact
it's just someone asking for a review.

PR-URL: https://github.com/nodejs/node/pull/55423
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Filip Skokan <panva.ip@gmail.com>
---
 .github/workflows/notify-on-review-wanted.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/notify-on-review-wanted.yml b/.github/workflows/notify-on-review-wanted.yml
index 5ecb01733a964a..471455e3bf6b28 100644
--- a/.github/workflows/notify-on-review-wanted.yml
+++ b/.github/workflows/notify-on-review-wanted.yml
@@ -32,7 +32,7 @@ jobs:
       - name: Slack Notification
         uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907  # 2.3.0
         env:
-          SLACK_COLOR: '#DE512A'
+          SLACK_COLOR: '#3d85c6'
           SLACK_ICON: https://github.com/nodejs.png?size=48
           SLACK_TITLE: ${{ steps.define-message.outputs.title }}
           SLACK_MESSAGE: ${{ steps.define-message.outputs.message }}

From c1cab9b4d76ea35197b3c3da2046b834c86ff75a Mon Sep 17 00:00:00 2001
From: RafaelGSS <rafael.nunu@hotmail.com>
Date: Tue, 15 Oct 2024 18:20:40 -0300
Subject: [PATCH 025/216] doc: move Beth Griggs keys to old gpg keys
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Refs: https://github.com/nodejs/Release/issues/1042
PR-URL: https://github.com/nodejs/node/pull/55399
Refs: https://github.com/nodejs/Release/issues/1036
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
Reviewed-By: Ruy Adorno <ruy@vlt.sh>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 README.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index db5b5c221c83a5..f6852922deacc8 100644
--- a/README.md
+++ b/README.md
@@ -770,8 +770,6 @@ Primary GPG keys for Node.js Releasers (some Releasers sign with subkeys):
 
 * **Antoine du Hamel** <<duhamelantoine1995@gmail.com>>
   `C0D6248439F1D5604AAFFB4021D900FFDB233756`
-* **Beth Griggs** <<bethanyngriggs@gmail.com>>
-  `4ED778F539E3634C779C87C6D7062848A1AB005C`
 * **Bryan English** <<bryan@bryanenglish.com>>
   `141F07595B7B3FFE74309A937405533BE57C7D57`
 * **Danielle Adams** <<adamzdanielle@gmail.com>>
@@ -796,7 +794,6 @@ to sign releases):
 
 ```bash
 gpg --keyserver hkps://keys.openpgp.org --recv-keys C0D6248439F1D5604AAFFB4021D900FFDB233756 # Antoine du Hamel
-gpg --keyserver hkps://keys.openpgp.org --recv-keys 4ED778F539E3634C779C87C6D7062848A1AB005C # Beth Griggs
 gpg --keyserver hkps://keys.openpgp.org --recv-keys 141F07595B7B3FFE74309A937405533BE57C7D57 # Bryan English
 gpg --keyserver hkps://keys.openpgp.org --recv-keys 74F12602B6F1C4E913FAA37AD3A89613643B6201 # Danielle Adams
 gpg --keyserver hkps://keys.openpgp.org --recv-keys DD792F5973C6DE52C432CBDAC77ABFA00DDBF2B7 # Juan José Arboleda
@@ -815,6 +812,8 @@ verify a downloaded file.
 
 <summary>Other keys used to sign some previous releases</summary>
 
+* **Beth Griggs** <<bethanyngriggs@gmail.com>>
+  `4ED778F539E3634C779C87C6D7062848A1AB005C`
 * **Chris Dickinson** <<christopher.s.dickinson@gmail.com>>
   `9554F04D7259F04124DE6B476D5A82AC7E37093B`
 * **Colin Ihrig** <<cjihrig@gmail.com>>

From bfbe651626cd3595fb6eff8575070e83c2acb872 Mon Sep 17 00:00:00 2001
From: RafaelGSS <rafael.nunu@hotmail.com>
Date: Tue, 15 Oct 2024 18:21:56 -0300
Subject: [PATCH 026/216] doc: move Bryan English key to old gpg keys
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Refs: https://github.com/nodejs/Release/issues/1040
PR-URL: https://github.com/nodejs/node/pull/55399
Refs: https://github.com/nodejs/Release/issues/1036
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
Reviewed-By: Ruy Adorno <ruy@vlt.sh>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 README.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index f6852922deacc8..58d51410bc0c8c 100644
--- a/README.md
+++ b/README.md
@@ -770,8 +770,6 @@ Primary GPG keys for Node.js Releasers (some Releasers sign with subkeys):
 
 * **Antoine du Hamel** <<duhamelantoine1995@gmail.com>>
   `C0D6248439F1D5604AAFFB4021D900FFDB233756`
-* **Bryan English** <<bryan@bryanenglish.com>>
-  `141F07595B7B3FFE74309A937405533BE57C7D57`
 * **Danielle Adams** <<adamzdanielle@gmail.com>>
   `74F12602B6F1C4E913FAA37AD3A89613643B6201`
 * **Juan José Arboleda** <<soyjuanarbol@gmail.com>>
@@ -794,7 +792,6 @@ to sign releases):
 
 ```bash
 gpg --keyserver hkps://keys.openpgp.org --recv-keys C0D6248439F1D5604AAFFB4021D900FFDB233756 # Antoine du Hamel
-gpg --keyserver hkps://keys.openpgp.org --recv-keys 141F07595B7B3FFE74309A937405533BE57C7D57 # Bryan English
 gpg --keyserver hkps://keys.openpgp.org --recv-keys 74F12602B6F1C4E913FAA37AD3A89613643B6201 # Danielle Adams
 gpg --keyserver hkps://keys.openpgp.org --recv-keys DD792F5973C6DE52C432CBDAC77ABFA00DDBF2B7 # Juan José Arboleda
 gpg --keyserver hkps://keys.openpgp.org --recv-keys CC68F5A3106FF448322E48ED27F5E38D5B0A215F # Marco Ippolito
@@ -814,6 +811,8 @@ verify a downloaded file.
 
 * **Beth Griggs** <<bethanyngriggs@gmail.com>>
   `4ED778F539E3634C779C87C6D7062848A1AB005C`
+* **Bryan English** <<bryan@bryanenglish.com>>
+  `141F07595B7B3FFE74309A937405533BE57C7D57`
 * **Chris Dickinson** <<christopher.s.dickinson@gmail.com>>
   `9554F04D7259F04124DE6B476D5A82AC7E37093B`
 * **Colin Ihrig** <<cjihrig@gmail.com>>

From e9a8feb44a5650966e2abc4249b70d5984e7907e Mon Sep 17 00:00:00 2001
From: RafaelGSS <rafael.nunu@hotmail.com>
Date: Tue, 15 Oct 2024 18:22:20 -0300
Subject: [PATCH 027/216] doc: move Danielle Adams key to old gpg keys
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Refs: https://github.com/nodejs/Release/issues/1041
PR-URL: https://github.com/nodejs/node/pull/55399
Refs: https://github.com/nodejs/Release/issues/1036
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
Reviewed-By: Ruy Adorno <ruy@vlt.sh>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 README.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 58d51410bc0c8c..b60e3c5df619a4 100644
--- a/README.md
+++ b/README.md
@@ -770,8 +770,6 @@ Primary GPG keys for Node.js Releasers (some Releasers sign with subkeys):
 
 * **Antoine du Hamel** <<duhamelantoine1995@gmail.com>>
   `C0D6248439F1D5604AAFFB4021D900FFDB233756`
-* **Danielle Adams** <<adamzdanielle@gmail.com>>
-  `74F12602B6F1C4E913FAA37AD3A89613643B6201`
 * **Juan José Arboleda** <<soyjuanarbol@gmail.com>>
   `DD792F5973C6DE52C432CBDAC77ABFA00DDBF2B7`
 * **Marco Ippolito** <<marcoippolito54@gmail.com>>
@@ -792,7 +790,6 @@ to sign releases):
 
 ```bash
 gpg --keyserver hkps://keys.openpgp.org --recv-keys C0D6248439F1D5604AAFFB4021D900FFDB233756 # Antoine du Hamel
-gpg --keyserver hkps://keys.openpgp.org --recv-keys 74F12602B6F1C4E913FAA37AD3A89613643B6201 # Danielle Adams
 gpg --keyserver hkps://keys.openpgp.org --recv-keys DD792F5973C6DE52C432CBDAC77ABFA00DDBF2B7 # Juan José Arboleda
 gpg --keyserver hkps://keys.openpgp.org --recv-keys CC68F5A3106FF448322E48ED27F5E38D5B0A215F # Marco Ippolito
 gpg --keyserver hkps://keys.openpgp.org --recv-keys 8FCCA13FEF1D0C2E91008E09770F7A9A5AE15600 # Michaël Zasso
@@ -819,6 +816,7 @@ verify a downloaded file.
   `94AE36675C464D64BAFA68DD7434390BDBE9B9C5`
 * **Danielle Adams** <<adamzdanielle@gmail.com>>
   `1C050899334244A8AF75E53792EF661D867B9DFA`
+  `74F12602B6F1C4E913FAA37AD3A89613643B6201`
 * **Evan Lucas** <<evanlucas@me.com>>
   `B9AE9905FFD7803F25714661B63B535A4C206CA9`
 * **Gibson Fahnestock** <<gibfahn@gmail.com>>

From 43f7050338ddc8068c6110379fb12dacae78ec78 Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Fri, 18 Oct 2024 12:33:58 -0300
Subject: [PATCH 028/216] benchmark: add --runs support to run.js
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55158
Reviewed-By: Vinícius Lourenço Claro Cardoso <contact@viniciusl.com.br>
---
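Reviewer note: the refactor replaces the recursive callback with a promise per child process and a sequential `await` loop. The sketch below shows only that loop shape; `runAll` and the `runOne` callback are hypothetical names, and no child process is spawned.

```javascript
// Hypothetical helper mirroring the loop shape in run(): each file is executed
// `runs` times, one execution at a time, before moving to the next file.
async function runAll(files, runs, runOne) {
  const executed = [];
  for (const file of files) {
    let remaining = runs ?? 1; // default to a single run, as in the patch
    while (remaining-- > 0) {
      await runOne(file); // wait for this run to finish before starting the next
      executed.push(file);
    }
  }
  return executed;
}

// Usage with a stand-in for runBenchmark() that resolves on the next tick.
const finished = runAll(['assert', 'async_hooks'], 2, () => Promise.resolve());
```

Because every run is awaited, the executions do not overlap, which keeps the benchmark processes from competing with each other.
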
 benchmark/run.js                              | 83 +++++++++++--------
 .../writing-and-running-benchmarks.md         | 10 +++
 2 files changed, 57 insertions(+), 36 deletions(-)

diff --git a/benchmark/run.js b/benchmark/run.js
index 6a61df71221710..ea0dc415e91ec6 100644
--- a/benchmark/run.js
+++ b/benchmark/run.js
@@ -12,6 +12,8 @@ const cli = new CLI(`usage: ./node run.js [options] [--] <category> ...
                             (can be repeated)
   --exclude  pattern        excludes scripts matching <pattern> (can be
                             repeated)
+  --runs   N                number of times each benchmark suite is run.
+                            Default: 1
   --set    variable=value   set benchmark variable (can be repeated)
   --format [simple|csv]     optional value that specifies the output format
   test                      only run a single configuration from the options
@@ -45,8 +47,7 @@ if (format === 'csv') {
   console.log('"filename", "configuration", "rate", "time"');
 }
 
-(function recursive(i) {
-  const filename = benchmarks[i];
+function runBenchmark(filename) {
   const scriptPath = path.resolve(__dirname, filename);
 
   const args = cli.test ? ['--test'] : cli.optional.set;
@@ -63,42 +64,52 @@ if (format === 'csv') {
     );
   }
 
-  if (format !== 'csv') {
-    console.log();
-    console.log(filename);
-  }
-
-  child.on('message', (data) => {
-    if (data.type !== 'report') {
-      return;
-    }
-    // Construct configuration string, " A=a, B=b, ..."
-    let conf = '';
-    for (const key of Object.keys(data.conf)) {
-      if (conf !== '')
-        conf += ' ';
-      conf += `${key}=${JSON.stringify(data.conf[key])}`;
-    }
-    if (format === 'csv') {
-      // Escape quotes (") for correct csv formatting
-      conf = conf.replace(/"/g, '""');
-      console.log(`"${data.name}", "${conf}", ${data.rate}, ${data.time}`);
-    } else {
-      let rate = data.rate.toString().split('.');
-      rate[0] = rate[0].replace(/(\d)(?=(?:\d\d\d)+(?!\d))/g, '$1,');
-      rate = (rate[1] ? rate.join('.') : rate[0]);
-      console.log(`${data.name} ${conf}: ${rate}`);
-    }
+  return new Promise((resolve, reject) => {
+    child.on('message', (data) => {
+      if (data.type !== 'report') {
+        return;
+      }
+      // Construct configuration string, " A=a, B=b, ..."
+      let conf = '';
+      for (const key of Object.keys(data.conf)) {
+        if (conf !== '')
+          conf += ' ';
+        conf += `${key}=${JSON.stringify(data.conf[key])}`;
+      }
+      if (format === 'csv') {
+        // Escape quotes (") for correct csv formatting
+        conf = conf.replace(/"/g, '""');
+        console.log(`"${data.name}", "${conf}", ${data.rate}, ${data.time}`);
+      } else {
+        let rate = data.rate.toString().split('.');
+        rate[0] = rate[0].replace(/(\d)(?=(?:\d\d\d)+(?!\d))/g, '$1,');
+        rate = (rate[1] ? rate.join('.') : rate[0]);
+        console.log(`${data.name} ${conf}: ${rate}`);
+      }
+    });
+    child.once('close', (code) => {
+      if (code) {
+        reject(code);
+      } else {
+        resolve(code);
+      }
+    });
   });
+}
 
-  child.once('close', (code) => {
-    if (code) {
-      process.exit(code);
+async function run() {
+  for (let i = 0; i < benchmarks.length; ++i) {
+    let runs = cli.optional.runs ?? 1;
+    const filename = benchmarks[i];
+    if (format !== 'csv') {
+      console.log();
+      console.log(filename);
     }
 
-    // If there are more benchmarks execute the next
-    if (i + 1 < benchmarks.length) {
-      recursive(i + 1);
+    while (runs-- > 0) {
+      await runBenchmark(filename);
     }
-  });
-})(0);
+  }
+}
+
+run();
diff --git a/doc/contributing/writing-and-running-benchmarks.md b/doc/contributing/writing-and-running-benchmarks.md
index 9ad7166be321bb..63fbf75c798833 100644
--- a/doc/contributing/writing-and-running-benchmarks.md
+++ b/doc/contributing/writing-and-running-benchmarks.md
@@ -174,6 +174,16 @@ It is possible to execute more groups by adding extra process arguments.
 node benchmark/run.js assert async_hooks
 ```
 
+It's also possible to execute the benchmark more than once using the
+`--runs` flag.
+
+```bash
+node benchmark/run.js --runs 10 assert async_hooks
+```
+
+This command will run the benchmark files in `benchmark/assert` and `benchmark/async_hooks`
+10 times each.
+
 #### Specifying CPU Cores for Benchmarks with run.js
 
 When using `run.js` to execute a group of benchmarks,
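
Reviewer note (not part of the patch): the report formatting done in the
`benchmark/run.js` hunk above can be exercised in isolation. The following is
a minimal sketch under stated assumptions: the helper name `formatReport` and
the sample message are hypothetical, and only the string handling is taken
from the diff.

```javascript
'use strict';

// Hypothetical helper: mirrors the formatting done inside the 'message'
// handler in benchmark/run.js, but returns the line instead of logging it.
function formatReport(data, format) {
  // Build the configuration string as space-separated key=value pairs.
  const conf = Object.keys(data.conf)
    .map((key) => `${key}=${JSON.stringify(data.conf[key])}`)
    .join(' ');

  if (format === 'csv') {
    // Double any quotes so the field stays valid CSV.
    const escaped = conf.replace(/"/g, '""');
    return `"${data.name}", "${escaped}", ${data.rate}, ${data.time}`;
  }

  // Insert thousands separators into the integer part of the rate.
  const [int, frac] = data.rate.toString().split('.');
  const grouped = int.replace(/(\d)(?=(?:\d\d\d)+(?!\d))/g, '$1,');
  return `${data.name} ${conf}: ${frac ? `${grouped}.${frac}` : grouped}`;
}

// Sample report message (made-up values, not real benchmark output).
const sample = {
  name: 'assert/deepequal.js',
  conf: { n: 1000, method: 'deepEqual' },
  rate: 1234567.891,
  time: 0.5,
};

console.log(formatReport(sample, 'text'));
console.log(formatReport(sample, 'csv'));
```

Returning the string rather than logging it keeps the formatting testable
without spawning a child process.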

From 0977bb6c1d189da798c2203c17f2d961a4eeb115 Mon Sep 17 00:00:00 2001
From: Yagiz Nizipli <yagiz@nizipli.com>
Date: Fri, 18 Oct 2024 12:47:50 -0400
Subject: [PATCH 029/216] src: remove icu based `ToASCII` and `ToUnicode`

PR-URL: https://github.com/nodejs/node/pull/55156
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matthew Aitken <maitken033380023@gmail.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: Richard Lau <rlau@redhat.com>
---
 src/node_i18n.cc                        | 172 +-----------------------
 src/node_i18n.h                         |  13 --
 test/fixtures/icu-punycode-toascii.json | 149 --------------------
 test/parallel/test-icu-punycode.js      |  57 --------
 4 files changed, 2 insertions(+), 389 deletions(-)
 delete mode 100644 test/fixtures/icu-punycode-toascii.json
 delete mode 100644 test/parallel/test-icu-punycode.js
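
Reviewer note (not part of the commit): this patch only removes the internal
`icu.toASCII` / `icu.toUnicode` binding; it does not say what callers should
use instead. Assuming the public `url.domainToASCII()` and
`url.domainToUnicode()` functions are the intended route for the same
conversions, a minimal sketch looks like this. The sample domains are taken
from the Node.js `url` documentation, not from this patch.

```javascript
'use strict';

const { domainToASCII, domainToUnicode } = require('node:url');

// Convert a Unicode hostname to its Punycode (ASCII) form and back.
const ascii = domainToASCII('español.com');
const unicode = domainToUnicode(ascii);

console.log(ascii);
console.log(unicode);

// Unlike the removed binding, which threw ERR_INVALID_ARG_VALUE, the public
// API signals an invalid domain by returning an empty string.
console.log(JSON.stringify(domainToASCII('xn--iñvalid.com')));
```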

diff --git a/src/node_i18n.cc b/src/node_i18n.cc
index 2aa7cd98ecc179..1ddae30e97257b 100644
--- a/src/node_i18n.cc
+++ b/src/node_i18n.cc
@@ -60,18 +60,16 @@
 #include <unicode/uchar.h>
 #include <unicode/uclean.h>
 #include <unicode/ucnv.h>
-#include <unicode/udata.h>
-#include <unicode/uidna.h>
 #include <unicode/ulocdata.h>
 #include <unicode/urename.h>
-#include <unicode/ustring.h>
 #include <unicode/utf16.h>
-#include <unicode/utf8.h>
 #include <unicode/utypes.h>
 #include <unicode/uvernum.h>
 #include <unicode/uversion.h>
 
 #ifdef NODE_HAVE_SMALL_ICU
+#include <unicode/udata.h>
+
 /* if this is defined, we have a 'secondary' entry point.
    compare following to utypes.h defs for U_ICUDATA_ENTRY_POINT */
 #define SMALL_ICUDATA_ENTRY_POINT \
@@ -95,7 +93,6 @@ using v8::Int32;
 using v8::Isolate;
 using v8::Local;
 using v8::MaybeLocal;
-using v8::NewStringType;
 using v8::Object;
 using v8::ObjectTemplate;
 using v8::String;
@@ -582,167 +579,6 @@ void SetDefaultTimeZone(const char* tzid) {
   CHECK(U_SUCCESS(status));
 }
 
-int32_t ToUnicode(MaybeStackBuffer<char>* buf,
-                  const char* input,
-                  size_t length) {
-  UErrorCode status = U_ZERO_ERROR;
-  uint32_t options = UIDNA_NONTRANSITIONAL_TO_UNICODE;
-  UIDNA* uidna = uidna_openUTS46(options, &status);
-  if (U_FAILURE(status))
-    return -1;
-  UIDNAInfo info = UIDNA_INFO_INITIALIZER;
-
-  int32_t len = uidna_nameToUnicodeUTF8(uidna,
-                                        input, length,
-                                        **buf, buf->capacity(),
-                                        &info,
-                                        &status);
-
-  // Do not check info.errors like we do with ToASCII since ToUnicode always
-  // returns a string, despite any possible errors that may have occurred.
-
-  if (status == U_BUFFER_OVERFLOW_ERROR) {
-    status = U_ZERO_ERROR;
-    buf->AllocateSufficientStorage(len);
-    len = uidna_nameToUnicodeUTF8(uidna,
-                                  input, length,
-                                  **buf, buf->capacity(),
-                                  &info,
-                                  &status);
-  }
-
-  // info.errors is ignored as UTS #46 ToUnicode always produces a Unicode
-  // string, regardless of whether an error occurred.
-
-  if (U_FAILURE(status)) {
-    len = -1;
-    buf->SetLength(0);
-  } else {
-    buf->SetLength(len);
-  }
-
-  uidna_close(uidna);
-  return len;
-}
-
-int32_t ToASCII(MaybeStackBuffer<char>* buf,
-                const char* input,
-                size_t length,
-                idna_mode mode) {
-  UErrorCode status = U_ZERO_ERROR;
-  uint32_t options =                  // CheckHyphens = false; handled later
-    UIDNA_CHECK_BIDI |                // CheckBidi = true
-    UIDNA_CHECK_CONTEXTJ |            // CheckJoiners = true
-    UIDNA_NONTRANSITIONAL_TO_ASCII;   // Nontransitional_Processing
-  if (mode == idna_mode::kStrict) {
-    options |= UIDNA_USE_STD3_RULES;  // UseSTD3ASCIIRules = beStrict
-                                      // VerifyDnsLength = beStrict;
-                                      //   handled later
-  }
-
-  UIDNA* uidna = uidna_openUTS46(options, &status);
-  if (U_FAILURE(status))
-    return -1;
-  UIDNAInfo info = UIDNA_INFO_INITIALIZER;
-
-  int32_t len = uidna_nameToASCII_UTF8(uidna,
-                                       input, length,
-                                       **buf, buf->capacity(),
-                                       &info,
-                                       &status);
-
-  if (status == U_BUFFER_OVERFLOW_ERROR) {
-    status = U_ZERO_ERROR;
-    buf->AllocateSufficientStorage(len);
-    len = uidna_nameToASCII_UTF8(uidna,
-                                 input, length,
-                                 **buf, buf->capacity(),
-                                 &info,
-                                 &status);
-  }
-
-  // In UTS #46 which specifies ToASCII, certain error conditions are
-  // configurable through options, and the WHATWG URL Standard promptly elects
-  // to disable some of them to accommodate for real-world use cases.
-  // Unfortunately, ICU4C's IDNA module does not support disabling some of
-  // these options through `options` above, and thus continues throwing
-  // unnecessary errors. To counter this situation, we just filter out the
-  // errors that may have happened afterwards, before deciding whether to
-  // return an error from this function.
-
-  // CheckHyphens = false
-  // (Specified in the current UTS #46 draft rev. 18.)
-  // Refs:
-  // - https://github.com/whatwg/url/issues/53
-  // - https://github.com/whatwg/url/pull/309
-  // - http://www.unicode.org/review/pri317/
-  // - http://www.unicode.org/reports/tr46/tr46-18.html
-  // - https://www.icann.org/news/announcement-2000-01-07-en
-  info.errors &= ~UIDNA_ERROR_HYPHEN_3_4;
-  info.errors &= ~UIDNA_ERROR_LEADING_HYPHEN;
-  info.errors &= ~UIDNA_ERROR_TRAILING_HYPHEN;
-
-  if (mode != idna_mode::kStrict) {
-    // VerifyDnsLength = beStrict
-    info.errors &= ~UIDNA_ERROR_EMPTY_LABEL;
-    info.errors &= ~UIDNA_ERROR_LABEL_TOO_LONG;
-    info.errors &= ~UIDNA_ERROR_DOMAIN_NAME_TOO_LONG;
-  }
-
-  if (U_FAILURE(status) || (mode != idna_mode::kLenient && info.errors != 0)) {
-    len = -1;
-    buf->SetLength(0);
-  } else {
-    buf->SetLength(len);
-  }
-
-  uidna_close(uidna);
-  return len;
-}
-
-static void ToUnicode(const FunctionCallbackInfo<Value>& args) {
-  Environment* env = Environment::GetCurrent(args);
-  CHECK_GE(args.Length(), 1);
-  CHECK(args[0]->IsString());
-  Utf8Value val(env->isolate(), args[0]);
-
-  MaybeStackBuffer<char> buf;
-  int32_t len = ToUnicode(&buf, *val, val.length());
-
-  if (len < 0) {
-    return THROW_ERR_INVALID_ARG_VALUE(env, "Cannot convert name to Unicode");
-  }
-
-  args.GetReturnValue().Set(
-      String::NewFromUtf8(env->isolate(),
-                          *buf,
-                          NewStringType::kNormal,
-                          len).ToLocalChecked());
-}
-
-static void ToASCII(const FunctionCallbackInfo<Value>& args) {
-  Environment* env = Environment::GetCurrent(args);
-  CHECK_GE(args.Length(), 1);
-  CHECK(args[0]->IsString());
-  Utf8Value val(env->isolate(), args[0]);
-  // optional arg
-  bool lenient = args[1]->BooleanValue(env->isolate());
-  idna_mode mode = lenient ? idna_mode::kLenient : idna_mode::kDefault;
-
-  MaybeStackBuffer<char> buf;
-  int32_t len = ToASCII(&buf, *val, val.length(), mode);
-
-  if (len < 0) {
-    return THROW_ERR_INVALID_ARG_VALUE(env, "Cannot convert name to ASCII");
-  }
-
-  args.GetReturnValue().Set(
-      String::NewFromUtf8(env->isolate(),
-                          *buf,
-                          NewStringType::kNormal,
-                          len).ToLocalChecked());
-}
-
 // This is similar to wcwidth except that it takes the current unicode
 // character properties database into consideration, allowing it to
 // correctly calculate the column widths of things like emoji's and
@@ -849,8 +685,6 @@ static void CreatePerIsolateProperties(IsolateData* isolate_data,
                                        Local<ObjectTemplate> target) {
   Isolate* isolate = isolate_data->isolate();
 
-  SetMethod(isolate, target, "toUnicode", ToUnicode);
-  SetMethod(isolate, target, "toASCII", ToASCII);
   SetMethod(isolate, target, "getStringWidth", GetStringWidth);
 
   // One-shot converters
@@ -879,8 +713,6 @@ void CreatePerContextProperties(Local<Object> target,
                                 void* priv) {}
 
 void RegisterExternalReferences(ExternalReferenceRegistry* registry) {
-  registry->Register(ToUnicode);
-  registry->Register(ToASCII);
   registry->Register(GetStringWidth);
   registry->Register(ICUErrorName);
   registry->Register(Transcode);
diff --git a/src/node_i18n.h b/src/node_i18n.h
index e516282865fb18..6ba3c41fc4f056 100644
--- a/src/node_i18n.h
+++ b/src/node_i18n.h
@@ -53,19 +53,6 @@ enum class idna_mode {
   kStrict
 };
 
-// Implements the WHATWG URL Standard "domain to ASCII" algorithm.
-// https://url.spec.whatwg.org/#concept-domain-to-ascii
-int32_t ToASCII(MaybeStackBuffer<char>* buf,
-                const char* input,
-                size_t length,
-                idna_mode mode = idna_mode::kDefault);
-
-// Implements the WHATWG URL Standard "domain to Unicode" algorithm.
-// https://url.spec.whatwg.org/#concept-domain-to-unicode
-int32_t ToUnicode(MaybeStackBuffer<char>* buf,
-                  const char* input,
-                  size_t length);
-
 struct ConverterDeleter {
   void operator()(UConverter* pointer) const { ucnv_close(pointer); }
 };
diff --git a/test/fixtures/icu-punycode-toascii.json b/test/fixtures/icu-punycode-toascii.json
deleted file mode 100644
index 814f06e794866d..00000000000000
--- a/test/fixtures/icu-punycode-toascii.json
+++ /dev/null
@@ -1,149 +0,0 @@
-[
-  "This resource is focused on highlighting issues with UTS #46 ToASCII",
-  {
-    "comment": "Label with hyphens in 3rd and 4th position",
-    "input": "aa--",
-    "output": "aa--"
-  },
-  {
-    "input": "a†--",
-    "output": "xn--a---kp0a"
-  },
-  {
-    "input": "ab--c",
-    "output": "ab--c"
-  },
-  {
-    "comment": "Label with leading hyphen",
-    "input": "-x",
-    "output": "-x"
-  },
-  {
-    "input": "-†",
-    "output": "xn----xhn"
-  },
-  {
-    "input": "-x.xn--nxa",
-    "output": "-x.xn--nxa"
-  },
-  {
-    "input": "-x.β",
-    "output": "-x.xn--nxa"
-  },
-  {
-    "comment": "Label with trailing hyphen",
-    "input": "x-.xn--nxa",
-    "output": "x-.xn--nxa"
-  },
-  {
-    "input": "x-.β",
-    "output": "x-.xn--nxa"
-  },
-  {
-    "comment": "Empty labels",
-    "input": "x..xn--nxa",
-    "output": "x..xn--nxa"
-  },
-  {
-    "input": "x..β",
-    "output": "x..xn--nxa"
-  },
-  {
-    "comment": "Invalid Punycode",
-    "input": "xn--a",
-    "output": null
-  },
-  {
-    "input": "xn--a.xn--nxa",
-    "output": null
-  },
-  {
-    "input": "xn--a.β",
-    "output": null
-  },
-  {
-    "comment": "Valid Punycode",
-    "input": "xn--nxa.xn--nxa",
-    "output": "xn--nxa.xn--nxa"
-  },
-  {
-    "comment": "Mixed",
-    "input": "xn--nxa.β",
-    "output": "xn--nxa.xn--nxa"
-  },
-  {
-    "input": "ab--c.xn--nxa",
-    "output": "ab--c.xn--nxa"
-  },
-  {
-    "input": "ab--c.β",
-    "output": "ab--c.xn--nxa"
-  },
-  {
-    "comment": "CheckJoiners is true",
-    "input": "\u200D.example",
-    "output": null
-  },
-  {
-    "input": "xn--1ug.example",
-    "output": null
-  },
-  {
-    "comment": "CheckBidi is true",
-    "input": "يa",
-    "output": null
-  },
-  {
-    "input": "xn--a-yoc",
-    "output": null
-  },
-  {
-    "comment": "processing_option is Nontransitional_Processing",
-    "input": "ශ්‍රී",
-    "output": "xn--10cl1a0b660p"
-  },
-  {
-    "input": "نامه‌ای",
-    "output": "xn--mgba3gch31f060k"
-  },
-  {
-    "comment": "U+FFFD",
-    "input": "\uFFFD.com",
-    "output": null
-  },
-  {
-    "comment": "U+FFFD character encoded in Punycode",
-    "input": "xn--zn7c.com",
-    "output": null
-  },
-  {
-    "comment": "Label longer than 63 code points",
-    "input": "x01234567890123456789012345678901234567890123456789012345678901x",
-    "output": "x01234567890123456789012345678901234567890123456789012345678901x"
-  },
-  {
-    "input": "x01234567890123456789012345678901234567890123456789012345678901†",
-    "output": "xn--x01234567890123456789012345678901234567890123456789012345678901-6963b"
-  },
-  {
-    "input": "x01234567890123456789012345678901234567890123456789012345678901x.xn--nxa",
-    "output": "x01234567890123456789012345678901234567890123456789012345678901x.xn--nxa"
-  },
-  {
-    "input": "x01234567890123456789012345678901234567890123456789012345678901x.β",
-    "output": "x01234567890123456789012345678901234567890123456789012345678901x.xn--nxa"
-  },
-  {
-    "comment": "Domain excluding TLD longer than 253 code points",
-    "input": "01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.0123456789012345678901234567890123456789012345678.x",
-    "output": "01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.0123456789012345678901234567890123456789012345678.x"
-  },
-  {
-    "input": "01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.0123456789012345678901234567890123456789012345678.xn--nxa",
-    "output": "01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.0123456789012345678901234567890123456789012345678.xn--nxa"
-  },
-  {
-    "input": "01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.0123456789012345678901234567890123456789012345678.β",
-    "output": "01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.01234567890123456789012345678901234567890123456789.0123456789012345678901234567890123456789012345678.xn--nxa"
-  }
-]
diff --git a/test/parallel/test-icu-punycode.js b/test/parallel/test-icu-punycode.js
deleted file mode 100644
index 29e88f9b9a6262..00000000000000
--- a/test/parallel/test-icu-punycode.js
+++ /dev/null
@@ -1,57 +0,0 @@
-'use strict';
-// Flags: --expose-internals
-const common = require('../common');
-
-if (!common.hasIntl)
-  common.skip('missing Intl');
-
-const { internalBinding } = require('internal/test/binding');
-const icu = internalBinding('icu');
-const assert = require('assert');
-
-// Test hasConverter method
-assert(icu.hasConverter('utf-8'),
-       'hasConverter should report converter exists for utf-8');
-assert(!icu.hasConverter('x'),
-       'hasConverter should report converter does not exist for x');
-
-const tests = require('../fixtures/url-idna.js');
-const fixtures = require('../fixtures/icu-punycode-toascii.json');
-
-{
-  for (const [i, { ascii, unicode }] of tests.entries()) {
-    assert.strictEqual(ascii, icu.toASCII(unicode), `toASCII(${i + 1})`);
-    assert.strictEqual(unicode, icu.toUnicode(ascii), `toUnicode(${i + 1})`);
-    assert.strictEqual(ascii, icu.toASCII(icu.toUnicode(ascii)),
-                       `toASCII(toUnicode(${i + 1}))`);
-    assert.strictEqual(unicode, icu.toUnicode(icu.toASCII(unicode)),
-                       `toUnicode(toASCII(${i + 1}))`);
-  }
-}
-
-{
-  for (const [i, test] of fixtures.entries()) {
-    if (typeof test === 'string')
-      continue; // skip comments
-    const { comment, input, output } = test;
-    let caseComment = `case ${i + 1}`;
-    if (comment)
-      caseComment += ` (${comment})`;
-    if (output === null) {
-      assert.throws(
-        () => icu.toASCII(input),
-        {
-          code: 'ERR_INVALID_ARG_VALUE',
-          name: 'TypeError',
-          message: 'Cannot convert name to ASCII'
-        }
-      );
-      icu.toASCII(input, true); // Should not throw.
-    } else {
-      assert.strictEqual(icu.toASCII(input), output, `ToASCII ${caseComment}`);
-      assert.strictEqual(icu.toASCII(input, true), output,
-                         `ToASCII ${caseComment} in lenient mode`);
-    }
-    icu.toUnicode(input); // Should not throw.
-  }
-}

From 64b140d484785d15607124593108d8679e1a1525 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Juan=20Jos=C3=A9?= <soyjuanarbol@gmail.com>
Date: Sat, 19 Oct 2024 21:17:22 -0500
Subject: [PATCH 030/216] cli: add `--heap-prof` flag available to
 `NODE_OPTIONS`
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes: https://github.com/nodejs/node/issues/54257
Signed-off-by: Juan José Arboleda <soyjuanarbol@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/54259
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
Reviewed-By: Franziska Hinkelmann <franziska.hinkelmann@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
---
 doc/api/cli.md                                       |  4 ++++
 src/node_options.cc                                  | 12 ++++++++----
 .../test-process-env-allowed-flags-are-documented.js | 12 ++++++++++++
 3 files changed, 24 insertions(+), 4 deletions(-)
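
Reviewer note (not part of the commit): one way to check at runtime whether a
given binary accepts the heap profiler flags in `NODE_OPTIONS` is to query
`process.allowedNodeEnvironmentFlags`. This is a minimal sketch; the helper
name `heapProfSupport` is hypothetical. On builds without the inspector, the
flags are expected to be absent, which matches the guard added to the test in
this patch.

```javascript
'use strict';

const HEAP_PROF_FLAGS = [
  '--heap-prof',
  '--heap-prof-dir',
  '--heap-prof-interval',
  '--heap-prof-name',
];

// Hypothetical helper: report which heap profiler flags a given set of
// allowed NODE_OPTIONS flags contains. Defaults to the running process.
function heapProfSupport(allowed = process.allowedNodeEnvironmentFlags) {
  return Object.fromEntries(
    HEAP_PROF_FLAGS.map((flag) => [flag, allowed.has(flag)]));
}

console.log(heapProfSupport());
```

Passing the set as a parameter keeps the helper usable against a stubbed flag
list as well as the live process.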

diff --git a/doc/api/cli.md b/doc/api/cli.md
index 63d2d00efb8eb6..825e3c6e313cf6 100644
--- a/doc/api/cli.md
+++ b/doc/api/cli.md
@@ -2777,6 +2777,10 @@ one is included in the list below.
 * `--force-fips`
 * `--force-node-api-uncaught-exceptions-policy`
 * `--frozen-intrinsics`
+* `--heap-prof-dir`
+* `--heap-prof-interval`
+* `--heap-prof-name`
+* `--heap-prof`
 * `--heapsnapshot-near-heap-limit`
 * `--heapsnapshot-signal`
 * `--http-parser`
diff --git a/src/node_options.cc b/src/node_options.cc
index e325b082dec6ae..f6ff810953b224 100644
--- a/src/node_options.cc
+++ b/src/node_options.cc
@@ -620,19 +620,23 @@ EnvironmentOptionsParser::EnvironmentOptionsParser() {
       "Start the V8 heap profiler on start up, and write the heap profile "
       "to disk before exit. If --heap-prof-dir is not specified, write "
       "the profile to the current working directory.",
-      &EnvironmentOptions::heap_prof);
+      &EnvironmentOptions::heap_prof,
+      kAllowedInEnvvar);
   AddOption("--heap-prof-name",
             "specified file name of the V8 heap profile generated with "
             "--heap-prof",
-            &EnvironmentOptions::heap_prof_name);
+            &EnvironmentOptions::heap_prof_name,
+            kAllowedInEnvvar);
   AddOption("--heap-prof-dir",
             "Directory where the V8 heap profiles generated by --heap-prof "
             "will be placed.",
-            &EnvironmentOptions::heap_prof_dir);
+            &EnvironmentOptions::heap_prof_dir,
+            kAllowedInEnvvar);
   AddOption("--heap-prof-interval",
             "specified sampling interval in bytes for the V8 heap "
             "profile generated with --heap-prof. (default: 512 * 1024)",
-            &EnvironmentOptions::heap_prof_interval);
+            &EnvironmentOptions::heap_prof_interval,
+            kAllowedInEnvvar);
 #endif  // HAVE_INSPECTOR
   AddOption("--max-http-header-size",
             "set the maximum size of HTTP headers (default: 16384 (16KB))",
diff --git a/test/parallel/test-process-env-allowed-flags-are-documented.js b/test/parallel/test-process-env-allowed-flags-are-documented.js
index 2746fdfc086f99..3480b2c1482f17 100644
--- a/test/parallel/test-process-env-allowed-flags-are-documented.js
+++ b/test/parallel/test-process-env-allowed-flags-are-documented.js
@@ -86,6 +86,18 @@ const difference = (setA, setB) => {
   return new Set([...setA].filter((x) => !setB.has(x)));
 };
 
+// Remove heap prof options if the inspector is not enabled.
+// NOTE: this is for ubuntuXXXX_sharedlibs_withoutssl_x64, no SSL, no inspector
+// Refs: https://github.com/nodejs/node/pull/54259#issuecomment-2308256647
+if (!process.features.inspector) {
+  [
+    '--heap-prof-dir',
+    '--heap-prof-interval',
+    '--heap-prof-name',
+    '--heap-prof',
+  ].forEach((opt) => documented.delete(opt));
+}
+
 const overdocumented = difference(documented,
                                   process.allowedNodeEnvironmentFlags);
 assert.strictEqual(overdocumented.size, 0,

From 3dceeb8b15e6024c568054693bcc3925b27e059d Mon Sep 17 00:00:00 2001
From: Richard Lau <rlau@redhat.com>
Date: Fri, 18 Oct 2024 16:04:25 +0000
Subject: [PATCH 031/216] tools: add script to synch c-ares source lists

Add step to the updater script for c-ares to synchronize the list
of sources in our gyp file with the lists in c-ares' Makefiles.

PR-URL: https://github.com/nodejs/node/pull/55445
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
---
 tools/dep_updaters/update-c-ares.mjs | 38 ++++++++++++++++++++++++++++
 tools/dep_updaters/update-c-ares.sh  |  3 +++
 2 files changed, 41 insertions(+)
 create mode 100644 tools/dep_updaters/update-c-ares.mjs
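
Reviewer note (not part of the commit): the filename extraction in
`update-c-ares.mjs` can be tried on a small input. This is a minimal sketch;
the regular expression is copied from the patch, while the helper name
`extractSources` and the sample Makefile text are hypothetical and only
imitate the layout of c-ares' `Makefile.inc`.

```javascript
'use strict';

// Hypothetical sample in the style of c-ares' Makefile.inc.
const sample = [
  '# Example comment',
  'CSOURCES = ares_addrinfo2.c \\',
  '  ares_cancel.c \\',
  '  ares_data.c',
  '',
  'HHEADERS = ares_data.h \\',
  '  ares_setup.h',
  '',
].join('\n');

// Hypothetical helper: return the sorted list of filenames in the text.
function extractSources(makefileText) {
  return makefileText.split('\n')
    // Drop any "NAME = " prefix or indentation, then capture the filename.
    // Comment lines and blank lines produce an empty capture.
    .map((line) => line.match(/^(?:.*= |\s*)?([^#\s]*)\s*\\?/)?.[1])
    // Filter out the empty captures.
    .filter((name) => name !== '')
    // Sort alphabetically to keep the generated list stable.
    .sort();
}

console.log(extractSources(sample));
```

One caveat of this pattern: a comment line that happens to contain `= ` would
have the text after it captured as a filename, so the approach relies on the
upstream Makefile not containing such comments.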

diff --git a/tools/dep_updaters/update-c-ares.mjs b/tools/dep_updaters/update-c-ares.mjs
new file mode 100644
index 00000000000000..c5057e666de70d
--- /dev/null
+++ b/tools/dep_updaters/update-c-ares.mjs
@@ -0,0 +1,38 @@
+// Synchronize the sources for our c-ares gyp file from c-ares' Makefiles.
+import { readFileSync, writeFileSync } from 'node:fs';
+import { join } from 'node:path';
+
+const srcroot = join(import.meta.dirname, '..', '..');
+const options = { encoding: 'utf8' };
+
+// Extract list of sources from the gyp file.
+const gypFile = join(srcroot, 'deps', 'cares', 'cares.gyp');
+const contents = readFileSync(gypFile, options);
+const sourcesRE = /^\s+'cares_sources_common':\s+\[\s*\n(?<files>[\s\S]*?)\s+\],$/gm;
+const sourcesCommon = sourcesRE.exec(contents);
+
+// Extract the list of sources from c-ares' Makefile.inc.
+const makefile = join(srcroot, 'deps', 'cares', 'src', 'lib', 'Makefile.inc');
+const libSources = readFileSync(makefile, options).split('\n')
+  // Extract filenames (excludes comments and variable assignment).
+  .map((line) => line.match(/^(?:.*= |\s*)?([^#\s]*)\s*\\?/)?.[1])
+  // Filter out empty lines.
+  .filter((line) => line !== '')
+  // Prefix with directory and format as list entry.
+  .map((line) => `      'src/lib/${line}',`);
+
+// Extract include files.
+const includeMakefile = join(srcroot, 'deps', 'cares', 'include', 'Makefile.am');
+const includeSources = readFileSync(includeMakefile, options)
+  .match(/include_HEADERS\s*=\s*(.*)/)[1]
+  .split(/\s/)
+  .map((header) => `      'include/${header}',`);
+
+// Combine the lists. Alphabetically sort to minimize diffs.
+const fileList = includeSources.concat(libSources).sort();
+
+// Replace the list of sources.
+const newContents = contents.replace(sourcesCommon.groups.files, fileList.join('\n'));
+if (newContents !== contents) {
+  writeFileSync(gypFile, newContents, options);
+}
diff --git a/tools/dep_updaters/update-c-ares.sh b/tools/dep_updaters/update-c-ares.sh
index 66d37ba350c246..f990430d9f6aee 100755
--- a/tools/dep_updaters/update-c-ares.sh
+++ b/tools/dep_updaters/update-c-ares.sh
@@ -71,6 +71,9 @@ cp "$DEPS_DIR/cares/"*.gn "$DEPS_DIR/cares/"*.gni "$WORKSPACE/cares"
 echo "Replacing existing c-ares"
 replace_dir "$DEPS_DIR/cares" "$WORKSPACE/cares"
 
+echo "Updating cares.gyp"
+"$NODE" "$ROOT/tools/dep_updaters/update-c-ares.mjs"
+
 # Update the version number on maintaining-dependencies.md
 # and print the new version as the last line of the script as we need
 # to add it to $GITHUB_ENV variable

From e89e80752299b72e34dcff95308261b93838bd33 Mon Sep 17 00:00:00 2001
From: Richard Lau <rlau@redhat.com>
Date: Fri, 18 Oct 2024 16:47:05 +0000
Subject: [PATCH 032/216] build: synchronize list of c-ares source files

Run the `tools/dep_updaters/update-c-ares.mjs` script to synchronize
the list of source files in our gyp file with the lists from c-ares'
Makefiles.

PR-URL: https://github.com/nodejs/node/pull/55445
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
---
 deps/cares/cares.gyp | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/deps/cares/cares.gyp b/deps/cares/cares.gyp
index 87898f5610d56c..ef43c25a47ff34 100644
--- a/deps/cares/cares.gyp
+++ b/deps/cares/cares.gyp
@@ -2,6 +2,7 @@
   'variables': {
     'cares_sources_common': [
       'include/ares.h',
+      'include/ares_build.h',
       'include/ares_dns.h',
       'include/ares_dns_record.h',
       'include/ares_nameser.h',
@@ -49,6 +50,8 @@
       'src/lib/ares_strerror.c',
       'src/lib/ares_sysconfig.c',
       'src/lib/ares_sysconfig_files.c',
+      'src/lib/ares_sysconfig_mac.c',
+      'src/lib/ares_sysconfig_win.c',
       'src/lib/ares_timeout.c',
       'src/lib/ares_update_servers.c',
       'src/lib/ares_version.c',
@@ -111,11 +114,11 @@
       'src/lib/record/ares_dns_private.h',
       'src/lib/record/ares_dns_record.c',
       'src/lib/record/ares_dns_write.c',
-      'src/lib/str/ares__buf.h',
       'src/lib/str/ares_buf.c',
       'src/lib/str/ares_str.c',
       'src/lib/str/ares_strsplit.c',
       'src/lib/str/ares_strsplit.h',
+      'src/lib/thirdparty/apple/dnsinfo.h',
       'src/lib/util/ares_iface_ips.c',
       'src/lib/util/ares_iface_ips.h',
       'src/lib/util/ares_math.c',
@@ -128,8 +131,7 @@
       'src/lib/util/ares_timeval.c',
       'src/lib/util/ares_uri.c',
       'src/lib/util/ares_uri.h',
-      'src/tools/ares_getopt.c',
-      'src/tools/ares_getopt.h',
+      'src/lib/windows_port.c',
     ],
     'cares_sources_mac': [
       'config/darwin/ares_config.h',

From 2f71f168ef74beb9d17efc65488fbf88c03f2351 Mon Sep 17 00:00:00 2001
From: Richard Lau <rlau@redhat.com>
Date: Fri, 18 Oct 2024 16:52:00 +0000
Subject: [PATCH 033/216] build: tidy up cares.gyp

Add comment noting that `cares_sources_common` is generated by tooling.
Remove duplicated entries.

PR-URL: https://github.com/nodejs/node/pull/55445
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
---
 deps/cares/cares.gyp | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/deps/cares/cares.gyp b/deps/cares/cares.gyp
index ef43c25a47ff34..c34ad7370ca45a 100644
--- a/deps/cares/cares.gyp
+++ b/deps/cares/cares.gyp
@@ -1,5 +1,6 @@
 {
   'variables': {
+    # This list is generated by `tools/dep_updaters/update-c-ares.mjs`.
     'cares_sources_common': [
       'include/ares.h',
       'include/ares_build.h',
@@ -135,13 +136,6 @@
     ],
     'cares_sources_mac': [
       'config/darwin/ares_config.h',
-      'src/lib/ares_sysconfig_mac.c',
-      'src/lib/thirdparty/apple/dnsinfo.h',
-    ],
-    'cares_sources_win': [
-      'src/lib/ares_sysconfig_win.c',
-      'src/lib/config-win32.h',
-      'src/lib/windows_port.c',
     ],
   },
 
@@ -203,9 +197,6 @@
             '_WINSOCK_DEPRECATED_NO_WARNINGS',
           ],
           'include_dirs': [ 'config/win32' ],
-          'sources': [
-            '<@(cares_sources_win)',
-          ],
           'libraries': [
             '-lws2_32.lib',
             '-liphlpapi.lib'

From aeae7e1e6f688ba44b38f3832751bc660b734bd1 Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Mon, 21 Oct 2024 15:25:45 +0200
Subject: [PATCH 034/216] meta: move one or more collaborators to emeritus

PR-URL: https://github.com/nodejs/node/pull/55381
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index b60e3c5df619a4..2bb65e5dcc4d6d 100644
--- a/README.md
+++ b/README.md
@@ -320,8 +320,6 @@ For information about the governance of the Node.js project, see
   **Kohei Ueno** <<kohei.ueno119@gmail.com>> (he/him)
 * [daeyeon](https://github.com/daeyeon) -
   **Daeyeon Jeong** <<daeyeon.dev@gmail.com>> (he/him)
-* [danielleadams](https://github.com/danielleadams) -
-  **Danielle Adams** <<adamzdanielle@gmail.com>> (she/her)
 * [debadree25](https://github.com/debadree25) -
   **Debadree Chatterjee** <<debadree333@gmail.com>> (he/him)
 * [deokjinkim](https://github.com/deokjinkim) -
@@ -502,6 +500,8 @@ For information about the governance of the Node.js project, see
   **Claudio Rodriguez** <<cjrodr@yahoo.com>>
 * [danbev](https://github.com/danbev) -
   **Daniel Bevenius** <<daniel.bevenius@gmail.com>> (he/him)
+* [danielleadams](https://github.com/danielleadams) -
+  **Danielle Adams** <<adamzdanielle@gmail.com>> (she/her)
 * [DavidCai1993](https://github.com/DavidCai1993) -
   **David Cai** <<davidcai1993@yahoo.com>> (he/him)
 * [davisjam](https://github.com/davisjam) -

From ad725c766dc5c3ca6841fc9085a865acb72c02ac Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Mon, 21 Oct 2024 17:51:00 +0200
Subject: [PATCH 035/216] deps: update ada to 2.9.1

PR-URL: https://github.com/nodejs/node/pull/54679
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
---
 deps/ada/ada.cpp | 41 +++++++++++++++++++------------
 deps/ada/ada.h   | 64 ++++++++++++++++++++++++++++++++++--------------
 2 files changed, 71 insertions(+), 34 deletions(-)
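
Reviewer note (not part of the commit): the `parse_scheme` hunks below change
the C++ return value on three early-return branches from `true` to `false`.
That return value is not observable through the JavaScript `URL.protocol`
setter, so the following minimal sketch only illustrates the WHATWG URL rules
those branches implement, as quoted in the code comments of the diff. The
helper name `trySetProtocol` and the sample URLs are hypothetical.

```javascript
'use strict';

// Hypothetical helper: try to change the scheme and return the resulting one.
function trySetProtocol(href, protocol) {
  const url = new URL(href);
  url.protocol = protocol;
  return url.protocol;
}

// Special scheme -> non-special scheme: the change is ignored.
console.log(trySetProtocol('https://example.com/', 'foo'));
// URL includes credentials and the new scheme is "file": ignored.
console.log(trySetProtocol('https://user:pass@example.com/', 'file'));
// Scheme is "file" with an empty host: ignored.
console.log(trySetProtocol('file:///tmp/x', 'http'));
// Special -> special with none of the above conditions: applied.
console.log(trySetProtocol('http://example.com/', 'ws'));
```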

diff --git a/deps/ada/ada.cpp b/deps/ada/ada.cpp
index 72512a0651826e..d7f9b3a92c5330 100644
--- a/deps/ada/ada.cpp
+++ b/deps/ada/ada.cpp
@@ -1,4 +1,4 @@
-/* auto-generated on 2024-07-06 17:38:56 -0400. Do not edit! */
+/* auto-generated on 2024-09-02 20:07:32 -0400. Do not edit! */
 /* begin file src/ada.cpp */
 #include "ada.h"
 /* begin file src/checkers.cpp */
@@ -11553,21 +11553,21 @@ ada_really_inline bool url::parse_scheme(const std::string_view input) {
       // If url's scheme is not a special scheme and buffer is a special scheme,
       // then return.
       if (is_special() != is_input_special) {
-        return true;
+        return false;
       }
 
       // If url includes credentials or has a non-null port, and buffer is
       // "file", then return.
       if ((has_credentials() || port.has_value()) &&
           parsed_type == ada::scheme::type::FILE) {
-        return true;
+        return false;
       }
 
       // If url's scheme is "file" and its host is an empty host, then return.
       // An empty host is the empty string.
       if (type == ada::scheme::type::FILE && host.has_value() &&
           host.value().empty()) {
-        return true;
+        return false;
       }
     }
 
@@ -13215,21 +13215,21 @@ template <bool has_state_override>
       // If url's scheme is not a special scheme and buffer is a special scheme,
       // then return.
       if (is_special() != is_input_special) {
-        return true;
+        return false;
       }
 
       // If url includes credentials or has a non-null port, and buffer is
       // "file", then return.
       if ((has_credentials() || components.port != url_components::omitted) &&
           parsed_type == ada::scheme::type::FILE) {
-        return true;
+        return false;
       }
 
       // If url's scheme is "file" and its host is an empty host, then return.
       // An empty host is the empty string.
       if (type == ada::scheme::type::FILE &&
           components.host_start == components.host_end) {
-        return true;
+        return false;
       }
     }
 
@@ -13830,7 +13830,8 @@ bool url_aggregator::set_hostname(const std::string_view input) {
   return "null";
 }
 
-[[nodiscard]] std::string_view url_aggregator::get_username() const noexcept {
+[[nodiscard]] std::string_view url_aggregator::get_username() const noexcept
+    ada_lifetime_bound {
   ada_log("url_aggregator::get_username");
   if (has_non_empty_username()) {
     return helpers::substring(buffer, components.protocol_end + 2,
@@ -13839,7 +13840,8 @@ bool url_aggregator::set_hostname(const std::string_view input) {
   return "";
 }
 
-[[nodiscard]] std::string_view url_aggregator::get_password() const noexcept {
+[[nodiscard]] std::string_view url_aggregator::get_password() const noexcept
+    ada_lifetime_bound {
   ada_log("url_aggregator::get_password");
   if (has_non_empty_password()) {
     return helpers::substring(buffer, components.username_end + 1,
@@ -13848,7 +13850,8 @@ bool url_aggregator::set_hostname(const std::string_view input) {
   return "";
 }
 
-[[nodiscard]] std::string_view url_aggregator::get_port() const noexcept {
+[[nodiscard]] std::string_view url_aggregator::get_port() const noexcept
+    ada_lifetime_bound {
   ada_log("url_aggregator::get_port");
   if (components.port == url_components::omitted) {
     return "";
@@ -13857,7 +13860,8 @@ bool url_aggregator::set_hostname(const std::string_view input) {
                             components.pathname_start);
 }
 
-[[nodiscard]] std::string_view url_aggregator::get_hash() const noexcept {
+[[nodiscard]] std::string_view url_aggregator::get_hash() const noexcept
+    ada_lifetime_bound {
   ada_log("url_aggregator::get_hash");
   // If this's URL's fragment is either null or the empty string, then return
   // the empty string. Return U+0023 (#), followed by this's URL's fragment.
@@ -13870,7 +13874,8 @@ bool url_aggregator::set_hostname(const std::string_view input) {
   return helpers::substring(buffer, components.hash_start);
 }
 
-[[nodiscard]] std::string_view url_aggregator::get_host() const noexcept {
+[[nodiscard]] std::string_view url_aggregator::get_host() const noexcept
+    ada_lifetime_bound {
   ada_log("url_aggregator::get_host");
   // Technically, we should check if there is a hostname, but
   // the code below works even if there isn't.
@@ -13888,7 +13893,8 @@ bool url_aggregator::set_hostname(const std::string_view input) {
   return helpers::substring(buffer, start, components.pathname_start);
 }
 
-[[nodiscard]] std::string_view url_aggregator::get_hostname() const noexcept {
+[[nodiscard]] std::string_view url_aggregator::get_hostname() const noexcept
+    ada_lifetime_bound {
   ada_log("url_aggregator::get_hostname");
   // Technically, we should check if there is a hostname, but
   // the code below works even if there isn't.
@@ -13902,7 +13908,8 @@ bool url_aggregator::set_hostname(const std::string_view input) {
   return helpers::substring(buffer, start, components.host_end);
 }
 
-[[nodiscard]] std::string_view url_aggregator::get_pathname() const noexcept {
+[[nodiscard]] std::string_view url_aggregator::get_pathname() const noexcept
+    ada_lifetime_bound {
   ada_log("url_aggregator::get_pathname pathname_start = ",
           components.pathname_start, " buffer.size() = ", buffer.size(),
           " components.search_start = ", components.search_start,
@@ -13916,7 +13923,8 @@ bool url_aggregator::set_hostname(const std::string_view input) {
   return helpers::substring(buffer, components.pathname_start, ending_index);
 }
 
-[[nodiscard]] std::string_view url_aggregator::get_search() const noexcept {
+[[nodiscard]] std::string_view url_aggregator::get_search() const noexcept
+    ada_lifetime_bound {
   ada_log("url_aggregator::get_search");
   // If this's URL's query is either null or the empty string, then return the
   // empty string. Return U+003F (?), followed by this's URL's query.
@@ -13933,7 +13941,8 @@ bool url_aggregator::set_hostname(const std::string_view input) {
   return helpers::substring(buffer, components.search_start, ending_index);
 }
 
-[[nodiscard]] std::string_view url_aggregator::get_protocol() const noexcept {
+[[nodiscard]] std::string_view url_aggregator::get_protocol() const noexcept
+    ada_lifetime_bound {
   ada_log("url_aggregator::get_protocol");
   return helpers::substring(buffer, 0, components.protocol_end);
 }
diff --git a/deps/ada/ada.h b/deps/ada/ada.h
index 00ccd803505087..4b00198e6a4bac 100644
--- a/deps/ada/ada.h
+++ b/deps/ada/ada.h
@@ -1,4 +1,4 @@
-/* auto-generated on 2024-07-06 17:38:56 -0400. Do not edit! */
+/* auto-generated on 2024-09-02 20:07:32 -0400. Do not edit! */
 /* begin file include/ada.h */
 /**
  * @file ada.h
@@ -479,6 +479,18 @@ namespace ada {
 #define ADA_NEON 1
 #endif
 
+#ifndef __has_cpp_attribute
+#define ada_lifetime_bound
+#elif __has_cpp_attribute(msvc::lifetimebound)
+#define ada_lifetime_bound [[msvc::lifetimebound]]
+#elif __has_cpp_attribute(clang::lifetimebound)
+#define ada_lifetime_bound [[clang::lifetimebound]]
+#elif __has_cpp_attribute(lifetimebound)
+#define ada_lifetime_bound [[lifetimebound]]
+#else
+#define ada_lifetime_bound
+#endif
+
 #endif  // ADA_COMMON_DEFS_H
 /* end file include/ada/common_defs.h */
 #include <cstdint>
@@ -4845,35 +4857,38 @@ struct url_aggregator : url_base {
    * @see https://url.spec.whatwg.org/#dom-url-href
    * @see https://url.spec.whatwg.org/#concept-url-serializer
    */
-  [[nodiscard]] inline std::string_view get_href() const noexcept;
+  [[nodiscard]] inline std::string_view get_href() const noexcept
+      ada_lifetime_bound;
   /**
    * The username getter steps are to return this's URL's username.
    * This function does not allocate memory.
    * @return a lightweight std::string_view.
    * @see https://url.spec.whatwg.org/#dom-url-username
    */
-  [[nodiscard]] std::string_view get_username() const noexcept;
+  [[nodiscard]] std::string_view get_username() const noexcept
+      ada_lifetime_bound;
   /**
    * The password getter steps are to return this's URL's password.
    * This function does not allocate memory.
    * @return a lightweight std::string_view.
    * @see https://url.spec.whatwg.org/#dom-url-password
    */
-  [[nodiscard]] std::string_view get_password() const noexcept;
+  [[nodiscard]] std::string_view get_password() const noexcept
+      ada_lifetime_bound;
   /**
    * Return this's URL's port, serialized.
    * This function does not allocate memory.
    * @return a lightweight std::string_view.
    * @see https://url.spec.whatwg.org/#dom-url-port
    */
-  [[nodiscard]] std::string_view get_port() const noexcept;
+  [[nodiscard]] std::string_view get_port() const noexcept ada_lifetime_bound;
   /**
    * Return U+0023 (#), followed by this's URL's fragment.
    * This function does not allocate memory.
    * @return a lightweight std::string_view..
    * @see https://url.spec.whatwg.org/#dom-url-hash
    */
-  [[nodiscard]] std::string_view get_hash() const noexcept;
+  [[nodiscard]] std::string_view get_hash() const noexcept ada_lifetime_bound;
   /**
    * Return url's host, serialized, followed by U+003A (:) and url's port,
    * serialized.
@@ -4882,7 +4897,7 @@ struct url_aggregator : url_base {
    * @return a lightweight std::string_view.
    * @see https://url.spec.whatwg.org/#dom-url-host
    */
-  [[nodiscard]] std::string_view get_host() const noexcept;
+  [[nodiscard]] std::string_view get_host() const noexcept ada_lifetime_bound;
   /**
    * Return this's URL's host, serialized.
    * This function does not allocate memory.
@@ -4890,7 +4905,8 @@ struct url_aggregator : url_base {
    * @return a lightweight std::string_view.
    * @see https://url.spec.whatwg.org/#dom-url-hostname
    */
-  [[nodiscard]] std::string_view get_hostname() const noexcept;
+  [[nodiscard]] std::string_view get_hostname() const noexcept
+      ada_lifetime_bound;
   /**
    * The pathname getter steps are to return the result of URL path serializing
    * this's URL.
@@ -4898,7 +4914,8 @@ struct url_aggregator : url_base {
    * @return a lightweight std::string_view.
    * @see https://url.spec.whatwg.org/#dom-url-pathname
    */
-  [[nodiscard]] std::string_view get_pathname() const noexcept;
+  [[nodiscard]] std::string_view get_pathname() const noexcept
+      ada_lifetime_bound;
   /**
    * Compute the pathname length in bytes without instantiating a view or a
    * string.
@@ -4912,7 +4929,7 @@ struct url_aggregator : url_base {
    * @return a lightweight std::string_view.
    * @see https://url.spec.whatwg.org/#dom-url-search
    */
-  [[nodiscard]] std::string_view get_search() const noexcept;
+  [[nodiscard]] std::string_view get_search() const noexcept ada_lifetime_bound;
   /**
    * The protocol getter steps are to return this's URL's scheme, followed by
    * U+003A (:).
@@ -4920,7 +4937,8 @@ struct url_aggregator : url_base {
    * @return a lightweight std::string_view.
    * @see https://url.spec.whatwg.org/#dom-url-protocol
    */
-  [[nodiscard]] std::string_view get_protocol() const noexcept;
+  [[nodiscard]] std::string_view get_protocol() const noexcept
+      ada_lifetime_bound;
 
   /**
    * A URL includes credentials if its username or password is not the empty
@@ -5828,7 +5846,7 @@ inline void url::set_scheme(std::string &&new_scheme) noexcept {
   type = ada::scheme::get_scheme_type(new_scheme);
   // We only move the 'scheme' if it is non-special.
   if (!is_special()) {
-    non_special_scheme = new_scheme;
+    non_special_scheme = std::move(new_scheme);
   }
 }
 
@@ -5877,10 +5895,15 @@ inline void url::copy_scheme(const ada::url &u) {
 ada_really_inline size_t url::parse_port(std::string_view view,
                                          bool check_trailing_content) noexcept {
   ada_log("parse_port('", view, "') ", view.size());
+  if (!view.empty() && view[0] == '-') {
+    ada_log("parse_port: view[0] == '-'");
+    is_valid = false;
+    return 0;
+  }
   uint16_t parsed_port{};
   auto r = std::from_chars(view.data(), view.data() + view.size(), parsed_port);
   if (r.ec == std::errc::result_out_of_range) {
-    ada_log("parse_port: std::errc::result_out_of_range");
+    ada_log("parse_port: r.ec == std::errc::result_out_of_range");
     is_valid = false;
     return 0;
   }
@@ -6776,8 +6799,8 @@ inline bool url_aggregator::has_port() const noexcept {
          buffer[components.host_end + 1] == '.';
 }
 
-[[nodiscard]] inline std::string_view url_aggregator::get_href()
-    const noexcept {
+[[nodiscard]] inline std::string_view url_aggregator::get_href() const noexcept
+    ada_lifetime_bound {
   ada_log("url_aggregator::get_href");
   return buffer;
 }
@@ -6785,10 +6808,15 @@ inline bool url_aggregator::has_port() const noexcept {
 ada_really_inline size_t url_aggregator::parse_port(
     std::string_view view, bool check_trailing_content) noexcept {
   ada_log("url_aggregator::parse_port('", view, "') ", view.size());
+  if (!view.empty() && view[0] == '-') {
+    ada_log("parse_port: view[0] == '-'");
+    is_valid = false;
+    return 0;
+  }
   uint16_t parsed_port{};
   auto r = std::from_chars(view.data(), view.data() + view.size(), parsed_port);
   if (r.ec == std::errc::result_out_of_range) {
-    ada_log("parse_port: std::errc::result_out_of_range");
+    ada_log("parse_port: r.ec == std::errc::result_out_of_range");
     is_valid = false;
     return 0;
   }
@@ -7279,14 +7307,14 @@ url_search_params_entries_iter::next() {
 #ifndef ADA_ADA_VERSION_H
 #define ADA_ADA_VERSION_H
 
-#define ADA_VERSION "2.9.0"
+#define ADA_VERSION "2.9.2"
 
 namespace ada {
 
 enum {
   ADA_VERSION_MAJOR = 2,
   ADA_VERSION_MINOR = 9,
-  ADA_VERSION_REVISION = 0,
+  ADA_VERSION_REVISION = 2,
 };
 
 }  // namespace ada

From 284e932326ad06531a73dead84faf6b687b40f5a Mon Sep 17 00:00:00 2001
From: Cloorc <13597105+cloorc@users.noreply.github.com>
Date: Tue, 22 Oct 2024 03:50:43 +0800
Subject: [PATCH 036/216] build: fix uninstall script for AIX 7.1

Signed-off-by: Cloorc <13597105+cloorc@users.noreply.github.com>
PR-URL: https://github.com/nodejs/node/pull/55438
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Abdirahim Musse <abdirahim.musse@ibm.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
---
 tools/install.py | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/install.py b/tools/install.py
index b132c7bf26c028..2dceb5c39ea4a1 100755
--- a/tools/install.py
+++ b/tools/install.py
@@ -4,10 +4,15 @@
 import ast
 import errno
 import os
+import platform
 import shutil
 import sys
 import re
 
+current_system = platform.system()
+
+SYSTEM_AIX = "AIX"
+
 def abspath(*args):
   path = os.path.join(*args)
   return os.path.abspath(path)
@@ -44,6 +49,7 @@ def try_rmdir_r(options, path):
     except OSError as e:
       if e.errno == errno.ENOTEMPTY: return
       if e.errno == errno.ENOENT: return
+      if e.errno == errno.EEXIST and current_system == SYSTEM_AIX: return
       raise
     path = abspath(path, '..')
 

From 0487e70475a538e116bfa8b869ad1567c722ad35 Mon Sep 17 00:00:00 2001
From: Ian Kerins <git@isk.haus>
Date: Mon, 21 Oct 2024 20:17:55 -0400
Subject: [PATCH 037/216] doc: remove outdated remarks about `highWaterMark` in
 fs

`Readable`'s `highWaterMark` has in fact been 64KiB since #52037.

PR-URL: https://github.com/nodejs/node/pull/55462
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
---
 doc/api/fs.md | 6 ------
 1 file changed, 6 deletions(-)
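
Note (not part of the commit message): a minimal sketch for checking the two
defaults the removed paragraphs compared. It assumes a Node.js release that
provides `stream.getDefaultHighWaterMark()`; the 64 KiB `Readable` default is
only expected on releases that include #52037, so that value is printed rather
than assumed. The helper names are illustrative, and `process.execPath` is
used only because it is a file that certainly exists.

```js
'use strict';
const fs = require('node:fs');
const stream = require('node:stream');

// Default highWaterMark of the stream returned by fs.createReadStream().
function fsReadStreamDefault() {
  const rs = fs.createReadStream(process.execPath);
  const hwm = rs.readableHighWaterMark;
  rs.destroy(); // We only wanted to inspect the option, not read the file.
  return hwm;
}

// Default highWaterMark of a plain byte-mode stream.Readable.
function readableDefault() {
  return stream.getDefaultHighWaterMark(false);
}

console.log('fs.ReadStream default:', fsReadStreamDefault());
console.log('stream.Readable default:', readableDefault());
```

If both lines print the same number, the "unlike the 16 KiB default" remark no
longer describes that release.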

diff --git a/doc/api/fs.md b/doc/api/fs.md
index 81cce1ff2ebfb4..d2ee714c0460a1 100644
--- a/doc/api/fs.md
+++ b/doc/api/fs.md
@@ -268,9 +268,6 @@ added: v16.11.0
   * `signal` {AbortSignal|undefined} **Default:** `undefined`
 * Returns: {fs.ReadStream}
 
-Unlike the 16 KiB default `highWaterMark` for a {stream.Readable}, the stream
-returned by this method has a default `highWaterMark` of 64 KiB.
-
 `options` can include `start` and `end` values to read a range of bytes from
 the file instead of the entire file. Both `start` and `end` are inclusive and
 start counting at 0, allowed values are in the
@@ -2473,9 +2470,6 @@ changes:
   * `signal` {AbortSignal|null} **Default:** `null`
 * Returns: {fs.ReadStream}
 
-Unlike the 16 KiB default `highWaterMark` for a {stream.Readable}, the stream
-returned by this method has a default `highWaterMark` of 64 KiB.
-
 `options` can include `start` and `end` values to read a range of bytes from
 the file instead of the entire file. Both `start` and `end` are inclusive and
 start counting at 0, allowed values are in the

From f63413c6f30c2cfd856b70b6cecb3b64d70ac28c Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Tue, 22 Oct 2024 02:38:21 +0200
Subject: [PATCH 038/216] deps: update c-ares to v1.34.2

PR-URL: https://github.com/nodejs/node/pull/55463
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
---
 deps/cares/CMakeLists.txt                     |  2 +-
 deps/cares/RELEASE-NOTES.md                   |  8 +++++++-
 deps/cares/aminclude_static.am                |  2 +-
 deps/cares/configure                          | 20 +++++++++----------
 deps/cares/configure.ac                       |  2 +-
 deps/cares/docs/Makefile.in                   |  2 ++
 deps/cares/docs/Makefile.inc                  |  2 ++
 deps/cares/docs/ares_dns_rr_del_opt_byid.3    |  3 +++
 .../cares/docs/ares_set_socket_functions_ex.3 |  3 +++
 deps/cares/include/ares_version.h             | 18 +++++++----------
 deps/cares/src/lib/Makefile.in                |  2 +-
 .../src/lib/event/ares_event_wake_pipe.c      |  6 +++---
 12 files changed, 41 insertions(+), 29 deletions(-)
 create mode 100644 deps/cares/docs/ares_dns_rr_del_opt_byid.3
 create mode 100644 deps/cares/docs/ares_set_socket_functions_ex.3
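
Note (not part of the commit message): the `ares_version.h` hunk below turns
`ARES_VERSION_STR` back into a plain string literal because some integrators
detect the version by parsing the header. The sketch below shows what such a
scan might look like, plus the packing used by the `ARES_VERSION` macro. The
helper names and the regular expression are assumptions for illustration, not
code from c-ares or from any particular integrator.

```js
'use strict';

// Abbreviated copy of the new ares_version.h content from the hunk below.
const header = `
#define ARES_VERSION_MAJOR 1
#define ARES_VERSION_MINOR 34
#define ARES_VERSION_PATCH 2
#define ARES_VERSION_STR "1.34.2"
`;

// Hypothetical helper: read the version the way a naive header scanner
// might, by looking for a quoted literal right after ARES_VERSION_STR.
function scanVersionString(text) {
  const m = /^#define\s+ARES_VERSION_STR\s+"([^"]+)"/m.exec(text);
  return m ? m[1] : null;
}

// Same packing as the ARES_VERSION macro:
// (major << 16) | (minor << 8) | patch.
function packVersion(major, minor, patch) {
  return (major << 16) | (minor << 8) | patch;
}

console.log(scanVersionString(header));
console.log('0x' + packVersion(1, 34, 2).toString(16).padStart(6, '0'));
```

A scanner of this shape finds nothing when the macro is built from
preprocessor stringification, which is the situation the NOTE comment in the
header describes.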

diff --git a/deps/cares/CMakeLists.txt b/deps/cares/CMakeLists.txt
index 39963f1e8c3cb4..cf9a516414d1ab 100644
--- a/deps/cares/CMakeLists.txt
+++ b/deps/cares/CMakeLists.txt
@@ -12,7 +12,7 @@ INCLUDE (CheckCSourceCompiles)
 INCLUDE (CheckStructHasMember)
 INCLUDE (CheckLibraryExists)
 
-PROJECT (c-ares LANGUAGES C VERSION "1.34.1" )
+PROJECT (c-ares LANGUAGES C VERSION "1.34.2" )
 
 # Set this version before release
 SET (CARES_VERSION "${PROJECT_VERSION}")
diff --git a/deps/cares/RELEASE-NOTES.md b/deps/cares/RELEASE-NOTES.md
index fa1db666083365..cbd4788600f3ac 100644
--- a/deps/cares/RELEASE-NOTES.md
+++ b/deps/cares/RELEASE-NOTES.md
@@ -1,4 +1,10 @@
-## c-ares version 1.34.1 - Octover 9 2024
+## c-ares version 1.34.2 - October 15 2024
+
+This release contains a fix for downstream packages detecting the c-ares
+version based on the contents of the header file rather than the
+distributed pkgconf or cmake files.
+
+## c-ares version 1.34.1 - October 9 2024
 
 This release fixes a packaging issue.
 
diff --git a/deps/cares/aminclude_static.am b/deps/cares/aminclude_static.am
index 7cc94c822cf773..9e346c39c815a1 100644
--- a/deps/cares/aminclude_static.am
+++ b/deps/cares/aminclude_static.am
@@ -1,6 +1,6 @@
 
 # aminclude_static.am generated automatically by Autoconf
-# from AX_AM_MACROS_STATIC on Wed Oct  9 20:58:25 EDT 2024
+# from AX_AM_MACROS_STATIC on Tue Oct 15 06:09:51 EDT 2024
 
 
 # Code coverage
diff --git a/deps/cares/configure b/deps/cares/configure
index 635872c9f18e1a..a6b48c9872767b 100755
--- a/deps/cares/configure
+++ b/deps/cares/configure
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.72 for c-ares 1.34.1.
+# Generated by GNU Autoconf 2.72 for c-ares 1.34.2.
 #
 # Report bugs to <c-ares mailing list: http://lists.haxx.se/listinfo/c-ares>.
 #
@@ -614,8 +614,8 @@ MAKEFLAGS=
 # Identity of this package.
 PACKAGE_NAME='c-ares'
 PACKAGE_TARNAME='c-ares'
-PACKAGE_VERSION='1.34.1'
-PACKAGE_STRING='c-ares 1.34.1'
+PACKAGE_VERSION='1.34.2'
+PACKAGE_STRING='c-ares 1.34.2'
 PACKAGE_BUGREPORT='c-ares mailing list: http://lists.haxx.se/listinfo/c-ares'
 PACKAGE_URL=''
 
@@ -1423,7 +1423,7 @@ if test "$ac_init_help" = "long"; then
   # Omit some internal or obsolete options to make the list less imposing.
   # This message is too long to be a string in the A/UX 3.1 sh.
   cat <<_ACEOF
-'configure' configures c-ares 1.34.1 to adapt to many kinds of systems.
+'configure' configures c-ares 1.34.2 to adapt to many kinds of systems.
 
 Usage: $0 [OPTION]... [VAR=VALUE]...
 
@@ -1494,7 +1494,7 @@ fi
 
 if test -n "$ac_init_help"; then
   case $ac_init_help in
-     short | recursive ) echo "Configuration of c-ares 1.34.1:";;
+     short | recursive ) echo "Configuration of c-ares 1.34.2:";;
    esac
   cat <<\_ACEOF
 
@@ -1635,7 +1635,7 @@ fi
 test -n "$ac_init_help" && exit $ac_status
 if $ac_init_version; then
   cat <<\_ACEOF
-c-ares configure 1.34.1
+c-ares configure 1.34.2
 generated by GNU Autoconf 2.72
 
 Copyright (C) 2023 Free Software Foundation, Inc.
@@ -2279,7 +2279,7 @@ cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
 
-It was created by c-ares $as_me 1.34.1, which was
+It was created by c-ares $as_me 1.34.2, which was
 generated by GNU Autoconf 2.72.  Invocation command line was
 
   $ $0$ac_configure_args_raw
@@ -6192,7 +6192,7 @@ fi
 
 # Define the identity of the package.
  PACKAGE='c-ares'
- VERSION='1.34.1'
+ VERSION='1.34.2'
 
 
 printf "%s\n" "#define PACKAGE \"$PACKAGE\"" >>confdefs.h
@@ -26823,7 +26823,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
 # report actual input values of CONFIG_FILES etc. instead of their
 # values after options handling.
 ac_log="
-This file was extended by c-ares $as_me 1.34.1, which was
+This file was extended by c-ares $as_me 1.34.2, which was
 generated by GNU Autoconf 2.72.  Invocation command line was
 
   CONFIG_FILES    = $CONFIG_FILES
@@ -26891,7 +26891,7 @@ ac_cs_config_escaped=`printf "%s\n" "$ac_cs_config" | sed "s/^ //; s/'/'\\\\\\\\
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 ac_cs_config='$ac_cs_config_escaped'
 ac_cs_version="\\
-c-ares config.status 1.34.1
+c-ares config.status 1.34.2
 configured by $0, generated by GNU Autoconf 2.72,
   with options \\"\$ac_cs_config\\"
 
diff --git a/deps/cares/configure.ac b/deps/cares/configure.ac
index cc52c2c6c5de0a..0ebda1c63f5f5e 100644
--- a/deps/cares/configure.ac
+++ b/deps/cares/configure.ac
@@ -2,7 +2,7 @@ dnl Copyright (C) The c-ares project and its contributors
 dnl SPDX-License-Identifier: MIT
 AC_PREREQ([2.69])
 
-AC_INIT([c-ares], [1.34.1],
+AC_INIT([c-ares], [1.34.2],
   [c-ares mailing list: http://lists.haxx.se/listinfo/c-ares])
 
 CARES_VERSION_INFO="21:1:19"
diff --git a/deps/cares/docs/Makefile.in b/deps/cares/docs/Makefile.in
index 75b3f3d942bbd6..da57136dad9a88 100644
--- a/deps/cares/docs/Makefile.in
+++ b/deps/cares/docs/Makefile.in
@@ -386,6 +386,7 @@ MANPAGES = ares_cancel.3		\
   ares_dns_rr.3				\
   ares_dns_rr_add_abin.3		\
   ares_dns_rr_del_abin.3		\
+  ares_dns_rr_del_opt_byid.3		\
   ares_dns_rr_get_abin.3		\
   ares_dns_rr_get_abin_cnt.3		\
   ares_dns_rr_get_addr.3		\
@@ -483,6 +484,7 @@ MANPAGES = ares_cancel.3		\
   ares_set_socket_callback.3		\
   ares_set_socket_configure_callback.3	\
   ares_set_socket_functions.3		\
+  ares_set_socket_functions_ex.3	\
   ares_set_sortlist.3			\
   ares_strerror.3			\
   ares_svcb_param_t.3			\
diff --git a/deps/cares/docs/Makefile.inc b/deps/cares/docs/Makefile.inc
index d6ad73246b3f99..b5519369aa9ea5 100644
--- a/deps/cares/docs/Makefile.inc
+++ b/deps/cares/docs/Makefile.inc
@@ -43,6 +43,7 @@ MANPAGES = ares_cancel.3		\
   ares_dns_rr.3				\
   ares_dns_rr_add_abin.3		\
   ares_dns_rr_del_abin.3		\
+  ares_dns_rr_del_opt_byid.3		\
   ares_dns_rr_get_abin.3		\
   ares_dns_rr_get_abin_cnt.3		\
   ares_dns_rr_get_addr.3		\
@@ -140,6 +141,7 @@ MANPAGES = ares_cancel.3		\
   ares_set_socket_callback.3		\
   ares_set_socket_configure_callback.3	\
   ares_set_socket_functions.3		\
+  ares_set_socket_functions_ex.3	\
   ares_set_sortlist.3			\
   ares_strerror.3			\
   ares_svcb_param_t.3			\
diff --git a/deps/cares/docs/ares_dns_rr_del_opt_byid.3 b/deps/cares/docs/ares_dns_rr_del_opt_byid.3
new file mode 100644
index 00000000000000..b93e4cd4e37fa8
--- /dev/null
+++ b/deps/cares/docs/ares_dns_rr_del_opt_byid.3
@@ -0,0 +1,3 @@
+.\" Copyright (C) 2023 The c-ares project and its contributors.
+.\" SPDX-License-Identifier: MIT
+.so man3/ares_dns_rr.3
diff --git a/deps/cares/docs/ares_set_socket_functions_ex.3 b/deps/cares/docs/ares_set_socket_functions_ex.3
new file mode 100644
index 00000000000000..a0f02456c320cb
--- /dev/null
+++ b/deps/cares/docs/ares_set_socket_functions_ex.3
@@ -0,0 +1,3 @@
+.\" Copyright (C) 2024 The c-ares project and its contributors.
+.\" SPDX-License-Identifier: MIT
+.so man3/ares_set_socket_functions.3
diff --git a/deps/cares/include/ares_version.h b/deps/cares/include/ares_version.h
index ba98e6949d53c8..d7a9c9e61e36d2 100644
--- a/deps/cares/include/ares_version.h
+++ b/deps/cares/include/ares_version.h
@@ -32,20 +32,16 @@
 
 #define ARES_VERSION_MAJOR 1
 #define ARES_VERSION_MINOR 34
-#define ARES_VERSION_PATCH 1
+#define ARES_VERSION_PATCH 2
+#define ARES_VERSION_STR "1.34.2"
+
+/* NOTE: We cannot make the version string a C preprocessor stringify operation
+ *       due to assumptions made by integrators that aren't properly using
+ *       pkgconf or cmake and are doing their own detection based on parsing
+ *       this header */
 
 #define ARES_VERSION                                        \
   ((ARES_VERSION_MAJOR << 16) | (ARES_VERSION_MINOR << 8) | \
    (ARES_VERSION_PATCH))
 
-
-/* Need a level of indirection due to argument prescan to stringify a macro
- * value. */
-#define ARES_STRINGIFY_PRE(s) #s
-#define ARES_STRINGIFY(s)     ARES_STRINGIFY_PRE(s)
-
-#define ARES_VERSION_STR             \
-  ARES_STRINGIFY(ARES_VERSION_MAJOR) \
-  "." ARES_STRINGIFY(ARES_VERSION_MINOR) "." ARES_STRINGIFY(ARES_VERSION_PATCH)
-
 #endif
diff --git a/deps/cares/src/lib/Makefile.in b/deps/cares/src/lib/Makefile.in
index 6fdb27835828af..db6b17f2f53112 100644
--- a/deps/cares/src/lib/Makefile.in
+++ b/deps/cares/src/lib/Makefile.in
@@ -15,7 +15,7 @@
 @SET_MAKE@
 
 # aminclude_static.am generated automatically by Autoconf
-# from AX_AM_MACROS_STATIC on Wed Oct  9 20:58:25 EDT 2024
+# from AX_AM_MACROS_STATIC on Tue Oct 15 06:09:51 EDT 2024
 
 # Copyright (C) The c-ares project and its contributors
 # SPDX-License-Identifier: MIT
diff --git a/deps/cares/src/lib/event/ares_event_wake_pipe.c b/deps/cares/src/lib/event/ares_event_wake_pipe.c
index 282d013dc62b24..d3b166a3d6cb78 100644
--- a/deps/cares/src/lib/event/ares_event_wake_pipe.c
+++ b/deps/cares/src/lib/event/ares_event_wake_pipe.c
@@ -92,9 +92,9 @@ static ares_pipeevent_t *ares_pipeevent_init(void)
   }
 #    endif
 
-#    ifdef O_CLOEXEC
-  fcntl(p->filedes[0], F_SETFD, O_CLOEXEC);
-  fcntl(p->filedes[1], F_SETFD, O_CLOEXEC);
+#    ifdef FD_CLOEXEC
+  fcntl(p->filedes[0], F_SETFD, FD_CLOEXEC);
+  fcntl(p->filedes[1], F_SETFD, FD_CLOEXEC);
 #    endif
 #  endif
 

From c6236571fcc57c9652f1d068d1b31d6e3699f223 Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Tue, 22 Oct 2024 02:38:37 +0200
Subject: [PATCH 039/216] deps: update googletest to df1544b

PR-URL: https://github.com/nodejs/node/pull/55465
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
---
 deps/googletest/src/gtest.cc | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)
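
Note (not part of the commit message): the early return added below exists
because subtracting two same-signed infinities yields NaN, and NaN never
compares less than or equal to the tolerance. The following is a JavaScript
rendering of the patched comparison for illustration only; the function names
are made up, and the real helper returns an `AssertionResult` with a failure
message rather than a boolean.

```js
'use strict';

// True for +Infinity and -Infinity, false for NaN and finite numbers.
function isInf(x) {
  return x === Infinity || x === -Infinity;
}

function doubleNear(val1, val2, absError) {
  // Infinity - Infinity is NaN, and NaN <= absError is always false, so
  // same-signed infinities need an explicit early success. An infinite
  // tolerance also accepts any pair of infinities.
  if (isInf(val1) && isInf(val2) &&
      (Math.sign(val1) === Math.sign(val2) ||
       (absError > 0 && isInf(absError)))) {
    return true;
  }
  const diff = Math.abs(val1 - val2);
  return diff <= absError;
}

console.log(doubleNear(Infinity, Infinity, 0));
console.log(doubleNear(Infinity, -Infinity, 1));
console.log(doubleNear(1.0, 1.05, 0.1));
```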

diff --git a/deps/googletest/src/gtest.cc b/deps/googletest/src/gtest.cc
index 6662a13ce1455f..c08ab4197c5500 100644
--- a/deps/googletest/src/gtest.cc
+++ b/deps/googletest/src/gtest.cc
@@ -1660,10 +1660,25 @@ std::string GetBoolAssertionFailureMessage(
   return msg.GetString();
 }
 
-// Helper function for implementing ASSERT_NEAR.
+// Helper function for implementing ASSERT_NEAR. Treats infinity as a specific
+// value, such that comparing infinity to infinity is equal, the distance
+// between -infinity and +infinity is infinity, and infinity <= infinity is
+// true.
 AssertionResult DoubleNearPredFormat(const char* expr1, const char* expr2,
                                      const char* abs_error_expr, double val1,
                                      double val2, double abs_error) {
+  // We want to return success when the two values are infinity and at least
+  // one of the following is true:
+  //  * The values are the same-signed infinity.
+  //  * The error limit itself is infinity.
+  // This is done here so that we don't end up with a NaN when calculating the
+  // difference in values.
+  if (std::isinf(val1) && std::isinf(val2) &&
+      (std::signbit(val1) == std::signbit(val2) ||
+       (abs_error > 0.0 && std::isinf(abs_error)))) {
+    return AssertionSuccess();
+  }
+
   const double diff = fabs(val1 - val2);
   if (diff <= abs_error) return AssertionSuccess();
 

From a28376bb854c0d71cd60c552b1a4bbb543143f40 Mon Sep 17 00:00:00 2001
From: Aviv Keller <redyetidev@gmail.com>
Date: Tue, 22 Oct 2024 00:55:32 -0400
Subject: [PATCH 040/216] test: deflake
 `test-cluster-shared-handle-bind-privileged-port`

PR-URL: https://github.com/nodejs/node/pull/55378
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 ...cluster-shared-handle-bind-privileged-port.js | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
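
Note (not part of the commit message): the check added below can also be
written as a small standalone helper, sketched here. The sysctl path is the
one used in the patch; the helper names and the fallback constant of 1024
(the traditional threshold, used when the file is missing) are assumptions
made for illustration.

```js
'use strict';
const fs = require('node:fs');

// Path taken from the patch below.
const SYSCTL = '/proc/sys/net/ipv4/ip_unprivileged_port_start';
// Assumed fallback when the sysctl is unavailable.
const DEFAULT_UNPRIVILEGED_START = 1024;

// Returns the first port an unprivileged process may bind, falling back to
// the default when the sysctl file is missing, unreadable, or not numeric.
function unprivilegedPortStart() {
  try {
    const value = Number.parseInt(fs.readFileSync(SYSCTL, 'utf8'), 10);
    return Number.isNaN(value) ? DEFAULT_UNPRIVILEGED_START : value;
  } catch {
    return DEFAULT_UNPRIVILEGED_START;
  }
}

// A port is privileged when it lies below that threshold.
function isPrivilegedPort(port) {
  return port < unprivilegedPortStart();
}

console.log('port 42 privileged:', isPrivilegedPort(42));
```

Unlike the test, which passes the `Buffer` from `readFileSync()` straight to
`parseInt()`, this sketch decodes the file as UTF-8 and passes an explicit
radix; both approaches should yield the same number for a well-formed file.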

diff --git a/test/parallel/test-cluster-shared-handle-bind-privileged-port.js b/test/parallel/test-cluster-shared-handle-bind-privileged-port.js
index 8bdde0a3320e44..5e04c8eea1a899 100644
--- a/test/parallel/test-cluster-shared-handle-bind-privileged-port.js
+++ b/test/parallel/test-cluster-shared-handle-bind-privileged-port.js
@@ -35,6 +35,22 @@ if (common.isWindows)
 if (process.getuid() === 0)
   common.skip('as this test should not be run as `root`');
 
+// Some systems won't have port 42 set as a privileged port. In that
+// case, skip the test.
+if (common.isLinux) {
+  const { readFileSync } = require('fs');
+
+  try {
+    const unprivilegedPortStart = parseInt(readFileSync('/proc/sys/net/ipv4/ip_unprivileged_port_start'));
+    if (unprivilegedPortStart <= 42) {
+      common.skip('Port 42 is unprivileged');
+    }
+  } catch {
+    // Do nothing, feature doesn't exist, minimum is 1024 so 42 is usable.
+    // Continue...
+  }
+}
+
 const assert = require('assert');
 const cluster = require('cluster');
 const net = require('net');

From 475e478713d9cd6655b46a2812651fe5fc82cfdc Mon Sep 17 00:00:00 2001
From: leviscar <823727280@qq.com>
Date: Tue, 22 Oct 2024 13:00:25 +0800
Subject: [PATCH 041/216] doc: add `isBigIntObject` to documentation

Refs: https://github.com/nodejs/node/pull/19989
Fixes: https://github.com/nodejs/node/issues/55446
PR-URL: https://github.com/nodejs/node/pull/55450
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 doc/api/util.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/doc/api/util.md b/doc/api/util.md
index 5db0680265b3a4..334d89e36c0a2f 100644
--- a/doc/api/util.md
+++ b/doc/api/util.md
@@ -2292,6 +2292,24 @@ util.types.isBigInt64Array(new BigInt64Array());   // Returns true
 util.types.isBigInt64Array(new BigUint64Array());  // Returns false
 ```
 
+### `util.types.isBigIntObject(value)`
+
+<!-- YAML
+added: v10.4.0
+-->
+
+* `value` {any}
+* Returns: {boolean}
+
+Returns `true` if the value is a BigInt object, e.g. created
+by `Object(BigInt(123))`.
+
+```js
+util.types.isBigIntObject(Object(BigInt(123)));   // Returns true
+util.types.isBigIntObject(BigInt(123));   // Returns false
+util.types.isBigIntObject(123);  // Returns false
+```
+
 ### `util.types.isBigUint64Array(value)`
 
 <!-- YAML

From 9db657532b8b98c177870bc4b85e596a3729f6be Mon Sep 17 00:00:00 2001
From: "Ederin (Ed) Igharoro" <eigharoro@gmail.com>
Date: Mon, 21 Oct 2024 23:05:05 -0600
Subject: [PATCH 042/216] doc: add note about stdio streams in child_process

PR-URL: https://github.com/nodejs/node/pull/55322
Fixes: https://github.com/nodejs/node/issues/15714
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
---
 doc/api/child_process.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/doc/api/child_process.md b/doc/api/child_process.md
index a03dbc7ed68bd9..b98e0e041cf1b8 100644
--- a/doc/api/child_process.md
+++ b/doc/api/child_process.md
@@ -1061,6 +1061,15 @@ pipes between the parent and child. The value is one of the following:
    corresponds to the index in the `stdio` array. The stream must have an
    underlying descriptor (file streams do not start until the `'open'` event has
    occurred).
+   **NOTE:** While it is technically possible to pass `stdin` as a writable or
+   `stdout`/`stderr` as readable, it is not recommended.
+   Readable and writable streams are designed with distinct behaviors, and using
+   them incorrectly (e.g., passing a readable stream where a writable stream is
+   expected) can lead to unexpected results or errors. This practice is discouraged
+   as it may result in undefined behavior or dropped callbacks if the stream
+   encounters errors. Always ensure that `stdin` is used as writable and
+   `stdout`/`stderr` as readable to maintain the intended flow of data between
+   the parent and child processes.
 7. Positive integer: The integer value is interpreted as a file descriptor
    that is open in the parent process. It is shared with the child
    process, similar to how {Stream} objects can be shared. Passing sockets

From d5ad06007361a9f9a199fc40e05da0b6a52463a9 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Tue, 22 Oct 2024 09:49:19 +0200
Subject: [PATCH 043/216] test: fix addons and node-api test assumptions

PR-URL: https://github.com/nodejs/node/pull/55441
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 test/addons/repl-domain-abort/test.js                 | 2 +-
 test/node-api/test_instance_data/test.js              | 9 ++-------
 test/node-api/test_uv_threadpool_size/node-options.js | 7 ++-----
 3 files changed, 5 insertions(+), 13 deletions(-)
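
Note on the change: the three tests below stop hand-quoting the addon path
and escaping backslashes manually, and instead embed the path with
JSON.stringify(), which emits a quoted and escaped string literal. The sketch
here illustrates why that matters for Windows-style paths. The helper names
(naiveRequireSource, safeRequireSource, evaluateArgument) and the sample path
are hypothetical and are not part of the test suite.

```js
'use strict';

// Hypothetical helper: the old approach, wrapping the path in quotes by hand.
function naiveRequireSource(modulePath) {
  return `require('${modulePath}')`;
}

// Hypothetical helper: the approach taken by this patch. JSON.stringify()
// produces a double-quoted literal with backslashes and quotes escaped.
function safeRequireSource(modulePath) {
  return `require(${JSON.stringify(modulePath)})`;
}

// Evaluates the generated source with a stub `require` that returns its
// argument, so we can see which string the generated code would really pass.
function evaluateArgument(source) {
  return new Function('require', `return ${source};`)((id) => id);
}

// Illustrative Windows-style path; `\n` and `\b` are escape sequences when
// the path is pasted unescaped into a string literal.
const windowsPath = 'C:\\node\\build\\Release\\binding.node';

const naiveRoundTrips =
  evaluateArgument(naiveRequireSource(windowsPath)) === windowsPath;
const safeRoundTrips =
  evaluateArgument(safeRequireSource(windowsPath)) === windowsPath;

console.log({ naiveRoundTrips, safeRoundTrips });
```

With the hand-quoted form the backslashes are reinterpreted as escape
sequences, so the evaluated string no longer equals the original path; the
JSON.stringify() form round-trips without platform-specific replace() calls.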

diff --git a/test/addons/repl-domain-abort/test.js b/test/addons/repl-domain-abort/test.js
index 8d148432c31c21..0108de0d9ee67d 100644
--- a/test/addons/repl-domain-abort/test.js
+++ b/test/addons/repl-domain-abort/test.js
@@ -38,7 +38,7 @@ process.on('exit', () => {
 
 const lines = [
   // This line shouldn't cause an assertion error.
-  `require('${buildPath}')` +
+  `require(${JSON.stringify(buildPath)})` +
   // Log output to double check callback ran.
   '.method(function(v1, v2) {' +
   'console.log(\'cb_ran\'); return v1 === true && v2 === false; });',
diff --git a/test/node-api/test_instance_data/test.js b/test/node-api/test_instance_data/test.js
index 7f8e785ac43546..1758bea28931d3 100644
--- a/test/node-api/test_instance_data/test.js
+++ b/test/node-api/test_instance_data/test.js
@@ -38,13 +38,8 @@ if (module !== require.main) {
   function testProcessExit(addonName) {
     // Make sure that process exit is clean when the instance data has
     // references to JS objects.
-    const path = require
-      .resolve(`./build/${common.buildType}/${addonName}`)
-      // Replace any backslashes with double backslashes because they'll be re-
-      // interpreted back to single backslashes in the command line argument
-      // to the child process. Windows needs this.
-      .replace(/\\/g, '\\\\');
-    const child = spawnSync(process.execPath, ['-e', `require('${path}');`]);
+    const path = require.resolve(`./build/${common.buildType}/${addonName}`);
+    const child = spawnSync(process.execPath, ['-e', `require(${JSON.stringify(path)});`]);
     assert.strictEqual(child.signal, null);
     assert.strictEqual(child.status, 0);
     assert.strictEqual(child.stderr.toString(), 'addon_free');
diff --git a/test/node-api/test_uv_threadpool_size/node-options.js b/test/node-api/test_uv_threadpool_size/node-options.js
index 40610274acce4c..c558addd1bccd5 100644
--- a/test/node-api/test_uv_threadpool_size/node-options.js
+++ b/test/node-api/test_uv_threadpool_size/node-options.js
@@ -12,12 +12,9 @@ if (process.config.variables.node_without_node_options) {
 const uvThreadPoolPath = '../../fixtures/dotenv/uv-threadpool.env';
 
 // Should update UV_THREADPOOL_SIZE
-let filePath = path.join(__dirname, `./build/${common.buildType}/test_uv_threadpool_size`);
-if (common.isWindows) {
-  filePath = filePath.replaceAll('\\', '\\\\');
-}
+const filePath = path.join(__dirname, `./build/${common.buildType}/test_uv_threadpool_size`);
 const code = `
-   const { test } = require('${filePath}');
+   const { test } = require(${JSON.stringify(filePath)});
    const size = parseInt(process.env.UV_THREADPOOL_SIZE, 10);
    require('assert').strictEqual(size, 4);
    test(size);

From 274d0b40620ddd73cbbe278e0a3c678592601742 Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Tue, 22 Oct 2024 14:55:44 +0200
Subject: [PATCH 044/216] tools: update lint-md-dependencies

- @rollup/plugin-commonjs@28.0.1
- @rollup/plugin-node-resolve@15.3.0
- remark-preset-lint-node@5.1.2
- rollup@4.24.0

PR-URL: https://github.com/nodejs/node/pull/55470
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
---
 tools/lint-md/lint-md.mjs       | 2903 ++++++++++++++-----------------
 tools/lint-md/package-lock.json |  833 ++-------
 tools/lint-md/package.json      |    8 +-
 3 files changed, 1519 insertions(+), 2225 deletions(-)

diff --git a/tools/lint-md/lint-md.mjs b/tools/lint-md/lint-md.mjs
index 5a6ab56740ffb0..81c72856249fae 100644
--- a/tools/lint-md/lint-md.mjs
+++ b/tools/lint-md/lint-md.mjs
@@ -1,7 +1,7 @@
 import fs from 'fs';
-import path$2 from 'path';
+import path$1 from 'path';
 import { pathToFileURL } from 'url';
-import path$1 from 'node:path';
+import minpath from 'node:path';
 import process$1 from 'node:process';
 import { fileURLToPath } from 'node:url';
 import fs$1 from 'node:fs';
@@ -14,96 +14,103 @@ function bail(error) {
   }
 }
 
-var commonjsGlobal = typeof globalThis !== 'undefined' ? globalThis : typeof window !== 'undefined' ? window : typeof global !== 'undefined' ? global : typeof self !== 'undefined' ? self : {};
-
 function getDefaultExportFromCjs (x) {
 	return x && x.__esModule && Object.prototype.hasOwnProperty.call(x, 'default') ? x['default'] : x;
 }
 
-var hasOwn = Object.prototype.hasOwnProperty;
-var toStr = Object.prototype.toString;
-var defineProperty = Object.defineProperty;
-var gOPD = Object.getOwnPropertyDescriptor;
-var isArray = function isArray(arr) {
-	if (typeof Array.isArray === 'function') {
-		return Array.isArray(arr);
-	}
-	return toStr.call(arr) === '[object Array]';
-};
-var isPlainObject$1 = function isPlainObject(obj) {
-	if (!obj || toStr.call(obj) !== '[object Object]') {
-		return false;
-	}
-	var hasOwnConstructor = hasOwn.call(obj, 'constructor');
-	var hasIsPrototypeOf = obj.constructor && obj.constructor.prototype && hasOwn.call(obj.constructor.prototype, 'isPrototypeOf');
-	if (obj.constructor && !hasOwnConstructor && !hasIsPrototypeOf) {
-		return false;
-	}
-	var key;
-	for (key in obj) {  }
-	return typeof key === 'undefined' || hasOwn.call(obj, key);
-};
-var setProperty = function setProperty(target, options) {
-	if (defineProperty && options.name === '__proto__') {
-		defineProperty(target, options.name, {
-			enumerable: true,
-			configurable: true,
-			value: options.newValue,
-			writable: true
-		});
-	} else {
-		target[options.name] = options.newValue;
-	}
-};
-var getProperty = function getProperty(obj, name) {
-	if (name === '__proto__') {
-		if (!hasOwn.call(obj, name)) {
-			return void 0;
-		} else if (gOPD) {
-			return gOPD(obj, name).value;
+var extend$2;
+var hasRequiredExtend;
+function requireExtend () {
+	if (hasRequiredExtend) return extend$2;
+	hasRequiredExtend = 1;
+	var hasOwn = Object.prototype.hasOwnProperty;
+	var toStr = Object.prototype.toString;
+	var defineProperty = Object.defineProperty;
+	var gOPD = Object.getOwnPropertyDescriptor;
+	var isArray = function isArray(arr) {
+		if (typeof Array.isArray === 'function') {
+			return Array.isArray(arr);
 		}
-	}
-	return obj[name];
-};
-var extend$1 = function extend() {
-	var options, name, src, copy, copyIsArray, clone;
-	var target = arguments[0];
-	var i = 1;
-	var length = arguments.length;
-	var deep = false;
-	if (typeof target === 'boolean') {
-		deep = target;
-		target = arguments[1] || {};
-		i = 2;
-	}
-	if (target == null || (typeof target !== 'object' && typeof target !== 'function')) {
-		target = {};
-	}
-	for (; i < length; ++i) {
-		options = arguments[i];
-		if (options != null) {
-			for (name in options) {
-				src = getProperty(target, name);
-				copy = getProperty(options, name);
-				if (target !== copy) {
-					if (deep && copy && (isPlainObject$1(copy) || (copyIsArray = isArray(copy)))) {
-						if (copyIsArray) {
-							copyIsArray = false;
-							clone = src && isArray(src) ? src : [];
-						} else {
-							clone = src && isPlainObject$1(src) ? src : {};
+		return toStr.call(arr) === '[object Array]';
+	};
+	var isPlainObject = function isPlainObject(obj) {
+		if (!obj || toStr.call(obj) !== '[object Object]') {
+			return false;
+		}
+		var hasOwnConstructor = hasOwn.call(obj, 'constructor');
+		var hasIsPrototypeOf = obj.constructor && obj.constructor.prototype && hasOwn.call(obj.constructor.prototype, 'isPrototypeOf');
+		if (obj.constructor && !hasOwnConstructor && !hasIsPrototypeOf) {
+			return false;
+		}
+		var key;
+		for (key in obj) {  }
+		return typeof key === 'undefined' || hasOwn.call(obj, key);
+	};
+	var setProperty = function setProperty(target, options) {
+		if (defineProperty && options.name === '__proto__') {
+			defineProperty(target, options.name, {
+				enumerable: true,
+				configurable: true,
+				value: options.newValue,
+				writable: true
+			});
+		} else {
+			target[options.name] = options.newValue;
+		}
+	};
+	var getProperty = function getProperty(obj, name) {
+		if (name === '__proto__') {
+			if (!hasOwn.call(obj, name)) {
+				return void 0;
+			} else if (gOPD) {
+				return gOPD(obj, name).value;
+			}
+		}
+		return obj[name];
+	};
+	extend$2 = function extend() {
+		var options, name, src, copy, copyIsArray, clone;
+		var target = arguments[0];
+		var i = 1;
+		var length = arguments.length;
+		var deep = false;
+		if (typeof target === 'boolean') {
+			deep = target;
+			target = arguments[1] || {};
+			i = 2;
+		}
+		if (target == null || (typeof target !== 'object' && typeof target !== 'function')) {
+			target = {};
+		}
+		for (; i < length; ++i) {
+			options = arguments[i];
+			if (options != null) {
+				for (name in options) {
+					src = getProperty(target, name);
+					copy = getProperty(options, name);
+					if (target !== copy) {
+						if (deep && copy && (isPlainObject(copy) || (copyIsArray = isArray(copy)))) {
+							if (copyIsArray) {
+								copyIsArray = false;
+								clone = src && isArray(src) ? src : [];
+							} else {
+								clone = src && isPlainObject(src) ? src : {};
+							}
+							setProperty(target, { name: name, newValue: extend(deep, clone, copy) });
+						} else if (typeof copy !== 'undefined') {
+							setProperty(target, { name: name, newValue: copy });
 						}
-						setProperty(target, { name: name, newValue: extend(deep, clone, copy) });
-					} else if (typeof copy !== 'undefined') {
-						setProperty(target, { name: name, newValue: copy });
 					}
 				}
 			}
 		}
-	}
-	return target;
-};
-var extend$2 = getDefaultExportFromCjs(extend$1);
+		return target;
+	};
+	return extend$2;
+}
+
+var extendExports = requireExtend();
+var extend$1 = /*@__PURE__*/getDefaultExportFromCjs(extendExports);
 
 function ok$1() {}
 
@@ -348,7 +355,7 @@ class VFile {
     } else {
       options = value;
     }
-    this.cwd = process$1.cwd();
+    this.cwd = 'cwd' in options ? '' : process$1.cwd();
     this.data = {};
     this.history = [];
     this.messages = [];
@@ -358,39 +365,45 @@ class VFile {
     this.stored;
     let index = -1;
     while (++index < order.length) {
-      const prop = order[index];
+      const field = order[index];
       if (
-        prop in options &&
-        options[prop] !== undefined &&
-        options[prop] !== null
+        field in options &&
+        options[field] !== undefined &&
+        options[field] !== null
       ) {
-        this[prop] = prop === 'history' ? [...options[prop]] : options[prop];
+        this[field] = field === 'history' ? [...options[field]] : options[field];
       }
     }
-    let prop;
-    for (prop in options) {
-      if (!order.includes(prop)) {
-        this[prop] = options[prop];
+    let field;
+    for (field in options) {
+      if (!order.includes(field)) {
+        this[field] = options[field];
       }
     }
   }
   get basename() {
-    return typeof this.path === 'string' ? path$1.basename(this.path) : undefined
+    return typeof this.path === 'string'
+      ? minpath.basename(this.path)
+      : undefined
   }
   set basename(basename) {
     assertNonEmpty(basename, 'basename');
     assertPart(basename, 'basename');
-    this.path = path$1.join(this.dirname || '', basename);
+    this.path = minpath.join(this.dirname || '', basename);
   }
   get dirname() {
-    return typeof this.path === 'string' ? path$1.dirname(this.path) : undefined
+    return typeof this.path === 'string'
+      ? minpath.dirname(this.path)
+      : undefined
   }
   set dirname(dirname) {
     assertPath(this.basename, 'dirname');
-    this.path = path$1.join(dirname || '', this.basename);
+    this.path = minpath.join(dirname || '', this.basename);
   }
   get extname() {
-    return typeof this.path === 'string' ? path$1.extname(this.path) : undefined
+    return typeof this.path === 'string'
+      ? minpath.extname(this.path)
+      : undefined
   }
   set extname(extname) {
     assertPart(extname, 'extname');
@@ -403,7 +416,7 @@ class VFile {
         throw new Error('`extname` cannot contain multiple dots')
       }
     }
-    this.path = path$1.join(this.dirname, this.stem + (extname || ''));
+    this.path = minpath.join(this.dirname, this.stem + (extname || ''));
   }
   get path() {
     return this.history[this.history.length - 1]
@@ -419,13 +432,13 @@ class VFile {
   }
   get stem() {
     return typeof this.path === 'string'
-      ? path$1.basename(this.path, this.extname)
+      ? minpath.basename(this.path, this.extname)
       : undefined
   }
   set stem(stem) {
     assertNonEmpty(stem, 'stem');
     assertPart(stem, 'stem');
-    this.path = path$1.join(this.dirname || '', stem + (this.extname || ''));
+    this.path = minpath.join(this.dirname || '', stem + (this.extname || ''));
   }
   fail(causeOrReason, optionsOrParentOrPlace, origin) {
     const message = this.message(causeOrReason, optionsOrParentOrPlace, origin);
@@ -463,9 +476,9 @@ class VFile {
   }
 }
 function assertPart(part, name) {
-  if (part && part.includes(path$1.sep)) {
+  if (part && part.includes(minpath.sep)) {
     throw new Error(
-      '`' + name + '` cannot be a path: did not expect `' + path$1.sep + '`'
+      '`' + name + '` cannot be a path: did not expect `' + minpath.sep + '`'
     )
   }
 }
@@ -531,7 +544,7 @@ class Processor extends CallableInstance {
       const attacher = this.attachers[index];
       destination.use(...attacher);
     }
-    destination.data(extend$2(true, {}, this.namespace));
+    destination.data(extend$1(true, {}, this.namespace));
     return destination
   }
   data(key, value) {
@@ -719,7 +732,7 @@ class Processor extends CallableInstance {
       }
       addList(result.plugins);
       if (result.settings) {
-        namespace.settings = extend$2(true, namespace.settings, result.settings);
+        namespace.settings = extend$1(true, namespace.settings, result.settings);
       }
     }
     function addList(plugins) {
@@ -749,7 +762,7 @@ class Processor extends CallableInstance {
         let [primary, ...rest] = parameters;
         const currentPrimary = attachers[entryIndex][1];
         if (isPlainObject(currentPrimary) && isPlainObject(primary)) {
-          primary = extend$2(true, currentPrimary, primary);
+          primary = extend$1(true, currentPrimary, primary);
         }
         attachers[entryIndex] = [plugin, primary, ...rest];
       }
@@ -6726,7 +6739,7 @@ var defaultConstructs = /*#__PURE__*/Object.freeze({
   text: text$2
 });
 
-function parse$2(options) {
+function parse$1(options) {
   const settings = options || {};
   const constructs =
     combineExtensions([defaultConstructs, ...(settings.extensions || [])]);
@@ -6755,7 +6768,7 @@ function postprocess(events) {
   return events
 }
 
-const search$1 = /[\0\t\n\r]/g;
+const search = /[\0\t\n\r]/g;
 function preprocess() {
   let column = 1;
   let buffer = '';
@@ -6783,8 +6796,8 @@ function preprocess() {
       start = undefined;
     }
     while (startPosition < value.length) {
-      search$1.lastIndex = startPosition;
-      match = search$1.exec(value);
+      search.lastIndex = startPosition;
+      match = search.exec(value);
       endPosition =
         match && match.index !== undefined ? match.index : value.length;
       code = value.charCodeAt(endPosition);
@@ -6862,7 +6875,7 @@ function fromMarkdown(value, encoding, options) {
     options = encoding;
     encoding = undefined;
   }
-  return compiler(options)(postprocess(parse$2(options).document().write(preprocess()(value, encoding, true))));
+  return compiler(options)(postprocess(parse$1(options).document().write(preprocess()(value, encoding, true))));
 }
 function compiler(options) {
   const config = {
@@ -9355,7 +9368,7 @@ function transformGfmAutolinkLiterals(tree) {
     tree,
     [
       [/(https?:\/\/|www(?=\.))([-.\w]+)([^ \t\r\n]*)/gi, findUrl],
-      [/([-.\w+]+)@([-\w]+(?:\.[-\w]+)+)/g, findEmail]
+      [/(?<=^|\s|\p{P}|\p{S})([-.\w+]+)@([-\w]+(?:\.[-\w]+)+)/gu, findEmail]
     ],
     {ignore: ['link', 'linkReference']}
   );
@@ -11301,56 +11314,64 @@ function isNode(value) {
   return Boolean(value && typeof value === 'object' && 'type' in value)
 }
 
-function parse$1(value) {
+function parse(value) {
   const input = String(value || '').trim();
   return input ? input.split(/[ \t\n\r\f]+/g) : []
 }
 
-const search = /\r?\n|\r/g;
 function location(file) {
   const value = String(file);
   const indices = [];
-  search.lastIndex = 0;
-  while (search.test(value)) {
-    indices.push(search.lastIndex);
-  }
-  indices.push(value.length + 1);
-  return {toPoint, toOffset}
+  return {toOffset, toPoint}
   function toPoint(offset) {
-    let index = -1;
-    if (
-      typeof offset === 'number' &&
-      offset > -1 &&
-      offset < indices[indices.length - 1]
-    ) {
-      while (++index < indices.length) {
-        if (indices[index] > offset) {
+    if (typeof offset === 'number' && offset > -1 && offset <= value.length) {
+      let index = 0;
+      while (true) {
+        let end = indices[index];
+        if (end === undefined) {
+          const eol = next(value, indices[index - 1]);
+          end = eol === -1 ? value.length + 1 : eol + 1;
+          indices[index] = end;
+        }
+        if (end > offset) {
           return {
             line: index + 1,
             column: offset - (index > 0 ? indices[index - 1] : 0) + 1,
             offset
           }
         }
+        index++;
       }
     }
   }
   function toOffset(point) {
-    const line = point && point.line;
-    const column = point && point.column;
     if (
-      typeof line === 'number' &&
-      typeof column === 'number' &&
-      !Number.isNaN(line) &&
-      !Number.isNaN(column) &&
-      line - 1 in indices
+      point &&
+      typeof point.line === 'number' &&
+      typeof point.column === 'number' &&
+      !Number.isNaN(point.line) &&
+      !Number.isNaN(point.column)
     ) {
-      const offset = (indices[line - 2] || 0) + column - 1 || 0;
-      if (offset > -1 && offset < indices[indices.length - 1]) {
-        return offset
+      while (indices.length < point.line) {
+        const from = indices[indices.length - 1];
+        const eol = next(value, from);
+        const end = eol === -1 ? value.length + 1 : eol + 1;
+        if (from === end) break
+        indices.push(end);
       }
+      const offset =
+        (point.line > 1 ? indices[point.line - 2] : 0) + point.column - 1;
+      if (offset < indices[point.line - 1]) return offset
     }
   }
 }
+function next(value, from) {
+  const cr = value.indexOf('\r', from);
+  const lf = value.indexOf('\n', from);
+  if (lf === -1) return cr
+  if (cr === -1 || cr + 1 === lf) return lf
+  return cr < lf ? cr : lf
+}
 
 const own = {}.hasOwnProperty;
 function messageControl(tree, options) {
@@ -11386,7 +11407,7 @@ function messageControl(tree, options) {
     if (!point || !mark || mark.name !== name) {
       return
     }
-    const ruleIds = parse$1(mark.attributes);
+    const ruleIds = parse(mark.attributes);
     const verb = ruleIds.shift();
     const fn =
       verb === 'enable'
@@ -11573,7 +11594,7 @@ function lintRule$1(meta, rule) {
   Object.defineProperty(plugin, 'name', {value: id});
   return plugin
   function plugin(config) {
-    const [severity, options] = coerce$2(ruleId, config);
+    const [severity, options] = coerce$1(ruleId, config);
     const fatal = severity === 2;
     if (!severity) return
     return function (tree, file, next) {
@@ -11593,7 +11614,7 @@ function lintRule$1(meta, rule) {
     }
   }
 }
-function coerce$2(name, config) {
+function coerce$1(name, config) {
   if (!Array.isArray(config)) {
     return [1, config]
   }
@@ -11858,334 +11879,342 @@ function commonjsRequire(path) {
 	throw new Error('Could not dynamically require "' + path + '". Please configure the dynamicRequireTargets or/and ignoreDynamicRequires option of @rollup/plugin-commonjs appropriately for this require call to work.');
 }
 
-var pluralize$1 = {exports: {}};
+var pluralize$2 = {exports: {}};
 
-(function (module, exports) {
-	(function (root, pluralize) {
-	  if (typeof commonjsRequire === 'function' && 'object' === 'object' && 'object' === 'object') {
-	    module.exports = pluralize();
-	  } else {
-	    root.pluralize = pluralize();
-	  }
-	})(commonjsGlobal, function () {
-	  var pluralRules = [];
-	  var singularRules = [];
-	  var uncountables = {};
-	  var irregularPlurals = {};
-	  var irregularSingles = {};
-	  function sanitizeRule (rule) {
-	    if (typeof rule === 'string') {
-	      return new RegExp('^' + rule + '$', 'i');
-	    }
-	    return rule;
-	  }
-	  function restoreCase (word, token) {
-	    if (word === token) return token;
-	    if (word === word.toLowerCase()) return token.toLowerCase();
-	    if (word === word.toUpperCase()) return token.toUpperCase();
-	    if (word[0] === word[0].toUpperCase()) {
-	      return token.charAt(0).toUpperCase() + token.substr(1).toLowerCase();
-	    }
-	    return token.toLowerCase();
-	  }
-	  function interpolate (str, args) {
-	    return str.replace(/\$(\d{1,2})/g, function (match, index) {
-	      return args[index] || '';
-	    });
-	  }
-	  function replace (word, rule) {
-	    return word.replace(rule[0], function (match, index) {
-	      var result = interpolate(rule[1], arguments);
-	      if (match === '') {
-	        return restoreCase(word[index - 1], result);
-	      }
-	      return restoreCase(match, result);
-	    });
-	  }
-	  function sanitizeWord (token, word, rules) {
-	    if (!token.length || uncountables.hasOwnProperty(token)) {
-	      return word;
-	    }
-	    var len = rules.length;
-	    while (len--) {
-	      var rule = rules[len];
-	      if (rule[0].test(word)) return replace(word, rule);
-	    }
-	    return word;
-	  }
-	  function replaceWord (replaceMap, keepMap, rules) {
-	    return function (word) {
-	      var token = word.toLowerCase();
-	      if (keepMap.hasOwnProperty(token)) {
-	        return restoreCase(word, token);
-	      }
-	      if (replaceMap.hasOwnProperty(token)) {
-	        return restoreCase(word, replaceMap[token]);
-	      }
-	      return sanitizeWord(token, word, rules);
-	    };
-	  }
-	  function checkWord (replaceMap, keepMap, rules, bool) {
-	    return function (word) {
-	      var token = word.toLowerCase();
-	      if (keepMap.hasOwnProperty(token)) return true;
-	      if (replaceMap.hasOwnProperty(token)) return false;
-	      return sanitizeWord(token, token, rules) === token;
-	    };
-	  }
-	  function pluralize (word, count, inclusive) {
-	    var pluralized = count === 1
-	      ? pluralize.singular(word) : pluralize.plural(word);
-	    return (inclusive ? count + ' ' : '') + pluralized;
-	  }
-	  pluralize.plural = replaceWord(
-	    irregularSingles, irregularPlurals, pluralRules
-	  );
-	  pluralize.isPlural = checkWord(
-	    irregularSingles, irregularPlurals, pluralRules
-	  );
-	  pluralize.singular = replaceWord(
-	    irregularPlurals, irregularSingles, singularRules
-	  );
-	  pluralize.isSingular = checkWord(
-	    irregularPlurals, irregularSingles, singularRules
-	  );
-	  pluralize.addPluralRule = function (rule, replacement) {
-	    pluralRules.push([sanitizeRule(rule), replacement]);
-	  };
-	  pluralize.addSingularRule = function (rule, replacement) {
-	    singularRules.push([sanitizeRule(rule), replacement]);
-	  };
-	  pluralize.addUncountableRule = function (word) {
-	    if (typeof word === 'string') {
-	      uncountables[word.toLowerCase()] = true;
-	      return;
-	    }
-	    pluralize.addPluralRule(word, '$0');
-	    pluralize.addSingularRule(word, '$0');
-	  };
-	  pluralize.addIrregularRule = function (single, plural) {
-	    plural = plural.toLowerCase();
-	    single = single.toLowerCase();
-	    irregularSingles[single] = plural;
-	    irregularPlurals[plural] = single;
-	  };
-	  [
-	    ['I', 'we'],
-	    ['me', 'us'],
-	    ['he', 'they'],
-	    ['she', 'they'],
-	    ['them', 'them'],
-	    ['myself', 'ourselves'],
-	    ['yourself', 'yourselves'],
-	    ['itself', 'themselves'],
-	    ['herself', 'themselves'],
-	    ['himself', 'themselves'],
-	    ['themself', 'themselves'],
-	    ['is', 'are'],
-	    ['was', 'were'],
-	    ['has', 'have'],
-	    ['this', 'these'],
-	    ['that', 'those'],
-	    ['echo', 'echoes'],
-	    ['dingo', 'dingoes'],
-	    ['volcano', 'volcanoes'],
-	    ['tornado', 'tornadoes'],
-	    ['torpedo', 'torpedoes'],
-	    ['genus', 'genera'],
-	    ['viscus', 'viscera'],
-	    ['stigma', 'stigmata'],
-	    ['stoma', 'stomata'],
-	    ['dogma', 'dogmata'],
-	    ['lemma', 'lemmata'],
-	    ['schema', 'schemata'],
-	    ['anathema', 'anathemata'],
-	    ['ox', 'oxen'],
-	    ['axe', 'axes'],
-	    ['die', 'dice'],
-	    ['yes', 'yeses'],
-	    ['foot', 'feet'],
-	    ['eave', 'eaves'],
-	    ['goose', 'geese'],
-	    ['tooth', 'teeth'],
-	    ['quiz', 'quizzes'],
-	    ['human', 'humans'],
-	    ['proof', 'proofs'],
-	    ['carve', 'carves'],
-	    ['valve', 'valves'],
-	    ['looey', 'looies'],
-	    ['thief', 'thieves'],
-	    ['groove', 'grooves'],
-	    ['pickaxe', 'pickaxes'],
-	    ['passerby', 'passersby']
-	  ].forEach(function (rule) {
-	    return pluralize.addIrregularRule(rule[0], rule[1]);
-	  });
-	  [
-	    [/s?$/i, 's'],
-	    [/[^\u0000-\u007F]$/i, '$0'],
-	    [/([^aeiou]ese)$/i, '$1'],
-	    [/(ax|test)is$/i, '$1es'],
-	    [/(alias|[^aou]us|t[lm]as|gas|ris)$/i, '$1es'],
-	    [/(e[mn]u)s?$/i, '$1s'],
-	    [/([^l]ias|[aeiou]las|[ejzr]as|[iu]am)$/i, '$1'],
-	    [/(alumn|syllab|vir|radi|nucle|fung|cact|stimul|termin|bacill|foc|uter|loc|strat)(?:us|i)$/i, '$1i'],
-	    [/(alumn|alg|vertebr)(?:a|ae)$/i, '$1ae'],
-	    [/(seraph|cherub)(?:im)?$/i, '$1im'],
-	    [/(her|at|gr)o$/i, '$1oes'],
-	    [/(agend|addend|millenni|dat|extrem|bacteri|desiderat|strat|candelabr|errat|ov|symposi|curricul|automat|quor)(?:a|um)$/i, '$1a'],
-	    [/(apheli|hyperbat|periheli|asyndet|noumen|phenomen|criteri|organ|prolegomen|hedr|automat)(?:a|on)$/i, '$1a'],
-	    [/sis$/i, 'ses'],
-	    [/(?:(kni|wi|li)fe|(ar|l|ea|eo|oa|hoo)f)$/i, '$1$2ves'],
-	    [/([^aeiouy]|qu)y$/i, '$1ies'],
-	    [/([^ch][ieo][ln])ey$/i, '$1ies'],
-	    [/(x|ch|ss|sh|zz)$/i, '$1es'],
-	    [/(matr|cod|mur|sil|vert|ind|append)(?:ix|ex)$/i, '$1ices'],
-	    [/\b((?:tit)?m|l)(?:ice|ouse)$/i, '$1ice'],
-	    [/(pe)(?:rson|ople)$/i, '$1ople'],
-	    [/(child)(?:ren)?$/i, '$1ren'],
-	    [/eaux$/i, '$0'],
-	    [/m[ae]n$/i, 'men'],
-	    ['thou', 'you']
-	  ].forEach(function (rule) {
-	    return pluralize.addPluralRule(rule[0], rule[1]);
-	  });
-	  [
-	    [/s$/i, ''],
-	    [/(ss)$/i, '$1'],
-	    [/(wi|kni|(?:after|half|high|low|mid|non|night|[^\w]|^)li)ves$/i, '$1fe'],
-	    [/(ar|(?:wo|[ae])l|[eo][ao])ves$/i, '$1f'],
-	    [/ies$/i, 'y'],
-	    [/\b([pl]|zomb|(?:neck|cross)?t|coll|faer|food|gen|goon|group|lass|talk|goal|cut)ies$/i, '$1ie'],
-	    [/\b(mon|smil)ies$/i, '$1ey'],
-	    [/\b((?:tit)?m|l)ice$/i, '$1ouse'],
-	    [/(seraph|cherub)im$/i, '$1'],
-	    [/(x|ch|ss|sh|zz|tto|go|cho|alias|[^aou]us|t[lm]as|gas|(?:her|at|gr)o|[aeiou]ris)(?:es)?$/i, '$1'],
-	    [/(analy|diagno|parenthe|progno|synop|the|empha|cri|ne)(?:sis|ses)$/i, '$1sis'],
-	    [/(movie|twelve|abuse|e[mn]u)s$/i, '$1'],
-	    [/(test)(?:is|es)$/i, '$1is'],
-	    [/(alumn|syllab|vir|radi|nucle|fung|cact|stimul|termin|bacill|foc|uter|loc|strat)(?:us|i)$/i, '$1us'],
-	    [/(agend|addend|millenni|dat|extrem|bacteri|desiderat|strat|candelabr|errat|ov|symposi|curricul|quor)a$/i, '$1um'],
-	    [/(apheli|hyperbat|periheli|asyndet|noumen|phenomen|criteri|organ|prolegomen|hedr|automat)a$/i, '$1on'],
-	    [/(alumn|alg|vertebr)ae$/i, '$1a'],
-	    [/(cod|mur|sil|vert|ind)ices$/i, '$1ex'],
-	    [/(matr|append)ices$/i, '$1ix'],
-	    [/(pe)(rson|ople)$/i, '$1rson'],
-	    [/(child)ren$/i, '$1'],
-	    [/(eau)x?$/i, '$1'],
-	    [/men$/i, 'man']
-	  ].forEach(function (rule) {
-	    return pluralize.addSingularRule(rule[0], rule[1]);
-	  });
-	  [
-	    'adulthood',
-	    'advice',
-	    'agenda',
-	    'aid',
-	    'aircraft',
-	    'alcohol',
-	    'ammo',
-	    'analytics',
-	    'anime',
-	    'athletics',
-	    'audio',
-	    'bison',
-	    'blood',
-	    'bream',
-	    'buffalo',
-	    'butter',
-	    'carp',
-	    'cash',
-	    'chassis',
-	    'chess',
-	    'clothing',
-	    'cod',
-	    'commerce',
-	    'cooperation',
-	    'corps',
-	    'debris',
-	    'diabetes',
-	    'digestion',
-	    'elk',
-	    'energy',
-	    'equipment',
-	    'excretion',
-	    'expertise',
-	    'firmware',
-	    'flounder',
-	    'fun',
-	    'gallows',
-	    'garbage',
-	    'graffiti',
-	    'hardware',
-	    'headquarters',
-	    'health',
-	    'herpes',
-	    'highjinks',
-	    'homework',
-	    'housework',
-	    'information',
-	    'jeans',
-	    'justice',
-	    'kudos',
-	    'labour',
-	    'literature',
-	    'machinery',
-	    'mackerel',
-	    'mail',
-	    'media',
-	    'mews',
-	    'moose',
-	    'music',
-	    'mud',
-	    'manga',
-	    'news',
-	    'only',
-	    'personnel',
-	    'pike',
-	    'plankton',
-	    'pliers',
-	    'police',
-	    'pollution',
-	    'premises',
-	    'rain',
-	    'research',
-	    'rice',
-	    'salmon',
-	    'scissors',
-	    'series',
-	    'sewage',
-	    'shambles',
-	    'shrimp',
-	    'software',
-	    'species',
-	    'staff',
-	    'swine',
-	    'tennis',
-	    'traffic',
-	    'transportation',
-	    'trout',
-	    'tuna',
-	    'wealth',
-	    'welfare',
-	    'whiting',
-	    'wildebeest',
-	    'wildlife',
-	    'you',
-	    /pok[eé]mon$/i,
-	    /[^aeiou]ese$/i,
-	    /deer$/i,
-	    /fish$/i,
-	    /measles$/i,
-	    /o[iu]s$/i,
-	    /pox$/i,
-	    /sheep$/i
-	  ].forEach(pluralize.addUncountableRule);
-	  return pluralize;
-	});
-} (pluralize$1));
-var pluralizeExports = pluralize$1.exports;
-var pluralize = getDefaultExportFromCjs(pluralizeExports);
+var pluralize$1 = pluralize$2.exports;
+var hasRequiredPluralize;
+function requirePluralize () {
+	if (hasRequiredPluralize) return pluralize$2.exports;
+	hasRequiredPluralize = 1;
+	(function (module, exports) {
+		(function (root, pluralize) {
+		  if (typeof commonjsRequire === 'function' && 'object' === 'object' && 'object' === 'object') {
+		    module.exports = pluralize();
+		  } else {
+		    root.pluralize = pluralize();
+		  }
+		})(pluralize$1, function () {
+		  var pluralRules = [];
+		  var singularRules = [];
+		  var uncountables = {};
+		  var irregularPlurals = {};
+		  var irregularSingles = {};
+		  function sanitizeRule (rule) {
+		    if (typeof rule === 'string') {
+		      return new RegExp('^' + rule + '$', 'i');
+		    }
+		    return rule;
+		  }
+		  function restoreCase (word, token) {
+		    if (word === token) return token;
+		    if (word === word.toLowerCase()) return token.toLowerCase();
+		    if (word === word.toUpperCase()) return token.toUpperCase();
+		    if (word[0] === word[0].toUpperCase()) {
+		      return token.charAt(0).toUpperCase() + token.substr(1).toLowerCase();
+		    }
+		    return token.toLowerCase();
+		  }
+		  function interpolate (str, args) {
+		    return str.replace(/\$(\d{1,2})/g, function (match, index) {
+		      return args[index] || '';
+		    });
+		  }
+		  function replace (word, rule) {
+		    return word.replace(rule[0], function (match, index) {
+		      var result = interpolate(rule[1], arguments);
+		      if (match === '') {
+		        return restoreCase(word[index - 1], result);
+		      }
+		      return restoreCase(match, result);
+		    });
+		  }
+		  function sanitizeWord (token, word, rules) {
+		    if (!token.length || uncountables.hasOwnProperty(token)) {
+		      return word;
+		    }
+		    var len = rules.length;
+		    while (len--) {
+		      var rule = rules[len];
+		      if (rule[0].test(word)) return replace(word, rule);
+		    }
+		    return word;
+		  }
+		  function replaceWord (replaceMap, keepMap, rules) {
+		    return function (word) {
+		      var token = word.toLowerCase();
+		      if (keepMap.hasOwnProperty(token)) {
+		        return restoreCase(word, token);
+		      }
+		      if (replaceMap.hasOwnProperty(token)) {
+		        return restoreCase(word, replaceMap[token]);
+		      }
+		      return sanitizeWord(token, word, rules);
+		    };
+		  }
+		  function checkWord (replaceMap, keepMap, rules, bool) {
+		    return function (word) {
+		      var token = word.toLowerCase();
+		      if (keepMap.hasOwnProperty(token)) return true;
+		      if (replaceMap.hasOwnProperty(token)) return false;
+		      return sanitizeWord(token, token, rules) === token;
+		    };
+		  }
+		  function pluralize (word, count, inclusive) {
+		    var pluralized = count === 1
+		      ? pluralize.singular(word) : pluralize.plural(word);
+		    return (inclusive ? count + ' ' : '') + pluralized;
+		  }
+		  pluralize.plural = replaceWord(
+		    irregularSingles, irregularPlurals, pluralRules
+		  );
+		  pluralize.isPlural = checkWord(
+		    irregularSingles, irregularPlurals, pluralRules
+		  );
+		  pluralize.singular = replaceWord(
+		    irregularPlurals, irregularSingles, singularRules
+		  );
+		  pluralize.isSingular = checkWord(
+		    irregularPlurals, irregularSingles, singularRules
+		  );
+		  pluralize.addPluralRule = function (rule, replacement) {
+		    pluralRules.push([sanitizeRule(rule), replacement]);
+		  };
+		  pluralize.addSingularRule = function (rule, replacement) {
+		    singularRules.push([sanitizeRule(rule), replacement]);
+		  };
+		  pluralize.addUncountableRule = function (word) {
+		    if (typeof word === 'string') {
+		      uncountables[word.toLowerCase()] = true;
+		      return;
+		    }
+		    pluralize.addPluralRule(word, '$0');
+		    pluralize.addSingularRule(word, '$0');
+		  };
+		  pluralize.addIrregularRule = function (single, plural) {
+		    plural = plural.toLowerCase();
+		    single = single.toLowerCase();
+		    irregularSingles[single] = plural;
+		    irregularPlurals[plural] = single;
+		  };
+		  [
+		    ['I', 'we'],
+		    ['me', 'us'],
+		    ['he', 'they'],
+		    ['she', 'they'],
+		    ['them', 'them'],
+		    ['myself', 'ourselves'],
+		    ['yourself', 'yourselves'],
+		    ['itself', 'themselves'],
+		    ['herself', 'themselves'],
+		    ['himself', 'themselves'],
+		    ['themself', 'themselves'],
+		    ['is', 'are'],
+		    ['was', 'were'],
+		    ['has', 'have'],
+		    ['this', 'these'],
+		    ['that', 'those'],
+		    ['echo', 'echoes'],
+		    ['dingo', 'dingoes'],
+		    ['volcano', 'volcanoes'],
+		    ['tornado', 'tornadoes'],
+		    ['torpedo', 'torpedoes'],
+		    ['genus', 'genera'],
+		    ['viscus', 'viscera'],
+		    ['stigma', 'stigmata'],
+		    ['stoma', 'stomata'],
+		    ['dogma', 'dogmata'],
+		    ['lemma', 'lemmata'],
+		    ['schema', 'schemata'],
+		    ['anathema', 'anathemata'],
+		    ['ox', 'oxen'],
+		    ['axe', 'axes'],
+		    ['die', 'dice'],
+		    ['yes', 'yeses'],
+		    ['foot', 'feet'],
+		    ['eave', 'eaves'],
+		    ['goose', 'geese'],
+		    ['tooth', 'teeth'],
+		    ['quiz', 'quizzes'],
+		    ['human', 'humans'],
+		    ['proof', 'proofs'],
+		    ['carve', 'carves'],
+		    ['valve', 'valves'],
+		    ['looey', 'looies'],
+		    ['thief', 'thieves'],
+		    ['groove', 'grooves'],
+		    ['pickaxe', 'pickaxes'],
+		    ['passerby', 'passersby']
+		  ].forEach(function (rule) {
+		    return pluralize.addIrregularRule(rule[0], rule[1]);
+		  });
+		  [
+		    [/s?$/i, 's'],
+		    [/[^\u0000-\u007F]$/i, '$0'],
+		    [/([^aeiou]ese)$/i, '$1'],
+		    [/(ax|test)is$/i, '$1es'],
+		    [/(alias|[^aou]us|t[lm]as|gas|ris)$/i, '$1es'],
+		    [/(e[mn]u)s?$/i, '$1s'],
+		    [/([^l]ias|[aeiou]las|[ejzr]as|[iu]am)$/i, '$1'],
+		    [/(alumn|syllab|vir|radi|nucle|fung|cact|stimul|termin|bacill|foc|uter|loc|strat)(?:us|i)$/i, '$1i'],
+		    [/(alumn|alg|vertebr)(?:a|ae)$/i, '$1ae'],
+		    [/(seraph|cherub)(?:im)?$/i, '$1im'],
+		    [/(her|at|gr)o$/i, '$1oes'],
+		    [/(agend|addend|millenni|dat|extrem|bacteri|desiderat|strat|candelabr|errat|ov|symposi|curricul|automat|quor)(?:a|um)$/i, '$1a'],
+		    [/(apheli|hyperbat|periheli|asyndet|noumen|phenomen|criteri|organ|prolegomen|hedr|automat)(?:a|on)$/i, '$1a'],
+		    [/sis$/i, 'ses'],
+		    [/(?:(kni|wi|li)fe|(ar|l|ea|eo|oa|hoo)f)$/i, '$1$2ves'],
+		    [/([^aeiouy]|qu)y$/i, '$1ies'],
+		    [/([^ch][ieo][ln])ey$/i, '$1ies'],
+		    [/(x|ch|ss|sh|zz)$/i, '$1es'],
+		    [/(matr|cod|mur|sil|vert|ind|append)(?:ix|ex)$/i, '$1ices'],
+		    [/\b((?:tit)?m|l)(?:ice|ouse)$/i, '$1ice'],
+		    [/(pe)(?:rson|ople)$/i, '$1ople'],
+		    [/(child)(?:ren)?$/i, '$1ren'],
+		    [/eaux$/i, '$0'],
+		    [/m[ae]n$/i, 'men'],
+		    ['thou', 'you']
+		  ].forEach(function (rule) {
+		    return pluralize.addPluralRule(rule[0], rule[1]);
+		  });
+		  [
+		    [/s$/i, ''],
+		    [/(ss)$/i, '$1'],
+		    [/(wi|kni|(?:after|half|high|low|mid|non|night|[^\w]|^)li)ves$/i, '$1fe'],
+		    [/(ar|(?:wo|[ae])l|[eo][ao])ves$/i, '$1f'],
+		    [/ies$/i, 'y'],
+		    [/\b([pl]|zomb|(?:neck|cross)?t|coll|faer|food|gen|goon|group|lass|talk|goal|cut)ies$/i, '$1ie'],
+		    [/\b(mon|smil)ies$/i, '$1ey'],
+		    [/\b((?:tit)?m|l)ice$/i, '$1ouse'],
+		    [/(seraph|cherub)im$/i, '$1'],
+		    [/(x|ch|ss|sh|zz|tto|go|cho|alias|[^aou]us|t[lm]as|gas|(?:her|at|gr)o|[aeiou]ris)(?:es)?$/i, '$1'],
+		    [/(analy|diagno|parenthe|progno|synop|the|empha|cri|ne)(?:sis|ses)$/i, '$1sis'],
+		    [/(movie|twelve|abuse|e[mn]u)s$/i, '$1'],
+		    [/(test)(?:is|es)$/i, '$1is'],
+		    [/(alumn|syllab|vir|radi|nucle|fung|cact|stimul|termin|bacill|foc|uter|loc|strat)(?:us|i)$/i, '$1us'],
+		    [/(agend|addend|millenni|dat|extrem|bacteri|desiderat|strat|candelabr|errat|ov|symposi|curricul|quor)a$/i, '$1um'],
+		    [/(apheli|hyperbat|periheli|asyndet|noumen|phenomen|criteri|organ|prolegomen|hedr|automat)a$/i, '$1on'],
+		    [/(alumn|alg|vertebr)ae$/i, '$1a'],
+		    [/(cod|mur|sil|vert|ind)ices$/i, '$1ex'],
+		    [/(matr|append)ices$/i, '$1ix'],
+		    [/(pe)(rson|ople)$/i, '$1rson'],
+		    [/(child)ren$/i, '$1'],
+		    [/(eau)x?$/i, '$1'],
+		    [/men$/i, 'man']
+		  ].forEach(function (rule) {
+		    return pluralize.addSingularRule(rule[0], rule[1]);
+		  });
+		  [
+		    'adulthood',
+		    'advice',
+		    'agenda',
+		    'aid',
+		    'aircraft',
+		    'alcohol',
+		    'ammo',
+		    'analytics',
+		    'anime',
+		    'athletics',
+		    'audio',
+		    'bison',
+		    'blood',
+		    'bream',
+		    'buffalo',
+		    'butter',
+		    'carp',
+		    'cash',
+		    'chassis',
+		    'chess',
+		    'clothing',
+		    'cod',
+		    'commerce',
+		    'cooperation',
+		    'corps',
+		    'debris',
+		    'diabetes',
+		    'digestion',
+		    'elk',
+		    'energy',
+		    'equipment',
+		    'excretion',
+		    'expertise',
+		    'firmware',
+		    'flounder',
+		    'fun',
+		    'gallows',
+		    'garbage',
+		    'graffiti',
+		    'hardware',
+		    'headquarters',
+		    'health',
+		    'herpes',
+		    'highjinks',
+		    'homework',
+		    'housework',
+		    'information',
+		    'jeans',
+		    'justice',
+		    'kudos',
+		    'labour',
+		    'literature',
+		    'machinery',
+		    'mackerel',
+		    'mail',
+		    'media',
+		    'mews',
+		    'moose',
+		    'music',
+		    'mud',
+		    'manga',
+		    'news',
+		    'only',
+		    'personnel',
+		    'pike',
+		    'plankton',
+		    'pliers',
+		    'police',
+		    'pollution',
+		    'premises',
+		    'rain',
+		    'research',
+		    'rice',
+		    'salmon',
+		    'scissors',
+		    'series',
+		    'sewage',
+		    'shambles',
+		    'shrimp',
+		    'software',
+		    'species',
+		    'staff',
+		    'swine',
+		    'tennis',
+		    'traffic',
+		    'transportation',
+		    'trout',
+		    'tuna',
+		    'wealth',
+		    'welfare',
+		    'whiting',
+		    'wildebeest',
+		    'wildlife',
+		    'you',
+		    /pok[eé]mon$/i,
+		    /[^aeiou]ese$/i,
+		    /deer$/i,
+		    /fish$/i,
+		    /measles$/i,
+		    /o[iu]s$/i,
+		    /pox$/i,
+		    /sheep$/i
+		  ].forEach(pluralize.addUncountableRule);
+		  return pluralize;
+		});
+	} (pluralize$2));
+	return pluralize$2.exports;
+}
+
+var pluralizeExports = requirePluralize();
+var pluralize = /*@__PURE__*/getDefaultExportFromCjs(pluralizeExports);
 
 /**
  * remark-lint rule to warn when list item markers are indented.
@@ -15012,7 +15041,7 @@ const remarkLintDefinitionSpacing = lintRule$1(
 const quotation =
   (
     function (value, open, close) {
-      const start = open ;
+      const start = open;
       const end = start;
       let index = -1;
       if (Array.isArray(value)) {
@@ -17672,340 +17701,24 @@ const remarkLintNoTabs = lintRule$1(
   }
 );
 
-var sliced$1 = function (args, slice, sliceEnd) {
-  var ret = [];
-  var len = args.length;
-  if (0 === len) return ret;
-  var start = slice < 0
-    ? Math.max(0, slice + len)
-    : slice || 0;
-  if (sliceEnd !== undefined) {
-    len = sliceEnd < 0
-      ? sliceEnd + len
-      : sliceEnd;
-  }
-  while (len-- > start) {
-    ret[len - start] = args[len];
-  }
-  return ret;
-};
-getDefaultExportFromCjs(sliced$1);
-
-var slice = Array.prototype.slice;
-var co_1 = co$1;
-function co$1(fn) {
-  var isGenFun = isGeneratorFunction(fn);
-  return function (done) {
-    var ctx = this;
-    var gen = fn;
-    if (isGenFun) {
-      var args = slice.call(arguments), len = args.length;
-      var hasCallback = len && 'function' == typeof args[len - 1];
-      done = hasCallback ? args.pop() : error;
-      gen = fn.apply(this, args);
-    } else {
-      done = done || error;
-    }
-    next();
-    function exit(err, res) {
-      setImmediate(function(){
-        done.call(ctx, err, res);
-      });
-    }
-    function next(err, res) {
-      var ret;
-      if (arguments.length > 2) res = slice.call(arguments, 1);
-      if (err) {
-        try {
-          ret = gen.throw(err);
-        } catch (e) {
-          return exit(e);
-        }
-      }
-      if (!err) {
-        try {
-          ret = gen.next(res);
-        } catch (e) {
-          return exit(e);
-        }
-      }
-      if (ret.done) return exit(null, ret.value);
-      ret.value = toThunk(ret.value, ctx);
-      if ('function' == typeof ret.value) {
-        var called = false;
-        try {
-          ret.value.call(ctx, function(){
-            if (called) return;
-            called = true;
-            next.apply(ctx, arguments);
-          });
-        } catch (e) {
-          setImmediate(function(){
-            if (called) return;
-            called = true;
-            next(e);
-          });
-        }
-        return;
-      }
-      next(new TypeError('You may only yield a function, promise, generator, array, or object, '
-        + 'but the following was passed: "' + String(ret.value) + '"'));
-    }
-  }
-}
-function toThunk(obj, ctx) {
-  if (isGeneratorFunction(obj)) {
-    return co$1(obj.call(ctx));
-  }
-  if (isGenerator(obj)) {
-    return co$1(obj);
-  }
-  if (isPromise(obj)) {
-    return promiseToThunk(obj);
-  }
-  if ('function' == typeof obj) {
-    return obj;
-  }
-  if (isObject$1(obj) || Array.isArray(obj)) {
-    return objectToThunk.call(ctx, obj);
-  }
-  return obj;
-}
-function objectToThunk(obj){
-  var ctx = this;
-  var isArray = Array.isArray(obj);
-  return function(done){
-    var keys = Object.keys(obj);
-    var pending = keys.length;
-    var results = isArray
-      ? new Array(pending)
-      : new obj.constructor();
-    var finished;
-    if (!pending) {
-      setImmediate(function(){
-        done(null, results);
-      });
-      return;
-    }
-    if (!isArray) {
-      for (var i = 0; i < pending; i++) {
-        results[keys[i]] = undefined;
-      }
-    }
-    for (var i = 0; i < keys.length; i++) {
-      run(obj[keys[i]], keys[i]);
-    }
-    function run(fn, key) {
-      if (finished) return;
-      try {
-        fn = toThunk(fn, ctx);
-        if ('function' != typeof fn) {
-          results[key] = fn;
-          return --pending || done(null, results);
-        }
-        fn.call(ctx, function(err, res){
-          if (finished) return;
-          if (err) {
-            finished = true;
-            return done(err);
-          }
-          results[key] = res;
-          --pending || done(null, results);
-        });
-      } catch (err) {
-        finished = true;
-        done(err);
-      }
-    }
-  }
-}
-function promiseToThunk(promise) {
-  return function(fn){
-    promise.then(function(res) {
-      fn(null, res);
-    }, fn);
-  }
-}
-function isPromise(obj) {
-  return obj && 'function' == typeof obj.then;
-}
-function isGenerator(obj) {
-  return obj && 'function' == typeof obj.next && 'function' == typeof obj.throw;
-}
-function isGeneratorFunction(obj) {
-  return obj && obj.constructor && 'GeneratorFunction' == obj.constructor.name;
-}
-function isObject$1(val) {
-  return val && Object == val.constructor;
-}
-function error(err) {
-  if (!err) return;
-  setImmediate(function(){
-    throw err;
-  });
-}
-getDefaultExportFromCjs(co_1);
-
-var sliced = sliced$1;
-var noop = function(){};
-var co = co_1;
-var wrapped_1 = wrapped$1;
-function wrapped$1(fn) {
-  function wrap() {
-    var args = sliced(arguments);
-    var last = args[args.length - 1];
-    var ctx = this;
-    var done = typeof last == 'function' ? args.pop() : noop;
-    if (!fn) {
-      return done.apply(ctx, [null].concat(args));
-    }
-    if (generator(fn)) {
-      return co(fn).apply(ctx, args.concat(done));
-    }
-    if (fn.length > args.length) {
-      try {
-        return fn.apply(ctx, args.concat(done));
-      } catch (e) {
-        return done(e);
-      }
-    }
-    return sync(fn, done).apply(ctx, args);
-  }
-  return wrap;
-}
-function sync(fn, done) {
-  return function () {
-    var ret;
-    try {
-      ret = fn.apply(this, arguments);
-    } catch (err) {
-      return done(err);
-    }
-    if (promise(ret)) {
-      ret.then(function (value) { done(null, value); }, done);
-    } else {
-      ret instanceof Error ? done(ret) : done(null, ret);
-    }
-  }
-}
-function generator(value) {
-  return value
-    && value.constructor
-    && 'GeneratorFunction' == value.constructor.name;
-}
-function promise(value) {
-  return value && 'function' == typeof value.then;
-}
-getDefaultExportFromCjs(wrapped_1);
-
-var wrapped = wrapped_1;
-var unifiedLintRule = factory;
-function factory(id, rule) {
-  var parts = id.split(':');
-  var source = parts[0];
-  var ruleId = parts[1];
-  var fn = wrapped(rule);
-  if (!ruleId) {
-    ruleId = source;
-    source = null;
-  }
-  attacher.displayName = id;
-  return attacher
-  function attacher(raw) {
-    var config = coerce$1(ruleId, raw);
-    var severity = config[0];
-    var options = config[1];
-    var fatal = severity === 2;
-    return severity ? transformer : undefined
-    function transformer(tree, file, next) {
-      var index = file.messages.length;
-      fn(tree, file, options, done);
-      function done(err) {
-        var messages = file.messages;
-        var message;
-        if (err && messages.indexOf(err) === -1) {
-          try {
-            file.fail(err);
-          } catch (_) {}
-        }
-        while (index < messages.length) {
-          message = messages[index];
-          message.ruleId = ruleId;
-          message.source = source;
-          message.fatal = fatal;
-          index++;
-        }
-        next();
-      }
-    }
-  }
-}
-function coerce$1(name, value) {
-  var def = 1;
-  var result;
-  var level;
-  if (typeof value === 'boolean') {
-    result = [value];
-  } else if (value == null) {
-    result = [def];
-  } else if (
-    typeof value === 'object' &&
-    (typeof value[0] === 'number' ||
-      typeof value[0] === 'boolean' ||
-      typeof value[0] === 'string')
-  ) {
-    result = value.concat();
-  } else {
-    result = [1, value];
-  }
-  level = result[0];
-  if (typeof level === 'boolean') {
-    level = level ? 1 : 0;
-  } else if (typeof level === 'string') {
-    if (level === 'off') {
-      level = 0;
-    } else if (level === 'on' || level === 'warn') {
-      level = 1;
-    } else if (level === 'error') {
-      level = 2;
-    } else {
-      level = 1;
-      result = [level, result];
-    }
-  }
-  if (level < 0 || level > 2) {
-    throw new Error(
-      'Incorrect severity `' +
-        level +
-        '` for `' +
-        name +
-        '`, ' +
-        'expected 0, 1, or 2'
-    )
-  }
-  result[0] = level;
-  return result
-}
-getDefaultExportFromCjs(unifiedLintRule);
-
-var rule = unifiedLintRule;
-var remarkLintNoTrailingSpaces = rule('remark-lint:no-trailing-spaces', noTrailingSpaces);
+const rule = lintRule$1('remark-lint:no-trailing-spaces', noTrailingSpaces);
 function noTrailingSpaces(ast, file) {
-  var lines = file.toString().split(/\r?\n/);
-  for (var i = 0; i < lines.length; i++) {
-    var currentLine = lines[i];
-    var lineIndex = i + 1;
-    if (/\s$/.test(currentLine)) {
+  const myLocation = location(file);
+  const lines = file.toString().split(/\r?\n/);
+  for (let i = 0; i < lines.length; i++) {
+    const currentLine = lines[i];
+    const lineIndex = i + 1;
+    const match = /\s+$/.exec(currentLine);
+    if (match) {
+      const startOffset = myLocation.toOffset({ line: lineIndex, column: match.index+1 });
+      const endOffset = myLocation.toOffset({ line: lineIndex, column: currentLine.length+1 });
       file.message('Remove trailing whitespace', {
-        position: {
-          start: { line: lineIndex, column: currentLine.length + 1 },
-          end: { line: lineIndex }
-        }
+        start: myLocation.toPoint(startOffset),
+        end: myLocation.toPoint(endOffset),
       });
     }
   }
 }
-var remarkLintNoTrailingSpaces$1 = getDefaultExportFromCjs(remarkLintNoTrailingSpaces);
 
 function* getLinksRecursively(node) {
   if (node.url) {
@@ -18016,7 +17729,7 @@ function* getLinksRecursively(node) {
   }
 }
 function validateLinks(tree, vfile) {
-  const currentFileURL = pathToFileURL(path$2.join(vfile.cwd, vfile.path));
+  const currentFileURL = pathToFileURL(path$1.join(vfile.cwd, vfile.path));
   let previousDefinitionLabel;
   for (const node of getLinksRecursively(tree)) {
     if (node.url[0] !== "#") {
@@ -20860,474 +20573,532 @@ var jsYaml = {
 	safeDump: safeDump
 };
 
-const debug$1 = (
-  typeof process === 'object' &&
-  process.env &&
-  process.env.NODE_DEBUG &&
-  /\bsemver\b/i.test(process.env.NODE_DEBUG)
-) ? (...args) => console.error('SEMVER', ...args)
-  : () => {};
-var debug_1 = debug$1;
-getDefaultExportFromCjs(debug_1);
-
-const SEMVER_SPEC_VERSION = '2.0.0';
-const MAX_LENGTH$1 = 256;
-const MAX_SAFE_INTEGER$1 = Number.MAX_SAFE_INTEGER ||
- 9007199254740991;
-const MAX_SAFE_COMPONENT_LENGTH = 16;
-const MAX_SAFE_BUILD_LENGTH = MAX_LENGTH$1 - 6;
-const RELEASE_TYPES = [
-  'major',
-  'premajor',
-  'minor',
-  'preminor',
-  'patch',
-  'prepatch',
-  'prerelease',
-];
-var constants = {
-  MAX_LENGTH: MAX_LENGTH$1,
-  MAX_SAFE_COMPONENT_LENGTH,
-  MAX_SAFE_BUILD_LENGTH,
-  MAX_SAFE_INTEGER: MAX_SAFE_INTEGER$1,
-  RELEASE_TYPES,
-  SEMVER_SPEC_VERSION,
-  FLAG_INCLUDE_PRERELEASE: 0b001,
-  FLAG_LOOSE: 0b010,
-};
-getDefaultExportFromCjs(constants);
-
-var re$1 = {exports: {}};
+var debug_1;
+var hasRequiredDebug;
+function requireDebug () {
+	if (hasRequiredDebug) return debug_1;
+	hasRequiredDebug = 1;
+	const debug = (
+	  typeof process === 'object' &&
+	  process.env &&
+	  process.env.NODE_DEBUG &&
+	  /\bsemver\b/i.test(process.env.NODE_DEBUG)
+	) ? (...args) => console.error('SEMVER', ...args)
+	  : () => {};
+	debug_1 = debug;
+	return debug_1;
+}
 
-(function (module, exports) {
-	const {
+var constants;
+var hasRequiredConstants;
+function requireConstants () {
+	if (hasRequiredConstants) return constants;
+	hasRequiredConstants = 1;
+	const SEMVER_SPEC_VERSION = '2.0.0';
+	const MAX_LENGTH = 256;
+	const MAX_SAFE_INTEGER = Number.MAX_SAFE_INTEGER ||
+	 9007199254740991;
+	const MAX_SAFE_COMPONENT_LENGTH = 16;
+	const MAX_SAFE_BUILD_LENGTH = MAX_LENGTH - 6;
+	const RELEASE_TYPES = [
+	  'major',
+	  'premajor',
+	  'minor',
+	  'preminor',
+	  'patch',
+	  'prepatch',
+	  'prerelease',
+	];
+	constants = {
+	  MAX_LENGTH,
 	  MAX_SAFE_COMPONENT_LENGTH,
 	  MAX_SAFE_BUILD_LENGTH,
-	  MAX_LENGTH,
-	} = constants;
-	const debug = debug_1;
-	exports = module.exports = {};
-	const re = exports.re = [];
-	const safeRe = exports.safeRe = [];
-	const src = exports.src = [];
-	const t = exports.t = {};
-	let R = 0;
-	const LETTERDASHNUMBER = '[a-zA-Z0-9-]';
-	const safeRegexReplacements = [
-	  ['\\s', 1],
-	  ['\\d', MAX_LENGTH],
-	  [LETTERDASHNUMBER, MAX_SAFE_BUILD_LENGTH],
-	];
-	const makeSafeRegex = (value) => {
-	  for (const [token, max] of safeRegexReplacements) {
-	    value = value
-	      .split(`${token}*`).join(`${token}{0,${max}}`)
-	      .split(`${token}+`).join(`${token}{1,${max}}`);
+	  MAX_SAFE_INTEGER,
+	  RELEASE_TYPES,
+	  SEMVER_SPEC_VERSION,
+	  FLAG_INCLUDE_PRERELEASE: 0b001,
+	  FLAG_LOOSE: 0b010,
+	};
+	return constants;
+}
+
+var re = {exports: {}};
+
+var hasRequiredRe;
+function requireRe () {
+	if (hasRequiredRe) return re.exports;
+	hasRequiredRe = 1;
+	(function (module, exports) {
+		const {
+		  MAX_SAFE_COMPONENT_LENGTH,
+		  MAX_SAFE_BUILD_LENGTH,
+		  MAX_LENGTH,
+		} = requireConstants();
+		const debug = requireDebug();
+		exports = module.exports = {};
+		const re = exports.re = [];
+		const safeRe = exports.safeRe = [];
+		const src = exports.src = [];
+		const t = exports.t = {};
+		let R = 0;
+		const LETTERDASHNUMBER = '[a-zA-Z0-9-]';
+		const safeRegexReplacements = [
+		  ['\\s', 1],
+		  ['\\d', MAX_LENGTH],
+		  [LETTERDASHNUMBER, MAX_SAFE_BUILD_LENGTH],
+		];
+		const makeSafeRegex = (value) => {
+		  for (const [token, max] of safeRegexReplacements) {
+		    value = value
+		      .split(`${token}*`).join(`${token}{0,${max}}`)
+		      .split(`${token}+`).join(`${token}{1,${max}}`);
+		  }
+		  return value
+		};
+		const createToken = (name, value, isGlobal) => {
+		  const safe = makeSafeRegex(value);
+		  const index = R++;
+		  debug(name, index, value);
+		  t[name] = index;
+		  src[index] = value;
+		  re[index] = new RegExp(value, isGlobal ? 'g' : undefined);
+		  safeRe[index] = new RegExp(safe, isGlobal ? 'g' : undefined);
+		};
+		createToken('NUMERICIDENTIFIER', '0|[1-9]\\d*');
+		createToken('NUMERICIDENTIFIERLOOSE', '\\d+');
+		createToken('NONNUMERICIDENTIFIER', `\\d*[a-zA-Z-]${LETTERDASHNUMBER}*`);
+		createToken('MAINVERSION', `(${src[t.NUMERICIDENTIFIER]})\\.` +
+		                   `(${src[t.NUMERICIDENTIFIER]})\\.` +
+		                   `(${src[t.NUMERICIDENTIFIER]})`);
+		createToken('MAINVERSIONLOOSE', `(${src[t.NUMERICIDENTIFIERLOOSE]})\\.` +
+		                        `(${src[t.NUMERICIDENTIFIERLOOSE]})\\.` +
+		                        `(${src[t.NUMERICIDENTIFIERLOOSE]})`);
+		createToken('PRERELEASEIDENTIFIER', `(?:${src[t.NUMERICIDENTIFIER]
+		}|${src[t.NONNUMERICIDENTIFIER]})`);
+		createToken('PRERELEASEIDENTIFIERLOOSE', `(?:${src[t.NUMERICIDENTIFIERLOOSE]
+		}|${src[t.NONNUMERICIDENTIFIER]})`);
+		createToken('PRERELEASE', `(?:-(${src[t.PRERELEASEIDENTIFIER]
+		}(?:\\.${src[t.PRERELEASEIDENTIFIER]})*))`);
+		createToken('PRERELEASELOOSE', `(?:-?(${src[t.PRERELEASEIDENTIFIERLOOSE]
+		}(?:\\.${src[t.PRERELEASEIDENTIFIERLOOSE]})*))`);
+		createToken('BUILDIDENTIFIER', `${LETTERDASHNUMBER}+`);
+		createToken('BUILD', `(?:\\+(${src[t.BUILDIDENTIFIER]
+		}(?:\\.${src[t.BUILDIDENTIFIER]})*))`);
+		createToken('FULLPLAIN', `v?${src[t.MAINVERSION]
+		}${src[t.PRERELEASE]}?${
+		  src[t.BUILD]}?`);
+		createToken('FULL', `^${src[t.FULLPLAIN]}$`);
+		createToken('LOOSEPLAIN', `[v=\\s]*${src[t.MAINVERSIONLOOSE]
+		}${src[t.PRERELEASELOOSE]}?${
+		  src[t.BUILD]}?`);
+		createToken('LOOSE', `^${src[t.LOOSEPLAIN]}$`);
+		createToken('GTLT', '((?:<|>)?=?)');
+		createToken('XRANGEIDENTIFIERLOOSE', `${src[t.NUMERICIDENTIFIERLOOSE]}|x|X|\\*`);
+		createToken('XRANGEIDENTIFIER', `${src[t.NUMERICIDENTIFIER]}|x|X|\\*`);
+		createToken('XRANGEPLAIN', `[v=\\s]*(${src[t.XRANGEIDENTIFIER]})` +
+		                   `(?:\\.(${src[t.XRANGEIDENTIFIER]})` +
+		                   `(?:\\.(${src[t.XRANGEIDENTIFIER]})` +
+		                   `(?:${src[t.PRERELEASE]})?${
+		                     src[t.BUILD]}?` +
+		                   `)?)?`);
+		createToken('XRANGEPLAINLOOSE', `[v=\\s]*(${src[t.XRANGEIDENTIFIERLOOSE]})` +
+		                        `(?:\\.(${src[t.XRANGEIDENTIFIERLOOSE]})` +
+		                        `(?:\\.(${src[t.XRANGEIDENTIFIERLOOSE]})` +
+		                        `(?:${src[t.PRERELEASELOOSE]})?${
+		                          src[t.BUILD]}?` +
+		                        `)?)?`);
+		createToken('XRANGE', `^${src[t.GTLT]}\\s*${src[t.XRANGEPLAIN]}$`);
+		createToken('XRANGELOOSE', `^${src[t.GTLT]}\\s*${src[t.XRANGEPLAINLOOSE]}$`);
+		createToken('COERCEPLAIN', `${'(^|[^\\d])' +
+		              '(\\d{1,'}${MAX_SAFE_COMPONENT_LENGTH}})` +
+		              `(?:\\.(\\d{1,${MAX_SAFE_COMPONENT_LENGTH}}))?` +
+		              `(?:\\.(\\d{1,${MAX_SAFE_COMPONENT_LENGTH}}))?`);
+		createToken('COERCE', `${src[t.COERCEPLAIN]}(?:$|[^\\d])`);
+		createToken('COERCEFULL', src[t.COERCEPLAIN] +
+		              `(?:${src[t.PRERELEASE]})?` +
+		              `(?:${src[t.BUILD]})?` +
+		              `(?:$|[^\\d])`);
+		createToken('COERCERTL', src[t.COERCE], true);
+		createToken('COERCERTLFULL', src[t.COERCEFULL], true);
+		createToken('LONETILDE', '(?:~>?)');
+		createToken('TILDETRIM', `(\\s*)${src[t.LONETILDE]}\\s+`, true);
+		exports.tildeTrimReplace = '$1~';
+		createToken('TILDE', `^${src[t.LONETILDE]}${src[t.XRANGEPLAIN]}$`);
+		createToken('TILDELOOSE', `^${src[t.LONETILDE]}${src[t.XRANGEPLAINLOOSE]}$`);
+		createToken('LONECARET', '(?:\\^)');
+		createToken('CARETTRIM', `(\\s*)${src[t.LONECARET]}\\s+`, true);
+		exports.caretTrimReplace = '$1^';
+		createToken('CARET', `^${src[t.LONECARET]}${src[t.XRANGEPLAIN]}$`);
+		createToken('CARETLOOSE', `^${src[t.LONECARET]}${src[t.XRANGEPLAINLOOSE]}$`);
+		createToken('COMPARATORLOOSE', `^${src[t.GTLT]}\\s*(${src[t.LOOSEPLAIN]})$|^$`);
+		createToken('COMPARATOR', `^${src[t.GTLT]}\\s*(${src[t.FULLPLAIN]})$|^$`);
+		createToken('COMPARATORTRIM', `(\\s*)${src[t.GTLT]
+		}\\s*(${src[t.LOOSEPLAIN]}|${src[t.XRANGEPLAIN]})`, true);
+		exports.comparatorTrimReplace = '$1$2$3';
+		createToken('HYPHENRANGE', `^\\s*(${src[t.XRANGEPLAIN]})` +
+		                   `\\s+-\\s+` +
+		                   `(${src[t.XRANGEPLAIN]})` +
+		                   `\\s*$`);
+		createToken('HYPHENRANGELOOSE', `^\\s*(${src[t.XRANGEPLAINLOOSE]})` +
+		                        `\\s+-\\s+` +
+		                        `(${src[t.XRANGEPLAINLOOSE]})` +
+		                        `\\s*$`);
+		createToken('STAR', '(<|>)?=?\\s*\\*');
+		createToken('GTE0', '^\\s*>=\\s*0\\.0\\.0\\s*$');
+		createToken('GTE0PRE', '^\\s*>=\\s*0\\.0\\.0-0\\s*$');
+	} (re, re.exports));
+	return re.exports;
+}
+
+var parseOptions_1;
+var hasRequiredParseOptions;
+function requireParseOptions () {
+	if (hasRequiredParseOptions) return parseOptions_1;
+	hasRequiredParseOptions = 1;
+	const looseOption = Object.freeze({ loose: true });
+	const emptyOpts = Object.freeze({ });
+	const parseOptions = options => {
+	  if (!options) {
+	    return emptyOpts
+	  }
+	  if (typeof options !== 'object') {
+	    return looseOption
+	  }
+	  return options
+	};
+	parseOptions_1 = parseOptions;
+	return parseOptions_1;
+}
+
+var identifiers;
+var hasRequiredIdentifiers;
+function requireIdentifiers () {
+	if (hasRequiredIdentifiers) return identifiers;
+	hasRequiredIdentifiers = 1;
+	const numeric = /^[0-9]+$/;
+	const compareIdentifiers = (a, b) => {
+	  const anum = numeric.test(a);
+	  const bnum = numeric.test(b);
+	  if (anum && bnum) {
+	    a = +a;
+	    b = +b;
 	  }
-	  return value
+	  return a === b ? 0
+	    : (anum && !bnum) ? -1
+	    : (bnum && !anum) ? 1
+	    : a < b ? -1
+	    : 1
 	};
-	const createToken = (name, value, isGlobal) => {
-	  const safe = makeSafeRegex(value);
-	  const index = R++;
-	  debug(name, index, value);
-	  t[name] = index;
-	  src[index] = value;
-	  re[index] = new RegExp(value, isGlobal ? 'g' : undefined);
-	  safeRe[index] = new RegExp(safe, isGlobal ? 'g' : undefined);
+	const rcompareIdentifiers = (a, b) => compareIdentifiers(b, a);
+	identifiers = {
+	  compareIdentifiers,
+	  rcompareIdentifiers,
 	};
-	createToken('NUMERICIDENTIFIER', '0|[1-9]\\d*');
-	createToken('NUMERICIDENTIFIERLOOSE', '\\d+');
-	createToken('NONNUMERICIDENTIFIER', `\\d*[a-zA-Z-]${LETTERDASHNUMBER}*`);
-	createToken('MAINVERSION', `(${src[t.NUMERICIDENTIFIER]})\\.` +
-	                   `(${src[t.NUMERICIDENTIFIER]})\\.` +
-	                   `(${src[t.NUMERICIDENTIFIER]})`);
-	createToken('MAINVERSIONLOOSE', `(${src[t.NUMERICIDENTIFIERLOOSE]})\\.` +
-	                        `(${src[t.NUMERICIDENTIFIERLOOSE]})\\.` +
-	                        `(${src[t.NUMERICIDENTIFIERLOOSE]})`);
-	createToken('PRERELEASEIDENTIFIER', `(?:${src[t.NUMERICIDENTIFIER]
-	}|${src[t.NONNUMERICIDENTIFIER]})`);
-	createToken('PRERELEASEIDENTIFIERLOOSE', `(?:${src[t.NUMERICIDENTIFIERLOOSE]
-	}|${src[t.NONNUMERICIDENTIFIER]})`);
-	createToken('PRERELEASE', `(?:-(${src[t.PRERELEASEIDENTIFIER]
-	}(?:\\.${src[t.PRERELEASEIDENTIFIER]})*))`);
-	createToken('PRERELEASELOOSE', `(?:-?(${src[t.PRERELEASEIDENTIFIERLOOSE]
-	}(?:\\.${src[t.PRERELEASEIDENTIFIERLOOSE]})*))`);
-	createToken('BUILDIDENTIFIER', `${LETTERDASHNUMBER}+`);
-	createToken('BUILD', `(?:\\+(${src[t.BUILDIDENTIFIER]
-	}(?:\\.${src[t.BUILDIDENTIFIER]})*))`);
-	createToken('FULLPLAIN', `v?${src[t.MAINVERSION]
-	}${src[t.PRERELEASE]}?${
-	  src[t.BUILD]}?`);
-	createToken('FULL', `^${src[t.FULLPLAIN]}$`);
-	createToken('LOOSEPLAIN', `[v=\\s]*${src[t.MAINVERSIONLOOSE]
-	}${src[t.PRERELEASELOOSE]}?${
-	  src[t.BUILD]}?`);
-	createToken('LOOSE', `^${src[t.LOOSEPLAIN]}$`);
-	createToken('GTLT', '((?:<|>)?=?)');
-	createToken('XRANGEIDENTIFIERLOOSE', `${src[t.NUMERICIDENTIFIERLOOSE]}|x|X|\\*`);
-	createToken('XRANGEIDENTIFIER', `${src[t.NUMERICIDENTIFIER]}|x|X|\\*`);
-	createToken('XRANGEPLAIN', `[v=\\s]*(${src[t.XRANGEIDENTIFIER]})` +
-	                   `(?:\\.(${src[t.XRANGEIDENTIFIER]})` +
-	                   `(?:\\.(${src[t.XRANGEIDENTIFIER]})` +
-	                   `(?:${src[t.PRERELEASE]})?${
-	                     src[t.BUILD]}?` +
-	                   `)?)?`);
-	createToken('XRANGEPLAINLOOSE', `[v=\\s]*(${src[t.XRANGEIDENTIFIERLOOSE]})` +
-	                        `(?:\\.(${src[t.XRANGEIDENTIFIERLOOSE]})` +
-	                        `(?:\\.(${src[t.XRANGEIDENTIFIERLOOSE]})` +
-	                        `(?:${src[t.PRERELEASELOOSE]})?${
-	                          src[t.BUILD]}?` +
-	                        `)?)?`);
-	createToken('XRANGE', `^${src[t.GTLT]}\\s*${src[t.XRANGEPLAIN]}$`);
-	createToken('XRANGELOOSE', `^${src[t.GTLT]}\\s*${src[t.XRANGEPLAINLOOSE]}$`);
-	createToken('COERCEPLAIN', `${'(^|[^\\d])' +
-	              '(\\d{1,'}${MAX_SAFE_COMPONENT_LENGTH}})` +
-	              `(?:\\.(\\d{1,${MAX_SAFE_COMPONENT_LENGTH}}))?` +
-	              `(?:\\.(\\d{1,${MAX_SAFE_COMPONENT_LENGTH}}))?`);
-	createToken('COERCE', `${src[t.COERCEPLAIN]}(?:$|[^\\d])`);
-	createToken('COERCEFULL', src[t.COERCEPLAIN] +
-	              `(?:${src[t.PRERELEASE]})?` +
-	              `(?:${src[t.BUILD]})?` +
-	              `(?:$|[^\\d])`);
-	createToken('COERCERTL', src[t.COERCE], true);
-	createToken('COERCERTLFULL', src[t.COERCEFULL], true);
-	createToken('LONETILDE', '(?:~>?)');
-	createToken('TILDETRIM', `(\\s*)${src[t.LONETILDE]}\\s+`, true);
-	exports.tildeTrimReplace = '$1~';
-	createToken('TILDE', `^${src[t.LONETILDE]}${src[t.XRANGEPLAIN]}$`);
-	createToken('TILDELOOSE', `^${src[t.LONETILDE]}${src[t.XRANGEPLAINLOOSE]}$`);
-	createToken('LONECARET', '(?:\\^)');
-	createToken('CARETTRIM', `(\\s*)${src[t.LONECARET]}\\s+`, true);
-	exports.caretTrimReplace = '$1^';
-	createToken('CARET', `^${src[t.LONECARET]}${src[t.XRANGEPLAIN]}$`);
-	createToken('CARETLOOSE', `^${src[t.LONECARET]}${src[t.XRANGEPLAINLOOSE]}$`);
-	createToken('COMPARATORLOOSE', `^${src[t.GTLT]}\\s*(${src[t.LOOSEPLAIN]})$|^$`);
-	createToken('COMPARATOR', `^${src[t.GTLT]}\\s*(${src[t.FULLPLAIN]})$|^$`);
-	createToken('COMPARATORTRIM', `(\\s*)${src[t.GTLT]
-	}\\s*(${src[t.LOOSEPLAIN]}|${src[t.XRANGEPLAIN]})`, true);
-	exports.comparatorTrimReplace = '$1$2$3';
-	createToken('HYPHENRANGE', `^\\s*(${src[t.XRANGEPLAIN]})` +
-	                   `\\s+-\\s+` +
-	                   `(${src[t.XRANGEPLAIN]})` +
-	                   `\\s*$`);
-	createToken('HYPHENRANGELOOSE', `^\\s*(${src[t.XRANGEPLAINLOOSE]})` +
-	                        `\\s+-\\s+` +
-	                        `(${src[t.XRANGEPLAINLOOSE]})` +
-	                        `\\s*$`);
-	createToken('STAR', '(<|>)?=?\\s*\\*');
-	createToken('GTE0', '^\\s*>=\\s*0\\.0\\.0\\s*$');
-	createToken('GTE0PRE', '^\\s*>=\\s*0\\.0\\.0-0\\s*$');
-} (re$1, re$1.exports));
-var reExports = re$1.exports;
-getDefaultExportFromCjs(reExports);
+	return identifiers;
+}
 
-const looseOption = Object.freeze({ loose: true });
-const emptyOpts = Object.freeze({ });
-const parseOptions$1 = options => {
-  if (!options) {
-    return emptyOpts
-  }
-  if (typeof options !== 'object') {
-    return looseOption
-  }
-  return options
-};
-var parseOptions_1 = parseOptions$1;
-getDefaultExportFromCjs(parseOptions_1);
+var semver;
+var hasRequiredSemver;
+function requireSemver () {
+	if (hasRequiredSemver) return semver;
+	hasRequiredSemver = 1;
+	const debug = requireDebug();
+	const { MAX_LENGTH, MAX_SAFE_INTEGER } = requireConstants();
+	const { safeRe: re, t } = requireRe();
+	const parseOptions = requireParseOptions();
+	const { compareIdentifiers } = requireIdentifiers();
+	class SemVer {
+	  constructor (version, options) {
+	    options = parseOptions(options);
+	    if (version instanceof SemVer) {
+	      if (version.loose === !!options.loose &&
+	          version.includePrerelease === !!options.includePrerelease) {
+	        return version
+	      } else {
+	        version = version.version;
+	      }
+	    } else if (typeof version !== 'string') {
+	      throw new TypeError(`Invalid version. Must be a string. Got type "${typeof version}".`)
+	    }
+	    if (version.length > MAX_LENGTH) {
+	      throw new TypeError(
+	        `version is longer than ${MAX_LENGTH} characters`
+	      )
+	    }
+	    debug('SemVer', version, options);
+	    this.options = options;
+	    this.loose = !!options.loose;
+	    this.includePrerelease = !!options.includePrerelease;
+	    const m = version.trim().match(options.loose ? re[t.LOOSE] : re[t.FULL]);
+	    if (!m) {
+	      throw new TypeError(`Invalid Version: ${version}`)
+	    }
+	    this.raw = version;
+	    this.major = +m[1];
+	    this.minor = +m[2];
+	    this.patch = +m[3];
+	    if (this.major > MAX_SAFE_INTEGER || this.major < 0) {
+	      throw new TypeError('Invalid major version')
+	    }
+	    if (this.minor > MAX_SAFE_INTEGER || this.minor < 0) {
+	      throw new TypeError('Invalid minor version')
+	    }
+	    if (this.patch > MAX_SAFE_INTEGER || this.patch < 0) {
+	      throw new TypeError('Invalid patch version')
+	    }
+	    if (!m[4]) {
+	      this.prerelease = [];
+	    } else {
+	      this.prerelease = m[4].split('.').map((id) => {
+	        if (/^[0-9]+$/.test(id)) {
+	          const num = +id;
+	          if (num >= 0 && num < MAX_SAFE_INTEGER) {
+	            return num
+	          }
+	        }
+	        return id
+	      });
+	    }
+	    this.build = m[5] ? m[5].split('.') : [];
+	    this.format();
+	  }
+	  format () {
+	    this.version = `${this.major}.${this.minor}.${this.patch}`;
+	    if (this.prerelease.length) {
+	      this.version += `-${this.prerelease.join('.')}`;
+	    }
+	    return this.version
+	  }
+	  toString () {
+	    return this.version
+	  }
+	  compare (other) {
+	    debug('SemVer.compare', this.version, this.options, other);
+	    if (!(other instanceof SemVer)) {
+	      if (typeof other === 'string' && other === this.version) {
+	        return 0
+	      }
+	      other = new SemVer(other, this.options);
+	    }
+	    if (other.version === this.version) {
+	      return 0
+	    }
+	    return this.compareMain(other) || this.comparePre(other)
+	  }
+	  compareMain (other) {
+	    if (!(other instanceof SemVer)) {
+	      other = new SemVer(other, this.options);
+	    }
+	    return (
+	      compareIdentifiers(this.major, other.major) ||
+	      compareIdentifiers(this.minor, other.minor) ||
+	      compareIdentifiers(this.patch, other.patch)
+	    )
+	  }
+	  comparePre (other) {
+	    if (!(other instanceof SemVer)) {
+	      other = new SemVer(other, this.options);
+	    }
+	    if (this.prerelease.length && !other.prerelease.length) {
+	      return -1
+	    } else if (!this.prerelease.length && other.prerelease.length) {
+	      return 1
+	    } else if (!this.prerelease.length && !other.prerelease.length) {
+	      return 0
+	    }
+	    let i = 0;
+	    do {
+	      const a = this.prerelease[i];
+	      const b = other.prerelease[i];
+	      debug('prerelease compare', i, a, b);
+	      if (a === undefined && b === undefined) {
+	        return 0
+	      } else if (b === undefined) {
+	        return 1
+	      } else if (a === undefined) {
+	        return -1
+	      } else if (a === b) {
+	        continue
+	      } else {
+	        return compareIdentifiers(a, b)
+	      }
+	    } while (++i)
+	  }
+	  compareBuild (other) {
+	    if (!(other instanceof SemVer)) {
+	      other = new SemVer(other, this.options);
+	    }
+	    let i = 0;
+	    do {
+	      const a = this.build[i];
+	      const b = other.build[i];
+	      debug('build compare', i, a, b);
+	      if (a === undefined && b === undefined) {
+	        return 0
+	      } else if (b === undefined) {
+	        return 1
+	      } else if (a === undefined) {
+	        return -1
+	      } else if (a === b) {
+	        continue
+	      } else {
+	        return compareIdentifiers(a, b)
+	      }
+	    } while (++i)
+	  }
+	  inc (release, identifier, identifierBase) {
+	    switch (release) {
+	      case 'premajor':
+	        this.prerelease.length = 0;
+	        this.patch = 0;
+	        this.minor = 0;
+	        this.major++;
+	        this.inc('pre', identifier, identifierBase);
+	        break
+	      case 'preminor':
+	        this.prerelease.length = 0;
+	        this.patch = 0;
+	        this.minor++;
+	        this.inc('pre', identifier, identifierBase);
+	        break
+	      case 'prepatch':
+	        this.prerelease.length = 0;
+	        this.inc('patch', identifier, identifierBase);
+	        this.inc('pre', identifier, identifierBase);
+	        break
+	      case 'prerelease':
+	        if (this.prerelease.length === 0) {
+	          this.inc('patch', identifier, identifierBase);
+	        }
+	        this.inc('pre', identifier, identifierBase);
+	        break
+	      case 'major':
+	        if (
+	          this.minor !== 0 ||
+	          this.patch !== 0 ||
+	          this.prerelease.length === 0
+	        ) {
+	          this.major++;
+	        }
+	        this.minor = 0;
+	        this.patch = 0;
+	        this.prerelease = [];
+	        break
+	      case 'minor':
+	        if (this.patch !== 0 || this.prerelease.length === 0) {
+	          this.minor++;
+	        }
+	        this.patch = 0;
+	        this.prerelease = [];
+	        break
+	      case 'patch':
+	        if (this.prerelease.length === 0) {
+	          this.patch++;
+	        }
+	        this.prerelease = [];
+	        break
+	      case 'pre': {
+	        const base = Number(identifierBase) ? 1 : 0;
+	        if (!identifier && identifierBase === false) {
+	          throw new Error('invalid increment argument: identifier is empty')
+	        }
+	        if (this.prerelease.length === 0) {
+	          this.prerelease = [base];
+	        } else {
+	          let i = this.prerelease.length;
+	          while (--i >= 0) {
+	            if (typeof this.prerelease[i] === 'number') {
+	              this.prerelease[i]++;
+	              i = -2;
+	            }
+	          }
+	          if (i === -1) {
+	            if (identifier === this.prerelease.join('.') && identifierBase === false) {
+	              throw new Error('invalid increment argument: identifier already exists')
+	            }
+	            this.prerelease.push(base);
+	          }
+	        }
+	        if (identifier) {
+	          let prerelease = [identifier, base];
+	          if (identifierBase === false) {
+	            prerelease = [identifier];
+	          }
+	          if (compareIdentifiers(this.prerelease[0], identifier) === 0) {
+	            if (isNaN(this.prerelease[1])) {
+	              this.prerelease = prerelease;
+	            }
+	          } else {
+	            this.prerelease = prerelease;
+	          }
+	        }
+	        break
+	      }
+	      default:
+	        throw new Error(`invalid increment argument: ${release}`)
+	    }
+	    this.raw = this.format();
+	    if (this.build.length) {
+	      this.raw += `+${this.build.join('.')}`;
+	    }
+	    return this
+	  }
+	}
+	semver = SemVer;
+	return semver;
+}
 
-const numeric = /^[0-9]+$/;
-const compareIdentifiers$1 = (a, b) => {
-  const anum = numeric.test(a);
-  const bnum = numeric.test(b);
-  if (anum && bnum) {
-    a = +a;
-    b = +b;
-  }
-  return a === b ? 0
-    : (anum && !bnum) ? -1
-    : (bnum && !anum) ? 1
-    : a < b ? -1
-    : 1
-};
-const rcompareIdentifiers = (a, b) => compareIdentifiers$1(b, a);
-var identifiers = {
-  compareIdentifiers: compareIdentifiers$1,
-  rcompareIdentifiers,
-};
-getDefaultExportFromCjs(identifiers);
+var parse_1;
+var hasRequiredParse;
+function requireParse () {
+	if (hasRequiredParse) return parse_1;
+	hasRequiredParse = 1;
+	const SemVer = requireSemver();
+	const parse = (version, options, throwErrors = false) => {
+	  if (version instanceof SemVer) {
+	    return version
+	  }
+	  try {
+	    return new SemVer(version, options)
+	  } catch (er) {
+	    if (!throwErrors) {
+	      return null
+	    }
+	    throw er
+	  }
+	};
+	parse_1 = parse;
+	return parse_1;
+}
 
-const debug = debug_1;
-const { MAX_LENGTH, MAX_SAFE_INTEGER } = constants;
-const { safeRe: re, t } = reExports;
-const parseOptions = parseOptions_1;
-const { compareIdentifiers } = identifiers;
-let SemVer$2 = class SemVer {
-  constructor (version, options) {
-    options = parseOptions(options);
-    if (version instanceof SemVer) {
-      if (version.loose === !!options.loose &&
-          version.includePrerelease === !!options.includePrerelease) {
-        return version
-      } else {
-        version = version.version;
-      }
-    } else if (typeof version !== 'string') {
-      throw new TypeError(`Invalid version. Must be a string. Got type "${typeof version}".`)
-    }
-    if (version.length > MAX_LENGTH) {
-      throw new TypeError(
-        `version is longer than ${MAX_LENGTH} characters`
-      )
-    }
-    debug('SemVer', version, options);
-    this.options = options;
-    this.loose = !!options.loose;
-    this.includePrerelease = !!options.includePrerelease;
-    const m = version.trim().match(options.loose ? re[t.LOOSE] : re[t.FULL]);
-    if (!m) {
-      throw new TypeError(`Invalid Version: ${version}`)
-    }
-    this.raw = version;
-    this.major = +m[1];
-    this.minor = +m[2];
-    this.patch = +m[3];
-    if (this.major > MAX_SAFE_INTEGER || this.major < 0) {
-      throw new TypeError('Invalid major version')
-    }
-    if (this.minor > MAX_SAFE_INTEGER || this.minor < 0) {
-      throw new TypeError('Invalid minor version')
-    }
-    if (this.patch > MAX_SAFE_INTEGER || this.patch < 0) {
-      throw new TypeError('Invalid patch version')
-    }
-    if (!m[4]) {
-      this.prerelease = [];
-    } else {
-      this.prerelease = m[4].split('.').map((id) => {
-        if (/^[0-9]+$/.test(id)) {
-          const num = +id;
-          if (num >= 0 && num < MAX_SAFE_INTEGER) {
-            return num
-          }
-        }
-        return id
-      });
-    }
-    this.build = m[5] ? m[5].split('.') : [];
-    this.format();
-  }
-  format () {
-    this.version = `${this.major}.${this.minor}.${this.patch}`;
-    if (this.prerelease.length) {
-      this.version += `-${this.prerelease.join('.')}`;
-    }
-    return this.version
-  }
-  toString () {
-    return this.version
-  }
-  compare (other) {
-    debug('SemVer.compare', this.version, this.options, other);
-    if (!(other instanceof SemVer)) {
-      if (typeof other === 'string' && other === this.version) {
-        return 0
-      }
-      other = new SemVer(other, this.options);
-    }
-    if (other.version === this.version) {
-      return 0
-    }
-    return this.compareMain(other) || this.comparePre(other)
-  }
-  compareMain (other) {
-    if (!(other instanceof SemVer)) {
-      other = new SemVer(other, this.options);
-    }
-    return (
-      compareIdentifiers(this.major, other.major) ||
-      compareIdentifiers(this.minor, other.minor) ||
-      compareIdentifiers(this.patch, other.patch)
-    )
-  }
-  comparePre (other) {
-    if (!(other instanceof SemVer)) {
-      other = new SemVer(other, this.options);
-    }
-    if (this.prerelease.length && !other.prerelease.length) {
-      return -1
-    } else if (!this.prerelease.length && other.prerelease.length) {
-      return 1
-    } else if (!this.prerelease.length && !other.prerelease.length) {
-      return 0
-    }
-    let i = 0;
-    do {
-      const a = this.prerelease[i];
-      const b = other.prerelease[i];
-      debug('prerelease compare', i, a, b);
-      if (a === undefined && b === undefined) {
-        return 0
-      } else if (b === undefined) {
-        return 1
-      } else if (a === undefined) {
-        return -1
-      } else if (a === b) {
-        continue
-      } else {
-        return compareIdentifiers(a, b)
-      }
-    } while (++i)
-  }
-  compareBuild (other) {
-    if (!(other instanceof SemVer)) {
-      other = new SemVer(other, this.options);
-    }
-    let i = 0;
-    do {
-      const a = this.build[i];
-      const b = other.build[i];
-      debug('build compare', i, a, b);
-      if (a === undefined && b === undefined) {
-        return 0
-      } else if (b === undefined) {
-        return 1
-      } else if (a === undefined) {
-        return -1
-      } else if (a === b) {
-        continue
-      } else {
-        return compareIdentifiers(a, b)
-      }
-    } while (++i)
-  }
-  inc (release, identifier, identifierBase) {
-    switch (release) {
-      case 'premajor':
-        this.prerelease.length = 0;
-        this.patch = 0;
-        this.minor = 0;
-        this.major++;
-        this.inc('pre', identifier, identifierBase);
-        break
-      case 'preminor':
-        this.prerelease.length = 0;
-        this.patch = 0;
-        this.minor++;
-        this.inc('pre', identifier, identifierBase);
-        break
-      case 'prepatch':
-        this.prerelease.length = 0;
-        this.inc('patch', identifier, identifierBase);
-        this.inc('pre', identifier, identifierBase);
-        break
-      case 'prerelease':
-        if (this.prerelease.length === 0) {
-          this.inc('patch', identifier, identifierBase);
-        }
-        this.inc('pre', identifier, identifierBase);
-        break
-      case 'major':
-        if (
-          this.minor !== 0 ||
-          this.patch !== 0 ||
-          this.prerelease.length === 0
-        ) {
-          this.major++;
-        }
-        this.minor = 0;
-        this.patch = 0;
-        this.prerelease = [];
-        break
-      case 'minor':
-        if (this.patch !== 0 || this.prerelease.length === 0) {
-          this.minor++;
-        }
-        this.patch = 0;
-        this.prerelease = [];
-        break
-      case 'patch':
-        if (this.prerelease.length === 0) {
-          this.patch++;
-        }
-        this.prerelease = [];
-        break
-      case 'pre': {
-        const base = Number(identifierBase) ? 1 : 0;
-        if (!identifier && identifierBase === false) {
-          throw new Error('invalid increment argument: identifier is empty')
-        }
-        if (this.prerelease.length === 0) {
-          this.prerelease = [base];
-        } else {
-          let i = this.prerelease.length;
-          while (--i >= 0) {
-            if (typeof this.prerelease[i] === 'number') {
-              this.prerelease[i]++;
-              i = -2;
-            }
-          }
-          if (i === -1) {
-            if (identifier === this.prerelease.join('.') && identifierBase === false) {
-              throw new Error('invalid increment argument: identifier already exists')
-            }
-            this.prerelease.push(base);
-          }
-        }
-        if (identifier) {
-          let prerelease = [identifier, base];
-          if (identifierBase === false) {
-            prerelease = [identifier];
-          }
-          if (compareIdentifiers(this.prerelease[0], identifier) === 0) {
-            if (isNaN(this.prerelease[1])) {
-              this.prerelease = prerelease;
-            }
-          } else {
-            this.prerelease = prerelease;
-          }
-        }
-        break
-      }
-      default:
-        throw new Error(`invalid increment argument: ${release}`)
-    }
-    this.raw = this.format();
-    if (this.build.length) {
-      this.raw += `+${this.build.join('.')}`;
-    }
-    return this
-  }
-};
-var semver = SemVer$2;
-getDefaultExportFromCjs(semver);
+var parseExports = requireParse();
+var semverParse = /*@__PURE__*/getDefaultExportFromCjs(parseExports);
 
-const SemVer$1 = semver;
-const parse = (version, options, throwErrors = false) => {
-  if (version instanceof SemVer$1) {
-    return version
-  }
-  try {
-    return new SemVer$1(version, options)
-  } catch (er) {
-    if (!throwErrors) {
-      return null
-    }
-    throw er
-  }
-};
-var parse_1 = parse;
-var semverParse = getDefaultExportFromCjs(parse_1);
+var compare_1;
+var hasRequiredCompare;
+function requireCompare () {
+	if (hasRequiredCompare) return compare_1;
+	hasRequiredCompare = 1;
+	const SemVer = requireSemver();
+	const compare = (a, b, loose) =>
+	  new SemVer(a, loose).compare(new SemVer(b, loose));
+	compare_1 = compare;
+	return compare_1;
+}
 
-const SemVer = semver;
-const compare$1 = (a, b, loose) =>
-  new SemVer(a, loose).compare(new SemVer(b, loose));
-var compare_1 = compare$1;
-getDefaultExportFromCjs(compare_1);
+var lt_1;
+var hasRequiredLt;
+function requireLt () {
+	if (hasRequiredLt) return lt_1;
+	hasRequiredLt = 1;
+	const compare = requireCompare();
+	const lt = (a, b, loose) => compare(a, b, loose) < 0;
+	lt_1 = lt;
+	return lt_1;
+}
 
-const compare = compare_1;
-const lt = (a, b, loose) => compare(a, b, loose) < 0;
-var lt_1 = lt;
-var semverLt = getDefaultExportFromCjs(lt_1);
+var ltExports = requireLt();
+var semverLt = /*@__PURE__*/getDefaultExportFromCjs(ltExports);
 
 const allowedKeys = [
   "added",
@@ -23139,7 +22910,7 @@ const plugins = [
   remarkLintNoShellDollars,
   remarkLintNoTableIndentation,
   remarkLintNoTabs,
-  remarkLintNoTrailingSpaces$1,
+  rule,
   remarkLintNodejsLinks,
   remarkLintNodejsYamlComments,
   [
@@ -23184,7 +22955,7 @@ function read(description, options, callback) {
   function executor(resolve, reject) {
     let fp;
     try {
-      fp = path$1.resolve(file.cwd, file.path);
+      fp = minpath.resolve(file.cwd, file.path);
     } catch (error) {
       const exception =  (error);
       return reject(exception)
@@ -23226,9 +22997,10 @@ function isUint8Array(value) {
 }
 
 function ansiRegex({onlyFirst = false} = {}) {
+	const ST = '(?:\\u0007|\\u001B\\u005C|\\u009C)';
 	const pattern = [
-	    '[\\u001B\\u009B][[\\]()#;?]*(?:(?:(?:(?:;[-a-zA-Z\\d\\/#&.:=?%@~_]+)*|[a-zA-Z\\d]+(?:;[-a-zA-Z\\d\\/#&.:=?%@~_]*)*)?\\u0007)',
-		'(?:(?:\\d{1,4}(?:;\\d{0,4})*)?[\\dA-PR-TZcf-ntqry=><~]))'
+		`[\\u001B\\u009B][[\\]()#;?]*(?:(?:(?:(?:;[-a-zA-Z\\d\\/#&.:=?%@~_]+)*|[a-zA-Z\\d]+(?:;[-a-zA-Z\\d\\/#&.:=?%@~_]*)*)?${ST})`,
+		'(?:(?:\\d{1,4}(?:;\\d{0,4})*)?[\\dA-PR-TZcf-nq-uy=><~]))',
 	].join('|');
 	return new RegExp(pattern, onlyFirst ? undefined : 'g');
 }
@@ -23243,313 +23015,320 @@ function stripAnsi(string) {
 
 var eastasianwidth = {exports: {}};
 
-(function (module) {
-	var eaw = {};
-	{
-	  module.exports = eaw;
-	}
-	eaw.eastAsianWidth = function(character) {
-	  var x = character.charCodeAt(0);
-	  var y = (character.length == 2) ? character.charCodeAt(1) : 0;
-	  var codePoint = x;
-	  if ((0xD800 <= x && x <= 0xDBFF) && (0xDC00 <= y && y <= 0xDFFF)) {
-	    x &= 0x3FF;
-	    y &= 0x3FF;
-	    codePoint = (x << 10) | y;
-	    codePoint += 0x10000;
-	  }
-	  if ((0x3000 == codePoint) ||
-	      (0xFF01 <= codePoint && codePoint <= 0xFF60) ||
-	      (0xFFE0 <= codePoint && codePoint <= 0xFFE6)) {
-	    return 'F';
-	  }
-	  if ((0x20A9 == codePoint) ||
-	      (0xFF61 <= codePoint && codePoint <= 0xFFBE) ||
-	      (0xFFC2 <= codePoint && codePoint <= 0xFFC7) ||
-	      (0xFFCA <= codePoint && codePoint <= 0xFFCF) ||
-	      (0xFFD2 <= codePoint && codePoint <= 0xFFD7) ||
-	      (0xFFDA <= codePoint && codePoint <= 0xFFDC) ||
-	      (0xFFE8 <= codePoint && codePoint <= 0xFFEE)) {
-	    return 'H';
-	  }
-	  if ((0x1100 <= codePoint && codePoint <= 0x115F) ||
-	      (0x11A3 <= codePoint && codePoint <= 0x11A7) ||
-	      (0x11FA <= codePoint && codePoint <= 0x11FF) ||
-	      (0x2329 <= codePoint && codePoint <= 0x232A) ||
-	      (0x2E80 <= codePoint && codePoint <= 0x2E99) ||
-	      (0x2E9B <= codePoint && codePoint <= 0x2EF3) ||
-	      (0x2F00 <= codePoint && codePoint <= 0x2FD5) ||
-	      (0x2FF0 <= codePoint && codePoint <= 0x2FFB) ||
-	      (0x3001 <= codePoint && codePoint <= 0x303E) ||
-	      (0x3041 <= codePoint && codePoint <= 0x3096) ||
-	      (0x3099 <= codePoint && codePoint <= 0x30FF) ||
-	      (0x3105 <= codePoint && codePoint <= 0x312D) ||
-	      (0x3131 <= codePoint && codePoint <= 0x318E) ||
-	      (0x3190 <= codePoint && codePoint <= 0x31BA) ||
-	      (0x31C0 <= codePoint && codePoint <= 0x31E3) ||
-	      (0x31F0 <= codePoint && codePoint <= 0x321E) ||
-	      (0x3220 <= codePoint && codePoint <= 0x3247) ||
-	      (0x3250 <= codePoint && codePoint <= 0x32FE) ||
-	      (0x3300 <= codePoint && codePoint <= 0x4DBF) ||
-	      (0x4E00 <= codePoint && codePoint <= 0xA48C) ||
-	      (0xA490 <= codePoint && codePoint <= 0xA4C6) ||
-	      (0xA960 <= codePoint && codePoint <= 0xA97C) ||
-	      (0xAC00 <= codePoint && codePoint <= 0xD7A3) ||
-	      (0xD7B0 <= codePoint && codePoint <= 0xD7C6) ||
-	      (0xD7CB <= codePoint && codePoint <= 0xD7FB) ||
-	      (0xF900 <= codePoint && codePoint <= 0xFAFF) ||
-	      (0xFE10 <= codePoint && codePoint <= 0xFE19) ||
-	      (0xFE30 <= codePoint && codePoint <= 0xFE52) ||
-	      (0xFE54 <= codePoint && codePoint <= 0xFE66) ||
-	      (0xFE68 <= codePoint && codePoint <= 0xFE6B) ||
-	      (0x1B000 <= codePoint && codePoint <= 0x1B001) ||
-	      (0x1F200 <= codePoint && codePoint <= 0x1F202) ||
-	      (0x1F210 <= codePoint && codePoint <= 0x1F23A) ||
-	      (0x1F240 <= codePoint && codePoint <= 0x1F248) ||
-	      (0x1F250 <= codePoint && codePoint <= 0x1F251) ||
-	      (0x20000 <= codePoint && codePoint <= 0x2F73F) ||
-	      (0x2B740 <= codePoint && codePoint <= 0x2FFFD) ||
-	      (0x30000 <= codePoint && codePoint <= 0x3FFFD)) {
-	    return 'W';
-	  }
-	  if ((0x0020 <= codePoint && codePoint <= 0x007E) ||
-	      (0x00A2 <= codePoint && codePoint <= 0x00A3) ||
-	      (0x00A5 <= codePoint && codePoint <= 0x00A6) ||
-	      (0x00AC == codePoint) ||
-	      (0x00AF == codePoint) ||
-	      (0x27E6 <= codePoint && codePoint <= 0x27ED) ||
-	      (0x2985 <= codePoint && codePoint <= 0x2986)) {
-	    return 'Na';
-	  }
-	  if ((0x00A1 == codePoint) ||
-	      (0x00A4 == codePoint) ||
-	      (0x00A7 <= codePoint && codePoint <= 0x00A8) ||
-	      (0x00AA == codePoint) ||
-	      (0x00AD <= codePoint && codePoint <= 0x00AE) ||
-	      (0x00B0 <= codePoint && codePoint <= 0x00B4) ||
-	      (0x00B6 <= codePoint && codePoint <= 0x00BA) ||
-	      (0x00BC <= codePoint && codePoint <= 0x00BF) ||
-	      (0x00C6 == codePoint) ||
-	      (0x00D0 == codePoint) ||
-	      (0x00D7 <= codePoint && codePoint <= 0x00D8) ||
-	      (0x00DE <= codePoint && codePoint <= 0x00E1) ||
-	      (0x00E6 == codePoint) ||
-	      (0x00E8 <= codePoint && codePoint <= 0x00EA) ||
-	      (0x00EC <= codePoint && codePoint <= 0x00ED) ||
-	      (0x00F0 == codePoint) ||
-	      (0x00F2 <= codePoint && codePoint <= 0x00F3) ||
-	      (0x00F7 <= codePoint && codePoint <= 0x00FA) ||
-	      (0x00FC == codePoint) ||
-	      (0x00FE == codePoint) ||
-	      (0x0101 == codePoint) ||
-	      (0x0111 == codePoint) ||
-	      (0x0113 == codePoint) ||
-	      (0x011B == codePoint) ||
-	      (0x0126 <= codePoint && codePoint <= 0x0127) ||
-	      (0x012B == codePoint) ||
-	      (0x0131 <= codePoint && codePoint <= 0x0133) ||
-	      (0x0138 == codePoint) ||
-	      (0x013F <= codePoint && codePoint <= 0x0142) ||
-	      (0x0144 == codePoint) ||
-	      (0x0148 <= codePoint && codePoint <= 0x014B) ||
-	      (0x014D == codePoint) ||
-	      (0x0152 <= codePoint && codePoint <= 0x0153) ||
-	      (0x0166 <= codePoint && codePoint <= 0x0167) ||
-	      (0x016B == codePoint) ||
-	      (0x01CE == codePoint) ||
-	      (0x01D0 == codePoint) ||
-	      (0x01D2 == codePoint) ||
-	      (0x01D4 == codePoint) ||
-	      (0x01D6 == codePoint) ||
-	      (0x01D8 == codePoint) ||
-	      (0x01DA == codePoint) ||
-	      (0x01DC == codePoint) ||
-	      (0x0251 == codePoint) ||
-	      (0x0261 == codePoint) ||
-	      (0x02C4 == codePoint) ||
-	      (0x02C7 == codePoint) ||
-	      (0x02C9 <= codePoint && codePoint <= 0x02CB) ||
-	      (0x02CD == codePoint) ||
-	      (0x02D0 == codePoint) ||
-	      (0x02D8 <= codePoint && codePoint <= 0x02DB) ||
-	      (0x02DD == codePoint) ||
-	      (0x02DF == codePoint) ||
-	      (0x0300 <= codePoint && codePoint <= 0x036F) ||
-	      (0x0391 <= codePoint && codePoint <= 0x03A1) ||
-	      (0x03A3 <= codePoint && codePoint <= 0x03A9) ||
-	      (0x03B1 <= codePoint && codePoint <= 0x03C1) ||
-	      (0x03C3 <= codePoint && codePoint <= 0x03C9) ||
-	      (0x0401 == codePoint) ||
-	      (0x0410 <= codePoint && codePoint <= 0x044F) ||
-	      (0x0451 == codePoint) ||
-	      (0x2010 == codePoint) ||
-	      (0x2013 <= codePoint && codePoint <= 0x2016) ||
-	      (0x2018 <= codePoint && codePoint <= 0x2019) ||
-	      (0x201C <= codePoint && codePoint <= 0x201D) ||
-	      (0x2020 <= codePoint && codePoint <= 0x2022) ||
-	      (0x2024 <= codePoint && codePoint <= 0x2027) ||
-	      (0x2030 == codePoint) ||
-	      (0x2032 <= codePoint && codePoint <= 0x2033) ||
-	      (0x2035 == codePoint) ||
-	      (0x203B == codePoint) ||
-	      (0x203E == codePoint) ||
-	      (0x2074 == codePoint) ||
-	      (0x207F == codePoint) ||
-	      (0x2081 <= codePoint && codePoint <= 0x2084) ||
-	      (0x20AC == codePoint) ||
-	      (0x2103 == codePoint) ||
-	      (0x2105 == codePoint) ||
-	      (0x2109 == codePoint) ||
-	      (0x2113 == codePoint) ||
-	      (0x2116 == codePoint) ||
-	      (0x2121 <= codePoint && codePoint <= 0x2122) ||
-	      (0x2126 == codePoint) ||
-	      (0x212B == codePoint) ||
-	      (0x2153 <= codePoint && codePoint <= 0x2154) ||
-	      (0x215B <= codePoint && codePoint <= 0x215E) ||
-	      (0x2160 <= codePoint && codePoint <= 0x216B) ||
-	      (0x2170 <= codePoint && codePoint <= 0x2179) ||
-	      (0x2189 == codePoint) ||
-	      (0x2190 <= codePoint && codePoint <= 0x2199) ||
-	      (0x21B8 <= codePoint && codePoint <= 0x21B9) ||
-	      (0x21D2 == codePoint) ||
-	      (0x21D4 == codePoint) ||
-	      (0x21E7 == codePoint) ||
-	      (0x2200 == codePoint) ||
-	      (0x2202 <= codePoint && codePoint <= 0x2203) ||
-	      (0x2207 <= codePoint && codePoint <= 0x2208) ||
-	      (0x220B == codePoint) ||
-	      (0x220F == codePoint) ||
-	      (0x2211 == codePoint) ||
-	      (0x2215 == codePoint) ||
-	      (0x221A == codePoint) ||
-	      (0x221D <= codePoint && codePoint <= 0x2220) ||
-	      (0x2223 == codePoint) ||
-	      (0x2225 == codePoint) ||
-	      (0x2227 <= codePoint && codePoint <= 0x222C) ||
-	      (0x222E == codePoint) ||
-	      (0x2234 <= codePoint && codePoint <= 0x2237) ||
-	      (0x223C <= codePoint && codePoint <= 0x223D) ||
-	      (0x2248 == codePoint) ||
-	      (0x224C == codePoint) ||
-	      (0x2252 == codePoint) ||
-	      (0x2260 <= codePoint && codePoint <= 0x2261) ||
-	      (0x2264 <= codePoint && codePoint <= 0x2267) ||
-	      (0x226A <= codePoint && codePoint <= 0x226B) ||
-	      (0x226E <= codePoint && codePoint <= 0x226F) ||
-	      (0x2282 <= codePoint && codePoint <= 0x2283) ||
-	      (0x2286 <= codePoint && codePoint <= 0x2287) ||
-	      (0x2295 == codePoint) ||
-	      (0x2299 == codePoint) ||
-	      (0x22A5 == codePoint) ||
-	      (0x22BF == codePoint) ||
-	      (0x2312 == codePoint) ||
-	      (0x2460 <= codePoint && codePoint <= 0x24E9) ||
-	      (0x24EB <= codePoint && codePoint <= 0x254B) ||
-	      (0x2550 <= codePoint && codePoint <= 0x2573) ||
-	      (0x2580 <= codePoint && codePoint <= 0x258F) ||
-	      (0x2592 <= codePoint && codePoint <= 0x2595) ||
-	      (0x25A0 <= codePoint && codePoint <= 0x25A1) ||
-	      (0x25A3 <= codePoint && codePoint <= 0x25A9) ||
-	      (0x25B2 <= codePoint && codePoint <= 0x25B3) ||
-	      (0x25B6 <= codePoint && codePoint <= 0x25B7) ||
-	      (0x25BC <= codePoint && codePoint <= 0x25BD) ||
-	      (0x25C0 <= codePoint && codePoint <= 0x25C1) ||
-	      (0x25C6 <= codePoint && codePoint <= 0x25C8) ||
-	      (0x25CB == codePoint) ||
-	      (0x25CE <= codePoint && codePoint <= 0x25D1) ||
-	      (0x25E2 <= codePoint && codePoint <= 0x25E5) ||
-	      (0x25EF == codePoint) ||
-	      (0x2605 <= codePoint && codePoint <= 0x2606) ||
-	      (0x2609 == codePoint) ||
-	      (0x260E <= codePoint && codePoint <= 0x260F) ||
-	      (0x2614 <= codePoint && codePoint <= 0x2615) ||
-	      (0x261C == codePoint) ||
-	      (0x261E == codePoint) ||
-	      (0x2640 == codePoint) ||
-	      (0x2642 == codePoint) ||
-	      (0x2660 <= codePoint && codePoint <= 0x2661) ||
-	      (0x2663 <= codePoint && codePoint <= 0x2665) ||
-	      (0x2667 <= codePoint && codePoint <= 0x266A) ||
-	      (0x266C <= codePoint && codePoint <= 0x266D) ||
-	      (0x266F == codePoint) ||
-	      (0x269E <= codePoint && codePoint <= 0x269F) ||
-	      (0x26BE <= codePoint && codePoint <= 0x26BF) ||
-	      (0x26C4 <= codePoint && codePoint <= 0x26CD) ||
-	      (0x26CF <= codePoint && codePoint <= 0x26E1) ||
-	      (0x26E3 == codePoint) ||
-	      (0x26E8 <= codePoint && codePoint <= 0x26FF) ||
-	      (0x273D == codePoint) ||
-	      (0x2757 == codePoint) ||
-	      (0x2776 <= codePoint && codePoint <= 0x277F) ||
-	      (0x2B55 <= codePoint && codePoint <= 0x2B59) ||
-	      (0x3248 <= codePoint && codePoint <= 0x324F) ||
-	      (0xE000 <= codePoint && codePoint <= 0xF8FF) ||
-	      (0xFE00 <= codePoint && codePoint <= 0xFE0F) ||
-	      (0xFFFD == codePoint) ||
-	      (0x1F100 <= codePoint && codePoint <= 0x1F10A) ||
-	      (0x1F110 <= codePoint && codePoint <= 0x1F12D) ||
-	      (0x1F130 <= codePoint && codePoint <= 0x1F169) ||
-	      (0x1F170 <= codePoint && codePoint <= 0x1F19A) ||
-	      (0xE0100 <= codePoint && codePoint <= 0xE01EF) ||
-	      (0xF0000 <= codePoint && codePoint <= 0xFFFFD) ||
-	      (0x100000 <= codePoint && codePoint <= 0x10FFFD)) {
-	    return 'A';
-	  }
-	  return 'N';
-	};
-	eaw.characterLength = function(character) {
-	  var code = this.eastAsianWidth(character);
-	  if (code == 'F' || code == 'W' || code == 'A') {
-	    return 2;
-	  } else {
-	    return 1;
-	  }
-	};
-	function stringToArray(string) {
-	  return string.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\uD800-\uDFFF]/g) || [];
-	}
-	eaw.length = function(string) {
-	  var characters = stringToArray(string);
-	  var len = 0;
-	  for (var i = 0; i < characters.length; i++) {
-	    len = len + this.characterLength(characters[i]);
-	  }
-	  return len;
-	};
-	eaw.slice = function(text, start, end) {
-	  textLen = eaw.length(text);
-	  start = start ? start : 0;
-	  end = end ? end : 1;
-	  if (start < 0) {
-	      start = textLen + start;
-	  }
-	  if (end < 0) {
-	      end = textLen + end;
-	  }
-	  var result = '';
-	  var eawLen = 0;
-	  var chars = stringToArray(text);
-	  for (var i = 0; i < chars.length; i++) {
-	    var char = chars[i];
-	    var charLen = eaw.length(char);
-	    if (eawLen >= start - (charLen == 2 ? 1 : 0)) {
-	        if (eawLen + charLen <= end) {
-	            result += char;
-	        } else {
-	            break;
-	        }
-	    }
-	    eawLen += charLen;
-	  }
-	  return result;
-	};
-} (eastasianwidth));
-var eastasianwidthExports = eastasianwidth.exports;
-var eastAsianWidth = getDefaultExportFromCjs(eastasianwidthExports);
+var hasRequiredEastasianwidth;
+function requireEastasianwidth () {
+	if (hasRequiredEastasianwidth) return eastasianwidth.exports;
+	hasRequiredEastasianwidth = 1;
+	(function (module) {
+		var eaw = {};
+		{
+		  module.exports = eaw;
+		}
+		eaw.eastAsianWidth = function(character) {
+		  var x = character.charCodeAt(0);
+		  var y = (character.length == 2) ? character.charCodeAt(1) : 0;
+		  var codePoint = x;
+		  if ((0xD800 <= x && x <= 0xDBFF) && (0xDC00 <= y && y <= 0xDFFF)) {
+		    x &= 0x3FF;
+		    y &= 0x3FF;
+		    codePoint = (x << 10) | y;
+		    codePoint += 0x10000;
+		  }
+		  if ((0x3000 == codePoint) ||
+		      (0xFF01 <= codePoint && codePoint <= 0xFF60) ||
+		      (0xFFE0 <= codePoint && codePoint <= 0xFFE6)) {
+		    return 'F';
+		  }
+		  if ((0x20A9 == codePoint) ||
+		      (0xFF61 <= codePoint && codePoint <= 0xFFBE) ||
+		      (0xFFC2 <= codePoint && codePoint <= 0xFFC7) ||
+		      (0xFFCA <= codePoint && codePoint <= 0xFFCF) ||
+		      (0xFFD2 <= codePoint && codePoint <= 0xFFD7) ||
+		      (0xFFDA <= codePoint && codePoint <= 0xFFDC) ||
+		      (0xFFE8 <= codePoint && codePoint <= 0xFFEE)) {
+		    return 'H';
+		  }
+		  if ((0x1100 <= codePoint && codePoint <= 0x115F) ||
+		      (0x11A3 <= codePoint && codePoint <= 0x11A7) ||
+		      (0x11FA <= codePoint && codePoint <= 0x11FF) ||
+		      (0x2329 <= codePoint && codePoint <= 0x232A) ||
+		      (0x2E80 <= codePoint && codePoint <= 0x2E99) ||
+		      (0x2E9B <= codePoint && codePoint <= 0x2EF3) ||
+		      (0x2F00 <= codePoint && codePoint <= 0x2FD5) ||
+		      (0x2FF0 <= codePoint && codePoint <= 0x2FFB) ||
+		      (0x3001 <= codePoint && codePoint <= 0x303E) ||
+		      (0x3041 <= codePoint && codePoint <= 0x3096) ||
+		      (0x3099 <= codePoint && codePoint <= 0x30FF) ||
+		      (0x3105 <= codePoint && codePoint <= 0x312D) ||
+		      (0x3131 <= codePoint && codePoint <= 0x318E) ||
+		      (0x3190 <= codePoint && codePoint <= 0x31BA) ||
+		      (0x31C0 <= codePoint && codePoint <= 0x31E3) ||
+		      (0x31F0 <= codePoint && codePoint <= 0x321E) ||
+		      (0x3220 <= codePoint && codePoint <= 0x3247) ||
+		      (0x3250 <= codePoint && codePoint <= 0x32FE) ||
+		      (0x3300 <= codePoint && codePoint <= 0x4DBF) ||
+		      (0x4E00 <= codePoint && codePoint <= 0xA48C) ||
+		      (0xA490 <= codePoint && codePoint <= 0xA4C6) ||
+		      (0xA960 <= codePoint && codePoint <= 0xA97C) ||
+		      (0xAC00 <= codePoint && codePoint <= 0xD7A3) ||
+		      (0xD7B0 <= codePoint && codePoint <= 0xD7C6) ||
+		      (0xD7CB <= codePoint && codePoint <= 0xD7FB) ||
+		      (0xF900 <= codePoint && codePoint <= 0xFAFF) ||
+		      (0xFE10 <= codePoint && codePoint <= 0xFE19) ||
+		      (0xFE30 <= codePoint && codePoint <= 0xFE52) ||
+		      (0xFE54 <= codePoint && codePoint <= 0xFE66) ||
+		      (0xFE68 <= codePoint && codePoint <= 0xFE6B) ||
+		      (0x1B000 <= codePoint && codePoint <= 0x1B001) ||
+		      (0x1F200 <= codePoint && codePoint <= 0x1F202) ||
+		      (0x1F210 <= codePoint && codePoint <= 0x1F23A) ||
+		      (0x1F240 <= codePoint && codePoint <= 0x1F248) ||
+		      (0x1F250 <= codePoint && codePoint <= 0x1F251) ||
+		      (0x20000 <= codePoint && codePoint <= 0x2F73F) ||
+		      (0x2B740 <= codePoint && codePoint <= 0x2FFFD) ||
+		      (0x30000 <= codePoint && codePoint <= 0x3FFFD)) {
+		    return 'W';
+		  }
+		  if ((0x0020 <= codePoint && codePoint <= 0x007E) ||
+		      (0x00A2 <= codePoint && codePoint <= 0x00A3) ||
+		      (0x00A5 <= codePoint && codePoint <= 0x00A6) ||
+		      (0x00AC == codePoint) ||
+		      (0x00AF == codePoint) ||
+		      (0x27E6 <= codePoint && codePoint <= 0x27ED) ||
+		      (0x2985 <= codePoint && codePoint <= 0x2986)) {
+		    return 'Na';
+		  }
+		  if ((0x00A1 == codePoint) ||
+		      (0x00A4 == codePoint) ||
+		      (0x00A7 <= codePoint && codePoint <= 0x00A8) ||
+		      (0x00AA == codePoint) ||
+		      (0x00AD <= codePoint && codePoint <= 0x00AE) ||
+		      (0x00B0 <= codePoint && codePoint <= 0x00B4) ||
+		      (0x00B6 <= codePoint && codePoint <= 0x00BA) ||
+		      (0x00BC <= codePoint && codePoint <= 0x00BF) ||
+		      (0x00C6 == codePoint) ||
+		      (0x00D0 == codePoint) ||
+		      (0x00D7 <= codePoint && codePoint <= 0x00D8) ||
+		      (0x00DE <= codePoint && codePoint <= 0x00E1) ||
+		      (0x00E6 == codePoint) ||
+		      (0x00E8 <= codePoint && codePoint <= 0x00EA) ||
+		      (0x00EC <= codePoint && codePoint <= 0x00ED) ||
+		      (0x00F0 == codePoint) ||
+		      (0x00F2 <= codePoint && codePoint <= 0x00F3) ||
+		      (0x00F7 <= codePoint && codePoint <= 0x00FA) ||
+		      (0x00FC == codePoint) ||
+		      (0x00FE == codePoint) ||
+		      (0x0101 == codePoint) ||
+		      (0x0111 == codePoint) ||
+		      (0x0113 == codePoint) ||
+		      (0x011B == codePoint) ||
+		      (0x0126 <= codePoint && codePoint <= 0x0127) ||
+		      (0x012B == codePoint) ||
+		      (0x0131 <= codePoint && codePoint <= 0x0133) ||
+		      (0x0138 == codePoint) ||
+		      (0x013F <= codePoint && codePoint <= 0x0142) ||
+		      (0x0144 == codePoint) ||
+		      (0x0148 <= codePoint && codePoint <= 0x014B) ||
+		      (0x014D == codePoint) ||
+		      (0x0152 <= codePoint && codePoint <= 0x0153) ||
+		      (0x0166 <= codePoint && codePoint <= 0x0167) ||
+		      (0x016B == codePoint) ||
+		      (0x01CE == codePoint) ||
+		      (0x01D0 == codePoint) ||
+		      (0x01D2 == codePoint) ||
+		      (0x01D4 == codePoint) ||
+		      (0x01D6 == codePoint) ||
+		      (0x01D8 == codePoint) ||
+		      (0x01DA == codePoint) ||
+		      (0x01DC == codePoint) ||
+		      (0x0251 == codePoint) ||
+		      (0x0261 == codePoint) ||
+		      (0x02C4 == codePoint) ||
+		      (0x02C7 == codePoint) ||
+		      (0x02C9 <= codePoint && codePoint <= 0x02CB) ||
+		      (0x02CD == codePoint) ||
+		      (0x02D0 == codePoint) ||
+		      (0x02D8 <= codePoint && codePoint <= 0x02DB) ||
+		      (0x02DD == codePoint) ||
+		      (0x02DF == codePoint) ||
+		      (0x0300 <= codePoint && codePoint <= 0x036F) ||
+		      (0x0391 <= codePoint && codePoint <= 0x03A1) ||
+		      (0x03A3 <= codePoint && codePoint <= 0x03A9) ||
+		      (0x03B1 <= codePoint && codePoint <= 0x03C1) ||
+		      (0x03C3 <= codePoint && codePoint <= 0x03C9) ||
+		      (0x0401 == codePoint) ||
+		      (0x0410 <= codePoint && codePoint <= 0x044F) ||
+		      (0x0451 == codePoint) ||
+		      (0x2010 == codePoint) ||
+		      (0x2013 <= codePoint && codePoint <= 0x2016) ||
+		      (0x2018 <= codePoint && codePoint <= 0x2019) ||
+		      (0x201C <= codePoint && codePoint <= 0x201D) ||
+		      (0x2020 <= codePoint && codePoint <= 0x2022) ||
+		      (0x2024 <= codePoint && codePoint <= 0x2027) ||
+		      (0x2030 == codePoint) ||
+		      (0x2032 <= codePoint && codePoint <= 0x2033) ||
+		      (0x2035 == codePoint) ||
+		      (0x203B == codePoint) ||
+		      (0x203E == codePoint) ||
+		      (0x2074 == codePoint) ||
+		      (0x207F == codePoint) ||
+		      (0x2081 <= codePoint && codePoint <= 0x2084) ||
+		      (0x20AC == codePoint) ||
+		      (0x2103 == codePoint) ||
+		      (0x2105 == codePoint) ||
+		      (0x2109 == codePoint) ||
+		      (0x2113 == codePoint) ||
+		      (0x2116 == codePoint) ||
+		      (0x2121 <= codePoint && codePoint <= 0x2122) ||
+		      (0x2126 == codePoint) ||
+		      (0x212B == codePoint) ||
+		      (0x2153 <= codePoint && codePoint <= 0x2154) ||
+		      (0x215B <= codePoint && codePoint <= 0x215E) ||
+		      (0x2160 <= codePoint && codePoint <= 0x216B) ||
+		      (0x2170 <= codePoint && codePoint <= 0x2179) ||
+		      (0x2189 == codePoint) ||
+		      (0x2190 <= codePoint && codePoint <= 0x2199) ||
+		      (0x21B8 <= codePoint && codePoint <= 0x21B9) ||
+		      (0x21D2 == codePoint) ||
+		      (0x21D4 == codePoint) ||
+		      (0x21E7 == codePoint) ||
+		      (0x2200 == codePoint) ||
+		      (0x2202 <= codePoint && codePoint <= 0x2203) ||
+		      (0x2207 <= codePoint && codePoint <= 0x2208) ||
+		      (0x220B == codePoint) ||
+		      (0x220F == codePoint) ||
+		      (0x2211 == codePoint) ||
+		      (0x2215 == codePoint) ||
+		      (0x221A == codePoint) ||
+		      (0x221D <= codePoint && codePoint <= 0x2220) ||
+		      (0x2223 == codePoint) ||
+		      (0x2225 == codePoint) ||
+		      (0x2227 <= codePoint && codePoint <= 0x222C) ||
+		      (0x222E == codePoint) ||
+		      (0x2234 <= codePoint && codePoint <= 0x2237) ||
+		      (0x223C <= codePoint && codePoint <= 0x223D) ||
+		      (0x2248 == codePoint) ||
+		      (0x224C == codePoint) ||
+		      (0x2252 == codePoint) ||
+		      (0x2260 <= codePoint && codePoint <= 0x2261) ||
+		      (0x2264 <= codePoint && codePoint <= 0x2267) ||
+		      (0x226A <= codePoint && codePoint <= 0x226B) ||
+		      (0x226E <= codePoint && codePoint <= 0x226F) ||
+		      (0x2282 <= codePoint && codePoint <= 0x2283) ||
+		      (0x2286 <= codePoint && codePoint <= 0x2287) ||
+		      (0x2295 == codePoint) ||
+		      (0x2299 == codePoint) ||
+		      (0x22A5 == codePoint) ||
+		      (0x22BF == codePoint) ||
+		      (0x2312 == codePoint) ||
+		      (0x2460 <= codePoint && codePoint <= 0x24E9) ||
+		      (0x24EB <= codePoint && codePoint <= 0x254B) ||
+		      (0x2550 <= codePoint && codePoint <= 0x2573) ||
+		      (0x2580 <= codePoint && codePoint <= 0x258F) ||
+		      (0x2592 <= codePoint && codePoint <= 0x2595) ||
+		      (0x25A0 <= codePoint && codePoint <= 0x25A1) ||
+		      (0x25A3 <= codePoint && codePoint <= 0x25A9) ||
+		      (0x25B2 <= codePoint && codePoint <= 0x25B3) ||
+		      (0x25B6 <= codePoint && codePoint <= 0x25B7) ||
+		      (0x25BC <= codePoint && codePoint <= 0x25BD) ||
+		      (0x25C0 <= codePoint && codePoint <= 0x25C1) ||
+		      (0x25C6 <= codePoint && codePoint <= 0x25C8) ||
+		      (0x25CB == codePoint) ||
+		      (0x25CE <= codePoint && codePoint <= 0x25D1) ||
+		      (0x25E2 <= codePoint && codePoint <= 0x25E5) ||
+		      (0x25EF == codePoint) ||
+		      (0x2605 <= codePoint && codePoint <= 0x2606) ||
+		      (0x2609 == codePoint) ||
+		      (0x260E <= codePoint && codePoint <= 0x260F) ||
+		      (0x2614 <= codePoint && codePoint <= 0x2615) ||
+		      (0x261C == codePoint) ||
+		      (0x261E == codePoint) ||
+		      (0x2640 == codePoint) ||
+		      (0x2642 == codePoint) ||
+		      (0x2660 <= codePoint && codePoint <= 0x2661) ||
+		      (0x2663 <= codePoint && codePoint <= 0x2665) ||
+		      (0x2667 <= codePoint && codePoint <= 0x266A) ||
+		      (0x266C <= codePoint && codePoint <= 0x266D) ||
+		      (0x266F == codePoint) ||
+		      (0x269E <= codePoint && codePoint <= 0x269F) ||
+		      (0x26BE <= codePoint && codePoint <= 0x26BF) ||
+		      (0x26C4 <= codePoint && codePoint <= 0x26CD) ||
+		      (0x26CF <= codePoint && codePoint <= 0x26E1) ||
+		      (0x26E3 == codePoint) ||
+		      (0x26E8 <= codePoint && codePoint <= 0x26FF) ||
+		      (0x273D == codePoint) ||
+		      (0x2757 == codePoint) ||
+		      (0x2776 <= codePoint && codePoint <= 0x277F) ||
+		      (0x2B55 <= codePoint && codePoint <= 0x2B59) ||
+		      (0x3248 <= codePoint && codePoint <= 0x324F) ||
+		      (0xE000 <= codePoint && codePoint <= 0xF8FF) ||
+		      (0xFE00 <= codePoint && codePoint <= 0xFE0F) ||
+		      (0xFFFD == codePoint) ||
+		      (0x1F100 <= codePoint && codePoint <= 0x1F10A) ||
+		      (0x1F110 <= codePoint && codePoint <= 0x1F12D) ||
+		      (0x1F130 <= codePoint && codePoint <= 0x1F169) ||
+		      (0x1F170 <= codePoint && codePoint <= 0x1F19A) ||
+		      (0xE0100 <= codePoint && codePoint <= 0xE01EF) ||
+		      (0xF0000 <= codePoint && codePoint <= 0xFFFFD) ||
+		      (0x100000 <= codePoint && codePoint <= 0x10FFFD)) {
+		    return 'A';
+		  }
+		  return 'N';
+		};
+		eaw.characterLength = function(character) {
+		  var code = this.eastAsianWidth(character);
+		  if (code == 'F' || code == 'W' || code == 'A') {
+		    return 2;
+		  } else {
+		    return 1;
+		  }
+		};
+		function stringToArray(string) {
+		  return string.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\uD800-\uDFFF]/g) || [];
+		}
+		eaw.length = function(string) {
+		  var characters = stringToArray(string);
+		  var len = 0;
+		  for (var i = 0; i < characters.length; i++) {
+		    len = len + this.characterLength(characters[i]);
+		  }
+		  return len;
+		};
+		eaw.slice = function(text, start, end) {
+		  textLen = eaw.length(text);
+		  start = start ? start : 0;
+		  end = end ? end : 1;
+		  if (start < 0) {
+		      start = textLen + start;
+		  }
+		  if (end < 0) {
+		      end = textLen + end;
+		  }
+		  var result = '';
+		  var eawLen = 0;
+		  var chars = stringToArray(text);
+		  for (var i = 0; i < chars.length; i++) {
+		    var char = chars[i];
+		    var charLen = eaw.length(char);
+		    if (eawLen >= start - (charLen == 2 ? 1 : 0)) {
+		        if (eawLen + charLen <= end) {
+		            result += char;
+		        } else {
+		            break;
+		        }
+		    }
+		    eawLen += charLen;
+		  }
+		  return result;
+		};
+	} (eastasianwidth));
+	return eastasianwidth.exports;
+}
+
+var eastasianwidthExports = requireEastasianwidth();
+var eastAsianWidth = /*@__PURE__*/getDefaultExportFromCjs(eastasianwidthExports);
 
 var emojiRegex = () => {
-	return /[#*0-9]\uFE0F?\u20E3|[\xA9\xAE\u203C\u2049\u2122\u2139\u2194-\u2199\u21A9\u21AA\u231A\u231B\u2328\u23CF\u23ED-\u23EF\u23F1\u23F2\u23F8-\u23FA\u24C2\u25AA\u25AB\u25B6\u25C0\u25FB\u25FC\u25FE\u2600-\u2604\u260E\u2611\u2614\u2615\u2618\u2620\u2622\u2623\u2626\u262A\u262E\u262F\u2638-\u263A\u2640\u2642\u2648-\u2653\u265F\u2660\u2663\u2665\u2666\u2668\u267B\u267E\u267F\u2692\u2694-\u2697\u2699\u269B\u269C\u26A0\u26A7\u26AA\u26B0\u26B1\u26BD\u26BE\u26C4\u26C8\u26CF\u26D1\u26E9\u26F0-\u26F5\u26F7\u26F8\u26FA\u2702\u2708\u2709\u270F\u2712\u2714\u2716\u271D\u2721\u2733\u2734\u2744\u2747\u2757\u2763\u27A1\u2934\u2935\u2B05-\u2B07\u2B1B\u2B1C\u2B55\u3030\u303D\u3297\u3299]\uFE0F?|[\u261D\u270C\u270D](?:\uFE0F|\uD83C[\uDFFB-\uDFFF])?|[\u270A\u270B](?:\uD83C[\uDFFB-\uDFFF])?|[\u23E9-\u23EC\u23F0\u23F3\u25FD\u2693\u26A1\u26AB\u26C5\u26CE\u26D4\u26EA\u26FD\u2705\u2728\u274C\u274E\u2753-\u2755\u2795-\u2797\u27B0\u27BF\u2B50]|\u26D3\uFE0F?(?:\u200D\uD83D\uDCA5)?|\u26F9(?:\uFE0F|\uD83C[\uDFFB-\uDFFF])?(?:\u200D[\u2640\u2642]\uFE0F?)?|\u2764\uFE0F?(?:\u200D(?:\uD83D\uDD25|\uD83E\uDE79))?|\uD83C(?:[\uDC04\uDD70\uDD71\uDD7E\uDD7F\uDE02\uDE37\uDF21\uDF24-\uDF2C\uDF36\uDF7D\uDF96\uDF97\uDF99-\uDF9B\uDF9E\uDF9F\uDFCD\uDFCE\uDFD4-\uDFDF\uDFF5\uDFF7]\uFE0F?|[\uDF85\uDFC2\uDFC7](?:\uD83C[\uDFFB-\uDFFF])?|[\uDFC4\uDFCA](?:\uD83C[\uDFFB-\uDFFF])?(?:\u200D[\u2640\u2642]\uFE0F?)?|[\uDFCB\uDFCC](?:\uFE0F|\uD83C[\uDFFB-\uDFFF])?(?:\u200D[\u2640\u2642]\uFE0F?)?|[\uDCCF\uDD8E\uDD91-\uDD9A\uDE01\uDE1A\uDE2F\uDE32-\uDE36\uDE38-\uDE3A\uDE50\uDE51\uDF00-\uDF20\uDF2D-\uDF35\uDF37-\uDF43\uDF45-\uDF4A\uDF4C-\uDF7C\uDF7E-\uDF84\uDF86-\uDF93\uDFA0-\uDFC1\uDFC5\uDFC6\uDFC8\uDFC9\uDFCF-\uDFD3\uDFE0-\uDFF0\uDFF8-\uDFFF]|\uDDE6\uD83C[\uDDE8-\uDDEC\uDDEE\uDDF1\uDDF2\uDDF4\uDDF6-\uDDFA\uDDFC\uDDFD\uDDFF]|\uDDE7\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEF\uDDF1-\uDDF4\uDDF6-\uDDF9\uDDFB\uDDFC\uDDFE\uDDFF]|\uDDE8\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDEE\uDDF0-\uDDF5\uDDF7\uDDFA-\uDDFF]|\uDDE9\uD83C[\uDDEA\uDDEC\uDDEF\uDDF
0\uDDF2\uDDF4\uDDFF]|\uDDEA\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDED\uDDF7-\uDDFA]|\uDDEB\uD83C[\uDDEE-\uDDF0\uDDF2\uDDF4\uDDF7]|\uDDEC\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEE\uDDF1-\uDDF3\uDDF5-\uDDFA\uDDFC\uDDFE]|\uDDED\uD83C[\uDDF0\uDDF2\uDDF3\uDDF7\uDDF9\uDDFA]|\uDDEE\uD83C[\uDDE8-\uDDEA\uDDF1-\uDDF4\uDDF6-\uDDF9]|\uDDEF\uD83C[\uDDEA\uDDF2\uDDF4\uDDF5]|\uDDF0\uD83C[\uDDEA\uDDEC-\uDDEE\uDDF2\uDDF3\uDDF5\uDDF7\uDDFC\uDDFE\uDDFF]|\uDDF1\uD83C[\uDDE6-\uDDE8\uDDEE\uDDF0\uDDF7-\uDDFB\uDDFE]|\uDDF2\uD83C[\uDDE6\uDDE8-\uDDED\uDDF0-\uDDFF]|\uDDF3\uD83C[\uDDE6\uDDE8\uDDEA-\uDDEC\uDDEE\uDDF1\uDDF4\uDDF5\uDDF7\uDDFA\uDDFF]|\uDDF4\uD83C\uDDF2|\uDDF5\uD83C[\uDDE6\uDDEA-\uDDED\uDDF0-\uDDF3\uDDF7-\uDDF9\uDDFC\uDDFE]|\uDDF6\uD83C\uDDE6|\uDDF7\uD83C[\uDDEA\uDDF4\uDDF8\uDDFA\uDDFC]|\uDDF8\uD83C[\uDDE6-\uDDEA\uDDEC-\uDDF4\uDDF7-\uDDF9\uDDFB\uDDFD-\uDDFF]|\uDDF9\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDED\uDDEF-\uDDF4\uDDF7\uDDF9\uDDFB\uDDFC\uDDFF]|\uDDFA\uD83C[\uDDE6\uDDEC\uDDF2\uDDF3\uDDF8\uDDFE\uDDFF]|\uDDFB\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDEE\uDDF3\uDDFA]|\uDDFC\uD83C[\uDDEB\uDDF8]|\uDDFD\uD83C\uDDF0|\uDDFE\uD83C[\uDDEA\uDDF9]|\uDDFF\uD83C[\uDDE6\uDDF2\uDDFC]|\uDF44(?:\u200D\uD83D\uDFEB)?|\uDF4B(?:\u200D\uD83D\uDFE9)?|\uDFC3(?:\uD83C[\uDFFB-\uDFFF])?(?:\u200D(?:[\u2640\u2642]\uFE0F?(?:\u200D\u27A1\uFE0F?)?|\u27A1\uFE0F?))?|\uDFF3\uFE0F?(?:\u200D(?:\u26A7\uFE0F?|\uD83C\uDF08))?|\uDFF4(?:\u200D\u2620\uFE0F?|\uDB40\uDC67\uDB40\uDC62\uDB40(?:\uDC65\uDB40\uDC6E\uDB40\uDC67|\uDC73\uDB40\uDC63\uDB40\uDC74|\uDC77\uDB40\uDC6C\uDB40\uDC73)\uDB40\uDC7F)?)|\uD83D(?:[\uDC3F\uDCFD\uDD49\uDD4A\uDD6F\uDD70\uDD73\uDD76-\uDD79\uDD87\uDD8A-\uDD8D\uDDA5\uDDA8\uDDB1\uDDB2\uDDBC\uDDC2-\uDDC4\uDDD1-\uDDD3\uDDDC-\uDDDE\uDDE1\uDDE3\uDDE8\uDDEF\uDDF3\uDDFA\uDECB\uDECD-\uDECF\uDEE0-\uDEE5\uDEE9\uDEF0\uDEF3]\uFE0F?|[\uDC42\uDC43\uDC46-\uDC50\uDC66\uDC67\uDC6B-\uDC6D\uDC72\uDC74-\uDC76\uDC78\uDC7C\uDC83\uDC85\uDC8F\uDC91\uDCAA\uDD7A\uDD95\uDD96\uDE4C\uDE4F\uDEC0\uDECC](?:\uD83C[\uDFFB-\uDFFF])?|[\uDC6E\uDC70\uDC71\uDC73\uDC
77\uDC81\uDC82\uDC86\uDC87\uDE45-\uDE47\uDE4B\uDE4D\uDE4E\uDEA3\uDEB4\uDEB5](?:\uD83C[\uDFFB-\uDFFF])?(?:\u200D[\u2640\u2642]\uFE0F?)?|[\uDD74\uDD90](?:\uFE0F|\uD83C[\uDFFB-\uDFFF])?|[\uDC00-\uDC07\uDC09-\uDC14\uDC16-\uDC25\uDC27-\uDC3A\uDC3C-\uDC3E\uDC40\uDC44\uDC45\uDC51-\uDC65\uDC6A\uDC79-\uDC7B\uDC7D-\uDC80\uDC84\uDC88-\uDC8E\uDC90\uDC92-\uDCA9\uDCAB-\uDCFC\uDCFF-\uDD3D\uDD4B-\uDD4E\uDD50-\uDD67\uDDA4\uDDFB-\uDE2D\uDE2F-\uDE34\uDE37-\uDE41\uDE43\uDE44\uDE48-\uDE4A\uDE80-\uDEA2\uDEA4-\uDEB3\uDEB7-\uDEBF\uDEC1-\uDEC5\uDED0-\uDED2\uDED5-\uDED7\uDEDC-\uDEDF\uDEEB\uDEEC\uDEF4-\uDEFC\uDFE0-\uDFEB\uDFF0]|\uDC08(?:\u200D\u2B1B)?|\uDC15(?:\u200D\uD83E\uDDBA)?|\uDC26(?:\u200D(?:\u2B1B|\uD83D\uDD25))?|\uDC3B(?:\u200D\u2744\uFE0F?)?|\uDC41\uFE0F?(?:\u200D\uD83D\uDDE8\uFE0F?)?|\uDC68(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:\uDC8B\u200D\uD83D)?\uDC68|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D(?:[\uDC68\uDC69]\u200D\uD83D(?:\uDC66(?:\u200D\uD83D\uDC66)?|\uDC67(?:\u200D\uD83D[\uDC66\uDC67])?)|[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uDC66(?:\u200D\uD83D\uDC66)?|\uDC67(?:\u200D\uD83D[\uDC66\uDC67])?)|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]))|\uD83C(?:\uDFFB(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:\uDC8B\u200D\uD83D)?\uDC68\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFC-\uDFFF])))?|\uDFFC(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:\uDC8B\u200D\uD83D)?\uDC68\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFB\uDFFD-\uDFFF])))?|\uDFFD(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?
:\uDC8B\u200D\uD83D)?\uDC68\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFB\uDFFC\uDFFE\uDFFF])))?|\uDFFE(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:\uDC8B\u200D\uD83D)?\uDC68\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFB-\uDFFD\uDFFF])))?|\uDFFF(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:\uDC8B\u200D\uD83D)?\uDC68\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFB-\uDFFE])))?))?|\uDC69(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:\uDC8B\u200D\uD83D)?[\uDC68\uDC69]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D(?:[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uDC66(?:\u200D\uD83D\uDC66)?|\uDC67(?:\u200D\uD83D[\uDC66\uDC67])?|\uDC69\u200D\uD83D(?:\uDC66(?:\u200D\uD83D\uDC66)?|\uDC67(?:\u200D\uD83D[\uDC66\uDC67])?))|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]))|\uD83C(?:\uDFFB(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:[\uDC68\uDC69]|\uDC8B\u200D\uD83D[\uDC68\uDC69])\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D[\uDC68\uDC69]\uD83C[\uDFFC-\uDFFF])))?|\uDFFC(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:[\uDC68\uDC69]|\uDC8B\u200D\uD83D[\uDC68\uDC69])\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB
\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D[\uDC68\uDC69]\uD83C[\uDFFB\uDFFD-\uDFFF])))?|\uDFFD(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:[\uDC68\uDC69]|\uDC8B\u200D\uD83D[\uDC68\uDC69])\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D[\uDC68\uDC69]\uD83C[\uDFFB\uDFFC\uDFFE\uDFFF])))?|\uDFFE(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:[\uDC68\uDC69]|\uDC8B\u200D\uD83D[\uDC68\uDC69])\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D[\uDC68\uDC69]\uD83C[\uDFFB-\uDFFD\uDFFF])))?|\uDFFF(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:[\uDC68\uDC69]|\uDC8B\u200D\uD83D[\uDC68\uDC69])\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D[\uDC68\uDC69]\uD83C[\uDFFB-\uDFFE])))?))?|\uDC6F(?:\u200D[\u2640\u2642]\uFE0F?)?|\uDD75(?:\uFE0F|\uD83C[\uDFFB-\uDFFF])?(?:\u200D[\u2640\u2642]\uFE0F?)?|\uDE2E(?:\u200D\uD83D\uDCA8)?|\uDE35(?:\u200D\uD83D\uDCAB)?|\uDE36(?:\u200D\uD83C\uDF2B\uFE0F?)?|\uDE42(?:\u200D[\u2194\u2195]\uFE0F?)?|\uDEB6(?:\uD83C[\uDFFB-\uDFFF])?(?:\u200D(?:[\u2640\u2642]\uFE0F?(?:\u200D\u27A1\uFE0F?)?|\u27A1\uFE0F?))?)|\uD83E(?:[\uDD0C\uDD0F\uDD18-\uDD1F\uDD30-\uDD34\uDD36\uDD77\uDDB5\uDDB6\uDDBB\uDDD2\uDDD3\uDDD5\uDEC3-\uDEC5\uDEF0\uDEF2-\uDEF8](?:\uD83C[\uDFFB-\uDFFF])?|[\uDD26\uDD35\uDD37-\uDD39\uDD3D\uDD3E\uDDB8\uDDB9\uDDCD\uDDCF\uDDD4\uDDD6-\uDDDD](?:\uD83C[\uDFFB-\uDFFF])?(?:\u200D[\u2640\u2642]\uFE0F?)?|[\uDDDE\uDDDF](?:\u
200D[\u2640\u2642]\uFE0F?)?|[\uDD0D\uDD0E\uDD10-\uDD17\uDD20-\uDD25\uDD27-\uDD2F\uDD3A\uDD3F-\uDD45\uDD47-\uDD76\uDD78-\uDDB4\uDDB7\uDDBA\uDDBC-\uDDCC\uDDD0\uDDE0-\uDDFF\uDE70-\uDE7C\uDE80-\uDE88\uDE90-\uDEBD\uDEBF-\uDEC2\uDECE-\uDEDB\uDEE0-\uDEE8]|\uDD3C(?:\u200D[\u2640\u2642]\uFE0F?|\uD83C[\uDFFB-\uDFFF])?|\uDDCE(?:\uD83C[\uDFFB-\uDFFF])?(?:\u200D(?:[\u2640\u2642]\uFE0F?(?:\u200D\u27A1\uFE0F?)?|\u27A1\uFE0F?))?|\uDDD1(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\uD83C[\uDF3E\uDF73\uDF7C\uDF84\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83E\uDDD1|\uDDD1\u200D\uD83E\uDDD2(?:\u200D\uD83E\uDDD2)?|\uDDD2(?:\u200D\uD83E\uDDD2)?))|\uD83C(?:\uDFFB(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D(?:\uD83D\uDC8B\u200D)?\uD83E\uDDD1\uD83C[\uDFFC-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF84\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB-\uDFFF])))?|\uDFFC(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D(?:\uD83D\uDC8B\u200D)?\uD83E\uDDD1\uD83C[\uDFFB\uDFFD-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF84\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB-\uDFFF])))?|\uDFFD(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D(?:\uD83D\uDC8B\u200D)?\uD83E\uDDD1\uD83C[\uDFFB\uDFFC\uDFFE\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF84\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB-\uDFFF])))?|\uDFFE(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D(?:\uD83D\uDC8B\u200D)?\uD83E\uDDD1\uD83C[\uDFFB-\uDFFD\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF84\uDF93\uDFA4\uDFA8\
uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB-\uDFFF])))?|\uDFFF(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D(?:\uD83D\uDC8B\u200D)?\uD83E\uDDD1\uD83C[\uDFFB-\uDFFE]|\uD83C[\uDF3E\uDF73\uDF7C\uDF84\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB-\uDFFF])))?))?|\uDEF1(?:\uD83C(?:\uDFFB(?:\u200D\uD83E\uDEF2\uD83C[\uDFFC-\uDFFF])?|\uDFFC(?:\u200D\uD83E\uDEF2\uD83C[\uDFFB\uDFFD-\uDFFF])?|\uDFFD(?:\u200D\uD83E\uDEF2\uD83C[\uDFFB\uDFFC\uDFFE\uDFFF])?|\uDFFE(?:\u200D\uD83E\uDEF2\uD83C[\uDFFB-\uDFFD\uDFFF])?|\uDFFF(?:\u200D\uD83E\uDEF2\uD83C[\uDFFB-\uDFFE])?))?)/g;
+	return /[#*0-9]\uFE0F?\u20E3|[\xA9\xAE\u203C\u2049\u2122\u2139\u2194-\u2199\u21A9\u21AA\u231A\u231B\u2328\u23CF\u23ED-\u23EF\u23F1\u23F2\u23F8-\u23FA\u24C2\u25AA\u25AB\u25B6\u25C0\u25FB\u25FC\u25FE\u2600-\u2604\u260E\u2611\u2614\u2615\u2618\u2620\u2622\u2623\u2626\u262A\u262E\u262F\u2638-\u263A\u2640\u2642\u2648-\u2653\u265F\u2660\u2663\u2665\u2666\u2668\u267B\u267E\u267F\u2692\u2694-\u2697\u2699\u269B\u269C\u26A0\u26A7\u26AA\u26B0\u26B1\u26BD\u26BE\u26C4\u26C8\u26CF\u26D1\u26E9\u26F0-\u26F5\u26F7\u26F8\u26FA\u2702\u2708\u2709\u270F\u2712\u2714\u2716\u271D\u2721\u2733\u2734\u2744\u2747\u2757\u2763\u27A1\u2934\u2935\u2B05-\u2B07\u2B1B\u2B1C\u2B55\u3030\u303D\u3297\u3299]\uFE0F?|[\u261D\u270C\u270D](?:\uD83C[\uDFFB-\uDFFF]|\uFE0F)?|[\u270A\u270B](?:\uD83C[\uDFFB-\uDFFF])?|[\u23E9-\u23EC\u23F0\u23F3\u25FD\u2693\u26A1\u26AB\u26C5\u26CE\u26D4\u26EA\u26FD\u2705\u2728\u274C\u274E\u2753-\u2755\u2795-\u2797\u27B0\u27BF\u2B50]|\u26D3\uFE0F?(?:\u200D\uD83D\uDCA5)?|\u26F9(?:\uD83C[\uDFFB-\uDFFF]|\uFE0F)?(?:\u200D[\u2640\u2642]\uFE0F?)?|\u2764\uFE0F?(?:\u200D(?:\uD83D\uDD25|\uD83E\uDE79))?|\uD83C(?:[\uDC04\uDD70\uDD71\uDD7E\uDD7F\uDE02\uDE37\uDF21\uDF24-\uDF2C\uDF36\uDF7D\uDF96\uDF97\uDF99-\uDF9B\uDF9E\uDF9F\uDFCD\uDFCE\uDFD4-\uDFDF\uDFF5\uDFF7]\uFE0F?|[\uDF85\uDFC2\uDFC7](?:\uD83C[\uDFFB-\uDFFF])?|[\uDFC4\uDFCA](?:\uD83C[\uDFFB-\uDFFF])?(?:\u200D[\u2640\u2642]\uFE0F?)?|[\uDFCB\uDFCC](?:\uD83C[\uDFFB-\uDFFF]|\uFE0F)?(?:\u200D[\u2640\u2642]\uFE0F?)?|[\uDCCF\uDD8E\uDD91-\uDD9A\uDE01\uDE1A\uDE2F\uDE32-\uDE36\uDE38-\uDE3A\uDE50\uDE51\uDF00-\uDF20\uDF2D-\uDF35\uDF37-\uDF43\uDF45-\uDF4A\uDF4C-\uDF7C\uDF7E-\uDF84\uDF86-\uDF93\uDFA0-\uDFC1\uDFC5\uDFC6\uDFC8\uDFC9\uDFCF-\uDFD3\uDFE0-\uDFF0\uDFF8-\uDFFF]|\uDDE6\uD83C[\uDDE8-\uDDEC\uDDEE\uDDF1\uDDF2\uDDF4\uDDF6-\uDDFA\uDDFC\uDDFD\uDDFF]|\uDDE7\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEF\uDDF1-\uDDF4\uDDF6-\uDDF9\uDDFB\uDDFC\uDDFE\uDDFF]|\uDDE8\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDEE\uDDF0-\uDDF7\uDDFA-\uDDFF]|\uDDE9\uD83C[\uDDEA\uDDEC\uDDEF\uDDF0\uDDF
2\uDDF4\uDDFF]|\uDDEA\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDED\uDDF7-\uDDFA]|\uDDEB\uD83C[\uDDEE-\uDDF0\uDDF2\uDDF4\uDDF7]|\uDDEC\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEE\uDDF1-\uDDF3\uDDF5-\uDDFA\uDDFC\uDDFE]|\uDDED\uD83C[\uDDF0\uDDF2\uDDF3\uDDF7\uDDF9\uDDFA]|\uDDEE\uD83C[\uDDE8-\uDDEA\uDDF1-\uDDF4\uDDF6-\uDDF9]|\uDDEF\uD83C[\uDDEA\uDDF2\uDDF4\uDDF5]|\uDDF0\uD83C[\uDDEA\uDDEC-\uDDEE\uDDF2\uDDF3\uDDF5\uDDF7\uDDFC\uDDFE\uDDFF]|\uDDF1\uD83C[\uDDE6-\uDDE8\uDDEE\uDDF0\uDDF7-\uDDFB\uDDFE]|\uDDF2\uD83C[\uDDE6\uDDE8-\uDDED\uDDF0-\uDDFF]|\uDDF3\uD83C[\uDDE6\uDDE8\uDDEA-\uDDEC\uDDEE\uDDF1\uDDF4\uDDF5\uDDF7\uDDFA\uDDFF]|\uDDF4\uD83C\uDDF2|\uDDF5\uD83C[\uDDE6\uDDEA-\uDDED\uDDF0-\uDDF3\uDDF7-\uDDF9\uDDFC\uDDFE]|\uDDF6\uD83C\uDDE6|\uDDF7\uD83C[\uDDEA\uDDF4\uDDF8\uDDFA\uDDFC]|\uDDF8\uD83C[\uDDE6-\uDDEA\uDDEC-\uDDF4\uDDF7-\uDDF9\uDDFB\uDDFD-\uDDFF]|\uDDF9\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDED\uDDEF-\uDDF4\uDDF7\uDDF9\uDDFB\uDDFC\uDDFF]|\uDDFA\uD83C[\uDDE6\uDDEC\uDDF2\uDDF3\uDDF8\uDDFE\uDDFF]|\uDDFB\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDEE\uDDF3\uDDFA]|\uDDFC\uD83C[\uDDEB\uDDF8]|\uDDFD\uD83C\uDDF0|\uDDFE\uD83C[\uDDEA\uDDF9]|\uDDFF\uD83C[\uDDE6\uDDF2\uDDFC]|\uDF44(?:\u200D\uD83D\uDFEB)?|\uDF4B(?:\u200D\uD83D\uDFE9)?|\uDFC3(?:\uD83C[\uDFFB-\uDFFF])?(?:\u200D(?:[\u2640\u2642]\uFE0F?(?:\u200D\u27A1\uFE0F?)?|\u27A1\uFE0F?))?|\uDFF3\uFE0F?(?:\u200D(?:\u26A7\uFE0F?|\uD83C\uDF08))?|\uDFF4(?:\u200D\u2620\uFE0F?|\uDB40\uDC67\uDB40\uDC62\uDB40(?:\uDC65\uDB40\uDC6E\uDB40\uDC67|\uDC73\uDB40\uDC63\uDB40\uDC74|\uDC77\uDB40\uDC6C\uDB40\uDC73)\uDB40\uDC7F)?)|\uD83D(?:[\uDC3F\uDCFD\uDD49\uDD4A\uDD6F\uDD70\uDD73\uDD76-\uDD79\uDD87\uDD8A-\uDD8D\uDDA5\uDDA8\uDDB1\uDDB2\uDDBC\uDDC2-\uDDC4\uDDD1-\uDDD3\uDDDC-\uDDDE\uDDE1\uDDE3\uDDE8\uDDEF\uDDF3\uDDFA\uDECB\uDECD-\uDECF\uDEE0-\uDEE5\uDEE9\uDEF0\uDEF3]\uFE0F?|[\uDC42\uDC43\uDC46-\uDC50\uDC66\uDC67\uDC6B-\uDC6D\uDC72\uDC74-\uDC76\uDC78\uDC7C\uDC83\uDC85\uDC8F\uDC91\uDCAA\uDD7A\uDD95\uDD96\uDE4C\uDE4F\uDEC0\uDECC](?:\uD83C[\uDFFB-\uDFFF])?|[\uDC6E\uDC70\uDC71\uDC73\uDC77\uDC
81\uDC82\uDC86\uDC87\uDE45-\uDE47\uDE4B\uDE4D\uDE4E\uDEA3\uDEB4\uDEB5](?:\uD83C[\uDFFB-\uDFFF])?(?:\u200D[\u2640\u2642]\uFE0F?)?|[\uDD74\uDD90](?:\uD83C[\uDFFB-\uDFFF]|\uFE0F)?|[\uDC00-\uDC07\uDC09-\uDC14\uDC16-\uDC25\uDC27-\uDC3A\uDC3C-\uDC3E\uDC40\uDC44\uDC45\uDC51-\uDC65\uDC6A\uDC79-\uDC7B\uDC7D-\uDC80\uDC84\uDC88-\uDC8E\uDC90\uDC92-\uDCA9\uDCAB-\uDCFC\uDCFF-\uDD3D\uDD4B-\uDD4E\uDD50-\uDD67\uDDA4\uDDFB-\uDE2D\uDE2F-\uDE34\uDE37-\uDE41\uDE43\uDE44\uDE48-\uDE4A\uDE80-\uDEA2\uDEA4-\uDEB3\uDEB7-\uDEBF\uDEC1-\uDEC5\uDED0-\uDED2\uDED5-\uDED7\uDEDC-\uDEDF\uDEEB\uDEEC\uDEF4-\uDEFC\uDFE0-\uDFEB\uDFF0]|\uDC08(?:\u200D\u2B1B)?|\uDC15(?:\u200D\uD83E\uDDBA)?|\uDC26(?:\u200D(?:\u2B1B|\uD83D\uDD25))?|\uDC3B(?:\u200D\u2744\uFE0F?)?|\uDC41\uFE0F?(?:\u200D\uD83D\uDDE8\uFE0F?)?|\uDC68(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:\uDC8B\u200D\uD83D)?\uDC68|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D(?:[\uDC68\uDC69]\u200D\uD83D(?:\uDC66(?:\u200D\uD83D\uDC66)?|\uDC67(?:\u200D\uD83D[\uDC66\uDC67])?)|[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uDC66(?:\u200D\uD83D\uDC66)?|\uDC67(?:\u200D\uD83D[\uDC66\uDC67])?)|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]))|\uD83C(?:\uDFFB(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:\uDC8B\u200D\uD83D)?\uDC68\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFC-\uDFFF])))?|\uDFFC(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:\uDC8B\u200D\uD83D)?\uDC68\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFB\uDFFD-\uDFFF])))?|\uDFFD(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:\uDC8
B\u200D\uD83D)?\uDC68\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFB\uDFFC\uDFFE\uDFFF])))?|\uDFFE(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:\uDC8B\u200D\uD83D)?\uDC68\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFB-\uDFFD\uDFFF])))?|\uDFFF(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:\uDC8B\u200D\uD83D)?\uDC68\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D\uDC68\uD83C[\uDFFB-\uDFFE])))?))?|\uDC69(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:\uDC8B\u200D\uD83D)?[\uDC68\uDC69]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D(?:[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uDC66(?:\u200D\uD83D\uDC66)?|\uDC67(?:\u200D\uD83D[\uDC66\uDC67])?|\uDC69\u200D\uD83D(?:\uDC66(?:\u200D\uD83D\uDC66)?|\uDC67(?:\u200D\uD83D[\uDC66\uDC67])?))|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]))|\uD83C(?:\uDFFB(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:[\uDC68\uDC69]|\uDC8B\u200D\uD83D[\uDC68\uDC69])\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D[\uDC68\uDC69]\uD83C[\uDFFC-\uDFFF])))?|\uDFFC(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:[\uDC68\uDC69]|\uDC8B\u200D\uD83D[\uDC68\uDC69])\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED
]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D[\uDC68\uDC69]\uD83C[\uDFFB\uDFFD-\uDFFF])))?|\uDFFD(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:[\uDC68\uDC69]|\uDC8B\u200D\uD83D[\uDC68\uDC69])\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D[\uDC68\uDC69]\uD83C[\uDFFB\uDFFC\uDFFE\uDFFF])))?|\uDFFE(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:[\uDC68\uDC69]|\uDC8B\u200D\uD83D[\uDC68\uDC69])\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D[\uDC68\uDC69]\uD83C[\uDFFB-\uDFFD\uDFFF])))?|\uDFFF(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D\uD83D(?:[\uDC68\uDC69]|\uDC8B\u200D\uD83D[\uDC68\uDC69])\uD83C[\uDFFB-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83D[\uDC68\uDC69]\uD83C[\uDFFB-\uDFFE])))?))?|\uDC6F(?:\u200D[\u2640\u2642]\uFE0F?)?|\uDD75(?:\uD83C[\uDFFB-\uDFFF]|\uFE0F)?(?:\u200D[\u2640\u2642]\uFE0F?)?|\uDE2E(?:\u200D\uD83D\uDCA8)?|\uDE35(?:\u200D\uD83D\uDCAB)?|\uDE36(?:\u200D\uD83C\uDF2B\uFE0F?)?|\uDE42(?:\u200D[\u2194\u2195]\uFE0F?)?|\uDEB6(?:\uD83C[\uDFFB-\uDFFF])?(?:\u200D(?:[\u2640\u2642]\uFE0F?(?:\u200D\u27A1\uFE0F?)?|\u27A1\uFE0F?))?)|\uD83E(?:[\uDD0C\uDD0F\uDD18-\uDD1F\uDD30-\uDD34\uDD36\uDD77\uDDB5\uDDB6\uDDBB\uDDD2\uDDD3\uDDD5\uDEC3-\uDEC5\uDEF0\uDEF2-\uDEF8](?:\uD83C[\uDFFB-\uDFFF])?|[\uDD26\uDD35\uDD37-\uDD39\uDD3D\uDD3E\uDDB8\uDDB9\uDDCD\uDDCF\uDDD4\uDDD6-\uDDDD](?:\uD83C[\uDFFB-\uDFFF])?(?:\u200D[\u2640\u2642]\uFE0F?)?|[\uDDDE\uDDDF](?:\u200D[\
u2640\u2642]\uFE0F?)?|[\uDD0D\uDD0E\uDD10-\uDD17\uDD20-\uDD25\uDD27-\uDD2F\uDD3A\uDD3F-\uDD45\uDD47-\uDD76\uDD78-\uDDB4\uDDB7\uDDBA\uDDBC-\uDDCC\uDDD0\uDDE0-\uDDFF\uDE70-\uDE7C\uDE80-\uDE89\uDE8F-\uDEC2\uDEC6\uDECE-\uDEDC\uDEDF-\uDEE9]|\uDD3C(?:\u200D[\u2640\u2642]\uFE0F?|\uD83C[\uDFFB-\uDFFF])?|\uDDCE(?:\uD83C[\uDFFB-\uDFFF])?(?:\u200D(?:[\u2640\u2642]\uFE0F?(?:\u200D\u27A1\uFE0F?)?|\u27A1\uFE0F?))?|\uDDD1(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\uD83C[\uDF3E\uDF73\uDF7C\uDF84\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83E\uDDD1|\uDDD1\u200D\uD83E\uDDD2(?:\u200D\uD83E\uDDD2)?|\uDDD2(?:\u200D\uD83E\uDDD2)?))|\uD83C(?:\uDFFB(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D(?:\uD83D\uDC8B\u200D)?\uD83E\uDDD1\uD83C[\uDFFC-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF84\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB-\uDFFF])))?|\uDFFC(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D(?:\uD83D\uDC8B\u200D)?\uD83E\uDDD1\uD83C[\uDFFB\uDFFD-\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF84\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB-\uDFFF])))?|\uDFFD(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D(?:\uD83D\uDC8B\u200D)?\uD83E\uDDD1\uD83C[\uDFFB\uDFFC\uDFFE\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF84\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB-\uDFFF])))?|\uDFFE(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D(?:\uD83D\uDC8B\u200D)?\uD83E\uDDD1\uD83C[\uDFFB-\uDFFD\uDFFF]|\uD83C[\uDF3E\uDF73\uDF7C\uDF84\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|
\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB-\uDFFF])))?|\uDFFF(?:\u200D(?:[\u2695\u2696\u2708]\uFE0F?|\u2764\uFE0F?\u200D(?:\uD83D\uDC8B\u200D)?\uD83E\uDDD1\uD83C[\uDFFB-\uDFFE]|\uD83C[\uDF3E\uDF73\uDF7C\uDF84\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E(?:[\uDDAF\uDDBC\uDDBD](?:\u200D\u27A1\uFE0F?)?|[\uDDB0-\uDDB3]|\uDD1D\u200D\uD83E\uDDD1\uD83C[\uDFFB-\uDFFF])))?))?|\uDEF1(?:\uD83C(?:\uDFFB(?:\u200D\uD83E\uDEF2\uD83C[\uDFFC-\uDFFF])?|\uDFFC(?:\u200D\uD83E\uDEF2\uD83C[\uDFFB\uDFFD-\uDFFF])?|\uDFFD(?:\u200D\uD83E\uDEF2\uD83C[\uDFFB\uDFFC\uDFFE\uDFFF])?|\uDFFE(?:\u200D\uD83E\uDEF2\uD83C[\uDFFB-\uDFFD\uDFFF])?|\uDFFF(?:\u200D\uD83E\uDEF2\uD83C[\uDFFB-\uDFFE])?))?)/g;
 };
 
 function stringWidth(string, options) {
diff --git a/tools/lint-md/package-lock.json b/tools/lint-md/package-lock.json
index 0cb81981258df5..0958904e2d0218 100644
--- a/tools/lint-md/package-lock.json
+++ b/tools/lint-md/package-lock.json
@@ -9,64 +9,38 @@
       "version": "1.0.0",
       "dependencies": {
         "remark-parse": "^11.0.0",
-        "remark-preset-lint-node": "^5.1.0",
+        "remark-preset-lint-node": "^5.1.2",
         "remark-stringify": "^11.0.0",
         "to-vfile": "^8.0.0",
         "unified": "^11.0.5",
         "vfile-reporter": "^8.1.1"
       },
       "devDependencies": {
-        "@rollup/plugin-commonjs": "^26.0.1",
-        "@rollup/plugin-node-resolve": "^15.2.3",
-        "rollup": "^4.22.4",
+        "@rollup/plugin-commonjs": "^28.0.1",
+        "@rollup/plugin-node-resolve": "^15.3.0",
+        "rollup": "^4.24.0",
         "rollup-plugin-cleanup": "^3.2.1"
       }
     },
-    "node_modules/@isaacs/cliui": {
-      "version": "8.0.2",
-      "resolved": "https://registry.npmjs.org/@isaacs/cliui/-/cliui-8.0.2.tgz",
-      "integrity": "sha512-O8jcjabXaleOG9DQ0+ARXWZBTfnP4WNAqzuiJK7ll44AmxGKv/J2M4TPjxjY3znBCfvBXFzucm1twdyFybFqEA==",
-      "dev": true,
-      "dependencies": {
-        "string-width": "^5.1.2",
-        "string-width-cjs": "npm:string-width@^4.2.0",
-        "strip-ansi": "^7.0.1",
-        "strip-ansi-cjs": "npm:strip-ansi@^6.0.1",
-        "wrap-ansi": "^8.1.0",
-        "wrap-ansi-cjs": "npm:wrap-ansi@^7.0.0"
-      },
-      "engines": {
-        "node": ">=12"
-      }
-    },
     "node_modules/@jridgewell/sourcemap-codec": {
       "version": "1.5.0",
       "resolved": "https://registry.npmjs.org/@jridgewell/sourcemap-codec/-/sourcemap-codec-1.5.0.tgz",
       "integrity": "sha512-gv3ZRaISU3fjPAgNsriBRqGWQL6quFx04YMPW/zD8XMLsU32mhCCbfbO6KZFLjvYpCZ8zyDEgqsgf+PwPaM7GQ==",
       "dev": true
     },
-    "node_modules/@pkgjs/parseargs": {
-      "version": "0.11.0",
-      "resolved": "https://registry.npmjs.org/@pkgjs/parseargs/-/parseargs-0.11.0.tgz",
-      "integrity": "sha512-+1VkjdD0QBLPodGrJUeqarH8VAIvQODIbwh9XpP5Syisf7YoQgsJKPNFoqqLQlu+VQ/tVSshMR6loPMn8U+dPg==",
-      "dev": true,
-      "optional": true,
-      "engines": {
-        "node": ">=14"
-      }
-    },
     "node_modules/@rollup/plugin-commonjs": {
-      "version": "26.0.1",
-      "resolved": "https://registry.npmjs.org/@rollup/plugin-commonjs/-/plugin-commonjs-26.0.1.tgz",
-      "integrity": "sha512-UnsKoZK6/aGIH6AdkptXhNvhaqftcjq3zZdT+LY5Ftms6JR06nADcDsYp5hTU9E2lbJUEOhdlY5J4DNTneM+jQ==",
+      "version": "28.0.1",
+      "resolved": "https://registry.npmjs.org/@rollup/plugin-commonjs/-/plugin-commonjs-28.0.1.tgz",
+      "integrity": "sha512-+tNWdlWKbpB3WgBN7ijjYkq9X5uhjmcvyjEght4NmH5fAU++zfQzAJ6wumLS+dNcvwEZhKx2Z+skY8m7v0wGSA==",
       "dev": true,
       "dependencies": {
         "@rollup/pluginutils": "^5.0.1",
         "commondir": "^1.0.1",
         "estree-walker": "^2.0.2",
-        "glob": "^10.4.1",
+        "fdir": "^6.2.0",
         "is-reference": "1.2.1",
-        "magic-string": "^0.30.3"
+        "magic-string": "^0.30.3",
+        "picomatch": "^4.0.2"
       },
       "engines": {
         "node": ">=16.0.0 || 14 >= 14.17"
@@ -81,15 +55,14 @@
       }
     },
     "node_modules/@rollup/plugin-node-resolve": {
-      "version": "15.2.3",
-      "resolved": "https://registry.npmjs.org/@rollup/plugin-node-resolve/-/plugin-node-resolve-15.2.3.tgz",
-      "integrity": "sha512-j/lym8nf5E21LwBT4Df1VD6hRO2L2iwUeUmP7litikRsVp1H6NWx20NEp0Y7su+7XGc476GnXXc4kFeZNGmaSQ==",
+      "version": "15.3.0",
+      "resolved": "https://registry.npmjs.org/@rollup/plugin-node-resolve/-/plugin-node-resolve-15.3.0.tgz",
+      "integrity": "sha512-9eO5McEICxMzJpDW9OnMYSv4Sta3hmt7VtBFz5zR9273suNOydOyq/FrGeGy+KsTRFm8w0SLVhzig2ILFT63Ag==",
       "dev": true,
       "dependencies": {
         "@rollup/pluginutils": "^5.0.1",
         "@types/resolve": "1.20.2",
         "deepmerge": "^4.2.2",
-        "is-builtin-module": "^3.2.1",
         "is-module": "^1.0.0",
         "resolve": "^1.22.1"
       },
@@ -106,9 +79,9 @@
       }
     },
     "node_modules/@rollup/pluginutils": {
-      "version": "5.1.0",
-      "resolved": "https://registry.npmjs.org/@rollup/pluginutils/-/pluginutils-5.1.0.tgz",
-      "integrity": "sha512-XTIWOPPcpvyKI6L1NHo0lFlCyznUEyPmPY1mc3KpPVDYulHSTvyeLNVW00QTLIAFNhR3kYnJTQHeGqU4M3n09g==",
+      "version": "5.1.2",
+      "resolved": "https://registry.npmjs.org/@rollup/pluginutils/-/pluginutils-5.1.2.tgz",
+      "integrity": "sha512-/FIdS3PyZ39bjZlwqFnWqCOVnW7o963LtKMwQOD0NhQqw22gSr2YY1afu3FxRip4ZCZNsD5jq6Aaz6QV3D/Njw==",
       "dev": true,
       "dependencies": {
         "@types/estree": "^1.0.0",
@@ -127,10 +100,22 @@
         }
       }
     },
+    "node_modules/@rollup/pluginutils/node_modules/picomatch": {
+      "version": "2.3.1",
+      "resolved": "https://registry.npmjs.org/picomatch/-/picomatch-2.3.1.tgz",
+      "integrity": "sha512-JU3teHTNjmE2VCGFzuY8EXzCDVwEqB2a8fsIvwaStHhAWJEeVd1o1QD80CU6+ZdEXXSLbSsuLwJjkCBWqRQUVA==",
+      "dev": true,
+      "engines": {
+        "node": ">=8.6"
+      },
+      "funding": {
+        "url": "https://github.com/sponsors/jonschlinkert"
+      }
+    },
     "node_modules/@rollup/rollup-android-arm-eabi": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-android-arm-eabi/-/rollup-android-arm-eabi-4.22.4.tgz",
-      "integrity": "sha512-Fxamp4aEZnfPOcGA8KSNEohV8hX7zVHOemC8jVBoBUHu5zpJK/Eu3uJwt6BMgy9fkvzxDaurgj96F/NiLukF2w==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-android-arm-eabi/-/rollup-android-arm-eabi-4.24.0.tgz",
+      "integrity": "sha512-Q6HJd7Y6xdB48x8ZNVDOqsbh2uByBhgK8PiQgPhwkIw/HC/YX5Ghq2mQY5sRMZWHb3VsFkWooUVOZHKr7DmDIA==",
       "cpu": [
         "arm"
       ],
@@ -141,9 +126,9 @@
       ]
     },
     "node_modules/@rollup/rollup-android-arm64": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-android-arm64/-/rollup-android-arm64-4.22.4.tgz",
-      "integrity": "sha512-VXoK5UMrgECLYaMuGuVTOx5kcuap1Jm8g/M83RnCHBKOqvPPmROFJGQaZhGccnsFtfXQ3XYa4/jMCJvZnbJBdA==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-android-arm64/-/rollup-android-arm64-4.24.0.tgz",
+      "integrity": "sha512-ijLnS1qFId8xhKjT81uBHuuJp2lU4x2yxa4ctFPtG+MqEE6+C5f/+X/bStmxapgmwLwiL3ih122xv8kVARNAZA==",
       "cpu": [
         "arm64"
       ],
@@ -154,9 +139,9 @@
       ]
     },
     "node_modules/@rollup/rollup-darwin-arm64": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-darwin-arm64/-/rollup-darwin-arm64-4.22.4.tgz",
-      "integrity": "sha512-xMM9ORBqu81jyMKCDP+SZDhnX2QEVQzTcC6G18KlTQEzWK8r/oNZtKuZaCcHhnsa6fEeOBionoyl5JsAbE/36Q==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-darwin-arm64/-/rollup-darwin-arm64-4.24.0.tgz",
+      "integrity": "sha512-bIv+X9xeSs1XCk6DVvkO+S/z8/2AMt/2lMqdQbMrmVpgFvXlmde9mLcbQpztXm1tajC3raFDqegsH18HQPMYtA==",
       "cpu": [
         "arm64"
       ],
@@ -167,9 +152,9 @@
       ]
     },
     "node_modules/@rollup/rollup-darwin-x64": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-darwin-x64/-/rollup-darwin-x64-4.22.4.tgz",
-      "integrity": "sha512-aJJyYKQwbHuhTUrjWjxEvGnNNBCnmpHDvrb8JFDbeSH3m2XdHcxDd3jthAzvmoI8w/kSjd2y0udT+4okADsZIw==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-darwin-x64/-/rollup-darwin-x64-4.24.0.tgz",
+      "integrity": "sha512-X6/nOwoFN7RT2svEQWUsW/5C/fYMBe4fnLK9DQk4SX4mgVBiTA9h64kjUYPvGQ0F/9xwJ5U5UfTbl6BEjaQdBQ==",
       "cpu": [
         "x64"
       ],
@@ -180,9 +165,9 @@
       ]
     },
     "node_modules/@rollup/rollup-linux-arm-gnueabihf": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm-gnueabihf/-/rollup-linux-arm-gnueabihf-4.22.4.tgz",
-      "integrity": "sha512-j63YtCIRAzbO+gC2L9dWXRh5BFetsv0j0va0Wi9epXDgU/XUi5dJKo4USTttVyK7fGw2nPWK0PbAvyliz50SCQ==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm-gnueabihf/-/rollup-linux-arm-gnueabihf-4.24.0.tgz",
+      "integrity": "sha512-0KXvIJQMOImLCVCz9uvvdPgfyWo93aHHp8ui3FrtOP57svqrF/roSSR5pjqL2hcMp0ljeGlU4q9o/rQaAQ3AYA==",
       "cpu": [
         "arm"
       ],
@@ -193,9 +178,9 @@
       ]
     },
     "node_modules/@rollup/rollup-linux-arm-musleabihf": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm-musleabihf/-/rollup-linux-arm-musleabihf-4.22.4.tgz",
-      "integrity": "sha512-dJnWUgwWBX1YBRsuKKMOlXCzh2Wu1mlHzv20TpqEsfdZLb3WoJW2kIEsGwLkroYf24IrPAvOT/ZQ2OYMV6vlrg==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm-musleabihf/-/rollup-linux-arm-musleabihf-4.24.0.tgz",
+      "integrity": "sha512-it2BW6kKFVh8xk/BnHfakEeoLPv8STIISekpoF+nBgWM4d55CZKc7T4Dx1pEbTnYm/xEKMgy1MNtYuoA8RFIWw==",
       "cpu": [
         "arm"
       ],
@@ -206,9 +191,9 @@
       ]
     },
     "node_modules/@rollup/rollup-linux-arm64-gnu": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm64-gnu/-/rollup-linux-arm64-gnu-4.22.4.tgz",
-      "integrity": "sha512-AdPRoNi3NKVLolCN/Sp4F4N1d98c4SBnHMKoLuiG6RXgoZ4sllseuGioszumnPGmPM2O7qaAX/IJdeDU8f26Aw==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm64-gnu/-/rollup-linux-arm64-gnu-4.24.0.tgz",
+      "integrity": "sha512-i0xTLXjqap2eRfulFVlSnM5dEbTVque/3Pi4g2y7cxrs7+a9De42z4XxKLYJ7+OhE3IgxvfQM7vQc43bwTgPwA==",
       "cpu": [
         "arm64"
       ],
@@ -219,9 +204,9 @@
       ]
     },
     "node_modules/@rollup/rollup-linux-arm64-musl": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm64-musl/-/rollup-linux-arm64-musl-4.22.4.tgz",
-      "integrity": "sha512-Gl0AxBtDg8uoAn5CCqQDMqAx22Wx22pjDOjBdmG0VIWX3qUBHzYmOKh8KXHL4UpogfJ14G4wk16EQogF+v8hmA==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm64-musl/-/rollup-linux-arm64-musl-4.24.0.tgz",
+      "integrity": "sha512-9E6MKUJhDuDh604Qco5yP/3qn3y7SLXYuiC0Rpr89aMScS2UAmK1wHP2b7KAa1nSjWJc/f/Lc0Wl1L47qjiyQw==",
       "cpu": [
         "arm64"
       ],
@@ -232,9 +217,9 @@
       ]
     },
     "node_modules/@rollup/rollup-linux-powerpc64le-gnu": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-powerpc64le-gnu/-/rollup-linux-powerpc64le-gnu-4.22.4.tgz",
-      "integrity": "sha512-3aVCK9xfWW1oGQpTsYJJPF6bfpWfhbRnhdlyhak2ZiyFLDaayz0EP5j9V1RVLAAxlmWKTDfS9wyRyY3hvhPoOg==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-powerpc64le-gnu/-/rollup-linux-powerpc64le-gnu-4.24.0.tgz",
+      "integrity": "sha512-2XFFPJ2XMEiF5Zi2EBf4h73oR1V/lycirxZxHZNc93SqDN/IWhYYSYj8I9381ikUFXZrz2v7r2tOVk2NBwxrWw==",
       "cpu": [
         "ppc64"
       ],
@@ -245,9 +230,9 @@
       ]
     },
     "node_modules/@rollup/rollup-linux-riscv64-gnu": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-gnu/-/rollup-linux-riscv64-gnu-4.22.4.tgz",
-      "integrity": "sha512-ePYIir6VYnhgv2C5Xe9u+ico4t8sZWXschR6fMgoPUK31yQu7hTEJb7bCqivHECwIClJfKgE7zYsh1qTP3WHUA==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-gnu/-/rollup-linux-riscv64-gnu-4.24.0.tgz",
+      "integrity": "sha512-M3Dg4hlwuntUCdzU7KjYqbbd+BLq3JMAOhCKdBE3TcMGMZbKkDdJ5ivNdehOssMCIokNHFOsv7DO4rlEOfyKpg==",
       "cpu": [
         "riscv64"
       ],
@@ -258,9 +243,9 @@
       ]
     },
     "node_modules/@rollup/rollup-linux-s390x-gnu": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-s390x-gnu/-/rollup-linux-s390x-gnu-4.22.4.tgz",
-      "integrity": "sha512-GqFJ9wLlbB9daxhVlrTe61vJtEY99/xB3C8e4ULVsVfflcpmR6c8UZXjtkMA6FhNONhj2eA5Tk9uAVw5orEs4Q==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-s390x-gnu/-/rollup-linux-s390x-gnu-4.24.0.tgz",
+      "integrity": "sha512-mjBaoo4ocxJppTorZVKWFpy1bfFj9FeCMJqzlMQGjpNPY9JwQi7OuS1axzNIk0nMX6jSgy6ZURDZ2w0QW6D56g==",
       "cpu": [
         "s390x"
       ],
@@ -271,9 +256,9 @@
       ]
     },
     "node_modules/@rollup/rollup-linux-x64-gnu": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-x64-gnu/-/rollup-linux-x64-gnu-4.22.4.tgz",
-      "integrity": "sha512-87v0ol2sH9GE3cLQLNEy0K/R0pz1nvg76o8M5nhMR0+Q+BBGLnb35P0fVz4CQxHYXaAOhE8HhlkaZfsdUOlHwg==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-x64-gnu/-/rollup-linux-x64-gnu-4.24.0.tgz",
+      "integrity": "sha512-ZXFk7M72R0YYFN5q13niV0B7G8/5dcQ9JDp8keJSfr3GoZeXEoMHP/HlvqROA3OMbMdfr19IjCeNAnPUG93b6A==",
       "cpu": [
         "x64"
       ],
@@ -284,9 +269,9 @@
       ]
     },
     "node_modules/@rollup/rollup-linux-x64-musl": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-x64-musl/-/rollup-linux-x64-musl-4.22.4.tgz",
-      "integrity": "sha512-UV6FZMUgePDZrFjrNGIWzDo/vABebuXBhJEqrHxrGiU6HikPy0Z3LfdtciIttEUQfuDdCn8fqh7wiFJjCNwO+g==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-x64-musl/-/rollup-linux-x64-musl-4.24.0.tgz",
+      "integrity": "sha512-w1i+L7kAXZNdYl+vFvzSZy8Y1arS7vMgIy8wusXJzRrPyof5LAb02KGr1PD2EkRcl73kHulIID0M501lN+vobQ==",
       "cpu": [
         "x64"
       ],
@@ -297,9 +282,9 @@
       ]
     },
     "node_modules/@rollup/rollup-win32-arm64-msvc": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-arm64-msvc/-/rollup-win32-arm64-msvc-4.22.4.tgz",
-      "integrity": "sha512-BjI+NVVEGAXjGWYHz/vv0pBqfGoUH0IGZ0cICTn7kB9PyjrATSkX+8WkguNjWoj2qSr1im/+tTGRaY+4/PdcQw==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-arm64-msvc/-/rollup-win32-arm64-msvc-4.24.0.tgz",
+      "integrity": "sha512-VXBrnPWgBpVDCVY6XF3LEW0pOU51KbaHhccHw6AS6vBWIC60eqsH19DAeeObl+g8nKAz04QFdl/Cefta0xQtUQ==",
       "cpu": [
         "arm64"
       ],
@@ -310,9 +295,9 @@
       ]
     },
     "node_modules/@rollup/rollup-win32-ia32-msvc": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-ia32-msvc/-/rollup-win32-ia32-msvc-4.22.4.tgz",
-      "integrity": "sha512-SiWG/1TuUdPvYmzmYnmd3IEifzR61Tragkbx9D3+R8mzQqDBz8v+BvZNDlkiTtI9T15KYZhP0ehn3Dld4n9J5g==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-ia32-msvc/-/rollup-win32-ia32-msvc-4.24.0.tgz",
+      "integrity": "sha512-xrNcGDU0OxVcPTH/8n/ShH4UevZxKIO6HJFK0e15XItZP2UcaiLFd5kiX7hJnqCbSztUF8Qot+JWBC/QXRPYWQ==",
       "cpu": [
         "ia32"
       ],
@@ -323,9 +308,9 @@
       ]
     },
     "node_modules/@rollup/rollup-win32-x64-msvc": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-x64-msvc/-/rollup-win32-x64-msvc-4.22.4.tgz",
-      "integrity": "sha512-j8pPKp53/lq9lMXN57S8cFz0MynJk8OWNuUnXct/9KCpKU7DgU3bYMJhwWmcqC0UU29p8Lr0/7KEVcaM6bf47Q==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-x64-msvc/-/rollup-win32-x64-msvc-4.24.0.tgz",
+      "integrity": "sha512-fbMkAF7fufku0N2dE5TBXcNlg0pt0cJue4xBRE2Qc5Vqikxr4VCgKj/ht6SMdFcOacVA9rqF70APJ8RN/4vMJw==",
       "cpu": [
         "x64"
       ],
@@ -344,9 +329,9 @@
       }
     },
     "node_modules/@types/estree": {
-      "version": "1.0.5",
-      "resolved": "https://registry.npmjs.org/@types/estree/-/estree-1.0.5.tgz",
-      "integrity": "sha512-/kYRxGDLWzHOB7q+wtSUQlFrtcdUccpfy+X+9iMBpHK8QLLhx2wIPYuS5DYtR9Wa/YlZAbIovy7qVdB1Aq6Lyw=="
+      "version": "1.0.6",
+      "resolved": "https://registry.npmjs.org/@types/estree/-/estree-1.0.6.tgz",
+      "integrity": "sha512-AYnb1nQyY49te+VRAVgmzfcgjYS91mY5P0TKUDCLEM+gNnA+3T6rWITXRLYCpahpqSQbN5cE+gHpnPyXjHWxcw=="
     },
     "node_modules/@types/estree-jsx": {
       "version": "1.0.5",
@@ -389,14 +374,14 @@
       "integrity": "sha512-Hy6UMpxhE3j1tLpl27exp1XqHD7n8chAiNPzWfz16LPZoMMoSc4dzLl6w9qijkEb/r5O1ozdu1CWGA2L83ZeZg=="
     },
     "node_modules/@types/unist": {
-      "version": "3.0.2",
-      "resolved": "https://registry.npmjs.org/@types/unist/-/unist-3.0.2.tgz",
-      "integrity": "sha512-dqId9J8K/vGi5Zr7oo212BGii5m3q5Hxlkwy3WpYuKPklmBEvsbMYYyLxAQpSffdLl/gdW0XUpKWFvYmyoWCoQ=="
+      "version": "3.0.3",
+      "resolved": "https://registry.npmjs.org/@types/unist/-/unist-3.0.3.tgz",
+      "integrity": "sha512-ko/gIFJRv177XgZsZcBwnqJN5x/Gien8qNOn0D5bQU/zAzVf9Zt3BlcUiLqhV9y4ARk0GbT3tnUiPNgnTXzc/Q=="
     },
     "node_modules/ansi-regex": {
-      "version": "6.0.1",
-      "resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-6.0.1.tgz",
-      "integrity": "sha512-n5M855fKb2SsfMIiFFoVrABHJC8QtHwVx+mHWP3QcEqBHYienj5dHSgjbxtC0WEZXYt4wcD6zrQElDPhFuZgfA==",
+      "version": "6.1.0",
+      "resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-6.1.0.tgz",
+      "integrity": "sha512-7HSX4QQb4CspciLpVFwyRe79O3xsIZDDLER21kERQ71oaPodF8jL725AgJMFAYbooIqolJoRLuM81SpeUkpkvA==",
       "engines": {
         "node": ">=12"
       },
@@ -404,18 +389,6 @@
         "url": "https://github.com/chalk/ansi-regex?sponsor=1"
       }
     },
-    "node_modules/ansi-styles": {
-      "version": "6.2.1",
-      "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-6.2.1.tgz",
-      "integrity": "sha512-bN798gFfQX+viw3R7yrGWRqnrN2oRkEkUjjl4JNn4E8GxxbjtG3FbrEIIY3l8/hrwUwIeCZvi4QuOTP4MErVug==",
-      "dev": true,
-      "engines": {
-        "node": ">=12"
-      },
-      "funding": {
-        "url": "https://github.com/chalk/ansi-styles?sponsor=1"
-      }
-    },
     "node_modules/argparse": {
       "version": "2.0.1",
       "resolved": "https://registry.npmjs.org/argparse/-/argparse-2.0.1.tgz",
@@ -430,33 +403,6 @@
         "url": "https://github.com/sponsors/wooorm"
       }
     },
-    "node_modules/balanced-match": {
-      "version": "1.0.2",
-      "resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-1.0.2.tgz",
-      "integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw==",
-      "dev": true
-    },
-    "node_modules/brace-expansion": {
-      "version": "2.0.1",
-      "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-2.0.1.tgz",
-      "integrity": "sha512-XnAIvQ8eM+kC6aULx6wuQiwVsnzsi9d3WxzV3FpWTGA19F621kwdbsAcFKXgKUHZWsy+mY6iL1sHTxWEFCytDA==",
-      "dev": true,
-      "dependencies": {
-        "balanced-match": "^1.0.0"
-      }
-    },
-    "node_modules/builtin-modules": {
-      "version": "3.3.0",
-      "resolved": "https://registry.npmjs.org/builtin-modules/-/builtin-modules-3.3.0.tgz",
-      "integrity": "sha512-zhaCDicdLuWN5UbN5IMnFqNMhNfo919sH85y2/ea+5Yg9TsTkeZxpL+JLbp6cgYFS4sRLp3YV4S6yDuqVWHYOw==",
-      "dev": true,
-      "engines": {
-        "node": ">=6"
-      },
-      "funding": {
-        "url": "https://github.com/sponsors/sindresorhus"
-      }
-    },
     "node_modules/ccount": {
       "version": "2.0.1",
       "resolved": "https://registry.npmjs.org/ccount/-/ccount-2.0.1.tgz",
@@ -502,11 +448,6 @@
         "url": "https://github.com/sponsors/wooorm"
       }
     },
-    "node_modules/co": {
-      "version": "3.1.0",
-      "resolved": "https://registry.npmjs.org/co/-/co-3.1.0.tgz",
-      "integrity": "sha512-CQsjCRiNObI8AtTsNIBDRMQ4oMR83CzEswHYahClvul7gKk+lDQiOKv+5qh7LQWf5sh6jkZNispz/QlsZxyNgA=="
-    },
     "node_modules/collapse-white-space": {
       "version": "2.1.0",
       "resolved": "https://registry.npmjs.org/collapse-white-space/-/collapse-white-space-2.1.0.tgz",
@@ -516,50 +457,18 @@
         "url": "https://github.com/sponsors/wooorm"
       }
     },
-    "node_modules/color-convert": {
-      "version": "2.0.1",
-      "resolved": "https://registry.npmjs.org/color-convert/-/color-convert-2.0.1.tgz",
-      "integrity": "sha512-RRECPsj7iu/xb5oKYcsFHSppFNnsj/52OVTRKb4zP5onXwVF3zVmmToNcOfGC+CRDpfK/U584fMg38ZHCaElKQ==",
-      "dev": true,
-      "dependencies": {
-        "color-name": "~1.1.4"
-      },
-      "engines": {
-        "node": ">=7.0.0"
-      }
-    },
-    "node_modules/color-name": {
-      "version": "1.1.4",
-      "resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.4.tgz",
-      "integrity": "sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==",
-      "dev": true
-    },
     "node_modules/commondir": {
       "version": "1.0.1",
       "resolved": "https://registry.npmjs.org/commondir/-/commondir-1.0.1.tgz",
       "integrity": "sha512-W9pAhw0ja1Edb5GVdIF1mjZw/ASI0AlShXM83UUGe2DVr5TdAPEA1OA8m/g8zWp9x6On7gqufY+FatDbC3MDQg==",
       "dev": true
     },
-    "node_modules/cross-spawn": {
-      "version": "7.0.3",
-      "resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.3.tgz",
-      "integrity": "sha512-iRDPJKUPVEND7dHPO8rkbOnPpyDygcDFtWjpeWNCgy8WP2rXcxXL8TskReQl6OrB2G7+UJrags1q15Fudc7G6w==",
-      "dev": true,
-      "dependencies": {
-        "path-key": "^3.1.0",
-        "shebang-command": "^2.0.0",
-        "which": "^2.0.1"
-      },
-      "engines": {
-        "node": ">= 8"
-      }
-    },
     "node_modules/debug": {
-      "version": "4.3.5",
-      "resolved": "https://registry.npmjs.org/debug/-/debug-4.3.5.tgz",
-      "integrity": "sha512-pt0bNEmneDIvdL1Xsd9oDQ/wrQRkXDT4AUWlNZNPKvW5x/jyO9VFXkJUP07vQ2upmw5PlaITaPKc31jK13V+jg==",
+      "version": "4.3.7",
+      "resolved": "https://registry.npmjs.org/debug/-/debug-4.3.7.tgz",
+      "integrity": "sha512-Er2nc/H7RrMXZBFCEim6TCmMk02Z8vLC2Rbi1KEBggpo0fS6l0S1nnapwmIi3yW/+GOJap1Krg4w0Hg80oCqgQ==",
       "dependencies": {
-        "ms": "2.1.2"
+        "ms": "^2.1.3"
       },
       "engines": {
         "node": ">=6.0"
@@ -617,10 +526,9 @@
       "integrity": "sha512-I88TYZWc9XiYHRQ4/3c5rjjfgkjhLyW2luGIheGERbNQ6OY7yTybanSpDXZa8y7VUP9YmDcYa+eyq4ca7iLqWA=="
     },
     "node_modules/emoji-regex": {
-      "version": "9.2.2",
-      "resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-9.2.2.tgz",
-      "integrity": "sha512-L18DaJsXSUk2+42pv8mLs5jJT2hqFkFE4j21wOmgbUqsZ2hL72NsUU785g9RXgo3s0ZNgVl42TiHp3ZtOv/Vyg==",
-      "dev": true
+      "version": "10.4.0",
+      "resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-10.4.0.tgz",
+      "integrity": "sha512-EC+0oUMY1Rqm4O6LLrgjtYDvcVYTy7chDnM4Q7030tP4Kwj3u/pR6gP9ygnp2CJMK5Gq+9Q2oqmrFJAz01DXjw=="
     },
     "node_modules/escape-string-regexp": {
       "version": "5.0.0",
@@ -644,20 +552,18 @@
       "resolved": "https://registry.npmjs.org/extend/-/extend-3.0.2.tgz",
       "integrity": "sha512-fjquC59cD7CyW6urNXK0FBufkZcoiGG80wTuPujX590cB5Ttln20E2UB4S/WARVqhXffZl2LNgS+gQdPIIim/g=="
     },
-    "node_modules/foreground-child": {
-      "version": "3.2.1",
-      "resolved": "https://registry.npmjs.org/foreground-child/-/foreground-child-3.2.1.tgz",
-      "integrity": "sha512-PXUUyLqrR2XCWICfv6ukppP96sdFwWbNEnfEMt7jNsISjMsvaLNinAHNDYyvkyU+SZG2BTSbT5NjG+vZslfGTA==",
+    "node_modules/fdir": {
+      "version": "6.4.2",
+      "resolved": "https://registry.npmjs.org/fdir/-/fdir-6.4.2.tgz",
+      "integrity": "sha512-KnhMXsKSPZlAhp7+IjUkRZKPb4fUyccpDrdFXbi4QL1qkmFh9kVY09Yox+n4MaOb3lHZ1Tv829C3oaaXoMYPDQ==",
       "dev": true,
-      "dependencies": {
-        "cross-spawn": "^7.0.0",
-        "signal-exit": "^4.0.1"
-      },
-      "engines": {
-        "node": ">=14"
+      "peerDependencies": {
+        "picomatch": "^3 || ^4"
       },
-      "funding": {
-        "url": "https://github.com/sponsors/isaacs"
+      "peerDependenciesMeta": {
+        "picomatch": {
+          "optional": true
+        }
       }
     },
     "node_modules/fsevents": {
@@ -683,26 +589,6 @@
         "url": "https://github.com/sponsors/ljharb"
       }
     },
-    "node_modules/glob": {
-      "version": "10.4.5",
-      "resolved": "https://registry.npmjs.org/glob/-/glob-10.4.5.tgz",
-      "integrity": "sha512-7Bv8RF0k6xjo7d4A/PxYLbUCfb6c+Vpd2/mB2yRDlew7Jb5hEXiCD9ibfO7wpk8i4sevK6DFny9h7EYbM3/sHg==",
-      "dev": true,
-      "dependencies": {
-        "foreground-child": "^3.1.0",
-        "jackspeak": "^3.1.2",
-        "minimatch": "^9.0.4",
-        "minipass": "^7.1.2",
-        "package-json-from-dist": "^1.0.0",
-        "path-scurry": "^1.11.1"
-      },
-      "bin": {
-        "glob": "dist/esm/bin.mjs"
-      },
-      "funding": {
-        "url": "https://github.com/sponsors/isaacs"
-      }
-    },
     "node_modules/hasown": {
       "version": "2.0.2",
       "resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.2.tgz",
@@ -759,25 +645,10 @@
         "node": ">=4"
       }
     },
-    "node_modules/is-builtin-module": {
-      "version": "3.2.1",
-      "resolved": "https://registry.npmjs.org/is-builtin-module/-/is-builtin-module-3.2.1.tgz",
-      "integrity": "sha512-BSLE3HnV2syZ0FK0iMA/yUGplUeMmNz4AW5fnTunbCIqZi4vG3WjJT9FHMy5D69xmAYBHXQhJdALdpwVxV501A==",
-      "dev": true,
-      "dependencies": {
-        "builtin-modules": "^3.3.0"
-      },
-      "engines": {
-        "node": ">=6"
-      },
-      "funding": {
-        "url": "https://github.com/sponsors/sindresorhus"
-      }
-    },
     "node_modules/is-core-module": {
-      "version": "2.14.0",
-      "resolved": "https://registry.npmjs.org/is-core-module/-/is-core-module-2.14.0.tgz",
-      "integrity": "sha512-a5dFJih5ZLYlRtDc0dZWP7RiKr6xIKzmn/oAYCDvdLThadVgyJwlaoQPmRtMSpz+rk0OGAgIu+TcM9HUF0fk1A==",
+      "version": "2.15.1",
+      "resolved": "https://registry.npmjs.org/is-core-module/-/is-core-module-2.15.1.tgz",
+      "integrity": "sha512-z0vtXSwucUJtANQWldhbtbt7BnL0vxiFjIdDLAatwhDYty2bad6s+rijD6Ri4YuYJubLzIJLUidCh09e1djEVQ==",
       "dev": true,
       "dependencies": {
         "hasown": "^2.0.2"
@@ -798,15 +669,6 @@
         "url": "https://github.com/sponsors/wooorm"
       }
     },
-    "node_modules/is-fullwidth-code-point": {
-      "version": "3.0.0",
-      "resolved": "https://registry.npmjs.org/is-fullwidth-code-point/-/is-fullwidth-code-point-3.0.0.tgz",
-      "integrity": "sha512-zymm5+u+sCsSWyD9qNaejV3DFvhCKclKdizYaJUuHA83RLjb7nSuGnddCHGv0hk+KY7BMAlsWeK4Ueg6EV6XQg==",
-      "dev": true,
-      "engines": {
-        "node": ">=8"
-      }
-    },
     "node_modules/is-hexadecimal": {
       "version": "2.0.1",
       "resolved": "https://registry.npmjs.org/is-hexadecimal/-/is-hexadecimal-2.0.1.tgz",
@@ -842,27 +704,6 @@
         "@types/estree": "*"
       }
     },
-    "node_modules/isexe": {
-      "version": "2.0.0",
-      "resolved": "https://registry.npmjs.org/isexe/-/isexe-2.0.0.tgz",
-      "integrity": "sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw==",
-      "dev": true
-    },
-    "node_modules/jackspeak": {
-      "version": "3.4.3",
-      "resolved": "https://registry.npmjs.org/jackspeak/-/jackspeak-3.4.3.tgz",
-      "integrity": "sha512-OGlZQpz2yfahA/Rd1Y8Cd9SIEsqvXkLVoSw/cgwhnhFMDbsQFeZYoJJ7bIZBS9BcamUW96asq/npPWugM+RQBw==",
-      "dev": true,
-      "dependencies": {
-        "@isaacs/cliui": "^8.0.2"
-      },
-      "funding": {
-        "url": "https://github.com/sponsors/isaacs"
-      },
-      "optionalDependencies": {
-        "@pkgjs/parseargs": "^0.11.0"
-      }
-    },
     "node_modules/js-cleanup": {
       "version": "1.2.0",
       "resolved": "https://registry.npmjs.org/js-cleanup/-/js-cleanup-1.2.0.tgz",
@@ -906,19 +747,13 @@
         "url": "https://github.com/sponsors/wooorm"
       }
     },
-    "node_modules/lru-cache": {
-      "version": "10.4.3",
-      "resolved": "https://registry.npmjs.org/lru-cache/-/lru-cache-10.4.3.tgz",
-      "integrity": "sha512-JNAzZcXrCt42VGLuYz0zfAzDfAvJWW6AfYlDBQyDV5DClI2m5sAmK+OIO7s59XfsRsWHp02jAJrRadPRGTt6SQ==",
-      "dev": true
-    },
     "node_modules/magic-string": {
-      "version": "0.30.10",
-      "resolved": "https://registry.npmjs.org/magic-string/-/magic-string-0.30.10.tgz",
-      "integrity": "sha512-iIRwTIf0QKV3UAnYK4PU8uiEc4SRh5jX0mwpIwETPpHdhVM4f53RSwS/vXvN1JhGX+Cs7B8qIq3d6AH49O5fAQ==",
+      "version": "0.30.12",
+      "resolved": "https://registry.npmjs.org/magic-string/-/magic-string-0.30.12.tgz",
+      "integrity": "sha512-Ea8I3sQMVXr8JhN4z+H/d8zwo+tYDgHE9+5G4Wnrwhs0gaK9fXTKx0Tw5Xwsd/bCPTTZNRAdpyzvoeORe9LYpw==",
       "dev": true,
       "dependencies": {
-        "@jridgewell/sourcemap-codec": "^1.4.15"
+        "@jridgewell/sourcemap-codec": "^1.5.0"
       }
     },
     "node_modules/markdown-table": {
@@ -1019,9 +854,9 @@
       }
     },
     "node_modules/mdast-util-gfm-autolink-literal": {
-      "version": "2.0.0",
-      "resolved": "https://registry.npmjs.org/mdast-util-gfm-autolink-literal/-/mdast-util-gfm-autolink-literal-2.0.0.tgz",
-      "integrity": "sha512-FyzMsduZZHSc3i0Px3PQcBT4WJY/X/RCtEJKuybiC6sjPqLv7h1yqAkmILZtuxMSsUyaLUWNp71+vQH2zqp5cg==",
+      "version": "2.0.1",
+      "resolved": "https://registry.npmjs.org/mdast-util-gfm-autolink-literal/-/mdast-util-gfm-autolink-literal-2.0.1.tgz",
+      "integrity": "sha512-5HVP2MKaP6L+G6YaxPNjuL0BPrq9orG3TsrZ9YXbA3vDw/ACI4MEsnoDpn6ZNm7GnZgtAcONJyPhOP8tNJQavQ==",
       "dependencies": {
         "@types/mdast": "^4.0.0",
         "ccount": "^2.0.0",
@@ -1124,9 +959,9 @@
       }
     },
     "node_modules/mdast-util-mdx-expression": {
-      "version": "2.0.0",
-      "resolved": "https://registry.npmjs.org/mdast-util-mdx-expression/-/mdast-util-mdx-expression-2.0.0.tgz",
-      "integrity": "sha512-fGCu8eWdKUKNu5mohVGkhBXCXGnOTLuFqOvGMvdikr+J1w7lDJgxThOKpwRWzzbyXAU2hhSwsmssOY4yTokluw==",
+      "version": "2.0.1",
+      "resolved": "https://registry.npmjs.org/mdast-util-mdx-expression/-/mdast-util-mdx-expression-2.0.1.tgz",
+      "integrity": "sha512-J6f+9hUp+ldTZqKRSg7Vw5V6MqjATc+3E4gf3CFNcuZNWD8XdyI6zQ8GqH7f8169MM6P7hMBRDVGnn7oHB9kXQ==",
       "dependencies": {
         "@types/estree-jsx": "^1.0.0",
         "@types/hast": "^3.0.0",
@@ -1141,9 +976,9 @@
       }
     },
     "node_modules/mdast-util-mdx-jsx": {
-      "version": "3.1.2",
-      "resolved": "https://registry.npmjs.org/mdast-util-mdx-jsx/-/mdast-util-mdx-jsx-3.1.2.tgz",
-      "integrity": "sha512-eKMQDeywY2wlHc97k5eD8VC+9ASMjN8ItEZQNGwJ6E0XWKiW/Z0V5/H8pvoXUf+y+Mj0VIgeRRbujBmFn4FTyA==",
+      "version": "3.1.3",
+      "resolved": "https://registry.npmjs.org/mdast-util-mdx-jsx/-/mdast-util-mdx-jsx-3.1.3.tgz",
+      "integrity": "sha512-bfOjvNt+1AcbPLTFMFWY149nJz0OjmewJs3LQQ5pIyVGxP4CdOqNVJL6kTaM5c68p8q82Xv3nCyFfUnuEcH3UQ==",
       "dependencies": {
         "@types/estree-jsx": "^1.0.0",
         "@types/hast": "^3.0.0",
@@ -1155,7 +990,6 @@
         "mdast-util-to-markdown": "^2.0.0",
         "parse-entities": "^4.0.0",
         "stringify-entities": "^4.0.0",
-        "unist-util-remove-position": "^5.0.0",
         "unist-util-stringify-position": "^4.0.0",
         "vfile-message": "^4.0.0"
       },
@@ -1760,40 +1594,10 @@
         }
       ]
     },
-    "node_modules/minimatch": {
-      "version": "9.0.5",
-      "resolved": "https://registry.npmjs.org/minimatch/-/minimatch-9.0.5.tgz",
-      "integrity": "sha512-G6T0ZX48xgozx7587koeX9Ys2NYy6Gmv//P89sEte9V9whIapMNF4idKxnW2QtCcLiTWlb/wfCabAtAFWhhBow==",
-      "dev": true,
-      "dependencies": {
-        "brace-expansion": "^2.0.1"
-      },
-      "engines": {
-        "node": ">=16 || 14 >=14.17"
-      },
-      "funding": {
-        "url": "https://github.com/sponsors/isaacs"
-      }
-    },
-    "node_modules/minipass": {
-      "version": "7.1.2",
-      "resolved": "https://registry.npmjs.org/minipass/-/minipass-7.1.2.tgz",
-      "integrity": "sha512-qOOzS1cBTWYF4BH8fVePDBOO9iptMnGUEZwNc/cMWnTV2nVLZ7VoNWEPHkYczZA0pdoA7dl6e7FL659nX9S2aw==",
-      "dev": true,
-      "engines": {
-        "node": ">=16 || 14 >=14.17"
-      }
-    },
     "node_modules/ms": {
-      "version": "2.1.2",
-      "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.2.tgz",
-      "integrity": "sha512-sGkPx+VjMtmA6MX27oA4FBFELFCZZ4S4XqeGOXCv68tT+jb3vk/RyaKWP0PTKyWtmLSM0b+adUTEvbs1PEaH2w=="
-    },
-    "node_modules/package-json-from-dist": {
-      "version": "1.0.0",
-      "resolved": "https://registry.npmjs.org/package-json-from-dist/-/package-json-from-dist-1.0.0.tgz",
-      "integrity": "sha512-dATvCeZN/8wQsGywez1mzHtTlP22H8OEfPrVMLNr4/eGa+ijtLn/6M5f0dY8UKNrC2O9UCU6SSoG3qRKnt7STw==",
-      "dev": true
+      "version": "2.1.3",
+      "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
+      "integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA=="
     },
     "node_modules/parse-entities": {
       "version": "4.0.1",
@@ -1815,18 +1619,9 @@
       }
     },
     "node_modules/parse-entities/node_modules/@types/unist": {
-      "version": "2.0.10",
-      "resolved": "https://registry.npmjs.org/@types/unist/-/unist-2.0.10.tgz",
-      "integrity": "sha512-IfYcSBWE3hLpBg8+X2SEa8LVkJdJEkT2Ese2aaLs3ptGdVtABxndrMaxuFlQ1qdFf9Q5rDvDpxI3WwgvKFAsQA=="
-    },
-    "node_modules/path-key": {
-      "version": "3.1.1",
-      "resolved": "https://registry.npmjs.org/path-key/-/path-key-3.1.1.tgz",
-      "integrity": "sha512-ojmeN0qd+y0jszEtoY48r0Peq5dwMEkIlCOu6Q5f41lfkswXuKtYrhgoTpLnyIcHm24Uhqx+5Tqm2InSwLhE6Q==",
-      "dev": true,
-      "engines": {
-        "node": ">=8"
-      }
+      "version": "2.0.11",
+      "resolved": "https://registry.npmjs.org/@types/unist/-/unist-2.0.11.tgz",
+      "integrity": "sha512-CmBKiL6NNo/OqgmMn95Fk9Whlp2mtvIv+KNpQKN2F4SjvrEesubTRWGYSg+BnWZOnlCaSTU1sMpsBOzgbYhnsA=="
     },
     "node_modules/path-parse": {
       "version": "1.0.7",
@@ -1834,22 +1629,6 @@
       "integrity": "sha512-LDJzPVEEEPR+y48z93A0Ed0yXb8pAByGWo/k5YYdYgpY2/2EsOsksJrq7lOHxryrVOn1ejG6oAp8ahvOIQD8sw==",
       "dev": true
     },
-    "node_modules/path-scurry": {
-      "version": "1.11.1",
-      "resolved": "https://registry.npmjs.org/path-scurry/-/path-scurry-1.11.1.tgz",
-      "integrity": "sha512-Xa4Nw17FS9ApQFJ9umLiJS4orGjm7ZzwUrwamcGQuHSzDyth9boKDaycYdDcZDuqYATXw4HFXgaqWTctW/v1HA==",
-      "dev": true,
-      "dependencies": {
-        "lru-cache": "^10.2.0",
-        "minipass": "^5.0.0 || ^6.0.2 || ^7.0.0"
-      },
-      "engines": {
-        "node": ">=16 || 14 >=14.18"
-      },
-      "funding": {
-        "url": "https://github.com/sponsors/isaacs"
-      }
-    },
     "node_modules/perf-regexes": {
       "version": "1.0.1",
       "resolved": "https://registry.npmjs.org/perf-regexes/-/perf-regexes-1.0.1.tgz",
@@ -1860,12 +1639,12 @@
       }
     },
     "node_modules/picomatch": {
-      "version": "2.3.1",
-      "resolved": "https://registry.npmjs.org/picomatch/-/picomatch-2.3.1.tgz",
-      "integrity": "sha512-JU3teHTNjmE2VCGFzuY8EXzCDVwEqB2a8fsIvwaStHhAWJEeVd1o1QD80CU6+ZdEXXSLbSsuLwJjkCBWqRQUVA==",
+      "version": "4.0.2",
+      "resolved": "https://registry.npmjs.org/picomatch/-/picomatch-4.0.2.tgz",
+      "integrity": "sha512-M7BAV6Rlcy5u+m6oPhAPFgJTzAioX/6B0DxyvDlo9l8+T3nLKbrczg2WLUyzd45L8RqfUMyGPzekbMvX2Ldkwg==",
       "dev": true,
       "engines": {
-        "node": ">=8.6"
+        "node": ">=12"
       },
       "funding": {
         "url": "https://github.com/sponsors/jonschlinkert"
@@ -2424,23 +2203,12 @@
       }
     },
     "node_modules/remark-lint-no-trailing-spaces": {
-      "version": "2.0.1",
-      "resolved": "https://registry.npmjs.org/remark-lint-no-trailing-spaces/-/remark-lint-no-trailing-spaces-2.0.1.tgz",
-      "integrity": "sha512-cj8t+nvtO6eAY2lJC7o5du8VeOCK13XiDUHL4U6k5aw6ZLr3EYWbQ/rNc6cr60eHkh5Ldm09KiZjV3CWpxqJ0g==",
-      "dependencies": {
-        "unified-lint-rule": "^1.0.2"
-      }
-    },
-    "node_modules/remark-lint-no-trailing-spaces/node_modules/unified-lint-rule": {
-      "version": "1.0.6",
-      "resolved": "https://registry.npmjs.org/unified-lint-rule/-/unified-lint-rule-1.0.6.tgz",
-      "integrity": "sha512-YPK15YBFwnsVorDFG/u0cVVQN5G2a3V8zv5/N6KN3TCG+ajKtaALcy7u14DCSrJI+gZeyYquFL9cioJXOGXSvg==",
+      "version": "3.0.2",
+      "resolved": "https://registry.npmjs.org/remark-lint-no-trailing-spaces/-/remark-lint-no-trailing-spaces-3.0.2.tgz",
+      "integrity": "sha512-4KxOdzZ+BlCZDu9yYsKGOfB8fknukxYD+9VhI1I5l7Ns7TgqdYq4k/qwUfTHpQ4TF9aokL/tOpsEGOc4PGKTKw==",
       "dependencies": {
-        "wrapped": "^1.0.1"
-      },
-      "funding": {
-        "type": "opencollective",
-        "url": "https://opencollective.com/unified"
+        "unified-lint-rule": "^3.0.0",
+        "vfile-location": "^5.0.3"
       }
     },
     "node_modules/remark-lint-no-undefined-references": {
@@ -2508,9 +2276,9 @@
       }
     },
     "node_modules/remark-lint-prohibited-strings/node_modules/@types/unist": {
-      "version": "2.0.10",
-      "resolved": "https://registry.npmjs.org/@types/unist/-/unist-2.0.10.tgz",
-      "integrity": "sha512-IfYcSBWE3hLpBg8+X2SEa8LVkJdJEkT2Ese2aaLs3ptGdVtABxndrMaxuFlQ1qdFf9Q5rDvDpxI3WwgvKFAsQA=="
+      "version": "2.0.11",
+      "resolved": "https://registry.npmjs.org/@types/unist/-/unist-2.0.11.tgz",
+      "integrity": "sha512-CmBKiL6NNo/OqgmMn95Fk9Whlp2mtvIv+KNpQKN2F4SjvrEesubTRWGYSg+BnWZOnlCaSTU1sMpsBOzgbYhnsA=="
     },
     "node_modules/remark-lint-prohibited-strings/node_modules/unified": {
       "version": "10.1.2",
@@ -2703,9 +2471,9 @@
       }
     },
     "node_modules/remark-preset-lint-node": {
-      "version": "5.1.0",
-      "resolved": "https://registry.npmjs.org/remark-preset-lint-node/-/remark-preset-lint-node-5.1.0.tgz",
-      "integrity": "sha512-Nt7f1K37qUQZ9OomkXyCH0vsKihQ+qXR+4+r8Emw67hfOrVWhNzJw2jy96bJHMMWLPD1Fp4Q8cSGQxhthaORkQ==",
+      "version": "5.1.2",
+      "resolved": "https://registry.npmjs.org/remark-preset-lint-node/-/remark-preset-lint-node-5.1.2.tgz",
+      "integrity": "sha512-ukBPfLqD05AomGL+Z3tbmBCKTaEM+9Dv8Pn0r/0vok8F95Z0wj/AY70cFhm038ID1vKBD07anky11dvigDAHlw==",
       "dependencies": {
         "js-yaml": "^4.1.0",
         "remark-gfm": "^4.0.0",
@@ -2730,7 +2498,7 @@
         "remark-lint-no-shell-dollars": "^4.0.0",
         "remark-lint-no-table-indentation": "^5.0.0",
         "remark-lint-no-tabs": "^4.0.0",
-        "remark-lint-no-trailing-spaces": "^2.0.1",
+        "remark-lint-no-trailing-spaces": "^3.0.2",
         "remark-lint-prohibited-strings": "^4.0.0",
         "remark-lint-rule-style": "^4.0.0",
         "remark-lint-strong-marker": "^4.0.0",
@@ -2804,12 +2572,12 @@
       }
     },
     "node_modules/rollup": {
-      "version": "4.22.4",
-      "resolved": "https://registry.npmjs.org/rollup/-/rollup-4.22.4.tgz",
-      "integrity": "sha512-vD8HJ5raRcWOyymsR6Z3o6+RzfEPCnVLMFJ6vRslO1jt4LO6dUo5Qnpg7y4RkZFM2DMe3WUirkI5c16onjrc6A==",
+      "version": "4.24.0",
+      "resolved": "https://registry.npmjs.org/rollup/-/rollup-4.24.0.tgz",
+      "integrity": "sha512-DOmrlGSXNk1DM0ljiQA+i+o0rSLhtii1je5wgk60j49d1jHT5YYttBv1iWOnYSTG+fZZESUOSNiAl89SIet+Cg==",
       "dev": true,
       "dependencies": {
-        "@types/estree": "1.0.5"
+        "@types/estree": "1.0.6"
       },
       "bin": {
         "rollup": "dist/bin/rollup"
@@ -2819,22 +2587,22 @@
         "npm": ">=8.0.0"
       },
       "optionalDependencies": {
-        "@rollup/rollup-android-arm-eabi": "4.22.4",
-        "@rollup/rollup-android-arm64": "4.22.4",
-        "@rollup/rollup-darwin-arm64": "4.22.4",
-        "@rollup/rollup-darwin-x64": "4.22.4",
-        "@rollup/rollup-linux-arm-gnueabihf": "4.22.4",
-        "@rollup/rollup-linux-arm-musleabihf": "4.22.4",
-        "@rollup/rollup-linux-arm64-gnu": "4.22.4",
-        "@rollup/rollup-linux-arm64-musl": "4.22.4",
-        "@rollup/rollup-linux-powerpc64le-gnu": "4.22.4",
-        "@rollup/rollup-linux-riscv64-gnu": "4.22.4",
-        "@rollup/rollup-linux-s390x-gnu": "4.22.4",
-        "@rollup/rollup-linux-x64-gnu": "4.22.4",
-        "@rollup/rollup-linux-x64-musl": "4.22.4",
-        "@rollup/rollup-win32-arm64-msvc": "4.22.4",
-        "@rollup/rollup-win32-ia32-msvc": "4.22.4",
-        "@rollup/rollup-win32-x64-msvc": "4.22.4",
+        "@rollup/rollup-android-arm-eabi": "4.24.0",
+        "@rollup/rollup-android-arm64": "4.24.0",
+        "@rollup/rollup-darwin-arm64": "4.24.0",
+        "@rollup/rollup-darwin-x64": "4.24.0",
+        "@rollup/rollup-linux-arm-gnueabihf": "4.24.0",
+        "@rollup/rollup-linux-arm-musleabihf": "4.24.0",
+        "@rollup/rollup-linux-arm64-gnu": "4.24.0",
+        "@rollup/rollup-linux-arm64-musl": "4.24.0",
+        "@rollup/rollup-linux-powerpc64le-gnu": "4.24.0",
+        "@rollup/rollup-linux-riscv64-gnu": "4.24.0",
+        "@rollup/rollup-linux-s390x-gnu": "4.24.0",
+        "@rollup/rollup-linux-x64-gnu": "4.24.0",
+        "@rollup/rollup-linux-x64-musl": "4.24.0",
+        "@rollup/rollup-win32-arm64-msvc": "4.24.0",
+        "@rollup/rollup-win32-ia32-msvc": "4.24.0",
+        "@rollup/rollup-win32-x64-msvc": "4.24.0",
         "fsevents": "~2.3.2"
       }
     },
@@ -2870,9 +2638,9 @@
       "dev": true
     },
     "node_modules/semver": {
-      "version": "7.6.2",
-      "resolved": "https://registry.npmjs.org/semver/-/semver-7.6.2.tgz",
-      "integrity": "sha512-FNAIBWCx9qcRhoHcgcJ0gvU7SN1lYU2ZXuSfl04bSC5OpvDHFyJCjdNHomPXxjQlCBU67YW64PzY7/VIEH7F2w==",
+      "version": "7.6.3",
+      "resolved": "https://registry.npmjs.org/semver/-/semver-7.6.3.tgz",
+      "integrity": "sha512-oVekP1cKtI+CTDvHWYFUcMtsK/00wmAEfyqKfNdARm8u1wNVhSgaX7A8d4UuIlUI5e84iEwOhs7ZPYRmzU9U6A==",
       "bin": {
         "semver": "bin/semver.js"
       },
@@ -2880,39 +2648,6 @@
         "node": ">=10"
       }
     },
-    "node_modules/shebang-command": {
-      "version": "2.0.0",
-      "resolved": "https://registry.npmjs.org/shebang-command/-/shebang-command-2.0.0.tgz",
-      "integrity": "sha512-kHxr2zZpYtdmrN1qDjrrX/Z1rR1kG8Dx+gkpK1G4eXmvXswmcE1hTWBWYUzlraYw1/yZp6YuDY77YtvbN0dmDA==",
-      "dev": true,
-      "dependencies": {
-        "shebang-regex": "^3.0.0"
-      },
-      "engines": {
-        "node": ">=8"
-      }
-    },
-    "node_modules/shebang-regex": {
-      "version": "3.0.0",
-      "resolved": "https://registry.npmjs.org/shebang-regex/-/shebang-regex-3.0.0.tgz",
-      "integrity": "sha512-7++dFhtcx3353uBaq8DDR4NuxBetBzC7ZQOhmTQInHEd6bSrXdiEyzCvG07Z44UYdLShWUyXt5M/yhz8ekcb1A==",
-      "dev": true,
-      "engines": {
-        "node": ">=8"
-      }
-    },
-    "node_modules/signal-exit": {
-      "version": "4.1.0",
-      "resolved": "https://registry.npmjs.org/signal-exit/-/signal-exit-4.1.0.tgz",
-      "integrity": "sha512-bzyZ1e88w9O1iNJbKnOlvYTrWPDl46O1bG0D3XInv+9tkPrxrN8jUUTiFlDkkmKWgn1M6CfIA13SuGqOa9Korw==",
-      "dev": true,
-      "engines": {
-        "node": ">=14"
-      },
-      "funding": {
-        "url": "https://github.com/sponsors/isaacs"
-      }
-    },
     "node_modules/skip-regex": {
       "version": "1.0.2",
       "resolved": "https://registry.npmjs.org/skip-regex/-/skip-regex-1.0.2.tgz",
@@ -2922,11 +2657,6 @@
         "node": ">=4.2"
       }
     },
-    "node_modules/sliced": {
-      "version": "1.0.1",
-      "resolved": "https://registry.npmjs.org/sliced/-/sliced-1.0.1.tgz",
-      "integrity": "sha512-VZBmZP8WU3sMOZm1bdgTadsQbcscK0UM8oKxKVBs4XAhUo2Xxzm/OFMGBkPusxw9xL3Uy8LrzEqGqJhclsr0yA=="
-    },
     "node_modules/sourcemap-codec": {
       "version": "1.4.8",
       "resolved": "https://registry.npmjs.org/sourcemap-codec/-/sourcemap-codec-1.4.8.tgz",
@@ -2944,64 +2674,21 @@
       }
     },
     "node_modules/string-width": {
-      "version": "5.1.2",
-      "resolved": "https://registry.npmjs.org/string-width/-/string-width-5.1.2.tgz",
-      "integrity": "sha512-HnLOCR3vjcY8beoNLtcjZ5/nxn2afmME6lhrDrebokqMap+XbeW8n9TXpPDOqdGK5qcI3oT0GKTW6wC7EMiVqA==",
-      "dev": true,
+      "version": "6.1.0",
+      "resolved": "https://registry.npmjs.org/string-width/-/string-width-6.1.0.tgz",
+      "integrity": "sha512-k01swCJAgQmuADB0YIc+7TuatfNvTBVOoaUWJjTB9R4VJzR5vNWzf5t42ESVZFPS8xTySF7CAdV4t/aaIm3UnQ==",
       "dependencies": {
         "eastasianwidth": "^0.2.0",
-        "emoji-regex": "^9.2.2",
+        "emoji-regex": "^10.2.1",
         "strip-ansi": "^7.0.1"
       },
       "engines": {
-        "node": ">=12"
+        "node": ">=16"
       },
       "funding": {
         "url": "https://github.com/sponsors/sindresorhus"
       }
     },
-    "node_modules/string-width-cjs": {
-      "name": "string-width",
-      "version": "4.2.3",
-      "resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz",
-      "integrity": "sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==",
-      "dev": true,
-      "dependencies": {
-        "emoji-regex": "^8.0.0",
-        "is-fullwidth-code-point": "^3.0.0",
-        "strip-ansi": "^6.0.1"
-      },
-      "engines": {
-        "node": ">=8"
-      }
-    },
-    "node_modules/string-width-cjs/node_modules/ansi-regex": {
-      "version": "5.0.1",
-      "resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-5.0.1.tgz",
-      "integrity": "sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ==",
-      "dev": true,
-      "engines": {
-        "node": ">=8"
-      }
-    },
-    "node_modules/string-width-cjs/node_modules/emoji-regex": {
-      "version": "8.0.0",
-      "resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-8.0.0.tgz",
-      "integrity": "sha512-MSjYzcWNOA0ewAHpz0MxpYFvwg6yjy1NG3xteoqz644VCo/RPgnr1/GGt+ic3iJTzQ8Eu3TdM14SawnVUmGE6A==",
-      "dev": true
-    },
-    "node_modules/string-width-cjs/node_modules/strip-ansi": {
-      "version": "6.0.1",
-      "resolved": "https://registry.npmjs.org/strip-ansi/-/strip-ansi-6.0.1.tgz",
-      "integrity": "sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A==",
-      "dev": true,
-      "dependencies": {
-        "ansi-regex": "^5.0.1"
-      },
-      "engines": {
-        "node": ">=8"
-      }
-    },
     "node_modules/stringify-entities": {
       "version": "4.0.4",
       "resolved": "https://registry.npmjs.org/stringify-entities/-/stringify-entities-4.0.4.tgz",
@@ -3029,28 +2716,6 @@
         "url": "https://github.com/chalk/strip-ansi?sponsor=1"
       }
     },
-    "node_modules/strip-ansi-cjs": {
-      "name": "strip-ansi",
-      "version": "6.0.1",
-      "resolved": "https://registry.npmjs.org/strip-ansi/-/strip-ansi-6.0.1.tgz",
-      "integrity": "sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A==",
-      "dev": true,
-      "dependencies": {
-        "ansi-regex": "^5.0.1"
-      },
-      "engines": {
-        "node": ">=8"
-      }
-    },
-    "node_modules/strip-ansi-cjs/node_modules/ansi-regex": {
-      "version": "5.0.1",
-      "resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-5.0.1.tgz",
-      "integrity": "sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ==",
-      "dev": true,
-      "engines": {
-        "node": ">=8"
-      }
-    },
     "node_modules/supports-color": {
       "version": "9.4.0",
       "resolved": "https://registry.npmjs.org/supports-color/-/supports-color-9.4.0.tgz",
@@ -3171,19 +2836,6 @@
         "url": "https://opencollective.com/unified"
       }
     },
-    "node_modules/unist-util-remove-position": {
-      "version": "5.0.0",
-      "resolved": "https://registry.npmjs.org/unist-util-remove-position/-/unist-util-remove-position-5.0.0.tgz",
-      "integrity": "sha512-Hp5Kh3wLxv0PHj9m2yZhhLt58KzPtEYKQQ4yxfYFEO7EvHwzyDYnduhHnY1mDxoqr7VUwVuHXk9RXKIiYS1N8Q==",
-      "dependencies": {
-        "@types/unist": "^3.0.0",
-        "unist-util-visit": "^5.0.0"
-      },
-      "funding": {
-        "type": "opencollective",
-        "url": "https://opencollective.com/unified"
-      }
-    },
     "node_modules/unist-util-stringify-position": {
       "version": "4.0.0",
       "resolved": "https://registry.npmjs.org/unist-util-stringify-position/-/unist-util-stringify-position-4.0.0.tgz",
@@ -3224,12 +2876,11 @@
       }
     },
     "node_modules/vfile": {
-      "version": "6.0.1",
-      "resolved": "https://registry.npmjs.org/vfile/-/vfile-6.0.1.tgz",
-      "integrity": "sha512-1bYqc7pt6NIADBJ98UiG0Bn/CHIVOoZ/IyEkqIruLg0mE1BKzkOXY2D6CSqQIcKqgadppE5lrxgWXJmXd7zZJw==",
+      "version": "6.0.3",
+      "resolved": "https://registry.npmjs.org/vfile/-/vfile-6.0.3.tgz",
+      "integrity": "sha512-KzIbH/9tXat2u30jf+smMwFCsno4wHVdNmzFyL+T/L3UGqqk6JKfVqOFOZEpZSHADH1k40ab6NUIXZq422ov3Q==",
       "dependencies": {
         "@types/unist": "^3.0.0",
-        "unist-util-stringify-position": "^4.0.0",
         "vfile-message": "^4.0.0"
       },
       "funding": {
@@ -3238,9 +2889,9 @@
       }
     },
     "node_modules/vfile-location": {
-      "version": "5.0.2",
-      "resolved": "https://registry.npmjs.org/vfile-location/-/vfile-location-5.0.2.tgz",
-      "integrity": "sha512-NXPYyxyBSH7zB5U6+3uDdd6Nybz6o6/od9rk8bp9H8GR3L+cm/fC0uUTbqBmUTnMCUDslAGBOIKNfvvb+gGlDg==",
+      "version": "5.0.3",
+      "resolved": "https://registry.npmjs.org/vfile-location/-/vfile-location-5.0.3.tgz",
+      "integrity": "sha512-5yXvWDEgqeiYiBe1lbxYF7UMAIm/IcopxMHrMQDq3nvKcjPKIhZklUKL+AE7J7uApI4kwe2snsK+eI6UTj9EHg==",
       "dependencies": {
         "@types/unist": "^3.0.0",
         "vfile": "^6.0.0"
@@ -3282,27 +2933,6 @@
         "url": "https://opencollective.com/unified"
       }
     },
-    "node_modules/vfile-reporter/node_modules/emoji-regex": {
-      "version": "10.3.0",
-      "resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-10.3.0.tgz",
-      "integrity": "sha512-QpLs9D9v9kArv4lfDEgg1X/gN5XLnf/A6l9cs8SPZLRZR3ZkY9+kwIQTxm+fsSej5UMYGE8fdoaZVIBlqG0XTw=="
-    },
-    "node_modules/vfile-reporter/node_modules/string-width": {
-      "version": "6.1.0",
-      "resolved": "https://registry.npmjs.org/string-width/-/string-width-6.1.0.tgz",
-      "integrity": "sha512-k01swCJAgQmuADB0YIc+7TuatfNvTBVOoaUWJjTB9R4VJzR5vNWzf5t42ESVZFPS8xTySF7CAdV4t/aaIm3UnQ==",
-      "dependencies": {
-        "eastasianwidth": "^0.2.0",
-        "emoji-regex": "^10.2.1",
-        "strip-ansi": "^7.0.1"
-      },
-      "engines": {
-        "node": ">=16"
-      },
-      "funding": {
-        "url": "https://github.com/sponsors/sindresorhus"
-      }
-    },
     "node_modules/vfile-sort": {
       "version": "4.0.0",
       "resolved": "https://registry.npmjs.org/vfile-sort/-/vfile-sort-4.0.0.tgz",
@@ -3329,121 +2959,6 @@
         "url": "https://opencollective.com/unified"
       }
     },
-    "node_modules/which": {
-      "version": "2.0.2",
-      "resolved": "https://registry.npmjs.org/which/-/which-2.0.2.tgz",
-      "integrity": "sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA==",
-      "dev": true,
-      "dependencies": {
-        "isexe": "^2.0.0"
-      },
-      "bin": {
-        "node-which": "bin/node-which"
-      },
-      "engines": {
-        "node": ">= 8"
-      }
-    },
-    "node_modules/wrap-ansi": {
-      "version": "8.1.0",
-      "resolved": "https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-8.1.0.tgz",
-      "integrity": "sha512-si7QWI6zUMq56bESFvagtmzMdGOtoxfR+Sez11Mobfc7tm+VkUckk9bW2UeffTGVUbOksxmSw0AA2gs8g71NCQ==",
-      "dev": true,
-      "dependencies": {
-        "ansi-styles": "^6.1.0",
-        "string-width": "^5.0.1",
-        "strip-ansi": "^7.0.1"
-      },
-      "engines": {
-        "node": ">=12"
-      },
-      "funding": {
-        "url": "https://github.com/chalk/wrap-ansi?sponsor=1"
-      }
-    },
-    "node_modules/wrap-ansi-cjs": {
-      "name": "wrap-ansi",
-      "version": "7.0.0",
-      "resolved": "https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-7.0.0.tgz",
-      "integrity": "sha512-YVGIj2kamLSTxw6NsZjoBxfSwsn0ycdesmc4p+Q21c5zPuZ1pl+NfxVdxPtdHvmNVOQ6XSYG4AUtyt/Fi7D16Q==",
-      "dev": true,
-      "dependencies": {
-        "ansi-styles": "^4.0.0",
-        "string-width": "^4.1.0",
-        "strip-ansi": "^6.0.0"
-      },
-      "engines": {
-        "node": ">=10"
-      },
-      "funding": {
-        "url": "https://github.com/chalk/wrap-ansi?sponsor=1"
-      }
-    },
-    "node_modules/wrap-ansi-cjs/node_modules/ansi-regex": {
-      "version": "5.0.1",
-      "resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-5.0.1.tgz",
-      "integrity": "sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ==",
-      "dev": true,
-      "engines": {
-        "node": ">=8"
-      }
-    },
-    "node_modules/wrap-ansi-cjs/node_modules/ansi-styles": {
-      "version": "4.3.0",
-      "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-4.3.0.tgz",
-      "integrity": "sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==",
-      "dev": true,
-      "dependencies": {
-        "color-convert": "^2.0.1"
-      },
-      "engines": {
-        "node": ">=8"
-      },
-      "funding": {
-        "url": "https://github.com/chalk/ansi-styles?sponsor=1"
-      }
-    },
-    "node_modules/wrap-ansi-cjs/node_modules/emoji-regex": {
-      "version": "8.0.0",
-      "resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-8.0.0.tgz",
-      "integrity": "sha512-MSjYzcWNOA0ewAHpz0MxpYFvwg6yjy1NG3xteoqz644VCo/RPgnr1/GGt+ic3iJTzQ8Eu3TdM14SawnVUmGE6A==",
-      "dev": true
-    },
-    "node_modules/wrap-ansi-cjs/node_modules/string-width": {
-      "version": "4.2.3",
-      "resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz",
-      "integrity": "sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==",
-      "dev": true,
-      "dependencies": {
-        "emoji-regex": "^8.0.0",
-        "is-fullwidth-code-point": "^3.0.0",
-        "strip-ansi": "^6.0.1"
-      },
-      "engines": {
-        "node": ">=8"
-      }
-    },
-    "node_modules/wrap-ansi-cjs/node_modules/strip-ansi": {
-      "version": "6.0.1",
-      "resolved": "https://registry.npmjs.org/strip-ansi/-/strip-ansi-6.0.1.tgz",
-      "integrity": "sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A==",
-      "dev": true,
-      "dependencies": {
-        "ansi-regex": "^5.0.1"
-      },
-      "engines": {
-        "node": ">=8"
-      }
-    },
-    "node_modules/wrapped": {
-      "version": "1.0.1",
-      "resolved": "https://registry.npmjs.org/wrapped/-/wrapped-1.0.1.tgz",
-      "integrity": "sha512-ZTKuqiTu3WXtL72UKCCnQLRax2IScKH7oQ+mvjbpvNE+NJxIWIemDqqM2GxNr4N16NCjOYpIgpin5pStM7kM5g==",
-      "dependencies": {
-        "co": "3.1.0",
-        "sliced": "^1.0.1"
-      }
-    },
     "node_modules/zwitch": {
       "version": "2.0.4",
       "resolved": "https://registry.npmjs.org/zwitch/-/zwitch-2.0.4.tgz",
diff --git a/tools/lint-md/package.json b/tools/lint-md/package.json
index 3ab5185d969f50..dbc662730253f6 100644
--- a/tools/lint-md/package.json
+++ b/tools/lint-md/package.json
@@ -7,16 +7,16 @@
   },
   "dependencies": {
     "remark-parse": "^11.0.0",
-    "remark-preset-lint-node": "^5.1.0",
+    "remark-preset-lint-node": "^5.1.2",
     "remark-stringify": "^11.0.0",
     "to-vfile": "^8.0.0",
     "unified": "^11.0.5",
     "vfile-reporter": "^8.1.1"
   },
   "devDependencies": {
-    "@rollup/plugin-commonjs": "^26.0.1",
-    "@rollup/plugin-node-resolve": "^15.2.3",
-    "rollup": "^4.22.4",
+    "@rollup/plugin-commonjs": "^28.0.1",
+    "@rollup/plugin-node-resolve": "^15.3.0",
+    "rollup": "^4.24.0",
     "rollup-plugin-cleanup": "^3.2.1"
   }
 }

From 86cb697b81ddfaebbb98d336f570c052aac43959 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Tue, 22 Oct 2024 18:09:32 +0200
Subject: [PATCH 045/216] esm: add a fallback when importer is not a file

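The fallback pattern applied at each throw site can be sketched in
isolation as below. This is a minimal illustration, not the code in
resolve.js: the helper name `describeBase` is hypothetical (the patch
inlines the try/catch instead of sharing a helper), and the `data:` URL
is only a sample importer.

```javascript
const { fileURLToPath } = require('node:url');

// Hypothetical helper: resolve.js repeats this try/catch inline.
function describeBase(base) {
  try {
    // Works when `base` is a file: URL that maps to a local path.
    return fileURLToPath(base);
  } catch {
    // `base` is not a usable file: URL (for example a data: URL),
    // so report it unchanged instead of throwing a second error.
    return base;
  }
}

console.log(describeBase('data:text/javascript,export{}'));
```

With this shape, the error raised for the bad specifier is the one the
user sees, rather than an unrelated URL-scheme error from
`fileURLToPath()`.
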
PR-URL: https://github.com/nodejs/node/pull/55471
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
---
 lib/internal/modules/esm/resolve.js     | 24 +++++++++++++++++++++---
 test/es-module/test-esm-main-lookup.mjs |  8 ++++++++
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/lib/internal/modules/esm/resolve.js b/lib/internal/modules/esm/resolve.js
index e73a8ad60a1392..c37d769f40088f 100644
--- a/lib/internal/modules/esm/resolve.js
+++ b/lib/internal/modules/esm/resolve.js
@@ -236,9 +236,15 @@ const encodedSepRegEx = /%2F|%5C/i;
  */
 function finalizeResolution(resolved, base, preserveSymlinks) {
   if (RegExpPrototypeExec(encodedSepRegEx, resolved.pathname) !== null) {
+    let basePath;
+    try {
+      basePath = fileURLToPath(base);
+    } catch {
+      basePath = base;
+    }
     throw new ERR_INVALID_MODULE_SPECIFIER(
       resolved.pathname, 'must not include encoded "/" or "\\" characters',
-      fileURLToPath(base));
+      basePath);
   }
 
   let path;
@@ -256,14 +262,26 @@ function finalizeResolution(resolved, base, preserveSymlinks) {
 
   // Check for stats.isDirectory()
   if (stats === 1) {
-    throw new ERR_UNSUPPORTED_DIR_IMPORT(path, fileURLToPath(base), String(resolved));
+    let basePath;
+    try {
+      basePath = fileURLToPath(base);
+    } catch {
+      basePath = base;
+    }
+    throw new ERR_UNSUPPORTED_DIR_IMPORT(path, basePath, String(resolved));
   } else if (stats !== 0) {
     // Check for !stats.isFile()
     if (process.env.WATCH_REPORT_DEPENDENCIES && process.send) {
       process.send({ 'watch:require': [path || resolved.pathname] });
     }
+    let basePath;
+    try {
+      basePath = fileURLToPath(base);
+    } catch {
+      basePath = base;
+    }
     throw new ERR_MODULE_NOT_FOUND(
-      path || resolved.pathname, base && fileURLToPath(base), resolved);
+      path || resolved.pathname, basePath, resolved);
   }
 
   if (!preserveSymlinks) {
diff --git a/test/es-module/test-esm-main-lookup.mjs b/test/es-module/test-esm-main-lookup.mjs
index 4f4f1c378914d7..30042514f26e0e 100644
--- a/test/es-module/test-esm-main-lookup.mjs
+++ b/test/es-module/test-esm-main-lookup.mjs
@@ -15,6 +15,14 @@ await assert.rejects(import('../fixtures/es-modules/pjson-main'), {
   code: 'ERR_UNSUPPORTED_DIR_IMPORT',
   url: fixtures.fileURL('es-modules/pjson-main').href,
 });
+await assert.rejects(import(`data:text/javascript,import${encodeURIComponent(JSON.stringify(fixtures.fileURL('es-modules/pjson-main')))}`), {
+  code: 'ERR_UNSUPPORTED_DIR_IMPORT',
+  url: fixtures.fileURL('es-modules/pjson-main').href,
+});
+await assert.rejects(import(`data:text/javascript,import${encodeURIComponent(JSON.stringify(fixtures.fileURL('es-modules/does-not-exist')))}`), {
+  code: 'ERR_MODULE_NOT_FOUND',
+  url: fixtures.fileURL('es-modules/does-not-exist').href,
+});
 
 assert.deepStrictEqual(
   { ...await import('../fixtures/es-modules/pjson-main/main.mjs') },

From ac583d45497d0b91a53473537733738c7588ea15 Mon Sep 17 00:00:00 2001
From: Marvin ROGER <marvinroger@users.noreply.github.com>
Date: Tue, 22 Oct 2024 18:14:02 +0200
Subject: [PATCH 046/216] stream: propagate AbortSignal reason

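A rough sketch of what the change exposes to callers, assuming a plain
`Error` as a stand-in for the internal `AbortError` class (which is not
public API). The helper name `abortErrorFrom` is hypothetical.

```javascript
// Hypothetical stand-in for the internal AbortError constructor.
function abortErrorFrom(signal) {
  const err = new Error('The operation was aborted', {
    // Mirrors the patch: forward the signal's reason as the cause.
    cause: signal?.reason,
  });
  err.name = 'AbortError';
  return err;
}

const ac = new AbortController();
const reason = new Error('Reason');
ac.abort(reason);

const err = abortErrorFrom(ac.signal);
console.log(err.name, err.cause === reason);
```

A pipeline callback can then inspect `err.cause` to tell why the signal
was aborted, which is what the updated test asserts.
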
PR-URL: https://github.com/nodejs/node/pull/55473
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Robert Nagy <ronagy@icloud.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
---
 lib/internal/streams/pipeline.js      | 2 +-
 test/parallel/test-stream-pipeline.js | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/internal/streams/pipeline.js b/lib/internal/streams/pipeline.js
index 098ae7462cb55c..598153fb39ef64 100644
--- a/lib/internal/streams/pipeline.js
+++ b/lib/internal/streams/pipeline.js
@@ -204,7 +204,7 @@ function pipelineImpl(streams, callback, opts) {
   validateAbortSignal(outerSignal, 'options.signal');
 
   function abort() {
-    finishImpl(new AbortError());
+    finishImpl(new AbortError(undefined, { cause: outerSignal?.reason }));
   }
 
   addAbortListener ??= require('internal/events/abort_listener').addAbortListener;
diff --git a/test/parallel/test-stream-pipeline.js b/test/parallel/test-stream-pipeline.js
index 8237fff33b3ac8..d31c598abf03e5 100644
--- a/test/parallel/test-stream-pipeline.js
+++ b/test/parallel/test-stream-pipeline.js
@@ -1331,12 +1331,13 @@ const tsp = require('timers/promises');
 
 {
   const ac = new AbortController();
+  const reason = new Error('Reason');
   const r = Readable.from(async function* () {
     for (let i = 0; i < 10; i++) {
       await Promise.resolve();
       yield String(i);
       if (i === 5) {
-        ac.abort();
+        ac.abort(reason);
       }
     }
   }());
@@ -1349,6 +1350,7 @@ const tsp = require('timers/promises');
   });
   const cb = common.mustCall((err) => {
     assert.strictEqual(err.name, 'AbortError');
+    assert.strictEqual(err.cause, reason);
     assert.strictEqual(res, '012345');
     assert.strictEqual(w.destroyed, true);
     assert.strictEqual(r.destroyed, true);

From 8f9c6423697f7c083c805f73fc657e1d6a67bd17 Mon Sep 17 00:00:00 2001
From: Cheng <git@zcbenz.com>
Date: Wed, 23 Oct 2024 12:48:59 +0900
Subject: [PATCH 047/216] build: fix GN build for cares/uv deps
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55477
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
---
 deps/cares/unofficial.gni | 8 ++++----
 deps/uv/unofficial.gni    | 1 +
 unofficial.gni            | 1 +
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/deps/cares/unofficial.gni b/deps/cares/unofficial.gni
index 9296548239fcde..e02d7f425194c9 100644
--- a/deps/cares/unofficial.gni
+++ b/deps/cares/unofficial.gni
@@ -38,7 +38,10 @@ template("cares_gn_build") {
       ]
     }
 
-    include_dirs = [ "src/lib" ]
+    include_dirs = [
+      "src/lib",
+      "src/lib/include",
+    ]
     if (is_win) {
       include_dirs += [ "config/win32" ]
     } else if (is_linux) {
@@ -55,9 +58,6 @@ template("cares_gn_build") {
     }
 
     sources = gypi_values.cares_sources_common
-    if (is_win) {
-      sources += gypi_values.cares_sources_win
-    }
     if (is_linux) {
       sources += [ "config/linux/ares_config.h" ]
     }
diff --git a/deps/uv/unofficial.gni b/deps/uv/unofficial.gni
index ce30341044e907..7a73f891e3fc32 100644
--- a/deps/uv/unofficial.gni
+++ b/deps/uv/unofficial.gni
@@ -40,6 +40,7 @@ template("uv_gn_build") {
         "-Wno-extra-semi",
         "-Wno-implicit-fallthrough",
         "-Wno-missing-braces",
+        "-Wno-sign-compare",
         "-Wno-string-conversion",
         "-Wno-shadow",
         "-Wno-unreachable-code",
diff --git a/unofficial.gni b/unofficial.gni
index b2dd0b9fa3c7aa..c967fabd0c4906 100644
--- a/unofficial.gni
+++ b/unofficial.gni
@@ -68,6 +68,7 @@ template("node_gn_build") {
       "-Wno-extra-semi",
       "-Wno-implicit-fallthrough",
       "-Wno-macro-redefined",
+      "-Wno-missing-braces",
       "-Wno-return-type",
       "-Wno-shadow",
       "-Wno-sometimes-uninitialized",

From c1b63e5e6bd60eeab78ef8cc45e881876d1cfde3 Mon Sep 17 00:00:00 2001
From: adriancuadrado <29214635+adriancuadrado@users.noreply.github.com>
Date: Wed, 23 Oct 2024 14:06:41 +0200
Subject: [PATCH 048/216] doc: changed the command used to verify SHASUMS256
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55420
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
---
 README.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 2bb65e5dcc4d6d..efbd31e03f9bdb 100644
--- a/README.md
+++ b/README.md
@@ -104,11 +104,10 @@ To download `SHASUMS256.txt` using `curl`:
 curl -O https://nodejs.org/dist/vx.y.z/SHASUMS256.txt
 ```
 
-To check that a downloaded file matches the checksum, run
-it through `sha256sum` with a command such as:
+To check that downloaded files match the checksum, use `sha256sum`:
 
 ```bash
-grep node-vx.y.z.tar.gz SHASUMS256.txt | sha256sum -c -
+sha256sum -c SHASUMS256.txt --ignore-missing
 ```
 
 For Current and LTS, the GPG detached signature of `SHASUMS256.txt` is in

From ceafb3250d9960462c37fab76466ade0b03bd0ec Mon Sep 17 00:00:00 2001
From: Shelley Vohr <shelley.vohr@gmail.com>
Date: Thu, 24 Oct 2024 15:37:15 +0200
Subject: [PATCH 049/216] test,crypto: make crypto tests work with BoringSSL

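The relaxed message pattern used in test-crypto-dh-errors.js can be
checked on its own. The sample strings below are assumptions chosen to
show the two spellings the pattern tolerates; they are not copied from
either library's actual output.

```javascript
// Same pattern as in the updated test: space or underscore, any case.
const badGenerator = /(?:bad[_ ]generator)/i;

// Assumed sample messages, for illustration only.
const samples = ['bad generator', 'BAD_GENERATOR', 'bad-generator'];

for (const message of samples) {
  console.log(message, badGenerator.test(message));
}
```
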
PR-URL: https://github.com/nodejs/node/pull/55491
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Filip Skokan <panva.ip@gmail.com>
---
 test/parallel/test-crypto-dh-errors.js               | 4 ++--
 test/parallel/test-crypto-private-decrypt-gh32240.js | 2 +-
 test/parallel/test-tls-getcertificate-x509.js        | 9 ++-------
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/test/parallel/test-crypto-dh-errors.js b/test/parallel/test-crypto-dh-errors.js
index fcf1922bcdba73..476ca64b4425b5 100644
--- a/test/parallel/test-crypto-dh-errors.js
+++ b/test/parallel/test-crypto-dh-errors.js
@@ -43,7 +43,7 @@ for (const g of [-1, 1]) {
   const ex = {
     code: 'ERR_OSSL_DH_BAD_GENERATOR',
     name: 'Error',
-    message: /bad generator/,
+    message: /(?:bad[_ ]generator)/i,
   };
   assert.throws(() => crypto.createDiffieHellman('abcdef', g), ex);
   assert.throws(() => crypto.createDiffieHellman('abcdef', 'hex', g), ex);
@@ -55,7 +55,7 @@ for (const g of [Buffer.from([]),
   const ex = {
     code: 'ERR_OSSL_DH_BAD_GENERATOR',
     name: 'Error',
-    message: /bad generator/,
+    message: /(?:bad[_ ]generator)/i,
   };
   assert.throws(() => crypto.createDiffieHellman('abcdef', g), ex);
   assert.throws(() => crypto.createDiffieHellman('abcdef', 'hex', g), ex);
diff --git a/test/parallel/test-crypto-private-decrypt-gh32240.js b/test/parallel/test-crypto-private-decrypt-gh32240.js
index 1785f5eef3d202..e88227a215ba4f 100644
--- a/test/parallel/test-crypto-private-decrypt-gh32240.js
+++ b/test/parallel/test-crypto-private-decrypt-gh32240.js
@@ -24,7 +24,7 @@ const pkeyEncrypted =
   pair.privateKey.export({
     type: 'pkcs1',
     format: 'pem',
-    cipher: 'aes128',
+    cipher: 'aes-128-cbc',
     passphrase: 'secret',
   });
 
diff --git a/test/parallel/test-tls-getcertificate-x509.js b/test/parallel/test-tls-getcertificate-x509.js
index aa685ca9e09cf0..704aa33e6edfab 100644
--- a/test/parallel/test-tls-getcertificate-x509.js
+++ b/test/parallel/test-tls-getcertificate-x509.js
@@ -20,9 +20,7 @@ const server = tls.createServer(options, function(cleartext) {
 server.once('secureConnection', common.mustCall(function(socket) {
   const cert = socket.getX509Certificate();
   assert(cert instanceof X509Certificate);
-  assert.strictEqual(
-    cert.serialNumber,
-    '5B75D77EDC7FB5B7FA9F1424DA4C64FB815DCBDE');
+  assert.match(cert.serialNumber, /5B75D77EDC7FB5B7FA9F1424DA4C64FB815DCBDE/i);
 }));
 
 server.listen(0, common.mustCall(function() {
@@ -33,10 +31,7 @@ server.listen(0, common.mustCall(function() {
     const peerCert = socket.getPeerX509Certificate();
     assert(peerCert.issuerCertificate instanceof X509Certificate);
     assert.strictEqual(peerCert.issuerCertificate.issuerCertificate, undefined);
-    assert.strictEqual(
-      peerCert.issuerCertificate.serialNumber,
-      '147D36C1C2F74206DE9FAB5F2226D78ADB00A425'
-    );
+    assert.match(peerCert.issuerCertificate.serialNumber, /147D36C1C2F74206DE9FAB5F2226D78ADB00A425/i);
     server.close();
   }));
   socket.end('Hello');

From 29862ae105620860956486d41be64d4f7fb52820 Mon Sep 17 00:00:00 2001
From: Jason Zhang <xzha4350@gmail.com>
Date: Fri, 25 Oct 2024 20:40:39 +1030
Subject: [PATCH 050/216] doc: add jazelly to collaborators

PR-URL: https://github.com/nodejs/node/pull/55531
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index efbd31e03f9bdb..f51b1487f86589 100644
--- a/README.md
+++ b/README.md
@@ -357,6 +357,8 @@ For information about the governance of the Node.js project, see
   **Jacob Smith** <<jacob@frende.me>> (he/him)
 * [jasnell](https://github.com/jasnell) -
   **James M Snell** <<jasnell@gmail.com>> (he/him)
+* [jazelly](https://github.com/jazelly) -
+  **Jason Zhang** <<xzha4350@gmail.com>> (he/him)
 * [jkrems](https://github.com/jkrems) -
   **Jan Krems** <<jan.krems@gmail.com>> (he/him)
 * [joyeecheung](https://github.com/joyeecheung) -

From e9b0ff482b65a194cfd0f2f779a3f9b8cb959e95 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Juan=20Jos=C3=A9?= <soyjuanarbol@gmail.com>
Date: Fri, 25 Oct 2024 12:39:45 -0500
Subject: [PATCH 051/216] test: increase test coverage for
 `http.OutgoingMessage.appendHeader()`
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Juan José Arboleda <soyjuanarbol@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/55467
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
---
 test/parallel/test-http-multiple-headers.js | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/test/parallel/test-http-multiple-headers.js b/test/parallel/test-http-multiple-headers.js
index 8f52f817273c52..b83360edca2c4f 100644
--- a/test/parallel/test-http-multiple-headers.js
+++ b/test/parallel/test-http-multiple-headers.js
@@ -65,6 +65,10 @@ const server = createServer(
       res.write('BODY');
       res.end();
 
+      assert.throws(() => res.appendHeader(), {
+        code: 'ERR_HTTP_HEADERS_SENT',
+      });
+
       assert.deepStrictEqual(res.getHeader('X-Res-a'), ['AAA', 'BBB', 'CCC']);
       assert.deepStrictEqual(res.getHeader('x-res-a'), ['AAA', 'BBB', 'CCC']);
       assert.deepStrictEqual(

From 8f6462f40bf180a179c5f325695b3ab58dc98a34 Mon Sep 17 00:00:00 2001
From: Livia Medeiros <livia@cirno.name>
Date: Sun, 27 Oct 2024 05:27:20 +0900
Subject: [PATCH 052/216] test: avoid `apply()` calls with a large number of
 elements

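A small sketch of the replacement expression, runnable outside the test
suite. Passing tens of thousands of arguments through `apply()` can
exceed an engine's argument limit, so the string is built one code unit
at a time instead.

```javascript
// Build a string containing every UTF-16 code unit from 0 to 65535
// without passing a huge argument list to String.fromCharCode().
const allCharsString = Array.from(
  { length: 65536 },
  (_, i) => String.fromCharCode(i),
).join('');

console.log(allCharsString.length);
```
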
PR-URL: https://github.com/nodejs/node/pull/55501
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Chemi Atlow <chemi@atlow.co.il>
---
 test/parallel/test-buffer-includes.js | 5 +----
 test/parallel/test-buffer-indexof.js  | 5 +----
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/test/parallel/test-buffer-includes.js b/test/parallel/test-buffer-includes.js
index 57b3a33aaa487f..3d6138faca813f 100644
--- a/test/parallel/test-buffer-includes.js
+++ b/test/parallel/test-buffer-includes.js
@@ -213,10 +213,7 @@ assert(!asciiString.includes('\x2061'));
 assert(asciiString.includes('leb', 0));
 
 // Search in string containing many non-ASCII chars.
-const allCodePoints = [];
-for (let i = 0; i < 65534; i++) allCodePoints[i] = i;
-const allCharsString = String.fromCharCode.apply(String, allCodePoints) +
-    String.fromCharCode(65534, 65535);
+const allCharsString = Array.from({ length: 65536 }, (_, i) => String.fromCharCode(i)).join('');
 const allCharsBufferUtf8 = Buffer.from(allCharsString);
 const allCharsBufferUcs2 = Buffer.from(allCharsString, 'ucs2');
 
diff --git a/test/parallel/test-buffer-indexof.js b/test/parallel/test-buffer-indexof.js
index 01cbd3dd5ab186..108b2f3071557e 100644
--- a/test/parallel/test-buffer-indexof.js
+++ b/test/parallel/test-buffer-indexof.js
@@ -285,10 +285,7 @@ assert.strictEqual(-1, asciiString.indexOf('\x2061'));
 assert.strictEqual(asciiString.indexOf('leb', 0), 3);
 
 // Search in string containing many non-ASCII chars.
-const allCodePoints = [];
-for (let i = 0; i < 65534; i++) allCodePoints[i] = i;
-const allCharsString = String.fromCharCode.apply(String, allCodePoints) +
-    String.fromCharCode(65534, 65535);
+const allCharsString = Array.from({ length: 65536 }, (_, i) => String.fromCharCode(i)).join('');
 const allCharsBufferUtf8 = Buffer.from(allCharsString);
 const allCharsBufferUcs2 = Buffer.from(allCharsString, 'ucs2');
 

From 362b01b275770fe357584c883955fb9be14436be Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alfredo=20Gonz=C3=A1lez?=
 <12631491+mfdebian@users.noreply.github.com>
Date: Sat, 26 Oct 2024 17:36:25 -0300
Subject: [PATCH 053/216] doc: add esm examples to node:string_decoder

PR-URL: https://github.com/nodejs/node/pull/55507
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Chemi Atlow <chemi@atlow.co.il>
---
 doc/api/string_decoder.md | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/doc/api/string_decoder.md b/doc/api/string_decoder.md
index 18960f6acb1736..5d848cfb6d550b 100644
--- a/doc/api/string_decoder.md
+++ b/doc/api/string_decoder.md
@@ -10,13 +10,29 @@ The `node:string_decoder` module provides an API for decoding `Buffer` objects
 into strings in a manner that preserves encoded multi-byte UTF-8 and UTF-16
 characters. It can be accessed using:
 
-```js
+```mjs
+import { StringDecoder } from 'node:string_decoder';
+```
+
+```cjs
 const { StringDecoder } = require('node:string_decoder');
 ```
 
 The following example shows the basic use of the `StringDecoder` class.
 
-```js
+```mjs
+import { StringDecoder } from 'node:string_decoder';
+import { Buffer } from 'node:buffer';
+const decoder = new StringDecoder('utf8');
+
+const cent = Buffer.from([0xC2, 0xA2]);
+console.log(decoder.write(cent)); // Prints: ¢
+
+const euro = Buffer.from([0xE2, 0x82, 0xAC]);
+console.log(decoder.write(euro)); // Prints: €
+```
+
+```cjs
 const { StringDecoder } = require('node:string_decoder');
 const decoder = new StringDecoder('utf8');
 
@@ -35,7 +51,17 @@ next call to `stringDecoder.write()` or until `stringDecoder.end()` is called.
 In the following example, the three UTF-8 encoded bytes of the European Euro
 symbol (`€`) are written over three separate operations:
 
-```js
+```mjs
+import { StringDecoder } from 'node:string_decoder';
+import { Buffer } from 'node:buffer';
+const decoder = new StringDecoder('utf8');
+
+decoder.write(Buffer.from([0xE2]));
+decoder.write(Buffer.from([0x82]));
+console.log(decoder.end(Buffer.from([0xAC]))); // Prints: €
+```
+
+```cjs
 const { StringDecoder } = require('node:string_decoder');
 const decoder = new StringDecoder('utf8');
 

From 0b6d62c81297b4b5807d7b0f87efda88bb085207 Mon Sep 17 00:00:00 2001
From: Shelley Vohr <shelley.vohr@gmail.com>
Date: Mon, 28 Oct 2024 10:27:31 +0100
Subject: [PATCH 054/216] build: fix GN arg used in generate_config_gypi.py

PR-URL: https://github.com/nodejs/node/pull/55530
Reviewed-By: Cheng Zhao <zcbenz@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 tools/generate_config_gypi.py | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/tools/generate_config_gypi.py b/tools/generate_config_gypi.py
index 1110c1caa934d9..206da7f08eaea4 100755
--- a/tools/generate_config_gypi.py
+++ b/tools/generate_config_gypi.py
@@ -19,11 +19,7 @@
 
 # Regex used for parsing results of "gn args".
 GN_RE = re.compile(r'(\w+)\s+=\s+(.*?)$', re.MULTILINE)
-
-if sys.platform == 'win32':
-  GN = 'gn.exe'
-else:
-  GN = 'gn'
+GN = 'gn.bat' if sys.platform == 'win32' else 'gn'
 
 def bool_to_number(v):
   return 1 if v else 0

From 07359ec14f8e1f7f1c6687012a5c4294f7814eb4 Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Mon, 28 Oct 2024 21:36:48 -0400
Subject: [PATCH 055/216] deps: update acorn to 8.13.0

PR-URL: https://github.com/nodejs/node/pull/55558
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
---
 deps/acorn/acorn/CHANGELOG.md   |  6 ++++++
 deps/acorn/acorn/dist/acorn.js  | 10 +++++-----
 deps/acorn/acorn/dist/acorn.mjs | 10 +++++-----
 deps/acorn/acorn/package.json   |  2 +-
 src/acorn_version.h             |  2 +-
 5 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/deps/acorn/acorn/CHANGELOG.md b/deps/acorn/acorn/CHANGELOG.md
index c404a235c5eef4..1e090161fffa80 100644
--- a/deps/acorn/acorn/CHANGELOG.md
+++ b/deps/acorn/acorn/CHANGELOG.md
@@ -1,3 +1,9 @@
+## 8.13.0 (2024-10-16)
+
+### New features
+
+Upgrade to Unicode 16.0.
+
 ## 8.12.1 (2024-07-03)
 
 ### Bug fixes
diff --git a/deps/acorn/acorn/dist/acorn.js b/deps/acorn/acorn/dist/acorn.js
index 68bf2a714e294d..7cd26fa36b5caa 100644
--- a/deps/acorn/acorn/dist/acorn.js
+++ b/deps/acorn/acorn/dist/acorn.js
@@ -5,16 +5,16 @@
 })(this, (function (exports) { 'use strict';
 
   // This file was generated. Do not modify manually!
-  var astralIdentifierCodes = [509, 0, 227, 0, 150, 4, 294, 9, 1368, 2, 2, 1, 6, 3, 41, 2, 5, 0, 166, 1, 574, 3, 9, 9, 370, 1, 81, 2, 71, 10, 50, 3, 123, 2, 54, 14, 32, 10, 3, 1, 11, 3, 46, 10, 8, 0, 46, 9, 7, 2, 37, 13, 2, 9, 6, 1, 45, 0, 13, 2, 49, 13, 9, 3, 2, 11, 83, 11, 7, 0, 3, 0, 158, 11, 6, 9, 7, 3, 56, 1, 2, 6, 3, 1, 3, 2, 10, 0, 11, 1, 3, 6, 4, 4, 193, 17, 10, 9, 5, 0, 82, 19, 13, 9, 214, 6, 3, 8, 28, 1, 83, 16, 16, 9, 82, 12, 9, 9, 84, 14, 5, 9, 243, 14, 166, 9, 71, 5, 2, 1, 3, 3, 2, 0, 2, 1, 13, 9, 120, 6, 3, 6, 4, 0, 29, 9, 41, 6, 2, 3, 9, 0, 10, 10, 47, 15, 406, 7, 2, 7, 17, 9, 57, 21, 2, 13, 123, 5, 4, 0, 2, 1, 2, 6, 2, 0, 9, 9, 49, 4, 2, 1, 2, 4, 9, 9, 330, 3, 10, 1, 2, 0, 49, 6, 4, 4, 14, 9, 5351, 0, 7, 14, 13835, 9, 87, 9, 39, 4, 60, 6, 26, 9, 1014, 0, 2, 54, 8, 3, 82, 0, 12, 1, 19628, 1, 4706, 45, 3, 22, 543, 4, 4, 5, 9, 7, 3, 6, 31, 3, 149, 2, 1418, 49, 513, 54, 5, 49, 9, 0, 15, 0, 23, 4, 2, 14, 1361, 6, 2, 16, 3, 6, 2, 1, 2, 4, 101, 0, 161, 6, 10, 9, 357, 0, 62, 13, 499, 13, 983, 6, 110, 6, 6, 9, 4759, 9, 787719, 239];
+  var astralIdentifierCodes = [509, 0, 227, 0, 150, 4, 294, 9, 1368, 2, 2, 1, 6, 3, 41, 2, 5, 0, 166, 1, 574, 3, 9, 9, 7, 9, 32, 4, 318, 1, 80, 3, 71, 10, 50, 3, 123, 2, 54, 14, 32, 10, 3, 1, 11, 3, 46, 10, 8, 0, 46, 9, 7, 2, 37, 13, 2, 9, 6, 1, 45, 0, 13, 2, 49, 13, 9, 3, 2, 11, 83, 11, 7, 0, 3, 0, 158, 11, 6, 9, 7, 3, 56, 1, 2, 6, 3, 1, 3, 2, 10, 0, 11, 1, 3, 6, 4, 4, 68, 8, 2, 0, 3, 0, 2, 3, 2, 4, 2, 0, 15, 1, 83, 17, 10, 9, 5, 0, 82, 19, 13, 9, 214, 6, 3, 8, 28, 1, 83, 16, 16, 9, 82, 12, 9, 9, 7, 19, 58, 14, 5, 9, 243, 14, 166, 9, 71, 5, 2, 1, 3, 3, 2, 0, 2, 1, 13, 9, 120, 6, 3, 6, 4, 0, 29, 9, 41, 6, 2, 3, 9, 0, 10, 10, 47, 15, 343, 9, 54, 7, 2, 7, 17, 9, 57, 21, 2, 13, 123, 5, 4, 0, 2, 1, 2, 6, 2, 0, 9, 9, 49, 4, 2, 1, 2, 4, 9, 9, 330, 3, 10, 1, 2, 0, 49, 6, 4, 4, 14, 10, 5350, 0, 7, 14, 11465, 27, 2343, 9, 87, 9, 39, 4, 60, 6, 26, 9, 535, 9, 470, 0, 2, 54, 8, 3, 82, 0, 12, 1, 19628, 1, 4178, 9, 519, 45, 3, 22, 543, 4, 4, 5, 9, 7, 3, 6, 31, 3, 149, 2, 1418, 49, 513, 54, 5, 49, 9, 0, 15, 0, 23, 4, 2, 14, 1361, 6, 2, 16, 3, 6, 2, 1, 2, 4, 101, 0, 161, 6, 10, 9, 357, 0, 62, 13, 499, 13, 245, 1, 2, 9, 726, 6, 110, 6, 6, 9, 4759, 9, 787719, 239];
 
   // This file was generated. Do not modify manually!
-  var astralIdentifierStartCodes = [0, 11, 2, 25, 2, 18, 2, 1, 2, 14, 3, 13, 35, 122, 70, 52, 268, 28, 4, 48, 48, 31, 14, 29, 6, 37, 11, 29, 3, 35, 5, 7, 2, 4, 43, 157, 19, 35, 5, 35, 5, 39, 9, 51, 13, 10, 2, 14, 2, 6, 2, 1, 2, 10, 2, 14, 2, 6, 2, 1, 68, 310, 10, 21, 11, 7, 25, 5, 2, 41, 2, 8, 70, 5, 3, 0, 2, 43, 2, 1, 4, 0, 3, 22, 11, 22, 10, 30, 66, 18, 2, 1, 11, 21, 11, 25, 71, 55, 7, 1, 65, 0, 16, 3, 2, 2, 2, 28, 43, 28, 4, 28, 36, 7, 2, 27, 28, 53, 11, 21, 11, 18, 14, 17, 111, 72, 56, 50, 14, 50, 14, 35, 349, 41, 7, 1, 79, 28, 11, 0, 9, 21, 43, 17, 47, 20, 28, 22, 13, 52, 58, 1, 3, 0, 14, 44, 33, 24, 27, 35, 30, 0, 3, 0, 9, 34, 4, 0, 13, 47, 15, 3, 22, 0, 2, 0, 36, 17, 2, 24, 20, 1, 64, 6, 2, 0, 2, 3, 2, 14, 2, 9, 8, 46, 39, 7, 3, 1, 3, 21, 2, 6, 2, 1, 2, 4, 4, 0, 19, 0, 13, 4, 159, 52, 19, 3, 21, 2, 31, 47, 21, 1, 2, 0, 185, 46, 42, 3, 37, 47, 21, 0, 60, 42, 14, 0, 72, 26, 38, 6, 186, 43, 117, 63, 32, 7, 3, 0, 3, 7, 2, 1, 2, 23, 16, 0, 2, 0, 95, 7, 3, 38, 17, 0, 2, 0, 29, 0, 11, 39, 8, 0, 22, 0, 12, 45, 20, 0, 19, 72, 264, 8, 2, 36, 18, 0, 50, 29, 113, 6, 2, 1, 2, 37, 22, 0, 26, 5, 2, 1, 2, 31, 15, 0, 328, 18, 16, 0, 2, 12, 2, 33, 125, 0, 80, 921, 103, 110, 18, 195, 2637, 96, 16, 1071, 18, 5, 4026, 582, 8634, 568, 8, 30, 18, 78, 18, 29, 19, 47, 17, 3, 32, 20, 6, 18, 689, 63, 129, 74, 6, 0, 67, 12, 65, 1, 2, 0, 29, 6135, 9, 1237, 43, 8, 8936, 3, 2, 6, 2, 1, 2, 290, 16, 0, 30, 2, 3, 0, 15, 3, 9, 395, 2309, 106, 6, 12, 4, 8, 8, 9, 5991, 84, 2, 70, 2, 1, 3, 0, 3, 1, 3, 3, 2, 11, 2, 0, 2, 6, 2, 64, 2, 3, 3, 7, 2, 6, 2, 27, 2, 3, 2, 4, 2, 0, 4, 6, 2, 339, 3, 24, 2, 24, 2, 30, 2, 24, 2, 30, 2, 24, 2, 30, 2, 24, 2, 30, 2, 24, 2, 7, 1845, 30, 7, 5, 262, 61, 147, 44, 11, 6, 17, 0, 322, 29, 19, 43, 485, 27, 757, 6, 2, 3, 2, 1, 2, 14, 2, 196, 60, 67, 8, 0, 1205, 3, 2, 26, 2, 1, 2, 0, 3, 0, 2, 9, 2, 3, 2, 0, 2, 0, 7, 0, 5, 0, 2, 0, 2, 0, 2, 2, 2, 1, 2, 0, 3, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 1, 2, 0, 3, 3, 2, 6, 2, 3, 2, 3, 2, 0, 2, 9, 2, 16, 6, 2, 2, 4, 2, 16, 4421, 42719, 
33, 4153, 7, 221, 3, 5761, 15, 7472, 16, 621, 2467, 541, 1507, 4938, 6, 4191];
+  var astralIdentifierStartCodes = [0, 11, 2, 25, 2, 18, 2, 1, 2, 14, 3, 13, 35, 122, 70, 52, 268, 28, 4, 48, 48, 31, 14, 29, 6, 37, 11, 29, 3, 35, 5, 7, 2, 4, 43, 157, 19, 35, 5, 35, 5, 39, 9, 51, 13, 10, 2, 14, 2, 6, 2, 1, 2, 10, 2, 14, 2, 6, 2, 1, 4, 51, 13, 310, 10, 21, 11, 7, 25, 5, 2, 41, 2, 8, 70, 5, 3, 0, 2, 43, 2, 1, 4, 0, 3, 22, 11, 22, 10, 30, 66, 18, 2, 1, 11, 21, 11, 25, 71, 55, 7, 1, 65, 0, 16, 3, 2, 2, 2, 28, 43, 28, 4, 28, 36, 7, 2, 27, 28, 53, 11, 21, 11, 18, 14, 17, 111, 72, 56, 50, 14, 50, 14, 35, 39, 27, 10, 22, 251, 41, 7, 1, 17, 2, 60, 28, 11, 0, 9, 21, 43, 17, 47, 20, 28, 22, 13, 52, 58, 1, 3, 0, 14, 44, 33, 24, 27, 35, 30, 0, 3, 0, 9, 34, 4, 0, 13, 47, 15, 3, 22, 0, 2, 0, 36, 17, 2, 24, 20, 1, 64, 6, 2, 0, 2, 3, 2, 14, 2, 9, 8, 46, 39, 7, 3, 1, 3, 21, 2, 6, 2, 1, 2, 4, 4, 0, 19, 0, 13, 4, 31, 9, 2, 0, 3, 0, 2, 37, 2, 0, 26, 0, 2, 0, 45, 52, 19, 3, 21, 2, 31, 47, 21, 1, 2, 0, 185, 46, 42, 3, 37, 47, 21, 0, 60, 42, 14, 0, 72, 26, 38, 6, 186, 43, 117, 63, 32, 7, 3, 0, 3, 7, 2, 1, 2, 23, 16, 0, 2, 0, 95, 7, 3, 38, 17, 0, 2, 0, 29, 0, 11, 39, 8, 0, 22, 0, 12, 45, 20, 0, 19, 72, 200, 32, 32, 8, 2, 36, 18, 0, 50, 29, 113, 6, 2, 1, 2, 37, 22, 0, 26, 5, 2, 1, 2, 31, 15, 0, 328, 18, 16, 0, 2, 12, 2, 33, 125, 0, 80, 921, 103, 110, 18, 195, 2637, 96, 16, 1071, 18, 5, 26, 3994, 6, 582, 6842, 29, 1763, 568, 8, 30, 18, 78, 18, 29, 19, 47, 17, 3, 32, 20, 6, 18, 433, 44, 212, 63, 129, 74, 6, 0, 67, 12, 65, 1, 2, 0, 29, 6135, 9, 1237, 42, 9, 8936, 3, 2, 6, 2, 1, 2, 290, 16, 0, 30, 2, 3, 0, 15, 3, 9, 395, 2309, 106, 6, 12, 4, 8, 8, 9, 5991, 84, 2, 70, 2, 1, 3, 0, 3, 1, 3, 3, 2, 11, 2, 0, 2, 6, 2, 64, 2, 3, 3, 7, 2, 6, 2, 27, 2, 3, 2, 4, 2, 0, 4, 6, 2, 339, 3, 24, 2, 24, 2, 30, 2, 24, 2, 30, 2, 24, 2, 30, 2, 24, 2, 30, 2, 24, 2, 7, 1845, 30, 7, 5, 262, 61, 147, 44, 11, 6, 17, 0, 322, 29, 19, 43, 485, 27, 229, 29, 3, 0, 496, 6, 2, 3, 2, 1, 2, 14, 2, 196, 60, 67, 8, 0, 1205, 3, 2, 26, 2, 1, 2, 0, 3, 0, 2, 9, 2, 3, 2, 0, 2, 0, 7, 0, 5, 0, 2, 0, 2, 0, 2, 2, 2, 1, 
2, 0, 3, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 1, 2, 0, 3, 3, 2, 6, 2, 3, 2, 3, 2, 0, 2, 9, 2, 16, 6, 2, 2, 4, 2, 16, 4421, 42719, 33, 4153, 7, 221, 3, 5761, 15, 7472, 16, 621, 2467, 541, 1507, 4938, 6, 4191];
 
   // This file was generated. Do not modify manually!
-  var nonASCIIidentifierChars = "\u200c\u200d\xb7\u0300-\u036f\u0387\u0483-\u0487\u0591-\u05bd\u05bf\u05c1\u05c2\u05c4\u05c5\u05c7\u0610-\u061a\u064b-\u0669\u0670\u06d6-\u06dc\u06df-\u06e4\u06e7\u06e8\u06ea-\u06ed\u06f0-\u06f9\u0711\u0730-\u074a\u07a6-\u07b0\u07c0-\u07c9\u07eb-\u07f3\u07fd\u0816-\u0819\u081b-\u0823\u0825-\u0827\u0829-\u082d\u0859-\u085b\u0898-\u089f\u08ca-\u08e1\u08e3-\u0903\u093a-\u093c\u093e-\u094f\u0951-\u0957\u0962\u0963\u0966-\u096f\u0981-\u0983\u09bc\u09be-\u09c4\u09c7\u09c8\u09cb-\u09cd\u09d7\u09e2\u09e3\u09e6-\u09ef\u09fe\u0a01-\u0a03\u0a3c\u0a3e-\u0a42\u0a47\u0a48\u0a4b-\u0a4d\u0a51\u0a66-\u0a71\u0a75\u0a81-\u0a83\u0abc\u0abe-\u0ac5\u0ac7-\u0ac9\u0acb-\u0acd\u0ae2\u0ae3\u0ae6-\u0aef\u0afa-\u0aff\u0b01-\u0b03\u0b3c\u0b3e-\u0b44\u0b47\u0b48\u0b4b-\u0b4d\u0b55-\u0b57\u0b62\u0b63\u0b66-\u0b6f\u0b82\u0bbe-\u0bc2\u0bc6-\u0bc8\u0bca-\u0bcd\u0bd7\u0be6-\u0bef\u0c00-\u0c04\u0c3c\u0c3e-\u0c44\u0c46-\u0c48\u0c4a-\u0c4d\u0c55\u0c56\u0c62\u0c63\u0c66-\u0c6f\u0c81-\u0c83\u0cbc\u0cbe-\u0cc4\u0cc6-\u0cc8\u0cca-\u0ccd\u0cd5\u0cd6\u0ce2\u0ce3\u0ce6-\u0cef\u0cf3\u0d00-\u0d03\u0d3b\u0d3c\u0d3e-\u0d44\u0d46-\u0d48\u0d4a-\u0d4d\u0d57\u0d62\u0d63\u0d66-\u0d6f\u0d81-\u0d83\u0dca\u0dcf-\u0dd4\u0dd6\u0dd8-\u0ddf\u0de6-\u0def\u0df2\u0df3\u0e31\u0e34-\u0e3a\u0e47-\u0e4e\u0e50-\u0e59\u0eb1\u0eb4-\u0ebc\u0ec8-\u0ece\u0ed0-\u0ed9\u0f18\u0f19\u0f20-\u0f29\u0f35\u0f37\u0f39\u0f3e\u0f3f\u0f71-\u0f84\u0f86\u0f87\u0f8d-\u0f97\u0f99-\u0fbc\u0fc6\u102b-\u103e\u1040-\u1049\u1056-\u1059\u105e-\u1060\u1062-\u1064\u1067-\u106d\u1071-\u1074\u1082-\u108d\u108f-\u109d\u135d-\u135f\u1369-\u1371\u1712-\u1715\u1732-\u1734\u1752\u1753\u1772\u1773\u17b4-\u17d3\u17dd\u17e0-\u17e9\u180b-\u180d\u180f-\u1819\u18a9\u1920-\u192b\u1930-\u193b\u1946-\u194f\u19d0-\u19da\u1a17-\u1a1b\u1a55-\u1a5e\u1a60-\u1a7c\u1a7f-\u1a89\u1a90-\u1a99\u1ab0-\u1abd\u1abf-\u1ace\u1b00-\u1b04\u1b34-\u1b44\u1b50-\u1b59\u1b6b-\u1b73\u1b80-\u1b82\u1ba1-\u1bad\u1bb0-\u1bb9\u1be6-\u1bf3\u1c24-\u1c37\u1c40-\u1c49\u1c50-\u1c
59\u1cd0-\u1cd2\u1cd4-\u1ce8\u1ced\u1cf4\u1cf7-\u1cf9\u1dc0-\u1dff\u200c\u200d\u203f\u2040\u2054\u20d0-\u20dc\u20e1\u20e5-\u20f0\u2cef-\u2cf1\u2d7f\u2de0-\u2dff\u302a-\u302f\u3099\u309a\u30fb\ua620-\ua629\ua66f\ua674-\ua67d\ua69e\ua69f\ua6f0\ua6f1\ua802\ua806\ua80b\ua823-\ua827\ua82c\ua880\ua881\ua8b4-\ua8c5\ua8d0-\ua8d9\ua8e0-\ua8f1\ua8ff-\ua909\ua926-\ua92d\ua947-\ua953\ua980-\ua983\ua9b3-\ua9c0\ua9d0-\ua9d9\ua9e5\ua9f0-\ua9f9\uaa29-\uaa36\uaa43\uaa4c\uaa4d\uaa50-\uaa59\uaa7b-\uaa7d\uaab0\uaab2-\uaab4\uaab7\uaab8\uaabe\uaabf\uaac1\uaaeb-\uaaef\uaaf5\uaaf6\uabe3-\uabea\uabec\uabed\uabf0-\uabf9\ufb1e\ufe00-\ufe0f\ufe20-\ufe2f\ufe33\ufe34\ufe4d-\ufe4f\uff10-\uff19\uff3f\uff65";
+  var nonASCIIidentifierChars = "\u200c\u200d\xb7\u0300-\u036f\u0387\u0483-\u0487\u0591-\u05bd\u05bf\u05c1\u05c2\u05c4\u05c5\u05c7\u0610-\u061a\u064b-\u0669\u0670\u06d6-\u06dc\u06df-\u06e4\u06e7\u06e8\u06ea-\u06ed\u06f0-\u06f9\u0711\u0730-\u074a\u07a6-\u07b0\u07c0-\u07c9\u07eb-\u07f3\u07fd\u0816-\u0819\u081b-\u0823\u0825-\u0827\u0829-\u082d\u0859-\u085b\u0897-\u089f\u08ca-\u08e1\u08e3-\u0903\u093a-\u093c\u093e-\u094f\u0951-\u0957\u0962\u0963\u0966-\u096f\u0981-\u0983\u09bc\u09be-\u09c4\u09c7\u09c8\u09cb-\u09cd\u09d7\u09e2\u09e3\u09e6-\u09ef\u09fe\u0a01-\u0a03\u0a3c\u0a3e-\u0a42\u0a47\u0a48\u0a4b-\u0a4d\u0a51\u0a66-\u0a71\u0a75\u0a81-\u0a83\u0abc\u0abe-\u0ac5\u0ac7-\u0ac9\u0acb-\u0acd\u0ae2\u0ae3\u0ae6-\u0aef\u0afa-\u0aff\u0b01-\u0b03\u0b3c\u0b3e-\u0b44\u0b47\u0b48\u0b4b-\u0b4d\u0b55-\u0b57\u0b62\u0b63\u0b66-\u0b6f\u0b82\u0bbe-\u0bc2\u0bc6-\u0bc8\u0bca-\u0bcd\u0bd7\u0be6-\u0bef\u0c00-\u0c04\u0c3c\u0c3e-\u0c44\u0c46-\u0c48\u0c4a-\u0c4d\u0c55\u0c56\u0c62\u0c63\u0c66-\u0c6f\u0c81-\u0c83\u0cbc\u0cbe-\u0cc4\u0cc6-\u0cc8\u0cca-\u0ccd\u0cd5\u0cd6\u0ce2\u0ce3\u0ce6-\u0cef\u0cf3\u0d00-\u0d03\u0d3b\u0d3c\u0d3e-\u0d44\u0d46-\u0d48\u0d4a-\u0d4d\u0d57\u0d62\u0d63\u0d66-\u0d6f\u0d81-\u0d83\u0dca\u0dcf-\u0dd4\u0dd6\u0dd8-\u0ddf\u0de6-\u0def\u0df2\u0df3\u0e31\u0e34-\u0e3a\u0e47-\u0e4e\u0e50-\u0e59\u0eb1\u0eb4-\u0ebc\u0ec8-\u0ece\u0ed0-\u0ed9\u0f18\u0f19\u0f20-\u0f29\u0f35\u0f37\u0f39\u0f3e\u0f3f\u0f71-\u0f84\u0f86\u0f87\u0f8d-\u0f97\u0f99-\u0fbc\u0fc6\u102b-\u103e\u1040-\u1049\u1056-\u1059\u105e-\u1060\u1062-\u1064\u1067-\u106d\u1071-\u1074\u1082-\u108d\u108f-\u109d\u135d-\u135f\u1369-\u1371\u1712-\u1715\u1732-\u1734\u1752\u1753\u1772\u1773\u17b4-\u17d3\u17dd\u17e0-\u17e9\u180b-\u180d\u180f-\u1819\u18a9\u1920-\u192b\u1930-\u193b\u1946-\u194f\u19d0-\u19da\u1a17-\u1a1b\u1a55-\u1a5e\u1a60-\u1a7c\u1a7f-\u1a89\u1a90-\u1a99\u1ab0-\u1abd\u1abf-\u1ace\u1b00-\u1b04\u1b34-\u1b44\u1b50-\u1b59\u1b6b-\u1b73\u1b80-\u1b82\u1ba1-\u1bad\u1bb0-\u1bb9\u1be6-\u1bf3\u1c24-\u1c37\u1c40-\u1c49\u1c50-\u1c
59\u1cd0-\u1cd2\u1cd4-\u1ce8\u1ced\u1cf4\u1cf7-\u1cf9\u1dc0-\u1dff\u200c\u200d\u203f\u2040\u2054\u20d0-\u20dc\u20e1\u20e5-\u20f0\u2cef-\u2cf1\u2d7f\u2de0-\u2dff\u302a-\u302f\u3099\u309a\u30fb\ua620-\ua629\ua66f\ua674-\ua67d\ua69e\ua69f\ua6f0\ua6f1\ua802\ua806\ua80b\ua823-\ua827\ua82c\ua880\ua881\ua8b4-\ua8c5\ua8d0-\ua8d9\ua8e0-\ua8f1\ua8ff-\ua909\ua926-\ua92d\ua947-\ua953\ua980-\ua983\ua9b3-\ua9c0\ua9d0-\ua9d9\ua9e5\ua9f0-\ua9f9\uaa29-\uaa36\uaa43\uaa4c\uaa4d\uaa50-\uaa59\uaa7b-\uaa7d\uaab0\uaab2-\uaab4\uaab7\uaab8\uaabe\uaabf\uaac1\uaaeb-\uaaef\uaaf5\uaaf6\uabe3-\uabea\uabec\uabed\uabf0-\uabf9\ufb1e\ufe00-\ufe0f\ufe20-\ufe2f\ufe33\ufe34\ufe4d-\ufe4f\uff10-\uff19\uff3f\uff65";
 
   // This file was generated. Do not modify manually!
-  var nonASCIIidentifierStartChars = "\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\u02c1\u02c6-\u02d1\u02e0-\u02e4\u02ec\u02ee\u0370-\u0374\u0376\u0377\u037a-\u037d\u037f\u0386\u0388-\u038a\u038c\u038e-\u03a1\u03a3-\u03f5\u03f7-\u0481\u048a-\u052f\u0531-\u0556\u0559\u0560-\u0588\u05d0-\u05ea\u05ef-\u05f2\u0620-\u064a\u066e\u066f\u0671-\u06d3\u06d5\u06e5\u06e6\u06ee\u06ef\u06fa-\u06fc\u06ff\u0710\u0712-\u072f\u074d-\u07a5\u07b1\u07ca-\u07ea\u07f4\u07f5\u07fa\u0800-\u0815\u081a\u0824\u0828\u0840-\u0858\u0860-\u086a\u0870-\u0887\u0889-\u088e\u08a0-\u08c9\u0904-\u0939\u093d\u0950\u0958-\u0961\u0971-\u0980\u0985-\u098c\u098f\u0990\u0993-\u09a8\u09aa-\u09b0\u09b2\u09b6-\u09b9\u09bd\u09ce\u09dc\u09dd\u09df-\u09e1\u09f0\u09f1\u09fc\u0a05-\u0a0a\u0a0f\u0a10\u0a13-\u0a28\u0a2a-\u0a30\u0a32\u0a33\u0a35\u0a36\u0a38\u0a39\u0a59-\u0a5c\u0a5e\u0a72-\u0a74\u0a85-\u0a8d\u0a8f-\u0a91\u0a93-\u0aa8\u0aaa-\u0ab0\u0ab2\u0ab3\u0ab5-\u0ab9\u0abd\u0ad0\u0ae0\u0ae1\u0af9\u0b05-\u0b0c\u0b0f\u0b10\u0b13-\u0b28\u0b2a-\u0b30\u0b32\u0b33\u0b35-\u0b39\u0b3d\u0b5c\u0b5d\u0b5f-\u0b61\u0b71\u0b83\u0b85-\u0b8a\u0b8e-\u0b90\u0b92-\u0b95\u0b99\u0b9a\u0b9c\u0b9e\u0b9f\u0ba3\u0ba4\u0ba8-\u0baa\u0bae-\u0bb9\u0bd0\u0c05-\u0c0c\u0c0e-\u0c10\u0c12-\u0c28\u0c2a-\u0c39\u0c3d\u0c58-\u0c5a\u0c5d\u0c60\u0c61\u0c80\u0c85-\u0c8c\u0c8e-\u0c90\u0c92-\u0ca8\u0caa-\u0cb3\u0cb5-\u0cb9\u0cbd\u0cdd\u0cde\u0ce0\u0ce1\u0cf1\u0cf2\u0d04-\u0d0c\u0d0e-\u0d10\u0d12-\u0d3a\u0d3d\u0d4e\u0d54-\u0d56\u0d5f-\u0d61\u0d7a-\u0d7f\u0d85-\u0d96\u0d9a-\u0db1\u0db3-\u0dbb\u0dbd\u0dc0-\u0dc6\u0e01-\u0e30\u0e32\u0e33\u0e40-\u0e46\u0e81\u0e82\u0e84\u0e86-\u0e8a\u0e8c-\u0ea3\u0ea5\u0ea7-\u0eb0\u0eb2\u0eb3\u0ebd\u0ec0-\u0ec4\u0ec6\u0edc-\u0edf\u0f00\u0f40-\u0f47\u0f49-\u0f6c\u0f88-\u0f8c\u1000-\u102a\u103f\u1050-\u1055\u105a-\u105d\u1061\u1065\u1066\u106e-\u1070\u1075-\u1081\u108e\u10a0-\u10c5\u10c7\u10cd\u10d0-\u10fa\u10fc-\u1248\u124a-\u124d\u1250-\u1256\u1258\u125a-\u125d\u1260-\u1288\u128a-\u128d\u1290-\u12b0\u12b2-\u12b5\u12b8-\u12be\u12c0\u12c2-\u
12c5\u12c8-\u12d6\u12d8-\u1310\u1312-\u1315\u1318-\u135a\u1380-\u138f\u13a0-\u13f5\u13f8-\u13fd\u1401-\u166c\u166f-\u167f\u1681-\u169a\u16a0-\u16ea\u16ee-\u16f8\u1700-\u1711\u171f-\u1731\u1740-\u1751\u1760-\u176c\u176e-\u1770\u1780-\u17b3\u17d7\u17dc\u1820-\u1878\u1880-\u18a8\u18aa\u18b0-\u18f5\u1900-\u191e\u1950-\u196d\u1970-\u1974\u1980-\u19ab\u19b0-\u19c9\u1a00-\u1a16\u1a20-\u1a54\u1aa7\u1b05-\u1b33\u1b45-\u1b4c\u1b83-\u1ba0\u1bae\u1baf\u1bba-\u1be5\u1c00-\u1c23\u1c4d-\u1c4f\u1c5a-\u1c7d\u1c80-\u1c88\u1c90-\u1cba\u1cbd-\u1cbf\u1ce9-\u1cec\u1cee-\u1cf3\u1cf5\u1cf6\u1cfa\u1d00-\u1dbf\u1e00-\u1f15\u1f18-\u1f1d\u1f20-\u1f45\u1f48-\u1f4d\u1f50-\u1f57\u1f59\u1f5b\u1f5d\u1f5f-\u1f7d\u1f80-\u1fb4\u1fb6-\u1fbc\u1fbe\u1fc2-\u1fc4\u1fc6-\u1fcc\u1fd0-\u1fd3\u1fd6-\u1fdb\u1fe0-\u1fec\u1ff2-\u1ff4\u1ff6-\u1ffc\u2071\u207f\u2090-\u209c\u2102\u2107\u210a-\u2113\u2115\u2118-\u211d\u2124\u2126\u2128\u212a-\u2139\u213c-\u213f\u2145-\u2149\u214e\u2160-\u2188\u2c00-\u2ce4\u2ceb-\u2cee\u2cf2\u2cf3\u2d00-\u2d25\u2d27\u2d2d\u2d30-\u2d67\u2d6f\u2d80-\u2d96\u2da0-\u2da6\u2da8-\u2dae\u2db0-\u2db6\u2db8-\u2dbe\u2dc0-\u2dc6\u2dc8-\u2dce\u2dd0-\u2dd6\u2dd8-\u2dde\u3005-\u3007\u3021-\u3029\u3031-\u3035\u3038-\u303c\u3041-\u3096\u309b-\u309f\u30a1-\u30fa\u30fc-\u30ff\u3105-\u312f\u3131-\u318e\u31a0-\u31bf\u31f0-\u31ff\u3400-\u4dbf\u4e00-\ua48c\ua4d0-\ua4fd\ua500-\ua60c\ua610-\ua61f\ua62a\ua62b\ua640-\ua66e\ua67f-\ua69d\ua6a0-\ua6ef\ua717-\ua71f\ua722-\ua788\ua78b-\ua7ca\ua7d0\ua7d1\ua7d3\ua7d5-\ua7d9\ua7f2-\ua801\ua803-\ua805\ua807-\ua80a\ua80c-\ua822\ua840-\ua873\ua882-\ua8b3\ua8f2-\ua8f7\ua8fb\ua8fd\ua8fe\ua90a-\ua925\ua930-\ua946\ua960-\ua97c\ua984-\ua9b2\ua9cf\ua9e0-\ua9e4\ua9e6-\ua9ef\ua9fa-\ua9fe\uaa00-\uaa28\uaa40-\uaa42\uaa44-\uaa4b\uaa60-\uaa76\uaa7a\uaa7e-\uaaaf\uaab1\uaab5\uaab6\uaab9-\uaabd\uaac0\uaac2\uaadb-\uaadd\uaae0-\uaaea\uaaf2-\uaaf4\uab01-\uab06\uab09-\uab0e\uab11-\uab16\uab20-\uab26\uab28-\uab2e\uab30-\uab5a\uab5c-\uab69\uab70-\uabe2\uac00-\ud7a3\ud7b0-\ud7c6\ud7cb-\ud7fb\u
f900-\ufa6d\ufa70-\ufad9\ufb00-\ufb06\ufb13-\ufb17\ufb1d\ufb1f-\ufb28\ufb2a-\ufb36\ufb38-\ufb3c\ufb3e\ufb40\ufb41\ufb43\ufb44\ufb46-\ufbb1\ufbd3-\ufd3d\ufd50-\ufd8f\ufd92-\ufdc7\ufdf0-\ufdfb\ufe70-\ufe74\ufe76-\ufefc\uff21-\uff3a\uff41-\uff5a\uff66-\uffbe\uffc2-\uffc7\uffca-\uffcf\uffd2-\uffd7\uffda-\uffdc";
+  var nonASCIIidentifierStartChars = "\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\u02c1\u02c6-\u02d1\u02e0-\u02e4\u02ec\u02ee\u0370-\u0374\u0376\u0377\u037a-\u037d\u037f\u0386\u0388-\u038a\u038c\u038e-\u03a1\u03a3-\u03f5\u03f7-\u0481\u048a-\u052f\u0531-\u0556\u0559\u0560-\u0588\u05d0-\u05ea\u05ef-\u05f2\u0620-\u064a\u066e\u066f\u0671-\u06d3\u06d5\u06e5\u06e6\u06ee\u06ef\u06fa-\u06fc\u06ff\u0710\u0712-\u072f\u074d-\u07a5\u07b1\u07ca-\u07ea\u07f4\u07f5\u07fa\u0800-\u0815\u081a\u0824\u0828\u0840-\u0858\u0860-\u086a\u0870-\u0887\u0889-\u088e\u08a0-\u08c9\u0904-\u0939\u093d\u0950\u0958-\u0961\u0971-\u0980\u0985-\u098c\u098f\u0990\u0993-\u09a8\u09aa-\u09b0\u09b2\u09b6-\u09b9\u09bd\u09ce\u09dc\u09dd\u09df-\u09e1\u09f0\u09f1\u09fc\u0a05-\u0a0a\u0a0f\u0a10\u0a13-\u0a28\u0a2a-\u0a30\u0a32\u0a33\u0a35\u0a36\u0a38\u0a39\u0a59-\u0a5c\u0a5e\u0a72-\u0a74\u0a85-\u0a8d\u0a8f-\u0a91\u0a93-\u0aa8\u0aaa-\u0ab0\u0ab2\u0ab3\u0ab5-\u0ab9\u0abd\u0ad0\u0ae0\u0ae1\u0af9\u0b05-\u0b0c\u0b0f\u0b10\u0b13-\u0b28\u0b2a-\u0b30\u0b32\u0b33\u0b35-\u0b39\u0b3d\u0b5c\u0b5d\u0b5f-\u0b61\u0b71\u0b83\u0b85-\u0b8a\u0b8e-\u0b90\u0b92-\u0b95\u0b99\u0b9a\u0b9c\u0b9e\u0b9f\u0ba3\u0ba4\u0ba8-\u0baa\u0bae-\u0bb9\u0bd0\u0c05-\u0c0c\u0c0e-\u0c10\u0c12-\u0c28\u0c2a-\u0c39\u0c3d\u0c58-\u0c5a\u0c5d\u0c60\u0c61\u0c80\u0c85-\u0c8c\u0c8e-\u0c90\u0c92-\u0ca8\u0caa-\u0cb3\u0cb5-\u0cb9\u0cbd\u0cdd\u0cde\u0ce0\u0ce1\u0cf1\u0cf2\u0d04-\u0d0c\u0d0e-\u0d10\u0d12-\u0d3a\u0d3d\u0d4e\u0d54-\u0d56\u0d5f-\u0d61\u0d7a-\u0d7f\u0d85-\u0d96\u0d9a-\u0db1\u0db3-\u0dbb\u0dbd\u0dc0-\u0dc6\u0e01-\u0e30\u0e32\u0e33\u0e40-\u0e46\u0e81\u0e82\u0e84\u0e86-\u0e8a\u0e8c-\u0ea3\u0ea5\u0ea7-\u0eb0\u0eb2\u0eb3\u0ebd\u0ec0-\u0ec4\u0ec6\u0edc-\u0edf\u0f00\u0f40-\u0f47\u0f49-\u0f6c\u0f88-\u0f8c\u1000-\u102a\u103f\u1050-\u1055\u105a-\u105d\u1061\u1065\u1066\u106e-\u1070\u1075-\u1081\u108e\u10a0-\u10c5\u10c7\u10cd\u10d0-\u10fa\u10fc-\u1248\u124a-\u124d\u1250-\u1256\u1258\u125a-\u125d\u1260-\u1288\u128a-\u128d\u1290-\u12b0\u12b2-\u12b5\u12b8-\u12be\u12c0\u12c2-\u
12c5\u12c8-\u12d6\u12d8-\u1310\u1312-\u1315\u1318-\u135a\u1380-\u138f\u13a0-\u13f5\u13f8-\u13fd\u1401-\u166c\u166f-\u167f\u1681-\u169a\u16a0-\u16ea\u16ee-\u16f8\u1700-\u1711\u171f-\u1731\u1740-\u1751\u1760-\u176c\u176e-\u1770\u1780-\u17b3\u17d7\u17dc\u1820-\u1878\u1880-\u18a8\u18aa\u18b0-\u18f5\u1900-\u191e\u1950-\u196d\u1970-\u1974\u1980-\u19ab\u19b0-\u19c9\u1a00-\u1a16\u1a20-\u1a54\u1aa7\u1b05-\u1b33\u1b45-\u1b4c\u1b83-\u1ba0\u1bae\u1baf\u1bba-\u1be5\u1c00-\u1c23\u1c4d-\u1c4f\u1c5a-\u1c7d\u1c80-\u1c8a\u1c90-\u1cba\u1cbd-\u1cbf\u1ce9-\u1cec\u1cee-\u1cf3\u1cf5\u1cf6\u1cfa\u1d00-\u1dbf\u1e00-\u1f15\u1f18-\u1f1d\u1f20-\u1f45\u1f48-\u1f4d\u1f50-\u1f57\u1f59\u1f5b\u1f5d\u1f5f-\u1f7d\u1f80-\u1fb4\u1fb6-\u1fbc\u1fbe\u1fc2-\u1fc4\u1fc6-\u1fcc\u1fd0-\u1fd3\u1fd6-\u1fdb\u1fe0-\u1fec\u1ff2-\u1ff4\u1ff6-\u1ffc\u2071\u207f\u2090-\u209c\u2102\u2107\u210a-\u2113\u2115\u2118-\u211d\u2124\u2126\u2128\u212a-\u2139\u213c-\u213f\u2145-\u2149\u214e\u2160-\u2188\u2c00-\u2ce4\u2ceb-\u2cee\u2cf2\u2cf3\u2d00-\u2d25\u2d27\u2d2d\u2d30-\u2d67\u2d6f\u2d80-\u2d96\u2da0-\u2da6\u2da8-\u2dae\u2db0-\u2db6\u2db8-\u2dbe\u2dc0-\u2dc6\u2dc8-\u2dce\u2dd0-\u2dd6\u2dd8-\u2dde\u3005-\u3007\u3021-\u3029\u3031-\u3035\u3038-\u303c\u3041-\u3096\u309b-\u309f\u30a1-\u30fa\u30fc-\u30ff\u3105-\u312f\u3131-\u318e\u31a0-\u31bf\u31f0-\u31ff\u3400-\u4dbf\u4e00-\ua48c\ua4d0-\ua4fd\ua500-\ua60c\ua610-\ua61f\ua62a\ua62b\ua640-\ua66e\ua67f-\ua69d\ua6a0-\ua6ef\ua717-\ua71f\ua722-\ua788\ua78b-\ua7cd\ua7d0\ua7d1\ua7d3\ua7d5-\ua7dc\ua7f2-\ua801\ua803-\ua805\ua807-\ua80a\ua80c-\ua822\ua840-\ua873\ua882-\ua8b3\ua8f2-\ua8f7\ua8fb\ua8fd\ua8fe\ua90a-\ua925\ua930-\ua946\ua960-\ua97c\ua984-\ua9b2\ua9cf\ua9e0-\ua9e4\ua9e6-\ua9ef\ua9fa-\ua9fe\uaa00-\uaa28\uaa40-\uaa42\uaa44-\uaa4b\uaa60-\uaa76\uaa7a\uaa7e-\uaaaf\uaab1\uaab5\uaab6\uaab9-\uaabd\uaac0\uaac2\uaadb-\uaadd\uaae0-\uaaea\uaaf2-\uaaf4\uab01-\uab06\uab09-\uab0e\uab11-\uab16\uab20-\uab26\uab28-\uab2e\uab30-\uab5a\uab5c-\uab69\uab70-\uabe2\uac00-\ud7a3\ud7b0-\ud7c6\ud7cb-\ud7fb\u
f900-\ufa6d\ufa70-\ufad9\ufb00-\ufb06\ufb13-\ufb17\ufb1d\ufb1f-\ufb28\ufb2a-\ufb36\ufb38-\ufb3c\ufb3e\ufb40\ufb41\ufb43\ufb44\ufb46-\ufbb1\ufbd3-\ufd3d\ufd50-\ufd8f\ufd92-\ufdc7\ufdf0-\ufdfb\ufe70-\ufe74\ufe76-\ufefc\uff21-\uff3a\uff41-\uff5a\uff66-\uffbe\uffc2-\uffc7\uffca-\uffcf\uffd2-\uffd7\uffda-\uffdc";
 
   // These are a run-length and offset encoded representation of the
   // >0xffff code points that are a valid part of identifiers. The
@@ -5990,7 +5990,7 @@
   // [walk]: util/walk.js
 
 
-  var version = "8.12.1";
+  var version = "8.13.0";
 
   Parser.acorn = {
     Parser: Parser,
diff --git a/deps/acorn/acorn/dist/acorn.mjs b/deps/acorn/acorn/dist/acorn.mjs
index 3fd7cb30c67b0e..21b860f275a064 100644
--- a/deps/acorn/acorn/dist/acorn.mjs
+++ b/deps/acorn/acorn/dist/acorn.mjs
@@ -1,14 +1,14 @@
 // This file was generated. Do not modify manually!
-var astralIdentifierCodes = [509, 0, 227, 0, 150, 4, 294, 9, 1368, 2, 2, 1, 6, 3, 41, 2, 5, 0, 166, 1, 574, 3, 9, 9, 370, 1, 81, 2, 71, 10, 50, 3, 123, 2, 54, 14, 32, 10, 3, 1, 11, 3, 46, 10, 8, 0, 46, 9, 7, 2, 37, 13, 2, 9, 6, 1, 45, 0, 13, 2, 49, 13, 9, 3, 2, 11, 83, 11, 7, 0, 3, 0, 158, 11, 6, 9, 7, 3, 56, 1, 2, 6, 3, 1, 3, 2, 10, 0, 11, 1, 3, 6, 4, 4, 193, 17, 10, 9, 5, 0, 82, 19, 13, 9, 214, 6, 3, 8, 28, 1, 83, 16, 16, 9, 82, 12, 9, 9, 84, 14, 5, 9, 243, 14, 166, 9, 71, 5, 2, 1, 3, 3, 2, 0, 2, 1, 13, 9, 120, 6, 3, 6, 4, 0, 29, 9, 41, 6, 2, 3, 9, 0, 10, 10, 47, 15, 406, 7, 2, 7, 17, 9, 57, 21, 2, 13, 123, 5, 4, 0, 2, 1, 2, 6, 2, 0, 9, 9, 49, 4, 2, 1, 2, 4, 9, 9, 330, 3, 10, 1, 2, 0, 49, 6, 4, 4, 14, 9, 5351, 0, 7, 14, 13835, 9, 87, 9, 39, 4, 60, 6, 26, 9, 1014, 0, 2, 54, 8, 3, 82, 0, 12, 1, 19628, 1, 4706, 45, 3, 22, 543, 4, 4, 5, 9, 7, 3, 6, 31, 3, 149, 2, 1418, 49, 513, 54, 5, 49, 9, 0, 15, 0, 23, 4, 2, 14, 1361, 6, 2, 16, 3, 6, 2, 1, 2, 4, 101, 0, 161, 6, 10, 9, 357, 0, 62, 13, 499, 13, 983, 6, 110, 6, 6, 9, 4759, 9, 787719, 239];
+var astralIdentifierCodes = [509, 0, 227, 0, 150, 4, 294, 9, 1368, 2, 2, 1, 6, 3, 41, 2, 5, 0, 166, 1, 574, 3, 9, 9, 7, 9, 32, 4, 318, 1, 80, 3, 71, 10, 50, 3, 123, 2, 54, 14, 32, 10, 3, 1, 11, 3, 46, 10, 8, 0, 46, 9, 7, 2, 37, 13, 2, 9, 6, 1, 45, 0, 13, 2, 49, 13, 9, 3, 2, 11, 83, 11, 7, 0, 3, 0, 158, 11, 6, 9, 7, 3, 56, 1, 2, 6, 3, 1, 3, 2, 10, 0, 11, 1, 3, 6, 4, 4, 68, 8, 2, 0, 3, 0, 2, 3, 2, 4, 2, 0, 15, 1, 83, 17, 10, 9, 5, 0, 82, 19, 13, 9, 214, 6, 3, 8, 28, 1, 83, 16, 16, 9, 82, 12, 9, 9, 7, 19, 58, 14, 5, 9, 243, 14, 166, 9, 71, 5, 2, 1, 3, 3, 2, 0, 2, 1, 13, 9, 120, 6, 3, 6, 4, 0, 29, 9, 41, 6, 2, 3, 9, 0, 10, 10, 47, 15, 343, 9, 54, 7, 2, 7, 17, 9, 57, 21, 2, 13, 123, 5, 4, 0, 2, 1, 2, 6, 2, 0, 9, 9, 49, 4, 2, 1, 2, 4, 9, 9, 330, 3, 10, 1, 2, 0, 49, 6, 4, 4, 14, 10, 5350, 0, 7, 14, 11465, 27, 2343, 9, 87, 9, 39, 4, 60, 6, 26, 9, 535, 9, 470, 0, 2, 54, 8, 3, 82, 0, 12, 1, 19628, 1, 4178, 9, 519, 45, 3, 22, 543, 4, 4, 5, 9, 7, 3, 6, 31, 3, 149, 2, 1418, 49, 513, 54, 5, 49, 9, 0, 15, 0, 23, 4, 2, 14, 1361, 6, 2, 16, 3, 6, 2, 1, 2, 4, 101, 0, 161, 6, 10, 9, 357, 0, 62, 13, 499, 13, 245, 1, 2, 9, 726, 6, 110, 6, 6, 9, 4759, 9, 787719, 239];
 
 // This file was generated. Do not modify manually!
-var astralIdentifierStartCodes = [0, 11, 2, 25, 2, 18, 2, 1, 2, 14, 3, 13, 35, 122, 70, 52, 268, 28, 4, 48, 48, 31, 14, 29, 6, 37, 11, 29, 3, 35, 5, 7, 2, 4, 43, 157, 19, 35, 5, 35, 5, 39, 9, 51, 13, 10, 2, 14, 2, 6, 2, 1, 2, 10, 2, 14, 2, 6, 2, 1, 68, 310, 10, 21, 11, 7, 25, 5, 2, 41, 2, 8, 70, 5, 3, 0, 2, 43, 2, 1, 4, 0, 3, 22, 11, 22, 10, 30, 66, 18, 2, 1, 11, 21, 11, 25, 71, 55, 7, 1, 65, 0, 16, 3, 2, 2, 2, 28, 43, 28, 4, 28, 36, 7, 2, 27, 28, 53, 11, 21, 11, 18, 14, 17, 111, 72, 56, 50, 14, 50, 14, 35, 349, 41, 7, 1, 79, 28, 11, 0, 9, 21, 43, 17, 47, 20, 28, 22, 13, 52, 58, 1, 3, 0, 14, 44, 33, 24, 27, 35, 30, 0, 3, 0, 9, 34, 4, 0, 13, 47, 15, 3, 22, 0, 2, 0, 36, 17, 2, 24, 20, 1, 64, 6, 2, 0, 2, 3, 2, 14, 2, 9, 8, 46, 39, 7, 3, 1, 3, 21, 2, 6, 2, 1, 2, 4, 4, 0, 19, 0, 13, 4, 159, 52, 19, 3, 21, 2, 31, 47, 21, 1, 2, 0, 185, 46, 42, 3, 37, 47, 21, 0, 60, 42, 14, 0, 72, 26, 38, 6, 186, 43, 117, 63, 32, 7, 3, 0, 3, 7, 2, 1, 2, 23, 16, 0, 2, 0, 95, 7, 3, 38, 17, 0, 2, 0, 29, 0, 11, 39, 8, 0, 22, 0, 12, 45, 20, 0, 19, 72, 264, 8, 2, 36, 18, 0, 50, 29, 113, 6, 2, 1, 2, 37, 22, 0, 26, 5, 2, 1, 2, 31, 15, 0, 328, 18, 16, 0, 2, 12, 2, 33, 125, 0, 80, 921, 103, 110, 18, 195, 2637, 96, 16, 1071, 18, 5, 4026, 582, 8634, 568, 8, 30, 18, 78, 18, 29, 19, 47, 17, 3, 32, 20, 6, 18, 689, 63, 129, 74, 6, 0, 67, 12, 65, 1, 2, 0, 29, 6135, 9, 1237, 43, 8, 8936, 3, 2, 6, 2, 1, 2, 290, 16, 0, 30, 2, 3, 0, 15, 3, 9, 395, 2309, 106, 6, 12, 4, 8, 8, 9, 5991, 84, 2, 70, 2, 1, 3, 0, 3, 1, 3, 3, 2, 11, 2, 0, 2, 6, 2, 64, 2, 3, 3, 7, 2, 6, 2, 27, 2, 3, 2, 4, 2, 0, 4, 6, 2, 339, 3, 24, 2, 24, 2, 30, 2, 24, 2, 30, 2, 24, 2, 30, 2, 24, 2, 30, 2, 24, 2, 7, 1845, 30, 7, 5, 262, 61, 147, 44, 11, 6, 17, 0, 322, 29, 19, 43, 485, 27, 757, 6, 2, 3, 2, 1, 2, 14, 2, 196, 60, 67, 8, 0, 1205, 3, 2, 26, 2, 1, 2, 0, 3, 0, 2, 9, 2, 3, 2, 0, 2, 0, 7, 0, 5, 0, 2, 0, 2, 0, 2, 2, 2, 1, 2, 0, 3, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 1, 2, 0, 3, 3, 2, 6, 2, 3, 2, 3, 2, 0, 2, 9, 2, 16, 6, 2, 2, 4, 2, 16, 4421, 42719, 
33, 4153, 7, 221, 3, 5761, 15, 7472, 16, 621, 2467, 541, 1507, 4938, 6, 4191];
+var astralIdentifierStartCodes = [0, 11, 2, 25, 2, 18, 2, 1, 2, 14, 3, 13, 35, 122, 70, 52, 268, 28, 4, 48, 48, 31, 14, 29, 6, 37, 11, 29, 3, 35, 5, 7, 2, 4, 43, 157, 19, 35, 5, 35, 5, 39, 9, 51, 13, 10, 2, 14, 2, 6, 2, 1, 2, 10, 2, 14, 2, 6, 2, 1, 4, 51, 13, 310, 10, 21, 11, 7, 25, 5, 2, 41, 2, 8, 70, 5, 3, 0, 2, 43, 2, 1, 4, 0, 3, 22, 11, 22, 10, 30, 66, 18, 2, 1, 11, 21, 11, 25, 71, 55, 7, 1, 65, 0, 16, 3, 2, 2, 2, 28, 43, 28, 4, 28, 36, 7, 2, 27, 28, 53, 11, 21, 11, 18, 14, 17, 111, 72, 56, 50, 14, 50, 14, 35, 39, 27, 10, 22, 251, 41, 7, 1, 17, 2, 60, 28, 11, 0, 9, 21, 43, 17, 47, 20, 28, 22, 13, 52, 58, 1, 3, 0, 14, 44, 33, 24, 27, 35, 30, 0, 3, 0, 9, 34, 4, 0, 13, 47, 15, 3, 22, 0, 2, 0, 36, 17, 2, 24, 20, 1, 64, 6, 2, 0, 2, 3, 2, 14, 2, 9, 8, 46, 39, 7, 3, 1, 3, 21, 2, 6, 2, 1, 2, 4, 4, 0, 19, 0, 13, 4, 31, 9, 2, 0, 3, 0, 2, 37, 2, 0, 26, 0, 2, 0, 45, 52, 19, 3, 21, 2, 31, 47, 21, 1, 2, 0, 185, 46, 42, 3, 37, 47, 21, 0, 60, 42, 14, 0, 72, 26, 38, 6, 186, 43, 117, 63, 32, 7, 3, 0, 3, 7, 2, 1, 2, 23, 16, 0, 2, 0, 95, 7, 3, 38, 17, 0, 2, 0, 29, 0, 11, 39, 8, 0, 22, 0, 12, 45, 20, 0, 19, 72, 200, 32, 32, 8, 2, 36, 18, 0, 50, 29, 113, 6, 2, 1, 2, 37, 22, 0, 26, 5, 2, 1, 2, 31, 15, 0, 328, 18, 16, 0, 2, 12, 2, 33, 125, 0, 80, 921, 103, 110, 18, 195, 2637, 96, 16, 1071, 18, 5, 26, 3994, 6, 582, 6842, 29, 1763, 568, 8, 30, 18, 78, 18, 29, 19, 47, 17, 3, 32, 20, 6, 18, 433, 44, 212, 63, 129, 74, 6, 0, 67, 12, 65, 1, 2, 0, 29, 6135, 9, 1237, 42, 9, 8936, 3, 2, 6, 2, 1, 2, 290, 16, 0, 30, 2, 3, 0, 15, 3, 9, 395, 2309, 106, 6, 12, 4, 8, 8, 9, 5991, 84, 2, 70, 2, 1, 3, 0, 3, 1, 3, 3, 2, 11, 2, 0, 2, 6, 2, 64, 2, 3, 3, 7, 2, 6, 2, 27, 2, 3, 2, 4, 2, 0, 4, 6, 2, 339, 3, 24, 2, 24, 2, 30, 2, 24, 2, 30, 2, 24, 2, 30, 2, 24, 2, 30, 2, 24, 2, 7, 1845, 30, 7, 5, 262, 61, 147, 44, 11, 6, 17, 0, 322, 29, 19, 43, 485, 27, 229, 29, 3, 0, 496, 6, 2, 3, 2, 1, 2, 14, 2, 196, 60, 67, 8, 0, 1205, 3, 2, 26, 2, 1, 2, 0, 3, 0, 2, 9, 2, 3, 2, 0, 2, 0, 7, 0, 5, 0, 2, 0, 2, 0, 2, 2, 2, 1, 2, 
0, 3, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 1, 2, 0, 3, 3, 2, 6, 2, 3, 2, 3, 2, 0, 2, 9, 2, 16, 6, 2, 2, 4, 2, 16, 4421, 42719, 33, 4153, 7, 221, 3, 5761, 15, 7472, 16, 621, 2467, 541, 1507, 4938, 6, 4191];
 
 // This file was generated. Do not modify manually!
-var nonASCIIidentifierChars = "\u200c\u200d\xb7\u0300-\u036f\u0387\u0483-\u0487\u0591-\u05bd\u05bf\u05c1\u05c2\u05c4\u05c5\u05c7\u0610-\u061a\u064b-\u0669\u0670\u06d6-\u06dc\u06df-\u06e4\u06e7\u06e8\u06ea-\u06ed\u06f0-\u06f9\u0711\u0730-\u074a\u07a6-\u07b0\u07c0-\u07c9\u07eb-\u07f3\u07fd\u0816-\u0819\u081b-\u0823\u0825-\u0827\u0829-\u082d\u0859-\u085b\u0898-\u089f\u08ca-\u08e1\u08e3-\u0903\u093a-\u093c\u093e-\u094f\u0951-\u0957\u0962\u0963\u0966-\u096f\u0981-\u0983\u09bc\u09be-\u09c4\u09c7\u09c8\u09cb-\u09cd\u09d7\u09e2\u09e3\u09e6-\u09ef\u09fe\u0a01-\u0a03\u0a3c\u0a3e-\u0a42\u0a47\u0a48\u0a4b-\u0a4d\u0a51\u0a66-\u0a71\u0a75\u0a81-\u0a83\u0abc\u0abe-\u0ac5\u0ac7-\u0ac9\u0acb-\u0acd\u0ae2\u0ae3\u0ae6-\u0aef\u0afa-\u0aff\u0b01-\u0b03\u0b3c\u0b3e-\u0b44\u0b47\u0b48\u0b4b-\u0b4d\u0b55-\u0b57\u0b62\u0b63\u0b66-\u0b6f\u0b82\u0bbe-\u0bc2\u0bc6-\u0bc8\u0bca-\u0bcd\u0bd7\u0be6-\u0bef\u0c00-\u0c04\u0c3c\u0c3e-\u0c44\u0c46-\u0c48\u0c4a-\u0c4d\u0c55\u0c56\u0c62\u0c63\u0c66-\u0c6f\u0c81-\u0c83\u0cbc\u0cbe-\u0cc4\u0cc6-\u0cc8\u0cca-\u0ccd\u0cd5\u0cd6\u0ce2\u0ce3\u0ce6-\u0cef\u0cf3\u0d00-\u0d03\u0d3b\u0d3c\u0d3e-\u0d44\u0d46-\u0d48\u0d4a-\u0d4d\u0d57\u0d62\u0d63\u0d66-\u0d6f\u0d81-\u0d83\u0dca\u0dcf-\u0dd4\u0dd6\u0dd8-\u0ddf\u0de6-\u0def\u0df2\u0df3\u0e31\u0e34-\u0e3a\u0e47-\u0e4e\u0e50-\u0e59\u0eb1\u0eb4-\u0ebc\u0ec8-\u0ece\u0ed0-\u0ed9\u0f18\u0f19\u0f20-\u0f29\u0f35\u0f37\u0f39\u0f3e\u0f3f\u0f71-\u0f84\u0f86\u0f87\u0f8d-\u0f97\u0f99-\u0fbc\u0fc6\u102b-\u103e\u1040-\u1049\u1056-\u1059\u105e-\u1060\u1062-\u1064\u1067-\u106d\u1071-\u1074\u1082-\u108d\u108f-\u109d\u135d-\u135f\u1369-\u1371\u1712-\u1715\u1732-\u1734\u1752\u1753\u1772\u1773\u17b4-\u17d3\u17dd\u17e0-\u17e9\u180b-\u180d\u180f-\u1819\u18a9\u1920-\u192b\u1930-\u193b\u1946-\u194f\u19d0-\u19da\u1a17-\u1a1b\u1a55-\u1a5e\u1a60-\u1a7c\u1a7f-\u1a89\u1a90-\u1a99\u1ab0-\u1abd\u1abf-\u1ace\u1b00-\u1b04\u1b34-\u1b44\u1b50-\u1b59\u1b6b-\u1b73\u1b80-\u1b82\u1ba1-\u1bad\u1bb0-\u1bb9\u1be6-\u1bf3\u1c24-\u1c37\u1c40-\u1c49\u1c50-\u1c59
\u1cd0-\u1cd2\u1cd4-\u1ce8\u1ced\u1cf4\u1cf7-\u1cf9\u1dc0-\u1dff\u200c\u200d\u203f\u2040\u2054\u20d0-\u20dc\u20e1\u20e5-\u20f0\u2cef-\u2cf1\u2d7f\u2de0-\u2dff\u302a-\u302f\u3099\u309a\u30fb\ua620-\ua629\ua66f\ua674-\ua67d\ua69e\ua69f\ua6f0\ua6f1\ua802\ua806\ua80b\ua823-\ua827\ua82c\ua880\ua881\ua8b4-\ua8c5\ua8d0-\ua8d9\ua8e0-\ua8f1\ua8ff-\ua909\ua926-\ua92d\ua947-\ua953\ua980-\ua983\ua9b3-\ua9c0\ua9d0-\ua9d9\ua9e5\ua9f0-\ua9f9\uaa29-\uaa36\uaa43\uaa4c\uaa4d\uaa50-\uaa59\uaa7b-\uaa7d\uaab0\uaab2-\uaab4\uaab7\uaab8\uaabe\uaabf\uaac1\uaaeb-\uaaef\uaaf5\uaaf6\uabe3-\uabea\uabec\uabed\uabf0-\uabf9\ufb1e\ufe00-\ufe0f\ufe20-\ufe2f\ufe33\ufe34\ufe4d-\ufe4f\uff10-\uff19\uff3f\uff65";
+var nonASCIIidentifierChars = "\u200c\u200d\xb7\u0300-\u036f\u0387\u0483-\u0487\u0591-\u05bd\u05bf\u05c1\u05c2\u05c4\u05c5\u05c7\u0610-\u061a\u064b-\u0669\u0670\u06d6-\u06dc\u06df-\u06e4\u06e7\u06e8\u06ea-\u06ed\u06f0-\u06f9\u0711\u0730-\u074a\u07a6-\u07b0\u07c0-\u07c9\u07eb-\u07f3\u07fd\u0816-\u0819\u081b-\u0823\u0825-\u0827\u0829-\u082d\u0859-\u085b\u0897-\u089f\u08ca-\u08e1\u08e3-\u0903\u093a-\u093c\u093e-\u094f\u0951-\u0957\u0962\u0963\u0966-\u096f\u0981-\u0983\u09bc\u09be-\u09c4\u09c7\u09c8\u09cb-\u09cd\u09d7\u09e2\u09e3\u09e6-\u09ef\u09fe\u0a01-\u0a03\u0a3c\u0a3e-\u0a42\u0a47\u0a48\u0a4b-\u0a4d\u0a51\u0a66-\u0a71\u0a75\u0a81-\u0a83\u0abc\u0abe-\u0ac5\u0ac7-\u0ac9\u0acb-\u0acd\u0ae2\u0ae3\u0ae6-\u0aef\u0afa-\u0aff\u0b01-\u0b03\u0b3c\u0b3e-\u0b44\u0b47\u0b48\u0b4b-\u0b4d\u0b55-\u0b57\u0b62\u0b63\u0b66-\u0b6f\u0b82\u0bbe-\u0bc2\u0bc6-\u0bc8\u0bca-\u0bcd\u0bd7\u0be6-\u0bef\u0c00-\u0c04\u0c3c\u0c3e-\u0c44\u0c46-\u0c48\u0c4a-\u0c4d\u0c55\u0c56\u0c62\u0c63\u0c66-\u0c6f\u0c81-\u0c83\u0cbc\u0cbe-\u0cc4\u0cc6-\u0cc8\u0cca-\u0ccd\u0cd5\u0cd6\u0ce2\u0ce3\u0ce6-\u0cef\u0cf3\u0d00-\u0d03\u0d3b\u0d3c\u0d3e-\u0d44\u0d46-\u0d48\u0d4a-\u0d4d\u0d57\u0d62\u0d63\u0d66-\u0d6f\u0d81-\u0d83\u0dca\u0dcf-\u0dd4\u0dd6\u0dd8-\u0ddf\u0de6-\u0def\u0df2\u0df3\u0e31\u0e34-\u0e3a\u0e47-\u0e4e\u0e50-\u0e59\u0eb1\u0eb4-\u0ebc\u0ec8-\u0ece\u0ed0-\u0ed9\u0f18\u0f19\u0f20-\u0f29\u0f35\u0f37\u0f39\u0f3e\u0f3f\u0f71-\u0f84\u0f86\u0f87\u0f8d-\u0f97\u0f99-\u0fbc\u0fc6\u102b-\u103e\u1040-\u1049\u1056-\u1059\u105e-\u1060\u1062-\u1064\u1067-\u106d\u1071-\u1074\u1082-\u108d\u108f-\u109d\u135d-\u135f\u1369-\u1371\u1712-\u1715\u1732-\u1734\u1752\u1753\u1772\u1773\u17b4-\u17d3\u17dd\u17e0-\u17e9\u180b-\u180d\u180f-\u1819\u18a9\u1920-\u192b\u1930-\u193b\u1946-\u194f\u19d0-\u19da\u1a17-\u1a1b\u1a55-\u1a5e\u1a60-\u1a7c\u1a7f-\u1a89\u1a90-\u1a99\u1ab0-\u1abd\u1abf-\u1ace\u1b00-\u1b04\u1b34-\u1b44\u1b50-\u1b59\u1b6b-\u1b73\u1b80-\u1b82\u1ba1-\u1bad\u1bb0-\u1bb9\u1be6-\u1bf3\u1c24-\u1c37\u1c40-\u1c49\u1c50-\u1c59
\u1cd0-\u1cd2\u1cd4-\u1ce8\u1ced\u1cf4\u1cf7-\u1cf9\u1dc0-\u1dff\u200c\u200d\u203f\u2040\u2054\u20d0-\u20dc\u20e1\u20e5-\u20f0\u2cef-\u2cf1\u2d7f\u2de0-\u2dff\u302a-\u302f\u3099\u309a\u30fb\ua620-\ua629\ua66f\ua674-\ua67d\ua69e\ua69f\ua6f0\ua6f1\ua802\ua806\ua80b\ua823-\ua827\ua82c\ua880\ua881\ua8b4-\ua8c5\ua8d0-\ua8d9\ua8e0-\ua8f1\ua8ff-\ua909\ua926-\ua92d\ua947-\ua953\ua980-\ua983\ua9b3-\ua9c0\ua9d0-\ua9d9\ua9e5\ua9f0-\ua9f9\uaa29-\uaa36\uaa43\uaa4c\uaa4d\uaa50-\uaa59\uaa7b-\uaa7d\uaab0\uaab2-\uaab4\uaab7\uaab8\uaabe\uaabf\uaac1\uaaeb-\uaaef\uaaf5\uaaf6\uabe3-\uabea\uabec\uabed\uabf0-\uabf9\ufb1e\ufe00-\ufe0f\ufe20-\ufe2f\ufe33\ufe34\ufe4d-\ufe4f\uff10-\uff19\uff3f\uff65";
 
 // This file was generated. Do not modify manually!
-var nonASCIIidentifierStartChars = "\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\u02c1\u02c6-\u02d1\u02e0-\u02e4\u02ec\u02ee\u0370-\u0374\u0376\u0377\u037a-\u037d\u037f\u0386\u0388-\u038a\u038c\u038e-\u03a1\u03a3-\u03f5\u03f7-\u0481\u048a-\u052f\u0531-\u0556\u0559\u0560-\u0588\u05d0-\u05ea\u05ef-\u05f2\u0620-\u064a\u066e\u066f\u0671-\u06d3\u06d5\u06e5\u06e6\u06ee\u06ef\u06fa-\u06fc\u06ff\u0710\u0712-\u072f\u074d-\u07a5\u07b1\u07ca-\u07ea\u07f4\u07f5\u07fa\u0800-\u0815\u081a\u0824\u0828\u0840-\u0858\u0860-\u086a\u0870-\u0887\u0889-\u088e\u08a0-\u08c9\u0904-\u0939\u093d\u0950\u0958-\u0961\u0971-\u0980\u0985-\u098c\u098f\u0990\u0993-\u09a8\u09aa-\u09b0\u09b2\u09b6-\u09b9\u09bd\u09ce\u09dc\u09dd\u09df-\u09e1\u09f0\u09f1\u09fc\u0a05-\u0a0a\u0a0f\u0a10\u0a13-\u0a28\u0a2a-\u0a30\u0a32\u0a33\u0a35\u0a36\u0a38\u0a39\u0a59-\u0a5c\u0a5e\u0a72-\u0a74\u0a85-\u0a8d\u0a8f-\u0a91\u0a93-\u0aa8\u0aaa-\u0ab0\u0ab2\u0ab3\u0ab5-\u0ab9\u0abd\u0ad0\u0ae0\u0ae1\u0af9\u0b05-\u0b0c\u0b0f\u0b10\u0b13-\u0b28\u0b2a-\u0b30\u0b32\u0b33\u0b35-\u0b39\u0b3d\u0b5c\u0b5d\u0b5f-\u0b61\u0b71\u0b83\u0b85-\u0b8a\u0b8e-\u0b90\u0b92-\u0b95\u0b99\u0b9a\u0b9c\u0b9e\u0b9f\u0ba3\u0ba4\u0ba8-\u0baa\u0bae-\u0bb9\u0bd0\u0c05-\u0c0c\u0c0e-\u0c10\u0c12-\u0c28\u0c2a-\u0c39\u0c3d\u0c58-\u0c5a\u0c5d\u0c60\u0c61\u0c80\u0c85-\u0c8c\u0c8e-\u0c90\u0c92-\u0ca8\u0caa-\u0cb3\u0cb5-\u0cb9\u0cbd\u0cdd\u0cde\u0ce0\u0ce1\u0cf1\u0cf2\u0d04-\u0d0c\u0d0e-\u0d10\u0d12-\u0d3a\u0d3d\u0d4e\u0d54-\u0d56\u0d5f-\u0d61\u0d7a-\u0d7f\u0d85-\u0d96\u0d9a-\u0db1\u0db3-\u0dbb\u0dbd\u0dc0-\u0dc6\u0e01-\u0e30\u0e32\u0e33\u0e40-\u0e46\u0e81\u0e82\u0e84\u0e86-\u0e8a\u0e8c-\u0ea3\u0ea5\u0ea7-\u0eb0\u0eb2\u0eb3\u0ebd\u0ec0-\u0ec4\u0ec6\u0edc-\u0edf\u0f00\u0f40-\u0f47\u0f49-\u0f6c\u0f88-\u0f8c\u1000-\u102a\u103f\u1050-\u1055\u105a-\u105d\u1061\u1065\u1066\u106e-\u1070\u1075-\u1081\u108e\u10a0-\u10c5\u10c7\u10cd\u10d0-\u10fa\u10fc-\u1248\u124a-\u124d\u1250-\u1256\u1258\u125a-\u125d\u1260-\u1288\u128a-\u128d\u1290-\u12b0\u12b2-\u12b5\u12b8-\u12be\u12c0\u12c2-\u12
c5\u12c8-\u12d6\u12d8-\u1310\u1312-\u1315\u1318-\u135a\u1380-\u138f\u13a0-\u13f5\u13f8-\u13fd\u1401-\u166c\u166f-\u167f\u1681-\u169a\u16a0-\u16ea\u16ee-\u16f8\u1700-\u1711\u171f-\u1731\u1740-\u1751\u1760-\u176c\u176e-\u1770\u1780-\u17b3\u17d7\u17dc\u1820-\u1878\u1880-\u18a8\u18aa\u18b0-\u18f5\u1900-\u191e\u1950-\u196d\u1970-\u1974\u1980-\u19ab\u19b0-\u19c9\u1a00-\u1a16\u1a20-\u1a54\u1aa7\u1b05-\u1b33\u1b45-\u1b4c\u1b83-\u1ba0\u1bae\u1baf\u1bba-\u1be5\u1c00-\u1c23\u1c4d-\u1c4f\u1c5a-\u1c7d\u1c80-\u1c88\u1c90-\u1cba\u1cbd-\u1cbf\u1ce9-\u1cec\u1cee-\u1cf3\u1cf5\u1cf6\u1cfa\u1d00-\u1dbf\u1e00-\u1f15\u1f18-\u1f1d\u1f20-\u1f45\u1f48-\u1f4d\u1f50-\u1f57\u1f59\u1f5b\u1f5d\u1f5f-\u1f7d\u1f80-\u1fb4\u1fb6-\u1fbc\u1fbe\u1fc2-\u1fc4\u1fc6-\u1fcc\u1fd0-\u1fd3\u1fd6-\u1fdb\u1fe0-\u1fec\u1ff2-\u1ff4\u1ff6-\u1ffc\u2071\u207f\u2090-\u209c\u2102\u2107\u210a-\u2113\u2115\u2118-\u211d\u2124\u2126\u2128\u212a-\u2139\u213c-\u213f\u2145-\u2149\u214e\u2160-\u2188\u2c00-\u2ce4\u2ceb-\u2cee\u2cf2\u2cf3\u2d00-\u2d25\u2d27\u2d2d\u2d30-\u2d67\u2d6f\u2d80-\u2d96\u2da0-\u2da6\u2da8-\u2dae\u2db0-\u2db6\u2db8-\u2dbe\u2dc0-\u2dc6\u2dc8-\u2dce\u2dd0-\u2dd6\u2dd8-\u2dde\u3005-\u3007\u3021-\u3029\u3031-\u3035\u3038-\u303c\u3041-\u3096\u309b-\u309f\u30a1-\u30fa\u30fc-\u30ff\u3105-\u312f\u3131-\u318e\u31a0-\u31bf\u31f0-\u31ff\u3400-\u4dbf\u4e00-\ua48c\ua4d0-\ua4fd\ua500-\ua60c\ua610-\ua61f\ua62a\ua62b\ua640-\ua66e\ua67f-\ua69d\ua6a0-\ua6ef\ua717-\ua71f\ua722-\ua788\ua78b-\ua7ca\ua7d0\ua7d1\ua7d3\ua7d5-\ua7d9\ua7f2-\ua801\ua803-\ua805\ua807-\ua80a\ua80c-\ua822\ua840-\ua873\ua882-\ua8b3\ua8f2-\ua8f7\ua8fb\ua8fd\ua8fe\ua90a-\ua925\ua930-\ua946\ua960-\ua97c\ua984-\ua9b2\ua9cf\ua9e0-\ua9e4\ua9e6-\ua9ef\ua9fa-\ua9fe\uaa00-\uaa28\uaa40-\uaa42\uaa44-\uaa4b\uaa60-\uaa76\uaa7a\uaa7e-\uaaaf\uaab1\uaab5\uaab6\uaab9-\uaabd\uaac0\uaac2\uaadb-\uaadd\uaae0-\uaaea\uaaf2-\uaaf4\uab01-\uab06\uab09-\uab0e\uab11-\uab16\uab20-\uab26\uab28-\uab2e\uab30-\uab5a\uab5c-\uab69\uab70-\uabe2\uac00-\ud7a3\ud7b0-\ud7c6\ud7cb-\ud7fb\uf9
00-\ufa6d\ufa70-\ufad9\ufb00-\ufb06\ufb13-\ufb17\ufb1d\ufb1f-\ufb28\ufb2a-\ufb36\ufb38-\ufb3c\ufb3e\ufb40\ufb41\ufb43\ufb44\ufb46-\ufbb1\ufbd3-\ufd3d\ufd50-\ufd8f\ufd92-\ufdc7\ufdf0-\ufdfb\ufe70-\ufe74\ufe76-\ufefc\uff21-\uff3a\uff41-\uff5a\uff66-\uffbe\uffc2-\uffc7\uffca-\uffcf\uffd2-\uffd7\uffda-\uffdc";
+var nonASCIIidentifierStartChars = "\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\u02c1\u02c6-\u02d1\u02e0-\u02e4\u02ec\u02ee\u0370-\u0374\u0376\u0377\u037a-\u037d\u037f\u0386\u0388-\u038a\u038c\u038e-\u03a1\u03a3-\u03f5\u03f7-\u0481\u048a-\u052f\u0531-\u0556\u0559\u0560-\u0588\u05d0-\u05ea\u05ef-\u05f2\u0620-\u064a\u066e\u066f\u0671-\u06d3\u06d5\u06e5\u06e6\u06ee\u06ef\u06fa-\u06fc\u06ff\u0710\u0712-\u072f\u074d-\u07a5\u07b1\u07ca-\u07ea\u07f4\u07f5\u07fa\u0800-\u0815\u081a\u0824\u0828\u0840-\u0858\u0860-\u086a\u0870-\u0887\u0889-\u088e\u08a0-\u08c9\u0904-\u0939\u093d\u0950\u0958-\u0961\u0971-\u0980\u0985-\u098c\u098f\u0990\u0993-\u09a8\u09aa-\u09b0\u09b2\u09b6-\u09b9\u09bd\u09ce\u09dc\u09dd\u09df-\u09e1\u09f0\u09f1\u09fc\u0a05-\u0a0a\u0a0f\u0a10\u0a13-\u0a28\u0a2a-\u0a30\u0a32\u0a33\u0a35\u0a36\u0a38\u0a39\u0a59-\u0a5c\u0a5e\u0a72-\u0a74\u0a85-\u0a8d\u0a8f-\u0a91\u0a93-\u0aa8\u0aaa-\u0ab0\u0ab2\u0ab3\u0ab5-\u0ab9\u0abd\u0ad0\u0ae0\u0ae1\u0af9\u0b05-\u0b0c\u0b0f\u0b10\u0b13-\u0b28\u0b2a-\u0b30\u0b32\u0b33\u0b35-\u0b39\u0b3d\u0b5c\u0b5d\u0b5f-\u0b61\u0b71\u0b83\u0b85-\u0b8a\u0b8e-\u0b90\u0b92-\u0b95\u0b99\u0b9a\u0b9c\u0b9e\u0b9f\u0ba3\u0ba4\u0ba8-\u0baa\u0bae-\u0bb9\u0bd0\u0c05-\u0c0c\u0c0e-\u0c10\u0c12-\u0c28\u0c2a-\u0c39\u0c3d\u0c58-\u0c5a\u0c5d\u0c60\u0c61\u0c80\u0c85-\u0c8c\u0c8e-\u0c90\u0c92-\u0ca8\u0caa-\u0cb3\u0cb5-\u0cb9\u0cbd\u0cdd\u0cde\u0ce0\u0ce1\u0cf1\u0cf2\u0d04-\u0d0c\u0d0e-\u0d10\u0d12-\u0d3a\u0d3d\u0d4e\u0d54-\u0d56\u0d5f-\u0d61\u0d7a-\u0d7f\u0d85-\u0d96\u0d9a-\u0db1\u0db3-\u0dbb\u0dbd\u0dc0-\u0dc6\u0e01-\u0e30\u0e32\u0e33\u0e40-\u0e46\u0e81\u0e82\u0e84\u0e86-\u0e8a\u0e8c-\u0ea3\u0ea5\u0ea7-\u0eb0\u0eb2\u0eb3\u0ebd\u0ec0-\u0ec4\u0ec6\u0edc-\u0edf\u0f00\u0f40-\u0f47\u0f49-\u0f6c\u0f88-\u0f8c\u1000-\u102a\u103f\u1050-\u1055\u105a-\u105d\u1061\u1065\u1066\u106e-\u1070\u1075-\u1081\u108e\u10a0-\u10c5\u10c7\u10cd\u10d0-\u10fa\u10fc-\u1248\u124a-\u124d\u1250-\u1256\u1258\u125a-\u125d\u1260-\u1288\u128a-\u128d\u1290-\u12b0\u12b2-\u12b5\u12b8-\u12be\u12c0\u12c2-\u12
c5\u12c8-\u12d6\u12d8-\u1310\u1312-\u1315\u1318-\u135a\u1380-\u138f\u13a0-\u13f5\u13f8-\u13fd\u1401-\u166c\u166f-\u167f\u1681-\u169a\u16a0-\u16ea\u16ee-\u16f8\u1700-\u1711\u171f-\u1731\u1740-\u1751\u1760-\u176c\u176e-\u1770\u1780-\u17b3\u17d7\u17dc\u1820-\u1878\u1880-\u18a8\u18aa\u18b0-\u18f5\u1900-\u191e\u1950-\u196d\u1970-\u1974\u1980-\u19ab\u19b0-\u19c9\u1a00-\u1a16\u1a20-\u1a54\u1aa7\u1b05-\u1b33\u1b45-\u1b4c\u1b83-\u1ba0\u1bae\u1baf\u1bba-\u1be5\u1c00-\u1c23\u1c4d-\u1c4f\u1c5a-\u1c7d\u1c80-\u1c8a\u1c90-\u1cba\u1cbd-\u1cbf\u1ce9-\u1cec\u1cee-\u1cf3\u1cf5\u1cf6\u1cfa\u1d00-\u1dbf\u1e00-\u1f15\u1f18-\u1f1d\u1f20-\u1f45\u1f48-\u1f4d\u1f50-\u1f57\u1f59\u1f5b\u1f5d\u1f5f-\u1f7d\u1f80-\u1fb4\u1fb6-\u1fbc\u1fbe\u1fc2-\u1fc4\u1fc6-\u1fcc\u1fd0-\u1fd3\u1fd6-\u1fdb\u1fe0-\u1fec\u1ff2-\u1ff4\u1ff6-\u1ffc\u2071\u207f\u2090-\u209c\u2102\u2107\u210a-\u2113\u2115\u2118-\u211d\u2124\u2126\u2128\u212a-\u2139\u213c-\u213f\u2145-\u2149\u214e\u2160-\u2188\u2c00-\u2ce4\u2ceb-\u2cee\u2cf2\u2cf3\u2d00-\u2d25\u2d27\u2d2d\u2d30-\u2d67\u2d6f\u2d80-\u2d96\u2da0-\u2da6\u2da8-\u2dae\u2db0-\u2db6\u2db8-\u2dbe\u2dc0-\u2dc6\u2dc8-\u2dce\u2dd0-\u2dd6\u2dd8-\u2dde\u3005-\u3007\u3021-\u3029\u3031-\u3035\u3038-\u303c\u3041-\u3096\u309b-\u309f\u30a1-\u30fa\u30fc-\u30ff\u3105-\u312f\u3131-\u318e\u31a0-\u31bf\u31f0-\u31ff\u3400-\u4dbf\u4e00-\ua48c\ua4d0-\ua4fd\ua500-\ua60c\ua610-\ua61f\ua62a\ua62b\ua640-\ua66e\ua67f-\ua69d\ua6a0-\ua6ef\ua717-\ua71f\ua722-\ua788\ua78b-\ua7cd\ua7d0\ua7d1\ua7d3\ua7d5-\ua7dc\ua7f2-\ua801\ua803-\ua805\ua807-\ua80a\ua80c-\ua822\ua840-\ua873\ua882-\ua8b3\ua8f2-\ua8f7\ua8fb\ua8fd\ua8fe\ua90a-\ua925\ua930-\ua946\ua960-\ua97c\ua984-\ua9b2\ua9cf\ua9e0-\ua9e4\ua9e6-\ua9ef\ua9fa-\ua9fe\uaa00-\uaa28\uaa40-\uaa42\uaa44-\uaa4b\uaa60-\uaa76\uaa7a\uaa7e-\uaaaf\uaab1\uaab5\uaab6\uaab9-\uaabd\uaac0\uaac2\uaadb-\uaadd\uaae0-\uaaea\uaaf2-\uaaf4\uab01-\uab06\uab09-\uab0e\uab11-\uab16\uab20-\uab26\uab28-\uab2e\uab30-\uab5a\uab5c-\uab69\uab70-\uabe2\uac00-\ud7a3\ud7b0-\ud7c6\ud7cb-\ud7fb\uf9
00-\ufa6d\ufa70-\ufad9\ufb00-\ufb06\ufb13-\ufb17\ufb1d\ufb1f-\ufb28\ufb2a-\ufb36\ufb38-\ufb3c\ufb3e\ufb40\ufb41\ufb43\ufb44\ufb46-\ufbb1\ufbd3-\ufd3d\ufd50-\ufd8f\ufd92-\ufdc7\ufdf0-\ufdfb\ufe70-\ufe74\ufe76-\ufefc\uff21-\uff3a\uff41-\uff5a\uff66-\uffbe\uffc2-\uffc7\uffca-\uffcf\uffd2-\uffd7\uffda-\uffdc";
 
 // These are a run-length and offset encoded representation of the
 // >0xffff code points that are a valid part of identifiers. The
@@ -5984,7 +5984,7 @@ pp.readWord = function() {
 // [walk]: util/walk.js
 
 
-var version = "8.12.1";
+var version = "8.13.0";
 
 Parser.acorn = {
   Parser: Parser,
diff --git a/deps/acorn/acorn/package.json b/deps/acorn/acorn/package.json
index 355692a301ea5d..3396013bbbf060 100644
--- a/deps/acorn/acorn/package.json
+++ b/deps/acorn/acorn/package.json
@@ -16,7 +16,7 @@
     ],
     "./package.json": "./package.json"
   },
-  "version": "8.12.1",
+  "version": "8.13.0",
   "engines": {
     "node": ">=0.4.0"
   },
diff --git a/src/acorn_version.h b/src/acorn_version.h
index b625cceb704283..fdafbf96987762 100644
--- a/src/acorn_version.h
+++ b/src/acorn_version.h
@@ -2,5 +2,5 @@
 // Refer to tools/dep_updaters/update-acorn.sh
 #ifndef SRC_ACORN_VERSION_H_
 #define SRC_ACORN_VERSION_H_
-#define ACORN_VERSION "8.12.1"
+#define ACORN_VERSION "8.13.0"
 #endif  // SRC_ACORN_VERSION_H_

From 046430c47efb5ac758253d4e2d2fd539fa94535b Mon Sep 17 00:00:00 2001
From: Michael Cho <michael@michaelcho.dev>
Date: Tue, 29 Oct 2024 19:24:38 -0400
Subject: [PATCH 056/216] build: fix building with system icu 76

ICU 76 reduced overlinking[^1], so the `icu-i18n` pkg-config module no
longer pulls in `icu-uc` when linking against shared libraries. Building
with system ICU 76 therefore fails with undefined symbols/references
unless `icu-uc` is requested explicitly.

[^1]: unicode-org/icu@199bc82

PR-URL: https://github.com/nodejs/node/pull/55563
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 configure.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.py b/configure.py
index 6cfb95eee95690..ae3dd156d4e02e 100755
--- a/configure.py
+++ b/configure.py
@@ -1829,7 +1829,7 @@ def icu_download(path):
   elif with_intl == 'system-icu':
     # ICU from pkg-config.
     o['variables']['v8_enable_i18n_support'] = 1
-    pkgicu = pkg_config('icu-i18n')
+    pkgicu = pkg_config(['icu-i18n', 'icu-uc'])
     if not pkgicu[0]:
       error('''Could not load pkg-config data for "icu-i18n".
        See above errors or the README.md.''')

From 0f8b8269d1055408e297abe2781d7969da112635 Mon Sep 17 00:00:00 2001
From: Julian Gassner <julian.gassner@tum.de>
Date: Wed, 30 Oct 2024 03:10:30 +0100
Subject: [PATCH 057/216] test: split up test-runner-mock-timers test

PR-URL: https://github.com/nodejs/node/pull/55506
Reviewed-By: Erick Wendel <erick.workspace@gmail.com>
Reviewed-By: Chemi Atlow <chemi@atlow.co.il>
Reviewed-By: Claudio Wunder <cwunder@gnome.org>
---
 test/parallel/test-runner-mock-timers-date.js | 120 +++++++++
 .../test-runner-mock-timers-scheduler.js      | 124 +++++++++
 test/parallel/test-runner-mock-timers.js      | 240 +-----------------
 3 files changed, 248 insertions(+), 236 deletions(-)
 create mode 100644 test/parallel/test-runner-mock-timers-date.js
 create mode 100644 test/parallel/test-runner-mock-timers-scheduler.js

diff --git a/test/parallel/test-runner-mock-timers-date.js b/test/parallel/test-runner-mock-timers-date.js
new file mode 100644
index 00000000000000..ebd1e430be803f
--- /dev/null
+++ b/test/parallel/test-runner-mock-timers-date.js
@@ -0,0 +1,120 @@
+'use strict';
+process.env.NODE_TEST_KNOWN_GLOBALS = 0;
+require('../common');
+
+const assert = require('node:assert');
+const { it, describe } = require('node:test');
+
+describe('Mock Timers Date Test Suite', () => {
+  it('should return the initial UNIX epoch if not specified', (t) => {
+    t.mock.timers.enable({ apis: ['Date'] });
+    const date = new Date();
+    assert.strictEqual(date.getTime(), 0);
+    assert.strictEqual(Date.now(), 0);
+  });
+
+  it('should throw an error if setTime is called without enabling timers', (t) => {
+    assert.throws(
+      () => {
+        t.mock.timers.setTime(100);
+      },
+      { code: 'ERR_INVALID_STATE' }
+    );
+  });
+
+  it('should throw an error if epoch passed to enable is not valid', (t) => {
+    assert.throws(
+      () => {
+        t.mock.timers.enable({ now: -1 });
+      },
+      { code: 'ERR_INVALID_ARG_VALUE' }
+    );
+
+    assert.throws(
+      () => {
+        t.mock.timers.enable({ now: 'string' });
+      },
+      { code: 'ERR_INVALID_ARG_TYPE' }
+    );
+
+    assert.throws(
+      () => {
+        t.mock.timers.enable({ now: NaN });
+      },
+      { code: 'ERR_INVALID_ARG_VALUE' }
+    );
+  });
+
+  it('should replace the original Date with the mocked one', (t) => {
+    t.mock.timers.enable({ apis: ['Date'] });
+    assert.ok(Date.isMock);
+  });
+
+  it('should return the ticked time when calling Date.now after tick', (t) => {
+    t.mock.timers.enable({ apis: ['Date'] });
+    const time = 100;
+    t.mock.timers.tick(time);
+    assert.strictEqual(Date.now(), time);
+  });
+
+  it('should return the Date as string when calling it as a function', (t) => {
+    t.mock.timers.enable({ apis: ['Date'] });
+    const returned = Date();
+    // Matches the format: 'Mon Jan 01 1970 00:00:00'
+    // We don't care about the date, just the format
+    assert.ok(/\w{3}\s\w{3}\s\d{1,2}\s\d{2,4}\s\d{1,2}:\d{2}:\d{2}/.test(returned));
+  });
+
+  it('should return the date with different argument calls', (t) => {
+    t.mock.timers.enable({ apis: ['Date'] });
+    assert.strictEqual(new Date(0).getTime(), 0);
+    assert.strictEqual(new Date(100).getTime(), 100);
+    assert.strictEqual(new Date('1970-01-01T00:00:00.000Z').getTime(), 0);
+    assert.strictEqual(new Date(1970, 0).getFullYear(), 1970);
+    assert.strictEqual(new Date(1970, 0).getMonth(), 0);
+    assert.strictEqual(new Date(1970, 0, 1).getDate(), 1);
+    assert.strictEqual(new Date(1970, 0, 1, 11).getHours(), 11);
+    assert.strictEqual(new Date(1970, 0, 1, 11, 10).getMinutes(), 10);
+    assert.strictEqual(new Date(1970, 0, 1, 11, 10, 45).getSeconds(), 45);
+    assert.strictEqual(new Date(1970, 0, 1, 11, 10, 45, 898).getMilliseconds(), 898);
+    assert.strictEqual(new Date(1970, 0, 1, 11, 10, 45, 898).toDateString(), 'Thu Jan 01 1970');
+  });
+
+  it('should return native code when calling Date.toString', (t) => {
+    t.mock.timers.enable({ apis: ['Date'] });
+    assert.strictEqual(Date.toString(), 'function Date() { [native code] }');
+  });
+
+  it('should start with a custom epoch if the second argument is specified', (t) => {
+    t.mock.timers.enable({ apis: ['Date'], now: 100 });
+    const date1 = new Date();
+    assert.strictEqual(date1.getTime(), 100);
+
+    t.mock.timers.reset();
+    t.mock.timers.enable({ apis: ['Date'], now: new Date(200) });
+    const date2 = new Date();
+    assert.strictEqual(date2.getTime(), 200);
+  });
+
+  it('should replace epoch if setTime is lesser than now and not tick', (t) => {
+    t.mock.timers.enable();
+    const fn = t.mock.fn();
+    const id = setTimeout(fn, 1000);
+    t.mock.timers.setTime(800);
+    assert.strictEqual(Date.now(), 800);
+    t.mock.timers.setTime(500);
+    assert.strictEqual(Date.now(), 500);
+    assert.strictEqual(fn.mock.callCount(), 0);
+    clearTimeout(id);
+  });
+
+  it('should not tick time when setTime is called', (t) => {
+    t.mock.timers.enable();
+    const fn = t.mock.fn();
+    const id = setTimeout(fn, 1000);
+    t.mock.timers.setTime(1200);
+    assert.strictEqual(Date.now(), 1200);
+    assert.strictEqual(fn.mock.callCount(), 0);
+    clearTimeout(id);
+  });
+});
diff --git a/test/parallel/test-runner-mock-timers-scheduler.js b/test/parallel/test-runner-mock-timers-scheduler.js
new file mode 100644
index 00000000000000..6a83056e70eda2
--- /dev/null
+++ b/test/parallel/test-runner-mock-timers-scheduler.js
@@ -0,0 +1,124 @@
+'use strict';
+process.env.NODE_TEST_KNOWN_GLOBALS = 0;
+const common = require('../common');
+
+const assert = require('node:assert');
+const { it, describe } = require('node:test');
+const nodeTimersPromises = require('node:timers/promises');
+
+describe('Mock Timers Scheduler Test Suite', () => {
+  it('should advance in time and trigger timers when calling the .tick function', (t) => {
+    t.mock.timers.enable({ apis: ['scheduler.wait'] });
+
+    const now = Date.now();
+    const durationAtMost = 100;
+
+    const p = nodeTimersPromises.scheduler.wait(4000);
+    t.mock.timers.tick(4000);
+
+    return p.then(common.mustCall((result) => {
+      assert.strictEqual(result, undefined);
+      assert.ok(
+        Date.now() - now < durationAtMost,
+        `time should be advanced less than the ${durationAtMost}ms`
+      );
+    }));
+  });
+
+  it('should advance in time and trigger timers when calling the .tick function multiple times', async (t) => {
+    t.mock.timers.enable({ apis: ['scheduler.wait'] });
+
+    const fn = t.mock.fn();
+
+    nodeTimersPromises.scheduler.wait(9999).then(fn);
+
+    t.mock.timers.tick(8999);
+    assert.strictEqual(fn.mock.callCount(), 0);
+    t.mock.timers.tick(500);
+
+    await nodeTimersPromises.setImmediate();
+
+    assert.strictEqual(fn.mock.callCount(), 0);
+    t.mock.timers.tick(500);
+
+    await nodeTimersPromises.setImmediate();
+    assert.strictEqual(fn.mock.callCount(), 1);
+  });
+
+  it('should work with the same params as the original timers/promises/scheduler.wait', async (t) => {
+    t.mock.timers.enable({ apis: ['scheduler.wait'] });
+    const controller = new AbortController();
+    const p = nodeTimersPromises.scheduler.wait(2000, {
+      ref: true,
+      signal: controller.signal,
+    });
+
+    t.mock.timers.tick(1000);
+    t.mock.timers.tick(500);
+    t.mock.timers.tick(500);
+    t.mock.timers.tick(500);
+
+    const result = await p;
+    assert.strictEqual(result, undefined);
+  });
+
+  it('should abort operation if timers/promises/scheduler.wait received an aborted signal', async (t) => {
+    t.mock.timers.enable({ apis: ['scheduler.wait'] });
+    const controller = new AbortController();
+    const p = nodeTimersPromises.scheduler.wait(2000, {
+      ref: true,
+      signal: controller.signal,
+    });
+
+    t.mock.timers.tick(1000);
+    controller.abort();
+    t.mock.timers.tick(500);
+    t.mock.timers.tick(500);
+    t.mock.timers.tick(500);
+
+    await assert.rejects(() => p, {
+      name: 'AbortError',
+    });
+  });
+  it('should abort operation even if the .tick was not called', async (t) => {
+    t.mock.timers.enable({ apis: ['scheduler.wait'] });
+    const controller = new AbortController();
+    const p = nodeTimersPromises.scheduler.wait(2000, {
+      ref: true,
+      signal: controller.signal,
+    });
+
+    controller.abort();
+
+    await assert.rejects(() => p, {
+      name: 'AbortError',
+    });
+  });
+
+  it('should abort operation when .abort is called before calling setInterval', async (t) => {
+    t.mock.timers.enable({ apis: ['scheduler.wait'] });
+    const controller = new AbortController();
+    controller.abort();
+    const p = nodeTimersPromises.scheduler.wait(2000, {
+      ref: true,
+      signal: controller.signal,
+    });
+
+    await assert.rejects(() => p, {
+      name: 'AbortError',
+    });
+  });
+
+  it('should reject given an an invalid signal instance', async (t) => {
+    t.mock.timers.enable({ apis: ['scheduler.wait'] });
+    const p = nodeTimersPromises.scheduler.wait(2000, {
+      ref: true,
+      signal: {},
+    });
+
+    await assert.rejects(() => p, {
+      name: 'TypeError',
+      code: 'ERR_INVALID_ARG_TYPE',
+    });
+  });
+});
diff --git a/test/parallel/test-runner-mock-timers.js b/test/parallel/test-runner-mock-timers.js
index e438b2636b832a..87b8ba7e3784d2 100644
--- a/test/parallel/test-runner-mock-timers.js
+++ b/test/parallel/test-runner-mock-timers.js
@@ -1,3 +1,4 @@
+// Flags: --expose-internals
 'use strict';
 process.env.NODE_TEST_KNOWN_GLOBALS = 0;
 const common = require('../common');
@@ -6,6 +7,7 @@ const assert = require('node:assert');
 const { it, mock, describe } = require('node:test');
 const nodeTimers = require('node:timers');
 const nodeTimersPromises = require('node:timers/promises');
+const { TIMEOUT_MAX } = require('internal/timers');
 
 describe('Mock Timers Test Suite', () => {
   describe('MockTimers API', () => {
@@ -252,10 +254,10 @@ describe('Mock Timers Test Suite', () => {
         }), timeout);
       });
 
-      it('should change timeout to 1ms when it is >= 2 ** 31', (t) => {
+      it('should change timeout to 1ms when it is > TIMEOUT_MAX', (t) => {
         t.mock.timers.enable({ apis: ['setTimeout'] });
         const fn = t.mock.fn();
-        global.setTimeout(fn, 2 ** 31);
+        global.setTimeout(fn, TIMEOUT_MAX + 1);
         t.mock.timers.tick(1);
         assert.strictEqual(fn.mock.callCount(), 1);
       });
@@ -791,240 +793,6 @@ describe('Mock Timers Test Suite', () => {
     });
   });
 
-  describe('scheduler Suite', () => {
-    describe('scheduler.wait', () => {
-      it('should advance in time and trigger timers when calling the .tick function', (t) => {
-        t.mock.timers.enable({ apis: ['scheduler.wait'] });
-
-        const now = Date.now();
-        const durationAtMost = 100;
-
-        const p = nodeTimersPromises.scheduler.wait(4000);
-        t.mock.timers.tick(4000);
-
-        return p.then(common.mustCall((result) => {
-          assert.strictEqual(result, undefined);
-          assert.ok(
-            Date.now() - now < durationAtMost,
-            `time should be advanced less than the ${durationAtMost}ms`
-          );
-        }));
-      });
-
-      it('should advance in time and trigger timers when calling the .tick function multiple times', async (t) => {
-        t.mock.timers.enable({ apis: ['scheduler.wait'] });
-
-        const fn = t.mock.fn();
-
-        nodeTimersPromises.scheduler.wait(9999).then(fn);
-
-        t.mock.timers.tick(8999);
-        assert.strictEqual(fn.mock.callCount(), 0);
-        t.mock.timers.tick(500);
-
-        await nodeTimersPromises.setImmediate();
-
-        assert.strictEqual(fn.mock.callCount(), 0);
-        t.mock.timers.tick(500);
-
-        await nodeTimersPromises.setImmediate();
-        assert.strictEqual(fn.mock.callCount(), 1);
-      });
-
-      it('should work with the same params as the original timers/promises/scheduler.wait', async (t) => {
-        t.mock.timers.enable({ apis: ['scheduler.wait'] });
-        const controller = new AbortController();
-        const p = nodeTimersPromises.scheduler.wait(2000, {
-          ref: true,
-          signal: controller.signal,
-        });
-
-        t.mock.timers.tick(1000);
-        t.mock.timers.tick(500);
-        t.mock.timers.tick(500);
-        t.mock.timers.tick(500);
-
-        const result = await p;
-        assert.strictEqual(result, undefined);
-      });
-
-      it('should abort operation if timers/promises/scheduler.wait received an aborted signal', async (t) => {
-        t.mock.timers.enable({ apis: ['scheduler.wait'] });
-        const controller = new AbortController();
-        const p = nodeTimersPromises.scheduler.wait(2000, {
-          ref: true,
-          signal: controller.signal,
-        });
-
-        t.mock.timers.tick(1000);
-        controller.abort();
-        t.mock.timers.tick(500);
-        t.mock.timers.tick(500);
-        t.mock.timers.tick(500);
-
-        await assert.rejects(() => p, {
-          name: 'AbortError',
-        });
-      });
-      it('should abort operation even if the .tick was not called', async (t) => {
-        t.mock.timers.enable({ apis: ['scheduler.wait'] });
-        const controller = new AbortController();
-        const p = nodeTimersPromises.scheduler.wait(2000, {
-          ref: true,
-          signal: controller.signal,
-        });
-
-        controller.abort();
-
-        await assert.rejects(() => p, {
-          name: 'AbortError',
-        });
-      });
-
-      it('should abort operation when .abort is called before calling setInterval', async (t) => {
-        t.mock.timers.enable({ apis: ['scheduler.wait'] });
-        const controller = new AbortController();
-        controller.abort();
-        const p = nodeTimersPromises.scheduler.wait(2000, {
-          ref: true,
-          signal: controller.signal,
-        });
-
-        await assert.rejects(() => p, {
-          name: 'AbortError',
-        });
-      });
-
-      it('should reject given an an invalid signal instance', async (t) => {
-        t.mock.timers.enable({ apis: ['scheduler.wait'] });
-        const p = nodeTimersPromises.scheduler.wait(2000, {
-          ref: true,
-          signal: {},
-        });
-
-        await assert.rejects(() => p, {
-          name: 'TypeError',
-          code: 'ERR_INVALID_ARG_TYPE',
-        });
-      });
-
-    });
-  });
-
-  describe('Date Suite', () => {
-    it('should return the initial UNIX epoch if not specified', (t) => {
-      t.mock.timers.enable({ apis: ['Date'] });
-      const date = new Date();
-      assert.strictEqual(date.getTime(), 0);
-      assert.strictEqual(Date.now(), 0);
-    });
-
-    it('should throw an error if setTime is called without enabling timers', (t) => {
-      assert.throws(
-        () => {
-          t.mock.timers.setTime(100);
-        },
-        { code: 'ERR_INVALID_STATE' }
-      );
-    });
-
-    it('should throw an error if epoch passed to enable is not valid', (t) => {
-      assert.throws(
-        () => {
-          t.mock.timers.enable({ now: -1 });
-        },
-        { code: 'ERR_INVALID_ARG_VALUE' }
-      );
-
-      assert.throws(
-        () => {
-          t.mock.timers.enable({ now: 'string' });
-        },
-        { code: 'ERR_INVALID_ARG_TYPE' }
-      );
-
-      assert.throws(
-        () => {
-          t.mock.timers.enable({ now: NaN });
-        },
-        { code: 'ERR_INVALID_ARG_VALUE' }
-      );
-    });
-
-    it('should replace the original Date with the mocked one', (t) => {
-      t.mock.timers.enable({ apis: ['Date'] });
-      assert.ok(Date.isMock);
-    });
-
-    it('should return the ticked time when calling Date.now after tick', (t) => {
-      t.mock.timers.enable({ apis: ['Date'] });
-      const time = 100;
-      t.mock.timers.tick(time);
-      assert.strictEqual(Date.now(), time);
-    });
-
-    it('should return the Date as string when calling it as a function', (t) => {
-      t.mock.timers.enable({ apis: ['Date'] });
-      const returned = Date();
-      // Matches the format: 'Mon Jan 01 1970 00:00:00'
-      // We don't care about the date, just the format
-      assert.ok(/\w{3}\s\w{3}\s\d{1,2}\s\d{2,4}\s\d{1,2}:\d{2}:\d{2}/.test(returned));
-    });
-
-    it('should return the date with different argument calls', (t) => {
-      t.mock.timers.enable({ apis: ['Date'] });
-      assert.strictEqual(new Date(0).getTime(), 0);
-      assert.strictEqual(new Date(100).getTime(), 100);
-      assert.strictEqual(new Date('1970-01-01T00:00:00.000Z').getTime(), 0);
-      assert.strictEqual(new Date(1970, 0).getFullYear(), 1970);
-      assert.strictEqual(new Date(1970, 0).getMonth(), 0);
-      assert.strictEqual(new Date(1970, 0, 1).getDate(), 1);
-      assert.strictEqual(new Date(1970, 0, 1, 11).getHours(), 11);
-      assert.strictEqual(new Date(1970, 0, 1, 11, 10).getMinutes(), 10);
-      assert.strictEqual(new Date(1970, 0, 1, 11, 10, 45).getSeconds(), 45);
-      assert.strictEqual(new Date(1970, 0, 1, 11, 10, 45, 898).getMilliseconds(), 898);
-      assert.strictEqual(new Date(1970, 0, 1, 11, 10, 45, 898).toDateString(), 'Thu Jan 01 1970');
-    });
-
-    it('should return native code when calling Date.toString', (t) => {
-      t.mock.timers.enable({ apis: ['Date'] });
-      assert.strictEqual(Date.toString(), 'function Date() { [native code] }');
-    });
-
-    it('should start with a custom epoch if the second argument is specified', (t) => {
-      t.mock.timers.enable({ apis: ['Date'], now: 100 });
-      const date1 = new Date();
-      assert.strictEqual(date1.getTime(), 100);
-
-      t.mock.timers.reset();
-      t.mock.timers.enable({ apis: ['Date'], now: new Date(200) });
-      const date2 = new Date();
-      assert.strictEqual(date2.getTime(), 200);
-    });
-
-    it('should replace epoch if setTime is lesser than now and not tick', (t) => {
-      t.mock.timers.enable();
-      const fn = t.mock.fn();
-      const id = setTimeout(fn, 1000);
-      t.mock.timers.setTime(800);
-      assert.strictEqual(Date.now(), 800);
-      t.mock.timers.setTime(500);
-      assert.strictEqual(Date.now(), 500);
-      assert.strictEqual(fn.mock.callCount(), 0);
-      clearTimeout(id);
-    });
-
-    it('should not tick time when setTime is called', (t) => {
-      t.mock.timers.enable();
-      const fn = t.mock.fn();
-      const id = setTimeout(fn, 1000);
-      t.mock.timers.setTime(1200);
-      assert.strictEqual(Date.now(), 1200);
-      assert.strictEqual(fn.mock.callCount(), 0);
-      clearTimeout(id);
-    });
-  });
-
   describe('Api should have same public properties as original', () => {
     it('should have hasRef', (t) => {
       t.mock.timers.enable();

From 12bd57fbaa19ee102235b7d53e68fa4bf01aad5f Mon Sep 17 00:00:00 2001
From: Aviv Keller <redyetidev@gmail.com>
Date: Wed, 30 Oct 2024 09:33:04 -0400
Subject: [PATCH 058/216] doc: capitalize "MIT License"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55575
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index f51b1487f86589..8f01a50eb370d7 100644
--- a/README.md
+++ b/README.md
@@ -877,7 +877,7 @@ releases on a rotation basis as outlined in the
 ## License
 
 Node.js is available under the
-[MIT license](https://opensource.org/licenses/MIT). Node.js also includes
+[MIT License](https://opensource.org/licenses/MIT). Node.js also includes
 external libraries that are available under a variety of licenses.  See
 [LICENSE](https://github.com/nodejs/node/blob/HEAD/LICENSE) for the full
 license text.

From 22e0d17097fa419cde5fcd5d648fe70aa9fb80e2 Mon Sep 17 00:00:00 2001
From: Aviv Keller <redyetidev@gmail.com>
Date: Wed, 30 Oct 2024 10:10:28 -0400
Subject: [PATCH 059/216] dns: stop using deprecated `ares_query`

PR-URL: https://github.com/nodejs/node/pull/55430
Refs: https://github.com/nodejs/node/issues/52464
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
---
 src/cares_wrap.cc | 24 ++++++++++++------------
 src/cares_wrap.h  | 35 +++++++++++++++++++----------------
 2 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/src/cares_wrap.cc b/src/cares_wrap.cc
index 2392adfc0121bc..6209956c405e04 100644
--- a/src/cares_wrap.cc
+++ b/src/cares_wrap.cc
@@ -829,62 +829,62 @@ void ChannelWrap::EnsureServers() {
 }
 
 int AnyTraits::Send(QueryWrap<AnyTraits>* wrap, const char* name) {
-  wrap->AresQuery(name, ns_c_in, ns_t_any);
+  wrap->AresQuery(name, ARES_CLASS_IN, ARES_REC_TYPE_ANY);
   return ARES_SUCCESS;
 }
 
 int ATraits::Send(QueryWrap<ATraits>* wrap, const char* name) {
-  wrap->AresQuery(name, ns_c_in, ns_t_a);
+  wrap->AresQuery(name, ARES_CLASS_IN, ARES_REC_TYPE_A);
   return ARES_SUCCESS;
 }
 
 int AaaaTraits::Send(QueryWrap<AaaaTraits>* wrap, const char* name) {
-  wrap->AresQuery(name, ns_c_in, ns_t_aaaa);
+  wrap->AresQuery(name, ARES_CLASS_IN, ARES_REC_TYPE_AAAA);
   return ARES_SUCCESS;
 }
 
 int CaaTraits::Send(QueryWrap<CaaTraits>* wrap, const char* name) {
-  wrap->AresQuery(name, ns_c_in, T_CAA);
+  wrap->AresQuery(name, ARES_CLASS_IN, ARES_REC_TYPE_CAA);
   return ARES_SUCCESS;
 }
 
 int CnameTraits::Send(QueryWrap<CnameTraits>* wrap, const char* name) {
-  wrap->AresQuery(name, ns_c_in, ns_t_cname);
+  wrap->AresQuery(name, ARES_CLASS_IN, ARES_REC_TYPE_CNAME);
   return ARES_SUCCESS;
 }
 
 int MxTraits::Send(QueryWrap<MxTraits>* wrap, const char* name) {
-  wrap->AresQuery(name, ns_c_in, ns_t_mx);
+  wrap->AresQuery(name, ARES_CLASS_IN, ARES_REC_TYPE_MX);
   return ARES_SUCCESS;
 }
 
 int NsTraits::Send(QueryWrap<NsTraits>* wrap, const char* name) {
-  wrap->AresQuery(name, ns_c_in, ns_t_ns);
+  wrap->AresQuery(name, ARES_CLASS_IN, ARES_REC_TYPE_NS);
   return ARES_SUCCESS;
 }
 
 int TxtTraits::Send(QueryWrap<TxtTraits>* wrap, const char* name) {
-  wrap->AresQuery(name, ns_c_in, ns_t_txt);
+  wrap->AresQuery(name, ARES_CLASS_IN, ARES_REC_TYPE_TXT);
   return ARES_SUCCESS;
 }
 
 int SrvTraits::Send(QueryWrap<SrvTraits>* wrap, const char* name) {
-  wrap->AresQuery(name, ns_c_in, ns_t_srv);
+  wrap->AresQuery(name, ARES_CLASS_IN, ARES_REC_TYPE_SRV);
   return ARES_SUCCESS;
 }
 
 int PtrTraits::Send(QueryWrap<PtrTraits>* wrap, const char* name) {
-  wrap->AresQuery(name, ns_c_in, ns_t_ptr);
+  wrap->AresQuery(name, ARES_CLASS_IN, ARES_REC_TYPE_PTR);
   return ARES_SUCCESS;
 }
 
 int NaptrTraits::Send(QueryWrap<NaptrTraits>* wrap, const char* name) {
-  wrap->AresQuery(name, ns_c_in, ns_t_naptr);
+  wrap->AresQuery(name, ARES_CLASS_IN, ARES_REC_TYPE_NAPTR);
   return ARES_SUCCESS;
 }
 
 int SoaTraits::Send(QueryWrap<SoaTraits>* wrap, const char* name) {
-  wrap->AresQuery(name, ns_c_in, ns_t_soa);
+  wrap->AresQuery(name, ARES_CLASS_IN, ARES_REC_TYPE_SOA);
   return ARES_SUCCESS;
 }
 
diff --git a/src/cares_wrap.h b/src/cares_wrap.h
index 021ef1c9de518e..4a5d22c0ef085f 100644
--- a/src/cares_wrap.h
+++ b/src/cares_wrap.h
@@ -246,18 +246,20 @@ class QueryWrap final : public AsyncWrap {
     return Traits::Send(this, name);
   }
 
-  void AresQuery(const char* name, int dnsclass, int type) {
+  void AresQuery(const char* name,
+                 ares_dns_class_t dnsclass,
+                 ares_dns_rec_type_t type) {
     channel_->EnsureServers();
     TRACE_EVENT_NESTABLE_ASYNC_BEGIN1(
       TRACING_CATEGORY_NODE2(dns, native), trace_name_, this,
       "name", TRACE_STR_COPY(name));
-    ares_query(
-        channel_->cares_channel(),
-        name,
-        dnsclass,
-        type,
-        Callback,
-        MakeCallbackPointer());
+    ares_query_dnsrec(channel_->cares_channel(),
+                      name,
+                      dnsclass,
+                      type,
+                      Callback,
+                      MakeCallbackPointer(),
+                      nullptr);
   }
 
   void ParseError(int status) {
@@ -304,19 +306,20 @@ class QueryWrap final : public AsyncWrap {
     return wrap;
   }
 
-  static void Callback(
-      void* arg,
-      int status,
-      int timeouts,
-      unsigned char* answer_buf,
-      int answer_len) {
+  static void Callback(void* arg,
+                       ares_status_t status,
+                       size_t timeouts,
+                       const ares_dns_record_t* dnsrec) {
     QueryWrap<Traits>* wrap = FromCallbackPointer(arg);
     if (wrap == nullptr) return;
 
     unsigned char* buf_copy = nullptr;
+    size_t answer_len = 0;
     if (status == ARES_SUCCESS) {
-      buf_copy = node::Malloc<unsigned char>(answer_len);
-      memcpy(buf_copy, answer_buf, answer_len);
+      // No need to explicitly call ares_free_string here,
+      // as it is a wrapper around free, which is already
+      // invoked when MallocedBuffer is destructed.
+      ares_dns_write(dnsrec, &buf_copy, &answer_len);
     }
 
     wrap->response_data_ = std::make_unique<ResponseData>();

From 8d5b8c31d84963a7958c419068d23ab84f8c0c8f Mon Sep 17 00:00:00 2001
From: Charles Kerr <charles@charleskerr.com>
Date: Wed, 30 Oct 2024 23:48:58 -0500
Subject: [PATCH 060/216] src: use NewFromUtf8Literal in NODE_DEFINE_CONSTANT

Small efficiency improvement over NewFromUtf8(): the literal's
length is known at compile time, so V8 doesn't have to call
strlen() or ToLocalChecked().

PR-URL: https://github.com/nodejs/node/pull/55581
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
---
 src/node.h | 70 +++++++++++++++++++++++++-----------------------------
 1 file changed, 32 insertions(+), 38 deletions(-)

diff --git a/src/node.h b/src/node.h
index 6373adacb62845..ec5f6d0d25731d 100644
--- a/src/node.h
+++ b/src/node.h
@@ -1023,44 +1023,38 @@ NODE_DEPRECATED("Use v8::Date::ValueOf() directly",
 })
 #define NODE_V8_UNIXTIME node::NODE_V8_UNIXTIME
 
-#define NODE_DEFINE_CONSTANT(target, constant)                                \
-  do {                                                                        \
-    v8::Isolate* isolate = target->GetIsolate();                              \
-    v8::Local<v8::Context> context = isolate->GetCurrentContext();            \
-    v8::Local<v8::String> constant_name =                                     \
-        v8::String::NewFromUtf8(isolate, #constant,                           \
-            v8::NewStringType::kInternalized).ToLocalChecked();               \
-    v8::Local<v8::Number> constant_value =                                    \
-        v8::Number::New(isolate, static_cast<double>(constant));              \
-    v8::PropertyAttribute constant_attributes =                               \
-        static_cast<v8::PropertyAttribute>(v8::ReadOnly | v8::DontDelete);    \
-    (target)->DefineOwnProperty(context,                                      \
-                                constant_name,                                \
-                                constant_value,                               \
-                                constant_attributes).Check();                 \
-  }                                                                           \
-  while (0)
-
-#define NODE_DEFINE_HIDDEN_CONSTANT(target, constant)                         \
-  do {                                                                        \
-    v8::Isolate* isolate = target->GetIsolate();                              \
-    v8::Local<v8::Context> context = isolate->GetCurrentContext();            \
-    v8::Local<v8::String> constant_name =                                     \
-        v8::String::NewFromUtf8(isolate, #constant,                           \
-                                v8::NewStringType::kInternalized)             \
-                                  .ToLocalChecked();                          \
-    v8::Local<v8::Number> constant_value =                                    \
-        v8::Number::New(isolate, static_cast<double>(constant));              \
-    v8::PropertyAttribute constant_attributes =                               \
-        static_cast<v8::PropertyAttribute>(v8::ReadOnly |                     \
-                                           v8::DontDelete |                   \
-                                           v8::DontEnum);                     \
-    (target)->DefineOwnProperty(context,                                      \
-                                constant_name,                                \
-                                constant_value,                               \
-                                constant_attributes).Check();                 \
-  }                                                                           \
-  while (0)
+#define NODE_DEFINE_CONSTANT(target, constant)                                 \
+  do {                                                                         \
+    v8::Isolate* isolate = target->GetIsolate();                               \
+    v8::Local<v8::Context> context = isolate->GetCurrentContext();             \
+    v8::Local<v8::String> constant_name = v8::String::NewFromUtf8Literal(      \
+        isolate, #constant, v8::NewStringType::kInternalized);                 \
+    v8::Local<v8::Number> constant_value =                                     \
+        v8::Number::New(isolate, static_cast<double>(constant));               \
+    v8::PropertyAttribute constant_attributes =                                \
+        static_cast<v8::PropertyAttribute>(v8::ReadOnly | v8::DontDelete);     \
+    (target)                                                                   \
+        ->DefineOwnProperty(                                                   \
+            context, constant_name, constant_value, constant_attributes)       \
+        .Check();                                                              \
+  } while (0)
+
+#define NODE_DEFINE_HIDDEN_CONSTANT(target, constant)                          \
+  do {                                                                         \
+    v8::Isolate* isolate = target->GetIsolate();                               \
+    v8::Local<v8::Context> context = isolate->GetCurrentContext();             \
+    v8::Local<v8::String> constant_name = v8::String::NewFromUtf8Literal(      \
+        isolate, #constant, v8::NewStringType::kInternalized);                 \
+    v8::Local<v8::Number> constant_value =                                     \
+        v8::Number::New(isolate, static_cast<double>(constant));               \
+    v8::PropertyAttribute constant_attributes =                                \
+        static_cast<v8::PropertyAttribute>(v8::ReadOnly | v8::DontDelete |     \
+                                           v8::DontEnum);                      \
+    (target)                                                                   \
+        ->DefineOwnProperty(                                                   \
+            context, constant_name, constant_value, constant_attributes)       \
+        .Check();                                                              \
+  } while (0)
 
 // Used to be a macro, hence the uppercase name.
 inline void NODE_SET_METHOD(v8::Local<v8::Template> recv,

From 4c15bd44a0afb9b8aa86ca3d3ce3e1a56f6186b3 Mon Sep 17 00:00:00 2001
From: Orgad Shaneh <orgad.shaneh@audiocodes.com>
Date: Thu, 31 Oct 2024 19:08:45 +0200
Subject: [PATCH 061/216] http2: fix client async storage persistence
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Create and store an AsyncResource for each stream, following an approach
similar to the one used in HttpAgent.
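
The idea can be illustrated outside of http2 with a minimal sketch. It is
not the code from this patch: `createFakeStream` and `observedIds` are
hypothetical names, and the plain object only stands in for a stream. It
assumes that an `AsyncResource` created inside `storage.run()` keeps that
store and makes it visible again through `runInAsyncScope()`:

```javascript
'use strict';
const { AsyncLocalStorage, AsyncResource } = require('node:async_hooks');

const storage = new AsyncLocalStorage();

// Hypothetical stand-in for a stream; not the real Http2Stream class.
function createFakeStream(id) {
  return {
    id,
    // The resource captures the async context active at creation time.
    resource: new AsyncResource('PendingRequest'),
  };
}

// Create each fake stream inside its own storage scope.
const streams = [];
for (const id of [1, 2]) {
  storage.run({ id }, () => streams.push(createFakeStream(id)));
}

// Called later, outside of any storage.run() scope: each callback still
// sees the store that was active when its resource was created.
function observedIds() {
  return streams.map((stream) =>
    stream.resource.runInAsyncScope(() => storage.getStore()?.id));
}

console.log(observedIds());
```

Keeping one resource per stream, rather than one per session, is what lets
concurrent requests on the same session see their own stores.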

Fixes: https://github.com/nodejs/node/issues/55376
PR-URL: https://github.com/nodejs/node/pull/55460
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
---
 lib/internal/http2/core.js                    | 13 ++++-
 .../test-http2-async-local-storage.js         | 55 +++++++++++++++++++
 2 files changed, 66 insertions(+), 2 deletions(-)
 create mode 100644 test/parallel/test-http2-async-local-storage.js

diff --git a/lib/internal/http2/core.js b/lib/internal/http2/core.js
index 67ca3e59fd6870..2b8d25acfe7ae3 100644
--- a/lib/internal/http2/core.js
+++ b/lib/internal/http2/core.js
@@ -60,6 +60,8 @@ const {
     owner_symbol,
   },
 } = require('internal/async_hooks');
+const { AsyncResource } = require('async_hooks');
+
 const {
   aggregateTwoErrors,
   codes: {
@@ -241,6 +243,7 @@ const kPendingRequestCalls = Symbol('kPendingRequestCalls');
 const kProceed = Symbol('proceed');
 const kProtocol = Symbol('protocol');
 const kRemoteSettings = Symbol('remote-settings');
+const kRequestAsyncResource = Symbol('requestAsyncResource');
 const kSelectPadding = Symbol('select-padding');
 const kSentHeaders = Symbol('sent-headers');
 const kSentTrailers = Symbol('sent-trailers');
@@ -408,7 +411,11 @@ function onSessionHeaders(handle, id, cat, flags, headers, sensitiveHeaders) {
       originSet.delete(stream[kOrigin]);
     }
     debugStream(id, type, "emitting stream '%s' event", event);
-    process.nextTick(emit, stream, event, obj, flags, headers);
+    const reqAsync = stream[kRequestAsyncResource];
+    if (reqAsync)
+      reqAsync.runInAsyncScope(process.nextTick, null, emit, stream, event, obj, flags, headers);
+    else
+      process.nextTick(emit, stream, event, obj, flags, headers);
   }
   if (endOfStream) {
     stream.push(null);
@@ -1809,6 +1816,8 @@ class ClientHttp2Session extends Http2Session {
     stream[kSentHeaders] = headers;
     stream[kOrigin] = `${headers[HTTP2_HEADER_SCHEME]}://` +
                       `${getAuthority(headers)}`;
+    const reqAsync = new AsyncResource('PendingRequest');
+    stream[kRequestAsyncResource] = reqAsync;
 
     // Close the writable side of the stream if options.endStream is set.
     if (options.endStream)
@@ -1831,7 +1840,7 @@ class ClientHttp2Session extends Http2Session {
       }
     }
 
-    const onConnect = requestOnConnect.bind(stream, headersList, options);
+    const onConnect = reqAsync.bind(requestOnConnect.bind(stream, headersList, options));
     if (this.connecting) {
       if (this[kPendingRequestCalls] !== null) {
         this[kPendingRequestCalls].push(onConnect);
diff --git a/test/parallel/test-http2-async-local-storage.js b/test/parallel/test-http2-async-local-storage.js
new file mode 100644
index 00000000000000..699285221f847e
--- /dev/null
+++ b/test/parallel/test-http2-async-local-storage.js
@@ -0,0 +1,55 @@
+'use strict';
+
+const common = require('../common');
+if (!common.hasCrypto)
+  common.skip('missing crypto');
+const assert = require('assert');
+const http2 = require('http2');
+const async_hooks = require('async_hooks');
+
+const storage = new async_hooks.AsyncLocalStorage();
+
+const {
+  HTTP2_HEADER_CONTENT_TYPE,
+  HTTP2_HEADER_PATH,
+  HTTP2_HEADER_STATUS,
+} = http2.constants;
+
+const server = http2.createServer();
+server.on('stream', (stream) => {
+  stream.respond({
+    [HTTP2_HEADER_CONTENT_TYPE]: 'text/plain; charset=utf-8',
+    [HTTP2_HEADER_STATUS]: 200
+  });
+  stream.on('error', common.mustNotCall());
+  stream.end('data');
+});
+
+server.listen(0, async () => {
+  const client = storage.run({ id: 0 }, () => http2.connect(`http://localhost:${server.address().port}`));
+
+  async function doReq(id) {
+    const req = client.request({ [HTTP2_HEADER_PATH]: '/' });
+
+    req.on('response', common.mustCall((headers) => {
+      assert.strictEqual(headers[HTTP2_HEADER_STATUS], 200);
+      assert.strictEqual(id, storage.getStore().id);
+    }));
+    req.on('data', common.mustCall((data) => {
+      assert.strictEqual(data.toString(), 'data');
+      assert.strictEqual(id, storage.getStore().id);
+    }));
+    req.on('end', common.mustCall(() => {
+      assert.strictEqual(id, storage.getStore().id);
+      server.close();
+      client.close();
+    }));
+  }
+
+  function doReqWith(id) {
+    storage.run({ id }, () => doReq(id));
+  }
+
+  doReqWith(1);
+  doReqWith(2);
+});

From f17416ec3e0cd9e28503c2a80834e0418e3f9a72 Mon Sep 17 00:00:00 2001
From: theanarkh <theratliter@gmail.com>
Date: Fri, 1 Nov 2024 11:28:03 +0800
Subject: [PATCH 062/216] src: fix dns crash when failed to create NodeAresTask

PR-URL: https://github.com/nodejs/node/pull/55521
Fixes: https://github.com/nodejs/node/issues/52439
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 src/cares_wrap.cc | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/src/cares_wrap.cc b/src/cares_wrap.cc
index 6209956c405e04..deb3e19a6cde7c 100644
--- a/src/cares_wrap.cc
+++ b/src/cares_wrap.cc
@@ -144,14 +144,10 @@ void ares_sockstate_cb(void* data, ares_socket_t sock, int read, int write) {
                   ares_poll_cb);
 
   } else {
-    /* read == 0 and write == 0 this is c-ares's way of notifying us that */
-    /* the socket is now closed. We must free the data associated with */
-    /* socket. */
-    CHECK(task &&
-          "When an ares socket is closed we should have a handle for it");
-
-    channel->task_list()->erase(it);
-    channel->env()->CloseHandle(&task->poll_watcher, ares_poll_close_cb);
+    if (task != nullptr) {
+      channel->task_list()->erase(it);
+      channel->env()->CloseHandle(&task->poll_watcher, ares_poll_close_cb);
+    }
 
     if (channel->task_list()->empty()) {
       channel->CloseTimer();
@@ -682,7 +678,6 @@ GetNameInfoReqWrap::GetNameInfoReqWrap(
 void ChannelWrap::AresTimeout(uv_timer_t* handle) {
   ChannelWrap* channel = static_cast<ChannelWrap*>(handle->data);
   CHECK_EQ(channel->timer_handle(), handle);
-  CHECK_EQ(false, channel->task_list()->empty());
   ares_process_fd(channel->cares_channel(), ARES_SOCKET_BAD, ARES_SOCKET_BAD);
 }
 

From 4576d14d0f46a7b5bf7d22ad8db635cf2b503186 Mon Sep 17 00:00:00 2001
From: Gireesh Punathil <gpunathi@in.ibm.com>
Date: Fri, 1 Nov 2024 09:13:53 +0530
Subject: [PATCH 063/216] doc: improve c++ embedder API doc

Normalise the headers, fix up the bullet points, and
expand the `node::IsolateData` scope for clarity.

PR-URL: https://github.com/nodejs/node/pull/55597
Reviewed-By: Akhil Marsonya <akhil.marsonya27@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 doc/api/embedding.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/doc/api/embedding.md b/doc/api/embedding.md
index d4ae090c255f97..114f1128af0a42 100644
--- a/doc/api/embedding.md
+++ b/doc/api/embedding.md
@@ -23,7 +23,7 @@ a Node.js-specific environment.
 
 The full code can be found [in the Node.js source tree][embedtest.cc].
 
-### Setting up per-process state
+### Setting up a per-process state
 
 Node.js requires some per-process state management in order to run:
 
@@ -72,7 +72,7 @@ int main(int argc, char** argv) {
 }
 ```
 
-### Per-instance state
+### Setting up a per-instance state
 
 <!-- YAML
 changes:
@@ -86,11 +86,12 @@ Node.js has a concept of a “Node.js instance”, that is commonly being referr
 to as `node::Environment`. Each `node::Environment` is associated with:
 
 * Exactly one `v8::Isolate`, i.e. one JS Engine instance,
-* Exactly one `uv_loop_t`, i.e. one event loop, and
-* A number of `v8::Context`s, but exactly one main `v8::Context`.
+* Exactly one `uv_loop_t`, i.e. one event loop,
+* A number of `v8::Context`s, but exactly one main `v8::Context`, and
 * One `node::IsolateData` instance that contains information that could be
-  shared by multiple `node::Environment`s that use the same `v8::Isolate`.
-  Currently, no testing is performed for this scenario.
+  shared by multiple `node::Environment`s. The embedder should make sure
+  that `node::IsolateData` is shared only among `node::Environment`s that
+  use the same `v8::Isolate`; Node.js does not perform this check.
 
 In order to set up a `v8::Isolate`, an `v8::ArrayBuffer::Allocator` needs
 to be provided. One possible choice is the default Node.js allocator, which

From 6f47f53f9056b74e95993e332e4e9a57212b550c Mon Sep 17 00:00:00 2001
From: RafaelGSS <rafael.nunu@hotmail.com>
Date: Wed, 30 Oct 2024 15:57:46 -0300
Subject: [PATCH 064/216] src,lib: optimize nodeTiming.uvMetricsInfo
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55614
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Vinícius Lourenço Claro Cardoso <contact@viniciusl.com.br>
---
 lib/internal/perf/nodetiming.js |  9 ++++++++-
 src/node_perf.cc                | 26 +++++++++-----------------
 2 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/lib/internal/perf/nodetiming.js b/lib/internal/perf/nodetiming.js
index d19bc021263a7f..a9e0c3f252ce5e 100644
--- a/lib/internal/perf/nodetiming.js
+++ b/lib/internal/perf/nodetiming.js
@@ -128,7 +128,14 @@ class PerformanceNodeTiming {
         __proto__: null,
         enumerable: true,
         configurable: true,
-        get: uvMetricsInfo,
+        get: () => {
+          const metrics = uvMetricsInfo();
+          return {
+            loopCount: metrics[0],
+            events: metrics[1],
+            eventsWaiting: metrics[2],
+          };
+        },
       },
     });
   }
diff --git a/src/node_perf.cc b/src/node_perf.cc
index a22eaf1a6f23c5..81a4b4ffa636b6 100644
--- a/src/node_perf.cc
+++ b/src/node_perf.cc
@@ -14,6 +14,7 @@
 namespace node {
 namespace performance {
 
+using v8::Array;
 using v8::Context;
 using v8::DontDelete;
 using v8::Function;
@@ -263,26 +264,17 @@ void LoopIdleTime(const FunctionCallbackInfo<Value>& args) {
 
 void UvMetricsInfo(const FunctionCallbackInfo<Value>& args) {
   Environment* env = Environment::GetCurrent(args);
+  Isolate* isolate = env->isolate();
   uv_metrics_t metrics;
-
   // uv_metrics_info always return 0
   CHECK_EQ(uv_metrics_info(env->event_loop(), &metrics), 0);
-
-  Local<Object> obj = Object::New(env->isolate());
-  obj->Set(env->context(),
-           env->loop_count(),
-           Integer::NewFromUnsigned(env->isolate(), metrics.loop_count))
-      .Check();
-  obj->Set(env->context(),
-           env->events(),
-           Integer::NewFromUnsigned(env->isolate(), metrics.events))
-      .Check();
-  obj->Set(env->context(),
-           env->events_waiting(),
-           Integer::NewFromUnsigned(env->isolate(), metrics.events_waiting))
-      .Check();
-
-  args.GetReturnValue().Set(obj);
+  Local<Value> data[] = {
+      Integer::New(isolate, metrics.loop_count),
+      Integer::New(isolate, metrics.events),
+      Integer::New(isolate, metrics.events_waiting),
+  };
+  Local<Array> arr = Array::New(env->isolate(), data, arraysize(data));
+  args.GetReturnValue().Set(arr);
 }
 
 void CreateELDHistogram(const FunctionCallbackInfo<Value>& args) {

From 17abec436773410e142c2b615efd5a6c77eab2ef Mon Sep 17 00:00:00 2001
From: RafaelGSS <rafael.nunu@hotmail.com>
Date: Wed, 30 Oct 2024 22:58:06 -0300
Subject: [PATCH 065/216] benchmark: add nodeTiming.uvmetricsinfo bench
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55614
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Vinícius Lourenço Claro Cardoso <contact@viniciusl.com.br>
---
 .../perf_hooks/nodetiming-uvmetricsinfo.js    | 29 +++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 benchmark/perf_hooks/nodetiming-uvmetricsinfo.js

diff --git a/benchmark/perf_hooks/nodetiming-uvmetricsinfo.js b/benchmark/perf_hooks/nodetiming-uvmetricsinfo.js
new file mode 100644
index 00000000000000..1d8d174de14fbc
--- /dev/null
+++ b/benchmark/perf_hooks/nodetiming-uvmetricsinfo.js
@@ -0,0 +1,29 @@
+'use strict';
+
+const common = require('../common.js');
+const assert = require('node:assert');
+const fs = require('node:fs/promises');
+
+const {
+  performance,
+} = require('perf_hooks');
+
+const bench = common.createBenchmark(main, {
+  n: [1e6],
+  events: [1, 1000, 10000],
+});
+
+async function runEvents(events) {
+  for (let i = 0; i < events; ++i) {
+    assert.ok(await fs.statfs(__filename));
+  }
+}
+
+async function main({ n, events }) {
+  await runEvents(events);
+  bench.start();
+  for (let i = 0; i < n; i++) {
+    assert.ok(performance.nodeTiming.uvMetricsInfo);
+  }
+  bench.end(n);
+}

From 0c14cae2b27f3acaf2ad10c2a55b2989ef604071 Mon Sep 17 00:00:00 2001
From: Aviv Keller <redyetidev@gmail.com>
Date: Fri, 1 Nov 2024 14:32:48 -0400
Subject: [PATCH 066/216] meta: show PR/issue title on review-wanted
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55606
Refs: https://openjs-foundation.slack.com/archives/C019Y2T6STH/p1730308054959239?thread_ts=1730296053.898089&cid=C019Y2T6STH
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
---
 .github/workflows/notify-on-review-wanted.yml | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/notify-on-review-wanted.yml b/.github/workflows/notify-on-review-wanted.yml
index 471455e3bf6b28..6521318a2dce81 100644
--- a/.github/workflows/notify-on-review-wanted.yml
+++ b/.github/workflows/notify-on-review-wanted.yml
@@ -16,17 +16,20 @@ jobs:
     steps:
       - name: Determine PR or Issue
         id: define-message
+        env:
+          TITLE_ISSUE: ${{ github.event.issue.title }}
+          TITLE_PR: ${{ github.event.pull_request.title }}
         run: |
           if [[ -n "${{ github.event.pull_request.number }}" ]]; then
             number="${{ github.event.pull_request.number }}"
             link="https://github.com/${{ github.repository }}/pull/$number"
             echo "message=The PR (#$number) requires review from Node.js maintainers. See: $link" >> "$GITHUB_OUTPUT"
-            echo "title=${{ github.actor }} asks for attention on pull request #$number" >> "$GITHUB_OUTPUT"
+            echo "title=$TITLE_PR" >> "$GITHUB_OUTPUT"
           else
             number="${{ github.event.issue.number }}"
             link="https://github.com/${{ github.repository }}/issues/$number"
             echo "message=The issue (#$number) requires review from Node.js maintainers. See: $link" >> "$GITHUB_OUTPUT"
-            echo "title=${{ github.actor }} asks for attention on issue #$number" >> "$GITHUB_OUTPUT"
+            echo "title=$TITLE_ISSUE" >> "$GITHUB_OUTPUT"
           fi
 
       - name: Slack Notification

From 8cd619f8d72e2e339c3ad31799f6f46d3c0f56e7 Mon Sep 17 00:00:00 2001
From: Filip Skokan <panva.ip@gmail.com>
Date: Sat, 2 Nov 2024 12:36:25 +0000
Subject: [PATCH 067/216] doc: remove mention of ECDH-ES in
 crypto.diffieHellman
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55611
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
---
 doc/api/crypto.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/api/crypto.md b/doc/api/crypto.md
index bfd96bd1169f32..4086f8a81a9b07 100644
--- a/doc/api/crypto.md
+++ b/doc/api/crypto.md
@@ -3638,7 +3638,7 @@ added:
 
 Computes the Diffie-Hellman secret based on a `privateKey` and a `publicKey`.
 Both keys must have the same `asymmetricKeyType`, which must be one of `'dh'`
-(for Diffie-Hellman), `'ec'` (for ECDH), `'x448'`, or `'x25519'` (for ECDH-ES).
+(for Diffie-Hellman), `'ec'`, `'x448'`, or `'x25519'` (for ECDH).
 
 ### `crypto.hash(algorithm, data[, outputEncoding])`
 

From 9c2d0fd242dde941876be57d8b7ac8ec862ce841 Mon Sep 17 00:00:00 2001
From: Aviv Keller <redyetidev@gmail.com>
Date: Sat, 2 Nov 2024 11:15:25 -0400
Subject: [PATCH 068/216] meta: make review-wanted message minimal
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55607
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
---
 .github/workflows/notify-on-review-wanted.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/workflows/notify-on-review-wanted.yml b/.github/workflows/notify-on-review-wanted.yml
index 6521318a2dce81..1f076a00027766 100644
--- a/.github/workflows/notify-on-review-wanted.yml
+++ b/.github/workflows/notify-on-review-wanted.yml
@@ -35,6 +35,7 @@ jobs:
       - name: Slack Notification
         uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907  # 2.3.0
         env:
+          MSG_MINIMAL: actions url
           SLACK_COLOR: '#3d85c6'
           SLACK_ICON: https://github.com/nodejs.png?size=48
           SLACK_TITLE: ${{ steps.define-message.outputs.title }}

From 1bb461e2b63cfea0f1dda651c0ba85174242ab1c Mon Sep 17 00:00:00 2001
From: robberfree <robberfree@outlook.com>
Date: Sun, 3 Nov 2024 00:55:53 +0800
Subject: [PATCH 069/216] doc: add write flag when open file as the demo code's
 intention

PR-URL: https://github.com/nodejs/node/pull/54626
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
---
 doc/api/stream.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/api/stream.md b/doc/api/stream.md
index f566558c97a0d2..e5116b42650e41 100644
--- a/doc/api/stream.md
+++ b/doc/api/stream.md
@@ -3715,7 +3715,7 @@ class WriteStream extends Writable {
     this.fd = null;
   }
   _construct(callback) {
-    fs.open(this.filename, (err, fd) => {
+    fs.open(this.filename, 'w', (err, fd) => {
       if (err) {
         callback(err);
       } else {

From cdb7839a0cfed324db9691990b928ee96684231a Mon Sep 17 00:00:00 2001
From: Filip Skokan <panva.ip@gmail.com>
Date: Sat, 2 Nov 2024 20:24:19 +0000
Subject: [PATCH 070/216] tools: run daily WPT.fyi report on all supported
 releases

PR-URL: https://github.com/nodejs/node/pull/55619
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
---
 .github/workflows/daily-wpt-fyi.yml | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/.github/workflows/daily-wpt-fyi.yml b/.github/workflows/daily-wpt-fyi.yml
index 0cea4b01619b30..35f326365f1865 100644
--- a/.github/workflows/daily-wpt-fyi.yml
+++ b/.github/workflows/daily-wpt-fyi.yml
@@ -6,11 +6,6 @@ name: Daily WPT report
 
 on:
   workflow_dispatch:
-    inputs:
-      node-versions:
-        description: Node.js versions (as supported by actions/setup-node) to test as JSON array
-        required: false
-        default: '["current", "lts/*", "lts/-1"]'
   schedule:
     # This is 20 minutes after `epochs/daily` branch is triggered to be created
     # in WPT repo.
@@ -24,11 +19,22 @@ permissions:
   contents: read
 
 jobs:
-  report:
+  collect-versions:
     if: github.repository == 'nodejs/node' || github.event_name == 'workflow_dispatch'
+    runs-on: ubuntu-latest
+    outputs:
+      matrix: ${{ steps.query.outputs.matrix }}
+    steps:
+      - id: query
+        run: |
+          matrix=$(curl -s https://raw.githubusercontent.com/nodejs/Release/refs/heads/main/schedule.json | jq --arg now "$(date +%Y-%m-%d)" '[with_entries(select(.value.end > $now and .value.start < $now)) | keys[] | ltrimstr("v") | tonumber] + ["latest-nightly"]')
+          echo "matrix=$matrix" >> "$GITHUB_OUTPUT"
+  report:
+    needs:
+      - collect-versions
     strategy:
       matrix:
-        node-version: ${{ fromJSON(github.event.inputs.node-versions || '["latest-nightly", "current", "lts/*", "lts/-1"]') }}
+        node-version: ${{ fromJSON(needs.collect-versions.outputs.matrix) }}
       fail-fast: false
     runs-on: ubuntu-latest
     steps:

From d336f8de158767c128a8bfed4b8fc7544348208c Mon Sep 17 00:00:00 2001
From: Filip Skokan <panva.ip@gmail.com>
Date: Sat, 2 Nov 2024 20:42:06 +0000
Subject: [PATCH 071/216] tools: compact jq output in daily-wpt-fyi.yml action
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55695
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
---
 .github/workflows/daily-wpt-fyi.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/daily-wpt-fyi.yml b/.github/workflows/daily-wpt-fyi.yml
index 35f326365f1865..a148f21428170c 100644
--- a/.github/workflows/daily-wpt-fyi.yml
+++ b/.github/workflows/daily-wpt-fyi.yml
@@ -27,7 +27,7 @@ jobs:
     steps:
       - id: query
         run: |
-          matrix=$(curl -s https://raw.githubusercontent.com/nodejs/Release/refs/heads/main/schedule.json | jq --arg now "$(date +%Y-%m-%d)" '[with_entries(select(.value.end > $now and .value.start < $now)) | keys[] | ltrimstr("v") | tonumber] + ["latest-nightly"]')
+          matrix=$(curl -s https://raw.githubusercontent.com/nodejs/Release/refs/heads/main/schedule.json | jq -c --arg now "$(date +%Y-%m-%d)" '[with_entries(select(.value.end > $now and .value.start < $now)) | keys[] | ltrimstr("v") | tonumber] + ["latest-nightly"]')
           echo "matrix=$matrix" >> "$GITHUB_OUTPUT"
   report:
     needs:

From ccc1ea057686ea3aa01ef5682de041573966168b Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun, 3 Nov 2024 19:03:18 +0000
Subject: [PATCH 072/216] meta: bump github/codeql-action from 3.26.10 to
 3.27.0

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.10 to 3.27.0.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/e2b3eafc8d227b0241d48be5f425d47c2d750a13...662472033e021d55d94146f66f6058822b0b39fd)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/55682
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 .github/workflows/scorecard.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/scorecard.yml b/.github/workflows/scorecard.yml
index c23dae03fc036c..2ce8b534de03c9 100644
--- a/.github/workflows/scorecard.yml
+++ b/.github/workflows/scorecard.yml
@@ -73,6 +73,6 @@ jobs:
 
       # Upload the results to GitHub's code scanning dashboard.
       - name: Upload to code-scanning
-        uses: github/codeql-action/upload-sarif@e2b3eafc8d227b0241d48be5f425d47c2d750a13  # v3.26.10
+        uses: github/codeql-action/upload-sarif@662472033e021d55d94146f66f6058822b0b39fd  # v3.27.0
         with:
           sarif_file: results.sarif

From c33de63a86dac4a03e4fd273707ff27e5ab8fbf0 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun, 3 Nov 2024 19:03:33 +0000
Subject: [PATCH 073/216] meta: bump actions/checkout from 4.2.0 to 4.2.2

Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.0 to 4.2.2.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/d632683dd7b4114ad314bca15554477dd762a938...11bd71901bbe5b1630ceea73d27597364c9af683)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/55683
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/workflows/auto-start-ci.yml           |  2 +-
 .github/workflows/build-tarball.yml           |  4 ++--
 .github/workflows/commit-lint.yml             |  2 +-
 .github/workflows/commit-queue.yml            |  2 +-
 .../workflows/coverage-linux-without-intl.yml |  2 +-
 .github/workflows/coverage-linux.yml          |  2 +-
 .github/workflows/coverage-windows.yml        |  2 +-
 .github/workflows/daily-wpt-fyi.yml           |  6 +++---
 .github/workflows/daily.yml                   |  2 +-
 .github/workflows/doc.yml                     |  2 +-
 .../workflows/find-inactive-collaborators.yml |  2 +-
 .github/workflows/find-inactive-tsc.yml       |  4 ++--
 .github/workflows/license-builder.yml         |  2 +-
 .github/workflows/linters.yml                 | 20 +++++++++----------
 .github/workflows/notify-on-push.yml          |  2 +-
 .github/workflows/scorecard.yml               |  2 +-
 .github/workflows/test-asan.yml               |  2 +-
 .github/workflows/test-internet.yml           |  2 +-
 .github/workflows/test-linux.yml              |  2 +-
 .github/workflows/test-macos.yml              |  2 +-
 .github/workflows/test-ubsan.yml              |  2 +-
 .github/workflows/timezone-update.yml         |  4 ++--
 .github/workflows/tools.yml                   |  2 +-
 .github/workflows/update-openssl.yml          |  2 +-
 .github/workflows/update-v8.yml               |  2 +-
 25 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/.github/workflows/auto-start-ci.yml b/.github/workflows/auto-start-ci.yml
index f22a13b1abf188..d2a6c3ff2919a4 100644
--- a/.github/workflows/auto-start-ci.yml
+++ b/.github/workflows/auto-start-ci.yml
@@ -45,7 +45,7 @@ jobs:
     if: needs.get-prs-for-ci.outputs.numbers != ''
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
 
diff --git a/.github/workflows/build-tarball.yml b/.github/workflows/build-tarball.yml
index 094f06043afe66..0988a11108ae5c 100644
--- a/.github/workflows/build-tarball.yml
+++ b/.github/workflows/build-tarball.yml
@@ -42,7 +42,7 @@ jobs:
     if: github.event.pull_request.draft == false
     runs-on: ubuntu-24.04
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
@@ -72,7 +72,7 @@ jobs:
     needs: build-tarball
     runs-on: ubuntu-24.04
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
diff --git a/.github/workflows/commit-lint.yml b/.github/workflows/commit-lint.yml
index 0b39ccc287340f..cdada624420302 100644
--- a/.github/workflows/commit-lint.yml
+++ b/.github/workflows/commit-lint.yml
@@ -17,7 +17,7 @@ jobs:
         run: |
           echo "plusOne=$((${{ github.event.pull_request.commits }} + 1))" >> $GITHUB_OUTPUT
           echo "minusOne=$((${{ github.event.pull_request.commits }} - 1))" >> $GITHUB_OUTPUT
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           fetch-depth: ${{ steps.nb-of-commits.outputs.plusOne }}
           persist-credentials: false
diff --git a/.github/workflows/commit-queue.yml b/.github/workflows/commit-queue.yml
index 8fc0a4a057fd7c..f72211d345e664 100644
--- a/.github/workflows/commit-queue.yml
+++ b/.github/workflows/commit-queue.yml
@@ -58,7 +58,7 @@ jobs:
     if: needs.get_mergeable_prs.outputs.numbers != ''
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           # Needs the whole git history for ncu to work
           # See https://github.com/nodejs/node-core-utils/pull/486
diff --git a/.github/workflows/coverage-linux-without-intl.yml b/.github/workflows/coverage-linux-without-intl.yml
index 744e9b30b89ab7..919995b76fbcca 100644
--- a/.github/workflows/coverage-linux-without-intl.yml
+++ b/.github/workflows/coverage-linux-without-intl.yml
@@ -48,7 +48,7 @@ jobs:
     if: github.event.pull_request.draft == false
     runs-on: ubuntu-24.04
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
diff --git a/.github/workflows/coverage-linux.yml b/.github/workflows/coverage-linux.yml
index 68f98937e73ba5..66c5bc1dcde878 100644
--- a/.github/workflows/coverage-linux.yml
+++ b/.github/workflows/coverage-linux.yml
@@ -48,7 +48,7 @@ jobs:
     if: github.event.pull_request.draft == false
     runs-on: ubuntu-24.04
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
diff --git a/.github/workflows/coverage-windows.yml b/.github/workflows/coverage-windows.yml
index ced6ff661c297a..1224b91ef2dccd 100644
--- a/.github/workflows/coverage-windows.yml
+++ b/.github/workflows/coverage-windows.yml
@@ -45,7 +45,7 @@ jobs:
     if: github.event.pull_request.draft == false
     runs-on: windows-2022
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
diff --git a/.github/workflows/daily-wpt-fyi.yml b/.github/workflows/daily-wpt-fyi.yml
index a148f21428170c..ff8b4a25d8d62f 100644
--- a/.github/workflows/daily-wpt-fyi.yml
+++ b/.github/workflows/daily-wpt-fyi.yml
@@ -63,7 +63,7 @@ jobs:
           SHORT_SHA=$(node -p 'process.version.split(/-nightly\d{8}/)[1]')
           echo "NIGHTLY_REF=$(gh api /repos/nodejs/node/commits/$SHORT_SHA --jq '.sha')" >> $GITHUB_ENV
       - name: Checkout ${{ steps.setup-node.outputs.node-version }}
-        uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
           ref: ${{ env.NIGHTLY_REF || steps.setup-node.outputs.node-version }}
@@ -79,7 +79,7 @@ jobs:
         run: rm -rf wpt
         working-directory: test/fixtures
       - name: Checkout epochs/daily WPT
-        uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           repository: web-platform-tests/wpt
           persist-credentials: false
@@ -104,7 +104,7 @@ jobs:
         run: rm -rf deps/undici
       - name: Checkout undici
         if: ${{ env.WPT_REPORT != '' }}
-        uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           repository: nodejs/undici
           persist-credentials: false
diff --git a/.github/workflows/daily.yml b/.github/workflows/daily.yml
index a94a3a52677165..619ba34b117003 100644
--- a/.github/workflows/daily.yml
+++ b/.github/workflows/daily.yml
@@ -15,7 +15,7 @@ jobs:
   build-lto:
     runs-on: ubuntu-24.04
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Use Node.js ${{ env.NODE_VERSION }}
diff --git a/.github/workflows/doc.yml b/.github/workflows/doc.yml
index 00adb5efc1f849..57e61747d4fdc9 100644
--- a/.github/workflows/doc.yml
+++ b/.github/workflows/doc.yml
@@ -24,7 +24,7 @@ jobs:
     if: github.event.pull_request.draft == false
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Use Node.js ${{ env.NODE_VERSION }}
diff --git a/.github/workflows/find-inactive-collaborators.yml b/.github/workflows/find-inactive-collaborators.yml
index 813b6c5ff3dcbd..987797b924b0c5 100644
--- a/.github/workflows/find-inactive-collaborators.yml
+++ b/.github/workflows/find-inactive-collaborators.yml
@@ -19,7 +19,7 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           fetch-depth: 0
           persist-credentials: false
diff --git a/.github/workflows/find-inactive-tsc.yml b/.github/workflows/find-inactive-tsc.yml
index ea35029a74c553..9a5d1fae481b5d 100644
--- a/.github/workflows/find-inactive-tsc.yml
+++ b/.github/workflows/find-inactive-tsc.yml
@@ -20,13 +20,13 @@ jobs:
 
     steps:
       - name: Checkout the repo
-        uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           fetch-depth: 0
           persist-credentials: false
 
       - name: Clone nodejs/TSC repository
-        uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           fetch-depth: 0
           path: .tmp
diff --git a/.github/workflows/license-builder.yml b/.github/workflows/license-builder.yml
index 6068e41b80e2f4..c62e9b1f08fe54 100644
--- a/.github/workflows/license-builder.yml
+++ b/.github/workflows/license-builder.yml
@@ -17,7 +17,7 @@ jobs:
     if: github.repository == 'nodejs/node'
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - run: ./tools/license-builder.sh  # Run the license builder tool
diff --git a/.github/workflows/linters.yml b/.github/workflows/linters.yml
index 4340554d9772d9..2fd02329d16c18 100644
--- a/.github/workflows/linters.yml
+++ b/.github/workflows/linters.yml
@@ -25,7 +25,7 @@ jobs:
     if: github.event.pull_request.draft == false
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Use Node.js ${{ env.NODE_VERSION }}
@@ -40,7 +40,7 @@ jobs:
     if: github.event.pull_request.draft == false
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
@@ -55,7 +55,7 @@ jobs:
     if: ${{ github.event.pull_request && github.event.pull_request.draft == false && github.base_ref == github.event.repository.default_branch }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           fetch-depth: 0
           persist-credentials: false
@@ -93,7 +93,7 @@ jobs:
     if: github.event.pull_request.draft == false
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Use Node.js ${{ env.NODE_VERSION }}
@@ -118,7 +118,7 @@ jobs:
     if: github.event.pull_request.draft == false
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
@@ -135,7 +135,7 @@ jobs:
     if: github.event.pull_request.draft == false
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Use Python ${{ env.PYTHON_VERSION }}
@@ -153,7 +153,7 @@ jobs:
     if: github.event.pull_request.draft == false
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - run: shellcheck -V
@@ -163,7 +163,7 @@ jobs:
     if: github.event.pull_request.draft == false
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - uses: mszostok/codeowners-validator@7f3f5e28c6d7b8dfae5731e54ce2272ca384592f
@@ -173,7 +173,7 @@ jobs:
     if: ${{ github.event.pull_request }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           fetch-depth: 2
           persist-credentials: false
@@ -182,7 +182,7 @@ jobs:
   lint-readme:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - run: tools/lint-readme-lists.mjs
diff --git a/.github/workflows/notify-on-push.yml b/.github/workflows/notify-on-push.yml
index 1e3d618f2dc6df..3f100b68976383 100644
--- a/.github/workflows/notify-on-push.yml
+++ b/.github/workflows/notify-on-push.yml
@@ -34,7 +34,7 @@ jobs:
     permissions:
       pull-requests: write
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Check commit message
diff --git a/.github/workflows/scorecard.yml b/.github/workflows/scorecard.yml
index 2ce8b534de03c9..c5bc60632cbc71 100644
--- a/.github/workflows/scorecard.yml
+++ b/.github/workflows/scorecard.yml
@@ -38,7 +38,7 @@ jobs:
           egress-policy: audit  # TODO: change to 'egress-policy: block' after couple of runs
 
       - name: Checkout code
-        uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
 
diff --git a/.github/workflows/test-asan.yml b/.github/workflows/test-asan.yml
index 2965008cca1e27..ca70add4d3d55c 100644
--- a/.github/workflows/test-asan.yml
+++ b/.github/workflows/test-asan.yml
@@ -47,7 +47,7 @@ jobs:
       CONFIG_FLAGS: --enable-asan
       SCCACHE_GHA_ENABLED: 'true'
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
diff --git a/.github/workflows/test-internet.yml b/.github/workflows/test-internet.yml
index b7eba9d5015814..8a0280780e4e12 100644
--- a/.github/workflows/test-internet.yml
+++ b/.github/workflows/test-internet.yml
@@ -44,7 +44,7 @@ jobs:
     if: github.repository == 'nodejs/node' || github.event_name != 'schedule'
     runs-on: ubuntu-24.04
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
diff --git a/.github/workflows/test-linux.yml b/.github/workflows/test-linux.yml
index bc63a2e1edd588..19d37966a16181 100644
--- a/.github/workflows/test-linux.yml
+++ b/.github/workflows/test-linux.yml
@@ -37,7 +37,7 @@ jobs:
     if: github.event.pull_request.draft == false
     runs-on: ubuntu-24.04
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
diff --git a/.github/workflows/test-macos.yml b/.github/workflows/test-macos.yml
index 74e9f45a0fe51c..ce4abdf54ca4fe 100644
--- a/.github/workflows/test-macos.yml
+++ b/.github/workflows/test-macos.yml
@@ -44,7 +44,7 @@ jobs:
       CXX: sccache g++
       SCCACHE_GHA_ENABLED: 'true'
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
diff --git a/.github/workflows/test-ubsan.yml b/.github/workflows/test-ubsan.yml
index 9ee7a53229fd8d..f8fb51d15189ff 100644
--- a/.github/workflows/test-ubsan.yml
+++ b/.github/workflows/test-ubsan.yml
@@ -45,7 +45,7 @@ jobs:
       LINK: sccache g++
       CONFIG_FLAGS: --enable-ubsan
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Store suppressions path
diff --git a/.github/workflows/timezone-update.yml b/.github/workflows/timezone-update.yml
index cf2b53fe62c264..e951b848ad4155 100644
--- a/.github/workflows/timezone-update.yml
+++ b/.github/workflows/timezone-update.yml
@@ -20,12 +20,12 @@ jobs:
 
     steps:
       - name: Checkout nodejs/node
-        uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
 
       - name: Checkout unicode-org/icu-data
-        uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           path: icu-data
           persist-credentials: false
diff --git a/.github/workflows/tools.yml b/.github/workflows/tools.yml
index 7f0e79da288667..ab77431e678e21 100644
--- a/.github/workflows/tools.yml
+++ b/.github/workflows/tools.yml
@@ -297,7 +297,7 @@ jobs:
               tail -n1 temp-output | grep "NEW_VERSION=" >> "$GITHUB_ENV" || true
               rm temp-output
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         if: github.event_name == 'schedule' || inputs.id == 'all' || inputs.id == matrix.id
         with:
           persist-credentials: false
diff --git a/.github/workflows/update-openssl.yml b/.github/workflows/update-openssl.yml
index b31b9365c8983e..37f25011eaa3f9 100644
--- a/.github/workflows/update-openssl.yml
+++ b/.github/workflows/update-openssl.yml
@@ -14,7 +14,7 @@ jobs:
     if: github.repository == 'nodejs/node'
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Check and download new OpenSSL version
diff --git a/.github/workflows/update-v8.yml b/.github/workflows/update-v8.yml
index e00add5765d79a..248795e8d624bb 100644
--- a/.github/workflows/update-v8.yml
+++ b/.github/workflows/update-v8.yml
@@ -16,7 +16,7 @@ jobs:
     if: github.repository == 'nodejs/node'
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938  # v4.2.0
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
       - name: Cache node modules and update-v8

From 3d06971a15fe0c0338a6de21a9acbff6e33e487c Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun, 3 Nov 2024 19:04:01 +0000
Subject: [PATCH 074/216] meta: bump actions/cache from 4.0.2 to 4.1.2

Bumps [actions/cache](https://github.com/actions/cache) from 4.0.2 to 4.1.2.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/0c45773b623bea8c8e75f6c82b208c3cf94ea4f9...6849a6489940f00c2f30c0fb92c6274307ccb58a)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/55684
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/workflows/update-v8.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/update-v8.yml b/.github/workflows/update-v8.yml
index 248795e8d624bb..48324dd0c464a7 100644
--- a/.github/workflows/update-v8.yml
+++ b/.github/workflows/update-v8.yml
@@ -20,7 +20,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Cache node modules and update-v8
-        uses: actions/cache@0c45773b623bea8c8e75f6c82b208c3cf94ea4f9  # v4.0.2
+        uses: actions/cache@6849a6489940f00c2f30c0fb92c6274307ccb58a  # v4.1.2
         id: cache-v8-npm
         env:
           cache-name: cache-v8-npm

From 024c5b2ab3cda2df91aba659308016aff37987f7 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun, 3 Nov 2024 19:04:57 +0000
Subject: [PATCH 075/216] meta: bump actions/upload-artifact from 4.4.0 to
 4.4.3

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.4.0 to 4.4.3.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/50769540e7f4bd5e21e526ee35c689e35e0d6874...b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/55685
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/workflows/build-tarball.yml | 2 +-
 .github/workflows/daily-wpt-fyi.yml | 2 +-
 .github/workflows/doc.yml           | 2 +-
 .github/workflows/scorecard.yml     | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/build-tarball.yml b/.github/workflows/build-tarball.yml
index 0988a11108ae5c..05524c9ced1745 100644
--- a/.github/workflows/build-tarball.yml
+++ b/.github/workflows/build-tarball.yml
@@ -64,7 +64,7 @@ jobs:
           mkdir tarballs
           mv *.tar.gz tarballs
       - name: Upload tarball artifact
-        uses: actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874  # v4.4.0
+        uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882  # v4.4.3
         with:
           name: tarballs
           path: tarballs
diff --git a/.github/workflows/daily-wpt-fyi.yml b/.github/workflows/daily-wpt-fyi.yml
index ff8b4a25d8d62f..6272248406ed2a 100644
--- a/.github/workflows/daily-wpt-fyi.yml
+++ b/.github/workflows/daily-wpt-fyi.yml
@@ -127,7 +127,7 @@ jobs:
         run: cp wptreport.json wptreport-${{ steps.setup-node.outputs.node-version }}.json
       - name: Upload GitHub Actions artifact
         if: ${{ env.WPT_REPORT != '' }}
-        uses: actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874  # v4.4.0
+        uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882  # v4.4.3
         with:
           path: out/wpt/wptreport-*.json
           name: WPT Report for ${{ steps.setup-node.outputs.node-version }}
diff --git a/.github/workflows/doc.yml b/.github/workflows/doc.yml
index 57e61747d4fdc9..5e439ff57c62f5 100644
--- a/.github/workflows/doc.yml
+++ b/.github/workflows/doc.yml
@@ -35,7 +35,7 @@ jobs:
         run: npx envinfo
       - name: Build
         run: NODE=$(command -v node) make doc-only
-      - uses: actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874  # v4.4.0
+      - uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882  # v4.4.3
         with:
           name: docs
           path: out/doc
diff --git a/.github/workflows/scorecard.yml b/.github/workflows/scorecard.yml
index c5bc60632cbc71..6fbf46e7f3e22f 100644
--- a/.github/workflows/scorecard.yml
+++ b/.github/workflows/scorecard.yml
@@ -65,7 +65,7 @@ jobs:
       # Upload the results as artifacts (optional). Commenting out will disable uploads of run results in SARIF
       # format to the repository Actions tab.
       - name: Upload artifact
-        uses: actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874  # v4.4.0
+        uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882  # v4.4.3
         with:
           name: SARIF file
           path: results.sarif

From 8b4f2e0c6a4b54aab2885df2ccb8650fd487bf83 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun, 3 Nov 2024 19:05:54 +0000
Subject: [PATCH 076/216] meta: bump rtCamp/action-slack-notify from 2.3.0 to
 2.3.2

Bumps [rtCamp/action-slack-notify](https://github.com/rtcamp/action-slack-notify) from 2.3.0 to 2.3.2.
- [Release notes](https://github.com/rtcamp/action-slack-notify/releases)
- [Commits](https://github.com/rtcamp/action-slack-notify/compare/4e5fb42d249be6a45a298f3c9543b111b02f7907...c33737706dea87cd7784c687dadc9adf1be59990)

---
updated-dependencies:
- dependency-name: rtCamp/action-slack-notify
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/55686
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 .github/workflows/notify-on-push.yml          | 4 ++--
 .github/workflows/notify-on-review-wanted.yml | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/notify-on-push.yml b/.github/workflows/notify-on-push.yml
index 3f100b68976383..14b184deb515c2 100644
--- a/.github/workflows/notify-on-push.yml
+++ b/.github/workflows/notify-on-push.yml
@@ -14,7 +14,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Slack Notification
-        uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907  # 2.3.0
+        uses: rtCamp/action-slack-notify@c33737706dea87cd7784c687dadc9adf1be59990  # 2.3.2
         env:
           SLACK_COLOR: '#DE512A'
           SLACK_ICON: https://github.com/nodejs.png?size=48
@@ -56,7 +56,7 @@ jobs:
           GH_TOKEN: ${{ github.token }}
       - name: Slack Notification
         if: ${{ env.INVALID_COMMIT_MESSAGE }}
-        uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907  # 2.3.0
+        uses: rtCamp/action-slack-notify@c33737706dea87cd7784c687dadc9adf1be59990  # 2.3.2
         env:
           SLACK_COLOR: '#DE512A'
           SLACK_ICON: https://github.com/nodejs.png?size=48
diff --git a/.github/workflows/notify-on-review-wanted.yml b/.github/workflows/notify-on-review-wanted.yml
index 1f076a00027766..b4e3490f31f3d6 100644
--- a/.github/workflows/notify-on-review-wanted.yml
+++ b/.github/workflows/notify-on-review-wanted.yml
@@ -33,7 +33,7 @@ jobs:
           fi
 
       - name: Slack Notification
-        uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907  # 2.3.0
+        uses: rtCamp/action-slack-notify@c33737706dea87cd7784c687dadc9adf1be59990  # 2.3.2
         env:
           MSG_MINIMAL: actions url
           SLACK_COLOR: '#3d85c6'

From 7a46ffd18a48bcd44aea33fb91a94f75547bc116 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun, 3 Nov 2024 19:06:43 +0000
Subject: [PATCH 077/216] meta: bump actions/setup-node from 4.0.4 to 4.1.0

Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4.0.4 to 4.1.0.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](https://github.com/actions/setup-node/compare/0a44ba7841725637a19e28fa30b79a866c81b0a6...39370e3970a6d050c480ffad4ff0ed4d3fdee5af)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/55687
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/workflows/auto-start-ci.yml               | 2 +-
 .github/workflows/commit-lint.yml                 | 2 +-
 .github/workflows/commit-queue.yml                | 2 +-
 .github/workflows/daily-wpt-fyi.yml               | 2 +-
 .github/workflows/daily.yml                       | 2 +-
 .github/workflows/doc.yml                         | 2 +-
 .github/workflows/find-inactive-collaborators.yml | 2 +-
 .github/workflows/find-inactive-tsc.yml           | 2 +-
 .github/workflows/linters.yml                     | 6 +++---
 .github/workflows/update-v8.yml                   | 2 +-
 10 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/.github/workflows/auto-start-ci.yml b/.github/workflows/auto-start-ci.yml
index d2a6c3ff2919a4..3703a28abbc231 100644
--- a/.github/workflows/auto-start-ci.yml
+++ b/.github/workflows/auto-start-ci.yml
@@ -50,7 +50,7 @@ jobs:
           persist-credentials: false
 
       - name: Install Node.js
-        uses: actions/setup-node@0a44ba7841725637a19e28fa30b79a866c81b0a6  # v4.0.4
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NODE_VERSION }}
 
diff --git a/.github/workflows/commit-lint.yml b/.github/workflows/commit-lint.yml
index cdada624420302..1eb5622358ed7d 100644
--- a/.github/workflows/commit-lint.yml
+++ b/.github/workflows/commit-lint.yml
@@ -23,7 +23,7 @@ jobs:
           persist-credentials: false
       - run: git reset HEAD^2
       - name: Install Node.js
-        uses: actions/setup-node@0a44ba7841725637a19e28fa30b79a866c81b0a6  # v4.0.4
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NODE_VERSION }}
       - name: Validate commit message
diff --git a/.github/workflows/commit-queue.yml b/.github/workflows/commit-queue.yml
index f72211d345e664..3417ed62a53b6b 100644
--- a/.github/workflows/commit-queue.yml
+++ b/.github/workflows/commit-queue.yml
@@ -71,7 +71,7 @@ jobs:
 
       # Install dependencies
       - name: Install Node.js
-        uses: actions/setup-node@0a44ba7841725637a19e28fa30b79a866c81b0a6  # v4.0.4
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NODE_VERSION }}
       - name: Install @node-core/utils
diff --git a/.github/workflows/daily-wpt-fyi.yml b/.github/workflows/daily-wpt-fyi.yml
index 6272248406ed2a..cb57f02c02e1b8 100644
--- a/.github/workflows/daily-wpt-fyi.yml
+++ b/.github/workflows/daily-wpt-fyi.yml
@@ -51,7 +51,7 @@ jobs:
         run: echo "NIGHTLY=$(curl -s https://nodejs.org/download/nightly/index.json | jq -r '[.[] | select(.files[] | contains("linux-x64"))][0].version')" >> $GITHUB_ENV
       - name: Install Node.js
         id: setup-node
-        uses: actions/setup-node@0a44ba7841725637a19e28fa30b79a866c81b0a6  # v4.0.4
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NIGHTLY || matrix.node-version }}
           check-latest: true
diff --git a/.github/workflows/daily.yml b/.github/workflows/daily.yml
index 619ba34b117003..e929e8168b0e66 100644
--- a/.github/workflows/daily.yml
+++ b/.github/workflows/daily.yml
@@ -19,7 +19,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Use Node.js ${{ env.NODE_VERSION }}
-        uses: actions/setup-node@0a44ba7841725637a19e28fa30b79a866c81b0a6  # v4.0.4
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NODE_VERSION }}
       - name: Environment Information
diff --git a/.github/workflows/doc.yml b/.github/workflows/doc.yml
index 5e439ff57c62f5..6bf6254abf3555 100644
--- a/.github/workflows/doc.yml
+++ b/.github/workflows/doc.yml
@@ -28,7 +28,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Use Node.js ${{ env.NODE_VERSION }}
-        uses: actions/setup-node@0a44ba7841725637a19e28fa30b79a866c81b0a6  # v4.0.4
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NODE_VERSION }}
       - name: Environment Information
diff --git a/.github/workflows/find-inactive-collaborators.yml b/.github/workflows/find-inactive-collaborators.yml
index 987797b924b0c5..30ae63ee115be8 100644
--- a/.github/workflows/find-inactive-collaborators.yml
+++ b/.github/workflows/find-inactive-collaborators.yml
@@ -25,7 +25,7 @@ jobs:
           persist-credentials: false
 
       - name: Use Node.js ${{ env.NODE_VERSION }}
-        uses: actions/setup-node@0a44ba7841725637a19e28fa30b79a866c81b0a6  # v4.0.4
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NODE_VERSION }}
 
diff --git a/.github/workflows/find-inactive-tsc.yml b/.github/workflows/find-inactive-tsc.yml
index 9a5d1fae481b5d..85c16ad0648fca 100644
--- a/.github/workflows/find-inactive-tsc.yml
+++ b/.github/workflows/find-inactive-tsc.yml
@@ -34,7 +34,7 @@ jobs:
           repository: nodejs/TSC
 
       - name: Use Node.js ${{ env.NODE_VERSION }}
-        uses: actions/setup-node@0a44ba7841725637a19e28fa30b79a866c81b0a6  # v4.0.4
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NODE_VERSION }}
 
diff --git a/.github/workflows/linters.yml b/.github/workflows/linters.yml
index 2fd02329d16c18..1e190d93a4f987 100644
--- a/.github/workflows/linters.yml
+++ b/.github/workflows/linters.yml
@@ -29,7 +29,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Use Node.js ${{ env.NODE_VERSION }}
-        uses: actions/setup-node@0a44ba7841725637a19e28fa30b79a866c81b0a6  # v4.0.4
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NODE_VERSION }}
       - name: Environment Information
@@ -60,7 +60,7 @@ jobs:
           fetch-depth: 0
           persist-credentials: false
       - name: Use Node.js ${{ env.NODE_VERSION }}
-        uses: actions/setup-node@0a44ba7841725637a19e28fa30b79a866c81b0a6  # v4.0.4
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NODE_VERSION }}
       - name: Set up Python ${{ env.PYTHON_VERSION }}
@@ -97,7 +97,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Use Node.js ${{ env.NODE_VERSION }}
-        uses: actions/setup-node@0a44ba7841725637a19e28fa30b79a866c81b0a6  # v4.0.4
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NODE_VERSION }}
       - name: Environment Information
diff --git a/.github/workflows/update-v8.yml b/.github/workflows/update-v8.yml
index 48324dd0c464a7..2d28ae5d602099 100644
--- a/.github/workflows/update-v8.yml
+++ b/.github/workflows/update-v8.yml
@@ -30,7 +30,7 @@ jobs:
             ~/.npm
           key: ${{ runner.os }}-build-${{ env.cache-name }}
       - name: Install Node.js
-        uses: actions/setup-node@0a44ba7841725637a19e28fa30b79a866c81b0a6  # v4.0.4
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NODE_VERSION }}
       - name: Install @node-core/utils

From 070aa9d6a52550828ae1142c266a6a03bfa1b3c6 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun, 3 Nov 2024 19:07:53 +0000
Subject: [PATCH 078/216] meta: bump actions/setup-python from 5.2.0 to 5.3.0

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.2.0 to 5.3.0.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/f677139bbe7f9c59b41e40162b753c062f5d49a3...0b93645e9fea7318ecaed2b359559ac225c90a2b)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/55688
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/workflows/build-tarball.yml               | 4 ++--
 .github/workflows/coverage-linux-without-intl.yml | 2 +-
 .github/workflows/coverage-linux.yml              | 2 +-
 .github/workflows/coverage-windows.yml            | 2 +-
 .github/workflows/daily-wpt-fyi.yml               | 2 +-
 .github/workflows/linters.yml                     | 8 ++++----
 .github/workflows/test-asan.yml                   | 2 +-
 .github/workflows/test-internet.yml               | 2 +-
 .github/workflows/test-linux.yml                  | 2 +-
 .github/workflows/test-macos.yml                  | 2 +-
 .github/workflows/test-ubsan.yml                  | 2 +-
 .github/workflows/tools.yml                       | 2 +-
 12 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/.github/workflows/build-tarball.yml b/.github/workflows/build-tarball.yml
index 05524c9ced1745..4796b5b316ebe4 100644
--- a/.github/workflows/build-tarball.yml
+++ b/.github/workflows/build-tarball.yml
@@ -46,7 +46,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Set up sccache
@@ -76,7 +76,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Set up sccache
diff --git a/.github/workflows/coverage-linux-without-intl.yml b/.github/workflows/coverage-linux-without-intl.yml
index 919995b76fbcca..ddd85fb8a4ff0e 100644
--- a/.github/workflows/coverage-linux-without-intl.yml
+++ b/.github/workflows/coverage-linux-without-intl.yml
@@ -52,7 +52,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Set up sccache
diff --git a/.github/workflows/coverage-linux.yml b/.github/workflows/coverage-linux.yml
index 66c5bc1dcde878..153504ba4280d6 100644
--- a/.github/workflows/coverage-linux.yml
+++ b/.github/workflows/coverage-linux.yml
@@ -52,7 +52,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Set up sccache
diff --git a/.github/workflows/coverage-windows.yml b/.github/workflows/coverage-windows.yml
index 1224b91ef2dccd..84feb7b09018de 100644
--- a/.github/workflows/coverage-windows.yml
+++ b/.github/workflows/coverage-windows.yml
@@ -49,7 +49,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Install deps
diff --git a/.github/workflows/daily-wpt-fyi.yml b/.github/workflows/daily-wpt-fyi.yml
index cb57f02c02e1b8..ebac102f63e115 100644
--- a/.github/workflows/daily-wpt-fyi.yml
+++ b/.github/workflows/daily-wpt-fyi.yml
@@ -39,7 +39,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Environment Information
diff --git a/.github/workflows/linters.yml b/.github/workflows/linters.yml
index 1e190d93a4f987..0149bace5b8e23 100644
--- a/.github/workflows/linters.yml
+++ b/.github/workflows/linters.yml
@@ -44,7 +44,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Environment Information
@@ -64,7 +64,7 @@ jobs:
         with:
           node-version: ${{ env.NODE_VERSION }}
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Environment Information
@@ -122,7 +122,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Environment Information
@@ -139,7 +139,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Use Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Environment Information
diff --git a/.github/workflows/test-asan.yml b/.github/workflows/test-asan.yml
index ca70add4d3d55c..d918fa7d87300b 100644
--- a/.github/workflows/test-asan.yml
+++ b/.github/workflows/test-asan.yml
@@ -51,7 +51,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Set up sccache
diff --git a/.github/workflows/test-internet.yml b/.github/workflows/test-internet.yml
index 8a0280780e4e12..eced01cfbdaa0e 100644
--- a/.github/workflows/test-internet.yml
+++ b/.github/workflows/test-internet.yml
@@ -48,7 +48,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Environment Information
diff --git a/.github/workflows/test-linux.yml b/.github/workflows/test-linux.yml
index 19d37966a16181..24cf47f9b376bd 100644
--- a/.github/workflows/test-linux.yml
+++ b/.github/workflows/test-linux.yml
@@ -41,7 +41,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Set up sccache
diff --git a/.github/workflows/test-macos.yml b/.github/workflows/test-macos.yml
index ce4abdf54ca4fe..6bb22265032605 100644
--- a/.github/workflows/test-macos.yml
+++ b/.github/workflows/test-macos.yml
@@ -48,7 +48,7 @@ jobs:
         with:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Set up sccache
diff --git a/.github/workflows/test-ubsan.yml b/.github/workflows/test-ubsan.yml
index f8fb51d15189ff..9f33fa670b8231 100644
--- a/.github/workflows/test-ubsan.yml
+++ b/.github/workflows/test-ubsan.yml
@@ -52,7 +52,7 @@ jobs:
         run: |
           echo "UBSAN_OPTIONS=suppressions=$GITHUB_WORKSPACE/suppressions.supp" >> $GITHUB_ENV
       - name: Set up Python ${{ env.PYTHON_VERSION }}
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - name: Set up sccache
diff --git a/.github/workflows/tools.yml b/.github/workflows/tools.yml
index ab77431e678e21..09fb8968bf1d00 100644
--- a/.github/workflows/tools.yml
+++ b/.github/workflows/tools.yml
@@ -303,7 +303,7 @@ jobs:
           persist-credentials: false
       - name: Set up Python ${{ env.PYTHON_VERSION }}
         if: matrix.id == 'icu' && (github.event_name == 'schedule' || inputs.id == 'all' || inputs.id == matrix.id)
-        uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5d49a3  # v5.2.0
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
         with:
           python-version: ${{ env.PYTHON_VERSION }}
       - run: ${{ matrix.run }}

From adfc2f993a9cbf79be5e1ddd4b6fa5edb0b31c9d Mon Sep 17 00:00:00 2001
From: Richard Lau <rlau@redhat.com>
Date: Fri, 1 Nov 2024 16:36:45 +0000
Subject: [PATCH 079/216] tools: fix root certificate updater
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Determine the NSS version from actual Firefox releases, instead of
attempting to parse a wiki page (which is sensitive to formatting
changes and relies on the page being up to date).

PR-URL: https://github.com/nodejs/node/pull/55681
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
---
 tools/dep_updaters/update-root-certs.mjs | 186 +++++++++--------------
 1 file changed, 69 insertions(+), 117 deletions(-)
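
Reviewer note (not part of the commit): a minimal sketch of the lookup
chain this patch introduces, written as pure functions so the string
handling can be checked without network access. The function names and
the sample entries are hypothetical and do not appear in the patch. The
sketch assumes that every dot in a Firefox version maps to an underscore
in the mozilla-release tag name, and it trims the TAG-INFO text in case
the file ends with a newline.

```javascript
// Sketch only: hypothetical helpers, not the code added by this patch.

// Keep public Firefox release-channel entries, newest first.
// The field names match those the updater reads from the release data.
function pickPublicReleases(releases) {
  return releases
    .filter((r) => r.product === 'Firefox' && r.channel === 'Release' && r.is_public === true)
    .sort((a, b) => new Date(b.release_date) - new Date(a.release_date));
}

// Build the tag name for a Firefox version.
// Assumption: all dots become underscores, e.g. 131.0.3 -> FIREFOX_131_0_3_RELEASE.
function firefoxTagFor(version) {
  return `FIREFOX_${version.replaceAll('.', '_')}_RELEASE`;
}

// Convert an NSS tag of the form NSS_x_y_RTM to the dotted version x.y.
function nssVersionFromTag(tag) {
  return tag.trim().split('_').slice(1, -1).join('.');
}

// Inverse mapping, used to locate certdata.txt for an NSS version.
function nssTagFor(version) {
  return `NSS_${version.replaceAll('.', '_')}_RTM`;
}

// Illustrative sample entries, not real API output.
const sample = [
  { product: 'Firefox', channel: 'Release', is_public: true, version: '130.0', release_date: '2024-09-03' },
  { product: 'Firefox', channel: 'Release', is_public: true, version: '131.0', release_date: '2024-10-01' },
  { product: 'Firefox', channel: 'Beta', is_public: true, version: '132.0b1', release_date: '2024-10-02' },
];
const latest = pickPublicReleases(sample)[0];
console.log(firefoxTagFor(latest.version));
console.log(nssVersionFromTag('NSS_3_104_RTM\n'));
```

Keeping the tag conversions separate from the fetch calls makes the
multi-dot case (point releases) easy to exercise in isolation.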

diff --git a/tools/dep_updaters/update-root-certs.mjs b/tools/dep_updaters/update-root-certs.mjs
index 64f3c88b851b7f..a9e0a009f02deb 100644
--- a/tools/dep_updaters/update-root-certs.mjs
+++ b/tools/dep_updaters/update-root-certs.mjs
@@ -8,109 +8,78 @@ import { pipeline } from 'node:stream/promises';
 import { fileURLToPath } from 'node:url';
 import { parseArgs } from 'node:util';
 
-// Constants for NSS release metadata.
-const kNSSVersion = 'version';
-const kNSSDate = 'date';
-const kFirefoxVersion = 'firefoxVersion';
-const kFirefoxDate = 'firefoxDate';
-
 const __filename = fileURLToPath(import.meta.url);
-const now = new Date();
-
-const formatDate = (d) => {
-  const iso = d.toISOString();
-  return iso.substring(0, iso.indexOf('T'));
-};
 
 const getCertdataURL = (version) => {
   const tag = `NSS_${version.replaceAll('.', '_')}_RTM`;
-  const certdataURL = `https://hg.mozilla.org/projects/nss/raw-file/${tag}/lib/ckfw/builtins/certdata.txt`;
+  const certdataURL = `https://raw.githubusercontent.com/nss-dev/nss/refs/tags/${tag}/lib/ckfw/builtins/certdata.txt`;
   return certdataURL;
 };
 
-const normalizeTD = (text = '') => {
-  // Remove whitespace and any HTML tags.
-  return text?.trim().replace(/<.*?>/g, '');
-};
-const getReleases = (text) => {
-  const releases = [];
-  const tableRE = /<table [^>]+>([\S\s]*?)<\/table>/g;
-  const tableRowRE = /<tr ?[^>]*>([\S\s]*?)<\/tr>/g;
-  const tableHeaderRE = /<th ?[^>]*>([\S\s]*?)<\/th>/g;
-  const tableDataRE = /<td ?[^>]*>([\S\s]*?)<\/td>/g;
-  for (const table of text.matchAll(tableRE)) {
-    const columns = {};
-    const matches = table[1].matchAll(tableRowRE);
-    // First row has the table header.
-    let row = matches.next();
-    if (row.done) {
-      continue;
-    }
-    const headers = Array.from(row.value[1].matchAll(tableHeaderRE), (m) => m[1]);
-    if (headers.length > 0) {
-      for (let i = 0; i < headers.length; i++) {
-        if (/NSS version/i.test(headers[i])) {
-          columns[kNSSVersion] = i;
-        } else if (/Release.*from branch/i.test(headers[i])) {
-          columns[kNSSDate] = i;
-        } else if (/Firefox version/i.test(headers[i])) {
-          columns[kFirefoxVersion] = i;
-        } else if (/Firefox release date/i.test(headers[i])) {
-          columns[kFirefoxDate] = i;
-        }
-      }
-    }
-    // Filter out "NSS Certificate bugs" table.
-    if (columns[kNSSDate] === undefined) {
-      continue;
-    }
-    // Scrape releases.
-    row = matches.next();
-    while (!row.done) {
-      const cells = Array.from(row.value[1].matchAll(tableDataRE), (m) => m[1]);
-      const release = {};
-      release[kNSSVersion] = normalizeTD(cells[columns[kNSSVersion]]);
-      release[kNSSDate] = new Date(normalizeTD(cells[columns[kNSSDate]]));
-      release[kFirefoxVersion] = normalizeTD(cells[columns[kFirefoxVersion]]);
-      release[kFirefoxDate] = new Date(normalizeTD(cells[columns[kFirefoxDate]]));
-      releases.push(release);
-      row = matches.next();
-    }
+const getFirefoxReleases = async (everything = false) => {
+  const releaseDataURL = `https://nucleus.mozilla.org/rna/all-releases.json${everything ? '?all=true' : ''}`;
+  if (values.verbose) {
+    console.log(`Fetching Firefox release data from ${releaseDataURL}.`);
+  }
+  const releaseData = await fetch(releaseDataURL);
+  if (!releaseData.ok) {
+    console.error(`Failed to fetch ${releaseDataURL}: ${releaseData.status}: ${releaseData.statusText}.`);
+    process.exit(-1);
   }
-  return releases;
+  return (await releaseData.json()).filter((release) => {
+    // We're only interested in public releases of Firefox.
+    return (release.product === 'Firefox' && release.channel === 'Release' && release.is_public === true);
+  }).sort((a, b) => {
+    // Sort results by release date.
+    return new Date(b.release_date) - new Date(a.release_date);
+  });
 };
 
-const getLatestVersion = async (releases) => {
-  const arrayNumberSortDescending = (x, y, i) => {
-    if (x[i] === undefined && y[i] === undefined) {
-      return 0;
-    } else if (x[i] === y[i]) {
-      return arrayNumberSortDescending(x, y, i + 1);
-    }
-    return (y[i] ?? 0) - (x[i] ?? 0);
-  };
-  const extractVersion = (t) => {
-    return t[kNSSVersion].split('.').map((n) => parseInt(n));
-  };
-  const releaseSorter = (x, y) => {
-    return arrayNumberSortDescending(extractVersion(x), extractVersion(y), 0);
-  };
-  // Return the most recent certadata.txt that exists on the server.
-  const sortedReleases = releases.sort(releaseSorter).filter(pastRelease);
-  for (const candidate of sortedReleases) {
-    const candidateURL = getCertdataURL(candidate[kNSSVersion]);
-    if (values.verbose) {
-      console.log(`Trying ${candidateURL}`);
+const getFirefoxRelease = async (version) => {
+  let releases = await getFirefoxReleases();
+  let found;
+  if (version === undefined) {
+    // No version specified. Find the most recent.
+    if (releases.length > 0) {
+      found = releases[0];
+    } else {
+      if (values.verbose) {
+        console.log('Unable to find release data for Firefox. Searching full release data.');
+      }
+      releases = await getFirefoxReleases(true);
+      found = releases[0];
     }
-    const response = await fetch(candidateURL, { method: 'HEAD' });
-    if (response.ok) {
-      return candidate[kNSSVersion];
+  } else {
+    // Search for the specified release.
+    found = releases.find((release) => release.version === version);
+    if (found === undefined) {
+      if (values.verbose) {
+        console.log(`Unable to find release data for Firefox ${version}. Searching full release data.`);
+      }
+      releases = await getFirefoxReleases(true);
+      found = releases.find((release) => release.version === version);
     }
   }
+  return found;
 };
 
-const pastRelease = (r) => {
-  return r[kNSSDate] < now;
+const getNSSVersion = async (release) => {
+  const latestFirefox = release.version;
+  const firefoxTag = `FIREFOX_${latestFirefox.replaceAll('.', '_')}_RELEASE`;
+  const tagInfoURL = `https://hg.mozilla.org/releases/mozilla-release/raw-file/${firefoxTag}/security/nss/TAG-INFO`;
+  if (values.verbose) {
+    console.log(`Fetching NSS tag from ${tagInfoURL}.`);
+  }
+  const tagInfo = await fetch(tagInfoURL);
+  if (!tagInfo.ok) {
+    throw new Error(`Failed to fetch ${tagInfoURL}: ${tagInfo.status}: ${tagInfo.statusText}`);
+  }
+  const tag = await tagInfo.text();
+  if (values.verbose) {
+    console.log(`Found tag ${tag}.`);
+  }
+  // Tag will be of form `NSS_x_y_RTM`. Convert to `x.y`.
+  return tag.split('_').slice(1, -1).join('.');
 };
 
 const options = {
@@ -135,9 +104,9 @@ const {
 });
 
 if (values.help) {
-  console.log(`Usage: ${basename(__filename)} [OPTION]... [VERSION]...`);
+  console.log(`Usage: ${basename(__filename)} [OPTION]... [RELEASE]...`);
   console.log();
-  console.log('Updates certdata.txt to NSS VERSION (most recent release by default).');
+  console.log('Updates certdata.txt to NSS version contained in Firefox RELEASE (default: most recent release).');
   console.log('');
   console.log('  -f, --file=FILE  writes a commit message reflecting the change to the');
   console.log('                     specified FILE');
@@ -146,29 +115,11 @@ if (values.help) {
   process.exit(0);
 }
 
-const scheduleURL = 'https://wiki.mozilla.org/NSS:Release_Versions';
-if (values.verbose) {
-  console.log(`Fetching NSS release schedule from ${scheduleURL}`);
-}
-const schedule = await fetch(scheduleURL);
-if (!schedule.ok) {
-  console.error(`Failed to fetch ${scheduleURL}: ${schedule.status}: ${schedule.statusText}`);
-  process.exit(-1);
-}
-const scheduleText = await schedule.text();
-const nssReleases = getReleases(scheduleText);
-
+const firefoxRelease = await getFirefoxRelease(positionals[0]);
 // Retrieve metadata for the NSS release being updated to.
-const version = positionals[0] ?? await getLatestVersion(nssReleases);
-const release = nssReleases.find((r) => {
-  return new RegExp(`^${version.replace('.', '\\.')}\\b`).test(r[kNSSVersion]);
-});
-if (!pastRelease(release)) {
-  console.warn(`Warning: NSS ${version} is not due to be released until ${formatDate(release[kNSSDate])}`);
-}
+const version = await getNSSVersion(firefoxRelease);
 if (values.verbose) {
-  console.log('Found NSS version:');
-  console.log(release);
+  console.log(`Updating to NSS version ${version}`);
 }
 
 // Fetch certdata.txt and overwrite the local copy.
@@ -213,14 +164,15 @@ const added = [ ...diff.matchAll(certsAddedRE) ].map((m) => m[1]);
 const removed = [ ...diff.matchAll(certsRemovedRE) ].map((m) => m[1]);
 
 const commitMsg = [
-  `crypto: update root certificates to NSS ${release[kNSSVersion]}`,
+  `crypto: update root certificates to NSS ${version}`,
   '',
-  `This is the certdata.txt[0] from NSS ${release[kNSSVersion]}, released on ${formatDate(release[kNSSDate])}.`,
-  '',
-  `This is the version of NSS that ${release[kFirefoxDate] < now ? 'shipped' : 'will ship'} in Firefox ${release[kFirefoxVersion]} on`,
-  `${formatDate(release[kFirefoxDate])}.`,
+  `This is the certdata.txt[0] from NSS ${version}.`,
   '',
 ];
+if (firefoxRelease) {
+  commitMsg.push(`This is the version of NSS that shipped in Firefox ${firefoxRelease.version} on ${firefoxRelease.release_date}.`);
+  commitMsg.push('');
+}
 if (added.length > 0) {
   commitMsg.push('Certificates added:');
   commitMsg.push(...added.map((cert) => `- ${cert}`));
@@ -234,7 +186,7 @@ if (removed.length > 0) {
 commitMsg.push(`[0] ${certdataURL}`);
 const delimiter = randomUUID();
 const properties = [
-  `NEW_VERSION=${release[kNSSVersion]}`,
+  `NEW_VERSION=${version}`,
   `COMMIT_MSG<<${delimiter}`,
   ...commitMsg,
   delimiter,

From 247fa1959f522bc9db00c7256a6175cdad2054fe Mon Sep 17 00:00:00 2001
From: Richard Lau <rlau@redhat.com>
Date: Fri, 1 Nov 2024 17:53:50 +0000
Subject: [PATCH 080/216] crypto: update root certificates to NSS 3.104
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This is the certdata.txt[0] from NSS 3.104.

This is the version of NSS that shipped in Firefox 131.0 on 2024-10-01.

Certificates added:
- FIRMAPROFESIONAL CA ROOT-A WEB
- TWCA CYBER Root CA
- SecureSign Root CA12
- SecureSign Root CA14
- SecureSign Root CA15

[0] https://raw.githubusercontent.com/nss-dev/nss/refs/tags/NSS_3_104_RTM/lib/ckfw/builtins/certdata.txt

PR-URL: https://github.com/nodejs/node/pull/55681
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
---
 src/node_root_certs.h | 113 ++++++
 tools/certdata.txt    | 896 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 1005 insertions(+), 4 deletions(-)
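
Reviewer note (not part of the commit): the certdata.txt hunk in this
patch replaces two CK_BBOOL distrust-after flags with MULTILINE_OCTAL
values. A minimal sketch of how such a value can be read, assuming the
payload is an ASCII UTCTime string of the form YYMMDDHHMMSSZ with the
two-digit year interpreted as in RFC 5280 (50 and above means 19YY,
otherwise 20YY). The helper names are hypothetical.

```javascript
// Sketch only: hypothetical helpers for reading a MULTILINE_OCTAL date.

// Turn a run of \ooo octal escapes into the string they encode.
function decodeMultilineOctal(text) {
  const bytes = [...text.matchAll(/\\([0-7]{3})/g)].map((m) => parseInt(m[1], 8));
  return String.fromCharCode(...bytes);
}

// Parse a UTCTime string (YYMMDDHHMMSSZ) into a Date.
function parseUTCTime(s) {
  const m = /^(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})Z$/.exec(s);
  if (m === null) {
    throw new Error(`Not a UTCTime value: ${s}`);
  }
  const [yy, month, day, hour, minute, second] = m.slice(1).map(Number);
  // Assumption: RFC 5280 two-digit year rule.
  const year = yy >= 50 ? 1900 + yy : 2000 + yy;
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

// The value added for CKA_NSS_SERVER_DISTRUST_AFTER in this patch.
const raw = String.raw`\062\064\060\066\063\060\060\060\060\060\060\060\132`;
const decoded = decodeMultilineOctal(raw);
console.log(decoded);
console.log(parseUTCTime(decoded).toISOString());
```

The decoded date agrees with the "Sun Jun 30 00:00:00 2024" comment that
accompanies the value in certdata.txt.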

diff --git a/src/node_root_certs.h b/src/node_root_certs.h
index 93af565fbc1add..2c8670be39e586 100644
--- a/src/node_root_certs.h
+++ b/src/node_root_certs.h
@@ -3571,4 +3571,117 @@
 "4Sw5/7W0cwDk90imc6y/st53BIe0o82bNSQ3+pCTE4FCxpgmdTdmQRCsu/WU48IxK63nI1bM\n"
 "NSWSs1A=\n"
 "-----END CERTIFICATE-----",
+
+/* FIRMAPROFESIONAL CA ROOT-A WEB */
+"-----BEGIN CERTIFICATE-----\n"
+"MIICejCCAgCgAwIBAgIQMZch7a+JQn81QYehZ1ZMbTAKBggqhkjOPQQDAzBuMQswCQYDVQQG\n"
+"EwJFUzEcMBoGA1UECgwTRmlybWFwcm9mZXNpb25hbCBTQTEYMBYGA1UEYQwPVkFURVMtQTYy\n"
+"NjM0MDY4MScwJQYDVQQDDB5GSVJNQVBST0ZFU0lPTkFMIENBIFJPT1QtQSBXRUIwHhcNMjIw\n"
+"NDA2MDkwMTM2WhcNNDcwMzMxMDkwMTM2WjBuMQswCQYDVQQGEwJFUzEcMBoGA1UECgwTRmly\n"
+"bWFwcm9mZXNpb25hbCBTQTEYMBYGA1UEYQwPVkFURVMtQTYyNjM0MDY4MScwJQYDVQQDDB5G\n"
+"SVJNQVBST0ZFU0lPTkFMIENBIFJPT1QtQSBXRUIwdjAQBgcqhkjOPQIBBgUrgQQAIgNiAARH\n"
+"U+osEaR3xyrq89Zfe9MEkVz6iMYiuYMQYneEMy3pA4jU4DP37XcsSmDq5G+tbbT4TIqk5B/K\n"
+"6k84Si6CcyvHZpsKjECcfIr28jlgst7L7Ljkb+qbXbdTkBgyVcUgt5SjYzBhMA8GA1UdEwEB\n"
+"/wQFMAMBAf8wHwYDVR0jBBgwFoAUk+FDY1w8ndYn81LsF7Kpryz3dvgwHQYDVR0OBBYEFJPh\n"
+"Q2NcPJ3WJ/NS7Beyqa8s93b4MA4GA1UdDwEB/wQEAwIBBjAKBggqhkjOPQQDAwNoADBlAjAd\n"
+"fKR7w4l1M+E7qUW/Runpod3JIha3RxEL2Jq68cgLcFBTApFwhVmpHqTm6iMxoAACMQD94viz\n"
+"rxa5HnPEluPBMBnYfubDl94cT7iJLzPrSA8Z94dGXSaQpYXFuXqUPoeovQA=\n"
+"-----END CERTIFICATE-----",
+
+/* TWCA CYBER Root CA */
+"-----BEGIN CERTIFICATE-----\n"
+"MIIFjTCCA3WgAwIBAgIQQAE0jMIAAAAAAAAAATzyxjANBgkqhkiG9w0BAQwFADBQMQswCQYD\n"
+"VQQGEwJUVzESMBAGA1UEChMJVEFJV0FOLUNBMRAwDgYDVQQLEwdSb290IENBMRswGQYDVQQD\n"
+"ExJUV0NBIENZQkVSIFJvb3QgQ0EwHhcNMjIxMTIyMDY1NDI5WhcNNDcxMTIyMTU1OTU5WjBQ\n"
+"MQswCQYDVQQGEwJUVzESMBAGA1UEChMJVEFJV0FOLUNBMRAwDgYDVQQLEwdSb290IENBMRsw\n"
+"GQYDVQQDExJUV0NBIENZQkVSIFJvb3QgQ0EwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIK\n"
+"AoICAQDG+Moe2Qkgfh1sTs6P40czRJzHyWmqOlt47nDSkvgEs1JSHWdyKKHfi12VCv7qze33\n"
+"Kc7wb3+szT3vsxxFavcokPFhV8UMxKNQXd7UtcsZyoC5dc4pztKFIuwCY8xEMCDa6pFbVuYd\n"
+"HNWdZsc/34bKS1PE2Y2yHer43CdTo0fhYcx9tbD47nORxc5zb87uEB8aBs/pJ2DFTxnk684i\n"
+"JkXXYJndzk834H/nY62wuFm40AZoNWDTNq5xQwTxaWV4fPMf88oon1oglWa0zbfuj3ikRRjp\n"
+"Ji+NmykosaS3Om251Bw4ckVYsV7r8Cibt4LK/c/WMw+f+5eesRycnupfXtuq3VTpMCEobY55\n"
+"83WSjCb+3MX2w7DfRFlDo7YDKPYIMKoNM+HvnKkHIuNZW0CP2oi3aQiotyMuRAlZN1vH4xfy\n"
+"IutuOVLF3lSnmMlLIJXcRolftBL5hSmO68gnFSDAS9TMfAxsNAwmmyYxpjyn9tnQS6Jk/zuZ\n"
+"QXLB4HCX8SS7K8R0IrGsayIyJNN4KsDAoS/xUgXJP+92ZuJF2A09rZXIx4kmyA+upwMu+8Ff\n"
+"+iDhcK2wZSA3M2Cw1a/XDBzCkHDXShi8fgGwsOsVHkQGzaRP6AzRwyAQ4VRlnrZR0Bp2a0Ja\n"
+"WHY06rc3Ga4udfmW5cFZ95RXKSWNOkyrTZpB0F8mAwIDAQABo2MwYTAOBgNVHQ8BAf8EBAMC\n"
+"AQYwDwYDVR0TAQH/BAUwAwEB/zAfBgNVHSMEGDAWgBSdhWEUfMFib5do5E83QOGt4A1WNzAd\n"
+"BgNVHQ4EFgQUnYVhFHzBYm+XaORPN0DhreANVjcwDQYJKoZIhvcNAQEMBQADggIBAGSPesRi\n"
+"DrWIzLjHhg6hShbNcAu3p4ULs3a2D6f/CIsLJc+o1IN1KriWiLb73y0ttGlTITVX1olNc79p\n"
+"j3CjYcya2x6a4CD4bLubIp1dhDGaLIrdaqHXKGnK/nZVekZn68xDiBaiA9a5F/gZbG0jAn/x\n"
+"X9AKKSM70aoK7akXJlQKTcKlTfjF/biBzysseKNnTKkHmvPfXvt89YnNdJdhEGoHK4Fa0o63\n"
+"5yDRIG4kqIQnoVesqlVYL9zZyvpoBJ7tRCT5dEA7IzOrg1oYJkK2bVS1FmAwbLGg+LhBoF1J\n"
+"SdJlBTrq/p1hvIbZv97Tujqxf36SNI7JAG7cmL3c7IAFrQI932XtCwP39xaEBDG6k5TY8hL4\n"
+"iuO/Qq+n1M0RFxbIQh0UqEL20kCGoE8jypZFVmAGzbdVAaYBlGX+bgUJurSkquLvWL69J1bY\n"
+"73NxW0Qz8ppy6rBePm6pUlvscG21h483XjyMnM7k8M4MZ0HMzvaAq07MTFb1wWFZk7Q+ptq4\n"
+"NxKfKjLji7gh7MMrZQzvIt6IKTtM1/r+t+FHvpw+PoP7UV31aPcuIYXcv/Fa4nzXxeSDwWrr\n"
+"uoBa3lwtcHb4yOWHh8qgnaHlIhInD0Q9HWzq1MKLL295q39QpsQZp6F6t5b5wR9iWqJDB0Be\n"
+"Jsas7a5wFsWqynKKTbDPAYsDP27X\n"
+"-----END CERTIFICATE-----",
+
+/* SecureSign Root CA12 */
+"-----BEGIN CERTIFICATE-----\n"
+"MIIDcjCCAlqgAwIBAgIUZvnHwa/swlG07VOX5uaCwysckBYwDQYJKoZIhvcNAQELBQAwUTEL\n"
+"MAkGA1UEBhMCSlAxIzAhBgNVBAoTGkN5YmVydHJ1c3QgSmFwYW4gQ28uLCBMdGQuMR0wGwYD\n"
+"VQQDExRTZWN1cmVTaWduIFJvb3QgQ0ExMjAeFw0yMDA0MDgwNTM2NDZaFw00MDA0MDgwNTM2\n"
+"NDZaMFExCzAJBgNVBAYTAkpQMSMwIQYDVQQKExpDeWJlcnRydXN0IEphcGFuIENvLiwgTHRk\n"
+"LjEdMBsGA1UEAxMUU2VjdXJlU2lnbiBSb290IENBMTIwggEiMA0GCSqGSIb3DQEBAQUAA4IB\n"
+"DwAwggEKAoIBAQC6OcE3emhFKxS06+QT61d1I02PJC0W6K6OyX2kVzsqdiUzg2zqMoqUm048\n"
+"luT9Ub+ZyZN+v/mtp7JIKwccJ/VMvHASd6SFVLX9kHrko+RRWAPNEHl57muTH2SOa2SroxPj\n"
+"cf59q5zdJ1M3s6oYwlkm7Fsf0uZlfO+TvdhYXAvA42VvPMfKWeP+bl+sg779XSVOKik71gur\n"
+"FzJ4pOE+lEa+Ym6b3kaosRbnhW70CEBFEaCeVESE99g2zvVQR9wsMJvuwPWW0v4JhscGWa5P\n"
+"ro4RmHvzC1KqYiaqId+OJTN5lxZJjfU+1UefNzFJM3IFTQy2VYzxV4+Kh9GtxRESOaCtAgMB\n"
+"AAGjQjBAMA8GA1UdEwEB/wQFMAMBAf8wDgYDVR0PAQH/BAQDAgEGMB0GA1UdDgQWBBRXNPN0\n"
+"zwRL1SXm8UC2LEzZLemgrTANBgkqhkiG9w0BAQsFAAOCAQEAPrvbFxbS8hQBICw4g0utvsqF\n"
+"epq2m2um4fylOqyttCg6r9cBg0krY6LdmmQOmFxv3Y67ilQiLUoT865AQ9tPkbeGGuwAtEGB\n"
+"pE/6aouIs3YIcipJQMPTw4WJmBClnW8Zt7vPemVV2zfrPIpyMpcemik+rY3moxtt9XUa5rBo\n"
+"uVui7mlHJzWhhpmA8zNL4WukJsPvdFlseqJkth5Ew1DgDzk9qTPxpfPSvWKErI4cqc1avTc7\n"
+"bgoitPQV55FYxTpE05Uo2cBl6XLK0A+9H7MV2anjpEcJnuDLN/v9vZfVvhgaaaI5gdka9at/\n"
+"yOPiZwud9AzqVN/Ssq+xIvEg37xEHA==\n"
+"-----END CERTIFICATE-----",
+
+/* SecureSign Root CA14 */
+"-----BEGIN CERTIFICATE-----\n"
+"MIIFcjCCA1qgAwIBAgIUZNtaDCBO6Ncpd8hQJ6JaJ90t8sswDQYJKoZIhvcNAQEMBQAwUTEL\n"
+"MAkGA1UEBhMCSlAxIzAhBgNVBAoTGkN5YmVydHJ1c3QgSmFwYW4gQ28uLCBMdGQuMR0wGwYD\n"
+"VQQDExRTZWN1cmVTaWduIFJvb3QgQ0ExNDAeFw0yMDA0MDgwNzA2MTlaFw00NTA0MDgwNzA2\n"
+"MTlaMFExCzAJBgNVBAYTAkpQMSMwIQYDVQQKExpDeWJlcnRydXN0IEphcGFuIENvLiwgTHRk\n"
+"LjEdMBsGA1UEAxMUU2VjdXJlU2lnbiBSb290IENBMTQwggIiMA0GCSqGSIb3DQEBAQUAA4IC\n"
+"DwAwggIKAoICAQDF0nqh1oq/FjHQmNE6lPxauG4iwWL3pwon71D2LrGeaBLwbCRjOfHw3xDG\n"
+"3rdSINVSW0KZnvOgvlIfX8xnbacuUKLBl422+JX1sLrcneC+y9/3OPJH9aaakpUqYllQC6Kx\n"
+"NedlsmGy6pJxaeQp8E+BgQQ8sqVb1MWoWWd7VRxJq3qdwudzTe/NCcLEVxLbAQ4jeQkHO6Lo\n"
+"/IrPj8BGJJw4J+CDnRugv3gVEOuGTgpa/d/aLIJ+7sr2KeH6caH3iGicnPCNvg9JkdjqOvn9\n"
+"0Ghx2+m1K06Ckm9mH+Dw3EzsytHqunQG+bOEkJTRX45zGRBdAuVwpcAQ0BB8b8VYSbSwbpra\n"
+"fZX1zNoCr7gsfXmPvkPx+SgojQlD+Ajda8iLLCSxjVIHvXiby8posqTdDEx5YMaZ0ZPxMBoH\n"
+"064iwurO8YQJzOAUbn8/ftKChazcqRZOhaBgy/ac18izju3Gm5h1DVXoX+WViwKkrkMpKBGk\n"
+"5hIwAUt1ax5mnXkvpXYvHUC0bcl9eQjs0Wq2XSqypWa9a4X0dFbD9ed1Uigspf9mR6XU/v6e\n"
+"VL9lfgHWMI+lNpyiUBzuOIABSMbHdPTGrMNASRZhdCyvjG817XsYAFs2PJxQDcqSMxDxJklt\n"
+"33UkN4Ii1+iW/RVLApY+B3KVfqs9TC7XyvDf4Fg/LS8EmjijAQIDAQABo0IwQDAPBgNVHRMB\n"
+"Af8EBTADAQH/MA4GA1UdDwEB/wQEAwIBBjAdBgNVHQ4EFgQUBpOjCl4oaTeqYR3r6/wtbyPk\n"
+"86AwDQYJKoZIhvcNAQEMBQADggIBAJaAcgkGfpzMkwQWu6A6jZJOtxEaCnFxEM0ErX+lRVAQ\n"
+"Zk5KQaID2RFPeje5S+LGjzJmdSX7684/AykmjbgWHfYfM25I5uj4V7Ibed87hwriZLoAymzv\n"
+"ftAj63iP/2SbNDefNWWipAA9EiOWWF3KY4fGoweITedpdopTzfFP7ELyk+OZpDc8h7hi2/Ds\n"
+"Hzc/N19DzFGdtfCXwreFamgLRB7lUe6TzktuhsHSDCRZNhqfLJGP4xjblJUK7ZGqDpncllPj\n"
+"YYPGFrojutzdfhrGe0K22VoF3Jpf1d+42kd92jjbrDnVHmtsKheMYc2xbXIBw8MgAGJoFjHV\n"
+"dqqGuw6qnsb58Nn4DSEC5MUoFlkRudlpcyqSeLiSV5sI8jrlL5WwWLdrIBRtFO8KvH7YVdiI\n"
+"2i/6GaX7i+B/OfVyK4XELKzvGUWSTLNhB9xNH27SgRNcmvMSZ4PPmz+Ln52kuaiWA3rF7iDe\n"
+"M9ovnhp6dB7h7sxaOgTdsxoEqBRjrLdHEoOabPXm6RUVkRqEGQ6UROcSjiVbgGcZ3GOTEAtl\n"
+"Lor6CZpO2oYofaphNdgOpygau1LgePhsumywbrmHXumZNTfxPWQrqaA0k89jL9WB365jJ6Ue\n"
+"To3cKXhZ+PmhIIynJkBugnLNeLLIjzwec+fBH7/PzqUqm9tEZDKgu39cJRNItX+S\n"
+"-----END CERTIFICATE-----",
+
+/* SecureSign Root CA15 */
+"-----BEGIN CERTIFICATE-----\n"
+"MIICIzCCAamgAwIBAgIUFhXHw9hJp75pDIqI7fBw+d23PocwCgYIKoZIzj0EAwMwUTELMAkG\n"
+"A1UEBhMCSlAxIzAhBgNVBAoTGkN5YmVydHJ1c3QgSmFwYW4gQ28uLCBMdGQuMR0wGwYDVQQD\n"
+"ExRTZWN1cmVTaWduIFJvb3QgQ0ExNTAeFw0yMDA0MDgwODMyNTZaFw00NTA0MDgwODMyNTZa\n"
+"MFExCzAJBgNVBAYTAkpQMSMwIQYDVQQKExpDeWJlcnRydXN0IEphcGFuIENvLiwgTHRkLjEd\n"
+"MBsGA1UEAxMUU2VjdXJlU2lnbiBSb290IENBMTUwdjAQBgcqhkjOPQIBBgUrgQQAIgNiAAQL\n"
+"UHSNZDKZmbPSYAi4Io5GdCx4wCtELW1fHcmuS1Iggz24FG1Th2CeX2yF2wYUleDHKP+dX+Sq\n"
+"8bOLbe1PL0vJSpSRZHX+AezB2Ot6lHhWGENfa4HL9rzatAy2KZMIaY+jQjBAMA8GA1UdEwEB\n"
+"/wQFMAMBAf8wDgYDVR0PAQH/BAQDAgEGMB0GA1UdDgQWBBTrQciu/NWeUUj1vYv0hyCTQSvT\n"
+"9DAKBggqhkjOPQQDAwNoADBlAjEA2S6Jfl5OpBEHvVnCB96rMjhTKkZEBhd6zlHp4P9mLQlO\n"
+"4E/0BdGF9jVg3PVys0Z9AjBEmEYagoUeYWmJSwdLZrWeqrqgHkHZAXQ6bkU6iYAZezKYVWOr\n"
+"62Nuk22rGwlgMU4=\n"
+"-----END CERTIFICATE-----",
 #endif  // defined(NODE_WANT_INTERNALS) && NODE_WANT_INTERNALS
diff --git a/tools/certdata.txt b/tools/certdata.txt
index ed5e6cb17cab57..110a814718cfd7 100644
--- a/tools/certdata.txt
+++ b/tools/certdata.txt
@@ -3645,7 +3645,7 @@ CKA_SERIAL_NUMBER MULTILINE_OCTAL
 \002\006\040\006\005\026\160\002
 END
 CKA_TRUST_SERVER_AUTH CK_TRUST CKT_NSS_TRUSTED_DELEGATOR
-CKA_TRUST_EMAIL_PROTECTION CK_TRUST CKT_NSS_TRUSTED_DELEGATOR
+CKA_TRUST_EMAIL_PROTECTION CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
 CKA_TRUST_CODE_SIGNING CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
 CKA_TRUST_STEP_UP_APPROVED CK_BBOOL CK_FALSE
 
@@ -7252,7 +7252,7 @@ CKA_SERIAL_NUMBER MULTILINE_OCTAL
 \002\010\136\303\267\246\103\177\244\340
 END
 CKA_TRUST_SERVER_AUTH CK_TRUST CKT_NSS_TRUSTED_DELEGATOR
-CKA_TRUST_EMAIL_PROTECTION CK_TRUST CKT_NSS_TRUSTED_DELEGATOR
+CKA_TRUST_EMAIL_PROTECTION CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
 CKA_TRUST_CODE_SIGNING CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
 CKA_TRUST_STEP_UP_APPROVED CK_BBOOL CK_FALSE
 
@@ -17020,8 +17020,14 @@ CKA_VALUE MULTILINE_OCTAL
 \155\015\277\173\327\222
 END
 CKA_NSS_MOZILLA_CA_POLICY CK_BBOOL CK_TRUE
-CKA_NSS_SERVER_DISTRUST_AFTER CK_BBOOL CK_FALSE
-CKA_NSS_EMAIL_DISTRUST_AFTER CK_BBOOL CK_FALSE
+# For Server Distrust After: Sun Jun 30 00:00:00 2024
+CKA_NSS_SERVER_DISTRUST_AFTER MULTILINE_OCTAL
+\062\064\060\066\063\060\060\060\060\060\060\060\132
+END
+# For Email Distrust After: Sun Jun 30 00:00:00 2024
+CKA_NSS_EMAIL_DISTRUST_AFTER MULTILINE_OCTAL
+\062\064\060\066\063\060\060\060\060\060\060\060\132
+END
 
 # Trust for "GLOBALTRUST 2020"
 # Issuer: CN=GLOBALTRUST 2020,O=e-commerce monitoring GmbH,C=AT
@@ -25359,3 +25365,885 @@ CKA_TRUST_SERVER_AUTH CK_TRUST CKT_NSS_TRUSTED_DELEGATOR
 CKA_TRUST_EMAIL_PROTECTION CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
 CKA_TRUST_CODE_SIGNING CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
 CKA_TRUST_STEP_UP_APPROVED CK_BBOOL CK_FALSE
+
+#
+# Certificate "FIRMAPROFESIONAL CA ROOT-A WEB"
+#
+# Issuer: CN=FIRMAPROFESIONAL CA ROOT-A WEB,OID.2.5.4.97=VATES-A62634068,O=Firmaprofesional SA,C=ES
+# Serial Number:31:97:21:ed:af:89:42:7f:35:41:87:a1:67:56:4c:6d
+# Subject: CN=FIRMAPROFESIONAL CA ROOT-A WEB,OID.2.5.4.97=VATES-A62634068,O=Firmaprofesional SA,C=ES
+# Not Valid Before: Wed Apr 06 09:01:36 2022
+# Not Valid After : Sun Mar 31 09:01:36 2047
+# Fingerprint (SHA-256): BE:F2:56:DA:F2:6E:9C:69:BD:EC:16:02:35:97:98:F3:CA:F7:18:21:A0:3E:01:82:57:C5:3C:65:61:7F:3D:4A
+# Fingerprint (SHA1): A8:31:11:74:A6:14:15:0D:CA:77:DD:0E:E4:0C:5D:58:FC:A0:72:A5
+CKA_CLASS CK_OBJECT_CLASS CKO_CERTIFICATE
+CKA_TOKEN CK_BBOOL CK_TRUE
+CKA_PRIVATE CK_BBOOL CK_FALSE
+CKA_MODIFIABLE CK_BBOOL CK_FALSE
+CKA_LABEL UTF8 "FIRMAPROFESIONAL CA ROOT-A WEB"
+CKA_CERTIFICATE_TYPE CK_CERTIFICATE_TYPE CKC_X_509
+CKA_SUBJECT MULTILINE_OCTAL
+\060\156\061\013\060\011\006\003\125\004\006\023\002\105\123\061
+\034\060\032\006\003\125\004\012\014\023\106\151\162\155\141\160
+\162\157\146\145\163\151\157\156\141\154\040\123\101\061\030\060
+\026\006\003\125\004\141\014\017\126\101\124\105\123\055\101\066
+\062\066\063\064\060\066\070\061\047\060\045\006\003\125\004\003
+\014\036\106\111\122\115\101\120\122\117\106\105\123\111\117\116
+\101\114\040\103\101\040\122\117\117\124\055\101\040\127\105\102
+END
+CKA_ID UTF8 "0"
+CKA_ISSUER MULTILINE_OCTAL
+\060\156\061\013\060\011\006\003\125\004\006\023\002\105\123\061
+\034\060\032\006\003\125\004\012\014\023\106\151\162\155\141\160
+\162\157\146\145\163\151\157\156\141\154\040\123\101\061\030\060
+\026\006\003\125\004\141\014\017\126\101\124\105\123\055\101\066
+\062\066\063\064\060\066\070\061\047\060\045\006\003\125\004\003
+\014\036\106\111\122\115\101\120\122\117\106\105\123\111\117\116
+\101\114\040\103\101\040\122\117\117\124\055\101\040\127\105\102
+END
+CKA_SERIAL_NUMBER MULTILINE_OCTAL
+\002\020\061\227\041\355\257\211\102\177\065\101\207\241\147\126
+\114\155
+END
+CKA_VALUE MULTILINE_OCTAL
+\060\202\002\172\060\202\002\000\240\003\002\001\002\002\020\061
+\227\041\355\257\211\102\177\065\101\207\241\147\126\114\155\060
+\012\006\010\052\206\110\316\075\004\003\003\060\156\061\013\060
+\011\006\003\125\004\006\023\002\105\123\061\034\060\032\006\003
+\125\004\012\014\023\106\151\162\155\141\160\162\157\146\145\163
+\151\157\156\141\154\040\123\101\061\030\060\026\006\003\125\004
+\141\014\017\126\101\124\105\123\055\101\066\062\066\063\064\060
+\066\070\061\047\060\045\006\003\125\004\003\014\036\106\111\122
+\115\101\120\122\117\106\105\123\111\117\116\101\114\040\103\101
+\040\122\117\117\124\055\101\040\127\105\102\060\036\027\015\062
+\062\060\064\060\066\060\071\060\061\063\066\132\027\015\064\067
+\060\063\063\061\060\071\060\061\063\066\132\060\156\061\013\060
+\011\006\003\125\004\006\023\002\105\123\061\034\060\032\006\003
+\125\004\012\014\023\106\151\162\155\141\160\162\157\146\145\163
+\151\157\156\141\154\040\123\101\061\030\060\026\006\003\125\004
+\141\014\017\126\101\124\105\123\055\101\066\062\066\063\064\060
+\066\070\061\047\060\045\006\003\125\004\003\014\036\106\111\122
+\115\101\120\122\117\106\105\123\111\117\116\101\114\040\103\101
+\040\122\117\117\124\055\101\040\127\105\102\060\166\060\020\006
+\007\052\206\110\316\075\002\001\006\005\053\201\004\000\042\003
+\142\000\004\107\123\352\054\021\244\167\307\052\352\363\326\137
+\173\323\004\221\134\372\210\306\042\271\203\020\142\167\204\063
+\055\351\003\210\324\340\063\367\355\167\054\112\140\352\344\157
+\255\155\264\370\114\212\244\344\037\312\352\117\070\112\056\202
+\163\053\307\146\233\012\214\100\234\174\212\366\362\071\140\262
+\336\313\354\270\344\157\352\233\135\267\123\220\030\062\125\305
+\040\267\224\243\143\060\141\060\017\006\003\125\035\023\001\001
+\377\004\005\060\003\001\001\377\060\037\006\003\125\035\043\004
+\030\060\026\200\024\223\341\103\143\134\074\235\326\047\363\122
+\354\027\262\251\257\054\367\166\370\060\035\006\003\125\035\016
+\004\026\004\024\223\341\103\143\134\074\235\326\047\363\122\354
+\027\262\251\257\054\367\166\370\060\016\006\003\125\035\017\001
+\001\377\004\004\003\002\001\006\060\012\006\010\052\206\110\316
+\075\004\003\003\003\150\000\060\145\002\060\035\174\244\173\303
+\211\165\063\341\073\251\105\277\106\351\351\241\335\311\042\026
+\267\107\021\013\330\232\272\361\310\013\160\120\123\002\221\160
+\205\131\251\036\244\346\352\043\061\240\000\002\061\000\375\342
+\370\263\257\026\271\036\163\304\226\343\301\060\031\330\176\346
+\303\227\336\034\117\270\211\057\063\353\110\017\031\367\207\106
+\135\046\220\245\205\305\271\172\224\076\207\250\275\000
+END
+CKA_NSS_MOZILLA_CA_POLICY CK_BBOOL CK_TRUE
+CKA_NSS_SERVER_DISTRUST_AFTER CK_BBOOL CK_FALSE
+CKA_NSS_EMAIL_DISTRUST_AFTER CK_BBOOL CK_FALSE
+
+# Trust for "FIRMAPROFESIONAL CA ROOT-A WEB"
+# Issuer: CN=FIRMAPROFESIONAL CA ROOT-A WEB,OID.2.5.4.97=VATES-A62634068,O=Firmaprofesional SA,C=ES
+# Serial Number:31:97:21:ed:af:89:42:7f:35:41:87:a1:67:56:4c:6d
+# Subject: CN=FIRMAPROFESIONAL CA ROOT-A WEB,OID.2.5.4.97=VATES-A62634068,O=Firmaprofesional SA,C=ES
+# Not Valid Before: Wed Apr 06 09:01:36 2022
+# Not Valid After : Sun Mar 31 09:01:36 2047
+# Fingerprint (SHA-256): BE:F2:56:DA:F2:6E:9C:69:BD:EC:16:02:35:97:98:F3:CA:F7:18:21:A0:3E:01:82:57:C5:3C:65:61:7F:3D:4A
+# Fingerprint (SHA1): A8:31:11:74:A6:14:15:0D:CA:77:DD:0E:E4:0C:5D:58:FC:A0:72:A5
+CKA_CLASS CK_OBJECT_CLASS CKO_NSS_TRUST
+CKA_TOKEN CK_BBOOL CK_TRUE
+CKA_PRIVATE CK_BBOOL CK_FALSE
+CKA_MODIFIABLE CK_BBOOL CK_FALSE
+CKA_LABEL UTF8 "FIRMAPROFESIONAL CA ROOT-A WEB"
+CKA_CERT_SHA1_HASH MULTILINE_OCTAL
+\250\061\021\164\246\024\025\015\312\167\335\016\344\014\135\130
+\374\240\162\245
+END
+CKA_CERT_MD5_HASH MULTILINE_OCTAL
+\202\262\255\105\000\202\260\146\143\370\137\303\147\116\316\243
+END
+CKA_ISSUER MULTILINE_OCTAL
+\060\156\061\013\060\011\006\003\125\004\006\023\002\105\123\061
+\034\060\032\006\003\125\004\012\014\023\106\151\162\155\141\160
+\162\157\146\145\163\151\157\156\141\154\040\123\101\061\030\060
+\026\006\003\125\004\141\014\017\126\101\124\105\123\055\101\066
+\062\066\063\064\060\066\070\061\047\060\045\006\003\125\004\003
+\014\036\106\111\122\115\101\120\122\117\106\105\123\111\117\116
+\101\114\040\103\101\040\122\117\117\124\055\101\040\127\105\102
+END
+CKA_SERIAL_NUMBER MULTILINE_OCTAL
+\002\020\061\227\041\355\257\211\102\177\065\101\207\241\147\126
+\114\155
+END
+CKA_TRUST_SERVER_AUTH CK_TRUST CKT_NSS_TRUSTED_DELEGATOR
+CKA_TRUST_EMAIL_PROTECTION CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
+CKA_TRUST_CODE_SIGNING CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
+CKA_TRUST_STEP_UP_APPROVED CK_BBOOL CK_FALSE
+
+#
+# Certificate "TWCA CYBER Root CA"
+#
+# Issuer: CN=TWCA CYBER Root CA,OU=Root CA,O=TAIWAN-CA,C=TW
+# Serial Number:40:01:34:8c:c2:00:00:00:00:00:00:00:01:3c:f2:c6
+# Subject: CN=TWCA CYBER Root CA,OU=Root CA,O=TAIWAN-CA,C=TW
+# Not Valid Before: Tue Nov 22 06:54:29 2022
+# Not Valid After : Fri Nov 22 15:59:59 2047
+# Fingerprint (SHA-256): 3F:63:BB:28:14:BE:17:4E:C8:B6:43:9C:F0:8D:6D:56:F0:B7:C4:05:88:3A:56:48:A3:34:42:4D:6B:3E:C5:58
+# Fingerprint (SHA1): F6:B1:1C:1A:83:38:E9:7B:DB:B3:A8:C8:33:24:E0:2D:9C:7F:26:66
+CKA_CLASS CK_OBJECT_CLASS CKO_CERTIFICATE
+CKA_TOKEN CK_BBOOL CK_TRUE
+CKA_PRIVATE CK_BBOOL CK_FALSE
+CKA_MODIFIABLE CK_BBOOL CK_FALSE
+CKA_LABEL UTF8 "TWCA CYBER Root CA"
+CKA_CERTIFICATE_TYPE CK_CERTIFICATE_TYPE CKC_X_509
+CKA_SUBJECT MULTILINE_OCTAL
+\060\120\061\013\060\011\006\003\125\004\006\023\002\124\127\061
+\022\060\020\006\003\125\004\012\023\011\124\101\111\127\101\116
+\055\103\101\061\020\060\016\006\003\125\004\013\023\007\122\157
+\157\164\040\103\101\061\033\060\031\006\003\125\004\003\023\022
+\124\127\103\101\040\103\131\102\105\122\040\122\157\157\164\040
+\103\101
+END
+CKA_ID UTF8 "0"
+CKA_ISSUER MULTILINE_OCTAL
+\060\120\061\013\060\011\006\003\125\004\006\023\002\124\127\061
+\022\060\020\006\003\125\004\012\023\011\124\101\111\127\101\116
+\055\103\101\061\020\060\016\006\003\125\004\013\023\007\122\157
+\157\164\040\103\101\061\033\060\031\006\003\125\004\003\023\022
+\124\127\103\101\040\103\131\102\105\122\040\122\157\157\164\040
+\103\101
+END
+CKA_SERIAL_NUMBER MULTILINE_OCTAL
+\002\020\100\001\064\214\302\000\000\000\000\000\000\000\001\074
+\362\306
+END
+CKA_VALUE MULTILINE_OCTAL
+\060\202\005\215\060\202\003\165\240\003\002\001\002\002\020\100
+\001\064\214\302\000\000\000\000\000\000\000\001\074\362\306\060
+\015\006\011\052\206\110\206\367\015\001\001\014\005\000\060\120
+\061\013\060\011\006\003\125\004\006\023\002\124\127\061\022\060
+\020\006\003\125\004\012\023\011\124\101\111\127\101\116\055\103
+\101\061\020\060\016\006\003\125\004\013\023\007\122\157\157\164
+\040\103\101\061\033\060\031\006\003\125\004\003\023\022\124\127
+\103\101\040\103\131\102\105\122\040\122\157\157\164\040\103\101
+\060\036\027\015\062\062\061\061\062\062\060\066\065\064\062\071
+\132\027\015\064\067\061\061\062\062\061\065\065\071\065\071\132
+\060\120\061\013\060\011\006\003\125\004\006\023\002\124\127\061
+\022\060\020\006\003\125\004\012\023\011\124\101\111\127\101\116
+\055\103\101\061\020\060\016\006\003\125\004\013\023\007\122\157
+\157\164\040\103\101\061\033\060\031\006\003\125\004\003\023\022
+\124\127\103\101\040\103\131\102\105\122\040\122\157\157\164\040
+\103\101\060\202\002\042\060\015\006\011\052\206\110\206\367\015
+\001\001\001\005\000\003\202\002\017\000\060\202\002\012\002\202
+\002\001\000\306\370\312\036\331\011\040\176\035\154\116\316\217
+\343\107\063\104\234\307\311\151\252\072\133\170\356\160\322\222
+\370\004\263\122\122\035\147\162\050\241\337\213\135\225\012\376
+\352\315\355\367\051\316\360\157\177\254\315\075\357\263\034\105
+\152\367\050\220\361\141\127\305\014\304\243\120\135\336\324\265
+\313\031\312\200\271\165\316\051\316\322\205\042\354\002\143\314
+\104\060\040\332\352\221\133\126\346\035\034\325\235\146\307\077
+\337\206\312\113\123\304\331\215\262\035\352\370\334\047\123\243
+\107\341\141\314\175\265\260\370\356\163\221\305\316\163\157\316
+\356\020\037\032\006\317\351\047\140\305\117\031\344\353\316\042
+\046\105\327\140\231\335\316\117\067\340\177\347\143\255\260\270
+\131\270\320\006\150\065\140\323\066\256\161\103\004\361\151\145
+\170\174\363\037\363\312\050\237\132\040\225\146\264\315\267\356
+\217\170\244\105\030\351\046\057\215\233\051\050\261\244\267\072
+\155\271\324\034\070\162\105\130\261\136\353\360\050\233\267\202
+\312\375\317\326\063\017\237\373\227\236\261\034\234\236\352\137
+\136\333\252\335\124\351\060\041\050\155\216\171\363\165\222\214
+\046\376\334\305\366\303\260\337\104\131\103\243\266\003\050\366
+\010\060\252\015\063\341\357\234\251\007\042\343\131\133\100\217
+\332\210\267\151\010\250\267\043\056\104\011\131\067\133\307\343
+\027\362\042\353\156\071\122\305\336\124\247\230\311\113\040\225
+\334\106\211\137\264\022\371\205\051\216\353\310\047\025\040\300
+\113\324\314\174\014\154\064\014\046\233\046\061\246\074\247\366
+\331\320\113\242\144\377\073\231\101\162\301\340\160\227\361\044
+\273\053\304\164\042\261\254\153\042\062\044\323\170\052\300\300
+\241\057\361\122\005\311\077\357\166\146\342\105\330\015\075\255
+\225\310\307\211\046\310\017\256\247\003\056\373\301\137\372\040
+\341\160\255\260\145\040\067\063\140\260\325\257\327\014\034\302
+\220\160\327\112\030\274\176\001\260\260\353\025\036\104\006\315
+\244\117\350\014\321\303\040\020\341\124\145\236\266\121\320\032
+\166\153\102\132\130\166\064\352\267\067\031\256\056\165\371\226
+\345\301\131\367\224\127\051\045\215\072\114\253\115\232\101\320
+\137\046\003\002\003\001\000\001\243\143\060\141\060\016\006\003
+\125\035\017\001\001\377\004\004\003\002\001\006\060\017\006\003
+\125\035\023\001\001\377\004\005\060\003\001\001\377\060\037\006
+\003\125\035\043\004\030\060\026\200\024\235\205\141\024\174\301
+\142\157\227\150\344\117\067\100\341\255\340\015\126\067\060\035
+\006\003\125\035\016\004\026\004\024\235\205\141\024\174\301\142
+\157\227\150\344\117\067\100\341\255\340\015\126\067\060\015\006
+\011\052\206\110\206\367\015\001\001\014\005\000\003\202\002\001
+\000\144\217\172\304\142\016\265\210\314\270\307\206\016\241\112
+\026\315\160\013\267\247\205\013\263\166\266\017\247\377\010\213
+\013\045\317\250\324\203\165\052\270\226\210\266\373\337\055\055
+\264\151\123\041\065\127\326\211\115\163\277\151\217\160\243\141
+\314\232\333\036\232\340\040\370\154\273\233\042\235\135\204\061
+\232\054\212\335\152\241\327\050\151\312\376\166\125\172\106\147
+\353\314\103\210\026\242\003\326\271\027\370\031\154\155\043\002
+\177\361\137\320\012\051\043\073\321\252\012\355\251\027\046\124
+\012\115\302\245\115\370\305\375\270\201\317\053\054\170\243\147
+\114\251\007\232\363\337\136\373\174\365\211\315\164\227\141\020
+\152\007\053\201\132\322\216\267\347\040\321\040\156\044\250\204
+\047\241\127\254\252\125\130\057\334\331\312\372\150\004\236\355
+\104\044\371\164\100\073\043\063\253\203\132\030\046\102\266\155
+\124\265\026\140\060\154\261\240\370\270\101\240\135\111\111\322
+\145\005\072\352\376\235\141\274\206\331\277\336\323\272\072\261
+\177\176\222\064\216\311\000\156\334\230\275\334\354\200\005\255
+\002\075\337\145\355\013\003\367\367\026\204\004\061\272\223\224
+\330\362\022\370\212\343\277\102\257\247\324\315\021\027\026\310
+\102\035\024\250\102\366\322\100\206\240\117\043\312\226\105\126
+\140\006\315\267\125\001\246\001\224\145\376\156\005\011\272\264
+\244\252\342\357\130\276\275\047\126\330\357\163\161\133\104\063
+\362\232\162\352\260\136\076\156\251\122\133\354\160\155\265\207
+\217\067\136\074\214\234\316\344\360\316\014\147\101\314\316\366
+\200\253\116\314\114\126\365\301\141\131\223\264\076\246\332\270
+\067\022\237\052\062\343\213\270\041\354\303\053\145\014\357\042
+\336\210\051\073\114\327\372\376\267\341\107\276\234\076\076\203
+\373\121\135\365\150\367\056\041\205\334\277\361\132\342\174\327
+\305\344\203\301\152\353\272\200\132\336\134\055\160\166\370\310
+\345\207\207\312\240\235\241\345\042\022\047\017\104\075\035\154
+\352\324\302\213\057\157\171\253\177\120\246\304\031\247\241\172
+\267\226\371\301\037\142\132\242\103\007\100\136\046\306\254\355
+\256\160\026\305\252\312\162\212\115\260\317\001\213\003\077\156
+\327
+END
+CKA_NSS_MOZILLA_CA_POLICY CK_BBOOL CK_TRUE
+CKA_NSS_SERVER_DISTRUST_AFTER CK_BBOOL CK_FALSE
+CKA_NSS_EMAIL_DISTRUST_AFTER CK_BBOOL CK_FALSE
+
+# Trust for "TWCA CYBER Root CA"
+# Issuer: CN=TWCA CYBER Root CA,OU=Root CA,O=TAIWAN-CA,C=TW
+# Serial Number:40:01:34:8c:c2:00:00:00:00:00:00:00:01:3c:f2:c6
+# Subject: CN=TWCA CYBER Root CA,OU=Root CA,O=TAIWAN-CA,C=TW
+# Not Valid Before: Tue Nov 22 06:54:29 2022
+# Not Valid After : Fri Nov 22 15:59:59 2047
+# Fingerprint (SHA-256): 3F:63:BB:28:14:BE:17:4E:C8:B6:43:9C:F0:8D:6D:56:F0:B7:C4:05:88:3A:56:48:A3:34:42:4D:6B:3E:C5:58
+# Fingerprint (SHA1): F6:B1:1C:1A:83:38:E9:7B:DB:B3:A8:C8:33:24:E0:2D:9C:7F:26:66
+CKA_CLASS CK_OBJECT_CLASS CKO_NSS_TRUST
+CKA_TOKEN CK_BBOOL CK_TRUE
+CKA_PRIVATE CK_BBOOL CK_FALSE
+CKA_MODIFIABLE CK_BBOOL CK_FALSE
+CKA_LABEL UTF8 "TWCA CYBER Root CA"
+CKA_CERT_SHA1_HASH MULTILINE_OCTAL
+\366\261\034\032\203\070\351\173\333\263\250\310\063\044\340\055
+\234\177\046\146
+END
+CKA_CERT_MD5_HASH MULTILINE_OCTAL
+\013\063\240\227\122\225\324\251\375\273\333\156\243\125\133\121
+END
+CKA_ISSUER MULTILINE_OCTAL
+\060\120\061\013\060\011\006\003\125\004\006\023\002\124\127\061
+\022\060\020\006\003\125\004\012\023\011\124\101\111\127\101\116
+\055\103\101\061\020\060\016\006\003\125\004\013\023\007\122\157
+\157\164\040\103\101\061\033\060\031\006\003\125\004\003\023\022
+\124\127\103\101\040\103\131\102\105\122\040\122\157\157\164\040
+\103\101
+END
+CKA_SERIAL_NUMBER MULTILINE_OCTAL
+\002\020\100\001\064\214\302\000\000\000\000\000\000\000\001\074
+\362\306
+END
+CKA_TRUST_SERVER_AUTH CK_TRUST CKT_NSS_TRUSTED_DELEGATOR
+CKA_TRUST_EMAIL_PROTECTION CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
+CKA_TRUST_CODE_SIGNING CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
+CKA_TRUST_STEP_UP_APPROVED CK_BBOOL CK_FALSE
+
+#
+# Certificate "TWCA Global Root CA G2"
+#
+# Issuer: CN=TWCA Global Root CA G2,OU=Root CA,O=TAIWAN-CA,C=TW
+# Serial Number:40:01:34:8c:c2:00:00:00:00:00:00:00:01:97:58:f4
+# Subject: CN=TWCA Global Root CA G2,OU=Root CA,O=TAIWAN-CA,C=TW
+# Not Valid Before: Tue Nov 22 06:42:21 2022
+# Not Valid After : Fri Nov 22 15:59:59 2047
+# Fingerprint (SHA-256): 3A:00:72:D4:9F:FC:04:E9:96:C5:9A:EB:75:99:1D:3C:34:0F:36:15:D6:FD:4D:CE:90:AC:0B:3D:88:EA:D4:F4
+# Fingerprint (SHA1): 73:FE:92:2F:83:63:91:FF:C8:C6:C4:DA:D6:20:2F:6B:07:2E:7F:1B
+CKA_CLASS CK_OBJECT_CLASS CKO_CERTIFICATE
+CKA_TOKEN CK_BBOOL CK_TRUE
+CKA_PRIVATE CK_BBOOL CK_FALSE
+CKA_MODIFIABLE CK_BBOOL CK_FALSE
+CKA_LABEL UTF8 "TWCA Global Root CA G2"
+CKA_CERTIFICATE_TYPE CK_CERTIFICATE_TYPE CKC_X_509
+CKA_SUBJECT MULTILINE_OCTAL
+\060\124\061\013\060\011\006\003\125\004\006\023\002\124\127\061
+\022\060\020\006\003\125\004\012\023\011\124\101\111\127\101\116
+\055\103\101\061\020\060\016\006\003\125\004\013\023\007\122\157
+\157\164\040\103\101\061\037\060\035\006\003\125\004\003\023\026
+\124\127\103\101\040\107\154\157\142\141\154\040\122\157\157\164
+\040\103\101\040\107\062
+END
+CKA_ID UTF8 "0"
+CKA_ISSUER MULTILINE_OCTAL
+\060\124\061\013\060\011\006\003\125\004\006\023\002\124\127\061
+\022\060\020\006\003\125\004\012\023\011\124\101\111\127\101\116
+\055\103\101\061\020\060\016\006\003\125\004\013\023\007\122\157
+\157\164\040\103\101\061\037\060\035\006\003\125\004\003\023\026
+\124\127\103\101\040\107\154\157\142\141\154\040\122\157\157\164
+\040\103\101\040\107\062
+END
+CKA_SERIAL_NUMBER MULTILINE_OCTAL
+\002\020\100\001\064\214\302\000\000\000\000\000\000\000\001\227
+\130\364
+END
+CKA_VALUE MULTILINE_OCTAL
+\060\202\005\225\060\202\003\175\240\003\002\001\002\002\020\100
+\001\064\214\302\000\000\000\000\000\000\000\001\227\130\364\060
+\015\006\011\052\206\110\206\367\015\001\001\014\005\000\060\124
+\061\013\060\011\006\003\125\004\006\023\002\124\127\061\022\060
+\020\006\003\125\004\012\023\011\124\101\111\127\101\116\055\103
+\101\061\020\060\016\006\003\125\004\013\023\007\122\157\157\164
+\040\103\101\061\037\060\035\006\003\125\004\003\023\026\124\127
+\103\101\040\107\154\157\142\141\154\040\122\157\157\164\040\103
+\101\040\107\062\060\036\027\015\062\062\061\061\062\062\060\066
+\064\062\062\061\132\027\015\064\067\061\061\062\062\061\065\065
+\071\065\071\132\060\124\061\013\060\011\006\003\125\004\006\023
+\002\124\127\061\022\060\020\006\003\125\004\012\023\011\124\101
+\111\127\101\116\055\103\101\061\020\060\016\006\003\125\004\013
+\023\007\122\157\157\164\040\103\101\061\037\060\035\006\003\125
+\004\003\023\026\124\127\103\101\040\107\154\157\142\141\154\040
+\122\157\157\164\040\103\101\040\107\062\060\202\002\042\060\015
+\006\011\052\206\110\206\367\015\001\001\001\005\000\003\202\002
+\017\000\060\202\002\012\002\202\002\001\000\252\016\325\040\222
+\001\255\202\371\014\010\221\064\153\212\026\320\106\026\377\003
+\270\330\215\352\223\064\373\377\053\275\375\156\252\334\233\362
+\206\201\125\365\211\034\304\215\165\152\130\170\221\023\036\002
+\023\160\075\357\276\012\347\000\217\270\061\345\164\305\060\276
+\377\175\326\231\345\302\102\243\317\041\326\263\010\177\221\325
+\141\346\242\225\020\015\357\136\227\013\111\070\325\042\260\327
+\213\131\157\237\065\233\177\322\221\314\172\177\273\240\237\336
+\125\063\366\113\215\012\352\175\011\300\171\334\275\104\342\376
+\034\347\144\041\050\317\004\112\342\264\277\206\171\052\273\016
+\223\311\217\136\254\060\071\122\220\007\271\352\234\046\102\024
+\304\147\106\376\321\032\150\241\076\120\031\243\046\012\047\051
+\220\302\366\264\353\163\232\170\036\341\230\364\145\014\065\041
+\006\370\013\336\142\345\115\301\263\135\331\271\372\141\227\052
+\343\352\307\104\125\044\222\376\022\247\077\304\167\340\055\002
+\201\007\325\373\175\346\020\236\072\264\250\357\354\373\120\352
+\065\317\314\176\273\102\271\104\154\122\351\277\052\162\037\077
+\336\233\160\351\334\132\305\073\273\277\360\131\205\257\057\301
+\260\024\171\005\254\165\237\045\365\021\047\006\140\041\307\155
+\145\276\250\211\234\345\254\106\337\370\135\104\003\215\140\275
+\367\261\015\314\057\357\101\124\057\356\153\225\271\116\174\064
+\337\073\371\167\235\175\315\007\075\034\006\063\022\200\354\162
+\234\362\055\202\332\325\073\304\307\371\004\303\144\002\174\365
+\065\140\247\264\106\051\056\033\357\245\130\200\056\172\211\121
+\070\066\074\375\241\167\270\200\060\320\212\336\215\247\064\046
+\354\043\273\030\125\030\066\105\356\355\001\006\252\115\277\144
+\014\312\230\227\032\061\002\146\370\170\150\133\210\337\011\250
+\347\233\372\064\155\160\034\041\255\010\213\362\241\266\254\166
+\152\277\361\200\045\000\276\074\036\115\256\271\074\266\225\143
+\275\153\176\107\022\220\125\105\021\215\354\027\037\301\276\047
+\201\223\127\143\151\000\046\167\213\303\131\345\173\321\015\104
+\362\250\360\367\205\232\005\367\302\056\160\232\223\205\330\225
+\220\061\220\124\246\354\013\237\067\105\017\002\003\001\000\001
+\243\143\060\141\060\016\006\003\125\035\017\001\001\377\004\004
+\003\002\001\006\060\017\006\003\125\035\023\001\001\377\004\005
+\060\003\001\001\377\060\037\006\003\125\035\043\004\030\060\026
+\200\024\222\214\324\066\321\133\107\123\304\161\015\204\335\144
+\052\365\066\144\100\347\060\035\006\003\125\035\016\004\026\004
+\024\222\214\324\066\321\133\107\123\304\161\015\204\335\144\052
+\365\066\144\100\347\060\015\006\011\052\206\110\206\367\015\001
+\001\014\005\000\003\202\002\001\000\045\374\113\332\220\264\332
+\165\347\101\072\201\321\246\376\240\152\363\030\161\142\152\044
+\010\213\251\172\115\311\125\316\317\020\050\056\004\031\226\005
+\317\135\002\040\052\073\263\125\077\001\315\102\315\262\167\355
+\377\165\363\174\167\333\226\245\317\214\147\006\364\244\233\162
+\366\041\111\011\230\243\062\136\167\132\143\011\357\142\103\227
+\002\070\265\352\074\030\120\150\374\131\133\331\171\324\361\344
+\126\110\023\126\330\323\161\013\136\170\224\070\021\105\372\005
+\027\365\016\165\036\142\122\141\106\272\056\031\255\206\264\210
+\017\261\120\346\100\000\064\032\225\235\223\340\121\371\324\125
+\106\351\225\074\045\206\056\227\327\001\061\030\104\354\034\140
+\351\175\151\257\062\370\227\100\045\044\266\215\032\125\074\305
+\267\367\274\006\122\073\161\060\160\076\161\027\176\361\146\004
+\136\135\274\212\061\103\246\222\035\173\124\322\245\066\213\157
+\215\326\136\332\324\303\056\035\337\071\125\140\202\060\236\047
+\377\216\200\335\143\114\246\125\065\330\320\063\251\200\155\076
+\136\235\314\250\147\200\146\372\231\127\014\122\312\031\165\260
+\070\065\125\052\201\305\214\036\126\327\137\220\362\040\330\332
+\340\146\161\351\262\170\253\147\271\044\156\153\066\162\374\157
+\215\375\177\162\071\050\147\122\221\005\037\127\145\322\243\247
+\015\141\372\241\347\325\065\106\225\311\006\207\366\060\354\062
+\121\251\254\126\300\041\116\243\024\164\005\072\274\343\277\155
+\075\116\077\136\245\244\155\051\277\204\121\165\123\216\206\032
+\365\121\160\052\015\034\116\100\341\375\243\343\245\053\147\220
+\222\307\154\256\205\277\072\233\027\025\312\234\052\223\324\115
+\071\015\274\040\010\243\215\210\154\011\015\214\256\104\041\115
+\311\161\354\330\046\327\027\236\055\021\030\074\243\042\175\270
+\047\124\277\150\310\073\102\314\217\136\116\347\334\302\305\372
+\152\104\017\215\126\210\172\337\211\204\154\240\263\076\075\361
+\145\000\011\210\352\052\353\100\316\263\135\254\062\027\256\301
+\233\351\320\301\365\111\224\335\247\316\174\132\007\353\256\040
+\234\027\060\222\151\223\162\363\232\133\161\233\376\152\337\172
+\060\151\216\263\056\333\017\054\335
+END
+CKA_NSS_MOZILLA_CA_POLICY CK_BBOOL CK_TRUE
+CKA_NSS_SERVER_DISTRUST_AFTER CK_BBOOL CK_FALSE
+CKA_NSS_EMAIL_DISTRUST_AFTER CK_BBOOL CK_FALSE
+
+# Trust for "TWCA Global Root CA G2"
+# Issuer: CN=TWCA Global Root CA G2,OU=Root CA,O=TAIWAN-CA,C=TW
+# Serial Number:40:01:34:8c:c2:00:00:00:00:00:00:00:01:97:58:f4
+# Subject: CN=TWCA Global Root CA G2,OU=Root CA,O=TAIWAN-CA,C=TW
+# Not Valid Before: Tue Nov 22 06:42:21 2022
+# Not Valid After : Fri Nov 22 15:59:59 2047
+# Fingerprint (SHA-256): 3A:00:72:D4:9F:FC:04:E9:96:C5:9A:EB:75:99:1D:3C:34:0F:36:15:D6:FD:4D:CE:90:AC:0B:3D:88:EA:D4:F4
+# Fingerprint (SHA1): 73:FE:92:2F:83:63:91:FF:C8:C6:C4:DA:D6:20:2F:6B:07:2E:7F:1B
+CKA_CLASS CK_OBJECT_CLASS CKO_NSS_TRUST
+CKA_TOKEN CK_BBOOL CK_TRUE
+CKA_PRIVATE CK_BBOOL CK_FALSE
+CKA_MODIFIABLE CK_BBOOL CK_FALSE
+CKA_LABEL UTF8 "TWCA Global Root CA G2"
+CKA_CERT_SHA1_HASH MULTILINE_OCTAL
+\163\376\222\057\203\143\221\377\310\306\304\332\326\040\057\153
+\007\056\177\033
+END
+CKA_CERT_MD5_HASH MULTILINE_OCTAL
+\023\215\135\372\031\265\346\253\144\173\020\164\160\032\043\056
+END
+CKA_ISSUER MULTILINE_OCTAL
+\060\124\061\013\060\011\006\003\125\004\006\023\002\124\127\061
+\022\060\020\006\003\125\004\012\023\011\124\101\111\127\101\116
+\055\103\101\061\020\060\016\006\003\125\004\013\023\007\122\157
+\157\164\040\103\101\061\037\060\035\006\003\125\004\003\023\026
+\124\127\103\101\040\107\154\157\142\141\154\040\122\157\157\164
+\040\103\101\040\107\062
+END
+CKA_SERIAL_NUMBER MULTILINE_OCTAL
+\002\020\100\001\064\214\302\000\000\000\000\000\000\000\001\227
+\130\364
+END
+CKA_TRUST_SERVER_AUTH CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
+CKA_TRUST_EMAIL_PROTECTION CK_TRUST CKT_NSS_TRUSTED_DELEGATOR
+CKA_TRUST_CODE_SIGNING CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
+CKA_TRUST_STEP_UP_APPROVED CK_BBOOL CK_FALSE
+
+#
+# Certificate "SecureSign Root CA12"
+#
+# Issuer: CN=SecureSign Root CA12,O="Cybertrust Japan Co., Ltd.",C=JP
+# Serial Number:66:f9:c7:c1:af:ec:c2:51:b4:ed:53:97:e6:e6:82:c3:2b:1c:90:16
+# Subject: CN=SecureSign Root CA12,O="Cybertrust Japan Co., Ltd.",C=JP
+# Not Valid Before: Wed Apr 08 05:36:46 2020
+# Not Valid After : Sun Apr 08 05:36:46 2040
+# Fingerprint (SHA-256): 3F:03:4B:B5:70:4D:44:B2:D0:85:45:A0:20:57:DE:93:EB:F3:90:5F:CE:72:1A:CB:C7:30:C0:6D:DA:EE:90:4E
+# Fingerprint (SHA1): 7A:22:1E:3D:DE:1B:06:AC:9E:C8:47:70:16:8E:3C:E5:F7:6B:06:F4
+CKA_CLASS CK_OBJECT_CLASS CKO_CERTIFICATE
+CKA_TOKEN CK_BBOOL CK_TRUE
+CKA_PRIVATE CK_BBOOL CK_FALSE
+CKA_MODIFIABLE CK_BBOOL CK_FALSE
+CKA_LABEL UTF8 "SecureSign Root CA12"
+CKA_CERTIFICATE_TYPE CK_CERTIFICATE_TYPE CKC_X_509
+CKA_SUBJECT MULTILINE_OCTAL
+\060\121\061\013\060\011\006\003\125\004\006\023\002\112\120\061
+\043\060\041\006\003\125\004\012\023\032\103\171\142\145\162\164
+\162\165\163\164\040\112\141\160\141\156\040\103\157\056\054\040
+\114\164\144\056\061\035\060\033\006\003\125\004\003\023\024\123
+\145\143\165\162\145\123\151\147\156\040\122\157\157\164\040\103
+\101\061\062
+END
+CKA_ID UTF8 "0"
+CKA_ISSUER MULTILINE_OCTAL
+\060\121\061\013\060\011\006\003\125\004\006\023\002\112\120\061
+\043\060\041\006\003\125\004\012\023\032\103\171\142\145\162\164
+\162\165\163\164\040\112\141\160\141\156\040\103\157\056\054\040
+\114\164\144\056\061\035\060\033\006\003\125\004\003\023\024\123
+\145\143\165\162\145\123\151\147\156\040\122\157\157\164\040\103
+\101\061\062
+END
+CKA_SERIAL_NUMBER MULTILINE_OCTAL
+\002\024\146\371\307\301\257\354\302\121\264\355\123\227\346\346
+\202\303\053\034\220\026
+END
+CKA_VALUE MULTILINE_OCTAL
+\060\202\003\162\060\202\002\132\240\003\002\001\002\002\024\146
+\371\307\301\257\354\302\121\264\355\123\227\346\346\202\303\053
+\034\220\026\060\015\006\011\052\206\110\206\367\015\001\001\013
+\005\000\060\121\061\013\060\011\006\003\125\004\006\023\002\112
+\120\061\043\060\041\006\003\125\004\012\023\032\103\171\142\145
+\162\164\162\165\163\164\040\112\141\160\141\156\040\103\157\056
+\054\040\114\164\144\056\061\035\060\033\006\003\125\004\003\023
+\024\123\145\143\165\162\145\123\151\147\156\040\122\157\157\164
+\040\103\101\061\062\060\036\027\015\062\060\060\064\060\070\060
+\065\063\066\064\066\132\027\015\064\060\060\064\060\070\060\065
+\063\066\064\066\132\060\121\061\013\060\011\006\003\125\004\006
+\023\002\112\120\061\043\060\041\006\003\125\004\012\023\032\103
+\171\142\145\162\164\162\165\163\164\040\112\141\160\141\156\040
+\103\157\056\054\040\114\164\144\056\061\035\060\033\006\003\125
+\004\003\023\024\123\145\143\165\162\145\123\151\147\156\040\122
+\157\157\164\040\103\101\061\062\060\202\001\042\060\015\006\011
+\052\206\110\206\367\015\001\001\001\005\000\003\202\001\017\000
+\060\202\001\012\002\202\001\001\000\272\071\301\067\172\150\105
+\053\024\264\353\344\023\353\127\165\043\115\217\044\055\026\350
+\256\216\311\175\244\127\073\052\166\045\063\203\154\352\062\212
+\224\233\116\074\226\344\375\121\277\231\311\223\176\277\371\255
+\247\262\110\053\007\034\047\365\114\274\160\022\167\244\205\124
+\265\375\220\172\344\243\344\121\130\003\315\020\171\171\356\153
+\223\037\144\216\153\144\253\243\023\343\161\376\175\253\234\335
+\047\123\067\263\252\030\302\131\046\354\133\037\322\346\145\174
+\357\223\275\330\130\134\013\300\343\145\157\074\307\312\131\343
+\376\156\137\254\203\276\375\135\045\116\052\051\073\326\013\253
+\027\062\170\244\341\076\224\106\276\142\156\233\336\106\250\261
+\026\347\205\156\364\010\100\105\021\240\236\124\104\204\367\330
+\066\316\365\120\107\334\054\060\233\356\300\365\226\322\376\011
+\206\307\006\131\256\117\256\216\021\230\173\363\013\122\252\142
+\046\252\041\337\216\045\063\171\227\026\111\215\365\076\325\107
+\237\067\061\111\063\162\005\115\014\266\125\214\361\127\217\212
+\207\321\255\305\021\022\071\240\255\002\003\001\000\001\243\102
+\060\100\060\017\006\003\125\035\023\001\001\377\004\005\060\003
+\001\001\377\060\016\006\003\125\035\017\001\001\377\004\004\003
+\002\001\006\060\035\006\003\125\035\016\004\026\004\024\127\064
+\363\164\317\004\113\325\045\346\361\100\266\054\114\331\055\351
+\240\255\060\015\006\011\052\206\110\206\367\015\001\001\013\005
+\000\003\202\001\001\000\076\273\333\027\026\322\362\024\001\040
+\054\070\203\113\255\276\312\205\172\232\266\233\153\246\341\374
+\245\072\254\255\264\050\072\257\327\001\203\111\053\143\242\335
+\232\144\016\230\134\157\335\216\273\212\124\042\055\112\023\363
+\256\100\103\333\117\221\267\206\032\354\000\264\101\201\244\117
+\372\152\213\210\263\166\010\162\052\111\100\303\323\303\205\211
+\230\020\245\235\157\031\267\273\317\172\145\125\333\067\353\074
+\212\162\062\227\036\232\051\076\255\215\346\243\033\155\365\165
+\032\346\260\150\271\133\242\356\151\107\047\065\241\206\231\200
+\363\063\113\341\153\244\046\303\357\164\131\154\172\242\144\266
+\036\104\303\120\340\017\071\075\251\063\361\245\363\322\275\142
+\204\254\216\034\251\315\132\275\067\073\156\012\042\264\364\025
+\347\221\130\305\072\104\323\225\050\331\300\145\351\162\312\320
+\017\275\037\263\025\331\251\343\244\107\011\236\340\313\067\373
+\375\275\227\325\276\030\032\151\242\071\201\331\032\365\253\177
+\310\343\342\147\013\235\364\014\352\124\337\322\262\257\261\042
+\361\040\337\274\104\034
+END
+CKA_NSS_MOZILLA_CA_POLICY CK_BBOOL CK_TRUE
+CKA_NSS_SERVER_DISTRUST_AFTER CK_BBOOL CK_FALSE
+CKA_NSS_EMAIL_DISTRUST_AFTER CK_BBOOL CK_FALSE
+
+# Trust for "SecureSign Root CA12"
+# Issuer: CN=SecureSign Root CA12,O="Cybertrust Japan Co., Ltd.",C=JP
+# Serial Number:66:f9:c7:c1:af:ec:c2:51:b4:ed:53:97:e6:e6:82:c3:2b:1c:90:16
+# Subject: CN=SecureSign Root CA12,O="Cybertrust Japan Co., Ltd.",C=JP
+# Not Valid Before: Wed Apr 08 05:36:46 2020
+# Not Valid After : Sun Apr 08 05:36:46 2040
+# Fingerprint (SHA-256): 3F:03:4B:B5:70:4D:44:B2:D0:85:45:A0:20:57:DE:93:EB:F3:90:5F:CE:72:1A:CB:C7:30:C0:6D:DA:EE:90:4E
+# Fingerprint (SHA1): 7A:22:1E:3D:DE:1B:06:AC:9E:C8:47:70:16:8E:3C:E5:F7:6B:06:F4
+CKA_CLASS CK_OBJECT_CLASS CKO_NSS_TRUST
+CKA_TOKEN CK_BBOOL CK_TRUE
+CKA_PRIVATE CK_BBOOL CK_FALSE
+CKA_MODIFIABLE CK_BBOOL CK_FALSE
+CKA_LABEL UTF8 "SecureSign Root CA12"
+CKA_CERT_SHA1_HASH MULTILINE_OCTAL
+\172\042\036\075\336\033\006\254\236\310\107\160\026\216\074\345
+\367\153\006\364
+END
+CKA_CERT_MD5_HASH MULTILINE_OCTAL
+\306\211\312\144\102\233\142\010\111\013\036\177\351\007\075\350
+END
+CKA_ISSUER MULTILINE_OCTAL
+\060\121\061\013\060\011\006\003\125\004\006\023\002\112\120\061
+\043\060\041\006\003\125\004\012\023\032\103\171\142\145\162\164
+\162\165\163\164\040\112\141\160\141\156\040\103\157\056\054\040
+\114\164\144\056\061\035\060\033\006\003\125\004\003\023\024\123
+\145\143\165\162\145\123\151\147\156\040\122\157\157\164\040\103
+\101\061\062
+END
+CKA_SERIAL_NUMBER MULTILINE_OCTAL
+\002\024\146\371\307\301\257\354\302\121\264\355\123\227\346\346
+\202\303\053\034\220\026
+END
+CKA_TRUST_SERVER_AUTH CK_TRUST CKT_NSS_TRUSTED_DELEGATOR
+CKA_TRUST_EMAIL_PROTECTION CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
+CKA_TRUST_CODE_SIGNING CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
+CKA_TRUST_STEP_UP_APPROVED CK_BBOOL CK_FALSE
+
+#
+# Certificate "SecureSign Root CA14"
+#
+# Issuer: CN=SecureSign Root CA14,O="Cybertrust Japan Co., Ltd.",C=JP
+# Serial Number:64:db:5a:0c:20:4e:e8:d7:29:77:c8:50:27:a2:5a:27:dd:2d:f2:cb
+# Subject: CN=SecureSign Root CA14,O="Cybertrust Japan Co., Ltd.",C=JP
+# Not Valid Before: Wed Apr 08 07:06:19 2020
+# Not Valid After : Sat Apr 08 07:06:19 2045
+# Fingerprint (SHA-256): 4B:00:9C:10:34:49:4F:9A:B5:6B:BA:3B:A1:D6:27:31:FC:4D:20:D8:95:5A:DC:EC:10:A9:25:60:72:61:E3:38
+# Fingerprint (SHA1): DD:50:C0:F7:79:B3:64:2E:74:A2:B8:9D:9F:D3:40:DD:BB:F0:F2:4F
+CKA_CLASS CK_OBJECT_CLASS CKO_CERTIFICATE
+CKA_TOKEN CK_BBOOL CK_TRUE
+CKA_PRIVATE CK_BBOOL CK_FALSE
+CKA_MODIFIABLE CK_BBOOL CK_FALSE
+CKA_LABEL UTF8 "SecureSign Root CA14"
+CKA_CERTIFICATE_TYPE CK_CERTIFICATE_TYPE CKC_X_509
+CKA_SUBJECT MULTILINE_OCTAL
+\060\121\061\013\060\011\006\003\125\004\006\023\002\112\120\061
+\043\060\041\006\003\125\004\012\023\032\103\171\142\145\162\164
+\162\165\163\164\040\112\141\160\141\156\040\103\157\056\054\040
+\114\164\144\056\061\035\060\033\006\003\125\004\003\023\024\123
+\145\143\165\162\145\123\151\147\156\040\122\157\157\164\040\103
+\101\061\064
+END
+CKA_ID UTF8 "0"
+CKA_ISSUER MULTILINE_OCTAL
+\060\121\061\013\060\011\006\003\125\004\006\023\002\112\120\061
+\043\060\041\006\003\125\004\012\023\032\103\171\142\145\162\164
+\162\165\163\164\040\112\141\160\141\156\040\103\157\056\054\040
+\114\164\144\056\061\035\060\033\006\003\125\004\003\023\024\123
+\145\143\165\162\145\123\151\147\156\040\122\157\157\164\040\103
+\101\061\064
+END
+CKA_SERIAL_NUMBER MULTILINE_OCTAL
+\002\024\144\333\132\014\040\116\350\327\051\167\310\120\047\242
+\132\047\335\055\362\313
+END
+CKA_VALUE MULTILINE_OCTAL
+\060\202\005\162\060\202\003\132\240\003\002\001\002\002\024\144
+\333\132\014\040\116\350\327\051\167\310\120\047\242\132\047\335
+\055\362\313\060\015\006\011\052\206\110\206\367\015\001\001\014
+\005\000\060\121\061\013\060\011\006\003\125\004\006\023\002\112
+\120\061\043\060\041\006\003\125\004\012\023\032\103\171\142\145
+\162\164\162\165\163\164\040\112\141\160\141\156\040\103\157\056
+\054\040\114\164\144\056\061\035\060\033\006\003\125\004\003\023
+\024\123\145\143\165\162\145\123\151\147\156\040\122\157\157\164
+\040\103\101\061\064\060\036\027\015\062\060\060\064\060\070\060
+\067\060\066\061\071\132\027\015\064\065\060\064\060\070\060\067
+\060\066\061\071\132\060\121\061\013\060\011\006\003\125\004\006
+\023\002\112\120\061\043\060\041\006\003\125\004\012\023\032\103
+\171\142\145\162\164\162\165\163\164\040\112\141\160\141\156\040
+\103\157\056\054\040\114\164\144\056\061\035\060\033\006\003\125
+\004\003\023\024\123\145\143\165\162\145\123\151\147\156\040\122
+\157\157\164\040\103\101\061\064\060\202\002\042\060\015\006\011
+\052\206\110\206\367\015\001\001\001\005\000\003\202\002\017\000
+\060\202\002\012\002\202\002\001\000\305\322\172\241\326\212\277
+\026\061\320\230\321\072\224\374\132\270\156\042\301\142\367\247
+\012\047\357\120\366\056\261\236\150\022\360\154\044\143\071\361
+\360\337\020\306\336\267\122\040\325\122\133\102\231\236\363\240
+\276\122\037\137\314\147\155\247\056\120\242\301\227\215\266\370
+\225\365\260\272\334\235\340\276\313\337\367\070\362\107\365\246
+\232\222\225\052\142\131\120\013\242\261\065\347\145\262\141\262
+\352\222\161\151\344\051\360\117\201\201\004\074\262\245\133\324
+\305\250\131\147\173\125\034\111\253\172\235\302\347\163\115\357
+\315\011\302\304\127\022\333\001\016\043\171\011\007\073\242\350
+\374\212\317\217\300\106\044\234\070\047\340\203\235\033\240\277
+\170\025\020\353\206\116\012\132\375\337\332\054\202\176\356\312
+\366\051\341\372\161\241\367\210\150\234\234\360\215\276\017\111
+\221\330\352\072\371\375\320\150\161\333\351\265\053\116\202\222
+\157\146\037\340\360\334\114\354\312\321\352\272\164\006\371\263
+\204\220\224\321\137\216\163\031\020\135\002\345\160\245\300\020
+\320\020\174\157\305\130\111\264\260\156\232\332\175\225\365\314
+\332\002\257\270\054\175\171\217\276\103\361\371\050\050\215\011
+\103\370\010\335\153\310\213\054\044\261\215\122\007\275\170\233
+\313\312\150\262\244\335\014\114\171\140\306\231\321\223\361\060
+\032\007\323\256\042\302\352\316\361\204\011\314\340\024\156\177
+\077\176\322\202\205\254\334\251\026\116\205\240\140\313\366\234
+\327\310\263\216\355\306\233\230\165\015\125\350\137\345\225\213
+\002\244\256\103\051\050\021\244\346\022\060\001\113\165\153\036
+\146\235\171\057\245\166\057\035\100\264\155\311\175\171\010\354
+\321\152\266\135\052\262\245\146\275\153\205\364\164\126\303\365
+\347\165\122\050\054\245\377\146\107\245\324\376\376\236\124\277
+\145\176\001\326\060\217\245\066\234\242\120\034\356\070\200\001
+\110\306\307\164\364\306\254\303\100\111\026\141\164\054\257\214
+\157\065\355\173\030\000\133\066\074\234\120\015\312\222\063\020
+\361\046\111\155\337\165\044\067\202\042\327\350\226\375\025\113
+\002\226\076\007\162\225\176\253\075\114\056\327\312\360\337\340
+\130\077\055\057\004\232\070\243\001\002\003\001\000\001\243\102
+\060\100\060\017\006\003\125\035\023\001\001\377\004\005\060\003
+\001\001\377\060\016\006\003\125\035\017\001\001\377\004\004\003
+\002\001\006\060\035\006\003\125\035\016\004\026\004\024\006\223
+\243\012\136\050\151\067\252\141\035\353\353\374\055\157\043\344
+\363\240\060\015\006\011\052\206\110\206\367\015\001\001\014\005
+\000\003\202\002\001\000\226\200\162\011\006\176\234\314\223\004
+\026\273\240\072\215\222\116\267\021\032\012\161\161\020\315\004
+\255\177\245\105\120\020\146\116\112\101\242\003\331\021\117\172
+\067\271\113\342\306\217\062\146\165\045\373\353\316\077\003\051
+\046\215\270\026\035\366\037\063\156\110\346\350\370\127\262\033
+\171\337\073\207\012\342\144\272\000\312\154\357\176\320\043\353
+\170\217\377\144\233\064\067\237\065\145\242\244\000\075\022\043
+\226\130\135\312\143\207\306\243\007\210\115\347\151\166\212\123
+\315\361\117\354\102\362\223\343\231\244\067\074\207\270\142\333
+\360\354\037\067\077\067\137\103\314\121\235\265\360\227\302\267
+\205\152\150\013\104\036\345\121\356\223\316\113\156\206\301\322
+\014\044\131\066\032\237\054\221\217\343\030\333\224\225\012\355
+\221\252\016\231\334\226\123\343\141\203\306\026\272\043\272\334
+\335\176\032\306\173\102\266\331\132\005\334\232\137\325\337\270
+\332\107\175\332\070\333\254\071\325\036\153\154\052\027\214\141
+\315\261\155\162\001\303\303\040\000\142\150\026\061\325\166\252
+\206\273\016\252\236\306\371\360\331\370\015\041\002\344\305\050
+\026\131\021\271\331\151\163\052\222\170\270\222\127\233\010\362
+\072\345\057\225\260\130\267\153\040\024\155\024\357\012\274\176
+\330\125\330\210\332\057\372\031\245\373\213\340\177\071\365\162
+\053\205\304\054\254\357\031\105\222\114\263\141\007\334\115\037
+\156\322\201\023\134\232\363\022\147\203\317\233\077\213\237\235
+\244\271\250\226\003\172\305\356\040\336\063\332\057\236\032\172
+\164\036\341\356\314\132\072\004\335\263\032\004\250\024\143\254
+\267\107\022\203\232\154\365\346\351\025\025\221\032\204\031\016
+\224\104\347\022\216\045\133\200\147\031\334\143\223\020\013\145
+\056\212\372\011\232\116\332\206\050\175\252\141\065\330\016\247
+\050\032\273\122\340\170\370\154\272\154\260\156\271\207\136\351
+\231\065\067\361\075\144\053\251\240\064\223\317\143\057\325\201
+\337\256\143\047\245\036\116\215\334\051\170\131\370\371\241\040
+\214\247\046\100\156\202\162\315\170\262\310\217\074\036\163\347
+\301\037\277\317\316\245\052\233\333\104\144\062\240\273\177\134
+\045\023\110\265\177\222
+END
+CKA_NSS_MOZILLA_CA_POLICY CK_BBOOL CK_TRUE
+CKA_NSS_SERVER_DISTRUST_AFTER CK_BBOOL CK_FALSE
+CKA_NSS_EMAIL_DISTRUST_AFTER CK_BBOOL CK_FALSE
+
+# Trust for "SecureSign Root CA14"
+# Issuer: CN=SecureSign Root CA14,O="Cybertrust Japan Co., Ltd.",C=JP
+# Serial Number:64:db:5a:0c:20:4e:e8:d7:29:77:c8:50:27:a2:5a:27:dd:2d:f2:cb
+# Subject: CN=SecureSign Root CA14,O="Cybertrust Japan Co., Ltd.",C=JP
+# Not Valid Before: Wed Apr 08 07:06:19 2020
+# Not Valid After : Sat Apr 08 07:06:19 2045
+# Fingerprint (SHA-256): 4B:00:9C:10:34:49:4F:9A:B5:6B:BA:3B:A1:D6:27:31:FC:4D:20:D8:95:5A:DC:EC:10:A9:25:60:72:61:E3:38
+# Fingerprint (SHA1): DD:50:C0:F7:79:B3:64:2E:74:A2:B8:9D:9F:D3:40:DD:BB:F0:F2:4F
+CKA_CLASS CK_OBJECT_CLASS CKO_NSS_TRUST
+CKA_TOKEN CK_BBOOL CK_TRUE
+CKA_PRIVATE CK_BBOOL CK_FALSE
+CKA_MODIFIABLE CK_BBOOL CK_FALSE
+CKA_LABEL UTF8 "SecureSign Root CA14"
+CKA_CERT_SHA1_HASH MULTILINE_OCTAL
+\335\120\300\367\171\263\144\056\164\242\270\235\237\323\100\335
+\273\360\362\117
+END
+CKA_CERT_MD5_HASH MULTILINE_OCTAL
+\161\015\162\372\222\031\145\136\211\004\254\026\063\360\274\325
+END
+CKA_ISSUER MULTILINE_OCTAL
+\060\121\061\013\060\011\006\003\125\004\006\023\002\112\120\061
+\043\060\041\006\003\125\004\012\023\032\103\171\142\145\162\164
+\162\165\163\164\040\112\141\160\141\156\040\103\157\056\054\040
+\114\164\144\056\061\035\060\033\006\003\125\004\003\023\024\123
+\145\143\165\162\145\123\151\147\156\040\122\157\157\164\040\103
+\101\061\064
+END
+CKA_SERIAL_NUMBER MULTILINE_OCTAL
+\002\024\144\333\132\014\040\116\350\327\051\167\310\120\047\242
+\132\047\335\055\362\313
+END
+CKA_TRUST_SERVER_AUTH CK_TRUST CKT_NSS_TRUSTED_DELEGATOR
+CKA_TRUST_EMAIL_PROTECTION CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
+CKA_TRUST_CODE_SIGNING CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
+CKA_TRUST_STEP_UP_APPROVED CK_BBOOL CK_FALSE
+
+#
+# Certificate "SecureSign Root CA15"
+#
+# Issuer: CN=SecureSign Root CA15,O="Cybertrust Japan Co., Ltd.",C=JP
+# Serial Number:16:15:c7:c3:d8:49:a7:be:69:0c:8a:88:ed:f0:70:f9:dd:b7:3e:87
+# Subject: CN=SecureSign Root CA15,O="Cybertrust Japan Co., Ltd.",C=JP
+# Not Valid Before: Wed Apr 08 08:32:56 2020
+# Not Valid After : Sat Apr 08 08:32:56 2045
+# Fingerprint (SHA-256): E7:78:F0:F0:95:FE:84:37:29:CD:1A:00:82:17:9E:53:14:A9:C2:91:44:28:05:E1:FB:1D:8F:B6:B8:88:6C:3A
+# Fingerprint (SHA1): CB:BA:83:C8:C1:5A:5D:F1:F9:73:6F:CA:D7:EF:28:13:06:4A:07:7D
+CKA_CLASS CK_OBJECT_CLASS CKO_CERTIFICATE
+CKA_TOKEN CK_BBOOL CK_TRUE
+CKA_PRIVATE CK_BBOOL CK_FALSE
+CKA_MODIFIABLE CK_BBOOL CK_FALSE
+CKA_LABEL UTF8 "SecureSign Root CA15"
+CKA_CERTIFICATE_TYPE CK_CERTIFICATE_TYPE CKC_X_509
+CKA_SUBJECT MULTILINE_OCTAL
+\060\121\061\013\060\011\006\003\125\004\006\023\002\112\120\061
+\043\060\041\006\003\125\004\012\023\032\103\171\142\145\162\164
+\162\165\163\164\040\112\141\160\141\156\040\103\157\056\054\040
+\114\164\144\056\061\035\060\033\006\003\125\004\003\023\024\123
+\145\143\165\162\145\123\151\147\156\040\122\157\157\164\040\103
+\101\061\065
+END
+CKA_ID UTF8 "0"
+CKA_ISSUER MULTILINE_OCTAL
+\060\121\061\013\060\011\006\003\125\004\006\023\002\112\120\061
+\043\060\041\006\003\125\004\012\023\032\103\171\142\145\162\164
+\162\165\163\164\040\112\141\160\141\156\040\103\157\056\054\040
+\114\164\144\056\061\035\060\033\006\003\125\004\003\023\024\123
+\145\143\165\162\145\123\151\147\156\040\122\157\157\164\040\103
+\101\061\065
+END
+CKA_SERIAL_NUMBER MULTILINE_OCTAL
+\002\024\026\025\307\303\330\111\247\276\151\014\212\210\355\360
+\160\371\335\267\076\207
+END
+CKA_VALUE MULTILINE_OCTAL
+\060\202\002\043\060\202\001\251\240\003\002\001\002\002\024\026
+\025\307\303\330\111\247\276\151\014\212\210\355\360\160\371\335
+\267\076\207\060\012\006\010\052\206\110\316\075\004\003\003\060
+\121\061\013\060\011\006\003\125\004\006\023\002\112\120\061\043
+\060\041\006\003\125\004\012\023\032\103\171\142\145\162\164\162
+\165\163\164\040\112\141\160\141\156\040\103\157\056\054\040\114
+\164\144\056\061\035\060\033\006\003\125\004\003\023\024\123\145
+\143\165\162\145\123\151\147\156\040\122\157\157\164\040\103\101
+\061\065\060\036\027\015\062\060\060\064\060\070\060\070\063\062
+\065\066\132\027\015\064\065\060\064\060\070\060\070\063\062\065
+\066\132\060\121\061\013\060\011\006\003\125\004\006\023\002\112
+\120\061\043\060\041\006\003\125\004\012\023\032\103\171\142\145
+\162\164\162\165\163\164\040\112\141\160\141\156\040\103\157\056
+\054\040\114\164\144\056\061\035\060\033\006\003\125\004\003\023
+\024\123\145\143\165\162\145\123\151\147\156\040\122\157\157\164
+\040\103\101\061\065\060\166\060\020\006\007\052\206\110\316\075
+\002\001\006\005\053\201\004\000\042\003\142\000\004\013\120\164
+\215\144\062\231\231\263\322\140\010\270\042\216\106\164\054\170
+\300\053\104\055\155\137\035\311\256\113\122\040\203\075\270\024
+\155\123\207\140\236\137\154\205\333\006\024\225\340\307\050\377
+\235\137\344\252\361\263\213\155\355\117\057\113\311\112\224\221
+\144\165\376\001\354\301\330\353\172\224\170\126\030\103\137\153
+\201\313\366\274\332\264\014\266\051\223\010\151\217\243\102\060
+\100\060\017\006\003\125\035\023\001\001\377\004\005\060\003\001
+\001\377\060\016\006\003\125\035\017\001\001\377\004\004\003\002
+\001\006\060\035\006\003\125\035\016\004\026\004\024\353\101\310
+\256\374\325\236\121\110\365\275\213\364\207\040\223\101\053\323
+\364\060\012\006\010\052\206\110\316\075\004\003\003\003\150\000
+\060\145\002\061\000\331\056\211\176\136\116\244\021\007\275\131
+\302\007\336\253\062\070\123\052\106\104\006\027\172\316\121\351
+\340\377\146\055\011\116\340\117\364\005\321\205\366\065\140\334
+\365\162\263\106\175\002\060\104\230\106\032\202\205\036\141\151
+\211\113\007\113\146\265\236\252\272\240\036\101\331\001\164\072
+\156\105\072\211\200\031\173\062\230\125\143\253\353\143\156\223
+\155\253\033\011\140\061\116
+END
+CKA_NSS_MOZILLA_CA_POLICY CK_BBOOL CK_TRUE
+CKA_NSS_SERVER_DISTRUST_AFTER CK_BBOOL CK_FALSE
+CKA_NSS_EMAIL_DISTRUST_AFTER CK_BBOOL CK_FALSE
+
+# Trust for "SecureSign Root CA15"
+# Issuer: CN=SecureSign Root CA15,O="Cybertrust Japan Co., Ltd.",C=JP
+# Serial Number:16:15:c7:c3:d8:49:a7:be:69:0c:8a:88:ed:f0:70:f9:dd:b7:3e:87
+# Subject: CN=SecureSign Root CA15,O="Cybertrust Japan Co., Ltd.",C=JP
+# Not Valid Before: Wed Apr 08 08:32:56 2020
+# Not Valid After : Sat Apr 08 08:32:56 2045
+# Fingerprint (SHA-256): E7:78:F0:F0:95:FE:84:37:29:CD:1A:00:82:17:9E:53:14:A9:C2:91:44:28:05:E1:FB:1D:8F:B6:B8:88:6C:3A
+# Fingerprint (SHA1): CB:BA:83:C8:C1:5A:5D:F1:F9:73:6F:CA:D7:EF:28:13:06:4A:07:7D
+CKA_CLASS CK_OBJECT_CLASS CKO_NSS_TRUST
+CKA_TOKEN CK_BBOOL CK_TRUE
+CKA_PRIVATE CK_BBOOL CK_FALSE
+CKA_MODIFIABLE CK_BBOOL CK_FALSE
+CKA_LABEL UTF8 "SecureSign Root CA15"
+CKA_CERT_SHA1_HASH MULTILINE_OCTAL
+\313\272\203\310\301\132\135\361\371\163\157\312\327\357\050\023
+\006\112\007\175
+END
+CKA_CERT_MD5_HASH MULTILINE_OCTAL
+\023\060\374\304\142\246\251\336\265\301\150\257\265\322\061\107
+END
+CKA_ISSUER MULTILINE_OCTAL
+\060\121\061\013\060\011\006\003\125\004\006\023\002\112\120\061
+\043\060\041\006\003\125\004\012\023\032\103\171\142\145\162\164
+\162\165\163\164\040\112\141\160\141\156\040\103\157\056\054\040
+\114\164\144\056\061\035\060\033\006\003\125\004\003\023\024\123
+\145\143\165\162\145\123\151\147\156\040\122\157\157\164\040\103
+\101\061\065
+END
+CKA_SERIAL_NUMBER MULTILINE_OCTAL
+\002\024\026\025\307\303\330\111\247\276\151\014\212\210\355\360
+\160\371\335\267\076\207
+END
+CKA_TRUST_SERVER_AUTH CK_TRUST CKT_NSS_TRUSTED_DELEGATOR
+CKA_TRUST_EMAIL_PROTECTION CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
+CKA_TRUST_CODE_SIGNING CK_TRUST CKT_NSS_MUST_VERIFY_TRUST
+CKA_TRUST_STEP_UP_APPROVED CK_BBOOL CK_FALSE

From 4f89af8a6f8b2e59d9d5e496ddfe723d6a4ce1f8 Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Mon, 4 Nov 2024 20:00:11 -0500
Subject: [PATCH 081/216] deps: update acorn to 8.14.0

PR-URL: https://github.com/nodejs/node/pull/55699
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 deps/acorn/acorn/CHANGELOG.md     |  12 +++
 deps/acorn/acorn/dist/acorn.d.mts |  12 ++-
 deps/acorn/acorn/dist/acorn.d.ts  |  12 ++-
 deps/acorn/acorn/dist/acorn.js    | 135 +++++++++++++++++++++++++++---
 deps/acorn/acorn/dist/acorn.mjs   | 135 +++++++++++++++++++++++++++---
 deps/acorn/acorn/package.json     |   2 +-
 src/acorn_version.h               |   2 +-
 7 files changed, 280 insertions(+), 30 deletions(-)
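
Note on the RegExp modifier checks: with ecmaVersion >= 16, the updated
regexp_eatUncapturingGroup accepts groups such as (?i:...), (?i-m:...) and
(?-s:...), and rejects repeated or conflicting modifiers. The sketch below
restates those rules as a standalone function so they can be read without the
parser state. The function and helper names are hypothetical and not part of
acorn; the "Not a modifier group" result is a simplification of acorn
resetting its position and returning false.

```javascript
// Hypothetical standalone restatement of the modifier rules in this patch.
// acorn performs these checks inline on its RegExpValidationState.
const MODIFIERS = "ims"; // RegularExpressionModifier :: one of i m s

// Read a (possibly empty) run of modifier characters starting at pos.
function readModifiers(text, pos) {
  let end = pos;
  while (end < text.length && MODIFIERS.includes(text[end])) end++;
  return text.slice(pos, end);
}

// True if any character repeats in chars or also appears in forbidden.
function hasConflict(chars, forbidden = "") {
  for (let i = 0; i < chars.length; i++) {
    if (chars.indexOf(chars[i], i + 1) > -1 || forbidden.includes(chars[i])) {
      return true;
    }
  }
  return false;
}

// prefix is the text between "(?" and ":", for example "i", "i-m" or "-s".
// Returns null when the prefix is acceptable, otherwise an error message.
function validateModifierPrefix(prefix) {
  const add = readModifiers(prefix, 0);
  let pos = add.length;
  if (hasConflict(add)) return "Duplicate regular expression modifiers";

  if (prefix[pos] === "-") {
    pos++;
    const remove = readModifiers(prefix, pos);
    pos += remove.length;
    // "(?-:" has a hyphen but nothing to add or remove.
    if (!add && !remove && pos === prefix.length) {
      return "Invalid regular expression modifiers";
    }
    if (hasConflict(remove, add)) return "Duplicate regular expression modifiers";
  }

  // Simplification: anything left over means this is not a modifier group.
  return pos === prefix.length ? null : "Not a modifier group";
}
```

For example, validateModifierPrefix("i-m") returns null, while
validateModifierPrefix("i-i") reports a duplicate because the same modifier
cannot be both added and removed.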

diff --git a/deps/acorn/acorn/CHANGELOG.md b/deps/acorn/acorn/CHANGELOG.md
index 1e090161fffa80..313718616b575a 100644
--- a/deps/acorn/acorn/CHANGELOG.md
+++ b/deps/acorn/acorn/CHANGELOG.md
@@ -1,3 +1,15 @@
+## 8.14.0 (2024-10-27)
+
+### New features
+
+Support ES2025 import attributes.
+
+Support ES2025 RegExp modifiers.
+
+### Bug fixes
+
+Support some missing Unicode properties.
+
 ## 8.13.0 (2024-10-16)
 
 ### New features
diff --git a/deps/acorn/acorn/dist/acorn.d.mts b/deps/acorn/acorn/dist/acorn.d.mts
index cd204b1c50db94..81f4e38fdbf4c9 100644
--- a/deps/acorn/acorn/dist/acorn.d.mts
+++ b/deps/acorn/acorn/dist/acorn.d.mts
@@ -403,6 +403,7 @@ export interface ImportDeclaration extends Node {
   type: "ImportDeclaration"
   specifiers: Array<ImportSpecifier | ImportDefaultSpecifier | ImportNamespaceSpecifier>
   source: Literal
+  attributes: Array<ImportAttribute>
 }
 
 export interface ImportSpecifier extends Node {
@@ -421,11 +422,18 @@ export interface ImportNamespaceSpecifier extends Node {
   local: Identifier
 }
 
+export interface ImportAttribute extends Node {
+  type: "ImportAttribute"
+  key: Identifier | Literal
+  value: Literal
+}
+
 export interface ExportNamedDeclaration extends Node {
   type: "ExportNamedDeclaration"
   declaration?: Declaration | null
   specifiers: Array<ExportSpecifier>
   source?: Literal | null
+  attributes: Array<ImportAttribute>
 }
 
 export interface ExportSpecifier extends Node {
@@ -454,6 +462,7 @@ export interface ExportAllDeclaration extends Node {
   type: "ExportAllDeclaration"
   source: Literal
   exported?: Identifier | Literal | null
+  attributes: Array<ImportAttribute>
 }
 
 export interface AwaitExpression extends Node {
@@ -469,6 +478,7 @@ export interface ChainExpression extends Node {
 export interface ImportExpression extends Node {
   type: "ImportExpression"
   source: Expression
+  options: Expression | null
 }
 
 export interface ParenthesizedExpression extends Node {
@@ -562,7 +572,7 @@ export type ModuleDeclaration =
 | ExportDefaultDeclaration
 | ExportAllDeclaration
 
-export type AnyNode = Statement | Expression | Declaration | ModuleDeclaration | Literal | Program | SwitchCase | CatchClause | Property | Super | SpreadElement | TemplateElement | AssignmentProperty | ObjectPattern | ArrayPattern | RestElement | AssignmentPattern | ClassBody | MethodDefinition | MetaProperty | ImportSpecifier | ImportDefaultSpecifier | ImportNamespaceSpecifier | ExportSpecifier | AnonymousFunctionDeclaration | AnonymousClassDeclaration | PropertyDefinition | PrivateIdentifier | StaticBlock | VariableDeclarator
+export type AnyNode = Statement | Expression | Declaration | ModuleDeclaration | Literal | Program | SwitchCase | CatchClause | Property | Super | SpreadElement | TemplateElement | AssignmentProperty | ObjectPattern | ArrayPattern | RestElement | AssignmentPattern | ClassBody | MethodDefinition | MetaProperty | ImportAttribute | ImportSpecifier | ImportDefaultSpecifier | ImportNamespaceSpecifier | ExportSpecifier | AnonymousFunctionDeclaration | AnonymousClassDeclaration | PropertyDefinition | PrivateIdentifier | StaticBlock | VariableDeclarator
 
 export function parse(input: string, options: Options): Program
 
diff --git a/deps/acorn/acorn/dist/acorn.d.ts b/deps/acorn/acorn/dist/acorn.d.ts
index cd204b1c50db94..81f4e38fdbf4c9 100644
--- a/deps/acorn/acorn/dist/acorn.d.ts
+++ b/deps/acorn/acorn/dist/acorn.d.ts
@@ -403,6 +403,7 @@ export interface ImportDeclaration extends Node {
   type: "ImportDeclaration"
   specifiers: Array<ImportSpecifier | ImportDefaultSpecifier | ImportNamespaceSpecifier>
   source: Literal
+  attributes: Array<ImportAttribute>
 }
 
 export interface ImportSpecifier extends Node {
@@ -421,11 +422,18 @@ export interface ImportNamespaceSpecifier extends Node {
   local: Identifier
 }
 
+export interface ImportAttribute extends Node {
+  type: "ImportAttribute"
+  key: Identifier | Literal
+  value: Literal
+}
+
 export interface ExportNamedDeclaration extends Node {
   type: "ExportNamedDeclaration"
   declaration?: Declaration | null
   specifiers: Array<ExportSpecifier>
   source?: Literal | null
+  attributes: Array<ImportAttribute>
 }
 
 export interface ExportSpecifier extends Node {
@@ -454,6 +462,7 @@ export interface ExportAllDeclaration extends Node {
   type: "ExportAllDeclaration"
   source: Literal
   exported?: Identifier | Literal | null
+  attributes: Array<ImportAttribute>
 }
 
 export interface AwaitExpression extends Node {
@@ -469,6 +478,7 @@ export interface ChainExpression extends Node {
 export interface ImportExpression extends Node {
   type: "ImportExpression"
   source: Expression
+  options: Expression | null
 }
 
 export interface ParenthesizedExpression extends Node {
@@ -562,7 +572,7 @@ export type ModuleDeclaration =
 | ExportDefaultDeclaration
 | ExportAllDeclaration
 
-export type AnyNode = Statement | Expression | Declaration | ModuleDeclaration | Literal | Program | SwitchCase | CatchClause | Property | Super | SpreadElement | TemplateElement | AssignmentProperty | ObjectPattern | ArrayPattern | RestElement | AssignmentPattern | ClassBody | MethodDefinition | MetaProperty | ImportSpecifier | ImportDefaultSpecifier | ImportNamespaceSpecifier | ExportSpecifier | AnonymousFunctionDeclaration | AnonymousClassDeclaration | PropertyDefinition | PrivateIdentifier | StaticBlock | VariableDeclarator
+export type AnyNode = Statement | Expression | Declaration | ModuleDeclaration | Literal | Program | SwitchCase | CatchClause | Property | Super | SpreadElement | TemplateElement | AssignmentProperty | ObjectPattern | ArrayPattern | RestElement | AssignmentPattern | ClassBody | MethodDefinition | MetaProperty | ImportAttribute | ImportSpecifier | ImportDefaultSpecifier | ImportNamespaceSpecifier | ExportSpecifier | AnonymousFunctionDeclaration | AnonymousClassDeclaration | PropertyDefinition | PrivateIdentifier | StaticBlock | VariableDeclarator
 
 export function parse(input: string, options: Options): Program
 
diff --git a/deps/acorn/acorn/dist/acorn.js b/deps/acorn/acorn/dist/acorn.js
index 7cd26fa36b5caa..2bfc15b5ef2204 100644
--- a/deps/acorn/acorn/dist/acorn.js
+++ b/deps/acorn/acorn/dist/acorn.js
@@ -1678,6 +1678,8 @@
     this.expectContextual("from");
     if (this.type !== types$1.string) { this.unexpected(); }
     node.source = this.parseExprAtom();
+    if (this.options.ecmaVersion >= 16)
+      { node.attributes = this.parseWithClause(); }
     this.semicolon();
     return this.finishNode(node, "ExportAllDeclaration")
   };
@@ -1708,6 +1710,8 @@
       if (this.eatContextual("from")) {
         if (this.type !== types$1.string) { this.unexpected(); }
         node.source = this.parseExprAtom();
+        if (this.options.ecmaVersion >= 16)
+          { node.attributes = this.parseWithClause(); }
       } else {
         for (var i = 0, list = node.specifiers; i < list.length; i += 1) {
           // check for keywords used as local names
@@ -1848,6 +1852,8 @@
       this.expectContextual("from");
       node.source = this.type === types$1.string ? this.parseExprAtom() : this.unexpected();
     }
+    if (this.options.ecmaVersion >= 16)
+      { node.attributes = this.parseWithClause(); }
     this.semicolon();
     return this.finishNode(node, "ImportDeclaration")
   };
@@ -1908,6 +1914,41 @@
     return nodes
   };
 
+  pp$8.parseWithClause = function() {
+    var nodes = [];
+    if (!this.eat(types$1._with)) {
+      return nodes
+    }
+    this.expect(types$1.braceL);
+    var attributeKeys = {};
+    var first = true;
+    while (!this.eat(types$1.braceR)) {
+      if (!first) {
+        this.expect(types$1.comma);
+        if (this.afterTrailingComma(types$1.braceR)) { break }
+      } else { first = false; }
+
+      var attr = this.parseImportAttribute();
+      var keyName = attr.key.type === "Identifier" ? attr.key.name : attr.key.value;
+      if (hasOwn(attributeKeys, keyName))
+        { this.raiseRecoverable(attr.key.start, "Duplicate attribute key '" + keyName + "'"); }
+      attributeKeys[keyName] = true;
+      nodes.push(attr);
+    }
+    return nodes
+  };
+
+  pp$8.parseImportAttribute = function() {
+    var node = this.startNode();
+    node.key = this.type === types$1.string ? this.parseExprAtom() : this.parseIdent(this.options.allowReserved !== "never");
+    this.expect(types$1.colon);
+    if (this.type !== types$1.string) {
+      this.unexpected();
+    }
+    node.value = this.parseExprAtom();
+    return this.finishNode(node, "ImportAttribute")
+  };
+
   pp$8.parseModuleExportName = function() {
     if (this.options.ecmaVersion >= 13 && this.type === types$1.string) {
       var stringLiteral = this.parseLiteral(this.value);
@@ -2975,13 +3016,32 @@
     // Parse node.source.
     node.source = this.parseMaybeAssign();
 
-    // Verify ending.
-    if (!this.eat(types$1.parenR)) {
-      var errorPos = this.start;
-      if (this.eat(types$1.comma) && this.eat(types$1.parenR)) {
-        this.raiseRecoverable(errorPos, "Trailing comma is not allowed in import()");
+    if (this.options.ecmaVersion >= 16) {
+      if (!this.eat(types$1.parenR)) {
+        this.expect(types$1.comma);
+        if (!this.afterTrailingComma(types$1.parenR)) {
+          node.options = this.parseMaybeAssign();
+          if (!this.eat(types$1.parenR)) {
+            this.expect(types$1.comma);
+            if (!this.afterTrailingComma(types$1.parenR)) {
+              this.unexpected();
+            }
+          }
+        } else {
+          node.options = null;
+        }
       } else {
-        this.unexpected(errorPos);
+        node.options = null;
+      }
+    } else {
+      // Verify ending.
+      if (!this.eat(types$1.parenR)) {
+        var errorPos = this.start;
+        if (this.eat(types$1.comma) && this.eat(types$1.parenR)) {
+          this.raiseRecoverable(errorPos, "Trailing comma is not allowed in import()");
+        } else {
+          this.unexpected(errorPos);
+        }
       }
     }
 
@@ -3741,6 +3801,9 @@
     return newNode
   };
 
+  // This file was generated by "bin/generate-unicode-script-values.js". Do not modify manually!
+  var scriptValuesAddedInUnicode = "Gara Garay Gukh Gurung_Khema Hrkt Katakana_Or_Hiragana Kawi Kirat_Rai Krai Nag_Mundari Nagm Ol_Onal Onao Sunu Sunuwar Todhri Todr Tulu_Tigalari Tutg Unknown Zzzz";
+
   // This file contains Unicode properties extracted from the ECMAScript specification.
   // The lists are extracted like so:
   // $$('#table-binary-unicode-properties > figure > table > tbody > tr > td:nth-child(1) code').map(el => el.innerText)
@@ -3783,7 +3846,7 @@
   var ecma11ScriptValues = ecma10ScriptValues + " Elymaic Elym Nandinagari Nand Nyiakeng_Puachue_Hmong Hmnp Wancho Wcho";
   var ecma12ScriptValues = ecma11ScriptValues + " Chorasmian Chrs Diak Dives_Akuru Khitan_Small_Script Kits Yezi Yezidi";
   var ecma13ScriptValues = ecma12ScriptValues + " Cypro_Minoan Cpmn Old_Uyghur Ougr Tangsa Tnsa Toto Vithkuqi Vith";
-  var ecma14ScriptValues = ecma13ScriptValues + " Hrkt Katakana_Or_Hiragana Kawi Nag_Mundari Nagm Unknown Zzzz";
+  var ecma14ScriptValues = ecma13ScriptValues + " " + scriptValuesAddedInUnicode;
 
   var unicodeScriptValues = {
     9: ecma9ScriptValues,
@@ -4208,12 +4271,41 @@
   pp$1.regexp_eatUncapturingGroup = function(state) {
     var start = state.pos;
     if (state.eat(0x28 /* ( */)) {
-      if (state.eat(0x3F /* ? */) && state.eat(0x3A /* : */)) {
-        this.regexp_disjunction(state);
-        if (state.eat(0x29 /* ) */)) {
-          return true
+      if (state.eat(0x3F /* ? */)) {
+        if (this.options.ecmaVersion >= 16) {
+          var addModifiers = this.regexp_eatModifiers(state);
+          var hasHyphen = state.eat(0x2D /* - */);
+          if (addModifiers || hasHyphen) {
+            for (var i = 0; i < addModifiers.length; i++) {
+              var modifier = addModifiers.charAt(i);
+              if (addModifiers.indexOf(modifier, i + 1) > -1) {
+                state.raise("Duplicate regular expression modifiers");
+              }
+            }
+            if (hasHyphen) {
+              var removeModifiers = this.regexp_eatModifiers(state);
+              if (!addModifiers && !removeModifiers && state.current() === 0x3A /* : */) {
+                state.raise("Invalid regular expression modifiers");
+              }
+              for (var i$1 = 0; i$1 < removeModifiers.length; i$1++) {
+                var modifier$1 = removeModifiers.charAt(i$1);
+                if (
+                  removeModifiers.indexOf(modifier$1, i$1 + 1) > -1 ||
+                  addModifiers.indexOf(modifier$1) > -1
+                ) {
+                  state.raise("Duplicate regular expression modifiers");
+                }
+              }
+            }
+          }
+        }
+        if (state.eat(0x3A /* : */)) {
+          this.regexp_disjunction(state);
+          if (state.eat(0x29 /* ) */)) {
+            return true
+          }
+          state.raise("Unterminated group");
         }
-        state.raise("Unterminated group");
       }
       state.pos = start;
     }
@@ -4235,6 +4327,23 @@
     }
     return false
   };
+  // RegularExpressionModifiers ::
+  //   [empty]
+  //   RegularExpressionModifiers RegularExpressionModifier
+  pp$1.regexp_eatModifiers = function(state) {
+    var modifiers = "";
+    var ch = 0;
+    while ((ch = state.current()) !== -1 && isRegularExpressionModifier(ch)) {
+      modifiers += codePointToString(ch);
+      state.advance();
+    }
+    return modifiers
+  };
+  // RegularExpressionModifier :: one of
+  //   `i` `m` `s`
+  function isRegularExpressionModifier(ch) {
+    return ch === 0x69 /* i */ || ch === 0x6d /* m */ || ch === 0x73 /* s */
+  }
 
   // https://www.ecma-international.org/ecma-262/8.0/#prod-annexB-ExtendedAtom
   pp$1.regexp_eatExtendedAtom = function(state) {
@@ -5990,7 +6099,7 @@
   // [walk]: util/walk.js
 
 
-  var version = "8.13.0";
+  var version = "8.14.0";
 
   Parser.acorn = {
     Parser: Parser,
diff --git a/deps/acorn/acorn/dist/acorn.mjs b/deps/acorn/acorn/dist/acorn.mjs
index 21b860f275a064..43e58efe7f03e1 100644
--- a/deps/acorn/acorn/dist/acorn.mjs
+++ b/deps/acorn/acorn/dist/acorn.mjs
@@ -1672,6 +1672,8 @@ pp$8.parseExportAllDeclaration = function(node, exports) {
   this.expectContextual("from");
   if (this.type !== types$1.string) { this.unexpected(); }
   node.source = this.parseExprAtom();
+  if (this.options.ecmaVersion >= 16)
+    { node.attributes = this.parseWithClause(); }
   this.semicolon();
   return this.finishNode(node, "ExportAllDeclaration")
 };
@@ -1702,6 +1704,8 @@ pp$8.parseExport = function(node, exports) {
     if (this.eatContextual("from")) {
       if (this.type !== types$1.string) { this.unexpected(); }
       node.source = this.parseExprAtom();
+      if (this.options.ecmaVersion >= 16)
+        { node.attributes = this.parseWithClause(); }
     } else {
       for (var i = 0, list = node.specifiers; i < list.length; i += 1) {
         // check for keywords used as local names
@@ -1842,6 +1846,8 @@ pp$8.parseImport = function(node) {
     this.expectContextual("from");
     node.source = this.type === types$1.string ? this.parseExprAtom() : this.unexpected();
   }
+  if (this.options.ecmaVersion >= 16)
+    { node.attributes = this.parseWithClause(); }
   this.semicolon();
   return this.finishNode(node, "ImportDeclaration")
 };
@@ -1902,6 +1908,41 @@ pp$8.parseImportSpecifiers = function() {
   return nodes
 };
 
+pp$8.parseWithClause = function() {
+  var nodes = [];
+  if (!this.eat(types$1._with)) {
+    return nodes
+  }
+  this.expect(types$1.braceL);
+  var attributeKeys = {};
+  var first = true;
+  while (!this.eat(types$1.braceR)) {
+    if (!first) {
+      this.expect(types$1.comma);
+      if (this.afterTrailingComma(types$1.braceR)) { break }
+    } else { first = false; }
+
+    var attr = this.parseImportAttribute();
+    var keyName = attr.key.type === "Identifier" ? attr.key.name : attr.key.value;
+    if (hasOwn(attributeKeys, keyName))
+      { this.raiseRecoverable(attr.key.start, "Duplicate attribute key '" + keyName + "'"); }
+    attributeKeys[keyName] = true;
+    nodes.push(attr);
+  }
+  return nodes
+};
+
+pp$8.parseImportAttribute = function() {
+  var node = this.startNode();
+  node.key = this.type === types$1.string ? this.parseExprAtom() : this.parseIdent(this.options.allowReserved !== "never");
+  this.expect(types$1.colon);
+  if (this.type !== types$1.string) {
+    this.unexpected();
+  }
+  node.value = this.parseExprAtom();
+  return this.finishNode(node, "ImportAttribute")
+};
+
 pp$8.parseModuleExportName = function() {
   if (this.options.ecmaVersion >= 13 && this.type === types$1.string) {
     var stringLiteral = this.parseLiteral(this.value);
@@ -2969,13 +3010,32 @@ pp$5.parseDynamicImport = function(node) {
   // Parse node.source.
   node.source = this.parseMaybeAssign();
 
-  // Verify ending.
-  if (!this.eat(types$1.parenR)) {
-    var errorPos = this.start;
-    if (this.eat(types$1.comma) && this.eat(types$1.parenR)) {
-      this.raiseRecoverable(errorPos, "Trailing comma is not allowed in import()");
+  if (this.options.ecmaVersion >= 16) {
+    if (!this.eat(types$1.parenR)) {
+      this.expect(types$1.comma);
+      if (!this.afterTrailingComma(types$1.parenR)) {
+        node.options = this.parseMaybeAssign();
+        if (!this.eat(types$1.parenR)) {
+          this.expect(types$1.comma);
+          if (!this.afterTrailingComma(types$1.parenR)) {
+            this.unexpected();
+          }
+        }
+      } else {
+        node.options = null;
+      }
     } else {
-      this.unexpected(errorPos);
+      node.options = null;
+    }
+  } else {
+    // Verify ending.
+    if (!this.eat(types$1.parenR)) {
+      var errorPos = this.start;
+      if (this.eat(types$1.comma) && this.eat(types$1.parenR)) {
+        this.raiseRecoverable(errorPos, "Trailing comma is not allowed in import()");
+      } else {
+        this.unexpected(errorPos);
+      }
     }
   }
 
@@ -3735,6 +3795,9 @@ pp$2.copyNode = function(node) {
   return newNode
 };
 
+// This file was generated by "bin/generate-unicode-script-values.js". Do not modify manually!
+var scriptValuesAddedInUnicode = "Gara Garay Gukh Gurung_Khema Hrkt Katakana_Or_Hiragana Kawi Kirat_Rai Krai Nag_Mundari Nagm Ol_Onal Onao Sunu Sunuwar Todhri Todr Tulu_Tigalari Tutg Unknown Zzzz";
+
 // This file contains Unicode properties extracted from the ECMAScript specification.
 // The lists are extracted like so:
 // $$('#table-binary-unicode-properties > figure > table > tbody > tr > td:nth-child(1) code').map(el => el.innerText)
@@ -3777,7 +3840,7 @@ var ecma10ScriptValues = ecma9ScriptValues + " Dogra Dogr Gunjala_Gondi Gong Han
 var ecma11ScriptValues = ecma10ScriptValues + " Elymaic Elym Nandinagari Nand Nyiakeng_Puachue_Hmong Hmnp Wancho Wcho";
 var ecma12ScriptValues = ecma11ScriptValues + " Chorasmian Chrs Diak Dives_Akuru Khitan_Small_Script Kits Yezi Yezidi";
 var ecma13ScriptValues = ecma12ScriptValues + " Cypro_Minoan Cpmn Old_Uyghur Ougr Tangsa Tnsa Toto Vithkuqi Vith";
-var ecma14ScriptValues = ecma13ScriptValues + " Hrkt Katakana_Or_Hiragana Kawi Nag_Mundari Nagm Unknown Zzzz";
+var ecma14ScriptValues = ecma13ScriptValues + " " + scriptValuesAddedInUnicode;
 
 var unicodeScriptValues = {
   9: ecma9ScriptValues,
@@ -4202,12 +4265,41 @@ pp$1.regexp_eatReverseSolidusAtomEscape = function(state) {
 pp$1.regexp_eatUncapturingGroup = function(state) {
   var start = state.pos;
   if (state.eat(0x28 /* ( */)) {
-    if (state.eat(0x3F /* ? */) && state.eat(0x3A /* : */)) {
-      this.regexp_disjunction(state);
-      if (state.eat(0x29 /* ) */)) {
-        return true
+    if (state.eat(0x3F /* ? */)) {
+      if (this.options.ecmaVersion >= 16) {
+        var addModifiers = this.regexp_eatModifiers(state);
+        var hasHyphen = state.eat(0x2D /* - */);
+        if (addModifiers || hasHyphen) {
+          for (var i = 0; i < addModifiers.length; i++) {
+            var modifier = addModifiers.charAt(i);
+            if (addModifiers.indexOf(modifier, i + 1) > -1) {
+              state.raise("Duplicate regular expression modifiers");
+            }
+          }
+          if (hasHyphen) {
+            var removeModifiers = this.regexp_eatModifiers(state);
+            if (!addModifiers && !removeModifiers && state.current() === 0x3A /* : */) {
+              state.raise("Invalid regular expression modifiers");
+            }
+            for (var i$1 = 0; i$1 < removeModifiers.length; i$1++) {
+              var modifier$1 = removeModifiers.charAt(i$1);
+              if (
+                removeModifiers.indexOf(modifier$1, i$1 + 1) > -1 ||
+                addModifiers.indexOf(modifier$1) > -1
+              ) {
+                state.raise("Duplicate regular expression modifiers");
+              }
+            }
+          }
+        }
+      }
+      if (state.eat(0x3A /* : */)) {
+        this.regexp_disjunction(state);
+        if (state.eat(0x29 /* ) */)) {
+          return true
+        }
+        state.raise("Unterminated group");
       }
-      state.raise("Unterminated group");
     }
     state.pos = start;
   }
@@ -4229,6 +4321,23 @@ pp$1.regexp_eatCapturingGroup = function(state) {
   }
   return false
 };
+// RegularExpressionModifiers ::
+//   [empty]
+//   RegularExpressionModifiers RegularExpressionModifier
+pp$1.regexp_eatModifiers = function(state) {
+  var modifiers = "";
+  var ch = 0;
+  while ((ch = state.current()) !== -1 && isRegularExpressionModifier(ch)) {
+    modifiers += codePointToString(ch);
+    state.advance();
+  }
+  return modifiers
+};
+// RegularExpressionModifier :: one of
+//   `i` `m` `s`
+function isRegularExpressionModifier(ch) {
+  return ch === 0x69 /* i */ || ch === 0x6d /* m */ || ch === 0x73 /* s */
+}
 
 // https://www.ecma-international.org/ecma-262/8.0/#prod-annexB-ExtendedAtom
 pp$1.regexp_eatExtendedAtom = function(state) {
@@ -5984,7 +6093,7 @@ pp.readWord = function() {
 // [walk]: util/walk.js
 
 
-var version = "8.13.0";
+var version = "8.14.0";
 
 Parser.acorn = {
   Parser: Parser,
diff --git a/deps/acorn/acorn/package.json b/deps/acorn/acorn/package.json
index 3396013bbbf060..795cf83eff64d7 100644
--- a/deps/acorn/acorn/package.json
+++ b/deps/acorn/acorn/package.json
@@ -16,7 +16,7 @@
     ],
     "./package.json": "./package.json"
   },
-  "version": "8.13.0",
+  "version": "8.14.0",
   "engines": {
     "node": ">=0.4.0"
   },
diff --git a/src/acorn_version.h b/src/acorn_version.h
index fdafbf96987762..b4e18696f8b6db 100644
--- a/src/acorn_version.h
+++ b/src/acorn_version.h
@@ -2,5 +2,5 @@
 // Refer to tools/dep_updaters/update-acorn.sh
 #ifndef SRC_ACORN_VERSION_H_
 #define SRC_ACORN_VERSION_H_
-#define ACORN_VERSION "8.13.0"
+#define ACORN_VERSION "8.14.0"
 #endif  // SRC_ACORN_VERSION_H_

From a7ce82e3cc4ebd0c32492e257a17de8b73cf1ff8 Mon Sep 17 00:00:00 2001
From: Joe Bowbeer <joe.bowbeer@gmail.com>
Date: Tue, 5 Nov 2024 15:14:15 -0800
Subject: [PATCH 082/216] doc: update `--max-semi-space-size` description

PR-URL: https://github.com/nodejs/node/pull/55495
Fixes: https://github.com/nodejs/node/issues/55487
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
---
 doc/api/cli.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/doc/api/cli.md b/doc/api/cli.md
index 825e3c6e313cf6..1390c50f381f74 100644
--- a/doc/api/cli.md
+++ b/doc/api/cli.md
@@ -3258,8 +3258,12 @@ an increase of 1 MiB to semi-space applies to each of the three individual
 semi-spaces and causes the heap size to increase by 3 MiB. The throughput
 improvement depends on your workload (see [#42511][]).
 
-The default value is 16 MiB for 64-bit systems and 8 MiB for 32-bit systems. To
-get the best configuration for your application, you should try different
+The default value depends on the memory limit. For example, on 64-bit systems
+with a memory limit of 512 MiB, the max size of a semi-space defaults to 1 MiB.
+On 64-bit systems with a memory limit of 2 GiB, the max size of a semi-space
+defaults to 16 MiB.
+
+To get the best configuration for your application, you should try different
 max-semi-space-size values when running benchmarks for your application.
 
 For example, benchmark on a 64-bit systems:

From 0ac0afc4a98b31d7035e5ffdc800106447136c61 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Tue, 5 Nov 2024 23:40:39 +0000
Subject: [PATCH 083/216] test: refactor some esm tests

PR-URL: https://github.com/nodejs/node/pull/55472
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Chemi Atlow <chemi@atlow.co.il>
---
 .../test-esm-import-meta-resolve.mjs          | 23 ++++++++--------
 test/es-module/test-esm-pkgname.mjs           | 27 +++++++------------
 2 files changed, 21 insertions(+), 29 deletions(-)

diff --git a/test/es-module/test-esm-import-meta-resolve.mjs b/test/es-module/test-esm-import-meta-resolve.mjs
index f3b153062192af..49b6d1ff906ec3 100644
--- a/test/es-module/test-esm-import-meta-resolve.mjs
+++ b/test/es-module/test-esm-import-meta-resolve.mjs
@@ -1,22 +1,21 @@
 // Flags: --experimental-import-meta-resolve
 import { spawnPromisified } from '../common/index.mjs';
+import { fileURL as fixturesFileURL } from '../common/fixtures.mjs';
 import assert from 'assert';
 import { spawn } from 'child_process';
 import { execPath } from 'process';
 
-const dirname = import.meta.url.slice(0, import.meta.url.lastIndexOf('/') + 1);
-const fixtures = dirname.slice(0, dirname.lastIndexOf('/', dirname.length - 2) + 1) + 'fixtures/';
+const fixtures = `${fixturesFileURL()}/`;
 
 assert.strictEqual(import.meta.resolve('./test-esm-import-meta.mjs'),
-                   dirname + 'test-esm-import-meta.mjs');
+                   new URL('./test-esm-import-meta.mjs', import.meta.url).href);
 assert.strictEqual(import.meta.resolve('./notfound.mjs'), new URL('./notfound.mjs', import.meta.url).href);
 assert.strictEqual(import.meta.resolve('./asset'), new URL('./asset', import.meta.url).href);
-try {
+assert.throws(() => {
   import.meta.resolve('does-not-exist');
-  assert.fail();
-} catch (e) {
-  assert.strictEqual(e.code, 'ERR_MODULE_NOT_FOUND');
-}
+}, {
+  code: 'ERR_MODULE_NOT_FOUND',
+});
 assert.strictEqual(
   import.meta.resolve('../fixtures/empty-with-bom.txt'),
   fixtures + 'empty-with-bom.txt');
@@ -60,11 +59,11 @@ await assert.rejects(import('data:text/javascript,export default import.meta.res
 });
 
 {
-  const cp = spawn(execPath, [
+  const { stdout } = await spawnPromisified(execPath, [
     '--input-type=module',
     '--eval', 'console.log(typeof import.meta.resolve)',
   ]);
-  assert.match((await cp.stdout.toArray()).toString(), /^function\r?\n$/);
+  assert.match(stdout, /^function\r?\n$/);
 }
 
 {
@@ -76,11 +75,11 @@ await assert.rejects(import('data:text/javascript,export default import.meta.res
 }
 
 {
-  const cp = spawn(execPath, [
+  const { stdout } = await spawnPromisified(execPath, [
     '--input-type=module',
     '--eval', 'import "data:text/javascript,console.log(import.meta.resolve(%22node:os%22))"',
   ]);
-  assert.match((await cp.stdout.toArray()).toString(), /^node:os\r?\n$/);
+  assert.match(stdout, /^node:os\r?\n$/);
 }
 
 {
diff --git a/test/es-module/test-esm-pkgname.mjs b/test/es-module/test-esm-pkgname.mjs
index 5090d6c22ce689..6f2d538841e5cf 100644
--- a/test/es-module/test-esm-pkgname.mjs
+++ b/test/es-module/test-esm-pkgname.mjs
@@ -1,20 +1,13 @@
-import { mustCall } from '../common/index.mjs';
-import { strictEqual } from 'assert';
+import '../common/index.mjs';
+import assert from 'node:assert';
 
 import { importFixture } from '../fixtures/pkgexports.mjs';
 
-importFixture('as%2Ff').catch(mustCall((err) => {
-  strictEqual(err.code, 'ERR_INVALID_MODULE_SPECIFIER');
-}));
-
-importFixture('as%5Cf').catch(mustCall((err) => {
-  strictEqual(err.code, 'ERR_INVALID_MODULE_SPECIFIER');
-}));
-
-importFixture('as\\df').catch(mustCall((err) => {
-  strictEqual(err.code, 'ERR_INVALID_MODULE_SPECIFIER');
-}));
-
-importFixture('@as@df').catch(mustCall((err) => {
-  strictEqual(err.code, 'ERR_INVALID_MODULE_SPECIFIER');
-}));
+await Promise.all([
+  'as%2Ff',
+  'as%5Cf',
+  'as\\df',
+  '@as@df',
+].map((specifier) => assert.rejects(importFixture(specifier), {
+  code: 'ERR_INVALID_MODULE_SPECIFIER',
+})));

From d2421f3c92c5cd1b8cfdd59d62d19cea0ab586e3 Mon Sep 17 00:00:00 2001
From: Carlos Espa <43477095+Ceres6@users.noreply.github.com>
Date: Wed, 6 Nov 2024 02:10:33 +0100
Subject: [PATCH 084/216] test: ignore unrelated events in FW watch tests

Change assertions on `test-fs-watch-recursive-add-*` tests to only take
into account change events that match the file.
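
A minimal sketch of the pattern, not the test code itself: `onlyFor` is a
hypothetical helper, and the events are simulated rather than emitted by
`fs.watch()`.

```js
'use strict';

// Hypothetical helper: wrap a 'change' listener so that it only runs for
// events whose filename matches `expected`. Unrelated events are ignored
// instead of failing an assertion.
function onlyFor(expected, handler) {
  return function(event, filename) {
    if (filename !== expected) return;
    handler(event, filename);
  };
}

// Simulated event stream: one unrelated event, then the one under test.
const seen = [];
const listener = onlyFor('file-1.txt', (event, filename) => {
  seen.push([event, filename]);
});
listener('change', 'unrelated.tmp');
listener('rename', 'file-1.txt');
console.log(seen);
```

Asserting on the event type only inside the filename check keeps the tests
strict about the file they create while tolerating noise from the OS.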

PR-URL: https://github.com/nodejs/node/pull/55605
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Pietro Marchini <pietro.marchini94@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
---
 ...ecursive-add-file-to-existing-subfolder.js |  3 +-
 ...-watch-recursive-add-file-to-new-folder.js | 36 +++++++------------
 ...st-fs-watch-recursive-add-file-with-url.js |  3 +-
 .../test-fs-watch-recursive-add-file.js       |  3 +-
 .../test-fs-watch-recursive-add-folder.js     |  3 +-
 5 files changed, 16 insertions(+), 32 deletions(-)

diff --git a/test/parallel/test-fs-watch-recursive-add-file-to-existing-subfolder.js b/test/parallel/test-fs-watch-recursive-add-file-to-existing-subfolder.js
index 628ca4b2fdf805..511829fa385e52 100644
--- a/test/parallel/test-fs-watch-recursive-add-file-to-existing-subfolder.js
+++ b/test/parallel/test-fs-watch-recursive-add-file-to-existing-subfolder.js
@@ -40,9 +40,8 @@ const relativePath = path.join(file, path.basename(subfolderPath), childrenFile)
 const watcher = fs.watch(testDirectory, { recursive: true });
 let watcherClosed = false;
 watcher.on('change', function(event, filename) {
-  assert.strictEqual(event, 'rename');
-
   if (filename === relativePath) {
+    assert.strictEqual(event, 'rename');
     watcher.close();
     watcherClosed = true;
   }
diff --git a/test/parallel/test-fs-watch-recursive-add-file-to-new-folder.js b/test/parallel/test-fs-watch-recursive-add-file-to-new-folder.js
index 32a397821b8502..fcc49bb7464937 100644
--- a/test/parallel/test-fs-watch-recursive-add-file-to-new-folder.js
+++ b/test/parallel/test-fs-watch-recursive-add-file-to-new-folder.js
@@ -33,32 +33,20 @@ const childrenAbsolutePath = path.join(filePath, childrenFile);
 const childrenRelativePath = path.join(path.basename(filePath), childrenFile);
 let watcherClosed = false;
 
-function doWatch() {
-  const watcher = fs.watch(testDirectory, { recursive: true });
-  watcher.on('change', function(event, filename) {
+const watcher = fs.watch(testDirectory, { recursive: true });
+watcher.on('change', function(event, filename) {
+  if (filename === childrenRelativePath) {
     assert.strictEqual(event, 'rename');
-    assert.ok(filename === path.basename(filePath) || filename === childrenRelativePath);
-
-    if (filename === childrenRelativePath) {
-      watcher.close();
-      watcherClosed = true;
-    }
-  });
-
-  // Do the write with a delay to ensure that the OS is ready to notify us.
-  setTimeout(() => {
-    fs.mkdirSync(filePath);
-    fs.writeFileSync(childrenAbsolutePath, 'world');
-  }, common.platformTimeout(200));
-}
+    watcher.close();
+    watcherClosed = true;
+  }
+});
 
-if (common.isMacOS) {
-  // On macOS delay watcher start to avoid leaking previous events.
-  // Refs: https://github.com/libuv/libuv/pull/4503
-  setTimeout(doWatch, common.platformTimeout(100));
-} else {
-  doWatch();
-}
+// Do the write with a delay to ensure that the OS is ready to notify us.
+setTimeout(() => {
+  fs.mkdirSync(filePath);
+  fs.writeFileSync(childrenAbsolutePath, 'world');
+}, common.platformTimeout(200));
 
 process.once('exit', function() {
   assert(watcherClosed, 'watcher Object was not closed');
diff --git a/test/parallel/test-fs-watch-recursive-add-file-with-url.js b/test/parallel/test-fs-watch-recursive-add-file-with-url.js
index ee726961c41e9e..852c7088d59792 100644
--- a/test/parallel/test-fs-watch-recursive-add-file-with-url.js
+++ b/test/parallel/test-fs-watch-recursive-add-file-with-url.js
@@ -35,9 +35,8 @@ tmpdir.refresh();
   const watcher = fs.watch(url, { recursive: true });
   let watcherClosed = false;
   watcher.on('change', function(event, filename) {
-    assert.strictEqual(event, 'rename');
-
     if (filename === path.basename(filePath)) {
+      assert.strictEqual(event, 'rename');
       watcher.close();
       watcherClosed = true;
     }
diff --git a/test/parallel/test-fs-watch-recursive-add-file.js b/test/parallel/test-fs-watch-recursive-add-file.js
index 27b933871cb403..e8724102c89ff8 100644
--- a/test/parallel/test-fs-watch-recursive-add-file.js
+++ b/test/parallel/test-fs-watch-recursive-add-file.js
@@ -31,9 +31,8 @@ const testFile = path.join(testDirectory, 'file-1.txt');
 const watcher = fs.watch(testDirectory, { recursive: true });
 let watcherClosed = false;
 watcher.on('change', function(event, filename) {
-  assert.strictEqual(event, 'rename');
-
   if (filename === path.basename(testFile)) {
+    assert.strictEqual(event, 'rename');
     watcher.close();
     watcherClosed = true;
   }
diff --git a/test/parallel/test-fs-watch-recursive-add-folder.js b/test/parallel/test-fs-watch-recursive-add-folder.js
index 1851a7850f66ff..1a6671de2f3617 100644
--- a/test/parallel/test-fs-watch-recursive-add-folder.js
+++ b/test/parallel/test-fs-watch-recursive-add-folder.js
@@ -33,9 +33,8 @@ tmpdir.refresh();
   const watcher = fs.watch(testDirectory, { recursive: true });
   let watcherClosed = false;
   watcher.on('change', function(event, filename) {
-    assert.strictEqual(event, 'rename');
-
     if (filename === path.basename(testFile)) {
+      assert.strictEqual(event, 'rename');
       watcher.close();
       watcherClosed = true;
     }

From 16eef6461eec052610d810fc41d6bc3dcb17ab46 Mon Sep 17 00:00:00 2001
From: Preveen P <31464911+preveen-stack@users.noreply.github.com>
Date: Wed, 6 Nov 2024 13:25:49 +0530
Subject: [PATCH 085/216] doc: clarity to available addon options

Bullet-point the addon options, clarify wording, and fix a typo.

PR-URL: https://github.com/nodejs/node/pull/55715
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 doc/api/addons.md | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/doc/api/addons.md b/doc/api/addons.md
index 446a63b1a2fc8f..e0e00dca0b9e8b 100644
--- a/doc/api/addons.md
+++ b/doc/api/addons.md
@@ -8,20 +8,25 @@ _Addons_ are dynamically-linked shared objects written in C++. The
 [`require()`][require] function can load addons as ordinary Node.js modules.
 Addons provide an interface between JavaScript and C/C++ libraries.
 
-There are three options for implementing addons: Node-API, nan, or direct
-use of internal V8, libuv, and Node.js libraries. Unless there is a need for
-direct access to functionality which is not exposed by Node-API, use Node-API.
+There are three options for implementing addons:
+
+* Node-API
+* `nan` ([Native Abstractions for Node.js][])
+* direct use of internal V8, libuv, and Node.js libraries
+
+Unless there is a need for direct access to functionality which is not
+exposed by Node-API, use Node-API.
 Refer to [C/C++ addons with Node-API](n-api.md) for more information on
 Node-API.
 
-When not using Node-API, implementing addons is complicated,
-involving knowledge of several components and APIs:
+When not using Node-API, implementing addons becomes more complex, requiring
+knowledge of multiple components and APIs:
 
 * [V8][]: the C++ library Node.js uses to provide the
-  JavaScript implementation. V8 provides the mechanisms for creating objects,
-  calling functions, etc. V8's API is documented mostly in the
+  JavaScript implementation. It provides the mechanisms for creating objects,
+  calling functions, etc. V8's API is documented mostly in the
   `v8.h` header file (`deps/v8/include/v8.h` in the Node.js source
-  tree), which is also available [online][v8-docs].
+  tree), and is also available [online][v8-docs].
 
 * [libuv][]: The C library that implements the Node.js event loop, its worker
   threads and all of the asynchronous behaviors of the platform. It also
@@ -35,10 +40,10 @@ involving knowledge of several components and APIs:
   offloading work via libuv to non-blocking system operations, worker threads,
   or a custom use of libuv threads.
 
-* Internal Node.js libraries. Node.js itself exports C++ APIs that addons can
+* Internal Node.js libraries: Node.js itself exports C++ APIs that addons can
   use, the most important of which is the `node::ObjectWrap` class.
 
-* Node.js includes other statically linked libraries including OpenSSL. These
+* Other statically linked libraries (including OpenSSL): These
   other libraries are located in the `deps/` directory in the Node.js source
   tree. Only the libuv, OpenSSL, V8, and zlib symbols are purposefully
   re-exported by Node.js and may be used to various extents by addons. See
@@ -148,8 +153,8 @@ invocation of `NODE_MODULE_INIT()`:
 * `Local<Value> module`, and
 * `Local<Context> context`
 
-The choice to build a context-aware addon carries with it the responsibility of
-carefully managing global static data. Since the addon may be loaded multiple
+Building a context-aware addon requires careful management of global static data
+to ensure stability and correctness. Since the addon may be loaded multiple
 times, potentially even from different threads, any global static data stored
 in the addon must be properly protected, and must not contain any persistent
 references to JavaScript objects. The reason for this is that JavaScript
@@ -255,7 +260,7 @@ such as a main thread and a Worker thread, an add-on needs to either:
 * Be declared as context-aware using `NODE_MODULE_INIT()` as described above
 
 In order to support [`Worker`][] threads, addons need to clean up any resources
-they may have allocated when such a thread exists. This can be achieved through
+they may have allocated when such a thread exits. This can be achieved through
 the usage of the `AddEnvironmentCleanupHook()` function:
 
 ```cpp
@@ -1273,7 +1278,7 @@ class MyObject : public node::ObjectWrap {
 #endif
 ```
 
-The implementation of `myobject.cc` is similar to before:
+The implementation of `myobject.cc` remains similar to the previous version:
 
 ```cpp
 // myobject.cc

From 7ed346d8fd1fbec58370b12d8d6e7e658cc59e9b Mon Sep 17 00:00:00 2001
From: Aviv Keller <redyetidev@gmail.com>
Date: Wed, 6 Nov 2024 04:57:15 -0500
Subject: [PATCH 086/216] util: do not catch on circular `@@toStringTag` errors
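
Errors other than a stack overflow that are raised while formatting are
expected to be rethrown by `util.inspect()` rather than passed to
`handleMaxCallStackSize()`. A minimal sketch of the case covered by the test
added in this patch; the variable names are illustrative:

```js
'use strict';
const util = require('node:util');

// The Symbol.toStringTag getter on `y` throws a TypeError, because
// JSON.stringify() runs into the x <-> y cycle.
const y = {
  get [Symbol.toStringTag]() {
    return JSON.stringify(this);
  },
};
const x = { y };
y.x = x;

let caught = null;
try {
  util.inspect(x);
} catch (err) {
  caught = err;
}
// With this patch applied, `caught` should be the original TypeError.
console.log(caught && caught.message);
```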

PR-URL: https://github.com/nodejs/node/pull/55544
Fixes: https://github.com/nodejs/node/issues/55539
Reviewed-By: James M Snell <jasnell@gmail.com>
Co-Authored-By: Colin Ihrig <cjihrig@gmail.com>
---
 lib/internal/util/inspect.js       | 19 ++++++++-----------
 test/parallel/test-util-inspect.js |  9 +++++++++
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/lib/internal/util/inspect.js b/lib/internal/util/inspect.js
index b46038425b8c70..c8bb717788fe0a 100644
--- a/lib/internal/util/inspect.js
+++ b/lib/internal/util/inspect.js
@@ -1062,6 +1062,7 @@ function formatRaw(ctx, value, recurseTimes, typedArray) {
       ArrayPrototypePushApply(output, protoProps);
     }
   } catch (err) {
+    if (!isStackOverflowError(err)) throw err;
     const constructorName = StringPrototypeSlice(getCtxStyle(value, constructor, tag), 0, -1);
     return handleMaxCallStackSize(ctx, err, constructorName, indentationLvl);
   }
@@ -1547,17 +1548,13 @@ function groupArrayElements(ctx, output, value) {
 }
 
 function handleMaxCallStackSize(ctx, err, constructorName, indentationLvl) {
-  if (isStackOverflowError(err)) {
-    ctx.seen.pop();
-    ctx.indentationLvl = indentationLvl;
-    return ctx.stylize(
-      `[${constructorName}: Inspection interrupted ` +
-        'prematurely. Maximum call stack size exceeded.]',
-      'special',
-    );
-  }
-  /* c8 ignore next */
-  assert.fail(err.stack);
+  ctx.seen.pop();
+  ctx.indentationLvl = indentationLvl;
+  return ctx.stylize(
+    `[${constructorName}: Inspection interrupted ` +
+      'prematurely. Maximum call stack size exceeded.]',
+    'special',
+  );
 }
 
 function addNumericSeparator(integerString) {
diff --git a/test/parallel/test-util-inspect.js b/test/parallel/test-util-inspect.js
index 707a73820a4a59..811f085879e2c6 100644
--- a/test/parallel/test-util-inspect.js
+++ b/test/parallel/test-util-inspect.js
@@ -1644,6 +1644,15 @@ util.inspect(process);
 
   assert.throws(() => util.inspect(new ThrowingClass()), /toStringTag error/);
 
+  const y = {
+    get [Symbol.toStringTag]() {
+      return JSON.stringify(this);
+    }
+  };
+  const x = { y };
+  y.x = x;
+  assert.throws(() => util.inspect(x), /TypeError: Converting circular structure to JSON/);
+
   class NotStringClass {
     get [Symbol.toStringTag]() {
       return null;

From 032ff07a2dfb33da65cffb744e5c8b7e0d33a17c Mon Sep 17 00:00:00 2001
From: Gireesh Punathil <gpunathi@in.ibm.com>
Date: Wed, 6 Nov 2024 15:58:28 +0530
Subject: [PATCH 087/216] doc: consistent use of word child process

Reword "child" to "child process" wherever possible. This improves
clarity, precision, and consistency, and avoids misinterpretation.

PR-URL: https://github.com/nodejs/node/pull/55654
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 doc/api/child_process.md | 110 ++++++++++++++++++++-------------------
 1 file changed, 56 insertions(+), 54 deletions(-)

diff --git a/doc/api/child_process.md b/doc/api/child_process.md
index b98e0e041cf1b8..60d005ffea472e 100644
--- a/doc/api/child_process.md
+++ b/doc/api/child_process.md
@@ -280,9 +280,9 @@ exec('cat *.js missing_file | wc -l', (error, stdout, stderr) => {
 });
 ```
 
-If `timeout` is greater than `0`, the parent will send the signal
+If `timeout` is greater than `0`, the parent process will send the signal
 identified by the `killSignal` property (the default is `'SIGTERM'`) if the
-child runs longer than `timeout` milliseconds.
+child process runs longer than `timeout` milliseconds.
 
 Unlike the exec(3) POSIX system call, `child_process.exec()` does not replace
 the existing process and uses a shell to execute the command.
@@ -535,8 +535,8 @@ changes:
 * `args` {string\[]} List of string arguments.
 * `options` {Object}
   * `cwd` {string|URL} Current working directory of the child process.
-  * `detached` {boolean} Prepare child to run independently of its parent
-    process. Specific behavior depends on the platform, see
+  * `detached` {boolean} Prepare child process to run independently of its
+    parent process. Specific behavior depends on the platform, see
     [`options.detached`][]).
   * `env` {Object} Environment key-value pairs. **Default:** `process.env`.
   * `execPath` {string} Executable used to create the child process.
@@ -550,10 +550,11 @@ changes:
     AbortSignal.
   * `killSignal` {string|integer} The signal value to be used when the spawned
     process will be killed by timeout or abort signal. **Default:** `'SIGTERM'`.
-  * `silent` {boolean} If `true`, stdin, stdout, and stderr of the child will be
-    piped to the parent, otherwise they will be inherited from the parent, see
-    the `'pipe'` and `'inherit'` options for [`child_process.spawn()`][]'s
-    [`stdio`][] for more details. **Default:** `false`.
+  * `silent` {boolean} If `true`, stdin, stdout, and stderr of the child
+    process will be piped to the parent process, otherwise they will be inherited
+    from the parent process, see the `'pipe'` and `'inherit'` options for
+    [`child_process.spawn()`][]'s [`stdio`][] for more details.
+    **Default:** `false`.
   * `stdio` {Array|string} See [`child_process.spawn()`][]'s [`stdio`][].
     When this option is provided, it overrides `silent`. If the array variant
     is used, it must contain exactly one item with value `'ipc'` or an error
@@ -686,8 +687,8 @@ changes:
     process. This will be set to `command` if not specified.
   * `stdio` {Array|string} Child's stdio configuration (see
     [`options.stdio`][`stdio`]).
-  * `detached` {boolean} Prepare child to run independently of its parent
-    process. Specific behavior depends on the platform, see
+  * `detached` {boolean} Prepare child process to run independently of
+    its parent process. Specific behavior depends on the platform, see
     [`options.detached`][]).
   * `uid` {number} Sets the user identity of the process (see setuid(2)).
   * `gid` {number} Sets the group identity of the process (see setgid(2)).
@@ -909,27 +910,27 @@ added: v0.7.10
 -->
 
 On Windows, setting `options.detached` to `true` makes it possible for the
-child process to continue running after the parent exits. The child will have
-its own console window. Once enabled for a child process, it cannot be
-disabled.
+child process to continue running after the parent exits. The child process
+will have its own console window. Once enabled for a child process,
+it cannot be disabled.
 
 On non-Windows platforms, if `options.detached` is set to `true`, the child
 process will be made the leader of a new process group and session. Child
 processes may continue running after the parent exits regardless of whether
 they are detached or not. See setsid(2) for more information.
 
-By default, the parent will wait for the detached child to exit. To prevent the
-parent from waiting for a given `subprocess` to exit, use the
-`subprocess.unref()` method. Doing so will cause the parent's event loop to not
-include the child in its reference count, allowing the parent to exit
-independently of the child, unless there is an established IPC channel between
-the child and the parent.
+By default, the parent process will wait for the detached child process to
+exit. To prevent the parent process from waiting for a given `subprocess` to
+exit, use the `subprocess.unref()` method. Doing so will cause the parent
+process' event loop to not include the child process in its reference count,
+allowing the parent process to exit independently of the child process, unless
+there is an established IPC channel between the child and the parent processes.
 
 When using the `detached` option to start a long-running process, the process
 will not stay running in the background after the parent exits unless it is
 provided with a `stdio` configuration that is not connected to the parent.
-If the parent's `stdio` is inherited, the child will remain attached to the
-controlling terminal.
+If the parent process' `stdio` is inherited, the child process will remain attached
+to the controlling terminal.
 
 Example of a long-running process, by detaching and also ignoring its parent
 `stdio` file descriptors, in order to ignore the parent's termination:
@@ -1039,10 +1040,10 @@ pipes between the parent and child. The value is one of the following:
 3. `'ipc'`: Create an IPC channel for passing messages/file descriptors
    between parent and child. A [`ChildProcess`][] may have at most one IPC
    stdio file descriptor. Setting this option enables the
-   [`subprocess.send()`][] method. If the child is a Node.js process, the
-   presence of an IPC channel will enable [`process.send()`][] and
+   [`subprocess.send()`][] method. If the child process is a Node.js instance,
+   the presence of an IPC channel will enable [`process.send()`][] and
    [`process.disconnect()`][] methods, as well as [`'disconnect'`][] and
-   [`'message'`][] events within the child.
+   [`'message'`][] events within the child process.
 
    Accessing the IPC channel fd in any way other than [`process.send()`][]
    or using the IPC channel with a child process that is not a Node.js instance
@@ -1109,12 +1110,12 @@ spawn('prg', [], { stdio: ['pipe', null, null, null, 'pipe'] });
 ```
 
 _It is worth noting that when an IPC channel is established between the
-parent and child processes, and the child is a Node.js process, the child
-is launched with the IPC channel unreferenced (using `unref()`) until the
-child registers an event handler for the [`'disconnect'`][] event
-or the [`'message'`][] event. This allows the child to exit
-normally without the process being held open by the open IPC channel._
-
+parent and child processes, and the child process is a Node.js instance,
+the child process is launched with the IPC channel unreferenced (using
+`unref()`) until the child process registers an event handler for the
+[`'disconnect'`][] event or the [`'message'`][] event. This allows the
+child process to exit normally without the process being held open by the
+open IPC channel._
 See also: [`child_process.exec()`][] and [`child_process.fork()`][].
 
 ## Synchronous process creation
@@ -1437,14 +1438,14 @@ instances of `ChildProcess`.
 added: v0.7.7
 -->
 
-* `code` {number} The exit code if the child exited on its own.
+* `code` {number} The exit code if the child process exited on its own.
 * `signal` {string} The signal by which the child process was terminated.
 
 The `'close'` event is emitted after a process has ended _and_ the stdio
 streams of a child process have been closed. This is distinct from the
 [`'exit'`][] event, since multiple processes might share the same stdio
 streams. The `'close'` event will always emit after [`'exit'`][] was
-already emitted, or [`'error'`][] if the child failed to spawn.
+already emitted, or [`'error'`][] if the child process failed to spawn.
 
 ```cjs
 const { spawn } = require('node:child_process');
@@ -1515,7 +1516,7 @@ See also [`subprocess.kill()`][] and [`subprocess.send()`][].
 added: v0.1.90
 -->
 
-* `code` {number} The exit code if the child exited on its own.
+* `code` {number} The exit code if the child process exited on its own.
 * `signal` {string} The signal by which the child process was terminated.
 
 The `'exit'` event is emitted after the child process ends. If the process
@@ -1625,11 +1626,12 @@ send and receive messages from a child process. When `subprocess.connected` is
 added: v0.7.2
 -->
 
-Closes the IPC channel between parent and child, allowing the child to exit
-gracefully once there are no other connections keeping it alive. After calling
-this method the `subprocess.connected` and `process.connected` properties in
-both the parent and child (respectively) will be set to `false`, and it will be
-no longer possible to pass messages between the processes.
+Closes the IPC channel between parent and child processes, allowing the child
+process to exit gracefully once there are no other connections keeping it alive.
+After calling this method the `subprocess.connected` and
+`process.connected` properties in both the parent and child processes
+(respectively) will be set to `false`, and it will no longer be possible
+to pass messages between the processes.
 
 The `'disconnect'` event will be emitted when there are no messages in the
 process of being received. This will most often be triggered immediately after
@@ -1805,7 +1807,7 @@ added: v0.7.10
 
 Calling `subprocess.ref()` after making a call to `subprocess.unref()` will
 restore the removed reference count for the child process, forcing the parent
-to wait for the child to exit before exiting itself.
+process to wait for the child process to exit before exiting itself.
 
 ```cjs
 const { spawn } = require('node:child_process');
@@ -1862,9 +1864,9 @@ changes:
 * `callback` {Function}
 * Returns: {boolean}
 
-When an IPC channel has been established between the parent and child (
-i.e. when using [`child_process.fork()`][]), the `subprocess.send()` method can
-be used to send messages to the child process. When the child process is a
+When an IPC channel has been established between the parent and child processes
+(i.e. when using [`child_process.fork()`][]), the `subprocess.send()` method
+can be used to send messages to the child process. When the child process is a
 Node.js instance, these messages can be received via the [`'message'`][] event.
 
 The message goes through serialization and parsing. The resulting
@@ -1908,7 +1910,7 @@ process.send({ foo: 'bar', baz: NaN });
 ```
 
 Child Node.js processes will have a [`process.send()`][] method of their own
-that allows the child to send messages back to the parent.
+that allows the child process to send messages back to the parent process.
 
 There is a special case when sending a `{cmd: 'NODE_foo'}` message. Messages
 containing a `NODE_` prefix in the `cmd` property are reserved for use within
@@ -1919,14 +1921,14 @@ Applications should avoid using such messages or listening for
 `'internalMessage'` events as it is subject to change without notice.
 
 The optional `sendHandle` argument that may be passed to `subprocess.send()` is
-for passing a TCP server or socket object to the child process. The child will
+for passing a TCP server or socket object to the child process. The child process will
 receive the object as the second argument passed to the callback function
 registered on the [`'message'`][] event. Any data that is received
 and buffered in the socket will not be sent to the child. Sending IPC sockets is
 not supported on Windows.
 
 The optional `callback` is a function that is invoked after the message is
-sent but before the child may have received it. The function is called with a
+sent but before the child process may have received it. The function is called with a
 single argument: `null` on success, or an [`Error`][] object on failure.
 
 If no `callback` function is provided and the message cannot be sent, an
@@ -1975,7 +1977,7 @@ server.listen(1337, () => {
 });
 ```
 
-The child would then receive the server object as:
+The child process would then receive the server object as:
 
 ```js
 process.on('message', (m, server) => {
@@ -2109,7 +2111,7 @@ added: v0.1.90
 
 A `Readable Stream` that represents the child process's `stderr`.
 
-If the child was spawned with `stdio[2]` set to anything other than `'pipe'`,
+If the child process was spawned with `stdio[2]` set to anything other than `'pipe'`,
 then this will be `null`.
 
 `subprocess.stderr` is an alias for `subprocess.stdio[2]`. Both properties will
@@ -2128,10 +2130,10 @@ added: v0.1.90
 
 A `Writable Stream` that represents the child process's `stdin`.
 
-If a child process waits to read all of its input, the child will not continue
+If a child process waits to read all of its input, the child process will not continue
 until this stream has been closed via `end()`.
 
-If the child was spawned with `stdio[0]` set to anything other than `'pipe'`,
+If the child process was spawned with `stdio[0]` set to anything other than `'pipe'`,
 then this will be `null`.
 
 `subprocess.stdin` is an alias for `subprocess.stdio[0]`. Both properties will
@@ -2217,7 +2219,7 @@ added: v0.1.90
 
 A `Readable Stream` that represents the child process's `stdout`.
 
-If the child was spawned with `stdio[1]` set to anything other than `'pipe'`,
+If the child process was spawned with `stdio[1]` set to anything other than `'pipe'`,
 then this will be `null`.
 
 `subprocess.stdout` is an alias for `subprocess.stdio[1]`. Both properties will
@@ -2252,12 +2254,12 @@ if the child process could not be successfully spawned.
 added: v0.7.10
 -->
 
-By default, the parent will wait for the detached child to exit. To prevent the
-parent from waiting for a given `subprocess` to exit, use the
+By default, the parent process will wait for the detached child process to exit.
+To prevent the parent process from waiting for a given `subprocess` to exit, use the
 `subprocess.unref()` method. Doing so will cause the parent's event loop to not
-include the child in its reference count, allowing the parent to exit
+include the child process in its reference count, allowing the parent to exit
 independently of the child, unless there is an established IPC channel between
-the child and the parent.
+the child and the parent processes.
 
 ```cjs
 const { spawn } = require('node:child_process');

From d0417eaec9462696ee827043830c21556fffbbdd Mon Sep 17 00:00:00 2001
From: Aviv Keller <redyetidev@gmail.com>
Date: Fri, 8 Nov 2024 05:27:17 -0500
Subject: [PATCH 088/216] doc: add esm example in `path.md`
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55745
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Chemi Atlow <chemi@atlow.co.il>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
---
 doc/api/path.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/api/path.md b/doc/api/path.md
index f2240c396629ad..a0619d12287468 100644
--- a/doc/api/path.md
+++ b/doc/api/path.md
@@ -9,10 +9,14 @@
 The `node:path` module provides utilities for working with file and directory
 paths. It can be accessed using:
 
-```js
+```cjs
 const path = require('node:path');
 ```
 
+```mjs
+import path from 'node:path';
+```
+
 ## Windows vs. POSIX
 
 The default operation of the `node:path` module varies based on the operating

From c3913f9c871bb9c7db85eeaf74d3fc589bd0c36d Mon Sep 17 00:00:00 2001
From: Richard Lau <rlau@redhat.com>
Date: Wed, 6 Nov 2024 15:50:12 +0000
Subject: [PATCH 089/216] tools: fix c-ares updater script for Node.js 18

GitHub Actions is by default running the tools updater workflow
with Node.js 18. Avoid use of `import.meta.dirname`, which wasn't
backported to Node.js 18.

PR-URL: https://github.com/nodejs/node/pull/55717
Refs: https://github.com/nodejs/node/pull/55445
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 tools/dep_updaters/update-c-ares.mjs | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/dep_updaters/update-c-ares.mjs b/tools/dep_updaters/update-c-ares.mjs
index c5057e666de70d..bd99dfd34129e2 100644
--- a/tools/dep_updaters/update-c-ares.mjs
+++ b/tools/dep_updaters/update-c-ares.mjs
@@ -1,8 +1,9 @@
 // Synchronize the sources for our c-ares gyp file from c-ares' Makefiles.
 import { readFileSync, writeFileSync } from 'node:fs';
 import { join } from 'node:path';
+import { fileURLToPath } from 'node:url';
 
-const srcroot = join(import.meta.dirname, '..', '..');
+const srcroot = fileURLToPath(new URL('../../', import.meta.url));
 const options = { encoding: 'utf8' };
 
 // Extract list of sources from the gyp file.

From a9998799be08f9cb2a4f4d0161d2b8c03411674b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Juan=20Jos=C3=A9?= <soyjuanarbol@gmail.com>
Date: Thu, 7 Nov 2024 12:45:45 -0500
Subject: [PATCH 090/216] test: improve test coverage for `ServerResponse`
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Juan José Arboleda <soyjuanarbol@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/55711
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
---
 test/parallel/test-http-write-head.js | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/test/parallel/test-http-write-head.js b/test/parallel/test-http-write-head.js
index e132f607ba0ec7..1093a3ce5d60ca 100644
--- a/test/parallel/test-http-write-head.js
+++ b/test/parallel/test-http-write-head.js
@@ -51,6 +51,12 @@ const s = http.createServer(common.mustCall((req, res) => {
     }
   );
 
+  assert.throws(() => {
+    res.writeHead(200, ['invalid', 'headers', 'args']);
+  }, {
+    code: 'ERR_INVALID_ARG_VALUE'
+  });
+
   res.writeHead(200, { Test: '2' });
 
   assert.throws(() => {
@@ -78,7 +84,9 @@ function runTest() {
 
 {
   const server = http.createServer(common.mustCall((req, res) => {
-    res.writeHead(200, [ 'test', '1' ]);
+    res.writeHead(220, [ 'test', '1' ]); // 220 is not a standard status code
+    assert.strictEqual(res.statusMessage, 'unknown');
+
     assert.throws(() => res.writeHead(200, [ 'test2', '2' ]), {
       code: 'ERR_HTTP_HEADERS_SENT',
       name: 'Error',

From 39b89e90b4d85a7670284dad05fd7269ed7ce79e Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Thu, 7 Nov 2024 16:08:56 -0300
Subject: [PATCH 091/216] doc: enforce strict policy to semver-major releases
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/55732
Refs: https://github.com/nodejs/Release/issues/1054
Reviewed-By: Paolo Insogna <paolo@cowtech.it>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
---
 doc/contributing/releases.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/doc/contributing/releases.md b/doc/contributing/releases.md
index 0160d94fea5a61..400aa51fb2667d 100644
--- a/doc/contributing/releases.md
+++ b/doc/contributing/releases.md
@@ -1217,8 +1217,14 @@ the releaser, these must be kept in sync with `main`.
 The `vN.x` and `vN.x-staging` branches must be kept in sync with one another
 up until the date of the release.
 
-The TSC should be informed of any `SEMVER-MAJOR` commits that land within one
-month of the release.
+If a `SEMVER-MAJOR` pull request lands on the default branch within one month
+prior to the major release date, it must not be included on the new major
+staging branch, unless there is consensus from the Node.js releasers team to
+do so. This measure aims to ensure better stability for the release candidate
+(RC) phase, which begins approximately two weeks prior to the official release.
+By restricting `SEMVER-MAJOR` commits in this period, we provide more time for
+thorough testing and reduce the potential for major breakages, especially in
+LTS lines.
 
 ### Create release labels
 

From b40789e0858ec73086a38cb735c9250455b8c2ec Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Thu, 7 Nov 2024 18:48:46 -0300
Subject: [PATCH 092/216] test: add buffer to fs_permission tests

PR-URL: https://github.com/nodejs/node/pull/55734
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 test/fixtures/permission/fs-read.js  | 54 ++++++++++++++++++++++++
 test/fixtures/permission/fs-write.js | 63 ++++++++++++++++++++++++++++
 2 files changed, 117 insertions(+)

diff --git a/test/fixtures/permission/fs-read.js b/test/fixtures/permission/fs-read.js
index f0cd80b360d08b..a32e12f82aa8ab 100644
--- a/test/fixtures/permission/fs-read.js
+++ b/test/fixtures/permission/fs-read.js
@@ -20,6 +20,11 @@ const regularFile = __filename;
     permission: 'FileSystemRead',
     resource: path.toNamespacedPath(blockedFile),
   }));
+  fs.readFile(bufferBlockedFile, common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemRead',
+    resource: path.toNamespacedPath(blockedFile),
+  }));
   assert.throws(() => {
     fs.readFileSync(blockedFile);
   }, common.expectsError({
@@ -78,6 +83,11 @@ const regularFile = __filename;
     permission: 'FileSystemRead',
     resource: path.toNamespacedPath(blockedFile),
   }));
+  fs.stat(bufferBlockedFile, common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemRead',
+    resource: path.toNamespacedPath(blockedFile),
+  }));
   assert.throws(() => {
     fs.statSync(blockedFile);
   }, common.expectsError({
@@ -111,6 +121,11 @@ const regularFile = __filename;
     permission: 'FileSystemRead',
     resource: path.toNamespacedPath(blockedFile),
   }));
+  fs.access(bufferBlockedFile, fs.constants.R_OK, common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemRead',
+    resource: path.toNamespacedPath(blockedFile),
+  }));
   assert.throws(() => {
     fs.accessSync(blockedFileURL, fs.constants.R_OK);
   }, common.expectsError({
@@ -139,6 +154,11 @@ const regularFile = __filename;
     permission: 'FileSystemRead',
     resource: path.toNamespacedPath(blockedFile),
   }));
+  fs.copyFile(bufferBlockedFile, path.join(blockedFolder, 'any-other-file'), common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemRead',
+    resource: path.toNamespacedPath(blockedFile),
+  }));
   assert.throws(() => {
     fs.copyFileSync(blockedFileURL, path.join(blockedFolder, 'any-other-file'));
   }, common.expectsError({
@@ -165,6 +185,13 @@ const regularFile = __filename;
     // cpSync calls lstatSync before reading blockedFile
     resource: path.toNamespacedPath(blockedFile),
   }));
+  assert.throws(() => {
+    fs.cpSync(bufferBlockedFile, path.join(blockedFolder, 'any-other-file'));
+  }, common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemRead',
+    resource: path.toNamespacedPath(blockedFile),
+  }));
   assert.throws(() => {
     fs.cpSync(blockedFileURL, path.join(blockedFolder, 'any-other-file'));
   }, common.expectsError({
@@ -189,6 +216,11 @@ const regularFile = __filename;
     permission: 'FileSystemRead',
     resource: path.toNamespacedPath(blockedFile),
   }));
+  fs.open(bufferBlockedFile, 'r', common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemRead',
+    resource: path.toNamespacedPath(blockedFile),
+  }));
   assert.throws(() => {
     fs.openSync(blockedFileURL, 'r');
   }, common.expectsError({
@@ -313,6 +345,11 @@ const regularFile = __filename;
     permission: 'FileSystemRead',
     resource: path.toNamespacedPath(blockedFile),
   }));
+  fs.rename(bufferBlockedFile, 'newfile', common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemRead',
+    resource: path.toNamespacedPath(blockedFile),
+  }));
   assert.throws(() => {
     fs.renameSync(blockedFile, 'newfile');
   }, common.expectsError({
@@ -338,6 +375,13 @@ const regularFile = __filename;
     permission: 'FileSystemRead',
     resource: path.toNamespacedPath(blockedFile),
   }));
+  assert.throws(() => {
+    fs.openAsBlob(bufferBlockedFile);
+  }, common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemRead',
+    resource: path.toNamespacedPath(blockedFile),
+  }));
   assert.throws(() => {
     fs.openAsBlob(blockedFileURL);
   }, common.expectsError({
@@ -372,6 +416,11 @@ const regularFile = __filename;
     permission: 'FileSystemRead',
     resource: path.toNamespacedPath(blockedFile),
   }));
+  fs.statfs(bufferBlockedFile, common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemRead',
+    resource: path.toNamespacedPath(blockedFile),
+  }));
   assert.throws(() => {
     fs.statfsSync(blockedFile);
   }, common.expectsError({
@@ -406,6 +455,11 @@ const regularFile = __filename;
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
   }));
+  assert.throws(() => {
+    fs.lstatSync(bufferBlockedFile);
+  }, common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+  }));
   assert.throws(() => {
     fs.lstatSync(path.join(blockedFolder, 'anyfile'));
   }, common.expectsError({
diff --git a/test/fixtures/permission/fs-write.js b/test/fixtures/permission/fs-write.js
index 31e96860972a9f..0c0ec72602041a 100644
--- a/test/fixtures/permission/fs-write.js
+++ b/test/fixtures/permission/fs-write.js
@@ -11,6 +11,7 @@ const regularFolder = process.env.ALLOWEDFOLDER;
 const regularFile = process.env.ALLOWEDFILE;
 const blockedFolder = process.env.BLOCKEDFOLDER;
 const blockedFile = process.env.BLOCKEDFILE;
+const bufferBlockedFile = Buffer.from(process.env.BLOCKEDFILE);
 const blockedFileURL = require('url').pathToFileURL(process.env.BLOCKEDFILE);
 const relativeProtectedFile = process.env.RELATIVEBLOCKEDFILE;
 const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
@@ -34,6 +35,11 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
     permission: 'FileSystemWrite',
     resource: path.toNamespacedPath(blockedFile),
   }));
+  fs.writeFile(bufferBlockedFile, 'example', common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemWrite',
+    resource: path.toNamespacedPath(blockedFile),
+  }));
   assert.throws(() => {
     fs.writeFileSync(blockedFileURL, 'example');
   }, {
@@ -102,6 +108,13 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
     permission: 'FileSystemWrite',
     resource: path.toNamespacedPath(blockedFile),
   });
+  assert.throws(() => {
+    fs.utimes(bufferBlockedFile, new Date(), new Date(), () => {});
+  }, {
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemWrite',
+    resource: path.toNamespacedPath(blockedFile),
+  });
   assert.throws(() => {
     fs.utimes(blockedFileURL, new Date(), new Date(), () => {});
   }, {
@@ -135,6 +148,13 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
     permission: 'FileSystemWrite',
     resource: path.toNamespacedPath(blockedFile),
   });
+  assert.throws(() => {
+    fs.lutimes(bufferBlockedFile, new Date(), new Date(), () => {});
+  }, {
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemWrite',
+    resource: path.toNamespacedPath(blockedFile),
+  });
   assert.throws(() => {
     fs.lutimes(blockedFileURL, new Date(), new Date(), () => {});
   }, {
@@ -193,6 +213,11 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
     permission: 'FileSystemWrite',
     resource: path.toNamespacedPath(blockedFile),
   }));
+  fs.rename(bufferBlockedFile, path.join(blockedFile, 'renamed'), common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemWrite',
+    resource: path.toNamespacedPath(blockedFile),
+  }));
   assert.throws(() => {
     fs.renameSync(blockedFileURL, path.join(blockedFile, 'renamed'));
   }, {
@@ -245,6 +270,11 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
     permission: 'FileSystemWrite',
     resource: path.toNamespacedPath(path.join(relativeProtectedFolder, 'any-file')),
   }));
+  fs.copyFile(bufferBlockedFile, path.join(relativeProtectedFolder, 'any-file'), common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemWrite',
+    resource: path.toNamespacedPath(path.join(relativeProtectedFolder, 'any-file')),
+  }));
 }
 
 // fs.cp
@@ -295,6 +325,10 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
   }));
+  fs.open(bufferBlockedFile, fs.constants.O_RDWR | 0x10000000, common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemWrite',
+  }));
   assert.rejects(async () => {
     await fs.promises.open(blockedFile, fs.constants.O_RDWR | fs.constants.O_NOFOLLOW);
   }, {
@@ -322,6 +356,12 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
   });
+  assert.throws(() => {
+    fs.chmod(bufferBlockedFile, 0o755, common.mustNotCall());
+  }, {
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemWrite',
+  });
   assert.throws(() => {
     fs.chmod(blockedFileURL, 0o755, common.mustNotCall());
   }, {
@@ -358,6 +398,10 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
   }));
+  fs.appendFile(bufferBlockedFile, 'new data', common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemWrite',
+  }));
   assert.throws(() => {
     fs.appendFileSync(blockedFileURL, 'new data');
   }, {
@@ -378,6 +422,10 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
   }));
+  fs.chown(bufferBlockedFile, 1541, 999, common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemWrite',
+  }));
   assert.throws(() => {
     fs.chownSync(blockedFileURL, 1541, 999);
   }, {
@@ -399,6 +447,10 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
   }));
+  fs.lchown(bufferBlockedFile, 1541, 999, common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemWrite',
+  }));
   assert.throws(() => {
     fs.lchownSync(blockedFileURL, 1541, 999);
   }, {
@@ -426,6 +478,10 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
   }));
+  fs.link(bufferBlockedFile, path.join(blockedFolder, '/linked'), common.expectsError({
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemWrite',
+  }));
   assert.throws(() => {
     fs.linkSync(blockedFileURL, path.join(blockedFolder, '/linked'));
   }, {
@@ -450,6 +506,13 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
     permission: 'FileSystemWrite',
     resource: path.toNamespacedPath(blockedFile),
   });
+  assert.throws(() => {
+    fs.unlinkSync(bufferBlockedFile);
+  }, {
+    code: 'ERR_ACCESS_DENIED',
+    permission: 'FileSystemWrite',
+    resource: path.toNamespacedPath(blockedFile),
+  });
   fs.unlink(blockedFile, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',

From ebdbbc3ec8cc93f220d4962784dbed6de2206303 Mon Sep 17 00:00:00 2001
From: Livia Medeiros <livia@cirno.name>
Date: Fri, 8 Nov 2024 19:39:11 +0900
Subject: [PATCH 093/216] test: ensure that test priority is not higher than
 current priority

PR-URL: https://github.com/nodejs/node/pull/55739
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 test/parallel/test-os.js | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/test/parallel/test-os.js b/test/parallel/test-os.js
index f7059260ce507e..efaec2b3ead385 100644
--- a/test/parallel/test-os.js
+++ b/test/parallel/test-os.js
@@ -83,11 +83,12 @@ assert.ok(hostname.length > 0);
 
 // IBMi process priority is different.
 if (!common.isIBMi) {
-  const DUMMY_PRIORITY = 10;
-  os.setPriority(DUMMY_PRIORITY);
+  const { PRIORITY_BELOW_NORMAL, PRIORITY_LOW } = os.constants.priority;
+  const LOWER_PRIORITY = os.getPriority() > PRIORITY_BELOW_NORMAL ? PRIORITY_BELOW_NORMAL : PRIORITY_LOW;
+  os.setPriority(LOWER_PRIORITY);
   const priority = os.getPriority();
   is.number(priority);
-  assert.strictEqual(priority, DUMMY_PRIORITY);
+  assert.strictEqual(priority, LOWER_PRIORITY);
 }
 
 // On IBMi, os.uptime() returns 'undefined'

From 2abfdefcf3f5fe30712c71af1536eb8051dcaf12 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Fri, 8 Nov 2024 13:02:49 +0000
Subject: [PATCH 094/216] doc: clarify removal of experimental API does not
 require a deprecation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55746
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 doc/api/documentation.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/api/documentation.md b/doc/api/documentation.md
index edac7426fe0324..c6edb13ad613cd 100644
--- a/doc/api/documentation.md
+++ b/doc/api/documentation.md
@@ -44,6 +44,9 @@ The stability indexes are as follows:
 >   still occur in response to user feedback. We encourage user testing and
 >   feedback so that we can know that this feature is ready to be marked as
 >   stable.
+>
+> Experimental features typically leave the experimental status either by
+> graduating to stable or by being removed without a deprecation cycle.
 
 <!-- separator -->
 

From 0aa9e74027d600f836728a41766c07019296711a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Juan=20Jos=C3=A9?= <soyjuanarbol@gmail.com>
Date: Fri, 8 Nov 2024 11:32:30 -0500
Subject: [PATCH 095/216] test: improve test coverage for child process message
 sending
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Juan José Arboleda <soyjuanarbol@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/55710
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 test/parallel/test-child-process-send-type-error.js | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/test/parallel/test-child-process-send-type-error.js b/test/parallel/test-child-process-send-type-error.js
index 65c620dd29b3d2..91136cd03179fa 100644
--- a/test/parallel/test-child-process-send-type-error.js
+++ b/test/parallel/test-child-process-send-type-error.js
@@ -4,10 +4,10 @@ const common = require('../common');
 const assert = require('assert');
 const cp = require('child_process');
 
-function fail(proc, args) {
+function fail(proc, args, code = 'ERR_INVALID_ARG_TYPE') {
   assert.throws(() => {
     proc.send.apply(proc, args);
-  }, { code: 'ERR_INVALID_ARG_TYPE', name: 'TypeError' });
+  }, { code, name: 'TypeError' });
 }
 
 let target = process;
@@ -25,5 +25,6 @@ fail(target, ['msg', null, '']);
 fail(target, ['msg', null, 'foo']);
 fail(target, ['msg', null, 0]);
 fail(target, ['msg', null, NaN]);
+fail(target, ['msg', 'meow', undefined], 'ERR_INVALID_HANDLE_TYPE');
 fail(target, ['msg', null, 1]);
 fail(target, ['msg', null, null, common.mustNotCall()]);

From df1002438ac432a275db523d3912d52e5265fad1 Mon Sep 17 00:00:00 2001
From: Yagiz Nizipli <yagiz@nizipli.com>
Date: Fri, 8 Nov 2024 14:40:25 -0500
Subject: [PATCH 096/216] src: improve `node:os` userInfo performance

PR-URL: https://github.com/nodejs/node/pull/55719
Reviewed-By: Robert Nagy <ronagy@icloud.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
---
 lib/os.js      |  5 -----
 src/node_os.cc | 47 ++++++++++++++++++++++++++++-------------------
 2 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/lib/os.js b/lib/os.js
index bef4936c7c0bbe..f39e8319ca6b80 100644
--- a/lib/os.js
+++ b/lib/os.js
@@ -364,11 +364,6 @@ function userInfo(options) {
   if (user === undefined)
     throw new ERR_SYSTEM_ERROR(ctx);
 
-  if (isWindows) {
-    user.uid |= 0;
-    user.gid |= 0;
-  }
-
   return user;
 }
 
diff --git a/src/node_os.cc b/src/node_os.cc
index 7318c1a368d871..ce2af8d83b7443 100644
--- a/src/node_os.cc
+++ b/src/node_os.cc
@@ -287,21 +287,29 @@ static void GetUserInfo(const FunctionCallbackInfo<Value>& args) {
     encoding = UTF8;
   }
 
-  const int err = uv_os_get_passwd(&pwd);
-
-  if (err) {
+  if (const int err = uv_os_get_passwd(&pwd)) {
     CHECK_GE(args.Length(), 2);
     env->CollectUVExceptionInfo(args[args.Length() - 1], err,
                                 "uv_os_get_passwd");
     return args.GetReturnValue().SetUndefined();
   }
 
-  auto free_passwd = OnScopeLeave([&]() { uv_os_free_passwd(&pwd); });
+  auto free_passwd = OnScopeLeave([&] { uv_os_free_passwd(&pwd); });
 
   Local<Value> error;
 
+#ifdef _WIN32
+  Local<Value> uid = Number::New(
+      env->isolate(),
+      static_cast<double>(static_cast<int32_t>(pwd.uid & 0xFFFFFFFF)));
+  Local<Value> gid = Number::New(
+      env->isolate(),
+      static_cast<double>(static_cast<int32_t>(pwd.gid & 0xFFFFFFFF)));
+#else
   Local<Value> uid = Number::New(env->isolate(), pwd.uid);
   Local<Value> gid = Number::New(env->isolate(), pwd.gid);
+#endif
+
   MaybeLocal<Value> username = StringBytes::Encode(env->isolate(),
                                                    pwd.username,
                                                    encoding,
@@ -323,21 +331,22 @@ static void GetUserInfo(const FunctionCallbackInfo<Value>& args) {
     return;
   }
 
-  Local<Object> entry = Object::New(env->isolate());
-
-  entry->Set(env->context(), env->uid_string(), uid).Check();
-  entry->Set(env->context(), env->gid_string(), gid).Check();
-  entry->Set(env->context(),
-             env->username_string(),
-             username.ToLocalChecked()).Check();
-  entry->Set(env->context(),
-             env->homedir_string(),
-             homedir.ToLocalChecked()).Check();
-  entry->Set(env->context(),
-             env->shell_string(),
-             shell.ToLocalChecked()).Check();
-
-  args.GetReturnValue().Set(entry);
+  constexpr size_t kRetLength = 5;
+  std::array<Local<v8::Name>, kRetLength> names = {env->uid_string(),
+                                                   env->gid_string(),
+                                                   env->username_string(),
+                                                   env->homedir_string(),
+                                                   env->shell_string()};
+  std::array values = {uid,
+                       gid,
+                       username.ToLocalChecked(),
+                       homedir.ToLocalChecked(),
+                       shell.ToLocalChecked()};
+  args.GetReturnValue().Set(Object::New(env->isolate(),
+                                        Null(env->isolate()),
+                                        names.data(),
+                                        values.data(),
+                                        kRetLength));
 }
 
 

From 1c8c881aef184002388b120aaeed726f03b5fa8f Mon Sep 17 00:00:00 2001
From: Marco Ippolito <marcoippolito54@gmail.com>
Date: Sun, 10 Nov 2024 13:49:09 +0000
Subject: [PATCH 097/216] tools: make commit-queue check blocked label
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55781
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/workflows/commit-queue.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.github/workflows/commit-queue.yml b/.github/workflows/commit-queue.yml
index 3417ed62a53b6b..f51ebe2540320f 100644
--- a/.github/workflows/commit-queue.yml
+++ b/.github/workflows/commit-queue.yml
@@ -37,6 +37,7 @@ jobs:
                   --repo ${{ github.repository }} \
                   --base ${{ github.ref_name }} \
                   --label 'commit-queue' \
+                  --no-label 'blocked' \
                   --json 'number' \
                   --search "created:<=$(date --date="2 days ago"  +"%Y-%m-%dT%H:%M:%S%z")" \
                   -t '{{ range . }}{{ .number }} {{ end }}' \
@@ -46,6 +47,7 @@ jobs:
                   --base ${{ github.ref_name }} \
                   --label 'commit-queue' \
                   --label 'fast-track' \
+                  --no-label 'blocked' \
                   --json 'number' \
                   -t '{{ range . }}{{ .number }} {{ end }}' \
                   --limit 100)

From aad478e58d2650c5a16af77d0053e49cc61b174c Mon Sep 17 00:00:00 2001
From: Richard Lau <rlau@redhat.com>
Date: Sun, 10 Nov 2024 15:59:05 +0000
Subject: [PATCH 098/216] tools: fix exclude labels for commit-queue

The `gh` cli doesn't recognise `--no-label`. Instead exclude labels
via the `--search` flag.

Refs: https://github.com/nodejs/node/pull/55781#issuecomment-2466782441
Refs: https://github.com/cli/cli/discussions/4142
PR-URL: https://github.com/nodejs/node/pull/55809
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Beth Griggs <bethanyngriggs@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
---
 .github/workflows/commit-queue.yml | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/commit-queue.yml b/.github/workflows/commit-queue.yml
index f51ebe2540320f..0317e17e6605f4 100644
--- a/.github/workflows/commit-queue.yml
+++ b/.github/workflows/commit-queue.yml
@@ -37,9 +37,8 @@ jobs:
                   --repo ${{ github.repository }} \
                   --base ${{ github.ref_name }} \
                   --label 'commit-queue' \
-                  --no-label 'blocked' \
                   --json 'number' \
-                  --search "created:<=$(date --date="2 days ago"  +"%Y-%m-%dT%H:%M:%S%z")" \
+                  --search "created:<=$(date --date="2 days ago"  +"%Y-%m-%dT%H:%M:%S%z") -label:blocked" \
                   -t '{{ range . }}{{ .number }} {{ end }}' \
                   --limit 100)
           fast_track_prs=$(gh pr list \
@@ -47,7 +46,7 @@ jobs:
                   --base ${{ github.ref_name }} \
                   --label 'commit-queue' \
                   --label 'fast-track' \
-                  --no-label 'blocked' \
+                  --search "-label:blocked" \
                   --json 'number' \
                   -t '{{ range . }}{{ .number }} {{ end }}' \
                   --limit 100)

From 07f53b1d75a253b207ee3fd16a0e3db63eb6aca3 Mon Sep 17 00:00:00 2001
From: Gireesh Punathil <gpunathi@in.ibm.com>
Date: Mon, 11 Nov 2024 03:45:13 +0530
Subject: [PATCH 099/216] doc: clarify triager role
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Highlight additional points around the triager role.

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/55775
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
---
 doc/contributing/issues.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/doc/contributing/issues.md b/doc/contributing/issues.md
index 7cd043148ff000..e9bbc4de28ed9e 100644
--- a/doc/contributing/issues.md
+++ b/doc/contributing/issues.md
@@ -60,5 +60,17 @@ activities, such as applying labels and closing/reopening/assigning issues.
 For more information on the roles and permissions, see ["Permission levels for
 repositories owned by an organization"](https://docs.github.com/en/github/setting-up-and-managing-organizations-and-teams/repository-permission-levels-for-an-organization#permission-levels-for-repositories-owned-by-an-organization).
 
+When triaging issues and PRs:
+
+* Show patience and empathy, especially to first-time contributors.
+* Show no patience towards spam or trolls: close the issue without interacting with it and
+  report the user to the moderation repository.
+* If you're not able to reproduce an issue, leave a comment asking for more info and
+  add the `needs more info` label.
+* Ideally issues should be closed only when they have been fixed or answered (and
+  merged for pull requests). Closing an issue (or PR) earlier can be seen as
+  dismissive from the point of view of the reporter/author.
+  Always try to communicate the reason for closing the issue/PR.
+
 [Node.js help repository]: https://github.com/nodejs/help/issues
 [Technical Steering Committee (TSC) repository]: https://github.com/nodejs/TSC/issues

From 9f14ba808db48c02a7021ea23b695270369b295c Mon Sep 17 00:00:00 2001
From: Cheng <git@zcbenz.com>
Date: Mon, 11 Nov 2024 13:33:06 +0900
Subject: [PATCH 100/216] build: implement node_use_amaro flag in GN build

PR-URL: https://github.com/nodejs/node/pull/55798
Refs: https://github.com/nodejs/node/pull/54136
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 node.gni                      | 3 +++
 tools/generate_config_gypi.py | 1 +
 unofficial.gni                | 5 +++++
 3 files changed, 9 insertions(+)

diff --git a/node.gni b/node.gni
index f7d896f0ef1c13..1d35e81039e566 100644
--- a/node.gni
+++ b/node.gni
@@ -58,6 +58,9 @@ declare_args() {
   #   1. cross-os compilation is not supported.
   #   2. node_mksnapshot crashes when cross-compiling for x64 from arm64.
   node_use_node_snapshot = (host_os == target_os) && !(host_cpu == "arm64" && target_cpu == "x64")
+
+  # Build with Amaro (TypeScript utils).
+  node_use_amaro = true
 }
 
 assert(!node_enable_inspector || node_use_openssl,
diff --git a/tools/generate_config_gypi.py b/tools/generate_config_gypi.py
index 206da7f08eaea4..4c0caa711a09ea 100755
--- a/tools/generate_config_gypi.py
+++ b/tools/generate_config_gypi.py
@@ -61,6 +61,7 @@ def translate_config(out_dir, config, v8_config):
           eval(config['node_builtin_shareable_builtins']),
       'node_module_version': int(config['node_module_version']),
       'node_use_openssl': config['node_use_openssl'],
+      'node_use_amaro': config['node_use_amaro'],
       'node_use_node_code_cache': config['node_use_node_code_cache'],
       'node_use_node_snapshot': config['node_use_node_snapshot'],
       'v8_enable_inspector':  # this is actually a node misnomer
diff --git a/unofficial.gni b/unofficial.gni
index c967fabd0c4906..b079fca822df24 100644
--- a/unofficial.gni
+++ b/unofficial.gni
@@ -22,6 +22,11 @@ template("node_gn_build") {
     } else {
       defines += [ "HAVE_OPENSSL=0" ]
     }
+    if (node_use_amaro) {
+      defines += [ "HAVE_AMARO=1" ]
+    } else {
+      defines += [ "HAVE_AMARO=0" ]
+    }
     if (node_use_v8_platform) {
       defines += [ "NODE_USE_V8_PLATFORM=1" ]
     } else {

From 83b415e8f3dcd80cbec87eeb295c8d7aa6964753 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
 <41898282+github-actions[bot]@users.noreply.github.com>
Date: Mon, 11 Nov 2024 15:45:43 +0000
Subject: [PATCH 101/216] doc: run license-builder
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55813
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
---
 LICENSE | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/LICENSE b/LICENSE
index 3f6cacb6785850..e4f0804ee5bdb1 100644
--- a/LICENSE
+++ b/LICENSE
@@ -567,6 +567,34 @@ The externally maintained libraries used by Node.js are:
 
     ----------------------------------------------------------------------
 
+    JSON parsing library (nlohmann/json)
+
+    File: vendor/json/upstream/single_include/nlohmann/json.hpp (only for ICU4C)
+
+    MIT License
+
+    Copyright (c) 2013-2022 Niels Lohmann
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to deal
+    in the Software without restriction, including without limitation the rights
+    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+    copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+    The above copyright notice and this permission notice shall be included in all
+    copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+    SOFTWARE.
+
+    ----------------------------------------------------------------------
+
     File: aclocal.m4 (only for ICU4C)
     Section: pkg.m4 - Macros to locate and utilise pkg-config.
 
@@ -604,7 +632,7 @@ The externally maintained libraries used by Node.js are:
 
     This file is free software; you can redistribute it and/or modify it
     under the terms of the GNU General Public License as published by
-    the Free Software Foundation; either version 3 of the License, or
+    the Free Software Foundation, either version 3 of the License, or
     (at your option) any later version.
 
     This program is distributed in the hope that it will be useful, but

From 407992e2726709847df314f725c786809029c7f4 Mon Sep 17 00:00:00 2001
From: Aviv Keller <redyetidev@gmail.com>
Date: Mon, 11 Nov 2024 15:45:59 -0500
Subject: [PATCH 102/216] benchmark: add `test_runner/mock-fn`
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55771
Refs: https://github.com/nodejs/node/issues/55723
Reviewed-By: Pietro Marchini <pietro.marchini94@gmail.com>
Reviewed-By: Chemi Atlow <chemi@atlow.co.il>
Reviewed-By: Vinícius Lourenço Claro Cardoso <contact@viniciusl.com.br>
---
 benchmark/test_runner/mock-fn.js | 48 ++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
 create mode 100644 benchmark/test_runner/mock-fn.js

diff --git a/benchmark/test_runner/mock-fn.js b/benchmark/test_runner/mock-fn.js
new file mode 100644
index 00000000000000..6489ccf815e294
--- /dev/null
+++ b/benchmark/test_runner/mock-fn.js
@@ -0,0 +1,48 @@
+'use strict';
+
+const common = require('../common');
+const assert = require('node:assert');
+const { test } = require('node:test');
+
+const bench = common.createBenchmark(main, {
+  n: [1e6],
+  mode: ['define', 'execute'],
+}, {
+  // We don't want to test the reporter here
+  flags: ['--test-reporter=./benchmark/fixtures/empty-test-reporter.js'],
+});
+
+const noop = () => {};
+
+function benchmarkDefine(n) {
+  let noDead;
+  test((t) => {
+    bench.start();
+    for (let i = 0; i < n; i++) {
+      noDead = t.mock.fn(noop);
+    }
+    bench.end(n);
+    assert.ok(noDead);
+  });
+}
+
+function benchmarkExecute(n) {
+  let noDead;
+  test((t) => {
+    const mocked = t.mock.fn(noop);
+    bench.start();
+    for (let i = 0; i < n; i++) {
+      noDead = mocked();
+    }
+    bench.end(n);
+    assert.strictEqual(noDead, noop());
+  });
+}
+
+function main({ n, mode }) {
+  if (mode === 'define') {
+    benchmarkDefine(n);
+  } else if (mode === 'execute') {
+    benchmarkExecute(n);
+  }
+}

From 753c3b322fd5bd375aceaab912da33daaaa7592b Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Mon, 11 Nov 2024 20:01:23 -0500
Subject: [PATCH 103/216] deps: update c-ares to v1.34.3

PR-URL: https://github.com/nodejs/node/pull/55803
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
---
 deps/cares/CMakeLists.txt                    |    6 +-
 deps/cares/Makefile.in                       |   47 +-
 deps/cares/RELEASE-NOTES.md                  |   31 +
 deps/cares/aclocal.m4                        |  421 +-
 deps/cares/aminclude_static.am               |    2 +-
 deps/cares/config/ltmain.sh                  |    0
 deps/cares/configure                         | 4448 +++++++-----------
 deps/cares/configure.ac                      |    9 +-
 deps/cares/docs/Makefile.in                  |   27 +-
 deps/cares/include/Makefile.in               |   31 +-
 deps/cares/include/ares_version.h            |    4 +-
 deps/cares/m4/libtool.m4                     |  549 +--
 deps/cares/m4/ltoptions.m4                   |  108 +-
 deps/cares/m4/ltsugar.m4                     |    2 +-
 deps/cares/m4/ltversion.m4                   |   13 +-
 deps/cares/m4/lt~obsolete.m4                 |    4 +-
 deps/cares/src/Makefile.in                   |   20 +-
 deps/cares/src/lib/Makefile.in               |  103 +-
 deps/cares/src/lib/ares_config.h.in          |   37 +-
 deps/cares/src/lib/ares_getaddrinfo.c        |   34 +-
 deps/cares/src/lib/ares_process.c            |   55 +-
 deps/cares/src/lib/ares_send.c               |    5 +
 deps/cares/src/lib/event/ares_event_thread.c |    8 +-
 deps/cares/src/tools/Makefile.in             |   35 +-
 24 files changed, 2523 insertions(+), 3476 deletions(-)
 mode change 100644 => 100755 deps/cares/config/ltmain.sh
 mode change 100644 => 100755 deps/cares/m4/libtool.m4
 mode change 100644 => 100755 deps/cares/m4/ltoptions.m4
 mode change 100644 => 100755 deps/cares/m4/ltsugar.m4
 mode change 100644 => 100755 deps/cares/m4/ltversion.m4
 mode change 100644 => 100755 deps/cares/m4/lt~obsolete.m4

diff --git a/deps/cares/CMakeLists.txt b/deps/cares/CMakeLists.txt
index cf9a516414d1ab..f6560d56b08ddd 100644
--- a/deps/cares/CMakeLists.txt
+++ b/deps/cares/CMakeLists.txt
@@ -12,7 +12,7 @@ INCLUDE (CheckCSourceCompiles)
 INCLUDE (CheckStructHasMember)
 INCLUDE (CheckLibraryExists)
 
-PROJECT (c-ares LANGUAGES C VERSION "1.34.2" )
+PROJECT (c-ares LANGUAGES C VERSION "1.34.3" )
 
 # Set this version before release
 SET (CARES_VERSION "${PROJECT_VERSION}")
@@ -30,7 +30,7 @@ INCLUDE (GNUInstallDirs) # include this *AFTER* PROJECT(), otherwise paths are w
 # For example, a version of 4:0:2 would generate output such as:
 #    libname.so   -> libname.so.2
 #    libname.so.2 -> libname.so.2.2.0
-SET (CARES_LIB_VERSIONINFO "21:1:19")
+SET (CARES_LIB_VERSIONINFO "21:2:19")
 
 
 OPTION (CARES_STATIC        "Build as a static library"                                             OFF)
@@ -263,7 +263,7 @@ ENDIF ()
 # Set system-specific compiler flags
 IF (CMAKE_SYSTEM_NAME STREQUAL "Darwin")
 	LIST (APPEND SYSFLAGS -D_DARWIN_C_SOURCE)
-ELSEIF (CMAKE_SYSTEM_NAME STREQUAL "Linux")
+ELSEIF (CMAKE_SYSTEM_NAME STREQUAL "Linux" OR CMAKE_SYSTEM_NAME STREQUAL "Android")
 	LIST (APPEND SYSFLAGS -D_GNU_SOURCE -D_POSIX_C_SOURCE=200809L -D_XOPEN_SOURCE=700)
 ELSEIF (CMAKE_SYSTEM_NAME STREQUAL "SunOS")
 	LIST (APPEND SYSFLAGS -D__EXTENSIONS__ -D_REENTRANT -D_XOPEN_SOURCE=600)
diff --git a/deps/cares/Makefile.in b/deps/cares/Makefile.in
index 706dafdbdfc5fa..ba78cb77cbe335 100644
--- a/deps/cares/Makefile.in
+++ b/deps/cares/Makefile.in
@@ -1,7 +1,7 @@
-# Makefile.in generated by automake 1.17 from Makefile.am.
+# Makefile.in generated by automake 1.16.5 from Makefile.am.
 # @configure_input@
 
-# Copyright (C) 1994-2024 Free Software Foundation, Inc.
+# Copyright (C) 1994-2021 Free Software Foundation, Inc.
 
 # This Makefile.in is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -91,8 +91,6 @@ am__make_running_with_option = \
   test $$has_opt = yes
 am__make_dryrun = (target_option=n; $(am__make_running_with_option))
 am__make_keepgoing = (target_option=k; $(am__make_running_with_option))
-am__rm_f = rm -f $(am__rm_f_notfound)
-am__rm_rf = rm -rf $(am__rm_f_notfound)
 pkgdatadir = $(datadir)/@PACKAGE@
 pkgincludedir = $(includedir)/@PACKAGE@
 pkglibdir = $(libdir)/@PACKAGE@
@@ -196,9 +194,10 @@ am__base_list = \
   sed '$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;s/\n/ /g' | \
   sed '$$!N;$$!N;$$!N;$$!N;s/\n/ /g'
 am__uninstall_files_from_dir = { \
-  { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \
-  || { echo " ( cd '$$dir' && rm -f" $$files ")"; \
-       $(am__cd) "$$dir" && echo $$files | $(am__xargs_n) 40 $(am__rm_f); }; \
+  test -z "$$files" \
+    || { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \
+    || { echo " ( cd '$$dir' && rm -f" $$files ")"; \
+         $(am__cd) "$$dir" && rm -f $$files; }; \
   }
 am__installdirs = "$(DESTDIR)$(pkgconfigdir)"
 DATA = $(pkgconfig_DATA)
@@ -239,8 +238,8 @@ distdir = $(PACKAGE)-$(VERSION)
 top_distdir = $(distdir)
 am__remove_distdir = \
   if test -d "$(distdir)"; then \
-    find "$(distdir)" -type d ! -perm -700 -exec chmod u+rwx {} ';' \
-      ; rm -rf "$(distdir)" \
+    find "$(distdir)" -type d ! -perm -200 -exec chmod u+w {} ';' \
+      && rm -rf "$(distdir)" \
       || { sleep 5 && rm -rf "$(distdir)"; }; \
   else :; fi
 am__post_remove_distdir = $(am__remove_distdir)
@@ -270,16 +269,14 @@ am__relativize = \
   done; \
   reldir="$$dir2"
 DIST_ARCHIVES = $(distdir).tar.gz
-GZIP_ENV = -9
+GZIP_ENV = --best
 DIST_TARGETS = dist-gzip
 # Exists only to be overridden by the user if desired.
 AM_DISTCHECK_DVI_TARGET = dvi
 distuninstallcheck_listfiles = find . -type f -print
 am__distuninstallcheck_listfiles = $(distuninstallcheck_listfiles) \
   | sed 's|^\./|$(prefix)/|' | grep -v '$(infodir)/dir$$'
-distcleancheck_listfiles = \
-  find . \( -type f -a \! \
-            \( -name .nfs* -o -name .smb* -o -name .__afs* \) \) -print
+distcleancheck_listfiles = find . -type f -print
 ACLOCAL = @ACLOCAL@
 AMTAR = @AMTAR@
 AM_CFLAGS = @AM_CFLAGS@
@@ -325,7 +322,6 @@ EGREP = @EGREP@
 ETAGS = @ETAGS@
 EXEEXT = @EXEEXT@
 FGREP = @FGREP@
-FILECMD = @FILECMD@
 GCOV = @GCOV@
 GENHTML = @GENHTML@
 GMOCK112_CFLAGS = @GMOCK112_CFLAGS@
@@ -392,10 +388,8 @@ ac_ct_DUMPBIN = @ac_ct_DUMPBIN@
 am__include = @am__include@
 am__leading_dot = @am__leading_dot@
 am__quote = @am__quote@
-am__rm_f_notfound = @am__rm_f_notfound@
 am__tar = @am__tar@
 am__untar = @am__untar@
-am__xargs_n = @am__xargs_n@
 ax_pthread_config = @ax_pthread_config@
 bindir = @bindir@
 build = @build@
@@ -641,7 +635,7 @@ distdir: $(BUILT_SOURCES)
 
 distdir-am: $(DISTFILES)
 	$(am__remove_distdir)
-	$(AM_V_at)$(MKDIR_P) "$(distdir)"
+	test -d "$(distdir)" || mkdir "$(distdir)"
 	@srcdirstrip=`echo "$(srcdir)" | sed 's/[].[^$$\\*]/\\\\&/g'`; \
 	topsrcdirstrip=`echo "$(top_srcdir)" | sed 's/[].[^$$\\*]/\\\\&/g'`; \
 	list='$(DISTFILES)'; \
@@ -755,7 +749,7 @@ dist dist-all:
 distcheck: dist
 	case '$(DIST_ARCHIVES)' in \
 	*.tar.gz*) \
-	  eval GZIP= gzip -dc $(distdir).tar.gz | $(am__untar) ;;\
+	  eval GZIP= gzip $(GZIP_ENV) -dc $(distdir).tar.gz | $(am__untar) ;;\
 	*.tar.bz2*) \
 	  bzip2 -dc $(distdir).tar.bz2 | $(am__untar) ;;\
 	*.tar.lz*) \
@@ -765,7 +759,7 @@ distcheck: dist
 	*.tar.Z*) \
 	  uncompress -c $(distdir).tar.Z | $(am__untar) ;;\
 	*.shar.gz*) \
-	  eval GZIP= gzip -dc $(distdir).shar.gz | unshar ;;\
+	  eval GZIP= gzip $(GZIP_ENV) -dc $(distdir).shar.gz | unshar ;;\
 	*.zip*) \
 	  unzip $(distdir).zip ;;\
 	*.tar.zst*) \
@@ -866,12 +860,12 @@ install-strip:
 mostlyclean-generic:
 
 clean-generic:
-	-$(am__rm_f) $(CLEANFILES)
+	-test -z "$(CLEANFILES)" || rm -f $(CLEANFILES)
 
 distclean-generic:
-	-$(am__rm_f) $(CONFIG_CLEAN_FILES)
-	-test . = "$(srcdir)" || $(am__rm_f) $(CONFIG_CLEAN_VPATH_FILES)
-	-$(am__rm_f) $(DISTCLEANFILES)
+	-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)
+	-test . = "$(srcdir)" || test -z "$(CONFIG_CLEAN_VPATH_FILES)" || rm -f $(CONFIG_CLEAN_VPATH_FILES)
+	-test -z "$(DISTCLEANFILES)" || rm -f $(DISTCLEANFILES)
 
 maintainer-clean-generic:
 	@echo "This command is intended for maintainers to use"
@@ -980,10 +974,3 @@ dist-hook:
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
 .NOEXPORT:
-
-# Tell GNU make to disable its built-in pattern rules.
-%:: %,v
-%:: RCS/%,v
-%:: RCS/%
-%:: s.%
-%:: SCCS/s.%
diff --git a/deps/cares/RELEASE-NOTES.md b/deps/cares/RELEASE-NOTES.md
index cbd4788600f3ac..f9d58d278432f1 100644
--- a/deps/cares/RELEASE-NOTES.md
+++ b/deps/cares/RELEASE-NOTES.md
@@ -1,3 +1,34 @@
+## c-ares version 1.34.3 - November 9 2024
+
+This is a bugfix release.
+
+Changes:
+* Build the release package in an automated way so we can provide
+  provenance as per [SLSA3](https://slsa.dev/).
+  [PR #906](https://github.com/c-ares/c-ares/pull/906)
+
+Bugfixes:
+* Some upstream servers are non-compliant with EDNS options, resend queries
+  without EDNS. [Issue #911](https://github.com/c-ares/c-ares/issues/911)
+* Android: <=7 needs sys/system_properties.h
+  [a70637c](https://github.com/c-ares/c-ares/commit/a70637c)
+* Android: CMake needs `-D_GNU_SOURCE` and others.
+  [PR #914](https://github.com/c-ares/c-ares/pull/914)
+* TSAN warns on missing lock, but lock isn't actually necessary.
+  [PR #915](https://github.com/c-ares/c-ares/pull/915)
+* `ares_getaddrinfo()` for `AF_UNSPEC` should retry IPv4 if only IPv6 is
+  received. [765d558](https://github.com/c-ares/c-ares/commit/765d558)
+* `ares_send()` shouldn't return `ARES_EBADRESP`, it's `ARES_EBADQUERY`.
+  [91519e7](https://github.com/c-ares/c-ares/commit/91519e7)
+* Fix typos in man pages. [PR #905](https://github.com/c-ares/c-ares/pull/905)
+
+Thanks go to these friendly people for their efforts and contributions for this
+release:
+
+* Brad House (@bradh352)
+* Jiwoo Park (@jimmy-park)
+
+
 ## c-ares version 1.34.2 - October 15 2024
 
 This release contains a fix for downstream packages detecting the c-ares
diff --git a/deps/cares/aclocal.m4 b/deps/cares/aclocal.m4
index 68e283c8e5941a..ce7ad1c8a86a43 100644
--- a/deps/cares/aclocal.m4
+++ b/deps/cares/aclocal.m4
@@ -1,6 +1,6 @@
-# generated automatically by aclocal 1.17 -*- Autoconf -*-
+# generated automatically by aclocal 1.16.5 -*- Autoconf -*-
 
-# Copyright (C) 1996-2024 Free Software Foundation, Inc.
+# Copyright (C) 1996-2021 Free Software Foundation, Inc.
 
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -14,13 +14,13 @@
 m4_ifndef([AC_CONFIG_MACRO_DIRS], [m4_defun([_AM_CONFIG_MACRO_DIRS], [])m4_defun([AC_CONFIG_MACRO_DIRS], [_AM_CONFIG_MACRO_DIRS($@)])])
 m4_ifndef([AC_AUTOCONF_VERSION],
   [m4_copy([m4_PACKAGE_VERSION], [AC_AUTOCONF_VERSION])])dnl
-m4_if(m4_defn([AC_AUTOCONF_VERSION]), [2.72],,
-[m4_warning([this file was generated for autoconf 2.72.
+m4_if(m4_defn([AC_AUTOCONF_VERSION]), [2.71],,
+[m4_warning([this file was generated for autoconf 2.71.
 You have another version of autoconf.  It may work, but is not guaranteed to.
 If you have problems, you may need to regenerate the build system entirely.
 To do so, use the procedure documented by the package, typically 'autoreconf'.])])
 
-# Copyright (C) 2002-2024 Free Software Foundation, Inc.
+# Copyright (C) 2002-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -32,10 +32,10 @@ To do so, use the procedure documented by the package, typically 'autoreconf'.])
 # generated from the m4 files accompanying Automake X.Y.
 # (This private macro should not be called outside this file.)
 AC_DEFUN([AM_AUTOMAKE_VERSION],
-[am__api_version='1.17'
+[am__api_version='1.16'
 dnl Some users find AM_AUTOMAKE_VERSION and mistake it for a way to
 dnl require some minimum version.  Point them to the right macro.
-m4_if([$1], [1.17], [],
+m4_if([$1], [1.16.5], [],
       [AC_FATAL([Do not call $0, use AM_INIT_AUTOMAKE([$1]).])])dnl
 ])
 
@@ -51,14 +51,14 @@ m4_define([_AM_AUTOCONF_VERSION], [])
 # Call AM_AUTOMAKE_VERSION and AM_AUTOMAKE_VERSION so they can be traced.
 # This function is AC_REQUIREd by AM_INIT_AUTOMAKE.
 AC_DEFUN([AM_SET_CURRENT_AUTOMAKE_VERSION],
-[AM_AUTOMAKE_VERSION([1.17])dnl
+[AM_AUTOMAKE_VERSION([1.16.5])dnl
 m4_ifndef([AC_AUTOCONF_VERSION],
   [m4_copy([m4_PACKAGE_VERSION], [AC_AUTOCONF_VERSION])])dnl
 _AM_AUTOCONF_VERSION(m4_defn([AC_AUTOCONF_VERSION]))])
 
 # AM_AUX_DIR_EXPAND                                         -*- Autoconf -*-
 
-# Copyright (C) 2001-2024 Free Software Foundation, Inc.
+# Copyright (C) 2001-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -110,7 +110,7 @@ am_aux_dir=`cd "$ac_aux_dir" && pwd`
 
 # AM_COND_IF                                            -*- Autoconf -*-
 
-# Copyright (C) 2008-2024 Free Software Foundation, Inc.
+# Copyright (C) 2008-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -147,7 +147,7 @@ fi[]dnl
 
 # AM_CONDITIONAL                                            -*- Autoconf -*-
 
-# Copyright (C) 1997-2024 Free Software Foundation, Inc.
+# Copyright (C) 1997-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -178,7 +178,7 @@ AC_CONFIG_COMMANDS_PRE(
 Usually this means the macro was only invoked conditionally.]])
 fi])])
 
-# Copyright (C) 1999-2024 Free Software Foundation, Inc.
+# Copyright (C) 1999-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -310,7 +310,7 @@ AC_CACHE_CHECK([dependency style of $depcc],
       # icc doesn't choke on unknown options, it will just issue warnings
       # or remarks (even with -Werror).  So we grep stderr for any message
       # that says an option was ignored or not supported.
-      # When given -MP, icc 7.0 and 7.1 complain thus:
+      # When given -MP, icc 7.0 and 7.1 complain thusly:
       #   icc: Command line warning: ignoring option '-M'; no argument required
       # The diagnosis changed in icc 8.0:
       #   icc: Command line remark: option '-MP' not supported
@@ -369,7 +369,7 @@ _AM_SUBST_NOTMAKE([am__nodep])dnl
 
 # Generate code to set up dependency tracking.              -*- Autoconf -*-
 
-# Copyright (C) 1999-2024 Free Software Foundation, Inc.
+# Copyright (C) 1999-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -437,7 +437,7 @@ AC_DEFUN([AM_OUTPUT_DEPENDENCY_COMMANDS],
 
 # Do all the work for Automake.                             -*- Autoconf -*-
 
-# Copyright (C) 1996-2024 Free Software Foundation, Inc.
+# Copyright (C) 1996-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -571,7 +571,7 @@ if test -z "$CSCOPE"; then
 fi
 AC_SUBST([CSCOPE])
 
-AC_REQUIRE([_AM_SILENT_RULES])dnl
+AC_REQUIRE([AM_SILENT_RULES])dnl
 dnl The testsuite driver may need to know about EXEEXT, so add the
 dnl 'am__EXEEXT' conditional if _AM_COMPILER_EXEEXT was seen.  This
 dnl macro is hooked onto _AC_COMPILER_EXEEXT early, see below.
@@ -579,9 +579,47 @@ AC_CONFIG_COMMANDS_PRE(dnl
 [m4_provide_if([_AM_COMPILER_EXEEXT],
   [AM_CONDITIONAL([am__EXEEXT], [test -n "$EXEEXT"])])])dnl
 
-AC_REQUIRE([_AM_PROG_RM_F])
-AC_REQUIRE([_AM_PROG_XARGS_N])
+# POSIX will say in a future version that running "rm -f" with no argument
+# is OK; and we want to be able to make that assumption in our Makefile
+# recipes.  So use an aggressive probe to check that the usage we want is
+# actually supported "in the wild" to an acceptable degree.
+# See automake bug#10828.
+# To make any issue more visible, cause the running configure to be aborted
+# by default if the 'rm' program in use doesn't match our expectations; the
+# user can still override this though.
+if rm -f && rm -fr && rm -rf; then : OK; else
+  cat >&2 <<'END'
+Oops!
+
+Your 'rm' program seems unable to run without file operands specified
+on the command line, even when the '-f' option is present.  This is contrary
+to the behaviour of most rm programs out there, and not conforming with
+the upcoming POSIX standard: <http://austingroupbugs.net/view.php?id=542>
+
+Please tell bug-automake@gnu.org about your system, including the value
+of your $PATH and any error possibly output before this message.  This
+can help us improve future automake versions.
 
+END
+  if test x"$ACCEPT_INFERIOR_RM_PROGRAM" = x"yes"; then
+    echo 'Configuration will proceed anyway, since you have set the' >&2
+    echo 'ACCEPT_INFERIOR_RM_PROGRAM variable to "yes"' >&2
+    echo >&2
+  else
+    cat >&2 <<'END'
+Aborting the configuration process, to ensure you take notice of the issue.
+
+You can download and install GNU coreutils to get an 'rm' implementation
+that behaves properly: <https://www.gnu.org/software/coreutils/>.
+
+If you want to complete the configuration process using your problematic
+'rm' anyway, export the environment variable ACCEPT_INFERIOR_RM_PROGRAM
+to "yes", and re-run configure.
+
+END
+    AC_MSG_ERROR([Your 'rm' program is bad, sorry.])
+  fi
+fi
 dnl The trailing newline in this macro's definition is deliberate, for
 dnl backward compatibility and to allow trailing 'dnl'-style comments
 dnl after the AM_INIT_AUTOMAKE invocation. See automake bug#16841.
@@ -614,7 +652,7 @@ for _am_header in $config_headers :; do
 done
 echo "timestamp for $_am_arg" >`AS_DIRNAME(["$_am_arg"])`/stamp-h[]$_am_stamp_count])
 
-# Copyright (C) 2001-2024 Free Software Foundation, Inc.
+# Copyright (C) 2001-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -635,7 +673,7 @@ if test x"${install_sh+set}" != xset; then
 fi
 AC_SUBST([install_sh])])
 
-# Copyright (C) 2003-2024 Free Software Foundation, Inc.
+# Copyright (C) 2003-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -657,7 +695,7 @@ AC_SUBST([am__leading_dot])])
 # Add --enable-maintainer-mode option to configure.         -*- Autoconf -*-
 # From Jim Meyering
 
-# Copyright (C) 1996-2024 Free Software Foundation, Inc.
+# Copyright (C) 1996-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -692,7 +730,7 @@ AC_MSG_CHECKING([whether to enable maintainer-specific portions of Makefiles])
 
 # Check to see how 'make' treats includes.	            -*- Autoconf -*-
 
-# Copyright (C) 2001-2024 Free Software Foundation, Inc.
+# Copyright (C) 2001-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -735,7 +773,7 @@ AC_SUBST([am__quote])])
 
 # Fake the existence of programs that GNU maintainers use.  -*- Autoconf -*-
 
-# Copyright (C) 1997-2024 Free Software Foundation, Inc.
+# Copyright (C) 1997-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -769,7 +807,7 @@ fi
 
 # Helper functions for option handling.                     -*- Autoconf -*-
 
-# Copyright (C) 2001-2024 Free Software Foundation, Inc.
+# Copyright (C) 2001-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -798,7 +836,7 @@ AC_DEFUN([_AM_SET_OPTIONS],
 AC_DEFUN([_AM_IF_OPTION],
 [m4_ifset(_AM_MANGLE_OPTION([$1]), [$2], [$3])])
 
-# Copyright (C) 1999-2024 Free Software Foundation, Inc.
+# Copyright (C) 1999-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -845,23 +883,7 @@ AC_LANG_POP([C])])
 # For backward compatibility.
 AC_DEFUN_ONCE([AM_PROG_CC_C_O], [AC_REQUIRE([AC_PROG_CC])])
 
-# Copyright (C) 2022-2024 Free Software Foundation, Inc.
-#
-# This file is free software; the Free Software Foundation
-# gives unlimited permission to copy and/or distribute it,
-# with or without modifications, as long as this notice is preserved.
-
-# _AM_PROG_RM_F
-# ---------------
-# Check whether 'rm -f' without any arguments works.
-# https://bugs.gnu.org/10828
-AC_DEFUN([_AM_PROG_RM_F],
-[am__rm_f_notfound=
-AS_IF([(rm -f && rm -fr && rm -rf) 2>/dev/null], [], [am__rm_f_notfound='""'])
-AC_SUBST(am__rm_f_notfound)
-])
-
-# Copyright (C) 2001-2024 Free Software Foundation, Inc.
+# Copyright (C) 2001-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -880,169 +902,16 @@ AC_DEFUN([AM_RUN_LOG],
 
 # Check to make sure that the build environment is sane.    -*- Autoconf -*-
 
-# Copyright (C) 1996-2024 Free Software Foundation, Inc.
+# Copyright (C) 1996-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
 # with or without modifications, as long as this notice is preserved.
 
-# _AM_SLEEP_FRACTIONAL_SECONDS
-# ----------------------------
-AC_DEFUN([_AM_SLEEP_FRACTIONAL_SECONDS], [dnl
-AC_CACHE_CHECK([whether sleep supports fractional seconds],
-               am_cv_sleep_fractional_seconds, [dnl
-AS_IF([sleep 0.001 2>/dev/null], [am_cv_sleep_fractional_seconds=yes],
-                                 [am_cv_sleep_fractional_seconds=no])
-])])
-
-# _AM_FILESYSTEM_TIMESTAMP_RESOLUTION
-# -----------------------------------
-# Determine the filesystem's resolution for file modification
-# timestamps.  The coarsest we know of is FAT, with a resolution
-# of only two seconds, even with the most recent "exFAT" extensions.
-# The finest (e.g. ext4 with large inodes, XFS, ZFS) is one
-# nanosecond, matching clock_gettime.  However, it is probably not
-# possible to delay execution of a shell script for less than one
-# millisecond, due to process creation overhead and scheduling
-# granularity, so we don't check for anything finer than that. (See below.)
-AC_DEFUN([_AM_FILESYSTEM_TIMESTAMP_RESOLUTION], [dnl
-AC_REQUIRE([_AM_SLEEP_FRACTIONAL_SECONDS])
-AC_CACHE_CHECK([filesystem timestamp resolution],
-               am_cv_filesystem_timestamp_resolution, [dnl
-# Default to the worst case.
-am_cv_filesystem_timestamp_resolution=2
-
-# Only try to go finer than 1 sec if sleep can do it.
-# Don't try 1 sec, because if 0.01 sec and 0.1 sec don't work,
-# - 1 sec is not much of a win compared to 2 sec, and
-# - it takes 2 seconds to perform the test whether 1 sec works.
-# 
-# Instead, just use the default 2s on platforms that have 1s resolution,
-# accept the extra 1s delay when using $sleep in the Automake tests, in
-# exchange for not incurring the 2s delay for running the test for all
-# packages.
-#
-am_try_resolutions=
-if test "$am_cv_sleep_fractional_seconds" = yes; then
-  # Even a millisecond often causes a bunch of false positives,
-  # so just try a hundredth of a second. The time saved between .001 and
-  # .01 is not terribly consequential.
-  am_try_resolutions="0.01 0.1 $am_try_resolutions"
-fi
-
-# In order to catch current-generation FAT out, we must *modify* files
-# that already exist; the *creation* timestamp is finer.  Use names
-# that make ls -t sort them differently when they have equal
-# timestamps than when they have distinct timestamps, keeping
-# in mind that ls -t prints the *newest* file first.
-rm -f conftest.ts?
-: > conftest.ts1
-: > conftest.ts2
-: > conftest.ts3
-
-# Make sure ls -t actually works.  Do 'set' in a subshell so we don't
-# clobber the current shell's arguments. (Outer-level square brackets
-# are removed by m4; they're present so that m4 does not expand
-# <dollar><star>; be careful, easy to get confused.)
-if (
-     set X `[ls -t conftest.ts[12]]` &&
-     {
-       test "$[]*" != "X conftest.ts1 conftest.ts2" ||
-       test "$[]*" != "X conftest.ts2 conftest.ts1";
-     }
-); then :; else
-  # If neither matched, then we have a broken ls.  This can happen
-  # if, for instance, CONFIG_SHELL is bash and it inherits a
-  # broken ls alias from the environment.  This has actually
-  # happened.  Such a system could not be considered "sane".
-  _AS_ECHO_UNQUOTED(
-    ["Bad output from ls -t: \"`[ls -t conftest.ts[12]]`\""],
-    [AS_MESSAGE_LOG_FD])
-  AC_MSG_FAILURE([ls -t produces unexpected output.
-Make sure there is not a broken ls alias in your environment.])
-fi
-
-for am_try_res in $am_try_resolutions; do
-  # Any one fine-grained sleep might happen to cross the boundary
-  # between two values of a coarser actual resolution, but if we do
-  # two fine-grained sleeps in a row, at least one of them will fall
-  # entirely within a coarse interval.
-  echo alpha > conftest.ts1
-  sleep $am_try_res
-  echo beta > conftest.ts2
-  sleep $am_try_res
-  echo gamma > conftest.ts3
-
-  # We assume that 'ls -t' will make use of high-resolution
-  # timestamps if the operating system supports them at all.
-  if (set X `ls -t conftest.ts?` &&
-      test "$[]2" = conftest.ts3 &&
-      test "$[]3" = conftest.ts2 &&
-      test "$[]4" = conftest.ts1); then
-    #
-    # Ok, ls -t worked. If we're at a resolution of 1 second, we're done,
-    # because we don't need to test make.
-    make_ok=true
-    if test $am_try_res != 1; then
-      # But if we've succeeded so far with a subsecond resolution, we
-      # have one more thing to check: make. It can happen that
-      # everything else supports the subsecond mtimes, but make doesn't;
-      # notably on macOS, which ships make 3.81 from 2006 (the last one
-      # released under GPLv2). https://bugs.gnu.org/68808
-      # 
-      # We test $MAKE if it is defined in the environment, else "make".
-      # It might get overridden later, but our hope is that in practice
-      # it does not matter: it is the system "make" which is (by far)
-      # the most likely to be broken, whereas if the user overrides it,
-      # probably they did so with a better, or at least not worse, make.
-      # https://lists.gnu.org/archive/html/automake/2024-06/msg00051.html
-      #
-      # Create a Makefile (real tab character here):
-      rm -f conftest.mk
-      echo 'conftest.ts1: conftest.ts2' >conftest.mk
-      echo '	touch conftest.ts2' >>conftest.mk
-      #
-      # Now, running
-      #   touch conftest.ts1; touch conftest.ts2; make
-      # should touch ts1 because ts2 is newer. This could happen by luck,
-      # but most often, it will fail if make's support is insufficient. So
-      # test for several consecutive successes.
-      #
-      # (We reuse conftest.ts[12] because we still want to modify existing
-      # files, not create new ones, per above.)
-      n=0
-      make=${MAKE-make}
-      until test $n -eq 3; do
-        echo one > conftest.ts1
-        sleep $am_try_res
-        echo two > conftest.ts2 # ts2 should now be newer than ts1
-        if $make -f conftest.mk | grep 'up to date' >/dev/null; then
-          make_ok=false
-          break # out of $n loop
-        fi
-        n=`expr $n + 1`
-      done
-    fi
-    #
-    if $make_ok; then
-      # Everything we know to check worked out, so call this resolution good.
-      am_cv_filesystem_timestamp_resolution=$am_try_res
-      break # out of $am_try_res loop
-    fi
-    # Otherwise, we'll go on to check the next resolution.
-  fi
-done
-rm -f conftest.ts?
-# (end _am_filesystem_timestamp_resolution)
-])])
-
 # AM_SANITY_CHECK
 # ---------------
 AC_DEFUN([AM_SANITY_CHECK],
-[AC_REQUIRE([_AM_FILESYSTEM_TIMESTAMP_RESOLUTION])
-# This check should not be cached, as it may vary across builds of
-# different projects.
-AC_MSG_CHECKING([whether build environment is sane])
+[AC_MSG_CHECKING([whether build environment is sane])
 # Reject unsafe characters in $srcdir or the absolute working directory
 # name.  Accept space and tab only in the latter.
 am_lf='
@@ -1061,40 +930,49 @@ esac
 # symlink; some systems play weird games with the mod time of symlinks
 # (eg FreeBSD returns the mod time of the symlink's containing
 # directory).
-am_build_env_is_sane=no
-am_has_slept=no
-rm -f conftest.file
-for am_try in 1 2; do
-  echo "timestamp, slept: $am_has_slept" > conftest.file
-  if (
-    set X `ls -Lt "$srcdir/configure" conftest.file 2> /dev/null`
-    if test "$[]*" = "X"; then
-      # -L didn't work.
-      set X `ls -t "$srcdir/configure" conftest.file`
-    fi
-    test "$[]2" = conftest.file
-  ); then
-    am_build_env_is_sane=yes
-    break
-  fi
-  # Just in case.
-  sleep "$am_cv_filesystem_timestamp_resolution"
-  am_has_slept=yes
-done
-
-AC_MSG_RESULT([$am_build_env_is_sane])
-if test "$am_build_env_is_sane" = no; then
-  AC_MSG_ERROR([newly created file is older than distributed files!
+if (
+   am_has_slept=no
+   for am_try in 1 2; do
+     echo "timestamp, slept: $am_has_slept" > conftest.file
+     set X `ls -Lt "$srcdir/configure" conftest.file 2> /dev/null`
+     if test "$[*]" = "X"; then
+	# -L didn't work.
+	set X `ls -t "$srcdir/configure" conftest.file`
+     fi
+     if test "$[*]" != "X $srcdir/configure conftest.file" \
+	&& test "$[*]" != "X conftest.file $srcdir/configure"; then
+
+	# If neither matched, then we have a broken ls.  This can happen
+	# if, for instance, CONFIG_SHELL is bash and it inherits a
+	# broken ls alias from the environment.  This has actually
+	# happened.  Such a system could not be considered "sane".
+	AC_MSG_ERROR([ls -t appears to fail.  Make sure there is not a broken
+  alias in your environment])
+     fi
+     if test "$[2]" = conftest.file || test $am_try -eq 2; then
+       break
+     fi
+     # Just in case.
+     sleep 1
+     am_has_slept=yes
+   done
+   test "$[2]" = conftest.file
+   )
+then
+   # Ok.
+   :
+else
+   AC_MSG_ERROR([newly created file is older than distributed files!
 Check your system clock])
 fi
-
+AC_MSG_RESULT([yes])
 # If we didn't sleep, we still need to ensure time stamps of config.status and
 # generated files are strictly newer.
 am_sleep_pid=
-AS_IF([test -e conftest.file || grep 'slept: no' conftest.file >/dev/null 2>&1],, [dnl
-  ( sleep "$am_cv_filesystem_timestamp_resolution" ) &
+if grep 'slept: no' conftest.file >/dev/null 2>&1; then
+  ( sleep 1 ) &
   am_sleep_pid=$!
-])
+fi
 AC_CONFIG_COMMANDS_PRE(
   [AC_MSG_CHECKING([that generated files are newer than configure])
    if test -n "$am_sleep_pid"; then
@@ -1105,18 +983,18 @@ AC_CONFIG_COMMANDS_PRE(
 rm -f conftest.file
 ])
 
-# Copyright (C) 2009-2024 Free Software Foundation, Inc.
+# Copyright (C) 2009-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
 # with or without modifications, as long as this notice is preserved.
 
-# _AM_SILENT_RULES
-# ----------------
-# Enable less verbose build rules support.
-AC_DEFUN([_AM_SILENT_RULES],
-[AM_DEFAULT_VERBOSITY=1
-AC_ARG_ENABLE([silent-rules], [dnl
+# AM_SILENT_RULES([DEFAULT])
+# --------------------------
+# Enable less verbose build rules; with the default set to DEFAULT
+# ("yes" being less verbose, "no" or empty being verbose).
+AC_DEFUN([AM_SILENT_RULES],
+[AC_ARG_ENABLE([silent-rules], [dnl
 AS_HELP_STRING(
   [--enable-silent-rules],
   [less verbose build output (undo: "make V=1")])
@@ -1124,6 +1002,11 @@ AS_HELP_STRING(
   [--disable-silent-rules],
   [verbose build output (undo: "make V=0")])dnl
 ])
+case $enable_silent_rules in @%:@ (((
+  yes) AM_DEFAULT_VERBOSITY=0;;
+   no) AM_DEFAULT_VERBOSITY=1;;
+    *) AM_DEFAULT_VERBOSITY=m4_if([$1], [yes], [0], [1]);;
+esac
 dnl
 dnl A few 'make' implementations (e.g., NonStop OS and NextStep)
 dnl do not support nested variable expansions.
@@ -1142,21 +1025,6 @@ am__doit:
 else
   am_cv_make_support_nested_variables=no
 fi])
-AC_SUBST([AM_V])dnl
-AM_SUBST_NOTMAKE([AM_V])dnl
-AC_SUBST([AM_DEFAULT_V])dnl
-AM_SUBST_NOTMAKE([AM_DEFAULT_V])dnl
-AC_SUBST([AM_DEFAULT_VERBOSITY])dnl
-AM_BACKSLASH='\'
-AC_SUBST([AM_BACKSLASH])dnl
-_AM_SUBST_NOTMAKE([AM_BACKSLASH])dnl
-dnl Delay evaluation of AM_DEFAULT_VERBOSITY to the end to allow multiple calls
-dnl to AM_SILENT_RULES to change the default value.
-AC_CONFIG_COMMANDS_PRE([dnl
-case $enable_silent_rules in @%:@ (((
-  yes) AM_DEFAULT_VERBOSITY=0;;
-   no) AM_DEFAULT_VERBOSITY=1;;
-esac
 if test $am_cv_make_support_nested_variables = yes; then
   dnl Using '$V' instead of '$(V)' breaks IRIX make.
   AM_V='$(V)'
@@ -1165,18 +1033,17 @@ else
   AM_V=$AM_DEFAULT_VERBOSITY
   AM_DEFAULT_V=$AM_DEFAULT_VERBOSITY
 fi
-])dnl
+AC_SUBST([AM_V])dnl
+AM_SUBST_NOTMAKE([AM_V])dnl
+AC_SUBST([AM_DEFAULT_V])dnl
+AM_SUBST_NOTMAKE([AM_DEFAULT_V])dnl
+AC_SUBST([AM_DEFAULT_VERBOSITY])dnl
+AM_BACKSLASH='\'
+AC_SUBST([AM_BACKSLASH])dnl
+_AM_SUBST_NOTMAKE([AM_BACKSLASH])dnl
 ])
 
-# AM_SILENT_RULES([DEFAULT])
-# --------------------------
-# Set the default verbosity level to DEFAULT ("yes" being less verbose, "no" or
-# empty being verbose).
-AC_DEFUN([AM_SILENT_RULES],
-[AC_REQUIRE([_AM_SILENT_RULES])
-AM_DEFAULT_VERBOSITY=m4_if([$1], [yes], [0], [1])])
-
-# Copyright (C) 2001-2024 Free Software Foundation, Inc.
+# Copyright (C) 2001-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -1204,7 +1071,7 @@ fi
 INSTALL_STRIP_PROGRAM="\$(install_sh) -c -s"
 AC_SUBST([INSTALL_STRIP_PROGRAM])])
 
-# Copyright (C) 2006-2024 Free Software Foundation, Inc.
+# Copyright (C) 2006-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -1223,7 +1090,7 @@ AC_DEFUN([AM_SUBST_NOTMAKE], [_AM_SUBST_NOTMAKE($@)])
 
 # Check how to create a tarball.                            -*- Autoconf -*-
 
-# Copyright (C) 2004-2024 Free Software Foundation, Inc.
+# Copyright (C) 2004-2021 Free Software Foundation, Inc.
 #
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -1269,19 +1136,15 @@ m4_if([$1], [v7],
       am_uid=`id -u || echo unknown`
       am_gid=`id -g || echo unknown`
       AC_MSG_CHECKING([whether UID '$am_uid' is supported by ustar format])
-      if test x$am_uid = xunknown; then
-        AC_MSG_WARN([ancient id detected; assuming current UID is ok, but dist-ustar might not work])
-      elif test $am_uid -le $am_max_uid; then
-        AC_MSG_RESULT([yes])
+      if test $am_uid -le $am_max_uid; then
+         AC_MSG_RESULT([yes])
       else
-        AC_MSG_RESULT([no])
-        _am_tools=none
+         AC_MSG_RESULT([no])
+         _am_tools=none
       fi
       AC_MSG_CHECKING([whether GID '$am_gid' is supported by ustar format])
-      if test x$gm_gid = xunknown; then
-        AC_MSG_WARN([ancient id detected; assuming current GID is ok, but dist-ustar might not work])
-      elif test $am_gid -le $am_max_gid; then
-        AC_MSG_RESULT([yes])
+      if test $am_gid -le $am_max_gid; then
+         AC_MSG_RESULT([yes])
       else
         AC_MSG_RESULT([no])
         _am_tools=none
@@ -1358,26 +1221,6 @@ AC_SUBST([am__tar])
 AC_SUBST([am__untar])
 ]) # _AM_PROG_TAR
 
-# Copyright (C) 2022-2024 Free Software Foundation, Inc.
-#
-# This file is free software; the Free Software Foundation
-# gives unlimited permission to copy and/or distribute it,
-# with or without modifications, as long as this notice is preserved.
-
-# _AM_PROG_XARGS_N
-# ----------------
-# Check whether 'xargs -n' works.  It should work everywhere, so the fallback
-# is not optimized at all as we never expect to use it.
-AC_DEFUN([_AM_PROG_XARGS_N],
-[AC_CACHE_CHECK([xargs -n works], am_cv_xargs_n_works, [dnl
-AS_IF([test "`echo 1 2 3 | xargs -n2 echo`" = "1 2
-3"], [am_cv_xargs_n_works=yes], [am_cv_xargs_n_works=no])])
-AS_IF([test "$am_cv_xargs_n_works" = yes], [am__xargs_n='xargs -n'], [dnl
-  am__xargs_n='am__xargs_n () { shift; sed "s/ /\\n/g" | while read am__xargs_n_arg; do "$@" "$am__xargs_n_arg"; done; }'
-])dnl
-AC_SUBST(am__xargs_n)
-])
-
 m4_include([m4/ax_ac_append_to_file.m4])
 m4_include([m4/ax_ac_print_to_file.m4])
 m4_include([m4/ax_add_am_macro_static.m4])
diff --git a/deps/cares/aminclude_static.am b/deps/cares/aminclude_static.am
index 9e346c39c815a1..b83549f81adde4 100644
--- a/deps/cares/aminclude_static.am
+++ b/deps/cares/aminclude_static.am
@@ -1,6 +1,6 @@
 
 # aminclude_static.am generated automatically by Autoconf
-# from AX_AM_MACROS_STATIC on Tue Oct 15 06:09:51 EDT 2024
+# from AX_AM_MACROS_STATIC on Sat Nov  9 17:40:37 UTC 2024
 
 
 # Code coverage
diff --git a/deps/cares/config/ltmain.sh b/deps/cares/config/ltmain.sh
old mode 100644
new mode 100755
diff --git a/deps/cares/configure b/deps/cares/configure
index a6b48c9872767b..76b0ddf39c136a 100755
--- a/deps/cares/configure
+++ b/deps/cares/configure
@@ -1,11 +1,11 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.72 for c-ares 1.34.2.
+# Generated by GNU Autoconf 2.71 for c-ares 1.34.3.
 #
 # Report bugs to <c-ares mailing list: http://lists.haxx.se/listinfo/c-ares>.
 #
 #
-# Copyright (C) 1992-1996, 1998-2017, 2020-2023 Free Software Foundation,
+# Copyright (C) 1992-1996, 1998-2017, 2020-2021 Free Software Foundation,
 # Inc.
 #
 #
@@ -17,6 +17,7 @@
 
 # Be more Bourne compatible
 DUALCASE=1; export DUALCASE # for MKS sh
+as_nop=:
 if test ${ZSH_VERSION+y} && (emulate sh) >/dev/null 2>&1
 then :
   emulate sh
@@ -25,13 +26,12 @@ then :
   # is contrary to our usage.  Disable this feature.
   alias -g '${1+"$@"}'='"$@"'
   setopt NO_GLOB_SUBST
-else case e in #(
-  e) case `(set -o) 2>/dev/null` in #(
+else $as_nop
+  case `(set -o) 2>/dev/null` in #(
   *posix*) :
     set -o posix ;; #(
   *) :
      ;;
-esac ;;
 esac
 fi
 
@@ -103,7 +103,7 @@ IFS=$as_save_IFS
 
      ;;
 esac
-# We did not find ourselves, most probably we were run as 'sh COMMAND'
+# We did not find ourselves, most probably we were run as `sh COMMAND'
 # in which case we are not to be found in the path.
 if test "x$as_myself" = x; then
   as_myself=$0
@@ -133,14 +133,15 @@ case $- in # ((((
 esac
 exec $CONFIG_SHELL $as_opts "$as_myself" ${1+"$@"}
 # Admittedly, this is quite paranoid, since all the known shells bail
-# out after a failed 'exec'.
+# out after a failed `exec'.
 printf "%s\n" "$0: could not re-execute with $CONFIG_SHELL" >&2
 exit 255
   fi
   # We don't want this to propagate to other subprocesses.
           { _as_can_reexec=; unset _as_can_reexec;}
 if test "x$CONFIG_SHELL" = x; then
-  as_bourne_compatible="if test \${ZSH_VERSION+y} && (emulate sh) >/dev/null 2>&1
+  as_bourne_compatible="as_nop=:
+if test \${ZSH_VERSION+y} && (emulate sh) >/dev/null 2>&1
 then :
   emulate sh
   NULLCMD=:
@@ -148,13 +149,12 @@ then :
   # is contrary to our usage.  Disable this feature.
   alias -g '\${1+\"\$@\"}'='\"\$@\"'
   setopt NO_GLOB_SUBST
-else case e in #(
-  e) case \`(set -o) 2>/dev/null\` in #(
+else \$as_nop
+  case \`(set -o) 2>/dev/null\` in #(
   *posix*) :
     set -o posix ;; #(
   *) :
      ;;
-esac ;;
 esac
 fi
 "
@@ -172,9 +172,8 @@ as_fn_ret_failure && { exitcode=1; echo as_fn_ret_failure succeeded.; }
 if ( set x; as_fn_ret_success y && test x = \"\$1\" )
 then :
 
-else case e in #(
-  e) exitcode=1; echo positional parameters were not saved. ;;
-esac
+else \$as_nop
+  exitcode=1; echo positional parameters were not saved.
 fi
 test x\$exitcode = x0 || exit 1
 blah=\$(echo \$(echo blah))
@@ -196,15 +195,14 @@ test \$(( 1 + 1 )) = 2 || exit 1"
   if (eval "$as_required") 2>/dev/null
 then :
   as_have_required=yes
-else case e in #(
-  e) as_have_required=no ;;
-esac
+else $as_nop
+  as_have_required=no
 fi
   if test x$as_have_required = xyes && (eval "$as_suggested") 2>/dev/null
 then :
 
-else case e in #(
-  e) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+else $as_nop
+  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
 as_found=false
 for as_dir in /bin$PATH_SEPARATOR/usr/bin$PATH_SEPARATOR$PATH
 do
@@ -237,13 +235,12 @@ IFS=$as_save_IFS
 if $as_found
 then :
 
-else case e in #(
-  e) if { test -f "$SHELL" || test -f "$SHELL.exe"; } &&
+else $as_nop
+  if { test -f "$SHELL" || test -f "$SHELL.exe"; } &&
 	      as_run=a "$SHELL" -c "$as_bourne_compatible""$as_required" 2>/dev/null
 then :
   CONFIG_SHELL=$SHELL as_have_required=yes
-fi ;;
-esac
+fi
 fi
 
 
@@ -265,7 +262,7 @@ case $- in # ((((
 esac
 exec $CONFIG_SHELL $as_opts "$as_myself" ${1+"$@"}
 # Admittedly, this is quite paranoid, since all the known shells bail
-# out after a failed 'exec'.
+# out after a failed `exec'.
 printf "%s\n" "$0: could not re-execute with $CONFIG_SHELL" >&2
 exit 255
 fi
@@ -285,8 +282,7 @@ $0: message. Then install a modern shell, or manually run
 $0: the script under such a shell if you do have one."
   fi
   exit 1
-fi ;;
-esac
+fi
 fi
 fi
 SHELL=${CONFIG_SHELL-/bin/sh}
@@ -325,6 +321,14 @@ as_fn_exit ()
   as_fn_set_status $1
   exit $1
 } # as_fn_exit
+# as_fn_nop
+# ---------
+# Do nothing but, unlike ":", preserve the value of $?.
+as_fn_nop ()
+{
+  return $?
+}
+as_nop=as_fn_nop
 
 # as_fn_mkdir_p
 # -------------
@@ -393,12 +397,11 @@ then :
   {
     eval $1+=\$2
   }'
-else case e in #(
-  e) as_fn_append ()
+else $as_nop
+  as_fn_append ()
   {
     eval $1=\$$1\$2
-  } ;;
-esac
+  }
 fi # as_fn_append
 
 # as_fn_arith ARG...
@@ -412,14 +415,21 @@ then :
   {
     as_val=$(( $* ))
   }'
-else case e in #(
-  e) as_fn_arith ()
+else $as_nop
+  as_fn_arith ()
   {
     as_val=`expr "$@" || test $? -eq 1`
-  } ;;
-esac
+  }
 fi # as_fn_arith
 
+# as_fn_nop
+# ---------
+# Do nothing but, unlike ":", preserve the value of $?.
+as_fn_nop ()
+{
+  return $?
+}
+as_nop=as_fn_nop
 
 # as_fn_error STATUS ERROR [LINENO LOG_FD]
 # ----------------------------------------
@@ -493,8 +503,6 @@ as_cr_alnum=$as_cr_Letters$as_cr_digits
     /[$]LINENO/=
   ' <$as_myself |
     sed '
-      t clear
-      :clear
       s/[$]LINENO.*/&-/
       t lineno
       b
@@ -543,6 +551,7 @@ esac
 as_echo='printf %s\n'
 as_echo_n='printf %s'
 
+
 rm -f conf$$ conf$$.exe conf$$.file
 if test -d conf$$.dir; then
   rm -f conf$$.dir/conf$$.file
@@ -554,9 +563,9 @@ if (echo >conf$$.file) 2>/dev/null; then
   if ln -s conf$$.file conf$$ 2>/dev/null; then
     as_ln_s='ln -s'
     # ... but there are two gotchas:
-    # 1) On MSYS, both 'ln -s file dir' and 'ln file dir' fail.
-    # 2) DJGPP < 2.04 has no symlinks; 'ln -s' creates a wrapper executable.
-    # In both cases, we have to default to 'cp -pR'.
+    # 1) On MSYS, both `ln -s file dir' and `ln file dir' fail.
+    # 2) DJGPP < 2.04 has no symlinks; `ln -s' creates a wrapper executable.
+    # In both cases, we have to default to `cp -pR'.
     ln -s conf$$.file conf$$.dir 2>/dev/null && test ! -f conf$$.exe ||
       as_ln_s='cp -pR'
   elif ln conf$$.file conf$$ 2>/dev/null; then
@@ -581,12 +590,10 @@ as_test_x='test -x'
 as_executable_p=as_fn_executable_p
 
 # Sed expression to map a string onto a valid CPP name.
-as_sed_cpp="y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g"
-as_tr_cpp="eval sed '$as_sed_cpp'" # deprecated
+as_tr_cpp="eval sed 'y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g'"
 
 # Sed expression to map a string onto a valid variable name.
-as_sed_sh="y%*+%pp%;s%[^_$as_cr_alnum]%_%g"
-as_tr_sh="eval sed '$as_sed_sh'" # deprecated
+as_tr_sh="eval sed 'y%*+%pp%;s%[^_$as_cr_alnum]%_%g'"
 
 SHELL=${CONFIG_SHELL-/bin/sh}
 
@@ -614,8 +621,8 @@ MAKEFLAGS=
 # Identity of this package.
 PACKAGE_NAME='c-ares'
 PACKAGE_TARNAME='c-ares'
-PACKAGE_VERSION='1.34.2'
-PACKAGE_STRING='c-ares 1.34.2'
+PACKAGE_VERSION='1.34.3'
+PACKAGE_STRING='c-ares 1.34.3'
 PACKAGE_BUGREPORT='c-ares mailing list: http://lists.haxx.se/listinfo/c-ares'
 PACKAGE_URL=''
 
@@ -652,7 +659,6 @@ ac_includes_default="\
 #endif"
 
 ac_header_c_list=
-enable_year2038=no
 ac_subst_vars='am__EXEEXT_FALSE
 am__EXEEXT_TRUE
 LTLIBOBJS
@@ -709,7 +715,6 @@ MANIFEST_TOOL
 RANLIB
 ac_ct_AR
 AR
-FILECMD
 LN_S
 NM
 ac_ct_DUMPBIN
@@ -731,8 +736,6 @@ LIBTOOL
 OBJDUMP
 DLLTOOL
 AS
-am__xargs_n
-am__rm_f_notfound
 AM_BACKSLASH
 AM_DEFAULT_VERBOSITY
 AM_DEFAULT_V
@@ -834,10 +837,8 @@ enable_dependency_tracking
 enable_silent_rules
 enable_shared
 enable_static
-enable_pic
 with_pic
 enable_fast_install
-enable_aix_soname
 with_aix_soname
 with_gnu_ld
 with_sysroot
@@ -852,7 +853,6 @@ with_gcov
 enable_code_coverage
 enable_largefile
 enable_libgcc
-enable_year2038
 '
       ac_precious_vars='build_alias
 host_alias
@@ -983,7 +983,7 @@ do
     ac_useropt=`expr "x$ac_option" : 'x-*disable-\(.*\)'`
     # Reject names that are not valid shell variable names.
     expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null &&
-      as_fn_error $? "invalid feature name: '$ac_useropt'"
+      as_fn_error $? "invalid feature name: \`$ac_useropt'"
     ac_useropt_orig=$ac_useropt
     ac_useropt=`printf "%s\n" "$ac_useropt" | sed 's/[-+.]/_/g'`
     case $ac_user_opts in
@@ -1009,7 +1009,7 @@ do
     ac_useropt=`expr "x$ac_option" : 'x-*enable-\([^=]*\)'`
     # Reject names that are not valid shell variable names.
     expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null &&
-      as_fn_error $? "invalid feature name: '$ac_useropt'"
+      as_fn_error $? "invalid feature name: \`$ac_useropt'"
     ac_useropt_orig=$ac_useropt
     ac_useropt=`printf "%s\n" "$ac_useropt" | sed 's/[-+.]/_/g'`
     case $ac_user_opts in
@@ -1222,7 +1222,7 @@ do
     ac_useropt=`expr "x$ac_option" : 'x-*with-\([^=]*\)'`
     # Reject names that are not valid shell variable names.
     expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null &&
-      as_fn_error $? "invalid package name: '$ac_useropt'"
+      as_fn_error $? "invalid package name: \`$ac_useropt'"
     ac_useropt_orig=$ac_useropt
     ac_useropt=`printf "%s\n" "$ac_useropt" | sed 's/[-+.]/_/g'`
     case $ac_user_opts in
@@ -1238,7 +1238,7 @@ do
     ac_useropt=`expr "x$ac_option" : 'x-*without-\(.*\)'`
     # Reject names that are not valid shell variable names.
     expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null &&
-      as_fn_error $? "invalid package name: '$ac_useropt'"
+      as_fn_error $? "invalid package name: \`$ac_useropt'"
     ac_useropt_orig=$ac_useropt
     ac_useropt=`printf "%s\n" "$ac_useropt" | sed 's/[-+.]/_/g'`
     case $ac_user_opts in
@@ -1268,8 +1268,8 @@ do
   | --x-librar=* | --x-libra=* | --x-libr=* | --x-lib=* | --x-li=* | --x-l=*)
     x_libraries=$ac_optarg ;;
 
-  -*) as_fn_error $? "unrecognized option: '$ac_option'
-Try '$0 --help' for more information"
+  -*) as_fn_error $? "unrecognized option: \`$ac_option'
+Try \`$0 --help' for more information"
     ;;
 
   *=*)
@@ -1277,7 +1277,7 @@ Try '$0 --help' for more information"
     # Reject names that are not valid shell variable names.
     case $ac_envvar in #(
       '' | [0-9]* | *[!_$as_cr_alnum]* )
-      as_fn_error $? "invalid variable name: '$ac_envvar'" ;;
+      as_fn_error $? "invalid variable name: \`$ac_envvar'" ;;
     esac
     eval $ac_envvar=\$ac_optarg
     export $ac_envvar ;;
@@ -1327,7 +1327,7 @@ do
   as_fn_error $? "expected an absolute directory name for --$ac_var: $ac_val"
 done
 
-# There might be people who depend on the old broken behavior: '$host'
+# There might be people who depend on the old broken behavior: `$host'
 # used to hold the argument of --host etc.
 # FIXME: To remove some day.
 build=$build_alias
@@ -1395,7 +1395,7 @@ if test ! -r "$srcdir/$ac_unique_file"; then
   test "$ac_srcdir_defaulted" = yes && srcdir="$ac_confdir or .."
   as_fn_error $? "cannot find sources ($ac_unique_file) in $srcdir"
 fi
-ac_msg="sources are in $srcdir, but 'cd $srcdir' does not work"
+ac_msg="sources are in $srcdir, but \`cd $srcdir' does not work"
 ac_abs_confdir=`(
 	cd "$srcdir" && test -r "./$ac_unique_file" || as_fn_error $? "$ac_msg"
 	pwd)`
@@ -1423,7 +1423,7 @@ if test "$ac_init_help" = "long"; then
   # Omit some internal or obsolete options to make the list less imposing.
   # This message is too long to be a string in the A/UX 3.1 sh.
   cat <<_ACEOF
-'configure' configures c-ares 1.34.2 to adapt to many kinds of systems.
+\`configure' configures c-ares 1.34.3 to adapt to many kinds of systems.
 
 Usage: $0 [OPTION]... [VAR=VALUE]...
 
@@ -1437,11 +1437,11 @@ Configuration:
       --help=short        display options specific to this package
       --help=recursive    display the short help of all the included packages
   -V, --version           display version information and exit
-  -q, --quiet, --silent   do not print 'checking ...' messages
+  -q, --quiet, --silent   do not print \`checking ...' messages
       --cache-file=FILE   cache test results in FILE [disabled]
-  -C, --config-cache      alias for '--cache-file=config.cache'
+  -C, --config-cache      alias for \`--cache-file=config.cache'
   -n, --no-create         do not create output files
-      --srcdir=DIR        find the sources in DIR [configure dir or '..']
+      --srcdir=DIR        find the sources in DIR [configure dir or \`..']
 
 Installation directories:
   --prefix=PREFIX         install architecture-independent files in PREFIX
@@ -1449,10 +1449,10 @@ Installation directories:
   --exec-prefix=EPREFIX   install architecture-dependent files in EPREFIX
                           [PREFIX]
 
-By default, 'make install' will install all the files in
-'$ac_default_prefix/bin', '$ac_default_prefix/lib' etc.  You can specify
-an installation prefix other than '$ac_default_prefix' using '--prefix',
-for instance '--prefix=\$HOME'.
+By default, \`make install' will install all the files in
+\`$ac_default_prefix/bin', \`$ac_default_prefix/lib' etc.  You can specify
+an installation prefix other than \`$ac_default_prefix' using \`--prefix',
+for instance \`--prefix=\$HOME'.
 
 For better control, use the options below.
 
@@ -1494,7 +1494,7 @@ fi
 
 if test -n "$ac_init_help"; then
   case $ac_init_help in
-     short | recursive ) echo "Configuration of c-ares 1.34.2:";;
+     short | recursive ) echo "Configuration of c-ares 1.34.3:";;
    esac
   cat <<\_ACEOF
 
@@ -1510,13 +1510,8 @@ Optional Features:
   --disable-silent-rules  verbose build output (undo: "make V=0")
   --enable-shared[=PKGS]  build shared libraries [default=yes]
   --enable-static[=PKGS]  build static libraries [default=yes]
-  --enable-pic[=PKGS]     try to use only PIC/non-PIC objects [default=use
-                          both]
   --enable-fast-install[=PKGS]
                           optimize for fast installation [default=yes]
-  --enable-aix-soname=aix|svr4|both
-                          shared library versioning (aka "SONAME") variant to
-                          provide on AIX, [default=aix].
   --disable-libtool-lock  avoid locking (might break parallel builds)
   --disable-warnings      Disable strict compiler warnings
   --disable-symbol-hiding Disable symbol hiding. Enabled by default if the
@@ -1530,11 +1525,15 @@ Optional Features:
   --enable-code-coverage  Whether to enable code coverage support
   --disable-largefile     omit support for large files
   --enable-libgcc         use libgcc when linking
-  --enable-year2038       support timestamps after 2038
 
 Optional Packages:
   --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
   --without-PACKAGE       do not use PACKAGE (same as --with-PACKAGE=no)
+  --with-pic[=PKGS]       try to use only PIC/non-PIC objects [default=use
+                          both]
+  --with-aix-soname=aix|svr4|both
+                          shared library versioning (aka "SONAME") variant to
+                          provide on AIX, [default=aix].
   --with-gnu-ld           assume the C compiler uses GNU ld [default=no]
   --with-sysroot[=DIR]    Search for dependent libraries within DIR (or the
                           compiler's sysroot if not specified).
@@ -1568,7 +1567,7 @@ Some influential environment variables:
   GMOCK112_LIBS
               linker flags for GMOCK112, overriding pkg-config
 
-Use these variables to override the choices made by 'configure' or to help
+Use these variables to override the choices made by `configure' or to help
 it to find libraries and programs with nonstandard names/locations.
 
 Report bugs to <c-ares mailing list: http://lists.haxx.se/listinfo/c-ares>.
@@ -1635,10 +1634,10 @@ fi
 test -n "$ac_init_help" && exit $ac_status
 if $ac_init_version; then
   cat <<\_ACEOF
-c-ares configure 1.34.2
-generated by GNU Autoconf 2.72
+c-ares configure 1.34.3
+generated by GNU Autoconf 2.71
 
-Copyright (C) 2023 Free Software Foundation, Inc.
+Copyright (C) 2021 Free Software Foundation, Inc.
 This configure script is free software; the Free Software Foundation
 gives unlimited permission to copy, distribute and modify it.
 _ACEOF
@@ -1677,12 +1676,11 @@ printf "%s\n" "$ac_try_echo"; } >&5
        } && test -s conftest.$ac_objext
 then :
   ac_retval=0
-else case e in #(
-  e) printf "%s\n" "$as_me: failed program was:" >&5
+else $as_nop
+  printf "%s\n" "$as_me: failed program was:" >&5
 sed 's/^/| /' conftest.$ac_ext >&5
 
-	ac_retval=1 ;;
-esac
+	ac_retval=1
 fi
   eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno
   as_fn_set_status $ac_retval
@@ -1701,8 +1699,8 @@ printf %s "checking for $2... " >&6; }
 if eval test \${$3+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 $4
 #include <$2>
@@ -1710,12 +1708,10 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   eval "$3=yes"
-else case e in #(
-  e) eval "$3=no" ;;
-esac
+else $as_nop
+  eval "$3=no"
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
 eval ac_res=\$$3
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -1752,12 +1748,11 @@ printf "%s\n" "$ac_try_echo"; } >&5
        } && test -s conftest.$ac_objext
 then :
   ac_retval=0
-else case e in #(
-  e) printf "%s\n" "$as_me: failed program was:" >&5
+else $as_nop
+  printf "%s\n" "$as_me: failed program was:" >&5
 sed 's/^/| /' conftest.$ac_ext >&5
 
-	ac_retval=1 ;;
-esac
+	ac_retval=1
 fi
   eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno
   as_fn_set_status $ac_retval
@@ -1795,12 +1790,11 @@ printf "%s\n" "$ac_try_echo"; } >&5
        }
 then :
   ac_retval=0
-else case e in #(
-  e) printf "%s\n" "$as_me: failed program was:" >&5
+else $as_nop
+  printf "%s\n" "$as_me: failed program was:" >&5
 sed 's/^/| /' conftest.$ac_ext >&5
 
-	ac_retval=1 ;;
-esac
+	ac_retval=1
 fi
   # Delete the IPA/IPO (Inter Procedural Analysis/Optimization) information
   # created by the PGI compiler (conftest_ipa8_conftest.oo), as it would
@@ -1823,15 +1817,15 @@ printf %s "checking for $2... " >&6; }
 if eval test \${$3+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 /* Define $2 to an innocuous variant, in case <limits.h> declares $2.
    For example, HP-UX 11i <limits.h> declares gettimeofday.  */
 #define $2 innocuous_$2
 
 /* System header to define __stub macros and hopefully few prototypes,
-   which can conflict with char $2 (void); below.  */
+   which can conflict with char $2 (); below.  */
 
 #include <limits.h>
 #undef $2
@@ -1842,7 +1836,7 @@ else case e in #(
 #ifdef __cplusplus
 extern "C"
 #endif
-char $2 (void);
+char $2 ();
 /* The GNU C library defines this for functions which it implements
     to always fail with ENOSYS.  Some functions are actually named
     something starting with __ and the normal name is an alias.  */
@@ -1861,13 +1855,11 @@ _ACEOF
 if ac_fn_c_try_link "$LINENO"
 then :
   eval "$3=yes"
-else case e in #(
-  e) eval "$3=no" ;;
-esac
+else $as_nop
+  eval "$3=no"
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
-    conftest$ac_exeext conftest.$ac_ext ;;
-esac
+    conftest$ac_exeext conftest.$ac_ext
 fi
 eval ac_res=\$$3
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -1903,12 +1895,11 @@ printf "%s\n" "$ac_try_echo"; } >&5
        }
 then :
   ac_retval=0
-else case e in #(
-  e) printf "%s\n" "$as_me: failed program was:" >&5
+else $as_nop
+  printf "%s\n" "$as_me: failed program was:" >&5
 sed 's/^/| /' conftest.$ac_ext >&5
 
-    ac_retval=1 ;;
-esac
+    ac_retval=1
 fi
   eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno
   as_fn_set_status $ac_retval
@@ -1946,12 +1937,11 @@ printf "%s\n" "$ac_try_echo"; } >&5
        }
 then :
   ac_retval=0
-else case e in #(
-  e) printf "%s\n" "$as_me: failed program was:" >&5
+else $as_nop
+  printf "%s\n" "$as_me: failed program was:" >&5
 sed 's/^/| /' conftest.$ac_ext >&5
 
-	ac_retval=1 ;;
-esac
+	ac_retval=1
 fi
   # Delete the IPA/IPO (Inter Procedural Analysis/Optimization) information
   # created by the PGI compiler (conftest_ipa8_conftest.oo), as it would
@@ -1990,12 +1980,11 @@ printf "%s\n" "$ac_try_echo"; } >&5
        }
 then :
   ac_retval=0
-else case e in #(
-  e) printf "%s\n" "$as_me: failed program was:" >&5
+else $as_nop
+  printf "%s\n" "$as_me: failed program was:" >&5
 sed 's/^/| /' conftest.$ac_ext >&5
 
-    ac_retval=1 ;;
-esac
+    ac_retval=1
 fi
   eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno
   as_fn_set_status $ac_retval
@@ -2014,20 +2003,18 @@ printf %s "checking for $2... " >&6; }
 if eval test \${$3+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include <$2>
 _ACEOF
 if ac_fn_c_try_cpp "$LINENO"
 then :
   eval "$3=yes"
-else case e in #(
-  e) eval "$3=no" ;;
-esac
+else $as_nop
+  eval "$3=no"
 fi
-rm -f conftest.err conftest.i conftest.$ac_ext ;;
-esac
+rm -f conftest.err conftest.i conftest.$ac_ext
 fi
 eval ac_res=\$$3
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -2049,8 +2036,8 @@ printf %s "checking whether $as_decl_name is declared... " >&6; }
 if eval test \${$3+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) as_decl_use=`echo $2|sed -e 's/(/((/' -e 's/)/) 0&/' -e 's/,/) 0& (/g'`
+else $as_nop
+  as_decl_use=`echo $2|sed -e 's/(/((/' -e 's/)/) 0&/' -e 's/,/) 0& (/g'`
   eval ac_save_FLAGS=\$$6
   as_fn_append $6 " $5"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -2074,14 +2061,12 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   eval "$3=yes"
-else case e in #(
-  e) eval "$3=no" ;;
-esac
+else $as_nop
+  eval "$3=no"
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
   eval $6=\$ac_save_FLAGS
- ;;
-esac
+
 fi
 eval ac_res=\$$3
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -2102,8 +2087,8 @@ printf %s "checking for $2... " >&6; }
 if eval test \${$3+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) eval "$3=no"
+else $as_nop
+  eval "$3=no"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 $4
@@ -2133,14 +2118,12 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
 
-else case e in #(
-  e) eval "$3=yes" ;;
-esac
+else $as_nop
+  eval "$3=yes"
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
 eval ac_res=\$$3
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -2161,8 +2144,8 @@ printf %s "checking for $2.$3... " >&6; }
 if eval test \${$4+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 $5
 int
@@ -2178,8 +2161,8 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   eval "$4=yes"
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 $5
 int
@@ -2195,15 +2178,12 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   eval "$4=yes"
-else case e in #(
-  e) eval "$4=no" ;;
-esac
+else $as_nop
+  eval "$4=no"
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
 eval ac_res=\$$4
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -2242,13 +2222,12 @@ printf "%s\n" "$ac_try_echo"; } >&5
   test $ac_status = 0; }; }
 then :
   ac_retval=0
-else case e in #(
-  e) printf "%s\n" "$as_me: program exited with status $ac_status" >&5
+else $as_nop
+  printf "%s\n" "$as_me: program exited with status $ac_status" >&5
        printf "%s\n" "$as_me: failed program was:" >&5
 sed 's/^/| /' conftest.$ac_ext >&5
 
-       ac_retval=$ac_status ;;
-esac
+       ac_retval=$ac_status
 fi
   rm -rf conftest.dSYM conftest_ipa8_conftest.oo
   eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno
@@ -2279,8 +2258,8 @@ cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
 
-It was created by c-ares $as_me 1.34.2, which was
-generated by GNU Autoconf 2.72.  Invocation command line was
+It was created by c-ares $as_me 1.34.3, which was
+generated by GNU Autoconf 2.71.  Invocation command line was
 
   $ $0$ac_configure_args_raw
 
@@ -2526,10 +2505,10 @@ esac
 printf "%s\n" "$as_me: loading site script $ac_site_file" >&6;}
     sed 's/^/| /' "$ac_site_file" >&5
     . "$ac_site_file" \
-      || { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
+      || { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
 as_fn_error $? "failed to load site script $ac_site_file
-See 'config.log' for more details" "$LINENO" 5; }
+See \`config.log' for more details" "$LINENO" 5; }
   fi
 done
 
@@ -2566,7 +2545,9 @@ struct stat;
 /* Most of the following tests are stolen from RCS 5.7 src/conf.sh.  */
 struct buf { int x; };
 struct buf * (*rcsopen) (struct buf *, struct stat *, int);
-static char *e (char **p, int i)
+static char *e (p, i)
+     char **p;
+     int i;
 {
   return p[i];
 }
@@ -2580,21 +2561,6 @@ static char *f (char * (*g) (char **, int), char **p, ...)
   return s;
 }
 
-/* C89 style stringification. */
-#define noexpand_stringify(a) #a
-const char *stringified = noexpand_stringify(arbitrary+token=sequence);
-
-/* C89 style token pasting.  Exercises some of the corner cases that
-   e.g. old MSVC gets wrong, but not very hard. */
-#define noexpand_concat(a,b) a##b
-#define expand_concat(a,b) noexpand_concat(a,b)
-extern int vA;
-extern int vbee;
-#define aye A
-#define bee B
-int *pvA = &expand_concat(v,aye);
-int *pvbee = &noexpand_concat(v,bee);
-
 /* OSF 4.0 Compaq cc is some sort of almost-ANSI by default.  It has
    function prototypes and stuff, but not \xHH hex character constants.
    These do not provoke an error unfortunately, instead are silently treated
@@ -2622,19 +2588,16 @@ ok |= (argc == 0 || f (e, argv, 0) != argv[0] || f (e, argv, 1) != argv[1]);
 
 # Test code for whether the C compiler supports C99 (global declarations)
 ac_c_conftest_c99_globals='
-/* Does the compiler advertise C99 conformance? */
+// Does the compiler advertise C99 conformance?
 #if !defined __STDC_VERSION__ || __STDC_VERSION__ < 199901L
 # error "Compiler does not advertise C99 conformance"
 #endif
 
-// See if C++-style comments work.
-
 #include <stdbool.h>
 extern int puts (const char *);
 extern int printf (const char *, ...);
 extern int dprintf (int, const char *, ...);
 extern void *malloc (size_t);
-extern void free (void *);
 
 // Check varargs macros.  These examples are taken from C99 6.10.3.5.
 // dprintf is used instead of fprintf to avoid needing to declare
@@ -2684,6 +2647,7 @@ typedef const char *ccp;
 static inline int
 test_restrict (ccp restrict text)
 {
+  // See if C++-style comments work.
   // Iterate through items via the restricted pointer.
   // Also check for declarations in for loops.
   for (unsigned int i = 0; *(text+i) != '\''\0'\''; ++i)
@@ -2749,8 +2713,6 @@ ac_c_conftest_c99_main='
   ia->datasize = 10;
   for (int i = 0; i < ia->datasize; ++i)
     ia->data[i] = i * 1.234;
-  // Work around memory leak warnings.
-  free (ia);
 
   // Check named initializers.
   struct named_init ni = {
@@ -2772,7 +2734,7 @@ ac_c_conftest_c99_main='
 
 # Test code for whether the C compiler supports C11 (global declarations)
 ac_c_conftest_c11_globals='
-/* Does the compiler advertise C11 conformance? */
+// Does the compiler advertise C11 conformance?
 #if !defined __STDC_VERSION__ || __STDC_VERSION__ < 201112L
 # error "Compiler does not advertise C11 conformance"
 #endif
@@ -3181,9 +3143,8 @@ IFS=$as_save_IFS
 if $as_found
 then :
 
-else case e in #(
-  e) as_fn_error $? "cannot find required auxiliary files:$ac_missing_aux_files" "$LINENO" 5 ;;
-esac
+else $as_nop
+  as_fn_error $? "cannot find required auxiliary files:$ac_missing_aux_files" "$LINENO" 5
 fi
 
 
@@ -3211,12 +3172,12 @@ for ac_var in $ac_precious_vars; do
   eval ac_new_val=\$ac_env_${ac_var}_value
   case $ac_old_set,$ac_new_set in
     set,)
-      { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: '$ac_var' was set to '$ac_old_val' in the previous run" >&5
-printf "%s\n" "$as_me: error: '$ac_var' was set to '$ac_old_val' in the previous run" >&2;}
+      { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&5
+printf "%s\n" "$as_me: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&2;}
       ac_cache_corrupted=: ;;
     ,set)
-      { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: '$ac_var' was not set in the previous run" >&5
-printf "%s\n" "$as_me: error: '$ac_var' was not set in the previous run" >&2;}
+      { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was not set in the previous run" >&5
+printf "%s\n" "$as_me: error: \`$ac_var' was not set in the previous run" >&2;}
       ac_cache_corrupted=: ;;
     ,);;
     *)
@@ -3225,18 +3186,18 @@ printf "%s\n" "$as_me: error: '$ac_var' was not set in the previous run" >&2;}
 	ac_old_val_w=`echo x $ac_old_val`
 	ac_new_val_w=`echo x $ac_new_val`
 	if test "$ac_old_val_w" != "$ac_new_val_w"; then
-	  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: '$ac_var' has changed since the previous run:" >&5
-printf "%s\n" "$as_me: error: '$ac_var' has changed since the previous run:" >&2;}
+	  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' has changed since the previous run:" >&5
+printf "%s\n" "$as_me: error: \`$ac_var' has changed since the previous run:" >&2;}
 	  ac_cache_corrupted=:
 	else
-	  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: warning: ignoring whitespace changes in '$ac_var' since the previous run:" >&5
-printf "%s\n" "$as_me: warning: ignoring whitespace changes in '$ac_var' since the previous run:" >&2;}
+	  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&5
+printf "%s\n" "$as_me: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&2;}
 	  eval $ac_var=\$ac_old_val
 	fi
-	{ printf "%s\n" "$as_me:${as_lineno-$LINENO}:   former value:  '$ac_old_val'" >&5
-printf "%s\n" "$as_me:   former value:  '$ac_old_val'" >&2;}
-	{ printf "%s\n" "$as_me:${as_lineno-$LINENO}:   current value: '$ac_new_val'" >&5
-printf "%s\n" "$as_me:   current value: '$ac_new_val'" >&2;}
+	{ printf "%s\n" "$as_me:${as_lineno-$LINENO}:   former value:  \`$ac_old_val'" >&5
+printf "%s\n" "$as_me:   former value:  \`$ac_old_val'" >&2;}
+	{ printf "%s\n" "$as_me:${as_lineno-$LINENO}:   current value: \`$ac_new_val'" >&5
+printf "%s\n" "$as_me:   current value: \`$ac_new_val'" >&2;}
       fi;;
   esac
   # Pass precious variables to config.status.
@@ -3252,11 +3213,11 @@ printf "%s\n" "$as_me:   current value: '$ac_new_val'" >&2;}
   fi
 done
 if $ac_cache_corrupted; then
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: changes in the environment can compromise the build" >&5
 printf "%s\n" "$as_me: error: changes in the environment can compromise the build" >&2;}
-  as_fn_error $? "run '${MAKE-make} distclean' and/or 'rm $cache_file'
+  as_fn_error $? "run \`${MAKE-make} distclean' and/or \`rm $cache_file'
 	    and start over" "$LINENO" 5
 fi
 ## -------------------- ##
@@ -3271,7 +3232,7 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 
 
 
-CARES_VERSION_INFO="21:1:19"
+CARES_VERSION_INFO="21:2:19"
 
 
 
@@ -3306,8 +3267,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$CC"; then
+else $as_nop
+  if test -n "$CC"; then
   ac_cv_prog_CC="$CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -3329,8 +3290,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 CC=$ac_cv_prog_CC
 if test -n "$CC"; then
@@ -3352,8 +3312,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_CC"; then
+else $as_nop
+  if test -n "$ac_ct_CC"; then
   ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -3375,8 +3335,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_CC=$ac_cv_prog_ac_ct_CC
 if test -n "$ac_ct_CC"; then
@@ -3411,8 +3370,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$CC"; then
+else $as_nop
+  if test -n "$CC"; then
   ac_cv_prog_CC="$CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -3434,8 +3393,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 CC=$ac_cv_prog_CC
 if test -n "$CC"; then
@@ -3457,8 +3415,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$CC"; then
+else $as_nop
+  if test -n "$CC"; then
   ac_cv_prog_CC="$CC" # Let the user override the test.
 else
   ac_prog_rejected=no
@@ -3497,8 +3455,7 @@ if test $ac_prog_rejected = yes; then
     ac_cv_prog_CC="$as_dir$ac_word${1+' '}$@"
   fi
 fi
-fi ;;
-esac
+fi
 fi
 CC=$ac_cv_prog_CC
 if test -n "$CC"; then
@@ -3522,8 +3479,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$CC"; then
+else $as_nop
+  if test -n "$CC"; then
   ac_cv_prog_CC="$CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -3545,8 +3502,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 CC=$ac_cv_prog_CC
 if test -n "$CC"; then
@@ -3572,8 +3528,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_CC"; then
+else $as_nop
+  if test -n "$ac_ct_CC"; then
   ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -3595,8 +3551,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_CC=$ac_cv_prog_ac_ct_CC
 if test -n "$ac_ct_CC"; then
@@ -3634,8 +3589,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$CC"; then
+else $as_nop
+  if test -n "$CC"; then
   ac_cv_prog_CC="$CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -3657,8 +3612,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 CC=$ac_cv_prog_CC
 if test -n "$CC"; then
@@ -3680,8 +3634,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_CC"; then
+else $as_nop
+  if test -n "$ac_ct_CC"; then
   ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -3703,8 +3657,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_CC=$ac_cv_prog_ac_ct_CC
 if test -n "$ac_ct_CC"; then
@@ -3733,10 +3686,10 @@ fi
 fi
 
 
-test -z "$CC" && { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
+test -z "$CC" && { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
 as_fn_error $? "no acceptable C compiler found in \$PATH
-See 'config.log' for more details" "$LINENO" 5; }
+See \`config.log' for more details" "$LINENO" 5; }
 
 # Provide some information about the compiler.
 printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for C compiler version" >&5
@@ -3808,8 +3761,8 @@ printf "%s\n" "$ac_try_echo"; } >&5
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }
 then :
-  # Autoconf-2.13 could set the ac_cv_exeext variable to 'no'.
-# So ignore a value of 'no', otherwise this would lead to 'EXEEXT = no'
+  # Autoconf-2.13 could set the ac_cv_exeext variable to `no'.
+# So ignore a value of `no', otherwise this would lead to `EXEEXT = no'
 # in a Makefile.  We should not override ac_cv_exeext if it was cached,
 # so that the user can short-circuit this test for compilers unknown to
 # Autoconf.
@@ -3829,7 +3782,7 @@ do
 	   ac_cv_exeext=`expr "$ac_file" : '[^.]*\(\..*\)'`
 	fi
 	# We set ac_cv_exeext here because the later test for it is not
-	# safe: cross compilers may not add the suffix if given an '-o'
+	# safe: cross compilers may not add the suffix if given an `-o'
 	# argument, so we may need to know it at that point already.
 	# Even if this section looks crufty: it has the advantage of
 	# actually working.
@@ -3840,9 +3793,8 @@ do
 done
 test "$ac_cv_exeext" = no && ac_cv_exeext=
 
-else case e in #(
-  e) ac_file='' ;;
-esac
+else $as_nop
+  ac_file=''
 fi
 if test -z "$ac_file"
 then :
@@ -3851,14 +3803,13 @@ printf "%s\n" "no" >&6; }
 printf "%s\n" "$as_me: failed program was:" >&5
 sed 's/^/| /' conftest.$ac_ext >&5
 
-{ { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
+{ { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
 as_fn_error 77 "C compiler cannot create executables
-See 'config.log' for more details" "$LINENO" 5; }
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
-printf "%s\n" "yes" >&6; } ;;
-esac
+See \`config.log' for more details" "$LINENO" 5; }
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+printf "%s\n" "yes" >&6; }
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for C compiler default output file name" >&5
 printf %s "checking for C compiler default output file name... " >&6; }
@@ -3882,10 +3833,10 @@ printf "%s\n" "$ac_try_echo"; } >&5
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }
 then :
-  # If both 'conftest.exe' and 'conftest' are 'present' (well, observable)
-# catch 'conftest.exe'.  For instance with Cygwin, 'ls conftest' will
-# work properly (i.e., refer to 'conftest.exe'), while it won't with
-# 'rm'.
+  # If both `conftest.exe' and `conftest' are `present' (well, observable)
+# catch `conftest.exe'.  For instance with Cygwin, `ls conftest' will
+# work properly (i.e., refer to `conftest.exe'), while it won't with
+# `rm'.
 for ac_file in conftest.exe conftest conftest.*; do
   test -f "$ac_file" || continue
   case $ac_file in
@@ -3895,12 +3846,11 @@ for ac_file in conftest.exe conftest conftest.*; do
     * ) break;;
   esac
 done
-else case e in #(
-  e) { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
+else $as_nop
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
 as_fn_error $? "cannot compute suffix of executables: cannot compile and link
-See 'config.log' for more details" "$LINENO" 5; } ;;
-esac
+See \`config.log' for more details" "$LINENO" 5; }
 fi
 rm -f conftest conftest$ac_cv_exeext
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_exeext" >&5
@@ -3916,8 +3866,6 @@ int
 main (void)
 {
 FILE *f = fopen ("conftest.out", "w");
- if (!f)
-  return 1;
  return ferror (f) || fclose (f) != 0;
 
   ;
@@ -3957,27 +3905,26 @@ printf "%s\n" "$ac_try_echo"; } >&5
     if test "$cross_compiling" = maybe; then
 	cross_compiling=yes
     else
-	{ { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
+	{ { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
 as_fn_error 77 "cannot run C compiled programs.
-If you meant to cross compile, use '--host'.
-See 'config.log' for more details" "$LINENO" 5; }
+If you meant to cross compile, use \`--host'.
+See \`config.log' for more details" "$LINENO" 5; }
     fi
   fi
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $cross_compiling" >&5
 printf "%s\n" "$cross_compiling" >&6; }
 
-rm -f conftest.$ac_ext conftest$ac_cv_exeext \
-  conftest.o conftest.obj conftest.out
+rm -f conftest.$ac_ext conftest$ac_cv_exeext conftest.out
 ac_clean_files=$ac_clean_files_save
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for suffix of object files" >&5
 printf %s "checking for suffix of object files... " >&6; }
 if test ${ac_cv_objext+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 int
@@ -4009,18 +3956,16 @@ then :
        break;;
   esac
 done
-else case e in #(
-  e) printf "%s\n" "$as_me: failed program was:" >&5
+else $as_nop
+  printf "%s\n" "$as_me: failed program was:" >&5
 sed 's/^/| /' conftest.$ac_ext >&5
 
-{ { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
+{ { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
 as_fn_error $? "cannot compute suffix of object files: cannot compile
-See 'config.log' for more details" "$LINENO" 5; } ;;
-esac
+See \`config.log' for more details" "$LINENO" 5; }
 fi
-rm -f conftest.$ac_cv_objext conftest.$ac_ext ;;
-esac
+rm -f conftest.$ac_cv_objext conftest.$ac_ext
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_objext" >&5
 printf "%s\n" "$ac_cv_objext" >&6; }
@@ -4031,8 +3976,8 @@ printf %s "checking whether the compiler supports GNU C... " >&6; }
 if test ${ac_cv_c_compiler_gnu+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 int
@@ -4049,14 +3994,12 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   ac_compiler_gnu=yes
-else case e in #(
-  e) ac_compiler_gnu=no ;;
-esac
+else $as_nop
+  ac_compiler_gnu=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 ac_cv_c_compiler_gnu=$ac_compiler_gnu
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_compiler_gnu" >&5
 printf "%s\n" "$ac_cv_c_compiler_gnu" >&6; }
@@ -4074,8 +4017,8 @@ printf %s "checking whether $CC accepts -g... " >&6; }
 if test ${ac_cv_prog_cc_g+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_save_c_werror_flag=$ac_c_werror_flag
+else $as_nop
+  ac_save_c_werror_flag=$ac_c_werror_flag
    ac_c_werror_flag=yes
    ac_cv_prog_cc_g=no
    CFLAGS="-g"
@@ -4093,8 +4036,8 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   ac_cv_prog_cc_g=yes
-else case e in #(
-  e) CFLAGS=""
+else $as_nop
+  CFLAGS=""
       cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
@@ -4109,8 +4052,8 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
 
-else case e in #(
-  e) ac_c_werror_flag=$ac_save_c_werror_flag
+else $as_nop
+  ac_c_werror_flag=$ac_save_c_werror_flag
 	 CFLAGS="-g"
 	 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -4127,15 +4070,12 @@ if ac_fn_c_try_compile "$LINENO"
 then :
   ac_cv_prog_cc_g=yes
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-   ac_c_werror_flag=$ac_save_c_werror_flag ;;
-esac
+   ac_c_werror_flag=$ac_save_c_werror_flag
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_g" >&5
 printf "%s\n" "$ac_cv_prog_cc_g" >&6; }
@@ -4162,8 +4102,8 @@ printf %s "checking for $CC option to enable C11 features... " >&6; }
 if test ${ac_cv_prog_cc_c11+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_cv_prog_cc_c11=no
+else $as_nop
+  ac_cv_prog_cc_c11=no
 ac_save_CC=$CC
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -4180,28 +4120,25 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam
   test "x$ac_cv_prog_cc_c11" != "xno" && break
 done
 rm -f conftest.$ac_ext
-CC=$ac_save_CC ;;
-esac
+CC=$ac_save_CC
 fi
 
 if test "x$ac_cv_prog_cc_c11" = xno
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5
 printf "%s\n" "unsupported" >&6; }
-else case e in #(
-  e) if test "x$ac_cv_prog_cc_c11" = x
+else $as_nop
+  if test "x$ac_cv_prog_cc_c11" = x
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5
 printf "%s\n" "none needed" >&6; }
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c11" >&5
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c11" >&5
 printf "%s\n" "$ac_cv_prog_cc_c11" >&6; }
-     CC="$CC $ac_cv_prog_cc_c11" ;;
-esac
+     CC="$CC $ac_cv_prog_cc_c11"
 fi
   ac_cv_prog_cc_stdc=$ac_cv_prog_cc_c11
-  ac_prog_cc_stdc=c11 ;;
-esac
+  ac_prog_cc_stdc=c11
 fi
 fi
 if test x$ac_prog_cc_stdc = xno
@@ -4211,8 +4148,8 @@ printf %s "checking for $CC option to enable C99 features... " >&6; }
 if test ${ac_cv_prog_cc_c99+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_cv_prog_cc_c99=no
+else $as_nop
+  ac_cv_prog_cc_c99=no
 ac_save_CC=$CC
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -4229,28 +4166,25 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam
   test "x$ac_cv_prog_cc_c99" != "xno" && break
 done
 rm -f conftest.$ac_ext
-CC=$ac_save_CC ;;
-esac
+CC=$ac_save_CC
 fi
 
 if test "x$ac_cv_prog_cc_c99" = xno
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5
 printf "%s\n" "unsupported" >&6; }
-else case e in #(
-  e) if test "x$ac_cv_prog_cc_c99" = x
+else $as_nop
+  if test "x$ac_cv_prog_cc_c99" = x
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5
 printf "%s\n" "none needed" >&6; }
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c99" >&5
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c99" >&5
 printf "%s\n" "$ac_cv_prog_cc_c99" >&6; }
-     CC="$CC $ac_cv_prog_cc_c99" ;;
-esac
+     CC="$CC $ac_cv_prog_cc_c99"
 fi
   ac_cv_prog_cc_stdc=$ac_cv_prog_cc_c99
-  ac_prog_cc_stdc=c99 ;;
-esac
+  ac_prog_cc_stdc=c99
 fi
 fi
 if test x$ac_prog_cc_stdc = xno
@@ -4260,8 +4194,8 @@ printf %s "checking for $CC option to enable C89 features... " >&6; }
 if test ${ac_cv_prog_cc_c89+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_cv_prog_cc_c89=no
+else $as_nop
+  ac_cv_prog_cc_c89=no
 ac_save_CC=$CC
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -4278,28 +4212,25 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam
   test "x$ac_cv_prog_cc_c89" != "xno" && break
 done
 rm -f conftest.$ac_ext
-CC=$ac_save_CC ;;
-esac
+CC=$ac_save_CC
 fi
 
 if test "x$ac_cv_prog_cc_c89" = xno
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5
 printf "%s\n" "unsupported" >&6; }
-else case e in #(
-  e) if test "x$ac_cv_prog_cc_c89" = x
+else $as_nop
+  if test "x$ac_cv_prog_cc_c89" = x
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5
 printf "%s\n" "none needed" >&6; }
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c89" >&5
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c89" >&5
 printf "%s\n" "$ac_cv_prog_cc_c89" >&6; }
-     CC="$CC $ac_cv_prog_cc_c89" ;;
-esac
+     CC="$CC $ac_cv_prog_cc_c89"
 fi
   ac_cv_prog_cc_stdc=$ac_cv_prog_cc_c89
-  ac_prog_cc_stdc=c89 ;;
-esac
+  ac_prog_cc_stdc=c89
 fi
 fi
 
@@ -4320,8 +4251,8 @@ printf %s "checking whether $CC understands -c and -o together... " >&6; }
 if test ${am_cv_prog_cc_c_o+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 int
@@ -4351,8 +4282,7 @@ _ACEOF
     fi
   done
   rm -f core conftest*
-  unset am_i ;;
-esac
+  unset am_i
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_prog_cc_c_o" >&5
 printf "%s\n" "$am_cv_prog_cc_c_o" >&6; }
@@ -4412,8 +4342,8 @@ printf %s "checking whether it is safe to define __EXTENSIONS__... " >&6; }
 if test ${ac_cv_safe_to_define___extensions__+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 #         define __EXTENSIONS__ 1
@@ -4429,12 +4359,10 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   ac_cv_safe_to_define___extensions__=yes
-else case e in #(
-  e) ac_cv_safe_to_define___extensions__=no ;;
-esac
+else $as_nop
+  ac_cv_safe_to_define___extensions__=no
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_safe_to_define___extensions__" >&5
 printf "%s\n" "$ac_cv_safe_to_define___extensions__" >&6; }
@@ -4444,8 +4372,8 @@ printf %s "checking whether _XOPEN_SOURCE should be defined... " >&6; }
 if test ${ac_cv_should_define__xopen_source+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_cv_should_define__xopen_source=no
+else $as_nop
+  ac_cv_should_define__xopen_source=no
     if test $ac_cv_header_wchar_h = yes
 then :
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -4464,8 +4392,8 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
 
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
             #define _XOPEN_SOURCE 500
@@ -4483,12 +4411,10 @@ if ac_fn_c_try_compile "$LINENO"
 then :
   ac_cv_should_define__xopen_source=yes
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-fi ;;
-esac
+fi
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_should_define__xopen_source" >&5
 printf "%s\n" "$ac_cv_should_define__xopen_source" >&6; }
@@ -4513,8 +4439,6 @@ printf "%s\n" "$ac_cv_should_define__xopen_source" >&6; }
 
   printf "%s\n" "#define __STDC_WANT_IEC_60559_DFP_EXT__ 1" >>confdefs.h
 
-  printf "%s\n" "#define __STDC_WANT_IEC_60559_EXT__ 1" >>confdefs.h
-
   printf "%s\n" "#define __STDC_WANT_IEC_60559_FUNCS_EXT__ 1" >>confdefs.h
 
   printf "%s\n" "#define __STDC_WANT_IEC_60559_TYPES_EXT__ 1" >>confdefs.h
@@ -4534,9 +4458,8 @@ then :
 
     printf "%s\n" "#define _POSIX_1_SOURCE 2" >>confdefs.h
 
-else case e in #(
-  e) MINIX= ;;
-esac
+else $as_nop
+  MINIX=
 fi
   if test $ac_cv_safe_to_define___extensions__ = yes
 then :
@@ -4574,8 +4497,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_CXX+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$CXX"; then
+else $as_nop
+  if test -n "$CXX"; then
   ac_cv_prog_CXX="$CXX" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -4597,8 +4520,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 CXX=$ac_cv_prog_CXX
 if test -n "$CXX"; then
@@ -4624,8 +4546,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_CXX+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_CXX"; then
+else $as_nop
+  if test -n "$ac_ct_CXX"; then
   ac_cv_prog_ac_ct_CXX="$ac_ct_CXX" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -4647,8 +4569,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_CXX=$ac_cv_prog_ac_ct_CXX
 if test -n "$ac_ct_CXX"; then
@@ -4708,8 +4629,8 @@ printf %s "checking whether the compiler supports GNU C++... " >&6; }
 if test ${ac_cv_cxx_compiler_gnu+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 int
@@ -4726,14 +4647,12 @@ _ACEOF
 if ac_fn_cxx_try_compile "$LINENO"
 then :
   ac_compiler_gnu=yes
-else case e in #(
-  e) ac_compiler_gnu=no ;;
-esac
+else $as_nop
+  ac_compiler_gnu=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 ac_cv_cxx_compiler_gnu=$ac_compiler_gnu
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_cxx_compiler_gnu" >&5
 printf "%s\n" "$ac_cv_cxx_compiler_gnu" >&6; }
@@ -4751,8 +4670,8 @@ printf %s "checking whether $CXX accepts -g... " >&6; }
 if test ${ac_cv_prog_cxx_g+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_save_cxx_werror_flag=$ac_cxx_werror_flag
+else $as_nop
+  ac_save_cxx_werror_flag=$ac_cxx_werror_flag
    ac_cxx_werror_flag=yes
    ac_cv_prog_cxx_g=no
    CXXFLAGS="-g"
@@ -4770,8 +4689,8 @@ _ACEOF
 if ac_fn_cxx_try_compile "$LINENO"
 then :
   ac_cv_prog_cxx_g=yes
-else case e in #(
-  e) CXXFLAGS=""
+else $as_nop
+  CXXFLAGS=""
       cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
@@ -4786,8 +4705,8 @@ _ACEOF
 if ac_fn_cxx_try_compile "$LINENO"
 then :
 
-else case e in #(
-  e) ac_cxx_werror_flag=$ac_save_cxx_werror_flag
+else $as_nop
+  ac_cxx_werror_flag=$ac_save_cxx_werror_flag
 	 CXXFLAGS="-g"
 	 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -4804,15 +4723,12 @@ if ac_fn_cxx_try_compile "$LINENO"
 then :
   ac_cv_prog_cxx_g=yes
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-   ac_cxx_werror_flag=$ac_save_cxx_werror_flag ;;
-esac
+   ac_cxx_werror_flag=$ac_save_cxx_werror_flag
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cxx_g" >&5
 printf "%s\n" "$ac_cv_prog_cxx_g" >&6; }
@@ -4836,11 +4752,11 @@ if test x$ac_prog_cxx_stdcxx = xno
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $CXX option to enable C++11 features" >&5
 printf %s "checking for $CXX option to enable C++11 features... " >&6; }
-if test ${ac_cv_prog_cxx_cxx11+y}
+if test ${ac_cv_prog_cxx_11+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_cv_prog_cxx_cxx11=no
+else $as_nop
+  ac_cv_prog_cxx_11=no
 ac_save_CXX=$CXX
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -4857,39 +4773,36 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam
   test "x$ac_cv_prog_cxx_cxx11" != "xno" && break
 done
 rm -f conftest.$ac_ext
-CXX=$ac_save_CXX ;;
-esac
+CXX=$ac_save_CXX
 fi
 
 if test "x$ac_cv_prog_cxx_cxx11" = xno
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5
 printf "%s\n" "unsupported" >&6; }
-else case e in #(
-  e) if test "x$ac_cv_prog_cxx_cxx11" = x
+else $as_nop
+  if test "x$ac_cv_prog_cxx_cxx11" = x
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5
 printf "%s\n" "none needed" >&6; }
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cxx_cxx11" >&5
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cxx_cxx11" >&5
 printf "%s\n" "$ac_cv_prog_cxx_cxx11" >&6; }
-     CXX="$CXX $ac_cv_prog_cxx_cxx11" ;;
-esac
+     CXX="$CXX $ac_cv_prog_cxx_cxx11"
 fi
   ac_cv_prog_cxx_stdcxx=$ac_cv_prog_cxx_cxx11
-  ac_prog_cxx_stdcxx=cxx11 ;;
-esac
+  ac_prog_cxx_stdcxx=cxx11
 fi
 fi
 if test x$ac_prog_cxx_stdcxx = xno
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $CXX option to enable C++98 features" >&5
 printf %s "checking for $CXX option to enable C++98 features... " >&6; }
-if test ${ac_cv_prog_cxx_cxx98+y}
+if test ${ac_cv_prog_cxx_98+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_cv_prog_cxx_cxx98=no
+else $as_nop
+  ac_cv_prog_cxx_98=no
 ac_save_CXX=$CXX
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -4906,28 +4819,25 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam
   test "x$ac_cv_prog_cxx_cxx98" != "xno" && break
 done
 rm -f conftest.$ac_ext
-CXX=$ac_save_CXX ;;
-esac
+CXX=$ac_save_CXX
 fi
 
 if test "x$ac_cv_prog_cxx_cxx98" = xno
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5
 printf "%s\n" "unsupported" >&6; }
-else case e in #(
-  e) if test "x$ac_cv_prog_cxx_cxx98" = x
+else $as_nop
+  if test "x$ac_cv_prog_cxx_cxx98" = x
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5
 printf "%s\n" "none needed" >&6; }
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cxx_cxx98" >&5
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cxx_cxx98" >&5
 printf "%s\n" "$ac_cv_prog_cxx_cxx98" >&6; }
-     CXX="$CXX $ac_cv_prog_cxx_cxx98" ;;
-esac
+     CXX="$CXX $ac_cv_prog_cxx_cxx98"
 fi
   ac_cv_prog_cxx_stdcxx=$ac_cv_prog_cxx_cxx98
-  ac_prog_cxx_stdcxx=cxx98 ;;
-esac
+  ac_prog_cxx_stdcxx=cxx98
 fi
 fi
 
@@ -4955,17 +4865,17 @@ ac_compiler_gnu=$ac_cv_cxx_compiler_gnu
       for switch in -std=c++${alternative} +std=c++${alternative} "-h std=c++${alternative}" MSVC; do
         if test x"$switch" = xMSVC; then
                                         switch=-std:c++${alternative}
-          cachevar=`printf "%s\n" "ax_cv_cxx_compile_cxx14_${switch}_MSVC" | sed "$as_sed_sh"`
+          cachevar=`printf "%s\n" "ax_cv_cxx_compile_cxx14_${switch}_MSVC" | $as_tr_sh`
         else
-          cachevar=`printf "%s\n" "ax_cv_cxx_compile_cxx14_$switch" | sed "$as_sed_sh"`
+          cachevar=`printf "%s\n" "ax_cv_cxx_compile_cxx14_$switch" | $as_tr_sh`
         fi
         { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether $CXX supports C++14 features with $switch" >&5
 printf %s "checking whether $CXX supports C++14 features with $switch... " >&6; }
 if eval test \${$cachevar+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_save_CXX="$CXX"
+else $as_nop
+  ac_save_CXX="$CXX"
            CXX="$CXX $switch"
            cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -5385,13 +5295,11 @@ _ACEOF
 if ac_fn_cxx_try_compile "$LINENO"
 then :
   eval $cachevar=yes
-else case e in #(
-  e) eval $cachevar=no ;;
-esac
+else $as_nop
+  eval $cachevar=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-           CXX="$ac_save_CXX" ;;
-esac
+           CXX="$ac_save_CXX"
 fi
 eval ac_res=\$$cachevar
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -5433,7 +5341,7 @@ printf "%s\n" "#define HAVE_CXX14 1" >>confdefs.h
   fi
 
 
-am__api_version='1.17'
+am__api_version='1.16'
 
 
   # Find a good install program.  We prefer a C program (faster),
@@ -5456,8 +5364,8 @@ if test -z "$INSTALL"; then
 if test ${ac_cv_path_install+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+else $as_nop
+  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
 for as_dir in $PATH
 do
   IFS=$as_save_IFS
@@ -5511,8 +5419,7 @@ esac
 IFS=$as_save_IFS
 
 rm -rf conftest.one conftest.two conftest.dir
- ;;
-esac
+
 fi
   if test ${ac_cv_path_install+y}; then
     INSTALL=$ac_cv_path_install
@@ -5535,165 +5442,6 @@ test -z "$INSTALL_SCRIPT" && INSTALL_SCRIPT='${INSTALL}'
 
 test -z "$INSTALL_DATA" && INSTALL_DATA='${INSTALL} -m 644'
 
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether sleep supports fractional seconds" >&5
-printf %s "checking whether sleep supports fractional seconds... " >&6; }
-if test ${am_cv_sleep_fractional_seconds+y}
-then :
-  printf %s "(cached) " >&6
-else case e in #(
-  e) if sleep 0.001 2>/dev/null
-then :
-  am_cv_sleep_fractional_seconds=yes
-else case e in #(
-  e) am_cv_sleep_fractional_seconds=no ;;
-esac
-fi
- ;;
-esac
-fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_sleep_fractional_seconds" >&5
-printf "%s\n" "$am_cv_sleep_fractional_seconds" >&6; }
-
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking filesystem timestamp resolution" >&5
-printf %s "checking filesystem timestamp resolution... " >&6; }
-if test ${am_cv_filesystem_timestamp_resolution+y}
-then :
-  printf %s "(cached) " >&6
-else case e in #(
-  e) # Default to the worst case.
-am_cv_filesystem_timestamp_resolution=2
-
-# Only try to go finer than 1 sec if sleep can do it.
-# Don't try 1 sec, because if 0.01 sec and 0.1 sec don't work,
-# - 1 sec is not much of a win compared to 2 sec, and
-# - it takes 2 seconds to perform the test whether 1 sec works.
-#
-# Instead, just use the default 2s on platforms that have 1s resolution,
-# accept the extra 1s delay when using $sleep in the Automake tests, in
-# exchange for not incurring the 2s delay for running the test for all
-# packages.
-#
-am_try_resolutions=
-if test "$am_cv_sleep_fractional_seconds" = yes; then
-  # Even a millisecond often causes a bunch of false positives,
-  # so just try a hundredth of a second. The time saved between .001 and
-  # .01 is not terribly consequential.
-  am_try_resolutions="0.01 0.1 $am_try_resolutions"
-fi
-
-# In order to catch current-generation FAT out, we must *modify* files
-# that already exist; the *creation* timestamp is finer.  Use names
-# that make ls -t sort them differently when they have equal
-# timestamps than when they have distinct timestamps, keeping
-# in mind that ls -t prints the *newest* file first.
-rm -f conftest.ts?
-: > conftest.ts1
-: > conftest.ts2
-: > conftest.ts3
-
-# Make sure ls -t actually works.  Do 'set' in a subshell so we don't
-# clobber the current shell's arguments. (Outer-level square brackets
-# are removed by m4; they're present so that m4 does not expand
-# <dollar><star>; be careful, easy to get confused.)
-if (
-     set X `ls -t conftest.ts[12]` &&
-     {
-       test "$*" != "X conftest.ts1 conftest.ts2" ||
-       test "$*" != "X conftest.ts2 conftest.ts1";
-     }
-); then :; else
-  # If neither matched, then we have a broken ls.  This can happen
-  # if, for instance, CONFIG_SHELL is bash and it inherits a
-  # broken ls alias from the environment.  This has actually
-  # happened.  Such a system could not be considered "sane".
-  printf "%s\n" ""Bad output from ls -t: \"`ls -t conftest.ts[12]`\""" >&5
-  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
-as_fn_error $? "ls -t produces unexpected output.
-Make sure there is not a broken ls alias in your environment.
-See 'config.log' for more details" "$LINENO" 5; }
-fi
-
-for am_try_res in $am_try_resolutions; do
-  # Any one fine-grained sleep might happen to cross the boundary
-  # between two values of a coarser actual resolution, but if we do
-  # two fine-grained sleeps in a row, at least one of them will fall
-  # entirely within a coarse interval.
-  echo alpha > conftest.ts1
-  sleep $am_try_res
-  echo beta > conftest.ts2
-  sleep $am_try_res
-  echo gamma > conftest.ts3
-
-  # We assume that 'ls -t' will make use of high-resolution
-  # timestamps if the operating system supports them at all.
-  if (set X `ls -t conftest.ts?` &&
-      test "$2" = conftest.ts3 &&
-      test "$3" = conftest.ts2 &&
-      test "$4" = conftest.ts1); then
-    #
-    # Ok, ls -t worked. If we're at a resolution of 1 second, we're done,
-    # because we don't need to test make.
-    make_ok=true
-    if test $am_try_res != 1; then
-      # But if we've succeeded so far with a subsecond resolution, we
-      # have one more thing to check: make. It can happen that
-      # everything else supports the subsecond mtimes, but make doesn't;
-      # notably on macOS, which ships make 3.81 from 2006 (the last one
-      # released under GPLv2). https://bugs.gnu.org/68808
-      #
-      # We test $MAKE if it is defined in the environment, else "make".
-      # It might get overridden later, but our hope is that in practice
-      # it does not matter: it is the system "make" which is (by far)
-      # the most likely to be broken, whereas if the user overrides it,
-      # probably they did so with a better, or at least not worse, make.
-      # https://lists.gnu.org/archive/html/automake/2024-06/msg00051.html
-      #
-      # Create a Makefile (real tab character here):
-      rm -f conftest.mk
-      echo 'conftest.ts1: conftest.ts2' >conftest.mk
-      echo '	touch conftest.ts2' >>conftest.mk
-      #
-      # Now, running
-      #   touch conftest.ts1; touch conftest.ts2; make
-      # should touch ts1 because ts2 is newer. This could happen by luck,
-      # but most often, it will fail if make's support is insufficient. So
-      # test for several consecutive successes.
-      #
-      # (We reuse conftest.ts[12] because we still want to modify existing
-      # files, not create new ones, per above.)
-      n=0
-      make=${MAKE-make}
-      until test $n -eq 3; do
-        echo one > conftest.ts1
-        sleep $am_try_res
-        echo two > conftest.ts2 # ts2 should now be newer than ts1
-        if $make -f conftest.mk | grep 'up to date' >/dev/null; then
-          make_ok=false
-          break # out of $n loop
-        fi
-        n=`expr $n + 1`
-      done
-    fi
-    #
-    if $make_ok; then
-      # Everything we know to check worked out, so call this resolution good.
-      am_cv_filesystem_timestamp_resolution=$am_try_res
-      break # out of $am_try_res loop
-    fi
-    # Otherwise, we'll go on to check the next resolution.
-  fi
-done
-rm -f conftest.ts?
-# (end _am_filesystem_timestamp_resolution)
- ;;
-esac
-fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_filesystem_timestamp_resolution" >&5
-printf "%s\n" "$am_cv_filesystem_timestamp_resolution" >&6; }
-
-# This check should not be cached, as it may vary across builds of
-# different projects.
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether build environment is sane" >&5
 printf %s "checking whether build environment is sane... " >&6; }
 # Reject unsafe characters in $srcdir or the absolute working directory
@@ -5714,45 +5462,49 @@ esac
 # symlink; some systems play weird games with the mod time of symlinks
 # (eg FreeBSD returns the mod time of the symlink's containing
 # directory).
-am_build_env_is_sane=no
-am_has_slept=no
-rm -f conftest.file
-for am_try in 1 2; do
-  echo "timestamp, slept: $am_has_slept" > conftest.file
-  if (
-    set X `ls -Lt "$srcdir/configure" conftest.file 2> /dev/null`
-    if test "$*" = "X"; then
-      # -L didn't work.
-      set X `ls -t "$srcdir/configure" conftest.file`
-    fi
-    test "$2" = conftest.file
-  ); then
-    am_build_env_is_sane=yes
-    break
-  fi
-  # Just in case.
-  sleep "$am_cv_filesystem_timestamp_resolution"
-  am_has_slept=yes
-done
-
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_build_env_is_sane" >&5
-printf "%s\n" "$am_build_env_is_sane" >&6; }
-if test "$am_build_env_is_sane" = no; then
-  as_fn_error $? "newly created file is older than distributed files!
+if (
+   am_has_slept=no
+   for am_try in 1 2; do
+     echo "timestamp, slept: $am_has_slept" > conftest.file
+     set X `ls -Lt "$srcdir/configure" conftest.file 2> /dev/null`
+     if test "$*" = "X"; then
+	# -L didn't work.
+	set X `ls -t "$srcdir/configure" conftest.file`
+     fi
+     if test "$*" != "X $srcdir/configure conftest.file" \
+	&& test "$*" != "X conftest.file $srcdir/configure"; then
+
+	# If neither matched, then we have a broken ls.  This can happen
+	# if, for instance, CONFIG_SHELL is bash and it inherits a
+	# broken ls alias from the environment.  This has actually
+	# happened.  Such a system could not be considered "sane".
+	as_fn_error $? "ls -t appears to fail.  Make sure there is not a broken
+  alias in your environment" "$LINENO" 5
+     fi
+     if test "$2" = conftest.file || test $am_try -eq 2; then
+       break
+     fi
+     # Just in case.
+     sleep 1
+     am_has_slept=yes
+   done
+   test "$2" = conftest.file
+   )
+then
+   # Ok.
+   :
+else
+   as_fn_error $? "newly created file is older than distributed files!
 Check your system clock" "$LINENO" 5
 fi
-
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+printf "%s\n" "yes" >&6; }
 # If we didn't sleep, we still need to ensure time stamps of config.status and
 # generated files are strictly newer.
 am_sleep_pid=
-if test -e conftest.file || grep 'slept: no' conftest.file >/dev/null 2>&1
-then :
-
-else case e in #(
-  e)   ( sleep "$am_cv_filesystem_timestamp_resolution" ) &
+if grep 'slept: no' conftest.file >/dev/null 2>&1; then
+  ( sleep 1 ) &
   am_sleep_pid=$!
- ;;
-esac
 fi
 
 rm -f conftest.file
@@ -5763,7 +5515,7 @@ test "$program_prefix" != NONE &&
 test "$program_suffix" != NONE &&
   program_transform_name="s&\$&$program_suffix&;$program_transform_name"
 # Double any \ or $.
-# By default was 's,x,x', remove it if useless.
+# By default was `s,x,x', remove it if useless.
 ac_script='s/[\\$]/&&/g;s/;s,x,x,$//'
 program_transform_name=`printf "%s\n" "$program_transform_name" | sed "$ac_script"`
 
@@ -5802,8 +5554,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_STRIP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$STRIP"; then
+else $as_nop
+  if test -n "$STRIP"; then
   ac_cv_prog_STRIP="$STRIP" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -5825,8 +5577,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 STRIP=$ac_cv_prog_STRIP
 if test -n "$STRIP"; then
@@ -5848,8 +5599,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_STRIP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_STRIP"; then
+else $as_nop
+  if test -n "$ac_ct_STRIP"; then
   ac_cv_prog_ac_ct_STRIP="$ac_ct_STRIP" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -5871,8 +5622,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_STRIP=$ac_cv_prog_ac_ct_STRIP
 if test -n "$ac_ct_STRIP"; then
@@ -5908,8 +5658,8 @@ if test -z "$MKDIR_P"; then
   if test ${ac_cv_path_mkdir+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+else $as_nop
+  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
 for as_dir in $PATH$PATH_SEPARATOR/opt/sfw/bin
 do
   IFS=$as_save_IFS
@@ -5923,7 +5673,7 @@ do
 	   as_fn_executable_p "$as_dir$ac_prog$ac_exec_ext" || continue
 	   case `"$as_dir$ac_prog$ac_exec_ext" --version 2>&1` in #(
 	     'mkdir ('*'coreutils) '* | \
-	     *'BusyBox '* | \
+	     'BusyBox '* | \
 	     'mkdir (fileutils) '4.1*)
 	       ac_cv_path_mkdir=$as_dir$ac_prog$ac_exec_ext
 	       break 3;;
@@ -5932,17 +5682,18 @@ do
        done
   done
 IFS=$as_save_IFS
- ;;
-esac
+
 fi
 
   test -d ./--version && rmdir ./--version
   if test ${ac_cv_path_mkdir+y}; then
     MKDIR_P="$ac_cv_path_mkdir -p"
   else
-    # As a last resort, use plain mkdir -p,
-    # in the hope it doesn't have the bugs of ancient mkdir.
-    MKDIR_P='mkdir -p'
+    # As a last resort, use the slow shell script.  Don't cache a
+    # value for MKDIR_P within a source directory, because that will
+    # break other packages using the cache if that directory is
+    # removed, or if the value is a relative name.
+    MKDIR_P="$ac_install_sh -d"
   fi
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $MKDIR_P" >&5
@@ -5957,8 +5708,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_AWK+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$AWK"; then
+else $as_nop
+  if test -n "$AWK"; then
   ac_cv_prog_AWK="$AWK" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -5980,8 +5731,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 AWK=$ac_cv_prog_AWK
 if test -n "$AWK"; then
@@ -6003,8 +5753,8 @@ ac_make=`printf "%s\n" "$2" | sed 's/+/p/g; s/[^a-zA-Z0-9_]/_/g'`
 if eval test \${ac_cv_prog_make_${ac_make}_set+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat >conftest.make <<\_ACEOF
+else $as_nop
+  cat >conftest.make <<\_ACEOF
 SHELL = /bin/sh
 all:
 	@echo '@@@%%%=$(MAKE)=@@@%%%'
@@ -6016,8 +5766,7 @@ case `${MAKE-make} -f conftest.make 2>/dev/null` in
   *)
     eval ac_cv_prog_make_${ac_make}_set=no;;
 esac
-rm -f conftest.make ;;
-esac
+rm -f conftest.make
 fi
 if eval test \$ac_cv_prog_make_${ac_make}_set = yes; then
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
@@ -6102,21 +5851,25 @@ else
 fi
 
 
-AM_DEFAULT_VERBOSITY=1
 # Check whether --enable-silent-rules was given.
 if test ${enable_silent_rules+y}
 then :
   enableval=$enable_silent_rules;
 fi
 
+case $enable_silent_rules in # (((
+  yes) AM_DEFAULT_VERBOSITY=0;;
+   no) AM_DEFAULT_VERBOSITY=1;;
+    *) AM_DEFAULT_VERBOSITY=1;;
+esac
 am_make=${MAKE-make}
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether $am_make supports nested variables" >&5
 printf %s "checking whether $am_make supports nested variables... " >&6; }
 if test ${am_cv_make_support_nested_variables+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if printf "%s\n" 'TRUE=$(BAR$(V))
+else $as_nop
+  if printf "%s\n" 'TRUE=$(BAR$(V))
 BAR0=false
 BAR1=true
 V=1
@@ -6126,49 +5879,18 @@ am__doit:
   am_cv_make_support_nested_variables=yes
 else
   am_cv_make_support_nested_variables=no
-fi ;;
-esac
+fi
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_make_support_nested_variables" >&5
 printf "%s\n" "$am_cv_make_support_nested_variables" >&6; }
-AM_BACKSLASH='\'
-
-am__rm_f_notfound=
-if (rm -f && rm -fr && rm -rf) 2>/dev/null
-then :
-
-else case e in #(
-  e) am__rm_f_notfound='""' ;;
-esac
-fi
-
-
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking xargs -n works" >&5
-printf %s "checking xargs -n works... " >&6; }
-if test ${am_cv_xargs_n_works+y}
-then :
-  printf %s "(cached) " >&6
-else case e in #(
-  e) if test "`echo 1 2 3 | xargs -n2 echo`" = "1 2
-3"
-then :
-  am_cv_xargs_n_works=yes
-else case e in #(
-  e) am_cv_xargs_n_works=no ;;
-esac
-fi ;;
-esac
-fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_xargs_n_works" >&5
-printf "%s\n" "$am_cv_xargs_n_works" >&6; }
-if test "$am_cv_xargs_n_works" = yes
-then :
-  am__xargs_n='xargs -n'
-else case e in #(
-  e)   am__xargs_n='am__xargs_n () { shift; sed "s/ /\\n/g" | while read am__xargs_n_arg; do "" "$am__xargs_n_arg"; done; }'
- ;;
-esac
+if test $am_cv_make_support_nested_variables = yes; then
+    AM_V='$(V)'
+  AM_DEFAULT_V='$(AM_DEFAULT_VERBOSITY)'
+else
+  AM_V=$AM_DEFAULT_VERBOSITY
+  AM_DEFAULT_V=$AM_DEFAULT_VERBOSITY
 fi
+AM_BACKSLASH='\'
 
 if test "`cd $srcdir && pwd`" != "`pwd`"; then
   # Use -I$(srcdir) only when $(srcdir) != ., so that make's output
@@ -6192,7 +5914,7 @@ fi
 
 # Define the identity of the package.
  PACKAGE='c-ares'
- VERSION='1.34.2'
+ VERSION='1.34.3'
 
 
 printf "%s\n" "#define PACKAGE \"$PACKAGE\"" >>confdefs.h
@@ -6245,8 +5967,8 @@ printf %s "checking dependency style of $depcc... " >&6; }
 if test ${am_cv_CC_dependencies_compiler_type+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then
+else $as_nop
+  if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then
   # We make a subdir and do the tests there.  Otherwise we can end up
   # making bogus files that we don't know about and never remove.  For
   # instance it was reported that on HP-UX the gcc test will end up
@@ -6333,7 +6055,7 @@ else case e in #(
       # icc doesn't choke on unknown options, it will just issue warnings
       # or remarks (even with -Werror).  So we grep stderr for any message
       # that says an option was ignored or not supported.
-      # When given -MP, icc 7.0 and 7.1 complain thus:
+      # When given -MP, icc 7.0 and 7.1 complain thusly:
       #   icc: Command line warning: ignoring option '-M'; no argument required
       # The diagnosis changed in icc 8.0:
       #   icc: Command line remark: option '-MP' not supported
@@ -6350,8 +6072,7 @@ else case e in #(
 else
   am_cv_CC_dependencies_compiler_type=none
 fi
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_CC_dependencies_compiler_type" >&5
 printf "%s\n" "$am_cv_CC_dependencies_compiler_type" >&6; }
@@ -6375,8 +6096,8 @@ printf %s "checking dependency style of $depcc... " >&6; }
 if test ${am_cv_CXX_dependencies_compiler_type+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then
+else $as_nop
+  if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then
   # We make a subdir and do the tests there.  Otherwise we can end up
   # making bogus files that we don't know about and never remove.  For
   # instance it was reported that on HP-UX the gcc test will end up
@@ -6463,7 +6184,7 @@ else case e in #(
       # icc doesn't choke on unknown options, it will just issue warnings
       # or remarks (even with -Werror).  So we grep stderr for any message
       # that says an option was ignored or not supported.
-      # When given -MP, icc 7.0 and 7.1 complain thus:
+      # When given -MP, icc 7.0 and 7.1 complain thusly:
       #   icc: Command line warning: ignoring option '-M'; no argument required
       # The diagnosis changed in icc 8.0:
       #   icc: Command line remark: option '-MP' not supported
@@ -6480,8 +6201,7 @@ else case e in #(
 else
   am_cv_CXX_dependencies_compiler_type=none
 fi
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_CXX_dependencies_compiler_type" >&5
 printf "%s\n" "$am_cv_CXX_dependencies_compiler_type" >&6; }
@@ -6513,9 +6233,47 @@ fi
 
 
 
+# POSIX will say in a future version that running "rm -f" with no argument
+# is OK; and we want to be able to make that assumption in our Makefile
+# recipes.  So use an aggressive probe to check that the usage we want is
+# actually supported "in the wild" to an acceptable degree.
+# See automake bug#10828.
+# To make any issue more visible, cause the running configure to be aborted
+# by default if the 'rm' program in use doesn't match our expectations; the
+# user can still override this though.
+if rm -f && rm -fr && rm -rf; then : OK; else
+  cat >&2 <<'END'
+Oops!
+
+Your 'rm' program seems unable to run without file operands specified
+on the command line, even when the '-f' option is present.  This is contrary
+to the behaviour of most rm programs out there, and not conforming with
+the upcoming POSIX standard: <http://austingroupbugs.net/view.php?id=542>
+
+Please tell bug-automake@gnu.org about your system, including the value
+of your $PATH and any error possibly output before this message.  This
+can help us improve future automake versions.
 
+END
+  if test x"$ACCEPT_INFERIOR_RM_PROGRAM" = x"yes"; then
+    echo 'Configuration will proceed anyway, since you have set the' >&2
+    echo 'ACCEPT_INFERIOR_RM_PROGRAM variable to "yes"' >&2
+    echo >&2
+  else
+    cat >&2 <<'END'
+Aborting the configuration process, to ensure you take notice of the issue.
 
+You can download and install GNU coreutils to get an 'rm' implementation
+that behaves properly: <https://www.gnu.org/software/coreutils/>.
 
+If you want to complete the configuration process using your problematic
+'rm' anyway, export the environment variable ACCEPT_INFERIOR_RM_PROGRAM
+to "yes", and re-run configure.
+
+END
+    as_fn_error $? "Your 'rm' program is bad, sorry." "$LINENO" 5
+  fi
+fi
 
 case `pwd` in
   *\ * | *\	*)
@@ -6525,8 +6283,8 @@ esac
 
 
 
-macro_version='2.5.3'
-macro_revision='2.5.3'
+macro_version='2.4.6'
+macro_revision='2.4.6'
 
 
 
@@ -6554,16 +6312,15 @@ printf %s "checking build system type... " >&6; }
 if test ${ac_cv_build+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_build_alias=$build_alias
+else $as_nop
+  ac_build_alias=$build_alias
 test "x$ac_build_alias" = x &&
   ac_build_alias=`$SHELL "${ac_aux_dir}config.guess"`
 test "x$ac_build_alias" = x &&
   as_fn_error $? "cannot guess build type; you must specify one" "$LINENO" 5
 ac_cv_build=`$SHELL "${ac_aux_dir}config.sub" $ac_build_alias` ||
   as_fn_error $? "$SHELL ${ac_aux_dir}config.sub $ac_build_alias failed" "$LINENO" 5
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_build" >&5
 printf "%s\n" "$ac_cv_build" >&6; }
@@ -6590,15 +6347,14 @@ printf %s "checking host system type... " >&6; }
 if test ${ac_cv_host+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test "x$host_alias" = x; then
+else $as_nop
+  if test "x$host_alias" = x; then
   ac_cv_host=$ac_cv_build
 else
   ac_cv_host=`$SHELL "${ac_aux_dir}config.sub" $host_alias` ||
     as_fn_error $? "$SHELL ${ac_aux_dir}config.sub $host_alias failed" "$LINENO" 5
 fi
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_host" >&5
 printf "%s\n" "$ac_cv_host" >&6; }
@@ -6694,8 +6450,8 @@ printf %s "checking for a sed that does not truncate output... " >&6; }
 if test ${ac_cv_path_SED+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)           ac_script=s/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb/
+else $as_nop
+            ac_script=s/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb/
      for ac_i in 1 2 3 4 5 6 7; do
        ac_script="$ac_script$as_nl$ac_script"
      done
@@ -6720,10 +6476,9 @@ do
       as_fn_executable_p "$ac_path_SED" || continue
 # Check for GNU ac_path_SED and select it if it is found.
   # Check for GNU $ac_path_SED
-case `"$ac_path_SED" --version 2>&1` in #(
+case `"$ac_path_SED" --version 2>&1` in
 *GNU*)
   ac_cv_path_SED="$ac_path_SED" ac_path_SED_found=:;;
-#(
 *)
   ac_count=0
   printf %s 0123456789 >"conftest.in"
@@ -6758,8 +6513,7 @@ IFS=$as_save_IFS
 else
   ac_cv_path_SED=$SED
 fi
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_SED" >&5
 printf "%s\n" "$ac_cv_path_SED" >&6; }
@@ -6784,8 +6538,8 @@ printf %s "checking for grep that handles long lines and -e... " >&6; }
 if test ${ac_cv_path_GREP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -z "$GREP"; then
+else $as_nop
+  if test -z "$GREP"; then
   ac_path_GREP_found=false
   # Loop through the user's path and test for each of PROGNAME-LIST
   as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -6804,10 +6558,9 @@ do
       as_fn_executable_p "$ac_path_GREP" || continue
 # Check for GNU ac_path_GREP and select it if it is found.
   # Check for GNU $ac_path_GREP
-case `"$ac_path_GREP" --version 2>&1` in #(
+case `"$ac_path_GREP" --version 2>&1` in
 *GNU*)
   ac_cv_path_GREP="$ac_path_GREP" ac_path_GREP_found=:;;
-#(
 *)
   ac_count=0
   printf %s 0123456789 >"conftest.in"
@@ -6842,8 +6595,7 @@ IFS=$as_save_IFS
 else
   ac_cv_path_GREP=$GREP
 fi
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_GREP" >&5
 printf "%s\n" "$ac_cv_path_GREP" >&6; }
@@ -6855,8 +6607,8 @@ printf %s "checking for egrep... " >&6; }
 if test ${ac_cv_path_EGREP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if echo a | $GREP -E '(a|b)' >/dev/null 2>&1
+else $as_nop
+  if echo a | $GREP -E '(a|b)' >/dev/null 2>&1
    then ac_cv_path_EGREP="$GREP -E"
    else
      if test -z "$EGREP"; then
@@ -6878,10 +6630,9 @@ do
       as_fn_executable_p "$ac_path_EGREP" || continue
 # Check for GNU ac_path_EGREP and select it if it is found.
   # Check for GNU $ac_path_EGREP
-case `"$ac_path_EGREP" --version 2>&1` in #(
+case `"$ac_path_EGREP" --version 2>&1` in
 *GNU*)
   ac_cv_path_EGREP="$ac_path_EGREP" ac_path_EGREP_found=:;;
-#(
 *)
   ac_count=0
   printf %s 0123456789 >"conftest.in"
@@ -6917,23 +6668,20 @@ else
   ac_cv_path_EGREP=$EGREP
 fi
 
-   fi ;;
-esac
+   fi
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_EGREP" >&5
 printf "%s\n" "$ac_cv_path_EGREP" >&6; }
  EGREP="$ac_cv_path_EGREP"
 
-         EGREP_TRADITIONAL=$EGREP
- ac_cv_path_EGREP_TRADITIONAL=$EGREP
 
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fgrep" >&5
 printf %s "checking for fgrep... " >&6; }
 if test ${ac_cv_path_FGREP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if echo 'ab*c' | $GREP -F 'ab*c' >/dev/null 2>&1
+else $as_nop
+  if echo 'ab*c' | $GREP -F 'ab*c' >/dev/null 2>&1
    then ac_cv_path_FGREP="$GREP -F"
    else
      if test -z "$FGREP"; then
@@ -6955,10 +6703,9 @@ do
       as_fn_executable_p "$ac_path_FGREP" || continue
 # Check for GNU ac_path_FGREP and select it if it is found.
   # Check for GNU $ac_path_FGREP
-case `"$ac_path_FGREP" --version 2>&1` in #(
+case `"$ac_path_FGREP" --version 2>&1` in
 *GNU*)
   ac_cv_path_FGREP="$ac_path_FGREP" ac_path_FGREP_found=:;;
-#(
 *)
   ac_count=0
   printf %s 0123456789 >"conftest.in"
@@ -6994,8 +6741,7 @@ else
   ac_cv_path_FGREP=$FGREP
 fi
 
-   fi ;;
-esac
+   fi
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_FGREP" >&5
 printf "%s\n" "$ac_cv_path_FGREP" >&6; }
@@ -7026,9 +6772,8 @@ test -z "$GREP" && GREP=grep
 if test ${with_gnu_ld+y}
 then :
   withval=$with_gnu_ld; test no = "$withval" || with_gnu_ld=yes
-else case e in #(
-  e) with_gnu_ld=no ;;
-esac
+else $as_nop
+  with_gnu_ld=no
 fi
 
 ac_prog=ld
@@ -7037,7 +6782,7 @@ if test yes = "$GCC"; then
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for ld used by $CC" >&5
 printf %s "checking for ld used by $CC... " >&6; }
   case $host in
-  *-*-mingw* | *-*-windows*)
+  *-*-mingw*)
     # gcc leaves a trailing carriage return, which upsets mingw
     ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;;
   *)
@@ -7073,8 +6818,8 @@ fi
 if test ${lt_cv_path_LD+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -z "$LD"; then
+else $as_nop
+  if test -z "$LD"; then
   lt_save_ifs=$IFS; IFS=$PATH_SEPARATOR
   for ac_dir in $PATH; do
     IFS=$lt_save_ifs
@@ -7097,8 +6842,7 @@ else case e in #(
   IFS=$lt_save_ifs
 else
   lt_cv_path_LD=$LD # Let the user override the test with a path.
-fi ;;
-esac
+fi
 fi
 
 LD=$lt_cv_path_LD
@@ -7115,8 +6859,8 @@ printf %s "checking if the linker ($LD) is GNU ld... " >&6; }
 if test ${lt_cv_prog_gnu_ld+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) # I'd rather use --version here, but apparently some GNU lds only accept -v.
+else $as_nop
+  # I'd rather use --version here, but apparently some GNU lds only accept -v.
 case `$LD -v 2>&1 </dev/null` in
 *GNU* | *'with BFD'*)
   lt_cv_prog_gnu_ld=yes
@@ -7124,7 +6868,6 @@ case `$LD -v 2>&1 </dev/null` in
 *)
   lt_cv_prog_gnu_ld=no
   ;;
-esac ;;
 esac
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_gnu_ld" >&5
@@ -7144,8 +6887,8 @@ printf %s "checking for BSD- or MS-compatible name lister (nm)... " >&6; }
 if test ${lt_cv_path_NM+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$NM"; then
+else $as_nop
+  if test -n "$NM"; then
   # Let the user override the test.
   lt_cv_path_NM=$NM
 else
@@ -7166,16 +6909,16 @@ else
 	# Tru64's nm complains that /dev/null is an invalid object file
 	# MSYS converts /dev/null to NUL, MinGW nm treats NUL as empty
 	case $build_os in
-	mingw* | windows*) lt_bad_file=conftest.nm/nofile ;;
+	mingw*) lt_bad_file=conftest.nm/nofile ;;
 	*) lt_bad_file=/dev/null ;;
 	esac
-	case `"$tmp_nm" -B $lt_bad_file 2>&1 | $SED '1q'` in
+	case `"$tmp_nm" -B $lt_bad_file 2>&1 | sed '1q'` in
 	*$lt_bad_file* | *'Invalid file or object type'*)
 	  lt_cv_path_NM="$tmp_nm -B"
 	  break 2
 	  ;;
 	*)
-	  case `"$tmp_nm" -p /dev/null 2>&1 | $SED '1q'` in
+	  case `"$tmp_nm" -p /dev/null 2>&1 | sed '1q'` in
 	  */dev/null*)
 	    lt_cv_path_NM="$tmp_nm -p"
 	    break 2
@@ -7192,8 +6935,7 @@ else
     IFS=$lt_save_ifs
   done
   : ${lt_cv_path_NM=no}
-fi ;;
-esac
+fi
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_path_NM" >&5
 printf "%s\n" "$lt_cv_path_NM" >&6; }
@@ -7214,8 +6956,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_DUMPBIN+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$DUMPBIN"; then
+else $as_nop
+  if test -n "$DUMPBIN"; then
   ac_cv_prog_DUMPBIN="$DUMPBIN" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -7237,8 +6979,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 DUMPBIN=$ac_cv_prog_DUMPBIN
 if test -n "$DUMPBIN"; then
@@ -7264,8 +7005,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_DUMPBIN+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_DUMPBIN"; then
+else $as_nop
+  if test -n "$ac_ct_DUMPBIN"; then
   ac_cv_prog_ac_ct_DUMPBIN="$ac_ct_DUMPBIN" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -7287,8 +7028,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_DUMPBIN=$ac_cv_prog_ac_ct_DUMPBIN
 if test -n "$ac_ct_DUMPBIN"; then
@@ -7316,7 +7056,7 @@ esac
   fi
 fi
 
-    case `$DUMPBIN -symbols -headers /dev/null 2>&1 | $SED '1q'` in
+    case `$DUMPBIN -symbols -headers /dev/null 2>&1 | sed '1q'` in
     *COFF*)
       DUMPBIN="$DUMPBIN -symbols -headers"
       ;;
@@ -7342,8 +7082,8 @@ printf %s "checking the name lister ($NM) interface... " >&6; }
 if test ${lt_cv_nm_interface+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_nm_interface="BSD nm"
+else $as_nop
+  lt_cv_nm_interface="BSD nm"
   echo "int some_variable = 0;" > conftest.$ac_ext
   (eval echo "\"\$as_me:$LINENO: $ac_compile\"" >&5)
   (eval "$ac_compile" 2>conftest.err)
@@ -7356,8 +7096,7 @@ else case e in #(
   if $GREP 'External.*some_variable' conftest.out > /dev/null; then
     lt_cv_nm_interface="MS dumpbin"
   fi
-  rm -f conftest* ;;
-esac
+  rm -f conftest*
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_nm_interface" >&5
 printf "%s\n" "$lt_cv_nm_interface" >&6; }
@@ -7379,8 +7118,8 @@ printf %s "checking the maximum length of command line arguments... " >&6; }
 if test ${lt_cv_sys_max_cmd_len+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)   i=0
+else $as_nop
+    i=0
   teststring=ABCD
 
   case $build_os in
@@ -7399,7 +7138,7 @@ else case e in #(
     lt_cv_sys_max_cmd_len=-1;
     ;;
 
-  cygwin* | mingw* | windows* | cegcc*)
+  cygwin* | mingw* | cegcc*)
     # On Win9x/ME, this test blows up -- it succeeds, but takes
     # about 5 minutes as the teststring grows exponentially.
     # Worse, since 9x/ME are not pre-emptively multitasking,
@@ -7421,7 +7160,7 @@ else case e in #(
     lt_cv_sys_max_cmd_len=8192;
     ;;
 
-  darwin* | dragonfly* | freebsd* | midnightbsd* | netbsd* | openbsd*)
+  bitrig* | darwin* | dragonfly* | freebsd* | netbsd* | openbsd*)
     # This has been around since 386BSD, at least.  Likely further.
     if test -x /sbin/sysctl; then
       lt_cv_sys_max_cmd_len=`/sbin/sysctl -n kern.argmax`
@@ -7464,7 +7203,7 @@ else case e in #(
   sysv5* | sco5v6* | sysv4.2uw2*)
     kargmax=`grep ARG_MAX /etc/conf/cf.d/stune 2>/dev/null`
     if test -n "$kargmax"; then
-      lt_cv_sys_max_cmd_len=`echo $kargmax | $SED 's/.*[	 ]//'`
+      lt_cv_sys_max_cmd_len=`echo $kargmax | sed 's/.*[	 ]//'`
     else
       lt_cv_sys_max_cmd_len=32768
     fi
@@ -7502,8 +7241,7 @@ else case e in #(
     fi
     ;;
   esac
- ;;
-esac
+
 fi
 
 if test -n "$lt_cv_sys_max_cmd_len"; then
@@ -7560,11 +7298,11 @@ printf %s "checking how to convert $build file names to $host format... " >&6; }
 if test ${lt_cv_to_host_file_cmd+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) case $host in
+else $as_nop
+  case $host in
   *-*-mingw* )
     case $build in
-      *-*-mingw* | *-*-windows* ) # actually msys
+      *-*-mingw* ) # actually msys
         lt_cv_to_host_file_cmd=func_convert_file_msys_to_w32
         ;;
       *-*-cygwin* )
@@ -7577,7 +7315,7 @@ else case e in #(
     ;;
   *-*-cygwin* )
     case $build in
-      *-*-mingw* | *-*-windows* ) # actually msys
+      *-*-mingw* ) # actually msys
         lt_cv_to_host_file_cmd=func_convert_file_msys_to_cygwin
         ;;
       *-*-cygwin* )
@@ -7592,8 +7330,7 @@ else case e in #(
     lt_cv_to_host_file_cmd=func_convert_file_noop
     ;;
 esac
- ;;
-esac
+
 fi
 
 to_host_file_cmd=$lt_cv_to_host_file_cmd
@@ -7609,20 +7346,19 @@ printf %s "checking how to convert $build file names to toolchain format... " >&
 if test ${lt_cv_to_tool_file_cmd+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) #assume ordinary cross tools, or native build.
+else $as_nop
+  #assume ordinary cross tools, or native build.
 lt_cv_to_tool_file_cmd=func_convert_file_noop
 case $host in
-  *-*-mingw* | *-*-windows* )
+  *-*-mingw* )
     case $build in
-      *-*-mingw* | *-*-windows* ) # actually msys
+      *-*-mingw* ) # actually msys
         lt_cv_to_tool_file_cmd=func_convert_file_msys_to_w32
         ;;
     esac
     ;;
 esac
- ;;
-esac
+
 fi
 
 to_tool_file_cmd=$lt_cv_to_tool_file_cmd
@@ -7638,9 +7374,8 @@ printf %s "checking for $LD option to reload object files... " >&6; }
 if test ${lt_cv_ld_reload_flag+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_ld_reload_flag='-r' ;;
-esac
+else $as_nop
+  lt_cv_ld_reload_flag='-r'
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_ld_reload_flag" >&5
 printf "%s\n" "$lt_cv_ld_reload_flag" >&6; }
@@ -7651,7 +7386,7 @@ case $reload_flag in
 esac
 reload_cmds='$LD$reload_flag -o $output$reload_objs'
 case $host_os in
-  cygwin* | mingw* | windows* | pw32* | cegcc*)
+  cygwin* | mingw* | pw32* | cegcc*)
     if test yes != "$GCC"; then
       reload_cmds=false
     fi
@@ -7673,56 +7408,6 @@ esac
 
 
 
-# Extract the first word of "file", so it can be a program name with args.
-set dummy file; ac_word=$2
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
-printf %s "checking for $ac_word... " >&6; }
-if test ${ac_cv_prog_FILECMD+y}
-then :
-  printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$FILECMD"; then
-  ac_cv_prog_FILECMD="$FILECMD" # Let the user override the test.
-else
-as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
-  IFS=$as_save_IFS
-  case $as_dir in #(((
-    '') as_dir=./ ;;
-    */) ;;
-    *) as_dir=$as_dir/ ;;
-  esac
-    for ac_exec_ext in '' $ac_executable_extensions; do
-  if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
-    ac_cv_prog_FILECMD="file"
-    printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
-    break 2
-  fi
-done
-  done
-IFS=$as_save_IFS
-
-  test -z "$ac_cv_prog_FILECMD" && ac_cv_prog_FILECMD=":"
-fi ;;
-esac
-fi
-FILECMD=$ac_cv_prog_FILECMD
-if test -n "$FILECMD"; then
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $FILECMD" >&5
-printf "%s\n" "$FILECMD" >&6; }
-else
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-fi
-
-
-
-
-
-
-
-
 if test -n "$ac_tool_prefix"; then
   # Extract the first word of "${ac_tool_prefix}objdump", so it can be a program name with args.
 set dummy ${ac_tool_prefix}objdump; ac_word=$2
@@ -7731,8 +7416,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_OBJDUMP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$OBJDUMP"; then
+else $as_nop
+  if test -n "$OBJDUMP"; then
   ac_cv_prog_OBJDUMP="$OBJDUMP" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -7754,8 +7439,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 OBJDUMP=$ac_cv_prog_OBJDUMP
 if test -n "$OBJDUMP"; then
@@ -7777,8 +7461,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_OBJDUMP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_OBJDUMP"; then
+else $as_nop
+  if test -n "$ac_ct_OBJDUMP"; then
   ac_cv_prog_ac_ct_OBJDUMP="$ac_ct_OBJDUMP" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -7800,8 +7484,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_OBJDUMP=$ac_cv_prog_ac_ct_OBJDUMP
 if test -n "$ac_ct_OBJDUMP"; then
@@ -7839,8 +7522,8 @@ printf %s "checking how to recognize dependent libraries... " >&6; }
 if test ${lt_cv_deplibs_check_method+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_file_magic_cmd='$MAGIC_CMD'
+else $as_nop
+  lt_cv_file_magic_cmd='$MAGIC_CMD'
 lt_cv_file_magic_test_file=
 lt_cv_deplibs_check_method='unknown'
 # Need to set the preceding variable on all platforms that support
@@ -7848,6 +7531,7 @@ lt_cv_deplibs_check_method='unknown'
 # 'none' -- dependencies not supported.
 # 'unknown' -- same as none, but documents that we really don't know.
 # 'pass_all' -- all dependencies passed with no checks.
+# 'test_compile' -- check by making test program.
 # 'file_magic [[regex]]' -- check by looking for files in library path
 # that responds to the $file_magic_cmd with a given extended regex.
 # If you have 'file' or equivalent on your system and you're not sure
@@ -7864,7 +7548,7 @@ beos*)
 
 bsdi[45]*)
   lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [ML]SB (shared object|dynamic lib)'
-  lt_cv_file_magic_cmd='$FILECMD -L'
+  lt_cv_file_magic_cmd='/usr/bin/file -L'
   lt_cv_file_magic_test_file=/shlib/libc.so
   ;;
 
@@ -7874,7 +7558,7 @@ cygwin*)
   lt_cv_file_magic_cmd='func_win32_libid'
   ;;
 
-mingw* | windows* | pw32*)
+mingw* | pw32*)
   # Base MSYS/MinGW do not provide the 'file' command needed by
   # func_win32_libid shell function, so use a weaker test based on 'objdump',
   # unless we find 'file', for example because we are cross-compiling.
@@ -7883,7 +7567,7 @@ mingw* | windows* | pw32*)
     lt_cv_file_magic_cmd='func_win32_libid'
   else
     # Keep this pattern in sync with the one in func_win32_libid.
-    lt_cv_deplibs_check_method='file_magic file format (pei*-i386(.*architecture: i386)?|pe-arm-wince|pe-x86-64|pe-aarch64)'
+    lt_cv_deplibs_check_method='file_magic file format (pei*-i386(.*architecture: i386)?|pe-arm-wince|pe-x86-64)'
     lt_cv_file_magic_cmd='$OBJDUMP -f'
   fi
   ;;
@@ -7898,14 +7582,14 @@ darwin* | rhapsody*)
   lt_cv_deplibs_check_method=pass_all
   ;;
 
-freebsd* | dragonfly* | midnightbsd*)
+freebsd* | dragonfly*)
   if echo __ELF__ | $CC -E - | $GREP __ELF__ > /dev/null; then
     case $host_cpu in
     i*86 )
       # Not sure whether the presence of OpenBSD here was a mistake.
       # Let's accept both of them until this is cleared up.
       lt_cv_deplibs_check_method='file_magic (FreeBSD|OpenBSD|DragonFly)/i[3-9]86 (compact )?demand paged shared library'
-      lt_cv_file_magic_cmd=$FILECMD
+      lt_cv_file_magic_cmd=/usr/bin/file
       lt_cv_file_magic_test_file=`echo /usr/lib/libc.so.*`
       ;;
     esac
@@ -7919,7 +7603,7 @@ haiku*)
   ;;
 
 hpux10.20* | hpux11*)
-  lt_cv_file_magic_cmd=$FILECMD
+  lt_cv_file_magic_cmd=/usr/bin/file
   case $host_cpu in
   ia64*)
     lt_cv_deplibs_check_method='file_magic (s[0-9][0-9][0-9]|ELF-[0-9][0-9]) shared object file - IA64'
@@ -7956,7 +7640,7 @@ linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*)
   lt_cv_deplibs_check_method=pass_all
   ;;
 
-netbsd*)
+netbsd* | netbsdelf*-gnu)
   if echo __ELF__ | $CC -E - | $GREP __ELF__ > /dev/null; then
     lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so\.[0-9]+\.[0-9]+|_pic\.a)$'
   else
@@ -7966,7 +7650,7 @@ netbsd*)
 
 newos6*)
   lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [ML]SB (executable|dynamic lib)'
-  lt_cv_file_magic_cmd=$FILECMD
+  lt_cv_file_magic_cmd=/usr/bin/file
   lt_cv_file_magic_test_file=/usr/lib/libnls.so
   ;;
 
@@ -7974,7 +7658,7 @@ newos6*)
   lt_cv_deplibs_check_method=pass_all
   ;;
 
-openbsd*)
+openbsd* | bitrig*)
   if test -z "`echo __ELF__ | $CC -E - | $GREP __ELF__`"; then
     lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so\.[0-9]+\.[0-9]+|\.so|_pic\.a)$'
   else
@@ -8032,8 +7716,7 @@ os2*)
   lt_cv_deplibs_check_method=pass_all
   ;;
 esac
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_deplibs_check_method" >&5
 printf "%s\n" "$lt_cv_deplibs_check_method" >&6; }
@@ -8042,7 +7725,7 @@ file_magic_glob=
 want_nocaseglob=no
 if test "$build" = "$host"; then
   case $host_os in
-  mingw* | windows* | pw32*)
+  mingw* | pw32*)
     if ( shopt | grep nocaseglob ) >/dev/null 2>&1; then
       want_nocaseglob=yes
     else
@@ -8085,8 +7768,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_DLLTOOL+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$DLLTOOL"; then
+else $as_nop
+  if test -n "$DLLTOOL"; then
   ac_cv_prog_DLLTOOL="$DLLTOOL" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -8108,8 +7791,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 DLLTOOL=$ac_cv_prog_DLLTOOL
 if test -n "$DLLTOOL"; then
@@ -8131,8 +7813,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_DLLTOOL+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_DLLTOOL"; then
+else $as_nop
+  if test -n "$ac_ct_DLLTOOL"; then
   ac_cv_prog_ac_ct_DLLTOOL="$ac_ct_DLLTOOL" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -8154,8 +7836,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_DLLTOOL=$ac_cv_prog_ac_ct_DLLTOOL
 if test -n "$ac_ct_DLLTOOL"; then
@@ -8194,11 +7875,11 @@ printf %s "checking how to associate runtime and link libraries... " >&6; }
 if test ${lt_cv_sharedlib_from_linklib_cmd+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_sharedlib_from_linklib_cmd='unknown'
+else $as_nop
+  lt_cv_sharedlib_from_linklib_cmd='unknown'
 
 case $host_os in
-cygwin* | mingw* | windows* | pw32* | cegcc*)
+cygwin* | mingw* | pw32* | cegcc*)
   # two different shell functions defined in ltmain.sh;
   # decide which one to use based on capabilities of $DLLTOOL
   case `$DLLTOOL --help 2>&1` in
@@ -8215,8 +7896,7 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
   lt_cv_sharedlib_from_linklib_cmd=$ECHO
   ;;
 esac
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_sharedlib_from_linklib_cmd" >&5
 printf "%s\n" "$lt_cv_sharedlib_from_linklib_cmd" >&6; }
@@ -8229,110 +7909,6 @@ test -z "$sharedlib_from_linklib_cmd" && sharedlib_from_linklib_cmd=$ECHO
 
 
 
-if test -n "$ac_tool_prefix"; then
-  # Extract the first word of "${ac_tool_prefix}ranlib", so it can be a program name with args.
-set dummy ${ac_tool_prefix}ranlib; ac_word=$2
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
-printf %s "checking for $ac_word... " >&6; }
-if test ${ac_cv_prog_RANLIB+y}
-then :
-  printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$RANLIB"; then
-  ac_cv_prog_RANLIB="$RANLIB" # Let the user override the test.
-else
-as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
-  IFS=$as_save_IFS
-  case $as_dir in #(((
-    '') as_dir=./ ;;
-    */) ;;
-    *) as_dir=$as_dir/ ;;
-  esac
-    for ac_exec_ext in '' $ac_executable_extensions; do
-  if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
-    ac_cv_prog_RANLIB="${ac_tool_prefix}ranlib"
-    printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
-    break 2
-  fi
-done
-  done
-IFS=$as_save_IFS
-
-fi ;;
-esac
-fi
-RANLIB=$ac_cv_prog_RANLIB
-if test -n "$RANLIB"; then
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $RANLIB" >&5
-printf "%s\n" "$RANLIB" >&6; }
-else
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-fi
-
-
-fi
-if test -z "$ac_cv_prog_RANLIB"; then
-  ac_ct_RANLIB=$RANLIB
-  # Extract the first word of "ranlib", so it can be a program name with args.
-set dummy ranlib; ac_word=$2
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
-printf %s "checking for $ac_word... " >&6; }
-if test ${ac_cv_prog_ac_ct_RANLIB+y}
-then :
-  printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_RANLIB"; then
-  ac_cv_prog_ac_ct_RANLIB="$ac_ct_RANLIB" # Let the user override the test.
-else
-as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
-  IFS=$as_save_IFS
-  case $as_dir in #(((
-    '') as_dir=./ ;;
-    */) ;;
-    *) as_dir=$as_dir/ ;;
-  esac
-    for ac_exec_ext in '' $ac_executable_extensions; do
-  if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
-    ac_cv_prog_ac_ct_RANLIB="ranlib"
-    printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
-    break 2
-  fi
-done
-  done
-IFS=$as_save_IFS
-
-fi ;;
-esac
-fi
-ac_ct_RANLIB=$ac_cv_prog_ac_ct_RANLIB
-if test -n "$ac_ct_RANLIB"; then
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_RANLIB" >&5
-printf "%s\n" "$ac_ct_RANLIB" >&6; }
-else
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-fi
-
-  if test "x$ac_ct_RANLIB" = x; then
-    RANLIB=":"
-  else
-    case $cross_compiling:$ac_tool_warned in
-yes:)
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5
-printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;}
-ac_tool_warned=yes ;;
-esac
-    RANLIB=$ac_ct_RANLIB
-  fi
-else
-  RANLIB="$ac_cv_prog_RANLIB"
-fi
-
 if test -n "$ac_tool_prefix"; then
   for ac_prog in ar
   do
@@ -8343,8 +7919,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_AR+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$AR"; then
+else $as_nop
+  if test -n "$AR"; then
   ac_cv_prog_AR="$AR" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -8366,8 +7942,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 AR=$ac_cv_prog_AR
 if test -n "$AR"; then
@@ -8393,8 +7968,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_AR+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_AR"; then
+else $as_nop
+  if test -n "$ac_ct_AR"; then
   ac_cv_prog_ac_ct_AR="$ac_ct_AR" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -8416,8 +7991,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_AR=$ac_cv_prog_ac_ct_AR
 if test -n "$ac_ct_AR"; then
@@ -8446,29 +8020,13 @@ esac
 fi
 
 : ${AR=ar}
+: ${AR_FLAGS=cr}
 
 
 
 
 
 
-# Use ARFLAGS variable as AR's operation code to sync the variable naming with
-# Automake.  If both AR_FLAGS and ARFLAGS are specified, AR_FLAGS should have
-# higher priority because that's what people were doing historically (setting
-# ARFLAGS for automake and AR_FLAGS for libtool).  FIXME: Make the AR_FLAGS
-# variable obsoleted/removed.
-
-test ${AR_FLAGS+y} || AR_FLAGS=${ARFLAGS-cr}
-lt_ar_flags=$AR_FLAGS
-
-
-
-
-
-
-# Make AR_FLAGS overridable by 'make ARFLAGS='.  Don't try to run-time override
-# by AR_FLAGS because that was never working and AR_FLAGS is about to die.
-
 
 
 
@@ -8479,8 +8037,8 @@ printf %s "checking for archiver @FILE support... " >&6; }
 if test ${lt_cv_ar_at_file+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_ar_at_file=no
+else $as_nop
+  lt_cv_ar_at_file=no
    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
@@ -8517,8 +8075,7 @@ then :
 
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-   ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_ar_at_file" >&5
 printf "%s\n" "$lt_cv_ar_at_file" >&6; }
@@ -8543,8 +8100,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_STRIP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$STRIP"; then
+else $as_nop
+  if test -n "$STRIP"; then
   ac_cv_prog_STRIP="$STRIP" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -8566,8 +8123,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 STRIP=$ac_cv_prog_STRIP
 if test -n "$STRIP"; then
@@ -8589,8 +8145,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_STRIP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_STRIP"; then
+else $as_nop
+  if test -n "$ac_ct_STRIP"; then
   ac_cv_prog_ac_ct_STRIP="$ac_ct_STRIP" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -8612,8 +8168,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_STRIP=$ac_cv_prog_ac_ct_STRIP
 if test -n "$ac_ct_STRIP"; then
@@ -8646,6 +8201,107 @@ test -z "$STRIP" && STRIP=:
 
 
 
+if test -n "$ac_tool_prefix"; then
+  # Extract the first word of "${ac_tool_prefix}ranlib", so it can be a program name with args.
+set dummy ${ac_tool_prefix}ranlib; ac_word=$2
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
+printf %s "checking for $ac_word... " >&6; }
+if test ${ac_cv_prog_RANLIB+y}
+then :
+  printf %s "(cached) " >&6
+else $as_nop
+  if test -n "$RANLIB"; then
+  ac_cv_prog_RANLIB="$RANLIB" # Let the user override the test.
+else
+as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH
+do
+  IFS=$as_save_IFS
+  case $as_dir in #(((
+    '') as_dir=./ ;;
+    */) ;;
+    *) as_dir=$as_dir/ ;;
+  esac
+    for ac_exec_ext in '' $ac_executable_extensions; do
+  if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
+    ac_cv_prog_RANLIB="${ac_tool_prefix}ranlib"
+    printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
+    break 2
+  fi
+done
+  done
+IFS=$as_save_IFS
+
+fi
+fi
+RANLIB=$ac_cv_prog_RANLIB
+if test -n "$RANLIB"; then
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $RANLIB" >&5
+printf "%s\n" "$RANLIB" >&6; }
+else
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
+printf "%s\n" "no" >&6; }
+fi
+
+
+fi
+if test -z "$ac_cv_prog_RANLIB"; then
+  ac_ct_RANLIB=$RANLIB
+  # Extract the first word of "ranlib", so it can be a program name with args.
+set dummy ranlib; ac_word=$2
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
+printf %s "checking for $ac_word... " >&6; }
+if test ${ac_cv_prog_ac_ct_RANLIB+y}
+then :
+  printf %s "(cached) " >&6
+else $as_nop
+  if test -n "$ac_ct_RANLIB"; then
+  ac_cv_prog_ac_ct_RANLIB="$ac_ct_RANLIB" # Let the user override the test.
+else
+as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH
+do
+  IFS=$as_save_IFS
+  case $as_dir in #(((
+    '') as_dir=./ ;;
+    */) ;;
+    *) as_dir=$as_dir/ ;;
+  esac
+    for ac_exec_ext in '' $ac_executable_extensions; do
+  if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
+    ac_cv_prog_ac_ct_RANLIB="ranlib"
+    printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5
+    break 2
+  fi
+done
+  done
+IFS=$as_save_IFS
+
+fi
+fi
+ac_ct_RANLIB=$ac_cv_prog_ac_ct_RANLIB
+if test -n "$ac_ct_RANLIB"; then
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_RANLIB" >&5
+printf "%s\n" "$ac_ct_RANLIB" >&6; }
+else
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
+printf "%s\n" "no" >&6; }
+fi
+
+  if test "x$ac_ct_RANLIB" = x; then
+    RANLIB=":"
+  else
+    case $cross_compiling:$ac_tool_warned in
+yes:)
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5
+printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;}
+ac_tool_warned=yes ;;
+esac
+    RANLIB=$ac_ct_RANLIB
+  fi
+else
+  RANLIB="$ac_cv_prog_RANLIB"
+fi
 
 test -z "$RANLIB" && RANLIB=:
 
@@ -8660,8 +8316,15 @@ old_postinstall_cmds='chmod 644 $oldlib'
 old_postuninstall_cmds=
 
 if test -n "$RANLIB"; then
+  case $host_os in
+  bitrig* | openbsd*)
+    old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB -t \$tool_oldlib"
+    ;;
+  *)
+    old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$tool_oldlib"
+    ;;
+  esac
   old_archive_cmds="$old_archive_cmds~\$RANLIB \$tool_oldlib"
-  old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$tool_oldlib"
 fi
 
 case $host_os in
@@ -8725,8 +8388,8 @@ printf %s "checking command to parse $NM output from $compiler object... " >&6;
 if test ${lt_cv_sys_global_symbol_pipe+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)
+else $as_nop
+
 # These are sane defaults that work on at least a few old systems.
 # [They come from Ultrix.  What could be older than Ultrix?!! ;)]
 
@@ -8741,7 +8404,7 @@ case $host_os in
 aix*)
   symcode='[BCDT]'
   ;;
-cygwin* | mingw* | windows* | pw32* | cegcc*)
+cygwin* | mingw* | pw32* | cegcc*)
   symcode='[ABCDGISTW]'
   ;;
 hpux*)
@@ -8756,7 +8419,7 @@ osf*)
   symcode='[BCDEGQRST]'
   ;;
 solaris*)
-  symcode='[BCDRT]'
+  symcode='[BDRT]'
   ;;
 sco3.2v5*)
   symcode='[DT]'
@@ -8780,7 +8443,7 @@ esac
 
 if test "$lt_cv_nm_interface" = "MS dumpbin"; then
   # Gets list of data symbols to import.
-  lt_cv_sys_global_symbol_to_import="$SED -n -e 's/^I .* \(.*\)$/\1/p'"
+  lt_cv_sys_global_symbol_to_import="sed -n -e 's/^I .* \(.*\)$/\1/p'"
   # Adjust the below global symbol transforms to fixup imported variables.
   lt_cdecl_hook=" -e 's/^I .* \(.*\)$/extern __declspec(dllimport) char \1;/p'"
   lt_c_name_hook=" -e 's/^I .* \(.*\)$/  {\"\1\", (void *) 0},/p'"
@@ -8798,20 +8461,20 @@ fi
 # Transform an extracted symbol line into a proper C declaration.
 # Some systems (esp. on ia64) link data and code symbols differently,
 # so use this general approach.
-lt_cv_sys_global_symbol_to_cdecl="$SED -n"\
+lt_cv_sys_global_symbol_to_cdecl="sed -n"\
 $lt_cdecl_hook\
 " -e 's/^T .* \(.*\)$/extern int \1();/p'"\
 " -e 's/^$symcode$symcode* .* \(.*\)$/extern char \1;/p'"
 
 # Transform an extracted symbol line into symbol name and symbol address
-lt_cv_sys_global_symbol_to_c_name_address="$SED -n"\
+lt_cv_sys_global_symbol_to_c_name_address="sed -n"\
 $lt_c_name_hook\
 " -e 's/^: \(.*\) .*$/  {\"\1\", (void *) 0},/p'"\
 " -e 's/^$symcode$symcode* .* \(.*\)$/  {\"\1\", (void *) \&\1},/p'"
 
 # Transform an extracted symbol line into symbol name with lib prefix and
 # symbol address.
-lt_cv_sys_global_symbol_to_c_name_address_lib_prefix="$SED -n"\
+lt_cv_sys_global_symbol_to_c_name_address_lib_prefix="sed -n"\
 $lt_c_name_lib_hook\
 " -e 's/^: \(.*\) .*$/  {\"\1\", (void *) 0},/p'"\
 " -e 's/^$symcode$symcode* .* \(lib.*\)$/  {\"\1\", (void *) \&\1},/p'"\
@@ -8820,7 +8483,7 @@ $lt_c_name_lib_hook\
 # Handle CRLF in mingw tool chain
 opt_cr=
 case $build_os in
-mingw* | windows*)
+mingw*)
   opt_cr=`$ECHO 'x\{0,1\}' | tr x '\015'` # option cr in regexp
   ;;
 esac
@@ -8835,7 +8498,7 @@ for ac_symprfx in "" "_"; do
   if test "$lt_cv_nm_interface" = "MS dumpbin"; then
     # Fake it for dumpbin and say T for any non-static function,
     # D for any global variable and I for any imported variable.
-    # Also find C++ and __fastcall symbols from MSVC++ or ICC,
+    # Also find C++ and __fastcall symbols from MSVC++,
     # which start with @ or ?.
     lt_cv_sys_global_symbol_pipe="$AWK '"\
 "     {last_section=section; section=\$ 3};"\
@@ -8853,9 +8516,9 @@ for ac_symprfx in "" "_"; do
 "     s[1]~prfx {split(s[1],t,\"@\"); print f,t[1],substr(t[1],length(prfx))}"\
 "     ' prfx=^$ac_symprfx"
   else
-    lt_cv_sys_global_symbol_pipe="$SED -n -e 's/^.*[	 ]\($symcode$symcode*\)[	 ][	 ]*$ac_symprfx$sympat$opt_cr$/$symxfrm/p'"
+    lt_cv_sys_global_symbol_pipe="sed -n -e 's/^.*[	 ]\($symcode$symcode*\)[	 ][	 ]*$ac_symprfx$sympat$opt_cr$/$symxfrm/p'"
   fi
-  lt_cv_sys_global_symbol_pipe="$lt_cv_sys_global_symbol_pipe | $SED '/ __gnu_lto/d'"
+  lt_cv_sys_global_symbol_pipe="$lt_cv_sys_global_symbol_pipe | sed '/ __gnu_lto/d'"
 
   # Check to see that the pipe works correctly.
   pipe_works=no
@@ -8871,7 +8534,7 @@ void nm_test_func(void){}
 #ifdef __cplusplus
 }
 #endif
-int main(void){nm_test_var='a';nm_test_func();return(0);}
+int main(){nm_test_var='a';nm_test_func();return(0);}
 _LT_EOF
 
   if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5
@@ -8881,11 +8544,8 @@ _LT_EOF
   test $ac_status = 0; }; then
     # Now try to grab the symbols.
     nlist=conftest.nm
-    if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$NM conftest.$ac_objext \| "$lt_cv_sys_global_symbol_pipe" \> $nlist\""; } >&5
-  (eval $NM conftest.$ac_objext \| "$lt_cv_sys_global_symbol_pipe" \> $nlist) 2>&5
-  ac_status=$?
-  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; } && test -s "$nlist"; then
+    $ECHO "$as_me:$LINENO: $NM conftest.$ac_objext | $lt_cv_sys_global_symbol_pipe > $nlist" >&5
+    if eval "$NM" conftest.$ac_objext \| "$lt_cv_sys_global_symbol_pipe" \> $nlist 2>&5 && test -s "$nlist"; then
       # Try sorting and uniquifying the output.
       if sort "$nlist" | uniq > "$nlist"T; then
 	mv -f "$nlist"T "$nlist"
@@ -8981,8 +8641,7 @@ _LT_EOF
     lt_cv_sys_global_symbol_pipe=
   fi
 done
- ;;
-esac
+
 fi
 
 if test -z "$lt_cv_sys_global_symbol_pipe"; then
@@ -9046,9 +8705,8 @@ printf %s "checking for sysroot... " >&6; }
 if test ${with_sysroot+y}
 then :
   withval=$with_sysroot;
-else case e in #(
-  e) with_sysroot=no ;;
-esac
+else $as_nop
+  with_sysroot=no
 fi
 
 
@@ -9056,13 +8714,11 @@ lt_sysroot=
 case $with_sysroot in #(
  yes)
    if test yes = "$GCC"; then
-     # Trim trailing / since we'll always append absolute paths and we want
-     # to avoid //, if only for less confusing output for the user.
-     lt_sysroot=`$CC --print-sysroot 2>/dev/null | $SED 's:/\+$::'`
+     lt_sysroot=`$CC --print-sysroot 2>/dev/null`
    fi
    ;; #(
  /*)
-   lt_sysroot=`echo "$with_sysroot" | $SED -e "$sed_quote_subst"`
+   lt_sysroot=`echo "$with_sysroot" | sed -e "$sed_quote_subst"`
    ;; #(
  no|'')
    ;; #(
@@ -9085,8 +8741,8 @@ printf %s "checking for a working dd... " >&6; }
 if test ${ac_cv_path_lt_DD+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) printf 0123456789abcdef0123456789abcdef >conftest.i
+else $as_nop
+  printf 0123456789abcdef0123456789abcdef >conftest.i
 cat conftest.i conftest.i >conftest2.i
 : ${lt_DD:=$DD}
 if test -z "$lt_DD"; then
@@ -9122,8 +8778,7 @@ else
   ac_cv_path_lt_DD=$lt_DD
 fi
 
-rm -f conftest.i conftest2.i conftest.out ;;
-esac
+rm -f conftest.i conftest2.i conftest.out
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_lt_DD" >&5
 printf "%s\n" "$ac_cv_path_lt_DD" >&6; }
@@ -9134,8 +8789,8 @@ printf %s "checking how to truncate binary pipes... " >&6; }
 if test ${lt_cv_truncate_bin+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) printf 0123456789abcdef0123456789abcdef >conftest.i
+else $as_nop
+  printf 0123456789abcdef0123456789abcdef >conftest.i
 cat conftest.i conftest.i >conftest2.i
 lt_cv_truncate_bin=
 if "$ac_cv_path_lt_DD" bs=32 count=1 <conftest2.i >conftest.out 2>/dev/null; then
@@ -9143,8 +8798,7 @@ if "$ac_cv_path_lt_DD" bs=32 count=1 <conftest2.i >conftest.out 2>/dev/null; the
   && lt_cv_truncate_bin="$ac_cv_path_lt_DD bs=4096 count=1"
 fi
 rm -f conftest.i conftest2.i conftest.out
-test -z "$lt_cv_truncate_bin" && lt_cv_truncate_bin="$SED -e 4q" ;;
-esac
+test -z "$lt_cv_truncate_bin" && lt_cv_truncate_bin="$SED -e 4q"
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_truncate_bin" >&5
 printf "%s\n" "$lt_cv_truncate_bin" >&6; }
@@ -9189,7 +8843,7 @@ ia64-*-hpux*)
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }; then
-    case `$FILECMD conftest.$ac_objext` in
+    case `/usr/bin/file conftest.$ac_objext` in
       *ELF-32*)
 	HPUX_IA64_MODE=32
 	;;
@@ -9210,7 +8864,7 @@ ia64-*-hpux*)
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }; then
     if test yes = "$lt_cv_prog_gnu_ld"; then
-      case `$FILECMD conftest.$ac_objext` in
+      case `/usr/bin/file conftest.$ac_objext` in
 	*32-bit*)
 	  LD="${LD-ld} -melf32bsmip"
 	  ;;
@@ -9222,7 +8876,7 @@ ia64-*-hpux*)
 	;;
       esac
     else
-      case `$FILECMD conftest.$ac_objext` in
+      case `/usr/bin/file conftest.$ac_objext` in
 	*32-bit*)
 	  LD="${LD-ld} -32"
 	  ;;
@@ -9248,7 +8902,7 @@ mips64*-*linux*)
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }; then
     emul=elf
-    case `$FILECMD conftest.$ac_objext` in
+    case `/usr/bin/file conftest.$ac_objext` in
       *32-bit*)
 	emul="${emul}32"
 	;;
@@ -9256,7 +8910,7 @@ mips64*-*linux*)
 	emul="${emul}64"
 	;;
     esac
-    case `$FILECMD conftest.$ac_objext` in
+    case `/usr/bin/file conftest.$ac_objext` in
       *MSB*)
 	emul="${emul}btsmip"
 	;;
@@ -9264,7 +8918,7 @@ mips64*-*linux*)
 	emul="${emul}ltsmip"
 	;;
     esac
-    case `$FILECMD conftest.$ac_objext` in
+    case `/usr/bin/file conftest.$ac_objext` in
       *N32*)
 	emul="${emul}n32"
 	;;
@@ -9275,7 +8929,7 @@ mips64*-*linux*)
   ;;
 
 x86_64-*kfreebsd*-gnu|x86_64-*linux*|powerpc*-*linux*| \
-s390*-*linux*|s390*-*tpf*|sparc*-*linux*|x86_64-gnu*)
+s390*-*linux*|s390*-*tpf*|sparc*-*linux*)
   # Find out what ABI is being produced by ac_compile, and set linker
   # options accordingly.  Note that the listed cases only cover the
   # situations where additional linker options are needed (such as when
@@ -9288,14 +8942,14 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux*|x86_64-gnu*)
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }; then
-    case `$FILECMD conftest.o` in
+    case `/usr/bin/file conftest.o` in
       *32-bit*)
 	case $host in
 	  x86_64-*kfreebsd*-gnu)
 	    LD="${LD-ld} -m elf_i386_fbsd"
 	    ;;
-	  x86_64-*linux*|x86_64-gnu*)
-	    case `$FILECMD conftest.o` in
+	  x86_64-*linux*)
+	    case `/usr/bin/file conftest.o` in
 	      *x86-64*)
 		LD="${LD-ld} -m elf32_x86_64"
 		;;
@@ -9323,7 +8977,7 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux*|x86_64-gnu*)
 	  x86_64-*kfreebsd*-gnu)
 	    LD="${LD-ld} -m elf_x86_64_fbsd"
 	    ;;
-	  x86_64-*linux*|x86_64-gnu*)
+	  x86_64-*linux*)
 	    LD="${LD-ld} -m elf_x86_64"
 	    ;;
 	  powerpcle-*linux*)
@@ -9354,8 +9008,8 @@ printf %s "checking whether the C compiler needs -belf... " >&6; }
 if test ${lt_cv_cc_needs_belf+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_ext=c
+else $as_nop
+  ac_ext=c
 ac_cpp='$CPP $CPPFLAGS'
 ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
 ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
@@ -9375,9 +9029,8 @@ _ACEOF
 if ac_fn_c_try_link "$LINENO"
 then :
   lt_cv_cc_needs_belf=yes
-else case e in #(
-  e) lt_cv_cc_needs_belf=no ;;
-esac
+else $as_nop
+  lt_cv_cc_needs_belf=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
@@ -9386,8 +9039,7 @@ ac_cpp='$CPP $CPPFLAGS'
 ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
 ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
 ac_compiler_gnu=$ac_cv_c_compiler_gnu
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_cc_needs_belf" >&5
 printf "%s\n" "$lt_cv_cc_needs_belf" >&6; }
@@ -9405,7 +9057,7 @@ printf "%s\n" "$lt_cv_cc_needs_belf" >&6; }
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }; then
-    case `$FILECMD conftest.o` in
+    case `/usr/bin/file conftest.o` in
     *64-bit*)
       case $lt_cv_prog_gnu_ld in
       yes*)
@@ -9445,8 +9097,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_MANIFEST_TOOL+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$MANIFEST_TOOL"; then
+else $as_nop
+  if test -n "$MANIFEST_TOOL"; then
   ac_cv_prog_MANIFEST_TOOL="$MANIFEST_TOOL" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -9468,8 +9120,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 MANIFEST_TOOL=$ac_cv_prog_MANIFEST_TOOL
 if test -n "$MANIFEST_TOOL"; then
@@ -9491,8 +9142,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_MANIFEST_TOOL+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_MANIFEST_TOOL"; then
+else $as_nop
+  if test -n "$ac_ct_MANIFEST_TOOL"; then
   ac_cv_prog_ac_ct_MANIFEST_TOOL="$ac_ct_MANIFEST_TOOL" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -9514,8 +9165,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_MANIFEST_TOOL=$ac_cv_prog_ac_ct_MANIFEST_TOOL
 if test -n "$ac_ct_MANIFEST_TOOL"; then
@@ -9544,23 +9194,22 @@ fi
 test -z "$MANIFEST_TOOL" && MANIFEST_TOOL=mt
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if $MANIFEST_TOOL is a manifest tool" >&5
 printf %s "checking if $MANIFEST_TOOL is a manifest tool... " >&6; }
-if test ${lt_cv_path_manifest_tool+y}
+if test ${lt_cv_path_mainfest_tool+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_path_manifest_tool=no
+else $as_nop
+  lt_cv_path_mainfest_tool=no
   echo "$as_me:$LINENO: $MANIFEST_TOOL '-?'" >&5
   $MANIFEST_TOOL '-?' 2>conftest.err > conftest.out
   cat conftest.err >&5
   if $GREP 'Manifest Tool' conftest.out > /dev/null; then
-    lt_cv_path_manifest_tool=yes
+    lt_cv_path_mainfest_tool=yes
   fi
-  rm -f conftest* ;;
-esac
+  rm -f conftest*
 fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_path_manifest_tool" >&5
-printf "%s\n" "$lt_cv_path_manifest_tool" >&6; }
-if test yes != "$lt_cv_path_manifest_tool"; then
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_path_mainfest_tool" >&5
+printf "%s\n" "$lt_cv_path_mainfest_tool" >&6; }
+if test yes != "$lt_cv_path_mainfest_tool"; then
   MANIFEST_TOOL=:
 fi
 
@@ -9579,8 +9228,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_DSYMUTIL+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$DSYMUTIL"; then
+else $as_nop
+  if test -n "$DSYMUTIL"; then
   ac_cv_prog_DSYMUTIL="$DSYMUTIL" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -9602,8 +9251,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 DSYMUTIL=$ac_cv_prog_DSYMUTIL
 if test -n "$DSYMUTIL"; then
@@ -9625,8 +9273,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_DSYMUTIL+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_DSYMUTIL"; then
+else $as_nop
+  if test -n "$ac_ct_DSYMUTIL"; then
   ac_cv_prog_ac_ct_DSYMUTIL="$ac_ct_DSYMUTIL" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -9648,8 +9296,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_DSYMUTIL=$ac_cv_prog_ac_ct_DSYMUTIL
 if test -n "$ac_ct_DSYMUTIL"; then
@@ -9683,8 +9330,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_NMEDIT+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$NMEDIT"; then
+else $as_nop
+  if test -n "$NMEDIT"; then
   ac_cv_prog_NMEDIT="$NMEDIT" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -9706,8 +9353,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 NMEDIT=$ac_cv_prog_NMEDIT
 if test -n "$NMEDIT"; then
@@ -9729,8 +9375,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_NMEDIT+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_NMEDIT"; then
+else $as_nop
+  if test -n "$ac_ct_NMEDIT"; then
   ac_cv_prog_ac_ct_NMEDIT="$ac_ct_NMEDIT" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -9752,8 +9398,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_NMEDIT=$ac_cv_prog_ac_ct_NMEDIT
 if test -n "$ac_ct_NMEDIT"; then
@@ -9787,8 +9432,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_LIPO+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$LIPO"; then
+else $as_nop
+  if test -n "$LIPO"; then
   ac_cv_prog_LIPO="$LIPO" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -9810,8 +9455,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 LIPO=$ac_cv_prog_LIPO
 if test -n "$LIPO"; then
@@ -9833,8 +9477,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_LIPO+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_LIPO"; then
+else $as_nop
+  if test -n "$ac_ct_LIPO"; then
   ac_cv_prog_ac_ct_LIPO="$ac_ct_LIPO" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -9856,8 +9500,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_LIPO=$ac_cv_prog_ac_ct_LIPO
 if test -n "$ac_ct_LIPO"; then
@@ -9891,8 +9534,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_OTOOL+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$OTOOL"; then
+else $as_nop
+  if test -n "$OTOOL"; then
   ac_cv_prog_OTOOL="$OTOOL" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -9914,8 +9557,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 OTOOL=$ac_cv_prog_OTOOL
 if test -n "$OTOOL"; then
@@ -9937,8 +9579,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_OTOOL+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_OTOOL"; then
+else $as_nop
+  if test -n "$ac_ct_OTOOL"; then
   ac_cv_prog_ac_ct_OTOOL="$ac_ct_OTOOL" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -9960,8 +9602,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_OTOOL=$ac_cv_prog_ac_ct_OTOOL
 if test -n "$ac_ct_OTOOL"; then
@@ -9995,8 +9636,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_OTOOL64+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$OTOOL64"; then
+else $as_nop
+  if test -n "$OTOOL64"; then
   ac_cv_prog_OTOOL64="$OTOOL64" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -10018,8 +9659,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 OTOOL64=$ac_cv_prog_OTOOL64
 if test -n "$OTOOL64"; then
@@ -10041,8 +9681,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_OTOOL64+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_OTOOL64"; then
+else $as_nop
+  if test -n "$ac_ct_OTOOL64"; then
   ac_cv_prog_ac_ct_OTOOL64="$ac_ct_OTOOL64" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -10064,8 +9704,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_OTOOL64=$ac_cv_prog_ac_ct_OTOOL64
 if test -n "$ac_ct_OTOOL64"; then
@@ -10122,8 +9761,8 @@ printf %s "checking for -single_module linker flag... " >&6; }
 if test ${lt_cv_apple_cc_single_mod+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_apple_cc_single_mod=no
+else $as_nop
+  lt_cv_apple_cc_single_mod=no
       if test -z "$LT_MULTI_MODULE"; then
 	# By default we will add the -single_module flag. You can override
 	# by either setting the environment variable LT_MULTI_MODULE
@@ -10149,58 +9788,18 @@ else case e in #(
 	fi
 	rm -rf libconftest.dylib*
 	rm -f conftest.*
-      fi ;;
-esac
+      fi
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_apple_cc_single_mod" >&5
 printf "%s\n" "$lt_cv_apple_cc_single_mod" >&6; }
 
-    # Feature test to disable chained fixups since it is not
-    # compatible with '-undefined dynamic_lookup'
-    { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for -no_fixup_chains linker flag" >&5
-printf %s "checking for -no_fixup_chains linker flag... " >&6; }
-if test ${lt_cv_support_no_fixup_chains+y}
-then :
-  printf %s "(cached) " >&6
-else case e in #(
-  e)  save_LDFLAGS=$LDFLAGS
-        LDFLAGS="$LDFLAGS -Wl,-no_fixup_chains"
-        cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-
-int
-main (void)
-{
-
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"
-then :
-  lt_cv_support_no_fixup_chains=yes
-else case e in #(
-  e) lt_cv_support_no_fixup_chains=no
-         ;;
-esac
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam \
-    conftest$ac_exeext conftest.$ac_ext
-        LDFLAGS=$save_LDFLAGS
-
-     ;;
-esac
-fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_support_no_fixup_chains" >&5
-printf "%s\n" "$lt_cv_support_no_fixup_chains" >&6; }
-
     { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for -exported_symbols_list linker flag" >&5
 printf %s "checking for -exported_symbols_list linker flag... " >&6; }
 if test ${lt_cv_ld_exported_symbols_list+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_ld_exported_symbols_list=no
+else $as_nop
+  lt_cv_ld_exported_symbols_list=no
       save_LDFLAGS=$LDFLAGS
       echo "_main" > conftest.sym
       LDFLAGS="$LDFLAGS -Wl,-exported_symbols_list,conftest.sym"
@@ -10218,15 +9817,13 @@ _ACEOF
 if ac_fn_c_try_link "$LINENO"
 then :
   lt_cv_ld_exported_symbols_list=yes
-else case e in #(
-  e) lt_cv_ld_exported_symbols_list=no ;;
-esac
+else $as_nop
+  lt_cv_ld_exported_symbols_list=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
 	LDFLAGS=$save_LDFLAGS
-     ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_ld_exported_symbols_list" >&5
 printf "%s\n" "$lt_cv_ld_exported_symbols_list" >&6; }
@@ -10236,19 +9833,19 @@ printf %s "checking for -force_load linker flag... " >&6; }
 if test ${lt_cv_ld_force_load+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_ld_force_load=no
+else $as_nop
+  lt_cv_ld_force_load=no
       cat > conftest.c << _LT_EOF
 int forced_loaded() { return 2;}
 _LT_EOF
       echo "$LTCC $LTCFLAGS -c -o conftest.o conftest.c" >&5
       $LTCC $LTCFLAGS -c -o conftest.o conftest.c 2>&5
-      echo "$AR $AR_FLAGS libconftest.a conftest.o" >&5
-      $AR $AR_FLAGS libconftest.a conftest.o 2>&5
+      echo "$AR cr libconftest.a conftest.o" >&5
+      $AR cr libconftest.a conftest.o 2>&5
       echo "$RANLIB libconftest.a" >&5
       $RANLIB libconftest.a 2>&5
       cat > conftest.c << _LT_EOF
-int main(void) { return 0;}
+int main() { return 0;}
 _LT_EOF
       echo "$LTCC $LTCFLAGS $LDFLAGS -o conftest conftest.c -Wl,-force_load,./libconftest.a" >&5
       $LTCC $LTCFLAGS $LDFLAGS -o conftest conftest.c -Wl,-force_load,./libconftest.a 2>conftest.err
@@ -10262,8 +9859,7 @@ _LT_EOF
       fi
         rm -f conftest.err libconftest.a conftest conftest.c
         rm -rf conftest.dSYM
-     ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_ld_force_load" >&5
 printf "%s\n" "$lt_cv_ld_force_load" >&6; }
@@ -10272,16 +9868,17 @@ printf "%s\n" "$lt_cv_ld_force_load" >&6; }
       _lt_dar_allow_undefined='$wl-undefined ${wl}suppress' ;;
     darwin1.*)
       _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;;
-    darwin*)
-      case $MACOSX_DEPLOYMENT_TARGET,$host in
-        10.[012],*|,*powerpc*-darwin[5-8]*)
-          _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;;
-        *)
-          _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup'
-          if test yes = "$lt_cv_support_no_fixup_chains"; then
-            as_fn_append _lt_dar_allow_undefined ' $wl-no_fixup_chains'
-          fi
-        ;;
+    darwin*) # darwin 5.x on
+      # if running on 10.5 or later, the deployment target defaults
+      # to the OS version, if on x86, and 10.4, the deployment
+      # target defaults to 10.4. Don't you love it?
+      case ${MACOSX_DEPLOYMENT_TARGET-10.0},$host in
+	10.0,*86*-darwin8*|10.0,*-darwin[912]*)
+	  _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup' ;;
+	10.[012][,.]*)
+	  _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;;
+	10.*|11.*)
+	  _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup' ;;
       esac
     ;;
   esac
@@ -10362,7 +9959,7 @@ func_stripname_cnf ()
 enable_win32_dll=yes
 
 case $host in
-*-*-cygwin* | *-*-mingw* | *-*-windows* | *-*-pw32* | *-*-cegcc*)
+*-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-cegcc*)
   if test -n "$ac_tool_prefix"; then
   # Extract the first word of "${ac_tool_prefix}as", so it can be a program name with args.
 set dummy ${ac_tool_prefix}as; ac_word=$2
@@ -10371,8 +9968,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_AS+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$AS"; then
+else $as_nop
+  if test -n "$AS"; then
   ac_cv_prog_AS="$AS" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -10394,8 +9991,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 AS=$ac_cv_prog_AS
 if test -n "$AS"; then
@@ -10417,8 +10013,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_AS+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_AS"; then
+else $as_nop
+  if test -n "$ac_ct_AS"; then
   ac_cv_prog_ac_ct_AS="$ac_ct_AS" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -10440,8 +10036,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_AS=$ac_cv_prog_ac_ct_AS
 if test -n "$ac_ct_AS"; then
@@ -10475,8 +10070,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_DLLTOOL+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$DLLTOOL"; then
+else $as_nop
+  if test -n "$DLLTOOL"; then
   ac_cv_prog_DLLTOOL="$DLLTOOL" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -10498,8 +10093,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 DLLTOOL=$ac_cv_prog_DLLTOOL
 if test -n "$DLLTOOL"; then
@@ -10521,8 +10115,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_DLLTOOL+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_DLLTOOL"; then
+else $as_nop
+  if test -n "$ac_ct_DLLTOOL"; then
   ac_cv_prog_ac_ct_DLLTOOL="$ac_ct_DLLTOOL" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -10544,8 +10138,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_DLLTOOL=$ac_cv_prog_ac_ct_DLLTOOL
 if test -n "$ac_ct_DLLTOOL"; then
@@ -10579,8 +10172,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_OBJDUMP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$OBJDUMP"; then
+else $as_nop
+  if test -n "$OBJDUMP"; then
   ac_cv_prog_OBJDUMP="$OBJDUMP" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -10602,8 +10195,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 OBJDUMP=$ac_cv_prog_OBJDUMP
 if test -n "$OBJDUMP"; then
@@ -10625,8 +10217,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_OBJDUMP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_OBJDUMP"; then
+else $as_nop
+  if test -n "$ac_ct_OBJDUMP"; then
   ac_cv_prog_ac_ct_OBJDUMP="$ac_ct_OBJDUMP" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -10648,8 +10240,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_OBJDUMP=$ac_cv_prog_ac_ct_OBJDUMP
 if test -n "$ac_ct_OBJDUMP"; then
@@ -10722,9 +10313,8 @@ then :
       IFS=$lt_save_ifs
       ;;
     esac
-else case e in #(
-  e) enable_shared=yes ;;
-esac
+else $as_nop
+  enable_shared=yes
 fi
 
 
@@ -10755,9 +10345,8 @@ then :
       IFS=$lt_save_ifs
       ;;
     esac
-else case e in #(
-  e) enable_static=yes ;;
-esac
+else $as_nop
+  enable_static=yes
 fi
 
 
@@ -10768,52 +10357,28 @@ fi
 
 
 
-  # Check whether --enable-pic was given.
-if test ${enable_pic+y}
-then :
-  enableval=$enable_pic; lt_p=${PACKAGE-default}
-     case $enableval in
-     yes|no) pic_mode=$enableval ;;
-     *)
-       pic_mode=default
-       # Look at the argument we got.  We use all the common list separators.
-       lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR,
-       for lt_pkg in $enableval; do
-	 IFS=$lt_save_ifs
-	 if test "X$lt_pkg" = "X$lt_p"; then
-	   pic_mode=yes
-	 fi
-       done
-       IFS=$lt_save_ifs
-       ;;
-     esac
-else case e in #(
-  e)           # Check whether --with-pic was given.
+
+# Check whether --with-pic was given.
 if test ${with_pic+y}
 then :
   withval=$with_pic; lt_p=${PACKAGE-default}
-	 case $withval in
-	 yes|no) pic_mode=$withval ;;
-	 *)
-	   pic_mode=default
-	   # Look at the argument we got.  We use all the common list separators.
-	   lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR,
-	   for lt_pkg in $withval; do
-	     IFS=$lt_save_ifs
-	     if test "X$lt_pkg" = "X$lt_p"; then
-	       pic_mode=yes
-	     fi
-	   done
-	   IFS=$lt_save_ifs
-	   ;;
-	 esac
-else case e in #(
-  e) pic_mode=default ;;
-esac
-fi
-
-     ;;
-esac
+    case $withval in
+    yes|no) pic_mode=$withval ;;
+    *)
+      pic_mode=default
+      # Look at the argument we got.  We use all the common list separators.
+      lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR,
+      for lt_pkg in $withval; do
+	IFS=$lt_save_ifs
+	if test "X$lt_pkg" = "X$lt_p"; then
+	  pic_mode=yes
+	fi
+      done
+      IFS=$lt_save_ifs
+      ;;
+    esac
+else $as_nop
+  pic_mode=default
 fi
 
 
@@ -10843,9 +10408,8 @@ then :
       IFS=$lt_save_ifs
       ;;
     esac
-else case e in #(
-  e) enable_fast_install=yes ;;
-esac
+else $as_nop
+  enable_fast_install=yes
 fi
 
 
@@ -10860,46 +10424,29 @@ case $host,$enable_shared in
 power*-*-aix[5-9]*,yes)
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking which variant of shared library versioning to provide" >&5
 printf %s "checking which variant of shared library versioning to provide... " >&6; }
-  # Check whether --enable-aix-soname was given.
-if test ${enable_aix_soname+y}
-then :
-  enableval=$enable_aix_soname; case $enableval in
-     aix|svr4|both)
-       ;;
-     *)
-       as_fn_error $? "Unknown argument to --enable-aix-soname" "$LINENO" 5
-       ;;
-     esac
-     lt_cv_with_aix_soname=$enable_aix_soname
-else case e in #(
-  e) # Check whether --with-aix-soname was given.
+
+# Check whether --with-aix-soname was given.
 if test ${with_aix_soname+y}
 then :
   withval=$with_aix_soname; case $withval in
-         aix|svr4|both)
-           ;;
-         *)
-           as_fn_error $? "Unknown argument to --with-aix-soname" "$LINENO" 5
-           ;;
-         esac
-         lt_cv_with_aix_soname=$with_aix_soname
-else case e in #(
-  e) if test ${lt_cv_with_aix_soname+y}
+    aix|svr4|both)
+      ;;
+    *)
+      as_fn_error $? "Unknown argument to --with-aix-soname" "$LINENO" 5
+      ;;
+    esac
+    lt_cv_with_aix_soname=$with_aix_soname
+else $as_nop
+  if test ${lt_cv_with_aix_soname+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_with_aix_soname=aix ;;
-esac
-fi
- ;;
-esac
+else $as_nop
+  lt_cv_with_aix_soname=aix
 fi
 
-     enable_aix_soname=$lt_cv_with_aix_soname ;;
-esac
+    with_aix_soname=$lt_cv_with_aix_soname
 fi
 
-  with_aix_soname=$enable_aix_soname
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $with_aix_soname" >&5
 printf "%s\n" "$with_aix_soname" >&6; }
   if test aix != "$with_aix_soname"; then
@@ -10988,8 +10535,8 @@ printf %s "checking for objdir... " >&6; }
 if test ${lt_cv_objdir+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) rm -f .libs 2>/dev/null
+else $as_nop
+  rm -f .libs 2>/dev/null
 mkdir .libs 2>/dev/null
 if test -d .libs; then
   lt_cv_objdir=.libs
@@ -10997,8 +10544,7 @@ else
   # MS-DOS does not allow filenames that begin with a dot.
   lt_cv_objdir=_libs
 fi
-rmdir .libs 2>/dev/null ;;
-esac
+rmdir .libs 2>/dev/null
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_objdir" >&5
 printf "%s\n" "$lt_cv_objdir" >&6; }
@@ -11029,8 +10575,8 @@ esac
 ofile=libtool
 can_build_shared=yes
 
-# All known linkers require a '.a' archive for static linking (except MSVC and
-# ICC, which need '.lib').
+# All known linkers require a '.a' archive for static linking (except MSVC,
+# which needs '.lib').
 libext=a
 
 with_gnu_ld=$lt_cv_prog_gnu_ld
@@ -11059,8 +10605,8 @@ printf %s "checking for ${ac_tool_prefix}file... " >&6; }
 if test ${lt_cv_path_MAGIC_CMD+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) case $MAGIC_CMD in
+else $as_nop
+  case $MAGIC_CMD in
 [\\/*] |  ?:[\\/]*)
   lt_cv_path_MAGIC_CMD=$MAGIC_CMD # Let the user override the test with a path.
   ;;
@@ -11103,7 +10649,6 @@ _LT_EOF
   IFS=$lt_save_ifs
   MAGIC_CMD=$lt_save_MAGIC_CMD
   ;;
-esac ;;
 esac
 fi
 
@@ -11127,8 +10672,8 @@ printf %s "checking for file... " >&6; }
 if test ${lt_cv_path_MAGIC_CMD+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) case $MAGIC_CMD in
+else $as_nop
+  case $MAGIC_CMD in
 [\\/*] |  ?:[\\/]*)
   lt_cv_path_MAGIC_CMD=$MAGIC_CMD # Let the user override the test with a path.
   ;;
@@ -11171,7 +10716,6 @@ _LT_EOF
   IFS=$lt_save_ifs
   MAGIC_CMD=$lt_save_MAGIC_CMD
   ;;
-esac ;;
 esac
 fi
 
@@ -11215,7 +10759,7 @@ objext=$objext
 lt_simple_compile_test_code="int some_variable = 0;"
 
 # Code to be used in simple link tests
-lt_simple_link_test_code='int main(void){return(0);}'
+lt_simple_link_test_code='int main(){return(0);}'
 
 
 
@@ -11271,8 +10815,8 @@ printf %s "checking if $compiler supports -fno-rtti -fno-exceptions... " >&6; }
 if test ${lt_cv_prog_compiler_rtti_exceptions+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_prog_compiler_rtti_exceptions=no
+else $as_nop
+  lt_cv_prog_compiler_rtti_exceptions=no
    ac_outfile=conftest.$ac_objext
    echo "$lt_simple_compile_test_code" > conftest.$ac_ext
    lt_compiler_flag="-fno-rtti -fno-exceptions"  ## exclude from sc_useless_quotes_in_assignment
@@ -11300,8 +10844,7 @@ else case e in #(
      fi
    fi
    $RM conftest*
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_rtti_exceptions" >&5
 printf "%s\n" "$lt_cv_prog_compiler_rtti_exceptions" >&6; }
@@ -11357,7 +10900,7 @@ lt_prog_compiler_static=
       # PIC is the default for these OSes.
       ;;
 
-    mingw* | windows* | cygwin* | pw32* | os2* | cegcc*)
+    mingw* | cygwin* | pw32* | os2* | cegcc*)
       # This hack is so that the source file can tell whether it is being
       # built for inclusion in a dll (and should export symbols for example).
       # Although the cygwin gcc ignores -fPIC, still need this for old-style
@@ -11460,7 +11003,7 @@ lt_prog_compiler_static=
       esac
       ;;
 
-    mingw* | windows* | cygwin* | pw32* | os2* | cegcc*)
+    mingw* | cygwin* | pw32* | os2* | cegcc*)
       # This hack is so that the source file can tell whether it is being
       # built for inclusion in a dll (and should export symbols for example).
       lt_prog_compiler_pic='-DDLL_EXPORT'
@@ -11501,8 +11044,8 @@ lt_prog_compiler_static=
 	lt_prog_compiler_pic='-KPIC'
 	lt_prog_compiler_static='-static'
         ;;
-      *flang* | ftn)
-        # Flang compiler.
+      # flang / f18. f95 an alias for gfortran or flang on Debian
+      flang* | f18* | f95*)
 	lt_prog_compiler_wl='-Wl,'
 	lt_prog_compiler_pic='-fPIC'
 	lt_prog_compiler_static='-static'
@@ -11551,7 +11094,7 @@ lt_prog_compiler_static=
 	lt_prog_compiler_static='-qstaticlink'
 	;;
       *)
-	case `$CC -V 2>&1 | $SED 5q` in
+	case `$CC -V 2>&1 | sed 5q` in
 	*Sun\ Ceres\ Fortran* | *Sun*Fortran*\ [1-7].* | *Sun*Fortran*\ 8.[0-3]*)
 	  # Sun Fortran 8.3 passes all unrecognized flags to the linker
 	  lt_prog_compiler_pic='-KPIC'
@@ -11672,9 +11215,8 @@ printf %s "checking for $compiler option to produce PIC... " >&6; }
 if test ${lt_cv_prog_compiler_pic+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_prog_compiler_pic=$lt_prog_compiler_pic ;;
-esac
+else $as_nop
+  lt_cv_prog_compiler_pic=$lt_prog_compiler_pic
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_pic" >&5
 printf "%s\n" "$lt_cv_prog_compiler_pic" >&6; }
@@ -11689,8 +11231,8 @@ printf %s "checking if $compiler PIC flag $lt_prog_compiler_pic works... " >&6;
 if test ${lt_cv_prog_compiler_pic_works+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_prog_compiler_pic_works=no
+else $as_nop
+  lt_cv_prog_compiler_pic_works=no
    ac_outfile=conftest.$ac_objext
    echo "$lt_simple_compile_test_code" > conftest.$ac_ext
    lt_compiler_flag="$lt_prog_compiler_pic -DPIC"  ## exclude from sc_useless_quotes_in_assignment
@@ -11718,8 +11260,7 @@ else case e in #(
      fi
    fi
    $RM conftest*
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_pic_works" >&5
 printf "%s\n" "$lt_cv_prog_compiler_pic_works" >&6; }
@@ -11755,8 +11296,8 @@ printf %s "checking if $compiler static flag $lt_tmp_static_flag works... " >&6;
 if test ${lt_cv_prog_compiler_static_works+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_prog_compiler_static_works=no
+else $as_nop
+  lt_cv_prog_compiler_static_works=no
    save_LDFLAGS=$LDFLAGS
    LDFLAGS="$LDFLAGS $lt_tmp_static_flag"
    echo "$lt_simple_link_test_code" > conftest.$ac_ext
@@ -11777,8 +11318,7 @@ else case e in #(
    fi
    $RM -r conftest*
    LDFLAGS=$save_LDFLAGS
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_static_works" >&5
 printf "%s\n" "$lt_cv_prog_compiler_static_works" >&6; }
@@ -11800,8 +11340,8 @@ printf %s "checking if $compiler supports -c -o file.$ac_objext... " >&6; }
 if test ${lt_cv_prog_compiler_c_o+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_prog_compiler_c_o=no
+else $as_nop
+  lt_cv_prog_compiler_c_o=no
    $RM -r conftest 2>/dev/null
    mkdir conftest
    cd conftest
@@ -11841,8 +11381,7 @@ else case e in #(
    cd ..
    $RM -r conftest
    $RM conftest*
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_c_o" >&5
 printf "%s\n" "$lt_cv_prog_compiler_c_o" >&6; }
@@ -11857,8 +11396,8 @@ printf %s "checking if $compiler supports -c -o file.$ac_objext... " >&6; }
 if test ${lt_cv_prog_compiler_c_o+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_prog_compiler_c_o=no
+else $as_nop
+  lt_cv_prog_compiler_c_o=no
    $RM -r conftest 2>/dev/null
    mkdir conftest
    cd conftest
@@ -11898,8 +11437,7 @@ else case e in #(
    cd ..
    $RM -r conftest
    $RM conftest*
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_c_o" >&5
 printf "%s\n" "$lt_cv_prog_compiler_c_o" >&6; }
@@ -11978,21 +11516,24 @@ printf %s "checking whether the $compiler linker ($LD) supports shared libraries
   extract_expsyms_cmds=
 
   case $host_os in
-  cygwin* | mingw* | windows* | pw32* | cegcc*)
-    # FIXME: the MSVC++ and ICC port hasn't been tested in a loooong time
+  cygwin* | mingw* | pw32* | cegcc*)
+    # FIXME: the MSVC++ port hasn't been tested in a loooong time
     # When not using gcc, we currently assume that we are using
-    # Microsoft Visual C++ or Intel C++ Compiler.
+    # Microsoft Visual C++.
     if test yes != "$GCC"; then
       with_gnu_ld=no
     fi
     ;;
   interix*)
-    # we just hope/assume this is gcc and not c89 (= MSVC++ or ICC)
+    # we just hope/assume this is gcc and not c89 (= MSVC++)
     with_gnu_ld=yes
     ;;
-  openbsd*)
+  openbsd* | bitrig*)
     with_gnu_ld=no
     ;;
+  linux* | k*bsd*-gnu | gnu*)
+    link_all_deplibs=no
+    ;;
   esac
 
   ld_shlibs=yes
@@ -12039,7 +11580,7 @@ printf %s "checking whether the $compiler linker ($LD) supports shared libraries
       whole_archive_flag_spec=
     fi
     supports_anon_versioning=no
-    case `$LD -v | $SED -e 's/([^)]\+)\s\+//' 2>&1` in
+    case `$LD -v | $SED -e 's/(^)\+)\s\+//' 2>&1` in
       *GNU\ gold*) supports_anon_versioning=yes ;;
       *\ [01].* | *\ 2.[0-9].* | *\ 2.10.*) ;; # catch versions < 2.11
       *\ 2.11.93.0.2\ *) supports_anon_versioning=yes ;; # RH7.3 ...
@@ -12093,7 +11634,7 @@ _LT_EOF
       fi
       ;;
 
-    cygwin* | mingw* | windows* | pw32* | cegcc*)
+    cygwin* | mingw* | pw32* | cegcc*)
       # _LT_TAGVAR(hardcode_libdir_flag_spec, ) is actually meaningless,
       # as there is no search path for DLLs.
       hardcode_libdir_flag_spec='-L$libdir'
@@ -12149,9 +11690,8 @@ _LT_EOF
 	cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~
 	$CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~
 	emximp -o $lib $output_objdir/$libname.def'
-      old_archive_from_new_cmds='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
+      old_archive_From_new_cmds='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
       enable_shared_with_static_runtimes=yes
-      file_list_spec='@'
       ;;
 
     interix[3-9]*)
@@ -12166,7 +11706,7 @@ _LT_EOF
       # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link
       # time.  Moving up from 0x10000000 also allows more sbrk(2) space.
       archive_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
-      archive_expsym_cmds='$SED "s|^|_|" $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--retain-symbols-file,$output_objdir/$soname.expsym $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
+      archive_expsym_cmds='sed "s|^|_|" $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--retain-symbols-file,$output_objdir/$soname.expsym $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
       ;;
 
     gnu* | linux* | tpf* | k*bsd*-gnu | kopensolaris*-gnu)
@@ -12209,7 +11749,7 @@ _LT_EOF
 	  compiler_needs_object=yes
 	  ;;
 	esac
-	case `$CC -V 2>&1 | $SED 5q` in
+	case `$CC -V 2>&1 | sed 5q` in
 	*Sun\ C*)			# Sun C 5.9
 	  whole_archive_flag_spec='$wl--whole-archive`new_convenience=; for conv in $convenience\"\"; do test -z \"$conv\" || new_convenience=\"$new_convenience,$conv\"; done; func_echo_all \"$new_convenience\"` $wl--no-whole-archive'
 	  compiler_needs_object=yes
@@ -12221,7 +11761,7 @@ _LT_EOF
 
         if test yes = "$supports_anon_versioning"; then
           archive_expsym_cmds='echo "{ global:" > $output_objdir/$libname.ver~
-            cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~
+            cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~
             echo "local: *; };" >> $output_objdir/$libname.ver~
             $CC '"$tmp_sharedflag""$tmp_addflag"' $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-version-script $wl$output_objdir/$libname.ver -o $lib'
         fi
@@ -12237,7 +11777,7 @@ _LT_EOF
 	  archive_cmds='$LD -shared $libobjs $deplibs $linker_flags -soname $soname -o $lib'
 	  if test yes = "$supports_anon_versioning"; then
 	    archive_expsym_cmds='echo "{ global:" > $output_objdir/$libname.ver~
-              cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~
+              cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~
               echo "local: *; };" >> $output_objdir/$libname.ver~
               $LD -shared $libobjs $deplibs $linker_flags -soname $soname -version-script $output_objdir/$libname.ver -o $lib'
 	  fi
@@ -12248,7 +11788,7 @@ _LT_EOF
       fi
       ;;
 
-    netbsd*)
+    netbsd* | netbsdelf*-gnu)
       if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then
 	archive_cmds='$LD -Bshareable $libobjs $deplibs $linker_flags -o $lib'
 	wlarc=
@@ -12369,7 +11909,7 @@ _LT_EOF
 	if $NM -V 2>&1 | $GREP 'GNU' > /dev/null; then
 	  export_symbols_cmds='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "W")) && (substr(\$ 3,1,1) != ".")) { if (\$ 2 == "W") { print \$ 3 " weak" } else { print \$ 3 } } }'\'' | sort -u > $export_symbols'
 	else
-	  export_symbols_cmds='`func_echo_all $NM | $SED -e '\''s/B\([^B]*\)$/P\1/'\''` -PCpgl $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "L") || (\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) && (substr(\$ 1,1,1) != ".")) { if ((\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) { print \$ 1 " weak" } else { print \$ 1 } } }'\'' | sort -u > $export_symbols'
+	  export_symbols_cmds='`func_echo_all $NM | $SED -e '\''s/B\([^B]*\)$/P\1/'\''` -PCpgl $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) && (substr(\$ 1,1,1) != ".")) { if ((\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) { print \$ 1 " weak" } else { print \$ 1 } } }'\'' | sort -u > $export_symbols'
 	fi
 	aix_use_runtimelinking=no
 
@@ -12494,8 +12034,8 @@ else
   if test ${lt_cv_aix_libpath_+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 int
@@ -12527,8 +12067,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam \
   if test -z "$lt_cv_aix_libpath_"; then
     lt_cv_aix_libpath_=/usr/lib:/lib
   fi
-   ;;
-esac
+
 fi
 
   aix_libpath=$lt_cv_aix_libpath_
@@ -12550,8 +12089,8 @@ else
   if test ${lt_cv_aix_libpath_+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 int
@@ -12583,8 +12122,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam \
   if test -z "$lt_cv_aix_libpath_"; then
     lt_cv_aix_libpath_=/usr/lib:/lib
   fi
-   ;;
-esac
+
 fi
 
   aix_libpath=$lt_cv_aix_libpath_
@@ -12640,14 +12178,14 @@ fi
       export_dynamic_flag_spec=-rdynamic
       ;;
 
-    cygwin* | mingw* | windows* | pw32* | cegcc*)
+    cygwin* | mingw* | pw32* | cegcc*)
       # When not using gcc, we currently assume that we are using
-      # Microsoft Visual C++ or Intel C++ Compiler.
+      # Microsoft Visual C++.
       # hardcode_libdir_flag_spec is actually meaningless, as there is
       # no search path for DLLs.
       case $cc_basename in
-      cl* | icl*)
-	# Native MSVC or ICC
+      cl*)
+	# Native MSVC
 	hardcode_libdir_flag_spec=' '
 	allow_undefined_flag=unsupported
 	always_export_symbols=yes
@@ -12657,14 +12195,14 @@ fi
 	# Tell ltmain to make .dll files, not .so files.
 	shrext_cmds=.dll
 	# FIXME: Setting linknames here is a bad hack.
-	archive_cmds='$CC -Fe $output_objdir/$soname $libobjs $compiler_flags $deplibs -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~linknames='
+	archive_cmds='$CC -o $output_objdir/$soname $libobjs $compiler_flags $deplibs -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~linknames='
 	archive_expsym_cmds='if   test DEF = "`$SED -n     -e '\''s/^[	 ]*//'\''     -e '\''/^\(;.*\)*$/d'\''     -e '\''s/^\(EXPORTS\|LIBRARY\)\([	 ].*\)*$/DEF/p'\''     -e q     $export_symbols`" ; then
             cp "$export_symbols" "$output_objdir/$soname.def";
             echo "$tool_output_objdir$soname.def" > "$output_objdir/$soname.exp";
           else
             $SED -e '\''s/^/-link -EXPORT:/'\'' < $export_symbols > $output_objdir/$soname.exp;
           fi~
-          $CC -Fe $tool_output_objdir$soname $libobjs $compiler_flags $deplibs "@$tool_output_objdir$soname.exp" -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~
+          $CC -o $tool_output_objdir$soname $libobjs $compiler_flags $deplibs "@$tool_output_objdir$soname.exp" -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~
           linknames='
 	# The linker will not automatically build a static lib if we build a DLL.
 	# _LT_TAGVAR(old_archive_from_new_cmds, )='true'
@@ -12688,7 +12226,7 @@ fi
           fi'
 	;;
       *)
-	# Assume MSVC and ICC wrapper
+	# Assume MSVC wrapper
 	hardcode_libdir_flag_spec=' '
 	allow_undefined_flag=unsupported
 	# Tell ltmain to make .lib files, not .a files.
@@ -12729,8 +12267,8 @@ fi
     output_verbose_link_cmd=func_echo_all
     archive_cmds="\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dsymutil"
     module_cmds="\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dsymutil"
-    archive_expsym_cmds="$SED 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dar_export_syms$_lt_dsymutil"
-    module_expsym_cmds="$SED -e 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dar_export_syms$_lt_dsymutil"
+    archive_expsym_cmds="sed 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dar_export_syms$_lt_dsymutil"
+    module_expsym_cmds="sed -e 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dar_export_syms$_lt_dsymutil"
 
   else
   ld_shlibs=no
@@ -12764,7 +12302,7 @@ fi
       ;;
 
     # FreeBSD 3 and greater uses gcc -shared to do shared libraries.
-    freebsd* | dragonfly* | midnightbsd*)
+    freebsd* | dragonfly*)
       archive_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags'
       hardcode_libdir_flag_spec='-R$libdir'
       hardcode_direct=yes
@@ -12835,8 +12373,8 @@ printf %s "checking if $CC understands -b... " >&6; }
 if test ${lt_cv_prog_compiler__b+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_prog_compiler__b=no
+else $as_nop
+  lt_cv_prog_compiler__b=no
    save_LDFLAGS=$LDFLAGS
    LDFLAGS="$LDFLAGS -b"
    echo "$lt_simple_link_test_code" > conftest.$ac_ext
@@ -12857,8 +12395,7 @@ else case e in #(
    fi
    $RM -r conftest*
    LDFLAGS=$save_LDFLAGS
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler__b" >&5
 printf "%s\n" "$lt_cv_prog_compiler__b" >&6; }
@@ -12906,8 +12443,8 @@ printf %s "checking whether the $host_os linker accepts -exported_symbol... " >&
 if test ${lt_cv_irix_exported_symbol+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) save_LDFLAGS=$LDFLAGS
+else $as_nop
+  save_LDFLAGS=$LDFLAGS
 	   LDFLAGS="$LDFLAGS -shared $wl-exported_symbol ${wl}foo $wl-update_registry $wl/dev/null"
 	   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -12916,20 +12453,19 @@ _ACEOF
 if ac_fn_c_try_link "$LINENO"
 then :
   lt_cv_irix_exported_symbol=yes
-else case e in #(
-  e) lt_cv_irix_exported_symbol=no ;;
-esac
+else $as_nop
+  lt_cv_irix_exported_symbol=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
-           LDFLAGS=$save_LDFLAGS ;;
-esac
+           LDFLAGS=$save_LDFLAGS
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_irix_exported_symbol" >&5
 printf "%s\n" "$lt_cv_irix_exported_symbol" >&6; }
 	if test yes = "$lt_cv_irix_exported_symbol"; then
           archive_expsym_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations $wl-exports_file $wl$export_symbols -o $lib'
 	fi
+	link_all_deplibs=no
       else
 	archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib'
 	archive_expsym_cmds='$CC -shared $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -exports_file $export_symbols -o $lib'
@@ -12951,7 +12487,7 @@ printf "%s\n" "$lt_cv_irix_exported_symbol" >&6; }
       esac
       ;;
 
-    netbsd*)
+    netbsd* | netbsdelf*-gnu)
       if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then
 	archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags'  # a.out
       else
@@ -12973,7 +12509,7 @@ printf "%s\n" "$lt_cv_irix_exported_symbol" >&6; }
     *nto* | *qnx*)
       ;;
 
-    openbsd*)
+    openbsd* | bitrig*)
       if test -f /usr/libexec/ld.so; then
 	hardcode_direct=yes
 	hardcode_shlibpath_var=no
@@ -13016,9 +12552,8 @@ printf "%s\n" "$lt_cv_irix_exported_symbol" >&6; }
 	cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~
 	$CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~
 	emximp -o $lib $output_objdir/$libname.def'
-      old_archive_from_new_cmds='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
+      old_archive_From_new_cmds='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
       enable_shared_with_static_runtimes=yes
-      file_list_spec='@'
       ;;
 
     osf3*)
@@ -13249,8 +12784,8 @@ printf %s "checking whether -lc should be explicitly linked in... " >&6; }
 if test ${lt_cv_archive_cmds_need_lc+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) $RM conftest*
+else $as_nop
+  $RM conftest*
 	echo "$lt_simple_compile_test_code" > conftest.$ac_ext
 
 	if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5
@@ -13286,8 +12821,7 @@ else case e in #(
 	  cat conftest.err 1>&5
 	fi
 	$RM conftest*
-	 ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_archive_cmds_need_lc" >&5
 printf "%s\n" "$lt_cv_archive_cmds_need_lc" >&6; }
@@ -13458,7 +12992,7 @@ if test yes = "$GCC"; then
     *) lt_awk_arg='/^libraries:/' ;;
   esac
   case $host_os in
-    mingw* | windows* | cegcc*) lt_sed_strip_eq='s|=\([A-Za-z]:\)|\1|g' ;;
+    mingw* | cegcc*) lt_sed_strip_eq='s|=\([A-Za-z]:\)|\1|g' ;;
     *) lt_sed_strip_eq='s|=/|/|g' ;;
   esac
   lt_search_path_spec=`$CC -print-search-dirs | awk $lt_awk_arg | $SED -e "s/^libraries://" -e $lt_sed_strip_eq`
@@ -13516,7 +13050,7 @@ BEGIN {RS = " "; FS = "/|\n";} {
   # AWK program above erroneously prepends '/' to C:/dos/paths
   # for these hosts.
   case $host_os in
-    mingw* | windows* | cegcc*) lt_search_path_spec=`$ECHO "$lt_search_path_spec" |\
+    mingw* | cegcc*) lt_search_path_spec=`$ECHO "$lt_search_path_spec" |\
       $SED 's|/\([A-Za-z]:\)|\1|g'` ;;
   esac
   sys_lib_search_path_spec=`$ECHO "$lt_search_path_spec" | $lt_NL2SP`
@@ -13590,7 +13124,7 @@ aix[4-9]*)
     # Unfortunately, runtime linking may impact performance, so we do
     # not want this to be the default eventually. Also, we use the
     # versioned .so libs for executables only if there is the -brtl
-    # linker flag in LDFLAGS as well, or --enable-aix-soname=svr4 only.
+    # linker flag in LDFLAGS as well, or --with-aix-soname=svr4 only.
     # To allow for filename-based versioning support, we need to create
     # libNAME.so.V as an archive file, containing:
     # *) an Import File, referring to the versioned filename of the
@@ -13684,7 +13218,7 @@ bsdi[45]*)
   # libtool to hard-code these into programs
   ;;
 
-cygwin* | mingw* | windows* | pw32* | cegcc*)
+cygwin* | mingw* | pw32* | cegcc*)
   version_type=windows
   shrext_cmds=.dll
   need_version=no
@@ -13695,19 +13229,6 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
     # gcc
     library_names_spec='$libname.dll.a'
     # DLL is installed to $(libdir)/../bin by postinstall_cmds
-    # If user builds GCC with mulitlibs enabled,
-    # it should just install on $(libdir)
-    # not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones.
-    if test yes = $multilib; then
-    postinstall_cmds='base_file=`basename \$file`~
-      dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~
-      dldir=$destdir/`dirname \$dlpath`~
-      $install_prog $dir/$dlname $destdir/$dlname~
-      chmod a+x $destdir/$dlname~
-      if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
-        eval '\''$striplib $destdir/$dlname'\'' || exit \$?;
-      fi'
-    else
     postinstall_cmds='base_file=`basename \$file`~
       dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~
       dldir=$destdir/`dirname \$dlpath`~
@@ -13717,7 +13238,6 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
       if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
         eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
       fi'
-    fi
     postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~
       dlpath=$dir/\$dldll~
        $RM \$dlpath'
@@ -13726,30 +13246,30 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
     case $host_os in
     cygwin*)
       # Cygwin DLLs use 'cyg' prefix rather than 'lib'
-      soname_spec='`echo $libname | $SED -e 's/^lib/cyg/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
+      soname_spec='`echo $libname | sed -e 's/^lib/cyg/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
 
       sys_lib_search_path_spec="$sys_lib_search_path_spec /usr/lib/w32api"
       ;;
-    mingw* | windows* | cegcc*)
+    mingw* | cegcc*)
       # MinGW DLLs use traditional 'lib' prefix
       soname_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
       ;;
     pw32*)
       # pw32 DLLs use 'pw' prefix rather than 'lib'
-      library_names_spec='`echo $libname | $SED -e 's/^lib/pw/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
+      library_names_spec='`echo $libname | sed -e 's/^lib/pw/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
       ;;
     esac
     dynamic_linker='Win32 ld.exe'
     ;;
 
-  *,cl* | *,icl*)
-    # Native MSVC or ICC
+  *,cl*)
+    # Native MSVC
     libname_spec='$name'
     soname_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
     library_names_spec='$libname.dll.lib'
 
     case $build_os in
-    mingw* | windows*)
+    mingw*)
       sys_lib_search_path_spec=
       lt_save_ifs=$IFS
       IFS=';'
@@ -13762,7 +13282,7 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
       done
       IFS=$lt_save_ifs
       # Convert to MSYS style.
-      sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | $SED -e 's|\\\\|/|g' -e 's| \\([a-zA-Z]\\):| /\\1|g' -e 's|^ ||'`
+      sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | sed -e 's|\\\\|/|g' -e 's| \\([a-zA-Z]\\):| /\\1|g' -e 's|^ ||'`
       ;;
     cygwin*)
       # Convert to unix form, then to dos form, then back to unix form
@@ -13799,7 +13319,7 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
     ;;
 
   *)
-    # Assume MSVC and ICC wrapper
+    # Assume MSVC wrapper
     library_names_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext $libname.lib'
     dynamic_linker='Win32 ld.exe'
     ;;
@@ -13832,7 +13352,7 @@ dgux*)
   shlibpath_var=LD_LIBRARY_PATH
   ;;
 
-freebsd* | dragonfly* | midnightbsd*)
+freebsd* | dragonfly*)
   # DragonFly does not have aout.  When/if they implement a new
   # versioning mechanism, adjust this.
   if test -x /usr/bin/objformat; then
@@ -13856,28 +13376,7 @@ freebsd* | dragonfly* | midnightbsd*)
       need_version=yes
       ;;
   esac
-  case $host_cpu in
-    powerpc64)
-      # On FreeBSD bi-arch platforms, a different variable is used for 32-bit
-      # binaries.  See <https://man.freebsd.org/cgi/man.cgi?query=ld.so>.
-      cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-int test_pointer_size[sizeof (void *) - 5];
-
-_ACEOF
-if ac_fn_c_try_compile "$LINENO"
-then :
   shlibpath_var=LD_LIBRARY_PATH
-else case e in #(
-  e) shlibpath_var=LD_32_LIBRARY_PATH ;;
-esac
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-      ;;
-    *)
-      shlibpath_var=LD_LIBRARY_PATH
-      ;;
-  esac
   case $host_os in
   freebsd2.*)
     shlibpath_overrides_runpath=yes
@@ -14018,7 +13517,7 @@ linux*android*)
   version_type=none # Android doesn't support versioned libraries.
   need_lib_prefix=no
   need_version=no
-  library_names_spec='$libname$release$shared_ext $libname$shared_ext'
+  library_names_spec='$libname$release$shared_ext'
   soname_spec='$libname$release$shared_ext'
   finish_cmds=
   shlibpath_var=LD_LIBRARY_PATH
@@ -14030,9 +13529,8 @@ linux*android*)
   hardcode_into_libs=yes
 
   dynamic_linker='Android linker'
-  # -rpath works at least for libraries that are not overridden by
-  # libraries installed in system locations.
-  hardcode_libdir_flag_spec='$wl-rpath $wl$libdir'
+  # Don't embed -rpath directories since the linker doesn't support them.
+  hardcode_libdir_flag_spec='-L$libdir'
   ;;
 
 # This must be glibc/ELF.
@@ -14050,8 +13548,8 @@ linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*)
   if test ${lt_cv_shlibpath_overrides_runpath+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_shlibpath_overrides_runpath=no
+else $as_nop
+  lt_cv_shlibpath_overrides_runpath=no
     save_LDFLAGS=$LDFLAGS
     save_libdir=$libdir
     eval "libdir=/foo; wl=\"$lt_prog_compiler_wl\"; \
@@ -14078,8 +13576,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
     LDFLAGS=$save_LDFLAGS
     libdir=$save_libdir
-     ;;
-esac
+
 fi
 
   shlibpath_overrides_runpath=$lt_cv_shlibpath_overrides_runpath
@@ -14089,7 +13586,7 @@ fi
   # before this can be enabled.
   hardcode_into_libs=yes
 
-  # Ideally, we could use ldconfig to report *all* directories which are
+  # Ideally, we could use ldconfig to report *all* directores which are
   # searched for libraries, however this is still not possible.  Aside from not
   # being certain /sbin/ldconfig is available, command
   # 'ldconfig -N -X -v | grep ^/' on 64bit Fedora does not report /usr/lib64,
@@ -14109,6 +13606,18 @@ fi
   dynamic_linker='GNU/Linux ld.so'
   ;;
 
+netbsdelf*-gnu)
+  version_type=linux
+  need_lib_prefix=no
+  need_version=no
+  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
+  soname_spec='${libname}${release}${shared_ext}$major'
+  shlibpath_var=LD_LIBRARY_PATH
+  shlibpath_overrides_runpath=no
+  hardcode_into_libs=yes
+  dynamic_linker='NetBSD ld.elf_so'
+  ;;
+
 netbsd*)
   version_type=sunos
   need_lib_prefix=no
@@ -14146,7 +13655,7 @@ newsos6)
   dynamic_linker='ldqnx.so'
   ;;
 
-openbsd*)
+openbsd* | bitrig*)
   version_type=sunos
   sys_lib_dlsearch_path_spec=/usr/lib
   need_lib_prefix=no
@@ -14487,7 +13996,7 @@ else
     lt_cv_dlopen_self=yes
     ;;
 
-  mingw* | windows* | pw32* | cegcc*)
+  mingw* | pw32* | cegcc*)
     lt_cv_dlopen=LoadLibrary
     lt_cv_dlopen_libs=
     ;;
@@ -14504,22 +14013,16 @@ printf %s "checking for dlopen in -ldl... " >&6; }
 if test ${ac_cv_lib_dl_dlopen+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_check_lib_save_LIBS=$LIBS
+else $as_nop
+  ac_check_lib_save_LIBS=$LIBS
 LIBS="-ldl  $LIBS"
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 /* Override any GCC internal prototype to avoid an error.
    Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.
-   The 'extern "C"' is for builds by C++ compilers;
-   although this is not generally supported in C code supporting it here
-   has little cost and some practical benefit (sr 110532).  */
-#ifdef __cplusplus
-extern "C"
-#endif
-char dlopen (void);
+   builtin and then its argument prototype would still apply.  */
+char dlopen ();
 int
 main (void)
 {
@@ -14531,27 +14034,24 @@ _ACEOF
 if ac_fn_c_try_link "$LINENO"
 then :
   ac_cv_lib_dl_dlopen=yes
-else case e in #(
-  e) ac_cv_lib_dl_dlopen=no ;;
-esac
+else $as_nop
+  ac_cv_lib_dl_dlopen=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS ;;
-esac
+LIBS=$ac_check_lib_save_LIBS
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_dl_dlopen" >&5
 printf "%s\n" "$ac_cv_lib_dl_dlopen" >&6; }
 if test "x$ac_cv_lib_dl_dlopen" = xyes
 then :
   lt_cv_dlopen=dlopen lt_cv_dlopen_libs=-ldl
-else case e in #(
-  e)
+else $as_nop
+
     lt_cv_dlopen=dyld
     lt_cv_dlopen_libs=
     lt_cv_dlopen_self=yes
-     ;;
-esac
+
 fi
 
     ;;
@@ -14569,28 +14069,22 @@ fi
 if test "x$ac_cv_func_shl_load" = xyes
 then :
   lt_cv_dlopen=shl_load
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for shl_load in -ldld" >&5
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for shl_load in -ldld" >&5
 printf %s "checking for shl_load in -ldld... " >&6; }
 if test ${ac_cv_lib_dld_shl_load+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_check_lib_save_LIBS=$LIBS
+else $as_nop
+  ac_check_lib_save_LIBS=$LIBS
 LIBS="-ldld  $LIBS"
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 /* Override any GCC internal prototype to avoid an error.
    Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.
-   The 'extern "C"' is for builds by C++ compilers;
-   although this is not generally supported in C code supporting it here
-   has little cost and some practical benefit (sr 110532).  */
-#ifdef __cplusplus
-extern "C"
-#endif
-char shl_load (void);
+   builtin and then its argument prototype would still apply.  */
+char shl_load ();
 int
 main (void)
 {
@@ -14602,47 +14096,39 @@ _ACEOF
 if ac_fn_c_try_link "$LINENO"
 then :
   ac_cv_lib_dld_shl_load=yes
-else case e in #(
-  e) ac_cv_lib_dld_shl_load=no ;;
-esac
+else $as_nop
+  ac_cv_lib_dld_shl_load=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS ;;
-esac
+LIBS=$ac_check_lib_save_LIBS
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_dld_shl_load" >&5
 printf "%s\n" "$ac_cv_lib_dld_shl_load" >&6; }
 if test "x$ac_cv_lib_dld_shl_load" = xyes
 then :
   lt_cv_dlopen=shl_load lt_cv_dlopen_libs=-ldld
-else case e in #(
-  e) ac_fn_c_check_func "$LINENO" "dlopen" "ac_cv_func_dlopen"
+else $as_nop
+  ac_fn_c_check_func "$LINENO" "dlopen" "ac_cv_func_dlopen"
 if test "x$ac_cv_func_dlopen" = xyes
 then :
   lt_cv_dlopen=dlopen
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for dlopen in -ldl" >&5
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for dlopen in -ldl" >&5
 printf %s "checking for dlopen in -ldl... " >&6; }
 if test ${ac_cv_lib_dl_dlopen+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_check_lib_save_LIBS=$LIBS
+else $as_nop
+  ac_check_lib_save_LIBS=$LIBS
 LIBS="-ldl  $LIBS"
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 /* Override any GCC internal prototype to avoid an error.
    Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.
-   The 'extern "C"' is for builds by C++ compilers;
-   although this is not generally supported in C code supporting it here
-   has little cost and some practical benefit (sr 110532).  */
-#ifdef __cplusplus
-extern "C"
-#endif
-char dlopen (void);
+   builtin and then its argument prototype would still apply.  */
+char dlopen ();
 int
 main (void)
 {
@@ -14654,42 +14140,34 @@ _ACEOF
 if ac_fn_c_try_link "$LINENO"
 then :
   ac_cv_lib_dl_dlopen=yes
-else case e in #(
-  e) ac_cv_lib_dl_dlopen=no ;;
-esac
+else $as_nop
+  ac_cv_lib_dl_dlopen=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS ;;
-esac
+LIBS=$ac_check_lib_save_LIBS
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_dl_dlopen" >&5
 printf "%s\n" "$ac_cv_lib_dl_dlopen" >&6; }
 if test "x$ac_cv_lib_dl_dlopen" = xyes
 then :
   lt_cv_dlopen=dlopen lt_cv_dlopen_libs=-ldl
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for dlopen in -lsvld" >&5
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for dlopen in -lsvld" >&5
 printf %s "checking for dlopen in -lsvld... " >&6; }
 if test ${ac_cv_lib_svld_dlopen+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_check_lib_save_LIBS=$LIBS
+else $as_nop
+  ac_check_lib_save_LIBS=$LIBS
 LIBS="-lsvld  $LIBS"
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 /* Override any GCC internal prototype to avoid an error.
    Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.
-   The 'extern "C"' is for builds by C++ compilers;
-   although this is not generally supported in C code supporting it here
-   has little cost and some practical benefit (sr 110532).  */
-#ifdef __cplusplus
-extern "C"
-#endif
-char dlopen (void);
+   builtin and then its argument prototype would still apply.  */
+char dlopen ();
 int
 main (void)
 {
@@ -14701,42 +14179,34 @@ _ACEOF
 if ac_fn_c_try_link "$LINENO"
 then :
   ac_cv_lib_svld_dlopen=yes
-else case e in #(
-  e) ac_cv_lib_svld_dlopen=no ;;
-esac
+else $as_nop
+  ac_cv_lib_svld_dlopen=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS ;;
-esac
+LIBS=$ac_check_lib_save_LIBS
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_svld_dlopen" >&5
 printf "%s\n" "$ac_cv_lib_svld_dlopen" >&6; }
 if test "x$ac_cv_lib_svld_dlopen" = xyes
 then :
   lt_cv_dlopen=dlopen lt_cv_dlopen_libs=-lsvld
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for dld_link in -ldld" >&5
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for dld_link in -ldld" >&5
 printf %s "checking for dld_link in -ldld... " >&6; }
 if test ${ac_cv_lib_dld_dld_link+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_check_lib_save_LIBS=$LIBS
+else $as_nop
+  ac_check_lib_save_LIBS=$LIBS
 LIBS="-ldld  $LIBS"
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 /* Override any GCC internal prototype to avoid an error.
    Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.
-   The 'extern "C"' is for builds by C++ compilers;
-   although this is not generally supported in C code supporting it here
-   has little cost and some practical benefit (sr 110532).  */
-#ifdef __cplusplus
-extern "C"
-#endif
-char dld_link (void);
+   builtin and then its argument prototype would still apply.  */
+char dld_link ();
 int
 main (void)
 {
@@ -14748,14 +14218,12 @@ _ACEOF
 if ac_fn_c_try_link "$LINENO"
 then :
   ac_cv_lib_dld_dld_link=yes
-else case e in #(
-  e) ac_cv_lib_dld_dld_link=no ;;
-esac
+else $as_nop
+  ac_cv_lib_dld_dld_link=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
-LIBS=$ac_check_lib_save_LIBS ;;
-esac
+LIBS=$ac_check_lib_save_LIBS
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_dld_dld_link" >&5
 printf "%s\n" "$ac_cv_lib_dld_dld_link" >&6; }
@@ -14764,24 +14232,19 @@ then :
   lt_cv_dlopen=dld_link lt_cv_dlopen_libs=-ldld
 fi
 
-	       ;;
-esac
+
 fi
 
-	     ;;
-esac
+
 fi
 
-	   ;;
-esac
+
 fi
 
-	 ;;
-esac
+
 fi
 
-       ;;
-esac
+
 fi
 
     ;;
@@ -14809,8 +14272,8 @@ printf %s "checking whether a program can dlopen itself... " >&6; }
 if test ${lt_cv_dlopen_self+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) 	  if test yes = "$cross_compiling"; then :
+else $as_nop
+  	  if test yes = "$cross_compiling"; then :
   lt_cv_dlopen_self=cross
 else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
@@ -14860,11 +14323,11 @@ else
 /* When -fvisibility=hidden is used, assume the code has been annotated
    correspondingly for the symbols needed.  */
 #if defined __GNUC__ && (((__GNUC__ == 3) && (__GNUC_MINOR__ >= 3)) || (__GNUC__ > 3))
-int fnord (void) __attribute__((visibility("default")));
+int fnord () __attribute__((visibility("default")));
 #endif
 
-int fnord (void) { return 42; }
-int main (void)
+int fnord () { return 42; }
+int main ()
 {
   void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW);
   int status = $lt_dlunknown;
@@ -14904,8 +14367,7 @@ _LT_EOF
 fi
 rm -fr conftest*
 
-     ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_dlopen_self" >&5
 printf "%s\n" "$lt_cv_dlopen_self" >&6; }
@@ -14917,8 +14379,8 @@ printf %s "checking whether a statically linked program can dlopen itself... " >
 if test ${lt_cv_dlopen_self_static+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) 	  if test yes = "$cross_compiling"; then :
+else $as_nop
+  	  if test yes = "$cross_compiling"; then :
   lt_cv_dlopen_self_static=cross
 else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
@@ -14968,11 +14430,11 @@ else
 /* When -fvisibility=hidden is used, assume the code has been annotated
    correspondingly for the symbols needed.  */
 #if defined __GNUC__ && (((__GNUC__ == 3) && (__GNUC_MINOR__ >= 3)) || (__GNUC__ > 3))
-int fnord (void) __attribute__((visibility("default")));
+int fnord () __attribute__((visibility("default")));
 #endif
 
-int fnord (void) { return 42; }
-int main (void)
+int fnord () { return 42; }
+int main ()
 {
   void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW);
   int status = $lt_dlunknown;
@@ -15012,8 +14474,7 @@ _LT_EOF
 fi
 rm -fr conftest*
 
-       ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_dlopen_self_static" >&5
 printf "%s\n" "$lt_cv_dlopen_self_static" >&6; }
@@ -15056,41 +14517,30 @@ striplib=
 old_striplib=
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether stripping libraries is possible" >&5
 printf %s "checking whether stripping libraries is possible... " >&6; }
-if test -z "$STRIP"; then
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-else
-  if $STRIP -V 2>&1 | $GREP "GNU strip" >/dev/null; then
-    old_striplib="$STRIP --strip-debug"
-    striplib="$STRIP --strip-unneeded"
-    { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+if test -n "$STRIP" && $STRIP -V 2>&1 | $GREP "GNU strip" >/dev/null; then
+  test -z "$old_striplib" && old_striplib="$STRIP --strip-debug"
+  test -z "$striplib" && striplib="$STRIP --strip-unneeded"
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
 printf "%s\n" "yes" >&6; }
-  else
-    case $host_os in
-    darwin*)
-      # FIXME - insert some real tests, host_os isn't really good enough
+else
+# FIXME - insert some real tests, host_os isn't really good enough
+  case $host_os in
+  darwin*)
+    if test -n "$STRIP"; then
       striplib="$STRIP -x"
       old_striplib="$STRIP -S"
       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
 printf "%s\n" "yes" >&6; }
-      ;;
-    freebsd*)
-      if $STRIP -V 2>&1 | $GREP "elftoolchain" >/dev/null; then
-        old_striplib="$STRIP --strip-debug"
-        striplib="$STRIP --strip-unneeded"
-        { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5
-printf "%s\n" "yes" >&6; }
-      else
-        { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-      fi
-      ;;
-    *)
+    else
       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
 printf "%s\n" "no" >&6; }
-      ;;
-    esac
-  fi
+    fi
+    ;;
+  *)
+    { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
+printf "%s\n" "no" >&6; }
+    ;;
+  esac
 fi
 
 
@@ -15171,8 +14621,8 @@ if test -z "$CXXCPP"; then
   if test ${ac_cv_prog_CXXCPP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)     # Double quotes because $CXX needs to be expanded
+else $as_nop
+      # Double quotes because $CXX needs to be expanded
     for CXXCPP in "$CXX -E" cpp /lib/cpp
     do
       ac_preproc_ok=false
@@ -15190,10 +14640,9 @@ _ACEOF
 if ac_fn_cxx_try_cpp "$LINENO"
 then :
 
-else case e in #(
-  e) # Broken: fails on valid input.
-continue ;;
-esac
+else $as_nop
+  # Broken: fails on valid input.
+continue
 fi
 rm -f conftest.err conftest.i conftest.$ac_ext
 
@@ -15207,16 +14656,15 @@ if ac_fn_cxx_try_cpp "$LINENO"
 then :
   # Broken: success on invalid input.
 continue
-else case e in #(
-  e) # Passes both tests.
+else $as_nop
+  # Passes both tests.
 ac_preproc_ok=:
-break ;;
-esac
+break
 fi
 rm -f conftest.err conftest.i conftest.$ac_ext
 
 done
-# Because of 'break', _AC_PREPROC_IFELSE's cleaning code was skipped.
+# Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped.
 rm -f conftest.i conftest.err conftest.$ac_ext
 if $ac_preproc_ok
 then :
@@ -15225,8 +14673,7 @@ fi
 
     done
     ac_cv_prog_CXXCPP=$CXXCPP
-   ;;
-esac
+
 fi
   CXXCPP=$ac_cv_prog_CXXCPP
 else
@@ -15249,10 +14696,9 @@ _ACEOF
 if ac_fn_cxx_try_cpp "$LINENO"
 then :
 
-else case e in #(
-  e) # Broken: fails on valid input.
-continue ;;
-esac
+else $as_nop
+  # Broken: fails on valid input.
+continue
 fi
 rm -f conftest.err conftest.i conftest.$ac_ext
 
@@ -15266,26 +14712,24 @@ if ac_fn_cxx_try_cpp "$LINENO"
 then :
   # Broken: success on invalid input.
 continue
-else case e in #(
-  e) # Passes both tests.
+else $as_nop
+  # Passes both tests.
 ac_preproc_ok=:
-break ;;
-esac
+break
 fi
 rm -f conftest.err conftest.i conftest.$ac_ext
 
 done
-# Because of 'break', _AC_PREPROC_IFELSE's cleaning code was skipped.
+# Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped.
 rm -f conftest.i conftest.err conftest.$ac_ext
 if $ac_preproc_ok
 then :
 
-else case e in #(
-  e) { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
+else $as_nop
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
 as_fn_error $? "C++ preprocessor \"$CXXCPP\" fails sanity check
-See 'config.log' for more details" "$LINENO" 5; } ;;
-esac
+See \`config.log' for more details" "$LINENO" 5; }
 fi
 
 ac_ext=c
@@ -15422,9 +14866,8 @@ cc_basename=$func_cc_basename_result
 if test ${with_gnu_ld+y}
 then :
   withval=$with_gnu_ld; test no = "$withval" || with_gnu_ld=yes
-else case e in #(
-  e) with_gnu_ld=no ;;
-esac
+else $as_nop
+  with_gnu_ld=no
 fi
 
 ac_prog=ld
@@ -15433,7 +14876,7 @@ if test yes = "$GCC"; then
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for ld used by $CC" >&5
 printf %s "checking for ld used by $CC... " >&6; }
   case $host in
-  *-*-mingw* | *-*-windows*)
+  *-*-mingw*)
     # gcc leaves a trailing carriage return, which upsets mingw
     ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;;
   *)
@@ -15469,8 +14912,8 @@ fi
 if test ${lt_cv_path_LD+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -z "$LD"; then
+else $as_nop
+  if test -z "$LD"; then
   lt_save_ifs=$IFS; IFS=$PATH_SEPARATOR
   for ac_dir in $PATH; do
     IFS=$lt_save_ifs
@@ -15493,8 +14936,7 @@ else case e in #(
   IFS=$lt_save_ifs
 else
   lt_cv_path_LD=$LD # Let the user override the test with a path.
-fi ;;
-esac
+fi
 fi
 
 LD=$lt_cv_path_LD
@@ -15511,8 +14953,8 @@ printf %s "checking if the linker ($LD) is GNU ld... " >&6; }
 if test ${lt_cv_prog_gnu_ld+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) # I'd rather use --version here, but apparently some GNU lds only accept -v.
+else $as_nop
+  # I'd rather use --version here, but apparently some GNU lds only accept -v.
 case `$LD -v 2>&1 </dev/null` in
 *GNU* | *'with BFD'*)
   lt_cv_prog_gnu_ld=yes
@@ -15520,7 +14962,6 @@ case `$LD -v 2>&1 </dev/null` in
 *)
   lt_cv_prog_gnu_ld=no
   ;;
-esac ;;
 esac
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_gnu_ld" >&5
@@ -15548,7 +14989,8 @@ with_gnu_ld=$lt_cv_prog_gnu_ld
         wlarc='$wl'
 
         # ancient GNU ld didn't support --whole-archive et. al.
-        if $LD --help 2>&1 | $GREP 'no-whole-archive' > /dev/null; then
+        if eval "`$CC -print-prog-name=ld` --help 2>&1" |
+	  $GREP 'no-whole-archive' > /dev/null; then
           whole_archive_flag_spec_CXX=$wlarc'--whole-archive$convenience '$wlarc'--no-whole-archive'
         else
           whole_archive_flag_spec_CXX=
@@ -15568,7 +15010,7 @@ with_gnu_ld=$lt_cv_prog_gnu_ld
       # Commands to make compiler produce verbose output that lists
       # what "hidden" libraries, object files and flags are used when
       # linking a shared library.
-      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[-]L"'
+      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP " \-L"'
 
     else
       GXX=no
@@ -15719,8 +15161,8 @@ else
   if test ${lt_cv_aix_libpath__CXX+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 int
@@ -15752,8 +15194,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam \
   if test -z "$lt_cv_aix_libpath__CXX"; then
     lt_cv_aix_libpath__CXX=/usr/lib:/lib
   fi
-   ;;
-esac
+
 fi
 
   aix_libpath=$lt_cv_aix_libpath__CXX
@@ -15776,8 +15217,8 @@ else
   if test ${lt_cv_aix_libpath__CXX+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 int
@@ -15809,8 +15250,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam \
   if test -z "$lt_cv_aix_libpath__CXX"; then
     lt_cv_aix_libpath__CXX=/usr/lib:/lib
   fi
-   ;;
-esac
+
 fi
 
   aix_libpath=$lt_cv_aix_libpath__CXX
@@ -15868,10 +15308,10 @@ fi
         esac
         ;;
 
-      cygwin* | mingw* | windows* | pw32* | cegcc*)
+      cygwin* | mingw* | pw32* | cegcc*)
 	case $GXX,$cc_basename in
-	,cl* | no,cl* | ,icl* | no,icl*)
-	  # Native MSVC or ICC
+	,cl* | no,cl*)
+	  # Native MSVC
 	  # hardcode_libdir_flag_spec is actually meaningless, as there is
 	  # no search path for DLLs.
 	  hardcode_libdir_flag_spec_CXX=' '
@@ -15962,11 +15402,11 @@ fi
     output_verbose_link_cmd=func_echo_all
     archive_cmds_CXX="\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dsymutil"
     module_cmds_CXX="\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dsymutil"
-    archive_expsym_cmds_CXX="$SED 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dar_export_syms$_lt_dsymutil"
-    module_expsym_cmds_CXX="$SED -e 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dar_export_syms$_lt_dsymutil"
+    archive_expsym_cmds_CXX="sed 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dar_export_syms$_lt_dsymutil"
+    module_expsym_cmds_CXX="sed -e 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dar_export_syms$_lt_dsymutil"
        if test yes != "$lt_cv_apple_cc_single_mod"; then
       archive_cmds_CXX="\$CC -r -keep_private_externs -nostdlib -o \$lib-master.o \$libobjs~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$lib-master.o \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring$_lt_dsymutil"
-      archive_expsym_cmds_CXX="$SED 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -r -keep_private_externs -nostdlib -o \$lib-master.o \$libobjs~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$lib-master.o \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring$_lt_dar_export_syms$_lt_dsymutil"
+      archive_expsym_cmds_CXX="sed 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -r -keep_private_externs -nostdlib -o \$lib-master.o \$libobjs~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$lib-master.o \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring$_lt_dar_export_syms$_lt_dsymutil"
     fi
 
   else
@@ -15999,9 +15439,8 @@ fi
 	  cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~
 	  $CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~
 	  emximp -o $lib $output_objdir/$libname.def'
-	old_archive_from_new_cmds_CXX='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
+	old_archive_From_new_cmds_CXX='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
 	enable_shared_with_static_runtimes_CXX=yes
-	file_list_spec_CXX='@'
 	;;
 
       dgux*)
@@ -16032,7 +15471,7 @@ fi
         archive_cmds_need_lc_CXX=no
         ;;
 
-      freebsd* | dragonfly* | midnightbsd*)
+      freebsd* | dragonfly*)
         # FreeBSD 3 and later use GNU C++ and GNU ld with standard ELF
         # conventions
         ld_shlibs_CXX=yes
@@ -16067,7 +15506,7 @@ fi
             # explicitly linking system object files so we need to strip them
             # from the output so that they don't get included in the library
             # dependencies.
-            output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $EGREP "[-]L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
+            output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $EGREP " \-L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
             ;;
           *)
             if test yes = "$GXX"; then
@@ -16132,7 +15571,7 @@ fi
 	    # explicitly linking system object files so we need to strip them
 	    # from the output so that they don't get included in the library
 	    # dependencies.
-	    output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $GREP "[-]L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
+	    output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $GREP " \-L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
 	    ;;
           *)
 	    if test yes = "$GXX"; then
@@ -16169,7 +15608,7 @@ fi
 	# 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link
 	# time.  Moving up from 0x10000000 also allows more sbrk(2) space.
 	archive_cmds_CXX='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
-	archive_expsym_cmds_CXX='$SED "s|^|_|" $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--retain-symbols-file,$output_objdir/$soname.expsym $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
+	archive_expsym_cmds_CXX='sed "s|^|_|" $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--retain-symbols-file,$output_objdir/$soname.expsym $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
 	;;
       irix5* | irix6*)
         case $cc_basename in
@@ -16309,13 +15748,13 @@ fi
 	    archive_cmds_CXX='$CC -qmkshrobj $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib'
 	    if test yes = "$supports_anon_versioning"; then
 	      archive_expsym_cmds_CXX='echo "{ global:" > $output_objdir/$libname.ver~
-                cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~
+                cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~
                 echo "local: *; };" >> $output_objdir/$libname.ver~
                 $CC -qmkshrobj $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-version-script $wl$output_objdir/$libname.ver -o $lib'
 	    fi
 	    ;;
 	  *)
-	    case `$CC -V 2>&1 | $SED 5q` in
+	    case `$CC -V 2>&1 | sed 5q` in
 	    *Sun\ C*)
 	      # Sun C++ 5.9
 	      no_undefined_flag_CXX=' -zdefs'
@@ -16380,7 +15819,7 @@ fi
         ld_shlibs_CXX=yes
 	;;
 
-      openbsd*)
+      openbsd* | bitrig*)
 	if test -f /usr/libexec/ld.so; then
 	  hardcode_direct_CXX=yes
 	  hardcode_shlibpath_var_CXX=no
@@ -16471,7 +15910,7 @@ fi
 	      # Commands to make compiler produce verbose output that lists
 	      # what "hidden" libraries, object files and flags are used when
 	      # linking a shared library.
-	      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[-]L"'
+	      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP " \-L"'
 
 	    else
 	      # FIXME: insert proper C++ library support
@@ -16555,7 +15994,7 @@ fi
 	        # Commands to make compiler produce verbose output that lists
 	        # what "hidden" libraries, object files and flags are used when
 	        # linking a shared library.
-	        output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[-]L"'
+	        output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP " \-L"'
 	      else
 	        # g++ 2.7 appears to require '-G' NOT '-shared' on this
 	        # platform.
@@ -16566,7 +16005,7 @@ fi
 	        # Commands to make compiler produce verbose output that lists
 	        # what "hidden" libraries, object files and flags are used when
 	        # linking a shared library.
-	        output_verbose_link_cmd='$CC -G $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[-]L"'
+	        output_verbose_link_cmd='$CC -G $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP " \-L"'
 	      fi
 
 	      hardcode_libdir_flag_spec_CXX='$wl-R $wl$libdir'
@@ -16709,11 +16148,10 @@ if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5
     case $prev$p in
 
     -L* | -R* | -l*)
-       # Some compilers place space between "-{L,R,l}" and the path.
+       # Some compilers place space between "-{L,R}" and the path.
        # Remove the space.
-       if test x-L = x"$p" ||
-          test x-R = x"$p" ||
-          test x-l = x"$p"; then
+       if test x-L = "$p" ||
+          test x-R = "$p"; then
 	 prev=$p
 	 continue
        fi
@@ -16880,7 +16318,7 @@ lt_prog_compiler_static_CXX=
     beos* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*)
       # PIC is the default for these OSes.
       ;;
-    mingw* | windows* | cygwin* | os2* | pw32* | cegcc*)
+    mingw* | cygwin* | os2* | pw32* | cegcc*)
       # This hack is so that the source file can tell whether it is being
       # built for inclusion in a dll (and should export symbols for example).
       # Although the cygwin gcc ignores -fPIC, still need this for old-style
@@ -16955,7 +16393,7 @@ lt_prog_compiler_static_CXX=
 	  ;;
 	esac
 	;;
-      mingw* | windows* | cygwin* | os2* | pw32* | cegcc*)
+      mingw* | cygwin* | os2* | pw32* | cegcc*)
 	# This hack is so that the source file can tell whether it is being
 	# built for inclusion in a dll (and should export symbols for example).
 	lt_prog_compiler_pic_CXX='-DDLL_EXPORT'
@@ -16973,7 +16411,7 @@ lt_prog_compiler_static_CXX=
 	    ;;
 	esac
 	;;
-      freebsd* | dragonfly* | midnightbsd*)
+      freebsd* | dragonfly*)
 	# FreeBSD uses GNU C++
 	;;
       hpux9* | hpux10* | hpux11*)
@@ -17056,7 +16494,7 @@ lt_prog_compiler_static_CXX=
 	    lt_prog_compiler_static_CXX='-qstaticlink'
 	    ;;
 	  *)
-	    case `$CC -V 2>&1 | $SED 5q` in
+	    case `$CC -V 2>&1 | sed 5q` in
 	    *Sun\ C*)
 	      # Sun C++ 5.9
 	      lt_prog_compiler_pic_CXX='-KPIC'
@@ -17080,7 +16518,7 @@ lt_prog_compiler_static_CXX=
 	    ;;
 	esac
 	;;
-      netbsd*)
+      netbsd* | netbsdelf*-gnu)
 	;;
       *qnx* | *nto*)
         # QNX uses GNU C++, but need to define -shared option too, otherwise
@@ -17183,9 +16621,8 @@ printf %s "checking for $compiler option to produce PIC... " >&6; }
 if test ${lt_cv_prog_compiler_pic_CXX+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_prog_compiler_pic_CXX=$lt_prog_compiler_pic_CXX ;;
-esac
+else $as_nop
+  lt_cv_prog_compiler_pic_CXX=$lt_prog_compiler_pic_CXX
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_pic_CXX" >&5
 printf "%s\n" "$lt_cv_prog_compiler_pic_CXX" >&6; }
@@ -17200,8 +16637,8 @@ printf %s "checking if $compiler PIC flag $lt_prog_compiler_pic_CXX works... " >
 if test ${lt_cv_prog_compiler_pic_works_CXX+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_prog_compiler_pic_works_CXX=no
+else $as_nop
+  lt_cv_prog_compiler_pic_works_CXX=no
    ac_outfile=conftest.$ac_objext
    echo "$lt_simple_compile_test_code" > conftest.$ac_ext
    lt_compiler_flag="$lt_prog_compiler_pic_CXX -DPIC"  ## exclude from sc_useless_quotes_in_assignment
@@ -17229,8 +16666,7 @@ else case e in #(
      fi
    fi
    $RM conftest*
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_pic_works_CXX" >&5
 printf "%s\n" "$lt_cv_prog_compiler_pic_works_CXX" >&6; }
@@ -17260,8 +16696,8 @@ printf %s "checking if $compiler static flag $lt_tmp_static_flag works... " >&6;
 if test ${lt_cv_prog_compiler_static_works_CXX+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_prog_compiler_static_works_CXX=no
+else $as_nop
+  lt_cv_prog_compiler_static_works_CXX=no
    save_LDFLAGS=$LDFLAGS
    LDFLAGS="$LDFLAGS $lt_tmp_static_flag"
    echo "$lt_simple_link_test_code" > conftest.$ac_ext
@@ -17282,8 +16718,7 @@ else case e in #(
    fi
    $RM -r conftest*
    LDFLAGS=$save_LDFLAGS
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_static_works_CXX" >&5
 printf "%s\n" "$lt_cv_prog_compiler_static_works_CXX" >&6; }
@@ -17302,8 +16737,8 @@ printf %s "checking if $compiler supports -c -o file.$ac_objext... " >&6; }
 if test ${lt_cv_prog_compiler_c_o_CXX+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_prog_compiler_c_o_CXX=no
+else $as_nop
+  lt_cv_prog_compiler_c_o_CXX=no
    $RM -r conftest 2>/dev/null
    mkdir conftest
    cd conftest
@@ -17343,8 +16778,7 @@ else case e in #(
    cd ..
    $RM -r conftest
    $RM conftest*
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_c_o_CXX" >&5
 printf "%s\n" "$lt_cv_prog_compiler_c_o_CXX" >&6; }
@@ -17356,8 +16790,8 @@ printf %s "checking if $compiler supports -c -o file.$ac_objext... " >&6; }
 if test ${lt_cv_prog_compiler_c_o_CXX+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_prog_compiler_c_o_CXX=no
+else $as_nop
+  lt_cv_prog_compiler_c_o_CXX=no
    $RM -r conftest 2>/dev/null
    mkdir conftest
    cd conftest
@@ -17397,8 +16831,7 @@ else case e in #(
    cd ..
    $RM -r conftest
    $RM conftest*
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_c_o_CXX" >&5
 printf "%s\n" "$lt_cv_prog_compiler_c_o_CXX" >&6; }
@@ -17448,15 +16881,15 @@ printf %s "checking whether the $compiler linker ($LD) supports shared libraries
     if $NM -V 2>&1 | $GREP 'GNU' > /dev/null; then
       export_symbols_cmds_CXX='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "W")) && (substr(\$ 3,1,1) != ".")) { if (\$ 2 == "W") { print \$ 3 " weak" } else { print \$ 3 } } }'\'' | sort -u > $export_symbols'
     else
-      export_symbols_cmds_CXX='`func_echo_all $NM | $SED -e '\''s/B\([^B]*\)$/P\1/'\''` -PCpgl $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "L") || (\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) && (substr(\$ 1,1,1) != ".")) { if ((\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) { print \$ 1 " weak" } else { print \$ 1 } } }'\'' | sort -u > $export_symbols'
+      export_symbols_cmds_CXX='`func_echo_all $NM | $SED -e '\''s/B\([^B]*\)$/P\1/'\''` -PCpgl $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) && (substr(\$ 1,1,1) != ".")) { if ((\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) { print \$ 1 " weak" } else { print \$ 1 } } }'\'' | sort -u > $export_symbols'
     fi
     ;;
   pw32*)
     export_symbols_cmds_CXX=$ltdll_cmds
     ;;
-  cygwin* | mingw* | windows* | cegcc*)
+  cygwin* | mingw* | cegcc*)
     case $cc_basename in
-    cl* | icl*)
+    cl*)
       exclude_expsyms_CXX='_NULL_IMPORT_DESCRIPTOR|_IMPORT_DESCRIPTOR_.*'
       ;;
     *)
@@ -17465,6 +16898,9 @@ printf %s "checking whether the $compiler linker ($LD) supports shared libraries
       ;;
     esac
     ;;
+  linux* | k*bsd*-gnu | gnu*)
+    link_all_deplibs_CXX=no
+    ;;
   *)
     export_symbols_cmds_CXX='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols'
     ;;
@@ -17503,8 +16939,8 @@ printf %s "checking whether -lc should be explicitly linked in... " >&6; }
 if test ${lt_cv_archive_cmds_need_lc_CXX+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) $RM conftest*
+else $as_nop
+  $RM conftest*
 	echo "$lt_simple_compile_test_code" > conftest.$ac_ext
 
 	if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5
@@ -17540,8 +16976,7 @@ else case e in #(
 	  cat conftest.err 1>&5
 	fi
 	$RM conftest*
-	 ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_archive_cmds_need_lc_CXX" >&5
 printf "%s\n" "$lt_cv_archive_cmds_need_lc_CXX" >&6; }
@@ -17683,7 +17118,7 @@ aix[4-9]*)
     # Unfortunately, runtime linking may impact performance, so we do
     # not want this to be the default eventually. Also, we use the
     # versioned .so libs for executables only if there is the -brtl
-    # linker flag in LDFLAGS as well, or --enable-aix-soname=svr4 only.
+    # linker flag in LDFLAGS as well, or --with-aix-soname=svr4 only.
     # To allow for filename-based versioning support, we need to create
     # libNAME.so.V as an archive file, containing:
     # *) an Import File, referring to the versioned filename of the
@@ -17777,7 +17212,7 @@ bsdi[45]*)
   # libtool to hard-code these into programs
   ;;
 
-cygwin* | mingw* | windows* | pw32* | cegcc*)
+cygwin* | mingw* | pw32* | cegcc*)
   version_type=windows
   shrext_cmds=.dll
   need_version=no
@@ -17788,19 +17223,6 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
     # gcc
     library_names_spec='$libname.dll.a'
     # DLL is installed to $(libdir)/../bin by postinstall_cmds
-    # If user builds GCC with mulitlibs enabled,
-    # it should just install on $(libdir)
-    # not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones.
-    if test yes = $multilib; then
-    postinstall_cmds='base_file=`basename \$file`~
-      dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~
-      dldir=$destdir/`dirname \$dlpath`~
-      $install_prog $dir/$dlname $destdir/$dlname~
-      chmod a+x $destdir/$dlname~
-      if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
-        eval '\''$striplib $destdir/$dlname'\'' || exit \$?;
-      fi'
-    else
     postinstall_cmds='base_file=`basename \$file`~
       dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~
       dldir=$destdir/`dirname \$dlpath`~
@@ -17810,7 +17232,6 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
       if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
         eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
       fi'
-    fi
     postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~
       dlpath=$dir/\$dldll~
        $RM \$dlpath'
@@ -17819,29 +17240,29 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
     case $host_os in
     cygwin*)
       # Cygwin DLLs use 'cyg' prefix rather than 'lib'
-      soname_spec='`echo $libname | $SED -e 's/^lib/cyg/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
+      soname_spec='`echo $libname | sed -e 's/^lib/cyg/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
 
       ;;
-    mingw* | windows* | cegcc*)
+    mingw* | cegcc*)
       # MinGW DLLs use traditional 'lib' prefix
       soname_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
       ;;
     pw32*)
       # pw32 DLLs use 'pw' prefix rather than 'lib'
-      library_names_spec='`echo $libname | $SED -e 's/^lib/pw/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
+      library_names_spec='`echo $libname | sed -e 's/^lib/pw/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
       ;;
     esac
     dynamic_linker='Win32 ld.exe'
     ;;
 
-  *,cl* | *,icl*)
-    # Native MSVC or ICC
+  *,cl*)
+    # Native MSVC
     libname_spec='$name'
     soname_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext'
     library_names_spec='$libname.dll.lib'
 
     case $build_os in
-    mingw* | windows*)
+    mingw*)
       sys_lib_search_path_spec=
       lt_save_ifs=$IFS
       IFS=';'
@@ -17854,7 +17275,7 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
       done
       IFS=$lt_save_ifs
       # Convert to MSYS style.
-      sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | $SED -e 's|\\\\|/|g' -e 's| \\([a-zA-Z]\\):| /\\1|g' -e 's|^ ||'`
+      sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | sed -e 's|\\\\|/|g' -e 's| \\([a-zA-Z]\\):| /\\1|g' -e 's|^ ||'`
       ;;
     cygwin*)
       # Convert to unix form, then to dos form, then back to unix form
@@ -17891,7 +17312,7 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
     ;;
 
   *)
-    # Assume MSVC and ICC wrapper
+    # Assume MSVC wrapper
     library_names_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext $libname.lib'
     dynamic_linker='Win32 ld.exe'
     ;;
@@ -17923,7 +17344,7 @@ dgux*)
   shlibpath_var=LD_LIBRARY_PATH
   ;;
 
-freebsd* | dragonfly* | midnightbsd*)
+freebsd* | dragonfly*)
   # DragonFly does not have aout.  When/if they implement a new
   # versioning mechanism, adjust this.
   if test -x /usr/bin/objformat; then
@@ -17947,28 +17368,7 @@ freebsd* | dragonfly* | midnightbsd*)
       need_version=yes
       ;;
   esac
-  case $host_cpu in
-    powerpc64)
-      # On FreeBSD bi-arch platforms, a different variable is used for 32-bit
-      # binaries.  See <https://man.freebsd.org/cgi/man.cgi?query=ld.so>.
-      cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-int test_pointer_size[sizeof (void *) - 5];
-
-_ACEOF
-if ac_fn_cxx_try_compile "$LINENO"
-then :
   shlibpath_var=LD_LIBRARY_PATH
-else case e in #(
-  e) shlibpath_var=LD_32_LIBRARY_PATH ;;
-esac
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-      ;;
-    *)
-      shlibpath_var=LD_LIBRARY_PATH
-      ;;
-  esac
   case $host_os in
   freebsd2.*)
     shlibpath_overrides_runpath=yes
@@ -18109,7 +17509,7 @@ linux*android*)
   version_type=none # Android doesn't support versioned libraries.
   need_lib_prefix=no
   need_version=no
-  library_names_spec='$libname$release$shared_ext $libname$shared_ext'
+  library_names_spec='$libname$release$shared_ext'
   soname_spec='$libname$release$shared_ext'
   finish_cmds=
   shlibpath_var=LD_LIBRARY_PATH
@@ -18121,9 +17521,8 @@ linux*android*)
   hardcode_into_libs=yes
 
   dynamic_linker='Android linker'
-  # -rpath works at least for libraries that are not overridden by
-  # libraries installed in system locations.
-  hardcode_libdir_flag_spec_CXX='$wl-rpath $wl$libdir'
+  # Don't embed -rpath directories since the linker doesn't support them.
+  hardcode_libdir_flag_spec_CXX='-L$libdir'
   ;;
 
 # This must be glibc/ELF.
@@ -18141,8 +17540,8 @@ linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*)
   if test ${lt_cv_shlibpath_overrides_runpath+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) lt_cv_shlibpath_overrides_runpath=no
+else $as_nop
+  lt_cv_shlibpath_overrides_runpath=no
     save_LDFLAGS=$LDFLAGS
     save_libdir=$libdir
     eval "libdir=/foo; wl=\"$lt_prog_compiler_wl_CXX\"; \
@@ -18169,8 +17568,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
     LDFLAGS=$save_LDFLAGS
     libdir=$save_libdir
-     ;;
-esac
+
 fi
 
   shlibpath_overrides_runpath=$lt_cv_shlibpath_overrides_runpath
@@ -18180,7 +17578,7 @@ fi
   # before this can be enabled.
   hardcode_into_libs=yes
 
-  # Ideally, we could use ldconfig to report *all* directories which are
+  # Ideally, we could use ldconfig to report *all* directores which are
   # searched for libraries, however this is still not possible.  Aside from not
   # being certain /sbin/ldconfig is available, command
   # 'ldconfig -N -X -v | grep ^/' on 64bit Fedora does not report /usr/lib64,
@@ -18200,6 +17598,18 @@ fi
   dynamic_linker='GNU/Linux ld.so'
   ;;
 
+netbsdelf*-gnu)
+  version_type=linux
+  need_lib_prefix=no
+  need_version=no
+  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
+  soname_spec='${libname}${release}${shared_ext}$major'
+  shlibpath_var=LD_LIBRARY_PATH
+  shlibpath_overrides_runpath=no
+  hardcode_into_libs=yes
+  dynamic_linker='NetBSD ld.elf_so'
+  ;;
+
 netbsd*)
   version_type=sunos
   need_lib_prefix=no
@@ -18237,7 +17647,7 @@ newsos6)
   dynamic_linker='ldqnx.so'
   ;;
 
-openbsd*)
+openbsd* | bitrig*)
   version_type=sunos
   sys_lib_dlsearch_path_spec=/usr/lib
   need_lib_prefix=no
@@ -18568,8 +17978,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$CC"; then
+else $as_nop
+  if test -n "$CC"; then
   ac_cv_prog_CC="$CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -18591,8 +18001,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 CC=$ac_cv_prog_CC
 if test -n "$CC"; then
@@ -18614,8 +18023,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_CC"; then
+else $as_nop
+  if test -n "$ac_ct_CC"; then
   ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -18637,8 +18046,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_CC=$ac_cv_prog_ac_ct_CC
 if test -n "$ac_ct_CC"; then
@@ -18673,8 +18081,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$CC"; then
+else $as_nop
+  if test -n "$CC"; then
   ac_cv_prog_CC="$CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -18696,8 +18104,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 CC=$ac_cv_prog_CC
 if test -n "$CC"; then
@@ -18719,8 +18126,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$CC"; then
+else $as_nop
+  if test -n "$CC"; then
   ac_cv_prog_CC="$CC" # Let the user override the test.
 else
   ac_prog_rejected=no
@@ -18759,8 +18166,7 @@ if test $ac_prog_rejected = yes; then
     ac_cv_prog_CC="$as_dir$ac_word${1+' '}$@"
   fi
 fi
-fi ;;
-esac
+fi
 fi
 CC=$ac_cv_prog_CC
 if test -n "$CC"; then
@@ -18784,8 +18190,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$CC"; then
+else $as_nop
+  if test -n "$CC"; then
   ac_cv_prog_CC="$CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -18807,8 +18213,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 CC=$ac_cv_prog_CC
 if test -n "$CC"; then
@@ -18834,8 +18239,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_CC"; then
+else $as_nop
+  if test -n "$ac_ct_CC"; then
   ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -18857,8 +18262,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_CC=$ac_cv_prog_ac_ct_CC
 if test -n "$ac_ct_CC"; then
@@ -18896,8 +18300,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$CC"; then
+else $as_nop
+  if test -n "$CC"; then
   ac_cv_prog_CC="$CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -18919,8 +18323,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 CC=$ac_cv_prog_CC
 if test -n "$CC"; then
@@ -18942,8 +18345,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_CC"; then
+else $as_nop
+  if test -n "$ac_ct_CC"; then
   ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -18965,8 +18368,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_CC=$ac_cv_prog_ac_ct_CC
 if test -n "$ac_ct_CC"; then
@@ -18995,10 +18397,10 @@ fi
 fi
 
 
-test -z "$CC" && { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
+test -z "$CC" && { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
 as_fn_error $? "no acceptable C compiler found in \$PATH
-See 'config.log' for more details" "$LINENO" 5; }
+See \`config.log' for more details" "$LINENO" 5; }
 
 # Provide some information about the compiler.
 printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for C compiler version" >&5
@@ -19030,8 +18432,8 @@ printf %s "checking whether the compiler supports GNU C... " >&6; }
 if test ${ac_cv_c_compiler_gnu+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 int
@@ -19048,14 +18450,12 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   ac_compiler_gnu=yes
-else case e in #(
-  e) ac_compiler_gnu=no ;;
-esac
+else $as_nop
+  ac_compiler_gnu=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 ac_cv_c_compiler_gnu=$ac_compiler_gnu
- ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_compiler_gnu" >&5
 printf "%s\n" "$ac_cv_c_compiler_gnu" >&6; }
@@ -19073,8 +18473,8 @@ printf %s "checking whether $CC accepts -g... " >&6; }
 if test ${ac_cv_prog_cc_g+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_save_c_werror_flag=$ac_c_werror_flag
+else $as_nop
+  ac_save_c_werror_flag=$ac_c_werror_flag
    ac_c_werror_flag=yes
    ac_cv_prog_cc_g=no
    CFLAGS="-g"
@@ -19092,8 +18492,8 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   ac_cv_prog_cc_g=yes
-else case e in #(
-  e) CFLAGS=""
+else $as_nop
+  CFLAGS=""
       cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
@@ -19108,8 +18508,8 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
 
-else case e in #(
-  e) ac_c_werror_flag=$ac_save_c_werror_flag
+else $as_nop
+  ac_c_werror_flag=$ac_save_c_werror_flag
 	 CFLAGS="-g"
 	 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -19126,15 +18526,12 @@ if ac_fn_c_try_compile "$LINENO"
 then :
   ac_cv_prog_cc_g=yes
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-   ac_c_werror_flag=$ac_save_c_werror_flag ;;
-esac
+   ac_c_werror_flag=$ac_save_c_werror_flag
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_g" >&5
 printf "%s\n" "$ac_cv_prog_cc_g" >&6; }
@@ -19161,8 +18558,8 @@ printf %s "checking for $CC option to enable C11 features... " >&6; }
 if test ${ac_cv_prog_cc_c11+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_cv_prog_cc_c11=no
+else $as_nop
+  ac_cv_prog_cc_c11=no
 ac_save_CC=$CC
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -19179,28 +18576,25 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam
   test "x$ac_cv_prog_cc_c11" != "xno" && break
 done
 rm -f conftest.$ac_ext
-CC=$ac_save_CC ;;
-esac
+CC=$ac_save_CC
 fi
 
 if test "x$ac_cv_prog_cc_c11" = xno
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5
 printf "%s\n" "unsupported" >&6; }
-else case e in #(
-  e) if test "x$ac_cv_prog_cc_c11" = x
+else $as_nop
+  if test "x$ac_cv_prog_cc_c11" = x
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5
 printf "%s\n" "none needed" >&6; }
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c11" >&5
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c11" >&5
 printf "%s\n" "$ac_cv_prog_cc_c11" >&6; }
-     CC="$CC $ac_cv_prog_cc_c11" ;;
-esac
+     CC="$CC $ac_cv_prog_cc_c11"
 fi
   ac_cv_prog_cc_stdc=$ac_cv_prog_cc_c11
-  ac_prog_cc_stdc=c11 ;;
-esac
+  ac_prog_cc_stdc=c11
 fi
 fi
 if test x$ac_prog_cc_stdc = xno
@@ -19210,8 +18604,8 @@ printf %s "checking for $CC option to enable C99 features... " >&6; }
 if test ${ac_cv_prog_cc_c99+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_cv_prog_cc_c99=no
+else $as_nop
+  ac_cv_prog_cc_c99=no
 ac_save_CC=$CC
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -19228,28 +18622,25 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam
   test "x$ac_cv_prog_cc_c99" != "xno" && break
 done
 rm -f conftest.$ac_ext
-CC=$ac_save_CC ;;
-esac
+CC=$ac_save_CC
 fi
 
 if test "x$ac_cv_prog_cc_c99" = xno
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5
 printf "%s\n" "unsupported" >&6; }
-else case e in #(
-  e) if test "x$ac_cv_prog_cc_c99" = x
+else $as_nop
+  if test "x$ac_cv_prog_cc_c99" = x
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5
 printf "%s\n" "none needed" >&6; }
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c99" >&5
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c99" >&5
 printf "%s\n" "$ac_cv_prog_cc_c99" >&6; }
-     CC="$CC $ac_cv_prog_cc_c99" ;;
-esac
+     CC="$CC $ac_cv_prog_cc_c99"
 fi
   ac_cv_prog_cc_stdc=$ac_cv_prog_cc_c99
-  ac_prog_cc_stdc=c99 ;;
-esac
+  ac_prog_cc_stdc=c99
 fi
 fi
 if test x$ac_prog_cc_stdc = xno
@@ -19259,8 +18650,8 @@ printf %s "checking for $CC option to enable C89 features... " >&6; }
 if test ${ac_cv_prog_cc_c89+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_cv_prog_cc_c89=no
+else $as_nop
+  ac_cv_prog_cc_c89=no
 ac_save_CC=$CC
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -19277,28 +18668,25 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam
   test "x$ac_cv_prog_cc_c89" != "xno" && break
 done
 rm -f conftest.$ac_ext
-CC=$ac_save_CC ;;
-esac
+CC=$ac_save_CC
 fi
 
 if test "x$ac_cv_prog_cc_c89" = xno
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5
 printf "%s\n" "unsupported" >&6; }
-else case e in #(
-  e) if test "x$ac_cv_prog_cc_c89" = x
+else $as_nop
+  if test "x$ac_cv_prog_cc_c89" = x
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5
 printf "%s\n" "none needed" >&6; }
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c89" >&5
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c89" >&5
 printf "%s\n" "$ac_cv_prog_cc_c89" >&6; }
-     CC="$CC $ac_cv_prog_cc_c89" ;;
-esac
+     CC="$CC $ac_cv_prog_cc_c89"
 fi
   ac_cv_prog_cc_stdc=$ac_cv_prog_cc_c89
-  ac_prog_cc_stdc=c89 ;;
-esac
+  ac_prog_cc_stdc=c89
 fi
 fi
 
@@ -19319,8 +18707,8 @@ printf %s "checking whether $CC understands -c and -o together... " >&6; }
 if test ${am_cv_prog_cc_c_o+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 int
@@ -19350,8 +18738,7 @@ _ACEOF
     fi
   done
   rm -f core conftest*
-  unset am_i ;;
-esac
+  unset am_i
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_prog_cc_c_o" >&5
 printf "%s\n" "$am_cv_prog_cc_c_o" >&6; }
@@ -19376,8 +18763,8 @@ printf %s "checking for egrep... " >&6; }
 if test ${ac_cv_path_EGREP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if echo a | $GREP -E '(a|b)' >/dev/null 2>&1
+else $as_nop
+  if echo a | $GREP -E '(a|b)' >/dev/null 2>&1
    then ac_cv_path_EGREP="$GREP -E"
    else
      if test -z "$EGREP"; then
@@ -19399,10 +18786,9 @@ do
       as_fn_executable_p "$ac_path_EGREP" || continue
 # Check for GNU ac_path_EGREP and select it if it is found.
   # Check for GNU $ac_path_EGREP
-case `"$ac_path_EGREP" --version 2>&1` in #(
+case `"$ac_path_EGREP" --version 2>&1` in
 *GNU*)
   ac_cv_path_EGREP="$ac_path_EGREP" ac_path_EGREP_found=:;;
-#(
 *)
   ac_count=0
   printf %s 0123456789 >"conftest.in"
@@ -19438,15 +18824,12 @@ else
   ac_cv_path_EGREP=$EGREP
 fi
 
-   fi ;;
-esac
+   fi
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_EGREP" >&5
 printf "%s\n" "$ac_cv_path_EGREP" >&6; }
  EGREP="$ac_cv_path_EGREP"
 
-         EGREP_TRADITIONAL=$EGREP
- ac_cv_path_EGREP_TRADITIONAL=$EGREP
 
 
 
@@ -19455,8 +18838,8 @@ printf %s "checking for C compiler vendor... " >&6; }
 if test ${ax_cv_c_compiler_vendor+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)
+else $as_nop
+
 	vendors="
 		intel:		__ICC,__ECC,__INTEL_COMPILER
 		ibm:		__xlc__,__xlC__,__IBMC__,__IBMCPP__,__ibmxl__
@@ -19517,8 +18900,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 	done
 
 	ax_cv_c_compiler_vendor=`echo $vendor | cut -d: -f1`
-     ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_c_compiler_vendor" >&5
 printf "%s\n" "$ax_cv_c_compiler_vendor" >&6; }
@@ -19565,9 +18947,8 @@ then :
       IFS=$lt_save_ifs
       ;;
     esac
-else case e in #(
-  e) enable_shared=yes ;;
-esac
+else $as_nop
+  enable_shared=yes
 fi
 
 
@@ -19598,9 +18979,8 @@ then :
       IFS=$lt_save_ifs
       ;;
     esac
-else case e in #(
-  e) enable_static=no ;;
-esac
+else $as_nop
+  enable_static=no
 fi
 
 
@@ -19608,8 +18988,8 @@ fi
 
 
 
-else case e in #(
-  e) # Check whether --enable-static was given.
+else $as_nop
+  # Check whether --enable-static was given.
 if test ${enable_static+y}
 then :
   enableval=$enable_static; p=${PACKAGE-default}
@@ -19629,26 +19009,23 @@ then :
       IFS=$lt_save_ifs
       ;;
     esac
-else case e in #(
-  e) enable_static=yes ;;
-esac
+else $as_nop
+  enable_static=yes
 fi
 
 
 
 
 
- ;;
-esac
+
 fi
 
 # Check whether --enable-warnings was given.
 if test ${enable_warnings+y}
 then :
   enableval=$enable_warnings;  enable_warnings=${enableval}
-else case e in #(
-  e)  enable_warnings=yes  ;;
-esac
+else $as_nop
+   enable_warnings=yes
 fi
 
 
@@ -19665,16 +19042,15 @@ then :
       esac
     fi
 
-else case e in #(
-  e)
+else $as_nop
+
     if test "x$enable_shared" = "xyes" ; then
       symbol_hiding="maybe"
     else
       symbol_hiding="no"
     fi
 
- ;;
-esac
+
 fi
 
 
@@ -19682,15 +19058,14 @@ fi
 if test ${enable_tests+y}
 then :
   enableval=$enable_tests;  build_tests="$enableval"
-else case e in #(
-  e)  if test "x$HAVE_CXX14" = "x1" && test "x$cross_compiling" = "xno" ; then
+else $as_nop
+   if test "x$HAVE_CXX14" = "x1" && test "x$cross_compiling" = "xno" ; then
       build_tests="maybe"
     else
       build_tests="no"
     fi
 
- ;;
-esac
+
 fi
 
 
@@ -19698,9 +19073,8 @@ fi
 if test ${enable_cares_threads+y}
 then :
   enableval=$enable_cares_threads;  CARES_THREADS=${enableval}
-else case e in #(
-  e)  CARES_THREADS=yes  ;;
-esac
+else $as_nop
+   CARES_THREADS=yes
 fi
 
 
@@ -19709,10 +19083,9 @@ fi
 if test ${with_random+y}
 then :
   withval=$with_random;  CARES_RANDOM_FILE="$withval"
-else case e in #(
-  e)  CARES_RANDOM_FILE="/dev/urandom"
- ;;
-esac
+else $as_nop
+   CARES_RANDOM_FILE="/dev/urandom"
+
 fi
 
 if test -n "$CARES_RANDOM_FILE" && test X"$CARES_RANDOM_FILE" != Xno ; then
@@ -19729,9 +19102,8 @@ printf %s "checking whether to enable maintainer-specific portions of Makefiles.
 if test ${enable_maintainer_mode+y}
 then :
   enableval=$enable_maintainer_mode; USE_MAINTAINER_MODE=$enableval
-else case e in #(
-  e) USE_MAINTAINER_MODE=no ;;
-esac
+else $as_nop
+  USE_MAINTAINER_MODE=no
 fi
 
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $USE_MAINTAINER_MODE" >&5
@@ -19747,8 +19119,47 @@ fi
   MAINT=$MAINTAINER_MODE_TRUE
 
 
+# Check whether --enable-silent-rules was given.
+if test ${enable_silent_rules+y}
+then :
+  enableval=$enable_silent_rules;
+fi
+
+case $enable_silent_rules in # (((
+  yes) AM_DEFAULT_VERBOSITY=0;;
+   no) AM_DEFAULT_VERBOSITY=1;;
+    *) AM_DEFAULT_VERBOSITY=0;;
+esac
+am_make=${MAKE-make}
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether $am_make supports nested variables" >&5
+printf %s "checking whether $am_make supports nested variables... " >&6; }
+if test ${am_cv_make_support_nested_variables+y}
+then :
+  printf %s "(cached) " >&6
+else $as_nop
+  if printf "%s\n" 'TRUE=$(BAR$(V))
+BAR0=false
+BAR1=true
+V=1
+am__doit:
+	@$(TRUE)
+.PHONY: am__doit' | $am_make -f - >/dev/null 2>&1; then
+  am_cv_make_support_nested_variables=yes
+else
+  am_cv_make_support_nested_variables=no
+fi
+fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_make_support_nested_variables" >&5
+printf "%s\n" "$am_cv_make_support_nested_variables" >&6; }
+if test $am_cv_make_support_nested_variables = yes; then
+    AM_V='$(V)'
+  AM_DEFAULT_V='$(AM_DEFAULT_VERBOSITY)'
+else
+  AM_V=$AM_DEFAULT_VERBOSITY
+  AM_DEFAULT_V=$AM_DEFAULT_VERBOSITY
+fi
+AM_BACKSLASH='\'
 
-AM_DEFAULT_VERBOSITY=0
 
 
 
@@ -19774,9 +19185,8 @@ AM_DEFAULT_VERBOSITY=0
 if test ${with_gcov+y}
 then :
   withval=$with_gcov; _AX_CODE_COVERAGE_GCOV_PROG_WITH=$with_gcov
-else case e in #(
-  e) _AX_CODE_COVERAGE_GCOV_PROG_WITH=gcov ;;
-esac
+else $as_nop
+  _AX_CODE_COVERAGE_GCOV_PROG_WITH=gcov
 fi
 
 
@@ -19786,9 +19196,8 @@ printf %s "checking whether to build with code coverage support... " >&6; }
 if test ${enable_code_coverage+y}
 then :
   enableval=$enable_code_coverage;
-else case e in #(
-  e) enable_code_coverage=no ;;
-esac
+else $as_nop
+  enable_code_coverage=no
 fi
 
 
@@ -19818,8 +19227,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_AWK+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$AWK"; then
+else $as_nop
+  if test -n "$AWK"; then
   ac_cv_prog_AWK="$AWK" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -19841,8 +19250,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 AWK=$ac_cv_prog_AWK
 if test -n "$AWK"; then
@@ -19862,8 +19270,8 @@ printf %s "checking for GNU make... " >&6; }
 if test ${_cv_gnu_make_command+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)     _cv_gnu_make_command="" ;
+else $as_nop
+      _cv_gnu_make_command="" ;
     for a in "$MAKE" make gmake gnumake ; do
       if test -z "$a" ; then continue ; fi ;
       if "$a" --version 2> /dev/null | grep GNU 2>&1 > /dev/null ; then
@@ -19872,31 +19280,27 @@ else case e in #(
         ax_check_gnu_make_version=$(echo ${AX_CHECK_GNU_MAKE_HEADLINE} | ${AWK} -F " " '{ print $(NF); }')
         break ;
       fi
-    done ; ;;
-esac
+    done ;
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $_cv_gnu_make_command" >&5
 printf "%s\n" "$_cv_gnu_make_command" >&6; }
   if test "x$_cv_gnu_make_command" = x""
 then :
   ifGNUmake="#"
-else case e in #(
-  e) ifGNUmake="" ;;
-esac
+else $as_nop
+  ifGNUmake=""
 fi
   if test "x$_cv_gnu_make_command" = x""
 then :
   ifnGNUmake=""
-else case e in #(
-  e) ifnGNUmake="#" ;;
-esac
+else $as_nop
+  ifnGNUmake="#"
 fi
   if test "x$_cv_gnu_make_command" = x""
 then :
   { ax_cv_gnu_make_command=; unset ax_cv_gnu_make_command;}
-else case e in #(
-  e) ax_cv_gnu_make_command=${_cv_gnu_make_command} ;;
-esac
+else $as_nop
+  ax_cv_gnu_make_command=${_cv_gnu_make_command}
 fi
   if test "x$_cv_gnu_make_command" = x""
 then :
@@ -19915,8 +19319,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_GCOV+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$GCOV"; then
+else $as_nop
+  if test -n "$GCOV"; then
   ac_cv_prog_GCOV="$GCOV" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -19938,8 +19342,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 GCOV=$ac_cv_prog_GCOV
 if test -n "$GCOV"; then
@@ -19961,8 +19364,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ac_ct_GCOV+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ac_ct_GCOV"; then
+else $as_nop
+  if test -n "$ac_ct_GCOV"; then
   ac_cv_prog_ac_ct_GCOV="$ac_ct_GCOV" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -19984,8 +19387,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 ac_ct_GCOV=$ac_cv_prog_ac_ct_GCOV
 if test -n "$ac_ct_GCOV"; then
@@ -20031,8 +19433,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_LCOV+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$LCOV"; then
+else $as_nop
+  if test -n "$LCOV"; then
   ac_cv_prog_LCOV="$LCOV" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -20054,8 +19456,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 LCOV=$ac_cv_prog_LCOV
 if test -n "$LCOV"; then
@@ -20074,8 +19475,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_GENHTML+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$GENHTML"; then
+else $as_nop
+  if test -n "$GENHTML"; then
   ac_cv_prog_GENHTML="$GENHTML" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -20097,8 +19498,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 GENHTML=$ac_cv_prog_GENHTML
 if test -n "$GENHTML"; then
@@ -20153,34 +19553,31 @@ if test ${enable_largefile+y}
 then :
   enableval=$enable_largefile;
 fi
-if test "$enable_largefile,$enable_year2038" != no,no
-then :
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $CC option to enable large file support" >&5
-printf %s "checking for $CC option to enable large file support... " >&6; }
-if test ${ac_cv_sys_largefile_opts+y}
+
+if test "$enable_largefile" != no; then
+
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for special C compiler options needed for large files" >&5
+printf %s "checking for special C compiler options needed for large files... " >&6; }
+if test ${ac_cv_sys_largefile_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_save_CC="$CC"
-  ac_opt_found=no
-  for ac_opt in "none needed" "-D_FILE_OFFSET_BITS=64" "-D_LARGE_FILES=1" "-n32"; do
-    if test x"$ac_opt" != x"none needed"
-then :
-  CC="$ac_save_CC $ac_opt"
-fi
-    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  ac_cv_sys_largefile_CC=no
+     if test "$GCC" != yes; then
+       ac_save_CC=$CC
+       while :; do
+	 # IRIX 6.2 and later do not support large files by default,
+	 # so use the C compiler's -n32 option if that helps.
+	 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include <sys/types.h>
-#ifndef FTYPE
-# define FTYPE off_t
-#endif
- /* Check that FTYPE can represent 2**63 - 1 correctly.
-    We can't simply define LARGE_FTYPE to be 9223372036854775807,
+ /* Check that off_t can represent 2**63 - 1 correctly.
+    We can't simply define LARGE_OFF_T to be 9223372036854775807,
     since some C++ compilers masquerading as C compilers
     incorrectly reject 9223372036854775807.  */
-#define LARGE_FTYPE (((FTYPE) 1 << 31 << 31) - 1 + ((FTYPE) 1 << 31 << 31))
-  int FTYPE_is_large[(LARGE_FTYPE % 2147483629 == 721
-		       && LARGE_FTYPE % 2147483647 == 1)
+#define LARGE_OFF_T (((off_t) 1 << 31 << 31) - 1 + ((off_t) 1 << 31 << 31))
+  int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
+		       && LARGE_OFF_T % 2147483647 == 1)
 		      ? 1 : -1];
 int
 main (void)
@@ -20190,88 +19587,142 @@ main (void)
   return 0;
 }
 _ACEOF
-if ac_fn_c_try_compile "$LINENO"
-then :
-  if test x"$ac_opt" = x"none needed"
-then :
-  # GNU/Linux s390x and alpha need _FILE_OFFSET_BITS=64 for wide ino_t.
-	 CC="$CC -DFTYPE=ino_t"
 	 if ac_fn_c_try_compile "$LINENO"
 then :
-
-else case e in #(
-  e) CC="$CC -D_FILE_OFFSET_BITS=64"
-	    if ac_fn_c_try_compile "$LINENO"
-then :
-  ac_opt='-D_FILE_OFFSET_BITS=64'
+  break
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam
+	 CC="$CC -n32"
+	 if ac_fn_c_try_compile "$LINENO"
+then :
+  ac_cv_sys_largefile_CC=' -n32'; break
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam
+	 break
+       done
+       CC=$ac_save_CC
+       rm -f conftest.$ac_ext
+    fi
 fi
-      ac_cv_sys_largefile_opts=$ac_opt
-      ac_opt_found=yes
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_sys_largefile_CC" >&5
+printf "%s\n" "$ac_cv_sys_largefile_CC" >&6; }
+  if test "$ac_cv_sys_largefile_CC" != no; then
+    CC=$CC$ac_cv_sys_largefile_CC
+  fi
+
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for _FILE_OFFSET_BITS value needed for large files" >&5
+printf %s "checking for _FILE_OFFSET_BITS value needed for large files... " >&6; }
+if test ${ac_cv_sys_file_offset_bits+y}
+then :
+  printf %s "(cached) " >&6
+else $as_nop
+  while :; do
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <sys/types.h>
+ /* Check that off_t can represent 2**63 - 1 correctly.
+    We can't simply define LARGE_OFF_T to be 9223372036854775807,
+    since some C++ compilers masquerading as C compilers
+    incorrectly reject 9223372036854775807.  */
+#define LARGE_OFF_T (((off_t) 1 << 31 << 31) - 1 + ((off_t) 1 << 31 << 31))
+  int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
+		       && LARGE_OFF_T % 2147483647 == 1)
+		      ? 1 : -1];
+int
+main (void)
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"
+then :
+  ac_cv_sys_file_offset_bits=no; break
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-    test $ac_opt_found = no || break
-  done
-  CC="$ac_save_CC"
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#define _FILE_OFFSET_BITS 64
+#include <sys/types.h>
+ /* Check that off_t can represent 2**63 - 1 correctly.
+    We can't simply define LARGE_OFF_T to be 9223372036854775807,
+    since some C++ compilers masquerading as C compilers
+    incorrectly reject 9223372036854775807.  */
+#define LARGE_OFF_T (((off_t) 1 << 31 << 31) - 1 + ((off_t) 1 << 31 << 31))
+  int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
+		       && LARGE_OFF_T % 2147483647 == 1)
+		      ? 1 : -1];
+int
+main (void)
+{
 
-  test $ac_opt_found = yes || ac_cv_sys_largefile_opts="support not detected" ;;
-esac
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"
+then :
+  ac_cv_sys_file_offset_bits=64; break
 fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_sys_largefile_opts" >&5
-printf "%s\n" "$ac_cv_sys_largefile_opts" >&6; }
-
-ac_have_largefile=yes
-case $ac_cv_sys_largefile_opts in #(
-  "none needed") :
-     ;; #(
-  "supported through gnulib") :
-     ;; #(
-  "support not detected") :
-    ac_have_largefile=no ;; #(
-  "-D_FILE_OFFSET_BITS=64") :
-
-printf "%s\n" "#define _FILE_OFFSET_BITS 64" >>confdefs.h
- ;; #(
-  "-D_LARGE_FILES=1") :
-
-printf "%s\n" "#define _LARGE_FILES 1" >>confdefs.h
- ;; #(
-  "-n32") :
-    CC="$CC -n32" ;; #(
-  *) :
-    as_fn_error $? "internal error: bad value for \$ac_cv_sys_largefile_opts" "$LINENO" 5 ;;
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
+  ac_cv_sys_file_offset_bits=unknown
+  break
+done
+fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_sys_file_offset_bits" >&5
+printf "%s\n" "$ac_cv_sys_file_offset_bits" >&6; }
+case $ac_cv_sys_file_offset_bits in #(
+  no | unknown) ;;
+  *)
+printf "%s\n" "#define _FILE_OFFSET_BITS $ac_cv_sys_file_offset_bits" >>confdefs.h
+;;
 esac
-
-if test "$enable_year2038" != no
-then :
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $CC option for timestamps after 2038" >&5
-printf %s "checking for $CC option for timestamps after 2038... " >&6; }
-if test ${ac_cv_sys_year2038_opts+y}
+rm -rf conftest*
+  if test $ac_cv_sys_file_offset_bits = unknown; then
+    { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for _LARGE_FILES value needed for large files" >&5
+printf %s "checking for _LARGE_FILES value needed for large files... " >&6; }
+if test ${ac_cv_sys_large_files+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_save_CPPFLAGS="$CPPFLAGS"
-  ac_opt_found=no
-  for ac_opt in "none needed" "-D_TIME_BITS=64" "-D__MINGW_USE_VC2005_COMPAT" "-U_USE_32_BIT_TIME_T -D__MINGW_USE_VC2005_COMPAT"; do
-    if test x"$ac_opt" != x"none needed"
+else $as_nop
+  while :; do
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <sys/types.h>
+ /* Check that off_t can represent 2**63 - 1 correctly.
+    We can't simply define LARGE_OFF_T to be 9223372036854775807,
+    since some C++ compilers masquerading as C compilers
+    incorrectly reject 9223372036854775807.  */
+#define LARGE_OFF_T (((off_t) 1 << 31 << 31) - 1 + ((off_t) 1 << 31 << 31))
+  int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
+		       && LARGE_OFF_T % 2147483647 == 1)
+		      ? 1 : -1];
+int
+main (void)
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"
 then :
-  CPPFLAGS="$ac_save_CPPFLAGS $ac_opt"
+  ac_cv_sys_large_files=no; break
 fi
-    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
-
-  #include <time.h>
-  /* Check that time_t can represent 2**32 - 1 correctly.  */
-  #define LARGE_TIME_T \\
-    ((time_t) (((time_t) 1 << 30) - 1 + 3 * ((time_t) 1 << 30)))
-  int verify_time_t_range[(LARGE_TIME_T / 65537 == 65535
-                           && LARGE_TIME_T % 65537 == 0)
-                          ? 1 : -1];
-
+#define _LARGE_FILES 1
+#include <sys/types.h>
+ /* Check that off_t can represent 2**63 - 1 correctly.
+    We can't simply define LARGE_OFF_T to be 9223372036854775807,
+    since some C++ compilers masquerading as C compilers
+    incorrectly reject 9223372036854775807.  */
+#define LARGE_OFF_T (((off_t) 1 << 31 << 31) - 1 + ((off_t) 1 << 31 << 31))
+  int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
+		       && LARGE_OFF_T % 2147483647 == 1)
+		      ? 1 : -1];
 int
 main (void)
 {
@@ -20282,47 +19733,25 @@ main (void)
 _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
-  ac_cv_sys_year2038_opts="$ac_opt"
-      ac_opt_found=yes
+  ac_cv_sys_large_files=1; break
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-    test $ac_opt_found = no || break
-  done
-  CPPFLAGS="$ac_save_CPPFLAGS"
-  test $ac_opt_found = yes || ac_cv_sys_year2038_opts="support not detected" ;;
-esac
+  ac_cv_sys_large_files=unknown
+  break
+done
 fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_sys_year2038_opts" >&5
-printf "%s\n" "$ac_cv_sys_year2038_opts" >&6; }
-
-ac_have_year2038=yes
-case $ac_cv_sys_year2038_opts in #(
-  "none needed") :
-     ;; #(
-  "support not detected") :
-    ac_have_year2038=no ;; #(
-  "-D_TIME_BITS=64") :
-
-printf "%s\n" "#define _TIME_BITS 64" >>confdefs.h
- ;; #(
-  "-D__MINGW_USE_VC2005_COMPAT") :
-
-printf "%s\n" "#define __MINGW_USE_VC2005_COMPAT 1" >>confdefs.h
- ;; #(
-  "-U_USE_32_BIT_TIME_T"*) :
-    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
-as_fn_error $? "the 'time_t' type is currently forced to be 32-bit. It
-will stop working after mid-January 2038. Remove
-_USE_32BIT_TIME_T from the compiler flags.
-See 'config.log' for more details" "$LINENO" 5; } ;; #(
-  *) :
-    as_fn_error $? "internal error: bad value for \$ac_cv_sys_year2038_opts" "$LINENO" 5 ;;
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_sys_large_files" >&5
+printf "%s\n" "$ac_cv_sys_large_files" >&6; }
+case $ac_cv_sys_large_files in #(
+  no | unknown) ;;
+  *)
+printf "%s\n" "#define _LARGE_FILES $ac_cv_sys_large_files" >>confdefs.h
+;;
 esac
-
+rm -rf conftest*
+  fi
 fi
 
-fi
 
 case $host_os in
   solaris*)
@@ -20340,14 +19769,14 @@ case $host_os in
 
 
 for flag in -mimpure-text; do
-  as_CACHEVAR=`printf "%s\n" "ax_cv_check_ldflags__$flag" | sed "$as_sed_sh"`
+  as_CACHEVAR=`printf "%s\n" "ax_cv_check_ldflags__$flag" | $as_tr_sh`
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the linker accepts $flag" >&5
 printf %s "checking whether the linker accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)
+else $as_nop
+
   ax_check_save_flags=$LDFLAGS
   LDFLAGS="$LDFLAGS  $flag"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -20364,14 +19793,12 @@ _ACEOF
 if ac_fn_c_try_link "$LINENO"
 then :
   eval "$as_CACHEVAR=yes"
-else case e in #(
-  e) eval "$as_CACHEVAR=no" ;;
-esac
+else $as_nop
+  eval "$as_CACHEVAR=no"
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
-  LDFLAGS=$ax_check_save_flags ;;
-esac
+  LDFLAGS=$ax_check_save_flags
 fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -20397,14 +19824,12 @@ then :
       LDFLAGS="$LDFLAGS $flag"
       ;;
    esac
-else case e in #(
-  e) LDFLAGS="$flag" ;;
-esac
+else $as_nop
+  LDFLAGS="$flag"
 fi
 
-else case e in #(
-  e) : ;;
-esac
+else $as_nop
+  :
 fi
 
 done
@@ -20464,9 +19889,8 @@ then :
       AM_CPPFLAGS="$AM_CPPFLAGS -DCARES_STATICLIB"
       ;;
    esac
-else case e in #(
-  e) AM_CPPFLAGS="-DCARES_STATICLIB" ;;
-esac
+else $as_nop
+  AM_CPPFLAGS="-DCARES_STATICLIB"
 fi
 
     PKGCONFIG_CFLAGS="-DCARES_STATICLIB"
@@ -20491,8 +19915,8 @@ printf %s "checking whether C compiler accepts ... " >&6; }
 if test ${ax_cv_check_cflags__+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)
+else $as_nop
+
   ax_check_save_flags=$CFLAGS
   CFLAGS="$CFLAGS  "
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -20509,35 +19933,32 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   ax_cv_check_cflags__=yes
-else case e in #(
-  e) ax_cv_check_cflags__=no ;;
-esac
+else $as_nop
+  ax_cv_check_cflags__=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-  CFLAGS=$ax_check_save_flags ;;
-esac
+  CFLAGS=$ax_check_save_flags
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_check_cflags__" >&5
 printf "%s\n" "$ax_cv_check_cflags__" >&6; }
 if test x"$ax_cv_check_cflags__" = xyes
 then :
   :
-else case e in #(
-  e) : ;;
-esac
+else $as_nop
+  :
 fi
 
 
 
 for flag in -fvisibility=hidden; do
-  as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags__$flag" | sed "$as_sed_sh"`
+  as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags__$flag" | $as_tr_sh`
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether C compiler accepts $flag" >&5
 printf %s "checking whether C compiler accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)
+else $as_nop
+
   ax_check_save_flags=$CFLAGS
   CFLAGS="$CFLAGS  $flag"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -20554,13 +19975,11 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   eval "$as_CACHEVAR=yes"
-else case e in #(
-  e) eval "$as_CACHEVAR=no" ;;
-esac
+else $as_nop
+  eval "$as_CACHEVAR=no"
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-  CFLAGS=$ax_check_save_flags ;;
-esac
+  CFLAGS=$ax_check_save_flags
 fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -20586,14 +20005,12 @@ then :
       CARES_SYMBOL_HIDING_CFLAG="$CARES_SYMBOL_HIDING_CFLAG $flag"
       ;;
    esac
-else case e in #(
-  e) CARES_SYMBOL_HIDING_CFLAG="$flag" ;;
-esac
+else $as_nop
+  CARES_SYMBOL_HIDING_CFLAG="$flag"
 fi
 
-else case e in #(
-  e) : ;;
-esac
+else $as_nop
+  :
 fi
 
 done
@@ -20606,14 +20023,14 @@ done
 
 
 for flag in -xldscope=hidden; do
-  as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags__$flag" | sed "$as_sed_sh"`
+  as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags__$flag" | $as_tr_sh`
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether C compiler accepts $flag" >&5
 printf %s "checking whether C compiler accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)
+else $as_nop
+
   ax_check_save_flags=$CFLAGS
   CFLAGS="$CFLAGS  $flag"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -20630,13 +20047,11 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   eval "$as_CACHEVAR=yes"
-else case e in #(
-  e) eval "$as_CACHEVAR=no" ;;
-esac
+else $as_nop
+  eval "$as_CACHEVAR=no"
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-  CFLAGS=$ax_check_save_flags ;;
-esac
+  CFLAGS=$ax_check_save_flags
 fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -20662,14 +20077,12 @@ then :
       CARES_SYMBOL_HIDING_CFLAG="$CARES_SYMBOL_HIDING_CFLAG $flag"
       ;;
    esac
-else case e in #(
-  e) CARES_SYMBOL_HIDING_CFLAG="$flag" ;;
-esac
+else $as_nop
+  CARES_SYMBOL_HIDING_CFLAG="$flag"
 fi
 
-else case e in #(
-  e) : ;;
-esac
+else $as_nop
+  :
 fi
 
 done
@@ -20708,14 +20121,14 @@ if test "$enable_warnings" = "yes"; then
 
 
 for flag in -Wall -Wextra -Waggregate-return -Wcast-align -Wcast-qual -Wconversion -Wdeclaration-after-statement -Wdouble-promotion -Wfloat-equal -Wformat-security -Winit-self -Wjump-misses-init -Wlogical-op -Wmissing-braces -Wmissing-declarations -Wmissing-format-attribute -Wmissing-include-dirs -Wmissing-prototypes -Wnested-externs -Wno-coverage-mismatch -Wold-style-definition -Wpacked -Wpedantic -Wpointer-arith -Wredundant-decls -Wshadow -Wsign-conversion -Wstrict-overflow -Wstrict-prototypes -Wtrampolines -Wundef -Wunreachable-code -Wunused -Wvariadic-macros -Wvla -Wwrite-strings -Werror=implicit-int -Werror=implicit-function-declaration -Werror=partial-availability -Wno-long-long ; do
-  as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags_-Werror_$flag" | sed "$as_sed_sh"`
+  as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags_-Werror_$flag" | $as_tr_sh`
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether C compiler accepts $flag" >&5
 printf %s "checking whether C compiler accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)
+else $as_nop
+
   ax_check_save_flags=$CFLAGS
   CFLAGS="$CFLAGS -Werror $flag"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -20732,13 +20145,11 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   eval "$as_CACHEVAR=yes"
-else case e in #(
-  e) eval "$as_CACHEVAR=no" ;;
-esac
+else $as_nop
+  eval "$as_CACHEVAR=no"
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-  CFLAGS=$ax_check_save_flags ;;
-esac
+  CFLAGS=$ax_check_save_flags
 fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -20764,14 +20175,12 @@ then :
       AM_CFLAGS="$AM_CFLAGS $flag"
       ;;
    esac
-else case e in #(
-  e) AM_CFLAGS="$flag" ;;
-esac
+else $as_nop
+  AM_CFLAGS="$flag"
 fi
 
-else case e in #(
-  e) : ;;
-esac
+else $as_nop
+  :
 fi
 
 done
@@ -20782,14 +20191,14 @@ done
 
 
 for flag in -std=c99; do
-  as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags_-Werror_$flag" | sed "$as_sed_sh"`
+  as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags_-Werror_$flag" | $as_tr_sh`
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether C compiler accepts $flag" >&5
 printf %s "checking whether C compiler accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)
+else $as_nop
+
   ax_check_save_flags=$CFLAGS
   CFLAGS="$CFLAGS -Werror $flag"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -20806,13 +20215,11 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   eval "$as_CACHEVAR=yes"
-else case e in #(
-  e) eval "$as_CACHEVAR=no" ;;
-esac
+else $as_nop
+  eval "$as_CACHEVAR=no"
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-  CFLAGS=$ax_check_save_flags ;;
-esac
+  CFLAGS=$ax_check_save_flags
 fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -20838,14 +20245,12 @@ then :
       AM_CFLAGS="$AM_CFLAGS $flag"
       ;;
    esac
-else case e in #(
-  e) AM_CFLAGS="$flag" ;;
-esac
+else $as_nop
+  AM_CFLAGS="$flag"
 fi
 
-else case e in #(
-  e) : ;;
-esac
+else $as_nop
+  :
 fi
 
 done
@@ -20855,14 +20260,14 @@ done
 
 
 for flag in -std=c90; do
-  as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags_-Werror_$flag" | sed "$as_sed_sh"`
+  as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags_-Werror_$flag" | $as_tr_sh`
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether C compiler accepts $flag" >&5
 printf %s "checking whether C compiler accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)
+else $as_nop
+
   ax_check_save_flags=$CFLAGS
   CFLAGS="$CFLAGS -Werror $flag"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -20879,13 +20284,11 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   eval "$as_CACHEVAR=yes"
-else case e in #(
-  e) eval "$as_CACHEVAR=no" ;;
-esac
+else $as_nop
+  eval "$as_CACHEVAR=no"
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-  CFLAGS=$ax_check_save_flags ;;
-esac
+  CFLAGS=$ax_check_save_flags
 fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -20911,14 +20314,12 @@ then :
       AM_CFLAGS="$AM_CFLAGS $flag"
       ;;
    esac
-else case e in #(
-  e) AM_CFLAGS="$flag" ;;
-esac
+else $as_nop
+  AM_CFLAGS="$flag"
 fi
 
-else case e in #(
-  e) : ;;
-esac
+else $as_nop
+  :
 fi
 
 done
@@ -20931,14 +20332,14 @@ if test "$ax_cv_c_compiler_vendor" = "intel"; then
 
 
 for flag in -shared-intel; do
-  as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags__$flag" | sed "$as_sed_sh"`
+  as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags__$flag" | $as_tr_sh`
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether C compiler accepts $flag" >&5
 printf %s "checking whether C compiler accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)
+else $as_nop
+
   ax_check_save_flags=$CFLAGS
   CFLAGS="$CFLAGS  $flag"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -20955,13 +20356,11 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
   eval "$as_CACHEVAR=yes"
-else case e in #(
-  e) eval "$as_CACHEVAR=no" ;;
-esac
+else $as_nop
+  eval "$as_CACHEVAR=no"
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-  CFLAGS=$ax_check_save_flags ;;
-esac
+  CFLAGS=$ax_check_save_flags
 fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -20987,14 +20386,12 @@ then :
       AM_CFLAGS="$AM_CFLAGS $flag"
       ;;
    esac
-else case e in #(
-  e) AM_CFLAGS="$flag" ;;
-esac
+else $as_nop
+  AM_CFLAGS="$flag"
 fi
 
-else case e in #(
-  e) : ;;
-esac
+else $as_nop
+  :
 fi
 
 done
@@ -21018,8 +20415,8 @@ if test -z "$CPP"; then
   if test ${ac_cv_prog_CPP+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)     # Double quotes because $CC needs to be expanded
+else $as_nop
+      # Double quotes because $CC needs to be expanded
     for CPP in "$CC -E" "$CC -E -traditional-cpp" cpp /lib/cpp
     do
       ac_preproc_ok=false
@@ -21037,10 +20434,9 @@ _ACEOF
 if ac_fn_c_try_cpp "$LINENO"
 then :
 
-else case e in #(
-  e) # Broken: fails on valid input.
-continue ;;
-esac
+else $as_nop
+  # Broken: fails on valid input.
+continue
 fi
 rm -f conftest.err conftest.i conftest.$ac_ext
 
@@ -21054,16 +20450,15 @@ if ac_fn_c_try_cpp "$LINENO"
 then :
   # Broken: success on invalid input.
 continue
-else case e in #(
-  e) # Passes both tests.
+else $as_nop
+  # Passes both tests.
 ac_preproc_ok=:
-break ;;
-esac
+break
 fi
 rm -f conftest.err conftest.i conftest.$ac_ext
 
 done
-# Because of 'break', _AC_PREPROC_IFELSE's cleaning code was skipped.
+# Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped.
 rm -f conftest.i conftest.err conftest.$ac_ext
 if $ac_preproc_ok
 then :
@@ -21072,8 +20467,7 @@ fi
 
     done
     ac_cv_prog_CPP=$CPP
-   ;;
-esac
+
 fi
   CPP=$ac_cv_prog_CPP
 else
@@ -21096,10 +20490,9 @@ _ACEOF
 if ac_fn_c_try_cpp "$LINENO"
 then :
 
-else case e in #(
-  e) # Broken: fails on valid input.
-continue ;;
-esac
+else $as_nop
+  # Broken: fails on valid input.
+continue
 fi
 rm -f conftest.err conftest.i conftest.$ac_ext
 
@@ -21113,26 +20506,24 @@ if ac_fn_c_try_cpp "$LINENO"
 then :
   # Broken: success on invalid input.
 continue
-else case e in #(
-  e) # Passes both tests.
+else $as_nop
+  # Passes both tests.
 ac_preproc_ok=:
-break ;;
-esac
+break
 fi
 rm -f conftest.err conftest.i conftest.$ac_ext
 
 done
-# Because of 'break', _AC_PREPROC_IFELSE's cleaning code was skipped.
+# Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped.
 rm -f conftest.i conftest.err conftest.$ac_ext
 if $ac_preproc_ok
 then :
 
-else case e in #(
-  e) { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
+else $as_nop
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
 as_fn_error $? "C preprocessor \"$CPP\" fails sanity check
-See 'config.log' for more details" "$LINENO" 5; } ;;
-esac
+See \`config.log' for more details" "$LINENO" 5; }
 fi
 
 ac_ext=c
@@ -21215,21 +20606,15 @@ printf %s "checking for library containing getservbyport... " >&6; }
 if test ${ac_cv_search_getservbyport+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_func_search_save_LIBS=$LIBS
+else $as_nop
+  ac_func_search_save_LIBS=$LIBS
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 /* Override any GCC internal prototype to avoid an error.
    Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.
-   The 'extern "C"' is for builds by C++ compilers;
-   although this is not generally supported in C code supporting it here
-   has little cost and some practical benefit (sr 110532).  */
-#ifdef __cplusplus
-extern "C"
-#endif
-char getservbyport (void);
+   builtin and then its argument prototype would still apply.  */
+char getservbyport ();
 int
 main (void)
 {
@@ -21260,13 +20645,11 @@ done
 if test ${ac_cv_search_getservbyport+y}
 then :
 
-else case e in #(
-  e) ac_cv_search_getservbyport=no ;;
-esac
+else $as_nop
+  ac_cv_search_getservbyport=no
 fi
 rm conftest.$ac_ext
-LIBS=$ac_func_search_save_LIBS ;;
-esac
+LIBS=$ac_func_search_save_LIBS
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_getservbyport" >&5
 printf "%s\n" "$ac_cv_search_getservbyport" >&6; }
@@ -21289,14 +20672,14 @@ case $host_os in
 
 
 for flag in -lxnet; do
-  as_CACHEVAR=`printf "%s\n" "ax_cv_check_ldflags__$flag" | sed "$as_sed_sh"`
+  as_CACHEVAR=`printf "%s\n" "ax_cv_check_ldflags__$flag" | $as_tr_sh`
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the linker accepts $flag" >&5
 printf %s "checking whether the linker accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)
+else $as_nop
+
   ax_check_save_flags=$LDFLAGS
   LDFLAGS="$LDFLAGS  $flag"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -21313,14 +20696,12 @@ _ACEOF
 if ac_fn_c_try_link "$LINENO"
 then :
   eval "$as_CACHEVAR=yes"
-else case e in #(
-  e) eval "$as_CACHEVAR=no" ;;
-esac
+else $as_nop
+  eval "$as_CACHEVAR=no"
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
-  LDFLAGS=$ax_check_save_flags ;;
-esac
+  LDFLAGS=$ax_check_save_flags
 fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -21346,14 +20727,12 @@ then :
       XNET_LIBS="$XNET_LIBS $flag"
       ;;
    esac
-else case e in #(
-  e) XNET_LIBS="$flag" ;;
-esac
+else $as_nop
+  XNET_LIBS="$flag"
 fi
 
-else case e in #(
-  e) : ;;
-esac
+else $as_nop
+  :
 fi
 
 done
@@ -21375,21 +20754,15 @@ printf %s "checking for library containing res_init... " >&6; }
 if test ${ac_cv_search_res_init+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_func_search_save_LIBS=$LIBS
+else $as_nop
+  ac_func_search_save_LIBS=$LIBS
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 /* Override any GCC internal prototype to avoid an error.
    Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.
-   The 'extern "C"' is for builds by C++ compilers;
-   although this is not generally supported in C code supporting it here
-   has little cost and some practical benefit (sr 110532).  */
-#ifdef __cplusplus
-extern "C"
-#endif
-char res_init (void);
+   builtin and then its argument prototype would still apply.  */
+char res_init ();
 int
 main (void)
 {
@@ -21420,13 +20793,11 @@ done
 if test ${ac_cv_search_res_init+y}
 then :
 
-else case e in #(
-  e) ac_cv_search_res_init=no ;;
-esac
+else $as_nop
+  ac_cv_search_res_init=no
 fi
 rm conftest.$ac_ext
-LIBS=$ac_func_search_save_LIBS ;;
-esac
+LIBS=$ac_func_search_save_LIBS
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_res_init" >&5
 printf "%s\n" "$ac_cv_search_res_init" >&6; }
@@ -21439,11 +20810,10 @@ then :
 printf "%s\n" "#define CARES_USE_LIBRESOLV 1" >>confdefs.h
 
 
-else case e in #(
-  e)
+else $as_nop
+
     as_fn_error $? "Unable to find libresolv which is required for z/OS" "$LINENO" 5
-   ;;
-esac
+
 fi
 
 
@@ -21484,12 +20854,11 @@ then :
 printf "%s\n" "yes" >&6; }
     ac_cv_ios_10="yes"
 
-else case e in #(
-  e)
+else $as_nop
+
     { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
 printf "%s\n" "no" >&6; }
-   ;;
-esac
+
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 
@@ -21532,12 +20901,11 @@ then :
 printf "%s\n" "yes" >&6; }
     ac_cv_macos_10_12="yes"
 
-else case e in #(
-  e)
+else $as_nop
+
     { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
 printf "%s\n" "no" >&6; }
-   ;;
-esac
+
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 
@@ -21559,11 +20927,10 @@ printf "%s\n" "yes" >&6; }
 printf "%s\n" "no" >&6; }
     ;;
   esac
-else case e in #(
-  e) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
+else $as_nop
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
 printf "%s\n" "no" >&6; }
- ;;
-esac
+
 fi
 
 
@@ -22417,6 +21784,31 @@ then :
   printf "%s\n" "#define HAVE_ARPA_INET_H 1" >>confdefs.h
 
 fi
+ac_fn_c_check_header_compile "$LINENO" "sys/system_properties.h" "ac_cv_header_sys_system_properties_h" "
+#ifdef HAVE_SYS_TYPES_H
+#include <sys/types.h>
+#endif
+#ifdef HAVE_SYS_TIME_H
+#include <sys/time.h>
+#endif
+#ifdef HAVE_ARPA_NAMESER_H
+#include <arpa/nameser.h>
+#endif
+
+#ifdef HAVE_SYS_SOCKET_H
+#include <sys/socket.h>
+#endif
+#ifdef HAVE_NETINET_IN_H
+#include <netinet/in.h>
+#endif
+
+
+"
+if test "x$ac_cv_header_sys_system_properties_h" = xyes
+then :
+  printf "%s\n" "#define HAVE_SYS_SYSTEM_PROPERTIES_H 1" >>confdefs.h
+
+fi
 
 
 
@@ -22504,6 +21896,9 @@ cares_all_includes="
 #ifdef HAVE_RESOLV_H
 #  include <resolv.h>
 #endif
+#ifdef HAVE_SYS_SYSTEM_PROPERTIES_H
+#  include <sys/system_properties.h>
+#endif
 #ifdef HAVE_IPHLPAPI_H
 #  include <iphlpapi.h>
 #endif
@@ -22529,8 +21924,8 @@ printf %s "checking for $CC options needed to detect all undeclared functions...
 if test ${ac_cv_c_undeclared_builtin_options+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_save_CFLAGS=$CFLAGS
+else $as_nop
+  ac_save_CFLAGS=$CFLAGS
    ac_cv_c_undeclared_builtin_options='cannot detect'
    for ac_arg in '' -fno-builtin; do
      CFLAGS="$ac_save_CFLAGS $ac_arg"
@@ -22549,8 +21944,8 @@ _ACEOF
 if ac_fn_c_try_compile "$LINENO"
 then :
 
-else case e in #(
-  e) # This test program should compile successfully.
+else $as_nop
+  # This test program should compile successfully.
         # No library function is consistently available on
         # freestanding implementations, so test against a dummy
         # declaration.  Include always-available headers on the
@@ -22578,29 +21973,26 @@ then :
   if test x"$ac_arg" = x
 then :
   ac_cv_c_undeclared_builtin_options='none needed'
-else case e in #(
-  e) ac_cv_c_undeclared_builtin_options=$ac_arg ;;
-esac
+else $as_nop
+  ac_cv_c_undeclared_builtin_options=$ac_arg
 fi
           break
 fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
     done
     CFLAGS=$ac_save_CFLAGS
-   ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_undeclared_builtin_options" >&5
 printf "%s\n" "$ac_cv_c_undeclared_builtin_options" >&6; }
   case $ac_cv_c_undeclared_builtin_options in #(
   'cannot detect') :
-    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
 as_fn_error $? "cannot make $CC report undeclared builtins
-See 'config.log' for more details" "$LINENO" 5; } ;; #(
+See \`config.log' for more details" "$LINENO" 5; } ;; #(
   'none needed') :
     ac_c_undeclared_builtin_options='' ;; #(
   *) :
@@ -22637,9 +22029,8 @@ ac_fn_c_check_type "$LINENO" "ssize_t" "ac_cv_type_ssize_t" "$ac_includes_defaul
 if test "x$ac_cv_type_ssize_t" = xyes
 then :
    CARES_TYPEOF_ARES_SSIZE_T=ssize_t
-else case e in #(
-  e)  CARES_TYPEOF_ARES_SSIZE_T=int  ;;
-esac
+else $as_nop
+   CARES_TYPEOF_ARES_SSIZE_T=int
 fi
 
 
@@ -22661,13 +22052,12 @@ cat >>confdefs.h <<_EOF
 _EOF
 
 
-else case e in #(
-  e)
+else $as_nop
+
 cat >>confdefs.h <<_EOF
 #define CARES_TYPEOF_ARES_SOCKLEN_T int
 _EOF
-  ;;
-esac
+
 fi
 
 
@@ -22685,21 +22075,15 @@ printf %s "checking for library containing clock_gettime... " >&6; }
 if test ${ac_cv_search_clock_gettime+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_func_search_save_LIBS=$LIBS
+else $as_nop
+  ac_func_search_save_LIBS=$LIBS
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 /* Override any GCC internal prototype to avoid an error.
    Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.
-   The 'extern "C"' is for builds by C++ compilers;
-   although this is not generally supported in C code supporting it here
-   has little cost and some practical benefit (sr 110532).  */
-#ifdef __cplusplus
-extern "C"
-#endif
-char clock_gettime (void);
+   builtin and then its argument prototype would still apply.  */
+char clock_gettime ();
 int
 main (void)
 {
@@ -22730,13 +22114,11 @@ done
 if test ${ac_cv_search_clock_gettime+y}
 then :
 
-else case e in #(
-  e) ac_cv_search_clock_gettime=no ;;
-esac
+else $as_nop
+  ac_cv_search_clock_gettime=no
 fi
 rm conftest.$ac_ext
-LIBS=$ac_func_search_save_LIBS ;;
-esac
+LIBS=$ac_func_search_save_LIBS
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_clock_gettime" >&5
 printf "%s\n" "$ac_cv_search_clock_gettime" >&6; }
@@ -23298,11 +22680,10 @@ ac_fn_c_check_type "$LINENO" "size_t" "ac_cv_type_size_t" "$ac_includes_default"
 if test "x$ac_cv_type_size_t" = xyes
 then :
 
-else case e in #(
-  e)
+else $as_nop
+
 printf "%s\n" "#define size_t unsigned int" >>confdefs.h
- ;;
-esac
+
 fi
 
 ac_fn_check_decl "$LINENO" "AF_INET6" "ac_cv_have_decl_AF_INET6" "$cares_all_includes
@@ -23509,140 +22890,6 @@ fi
 
 
 if test "${CARES_THREADS}" = "yes" -a "x${ac_cv_native_windows}" != "xyes" ; then
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for egrep -e" >&5
-printf %s "checking for egrep -e... " >&6; }
-if test ${ac_cv_path_EGREP_TRADITIONAL+y}
-then :
-  printf %s "(cached) " >&6
-else case e in #(
-  e) if test -z "$EGREP_TRADITIONAL"; then
-  ac_path_EGREP_TRADITIONAL_found=false
-  # Loop through the user's path and test for each of PROGNAME-LIST
-  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin
-do
-  IFS=$as_save_IFS
-  case $as_dir in #(((
-    '') as_dir=./ ;;
-    */) ;;
-    *) as_dir=$as_dir/ ;;
-  esac
-    for ac_prog in grep ggrep
-   do
-    for ac_exec_ext in '' $ac_executable_extensions; do
-      ac_path_EGREP_TRADITIONAL="$as_dir$ac_prog$ac_exec_ext"
-      as_fn_executable_p "$ac_path_EGREP_TRADITIONAL" || continue
-# Check for GNU ac_path_EGREP_TRADITIONAL and select it if it is found.
-  # Check for GNU $ac_path_EGREP_TRADITIONAL
-case `"$ac_path_EGREP_TRADITIONAL" --version 2>&1` in #(
-*GNU*)
-  ac_cv_path_EGREP_TRADITIONAL="$ac_path_EGREP_TRADITIONAL" ac_path_EGREP_TRADITIONAL_found=:;;
-#(
-*)
-  ac_count=0
-  printf %s 0123456789 >"conftest.in"
-  while :
-  do
-    cat "conftest.in" "conftest.in" >"conftest.tmp"
-    mv "conftest.tmp" "conftest.in"
-    cp "conftest.in" "conftest.nl"
-    printf "%s\n" 'EGREP_TRADITIONAL' >> "conftest.nl"
-    "$ac_path_EGREP_TRADITIONAL" -E 'EGR(EP|AC)_TRADITIONAL$' < "conftest.nl" >"conftest.out" 2>/dev/null || break
-    diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break
-    as_fn_arith $ac_count + 1 && ac_count=$as_val
-    if test $ac_count -gt ${ac_path_EGREP_TRADITIONAL_max-0}; then
-      # Best one so far, save it but keep looking for a better one
-      ac_cv_path_EGREP_TRADITIONAL="$ac_path_EGREP_TRADITIONAL"
-      ac_path_EGREP_TRADITIONAL_max=$ac_count
-    fi
-    # 10*(2^10) chars as input seems more than enough
-    test $ac_count -gt 10 && break
-  done
-  rm -f conftest.in conftest.tmp conftest.nl conftest.out;;
-esac
-
-      $ac_path_EGREP_TRADITIONAL_found && break 3
-    done
-  done
-  done
-IFS=$as_save_IFS
-  if test -z "$ac_cv_path_EGREP_TRADITIONAL"; then
-    :
-  fi
-else
-  ac_cv_path_EGREP_TRADITIONAL=$EGREP_TRADITIONAL
-fi
-
-    if test "$ac_cv_path_EGREP_TRADITIONAL"
-then :
-  ac_cv_path_EGREP_TRADITIONAL="$ac_cv_path_EGREP_TRADITIONAL -E"
-else case e in #(
-  e) if test -z "$EGREP_TRADITIONAL"; then
-  ac_path_EGREP_TRADITIONAL_found=false
-  # Loop through the user's path and test for each of PROGNAME-LIST
-  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin
-do
-  IFS=$as_save_IFS
-  case $as_dir in #(((
-    '') as_dir=./ ;;
-    */) ;;
-    *) as_dir=$as_dir/ ;;
-  esac
-    for ac_prog in egrep
-   do
-    for ac_exec_ext in '' $ac_executable_extensions; do
-      ac_path_EGREP_TRADITIONAL="$as_dir$ac_prog$ac_exec_ext"
-      as_fn_executable_p "$ac_path_EGREP_TRADITIONAL" || continue
-# Check for GNU ac_path_EGREP_TRADITIONAL and select it if it is found.
-  # Check for GNU $ac_path_EGREP_TRADITIONAL
-case `"$ac_path_EGREP_TRADITIONAL" --version 2>&1` in #(
-*GNU*)
-  ac_cv_path_EGREP_TRADITIONAL="$ac_path_EGREP_TRADITIONAL" ac_path_EGREP_TRADITIONAL_found=:;;
-#(
-*)
-  ac_count=0
-  printf %s 0123456789 >"conftest.in"
-  while :
-  do
-    cat "conftest.in" "conftest.in" >"conftest.tmp"
-    mv "conftest.tmp" "conftest.in"
-    cp "conftest.in" "conftest.nl"
-    printf "%s\n" 'EGREP_TRADITIONAL' >> "conftest.nl"
-    "$ac_path_EGREP_TRADITIONAL" 'EGR(EP|AC)_TRADITIONAL$' < "conftest.nl" >"conftest.out" 2>/dev/null || break
-    diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break
-    as_fn_arith $ac_count + 1 && ac_count=$as_val
-    if test $ac_count -gt ${ac_path_EGREP_TRADITIONAL_max-0}; then
-      # Best one so far, save it but keep looking for a better one
-      ac_cv_path_EGREP_TRADITIONAL="$ac_path_EGREP_TRADITIONAL"
-      ac_path_EGREP_TRADITIONAL_max=$ac_count
-    fi
-    # 10*(2^10) chars as input seems more than enough
-    test $ac_count -gt 10 && break
-  done
-  rm -f conftest.in conftest.tmp conftest.nl conftest.out;;
-esac
-
-      $ac_path_EGREP_TRADITIONAL_found && break 3
-    done
-  done
-  done
-IFS=$as_save_IFS
-  if test -z "$ac_cv_path_EGREP_TRADITIONAL"; then
-    as_fn_error $? "no acceptable egrep could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" "$LINENO" 5
-  fi
-else
-  ac_cv_path_EGREP_TRADITIONAL=$EGREP_TRADITIONAL
-fi
- ;;
-esac
-fi ;;
-esac
-fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_EGREP_TRADITIONAL" >&5
-printf "%s\n" "$ac_cv_path_EGREP_TRADITIONAL" >&6; }
- EGREP_TRADITIONAL=$ac_cv_path_EGREP_TRADITIONAL
-
 
 
 
@@ -23683,14 +22930,8 @@ printf %s "checking for pthread_join using $CC $PTHREAD_CFLAGS $PTHREAD_LIBS...
 
 /* Override any GCC internal prototype to avoid an error.
    Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.
-   The 'extern "C"' is for builds by C++ compilers;
-   although this is not generally supported in C code supporting it here
-   has little cost and some practical benefit (sr 110532).  */
-#ifdef __cplusplus
-extern "C"
-#endif
-char pthread_join (void);
+   builtin and then its argument prototype would still apply.  */
+char pthread_join ();
 int
 main (void)
 {
@@ -23784,7 +23025,7 @@ case $host_os in
 
 _ACEOF
 if (eval "$ac_cpp conftest.$ac_ext") 2>&5 |
-  $EGREP_TRADITIONAL "AX_PTHREAD_ZOS_MISSING" >/dev/null 2>&1
+  $EGREP "AX_PTHREAD_ZOS_MISSING" >/dev/null 2>&1
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: IBM z/OS requires -D_OPEN_THREADS or -D_UNIX03_THREADS to enable pthreads support." >&5
 printf "%s\n" "$as_me: WARNING: IBM z/OS requires -D_OPEN_THREADS or -D_UNIX03_THREADS to enable pthreads support." >&2;}
@@ -23814,8 +23055,8 @@ printf %s "checking whether $CC is Clang... " >&6; }
 if test ${ax_cv_PTHREAD_CLANG+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ax_cv_PTHREAD_CLANG=no
+else $as_nop
+  ax_cv_PTHREAD_CLANG=no
      # Note that Autoconf sets GCC=yes for Clang as well as GCC
      if test "x$GCC" = "xyes"; then
         cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -23827,15 +23068,14 @@ else case e in #(
 
 _ACEOF
 if (eval "$ac_cpp conftest.$ac_ext") 2>&5 |
-  $EGREP_TRADITIONAL "AX_PTHREAD_CC_IS_CLANG" >/dev/null 2>&1
+  $EGREP "AX_PTHREAD_CC_IS_CLANG" >/dev/null 2>&1
 then :
   ax_cv_PTHREAD_CLANG=yes
 fi
 rm -rf conftest*
 
      fi
-     ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_PTHREAD_CLANG" >&5
 printf "%s\n" "$ax_cv_PTHREAD_CLANG" >&6; }
@@ -23885,9 +23125,8 @@ esac
 if test "x$ax_pthread_check_macro" = "x--"
 then :
   ax_pthread_check_cond=0
-else case e in #(
-  e) ax_pthread_check_cond="!defined($ax_pthread_check_macro)" ;;
-esac
+else $as_nop
+  ax_pthread_check_cond="!defined($ax_pthread_check_macro)"
 fi
 
 
@@ -23921,8 +23160,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ax_pthread_config+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ax_pthread_config"; then
+else $as_nop
+  if test -n "$ax_pthread_config"; then
   ac_cv_prog_ax_pthread_config="$ax_pthread_config" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -23945,8 +23184,7 @@ done
 IFS=$as_save_IFS
 
   test -z "$ac_cv_prog_ax_pthread_config" && ac_cv_prog_ax_pthread_config="no"
-fi ;;
-esac
+fi
 fi
 ax_pthread_config=$ac_cv_prog_ax_pthread_config
 if test -n "$ax_pthread_config"; then
@@ -24079,8 +23317,8 @@ printf %s "checking whether Clang needs flag to prevent \"argument unused\" warn
 if test ${ax_cv_PTHREAD_CLANG_NO_WARN_FLAG+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ax_cv_PTHREAD_CLANG_NO_WARN_FLAG=unknown
+else $as_nop
+  ax_cv_PTHREAD_CLANG_NO_WARN_FLAG=unknown
              # Create an alternate version of $ac_link that compiles and
              # links in two steps (.c -> .o, .o -> exe) instead of one
              # (.c -> exe), because the warning occurs only in the second
@@ -24126,8 +23364,7 @@ then :
   ax_pthread_try=no
 fi
              ax_cv_PTHREAD_CLANG_NO_WARN_FLAG="$ax_pthread_try"
-             ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_PTHREAD_CLANG_NO_WARN_FLAG" >&5
 printf "%s\n" "$ax_cv_PTHREAD_CLANG_NO_WARN_FLAG" >&6; }
@@ -24154,8 +23391,8 @@ printf %s "checking for joinable pthread attribute... " >&6; }
 if test ${ax_cv_PTHREAD_JOINABLE_ATTR+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ax_cv_PTHREAD_JOINABLE_ATTR=unknown
+else $as_nop
+  ax_cv_PTHREAD_JOINABLE_ATTR=unknown
              for ax_pthread_attr in PTHREAD_CREATE_JOINABLE PTHREAD_CREATE_UNDETACHED; do
                  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -24175,8 +23412,7 @@ fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
              done
-             ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_PTHREAD_JOINABLE_ATTR" >&5
 printf "%s\n" "$ax_cv_PTHREAD_JOINABLE_ATTR" >&6; }
@@ -24196,15 +23432,14 @@ printf %s "checking whether more special flags are required for pthreads... " >&
 if test ${ax_cv_PTHREAD_SPECIAL_FLAGS+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ax_cv_PTHREAD_SPECIAL_FLAGS=no
+else $as_nop
+  ax_cv_PTHREAD_SPECIAL_FLAGS=no
              case $host_os in
              solaris*)
              ax_cv_PTHREAD_SPECIAL_FLAGS="-D_POSIX_PTHREAD_SEMANTICS"
              ;;
              esac
-             ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_PTHREAD_SPECIAL_FLAGS" >&5
 printf "%s\n" "$ax_cv_PTHREAD_SPECIAL_FLAGS" >&6; }
@@ -24220,8 +23455,8 @@ printf %s "checking for PTHREAD_PRIO_INHERIT... " >&6; }
 if test ${ax_cv_PTHREAD_PRIO_INHERIT+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include <pthread.h>
 int
@@ -24236,14 +23471,12 @@ _ACEOF
 if ac_fn_c_try_link "$LINENO"
 then :
   ax_cv_PTHREAD_PRIO_INHERIT=yes
-else case e in #(
-  e) ax_cv_PTHREAD_PRIO_INHERIT=no ;;
-esac
+else $as_nop
+  ax_cv_PTHREAD_PRIO_INHERIT=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
-             ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_PTHREAD_PRIO_INHERIT" >&5
 printf "%s\n" "$ax_cv_PTHREAD_PRIO_INHERIT" >&6; }
@@ -24293,8 +23526,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_PTHREAD_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$PTHREAD_CC"; then
+else $as_nop
+  if test -n "$PTHREAD_CC"; then
   ac_cv_prog_PTHREAD_CC="$PTHREAD_CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -24316,8 +23549,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 PTHREAD_CC=$ac_cv_prog_PTHREAD_CC
 if test -n "$PTHREAD_CC"; then
@@ -24344,8 +23576,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_PTHREAD_CXX+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$PTHREAD_CXX"; then
+else $as_nop
+  if test -n "$PTHREAD_CXX"; then
   ac_cv_prog_PTHREAD_CXX="$PTHREAD_CXX" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -24367,8 +23599,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 PTHREAD_CXX=$ac_cv_prog_PTHREAD_CXX
 if test -n "$PTHREAD_CXX"; then
@@ -24494,8 +23725,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_path_PKG_CONFIG+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) case $PKG_CONFIG in
+else $as_nop
+  case $PKG_CONFIG in
   [\\/]* | ?:[\\/]*)
   ac_cv_path_PKG_CONFIG="$PKG_CONFIG" # Let the user override the test with a path.
   ;;
@@ -24520,7 +23751,6 @@ done
 IFS=$as_save_IFS
 
   ;;
-esac ;;
 esac
 fi
 PKG_CONFIG=$ac_cv_path_PKG_CONFIG
@@ -24543,8 +23773,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_path_ac_pt_PKG_CONFIG+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) case $ac_pt_PKG_CONFIG in
+else $as_nop
+  case $ac_pt_PKG_CONFIG in
   [\\/]* | ?:[\\/]*)
   ac_cv_path_ac_pt_PKG_CONFIG="$ac_pt_PKG_CONFIG" # Let the user override the test with a path.
   ;;
@@ -24569,7 +23799,6 @@ done
 IFS=$as_save_IFS
 
   ;;
-esac ;;
 esac
 fi
 ac_pt_PKG_CONFIG=$ac_cv_path_ac_pt_PKG_CONFIG
@@ -24767,8 +23996,8 @@ printf %s "checking whether user namespaces are supported... " >&6; }
 if test ${ax_cv_user_namespace+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)
+else $as_nop
+
   ac_ext=c
 ac_cpp='$CPP $CPPFLAGS'
 ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
@@ -24778,8 +24007,8 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
   if test "$cross_compiling" = yes
 then :
   ax_cv_user_namespace=no
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 #define _GNU_SOURCE
@@ -24818,13 +24047,11 @@ _ACEOF
 if ac_fn_c_try_run "$LINENO"
 then :
   ax_cv_user_namespace=yes
-else case e in #(
-  e) ax_cv_user_namespace=no ;;
-esac
+else $as_nop
+  ax_cv_user_namespace=no
 fi
 rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
-  conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+  conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
 
  ac_ext=c
@@ -24833,8 +24060,7 @@ ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
 ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
 ac_compiler_gnu=$ac_cv_c_compiler_gnu
 
-  ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_user_namespace" >&5
 printf "%s\n" "$ax_cv_user_namespace" >&6; }
@@ -24849,8 +24075,8 @@ printf %s "checking whether UTS namespaces are supported... " >&6; }
 if test ${ax_cv_uts_namespace+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e)
+else $as_nop
+
   ac_ext=c
 ac_cpp='$CPP $CPPFLAGS'
 ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
@@ -24860,8 +24086,8 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
   if test "$cross_compiling" = yes
 then :
   ax_cv_uts_namespace=no
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
 #define _GNU_SOURCE
@@ -24920,13 +24146,11 @@ _ACEOF
 if ac_fn_c_try_run "$LINENO"
 then :
   ax_cv_uts_namespace=yes
-else case e in #(
-  e) ax_cv_uts_namespace=no ;;
-esac
+else $as_nop
+  ax_cv_uts_namespace=no
 fi
 rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
-  conftest.$ac_objext conftest.beam conftest.$ac_ext ;;
-esac
+  conftest.$ac_objext conftest.beam conftest.$ac_ext
 fi
 
  ac_ext=c
@@ -24935,8 +24159,7 @@ ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
 ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
 ac_compiler_gnu=$ac_cv_c_compiler_gnu
 
-  ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_uts_namespace" >&5
 printf "%s\n" "$ax_cv_uts_namespace" >&6; }
@@ -24969,17 +24192,17 @@ ac_compiler_gnu=$ac_cv_cxx_compiler_gnu
       for switch in -std=c++${alternative} +std=c++${alternative} "-h std=c++${alternative}" MSVC; do
         if test x"$switch" = xMSVC; then
                                         switch=-std:c++${alternative}
-          cachevar=`printf "%s\n" "ax_cv_cxx_compile_cxx14_${switch}_MSVC" | sed "$as_sed_sh"`
+          cachevar=`printf "%s\n" "ax_cv_cxx_compile_cxx14_${switch}_MSVC" | $as_tr_sh`
         else
-          cachevar=`printf "%s\n" "ax_cv_cxx_compile_cxx14_$switch" | sed "$as_sed_sh"`
+          cachevar=`printf "%s\n" "ax_cv_cxx_compile_cxx14_$switch" | $as_tr_sh`
         fi
         { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether $CXX supports C++14 features with $switch" >&5
 printf %s "checking whether $CXX supports C++14 features with $switch... " >&6; }
 if eval test \${$cachevar+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ac_save_CXX="$CXX"
+else $as_nop
+  ac_save_CXX="$CXX"
            CXX="$CXX $switch"
            cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -25399,13 +24622,11 @@ _ACEOF
 if ac_fn_cxx_try_compile "$LINENO"
 then :
   eval $cachevar=yes
-else case e in #(
-  e) eval $cachevar=no ;;
-esac
+else $as_nop
+  eval $cachevar=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-           CXX="$ac_save_CXX" ;;
-esac
+           CXX="$ac_save_CXX"
 fi
 eval ac_res=\$$cachevar
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
@@ -25488,14 +24709,8 @@ printf %s "checking for pthread_join using $CC $PTHREAD_CFLAGS $PTHREAD_LIBS...
 
 /* Override any GCC internal prototype to avoid an error.
    Use char because int might match the return type of a GCC
-   builtin and then its argument prototype would still apply.
-   The 'extern "C"' is for builds by C++ compilers;
-   although this is not generally supported in C code supporting it here
-   has little cost and some practical benefit (sr 110532).  */
-#ifdef __cplusplus
-extern "C"
-#endif
-char pthread_join (void);
+   builtin and then its argument prototype would still apply.  */
+char pthread_join ();
 int
 main (void)
 {
@@ -25589,7 +24804,7 @@ case $host_os in
 
 _ACEOF
 if (eval "$ac_cpp conftest.$ac_ext") 2>&5 |
-  $EGREP_TRADITIONAL "AX_PTHREAD_ZOS_MISSING" >/dev/null 2>&1
+  $EGREP "AX_PTHREAD_ZOS_MISSING" >/dev/null 2>&1
 then :
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: IBM z/OS requires -D_OPEN_THREADS or -D_UNIX03_THREADS to enable pthreads support." >&5
 printf "%s\n" "$as_me: WARNING: IBM z/OS requires -D_OPEN_THREADS or -D_UNIX03_THREADS to enable pthreads support." >&2;}
@@ -25619,8 +24834,8 @@ printf %s "checking whether $CC is Clang... " >&6; }
 if test ${ax_cv_PTHREAD_CLANG+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ax_cv_PTHREAD_CLANG=no
+else $as_nop
+  ax_cv_PTHREAD_CLANG=no
      # Note that Autoconf sets GCC=yes for Clang as well as GCC
      if test "x$GCC" = "xyes"; then
         cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -25632,15 +24847,14 @@ else case e in #(
 
 _ACEOF
 if (eval "$ac_cpp conftest.$ac_ext") 2>&5 |
-  $EGREP_TRADITIONAL "AX_PTHREAD_CC_IS_CLANG" >/dev/null 2>&1
+  $EGREP "AX_PTHREAD_CC_IS_CLANG" >/dev/null 2>&1
 then :
   ax_cv_PTHREAD_CLANG=yes
 fi
 rm -rf conftest*
 
      fi
-     ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_PTHREAD_CLANG" >&5
 printf "%s\n" "$ax_cv_PTHREAD_CLANG" >&6; }
@@ -25690,9 +24904,8 @@ esac
 if test "x$ax_pthread_check_macro" = "x--"
 then :
   ax_pthread_check_cond=0
-else case e in #(
-  e) ax_pthread_check_cond="!defined($ax_pthread_check_macro)" ;;
-esac
+else $as_nop
+  ax_pthread_check_cond="!defined($ax_pthread_check_macro)"
 fi
 
 
@@ -25726,8 +24939,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_ax_pthread_config+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$ax_pthread_config"; then
+else $as_nop
+  if test -n "$ax_pthread_config"; then
   ac_cv_prog_ax_pthread_config="$ax_pthread_config" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -25750,8 +24963,7 @@ done
 IFS=$as_save_IFS
 
   test -z "$ac_cv_prog_ax_pthread_config" && ac_cv_prog_ax_pthread_config="no"
-fi ;;
-esac
+fi
 fi
 ax_pthread_config=$ac_cv_prog_ax_pthread_config
 if test -n "$ax_pthread_config"; then
@@ -25884,8 +25096,8 @@ printf %s "checking whether Clang needs flag to prevent \"argument unused\" warn
 if test ${ax_cv_PTHREAD_CLANG_NO_WARN_FLAG+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ax_cv_PTHREAD_CLANG_NO_WARN_FLAG=unknown
+else $as_nop
+  ax_cv_PTHREAD_CLANG_NO_WARN_FLAG=unknown
              # Create an alternate version of $ac_link that compiles and
              # links in two steps (.c -> .o, .o -> exe) instead of one
              # (.c -> exe), because the warning occurs only in the second
@@ -25931,8 +25143,7 @@ then :
   ax_pthread_try=no
 fi
              ax_cv_PTHREAD_CLANG_NO_WARN_FLAG="$ax_pthread_try"
-             ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_PTHREAD_CLANG_NO_WARN_FLAG" >&5
 printf "%s\n" "$ax_cv_PTHREAD_CLANG_NO_WARN_FLAG" >&6; }
@@ -25959,8 +25170,8 @@ printf %s "checking for joinable pthread attribute... " >&6; }
 if test ${ax_cv_PTHREAD_JOINABLE_ATTR+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ax_cv_PTHREAD_JOINABLE_ATTR=unknown
+else $as_nop
+  ax_cv_PTHREAD_JOINABLE_ATTR=unknown
              for ax_pthread_attr in PTHREAD_CREATE_JOINABLE PTHREAD_CREATE_UNDETACHED; do
                  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -25980,8 +25191,7 @@ fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
              done
-             ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_PTHREAD_JOINABLE_ATTR" >&5
 printf "%s\n" "$ax_cv_PTHREAD_JOINABLE_ATTR" >&6; }
@@ -26001,15 +25211,14 @@ printf %s "checking whether more special flags are required for pthreads... " >&
 if test ${ax_cv_PTHREAD_SPECIAL_FLAGS+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) ax_cv_PTHREAD_SPECIAL_FLAGS=no
+else $as_nop
+  ax_cv_PTHREAD_SPECIAL_FLAGS=no
              case $host_os in
              solaris*)
              ax_cv_PTHREAD_SPECIAL_FLAGS="-D_POSIX_PTHREAD_SEMANTICS"
              ;;
              esac
-             ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_PTHREAD_SPECIAL_FLAGS" >&5
 printf "%s\n" "$ax_cv_PTHREAD_SPECIAL_FLAGS" >&6; }
@@ -26025,8 +25234,8 @@ printf %s "checking for PTHREAD_PRIO_INHERIT... " >&6; }
 if test ${ax_cv_PTHREAD_PRIO_INHERIT+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+else $as_nop
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include <pthread.h>
 int
@@ -26041,14 +25250,12 @@ _ACEOF
 if ac_fn_c_try_link "$LINENO"
 then :
   ax_cv_PTHREAD_PRIO_INHERIT=yes
-else case e in #(
-  e) ax_cv_PTHREAD_PRIO_INHERIT=no ;;
-esac
+else $as_nop
+  ax_cv_PTHREAD_PRIO_INHERIT=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.beam \
     conftest$ac_exeext conftest.$ac_ext
-             ;;
-esac
+
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_PTHREAD_PRIO_INHERIT" >&5
 printf "%s\n" "$ax_cv_PTHREAD_PRIO_INHERIT" >&6; }
@@ -26098,8 +25305,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_PTHREAD_CC+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$PTHREAD_CC"; then
+else $as_nop
+  if test -n "$PTHREAD_CC"; then
   ac_cv_prog_PTHREAD_CC="$PTHREAD_CC" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -26121,8 +25328,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 PTHREAD_CC=$ac_cv_prog_PTHREAD_CC
 if test -n "$PTHREAD_CC"; then
@@ -26149,8 +25355,8 @@ printf %s "checking for $ac_word... " >&6; }
 if test ${ac_cv_prog_PTHREAD_CXX+y}
 then :
   printf %s "(cached) " >&6
-else case e in #(
-  e) if test -n "$PTHREAD_CXX"; then
+else $as_nop
+  if test -n "$PTHREAD_CXX"; then
   ac_cv_prog_PTHREAD_CXX="$PTHREAD_CXX" # Let the user override the test.
 else
 as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
@@ -26172,8 +25378,7 @@ done
   done
 IFS=$as_save_IFS
 
-fi ;;
-esac
+fi
 fi
 PTHREAD_CXX=$ac_cv_prog_PTHREAD_CXX
 if test -n "$PTHREAD_CXX"; then
@@ -26265,8 +25470,8 @@ cat >confcache <<\_ACEOF
 # config.status only pays attention to the cache file if you give it
 # the --recheck option to rerun configure.
 #
-# 'ac_cv_env_foo' variables (set or unset) will be overridden when
-# loading this file, other *unset* 'ac_cv_foo' will be assigned the
+# `ac_cv_env_foo' variables (set or unset) will be overridden when
+# loading this file, other *unset* `ac_cv_foo' will be assigned the
 # following values.
 
 _ACEOF
@@ -26296,14 +25501,14 @@ printf "%s\n" "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;}
   (set) 2>&1 |
     case $as_nl`(ac_space=' '; set) 2>&1` in #(
     *${as_nl}ac_space=\ *)
-      # 'set' does not quote correctly, so add quotes: double-quote
+      # `set' does not quote correctly, so add quotes: double-quote
       # substitution turns \\\\ into \\, and sed turns \\ into \.
       sed -n \
 	"s/'/'\\\\''/g;
 	  s/^\\([_$as_cr_alnum]*_cv_[_$as_cr_alnum]*\\)=\\(.*\\)/\\1='\\2'/p"
       ;; #(
     *)
-      # 'set' quotes correctly as required by POSIX, so do not add quotes.
+      # `set' quotes correctly as required by POSIX, so do not add quotes.
       sed -n "/^[_$as_cr_alnum]*_cv_[_$as_cr_alnum]*=/p"
       ;;
     esac |
@@ -26384,18 +25589,6 @@ if test -z "${am__fastdepCXX_TRUE}" && test -z "${am__fastdepCXX_FALSE}"; then
   as_fn_error $? "conditional \"am__fastdepCXX\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
 fi
-case $enable_silent_rules in # (((
-  yes) AM_DEFAULT_VERBOSITY=0;;
-   no) AM_DEFAULT_VERBOSITY=1;;
-esac
-if test $am_cv_make_support_nested_variables = yes; then
-    AM_V='$(V)'
-  AM_DEFAULT_V='$(AM_DEFAULT_VERBOSITY)'
-else
-  AM_V=$AM_DEFAULT_VERBOSITY
-  AM_DEFAULT_V=$AM_DEFAULT_VERBOSITY
-fi
-
  if test -n "$EXEEXT"; then
   am__EXEEXT_TRUE=
   am__EXEEXT_FALSE='#'
@@ -26412,12 +25605,6 @@ if test -z "${CODE_COVERAGE_ENABLED_TRUE}" && test -z "${CODE_COVERAGE_ENABLED_F
   as_fn_error $? "conditional \"CODE_COVERAGE_ENABLED\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
 fi
-# Check whether --enable-year2038 was given.
-if test ${enable_year2038+y}
-then :
-  enableval=$enable_year2038;
-fi
-
 if test -z "${CARES_USE_NO_UNDEFINED_TRUE}" && test -z "${CARES_USE_NO_UNDEFINED_FALSE}"; then
   as_fn_error $? "conditional \"CARES_USE_NO_UNDEFINED\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
@@ -26459,6 +25646,7 @@ cat >>$CONFIG_STATUS <<\_ASEOF || as_write_fail=1
 
 # Be more Bourne compatible
 DUALCASE=1; export DUALCASE # for MKS sh
+as_nop=:
 if test ${ZSH_VERSION+y} && (emulate sh) >/dev/null 2>&1
 then :
   emulate sh
@@ -26467,13 +25655,12 @@ then :
   # is contrary to our usage.  Disable this feature.
   alias -g '${1+"$@"}'='"$@"'
   setopt NO_GLOB_SUBST
-else case e in #(
-  e) case `(set -o) 2>/dev/null` in #(
+else $as_nop
+  case `(set -o) 2>/dev/null` in #(
   *posix*) :
     set -o posix ;; #(
   *) :
      ;;
-esac ;;
 esac
 fi
 
@@ -26545,7 +25732,7 @@ IFS=$as_save_IFS
 
      ;;
 esac
-# We did not find ourselves, most probably we were run as 'sh COMMAND'
+# We did not find ourselves, most probably we were run as `sh COMMAND'
 # in which case we are not to be found in the path.
 if test "x$as_myself" = x; then
   as_myself=$0
@@ -26574,6 +25761,7 @@ as_fn_error ()
 } # as_fn_error
 
 
+
 # as_fn_set_status STATUS
 # -----------------------
 # Set $? to STATUS, without forking.
@@ -26613,12 +25801,11 @@ then :
   {
     eval $1+=\$2
   }'
-else case e in #(
-  e) as_fn_append ()
+else $as_nop
+  as_fn_append ()
   {
     eval $1=\$$1\$2
-  } ;;
-esac
+  }
 fi # as_fn_append
 
 # as_fn_arith ARG...
@@ -26632,12 +25819,11 @@ then :
   {
     as_val=$(( $* ))
   }'
-else case e in #(
-  e) as_fn_arith ()
+else $as_nop
+  as_fn_arith ()
   {
     as_val=`expr "$@" || test $? -eq 1`
-  } ;;
-esac
+  }
 fi # as_fn_arith
 
 
@@ -26720,9 +25906,9 @@ if (echo >conf$$.file) 2>/dev/null; then
   if ln -s conf$$.file conf$$ 2>/dev/null; then
     as_ln_s='ln -s'
     # ... but there are two gotchas:
-    # 1) On MSYS, both 'ln -s file dir' and 'ln file dir' fail.
-    # 2) DJGPP < 2.04 has no symlinks; 'ln -s' creates a wrapper executable.
-    # In both cases, we have to default to 'cp -pR'.
+    # 1) On MSYS, both `ln -s file dir' and `ln file dir' fail.
+    # 2) DJGPP < 2.04 has no symlinks; `ln -s' creates a wrapper executable.
+    # In both cases, we have to default to `cp -pR'.
     ln -s conf$$.file conf$$.dir 2>/dev/null && test ! -f conf$$.exe ||
       as_ln_s='cp -pR'
   elif ln conf$$.file conf$$ 2>/dev/null; then
@@ -26803,12 +25989,10 @@ as_test_x='test -x'
 as_executable_p=as_fn_executable_p
 
 # Sed expression to map a string onto a valid CPP name.
-as_sed_cpp="y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g"
-as_tr_cpp="eval sed '$as_sed_cpp'" # deprecated
+as_tr_cpp="eval sed 'y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g'"
 
 # Sed expression to map a string onto a valid variable name.
-as_sed_sh="y%*+%pp%;s%[^_$as_cr_alnum]%_%g"
-as_tr_sh="eval sed '$as_sed_sh'" # deprecated
+as_tr_sh="eval sed 'y%*+%pp%;s%[^_$as_cr_alnum]%_%g'"
 
 
 exec 6>&1
@@ -26823,8 +26007,8 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
 # report actual input values of CONFIG_FILES etc. instead of their
 # values after options handling.
 ac_log="
-This file was extended by c-ares $as_me 1.34.2, which was
-generated by GNU Autoconf 2.72.  Invocation command line was
+This file was extended by c-ares $as_me 1.34.3, which was
+generated by GNU Autoconf 2.71.  Invocation command line was
 
   CONFIG_FILES    = $CONFIG_FILES
   CONFIG_HEADERS  = $CONFIG_HEADERS
@@ -26856,7 +26040,7 @@ _ACEOF
 
 cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
 ac_cs_usage="\
-'$as_me' instantiates files and other configuration actions
+\`$as_me' instantiates files and other configuration actions
 from templates according to the current configuration.  Unless the files
 and actions are specified as TAGs, all are instantiated by default.
 
@@ -26891,11 +26075,11 @@ ac_cs_config_escaped=`printf "%s\n" "$ac_cs_config" | sed "s/^ //; s/'/'\\\\\\\\
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 ac_cs_config='$ac_cs_config_escaped'
 ac_cs_version="\\
-c-ares config.status 1.34.2
-configured by $0, generated by GNU Autoconf 2.72,
+c-ares config.status 1.34.3
+configured by $0, generated by GNU Autoconf 2.71,
   with options \\"\$ac_cs_config\\"
 
-Copyright (C) 2023 Free Software Foundation, Inc.
+Copyright (C) 2021 Free Software Foundation, Inc.
 This config.status script is free software; the Free Software Foundation
 gives unlimited permission to copy, distribute and modify it."
 
@@ -26957,8 +26141,8 @@ do
     ac_need_defaults=false;;
   --he | --h)
     # Conflict between --help and --header
-    as_fn_error $? "ambiguous option: '$1'
-Try '$0 --help' for more information.";;
+    as_fn_error $? "ambiguous option: \`$1'
+Try \`$0 --help' for more information.";;
   --help | --hel | -h )
     printf "%s\n" "$ac_cs_usage"; exit ;;
   -q | -quiet | --quiet | --quie | --qui | --qu | --q \
@@ -26966,8 +26150,8 @@ Try '$0 --help' for more information.";;
     ac_cs_silent=: ;;
 
   # This is an error.
-  -*) as_fn_error $? "unrecognized option: '$1'
-Try '$0 --help' for more information." ;;
+  -*) as_fn_error $? "unrecognized option: \`$1'
+Try \`$0 --help' for more information." ;;
 
   *) as_fn_append ac_config_targets " $1"
      ac_need_defaults=false ;;
@@ -27057,14 +26241,12 @@ lt_cv_to_host_file_cmd='`$ECHO "$lt_cv_to_host_file_cmd" | $SED "$delay_single_q
 lt_cv_to_tool_file_cmd='`$ECHO "$lt_cv_to_tool_file_cmd" | $SED "$delay_single_quote_subst"`'
 reload_flag='`$ECHO "$reload_flag" | $SED "$delay_single_quote_subst"`'
 reload_cmds='`$ECHO "$reload_cmds" | $SED "$delay_single_quote_subst"`'
-FILECMD='`$ECHO "$FILECMD" | $SED "$delay_single_quote_subst"`'
 deplibs_check_method='`$ECHO "$deplibs_check_method" | $SED "$delay_single_quote_subst"`'
 file_magic_cmd='`$ECHO "$file_magic_cmd" | $SED "$delay_single_quote_subst"`'
 file_magic_glob='`$ECHO "$file_magic_glob" | $SED "$delay_single_quote_subst"`'
 want_nocaseglob='`$ECHO "$want_nocaseglob" | $SED "$delay_single_quote_subst"`'
 sharedlib_from_linklib_cmd='`$ECHO "$sharedlib_from_linklib_cmd" | $SED "$delay_single_quote_subst"`'
 AR='`$ECHO "$AR" | $SED "$delay_single_quote_subst"`'
-lt_ar_flags='`$ECHO "$lt_ar_flags" | $SED "$delay_single_quote_subst"`'
 AR_FLAGS='`$ECHO "$AR_FLAGS" | $SED "$delay_single_quote_subst"`'
 archiver_list_spec='`$ECHO "$archiver_list_spec" | $SED "$delay_single_quote_subst"`'
 STRIP='`$ECHO "$STRIP" | $SED "$delay_single_quote_subst"`'
@@ -27242,13 +26424,13 @@ LN_S \
 lt_SP2NL \
 lt_NL2SP \
 reload_flag \
-FILECMD \
 deplibs_check_method \
 file_magic_cmd \
 file_magic_glob \
 want_nocaseglob \
 sharedlib_from_linklib_cmd \
 AR \
+AR_FLAGS \
 archiver_list_spec \
 STRIP \
 RANLIB \
@@ -27418,7 +26600,7 @@ do
     "libcares.pc") CONFIG_FILES="$CONFIG_FILES libcares.pc" ;;
     "test/Makefile") CONFIG_FILES="$CONFIG_FILES test/Makefile" ;;
 
-  *) as_fn_error $? "invalid argument: '$ac_config_target'" "$LINENO" 5;;
+  *) as_fn_error $? "invalid argument: \`$ac_config_target'" "$LINENO" 5;;
   esac
 done
 
@@ -27438,7 +26620,7 @@ fi
 # creating and moving files from /tmp can sometimes cause problems.
 # Hook for its removal unless debugging.
 # Note that there is a small window in which the directory will not be cleaned:
-# after its creation but before its name has been assigned to '$tmp'.
+# after its creation but before its name has been assigned to `$tmp'.
 $debug ||
 {
   tmp= ac_tmp=
@@ -27462,7 +26644,7 @@ ac_tmp=$tmp
 
 # Set up the scripts for CONFIG_FILES section.
 # No need to generate them if there are no CONFIG_FILES.
-# This happens for instance with './config.status config.h'.
+# This happens for instance with `./config.status config.h'.
 if test -n "$CONFIG_FILES"; then
 
 
@@ -27620,13 +26802,13 @@ fi # test -n "$CONFIG_FILES"
 
 # Set up the scripts for CONFIG_HEADERS section.
 # No need to generate them if there are no CONFIG_HEADERS.
-# This happens for instance with './config.status Makefile'.
+# This happens for instance with `./config.status Makefile'.
 if test -n "$CONFIG_HEADERS"; then
 cat >"$ac_tmp/defines.awk" <<\_ACAWK ||
 BEGIN {
 _ACEOF
 
-# Transform confdefs.h into an awk script 'defines.awk', embedded as
+# Transform confdefs.h into an awk script `defines.awk', embedded as
 # here-document in config.status, that substitutes the proper values into
 # config.h.in to produce config.h.
 
@@ -27736,7 +26918,7 @@ do
   esac
   case $ac_mode$ac_tag in
   :[FHL]*:*);;
-  :L* | :C*:*) as_fn_error $? "invalid tag '$ac_tag'" "$LINENO" 5;;
+  :L* | :C*:*) as_fn_error $? "invalid tag \`$ac_tag'" "$LINENO" 5;;
   :[FH]-) ac_tag=-:-;;
   :[FH]*) ac_tag=$ac_tag:$ac_tag.in;;
   esac
@@ -27758,19 +26940,19 @@ do
       -) ac_f="$ac_tmp/stdin";;
       *) # Look for the file first in the build tree, then in the source tree
 	 # (if the path is not absolute).  The absolute path cannot be DOS-style,
-	 # because $ac_f cannot contain ':'.
+	 # because $ac_f cannot contain `:'.
 	 test -f "$ac_f" ||
 	   case $ac_f in
 	   [\\/$]*) false;;
 	   *) test -f "$srcdir/$ac_f" && ac_f="$srcdir/$ac_f";;
 	   esac ||
-	   as_fn_error 1 "cannot find input file: '$ac_f'" "$LINENO" 5;;
+	   as_fn_error 1 "cannot find input file: \`$ac_f'" "$LINENO" 5;;
       esac
       case $ac_f in *\'*) ac_f=`printf "%s\n" "$ac_f" | sed "s/'/'\\\\\\\\''/g"`;; esac
       as_fn_append ac_file_inputs " '$ac_f'"
     done
 
-    # Let's still pretend it is 'configure' which instantiates (i.e., don't
+    # Let's still pretend it is `configure' which instantiates (i.e., don't
     # use $as_me), people would be surprised to read:
     #    /* config.h.  Generated by config.status.  */
     configure_input='Generated from '`
@@ -27903,7 +27085,7 @@ cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 esac
 _ACEOF
 
-# Neutralize VPATH when '$srcdir' = '.'.
+# Neutralize VPATH when `$srcdir' = `.'.
 # Shell code in configure.ac might set extrasub.
 # FIXME: do we really want to maintain this feature?
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
@@ -27934,9 +27116,9 @@ test -z "$ac_datarootdir_hack$ac_datarootdir_seen" &&
   { ac_out=`sed -n '/\${datarootdir}/p' "$ac_tmp/out"`; test -n "$ac_out"; } &&
   { ac_out=`sed -n '/^[	 ]*datarootdir[	 ]*:*=/p' \
       "$ac_tmp/out"`; test -z "$ac_out"; } &&
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: $ac_file contains a reference to the variable 'datarootdir'
+  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: $ac_file contains a reference to the variable \`datarootdir'
 which seems to be undefined.  Please make sure it is defined" >&5
-printf "%s\n" "$as_me: WARNING: $ac_file contains a reference to the variable 'datarootdir'
+printf "%s\n" "$as_me: WARNING: $ac_file contains a reference to the variable \`datarootdir'
 which seems to be undefined.  Please make sure it is defined" >&2;}
 
   rm -f "$ac_tmp/stdin"
@@ -28091,15 +27273,15 @@ printf "%s\n" X/"$am_mf" |
    (exit $ac_status); } || am_rc=$?
   done
   if test $am_rc -ne 0; then
-    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in '$ac_pwd':" >&5
-printf "%s\n" "$as_me: error: in '$ac_pwd':" >&2;}
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;}
 as_fn_error $? "Something went wrong bootstrapping makefile fragments
     for automatic dependency tracking.  If GNU make was not used, consider
     re-running the configure script with MAKE=\"gmake\" (or whatever is
     necessary).  You can also try re-running configure with the
     '--disable-dependency-tracking' option to at least be able to build
     the package (albeit without support for automatic dependency tracking).
-See 'config.log' for more details" "$LINENO" 5; }
+See \`config.log' for more details" "$LINENO" 5; }
   fi
   { am_dirpart=; unset am_dirpart;}
   { am_filepart=; unset am_filepart;}
@@ -28128,13 +27310,13 @@ See 'config.log' for more details" "$LINENO" 5; }
 # Provide generalized library-building support services.
 # Written by Gordon Matzigkeit, 1996
 
-# Copyright (C) 2024 Free Software Foundation, Inc.
+# Copyright (C) 2014 Free Software Foundation, Inc.
 # This is free software; see the source for copying conditions.  There is NO
 # warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
 # GNU Libtool is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
+# the Free Software Foundation; either version 2 of of the License, or
 # (at your option) any later version.
 #
 # As a special exception to the GNU General Public License, if you
@@ -28251,9 +27433,6 @@ to_host_file_cmd=$lt_cv_to_host_file_cmd
 # convert \$build files to toolchain format.
 to_tool_file_cmd=$lt_cv_to_tool_file_cmd
 
-# A file(cmd) program that detects file types.
-FILECMD=$lt_FILECMD
-
 # Method to check whether dependent libraries are shared objects.
 deplibs_check_method=$lt_deplibs_check_method
 
@@ -28272,11 +27451,8 @@ sharedlib_from_linklib_cmd=$lt_sharedlib_from_linklib_cmd
 # The archiver.
 AR=$lt_AR
 
-# Flags to create an archive (by configure).
-lt_ar_flags=$lt_ar_flags
-
 # Flags to create an archive.
-AR_FLAGS=\${ARFLAGS-"\$lt_ar_flags"}
+AR_FLAGS=$lt_AR_FLAGS
 
 # How to feed a file listing to the archiver.
 archiver_list_spec=$lt_archiver_list_spec
@@ -28518,7 +27694,7 @@ hardcode_direct=$hardcode_direct
 
 # Set to "yes" if using DIR/libNAME\$shared_ext during linking hardcodes
 # DIR into the resulting binary and the resulting library dependency is
-# "absolute",i.e. impossible to change by setting \$shlibpath_var if the
+# "absolute",i.e impossible to change by setting \$shlibpath_var if the
 # library is relocated.
 hardcode_direct_absolute=$hardcode_direct_absolute
 
@@ -28666,7 +27842,7 @@ ltmain=$ac_aux_dir/ltmain.sh
   # if finds mixed CR/LF and LF-only lines.  Since sed operates in
   # text mode, it properly converts lines to CR/LF.  This bash problem
   # is reportedly fixed, but why not run on old versions too?
-  $SED '$q' "$ltmain" >> "$cfgfile" \
+  sed '$q' "$ltmain" >> "$cfgfile" \
      || (rm -f "$cfgfile"; exit 1)
 
    mv -f "$cfgfile" "$ofile" ||
@@ -28761,7 +27937,7 @@ hardcode_direct=$hardcode_direct_CXX
 
 # Set to "yes" if using DIR/libNAME\$shared_ext during linking hardcodes
 # DIR into the resulting binary and the resulting library dependency is
-# "absolute",i.e. impossible to change by setting \$shlibpath_var if the
+# "absolute",i.e impossible to change by setting \$shlibpath_var if the
 # library is relocated.
 hardcode_direct_absolute=$hardcode_direct_absolute_CXX
 
diff --git a/deps/cares/configure.ac b/deps/cares/configure.ac
index 0ebda1c63f5f5e..5f848c28598a95 100644
--- a/deps/cares/configure.ac
+++ b/deps/cares/configure.ac
@@ -2,10 +2,10 @@ dnl Copyright (C) The c-ares project and its contributors
 dnl SPDX-License-Identifier: MIT
 AC_PREREQ([2.69])
 
-AC_INIT([c-ares], [1.34.2],
+AC_INIT([c-ares], [1.34.3],
   [c-ares mailing list: http://lists.haxx.se/listinfo/c-ares])
 
-CARES_VERSION_INFO="21:1:19"
+CARES_VERSION_INFO="21:2:19"
 dnl This flag accepts an argument of the form current[:revision[:age]]. So,
 dnl passing -version-info 3:12:1 sets current to 3, revision to 12, and age to
 dnl 1.
@@ -373,7 +373,7 @@ AS_HELP_STRING([--enable-libgcc],[use libgcc when linking]),
 dnl check for a few basic system headers we need.  It would be nice if we could
 dnl split these on separate lines, but for some reason autotools on Windows doesn't
 dnl allow this, even tried ending lines with a backslash.
-AC_CHECK_HEADERS([malloc.h memory.h AvailabilityMacros.h sys/types.h sys/time.h sys/select.h sys/socket.h sys/filio.h sys/ioctl.h sys/param.h sys/uio.h sys/random.h sys/event.h sys/epoll.h assert.h iphlpapi.h netioapi.h netdb.h netinet/in.h netinet6/in6.h netinet/tcp.h net/if.h ifaddrs.h fcntl.h errno.h socket.h strings.h stdbool.h time.h poll.h limits.h arpa/nameser.h arpa/nameser_compat.h arpa/inet.h ],
+AC_CHECK_HEADERS([malloc.h memory.h AvailabilityMacros.h sys/types.h sys/time.h sys/select.h sys/socket.h sys/filio.h sys/ioctl.h sys/param.h sys/uio.h sys/random.h sys/event.h sys/epoll.h assert.h iphlpapi.h netioapi.h netdb.h netinet/in.h netinet6/in6.h netinet/tcp.h net/if.h ifaddrs.h fcntl.h errno.h socket.h strings.h stdbool.h time.h poll.h limits.h arpa/nameser.h arpa/nameser_compat.h arpa/inet.h sys/system_properties.h ],
 dnl to do if not found
 [],
 dnl to do if found
@@ -488,6 +488,9 @@ cares_all_includes="
 #ifdef HAVE_RESOLV_H
 #  include <resolv.h>
 #endif
+#ifdef HAVE_SYS_SYSTEM_PROPERTIES_H
+#  include <sys/system_properties.h>
+#endif
 #ifdef HAVE_IPHLPAPI_H
 #  include <iphlpapi.h>
 #endif
diff --git a/deps/cares/docs/Makefile.in b/deps/cares/docs/Makefile.in
index da57136dad9a88..6b7bb8e30d1a20 100644
--- a/deps/cares/docs/Makefile.in
+++ b/deps/cares/docs/Makefile.in
@@ -1,7 +1,7 @@
-# Makefile.in generated by automake 1.17 from Makefile.am.
+# Makefile.in generated by automake 1.16.5 from Makefile.am.
 # @configure_input@
 
-# Copyright (C) 1994-2024 Free Software Foundation, Inc.
+# Copyright (C) 1994-2021 Free Software Foundation, Inc.
 
 # This Makefile.in is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -72,8 +72,6 @@ am__make_running_with_option = \
   test $$has_opt = yes
 am__make_dryrun = (target_option=n; $(am__make_running_with_option))
 am__make_keepgoing = (target_option=k; $(am__make_running_with_option))
-am__rm_f = rm -f $(am__rm_f_notfound)
-am__rm_rf = rm -rf $(am__rm_f_notfound)
 pkgdatadir = $(datadir)/@PACKAGE@
 pkgincludedir = $(includedir)/@PACKAGE@
 pkglibdir = $(libdir)/@PACKAGE@
@@ -168,9 +166,10 @@ am__base_list = \
   sed '$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;s/\n/ /g' | \
   sed '$$!N;$$!N;$$!N;$$!N;s/\n/ /g'
 am__uninstall_files_from_dir = { \
-  { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \
-  || { echo " ( cd '$$dir' && rm -f" $$files ")"; \
-       $(am__cd) "$$dir" && echo $$files | $(am__xargs_n) 40 $(am__rm_f); }; \
+  test -z "$$files" \
+    || { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \
+    || { echo " ( cd '$$dir' && rm -f" $$files ")"; \
+         $(am__cd) "$$dir" && rm -f $$files; }; \
   }
 man3dir = $(mandir)/man3
 am__installdirs = "$(DESTDIR)$(man3dir)"
@@ -224,7 +223,6 @@ EGREP = @EGREP@
 ETAGS = @ETAGS@
 EXEEXT = @EXEEXT@
 FGREP = @FGREP@
-FILECMD = @FILECMD@
 GCOV = @GCOV@
 GENHTML = @GENHTML@
 GMOCK112_CFLAGS = @GMOCK112_CFLAGS@
@@ -291,10 +289,8 @@ ac_ct_DUMPBIN = @ac_ct_DUMPBIN@
 am__include = @am__include@
 am__leading_dot = @am__leading_dot@
 am__quote = @am__quote@
-am__rm_f_notfound = @am__rm_f_notfound@
 am__tar = @am__tar@
 am__untar = @am__untar@
-am__xargs_n = @am__xargs_n@
 ax_pthread_config = @ax_pthread_config@
 bindir = @bindir@
 build = @build@
@@ -649,8 +645,8 @@ mostlyclean-generic:
 clean-generic:
 
 distclean-generic:
-	-$(am__rm_f) $(CONFIG_CLEAN_FILES)
-	-test . = "$(srcdir)" || $(am__rm_f) $(CONFIG_CLEAN_VPATH_FILES)
+	-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)
+	-test . = "$(srcdir)" || test -z "$(CONFIG_CLEAN_VPATH_FILES)" || rm -f $(CONFIG_CLEAN_VPATH_FILES)
 
 maintainer-clean-generic:
 	@echo "This command is intended for maintainers to use"
@@ -744,10 +740,3 @@ uninstall-man: uninstall-man3
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
 .NOEXPORT:
-
-# Tell GNU make to disable its built-in pattern rules.
-%:: %,v
-%:: RCS/%,v
-%:: RCS/%
-%:: s.%
-%:: SCCS/s.%
diff --git a/deps/cares/include/Makefile.in b/deps/cares/include/Makefile.in
index 99936f8649748f..0beee44a22bb22 100644
--- a/deps/cares/include/Makefile.in
+++ b/deps/cares/include/Makefile.in
@@ -1,7 +1,7 @@
-# Makefile.in generated by automake 1.17 from Makefile.am.
+# Makefile.in generated by automake 1.16.5 from Makefile.am.
 # @configure_input@
 
-# Copyright (C) 1994-2024 Free Software Foundation, Inc.
+# Copyright (C) 1994-2021 Free Software Foundation, Inc.
 
 # This Makefile.in is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -70,8 +70,6 @@ am__make_running_with_option = \
   test $$has_opt = yes
 am__make_dryrun = (target_option=n; $(am__make_running_with_option))
 am__make_keepgoing = (target_option=k; $(am__make_running_with_option))
-am__rm_f = rm -f $(am__rm_f_notfound)
-am__rm_rf = rm -rf $(am__rm_f_notfound)
 pkgdatadir = $(datadir)/@PACKAGE@
 pkgincludedir = $(includedir)/@PACKAGE@
 pkglibdir = $(libdir)/@PACKAGE@
@@ -164,9 +162,10 @@ am__base_list = \
   sed '$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;s/\n/ /g' | \
   sed '$$!N;$$!N;$$!N;$$!N;s/\n/ /g'
 am__uninstall_files_from_dir = { \
-  { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \
-  || { echo " ( cd '$$dir' && rm -f" $$files ")"; \
-       $(am__cd) "$$dir" && echo $$files | $(am__xargs_n) 40 $(am__rm_f); }; \
+  test -z "$$files" \
+    || { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \
+    || { echo " ( cd '$$dir' && rm -f" $$files ")"; \
+         $(am__cd) "$$dir" && rm -f $$files; }; \
   }
 am__installdirs = "$(DESTDIR)$(includedir)"
 HEADERS = $(include_HEADERS)
@@ -235,7 +234,6 @@ EGREP = @EGREP@
 ETAGS = @ETAGS@
 EXEEXT = @EXEEXT@
 FGREP = @FGREP@
-FILECMD = @FILECMD@
 GCOV = @GCOV@
 GENHTML = @GENHTML@
 GMOCK112_CFLAGS = @GMOCK112_CFLAGS@
@@ -302,10 +300,8 @@ ac_ct_DUMPBIN = @ac_ct_DUMPBIN@
 am__include = @am__include@
 am__leading_dot = @am__leading_dot@
 am__quote = @am__quote@
-am__rm_f_notfound = @am__rm_f_notfound@
 am__tar = @am__tar@
 am__untar = @am__untar@
-am__xargs_n = @am__xargs_n@
 ax_pthread_config = @ax_pthread_config@
 bindir = @bindir@
 build = @build@
@@ -398,8 +394,8 @@ ares_build.h: stamp-h2
 	@test -f $@ || $(MAKE) $(AM_MAKEFLAGS) stamp-h2
 
 stamp-h2: $(srcdir)/ares_build.h.in $(top_builddir)/config.status
-	$(AM_V_at)rm -f stamp-h2
-	$(AM_V_GEN)cd $(top_builddir) && $(SHELL) ./config.status include/ares_build.h
+	@rm -f stamp-h2
+	cd $(top_builddir) && $(SHELL) ./config.status include/ares_build.h
 
 distclean-hdr:
 	-rm -f ares_build.h stamp-h2
@@ -546,8 +542,8 @@ mostlyclean-generic:
 clean-generic:
 
 distclean-generic:
-	-$(am__rm_f) $(CONFIG_CLEAN_FILES)
-	-test . = "$(srcdir)" || $(am__rm_f) $(CONFIG_CLEAN_VPATH_FILES)
+	-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)
+	-test . = "$(srcdir)" || test -z "$(CONFIG_CLEAN_VPATH_FILES)" || rm -f $(CONFIG_CLEAN_VPATH_FILES)
 
 maintainer-clean-generic:
 	@echo "This command is intended for maintainers to use"
@@ -640,10 +636,3 @@ uninstall-am: uninstall-includeHEADERS
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
 .NOEXPORT:
-
-# Tell GNU make to disable its built-in pattern rules.
-%:: %,v
-%:: RCS/%,v
-%:: RCS/%
-%:: s.%
-%:: SCCS/s.%
diff --git a/deps/cares/include/ares_version.h b/deps/cares/include/ares_version.h
index d7a9c9e61e36d2..9cb8084dd56bc9 100644
--- a/deps/cares/include/ares_version.h
+++ b/deps/cares/include/ares_version.h
@@ -32,8 +32,8 @@
 
 #define ARES_VERSION_MAJOR 1
 #define ARES_VERSION_MINOR 34
-#define ARES_VERSION_PATCH 2
-#define ARES_VERSION_STR "1.34.2"
+#define ARES_VERSION_PATCH 3
+#define ARES_VERSION_STR "1.34.3"
 
 /* NOTE: We cannot make the version string a C preprocessor stringify operation
  *       due to assumptions made by integrators that aren't properly using
diff --git a/deps/cares/m4/libtool.m4 b/deps/cares/m4/libtool.m4
old mode 100644
new mode 100755
index e5ddacee99c5cd..c4c02946dece79
--- a/deps/cares/m4/libtool.m4
+++ b/deps/cares/m4/libtool.m4
@@ -1,7 +1,6 @@
 # libtool.m4 - Configure libtool for the host system. -*-Autoconf-*-
 #
-#   Copyright (C) 1996-2001, 2003-2019, 2021-2024 Free Software
-#   Foundation, Inc.
+#   Copyright (C) 1996-2001, 2003-2015 Free Software Foundation, Inc.
 #   Written by Gordon Matzigkeit, 1996
 #
 # This file is free software; the Free Software Foundation gives
@@ -9,13 +8,13 @@
 # modifications, as long as this notice is preserved.
 
 m4_define([_LT_COPYING], [dnl
-# Copyright (C) 2024 Free Software Foundation, Inc.
+# Copyright (C) 2014 Free Software Foundation, Inc.
 # This is free software; see the source for copying conditions.  There is NO
 # warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
 # GNU Libtool is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
+# the Free Software Foundation; either version 2 of of the License, or
 # (at your option) any later version.
 #
 # As a special exception to the GNU General Public License, if you
@@ -32,7 +31,7 @@ m4_define([_LT_COPYING], [dnl
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.
 ])
 
-# serial 62 LT_INIT
+# serial 58 LT_INIT
 
 
 # LT_PREREQ(VERSION)
@@ -60,7 +59,7 @@ esac
 # LT_INIT([OPTIONS])
 # ------------------
 AC_DEFUN([LT_INIT],
-[AC_PREREQ([2.64])dnl We use AC_PATH_PROGS_FEATURE_CHECK
+[AC_PREREQ([2.62])dnl We use AC_PATH_PROGS_FEATURE_CHECK
 AC_REQUIRE([AC_CONFIG_AUX_DIR_DEFAULT])dnl
 AC_BEFORE([$0], [LT_LANG])dnl
 AC_BEFORE([$0], [LT_OUTPUT])dnl
@@ -182,7 +181,6 @@ m4_require([_LT_FILEUTILS_DEFAULTS])dnl
 m4_require([_LT_CHECK_SHELL_FEATURES])dnl
 m4_require([_LT_PATH_CONVERSION_FUNCTIONS])dnl
 m4_require([_LT_CMD_RELOAD])dnl
-m4_require([_LT_DECL_FILECMD])dnl
 m4_require([_LT_CHECK_MAGIC_METHOD])dnl
 m4_require([_LT_CHECK_SHAREDLIB_FROM_LINKLIB])dnl
 m4_require([_LT_CMD_OLD_ARCHIVE])dnl
@@ -221,8 +219,8 @@ esac
 ofile=libtool
 can_build_shared=yes
 
-# All known linkers require a '.a' archive for static linking (except MSVC and
-# ICC, which need '.lib').
+# All known linkers require a '.a' archive for static linking (except MSVC,
+# which needs '.lib').
 libext=a
 
 with_gnu_ld=$lt_cv_prog_gnu_ld
@@ -616,7 +614,7 @@ m4_popdef([AS_MESSAGE_LOG_FD])])])# _LT_GENERATED_FILE_INIT
 # LT_OUTPUT
 # ---------
 # This macro allows early generation of the libtool script (before
-# AC_OUTPUT is called), in case it is used in configure for compilation
+# AC_OUTPUT is called), incase it is used in configure for compilation
 # tests.
 AC_DEFUN([LT_OUTPUT],
 [: ${CONFIG_LT=./config.lt}
@@ -651,9 +649,9 @@ m4_ifset([AC_PACKAGE_NAME], [AC_PACKAGE_NAME ])config.lt[]dnl
 m4_ifset([AC_PACKAGE_VERSION], [ AC_PACKAGE_VERSION])
 configured by $[0], generated by m4_PACKAGE_STRING.
 
-Copyright (C) 2024 Free Software Foundation, Inc.
+Copyright (C) 2011 Free Software Foundation, Inc.
 This config.lt script is free software; the Free Software Foundation
-gives unlimited permission to copy, distribute and modify it."
+gives unlimited permision to copy, distribute and modify it."
 
 while test 0 != $[#]
 do
@@ -779,7 +777,7 @@ _LT_EOF
   # if finds mixed CR/LF and LF-only lines.  Since sed operates in
   # text mode, it properly converts lines to CR/LF.  This bash problem
   # is reportedly fixed, but why not run on old versions too?
-  $SED '$q' "$ltmain" >> "$cfgfile" \
+  sed '$q' "$ltmain" >> "$cfgfile" \
      || (rm -f "$cfgfile"; exit 1)
 
    mv -f "$cfgfile" "$ofile" ||
@@ -974,7 +972,6 @@ _lt_linker_boilerplate=`cat conftest.err`
 $RM -r conftest*
 ])# _LT_LINKER_BOILERPLATE
 
-
 # _LT_REQUIRED_DARWIN_CHECKS
 # -------------------------
 m4_defun_once([_LT_REQUIRED_DARWIN_CHECKS],[
@@ -1025,21 +1022,6 @@ m4_defun_once([_LT_REQUIRED_DARWIN_CHECKS],[
 	rm -f conftest.*
       fi])
 
-    # Feature test to disable chained fixups since it is not
-    # compatible with '-undefined dynamic_lookup'
-    AC_CACHE_CHECK([for -no_fixup_chains linker flag],
-      [lt_cv_support_no_fixup_chains],
-      [ save_LDFLAGS=$LDFLAGS
-        LDFLAGS="$LDFLAGS -Wl,-no_fixup_chains"
-        AC_LINK_IFELSE(
-          [AC_LANG_PROGRAM([],[])],
-          lt_cv_support_no_fixup_chains=yes,
-          lt_cv_support_no_fixup_chains=no
-        )
-        LDFLAGS=$save_LDFLAGS
-      ]
-    )
-
     AC_CACHE_CHECK([for -exported_symbols_list linker flag],
       [lt_cv_ld_exported_symbols_list],
       [lt_cv_ld_exported_symbols_list=no
@@ -1059,12 +1041,12 @@ int forced_loaded() { return 2;}
 _LT_EOF
       echo "$LTCC $LTCFLAGS -c -o conftest.o conftest.c" >&AS_MESSAGE_LOG_FD
       $LTCC $LTCFLAGS -c -o conftest.o conftest.c 2>&AS_MESSAGE_LOG_FD
-      echo "$AR $AR_FLAGS libconftest.a conftest.o" >&AS_MESSAGE_LOG_FD
-      $AR $AR_FLAGS libconftest.a conftest.o 2>&AS_MESSAGE_LOG_FD
+      echo "$AR cr libconftest.a conftest.o" >&AS_MESSAGE_LOG_FD
+      $AR cr libconftest.a conftest.o 2>&AS_MESSAGE_LOG_FD
       echo "$RANLIB libconftest.a" >&AS_MESSAGE_LOG_FD
       $RANLIB libconftest.a 2>&AS_MESSAGE_LOG_FD
       cat > conftest.c << _LT_EOF
-int main(void) { return 0;}
+int main() { return 0;}
 _LT_EOF
       echo "$LTCC $LTCFLAGS $LDFLAGS -o conftest conftest.c -Wl,-force_load,./libconftest.a" >&AS_MESSAGE_LOG_FD
       $LTCC $LTCFLAGS $LDFLAGS -o conftest conftest.c -Wl,-force_load,./libconftest.a 2>conftest.err
@@ -1084,16 +1066,17 @@ _LT_EOF
       _lt_dar_allow_undefined='$wl-undefined ${wl}suppress' ;;
     darwin1.*)
       _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;;
-    darwin*)
-      case $MACOSX_DEPLOYMENT_TARGET,$host in
-        10.[[012]],*|,*powerpc*-darwin[[5-8]]*)
-          _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;;
-        *)
-          _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup'
-          if test yes = "$lt_cv_support_no_fixup_chains"; then
-            AS_VAR_APPEND([_lt_dar_allow_undefined], [' $wl-no_fixup_chains'])
-          fi
-        ;;
+    darwin*) # darwin 5.x on
+      # if running on 10.5 or later, the deployment target defaults
+      # to the OS version, if on x86, and 10.4, the deployment
+      # target defaults to 10.4. Don't you love it?
+      case ${MACOSX_DEPLOYMENT_TARGET-10.0},$host in
+	10.0,*86*-darwin8*|10.0,*-darwin[[912]]*)
+	  _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup' ;;
+	10.[[012]][[,.]]*)
+	  _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;;
+	10.*|11.*)
+	  _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup' ;;
       esac
     ;;
   esac
@@ -1142,12 +1125,12 @@ m4_defun([_LT_DARWIN_LINKER_FEATURES],
     output_verbose_link_cmd=func_echo_all
     _LT_TAGVAR(archive_cmds, $1)="\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dsymutil"
     _LT_TAGVAR(module_cmds, $1)="\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dsymutil"
-    _LT_TAGVAR(archive_expsym_cmds, $1)="$SED 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dar_export_syms$_lt_dsymutil"
-    _LT_TAGVAR(module_expsym_cmds, $1)="$SED -e 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dar_export_syms$_lt_dsymutil"
+    _LT_TAGVAR(archive_expsym_cmds, $1)="sed 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dar_export_syms$_lt_dsymutil"
+    _LT_TAGVAR(module_expsym_cmds, $1)="sed -e 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dar_export_syms$_lt_dsymutil"
     m4_if([$1], [CXX],
 [   if test yes != "$lt_cv_apple_cc_single_mod"; then
       _LT_TAGVAR(archive_cmds, $1)="\$CC -r -keep_private_externs -nostdlib -o \$lib-master.o \$libobjs~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$lib-master.o \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring$_lt_dsymutil"
-      _LT_TAGVAR(archive_expsym_cmds, $1)="$SED 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -r -keep_private_externs -nostdlib -o \$lib-master.o \$libobjs~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$lib-master.o \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring$_lt_dar_export_syms$_lt_dsymutil"
+      _LT_TAGVAR(archive_expsym_cmds, $1)="sed 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -r -keep_private_externs -nostdlib -o \$lib-master.o \$libobjs~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$lib-master.o \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring$_lt_dar_export_syms$_lt_dsymutil"
     fi
 ],[])
   else
@@ -1261,8 +1244,7 @@ _LT_DECL([], [ECHO], [1], [An echo program that protects backslashes])
 # _LT_WITH_SYSROOT
 # ----------------
 AC_DEFUN([_LT_WITH_SYSROOT],
-[m4_require([_LT_DECL_SED])dnl
-AC_MSG_CHECKING([for sysroot])
+[AC_MSG_CHECKING([for sysroot])
 AC_ARG_WITH([sysroot],
 [AS_HELP_STRING([--with-sysroot@<:@=DIR@:>@],
   [Search for dependent libraries within DIR (or the compiler's sysroot
@@ -1275,13 +1257,11 @@ lt_sysroot=
 case $with_sysroot in #(
  yes)
    if test yes = "$GCC"; then
-     # Trim trailing / since we'll always append absolute paths and we want
-     # to avoid //, if only for less confusing output for the user.
-     lt_sysroot=`$CC --print-sysroot 2>/dev/null | $SED 's:/\+$::'`
+     lt_sysroot=`$CC --print-sysroot 2>/dev/null`
    fi
    ;; #(
  /*)
-   lt_sysroot=`echo "$with_sysroot" | $SED -e "$sed_quote_subst"`
+   lt_sysroot=`echo "$with_sysroot" | sed -e "$sed_quote_subst"`
    ;; #(
  no|'')
    ;; #(
@@ -1311,7 +1291,7 @@ ia64-*-hpux*)
   # options accordingly.
   echo 'int i;' > conftest.$ac_ext
   if AC_TRY_EVAL(ac_compile); then
-    case `$FILECMD conftest.$ac_objext` in
+    case `/usr/bin/file conftest.$ac_objext` in
       *ELF-32*)
 	HPUX_IA64_MODE=32
 	;;
@@ -1328,7 +1308,7 @@ ia64-*-hpux*)
   echo '[#]line '$LINENO' "configure"' > conftest.$ac_ext
   if AC_TRY_EVAL(ac_compile); then
     if test yes = "$lt_cv_prog_gnu_ld"; then
-      case `$FILECMD conftest.$ac_objext` in
+      case `/usr/bin/file conftest.$ac_objext` in
 	*32-bit*)
 	  LD="${LD-ld} -melf32bsmip"
 	  ;;
@@ -1340,7 +1320,7 @@ ia64-*-hpux*)
 	;;
       esac
     else
-      case `$FILECMD conftest.$ac_objext` in
+      case `/usr/bin/file conftest.$ac_objext` in
 	*32-bit*)
 	  LD="${LD-ld} -32"
 	  ;;
@@ -1362,7 +1342,7 @@ mips64*-*linux*)
   echo '[#]line '$LINENO' "configure"' > conftest.$ac_ext
   if AC_TRY_EVAL(ac_compile); then
     emul=elf
-    case `$FILECMD conftest.$ac_objext` in
+    case `/usr/bin/file conftest.$ac_objext` in
       *32-bit*)
 	emul="${emul}32"
 	;;
@@ -1370,7 +1350,7 @@ mips64*-*linux*)
 	emul="${emul}64"
 	;;
     esac
-    case `$FILECMD conftest.$ac_objext` in
+    case `/usr/bin/file conftest.$ac_objext` in
       *MSB*)
 	emul="${emul}btsmip"
 	;;
@@ -1378,7 +1358,7 @@ mips64*-*linux*)
 	emul="${emul}ltsmip"
 	;;
     esac
-    case `$FILECMD conftest.$ac_objext` in
+    case `/usr/bin/file conftest.$ac_objext` in
       *N32*)
 	emul="${emul}n32"
 	;;
@@ -1389,7 +1369,7 @@ mips64*-*linux*)
   ;;
 
 x86_64-*kfreebsd*-gnu|x86_64-*linux*|powerpc*-*linux*| \
-s390*-*linux*|s390*-*tpf*|sparc*-*linux*|x86_64-gnu*)
+s390*-*linux*|s390*-*tpf*|sparc*-*linux*)
   # Find out what ABI is being produced by ac_compile, and set linker
   # options accordingly.  Note that the listed cases only cover the
   # situations where additional linker options are needed (such as when
@@ -1398,14 +1378,14 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux*|x86_64-gnu*)
   # not appear in the list.
   echo 'int i;' > conftest.$ac_ext
   if AC_TRY_EVAL(ac_compile); then
-    case `$FILECMD conftest.o` in
+    case `/usr/bin/file conftest.o` in
       *32-bit*)
 	case $host in
 	  x86_64-*kfreebsd*-gnu)
 	    LD="${LD-ld} -m elf_i386_fbsd"
 	    ;;
-	  x86_64-*linux*|x86_64-gnu*)
-	    case `$FILECMD conftest.o` in
+	  x86_64-*linux*)
+	    case `/usr/bin/file conftest.o` in
 	      *x86-64*)
 		LD="${LD-ld} -m elf32_x86_64"
 		;;
@@ -1433,7 +1413,7 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux*|x86_64-gnu*)
 	  x86_64-*kfreebsd*-gnu)
 	    LD="${LD-ld} -m elf_x86_64_fbsd"
 	    ;;
-	  x86_64-*linux*|x86_64-gnu*)
+	  x86_64-*linux*)
 	    LD="${LD-ld} -m elf_x86_64"
 	    ;;
 	  powerpcle-*linux*)
@@ -1473,7 +1453,7 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux*|x86_64-gnu*)
   # options accordingly.
   echo 'int i;' > conftest.$ac_ext
   if AC_TRY_EVAL(ac_compile); then
-    case `$FILECMD conftest.o` in
+    case `/usr/bin/file conftest.o` in
     *64-bit*)
       case $lt_cv_prog_gnu_ld in
       yes*)
@@ -1512,22 +1492,9 @@ need_locks=$enable_libtool_lock
 m4_defun([_LT_PROG_AR],
 [AC_CHECK_TOOLS(AR, [ar], false)
 : ${AR=ar}
+: ${AR_FLAGS=cr}
 _LT_DECL([], [AR], [1], [The archiver])
-
-# Use ARFLAGS variable as AR's operation code to sync the variable naming with
-# Automake.  If both AR_FLAGS and ARFLAGS are specified, AR_FLAGS should have
-# higher priority because that's what people were doing historically (setting
-# ARFLAGS for automake and AR_FLAGS for libtool).  FIXME: Make the AR_FLAGS
-# variable obsoleted/removed.
-
-test ${AR_FLAGS+y} || AR_FLAGS=${ARFLAGS-cr}
-lt_ar_flags=$AR_FLAGS
-_LT_DECL([], [lt_ar_flags], [0], [Flags to create an archive (by configure)])
-
-# Make AR_FLAGS overridable by 'make ARFLAGS='.  Don't try to run-time override
-# by AR_FLAGS because that was never working and AR_FLAGS is about to die.
-_LT_DECL([], [AR_FLAGS], [\@S|@{ARFLAGS-"\@S|@lt_ar_flags"}],
-         [Flags to create an archive])
+_LT_DECL([], [AR_FLAGS], [1], [Flags to create an archive])
 
 AC_CACHE_CHECK([for archiver @FILE support], [lt_cv_ar_at_file],
   [lt_cv_ar_at_file=no
@@ -1566,7 +1533,7 @@ AC_CHECK_TOOL(STRIP, strip, :)
 test -z "$STRIP" && STRIP=:
 _LT_DECL([], [STRIP], [1], [A symbol stripping program])
 
-AC_REQUIRE([AC_PROG_RANLIB])
+AC_CHECK_TOOL(RANLIB, ranlib, :)
 test -z "$RANLIB" && RANLIB=:
 _LT_DECL([], [RANLIB], [1],
     [Commands used to install an old-style archive])
@@ -1577,8 +1544,15 @@ old_postinstall_cmds='chmod 644 $oldlib'
 old_postuninstall_cmds=
 
 if test -n "$RANLIB"; then
+  case $host_os in
+  bitrig* | openbsd*)
+    old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB -t \$tool_oldlib"
+    ;;
+  *)
+    old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$tool_oldlib"
+    ;;
+  esac
   old_archive_cmds="$old_archive_cmds~\$RANLIB \$tool_oldlib"
-  old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$tool_oldlib"
 fi
 
 case $host_os in
@@ -1717,7 +1691,7 @@ AC_CACHE_VAL([lt_cv_sys_max_cmd_len], [dnl
     lt_cv_sys_max_cmd_len=-1;
     ;;
 
-  cygwin* | mingw* | windows* | cegcc*)
+  cygwin* | mingw* | cegcc*)
     # On Win9x/ME, this test blows up -- it succeeds, but takes
     # about 5 minutes as the teststring grows exponentially.
     # Worse, since 9x/ME are not pre-emptively multitasking,
@@ -1739,7 +1713,7 @@ AC_CACHE_VAL([lt_cv_sys_max_cmd_len], [dnl
     lt_cv_sys_max_cmd_len=8192;
     ;;
 
-  darwin* | dragonfly* | freebsd* | midnightbsd* | netbsd* | openbsd*)
+  bitrig* | darwin* | dragonfly* | freebsd* | netbsd* | openbsd*)
     # This has been around since 386BSD, at least.  Likely further.
     if test -x /sbin/sysctl; then
       lt_cv_sys_max_cmd_len=`/sbin/sysctl -n kern.argmax`
@@ -1782,7 +1756,7 @@ AC_CACHE_VAL([lt_cv_sys_max_cmd_len], [dnl
   sysv5* | sco5v6* | sysv4.2uw2*)
     kargmax=`grep ARG_MAX /etc/conf/cf.d/stune 2>/dev/null`
     if test -n "$kargmax"; then
-      lt_cv_sys_max_cmd_len=`echo $kargmax | $SED 's/.*[[	 ]]//'`
+      lt_cv_sys_max_cmd_len=`echo $kargmax | sed 's/.*[[	 ]]//'`
     else
       lt_cv_sys_max_cmd_len=32768
     fi
@@ -1899,11 +1873,11 @@ else
 /* When -fvisibility=hidden is used, assume the code has been annotated
    correspondingly for the symbols needed.  */
 #if defined __GNUC__ && (((__GNUC__ == 3) && (__GNUC_MINOR__ >= 3)) || (__GNUC__ > 3))
-int fnord (void) __attribute__((visibility("default")));
+int fnord () __attribute__((visibility("default")));
 #endif
 
-int fnord (void) { return 42; }
-int main (void)
+int fnord () { return 42; }
+int main ()
 {
   void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW);
   int status = $lt_dlunknown;
@@ -1960,7 +1934,7 @@ else
     lt_cv_dlopen_self=yes
     ;;
 
-  mingw* | windows* | pw32* | cegcc*)
+  mingw* | pw32* | cegcc*)
     lt_cv_dlopen=LoadLibrary
     lt_cv_dlopen_libs=
     ;;
@@ -2232,35 +2206,26 @@ m4_defun([_LT_CMD_STRIPLIB],
 striplib=
 old_striplib=
 AC_MSG_CHECKING([whether stripping libraries is possible])
-if test -z "$STRIP"; then
-  AC_MSG_RESULT([no])
+if test -n "$STRIP" && $STRIP -V 2>&1 | $GREP "GNU strip" >/dev/null; then
+  test -z "$old_striplib" && old_striplib="$STRIP --strip-debug"
+  test -z "$striplib" && striplib="$STRIP --strip-unneeded"
+  AC_MSG_RESULT([yes])
 else
-  if $STRIP -V 2>&1 | $GREP "GNU strip" >/dev/null; then
-    old_striplib="$STRIP --strip-debug"
-    striplib="$STRIP --strip-unneeded"
-    AC_MSG_RESULT([yes])
-  else
-    case $host_os in
-    darwin*)
-      # FIXME - insert some real tests, host_os isn't really good enough
+# FIXME - insert some real tests, host_os isn't really good enough
+  case $host_os in
+  darwin*)
+    if test -n "$STRIP"; then
       striplib="$STRIP -x"
       old_striplib="$STRIP -S"
       AC_MSG_RESULT([yes])
-      ;;
-    freebsd*)
-      if $STRIP -V 2>&1 | $GREP "elftoolchain" >/dev/null; then
-        old_striplib="$STRIP --strip-debug"
-        striplib="$STRIP --strip-unneeded"
-        AC_MSG_RESULT([yes])
-      else
-        AC_MSG_RESULT([no])
-      fi
-      ;;
-    *)
+    else
       AC_MSG_RESULT([no])
-      ;;
-    esac
-  fi
+    fi
+    ;;
+  *)
+    AC_MSG_RESULT([no])
+    ;;
+  esac
 fi
 _LT_DECL([], [old_striplib], [1], [Commands to strip libraries])
 _LT_DECL([], [striplib], [1])
@@ -2328,7 +2293,7 @@ if test yes = "$GCC"; then
     *) lt_awk_arg='/^libraries:/' ;;
   esac
   case $host_os in
-    mingw* | windows* | cegcc*) lt_sed_strip_eq='s|=\([[A-Za-z]]:\)|\1|g' ;;
+    mingw* | cegcc*) lt_sed_strip_eq='s|=\([[A-Za-z]]:\)|\1|g' ;;
     *) lt_sed_strip_eq='s|=/|/|g' ;;
   esac
   lt_search_path_spec=`$CC -print-search-dirs | awk $lt_awk_arg | $SED -e "s/^libraries://" -e $lt_sed_strip_eq`
@@ -2386,7 +2351,7 @@ BEGIN {RS = " "; FS = "/|\n";} {
   # AWK program above erroneously prepends '/' to C:/dos/paths
   # for these hosts.
   case $host_os in
-    mingw* | windows* | cegcc*) lt_search_path_spec=`$ECHO "$lt_search_path_spec" |\
+    mingw* | cegcc*) lt_search_path_spec=`$ECHO "$lt_search_path_spec" |\
       $SED 's|/\([[A-Za-z]]:\)|\1|g'` ;;
   esac
   sys_lib_search_path_spec=`$ECHO "$lt_search_path_spec" | $lt_NL2SP`
@@ -2461,7 +2426,7 @@ aix[[4-9]]*)
     # Unfortunately, runtime linking may impact performance, so we do
     # not want this to be the default eventually. Also, we use the
     # versioned .so libs for executables only if there is the -brtl
-    # linker flag in LDFLAGS as well, or --enable-aix-soname=svr4 only.
+    # linker flag in LDFLAGS as well, or --with-aix-soname=svr4 only.
     # To allow for filename-based versioning support, we need to create
     # libNAME.so.V as an archive file, containing:
     # *) an Import File, referring to the versioned filename of the
@@ -2555,7 +2520,7 @@ bsdi[[45]]*)
   # libtool to hard-code these into programs
   ;;
 
-cygwin* | mingw* | windows* | pw32* | cegcc*)
+cygwin* | mingw* | pw32* | cegcc*)
   version_type=windows
   shrext_cmds=.dll
   need_version=no
@@ -2566,19 +2531,6 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
     # gcc
     library_names_spec='$libname.dll.a'
     # DLL is installed to $(libdir)/../bin by postinstall_cmds
-    # If user builds GCC with mulitlibs enabled,
-    # it should just install on $(libdir)
-    # not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones.
-    if test yes = $multilib; then
-    postinstall_cmds='base_file=`basename \$file`~
-      dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~
-      dldir=$destdir/`dirname \$dlpath`~
-      $install_prog $dir/$dlname $destdir/$dlname~
-      chmod a+x $destdir/$dlname~
-      if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
-        eval '\''$striplib $destdir/$dlname'\'' || exit \$?;
-      fi'
-    else
     postinstall_cmds='base_file=`basename \$file`~
       dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~
       dldir=$destdir/`dirname \$dlpath`~
@@ -2588,7 +2540,6 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
       if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
         eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
       fi'
-    fi
     postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~
       dlpath=$dir/\$dldll~
        $RM \$dlpath'
@@ -2597,30 +2548,30 @@ cygwin* | mingw* | windows* | pw32* | cegcc*)
     case $host_os in
     cygwin*)
       # Cygwin DLLs use 'cyg' prefix rather than 'lib'
-      soname_spec='`echo $libname | $SED -e 's/^lib/cyg/'``echo $release | $SED -e 's/[[.]]/-/g'`$versuffix$shared_ext'
+      soname_spec='`echo $libname | sed -e 's/^lib/cyg/'``echo $release | $SED -e 's/[[.]]/-/g'`$versuffix$shared_ext'
 m4_if([$1], [],[
       sys_lib_search_path_spec="$sys_lib_search_path_spec /usr/lib/w32api"])
       ;;
-    mingw* | windows* | cegcc*)
+    mingw* | cegcc*)
       # MinGW DLLs use traditional 'lib' prefix
       soname_spec='$libname`echo $release | $SED -e 's/[[.]]/-/g'`$versuffix$shared_ext'
       ;;
     pw32*)
       # pw32 DLLs use 'pw' prefix rather than 'lib'
-      library_names_spec='`echo $libname | $SED -e 's/^lib/pw/'``echo $release | $SED -e 's/[[.]]/-/g'`$versuffix$shared_ext'
+      library_names_spec='`echo $libname | sed -e 's/^lib/pw/'``echo $release | $SED -e 's/[[.]]/-/g'`$versuffix$shared_ext'
       ;;
     esac
     dynamic_linker='Win32 ld.exe'
     ;;
 
-  *,cl* | *,icl*)
-    # Native MSVC or ICC
+  *,cl*)
+    # Native MSVC
     libname_spec='$name'
     soname_spec='$libname`echo $release | $SED -e 's/[[.]]/-/g'`$versuffix$shared_ext'
     library_names_spec='$libname.dll.lib'
 
     case $build_os in
-    mingw* | windows*)
+    mingw*)
       sys_lib_search_path_spec=
       lt_save_ifs=$IFS
       IFS=';'
@@ -2633,7 +2584,7 @@ m4_if([$1], [],[
       done
       IFS=$lt_save_ifs
       # Convert to MSYS style.
-      sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | $SED -e 's|\\\\|/|g' -e 's| \\([[a-zA-Z]]\\):| /\\1|g' -e 's|^ ||'`
+      sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | sed -e 's|\\\\|/|g' -e 's| \\([[a-zA-Z]]\\):| /\\1|g' -e 's|^ ||'`
       ;;
     cygwin*)
       # Convert to unix form, then to dos form, then back to unix form
@@ -2670,7 +2621,7 @@ m4_if([$1], [],[
     ;;
 
   *)
-    # Assume MSVC and ICC wrapper
+    # Assume MSVC wrapper
     library_names_spec='$libname`echo $release | $SED -e 's/[[.]]/-/g'`$versuffix$shared_ext $libname.lib'
     dynamic_linker='Win32 ld.exe'
     ;;
@@ -2703,7 +2654,7 @@ dgux*)
   shlibpath_var=LD_LIBRARY_PATH
   ;;
 
-freebsd* | dragonfly* | midnightbsd*)
+freebsd* | dragonfly*)
   # DragonFly does not have aout.  When/if they implement a new
   # versioning mechanism, adjust this.
   if test -x /usr/bin/objformat; then
@@ -2727,21 +2678,7 @@ freebsd* | dragonfly* | midnightbsd*)
       need_version=yes
       ;;
   esac
-  case $host_cpu in
-    powerpc64)
-      # On FreeBSD bi-arch platforms, a different variable is used for 32-bit
-      # binaries.  See <https://man.freebsd.org/cgi/man.cgi?query=ld.so>.
-      AC_COMPILE_IFELSE(
-        [AC_LANG_SOURCE(
-           [[int test_pointer_size[sizeof (void *) - 5];
-           ]])],
-        [shlibpath_var=LD_LIBRARY_PATH],
-        [shlibpath_var=LD_32_LIBRARY_PATH])
-      ;;
-    *)
-      shlibpath_var=LD_LIBRARY_PATH
-      ;;
-  esac
+  shlibpath_var=LD_LIBRARY_PATH
   case $host_os in
   freebsd2.*)
     shlibpath_overrides_runpath=yes
@@ -2882,7 +2819,7 @@ linux*android*)
   version_type=none # Android doesn't support versioned libraries.
   need_lib_prefix=no
   need_version=no
-  library_names_spec='$libname$release$shared_ext $libname$shared_ext'
+  library_names_spec='$libname$release$shared_ext'
   soname_spec='$libname$release$shared_ext'
   finish_cmds=
   shlibpath_var=LD_LIBRARY_PATH
@@ -2894,9 +2831,8 @@ linux*android*)
   hardcode_into_libs=yes
 
   dynamic_linker='Android linker'
-  # -rpath works at least for libraries that are not overridden by
-  # libraries installed in system locations.
-  _LT_TAGVAR(hardcode_libdir_flag_spec, $1)='$wl-rpath $wl$libdir'
+  # Don't embed -rpath directories since the linker doesn't support them.
+  _LT_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
   ;;
 
 # This must be glibc/ELF.
@@ -2930,7 +2866,7 @@ linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*)
   # before this can be enabled.
   hardcode_into_libs=yes
 
-  # Ideally, we could use ldconfig to report *all* directories which are
+  # Ideally, we could use ldconfig to report *all* directories which are
   # searched for libraries, however this is still not possible.  Aside from not
   # being certain /sbin/ldconfig is available, command
   # 'ldconfig -N -X -v | grep ^/' on 64bit Fedora does not report /usr/lib64,
@@ -2950,6 +2886,18 @@ linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*)
   dynamic_linker='GNU/Linux ld.so'
   ;;
 
+netbsdelf*-gnu)
+  version_type=linux
+  need_lib_prefix=no
+  need_version=no
+  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
+  soname_spec='${libname}${release}${shared_ext}$major'
+  shlibpath_var=LD_LIBRARY_PATH
+  shlibpath_overrides_runpath=no
+  hardcode_into_libs=yes
+  dynamic_linker='NetBSD ld.elf_so'
+  ;;
+
 netbsd*)
   version_type=sunos
   need_lib_prefix=no
@@ -2987,7 +2935,7 @@ newsos6)
   dynamic_linker='ldqnx.so'
   ;;
 
-openbsd*)
+openbsd* | bitrig*)
   version_type=sunos
   sys_lib_dlsearch_path_spec=/usr/lib
   need_lib_prefix=no
@@ -3319,7 +3267,7 @@ if test yes = "$GCC"; then
   # Check if gcc -print-prog-name=ld gives a path.
   AC_MSG_CHECKING([for ld used by $CC])
   case $host in
-  *-*-mingw* | *-*-windows*)
+  *-*-mingw*)
     # gcc leaves a trailing carriage return, which upsets mingw
     ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;;
   *)
@@ -3428,7 +3376,7 @@ case $reload_flag in
 esac
 reload_cmds='$LD$reload_flag -o $output$reload_objs'
 case $host_os in
-  cygwin* | mingw* | windows* | pw32* | cegcc*)
+  cygwin* | mingw* | pw32* | cegcc*)
     if test yes != "$GCC"; then
       reload_cmds=false
     fi
@@ -3500,6 +3448,7 @@ lt_cv_deplibs_check_method='unknown'
 # 'none' -- dependencies not supported.
 # 'unknown' -- same as none, but documents that we really don't know.
 # 'pass_all' -- all dependencies passed with no checks.
+# 'test_compile' -- check by making test program.
 # 'file_magic [[regex]]' -- check by looking for files in library path
 # that responds to the $file_magic_cmd with a given extended regex.
 # If you have 'file' or equivalent on your system and you're not sure
@@ -3516,7 +3465,7 @@ beos*)
 
 bsdi[[45]]*)
   lt_cv_deplibs_check_method='file_magic ELF [[0-9]][[0-9]]*-bit [[ML]]SB (shared object|dynamic lib)'
-  lt_cv_file_magic_cmd='$FILECMD -L'
+  lt_cv_file_magic_cmd='/usr/bin/file -L'
   lt_cv_file_magic_test_file=/shlib/libc.so
   ;;
 
@@ -3526,7 +3475,7 @@ cygwin*)
   lt_cv_file_magic_cmd='func_win32_libid'
   ;;
 
-mingw* | windows* | pw32*)
+mingw* | pw32*)
   # Base MSYS/MinGW do not provide the 'file' command needed by
   # func_win32_libid shell function, so use a weaker test based on 'objdump',
   # unless we find 'file', for example because we are cross-compiling.
@@ -3535,7 +3484,7 @@ mingw* | windows* | pw32*)
     lt_cv_file_magic_cmd='func_win32_libid'
   else
     # Keep this pattern in sync with the one in func_win32_libid.
-    lt_cv_deplibs_check_method='file_magic file format (pei*-i386(.*architecture: i386)?|pe-arm-wince|pe-x86-64|pe-aarch64)'
+    lt_cv_deplibs_check_method='file_magic file format (pei*-i386(.*architecture: i386)?|pe-arm-wince|pe-x86-64)'
     lt_cv_file_magic_cmd='$OBJDUMP -f'
   fi
   ;;
@@ -3550,14 +3499,14 @@ darwin* | rhapsody*)
   lt_cv_deplibs_check_method=pass_all
   ;;
 
-freebsd* | dragonfly* | midnightbsd*)
+freebsd* | dragonfly*)
   if echo __ELF__ | $CC -E - | $GREP __ELF__ > /dev/null; then
     case $host_cpu in
     i*86 )
       # Not sure whether the presence of OpenBSD here was a mistake.
       # Let's accept both of them until this is cleared up.
       lt_cv_deplibs_check_method='file_magic (FreeBSD|OpenBSD|DragonFly)/i[[3-9]]86 (compact )?demand paged shared library'
-      lt_cv_file_magic_cmd=$FILECMD
+      lt_cv_file_magic_cmd=/usr/bin/file
       lt_cv_file_magic_test_file=`echo /usr/lib/libc.so.*`
       ;;
     esac
@@ -3571,7 +3520,7 @@ haiku*)
   ;;
 
 hpux10.20* | hpux11*)
-  lt_cv_file_magic_cmd=$FILECMD
+  lt_cv_file_magic_cmd=/usr/bin/file
   case $host_cpu in
   ia64*)
     lt_cv_deplibs_check_method='file_magic (s[[0-9]][[0-9]][[0-9]]|ELF-[[0-9]][[0-9]]) shared object file - IA64'
@@ -3608,7 +3557,7 @@ linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*)
   lt_cv_deplibs_check_method=pass_all
   ;;
 
-netbsd*)
+netbsd* | netbsdelf*-gnu)
   if echo __ELF__ | $CC -E - | $GREP __ELF__ > /dev/null; then
     lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so\.[[0-9]]+\.[[0-9]]+|_pic\.a)$'
   else
@@ -3618,7 +3567,7 @@ netbsd*)
 
 newos6*)
   lt_cv_deplibs_check_method='file_magic ELF [[0-9]][[0-9]]*-bit [[ML]]SB (executable|dynamic lib)'
-  lt_cv_file_magic_cmd=$FILECMD
+  lt_cv_file_magic_cmd=/usr/bin/file
   lt_cv_file_magic_test_file=/usr/lib/libnls.so
   ;;
 
@@ -3626,7 +3575,7 @@ newos6*)
   lt_cv_deplibs_check_method=pass_all
   ;;
 
-openbsd*)
+openbsd* | bitrig*)
   if test -z "`echo __ELF__ | $CC -E - | $GREP __ELF__`"; then
     lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so\.[[0-9]]+\.[[0-9]]+|\.so|_pic\.a)$'
   else
@@ -3690,7 +3639,7 @@ file_magic_glob=
 want_nocaseglob=no
 if test "$build" = "$host"; then
   case $host_os in
-  mingw* | windows* | pw32*)
+  mingw* | pw32*)
     if ( shopt | grep nocaseglob ) >/dev/null 2>&1; then
       want_nocaseglob=yes
     else
@@ -3742,16 +3691,16 @@ else
 	# Tru64's nm complains that /dev/null is an invalid object file
 	# MSYS converts /dev/null to NUL, MinGW nm treats NUL as empty
 	case $build_os in
-	mingw* | windows*) lt_bad_file=conftest.nm/nofile ;;
+	mingw*) lt_bad_file=conftest.nm/nofile ;;
 	*) lt_bad_file=/dev/null ;;
 	esac
-	case `"$tmp_nm" -B $lt_bad_file 2>&1 | $SED '1q'` in
+	case `"$tmp_nm" -B $lt_bad_file 2>&1 | sed '1q'` in
 	*$lt_bad_file* | *'Invalid file or object type'*)
 	  lt_cv_path_NM="$tmp_nm -B"
 	  break 2
 	  ;;
 	*)
-	  case `"$tmp_nm" -p /dev/null 2>&1 | $SED '1q'` in
+	  case `"$tmp_nm" -p /dev/null 2>&1 | sed '1q'` in
 	  */dev/null*)
 	    lt_cv_path_NM="$tmp_nm -p"
 	    break 2
@@ -3777,7 +3726,7 @@ else
     # Let the user override the test.
   else
     AC_CHECK_TOOLS(DUMPBIN, [dumpbin "link -dump"], :)
-    case `$DUMPBIN -symbols -headers /dev/null 2>&1 | $SED '1q'` in
+    case `$DUMPBIN -symbols -headers /dev/null 2>&1 | sed '1q'` in
     *COFF*)
       DUMPBIN="$DUMPBIN -symbols -headers"
       ;;
@@ -3833,7 +3782,7 @@ lt_cv_sharedlib_from_linklib_cmd,
 [lt_cv_sharedlib_from_linklib_cmd='unknown'
 
 case $host_os in
-cygwin* | mingw* | windows* | pw32* | cegcc*)
+cygwin* | mingw* | pw32* | cegcc*)
   # two different shell functions defined in ltmain.sh;
   # decide which one to use based on capabilities of $DLLTOOL
   case `$DLLTOOL --help 2>&1` in
@@ -3865,16 +3814,16 @@ _LT_DECL([], [sharedlib_from_linklib_cmd], [1],
 m4_defun([_LT_PATH_MANIFEST_TOOL],
 [AC_CHECK_TOOL(MANIFEST_TOOL, mt, :)
 test -z "$MANIFEST_TOOL" && MANIFEST_TOOL=mt
-AC_CACHE_CHECK([if $MANIFEST_TOOL is a manifest tool], [lt_cv_path_manifest_tool],
-  [lt_cv_path_manifest_tool=no
+AC_CACHE_CHECK([if $MANIFEST_TOOL is a manifest tool], [lt_cv_path_mainfest_tool],
+  [lt_cv_path_mainfest_tool=no
   echo "$as_me:$LINENO: $MANIFEST_TOOL '-?'" >&AS_MESSAGE_LOG_FD
   $MANIFEST_TOOL '-?' 2>conftest.err > conftest.out
   cat conftest.err >&AS_MESSAGE_LOG_FD
   if $GREP 'Manifest Tool' conftest.out > /dev/null; then
-    lt_cv_path_manifest_tool=yes
+    lt_cv_path_mainfest_tool=yes
   fi
   rm -f conftest*])
-if test yes != "$lt_cv_path_manifest_tool"; then
+if test yes != "$lt_cv_path_mainfest_tool"; then
   MANIFEST_TOOL=:
 fi
 _LT_DECL([], [MANIFEST_TOOL], [1], [Manifest tool])dnl
@@ -3903,7 +3852,7 @@ AC_DEFUN([LT_LIB_M],
 [AC_REQUIRE([AC_CANONICAL_HOST])dnl
 LIBM=
 case $host in
-*-*-beos* | *-*-cegcc* | *-*-cygwin* | *-*-haiku* | *-*-mingw* | *-*-pw32* | *-*-darwin*)
+*-*-beos* | *-*-cegcc* | *-*-cygwin* | *-*-haiku* | *-*-pw32* | *-*-darwin*)
   # These system don't have libm, or don't need it
   ;;
 *-ncr-sysv4.3*)
@@ -3978,7 +3927,7 @@ case $host_os in
 aix*)
   symcode='[[BCDT]]'
   ;;
-cygwin* | mingw* | windows* | pw32* | cegcc*)
+cygwin* | mingw* | pw32* | cegcc*)
   symcode='[[ABCDGISTW]]'
   ;;
 hpux*)
@@ -3993,7 +3942,7 @@ osf*)
   symcode='[[BCDEGQRST]]'
   ;;
 solaris*)
-  symcode='[[BCDRT]]'
+  symcode='[[BDRT]]'
   ;;
 sco3.2v5*)
   symcode='[[DT]]'
@@ -4017,7 +3966,7 @@ esac
 
 if test "$lt_cv_nm_interface" = "MS dumpbin"; then
   # Gets list of data symbols to import.
-  lt_cv_sys_global_symbol_to_import="$SED -n -e 's/^I .* \(.*\)$/\1/p'"
+  lt_cv_sys_global_symbol_to_import="sed -n -e 's/^I .* \(.*\)$/\1/p'"
   # Adjust the below global symbol transforms to fixup imported variables.
   lt_cdecl_hook=" -e 's/^I .* \(.*\)$/extern __declspec(dllimport) char \1;/p'"
   lt_c_name_hook=" -e 's/^I .* \(.*\)$/  {\"\1\", (void *) 0},/p'"
@@ -4035,20 +3984,20 @@ fi
 # Transform an extracted symbol line into a proper C declaration.
 # Some systems (esp. on ia64) link data and code symbols differently,
 # so use this general approach.
-lt_cv_sys_global_symbol_to_cdecl="$SED -n"\
+lt_cv_sys_global_symbol_to_cdecl="sed -n"\
 $lt_cdecl_hook\
 " -e 's/^T .* \(.*\)$/extern int \1();/p'"\
 " -e 's/^$symcode$symcode* .* \(.*\)$/extern char \1;/p'"
 
 # Transform an extracted symbol line into symbol name and symbol address
-lt_cv_sys_global_symbol_to_c_name_address="$SED -n"\
+lt_cv_sys_global_symbol_to_c_name_address="sed -n"\
 $lt_c_name_hook\
 " -e 's/^: \(.*\) .*$/  {\"\1\", (void *) 0},/p'"\
 " -e 's/^$symcode$symcode* .* \(.*\)$/  {\"\1\", (void *) \&\1},/p'"
 
 # Transform an extracted symbol line into symbol name with lib prefix and
 # symbol address.
-lt_cv_sys_global_symbol_to_c_name_address_lib_prefix="$SED -n"\
+lt_cv_sys_global_symbol_to_c_name_address_lib_prefix="sed -n"\
 $lt_c_name_lib_hook\
 " -e 's/^: \(.*\) .*$/  {\"\1\", (void *) 0},/p'"\
 " -e 's/^$symcode$symcode* .* \(lib.*\)$/  {\"\1\", (void *) \&\1},/p'"\
@@ -4057,7 +4006,7 @@ $lt_c_name_lib_hook\
 # Handle CRLF in mingw tool chain
 opt_cr=
 case $build_os in
-mingw* | windows*)
+mingw*)
   opt_cr=`$ECHO 'x\{0,1\}' | tr x '\015'` # option cr in regexp
   ;;
 esac
@@ -4072,7 +4021,7 @@ for ac_symprfx in "" "_"; do
   if test "$lt_cv_nm_interface" = "MS dumpbin"; then
     # Fake it for dumpbin and say T for any non-static function,
     # D for any global variable and I for any imported variable.
-    # Also find C++ and __fastcall symbols from MSVC++ or ICC,
+    # Also find C++ and __fastcall symbols from MSVC++,
     # which start with @ or ?.
     lt_cv_sys_global_symbol_pipe="$AWK ['"\
 "     {last_section=section; section=\$ 3};"\
@@ -4090,9 +4039,9 @@ for ac_symprfx in "" "_"; do
 "     s[1]~prfx {split(s[1],t,\"@\"); print f,t[1],substr(t[1],length(prfx))}"\
 "     ' prfx=^$ac_symprfx]"
   else
-    lt_cv_sys_global_symbol_pipe="$SED -n -e 's/^.*[[	 ]]\($symcode$symcode*\)[[	 ]][[	 ]]*$ac_symprfx$sympat$opt_cr$/$symxfrm/p'"
+    lt_cv_sys_global_symbol_pipe="sed -n -e 's/^.*[[	 ]]\($symcode$symcode*\)[[	 ]][[	 ]]*$ac_symprfx$sympat$opt_cr$/$symxfrm/p'"
   fi
-  lt_cv_sys_global_symbol_pipe="$lt_cv_sys_global_symbol_pipe | $SED '/ __gnu_lto/d'"
+  lt_cv_sys_global_symbol_pipe="$lt_cv_sys_global_symbol_pipe | sed '/ __gnu_lto/d'"
 
   # Check to see that the pipe works correctly.
   pipe_works=no
@@ -4108,13 +4057,14 @@ void nm_test_func(void){}
 #ifdef __cplusplus
 }
 #endif
-int main(void){nm_test_var='a';nm_test_func();return(0);}
+int main(){nm_test_var='a';nm_test_func();return(0);}
 _LT_EOF
 
   if AC_TRY_EVAL(ac_compile); then
     # Now try to grab the symbols.
     nlist=conftest.nm
-    if AC_TRY_EVAL(NM conftest.$ac_objext \| "$lt_cv_sys_global_symbol_pipe" \> $nlist) && test -s "$nlist"; then
+    $ECHO "$as_me:$LINENO: $NM conftest.$ac_objext | $lt_cv_sys_global_symbol_pipe > $nlist" >&AS_MESSAGE_LOG_FD
+    if eval "$NM" conftest.$ac_objext \| "$lt_cv_sys_global_symbol_pipe" \> $nlist 2>&AS_MESSAGE_LOG_FD && test -s "$nlist"; then
       # Try sorting and uniquifying the output.
       if sort "$nlist" | uniq > "$nlist"T; then
 	mv -f "$nlist"T "$nlist"
@@ -4284,7 +4234,7 @@ m4_if([$1], [CXX], [
     beos* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*)
       # PIC is the default for these OSes.
       ;;
-    mingw* | windows* | cygwin* | os2* | pw32* | cegcc*)
+    mingw* | cygwin* | os2* | pw32* | cegcc*)
       # This hack is so that the source file can tell whether it is being
       # built for inclusion in a dll (and should export symbols for example).
       # Although the cygwin gcc ignores -fPIC, still need this for old-style
@@ -4360,7 +4310,7 @@ m4_if([$1], [CXX], [
 	  ;;
 	esac
 	;;
-      mingw* | windows* | cygwin* | os2* | pw32* | cegcc*)
+      mingw* | cygwin* | os2* | pw32* | cegcc*)
 	# This hack is so that the source file can tell whether it is being
 	# built for inclusion in a dll (and should export symbols for example).
 	m4_if([$1], [GCJ], [],
@@ -4379,7 +4329,7 @@ m4_if([$1], [CXX], [
 	    ;;
 	esac
 	;;
-      freebsd* | dragonfly* | midnightbsd*)
+      freebsd* | dragonfly*)
 	# FreeBSD uses GNU C++
 	;;
       hpux9* | hpux10* | hpux11*)
@@ -4462,7 +4412,7 @@ m4_if([$1], [CXX], [
 	    _LT_TAGVAR(lt_prog_compiler_static, $1)='-qstaticlink'
 	    ;;
 	  *)
-	    case `$CC -V 2>&1 | $SED 5q` in
+	    case `$CC -V 2>&1 | sed 5q` in
 	    *Sun\ C*)
 	      # Sun C++ 5.9
 	      _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
@@ -4486,7 +4436,7 @@ m4_if([$1], [CXX], [
 	    ;;
 	esac
 	;;
-      netbsd*)
+      netbsd* | netbsdelf*-gnu)
 	;;
       *qnx* | *nto*)
         # QNX uses GNU C++, but need to define -shared option too, otherwise
@@ -4608,7 +4558,7 @@ m4_if([$1], [CXX], [
       # PIC is the default for these OSes.
       ;;
 
-    mingw* | windows* | cygwin* | pw32* | os2* | cegcc*)
+    mingw* | cygwin* | pw32* | os2* | cegcc*)
       # This hack is so that the source file can tell whether it is being
       # built for inclusion in a dll (and should export symbols for example).
       # Although the cygwin gcc ignores -fPIC, still need this for old-style
@@ -4712,7 +4662,7 @@ m4_if([$1], [CXX], [
       esac
       ;;
 
-    mingw* | windows* | cygwin* | pw32* | os2* | cegcc*)
+    mingw* | cygwin* | pw32* | os2* | cegcc*)
       # This hack is so that the source file can tell whether it is being
       # built for inclusion in a dll (and should export symbols for example).
       m4_if([$1], [GCJ], [],
@@ -4754,8 +4704,8 @@ m4_if([$1], [CXX], [
 	_LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
 	_LT_TAGVAR(lt_prog_compiler_static, $1)='-static'
         ;;
-      *flang* | ftn)
-        # Flang compiler.
+      # flang / f18. f95 an alias for gfortran or flang on Debian
+      flang* | f18* | f95*)
 	_LT_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
 	_LT_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC'
 	_LT_TAGVAR(lt_prog_compiler_static, $1)='-static'
@@ -4804,7 +4754,7 @@ m4_if([$1], [CXX], [
 	_LT_TAGVAR(lt_prog_compiler_static, $1)='-qstaticlink'
 	;;
       *)
-	case `$CC -V 2>&1 | $SED 5q` in
+	case `$CC -V 2>&1 | sed 5q` in
 	*Sun\ Ceres\ Fortran* | *Sun*Fortran*\ [[1-7]].* | *Sun*Fortran*\ 8.[[0-3]]*)
 	  # Sun Fortran 8.3 passes all unrecognized flags to the linker
 	  _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
@@ -4987,15 +4937,15 @@ m4_if([$1], [CXX], [
     if $NM -V 2>&1 | $GREP 'GNU' > /dev/null; then
       _LT_TAGVAR(export_symbols_cmds, $1)='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "W")) && ([substr](\$ 3,1,1) != ".")) { if (\$ 2 == "W") { print \$ 3 " weak" } else { print \$ 3 } } }'\'' | sort -u > $export_symbols'
     else
-      _LT_TAGVAR(export_symbols_cmds, $1)='`func_echo_all $NM | $SED -e '\''s/B\([[^B]]*\)$/P\1/'\''` -PCpgl $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "L") || (\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) && ([substr](\$ 1,1,1) != ".")) { if ((\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) { print \$ 1 " weak" } else { print \$ 1 } } }'\'' | sort -u > $export_symbols'
+      _LT_TAGVAR(export_symbols_cmds, $1)='`func_echo_all $NM | $SED -e '\''s/B\([[^B]]*\)$/P\1/'\''` -PCpgl $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) && ([substr](\$ 1,1,1) != ".")) { if ((\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) { print \$ 1 " weak" } else { print \$ 1 } } }'\'' | sort -u > $export_symbols'
     fi
     ;;
   pw32*)
     _LT_TAGVAR(export_symbols_cmds, $1)=$ltdll_cmds
     ;;
-  cygwin* | mingw* | windows* | cegcc*)
+  cygwin* | mingw* | cegcc*)
     case $cc_basename in
-    cl* | icl*)
+    cl*)
       _LT_TAGVAR(exclude_expsyms, $1)='_NULL_IMPORT_DESCRIPTOR|_IMPORT_DESCRIPTOR_.*'
       ;;
     *)
@@ -5004,6 +4954,9 @@ m4_if([$1], [CXX], [
       ;;
     esac
     ;;
+  linux* | k*bsd*-gnu | gnu*)
+    _LT_TAGVAR(link_all_deplibs, $1)=no
+    ;;
   *)
     _LT_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols'
     ;;
@@ -5051,21 +5004,24 @@ dnl Note also adjust exclude_expsyms for C++ above.
   extract_expsyms_cmds=
 
   case $host_os in
-  cygwin* | mingw* | windows* | pw32* | cegcc*)
-    # FIXME: the MSVC++ and ICC port hasn't been tested in a loooong time
+  cygwin* | mingw* | pw32* | cegcc*)
+    # FIXME: the MSVC++ port hasn't been tested in a loooong time
     # When not using gcc, we currently assume that we are using
-    # Microsoft Visual C++ or Intel C++ Compiler.
+    # Microsoft Visual C++.
     if test yes != "$GCC"; then
       with_gnu_ld=no
     fi
     ;;
   interix*)
-    # we just hope/assume this is gcc and not c89 (= MSVC++ or ICC)
+    # we just hope/assume this is gcc and not c89 (= MSVC++)
     with_gnu_ld=yes
     ;;
-  openbsd*)
+  openbsd* | bitrig*)
     with_gnu_ld=no
     ;;
+  linux* | k*bsd*-gnu | gnu*)
+    _LT_TAGVAR(link_all_deplibs, $1)=no
+    ;;
   esac
 
   _LT_TAGVAR(ld_shlibs, $1)=yes
@@ -5112,7 +5068,7 @@ dnl Note also adjust exclude_expsyms for C++ above.
       _LT_TAGVAR(whole_archive_flag_spec, $1)=
     fi
     supports_anon_versioning=no
-    case `$LD -v | $SED -e 's/([[^)]]\+)\s\+//' 2>&1` in
+    case `$LD -v | $SED -e 's/([^)]\+)\s\+//' 2>&1` in
       *GNU\ gold*) supports_anon_versioning=yes ;;
       *\ [[01]].* | *\ 2.[[0-9]].* | *\ 2.10.*) ;; # catch versions < 2.11
       *\ 2.11.93.0.2\ *) supports_anon_versioning=yes ;; # RH7.3 ...
@@ -5166,7 +5122,7 @@ _LT_EOF
       fi
       ;;
 
-    cygwin* | mingw* | windows* | pw32* | cegcc*)
+    cygwin* | mingw* | pw32* | cegcc*)
       # _LT_TAGVAR(hardcode_libdir_flag_spec, $1) is actually meaningless,
       # as there is no search path for DLLs.
       _LT_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
@@ -5222,9 +5178,8 @@ _LT_EOF
 	cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~
 	$CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~
 	emximp -o $lib $output_objdir/$libname.def'
-      _LT_TAGVAR(old_archive_from_new_cmds, $1)='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
+      _LT_TAGVAR(old_archive_From_new_cmds, $1)='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
       _LT_TAGVAR(enable_shared_with_static_runtimes, $1)=yes
-      _LT_TAGVAR(file_list_spec, $1)='@'
       ;;
 
     interix[[3-9]]*)
@@ -5239,7 +5194,7 @@ _LT_EOF
       # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link
       # time.  Moving up from 0x10000000 also allows more sbrk(2) space.
       _LT_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
-      _LT_TAGVAR(archive_expsym_cmds, $1)='$SED "s|^|_|" $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--retain-symbols-file,$output_objdir/$soname.expsym $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
+      _LT_TAGVAR(archive_expsym_cmds, $1)='sed "s|^|_|" $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--retain-symbols-file,$output_objdir/$soname.expsym $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
       ;;
 
     gnu* | linux* | tpf* | k*bsd*-gnu | kopensolaris*-gnu)
@@ -5282,7 +5237,7 @@ _LT_EOF
 	  _LT_TAGVAR(compiler_needs_object, $1)=yes
 	  ;;
 	esac
-	case `$CC -V 2>&1 | $SED 5q` in
+	case `$CC -V 2>&1 | sed 5q` in
 	*Sun\ C*)			# Sun C 5.9
 	  _LT_TAGVAR(whole_archive_flag_spec, $1)='$wl--whole-archive`new_convenience=; for conv in $convenience\"\"; do test -z \"$conv\" || new_convenience=\"$new_convenience,$conv\"; done; func_echo_all \"$new_convenience\"` $wl--no-whole-archive'
 	  _LT_TAGVAR(compiler_needs_object, $1)=yes
@@ -5294,7 +5249,7 @@ _LT_EOF
 
         if test yes = "$supports_anon_versioning"; then
           _LT_TAGVAR(archive_expsym_cmds, $1)='echo "{ global:" > $output_objdir/$libname.ver~
-            cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~
+            cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~
             echo "local: *; };" >> $output_objdir/$libname.ver~
             $CC '"$tmp_sharedflag""$tmp_addflag"' $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-version-script $wl$output_objdir/$libname.ver -o $lib'
         fi
@@ -5310,7 +5265,7 @@ _LT_EOF
 	  _LT_TAGVAR(archive_cmds, $1)='$LD -shared $libobjs $deplibs $linker_flags -soname $soname -o $lib'
 	  if test yes = "$supports_anon_versioning"; then
 	    _LT_TAGVAR(archive_expsym_cmds, $1)='echo "{ global:" > $output_objdir/$libname.ver~
-              cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~
+              cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~
               echo "local: *; };" >> $output_objdir/$libname.ver~
               $LD -shared $libobjs $deplibs $linker_flags -soname $soname -version-script $output_objdir/$libname.ver -o $lib'
 	  fi
@@ -5321,7 +5276,7 @@ _LT_EOF
       fi
       ;;
 
-    netbsd*)
+    netbsd* | netbsdelf*-gnu)
       if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then
 	_LT_TAGVAR(archive_cmds, $1)='$LD -Bshareable $libobjs $deplibs $linker_flags -o $lib'
 	wlarc=
@@ -5442,7 +5397,7 @@ _LT_EOF
 	if $NM -V 2>&1 | $GREP 'GNU' > /dev/null; then
 	  _LT_TAGVAR(export_symbols_cmds, $1)='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "W")) && ([substr](\$ 3,1,1) != ".")) { if (\$ 2 == "W") { print \$ 3 " weak" } else { print \$ 3 } } }'\'' | sort -u > $export_symbols'
 	else
-	  _LT_TAGVAR(export_symbols_cmds, $1)='`func_echo_all $NM | $SED -e '\''s/B\([[^B]]*\)$/P\1/'\''` -PCpgl $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "L") || (\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) && ([substr](\$ 1,1,1) != ".")) { if ((\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) { print \$ 1 " weak" } else { print \$ 1 } } }'\'' | sort -u > $export_symbols'
+	  _LT_TAGVAR(export_symbols_cmds, $1)='`func_echo_all $NM | $SED -e '\''s/B\([[^B]]*\)$/P\1/'\''` -PCpgl $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) && ([substr](\$ 1,1,1) != ".")) { if ((\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) { print \$ 1 " weak" } else { print \$ 1 } } }'\'' | sort -u > $export_symbols'
 	fi
 	aix_use_runtimelinking=no
 
@@ -5623,14 +5578,14 @@ _LT_EOF
       _LT_TAGVAR(export_dynamic_flag_spec, $1)=-rdynamic
       ;;
 
-    cygwin* | mingw* | windows* | pw32* | cegcc*)
+    cygwin* | mingw* | pw32* | cegcc*)
       # When not using gcc, we currently assume that we are using
-      # Microsoft Visual C++ or Intel C++ Compiler.
+      # Microsoft Visual C++.
       # hardcode_libdir_flag_spec is actually meaningless, as there is
       # no search path for DLLs.
       case $cc_basename in
-      cl* | icl*)
-	# Native MSVC or ICC
+      cl*)
+	# Native MSVC
 	_LT_TAGVAR(hardcode_libdir_flag_spec, $1)=' '
 	_LT_TAGVAR(allow_undefined_flag, $1)=unsupported
 	_LT_TAGVAR(always_export_symbols, $1)=yes
@@ -5640,14 +5595,14 @@ _LT_EOF
 	# Tell ltmain to make .dll files, not .so files.
 	shrext_cmds=.dll
 	# FIXME: Setting linknames here is a bad hack.
-	_LT_TAGVAR(archive_cmds, $1)='$CC -Fe $output_objdir/$soname $libobjs $compiler_flags $deplibs -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~linknames='
+	_LT_TAGVAR(archive_cmds, $1)='$CC -o $output_objdir/$soname $libobjs $compiler_flags $deplibs -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~linknames='
 	_LT_TAGVAR(archive_expsym_cmds, $1)='if _LT_DLL_DEF_P([$export_symbols]); then
             cp "$export_symbols" "$output_objdir/$soname.def";
             echo "$tool_output_objdir$soname.def" > "$output_objdir/$soname.exp";
           else
             $SED -e '\''s/^/-link -EXPORT:/'\'' < $export_symbols > $output_objdir/$soname.exp;
           fi~
-          $CC -Fe $tool_output_objdir$soname $libobjs $compiler_flags $deplibs "@$tool_output_objdir$soname.exp" -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~
+          $CC -o $tool_output_objdir$soname $libobjs $compiler_flags $deplibs "@$tool_output_objdir$soname.exp" -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~
           linknames='
 	# The linker will not automatically build a static lib if we build a DLL.
 	# _LT_TAGVAR(old_archive_from_new_cmds, $1)='true'
@@ -5671,7 +5626,7 @@ _LT_EOF
           fi'
 	;;
       *)
-	# Assume MSVC and ICC wrapper
+	# Assume MSVC wrapper
 	_LT_TAGVAR(hardcode_libdir_flag_spec, $1)=' '
 	_LT_TAGVAR(allow_undefined_flag, $1)=unsupported
 	# Tell ltmain to make .lib files, not .a files.
@@ -5719,7 +5674,7 @@ _LT_EOF
       ;;
 
     # FreeBSD 3 and greater uses gcc -shared to do shared libraries.
-    freebsd* | dragonfly* | midnightbsd*)
+    freebsd* | dragonfly*)
       _LT_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags'
       _LT_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir'
       _LT_TAGVAR(hardcode_direct, $1)=yes
@@ -5842,6 +5797,7 @@ _LT_EOF
 	if test yes = "$lt_cv_irix_exported_symbol"; then
           _LT_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations $wl-exports_file $wl$export_symbols -o $lib'
 	fi
+	_LT_TAGVAR(link_all_deplibs, $1)=no
       else
 	_LT_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib'
 	_LT_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -exports_file $export_symbols -o $lib'
@@ -5863,7 +5819,7 @@ _LT_EOF
       esac
       ;;
 
-    netbsd*)
+    netbsd* | netbsdelf*-gnu)
       if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then
 	_LT_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags'  # a.out
       else
@@ -5885,7 +5841,7 @@ _LT_EOF
     *nto* | *qnx*)
       ;;
 
-    openbsd*)
+    openbsd* | bitrig*)
       if test -f /usr/libexec/ld.so; then
 	_LT_TAGVAR(hardcode_direct, $1)=yes
 	_LT_TAGVAR(hardcode_shlibpath_var, $1)=no
@@ -5928,9 +5884,8 @@ _LT_EOF
 	cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~
 	$CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~
 	emximp -o $lib $output_objdir/$libname.def'
-      _LT_TAGVAR(old_archive_from_new_cmds, $1)='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
+      _LT_TAGVAR(old_archive_From_new_cmds, $1)='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
       _LT_TAGVAR(enable_shared_with_static_runtimes, $1)=yes
-      _LT_TAGVAR(file_list_spec, $1)='@'
       ;;
 
     osf3*)
@@ -6222,7 +6177,7 @@ _LT_TAGDECL([], [hardcode_direct], [0],
 _LT_TAGDECL([], [hardcode_direct_absolute], [0],
     [Set to "yes" if using DIR/libNAME$shared_ext during linking hardcodes
     DIR into the resulting binary and the resulting library dependency is
-    "absolute", i.e. impossible to change by setting $shlibpath_var if the
+    "absolute", i.e impossible to change by setting $shlibpath_var if the
     library is relocated])
 _LT_TAGDECL([], [hardcode_minus_L], [0],
     [Set to "yes" if using the -LDIR flag during linking hardcodes DIR
@@ -6280,7 +6235,7 @@ _LT_TAGVAR(objext, $1)=$objext
 lt_simple_compile_test_code="int some_variable = 0;"
 
 # Code to be used in simple link tests
-lt_simple_link_test_code='int main(void){return(0);}'
+lt_simple_link_test_code='int main(){return(0);}'
 
 _LT_TAG_COMPILER
 # Save the default compiler, since it gets overwritten when the other
@@ -6469,7 +6424,8 @@ if test yes != "$_lt_caught_CXX_error"; then
         wlarc='$wl'
 
         # ancient GNU ld didn't support --whole-archive et. al.
-        if $LD --help 2>&1 | $GREP 'no-whole-archive' > /dev/null; then
+        if eval "`$CC -print-prog-name=ld` --help 2>&1" |
+	  $GREP 'no-whole-archive' > /dev/null; then
           _LT_TAGVAR(whole_archive_flag_spec, $1)=$wlarc'--whole-archive$convenience '$wlarc'--no-whole-archive'
         else
           _LT_TAGVAR(whole_archive_flag_spec, $1)=
@@ -6489,7 +6445,7 @@ if test yes != "$_lt_caught_CXX_error"; then
       # Commands to make compiler produce verbose output that lists
       # what "hidden" libraries, object files and flags are used when
       # linking a shared library.
-      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[[-]]L"'
+      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP " \-L"'
 
     else
       GXX=no
@@ -6698,10 +6654,10 @@ if test yes != "$_lt_caught_CXX_error"; then
         esac
         ;;
 
-      cygwin* | mingw* | windows* | pw32* | cegcc*)
+      cygwin* | mingw* | pw32* | cegcc*)
 	case $GXX,$cc_basename in
-	,cl* | no,cl* | ,icl* | no,icl*)
-	  # Native MSVC or ICC
+	,cl* | no,cl*)
+	  # Native MSVC
 	  # hardcode_libdir_flag_spec is actually meaningless, as there is
 	  # no search path for DLLs.
 	  _LT_TAGVAR(hardcode_libdir_flag_spec, $1)=' '
@@ -6797,9 +6753,8 @@ if test yes != "$_lt_caught_CXX_error"; then
 	  cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~
 	  $CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~
 	  emximp -o $lib $output_objdir/$libname.def'
-	_LT_TAGVAR(old_archive_from_new_cmds, $1)='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
+	_LT_TAGVAR(old_archive_From_new_cmds, $1)='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def'
 	_LT_TAGVAR(enable_shared_with_static_runtimes, $1)=yes
-	_LT_TAGVAR(file_list_spec, $1)='@'
 	;;
 
       dgux*)
@@ -6830,7 +6785,7 @@ if test yes != "$_lt_caught_CXX_error"; then
         _LT_TAGVAR(archive_cmds_need_lc, $1)=no
         ;;
 
-      freebsd* | dragonfly* | midnightbsd*)
+      freebsd* | dragonfly*)
         # FreeBSD 3 and later use GNU C++ and GNU ld with standard ELF
         # conventions
         _LT_TAGVAR(ld_shlibs, $1)=yes
@@ -6865,7 +6820,7 @@ if test yes != "$_lt_caught_CXX_error"; then
             # explicitly linking system object files so we need to strip them
             # from the output so that they don't get included in the library
             # dependencies.
-            output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $EGREP "[[-]]L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
+            output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $EGREP " \-L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
             ;;
           *)
             if test yes = "$GXX"; then
@@ -6930,7 +6885,7 @@ if test yes != "$_lt_caught_CXX_error"; then
 	    # explicitly linking system object files so we need to strip them
 	    # from the output so that they don't get included in the library
 	    # dependencies.
-	    output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $GREP "[[-]]L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
+	    output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $GREP " \-L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"'
 	    ;;
           *)
 	    if test yes = "$GXX"; then
@@ -6967,7 +6922,7 @@ if test yes != "$_lt_caught_CXX_error"; then
 	# 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link
 	# time.  Moving up from 0x10000000 also allows more sbrk(2) space.
 	_LT_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
-	_LT_TAGVAR(archive_expsym_cmds, $1)='$SED "s|^|_|" $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--retain-symbols-file,$output_objdir/$soname.expsym $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
+	_LT_TAGVAR(archive_expsym_cmds, $1)='sed "s|^|_|" $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--retain-symbols-file,$output_objdir/$soname.expsym $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
 	;;
       irix5* | irix6*)
         case $cc_basename in
@@ -7107,13 +7062,13 @@ if test yes != "$_lt_caught_CXX_error"; then
 	    _LT_TAGVAR(archive_cmds, $1)='$CC -qmkshrobj $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib'
 	    if test yes = "$supports_anon_versioning"; then
 	      _LT_TAGVAR(archive_expsym_cmds, $1)='echo "{ global:" > $output_objdir/$libname.ver~
-                cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~
+                cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~
                 echo "local: *; };" >> $output_objdir/$libname.ver~
                 $CC -qmkshrobj $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-version-script $wl$output_objdir/$libname.ver -o $lib'
 	    fi
 	    ;;
 	  *)
-	    case `$CC -V 2>&1 | $SED 5q` in
+	    case `$CC -V 2>&1 | sed 5q` in
 	    *Sun\ C*)
 	      # Sun C++ 5.9
 	      _LT_TAGVAR(no_undefined_flag, $1)=' -zdefs'
@@ -7178,7 +7133,7 @@ if test yes != "$_lt_caught_CXX_error"; then
         _LT_TAGVAR(ld_shlibs, $1)=yes
 	;;
 
-      openbsd*)
+      openbsd* | bitrig*)
 	if test -f /usr/libexec/ld.so; then
 	  _LT_TAGVAR(hardcode_direct, $1)=yes
 	  _LT_TAGVAR(hardcode_shlibpath_var, $1)=no
@@ -7269,7 +7224,7 @@ if test yes != "$_lt_caught_CXX_error"; then
 	      # Commands to make compiler produce verbose output that lists
 	      # what "hidden" libraries, object files and flags are used when
 	      # linking a shared library.
-	      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[[-]]L"'
+	      output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP " \-L"'
 
 	    else
 	      # FIXME: insert proper C++ library support
@@ -7353,7 +7308,7 @@ if test yes != "$_lt_caught_CXX_error"; then
 	        # Commands to make compiler produce verbose output that lists
 	        # what "hidden" libraries, object files and flags are used when
 	        # linking a shared library.
-	        output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[[-]]L"'
+	        output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP " \-L"'
 	      else
 	        # g++ 2.7 appears to require '-G' NOT '-shared' on this
 	        # platform.
@@ -7364,7 +7319,7 @@ if test yes != "$_lt_caught_CXX_error"; then
 	        # Commands to make compiler produce verbose output that lists
 	        # what "hidden" libraries, object files and flags are used when
 	        # linking a shared library.
-	        output_verbose_link_cmd='$CC -G $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "[[-]]L"'
+	        output_verbose_link_cmd='$CC -G $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP " \-L"'
 	      fi
 
 	      _LT_TAGVAR(hardcode_libdir_flag_spec, $1)='$wl-R $wl$libdir'
@@ -7602,11 +7557,10 @@ if AC_TRY_EVAL(ac_compile); then
     case $prev$p in
 
     -L* | -R* | -l*)
-       # Some compilers place space between "-{L,R,l}" and the path.
+       # Some compilers place space between "-{L,R}" and the path.
        # Remove the space.
-       if test x-L = x"$p" ||
-          test x-R = x"$p" ||
-          test x-l = x"$p"; then
+       if test x-L = "$p" ||
+          test x-R = "$p"; then
 	 prev=$p
 	 continue
        fi
@@ -8260,14 +8214,6 @@ _LT_DECL([], [DLLTOOL], [1], [DLL creation program])
 AC_SUBST([DLLTOOL])
 ])
 
-# _LT_DECL_FILECMD
-# ----------------
-# Check for a file(cmd) program that can be used to detect file type and magic
-m4_defun([_LT_DECL_FILECMD],
-[AC_CHECK_PROG([FILECMD], [file], [file], [:])
-_LT_DECL([], [FILECMD], [1], [A file(cmd) program that detects file types])
-])# _LD_DECL_FILECMD
-
 # _LT_DECL_SED
 # ------------
 # Check for a fully-functional sed program, that truncates
@@ -8280,6 +8226,73 @@ _LT_DECL([], [SED], [1], [A sed program that does not truncate output])
 _LT_DECL([], [Xsed], ["\$SED -e 1s/^X//"],
     [Sed that helps us avoid accidentally triggering echo(1) options like -n])
 ])# _LT_DECL_SED
+
+m4_ifndef([AC_PROG_SED], [
+############################################################
+# NOTE: This macro has been submitted for inclusion into   #
+#  GNU Autoconf as AC_PROG_SED.  When it is available in   #
+#  a released version of Autoconf we should remove this    #
+#  macro and use it instead.                               #
+############################################################
+
+m4_defun([AC_PROG_SED],
+[AC_MSG_CHECKING([for a sed that does not truncate output])
+AC_CACHE_VAL(lt_cv_path_SED,
+[# Loop through the user's path and test for sed and gsed.
+# Then use that list of sed's as ones to test for truncation.
+as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH
+do
+  IFS=$as_save_IFS
+  test -z "$as_dir" && as_dir=.
+  for lt_ac_prog in sed gsed; do
+    for ac_exec_ext in '' $ac_executable_extensions; do
+      if $as_executable_p "$as_dir/$lt_ac_prog$ac_exec_ext"; then
+        lt_ac_sed_list="$lt_ac_sed_list $as_dir/$lt_ac_prog$ac_exec_ext"
+      fi
+    done
+  done
+done
+IFS=$as_save_IFS
+lt_ac_max=0
+lt_ac_count=0
+# Add /usr/xpg4/bin/sed as it is typically found on Solaris
+# along with /bin/sed that truncates output.
+for lt_ac_sed in $lt_ac_sed_list /usr/xpg4/bin/sed; do
+  test ! -f "$lt_ac_sed" && continue
+  cat /dev/null > conftest.in
+  lt_ac_count=0
+  echo $ECHO_N "0123456789$ECHO_C" >conftest.in
+  # Check for GNU sed and select it if it is found.
+  if "$lt_ac_sed" --version 2>&1 < /dev/null | grep 'GNU' > /dev/null; then
+    lt_cv_path_SED=$lt_ac_sed
+    break
+  fi
+  while true; do
+    cat conftest.in conftest.in >conftest.tmp
+    mv conftest.tmp conftest.in
+    cp conftest.in conftest.nl
+    echo >>conftest.nl
+    $lt_ac_sed -e 's/a$//' < conftest.nl >conftest.out || break
+    cmp -s conftest.out conftest.nl || break
+    # 10000 chars as input seems more than enough
+    test 10 -lt "$lt_ac_count" && break
+    lt_ac_count=`expr $lt_ac_count + 1`
+    if test "$lt_ac_count" -gt "$lt_ac_max"; then
+      lt_ac_max=$lt_ac_count
+      lt_cv_path_SED=$lt_ac_sed
+    fi
+  done
+done
+])
+SED=$lt_cv_path_SED
+AC_SUBST([SED])
+AC_MSG_RESULT([$SED])
+])#AC_PROG_SED
+])#m4_ifndef
+
+# Old name:
+AU_ALIAS([LT_AC_PROG_SED], [AC_PROG_SED])
 dnl aclocal-1.4 backwards compatibility:
 dnl AC_DEFUN([LT_AC_PROG_SED], [])
 
@@ -8326,7 +8339,7 @@ AC_CACHE_VAL(lt_cv_to_host_file_cmd,
 [case $host in
   *-*-mingw* )
     case $build in
-      *-*-mingw* | *-*-windows* ) # actually msys
+      *-*-mingw* ) # actually msys
         lt_cv_to_host_file_cmd=func_convert_file_msys_to_w32
         ;;
       *-*-cygwin* )
@@ -8339,7 +8352,7 @@ AC_CACHE_VAL(lt_cv_to_host_file_cmd,
     ;;
   *-*-cygwin* )
     case $build in
-      *-*-mingw* | *-*-windows* ) # actually msys
+      *-*-mingw* ) # actually msys
         lt_cv_to_host_file_cmd=func_convert_file_msys_to_cygwin
         ;;
       *-*-cygwin* )
@@ -8365,9 +8378,9 @@ AC_CACHE_VAL(lt_cv_to_tool_file_cmd,
 [#assume ordinary cross tools, or native build.
 lt_cv_to_tool_file_cmd=func_convert_file_noop
 case $host in
-  *-*-mingw* | *-*-windows* )
+  *-*-mingw* )
     case $build in
-      *-*-mingw* | *-*-windows* ) # actually msys
+      *-*-mingw* ) # actually msys
         lt_cv_to_tool_file_cmd=func_convert_file_msys_to_w32
         ;;
     esac
diff --git a/deps/cares/m4/ltoptions.m4 b/deps/cares/m4/ltoptions.m4
old mode 100644
new mode 100755
index 25caa890298a4e..94b082976667c0
--- a/deps/cares/m4/ltoptions.m4
+++ b/deps/cares/m4/ltoptions.m4
@@ -1,14 +1,14 @@
 # Helper functions for option handling.                    -*- Autoconf -*-
 #
-#   Copyright (C) 2004-2005, 2007-2009, 2011-2019, 2021-2024 Free
-#   Software Foundation, Inc.
+#   Copyright (C) 2004-2005, 2007-2009, 2011-2015 Free Software
+#   Foundation, Inc.
 #   Written by Gary V. Vaughan, 2004
 #
 # This file is free software; the Free Software Foundation gives
 # unlimited permission to copy and/or distribute it, with or without
 # modifications, as long as this notice is preserved.
 
-# serial 10 ltoptions.m4
+# serial 8 ltoptions.m4
 
 # This is to help aclocal find these macros, as it can't see m4_define.
 AC_DEFUN([LTOPTIONS_VERSION], [m4_if([1])])
@@ -128,7 +128,7 @@ LT_OPTION_DEFINE([LT_INIT], [win32-dll],
 [enable_win32_dll=yes
 
 case $host in
-*-*-cygwin* | *-*-mingw* | *-*-windows* | *-*-pw32* | *-*-cegcc*)
+*-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-cegcc*)
   AC_CHECK_TOOL(AS, as, false)
   AC_CHECK_TOOL(DLLTOOL, dlltool, false)
   AC_CHECK_TOOL(OBJDUMP, objdump, false)
@@ -323,39 +323,29 @@ dnl AC_DEFUN([AM_DISABLE_FAST_INSTALL], [])
 
 # _LT_WITH_AIX_SONAME([DEFAULT])
 # ----------------------------------
-# implement the --enable-aix-soname configure option, and support the
-# `aix-soname=aix' and `aix-soname=both' and `aix-soname=svr4' LT_INIT options.
-# DEFAULT is either `aix', `both', or `svr4'.  If omitted, it defaults to `aix'.
+# implement the --with-aix-soname flag, and support the `aix-soname=aix'
+# and `aix-soname=both' and `aix-soname=svr4' LT_INIT options. DEFAULT
+# is either `aix', `both' or `svr4'.  If omitted, it defaults to `aix'.
 m4_define([_LT_WITH_AIX_SONAME],
 [m4_define([_LT_WITH_AIX_SONAME_DEFAULT], [m4_if($1, svr4, svr4, m4_if($1, both, both, aix))])dnl
 shared_archive_member_spec=
 case $host,$enable_shared in
 power*-*-aix[[5-9]]*,yes)
   AC_MSG_CHECKING([which variant of shared library versioning to provide])
-  AC_ARG_ENABLE([aix-soname],
-    [AS_HELP_STRING([--enable-aix-soname=aix|svr4|both],
+  AC_ARG_WITH([aix-soname],
+    [AS_HELP_STRING([--with-aix-soname=aix|svr4|both],
       [shared library versioning (aka "SONAME") variant to provide on AIX, @<:@default=]_LT_WITH_AIX_SONAME_DEFAULT[@:>@.])],
-    [case $enableval in
-     aix|svr4|both)
-       ;;
-     *)
-       AC_MSG_ERROR([Unknown argument to --enable-aix-soname])
-       ;;
-     esac
-     lt_cv_with_aix_soname=$enable_aix_soname],
-    [_AC_ENABLE_IF([with], [aix-soname],
-        [case $withval in
-         aix|svr4|both)
-           ;;
-         *)
-           AC_MSG_ERROR([Unknown argument to --with-aix-soname])
-           ;;
-         esac
-         lt_cv_with_aix_soname=$with_aix_soname],
-        [AC_CACHE_VAL([lt_cv_with_aix_soname],
-           [lt_cv_with_aix_soname=]_LT_WITH_AIX_SONAME_DEFAULT)])
-     enable_aix_soname=$lt_cv_with_aix_soname])
-  with_aix_soname=$enable_aix_soname
+    [case $withval in
+    aix|svr4|both)
+      ;;
+    *)
+      AC_MSG_ERROR([Unknown argument to --with-aix-soname])
+      ;;
+    esac
+    lt_cv_with_aix_soname=$with_aix_soname],
+    [AC_CACHE_VAL([lt_cv_with_aix_soname],
+      [lt_cv_with_aix_soname=]_LT_WITH_AIX_SONAME_DEFAULT)
+    with_aix_soname=$lt_cv_with_aix_soname])
   AC_MSG_RESULT([$with_aix_soname])
   if test aix != "$with_aix_soname"; then
     # For the AIX way of multilib, we name the shared archive member
@@ -386,50 +376,30 @@ LT_OPTION_DEFINE([LT_INIT], [aix-soname=svr4], [_LT_WITH_AIX_SONAME([svr4])])
 
 # _LT_WITH_PIC([MODE])
 # --------------------
-# implement the --enable-pic flag, and support the 'pic-only' and 'no-pic'
+# implement the --with-pic flag, and support the 'pic-only' and 'no-pic'
 # LT_INIT options.
 # MODE is either 'yes' or 'no'.  If omitted, it defaults to 'both'.
 m4_define([_LT_WITH_PIC],
-[AC_ARG_ENABLE([pic],
-    [AS_HELP_STRING([--enable-pic@<:@=PKGS@:>@],
+[AC_ARG_WITH([pic],
+    [AS_HELP_STRING([--with-pic@<:@=PKGS@:>@],
 	[try to use only PIC/non-PIC objects @<:@default=use both@:>@])],
     [lt_p=${PACKAGE-default}
-     case $enableval in
-     yes|no) pic_mode=$enableval ;;
-     *)
-       pic_mode=default
-       # Look at the argument we got.  We use all the common list separators.
-       lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR,
-       for lt_pkg in $enableval; do
-	 IFS=$lt_save_ifs
-	 if test "X$lt_pkg" = "X$lt_p"; then
-	   pic_mode=yes
-	 fi
-       done
-       IFS=$lt_save_ifs
-       ;;
-     esac],
-    [dnl Continue to support --with-pic and --without-pic, for backward
-     dnl compatibility.
-     _AC_ENABLE_IF([with], [pic],
-	[lt_p=${PACKAGE-default}
-	 case $withval in
-	 yes|no) pic_mode=$withval ;;
-	 *)
-	   pic_mode=default
-	   # Look at the argument we got.  We use all the common list separators.
-	   lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR,
-	   for lt_pkg in $withval; do
-	     IFS=$lt_save_ifs
-	     if test "X$lt_pkg" = "X$lt_p"; then
-	       pic_mode=yes
-	     fi
-	   done
-	   IFS=$lt_save_ifs
-	   ;;
-	 esac],
-	[pic_mode=m4_default([$1], [default])])]
-    )
+    case $withval in
+    yes|no) pic_mode=$withval ;;
+    *)
+      pic_mode=default
+      # Look at the argument we got.  We use all the common list separators.
+      lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR,
+      for lt_pkg in $withval; do
+	IFS=$lt_save_ifs
+	if test "X$lt_pkg" = "X$lt_p"; then
+	  pic_mode=yes
+	fi
+      done
+      IFS=$lt_save_ifs
+      ;;
+    esac],
+    [pic_mode=m4_default([$1], [default])])
 
 _LT_DECL([], [pic_mode], [0], [What type of objects to build])dnl
 ])# _LT_WITH_PIC
diff --git a/deps/cares/m4/ltsugar.m4 b/deps/cares/m4/ltsugar.m4
old mode 100644
new mode 100755
index 5b5c80a3ad78a1..48bc9344a4d661
--- a/deps/cares/m4/ltsugar.m4
+++ b/deps/cares/m4/ltsugar.m4
@@ -1,6 +1,6 @@
 # ltsugar.m4 -- libtool m4 base layer.                         -*-Autoconf-*-
 #
-# Copyright (C) 2004-2005, 2007-2008, 2011-2019, 2021-2024 Free Software
+# Copyright (C) 2004-2005, 2007-2008, 2011-2015 Free Software
 # Foundation, Inc.
 # Written by Gary V. Vaughan, 2004
 #
diff --git a/deps/cares/m4/ltversion.m4 b/deps/cares/m4/ltversion.m4
old mode 100644
new mode 100755
index 149c9719fa5983..fa04b52a3bf868
--- a/deps/cares/m4/ltversion.m4
+++ b/deps/cares/m4/ltversion.m4
@@ -1,7 +1,6 @@
 # ltversion.m4 -- version numbers			-*- Autoconf -*-
 #
-#   Copyright (C) 2004, 2011-2019, 2021-2024 Free Software Foundation,
-#   Inc.
+#   Copyright (C) 2004, 2011-2015 Free Software Foundation, Inc.
 #   Written by Scott James Remnant, 2004
 #
 # This file is free software; the Free Software Foundation gives
@@ -10,15 +9,15 @@
 
 # @configure_input@
 
-# serial 4392 ltversion.m4
+# serial 4179 ltversion.m4
 # This file is part of GNU Libtool
 
-m4_define([LT_PACKAGE_VERSION], [2.5.3])
-m4_define([LT_PACKAGE_REVISION], [2.5.3])
+m4_define([LT_PACKAGE_VERSION], [2.4.6])
+m4_define([LT_PACKAGE_REVISION], [2.4.6])
 
 AC_DEFUN([LTVERSION_VERSION],
-[macro_version='2.5.3'
-macro_revision='2.5.3'
+[macro_version='2.4.6'
+macro_revision='2.4.6'
 _LT_DECL(, macro_version, 0, [Which release of libtool.m4 was used?])
 _LT_DECL(, macro_revision, 0)
 ])
diff --git a/deps/cares/m4/lt~obsolete.m4 b/deps/cares/m4/lt~obsolete.m4
old mode 100644
new mode 100755
index 22b5346973571a..c6b26f88f6c3c1
--- a/deps/cares/m4/lt~obsolete.m4
+++ b/deps/cares/m4/lt~obsolete.m4
@@ -1,7 +1,7 @@
 # lt~obsolete.m4 -- aclocal satisfying obsolete definitions.    -*-Autoconf-*-
 #
-#   Copyright (C) 2004-2005, 2007, 2009, 2011-2019, 2021-2024 Free
-#   Software Foundation, Inc.
+#   Copyright (C) 2004-2005, 2007, 2009, 2011-2015 Free Software
+#   Foundation, Inc.
 #   Written by Scott James Remnant, 2004.
 #
 # This file is free software; the Free Software Foundation gives
diff --git a/deps/cares/src/Makefile.in b/deps/cares/src/Makefile.in
index 3ad8a92a6a4f15..0c3c0864d4460a 100644
--- a/deps/cares/src/Makefile.in
+++ b/deps/cares/src/Makefile.in
@@ -1,7 +1,7 @@
-# Makefile.in generated by automake 1.17 from Makefile.am.
+# Makefile.in generated by automake 1.16.5 from Makefile.am.
 # @configure_input@
 
-# Copyright (C) 1994-2024 Free Software Foundation, Inc.
+# Copyright (C) 1994-2021 Free Software Foundation, Inc.
 
 # This Makefile.in is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -69,8 +69,6 @@ am__make_running_with_option = \
   test $$has_opt = yes
 am__make_dryrun = (target_option=n; $(am__make_running_with_option))
 am__make_keepgoing = (target_option=k; $(am__make_running_with_option))
-am__rm_f = rm -f $(am__rm_f_notfound)
-am__rm_rf = rm -rf $(am__rm_f_notfound)
 pkgdatadir = $(datadir)/@PACKAGE@
 pkgincludedir = $(includedir)/@PACKAGE@
 pkglibdir = $(libdir)/@PACKAGE@
@@ -247,7 +245,6 @@ EGREP = @EGREP@
 ETAGS = @ETAGS@
 EXEEXT = @EXEEXT@
 FGREP = @FGREP@
-FILECMD = @FILECMD@
 GCOV = @GCOV@
 GENHTML = @GENHTML@
 GMOCK112_CFLAGS = @GMOCK112_CFLAGS@
@@ -314,10 +311,8 @@ ac_ct_DUMPBIN = @ac_ct_DUMPBIN@
 am__include = @am__include@
 am__leading_dot = @am__leading_dot@
 am__quote = @am__quote@
-am__rm_f_notfound = @am__rm_f_notfound@
 am__tar = @am__tar@
 am__untar = @am__untar@
-am__xargs_n = @am__xargs_n@
 ax_pthread_config = @ax_pthread_config@
 bindir = @bindir@
 build = @build@
@@ -591,8 +586,8 @@ mostlyclean-generic:
 clean-generic:
 
 distclean-generic:
-	-$(am__rm_f) $(CONFIG_CLEAN_FILES)
-	-test . = "$(srcdir)" || $(am__rm_f) $(CONFIG_CLEAN_VPATH_FILES)
+	-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)
+	-test . = "$(srcdir)" || test -z "$(CONFIG_CLEAN_VPATH_FILES)" || rm -f $(CONFIG_CLEAN_VPATH_FILES)
 
 maintainer-clean-generic:
 	@echo "This command is intended for maintainers to use"
@@ -684,10 +679,3 @@ uninstall-am:
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
 .NOEXPORT:
-
-# Tell GNU make to disable its built-in pattern rules.
-%:: %,v
-%:: RCS/%,v
-%:: RCS/%
-%:: s.%
-%:: SCCS/s.%
diff --git a/deps/cares/src/lib/Makefile.in b/deps/cares/src/lib/Makefile.in
index db6b17f2f53112..4aff043b26a310 100644
--- a/deps/cares/src/lib/Makefile.in
+++ b/deps/cares/src/lib/Makefile.in
@@ -1,7 +1,7 @@
-# Makefile.in generated by automake 1.17 from Makefile.am.
+# Makefile.in generated by automake 1.16.5 from Makefile.am.
 # @configure_input@
 
-# Copyright (C) 1994-2024 Free Software Foundation, Inc.
+# Copyright (C) 1994-2021 Free Software Foundation, Inc.
 
 # This Makefile.in is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -15,7 +15,7 @@
 @SET_MAKE@
 
 # aminclude_static.am generated automatically by Autoconf
-# from AX_AM_MACROS_STATIC on Tue Oct 15 06:09:51 EDT 2024
+# from AX_AM_MACROS_STATIC on Sat Nov  9 17:40:37 UTC 2024
 
 # Copyright (C) The c-ares project and its contributors
 # SPDX-License-Identifier: MIT
@@ -76,8 +76,6 @@ am__make_running_with_option = \
   test $$has_opt = yes
 am__make_dryrun = (target_option=n; $(am__make_running_with_option))
 am__make_keepgoing = (target_option=k; $(am__make_running_with_option))
-am__rm_f = rm -f $(am__rm_f_notfound)
-am__rm_rf = rm -rf $(am__rm_f_notfound)
 pkgdatadir = $(datadir)/@PACKAGE@
 pkgincludedir = $(includedir)/@PACKAGE@
 pkglibdir = $(libdir)/@PACKAGE@
@@ -154,9 +152,10 @@ am__base_list = \
   sed '$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;s/\n/ /g' | \
   sed '$$!N;$$!N;$$!N;$$!N;s/\n/ /g'
 am__uninstall_files_from_dir = { \
-  { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \
-  || { echo " ( cd '$$dir' && rm -f" $$files ")"; \
-       $(am__cd) "$$dir" && echo $$files | $(am__xargs_n) 40 $(am__rm_f); }; \
+  test -z "$$files" \
+    || { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \
+    || { echo " ( cd '$$dir' && rm -f" $$files ")"; \
+         $(am__cd) "$$dir" && rm -f $$files; }; \
   }
 am__installdirs = "$(DESTDIR)$(libdir)"
 LTLIBRARIES = $(lib_LTLIBRARIES)
@@ -491,7 +490,6 @@ EGREP = @EGREP@
 ETAGS = @ETAGS@
 EXEEXT = @EXEEXT@
 FGREP = @FGREP@
-FILECMD = @FILECMD@
 GCOV = @GCOV@
 GENHTML = @GENHTML@
 GMOCK112_CFLAGS = @GMOCK112_CFLAGS@
@@ -558,10 +556,8 @@ ac_ct_DUMPBIN = @ac_ct_DUMPBIN@
 am__include = @am__include@
 am__leading_dot = @am__leading_dot@
 am__quote = @am__quote@
-am__rm_f_notfound = @am__rm_f_notfound@
 am__tar = @am__tar@
 am__untar = @am__untar@
-am__xargs_n = @am__xargs_n@
 ax_pthread_config = @ax_pthread_config@
 bindir = @bindir@
 build = @build@
@@ -818,12 +814,12 @@ ares_config.h: stamp-h1
 	@test -f $@ || $(MAKE) $(AM_MAKEFLAGS) stamp-h1
 
 stamp-h1: $(srcdir)/ares_config.h.in $(top_builddir)/config.status
-	$(AM_V_at)rm -f stamp-h1
-	$(AM_V_GEN)cd $(top_builddir) && $(SHELL) ./config.status src/lib/ares_config.h
+	@rm -f stamp-h1
+	cd $(top_builddir) && $(SHELL) ./config.status src/lib/ares_config.h
 $(srcdir)/ares_config.h.in: @MAINTAINER_MODE_TRUE@ $(am__configure_deps) 
-	$(AM_V_GEN)($(am__cd) $(top_srcdir) && $(AUTOHEADER))
-	$(AM_V_at)rm -f stamp-h1
-	$(AM_V_at)touch $@
+	($(am__cd) $(top_srcdir) && $(AUTOHEADER))
+	rm -f stamp-h1
+	touch $@
 
 distclean-hdr:
 	-rm -f ares_config.h stamp-h1
@@ -853,19 +849,21 @@ uninstall-libLTLIBRARIES:
 	done
 
 clean-libLTLIBRARIES:
-	-$(am__rm_f) $(lib_LTLIBRARIES)
+	-test -z "$(lib_LTLIBRARIES)" || rm -f $(lib_LTLIBRARIES)
 	@list='$(lib_LTLIBRARIES)'; \
 	locs=`for p in $$list; do echo $$p; done | \
 	      sed 's|^[^/]*$$|.|; s|/[^/]*$$||; s|$$|/so_locations|' | \
 	      sort -u`; \
-	echo rm -f $${locs}; \
-	$(am__rm_f) $${locs}
+	test -z "$$locs" || { \
+	  echo rm -f $${locs}; \
+	  rm -f $${locs}; \
+	}
 dsa/$(am__dirstamp):
 	@$(MKDIR_P) dsa
-	@: >>dsa/$(am__dirstamp)
+	@: > dsa/$(am__dirstamp)
 dsa/$(DEPDIR)/$(am__dirstamp):
 	@$(MKDIR_P) dsa/$(DEPDIR)
-	@: >>dsa/$(DEPDIR)/$(am__dirstamp)
+	@: > dsa/$(DEPDIR)/$(am__dirstamp)
 dsa/libcares_la-ares_array.lo: dsa/$(am__dirstamp) \
 	dsa/$(DEPDIR)/$(am__dirstamp)
 dsa/libcares_la-ares_htable.lo: dsa/$(am__dirstamp) \
@@ -888,10 +886,10 @@ dsa/libcares_la-ares_slist.lo: dsa/$(am__dirstamp) \
 	dsa/$(DEPDIR)/$(am__dirstamp)
 event/$(am__dirstamp):
 	@$(MKDIR_P) event
-	@: >>event/$(am__dirstamp)
+	@: > event/$(am__dirstamp)
 event/$(DEPDIR)/$(am__dirstamp):
 	@$(MKDIR_P) event/$(DEPDIR)
-	@: >>event/$(DEPDIR)/$(am__dirstamp)
+	@: > event/$(DEPDIR)/$(am__dirstamp)
 event/libcares_la-ares_event_configchg.lo: event/$(am__dirstamp) \
 	event/$(DEPDIR)/$(am__dirstamp)
 event/libcares_la-ares_event_epoll.lo: event/$(am__dirstamp) \
@@ -910,10 +908,10 @@ event/libcares_la-ares_event_win32.lo: event/$(am__dirstamp) \
 	event/$(DEPDIR)/$(am__dirstamp)
 legacy/$(am__dirstamp):
 	@$(MKDIR_P) legacy
-	@: >>legacy/$(am__dirstamp)
+	@: > legacy/$(am__dirstamp)
 legacy/$(DEPDIR)/$(am__dirstamp):
 	@$(MKDIR_P) legacy/$(DEPDIR)
-	@: >>legacy/$(DEPDIR)/$(am__dirstamp)
+	@: > legacy/$(DEPDIR)/$(am__dirstamp)
 legacy/libcares_la-ares_create_query.lo: legacy/$(am__dirstamp) \
 	legacy/$(DEPDIR)/$(am__dirstamp)
 legacy/libcares_la-ares_expand_name.lo: legacy/$(am__dirstamp) \
@@ -948,10 +946,10 @@ legacy/libcares_la-ares_parse_uri_reply.lo: legacy/$(am__dirstamp) \
 	legacy/$(DEPDIR)/$(am__dirstamp)
 record/$(am__dirstamp):
 	@$(MKDIR_P) record
-	@: >>record/$(am__dirstamp)
+	@: > record/$(am__dirstamp)
 record/$(DEPDIR)/$(am__dirstamp):
 	@$(MKDIR_P) record/$(DEPDIR)
-	@: >>record/$(DEPDIR)/$(am__dirstamp)
+	@: > record/$(DEPDIR)/$(am__dirstamp)
 record/libcares_la-ares_dns_mapping.lo: record/$(am__dirstamp) \
 	record/$(DEPDIR)/$(am__dirstamp)
 record/libcares_la-ares_dns_multistring.lo: record/$(am__dirstamp) \
@@ -966,10 +964,10 @@ record/libcares_la-ares_dns_write.lo: record/$(am__dirstamp) \
 	record/$(DEPDIR)/$(am__dirstamp)
 str/$(am__dirstamp):
 	@$(MKDIR_P) str
-	@: >>str/$(am__dirstamp)
+	@: > str/$(am__dirstamp)
 str/$(DEPDIR)/$(am__dirstamp):
 	@$(MKDIR_P) str/$(DEPDIR)
-	@: >>str/$(DEPDIR)/$(am__dirstamp)
+	@: > str/$(DEPDIR)/$(am__dirstamp)
 str/libcares_la-ares_buf.lo: str/$(am__dirstamp) \
 	str/$(DEPDIR)/$(am__dirstamp)
 str/libcares_la-ares_str.lo: str/$(am__dirstamp) \
@@ -978,10 +976,10 @@ str/libcares_la-ares_strsplit.lo: str/$(am__dirstamp) \
 	str/$(DEPDIR)/$(am__dirstamp)
 util/$(am__dirstamp):
 	@$(MKDIR_P) util
-	@: >>util/$(am__dirstamp)
+	@: > util/$(am__dirstamp)
 util/$(DEPDIR)/$(am__dirstamp):
 	@$(MKDIR_P) util/$(DEPDIR)
-	@: >>util/$(DEPDIR)/$(am__dirstamp)
+	@: > util/$(DEPDIR)/$(am__dirstamp)
 util/libcares_la-ares_iface_ips.lo: util/$(am__dirstamp) \
 	util/$(DEPDIR)/$(am__dirstamp)
 util/libcares_la-ares_threads.lo: util/$(am__dirstamp) \
@@ -1110,7 +1108,7 @@ distclean-compile:
 
 $(am__depfiles_remade):
 	@$(MKDIR_P) $(@D)
-	@: >>$@
+	@echo '# dummy' >$@-t && $(am__mv) $@-t $@
 
 am--depfiles: $(am__depfiles_remade)
 
@@ -1975,21 +1973,21 @@ mostlyclean-generic:
 clean-generic:
 
 distclean-generic:
-	-$(am__rm_f) $(CONFIG_CLEAN_FILES)
-	-test . = "$(srcdir)" || $(am__rm_f) $(CONFIG_CLEAN_VPATH_FILES)
-	-$(am__rm_f) $(DISTCLEANFILES)
-	-$(am__rm_f) dsa/$(DEPDIR)/$(am__dirstamp)
-	-$(am__rm_f) dsa/$(am__dirstamp)
-	-$(am__rm_f) event/$(DEPDIR)/$(am__dirstamp)
-	-$(am__rm_f) event/$(am__dirstamp)
-	-$(am__rm_f) legacy/$(DEPDIR)/$(am__dirstamp)
-	-$(am__rm_f) legacy/$(am__dirstamp)
-	-$(am__rm_f) record/$(DEPDIR)/$(am__dirstamp)
-	-$(am__rm_f) record/$(am__dirstamp)
-	-$(am__rm_f) str/$(DEPDIR)/$(am__dirstamp)
-	-$(am__rm_f) str/$(am__dirstamp)
-	-$(am__rm_f) util/$(DEPDIR)/$(am__dirstamp)
-	-$(am__rm_f) util/$(am__dirstamp)
+	-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)
+	-test . = "$(srcdir)" || test -z "$(CONFIG_CLEAN_VPATH_FILES)" || rm -f $(CONFIG_CLEAN_VPATH_FILES)
+	-rm -f dsa/$(DEPDIR)/$(am__dirstamp)
+	-rm -f dsa/$(am__dirstamp)
+	-rm -f event/$(DEPDIR)/$(am__dirstamp)
+	-rm -f event/$(am__dirstamp)
+	-rm -f legacy/$(DEPDIR)/$(am__dirstamp)
+	-rm -f legacy/$(am__dirstamp)
+	-rm -f record/$(DEPDIR)/$(am__dirstamp)
+	-rm -f record/$(am__dirstamp)
+	-rm -f str/$(DEPDIR)/$(am__dirstamp)
+	-rm -f str/$(am__dirstamp)
+	-rm -f util/$(DEPDIR)/$(am__dirstamp)
+	-rm -f util/$(am__dirstamp)
+	-test -z "$(DISTCLEANFILES)" || rm -f $(DISTCLEANFILES)
 
 maintainer-clean-generic:
 	@echo "This command is intended for maintainers to use"
@@ -2000,7 +1998,7 @@ clean-am: clean-generic clean-libLTLIBRARIES clean-libtool \
 	mostlyclean-am
 
 distclean: distclean-recursive
-	-rm -f ./$(DEPDIR)/libcares_la-ares_addrinfo2hostent.Plo
+		-rm -f ./$(DEPDIR)/libcares_la-ares_addrinfo2hostent.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_addrinfo_localhost.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_android.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_cancel.Plo
@@ -2136,7 +2134,7 @@ install-ps-am:
 installcheck-am:
 
 maintainer-clean: maintainer-clean-recursive
-	-rm -f ./$(DEPDIR)/libcares_la-ares_addrinfo2hostent.Plo
+		-rm -f ./$(DEPDIR)/libcares_la-ares_addrinfo2hostent.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_addrinfo_localhost.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_android.Plo
 	-rm -f ./$(DEPDIR)/libcares_la-ares_cancel.Plo
@@ -2369,10 +2367,3 @@ code-coverage-capture-hook:
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
 .NOEXPORT:
-
-# Tell GNU make to disable its built-in pattern rules.
-%:: %,v
-%:: RCS/%,v
-%:: RCS/%
-%:: s.%
-%:: SCCS/s.%
diff --git a/deps/cares/src/lib/ares_config.h.in b/deps/cares/src/lib/ares_config.h.in
index d22fa863477fbf..d1f09d694db68e 100644
--- a/deps/cares/src/lib/ares_config.h.in
+++ b/deps/cares/src/lib/ares_config.h.in
@@ -309,25 +309,25 @@
 /* Define to 1 if you have `strnicmp` */
 #undef HAVE_STRNICMP
 
-/* Define to 1 if the system has the type 'struct addrinfo'. */
+/* Define to 1 if the system has the type `struct addrinfo'. */
 #undef HAVE_STRUCT_ADDRINFO
 
-/* Define to 1 if 'ai_flags' is a member of 'struct addrinfo'. */
+/* Define to 1 if `ai_flags' is a member of `struct addrinfo'. */
 #undef HAVE_STRUCT_ADDRINFO_AI_FLAGS
 
-/* Define to 1 if the system has the type 'struct in6_addr'. */
+/* Define to 1 if the system has the type `struct in6_addr'. */
 #undef HAVE_STRUCT_IN6_ADDR
 
-/* Define to 1 if the system has the type 'struct sockaddr_in6'. */
+/* Define to 1 if the system has the type `struct sockaddr_in6'. */
 #undef HAVE_STRUCT_SOCKADDR_IN6
 
-/* Define to 1 if 'sin6_scope_id' is a member of 'struct sockaddr_in6'. */
+/* Define to 1 if `sin6_scope_id' is a member of `struct sockaddr_in6'. */
 #undef HAVE_STRUCT_SOCKADDR_IN6_SIN6_SCOPE_ID
 
-/* Define to 1 if the system has the type 'struct sockaddr_storage'. */
+/* Define to 1 if the system has the type `struct sockaddr_storage'. */
 #undef HAVE_STRUCT_SOCKADDR_STORAGE
 
-/* Define to 1 if the system has the type 'struct timeval'. */
+/* Define to 1 if the system has the type `struct timeval'. */
 #undef HAVE_STRUCT_TIMEVAL
 
 /* Define to 1 if you have the <sys/epoll.h> header file. */
@@ -357,6 +357,9 @@
 /* Define to 1 if you have the <sys/stat.h> header file. */
 #undef HAVE_SYS_STAT_H
 
+/* Define to 1 if you have the <sys/system_properties.h> header file. */
+#undef HAVE_SYS_SYSTEM_PROPERTIES_H
+
 /* Define to 1 if you have the <sys/time.h> header file. */
 #undef HAVE_SYS_TIME_H
 
@@ -481,12 +484,12 @@
 /* send() return value */
 #undef SEND_TYPE_RETV
 
-/* Define to 1 if all of the C89 standard headers exist (not just the ones
+/* Define to 1 if all of the C90 standard headers exist (not just the ones
    required in a freestanding environment). This macro is provided for
    backward compatibility; new code need not use it. */
 #undef STDC_HEADERS
 
-/* Enable extensions on AIX, Interix, z/OS.  */
+/* Enable extensions on AIX 3, Interix.  */
 #ifndef _ALL_SOURCE
 # undef _ALL_SOURCE
 #endif
@@ -547,15 +550,11 @@
 #ifndef __STDC_WANT_IEC_60559_DFP_EXT__
 # undef __STDC_WANT_IEC_60559_DFP_EXT__
 #endif
-/* Enable extensions specified by C23 Annex F.  */
-#ifndef __STDC_WANT_IEC_60559_EXT__
-# undef __STDC_WANT_IEC_60559_EXT__
-#endif
 /* Enable extensions specified by ISO/IEC TS 18661-4:2015.  */
 #ifndef __STDC_WANT_IEC_60559_FUNCS_EXT__
 # undef __STDC_WANT_IEC_60559_FUNCS_EXT__
 #endif
-/* Enable extensions specified by C23 Annex H and ISO/IEC TS 18661-3:2015.  */
+/* Enable extensions specified by ISO/IEC TS 18661-3:2015.  */
 #ifndef __STDC_WANT_IEC_60559_TYPES_EXT__
 # undef __STDC_WANT_IEC_60559_TYPES_EXT__
 #endif
@@ -584,14 +583,8 @@
 /* Number of bits in a file offset, on hosts where this is settable. */
 #undef _FILE_OFFSET_BITS
 
-/* Define to 1 on platforms where this makes off_t a 64-bit type. */
+/* Define for large files, on AIX-style hosts. */
 #undef _LARGE_FILES
 
-/* Number of bits in time_t, on hosts where this is settable. */
-#undef _TIME_BITS
-
-/* Define to 1 on platforms where this makes time_t a 64-bit type. */
-#undef __MINGW_USE_VC2005_COMPAT
-
-/* Define as 'unsigned int' if <stddef.h> doesn't define. */
+/* Define to `unsigned int' if <sys/types.h> does not define. */
 #undef size_t
diff --git a/deps/cares/src/lib/ares_getaddrinfo.c b/deps/cares/src/lib/ares_getaddrinfo.c
index 09d34d337834af..32791dc37dcd6f 100644
--- a/deps/cares/src/lib/ares_getaddrinfo.c
+++ b/deps/cares/src/lib/ares_getaddrinfo.c
@@ -481,6 +481,18 @@ static void terminate_retries(const struct host_query *hquery,
   query->no_retries = ARES_TRUE;
 }
 
+static ares_bool_t ai_has_ipv4(struct ares_addrinfo *ai)
+{
+  struct ares_addrinfo_node *node;
+
+  for (node = ai->nodes; node != NULL; node = node->ai_next) {
+    if (node->ai_family == AF_INET) {
+      return ARES_TRUE;
+    }
+  }
+  return ARES_FALSE;
+}
+
 static void host_callback(void *arg, ares_status_t status, size_t timeouts,
                           const ares_dns_record_t *dnsrec)
 {
@@ -496,7 +508,27 @@ static void host_callback(void *arg, ares_status_t status, size_t timeouts,
       addinfostatus =
         ares_parse_into_addrinfo(dnsrec, ARES_TRUE, hquery->port, hquery->ai);
     }
-    if (addinfostatus == ARES_SUCCESS) {
+
+    /* We sent out ipv4 and ipv6 requests simultaneously.  If we got a
+     * successful ipv4 response, we want to go ahead and tell the ipv6 request
+     * that if it fails or times out to not try again since we have the data
+     * we need.
+     *
+     * Our initial implementation of this would terminate retries if we got any
+     * successful response (ipv4 _or_ ipv6).  But we received user reports of
+     * odd behavior on hosts with bad system configs:
+     *  https://github.com/alpinelinux/docker-alpine/issues/366
+     *
+     * Essentially the ipv6 query succeeded but the ipv4 query failed or timed
+     * out, and so we only returned the ipv6 address, but the host couldn't
+     * use ipv6.  If we continued to allow ipv4 retries it would have found a
+     * server that worked and returned both address classes (this is clearly
+     * unexpected behavior).
+     *
+     * At some point down the road if ipv6 actually becomes required and
+     * reliable we can drop this ipv4 check.
+     */
+    if (addinfostatus == ARES_SUCCESS && ai_has_ipv4(hquery->ai)) {
       terminate_retries(hquery, ares_dns_record_get_id(dnsrec));
     }
   }
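
The new condition in `host_callback()` can be read as a small predicate: stop retrying the sibling query only when parsing succeeded and the accumulated results already contain an IPv4 address. The following is a minimal sketch of that rule, not the c-ares implementation. Every `sketch_*` name is hypothetical, and the family constants are stand-ins rather than the real `AF_INET` / `AF_INET6` values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for AF_INET / AF_INET6 (not the system values). */
enum sketch_family { SKETCH_AF_INET = 4, SKETCH_AF_INET6 = 6 };

/* Hypothetical stand-in for struct ares_addrinfo_node. */
struct sketch_node {
  enum sketch_family  family;
  struct sketch_node *next;
};

/* Hypothetical stand-in for struct ares_addrinfo. */
struct sketch_addrinfo {
  struct sketch_node *nodes;
};

/* Same walk as ai_has_ipv4(): true if any result node is IPv4. */
static bool sketch_has_ipv4(const struct sketch_addrinfo *ai)
{
  const struct sketch_node *node;

  assert(ai != NULL); /* the patched code also dereferences ai unconditionally */
  for (node = ai->nodes; node != NULL; node = node->next) {
    if (node->family == SKETCH_AF_INET) {
      return true;
    }
  }
  return false;
}

/* Same shape as the patched condition: an IPv6-only success no longer
 * cancels retries of the outstanding IPv4 query. */
static bool sketch_should_stop_retries(bool                          parse_ok,
                                       const struct sketch_addrinfo *ai)
{
  return parse_ok && sketch_has_ipv4(ai);
}
```

With this rule, an IPv6-only answer leaves the IPv4 query free to retry against another server, which is the case described in the linked docker-alpine report.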
diff --git a/deps/cares/src/lib/ares_process.c b/deps/cares/src/lib/ares_process.c
index 62a6ae1ddaa46e..3d186ea9d58b31 100644
--- a/deps/cares/src/lib/ares_process.c
+++ b/deps/cares/src/lib/ares_process.c
@@ -650,6 +650,51 @@ static ares_status_t rewrite_without_edns(ares_query_t *query)
   return status;
 }
 
+static ares_bool_t issue_might_be_edns(const ares_dns_record_t *req,
+                                       const ares_dns_record_t *rsp)
+{
+  const ares_dns_rr_t *rr;
+
+  /* If we use EDNS and server answers with FORMERR without an OPT RR, the
+   * protocol extension is not understood by the responder. We must retry the
+   * query without EDNS enabled. */
+  if (ares_dns_record_get_rcode(rsp) != ARES_RCODE_FORMERR) {
+    return ARES_FALSE;
+  }
+
+  rr = ares_dns_get_opt_rr_const(req);
+  if (rr == NULL) {
+    /* We didn't send EDNS */
+    return ARES_FALSE;
+  }
+
+  if (ares_dns_get_opt_rr_const(rsp) == NULL) {
+    /* Spec says EDNS won't be echo'd back on non-supporting servers, so
+     * retry without EDNS */
+    return ARES_TRUE;
+  }
+
+  /* As per issue #911, some non-compliant servers exist that do support EDNS
+   * but reject unrecognized option codes.  At this point we expect them to
+   * have also returned an EDNS opt record, but we may remove that check in
+   * the future.  Let's detect this situation if we're sending option
+   * codes. */
+  if (ares_dns_rr_get_opt_cnt(rr, ARES_RR_OPT_OPTIONS) == 0) {
+    /* We didn't send any option codes */
+    return ARES_FALSE;
+  }
+
+  if (ares_dns_get_opt_rr_const(rsp) != NULL) {
+    /* At this time we're requiring the server to respond with EDNS opt
+     * records since that's what has been observed in the field.  We might
+     * find in the future that we have to remove this.  Let's go ahead and
+     * force a retry without EDNS. */
+    return ARES_TRUE;
+  }
+
+  return ARES_FALSE;
+}
+
 /* Handle an answer from a server. This must NEVER cleanup the
  * server connection! Return something other than ARES_SUCCESS to cause
  * the connection to be terminated after this call. */
@@ -713,12 +758,10 @@ static ares_status_t process_answer(ares_channel_t      *channel,
   ares_llist_node_destroy(query->node_queries_to_conn);
   query->node_queries_to_conn = NULL;
 
-  /* If we use EDNS and server answers with FORMERR without an OPT RR, the
-   * protocol extension is not understood by the responder. We must retry the
-   * query without EDNS enabled. */
-  if (ares_dns_record_get_rcode(rdnsrec) == ARES_RCODE_FORMERR &&
-      ares_dns_get_opt_rr_const(query->query) != NULL &&
-      ares_dns_get_opt_rr_const(rdnsrec) == NULL) {
+  /* There are old servers that don't understand EDNS at all, and some servers
+   * that have non-compliant implementations.  Let's try to detect this sort
+   * of thing. */
+  if (issue_might_be_edns(query->query, rdnsrec)) {
     status = rewrite_without_edns(query);
     if (status != ARES_SUCCESS) {
       end_query(channel, server, query, status, NULL);
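
The decision made by `issue_might_be_edns()` reduces to four facts about the request/response pair. The sketch below restates that decision table under assumptions: `struct sketch_exchange` and `sketch_retry_without_edns()` are hypothetical names that summarize what the real helper reads through `ares_dns_record_get_rcode()`, `ares_dns_get_opt_rr_const()` and `ares_dns_rr_get_opt_cnt()`. It is meant as a reading aid for the hunk above, not as a replacement for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the facts the helper inspects; these fields are
 * not c-ares types. */
struct sketch_exchange {
  bool   rsp_is_formerr;   /* response rcode is FORMERR */
  bool   req_has_opt;      /* request carried an OPT RR (EDNS was used) */
  bool   rsp_has_opt;      /* response carried an OPT RR */
  size_t req_option_count; /* EDNS option codes sent in the request */
};

/* Returns true when the query should be retried with EDNS stripped. */
static bool sketch_retry_without_edns(const struct sketch_exchange *x)
{
  assert(x != NULL);

  if (!x->rsp_is_formerr) {
    return false; /* only FORMERR hints at an EDNS problem */
  }
  if (!x->req_has_opt) {
    return false; /* we never sent EDNS, so it cannot be the cause */
  }
  if (!x->rsp_has_opt) {
    return true; /* server does not speak EDNS at all */
  }
  if (x->req_option_count == 0) {
    return false; /* no option codes sent, nothing for the server to reject */
  }
  /* The server echoed an OPT RR but still answered FORMERR to a request
   * carrying option codes: treat it as the non-compliant case. */
  return true;
}
```

Note that once the earlier checks pass, the response is already known to carry an OPT RR, so the final test in the patched helper always succeeds; the sketch makes that explicit by returning `true` directly.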
diff --git a/deps/cares/src/lib/ares_send.c b/deps/cares/src/lib/ares_send.c
index ca178a1741ed7d..6efa9580b22165 100644
--- a/deps/cares/src/lib/ares_send.c
+++ b/deps/cares/src/lib/ares_send.c
@@ -153,6 +153,11 @@ ares_status_t ares_send_nolock(ares_channel_t *channel, ares_server_t *server,
   /* Duplicate Query */
   status = ares_dns_record_duplicate_ex(&query->query, dnsrec);
   if (status != ARES_SUCCESS) {
+    /* Sometimes we might get an EBADRESP response from duplicate due to
+     * the way it works (write and parse); rewrite it to EBADQUERY. */
+    if (status == ARES_EBADRESP) {
+      status = ARES_EBADQUERY;
+    }
     ares_free(query);
     callback(arg, status, 0, NULL);
     return status;
diff --git a/deps/cares/src/lib/event/ares_event_thread.c b/deps/cares/src/lib/event/ares_event_thread.c
index 24b55d6945728f..d59b7880a411cf 100644
--- a/deps/cares/src/lib/event/ares_event_thread.c
+++ b/deps/cares/src/lib/event/ares_event_thread.c
@@ -354,14 +354,16 @@ static void *ares_event_thread(void *arg)
       ares_process_pending_write(e->channel);
     }
 
+    /* Relock before we loop again */
+    ares_thread_mutex_lock(e->mutex);
+
     /* Each iteration should do timeout processing and any other cleanup
      * that may not have been performed */
     if (e->isup) {
+      ares_thread_mutex_unlock(e->mutex);
       ares_process_fds(e->channel, NULL, 0, ARES_PROCESS_FLAG_NONE);
+      ares_thread_mutex_lock(e->mutex);
     }
-
-    /* Relock before we loop again */
-    ares_thread_mutex_lock(e->mutex);
   }
 
   /* Lets cleanup while we're in the thread itself */
diff --git a/deps/cares/src/tools/Makefile.in b/deps/cares/src/tools/Makefile.in
index ace5023f03cfb6..9a96a74fa6957d 100644
--- a/deps/cares/src/tools/Makefile.in
+++ b/deps/cares/src/tools/Makefile.in
@@ -1,7 +1,7 @@
-# Makefile.in generated by automake 1.17 from Makefile.am.
+# Makefile.in generated by automake 1.16.5 from Makefile.am.
 # @configure_input@
 
-# Copyright (C) 1994-2024 Free Software Foundation, Inc.
+# Copyright (C) 1994-2021 Free Software Foundation, Inc.
 
 # This Makefile.in is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -70,8 +70,6 @@ am__make_running_with_option = \
   test $$has_opt = yes
 am__make_dryrun = (target_option=n; $(am__make_running_with_option))
 am__make_keepgoing = (target_option=k; $(am__make_running_with_option))
-am__rm_f = rm -f $(am__rm_f_notfound)
-am__rm_rf = rm -rf $(am__rm_f_notfound)
 pkgdatadir = $(datadir)/@PACKAGE@
 pkgincludedir = $(includedir)/@PACKAGE@
 pkglibdir = $(libdir)/@PACKAGE@
@@ -266,7 +264,6 @@ EGREP = @EGREP@
 ETAGS = @ETAGS@
 EXEEXT = @EXEEXT@
 FGREP = @FGREP@
-FILECMD = @FILECMD@
 GCOV = @GCOV@
 GENHTML = @GENHTML@
 GMOCK112_CFLAGS = @GMOCK112_CFLAGS@
@@ -333,10 +330,8 @@ ac_ct_DUMPBIN = @ac_ct_DUMPBIN@
 am__include = @am__include@
 am__leading_dot = @am__leading_dot@
 am__quote = @am__quote@
-am__rm_f_notfound = @am__rm_f_notfound@
 am__tar = @am__tar@
 am__untar = @am__untar@
-am__xargs_n = @am__xargs_n@
 ax_pthread_config = @ax_pthread_config@
 bindir = @bindir@
 build = @build@
@@ -438,8 +433,13 @@ $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
 $(am__aclocal_m4_deps):
 
 clean-noinstPROGRAMS:
-	$(am__rm_f) $(noinst_PROGRAMS)
-	test -z "$(EXEEXT)" || $(am__rm_f) $(noinst_PROGRAMS:$(EXEEXT)=)
+	@list='$(noinst_PROGRAMS)'; test -n "$$list" || exit 0; \
+	echo " rm -f" $$list; \
+	rm -f $$list || exit $$?; \
+	test -n "$(EXEEXT)" || exit 0; \
+	list=`for p in $$list; do echo "$$p"; done | sed 's/$(EXEEXT)$$//'`; \
+	echo " rm -f" $$list; \
+	rm -f $$list
 
 adig$(EXEEXT): $(adig_OBJECTS) $(adig_DEPENDENCIES) $(EXTRA_adig_DEPENDENCIES) 
 	@rm -f adig$(EXEEXT)
@@ -461,7 +461,7 @@ distclean-compile:
 
 $(am__depfiles_remade):
 	@$(MKDIR_P) $(@D)
-	@: >>$@
+	@echo '# dummy' >$@-t && $(am__mv) $@-t $@
 
 am--depfiles: $(am__depfiles_remade)
 
@@ -649,8 +649,8 @@ mostlyclean-generic:
 clean-generic:
 
 distclean-generic:
-	-$(am__rm_f) $(CONFIG_CLEAN_FILES)
-	-test . = "$(srcdir)" || $(am__rm_f) $(CONFIG_CLEAN_VPATH_FILES)
+	-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)
+	-test . = "$(srcdir)" || test -z "$(CONFIG_CLEAN_VPATH_FILES)" || rm -f $(CONFIG_CLEAN_VPATH_FILES)
 
 maintainer-clean-generic:
 	@echo "This command is intended for maintainers to use"
@@ -661,7 +661,7 @@ clean-am: clean-generic clean-libtool clean-noinstPROGRAMS \
 	mostlyclean-am
 
 distclean: distclean-am
-	-rm -f ./$(DEPDIR)/adig-adig.Po
+		-rm -f ./$(DEPDIR)/adig-adig.Po
 	-rm -f ./$(DEPDIR)/ahost-ahost.Po
 	-rm -f ./$(DEPDIR)/ahost-ares_getopt.Po
 	-rm -f Makefile
@@ -709,7 +709,7 @@ install-ps-am:
 installcheck-am:
 
 maintainer-clean: maintainer-clean-am
-	-rm -f ./$(DEPDIR)/adig-adig.Po
+		-rm -f ./$(DEPDIR)/adig-adig.Po
 	-rm -f ./$(DEPDIR)/ahost-ahost.Po
 	-rm -f ./$(DEPDIR)/ahost-ares_getopt.Po
 	-rm -f Makefile
@@ -752,10 +752,3 @@ uninstall-am:
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
 .NOEXPORT:
-
-# Tell GNU make to disable its built-in pattern rules.
-%:: %,v
-%:: RCS/%,v
-%:: RCS/%
-%:: s.%
-%:: SCCS/s.%

From 6308c18dbb6f09a751b8d4dd968d56f28abaaf0b Mon Sep 17 00:00:00 2001
From: Adrien Foulon <6115458+Tofandel@users.noreply.github.com>
Date: Tue, 12 Nov 2024 16:57:59 +0100
Subject: [PATCH 104/216] report: fix network queries in getReport libuv with
 exclude-network
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55602
Reviewed-By: Ethan Arrowood <ethan@arrowood.dev>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Vinícius Lourenço Claro Cardoso <contact@viniciusl.com.br>
---
 doc/api/report.md                          | 52 ++++++++++++++++++--
 src/node_report.cc                         |  6 ++-
 src/node_report.h                          |  3 +-
 src/node_report_utils.cc                   | 55 +++++++++++++--------
 test/common/report.js                      |  2 +-
 test/report/test-report-exclude-network.js | 57 ++++++++++++++++++++++
 6 files changed, 147 insertions(+), 28 deletions(-)
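
Note on consuming the new endpoint shape: with this change each `localEndpoint`/`remoteEndpoint` carries an `ip4` or `ip6` field next to `host`, and `host` falls back to the numeric address when the reverse DNS lookup is skipped. The following is a minimal sketch of reading those fields, not code from this patch; `summarizeTcpEndpoints` and `sampleReport` are hypothetical names, and the sample data only mirrors the doc/api/report.md example.

```javascript
'use strict';

// Hypothetical helper (not part of this patch): list the TCP endpoints of a
// diagnostic report and flag whether `host` differs from the numeric address.
function summarizeTcpEndpoints(report) {
  const describe = (ep) => {
    if (ep == null) return null; // The report writes null for a missing endpoint.
    const ip = ep.ip4 ?? ep.ip6; // Numeric address fields added in report version 4.
    return { host: ep.host, ip, port: ep.port, resolved: ep.host !== ip };
  };
  return report.libuv
    .filter((handle) => handle.type === 'tcp')
    .map((handle) => ({
      local: describe(handle.localEndpoint),
      remote: describe(handle.remoteEndpoint),
    }));
}

// Sample data shaped like the doc/api/report.md example (assumed values).
const sampleReport = {
  header: { reportVersion: 4 },
  libuv: [
    { type: 'tcp',
      localEndpoint: { host: 'localhost', ip4: '127.0.0.1', port: 48986 },
      remoteEndpoint: { host: 'localhost', ip4: '127.0.0.1', port: 38573 } },
    // With network data excluded, `host` is just the numeric address.
    { type: 'tcp',
      localEndpoint: { host: '::1', ip6: '::1', port: 52266 },
      remoteEndpoint: { host: '::1', ip6: '::1', port: 38573 } },
    { type: 'loop' },
  ],
};

const summary = summarizeTcpEndpoints(sampleReport);
console.log(JSON.stringify(summary, null, 2));
```

In a real process, the object returned by `process.report.getReport()` could be passed instead of `sampleReport`; setting `process.report.excludeNetwork = true` beforehand, as the test in this patch does, should leave every `resolved` flag false.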

diff --git a/doc/api/report.md b/doc/api/report.md
index ce682798430bed..89d480b25d61ec 100644
--- a/doc/api/report.md
+++ b/doc/api/report.md
@@ -30,7 +30,7 @@ is provided below for reference.
 ```json
 {
   "header": {
-    "reportVersion": 3,
+    "reportVersion": 4,
     "event": "exception",
     "trigger": "Exception",
     "filename": "report.20181221.005011.8974.0.001.json",
@@ -320,6 +320,50 @@ is provided below for reference.
       "is_active": true,
       "address": "0x000055fc7b2cb180",
       "loopIdleTimeSeconds": 22644.8
+    },
+    {
+      "type": "tcp",
+      "is_active": true,
+      "is_referenced": true,
+      "address": "0x000055e70fcb85d8",
+      "localEndpoint": {
+        "host": "localhost",
+        "ip4": "127.0.0.1",
+        "port": 48986
+      },
+      "remoteEndpoint": {
+        "host": "localhost",
+        "ip4": "127.0.0.1",
+        "port": 38573
+      },
+      "sendBufferSize": 2626560,
+      "recvBufferSize": 131072,
+      "fd": 24,
+      "writeQueueSize": 0,
+      "readable": true,
+      "writable": true
+    },
+    {
+      "type": "tcp",
+      "is_active": true,
+      "is_referenced": true,
+      "address": "0x000055e70fcd68c8",
+      "localEndpoint": {
+        "host": "ip6-localhost",
+        "ip6": "::1",
+        "port": 52266
+      },
+      "remoteEndpoint": {
+        "host": "ip6-localhost",
+        "ip6": "::1",
+        "port": 38573
+      },
+      "sendBufferSize": 2626560,
+      "recvBufferSize": 131072,
+      "fd": 25,
+      "writeQueueSize": 0,
+      "readable": false,
+      "writable": false
     }
   ],
   "workers": [],
@@ -459,9 +503,9 @@ meaning of `SIGUSR2` for the said purposes.
 * `--report-signal` Sets or resets the signal for report generation
   (not supported on Windows). Default signal is `SIGUSR2`.
 
-* `--report-exclude-network` Exclude `header.networkInterfaces` from the
-  diagnostic report. By default this is not set and the network interfaces
-  are included.
+* `--report-exclude-network` Exclude `header.networkInterfaces` from the
+  diagnostic report and disable the reverse DNS queries that would otherwise
+  populate `libuv.*.(remote|local)Endpoint.host`. By default this is not set and the network interfaces are included.
 
 A report can also be triggered via an API call from a JavaScript application:
 
diff --git a/src/node_report.cc b/src/node_report.cc
index 5368d8eef2fac0..757f8eb5db685e 100644
--- a/src/node_report.cc
+++ b/src/node_report.cc
@@ -24,7 +24,7 @@
 #include <cwctype>
 #include <fstream>
 
-constexpr int NODE_REPORT_VERSION = 3;
+constexpr int NODE_REPORT_VERSION = 4;
 constexpr int NANOS_PER_SEC = 1000 * 1000 * 1000;
 constexpr double SEC_PER_MICROS = 1e-6;
 constexpr int MAX_FRAME_COUNT = node::kMaxFrameCountForLogging;
@@ -203,7 +203,9 @@ static void WriteNodeReport(Isolate* isolate,
 
   writer.json_arraystart("libuv");
   if (env != nullptr) {
-    uv_walk(env->event_loop(), WalkHandle, static_cast<void*>(&writer));
+    uv_walk(env->event_loop(),
+            exclude_network ? WalkHandleNoNetwork : WalkHandleNetwork,
+            static_cast<void*>(&writer));
 
     writer.json_start();
     writer.json_keyvalue("type", "loop");
diff --git a/src/node_report.h b/src/node_report.h
index 7a2e817ac82f6b..98be339ae90d8f 100644
--- a/src/node_report.h
+++ b/src/node_report.h
@@ -19,7 +19,8 @@
 namespace node {
 namespace report {
 // Function declarations - utility functions in src/node_report_utils.cc
-void WalkHandle(uv_handle_t* h, void* arg);
+void WalkHandleNetwork(uv_handle_t* h, void* arg);
+void WalkHandleNoNetwork(uv_handle_t* h, void* arg);
 
 template <typename T>
 std::string ValueToHexString(T value) {
diff --git a/src/node_report_utils.cc b/src/node_report_utils.cc
index 516eac22dc63a2..d4eb52c1ed89c0 100644
--- a/src/node_report_utils.cc
+++ b/src/node_report_utils.cc
@@ -12,7 +12,8 @@ static constexpr auto null = JSONWriter::Null{};
 static void ReportEndpoint(uv_handle_t* h,
                            struct sockaddr* addr,
                            const char* name,
-                           JSONWriter* writer) {
+                           JSONWriter* writer,
+                           bool exclude_network) {
   if (addr == nullptr) {
     writer->json_keyvalue(name, null);
     return;
@@ -20,35 +21,42 @@ static void ReportEndpoint(uv_handle_t* h,
 
   uv_getnameinfo_t endpoint;
   char* host = nullptr;
-  char hostbuf[INET6_ADDRSTRLEN];
   const int family = addr->sa_family;
   const int port = ntohs(family == AF_INET ?
                          reinterpret_cast<sockaddr_in*>(addr)->sin_port :
                          reinterpret_cast<sockaddr_in6*>(addr)->sin6_port);
 
-  if (uv_getnameinfo(h->loop, &endpoint, nullptr, addr, NI_NUMERICSERV) == 0) {
+  writer->json_objectstart(name);
+  if (!exclude_network &&
+      uv_getnameinfo(h->loop, &endpoint, nullptr, addr, NI_NUMERICSERV) == 0) {
     host = endpoint.host;
     DCHECK_EQ(port, std::stoi(endpoint.service));
+    writer->json_keyvalue("host", host);
+  }
+
+  if (family == AF_INET) {
+    char ipbuf[INET_ADDRSTRLEN];
+    if (uv_ip4_name(
+            reinterpret_cast<sockaddr_in*>(addr), ipbuf, sizeof(ipbuf)) == 0) {
+      writer->json_keyvalue("ip4", ipbuf);
+      if (host == nullptr) writer->json_keyvalue("host", ipbuf);
+    }
   } else {
-    const void* src = family == AF_INET ?
-                      static_cast<void*>(
-                        &(reinterpret_cast<sockaddr_in*>(addr)->sin_addr)) :
-                      static_cast<void*>(
-                        &(reinterpret_cast<sockaddr_in6*>(addr)->sin6_addr));
-    if (uv_inet_ntop(family, src, hostbuf, sizeof(hostbuf)) == 0) {
-      host = hostbuf;
+    char ipbuf[INET6_ADDRSTRLEN];
+    if (uv_ip6_name(
+            reinterpret_cast<sockaddr_in6*>(addr), ipbuf, sizeof(ipbuf)) == 0) {
+      writer->json_keyvalue("ip6", ipbuf);
+      if (host == nullptr) writer->json_keyvalue("host", ipbuf);
     }
   }
-  writer->json_objectstart(name);
-  if (host != nullptr) {
-    writer->json_keyvalue("host", host);
-  }
   writer->json_keyvalue("port", port);
   writer->json_objectend();
 }
 
 // Utility function to format libuv socket information.
-static void ReportEndpoints(uv_handle_t* h, JSONWriter* writer) {
+static void ReportEndpoints(uv_handle_t* h,
+                            JSONWriter* writer,
+                            bool exclude_network) {
   struct sockaddr_storage addr_storage;
   struct sockaddr* addr = reinterpret_cast<sockaddr*>(&addr_storage);
   uv_any_handle* handle = reinterpret_cast<uv_any_handle*>(h);
@@ -65,7 +73,8 @@ static void ReportEndpoints(uv_handle_t* h, JSONWriter* writer) {
     default:
       break;
   }
-  ReportEndpoint(h, rc == 0 ? addr : nullptr,  "localEndpoint", writer);
+  ReportEndpoint(
+      h, rc == 0 ? addr : nullptr, "localEndpoint", writer, exclude_network);
 
   switch (h->type) {
     case UV_UDP:
@@ -77,7 +86,8 @@ static void ReportEndpoints(uv_handle_t* h, JSONWriter* writer) {
     default:
       break;
   }
-  ReportEndpoint(h, rc == 0 ? addr : nullptr, "remoteEndpoint", writer);
+  ReportEndpoint(
+      h, rc == 0 ? addr : nullptr, "remoteEndpoint", writer, exclude_network);
 }
 
 // Utility function to format libuv pipe information.
@@ -155,7 +165,7 @@ static void ReportPath(uv_handle_t* h, JSONWriter* writer) {
 }
 
 // Utility function to walk libuv handles.
-void WalkHandle(uv_handle_t* h, void* arg) {
+void WalkHandle(uv_handle_t* h, void* arg, bool exclude_network = false) {
   const char* type = uv_handle_type_name(h->type);
   JSONWriter* writer = static_cast<JSONWriter*>(arg);
   uv_any_handle* handle = reinterpret_cast<uv_any_handle*>(h);
@@ -177,7 +187,7 @@ void WalkHandle(uv_handle_t* h, void* arg) {
       break;
     case UV_TCP:
     case UV_UDP:
-      ReportEndpoints(h, writer);
+      ReportEndpoints(h, writer, exclude_network);
       break;
     case UV_NAMED_PIPE:
       ReportPipeEndpoints(h, writer);
@@ -267,6 +277,11 @@ void WalkHandle(uv_handle_t* h, void* arg) {
   }
   writer->json_end();
 }
-
+void WalkHandleNetwork(uv_handle_t* h, void* arg) {
+  WalkHandle(h, arg, false);
+}
+void WalkHandleNoNetwork(uv_handle_t* h, void* arg) {
+  WalkHandle(h, arg, true);
+}
 }  // namespace report
 }  // namespace node
diff --git a/test/common/report.js b/test/common/report.js
index 6e41561186570d..71b1645afdc95d 100644
--- a/test/common/report.js
+++ b/test/common/report.js
@@ -105,7 +105,7 @@ function _validateContent(report, fields = []) {
                         'glibcVersionRuntime', 'glibcVersionCompiler', 'cwd',
                         'reportVersion', 'networkInterfaces', 'threadId'];
   checkForUnknownFields(header, headerFields);
-  assert.strictEqual(header.reportVersion, 3);  // Increment as needed.
+  assert.strictEqual(header.reportVersion, 4);  // Increment as needed.
   assert.strictEqual(typeof header.event, 'string');
   assert.strictEqual(typeof header.trigger, 'string');
   assert(typeof header.filename === 'string' || header.filename === null);
diff --git a/test/report/test-report-exclude-network.js b/test/report/test-report-exclude-network.js
index c5e50135482f1a..7d0eaa08997cb5 100644
--- a/test/report/test-report-exclude-network.js
+++ b/test/report/test-report-exclude-network.js
@@ -1,5 +1,6 @@
 'use strict';
 require('../common');
+const http = require('node:http');
 const assert = require('node:assert');
 const { spawnSync } = require('node:child_process');
 const tmpdir = require('../common/tmpdir');
@@ -38,4 +39,60 @@ describe('report exclude network option', () => {
     const report = process.report.getReport();
     assert.strictEqual(report.header.networkInterfaces, undefined);
   });
+
+  it('should not do DNS queries in libuv if exclude network', async () => {
+    const server = http.createServer(function(req, res) {
+      res.writeHead(200, { 'Content-Type': 'text/plain' });
+      res.end();
+    });
+    let ipv6Available = true;
+    const port = await new Promise((resolve) => server.listen(0, async () => {
+      await Promise.all([
+        fetch('http://127.0.0.1:' + server.address().port),
+        fetch('http://[::1]:' + server.address().port).catch(() => ipv6Available = false),
+      ]);
+      resolve(server.address().port);
+      server.close();
+    }));
+    process.report.excludeNetwork = false;
+    let report = process.report.getReport();
+    let tcp = report.libuv.filter((uv) => uv.type === 'tcp' && uv.remoteEndpoint?.port === port);
+    assert.strictEqual(tcp.length, ipv6Available ? 2 : 1);
+    const findHandle = (local, ip4 = true) => {
+      return tcp.find(
+        ({ [local ? 'localEndpoint' : 'remoteEndpoint']: ep }) =>
+          (ep[ip4 ? 'ip4' : 'ip6'] === (ip4 ? '127.0.0.1' : '::1')),
+      )?.[local ? 'localEndpoint' : 'remoteEndpoint'];
+    };
+    try {
+      // The reverse DNS of 127.0.0.1 can be many things other than localhost;
+      // it could resolve to the server name, for instance.
+      assert.notStrictEqual(findHandle(true)?.host, '127.0.0.1');
+      assert.notStrictEqual(findHandle(false)?.host, '127.0.0.1');
+
+      if (ipv6Available) {
+        assert.notStrictEqual(findHandle(true, false)?.host, '::1');
+        assert.notStrictEqual(findHandle(false, false)?.host, '::1');
+      }
+    } catch (e) {
+      throw new Error(e?.message + ' in ' + JSON.stringify(tcp, null, 2), { cause: e });
+    }
+
+    process.report.excludeNetwork = true;
+    report = process.report.getReport();
+    tcp = report.libuv.filter((uv) => uv.type === 'tcp' && uv.remoteEndpoint?.port === port);
+
+    try {
+      assert.strictEqual(tcp.length, ipv6Available ? 2 : 1);
+      assert.strictEqual(findHandle(true)?.host, '127.0.0.1');
+      assert.strictEqual(findHandle(false)?.host, '127.0.0.1');
+
+      if (ipv6Available) {
+        assert.strictEqual(findHandle(true, false)?.host, '::1');
+        assert.strictEqual(findHandle(false, false)?.host, '::1');
+      }
+    } catch (e) {
+      throw new Error(e?.message + ' in ' + JSON.stringify(tcp, null, 2), { cause: e });
+    }
+  });
 });

From c14776fbaa568ebfd3d1f73ffc5def1deadbdd21 Mon Sep 17 00:00:00 2001
From: Joe Bowbeer <joe.bowbeer@gmail.com>
Date: Tue, 12 Nov 2024 15:55:32 -0800
Subject: [PATCH 105/216] doc: correct max-semi-space-size statement
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Joe Bowbeer <joe.bowbeer@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/55812
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
---
 doc/api/cli.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/api/cli.md b/doc/api/cli.md
index 1390c50f381f74..fa2fccc9200d78 100644
--- a/doc/api/cli.md
+++ b/doc/api/cli.md
@@ -3260,8 +3260,8 @@ improvement depends on your workload (see [#42511][]).
 
 The default value depends on the memory limit. For example, on 64-bit systems
 with a memory limit of 512 MiB, the max size of a semi-space defaults to 1 MiB.
-On 64-bit systems with a memory limit of 2 GiB, the max size of a semi-space
-defaults to 16 MiB.
+For memory limits up to and including 2 GiB, the default max size of a
+semi-space will be less than 16 MiB on 64-bit systems.
 
 To get the best configuration for your application, you should try different
 max-semi-space-size values when running benchmarks for your application.

From 79876f0dfdeda4bb7bae1c357f835e1b69a49e8b Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Wed, 13 Nov 2024 16:47:28 +0000
Subject: [PATCH 106/216] doc: fix history info for `URL.prototype.toJSON`

PR-URL: https://github.com/nodejs/node/pull/55818
Fixes: https://github.com/nodejs/node/issues/55806
Refs: https://github.com/nodejs/node/pull/11236
Refs: https://github.com/nodejs/node/pull/17365
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
---
 doc/api/url.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/api/url.md b/doc/api/url.md
index 268ae31245893c..efaa9786a0ea57 100644
--- a/doc/api/url.md
+++ b/doc/api/url.md
@@ -598,6 +598,12 @@ value returned is equivalent to that of [`url.href`][] and [`url.toJSON()`][].
 
 #### `url.toJSON()`
 
+<!-- YAML
+added:
+  - v7.7.0
+  - v6.13.0
+-->
+
 * Returns: {string}
 
 The `toJSON()` method on the `URL` object returns the serialized URL. The

From d6738e919aa5ac34dd245fccb6a0a3108a0b73cd Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Thu, 14 Nov 2024 19:43:58 -0300
Subject: [PATCH 107/216] doc: add notable-change mention to sec release

PR-URL: https://github.com/nodejs/node/pull/55830
Refs: https://github.com/nodejs/changelog-maker/pull/167
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 doc/contributing/releases.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/contributing/releases.md b/doc/contributing/releases.md
index 400aa51fb2667d..5914d4b953bb92 100644
--- a/doc/contributing/releases.md
+++ b/doc/contributing/releases.md
@@ -311,6 +311,9 @@ $ git checkout -b v1.2.3-proposal upstream/v1.x
 git cherry-pick  ...  # cherry-pick nodejs-private PR commits directly into the proposal
 ```
 
+Be sure to label the CVE fixes as `notable-change` in the nodejs-private repository.
+This will ensure they are included in the "Notable Changes" section of the CHANGELOG.
+
 </details>
 
 ### 3. Update `src/node_version.h`

From 955690e6cf170acb9a71b79d0d9faea3c19014fa Mon Sep 17 00:00:00 2001
From: Preveen P <31464911+preveen-stack@users.noreply.github.com>
Date: Fri, 15 Nov 2024 14:07:52 +0530
Subject: [PATCH 108/216] doc: clarify UV_THREADPOOL_SIZE env var usage

Setting UV_THREADPOOL_SIZE from inside the process using
process.env.UV_THREADPOOL_SIZE is not guaranteed to work, as
the thread pool is created as part of the runtime initialisation,
long before user code is run.

update doc/api/cli.md

PR-URL: https://github.com/nodejs/node/pull/55832
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 doc/api/cli.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
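
Note on a possible workaround: since assigning `process.env.UV_THREADPOOL_SIZE` from user code may come too late, one option is to set the variable in the environment a process is launched with. The sketch below does this for a child process; `runWithThreadpoolSize` is a hypothetical helper, not something this patch adds, and the child here only echoes the variable back to show that it is visible at startup.

```javascript
'use strict';
const { spawnSync } = require('node:child_process');

// Hypothetical helper: start a Node.js child process with UV_THREADPOOL_SIZE
// already present in its environment, so it is set before the runtime starts.
function runWithThreadpoolSize(size, args) {
  const env = { ...process.env, UV_THREADPOOL_SIZE: String(size) };
  return spawnSync(process.execPath, args, { env, encoding: 'utf8' });
}

// The child prints the value it sees in its own environment.
const child = runWithThreadpoolSize(8, ['-p', 'process.env.UV_THREADPOOL_SIZE']);
const seenByChild = child.stdout.trim();
console.log(seenByChild);
```

From a shell, the equivalent would be to export the variable before starting `node`.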

diff --git a/doc/api/cli.md b/doc/api/cli.md
index fa2fccc9200d78..0b38b15ec47e39 100644
--- a/doc/api/cli.md
+++ b/doc/api/cli.md
@@ -3165,8 +3165,10 @@ reason any of these APIs takes a long time, other (seemingly unrelated) APIs
 that run in libuv's threadpool will experience degraded performance. In order to
 mitigate this issue, one potential solution is to increase the size of libuv's
 threadpool by setting the `'UV_THREADPOOL_SIZE'` environment variable to a value
-greater than `4` (its current default value). For more information, see the
-[libuv threadpool documentation][].
+greater than `4` (its current default value). However, setting this from inside
+the process using `process.env.UV_THREADPOOL_SIZE=size` is not guaranteed to work,
+as the threadpool is created during runtime initialisation, long before user
+code runs. For more information, see the [libuv threadpool documentation][].
 
 ### `UV_USE_IO_URING=value`
 

From 3d11a85fe5a4b7ad4f3c9ff288ca91dd3b0a6cd5 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Fri, 15 Nov 2024 14:10:22 +0000
Subject: [PATCH 109/216] doc: add `-S` flag release preparation example
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55836
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 doc/contributing/releases.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/contributing/releases.md b/doc/contributing/releases.md
index 5914d4b953bb92..82eec2833998ad 100644
--- a/doc/contributing/releases.md
+++ b/doc/contributing/releases.md
@@ -284,8 +284,8 @@ You can integrate the PRs into the proposal without running full CI.
 
 ⚠️ At this point, you can either run `git node release --prepare`:
 
-```console
-$ git node release --prepare x.y.z
+```bash
+git node release -S --prepare x.y.z
 ```
 
 to automate the remaining steps until step 6 or you can perform it manually

From b4f5da18a516974e3e039925fa3ee7e52257f3f6 Mon Sep 17 00:00:00 2001
From: Aviv Keller <redyetidev@gmail.com>
Date: Fri, 15 Nov 2024 22:55:13 -0500
Subject: [PATCH 110/216] benchmark: add `test-reporters`
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55757
Refs: https://github.com/nodejs/node/issues/55723
Reviewed-By: Pietro Marchini <pietro.marchini94@gmail.com>
Reviewed-By: Vinícius Lourenço Claro Cardoso <contact@viniciusl.com.br>
Reviewed-By: Raz Luvaton <rluvaton@gmail.com>
---
 benchmark/fixtures/basic-test-runner.js | 11 +++++++
 benchmark/test_runner/test-reporters.js | 41 +++++++++++++++++++++++++
 2 files changed, 52 insertions(+)
 create mode 100644 benchmark/fixtures/basic-test-runner.js
 create mode 100644 benchmark/test_runner/test-reporters.js

diff --git a/benchmark/fixtures/basic-test-runner.js b/benchmark/fixtures/basic-test-runner.js
new file mode 100644
index 00000000000000..d57f70da0d40a5
--- /dev/null
+++ b/benchmark/fixtures/basic-test-runner.js
@@ -0,0 +1,11 @@
+const { test } = require('node:test');
+
+test('should pass', () => {});
+test('should fail', () => { throw new Error('fail'); });
+test('should skip', { skip: true }, () => {});
+test('parent', (t) => {
+  t.test('should fail', () => { throw new Error('fail'); });
+  t.test('should pass but parent fail', (t, done) => {
+    setImmediate(done);
+  });
+});
diff --git a/benchmark/test_runner/test-reporters.js b/benchmark/test_runner/test-reporters.js
new file mode 100644
index 00000000000000..4eecd1c4306005
--- /dev/null
+++ b/benchmark/test_runner/test-reporters.js
@@ -0,0 +1,41 @@
+'use strict';
+
+const common = require('../common');
+const { run } = require('node:test');
+const reporters = require('node:test/reporters');
+const { Readable } = require('node:stream');
+const assert = require('node:assert');
+
+const bench = common.createBenchmark(main, {
+  n: [1e4],
+  reporter: Object.keys(reporters),
+});
+
+// No need to run this for every benchmark,
+// it should always be the same data.
+const stream = run({
+  files: ['../fixtures/basic-test-runner.js'],
+});
+let testResults;
+
+async function main({ n, reporter: r }) {
+  testResults ??= await stream.toArray();
+
+  // Create readable streams for each iteration
+  const readables = Array.from({ length: n }, () => Readable.from(testResults));
+
+  // Get the selected reporter
+  const reporter = reporters[r];
+
+  bench.start();
+
+  let noDead;
+  for (const readable of readables) {
+    // Process each readable stream through the reporter
+    noDead = await readable.compose(reporter).toArray();
+  }
+
+  bench.end(n);
+
+  assert.ok(noDead);
+}

From afed723b6c6c3ed373d208b842031548c93dc07e Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Sat, 16 Nov 2024 18:24:13 -0500
Subject: [PATCH 111/216] deps: update simdutf to 5.6.1

PR-URL: https://github.com/nodejs/node/pull/55850
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 deps/simdutf/simdutf.cpp | 617 +++++++++++++++++++++++++++++++++------
 deps/simdutf/simdutf.h   | 268 +++++++++++------
 2 files changed, 703 insertions(+), 182 deletions(-)

diff --git a/deps/simdutf/simdutf.cpp b/deps/simdutf/simdutf.cpp
index e6b25c7ce27c16..d5f6fbd9a4c413 100644
--- a/deps/simdutf/simdutf.cpp
+++ b/deps/simdutf/simdutf.cpp
@@ -1,4 +1,4 @@
-/* auto-generated on 2024-10-11 12:35:29 -0400. Do not edit! */
+/* auto-generated on 2024-11-12 20:00:19 -0500. Do not edit! */
 /* begin file src/simdutf.cpp */
 #include "simdutf.h"
 // We include base64_tables once.
@@ -937,6 +937,10 @@ class implementation final : public simdutf::implementation {
       const char *input, size_t length, char *output, base64_options options,
       last_chunk_handling_options last_chunk_options =
           last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   simdutf_warn_unused size_t maximal_binary_length_from_base64(
       const char16_t *input, size_t length) const noexcept;
   simdutf_warn_unused result
@@ -944,6 +948,11 @@ class implementation final : public simdutf::implementation {
                    base64_options options,
                    last_chunk_handling_options last_chunk_options =
                        last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   simdutf_warn_unused size_t base64_length_from_binary(
       size_t length, base64_options options) const noexcept;
   size_t binary_to_base64(const char *input, size_t length, char *output,
@@ -2547,6 +2556,10 @@ class implementation final : public simdutf::implementation {
       const char *input, size_t length, char *output, base64_options options,
       last_chunk_handling_options last_chunk_options =
           last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   simdutf_warn_unused size_t maximal_binary_length_from_base64(
       const char16_t *input, size_t length) const noexcept;
   simdutf_warn_unused result
@@ -2554,6 +2567,11 @@ class implementation final : public simdutf::implementation {
                    base64_options options,
                    last_chunk_handling_options last_chunk_options =
                        last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   simdutf_warn_unused size_t base64_length_from_binary(
       size_t length, base64_options options) const noexcept;
   size_t binary_to_base64(const char *input, size_t length, char *output,
@@ -2885,6 +2903,10 @@ class implementation final : public simdutf::implementation {
                    base64_options options,
                    last_chunk_handling_options last_chunk_options =
                        last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused virtual full_result base64_to_binary_details(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   simdutf_warn_unused virtual size_t
   maximal_binary_length_from_base64(const char16_t *input,
                                     size_t length) const noexcept;
@@ -2893,6 +2915,11 @@ class implementation final : public simdutf::implementation {
                    base64_options options,
                    last_chunk_handling_options last_chunk_options =
                        last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused virtual full_result base64_to_binary_details(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   simdutf_warn_unused virtual size_t
   base64_length_from_binary(size_t length,
                             base64_options options) const noexcept;
@@ -4142,6 +4169,10 @@ class implementation final : public simdutf::implementation {
       const char *input, size_t length, char *output, base64_options options,
       last_chunk_handling_options last_chunk_options =
           last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   simdutf_warn_unused size_t maximal_binary_length_from_base64(
       const char16_t *input, size_t length) const noexcept;
   simdutf_warn_unused result
@@ -4149,6 +4180,11 @@ class implementation final : public simdutf::implementation {
                    base64_options options,
                    last_chunk_handling_options last_chunk_options =
                        last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   simdutf_warn_unused size_t base64_length_from_binary(
       size_t length, base64_options options) const noexcept;
   size_t binary_to_base64(const char *input, size_t length, char *output,
@@ -5386,6 +5422,10 @@ class implementation final : public simdutf::implementation {
       const char *input, size_t length, char *output, base64_options options,
       last_chunk_handling_options last_chunk_options =
           last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   simdutf_warn_unused size_t maximal_binary_length_from_base64(
       const char16_t *input, size_t length) const noexcept;
   simdutf_warn_unused result
@@ -5393,6 +5433,11 @@ class implementation final : public simdutf::implementation {
                    base64_options options,
                    last_chunk_handling_options last_chunk_options =
                        last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   simdutf_warn_unused size_t base64_length_from_binary(
       size_t length, base64_options options) const noexcept;
   size_t binary_to_base64(const char *input, size_t length, char *output,
@@ -6172,6 +6217,10 @@ class implementation final : public simdutf::implementation {
       const char *input, size_t length, char *output, base64_options options,
       last_chunk_handling_options last_chunk_options =
           last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   simdutf_warn_unused size_t maximal_binary_length_from_base64(
       const char16_t *input, size_t length) const noexcept;
   simdutf_warn_unused result
@@ -6179,6 +6228,11 @@ class implementation final : public simdutf::implementation {
                    base64_options options,
                    last_chunk_handling_options last_chunk_options =
                        last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   simdutf_warn_unused size_t base64_length_from_binary(
       size_t length, base64_options options) const noexcept;
   size_t binary_to_base64(const char *input, size_t length, char *output,
@@ -6579,6 +6633,10 @@ class implementation final : public simdutf::implementation {
   simdutf_warn_unused result base64_to_binary(
       const char *input, size_t length, char *output, base64_options options,
       last_chunk_handling_options last_chunk_options) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   simdutf_warn_unused size_t maximal_binary_length_from_base64(
       const char16_t *input, size_t length) const noexcept;
   simdutf_warn_unused result base64_to_binary(
@@ -6587,6 +6645,11 @@ class implementation final : public simdutf::implementation {
       last_chunk_handling_options last_chunk_options) const noexcept;
   simdutf_warn_unused size_t base64_length_from_binary(
       size_t length, base64_options options) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
   size_t binary_to_base64(const char *input, size_t length, char *output,
                           base64_options options) const noexcept;
 };
@@ -7176,7 +7239,7 @@ template <class char_type> bool is_eight_byte(char_type c) {
 // Returns true upon success. The destination buffer must be large enough.
 // This functions assumes that the padding (=) has been removed.
 template <class char_type>
-result
+full_result
 base64_tail_decode(char *dst, const char_type *src, size_t length,
                    size_t padded_characters, // number of padding characters
                                              // '=', typically 0, 1, 2.
@@ -7229,7 +7292,8 @@ base64_tail_decode(char *dst, const char_type *src, size_t length,
       if (is_eight_byte(c) && code <= 63) {
         idx++;
       } else if (code > 64 || !scalar::base64::is_eight_byte(c)) {
-        return {INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+        return {INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
       } else {
         // We have a space or a newline. We ignore it.
       }
@@ -7239,17 +7303,23 @@ base64_tail_decode(char *dst, const char_type *src, size_t length,
       if (last_chunk_options == last_chunk_handling_options::strict &&
           (idx != 1) && ((idx + padded_characters) & 3) != 0) {
         // The partial chunk was at src - idx
-        return {BASE64_INPUT_REMAINDER, size_t(dst - dstinit)};
+        return {BASE64_INPUT_REMAINDER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
       } else if (last_chunk_options ==
                      last_chunk_handling_options::stop_before_partial &&
                  (idx != 1) && ((idx + padded_characters) & 3) != 0) {
         // Rewind src to before partial chunk
         src -= idx;
-        return {SUCCESS, size_t(dst - dstinit)};
+        return {SUCCESS, size_t(src - srcinit), size_t(dst - dstinit)};
       } else {
         if (idx == 2) {
           uint32_t triple =
               (uint32_t(buffer[0]) << 3 * 6) + (uint32_t(buffer[1]) << 2 * 6);
+          if ((last_chunk_options == last_chunk_handling_options::strict) &&
+              (triple & 0xffff)) {
+            return {BASE64_EXTRA_BITS, size_t(src - srcinit),
+                    size_t(dst - dstinit)};
+          }
           if (match_system(endianness::BIG)) {
             triple <<= 8;
             std::memcpy(dst, &triple, 1);
@@ -7263,6 +7333,11 @@ base64_tail_decode(char *dst, const char_type *src, size_t length,
           uint32_t triple = (uint32_t(buffer[0]) << 3 * 6) +
                             (uint32_t(buffer[1]) << 2 * 6) +
                             (uint32_t(buffer[2]) << 1 * 6);
+          if ((last_chunk_options == last_chunk_handling_options::strict) &&
+              (triple & 0xff)) {
+            return {BASE64_EXTRA_BITS, size_t(src - srcinit),
+                    size_t(dst - dstinit)};
+          }
           if (match_system(endianness::BIG)) {
             triple <<= 8;
             std::memcpy(dst, &triple, 2);
@@ -7273,9 +7348,10 @@ base64_tail_decode(char *dst, const char_type *src, size_t length,
           }
           dst += 2;
         } else if (idx == 1) {
-          return {BASE64_INPUT_REMAINDER, size_t(dst - dstinit)};
+          return {BASE64_INPUT_REMAINDER, size_t(src - srcinit),
+                  size_t(dst - dstinit)};
         }
-        return {SUCCESS, size_t(dst - dstinit)};
+        return {SUCCESS, size_t(src - srcinit), size_t(dst - dstinit)};
       }
     }
 
@@ -7295,17 +7371,15 @@ base64_tail_decode(char *dst, const char_type *src, size_t length,
 }
 
 // like base64_tail_decode, but it will not write past the end of the output
-// buffer. outlen is modified to reflect the number of bytes written. This
-// functions assumes that the padding (=) has been removed.
-// like base64_tail_decode, but it will not write past the end of the output
-// buffer. outlen is modified to reflect the number of bytes written. This
-// functions assumes that the padding (=) has been removed.
+// buffer. The outlen parameter is modified to reflect the number of bytes
+// written. This function assumes that the padding (=) has been removed.
 template <class char_type>
 result base64_tail_decode_safe(
-    char *dst, size_t &outlen, const char_type *src, size_t length,
+    char *dst, size_t &outlen, const char_type *&srcr, size_t length,
     size_t padded_characters, // number of padding characters '=', typically 0,
                               // 1, 2.
     base64_options options, last_chunk_handling_options last_chunk_options) {
+  const char_type *src = srcr;
   if (length == 0) {
     outlen = 0;
     return {SUCCESS, 0};
@@ -7344,6 +7418,7 @@ result base64_tail_decode_safe(
                 d2[uint8_t(src[2])] | d3[uint8_t(src[3])]) < 0x01FFFFFF) {
       if (dstend - dst < 3) {
         outlen = size_t(dst - dstinit);
+        srcr = src;
         return {OUTPUT_BUFFER_TOO_SMALL, size_t(src - srcinit)};
       }
       if (match_system(endianness::BIG)) {
@@ -7365,6 +7440,7 @@ result base64_tail_decode_safe(
         idx++;
       } else if (code > 64 || !scalar::base64::is_eight_byte(c)) {
         outlen = size_t(dst - dstinit);
+        srcr = src;
         return {INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
       } else {
         // We have a space or a newline. We ignore it.
@@ -7375,33 +7451,42 @@ result base64_tail_decode_safe(
       if (last_chunk_options == last_chunk_handling_options::strict &&
           ((idx + padded_characters) & 3) != 0) {
         outlen = size_t(dst - dstinit);
+        srcr = src;
         return {BASE64_INPUT_REMAINDER, size_t(src - srcinit)};
       } else if (last_chunk_options ==
                      last_chunk_handling_options::stop_before_partial &&
                  ((idx + padded_characters) & 3) != 0) {
         // Rewind src to before partial chunk
-        src = srccur;
+        srcr = srccur;
         outlen = size_t(dst - dstinit);
         return {SUCCESS, size_t(dst - dstinit)};
       } else { // loose mode
         if (idx == 0) {
           // No data left; return success
           outlen = size_t(dst - dstinit);
+          srcr = src;
           return {SUCCESS, size_t(dst - dstinit)};
         } else if (idx == 1) {
           // Error: Incomplete chunk of length 1 is invalid in loose mode
           outlen = size_t(dst - dstinit);
+          srcr = src;
           return {BASE64_INPUT_REMAINDER, size_t(src - srcinit)};
         } else if (idx == 2 || idx == 3) {
           // Check if there's enough space in the destination buffer
           size_t required_space = (idx == 2) ? 1 : 2;
           if (size_t(dstend - dst) < required_space) {
             outlen = size_t(dst - dstinit);
+            srcr = src;
             return {OUTPUT_BUFFER_TOO_SMALL, size_t(srccur - srcinit)};
           }
           uint32_t triple = 0;
           if (idx == 2) {
             triple = (uint32_t(buffer[0]) << 18) + (uint32_t(buffer[1]) << 12);
+            if ((last_chunk_options == last_chunk_handling_options::strict) &&
+                (triple & 0xffff)) {
+              srcr = src;
+              return {BASE64_EXTRA_BITS, size_t(src - srcinit)};
+            }
             // Extract the first byte
             triple >>= 16;
             dst[0] = static_cast<char>(triple & 0xFF);
@@ -7409,6 +7494,11 @@ result base64_tail_decode_safe(
           } else if (idx == 3) {
             triple = (uint32_t(buffer[0]) << 18) + (uint32_t(buffer[1]) << 12) +
                      (uint32_t(buffer[2]) << 6);
+            if ((last_chunk_options == last_chunk_handling_options::strict) &&
+                (triple & 0xff)) {
+              srcr = src;
+              return {BASE64_EXTRA_BITS, size_t(src - srcinit)};
+            }
             // Extract the first two bytes
             triple >>= 8;
             dst[0] = static_cast<char>((triple >> 8) & 0xFF);
@@ -7416,6 +7506,7 @@ result base64_tail_decode_safe(
             dst += 2;
           }
           outlen = size_t(dst - dstinit);
+          srcr = src;
           return {SUCCESS, size_t(dst - dstinit)};
         }
       }
@@ -7423,6 +7514,7 @@ result base64_tail_decode_safe(
 
     if (dstend - dst < 3) {
       outlen = size_t(dst - dstinit);
+      srcr = src;
       return {OUTPUT_BUFFER_TOO_SMALL, size_t(srccur - srcinit)};
     }
     uint32_t triple = (uint32_t(buffer[0]) << 18) +
@@ -8243,6 +8335,14 @@ class detect_best_supported_implementation_on_first_use final
                                         last_chunk_handling_options);
   }
 
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_handling_options =
+          last_chunk_handling_options::loose) const noexcept override {
+    return set_best()->base64_to_binary_details(input, length, output, options,
+                                                last_chunk_handling_options);
+  }
+
   simdutf_warn_unused size_t maximal_binary_length_from_base64(
       const char16_t *input, size_t length) const noexcept override {
     return set_best()->maximal_binary_length_from_base64(input, length);
@@ -8257,6 +8357,15 @@ class detect_best_supported_implementation_on_first_use final
                                         last_chunk_handling_options);
   }
 
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_handling_options =
+          last_chunk_handling_options::loose) const noexcept override {
+    return set_best()->base64_to_binary_details(input, length, output, options,
+                                                last_chunk_handling_options);
+  }
+
   simdutf_warn_unused size_t base64_length_from_binary(
       size_t length, base64_options options) const noexcept override {
     return set_best()->base64_length_from_binary(length, options);
@@ -8706,6 +8815,12 @@ class unsupported_implementation final : public implementation {
     return result(error_code::OTHER, 0);
   }
 
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char *, size_t, char *, base64_options,
+      last_chunk_handling_options) const noexcept override {
+    return full_result(error_code::OTHER, 0, 0);
+  }
+
   simdutf_warn_unused size_t maximal_binary_length_from_base64(
       const char16_t *, size_t) const noexcept override {
     return 0;
@@ -8717,6 +8832,12 @@ class unsupported_implementation final : public implementation {
     return result(error_code::OTHER, 0);
   }
 
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char16_t *, size_t, char *, base64_options,
+      last_chunk_handling_options) const noexcept override {
+    return full_result(error_code::OTHER, 0, 0);
+  }
+
   simdutf_warn_unused size_t
   base64_length_from_binary(size_t, base64_options) const noexcept override {
     return 0;
@@ -9365,11 +9486,12 @@ simdutf_warn_unused result base64_to_binary_safe_impl(
   size_t max_length = maximal_binary_length_from_base64(input, length);
   if (outlen >= max_length) {
     // fast path
-    result r = base64_to_binary(input, length, output, options,
-                                last_chunk_handling_options);
-    if (r.error != error_code::INVALID_BASE64_CHARACTER) {
-      outlen = r.count;
-      r.count = length;
+    full_result r = get_default_implementation()->base64_to_binary_details(
+        input, length, output, options, last_chunk_handling_options);
+    if (r.error != error_code::INVALID_BASE64_CHARACTER &&
+        r.error != error_code::BASE64_EXTRA_BITS) {
+      outlen = r.output_count;
+      return {r.error, length};
     }
     return r;
   }
@@ -9377,14 +9499,16 @@ simdutf_warn_unused result base64_to_binary_safe_impl(
   // the input.
   size_t outlen3 = outlen / 3 * 3; // round down to multiple of 3
   size_t safe_input = base64_length_from_binary(outlen3, options);
-  result r = base64_to_binary(input, safe_input, output, options, loose);
+  full_result r = get_default_implementation()->base64_to_binary_details(
+      input, safe_input, output, options, loose);
   if (r.error == error_code::INVALID_BASE64_CHARACTER) {
     return r;
   }
-  size_t offset = (r.error == error_code::BASE64_INPUT_REMAINDER)
-                      ? 1
-                      : ((r.count % 3) == 0 ? 0 : (r.count % 3) + 1);
-  size_t output_index = r.count - (r.count % 3);
+  size_t offset =
+      (r.error == error_code::BASE64_INPUT_REMAINDER)
+          ? 1
+          : ((r.output_count % 3) == 0 ? 0 : (r.output_count % 3) + 1);
+  size_t output_index = r.output_count - (r.output_count % 3);
   size_t input_index = safe_input;
   // offset is a value that is no larger than 3. We backtrack
   // by up to offset characters + an undetermined number of
@@ -9419,19 +9543,25 @@ simdutf_warn_unused result base64_to_binary_safe_impl(
       padding_characts++;
     }
   }
-  r = scalar::base64::base64_tail_decode_safe(
+  // this may advance tail_input; tail_length is passed by value
+  result rr = scalar::base64::base64_tail_decode_safe(
       output + output_index, remaining_out, tail_input, tail_length,
       padding_characts, options, last_chunk_handling_options);
   outlen = output_index + remaining_out;
   if (last_chunk_handling_options != stop_before_partial &&
-      r.error == error_code::SUCCESS && padding_characts > 0) {
+      rr.error == error_code::SUCCESS && padding_characts > 0) {
     // additional checks
     if ((outlen % 3 == 0) || ((outlen % 3) + 1 + padding_characts != 4)) {
-      r.error = error_code::INVALID_BASE64_CHARACTER;
+      rr.error = error_code::INVALID_BASE64_CHARACTER;
     }
   }
-  r.count += input_index;
-  return r;
+  if (rr.error == error_code::SUCCESS &&
+      last_chunk_handling_options == stop_before_partial) {
+    rr.count = tail_input - input;
+    return rr;
+  }
+  rr.count += input_index;
+  return rr;
 }
 
 simdutf_warn_unused size_t convert_latin1_to_utf8_safe(
@@ -15614,9 +15744,10 @@ void base64_decode_block(char *out, const char *src) {
 }
 
 template <bool base64_url, typename char_type>
-result compress_decode_base64(char *dst, const char_type *src, size_t srclen,
-                              base64_options options,
-                              last_chunk_handling_options last_chunk_options) {
+full_result
+compress_decode_base64(char *dst, const char_type *src, size_t srclen,
+                       base64_options options,
+                       last_chunk_handling_options last_chunk_options) {
   const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
                                         : tables::base64::to_base64_value;
   size_t equallocation =
@@ -15644,9 +15775,9 @@ result compress_decode_base64(char *dst, const char_type *src, size_t srclen,
   }
   if (srclen == 0) {
     if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
     }
-    return {SUCCESS, 0};
+    return {SUCCESS, 0, 0};
   }
   const char_type *const srcinit = src;
   const char *const dstinit = dst;
@@ -15673,7 +15804,8 @@ result compress_decode_base64(char *dst, const char_type *src, size_t srclen,
           if (src < srcend) {
             // should never happen
           }
-          return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+          return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                  size_t(dst - dstinit)};
         }
       }
 
@@ -15708,7 +15840,8 @@ result compress_decode_base64(char *dst, const char_type *src, size_t srclen,
       uint8_t val = to_base64[uint8_t(*src)];
       *bufferptr = char(val);
       if (!scalar::base64::is_eight_byte(*src) || val > 64) {
-        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
       }
       bufferptr += (val <= 63);
       src++;
@@ -15756,20 +15889,22 @@ result compress_decode_base64(char *dst, const char_type *src, size_t srclen,
     }
   }
   if (src < srcend + equalsigns) {
-    result r = scalar::base64::base64_tail_decode(
+    full_result r = scalar::base64::base64_tail_decode(
         dst, src, srcend - src, equalsigns, options, last_chunk_options);
-    if (r.error == error_code::INVALID_BASE64_CHARACTER) {
-      r.count += size_t(src - srcinit);
+    if (r.error == error_code::INVALID_BASE64_CHARACTER ||
+        r.error == error_code::BASE64_EXTRA_BITS) {
+      r.input_count += size_t(src - srcinit);
       return r;
     } else {
-      r.count += size_t(dst - dstinit);
+      r.output_count += size_t(dst - dstinit);
     }
     if (last_chunk_options != stop_before_partial &&
         r.error == error_code::SUCCESS && equalsigns > 0) {
       // additional checks
-      if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+      if ((r.output_count % 3 == 0) ||
+          ((r.output_count % 3) + 1 + equalsigns != 4)) {
         r.error = error_code::INVALID_BASE64_CHARACTER;
-        r.count = equallocation;
+        r.input_count = equallocation;
       }
     }
     return r;
@@ -15777,10 +15912,10 @@ result compress_decode_base64(char *dst, const char_type *src, size_t srclen,
   if (equalsigns > 0) {
     if ((size_t(dst - dstinit) % 3 == 0) ||
         ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
+      return {INVALID_BASE64_CHARACTER, equallocation, size_t(dst - dstinit)};
     }
   }
-  return {SUCCESS, size_t(dst - dstinit)};
+  return {SUCCESS, srclen, size_t(dst - dstinit)};
 }
 /* end file src/arm64/arm_base64.cpp */
 /* begin file src/arm64/arm_convert_utf32_to_latin1.cpp */
@@ -19261,6 +19396,16 @@ simdutf_warn_unused result implementation::base64_to_binary(
                                              last_chunk_options);
 }
 
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
 simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
     const char16_t *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
@@ -19276,6 +19421,16 @@ simdutf_warn_unused result implementation::base64_to_binary(
                                              last_chunk_options);
 }
 
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
 simdutf_warn_unused size_t implementation::base64_length_from_binary(
     size_t length, base64_options options) const noexcept {
   return scalar::base64::base64_length_from_binary(length, options);
@@ -19822,6 +19977,49 @@ simdutf_warn_unused result implementation::base64_to_binary(
   return r;
 }
 
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
+  }
+  size_t equallocation =
+      length; // location of the first padding character if any
+  size_t equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
+    }
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
+    }
+  }
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+    }
+    return {SUCCESS, 0, 0};
+  }
+  full_result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // additional checks
+    if ((r.output_count % 3 == 0) ||
+        ((r.output_count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation, r.output_count};
+    }
+  }
+  return r;
+}
+
 simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
     const char16_t *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
@@ -19869,6 +20067,49 @@ simdutf_warn_unused result implementation::base64_to_binary(
   return r;
 }
 
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
+  }
+  size_t equallocation =
+      length; // location of the first padding character if any
+  size_t equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
+    }
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
+    }
+  }
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+    }
+    return {SUCCESS, 0, 0};
+  }
+  full_result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // additional checks
+    if ((r.output_count % 3 == 0) ||
+        ((r.output_count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation, r.output_count};
+    }
+  }
+  return r;
+}
+
 simdutf_warn_unused size_t implementation::base64_length_from_binary(
     size_t length, base64_options options) const noexcept {
   return scalar::base64::base64_length_from_binary(length, options);
@@ -23192,18 +23433,41 @@ size_t encode_base64(char *dst, const char *src, size_t srclen,
   const __m512i lookup =
       _mm512_loadu_si512(reinterpret_cast<const __m512i *>(lookup_tbl));
   const __m512i multi_shifts = _mm512_set1_epi64(UINT64_C(0x3036242a1016040a));
-  size_t i = 0;
-  for (; i + 64 <= srclen; i += 48) {
-    const __m512i v =
-        _mm512_loadu_si512(reinterpret_cast<const __m512i *>(input + i));
+  size_t size = srclen;
+  __mmask64 input_mask = 0xffffffffffff; // (1 << 48) - 1
+  while (size >= 48) {
+    const __m512i v = _mm512_maskz_loadu_epi8(
+        input_mask, reinterpret_cast<const __m512i *>(input));
     const __m512i in = _mm512_permutexvar_epi8(shuffle_input, v);
     const __m512i indices = _mm512_multishift_epi64_epi8(multi_shifts, in);
     const __m512i result = _mm512_permutexvar_epi8(indices, lookup);
     _mm512_storeu_si512(reinterpret_cast<__m512i *>(out), result);
     out += 64;
-  }
-  return i / 3 * 4 + scalar::base64::tail_encode_base64((char *)out, src + i,
-                                                        srclen - i, options);
+    input += 48;
+    size -= 48;
+  }
+  input_mask = ((__mmask64)1 << size) - 1;
+  const __m512i v = _mm512_maskz_loadu_epi8(
+      input_mask, reinterpret_cast<const __m512i *>(input));
+  const __m512i in = _mm512_permutexvar_epi8(shuffle_input, v);
+  const __m512i indices = _mm512_multishift_epi64_epi8(multi_shifts, in);
+  bool padding_needed =
+      (((options & base64_url) == 0) ^
+       ((options & base64_reverse_padding) == base64_reverse_padding));
+  size_t padding_amount = ((size % 3) > 0) ? (3 - (size % 3)) : 0;
+  size_t output_len = ((size + 2) / 3) * 4;
+  size_t non_padded_output_len = output_len - padding_amount;
+  if (!padding_needed) {
+    output_len = non_padded_output_len;
+  }
+  __mmask64 output_mask = output_len == 64 ? (__mmask64)UINT64_MAX
+                                           : ((__mmask64)1 << output_len) - 1;
+  __m512i result = _mm512_mask_permutexvar_epi8(
+      _mm512_set1_epi8('='), ((__mmask64)1 << non_padded_output_len) - 1,
+      indices, lookup);
+  _mm512_mask_storeu_epi8(reinterpret_cast<__m512i *>(out), output_mask,
+                          result);
+  return (size_t)(out - (uint8_t *)dst) + output_len;
 }
 
 template <bool base64_url>
@@ -23309,9 +23573,10 @@ static inline void base64_decode_block(char *out, block64 *b) {
 }
 
 template <bool base64_url, typename chartype>
-result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
-                              base64_options options,
-                              last_chunk_handling_options last_chunk_options) {
+full_result
+compress_decode_base64(char *dst, const chartype *src, size_t srclen,
+                       base64_options options,
+                       last_chunk_handling_options last_chunk_options) {
   const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
                                         : tables::base64::to_base64_value;
   size_t equallocation =
@@ -23339,9 +23604,9 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   }
   if (srclen == 0) {
     if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
     }
-    return {SUCCESS, 0};
+    return {SUCCESS, 0, 0};
   }
   const chartype *const srcinit = src;
   const char *const dstinit = dst;
@@ -23365,7 +23630,8 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
                to_base64[uint8_t(*src)] <= 64) {
           src++;
         }
-        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
       }
       if (badcharmask != 0) {
         // optimization opportunity: check for simple masks like those made of
@@ -23401,7 +23667,8 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       uint8_t val = to_base64[uint8_t(*src)];
       *bufferptr = char(val);
       if (!scalar::base64::is_eight_byte(*src) || val > 64) {
-        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
       }
       bufferptr += (val <= 63);
       src++;
@@ -23447,20 +23714,22 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
     }
   }
   if (src < srcend + equalsigns) {
-    result r = scalar::base64::base64_tail_decode(
+    full_result r = scalar::base64::base64_tail_decode(
         dst, src, srcend - src, equalsigns, options, last_chunk_options);
-    if (r.error == error_code::INVALID_BASE64_CHARACTER) {
-      r.count += size_t(src - srcinit);
+    if (r.error == error_code::INVALID_BASE64_CHARACTER ||
+        r.error == error_code::BASE64_EXTRA_BITS) {
+      r.input_count += size_t(src - srcinit);
       return r;
     } else {
-      r.count += size_t(dst - dstinit);
+      r.output_count += size_t(dst - dstinit);
     }
     if (last_chunk_options != stop_before_partial &&
         r.error == error_code::SUCCESS && equalsigns > 0) {
       // additional checks
-      if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+      if ((r.output_count % 3 == 0) ||
+          ((r.output_count % 3) + 1 + equalsigns != 4)) {
         r.error = error_code::INVALID_BASE64_CHARACTER;
-        r.count = equallocation;
+        r.input_count = equallocation;
       }
     }
     return r;
@@ -23468,10 +23737,10 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   if (equalsigns > 0) {
     if ((size_t(dst - dstinit) % 3 == 0) ||
         ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
+      return {INVALID_BASE64_CHARACTER, equallocation, size_t(dst - dstinit)};
     }
   }
-  return {SUCCESS, size_t(dst - dstinit)};
+  return {SUCCESS, srclen, size_t(dst - dstinit)};
 }
 /* end file src/icelake/icelake_base64.inl.cpp */
 
@@ -25032,6 +25301,16 @@ simdutf_warn_unused result implementation::base64_to_binary(
                                              last_chunk_options);
 }
 
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
 simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
     const char16_t *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
@@ -25047,6 +25326,16 @@ simdutf_warn_unused result implementation::base64_to_binary(
                                              last_chunk_options);
 }
 
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
 simdutf_warn_unused size_t implementation::base64_length_from_binary(
     size_t length, base64_options options) const noexcept {
   return scalar::base64::base64_length_from_binary(length, options);
@@ -28101,9 +28390,10 @@ static inline void base64_decode_block_safe(char *out, block64 *b) {
 }
 
 template <bool base64_url, typename chartype>
-result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
-                              base64_options options,
-                              last_chunk_handling_options last_chunk_options) {
+full_result
+compress_decode_base64(char *dst, const chartype *src, size_t srclen,
+                       base64_options options,
+                       last_chunk_handling_options last_chunk_options) {
   const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
                                         : tables::base64::to_base64_value;
   size_t equallocation =
@@ -28131,9 +28421,9 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   }
   if (srclen == 0) {
     if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
     }
-    return {SUCCESS, 0};
+    return {SUCCESS, 0, 0};
   }
   char *end_of_safe_64byte_zone =
       (srclen + 3) / 4 * 3 >= 63 ? dst + (srclen + 3) / 4 * 3 - 63 : dst;
@@ -28160,7 +28450,8 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
                to_base64[uint8_t(*src)] <= 64) {
           src++;
         }
-        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
       }
       if (badcharmask != 0) {
         // optimization opportunity: check for simple masks like those made of
@@ -28206,7 +28497,8 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       uint8_t val = to_base64[uint8_t(*src)];
       *bufferptr = char(val);
       if (!scalar::base64::is_eight_byte(*src) || val > 64) {
-        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
       }
       bufferptr += (val <= 63);
       src++;
@@ -28258,20 +28550,22 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
     }
   }
   if (src < srcend + equalsigns) {
-    result r = scalar::base64::base64_tail_decode(
+    full_result r = scalar::base64::base64_tail_decode(
         dst, src, srcend - src, equalsigns, options, last_chunk_options);
-    if (r.error == error_code::INVALID_BASE64_CHARACTER) {
-      r.count += size_t(src - srcinit);
+    if (r.error == error_code::INVALID_BASE64_CHARACTER ||
+        r.error == error_code::BASE64_EXTRA_BITS) {
+      r.input_count += size_t(src - srcinit);
       return r;
     } else {
-      r.count += size_t(dst - dstinit);
+      r.output_count += size_t(dst - dstinit);
     }
     if (last_chunk_options != stop_before_partial &&
         r.error == error_code::SUCCESS && equalsigns > 0) {
       // additional checks
-      if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+      if ((r.output_count % 3 == 0) ||
+          ((r.output_count % 3) + 1 + equalsigns != 4)) {
         r.error = error_code::INVALID_BASE64_CHARACTER;
-        r.count = equallocation;
+        r.input_count = equallocation;
       }
     }
     return r;
@@ -28279,10 +28573,10 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   if (equalsigns > 0) {
     if ((size_t(dst - dstinit) % 3 == 0) ||
         ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
+      return {INVALID_BASE64_CHARACTER, equallocation, size_t(dst - dstinit)};
     }
   }
-  return {SUCCESS, size_t(dst - dstinit)};
+  return {SUCCESS, srclen, size_t(dst - dstinit)};
 }
 /* end file src/haswell/avx2_base64.cpp */
 
@@ -31062,6 +31356,16 @@ simdutf_warn_unused result implementation::base64_to_binary(
                                              last_chunk_options);
 }
 
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
 simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
     const char16_t *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
@@ -31077,6 +31381,16 @@ simdutf_warn_unused result implementation::base64_to_binary(
                                              last_chunk_options);
 }
 
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
 simdutf_warn_unused size_t implementation::base64_length_from_binary(
     size_t length, base64_options options) const noexcept {
   return scalar::base64::base64_length_from_binary(length, options);
@@ -34707,6 +35021,49 @@ simdutf_warn_unused result implementation::base64_to_binary(
   return r;
 }
 
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
+  }
+  size_t equallocation =
+      length; // location of the first padding character if any
+  size_t equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
+    }
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
+    }
+  }
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+    }
+    return {SUCCESS, 0, 0};
+  }
+  full_result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // additional checks
+    if ((r.output_count % 3 == 0) ||
+        ((r.output_count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation, r.output_count};
+    }
+  }
+  return r;
+}
+
 simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
     const char16_t *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
@@ -34754,6 +35111,49 @@ simdutf_warn_unused result implementation::base64_to_binary(
   return r;
 }
 
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
+  }
+  size_t equallocation =
+      length; // location of the first padding character if any
+  size_t equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
+    }
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
+    }
+  }
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+    }
+    return {SUCCESS, 0, 0};
+  }
+  full_result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // additional checks
+    if ((r.output_count % 3 == 0) ||
+        ((r.output_count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation, r.output_count};
+    }
+  }
+  return r;
+}
+
 simdutf_warn_unused size_t implementation::base64_length_from_binary(
     size_t length, base64_options options) const noexcept {
   return scalar::base64::base64_length_from_binary(length, options);
@@ -37746,9 +38146,10 @@ static inline void base64_decode_block_safe(char *out, block64 *b) {
 }
 
 template <bool base64_url, typename chartype>
-result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
-                              base64_options options,
-                              last_chunk_handling_options last_chunk_options) {
+full_result
+compress_decode_base64(char *dst, const chartype *src, size_t srclen,
+                       base64_options options,
+                       last_chunk_handling_options last_chunk_options) {
   const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
                                         : tables::base64::to_base64_value;
   size_t equallocation =
@@ -37776,9 +38177,9 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   }
   if (srclen == 0) {
     if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
     }
-    return {SUCCESS, 0};
+    return {SUCCESS, 0, 0};
   }
   char *end_of_safe_64byte_zone =
       (srclen + 3) / 4 * 3 >= 63 ? dst + (srclen + 3) / 4 * 3 - 63 : dst;
@@ -37805,7 +38206,8 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
                to_base64[uint8_t(*src)] <= 64) {
           src++;
         }
-        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
       }
       if (badcharmask != 0) {
         // optimization opportunity: check for simple masks like those made of
@@ -37850,7 +38252,8 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       uint8_t val = to_base64[uint8_t(*src)];
       *bufferptr = char(val);
       if (!scalar::base64::is_eight_byte(*src) || val > 64) {
-        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
       }
       bufferptr += (val <= 63);
       src++;
@@ -37902,20 +38305,22 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
     }
   }
   if (src < srcend + equalsigns) {
-    result r = scalar::base64::base64_tail_decode(
+    full_result r = scalar::base64::base64_tail_decode(
         dst, src, srcend - src, equalsigns, options, last_chunk_options);
-    if (r.error == error_code::INVALID_BASE64_CHARACTER) {
-      r.count += size_t(src - srcinit);
+    if (r.error == error_code::INVALID_BASE64_CHARACTER ||
+        r.error == error_code::BASE64_EXTRA_BITS) {
+      r.input_count += size_t(src - srcinit);
       return r;
     } else {
-      r.count += size_t(dst - dstinit);
+      r.output_count += size_t(dst - dstinit);
     }
     if (last_chunk_options != stop_before_partial &&
         r.error == error_code::SUCCESS && equalsigns > 0) {
       // additional checks
-      if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+      if ((r.output_count % 3 == 0) ||
+          ((r.output_count % 3) + 1 + equalsigns != 4)) {
         r.error = error_code::INVALID_BASE64_CHARACTER;
-        r.count = equallocation;
+        r.input_count = equallocation;
       }
     }
     return r;
@@ -37923,10 +38328,10 @@ result compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   if (equalsigns > 0) {
     if ((size_t(dst - dstinit) % 3 == 0) ||
         ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
+      return {INVALID_BASE64_CHARACTER, equallocation, size_t(dst - dstinit)};
     }
   }
-  return {SUCCESS, size_t(dst - dstinit)};
+  return {SUCCESS, srclen, size_t(dst - dstinit)};
 }
 /* end file src/westmere/sse_base64.cpp */
 
@@ -40715,6 +41120,16 @@ simdutf_warn_unused result implementation::base64_to_binary(
                                              last_chunk_options);
 }
 
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
 simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
     const char16_t *input, size_t length) const noexcept {
   return scalar::base64::maximal_binary_length_from_base64(input, length);
@@ -40730,6 +41145,16 @@ simdutf_warn_unused result implementation::base64_to_binary(
                                              last_chunk_options);
 }
 
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
 simdutf_warn_unused size_t implementation::base64_length_from_binary(
     size_t length, base64_options options) const noexcept {
   return scalar::base64::base64_length_from_binary(length, options);
diff --git a/deps/simdutf/simdutf.h b/deps/simdutf/simdutf.h
index 4ed08d542b0ac3..a219e96ad9cd9f 100644
--- a/deps/simdutf/simdutf.h
+++ b/deps/simdutf/simdutf.h
@@ -1,4 +1,4 @@
-/* auto-generated on 2024-10-11 12:35:29 -0400. Do not edit! */
+/* auto-generated on 2024-11-12 20:00:19 -0500. Do not edit! */
 /* begin file include/simdutf.h */
 #ifndef SIMDUTF_H
 #define SIMDUTF_H
@@ -611,9 +611,12 @@ enum error_code {
              // and a low surrogate must be preceded by a high surrogate
              // (UTF-16) OR there must be no surrogate at all (Latin1)
   INVALID_BASE64_CHARACTER, // Found a character that cannot be part of a valid
-                            // base64 string.
+                            // base64 string. This may include a misplaced
+                            // padding character ('=').
   BASE64_INPUT_REMAINDER,   // The base64 input terminates with a single
                             // character, excluding padding (=).
+  BASE64_EXTRA_BITS,        // The base64 input terminates with non-zero
+                            // padding bits.
   OUTPUT_BUFFER_TOO_SMALL,  // The provided buffer is too small.
   OTHER                     // Not related to validation/transcoding.
 };
@@ -626,8 +629,30 @@ struct result {
 
   simdutf_really_inline result() : error{error_code::SUCCESS}, count{0} {}
 
-  simdutf_really_inline result(error_code _err, size_t _pos)
-      : error{_err}, count{_pos} {}
+  simdutf_really_inline result(error_code err, size_t pos)
+      : error{err}, count{pos} {}
+};
+
+struct full_result {
+  error_code error;
+  size_t input_count;
+  size_t output_count;
+
+  simdutf_really_inline full_result()
+      : error{error_code::SUCCESS}, input_count{0}, output_count{0} {}
+
+  simdutf_really_inline full_result(error_code err, size_t pos_in,
+                                    size_t pos_out)
+      : error{err}, input_count{pos_in}, output_count{pos_out} {}
+
+  simdutf_really_inline operator result() const noexcept {
+    if (error == error_code::SUCCESS ||
+        error == error_code::BASE64_INPUT_REMAINDER) {
+      return result{error, output_count};
+    } else {
+      return result{error, input_count};
+    }
+  }
 };
 
 } // namespace simdutf
@@ -645,7 +670,7 @@ SIMDUTF_DISABLE_UNDESIRED_WARNINGS
 #define SIMDUTF_SIMDUTF_VERSION_H
 
 /** The version of simdutf being used (major.minor.revision) */
-#define SIMDUTF_VERSION "5.6.0"
+#define SIMDUTF_VERSION "5.6.1"
 
 namespace simdutf {
 enum {
@@ -660,7 +685,7 @@ enum {
   /**
    * The revision (major.minor.REVISION) of simdutf being used.
    */
-  SIMDUTF_VERSION_REVISION = 0
+  SIMDUTF_VERSION_REVISION = 1
 };
 } // namespace simdutf
 
@@ -1034,7 +1059,7 @@ simdutf_warn_unused bool validate_utf8(const char *buf, size_t len) noexcept;
  *
  * @param buf the UTF-8 string to validate.
  * @param len the length of the string in bytes.
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of code units validated if
  * successful.
@@ -1061,7 +1086,7 @@ simdutf_warn_unused bool validate_ascii(const char *buf, size_t len) noexcept;
  *
  * @param buf the ASCII string to validate.
  * @param len the length of the string in bytes.
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of code units validated if
  * successful.
@@ -1132,7 +1157,7 @@ simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
  * @param buf the UTF-16 string to validate.
  * @param len the length of the string in number of 2-byte code units
  * (char16_t).
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of code units validated if
  * successful.
@@ -1151,7 +1176,7 @@ simdutf_warn_unused result validate_utf16_with_errors(const char16_t *buf,
  * @param buf the UTF-16LE string to validate.
  * @param len the length of the string in number of 2-byte code units
  * (char16_t).
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of code units validated if
  * successful.
@@ -1170,7 +1195,7 @@ simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf,
  * @param buf the UTF-16BE string to validate.
  * @param len the length of the string in number of 2-byte code units
  * (char16_t).
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of code units validated if
  * successful.
@@ -1206,7 +1231,7 @@ simdutf_warn_unused bool validate_utf32(const char32_t *buf,
  * @param buf the UTF-32 string to validate.
  * @param len the length of the string in number of 4-byte code units
  * (char32_t).
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of code units validated if
  * successful.
@@ -1366,7 +1391,7 @@ simdutf_warn_unused size_t convert_utf8_to_utf16be(
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param latin1_output  the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of code units validated if
  * successful.
@@ -1384,7 +1409,7 @@ simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char16_t written if
  * successful.
@@ -1401,7 +1426,7 @@ simdutf_warn_unused result convert_utf8_to_utf16_with_errors(
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char16_t written if
  * successful.
@@ -1418,7 +1443,7 @@ simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param utf16_buffer  the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char16_t written if
  * successful.
@@ -1450,7 +1475,7 @@ simdutf_warn_unused size_t convert_utf8_to_utf32(
  * @param input         the UTF-8 string to convert
  * @param length        the length of the string in bytes
  * @param utf32_buffer  the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char32_t written if
  * successful.
@@ -1715,7 +1740,7 @@ simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t *input,
  * @param input         the UTF-16 string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char written if
  * successful.
@@ -1733,7 +1758,7 @@ simdutf_warn_unused result convert_utf16_to_latin1_with_errors(
  * @param input         the UTF-16LE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char written if
  * successful.
@@ -1753,7 +1778,7 @@ simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
  * @param input         the UTF-16BE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char written if
  * successful.
@@ -1773,7 +1798,7 @@ simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
  * @param input         the UTF-16 string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf8_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char written if
  * successful.
@@ -1792,7 +1817,7 @@ simdutf_warn_unused result convert_utf16_to_utf8_with_errors(
  * @param input         the UTF-16LE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf8_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char written if
  * successful.
@@ -1811,7 +1836,7 @@ simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
  * @param input         the UTF-16BE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf8_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char written if
  * successful.
@@ -1998,7 +2023,7 @@ simdutf_warn_unused size_t convert_utf16be_to_utf32(
  * @param input         the UTF-16 string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf32_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char32_t written if
  * successful.
@@ -2017,7 +2042,7 @@ simdutf_warn_unused result convert_utf16_to_utf32_with_errors(
  * @param input         the UTF-16LE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf32_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char32_t written if
  * successful.
@@ -2036,7 +2061,7 @@ simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
  * @param input         the UTF-16BE string to convert
  * @param length        the length of the string in 2-byte code units (char16_t)
  * @param utf32_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char32_t written if
  * successful.
@@ -2177,7 +2202,7 @@ simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t *input,
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
  * @param utf8_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char written if
  * successful.
@@ -2263,7 +2288,7 @@ simdutf_warn_unused size_t convert_utf32_to_latin1(
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
  * @param latin1_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char written if
  * successful.
@@ -2322,7 +2347,7 @@ simdutf_warn_unused size_t convert_utf32_to_utf16be(
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
  * @param utf16_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char16_t written if
  * successful.
@@ -2341,7 +2366,7 @@ simdutf_warn_unused result convert_utf32_to_utf16_with_errors(
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
  * @param utf16_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char16_t written if
  * successful.
@@ -2360,7 +2385,7 @@ simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
  * @param input         the UTF-32 string to convert
  * @param length        the length of the string in 4-byte code units (char32_t)
  * @param utf16_buffer   the pointer to buffer that can hold conversion result
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in code units) if any, or the number of char16_t written if
  * successful.
@@ -2648,8 +2673,7 @@ simdutf_warn_unused size_t trim_partial_utf16(const char16_t *input,
                                               size_t length);
 
 // base64_options are used to specify the base64 encoding options.
-using base64_options = uint64_t;
-enum : base64_options {
+enum base64_options : uint64_t {
   base64_default = 0,         /* standard base64 format (with padding) */
   base64_url = 1,             /* base64url format (no padding) */
   base64_reverse_padding = 2, /* modifier for base64_default and base64_url */
@@ -2664,9 +2688,9 @@ enum : base64_options {
 // chunk in base64 decoding.
 // https://tc39.es/proposal-arraybuffer-base64/spec/#sec-frombase64
 enum last_chunk_handling_options : uint64_t {
-  loose = 0, /* standard base64 format, decode partial final chunk */
-  strict =
-      1, /* error when the last chunk is partial, 2 or 3 chars, and unpadded */
+  loose = 0,  /* standard base64 format, decode partial final chunk */
+  strict = 1, /* error when the last chunk is partial, 2 or 3 chars, and
+                 unpadded, or non-zero bit padding */
   stop_before_partial =
       2, /* if the last chunk is partial (2 or 3 chars), ignore it (no error) */
 };
@@ -2706,11 +2730,11 @@ simdutf_warn_unused size_t maximal_binary_length_from_base64(
  *
  * See https://infra.spec.whatwg.org/#forgiving-base64-decode
  *
- * This function will fail in case of invalid input. There are two possible
- * reasons for failure: the input contains a number of base64 characters that
- * when divided by 4, leaves a single remainder character
- * (BASE64_INPUT_REMAINDER), or the input contains a character that is not a
- * valid base64 character (INVALID_BASE64_CHARACTER).
+ * This function will fail in case of invalid input. When last_chunk_options =
+ * loose, there are two possible reasons for failure: the input contains a
+ * number of base64 characters that when divided by 4, leaves a single remainder
+ * character (BASE64_INPUT_REMAINDER), or the input contains a character that is
+ * not a valid base64 character (INVALID_BASE64_CHARACTER).
  *
  * When the error is INVALID_BASE64_CHARACTER, r.count contains the index in the
  * input where the invalid character was found. When the error is
@@ -2746,7 +2770,7 @@ simdutf_warn_unused size_t maximal_binary_length_from_base64(
  * last_chunk_handling_options::loose by default
  * but can also be last_chunk_handling_options::strict or
  * last_chunk_handling_options::stop_before_partial.
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and either position of the error
  * (in the input in bytes) if any, or the number of bytes written if successful.
  */
@@ -2798,11 +2822,11 @@ size_t binary_to_base64(const char *input, size_t length, char *output,
  *
  * See https://infra.spec.whatwg.org/#forgiving-base64-decode
  *
- * This function will fail in case of invalid input. There are two possible
- * reasons for failure: the input contains a number of base64 characters that
- * when divided by 4, leaves a single remainder character
- * (BASE64_INPUT_REMAINDER), or the input contains a character that is not a
- * valid base64 character (INVALID_BASE64_CHARACTER).
+ * This function will fail in case of invalid input. When last_chunk_options =
+ * loose, there are two possible reasons for failure: the input contains a
+ * number of base64 characters that when divided by 4, leaves a single remainder
+ * character (BASE64_INPUT_REMAINDER), or the input contains a character that is
+ * not a valid base64 character (INVALID_BASE64_CHARACTER).
  *
  * When the error is INVALID_BASE64_CHARACTER, r.count contains the index in the
  * input where the invalid character was found. When the error is
@@ -2839,7 +2863,7 @@ size_t binary_to_base64(const char *input, size_t length, char *output,
  * last_chunk_handling_options::loose by default
  * but can also be last_chunk_handling_options::strict or
  * last_chunk_handling_options::stop_before_partial.
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and position of the
  * INVALID_BASE64_CHARACTER error (in the input in units) if any, or the number
  * of bytes written if successful.
@@ -2860,12 +2884,12 @@ base64_to_binary(const char16_t *input, size_t length, char *output,
  *
  * See https://infra.spec.whatwg.org/#forgiving-base64-decode
  *
- * This function will fail in case of invalid input. There are three possible
- * reasons for failure: the input contains a number of base64 characters that
- * when divided by 4, leaves a single remainder character
- * (BASE64_INPUT_REMAINDER), the input contains a character that is not a valid
- * base64 character (INVALID_BASE64_CHARACTER), or the output buffer is too
- * small (OUTPUT_BUFFER_TOO_SMALL).
+ * This function will fail in case of invalid input. When last_chunk_options =
+ * loose, there are three possible reasons for failure: the input contains a
+ * number of base64 characters that when divided by 4, leaves a single remainder
+ * character (BASE64_INPUT_REMAINDER), the input contains a character that is
+ * not a valid base64 character (INVALID_BASE64_CHARACTER), or the output buffer
+ * is too small (OUTPUT_BUFFER_TOO_SMALL).
  *
  * When OUTPUT_BUFFER_TOO_SMALL, we return both the number of bytes written
  * and the number of units processed, see description of the parameters and
@@ -2906,7 +2930,7 @@ base64_to_binary(const char16_t *input, size_t length, char *output,
  * last_chunk_handling_options::loose by default
  * but can also be last_chunk_handling_options::strict or
  * last_chunk_handling_options::stop_before_partial.
- * @return a result pair struct (of type simdutf::error containing the two
+ * @return a result pair struct (of type simdutf::result containing the two
  * fields error and count) with an error code and position of the
  * INVALID_BASE64_CHARACTER error (in the input in units) if any, or the number
  * of units processed if successful.
@@ -3012,7 +3036,7 @@ class implementation {
    *
    * @param buf the UTF-8 string to validate.
    * @param len the length of the string in bytes.
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of code units validated
    * if successful.
@@ -3039,7 +3063,7 @@ class implementation {
    *
    * @param buf the ASCII string to validate.
    * @param len the length of the string in bytes.
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of code units validated
    * if successful.
@@ -3092,7 +3116,7 @@ class implementation {
    * @param buf the UTF-16LE string to validate.
    * @param len the length of the string in number of 2-byte code units
    * (char16_t).
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of code units validated
    * if successful.
@@ -3112,7 +3136,7 @@ class implementation {
    * @param buf the UTF-16BE string to validate.
    * @param len the length of the string in number of 2-byte code units
    * (char16_t).
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of code units validated
    * if successful.
@@ -3146,7 +3170,7 @@ class implementation {
    * @param buf the UTF-32 string to validate.
    * @param len the length of the string in number of 4-byte code units
    * (char32_t).
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of code units validated
    * if successful.
@@ -3238,7 +3262,7 @@ class implementation {
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in bytes
    * @param latin1_output  the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of code units validated
    * if successful.
@@ -3312,7 +3336,7 @@ class implementation {
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in bytes
    * @param utf16_buffer  the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of code units validated
    * if successful.
@@ -3331,7 +3355,7 @@ class implementation {
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in bytes
    * @param utf16_buffer  the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of code units validated
    * if successful.
@@ -3365,7 +3389,7 @@ class implementation {
    * @param input         the UTF-8 string to convert
    * @param length        the length of the string in bytes
    * @param utf32_buffer  the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of char32_t written if
    * successful.
@@ -3502,7 +3526,7 @@ class implementation {
    * (char16_t)
    * @param latin1_buffer   the pointer to buffer that can hold conversion
    * result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of char written if
    * successful.
@@ -3525,7 +3549,7 @@ class implementation {
    * (char16_t)
    * @param latin1_buffer   the pointer to buffer that can hold conversion
    * result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of char written if
    * successful.
@@ -3633,7 +3657,7 @@ class implementation {
    * @param length        the length of the string in 2-byte code units
    * (char16_t)
    * @param utf8_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of char written if
    * successful.
@@ -3655,7 +3679,7 @@ class implementation {
    * @param length        the length of the string in 2-byte code units
    * (char16_t)
    * @param utf8_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of char written if
    * successful.
@@ -3751,7 +3775,7 @@ class implementation {
    * @param length        the length of the string in 2-byte code units
    * (char16_t)
    * @param utf32_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of char32_t written if
    * successful.
@@ -3773,7 +3797,7 @@ class implementation {
    * @param length        the length of the string in 2-byte code units
    * (char16_t)
    * @param utf32_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of char32_t written if
    * successful.
@@ -3889,7 +3913,7 @@ class implementation {
    * (char32_t)
    * @param latin1_buffer   the pointer to buffer that can hold conversion
    * result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of char written if
    * successful.
@@ -3953,7 +3977,7 @@ class implementation {
    * @param length        the length of the string in 4-byte code units
    * (char32_t)
    * @param utf8_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of char written if
    * successful.
@@ -4044,7 +4068,7 @@ class implementation {
    * @param length        the length of the string in 4-byte code units
    * (char32_t)
    * @param utf16_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of char16_t written if
    * successful.
@@ -4066,7 +4090,7 @@ class implementation {
    * @param length        the length of the string in 4-byte code units
    * (char32_t)
    * @param utf16_buffer   the pointer to buffer that can hold conversion result
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in code units) if any, or the number of char16_t written if
    * successful.
@@ -4361,11 +4385,11 @@ class implementation {
    *
    * See https://infra.spec.whatwg.org/#forgiving-base64-decode
    *
-   * This function will fail in case of invalid input. There are two possible
-   * reasons for failure: the input contains a number of base64 characters that
-   * when divided by 4, leaves a single remainder character
-   * (BASE64_INPUT_REMAINDER), or the input contains a character that is not a
-   * valid base64 character (INVALID_BASE64_CHARACTER).
+   * This function will fail in case of invalid input. When last_chunk_options =
+   * loose, there are two possible reasons for failure: the input contains a
+   * number of base64 characters that when divided by 4, leaves a single
+   * remainder character (BASE64_INPUT_REMAINDER), or the input contains a
+   * character that is not a valid base64 character (INVALID_BASE64_CHARACTER).
    *
    * You should call this function with a buffer that is at least
    * maximal_binary_length_from_base64(input, length) bytes long. If you fail to
@@ -4378,7 +4402,7 @@ class implementation {
    * bytes long).
    * @param options       the base64 options to use, can be base64_default or
    * base64_url, is base64_default by default.
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and either position of the error
    * (in the input in bytes) if any, or the number of bytes written if
    * successful.
@@ -4389,6 +4413,42 @@ class implementation {
                    last_chunk_handling_options last_chunk_options =
                        last_chunk_handling_options::loose) const noexcept = 0;
 
+  /**
+   * Convert a base64 input to a binary output while returning more details
+   * than base64_to_binary.
+   *
+   * This function follows the WHATWG forgiving-base64 format, which means that
+   * it will ignore any ASCII spaces in the input. You may provide a padded
+   * input (with one or two equal signs at the end) or an unpadded input
+   * (without any equal signs at the end).
+   *
+   * See https://infra.spec.whatwg.org/#forgiving-base64-decode
+   *
+   * This function will fail in case of invalid input. When last_chunk_options =
+   * loose, there are two possible reasons for failure: the input contains a
+   * number of base64 characters that when divided by 4, leaves a single
+   * remainder character (BASE64_INPUT_REMAINDER), or the input contains a
+   * character that is not a valid base64 character (INVALID_BASE64_CHARACTER).
+   *
+   * You should call this function with a buffer that is at least
+   * maximal_binary_length_from_base64(input, length) bytes long. If you fail to
+   * provide that much space, the function may cause a buffer overflow.
+   *
+   * @param input         the base64 string to process
+   * @param length        the length of the string in bytes
+   * @param output        the pointer to buffer that can hold the conversion
+   * result (should be at least maximal_binary_length_from_base64(input, length)
+   * bytes long).
+   * @param options       the base64 options to use, can be base64_default or
+   * base64_url, is base64_default by default.
+   * @return a full_result struct (of type simdutf::full_result containing the
+   * three fields error, input_count and output_count).
+   */
+  simdutf_warn_unused virtual full_result base64_to_binary_details(
+      const char *input, size_t length, char *output,
+      base64_options options = base64_default,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept = 0;
   /**
    * Convert a base64 input to a binary output.
    *
@@ -4399,11 +4459,11 @@ class implementation {
    *
    * See https://infra.spec.whatwg.org/#forgiving-base64-decode
    *
-   * This function will fail in case of invalid input. There are two possible
-   * reasons for failure: the input contains a number of base64 characters that
-   * when divided by 4, leaves a single remainder character
-   * (BASE64_INPUT_REMAINDER), or the input contains a character that is not a
-   * valid base64 character (INVALID_BASE64_CHARACTER).
+   * This function will fail in case of invalid input. When last_chunk_options =
+   * loose, there are two possible reasons for failure: the input contains a
+   * number of base64 characters that when divided by 4, leaves a single
+   * remainder character (BASE64_INPUT_REMAINDER), or the input contains a
+   * character that is not a valid base64 character (INVALID_BASE64_CHARACTER).
    *
    * You should call this function with a buffer that is at least
    * maximal_binary_length_from_utf6_base64(input, length) bytes long. If you
@@ -4417,7 +4477,7 @@ class implementation {
    * bytes long).
    * @param options       the base64 options to use, can be base64_default or
    * base64_url, is base64_default by default.
-   * @return a result pair struct (of type simdutf::error containing the two
+   * @return a result pair struct (of type simdutf::result containing the two
    * fields error and count) with an error code and position of the
    * INVALID_BASE64_CHARACTER error (in the input in units) if any, or the
    * number of bytes written if successful.
@@ -4428,6 +4488,42 @@ class implementation {
                    last_chunk_handling_options last_chunk_options =
                        last_chunk_handling_options::loose) const noexcept = 0;
 
+  /**
+   * Convert a base64 input to a binary output while returning more details
+   * than base64_to_binary.
+   *
+   * This function follows the WHATWG forgiving-base64 format, which means that
+   * it will ignore any ASCII spaces in the input. You may provide a padded
+   * input (with one or two equal signs at the end) or an unpadded input
+   * (without any equal signs at the end).
+   *
+   * See https://infra.spec.whatwg.org/#forgiving-base64-decode
+   *
+   * This function will fail in case of invalid input. When last_chunk_options =
+   * loose, there are two possible reasons for failure: the input contains a
+   * number of base64 characters that when divided by 4, leaves a single
+   * remainder character (BASE64_INPUT_REMAINDER), or the input contains a
+   * character that is not a valid base64 character (INVALID_BASE64_CHARACTER).
+   *
+   * You should call this function with a buffer that is at least
+   * maximal_binary_length_from_base64(input, length) bytes long. If you fail to
+   * provide that much space, the function may cause a buffer overflow.
+   *
+   * @param input         the base64 string to process
+   * @param length        the length of the string in 16-bit units
+   * @param output        the pointer to buffer that can hold the conversion
+   * result (should be at least maximal_binary_length_from_base64(input, length)
+   * bytes long).
+   * @param options       the base64 options to use, can be base64_default or
+   * base64_url, is base64_default by default.
+   * @return a full_result struct (of type simdutf::full_result containing the
+   * three fields error, input_count and output_count).
+   */
+  simdutf_warn_unused virtual full_result base64_to_binary_details(
+      const char16_t *input, size_t length, char *output,
+      base64_options options = base64_default,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept = 0;
   /**
    * Provide the base64 length in bytes given the length of a binary input.
    *

From d9fd632f566b5d32ad860468f976d220207d6750 Mon Sep 17 00:00:00 2001
From: Aviv Keller <redyetidev@gmail.com>
Date: Sun, 17 Nov 2024 05:25:12 -0500
Subject: [PATCH 112/216] test_runner: error on mocking an already mocked date

Fixes: https://github.com/nodejs/node/issues/55849

PR-URL: https://github.com/nodejs/node/pull/55858
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Chemi Atlow <chemi@atlow.co.il>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
---
 lib/internal/test_runner/mock/mock_timers.js  | 3 +++
 test/parallel/test-runner-mock-timers-date.js | 7 +++++++
 2 files changed, 10 insertions(+)
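
Note (illustration only, not part of the change): the guard added to
`#createDate()` can be pictured with the simplified sketch below.
`createMockDate` and its parameters are hypothetical names, and the error is
built by hand rather than with Node's internal `ERR_INVALID_STATE` class. The
sketch assumes the mocked constructor carries an `isMock` flag, which the
check in the patch implies.

```javascript
'use strict';

// Hypothetical stand-in for the guard in #createDate(); not Node's code.
function createMockDate(NativeDate, fixedNow) {
  // Refuse to wrap a constructor that is already a mock.
  if (NativeDate.isMock) {
    const err = new Error('Date is already being mocked!');
    err.code = 'ERR_INVALID_STATE';
    throw err;
  }
  // Simplified mock: no arguments means "the fixed time", anything else
  // is forwarded to the wrapped constructor.
  function MockDate(...args) {
    return args.length === 0 ? new NativeDate(fixedNow) : new NativeDate(...args);
  }
  MockDate.now = () => fixedNow;
  MockDate.isMock = true;
  return MockDate;
}

const Mocked = createMockDate(Date, 1000);
```

Passing `Mocked` back into `createMockDate` throws, which mirrors what the new
test expects from a nested `t2.mock.timers.enable()` call.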

diff --git a/lib/internal/test_runner/mock/mock_timers.js b/lib/internal/test_runner/mock/mock_timers.js
index 059d3fc96ac86c..925071044ee868 100644
--- a/lib/internal/test_runner/mock/mock_timers.js
+++ b/lib/internal/test_runner/mock/mock_timers.js
@@ -331,6 +331,9 @@ class MockTimers {
   #createDate() {
     kMock ??= Symbol('MockTimers');
     const NativeDateConstructor = this.#nativeDateDescriptor.value;
+    if (NativeDateConstructor.isMock) {
+      throw new ERR_INVALID_STATE('Date is already being mocked!');
+    }
     /**
      * Function to mock the Date constructor, treats cases as per ECMA-262
      * and returns a Date object with a mocked implementation
diff --git a/test/parallel/test-runner-mock-timers-date.js b/test/parallel/test-runner-mock-timers-date.js
index ebd1e430be803f..7cee835eccaba3 100644
--- a/test/parallel/test-runner-mock-timers-date.js
+++ b/test/parallel/test-runner-mock-timers-date.js
@@ -117,4 +117,11 @@ describe('Mock Timers Date Test Suite', () => {
     assert.strictEqual(fn.mock.callCount(), 0);
     clearTimeout(id);
   });
+
+  it((t) => {
+    t.mock.timers.enable();
+    t.test('should throw when an already-mocked Date is mocked', (t2) => {
+      assert.throws(() => t2.mock.timers.enable(), { code: 'ERR_INVALID_STATE' });
+    });
+  });
 });

From bd0ec907daa3964629236fb709e7c5bb53cf6eeb Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Fri, 18 Oct 2024 09:20:43 +0200
Subject: [PATCH 113/216] url: handle "unsafe" characters properly in
 `pathToFileURL`

Co-authored-by: EarlyRiser42 <tkfydtls464@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/54545
Fixes: https://github.com/nodejs/node/issues/54515
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
---
 lib/internal/process/execution.js       |  2 +-
 lib/internal/url.js                     | 94 +++++++++++++++----------
 test/parallel/test-url-pathtofileurl.js | 13 ++--
 3 files changed, 67 insertions(+), 42 deletions(-)
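
Note (illustration only, not part of the change): the character set handled
by `encodePathChars` can be summarised with the single-pass sketch below.
`encodeUnsafePathChars` and `UNSAFE` are hypothetical names. The patch itself
uses primordials and one regular expression per character, so this is a
compact model of the mapping rather than the implementation. The characters
are the ones for which the patch declares a regular expression.

```javascript
'use strict';

// Percent-encodings for the characters the patch treats as unsafe.
const UNSAFE = new Map([
  ['%', '%25'], ['\t', '%09'], ['\n', '%0A'], ['\r', '%0D'], [' ', '%20'],
  ['"', '%22'], ['#', '%23'], ['?', '%3F'], ['[', '%5B'], ['\\', '%5C'],
  [']', '%5D'], ['^', '%5E'], ['|', '%7C'], ['~', '%7E'],
]);

// Hypothetical helper: encode every unsafe character in one pass.
// With `windows: true` the backslash is left alone, because it is a
// path separator there and is handled separately.
function encodeUnsafePathChars(filepath, { windows = false } = {}) {
  return filepath.replace(/[%\t\n\r "#?[\\\]^|~]/g, (ch) =>
    (windows && ch === '\\' ? ch : UNSAFE.get(ch)));
}
```

Because the replacement happens in one pass, the `%` produced by an earlier
substitution is never encoded again. The sequential version in the patch gets
the same guarantee by handling `%` first.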

diff --git a/lib/internal/process/execution.js b/lib/internal/process/execution.js
index e69add7394e60f..4f72aa41251836 100644
--- a/lib/internal/process/execution.js
+++ b/lib/internal/process/execution.js
@@ -49,7 +49,7 @@ function tryGetCwd() {
 
 let evalIndex = 0;
 function getEvalModuleUrl() {
-  return pathToFileURL(`${process.cwd()}/[eval${++evalIndex}]`).href;
+  return `${pathToFileURL(process.cwd())}/[eval${++evalIndex}]`;
 }
 
 /**
diff --git a/lib/internal/url.js b/lib/internal/url.js
index e6ed5466b8807a..c8d70c484ee52e 100644
--- a/lib/internal/url.js
+++ b/lib/internal/url.js
@@ -1490,44 +1490,75 @@ function fileURLToPath(path, options = kEmptyObject) {
   return (windows ?? isWindows) ? getPathFromURLWin32(path) : getPathFromURLPosix(path);
 }
 
-// The following characters are percent-encoded when converting from file path
-// to URL:
-// - %: The percent character is the only character not encoded by the
-//        `pathname` setter.
-// - \: Backslash is encoded on non-windows platforms since it's a valid
-//      character but the `pathname` setters replaces it by a forward slash.
-// - LF: The newline character is stripped out by the `pathname` setter.
-//       (See whatwg/url#419)
-// - CR: The carriage return character is also stripped out by the `pathname`
-//       setter.
-// - TAB: The tab character is also stripped out by the `pathname` setter.
+// RFC1738 defines the following chars as "unsafe" for URLs
+// @see https://www.ietf.org/rfc/rfc1738.txt 2.2. URL Character Encoding Issues
 const percentRegEx = /%/g;
-const backslashRegEx = /\\/g;
 const newlineRegEx = /\n/g;
 const carriageReturnRegEx = /\r/g;
 const tabRegEx = /\t/g;
-const questionRegex = /\?/g;
+const quoteRegEx = /"/g;
 const hashRegex = /#/g;
+const spaceRegEx = / /g;
+const questionMarkRegex = /\?/g;
+const openSquareBracketRegEx = /\[/g;
+const backslashRegEx = /\\/g;
+const closeSquareBracketRegEx = /]/g;
+const caretRegEx = /\^/g;
+const verticalBarRegEx = /\|/g;
+const tildeRegEx = /~/g;
 
 function encodePathChars(filepath, options = kEmptyObject) {
-  const windows = options?.windows;
-  if (StringPrototypeIndexOf(filepath, '%') !== -1)
+  if (StringPrototypeIncludes(filepath, '%')) {
     filepath = RegExpPrototypeSymbolReplace(percentRegEx, filepath, '%25');
-  // In posix, backslash is a valid character in paths:
-  if (!(windows ?? isWindows) && StringPrototypeIndexOf(filepath, '\\') !== -1)
-    filepath = RegExpPrototypeSymbolReplace(backslashRegEx, filepath, '%5C');
-  if (StringPrototypeIndexOf(filepath, '\n') !== -1)
+  }
+
+  if (StringPrototypeIncludes(filepath, '\t')) {
+    filepath = RegExpPrototypeSymbolReplace(tabRegEx, filepath, '%09');
+  }
+  if (StringPrototypeIncludes(filepath, '\n')) {
     filepath = RegExpPrototypeSymbolReplace(newlineRegEx, filepath, '%0A');
-  if (StringPrototypeIndexOf(filepath, '\r') !== -1)
+  }
+  if (StringPrototypeIncludes(filepath, '\r')) {
     filepath = RegExpPrototypeSymbolReplace(carriageReturnRegEx, filepath, '%0D');
-  if (StringPrototypeIndexOf(filepath, '\t') !== -1)
-    filepath = RegExpPrototypeSymbolReplace(tabRegEx, filepath, '%09');
+  }
+  if (StringPrototypeIncludes(filepath, ' ')) {
+    filepath = RegExpPrototypeSymbolReplace(spaceRegEx, filepath, '%20');
+  }
+  if (StringPrototypeIncludes(filepath, '"')) {
+    filepath = RegExpPrototypeSymbolReplace(quoteRegEx, filepath, '%22');
+  }
+  if (StringPrototypeIncludes(filepath, '#')) {
+    filepath = RegExpPrototypeSymbolReplace(hashRegex, filepath, '%23');
+  }
+  if (StringPrototypeIncludes(filepath, '?')) {
+    filepath = RegExpPrototypeSymbolReplace(questionMarkRegex, filepath, '%3F');
+  }
+  if (StringPrototypeIncludes(filepath, '[')) {
+    filepath = RegExpPrototypeSymbolReplace(openSquareBracketRegEx, filepath, '%5B');
+  }
+  // Backslashes must be special-cased on Windows, where they are treated as a path separator.
+  if (!options.windows && StringPrototypeIncludes(filepath, '\\')) {
+    filepath = RegExpPrototypeSymbolReplace(backslashRegEx, filepath, '%5C');
+  }
+  if (StringPrototypeIncludes(filepath, ']')) {
+    filepath = RegExpPrototypeSymbolReplace(closeSquareBracketRegEx, filepath, '%5D');
+  }
+  if (StringPrototypeIncludes(filepath, '^')) {
+    filepath = RegExpPrototypeSymbolReplace(caretRegEx, filepath, '%5E');
+  }
+  if (StringPrototypeIncludes(filepath, '|')) {
+    filepath = RegExpPrototypeSymbolReplace(verticalBarRegEx, filepath, '%7C');
+  }
+  if (StringPrototypeIncludes(filepath, '~')) {
+    filepath = RegExpPrototypeSymbolReplace(tildeRegEx, filepath, '%7E');
+  }
+
   return filepath;
 }
 
 function pathToFileURL(filepath, options = kEmptyObject) {
-  const windows = options?.windows;
-  if ((windows ?? isWindows) && StringPrototypeStartsWith(filepath, '\\\\')) {
+  const windows = options?.windows ?? isWindows;
+  if (windows && StringPrototypeStartsWith(filepath, '\\\\')) {
     const outURL = new URL('file://');
     // UNC path format: \\server\share\resource
     // Handle extended UNC path and standard UNC path
@@ -1558,7 +1589,7 @@ function pathToFileURL(filepath, options = kEmptyObject) {
     );
     return outURL;
   }
-  let resolved = (windows ?? isWindows) ? path.win32.resolve(filepath) : path.posix.resolve(filepath);
+  let resolved = windows ? path.win32.resolve(filepath) : path.posix.resolve(filepath);
   // path.resolve strips trailing slashes so we must add them back
   const filePathLast = StringPrototypeCharCodeAt(filepath,
                                                  filepath.length - 1);
@@ -1567,18 +1598,7 @@ function pathToFileURL(filepath, options = kEmptyObject) {
       resolved[resolved.length - 1] !== path.sep)
     resolved += '/';
 
-  // Call encodePathChars first to avoid encoding % again for ? and #.
-  resolved = encodePathChars(resolved, { windows });
-
-  // Question and hash character should be included in pathname.
-  // Therefore, encoding is required to eliminate parsing them in different states.
-  // This is done as an optimization to not creating a URL instance and
-  // later triggering pathname setter, which impacts performance
-  if (StringPrototypeIndexOf(resolved, '?') !== -1)
-    resolved = RegExpPrototypeSymbolReplace(questionRegex, resolved, '%3F');
-  if (StringPrototypeIndexOf(resolved, '#') !== -1)
-    resolved = RegExpPrototypeSymbolReplace(hashRegex, resolved, '%23');
-  return new URL(`file://${resolved}`);
+  return new URL(`file://${encodePathChars(resolved, { windows })}`);
 }
 
 function toPathIfFileURL(fileURLOrPath) {
diff --git a/test/parallel/test-url-pathtofileurl.js b/test/parallel/test-url-pathtofileurl.js
index 20609eb0ff5c9f..9c506e353f49e5 100644
--- a/test/parallel/test-url-pathtofileurl.js
+++ b/test/parallel/test-url-pathtofileurl.js
@@ -13,10 +13,7 @@ const url = require('url');
 {
   const fileURL = url.pathToFileURL('test\\').href;
   assert.ok(fileURL.startsWith('file:///'));
-  if (isWindows)
-    assert.ok(fileURL.endsWith('/'));
-  else
-    assert.ok(fileURL.endsWith('%5C'));
+  assert.match(fileURL, isWindows ? /\/$/ : /%5C$/);
 }
 
 {
@@ -104,6 +101,12 @@ const windowsTestCases = [
   { path: 'C:\\€', expected: 'file:///C:/%E2%82%AC' },
   // Rocket emoji (non-BMP code point)
   { path: 'C:\\🚀', expected: 'file:///C:/%F0%9F%9A%80' },
+  // caret
+  { path: 'C:\\foo^bar', expected: 'file:///C:/foo%5Ebar' },
+  // left bracket
+  { path: 'C:\\foo[bar', expected: 'file:///C:/foo%5Bbar' },
+  // right bracket
+  { path: 'C:\\foo]bar', expected: 'file:///C:/foo%5Dbar' },
   // Local extended path
   { path: '\\\\?\\C:\\path\\to\\file.txt', expected: 'file:///C:/path/to/file.txt' },
   // UNC path (see https://docs.microsoft.com/en-us/archive/blogs/ie/file-uris-in-windows)
@@ -154,6 +157,8 @@ const posixTestCases = [
   { path: '/€', expected: 'file:///%E2%82%AC' },
   // Rocket emoji (non-BMP code point)
   { path: '/🚀', expected: 'file:///%F0%9F%9A%80' },
+  // "unsafe" chars
+  { path: '/foo\r\n\t<>"#%{}|^[\\~]`?bar', expected: 'file:///foo%0D%0A%09%3C%3E%22%23%25%7B%7D%7C%5E%5B%5C%7E%5D%60%3Fbar' },
 ];
 
 for (const { path, expected } of windowsTestCases) {

From 50b6729d8c3fcdc5fe1fcc70661ef148884c49f6 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Sun, 27 Oct 2024 03:21:26 +0100
Subject: [PATCH 114/216] test: increase coverage of `pathToFileURL`

PR-URL: https://github.com/nodejs/node/pull/55493
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
---
 test/parallel/test-url-pathtofileurl.js | 26 +++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
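
Reviewer note: the expected string in the new test case is assembled by
hand from UTF-8 bit arithmetic. The sketch below is one independent way
to spot-check that arithmetic. `percentEncodeCodePoint` is a hypothetical
helper, not part of the test file or of `node:url`. It encodes every code
point unconditionally, whereas `pathToFileURL` leaves many ASCII
characters untouched, and it does not special-case lone surrogates
(the test range stops well below them).

```javascript
'use strict';

// Hypothetical helper for illustration only (not part of this patch).
// Percent-encodes one Unicode code point as UTF-8.
function percentEncodeCodePoint(cp) {
  const hex = (byte) => `%${byte.toString(16).toUpperCase().padStart(2, '0')}`;
  // Continuation byte (10xxxxxx) holding the 6 bits found at `shift`.
  const cont = (shift) => hex(0x80 | ((cp >> shift) & 0x3F));
  if (cp < 0x80) return hex(cp);                           // 1 byte
  if (cp < 0x800) return hex(0xC0 | (cp >> 6)) + cont(0);  // 2 bytes
  if (cp < 0x10000) {
    return hex(0xE0 | (cp >> 12)) + cont(6) + cont(0);     // 3 bytes
  }
  return hex(0xF0 | (cp >> 18)) + cont(12) + cont(6) + cont(0); // 4 bytes
}

// Euro sign, which the existing test cases expect as %E2%82%AC.
console.log(percentEncodeCodePoint(0x20AC));
// The non-BMP code point added by this patch, expected as %F0%9F%8C%83.
console.log(percentEncodeCodePoint(0x1F303));
```

The two-byte and three-byte branches correspond to the two
`Array.from(...)` ranges in the expected template string.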

diff --git a/test/parallel/test-url-pathtofileurl.js b/test/parallel/test-url-pathtofileurl.js
index 9c506e353f49e5..089232caeb3b2d 100644
--- a/test/parallel/test-url-pathtofileurl.js
+++ b/test/parallel/test-url-pathtofileurl.js
@@ -114,6 +114,7 @@ const windowsTestCases = [
   // Extended UNC path
   { path: '\\\\?\\UNC\\server\\share\\folder\\file.txt', expected: 'file://server/share/folder/file.txt' },
 ];
+const alphabet = String.fromCharCode(...Array.from({ length: 26 }, (_, i) => 'a'.charCodeAt() + i));
 const posixTestCases = [
   // Lowercase ascii alpha
   { path: '/foo', expected: 'file:///foo' },
@@ -159,6 +160,31 @@ const posixTestCases = [
   { path: '/🚀', expected: 'file:///%F0%9F%9A%80' },
   // "unsafe" chars
   { path: '/foo\r\n\t<>"#%{}|^[\\~]`?bar', expected: 'file:///foo%0D%0A%09%3C%3E%22%23%25%7B%7D%7C%5E%5B%5C%7E%5D%60%3Fbar' },
+  // All UTF-16 code units from U+0000 to U+7FFE
+  {
+    path: `/${Array.from({ length: 0x7FFF }, (_, i) => String.fromCharCode(i)).join('')}`,
+    expected: `file:///${
+      Array.from({ length: 0x21 }, (_, i) => `%${i.toString(16).toUpperCase().padStart(2, '0')}`).join('')
+    }!%22%23$%25&'()*+,-./0123456789:;%3C=%3E%3F@${
+      alphabet.toUpperCase()
+    }%5B%5C%5D%5E_%60${alphabet}%7B%7C%7D%7E%7F${
+      Array.from({ length: 0x800 - 0x80 }, (_, i) => `%${
+        (Math.floor((i - 0x80) / 0x40) + 0xC4).toString(16).toUpperCase()
+      }%${
+        ((i % 0x40) + 0x80).toString(16).toUpperCase()
+      }`).join('')
+    }${
+      Array.from({ length: 0x7FFF - 0x800 }, (_, i) => i + 0x800).map((i) => `%E${
+        (i >> 12).toString(16).toUpperCase()
+      }%${
+        (((i >> 6) % 0x40) + 0x80).toString(16).toUpperCase()
+      }%${
+        ((i % 0x40) + 0x80).toString(16).toUpperCase()
+      }`).join('')
+    }`
+  },
+  // A code point outside the BMP, represented as a UTF-16 surrogate pair
+  { path: `/${String.fromCodePoint(0x1F303)}`, expected: 'file:///%F0%9F%8C%83' },
 ];
 
 for (const { path, expected } of windowsTestCases) {

From a5b0d8900a112c9fcc0dc789c12a42f6fa668689 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?G=C3=BCrg=C3=BCn=20Day=C4=B1o=C4=9Flu?= <hey@gurgun.day>
Date: Sat, 19 Oct 2024 12:18:10 +0200
Subject: [PATCH 115/216] lib: remove startsWith/endsWith primordials for char
 checks

PR-URL: https://github.com/nodejs/node/pull/55407
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Mattias Buelens <mattias@buelens.com>
Reviewed-By: Ethan Arrowood <ethan@arrowood.dev>
---
 lib/_http_agent.js                       | 2 +-
 lib/internal/main/watch_mode.js          | 2 +-
 lib/internal/modules/esm/fetch_module.js | 7 +++----
 lib/internal/modules/esm/resolve.js      | 5 +++--
 lib/internal/modules/helpers.js          | 2 +-
 lib/internal/process/per_thread.js       | 3 +--
 lib/internal/process/pre_execution.js    | 4 +---
 lib/internal/repl/utils.js               | 5 +----
 lib/internal/url.js                      | 2 +-
 lib/internal/util/inspect.js             | 5 +++--
 lib/repl.js                              | 2 +-
 11 files changed, 17 insertions(+), 22 deletions(-)
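
Reviewer note: the rewrite relies on an out-of-range index yielding
`undefined`, so a single-character comparison stays false on the empty
string. The sketch below checks that equivalence on a few sample
strings. `startsWithChar` and `endsWithChar` are hypothetical names used
only here; the patch inlines the comparisons. The check assumes the
value is already a string.

```javascript
'use strict';

// Hypothetical helpers for illustration; the patch inlines these checks.
const startsWithChar = (str, ch) => str[0] === ch;
const endsWithChar = (str, ch) => str[str.length - 1] === ch;

const samples = ['[::1]', '-flag', '', '{ a: 1 }', ']'];
for (const s of samples) {
  // For a one-character needle, the index check agrees with
  // startsWith()/endsWith(), including on '' where the lookup
  // yields undefined.
  console.log(
    JSON.stringify(s),
    startsWithChar(s, '[') === s.startsWith('['),
    endsWithChar(s, ']') === s.endsWith(']'),
  );
}
```

The equivalence only holds for needles of length one, which is why
multi-character checks such as `'../'` and `'class'` keep the primordial.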

diff --git a/lib/_http_agent.js b/lib/_http_agent.js
index 1f7989b843c073..17f5d100c8e6cd 100644
--- a/lib/_http_agent.js
+++ b/lib/_http_agent.js
@@ -347,7 +347,7 @@ function calculateServerName(options, req) {
     // abc:123 => abc
     // [::1] => ::1
     // [::1]:123 => ::1
-    if (hostHeader.startsWith('[')) {
+    if (hostHeader[0] === '[') {
       const index = hostHeader.indexOf(']');
       if (index === -1) {
         // Leading '[', but no ']'. Need to do something...
diff --git a/lib/internal/main/watch_mode.js b/lib/internal/main/watch_mode.js
index fb07f8a642db3d..97ed6272c25560 100644
--- a/lib/internal/main/watch_mode.js
+++ b/lib/internal/main/watch_mode.js
@@ -45,7 +45,7 @@ for (let i = 0; i < process.execArgv.length; i++) {
   if (StringPrototypeStartsWith(arg, '--watch')) {
     i++;
     const nextArg = process.execArgv[i];
-    if (nextArg && StringPrototypeStartsWith(nextArg, '-')) {
+    if (nextArg && nextArg[0] === '-') {
       ArrayPrototypePush(argsWithoutWatchOptions, nextArg);
     }
     continue;
diff --git a/lib/internal/modules/esm/fetch_module.js b/lib/internal/modules/esm/fetch_module.js
index b6fd294eb2b624..93cde80ab9cc8e 100644
--- a/lib/internal/modules/esm/fetch_module.js
+++ b/lib/internal/modules/esm/fetch_module.js
@@ -3,9 +3,7 @@ const {
   ObjectPrototypeHasOwnProperty,
   PromisePrototypeThen,
   SafeMap,
-  StringPrototypeEndsWith,
   StringPrototypeSlice,
-  StringPrototypeStartsWith,
 } = primordials;
 const {
   Buffer: { concat: BufferConcat },
@@ -248,8 +246,9 @@ allowList.addRange('127.0.0.1', '127.255.255.255');
 async function isLocalAddress(hostname) {
   try {
     if (
-      StringPrototypeStartsWith(hostname, '[') &&
-      StringPrototypeEndsWith(hostname, ']')
+      hostname.length &&
+      hostname[0] === '[' &&
+      hostname[hostname.length - 1] === ']'
     ) {
       hostname = StringPrototypeSlice(hostname, 1, -1);
     }
diff --git a/lib/internal/modules/esm/resolve.js b/lib/internal/modules/esm/resolve.js
index c37d769f40088f..93c7a040fd47f0 100644
--- a/lib/internal/modules/esm/resolve.js
+++ b/lib/internal/modules/esm/resolve.js
@@ -399,8 +399,9 @@ function resolvePackageTargetString(
   }
 
   if (!StringPrototypeStartsWith(target, './')) {
-    if (internal && !StringPrototypeStartsWith(target, '../') &&
-        !StringPrototypeStartsWith(target, '/')) {
+    if (internal &&
+        target[0] !== '/' &&
+        !StringPrototypeStartsWith(target, '../')) {
       // No need to convert target to string, since it's already presumed to be
       if (!URLCanParse(target)) {
         const exportTarget = pattern ?
diff --git a/lib/internal/modules/helpers.js b/lib/internal/modules/helpers.js
index eb5c552c500164..7cdde181e97a10 100644
--- a/lib/internal/modules/helpers.js
+++ b/lib/internal/modules/helpers.js
@@ -244,7 +244,7 @@ function addBuiltinLibsToObject(object, dummyModuleName) {
   ArrayPrototypeForEach(builtinModules, (name) => {
     // Neither add underscored modules, nor ones that contain slashes (e.g.,
     // 'fs/promises') or ones that are already defined.
-    if (StringPrototypeStartsWith(name, '_') ||
+    if (name[0] === '_' ||
         StringPrototypeIncludes(name, '/') ||
         ObjectPrototypeHasOwnProperty(object, name)) {
       return;
diff --git a/lib/internal/process/per_thread.js b/lib/internal/process/per_thread.js
index fd515436b4004d..b0f6993afb77ac 100644
--- a/lib/internal/process/per_thread.js
+++ b/lib/internal/process/per_thread.js
@@ -25,7 +25,6 @@ const {
   StringPrototypeEndsWith,
   StringPrototypeReplace,
   StringPrototypeSlice,
-  StringPrototypeStartsWith,
   Symbol,
   SymbolIterator,
 } = primordials;
@@ -302,7 +301,7 @@ function buildAllowedFlags() {
   }
 
   function isAccepted(to) {
-    if (!StringPrototypeStartsWith(to, '-') || to === '--') return true;
+    if (!to.length || to[0] !== '-' || to === '--') return true;
     const recursiveExpansion = aliases.get(to);
     if (recursiveExpansion) {
       if (recursiveExpansion[0] === to)
diff --git a/lib/internal/process/pre_execution.js b/lib/internal/process/pre_execution.js
index a05d2846050c2f..0bbabb80c26a12 100644
--- a/lib/internal/process/pre_execution.js
+++ b/lib/internal/process/pre_execution.js
@@ -15,7 +15,6 @@ const {
   ObjectGetOwnPropertyDescriptor,
   SafeMap,
   String,
-  StringPrototypeStartsWith,
   Symbol,
   globalThis,
 } = primordials;
@@ -244,8 +243,7 @@ function patchProcessObject(expandArgv1) {
   let mainEntry;
   // If requested, update process.argv[1] to replace whatever the user provided with the resolved absolute file path of
   // the entry point.
-  if (expandArgv1 && process.argv[1] &&
-      !StringPrototypeStartsWith(process.argv[1], '-')) {
+  if (expandArgv1 && process.argv[1] && process.argv[1][0] !== '-') {
     // Expand process.argv[1] into a full path.
     const path = require('path');
     try {
diff --git a/lib/internal/repl/utils.js b/lib/internal/repl/utils.js
index 27e1011ec9daf6..126f8ae85d0977 100644
--- a/lib/internal/repl/utils.js
+++ b/lib/internal/repl/utils.js
@@ -10,12 +10,10 @@ const {
   RegExpPrototypeExec,
   SafeSet,
   SafeStringIterator,
-  StringPrototypeEndsWith,
   StringPrototypeIndexOf,
   StringPrototypeLastIndexOf,
   StringPrototypeReplaceAll,
   StringPrototypeSlice,
-  StringPrototypeStartsWith,
   StringPrototypeToLowerCase,
   StringPrototypeTrim,
   Symbol,
@@ -298,8 +296,7 @@ function setupPreview(repl, contextSymbol, bufferSymbol, active) {
   function getInputPreview(input, callback) {
     // For similar reasons as `defaultEval`, wrap expressions starting with a
     // curly brace with parenthesis.
-    if (StringPrototypeStartsWith(input, '{') &&
-        !StringPrototypeEndsWith(input, ';') && !wrapped) {
+    if (!wrapped && input[0] === '{' && input[input.length - 1] !== ';') {
       input = `(${input})`;
       wrapped = true;
     }
diff --git a/lib/internal/url.js b/lib/internal/url.js
index c8d70c484ee52e..42debfc20005b0 100644
--- a/lib/internal/url.js
+++ b/lib/internal/url.js
@@ -1408,7 +1408,7 @@ function urlToHttpOptions(url) {
     __proto__: null,
     ...url, // In case the url object was extended by the user.
     protocol: url.protocol,
-    hostname: hostname && StringPrototypeStartsWith(hostname, '[') ?
+    hostname: hostname && hostname[0] === '[' ?
       StringPrototypeSlice(hostname, 1, -1) :
       hostname,
     hash: url.hash,
diff --git a/lib/internal/util/inspect.js b/lib/internal/util/inspect.js
index c8bb717788fe0a..c547e7697a6b40 100644
--- a/lib/internal/util/inspect.js
+++ b/lib/internal/util/inspect.js
@@ -1181,7 +1181,7 @@ function getClassBase(value, constructor, tag) {
 
 function getFunctionBase(value, constructor, tag) {
   const stringified = FunctionPrototypeToString(value);
-  if (StringPrototypeStartsWith(stringified, 'class') && StringPrototypeEndsWith(stringified, '}')) {
+  if (StringPrototypeStartsWith(stringified, 'class') && stringified[stringified.length - 1] === '}') {
     const slice = StringPrototypeSlice(stringified, 5, -1);
     const bracketIndex = StringPrototypeIndexOf(slice, '{');
     if (bracketIndex !== -1 &&
@@ -1560,7 +1560,8 @@ function handleMaxCallStackSize(ctx, err, constructorName, indentationLvl) {
 function addNumericSeparator(integerString) {
   let result = '';
   let i = integerString.length;
-  const start = StringPrototypeStartsWith(integerString, '-') ? 1 : 0;
+  assert(i !== 0);
+  const start = integerString[0] === '-' ? 1 : 0;
   for (; i >= start + 4; i -= 3) {
     result = `_${StringPrototypeSlice(integerString, i - 3, i)}${result}`;
   }
diff --git a/lib/repl.js b/lib/repl.js
index 63802cc33e43b6..abd7ca512e6295 100644
--- a/lib/repl.js
+++ b/lib/repl.js
@@ -130,7 +130,7 @@ const { shouldColorize } = require('internal/util/colors');
 const CJSModule = require('internal/modules/cjs/loader').Module;
 let _builtinLibs = ArrayPrototypeFilter(
   CJSModule.builtinModules,
-  (e) => !StringPrototypeStartsWith(e, '_'),
+  (e) => e[0] !== '_',
 );
 const nodeSchemeBuiltinLibs = ArrayPrototypeMap(
   _builtinLibs, (lib) => `node:${lib}`);

From 55c205e5f6761d41d56d6ddc4d9a7b17f790e19d Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Sat, 23 Nov 2024 13:48:46 -0300
Subject: [PATCH 116/216] build: add create release proposal action

PR-URL: https://github.com/nodejs/node/pull/55690
Refs: https://github.com/nodejs/security-wg/issues/860
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
---
 .github/workflows/create-release-proposal.yml | 86 +++++++++++++++++++
 tools/actions/create-release.sh               | 33 +++++++
 2 files changed, 119 insertions(+)
 create mode 100644 .github/workflows/create-release-proposal.yml
 create mode 100755 tools/actions/create-release.sh
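
Reviewer note: the two `awk` invocations in `create-release.sh` pull the
PR title and body out of the changelog. The sketch below renders the same
selection logic in JavaScript to make the range semantics explicit.
`extractReleaseNotes` and the sample changelog are made up for
illustration and are not part of this patch. The sketch treats the date
as a literal string, which matches the regular expression only because a
YYYY-MM-DD date contains no regex metacharacters.

```javascript
'use strict';

// Hypothetical rendering of the awk logic in create-release.sh.
function extractReleaseNotes(changelog, releaseDate) {
  const heading = `## ${releaseDate}`;
  let title;
  const body = [];
  let inRange = false;
  for (const line of changelog.split('\n')) {
    // TITLE: a line starting with the heading, minus the leading "## ".
    // (awk would print every such line; this sketch keeps the first.)
    if (title === undefined && line.startsWith(heading)) title = line.slice(3);
    // Body: awk range from the heading to the next "<a id=" anchor,
    // with the anchor line itself filtered out.
    if (!inRange && line.includes(heading)) inRange = true;
    if (inRange) {
      if (line.startsWith('<a id=')) inRange = false;
      else body.push(line);
    }
  }
  return { title, body: body.join('\n') };
}

// Made-up excerpt, shaped like the files under doc/changelogs/.
const sample = [
  '<a id="1.1.0"></a>',
  '',
  '## 2024-01-02, Version 1.1.0 (Current), @example',
  '',
  '* some change',
  '',
  '<a id="1.0.0"></a>',
  '',
  '## 2024-01-01, Version 1.0.0 (Current), @example',
].join('\n');

console.log(extractReleaseNotes(sample, '2024-01-02'));
```

Note that the body keeps the heading line, so the PR body starts with the
same text as the title prefixed by `## `.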

diff --git a/.github/workflows/create-release-proposal.yml b/.github/workflows/create-release-proposal.yml
new file mode 100644
index 00000000000000..5f0f80eed24c95
--- /dev/null
+++ b/.github/workflows/create-release-proposal.yml
@@ -0,0 +1,86 @@
+# This action requires the following secrets to be set on the repository:
+#   GH_USER_NAME: GitHub user whose Jenkins and GitHub token are defined below
+#   GH_USER_TOKEN: GitHub user token, to be used by ncu and to push changes
+#   JENKINS_TOKEN: Jenkins token, to be used to check CI status
+
+name: Create Release Proposal
+
+on:
+  workflow_dispatch:
+    inputs:
+      release-line:
+        required: true
+        type: number
+        default: 23
+        description: 'The release line (without dots or prefix). e.g: 22'
+      release-date:
+        required: true
+        type: string
+        default: YYYY-MM-DD
+        description: The release date in YYYY-MM-DD format
+
+concurrency: ${{ github.workflow }}
+
+env:
+  NODE_VERSION: lts/*
+
+permissions:
+  contents: write
+
+jobs:
+  releasePrepare:
+    env:
+      STAGING_BRANCH: v${{ inputs.release-line }}.x-staging
+      RELEASE_BRANCH: v${{ inputs.release-line }}.x
+      RELEASE_DATE: ${{ inputs.release-date }}
+      RELEASE_LINE: ${{ inputs.release-line }}
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332  # v4.1.7
+        with:
+          ref: ${{ env.STAGING_BRANCH }}
+          # Needs the whole git history for ncu to work
+          # See https://github.com/nodejs/node-core-utils/pull/486
+          fetch-depth: 0
+
+      # Install dependencies
+      - name: Install Node.js
+        uses: actions/setup-node@1e60f620b9541d16bece96c5465dc8ee9832be0b  # v4.0.3
+        with:
+          node-version: ${{ env.NODE_VERSION }}
+
+      - name: Install @node-core/utils
+        run: npm install -g @node-core/utils
+
+      - name: Configure @node-core/utils
+        run: |
+          ncu-config set branch "${RELEASE_BRANCH}"
+          ncu-config set upstream origin
+          ncu-config set username "$USERNAME"
+          ncu-config set token "$GH_TOKEN"
+          ncu-config set jenkins_token "$JENKINS_TOKEN"
+          ncu-config set repo "$(echo "$GITHUB_REPOSITORY" | cut -d/ -f2)"
+          ncu-config set owner "${GITHUB_REPOSITORY_OWNER}"
+        env:
+          USERNAME: ${{ secrets.JENKINS_USER }}
+          GH_TOKEN: ${{ secrets.GH_USER_TOKEN }}
+          JENKINS_TOKEN: ${{ secrets.JENKINS_TOKEN }}
+
+      - name: Set up ghauth config (Ubuntu)
+        run: |
+          mkdir -p ~/.config/changelog-maker/
+          echo '{
+            "user": "'$(ncu-config get username)'",
+            "token": "'$(ncu-config get token)'"
+          }' > ~/.config/changelog-maker/config.json
+
+      - name: Setup git author
+        run: |
+          git config --local user.email "github-bot@iojs.org"
+          git config --local user.name "Node.js GitHub Bot"
+
+      - name: Start git node release prepare
+        run: |
+          ./tools/actions/create-release.sh "${RELEASE_DATE}" "${RELEASE_LINE}"
+        env:
+          GH_TOKEN: ${{ secrets.GH_USER_TOKEN }}
diff --git a/tools/actions/create-release.sh b/tools/actions/create-release.sh
new file mode 100755
index 00000000000000..3a69b3f5602ffc
--- /dev/null
+++ b/tools/actions/create-release.sh
@@ -0,0 +1,33 @@
+#!/bin/sh
+
+set -xe
+
+RELEASE_DATE=$1
+RELEASE_LINE=$2
+
+if [ -z "$RELEASE_DATE" ] || [ -z "$RELEASE_LINE" ]; then
+  echo "Usage: $0 <RELEASE_DATE> <RELEASE_LINE>"
+  exit 1
+fi
+
+git node release --prepare --skipBranchDiff --yes --releaseDate "$RELEASE_DATE"
+# Push to the current branch without naming it, as the branch name changes
+# based on the commit list (semver-minor/semver-patch)
+git config push.default current
+git push
+
+TITLE=$(awk "/^## ${RELEASE_DATE}/ { print substr(\$0, 4) }" "doc/changelogs/CHANGELOG_V${RELEASE_LINE}.md")
+
+# Capture the PR body in a variable
+TEMP_BODY="$(awk "/## ${RELEASE_DATE}/,/^<a id=/{ if (!/^<a id=/) print }" "doc/changelogs/CHANGELOG_V${RELEASE_LINE}.md")"
+
+PR_URL="$(gh pr create --title "$TITLE" --body "$TEMP_BODY" --base "v$RELEASE_LINE.x")"
+
+# Amend commit message so it contains the correct PR-URL trailer.
+AMENDED_COMMIT_MSG="$(git log -1 --pretty=%B | sed "s|PR-URL: TODO|PR-URL: $PR_URL|")"
+
+# Replace "TODO" with the PR URL in the last commit
+git commit --amend --no-edit -m "$AMENDED_COMMIT_MSG" || true
+
+# Force-push the amended commit
+git push --force

From 559a0bfa2e88265079b561d00defeeb74ea9b74a Mon Sep 17 00:00:00 2001
From: Gireesh Punathil <gpunathi@in.ibm.com>
Date: Sat, 2 Nov 2024 18:07:09 +0530
Subject: [PATCH 117/216] doc: add a note on console stream behavior
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Many user-reported issues show poor awareness of the
nature of console streams. Explicitly document that.

PR-URL: https://github.com/nodejs/node/pull/55616
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 doc/api/console.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
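
Reviewer note: one possible way for a program to "figure out the nature
of console's backing stream" is sketched below. `describeStdio` is a
hypothetical helper, not an API added by this patch, and the set of
categories it returns is an assumption made for illustration. How each
category maps to synchronous or asynchronous writes is covered by the
note on process I/O and is deliberately not encoded here.

```javascript
'use strict';

const fs = require('node:fs');

// Hypothetical helper: classify what a standard stream is connected to.
function describeStdio(stream) {
  if (stream.isTTY) return 'tty';
  try {
    const stats = fs.fstatSync(stream.fd);
    if (stats.isFile()) return 'file';
    if (stats.isFIFO()) return 'pipe';
    if (stats.isSocket()) return 'socket';
  } catch {
    // fstat can fail for some handle types; fall through to 'unknown'.
  }
  return 'unknown';
}

console.error(`stdout: ${describeStdio(process.stdout)}`);
console.error(`stderr: ${describeStdio(process.stderr)}`);
```

Running the script with stdout redirected to a file or piped into another
command should change the reported category for stdout.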

diff --git a/doc/api/console.md b/doc/api/console.md
index 147a45eadb1203..1c966d094d1472 100644
--- a/doc/api/console.md
+++ b/doc/api/console.md
@@ -19,7 +19,11 @@ The module exports two specific components:
 
 _**Warning**_: The global console object's methods are neither consistently
 synchronous like the browser APIs they resemble, nor are they consistently
-asynchronous like all other Node.js streams. See the [note on process I/O][] for
+asynchronous like all other Node.js streams. Programs that depend on the
+synchronous or asynchronous behavior of the console functions should first
+determine the nature of the console's backing stream, because the stream
+depends on the underlying platform and the standard stream configuration
+of the current process. See the [note on process I/O][] for
 more information.
 
 Example using the global `console`:

From a2f315ef8b1c6a03f2dcdff618560244d9f8c0af Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Mon, 18 Nov 2024 20:01:06 -0500
Subject: [PATCH 118/216] deps: update simdutf to 5.6.2

PR-URL: https://github.com/nodejs/node/pull/55889
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
---
 deps/simdutf/simdutf.cpp | 36 ++++++++++++++++++++++++++++++------
 deps/simdutf/simdutf.h   |  6 +++---
 2 files changed, 33 insertions(+), 9 deletions(-)
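
Reviewer note: the new `stop_before_partial` branch in
`base64_to_binary_safe_impl` reports the whole input as consumed when
only whitespace or padding follows a partial final group. The sketch
below restates that branch in JavaScript for readability.
`consumedInput` is a hypothetical name, and the counts passed to it are
made-up inputs rather than values produced by simdutf.

```javascript
'use strict';

// JavaScript rendering of is_ascii_white_space_or_padding() from the patch.
function isAsciiWhiteSpaceOrPadding(c) {
  return c === ' ' || c === '\t' || c === '\n' || c === '\r' ||
         c === '\f' || c === '=';
}

// Hypothetical helper mirroring the new branch: if the output length is
// not a multiple of 3 and everything from inputCount onward is whitespace
// or '=', treat the entire input as consumed.
function consumedInput(input, inputCount, outputCount) {
  if (outputCount % 3 !== 0) {
    for (let i = inputCount; i < input.length; i++) {
      if (!isAsciiWhiteSpaceOrPadding(input[i])) return inputCount;
    }
    return input.length;
  }
  return inputCount;
}

// Trailing padding and whitespace only: the full length is reported.
console.log(consumedInput('QQ== \n', 2, 1));
// More data follows the partial group: the count is left unchanged.
console.log(consumedInput('QQ==QUJD', 2, 1));
```

When the output length is already a multiple of 3, the helper returns the
count unchanged, matching the C++ code path that skips the trailing scan.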

diff --git a/deps/simdutf/simdutf.cpp b/deps/simdutf/simdutf.cpp
index d5f6fbd9a4c413..2da5ca7284c1f0 100644
--- a/deps/simdutf/simdutf.cpp
+++ b/deps/simdutf/simdutf.cpp
@@ -1,4 +1,4 @@
-/* auto-generated on 2024-11-12 20:00:19 -0500. Do not edit! */
+/* auto-generated on 2024-11-14 14:52:31 -0500. Do not edit! */
 /* begin file src/simdutf.cpp */
 #include "simdutf.h"
 // We include base64_tables once.
@@ -7229,6 +7229,11 @@ template <class char_type> bool is_ascii_white_space(char_type c) {
   return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
 }
 
+template <class char_type> bool is_ascii_white_space_or_padding(char_type c) {
+  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' ||
+         c == '=';
+}
+
 template <class char_type> bool is_eight_byte(char_type c) {
   if (sizeof(char_type) == 1) {
     return true;
@@ -9491,6 +9496,21 @@ simdutf_warn_unused result base64_to_binary_safe_impl(
     if (r.error != error_code::INVALID_BASE64_CHARACTER &&
         r.error != error_code::BASE64_EXTRA_BITS) {
       outlen = r.output_count;
+      if (last_chunk_handling_options == stop_before_partial) {
+        if ((r.output_count % 3) != 0) {
+          bool empty_trail = true;
+          for (size_t i = r.input_count; i < length; i++) {
+            if (!scalar::base64::is_ascii_white_space_or_padding(input[i])) {
+              empty_trail = false;
+              break;
+            }
+          }
+          if (empty_trail) {
+            r.input_count = length;
+          }
+        }
+        return {r.error, r.input_count};
+      }
       return {r.error, length};
     }
     return r;
@@ -9557,7 +9577,11 @@ simdutf_warn_unused result base64_to_binary_safe_impl(
   }
   if (rr.error == error_code::SUCCESS &&
       last_chunk_handling_options == stop_before_partial) {
-    rr.count = tail_input - input;
+    if (tail_input > input + input_index) {
+      rr.count = tail_input - input;
+    } else if (r.input_count > 0) {
+      rr.count = r.input_count + rr.count;
+    }
     return rr;
   }
   rr.count += input_index;
@@ -15891,9 +15915,9 @@ compress_decode_base64(char *dst, const char_type *src, size_t srclen,
   if (src < srcend + equalsigns) {
     full_result r = scalar::base64::base64_tail_decode(
         dst, src, srcend - src, equalsigns, options, last_chunk_options);
+    r.input_count += size_t(src - srcinit);
     if (r.error == error_code::INVALID_BASE64_CHARACTER ||
         r.error == error_code::BASE64_EXTRA_BITS) {
-      r.input_count += size_t(src - srcinit);
       return r;
     } else {
       r.output_count += size_t(dst - dstinit);
@@ -23716,9 +23740,9 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   if (src < srcend + equalsigns) {
     full_result r = scalar::base64::base64_tail_decode(
         dst, src, srcend - src, equalsigns, options, last_chunk_options);
+    r.input_count += size_t(src - srcinit);
     if (r.error == error_code::INVALID_BASE64_CHARACTER ||
         r.error == error_code::BASE64_EXTRA_BITS) {
-      r.input_count += size_t(src - srcinit);
       return r;
     } else {
       r.output_count += size_t(dst - dstinit);
@@ -28552,9 +28576,9 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   if (src < srcend + equalsigns) {
     full_result r = scalar::base64::base64_tail_decode(
         dst, src, srcend - src, equalsigns, options, last_chunk_options);
+    r.input_count += size_t(src - srcinit);
     if (r.error == error_code::INVALID_BASE64_CHARACTER ||
         r.error == error_code::BASE64_EXTRA_BITS) {
-      r.input_count += size_t(src - srcinit);
       return r;
     } else {
       r.output_count += size_t(dst - dstinit);
@@ -38307,9 +38331,9 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   if (src < srcend + equalsigns) {
     full_result r = scalar::base64::base64_tail_decode(
         dst, src, srcend - src, equalsigns, options, last_chunk_options);
+    r.input_count += size_t(src - srcinit);
     if (r.error == error_code::INVALID_BASE64_CHARACTER ||
         r.error == error_code::BASE64_EXTRA_BITS) {
-      r.input_count += size_t(src - srcinit);
       return r;
     } else {
       r.output_count += size_t(dst - dstinit);
diff --git a/deps/simdutf/simdutf.h b/deps/simdutf/simdutf.h
index a219e96ad9cd9f..a30e5f2cd228cd 100644
--- a/deps/simdutf/simdutf.h
+++ b/deps/simdutf/simdutf.h
@@ -1,4 +1,4 @@
-/* auto-generated on 2024-11-12 20:00:19 -0500. Do not edit! */
+/* auto-generated on 2024-11-14 14:52:31 -0500. Do not edit! */
 /* begin file include/simdutf.h */
 #ifndef SIMDUTF_H
 #define SIMDUTF_H
@@ -670,7 +670,7 @@ SIMDUTF_DISABLE_UNDESIRED_WARNINGS
 #define SIMDUTF_SIMDUTF_VERSION_H
 
 /** The version of simdutf being used (major.minor.revision) */
-#define SIMDUTF_VERSION "5.6.1"
+#define SIMDUTF_VERSION "5.6.2"
 
 namespace simdutf {
 enum {
@@ -685,7 +685,7 @@ enum {
   /**
    * The revision (major.minor.REVISION) of simdutf being used.
    */
-  SIMDUTF_VERSION_REVISION = 1
+  SIMDUTF_VERSION_REVISION = 2
 };
 } // namespace simdutf
 

From 61de8f9b04899ca778ab88ad0d0e61920b56c3ae Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Tue, 19 Nov 2024 15:15:34 -0300
Subject: [PATCH 119/216] doc: include git node release --promote to steps

Refs: https://github.com/nodejs/node-core-utils/pull/835
PR-URL: https://github.com/nodejs/node/pull/55835
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
---
 doc/contributing/releases.md | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/doc/contributing/releases.md b/doc/contributing/releases.md
index 82eec2833998ad..3ae682169873df 100644
--- a/doc/contributing/releases.md
+++ b/doc/contributing/releases.md
@@ -707,12 +707,23 @@ the build before moving forward. Use the following list as a baseline:
 
 ### 11. Tag and sign the release commit
 
-Once you have produced builds that you're happy with, create a new tag. By
-waiting until this stage to create tags, you can discard a proposed release if
-something goes wrong or additional commits are required. Once you have created a
-tag and pushed it to GitHub, you _**must not**_ delete and re-tag. If you make
-a mistake after tagging then you'll have to version-bump and start again and
-count that tag/version as lost.
+Once you have produced builds that you're happy with, you can either run
+`git node release --promote`:
+
+```bash
+git node release -S --promote https://github.com/nodejs/node/pull/XXXX
+```
+
+to automate the remaining steps until step 16, or you can perform them
+manually by following the steps below.
+
+***
+
+Create a new tag: By waiting until this stage to create tags, you can discard
+a proposed release if something goes wrong or additional commits are required.
+Once you have created a tag and pushed it to GitHub, you _**must not**_ delete
+and re-tag. If you make a mistake after tagging then you'll have to version-bump
+and start again and count that tag/version as lost.
 
 Tag summaries have a predictable format. Look at a recent tag to see:
 

From a882536596f6878cc73ebd2779c1b9bb7ad76ac7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?H=C3=BCseyin=20A=C3=A7acak?=
 <110401522+huseyinacacak-janea@users.noreply.github.com>
Date: Wed, 20 Nov 2024 15:14:20 +0300
Subject: [PATCH 120/216] src: fix kill signal on Windows

Fixes: https://github.com/nodejs/node/issues/42923
PR-URL: https://github.com/nodejs/node/pull/55514
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Stefan Stojanovic <stefan.stojanovic@janeasystems.com>
---
 doc/api/child_process.md                 |  4 ++--
 src/process_wrap.cc                      |  6 ++++++
 test/parallel/test-child-process-kill.js | 19 +++++++++++++++++++
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/doc/api/child_process.md b/doc/api/child_process.md
index 60d005ffea472e..7d5f6c52e5af71 100644
--- a/doc/api/child_process.md
+++ b/doc/api/child_process.md
@@ -1700,8 +1700,8 @@ may not actually terminate the process.
 See kill(2) for reference.
 
 On Windows, where POSIX signals do not exist, the `signal` argument will be
-ignored, and the process will be killed forcefully and abruptly (similar to
-`'SIGKILL'`).
+ignored except for `'SIGKILL'`, `'SIGTERM'`, `'SIGINT'` and `'SIGQUIT'`, and the
+process will always be killed forcefully and abruptly (similar to `'SIGKILL'`).
 See [Signal Events][] for more details.
 
 On Linux, child processes of child processes will not be terminated
diff --git a/src/process_wrap.cc b/src/process_wrap.cc
index 556dea18eca76f..74f46bcb651bef 100644
--- a/src/process_wrap.cc
+++ b/src/process_wrap.cc
@@ -312,6 +312,12 @@ class ProcessWrap : public HandleWrap {
     ProcessWrap* wrap;
     ASSIGN_OR_RETURN_UNWRAP(&wrap, args.This());
     int signal = args[0]->Int32Value(env->context()).FromJust();
+#ifdef _WIN32
+    if (signal != SIGKILL && signal != SIGTERM && signal != SIGINT &&
+        signal != SIGQUIT) {
+      signal = SIGKILL;
+    }
+#endif
     int err = uv_process_kill(&wrap->process_, signal);
     args.GetReturnValue().Set(err);
   }
diff --git a/test/parallel/test-child-process-kill.js b/test/parallel/test-child-process-kill.js
index 1025c69ba1ac28..26bdc029c04fe8 100644
--- a/test/parallel/test-child-process-kill.js
+++ b/test/parallel/test-child-process-kill.js
@@ -39,3 +39,22 @@ assert.strictEqual(cat.signalCode, null);
 assert.strictEqual(cat.killed, false);
 cat.kill();
 assert.strictEqual(cat.killed, true);
+
+// Test different types of kill signals on Windows.
+if (common.isWindows) {
+  for (const sendSignal of ['SIGTERM', 'SIGKILL', 'SIGQUIT', 'SIGINT']) {
+    const process = spawn('cmd');
+    process.on('exit', (code, signal) => {
+      assert.strictEqual(code, null);
+      assert.strictEqual(signal, sendSignal);
+    });
+    process.kill(sendSignal);
+  }
+
+  const process = spawn('cmd');
+  process.on('exit', (code, signal) => {
+    assert.strictEqual(code, null);
+    assert.strictEqual(signal, 'SIGKILL');
+  });
+  process.kill('SIGHUP');
+}

From a17d9e1acf3853c4099d40d3b879c752ee7bbd8a Mon Sep 17 00:00:00 2001
From: Livia Medeiros <livia@cirno.name>
Date: Thu, 21 Nov 2024 03:25:05 +0900
Subject: [PATCH 121/216] test: fix determining lower priority

PR-URL: https://github.com/nodejs/node/pull/55908
Fixes: https://github.com/NixOS/nixpkgs/issues/355919
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 test/parallel/test-os.js | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/test/parallel/test-os.js b/test/parallel/test-os.js
index efaec2b3ead385..3d9fe5c1a60e2e 100644
--- a/test/parallel/test-os.js
+++ b/test/parallel/test-os.js
@@ -84,7 +84,8 @@ assert.ok(hostname.length > 0);
 // IBMi process priority is different.
 if (!common.isIBMi) {
   const { PRIORITY_BELOW_NORMAL, PRIORITY_LOW } = os.constants.priority;
-  const LOWER_PRIORITY = os.getPriority() > PRIORITY_BELOW_NORMAL ? PRIORITY_BELOW_NORMAL : PRIORITY_LOW;
+  // Priority means niceness: higher numeric value <=> lower priority
+  const LOWER_PRIORITY = os.getPriority() < PRIORITY_BELOW_NORMAL ? PRIORITY_BELOW_NORMAL : PRIORITY_LOW;
   os.setPriority(LOWER_PRIORITY);
   const priority = os.getPriority();
   is.number(priority);

From 41e3bcd752c9c5edef7bd4104a672797d375b7b7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alfredo=20Gonz=C3=A1lez?=
 <12631491+mfdebian@users.noreply.github.com>
Date: Wed, 20 Nov 2024 20:26:30 -0300
Subject: [PATCH 122/216] doc: add esm examples to node:timers

PR-URL: https://github.com/nodejs/node/pull/55857
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Jason Zhang <xzha4350@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 doc/api/timers.md | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/doc/api/timers.md b/doc/api/timers.md
index 7f3b11fe1f6d08..57279f6aee23dd 100644
--- a/doc/api/timers.md
+++ b/doc/api/timers.md
@@ -288,7 +288,24 @@ returned Promises will be rejected with an `'AbortError'`.
 
 For `setImmediate()`:
 
-```js
+```mjs
+import { setImmediate as setImmediatePromise } from 'node:timers/promises';
+
+const ac = new AbortController();
+const signal = ac.signal;
+
+// We do not `await` the promise so `ac.abort()` is called concurrently.
+setImmediatePromise('foobar', { signal })
+  .then(console.log)
+  .catch((err) => {
+    if (err.name === 'AbortError')
+      console.error('The immediate was aborted');
+  });
+
+ac.abort();
+```
+
+```cjs
 const { setImmediate: setImmediatePromise } = require('node:timers/promises');
 
 const ac = new AbortController();
@@ -306,7 +323,24 @@ ac.abort();
 
 For `setTimeout()`:
 
-```js
+```mjs
+import { setTimeout as setTimeoutPromise } from 'node:timers/promises';
+
+const ac = new AbortController();
+const signal = ac.signal;
+
+// We do not `await` the promise so `ac.abort()` is called concurrently.
+setTimeoutPromise(1000, 'foobar', { signal })
+  .then(console.log)
+  .catch((err) => {
+    if (err.name === 'AbortError')
+      console.error('The timeout was aborted');
+  });
+
+ac.abort();
+```
+
+```cjs
 const { setTimeout: setTimeoutPromise } = require('node:timers/promises');
 
 const ac = new AbortController();

From 207562fa3d79ad966a760ce6ad2f1d121236941a Mon Sep 17 00:00:00 2001
From: Shelley Vohr <shelley.vohr@gmail.com>
Date: Fri, 22 Nov 2024 10:33:06 +0100
Subject: [PATCH 123/216] test: make x509 crypto tests work with BoringSSL

PR-URL: https://github.com/nodejs/node/pull/55927
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 test/parallel/test-crypto-x509.js | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/test/parallel/test-crypto-x509.js b/test/parallel/test-crypto-x509.js
index 89a7521544f705..ee4d96b476864e 100644
--- a/test/parallel/test-crypto-x509.js
+++ b/test/parallel/test-crypto-x509.js
@@ -111,7 +111,7 @@ const der = Buffer.from(
     '5A:42:63:E0:21:2F:D6:70:63:07:96:6F:27:A7:78:12:08:02:7A:8B'
   );
   assert.strictEqual(x509.keyUsage, undefined);
-  assert.strictEqual(x509.serialNumber, '147D36C1C2F74206DE9FAB5F2226D78ADB00A426');
+  assert.strictEqual(x509.serialNumber.toUpperCase(), '147D36C1C2F74206DE9FAB5F2226D78ADB00A426');
 
   assert.deepStrictEqual(x509.raw, der);
 
@@ -253,6 +253,16 @@ oans248kpal88CGqsN2so/wZKxVnpiXlPHMdiNL7hRSUqlHkUi07FrP2Htg8kjI=
   });
   mc.port2.postMessage(x509);
 
+  const modulusOSSL = 'D456320AFB20D3827093DC2C4284ED04DFBABD56E1DDAE529E28B790CD42' +
+                      '56DB273349F3735FFD337C7A6363ECCA5A27B7F73DC7089A96C6D886DB0C' +
+                      '62388F1CDD6A963AFCD599D5800E587A11F908960F84ED50BA25A28303EC' +
+                      'DA6E684FBE7BAEDC9CE8801327B1697AF25097CEE3F175E400984C0DB6A8' +
+                      'EB87BE03B4CF94774BA56FFFC8C63C68D6ADEB60ABBE69A7B14AB6A6B9E7' +
+                      'BAA89B5ADAB8EB07897C07F6D4FA3D660DFF574107D28E8F63467A788624' +
+                      'C574197693E959CEA1362FFAE1BBA10C8C0D88840ABFEF103631B2E8F5C3' +
+                      '9B5548A7EA57E8A39F89291813F45A76C448033A2B7ED8403F4BAA147CF3' +
+                      '5E2D2554AA65CE49695797095BF4DC6B';
+
   // Verify that legacy encoding works
   const legacyObjectCheck = {
     subject: Object.assign({ __proto__: null }, {
@@ -277,15 +287,7 @@ oans248kpal88CGqsN2so/wZKxVnpiXlPHMdiNL7hRSUqlHkUi07FrP2Htg8kjI=
       'OCSP - URI': ['http://ocsp.nodejs.org/'],
       'CA Issuers - URI': ['http://ca.nodejs.org/ca.cert']
     }),
-    modulus: 'D456320AFB20D3827093DC2C4284ED04DFBABD56E1DDAE529E28B790CD42' +
-              '56DB273349F3735FFD337C7A6363ECCA5A27B7F73DC7089A96C6D886DB0C' +
-              '62388F1CDD6A963AFCD599D5800E587A11F908960F84ED50BA25A28303EC' +
-              'DA6E684FBE7BAEDC9CE8801327B1697AF25097CEE3F175E400984C0DB6A8' +
-              'EB87BE03B4CF94774BA56FFFC8C63C68D6ADEB60ABBE69A7B14AB6A6B9E7' +
-              'BAA89B5ADAB8EB07897C07F6D4FA3D660DFF574107D28E8F63467A788624' +
-              'C574197693E959CEA1362FFAE1BBA10C8C0D88840ABFEF103631B2E8F5C3' +
-              '9B5548A7EA57E8A39F89291813F45A76C448033A2B7ED8403F4BAA147CF3' +
-              '5E2D2554AA65CE49695797095BF4DC6B',
+    modulusPattern: new RegExp(`^${modulusOSSL}$`, 'i'),
     bits: 2048,
     exponent: '0x10001',
     valid_from: 'Sep  3 21:40:37 2022 GMT',
@@ -298,7 +300,7 @@ oans248kpal88CGqsN2so/wZKxVnpiXlPHMdiNL7hRSUqlHkUi07FrP2Htg8kjI=
       '51:62:18:39:E2:E2:77:F5:86:11:E8:C0:CA:54:43:7C:76:83:19:05:D0:03:' +
       '24:21:B8:EB:14:61:FB:24:16:EB:BD:51:1A:17:91:04:30:03:EB:68:5F:DC:' +
       '86:E1:D1:7C:FB:AF:78:ED:63:5F:29:9C:32:AF:A1:8E:22:96:D1:02',
-    serialNumber: '147D36C1C2F74206DE9FAB5F2226D78ADB00A426'
+    serialNumberPattern: /^147D36C1C2F74206DE9FAB5F2226D78ADB00A426$/i
   };
 
   const legacyObject = x509.toLegacyObject();
@@ -307,7 +309,7 @@ oans248kpal88CGqsN2so/wZKxVnpiXlPHMdiNL7hRSUqlHkUi07FrP2Htg8kjI=
   assert.deepStrictEqual(legacyObject.subject, legacyObjectCheck.subject);
   assert.deepStrictEqual(legacyObject.issuer, legacyObjectCheck.issuer);
   assert.deepStrictEqual(legacyObject.infoAccess, legacyObjectCheck.infoAccess);
-  assert.strictEqual(legacyObject.modulus, legacyObjectCheck.modulus);
+  assert.match(legacyObject.modulus, legacyObjectCheck.modulusPattern);
   assert.strictEqual(legacyObject.bits, legacyObjectCheck.bits);
   assert.strictEqual(legacyObject.exponent, legacyObjectCheck.exponent);
   assert.strictEqual(legacyObject.valid_from, legacyObjectCheck.valid_from);
@@ -316,9 +318,9 @@ oans248kpal88CGqsN2so/wZKxVnpiXlPHMdiNL7hRSUqlHkUi07FrP2Htg8kjI=
   assert.strictEqual(
     legacyObject.fingerprint256,
     legacyObjectCheck.fingerprint256);
-  assert.strictEqual(
+  assert.match(
     legacyObject.serialNumber,
-    legacyObjectCheck.serialNumber);
+    legacyObjectCheck.serialNumberPattern);
 }
 
 {

From 6327554706f2ba4297f9f1511f3d10202eabbf70 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Fri, 22 Nov 2024 18:12:44 +0000
Subject: [PATCH 124/216] tools: add linter for release commit proposals

PR-URL: https://github.com/nodejs/node/pull/55923
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Chemi Atlow <chemi@atlow.co.il>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
---
 .github/workflows/lint-release-proposal.yml | 59 +++++++++++++++++++++
 1 file changed, 59 insertions(+)
 create mode 100644 .github/workflows/lint-release-proposal.yml

diff --git a/.github/workflows/lint-release-proposal.yml b/.github/workflows/lint-release-proposal.yml
new file mode 100644
index 00000000000000..bc2ac2d0127865
--- /dev/null
+++ b/.github/workflows/lint-release-proposal.yml
@@ -0,0 +1,59 @@
+name: Linters (release proposals)
+
+on:
+  push:
+    branches:
+      - v[0-9]+.[0-9]+.[0-9]+-proposal
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+env:
+  PYTHON_VERSION: '3.12'
+  NODE_VERSION: lts/*
+
+permissions:
+  contents: read
+
+jobs:
+  lint-release-commit:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
+        with:
+          persist-credentials: false
+      - name: Lint release commit title format
+        run: |
+          EXPECTED_TITLE='^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}, Version [[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+ (\(Current|'.+' \(LTS)\)$'
+          echo "Expected commit title format: $EXPECTED_TITLE"
+          COMMIT_SUBJECT="$(git --no-pager log -1 --format=%s)"
+          echo "Actual: $COMMIT_SUBJECT"
+          echo "$COMMIT_SUBJECT" | grep -q -E "$EXPECTED_TITLE"
+          echo "COMMIT_SUBJECT=$COMMIT_SUBJECT" >> "$GITHUB_ENV"
+      - name: Lint release commit message trailers
+        run: |
+          EXPECTED_TRAILER="^PR-URL: $GITHUB_SERVER_URL/$GITHUB_REPOSITORY/pull/[[:digit:]]+\$"
+          echo "Expected trailer format: $EXPECTED_TRAILER"
+          ACTUAL="$(git --no-pager log -1 --format=%b | git interpret-trailers --parse --no-divider)"
+          echo "Actual: $ACTUAL"
+          echo "$ACTUAL" | grep -E -q "$EXPECTED_TRAILER"
+
+          PR_URL="${ACTUAL:8}"
+          PR_HEAD="$(gh pr view "$PR_URL" --json headRefOid -q .headRefOid)"
+          echo "Head of $PR_URL: $PR_HEAD"
+          echo "Current commit: $GITHUB_SHA"
+          [[ "$PR_HEAD" == "$GITHUB_SHA" ]]
+        env:
+          GH_TOKEN: ${{ github.token }}
+      - name: Validate CHANGELOG
+        id: releaser-info
+        run: |
+          EXPECTED_CHANGELOG_TITLE_INTRO="## $COMMIT_SUBJECT, @"
+          echo "Expected CHANGELOG section title: $EXPECTED_CHANGELOG_TITLE_INTRO"
+          CHANGELOG_TITLE="$(grep "$EXPECTED_CHANGELOG_TITLE_INTRO" "doc/changelogs/CHANGELOG_V${COMMIT_SUBJECT:20:2}.md")"
+          echo "Actual: $CHANGELOG_TITLE"
+          [[ "${CHANGELOG_TITLE%@*}@" == "$EXPECTED_CHANGELOG_TITLE_INTRO" ]]
+      - name: Verify NODE_VERSION_IS_RELEASE bit is correctly set
+        run: |
+          grep -q '^#define NODE_VERSION_IS_RELEASE 1$' src/node_version.h

From 58a6fbb9cf6cf0c1fb001c1b9872b1c3e303e435 Mon Sep 17 00:00:00 2001
From: Michael Dawson <midawson@redhat.com>
Date: Fri, 22 Nov 2024 17:25:15 -0500
Subject: [PATCH 125/216] doc: document approach for building wasm in deps

Refs: https://github.com/nodejs/security-wg/issues/1236

Signed-off-by: Michael Dawson <midawson@redhat.com>
PR-URL: https://github.com/nodejs/node/pull/55940
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .../maintaining/maintaining-dependencies.md   | 42 +++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/doc/contributing/maintaining/maintaining-dependencies.md b/doc/contributing/maintaining/maintaining-dependencies.md
index d6a20e56a36253..940605698055ce 100644
--- a/doc/contributing/maintaining/maintaining-dependencies.md
+++ b/doc/contributing/maintaining/maintaining-dependencies.md
@@ -142,6 +142,48 @@ can be added as a non-externalizable dependency. In this case
 simply add the path to the JavaScript file in the `deps_files`
 list in the `node.gyp` file.
 
+## Common approach for dependencies with WASM components
+
+WASM components within dependencies are most often built
+outside of the regular Node.js `make build` step. They also
+require different tools.
+
+It is important that the tools and their versions used to build
+WASM components shipped within Node.js are well documented and
+remain available if needed to rebuild/update older Node.js versions.
+
+In order to minimize the number of different tools and versions
+used to build WASM components and to document and ensure future
+availability, the project builds and maintains a common
+[wasm-builder](https://github.com/nodejs/wasm-builder) container
+that should be used to build WASM components in Node.js
+dependencies.
+
+The container provides a durable copy of the versions of the tools
+used for a specific build which are under the control of the Node.js
+project. In addition, the tools and versions are documented through metadata
+within the container in the `/home/node/metadata` directory.
+
+The available tools can be found by looking at the current version of the
+[Dockerfile](https://github.com/nodejs/wasm-builder/blob/main/container-build-info/Dockerfile)
+used to create the container.
+
+If additional WASM tools are needed beyond those available in the
+container, additions should be PR'd into the wasm-builder container.
+
+Examples of using the container include:
+
+* [build/wasm.js](https://github.com/nodejs/undici/blob/main/build/wasm.js) from undici
+* [tools/build-wasm.js](https://github.com/nodejs/amaro/blob/main/tools/build-wasm.js) from amaro
+
+In addition to using the container to build WASM components, the goal is also
+for the WASM components and final files that are shipped with Node.js to be
+built by the [dep-updaters](https://github.com/nodejs/node/tree/main/tools/dep_updaters)
+that are run on a regular basis, using only the files available in the Node.js
+repo for the dependency. For example, it should be possible to rebuild the WASM
+and the files that we ship in Node.js using only the files in
+[../deps/undici](https://github.com/nodejs/node/tree/main/deps/undici).
+
 ## Updating dependencies
 
 Most dependencies are automatically updated by

From db5378c8b9b32874f8c5484bbea021c0281a505f Mon Sep 17 00:00:00 2001
From: Leonardo Peixoto <67864816+peixotoleonardo@users.noreply.github.com>
Date: Sat, 23 Nov 2024 11:04:16 -0300
Subject: [PATCH 126/216] doc: add esm example for zlib

PR-URL: https://github.com/nodejs/node/pull/55946
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 doc/api/zlib.md | 243 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 230 insertions(+), 13 deletions(-)

diff --git a/doc/api/zlib.md b/doc/api/zlib.md
index da839466f1bae9..7fb73d387f5410 100644
--- a/doc/api/zlib.md
+++ b/doc/api/zlib.md
@@ -11,7 +11,11 @@ Gzip, Deflate/Inflate, and Brotli.
 
 To access it:
 
-```js
+```mjs
+import zlib from 'node:zlib';
+```
+
+```cjs
 const zlib = require('node:zlib');
 ```
 
@@ -21,13 +25,35 @@ Compressing or decompressing a stream (such as a file) can be accomplished by
 piping the source stream through a `zlib` `Transform` stream into a destination
 stream:
 
-```js
-const { createGzip } = require('node:zlib');
-const { pipeline } = require('node:stream');
+```mjs
+import {
+  createReadStream,
+  createWriteStream,
+} from 'node:fs';
+import process from 'node:process';
+import { createGzip } from 'node:zlib';
+import { pipeline } from 'node:stream';
+
+const gzip = createGzip();
+const source = createReadStream('input.txt');
+const destination = createWriteStream('input.txt.gz');
+
+pipeline(source, gzip, destination, (err) => {
+  if (err) {
+    console.error('An error occurred:', err);
+    process.exitCode = 1;
+  }
+});
+```
+
+```cjs
 const {
   createReadStream,
   createWriteStream,
 } = require('node:fs');
+const process = require('node:process');
+const { createGzip } = require('node:zlib');
+const { pipeline } = require('node:stream');
 
 const gzip = createGzip();
 const source = createReadStream('input.txt');
@@ -39,17 +65,43 @@ pipeline(source, gzip, destination, (err) => {
     process.exitCode = 1;
   }
 });
+```
 
-// Or, Promisified
+Or, using the promise `pipeline` API:
 
-const { promisify } = require('node:util');
-const pipe = promisify(pipeline);
+```mjs
+import {
+  createReadStream,
+  createWriteStream,
+} from 'node:fs';
+import process from 'node:process';
+import { createGzip } from 'node:zlib';
+import { pipeline } from 'node:stream/promises';
+
+async function do_gzip(input, output) {
+  const gzip = createGzip();
+  const source = createReadStream(input);
+  const destination = createWriteStream(output);
+  await pipeline(source, gzip, destination);
+}
+
+await do_gzip('input.txt', 'input.txt.gz');
+```
+
+```cjs
+const {
+  createReadStream,
+  createWriteStream,
+} = require('node:fs');
+const process = require('node:process');
+const { createGzip } = require('node:zlib');
+const { pipeline } = require('node:stream/promises');
 
 async function do_gzip(input, output) {
   const gzip = createGzip();
   const source = createReadStream(input);
   const destination = createWriteStream(output);
-  await pipe(source, gzip, destination);
+  await pipeline(source, gzip, destination);
 }
 
 do_gzip('input.txt', 'input.txt.gz')
@@ -61,7 +113,39 @@ do_gzip('input.txt', 'input.txt.gz')
 
 It is also possible to compress or decompress data in a single step:
 
-```js
+```mjs
+import process from 'node:process';
+import { Buffer } from 'node:buffer';
+import { deflate, unzip } from 'node:zlib';
+
+const input = '.................................';
+deflate(input, (err, buffer) => {
+  if (err) {
+    console.error('An error occurred:', err);
+    process.exitCode = 1;
+  }
+  console.log(buffer.toString('base64'));
+});
+
+const buffer = Buffer.from('eJzT0yMAAGTvBe8=', 'base64');
+unzip(buffer, (err, buffer) => {
+  if (err) {
+    console.error('An error occurred:', err);
+    process.exitCode = 1;
+  }
+  console.log(buffer.toString());
+});
+
+// Or, Promisified
+
+import { promisify } from 'node:util';
+const do_unzip = promisify(unzip);
+
+const unzippedBuffer = await do_unzip(buffer);
+console.log(unzippedBuffer.toString());
+```
+
+```cjs
 const { deflate, unzip } = require('node:zlib');
 
 const input = '.................................';
@@ -104,7 +188,19 @@ limitations in some applications.
 Creating and using a large number of zlib objects simultaneously can cause
 significant memory fragmentation.
 
-```js
+```mjs
+import zlib from 'node:zlib';
+import { Buffer } from 'node:buffer';
+
+const payload = Buffer.from('This is some data');
+
+// WARNING: DO NOT DO THIS!
+for (let i = 0; i < 30000; ++i) {
+  zlib.deflate(payload, (err, buffer) => {});
+}
+```
+
+```cjs
 const zlib = require('node:zlib');
 
 const payload = Buffer.from('This is some data');
@@ -138,7 +234,47 @@ Using `zlib` encoding can be expensive, and the results ought to be cached.
 See [Memory usage tuning][] for more information on the speed/memory/compression
 tradeoffs involved in `zlib` usage.
 
-```js
+```mjs
+// Client request example
+import fs from 'node:fs';
+import zlib from 'node:zlib';
+import http from 'node:http';
+import process from 'node:process';
+import { pipeline } from 'node:stream';
+
+const request = http.get({ host: 'example.com',
+                           path: '/',
+                           port: 80,
+                           headers: { 'Accept-Encoding': 'br,gzip,deflate' } });
+request.on('response', (response) => {
+  const output = fs.createWriteStream('example.com_index.html');
+
+  const onError = (err) => {
+    if (err) {
+      console.error('An error occurred:', err);
+      process.exitCode = 1;
+    }
+  };
+
+  switch (response.headers['content-encoding']) {
+    case 'br':
+      pipeline(response, zlib.createBrotliDecompress(), output, onError);
+      break;
+    // Or, just use zlib.createUnzip() to handle both of the following cases:
+    case 'gzip':
+      pipeline(response, zlib.createGunzip(), output, onError);
+      break;
+    case 'deflate':
+      pipeline(response, zlib.createInflate(), output, onError);
+      break;
+    default:
+      pipeline(response, output, onError);
+      break;
+  }
+});
+```
+
+```cjs
 // Client request example
 const zlib = require('node:zlib');
 const http = require('node:http');
@@ -177,7 +313,52 @@ request.on('response', (response) => {
 });
 ```
 
-```js
+```mjs
+// server example
+// Running a gzip operation on every request is quite expensive.
+// It would be much more efficient to cache the compressed buffer.
+import zlib from 'node:zlib';
+import http from 'node:http';
+import fs from 'node:fs';
+import { pipeline } from 'node:stream';
+
+http.createServer((request, response) => {
+  const raw = fs.createReadStream('index.html');
+  // Store both a compressed and an uncompressed version of the resource.
+  response.setHeader('Vary', 'Accept-Encoding');
+  const acceptEncoding = request.headers['accept-encoding'] || '';
+
+  const onError = (err) => {
+    if (err) {
+      // If an error occurs, there's not much we can do because
+      // the server has already sent the 200 response code and
+      // some amount of data has already been sent to the client.
+      // The best we can do is terminate the response immediately
+      // and log the error.
+      response.end();
+      console.error('An error occurred:', err);
+    }
+  };
+
+  // Note: This is not a conformant accept-encoding parser.
+  // See https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3
+  if (/\bdeflate\b/.test(acceptEncoding)) {
+    response.writeHead(200, { 'Content-Encoding': 'deflate' });
+    pipeline(raw, zlib.createDeflate(), response, onError);
+  } else if (/\bgzip\b/.test(acceptEncoding)) {
+    response.writeHead(200, { 'Content-Encoding': 'gzip' });
+    pipeline(raw, zlib.createGzip(), response, onError);
+  } else if (/\bbr\b/.test(acceptEncoding)) {
+    response.writeHead(200, { 'Content-Encoding': 'br' });
+    pipeline(raw, zlib.createBrotliCompress(), response, onError);
+  } else {
+    response.writeHead(200, {});
+    pipeline(raw, response, onError);
+  }
+}).listen(1337);
+```
+
+```cjs
 // server example
 // Running a gzip operation on every request is quite expensive.
 // It would be much more efficient to cache the compressed buffer.
@@ -318,7 +499,43 @@ quality, but can be useful when data needs to be available as soon as possible.
 In the following example, `flush()` is used to write a compressed partial
 HTTP response to the client:
 
-```js
+```mjs
+import zlib from 'node:zlib';
+import http from 'node:http';
+import { pipeline } from 'node:stream';
+
+http.createServer((request, response) => {
+  // For the sake of simplicity, the Accept-Encoding checks are omitted.
+  response.writeHead(200, { 'content-encoding': 'gzip' });
+  const output = zlib.createGzip();
+  let i;
+
+  pipeline(output, response, (err) => {
+    if (err) {
+      // If an error occurs, there's not much we can do because
+      // the server has already sent the 200 response code and
+      // some amount of data has already been sent to the client.
+      // The best we can do is terminate the response immediately
+      // and log the error.
+      clearInterval(i);
+      response.end();
+      console.error('An error occurred:', err);
+    }
+  });
+
+  i = setInterval(() => {
+    output.write(`The current time is ${Date()}\n`, () => {
+      // The data has been passed to zlib, but the compression algorithm may
+      // have decided to buffer the data for more efficient compression.
+      // Calling .flush() will make the data available as soon as the client
+      // is ready to receive it.
+      output.flush();
+    });
+  }, 1000);
+}).listen(1337);
+```
+
+```cjs
 const zlib = require('node:zlib');
 const http = require('node:http');
 const { pipeline } = require('node:stream');

From d63ccb60ea8622a96e5a7f701f4e5f93ddbad03f Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Sun, 18 Aug 2024 00:29:34 +0000
Subject: [PATCH 127/216] deps: update zlib to 1.3.0.1-motley-7e2e4d7

PR-URL: https://github.com/nodejs/node/pull/54432
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
---
 deps/zlib/README.chromium               |  1 +
 deps/zlib/google/zip_internal.cc        |  9 ++----
 deps/zlib/google/zip_reader_unittest.cc |  7 +++--
 deps/zlib/google/zip_unittest.cc        | 37 +++++++++++++------------
 src/zlib_version.h                      |  2 +-
 5 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/deps/zlib/README.chromium b/deps/zlib/README.chromium
index 31b9d55860f9e5..92c5bfd1af200e 100644
--- a/deps/zlib/README.chromium
+++ b/deps/zlib/README.chromium
@@ -2,6 +2,7 @@ Name: zlib
 Short Name: zlib
 URL: http://zlib.net/
 Version: 1.3.0.1
+Revision: ac8f12c97d1afd9bafa9c710f827d40a407d3266
 CPEPrefix: cpe:/a:zlib:zlib:1.3.0.1
 Security Critical: yes
 Shipped: yes
diff --git a/deps/zlib/google/zip_internal.cc b/deps/zlib/google/zip_internal.cc
index aa49f4546caa0e..e6b5a4fc3bcb00 100644
--- a/deps/zlib/google/zip_internal.cc
+++ b/deps/zlib/google/zip_internal.cc
@@ -165,8 +165,7 @@ struct ZipBuffer {
 // writing compressed data and it returns NULL for this case.)
 void* OpenZipBuffer(void* opaque, const void* /*filename*/, int mode) {
   if ((mode & ZLIB_FILEFUNC_MODE_READWRITEFILTER) != ZLIB_FILEFUNC_MODE_READ) {
-    NOTREACHED_IN_MIGRATION();
-    return NULL;
+    NOTREACHED();
   }
   ZipBuffer* buffer = static_cast<ZipBuffer*>(opaque);
   if (!buffer || !buffer->data || !buffer->length)
@@ -196,8 +195,7 @@ uLong WriteZipBuffer(void* /*opaque*/,
                      void* /*stream*/,
                      const void* /*buf*/,
                      uLong /*size*/) {
-  NOTREACHED_IN_MIGRATION();
-  return 0;
+  NOTREACHED();
 }
 
 // Returns the offset from the beginning of the data.
@@ -228,8 +226,7 @@ long SeekZipBuffer(void* opaque,
     buffer->offset = std::min(buffer->length, offset);
     return 0;
   }
-  NOTREACHED_IN_MIGRATION();
-  return -1;
+  NOTREACHED();
 }
 
 // Closes the input offset and deletes all resources used for compressing or
diff --git a/deps/zlib/google/zip_reader_unittest.cc b/deps/zlib/google/zip_reader_unittest.cc
index 9d1406feff9887..46c0beb1453237 100644
--- a/deps/zlib/google/zip_reader_unittest.cc
+++ b/deps/zlib/google/zip_reader_unittest.cc
@@ -9,6 +9,7 @@
 #include <string.h>
 
 #include <iterator>
+#include <optional>
 #include <string>
 #include <string_view>
 #include <vector>
@@ -555,10 +556,10 @@ TEST_F(ZipReaderTest, ExtractToFileAsync_RegularFile) {
   const std::string md5 = base::MD5String(output);
   EXPECT_EQ(kQuuxExpectedMD5, md5);
 
-  int64_t file_size = 0;
-  ASSERT_TRUE(base::GetFileSize(target_file, &file_size));
+  std::optional<int64_t> file_size = base::GetFileSize(target_file);
+  ASSERT_TRUE(file_size.has_value());
 
-  EXPECT_EQ(file_size, listener.current_progress());
+  EXPECT_EQ(file_size.value(), listener.current_progress());
 }
 
 TEST_F(ZipReaderTest, ExtractToFileAsync_Encrypted_NoPassword) {
diff --git a/deps/zlib/google/zip_unittest.cc b/deps/zlib/google/zip_unittest.cc
index 58bafb809d6bf9..2bcfa309281fce 100644
--- a/deps/zlib/google/zip_unittest.cc
+++ b/deps/zlib/google/zip_unittest.cc
@@ -63,8 +63,9 @@ bool CreateFile(const std::string& content,
   if (!base::CreateTemporaryFile(file_path))
     return false;
 
-  if (base::WriteFile(*file_path, content.data(), content.size()) == -1)
+  if (!base::WriteFile(*file_path, content)) {
     return false;
+  }
 
   *file = base::File(
       *file_path, base::File::Flags::FLAG_OPEN | base::File::Flags::FLAG_READ);
@@ -350,7 +351,7 @@ class ZipTest : public PlatformTest {
     base::Time now_time;
     EXPECT_TRUE(base::Time::FromUTCExploded(now_parts, &now_time));
 
-    EXPECT_EQ(1, base::WriteFile(src_file, "1", 1));
+    EXPECT_TRUE(base::WriteFile(src_file, "1"));
     EXPECT_TRUE(base::TouchFile(src_file, base::Time::Now(), test_mtime));
 
     EXPECT_TRUE(zip::Zip(src_dir, zip_file, true));
@@ -748,6 +749,8 @@ TEST_F(ZipTest, UnzipMixedPaths) {
       "Space→",  //
 #else
       " ",                        //
+      "...",                      // Disappears on Windows
+      "....",                     // Disappears on Windows
       "AUX",                      // Disappears on Windows
       "COM1",                     // Disappears on Windows
       "COM2",                     // Disappears on Windows
@@ -1113,9 +1116,9 @@ TEST_F(ZipTest, UnzipFilesWithIncorrectSize) {
     SCOPED_TRACE(base::StringPrintf("Processing %d.txt", i));
     base::FilePath file_path =
         temp_dir.AppendASCII(base::StringPrintf("%d.txt", i));
-    int64_t file_size = -1;
-    EXPECT_TRUE(base::GetFileSize(file_path, &file_size));
-    EXPECT_EQ(static_cast<int64_t>(i), file_size);
+    std::optional<int64_t> file_size = base::GetFileSize(file_path);
+    EXPECT_TRUE(file_size.has_value());
+    EXPECT_EQ(static_cast<int64_t>(i), file_size.value());
   }
 }
 
@@ -1306,10 +1309,10 @@ TEST_F(ZipTest, Compressed) {
 
   // Since the source files compress well, the destination ZIP file should be
   // smaller than the source files.
-  int64_t dest_file_size;
-  ASSERT_TRUE(base::GetFileSize(dest_file, &dest_file_size));
-  EXPECT_GT(dest_file_size, 300);
-  EXPECT_LT(dest_file_size, 1000);
+  std::optional<int64_t> dest_file_size = base::GetFileSize(dest_file);
+  ASSERT_TRUE(dest_file_size.has_value());
+  EXPECT_GT(dest_file_size.value(), 300);
+  EXPECT_LT(dest_file_size.value(), 1000);
 }
 
 // Tests that a ZIP put inside a ZIP is simply stored instead of being
@@ -1338,10 +1341,10 @@ TEST_F(ZipTest, NestedZip) {
   // Since the dummy source (inner) ZIP file should simply be stored in the
   // destination (outer) ZIP file, the destination file should be bigger than
   // the source file, but not much bigger.
-  int64_t dest_file_size;
-  ASSERT_TRUE(base::GetFileSize(dest_file, &dest_file_size));
-  EXPECT_GT(dest_file_size, src_size + 100);
-  EXPECT_LT(dest_file_size, src_size + 300);
+  std::optional<int64_t> dest_file_size = base::GetFileSize(dest_file);
+  ASSERT_TRUE(dest_file_size.has_value());
+  EXPECT_GT(dest_file_size.value(), src_size + 100);
+  EXPECT_LT(dest_file_size.value(), src_size + 300);
 }
 
 // Tests that there is no 2GB or 4GB limits. Tests that big files can be zipped
@@ -1402,10 +1405,10 @@ TEST_F(ZipTest, BigFile) {
   // Since the dummy source (inner) ZIP file should simply be stored in the
   // destination (outer) ZIP file, the destination file should be bigger than
   // the source file, but not much bigger.
-  int64_t dest_file_size;
-  ASSERT_TRUE(base::GetFileSize(dest_file, &dest_file_size));
-  EXPECT_GT(dest_file_size, src_size + 100);
-  EXPECT_LT(dest_file_size, src_size + 300);
+  std::optional<int64_t> dest_file_size = base::GetFileSize(dest_file);
+  ASSERT_TRUE(dest_file_size.has_value());
+  EXPECT_GT(dest_file_size.value(), src_size + 100);
+  EXPECT_LT(dest_file_size.value(), src_size + 300);
 
   LOG(INFO) << "Reading big ZIP " << dest_file;
   zip::ZipReader reader;
diff --git a/src/zlib_version.h b/src/zlib_version.h
index f9db56d4f51d37..6484e1dce878f3 100644
--- a/src/zlib_version.h
+++ b/src/zlib_version.h
@@ -2,5 +2,5 @@
 // Refer to tools/dep_updaters/update-zlib.sh
 #ifndef SRC_ZLIB_VERSION_H_
 #define SRC_ZLIB_VERSION_H_
-#define ZLIB_VERSION "1.3.0.1-motley-71660e1"
+#define ZLIB_VERSION "1.3.0.1-motley-7e2e4d7"
 #endif  // SRC_ZLIB_VERSION_H_

From 9b6cc54b505093a79bd7a60f2d8d43246165ab45 Mon Sep 17 00:00:00 2001
From: Michael Dawson <midawson@redhat.com>
Date: Wed, 13 Nov 2024 21:50:32 +0000
Subject: [PATCH 128/216] doc: doc how to add message for promotion

Document the process for adding a message that
ambassadors will be asked to promote.

Signed-off-by: Michael Dawson <midawson@redhat.com>
PR-URL: https://github.com/nodejs/node/pull/55843
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
---
 .../advocacy-ambasador-program.md             | 30 +++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/doc/contributing/advocacy-ambasador-program.md b/doc/contributing/advocacy-ambasador-program.md
index 3b518d092e1feb..f806109f866caf 100644
--- a/doc/contributing/advocacy-ambasador-program.md
+++ b/doc/contributing/advocacy-ambasador-program.md
@@ -96,3 +96,33 @@ an ambassador. These requests can be made through the existing social channel
 in the OpenJS Slack. For that reason and for communication purposes and
 collaboration opportunities, ambassadors should be members of the
 [OpenJS Slack](https://slack-invite.openjsf.org/).
+
+## Messages and topics to promote
+
+### How to add messages or topics to promote
+
+Messages or topics that ambassadors are asked to promote are added to this
+document in the [Current messages for promotion](#current-messages-for-promotion)
+section through the standard PR process except that they should be open
+for 7 days before landing and should include an at-mention of the
+nodejs/TSC for awareness. They should be removed through the same process
+when no longer relevant.
+
+### Current messages for promotion
+
+#### Sample message (Leave this one at the top)
+
+##### Goal
+
+The goal is to raise awareness of XYZ in the JavaScript ecosystem.
+
+##### Related Links
+
+List of links with more information about the topic to provide background
+or the information to be shared.
+
+##### Project contacts
+
+Add a list of GitHub handles for those within the project that
+have volunteered to be contacted when necessary by ambassadors
+to get more info about the message to be promoted.

From 2d03f87ef7b3f20e22da69eda2f3d75c66d4198f Mon Sep 17 00:00:00 2001
From: Thomas Chetwin <tchetwin@users.noreply.github.com>
Date: Sat, 23 Nov 2024 21:23:53 +0000
Subject: [PATCH 129/216] test: convert readdir test to use test runner

Signed-off-by: tchetwin <tchetwin@bloomberg.net>
PR-URL: https://github.com/nodejs/node/pull/55750
Reviewed-By: Pietro Marchini <pietro.marchini94@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
---
 test/parallel/test-fs-readdir-recursive.js | 24 +++++++++++++---------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/test/parallel/test-fs-readdir-recursive.js b/test/parallel/test-fs-readdir-recursive.js
index f32e600d2a3660..7cfc0903faa0b4 100644
--- a/test/parallel/test-fs-readdir-recursive.js
+++ b/test/parallel/test-fs-readdir-recursive.js
@@ -1,14 +1,18 @@
 'use strict';
-const common = require('../common');
-const fs = require('fs');
-const net = require('net');
 
+const { PIPE, mustCall } = require('../common');
 const tmpdir = require('../common/tmpdir');
-tmpdir.refresh();
+const { test } = require('node:test');
+const fs = require('node:fs');
+const net = require('node:net');
 
-const server = net.createServer().listen(common.PIPE, common.mustCall(() => {
-  // The process should not crash
-  // See https://github.com/nodejs/node/issues/52159
-  fs.readdirSync(tmpdir.path, { recursive: true });
-  server.close();
-}));
+test('readdir should not recurse into Unix domain sockets', (t, done) => {
+  tmpdir.refresh();
+  const server = net.createServer().listen(PIPE, mustCall(() => {
+    // The process should not crash
+    // See https://github.com/nodejs/node/issues/52159
+    fs.readdirSync(tmpdir.path, { recursive: true });
+    server.close();
+    done();
+  }));
+});

From e99584cd575e21809d1087ae21e9761020f092f3 Mon Sep 17 00:00:00 2001
From: Arne Keller <2012gdwu+github@posteo.de>
Date: Sun, 24 Nov 2024 23:30:38 +0100
Subject: [PATCH 130/216] test: make HTTP/1.0 connection test more robust

Fixes: https://github.com/nodejs/node/issues/47200

Co-authored-by: Luigi Pinca <luigipinca@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/55959
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Jason Zhang <xzha4350@gmail.com>
---
 ...emove-connection-header-persists-connection.js | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/test/parallel/test-http-remove-connection-header-persists-connection.js b/test/parallel/test-http-remove-connection-header-persists-connection.js
index df7e39ae94375f..d5f53a703c821e 100644
--- a/test/parallel/test-http-remove-connection-header-persists-connection.js
+++ b/test/parallel/test-http-remove-connection-header-persists-connection.js
@@ -10,6 +10,13 @@ const server = http.createServer(function(request, response) {
   // For HTTP/1.0, the connection should be closed after the response automatically.
   response.removeHeader('connection');
 
+  if (request.httpVersion === '1.0') {
+    const socket = request.socket;
+    response.on('finish', common.mustCall(function() {
+      assert.ok(socket.writableEnded);
+    }));
+  }
+
   response.end('beep boop\n');
 });
 
@@ -50,9 +57,7 @@ function makeHttp10Request(cb) {
                 '\r\n');
     socket.resume(); // Ignore the response itself
 
-    setTimeout(function() {
-      cb(socket);
-    }, common.platformTimeout(50));
+    socket.on('close', cb);
   });
 }
 
@@ -62,9 +67,7 @@ server.listen(0, function() {
       // Both HTTP/1.1 requests should have used the same socket:
       assert.strictEqual(firstSocket, secondSocket);
 
-      makeHttp10Request(function(socket) {
-        // The server should have immediately closed the HTTP/1.0 socket:
-        assert.strictEqual(socket.closed, true);
+      makeHttp10Request(function() {
         server.close();
       });
     });

From 7021b3b2760478ae83678b0898da1590bdc126b9 Mon Sep 17 00:00:00 2001
From: Colin Ihrig <cjihrig@gmail.com>
Date: Sun, 24 Nov 2024 21:30:43 -0500
Subject: [PATCH 131/216] test_runner: simplify hook running logic

This commit removes some asynchronous logic from the runHook()
method and replaces ArrayPrototypeReduce() with a for loop.

PR-URL: https://github.com/nodejs/node/pull/55963
Reviewed-By: Pietro Marchini <pietro.marchini94@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Chemi Atlow <chemi@atlow.co.il>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
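Reviewer note (not part of the commit message): a minimal sketch of the
control-flow change, assuming plain arrays and hypothetical hook objects
(`makeHook`, `runHooksWithReduce`, and `runHooksWithLoop` are illustrative
names, not the real `Test` class or its primordials). Both helpers run hooks
one at a time and stop at the first recorded error; the loop version does so
without chaining an extra promise per hook, which is presumably why the
snapshot stack traces below shift.

```javascript
'use strict';

// Hypothetical stand-in for a test_runner hook: run() records a failure on
// `error` instead of throwing, mirroring how the diff checks `hook.error`.
function makeHook(name, log, error = null) {
  return {
    name,
    error: null,
    async run() {
      log.push(name);
      this.error = error;
    },
  };
}

// Before: sequential execution expressed as a promise chain built by reduce().
async function runHooksWithReduce(hooks) {
  await hooks.reduce(async (prev, hook) => {
    await prev;
    await hook.run();
    if (hook.error) {
      throw hook.error;
    }
  }, Promise.resolve());
}

// After: the same ordering with a plain loop and one await per hook.
async function runHooksWithLoop(hooks) {
  for (let i = 0; i < hooks.length; ++i) {
    const hook = hooks[i];
    await hook.run();
    if (hook.error) {
      throw hook.error;
    }
  }
}
```

In both versions a hook that follows a failing hook never runs, so the
observable ordering is unchanged; only the promise bookkeeping differs.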
 lib/internal/test_runner/test.js              |  8 +++----
 .../test-runner/output/abort_hooks.snapshot   |  4 ++--
 ...global_after_should_fail_the_test.snapshot |  1 -
 .../test-runner/output/hooks.snapshot         | 11 +++++-----
 .../output/hooks_spec_reporter.snapshot       | 22 +++++++++----------
 5 files changed, 21 insertions(+), 25 deletions(-)

diff --git a/lib/internal/test_runner/test.js b/lib/internal/test_runner/test.js
index bf808c41748431..f891315e88f312 100644
--- a/lib/internal/test_runner/test.js
+++ b/lib/internal/test_runner/test.js
@@ -2,7 +2,6 @@
 const {
   ArrayPrototypePush,
   ArrayPrototypePushApply,
-  ArrayPrototypeReduce,
   ArrayPrototypeShift,
   ArrayPrototypeSlice,
   ArrayPrototypeSome,
@@ -718,13 +717,14 @@ class Test extends AsyncResource {
   async runHook(hook, args) {
     validateOneOf(hook, 'hook name', kHookNames);
     try {
-      await ArrayPrototypeReduce(this.hooks[hook], async (prev, hook) => {
-        await prev;
+      const hooks = this.hooks[hook];
+      for (let i = 0; i < hooks.length; ++i) {
+        const hook = hooks[i];
         await hook.run(args);
         if (hook.error) {
           throw hook.error;
         }
-      }, PromiseResolve());
+      }
     } catch (err) {
       const error = new ERR_TEST_FAILURE(`failed running ${hook} hook`, kHookFailure);
       error.cause = isTestFailureError(err) ? err.cause : err;
diff --git a/test/fixtures/test-runner/output/abort_hooks.snapshot b/test/fixtures/test-runner/output/abort_hooks.snapshot
index 278b5e5fd36ca5..e318e36d9d56a4 100644
--- a/test/fixtures/test-runner/output/abort_hooks.snapshot
+++ b/test/fixtures/test-runner/output/abort_hooks.snapshot
@@ -101,7 +101,7 @@ not ok 2 - 2 after describe
         *
         *
         *
-        async Promise.all (index 0)
+        *
       ...
     # Subtest: test 2
     not ok 2 - test 2
@@ -122,7 +122,7 @@ not ok 2 - 2 after describe
         *
         *
         *
-        async Promise.all (index 0)
+        *
       ...
     1..2
 not ok 3 - 3 beforeEach describe
diff --git a/test/fixtures/test-runner/output/global_after_should_fail_the_test.snapshot b/test/fixtures/test-runner/output/global_after_should_fail_the_test.snapshot
index 3196f377b3d4bf..ee4d5f71072ba5 100644
--- a/test/fixtures/test-runner/output/global_after_should_fail_the_test.snapshot
+++ b/test/fixtures/test-runner/output/global_after_should_fail_the_test.snapshot
@@ -21,7 +21,6 @@ not ok 2 - /test/fixtures/test-runner/output/global_after_should_fail_the_test.j
     *
     *
     *
-    *
   ...
 1..1
 # tests 1
diff --git a/test/fixtures/test-runner/output/hooks.snapshot b/test/fixtures/test-runner/output/hooks.snapshot
index 6b9d6d26a90e39..be8d1b210c60e4 100644
--- a/test/fixtures/test-runner/output/hooks.snapshot
+++ b/test/fixtures/test-runner/output/hooks.snapshot
@@ -77,7 +77,6 @@ not ok 3 - before throws
     *
     *
     *
-    *
   ...
 # Subtest: before throws - no subtests
 not ok 4 - before throws - no subtests
@@ -97,7 +96,6 @@ not ok 4 - before throws - no subtests
     *
     *
     *
-    *
   ...
 # Subtest: after throws
     # Subtest: 1
@@ -129,6 +127,7 @@ not ok 5 - after throws
     *
     *
     *
+    *
   ...
 # Subtest: after throws - no subtests
 not ok 6 - after throws - no subtests
@@ -149,6 +148,7 @@ not ok 6 - after throws - no subtests
     *
     *
     *
+    *
   ...
 # Subtest: beforeEach throws
     # Subtest: 1
@@ -167,9 +167,9 @@ not ok 6 - after throws - no subtests
         *
         *
         *
-        async Promise.all (index 0)
         *
         *
+        new Promise (<anonymous>)
       ...
     # Subtest: 2
     not ok 2 - 2
@@ -188,6 +188,8 @@ not ok 6 - after throws - no subtests
         *
         *
         *
+        *
+        async Promise.all (index 0)
       ...
     1..2
 not ok 7 - beforeEach throws
@@ -482,7 +484,6 @@ not ok 15 - t.after throws
     *
     *
     *
-    *
   ...
 # Subtest: t.after throws - no subtests
 not ok 16 - t.after throws - no subtests
@@ -502,7 +503,6 @@ not ok 16 - t.after throws - no subtests
     *
     *
     *
-    *
   ...
 # Subtest: t.beforeEach throws
     # Subtest: 1
@@ -765,7 +765,6 @@ not ok 24 - run after when before throws
     *
     *
     *
-    *
   ...
 # Subtest: test hooks - async
     # Subtest: 1
diff --git a/test/fixtures/test-runner/output/hooks_spec_reporter.snapshot b/test/fixtures/test-runner/output/hooks_spec_reporter.snapshot
index b5c5ab7d1965e5..ea916c2ee754c4 100644
--- a/test/fixtures/test-runner/output/hooks_spec_reporter.snapshot
+++ b/test/fixtures/test-runner/output/hooks_spec_reporter.snapshot
@@ -26,7 +26,6 @@
       *
       *
       *
-      *
 
  before throws - no subtests (*ms)
   Error: before
@@ -38,7 +37,6 @@
       *
       *
       *
-      *
 
  after throws
    1 (*ms)
@@ -55,6 +53,7 @@
       *
       *
       *
+      *
 
  after throws - no subtests (*ms)
   Error: after
@@ -67,6 +66,7 @@
       *
       *
       *
+      *
 
  beforeEach throws
    1 (*ms)
@@ -78,9 +78,9 @@
         *
         *
         *
-        at async Promise.all (index 0)
         *
         *
+        at new Promise (<anonymous>)
 
    2 (*ms)
     Error: beforeEach
@@ -92,6 +92,8 @@
         *
         *
         *
+        *
+        at async Promise.all (index 0)
 
  beforeEach throws (*ms)
  afterEach throws
@@ -242,7 +244,6 @@
       *
       *
       *
-      *
 
  t.after throws - no subtests (*ms)
   Error: after
@@ -255,7 +256,6 @@
       *
       *
       *
-      *
 
  t.beforeEach throws
    1 (*ms)
@@ -390,7 +390,6 @@
       *
       *
       *
-      *
 
  test hooks - async
    1 (*ms)
@@ -430,7 +429,6 @@
       *
       *
       *
-      *
 
 *
  before throws - no subtests (*ms)
@@ -443,7 +441,6 @@
       *
       *
       *
-      *
 
 *
  after throws (*ms)
@@ -457,6 +454,7 @@
       *
       *
       *
+      *
 
 *
  after throws - no subtests (*ms)
@@ -470,6 +468,7 @@
       *
       *
       *
+      *
 
 *
  1 (*ms)
@@ -481,9 +480,9 @@
       *
       *
       *
-      at async Promise.all (index 0)
       *
       *
+      at new Promise (<anonymous>)
 
 *
  2 (*ms)
@@ -496,6 +495,8 @@
       *
       *
       *
+      *
+      at async Promise.all (index 0)
 
 *
  1 (*ms)
@@ -633,7 +634,6 @@
       *
       *
       *
-      *
 
 *
  t.after throws - no subtests (*ms)
@@ -647,7 +647,6 @@
       *
       *
       *
-      *
 
 *
  1 (*ms)
@@ -776,4 +775,3 @@
       *
       *
       *
-      *

From b95c4f5bf03bd539c1045899b619382c7f9a1a02 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C3=ABl=20Zasso?= <targos@protonmail.com>
Date: Mon, 25 Nov 2024 09:27:26 +0100
Subject: [PATCH 132/216] tools: use tokenless Codecov uploads
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Refs: https://docs.codecov.com/docs/codecov-tokens#uploading-without-a-token
PR-URL: https://github.com/nodejs/node/pull/55943
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
---
 .github/workflows/coverage-linux-without-intl.yml | 3 +--
 .github/workflows/coverage-linux.yml              | 3 +--
 .github/workflows/coverage-windows.yml            | 3 +--
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/.github/workflows/coverage-linux-without-intl.yml b/.github/workflows/coverage-linux-without-intl.yml
index ddd85fb8a4ff0e..1977eda3f97e03 100644
--- a/.github/workflows/coverage-linux-without-intl.yml
+++ b/.github/workflows/coverage-linux-without-intl.yml
@@ -79,7 +79,6 @@ jobs:
       - name: Clean tmp
         run: rm -rf coverage/tmp && rm -rf out
       - name: Upload
-        uses: codecov/codecov-action@b9fd7d16f6d7d1b5d2bec1a2887e65ceed900238  # v4.6.0
+        uses: codecov/codecov-action@015f24e6818733317a2da2edd6290ab26238649a  # v5.0.7
         with:
           directory: ./coverage
-          token: ${{ secrets.CODECOV_TOKEN }}
diff --git a/.github/workflows/coverage-linux.yml b/.github/workflows/coverage-linux.yml
index 153504ba4280d6..164c0b540a9f45 100644
--- a/.github/workflows/coverage-linux.yml
+++ b/.github/workflows/coverage-linux.yml
@@ -79,7 +79,6 @@ jobs:
       - name: Clean tmp
         run: rm -rf coverage/tmp && rm -rf out
       - name: Upload
-        uses: codecov/codecov-action@b9fd7d16f6d7d1b5d2bec1a2887e65ceed900238  # v4.6.0
+        uses: codecov/codecov-action@015f24e6818733317a2da2edd6290ab26238649a  # v5.0.7
         with:
           directory: ./coverage
-          token: ${{ secrets.CODECOV_TOKEN }}
diff --git a/.github/workflows/coverage-windows.yml b/.github/workflows/coverage-windows.yml
index 84feb7b09018de..fada006e321520 100644
--- a/.github/workflows/coverage-windows.yml
+++ b/.github/workflows/coverage-windows.yml
@@ -71,7 +71,6 @@ jobs:
       - name: Clean tmp
         run: npx rimraf ./coverage/tmp
       - name: Upload
-        uses: codecov/codecov-action@b9fd7d16f6d7d1b5d2bec1a2887e65ceed900238  # v4.6.0
+        uses: codecov/codecov-action@015f24e6818733317a2da2edd6290ab26238649a  # v5.0.7
         with:
           directory: ./coverage
-          token: ${{ secrets.CODECOV_TOKEN }}

From 682ae41f86633af3714521b85dde3dcaf6197d3c Mon Sep 17 00:00:00 2001
From: Matteo Collina <hello@matteocollina.com>
Date: Mon, 25 Nov 2024 09:41:12 +0100
Subject: [PATCH 133/216] doc: add vetted courses to the ambassador benefits

Signed-off-by: Matteo Collina <hello@matteocollina.com>
PR-URL: https://github.com/nodejs/node/pull/55934
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
---
 doc/contributing/advocacy-ambasador-program.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/contributing/advocacy-ambasador-program.md b/doc/contributing/advocacy-ambasador-program.md
index f806109f866caf..cfb8c5cb1cd484 100644
--- a/doc/contributing/advocacy-ambasador-program.md
+++ b/doc/contributing/advocacy-ambasador-program.md
@@ -18,6 +18,10 @@ The ambassador program does that by:
   messages and topics defined.
 * Advocating for ambassadors to be part of the OpenJS speakers bureau, even if the
   ambassador is not otherwise an active member of the project itself.
+* Allowing each ambassador to add a maximum of three links to resources for
+  learning Node.js on a dedicated page on the main Node.js website. At least one
+  of those must be a free resource. Node.js TSC members may ask for coupon codes
+  to verify the material if they so decide.
 
 ## Ambassadors nominations
 

From 65c178433729a0caacdd92e2adf4d528d4a6fd4e Mon Sep 17 00:00:00 2001
From: skyclouds2001 <95597335+skyclouds2001@users.noreply.github.com>
Date: Mon, 25 Nov 2024 18:02:32 +0800
Subject: [PATCH 134/216] doc: add doc for PerformanceObserver.takeRecords()

PR-URL: https://github.com/nodejs/node/pull/55786
Fixes: https://github.com/nodejs/node/issues/55779
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jason Zhang <xzha4350@gmail.com>
---
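Reviewer note (not part of the commit message): a short usage sketch of the
method being documented, assuming a Node.js version that ships
`PerformanceObserver.prototype.takeRecords()` (the added YAML block says
v16.0.0). The mark name is arbitrary. The second call is expected to return
an empty array because the first call drains the observer's buffer.

```javascript
'use strict';
const { performance, PerformanceObserver } = require('node:perf_hooks');

// The callback is left empty on purpose: entries are pulled manually below.
const observer = new PerformanceObserver(() => {});
observer.observe({ entryTypes: ['mark'] });

performance.mark('example-mark');

// The first call returns whatever the observer has buffered so far and
// clears the buffer, so an immediate second call has nothing left to return.
const first = observer.takeRecords();
const second = observer.takeRecords();

observer.disconnect();
console.log(first.map((entry) => entry.name), second.length);
```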
 doc/api/perf_hooks.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/api/perf_hooks.md b/doc/api/perf_hooks.md
index de0265d0f89f27..0bdf8626f476a7 100644
--- a/doc/api/perf_hooks.md
+++ b/doc/api/perf_hooks.md
@@ -1330,6 +1330,14 @@ for (let n = 0; n < 3; n++)
   performance.mark(`test${n}`);
 ```
 
+### `performanceObserver.takeRecords()`
+
+<!-- YAML
+added: v16.0.0
+-->
+
+* Returns: {PerformanceEntry\[]} Current list of entries stored in the performance observer, emptying it out.
+
 ## Class: `PerformanceObserverEntryList`
 
 <!-- YAML

From b09f6abcd3e5290c0b2913750d3a8d84d5a331d1 Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Mon, 25 Nov 2024 20:01:45 -0500
Subject: [PATCH 135/216] deps: update simdutf to 5.6.3

PR-URL: https://github.com/nodejs/node/pull/55973
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
---
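Reviewer note (not part of the commit message): the base64 change below
replaces a boolean error flag with a bitmask (`mask ^ spaces`), so the offset
of the first invalid character can be read from a trailing-zero count instead
of rescanning the block. A minimal sketch of that idea, assuming 32-bit masks
and hypothetical helper names (`tzcnt32`, `firstInvalidOffset`); the real code
is C++ operating on 64-bit masks with intrinsics.

```javascript
'use strict';

// Count trailing zero bits of a 32-bit value. Returns 32 for zero, mirroring
// the "num ? ctz(num) : 64" convention of the 64-bit helper in the diff.
function tzcnt32(x) {
  x >>>= 0;
  if (x === 0) return 32;
  // x & -x isolates the lowest set bit; clz32 then gives its position.
  return 31 - Math.clz32(x & -x);
}

// badMask:   bit i is set when byte i is not a base64 character.
// spaceMask: bit i is set when byte i is ASCII whitespace that may be skipped.
// Whitespace bytes are also flagged in badMask, so the XOR leaves exactly the
// bytes that are bad and not whitespace.
// Returns the index of the first such byte, or -1 when there is none.
function firstInvalidOffset(badMask, spaceMask) {
  const error = (badMask ^ spaceMask) >>> 0;
  return error === 0 ? -1 : tzcnt32(error);
}
```

For example, with bad bytes at positions 2 and 4 and whitespace at position 2,
`firstInvalidOffset(0b10100, 0b00100)` reports position 4.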
 deps/simdutf/simdutf.cpp | 89 +++++++++++++++++++++++-----------------
 deps/simdutf/simdutf.h   | 16 ++++----
 2 files changed, 59 insertions(+), 46 deletions(-)

diff --git a/deps/simdutf/simdutf.cpp b/deps/simdutf/simdutf.cpp
index 2da5ca7284c1f0..007fa02b165204 100644
--- a/deps/simdutf/simdutf.cpp
+++ b/deps/simdutf/simdutf.cpp
@@ -1,4 +1,4 @@
-/* auto-generated on 2024-11-14 14:52:31 -0500. Do not edit! */
+/* auto-generated on 2024-11-21 10:33:28 -0500. Do not edit! */
 /* begin file src/simdutf.cpp */
 #include "simdutf.h"
 // We include base64_tables once.
@@ -23495,7 +23495,7 @@ size_t encode_base64(char *dst, const char *src, size_t srclen,
 }
 
 template <bool base64_url>
-static inline uint64_t to_base64_mask(block64 *b, bool *error) {
+static inline uint64_t to_base64_mask(block64 *b, uint64_t *error) {
   __m512i input = b->chunks[0];
   const __m512i ascii_space_tbl = _mm512_set_epi8(
       0, 0, 13, 12, 0, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 13, 12, 0, 10,
@@ -23538,7 +23538,7 @@ static inline uint64_t to_base64_mask(block64 *b, bool *error) {
   if (mask) {
     const __mmask64 spaces = _mm512_cmpeq_epi8_mask(
         _mm512_shuffle_epi8(ascii_space_tbl, input), input);
-    *error |= (mask != spaces);
+    *error = (mask ^ spaces);
   }
   b->chunks[0] = translated;
 
@@ -23646,16 +23646,13 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       block64 b;
       load_block(&b, src);
       src += 64;
-      bool error = false;
+      uint64_t error = 0;
       uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
       if (error) {
         src -= 64;
-        while (src < srcend && scalar::base64::is_eight_byte(*src) &&
-               to_base64[uint8_t(*src)] <= 64) {
-          src++;
-        }
-        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
-                size_t(dst - dstinit)};
+        size_t error_offset = _tzcnt_u64(error);
+        return {error_code::INVALID_BASE64_CHARACTER,
+                size_t(src - srcinit + error_offset), size_t(dst - dstinit)};
       }
       if (badcharmask != 0) {
         // optimization opportunity: check for simple masks like those made of
@@ -28240,7 +28237,7 @@ struct block64 {
 };
 
 template <bool base64_url>
-static inline uint32_t to_base64_mask(__m256i *src, bool *error) {
+static inline uint32_t to_base64_mask(__m256i *src, uint32_t *error) {
   const __m256i ascii_space_tbl =
       _mm256_setr_epi8(0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x9, 0xa,
                        0x0, 0xc, 0xd, 0x0, 0x0, 0x20, 0x0, 0x0, 0x0, 0x0, 0x0,
@@ -28324,17 +28321,19 @@ static inline uint32_t to_base64_mask(__m256i *src, bool *error) {
   if (mask) {
     __m256i ascii_space =
         _mm256_cmpeq_epi8(_mm256_shuffle_epi8(ascii_space_tbl, *src), *src);
-    *error |= (mask != _mm256_movemask_epi8(ascii_space));
+    *error = (mask ^ _mm256_movemask_epi8(ascii_space));
   }
   *src = out;
   return (uint32_t)mask;
 }
 
 template <bool base64_url>
-static inline uint64_t to_base64_mask(block64 *b, bool *error) {
-  *error = 0;
-  uint64_t m0 = to_base64_mask<base64_url>(&b->chunks[0], error);
-  uint64_t m1 = to_base64_mask<base64_url>(&b->chunks[1], error);
+static inline uint64_t to_base64_mask(block64 *b, uint64_t *error) {
+  uint32_t err0 = 0;
+  uint32_t err1 = 0;
+  uint64_t m0 = to_base64_mask<base64_url>(&b->chunks[0], &err0);
+  uint64_t m1 = to_base64_mask<base64_url>(&b->chunks[1], &err1);
+  *error = err0 | ((uint64_t)err1 << 32);
   return m0 | (m1 << 32);
 }
 
@@ -28466,16 +28465,13 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       block64 b;
       load_block(&b, src);
       src += 64;
-      bool error = false;
+      uint64_t error = 0;
       uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
       if (error) {
         src -= 64;
-        while (src < srcend && scalar::base64::is_eight_byte(*src) &&
-               to_base64[uint8_t(*src)] <= 64) {
-          src++;
-        }
-        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
-                size_t(dst - dstinit)};
+        size_t error_offset = _tzcnt_u64(error);
+        return {error_code::INVALID_BASE64_CHARACTER,
+                size_t(src - srcinit + error_offset), size_t(dst - dstinit)};
       }
       if (badcharmask != 0) {
         // optimization opportunity: check for simple masks like those made of
@@ -37992,7 +37988,7 @@ struct block64 {
 };
 
 template <bool base64_url>
-static inline uint16_t to_base64_mask(__m128i *src, bool *error) {
+static inline uint16_t to_base64_mask(__m128i *src, uint32_t *error) {
   const __m128i ascii_space_tbl =
       _mm_setr_epi8(0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x9, 0xa, 0x0,
                     0xc, 0xd, 0x0, 0x0);
@@ -38059,22 +38055,42 @@ static inline uint16_t to_base64_mask(__m128i *src, bool *error) {
   if (mask) {
     __m128i ascii_space =
         _mm_cmpeq_epi8(_mm_shuffle_epi8(ascii_space_tbl, *src), *src);
-    *error |= (mask != _mm_movemask_epi8(ascii_space));
+    *error = (mask ^ _mm_movemask_epi8(ascii_space));
   }
   *src = out;
   return (uint16_t)mask;
 }
 
 template <bool base64_url>
-static inline uint64_t to_base64_mask(block64 *b, bool *error) {
-  *error = 0;
-  uint64_t m0 = to_base64_mask<base64_url>(&b->chunks[0], error);
-  uint64_t m1 = to_base64_mask<base64_url>(&b->chunks[1], error);
-  uint64_t m2 = to_base64_mask<base64_url>(&b->chunks[2], error);
-  uint64_t m3 = to_base64_mask<base64_url>(&b->chunks[3], error);
+static inline uint64_t to_base64_mask(block64 *b, uint64_t *error) {
+  uint32_t err0 = 0;
+  uint32_t err1 = 0;
+  uint32_t err2 = 0;
+  uint32_t err3 = 0;
+  uint64_t m0 = to_base64_mask<base64_url>(&b->chunks[0], &err0);
+  uint64_t m1 = to_base64_mask<base64_url>(&b->chunks[1], &err1);
+  uint64_t m2 = to_base64_mask<base64_url>(&b->chunks[2], &err2);
+  uint64_t m3 = to_base64_mask<base64_url>(&b->chunks[3], &err3);
+  *error = (err0) | ((uint64_t)err1 << 16) | ((uint64_t)err2 << 32) |
+           ((uint64_t)err3 << 48);
   return m0 | (m1 << 16) | (m2 << 32) | (m3 << 48);
 }
 
+#if defined(_MSC_VER) && !defined(__clang__)
+static inline size_t simdutf_tzcnt_u64(uint64_t num) {
+  unsigned long ret;
+  if (num == 0) {
+    return 64;
+  }
+  _BitScanForward64(&ret, num);
+  return ret;
+}
+#else // GCC or Clang
+static inline size_t simdutf_tzcnt_u64(uint64_t num) {
+  return num ? __builtin_ctzll(num) : 64;
+}
+#endif
+
 static inline void copy_block(block64 *b, char *output) {
   _mm_storeu_si128(reinterpret_cast<__m128i *>(output), b->chunks[0]);
   _mm_storeu_si128(reinterpret_cast<__m128i *>(output + 16), b->chunks[1]);
@@ -38222,16 +38238,13 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       block64 b;
       load_block(&b, src);
       src += 64;
-      bool error = false;
+      uint64_t error = 0;
       uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
       if (error) {
         src -= 64;
-        while (src < srcend && scalar::base64::is_eight_byte(*src) &&
-               to_base64[uint8_t(*src)] <= 64) {
-          src++;
-        }
-        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
-                size_t(dst - dstinit)};
+        size_t error_offset = simdutf_tzcnt_u64(error);
+        return {error_code::INVALID_BASE64_CHARACTER,
+                size_t(src - srcinit + error_offset), size_t(dst - dstinit)};
       }
       if (badcharmask != 0) {
         // optimization opportunity: check for simple masks like those made of
diff --git a/deps/simdutf/simdutf.h b/deps/simdutf/simdutf.h
index a30e5f2cd228cd..5f82ca372ccfe3 100644
--- a/deps/simdutf/simdutf.h
+++ b/deps/simdutf/simdutf.h
@@ -1,4 +1,4 @@
-/* auto-generated on 2024-11-14 14:52:31 -0500. Do not edit! */
+/* auto-generated on 2024-11-21 10:33:28 -0500. Do not edit! */
 /* begin file include/simdutf.h */
 #ifndef SIMDUTF_H
 #define SIMDUTF_H
@@ -155,11 +155,11 @@
   // RISC-V 64-bit
   #define SIMDUTF_IS_RISCV64 1
 
-  #if __clang_major__ >= 19
-    // Does the compiler support target regions for RISC-V
-    #define SIMDUTF_HAS_RVV_TARGET_REGION 1
-  #endif
-
+  // #if __riscv_v_intrinsic >= 1000000
+  //   #define SIMDUTF_HAS_RVV_INTRINSICS 1
+  //   #define SIMDUTF_HAS_RVV_TARGET_REGION 1
+  // #elif ...
+  //  Check for special compiler versions that implement pre v1.0 intrinsics
   #if __riscv_v_intrinsic >= 11000
     #define SIMDUTF_HAS_RVV_INTRINSICS 1
   #endif
@@ -670,7 +670,7 @@ SIMDUTF_DISABLE_UNDESIRED_WARNINGS
 #define SIMDUTF_SIMDUTF_VERSION_H
 
 /** The version of simdutf being used (major.minor.revision) */
-#define SIMDUTF_VERSION "5.6.2"
+#define SIMDUTF_VERSION "5.6.3"
 
 namespace simdutf {
 enum {
@@ -685,7 +685,7 @@ enum {
   /**
    * The revision (major.minor.REVISION) of simdutf being used.
    */
-  SIMDUTF_VERSION_REVISION = 2
+  SIMDUTF_VERSION_REVISION = 3
 };
 } // namespace simdutf
 

From f7131cf178231f578f1da2aa7ff52a427c953b98 Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Mon, 25 Nov 2024 20:02:35 -0500
Subject: [PATCH 136/216] deps: update corepack to 0.30.0
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55977
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 deps/corepack/CHANGELOG.md          | 12 ++++++++++++
 deps/corepack/dist/corepack.js      |  1 +
 deps/corepack/dist/lib/corepack.cjs | 20 +++++++++++++-------
 deps/corepack/dist/npm.js           |  1 +
 deps/corepack/dist/npx.js           |  1 +
 deps/corepack/dist/pnpm.js          |  1 +
 deps/corepack/dist/pnpx.js          |  1 +
 deps/corepack/dist/yarn.js          |  1 +
 deps/corepack/dist/yarnpkg.js       |  1 +
 deps/corepack/package.json          |  2 +-
 10 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/deps/corepack/CHANGELOG.md b/deps/corepack/CHANGELOG.md
index 7de934c0d2c0db..941d0b6b7e5e25 100644
--- a/deps/corepack/CHANGELOG.md
+++ b/deps/corepack/CHANGELOG.md
@@ -1,5 +1,17 @@
 # Changelog
 
+## [0.30.0](https://github.com/nodejs/corepack/compare/v0.29.4...v0.30.0) (2024-11-23)
+
+
+### Features
+
+* update package manager versions ([#578](https://github.com/nodejs/corepack/issues/578)) ([a286c8f](https://github.com/nodejs/corepack/commit/a286c8f5537ea9ecf9b6ff53c7bc3e8da4e3c8bb))
+
+
+### Performance Improvements
+
+* prefer `module.enableCompileCache` over `v8-compile-cache` ([#574](https://github.com/nodejs/corepack/issues/574)) ([cba6905](https://github.com/nodejs/corepack/commit/cba690575bd606faeee54bd512ccb8797d49055f))
+
 ## [0.29.4](https://github.com/nodejs/corepack/compare/v0.29.3...v0.29.4) (2024-09-07)
 
 
diff --git a/deps/corepack/dist/corepack.js b/deps/corepack/dist/corepack.js
index b1b22662466f86..6179b11c083cb5 100755
--- a/deps/corepack/dist/corepack.js
+++ b/deps/corepack/dist/corepack.js
@@ -1,3 +1,4 @@
 #!/usr/bin/env node
 process.env.COREPACK_ENABLE_DOWNLOAD_PROMPT??='0';
+require('module').enableCompileCache?.();
 require('./lib/corepack.cjs').runMain(process.argv.slice(2));
\ No newline at end of file
diff --git a/deps/corepack/dist/lib/corepack.cjs b/deps/corepack/dist/lib/corepack.cjs
index 2978fc336232e0..e1919339dc38bd 100644
--- a/deps/corepack/dist/lib/corepack.cjs
+++ b/deps/corepack/dist/lib/corepack.cjs
@@ -21260,7 +21260,7 @@ function String2(descriptor, ...args) {
 }
 
 // package.json
-var version = "0.29.4";
+var version = "0.30.0";
 
 // sources/Engine.ts
 var import_fs9 = __toESM(require("fs"));
@@ -21274,7 +21274,7 @@ var import_valid3 = __toESM(require_valid2());
 var config_default = {
   definitions: {
     npm: {
-      default: "10.8.3+sha1.e6085b2864fcfd9b1aad7b602601b5a2fc116699",
+      default: "10.9.1+sha1.ab141c1229765c11c8c59060fc9cf450a2207bd6",
       fetchLatestFrom: {
         type: "npm",
         package: "npm"
@@ -21311,7 +21311,7 @@ var config_default = {
       }
     },
     pnpm: {
-      default: "9.9.0+sha1.3edbe440f4e570aa8f049adbd06b9483d55cc2d2",
+      default: "9.14.2+sha1.5202b50ab92394b3c922d2e293f196e2df6d441b",
       fetchLatestFrom: {
         type: "npm",
         package: "pnpm"
@@ -21375,7 +21375,7 @@ var config_default = {
         package: "yarn"
       },
       transparent: {
-        default: "4.4.1+sha224.fd21d9eb5fba020083811af1d4953acc21eeb9f6ff97efd1b3f9d4de",
+        default: "4.5.2+sha224.c2e2e9ed3cdadd6ec250589b3393f71ae56d5ec297af11cec1eba3b4",
         commands: [
           [
             "yarn",
@@ -21965,8 +21965,11 @@ async function runVersion(locator, installSpec, binName, args) {
   }
   if (!binPath)
     throw new Error(`Assertion failed: Unable to locate path for bin '${binName}'`);
-  if (locator.name !== `npm` || (0, import_lt.default)(locator.reference, `9.7.0`))
-    await Promise.resolve().then(() => __toESM(require_v8_compile_cache()));
+  if (!import_module.default.enableCompileCache) {
+    if (locator.name !== `npm` || (0, import_lt.default)(locator.reference, `9.7.0`)) {
+      await Promise.resolve().then(() => __toESM(require_v8_compile_cache()));
+    }
+  }
   process.env.COREPACK_ROOT = import_path7.default.dirname(require.resolve("corepack/package.json"));
   process.argv = [
     process.execPath,
@@ -21976,6 +21979,9 @@ async function runVersion(locator, installSpec, binName, args) {
   process.execArgv = [];
   process.mainModule = void 0;
   process.nextTick(import_module.default.runMain, binPath);
+  if (import_module.default.flushCompileCache) {
+    setImmediate(import_module.default.flushCompileCache);
+  }
 }
 function shouldSkipIntegrityCheck() {
   return process.env.COREPACK_INTEGRITY_KEYS === `` || process.env.COREPACK_INTEGRITY_KEYS === `0`;
@@ -22553,7 +22559,7 @@ var EnableCommand = class extends Command {
     [`enable`]
   ];
   static usage = Command.Usage({
-    description: `Add the Corepack shims to the install directories`,
+    description: `Add the Corepack shims to the install directory`,
     details: `
       When run, this command will check whether the shims for the specified package managers can be found with the correct values inside the install directory. If not, or if they don't exist, they will be created.
 
diff --git a/deps/corepack/dist/npm.js b/deps/corepack/dist/npm.js
index 7d10ba5bdf36b2..75f68b058f2dd6 100755
--- a/deps/corepack/dist/npm.js
+++ b/deps/corepack/dist/npm.js
@@ -1,3 +1,4 @@
 #!/usr/bin/env node
 process.env.COREPACK_ENABLE_DOWNLOAD_PROMPT??='1'
+require('module').enableCompileCache?.();
 require('./lib/corepack.cjs').runMain(['npm', ...process.argv.slice(2)]);
\ No newline at end of file
diff --git a/deps/corepack/dist/npx.js b/deps/corepack/dist/npx.js
index a8bd3e69014313..b1138bb48e1a82 100755
--- a/deps/corepack/dist/npx.js
+++ b/deps/corepack/dist/npx.js
@@ -1,3 +1,4 @@
 #!/usr/bin/env node
 process.env.COREPACK_ENABLE_DOWNLOAD_PROMPT??='1'
+require('module').enableCompileCache?.();
 require('./lib/corepack.cjs').runMain(['npx', ...process.argv.slice(2)]);
\ No newline at end of file
diff --git a/deps/corepack/dist/pnpm.js b/deps/corepack/dist/pnpm.js
index a0a87263435562..56ba509405033d 100755
--- a/deps/corepack/dist/pnpm.js
+++ b/deps/corepack/dist/pnpm.js
@@ -1,3 +1,4 @@
 #!/usr/bin/env node
 process.env.COREPACK_ENABLE_DOWNLOAD_PROMPT??='1'
+require('module').enableCompileCache?.();
 require('./lib/corepack.cjs').runMain(['pnpm', ...process.argv.slice(2)]);
\ No newline at end of file
diff --git a/deps/corepack/dist/pnpx.js b/deps/corepack/dist/pnpx.js
index 57ad4842631cd7..ee36be2e99c686 100755
--- a/deps/corepack/dist/pnpx.js
+++ b/deps/corepack/dist/pnpx.js
@@ -1,3 +1,4 @@
 #!/usr/bin/env node
 process.env.COREPACK_ENABLE_DOWNLOAD_PROMPT??='1'
+require('module').enableCompileCache?.();
 require('./lib/corepack.cjs').runMain(['pnpx', ...process.argv.slice(2)]);
\ No newline at end of file
diff --git a/deps/corepack/dist/yarn.js b/deps/corepack/dist/yarn.js
index eaed8596eabaa3..ce628c82b6a782 100755
--- a/deps/corepack/dist/yarn.js
+++ b/deps/corepack/dist/yarn.js
@@ -1,3 +1,4 @@
 #!/usr/bin/env node
 process.env.COREPACK_ENABLE_DOWNLOAD_PROMPT??='1'
+require('module').enableCompileCache?.();
 require('./lib/corepack.cjs').runMain(['yarn', ...process.argv.slice(2)]);
\ No newline at end of file
diff --git a/deps/corepack/dist/yarnpkg.js b/deps/corepack/dist/yarnpkg.js
index aada6032fa67ff..9541ed726aaa3b 100755
--- a/deps/corepack/dist/yarnpkg.js
+++ b/deps/corepack/dist/yarnpkg.js
@@ -1,3 +1,4 @@
 #!/usr/bin/env node
 process.env.COREPACK_ENABLE_DOWNLOAD_PROMPT??='1'
+require('module').enableCompileCache?.();
 require('./lib/corepack.cjs').runMain(['yarnpkg', ...process.argv.slice(2)]);
\ No newline at end of file
diff --git a/deps/corepack/package.json b/deps/corepack/package.json
index 571c359407e07a..c9c6662e99e6c9 100644
--- a/deps/corepack/package.json
+++ b/deps/corepack/package.json
@@ -1,6 +1,6 @@
 {
   "name": "corepack",
-  "version": "0.29.4",
+  "version": "0.30.0",
   "homepage": "https://github.com/nodejs/corepack#readme",
   "bugs": {
     "url": "https://github.com/nodejs/corepack/issues"

From 7ba6dcf180a7f8fcb5f7a3e1d89fd44985715baa Mon Sep 17 00:00:00 2001
From: ywave620 <60539365+ywave620@users.noreply.github.com>
Date: Tue, 26 Nov 2024 11:25:22 +0800
Subject: [PATCH 137/216] http2: fix memory leak caused by premature listener
 removing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Http2Session should always call ondone into JS to detach the
handle. In some cases, ondone is deferred and called later by the
StreamListener through WriteWrap, so we must account for that
before removing the StreamListener.

PR-URL: https://github.com/nodejs/node/pull/55966
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
---
 src/node_http2.cc                             |  4 +-
 ...-h2leak-destroy-session-on-socket-ended.js | 80 +++++++++++++++++++
 2 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 test/parallel/test-h2leak-destroy-session-on-socket-ended.js

diff --git a/src/node_http2.cc b/src/node_http2.cc
index 1569149bc47712..73a3836cfeff1b 100644
--- a/src/node_http2.cc
+++ b/src/node_http2.cc
@@ -792,13 +792,15 @@ void Http2Session::Close(uint32_t code, bool socket_closed) {
     CHECK_EQ(nghttp2_session_terminate_session(session_.get(), code), 0);
     SendPendingData();
   } else if (stream_ != nullptr) {
+    // Remove ourselves so that the previous listener of the socket (typically
+    // the JS code of a (tls) socket) is notified of any later activity.
     stream_->RemoveStreamListener(this);
   }
 
   set_destroyed();
 
   // If we are writing we will get to make the callback in OnStreamAfterWrite.
-  if (!is_write_in_progress()) {
+  if (!is_write_in_progress() || !stream_) {
     Debug(this, "make done session callback");
     HandleScope scope(env()->isolate());
     MakeCallback(env()->ondone_string(), 0, nullptr);
diff --git a/test/parallel/test-h2leak-destroy-session-on-socket-ended.js b/test/parallel/test-h2leak-destroy-session-on-socket-ended.js
new file mode 100644
index 00000000000000..3f0fe3e69d924d
--- /dev/null
+++ b/test/parallel/test-h2leak-destroy-session-on-socket-ended.js
@@ -0,0 +1,80 @@
+'use strict';
+// Flags: --expose-gc
+
+const common = require('../common');
+if (!common.hasCrypto)
+  common.skip('missing crypto');
+const http2 = require('http2');
+const tls = require('tls');
+const fixtures = require('../common/fixtures');
+const assert = require('assert');
+
+const registry = new FinalizationRegistry(common.mustCall((name) => {
+  assert.strictEqual(name, 'session');
+}));
+
+const server = http2.createSecureServer({
+  key: fixtures.readKey('agent1-key.pem'),
+  cert: fixtures.readKey('agent1-cert.pem'),
+});
+
+let firstServerStream;
+
+
+server.on('secureConnection', (s) => {
+  console.log('secureConnection');
+  s.on('end', () => {
+    console.log(s.destroyed); // false !!
+    s.destroy();
+    firstServerStream.session.destroy();
+
+    firstServerStream = null;
+
+    setImmediate(() => {
+      global.gc();
+      global.gc();
+
+      server.close();
+    });
+  });
+});
+
+server.on('session', (s) => {
+  registry.register(s, 'session');
+});
+
+server.on('stream', (stream) => {
+  console.log('stream...');
+  stream.write('a'.repeat(1024));
+  firstServerStream = stream;
+  setImmediate(() => console.log('Draining setImmediate after writing'));
+});
+
+
+server.listen(() => {
+  client();
+});
+
+
+const h2fstStream = [
+  'UFJJICogSFRUUC8yLjANCg0KU00NCg0K',
+  // http message (1st stream:)
+  'AAAABAAAAAAA',
+  'AAAPAQUAAAABhIJBiqDkHROdCbjwHgeG',
+];
+function client() {
+  const client = tls.connect({
+    port: server.address().port,
+    host: 'localhost',
+    rejectUnauthorized: false,
+    ALPNProtocols: ['h2']
+  }, () => {
+    client.end(Buffer.concat(h2fstStream.map((s) => Buffer.from(s, 'base64'))), (err) => {
+      assert.ifError(err);
+    });
+  });
+
+  client.on('error', (error) => {
+    console.error('Connection error:', error);
+  });
+}

From 25e1862e8706285d635e15f8cb29cb9ebabd6151 Mon Sep 17 00:00:00 2001
From: Shelley Vohr <shelley.vohr@gmail.com>
Date: Tue, 26 Nov 2024 11:01:09 +0100
Subject: [PATCH 138/216] build: set node_arch to target_cpu in GN
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55967
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
---
 unofficial.gni | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/unofficial.gni b/unofficial.gni
index b079fca822df24..fedf2e5ed55274 100644
--- a/unofficial.gni
+++ b/unofficial.gni
@@ -91,7 +91,7 @@ template("node_gn_build") {
     if (current_cpu == "x86") {
       node_arch = "ia32"
     } else {
-      node_arch = current_cpu
+      node_arch = target_cpu
     }
     if (target_os == "win") {
       node_platform = "win32"

From 1e09d258da9a5044f4d1a05c60cf3f8f16a95c85 Mon Sep 17 00:00:00 2001
From: Mert Can Altin <mertgold60@gmail.com>
Date: Tue, 26 Nov 2024 16:43:43 +0300
Subject: [PATCH 139/216] tools: add WPT updater for specific subsystems

PR-URL: https://github.com/nodejs/node/pull/54460
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
---
 .github/workflows/update-wpt.yml | 75 ++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 .github/workflows/update-wpt.yml

diff --git a/.github/workflows/update-wpt.yml b/.github/workflows/update-wpt.yml
new file mode 100644
index 00000000000000..4f7eb286ce0462
--- /dev/null
+++ b/.github/workflows/update-wpt.yml
@@ -0,0 +1,75 @@
+name: WPT update
+
+on:
+  schedule:
+    # Run once a week at 12:00 AM UTC on Sunday.
+    - cron: 0 0 * * 0
+  workflow_dispatch:
+    inputs:
+      subsystems:
+        description: Subsystem to run the update for
+        required: false
+        default: '["url", "WebCryptoAPI"]'
+
+permissions:
+  contents: read
+
+env:
+  NODE_VERSION: lts/*
+
+jobs:
+  wpt-subsystem-update:
+    if: github.repository == 'nodejs/node' || github.event_name == 'workflow_dispatch'
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        subsystem: ${{ fromJSON(github.event.inputs.subsystems || '["url", "WebCryptoAPI"]') }}
+
+    steps:
+      - uses: actions/checkout@6d193bf28034eafb982f37bd894289fe649468fc  # v4.1.7
+        with:
+          persist-credentials: false
+
+      - name: Install Node.js
+        uses: actions/setup-node@1e60f620b9541d16bece96c5465dc8ee9832be0b  # v4.0.3
+        with:
+          node-version: ${{ env.NODE_VERSION }}
+
+      - name: Install @node-core/utils
+        run: npm install -g @node-core/utils
+
+      - name: Setup @node-core/utils
+        run: |
+          ncu-config set username "$USERNAME"
+          ncu-config set token "$GH_TOKEN"
+          ncu-config set jenkins_token "$JENKINS_TOKEN"
+          ncu-config set owner "${{ github.repository_owner }}"
+          ncu-config set repo "$(echo ${{ github.repository }} | cut -d/ -f2)"
+        env:
+          USERNAME: ${{ secrets.JENKINS_USER }}
+          GH_TOKEN: ${{ secrets.GH_USER_TOKEN }}
+          JENKINS_TOKEN: ${{ secrets.JENKINS_TOKEN }}
+
+      - name: Update WPT for subsystem ${{ matrix.subsystem }}
+        run: |
+          git node wpt ${{ matrix.subsystem }}
+
+      - name: Calculate new version for WPT using jq
+        run: |
+          new_version=$(jq -r '.${{ matrix.subsystem }}.commit' test/fixtures/wpt/versions.json)
+          echo "new_version=$new_version" >> $GITHUB_ENV
+
+      - name: Open or update PR for the subsystem update
+        uses: gr2m/create-or-update-pull-request-action@86ec1766034c8173518f61d2075cc2a173fb8c97
+        with:
+          branch: actions/update-wpt-${{ matrix.subsystem }}
+          author: Node.js GitHub Bot <github-bot@iojs.org>
+          title: 'test: update WPT for ${{ matrix.subsystem }} to ${{ env.new_version }}'
+          commit-message: 'test: update WPT for ${{ matrix.subsystem }} to ${{ env.new_version }}'
+          labels: test
+          update-pull-request-title-and-body: true
+          body: |
+            This is an automated update of the WPT for ${{ matrix.subsystem }} to ${{ env.new_version }}.
+        env:
+          GITHUB_TOKEN: ${{ secrets.GH_USER_TOKEN }}

From 954e60b87dcd0873259684241c6083cb498477ba Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Tue, 26 Nov 2024 20:04:41 +0000
Subject: [PATCH 140/216] tools: update WPT updater

Update the workflow to support subsystems that contain non-alphanumeric
characters. Remove the Jenkins token as it is unused. Update external actions.
Use shorter ref for upstream commits.

PR-URL: https://github.com/nodejs/node/pull/56003
Reviewed-By: Filip Skokan <panva.ip@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
---
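Note (illustrative, not part of the upstream change): the workflow now
looks up the subsystem by bracket access instead of splicing its name into
a jq path, so names containing characters such as `/` or `-` resolve
correctly, and it derives a 10-character short ref from the full commit
hash. A minimal sketch of that lookup follows; the function name
`versionRefs` and the sample data are assumptions made for this example.

```javascript
// Hypothetical helper mirroring the "Retrieve new version commit" step:
// `versions` stands in for the parsed test/fixtures/wpt/versions.json.
function versionRefs(versions, subsystem) {
  // Bracket access works for any key, including non-alphanumeric ones.
  const commit = versions[subsystem].commit;
  return {
    long: commit,
    short: commit.slice(0, 10), // same cut as ${new_version:0:10} in bash
  };
}
```
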
 .github/workflows/update-wpt.yml | 36 +++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/.github/workflows/update-wpt.yml b/.github/workflows/update-wpt.yml
index 4f7eb286ce0462..71cd1bab487735 100644
--- a/.github/workflows/update-wpt.yml
+++ b/.github/workflows/update-wpt.yml
@@ -27,7 +27,7 @@ jobs:
         subsystem: ${{ fromJSON(github.event.inputs.subsystems || '["url", "WebCryptoAPI"]') }}
 
     steps:
-      - uses: actions/checkout@6d193bf28034eafb982f37bd894289fe649468fc  # v4.1.7
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           persist-credentials: false
 
@@ -43,33 +43,41 @@ jobs:
         run: |
           ncu-config set username "$USERNAME"
           ncu-config set token "$GH_TOKEN"
-          ncu-config set jenkins_token "$JENKINS_TOKEN"
-          ncu-config set owner "${{ github.repository_owner }}"
-          ncu-config set repo "$(echo ${{ github.repository }} | cut -d/ -f2)"
+          ncu-config set owner "${GITHUB_REPOSITORY_OWNER}"
+          ncu-config set repo "$(echo "$GITHUB_REPOSITORY" | cut -d/ -f2)"
         env:
           USERNAME: ${{ secrets.JENKINS_USER }}
           GH_TOKEN: ${{ secrets.GH_USER_TOKEN }}
-          JENKINS_TOKEN: ${{ secrets.JENKINS_TOKEN }}
 
       - name: Update WPT for subsystem ${{ matrix.subsystem }}
         run: |
-          git node wpt ${{ matrix.subsystem }}
+          git node wpt "$SUBSYSTEM"
+        env:
+          SUBSYSTEM: ${{ matrix.subsystem }}
 
-      - name: Calculate new version for WPT using jq
+      - name: Retrieve new version commit
         run: |
-          new_version=$(jq -r '.${{ matrix.subsystem }}.commit' test/fixtures/wpt/versions.json)
-          echo "new_version=$new_version" >> $GITHUB_ENV
+          new_version="$(
+            node -p 'require("./test/fixtures/wpt/versions.json")[process.argv[1]].commit' "$SUBSYSTEM"
+          )"
+          {
+            echo "long_version=$new_version"
+            echo "short_version=${new_version:0:10}"
+          } >> "$GITHUB_ENV"
+        env:
+          SUBSYSTEM: ${{ matrix.subsystem }}
 
       - name: Open or update PR for the subsystem update
-        uses: gr2m/create-or-update-pull-request-action@86ec1766034c8173518f61d2075cc2a173fb8c97
+        uses: gr2m/create-or-update-pull-request-action@77596e3166f328b24613f7082ab30bf2d93079d5
         with:
           branch: actions/update-wpt-${{ matrix.subsystem }}
           author: Node.js GitHub Bot <github-bot@iojs.org>
-          title: 'test: update WPT for ${{ matrix.subsystem }} to ${{ env.new_version }}'
-          commit-message: 'test: update WPT for ${{ matrix.subsystem }} to ${{ env.new_version }}'
+          title: 'test: update WPT for ${{ matrix.subsystem }} to ${{ env.short_version }}'
+          commit-message: 'test: update WPT for ${{ matrix.subsystem }} to ${{ env.short_version }}'
           labels: test
           update-pull-request-title-and-body: true
-          body: |
-            This is an automated update of the WPT for ${{ matrix.subsystem }} to ${{ env.new_version }}.
+          body: >
+            This is an automated update of the WPT for ${{ matrix.subsystem }} to
+            https://github.com/web-platform-tests/wpt/commit/${{ env.long_version }}.
         env:
           GITHUB_TOKEN: ${{ secrets.GH_USER_TOKEN }}

From d25bcfd0b2aee617f8591f5c3bc27053e908831d Mon Sep 17 00:00:00 2001
From: Luigi Pinca <luigipinca@gmail.com>
Date: Wed, 27 Nov 2024 11:26:27 +0100
Subject: [PATCH 141/216] doc: remove confusing and outdated sentence

Remove confusing and outdated sentence in `doc/api/stream.md`.

Fixes: https://github.com/nodejs/node/issues/55987
PR-URL: https://github.com/nodejs/node/pull/55988
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Jason Zhang <xzha4350@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
---
 doc/api/stream.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/doc/api/stream.md b/doc/api/stream.md
index e5116b42650e41..624b655a81e741 100644
--- a/doc/api/stream.md
+++ b/doc/api/stream.md
@@ -3830,8 +3830,6 @@ added: v8.0.0
 
 The `_destroy()` method is called by [`writable.destroy()`][writable-destroy].
 It can be overridden by child classes but it **must not** be called directly.
-Furthermore, the `callback` should not be mixed with async/await
-once it is executed when a promise is resolved.
 
 #### `writable._final(callback)`
 

From 1b3163826236856d1c5321c458860a5a132474c9 Mon Sep 17 00:00:00 2001
From: Shelley Vohr <shelley.vohr@gmail.com>
Date: Wed, 27 Nov 2024 15:11:35 +0100
Subject: [PATCH 142/216] doc: improve GN build documentation a bit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55968
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Stefan Stojanovic <stefan.stojanovic@janeasystems.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
---
 doc/contributing/gn-build.md | 53 +++++++++++++++++++++---------------
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/doc/contributing/gn-build.md b/doc/contributing/gn-build.md
index 0981472805a4f8..9054ed5e7d41c4 100644
--- a/doc/contributing/gn-build.md
+++ b/doc/contributing/gn-build.md
@@ -28,27 +28,33 @@ Node.js contains following GN build files:
 
 Unlike GYP, the GN tool does not include any built-in rules for compiling a
 project, which means projects building with GN must provide their own build
-configurations for things like how to invoke a C++ compiler. Chromium related
-projects like V8 and skia choose to reuse Chromium's build configurations, and
-V8's Node.js integration testing repository
-([node-ci](https://chromium.googlesource.com/v8/node-ci/)) can be reused for
-building Node.js.
+configurations for things like how to invoke a C++ compiler.
+
+Chromium related projects like V8 and skia choose to reuse Chromium's build
+configurations, and V8's Node.js integration testing repository
+[`node-ci`][node-ci] can be reused for building Node.js.
 
 ### 1. Install `depot_tools`
 
-The `depot_tools` is a set of tools used by Chromium related projects for
-checking out code and managing dependencies, and since this guide is reusing the
-infra of V8, it needs to be installed and added to `PATH`:
+You'll need to install [`depot_tools`][depot-tools], the toolset
+used for fetching Chromium and its dependencies.
 
 ```bash
 git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
 export PATH=/path/to/depot_tools:$PATH
 ```
 
-You can also follow the [official tutorial of
-`depot_tools`](https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/depot_tools_tutorial.html).
+You can ensure `depot_tools` is correctly added to your PATH by running
+`which gn` and confirming that it returns `/path/to/depot_tools/gn`.
+
+**NOTE:** On Windows you'll also need to set the environment variable
+`DEPOT_TOOLS_WIN_TOOLCHAIN=0`. To do so, open `Control Panel` → `System and
+Security` → `System` → `Advanced system settings` and add a system variable
+`DEPOT_TOOLS_WIN_TOOLCHAIN` with value `0`. This tells `depot_tools` to use
+your locally installed version of Visual Studio (by default, `depot_tools` will
+try to download a Google-internal version that only Googlers have access to).
 
-### 2. Check out code of Node.js
+### 2. Check out Node.js source code
 
 To check out the latest main branch of Node.js for building, use the `fetch`
 tool from `depot_tools`:
@@ -91,9 +97,9 @@ out at `node_gn/node/node`.
 
 ### 3. Build
 
-GN only supports [`ninja`](https://ninja-build.org) for building, so to build
-Node.js with GN, `ninja` build files should be generated first, and then
-`ninja` can be invoked to do the building.
+GN only supports [`ninja`](https://ninja-build.org) for building. To build
+Node.js with GN you'll first need to generate `ninja` build files and then invoke
+`ninja` to perform the build.
 
 The `node-ci` repository provides a script for calling GN:
 
@@ -103,9 +109,10 @@ cd node  # Enter `node_gn/node` which contains a node-ci checkout
 ```
 
 which writes `ninja` build files into the `out/Release` directory under
-`node_gn/node`.
+`node_gn/node`. To see all possible configurable options, run
+`tools/gn-gen.py --help`.
 
-And then you can execute `ninja`:
+When `gn-gen.py` has executed successfully, you can then execute `ninja`:
 
 ```bash
 ninja -C out/Release node
@@ -116,10 +123,12 @@ After the build is completed, the compiled Node.js executable can be found in
 
 ## Status of the GN build
 
-Currently the GN build of Node.js is not fully functioning. It builds for macOS
-and Linux, while the Windows build is still a work in progress. And some tests
-are still failing with the GN build.
+Currently the GN build of Node.js is not fully functioning. Some tests
+are still failing with the GN build, and there may be other small pitfalls
+for certain configuration options.
+
+An effort is currently underway to make GN build work without using `depot_tools`,
+which is tracked in [#51689](https://github.com/nodejs/node/issues/51689).
 
-There are also efforts on making GN build work without using `depot_tools`,
-which is tracked in the issue
-[#51689](https://github.com/nodejs/node/issues/51689).
+[depot-tools]: https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/depot_tools_tutorial.html#_setting_up
+[node-ci]: https://chromium.googlesource.com/v8/node-ci

From e016f68c73bc5e3b998f9279b587216aa4e07529 Mon Sep 17 00:00:00 2001
From: Luigi Pinca <luigipinca@gmail.com>
Date: Wed, 27 Nov 2024 15:11:49 +0100
Subject: [PATCH 143/216] doc: add history entry for textEncoder.encodeInto()

Fixes: https://github.com/nodejs/node/issues/55938
PR-URL: https://github.com/nodejs/node/pull/55990
Refs: https://github.com/nodejs/node/pull/29524
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
---
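Note (illustrative, not part of the upstream change): for readers unfamiliar
with the method whose history entry is added here, the sketch below shows a
plain call to `textEncoder.encodeInto(src, dest)`. The input string and the
buffer size are arbitrary choices for this example.

```javascript
// encodeInto() writes the UTF-8 encoding of `src` into an existing
// Uint8Array and reports how much was consumed and produced.
const encoder = new TextEncoder();
const dest = new Uint8Array(8);

// 'h\u00e9llo' has 5 UTF-16 code units; U+00E9 needs 2 bytes in UTF-8.
const { read, written } = encoder.encodeInto('h\u00e9llo', dest);
console.log(read, written); // prints "5 6"
```
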
 doc/api/util.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/api/util.md b/doc/api/util.md
index 334d89e36c0a2f..702b9468dc9b47 100644
--- a/doc/api/util.md
+++ b/doc/api/util.md
@@ -2047,6 +2047,10 @@ encoded bytes.
 
 ### `textEncoder.encodeInto(src, dest)`
 
+<!-- YAML
+added: v12.11.0
+-->
+
 * `src` {string} The text to encode.
 * `dest` {Uint8Array} The array to hold the encode result.
 * Returns: {Object}

From 1dcf8dfedbbff8b876779a8f63ce27531b349704 Mon Sep 17 00:00:00 2001
From: Luigi Pinca <luigipinca@gmail.com>
Date: Wed, 27 Nov 2024 15:24:53 +0100
Subject: [PATCH 144/216] doc: move history entry to class description

Move the history entry for the `TextDecoder` class into the class
description itself instead of its constructor.

Refs: https://github.com/nodejs/node/issues/55938
PR-URL: https://github.com/nodejs/node/pull/55991
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
---
 doc/api/util.md | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/doc/api/util.md b/doc/api/util.md
index 702b9468dc9b47..276cf5ccd82946 100644
--- a/doc/api/util.md
+++ b/doc/api/util.md
@@ -1879,6 +1879,10 @@ The full list of formats can be found in [modifiers][].
 
 <!-- YAML
 added: v8.3.0
+changes:
+  - version: v11.0.0
+    pr-url: https://github.com/nodejs/node/pull/22281
+    description: The class is now available on the global object.
 -->
 
 An implementation of the [WHATWG Encoding Standard][] `TextDecoder` API.
@@ -1957,14 +1961,6 @@ is not supported.
 
 ### `new TextDecoder([encoding[, options]])`
 
-<!-- YAML
-added: v8.3.0
-changes:
-  - version: v11.0.0
-    pr-url: https://github.com/nodejs/node/pull/22281
-    description: The class is now available on the global object.
--->
-
 * `encoding` {string} Identifies the `encoding` that this `TextDecoder` instance
   supports. **Default:** `'utf-8'`.
 * `options` {Object}

From 9f3ef4a434b6ffe65a9dc74279433a1c8051cc75 Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Wed, 27 Nov 2024 11:57:13 -0300
Subject: [PATCH 145/216] doc: add FAQ to releases section

PR-URL: https://github.com/nodejs/node/pull/55992
Reviewed-By: Ruy Adorno <ruy@vlt.sh>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 doc/contributing/releases.md | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/doc/contributing/releases.md b/doc/contributing/releases.md
index 3ae682169873df..32e9880b444dcb 100644
--- a/doc/contributing/releases.md
+++ b/doc/contributing/releases.md
@@ -1383,6 +1383,42 @@ Infrastructure team is able to perform the switch of the default. An issue
 should be opened on the [Node.js Snap management repository][] requesting this
 take place once a new LTS line has been released.
 
+## FAQ
+
+Due to how `tools/release.sh` works, it isn't uncommon to face some errors
+during the promotion process as it depends on network communication and machine
+availability. This section aims to guide the releaser through potential
+failures.
+
+### Error on dist-indexer while promoting
+
+```bash
+node:events:491
+      throw er; // Unhandled 'error' event
+      ^
+
+Error: read ECONNRESET
+    at TLSWrap.onStreamRead (node:internal/stream_base_commons:217:20)
+Emitted 'error' event on DestroyableTransform instance at:
+    at ClientRequest.<anonymous> (/usr/lib/node_modules/nodejs-dist-indexer/node_modules/hyperquest/index.js:14:19)
+    at ClientRequest.emit (node:events:513:28)
+    at TLSSocket.socketErrorListener (node:_http_client:494:9)
+    at TLSSocket.emit (node:events:513:28)
+    at emitErrorNT (node:internal/streams/destroy:157:8)
+    at emitErrorCloseNT (node:internal/streams/destroy:122:3)
+    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
+  errno: -104,
+  code: 'ECONNRESET',
+  syscall: 'read'
+}
+```
+
+Typical resolution: sign the release again.
+
+```bash
+./tools/release.sh -s vX.Y.Z
+```
+
 [Build issue tracker]: https://github.com/nodejs/build/issues/new
 [CI lockdown procedure]: https://github.com/nodejs/build/blob/HEAD/doc/jenkins-guide.md#restricting-access-for-security-releases
 [Node.js Snap management repository]: https://github.com/nodejs/snap

From bbf39b8c46af824ef772db99f649441799c291a4 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Wed, 27 Nov 2024 20:16:10 +0000
Subject: [PATCH 146/216] tools: filter release keys to reduce interactivity
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55950
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
---
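Note (illustrative, not part of the upstream change): without `-a`, the
script keeps only the local secret keys whose fingerprint appears in a
`gpg --keyserver ... --recv-keys <FINGERPRINT>` line of the README, then
prints the tail of the fingerprint starting at character 33. The sketch
below restates that filter in JavaScript to make the field matching easier
to follow. The function name `selectReleaseKeys` and the sample values are
assumptions made for this example; the real logic lives in the awk program
in `tools/release.sh`.

```javascript
// Hypothetical re-statement of the awk filter: `readmeText` is the README
// content and `localFingerprints` are fingerprints of local secret keys
// with spaces already removed.
function selectReleaseKeys(readmeText, localFingerprints) {
  const local = new Set(localFingerprints);
  const keyIds = [];
  for (const line of readmeText.split('\n')) {
    const fields = line.trim().split(/\s+/);
    const isRecvKeysLine = fields[0] === 'gpg' &&
      fields[1] === '--keyserver' &&
      fields[3] === '--recv-keys';
    if (isRecvKeysLine && local.has(fields[4])) {
      // awk's substr($5, 33) is 1-indexed, hence slice(32) here.
      keyIds.push(fields[4].slice(32));
    }
  }
  return keyIds;
}
```

With exactly one match, the script can proceed without asking the releaser
to choose among keys, which is the reduction in interactivity this change
aims for.
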
 tools/release.sh | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/tools/release.sh b/tools/release.sh
index fca6e30a6308f2..a40035cb7427fc 100755
--- a/tools/release.sh
+++ b/tools/release.sh
@@ -15,15 +15,25 @@ webuser=dist
 promotablecmd=dist-promotable
 promotecmd=dist-promote
 signcmd=dist-sign
+allPGPKeys=""
 customsshkey="" # let ssh and scp use default key
+readmePath="README.md"
 signversion=""
 cloudflare_bucket="r2:dist-prod"
 
-while getopts ":i:s:" option; do
+while getopts ":i:r:s:a" option; do
     case "${option}" in
+        a)
+            # With -a, local keys are not filtered against those listed in the
+            # README; useful if you want to sign with a subkey.
+            allPGPKeys="true"
+            ;;
         i)
             customsshkey="-i ${OPTARG}"
             ;;
+        r)
+            readmePath="${OPTARG}"
+            ;;
         s)
             signversion="${OPTARG}"
             ;;
@@ -44,7 +54,16 @@ shift $((OPTIND-1))
 
 echo "# Selecting GPG key ..."
 
-gpgkey=$(gpg --list-secret-keys --keyid-format SHORT | awk -F'( +|/)' '/^(sec|ssb)/{print $3}')
+
+if [ -z "$allPGPKeys" ]; then
+  gpgkey="$(awk '{
+    if ($1 == "gpg" && $2 == "--keyserver" && $4 == "--recv-keys" && (1 == 2'"$(
+      gpg --list-secret-keys | awk -F' = ' '/^ +Key fingerprint/{ gsub(/ /,"",$2); print " || $5 == \"" $2 "\"" }' || true
+    )"')) { print substr($5, 33) }
+  }' "$readmePath")"
+else
+  gpgkey=$(gpg --list-secret-keys --keyid-format SHORT | awk -F'( +|/)' '/^(sec|ssb)/{print $3}')
+fi
 keycount=$(echo "$gpgkey" | wc -w)
 
 if [ "$keycount" -eq 0 ]; then
@@ -68,13 +87,12 @@ elif [ "$keycount" -ne 1 ]; then
   gpgkey=$(echo "$gpgkey" | sed -n "${keynum}p")
 fi
 
-gpgfing=$(gpg --keyid-format 0xLONG --fingerprint "$gpgkey" | grep 'Key fingerprint =' | awk -F' = ' '{print $2}' | tr -d ' ')
-
-grep -q "$gpgfing" README.md || (\
-  echo 'Error: this GPG key fingerprint is not listed in ./README.md' && \
-  exit 1 \
-)
+gpgfing=$(gpg --keyid-format 0xLONG --fingerprint "$gpgkey" | awk -F' = ' '/^ +Key fingerprint/{gsub(/ /,"",$2);print $2}')
 
+grep -q "$gpgfing" "$readmePath" || {
+  echo "Error: this GPG key fingerprint is not listed in $readmePath"
+  exit 1
+}
 
 echo "Using GPG key: $gpgkey"
 echo "  Fingerprint: $gpgfing"

From 6fc7328831a9bb78b976926a19c8433244d84626 Mon Sep 17 00:00:00 2001
From: Blended Bram <bram.kamies@iodigital.com>
Date: Wed, 27 Nov 2024 21:18:00 +0100
Subject: [PATCH 147/216] doc: remove unused import from sample code

The `node:path` module is referenced in a code snippet that doesn't
actually use it.

PR-URL: https://github.com/nodejs/node/pull/55570
Reviewed-By: Raz Luvaton <rluvaton@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 doc/api/async_context.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/doc/api/async_context.md b/doc/api/async_context.md
index 7a55baa53cd925..6e983a1986772f 100644
--- a/doc/api/async_context.md
+++ b/doc/api/async_context.md
@@ -620,7 +620,6 @@ a Worker pool around it could use the following structure:
 ```mjs
 import { AsyncResource } from 'node:async_hooks';
 import { EventEmitter } from 'node:events';
-import path from 'node:path';
 import { Worker } from 'node:worker_threads';
 
 const kTaskInfo = Symbol('kTaskInfo');

From 9317feb8294af858a2b60671de82732a5222e5d5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?G=C3=BCrg=C3=BCn=20Day=C4=B1o=C4=9Flu?= <hey@gurgun.day>
Date: Thu, 28 Nov 2024 13:00:52 +0100
Subject: [PATCH 148/216] fs: lazily load ReadFileContext

PR-URL: https://github.com/nodejs/node/pull/55998
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Jason Zhang <xzha4350@gmail.com>
---
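Note for reviewers (not part of the commit): the change swaps a per-call
`require()` for a module-level slot filled with `??=`. `require()` already
caches modules, so the saving is presumably the repeated lookup on every
`readFile()` call. A minimal sketch of the pattern, where `loadContextClass`
is a hypothetical stand-in for the internal module:

```javascript
'use strict';

let loadCount = 0;

// Hypothetical stand-in for require('internal/fs/read/context'),
// which is only reachable from inside Node.js core.
function loadContextClass() {
  loadCount++;
  return class ReadFileContext {
    constructor(callback, encoding) {
      this.callback = callback;
      this.encoding = encoding;
    }
  };
}

// Module-level slot, left undefined until the first call needs it.
let ReadFileContext;

function createContext(callback, encoding) {
  // `??=` assigns only while the slot is null or undefined,
  // so the loader runs at most once.
  ReadFileContext ??= loadContextClass();
  return new ReadFileContext(callback, encoding);
}

createContext(() => {}, 'utf8');
createContext(() => {}, 'utf8');
console.log(loadCount);
```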
 lib/fs.js | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/fs.js b/lib/fs.js
index 64f0b5e88ed447..05be1f18410037 100644
--- a/lib/fs.js
+++ b/lib/fs.js
@@ -159,6 +159,7 @@ let WriteStream;
 let rimraf;
 let rimrafSync;
 let kResistStopPropagation;
+let ReadFileContext;
 
 // These have to be separate because of how graceful-fs happens to do it's
 // monkeypatching.
@@ -364,7 +365,7 @@ function readFile(path, options, callback) {
   callback ||= options;
   validateFunction(callback, 'cb');
   options = getOptions(options, { flag: 'r' });
-  const ReadFileContext = require('internal/fs/read/context');
+  ReadFileContext ??= require('internal/fs/read/context');
   const context = new ReadFileContext(callback, options.encoding);
   context.isUserFd = isFd(path); // File descriptor ownership
 

From 5f15d8b3f5ba03e353163ce90c0d7af4d9432dfd Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Thu, 28 Nov 2024 23:52:23 +0000
Subject: [PATCH 149/216] tools: fix nghttp3 updater script
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/56007
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
---
 tools/dep_updaters/update-nghttp3.sh | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tools/dep_updaters/update-nghttp3.sh b/tools/dep_updaters/update-nghttp3.sh
index 50a17e32f8a291..1a4df351b8abba 100755
--- a/tools/dep_updaters/update-nghttp3.sh
+++ b/tools/dep_updaters/update-nghttp3.sh
@@ -42,16 +42,17 @@ cleanup () {
 trap cleanup INT TERM EXIT
 
 NGHTTP3_REF="v$NEW_VERSION"
-NGHTTP3_ZIP="nghttp3-$NEW_VERSION"
+ARCHIVE_BASENAME="nghttp3-${NEW_VERSION}"
 
 cd "$WORKSPACE"
 
 echo "Fetching nghttp3 source archive..."
-curl -sL -o "$NGHTTP3_ZIP.zip" "https://github.com/ngtcp2/nghttp3/archive/refs/tags/$NGHTTP3_REF.zip"
-log_and_verify_sha256sum "nghttp3" "$NGHTTP3_ZIP.zip"
-unzip "$NGHTTP3_ZIP.zip"
-rm "$NGHTTP3_ZIP.zip"
-mv "$NGHTTP3_ZIP" nghttp3
+curl -sL -o "$ARCHIVE_BASENAME.tar.xz" "https://github.com/ngtcp2/nghttp3/releases/download/${NGHTTP3_REF}/${ARCHIVE_BASENAME}.tar.xz"
+SHA256="$(curl -sL "https://github.com/ngtcp2/nghttp3/releases/download/${NGHTTP3_REF}/checksums.txt" | grep 'tar.xz$')"
+log_and_verify_sha256sum "nghttp3" "$ARCHIVE_BASENAME.tar.xz" "$SHA256"
+tar -xJf "$ARCHIVE_BASENAME.tar.xz"
+rm "$ARCHIVE_BASENAME.tar.xz"
+mv "$ARCHIVE_BASENAME" nghttp3
 
 cd nghttp3
 

From e1635fbd4ed56fd076af4eafd36eefda7b9add0d Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Thu, 28 Nov 2024 23:52:37 +0000
Subject: [PATCH 150/216] tools: allow dispatch of `tools.yml` from forks
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/56008
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
---
 .github/workflows/tools.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/tools.yml b/.github/workflows/tools.yml
index 09fb8968bf1d00..7e25f49f56db29 100644
--- a/.github/workflows/tools.yml
+++ b/.github/workflows/tools.yml
@@ -50,7 +50,7 @@ permissions:
 
 jobs:
   tools-deps-update:
-    if: github.repository == 'nodejs/node'
+    if: github.repository == 'nodejs/node' || github.event_name == 'workflow_dispatch'
     runs-on: ubuntu-latest
     strategy:
       fail-fast: false  # Prevent other jobs from aborting if one fails

From 782bb6cac4a50141279211fc857db0ad8cb96b23 Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Fri, 29 Nov 2024 01:47:40 -0500
Subject: [PATCH 151/216] deps: update zlib to 1.3.0.1-motley-82a5fec

PR-URL: https://github.com/nodejs/node/pull/55980
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
---
 deps/zlib/google/zip_writer.cc | 9 +++++----
 src/zlib_version.h             | 2 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/deps/zlib/google/zip_writer.cc b/deps/zlib/google/zip_writer.cc
index 31161ae86c3b7a..34ab0ad9ef2887 100644
--- a/deps/zlib/google/zip_writer.cc
+++ b/deps/zlib/google/zip_writer.cc
@@ -5,6 +5,7 @@
 #include "third_party/zlib/google/zip_writer.h"
 
 #include <algorithm>
+#include <tuple>
 
 #include "base/files/file.h"
 #include "base/logging.h"
@@ -193,8 +194,8 @@ bool ZipWriter::AddMixedEntries(Paths paths) {
   while (!paths.empty()) {
     // Work with chunks of 50 paths at most.
     const size_t n = std::min<size_t>(paths.size(), 50);
-    const Paths relative_paths = paths.subspan(0, n);
-    paths = paths.subspan(n, paths.size() - n);
+    Paths relative_paths;
+    std::tie(relative_paths, paths) = paths.split_at(n);
 
     files.clear();
     if (!file_accessor_->Open(relative_paths, &files) || files.size() != n)
@@ -233,8 +234,8 @@ bool ZipWriter::AddFileEntries(Paths paths) {
   while (!paths.empty()) {
     // Work with chunks of 50 paths at most.
     const size_t n = std::min<size_t>(paths.size(), 50);
-    const Paths relative_paths = paths.subspan(0, n);
-    paths = paths.subspan(n, paths.size() - n);
+    Paths relative_paths;
+    std::tie(relative_paths, paths) = paths.split_at(n);
 
     DCHECK_EQ(relative_paths.size(), n);
 
diff --git a/src/zlib_version.h b/src/zlib_version.h
index 6484e1dce878f3..3b53884aae2206 100644
--- a/src/zlib_version.h
+++ b/src/zlib_version.h
@@ -2,5 +2,5 @@
 // Refer to tools/dep_updaters/update-zlib.sh
 #ifndef SRC_ZLIB_VERSION_H_
 #define SRC_ZLIB_VERSION_H_
-#define ZLIB_VERSION "1.3.0.1-motley-7e2e4d7"
+#define ZLIB_VERSION "1.3.0.1-motley-82a5fec"
 #endif  // SRC_ZLIB_VERSION_H_

From 851a3d7d8deb37ead1a7cb8806e2e82ef139a8c8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C3=ABl=20Zasso?= <targos@protonmail.com>
Date: Fri, 29 Nov 2024 10:09:04 +0100
Subject: [PATCH 152/216] tools: fix update-undici script

The `build:node` npm script now expects esbuild to be installed and
bin-linked.

Closes: https://github.com/nodejs/node/issues/56061
PR-URL: https://github.com/nodejs/node/pull/56069
Fixes: https://github.com/nodejs/node/issues/56061
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Pietro Marchini <pietro.marchini94@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
---
 tools/dep_updaters/update-undici.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/dep_updaters/update-undici.sh b/tools/dep_updaters/update-undici.sh
index f8906f64e846d6..087038cfd45ed1 100755
--- a/tools/dep_updaters/update-undici.sh
+++ b/tools/dep_updaters/update-undici.sh
@@ -80,7 +80,7 @@ cd "$ROOT"
 
   # Rebuild components from source
   rm lib/llhttp/llhttp*.*
-  "$NODE" "$NPM" install --no-bin-link --ignore-scripts
+  "$NODE" "$NPM" install --ignore-scripts
   "$NODE" "$NPM" run build:wasm > lib/llhttp/wasm_build_env.txt
   "$NODE" "$NPM" run build:node
   "$NODE" "$NPM" prune --production

From 12b0cecc206ad7e3fdca0419673d871a80686677 Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Fri, 29 Nov 2024 18:04:28 -0300
Subject: [PATCH 153/216] meta: add releasers as CODEOWNERS to proposal action
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/56043
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/CODEOWNERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
index 3f259a9152eeec..98fdf0b6f498d6 100644
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -142,6 +142,7 @@
 # Actions
 
 /.github/workflows/* @nodejs/actions
+/.github/workflows/create-release-proposal.yml @nodejs/releasers
 /tools/actions/* @nodejs/actions
 
 # Test runner

From f370ec09898d5c7c8ca91870af73e4884234adad Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Fri, 29 Nov 2024 23:14:00 -0300
Subject: [PATCH 154/216] build: remove defaults for create-release-proposal
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Remove the input defaults to prevent users from executing the workflow
via the CLI without passing the desired inputs.

PR-URL: https://github.com/nodejs/node/pull/56042
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 .github/workflows/create-release-proposal.yml | 2 --
 1 file changed, 2 deletions(-)

diff --git a/.github/workflows/create-release-proposal.yml b/.github/workflows/create-release-proposal.yml
index 5f0f80eed24c95..c6f77e33961708 100644
--- a/.github/workflows/create-release-proposal.yml
+++ b/.github/workflows/create-release-proposal.yml
@@ -11,12 +11,10 @@ on:
       release-line:
         required: true
         type: number
-        default: 23
         description: 'The release line (without dots or prefix). e.g: 22'
       release-date:
         required: true
         type: string
-        default: YYYY-MM-DD
         description: The release date in YYYY-MM-DD format
 
 concurrency: ${{ github.workflow }}

From 51262ec84e8b44cac7e4cac1f92541c847979744 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tobias=20Nie=C3=9Fen?= <tniessen@tnie.de>
Date: Sat, 30 Nov 2024 19:41:06 +0100
Subject: [PATCH 155/216] doc: rename file to advocacy-ambassador-program.md

PR-URL: https://github.com/nodejs/node/pull/56046
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
---
 ...vocacy-ambasador-program.md => advocacy-ambassador-program.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename doc/contributing/{advocacy-ambasador-program.md => advocacy-ambassador-program.md} (100%)

diff --git a/doc/contributing/advocacy-ambasador-program.md b/doc/contributing/advocacy-ambassador-program.md
similarity index 100%
rename from doc/contributing/advocacy-ambasador-program.md
rename to doc/contributing/advocacy-ambassador-program.md

From e464c6f7a5c67ae65af6fc1c7d3e57babb0a9d3a Mon Sep 17 00:00:00 2001
From: Luigi Pinca <luigipinca@gmail.com>
Date: Sun, 1 Dec 2024 08:13:10 +0100
Subject: [PATCH 156/216] test: move test-worker-arraybuffer-zerofill to
 parallel

Move `test/sequential/test-worker-arraybuffer-zerofill.js` back to
`test/parallel/test-worker-arraybuffer-zerofill.js` and remove the
flaky designation.

The original issue is likely the same as other tests that time out.

Refs: https://github.com/nodejs/node/issues/54918
Refs: https://github.com/nodejs/node/pull/54839
Refs: https://github.com/nodejs/node/pull/54802
PR-URL: https://github.com/nodejs/node/pull/56053
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
---
 test/parallel/parallel.status                                   | 2 --
 .../test-worker-arraybuffer-zerofill.js                         | 0
 2 files changed, 2 deletions(-)
 rename test/{sequential => parallel}/test-worker-arraybuffer-zerofill.js (100%)

diff --git a/test/parallel/parallel.status b/test/parallel/parallel.status
index 79a953df7da64b..ba6f4cbbec1519 100644
--- a/test/parallel/parallel.status
+++ b/test/parallel/parallel.status
@@ -44,8 +44,6 @@ test-esm-loader-hooks-inspect-wait: PASS, FLAKY
 test-runner-run-watch: PASS, FLAKY
 # https://github.com/nodejs/node/issues/37692
 test-fs-utimes: PASS, FLAKY
-# https://github.com/nodejs/node/issues/52274
-test-worker-arraybuffer-zerofill: PASS, FLAKY
 
 [$system==linux || $system==win32]
 # https://github.com/nodejs/node/issues/49605
diff --git a/test/sequential/test-worker-arraybuffer-zerofill.js b/test/parallel/test-worker-arraybuffer-zerofill.js
similarity index 100%
rename from test/sequential/test-worker-arraybuffer-zerofill.js
rename to test/parallel/test-worker-arraybuffer-zerofill.js

From 1d1488626617e07684d8bb2914150df31abfe841 Mon Sep 17 00:00:00 2001
From: theanarkh <theratliter@gmail.com>
Date: Mon, 2 Dec 2024 12:48:53 +0800
Subject: [PATCH 157/216] dgram: check udp buffer size to avoid fd leak

PR-URL: https://github.com/nodejs/node/pull/56084
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
---
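Note for reviewers (not part of the commit): the patch validates
`recvBufferSize` and `sendBufferSize` in the `Socket` constructor, so a bad
value is rejected up front. Per the subject line, this avoids leaking a file
descriptor. The sketch below approximates that guard. `assertUint32` and
`checkBufferSizes` are hypothetical names; the real check is `validateUint32`
from `internal/validators`, which is not reachable from user code.

```javascript
'use strict';

// Hypothetical approximation of the internal validateUint32 helper;
// the real one throws Node.js internal error classes.
function assertUint32(value, name) {
  if (typeof value !== 'number') {
    const err = new TypeError(`${name} must be of type number`);
    err.code = 'ERR_INVALID_ARG_TYPE';
    throw err;
  }
  if (!Number.isInteger(value) || value < 0 || value > 0xFFFFFFFF) {
    const err = new RangeError(`${name} must be an unsigned 32-bit integer`);
    err.code = 'ERR_OUT_OF_RANGE';
    throw err;
  }
}

// Mirrors the shape of the patched constructor: falsy values are skipped,
// anything else is checked before any socket handle would be created.
function checkBufferSizes(options) {
  if (options.recvBufferSize) {
    assertUint32(options.recvBufferSize, 'options.recvBufferSize');
  }
  if (options.sendBufferSize) {
    assertUint32(options.sendBufferSize, 'options.sendBufferSize');
  }
}

try {
  checkBufferSizes({ type: 'udp4', recvBufferSize: 'invalid' });
} catch (err) {
  console.log(err.code);
}
```

Because the guard is wrapped in a truthiness test, `0` and `undefined` pass
through unchecked, as in the patch.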
 lib/dgram.js                                  |  7 +++++++
 test/parallel/test-dgram-createSocket-type.js | 13 +++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/lib/dgram.js b/lib/dgram.js
index d542e1c001f764..ca70506b427103 100644
--- a/lib/dgram.js
+++ b/lib/dgram.js
@@ -56,6 +56,7 @@ const {
   validateString,
   validateNumber,
   validatePort,
+  validateUint32,
 } = require('internal/validators');
 const { Buffer } = require('buffer');
 const { deprecate, guessHandleType, promisify, SymbolAsyncDispose, SymbolDispose } = require('internal/util');
@@ -111,6 +112,12 @@ function Socket(type, listener) {
     options = type;
     type = options.type;
     lookup = options.lookup;
+    if (options.recvBufferSize) {
+      validateUint32(options.recvBufferSize, 'options.recvBufferSize');
+    }
+    if (options.sendBufferSize) {
+      validateUint32(options.sendBufferSize, 'options.sendBufferSize');
+    }
     recvBufferSize = options.recvBufferSize;
     sendBufferSize = options.sendBufferSize;
   }
diff --git a/test/parallel/test-dgram-createSocket-type.js b/test/parallel/test-dgram-createSocket-type.js
index 19bbd6c1b2b088..ba033839cd1306 100644
--- a/test/parallel/test-dgram-createSocket-type.js
+++ b/test/parallel/test-dgram-createSocket-type.js
@@ -59,3 +59,16 @@ validTypes.forEach((validType) => {
     socket.close();
   }));
 }
+
+{
+  [
+    { type: 'udp4', recvBufferSize: 'invalid' },
+    { type: 'udp4', sendBufferSize: 'invalid' },
+  ].forEach((options) => {
+    assert.throws(() => {
+      dgram.createSocket(options);
+    }, {
+      code: 'ERR_INVALID_ARG_TYPE',
+    });
+  });
+}

From 7a1365ba62c332e6a554436c929e9502b98b65cf Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Tue, 3 Dec 2024 10:22:48 -0300
Subject: [PATCH 158/216] doc: add create-release-action to process

PR-URL: https://github.com/nodejs/node/pull/55993
Reviewed-By: Ruy Adorno <ruy@vlt.sh>
---
 doc/contributing/releases.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/doc/contributing/releases.md b/doc/contributing/releases.md
index 32e9880b444dcb..88e1be5538d76e 100644
--- a/doc/contributing/releases.md
+++ b/doc/contributing/releases.md
@@ -282,7 +282,15 @@ You can integrate the PRs into the proposal without running full CI.
 
 ### 2. Create a new branch for the release
 
-⚠️ At this point, you can either run `git node release --prepare`:
+> \[!TIP] Once the staging branch is up-to-date you can use the
+> [`create-release-proposal`][] action to generate the proposal.
+
+```bash
+gh workflow run "Create Release Proposal" -f release-line=N -f release-date=YYYY-MM-DD
+```
+
+If you prefer to run it locally you can either run
+`git node release --prepare`:
 
 ```bash
 git node release -S --prepare x.y.z
@@ -1424,6 +1432,7 @@ Typical resolution: sign the release again.
 [Node.js Snap management repository]: https://github.com/nodejs/snap
 [Partner Communities]: https://github.com/nodejs/community-committee/blob/HEAD/governance/PARTNER_COMMUNITIES.md
 [Snap]: https://snapcraft.io/node
+[`create-release-proposal`]: https://github.com/nodejs/node/actions/workflows/create-release-proposal.yml
 [build-infra team]: https://github.com/orgs/nodejs/teams/build-infra
 [expected assets]: https://github.com/nodejs/build/tree/HEAD/ansible/www-standalone/tools/promote/expected_assets
 [nodejs.org release-post.js script]: https://github.com/nodejs/nodejs.org/blob/HEAD/scripts/release-post/index.mjs

From bc92a96a5aea7e0217d30cc54594fdeed61d2ca2 Mon Sep 17 00:00:00 2001
From: Shelley Vohr <shelley.vohr@gmail.com>
Date: Tue, 3 Dec 2024 15:17:48 +0100
Subject: [PATCH 159/216] build: allow overriding clang usage

PR-URL: https://github.com/nodejs/node/pull/56016
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
---
 configure.py | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/configure.py b/configure.py
index ae3dd156d4e02e..0df90b176e9b54 100755
--- a/configure.py
+++ b/configure.py
@@ -128,6 +128,12 @@
     default=None,
     help='use the prefix to look for pre-installed headers')
 
+parser.add_argument('--use_clang',
+    action='store_true',
+    dest='use_clang',
+    default=None,
+    help='use clang instead of gcc')
+
 parser.add_argument('--dest-os',
     action='store',
     dest='dest_os',
@@ -1358,6 +1364,10 @@ def configure_node(o):
   o['variables']['target_arch'] = target_arch
   o['variables']['node_byteorder'] = sys.byteorder
 
+  # Allow overriding the compiler - needed by embedders.
+  if options.use_clang:
+    o['variables']['clang'] = 1
+
   cross_compiling = (options.cross_compiling
                      if options.cross_compiling is not None
                      else target_arch != host_arch)

From c58065ae775b91ed33a02168c4c911a16c5cab9a Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Tue, 3 Dec 2024 20:01:57 +0000
Subject: [PATCH 160/216] meta: bump actions/setup-node from 4.0.3 to 4.1.0

Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4.0.3 to 4.1.0.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](https://github.com/actions/setup-node/compare/v4.0.3...39370e3970a6d050c480ffad4ff0ed4d3fdee5af)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/56100
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
Reviewed-By: Tierney Cyren <hello@bnb.im>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/workflows/create-release-proposal.yml | 2 +-
 .github/workflows/update-wpt.yml              | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/create-release-proposal.yml b/.github/workflows/create-release-proposal.yml
index c6f77e33961708..1047bce2420dc8 100644
--- a/.github/workflows/create-release-proposal.yml
+++ b/.github/workflows/create-release-proposal.yml
@@ -43,7 +43,7 @@ jobs:
 
       # Install dependencies
       - name: Install Node.js
-        uses: actions/setup-node@1e60f620b9541d16bece96c5465dc8ee9832be0b  # v4.0.3
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NODE_VERSION }}
 
diff --git a/.github/workflows/update-wpt.yml b/.github/workflows/update-wpt.yml
index 71cd1bab487735..72ec030e9d645a 100644
--- a/.github/workflows/update-wpt.yml
+++ b/.github/workflows/update-wpt.yml
@@ -32,7 +32,7 @@ jobs:
           persist-credentials: false
 
       - name: Install Node.js
-        uses: actions/setup-node@1e60f620b9541d16bece96c5465dc8ee9832be0b  # v4.0.3
+        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af  # v4.1.0
         with:
           node-version: ${{ env.NODE_VERSION }}
 

From a953301a1cb15acf772aebf2418e9d70179cc110 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Tue, 3 Dec 2024 20:02:11 +0000
Subject: [PATCH 161/216] meta: bump step-security/harden-runner from 2.10.1 to
 2.10.2

Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.10.1 to 2.10.2.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](https://github.com/step-security/harden-runner/compare/91182cccc01eb5e619899d80e4e971d6181294a7...0080882f6c36860b6ba35c610c98ce87d4e2f26f)

---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/56101
Reviewed-By: Tierney Cyren <hello@bnb.im>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/workflows/scorecard.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/scorecard.yml b/.github/workflows/scorecard.yml
index 6fbf46e7f3e22f..f14a2e4b584959 100644
--- a/.github/workflows/scorecard.yml
+++ b/.github/workflows/scorecard.yml
@@ -33,7 +33,7 @@ jobs:
 
     steps:
       - name: Harden Runner
-        uses: step-security/harden-runner@91182cccc01eb5e619899d80e4e971d6181294a7  # v2.10.1
+        uses: step-security/harden-runner@0080882f6c36860b6ba35c610c98ce87d4e2f26f  # v2.10.2
         with:
           egress-policy: audit  # TODO: change to 'egress-policy: block' after couple of runs
 

From 23f319803d0935df8acc723f5e3907ccf5ec64a2 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Tue, 3 Dec 2024 20:02:26 +0000
Subject: [PATCH 162/216] meta: bump actions/checkout from 4.1.7 to 4.2.2

Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.7 to 4.2.2.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4.1.7...11bd71901bbe5b1630ceea73d27597364c9af683)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/56102
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
Reviewed-By: Tierney Cyren <hello@bnb.im>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/workflows/create-release-proposal.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/create-release-proposal.yml b/.github/workflows/create-release-proposal.yml
index 1047bce2420dc8..6155b5da75f17b 100644
--- a/.github/workflows/create-release-proposal.yml
+++ b/.github/workflows/create-release-proposal.yml
@@ -34,7 +34,7 @@ jobs:
       RELEASE_LINE: ${{ inputs.release-line }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332  # v4.1.7
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           ref: ${{ env.STAGING_BRANCH }}
           # Needs the whole git history for ncu to work

From 1a193bf2560d86842a8dd81517a192034458ea85 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Tue, 3 Dec 2024 20:03:02 +0000
Subject: [PATCH 163/216] meta: bump github/codeql-action from 3.27.0 to 3.27.5

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.27.0 to 3.27.5.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/662472033e021d55d94146f66f6058822b0b39fd...f09c1c0a94de965c15400f5634aa42fac8fb8f88)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
PR-URL: https://github.com/nodejs/node/pull/56103
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
Reviewed-By: Tierney Cyren <hello@bnb.im>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 .github/workflows/scorecard.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/scorecard.yml b/.github/workflows/scorecard.yml
index f14a2e4b584959..75c67cc238dbc3 100644
--- a/.github/workflows/scorecard.yml
+++ b/.github/workflows/scorecard.yml
@@ -73,6 +73,6 @@ jobs:
 
       # Upload the results to GitHub's code scanning dashboard.
       - name: Upload to code-scanning
-        uses: github/codeql-action/upload-sarif@662472033e021d55d94146f66f6058822b0b39fd  # v3.27.0
+        uses: github/codeql-action/upload-sarif@f09c1c0a94de965c15400f5634aa42fac8fb8f88  # v3.27.5
         with:
           sarif_file: results.sarif

From 23fb644037a339f9176a7d8c8f19cf8ae785b8c8 Mon Sep 17 00:00:00 2001
From: Filip Skokan <panva.ip@gmail.com>
Date: Wed, 4 Dec 2024 11:46:27 +0100
Subject: [PATCH 164/216] crypto: ensure CryptoKey usages and algorithm are
 cached objects

PR-URL: https://github.com/nodejs/node/pull/56108
Reviewed-By: Matthew Aitken <maitken033380023@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
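Note for reviewers (not part of the commit): dropping `ArrayFrom` makes the
`usages` getter return the stored array instead of a fresh copy, which is what
the new `assert.strictEqual(key.usages, key.usages)` checks rely on. A minimal
sketch of the difference, using two hypothetical classes that model only this
getter:

```javascript
'use strict';

const kKeyUsages = Symbol('kKeyUsages');

// Hypothetical stand-ins for CryptoKey; only the `usages` getter is modeled.
class CopyingKey {
  constructor(usages) { this[kKeyUsages] = usages; }
  // Old behavior: a fresh array on every access.
  get usages() { return Array.from(this[kKeyUsages]); }
}

class CachingKey {
  constructor(usages) { this[kKeyUsages] = usages; }
  // New behavior: the stored array itself is returned.
  get usages() { return this[kKeyUsages]; }
}

const copying = new CopyingKey(['sign', 'verify']);
const caching = new CachingKey(['sign', 'verify']);

console.log(copying.usages === copying.usages); // false
console.log(caching.usages === caching.usages); // true
```

In this sketch, a caller that mutates the array returned by `CachingKey`
changes what later reads see; whether the real class guards against that is
outside the scope of this note.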
 lib/internal/crypto/keys.js                      |  3 +--
 .../test-webcrypto-export-import-cfrg.js         | 10 ++++++++++
 test/parallel/test-webcrypto-export-import-ec.js | 10 ++++++++++
 .../parallel/test-webcrypto-export-import-rsa.js |  8 ++++++++
 test/parallel/test-webcrypto-export-import.js    |  6 ++++++
 test/parallel/test-webcrypto-keygen.js           | 16 ++++++++++++++++
 6 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/lib/internal/crypto/keys.js b/lib/internal/crypto/keys.js
index f078e4410edd9b..0a588a1358c6a4 100644
--- a/lib/internal/crypto/keys.js
+++ b/lib/internal/crypto/keys.js
@@ -1,7 +1,6 @@
 'use strict';
 
 const {
-  ArrayFrom,
   ArrayPrototypeSlice,
   ObjectDefineProperty,
   ObjectDefineProperties,
@@ -687,7 +686,7 @@ class CryptoKey {
   get usages() {
     if (!(this instanceof CryptoKey))
       throw new ERR_INVALID_THIS('CryptoKey');
-    return ArrayFrom(this[kKeyUsages]);
+    return this[kKeyUsages];
   }
 }
 
diff --git a/test/parallel/test-webcrypto-export-import-cfrg.js b/test/parallel/test-webcrypto-export-import-cfrg.js
index 2c2cb80a31c1bf..84d969eb1c5850 100644
--- a/test/parallel/test-webcrypto-export-import-cfrg.js
+++ b/test/parallel/test-webcrypto-export-import-cfrg.js
@@ -115,6 +115,8 @@ async function testImportSpki({ name, publicUsages }, extractable) {
   assert.strictEqual(key.extractable, extractable);
   assert.deepStrictEqual(key.usages, publicUsages);
   assert.deepStrictEqual(key.algorithm.name, name);
+  assert.strictEqual(key.algorithm, key.algorithm);
+  assert.strictEqual(key.usages, key.usages);
 
   if (extractable) {
     // Test the roundtrip
@@ -151,6 +153,8 @@ async function testImportPkcs8({ name, privateUsages }, extractable) {
   assert.strictEqual(key.extractable, extractable);
   assert.deepStrictEqual(key.usages, privateUsages);
   assert.deepStrictEqual(key.algorithm.name, name);
+  assert.strictEqual(key.algorithm, key.algorithm);
+  assert.strictEqual(key.usages, key.usages);
 
   if (extractable) {
     // Test the roundtrip
@@ -227,6 +231,10 @@ async function testImportJwk({ name, publicUsages, privateUsages }, extractable)
   assert.deepStrictEqual(privateKey.usages, privateUsages);
   assert.strictEqual(publicKey.algorithm.name, name);
   assert.strictEqual(privateKey.algorithm.name, name);
+  assert.strictEqual(privateKey.algorithm, privateKey.algorithm);
+  assert.strictEqual(privateKey.usages, privateKey.usages);
+  assert.strictEqual(publicKey.algorithm, publicKey.algorithm);
+  assert.strictEqual(publicKey.usages, publicKey.usages);
 
   if (extractable) {
     // Test the round trip
@@ -345,6 +353,8 @@ async function testImportRaw({ name, publicUsages }) {
   assert.strictEqual(publicKey.type, 'public');
   assert.deepStrictEqual(publicKey.usages, publicUsages);
   assert.strictEqual(publicKey.algorithm.name, name);
+  assert.strictEqual(publicKey.algorithm, publicKey.algorithm);
+  assert.strictEqual(publicKey.usages, publicKey.usages);
 }
 
 (async function() {
diff --git a/test/parallel/test-webcrypto-export-import-ec.js b/test/parallel/test-webcrypto-export-import-ec.js
index 35c657bbbf0ef1..57f1b2831e33bc 100644
--- a/test/parallel/test-webcrypto-export-import-ec.js
+++ b/test/parallel/test-webcrypto-export-import-ec.js
@@ -111,6 +111,8 @@ async function testImportSpki({ name, publicUsages }, namedCurve, extractable) {
   assert.deepStrictEqual(key.usages, publicUsages);
   assert.deepStrictEqual(key.algorithm.name, name);
   assert.deepStrictEqual(key.algorithm.namedCurve, namedCurve);
+  assert.strictEqual(key.algorithm, key.algorithm);
+  assert.strictEqual(key.usages, key.usages);
 
   if (extractable) {
     // Test the roundtrip
@@ -151,6 +153,8 @@ async function testImportPkcs8(
   assert.deepStrictEqual(key.usages, privateUsages);
   assert.deepStrictEqual(key.algorithm.name, name);
   assert.deepStrictEqual(key.algorithm.namedCurve, namedCurve);
+  assert.strictEqual(key.algorithm, key.algorithm);
+  assert.strictEqual(key.usages, key.usages);
 
   if (extractable) {
     // Test the roundtrip
@@ -234,6 +238,10 @@ async function testImportJwk(
   assert.strictEqual(privateKey.algorithm.name, name);
   assert.strictEqual(publicKey.algorithm.namedCurve, namedCurve);
   assert.strictEqual(privateKey.algorithm.namedCurve, namedCurve);
+  assert.strictEqual(privateKey.algorithm, privateKey.algorithm);
+  assert.strictEqual(privateKey.usages, privateKey.usages);
+  assert.strictEqual(publicKey.algorithm, publicKey.algorithm);
+  assert.strictEqual(publicKey.usages, publicKey.usages);
 
   if (extractable) {
     // Test the round trip
@@ -368,6 +376,8 @@ async function testImportRaw({ name, publicUsages }, namedCurve) {
   assert.deepStrictEqual(publicKey.usages, publicUsages);
   assert.strictEqual(publicKey.algorithm.name, name);
   assert.strictEqual(publicKey.algorithm.namedCurve, namedCurve);
+  assert.strictEqual(publicKey.algorithm, publicKey.algorithm);
+  assert.strictEqual(publicKey.usages, publicKey.usages);
 }
 
 (async function() {
diff --git a/test/parallel/test-webcrypto-export-import-rsa.js b/test/parallel/test-webcrypto-export-import-rsa.js
index 303efef7bb8f84..62734c69d5ed77 100644
--- a/test/parallel/test-webcrypto-export-import-rsa.js
+++ b/test/parallel/test-webcrypto-export-import-rsa.js
@@ -315,6 +315,8 @@ async function testImportSpki({ name, publicUsages }, size, hash, extractable) {
   assert.deepStrictEqual(key.algorithm.publicExponent,
                          new Uint8Array([1, 0, 1]));
   assert.strictEqual(key.algorithm.hash.name, hash);
+  assert.strictEqual(key.algorithm, key.algorithm);
+  assert.strictEqual(key.usages, key.usages);
 
   if (extractable) {
     const spki = await subtle.exportKey('spki', key);
@@ -349,6 +351,8 @@ async function testImportPkcs8(
   assert.deepStrictEqual(key.algorithm.publicExponent,
                          new Uint8Array([1, 0, 1]));
   assert.strictEqual(key.algorithm.hash.name, hash);
+  assert.strictEqual(key.algorithm, key.algorithm);
+  assert.strictEqual(key.usages, key.usages);
 
   if (extractable) {
     const pkcs8 = await subtle.exportKey('pkcs8', key);
@@ -415,6 +419,10 @@ async function testImportJwk(
                          new Uint8Array([1, 0, 1]));
   assert.deepStrictEqual(publicKey.algorithm.publicExponent,
                          privateKey.algorithm.publicExponent);
+  assert.strictEqual(privateKey.algorithm, privateKey.algorithm);
+  assert.strictEqual(privateKey.usages, privateKey.usages);
+  assert.strictEqual(publicKey.algorithm, publicKey.algorithm);
+  assert.strictEqual(publicKey.usages, publicKey.usages);
 
   if (extractable) {
     const [
diff --git a/test/parallel/test-webcrypto-export-import.js b/test/parallel/test-webcrypto-export-import.js
index e7d45dbc5efeea..7883268dfd4002 100644
--- a/test/parallel/test-webcrypto-export-import.js
+++ b/test/parallel/test-webcrypto-export-import.js
@@ -81,6 +81,10 @@ const { subtle } = globalThis.crypto;
         hash: 'SHA-256'
       }, true, ['sign', 'verify']);
 
+
+    assert.strictEqual(key.algorithm, key.algorithm);
+    assert.strictEqual(key.usages, key.usages);
+
     const raw = await subtle.exportKey('raw', key);
 
     assert.deepStrictEqual(
@@ -122,6 +126,8 @@ const { subtle } = globalThis.crypto;
         name: 'AES-CTR',
         length: 256,
       }, true, ['encrypt', 'decrypt']);
+    assert.strictEqual(key.algorithm, key.algorithm);
+    assert.strictEqual(key.usages, key.usages);
 
     const raw = await subtle.exportKey('raw', key);
 
diff --git a/test/parallel/test-webcrypto-keygen.js b/test/parallel/test-webcrypto-keygen.js
index dceb3348528911..5d36aa3e16a6ca 100644
--- a/test/parallel/test-webcrypto-keygen.js
+++ b/test/parallel/test-webcrypto-keygen.js
@@ -298,6 +298,10 @@ const vectors = {
       KeyObject.from(privateKey).asymmetricKeyDetails.publicExponent,
       bigIntArrayToUnsignedBigInt(publicExponent));
     assert.strictEqual(privateKey.algorithm.hash.name, hash);
+    assert.strictEqual(privateKey.algorithm, privateKey.algorithm);
+    assert.strictEqual(privateKey.usages, privateKey.usages);
+    assert.strictEqual(publicKey.algorithm, publicKey.algorithm);
+    assert.strictEqual(publicKey.usages, publicKey.usages);
 
     // Missing parameters
     await assert.rejects(
@@ -442,6 +446,10 @@ const vectors = {
     assert.strictEqual(privateKey.algorithm.name, name);
     assert.strictEqual(publicKey.algorithm.namedCurve, namedCurve);
     assert.strictEqual(privateKey.algorithm.namedCurve, namedCurve);
+    assert.strictEqual(privateKey.algorithm, privateKey.algorithm);
+    assert.strictEqual(privateKey.usages, privateKey.usages);
+    assert.strictEqual(publicKey.algorithm, publicKey.algorithm);
+    assert.strictEqual(publicKey.usages, publicKey.usages);
 
     // Invalid parameters
     [1, true, {}, [], null].forEach(async (namedCurve) => {
@@ -508,6 +516,8 @@ const vectors = {
     assert.deepStrictEqual(key.usages, usages);
     assert.strictEqual(key.algorithm.name, name);
     assert.strictEqual(key.algorithm.length, length);
+    assert.strictEqual(key.algorithm, key.algorithm);
+    assert.strictEqual(key.usages, key.usages);
 
     // Invalid parameters
     [1, 100, 257, '', false, null].forEach(async (length) => {
@@ -568,6 +578,8 @@ const vectors = {
     assert.strictEqual(key.algorithm.name, 'HMAC');
     assert.strictEqual(key.algorithm.length, length);
     assert.strictEqual(key.algorithm.hash.name, hash);
+    assert.strictEqual(key.algorithm, key.algorithm);
+    assert.strictEqual(key.usages, key.usages);
 
     [1, false, null].forEach(async (hash) => {
       await assert.rejects(
@@ -632,6 +644,10 @@ assert.throws(() => new CryptoKey(), { code: 'ERR_ILLEGAL_CONSTRUCTOR' });
     assert.deepStrictEqual(privateKey.usages, privateUsages);
     assert.strictEqual(publicKey.algorithm.name, name);
     assert.strictEqual(privateKey.algorithm.name, name);
+    assert.strictEqual(privateKey.algorithm, privateKey.algorithm);
+    assert.strictEqual(privateKey.usages, privateKey.usages);
+    assert.strictEqual(publicKey.algorithm, publicKey.algorithm);
+    assert.strictEqual(publicKey.usages, publicKey.usages);
   }
 
   const kTests = [

From b9b006331ff88b02200372836fbef22c33fc93df Mon Sep 17 00:00:00 2001
From: Jordan Harband <ljharb@gmail.com>
Date: Wed, 4 Dec 2024 11:14:44 -0800
Subject: [PATCH 165/216] doc: add LJHarb to collaborators

Fixes: https://github.com/nodejs/node/issues/55918
PR-URL: https://github.com/nodejs/node/pull/56132
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Robert Nagy <ronagy@icloud.com>
Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 8f01a50eb370d7..0f793614a6e47f 100644
--- a/README.md
+++ b/README.md
@@ -379,6 +379,8 @@ For information about the governance of the Node.js project, see
   **Nitzan Uziely** <<linkgoron@gmail.com>>
 * [LiviaMedeiros](https://github.com/LiviaMedeiros) -
   **LiviaMedeiros** <<livia@cirno.name>>
+* [ljharb](https://github.com/ljharb) -
+  **Jordan Harband** <<ljharb@gmail.com>>
 * [lpinca](https://github.com/lpinca) -
   **Luigi Pinca** <<luigipinca@gmail.com>> (he/him)
 * [lukekarrys](https://github.com/lukekarrys) -

From 13455ca9ce9d7df8e5ebdaf63e61ee5333b19eb5 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Thu, 5 Dec 2024 20:33:44 +0000
Subject: [PATCH 166/216] tools: update `create-release-proposal` workflow

PR-URL: https://github.com/nodejs/node/pull/56054
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
---
 .github/workflows/create-release-proposal.yml |  27 ++--
 tools/actions/create-release.sh               | 115 ++++++++++++++++--
 2 files changed, 118 insertions(+), 24 deletions(-)
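
Note: the two data transformations the new script performs before calling
`createCommitOnBranch` can be sketched in isolation. This is a minimal
sketch, not the script itself: `fillPrUrl` and `buildFileChanges` are
hypothetical helper names, `readFile` is injected so the sketch runs without
touching the disk, and writing deletions as `{ path }` objects is an
assumption about the GraphQL `FileDeletion` input shape.

```javascript
// Hypothetical helper: fills in the PR-URL trailer of a commit body,
// mirroring the awk `sub(/^PR-URL: TODO$/, ...)` call in the script.
function fillPrUrl(commitBody, prUrl) {
  // With the `m` flag, ^ and $ match at line boundaries, so only lines that
  // are exactly "PR-URL: TODO" are rewritten. A replacer function is used so
  // that `$` sequences in the URL are never treated as replacement patterns.
  return commitBody.replace(/^PR-URL: TODO$/gm, () => `PR-URL: ${prUrl}`);
}

// Hypothetical helper: turns newline-separated `git show --name-only`
// output into an additions/deletions object for `fileChanges`.
function buildFileChanges(modifiedOrAddedFiles, deletedFiles, readFile) {
  // Drop the empty entries produced by trailing newlines.
  const toList = (text) => text.split('\n').filter(Boolean);
  return {
    additions: toList(modifiedOrAddedFiles).map((path) => ({
      path,
      // File contents are sent base64-encoded.
      contents: Buffer.from(readFile(path)).toString('base64'),
    })),
    // Assumption: each deletion is an object carrying the path.
    deletions: toList(deletedFiles).map((path) => ({ path })),
  };
}

// Usage with placeholder values (the URL and file names are made up).
console.log(fillPrUrl('Notable changes\n\nPR-URL: TODO', 'https://example.com/pull/1'));
console.log(buildFileChanges('a.md\n', 'b.md\n', () => 'hi'));
```

Injecting `readFile` keeps the sketch testable; the script itself reads the
working tree with `readFileSync`.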

diff --git a/.github/workflows/create-release-proposal.yml b/.github/workflows/create-release-proposal.yml
index 6155b5da75f17b..d3ffa3ad49b5e2 100644
--- a/.github/workflows/create-release-proposal.yml
+++ b/.github/workflows/create-release-proposal.yml
@@ -1,7 +1,6 @@
 # This action requires the following secrets to be set on the repository:
 #   GH_USER_NAME: GitHub user whose Jenkins and GitHub token are defined below
 #   GH_USER_TOKEN: GitHub user token, to be used by ncu and to push changes
-#   JENKINS_TOKEN: Jenkins token, to be used to check CI status
 
 name: Create Release Proposal
 
@@ -24,6 +23,7 @@ env:
 
 permissions:
   contents: write
+  pull-requests: write
 
 jobs:
   releasePrepare:
@@ -37,9 +37,7 @@ jobs:
       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
         with:
           ref: ${{ env.STAGING_BRANCH }}
-          # Needs the whole git history for ncu to work
-          # See https://github.com/nodejs/node-core-utils/pull/486
-          fetch-depth: 0
+          persist-credentials: false
 
       # Install dependencies
       - name: Install Node.js
@@ -56,21 +54,19 @@ jobs:
           ncu-config set upstream origin
           ncu-config set username "$USERNAME"
           ncu-config set token "$GH_TOKEN"
-          ncu-config set jenkins_token "$JENKINS_TOKEN"
           ncu-config set repo "$(echo "$GITHUB_REPOSITORY" | cut -d/ -f2)"
           ncu-config set owner "${GITHUB_REPOSITORY_OWNER}"
         env:
           USERNAME: ${{ secrets.JENKINS_USER }}
-          GH_TOKEN: ${{ secrets.GH_USER_TOKEN }}
-          JENKINS_TOKEN: ${{ secrets.JENKINS_TOKEN }}
+          GH_TOKEN: ${{ github.token }}
 
       - name: Set up ghauth config (Ubuntu)
         run: |
-          mkdir -p ~/.config/changelog-maker/
-          echo '{
-            "user": "'$(ncu-config get username)'",
-            "token": "'$(ncu-config get token)'"
-          }' > ~/.config/changelog-maker/config.json
+          mkdir -p "${XDG_CONFIG_HOME:-~/.config}/changelog-maker"
+          echo '{}' | jq '{user: env.USERNAME, token: env.TOKEN}' > "${XDG_CONFIG_HOME:-~/.config}/changelog-maker/config.json"
+        env:
+          USERNAME: ${{ secrets.JENKINS_USER }}
+          TOKEN: ${{ github.token }}
 
       - name: Setup git author
         run: |
@@ -78,7 +74,12 @@ jobs:
           git config --local user.name "Node.js GitHub Bot"
 
       - name: Start git node release prepare
+        # The curl command is to make sure we run the version of the script corresponding to the current workflow.
         run: |
+          git update-index --assume-unchanged tools/actions/create-release.sh
+          curl -fsSLo tools/actions/create-release.sh https://github.com/${GITHUB_REPOSITORY}/raw/${GITHUB_SHA}/tools/actions/create-release.sh
           ./tools/actions/create-release.sh "${RELEASE_DATE}" "${RELEASE_LINE}"
         env:
-          GH_TOKEN: ${{ secrets.GH_USER_TOKEN }}
+          GH_TOKEN: ${{ github.token }}
+          # We want the bot to push the release commit so CI runs on it.
+          BOT_TOKEN: ${{ secrets.GH_USER_TOKEN }}
diff --git a/tools/actions/create-release.sh b/tools/actions/create-release.sh
index 3a69b3f5602ffc..e3cfd76952a18b 100755
--- a/tools/actions/create-release.sh
+++ b/tools/actions/create-release.sh
@@ -2,6 +2,9 @@
 
 set -xe
 
+GITHUB_REPOSITORY=${GITHUB_REPOSITORY:-nodejs/node}
+BOT_TOKEN=${BOT_TOKEN:-}
+
 RELEASE_DATE=$1
 RELEASE_LINE=$2
 
@@ -10,24 +13,114 @@ if [ -z "$RELEASE_DATE" ] || [ -z "$RELEASE_LINE" ]; then
   exit 1
 fi
 
+if [ -z "$GITHUB_REPOSITORY" ] || [ -z "$BOT_TOKEN" ]; then
+  echo "Invalid value in env for GITHUB_REPOSITORY and BOT_TOKEN"
+  exit 1
+fi
+
+if ! command -v node || ! command -v gh || ! command -v git || ! command -v awk; then
+  echo "Missing required dependencies"
+  exit 1
+fi
+
 git node release --prepare --skipBranchDiff --yes --releaseDate "$RELEASE_DATE"
-# We use it to not specify the branch name as it changes based on
-# the commit list (semver-minor/semver-patch)
-git config push.default current
-git push
+
+HEAD_BRANCH="$(git rev-parse --abbrev-ref HEAD)"
+HEAD_SHA="$(git rev-parse HEAD^)"
 
 TITLE=$(awk "/^## ${RELEASE_DATE}/ { print substr(\$0, 4) }" "doc/changelogs/CHANGELOG_V${RELEASE_LINE}.md")
 
 # Use a temporary file for the PR body
 TEMP_BODY="$(awk "/## ${RELEASE_DATE}/,/^<a id=/{ if (!/^<a id=/) print }" "doc/changelogs/CHANGELOG_V${RELEASE_LINE}.md")"
 
-PR_URL="$(gh pr create --title "$TITLE" --body "$TEMP_BODY" --base "v$RELEASE_LINE.x")"
+# Create the proposal branch
+gh api \
+  --method POST \
+  -H "Accept: application/vnd.github+json" \
+  -H "X-GitHub-Api-Version: 2022-11-28" \
+  "/repos/${GITHUB_REPOSITORY}/git/refs" \
+   -f "ref=refs/heads/$HEAD_BRANCH" -f "sha=$HEAD_SHA"
+
+# Create the proposal PR
+PR_URL="$(gh api \
+  --method POST \
+  --jq .html_url \
+  -H "Accept: application/vnd.github+json" \
+  -H "X-GitHub-Api-Version: 2022-11-28" \
+  "/repos/${GITHUB_REPOSITORY}/pulls" \
+   -f "title=$TITLE" -f "body=$TEMP_BODY" -f "head=$HEAD_BRANCH" -f "base=v$RELEASE_LINE.x")"
 
-# Amend commit message so it contains the correct PR-URL trailer.
-AMENDED_COMMIT_MSG="$(git log -1 --pretty=%B | sed "s|PR-URL: TODO|PR-URL: $PR_URL|")"
+# Push the release commit to the proposal branch using `BOT_TOKEN` from the env
+node --input-type=module - \
+    "$GITHUB_REPOSITORY" \
+    "$HEAD_BRANCH" \
+    "$HEAD_SHA" \
+    "$(git log -1 HEAD --format=%s || true)" \
+    "$(git log -1 HEAD --format=%b | awk -v PR_URL="$PR_URL" '{sub(/^PR-URL: TODO$/, "PR-URL: " PR_URL)} 1' || true)" \
+    "$(git show HEAD --diff-filter=d --name-only --format= || true)" \
+    "$(git show HEAD --diff-filter=D --name-only --format= || true)" \
+<<'EOF'
+const [,,
+  repo,
+  branch,
+  parentCommitSha,
+  commit_title,
+  commit_body,
+  modifiedOrAddedFiles,
+  deletedFiles,
+] = process.argv;
 
-# Replace "TODO" with the PR URL in the last commit
-git commit --amend --no-edit -m "$AMENDED_COMMIT_MSG" || true
+import { readFileSync } from 'node:fs';
+import util from 'node:util';
 
-# Force-push the amended commit
-git push --force
+const query = `
+mutation ($repo: String!, $branch: String!, $parentCommitSha: GitObjectID!, $changes: FileChanges!, $commit_title: String!, $commit_body: String) {
+  createCommitOnBranch(input: {
+    branch: {
+      repositoryNameWithOwner: $repo,
+      branchName: $branch
+    },
+    message: {
+      headline: $commit_title,
+      body: $commit_body
+    },
+    expectedHeadOid: $parentCommitSha,
+    fileChanges: $changes
+  }) {
+    commit {
+      url
+    }
+  }
+}
+`;
+const response = await fetch('https://api.github.com/graphql', {
+  method: 'POST',
+  headers: {
+    'Authorization': `bearer ${process.env.BOT_TOKEN}`,
+  },
+  body: JSON.stringify({
+    query,
+    variables: {
+      repo,
+      branch,
+      parentCommitSha,
+      commit_title,
+      commit_body,
+      changes: {
+        additions: modifiedOrAddedFiles.split('\n').filter(Boolean)
+          .map(path => ({ path, contents: readFileSync(path).toString('base64') })),
+        deletions: deletedFiles.split('\n').filter(Boolean).map(path => ({ path })),
+      }
+    },
+  })
+});
+if (!response.ok) {
+  console.log({statusCode: response.status, statusText: response.statusText});
+  process.exitCode ||= 1;
+}
+const data = await response.json();
+if (data.errors?.length) {
+  throw new Error('Endpoint returned an error', { cause: data });
+}
+console.log(util.inspect(data, { depth: Infinity }));
+EOF

From c2fa359f7af5c7de340670a3208087d90c3de8d8 Mon Sep 17 00:00:00 2001
From: Ruy Adorno <ruy@vlt.sh>
Date: Thu, 5 Dec 2024 16:43:08 -0500
Subject: [PATCH 167/216] doc: mention `-a` flag for the release script

Document that running the `./tools/release.sh` script with the recently
added `-a` CLI flag enables the previously default interactive interface
for selecting the correct PGP key.

PR-URL: https://github.com/nodejs/node/pull/56124
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
---
 doc/contributing/releases.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/contributing/releases.md b/doc/contributing/releases.md
index 88e1be5538d76e..055290263388b5 100644
--- a/doc/contributing/releases.md
+++ b/doc/contributing/releases.md
@@ -966,6 +966,13 @@ a `NODEJS_RELEASE_HOST` environment variable:
 NODEJS_RELEASE_HOST=proxy.xyz ./tools/release.sh
 ```
 
+In case `gpg` is unable to autoselect a key, you can retry using the
+`-a` option to enable an interactive interface:
+
+```bash
+./tools/release.sh -a
+```
+
 > \[!TIP]
 > Sometimes, due to machines being overloaded or other external factors,
 > the files at <https://nodejs.org/dist/index.json>, <https://nodejs.org/dist/index.tab>

From 21e21a270eec9282a7a9737e6a7ad4c1e97ae497 Mon Sep 17 00:00:00 2001
From: Luigi Pinca <luigipinca@gmail.com>
Date: Fri, 6 Dec 2024 07:42:07 +0100
Subject: [PATCH 168/216] test: remove test-fs-utimes flaky designation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The issue is likely the same as other tests that time out.

Refs: https://github.com/nodejs/node/issues/54918
Refs: https://github.com/nodejs/node/pull/54844
Refs: https://github.com/nodejs/node/pull/54802
PR-URL: https://github.com/nodejs/node/pull/56052
Reviewed-By: Michaël Zasso <targos@protonmail.com>
---
 test/parallel/parallel.status | 2 --
 1 file changed, 2 deletions(-)

diff --git a/test/parallel/parallel.status b/test/parallel/parallel.status
index ba6f4cbbec1519..ae1763d1ee4eda 100644
--- a/test/parallel/parallel.status
+++ b/test/parallel/parallel.status
@@ -42,8 +42,6 @@ test-performance-function: PASS, FLAKY
 test-esm-loader-hooks-inspect-wait: PASS, FLAKY
 # https://github.com/nodejs/node/issues/54534
 test-runner-run-watch: PASS, FLAKY
-# https://github.com/nodejs/node/issues/37692
-test-fs-utimes: PASS, FLAKY
 
 [$system==linux || $system==win32]
 # https://github.com/nodejs/node/issues/49605

From ac57dadd9a3cbf825d30e092595fed97b9ab38a8 Mon Sep 17 00:00:00 2001
From: Taejin Kim <60560836+kimtaejin3@users.noreply.github.com>
Date: Fri, 6 Dec 2024 15:53:06 +0900
Subject: [PATCH 169/216] lib: add validation for options in compileFunction
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/56023
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
---
 lib/vm.js                      |  1 +
 test/parallel/test-vm-basic.js | 25 ++++++++++++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)
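
Note: a quick way to observe the new check from user code is sketched below.
`tryCompile` is a hypothetical helper, not part of `node:vm`. On a build that
includes this patch, the added tests assert that the reported `code` is
`ERR_INVALID_ARG_TYPE`; the value reported by builds without the patch is not
covered here.

```javascript
const vm = require('node:vm');

// Hypothetical helper: calls vm.compileFunction() with the given options
// and reports whether, and how, the call failed.
function tryCompile(options) {
  try {
    vm.compileFunction('', [], options);
    return { threw: false };
  } catch (err) {
    return { threw: true, name: err.name, code: err.code };
  }
}

console.log(tryCompile({}));    // a plain object is accepted
console.log(tryCompile(null));  // null is rejected
```

Checking `err.code` rather than the message text is the more stable choice,
since messages may be reworded between releases.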

diff --git a/lib/vm.js b/lib/vm.js
index cb64404ef0bee2..a12e932d8d7794 100644
--- a/lib/vm.js
+++ b/lib/vm.js
@@ -319,6 +319,7 @@ function runInThisContext(code, options) {
 
 function compileFunction(code, params, options = kEmptyObject) {
   validateString(code, 'code');
+  validateObject(options, 'options');
   if (params !== undefined) {
     validateStringArray(params, 'params');
   }
diff --git a/test/parallel/test-vm-basic.js b/test/parallel/test-vm-basic.js
index f2424128b66e9f..93c3fbaea631ab 100644
--- a/test/parallel/test-vm-basic.js
+++ b/test/parallel/test-vm-basic.js
@@ -172,7 +172,30 @@ const vm = require('vm');
       'Received null'
   });
 
-  // vm.compileFunction('', undefined, null);
+  // Test for invalid options type
+  assert.throws(() => {
+    vm.compileFunction('', [], null);
+  }, {
+    name: 'TypeError',
+    code: 'ERR_INVALID_ARG_TYPE',
+    message: 'The "options" argument must be of type object. Received null'
+  });
+
+  assert.throws(() => {
+    vm.compileFunction('', [], 'string');
+  }, {
+    name: 'TypeError',
+    code: 'ERR_INVALID_ARG_TYPE',
+    message: 'The "options" argument must be of type object. Received type string (\'string\')'
+  });
+
+  assert.throws(() => {
+    vm.compileFunction('', [], 123);
+  }, {
+    name: 'TypeError',
+    code: 'ERR_INVALID_ARG_TYPE',
+    message: 'The "options" argument must be of type object. Received type number (123)'
+  });
 
   const optionTypes = {
     'filename': 'string',

From cca7c518ded80547c3c53c1975a4af17ed09e6c5 Mon Sep 17 00:00:00 2001
From: Mert Can Altin <mertgold60@gmail.com>
Date: Tue, 3 Dec 2024 14:32:01 +0300
Subject: [PATCH 170/216] util: add fast path for Latin1 decoding

PR-URL: https://github.com/nodejs/node/pull/55275
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
---
 benchmark/util/text-decoder.js       |   2 +-
 lib/internal/encoding.js             |  10 +-
 src/encoding_binding.cc              |  46 ++++++++
 src/encoding_binding.h               |   1 +
 test/cctest/test_encoding_binding.cc | 155 +++++++++++++++++++++++++++
 5 files changed, 212 insertions(+), 2 deletions(-)
 create mode 100644 test/cctest/test_encoding_binding.cc
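
Note: the `length * 2` output buffer in `DecodeLatin1` reflects that a Latin1
byte expands to at most two UTF-8 bytes. The sketch below shows that
conversion in plain JavaScript under the assumption that every byte maps to
the Unicode code point of the same value (ISO-8859-1). `latin1ToUtf8` is a
hypothetical name, and the code illustrates the idea only; it is not how
simdutf implements it.

```javascript
// Hypothetical pure-JavaScript counterpart of a Latin1 -> UTF-8 conversion.
function latin1ToUtf8(bytes) {
  // Worst case: every input byte is >= 0x80 and needs two output bytes.
  const out = new Uint8Array(bytes.length * 2);
  let written = 0;
  for (const byte of bytes) {
    if (byte < 0x80) {
      // ASCII bytes are copied unchanged.
      out[written++] = byte;
    } else {
      // Code points 0x80..0xFF become a two-byte sequence:
      // a 110000xx leading byte followed by a 10xxxxxx continuation byte.
      out[written++] = 0xC0 | (byte >> 6);
      out[written++] = 0x80 | (byte & 0x3F);
    }
  }
  // Return only the bytes that were written.
  return out.subarray(0, written);
}

// Usage: the same three bytes the cctest uses as valid input.
const utf8 = latin1ToUtf8(Uint8Array.of(0xC1, 0xE9, 0xF3));
console.log(Buffer.from(utf8).toString('utf8'));
```

Bytes in the 0x80 to 0x9F range are where ISO-8859-1 and windows-1252
disagree, so this sketch should not be read as a windows-1252 decoder.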

diff --git a/benchmark/util/text-decoder.js b/benchmark/util/text-decoder.js
index dd4f02016df077..1aa60f2dd0bcd6 100644
--- a/benchmark/util/text-decoder.js
+++ b/benchmark/util/text-decoder.js
@@ -3,7 +3,7 @@
 const common = require('../common.js');
 
 const bench = common.createBenchmark(main, {
-  encoding: ['utf-8', 'latin1', 'iso-8859-3'],
+  encoding: ['utf-8', 'windows-1252', 'iso-8859-3'],
   ignoreBOM: [0, 1],
   fatal: [0, 1],
   len: [256, 1024 * 16, 1024 * 128],
diff --git a/lib/internal/encoding.js b/lib/internal/encoding.js
index 252eaa75fac22b..b2ca3c612bf6ef 100644
--- a/lib/internal/encoding.js
+++ b/lib/internal/encoding.js
@@ -29,6 +29,7 @@ const kDecoder = Symbol('decoder');
 const kEncoder = Symbol('encoder');
 const kFatal = Symbol('kFatal');
 const kUTF8FastPath = Symbol('kUTF8FastPath');
+const kLatin1FastPath = Symbol('kLatin1FastPath');
 const kIgnoreBOM = Symbol('kIgnoreBOM');
 
 const {
@@ -55,6 +56,7 @@ const {
   encodeIntoResults,
   encodeUtf8String,
   decodeUTF8,
+  decodeLatin1,
 } = binding;
 
 const { Buffer } = require('buffer');
@@ -419,9 +421,10 @@ function makeTextDecoderICU() {
       this[kFatal] = Boolean(options?.fatal);
       // Only support fast path for UTF-8.
       this[kUTF8FastPath] = enc === 'utf-8';
+      this[kLatin1FastPath] = enc === 'windows-1252';
       this[kHandle] = undefined;
 
-      if (!this[kUTF8FastPath]) {
+      if (!this[kUTF8FastPath] && !this[kLatin1FastPath]) {
         this.#prepareConverter();
       }
     }
@@ -438,11 +441,16 @@ function makeTextDecoderICU() {
       validateDecoder(this);
 
       this[kUTF8FastPath] &&= !(options?.stream);
+      this[kLatin1FastPath] &&= !(options?.stream);
 
       if (this[kUTF8FastPath]) {
         return decodeUTF8(input, this[kIgnoreBOM], this[kFatal]);
       }
 
+      if (this[kLatin1FastPath]) {
+        return decodeLatin1(input, this[kIgnoreBOM], this[kFatal]);
+      }
+
       this.#prepareConverter();
 
       validateObject(options, 'options', kValidateObjectAllowObjectsAndNull);
diff --git a/src/encoding_binding.cc b/src/encoding_binding.cc
index 97ddd59fb661c8..a132eeb62306c6 100644
--- a/src/encoding_binding.cc
+++ b/src/encoding_binding.cc
@@ -1,6 +1,7 @@
 #include "encoding_binding.h"
 #include "ada.h"
 #include "env-inl.h"
+#include "node_buffer.h"
 #include "node_errors.h"
 #include "node_external_reference.h"
 #include "simdutf.h"
@@ -226,6 +227,7 @@ void BindingData::CreatePerIsolateProperties(IsolateData* isolate_data,
   SetMethodNoSideEffect(isolate, target, "decodeUTF8", DecodeUTF8);
   SetMethodNoSideEffect(isolate, target, "toASCII", ToASCII);
   SetMethodNoSideEffect(isolate, target, "toUnicode", ToUnicode);
+  SetMethodNoSideEffect(isolate, target, "decodeLatin1", DecodeLatin1);
 }
 
 void BindingData::CreatePerContextProperties(Local<Object> target,
@@ -243,6 +245,50 @@ void BindingData::RegisterTimerExternalReferences(
   registry->Register(DecodeUTF8);
   registry->Register(ToASCII);
   registry->Register(ToUnicode);
+  registry->Register(DecodeLatin1);
+}
+
+void BindingData::DecodeLatin1(const FunctionCallbackInfo<Value>& args) {
+  Environment* env = Environment::GetCurrent(args);
+
+  CHECK_GE(args.Length(), 1);
+  if (!(args[0]->IsArrayBuffer() || args[0]->IsSharedArrayBuffer() ||
+        args[0]->IsArrayBufferView())) {
+    return node::THROW_ERR_INVALID_ARG_TYPE(
+        env->isolate(),
+        "The \"input\" argument must be an instance of ArrayBuffer, "
+        "SharedArrayBuffer, or ArrayBufferView.");
+  }
+
+  bool ignore_bom = args[1]->IsTrue();
+  bool has_fatal = args[2]->IsTrue();
+
+  ArrayBufferViewContents<uint8_t> buffer(args[0]);
+  const uint8_t* data = buffer.data();
+  size_t length = buffer.length();
+
+  if (ignore_bom && length > 0 && data[0] == 0xFF) {
+    data++;
+    length--;
+  }
+
+  if (length == 0) {
+    return args.GetReturnValue().SetEmptyString();
+  }
+
+  std::string result(length * 2, '\0');
+
+  size_t written = simdutf::convert_latin1_to_utf8(
+      reinterpret_cast<const char*>(data), length, result.data());
+
+  if (has_fatal && written == 0) {
+    return node::THROW_ERR_ENCODING_INVALID_ENCODED_DATA(
+        env->isolate(), "The encoded data was not valid for encoding latin1");
+  }
+
+  Local<Object> buffer_result =
+      node::Buffer::Copy(env, result.c_str(), written).ToLocalChecked();
+  args.GetReturnValue().Set(buffer_result);
 }
 
 }  // namespace encoding_binding
diff --git a/src/encoding_binding.h b/src/encoding_binding.h
index 2690cb74f8a05b..97f55394d27641 100644
--- a/src/encoding_binding.h
+++ b/src/encoding_binding.h
@@ -31,6 +31,7 @@ class BindingData : public SnapshotableObject {
   static void EncodeInto(const v8::FunctionCallbackInfo<v8::Value>& args);
   static void EncodeUtf8String(const v8::FunctionCallbackInfo<v8::Value>& args);
   static void DecodeUTF8(const v8::FunctionCallbackInfo<v8::Value>& args);
+  static void DecodeLatin1(const v8::FunctionCallbackInfo<v8::Value>& args);
 
   static void ToASCII(const v8::FunctionCallbackInfo<v8::Value>& args);
   static void ToUnicode(const v8::FunctionCallbackInfo<v8::Value>& args);
diff --git a/test/cctest/test_encoding_binding.cc b/test/cctest/test_encoding_binding.cc
new file mode 100644
index 00000000000000..06cc36d8f6ae34
--- /dev/null
+++ b/test/cctest/test_encoding_binding.cc
@@ -0,0 +1,155 @@
+#include "encoding_binding.h"
+#include "env-inl.h"
+#include "gtest/gtest.h"
+#include "node_test_fixture.h"
+#include "v8.h"
+
+namespace node {
+namespace encoding_binding {
+
+bool RunDecodeLatin1(Environment* env,
+                     Local<Value> args[],
+                     bool ignore_bom,
+                     bool has_fatal,
+                     Local<Value>* result) {
+  Isolate* isolate = env->isolate();
+  TryCatch try_catch(isolate);
+
+  Local<Boolean> ignoreBOMValue = Boolean::New(isolate, ignore_bom);
+  Local<Boolean> fatalValue = Boolean::New(isolate, has_fatal);
+
+  Local<Value> updatedArgs[] = {args[0], ignoreBOMValue, fatalValue};
+
+  BindingData::DecodeLatin1(FunctionCallbackInfo<Value>(updatedArgs));
+
+  if (try_catch.HasCaught()) {
+    return false;
+  }
+
+  *result = try_catch.Exception();
+  return true;
+}
+
+class EncodingBindingTest : public NodeTestFixture {};
+
+TEST_F(EncodingBindingTest, DecodeLatin1_ValidInput) {
+  Environment* env = CreateEnvironment();
+  Isolate* isolate = env->isolate();
+  HandleScope handle_scope(isolate);
+
+  const uint8_t latin1_data[] = {0xC1, 0xE9, 0xF3};
+  Local<ArrayBuffer> ab = ArrayBuffer::New(isolate, sizeof(latin1_data));
+  memcpy(ab->GetBackingStore()->Data(), latin1_data, sizeof(latin1_data));
+
+  Local<Uint8Array> array = Uint8Array::New(ab, 0, sizeof(latin1_data));
+  Local<Value> args[] = {array};
+
+  Local<Value> result;
+  EXPECT_TRUE(RunDecodeLatin1(env, args, false, false, &result));
+
+  String::Utf8Value utf8_result(isolate, result);
+  EXPECT_STREQ(*utf8_result, "Áéó");
+}
+
+TEST_F(EncodingBindingTest, DecodeLatin1_EmptyInput) {
+  Environment* env = CreateEnvironment();
+  Isolate* isolate = env->isolate();
+  HandleScope handle_scope(isolate);
+
+  Local<ArrayBuffer> ab = ArrayBuffer::New(isolate, 0);
+  Local<Uint8Array> array = Uint8Array::New(ab, 0, 0);
+  Local<Value> args[] = {array};
+
+  Local<Value> result;
+  EXPECT_TRUE(RunDecodeLatin1(env, args, false, false, &result));
+
+  String::Utf8Value utf8_result(isolate, result);
+  EXPECT_STREQ(*utf8_result, "");
+}
+
+TEST_F(EncodingBindingTest, DecodeLatin1_InvalidInput) {
+  Environment* env = CreateEnvironment();
+  Isolate* isolate = env->isolate();
+  HandleScope handle_scope(isolate);
+
+  Local<Value> args[] = {String::NewFromUtf8Literal(isolate, "Invalid input")};
+
+  Local<Value> result;
+  EXPECT_FALSE(RunDecodeLatin1(env, args, false, false, &result));
+}
+
+TEST_F(EncodingBindingTest, DecodeLatin1_IgnoreBOM) {
+  Environment* env = CreateEnvironment();
+  Isolate* isolate = env->isolate();
+  HandleScope handle_scope(isolate);
+
+  const uint8_t latin1_data[] = {0xFE, 0xFF, 0xC1, 0xE9, 0xF3};
+  Local<ArrayBuffer> ab = ArrayBuffer::New(isolate, sizeof(latin1_data));
+  memcpy(ab->GetBackingStore()->Data(), latin1_data, sizeof(latin1_data));
+
+  Local<Uint8Array> array = Uint8Array::New(ab, 0, sizeof(latin1_data));
+  Local<Value> args[] = {array};
+
+  Local<Value> result;
+  EXPECT_TRUE(RunDecodeLatin1(env, args, true, false, &result));
+
+  String::Utf8Value utf8_result(isolate, result);
+  EXPECT_STREQ(*utf8_result, "Áéó");
+}
+
+TEST_F(EncodingBindingTest, DecodeLatin1_FatalInvalidInput) {
+  Environment* env = CreateEnvironment();
+  Isolate* isolate = env->isolate();
+  HandleScope handle_scope(isolate);
+
+  const uint8_t invalid_data[] = {0xFF, 0xFF, 0xFF};
+  Local<ArrayBuffer> ab = ArrayBuffer::New(isolate, sizeof(invalid_data));
+  memcpy(ab->GetBackingStore()->Data(), invalid_data, sizeof(invalid_data));
+
+  Local<Uint8Array> array = Uint8Array::New(ab, 0, sizeof(invalid_data));
+  Local<Value> args[] = {array};
+
+  Local<Value> result;
+  EXPECT_FALSE(RunDecodeLatin1(env, args, false, true, &result));
+}
+
+TEST_F(EncodingBindingTest, DecodeLatin1_IgnoreBOMAndFatal) {
+  Environment* env = CreateEnvironment();
+  Isolate* isolate = env->isolate();
+  HandleScope handle_scope(isolate);
+
+  const uint8_t latin1_data[] = {0xFE, 0xFF, 0xC1, 0xE9, 0xF3};
+  Local<ArrayBuffer> ab = ArrayBuffer::New(isolate, sizeof(latin1_data));
+  memcpy(ab->GetBackingStore()->Data(), latin1_data, sizeof(latin1_data));
+
+  Local<Uint8Array> array = Uint8Array::New(ab, 0, sizeof(latin1_data));
+  Local<Value> args[] = {array};
+
+  Local<Value> result;
+  EXPECT_TRUE(RunDecodeLatin1(env, args, true, true, &result));
+
+  String::Utf8Value utf8_result(isolate, result);
+  EXPECT_STREQ(*utf8_result, "Áéó");
+}
+
+TEST_F(EncodingBindingTest, DecodeLatin1_BOMPresent) {
+  Environment* env = CreateEnvironment();
+  Isolate* isolate = env->isolate();
+  HandleScope handle_scope(isolate);
+
+  const uint8_t latin1_data[] = {0xFF, 0xC1, 0xE9, 0xF3};
+  Local<ArrayBuffer> ab = ArrayBuffer::New(isolate, sizeof(latin1_data));
+  memcpy(ab->GetBackingStore()->Data(), latin1_data, sizeof(latin1_data));
+
+  Local<Uint8Array> array = Uint8Array::New(ab, 0, sizeof(latin1_data));
+  Local<Value> args[] = {array};
+
+  Local<Value> result;
+  EXPECT_TRUE(RunDecodeLatin1(env, args, true, false, &result));
+
+  String::Utf8Value utf8_result(isolate, result);
+  EXPECT_STREQ(*utf8_result, "Áéó");
+}
+
+}  // namespace encoding_binding
+}  // namespace node

From 026f0198c83d4158afc7125b9fb9b4637d16be38 Mon Sep 17 00:00:00 2001
From: Ruy Adorno <ruy@vlt.sh>
Date: Fri, 6 Dec 2024 09:44:53 -0500
Subject: [PATCH 171/216] doc: update blog release-post link

PR-URL: https://github.com/nodejs/node/pull/56123
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
---
 doc/contributing/releases.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/contributing/releases.md b/doc/contributing/releases.md
index 055290263388b5..90bb324d83ad6e 100644
--- a/doc/contributing/releases.md
+++ b/doc/contributing/releases.md
@@ -1442,5 +1442,5 @@ Typical resolution: sign the release again.
 [`create-release-proposal`]: https://github.com/nodejs/node/actions/workflows/create-release-proposal.yml
 [build-infra team]: https://github.com/orgs/nodejs/teams/build-infra
 [expected assets]: https://github.com/nodejs/build/tree/HEAD/ansible/www-standalone/tools/promote/expected_assets
-[nodejs.org release-post.js script]: https://github.com/nodejs/nodejs.org/blob/HEAD/scripts/release-post/index.mjs
+[nodejs.org release-post.js script]: https://github.com/nodejs/nodejs.org/blob/HEAD/apps/site/scripts/release-post/index.mjs
 [nodejs.org repository]: https://github.com/nodejs/nodejs.org

From ff2eec7275f0c564ea1846324370ae28f22f6c8f Mon Sep 17 00:00:00 2001
From: Joyee Cheung <joyeec9h3@gmail.com>
Date: Sat, 7 Dec 2024 18:25:02 +0100
Subject: [PATCH 172/216] sea: only assert snapshot main function for main
 threads

Snapshot main functions are only loaded for main threads in single
executable applications. Update the check to avoid asserting it
in worker threads - this allows worker threads to be spawned in
snapshot main functions bundled into a single executable
application.

PR-URL: https://github.com/nodejs/node/pull/56120
Fixes: https://github.com/nodejs/node/issues/56077
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
---
 src/node.cc                                   |  6 +-
 ...-executable-application-snapshot-worker.js | 80 +++++++++++++++++++
 2 files changed, 85 insertions(+), 1 deletion(-)
 create mode 100644 test/sequential/test-single-executable-application-snapshot-worker.js

diff --git a/src/node.cc b/src/node.cc
index 9f6f8e53abd7e4..1fc236d88e53eb 100644
--- a/src/node.cc
+++ b/src/node.cc
@@ -355,7 +355,8 @@ MaybeLocal<Value> StartExecution(Environment* env, StartExecutionCallback cb) {
   CHECK(!env->isolate_data()->is_building_snapshot());
 
 #ifndef DISABLE_SINGLE_EXECUTABLE_APPLICATION
-  if (sea::IsSingleExecutable()) {
+  // Snapshot in SEA is only loaded for the main thread.
+  if (sea::IsSingleExecutable() && env->is_main_thread()) {
     sea::SeaResource sea = sea::FindSingleExecutableResource();
     // The SEA preparation blob building process should already enforce this,
     // this check is just here to guard against the unlikely case where
@@ -374,6 +375,9 @@ MaybeLocal<Value> StartExecution(Environment* env, StartExecutionCallback cb) {
   // move the pre-execution part into a different file that can be
   // reused when dealing with user-defined main functions.
   if (!env->snapshot_deserialize_main().IsEmpty()) {
+    // Custom worker snapshot is not supported yet,
+    // so workers can't have deserialize main functions.
+    CHECK(env->is_main_thread());
     return env->RunSnapshotDeserializeMain();
   }
 
diff --git a/test/sequential/test-single-executable-application-snapshot-worker.js b/test/sequential/test-single-executable-application-snapshot-worker.js
new file mode 100644
index 00000000000000..50c77743573a44
--- /dev/null
+++ b/test/sequential/test-single-executable-application-snapshot-worker.js
@@ -0,0 +1,80 @@
+'use strict';
+
+require('../common');
+
+const {
+  generateSEA,
+  skipIfSingleExecutableIsNotSupported,
+} = require('../common/sea');
+
+skipIfSingleExecutableIsNotSupported();
+
+// This tests that worker threads can be spawned from a snapshot main function
+// in single executable applications.
+
+const tmpdir = require('../common/tmpdir');
+const { writeFileSync, existsSync } = require('fs');
+const {
+  spawnSyncAndAssert, spawnSyncAndExitWithoutError,
+} = require('../common/child_process');
+const assert = require('assert');
+
+const configFile = tmpdir.resolve('sea-config.json');
+const seaPrepBlob = tmpdir.resolve('sea-prep.blob');
+const outputFile = tmpdir.resolve(process.platform === 'win32' ? 'sea.exe' : 'sea');
+
+{
+  tmpdir.refresh();
+
+  // FIXME(joyeecheung): currently `worker_threads` cannot be loaded during the
+  // snapshot building process because internal/worker.js accesses isMainThread at
+  // the top level (and there may be more code that accesses such state at the top
+  // level), so it has to be loaded in the deserialized snapshot main function.
+  // Change this state to be accessed on demand.
+  const code = `
+  const {
+    setDeserializeMainFunction,
+  } = require('v8').startupSnapshot;
+  setDeserializeMainFunction(() => {
+    const { Worker } = require('worker_threads');
+    new Worker("console.log('Hello from Worker')", { eval: true });
+  });
+  `;
+
+  writeFileSync(tmpdir.resolve('snapshot.js'), code, 'utf-8');
+  writeFileSync(configFile, `
+  {
+    "main": "snapshot.js",
+    "output": "sea-prep.blob",
+    "useSnapshot": true
+  }
+  `);
+
+  spawnSyncAndExitWithoutError(
+    process.execPath,
+    ['--experimental-sea-config', 'sea-config.json'],
+    {
+      cwd: tmpdir.path,
+      env: {
+        NODE_DEBUG_NATIVE: 'SEA',
+        ...process.env,
+      },
+    });
+
+  assert(existsSync(seaPrepBlob));
+
+  generateSEA(outputFile, process.execPath, seaPrepBlob);
+
+  spawnSyncAndAssert(
+    outputFile,
+    {
+      env: {
+        NODE_DEBUG_NATIVE: 'SEA',
+        ...process.env,
+      }
+    },
+    {
+      trim: true,
+      stdout: 'Hello from Worker'
+    }
+  );
+}

From b3d40e3be539eea99473ef04233f8613bf395b83 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Sun, 8 Dec 2024 20:34:31 +0100
Subject: [PATCH 173/216] tools: improve release proposal PR opening
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Open as draft. The releaser should review the PR and mark it as ready.
- Add the "release" label.
- Assign the releaser to the PR so it's clearer who's in charge and
  they can find it more easily. This will also notify and subscribe
  them to the PR.

PR-URL: https://github.com/nodejs/node/pull/56161
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
---
 .github/workflows/create-release-proposal.yml | 2 +-
 tools/actions/create-release.sh               | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/create-release-proposal.yml b/.github/workflows/create-release-proposal.yml
index d3ffa3ad49b5e2..de8e91d98a3557 100644
--- a/.github/workflows/create-release-proposal.yml
+++ b/.github/workflows/create-release-proposal.yml
@@ -78,7 +78,7 @@ jobs:
         run: |
           git update-index --assume-unchanged tools/actions/create-release.sh
           curl -fsSLo tools/actions/create-release.sh https://github.com/${GITHUB_REPOSITORY}/raw/${GITHUB_SHA}/tools/actions/create-release.sh
-          ./tools/actions/create-release.sh "${RELEASE_DATE}" "${RELEASE_LINE}"
+          ./tools/actions/create-release.sh "${RELEASE_DATE}" "${RELEASE_LINE}" "${GITHUB_ACTOR}"
         env:
           GH_TOKEN: ${{ github.token }}
           # We want the bot to push the push the release commit so CI runs on it.
diff --git a/tools/actions/create-release.sh b/tools/actions/create-release.sh
index e3cfd76952a18b..17878ff2e40d86 100755
--- a/tools/actions/create-release.sh
+++ b/tools/actions/create-release.sh
@@ -7,6 +7,7 @@ BOT_TOKEN=${BOT_TOKEN:-}
 
 RELEASE_DATE=$1
 RELEASE_LINE=$2
+RELEASER=$3
 
 if [ -z "$RELEASE_DATE" ] || [ -z "$RELEASE_LINE" ]; then
   echo "Usage: $0 <RELEASE_DATE> <RELEASE_LINE>"
@@ -48,7 +49,7 @@ PR_URL="$(gh api \
   -H "Accept: application/vnd.github+json" \
   -H "X-GitHub-Api-Version: 2022-11-28" \
   "/repos/${GITHUB_REPOSITORY}/pulls" \
-   -f "title=$TITLE" -f "body=$TEMP_BODY" -f "head=$HEAD_BRANCH" -f "base=v$RELEASE_LINE.x")"
+   -f "title=$TITLE" -f "body=$TEMP_BODY" -f "head=$HEAD_BRANCH" -f "base=v$RELEASE_LINE.x" -f draft=true)"
 
 # Push the release commit to the proposal branch using `BOT_TOKEN` from the env
 node --input-type=module - \
@@ -124,3 +125,5 @@ if (data.errors?.length) {
 }
 console.log(util.inspect(data, { depth: Infinity }));
 EOF
+
+gh pr edit "$PR_URL" --add-label release --add-assignee "$RELEASER"

From 3ea738fc26457b7d960c395a6ec3c1965cbb701c Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Mon, 9 Dec 2024 11:51:36 +0100
Subject: [PATCH 174/216] test: remove `hasOpenSSL3x` utils
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In favor of `hasOpenSSL`.

PR-URL: https://github.com/nodejs/node/pull/56164
Refs: https://github.com/nodejs/node/pull/56160/files#r1874118863
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
---
 test/common/index.js                               | 8 --------
 test/parallel/test-https-agent-session-eviction.js | 2 +-
 test/parallel/test-tls-alert.js                    | 2 +-
 test/parallel/test-tls-empty-sni-context.js        | 2 +-
 test/parallel/test-tls-getprotocol.js              | 4 ++--
 test/parallel/test-tls-min-max-version.js          | 2 +-
 test/parallel/test-tls-psk-circuit.js              | 4 ++--
 test/parallel/test-tls-session-cache.js            | 2 +-
 8 files changed, 9 insertions(+), 17 deletions(-)
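
Note: the call sites below switch from fixed getters (`hasOpenSSL31`, `hasOpenSSL32`) to a parameterized `hasOpenSSL(major, minor)` check. The body of `hasOpenSSL` is not part of this patch, so the following is only a minimal sketch of an "at least this version" comparison; `parseVersion` and `hasAtLeast` are hypothetical names, and a real helper would presumably read the version from `process.versions.openssl`.

```javascript
'use strict';

// Hypothetical helper: turns a string such as '3.2.1' or '3.0.13+quic'
// into numeric components. Missing or non-numeric parts count as 0.
function parseVersion(version) {
  const [major = 0, minor = 0, patch = 0] = version
    .split('.')
    .map((part) => Number.parseInt(part, 10) || 0);
  return { major, minor, patch };
}

// Hypothetical stand-in for a parameterized version check: returns true
// when `current` is at least major.minor.patch.
function hasAtLeast(current, major = 0, minor = 0, patch = 0) {
  const v = parseVersion(current);
  if (v.major !== major) return v.major > major;
  if (v.minor !== minor) return v.minor > minor;
  return v.patch >= patch;
}

// Mirrors the pattern used in the tests touched by this patch.
const ciphers = hasAtLeast('3.1.4', 3, 1) ? 'DEFAULT:@SECLEVEL=0' : 'DEFAULT';
console.log(ciphers);
```

A single parameterized check avoids adding a new getter for every OpenSSL minor release that changes behavior.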

diff --git a/test/common/index.js b/test/common/index.js
index c36643203746cd..c39488dd0b9819 100644
--- a/test/common/index.js
+++ b/test/common/index.js
@@ -1014,14 +1014,6 @@ const common = {
     return hasOpenSSL(3);
   },
 
-  get hasOpenSSL31() {
-    return hasOpenSSL(3, 1);
-  },
-
-  get hasOpenSSL32() {
-    return hasOpenSSL(3, 2);
-  },
-
   get inFreeBSDJail() {
     if (inFreeBSDJail !== null) return inFreeBSDJail;
 
diff --git a/test/parallel/test-https-agent-session-eviction.js b/test/parallel/test-https-agent-session-eviction.js
index da5600710560b2..e0986e53c1103b 100644
--- a/test/parallel/test-https-agent-session-eviction.js
+++ b/test/parallel/test-https-agent-session-eviction.js
@@ -56,7 +56,7 @@ function faultyServer(port) {
 function second(server, session) {
   const req = https.request({
     port: server.address().port,
-    ciphers: (common.hasOpenSSL31 ? 'DEFAULT:@SECLEVEL=0' : 'DEFAULT'),
+    ciphers: (common.hasOpenSSL(3, 1) ? 'DEFAULT:@SECLEVEL=0' : 'DEFAULT'),
     rejectUnauthorized: false
   }, function(res) {
     res.resume();
diff --git a/test/parallel/test-tls-alert.js b/test/parallel/test-tls-alert.js
index 04000771aa977b..e6aaaedfe59d72 100644
--- a/test/parallel/test-tls-alert.js
+++ b/test/parallel/test-tls-alert.js
@@ -42,7 +42,7 @@ const server = tls.Server({
   cert: loadPEM('agent2-cert')
 }, null).listen(0, common.mustCall(() => {
   const args = ['s_client', '-quiet', '-tls1_1',
-                '-cipher', (common.hasOpenSSL31 ? 'DEFAULT:@SECLEVEL=0' : 'DEFAULT'),
+                '-cipher', (common.hasOpenSSL(3, 1) ? 'DEFAULT:@SECLEVEL=0' : 'DEFAULT'),
                 '-connect', `127.0.0.1:${server.address().port}`];
 
   execFile(common.opensslCli, args, common.mustCall((err, _, stderr) => {
diff --git a/test/parallel/test-tls-empty-sni-context.js b/test/parallel/test-tls-empty-sni-context.js
index 3424e057bdef46..7bfe7ac453b864 100644
--- a/test/parallel/test-tls-empty-sni-context.js
+++ b/test/parallel/test-tls-empty-sni-context.js
@@ -26,7 +26,7 @@ const server = tls.createServer(options, (c) => {
   }, common.mustNotCall());
 
   c.on('error', common.mustCall((err) => {
-    const expectedErr = common.hasOpenSSL32 ?
+    const expectedErr = common.hasOpenSSL(3, 2) ?
       'ERR_SSL_SSL/TLS_ALERT_HANDSHAKE_FAILURE' : 'ERR_SSL_SSLV3_ALERT_HANDSHAKE_FAILURE';
     assert.strictEqual(err.code, expectedErr);
   }));
diff --git a/test/parallel/test-tls-getprotocol.js b/test/parallel/test-tls-getprotocol.js
index 571f400cea5746..a9c8775e2f112f 100644
--- a/test/parallel/test-tls-getprotocol.js
+++ b/test/parallel/test-tls-getprotocol.js
@@ -14,11 +14,11 @@ const clientConfigs = [
   {
     secureProtocol: 'TLSv1_method',
     version: 'TLSv1',
-    ciphers: (common.hasOpenSSL31 ? 'DEFAULT:@SECLEVEL=0' : 'DEFAULT')
+    ciphers: (common.hasOpenSSL(3, 1) ? 'DEFAULT:@SECLEVEL=0' : 'DEFAULT')
   }, {
     secureProtocol: 'TLSv1_1_method',
     version: 'TLSv1.1',
-    ciphers: (common.hasOpenSSL31 ? 'DEFAULT:@SECLEVEL=0' : 'DEFAULT')
+    ciphers: (common.hasOpenSSL(3, 1) ? 'DEFAULT:@SECLEVEL=0' : 'DEFAULT')
   }, {
     secureProtocol: 'TLSv1_2_method',
     version: 'TLSv1.2'
diff --git a/test/parallel/test-tls-min-max-version.js b/test/parallel/test-tls-min-max-version.js
index ab351558a4c8b3..af32468eea6a68 100644
--- a/test/parallel/test-tls-min-max-version.js
+++ b/test/parallel/test-tls-min-max-version.js
@@ -22,7 +22,7 @@ function test(cmin, cmax, cprot, smin, smax, sprot, proto, cerr, serr) {
     if (serr !== 'ERR_SSL_UNSUPPORTED_PROTOCOL')
       ciphers = 'ALL@SECLEVEL=0';
   }
-  if (common.hasOpenSSL31 && cerr === 'ERR_SSL_TLSV1_ALERT_PROTOCOL_VERSION') {
+  if (common.hasOpenSSL(3, 1) && cerr === 'ERR_SSL_TLSV1_ALERT_PROTOCOL_VERSION') {
     ciphers = 'DEFAULT@SECLEVEL=0';
   }
   // Report where test was called from. Strip leading garbage from
diff --git a/test/parallel/test-tls-psk-circuit.js b/test/parallel/test-tls-psk-circuit.js
index 2b49161df8326c..e93db3eb1b4923 100644
--- a/test/parallel/test-tls-psk-circuit.js
+++ b/test/parallel/test-tls-psk-circuit.js
@@ -62,11 +62,11 @@ test({ psk: USERS.UserA, identity: 'UserA' }, { minVersion: 'TLSv1.3' });
 test({ psk: USERS.UserB, identity: 'UserB' });
 test({ psk: USERS.UserB, identity: 'UserB' }, { minVersion: 'TLSv1.3' });
 // Unrecognized user should fail handshake
-const expectedHandshakeErr = common.hasOpenSSL32 ?
+const expectedHandshakeErr = common.hasOpenSSL(3, 2) ?
   'ERR_SSL_SSL/TLS_ALERT_HANDSHAKE_FAILURE' : 'ERR_SSL_SSLV3_ALERT_HANDSHAKE_FAILURE';
 test({ psk: USERS.UserB, identity: 'UserC' }, {}, expectedHandshakeErr);
 // Recognized user but incorrect secret should fail handshake
-const expectedIllegalParameterErr = common.hasOpenSSL32 ?
+const expectedIllegalParameterErr = common.hasOpenSSL(3, 2) ?
   'ERR_SSL_SSL/TLS_ALERT_ILLEGAL_PARAMETER' : 'ERR_SSL_SSLV3_ALERT_ILLEGAL_PARAMETER';
 test({ psk: USERS.UserA, identity: 'UserB' }, {}, expectedIllegalParameterErr);
 test({ psk: USERS.UserB, identity: 'UserB' });
diff --git a/test/parallel/test-tls-session-cache.js b/test/parallel/test-tls-session-cache.js
index e4ecb53282fbae..b55e150401d8a2 100644
--- a/test/parallel/test-tls-session-cache.js
+++ b/test/parallel/test-tls-session-cache.js
@@ -100,7 +100,7 @@ function doTest(testOptions, callback) {
     const args = [
       's_client',
       '-tls1',
-      '-cipher', (common.hasOpenSSL31 ? 'DEFAULT:@SECLEVEL=0' : 'DEFAULT'),
+      '-cipher', (common.hasOpenSSL(3, 1) ? 'DEFAULT:@SECLEVEL=0' : 'DEFAULT'),
       '-connect', `localhost:${this.address().port}`,
       '-servername', 'ohgod',
       '-key', fixtures.path('keys/rsa_private.pem'),

From 10d55e3d731a8cc849bc96f06d32c65858be88f4 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Mon, 9 Dec 2024 12:10:51 +0100
Subject: [PATCH 175/216] tools: use commit title as PR title when creating
 release proposal
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/56165
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
---
 tools/actions/create-release.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/actions/create-release.sh b/tools/actions/create-release.sh
index 17878ff2e40d86..1392c4dd458476 100755
--- a/tools/actions/create-release.sh
+++ b/tools/actions/create-release.sh
@@ -29,7 +29,7 @@ git node release --prepare --skipBranchDiff --yes --releaseDate "$RELEASE_DATE"
 HEAD_BRANCH="$(git rev-parse --abbrev-ref HEAD)"
 HEAD_SHA="$(git rev-parse HEAD^)"
 
-TITLE=$(awk "/^## ${RELEASE_DATE}/ { print substr(\$0, 4) }" "doc/changelogs/CHANGELOG_V${RELEASE_LINE}.md")
+TITLE="$(git log -1 --format=%s)"
 
 # Use a temporary file for the PR body
 TEMP_BODY="$(awk "/## ${RELEASE_DATE}/,/^<a id=/{ if (!/^<a id=/) print }" "doc/changelogs/CHANGELOG_V${RELEASE_LINE}.md")"
@@ -56,7 +56,7 @@ node --input-type=module - \
     "$GITHUB_REPOSITORY" \
     "$HEAD_BRANCH" \
     "$HEAD_SHA" \
-    "$(git log -1 HEAD --format=%s || true)" \
+    "$TITLE" \
     "$(git log -1 HEAD --format=%b | awk -v PR_URL="$PR_URL" '{sub(/^PR-URL: TODO$/, "PR-URL: " PR_URL)} 1' || true)" \
     "$(git show HEAD --diff-filter=d --name-only --format= || true)" \
     "$(git show HEAD --diff-filter=D --name-only --format= || true)" \

From 1dbc7e87d78a0fe62009aec0b8eb502da9cb0771 Mon Sep 17 00:00:00 2001
From: "Edigleysson Silva (Edy)" <edigleyssonsilva@gmail.com>
Date: Mon, 9 Dec 2024 14:43:50 -0300
Subject: [PATCH 176/216] doc: fix c++ addon hello world sample
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/56172
Refs: https://github.com/nodejs/node/issues/56173
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: theanarkh <theratliter@gmail.com>
---
 doc/api/addons.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/api/addons.md b/doc/api/addons.md
index e0e00dca0b9e8b..8e2864952e0841 100644
--- a/doc/api/addons.md
+++ b/doc/api/addons.md
@@ -72,6 +72,7 @@ namespace demo {
 using v8::FunctionCallbackInfo;
 using v8::Isolate;
 using v8::Local;
+using v8::NewStringType;
 using v8::Object;
 using v8::String;
 using v8::Value;
@@ -79,7 +80,7 @@ using v8::Value;
 void Method(const FunctionCallbackInfo<Value>& args) {
   Isolate* isolate = args.GetIsolate();
   args.GetReturnValue().Set(String::NewFromUtf8(
-      isolate, "world").ToLocalChecked());
+      isolate, "world", NewStringType::kNormal).ToLocalChecked());
 }
 
 void Initialize(Local<Object> exports) {

From 0e9abf2754d88768244722c16b4740dd62696b43 Mon Sep 17 00:00:00 2001
From: Yuan-Ming Hsu <48866415+technic960183@users.noreply.github.com>
Date: Wed, 11 Dec 2024 03:13:22 +0800
Subject: [PATCH 177/216] doc: fix incorrect link to style guide

The link to the style guide in `pull-requests.md` linked to the main
`README.md` instead of `doc/README.md`. This commit fixes the link.

Refs: https://github.com/nodejs/node/pull/41119
PR-URL: https://github.com/nodejs/node/pull/56181
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 doc/contributing/pull-requests.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/contributing/pull-requests.md b/doc/contributing/pull-requests.md
index 295e9d3695c47e..2ad538b3fd8e29 100644
--- a/doc/contributing/pull-requests.md
+++ b/doc/contributing/pull-requests.md
@@ -122,7 +122,7 @@ If you are modifying code, please be sure to run `make lint` (or
 code style guide.
 
 Any documentation you write (including code comments and API documentation)
-should follow the [Style Guide](../../README.md). Code samples
+should follow the [Style Guide](../../doc/README.md). Code samples
 included in the API docs will also be checked when running `make lint` (or
 `vcbuild.bat lint` on Windows). If you are adding to or deprecating an API,
 add or change the appropriate YAML documentation. Use `REPLACEME` for the

From bbd0222a10f9951606cb362f3e1b8b586b81b49a Mon Sep 17 00:00:00 2001
From: Michael Dawson <midawson@redhat.com>
Date: Fri, 29 Nov 2024 19:56:17 +0000
Subject: [PATCH 178/216] doc: add ambassador message - benefits of Node.js
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add an initial message to be promoted.

Signed-off-by: Michael Dawson <midawson@redhat.com>
PR-URL: https://github.com/nodejs/node/pull/56085
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
---
 .../advocacy-ambassador-program.md            | 35 +++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/doc/contributing/advocacy-ambassador-program.md b/doc/contributing/advocacy-ambassador-program.md
index cfb8c5cb1cd484..31d8fd58a1a4bf 100644
--- a/doc/contributing/advocacy-ambassador-program.md
+++ b/doc/contributing/advocacy-ambassador-program.md
@@ -130,3 +130,38 @@ or the information to be shared.
 Add a list of GitHub handles for those within the project that
 have volunteered to be contacated when necessary by ambassadors
 to get more info about the message to be promoted.
+
+#### Node.js is a great choice for a JavaScript runtime
+
+##### Goal
+
+Highlight the benefits of choosing Node.js as your backend JavaScript runtime. Focus on what is great
+about Node.js without drawing comparisons to alternatives. We don't want to say negative things about
+other options, only highlight what is great about Node.js as a choice.
+
+Some of the things to highlight include:
+
+* How widely it is used (you never get fired for choosing Node.js).
+* The openness of the project. It is part of the OpenJS Foundation and its governance is set up to prevent
+  any one company from dominating the project. Decisions are made by the collaborators (of which there are quite
+  a few) versus a small number of people.
+* It has predictable and stable releases and has delivered on the release schedule since 2015.
+* It has a well-defined security release process and manages security releases well.
+* As the de facto standard, it has the highest likelihood of being supported for a given package on npm.
+* It is not dependent on any one company for its continued existence, reducing the risk of using it.
+* The large number of platforms supported.
+* Asynchronous non-blocking I/O architecture drives high transactional throughput, making it ideal for web workloads.
+* Single-threaded programming model enables very low resource consumption, making it ideal for containerised workloads.
+* Highly vibrant ecosystem with enterprise support from many vendors.
+
+#### Related Links
+
+* <https://github.com/nodejs/release>
+* <https://github.com/nodejs/node/blob/main/doc/contributing/security-release-process.md>
+* <https://github.com/nodejs/TSC/blob/main/TSC-Charter.md>
+* <https://github.com/mhdawson/presentations/blob/main/2024/NodeConfEU_2024-Node.js_whats_next.pdf>
+  for slides on usage and on topping recent surveys.
+
+#### Project contacts
+
+* @mhdawson

From 6797a35a5bbc782655cdf0c90a9402117f8783ab Mon Sep 17 00:00:00 2001
From: Shima Ryuhei <65934663+islandryu@users.noreply.github.com>
Date: Wed, 11 Dec 2024 08:52:29 +0900
Subject: [PATCH 179/216] module: prevent main thread exiting before esm worker
 ends

PR-URL: https://github.com/nodejs/node/pull/56183
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
---
 lib/internal/modules/esm/worker.js               |  6 ++++--
 .../test-esm-loader-spawn-promisified.mjs        | 16 ++++++++++++++++
 test/fixtures/es-module-loaders/hooks-custom.mjs |  7 +++++++
 3 files changed, 27 insertions(+), 2 deletions(-)
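
Note: the change to `worker.js` moves the `AtomicsAdd`/`AtomicsNotify` pair to the end of the handler, so the main thread is released only after the worker has finished the rest of its bookkeeping. The following is a rough standalone sketch of that ordering using the public `worker_threads` and `Atomics` APIs; it is not the internal hooks worker, and the slot layout (`NOTIFY`, `RESULT`) and the 5 second timeout are assumptions made for illustration.

```javascript
'use strict';

const { Worker } = require('node:worker_threads');

// Slot 0 is the notification counter, slot 1 holds the worker's result.
// Both indices are made up for this sketch.
const NOTIFY = 0;
const RESULT = 1;
const shared = new Int32Array(new SharedArrayBuffer(8));

new Worker(`
  const { workerData: shared } = require('node:worker_threads');
  // Finish all the work first ...
  Atomics.store(shared, 1, 42);
  // ... and only then release the waiting thread, as the very last step.
  Atomics.add(shared, 0, 1);
  Atomics.notify(shared, 0);
`, { eval: true, workerData: shared });

// Block until the counter leaves 0. If the worker already notified,
// this returns 'not-equal' immediately instead of waiting.
Atomics.wait(shared, NOTIFY, 0, 5000);
const result = Atomics.load(shared, RESULT);
console.log(result);
```

If the worker notified before storing its result, the waiting thread could wake up, read a stale value, and exit early, which is the kind of race the reordering in this patch is meant to avoid.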

diff --git a/lib/internal/modules/esm/worker.js b/lib/internal/modules/esm/worker.js
index 7b295973abe7a4..0bb8243f644628 100644
--- a/lib/internal/modules/esm/worker.js
+++ b/lib/internal/modules/esm/worker.js
@@ -212,8 +212,6 @@ async function customizedModuleWorker(lock, syncCommPort, errorHandler) {
       (port ?? syncCommPort).postMessage(wrapMessage('error', exception));
     }
 
-    AtomicsAdd(lock, WORKER_TO_MAIN_THREAD_NOTIFICATION, 1);
-    AtomicsNotify(lock, WORKER_TO_MAIN_THREAD_NOTIFICATION);
     if (shouldRemoveGlobalErrorHandler) {
       process.off('uncaughtException', errorHandler);
     }
@@ -222,6 +220,10 @@ async function customizedModuleWorker(lock, syncCommPort, errorHandler) {
     // We keep checking for new messages to not miss any.
     clearImmediate(immediate);
     immediate = setImmediate(checkForMessages).unref();
+    // Notify the main thread last: once it is unlocked, it may terminate
+    // before the rest of this function has run.
+    AtomicsAdd(lock, WORKER_TO_MAIN_THREAD_NOTIFICATION, 1);
+    AtomicsNotify(lock, WORKER_TO_MAIN_THREAD_NOTIFICATION);
   }
 }
 
diff --git a/test/es-module/test-esm-loader-spawn-promisified.mjs b/test/es-module/test-esm-loader-spawn-promisified.mjs
index 162316ade410b9..7c9fc2e7f82e44 100644
--- a/test/es-module/test-esm-loader-spawn-promisified.mjs
+++ b/test/es-module/test-esm-loader-spawn-promisified.mjs
@@ -285,4 +285,20 @@ describe('Loader hooks parsing modules', { concurrency: true }, () => {
     assert.strictEqual(code, 0);
     assert.strictEqual(signal, null);
   });
+
+  it('throw maximum call stack error on the loader', async () => {
+    const { code, signal, stdout, stderr } = await spawnPromisified(execPath, [
+      '--no-warnings',
+      '--experimental-loader',
+      fixtures.fileURL('/es-module-loaders/hooks-custom.mjs'),
+      '--input-type=module',
+      '--eval',
+      'await import("esmHook/maximumCallStack.mjs")',
+    ]);
+
+    assert(stderr.includes('Maximum call stack size exceeded'));
+    assert.strictEqual(stdout, '');
+    assert.strictEqual(code, 1);
+    assert.strictEqual(signal, null);
+  });
 });
diff --git a/test/fixtures/es-module-loaders/hooks-custom.mjs b/test/fixtures/es-module-loaders/hooks-custom.mjs
index 3c38649a88794f..5109d20f4d3711 100644
--- a/test/fixtures/es-module-loaders/hooks-custom.mjs
+++ b/test/fixtures/es-module-loaders/hooks-custom.mjs
@@ -105,5 +105,12 @@ export function load(url, context, next) {
     };
   }
 
+  if (url.endsWith('esmHook/maximumCallStack.mjs')) {
+    function recurse() {
+      recurse();
+    }
+    recurse();
+  }
+
   return next(url);
 }

From b739c2a9268971ec50a00cfd34828a2542de5409 Mon Sep 17 00:00:00 2001
From: Anton Kastritskii <halloy52@gmail.com>
Date: Wed, 11 Dec 2024 01:43:13 +0000
Subject: [PATCH 180/216] doc: call out import.meta is only supported in ES
 modules
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/56186
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
---
 doc/api/esm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/api/esm.md b/doc/api/esm.md
index 47e60084a94845..7016e24065feab 100644
--- a/doc/api/esm.md
+++ b/doc/api/esm.md
@@ -330,7 +330,7 @@ modules it can be used to load ES modules.
 * {Object}
 
 The `import.meta` meta property is an `Object` that contains the following
-properties.
+properties. It is only supported in ES modules.
 
 ### `import.meta.dirname`
 

From 195cc42935d0bd21b8ec302cbb5e152841249f35 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Wed, 11 Dec 2024 12:11:38 +0100
Subject: [PATCH 181/216] util: do not rely on mutable `Object` and `Function`'
 `constructor` prop

PR-URL: https://github.com/nodejs/node/pull/56188
Fixes: https://github.com/nodejs/node/issues/55924
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Jordan Harband <ljharb@gmail.com>
---
 lib/internal/util/inspect.js       | 30 +++++++++++++++++++++++++-----
 test/parallel/test-util-inspect.js | 30 ++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 5 deletions(-)
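
Note: the idea behind `wellKnownPrototypes` is that a prototype's `constructor` property is writable user-visible state, while the identity of `Object.prototype` and `Function.prototype` is not. The following is a simplified sketch of that lookup written against public APIs instead of primordials; `constructorNameOf` is a hypothetical stand-in and omits most of what `getConstructorName` in `inspect.js` does.

```javascript
'use strict';

// Prototypes recognized by identity, regardless of their `constructor` prop.
const wellKnown = new Map([
  [Object.prototype, { name: 'Object', constructor: Object }],
  [Function.prototype, { name: 'Function', constructor: Function }],
]);

// Captured up front so later user code cannot swap it out.
const hasInstance = Function.prototype[Symbol.hasInstance];

// Hypothetical, simplified constructor-name lookup for objects and functions.
function constructorNameOf(value) {
  let proto = value;
  while (proto !== null && proto !== undefined) {
    const known = wellKnown.get(proto);
    if (known !== undefined && hasInstance.call(known.constructor, value)) {
      return known.name;
    }
    const desc = Object.getOwnPropertyDescriptor(proto, 'constructor');
    if (desc !== undefined &&
        typeof desc.value === 'function' &&
        desc.value.name !== '') {
      return desc.value.name;
    }
    proto = Object.getPrototypeOf(proto);
  }
  return null;
}

// Tamper with Object.prototype.constructor the same way the new test does.
const original = Object.getOwnPropertyDescriptor(Object.prototype, 'constructor');
let tamperedName;
try {
  Object.defineProperty(Object.prototype, 'constructor', {
    get: () => Object.prototype,
    configurable: true,
  });
  tamperedName = constructorNameOf({});
} finally {
  Object.defineProperty(Object.prototype, 'constructor', original);
}
console.log(tamperedName);
```

Without the identity-based table, the descriptor check alone would find an accessor instead of a function value on `Object.prototype` and fall through without a name.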

diff --git a/lib/internal/util/inspect.js b/lib/internal/util/inspect.js
index c547e7697a6b40..873e9d5f56165a 100644
--- a/lib/internal/util/inspect.js
+++ b/lib/internal/util/inspect.js
@@ -22,8 +22,11 @@ const {
   DatePrototypeToISOString,
   DatePrototypeToString,
   ErrorPrototypeToString,
+  Function,
+  FunctionPrototype,
   FunctionPrototypeBind,
   FunctionPrototypeCall,
+  FunctionPrototypeSymbolHasInstance,
   FunctionPrototypeToString,
   JSONStringify,
   MapPrototypeGetSize,
@@ -50,6 +53,7 @@ const {
   ObjectGetPrototypeOf,
   ObjectIs,
   ObjectKeys,
+  ObjectPrototype,
   ObjectPrototypeHasOwnProperty,
   ObjectPrototypePropertyIsEnumerable,
   ObjectSeal,
@@ -588,10 +592,26 @@ function isInstanceof(object, proto) {
   }
 }
 
+// Special-case some builtin prototypes in case their `constructor` property has been tampered with.
+const wellKnownPrototypes = new SafeMap();
+wellKnownPrototypes.set(ObjectPrototype, { name: 'Object', constructor: Object });
+wellKnownPrototypes.set(FunctionPrototype, { name: 'Function', constructor: Function });
+
 function getConstructorName(obj, ctx, recurseTimes, protoProps) {
   let firstProto;
   const tmp = obj;
   while (obj || isUndetectableObject(obj)) {
+    const wellKnownPrototypeNameAndConstructor = wellKnownPrototypes.get(obj);
+    if (wellKnownPrototypeNameAndConstructor != null) {
+      const { name, constructor } = wellKnownPrototypeNameAndConstructor;
+      if (FunctionPrototypeSymbolHasInstance(constructor, tmp)) {
+        if (protoProps !== undefined && firstProto !== obj) {
+          addPrototypeProperties(
+            ctx, tmp, firstProto || tmp, recurseTimes, protoProps);
+        }
+        return name;
+      }
+    }
     const descriptor = ObjectGetOwnPropertyDescriptor(obj, 'constructor');
     if (descriptor !== undefined &&
         typeof descriptor.value === 'function' &&
@@ -949,7 +969,11 @@ function formatRaw(ctx, value, recurseTimes, typedArray) {
   if (noIterator) {
     keys = getKeys(value, ctx.showHidden);
     braces = ['{', '}'];
-    if (constructor === 'Object') {
+    if (typeof value === 'function') {
+      base = getFunctionBase(value, constructor, tag);
+      if (keys.length === 0 && protoProps === undefined)
+        return ctx.stylize(base, 'special');
+    } else if (constructor === 'Object') {
       if (isArgumentsObject(value)) {
         braces[0] = '[Arguments] {';
       } else if (tag !== '') {
@@ -958,10 +982,6 @@ function formatRaw(ctx, value, recurseTimes, typedArray) {
       if (keys.length === 0 && protoProps === undefined) {
         return `${braces[0]}}`;
       }
-    } else if (typeof value === 'function') {
-      base = getFunctionBase(value, constructor, tag);
-      if (keys.length === 0 && protoProps === undefined)
-        return ctx.stylize(base, 'special');
     } else if (isRegExp(value)) {
       // Make RegExps say that they are RegExps
       base = RegExpPrototypeToString(
diff --git a/test/parallel/test-util-inspect.js b/test/parallel/test-util-inspect.js
index 811f085879e2c6..d23de817696909 100644
--- a/test/parallel/test-util-inspect.js
+++ b/test/parallel/test-util-inspect.js
@@ -3323,3 +3323,33 @@ assert.strictEqual(
     }
   }), '{ [Symbol(Symbol.iterator)]: [Getter] }');
 }
+
+{
+  const o = {};
+  const { prototype: BuiltinPrototype } = Object;
+  const desc = Reflect.getOwnPropertyDescriptor(BuiltinPrototype, 'constructor');
+  Object.defineProperty(BuiltinPrototype, 'constructor', {
+    get: () => BuiltinPrototype,
+    configurable: true,
+  });
+  assert.strictEqual(
+    util.inspect(o),
+    '{}',
+  );
+  Object.defineProperty(BuiltinPrototype, 'constructor', desc);
+}
+
+{
+  const o = { f() {} };
+  const { prototype: BuiltinPrototype } = Function;
+  const desc = Reflect.getOwnPropertyDescriptor(BuiltinPrototype, 'constructor');
+  Object.defineProperty(BuiltinPrototype, 'constructor', {
+    get: () => BuiltinPrototype,
+    configurable: true,
+  });
+  assert.strictEqual(
+    util.inspect(o),
+    '{ f: [Function: f] }',
+  );
+  Object.defineProperty(BuiltinPrototype, 'constructor', desc);
+}

From 739ee18430479c146c7dba8f1a7ccf4218b4b1be Mon Sep 17 00:00:00 2001
From: ZYSzys <zhangyongsheng.dev@dcarlife.com>
Date: Wed, 11 Dec 2024 21:37:21 +0800
Subject: [PATCH 182/216] http2: support ALPNCallback option
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/56187
Fixes: https://github.com/nodejs/node/issues/55994
Refs: https://github.com/nodejs/node/pull/45190
Reviewed-By: Tim Perry <pimterry@gmail.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 lib/internal/http2/core.js       | 10 +++++--
 test/parallel/test-http2-alpn.js | 47 ++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 3 deletions(-)
 create mode 100644 test/parallel/test-http2-alpn.js
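
A minimal sketch of how the option might be used once this change is applied.
`selectProtocol` and `createServer` are hypothetical names used only for
illustration. The callback shape (an object with a `protocols` array, returning
one of those strings or `undefined` to reject the handshake) follows the
documented `tls.createServer()` behavior, and `key`/`cert` are assumed to be
supplied by the caller.

```js
'use strict';
const http2 = require('node:http2');

// Hypothetical selection policy: prefer HTTP/2, fall back to HTTP/1.1, and
// return undefined (rejecting the handshake) when neither is offered.
function selectProtocol({ protocols }) {
  if (protocols.includes('h2')) return 'h2';
  if (protocols.includes('http/1.1')) return 'http/1.1';
  return undefined;
}

// Hypothetical helper; `key` and `cert` are PEM data provided by the caller.
function createServer(key, cert) {
  // ALPNProtocols is deliberately omitted: passing it together with
  // ALPNCallback throws ERR_TLS_ALPN_CALLBACK_WITH_PROTOCOLS.
  return http2.createSecureServer({
    key,
    cert,
    allowHTTP1: true,
    ALPNCallback: selectProtocol,
  });
}
```

Only the selection function runs without certificates; `createServer()` is
left uncalled here and is expected to work only on a build that includes this
patch.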

diff --git a/lib/internal/http2/core.js b/lib/internal/http2/core.js
index 2b8d25acfe7ae3..ca7e1fb21ccfea 100644
--- a/lib/internal/http2/core.js
+++ b/lib/internal/http2/core.js
@@ -3155,9 +3155,13 @@ function initializeOptions(options) {
 
 function initializeTLSOptions(options, servername) {
   options = initializeOptions(options);
-  options.ALPNProtocols = ['h2'];
-  if (options.allowHTTP1 === true)
-    options.ALPNProtocols.push('http/1.1');
+
+  if (!options.ALPNCallback) {
+    options.ALPNProtocols = ['h2'];
+    if (options.allowHTTP1 === true)
+      options.ALPNProtocols.push('http/1.1');
+  }
+
   if (servername !== undefined && !options.servername)
     options.servername = servername;
   return options;
diff --git a/test/parallel/test-http2-alpn.js b/test/parallel/test-http2-alpn.js
new file mode 100644
index 00000000000000..a073d26e576cce
--- /dev/null
+++ b/test/parallel/test-http2-alpn.js
@@ -0,0 +1,47 @@
+'use strict';
+const common = require('../common');
+const fixtures = require('../common/fixtures');
+
+// This test verifies that the http2 server supports the ALPNCallback option.
+
+if (!common.hasCrypto) common.skip('missing crypto');
+
+const assert = require('assert');
+const h2 = require('http2');
+const tls = require('tls');
+
+{
+  // Server sets two incompatible ALPN options:
+  assert.throws(() => h2.createSecureServer({
+    ALPNCallback: () => 'a',
+    ALPNProtocols: ['b', 'c']
+  }), (error) => error.code === 'ERR_TLS_ALPN_CALLBACK_WITH_PROTOCOLS');
+}
+
+{
+  const server = h2.createSecureServer({
+    key: fixtures.readKey('rsa_private.pem'),
+    cert: fixtures.readKey('rsa_cert.crt'),
+    ALPNCallback: () => 'a',
+  });
+
+  server.on(
+    'secureConnection',
+    common.mustCall((socket) => {
+      assert.strictEqual(socket.alpnProtocol, 'a');
+      socket.end();
+      server.close();
+    })
+  );
+
+  server.listen(0, function() {
+    const client = tls.connect({
+      port: server.address().port,
+      rejectUnauthorized: false,
+      ALPNProtocols: ['a'],
+    }, common.mustCall(() => {
+      assert.strictEqual(client.alpnProtocol, 'a');
+      client.end();
+    }));
+  });
+}

From aa031b3eecff509a2e1ee4772148e6ee73c9252c Mon Sep 17 00:00:00 2001
From: Stephen Belanger <admin@stephenbelanger.com>
Date: Wed, 11 Dec 2024 21:37:36 +0800
Subject: [PATCH 183/216] worker: fix crash when a worker joins after exit

If a worker has not already joined before running to completion
it will join in a SetImmediateThreadsafe which could occur after
the worker has already ended by other means. Mutating a JS object
at that point would fail because the isolate is already disposed.

PR-URL: https://github.com/nodejs/node/pull/56191
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
---
 src/node_worker.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/node_worker.cc b/src/node_worker.cc
index eaac06166c6179..255dc042ffc684 100644
--- a/src/node_worker.cc
+++ b/src/node_worker.cc
@@ -449,6 +449,9 @@ void Worker::JoinThread() {
 
   env()->remove_sub_worker_context(this);
 
+  // Join may happen after the worker exits and disposes the isolate
+  if (!env()->can_call_into_js()) return;
+
   {
     HandleScope handle_scope(env()->isolate());
     Context::Scope context_scope(env()->context());

From f9f3003de717d7c993ad44de022e1e30173979d4 Mon Sep 17 00:00:00 2001
From: Chengzhong Wu <cwu631@bloomberg.net>
Date: Wed, 11 Dec 2024 17:04:19 +0000
Subject: [PATCH 184/216] src: fix outdated js2c.cc references
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/56133
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Franziska Hinkelmann <franziska.hinkelmann@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 src/node_builtins.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/node_builtins.h b/src/node_builtins.h
index 1cb85b9058d065..a73de23a1debfd 100644
--- a/src/node_builtins.h
+++ b/src/node_builtins.h
@@ -71,7 +71,7 @@ using BuiltinSourceMap = std::map<std::string, UnionBytes>;
 using BuiltinCodeCacheMap =
     std::unordered_map<std::string, BuiltinCodeCacheData>;
 
-// Generated by tools/js2c.py as node_javascript.cc
+// Generated by tools/js2c.cc as node_javascript.cc
 void RegisterExternalReferencesForInternalizedBuiltinCode(
     ExternalReferenceRegistry* registry);
 
@@ -134,7 +134,7 @@ class NODE_EXTERN_PRIVATE BuiltinLoader {
   // Only allow access from friends.
   friend class CodeCacheBuilder;
 
-  // Generated by tools/js2c.py as node_javascript.cc
+  // Generated by tools/js2c.cc as node_javascript.cc
   void LoadJavaScriptSource();  // Loads data into source_
   UnionBytes GetConfig();       // Return data for config.gypi
 

From 80e5bb87c45e71a59b1d6775a32256f2509310ec Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Wed, 11 Dec 2024 14:04:36 -0300
Subject: [PATCH 185/216] doc: update blog link to /vulnerability

PR-URL: https://github.com/nodejs/node/pull/56198
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 SECURITY.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/SECURITY.md b/SECURITY.md
index fc95e1941698e6..19e876939f0f55 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -218,7 +218,7 @@ as any other stable feature.
 Security notifications will be distributed via the following methods.
 
 * <https://groups.google.com/group/nodejs-sec>
-* <https://nodejs.org/en/blog/>
+* <https://nodejs.org/en/blog/vulnerability>
 
 ## Comments on this policy
 

From d103917d9253cb34effe1dbd4d8c887501d2b68b Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Wed, 11 Dec 2024 16:08:22 -0300
Subject: [PATCH 186/216] doc: update announce documentation for releases

This updates the announce section to match the current
process for announcing Node.js releases. This also
mentions the latest Bluesky automation.

PR-URL: https://github.com/nodejs/node/pull/56200
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Ruy Adorno <ruy@vlt.sh>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
---
 doc/contributing/releases.md | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/doc/contributing/releases.md b/doc/contributing/releases.md
index 90bb324d83ad6e..b3b20b8ae5589e 100644
--- a/doc/contributing/releases.md
+++ b/doc/contributing/releases.md
@@ -1090,20 +1090,18 @@ This script will use the promoted builds and changelog to generate the post. Run
 ### 19. Announce
 
 The nodejs.org website will automatically rebuild and include the new version.
-To announce the build on Twitter through the official @nodejs account, email
-<pr@nodejs.org> with a message such as:
+To announce the build on social media, please ping the @nodejs-social-team
+on the official Slack channel.
+
+Node.js is also available on Bluesky, and a release announcement can be
+reposted using the [nodejs/bluesky](https://github.com/nodejs/bluesky) repository.
+
+The post content can be as simple as:
 
 > v5.8.0 of @nodejs is out: <https://nodejs.org/en/blog/release/v5.8.0/>
 > …
 > something here about notable changes
 
-To ensure communication goes out with the timing of the blog post, please allow
-24 hour prior notice. If known, please include the date and time the release
-will be shared with the community in the email to coordinate these
-announcements.
-
-Ping the IRC ops and the other [Partner Communities][] liaisons.
-
 <details>
 <summary>Security release</summary>
 
@@ -1437,7 +1435,6 @@ Typical resolution: sign the release again.
 [Build issue tracker]: https://github.com/nodejs/build/issues/new
 [CI lockdown procedure]: https://github.com/nodejs/build/blob/HEAD/doc/jenkins-guide.md#restricting-access-for-security-releases
 [Node.js Snap management repository]: https://github.com/nodejs/snap
-[Partner Communities]: https://github.com/nodejs/community-committee/blob/HEAD/governance/PARTNER_COMMUNITIES.md
 [Snap]: https://snapcraft.io/node
 [`create-release-proposal`]: https://github.com/nodejs/node/actions/workflows/create-release-proposal.yml
 [build-infra team]: https://github.com/orgs/nodejs/teams/build-infra

From 33cd7d3d8ca34b89fbfcbb87291583619919feb3 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Thu, 12 Dec 2024 11:43:18 +0100
Subject: [PATCH 187/216] tools: fix release proposal linter to support more
 than one person preparing

PR-URL: https://github.com/nodejs/node/pull/56203
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
---
 .github/workflows/lint-release-proposal.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/lint-release-proposal.yml b/.github/workflows/lint-release-proposal.yml
index bc2ac2d0127865..5f0f9a87329b17 100644
--- a/.github/workflows/lint-release-proposal.yml
+++ b/.github/workflows/lint-release-proposal.yml
@@ -43,7 +43,7 @@ jobs:
           PR_HEAD="$(gh pr view "$PR_URL" --json headRefOid -q .headRefOid)"
           echo "Head of $PR_URL: $PR_HEAD"
           echo "Current commit: $GITHUB_SHA"
-          [[ "$PR_HEAD" == "$GITHUB_SHA" ]]
+          [ "$PR_HEAD" = "$GITHUB_SHA" ]
         env:
           GH_TOKEN: ${{ github.token }}
       - name: Validate CHANGELOG
@@ -53,7 +53,7 @@ jobs:
           echo "Expected CHANGELOG section title: $EXPECTED_CHANGELOG_TITLE_INTRO"
           CHANGELOG_TITLE="$(grep "$EXPECTED_CHANGELOG_TITLE_INTRO" "doc/changelogs/CHANGELOG_V${COMMIT_SUBJECT:20:2}.md")"
           echo "Actual: $CHANGELOG_TITLE"
-          [[ "${CHANGELOG_TITLE%@*}@" == "$EXPECTED_CHANGELOG_TITLE_INTRO" ]]
+          [ "${CHANGELOG_TITLE%%@*}@" = "$EXPECTED_CHANGELOG_TITLE_INTRO" ]
       - name: Verify NODE_VERSION_IS_RELEASE bit is correctly set
         run: |
           grep -q '^#define NODE_VERSION_IS_RELEASE 1$' src/node_version.h

From ae3f6fbe595eef73e0c0b392f48a4ae915e8fac8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=B2=88=E9=B8=BF=E9=A3=9E?= <shen.hongfei@outlook.com>
Date: Thu, 12 Dec 2024 19:23:13 +0800
Subject: [PATCH 188/216] doc: `sea.getRawAsset(key)` always returns an
 ArrayBuffer

PR-URL: https://github.com/nodejs/node/pull/56206
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 doc/api/single-executable-applications.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/api/single-executable-applications.md b/doc/api/single-executable-applications.md
index c282446fa52a91..5327f8b34db830 100644
--- a/doc/api/single-executable-applications.md
+++ b/doc/api/single-executable-applications.md
@@ -344,7 +344,7 @@ writes to the returned array buffer is likely to result in a crash.
 
 * `key`  {string} the key for the asset in the dictionary specified by the
   `assets` field in the single-executable application configuration.
-* Returns: {string|ArrayBuffer}
+* Returns: {ArrayBuffer}
 
 ### `require(id)` in the injected main script is not file based
 

From 2c6dcf7209d41d66955fdeecf95eb4efc5704754 Mon Sep 17 00:00:00 2001
From: LiviaMedeiros <livia@cirno.name>
Date: Fri, 29 Nov 2024 00:52:15 +0800
Subject: [PATCH 189/216] fs: make mutating `options` in Promises `readdir()`
 not affect results
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/56057
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
---
 lib/internal/fs/promises.js            | 4 ++++
 test/parallel/test-fs-readdir-types.js | 8 ++++++++
 2 files changed, 12 insertions(+)
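
A minimal sketch of the idea behind this fix, under stated assumptions: the
spread copy stands in for the internal `copyObject()` helper, and
`startListing` is a hypothetical stand-in for an API whose work finishes after
the caller regains control. It is not the code in `lib/internal/fs/promises.js`.

```js
'use strict';

// Hypothetical stand-in for an async API: work starts now and finishes later.
function startListing(names, options) {
  // Shallow copy, so later writes to the caller's object are not observed.
  const snapshot = { ...options };
  return function finish() {
    return snapshot.withFileTypes ? names.map((name) => ({ name })) : names;
  };
}

const options = { withFileTypes: true };
const finish = startListing(['a.txt', 'b.txt'], options);
options.withFileTypes = false;  // Mutation after the call has started.
const entries = finish();
console.log(entries);
```

Because `finish()` reads the snapshot rather than the caller's object, the
entries are still produced in the `withFileTypes` form that was requested when
the call was made.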

diff --git a/lib/internal/fs/promises.js b/lib/internal/fs/promises.js
index 8314cf29a850f5..459b9ad1318901 100644
--- a/lib/internal/fs/promises.js
+++ b/lib/internal/fs/promises.js
@@ -941,6 +941,10 @@ async function readdirRecursive(originalPath, options) {
 
 async function readdir(path, options) {
   options = getOptions(options);
+
+  // Make shallow copy to prevent mutating options from affecting results
+  options = copyObject(options);
+
   path = getValidatedPath(path);
   if (options.recursive) {
     return readdirRecursive(path, options);
diff --git a/test/parallel/test-fs-readdir-types.js b/test/parallel/test-fs-readdir-types.js
index 3cc6b1cceff7fc..848d62399f0494 100644
--- a/test/parallel/test-fs-readdir-types.js
+++ b/test/parallel/test-fs-readdir-types.js
@@ -78,6 +78,14 @@ fs.readdir(readdirDir, {
   assertDirents(dirents);
 })().then(common.mustCall());
 
+// Check that mutating options doesn't affect results
+(async () => {
+  const options = { withFileTypes: true };
+  const direntsPromise = fs.promises.readdir(readdirDir, options);
+  options.withFileTypes = false;
+  assertDirents(await direntsPromise);
+})().then(common.mustCall());
+
 // Check for correct types when the binding returns unknowns
 const UNKNOWN = constants.UV_DIRENT_UNKNOWN;
 const oldReaddir = binding.readdir;

From 4196aaf033d92f728d513bfd4452567f41f8912d Mon Sep 17 00:00:00 2001
From: Michael Dawson <midawson@redhat.com>
Date: Thu, 12 Dec 2024 18:05:01 -0500
Subject: [PATCH 190/216] test: remove excludes for sea tests on PPC

The referenced issue is closed as having been
fixed, so the tests should run ok. Unexclude them.

Signed-off-by: Michael Dawson <midawson@redhat.com>
PR-URL: https://github.com/nodejs/node/pull/56217
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 test/sequential/sequential.status | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/test/sequential/sequential.status b/test/sequential/sequential.status
index 073b29cce8dbca..5f4445416d95fa 100644
--- a/test/sequential/sequential.status
+++ b/test/sequential/sequential.status
@@ -52,14 +52,3 @@ test-watch-mode-inspect: SKIP
 [$arch==s390x]
 # https://github.com/nodejs/node/issues/41286
 test-performance-eventloopdelay: PASS, FLAKY
-
-[$system==linux && $arch==ppc64]
-# https://github.com/nodejs/node/issues/50740
-test-single-executable-application-assets-raw: PASS, FLAKY
-test-single-executable-application-assets: PASS, FLAKY
-test-single-executable-application-disable-experimental-sea-warning: PASS, FLAKY
-test-single-executable-application-empty: PASS, FLAKY
-test-single-executable-application-snapshot-and-code-cache: PASS, FLAKY
-test-single-executable-application-snapshot: PASS, FLAKY
-test-single-executable-application-use-code-cache: PASS, FLAKY
-test-single-executable-application: PASS, FLAKY

From bd99bf109f189ef81a886b3900c757bce333b635 Mon Sep 17 00:00:00 2001
From: Chengzhong Wu <cwu631@bloomberg.net>
Date: Fri, 13 Dec 2024 14:49:45 +0000
Subject: [PATCH 191/216] node-api: allow napi_delete_reference in finalizers

`napi_delete_reference` must be called immediately for a
`napi_reference` returned from `napi_wrap` in the corresponding
finalizer, so that the `napi_reference` is cleaned up promptly.

`napi_delete_reference` is safe to invoke during GC.

PR-URL: https://github.com/nodejs/node/pull/55620
Reviewed-By: Gabriel Schulhof <gabrielschulhof@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Vladimir Morozov <vmorozov@microsoft.com>
---
 src/js_native_api_v8.cc                       |  4 +-
 .../6_object_wrap/6_object_wrap.cc            | 37 +++++++++++++++++--
 test/js-native-api/6_object_wrap/binding.gyp  |  7 ++++
 test/js-native-api/6_object_wrap/myobject.h   |  4 +-
 .../6_object_wrap/test-basic-finalizer.js     | 24 ++++++++++++
 5 files changed, 71 insertions(+), 5 deletions(-)
 create mode 100644 test/js-native-api/6_object_wrap/test-basic-finalizer.js

diff --git a/src/js_native_api_v8.cc b/src/js_native_api_v8.cc
index fe3c152e83b76d..99bd13c05721f0 100644
--- a/src/js_native_api_v8.cc
+++ b/src/js_native_api_v8.cc
@@ -2776,10 +2776,12 @@ napi_status NAPI_CDECL napi_create_reference(napi_env env,
 
 // Deletes a reference. The referenced value is released, and may be GC'd unless
 // there are other references to it.
+// For a napi_reference returned from `napi_wrap`, this must be called in the
+// finalizer.
 napi_status NAPI_CDECL napi_delete_reference(napi_env env, napi_ref ref) {
   // Omit NAPI_PREAMBLE and GET_RETURN_STATUS because V8 calls here cannot throw
   // JS exceptions.
-  CHECK_ENV_NOT_IN_GC(env);
+  CHECK_ENV(env);
   CHECK_ARG(env, ref);
 
   delete reinterpret_cast<v8impl::Reference*>(ref);
diff --git a/test/js-native-api/6_object_wrap/6_object_wrap.cc b/test/js-native-api/6_object_wrap/6_object_wrap.cc
index 49b1241fb38caa..8a380e3caa20bb 100644
--- a/test/js-native-api/6_object_wrap/6_object_wrap.cc
+++ b/test/js-native-api/6_object_wrap/6_object_wrap.cc
@@ -3,6 +3,8 @@
 #include "assert.h"
 #include "myobject.h"
 
+typedef int32_t FinalizerData;
+
 napi_ref MyObject::constructor;
 
 MyObject::MyObject(double value)
@@ -10,10 +12,16 @@ MyObject::MyObject(double value)
 
 MyObject::~MyObject() { napi_delete_reference(env_, wrapper_); }
 
-void MyObject::Destructor(
-  napi_env env, void* nativeObject, void* /*finalize_hint*/) {
+void MyObject::Destructor(node_api_basic_env env,
+                          void* nativeObject,
+                          void* /*finalize_hint*/) {
   MyObject* obj = static_cast<MyObject*>(nativeObject);
   delete obj;
+
+  FinalizerData* data;
+  NODE_API_BASIC_CALL_RETURN_VOID(
+      env, napi_get_instance_data(env, reinterpret_cast<void**>(&data)));
+  *data += 1;
 }
 
 void MyObject::Init(napi_env env, napi_value exports) {
@@ -154,7 +162,7 @@ napi_value MyObject::Multiply(napi_env env, napi_callback_info info) {
 }
 
 // This finalizer should never be invoked.
-void ObjectWrapDanglingReferenceFinalizer(napi_env env,
+void ObjectWrapDanglingReferenceFinalizer(node_api_basic_env env,
                                           void* finalize_data,
                                           void* finalize_hint) {
   assert(0 && "unreachable");
@@ -198,8 +206,30 @@ napi_value ObjectWrapDanglingReferenceTest(napi_env env,
   return ret;
 }
 
+static napi_value GetFinalizerCallCount(napi_env env, napi_callback_info info) {
+  size_t argc = 1;
+  napi_value argv[1];
+  FinalizerData* data;
+  napi_value result;
+
+  NODE_API_CALL(env,
+                napi_get_cb_info(env, info, &argc, argv, nullptr, nullptr));
+  NODE_API_CALL(env,
+                napi_get_instance_data(env, reinterpret_cast<void**>(&data)));
+  NODE_API_CALL(env, napi_create_int32(env, *data, &result));
+  return result;
+}
+
+static void finalizeData(napi_env env, void* data, void* hint) {
+  delete reinterpret_cast<FinalizerData*>(data);
+}
+
 EXTERN_C_START
 napi_value Init(napi_env env, napi_value exports) {
+  FinalizerData* data = new FinalizerData;
+  *data = 0;
+  NODE_API_CALL(env, napi_set_instance_data(env, data, finalizeData, nullptr));
+
   MyObject::Init(env, exports);
 
   napi_property_descriptor descriptors[] = {
@@ -207,6 +237,7 @@ napi_value Init(napi_env env, napi_value exports) {
                                 ObjectWrapDanglingReference),
       DECLARE_NODE_API_PROPERTY("objectWrapDanglingReferenceTest",
                                 ObjectWrapDanglingReferenceTest),
+      DECLARE_NODE_API_PROPERTY("getFinalizerCallCount", GetFinalizerCallCount),
   };
 
   NODE_API_CALL(
diff --git a/test/js-native-api/6_object_wrap/binding.gyp b/test/js-native-api/6_object_wrap/binding.gyp
index 44c9c3f837b4a6..2be24c9ec171a9 100644
--- a/test/js-native-api/6_object_wrap/binding.gyp
+++ b/test/js-native-api/6_object_wrap/binding.gyp
@@ -5,6 +5,13 @@
       "sources": [
         "6_object_wrap.cc"
       ]
+    },
+    {
+      "target_name": "6_object_wrap_basic_finalizer",
+      "defines": [ "NAPI_EXPERIMENTAL" ],
+      "sources": [
+        "6_object_wrap.cc"
+      ]
     }
   ]
 }
diff --git a/test/js-native-api/6_object_wrap/myobject.h b/test/js-native-api/6_object_wrap/myobject.h
index 337180598bc042..0faff676b4d992 100644
--- a/test/js-native-api/6_object_wrap/myobject.h
+++ b/test/js-native-api/6_object_wrap/myobject.h
@@ -6,7 +6,9 @@
 class MyObject {
  public:
   static void Init(napi_env env, napi_value exports);
-  static void Destructor(napi_env env, void* nativeObject, void* finalize_hint);
+  static void Destructor(node_api_basic_env env,
+                         void* nativeObject,
+                         void* finalize_hint);
 
  private:
   explicit MyObject(double value_ = 0);
diff --git a/test/js-native-api/6_object_wrap/test-basic-finalizer.js b/test/js-native-api/6_object_wrap/test-basic-finalizer.js
new file mode 100644
index 00000000000000..46b5672c5fa4b9
--- /dev/null
+++ b/test/js-native-api/6_object_wrap/test-basic-finalizer.js
@@ -0,0 +1,24 @@
+// Flags: --expose-gc
+
+'use strict';
+const common = require('../../common');
+const assert = require('assert');
+const addon = require(`./build/${common.buildType}/6_object_wrap_basic_finalizer`);
+
+// This test verifies that ObjectWrap can be correctly finalized with a node_api_basic_finalizer
+// in the current JS loop tick
+(() => {
+  let obj = new addon.MyObject(9);
+  obj = null;
+  // Silence ESLint warnings about unused variables.
+  assert.strictEqual(obj, null);
+})();
+
+for (let i = 0; i < 10; ++i) {
+  global.gc();
+  if (addon.getFinalizerCallCount() === 1) {
+    break;
+  }
+}
+
+assert.strictEqual(addon.getFinalizerCallCount(), 1);

From b24a85b00bd08566c2a44743db9ca47b7d114530 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Fri, 13 Dec 2024 19:11:27 +0100
Subject: [PATCH 192/216] tools: use `github.actor` instead of bot username for
 release proposals
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/56232
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
---
 .github/workflows/create-release-proposal.yml | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/.github/workflows/create-release-proposal.yml b/.github/workflows/create-release-proposal.yml
index de8e91d98a3557..f3add22090cbc0 100644
--- a/.github/workflows/create-release-proposal.yml
+++ b/.github/workflows/create-release-proposal.yml
@@ -1,5 +1,4 @@
 # This action requires the following secrets to be set on the repository:
-#   GH_USER_NAME: GitHub user whose Jenkins and GitHub token are defined below
 #   GH_USER_TOKEN: GitHub user token, to be used by ncu and to push changes
 
 name: Create Release Proposal
@@ -52,20 +51,18 @@ jobs:
         run: |
           ncu-config set branch "${RELEASE_BRANCH}"
           ncu-config set upstream origin
-          ncu-config set username "$USERNAME"
+          ncu-config set username "$GITHUB_ACTOR"
           ncu-config set token "$GH_TOKEN"
           ncu-config set repo "$(echo "$GITHUB_REPOSITORY" | cut -d/ -f2)"
           ncu-config set owner "${GITHUB_REPOSITORY_OWNER}"
         env:
-          USERNAME: ${{ secrets.JENKINS_USER }}
           GH_TOKEN: ${{ github.token }}
 
       - name: Set up ghauth config (Ubuntu)
         run: |
           mkdir -p "${XDG_CONFIG_HOME:-~/.config}/changelog-maker"
-          echo '{}' | jq '{user: env.USERNAME, token: env.TOKEN}' > "${XDG_CONFIG_HOME:-~/.config}/changelog-maker/config.json"
+          echo '{}' | jq '{user: env.GITHUB_ACTOR, token: env.TOKEN}' > "${XDG_CONFIG_HOME:-~/.config}/changelog-maker/config.json"
         env:
-          USERNAME: ${{ secrets.JENKINS_USER }}
           TOKEN: ${{ github.token }}
 
       - name: Setup git author

From 090c7a3b01c14e2182457555cb9661405b6ddc7a Mon Sep 17 00:00:00 2001
From: Selveter Senitro <107211156+selveter@users.noreply.github.com>
Date: Sat, 14 Dec 2024 15:07:58 +0500
Subject: [PATCH 193/216] doc: fix 'which' to 'that' and add commas

PR-URL: https://github.com/nodejs/node/pull/56216
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Chemi Atlow <chemi@atlow.co.il>
---
 doc/contributing/technical-priorities.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/contributing/technical-priorities.md b/doc/contributing/technical-priorities.md
index 9e566f12ae6750..68ac6f8dd0d00a 100644
--- a/doc/contributing/technical-priorities.md
+++ b/doc/contributing/technical-priorities.md
@@ -21,11 +21,11 @@ on October 1st 2022.
 
 _Present in: 2021_
 
-Base HTTP support is a key component of modern cloud-native applications
+Base HTTP support is a key component of modern cloud-native applications,
 and built-in support was part of what made Node.js a success in the first
 10 years. The current implementation is hard to support and a common
 source of vulnerabilities. We must work towards an
-implementation which is easier to support and makes it easier to integrate
+implementation that is easier to support and makes it easier to integrate
 the new HTTP versions (HTTP3, QUIC) and to support efficient
 implementations of different versions concurrently.
 
@@ -96,7 +96,7 @@ supported tools to implement those processes (logging, metrics and tracing).
 This includes support within the Node.js runtime itself (for example
 generating heap dumps, performance metrics, etc.) as well as support for
 applications on top of the runtime. In addition, it is also important to
-clearly document the use cases, problem determination methods and best
+clearly document the use cases, problem determination methods, and best
 practices for those tools.
 
 ## Better multithreaded support

From 83137bceb65e5660233fb760e4b1435544cfa42f Mon Sep 17 00:00:00 2001
From: Mert Can Altin <mertgold60@gmail.com>
Date: Sat, 14 Dec 2024 21:09:49 +0300
Subject: [PATCH 194/216] util: fix Latin1 decoding to return string output

PR-URL: https://github.com/nodejs/node/pull/56222
Fixes: https://github.com/nodejs/node/issues/56219
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
---
 src/encoding_binding.cc                 |  8 +++++---
 test/cctest/test_encoding_binding.cc    | 23 ++++++++++++++++++++++-
 test/parallel/test-util-text-decoder.js | 17 +++++++++++++++++
 3 files changed, 44 insertions(+), 4 deletions(-)
 create mode 100644 test/parallel/test-util-text-decoder.js
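
A short sketch of the user-visible behavior this change is meant to restore,
mirroring the JavaScript test added below. It assumes a Node.js build with
Intl support, which the added test also requires.

```js
'use strict';

// 0xC1, 0xE9 and 0xF3 are 'Á', 'é' and 'ó' in windows-1252.
const bytes = new Uint8Array([0xc1, 0xe9, 0xf3]);
const decoded = new TextDecoder('windows-1252').decode(bytes);

console.log(typeof decoded);  // Expected to be 'string' with this fix applied.
console.log(decoded);
```

Before this change, the Latin1 fast path handed back a `Buffer` copy instead of
a string, so `typeof decoded` is the quickest way to check which behavior a
given build has.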

diff --git a/src/encoding_binding.cc b/src/encoding_binding.cc
index a132eeb62306c6..885a0d072312e9 100644
--- a/src/encoding_binding.cc
+++ b/src/encoding_binding.cc
@@ -286,9 +286,11 @@ void BindingData::DecodeLatin1(const FunctionCallbackInfo<Value>& args) {
         env->isolate(), "The encoded data was not valid for encoding latin1");
   }
 
-  Local<Object> buffer_result =
-      node::Buffer::Copy(env, result.c_str(), written).ToLocalChecked();
-  args.GetReturnValue().Set(buffer_result);
+  Local<String> output =
+      String::NewFromUtf8(
+          env->isolate(), result.c_str(), v8::NewStringType::kNormal, written)
+          .ToLocalChecked();
+  args.GetReturnValue().Set(output);
 }
 
 }  // namespace encoding_binding
diff --git a/test/cctest/test_encoding_binding.cc b/test/cctest/test_encoding_binding.cc
index 06cc36d8f6ae34..d5d14c60fedf7e 100644
--- a/test/cctest/test_encoding_binding.cc
+++ b/test/cctest/test_encoding_binding.cc
@@ -26,7 +26,7 @@ bool RunDecodeLatin1(Environment* env,
     return false;
   }
 
-  *result = try_catch.Exception();
+  *result = args[0];
   return true;
 }
 
@@ -151,5 +151,26 @@ TEST_F(EncodingBindingTest, DecodeLatin1_BOMPresent) {
   EXPECT_STREQ(*utf8_result, "Áéó");
 }
 
+TEST_F(EncodingBindingTest, DecodeLatin1_ReturnsString) {
+  Environment* env = CreateEnvironment();
+  Isolate* isolate = env->isolate();
+  HandleScope handle_scope(isolate);
+
+  const uint8_t latin1_data[] = {0xC1, 0xE9, 0xF3};
+  Local<ArrayBuffer> ab = ArrayBuffer::New(isolate, sizeof(latin1_data));
+  memcpy(ab->GetBackingStore()->Data(), latin1_data, sizeof(latin1_data));
+
+  Local<Uint8Array> array = Uint8Array::New(ab, 0, sizeof(latin1_data));
+  Local<Value> args[] = {array};
+
+  Local<Value> result;
+  ASSERT_TRUE(RunDecodeLatin1(env, args, false, false, &result));
+
+  ASSERT_TRUE(result->IsString());
+
+  String::Utf8Value utf8_result(isolate, result);
+  EXPECT_STREQ(*utf8_result, "Áéó");
+}
+
 }  // namespace encoding_binding
 }  // namespace node
diff --git a/test/parallel/test-util-text-decoder.js b/test/parallel/test-util-text-decoder.js
new file mode 100644
index 00000000000000..0f6d0463f9da48
--- /dev/null
+++ b/test/parallel/test-util-text-decoder.js
@@ -0,0 +1,17 @@
+'use strict';
+
+const common = require('../common');
+
+const test = require('node:test');
+const assert = require('node:assert');
+
+test('TextDecoder correctly decodes windows-1252 encoded data', { skip: !common.hasIntl }, () => {
+  const latin1Bytes = new Uint8Array([0xc1, 0xe9, 0xf3]);
+
+  const expectedString = 'Áéó';
+
+  const decoder = new TextDecoder('windows-1252');
+  const decodedString = decoder.decode(latin1Bytes);
+
+  assert.strictEqual(decodedString, expectedString);
+});

From 722dada673067e7c82687d9d677968be33243965 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alfredo=20Gonz=C3=A1lez?=
 <12631491+mfdebian@users.noreply.github.com>
Date: Sat, 14 Dec 2024 16:50:34 -0300
Subject: [PATCH 195/216] doc: add esm examples to node:readline

PR-URL: https://github.com/nodejs/node/pull/55335
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 doc/api/readline.md | 147 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 127 insertions(+), 20 deletions(-)
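
Note for reviewers: a minimal, self-contained sketch of the line-by-line
pattern that the updated examples document. It is written as CommonJS so it
runs as a plain script. `collectLines` and the in-memory input are
hypothetical stand-ins for reading `input.txt` and are not part of this patch.

```js
const { Readable } = require('node:stream');
const { createInterface } = require('node:readline');

// Hypothetical helper: gathers every line of a readable stream into an array.
async function collectLines(input) {
  const rl = createInterface({
    input,
    // Treat every CR LF ('\r\n') pair as a single line break.
    crlfDelay: Infinity,
  });

  const lines = [];
  for await (const line of rl) {
    lines.push(line);
  }
  return lines;
}

// An in-memory stream stands in for fs.createReadStream('input.txt').
collectLines(Readable.from(['first\nsecond\r\nthird\n'])).then((lines) => {
  console.log(lines);
});
```

Returning the collected lines rather than logging inside the loop keeps the
helper easy to check in isolation. For large files the streaming form shown in
the documentation is the better fit.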

diff --git a/doc/api/readline.md b/doc/api/readline.md
index bf0951fdd1b55c..cb1aa52605dc4a 100644
--- a/doc/api/readline.md
+++ b/doc/api/readline.md
@@ -703,9 +703,18 @@ added: v17.0.0
 The `readlinePromises.createInterface()` method creates a new `readlinePromises.Interface`
 instance.
 
-```js
-const readlinePromises = require('node:readline/promises');
-const rl = readlinePromises.createInterface({
+```mjs
+import { createInterface } from 'node:readline/promises';
+import { stdin, stdout } from 'node:process';
+const rl = createInterface({
+  input: stdin,
+  output: stdout,
+});
+```
+
+```cjs
+const { createInterface } = require('node:readline/promises');
+const rl = createInterface({
   input: process.stdin,
   output: process.stdout,
 });
@@ -960,9 +969,18 @@ changes:
 The `readline.createInterface()` method creates a new `readline.Interface`
 instance.
 
-```js
-const readline = require('node:readline');
-const rl = readline.createInterface({
+```mjs
+import { createInterface } from 'node:readline';
+import { stdin, stdout } from 'node:process';
+const rl = createInterface({
+  input: stdin,
+  output: stdout,
+});
+```
+
+```cjs
+const { createInterface } = require('node:readline');
+const rl = createInterface({
   input: process.stdin,
   output: process.stdout,
 });
@@ -1098,9 +1116,36 @@ if (process.stdin.isTTY)
 The following example illustrates the use of `readline.Interface` class to
 implement a small command-line interface:
 
-```js
-const readline = require('node:readline');
-const rl = readline.createInterface({
+```mjs
+import { createInterface } from 'node:readline';
+import { exit, stdin, stdout } from 'node:process';
+const rl = createInterface({
+  input: stdin,
+  output: stdout,
+  prompt: 'OHAI> ',
+});
+
+rl.prompt();
+
+rl.on('line', (line) => {
+  switch (line.trim()) {
+    case 'hello':
+      console.log('world!');
+      break;
+    default:
+      console.log(`Say what? I might have heard '${line.trim()}'`);
+      break;
+  }
+  rl.prompt();
+}).on('close', () => {
+  console.log('Have a great day!');
+  exit(0);
+});
+```
+
+```cjs
+const { createInterface } = require('node:readline');
+const rl = createInterface({
   input: process.stdin,
   output: process.stdout,
   prompt: 'OHAI> ',
@@ -1130,14 +1175,37 @@ A common use case for `readline` is to consume an input file one line at a
 time. The easiest way to do so is leveraging the [`fs.ReadStream`][] API as
 well as a `for await...of` loop:
 
-```js
-const fs = require('node:fs');
-const readline = require('node:readline');
+```mjs
+import { createReadStream } from 'node:fs';
+import { createInterface } from 'node:readline';
 
 async function processLineByLine() {
-  const fileStream = fs.createReadStream('input.txt');
+  const fileStream = createReadStream('input.txt');
 
-  const rl = readline.createInterface({
+  const rl = createInterface({
+    input: fileStream,
+    crlfDelay: Infinity,
+  });
+  // Note: we use the crlfDelay option to recognize all instances of CR LF
+  // ('\r\n') in input.txt as a single line break.
+
+  for await (const line of rl) {
+    // Each line in input.txt will be successively available here as `line`.
+    console.log(`Line from file: ${line}`);
+  }
+}
+
+processLineByLine();
+```
+
+```cjs
+const { createReadStream } = require('node:fs');
+const { createInterface } = require('node:readline');
+
+async function processLineByLine() {
+  const fileStream = createReadStream('input.txt');
+
+  const rl = createInterface({
     input: fileStream,
     crlfDelay: Infinity,
   });
@@ -1155,12 +1223,26 @@ processLineByLine();
 
 Alternatively, one could use the [`'line'`][] event:
 
-```js
-const fs = require('node:fs');
-const readline = require('node:readline');
+```mjs
+import { createReadStream } from 'node:fs';
+import { createInterface } from 'node:readline';
 
-const rl = readline.createInterface({
-  input: fs.createReadStream('sample.txt'),
+const rl = createInterface({
+  input: createReadStream('sample.txt'),
+  crlfDelay: Infinity,
+});
+
+rl.on('line', (line) => {
+  console.log(`Line from file: ${line}`);
+});
+```
+
+```cjs
+const { createReadStream } = require('node:fs');
+const { createInterface } = require('node:readline');
+
+const rl = createInterface({
+  input: createReadStream('sample.txt'),
   crlfDelay: Infinity,
 });
 
@@ -1172,7 +1254,32 @@ rl.on('line', (line) => {
 Currently, `for await...of` loop can be a bit slower. If `async` / `await`
 flow and speed are both essential, a mixed approach can be applied:
 
-```js
+```mjs
+import { once } from 'node:events';
+import { createReadStream } from 'node:fs';
+import { createInterface } from 'node:readline';
+
+(async function processLineByLine() {
+  try {
+    const rl = createInterface({
+      input: createReadStream('big-file.txt'),
+      crlfDelay: Infinity,
+    });
+
+    rl.on('line', (line) => {
+      // Process the line.
+    });
+
+    await once(rl, 'close');
+
+    console.log('File processed.');
+  } catch (err) {
+    console.error(err);
+  }
+})();
+```
+
+```cjs
 const { once } = require('node:events');
 const { createReadStream } = require('node:fs');
 const { createInterface } = require('node:readline');

From 2d88c4b42551c871598f55b7665118a3e2351d8f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alfredo=20Gonz=C3=A1lez?=
 <12631491+mfdebian@users.noreply.github.com>
Date: Sat, 14 Dec 2024 16:50:52 -0300
Subject: [PATCH 196/216] doc: add esm examples to node:repl

PR-URL: https://github.com/nodejs/node/pull/55432
Reviewed-By: Tierney Cyren <hello@bnb.im>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 doc/api/repl.md | 166 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 150 insertions(+), 16 deletions(-)
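
Note for reviewers: a small sketch of how the new squaring evaluator behaves
when it is called outside a REPL. The `evaluate` driver is hypothetical and
only stands in for `repl.start()`. It assumes the REPL hands `cmd` to the
evaluator as a string that ends in a newline, so the multiplication relies on
JavaScript's string-to-number coercion.

```js
function byThePowerOfTwo(number) {
  // `number` may be a string such as '4\n'; `*` coerces it to a number.
  return number * number;
}

function myEval(cmd, context, filename, callback) {
  callback(null, byThePowerOfTwo(cmd));
}

// Hypothetical driver: calls the evaluator the way a REPL would and
// returns whatever value it passes to the callback.
function evaluate(cmd) {
  let result;
  myEval(cmd, {}, 'repl', (err, value) => {
    result = value;
  });
  return result;
}

console.log(evaluate('4\n'));
console.log(evaluate('four\n'));
```

Non-numeric input produces `NaN` instead of an error, which may be worth
keeping in mind if the example is ever extended.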

diff --git a/doc/api/repl.md b/doc/api/repl.md
index 8d00cdeed3916a..a134c493a54812 100644
--- a/doc/api/repl.md
+++ b/doc/api/repl.md
@@ -10,7 +10,11 @@ The `node:repl` module provides a Read-Eval-Print-Loop (REPL) implementation
 that is available both as a standalone program or includible in other
 applications. It can be accessed using:
 
-```js
+```mjs
+import repl from 'node:repl';
+```
+
+```cjs
 const repl = require('node:repl');
 ```
 
@@ -106,7 +110,14 @@ The default evaluator provides access to any variables that exist in the global
 scope. It is possible to expose a variable to the REPL explicitly by assigning
 it to the `context` object associated with each `REPLServer`:
 
-```js
+```mjs
+import repl from 'node:repl';
+const msg = 'message';
+
+repl.start('> ').context.m = msg;
+```
+
+```cjs
 const repl = require('node:repl');
 const msg = 'message';
 
@@ -124,7 +135,19 @@ $ node repl_test.js
 Context properties are not read-only by default. To specify read-only globals,
 context properties must be defined using `Object.defineProperty()`:
 
-```js
+```mjs
+import repl from 'node:repl';
+const msg = 'message';
+
+const r = repl.start('> ');
+Object.defineProperty(r.context, 'm', {
+  configurable: false,
+  enumerable: true,
+  value: msg,
+});
+```
+
+```cjs
 const repl = require('node:repl');
 const msg = 'message';
 
@@ -280,20 +303,34 @@ When a new [`repl.REPLServer`][] is created, a custom evaluation function may be
 provided. This can be used, for instance, to implement fully customized REPL
 applications.
 
-The following illustrates a hypothetical example of a REPL that performs
-translation of text from one language to another:
+The following illustrates an example of a REPL that squares a given number:
 
-```js
+```mjs
+import repl from 'node:repl';
+
+function byThePowerOfTwo(number) {
+  return number * number;
+}
+
+function myEval(cmd, context, filename, callback) {
+  callback(null, byThePowerOfTwo(cmd));
+}
+
+repl.start({ prompt: 'Enter a number: ', eval: myEval });
+```
+
+```cjs
 const repl = require('node:repl');
-const { Translator } = require('translator');
 
-const myTranslator = new Translator('en', 'fr');
+function byThePowerOfTwo(number) {
+  return number * number;
+}
 
 function myEval(cmd, context, filename, callback) {
-  callback(null, myTranslator.translate(cmd));
+  callback(null, byThePowerOfTwo(cmd));
 }
 
-repl.start({ prompt: '> ', eval: myEval });
+repl.start({ prompt: 'Enter a number: ', eval: myEval });
 ```
 
 #### Recoverable errors
@@ -354,7 +391,21 @@ To fully customize the output of a [`repl.REPLServer`][] instance pass in a new
 function for the `writer` option on construction. The following example, for
 instance, simply converts any input text to upper case:
 
-```js
+```mjs
+import repl from 'node:repl';
+
+const r = repl.start({ prompt: '> ', eval: myEval, writer: myWriter });
+
+function myEval(cmd, context, filename, callback) {
+  callback(null, cmd);
+}
+
+function myWriter(output) {
+  return output.toUpperCase();
+}
+```
+
+```cjs
 const repl = require('node:repl');
 
 const r = repl.start({ prompt: '> ', eval: myEval, writer: myWriter });
@@ -380,7 +431,16 @@ added: v0.1.91
 Instances of `repl.REPLServer` are created using the [`repl.start()`][] method
 or directly using the JavaScript `new` keyword.
 
-```js
+```mjs
+import repl from 'node:repl';
+
+const options = { useColors: true };
+
+const firstInstance = repl.start(options);
+const secondInstance = new repl.REPLServer(options);
+```
+
+```cjs
 const repl = require('node:repl');
 
 const options = { useColors: true };
@@ -424,7 +484,20 @@ reference to the `context` object as the only argument.
 This can be used primarily to re-initialize REPL context to some pre-defined
 state:
 
-```js
+```mjs
+import repl from 'node:repl';
+
+function initializeContext(context) {
+  context.m = 'test';
+}
+
+const r = repl.start({ prompt: '> ' });
+initializeContext(r.context);
+
+r.on('reset', initializeContext);
+```
+
+```cjs
 const repl = require('node:repl');
 
 function initializeContext(context) {
@@ -475,7 +548,25 @@ properties:
 
 The following example shows two new commands added to the REPL instance:
 
-```js
+```mjs
+import repl from 'node:repl';
+
+const replServer = repl.start({ prompt: '> ' });
+replServer.defineCommand('sayhello', {
+  help: 'Say hello',
+  action(name) {
+    this.clearBufferedCommand();
+    console.log(`Hello, ${name}!`);
+    this.displayPrompt();
+  },
+});
+replServer.defineCommand('saybye', function saybye() {
+  console.log('Goodbye!');
+  this.close();
+});
+```
+
+```cjs
 const repl = require('node:repl');
 
 const replServer = repl.start({ prompt: '> ' });
@@ -637,7 +728,14 @@ The `repl.start()` method creates and starts a [`repl.REPLServer`][] instance.
 
 If `options` is a string, then it specifies the input prompt:
 
-```js
+```mjs
+import repl from 'node:repl';
+
+// a Unix style prompt
+repl.start('$ ');
+```
+
+```cjs
 const repl = require('node:repl');
 
 // a Unix style prompt
@@ -709,7 +807,43 @@ separate I/O interfaces.
 The following example, for instance, provides separate REPLs on `stdin`, a Unix
 socket, and a TCP socket:
 
-```js
+```mjs
+import net from 'node:net';
+import repl from 'node:repl';
+import process from 'node:process';
+
+let connections = 0;
+
+repl.start({
+  prompt: 'Node.js via stdin> ',
+  input: process.stdin,
+  output: process.stdout,
+});
+
+net.createServer((socket) => {
+  connections += 1;
+  repl.start({
+    prompt: 'Node.js via Unix socket> ',
+    input: socket,
+    output: socket,
+  }).on('exit', () => {
+    socket.end();
+  });
+}).listen('/tmp/node-repl-sock');
+
+net.createServer((socket) => {
+  connections += 1;
+  repl.start({
+    prompt: 'Node.js via TCP socket> ',
+    input: socket,
+    output: socket,
+  }).on('exit', () => {
+    socket.end();
+  });
+}).listen(5001);
+```
+
+```cjs
 const net = require('node:net');
 const repl = require('node:repl');
 let connections = 0;

From ea1c97ac16b44dc4e3c7cc7b8c680cf7bc50b6f8 Mon Sep 17 00:00:00 2001
From: Duncan <duncpro@icloud.com>
Date: Sun, 15 Dec 2024 16:56:39 -0500
Subject: [PATCH 197/216] buffer: document concat zero-fill
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/55562
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Ulises Gascón <ulisesgascongonzalez@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jason Zhang <xzha4350@gmail.com>
---
 doc/api/buffer.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
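
Note for reviewers: a short sketch of the two `totalLength` cases described in
the updated paragraph. The buffer contents are arbitrary illustrative values.

```js
const { Buffer } = require('node:buffer');

const a = Buffer.from([1, 2]);
const b = Buffer.from([3]);

// totalLength larger than the combined length: the tail is zero-filled.
const padded = Buffer.concat([a, b], 5);

// totalLength smaller than the combined length: the result is truncated.
const truncated = Buffer.concat([a, b], 2);

console.log(padded);
console.log(truncated);
```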

diff --git a/doc/api/buffer.md b/doc/api/buffer.md
index ca820ba909dfdb..7844b840c9cc8d 100644
--- a/doc/api/buffer.md
+++ b/doc/api/buffer.md
@@ -1041,7 +1041,8 @@ in `list` by adding their lengths.
 
 If `totalLength` is provided, it is coerced to an unsigned integer. If the
 combined length of the `Buffer`s in `list` exceeds `totalLength`, the result is
-truncated to `totalLength`.
+truncated to `totalLength`. If the combined length of the `Buffer`s in `list` is
+less than `totalLength`, the remaining space is filled with zeros.
 
 ```mjs
 import { Buffer } from 'node:buffer';

From a51ef9d8299d5f005f5bb32dc5db0f0aaa7cf17b Mon Sep 17 00:00:00 2001
From: Kunal Kumar <kunaldevspro@gmail.com>
Date: Mon, 16 Dec 2024 03:26:47 +0530
Subject: [PATCH 198/216] doc: clarify util.aborted resource usage

PR-URL: https://github.com/nodejs/node/pull/55780
Fixes: https://github.com/nodejs/node/issues/55340
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jason Zhang <xzha4350@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
---
 doc/api/util.md | 34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)
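
Note for reviewers: the documented example depends on an undefined
`obtainSomethingAbortable()`, so here is a runnable sketch of the same flow
that uses a plain `AbortController`. The `resource` object is a hypothetical
placeholder for whatever the abortable operation is tied to. It stays
referenced for the whole script, so the promise can settle.

```js
const { aborted } = require('node:util');

const controller = new AbortController();

// Hypothetical resource; util.aborted() only holds it weakly.
const resource = { name: 'hypothetical-operation' };

// Resolves once the signal aborts, provided `resource` is still alive.
const settled = aborted(controller.signal, resource).then(() => 'aborted');

// Trigger the abort; in real code this would follow a timeout or user action.
controller.abort();

settled.then((state) => console.log(state));
```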

diff --git a/doc/api/util.md b/doc/api/util.md
index 276cf5ccd82946..74e37f237a9e1d 100644
--- a/doc/api/util.md
+++ b/doc/api/util.md
@@ -2123,39 +2123,55 @@ added: v19.7.0
 > Stability: 1 - Experimental
 
 * `signal` {AbortSignal}
-* `resource` {Object} Any non-null entity, reference to which is held weakly.
+* `resource` {Object} Any non-null object tied to the abortable operation and held weakly.
+  If `resource` is garbage collected before the `signal` aborts, the promise remains pending,
+  allowing Node.js to stop tracking it.
+  This helps prevent memory leaks in long-running or non-cancelable operations.
 * Returns: {Promise}
 
-Listens to abort event on the provided `signal` and
-returns a promise that is fulfilled when the `signal` is
-aborted. If the passed `resource` is garbage collected before the `signal` is
-aborted, the returned promise shall remain pending indefinitely.
+Listens to the abort event on the provided `signal` and returns a promise that resolves when the `signal` is aborted.
+If `resource` is provided, it is referenced weakly as the operation's associated object,
+so if `resource` is garbage collected before the `signal` aborts,
+the returned promise shall remain pending.
+This prevents memory leaks in long-running or non-cancelable operations.
 
 ```cjs
 const { aborted } = require('node:util');
 
+// Obtain an object with an abortable signal, like a custom resource or operation.
 const dependent = obtainSomethingAbortable();
 
+// Pass `dependent` as the resource, indicating the promise should only resolve
+// if `dependent` is still in memory when the signal is aborted.
 aborted(dependent.signal, dependent).then(() => {
-  // Do something when dependent is aborted.
+
+  // This code runs when `dependent` is aborted.
+  console.log('Dependent resource was aborted.');
 });
 
+// Simulate an event that triggers the abort.
 dependent.on('event', () => {
-  dependent.abort();
+  dependent.abort(); // This will cause the `aborted` promise to resolve.
 });
 ```
 
 ```mjs
 import { aborted } from 'node:util';
 
+// Obtain an object with an abortable signal, like a custom resource or operation.
 const dependent = obtainSomethingAbortable();
 
+// Pass `dependent` as the resource, indicating the promise should only resolve
+// if `dependent` is still in memory when the signal is aborted.
 aborted(dependent.signal, dependent).then(() => {
-  // Do something when dependent is aborted.
+
+  // This code runs when `dependent` is aborted.
+  console.log('Dependent resource was aborted.');
 });
 
+// Simulate an event that triggers the abort.
 dependent.on('event', () => {
-  dependent.abort();
+  dependent.abort(); // This will cause the `aborted` promise to resolve.
 });
 ```
 

From d6a1b7440499110e25a5cb7a4574bdc34e3474e8 Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Mon, 16 Dec 2024 20:51:05 -0300
Subject: [PATCH 199/216] build: add major release action

This action reminds collaborators of the upcoming
major release date. In the future, this action can
also update and create the branches (that's why the
action name is generic).

PR-URL: https://github.com/nodejs/node/pull/56199
Refs: https://github.com/nodejs/node/pull/55732
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
---
 .github/workflows/major-release.yml | 48 +++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
 create mode 100644 .github/workflows/major-release.yml
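
Note for reviewers: the date handling in the workflow is spread across `jq` and
`date`, so here is a rough JavaScript sketch of the same selection logic for
reasoning about it. `nextMajor` and the schedule entries below are hypothetical
illustrations. The real data comes from `schedule.json` in nodejs/Release.

```js
// Hypothetical excerpt shaped like schedule.json: version -> { start }.
const schedule = {
  v22: { start: '2024-04-24' },
  v23: { start: '2024-10-15' },
  v24: { start: '2025-04-22' },
};

// Picks the first entry whose start date is still in the future and
// computes the cut-off for landing semver-major PRs (one month earlier).
function nextMajor(entries, now) {
  const upcoming = Object.entries(entries)
    .find(([, info]) => Date.parse(info.start) > now);
  if (!upcoming) return null;

  const [version, info] = upcoming;
  const maxDate = new Date(info.start); // 'YYYY-MM-DD' is parsed as UTC.
  maxDate.setUTCMonth(maxDate.getUTCMonth() - 1);

  return {
    version,
    releaseDate: info.start,
    prMaxDate: maxDate.toISOString().slice(0, 10),
  };
}

const reminder = nextMajor(schedule, Date.parse('2025-02-15'));
console.log(reminder);
```

Like the `jq` filter, this relies on the schedule being listed in
chronological order, since it takes the first future entry rather than the
earliest one.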

diff --git a/.github/workflows/major-release.yml b/.github/workflows/major-release.yml
new file mode 100644
index 00000000000000..a90be1798fac85
--- /dev/null
+++ b/.github/workflows/major-release.yml
@@ -0,0 +1,48 @@
+name: Major Release
+
+on:
+  schedule:
+    - cron: 0 0 15 2,8 *  # runs at midnight UTC every 15 February and 15 August
+
+permissions:
+  contents: read
+
+jobs:
+  create-issue:
+    runs-on: ubuntu-latest
+    permissions:
+      issues: write
+    steps:
+      - name: Check for release schedule
+        id: check-date
+        run: |
+          # Get the current month and day
+          MONTH=$(date +'%m')
+          DAY=$(date +'%d')
+          # We'll create the reminder issue two months prior to the release
+          if [[ "$MONTH" == "02" || "$MONTH" == "08" ]] && [[ "$DAY" == "15" ]]; then
+            echo "create_issue=true" >> "$GITHUB_ENV"
+          fi
+      - name: Retrieve next major release info from nodejs/Release
+        if: env.create_issue == 'true'
+        run: |
+          curl -L https://github.com/nodejs/Release/raw/HEAD/schedule.json | \
+          jq -r 'to_entries | map(select(.value.start | strptime("%Y-%m-%d") | mktime > now)) | first | "VERSION=" + .key + "\nRELEASE_DATE=" + .value.start' >> "$GITHUB_ENV"
+      - name: Compute max date for landing semver-major PRs
+        if: env.create_issue == 'true'
+        run: |
+          echo "PR_MAX_DATE=$(date -d "$RELEASE_DATE -1 month" +%Y-%m-%d)" >> "$GITHUB_ENV"
+      - name: Create release announcement issue
+        if: env.create_issue == 'true'
+        run: |
+         gh issue create --repo "${GITHUB_REPOSITORY}" \
+           --title "Upcoming Node.js Major Release ($VERSION)" \
+           --body-file -<<EOF
+            A reminder that the next Node.js **SemVer Major release** is scheduled for **${RELEASE_DATE}**.
+            All commits that were landed until **${PR_MAX_DATE}** (one month prior to the release) will be included in the next semver major release. Please ensure that any necessary preparations are made in advance.
+            For more details on the release process, consult the [Node.js Release Working Group repository](https://github.com/nodejs/release).
+
+            cc: @nodejs/collaborators
+         EOF
+        env:
+          GH_TOKEN: ${{ github.token }}

From bc7bb1e269b23b7965124982fe2810022c9552b5 Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Mon, 16 Dec 2024 20:02:29 -0500
Subject: [PATCH 200/216] deps: update c-ares to v1.34.4

PR-URL: https://github.com/nodejs/node/pull/56256
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
---
 deps/cares/CMakeLists.txt                     |   9 +-
 deps/cares/Makefile.am                        |  29 +-
 deps/cares/Makefile.in                        |  35 +-
 deps/cares/Makefile.msvc                      |  29 +-
 deps/cares/RELEASE-NOTES.md                   | 100 +--
 deps/cares/aclocal.m4                         |   4 +-
 deps/cares/aminclude_static.am                |   6 +-
 deps/cares/configure                          | 669 ++++++++++++------
 deps/cares/configure.ac                       |  43 +-
 deps/cares/docs/Makefile.in                   |   6 +-
 deps/cares/docs/ares_create_query.3           |   3 +
 deps/cares/docs/ares_mkquery.3                |   3 +-
 deps/cares/docs/ares_send.3                   |   3 +
 deps/cares/include/Makefile.in                |   6 +-
 deps/cares/include/ares.h                     |   2 +-
 deps/cares/include/ares_version.h             |   4 +-
 ...espace.m4 => ares_check_user_namespace.m4} |  12 +-
 ...mespace.m4 => ares_check_uts_namespace.m4} |  12 +-
 deps/cares/m4/ax_append_compile_flags.m4      |  47 +-
 deps/cares/m4/ax_append_flag.m4               |  61 +-
 deps/cares/m4/ax_check_compile_flag.m4        |  57 +-
 deps/cares/m4/ax_code_coverage.m4             |  13 +-
 deps/cares/m4/ax_cxx_compile_stdcxx.m4        |  88 ++-
 deps/cares/src/Makefile.in                    |   6 +-
 deps/cares/src/lib/CMakeLists.txt             |  14 +-
 deps/cares/src/lib/Makefile.in                |  12 +-
 deps/cares/src/lib/ares_config.h.cmake        |   3 +
 deps/cares/src/lib/ares_config.h.in           |   3 +
 deps/cares/src/lib/ares_private.h             |  17 +-
 .../cares/src/lib/ares_set_socket_functions.c |   4 +-
 deps/cares/src/lib/ares_socket.c              |   3 +-
 deps/cares/src/lib/ares_sysconfig.c           |  92 ++-
 deps/cares/src/lib/ares_sysconfig_files.c     |  89 ++-
 .../src/lib/event/ares_event_configchg.c      |  22 +-
 deps/cares/src/lib/include/ares_buf.h         |  20 +
 deps/cares/src/lib/include/ares_str.h         |  14 +
 .../src/lib/record/ares_dns_multistring.c     |  58 +-
 deps/cares/src/lib/str/ares_buf.c             |  66 ++
 deps/cares/src/lib/str/ares_str.c             |  17 +
 deps/cares/src/tools/Makefile.in              |   6 +-
 40 files changed, 1099 insertions(+), 588 deletions(-)
 rename deps/cares/m4/{ax_check_user_namespace.m4 => ares_check_user_namespace.m4} (82%)
 rename deps/cares/m4/{ax_check_uts_namespace.m4 => ares_check_uts_namespace.m4} (87%)

diff --git a/deps/cares/CMakeLists.txt b/deps/cares/CMakeLists.txt
index f6560d56b08ddd..139defd8ffd159 100644
--- a/deps/cares/CMakeLists.txt
+++ b/deps/cares/CMakeLists.txt
@@ -1,6 +1,6 @@
 # Copyright (C) The c-ares project and its contributors
 # SPDX-License-Identifier: MIT
-CMAKE_MINIMUM_REQUIRED (VERSION 3.5.0)
+CMAKE_MINIMUM_REQUIRED (VERSION 3.5.0...3.10.0)
 
 list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake/")
 
@@ -12,7 +12,7 @@ INCLUDE (CheckCSourceCompiles)
 INCLUDE (CheckStructHasMember)
 INCLUDE (CheckLibraryExists)
 
-PROJECT (c-ares LANGUAGES C VERSION "1.34.3" )
+PROJECT (c-ares LANGUAGES C VERSION "1.34.4" )
 
 # Set this version before release
 SET (CARES_VERSION "${PROJECT_VERSION}")
@@ -30,7 +30,7 @@ INCLUDE (GNUInstallDirs) # include this *AFTER* PROJECT(), otherwise paths are w
 # For example, a version of 4:0:2 would generate output such as:
 #    libname.so   -> libname.so.2
 #    libname.so.2 -> libname.so.2.2.0
-SET (CARES_LIB_VERSIONINFO "21:2:19")
+SET (CARES_LIB_VERSIONINFO "21:3:19")
 
 
 OPTION (CARES_STATIC        "Build as a static library"                                             OFF)
@@ -271,6 +271,8 @@ ELSEIF (CMAKE_SYSTEM_NAME STREQUAL "AIX")
 	LIST (APPEND SYSFLAGS -D_ALL_SOURCE -D_XOPEN_SOURCE=700 -D_USE_IRS)
 ELSEIF (CMAKE_SYSTEM_NAME STREQUAL "FreeBSD")
 	# Don't define _XOPEN_SOURCE on FreeBSD, it actually reduces visibility instead of increasing it
+ELSEIF (CMAKE_SYSTEM_NAME STREQUAL "QNX")
+	LIST (APPEND SYSFLAGS -D_QNX_SOURCE)
 ELSEIF (WIN32)
 	LIST (APPEND SYSFLAGS -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -D_WIN32_WINNT=0x0602)
 ENDIF ()
@@ -406,6 +408,7 @@ ENDIF ()
 
 CHECK_STRUCT_HAS_MEMBER("struct sockaddr_in6" sin6_scope_id "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_STRUCT_SOCKADDR_IN6_SIN6_SCOPE_ID LANGUAGE C)
 
+CHECK_SYMBOL_EXISTS (strnlen         "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_STRNLEN)
 CHECK_SYMBOL_EXISTS (memmem          "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_MEMMEM)
 CHECK_SYMBOL_EXISTS (closesocket     "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_CLOSESOCKET)
 CHECK_SYMBOL_EXISTS (CloseSocket     "${CMAKE_EXTRA_INCLUDE_FILES}" HAVE_CLOSESOCKET_CAMEL)
diff --git a/deps/cares/Makefile.am b/deps/cares/Makefile.am
index e99161a45f7883..51b5f6be32be78 100644
--- a/deps/cares/Makefile.am
+++ b/deps/cares/Makefile.am
@@ -3,17 +3,24 @@
 # Copyright (C) the Massachusetts Institute of Technology.
 # Copyright (C) Daniel Stenberg
 #
-# Permission to use, copy, modify, and distribute this
-# software and its documentation for any purpose and without
-# fee is hereby granted, provided that the above copyright
-# notice appear in all copies and that both that copyright
-# notice and this permission notice appear in supporting
-# documentation, and that the name of M.I.T. not be used in
-# advertising or publicity pertaining to distribution of the
-# software without specific, written prior permission.
-# M.I.T. makes no representations about the suitability of
-# this software for any purpose.  It is provided "as is"
-# without express or implied warranty.
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice (including the next
+# paragraph) shall be included in all copies or substantial portions of the
+# Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
 #
 # SPDX-License-Identifier: MIT
 #
diff --git a/deps/cares/Makefile.in b/deps/cares/Makefile.in
index ba78cb77cbe335..2342125d136526 100644
--- a/deps/cares/Makefile.in
+++ b/deps/cares/Makefile.in
@@ -19,17 +19,24 @@
 # Copyright (C) the Massachusetts Institute of Technology.
 # Copyright (C) Daniel Stenberg
 #
-# Permission to use, copy, modify, and distribute this
-# software and its documentation for any purpose and without
-# fee is hereby granted, provided that the above copyright
-# notice appear in all copies and that both that copyright
-# notice and this permission notice appear in supporting
-# documentation, and that the name of M.I.T. not be used in
-# advertising or publicity pertaining to distribution of the
-# software without specific, written prior permission.
-# M.I.T. makes no representations about the suitability of
-# this software for any purpose.  It is provided "as is"
-# without express or implied warranty.
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice (including the next
+# paragraph) shall be included in all copies or substantial portions of the
+# Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
 #
 # SPDX-License-Identifier: MIT
 #
@@ -111,7 +118,9 @@ build_triplet = @build@
 host_triplet = @host@
 subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_ac_append_to_file.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ares_check_user_namespace.m4 \
+	$(top_srcdir)/m4/ares_check_uts_namespace.m4 \
+	$(top_srcdir)/m4/ax_ac_append_to_file.m4 \
 	$(top_srcdir)/m4/ax_ac_print_to_file.m4 \
 	$(top_srcdir)/m4/ax_add_am_macro_static.m4 \
 	$(top_srcdir)/m4/ax_am_macros_static.m4 \
@@ -121,8 +130,6 @@ am__aclocal_m4_deps = $(top_srcdir)/m4/ax_ac_append_to_file.m4 \
 	$(top_srcdir)/m4/ax_check_compile_flag.m4 \
 	$(top_srcdir)/m4/ax_check_gnu_make.m4 \
 	$(top_srcdir)/m4/ax_check_link_flag.m4 \
-	$(top_srcdir)/m4/ax_check_user_namespace.m4 \
-	$(top_srcdir)/m4/ax_check_uts_namespace.m4 \
 	$(top_srcdir)/m4/ax_code_coverage.m4 \
 	$(top_srcdir)/m4/ax_compiler_vendor.m4 \
 	$(top_srcdir)/m4/ax_cxx_compile_stdcxx.m4 \
diff --git a/deps/cares/Makefile.msvc b/deps/cares/Makefile.msvc
index 8395d1a7d67728..3266db415e09fe 100644
--- a/deps/cares/Makefile.msvc
+++ b/deps/cares/Makefile.msvc
@@ -1,17 +1,24 @@
 
 # Copyright (C) 2009-2013 by Daniel Stenberg
 #
-# Permission to use, copy, modify, and distribute this
-# software and its documentation for any purpose and without
-# fee is hereby granted, provided that the above copyright
-# notice appear in all copies and that both that copyright
-# notice and this permission notice appear in supporting
-# documentation, and that the name of M.I.T. not be used in
-# advertising or publicity pertaining to distribution of the
-# software without specific, written prior permission.
-# M.I.T. makes no representations about the suitability of
-# this software for any purpose.  It is provided "as is"
-# without express or implied warranty.
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice (including the next
+# paragraph) shall be included in all copies or substantial portions of the
+# Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
 #
 # SPDX-License-Identifier: MIT
 
diff --git a/deps/cares/RELEASE-NOTES.md b/deps/cares/RELEASE-NOTES.md
index f9d58d278432f1..19a204b3ea96bd 100644
--- a/deps/cares/RELEASE-NOTES.md
+++ b/deps/cares/RELEASE-NOTES.md
@@ -1,97 +1,25 @@
-## c-ares version 1.34.3 - November 9 2024
+## c-ares version 1.34.4 - December 14 2024
 
 This is a bugfix release.
 
 Changes:
-* Build the release package in an automated way so we can provide
-  provenance as per [SLSA3](https://slsa.dev/).
-  [PR #906](https://github.com/c-ares/c-ares/pull/906)
+* QNX Port: Port to QNX 8, add primary config reading support, add CI build. [PR #934](https://github.com/c-ares/c-ares/pull/934), [PR #937](https://github.com/c-ares/c-ares/pull/937), [PR #938](https://github.com/c-ares/c-ares/pull/938)
 
 Bugfixes:
-* Some upstream servers are non-compliant with EDNS options, resend queries
-  without EDNS. [Issue #911](https://github.com/c-ares/c-ares/issues/911)
-* Android: <=7 needs sys/system_properties.h
-  [a70637c](https://github.com/c-ares/c-ares/commit/a70637c)
-* Android: CMake needs `-D_GNU_SOURCE` and others.
-  [PR #915](https://github.com/c-ares/c-ares/pull/914)
-* TSAN warns on missing lock, but lock isn't actually necessary.
-  [PR #915](https://github.com/c-ares/c-ares/pull/915)
-* `ares_getaddrinfo()` for `AF_UNSPEC` should retry IPv4 if only IPv6 is
-  received. [765d558](https://github.com/c-ares/c-ares/commit/765d558)
-* `ares_send()` shouldn't return `ARES_EBADRESP`, its `ARES_EBADQUERY`.
-  [91519e7](https://github.com/c-ares/c-ares/commit/91519e7)
-* Fix typos in man pages. [PR #905](https://github.com/c-ares/c-ares/pull/905)
+* Empty TXT records were not being preserved. [PR #922](https://github.com/c-ares/c-ares/pull/922)
+* docs: update deprecation notices for `ares_create_query()` and `ares_mkquery()`. [PR #910](https://github.com/c-ares/c-ares/pull/910)
+* license: some files weren't properly updated. [PR #920](https://github.com/c-ares/c-ares/pull/920)
+* Fix bind local device regression from 1.34.0. [PR #929](https://github.com/c-ares/c-ares/pull/929), [PR #931](https://github.com/c-ares/c-ares/pull/931), [PR #935](https://github.com/c-ares/c-ares/pull/935)
+* CMake: set policy version to prevent deprecation warnings. [PR #932](https://github.com/c-ares/c-ares/pull/932)
+* CMake: shared and static library names should be the same on unix platforms like autotools uses. [PR #933](https://github.com/c-ares/c-ares/pull/933)
+* Update to latest autoconf archive macros for enhanced system compatibility. [PR #936](https://github.com/c-ares/c-ares/pull/936)
 
 Thanks go to these friendly people for their efforts and contributions for this
 release:
 
 * Brad House (@bradh352)
-* Jiwoo Park (@jimmy-park)
-
-
-## c-ares version 1.34.2 - October 15 2024
-
-This release contains a fix for downstream packages detecting the c-ares
-version based on the contents of the header file rather than the
-distributed pkgconf or cmake files.
-
-## c-ares version 1.34.1 - October 9 2024
-
-This release fixes a packaging issue.
-
-
-## c-ares version 1.34.0 - October 9 2024
-
-This is a feature and bugfix release.
-
-Features:
-* adig: read arguments from adigrc.
-  [PR #856](https://github.com/c-ares/c-ares/pull/856)
-* Add new pending write callback optimization via `ares_set_pending_write_cb`.
-  [PR #857](https://github.com/c-ares/c-ares/pull/857)
-* New function `ares_process_fds()`.
-  [PR #875](https://github.com/c-ares/c-ares/pull/875)
-* Failed servers should be probed rather than redirecting queries which could
-  cause unexpected latency.
-  [PR #877](https://github.com/c-ares/c-ares/pull/877)
-* adig: rework command line arguments to mimic dig from bind.
-  [PR #890](https://github.com/c-ares/c-ares/pull/890)
-* Add new method for overriding network functions
-  `ares_set_socket_function_ex()` to properly support all new functionality.
-  [PR #894](https://github.com/c-ares/c-ares/pull/894)
-* Fix regression with custom socket callbacks due to DNS cookie support.
-  [PR #895](https://github.com/c-ares/c-ares/pull/895)
-* ares_socket: set IP_BIND_ADDRESS_NO_PORT on ares_set_local_ip* tcp sockets
-  [PR #887](https://github.com/c-ares/c-ares/pull/887)
-* URI parser/writer for ares_set_servers_csv()/ares_get_servers_csv().
-  [PR #882](https://github.com/c-ares/c-ares/pull/882)
-
-Changes:
-* Connection handling modularization.
-  [PR #857](https://github.com/c-ares/c-ares/pull/857),
-  [PR #876](https://github.com/c-ares/c-ares/pull/876)
-* Expose library/utility functions to tools.
-  [PR #860](https://github.com/c-ares/c-ares/pull/860)
-* Remove `ares__` prefix, just use `ares_` for internal functions.
-  [PR #872](https://github.com/c-ares/c-ares/pull/872)
-
-
-Bugfixes:
-* fix: potential WIN32_LEAN_AND_MEAN redefinition.
-  [PR #869](https://github.com/c-ares/c-ares/pull/869)
-* Fix googletest v1.15 compatibility.
-  [PR #874](https://github.com/c-ares/c-ares/pull/874)
-* Fix pkgconfig thread dependencies.
-  [PR #884](https://github.com/c-ares/c-ares/pull/884)
-
-
-Thanks go to these friendly people for their efforts and contributions for this
-release:
-
-* Brad House (@bradh352)
-* Cristian Rodríguez (@crrodriguez)
-* Georg (@tacerus)
-* @lifenjoiner
-* Shelley Vohr (@codebytere)
-* 前进，前进，进 (@leleliu008)
-
+* Daniel Stenberg (@bagder)
+* Gregor Jasny (@gjasny)
+* @marcovsz
+* Nikolaos Chatzikonstantinou (@createyourpersonalaccount)
+* @vlasovsoft1979
diff --git a/deps/cares/aclocal.m4 b/deps/cares/aclocal.m4
index ce7ad1c8a86a43..04f8786c9c0c89 100644
--- a/deps/cares/aclocal.m4
+++ b/deps/cares/aclocal.m4
@@ -1221,6 +1221,8 @@ AC_SUBST([am__tar])
 AC_SUBST([am__untar])
 ]) # _AM_PROG_TAR
 
+m4_include([m4/ares_check_user_namespace.m4])
+m4_include([m4/ares_check_uts_namespace.m4])
 m4_include([m4/ax_ac_append_to_file.m4])
 m4_include([m4/ax_ac_print_to_file.m4])
 m4_include([m4/ax_add_am_macro_static.m4])
@@ -1231,8 +1233,6 @@ m4_include([m4/ax_append_link_flags.m4])
 m4_include([m4/ax_check_compile_flag.m4])
 m4_include([m4/ax_check_gnu_make.m4])
 m4_include([m4/ax_check_link_flag.m4])
-m4_include([m4/ax_check_user_namespace.m4])
-m4_include([m4/ax_check_uts_namespace.m4])
 m4_include([m4/ax_code_coverage.m4])
 m4_include([m4/ax_compiler_vendor.m4])
 m4_include([m4/ax_cxx_compile_stdcxx.m4])
diff --git a/deps/cares/aminclude_static.am b/deps/cares/aminclude_static.am
index b83549f81adde4..ec7a86a43e6829 100644
--- a/deps/cares/aminclude_static.am
+++ b/deps/cares/aminclude_static.am
@@ -1,6 +1,6 @@
 
 # aminclude_static.am generated automatically by Autoconf
-# from AX_AM_MACROS_STATIC on Sat Nov  9 17:40:37 UTC 2024
+# from AX_AM_MACROS_STATIC on Sat Dec 14 15:15:44 UTC 2024
 
 
 # Code coverage
@@ -66,7 +66,7 @@ code_coverage_v_lcov_cap_ = $(code_coverage_v_lcov_cap_$(AM_DEFAULT_VERBOSITY))
 code_coverage_v_lcov_cap_0 = @echo "  LCOV   --capture" $(CODE_COVERAGE_OUTPUT_FILE);
 code_coverage_v_lcov_ign = $(code_coverage_v_lcov_ign_$(V))
 code_coverage_v_lcov_ign_ = $(code_coverage_v_lcov_ign_$(AM_DEFAULT_VERBOSITY))
-code_coverage_v_lcov_ign_0 = @echo "  LCOV   --remove /tmp/*" $(CODE_COVERAGE_IGNORE_PATTERN);
+code_coverage_v_lcov_ign_0 = @echo "  LCOV   --remove" "$(CODE_COVERAGE_OUTPUT_FILE).tmp" $(CODE_COVERAGE_IGNORE_PATTERN);
 code_coverage_v_genhtml = $(code_coverage_v_genhtml_$(V))
 code_coverage_v_genhtml_ = $(code_coverage_v_genhtml_$(AM_DEFAULT_VERBOSITY))
 code_coverage_v_genhtml_0 = @echo "  GEN   " "$(CODE_COVERAGE_OUTPUT_DIRECTORY)";
@@ -85,7 +85,7 @@ check-code-coverage:
 # Capture code coverage data
 code-coverage-capture: code-coverage-capture-hook
 	$(code_coverage_v_lcov_cap)$(LCOV) $(code_coverage_quiet) $(addprefix --directory ,$(CODE_COVERAGE_DIRECTORY)) --capture --output-file "$(CODE_COVERAGE_OUTPUT_FILE).tmp" --test-name "$(call code_coverage_sanitize,$(PACKAGE_NAME)-$(PACKAGE_VERSION))" --no-checksum --compat-libtool $(CODE_COVERAGE_LCOV_SHOPTS) $(CODE_COVERAGE_LCOV_OPTIONS)
-	$(code_coverage_v_lcov_ign)$(LCOV) $(code_coverage_quiet) $(addprefix --directory ,$(CODE_COVERAGE_DIRECTORY)) --remove "$(CODE_COVERAGE_OUTPUT_FILE).tmp" "/tmp/*" $(CODE_COVERAGE_IGNORE_PATTERN) --output-file "$(CODE_COVERAGE_OUTPUT_FILE)" $(CODE_COVERAGE_LCOV_SHOPTS) $(CODE_COVERAGE_LCOV_RMOPTS)
+	$(code_coverage_v_lcov_ign)$(LCOV) $(code_coverage_quiet) $(addprefix --directory ,$(CODE_COVERAGE_DIRECTORY)) --remove "$(CODE_COVERAGE_OUTPUT_FILE).tmp" $(CODE_COVERAGE_IGNORE_PATTERN) --output-file "$(CODE_COVERAGE_OUTPUT_FILE)" $(CODE_COVERAGE_LCOV_SHOPTS) $(CODE_COVERAGE_LCOV_RMOPTS)
 	-@rm -f "$(CODE_COVERAGE_OUTPUT_FILE).tmp"
 	$(code_coverage_v_genhtml)LANG=C $(GENHTML) $(code_coverage_quiet) $(addprefix --prefix ,$(CODE_COVERAGE_DIRECTORY)) --output-directory "$(CODE_COVERAGE_OUTPUT_DIRECTORY)" --title "$(PACKAGE_NAME)-$(PACKAGE_VERSION) Code Coverage" --legend --show-details "$(CODE_COVERAGE_OUTPUT_FILE)" $(CODE_COVERAGE_GENHTML_OPTIONS)
 	@echo "file://$(abs_builddir)/$(CODE_COVERAGE_OUTPUT_DIRECTORY)/index.html"
diff --git a/deps/cares/configure b/deps/cares/configure
index 76b0ddf39c136a..d02f117d2f0b64 100755
--- a/deps/cares/configure
+++ b/deps/cares/configure
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.71 for c-ares 1.34.3.
+# Generated by GNU Autoconf 2.71 for c-ares 1.34.4.
 #
 # Report bugs to <c-ares mailing list: http://lists.haxx.se/listinfo/c-ares>.
 #
@@ -621,8 +621,8 @@ MAKEFLAGS=
 # Identity of this package.
 PACKAGE_NAME='c-ares'
 PACKAGE_TARNAME='c-ares'
-PACKAGE_VERSION='1.34.3'
-PACKAGE_STRING='c-ares 1.34.3'
+PACKAGE_VERSION='1.34.4'
+PACKAGE_STRING='c-ares 1.34.4'
 PACKAGE_BUGREPORT='c-ares mailing list: http://lists.haxx.se/listinfo/c-ares'
 PACKAGE_URL=''
 
@@ -853,6 +853,7 @@ with_gcov
 enable_code_coverage
 enable_largefile
 enable_libgcc
+enable_tests_crossbuild
 '
       ac_precious_vars='build_alias
 host_alias
@@ -1423,7 +1424,7 @@ if test "$ac_init_help" = "long"; then
   # Omit some internal or obsolete options to make the list less imposing.
   # This message is too long to be a string in the A/UX 3.1 sh.
   cat <<_ACEOF
-\`configure' configures c-ares 1.34.3 to adapt to many kinds of systems.
+\`configure' configures c-ares 1.34.4 to adapt to many kinds of systems.
 
 Usage: $0 [OPTION]... [VAR=VALUE]...
 
@@ -1494,7 +1495,7 @@ fi
 
 if test -n "$ac_init_help"; then
   case $ac_init_help in
-     short | recursive ) echo "Configuration of c-ares 1.34.3:";;
+     short | recursive ) echo "Configuration of c-ares 1.34.4:";;
    esac
   cat <<\_ACEOF
 
@@ -1525,6 +1526,8 @@ Optional Features:
   --enable-code-coverage  Whether to enable code coverage support
   --disable-largefile     omit support for large files
   --enable-libgcc         use libgcc when linking
+  --enable-tests-crossbuild
+                          Enable test building even when cross building
 
 Optional Packages:
   --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
@@ -1634,7 +1637,7 @@ fi
 test -n "$ac_init_help" && exit $ac_status
 if $ac_init_version; then
   cat <<\_ACEOF
-c-ares configure 1.34.3
+c-ares configure 1.34.4
 generated by GNU Autoconf 2.71
 
 Copyright (C) 2021 Free Software Foundation, Inc.
@@ -2258,7 +2261,7 @@ cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
 
-It was created by c-ares $as_me 1.34.3, which was
+It was created by c-ares $as_me 1.34.4, which was
 generated by GNU Autoconf 2.71.  Invocation command line was
 
   $ $0$ac_configure_args_raw
@@ -3232,7 +3235,7 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 
 
 
-CARES_VERSION_INFO="21:2:19"
+CARES_VERSION_INFO="21:3:19"
 
 
 
@@ -4891,7 +4894,17 @@ else $as_nop
 // MSVC always sets __cplusplus to 199711L in older versions; newer versions
 // only set it correctly if /Zc:__cplusplus is specified as well as a
 // /std:c++NN switch:
+//
 // https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/
+//
+// The value __cplusplus ought to have is available in _MSVC_LANG since
+// Visual Studio 2015 Update 3:
+//
+// https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros
+//
+// This was also the first MSVC version to support C++14 so we can't use the
+// value of either __cplusplus or _MSVC_LANG to quickly rule out MSVC having
+// C++11 or C++14 support, but we can check _MSVC_LANG for C++17 and later.
 #elif __cplusplus < 201103L && !defined _MSC_VER
 
 #error "This is not a C++11 compiler"
@@ -5914,7 +5927,7 @@ fi
 
 # Define the identity of the package.
  PACKAGE='c-ares'
- VERSION='1.34.3'
+ VERSION='1.34.4'
 
 
 printf "%s\n" "#define PACKAGE \"$PACKAGE\"" >>confdefs.h
@@ -19525,10 +19538,52 @@ then :
 
 fi
 
+	{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for _gcov_init in -lgcov" >&5
+printf %s "checking for _gcov_init in -lgcov... " >&6; }
+if test ${ac_cv_lib_gcov__gcov_init+y}
+then :
+  printf %s "(cached) " >&6
+else $as_nop
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lgcov  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+char _gcov_init ();
+int
+main (void)
+{
+return _gcov_init ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"
+then :
+  ac_cv_lib_gcov__gcov_init=yes
+else $as_nop
+  ac_cv_lib_gcov__gcov_init=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.beam \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_gcov__gcov_init" >&5
+printf "%s\n" "$ac_cv_lib_gcov__gcov_init" >&6; }
+if test "x$ac_cv_lib_gcov__gcov_init" = xyes
+then :
+  CODE_COVERAGE_LIBS="-lgcov"
+else $as_nop
+  CODE_COVERAGE_LIBS=""
+fi
+
+
 			CODE_COVERAGE_CPPFLAGS="-DNDEBUG"
 	CODE_COVERAGE_CFLAGS="-O0 -g -fprofile-arcs -ftest-coverage"
 	CODE_COVERAGE_CXXFLAGS="-O0 -g -fprofile-arcs -ftest-coverage"
-	CODE_COVERAGE_LIBS="-lgcov"
 
 
 
@@ -19805,27 +19860,37 @@ eval ac_res=\$$as_CACHEVAR
 printf "%s\n" "$ac_res" >&6; }
 if eval test \"x\$"$as_CACHEVAR"\" = x"yes"
 then :
-  if test ${LDFLAGS+y}
+
+if test ${LDFLAGS+y}
 then :
-  case " $LDFLAGS " in
-    *" $flag "*)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : LDFLAGS already contains \$flag"; } >&5
+
+  case " $LDFLAGS " in #(
+  *" $flag "*) :
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : LDFLAGS already contains \$flag"; } >&5
   (: LDFLAGS already contains $flag) 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }
-      ;;
-    *)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : LDFLAGS=\"\$LDFLAGS \$flag\""; } >&5
-  (: LDFLAGS="$LDFLAGS $flag") 2>&5
+  test $ac_status = 0; } ;; #(
+  *) :
+
+     as_fn_append LDFLAGS " $flag"
+     { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : LDFLAGS=\"\$LDFLAGS\""; } >&5
+  (: LDFLAGS="$LDFLAGS") 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }
-      LDFLAGS="$LDFLAGS $flag"
-      ;;
-   esac
+     ;;
+esac
+
 else $as_nop
-  LDFLAGS="$flag"
+
+  LDFLAGS=$flag
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : LDFLAGS=\"\$LDFLAGS\""; } >&5
+  (: LDFLAGS="$LDFLAGS") 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }
+
 fi
 
 else $as_nop
@@ -19870,27 +19935,37 @@ if test "x$enable_shared" = "xno" -a "x$enable_static" = "xyes" ; then
   { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether we need CARES_STATICLIB definition" >&5
 printf %s "checking whether we need CARES_STATICLIB definition... " >&6; }
   if test "$ac_cv_native_windows" = "yes" ; then
-    if test ${AM_CPPFLAGS+y}
+
+if test ${AM_CPPFLAGS+y}
 then :
-  case " $AM_CPPFLAGS " in
-    *" -DCARES_STATICLIB "*)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CPPFLAGS already contains -DCARES_STATICLIB"; } >&5
+
+  case " $AM_CPPFLAGS " in #(
+  *" -DCARES_STATICLIB "*) :
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CPPFLAGS already contains -DCARES_STATICLIB"; } >&5
   (: AM_CPPFLAGS already contains -DCARES_STATICLIB) 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }
-      ;;
-    *)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CPPFLAGS=\"\$AM_CPPFLAGS -DCARES_STATICLIB\""; } >&5
-  (: AM_CPPFLAGS="$AM_CPPFLAGS -DCARES_STATICLIB") 2>&5
+  test $ac_status = 0; } ;; #(
+  *) :
+
+     as_fn_append AM_CPPFLAGS " -DCARES_STATICLIB"
+     { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CPPFLAGS=\"\$AM_CPPFLAGS\""; } >&5
+  (: AM_CPPFLAGS="$AM_CPPFLAGS") 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }
-      AM_CPPFLAGS="$AM_CPPFLAGS -DCARES_STATICLIB"
-      ;;
-   esac
+     ;;
+esac
+
 else $as_nop
-  AM_CPPFLAGS="-DCARES_STATICLIB"
+
+  AM_CPPFLAGS=-DCARES_STATICLIB
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CPPFLAGS=\"\$AM_CPPFLAGS\""; } >&5
+  (: AM_CPPFLAGS="$AM_CPPFLAGS") 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }
+
 fi
 
     PKGCONFIG_CFLAGS="-DCARES_STATICLIB"
@@ -19910,57 +19985,24 @@ if test "$symbol_hiding" != "no" ; then
   else
     case "$ax_cv_c_compiler_vendor" in
       clang|gnu|intel)
-        { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether C compiler accepts " >&5
-printf %s "checking whether C compiler accepts ... " >&6; }
-if test ${ax_cv_check_cflags__+y}
-then :
-  printf %s "(cached) " >&6
-else $as_nop
 
-  ax_check_save_flags=$CFLAGS
-  CFLAGS="$CFLAGS  "
-  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-
-int
-main (void)
-{
-
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_compile "$LINENO"
-then :
-  ax_cv_check_cflags__=yes
-else $as_nop
-  ax_cv_check_cflags__=no
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
-  CFLAGS=$ax_check_save_flags
-fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_check_cflags__" >&5
-printf "%s\n" "$ax_cv_check_cflags__" >&6; }
-if test x"$ax_cv_check_cflags__" = xyes
-then :
-  :
-else $as_nop
-  :
-fi
 
 
 
 for flag in -fvisibility=hidden; do
   as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags__$flag" | $as_tr_sh`
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether C compiler accepts $flag" >&5
-printf %s "checking whether C compiler accepts $flag... " >&6; }
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the C compiler accepts $flag" >&5
+printf %s "checking whether the C compiler accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
 else $as_nop
 
   ax_check_save_flags=$CFLAGS
-  CFLAGS="$CFLAGS  $flag"
+  if test x"$GCC" = xyes ; then
+    add_gnu_werror="-Werror"
+  fi
+  CFLAGS="$CFLAGS  $flag $add_gnu_werror"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
@@ -19984,29 +20026,39 @@ fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
 printf "%s\n" "$ac_res" >&6; }
-if test x"`eval 'as_val=${'$as_CACHEVAR'};printf "%s\n" "$as_val"'`" = xyes
+if eval test \"x\$"$as_CACHEVAR"\" = x"yes"
 then :
-  if test ${CARES_SYMBOL_HIDING_CFLAG+y}
+
+if test ${CARES_SYMBOL_HIDING_CFLAG+y}
 then :
-  case " $CARES_SYMBOL_HIDING_CFLAG " in
-    *" $flag "*)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : CARES_SYMBOL_HIDING_CFLAG already contains \$flag"; } >&5
+
+  case " $CARES_SYMBOL_HIDING_CFLAG " in #(
+  *" $flag "*) :
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : CARES_SYMBOL_HIDING_CFLAG already contains \$flag"; } >&5
   (: CARES_SYMBOL_HIDING_CFLAG already contains $flag) 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }
-      ;;
-    *)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : CARES_SYMBOL_HIDING_CFLAG=\"\$CARES_SYMBOL_HIDING_CFLAG \$flag\""; } >&5
-  (: CARES_SYMBOL_HIDING_CFLAG="$CARES_SYMBOL_HIDING_CFLAG $flag") 2>&5
+  test $ac_status = 0; } ;; #(
+  *) :
+
+     as_fn_append CARES_SYMBOL_HIDING_CFLAG " $flag"
+     { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : CARES_SYMBOL_HIDING_CFLAG=\"\$CARES_SYMBOL_HIDING_CFLAG\""; } >&5
+  (: CARES_SYMBOL_HIDING_CFLAG="$CARES_SYMBOL_HIDING_CFLAG") 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }
-      CARES_SYMBOL_HIDING_CFLAG="$CARES_SYMBOL_HIDING_CFLAG $flag"
-      ;;
-   esac
+     ;;
+esac
+
 else $as_nop
-  CARES_SYMBOL_HIDING_CFLAG="$flag"
+
+  CARES_SYMBOL_HIDING_CFLAG=$flag
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : CARES_SYMBOL_HIDING_CFLAG=\"\$CARES_SYMBOL_HIDING_CFLAG\""; } >&5
+  (: CARES_SYMBOL_HIDING_CFLAG="$CARES_SYMBOL_HIDING_CFLAG") 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }
+
 fi
 
 else $as_nop
@@ -20022,17 +20074,22 @@ done
       sun)
 
 
+
+
 for flag in -xldscope=hidden; do
   as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags__$flag" | $as_tr_sh`
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether C compiler accepts $flag" >&5
-printf %s "checking whether C compiler accepts $flag... " >&6; }
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the C compiler accepts $flag" >&5
+printf %s "checking whether the C compiler accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
 else $as_nop
 
   ax_check_save_flags=$CFLAGS
-  CFLAGS="$CFLAGS  $flag"
+  if test x"$GCC" = xyes ; then
+    add_gnu_werror="-Werror"
+  fi
+  CFLAGS="$CFLAGS  $flag $add_gnu_werror"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
@@ -20056,29 +20113,39 @@ fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
 printf "%s\n" "$ac_res" >&6; }
-if test x"`eval 'as_val=${'$as_CACHEVAR'};printf "%s\n" "$as_val"'`" = xyes
+if eval test \"x\$"$as_CACHEVAR"\" = x"yes"
 then :
-  if test ${CARES_SYMBOL_HIDING_CFLAG+y}
+
+if test ${CARES_SYMBOL_HIDING_CFLAG+y}
 then :
-  case " $CARES_SYMBOL_HIDING_CFLAG " in
-    *" $flag "*)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : CARES_SYMBOL_HIDING_CFLAG already contains \$flag"; } >&5
+
+  case " $CARES_SYMBOL_HIDING_CFLAG " in #(
+  *" $flag "*) :
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : CARES_SYMBOL_HIDING_CFLAG already contains \$flag"; } >&5
   (: CARES_SYMBOL_HIDING_CFLAG already contains $flag) 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }
-      ;;
-    *)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : CARES_SYMBOL_HIDING_CFLAG=\"\$CARES_SYMBOL_HIDING_CFLAG \$flag\""; } >&5
-  (: CARES_SYMBOL_HIDING_CFLAG="$CARES_SYMBOL_HIDING_CFLAG $flag") 2>&5
+  test $ac_status = 0; } ;; #(
+  *) :
+
+     as_fn_append CARES_SYMBOL_HIDING_CFLAG " $flag"
+     { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : CARES_SYMBOL_HIDING_CFLAG=\"\$CARES_SYMBOL_HIDING_CFLAG\""; } >&5
+  (: CARES_SYMBOL_HIDING_CFLAG="$CARES_SYMBOL_HIDING_CFLAG") 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }
-      CARES_SYMBOL_HIDING_CFLAG="$CARES_SYMBOL_HIDING_CFLAG $flag"
-      ;;
-   esac
+     ;;
+esac
+
 else $as_nop
-  CARES_SYMBOL_HIDING_CFLAG="$flag"
+
+  CARES_SYMBOL_HIDING_CFLAG=$flag
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : CARES_SYMBOL_HIDING_CFLAG=\"\$CARES_SYMBOL_HIDING_CFLAG\""; } >&5
+  (: CARES_SYMBOL_HIDING_CFLAG="$CARES_SYMBOL_HIDING_CFLAG") 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }
+
 fi
 
 else $as_nop
@@ -20120,17 +20187,22 @@ fi
 if test "$enable_warnings" = "yes"; then
 
 
+
+
 for flag in -Wall -Wextra -Waggregate-return -Wcast-align -Wcast-qual -Wconversion -Wdeclaration-after-statement -Wdouble-promotion -Wfloat-equal -Wformat-security -Winit-self -Wjump-misses-init -Wlogical-op -Wmissing-braces -Wmissing-declarations -Wmissing-format-attribute -Wmissing-include-dirs -Wmissing-prototypes -Wnested-externs -Wno-coverage-mismatch -Wold-style-definition -Wpacked -Wpedantic -Wpointer-arith -Wredundant-decls -Wshadow -Wsign-conversion -Wstrict-overflow -Wstrict-prototypes -Wtrampolines -Wundef -Wunreachable-code -Wunused -Wvariadic-macros -Wvla -Wwrite-strings -Werror=implicit-int -Werror=implicit-function-declaration -Werror=partial-availability -Wno-long-long ; do
   as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags_-Werror_$flag" | $as_tr_sh`
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether C compiler accepts $flag" >&5
-printf %s "checking whether C compiler accepts $flag... " >&6; }
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the C compiler accepts $flag" >&5
+printf %s "checking whether the C compiler accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
 else $as_nop
 
   ax_check_save_flags=$CFLAGS
-  CFLAGS="$CFLAGS -Werror $flag"
+  if test x"$GCC" = xyes ; then
+    add_gnu_werror="-Werror"
+  fi
+  CFLAGS="$CFLAGS -Werror $flag $add_gnu_werror"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
@@ -20154,29 +20226,39 @@ fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
 printf "%s\n" "$ac_res" >&6; }
-if test x"`eval 'as_val=${'$as_CACHEVAR'};printf "%s\n" "$as_val"'`" = xyes
+if eval test \"x\$"$as_CACHEVAR"\" = x"yes"
 then :
-  if test ${AM_CFLAGS+y}
+
+if test ${AM_CFLAGS+y}
 then :
-  case " $AM_CFLAGS " in
-    *" $flag "*)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS already contains \$flag"; } >&5
+
+  case " $AM_CFLAGS " in #(
+  *" $flag "*) :
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS already contains \$flag"; } >&5
   (: AM_CFLAGS already contains $flag) 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }
-      ;;
-    *)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS=\"\$AM_CFLAGS \$flag\""; } >&5
-  (: AM_CFLAGS="$AM_CFLAGS $flag") 2>&5
+  test $ac_status = 0; } ;; #(
+  *) :
+
+     as_fn_append AM_CFLAGS " $flag"
+     { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS=\"\$AM_CFLAGS\""; } >&5
+  (: AM_CFLAGS="$AM_CFLAGS") 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }
-      AM_CFLAGS="$AM_CFLAGS $flag"
-      ;;
-   esac
+     ;;
+esac
+
 else $as_nop
-  AM_CFLAGS="$flag"
+
+  AM_CFLAGS=$flag
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS=\"\$AM_CFLAGS\""; } >&5
+  (: AM_CFLAGS="$AM_CFLAGS") 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }
+
 fi
 
 else $as_nop
@@ -20185,22 +20267,28 @@ fi
 
 done
 
+fi
+
+case $host_os in
+  *qnx*|*android*)
+
 
-    case $host_os in
-    *android*)
 
 
 for flag in -std=c99; do
   as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags_-Werror_$flag" | $as_tr_sh`
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether C compiler accepts $flag" >&5
-printf %s "checking whether C compiler accepts $flag... " >&6; }
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the C compiler accepts $flag" >&5
+printf %s "checking whether the C compiler accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
 else $as_nop
 
   ax_check_save_flags=$CFLAGS
-  CFLAGS="$CFLAGS -Werror $flag"
+  if test x"$GCC" = xyes ; then
+    add_gnu_werror="-Werror"
+  fi
+  CFLAGS="$CFLAGS -Werror $flag $add_gnu_werror"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
@@ -20224,29 +20312,39 @@ fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
 printf "%s\n" "$ac_res" >&6; }
-if test x"`eval 'as_val=${'$as_CACHEVAR'};printf "%s\n" "$as_val"'`" = xyes
+if eval test \"x\$"$as_CACHEVAR"\" = x"yes"
 then :
-  if test ${AM_CFLAGS+y}
+
+if test ${AM_CFLAGS+y}
 then :
-  case " $AM_CFLAGS " in
-    *" $flag "*)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS already contains \$flag"; } >&5
+
+  case " $AM_CFLAGS " in #(
+  *" $flag "*) :
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS already contains \$flag"; } >&5
   (: AM_CFLAGS already contains $flag) 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }
-      ;;
-    *)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS=\"\$AM_CFLAGS \$flag\""; } >&5
-  (: AM_CFLAGS="$AM_CFLAGS $flag") 2>&5
+  test $ac_status = 0; } ;; #(
+  *) :
+
+     as_fn_append AM_CFLAGS " $flag"
+     { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS=\"\$AM_CFLAGS\""; } >&5
+  (: AM_CFLAGS="$AM_CFLAGS") 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }
-      AM_CFLAGS="$AM_CFLAGS $flag"
-      ;;
-   esac
+     ;;
+esac
+
 else $as_nop
-  AM_CFLAGS="$flag"
+
+  AM_CFLAGS=$flag
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS=\"\$AM_CFLAGS\""; } >&5
+  (: AM_CFLAGS="$AM_CFLAGS") 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }
+
 fi
 
 else $as_nop
@@ -20255,21 +20353,26 @@ fi
 
 done
 
-      ;;
-    *)
+    ;;
+  *)
+
+
 
 
 for flag in -std=c90; do
   as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags_-Werror_$flag" | $as_tr_sh`
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether C compiler accepts $flag" >&5
-printf %s "checking whether C compiler accepts $flag... " >&6; }
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the C compiler accepts $flag" >&5
+printf %s "checking whether the C compiler accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
 else $as_nop
 
   ax_check_save_flags=$CFLAGS
-  CFLAGS="$CFLAGS -Werror $flag"
+  if test x"$GCC" = xyes ; then
+    add_gnu_werror="-Werror"
+  fi
+  CFLAGS="$CFLAGS -Werror $flag $add_gnu_werror"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
@@ -20293,29 +20396,39 @@ fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
 printf "%s\n" "$ac_res" >&6; }
-if test x"`eval 'as_val=${'$as_CACHEVAR'};printf "%s\n" "$as_val"'`" = xyes
+if eval test \"x\$"$as_CACHEVAR"\" = x"yes"
 then :
-  if test ${AM_CFLAGS+y}
+
+if test ${AM_CFLAGS+y}
 then :
-  case " $AM_CFLAGS " in
-    *" $flag "*)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS already contains \$flag"; } >&5
+
+  case " $AM_CFLAGS " in #(
+  *" $flag "*) :
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS already contains \$flag"; } >&5
   (: AM_CFLAGS already contains $flag) 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }
-      ;;
-    *)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS=\"\$AM_CFLAGS \$flag\""; } >&5
-  (: AM_CFLAGS="$AM_CFLAGS $flag") 2>&5
+  test $ac_status = 0; } ;; #(
+  *) :
+
+     as_fn_append AM_CFLAGS " $flag"
+     { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS=\"\$AM_CFLAGS\""; } >&5
+  (: AM_CFLAGS="$AM_CFLAGS") 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }
-      AM_CFLAGS="$AM_CFLAGS $flag"
-      ;;
-   esac
+     ;;
+esac
+
 else $as_nop
-  AM_CFLAGS="$flag"
+
+  AM_CFLAGS=$flag
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS=\"\$AM_CFLAGS\""; } >&5
+  (: AM_CFLAGS="$AM_CFLAGS") 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }
+
 fi
 
 else $as_nop
@@ -20324,24 +20437,115 @@ fi
 
 done
 
-      ;;
-  esac
+    ;;
+esac
+
+case $host_os in
+  *qnx*)
+
+
+
+
+for flag in -D_QNX_SOURCE; do
+  as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags_-Werror_$flag" | $as_tr_sh`
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the C compiler accepts $flag" >&5
+printf %s "checking whether the C compiler accepts $flag... " >&6; }
+if eval test \${$as_CACHEVAR+y}
+then :
+  printf %s "(cached) " >&6
+else $as_nop
+
+  ax_check_save_flags=$CFLAGS
+  if test x"$GCC" = xyes ; then
+    add_gnu_werror="-Werror"
+  fi
+  CFLAGS="$CFLAGS -Werror $flag $add_gnu_werror"
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main (void)
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"
+then :
+  eval "$as_CACHEVAR=yes"
+else $as_nop
+  eval "$as_CACHEVAR=no"
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext
+  CFLAGS=$ax_check_save_flags
+fi
+eval ac_res=\$$as_CACHEVAR
+	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
+printf "%s\n" "$ac_res" >&6; }
+if eval test \"x\$"$as_CACHEVAR"\" = x"yes"
+then :
+
+if test ${AM_CPPFLAGS+y}
+then :
+
+  case " $AM_CPPFLAGS " in #(
+  *" $flag "*) :
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CPPFLAGS already contains \$flag"; } >&5
+  (: AM_CPPFLAGS already contains $flag) 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; } ;; #(
+  *) :
+
+     as_fn_append AM_CPPFLAGS " $flag"
+     { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CPPFLAGS=\"\$AM_CPPFLAGS\""; } >&5
+  (: AM_CPPFLAGS="$AM_CPPFLAGS") 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }
+     ;;
+esac
+
+else $as_nop
+
+  AM_CPPFLAGS=$flag
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CPPFLAGS=\"\$AM_CPPFLAGS\""; } >&5
+  (: AM_CPPFLAGS="$AM_CPPFLAGS") 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }
+
+fi
+
+else $as_nop
+  :
 fi
 
+done
+
+  ;;
+esac
+
 if test "$ax_cv_c_compiler_vendor" = "intel"; then
 
 
+
+
 for flag in -shared-intel; do
   as_CACHEVAR=`printf "%s\n" "ax_cv_check_cflags__$flag" | $as_tr_sh`
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether C compiler accepts $flag" >&5
-printf %s "checking whether C compiler accepts $flag... " >&6; }
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the C compiler accepts $flag" >&5
+printf %s "checking whether the C compiler accepts $flag... " >&6; }
 if eval test \${$as_CACHEVAR+y}
 then :
   printf %s "(cached) " >&6
 else $as_nop
 
   ax_check_save_flags=$CFLAGS
-  CFLAGS="$CFLAGS  $flag"
+  if test x"$GCC" = xyes ; then
+    add_gnu_werror="-Werror"
+  fi
+  CFLAGS="$CFLAGS  $flag $add_gnu_werror"
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 
@@ -20365,29 +20569,39 @@ fi
 eval ac_res=\$$as_CACHEVAR
 	       { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
 printf "%s\n" "$ac_res" >&6; }
-if test x"`eval 'as_val=${'$as_CACHEVAR'};printf "%s\n" "$as_val"'`" = xyes
+if eval test \"x\$"$as_CACHEVAR"\" = x"yes"
 then :
-  if test ${AM_CFLAGS+y}
+
+if test ${AM_CFLAGS+y}
 then :
-  case " $AM_CFLAGS " in
-    *" $flag "*)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS already contains \$flag"; } >&5
+
+  case " $AM_CFLAGS " in #(
+  *" $flag "*) :
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS already contains \$flag"; } >&5
   (: AM_CFLAGS already contains $flag) 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }
-      ;;
-    *)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS=\"\$AM_CFLAGS \$flag\""; } >&5
-  (: AM_CFLAGS="$AM_CFLAGS $flag") 2>&5
+  test $ac_status = 0; } ;; #(
+  *) :
+
+     as_fn_append AM_CFLAGS " $flag"
+     { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS=\"\$AM_CFLAGS\""; } >&5
+  (: AM_CFLAGS="$AM_CFLAGS") 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }
-      AM_CFLAGS="$AM_CFLAGS $flag"
-      ;;
-   esac
+     ;;
+esac
+
 else $as_nop
-  AM_CFLAGS="$flag"
+
+  AM_CFLAGS=$flag
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : AM_CFLAGS=\"\$AM_CFLAGS\""; } >&5
+  (: AM_CFLAGS="$AM_CFLAGS") 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }
+
 fi
 
 else $as_nop
@@ -20708,27 +20922,37 @@ eval ac_res=\$$as_CACHEVAR
 printf "%s\n" "$ac_res" >&6; }
 if eval test \"x\$"$as_CACHEVAR"\" = x"yes"
 then :
-  if test ${XNET_LIBS+y}
+
+if test ${XNET_LIBS+y}
 then :
-  case " $XNET_LIBS " in
-    *" $flag "*)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : XNET_LIBS already contains \$flag"; } >&5
+
+  case " $XNET_LIBS " in #(
+  *" $flag "*) :
+    { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : XNET_LIBS already contains \$flag"; } >&5
   (: XNET_LIBS already contains $flag) 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }
-      ;;
-    *)
-      { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : XNET_LIBS=\"\$XNET_LIBS \$flag\""; } >&5
-  (: XNET_LIBS="$XNET_LIBS $flag") 2>&5
+  test $ac_status = 0; } ;; #(
+  *) :
+
+     as_fn_append XNET_LIBS " $flag"
+     { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : XNET_LIBS=\"\$XNET_LIBS\""; } >&5
+  (: XNET_LIBS="$XNET_LIBS") 2>&5
   ac_status=$?
   printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }
-      XNET_LIBS="$XNET_LIBS $flag"
-      ;;
-   esac
+     ;;
+esac
+
 else $as_nop
-  XNET_LIBS="$flag"
+
+  XNET_LIBS=$flag
+  { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: : XNET_LIBS=\"\$XNET_LIBS\""; } >&5
+  (: XNET_LIBS="$XNET_LIBS") 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }
+
 fi
 
 else $as_nop
@@ -22131,6 +22355,14 @@ fi
 
 
 
+ac_fn_check_decl "$LINENO" "strnlen" "ac_cv_have_decl_strnlen" "$cares_all_includes
+" "$ac_c_undeclared_builtin_options" "CFLAGS"
+if test "x$ac_cv_have_decl_strnlen" = xyes
+then :
+
+printf "%s\n" "#define HAVE_STRNLEN 1" >>confdefs.h
+
+fi
 ac_fn_check_decl "$LINENO" "memmem" "ac_cv_have_decl_memmem" "$cares_all_includes
 " "$ac_c_undeclared_builtin_options" "CFLAGS"
 if test "x$ac_cv_have_decl_memmem" = xyes
@@ -23708,6 +23940,15 @@ printf "%s\n" "$as_me: WARNING: cannot build tests when cross compiling" >&2;}
     as_fn_error $? "*** Tests not supported when cross compiling" "$LINENO" 5
   fi
 fi
+
+# Check whether --enable-tests-crossbuild was given.
+if test ${enable_tests_crossbuild+y}
+then :
+  enableval=$enable_tests_crossbuild; build_tests="$enableval"
+
+fi
+
+
 if test "x$build_tests" != "xno" ; then
 
 
@@ -23993,7 +24234,7 @@ fi
     if test "x$have_gmock_v112" = "xyes" ; then
        { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether user namespaces are supported" >&5
 printf %s "checking whether user namespaces are supported... " >&6; }
-if test ${ax_cv_user_namespace+y}
+if test ${ares_cv_user_namespace+y}
 then :
   printf %s "(cached) " >&6
 else $as_nop
@@ -24006,7 +24247,7 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 
   if test "$cross_compiling" = yes
 then :
-  ax_cv_user_namespace=no
+  ares_cv_user_namespace=no
 else $as_nop
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -24046,9 +24287,9 @@ int main() {
 _ACEOF
 if ac_fn_c_try_run "$LINENO"
 then :
-  ax_cv_user_namespace=yes
+  ares_cv_user_namespace=yes
 else $as_nop
-  ax_cv_user_namespace=no
+  ares_cv_user_namespace=no
 fi
 rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
   conftest.$ac_objext conftest.beam conftest.$ac_ext
@@ -24062,9 +24303,9 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 
 
 fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_user_namespace" >&5
-printf "%s\n" "$ax_cv_user_namespace" >&6; }
- if test "$ax_cv_user_namespace" = yes; then
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ares_cv_user_namespace" >&5
+printf "%s\n" "$ares_cv_user_namespace" >&6; }
+ if test "$ares_cv_user_namespace" = yes; then
 
 printf "%s\n" "#define HAVE_USER_NAMESPACE 1" >>confdefs.h
 
@@ -24072,7 +24313,7 @@ printf "%s\n" "#define HAVE_USER_NAMESPACE 1" >>confdefs.h
 
        { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether UTS namespaces are supported" >&5
 printf %s "checking whether UTS namespaces are supported... " >&6; }
-if test ${ax_cv_uts_namespace+y}
+if test ${ares_cv_uts_namespace+y}
 then :
   printf %s "(cached) " >&6
 else $as_nop
@@ -24085,7 +24326,7 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 
   if test "$cross_compiling" = yes
 then :
-  ax_cv_uts_namespace=no
+  ares_cv_uts_namespace=no
 else $as_nop
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
@@ -24145,9 +24386,9 @@ int main() {
 _ACEOF
 if ac_fn_c_try_run "$LINENO"
 then :
-  ax_cv_uts_namespace=yes
+  ares_cv_uts_namespace=yes
 else $as_nop
-  ax_cv_uts_namespace=no
+  ares_cv_uts_namespace=no
 fi
 rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
   conftest.$ac_objext conftest.beam conftest.$ac_ext
@@ -24161,9 +24402,9 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 
 
 fi
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_uts_namespace" >&5
-printf "%s\n" "$ax_cv_uts_namespace" >&6; }
- if test "$ax_cv_uts_namespace" = yes; then
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ares_cv_uts_namespace" >&5
+printf "%s\n" "$ares_cv_uts_namespace" >&6; }
+ if test "$ares_cv_uts_namespace" = yes; then
 
 printf "%s\n" "#define HAVE_UTS_NAMESPACE 1" >>confdefs.h
 
@@ -24218,7 +24459,17 @@ else $as_nop
 // MSVC always sets __cplusplus to 199711L in older versions; newer versions
 // only set it correctly if /Zc:__cplusplus is specified as well as a
 // /std:c++NN switch:
+//
 // https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/
+//
+// The value __cplusplus ought to have is available in _MSVC_LANG since
+// Visual Studio 2015 Update 3:
+//
+// https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros
+//
+// This was also the first MSVC version to support C++14 so we can't use the
+// value of either __cplusplus or _MSVC_LANG to quickly rule out MSVC having
+// C++11 or C++14 support, but we can check _MSVC_LANG for C++17 and later.
 #elif __cplusplus < 201103L && !defined _MSC_VER
 
 #error "This is not a C++11 compiler"
@@ -26007,7 +26258,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
 # report actual input values of CONFIG_FILES etc. instead of their
 # values after options handling.
 ac_log="
-This file was extended by c-ares $as_me 1.34.3, which was
+This file was extended by c-ares $as_me 1.34.4, which was
 generated by GNU Autoconf 2.71.  Invocation command line was
 
   CONFIG_FILES    = $CONFIG_FILES
@@ -26075,7 +26326,7 @@ ac_cs_config_escaped=`printf "%s\n" "$ac_cs_config" | sed "s/^ //; s/'/'\\\\\\\\
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 ac_cs_config='$ac_cs_config_escaped'
 ac_cs_version="\\
-c-ares config.status 1.34.3
+c-ares config.status 1.34.4
 configured by $0, generated by GNU Autoconf 2.71,
   with options \\"\$ac_cs_config\\"
 
diff --git a/deps/cares/configure.ac b/deps/cares/configure.ac
index 5f848c28598a95..9dacf1fb2e4a40 100644
--- a/deps/cares/configure.ac
+++ b/deps/cares/configure.ac
@@ -2,10 +2,10 @@ dnl Copyright (C) The c-ares project and its contributors
 dnl SPDX-License-Identifier: MIT
 AC_PREREQ([2.69])
 
-AC_INIT([c-ares], [1.34.3],
+AC_INIT([c-ares], [1.34.4],
   [c-ares mailing list: http://lists.haxx.se/listinfo/c-ares])
 
-CARES_VERSION_INFO="21:2:19"
+CARES_VERSION_INFO="21:3:19"
 dnl This flag accepts an argument of the form current[:revision[:age]]. So,
 dnl passing -version-info 3:12:1 sets current to 3, revision to 12, and age to
 dnl 1.
@@ -245,18 +245,25 @@ AC_SUBST(CARES_SYMBOL_HIDING_CFLAG)
 if test "$enable_warnings" = "yes"; then
   AX_APPEND_COMPILE_FLAGS([-Wall -Wextra -Waggregate-return -Wcast-align -Wcast-qual -Wconversion -Wdeclaration-after-statement -Wdouble-promotion -Wfloat-equal -Wformat-security -Winit-self -Wjump-misses-init -Wlogical-op -Wmissing-braces -Wmissing-declarations -Wmissing-format-attribute -Wmissing-include-dirs -Wmissing-prototypes -Wnested-externs -Wno-coverage-mismatch -Wold-style-definition -Wpacked -Wpedantic -Wpointer-arith -Wredundant-decls -Wshadow -Wsign-conversion -Wstrict-overflow -Wstrict-prototypes -Wtrampolines -Wundef -Wunreachable-code -Wunused -Wvariadic-macros -Wvla -Wwrite-strings -Werror=implicit-int -Werror=implicit-function-declaration -Werror=partial-availability -Wno-long-long ],
     [AM_CFLAGS], [-Werror])
-
-  dnl Android requires c99, all others should use c90
-  case $host_os in
-    *android*)
-      AX_APPEND_COMPILE_FLAGS([-std=c99], [AM_CFLAGS], [-Werror])
-      ;;
-    *)
-      AX_APPEND_COMPILE_FLAGS([-std=c90], [AM_CFLAGS], [-Werror])
-      ;;
-  esac
 fi
 
+dnl Android and QNX require c99, all others should use c90
+case $host_os in
+  *qnx*|*android*)
+    AX_APPEND_COMPILE_FLAGS([-std=c99], [AM_CFLAGS], [-Werror])
+    ;;
+  *)
+    AX_APPEND_COMPILE_FLAGS([-std=c90], [AM_CFLAGS], [-Werror])
+    ;;
+esac
+
+dnl QNX needs -D_QNX_SOURCE
+case $host_os in
+  *qnx*)
+    AX_APPEND_COMPILE_FLAGS([-D_QNX_SOURCE], [AM_CPPFLAGS], [-Werror])
+  ;;
+esac
+
 if test "$ax_cv_c_compiler_vendor" = "intel"; then
   AX_APPEND_COMPILE_FLAGS([-shared-intel], [AM_CFLAGS])
 fi
@@ -543,6 +550,7 @@ dnl https://mailman.videolan.org/pipermail/vlc-devel/2015-March/101802.html
 dnl which would require we check each individually and provide function arguments
 dnl for the test.
 
+AC_CHECK_DECL(strnlen,         [AC_DEFINE([HAVE_STRNLEN],           1, [Define to 1 if you have `strnlen`]        )], [], $cares_all_includes)
 AC_CHECK_DECL(memmem,          [AC_DEFINE([HAVE_MEMMEM],            1, [Define to 1 if you have `memmem`]         )], [], $cares_all_includes)
 AC_CHECK_DECL(recv,            [AC_DEFINE([HAVE_RECV],              1, [Define to 1 if you have `recv`]           )], [], $cares_all_includes)
 AC_CHECK_DECL(recvfrom,        [AC_DEFINE([HAVE_RECVFROM],          1, [Define to 1 if you have `recvfrom`]       )], [], $cares_all_includes)
@@ -813,6 +821,13 @@ if test "x$build_tests" != "xno" -a "x$cross_compiling" = "xyes" ; then
     AC_MSG_ERROR([*** Tests not supported when cross compiling])
   fi
 fi
+
+dnl Forces compiling of tests even when cross-compiling.
+AC_ARG_ENABLE(tests-crossbuild,
+  AS_HELP_STRING([--enable-tests-crossbuild], [Enable test building even when cross building]),
+  [build_tests="$enableval"]
+)
+
 if test "x$build_tests" != "xno" ; then
   PKG_CHECK_MODULES([GMOCK], [gmock], [ have_gmock=yes ], [ have_gmock=no ])
   if test "x$have_gmock" = "xno" ; then
@@ -825,8 +840,8 @@ if test "x$build_tests" != "xno" ; then
   else
     PKG_CHECK_MODULES([GMOCK112], [gmock >= 1.12.0], [ have_gmock_v112=yes ], [ have_gmock_v112=no ])
     if test "x$have_gmock_v112" = "xyes" ; then
-      AX_CHECK_USER_NAMESPACE
-      AX_CHECK_UTS_NAMESPACE
+      ARES_CHECK_USER_NAMESPACE
+      ARES_CHECK_UTS_NAMESPACE
     fi
   fi
 fi
diff --git a/deps/cares/docs/Makefile.in b/deps/cares/docs/Makefile.in
index 6b7bb8e30d1a20..0d1873c9662c92 100644
--- a/deps/cares/docs/Makefile.in
+++ b/deps/cares/docs/Makefile.in
@@ -92,7 +92,9 @@ build_triplet = @build@
 host_triplet = @host@
 subdir = docs
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_ac_append_to_file.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ares_check_user_namespace.m4 \
+	$(top_srcdir)/m4/ares_check_uts_namespace.m4 \
+	$(top_srcdir)/m4/ax_ac_append_to_file.m4 \
 	$(top_srcdir)/m4/ax_ac_print_to_file.m4 \
 	$(top_srcdir)/m4/ax_add_am_macro_static.m4 \
 	$(top_srcdir)/m4/ax_am_macros_static.m4 \
@@ -102,8 +104,6 @@ am__aclocal_m4_deps = $(top_srcdir)/m4/ax_ac_append_to_file.m4 \
 	$(top_srcdir)/m4/ax_check_compile_flag.m4 \
 	$(top_srcdir)/m4/ax_check_gnu_make.m4 \
 	$(top_srcdir)/m4/ax_check_link_flag.m4 \
-	$(top_srcdir)/m4/ax_check_user_namespace.m4 \
-	$(top_srcdir)/m4/ax_check_uts_namespace.m4 \
 	$(top_srcdir)/m4/ax_code_coverage.m4 \
 	$(top_srcdir)/m4/ax_compiler_vendor.m4 \
 	$(top_srcdir)/m4/ax_cxx_compile_stdcxx.m4 \
diff --git a/deps/cares/docs/ares_create_query.3 b/deps/cares/docs/ares_create_query.3
index a54eec3e2a6bd1..3af6ba4cc3dc5b 100644
--- a/deps/cares/docs/ares_create_query.3
+++ b/deps/cares/docs/ares_create_query.3
@@ -19,6 +19,9 @@ int ares_create_query(const char *\fIname\fP,
                       int \fImax_udp_size\fP)
 .fi
 .SH DESCRIPTION
+This function is deprecated as of c-ares 1.22, please use
+\fIares_dns_record_create(3)\fP instead.
+
 The \fIares_create_query(3)\fP function composes a DNS query with a single
 question.  The parameter \fIname\fP gives the query name as a NUL-terminated C
 string of period-separated labels optionally ending with a period; periods and
diff --git a/deps/cares/docs/ares_mkquery.3 b/deps/cares/docs/ares_mkquery.3
index 0e7b5edbb89353..2f42d169210fef 100644
--- a/deps/cares/docs/ares_mkquery.3
+++ b/deps/cares/docs/ares_mkquery.3
@@ -14,7 +14,8 @@ int ares_mkquery(const char *\fIname\fP, int \fIdnsclass\fP, int \fItype\fP,
                  int *\fIbuflen\fP)
 .fi
 .SH DESCRIPTION
-Deprecated function. See \fIares_create_query(3)\fP instead!
+This function is deprecated as of c-ares 1.10, please use
+\fIares_dns_record_create(3)\fP instead.
 
 The
 .B ares_mkquery
diff --git a/deps/cares/docs/ares_send.3 b/deps/cares/docs/ares_send.3
index f6ea9140e2510c..df3e3bbe4136b0 100644
--- a/deps/cares/docs/ares_send.3
+++ b/deps/cares/docs/ares_send.3
@@ -113,6 +113,9 @@ is being destroyed; the query will not be completed.
 .B ARES_ENOSERVER
 The query will not be completed because no DNS servers were configured on the
 channel.
+.TP 19
+.B ARES_EBADQUERY
+Misformatted DNS query.
 .PP
 
 The callback argument
diff --git a/deps/cares/include/Makefile.in b/deps/cares/include/Makefile.in
index 0beee44a22bb22..7dc40eb08fab9c 100644
--- a/deps/cares/include/Makefile.in
+++ b/deps/cares/include/Makefile.in
@@ -90,7 +90,9 @@ build_triplet = @build@
 host_triplet = @host@
 subdir = include
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_ac_append_to_file.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ares_check_user_namespace.m4 \
+	$(top_srcdir)/m4/ares_check_uts_namespace.m4 \
+	$(top_srcdir)/m4/ax_ac_append_to_file.m4 \
 	$(top_srcdir)/m4/ax_ac_print_to_file.m4 \
 	$(top_srcdir)/m4/ax_add_am_macro_static.m4 \
 	$(top_srcdir)/m4/ax_am_macros_static.m4 \
@@ -100,8 +102,6 @@ am__aclocal_m4_deps = $(top_srcdir)/m4/ax_ac_append_to_file.m4 \
 	$(top_srcdir)/m4/ax_check_compile_flag.m4 \
 	$(top_srcdir)/m4/ax_check_gnu_make.m4 \
 	$(top_srcdir)/m4/ax_check_link_flag.m4 \
-	$(top_srcdir)/m4/ax_check_user_namespace.m4 \
-	$(top_srcdir)/m4/ax_check_uts_namespace.m4 \
 	$(top_srcdir)/m4/ax_code_coverage.m4 \
 	$(top_srcdir)/m4/ax_compiler_vendor.m4 \
 	$(top_srcdir)/m4/ax_cxx_compile_stdcxx.m4 \
diff --git a/deps/cares/include/ares.h b/deps/cares/include/ares.h
index 139c6d66ee90df..7fe3ec78f4e651 100644
--- a/deps/cares/include/ares.h
+++ b/deps/cares/include/ares.h
@@ -74,7 +74,7 @@
 #if defined(_AIX) || defined(__NOVELL_LIBC__) || defined(__NetBSD__) || \
   defined(__minix) || defined(__SYMBIAN32__) || defined(__INTEGRITY) || \
   defined(ANDROID) || defined(__ANDROID__) || defined(__OpenBSD__) ||   \
-  defined(__QNXNTO__) || defined(__MVS__) || defined(__HAIKU__)
+  defined(__QNX__) || defined(__MVS__) || defined(__HAIKU__)
 #  include <sys/select.h>
 #endif
 
diff --git a/deps/cares/include/ares_version.h b/deps/cares/include/ares_version.h
index 9cb8084dd56bc9..782046bd79d844 100644
--- a/deps/cares/include/ares_version.h
+++ b/deps/cares/include/ares_version.h
@@ -32,8 +32,8 @@
 
 #define ARES_VERSION_MAJOR 1
 #define ARES_VERSION_MINOR 34
-#define ARES_VERSION_PATCH 3
-#define ARES_VERSION_STR "1.34.3"
+#define ARES_VERSION_PATCH 4
+#define ARES_VERSION_STR "1.34.4"
 
 /* NOTE: We cannot make the version string a C preprocessor stringify operation
  *       due to assumptions made by integrators that aren't properly using
diff --git a/deps/cares/m4/ax_check_user_namespace.m4 b/deps/cares/m4/ares_check_user_namespace.m4
similarity index 82%
rename from deps/cares/m4/ax_check_user_namespace.m4
rename to deps/cares/m4/ares_check_user_namespace.m4
index aca721626f2e89..a26b384fda5c54 100644
--- a/deps/cares/m4/ax_check_user_namespace.m4
+++ b/deps/cares/m4/ares_check_user_namespace.m4
@@ -2,7 +2,7 @@
 
 # SYNOPSIS
 #
-#   AX_CHECK_USER_NAMESPACE
+#   ARES_CHECK_USER_NAMESPACE
 #
 # DESCRIPTION
 #
@@ -12,9 +12,9 @@
 # Copyright (C) The c-ares team
 # SPDX-License-Identifier: MIT
 
-AC_DEFUN([AX_CHECK_USER_NAMESPACE],[dnl
+AC_DEFUN([ARES_CHECK_USER_NAMESPACE],[dnl
  AC_CACHE_CHECK([whether user namespaces are supported],
-  ax_cv_user_namespace,[
+  ares_cv_user_namespace,[
   AC_LANG_PUSH([C])
   AC_RUN_IFELSE([AC_LANG_SOURCE([[
 #define _GNU_SOURCE
@@ -48,10 +48,10 @@ int main() {
   if (!WIFEXITED(status)) return 1;
   return WEXITSTATUS(status);
 }
-  ]])],[ax_cv_user_namespace=yes],[ax_cv_user_namespace=no],[ax_cv_user_namespace=no])
+  ]])],[ares_cv_user_namespace=yes],[ares_cv_user_namespace=no],[ares_cv_user_namespace=no])
  AC_LANG_POP([C])
  ])
- if test "$ax_cv_user_namespace" = yes; then
+ if test "$ares_cv_user_namespace" = yes; then
    AC_DEFINE([HAVE_USER_NAMESPACE],[1],[Whether user namespaces are available])
  fi
-]) # AX_CHECK_USER_NAMESPACE
+]) # ARES_CHECK_USER_NAMESPACE
diff --git a/deps/cares/m4/ax_check_uts_namespace.m4 b/deps/cares/m4/ares_check_uts_namespace.m4
similarity index 87%
rename from deps/cares/m4/ax_check_uts_namespace.m4
rename to deps/cares/m4/ares_check_uts_namespace.m4
index 5708acf1b9f376..0aeefe4a9b7b8b 100644
--- a/deps/cares/m4/ax_check_uts_namespace.m4
+++ b/deps/cares/m4/ares_check_uts_namespace.m4
@@ -2,7 +2,7 @@
 
 # SYNOPSIS
 #
-#   AX_CHECK_UTS_NAMESPACE
+#   ARES_CHECK_UTS_NAMESPACE
 #
 # DESCRIPTION
 #
@@ -14,9 +14,9 @@
 # Copyright (C) The c-ares team
 # SPDX-License-Identifier: MIT
 
-AC_DEFUN([AX_CHECK_UTS_NAMESPACE],[dnl
+AC_DEFUN([ARES_CHECK_UTS_NAMESPACE],[dnl
  AC_CACHE_CHECK([whether UTS namespaces are supported],
-  ax_cv_uts_namespace,[
+  ares_cv_uts_namespace,[
   AC_LANG_PUSH([C])
   AC_RUN_IFELSE([AC_LANG_SOURCE([[
 #define _GNU_SOURCE
@@ -70,10 +70,10 @@ int main() {
   return WEXITSTATUS(status);
 }
 ]])
-  ],[ax_cv_uts_namespace=yes],[ax_cv_uts_namespace=no],[ax_cv_uts_namespace=no])
+  ],[ares_cv_uts_namespace=yes],[ares_cv_uts_namespace=no],[ares_cv_uts_namespace=no])
  AC_LANG_POP([C])
  ])
- if test "$ax_cv_uts_namespace" = yes; then
+ if test "$ares_cv_uts_namespace" = yes; then
    AC_DEFINE([HAVE_UTS_NAMESPACE],[1],[Whether UTS namespaces are available])
  fi
-]) # AX_CHECK_UTS_NAMESPACE
+]) # ARES_CHECK_UTS_NAMESPACE
diff --git a/deps/cares/m4/ax_append_compile_flags.m4 b/deps/cares/m4/ax_append_compile_flags.m4
index 1f8e70845c20d9..9c856356c0cda6 100644
--- a/deps/cares/m4/ax_append_compile_flags.m4
+++ b/deps/cares/m4/ax_append_compile_flags.m4
@@ -1,10 +1,10 @@
-# ===========================================================================
-#  http://www.gnu.org/software/autoconf-archive/ax_append_compile_flags.html
-# ===========================================================================
+# ============================================================================
+#  https://www.gnu.org/software/autoconf-archive/ax_append_compile_flags.html
+# ============================================================================
 #
 # SYNOPSIS
 #
-#   AX_APPEND_COMPILE_FLAGS([FLAG1 FLAG2 ...], [FLAGS-VARIABLE], [EXTRA-FLAGS])
+#   AX_APPEND_COMPILE_FLAGS([FLAG1 FLAG2 ...], [FLAGS-VARIABLE], [EXTRA-FLAGS], [INPUT])
 #
 # DESCRIPTION
 #
@@ -20,6 +20,8 @@
 #   the flags: "CFLAGS EXTRA-FLAGS FLAG".  This can for example be used to
 #   force the compiler to issue an error when a bad flag is given.
 #
+#   INPUT gives an alternative input source to AC_COMPILE_IFELSE.
+#
 #   NOTE: This macro depends on the AX_APPEND_FLAG and
 #   AX_CHECK_COMPILE_FLAG. Please keep this macro in sync with
 #   AX_APPEND_LINK_FLAGS.
@@ -28,38 +30,17 @@
 #
 #   Copyright (c) 2011 Maarten Bosmans <mkbosmans@gmail.com>
 #
-#   This program is free software: you can redistribute it and/or modify it
-#   under the terms of the GNU General Public License as published by the
-#   Free Software Foundation, either version 3 of the License, or (at your
-#   option) any later version.
-#
-#   This program is distributed in the hope that it will be useful, but
-#   WITHOUT ANY WARRANTY; without even the implied warranty of
-#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
-#   Public License for more details.
-#
-#   You should have received a copy of the GNU General Public License along
-#   with this program. If not, see <http://www.gnu.org/licenses/>.
-#
-#   As a special exception, the respective Autoconf Macro's copyright owner
-#   gives unlimited permission to copy, distribute and modify the configure
-#   scripts that are the output of Autoconf when processing the Macro. You
-#   need not follow the terms of the GNU General Public License when using
-#   or distributing such scripts, even though portions of the text of the
-#   Macro appear in them. The GNU General Public License (GPL) does govern
-#   all other use of the material that constitutes the Autoconf Macro.
-#
-#   This special exception to the GPL applies to versions of the Autoconf
-#   Macro released by the Autoconf Archive. When you make and distribute a
-#   modified version of the Autoconf Macro, you may extend this special
-#   exception to the GPL to apply to your modified version as well.
+#   Copying and distribution of this file, with or without modification, are
+#   permitted in any medium without royalty provided the copyright notice
+#   and this notice are preserved.  This file is offered as-is, without any
+#   warranty.
 
-#serial 3
+#serial 7
 
 AC_DEFUN([AX_APPEND_COMPILE_FLAGS],
-[AC_REQUIRE([AX_CHECK_COMPILE_FLAG])
-AC_REQUIRE([AX_APPEND_FLAG])
+[AX_REQUIRE_DEFINED([AX_CHECK_COMPILE_FLAG])
+AX_REQUIRE_DEFINED([AX_APPEND_FLAG])
 for flag in $1; do
-  AX_CHECK_COMPILE_FLAG([$flag], [AX_APPEND_FLAG([$flag], [$2])], [], [$3])
+  AX_CHECK_COMPILE_FLAG([$flag], [AX_APPEND_FLAG([$flag], [$2])], [], [$3], [$4])
 done
 ])dnl AX_APPEND_COMPILE_FLAGS
diff --git a/deps/cares/m4/ax_append_flag.m4 b/deps/cares/m4/ax_append_flag.m4
index 1d38b76fb8e157..dd6d8b61406c32 100644
--- a/deps/cares/m4/ax_append_flag.m4
+++ b/deps/cares/m4/ax_append_flag.m4
@@ -1,5 +1,5 @@
 # ===========================================================================
-#      http://www.gnu.org/software/autoconf-archive/ax_append_flag.html
+#      https://www.gnu.org/software/autoconf-archive/ax_append_flag.html
 # ===========================================================================
 #
 # SYNOPSIS
@@ -23,47 +23,28 @@
 #   Copyright (c) 2008 Guido U. Draheim <guidod@gmx.de>
 #   Copyright (c) 2011 Maarten Bosmans <mkbosmans@gmail.com>
 #
-#   This program is free software: you can redistribute it and/or modify it
-#   under the terms of the GNU General Public License as published by the
-#   Free Software Foundation, either version 3 of the License, or (at your
-#   option) any later version.
-#
-#   This program is distributed in the hope that it will be useful, but
-#   WITHOUT ANY WARRANTY; without even the implied warranty of
-#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
-#   Public License for more details.
-#
-#   You should have received a copy of the GNU General Public License along
-#   with this program. If not, see <http://www.gnu.org/licenses/>.
-#
-#   As a special exception, the respective Autoconf Macro's copyright owner
-#   gives unlimited permission to copy, distribute and modify the configure
-#   scripts that are the output of Autoconf when processing the Macro. You
-#   need not follow the terms of the GNU General Public License when using
-#   or distributing such scripts, even though portions of the text of the
-#   Macro appear in them. The GNU General Public License (GPL) does govern
-#   all other use of the material that constitutes the Autoconf Macro.
-#
-#   This special exception to the GPL applies to versions of the Autoconf
-#   Macro released by the Autoconf Archive. When you make and distribute a
-#   modified version of the Autoconf Macro, you may extend this special
-#   exception to the GPL to apply to your modified version as well.
+#   Copying and distribution of this file, with or without modification, are
+#   permitted in any medium without royalty provided the copyright notice
+#   and this notice are preserved.  This file is offered as-is, without any
+#   warranty.
 
-#serial 2
+#serial 8
 
 AC_DEFUN([AX_APPEND_FLAG],
-[AC_PREREQ(2.59)dnl for _AC_LANG_PREFIX
-AS_VAR_PUSHDEF([FLAGS], [m4_default($2,_AC_LANG_PREFIX[FLAGS])])dnl
-AS_VAR_SET_IF(FLAGS,
-  [case " AS_VAR_GET(FLAGS) " in
-    *" $1 "*)
-      AC_RUN_LOG([: FLAGS already contains $1])
-      ;;
-    *)
-      AC_RUN_LOG([: FLAGS="$FLAGS $1"])
-      AS_VAR_SET(FLAGS, ["AS_VAR_GET(FLAGS) $1"])
-      ;;
-   esac],
-  [AS_VAR_SET(FLAGS,["$1"])])
+[dnl
+AC_PREREQ(2.64)dnl for _AC_LANG_PREFIX and AS_VAR_SET_IF
+AS_VAR_PUSHDEF([FLAGS], [m4_default($2,_AC_LANG_PREFIX[FLAGS])])
+AS_VAR_SET_IF(FLAGS,[
+  AS_CASE([" AS_VAR_GET(FLAGS) "],
+    [*" $1 "*], [AC_RUN_LOG([: FLAGS already contains $1])],
+    [
+     AS_VAR_APPEND(FLAGS,[" $1"])
+     AC_RUN_LOG([: FLAGS="$FLAGS"])
+    ])
+  ],
+  [
+  AS_VAR_SET(FLAGS,[$1])
+  AC_RUN_LOG([: FLAGS="$FLAGS"])
+  ])
 AS_VAR_POPDEF([FLAGS])dnl
 ])dnl AX_APPEND_FLAG
diff --git a/deps/cares/m4/ax_check_compile_flag.m4 b/deps/cares/m4/ax_check_compile_flag.m4
index c3a8d695a1bcda..54191c55353ee5 100644
--- a/deps/cares/m4/ax_check_compile_flag.m4
+++ b/deps/cares/m4/ax_check_compile_flag.m4
@@ -1,10 +1,10 @@
 # ===========================================================================
-#   http://www.gnu.org/software/autoconf-archive/ax_check_compile_flag.html
+#  https://www.gnu.org/software/autoconf-archive/ax_check_compile_flag.html
 # ===========================================================================
 #
 # SYNOPSIS
 #
-#   AX_CHECK_COMPILE_FLAG(FLAG, [ACTION-SUCCESS], [ACTION-FAILURE], [EXTRA-FLAGS])
+#   AX_CHECK_COMPILE_FLAG(FLAG, [ACTION-SUCCESS], [ACTION-FAILURE], [EXTRA-FLAGS], [INPUT])
 #
 # DESCRIPTION
 #
@@ -19,6 +19,8 @@
 #   the flags: "CFLAGS EXTRA-FLAGS FLAG".  This can for example be used to
 #   force the compiler to issue an error when a bad flag is given.
 #
+#   INPUT gives an alternative input source to AC_COMPILE_IFELSE.
+#
 #   NOTE: Implementation based on AX_CFLAGS_GCC_OPTION. Please keep this
 #   macro in sync with AX_CHECK_{PREPROC,LINK}_FLAG.
 #
@@ -27,45 +29,34 @@
 #   Copyright (c) 2008 Guido U. Draheim <guidod@gmx.de>
 #   Copyright (c) 2011 Maarten Bosmans <mkbosmans@gmail.com>
 #
-#   This program is free software: you can redistribute it and/or modify it
-#   under the terms of the GNU General Public License as published by the
-#   Free Software Foundation, either version 3 of the License, or (at your
-#   option) any later version.
-#
-#   This program is distributed in the hope that it will be useful, but
-#   WITHOUT ANY WARRANTY; without even the implied warranty of
-#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
-#   Public License for more details.
-#
-#   You should have received a copy of the GNU General Public License along
-#   with this program. If not, see <http://www.gnu.org/licenses/>.
-#
-#   As a special exception, the respective Autoconf Macro's copyright owner
-#   gives unlimited permission to copy, distribute and modify the configure
-#   scripts that are the output of Autoconf when processing the Macro. You
-#   need not follow the terms of the GNU General Public License when using
-#   or distributing such scripts, even though portions of the text of the
-#   Macro appear in them. The GNU General Public License (GPL) does govern
-#   all other use of the material that constitutes the Autoconf Macro.
-#
-#   This special exception to the GPL applies to versions of the Autoconf
-#   Macro released by the Autoconf Archive. When you make and distribute a
-#   modified version of the Autoconf Macro, you may extend this special
-#   exception to the GPL to apply to your modified version as well.
+#   Copying and distribution of this file, with or without modification, are
+#   permitted in any medium without royalty provided the copyright notice
+#   and this notice are preserved.  This file is offered as-is, without any
+#   warranty.
 
-#serial 2
+#serial 11
 
 AC_DEFUN([AX_CHECK_COMPILE_FLAG],
-[AC_PREREQ(2.59)dnl for _AC_LANG_PREFIX
+[AC_PREREQ(2.64)dnl for _AC_LANG_PREFIX and AS_VAR_IF
 AS_VAR_PUSHDEF([CACHEVAR],[ax_cv_check_[]_AC_LANG_ABBREV[]flags_$4_$1])dnl
-AC_CACHE_CHECK([whether _AC_LANG compiler accepts $1], CACHEVAR, [
+AC_CACHE_CHECK([whether the _AC_LANG compiler accepts $1], CACHEVAR, [
   ax_check_save_flags=$[]_AC_LANG_PREFIX[]FLAGS
-  _AC_LANG_PREFIX[]FLAGS="$[]_AC_LANG_PREFIX[]FLAGS $4 $1"
-  AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],
+  if test x"m4_case(_AC_LANG,
+                     [C], [$GCC],
+                     [C++], [$GXX],
+                     [Fortran], [$GFC],
+                     [Fortran 77], [$G77],
+                     [Objective C], [$GOBJC],
+                     [Objective C++], [$GOBJCXX],
+                     [no])" = xyes ; then
+    add_gnu_werror="-Werror"
+  fi
+  _AC_LANG_PREFIX[]FLAGS="$[]_AC_LANG_PREFIX[]FLAGS $4 $1 $add_gnu_werror"
+  AC_COMPILE_IFELSE([m4_default([$5],[AC_LANG_PROGRAM()])],
     [AS_VAR_SET(CACHEVAR,[yes])],
     [AS_VAR_SET(CACHEVAR,[no])])
   _AC_LANG_PREFIX[]FLAGS=$ax_check_save_flags])
-AS_IF([test x"AS_VAR_GET(CACHEVAR)" = xyes],
+AS_VAR_IF(CACHEVAR,yes,
   [m4_default([$2], :)],
   [m4_default([$3], :)])
 AS_VAR_POPDEF([CACHEVAR])dnl
diff --git a/deps/cares/m4/ax_code_coverage.m4 b/deps/cares/m4/ax_code_coverage.m4
index ad4063305ebcdd..216708a41f10c9 100644
--- a/deps/cares/m4/ax_code_coverage.m4
+++ b/deps/cares/m4/ax_code_coverage.m4
@@ -74,7 +74,7 @@
 #   You should have received a copy of the GNU Lesser General Public License
 #   along with this program. If not, see <https://www.gnu.org/licenses/>.
 
-#serial 34
+#serial 37
 
 m4_define(_AX_CODE_COVERAGE_RULES,[
 AX_ADD_AM_MACRO_STATIC([
@@ -144,7 +144,7 @@ code_coverage_v_lcov_cap_ = \$(code_coverage_v_lcov_cap_\$(AM_DEFAULT_VERBOSITY)
 code_coverage_v_lcov_cap_0 = @echo \"  LCOV   --capture\" \$(CODE_COVERAGE_OUTPUT_FILE);
 code_coverage_v_lcov_ign = \$(code_coverage_v_lcov_ign_\$(V))
 code_coverage_v_lcov_ign_ = \$(code_coverage_v_lcov_ign_\$(AM_DEFAULT_VERBOSITY))
-code_coverage_v_lcov_ign_0 = @echo \"  LCOV   --remove /tmp/*\" \$(CODE_COVERAGE_IGNORE_PATTERN);
+code_coverage_v_lcov_ign_0 = @echo \"  LCOV   --remove\" \"\$(CODE_COVERAGE_OUTPUT_FILE).tmp\" \$(CODE_COVERAGE_IGNORE_PATTERN);
 code_coverage_v_genhtml = \$(code_coverage_v_genhtml_\$(V))
 code_coverage_v_genhtml_ = \$(code_coverage_v_genhtml_\$(AM_DEFAULT_VERBOSITY))
 code_coverage_v_genhtml_0 = @echo \"  GEN   \" \"\$(CODE_COVERAGE_OUTPUT_DIRECTORY)\";
@@ -163,7 +163,7 @@ check-code-coverage:
 # Capture code coverage data
 code-coverage-capture: code-coverage-capture-hook
 	\$(code_coverage_v_lcov_cap)\$(LCOV) \$(code_coverage_quiet) \$(addprefix --directory ,\$(CODE_COVERAGE_DIRECTORY)) --capture --output-file \"\$(CODE_COVERAGE_OUTPUT_FILE).tmp\" --test-name \"\$(call code_coverage_sanitize,\$(PACKAGE_NAME)-\$(PACKAGE_VERSION))\" --no-checksum --compat-libtool \$(CODE_COVERAGE_LCOV_SHOPTS) \$(CODE_COVERAGE_LCOV_OPTIONS)
-	\$(code_coverage_v_lcov_ign)\$(LCOV) \$(code_coverage_quiet) \$(addprefix --directory ,\$(CODE_COVERAGE_DIRECTORY)) --remove \"\$(CODE_COVERAGE_OUTPUT_FILE).tmp\" \"/tmp/*\" \$(CODE_COVERAGE_IGNORE_PATTERN) --output-file \"\$(CODE_COVERAGE_OUTPUT_FILE)\" \$(CODE_COVERAGE_LCOV_SHOPTS) \$(CODE_COVERAGE_LCOV_RMOPTS)
+	\$(code_coverage_v_lcov_ign)\$(LCOV) \$(code_coverage_quiet) \$(addprefix --directory ,\$(CODE_COVERAGE_DIRECTORY)) --remove \"\$(CODE_COVERAGE_OUTPUT_FILE).tmp\" \$(CODE_COVERAGE_IGNORE_PATTERN) --output-file \"\$(CODE_COVERAGE_OUTPUT_FILE)\" \$(CODE_COVERAGE_LCOV_SHOPTS) \$(CODE_COVERAGE_LCOV_RMOPTS)
 	-@rm -f \"\$(CODE_COVERAGE_OUTPUT_FILE).tmp\"
 	\$(code_coverage_v_genhtml)LANG=C \$(GENHTML) \$(code_coverage_quiet) \$(addprefix --prefix ,\$(CODE_COVERAGE_DIRECTORY)) --output-directory \"\$(CODE_COVERAGE_OUTPUT_DIRECTORY)\" --title \"\$(PACKAGE_NAME)-\$(PACKAGE_VERSION) Code Coverage\" --legend --show-details \"\$(CODE_COVERAGE_OUTPUT_FILE)\" \$(CODE_COVERAGE_GENHTML_OPTIONS)
 	@echo \"file://\$(abs_builddir)/\$(CODE_COVERAGE_OUTPUT_DIRECTORY)/index.html\"
@@ -206,14 +206,14 @@ code-coverage-capture-hook:
 ])
 
 AC_DEFUN([_AX_CODE_COVERAGE_ENABLED],[
-	AX_CHECK_GNU_MAKE([],AC_MSG_ERROR([not using GNU make that is needed for coverage]))
+	AX_CHECK_GNU_MAKE([],[AC_MSG_ERROR([not using GNU make that is needed for coverage])])
 	AC_REQUIRE([AX_ADD_AM_MACRO_STATIC])
 	# check for gcov
 	AC_CHECK_TOOL([GCOV],
 		  [$_AX_CODE_COVERAGE_GCOV_PROG_WITH],
 		  [:])
 	AS_IF([test "X$GCOV" = "X:"],
-	      AC_MSG_ERROR([gcov is needed to do coverage]))
+	      [AC_MSG_ERROR([gcov is needed to do coverage])])
 	AC_SUBST([GCOV])
 
 	dnl Check if gcc is being used
@@ -232,12 +232,13 @@ AC_DEFUN([_AX_CODE_COVERAGE_ENABLED],[
 		AC_MSG_ERROR([Could not find genhtml from the lcov package])
 	])
 
+	AC_CHECK_LIB([gcov], [_gcov_init], [CODE_COVERAGE_LIBS="-lgcov"], [CODE_COVERAGE_LIBS=""])
+
 	dnl Build the code coverage flags
 	dnl Define CODE_COVERAGE_LDFLAGS for backwards compatibility
 	CODE_COVERAGE_CPPFLAGS="-DNDEBUG"
 	CODE_COVERAGE_CFLAGS="-O0 -g -fprofile-arcs -ftest-coverage"
 	CODE_COVERAGE_CXXFLAGS="-O0 -g -fprofile-arcs -ftest-coverage"
-	CODE_COVERAGE_LIBS="-lgcov"
 
 	AC_SUBST([CODE_COVERAGE_CPPFLAGS])
 	AC_SUBST([CODE_COVERAGE_CFLAGS])
diff --git a/deps/cares/m4/ax_cxx_compile_stdcxx.m4 b/deps/cares/m4/ax_cxx_compile_stdcxx.m4
index 8edf5152ec7a91..fe6ae17e6c4d32 100644
--- a/deps/cares/m4/ax_cxx_compile_stdcxx.m4
+++ b/deps/cares/m4/ax_cxx_compile_stdcxx.m4
@@ -10,8 +10,8 @@
 #
 #   Check for baseline language coverage in the compiler for the specified
 #   version of the C++ standard.  If necessary, add switches to CXX and
-#   CXXCPP to enable support.  VERSION may be '11', '14', '17', or '20' for
-#   the respective C++ standard version.
+#   CXXCPP to enable support.  VERSION may be '11', '14', '17', '20', or
+#   '23' for the respective C++ standard version.
 #
 #   The second argument, if specified, indicates whether you insist on an
 #   extended mode (e.g. -std=gnu++11) or a strict conformance mode (e.g.
@@ -36,14 +36,15 @@
 #   Copyright (c) 2016, 2018 Krzesimir Nowak <qdlacz@gmail.com>
 #   Copyright (c) 2019 Enji Cooper <yaneurabeya@gmail.com>
 #   Copyright (c) 2020 Jason Merrill <jason@redhat.com>
-#   Copyright (c) 2021 Jörn Heusipp <osmanx@problemloesungsmaschine.de>
+#   Copyright (c) 2021, 2024 Jörn Heusipp <osmanx@problemloesungsmaschine.de>
+#   Copyright (c) 2015, 2022, 2023, 2024 Olly Betts
 #
 #   Copying and distribution of this file, with or without modification, are
 #   permitted in any medium without royalty provided the copyright notice
 #   and this notice are preserved.  This file is offered as-is, without any
 #   warranty.
 
-#serial 18
+#serial 25
 
 dnl  This macro is based on the code from the AX_CXX_COMPILE_STDCXX_11 macro
 dnl  (serial version number 13).
@@ -53,6 +54,7 @@ AC_DEFUN([AX_CXX_COMPILE_STDCXX], [dnl
         [$1], [14], [ax_cxx_compile_alternatives="14 1y"],
         [$1], [17], [ax_cxx_compile_alternatives="17 1z"],
         [$1], [20], [ax_cxx_compile_alternatives="20"],
+        [$1], [23], [ax_cxx_compile_alternatives="23"],
         [m4_fatal([invalid first argument `$1' to AX_CXX_COMPILE_STDCXX])])dnl
   m4_if([$2], [], [],
         [$2], [ext], [],
@@ -159,31 +161,41 @@ AC_DEFUN([AX_CXX_COMPILE_STDCXX], [dnl
 dnl  Test body for checking C++11 support
 
 m4_define([_AX_CXX_COMPILE_STDCXX_testbody_11],
-  _AX_CXX_COMPILE_STDCXX_testbody_new_in_11
+  [_AX_CXX_COMPILE_STDCXX_testbody_new_in_11]
 )
 
 dnl  Test body for checking C++14 support
 
 m4_define([_AX_CXX_COMPILE_STDCXX_testbody_14],
-  _AX_CXX_COMPILE_STDCXX_testbody_new_in_11
-  _AX_CXX_COMPILE_STDCXX_testbody_new_in_14
+  [_AX_CXX_COMPILE_STDCXX_testbody_new_in_11
+   _AX_CXX_COMPILE_STDCXX_testbody_new_in_14]
 )
 
 dnl  Test body for checking C++17 support
 
 m4_define([_AX_CXX_COMPILE_STDCXX_testbody_17],
-  _AX_CXX_COMPILE_STDCXX_testbody_new_in_11
-  _AX_CXX_COMPILE_STDCXX_testbody_new_in_14
-  _AX_CXX_COMPILE_STDCXX_testbody_new_in_17
+  [_AX_CXX_COMPILE_STDCXX_testbody_new_in_11
+   _AX_CXX_COMPILE_STDCXX_testbody_new_in_14
+   _AX_CXX_COMPILE_STDCXX_testbody_new_in_17]
 )
 
 dnl  Test body for checking C++20 support
 
 m4_define([_AX_CXX_COMPILE_STDCXX_testbody_20],
-  _AX_CXX_COMPILE_STDCXX_testbody_new_in_11
-  _AX_CXX_COMPILE_STDCXX_testbody_new_in_14
-  _AX_CXX_COMPILE_STDCXX_testbody_new_in_17
-  _AX_CXX_COMPILE_STDCXX_testbody_new_in_20
+  [_AX_CXX_COMPILE_STDCXX_testbody_new_in_11
+   _AX_CXX_COMPILE_STDCXX_testbody_new_in_14
+   _AX_CXX_COMPILE_STDCXX_testbody_new_in_17
+   _AX_CXX_COMPILE_STDCXX_testbody_new_in_20]
+)
+
+dnl  Test body for checking C++23 support
+
+m4_define([_AX_CXX_COMPILE_STDCXX_testbody_23],
+  [_AX_CXX_COMPILE_STDCXX_testbody_new_in_11
+   _AX_CXX_COMPILE_STDCXX_testbody_new_in_14
+   _AX_CXX_COMPILE_STDCXX_testbody_new_in_17
+   _AX_CXX_COMPILE_STDCXX_testbody_new_in_20
+   _AX_CXX_COMPILE_STDCXX_testbody_new_in_23]
 )
 
 
@@ -201,7 +213,17 @@ m4_define([_AX_CXX_COMPILE_STDCXX_testbody_new_in_11], [[
 // MSVC always sets __cplusplus to 199711L in older versions; newer versions
 // only set it correctly if /Zc:__cplusplus is specified as well as a
 // /std:c++NN switch:
+//
 // https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/
+//
+// The value __cplusplus ought to have is available in _MSVC_LANG since
+// Visual Studio 2015 Update 3:
+//
+// https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros
+//
+// This was also the first MSVC version to support C++14 so we can't use the
+// value of either __cplusplus or _MSVC_LANG to quickly rule out MSVC having
+// C++11 or C++14 support, but we can check _MSVC_LANG for C++17 and later.
 #elif __cplusplus < 201103L && !defined _MSC_VER
 
 #error "This is not a C++11 compiler"
@@ -617,7 +639,7 @@ m4_define([_AX_CXX_COMPILE_STDCXX_testbody_new_in_17], [[
 
 #error "This is not a C++ compiler"
 
-#elif __cplusplus < 201703L && !defined _MSC_VER
+#elif (defined _MSVC_LANG ? _MSVC_LANG : __cplusplus) < 201703L
 
 #error "This is not a C++17 compiler"
 
@@ -983,7 +1005,7 @@ namespace cxx17
 
 }  // namespace cxx17
 
-#endif  // __cplusplus < 201703L && !defined _MSC_VER
+#endif  // (defined _MSVC_LANG ? _MSVC_LANG : __cplusplus) < 201703L
 
 ]])
 
@@ -996,7 +1018,7 @@ m4_define([_AX_CXX_COMPILE_STDCXX_testbody_new_in_20], [[
 
 #error "This is not a C++ compiler"
 
-#elif __cplusplus < 202002L && !defined _MSC_VER
+#elif (defined _MSVC_LANG ? _MSVC_LANG : __cplusplus) < 202002L
 
 #error "This is not a C++20 compiler"
 
@@ -1013,6 +1035,36 @@ namespace cxx20
 
 }  // namespace cxx20
 
-#endif  // __cplusplus < 202002L && !defined _MSC_VER
+#endif  // (defined _MSVC_LANG ? _MSVC_LANG : __cplusplus) < 202002L
+
+]])
+
+
+dnl  Tests for new features in C++23
+
+m4_define([_AX_CXX_COMPILE_STDCXX_testbody_new_in_23], [[
+
+#ifndef __cplusplus
+
+#error "This is not a C++ compiler"
+
+#elif (defined _MSVC_LANG ? _MSVC_LANG : __cplusplus) < 202302L
+
+#error "This is not a C++23 compiler"
+
+#else
+
+#include <version>
+
+namespace cxx23
+{
+
+// As C++23 supports feature test macros in the standard, there is no
+// immediate need to actually test for feature availability on the
+// Autoconf side.
+
+}  // namespace cxx23
+
+#endif  // (defined _MSVC_LANG ? _MSVC_LANG : __cplusplus) < 202302L
 
 ]])
diff --git a/deps/cares/src/Makefile.in b/deps/cares/src/Makefile.in
index 0c3c0864d4460a..1f286880247aa1 100644
--- a/deps/cares/src/Makefile.in
+++ b/deps/cares/src/Makefile.in
@@ -89,7 +89,9 @@ build_triplet = @build@
 host_triplet = @host@
 subdir = src
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_ac_append_to_file.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ares_check_user_namespace.m4 \
+	$(top_srcdir)/m4/ares_check_uts_namespace.m4 \
+	$(top_srcdir)/m4/ax_ac_append_to_file.m4 \
 	$(top_srcdir)/m4/ax_ac_print_to_file.m4 \
 	$(top_srcdir)/m4/ax_add_am_macro_static.m4 \
 	$(top_srcdir)/m4/ax_am_macros_static.m4 \
@@ -99,8 +101,6 @@ am__aclocal_m4_deps = $(top_srcdir)/m4/ax_ac_append_to_file.m4 \
 	$(top_srcdir)/m4/ax_check_compile_flag.m4 \
 	$(top_srcdir)/m4/ax_check_gnu_make.m4 \
 	$(top_srcdir)/m4/ax_check_link_flag.m4 \
-	$(top_srcdir)/m4/ax_check_user_namespace.m4 \
-	$(top_srcdir)/m4/ax_check_uts_namespace.m4 \
 	$(top_srcdir)/m4/ax_code_coverage.m4 \
 	$(top_srcdir)/m4/ax_compiler_vendor.m4 \
 	$(top_srcdir)/m4/ax_cxx_compile_stdcxx.m4 \
diff --git a/deps/cares/src/lib/CMakeLists.txt b/deps/cares/src/lib/CMakeLists.txt
index 9956fd625b2ad6..9d4e10924d0adb 100644
--- a/deps/cares/src/lib/CMakeLists.txt
+++ b/deps/cares/src/lib/CMakeLists.txt
@@ -92,11 +92,23 @@ IF (CARES_STATIC)
 
 	SET_TARGET_PROPERTIES (${LIBNAME} PROPERTIES
 		EXPORT_NAME                  cares${STATIC_SUFFIX}
-		OUTPUT_NAME                  cares${STATIC_SUFFIX}
 		COMPILE_PDB_NAME             cares${STATIC_SUFFIX}
 		C_STANDARD                   90
 	)
 
+	# On Windows, the output name should have a static suffix since otherwise
+	# we would have conflicting output names (libcares.lib) for the link
+	# library.
+	# However on Unix-like systems, we typically have something like
+	# libcares.so  for shared libraries  and libcares.a for static
+	# libraries, so these don't conflict.
+	# This behavior better emulates what happens with autotools builds
+	IF (WIN32)
+		SET_TARGET_PROPERTIES(${LIBNAME} PROPERTIES OUTPUT_NAME cares${STATIC_SUFFIX})
+	ELSE ()
+		SET_TARGET_PROPERTIES(${LIBNAME} PROPERTIES OUTPUT_NAME cares)
+	ENDIF()
+
 	IF (ANDROID)
 		SET_TARGET_PROPERTIES (${LIBNAME} PROPERTIES C_STANDARD 99)
 	ENDIF ()
diff --git a/deps/cares/src/lib/Makefile.in b/deps/cares/src/lib/Makefile.in
index 4aff043b26a310..a45fc10b544755 100644
--- a/deps/cares/src/lib/Makefile.in
+++ b/deps/cares/src/lib/Makefile.in
@@ -15,7 +15,7 @@
 @SET_MAKE@
 
 # aminclude_static.am generated automatically by Autoconf
-# from AX_AM_MACROS_STATIC on Sat Nov  9 17:40:37 UTC 2024
+# from AX_AM_MACROS_STATIC on Sat Dec 14 15:15:44 UTC 2024
 
 # Copyright (C) The c-ares project and its contributors
 # SPDX-License-Identifier: MIT
@@ -100,7 +100,9 @@ host_triplet = @host@
 subdir = src/lib
 SUBDIRS =
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_ac_append_to_file.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ares_check_user_namespace.m4 \
+	$(top_srcdir)/m4/ares_check_uts_namespace.m4 \
+	$(top_srcdir)/m4/ax_ac_append_to_file.m4 \
 	$(top_srcdir)/m4/ax_ac_print_to_file.m4 \
 	$(top_srcdir)/m4/ax_add_am_macro_static.m4 \
 	$(top_srcdir)/m4/ax_am_macros_static.m4 \
@@ -110,8 +112,6 @@ am__aclocal_m4_deps = $(top_srcdir)/m4/ax_ac_append_to_file.m4 \
 	$(top_srcdir)/m4/ax_check_compile_flag.m4 \
 	$(top_srcdir)/m4/ax_check_gnu_make.m4 \
 	$(top_srcdir)/m4/ax_check_link_flag.m4 \
-	$(top_srcdir)/m4/ax_check_user_namespace.m4 \
-	$(top_srcdir)/m4/ax_check_uts_namespace.m4 \
 	$(top_srcdir)/m4/ax_code_coverage.m4 \
 	$(top_srcdir)/m4/ax_compiler_vendor.m4 \
 	$(top_srcdir)/m4/ax_cxx_compile_stdcxx.m4 \
@@ -629,7 +629,7 @@ libcares_la_CPPFLAGS_EXTRA = -DCARES_BUILDING_LIBRARY $(am__append_3) \
 @CODE_COVERAGE_ENABLED_TRUE@code_coverage_v_lcov_cap_0 = @echo "  LCOV   --capture" $(CODE_COVERAGE_OUTPUT_FILE);
 @CODE_COVERAGE_ENABLED_TRUE@code_coverage_v_lcov_ign = $(code_coverage_v_lcov_ign_$(V))
 @CODE_COVERAGE_ENABLED_TRUE@code_coverage_v_lcov_ign_ = $(code_coverage_v_lcov_ign_$(AM_DEFAULT_VERBOSITY))
-@CODE_COVERAGE_ENABLED_TRUE@code_coverage_v_lcov_ign_0 = @echo "  LCOV   --remove /tmp/*" $(CODE_COVERAGE_IGNORE_PATTERN);
+@CODE_COVERAGE_ENABLED_TRUE@code_coverage_v_lcov_ign_0 = @echo "  LCOV   --remove" "$(CODE_COVERAGE_OUTPUT_FILE).tmp" $(CODE_COVERAGE_IGNORE_PATTERN);
 @CODE_COVERAGE_ENABLED_TRUE@code_coverage_v_genhtml = $(code_coverage_v_genhtml_$(V))
 @CODE_COVERAGE_ENABLED_TRUE@code_coverage_v_genhtml_ = $(code_coverage_v_genhtml_$(AM_DEFAULT_VERBOSITY))
 @CODE_COVERAGE_ENABLED_TRUE@code_coverage_v_genhtml_0 = @echo "  GEN   " "$(CODE_COVERAGE_OUTPUT_DIRECTORY)";
@@ -2328,7 +2328,7 @@ uninstall-am: uninstall-libLTLIBRARIES
 # Capture code coverage data
 @CODE_COVERAGE_ENABLED_TRUE@code-coverage-capture: code-coverage-capture-hook
 @CODE_COVERAGE_ENABLED_TRUE@	$(code_coverage_v_lcov_cap)$(LCOV) $(code_coverage_quiet) $(addprefix --directory ,$(CODE_COVERAGE_DIRECTORY)) --capture --output-file "$(CODE_COVERAGE_OUTPUT_FILE).tmp" --test-name "$(call code_coverage_sanitize,$(PACKAGE_NAME)-$(PACKAGE_VERSION))" --no-checksum --compat-libtool $(CODE_COVERAGE_LCOV_SHOPTS) $(CODE_COVERAGE_LCOV_OPTIONS)
-@CODE_COVERAGE_ENABLED_TRUE@	$(code_coverage_v_lcov_ign)$(LCOV) $(code_coverage_quiet) $(addprefix --directory ,$(CODE_COVERAGE_DIRECTORY)) --remove "$(CODE_COVERAGE_OUTPUT_FILE).tmp" "/tmp/*" $(CODE_COVERAGE_IGNORE_PATTERN) --output-file "$(CODE_COVERAGE_OUTPUT_FILE)" $(CODE_COVERAGE_LCOV_SHOPTS) $(CODE_COVERAGE_LCOV_RMOPTS)
+@CODE_COVERAGE_ENABLED_TRUE@	$(code_coverage_v_lcov_ign)$(LCOV) $(code_coverage_quiet) $(addprefix --directory ,$(CODE_COVERAGE_DIRECTORY)) --remove "$(CODE_COVERAGE_OUTPUT_FILE).tmp" $(CODE_COVERAGE_IGNORE_PATTERN) --output-file "$(CODE_COVERAGE_OUTPUT_FILE)" $(CODE_COVERAGE_LCOV_SHOPTS) $(CODE_COVERAGE_LCOV_RMOPTS)
 @CODE_COVERAGE_ENABLED_TRUE@	-@rm -f "$(CODE_COVERAGE_OUTPUT_FILE).tmp"
 @CODE_COVERAGE_ENABLED_TRUE@	$(code_coverage_v_genhtml)LANG=C $(GENHTML) $(code_coverage_quiet) $(addprefix --prefix ,$(CODE_COVERAGE_DIRECTORY)) --output-directory "$(CODE_COVERAGE_OUTPUT_DIRECTORY)" --title "$(PACKAGE_NAME)-$(PACKAGE_VERSION) Code Coverage" --legend --show-details "$(CODE_COVERAGE_OUTPUT_FILE)" $(CODE_COVERAGE_GENHTML_OPTIONS)
 @CODE_COVERAGE_ENABLED_TRUE@	@echo "file://$(abs_builddir)/$(CODE_COVERAGE_OUTPUT_DIRECTORY)/index.html"
diff --git a/deps/cares/src/lib/ares_config.h.cmake b/deps/cares/src/lib/ares_config.h.cmake
index 051b97f494fd32..51744fe143868c 100644
--- a/deps/cares/src/lib/ares_config.h.cmake
+++ b/deps/cares/src/lib/ares_config.h.cmake
@@ -257,6 +257,9 @@
 /* Define to 1 if you have the <signal.h> header file. */
 #cmakedefine HAVE_SIGNAL_H 1
 
+/* Define to 1 if you have the strnlen function. */
+#cmakedefine HAVE_STRNLEN 1
+
 /* Define to 1 if your struct sockaddr_in6 has sin6_scope_id. */
 #cmakedefine HAVE_STRUCT_SOCKADDR_IN6_SIN6_SCOPE_ID 1
 
diff --git a/deps/cares/src/lib/ares_config.h.in b/deps/cares/src/lib/ares_config.h.in
index d1f09d694db68e..a62e17089358aa 100644
--- a/deps/cares/src/lib/ares_config.h.in
+++ b/deps/cares/src/lib/ares_config.h.in
@@ -309,6 +309,9 @@
 /* Define to 1 if you have `strnicmp` */
 #undef HAVE_STRNICMP
 
+/* Define to 1 if you have `strnlen` */
+#undef HAVE_STRNLEN
+
 /* Define to 1 if the system has the type `struct addrinfo'. */
 #undef HAVE_STRUCT_ADDRINFO
 
diff --git a/deps/cares/src/lib/ares_private.h b/deps/cares/src/lib/ares_private.h
index ce8c3f2ddc2f6c..e6d44e8b8640f9 100644
--- a/deps/cares/src/lib/ares_private.h
+++ b/deps/cares/src/lib/ares_private.h
@@ -388,8 +388,23 @@ ares_status_t ares_sysconfig_set_options(ares_sysconfig_t *sysconfig,
 
 ares_status_t ares_init_by_environment(ares_sysconfig_t *sysconfig);
 
+
+typedef ares_status_t (*ares_sysconfig_line_cb_t)(const ares_channel_t *channel,
+                                                  ares_sysconfig_t     *sysconfig,
+                                                  ares_buf_t           *line);
+
+ares_status_t ares_sysconfig_parse_resolv_line(const ares_channel_t *channel,
+                                               ares_sysconfig_t     *sysconfig,
+                                               ares_buf_t           *line);
+
+ares_status_t ares_sysconfig_process_buf(const ares_channel_t    *channel,
+                                         ares_sysconfig_t        *sysconfig,
+                                         ares_buf_t              *buf,
+                                         ares_sysconfig_line_cb_t cb);
+
 ares_status_t ares_init_sysconfig_files(const ares_channel_t *channel,
-                                        ares_sysconfig_t     *sysconfig);
+                                        ares_sysconfig_t     *sysconfig,
+                                        ares_bool_t process_resolvconf);
 #ifdef __APPLE__
 ares_status_t ares_init_sysconfig_macos(const ares_channel_t *channel,
                                         ares_sysconfig_t     *sysconfig);
diff --git a/deps/cares/src/lib/ares_set_socket_functions.c b/deps/cares/src/lib/ares_set_socket_functions.c
index 143c491174fdba..7216ffa933fc07 100644
--- a/deps/cares/src/lib/ares_set_socket_functions.c
+++ b/deps/cares/src/lib/ares_set_socket_functions.c
@@ -288,7 +288,9 @@ static int default_asetsockopt(ares_socket_t sock, ares_socket_opt_t opt,
       return setsockopt(sock, SOL_SOCKET, SO_RCVBUF, val, val_size);
 
     case ARES_SOCKET_OPT_BIND_DEVICE:
-      if (!ares_str_isprint(val, (size_t)val_size)) {
+      /* Count the number of characters before the NUL terminator, then
+       * validate that those are all printable */
+      if (!ares_str_isprint(val, ares_strnlen(val, (size_t)val_size))) {
         SET_SOCKERRNO(EINVAL);
         return -1;
       }
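Note on the BIND_DEVICE hunk above: the check now measures the string up to
its terminator, bounded by the caller-supplied size, and tests only that
prefix for printability. A minimal standalone sketch of the idea in plain C
follows. The names bounded_len, all_printable and device_name_ok are
hypothetical and exist only for this note; they are not the c-ares helpers
ares_strnlen and ares_str_isprint, whose exact behaviour may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical strnlen-style helper: count bytes before the first NUL,
 * but never look past max_len bytes. */
size_t bounded_len(const char *s, size_t max_len)
{
  size_t n = 0;
  while (n < max_len && s[n] != '\0') {
    n++;
  }
  return n;
}

/* Hypothetical printable check over exactly len bytes. */
int all_printable(const char *s, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++) {
    if (!isprint((unsigned char)s[i])) {
      return 0;
    }
  }
  return 1;
}

/* Validate a device name held in a fixed-size buffer.  Only the bytes
 * before the terminator are checked, so NUL padding does not fail. */
int device_name_ok(const char *buf, size_t buf_size)
{
  return all_printable(buf, bounded_len(buf, buf_size));
}
```

Checking the whole fixed-size buffer instead would reject a short name such
as "eth0" stored in a 16-byte array, because the padding bytes are not
printable.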
diff --git a/deps/cares/src/lib/ares_socket.c b/deps/cares/src/lib/ares_socket.c
index df02fd61b60b14..516852a84abfb8 100644
--- a/deps/cares/src/lib/ares_socket.c
+++ b/deps/cares/src/lib/ares_socket.c
@@ -263,7 +263,8 @@ ares_status_t ares_socket_configure(ares_channel_t *channel, int family,
      * compatibility */
     (void)channel->sock_funcs.asetsockopt(
       fd, ARES_SOCKET_OPT_BIND_DEVICE, channel->local_dev_name,
-      sizeof(channel->local_dev_name), channel->sock_func_cb_data);
+      (ares_socklen_t)ares_strlen(channel->local_dev_name),
+      channel->sock_func_cb_data);
   }
 
   /* Bind to ip address if configured */
diff --git a/deps/cares/src/lib/ares_sysconfig.c b/deps/cares/src/lib/ares_sysconfig.c
index 9f0d7e5061ffe0..286db60328f45b 100644
--- a/deps/cares/src/lib/ares_sysconfig.c
+++ b/deps/cares/src/lib/ares_sysconfig.c
@@ -260,6 +260,94 @@ static ares_status_t ares_init_sysconfig_android(const ares_channel_t *channel,
 }
 #endif
 
+#if defined(__QNX__)
+static ares_status_t
+  ares_init_sysconfig_qnx(const ares_channel_t *channel,
+                          ares_sysconfig_t     *sysconfig)
+{
+  /* QNX:
+   *   1. use confstr(_CS_RESOLVE, ...) as primary resolv.conf data, replacing
+   *      "_" with " ".  If that is empty, then do normal /etc/resolv.conf
+   *      processing.
+   *   2. We want to process /etc/nsswitch.conf as normal.
+   *   3. If confstr(_CS_DOMAIN, ...) returns a value, it is the domain name.
+   *      Use it in preference to anything else found.
+   */
+  ares_buf_t    *buf                = NULL;
+  unsigned char *data               = NULL;
+  size_t         data_size          = 0;
+  ares_bool_t    process_resolvconf = ARES_TRUE;
+  ares_status_t  status             = ARES_SUCCESS;
+
+  /* Prefer confstr(_CS_RESOLVE, ...) */
+  buf = ares_buf_create();
+  if (buf == NULL) {
+    status = ARES_ENOMEM;
+    goto done;
+  }
+
+  data_size = 1024;
+  data      = ares_buf_append_start(buf, &data_size);
+  if (data == NULL) {
+    status = ARES_ENOMEM;
+    goto done;
+  }
+
+  data_size = confstr(_CS_RESOLVE, (char *)data, data_size);
+  if (data_size > 1) {
+    /* confstr's return value counts the NUL terminator; strip it */
+    data_size--;
+
+    ares_buf_append_finish(buf, data_size);
+    /* Oddly, this uses "_" instead of " " between keywords; otherwise the
+     * format is the same as resolv.conf, so replace it. */
+    ares_buf_replace(buf, (const unsigned char *)"_", 1,
+                     (const unsigned char *)" ", 1);
+
+    status = ares_sysconfig_process_buf(channel, sysconfig, buf,
+                                        ares_sysconfig_parse_resolv_line);
+    if (status != ARES_SUCCESS) {
+      /* ENOMEM is really the only error we'll get here */
+      goto done;
+    }
+
+    /* don't read resolv.conf if we processed *any* nameservers */
+    if (ares_llist_len(sysconfig->sconfig) != 0) {
+      process_resolvconf = ARES_FALSE;
+    }
+  }
+
+  /* Process files */
+  status = ares_init_sysconfig_files(channel, sysconfig, process_resolvconf);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  /* Read confstr(_CS_DOMAIN, ...), but if we had a search path specified with
+   * more than one domain, let's prefer that instead.  It's not exactly clear
+   * what the best way to handle this is. */
+  if (sysconfig->ndomains <= 1) {
+    char   domain[256];
+    size_t domain_len;
+
+    domain_len = confstr(_CS_DOMAIN, domain, sizeof(domain));
+    if (domain_len != 0) {
+      ares_strsplit_free(sysconfig->domains, sysconfig->ndomains);
+      sysconfig->domains = ares_strsplit(domain, ", ", &sysconfig->ndomains);
+      if (sysconfig->domains == NULL) {
+        status = ARES_ENOMEM;
+        goto done;
+      }
+    }
+  }
+
+done:
+  ares_buf_destroy(buf);
+
+  return status;
+}
+#endif
+
 #if defined(CARES_USE_LIBRESOLV)
 static ares_status_t
   ares_init_sysconfig_libresolv(const ares_channel_t *channel,
@@ -516,8 +604,10 @@ ares_status_t ares_init_by_sysconfig(ares_channel_t *channel)
   status = ares_init_sysconfig_macos(channel, &sysconfig);
 #elif defined(CARES_USE_LIBRESOLV)
   status = ares_init_sysconfig_libresolv(channel, &sysconfig);
+#elif defined(__QNX__)
+  status = ares_init_sysconfig_qnx(channel, &sysconfig);
 #else
-  status = ares_init_sysconfig_files(channel, &sysconfig);
+  status = ares_init_sysconfig_files(channel, &sysconfig, ARES_TRUE);
 #endif
 
   if (status != ARES_SUCCESS) {
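Note on the QNX path above: the confstr(_CS_RESOLVE, ...) result is rewritten
so that it can go through the regular resolv.conf line parser. The sketch
below shows only that rewriting step in portable C. The helper name
underscores_to_spaces is hypothetical, and any sample input is made up in the
resolv.conf keyword style; the patch itself does the replacement on an
ares_buf_t via ares_buf_replace and does not show the real QNX string layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: replace every '_' with ' ' in place and return the
 * number of replacements made.  Like the replace call in the patch, this is
 * a blanket substitution, so an underscore inside a value is rewritten too. */
size_t underscores_to_spaces(char *s)
{
  size_t replaced = 0;

  for (; *s != '\0'; s++) {
    if (*s == '_') {
      *s = ' ';
      replaced++;
    }
  }
  return replaced;
}
```

After the substitution the text can be split on newlines and handed to the
same per-line parser that is used for /etc/resolv.conf.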
diff --git a/deps/cares/src/lib/ares_sysconfig_files.c b/deps/cares/src/lib/ares_sysconfig_files.c
index 49bc330d9d346d..a6c2a8e62bb34f 100644
--- a/deps/cares/src/lib/ares_sysconfig_files.c
+++ b/deps/cares/src/lib/ares_sysconfig_files.c
@@ -549,9 +549,9 @@ ares_status_t ares_init_by_environment(ares_sysconfig_t *sysconfig)
 /* This function will only return ARES_SUCCESS or ARES_ENOMEM.  Any other
  * conditions are ignored.  Users may mess up config files, but we want to
  * process anything we can. */
-static ares_status_t parse_resolvconf_line(const ares_channel_t *channel,
-                                           ares_sysconfig_t     *sysconfig,
-                                           ares_buf_t           *line)
+ares_status_t ares_sysconfig_parse_resolv_line(const ares_channel_t *channel,
+                                               ares_sysconfig_t     *sysconfig,
+                                               ares_buf_t           *line)
 {
   char          option[32];
   char          value[512];
@@ -726,9 +726,38 @@ static ares_status_t parse_svcconf_line(const ares_channel_t *channel,
   return status;
 }
 
-typedef ares_status_t (*line_callback_t)(const ares_channel_t *channel,
-                                         ares_sysconfig_t     *sysconfig,
-                                         ares_buf_t           *line);
+
+ares_status_t ares_sysconfig_process_buf(const ares_channel_t    *channel,
+                                         ares_sysconfig_t        *sysconfig,
+                                         ares_buf_t              *buf,
+                                         ares_sysconfig_line_cb_t cb)
+{
+  ares_array_t *lines  = NULL;
+  size_t        num;
+  size_t        i;
+  ares_status_t status;
+
+  status = ares_buf_split(buf, (const unsigned char *)"\n", 1,
+                          ARES_BUF_SPLIT_TRIM, 0, &lines);
+  if (status != ARES_SUCCESS) {
+    goto done;
+  }
+
+  num = ares_array_len(lines);
+  for (i = 0; i < num; i++) {
+    ares_buf_t **bufptr = ares_array_at(lines, i);
+    ares_buf_t  *line   = *bufptr;
+
+    status = cb(channel, sysconfig, line);
+    if (status != ARES_SUCCESS) {
+      goto done;
+    }
+  }
+
+done:
+  ares_array_destroy(lines);
+  return status;
+}
 
 /* Should only return:
  *  ARES_ENOTFOUND - file not found
@@ -737,16 +766,13 @@ typedef ares_status_t (*line_callback_t)(const ares_channel_t *channel,
  *  ARES_SUCCESS   - file processed, doesn't necessarily mean it was a good
  *                   file, but we're not erroring out if we can't parse
  *                   something (or anything at all) */
-static ares_status_t process_config_lines(const ares_channel_t *channel,
-                                          const char           *filename,
-                                          ares_sysconfig_t     *sysconfig,
-                                          line_callback_t       cb)
+static ares_status_t process_config_lines(const ares_channel_t    *channel,
+                                          const char              *filename,
+                                          ares_sysconfig_t        *sysconfig,
+                                          ares_sysconfig_line_cb_t cb)
 {
   ares_status_t status = ARES_SUCCESS;
-  ares_array_t *lines  = NULL;
   ares_buf_t   *buf    = NULL;
-  size_t        num;
-  size_t        i;
 
   buf = ares_buf_create();
   if (buf == NULL) {
@@ -759,43 +785,30 @@ static ares_status_t process_config_lines(const ares_channel_t *channel,
     goto done;
   }
 
-  status = ares_buf_split(buf, (const unsigned char *)"\n", 1,
-                          ARES_BUF_SPLIT_TRIM, 0, &lines);
-  if (status != ARES_SUCCESS) {
-    goto done;
-  }
-
-  num = ares_array_len(lines);
-  for (i = 0; i < num; i++) {
-    ares_buf_t **bufptr = ares_array_at(lines, i);
-    ares_buf_t  *line   = *bufptr;
-
-    status = cb(channel, sysconfig, line);
-    if (status != ARES_SUCCESS) {
-      goto done;
-    }
-  }
+  status = ares_sysconfig_process_buf(channel, sysconfig, buf, cb);
 
 done:
   ares_buf_destroy(buf);
-  ares_array_destroy(lines);
 
   return status;
 }
 
 ares_status_t ares_init_sysconfig_files(const ares_channel_t *channel,
-                                        ares_sysconfig_t     *sysconfig)
+                                        ares_sysconfig_t     *sysconfig,
+                                        ares_bool_t process_resolvconf)
 {
   ares_status_t status = ARES_SUCCESS;
 
   /* Resolv.conf */
-  status = process_config_lines(channel,
-                                (channel->resolvconf_path != NULL)
-                                  ? channel->resolvconf_path
-                                  : PATH_RESOLV_CONF,
-                                sysconfig, parse_resolvconf_line);
-  if (status != ARES_SUCCESS && status != ARES_ENOTFOUND) {
-    goto done;
+  if (process_resolvconf) {
+    status = process_config_lines(channel,
+                                  (channel->resolvconf_path != NULL)
+                                    ? channel->resolvconf_path
+                                    : PATH_RESOLV_CONF,
+                                  sysconfig, ares_sysconfig_parse_resolv_line);
+    if (status != ARES_SUCCESS && status != ARES_ENOTFOUND) {
+      goto done;
+    }
   }
 
   /* Nsswitch.conf */
diff --git a/deps/cares/src/lib/event/ares_event_configchg.c b/deps/cares/src/lib/event/ares_event_configchg.c
index e3e665bd165523..5ecc6888ab719f 100644
--- a/deps/cares/src/lib/event/ares_event_configchg.c
+++ b/deps/cares/src/lib/event/ares_event_configchg.c
@@ -558,14 +558,24 @@ static ares_status_t config_change_check(ares_htable_strvp_t *filestat,
                                          const char          *resolvconf_path)
 {
   size_t      i;
-  const char *configfiles[5];
+  const char *configfiles[16];
   ares_bool_t changed = ARES_FALSE;
+  size_t      cnt = 0;
 
-  configfiles[0] = resolvconf_path;
-  configfiles[1] = "/etc/nsswitch.conf";
-  configfiles[2] = "/etc/netsvc.conf";
-  configfiles[3] = "/etc/svc.conf";
-  configfiles[4] = NULL;
+  memset(configfiles, 0, sizeof(configfiles));
+
+  configfiles[cnt++] = resolvconf_path;
+  configfiles[cnt++] = "/etc/nsswitch.conf";
+#ifdef _AIX
+  configfiles[cnt++] = "/etc/netsvc.conf";
+#endif
+#ifdef __osf /* Tru64 */
+  configfiles[cnt++] = "/etc/svc.conf";
+#endif
+#ifdef __QNX__
+  configfiles[cnt++] = "/etc/net.cfg";
+#endif
+  configfiles[cnt++] = NULL;
 
   for (i = 0; configfiles[i] != NULL; i++) {
     fileinfo_t *fi = ares_htable_strvp_get_direct(filestat, configfiles[i]);
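Note (not part of the patch): the hunk above replaces fixed array indexes with a running counter, so platform-specific entries can be compiled in or out without renumbering the rest. A minimal standalone sketch of that pattern follows. The names build_config_list and MAX_CONFIG_FILES are hypothetical and do not exist in c-ares; only two of the platform guards from the patch are reproduced.

```c
#include <stddef.h>

/* Hypothetical capacity, mirroring the configfiles[16] array in the patch. */
#define MAX_CONFIG_FILES 16

/* Fill `out` with the config files to watch and terminate it with NULL.
 * Returns the number of entries written before the terminator. */
static size_t build_config_list(const char *resolvconf_path,
                                const char *out[MAX_CONFIG_FILES])
{
  size_t cnt = 0;

  out[cnt++] = resolvconf_path;
  out[cnt++] = "/etc/nsswitch.conf";
#ifdef _AIX
  out[cnt++] = "/etc/netsvc.conf";
#endif
#ifdef __QNX__
  out[cnt++] = "/etc/net.cfg";
#endif
  /* The terminator always lands right after the last compiled-in entry. */
  out[cnt] = NULL;

  return cnt;
}
```

Because the loop in config_change_check() stops at the first NULL entry, a NULL resolvconf_path would end the scan immediately, both in this sketch and in the patched code.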
diff --git a/deps/cares/src/lib/include/ares_buf.h b/deps/cares/src/lib/include/ares_buf.h
index 7836a313e066d1..10d29eaf83bd8e 100644
--- a/deps/cares/src/lib/include/ares_buf.h
+++ b/deps/cares/src/lib/include/ares_buf.h
@@ -219,6 +219,26 @@ CARES_EXTERN unsigned char *ares_buf_finish_bin(ares_buf_t *buf, size_t *len);
  */
 CARES_EXTERN char          *ares_buf_finish_str(ares_buf_t *buf, size_t *len);
 
+/*! Replace the given search byte sequence with the replacement byte sequence.
+ *  This is only valid for allocated buffers, not const buffers.  Will replace
+ *  all byte sequences starting at the current offset to the end of the buffer.
+ *
+ *  \param[in]  buf       Initialized buffer object. Can not be a "const" buffer.
+ *  \param[in]  srch      Search byte sequence, must not be NULL.
+ *  \param[in]  srch_size Size of byte sequence, must not be zero.
+ *  \param[in]  rplc      Byte sequence to use as replacement.  May be NULL if
+ *                        rplc_size is zero.
+ *  \param[in]  rplc_size Size of replacement byte sequence, may be 0.
+ *  \return ARES_SUCCESS on success; may return failure only on memory
+ *          allocation failure or misuse.  Does not indicate whether any
+ *          replacements occurred.
+ */
+CARES_EXTERN ares_status_t  ares_buf_replace(ares_buf_t *buf,
+                                             const unsigned char *srch,
+                                             size_t srch_size,
+                                             const unsigned char *rplc,
+                                             size_t rplc_size);
+
 /*! Tag a position to save in the buffer in case parsing needs to rollback,
  *  such as if insufficient data is available, but more data may be added in
  *  the future.  Only a single tag can be set per buffer object.  Setting a
diff --git a/deps/cares/src/lib/include/ares_str.h b/deps/cares/src/lib/include/ares_str.h
index ea75b3b3e7441d..4ee339510bf026 100644
--- a/deps/cares/src/lib/include/ares_str.h
+++ b/deps/cares/src/lib/include/ares_str.h
@@ -29,6 +29,20 @@
 
 CARES_EXTERN char  *ares_strdup(const char *s1);
 
+/*! Scan up to maxlen bytes for the first NUL character and return
+ *  its index, or maxlen if not found.  The function only returns
+ *  maxlen if none of the first maxlen bytes is a NUL character; it
+ *  makes no guarantee for what \c str[maxlen] (if defined) is, and
+ *  does not access it.  It behaves like the POSIX \c strnlen()
+ *  function, except that it returns 0 if the \p str pointer is \c
+ *  NULL.
+ *
+ *  \param[in] str    The string to scan for the NUL character
+ *  \param[in] maxlen The maximum number of bytes to scan
+ *  \return Index of first NUL byte. Between 0 and maxlen (inclusive).
+ */
+CARES_EXTERN size_t ares_strnlen(const char *str, size_t maxlen);
+
 CARES_EXTERN size_t ares_strlen(const char *str);
 
 /*! Copy string from source to destination with destination buffer size
diff --git a/deps/cares/src/lib/record/ares_dns_multistring.c b/deps/cares/src/lib/record/ares_dns_multistring.c
index 57c0d1c0a803ec..44fcaccd65bb6a 100644
--- a/deps/cares/src/lib/record/ares_dns_multistring.c
+++ b/deps/cares/src/lib/record/ares_dns_multistring.c
@@ -146,6 +146,18 @@ ares_status_t ares_dns_multistring_add_own(ares_dns_multistring_t *strs,
     return status;
   }
 
+  /* Issue #921, ares_dns_multistring_get() doesn't have a way to indicate
+   * success or fail on a zero-length string which is actually valid.  So we
+   * are going to allocate a 1-byte buffer to use as a placeholder in this
+   * case */
+  if (str == NULL) {
+    str = ares_malloc_zero(1);
+    if (str == NULL) {
+      ares_array_remove_last(strs->strs);
+      return ARES_ENOMEM;
+    }
+  }
+
   data->data = str;
   data->len  = len;
 
@@ -252,36 +264,38 @@ ares_status_t ares_dns_multistring_parse_buf(ares_buf_t *buf,
       break; /* LCOV_EXCL_LINE: DefensiveCoding */
     }
 
-    if (len) {
-      /* When used by the _str() parser, it really needs to be validated to
-       * be a valid printable ascii string.  Do that here */
-      if (validate_printable && ares_buf_len(buf) >= len) {
-        size_t      mylen;
-        const char *data = (const char *)ares_buf_peek(buf, &mylen);
-        if (!ares_str_isprint(data, len)) {
-          status = ARES_EBADSTR;
-          break;
-        }
+
+    /* When used by the _str() parser, it really needs to be validated to
+     * be a valid printable ascii string.  Do that here */
+    if (len && validate_printable && ares_buf_len(buf) >= len) {
+      size_t      mylen;
+      const char *data = (const char *)ares_buf_peek(buf, &mylen);
+      if (!ares_str_isprint(data, len)) {
+        status = ARES_EBADSTR;
+        break;
       }
+    }
 
-      if (strs != NULL) {
-        unsigned char *data = NULL;
+    if (strs != NULL) {
+      unsigned char *data = NULL;
+      if (len) {
         status = ares_buf_fetch_bytes_dup(buf, len, ARES_TRUE, &data);
         if (status != ARES_SUCCESS) {
           break;
         }
-        status = ares_dns_multistring_add_own(*strs, data, len);
-        if (status != ARES_SUCCESS) {
-          ares_free(data);
-          break;
-        }
-      } else {
-        status = ares_buf_consume(buf, len);
-        if (status != ARES_SUCCESS) {
-          break;
-        }
+      }
+      status = ares_dns_multistring_add_own(*strs, data, len);
+      if (status != ARES_SUCCESS) {
+        ares_free(data);
+        break;
+      }
+    } else {
+      status = ares_buf_consume(buf, len);
+      if (status != ARES_SUCCESS) {
+        break;
       }
     }
+
   }
 
   if (status != ARES_SUCCESS && strs != NULL) {
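Note (not part of the patch): the change above stores a one-byte placeholder for zero-length strings so that a getter returning a pointer can tell "empty string at this index" apart from "no such index". A minimal sketch of the idea, using a hypothetical fixed-size list instead of the real ares_dns_multistring_t; every name below is an assumption for illustration only.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the multistring container. */
typedef struct {
  unsigned char *data;
  size_t         len;
} entry_t;

typedef struct {
  entry_t items[8];
  size_t  cnt;
} strlist_t;

/* Append a string, taking ownership of `str`.  A NULL `str` with len 0 is a
 * valid empty string and gets a zeroed 1-byte placeholder.  Returns 0 on
 * success, -1 on misuse, a full list, or allocation failure. */
static int strlist_add_own(strlist_t *list, unsigned char *str, size_t len)
{
  if (list == NULL || list->cnt >= 8) {
    return -1;
  }
  if (str == NULL) {
    if (len != 0) {
      return -1;
    }
    str = calloc(1, 1);
    if (str == NULL) {
      return -1;
    }
  }
  list->items[list->cnt].data = str;
  list->items[list->cnt].len  = len;
  list->cnt++;
  return 0;
}

/* Returns NULL only when idx is out of range; an empty string yields a
 * non-NULL pointer with *len set to 0. */
static const unsigned char *strlist_get(const strlist_t *list, size_t idx,
                                        size_t *len)
{
  if (list == NULL || len == NULL || idx >= list->cnt) {
    return NULL;
  }
  *len = list->items[idx].len;
  return list->items[idx].data;
}
```

With this layout a caller can treat a NULL return as "index out of range" without also having to inspect the length.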
diff --git a/deps/cares/src/lib/str/ares_buf.c b/deps/cares/src/lib/str/ares_buf.c
index 69e6b38aac849e..63acc6cf7714d3 100644
--- a/deps/cares/src/lib/str/ares_buf.c
+++ b/deps/cares/src/lib/str/ares_buf.c
@@ -1104,6 +1104,72 @@ const unsigned char *ares_buf_peek(const ares_buf_t *buf, size_t *len)
   return ares_buf_fetch(buf, len);
 }
 
+ares_status_t ares_buf_replace(ares_buf_t *buf, const unsigned char *srch,
+                               size_t srch_size, const unsigned char *rplc,
+                               size_t rplc_size)
+{
+  size_t        processed_len = 0;
+  ares_status_t status;
+
+  if (buf->alloc_buf == NULL || srch == NULL || srch_size == 0 ||
+      (rplc == NULL && rplc_size != 0)) {
+    return ARES_EFORMERR;
+  }
+
+  while (1) {
+    unsigned char *ptr           = buf->alloc_buf + buf->offset + processed_len;
+    size_t         remaining_len = buf->data_len - buf->offset - processed_len;
+    size_t         found_offset  = 0;
+    size_t         move_data_len;
+
+    /* Find pattern */
+    ptr = ares_memmem(ptr, remaining_len, srch, srch_size);
+    if (ptr == NULL) {
+      break;
+    }
+
+    /* Store the offset this was found because our actual pointer might be
+     * switched out from under us by the call to ensure_space() if the
+     * replacement pattern is larger than the search pattern */
+    found_offset   = (size_t)(ptr - (buf->alloc_buf + buf->offset));
+    if (rplc_size > srch_size) {
+      status = ares_buf_ensure_space(buf, rplc_size - srch_size);
+      if (status != ARES_SUCCESS) {
+        return status;
+      }
+    }
+
+    /* Impossible, but silence clang */
+    if (buf->alloc_buf == NULL) {
+      return ARES_ENOMEM;
+    }
+
+    /* Recalculate actual pointer */
+    ptr = buf->alloc_buf + buf->offset + found_offset;
+
+    /* Move the data */
+    move_data_len = buf->data_len - buf->offset - found_offset - srch_size;
+    memmove(ptr + rplc_size,
+            ptr + srch_size,
+            move_data_len);
+
+    /* Copy in the replacement data */
+    if (rplc != NULL && rplc_size > 0) {
+      memcpy(ptr, rplc, rplc_size);
+    }
+
+    if (rplc_size > srch_size) {
+      buf->data_len += rplc_size - srch_size;
+    } else {
+      buf->data_len -= srch_size - rplc_size;
+    }
+
+    processed_len = found_offset + rplc_size;
+  }
+
+  return ARES_SUCCESS;
+}
+
 ares_status_t ares_buf_peek_byte(const ares_buf_t *buf, unsigned char *b)
 {
   size_t               remaining_len = 0;
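Note (not part of the patch): ares_buf_replace() above records the match position as an offset because growing the buffer may move its storage, which would invalidate a saved pointer. The sketch below shows the same search, grow, memmove, memcpy sequence on a plain heap buffer. bytebuf_t, find_bytes and bytebuf_replace are hypothetical names; find_bytes is a naive stand-in for ares_memmem(), and realloc() stands in for ares_buf_ensure_space().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ares_buf_t: an owned, growable byte buffer. */
typedef struct {
  unsigned char *data;
  size_t         len; /* bytes in use */
  size_t         cap; /* bytes allocated */
} bytebuf_t;

/* Naive byte-sequence search. */
static unsigned char *find_bytes(unsigned char *hay, size_t hay_len,
                                 const unsigned char *needle,
                                 size_t needle_len)
{
  size_t i;
  if (needle_len == 0 || hay_len < needle_len) {
    return NULL;
  }
  for (i = 0; i + needle_len <= hay_len; i++) {
    if (memcmp(hay + i, needle, needle_len) == 0) {
      return hay + i;
    }
  }
  return NULL;
}

/* Replace every occurrence of srch with rplc.  Returns 0 on success and -1
 * on misuse or allocation failure. */
static int bytebuf_replace(bytebuf_t *buf, const unsigned char *srch,
                           size_t srch_size, const unsigned char *rplc,
                           size_t rplc_size)
{
  size_t processed = 0;

  if (buf == NULL || buf->data == NULL || srch == NULL || srch_size == 0 ||
      (rplc == NULL && rplc_size != 0)) {
    return -1;
  }

  for (;;) {
    size_t         found;
    unsigned char *ptr = find_bytes(buf->data + processed,
                                    buf->len - processed, srch, srch_size);
    if (ptr == NULL) {
      break;
    }

    /* Keep an offset, not a pointer: realloc() may move the storage. */
    found = (size_t)(ptr - buf->data);

    if (rplc_size > srch_size) {
      size_t need = buf->len + (rplc_size - srch_size);
      if (need > buf->cap) {
        unsigned char *tmp = realloc(buf->data, need);
        if (tmp == NULL) {
          return -1;
        }
        buf->data = tmp;
        buf->cap  = need;
      }
    }

    /* Recompute the pointer, then shift the tail and copy the replacement. */
    ptr = buf->data + found;
    memmove(ptr + rplc_size, ptr + srch_size, buf->len - found - srch_size);
    if (rplc_size > 0) {
      memcpy(ptr, rplc, rplc_size);
    }
    buf->len = buf->len - srch_size + rplc_size;

    /* Resume after the replacement so it is never rescanned. */
    processed = found + rplc_size;
  }

  return 0;
}
```

Resuming the scan after the inserted bytes is what keeps a replacement that contains the search pattern from looping forever.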
diff --git a/deps/cares/src/lib/str/ares_str.c b/deps/cares/src/lib/str/ares_str.c
index f6bfabf11f4467..0eda1ab9f15783 100644
--- a/deps/cares/src/lib/str/ares_str.c
+++ b/deps/cares/src/lib/str/ares_str.c
@@ -32,6 +32,23 @@
 #  include <stdint.h>
 #endif
 
+size_t ares_strnlen(const char *str, size_t maxlen) {
+  const char *p = NULL;
+  if (str == NULL) {
+    return 0;
+  }
+#ifdef HAVE_STRNLEN
+  (void)p;
+  return strnlen(str, maxlen);
+#else
+  if ((p = memchr(str, 0, maxlen)) == NULL) {
+    return maxlen;
+  } else {
+    return (size_t)(p - str);
+  }
+#endif /* HAVE_STRNLEN */
+}
+
 size_t ares_strlen(const char *str)
 {
   if (str == NULL) {
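Note (not part of the patch): when HAVE_STRNLEN is not defined, ares_strnlen() above falls back to memchr(). The sketch below isolates that fallback so its edge cases are easy to check. bounded_strlen is a hypothetical name, not a c-ares symbol.

```c
#include <stddef.h>
#include <string.h>

/* Length of str, capped at maxlen.  Returns 0 for a NULL pointer, which is
 * the one deliberate difference from POSIX strnlen(). */
static size_t bounded_strlen(const char *str, size_t maxlen)
{
  const char *p;

  if (str == NULL) {
    return 0;
  }

  /* memchr() stops at the first NUL byte it finds within maxlen bytes. */
  p = (const char *)memchr(str, 0, maxlen);
  if (p == NULL) {
    return maxlen;
  }
  return (size_t)(p - str);
}
```

For example, bounded_strlen("abcdef", 4) yields 4 because no NUL byte occurs in the first four bytes, while bounded_strlen("abc", 8) yields 3.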
diff --git a/deps/cares/src/tools/Makefile.in b/deps/cares/src/tools/Makefile.in
index 9a96a74fa6957d..19e99a253378c7 100644
--- a/deps/cares/src/tools/Makefile.in
+++ b/deps/cares/src/tools/Makefile.in
@@ -91,7 +91,9 @@ host_triplet = @host@
 noinst_PROGRAMS = $(am__EXEEXT_1)
 subdir = src/tools
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
-am__aclocal_m4_deps = $(top_srcdir)/m4/ax_ac_append_to_file.m4 \
+am__aclocal_m4_deps = $(top_srcdir)/m4/ares_check_user_namespace.m4 \
+	$(top_srcdir)/m4/ares_check_uts_namespace.m4 \
+	$(top_srcdir)/m4/ax_ac_append_to_file.m4 \
 	$(top_srcdir)/m4/ax_ac_print_to_file.m4 \
 	$(top_srcdir)/m4/ax_add_am_macro_static.m4 \
 	$(top_srcdir)/m4/ax_am_macros_static.m4 \
@@ -101,8 +103,6 @@ am__aclocal_m4_deps = $(top_srcdir)/m4/ax_ac_append_to_file.m4 \
 	$(top_srcdir)/m4/ax_check_compile_flag.m4 \
 	$(top_srcdir)/m4/ax_check_gnu_make.m4 \
 	$(top_srcdir)/m4/ax_check_link_flag.m4 \
-	$(top_srcdir)/m4/ax_check_user_namespace.m4 \
-	$(top_srcdir)/m4/ax_check_uts_namespace.m4 \
 	$(top_srcdir)/m4/ax_code_coverage.m4 \
 	$(top_srcdir)/m4/ax_compiler_vendor.m4 \
 	$(top_srcdir)/m4/ax_cxx_compile_stdcxx.m4 \

From 5c819f104312e4e04ffc294f6d13634475d89a06 Mon Sep 17 00:00:00 2001
From: Mert Can Altin <mertgold60@gmail.com>
Date: Tue, 17 Dec 2024 13:03:09 +0300
Subject: [PATCH 201/216] tools: add REPLACEME check to workflow

PR-URL: https://github.com/nodejs/node/pull/56251
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
---
 .github/workflows/lint-release-proposal.yml | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/.github/workflows/lint-release-proposal.yml b/.github/workflows/lint-release-proposal.yml
index 5f0f9a87329b17..1ea2b4b1b173e2 100644
--- a/.github/workflows/lint-release-proposal.yml
+++ b/.github/workflows/lint-release-proposal.yml
@@ -57,3 +57,6 @@ jobs:
       - name: Verify NODE_VERSION_IS_RELEASE bit is correctly set
         run: |
           grep -q '^#define NODE_VERSION_IS_RELEASE 1$' src/node_version.h
+      - name: Check for placeholders in documentation
+        run: |
+          ! grep "REPLACEME" doc/api/*.md

From 169bc58447f66dba910997f27c4ca85c498c32b7 Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Tue, 17 Dec 2024 08:07:33 -0500
Subject: [PATCH 202/216] deps: update simdutf to 5.6.4

PR-URL: https://github.com/nodejs/node/pull/56255
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 deps/simdutf/simdutf.cpp | 63241 +++++++++++++++++++++++--------------
 deps/simdutf/simdutf.h   |    37 +-
 2 files changed, 39212 insertions(+), 24066 deletions(-)

diff --git a/deps/simdutf/simdutf.cpp b/deps/simdutf/simdutf.cpp
index 007fa02b165204..eb3e4598407374 100644
--- a/deps/simdutf/simdutf.cpp
+++ b/deps/simdutf/simdutf.cpp
@@ -1,4 +1,4 @@
-/* auto-generated on 2024-11-21 10:33:28 -0500. Do not edit! */
+/* auto-generated on 2024-12-10 14:54:53 -0500. Do not edit! */
 /* begin file src/simdutf.cpp */
 #include "simdutf.h"
 // We include base64_tables once.
@@ -6410,43 +6410,42 @@ SIMDUTF_UNTARGET_REGION
 
 #endif // SIMDUTF_RVV_H
 /* end file src/simdutf/rvv.h */
-/* begin file src/simdutf/fallback.h */
-#ifndef SIMDUTF_FALLBACK_H
-#define SIMDUTF_FALLBACK_H
+/* begin file src/simdutf/lsx.h */
+#ifndef SIMDUTF_LSX_H
+#define SIMDUTF_LSX_H
 
+#ifdef SIMDUTF_FALLBACK_H
+  #error "lsx.h must be included before fallback.h"
+#endif
 
-// Note that fallback.h is always imported last.
 
-// Default Fallback to on unless a builtin implementation has already been
-// selected.
-#ifndef SIMDUTF_IMPLEMENTATION_FALLBACK
-  #if SIMDUTF_CAN_ALWAYS_RUN_ARM64 || SIMDUTF_CAN_ALWAYS_RUN_ICELAKE ||        \
-      SIMDUTF_CAN_ALWAYS_RUN_HASWELL || SIMDUTF_CAN_ALWAYS_RUN_WESTMERE ||     \
-      SIMDUTF_CAN_ALWAYS_RUN_PPC64 || SIMDUTF_CAN_ALWAYS_RUN_RVV
-    #define SIMDUTF_IMPLEMENTATION_FALLBACK 0
-  #else
-    #define SIMDUTF_IMPLEMENTATION_FALLBACK 1
-  #endif
+#ifndef SIMDUTF_IMPLEMENTATION_LSX
+  #define SIMDUTF_IMPLEMENTATION_LSX (SIMDUTF_IS_LSX)
+#endif
+#if SIMDUTF_IMPLEMENTATION_LSX && SIMDUTF_IS_LSX
+  #define SIMDUTF_CAN_ALWAYS_RUN_LSX 1
+#else
+  #define SIMDUTF_CAN_ALWAYS_RUN_LSX 0
 #endif
 
 #define SIMDUTF_CAN_ALWAYS_RUN_FALLBACK (SIMDUTF_IMPLEMENTATION_FALLBACK)
 
-#if SIMDUTF_IMPLEMENTATION_FALLBACK
+#if SIMDUTF_IMPLEMENTATION_LSX
 
 namespace simdutf {
 /**
- * Fallback implementation (runs on any machine).
+ * Implementation for LoongArch SX.
  */
-namespace fallback {} // namespace fallback
+namespace lsx {} // namespace lsx
 } // namespace simdutf
 
-/* begin file src/simdutf/fallback/implementation.h */
-#ifndef SIMDUTF_FALLBACK_IMPLEMENTATION_H
-#define SIMDUTF_FALLBACK_IMPLEMENTATION_H
+/* begin file src/simdutf/lsx/implementation.h */
+#ifndef SIMDUTF_LSX_IMPLEMENTATION_H
+#define SIMDUTF_LSX_IMPLEMENTATION_H
 
 
 namespace simdutf {
-namespace fallback {
+namespace lsx {
 
 namespace {
 using namespace simdutf;
@@ -6455,8 +6454,8 @@ using namespace simdutf;
 class implementation final : public simdutf::implementation {
 public:
   simdutf_really_inline implementation()
-      : simdutf::implementation("fallback", "Generic fallback implementation",
-                                0) {}
+      : simdutf::implementation("lsx", "LOONGARCH SX",
+                                internal::instruction_set::LSX) {}
   simdutf_warn_unused int detect_encodings(const char *input,
                                            size_t length) const noexcept final;
   simdutf_warn_unused bool validate_utf8(const char *buf,
@@ -6541,12 +6540,6 @@ class implementation final : public simdutf::implementation {
       const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
   simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
       const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_utf32_to_utf8(
-      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
-  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
-      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
-      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
   simdutf_warn_unused size_t
   convert_utf32_to_latin1(const char32_t *buf, size_t len,
                           char *latin1_output) const noexcept final;
@@ -6556,6 +6549,12 @@ class implementation final : public simdutf::implementation {
   simdutf_warn_unused size_t
   convert_valid_utf32_to_latin1(const char32_t *buf, size_t len,
                                 char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
   simdutf_warn_unused size_t
   convert_utf32_to_utf16le(const char32_t *buf, size_t len,
                            char16_t *utf16_buffer) const noexcept final;
@@ -6630,3980 +6629,6754 @@ class implementation final : public simdutf::implementation {
   utf8_length_from_latin1(const char *input, size_t length) const noexcept;
   simdutf_warn_unused size_t maximal_binary_length_from_base64(
       const char *input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(
-      const char *input, size_t length, char *output, base64_options options,
-      last_chunk_handling_options last_chunk_options) const noexcept;
-  simdutf_warn_unused full_result base64_to_binary_details(
-      const char *input, size_t length, char *output, base64_options options,
-      last_chunk_handling_options last_chunk_options =
-          last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused result
+  base64_to_binary(const char *input, size_t length, char *output,
+                   base64_options options) const noexcept;
   simdutf_warn_unused size_t maximal_binary_length_from_base64(
       const char16_t *input, size_t length) const noexcept;
-  simdutf_warn_unused result base64_to_binary(
-      const char16_t *input, size_t length, char *output,
-      base64_options options,
-      last_chunk_handling_options last_chunk_options) const noexcept;
+  simdutf_warn_unused result
+  base64_to_binary(const char16_t *input, size_t length, char *output,
+                   base64_options options) const noexcept;
   simdutf_warn_unused size_t base64_length_from_binary(
       size_t length, base64_options options) const noexcept;
-  simdutf_warn_unused full_result base64_to_binary_details(
+  size_t binary_to_base64(const char *input, size_t length, char *output,
+                          base64_options options) const noexcept;
+
+  simdutf_warn_unused virtual result
+  base64_to_binary(const char *input, size_t length, char *output,
+                   base64_options options,
+                   last_chunk_handling_options last_chunk_options =
+                       last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused virtual full_result base64_to_binary_details(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused virtual result
+  base64_to_binary(const char16_t *input, size_t length, char *output,
+                   base64_options options,
+                   last_chunk_handling_options last_chunk_options =
+                       last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused virtual full_result base64_to_binary_details(
       const char16_t *input, size_t length, char *output,
       base64_options options,
       last_chunk_handling_options last_chunk_options =
           last_chunk_handling_options::loose) const noexcept;
-  size_t binary_to_base64(const char *input, size_t length, char *output,
-                          base64_options options) const noexcept;
 };
-} // namespace fallback
+
+} // namespace lsx
 } // namespace simdutf
 
-#endif // SIMDUTF_FALLBACK_IMPLEMENTATION_H
-/* end file src/simdutf/fallback/implementation.h */
+#endif // SIMDUTF_LSX_IMPLEMENTATION_H
+/* end file src/simdutf/lsx/implementation.h */
 
-/* begin file src/simdutf/fallback/begin.h */
-// redefining SIMDUTF_IMPLEMENTATION to "fallback"
-// #define SIMDUTF_IMPLEMENTATION fallback
-/* end file src/simdutf/fallback/begin.h */
+/* begin file src/simdutf/lsx/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "lsx"
+// #define SIMDUTF_IMPLEMENTATION lsx
+/* end file src/simdutf/lsx/begin.h */
 
   // Declarations
-/* begin file src/simdutf/fallback/bitmanipulation.h */
-#ifndef SIMDUTF_FALLBACK_BITMANIPULATION_H
-#define SIMDUTF_FALLBACK_BITMANIPULATION_H
+/* begin file src/simdutf/lsx/intrinsics.h */
+#ifndef SIMDUTF_LSX_INTRINSICS_H
+#define SIMDUTF_LSX_INTRINSICS_H
+
+
+// This should be the correct header whether
+// you use visual studio or other compilers.
+#include <lsxintrin.h>
+
+#endif //  SIMDUTF_LSX_INTRINSICS_H
+/* end file src/simdutf/lsx/intrinsics.h */
+/* begin file src/simdutf/lsx/bitmanipulation.h */
+#ifndef SIMDUTF_LSX_BITMANIPULATION_H
+#define SIMDUTF_LSX_BITMANIPULATION_H
 
 #include <limits>
 
 namespace simdutf {
-namespace fallback {
-namespace {} // unnamed namespace
-} // namespace fallback
-} // namespace simdutf
+namespace lsx {
+namespace {
 
-#endif // SIMDUTF_FALLBACK_BITMANIPULATION_H
-/* end file src/simdutf/fallback/bitmanipulation.h */
+simdutf_really_inline int count_ones(uint64_t input_num) {
+  return __lsx_vpickve2gr_w(__lsx_vpcnt_d(__lsx_vreplgr2vr_d(input_num)), 0);
+}
 
-/* begin file src/simdutf/fallback/end.h */
-/* end file src/simdutf/fallback/end.h */
+#if SIMDUTF_NEED_TRAILING_ZEROES
+simdutf_really_inline int trailing_zeroes(uint64_t input_num) {
+  return __builtin_ctzll(input_num);
+}
+#endif
 
-#endif // SIMDUTF_IMPLEMENTATION_FALLBACK
-#endif // SIMDUTF_FALLBACK_H
-/* end file src/simdutf/fallback.h */
+} // unnamed namespace
+} // namespace lsx
+} // namespace simdutf
 
-/* begin file src/scalar/utf8.h */
-#ifndef SIMDUTF_UTF8_H
-#define SIMDUTF_UTF8_H
+#endif // SIMDUTF_LSX_BITMANIPULATION_H
+/* end file src/simdutf/lsx/bitmanipulation.h */
+/* begin file src/simdutf/lsx/simd.h */
+#ifndef SIMDUTF_LSX_SIMD_H
+#define SIMDUTF_LSX_SIMD_H
+
+#include <type_traits>
 
 namespace simdutf {
-namespace scalar {
+namespace lsx {
 namespace {
-namespace utf8 {
-#if SIMDUTF_IMPLEMENTATION_FALLBACK || SIMDUTF_IMPLEMENTATION_RVV
-// only used by the fallback kernel.
-// credit: based on code from Google Fuchsia (Apache Licensed)
-inline simdutf_warn_unused bool validate(const char *buf, size_t len) noexcept {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  uint64_t pos = 0;
-  uint32_t code_point = 0;
-  while (pos < len) {
-    // check of the next 16 bytes are ascii.
-    uint64_t next_pos = pos + 16;
-    if (next_pos <=
-        len) { // if it is safe to read 16 more bytes, check that they are ascii
-      uint64_t v1;
-      std::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2};
-      if ((v & 0x8080808080808080) == 0) {
-        pos = next_pos;
-        continue;
-      }
-    }
-    unsigned char byte = data[pos];
+namespace simd {
 
-    while (byte < 0b10000000) {
-      if (++pos == len) {
-        return true;
-      }
-      byte = data[pos];
-    }
+template <typename T> struct simd8;
 
-    if ((byte & 0b11100000) == 0b11000000) {
-      next_pos = pos + 2;
-      if (next_pos > len) {
-        return false;
-      }
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return false;
-      }
-      // range check
-      code_point = (byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
-      if ((code_point < 0x80) || (0x7ff < code_point)) {
-        return false;
-      }
-    } else if ((byte & 0b11110000) == 0b11100000) {
-      next_pos = pos + 3;
-      if (next_pos > len) {
-        return false;
-      }
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return false;
-      }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
-        return false;
-      }
-      // range check
-      code_point = (byte & 0b00001111) << 12 |
-                   (data[pos + 1] & 0b00111111) << 6 |
-                   (data[pos + 2] & 0b00111111);
-      if ((code_point < 0x800) || (0xffff < code_point) ||
-          (0xd7ff < code_point && code_point < 0xe000)) {
-        return false;
-      }
-    } else if ((byte & 0b11111000) == 0b11110000) { // 0b11110000
-      next_pos = pos + 4;
-      if (next_pos > len) {
-        return false;
-      }
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return false;
-      }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
-        return false;
-      }
-      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
-        return false;
-      }
-      // range check
-      code_point =
-          (byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 |
-          (data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111);
-      if (code_point <= 0xffff || 0x10ffff < code_point) {
-        return false;
-      }
-    } else {
-      // we may have a continuation
-      return false;
-    }
-    pos = next_pos;
-  }
-  return true;
-}
-#endif
+//
+// Base class of simd8<uint8_t> and simd8<bool>, both of which use __m128i
+// internally.
+//
+template <typename T, typename Mask = simd8<bool>> struct base_u8 {
+  __m128i value;
+  static const int SIZE = sizeof(value);
 
-inline simdutf_warn_unused result validate_with_errors(const char *buf,
-                                                       size_t len) noexcept {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  size_t pos = 0;
-  uint32_t code_point = 0;
-  while (pos < len) {
-    // check of the next 16 bytes are ascii.
-    size_t next_pos = pos + 16;
-    if (next_pos <=
-        len) { // if it is safe to read 16 more bytes, check that they are ascii
-      uint64_t v1;
-      std::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2};
-      if ((v & 0x8080808080808080) == 0) {
-        pos = next_pos;
-        continue;
-      }
-    }
-    unsigned char byte = data[pos];
+  // Conversion from/to SIMD register
+  simdutf_really_inline base_u8(const __m128i _value) : value(_value) {}
+  simdutf_really_inline operator const __m128i &() const { return this->value; }
+  simdutf_really_inline operator __m128i &() { return this->value; }
+  simdutf_really_inline T first() const {
+    return __lsx_vpickve2gr_bu(this->value, 0);
+  }
+  simdutf_really_inline T last() const {
+    return __lsx_vpickve2gr_bu(this->value, 15);
+  }
 
-    while (byte < 0b10000000) {
-      if (++pos == len) {
-        return result(error_code::SUCCESS, len);
-      }
-      byte = data[pos];
-    }
+  // Bit operations
+  simdutf_really_inline simd8<T> operator|(const simd8<T> other) const {
+    return __lsx_vor_v(this->value, other);
+  }
+  simdutf_really_inline simd8<T> operator&(const simd8<T> other) const {
+    return __lsx_vand_v(this->value, other);
+  }
+  simdutf_really_inline simd8<T> operator^(const simd8<T> other) const {
+    return __lsx_vxor_v(this->value, other);
+  }
+  simdutf_really_inline simd8<T> bit_andnot(const simd8<T> other) const {
+    return __lsx_vandn_v(this->value, other);
+  }
+  simdutf_really_inline simd8<T> operator~() const { return *this ^ 0xFFu; }
+  simdutf_really_inline simd8<T> &operator|=(const simd8<T> other) {
+    auto this_cast = static_cast<simd8<T> *>(this);
+    *this_cast = *this_cast | other;
+    return *this_cast;
+  }
+  simdutf_really_inline simd8<T> &operator&=(const simd8<T> other) {
+    auto this_cast = static_cast<simd8<T> *>(this);
+    *this_cast = *this_cast & other;
+    return *this_cast;
+  }
+  simdutf_really_inline simd8<T> &operator^=(const simd8<T> other) {
+    auto this_cast = static_cast<simd8<T> *>(this);
+    *this_cast = *this_cast ^ other;
+    return *this_cast;
+  }
 
-    if ((byte & 0b11100000) == 0b11000000) {
-      next_pos = pos + 2;
-      if (next_pos > len) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      // range check
-      code_point = (byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
-      if ((code_point < 0x80) || (0x7ff < code_point)) {
-        return result(error_code::OVERLONG, pos);
-      }
-    } else if ((byte & 0b11110000) == 0b11100000) {
-      next_pos = pos + 3;
-      if (next_pos > len) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      // range check
-      code_point = (byte & 0b00001111) << 12 |
-                   (data[pos + 1] & 0b00111111) << 6 |
-                   (data[pos + 2] & 0b00111111);
-      if ((code_point < 0x800) || (0xffff < code_point)) {
-        return result(error_code::OVERLONG, pos);
-      }
-      if (0xd7ff < code_point && code_point < 0xe000) {
-        return result(error_code::SURROGATE, pos);
-      }
-    } else if ((byte & 0b11111000) == 0b11110000) { // 0b11110000
-      next_pos = pos + 4;
-      if (next_pos > len) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      // range check
-      code_point =
-          (byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 |
-          (data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111);
-      if (code_point <= 0xffff) {
-        return result(error_code::OVERLONG, pos);
-      }
-      if (0x10ffff < code_point) {
-        return result(error_code::TOO_LARGE, pos);
-      }
-    } else {
-      // we either have too many continuation bytes or an invalid leading byte
-      if ((byte & 0b11000000) == 0b10000000) {
-        return result(error_code::TOO_LONG, pos);
-      } else {
-        return result(error_code::HEADER_BITS, pos);
-      }
-    }
-    pos = next_pos;
+  friend simdutf_really_inline Mask operator==(const simd8<T> lhs,
+                                               const simd8<T> rhs) {
+    return __lsx_vseq_b(lhs, rhs);
   }
-  return result(error_code::SUCCESS, len);
-}
 
-// Finds the previous leading byte starting backward from buf and validates with
-// errors from there Used to pinpoint the location of an error when an invalid
-// chunk is detected We assume that the stream starts with a leading byte, and
-// to check that it is the case, we ask that you pass a pointer to the start of
-// the stream (start).
-inline simdutf_warn_unused result rewind_and_validate_with_errors(
-    const char *start, const char *buf, size_t len) noexcept {
-  // First check that we start with a leading byte
-  if ((*start & 0b11000000) == 0b10000000) {
-    return result(error_code::TOO_LONG, 0);
+  template <int N = 1>
+  simdutf_really_inline simd8<T> prev(const simd8<T> prev_chunk) const {
+    return __lsx_vor_v(__lsx_vbsll_v(this->value, N),
+                       __lsx_vbsrl_v(prev_chunk.value, 16 - N));
   }
-  size_t extra_len{0};
-  // A leading byte cannot be further than 4 bytes away
-  for (int i = 0; i < 5; i++) {
-    unsigned char byte = *buf;
-    if ((byte & 0b11000000) != 0b10000000) {
-      break;
-    } else {
-      buf--;
-      extra_len++;
-    }
+};
+
+// SIMD byte mask type (returned by things like eq and gt)
+template <> struct simd8<bool> : base_u8<bool> {
+  typedef uint16_t bitmask_t;
+  typedef uint32_t bitmask2_t;
+
+  static simdutf_really_inline simd8<bool> splat(bool _value) {
+    return __lsx_vreplgr2vr_b(uint8_t(-(!!_value)));
   }
 
-  result res = validate_with_errors(buf, len + extra_len);
-  res.count -= extra_len;
-  return res;
-}
+  simdutf_really_inline simd8(const __m128i _value) : base_u8<bool>(_value) {}
+  // False constructor
+  simdutf_really_inline simd8() : simd8(__lsx_vldi(0)) {}
+  // Splat constructor
+  simdutf_really_inline simd8(bool _value) : simd8(splat(_value)) {}
+  simdutf_really_inline void store(uint8_t dst[16]) const {
+    return __lsx_vst(this->value, dst, 0);
+  }
 
-inline size_t count_code_points(const char *buf, size_t len) {
-  const int8_t *p = reinterpret_cast<const int8_t *>(buf);
-  size_t counter{0};
-  for (size_t i = 0; i < len; i++) {
-    // -65 is 0b10111111, anything larger in two-complement's should start a new
-    // code point.
-    if (p[i] > -65) {
-      counter++;
-    }
+  simdutf_really_inline uint32_t to_bitmask() const {
+    return __lsx_vpickve2gr_wu(__lsx_vmsknz_b(*this), 0);
   }
-  return counter;
-}
 
-inline size_t utf16_length_from_utf8(const char *buf, size_t len) {
-  const int8_t *p = reinterpret_cast<const int8_t *>(buf);
-  size_t counter{0};
-  for (size_t i = 0; i < len; i++) {
-    if (p[i] > -65) {
-      counter++;
-    }
-    if (uint8_t(p[i]) >= 240) {
-      counter++;
-    }
+  simdutf_really_inline bool any() const {
+    return __lsx_vpickve2gr_hu(__lsx_vmsknz_b(*this), 0) != 0;
   }
-  return counter;
-}
+  simdutf_really_inline bool none() const {
+    return __lsx_vpickve2gr_hu(__lsx_vmsknz_b(*this), 0) == 0;
+  }
+  simdutf_really_inline bool all() const {
+    return __lsx_vpickve2gr_hu(__lsx_vmsknz_b(*this), 0) == 0xFFFF;
+  }
+};
 
-simdutf_warn_unused inline size_t trim_partial_utf8(const char *input,
-                                                    size_t length) {
-  if (length < 3) {
-    switch (length) {
-    case 2:
-      if (uint8_t(input[length - 1]) >= 0xc0) {
-        return length - 1;
-      } // 2-, 3- and 4-byte characters with only 1 byte left
-      if (uint8_t(input[length - 2]) >= 0xe0) {
-        return length - 2;
-      } // 3- and 4-byte characters with only 2 bytes left
-      return length;
-    case 1:
-      if (uint8_t(input[length - 1]) >= 0xc0) {
-        return length - 1;
-      } // 2-, 3- and 4-byte characters with only 1 byte left
-      return length;
-    case 0:
-      return length;
-    }
+// Unsigned bytes
+template <> struct simd8<uint8_t> : base_u8<uint8_t> {
+  static simdutf_really_inline simd8<uint8_t> splat(uint8_t _value) {
+    return __lsx_vreplgr2vr_b(_value);
   }
-  if (uint8_t(input[length - 1]) >= 0xc0) {
-    return length - 1;
-  } // 2-, 3- and 4-byte characters with only 1 byte left
-  if (uint8_t(input[length - 2]) >= 0xe0) {
-    return length - 2;
-  } // 3- and 4-byte characters with only 1 byte left
-  if (uint8_t(input[length - 3]) >= 0xf0) {
-    return length - 3;
-  } // 4-byte characters with only 3 bytes left
-  return length;
-}
+  static simdutf_really_inline simd8<uint8_t> zero() { return __lsx_vldi(0); }
+  static simdutf_really_inline simd8<uint8_t> load(const uint8_t *values) {
+    return __lsx_vld(values, 0);
+  }
+  simdutf_really_inline simd8(const __m128i _value)
+      : base_u8<uint8_t>(_value) {}
+  // Zero constructor
+  simdutf_really_inline simd8() : simd8(zero()) {}
+  // Array constructor
+  simdutf_really_inline simd8(const uint8_t values[16]) : simd8(load(values)) {}
+  // Splat constructor
+  simdutf_really_inline simd8(uint8_t _value) : simd8(splat(_value)) {}
+  // Member-by-member initialization
 
-} // namespace utf8
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+  simdutf_really_inline
+  simd8(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4, uint8_t v5,
+        uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9, uint8_t v10,
+        uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15)
+      : simd8((__m128i)v16u8{v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11,
+                             v12, v13, v14, v15}) {}
 
-#endif
-/* end file src/scalar/utf8.h */
-/* begin file src/scalar/utf16.h */
-#ifndef SIMDUTF_UTF16_H
-#define SIMDUTF_UTF16_H
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  simdutf_really_inline static simd8<uint8_t>
+  repeat_16(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4,
+            uint8_t v5, uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9,
+            uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14,
+            uint8_t v15) {
+    return simd8<uint8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                          v13, v14, v15);
+  }
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf16 {
+  // Store to array
+  simdutf_really_inline void store(uint8_t dst[16]) const {
+    return __lsx_vst(this->value, dst, 0);
+  }
 
-inline simdutf_warn_unused uint16_t swap_bytes(const uint16_t word) {
-  return uint16_t((word >> 8) | (word << 8));
-}
+  // Saturated math
+  simdutf_really_inline simd8<uint8_t>
+  saturating_add(const simd8<uint8_t> other) const {
+    return __lsx_vsadd_bu(this->value, other);
+  }
+  simdutf_really_inline simd8<uint8_t>
+  saturating_sub(const simd8<uint8_t> other) const {
+    return __lsx_vssub_bu(this->value, other);
+  }
 
-template <endianness big_endian>
-inline simdutf_warn_unused bool validate(const char16_t *buf,
-                                         size_t len) noexcept {
-  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
-  uint64_t pos = 0;
-  while (pos < len) {
-    uint16_t word =
-        !match_system(big_endian) ? swap_bytes(data[pos]) : data[pos];
-    if ((word & 0xF800) == 0xD800) {
-      if (pos + 1 >= len) {
-        return false;
-      }
-      uint16_t diff = uint16_t(word - 0xD800);
-      if (diff > 0x3FF) {
-        return false;
-      }
-      uint16_t next_word =
-          !match_system(big_endian) ? swap_bytes(data[pos + 1]) : data[pos + 1];
-      uint16_t diff2 = uint16_t(next_word - 0xDC00);
-      if (diff2 > 0x3FF) {
-        return false;
-      }
-      pos += 2;
-    } else {
-      pos++;
-    }
+  // Addition/subtraction are the same for signed and unsigned
+  simdutf_really_inline simd8<uint8_t>
+  operator+(const simd8<uint8_t> other) const {
+    return __lsx_vadd_b(this->value, other);
+  }
+  simdutf_really_inline simd8<uint8_t>
+  operator-(const simd8<uint8_t> other) const {
+    return __lsx_vsub_b(this->value, other);
+  }
+  simdutf_really_inline simd8<uint8_t> &operator+=(const simd8<uint8_t> other) {
+    *this = *this + other;
+    return *this;
+  }
+  simdutf_really_inline simd8<uint8_t> &operator-=(const simd8<uint8_t> other) {
+    *this = *this - other;
+    return *this;
   }
-  return true;
-}
 
-template <endianness big_endian>
-inline simdutf_warn_unused result validate_with_errors(const char16_t *buf,
-                                                       size_t len) noexcept {
-  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
-  size_t pos = 0;
-  while (pos < len) {
-    uint16_t word =
-        !match_system(big_endian) ? swap_bytes(data[pos]) : data[pos];
-    if ((word & 0xF800) == 0xD800) {
-      if (pos + 1 >= len) {
-        return result(error_code::SURROGATE, pos);
-      }
-      uint16_t diff = uint16_t(word - 0xD800);
-      if (diff > 0x3FF) {
-        return result(error_code::SURROGATE, pos);
-      }
-      uint16_t next_word =
-          !match_system(big_endian) ? swap_bytes(data[pos + 1]) : data[pos + 1];
-      uint16_t diff2 = uint16_t(next_word - 0xDC00);
-      if (diff2 > 0x3FF) {
-        return result(error_code::SURROGATE, pos);
-      }
-      pos += 2;
+  // Order-specific operations
+  simdutf_really_inline simd8<uint8_t>
+  max_val(const simd8<uint8_t> other) const {
+    return __lsx_vmax_bu(*this, other);
+  }
+  simdutf_really_inline simd8<uint8_t>
+  min_val(const simd8<uint8_t> other) const {
+    return __lsx_vmin_bu(*this, other);
+  }
+  simdutf_really_inline simd8<bool>
+  operator<=(const simd8<uint8_t> other) const {
+    return __lsx_vsle_bu(*this, other);
+  }
+  simdutf_really_inline simd8<bool>
+  operator>=(const simd8<uint8_t> other) const {
+    return __lsx_vsle_bu(other, *this);
+  }
+  simdutf_really_inline simd8<bool>
+  operator<(const simd8<uint8_t> other) const {
+    return __lsx_vslt_bu(*this, other);
+  }
+  simdutf_really_inline simd8<bool>
+  operator>(const simd8<uint8_t> other) const {
+    return __lsx_vslt_bu(other, *this);
+  }
+  // Same as >, but instead of guaranteeing all 1's == true, false = 0 and true
+  // = nonzero. For LSX, returns all 1's.
+  simdutf_really_inline simd8<uint8_t>
+  gt_bits(const simd8<uint8_t> other) const {
+    return simd8<uint8_t>(*this > other);
+  }
+  // Same as <, but instead of guaranteeing all 1's == true, false = 0 and true
+  // = nonzero. For LSX, returns all 1's.
+  simdutf_really_inline simd8<uint8_t>
+  lt_bits(const simd8<uint8_t> other) const {
+    return simd8<uint8_t>(*this < other);
+  }
+
+  // Bit-specific operations
+  simdutf_really_inline simd8<bool> any_bits_set(simd8<uint8_t> bits) const {
+    return __lsx_vslt_bu(__lsx_vldi(0), __lsx_vand_v(this->value, bits));
+  }
+  simdutf_really_inline bool is_ascii() const {
+    return __lsx_vpickve2gr_hu(__lsx_vmskgez_b(this->value), 0) == 0xFFFF;
+  }
+
+  simdutf_really_inline bool any_bits_set_anywhere() const {
+    return __lsx_vpickve2gr_hu(__lsx_vmsknz_b(this->value), 0) > 0;
+  }
+  simdutf_really_inline bool any_bits_set_anywhere(simd8<uint8_t> bits) const {
+    return (*this & bits).any_bits_set_anywhere();
+  }
+  template <int N> simdutf_really_inline simd8<uint8_t> shr() const {
+    return __lsx_vsrli_b(this->value, N);
+  }
+  template <int N> simdutf_really_inline simd8<uint8_t> shl() const {
+    return __lsx_vslli_b(this->value, N);
+  }
+
+  // Perform a lookup assuming the value is between 0 and 15 (undefined behavior
+  // for out of range values)
+  template <typename L>
+  simdutf_really_inline simd8<L> lookup_16(simd8<L> lookup_table) const {
+    return lookup_table.apply_lookup_16_to(*this);
+  }
+
+  template <typename L>
+  simdutf_really_inline simd8<L>
+  lookup_16(L replace0, L replace1, L replace2, L replace3, L replace4,
+            L replace5, L replace6, L replace7, L replace8, L replace9,
+            L replace10, L replace11, L replace12, L replace13, L replace14,
+            L replace15) const {
+    return lookup_16(simd8<L>::repeat_16(
+        replace0, replace1, replace2, replace3, replace4, replace5, replace6,
+        replace7, replace8, replace9, replace10, replace11, replace12,
+        replace13, replace14, replace15));
+  }
+
+  template <typename T>
+  simdutf_really_inline simd8<uint8_t>
+  apply_lookup_16_to(const simd8<T> original) const {
+    __m128i original_tmp = __lsx_vand_v(original, __lsx_vldi(0x1f));
+    return __lsx_vshuf_b(__lsx_vldi(0), *this, simd8<uint8_t>(original_tmp));
+  }
+};
+
+// Signed bytes
+template <> struct simd8<int8_t> {
+  __m128i value;
+
+  static simdutf_really_inline simd8<int8_t> splat(int8_t _value) {
+    return __lsx_vreplgr2vr_b(_value);
+  }
+  static simdutf_really_inline simd8<int8_t> zero() { return __lsx_vldi(0); }
+  static simdutf_really_inline simd8<int8_t> load(const int8_t values[16]) {
+    return __lsx_vld(values, 0);
+  }
+
+  template <endianness big_endian>
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *p) const {
+    __m128i zero = __lsx_vldi(0);
+    if (match_system(big_endian)) {
+      __lsx_vst(__lsx_vilvl_b(zero, (__m128i)this->value),
+                reinterpret_cast<uint16_t *>(p), 0);
+      __lsx_vst(__lsx_vilvh_b(zero, (__m128i)this->value),
+                reinterpret_cast<uint16_t *>(p + 8), 0);
     } else {
-      pos++;
+      __lsx_vst(__lsx_vilvl_b((__m128i)this->value, zero),
+                reinterpret_cast<uint16_t *>(p), 0);
+      __lsx_vst(__lsx_vilvh_b((__m128i)this->value, zero),
+                reinterpret_cast<uint16_t *>(p + 8), 0);
     }
   }
-  return result(error_code::SUCCESS, pos);
-}
 
-template <endianness big_endian>
-inline size_t count_code_points(const char16_t *buf, size_t len) {
-  // We are not BOM aware.
-  const uint16_t *p = reinterpret_cast<const uint16_t *>(buf);
-  size_t counter{0};
-  for (size_t i = 0; i < len; i++) {
-    uint16_t word = !match_system(big_endian) ? swap_bytes(p[i]) : p[i];
-    counter += ((word & 0xFC00) != 0xDC00);
+  simdutf_really_inline void store_ascii_as_utf32(char32_t *p) const {
+    __m128i zero = __lsx_vldi(0);
+    __m128i in16low = __lsx_vilvl_b(zero, (__m128i)this->value);
+    __m128i in16high = __lsx_vilvh_b(zero, (__m128i)this->value);
+    __m128i in32_0 = __lsx_vilvl_h(zero, in16low);
+    __m128i in32_1 = __lsx_vilvh_h(zero, in16low);
+    __m128i in32_2 = __lsx_vilvl_h(zero, in16high);
+    __m128i in32_3 = __lsx_vilvh_h(zero, in16high);
+    __lsx_vst(in32_0, reinterpret_cast<uint32_t *>(p), 0);
+    __lsx_vst(in32_1, reinterpret_cast<uint32_t *>(p + 4), 0);
+    __lsx_vst(in32_2, reinterpret_cast<uint32_t *>(p + 8), 0);
+    __lsx_vst(in32_3, reinterpret_cast<uint32_t *>(p + 12), 0);
   }
-  return counter;
-}
 
-template <endianness big_endian>
-inline size_t utf8_length_from_utf16(const char16_t *buf, size_t len) {
-  // We are not BOM aware.
-  const uint16_t *p = reinterpret_cast<const uint16_t *>(buf);
-  size_t counter{0};
-  for (size_t i = 0; i < len; i++) {
-    uint16_t word = !match_system(big_endian) ? swap_bytes(p[i]) : p[i];
-    counter++; // ASCII
-    counter += static_cast<size_t>(
-        word >
-        0x7F); // non-ASCII is at least 2 bytes, surrogates are 2*2 == 4 bytes
-    counter += static_cast<size_t>((word > 0x7FF && word <= 0xD7FF) ||
-                                   (word >= 0xE000)); // three-byte
-  }
-  return counter;
-}
+  // In places where the table can be reused, which is most uses in simdutf, it
+  // is worth it to do 4 table lookups, as there is no direct zero extension
+  // from u8 to u32.
+  simdutf_really_inline void store_ascii_as_utf32_tbl(char32_t *p) const {
+    const simd8<uint8_t> tb1{0, 255, 255, 255, 1, 255, 255, 255,
+                             2, 255, 255, 255, 3, 255, 255, 255};
+    const simd8<uint8_t> tb2{4, 255, 255, 255, 5, 255, 255, 255,
+                             6, 255, 255, 255, 7, 255, 255, 255};
+    const simd8<uint8_t> tb3{8,  255, 255, 255, 9,  255, 255, 255,
+                             10, 255, 255, 255, 11, 255, 255, 255};
+    const simd8<uint8_t> tb4{12, 255, 255, 255, 13, 255, 255, 255,
+                             14, 255, 255, 255, 15, 255, 255, 255};
 
-template <endianness big_endian>
-inline size_t utf32_length_from_utf16(const char16_t *buf, size_t len) {
-  // We are not BOM aware.
-  const uint16_t *p = reinterpret_cast<const uint16_t *>(buf);
-  size_t counter{0};
-  for (size_t i = 0; i < len; i++) {
-    uint16_t word = !match_system(big_endian) ? swap_bytes(p[i]) : p[i];
-    counter += ((word & 0xFC00) != 0xDC00);
+    // encourage store pairing and interleaving
+    const auto shuf1 = this->apply_lookup_16_to(tb1);
+    const auto shuf2 = this->apply_lookup_16_to(tb2);
+    shuf1.store(reinterpret_cast<int8_t *>(p));
+    shuf2.store(reinterpret_cast<int8_t *>(p + 4));
+
+    const auto shuf3 = this->apply_lookup_16_to(tb3);
+    const auto shuf4 = this->apply_lookup_16_to(tb4);
+    shuf3.store(reinterpret_cast<int8_t *>(p + 8));
+    shuf4.store(reinterpret_cast<int8_t *>(p + 12));
   }
-  return counter;
-}
+  // Conversion from/to SIMD register
+  simdutf_really_inline simd8(const __m128i _value) : value(_value) {}
+  simdutf_really_inline operator const __m128i &() const { return this->value; }
 
-inline size_t latin1_length_from_utf16(size_t len) { return len; }
+  simdutf_really_inline operator const __m128i() const { return this->value; }
 
-simdutf_really_inline void change_endianness_utf16(const char16_t *in,
-                                                   size_t size, char16_t *out) {
-  const uint16_t *input = reinterpret_cast<const uint16_t *>(in);
-  uint16_t *output = reinterpret_cast<uint16_t *>(out);
-  for (size_t i = 0; i < size; i++) {
-    *output++ = uint16_t(input[i] >> 8 | input[i] << 8);
+  simdutf_really_inline operator __m128i &() { return this->value; }
+
+  // Zero constructor
+  simdutf_really_inline simd8() : simd8(zero()) {}
+  // Splat constructor
+  simdutf_really_inline simd8(int8_t _value) : simd8(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd8(const int8_t *values) : simd8(load(values)) {}
+  // Member-by-member initialization
+
+  simdutf_really_inline simd8(int8_t v0, int8_t v1, int8_t v2, int8_t v3,
+                              int8_t v4, int8_t v5, int8_t v6, int8_t v7,
+                              int8_t v8, int8_t v9, int8_t v10, int8_t v11,
+                              int8_t v12, int8_t v13, int8_t v14, int8_t v15)
+      : simd8((__m128i)v16i8{v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11,
+                             v12, v13, v14, v15}) {}
+
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  simdutf_really_inline static simd8<int8_t>
+  repeat_16(int8_t v0, int8_t v1, int8_t v2, int8_t v3, int8_t v4, int8_t v5,
+            int8_t v6, int8_t v7, int8_t v8, int8_t v9, int8_t v10, int8_t v11,
+            int8_t v12, int8_t v13, int8_t v14, int8_t v15) {
+    return simd8<int8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                         v13, v14, v15);
   }
-}
 
-template <endianness big_endian>
-simdutf_warn_unused inline size_t trim_partial_utf16(const char16_t *input,
-                                                     size_t length) {
-  if (length <= 1) {
-    return length;
+  // Store to array
+  simdutf_really_inline void store(int8_t dst[16]) const {
+    return __lsx_vst(value, dst, 0);
   }
-  uint16_t last_word = uint16_t(input[length - 1]);
-  last_word = !match_system(big_endian) ? swap_bytes(last_word) : last_word;
-  length -= ((last_word & 0xFC00) == 0xD800);
-  return length;
-}
 
-} // namespace utf16
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+  simdutf_really_inline operator simd8<uint8_t>() const {
+    return ((__m128i)this->value);
+  }
 
-#endif
-/* end file src/scalar/utf16.h */
-/* begin file src/scalar/utf32.h */
-#ifndef SIMDUTF_UTF32_H
-#define SIMDUTF_UTF32_H
+  simdutf_really_inline simd8<int8_t>
+  operator|(const simd8<int8_t> other) const {
+    return __lsx_vor_v((__m128i)value, (__m128i)other.value);
+  }
+  simdutf_really_inline simd8<int8_t>
+  operator&(const simd8<int8_t> other) const {
+    return __lsx_vand_v((__m128i)value, (__m128i)other.value);
+  }
+  simdutf_really_inline simd8<int8_t>
+  operator^(const simd8<int8_t> other) const {
+    return __lsx_vxor_v((__m128i)value, (__m128i)other.value);
+  }
+  simdutf_really_inline simd8<int8_t>
+  bit_andnot(const simd8<int8_t> other) const {
+    return __lsx_vandn_v((__m128i)other.value, (__m128i)value);
+  }
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf32 {
+  // Math
+  simdutf_really_inline simd8<int8_t>
+  operator+(const simd8<int8_t> other) const {
+    return __lsx_vadd_b((__m128i)value, (__m128i)other.value);
+  }
+  simdutf_really_inline simd8<int8_t>
+  operator-(const simd8<int8_t> other) const {
+    return __lsx_vsub_b((__m128i)value, (__m128i)other.value);
+  }
+  simdutf_really_inline simd8<int8_t> &operator+=(const simd8<int8_t> other) {
+    *this = *this + other;
+    return *this;
+  }
+  simdutf_really_inline simd8<int8_t> &operator-=(const simd8<int8_t> other) {
+    *this = *this - other;
+    return *this;
+  }
 
-inline simdutf_warn_unused bool validate(const char32_t *buf,
-                                         size_t len) noexcept {
-  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  uint64_t pos = 0;
-  for (; pos < len; pos++) {
-    uint32_t word = data[pos];
-    if (word > 0x10FFFF || (word >= 0xD800 && word <= 0xDFFF)) {
-      return false;
-    }
+  simdutf_really_inline bool is_ascii() const {
+    return (__lsx_vpickve2gr_hu(__lsx_vmskgez_b((__m128i)this->value), 0) ==
+            0xffff);
   }
-  return true;
-}
 
-inline simdutf_warn_unused result validate_with_errors(const char32_t *buf,
-                                                       size_t len) noexcept {
-  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  size_t pos = 0;
-  for (; pos < len; pos++) {
-    uint32_t word = data[pos];
-    if (word > 0x10FFFF) {
-      return result(error_code::TOO_LARGE, pos);
-    }
-    if (word >= 0xD800 && word <= 0xDFFF) {
-      return result(error_code::SURROGATE, pos);
-    }
+  // Order-sensitive comparisons
+  simdutf_really_inline simd8<int8_t> max_val(const simd8<int8_t> other) const {
+    return __lsx_vmax_b((__m128i)value, (__m128i)other.value);
+  }
+  simdutf_really_inline simd8<int8_t> min_val(const simd8<int8_t> other) const {
+    return __lsx_vmin_b((__m128i)value, (__m128i)other.value);
+  }
+  simdutf_really_inline simd8<bool> operator>(const simd8<int8_t> other) const {
+    return __lsx_vslt_b((__m128i)other.value, (__m128i)value);
+  }
+  simdutf_really_inline simd8<bool> operator<(const simd8<int8_t> other) const {
+    return __lsx_vslt_b((__m128i)value, (__m128i)other.value);
+  }
+  simdutf_really_inline simd8<bool>
+  operator==(const simd8<int8_t> other) const {
+    return __lsx_vseq_b((__m128i)value, (__m128i)other.value);
   }
-  return result(error_code::SUCCESS, pos);
-}
 
-inline size_t utf8_length_from_utf32(const char32_t *buf, size_t len) {
-  // We are not BOM aware.
-  const uint32_t *p = reinterpret_cast<const uint32_t *>(buf);
-  size_t counter{0};
-  for (size_t i = 0; i < len; i++) {
-    // credit: @ttsugriy  for the vectorizable approach
-    counter++;                                     // ASCII
-    counter += static_cast<size_t>(p[i] > 0x7F);   // two-byte
-    counter += static_cast<size_t>(p[i] > 0x7FF);  // three-byte
-    counter += static_cast<size_t>(p[i] > 0xFFFF); // four-bytes
+  template <int N = 1>
+  simdutf_really_inline simd8<int8_t>
+  prev(const simd8<int8_t> prev_chunk) const {
+    return __lsx_vor_v(__lsx_vbsll_v(this->value, N),
+                       __lsx_vbsrl_v(prev_chunk.value, 16 - N));
   }
-  return counter;
-}
 
-inline size_t utf16_length_from_utf32(const char32_t *buf, size_t len) {
-  // We are not BOM aware.
-  const uint32_t *p = reinterpret_cast<const uint32_t *>(buf);
-  size_t counter{0};
-  for (size_t i = 0; i < len; i++) {
-    counter++;                                     // non-surrogate word
-    counter += static_cast<size_t>(p[i] > 0xFFFF); // surrogate pair
+  // Perform a lookup assuming no value is larger than 15
+  template <typename L>
+  simdutf_really_inline simd8<L> lookup_16(simd8<L> lookup_table) const {
+    return lookup_table.apply_lookup_16_to(*this);
+  }
+  template <typename L>
+  simdutf_really_inline simd8<L>
+  lookup_16(L replace0, L replace1, L replace2, L replace3, L replace4,
+            L replace5, L replace6, L replace7, L replace8, L replace9,
+            L replace10, L replace11, L replace12, L replace13, L replace14,
+            L replace15) const {
+    return lookup_16(simd8<L>::repeat_16(
+        replace0, replace1, replace2, replace3, replace4, replace5, replace6,
+        replace7, replace8, replace9, replace10, replace11, replace12,
+        replace13, replace14, replace15));
   }
-  return counter;
-}
 
-inline size_t latin1_length_from_utf32(size_t len) {
-  // We are not BOM aware.
-  return len; // a utf32 codepoint will always represent 1 latin1 character
-}
+  template <typename T>
+  simdutf_really_inline simd8<int8_t>
+  apply_lookup_16_to(const simd8<T> original) const {
+    __m128i original_tmp = __lsx_vand_v(original, __lsx_vldi(0x1f));
+    return __lsx_vshuf_b(__lsx_vldi(0), (__m128i)this->value,
+                         simd8<uint8_t>(original_tmp));
+  }
+};
 
-inline simdutf_warn_unused uint32_t swap_bytes(const uint32_t word) {
-  return ((word >> 24) & 0xff) |      // move byte 3 to byte 0
-         ((word << 8) & 0xff0000) |   // move byte 1 to byte 2
-         ((word >> 8) & 0xff00) |     // move byte 2 to byte 1
-         ((word << 24) & 0xff000000); // byte 0 to byte 3
-}
+template <typename T> struct simd8x64 {
+  static constexpr int NUM_CHUNKS = 64 / sizeof(simd8<T>);
+  static_assert(
+      NUM_CHUNKS == 4,
+      "LoongArch kernel should use four registers per 64-byte block.");
+  simd8<T> chunks[NUM_CHUNKS];
 
-} // namespace utf32
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+  simd8x64(const simd8x64<T> &o) = delete; // no copy allowed
+  simd8x64<T> &
+  operator=(const simd8<T> other) = delete; // no assignment allowed
+  simd8x64() = delete;                      // no default constructor allowed
 
-#endif
-/* end file src/scalar/utf32.h */
-/* begin file src/scalar/base64.h */
-#ifndef SIMDUTF_BASE64_H
-#define SIMDUTF_BASE64_H
+  simdutf_really_inline simd8x64(const simd8<T> chunk0, const simd8<T> chunk1,
+                                 const simd8<T> chunk2, const simd8<T> chunk3)
+      : chunks{chunk0, chunk1, chunk2, chunk3} {}
+  simdutf_really_inline simd8x64(const T *ptr)
+      : chunks{simd8<T>::load(ptr),
+               simd8<T>::load(ptr + sizeof(simd8<T>) / sizeof(T)),
+               simd8<T>::load(ptr + 2 * sizeof(simd8<T>) / sizeof(T)),
+               simd8<T>::load(ptr + 3 * sizeof(simd8<T>) / sizeof(T))} {}
 
-#include <cstddef>
-#include <cstdint>
-#include <cstring>
-#include <iostream>
+  simdutf_really_inline void store(T *ptr) const {
+    this->chunks[0].store(ptr + sizeof(simd8<T>) * 0 / sizeof(T));
+    this->chunks[1].store(ptr + sizeof(simd8<T>) * 1 / sizeof(T));
+    this->chunks[2].store(ptr + sizeof(simd8<T>) * 2 / sizeof(T));
+    this->chunks[3].store(ptr + sizeof(simd8<T>) * 3 / sizeof(T));
+  }
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace base64 {
+  simdutf_really_inline simd8x64<T> &operator|=(const simd8x64<T> &other) {
+    this->chunks[0] |= other.chunks[0];
+    this->chunks[1] |= other.chunks[1];
+    this->chunks[2] |= other.chunks[2];
+    this->chunks[3] |= other.chunks[3];
+    return *this;
+  }
 
-// This function is not expected to be fast. Do not use in long loops.
-template <class char_type> bool is_ascii_white_space(char_type c) {
-  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
-}
+  simdutf_really_inline simd8<T> reduce_or() const {
+    return (this->chunks[0] | this->chunks[1]) |
+           (this->chunks[2] | this->chunks[3]);
+  }
 
-template <class char_type> bool is_ascii_white_space_or_padding(char_type c) {
-  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' ||
-         c == '=';
-}
+  simdutf_really_inline bool is_ascii() const { return reduce_or().is_ascii(); }
 
-template <class char_type> bool is_eight_byte(char_type c) {
-  if (sizeof(char_type) == 1) {
-    return true;
+  template <endianness endian>
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *ptr) const {
+    this->chunks[0].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 0);
+    this->chunks[1].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 1);
+    this->chunks[2].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 2);
+    this->chunks[3].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 3);
   }
-  return uint8_t(c) == c;
-}
 
-// Returns true upon success. The destination buffer must be large enough.
-// This functions assumes that the padding (=) has been removed.
-template <class char_type>
-full_result
-base64_tail_decode(char *dst, const char_type *src, size_t length,
-                   size_t padded_characters, // number of padding characters
-                                             // '=', typically 0, 1, 2.
-                   base64_options options,
-                   last_chunk_handling_options last_chunk_options) {
-  // This looks like 5 branches, but we expect the compiler to resolve this to a
-  // single branch:
-  const uint8_t *to_base64 = (options & base64_url)
-                                 ? tables::base64::to_base64_url_value
-                                 : tables::base64::to_base64_value;
-  const uint32_t *d0 = (options & base64_url)
-                           ? tables::base64::base64_url::d0
-                           : tables::base64::base64_default::d0;
-  const uint32_t *d1 = (options & base64_url)
-                           ? tables::base64::base64_url::d1
-                           : tables::base64::base64_default::d1;
-  const uint32_t *d2 = (options & base64_url)
-                           ? tables::base64::base64_url::d2
-                           : tables::base64::base64_default::d2;
-  const uint32_t *d3 = (options & base64_url)
-                           ? tables::base64::base64_url::d3
-                           : tables::base64::base64_default::d3;
+  simdutf_really_inline void store_ascii_as_utf32(char32_t *ptr) const {
+    this->chunks[0].store_ascii_as_utf32_tbl(ptr + sizeof(simd8<T>) * 0);
+    this->chunks[1].store_ascii_as_utf32_tbl(ptr + sizeof(simd8<T>) * 1);
+    this->chunks[2].store_ascii_as_utf32_tbl(ptr + sizeof(simd8<T>) * 2);
+    this->chunks[3].store_ascii_as_utf32_tbl(ptr + sizeof(simd8<T>) * 3);
+  }
 
-  const char_type *srcend = src + length;
-  const char_type *srcinit = src;
-  const char *dstinit = dst;
+  simdutf_really_inline uint64_t to_bitmask() const {
+    __m128i mask = __lsx_vbsll_v(__lsx_vmsknz_b(this->chunks[3]), 6);
+    mask = __lsx_vor_v(mask, __lsx_vbsll_v(__lsx_vmsknz_b(this->chunks[2]), 4));
+    mask = __lsx_vor_v(mask, __lsx_vbsll_v(__lsx_vmsknz_b(this->chunks[1]), 2));
+    mask = __lsx_vor_v(mask, __lsx_vmsknz_b(this->chunks[0]));
+    return __lsx_vpickve2gr_du(mask, 0);
+  }
 
-  uint32_t x;
-  size_t idx;
-  uint8_t buffer[4];
-  while (true) {
-    while (src + 4 <= srcend && is_eight_byte(src[0]) &&
-           is_eight_byte(src[1]) && is_eight_byte(src[2]) &&
-           is_eight_byte(src[3]) &&
-           (x = d0[uint8_t(src[0])] | d1[uint8_t(src[1])] |
-                d2[uint8_t(src[2])] | d3[uint8_t(src[3])]) < 0x01FFFFFF) {
-      if (match_system(endianness::BIG)) {
-        x = scalar::utf32::swap_bytes(x);
-      }
-      std::memcpy(dst, &x, 3); // optimization opportunity: copy 4 bytes
-      dst += 3;
-      src += 4;
-    }
-    idx = 0;
-    // we need at least four characters.
-    while (idx < 4 && src < srcend) {
-      char_type c = *src;
-      uint8_t code = to_base64[uint8_t(c)];
-      buffer[idx] = uint8_t(code);
-      if (is_eight_byte(c) && code <= 63) {
-        idx++;
-      } else if (code > 64 || !scalar::base64::is_eight_byte(c)) {
-        return {INVALID_BASE64_CHARACTER, size_t(src - srcinit),
-                size_t(dst - dstinit)};
-      } else {
-        // We have a space or a newline. We ignore it.
-      }
-      src++;
-    }
-    if (idx != 4) {
-      if (last_chunk_options == last_chunk_handling_options::strict &&
-          (idx != 1) && ((idx + padded_characters) & 3) != 0) {
-        // The partial chunk was at src - idx
-        return {BASE64_INPUT_REMAINDER, size_t(src - srcinit),
-                size_t(dst - dstinit)};
-      } else if (last_chunk_options ==
-                     last_chunk_handling_options::stop_before_partial &&
-                 (idx != 1) && ((idx + padded_characters) & 3) != 0) {
-        // Rewind src to before partial chunk
-        src -= idx;
-        return {SUCCESS, size_t(src - srcinit), size_t(dst - dstinit)};
-      } else {
-        if (idx == 2) {
-          uint32_t triple =
-              (uint32_t(buffer[0]) << 3 * 6) + (uint32_t(buffer[1]) << 2 * 6);
-          if ((last_chunk_options == last_chunk_handling_options::strict) &&
-              (triple & 0xffff)) {
-            return {BASE64_EXTRA_BITS, size_t(src - srcinit),
-                    size_t(dst - dstinit)};
-          }
-          if (match_system(endianness::BIG)) {
-            triple <<= 8;
-            std::memcpy(dst, &triple, 1);
-          } else {
-            triple = scalar::utf32::swap_bytes(triple);
-            triple >>= 8;
-            std::memcpy(dst, &triple, 1);
-          }
-          dst += 1;
-        } else if (idx == 3) {
-          uint32_t triple = (uint32_t(buffer[0]) << 3 * 6) +
-                            (uint32_t(buffer[1]) << 2 * 6) +
-                            (uint32_t(buffer[2]) << 1 * 6);
-          if ((last_chunk_options == last_chunk_handling_options::strict) &&
-              (triple & 0xff)) {
-            return {BASE64_EXTRA_BITS, size_t(src - srcinit),
-                    size_t(dst - dstinit)};
-          }
-          if (match_system(endianness::BIG)) {
-            triple <<= 8;
-            std::memcpy(dst, &triple, 2);
-          } else {
-            triple = scalar::utf32::swap_bytes(triple);
-            triple >>= 8;
-            std::memcpy(dst, &triple, 2);
-          }
-          dst += 2;
-        } else if (idx == 1) {
-          return {BASE64_INPUT_REMAINDER, size_t(src - srcinit),
-                  size_t(dst - dstinit)};
-        }
-        return {SUCCESS, size_t(src - srcinit), size_t(dst - dstinit)};
-      }
-    }
-
-    uint32_t triple =
-        (uint32_t(buffer[0]) << 3 * 6) + (uint32_t(buffer[1]) << 2 * 6) +
-        (uint32_t(buffer[2]) << 1 * 6) + (uint32_t(buffer[3]) << 0 * 6);
-    if (match_system(endianness::BIG)) {
-      triple <<= 8;
-      std::memcpy(dst, &triple, 3);
-    } else {
-      triple = scalar::utf32::swap_bytes(triple);
-      triple >>= 8;
-      std::memcpy(dst, &triple, 3);
-    }
-    dst += 3;
+  simdutf_really_inline uint64_t eq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] == mask, this->chunks[1] == mask,
+                          this->chunks[2] == mask, this->chunks[3] == mask)
+        .to_bitmask();
   }
-}
 
-// like base64_tail_decode, but it will not write past the end of the output
-// buffer. The outlen paramter is modified to reflect the number of bytes
-// written. This functions assumes that the padding (=) has been removed.
-template <class char_type>
-result base64_tail_decode_safe(
-    char *dst, size_t &outlen, const char_type *&srcr, size_t length,
-    size_t padded_characters, // number of padding characters '=', typically 0,
-                              // 1, 2.
-    base64_options options, last_chunk_handling_options last_chunk_options) {
-  const char_type *src = srcr;
-  if (length == 0) {
-    outlen = 0;
-    return {SUCCESS, 0};
+  simdutf_really_inline uint64_t lteq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] <= mask, this->chunks[1] <= mask,
+                          this->chunks[2] <= mask, this->chunks[3] <= mask)
+        .to_bitmask();
   }
-  // This looks like 5 branches, but we expect the compiler to resolve this to a
-  // single branch:
-  const uint8_t *to_base64 = (options & base64_url)
-                                 ? tables::base64::to_base64_url_value
-                                 : tables::base64::to_base64_value;
-  const uint32_t *d0 = (options & base64_url)
-                           ? tables::base64::base64_url::d0
-                           : tables::base64::base64_default::d0;
-  const uint32_t *d1 = (options & base64_url)
-                           ? tables::base64::base64_url::d1
-                           : tables::base64::base64_default::d1;
-  const uint32_t *d2 = (options & base64_url)
-                           ? tables::base64::base64_url::d2
-                           : tables::base64::base64_default::d2;
-  const uint32_t *d3 = (options & base64_url)
-                           ? tables::base64::base64_url::d3
-                           : tables::base64::base64_default::d3;
 
-  const char_type *srcend = src + length;
-  const char_type *srcinit = src;
-  const char *dstinit = dst;
-  const char *dstend = dst + outlen;
+  simdutf_really_inline uint64_t in_range(const T low, const T high) const {
+    const simd8<T> mask_low = simd8<T>::splat(low);
+    const simd8<T> mask_high = simd8<T>::splat(high);
 
-  uint32_t x;
-  size_t idx;
-  uint8_t buffer[4];
-  while (true) {
-    while (src + 4 <= srcend && is_eight_byte(src[0]) &&
-           is_eight_byte(src[1]) && is_eight_byte(src[2]) &&
-           is_eight_byte(src[3]) &&
-           (x = d0[uint8_t(src[0])] | d1[uint8_t(src[1])] |
-                d2[uint8_t(src[2])] | d3[uint8_t(src[3])]) < 0x01FFFFFF) {
-      if (dstend - dst < 3) {
-        outlen = size_t(dst - dstinit);
-        srcr = src;
-        return {OUTPUT_BUFFER_TOO_SMALL, size_t(src - srcinit)};
-      }
-      if (match_system(endianness::BIG)) {
-        x = scalar::utf32::swap_bytes(x);
-      }
-      std::memcpy(dst, &x, 3); // optimization opportunity: copy 4 bytes
-      dst += 3;
-      src += 4;
-    }
-    idx = 0;
-    const char_type *srccur = src;
-    // We need at least four characters.
-    while (idx < 4 && src < srcend) {
-      char_type c = *src;
-      uint8_t code = to_base64[uint8_t(c)];
+    return simd8x64<bool>(
+               (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
+               (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low),
+               (this->chunks[2] <= mask_high) & (this->chunks[2] >= mask_low),
+               (this->chunks[3] <= mask_high) & (this->chunks[3] >= mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
+    const simd8<T> mask_low = simd8<T>::splat(low);
+    const simd8<T> mask_high = simd8<T>::splat(high);
+    return simd8x64<bool>(
+               (this->chunks[0] > mask_high) | (this->chunks[0] < mask_low),
+               (this->chunks[1] > mask_high) | (this->chunks[1] < mask_low),
+               (this->chunks[2] > mask_high) | (this->chunks[2] < mask_low),
+               (this->chunks[3] > mask_high) | (this->chunks[3] < mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t lt(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] < mask, this->chunks[1] < mask,
+                          this->chunks[2] < mask, this->chunks[3] < mask)
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t gt(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] > mask, this->chunks[1] > mask,
+                          this->chunks[2] > mask, this->chunks[3] > mask)
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t gteq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] >= mask, this->chunks[1] >= mask,
+                          this->chunks[2] >= mask, this->chunks[3] >= mask)
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t gteq_unsigned(const uint8_t m) const {
+    const simd8<uint8_t> mask = simd8<uint8_t>::splat(m);
+    return simd8x64<bool>(simd8<uint8_t>(this->chunks[0].value) >= mask,
+                          simd8<uint8_t>(this->chunks[1].value) >= mask,
+                          simd8<uint8_t>(this->chunks[2].value) >= mask,
+                          simd8<uint8_t>(this->chunks[3].value) >= mask)
+        .to_bitmask();
+  }
+}; // struct simd8x64<T>
+/* begin file src/simdutf/lsx/simd16-inl.h */
+template <typename T> struct simd16;
 
-      buffer[idx] = uint8_t(code);
-      if (is_eight_byte(c) && code <= 63) {
-        idx++;
-      } else if (code > 64 || !scalar::base64::is_eight_byte(c)) {
-        outlen = size_t(dst - dstinit);
-        srcr = src;
-        return {INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
-      } else {
-        // We have a space or a newline. We ignore it.
-      }
-      src++;
-    }
-    if (idx != 4) {
-      if (last_chunk_options == last_chunk_handling_options::strict &&
-          ((idx + padded_characters) & 3) != 0) {
-        outlen = size_t(dst - dstinit);
-        srcr = src;
-        return {BASE64_INPUT_REMAINDER, size_t(src - srcinit)};
-      } else if (last_chunk_options ==
-                     last_chunk_handling_options::stop_before_partial &&
-                 ((idx + padded_characters) & 3) != 0) {
-        // Rewind src to before partial chunk
-        srcr = srccur;
-        outlen = size_t(dst - dstinit);
-        return {SUCCESS, size_t(dst - dstinit)};
-      } else { // loose mode
-        if (idx == 0) {
-          // No data left; return success
-          outlen = size_t(dst - dstinit);
-          srcr = src;
-          return {SUCCESS, size_t(dst - dstinit)};
-        } else if (idx == 1) {
-          // Error: Incomplete chunk of length 1 is invalid in loose mode
-          outlen = size_t(dst - dstinit);
-          srcr = src;
-          return {BASE64_INPUT_REMAINDER, size_t(src - srcinit)};
-        } else if (idx == 2 || idx == 3) {
-          // Check if there's enough space in the destination buffer
-          size_t required_space = (idx == 2) ? 1 : 2;
-          if (size_t(dstend - dst) < required_space) {
-            outlen = size_t(dst - dstinit);
-            srcr = src;
-            return {OUTPUT_BUFFER_TOO_SMALL, size_t(srccur - srcinit)};
-          }
-          uint32_t triple = 0;
-          if (idx == 2) {
-            triple = (uint32_t(buffer[0]) << 18) + (uint32_t(buffer[1]) << 12);
-            if ((last_chunk_options == last_chunk_handling_options::strict) &&
-                (triple & 0xffff)) {
-              srcr = src;
-              return {BASE64_EXTRA_BITS, size_t(src - srcinit)};
-            }
-            // Extract the first byte
-            triple >>= 16;
-            dst[0] = static_cast<char>(triple & 0xFF);
-            dst += 1;
-          } else if (idx == 3) {
-            triple = (uint32_t(buffer[0]) << 18) + (uint32_t(buffer[1]) << 12) +
-                     (uint32_t(buffer[2]) << 6);
-            if ((last_chunk_options == last_chunk_handling_options::strict) &&
-                (triple & 0xff)) {
-              srcr = src;
-              return {BASE64_EXTRA_BITS, size_t(src - srcinit)};
-            }
-            // Extract the first two bytes
-            triple >>= 8;
-            dst[0] = static_cast<char>((triple >> 8) & 0xFF);
-            dst[1] = static_cast<char>(triple & 0xFF);
-            dst += 2;
-          }
-          outlen = size_t(dst - dstinit);
-          srcr = src;
-          return {SUCCESS, size_t(dst - dstinit)};
-        }
-      }
-    }
+template <typename T, typename Mask = simd16<bool>> struct base_u16 {
+  __m128i value;
+  static const int SIZE = sizeof(value);
 
-    if (dstend - dst < 3) {
-      outlen = size_t(dst - dstinit);
-      srcr = src;
-      return {OUTPUT_BUFFER_TOO_SMALL, size_t(srccur - srcinit)};
-    }
-    uint32_t triple = (uint32_t(buffer[0]) << 18) +
-                      (uint32_t(buffer[1]) << 12) + (uint32_t(buffer[2]) << 6) +
-                      (uint32_t(buffer[3]));
-    if (match_system(endianness::BIG)) {
-      triple <<= 8;
-      std::memcpy(dst, &triple, 3);
-    } else {
-      triple = scalar::utf32::swap_bytes(triple);
-      triple >>= 8;
-      std::memcpy(dst, &triple, 3);
-    }
-    dst += 3;
+  // Conversion from/to SIMD register
+  simdutf_really_inline base_u16() = default;
+  simdutf_really_inline base_u16(const __m128i _value) : value(_value) {}
+  // Bit operations
+  simdutf_really_inline simd16<T> operator|(const simd16<T> other) const {
+    return __lsx_vor_v(this->value, other.value);
   }
-}
-
-// Returns the number of bytes written. The destination buffer must be large
-// enough. It will add padding (=) if needed.
-size_t tail_encode_base64(char *dst, const char *src, size_t srclen,
-                          base64_options options) {
-  // By default, we use padding if we are not using the URL variant.
-  // This is check with ((options & base64_url) == 0) which returns true if we
-  // are not using the URL variant. However, we also allow 'inversion' of the
-  // convention with the base64_reverse_padding option. If the
-  // base64_reverse_padding option is set, we use padding if we are using the
-  // URL variant, and we omit it if we are not using the URL variant. This is
-  // checked with
-  // ((options & base64_reverse_padding) == base64_reverse_padding).
-  bool use_padding =
-      ((options & base64_url) == 0) ^
-      ((options & base64_reverse_padding) == base64_reverse_padding);
-  // This looks like 3 branches, but we expect the compiler to resolve this to
-  // a single branch:
-  const char *e0 = (options & base64_url) ? tables::base64::base64_url::e0
-                                          : tables::base64::base64_default::e0;
-  const char *e1 = (options & base64_url) ? tables::base64::base64_url::e1
-                                          : tables::base64::base64_default::e1;
-  const char *e2 = (options & base64_url) ? tables::base64::base64_url::e2
-                                          : tables::base64::base64_default::e2;
-  char *out = dst;
-  size_t i = 0;
-  uint8_t t1, t2, t3;
-  for (; i + 2 < srclen; i += 3) {
-    t1 = uint8_t(src[i]);
-    t2 = uint8_t(src[i + 1]);
-    t3 = uint8_t(src[i + 2]);
-    *out++ = e0[t1];
-    *out++ = e1[((t1 & 0x03) << 4) | ((t2 >> 4) & 0x0F)];
-    *out++ = e1[((t2 & 0x0F) << 2) | ((t3 >> 6) & 0x03)];
-    *out++ = e2[t3];
+  simdutf_really_inline simd16<T> operator&(const simd16<T> other) const {
+    return __lsx_vand_v(this->value, other.value);
   }
-  switch (srclen - i) {
-  case 0:
-    break;
-  case 1:
-    t1 = uint8_t(src[i]);
-    *out++ = e0[t1];
-    *out++ = e1[(t1 & 0x03) << 4];
-    if (use_padding) {
-      *out++ = '=';
-      *out++ = '=';
-    }
-    break;
-  default: /* case 2 */
-    t1 = uint8_t(src[i]);
-    t2 = uint8_t(src[i + 1]);
-    *out++ = e0[t1];
-    *out++ = e1[((t1 & 0x03) << 4) | ((t2 >> 4) & 0x0F)];
-    *out++ = e2[(t2 & 0x0F) << 2];
-    if (use_padding) {
-      *out++ = '=';
-    }
+  simdutf_really_inline simd16<T> operator^(const simd16<T> other) const {
+    return __lsx_vxor_v(this->value, other.value);
+  }
+  simdutf_really_inline simd16<T> bit_andnot(const simd16<T> other) const {
+    return __lsx_vandn_v(this->value, other.value);
+  }
+  simdutf_really_inline simd16<T> operator~() const { return *this ^ 0xFFu; }
+  simdutf_really_inline simd16<T> &operator|=(const simd16<T> other) {
+    auto this_cast = static_cast<simd16<T> *>(this);
+    *this_cast = *this_cast | other;
+    return *this_cast;
+  }
+  simdutf_really_inline simd16<T> &operator&=(const simd16<T> other) {
+    auto this_cast = static_cast<simd16<T> *>(this);
+    *this_cast = *this_cast & other;
+    return *this_cast;
+  }
+  simdutf_really_inline simd16<T> &operator^=(const simd16<T> other) {
+    auto this_cast = static_cast<simd16<T> *>(this);
+    *this_cast = *this_cast ^ other;
+    return *this_cast;
   }
-  return (size_t)(out - dst);
-}
 
-template <class char_type>
-simdutf_warn_unused size_t maximal_binary_length_from_base64(
-    const char_type *input, size_t length) noexcept {
-  // We follow https://infra.spec.whatwg.org/#forgiving-base64-decode
-  size_t padding = 0;
-  if (length > 0) {
-    if (input[length - 1] == '=') {
-      padding++;
-      if (length > 1 && input[length - 2] == '=') {
-        padding++;
-      }
-    }
+  friend simdutf_really_inline Mask operator==(const simd16<T> lhs,
+                                               const simd16<T> rhs) {
+    return __lsx_vseq_h(lhs.value, rhs.value);
   }
-  size_t actual_length = length - padding;
-  if (actual_length % 4 <= 1) {
-    return actual_length / 4 * 3;
+
+  template <int N = 1>
+  simdutf_really_inline simd16<T> prev(const simd16<T> prev_chunk) const {
+    return __lsx_vor_v(__lsx_vbsll_v(*this, N * 2),
+                       __lsx_vbsrl_v(prev_chunk, 16 - N * 2));
   }
-  // if we have a valid input, then the remainder must be 2 or 3 adding one or
-  // two extra bytes.
-  return actual_length / 4 * 3 + (actual_length % 4) - 1;
-}
+};
 
-simdutf_warn_unused size_t
-base64_length_from_binary(size_t length, base64_options options) noexcept {
-  // By default, we use padding if we are not using the URL variant.
-  // This is check with ((options & base64_url) == 0) which returns true if we
-  // are not using the URL variant. However, we also allow 'inversion' of the
-  // convention with the base64_reverse_padding option. If the
-  // base64_reverse_padding option is set, we use padding if we are using the
-  // URL variant, and we omit it if we are not using the URL variant. This is
-  // checked with
-  // ((options & base64_reverse_padding) == base64_reverse_padding).
-  bool use_padding =
-      ((options & base64_url) == 0) ^
-      ((options & base64_reverse_padding) == base64_reverse_padding);
-  if (!use_padding) {
-    return length / 3 * 4 + ((length % 3) ? (length % 3) + 1 : 0);
+template <typename T, typename Mask = simd16<bool>>
+struct base16 : base_u16<T> {
+  typedef uint16_t bitmask_t;
+  typedef uint32_t bitmask2_t;
+
+  simdutf_really_inline base16() : base_u16<T>() {}
+  simdutf_really_inline base16(const __m128i _value) : base_u16<T>(_value) {}
+  template <typename Pointer>
+  simdutf_really_inline base16(const Pointer *ptr)
+      : base16(__lsx_vld(ptr, 0)) {}
+
+  static const int SIZE = sizeof(base_u16<T>::value);
+
+  template <int N = 1>
+  simdutf_really_inline simd16<T> prev(const simd16<T> prev_chunk) const {
+    return __lsx_vor_v(__lsx_vbsll_v(*this, N * 2),
+                       __lsx_vbsrl_v(prev_chunk, 16 - N * 2));
   }
-  return (length + 2) / 3 *
-         4; // We use padding to make the length a multiple of 4.
-}
+};
 
-} // namespace base64
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+// SIMD mask type for 16-bit lanes (returned by comparisons such as == and >)
+template <> struct simd16<bool> : base16<bool> {
+  static simdutf_really_inline simd16<bool> splat(bool _value) {
+    return __lsx_vreplgr2vr_h(uint16_t(-(!!_value)));
+  }
 
-#endif
-/* end file src/scalar/base64.h */
-/* begin file src/scalar/latin1_to_utf8/latin1_to_utf8.h */
-#ifndef SIMDUTF_LATIN1_TO_UTF8_H
-#define SIMDUTF_LATIN1_TO_UTF8_H
+  simdutf_really_inline simd16() : base16() {}
+  simdutf_really_inline simd16(const __m128i _value) : base16<bool>(_value) {}
+  // Splat constructor
+  simdutf_really_inline simd16(bool _value) : base16<bool>(splat(_value)) {}
+};
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace latin1_to_utf8 {
+template <typename T> struct base16_numeric : base16<T> {
+  static simdutf_really_inline simd16<T> splat(T _value) {
+    return __lsx_vreplgr2vr_h(_value);
+  }
+  static simdutf_really_inline simd16<T> zero() { return __lsx_vldi(0); }
+  static simdutf_really_inline simd16<T> load(const T values[8]) {
+    return __lsx_vld(reinterpret_cast<const uint16_t *>(values), 0);
+  }
 
-inline size_t convert(const char *buf, size_t len, char *utf8_output) {
-  const unsigned char *data = reinterpret_cast<const unsigned char *>(buf);
-  size_t pos = 0;
-  size_t utf8_pos = 0;
-  while (pos < len) {
-    // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <=
-        len) { // if it is safe to read 16 more bytes, check that they are ascii
-      uint64_t v1;
-      ::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 |
-                 v2}; // We are only interested in these bits: 1000 1000 1000
-                      // 1000, so it makes sense to concatenate everything
-      if ((v & 0x8080808080808080) ==
-          0) { // if NONE of these are set, e.g. all of them are zero, then
-               // everything is ASCII
-        size_t final_pos = pos + 16;
-        while (pos < final_pos) {
-          utf8_output[utf8_pos++] = char(buf[pos]);
-          pos++;
-        }
-        continue;
-      }
-    }
+  simdutf_really_inline base16_numeric() : base16<T>() {}
+  simdutf_really_inline base16_numeric(const __m128i _value)
+      : base16<T>(_value) {}
 
-    unsigned char byte = data[pos];
-    if ((byte & 0x80) == 0) { // if ASCII
-      // will generate one UTF-8 bytes
-      utf8_output[utf8_pos++] = char(byte);
-      pos++;
-    } else {
-      // will generate two UTF-8 bytes
-      utf8_output[utf8_pos++] = char((byte >> 6) | 0b11000000);
-      utf8_output[utf8_pos++] = char((byte & 0b111111) | 0b10000000);
-      pos++;
-    }
+  // Store to array
+  simdutf_really_inline void store(T dst[8]) const {
+    return __lsx_vst(this->value, dst, 0);
   }
-  return utf8_pos;
-}
 
-inline size_t convert_safe(const char *buf, size_t len, char *utf8_output,
-                           size_t utf8_len) {
-  const unsigned char *data = reinterpret_cast<const unsigned char *>(buf);
-  size_t pos = 0;
-  size_t skip_pos = 0;
-  size_t utf8_pos = 0;
-  while (pos < len && utf8_pos < utf8_len) {
-    // try to convert the next block of 16 ASCII bytes
-    if (pos >= skip_pos && pos + 16 <= len &&
-        utf8_pos + 16 <= utf8_len) { // if it is safe to read 16 more bytes,
-                                     // check that they are ascii
-      uint64_t v1;
-      ::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 |
-                 v2}; // We are only interested in these bits: 1000 1000 1000
-                      // 1000, so it makes sense to concatenate everything
-      if ((v & 0x8080808080808080) ==
-          0) { // if NONE of these are set, e.g. all of them are zero, then
-               // everything is ASCII
-        ::memcpy(utf8_output + utf8_pos, buf + pos, 16);
-        utf8_pos += 16;
-        pos += 16;
-      } else {
-        // At least one of the next 16 bytes are not ASCII, we will process them
-        // one by one
-        skip_pos = pos + 16;
-      }
-    } else {
-      const auto byte = data[pos];
-      if ((byte & 0x80) == 0) { // if ASCII
-        // will generate one UTF-8 bytes
-        utf8_output[utf8_pos++] = char(byte);
-        pos++;
-      } else if (utf8_pos + 2 <= utf8_len) {
-        // will generate two UTF-8 bytes
-        utf8_output[utf8_pos++] = char((byte >> 6) | 0b11000000);
-        utf8_output[utf8_pos++] = char((byte & 0b111111) | 0b10000000);
-        pos++;
-      } else {
-        break;
-      }
-    }
-  }
-  return utf8_pos;
-}
-
-} // namespace latin1_to_utf8
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
-
-#endif
-/* end file src/scalar/latin1_to_utf8/latin1_to_utf8.h */
-
-namespace simdutf {
-bool implementation::supported_by_runtime_system() const {
-  uint32_t required_instruction_sets = this->required_instruction_sets();
-  uint32_t supported_instruction_sets =
-      internal::detect_supported_architectures();
-  return ((supported_instruction_sets & required_instruction_sets) ==
-          required_instruction_sets);
-}
+  // Override to distinguish from bool version
+  simdutf_really_inline simd16<T> operator~() const { return *this ^ 0xFFu; }
 
-simdutf_warn_unused encoding_type implementation::autodetect_encoding(
-    const char *input, size_t length) const noexcept {
-  // If there is a BOM, then we trust it.
-  auto bom_encoding = simdutf::BOM::check_bom(input, length);
-  if (bom_encoding != encoding_type::unspecified) {
-    return bom_encoding;
+  // Addition/subtraction are the same for signed and unsigned
+  simdutf_really_inline simd16<T> operator+(const simd16<T> other) const {
+    return __lsx_vadd_b(*this, other);
   }
-  // UTF8 is common, it includes ASCII, and is commonly represented
-  // without a BOM, so if it fits, go with that. Note that it is still
-  // possible to get it wrong, we are only 'guessing'. If some has UTF-16
-  // data without a BOM, it could pass as UTF-8.
-  //
-  // An interesting twist might be to check for UTF-16 ASCII first (every
-  // other byte is zero).
-  if (validate_utf8(input, length)) {
-    return encoding_type::UTF8;
+  simdutf_really_inline simd16<T> operator-(const simd16<T> other) const {
+    return __lsx_vsub_b(*this, other);
   }
-  // The next most common encoding that might appear without BOM is probably
-  // UTF-16LE, so try that next.
-  if ((length % 2) == 0) {
-    // important: we need to divide by two
-    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
-                         length / 2)) {
-      return encoding_type::UTF16_LE;
-    }
+  simdutf_really_inline simd16<T> &operator+=(const simd16<T> other) {
+    *this = *this + other;
+    return *static_cast<simd16<T> *>(this);
   }
-  if ((length % 4) == 0) {
-    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
-      return encoding_type::UTF32_LE;
-    }
+  simdutf_really_inline simd16<T> &operator-=(const simd16<T> other) {
+    *this = *this - other;
+    return *static_cast<simd16<T> *>(this);
   }
-  return encoding_type::unspecified;
-}
-
-namespace internal {
-// When there is a single implementation, we should not pay a price
-// for dispatching to the best implementation. We should just use the
-// one we have. This is a compile-time check.
-#define SIMDUTF_SINGLE_IMPLEMENTATION                                          \
-  (SIMDUTF_IMPLEMENTATION_ICELAKE + SIMDUTF_IMPLEMENTATION_HASWELL +           \
-       SIMDUTF_IMPLEMENTATION_WESTMERE + SIMDUTF_IMPLEMENTATION_ARM64 +        \
-       SIMDUTF_IMPLEMENTATION_PPC64 + SIMDUTF_IMPLEMENTATION_FALLBACK ==       \
-   1)
-
-// Static array of known implementations. We are hoping these get baked into the
-// executable without requiring a static initializer.
+};
 
-#if SIMDUTF_IMPLEMENTATION_ICELAKE
-static const icelake::implementation *get_icelake_singleton() {
-  static const icelake::implementation icelake_singleton{};
-  return &icelake_singleton;
-}
-#endif
-#if SIMDUTF_IMPLEMENTATION_HASWELL
-static const haswell::implementation *get_haswell_singleton() {
-  static const haswell::implementation haswell_singleton{};
-  return &haswell_singleton;
-}
-#endif
-#if SIMDUTF_IMPLEMENTATION_WESTMERE
-static const westmere::implementation *get_westmere_singleton() {
-  static const westmere::implementation westmere_singleton{};
-  return &westmere_singleton;
-}
-#endif
-#if SIMDUTF_IMPLEMENTATION_ARM64
-static const arm64::implementation *get_arm64_singleton() {
-  static const arm64::implementation arm64_singleton{};
-  return &arm64_singleton;
-}
-#endif
-#if SIMDUTF_IMPLEMENTATION_PPC64
-static const ppc64::implementation *get_ppc64_singleton() {
-  static const ppc64::implementation ppc64_singleton{};
-  return &ppc64_singleton;
-}
-#endif
-#if SIMDUTF_IMPLEMENTATION_RVV
-static const rvv::implementation *get_rvv_singleton() {
-  static const rvv::implementation rvv_singleton{};
-  return &rvv_singleton;
-}
-#endif
-#if SIMDUTF_IMPLEMENTATION_FALLBACK
-static const fallback::implementation *get_fallback_singleton() {
-  static const fallback::implementation fallback_singleton{};
-  return &fallback_singleton;
-}
-#endif
+// Signed code units
+template <> struct simd16<int16_t> : base16_numeric<int16_t> {
+  simdutf_really_inline simd16() : base16_numeric<int16_t>() {}
+  simdutf_really_inline simd16(const __m128i _value)
+      : base16_numeric<int16_t>(_value) {}
+  simdutf_really_inline simd16(simd16<bool> other)
+      : base16_numeric<int16_t>(other.value) {}
 
-#if SIMDUTF_SINGLE_IMPLEMENTATION
-static const implementation *get_single_implementation() {
-  return
-  #if SIMDUTF_IMPLEMENTATION_ICELAKE
-      get_icelake_singleton();
-  #endif
-  #if SIMDUTF_IMPLEMENTATION_HASWELL
-  get_haswell_singleton();
-  #endif
-  #if SIMDUTF_IMPLEMENTATION_WESTMERE
-  get_westmere_singleton();
-  #endif
-  #if SIMDUTF_IMPLEMENTATION_ARM64
-  get_arm64_singleton();
-  #endif
-  #if SIMDUTF_IMPLEMENTATION_PPC64
-  get_ppc64_singleton();
-  #endif
-  #if SIMDUTF_IMPLEMENTATION_FALLBACK
-  get_fallback_singleton();
-  #endif
-}
-#endif
+  // Splat constructor
+  simdutf_really_inline simd16(int16_t _value) : simd16(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd16(const int16_t *values) : simd16(load(values)) {}
+  simdutf_really_inline simd16(const char16_t *values)
+      : simd16(load(reinterpret_cast<const int16_t *>(values))) {}
+  simdutf_really_inline operator simd16<uint16_t>() const;
 
-/**
- * @private Detects best supported implementation on first use, and sets it
- */
-class detect_best_supported_implementation_on_first_use final
-    : public implementation {
-public:
-  std::string name() const noexcept final { return set_best()->name(); }
-  std::string description() const noexcept final {
-    return set_best()->description();
+  // Order-sensitive comparisons
+  simdutf_really_inline simd16<int16_t>
+  max_val(const simd16<int16_t> other) const {
+    return __lsx_vmax_h(this->value, other.value);
   }
-  uint32_t required_instruction_sets() const noexcept final {
-    return set_best()->required_instruction_sets();
+  simdutf_really_inline simd16<int16_t>
+  min_val(const simd16<int16_t> other) const {
+    return __lsx_vmin_h(this->value, other.value);
   }
-
-  simdutf_warn_unused int
-  detect_encodings(const char *input, size_t length) const noexcept override {
-    return set_best()->detect_encodings(input, length);
+  simdutf_really_inline simd16<bool>
+  operator>(const simd16<int16_t> other) const {
+    return __lsx_vsle_h(other.value, this->value);
   }
-
-  simdutf_warn_unused bool
-  validate_utf8(const char *buf, size_t len) const noexcept final override {
-    return set_best()->validate_utf8(buf, len);
+  simdutf_really_inline simd16<bool>
+  operator<(const simd16<int16_t> other) const {
+    return __lsx_vslt_h(this->value, other.value);
   }
+};
 
-  simdutf_warn_unused result validate_utf8_with_errors(
-      const char *buf, size_t len) const noexcept final override {
-    return set_best()->validate_utf8_with_errors(buf, len);
-  }
+// Unsigned code units
+template <> struct simd16<uint16_t> : base16_numeric<uint16_t> {
+  simdutf_really_inline simd16() : base16_numeric<uint16_t>() {}
+  simdutf_really_inline simd16(const __m128i _value)
+      : base16_numeric<uint16_t>((__m128i)_value) {}
+  simdutf_really_inline simd16(simd16<bool> other)
+      : base16_numeric<uint16_t>(other.value) {}
 
-  simdutf_warn_unused bool
-  validate_ascii(const char *buf, size_t len) const noexcept final override {
-    return set_best()->validate_ascii(buf, len);
-  }
+  // Splat constructor
+  simdutf_really_inline simd16(uint16_t _value) : simd16(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd16(const uint16_t *values) : simd16(load(values)) {}
+  simdutf_really_inline simd16(const char16_t *values)
+      : simd16(load(reinterpret_cast<const uint16_t *>(values))) {}
 
-  simdutf_warn_unused result validate_ascii_with_errors(
-      const char *buf, size_t len) const noexcept final override {
-    return set_best()->validate_ascii_with_errors(buf, len);
+  // Saturated math
+  simdutf_really_inline simd16<uint16_t>
+  saturating_add(const simd16<uint16_t> other) const {
+    return __lsx_vsadd_hu(this->value, other.value);
   }
-
-  simdutf_warn_unused bool
-  validate_utf16le(const char16_t *buf,
-                   size_t len) const noexcept final override {
-    return set_best()->validate_utf16le(buf, len);
+  simdutf_really_inline simd16<uint16_t>
+  saturating_sub(const simd16<uint16_t> other) const {
+    return __lsx_vssub_hu(this->value, other.value);
   }
 
-  simdutf_warn_unused bool
-  validate_utf16be(const char16_t *buf,
-                   size_t len) const noexcept final override {
-    return set_best()->validate_utf16be(buf, len);
+  // Order-specific operations
+  simdutf_really_inline simd16<uint16_t>
+  max_val(const simd16<uint16_t> other) const {
+    return __lsx_vmax_hu(this->value, other.value);
   }
-
-  simdutf_warn_unused result validate_utf16le_with_errors(
-      const char16_t *buf, size_t len) const noexcept final override {
-    return set_best()->validate_utf16le_with_errors(buf, len);
+  simdutf_really_inline simd16<uint16_t>
+  min_val(const simd16<uint16_t> other) const {
+    return __lsx_vmin_hu(this->value, other.value);
   }
-
-  simdutf_warn_unused result validate_utf16be_with_errors(
-      const char16_t *buf, size_t len) const noexcept final override {
-    return set_best()->validate_utf16be_with_errors(buf, len);
+  // Same as >, but only guarantees true is nonzero (> guarantees true = -1)
+  simdutf_really_inline simd16<uint16_t>
+  gt_bits(const simd16<uint16_t> other) const {
+    return this->saturating_sub(other);
   }
-
-  simdutf_warn_unused bool
-  validate_utf32(const char32_t *buf,
-                 size_t len) const noexcept final override {
-    return set_best()->validate_utf32(buf, len);
+  // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
+  simdutf_really_inline simd16<uint16_t>
+  lt_bits(const simd16<uint16_t> other) const {
+    return other.saturating_sub(*this);
   }
-
-  simdutf_warn_unused result validate_utf32_with_errors(
-      const char32_t *buf, size_t len) const noexcept final override {
-    return set_best()->validate_utf32_with_errors(buf, len);
+  simdutf_really_inline simd16<bool>
+  operator<=(const simd16<uint16_t> other) const {
+    return __lsx_vsle_hu(this->value, other.value);
   }
-
-  simdutf_warn_unused size_t
-  convert_latin1_to_utf8(const char *buf, size_t len,
-                         char *utf8_output) const noexcept final override {
-    return set_best()->convert_latin1_to_utf8(buf, len, utf8_output);
+  simdutf_really_inline simd16<bool>
+  operator>=(const simd16<uint16_t> other) const {
+    return __lsx_vsle_hu(other.value, this->value);
   }
-
-  simdutf_warn_unused size_t convert_latin1_to_utf16le(
-      const char *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_latin1_to_utf16le(buf, len, utf16_output);
+  simdutf_really_inline simd16<bool>
+  operator>(const simd16<uint16_t> other) const {
+    return __lsx_vslt_hu(other.value, this->value);
   }
-
-  simdutf_warn_unused size_t convert_latin1_to_utf16be(
-      const char *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_latin1_to_utf16be(buf, len, utf16_output);
+  simdutf_really_inline simd16<bool>
+  operator<(const simd16<uint16_t> other) const {
+    return __lsx_vslt_hu(this->value, other.value);
   }
 
-  simdutf_warn_unused size_t convert_latin1_to_utf32(
-      const char *buf, size_t len,
-      char32_t *latin1_output) const noexcept final override {
-    return set_best()->convert_latin1_to_utf32(buf, len, latin1_output);
+  // Bit-specific operations
+  simdutf_really_inline simd16<bool> bits_not_set() const {
+    return *this == uint16_t(0);
   }
-
-  simdutf_warn_unused size_t
-  convert_utf8_to_latin1(const char *buf, size_t len,
-                         char *latin1_output) const noexcept final override {
-    return set_best()->convert_utf8_to_latin1(buf, len, latin1_output);
+  template <int N> simdutf_really_inline simd16<uint16_t> shr() const {
+    return simd16<uint16_t>(__lsx_vsrli_h(this->value, N));
   }
-
-  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
-      const char *buf, size_t len,
-      char *latin1_output) const noexcept final override {
-    return set_best()->convert_utf8_to_latin1_with_errors(buf, len,
-                                                          latin1_output);
+  template <int N> simdutf_really_inline simd16<uint16_t> shl() const {
+    return simd16<uint16_t>(__lsx_vslli_h(this->value, N));
   }
 
-  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
-      const char *buf, size_t len,
-      char *latin1_output) const noexcept final override {
-    return set_best()->convert_valid_utf8_to_latin1(buf, len, latin1_output);
+  // logical operations
+  simdutf_really_inline simd16<uint16_t>
+  operator|(const simd16<uint16_t> other) const {
+    return __lsx_vor_v(this->value, other.value);
   }
-
-  simdutf_warn_unused size_t convert_utf8_to_utf16le(
-      const char *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_utf8_to_utf16le(buf, len, utf16_output);
+  simdutf_really_inline simd16<uint16_t>
+  operator&(const simd16<uint16_t> other) const {
+    return __lsx_vand_v(this->value, other.value);
   }
-
-  simdutf_warn_unused size_t convert_utf8_to_utf16be(
-      const char *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_utf8_to_utf16be(buf, len, utf16_output);
+  simdutf_really_inline simd16<uint16_t>
+  operator^(const simd16<uint16_t> other) const {
+    return __lsx_vxor_v(this->value, other.value);
   }
 
-  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
-      const char *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_utf8_to_utf16le_with_errors(buf, len,
-                                                           utf16_output);
+  // Pack two vectors of uint16_t code units into a single uint8_t vector,
+  // using unsigned saturation
+  static simdutf_really_inline simd8<uint8_t> pack(const simd16<uint16_t> &v0,
+                                                   const simd16<uint16_t> &v1) {
+    return __lsx_vssrlni_bu_h(v1.value, v0.value, 0);
   }
 
-  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
-      const char *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_utf8_to_utf16be_with_errors(buf, len,
-                                                           utf16_output);
+  // Change the endianness
+  simdutf_really_inline simd16<uint16_t> swap_bytes() const {
+    return __lsx_vshuf4i_b(this->value, 0b10110001);
   }
+};
 
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
-      const char *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_valid_utf8_to_utf16le(buf, len, utf16_output);
-  }
+simdutf_really_inline simd16<int16_t>::operator simd16<uint16_t>() const {
+  return this->value;
+}
 
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
-      const char *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_valid_utf8_to_utf16be(buf, len, utf16_output);
-  }
+template <typename T> struct simd16x32 {
+  static constexpr int NUM_CHUNKS = 64 / sizeof(simd16<T>);
+  static_assert(
+      NUM_CHUNKS == 4,
+      "LOONGARCH kernel should use four registers per 64-byte block.");
+  simd16<T> chunks[NUM_CHUNKS];
 
-  simdutf_warn_unused size_t
-  convert_utf8_to_utf32(const char *buf, size_t len,
-                        char32_t *utf32_output) const noexcept final override {
-    return set_best()->convert_utf8_to_utf32(buf, len, utf32_output);
-  }
+  simd16x32(const simd16x32<T> &o) = delete; // no copy allowed
+  simd16x32<T> &
+  operator=(const simd16<T> other) = delete; // no assignment allowed
+  simd16x32() = delete;                      // no default constructor allowed
 
-  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
-      const char *buf, size_t len,
-      char32_t *utf32_output) const noexcept final override {
-    return set_best()->convert_utf8_to_utf32_with_errors(buf, len,
-                                                         utf32_output);
-  }
+  simdutf_really_inline
+  simd16x32(const simd16<T> chunk0, const simd16<T> chunk1,
+            const simd16<T> chunk2, const simd16<T> chunk3)
+      : chunks{chunk0, chunk1, chunk2, chunk3} {}
+  simdutf_really_inline simd16x32(const T *ptr)
+      : chunks{simd16<T>::load(ptr),
+               simd16<T>::load(ptr + sizeof(simd16<T>) / sizeof(T)),
+               simd16<T>::load(ptr + 2 * sizeof(simd16<T>) / sizeof(T)),
+               simd16<T>::load(ptr + 3 * sizeof(simd16<T>) / sizeof(T))} {}
 
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
-      const char *buf, size_t len,
-      char32_t *utf32_output) const noexcept final override {
-    return set_best()->convert_valid_utf8_to_utf32(buf, len, utf32_output);
+  simdutf_really_inline void store(T *ptr) const {
+    this->chunks[0].store(ptr + sizeof(simd16<T>) * 0 / sizeof(T));
+    this->chunks[1].store(ptr + sizeof(simd16<T>) * 1 / sizeof(T));
+    this->chunks[2].store(ptr + sizeof(simd16<T>) * 2 / sizeof(T));
+    this->chunks[3].store(ptr + sizeof(simd16<T>) * 3 / sizeof(T));
   }
 
-  simdutf_warn_unused size_t
-  convert_utf16le_to_latin1(const char16_t *buf, size_t len,
-                            char *latin1_output) const noexcept final override {
-    return set_best()->convert_utf16le_to_latin1(buf, len, latin1_output);
+  simdutf_really_inline simd16<T> reduce_or() const {
+    return (this->chunks[0] | this->chunks[1]) |
+           (this->chunks[2] | this->chunks[3]);
   }
 
-  simdutf_warn_unused size_t
-  convert_utf16be_to_latin1(const char16_t *buf, size_t len,
-                            char *latin1_output) const noexcept final override {
-    return set_best()->convert_utf16be_to_latin1(buf, len, latin1_output);
-  }
+  simdutf_really_inline bool is_ascii() const { return reduce_or().is_ascii(); }
 
-  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
-      const char16_t *buf, size_t len,
-      char *latin1_output) const noexcept final override {
-    return set_best()->convert_utf16le_to_latin1_with_errors(buf, len,
-                                                             latin1_output);
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *ptr) const {
+    this->chunks[0].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 0);
+    this->chunks[1].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 1);
+    this->chunks[2].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 2);
+    this->chunks[3].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 3);
   }
 
-  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
-      const char16_t *buf, size_t len,
-      char *latin1_output) const noexcept final override {
-    return set_best()->convert_utf16be_to_latin1_with_errors(buf, len,
-                                                             latin1_output);
+  simdutf_really_inline uint64_t to_bitmask() const {
+    __m128i mask = __lsx_vbsll_v(__lsx_vmsknz_b((this->chunks[3]).value), 6);
+    mask = __lsx_vor_v(
+        mask, __lsx_vbsll_v(__lsx_vmsknz_b((this->chunks[2]).value), 4));
+    mask = __lsx_vor_v(
+        mask, __lsx_vbsll_v(__lsx_vmsknz_b((this->chunks[1]).value), 2));
+    mask = __lsx_vor_v(mask, __lsx_vmsknz_b((this->chunks[0]).value));
+    return __lsx_vpickve2gr_du(mask, 0);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(
-      const char16_t *buf, size_t len,
-      char *latin1_output) const noexcept final override {
-    return set_best()->convert_valid_utf16le_to_latin1(buf, len, latin1_output);
+  simdutf_really_inline void swap_bytes() {
+    this->chunks[0] = this->chunks[0].swap_bytes();
+    this->chunks[1] = this->chunks[1].swap_bytes();
+    this->chunks[2] = this->chunks[2].swap_bytes();
+    this->chunks[3] = this->chunks[3].swap_bytes();
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(
-      const char16_t *buf, size_t len,
-      char *latin1_output) const noexcept final override {
-    return set_best()->convert_valid_utf16be_to_latin1(buf, len, latin1_output);
+  simdutf_really_inline uint64_t eq(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] == mask, this->chunks[1] == mask,
+                           this->chunks[2] == mask, this->chunks[3] == mask)
+        .to_bitmask();
   }
 
-  simdutf_warn_unused size_t
-  convert_utf16le_to_utf8(const char16_t *buf, size_t len,
-                          char *utf8_output) const noexcept final override {
-    return set_best()->convert_utf16le_to_utf8(buf, len, utf8_output);
+  simdutf_really_inline uint64_t lteq(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] <= mask, this->chunks[1] <= mask,
+                           this->chunks[2] <= mask, this->chunks[3] <= mask)
+        .to_bitmask();
   }
 
-  simdutf_warn_unused size_t
-  convert_utf16be_to_utf8(const char16_t *buf, size_t len,
-                          char *utf8_output) const noexcept final override {
-    return set_best()->convert_utf16be_to_utf8(buf, len, utf8_output);
-  }
+  simdutf_really_inline uint64_t in_range(const T low, const T high) const {
+    const simd16<T> mask_low = simd16<T>::splat(low);
+    const simd16<T> mask_high = simd16<T>::splat(high);
 
-  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
-      const char16_t *buf, size_t len,
-      char *utf8_output) const noexcept final override {
-    return set_best()->convert_utf16le_to_utf8_with_errors(buf, len,
-                                                           utf8_output);
+    return simd16x32<bool>(
+               (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
+               (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low),
+               (this->chunks[2] <= mask_high) & (this->chunks[2] >= mask_low),
+               (this->chunks[3] <= mask_high) & (this->chunks[3] >= mask_low))
+        .to_bitmask();
   }
-
-  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
-      const char16_t *buf, size_t len,
-      char *utf8_output) const noexcept final override {
-    return set_best()->convert_utf16be_to_utf8_with_errors(buf, len,
-                                                           utf8_output);
+  simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
+    const simd16<T> mask_low = simd16<T>::splat(low);
+    const simd16<T> mask_high = simd16<T>::splat(high);
+    return simd16x32<bool>(
+               (this->chunks[0] > mask_high) | (this->chunks[0] < mask_low),
+               (this->chunks[1] > mask_high) | (this->chunks[1] < mask_low),
+               (this->chunks[2] > mask_high) | (this->chunks[2] < mask_low),
+               (this->chunks[3] > mask_high) | (this->chunks[3] < mask_low))
+        .to_bitmask();
   }
-
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
-      const char16_t *buf, size_t len,
-      char *utf8_output) const noexcept final override {
-    return set_best()->convert_valid_utf16le_to_utf8(buf, len, utf8_output);
+  simdutf_really_inline uint64_t lt(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] < mask, this->chunks[1] < mask,
+                           this->chunks[2] < mask, this->chunks[3] < mask)
+        .to_bitmask();
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
-      const char16_t *buf, size_t len,
-      char *utf8_output) const noexcept final override {
-    return set_best()->convert_valid_utf16be_to_utf8(buf, len, utf8_output);
-  }
+}; // struct simd16x32<T>
 
-  simdutf_warn_unused size_t
-  convert_utf32_to_latin1(const char32_t *buf, size_t len,
-                          char *latin1_output) const noexcept final override {
-    return set_best()->convert_utf32_to_latin1(buf, len, latin1_output);
-  }
+template <>
+simdutf_really_inline uint64_t simd16x32<uint16_t>::not_in_range(
+    const uint16_t low, const uint16_t high) const {
+  const simd16<uint16_t> mask_low = simd16<uint16_t>::splat(low);
+  const simd16<uint16_t> mask_high = simd16<uint16_t>::splat(high);
+  simd16x32<uint16_t> x(simd16<uint16_t>((this->chunks[0] > mask_high) |
+                                         (this->chunks[0] < mask_low)),
+                        simd16<uint16_t>((this->chunks[1] > mask_high) |
+                                         (this->chunks[1] < mask_low)),
+                        simd16<uint16_t>((this->chunks[2] > mask_high) |
+                                         (this->chunks[2] < mask_low)),
+                        simd16<uint16_t>((this->chunks[3] > mask_high) |
+                                         (this->chunks[3] < mask_low)));
+  return x.to_bitmask();
+}
+/* end file src/simdutf/lsx/simd16-inl.h */
+} // namespace simd
+} // unnamed namespace
+} // namespace lsx
+} // namespace simdutf
 
-  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(
-      const char32_t *buf, size_t len,
-      char *latin1_output) const noexcept final override {
-    return set_best()->convert_utf32_to_latin1_with_errors(buf, len,
-                                                           latin1_output);
-  }
+#endif // SIMDUTF_LSX_SIMD_H
+/* end file src/simdutf/lsx/simd.h */
 
-  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(
-      const char32_t *buf, size_t len,
-      char *latin1_output) const noexcept final override {
-    return set_best()->convert_utf32_to_latin1(buf, len, latin1_output);
-  }
+/* begin file src/simdutf/lsx/end.h */
+/* end file src/simdutf/lsx/end.h */
 
-  simdutf_warn_unused size_t
-  convert_utf32_to_utf8(const char32_t *buf, size_t len,
-                        char *utf8_output) const noexcept final override {
-    return set_best()->convert_utf32_to_utf8(buf, len, utf8_output);
-  }
+#endif // SIMDUTF_IMPLEMENTATION_LSX
 
-  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
-      const char32_t *buf, size_t len,
-      char *utf8_output) const noexcept final override {
-    return set_best()->convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
-  }
+#endif // SIMDUTF_LSX_H
+/* end file src/simdutf/lsx.h */
+/* begin file src/simdutf/lasx.h */
+#ifndef SIMDUTF_LASX_H
+#define SIMDUTF_LASX_H
 
-  simdutf_warn_unused size_t
-  convert_valid_utf32_to_utf8(const char32_t *buf, size_t len,
-                              char *utf8_output) const noexcept final override {
-    return set_best()->convert_valid_utf32_to_utf8(buf, len, utf8_output);
-  }
+#ifdef SIMDUTF_FALLBACK_H
+  #error "lasx.h must be included before fallback.h"
+#endif
 
-  simdutf_warn_unused size_t convert_utf32_to_utf16le(
-      const char32_t *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_utf32_to_utf16le(buf, len, utf16_output);
-  }
 
-  simdutf_warn_unused size_t convert_utf32_to_utf16be(
-      const char32_t *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_utf32_to_utf16be(buf, len, utf16_output);
-  }
+#ifndef SIMDUTF_IMPLEMENTATION_LASX
+  #define SIMDUTF_IMPLEMENTATION_LASX (SIMDUTF_IS_LASX)
+#endif
+#if SIMDUTF_IMPLEMENTATION_LASX && SIMDUTF_IS_LASX
+  #define SIMDUTF_CAN_ALWAYS_RUN_LASX 1
+#else
+  #define SIMDUTF_CAN_ALWAYS_RUN_LASX 0
+#endif
 
-  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
-      const char32_t *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_utf32_to_utf16le_with_errors(buf, len,
-                                                            utf16_output);
-  }
+#define SIMDUTF_CAN_ALWAYS_RUN_FALLBACK (SIMDUTF_IMPLEMENTATION_FALLBACK)
 
-  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
-      const char32_t *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_utf32_to_utf16be_with_errors(buf, len,
-                                                            utf16_output);
-  }
+#if SIMDUTF_IMPLEMENTATION_LASX
 
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(
-      const char32_t *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_valid_utf32_to_utf16le(buf, len, utf16_output);
-  }
+namespace simdutf {
+/**
+ * Implementation for LoongArch ASX.
+ */
+namespace lasx {} // namespace lasx
+} // namespace simdutf
 
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(
-      const char32_t *buf, size_t len,
-      char16_t *utf16_output) const noexcept final override {
-    return set_best()->convert_valid_utf32_to_utf16be(buf, len, utf16_output);
-  }
+/* begin file src/simdutf/lasx/implementation.h */
+#ifndef SIMDUTF_LASX_IMPLEMENTATION_H
+#define SIMDUTF_LASX_IMPLEMENTATION_H
 
-  simdutf_warn_unused size_t convert_utf16le_to_utf32(
-      const char16_t *buf, size_t len,
-      char32_t *utf32_output) const noexcept final override {
-    return set_best()->convert_utf16le_to_utf32(buf, len, utf32_output);
-  }
 
-  simdutf_warn_unused size_t convert_utf16be_to_utf32(
-      const char16_t *buf, size_t len,
-      char32_t *utf32_output) const noexcept final override {
-    return set_best()->convert_utf16be_to_utf32(buf, len, utf32_output);
-  }
+namespace simdutf {
+namespace lasx {
+
+namespace {
+using namespace simdutf;
+}
 
+class implementation final : public simdutf::implementation {
+public:
+  simdutf_really_inline implementation()
+      : simdutf::implementation("lasx", "LOONGARCH ASX",
+                                internal::instruction_set::LSX |
+                                    internal::instruction_set::LASX) {}
+  simdutf_warn_unused int detect_encodings(const char *input,
+                                           size_t length) const noexcept final;
+  simdutf_warn_unused bool validate_utf8(const char *buf,
+                                         size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_ascii(const char *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16le(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16le_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16be_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf32(const char32_t *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf32_with_errors(
+      const char32_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf8(
+      const char *buf, size_t len, char *utf8_output) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+      const char *buf, size_t len, char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                          char *latin1_output) const noexcept final;
+  simdutf_warn_unused result
+  convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                      char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_latin1(const char32_t *buf, size_t len,
+                                char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16le(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16be(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16le(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16be(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
   simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
       const char16_t *buf, size_t len,
-      char32_t *utf32_output) const noexcept final override {
-    return set_best()->convert_utf16le_to_utf32_with_errors(buf, len,
-                                                            utf32_output);
-  }
-
+      char32_t *utf32_buffer) const noexcept final;
   simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
       const char16_t *buf, size_t len,
-      char32_t *utf32_output) const noexcept final override {
-    return set_best()->convert_utf16be_to_utf32_with_errors(buf, len,
-                                                            utf32_output);
-  }
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  void change_endianness_utf16(const char16_t *buf, size_t length,
+                               char16_t *output) const noexcept final;
+  simdutf_warn_unused size_t count_utf16le(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf16be(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf8(const char *buf,
+                                        size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16le(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16be(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16le(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16be(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf16(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf32(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_latin1(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char *input, size_t length) const noexcept;
+  simdutf_warn_unused result
+  base64_to_binary(const char *input, size_t length, char *output,
+                   base64_options options) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused result
+  base64_to_binary(const char16_t *input, size_t length, char *output,
+                   base64_options options) const noexcept;
+  simdutf_warn_unused size_t base64_length_from_binary(
+      size_t length, base64_options options) const noexcept;
+  size_t binary_to_base64(const char *input, size_t length, char *output,
+                          base64_options options) const noexcept;
 
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(
-      const char16_t *buf, size_t len,
-      char32_t *utf32_output) const noexcept final override {
-    return set_best()->convert_valid_utf16le_to_utf32(buf, len, utf32_output);
-  }
+  simdutf_warn_unused virtual result
+  base64_to_binary(const char *input, size_t length, char *output,
+                   base64_options options,
+                   last_chunk_handling_options last_chunk_options =
+                       last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused virtual full_result base64_to_binary_details(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused virtual result
+  base64_to_binary(const char16_t *input, size_t length, char *output,
+                   base64_options options,
+                   last_chunk_handling_options last_chunk_options =
+                       last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused virtual full_result base64_to_binary_details(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
+};
 
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(
-      const char16_t *buf, size_t len,
-      char32_t *utf32_output) const noexcept final override {
-    return set_best()->convert_valid_utf16be_to_utf32(buf, len, utf32_output);
-  }
+} // namespace lasx
+} // namespace simdutf
 
-  void change_endianness_utf16(const char16_t *buf, size_t len,
-                               char16_t *output) const noexcept final override {
-    set_best()->change_endianness_utf16(buf, len, output);
-  }
+#endif // SIMDUTF_LASX_IMPLEMENTATION_H
+/* end file src/simdutf/lasx/implementation.h */
 
-  simdutf_warn_unused size_t
-  count_utf16le(const char16_t *buf, size_t len) const noexcept final override {
-    return set_best()->count_utf16le(buf, len);
-  }
+/* begin file src/simdutf/lasx/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "lasx"
+// #define SIMDUTF_IMPLEMENTATION lasx
+/* end file src/simdutf/lasx/begin.h */
 
-  simdutf_warn_unused size_t
-  count_utf16be(const char16_t *buf, size_t len) const noexcept final override {
-    return set_best()->count_utf16be(buf, len);
-  }
+  // Declarations
+/* begin file src/simdutf/lasx/intrinsics.h */
+#ifndef SIMDUTF_LASX_INTRINSICS_H
+#define SIMDUTF_LASX_INTRINSICS_H
 
-  simdutf_warn_unused size_t
-  count_utf8(const char *buf, size_t len) const noexcept final override {
-    return set_best()->count_utf8(buf, len);
-  }
 
-  simdutf_warn_unused size_t
-  latin1_length_from_utf8(const char *buf, size_t len) const noexcept override {
-    return set_best()->latin1_length_from_utf8(buf, len);
-  }
+// This should be the correct header whether
+// you use visual studio or other compilers.
+#include <lsxintrin.h>
+#include <lasxintrin.h>
+
+#if defined(__loongarch_asx)
+  #ifdef __clang__
+    #define VREGS_PREFIX "$vr"
+    #define XREGS_PREFIX "$xr"
+  #else // GCC
+    #define VREGS_PREFIX "$f"
+    #define XREGS_PREFIX "$f"
+  #endif
+  #define __ALL_REGS                                                           \
+    "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,"  \
+    "27,28,29,30,31"
+// Convert __m128i to __m256i
+static inline __m256i ____m256i(__m128i in) {
+  __m256i out = __lasx_xvldi(0);
+  __asm__ volatile(".irp i," __ALL_REGS "\n\t"
+                   " .ifc %[out], " XREGS_PREFIX "\\i    \n\t"
+                   "  .irp j," __ALL_REGS "\n\t"
+                   "   .ifc %[in], " VREGS_PREFIX "\\j  \n\t"
+                   "    xvpermi.q $xr\\i, $xr\\j, 0x0  \n\t"
+                   "   .endif                           \n\t"
+                   "  .endr                             \n\t"
+                   " .endif                             \n\t"
+                   ".endr                               \n\t"
+                   : [out] "+f"(out)
+                   : [in] "f"(in));
+  return out;
+}
+// Combine two __m128i values (high half, low half) into one __m256i
+static inline __m256i lasx_set_q(__m128i inhi, __m128i inlo) {
+  __m256i out;
+  __asm__ volatile(".irp i," __ALL_REGS "\n\t"
+                   " .ifc %[hi], " VREGS_PREFIX "\\i    \n\t"
+                   "  .irp j," __ALL_REGS "\n\t"
+                   "   .ifc %[lo], " VREGS_PREFIX "\\j  \n\t"
+                   "    xvpermi.q $xr\\i, $xr\\j, 0x20  \n\t"
+                   "   .endif                           \n\t"
+                   "  .endr                             \n\t"
+                   " .endif                             \n\t"
+                   ".endr                               \n\t"
+                   ".ifnc %[out], %[hi]                 \n\t"
+                   ".irp i," __ALL_REGS "\n\t"
+                   " .ifc %[out], " XREGS_PREFIX "\\i   \n\t"
+                   "  .irp j," __ALL_REGS "\n\t"
+                   "   .ifc %[hi], " VREGS_PREFIX "\\j  \n\t"
+                   "    xvori.b $xr\\i, $xr\\j, 0       \n\t"
+                   "   .endif                           \n\t"
+                   "  .endr                             \n\t"
+                   " .endif                             \n\t"
+                   ".endr                               \n\t"
+                   ".endif                              \n\t"
+                   : [out] "=f"(out), [hi] "+f"(inhi)
+                   : [lo] "f"(inlo));
+  return out;
+}
+// Extract the low 128 bits of a __m256i as a __m128i
+static inline __m128i lasx_extracti128_lo(__m256i in) {
+  __m128i out;
+  __asm__ volatile(".ifnc %[out], %[in]                 \n\t"
+                   ".irp i," __ALL_REGS "\n\t"
+                   " .ifc %[out], " VREGS_PREFIX "\\i   \n\t"
+                   "  .irp j," __ALL_REGS "\n\t"
+                   "   .ifc %[in], " XREGS_PREFIX "\\j  \n\t"
+                   "    vori.b $vr\\i, $vr\\j, 0        \n\t"
+                   "   .endif                           \n\t"
+                   "  .endr                             \n\t"
+                   " .endif                             \n\t"
+                   ".endr                               \n\t"
+                   ".endif                              \n\t"
+                   : [out] "=f"(out)
+                   : [in] "f"(in));
+  return out;
+}
+// Extract the high 128 bits of a __m256i as a __m128i
+static inline __m128i lasx_extracti128_hi(__m256i in) {
+  __m128i out;
+  __asm__ volatile(".irp i," __ALL_REGS "\n\t"
+                   " .ifc %[out], " VREGS_PREFIX "\\i   \n\t"
+                   "  .irp j," __ALL_REGS "\n\t"
+                   "   .ifc %[in], " XREGS_PREFIX "\\j  \n\t"
+                   "    xvpermi.q $xr\\i, $xr\\j, 0x11  \n\t"
+                   "   .endif                           \n\t"
+                   "  .endr                             \n\t"
+                   " .endif                             \n\t"
+                   ".endr                               \n\t"
+                   : [out] "=f"(out)
+                   : [in] "f"(in));
+  return out;
+}
+#endif
 
-  simdutf_warn_unused size_t
-  latin1_length_from_utf16(size_t len) const noexcept override {
-    return set_best()->latin1_length_from_utf16(len);
-  }
+#endif // SIMDUTF_LASX_INTRINSICS_H
+/* end file src/simdutf/lasx/intrinsics.h */
+/* begin file src/simdutf/lasx/bitmanipulation.h */
+#ifndef SIMDUTF_LASX_BITMANIPULATION_H
+#define SIMDUTF_LASX_BITMANIPULATION_H
 
-  simdutf_warn_unused size_t
-  latin1_length_from_utf32(size_t len) const noexcept override {
-    return set_best()->latin1_length_from_utf32(len);
-  }
+#include <limits>
 
-  simdutf_warn_unused size_t
-  utf8_length_from_latin1(const char *buf, size_t len) const noexcept override {
-    return set_best()->utf8_length_from_latin1(buf, len);
-  }
+namespace simdutf {
+namespace lasx {
+namespace {
 
-  simdutf_warn_unused size_t utf8_length_from_utf16le(
-      const char16_t *buf, size_t len) const noexcept override {
-    return set_best()->utf8_length_from_utf16le(buf, len);
-  }
+simdutf_really_inline int count_ones(uint64_t input_num) {
+  return __lsx_vpickve2gr_w(__lsx_vpcnt_d(__lsx_vreplgr2vr_d(input_num)), 0);
+}
 
-  simdutf_warn_unused size_t utf8_length_from_utf16be(
-      const char16_t *buf, size_t len) const noexcept override {
-    return set_best()->utf8_length_from_utf16be(buf, len);
-  }
+#if SIMDUTF_NEED_TRAILING_ZEROES
+simdutf_really_inline int trailing_zeroes(uint64_t input_num) {
+  return __builtin_ctzll(input_num);
+}
+#endif
 
-  simdutf_warn_unused size_t
-  utf16_length_from_latin1(size_t len) const noexcept override {
-    return set_best()->utf16_length_from_latin1(len);
-  }
+} // unnamed namespace
+} // namespace lasx
+} // namespace simdutf
 
-  simdutf_warn_unused size_t
-  utf32_length_from_latin1(size_t len) const noexcept override {
-    return set_best()->utf32_length_from_latin1(len);
-  }
+#endif // SIMDUTF_LASX_BITMANIPULATION_H
+/* end file src/simdutf/lasx/bitmanipulation.h */
+/* begin file src/simdutf/lasx/simd.h */
+#ifndef SIMDUTF_LASX_SIMD_H
+#define SIMDUTF_LASX_SIMD_H
 
-  simdutf_warn_unused size_t utf32_length_from_utf16le(
-      const char16_t *buf, size_t len) const noexcept override {
-    return set_best()->utf32_length_from_utf16le(buf, len);
-  }
+#include <type_traits>
 
-  simdutf_warn_unused size_t utf32_length_from_utf16be(
-      const char16_t *buf, size_t len) const noexcept override {
-    return set_best()->utf32_length_from_utf16be(buf, len);
-  }
+namespace simdutf {
+namespace lasx {
+namespace {
+namespace simd {
 
-  simdutf_warn_unused size_t
-  utf16_length_from_utf8(const char *buf, size_t len) const noexcept override {
-    return set_best()->utf16_length_from_utf8(buf, len);
-  }
+__attribute__((aligned(32))) static const uint8_t prev_shuf_table[32][32] = {
+    {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
+     0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},
+    {0,  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+     31, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14},
+    {0,  0,  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
+     30, 31, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13},
+    {0,  0,  0,  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
+     29, 30, 31, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12},
+    {0,  0,  0,  0,  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
+     28, 29, 30, 31, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11},
+    {0,  0,  0,  0,  0,  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+     27, 28, 29, 30, 31, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
+    {0,  0,  0,  0,  0,  0,  0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
+     26, 27, 28, 29, 30, 31, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
+    {0,  0,  0,  0,  0,  0,  0,  0, 1, 2, 3, 4, 5, 6, 7, 8,
+     25, 26, 27, 28, 29, 30, 31, 0, 1, 2, 3, 4, 5, 6, 7, 8},
+    {0,  0,  0,  0,  0,  0,  0,  0,  0, 1, 2, 3, 4, 5, 6, 7,
+     24, 25, 26, 27, 28, 29, 30, 31, 0, 1, 2, 3, 4, 5, 6, 7},
+    {0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 1, 2, 3, 4, 5, 6,
+     23, 24, 25, 26, 27, 28, 29, 30, 31, 0, 1, 2, 3, 4, 5, 6},
+    {0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 1, 2, 3, 4, 5,
+     22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 0, 1, 2, 3, 4, 5},
+    {0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 1, 2, 3, 4,
+     21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 0, 1, 2, 3, 4},
+    {0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 1, 2, 3,
+     20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 0, 1, 2, 3},
+    {0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 1, 2,
+     19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 0, 1, 2},
+    {0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 1,
+     18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 0, 1},
+    {0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
+     17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 0},
+    {15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+     15, 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0},
+    {14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
+     14, 15, 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0},
+    {13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
+     13, 14, 15, 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0},
+    {12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
+     12, 13, 14, 15, 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0},
+    {11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
+     11, 12, 13, 14, 15, 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0},
+    {10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
+     10, 11, 12, 13, 14, 15, 0,  0,  0,  0,  0,  0,  0,  0,  0,  0},
+    {9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
+     9, 10, 11, 12, 13, 14, 15, 0,  0,  0,  0,  0,  0,  0,  0,  0},
+    {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
+     8, 9, 10, 11, 12, 13, 14, 15, 0,  0,  0,  0,  0,  0,  0,  0},
+    {7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
+     7, 8, 9, 10, 11, 12, 13, 14, 15, 0,  0,  0,  0,  0,  0,  0},
+    {6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
+     6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0,  0,  0,  0,  0,  0},
+    {5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+     5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0,  0,  0,  0,  0},
+    {4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
+     4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0,  0,  0,  0},
+    {3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
+     3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0,  0,  0},
+    {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
+     2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0,  0},
+    {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+     1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0},
+    {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
+     0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},
+};
 
-  simdutf_warn_unused size_t utf8_length_from_utf32(
-      const char32_t *buf, size_t len) const noexcept override {
-    return set_best()->utf8_length_from_utf32(buf, len);
-  }
+__attribute__((aligned(32))) static const uint8_t bitsel_mask_table[32][32] = {
+    {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
+    {0xFF, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
+    {0xFF, 0xFF, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
+    {0xFF, 0xFF, 0xFF, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0, 0x0, 0x0, 0x0, 0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0, 0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0, 0x0, 0x0, 0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0, 0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0, 0x0, 0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0, 0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0, 0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0,  0x0,  0x0,  0x0,  0x0,  0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0,  0x0,  0x0,  0x0,  0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0,  0x0,  0x0,  0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0,  0x0,  0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0,  0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0x0,  0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0,  0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0,  0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0,  0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0,  0x0},
+    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x0}};
 
-  simdutf_warn_unused size_t utf16_length_from_utf32(
-      const char32_t *buf, size_t len) const noexcept override {
-    return set_best()->utf16_length_from_utf32(buf, len);
-  }
+// Defined first so that it can be used by splat and friends.
+template <typename Child> struct base {
+  __m256i value;
 
-  simdutf_warn_unused size_t
-  utf32_length_from_utf8(const char *buf, size_t len) const noexcept override {
-    return set_best()->utf32_length_from_utf8(buf, len);
-  }
+  // Zero constructor
+  simdutf_really_inline base() : value{__m256i()} {}
 
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(
-      const char *input, size_t length) const noexcept override {
-    return set_best()->maximal_binary_length_from_base64(input, length);
+  // Conversion from SIMD register
+  simdutf_really_inline base(const __m256i _value) : value(_value) {}
+  // Conversion to SIMD register
+  simdutf_really_inline operator const __m256i &() const { return this->value; }
+  simdutf_really_inline operator __m256i &() { return this->value; }
+  template <endianness big_endian>
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *ptr) const {
+    if (big_endian) {
+      __m256i zero = __lasx_xvldi(0);
+      __m256i in8 = __lasx_xvpermi_d(this->value, 0b11011000);
+      __m256i inlow = __lasx_xvilvl_b(in8, zero);
+      __m256i inhigh = __lasx_xvilvh_b(in8, zero);
+      __lasx_xvst(inlow, reinterpret_cast<uint16_t *>(ptr), 0);
+      __lasx_xvst(inhigh, reinterpret_cast<uint16_t *>(ptr), 32);
+    } else {
+      __m256i inlow = __lasx_vext2xv_hu_bu(this->value);
+      __m256i inhigh = __lasx_vext2xv_hu_bu(
+          __lasx_xvpermi_q(this->value, this->value, 0b00000001));
+      __lasx_xvst(inlow, reinterpret_cast<__m256i *>(ptr), 0);
+      __lasx_xvst(inhigh, reinterpret_cast<__m256i *>(ptr), 32);
+    }
   }
+  simdutf_really_inline void store_ascii_as_utf32(char32_t *ptr) const {
+    __m256i in32_0 = __lasx_vext2xv_wu_bu(this->value);
+    __lasx_xvst(in32_0, reinterpret_cast<uint32_t *>(ptr), 0);
 
-  simdutf_warn_unused result base64_to_binary(
-      const char *input, size_t length, char *output, base64_options options,
-      last_chunk_handling_options last_chunk_handling_options =
-          last_chunk_handling_options::loose) const noexcept override {
-    return set_best()->base64_to_binary(input, length, output, options,
-                                        last_chunk_handling_options);
-  }
+    __m256i in8_1 = __lasx_xvpermi_d(this->value, 0b00000001);
+    __m256i in32_1 = __lasx_vext2xv_wu_bu(in8_1);
+    __lasx_xvst(in32_1, reinterpret_cast<uint32_t *>(ptr), 32);
 
-  simdutf_warn_unused full_result base64_to_binary_details(
-      const char *input, size_t length, char *output, base64_options options,
-      last_chunk_handling_options last_chunk_handling_options =
-          last_chunk_handling_options::loose) const noexcept override {
-    return set_best()->base64_to_binary_details(input, length, output, options,
-                                                last_chunk_handling_options);
-  }
+    __m256i in8_2 = __lasx_xvpermi_d(this->value, 0b00000010);
+    __m256i in32_2 = __lasx_vext2xv_wu_bu(in8_2);
+    __lasx_xvst(in32_2, reinterpret_cast<uint32_t *>(ptr), 64);
 
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(
-      const char16_t *input, size_t length) const noexcept override {
-    return set_best()->maximal_binary_length_from_base64(input, length);
+    __m256i in8_3 = __lasx_xvpermi_d(this->value, 0b00000011);
+    __m256i in32_3 = __lasx_vext2xv_wu_bu(in8_3);
+    __lasx_xvst(in32_3, reinterpret_cast<uint32_t *>(ptr), 96);
   }
-
-  simdutf_warn_unused result base64_to_binary(
-      const char16_t *input, size_t length, char *output,
-      base64_options options,
-      last_chunk_handling_options last_chunk_handling_options =
-          last_chunk_handling_options::loose) const noexcept override {
-    return set_best()->base64_to_binary(input, length, output, options,
-                                        last_chunk_handling_options);
+  // Bit operations
+  simdutf_really_inline Child operator|(const Child other) const {
+    return __lasx_xvor_v(this->value, other);
   }
-
-  simdutf_warn_unused full_result base64_to_binary_details(
-      const char16_t *input, size_t length, char *output,
-      base64_options options,
-      last_chunk_handling_options last_chunk_handling_options =
-          last_chunk_handling_options::loose) const noexcept override {
-    return set_best()->base64_to_binary_details(input, length, output, options,
-                                                last_chunk_handling_options);
+  simdutf_really_inline Child operator&(const Child other) const {
+    return __lasx_xvand_v(this->value, other);
   }
-
-  simdutf_warn_unused size_t base64_length_from_binary(
-      size_t length, base64_options options) const noexcept override {
-    return set_best()->base64_length_from_binary(length, options);
+  simdutf_really_inline Child operator^(const Child other) const {
+    return __lasx_xvxor_v(this->value, other);
   }
-
-  size_t binary_to_base64(const char *input, size_t length, char *output,
-                          base64_options options) const noexcept override {
-    return set_best()->binary_to_base64(input, length, output, options);
+  simdutf_really_inline Child bit_andnot(const Child other) const {
+    return __lasx_xvandn_v(this->value, other);
+  }
+  simdutf_really_inline Child &operator|=(const Child other) {
+    auto this_cast = static_cast<Child *>(this);
+    *this_cast = *this_cast | other;
+    return *this_cast;
+  }
+  simdutf_really_inline Child &operator&=(const Child other) {
+    auto this_cast = static_cast<Child *>(this);
+    *this_cast = *this_cast & other;
+    return *this_cast;
+  }
+  simdutf_really_inline Child &operator^=(const Child other) {
+    auto this_cast = static_cast<Child *>(this);
+    *this_cast = *this_cast ^ other;
+    return *this_cast;
   }
+};
 
-  simdutf_really_inline
-  detect_best_supported_implementation_on_first_use() noexcept
-      : implementation("best_supported_detector",
-                       "Detects the best supported implementation and sets it",
-                       0) {}
+template <typename T> struct simd8;
 
-private:
-  const implementation *set_best() const noexcept;
-};
+template <typename T, typename Mask = simd8<bool>>
+struct base8 : base<simd8<T>> {
+  typedef uint32_t bitmask_t;
+  typedef uint64_t bitmask2_t;
 
-static_assert(std::is_trivially_destructible<
-                  detect_best_supported_implementation_on_first_use>::value,
-              "detect_best_supported_implementation_on_first_use should be "
-              "trivially destructible");
+  simdutf_really_inline base8() : base<simd8<T>>() {}
+  simdutf_really_inline base8(const __m256i _value) : base<simd8<T>>(_value) {}
+  simdutf_really_inline T first() const {
+    return __lasx_xvpickve2gr_wu(this->value, 0);
+  }
+  simdutf_really_inline T last() const {
+    return __lasx_xvpickve2gr_wu(this->value, 7);
+  }
+  friend simdutf_really_inline Mask operator==(const simd8<T> lhs,
+                                               const simd8<T> rhs) {
+    return __lasx_xvseq_b(lhs, rhs);
+  }
 
-static const std::initializer_list<const implementation *> &
-get_available_implementation_pointers() {
-  static const std::initializer_list<const implementation *>
-      available_implementation_pointers{
-#if SIMDUTF_IMPLEMENTATION_ICELAKE
-          get_icelake_singleton(),
-#endif
-#if SIMDUTF_IMPLEMENTATION_HASWELL
-          get_haswell_singleton(),
-#endif
-#if SIMDUTF_IMPLEMENTATION_WESTMERE
-          get_westmere_singleton(),
-#endif
-#if SIMDUTF_IMPLEMENTATION_ARM64
-          get_arm64_singleton(),
-#endif
-#if SIMDUTF_IMPLEMENTATION_PPC64
-          get_ppc64_singleton(),
-#endif
-#if SIMDUTF_IMPLEMENTATION_RVV
-          get_rvv_singleton(),
-#endif
-#if SIMDUTF_IMPLEMENTATION_FALLBACK
-          get_fallback_singleton(),
-#endif
-      }; // available_implementation_pointers
-  return available_implementation_pointers;
-}
+  static const int SIZE = sizeof(base<T>::value);
 
-// So we can return UNSUPPORTED_ARCHITECTURE from the parser when there is no
-// support
-class unsupported_implementation final : public implementation {
-public:
-  simdutf_warn_unused int detect_encodings(const char *,
-                                           size_t) const noexcept override {
-    return encoding_type::unspecified;
+  template <int N = 1>
+  simdutf_really_inline simd8<T> prev(const simd8<T> prev_chunk) const {
+    if (!N)
+      return this->value;
+
+    __m256i zero = __lasx_xvldi(0);
+    __m256i result, shuf;
+    if (N < 16) {
+      shuf = __lasx_xvld(prev_shuf_table[N], 0);
+
+      result = __lasx_xvshuf_b(
+          __lasx_xvpermi_q(this->value, this->value, 0b00000001), this->value,
+          shuf);
+      __m256i srl_prev = __lasx_xvbsrl_v(
+          __lasx_xvpermi_q(zero, prev_chunk.value, 0b00110001), (16 - N));
+      __m256i mask = __lasx_xvld(bitsel_mask_table[N], 0);
+      result = __lasx_xvbitsel_v(result, srl_prev, mask);
+
+      return result;
+    } else if (N == 16) {
+      return __lasx_xvpermi_q(this->value, prev_chunk.value, 0b00100001);
+    } /*else {
+      __m256i sll_value = __lasx_xvbsll_v(
+          __lasx_xvpermi_q(zero, this->value, 0b00000011), (N - 16) % 32);
+      __m256i mask = __lasx_xvld(bitsel_mask_table[N], 0);
+      shuf = __lasx_xvld(prev_shuf_table[N], 0);
+      result = __lasx_xvshuf_b(
+          __lasx_xvpermi_q(prev_chunk.value, prev_chunk.value, 0b00000001),
+          prev_chunk.value, shuf);
+      result = __lasx_xvbitsel_v(sll_value, result, mask);
+      return result;
+    }*/
   }
+};
 
-  simdutf_warn_unused bool validate_utf8(const char *,
-                                         size_t) const noexcept final override {
-    return false; // Just refuse to validate. Given that we have a fallback
-                  // implementation
-    // it seems unlikely that unsupported_implementation will ever be used. If
-    // it is used, then it will flag all strings as invalid. The alternative is
-    // to return an error_code from which the user has to figure out whether the
-    // string is valid UTF-8... which seems like a lot of work just to handle
-    // the very unlikely case that we have an unsupported implementation. And,
-    // when it does happen (that we have an unsupported implementation), what
-    // are the chances that the programmer has a fallback? Given that *we*
-    // provide the fallback, it implies that the programmer would need a
-    // fallback for our fallback.
+// SIMD byte mask type (returned by things like eq and gt)
+template <> struct simd8<bool> : base8<bool> {
+  static simdutf_really_inline simd8<bool> splat(bool _value) {
+    return __lasx_xvreplgr2vr_b(uint8_t(-(!!_value)));
   }
 
-  simdutf_warn_unused result validate_utf8_with_errors(
-      const char *, size_t) const noexcept final override {
-    return result(error_code::OTHER, 0);
-  }
+  simdutf_really_inline simd8() : base8() {}
+  simdutf_really_inline simd8(const __m256i _value) : base8<bool>(_value) {}
+  // Splat constructor
+  simdutf_really_inline simd8(bool _value) : base8<bool>(splat(_value)) {}
 
-  simdutf_warn_unused bool
-  validate_ascii(const char *, size_t) const noexcept final override {
-    return false;
+  simdutf_really_inline uint32_t to_bitmask() const {
+    __m256i mask = __lasx_xvmsknz_b(this->value);
+    uint32_t mask0 = __lasx_xvpickve2gr_wu(mask, 0);
+    uint32_t mask1 = __lasx_xvpickve2gr_wu(mask, 4);
+    return (mask0 | (mask1 << 16));
   }
-
-  simdutf_warn_unused result validate_ascii_with_errors(
-      const char *, size_t) const noexcept final override {
-    return result(error_code::OTHER, 0);
+  simdutf_really_inline bool any() const {
+    if (__lasx_xbz_b(this->value))
+      return false;
+    return true;
   }
-
-  simdutf_warn_unused bool
-  validate_utf16le(const char16_t *, size_t) const noexcept final override {
+  simdutf_really_inline bool none() const {
+    if (__lasx_xbz_b(this->value))
+      return true;
     return false;
   }
-
-  simdutf_warn_unused bool
-  validate_utf16be(const char16_t *, size_t) const noexcept final override {
+  simdutf_really_inline bool all() const {
+    if (__lasx_xbnz_b(this->value))
+      return true;
     return false;
   }
+  simdutf_really_inline simd8<bool> operator~() const { return *this ^ true; }
+};
 
-  simdutf_warn_unused result validate_utf16le_with_errors(
-      const char16_t *, size_t) const noexcept final override {
-    return result(error_code::OTHER, 0);
-  }
-
-  simdutf_warn_unused result validate_utf16be_with_errors(
-      const char16_t *, size_t) const noexcept final override {
-    return result(error_code::OTHER, 0);
+template <typename T> struct base8_numeric : base8<T> {
+  static simdutf_really_inline simd8<T> splat(T _value) {
+    return __lasx_xvreplgr2vr_b(_value);
   }
-
-  simdutf_warn_unused bool
-  validate_utf32(const char32_t *, size_t) const noexcept final override {
-    return false;
+  static simdutf_really_inline simd8<T> zero() { return __lasx_xvldi(0); }
+  static simdutf_really_inline simd8<T> load(const T values[32]) {
+    return __lasx_xvld(reinterpret_cast<const __m256i *>(values), 0);
   }
-
-  simdutf_warn_unused result validate_utf32_with_errors(
-      const char32_t *, size_t) const noexcept final override {
-    return result(error_code::OTHER, 0);
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  static simdutf_really_inline simd8<T> repeat_16(T v0, T v1, T v2, T v3, T v4,
+                                                  T v5, T v6, T v7, T v8, T v9,
+                                                  T v10, T v11, T v12, T v13,
+                                                  T v14, T v15) {
+    return simd8<T>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13,
+                    v14, v15, v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11,
+                    v12, v13, v14, v15);
   }
 
-  simdutf_warn_unused size_t convert_latin1_to_utf8(
-      const char *, size_t, char *) const noexcept final override {
-    return 0;
-  }
+  simdutf_really_inline base8_numeric() : base8<T>() {}
+  simdutf_really_inline base8_numeric(const __m256i _value)
+      : base8<T>(_value) {}
 
-  simdutf_warn_unused size_t convert_latin1_to_utf16le(
-      const char *, size_t, char16_t *) const noexcept final override {
-    return 0;
+  // Store to array
+  simdutf_really_inline void store(T dst[32]) const {
+    return __lasx_xvst(this->value, reinterpret_cast<__m256i *>(dst), 0);
   }
 
-  simdutf_warn_unused size_t convert_latin1_to_utf16be(
-      const char *, size_t, char16_t *) const noexcept final override {
-    return 0;
+  // Addition/subtraction are the same for signed and unsigned
+  simdutf_really_inline simd8<T> operator+(const simd8<T> other) const {
+    return __lasx_xvadd_b(this->value, other);
   }
-
-  simdutf_warn_unused size_t convert_latin1_to_utf32(
-      const char *, size_t, char32_t *) const noexcept final override {
-    return 0;
+  simdutf_really_inline simd8<T> operator-(const simd8<T> other) const {
+    return __lasx_xvsub_b(this->value, other);
   }
-
-  simdutf_warn_unused size_t convert_utf8_to_latin1(
-      const char *, size_t, char *) const noexcept final override {
-    return 0;
+  simdutf_really_inline simd8<T> &operator+=(const simd8<T> other) {
+    *this = *this + other;
+    return *static_cast<simd8<T> *>(this);
   }
-
-  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
-      const char *, size_t, char *) const noexcept final override {
-    return result(error_code::OTHER, 0);
+  simdutf_really_inline simd8<T> &operator-=(const simd8<T> other) {
+    *this = *this - other;
+    return *static_cast<simd8<T> *>(this);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
-      const char *, size_t, char *) const noexcept final override {
-    return 0;
-  }
+  // Override to distinguish from bool version
+  simdutf_really_inline simd8<T> operator~() const { return *this ^ 0xFFu; }
 
-  simdutf_warn_unused size_t convert_utf8_to_utf16le(
-      const char *, size_t, char16_t *) const noexcept final override {
-    return 0;
+  // Perform a lookup assuming the value is between 0 and 15 (undefined behavior
+  // for out of range values)
+  template <typename L>
+  simdutf_really_inline simd8<L> lookup_16(simd8<L> lookup_table) const {
+    __m256i origin = __lasx_xvand_v(this->value, __lasx_xvldi(0x1f));
+    return __lasx_xvshuf_b(__lasx_xvldi(0), lookup_table, origin);
   }
 
-  simdutf_warn_unused size_t convert_utf8_to_utf16be(
-      const char *, size_t, char16_t *) const noexcept final override {
-    return 0;
+  template <typename L>
+  simdutf_really_inline simd8<L>
+  lookup_16(L replace0, L replace1, L replace2, L replace3, L replace4,
+            L replace5, L replace6, L replace7, L replace8, L replace9,
+            L replace10, L replace11, L replace12, L replace13, L replace14,
+            L replace15) const {
+    return lookup_16(simd8<L>::repeat_16(
+        replace0, replace1, replace2, replace3, replace4, replace5, replace6,
+        replace7, replace8, replace9, replace10, replace11, replace12,
+        replace13, replace14, replace15));
   }
+};
 
-  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
-      const char *, size_t, char16_t *) const noexcept final override {
-    return result(error_code::OTHER, 0);
-  }
+// Signed bytes
+template <> struct simd8<int8_t> : base8_numeric<int8_t> {
+  simdutf_really_inline simd8() : base8_numeric<int8_t>() {}
+  simdutf_really_inline simd8(const __m256i _value)
+      : base8_numeric<int8_t>(_value) {}
 
-  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
-      const char *, size_t, char16_t *) const noexcept final override {
-    return result(error_code::OTHER, 0);
+  // Splat constructor
+  simdutf_really_inline simd8(int8_t _value) : simd8(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd8(const int8_t values[32]) : simd8(load(values)) {}
+  simdutf_really_inline operator simd8<uint8_t>() const;
+  // Member-by-member initialization
+  simdutf_really_inline
+  simd8(int8_t v0, int8_t v1, int8_t v2, int8_t v3, int8_t v4, int8_t v5,
+        int8_t v6, int8_t v7, int8_t v8, int8_t v9, int8_t v10, int8_t v11,
+        int8_t v12, int8_t v13, int8_t v14, int8_t v15, int8_t v16, int8_t v17,
+        int8_t v18, int8_t v19, int8_t v20, int8_t v21, int8_t v22, int8_t v23,
+        int8_t v24, int8_t v25, int8_t v26, int8_t v27, int8_t v28, int8_t v29,
+        int8_t v30, int8_t v31)
+      : simd8((__m256i)v32i8{v0,  v1,  v2,  v3,  v4,  v5,  v6,  v7,
+                             v8,  v9,  v10, v11, v12, v13, v14, v15,
+                             v16, v17, v18, v19, v20, v21, v22, v23,
+                             v24, v25, v26, v27, v28, v29, v30, v31}) {}
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  simdutf_really_inline static simd8<int8_t>
+  repeat_16(int8_t v0, int8_t v1, int8_t v2, int8_t v3, int8_t v4, int8_t v5,
+            int8_t v6, int8_t v7, int8_t v8, int8_t v9, int8_t v10, int8_t v11,
+            int8_t v12, int8_t v13, int8_t v14, int8_t v15) {
+    return simd8<int8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                         v13, v14, v15, v0, v1, v2, v3, v4, v5, v6, v7, v8, v9,
+                         v10, v11, v12, v13, v14, v15);
   }
-
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
-      const char *, size_t, char16_t *) const noexcept final override {
-    return 0;
+  simdutf_really_inline bool is_ascii() const {
+    __m256i ascii_mask = __lasx_xvslti_b(this->value, 0);
+    if (__lasx_xbnz_v(ascii_mask))
+      return false;
+    return true;
   }
-
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
-      const char *, size_t, char16_t *) const noexcept final override {
-    return 0;
+  // Order-sensitive comparisons
+  simdutf_really_inline simd8<int8_t> max_val(const simd8<int8_t> other) const {
+    return __lasx_xvmax_b(this->value, other);
   }
-
-  simdutf_warn_unused size_t convert_utf8_to_utf32(
-      const char *, size_t, char32_t *) const noexcept final override {
-    return 0;
+  simdutf_really_inline simd8<int8_t> min_val(const simd8<int8_t> other) const {
+    return __lasx_xvmin_b(this->value, other);
   }
-
-  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
-      const char *, size_t, char32_t *) const noexcept final override {
-    return result(error_code::OTHER, 0);
+  simdutf_really_inline simd8<bool> operator>(const simd8<int8_t> other) const {
+    return __lasx_xvslt_b(other, this->value);
   }
-
-  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
-      const char *, size_t, char32_t *) const noexcept final override {
-    return 0;
+  simdutf_really_inline simd8<bool> operator<(const simd8<int8_t> other) const {
+    return __lasx_xvslt_b(this->value, other);
   }
+};
 
-  simdutf_warn_unused size_t convert_utf16le_to_latin1(
-      const char16_t *, size_t, char *) const noexcept final override {
-    return 0;
+// Unsigned bytes
+template <> struct simd8<uint8_t> : base8_numeric<uint8_t> {
+  simdutf_really_inline simd8() : base8_numeric<uint8_t>() {}
+  simdutf_really_inline simd8(const __m256i _value)
+      : base8_numeric<uint8_t>(_value) {}
+  // Splat constructor
+  simdutf_really_inline simd8(uint8_t _value) : simd8(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd8(const uint8_t values[32]) : simd8(load(values)) {}
+  // Member-by-member initialization
+  simdutf_really_inline
+  simd8(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4, uint8_t v5,
+        uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9, uint8_t v10,
+        uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14, uint8_t v15,
+        uint8_t v16, uint8_t v17, uint8_t v18, uint8_t v19, uint8_t v20,
+        uint8_t v21, uint8_t v22, uint8_t v23, uint8_t v24, uint8_t v25,
+        uint8_t v26, uint8_t v27, uint8_t v28, uint8_t v29, uint8_t v30,
+        uint8_t v31)
+      : simd8((__m256i)v32u8{v0,  v1,  v2,  v3,  v4,  v5,  v6,  v7,
+                             v8,  v9,  v10, v11, v12, v13, v14, v15,
+                             v16, v17, v18, v19, v20, v21, v22, v23,
+                             v24, v25, v26, v27, v28, v29, v30, v31}) {}
+  // Repeat 16 values as many times as necessary (usually for lookup tables)
+  simdutf_really_inline static simd8<uint8_t>
+  repeat_16(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3, uint8_t v4,
+            uint8_t v5, uint8_t v6, uint8_t v7, uint8_t v8, uint8_t v9,
+            uint8_t v10, uint8_t v11, uint8_t v12, uint8_t v13, uint8_t v14,
+            uint8_t v15) {
+    return simd8<uint8_t>(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12,
+                          v13, v14, v15, v0, v1, v2, v3, v4, v5, v6, v7, v8, v9,
+                          v10, v11, v12, v13, v14, v15);
   }
 
-  simdutf_warn_unused size_t convert_utf16be_to_latin1(
-      const char16_t *, size_t, char *) const noexcept final override {
-    return 0;
+  // Saturated math
+  simdutf_really_inline simd8<uint8_t>
+  saturating_add(const simd8<uint8_t> other) const {
+    return __lasx_xvsadd_bu(this->value, other);
   }
-
-  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
-      const char16_t *, size_t, char *) const noexcept final override {
-    return result(error_code::OTHER, 0);
+  simdutf_really_inline simd8<uint8_t>
+  saturating_sub(const simd8<uint8_t> other) const {
+    return __lasx_xvssub_bu(this->value, other);
   }
 
-  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
-      const char16_t *, size_t, char *) const noexcept final override {
-    return result(error_code::OTHER, 0);
+  // Order-specific operations
+  simdutf_really_inline simd8<uint8_t>
+  max_val(const simd8<uint8_t> other) const {
+    return __lasx_xvmax_bu(*this, other);
   }
-
-  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(
-      const char16_t *, size_t, char *) const noexcept final override {
-    return 0;
+  simdutf_really_inline simd8<uint8_t>
+  min_val(const simd8<uint8_t> other) const {
+    return __lasx_xvmin_bu(*this, other);
   }
-
-  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(
-      const char16_t *, size_t, char *) const noexcept final override {
-    return 0;
+  // Same as >, but only guarantees true is nonzero (> guarantees true = -1)
+  simdutf_really_inline simd8<uint8_t>
+  gt_bits(const simd8<uint8_t> other) const {
+    return this->saturating_sub(other);
   }
-
-  simdutf_warn_unused size_t convert_utf16le_to_utf8(
-      const char16_t *, size_t, char *) const noexcept final override {
-    return 0;
+  // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
+  simdutf_really_inline simd8<uint8_t>
+  lt_bits(const simd8<uint8_t> other) const {
+    return other.saturating_sub(*this);
   }
-
-  simdutf_warn_unused size_t convert_utf16be_to_utf8(
-      const char16_t *, size_t, char *) const noexcept final override {
-    return 0;
+  simdutf_really_inline simd8<bool>
+  operator<=(const simd8<uint8_t> other) const {
+    return __lasx_xvsle_bu(*this, other);
   }
-
-  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
-      const char16_t *, size_t, char *) const noexcept final override {
-    return result(error_code::OTHER, 0);
+  simdutf_really_inline simd8<bool>
+  operator>=(const simd8<uint8_t> other) const {
+    return __lasx_xvsle_bu(other, *this);
   }
-
-  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
-      const char16_t *, size_t, char *) const noexcept final override {
-    return result(error_code::OTHER, 0);
+  simdutf_really_inline simd8<bool>
+  operator>(const simd8<uint8_t> other) const {
+    return __lasx_xvslt_bu(other, *this);
   }
-
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
-      const char16_t *, size_t, char *) const noexcept final override {
-    return 0;
+  simdutf_really_inline simd8<bool>
+  operator<(const simd8<uint8_t> other) const {
+    return __lasx_xvslt_bu(*this, other);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
-      const char16_t *, size_t, char *) const noexcept final override {
-    return 0;
+  // Bit-specific operations
+  simdutf_really_inline simd8<bool> bits_not_set() const {
+    return *this == uint8_t(0);
   }
-
-  simdutf_warn_unused size_t convert_utf32_to_latin1(
-      const char32_t *, size_t, char *) const noexcept final override {
-    return 0;
+  simdutf_really_inline simd8<bool> bits_not_set(simd8<uint8_t> bits) const {
+    return (*this & bits).bits_not_set();
   }
-
-  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(
-      const char32_t *, size_t, char *) const noexcept final override {
-    return result(error_code::OTHER, 0);
+  simdutf_really_inline simd8<bool> any_bits_set() const {
+    return ~this->bits_not_set();
   }
-
-  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(
-      const char32_t *, size_t, char *) const noexcept final override {
-    return 0;
+  simdutf_really_inline simd8<bool> any_bits_set(simd8<uint8_t> bits) const {
+    return ~this->bits_not_set(bits);
   }
-
-  simdutf_warn_unused size_t convert_utf32_to_utf8(
-      const char32_t *, size_t, char *) const noexcept final override {
-    return 0;
+  simdutf_really_inline bool is_ascii() const {
+    __m256i ascii_mask = __lasx_xvslti_b(this->value, 0);
+    if (__lasx_xbnz_v(ascii_mask))
+      return false;
+    return true;
   }
-
-  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
-      const char32_t *, size_t, char *) const noexcept final override {
-    return result(error_code::OTHER, 0);
+  simdutf_really_inline bool any_bits_set_anywhere() const {
+    if (__lasx_xbnz_v(this->value))
+      return true;
+    return false;
   }
-
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
-      const char32_t *, size_t, char *) const noexcept final override {
-    return 0;
+  simdutf_really_inline bool any_bits_set_anywhere(simd8<uint8_t> bits) const {
+    return (*this & bits).any_bits_set_anywhere();
   }
-
-  simdutf_warn_unused size_t convert_utf32_to_utf16le(
-      const char32_t *, size_t, char16_t *) const noexcept final override {
-    return 0;
+  template <int N> simdutf_really_inline simd8<uint8_t> shr() const {
+    return __lasx_xvsrli_b(this->value, N);
   }
-
-  simdutf_warn_unused size_t convert_utf32_to_utf16be(
-      const char32_t *, size_t, char16_t *) const noexcept final override {
-    return 0;
+  template <int N> simdutf_really_inline simd8<uint8_t> shl() const {
+    return __lasx_xvslli_b(this->value, N);
   }
+};
+simdutf_really_inline simd8<int8_t>::operator simd8<uint8_t>() const {
+  return this->value;
+}
 
-  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
-      const char32_t *, size_t, char16_t *) const noexcept final override {
-    return result(error_code::OTHER, 0);
-  }
+template <typename T> struct simd8x64 {
+  static constexpr int NUM_CHUNKS = 64 / sizeof(simd8<T>);
+  static_assert(NUM_CHUNKS == 2,
+                "LASX kernel should use two registers per 64-byte block.");
+  simd8<T> chunks[NUM_CHUNKS];
 
-  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
-      const char32_t *, size_t, char16_t *) const noexcept final override {
-    return result(error_code::OTHER, 0);
-  }
+  simd8x64(const simd8x64<T> &o) = delete; // no copy allowed
+  simd8x64<T> &
+  operator=(const simd8<T> other) = delete; // no assignment allowed
+  simd8x64() = delete;                      // no default constructor allowed
 
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(
-      const char32_t *, size_t, char16_t *) const noexcept final override {
-    return 0;
-  }
+  simdutf_really_inline simd8x64(const simd8<T> chunk0, const simd8<T> chunk1)
+      : chunks{chunk0, chunk1} {}
+  simdutf_really_inline simd8x64(const T *ptr)
+      : chunks{simd8<T>::load(ptr),
+               simd8<T>::load(ptr + sizeof(simd8<T>) / sizeof(T))} {}
 
-  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(
-      const char32_t *, size_t, char16_t *) const noexcept final override {
-    return 0;
+  simdutf_really_inline void store(T *ptr) const {
+    this->chunks[0].store(ptr + sizeof(simd8<T>) * 0 / sizeof(T));
+    this->chunks[1].store(ptr + sizeof(simd8<T>) * 1 / sizeof(T));
   }
 
-  simdutf_warn_unused size_t convert_utf16le_to_utf32(
-      const char16_t *, size_t, char32_t *) const noexcept final override {
-    return 0;
+  simdutf_really_inline uint64_t to_bitmask() const {
+    uint64_t r_lo = uint32_t(this->chunks[0].to_bitmask());
+    uint64_t r_hi = this->chunks[1].to_bitmask();
+    return r_lo | (r_hi << 32);
   }
 
-  simdutf_warn_unused size_t convert_utf16be_to_utf32(
-      const char16_t *, size_t, char32_t *) const noexcept final override {
-    return 0;
+  simdutf_really_inline simd8x64<T> &operator|=(const simd8x64<T> &other) {
+    this->chunks[0] |= other.chunks[0];
+    this->chunks[1] |= other.chunks[1];
+    return *this;
   }
 
-  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
-      const char16_t *, size_t, char32_t *) const noexcept final override {
-    return result(error_code::OTHER, 0);
+  simdutf_really_inline simd8<T> reduce_or() const {
+    return this->chunks[0] | this->chunks[1];
   }
 
-  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
-      const char16_t *, size_t, char32_t *) const noexcept final override {
-    return result(error_code::OTHER, 0);
+  simdutf_really_inline bool is_ascii() const {
+    return this->reduce_or().is_ascii();
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(
-      const char16_t *, size_t, char32_t *) const noexcept final override {
-    return 0;
+  template <endianness endian>
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *ptr) const {
+    this->chunks[0].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 0);
+    this->chunks[1].template store_ascii_as_utf16<endian>(ptr +
+                                                          sizeof(simd8<T>) * 1);
   }
 
-  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(
-      const char16_t *, size_t, char32_t *) const noexcept final override {
-    return 0;
+  simdutf_really_inline void store_ascii_as_utf32(char32_t *ptr) const {
+    this->chunks[0].store_ascii_as_utf32(ptr + sizeof(simd8<T>) * 0);
+    this->chunks[1].store_ascii_as_utf32(ptr + sizeof(simd8<T>) * 1);
   }
 
-  void change_endianness_utf16(const char16_t *, size_t,
-                               char16_t *) const noexcept final override {}
-
-  simdutf_warn_unused size_t
-  count_utf16le(const char16_t *, size_t) const noexcept final override {
-    return 0;
+  simdutf_really_inline simd8x64<T> bit_or(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<T>(this->chunks[0] | mask, this->chunks[1] | mask);
   }
 
-  simdutf_warn_unused size_t
-  count_utf16be(const char16_t *, size_t) const noexcept final override {
-    return 0;
+  simdutf_really_inline uint64_t eq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] == mask, this->chunks[1] == mask)
+        .to_bitmask();
   }
 
-  simdutf_warn_unused size_t count_utf8(const char *,
-                                        size_t) const noexcept final override {
-    return 0;
+  simdutf_really_inline uint64_t eq(const simd8x64<uint8_t> &other) const {
+    return simd8x64<bool>(this->chunks[0] == other.chunks[0],
+                          this->chunks[1] == other.chunks[1])
+        .to_bitmask();
   }
 
-  simdutf_warn_unused size_t
-  latin1_length_from_utf8(const char *, size_t) const noexcept override {
-    return 0;
+  simdutf_really_inline uint64_t lteq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] <= mask, this->chunks[1] <= mask)
+        .to_bitmask();
   }
 
-  simdutf_warn_unused size_t
-  latin1_length_from_utf16(size_t) const noexcept override {
-    return 0;
+  simdutf_really_inline uint64_t in_range(const T low, const T high) const {
+    const simd8<T> mask_low = simd8<T>::splat(low);
+    const simd8<T> mask_high = simd8<T>::splat(high);
+
+    return simd8x64<bool>(
+               (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
+               (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
+    const simd8<T> mask_low = simd8<T>::splat(low);
+    const simd8<T> mask_high = simd8<T>::splat(high);
+    return simd8x64<bool>(
+               (this->chunks[0] > mask_high) | (this->chunks[0] < mask_low),
+               (this->chunks[1] > mask_high) | (this->chunks[1] < mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t lt(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] < mask, this->chunks[1] < mask)
+        .to_bitmask();
   }
 
-  simdutf_warn_unused size_t
-  latin1_length_from_utf32(size_t) const noexcept override {
-    return 0;
+  simdutf_really_inline uint64_t gt(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] > mask, this->chunks[1] > mask)
+        .to_bitmask();
   }
-  simdutf_warn_unused size_t
-  utf8_length_from_latin1(const char *, size_t) const noexcept override {
-    return 0;
+  simdutf_really_inline uint64_t gteq(const T m) const {
+    const simd8<T> mask = simd8<T>::splat(m);
+    return simd8x64<bool>(this->chunks[0] >= mask, this->chunks[1] >= mask)
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t gteq_unsigned(const uint8_t m) const {
+    const simd8<uint8_t> mask = simd8<uint8_t>::splat(m);
+    return simd8x64<bool>((simd8<uint8_t>(__m256i(this->chunks[0])) >= mask),
+                          (simd8<uint8_t>(__m256i(this->chunks[1])) >= mask))
+        .to_bitmask();
   }
+}; // struct simd8x64<T>
 
-  simdutf_warn_unused size_t
-  utf8_length_from_utf16le(const char16_t *, size_t) const noexcept override {
-    return 0;
+/* begin file src/simdutf/lasx/simd16-inl.h */
+template <typename T> struct simd16;
+
+template <typename T, typename Mask = simd16<bool>>
+struct base16 : base<simd16<T>> {
+  using bitmask_type = uint32_t;
+
+  simdutf_really_inline base16() : base<simd16<T>>() {}
+  simdutf_really_inline base16(const __m256i _value)
+      : base<simd16<T>>(_value) {}
+  template <typename Pointer>
+  simdutf_really_inline base16(const Pointer *ptr)
+      : base16(__lasx_xvld(reinterpret_cast<const __m256i *>(ptr), 0)) {}
+  friend simdutf_really_inline Mask operator==(const simd16<T> lhs,
+                                               const simd16<T> rhs) {
+    return __lasx_xvseq_h(lhs.value, rhs.value);
   }
 
-  simdutf_warn_unused size_t
-  utf8_length_from_utf16be(const char16_t *, size_t) const noexcept override {
-    return 0;
+  /// the size of vector in bytes
+  static const int SIZE = sizeof(base<simd16<T>>::value);
+
+  /// the number of elements of type T a vector can hold
+  static const int ELEMENTS = SIZE / sizeof(T);
+
+  template <int N = 1>
+  simdutf_really_inline simd16<T> prev(const simd16<T> prev_chunk) const {
+    if (!N)
+      return this->value;
+
+    __m256i zero = __lasx_xvldi(0);
+    __m256i result, shuf;
+    if (N < 8) {
+      shuf = __lasx_xvld(prev_shuf_table[N * 2], 0);
+
+      result = __lasx_xvshuf_b(
+          __lasx_xvpermi_q(this->value, this->value, 0b00000001), this->value,
+          shuf);
+      __m256i srl_prev = __lasx_xvbsrl_v(
+          __lasx_xvpermi_q(zero, prev_chunk, 0b00110001), (16 - N * 2));
+      __m256i mask = __lasx_xvld(bitsel_mask_table[N], 0);
+      result = __lasx_xvbitsel_v(result, srl_prev, mask);
+
+      return result;
+    } else if (N == 8) {
+      return __lasx_xvpermi_q(this->value, prev_chunk, 0b00100001);
+    } else {
+      __m256i sll_value = __lasx_xvbsll_v(
+          __lasx_xvpermi_q(zero, this->value, 0b00000011), (N * 2 - 16));
+      __m256i mask = __lasx_xvld(bitsel_mask_table[N * 2], 0);
+      shuf = __lasx_xvld(prev_shuf_table[N * 2], 0);
+      result =
+          __lasx_xvshuf_b(__lasx_xvpermi_q(prev_chunk, prev_chunk, 0b00000001),
+                          prev_chunk, shuf);
+      result = __lasx_xvbitsel_v(sll_value, result, mask);
+      return result;
+    }
   }
+};
 
-  simdutf_warn_unused size_t
-  utf32_length_from_utf16le(const char16_t *, size_t) const noexcept override {
-    return 0;
+// SIMD 16-bit mask type (returned by things like eq and gt)
+template <> struct simd16<bool> : base16<bool> {
+  static simdutf_really_inline simd16<bool> splat(bool _value) {
+    return __lasx_xvreplgr2vr_h(uint16_t(-(!!_value)));
   }
 
-  simdutf_warn_unused size_t
-  utf32_length_from_utf16be(const char16_t *, size_t) const noexcept override {
-    return 0;
+  simdutf_really_inline simd16() : base16() {}
+  simdutf_really_inline simd16(const __m256i _value) : base16<bool>(_value) {}
+  // Splat constructor
+  simdutf_really_inline simd16(bool _value) : base16<bool>(splat(_value)) {}
+
+  simdutf_really_inline bitmask_type to_bitmask() const {
+    __m256i mask = __lasx_xvmsknz_b(this->value);
+    bitmask_type mask0 = __lasx_xvpickve2gr_wu(mask, 0);
+    bitmask_type mask1 = __lasx_xvpickve2gr_wu(mask, 4);
+    return (mask0 | (mask1 << 16));
+  }
+  simdutf_really_inline bool any() const {
+    if (__lasx_xbz_v(this->value))
+      return false;
+    return true;
+  }
+  simdutf_really_inline simd16<bool> operator~() const { return *this ^ true; }
+};
+
+template <typename T> struct base16_numeric : base16<T> {
+  static simdutf_really_inline simd16<T> splat(T _value) {
+    return __lasx_xvreplgr2vr_h((uint16_t)_value);
+  }
+  static simdutf_really_inline simd16<T> zero() { return __lasx_xvldi(0); }
+  static simdutf_really_inline simd16<T> load(const T values[16]) {
+    return __lasx_xvld(reinterpret_cast<const __m256i *>(values), 0);
   }
 
-  simdutf_warn_unused size_t
-  utf32_length_from_latin1(size_t) const noexcept override {
-    return 0;
+  simdutf_really_inline base16_numeric() : base16<T>() {}
+  simdutf_really_inline base16_numeric(const __m256i _value)
+      : base16<T>(_value) {}
+
+  // Store to array
+  simdutf_really_inline void store(T dst[16]) const {
+    return __lasx_xvst(this->value, reinterpret_cast<__m256i *>(dst), 0);
   }
 
-  simdutf_warn_unused size_t
-  utf16_length_from_utf8(const char *, size_t) const noexcept override {
-    return 0;
+  // Override to distinguish from bool version
+  simdutf_really_inline simd16<T> operator~() const { return *this ^ 0xFFFFu; }
+
+  // Addition/subtraction are the same for signed and unsigned
+  simdutf_really_inline simd16<T> operator+(const simd16<T> other) const {
+    return __lasx_xvadd_h(*this, other);
   }
-  simdutf_warn_unused size_t
-  utf16_length_from_latin1(size_t) const noexcept override {
-    return 0;
+  simdutf_really_inline simd16<T> operator-(const simd16<T> other) const {
+    return __lasx_xvsub_h(*this, other);
   }
-  simdutf_warn_unused size_t
-  utf8_length_from_utf32(const char32_t *, size_t) const noexcept override {
-    return 0;
+  simdutf_really_inline simd16<T> &operator+=(const simd16<T> other) {
+    *this = *this + other;
+    return *static_cast<simd16<T> *>(this);
+  }
+  simdutf_really_inline simd16<T> &operator-=(const simd16<T> other) {
+    *this = *this - other;
+    return *static_cast<simd16<T> *>(this);
   }
+};
 
-  simdutf_warn_unused size_t
-  utf16_length_from_utf32(const char32_t *, size_t) const noexcept override {
-    return 0;
+// Signed code units
+template <> struct simd16<int16_t> : base16_numeric<int16_t> {
+  simdutf_really_inline simd16() : base16_numeric<int16_t>() {}
+  simdutf_really_inline simd16(const __m256i _value)
+      : base16_numeric<int16_t>(_value) {}
+  // Splat constructor
+  simdutf_really_inline simd16(int16_t _value) : simd16(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd16(const int16_t *values) : simd16(load(values)) {}
+  simdutf_really_inline simd16(const char16_t *values)
+      : simd16(load(reinterpret_cast<const int16_t *>(values))) {}
+  // Order-sensitive comparisons
+  simdutf_really_inline simd16<int16_t>
+  max_val(const simd16<int16_t> other) const {
+    return __lasx_xvmax_h(*this, other);
+  }
+  simdutf_really_inline simd16<int16_t>
+  min_val(const simd16<int16_t> other) const {
+    return __lasx_xvmin_h(*this, other);
+  }
+  simdutf_really_inline simd16<bool>
+  operator>(const simd16<int16_t> other) const {
+    return __lasx_xvslt_h(other.value, this->value);
+  }
+  simdutf_really_inline simd16<bool>
+  operator<(const simd16<int16_t> other) const {
+    return __lasx_xvslt_h(this->value, other.value);
   }
+};
 
-  simdutf_warn_unused size_t
-  utf32_length_from_utf8(const char *, size_t) const noexcept override {
-    return 0;
+// Unsigned code units
+template <> struct simd16<uint16_t> : base16_numeric<uint16_t> {
+  simdutf_really_inline simd16() : base16_numeric<uint16_t>() {}
+  simdutf_really_inline simd16(const __m256i _value)
+      : base16_numeric<uint16_t>(_value) {}
+
+  // Splat constructor
+  simdutf_really_inline simd16(uint16_t _value) : simd16(splat(_value)) {}
+  // Array constructor
+  simdutf_really_inline simd16(const uint16_t *values) : simd16(load(values)) {}
+  simdutf_really_inline simd16(const char16_t *values)
+      : simd16(load(reinterpret_cast<const uint16_t *>(values))) {}
+
+  // Saturated math
+  simdutf_really_inline simd16<uint16_t>
+  saturating_add(const simd16<uint16_t> other) const {
+    return __lasx_xvsadd_hu(this->value, other.value);
+  }
+  simdutf_really_inline simd16<uint16_t>
+  saturating_sub(const simd16<uint16_t> other) const {
+    return __lasx_xvssub_hu(this->value, other.value);
   }
 
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(
-      const char *, size_t) const noexcept override {
-    return 0;
+  // Order-specific operations
+  simdutf_really_inline simd16<uint16_t>
+  max_val(const simd16<uint16_t> other) const {
+    return __lasx_xvmax_hu(this->value, other.value);
+  }
+  simdutf_really_inline simd16<uint16_t>
+  min_val(const simd16<uint16_t> other) const {
+    return __lasx_xvmin_hu(this->value, other.value);
+  }
+  // Same as >, but only guarantees true is nonzero (> guarantees true = -1)
+  simdutf_really_inline simd16<uint16_t>
+  gt_bits(const simd16<uint16_t> other) const {
+    return this->saturating_sub(other);
+  }
+  // Same as <, but only guarantees true is nonzero (< guarantees true = -1)
+  simdutf_really_inline simd16<uint16_t>
+  lt_bits(const simd16<uint16_t> other) const {
+    return other.saturating_sub(*this);
+  }
+  simdutf_really_inline simd16<bool>
+  operator<=(const simd16<uint16_t> other) const {
+    return __lasx_xvsle_hu(this->value, other.value);
+  }
+  simdutf_really_inline simd16<bool>
+  operator>=(const simd16<uint16_t> other) const {
+    return __lasx_xvsle_hu(other.value, this->value);
+  }
+  simdutf_really_inline simd16<bool>
+  operator>(const simd16<uint16_t> other) const {
+    return __lasx_xvslt_hu(other.value, this->value);
+  }
+  simdutf_really_inline simd16<bool>
+  operator<(const simd16<uint16_t> other) const {
+    return __lasx_xvslt_hu(this->value, other.value);
   }
 
-  simdutf_warn_unused result
-  base64_to_binary(const char *, size_t, char *, base64_options,
-                   last_chunk_handling_options) const noexcept override {
-    return result(error_code::OTHER, 0);
+  // Bit-specific operations
+  simdutf_really_inline simd16<bool> bits_not_set() const {
+    return *this == uint16_t(0);
+  }
+  simdutf_really_inline simd16<bool> bits_not_set(simd16<uint16_t> bits) const {
+    return (*this & bits).bits_not_set();
+  }
+  simdutf_really_inline simd16<bool> any_bits_set() const {
+    return ~this->bits_not_set();
+  }
+  simdutf_really_inline simd16<bool> any_bits_set(simd16<uint16_t> bits) const {
+    return ~this->bits_not_set(bits);
   }
 
-  simdutf_warn_unused full_result base64_to_binary_details(
-      const char *, size_t, char *, base64_options,
-      last_chunk_handling_options) const noexcept override {
-    return full_result(error_code::OTHER, 0, 0);
+  simdutf_really_inline bool any_bits_set_anywhere() const {
+    if (__lasx_xbnz_v(this->value))
+      return true;
+    return false;
+  }
+  simdutf_really_inline bool
+  any_bits_set_anywhere(simd16<uint16_t> bits) const {
+    return (*this & bits).any_bits_set_anywhere();
   }
 
-  simdutf_warn_unused size_t maximal_binary_length_from_base64(
-      const char16_t *, size_t) const noexcept override {
-    return 0;
+  template <int N> simdutf_really_inline simd16<uint16_t> shr() const {
+    return simd16<uint16_t>(__lasx_xvsrli_h(this->value, N));
+  }
+  template <int N> simdutf_really_inline simd16<uint16_t> shl() const {
+    return simd16<uint16_t>(__lasx_xvslli_h(this->value, N));
   }
 
-  simdutf_warn_unused result
-  base64_to_binary(const char16_t *, size_t, char *, base64_options,
-                   last_chunk_handling_options) const noexcept override {
-    return result(error_code::OTHER, 0);
+  // Change the endianness
+  simdutf_really_inline simd16<uint16_t> swap_bytes() const {
+    return __lasx_xvshuf4i_b(this->value, 0b10110001);
   }
 
-  simdutf_warn_unused full_result base64_to_binary_details(
-      const char16_t *, size_t, char *, base64_options,
-      last_chunk_handling_options) const noexcept override {
-    return full_result(error_code::OTHER, 0, 0);
+  // Pack two vectors of uint16_t code units into a single uint8_t vector,
+  // using unsigned saturation
+  static simdutf_really_inline simd8<uint8_t> pack(const simd16<uint16_t> &v0,
+                                                   const simd16<uint16_t> &v1) {
+    return __lasx_xvpermi_d(__lasx_xvssrlni_bu_h(v1.value, v0.value, 0),
+                            0b11011000);
   }
+};
 
-  simdutf_warn_unused size_t
-  base64_length_from_binary(size_t, base64_options) const noexcept override {
-    return 0;
+template <typename T> struct simd16x32 {
+  static constexpr int NUM_CHUNKS = 64 / sizeof(simd16<T>);
+  static_assert(NUM_CHUNKS == 2,
+                "LASX kernel should use two registers per 64-byte block.");
+  simd16<T> chunks[NUM_CHUNKS];
+
+  simd16x32(const simd16x32<T> &o) = delete; // no copy allowed
+  simd16x32<T> &
+  operator=(const simd16<T> other) = delete; // no assignment allowed
+  simd16x32() = delete;                      // no default constructor allowed
+
+  simdutf_really_inline simd16x32(const simd16<T> chunk0,
+                                  const simd16<T> chunk1)
+      : chunks{chunk0, chunk1} {}
+  simdutf_really_inline simd16x32(const T *ptr)
+      : chunks{simd16<T>::load(ptr),
+               simd16<T>::load(ptr + sizeof(simd16<T>) / sizeof(T))} {}
+
+  simdutf_really_inline void store(T *ptr) const {
+    this->chunks[0].store(ptr + sizeof(simd16<T>) * 0 / sizeof(T));
+    this->chunks[1].store(ptr + sizeof(simd16<T>) * 1 / sizeof(T));
   }
 
-  size_t binary_to_base64(const char *, size_t, char *,
-                          base64_options) const noexcept override {
-    return 0;
+  simdutf_really_inline uint64_t to_bitmask() const {
+    uint64_t r_lo = uint32_t(this->chunks[0].to_bitmask());
+    uint64_t r_hi = this->chunks[1].to_bitmask();
+    return r_lo | (r_hi << 32);
   }
 
-  unsupported_implementation()
-      : implementation("unsupported",
-                       "Unsupported CPU (no detected SIMD instructions)", 0) {}
-};
+  simdutf_really_inline simd16<T> reduce_or() const {
+    return this->chunks[0] | this->chunks[1];
+  }
 
-const unsupported_implementation *get_unsupported_singleton() {
-  static const unsupported_implementation unsupported_singleton{};
-  return &unsupported_singleton;
-}
-static_assert(std::is_trivially_destructible<unsupported_implementation>::value,
-              "unsupported_singleton should be trivially destructible");
+  simdutf_really_inline bool is_ascii() const {
+    return this->reduce_or().is_ascii();
+  }
 
-size_t available_implementation_list::size() const noexcept {
-  return internal::get_available_implementation_pointers().size();
-}
-const implementation *const *
-available_implementation_list::begin() const noexcept {
-  return internal::get_available_implementation_pointers().begin();
-}
-const implementation *const *
-available_implementation_list::end() const noexcept {
-  return internal::get_available_implementation_pointers().end();
-}
-const implementation *
-available_implementation_list::detect_best_supported() const noexcept {
-  // They are prelisted in priority order, so we just go down the list
-  uint32_t supported_instruction_sets =
-      internal::detect_supported_architectures();
-  for (const implementation *impl :
-       internal::get_available_implementation_pointers()) {
-    uint32_t required_instruction_sets = impl->required_instruction_sets();
-    if ((supported_instruction_sets & required_instruction_sets) ==
-        required_instruction_sets) {
-      return impl;
-    }
+  simdutf_really_inline void store_ascii_as_utf16(char16_t *ptr) const {
+    this->chunks[0].store_ascii_as_utf16(ptr + sizeof(simd16<T>) * 0);
+    this->chunks[1].store_ascii_as_utf16(ptr + sizeof(simd16<T>));
   }
-  return get_unsupported_singleton(); // this should never happen?
-}
 
-const implementation *
-detect_best_supported_implementation_on_first_use::set_best() const noexcept {
-  SIMDUTF_PUSH_DISABLE_WARNINGS
-  SIMDUTF_DISABLE_DEPRECATED_WARNING // Disable CRT_SECURE warning on MSVC:
-                                     // manually verified this is safe
-      char *force_implementation_name = getenv("SIMDUTF_FORCE_IMPLEMENTATION");
-  SIMDUTF_POP_DISABLE_WARNINGS
+  simdutf_really_inline simd16x32<T> bit_or(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<T>(this->chunks[0] | mask, this->chunks[1] | mask);
+  }
 
-  if (force_implementation_name) {
-    auto force_implementation =
-        get_available_implementations()[force_implementation_name];
-    if (force_implementation) {
-      return get_active_implementation() = force_implementation;
-    } else {
-      // Note: abort() and stderr usage within the library is forbidden.
-      return get_active_implementation() = get_unsupported_singleton();
-    }
+  simdutf_really_inline void swap_bytes() {
+    this->chunks[0] = this->chunks[0].swap_bytes();
+    this->chunks[1] = this->chunks[1].swap_bytes();
   }
-  return get_active_implementation() =
-             get_available_implementations().detect_best_supported();
-}
 
-} // namespace internal
+  simdutf_really_inline uint64_t eq(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] == mask, this->chunks[1] == mask)
+        .to_bitmask();
+  }
 
-/**
- * The list of available implementations compiled into simdutf.
- */
-SIMDUTF_DLLIMPORTEXPORT const internal::available_implementation_list &
-get_available_implementations() {
-  static const internal::available_implementation_list
-      available_implementations{};
-  return available_implementations;
-}
+  simdutf_really_inline uint64_t eq(const simd16x32<uint16_t> &other) const {
+    return simd16x32<bool>(this->chunks[0] == other.chunks[0],
+                           this->chunks[1] == other.chunks[1])
+        .to_bitmask();
+  }
+
+  simdutf_really_inline uint64_t lteq(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] <= mask, this->chunks[1] <= mask)
+        .to_bitmask();
+  }
+
+  simdutf_really_inline uint64_t in_range(const T low, const T high) const {
+    const simd16<T> mask_low = simd16<T>::splat(low);
+    const simd16<T> mask_high = simd16<T>::splat(high);
+
+    return simd16x32<bool>(
+               (this->chunks[0] <= mask_high) & (this->chunks[0] >= mask_low),
+               (this->chunks[1] <= mask_high) & (this->chunks[1] >= mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t not_in_range(const T low, const T high) const {
+    const simd16<T> mask_low = simd16<T>::splat(static_cast<T>(low - 1));
+    const simd16<T> mask_high = simd16<T>::splat(static_cast<T>(high + 1));
+    return simd16x32<bool>(
+               (this->chunks[0] >= mask_high) | (this->chunks[0] <= mask_low),
+               (this->chunks[1] >= mask_high) | (this->chunks[1] <= mask_low))
+        .to_bitmask();
+  }
+  simdutf_really_inline uint64_t lt(const T m) const {
+    const simd16<T> mask = simd16<T>::splat(m);
+    return simd16x32<bool>(this->chunks[0] < mask, this->chunks[1] < mask)
+        .to_bitmask();
+  }
+}; // struct simd16x32<T>
+/* end file src/simdutf/lasx/simd16-inl.h */
+} // namespace simd
+} // unnamed namespace
+} // namespace lasx
+} // namespace simdutf
+
+#endif // SIMDUTF_LASX_SIMD_H
+/* end file src/simdutf/lasx/simd.h */
 
+/* begin file src/simdutf/lasx/end.h */
+/* end file src/simdutf/lasx/end.h */
+
+#endif // SIMDUTF_IMPLEMENTATION_LASX
+
+#endif // SIMDUTF_LASX_H
+/* end file src/simdutf/lasx.h */
+/* begin file src/simdutf/fallback.h */
+#ifndef SIMDUTF_FALLBACK_H
+#define SIMDUTF_FALLBACK_H
+
+
+// Note that fallback.h is always imported last.
+
+// Enable the fallback by default unless a built-in implementation that can
+// always run has already been selected.
+#ifndef SIMDUTF_IMPLEMENTATION_FALLBACK
+  #if SIMDUTF_CAN_ALWAYS_RUN_ARM64 || SIMDUTF_CAN_ALWAYS_RUN_ICELAKE ||        \
+      SIMDUTF_CAN_ALWAYS_RUN_HASWELL || SIMDUTF_CAN_ALWAYS_RUN_WESTMERE ||     \
+      SIMDUTF_CAN_ALWAYS_RUN_PPC64 || SIMDUTF_CAN_ALWAYS_RUN_RVV ||            \
+      SIMDUTF_CAN_ALWAYS_RUN_LSX || SIMDUTF_CAN_ALWAYS_RUN_LASX
+    #define SIMDUTF_IMPLEMENTATION_FALLBACK 0
+  #else
+    #define SIMDUTF_IMPLEMENTATION_FALLBACK 1
+  #endif
+#endif
+
+#define SIMDUTF_CAN_ALWAYS_RUN_FALLBACK (SIMDUTF_IMPLEMENTATION_FALLBACK)
+
+#if SIMDUTF_IMPLEMENTATION_FALLBACK
+
+namespace simdutf {
 /**
- * The active implementation.
+ * Fallback implementation (runs on any machine).
  */
-SIMDUTF_DLLIMPORTEXPORT internal::atomic_ptr<const implementation> &
-get_active_implementation() {
-#if SIMDUTF_SINGLE_IMPLEMENTATION
-  // skip runtime detection
-  static internal::atomic_ptr<const implementation> active_implementation{
-      internal::get_single_implementation()};
-  return active_implementation;
-#else
-  static const internal::detect_best_supported_implementation_on_first_use
-      detect_best_supported_implementation_on_first_use_singleton;
-  static internal::atomic_ptr<const implementation> active_implementation{
-      &detect_best_supported_implementation_on_first_use_singleton};
-  return active_implementation;
-#endif
-}
+namespace fallback {} // namespace fallback
+} // namespace simdutf
 
-#if SIMDUTF_SINGLE_IMPLEMENTATION
-const implementation *get_default_implementation() {
-  return internal::get_single_implementation();
-}
-#else
-internal::atomic_ptr<const implementation> &get_default_implementation() {
-  return get_active_implementation();
-}
-#endif
-#define SIMDUTF_GET_CURRENT_IMPLEMENTION
+/* begin file src/simdutf/fallback/implementation.h */
+#ifndef SIMDUTF_FALLBACK_IMPLEMENTATION_H
+#define SIMDUTF_FALLBACK_IMPLEMENTATION_H
 
-simdutf_warn_unused bool validate_utf8(const char *buf, size_t len) noexcept {
-  return get_default_implementation()->validate_utf8(buf, len);
-}
-simdutf_warn_unused result validate_utf8_with_errors(const char *buf,
-                                                     size_t len) noexcept {
-  return get_default_implementation()->validate_utf8_with_errors(buf, len);
-}
-simdutf_warn_unused bool validate_ascii(const char *buf, size_t len) noexcept {
-  return get_default_implementation()->validate_ascii(buf, len);
-}
-simdutf_warn_unused result validate_ascii_with_errors(const char *buf,
-                                                      size_t len) noexcept {
-  return get_default_implementation()->validate_ascii_with_errors(buf, len);
-}
-simdutf_warn_unused size_t convert_utf8_to_utf16(
-    const char *input, size_t length, char16_t *utf16_output) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_utf8_to_utf16be(input, length, utf16_output);
-#else
-  return convert_utf8_to_utf16le(input, length, utf16_output);
-#endif
-}
-simdutf_warn_unused size_t convert_latin1_to_utf8(const char *buf, size_t len,
-                                                  char *utf8_output) noexcept {
-  return get_default_implementation()->convert_latin1_to_utf8(buf, len,
-                                                              utf8_output);
-}
-simdutf_warn_unused size_t convert_latin1_to_utf16le(
-    const char *buf, size_t len, char16_t *utf16_output) noexcept {
-  return get_default_implementation()->convert_latin1_to_utf16le(buf, len,
-                                                                 utf16_output);
+
+namespace simdutf {
+namespace fallback {
+
+namespace {
+using namespace simdutf;
 }
-simdutf_warn_unused size_t convert_latin1_to_utf16be(
-    const char *buf, size_t len, char16_t *utf16_output) noexcept {
-  return get_default_implementation()->convert_latin1_to_utf16be(buf, len,
-                                                                 utf16_output);
-}
-simdutf_warn_unused size_t convert_latin1_to_utf32(
-    const char *buf, size_t len, char32_t *latin1_output) noexcept {
-  return get_default_implementation()->convert_latin1_to_utf32(buf, len,
-                                                               latin1_output);
+
+class implementation final : public simdutf::implementation {
+public:
+  simdutf_really_inline implementation()
+      : simdutf::implementation("fallback", "Generic fallback implementation",
+                                0) {}
+  simdutf_warn_unused int detect_encodings(const char *input,
+                                           size_t length) const noexcept final;
+  simdutf_warn_unused bool validate_utf8(const char *buf,
+                                         size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_utf8_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_ascii(const char *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result
+  validate_ascii_with_errors(const char *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16le(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
+                                            size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16le_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf16be_with_errors(
+      const char16_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused bool validate_utf32(const char32_t *buf,
+                                          size_t len) const noexcept final;
+  simdutf_warn_unused result validate_utf32_with_errors(
+      const char32_t *buf, size_t len) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf8(
+      const char *buf, size_t len, char *utf8_output) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_latin1_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+      const char *buf, size_t len, char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+      const char *buf, size_t len, char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+      const char *buf, size_t len, char16_t *utf16_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+      const char *buf, size_t len, char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+      const char *buf, size_t len, char32_t *utf32_output) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+      const char *buf, size_t len, char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_latin1(const char16_t *buf, size_t len,
+                                  char *latin1_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+      const char16_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+      const char32_t *buf, size_t len, char *utf8_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                          char *latin1_output) const noexcept final;
+  simdutf_warn_unused result
+  convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                      char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_latin1(const char32_t *buf, size_t len,
+                                char *latin1_output) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16le(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf16be(const char32_t *buf, size_t len,
+                           char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16le(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf16be(const char32_t *buf, size_t len,
+                                 char16_t *utf16_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_utf16be_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16le_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  simdutf_warn_unused size_t
+  convert_valid_utf16be_to_utf32(const char16_t *buf, size_t len,
+                                 char32_t *utf32_buffer) const noexcept final;
+  void change_endianness_utf16(const char16_t *buf, size_t length,
+                               char16_t *output) const noexcept final;
+  simdutf_warn_unused size_t count_utf16le(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf16be(const char16_t *buf,
+                                           size_t length) const noexcept;
+  simdutf_warn_unused size_t count_utf8(const char *buf,
+                                        size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16le(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16be(const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16le(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t utf32_length_from_utf16be(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_utf32(const char32_t *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf8(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf16(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  latin1_length_from_utf32(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf32_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf16_length_from_latin1(size_t length) const noexcept;
+  simdutf_warn_unused size_t
+  utf8_length_from_latin1(const char *input, size_t length) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char *input, size_t length) const noexcept;
+  simdutf_warn_unused result base64_to_binary(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char16_t *input, size_t length) const noexcept;
+  simdutf_warn_unused result base64_to_binary(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_options) const noexcept;
+  simdutf_warn_unused size_t base64_length_from_binary(
+      size_t length, base64_options options) const noexcept;
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_options =
+          last_chunk_handling_options::loose) const noexcept;
+  size_t binary_to_base64(const char *input, size_t length, char *output,
+                          base64_options options) const noexcept;
+};
+} // namespace fallback
+} // namespace simdutf
+
+#endif // SIMDUTF_FALLBACK_IMPLEMENTATION_H
+/* end file src/simdutf/fallback/implementation.h */
+
+/* begin file src/simdutf/fallback/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "fallback"
+// #define SIMDUTF_IMPLEMENTATION fallback
+/* end file src/simdutf/fallback/begin.h */
+
+  // Declarations
+/* begin file src/simdutf/fallback/bitmanipulation.h */
+#ifndef SIMDUTF_FALLBACK_BITMANIPULATION_H
+#define SIMDUTF_FALLBACK_BITMANIPULATION_H
+
+#include <limits>
+
+namespace simdutf {
+namespace fallback {
+namespace {} // unnamed namespace
+} // namespace fallback
+} // namespace simdutf
+
+#endif // SIMDUTF_FALLBACK_BITMANIPULATION_H
+/* end file src/simdutf/fallback/bitmanipulation.h */
+
+/* begin file src/simdutf/fallback/end.h */
+/* end file src/simdutf/fallback/end.h */
+
+#endif // SIMDUTF_IMPLEMENTATION_FALLBACK
+#endif // SIMDUTF_FALLBACK_H
+/* end file src/simdutf/fallback.h */
+
+/* begin file src/scalar/utf8.h */
+#ifndef SIMDUTF_UTF8_H
+#define SIMDUTF_UTF8_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf8 {
+#if SIMDUTF_IMPLEMENTATION_FALLBACK || SIMDUTF_IMPLEMENTATION_RVV
+// only used by the fallback kernel.
+// credit: based on code from Google Fuchsia (Apache Licensed)
+inline simdutf_warn_unused bool validate(const char *buf, size_t len) noexcept {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  uint64_t pos = 0;
+  uint32_t code_point = 0;
+  while (pos < len) {
+    // check if the next 16 bytes are ascii.
+    uint64_t next_pos = pos + 16;
+    if (next_pos <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
+      uint64_t v1;
+      std::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 | v2};
+      if ((v & 0x8080808080808080) == 0) {
+        pos = next_pos;
+        continue;
+      }
+    }
+    unsigned char byte = data[pos];
+
+    while (byte < 0b10000000) {
+      if (++pos == len) {
+        return true;
+      }
+      byte = data[pos];
+    }
+
+    if ((byte & 0b11100000) == 0b11000000) {
+      next_pos = pos + 2;
+      if (next_pos > len) {
+        return false;
+      }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return false;
+      }
+      // range check
+      code_point = (byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
+      if ((code_point < 0x80) || (0x7ff < code_point)) {
+        return false;
+      }
+    } else if ((byte & 0b11110000) == 0b11100000) {
+      next_pos = pos + 3;
+      if (next_pos > len) {
+        return false;
+      }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return false;
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return false;
+      }
+      // range check
+      code_point = (byte & 0b00001111) << 12 |
+                   (data[pos + 1] & 0b00111111) << 6 |
+                   (data[pos + 2] & 0b00111111);
+      if ((code_point < 0x800) || (0xffff < code_point) ||
+          (0xd7ff < code_point && code_point < 0xe000)) {
+        return false;
+      }
+    } else if ((byte & 0b11111000) == 0b11110000) { // 0b11110000
+      next_pos = pos + 4;
+      if (next_pos > len) {
+        return false;
+      }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return false;
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return false;
+      }
+      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
+        return false;
+      }
+      // range check
+      code_point =
+          (byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 |
+          (data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111);
+      if (code_point <= 0xffff || 0x10ffff < code_point) {
+        return false;
+      }
+    } else {
+      // we may have a continuation
+      return false;
+    }
+    pos = next_pos;
+  }
+  return true;
 }
-simdutf_warn_unused size_t convert_utf8_to_latin1(
-    const char *buf, size_t len, char *latin1_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_latin1(buf, len,
-                                                              latin1_output);
+#endif
+
+inline simdutf_warn_unused result validate_with_errors(const char *buf,
+                                                       size_t len) noexcept {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  uint32_t code_point = 0;
+  while (pos < len) {
+    // check if the next 16 bytes are ascii.
+    size_t next_pos = pos + 16;
+    if (next_pos <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
+      uint64_t v1;
+      std::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 | v2};
+      if ((v & 0x8080808080808080) == 0) {
+        pos = next_pos;
+        continue;
+      }
+    }
+    unsigned char byte = data[pos];
+
+    while (byte < 0b10000000) {
+      if (++pos == len) {
+        return result(error_code::SUCCESS, len);
+      }
+      byte = data[pos];
+    }
+
+    if ((byte & 0b11100000) == 0b11000000) {
+      next_pos = pos + 2;
+      if (next_pos > len) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      // range check
+      code_point = (byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
+      if ((code_point < 0x80) || (0x7ff < code_point)) {
+        return result(error_code::OVERLONG, pos);
+      }
+    } else if ((byte & 0b11110000) == 0b11100000) {
+      next_pos = pos + 3;
+      if (next_pos > len) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      // range check
+      code_point = (byte & 0b00001111) << 12 |
+                   (data[pos + 1] & 0b00111111) << 6 |
+                   (data[pos + 2] & 0b00111111);
+      if ((code_point < 0x800) || (0xffff < code_point)) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (0xd7ff < code_point && code_point < 0xe000) {
+        return result(error_code::SURROGATE, pos);
+      }
+    } else if ((byte & 0b11111000) == 0b11110000) { // 0b11110000
+      next_pos = pos + 4;
+      if (next_pos > len) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      // range check
+      code_point =
+          (byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 |
+          (data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111);
+      if (code_point <= 0xffff) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (0x10ffff < code_point) {
+        return result(error_code::TOO_LARGE, pos);
+      }
+    } else {
+      // we either have too many continuation bytes or an invalid leading byte
+      if ((byte & 0b11000000) == 0b10000000) {
+        return result(error_code::TOO_LONG, pos);
+      } else {
+        return result(error_code::HEADER_BITS, pos);
+      }
+    }
+    pos = next_pos;
+  }
+  return result(error_code::SUCCESS, len);
 }
-simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
-    const char *buf, size_t len, char *latin1_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_latin1_with_errors(
-      buf, len, latin1_output);
+
+// Finds the previous leading byte starting backward from buf and validates with
+// errors from there. Used to pinpoint the location of an error when an invalid
+// chunk is detected. We assume that the stream starts with a leading byte, and
+// to check that it is the case, we ask that you pass a pointer to the start of
+// the stream (start).
+inline simdutf_warn_unused result rewind_and_validate_with_errors(
+    const char *start, const char *buf, size_t len) noexcept {
+  // First check that we start with a leading byte
+  if ((*start & 0b11000000) == 0b10000000) {
+    return result(error_code::TOO_LONG, 0);
+  }
+  size_t extra_len{0};
+  // A leading byte cannot be further than 4 bytes away
+  for (int i = 0; i < 5; i++) {
+    unsigned char byte = *buf;
+    if ((byte & 0b11000000) != 0b10000000) {
+      break;
+    } else {
+      buf--;
+      extra_len++;
+    }
+  }
+
+  result res = validate_with_errors(buf, len + extra_len);
+  res.count -= extra_len;
+  return res;
 }
-simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
-    const char *buf, size_t len, char *latin1_output) noexcept {
-  return get_default_implementation()->convert_valid_utf8_to_latin1(
-      buf, len, latin1_output);
+
+inline size_t count_code_points(const char *buf, size_t len) {
+  const int8_t *p = reinterpret_cast<const int8_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    // -65 is 0b10111111; anything larger in two's complement should start a new
+    // code point.
+    if (p[i] > -65) {
+      counter++;
+    }
+  }
+  return counter;
 }
-simdutf_warn_unused size_t convert_utf8_to_utf16le(
-    const char *input, size_t length, char16_t *utf16_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_utf16le(input, length,
-                                                               utf16_output);
+
+inline size_t utf16_length_from_utf8(const char *buf, size_t len) {
+  const int8_t *p = reinterpret_cast<const int8_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    if (p[i] > -65) {
+      counter++;
+    }
+    if (uint8_t(p[i]) >= 240) {
+      counter++;
+    }
+  }
+  return counter;
 }
-simdutf_warn_unused size_t convert_utf8_to_utf16be(
-    const char *input, size_t length, char16_t *utf16_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_utf16be(input, length,
-                                                               utf16_output);
+
+simdutf_warn_unused inline size_t trim_partial_utf8(const char *input,
+                                                    size_t length) {
+  if (length < 3) {
+    switch (length) {
+    case 2:
+      if (uint8_t(input[length - 1]) >= 0xc0) {
+        return length - 1;
+      } // 2-, 3- and 4-byte characters with only 1 byte left
+      if (uint8_t(input[length - 2]) >= 0xe0) {
+        return length - 2;
+      } // 3- and 4-byte characters with only 2 bytes left
+      return length;
+    case 1:
+      if (uint8_t(input[length - 1]) >= 0xc0) {
+        return length - 1;
+      } // 2-, 3- and 4-byte characters with only 1 byte left
+      return length;
+    case 0:
+      return length;
+    }
+  }
+  if (uint8_t(input[length - 1]) >= 0xc0) {
+    return length - 1;
+  } // 2-, 3- and 4-byte characters with only 1 byte left
+  if (uint8_t(input[length - 2]) >= 0xe0) {
+    return length - 2;
+  } // 3- and 4-byte characters with only 2 bytes left
+  if (uint8_t(input[length - 3]) >= 0xf0) {
+    return length - 3;
+  } // 4-byte characters with only 3 bytes left
+  return length;
 }
-simdutf_warn_unused result convert_utf8_to_utf16_with_errors(
-    const char *input, size_t length, char16_t *utf16_output) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_utf8_to_utf16be_with_errors(input, length, utf16_output);
-#else
-  return convert_utf8_to_utf16le_with_errors(input, length, utf16_output);
+
+} // namespace utf8
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
 #endif
+/* end file src/scalar/utf8.h */
+/* begin file src/scalar/utf16.h */
+#ifndef SIMDUTF_UTF16_H
+#define SIMDUTF_UTF16_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf16 {
+
+inline simdutf_warn_unused uint16_t swap_bytes(const uint16_t word) {
+  return uint16_t((word >> 8) | (word << 8));
 }
-simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
-    const char *input, size_t length, char16_t *utf16_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_utf16le_with_errors(
-      input, length, utf16_output);
+
+template <endianness big_endian>
+inline simdutf_warn_unused bool validate(const char16_t *buf,
+                                         size_t len) noexcept {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  uint64_t pos = 0;
+  while (pos < len) {
+    uint16_t word =
+        !match_system(big_endian) ? swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xF800) == 0xD800) {
+      if (pos + 1 >= len) {
+        return false;
+      }
+      uint16_t diff = uint16_t(word - 0xD800);
+      if (diff > 0x3FF) {
+        return false;
+      }
+      uint16_t next_word =
+          !match_system(big_endian) ? swap_bytes(data[pos + 1]) : data[pos + 1];
+      uint16_t diff2 = uint16_t(next_word - 0xDC00);
+      if (diff2 > 0x3FF) {
+        return false;
+      }
+      pos += 2;
+    } else {
+      pos++;
+    }
+  }
+  return true;
 }
-simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
-    const char *input, size_t length, char16_t *utf16_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_utf16be_with_errors(
-      input, length, utf16_output);
+
+template <endianness big_endian>
+inline simdutf_warn_unused result validate_with_errors(const char16_t *buf,
+                                                       size_t len) noexcept {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  size_t pos = 0;
+  while (pos < len) {
+    uint16_t word =
+        !match_system(big_endian) ? swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xF800) == 0xD800) {
+      if (pos + 1 >= len) {
+        return result(error_code::SURROGATE, pos);
+      }
+      uint16_t diff = uint16_t(word - 0xD800);
+      if (diff > 0x3FF) {
+        return result(error_code::SURROGATE, pos);
+      }
+      uint16_t next_word =
+          !match_system(big_endian) ? swap_bytes(data[pos + 1]) : data[pos + 1];
+      uint16_t diff2 = uint16_t(next_word - 0xDC00);
+      if (diff2 > 0x3FF) {
+        return result(error_code::SURROGATE, pos);
+      }
+      pos += 2;
+    } else {
+      pos++;
+    }
+  }
+  return result(error_code::SUCCESS, pos);
 }
-simdutf_warn_unused size_t convert_utf8_to_utf32(
-    const char *input, size_t length, char32_t *utf32_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_utf32(input, length,
-                                                             utf32_output);
+
+template <endianness big_endian>
+inline size_t count_code_points(const char16_t *buf, size_t len) {
+  // We are not BOM aware.
+  const uint16_t *p = reinterpret_cast<const uint16_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    uint16_t word = !match_system(big_endian) ? swap_bytes(p[i]) : p[i];
+    counter += ((word & 0xFC00) != 0xDC00);
+  }
+  return counter;
 }
-simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
-    const char *input, size_t length, char32_t *utf32_output) noexcept {
-  return get_default_implementation()->convert_utf8_to_utf32_with_errors(
-      input, length, utf32_output);
+
+template <endianness big_endian>
+inline size_t utf8_length_from_utf16(const char16_t *buf, size_t len) {
+  // We are not BOM aware.
+  const uint16_t *p = reinterpret_cast<const uint16_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    uint16_t word = !match_system(big_endian) ? swap_bytes(p[i]) : p[i];
+    counter++; // ASCII
+    counter += static_cast<size_t>(
+        word >
+        0x7F); // non-ASCII is at least 2 bytes, surrogates are 2*2 == 4 bytes
+    counter += static_cast<size_t>((word > 0x7FF && word <= 0xD7FF) ||
+                                   (word >= 0xE000)); // three-byte
+  }
+  return counter;
 }
-simdutf_warn_unused bool validate_utf16(const char16_t *buf,
-                                        size_t len) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return validate_utf16be(buf, len);
-#else
-  return validate_utf16le(buf, len);
-#endif
+
+template <endianness big_endian>
+inline size_t utf32_length_from_utf16(const char16_t *buf, size_t len) {
+  // We are not BOM aware.
+  const uint16_t *p = reinterpret_cast<const uint16_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    uint16_t word = !match_system(big_endian) ? swap_bytes(p[i]) : p[i];
+    counter += ((word & 0xFC00) != 0xDC00);
+  }
+  return counter;
 }
-simdutf_warn_unused bool validate_utf16le(const char16_t *buf,
-                                          size_t len) noexcept {
-  return get_default_implementation()->validate_utf16le(buf, len);
+
+inline size_t latin1_length_from_utf16(size_t len) { return len; }
+
+simdutf_really_inline void change_endianness_utf16(const char16_t *in,
+                                                   size_t size, char16_t *out) {
+  const uint16_t *input = reinterpret_cast<const uint16_t *>(in);
+  uint16_t *output = reinterpret_cast<uint16_t *>(out);
+  for (size_t i = 0; i < size; i++) {
+    *output++ = uint16_t(input[i] >> 8 | input[i] << 8);
+  }
 }
-simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
-                                          size_t len) noexcept {
-  return get_default_implementation()->validate_utf16be(buf, len);
+
+template <endianness big_endian>
+simdutf_warn_unused inline size_t trim_partial_utf16(const char16_t *input,
+                                                     size_t length) {
+  if (length <= 1) {
+    return length;
+  }
+  uint16_t last_word = uint16_t(input[length - 1]);
+  last_word = !match_system(big_endian) ? swap_bytes(last_word) : last_word;
+  length -= ((last_word & 0xFC00) == 0xD800);
+  return length;
 }
-simdutf_warn_unused result validate_utf16_with_errors(const char16_t *buf,
-                                                      size_t len) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return validate_utf16be_with_errors(buf, len);
-#else
-  return validate_utf16le_with_errors(buf, len);
+
+} // namespace utf16
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
 #endif
+/* end file src/scalar/utf16.h */
+/* begin file src/scalar/utf32.h */
+#ifndef SIMDUTF_UTF32_H
+#define SIMDUTF_UTF32_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf32 {
+
+inline simdutf_warn_unused bool validate(const char32_t *buf,
+                                         size_t len) noexcept {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+  uint64_t pos = 0;
+  for (; pos < len; pos++) {
+    uint32_t word = data[pos];
+    if (word > 0x10FFFF || (word >= 0xD800 && word <= 0xDFFF)) {
+      return false;
+    }
+  }
+  return true;
 }
-simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf,
-                                                        size_t len) noexcept {
-  return get_default_implementation()->validate_utf16le_with_errors(buf, len);
-}
-simdutf_warn_unused result validate_utf16be_with_errors(const char16_t *buf,
-                                                        size_t len) noexcept {
-  return get_default_implementation()->validate_utf16be_with_errors(buf, len);
-}
-simdutf_warn_unused bool validate_utf32(const char32_t *buf,
-                                        size_t len) noexcept {
-  return get_default_implementation()->validate_utf32(buf, len);
-}
-simdutf_warn_unused result validate_utf32_with_errors(const char32_t *buf,
-                                                      size_t len) noexcept {
-  return get_default_implementation()->validate_utf32_with_errors(buf, len);
+
+inline simdutf_warn_unused result validate_with_errors(const char32_t *buf,
+                                                       size_t len) noexcept {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+  size_t pos = 0;
+  for (; pos < len; pos++) {
+    uint32_t word = data[pos];
+    if (word > 0x10FFFF) {
+      return result(error_code::TOO_LARGE, pos);
+    }
+    if (word >= 0xD800 && word <= 0xDFFF) {
+      return result(error_code::SURROGATE, pos);
+    }
+  }
+  return result(error_code::SUCCESS, pos);
 }
-simdutf_warn_unused size_t convert_valid_utf8_to_utf16(
-    const char *input, size_t length, char16_t *utf16_buffer) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_valid_utf8_to_utf16be(input, length, utf16_buffer);
-#else
-  return convert_valid_utf8_to_utf16le(input, length, utf16_buffer);
-#endif
+
+inline size_t utf8_length_from_utf32(const char32_t *buf, size_t len) {
+  // We are not BOM aware.
+  const uint32_t *p = reinterpret_cast<const uint32_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    // credit: @ttsugriy  for the vectorizable approach
+    counter++;                                     // ASCII
+    counter += static_cast<size_t>(p[i] > 0x7F);   // two-byte
+    counter += static_cast<size_t>(p[i] > 0x7FF);  // three-byte
+    counter += static_cast<size_t>(p[i] > 0xFFFF); // four-byte
+  }
+  return counter;
 }
-simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
-    const char *input, size_t length, char16_t *utf16_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf8_to_utf16le(
-      input, length, utf16_buffer);
+
+inline size_t utf16_length_from_utf32(const char32_t *buf, size_t len) {
+  // We are not BOM aware.
+  const uint32_t *p = reinterpret_cast<const uint32_t *>(buf);
+  size_t counter{0};
+  for (size_t i = 0; i < len; i++) {
+    counter++;                                     // non-surrogate word
+    counter += static_cast<size_t>(p[i] > 0xFFFF); // surrogate pair
+  }
+  return counter;
 }
-simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
-    const char *input, size_t length, char16_t *utf16_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf8_to_utf16be(
-      input, length, utf16_buffer);
+
+inline size_t latin1_length_from_utf32(size_t len) {
+  // We are not BOM aware.
+  return len; // a utf32 codepoint will always represent 1 latin1 character
 }
-simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
-    const char *input, size_t length, char32_t *utf32_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf8_to_utf32(
-      input, length, utf32_buffer);
+
+inline simdutf_warn_unused uint32_t swap_bytes(const uint32_t word) {
+  return ((word >> 24) & 0xff) |      // move byte 3 to byte 0
+         ((word << 8) & 0xff0000) |   // move byte 1 to byte 2
+         ((word >> 8) & 0xff00) |     // move byte 2 to byte 1
+         ((word << 24) & 0xff000000); // byte 0 to byte 3
 }
-simdutf_warn_unused size_t convert_utf16_to_utf8(const char16_t *buf,
-                                                 size_t len,
-                                                 char *utf8_buffer) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_utf16be_to_utf8(buf, len, utf8_buffer);
-#else
-  return convert_utf16le_to_utf8(buf, len, utf8_buffer);
+
+} // namespace utf32
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
 #endif
+/* end file src/scalar/utf32.h */
+/* begin file src/scalar/base64.h */
+#ifndef SIMDUTF_BASE64_H
+#define SIMDUTF_BASE64_H
+
+#include <cstddef>
+#include <cstdint>
+#include <cstring>
+#include <iostream>
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace base64 {
+
+// This function is not expected to be fast. Do not use in long loops.
+template <class char_type> bool is_ascii_white_space(char_type c) {
+  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
 }
-simdutf_warn_unused size_t convert_utf16_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_utf16be_to_latin1(buf, len, latin1_buffer);
-#else
-  return convert_utf16le_to_latin1(buf, len, latin1_buffer);
-#endif
+
+template <class char_type> bool is_ascii_white_space_or_padding(char_type c) {
+  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' ||
+         c == '=';
 }
-simdutf_warn_unused size_t convert_latin1_to_utf16(
-    const char *buf, size_t len, char16_t *utf16_output) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_latin1_to_utf16be(buf, len, utf16_output);
-#else
-  return convert_latin1_to_utf16le(buf, len, utf16_output);
-#endif
+
+template <class char_type> bool is_eight_byte(char_type c) {
+  if (sizeof(char_type) == 1) {
+    return true;
+  }
+  return uint8_t(c) == c;
 }
-simdutf_warn_unused size_t convert_utf16be_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
-  return get_default_implementation()->convert_utf16be_to_latin1(buf, len,
-                                                                 latin1_buffer);
+
+// Reports SUCCESS upon success. The destination buffer must be large enough.
+// This function assumes that the padding (=) has been removed.
+template <class char_type>
+full_result
+base64_tail_decode(char *dst, const char_type *src, size_t length,
+                   size_t padded_characters, // number of padding characters
+                                             // '=', typically 0, 1, 2.
+                   base64_options options,
+                   last_chunk_handling_options last_chunk_options) {
+  // This looks like 5 branches, but we expect the compiler to resolve this to a
+  // single branch:
+  const uint8_t *to_base64 = (options & base64_url)
+                                 ? tables::base64::to_base64_url_value
+                                 : tables::base64::to_base64_value;
+  const uint32_t *d0 = (options & base64_url)
+                           ? tables::base64::base64_url::d0
+                           : tables::base64::base64_default::d0;
+  const uint32_t *d1 = (options & base64_url)
+                           ? tables::base64::base64_url::d1
+                           : tables::base64::base64_default::d1;
+  const uint32_t *d2 = (options & base64_url)
+                           ? tables::base64::base64_url::d2
+                           : tables::base64::base64_default::d2;
+  const uint32_t *d3 = (options & base64_url)
+                           ? tables::base64::base64_url::d3
+                           : tables::base64::base64_default::d3;
+
+  const char_type *srcend = src + length;
+  const char_type *srcinit = src;
+  const char *dstinit = dst;
+
+  uint32_t x;
+  size_t idx;
+  uint8_t buffer[4];
+  while (true) {
+    while (src + 4 <= srcend && is_eight_byte(src[0]) &&
+           is_eight_byte(src[1]) && is_eight_byte(src[2]) &&
+           is_eight_byte(src[3]) &&
+           (x = d0[uint8_t(src[0])] | d1[uint8_t(src[1])] |
+                d2[uint8_t(src[2])] | d3[uint8_t(src[3])]) < 0x01FFFFFF) {
+      if (match_system(endianness::BIG)) {
+        x = scalar::utf32::swap_bytes(x);
+      }
+      std::memcpy(dst, &x, 3); // optimization opportunity: copy 4 bytes
+      dst += 3;
+      src += 4;
+    }
+    idx = 0;
+    // we need at least four characters.
+    while (idx < 4 && src < srcend) {
+      char_type c = *src;
+      uint8_t code = to_base64[uint8_t(c)];
+      buffer[idx] = uint8_t(code);
+      if (is_eight_byte(c) && code <= 63) {
+        idx++;
+      } else if (code > 64 || !scalar::base64::is_eight_byte(c)) {
+        return {INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
+      } else {
+        // We have a space or a newline. We ignore it.
+      }
+      src++;
+    }
+    if (idx != 4) {
+      if (last_chunk_options == last_chunk_handling_options::strict &&
+          (idx != 1) && ((idx + padded_characters) & 3) != 0) {
+        // The partial chunk was at src - idx
+        return {BASE64_INPUT_REMAINDER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
+      } else if (last_chunk_options ==
+                     last_chunk_handling_options::stop_before_partial &&
+                 (idx != 1) && ((idx + padded_characters) & 3) != 0) {
+        // Rewind src to before partial chunk
+        src -= idx;
+        return {SUCCESS, size_t(src - srcinit), size_t(dst - dstinit)};
+      } else {
+        if (idx == 2) {
+          uint32_t triple =
+              (uint32_t(buffer[0]) << 3 * 6) + (uint32_t(buffer[1]) << 2 * 6);
+          if ((last_chunk_options == last_chunk_handling_options::strict) &&
+              (triple & 0xffff)) {
+            return {BASE64_EXTRA_BITS, size_t(src - srcinit),
+                    size_t(dst - dstinit)};
+          }
+          if (match_system(endianness::BIG)) {
+            triple <<= 8;
+            std::memcpy(dst, &triple, 1);
+          } else {
+            triple = scalar::utf32::swap_bytes(triple);
+            triple >>= 8;
+            std::memcpy(dst, &triple, 1);
+          }
+          dst += 1;
+        } else if (idx == 3) {
+          uint32_t triple = (uint32_t(buffer[0]) << 3 * 6) +
+                            (uint32_t(buffer[1]) << 2 * 6) +
+                            (uint32_t(buffer[2]) << 1 * 6);
+          if ((last_chunk_options == last_chunk_handling_options::strict) &&
+              (triple & 0xff)) {
+            return {BASE64_EXTRA_BITS, size_t(src - srcinit),
+                    size_t(dst - dstinit)};
+          }
+          if (match_system(endianness::BIG)) {
+            triple <<= 8;
+            std::memcpy(dst, &triple, 2);
+          } else {
+            triple = scalar::utf32::swap_bytes(triple);
+            triple >>= 8;
+            std::memcpy(dst, &triple, 2);
+          }
+          dst += 2;
+        } else if (idx == 1) {
+          return {BASE64_INPUT_REMAINDER, size_t(src - srcinit),
+                  size_t(dst - dstinit)};
+        }
+        return {SUCCESS, size_t(src - srcinit), size_t(dst - dstinit)};
+      }
+    }
+
+    uint32_t triple =
+        (uint32_t(buffer[0]) << 3 * 6) + (uint32_t(buffer[1]) << 2 * 6) +
+        (uint32_t(buffer[2]) << 1 * 6) + (uint32_t(buffer[3]) << 0 * 6);
+    if (match_system(endianness::BIG)) {
+      triple <<= 8;
+      std::memcpy(dst, &triple, 3);
+    } else {
+      triple = scalar::utf32::swap_bytes(triple);
+      triple >>= 8;
+      std::memcpy(dst, &triple, 3);
+    }
+    dst += 3;
+  }
 }
-simdutf_warn_unused size_t convert_utf16le_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
-  return get_default_implementation()->convert_utf16le_to_latin1(buf, len,
-                                                                 latin1_buffer);
-}
-simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf16be_to_latin1(
-      buf, len, latin1_buffer);
-}
-simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf16le_to_latin1(
-      buf, len, latin1_buffer);
-}
-simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
-    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
-  return get_default_implementation()->convert_utf16le_to_latin1_with_errors(
-      buf, len, latin1_buffer);
+
+// like base64_tail_decode, but it will not write past the end of the output
+// buffer. The outlen parameter is modified to reflect the number of bytes
+// written. This function assumes that the padding (=) has been removed.
+template <class char_type>
+result base64_tail_decode_safe(
+    char *dst, size_t &outlen, const char_type *&srcr, size_t length,
+    size_t padded_characters, // number of padding characters '=', typically 0,
+                              // 1, 2.
+    base64_options options, last_chunk_handling_options last_chunk_options) {
+  const char_type *src = srcr;
+  if (length == 0) {
+    outlen = 0;
+    return {SUCCESS, 0};
+  }
+  // This looks like 5 branches, but we expect the compiler to resolve this to a
+  // single branch:
+  const uint8_t *to_base64 = (options & base64_url)
+                                 ? tables::base64::to_base64_url_value
+                                 : tables::base64::to_base64_value;
+  const uint32_t *d0 = (options & base64_url)
+                           ? tables::base64::base64_url::d0
+                           : tables::base64::base64_default::d0;
+  const uint32_t *d1 = (options & base64_url)
+                           ? tables::base64::base64_url::d1
+                           : tables::base64::base64_default::d1;
+  const uint32_t *d2 = (options & base64_url)
+                           ? tables::base64::base64_url::d2
+                           : tables::base64::base64_default::d2;
+  const uint32_t *d3 = (options & base64_url)
+                           ? tables::base64::base64_url::d3
+                           : tables::base64::base64_default::d3;
+
+  const char_type *srcend = src + length;
+  const char_type *srcinit = src;
+  const char *dstinit = dst;
+  const char *dstend = dst + outlen;
+
+  uint32_t x;
+  size_t idx;
+  uint8_t buffer[4];
+  while (true) {
+    while (src + 4 <= srcend && is_eight_byte(src[0]) &&
+           is_eight_byte(src[1]) && is_eight_byte(src[2]) &&
+           is_eight_byte(src[3]) &&
+           (x = d0[uint8_t(src[0])] | d1[uint8_t(src[1])] |
+                d2[uint8_t(src[2])] | d3[uint8_t(src[3])]) < 0x01FFFFFF) {
+      if (dstend - dst < 3) {
+        outlen = size_t(dst - dstinit);
+        srcr = src;
+        return {OUTPUT_BUFFER_TOO_SMALL, size_t(src - srcinit)};
+      }
+      if (match_system(endianness::BIG)) {
+        x = scalar::utf32::swap_bytes(x);
+      }
+      std::memcpy(dst, &x, 3); // optimization opportunity: copy 4 bytes
+      dst += 3;
+      src += 4;
+    }
+    idx = 0;
+    const char_type *srccur = src;
+    // We need at least four characters.
+    while (idx < 4 && src < srcend) {
+      char_type c = *src;
+      uint8_t code = to_base64[uint8_t(c)];
+
+      buffer[idx] = uint8_t(code);
+      if (is_eight_byte(c) && code <= 63) {
+        idx++;
+      } else if (code > 64 || !scalar::base64::is_eight_byte(c)) {
+        outlen = size_t(dst - dstinit);
+        srcr = src;
+        return {INVALID_BASE64_CHARACTER, size_t(src - srcinit)};
+      } else {
+        // We have a space or a newline. We ignore it.
+      }
+      src++;
+    }
+    if (idx != 4) {
+      if (last_chunk_options == last_chunk_handling_options::strict &&
+          ((idx + padded_characters) & 3) != 0) {
+        outlen = size_t(dst - dstinit);
+        srcr = src;
+        return {BASE64_INPUT_REMAINDER, size_t(src - srcinit)};
+      } else if (last_chunk_options ==
+                     last_chunk_handling_options::stop_before_partial &&
+                 ((idx + padded_characters) & 3) != 0) {
+        // Rewind src to before partial chunk
+        srcr = srccur;
+        outlen = size_t(dst - dstinit);
+        return {SUCCESS, size_t(dst - dstinit)};
+      } else { // loose mode
+        if (idx == 0) {
+          // No data left; return success
+          outlen = size_t(dst - dstinit);
+          srcr = src;
+          return {SUCCESS, size_t(dst - dstinit)};
+        } else if (idx == 1) {
+          // Error: Incomplete chunk of length 1 is invalid in loose mode
+          outlen = size_t(dst - dstinit);
+          srcr = src;
+          return {BASE64_INPUT_REMAINDER, size_t(src - srcinit)};
+        } else if (idx == 2 || idx == 3) {
+          // Check if there's enough space in the destination buffer
+          size_t required_space = (idx == 2) ? 1 : 2;
+          if (size_t(dstend - dst) < required_space) {
+            outlen = size_t(dst - dstinit);
+            srcr = src;
+            return {OUTPUT_BUFFER_TOO_SMALL, size_t(srccur - srcinit)};
+          }
+          uint32_t triple = 0;
+          if (idx == 2) {
+            triple = (uint32_t(buffer[0]) << 18) + (uint32_t(buffer[1]) << 12);
+            if ((last_chunk_options == last_chunk_handling_options::strict) &&
+                (triple & 0xffff)) {
+              srcr = src;
+              return {BASE64_EXTRA_BITS, size_t(src - srcinit)};
+            }
+            // Extract the first byte
+            triple >>= 16;
+            dst[0] = static_cast<char>(triple & 0xFF);
+            dst += 1;
+          } else if (idx == 3) {
+            triple = (uint32_t(buffer[0]) << 18) + (uint32_t(buffer[1]) << 12) +
+                     (uint32_t(buffer[2]) << 6);
+            if ((last_chunk_options == last_chunk_handling_options::strict) &&
+                (triple & 0xff)) {
+              srcr = src;
+              return {BASE64_EXTRA_BITS, size_t(src - srcinit)};
+            }
+            // Extract the first two bytes
+            triple >>= 8;
+            dst[0] = static_cast<char>((triple >> 8) & 0xFF);
+            dst[1] = static_cast<char>(triple & 0xFF);
+            dst += 2;
+          }
+          outlen = size_t(dst - dstinit);
+          srcr = src;
+          return {SUCCESS, size_t(dst - dstinit)};
+        }
+      }
+    }
+
+    if (dstend - dst < 3) {
+      outlen = size_t(dst - dstinit);
+      srcr = src;
+      return {OUTPUT_BUFFER_TOO_SMALL, size_t(srccur - srcinit)};
+    }
+    uint32_t triple = (uint32_t(buffer[0]) << 18) +
+                      (uint32_t(buffer[1]) << 12) + (uint32_t(buffer[2]) << 6) +
+                      (uint32_t(buffer[3]));
+    if (match_system(endianness::BIG)) {
+      triple <<= 8;
+      std::memcpy(dst, &triple, 3);
+    } else {
+      triple = scalar::utf32::swap_bytes(triple);
+      triple >>= 8;
+      std::memcpy(dst, &triple, 3);
+    }
+    dst += 3;
+  }
 }
-simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
-    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
-  return get_default_implementation()->convert_utf16be_to_latin1_with_errors(
-      buf, len, latin1_buffer);
+
+// Returns the number of bytes written. The destination buffer must be large
+// enough. It will add padding (=) if needed.
+size_t tail_encode_base64(char *dst, const char *src, size_t srclen,
+                          base64_options options) {
+  // By default, we use padding if we are not using the URL variant.
+  // This is checked with ((options & base64_url) == 0), which is true if we
+  // are not using the URL variant. However, we also allow 'inversion' of the
+  // convention with the base64_reverse_padding option. If the
+  // base64_reverse_padding option is set, we use padding if we are using the
+  // URL variant, and we omit it if we are not using the URL variant. This is
+  // checked with
+  // ((options & base64_reverse_padding) == base64_reverse_padding).
+  bool use_padding =
+      ((options & base64_url) == 0) ^
+      ((options & base64_reverse_padding) == base64_reverse_padding);
+  // This looks like 3 branches, but we expect the compiler to resolve this to
+  // a single branch:
+  const char *e0 = (options & base64_url) ? tables::base64::base64_url::e0
+                                          : tables::base64::base64_default::e0;
+  const char *e1 = (options & base64_url) ? tables::base64::base64_url::e1
+                                          : tables::base64::base64_default::e1;
+  const char *e2 = (options & base64_url) ? tables::base64::base64_url::e2
+                                          : tables::base64::base64_default::e2;
+  char *out = dst;
+  size_t i = 0;
+  uint8_t t1, t2, t3;
+  for (; i + 2 < srclen; i += 3) {
+    t1 = uint8_t(src[i]);
+    t2 = uint8_t(src[i + 1]);
+    t3 = uint8_t(src[i + 2]);
+    *out++ = e0[t1];
+    *out++ = e1[((t1 & 0x03) << 4) | ((t2 >> 4) & 0x0F)];
+    *out++ = e1[((t2 & 0x0F) << 2) | ((t3 >> 6) & 0x03)];
+    *out++ = e2[t3];
+  }
+  switch (srclen - i) {
+  case 0:
+    break;
+  case 1:
+    t1 = uint8_t(src[i]);
+    *out++ = e0[t1];
+    *out++ = e1[(t1 & 0x03) << 4];
+    if (use_padding) {
+      *out++ = '=';
+      *out++ = '=';
+    }
+    break;
+  default: /* case 2 */
+    t1 = uint8_t(src[i]);
+    t2 = uint8_t(src[i + 1]);
+    *out++ = e0[t1];
+    *out++ = e1[((t1 & 0x03) << 4) | ((t2 >> 4) & 0x0F)];
+    *out++ = e2[(t2 & 0x0F) << 2];
+    if (use_padding) {
+      *out++ = '=';
+    }
+  }
+  return (size_t)(out - dst);
 }
-simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t *buf,
-                                                   size_t len,
-                                                   char *utf8_buffer) noexcept {
-  return get_default_implementation()->convert_utf16le_to_utf8(buf, len,
-                                                               utf8_buffer);
+
+template <class char_type>
+simdutf_warn_unused size_t maximal_binary_length_from_base64(
+    const char_type *input, size_t length) noexcept {
+  // We follow https://infra.spec.whatwg.org/#forgiving-base64-decode
+  size_t padding = 0;
+  if (length > 0) {
+    if (input[length - 1] == '=') {
+      padding++;
+      if (length > 1 && input[length - 2] == '=') {
+        padding++;
+      }
+    }
+  }
+  size_t actual_length = length - padding;
+  if (actual_length % 4 <= 1) {
+    return actual_length / 4 * 3;
+  }
+  // if we have a valid input, then the remainder must be 2 or 3 adding one or
+  // two extra bytes.
+  return actual_length / 4 * 3 + (actual_length % 4) - 1;
 }
-simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t *buf,
-                                                   size_t len,
-                                                   char *utf8_buffer) noexcept {
-  return get_default_implementation()->convert_utf16be_to_utf8(buf, len,
-                                                               utf8_buffer);
+
+simdutf_warn_unused size_t
+base64_length_from_binary(size_t length, base64_options options) noexcept {
+  // By default, we use padding if we are not using the URL variant.
+  // This is checked with ((options & base64_url) == 0), which is true if we
+  // are not using the URL variant. However, we also allow 'inversion' of the
+  // convention with the base64_reverse_padding option. If the
+  // base64_reverse_padding option is set, we use padding if we are using the
+  // URL variant, and we omit it if we are not using the URL variant. This is
+  // checked with
+  // ((options & base64_reverse_padding) == base64_reverse_padding).
+  bool use_padding =
+      ((options & base64_url) == 0) ^
+      ((options & base64_reverse_padding) == base64_reverse_padding);
+  if (!use_padding) {
+    return length / 3 * 4 + ((length % 3) ? (length % 3) + 1 : 0);
+  }
+  return (length + 2) / 3 *
+         4; // We use padding to make the length a multiple of 4.
 }
-simdutf_warn_unused result convert_utf16_to_utf8_with_errors(
-    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_utf16be_to_utf8_with_errors(buf, len, utf8_buffer);
-#else
-  return convert_utf16le_to_utf8_with_errors(buf, len, utf8_buffer);
+
+} // namespace base64
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
 #endif
+/* end file src/scalar/base64.h */
+/* begin file src/scalar/latin1_to_utf8/latin1_to_utf8.h */
+#ifndef SIMDUTF_LATIN1_TO_UTF8_H
+#define SIMDUTF_LATIN1_TO_UTF8_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace latin1_to_utf8 {
+
+inline size_t convert(const char *buf, size_t len, char *utf8_output) {
+  const unsigned char *data = reinterpret_cast<const unsigned char *>(buf);
+  size_t pos = 0;
+  size_t utf8_pos = 0;
+  while (pos < len) {
+    // try to convert the next block of 16 ASCII bytes
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
+      uint64_t v1;
+      ::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 |
+                 v2}; // We are only interested in these bits: 1000 1000 1000
+                      // 1000, so it makes sense to OR everything together
+      if ((v & 0x8080808080808080) ==
+          0) { // if NONE of these are set, i.e. all of them are zero, then
+               // everything is ASCII
+        size_t final_pos = pos + 16;
+        while (pos < final_pos) {
+          utf8_output[utf8_pos++] = char(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+
+    unsigned char byte = data[pos];
+    if ((byte & 0x80) == 0) { // if ASCII
+      // will generate one UTF-8 byte
+      utf8_output[utf8_pos++] = char(byte);
+      pos++;
+    } else {
+      // will generate two UTF-8 bytes
+      utf8_output[utf8_pos++] = char((byte >> 6) | 0b11000000);
+      utf8_output[utf8_pos++] = char((byte & 0b111111) | 0b10000000);
+      pos++;
+    }
+  }
+  return utf8_pos;
 }
-simdutf_warn_unused result convert_utf16_to_latin1_with_errors(
-    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_utf16be_to_latin1_with_errors(buf, len, latin1_buffer);
-#else
-  return convert_utf16le_to_latin1_with_errors(buf, len, latin1_buffer);
+
+inline size_t convert_safe(const char *buf, size_t len, char *utf8_output,
+                           size_t utf8_len) {
+  const unsigned char *data = reinterpret_cast<const unsigned char *>(buf);
+  size_t pos = 0;
+  size_t skip_pos = 0;
+  size_t utf8_pos = 0;
+  while (pos < len && utf8_pos < utf8_len) {
+    // try to convert the next block of 16 ASCII bytes
+    if (pos >= skip_pos && pos + 16 <= len &&
+        utf8_pos + 16 <= utf8_len) { // if it is safe to read 16 more bytes,
+                                     // check that they are ascii
+      uint64_t v1;
+      ::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 |
+                 v2}; // We are only interested in these bits: 1000 1000 1000
+                      // 1000, so it makes sense to OR everything together
+      if ((v & 0x8080808080808080) ==
+          0) { // if NONE of these are set, i.e. all of them are zero, then
+               // everything is ASCII
+        ::memcpy(utf8_output + utf8_pos, buf + pos, 16);
+        utf8_pos += 16;
+        pos += 16;
+      } else {
+        // At least one of the next 16 bytes is not ASCII; we will process them
+        // one by one
+        skip_pos = pos + 16;
+      }
+    } else {
+      const auto byte = data[pos];
+      if ((byte & 0x80) == 0) { // if ASCII
+        // will generate one UTF-8 byte
+        utf8_output[utf8_pos++] = char(byte);
+        pos++;
+      } else if (utf8_pos + 2 <= utf8_len) {
+        // will generate two UTF-8 bytes
+        utf8_output[utf8_pos++] = char((byte >> 6) | 0b11000000);
+        utf8_output[utf8_pos++] = char((byte & 0b111111) | 0b10000000);
+        pos++;
+      } else {
+        break;
+      }
+    }
+  }
+  return utf8_pos;
+}
+
+} // namespace latin1_to_utf8
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
 #endif
+/* end file src/scalar/latin1_to_utf8/latin1_to_utf8.h */
+
+namespace simdutf {
+bool implementation::supported_by_runtime_system() const {
+  uint32_t required_instruction_sets = this->required_instruction_sets();
+  uint32_t supported_instruction_sets =
+      internal::detect_supported_architectures();
+  return ((supported_instruction_sets & required_instruction_sets) ==
+          required_instruction_sets);
 }
-simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
-    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
-  return get_default_implementation()->convert_utf16le_to_utf8_with_errors(
-      buf, len, utf8_buffer);
+
+simdutf_warn_unused encoding_type implementation::autodetect_encoding(
+    const char *input, size_t length) const noexcept {
+  // If there is a BOM, then we trust it.
+  auto bom_encoding = simdutf::BOM::check_bom(input, length);
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
+  }
+  // UTF8 is common, it includes ASCII, and is commonly represented
+  // without a BOM, so if it fits, go with that. Note that it is still
+  // possible to get it wrong, we are only 'guessing'. If someone has UTF-16
+  // data without a BOM, it could pass as UTF-8.
+  //
+  // An interesting twist might be to check for UTF-16 ASCII first (every
+  // other byte is zero).
+  if (validate_utf8(input, length)) {
+    return encoding_type::UTF8;
+  }
+  // The next most common encoding that might appear without BOM is probably
+  // UTF-16LE, so try that next.
+  if ((length % 2) == 0) {
+    // important: we need to divide by two
+    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
+                         length / 2)) {
+      return encoding_type::UTF16_LE;
+    }
+  }
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      return encoding_type::UTF32_LE;
+    }
+  }
+  return encoding_type::unspecified;
 }
-simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
-    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
-  return get_default_implementation()->convert_utf16be_to_utf8_with_errors(
-      buf, len, utf8_buffer);
+
+namespace internal {
+// When there is a single implementation, we should not pay a price
+// for dispatching to the best implementation. We should just use the
+// one we have. This is a compile-time check.
+#define SIMDUTF_SINGLE_IMPLEMENTATION                                          \
+  (SIMDUTF_IMPLEMENTATION_ICELAKE + SIMDUTF_IMPLEMENTATION_HASWELL +           \
+       SIMDUTF_IMPLEMENTATION_WESTMERE + SIMDUTF_IMPLEMENTATION_ARM64 +        \
+       SIMDUTF_IMPLEMENTATION_PPC64 + SIMDUTF_IMPLEMENTATION_LSX +             \
+       SIMDUTF_IMPLEMENTATION_LASX + SIMDUTF_IMPLEMENTATION_FALLBACK ==        \
+   1)
+
+// Static array of known implementations. We are hoping these get baked into the
+// executable without requiring a static initializer.
+
+#if SIMDUTF_IMPLEMENTATION_ICELAKE
+static const icelake::implementation *get_icelake_singleton() {
+  static const icelake::implementation icelake_singleton{};
+  return &icelake_singleton;
 }
-simdutf_warn_unused size_t convert_valid_utf16_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_valid_utf16be_to_utf8(buf, len, utf8_buffer);
-#else
-  return convert_valid_utf16le_to_utf8(buf, len, utf8_buffer);
 #endif
+#if SIMDUTF_IMPLEMENTATION_HASWELL
+static const haswell::implementation *get_haswell_singleton() {
+  static const haswell::implementation haswell_singleton{};
+  return &haswell_singleton;
 }
-simdutf_warn_unused size_t convert_valid_utf16_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_valid_utf16be_to_latin1(buf, len, latin1_buffer);
-#else
-  return convert_valid_utf16le_to_latin1(buf, len, latin1_buffer);
 #endif
+#if SIMDUTF_IMPLEMENTATION_WESTMERE
+static const westmere::implementation *get_westmere_singleton() {
+  static const westmere::implementation westmere_singleton{};
+  return &westmere_singleton;
 }
-simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf16le_to_utf8(
-      buf, len, utf8_buffer);
+#endif
+#if SIMDUTF_IMPLEMENTATION_ARM64
+static const arm64::implementation *get_arm64_singleton() {
+  static const arm64::implementation arm64_singleton{};
+  return &arm64_singleton;
 }
-simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf16be_to_utf8(
-      buf, len, utf8_buffer);
-}
-simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t *buf,
-                                                 size_t len,
-                                                 char *utf8_buffer) noexcept {
-  return get_default_implementation()->convert_utf32_to_utf8(buf, len,
-                                                             utf8_buffer);
-}
-simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
-    const char32_t *buf, size_t len, char *utf8_buffer) noexcept {
-  return get_default_implementation()->convert_utf32_to_utf8_with_errors(
-      buf, len, utf8_buffer);
-}
-simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
-    const char32_t *buf, size_t len, char *utf8_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf32_to_utf8(buf, len,
-                                                                   utf8_buffer);
-}
-simdutf_warn_unused size_t convert_utf32_to_utf16(
-    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_utf32_to_utf16be(buf, len, utf16_buffer);
-#else
-  return convert_utf32_to_utf16le(buf, len, utf16_buffer);
-#endif
-}
-simdutf_warn_unused size_t convert_utf32_to_latin1(
-    const char32_t *input, size_t length, char *latin1_output) noexcept {
-  return get_default_implementation()->convert_utf32_to_latin1(input, length,
-                                                               latin1_output);
-}
-simdutf_warn_unused size_t convert_utf32_to_utf16le(
-    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
-  return get_default_implementation()->convert_utf32_to_utf16le(buf, len,
-                                                                utf16_buffer);
-}
-simdutf_warn_unused size_t convert_utf32_to_utf16be(
-    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
-  return get_default_implementation()->convert_utf32_to_utf16be(buf, len,
-                                                                utf16_buffer);
-}
-simdutf_warn_unused result convert_utf32_to_utf16_with_errors(
-    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_utf32_to_utf16be_with_errors(buf, len, utf16_buffer);
-#else
-  return convert_utf32_to_utf16le_with_errors(buf, len, utf16_buffer);
-#endif
-}
-simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
-    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
-  return get_default_implementation()->convert_utf32_to_utf16le_with_errors(
-      buf, len, utf16_buffer);
-}
-simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
-    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
-  return get_default_implementation()->convert_utf32_to_utf16be_with_errors(
-      buf, len, utf16_buffer);
-}
-simdutf_warn_unused size_t convert_valid_utf32_to_utf16(
-    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_valid_utf32_to_utf16be(buf, len, utf16_buffer);
-#else
-  return convert_valid_utf32_to_utf16le(buf, len, utf16_buffer);
 #endif
+#if SIMDUTF_IMPLEMENTATION_PPC64
+static const ppc64::implementation *get_ppc64_singleton() {
+  static const ppc64::implementation ppc64_singleton{};
+  return &ppc64_singleton;
 }
-simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(
-    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf32_to_utf16le(
-      buf, len, utf16_buffer);
-}
-simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(
-    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf32_to_utf16be(
-      buf, len, utf16_buffer);
-}
-simdutf_warn_unused size_t convert_utf16_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_utf16be_to_utf32(buf, len, utf32_buffer);
-#else
-  return convert_utf16le_to_utf32(buf, len, utf32_buffer);
 #endif
+#if SIMDUTF_IMPLEMENTATION_RVV
+static const rvv::implementation *get_rvv_singleton() {
+  static const rvv::implementation rvv_singleton{};
+  return &rvv_singleton;
 }
-simdutf_warn_unused size_t convert_utf16le_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
-  return get_default_implementation()->convert_utf16le_to_utf32(buf, len,
-                                                                utf32_buffer);
-}
-simdutf_warn_unused size_t convert_utf16be_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
-  return get_default_implementation()->convert_utf16be_to_utf32(buf, len,
-                                                                utf32_buffer);
-}
-simdutf_warn_unused result convert_utf16_to_utf32_with_errors(
-    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_utf16be_to_utf32_with_errors(buf, len, utf32_buffer);
-#else
-  return convert_utf16le_to_utf32_with_errors(buf, len, utf32_buffer);
 #endif
+#if SIMDUTF_IMPLEMENTATION_LSX
+static const lsx::implementation *get_lsx_singleton() {
+  static const lsx::implementation lsx_singleton{};
+  return &lsx_singleton;
 }
-simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
-    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
-  return get_default_implementation()->convert_utf16le_to_utf32_with_errors(
-      buf, len, utf32_buffer);
-}
-simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
-    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
-  return get_default_implementation()->convert_utf16be_to_utf32_with_errors(
-      buf, len, utf32_buffer);
-}
-simdutf_warn_unused size_t convert_valid_utf16_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return convert_valid_utf16be_to_utf32(buf, len, utf32_buffer);
-#else
-  return convert_valid_utf16le_to_utf32(buf, len, utf32_buffer);
 #endif
+#if SIMDUTF_IMPLEMENTATION_LASX
+static const lasx::implementation *get_lasx_singleton() {
+  static const lasx::implementation lasx_singleton{};
+  return &lasx_singleton;
 }
-simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf16le_to_utf32(
-      buf, len, utf32_buffer);
-}
-simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
-  return get_default_implementation()->convert_valid_utf16be_to_utf32(
-      buf, len, utf32_buffer);
-}
-void change_endianness_utf16(const char16_t *input, size_t length,
-                             char16_t *output) noexcept {
-  get_default_implementation()->change_endianness_utf16(input, length, output);
-}
-simdutf_warn_unused size_t count_utf16(const char16_t *input,
-                                       size_t length) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return count_utf16be(input, length);
-#else
-  return count_utf16le(input, length);
 #endif
+#if SIMDUTF_IMPLEMENTATION_FALLBACK
+static const fallback::implementation *get_fallback_singleton() {
+  static const fallback::implementation fallback_singleton{};
+  return &fallback_singleton;
 }
-simdutf_warn_unused size_t count_utf16le(const char16_t *input,
-                                         size_t length) noexcept {
-  return get_default_implementation()->count_utf16le(input, length);
-}
-simdutf_warn_unused size_t count_utf16be(const char16_t *input,
-                                         size_t length) noexcept {
-  return get_default_implementation()->count_utf16be(input, length);
-}
-simdutf_warn_unused size_t count_utf8(const char *input,
-                                      size_t length) noexcept {
-  return get_default_implementation()->count_utf8(input, length);
-}
-simdutf_warn_unused size_t latin1_length_from_utf8(const char *buf,
-                                                   size_t len) noexcept {
-  return get_default_implementation()->latin1_length_from_utf8(buf, len);
-}
-simdutf_warn_unused size_t latin1_length_from_utf16(size_t len) noexcept {
-  return get_default_implementation()->latin1_length_from_utf16(len);
-}
-simdutf_warn_unused size_t latin1_length_from_utf32(size_t len) noexcept {
-  return get_default_implementation()->latin1_length_from_utf32(len);
-}
-simdutf_warn_unused size_t utf8_length_from_latin1(const char *buf,
-                                                   size_t len) noexcept {
-  return get_default_implementation()->utf8_length_from_latin1(buf, len);
-}
-simdutf_warn_unused size_t utf8_length_from_utf16(const char16_t *input,
-                                                  size_t length) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return utf8_length_from_utf16be(input, length);
-#else
-  return utf8_length_from_utf16le(input, length);
 #endif
+
+#if SIMDUTF_SINGLE_IMPLEMENTATION
+static const implementation *get_single_implementation() {
+  return
+  #if SIMDUTF_IMPLEMENTATION_ICELAKE
+      get_icelake_singleton();
+  #endif
+  #if SIMDUTF_IMPLEMENTATION_HASWELL
+  get_haswell_singleton();
+  #endif
+  #if SIMDUTF_IMPLEMENTATION_WESTMERE
+  get_westmere_singleton();
+  #endif
+  #if SIMDUTF_IMPLEMENTATION_ARM64
+  get_arm64_singleton();
+  #endif
+  #if SIMDUTF_IMPLEMENTATION_PPC64
+  get_ppc64_singleton();
+  #endif
+  #if SIMDUTF_IMPLEMENTATION_LSX
+  get_lsx_singleton();
+  #endif
+  #if SIMDUTF_IMPLEMENTATION_LASX
+  get_lasx_singleton();
+  #endif
+  #if SIMDUTF_IMPLEMENTATION_FALLBACK
+  get_fallback_singleton();
+  #endif
 }
-simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t *input,
-                                                    size_t length) noexcept {
-  return get_default_implementation()->utf8_length_from_utf16le(input, length);
-}
-simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t *input,
-                                                    size_t length) noexcept {
-  return get_default_implementation()->utf8_length_from_utf16be(input, length);
-}
-simdutf_warn_unused size_t utf32_length_from_utf16(const char16_t *input,
-                                                   size_t length) noexcept {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return utf32_length_from_utf16be(input, length);
-#else
-  return utf32_length_from_utf16le(input, length);
 #endif
-}
-simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t *input,
-                                                     size_t length) noexcept {
-  return get_default_implementation()->utf32_length_from_utf16le(input, length);
-}
-simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t *input,
-                                                     size_t length) noexcept {
-  return get_default_implementation()->utf32_length_from_utf16be(input, length);
-}
-simdutf_warn_unused size_t utf16_length_from_utf8(const char *input,
-                                                  size_t length) noexcept {
-  return get_default_implementation()->utf16_length_from_utf8(input, length);
-}
-simdutf_warn_unused size_t utf16_length_from_latin1(size_t length) noexcept {
-  return get_default_implementation()->utf16_length_from_latin1(length);
-}
-simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t *input,
-                                                  size_t length) noexcept {
-  return get_default_implementation()->utf8_length_from_utf32(input, length);
-}
-simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t *input,
-                                                   size_t length) noexcept {
-  return get_default_implementation()->utf16_length_from_utf32(input, length);
-}
-simdutf_warn_unused size_t utf32_length_from_utf8(const char *input,
-                                                  size_t length) noexcept {
-  return get_default_implementation()->utf32_length_from_utf8(input, length);
-}
 
-simdutf_warn_unused size_t
-maximal_binary_length_from_base64(const char *input, size_t length) noexcept {
-  return get_default_implementation()->maximal_binary_length_from_base64(
-      input, length);
-}
+/**
+ * @private Detects best supported implementation on first use, and sets it
+ */
+class detect_best_supported_implementation_on_first_use final
+    : public implementation {
+public:
+  std::string name() const noexcept final { return set_best()->name(); }
+  std::string description() const noexcept final {
+    return set_best()->description();
+  }
+  uint32_t required_instruction_sets() const noexcept final {
+    return set_best()->required_instruction_sets();
+  }
 
-simdutf_warn_unused result base64_to_binary(
-    const char *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_handling_options) noexcept {
-  return get_default_implementation()->base64_to_binary(
-      input, length, output, options, last_chunk_handling_options);
-}
+  simdutf_warn_unused int
+  detect_encodings(const char *input, size_t length) const noexcept override {
+    return set_best()->detect_encodings(input, length);
+  }
 
-simdutf_warn_unused size_t maximal_binary_length_from_base64(
-    const char16_t *input, size_t length) noexcept {
-  return get_default_implementation()->maximal_binary_length_from_base64(
-      input, length);
-}
+  simdutf_warn_unused bool
+  validate_utf8(const char *buf, size_t len) const noexcept final override {
+    return set_best()->validate_utf8(buf, len);
+  }
 
-simdutf_warn_unused result base64_to_binary(
-    const char16_t *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_handling_options) noexcept {
-  return get_default_implementation()->base64_to_binary(
-      input, length, output, options, last_chunk_handling_options);
-}
+  simdutf_warn_unused result validate_utf8_with_errors(
+      const char *buf, size_t len) const noexcept final override {
+    return set_best()->validate_utf8_with_errors(buf, len);
+  }
 
-template <typename chartype>
-simdutf_warn_unused result base64_to_binary_safe_impl(
-    const chartype *input, size_t length, char *output, size_t &outlen,
-    base64_options options,
-    last_chunk_handling_options last_chunk_handling_options) noexcept {
-  static_assert(std::is_same<chartype, char>::value ||
-                    std::is_same<chartype, char16_t>::value,
-                "Only char and char16_t are supported.");
-  // The implementation could be nicer, but we expect that most times, the user
-  // will provide us with a buffer that is large enough.
-  size_t max_length = maximal_binary_length_from_base64(input, length);
-  if (outlen >= max_length) {
-    // fast path
-    full_result r = get_default_implementation()->base64_to_binary_details(
-        input, length, output, options, last_chunk_handling_options);
-    if (r.error != error_code::INVALID_BASE64_CHARACTER &&
-        r.error != error_code::BASE64_EXTRA_BITS) {
-      outlen = r.output_count;
-      if (last_chunk_handling_options == stop_before_partial) {
-        if ((r.output_count % 3) != 0) {
-          bool empty_trail = true;
-          for (size_t i = r.input_count; i < length; i++) {
-            if (!scalar::base64::is_ascii_white_space_or_padding(input[i])) {
-              empty_trail = false;
-              break;
-            }
-          }
-          if (empty_trail) {
-            r.input_count = length;
-          }
-        }
-        return {r.error, r.input_count};
-      }
-      return {r.error, length};
-    }
-    return r;
+  simdutf_warn_unused bool
+  validate_ascii(const char *buf, size_t len) const noexcept final override {
+    return set_best()->validate_ascii(buf, len);
   }
-  // The output buffer is maybe too small. We will decode a truncated version of
-  // the input.
-  size_t outlen3 = outlen / 3 * 3; // round down to multiple of 3
-  size_t safe_input = base64_length_from_binary(outlen3, options);
-  full_result r = get_default_implementation()->base64_to_binary_details(
-      input, safe_input, output, options, loose);
-  if (r.error == error_code::INVALID_BASE64_CHARACTER) {
-    return r;
+
+  simdutf_warn_unused result validate_ascii_with_errors(
+      const char *buf, size_t len) const noexcept final override {
+    return set_best()->validate_ascii_with_errors(buf, len);
   }
-  size_t offset =
-      (r.error == error_code::BASE64_INPUT_REMAINDER)
-          ? 1
-          : ((r.output_count % 3) == 0 ? 0 : (r.output_count % 3) + 1);
-  size_t output_index = r.output_count - (r.output_count % 3);
-  size_t input_index = safe_input;
-  // offset is a value that is no larger than 3. We backtrack
-  // by up to offset characters + an undetermined number of
-  // white space characters. It is expected that the next loop
-  // runs at most 3 times + the number of white space characters
-  // in between them, so we are not worried about performance.
-  while (offset > 0 && input_index > 0) {
-    chartype c = input[--input_index];
-    if (scalar::base64::is_ascii_white_space(c)) {
-      // skipping
-    } else {
-      offset--;
-    }
+
+  simdutf_warn_unused bool
+  validate_utf16le(const char16_t *buf,
+                   size_t len) const noexcept final override {
+    return set_best()->validate_utf16le(buf, len);
   }
-  size_t remaining_out = outlen - output_index;
-  const chartype *tail_input = input + input_index;
-  size_t tail_length = length - input_index;
-  while (tail_length > 0 &&
-         scalar::base64::is_ascii_white_space(tail_input[tail_length - 1])) {
-    tail_length--;
+
+  simdutf_warn_unused bool
+  validate_utf16be(const char16_t *buf,
+                   size_t len) const noexcept final override {
+    return set_best()->validate_utf16be(buf, len);
   }
-  size_t padding_characts = 0;
-  if (tail_length > 0 && tail_input[tail_length - 1] == '=') {
-    tail_length--;
-    padding_characts++;
-    while (tail_length > 0 &&
-           scalar::base64::is_ascii_white_space(tail_input[tail_length - 1])) {
-      tail_length--;
-    }
-    if (tail_length > 0 && tail_input[tail_length - 1] == '=') {
-      tail_length--;
-      padding_characts++;
-    }
+
+  simdutf_warn_unused result validate_utf16le_with_errors(
+      const char16_t *buf, size_t len) const noexcept final override {
+    return set_best()->validate_utf16le_with_errors(buf, len);
   }
-  // this will advance tail_input and tail_length
-  result rr = scalar::base64::base64_tail_decode_safe(
-      output + output_index, remaining_out, tail_input, tail_length,
-      padding_characts, options, last_chunk_handling_options);
-  outlen = output_index + remaining_out;
-  if (last_chunk_handling_options != stop_before_partial &&
-      rr.error == error_code::SUCCESS && padding_characts > 0) {
-    // additional checks
-    if ((outlen % 3 == 0) || ((outlen % 3) + 1 + padding_characts != 4)) {
-      rr.error = error_code::INVALID_BASE64_CHARACTER;
-    }
+
+  simdutf_warn_unused result validate_utf16be_with_errors(
+      const char16_t *buf, size_t len) const noexcept final override {
+    return set_best()->validate_utf16be_with_errors(buf, len);
   }
-  if (rr.error == error_code::SUCCESS &&
-      last_chunk_handling_options == stop_before_partial) {
-    if (tail_input > input + input_index) {
-      rr.count = tail_input - input;
-    } else if (r.input_count > 0) {
-      rr.count = r.input_count + rr.count;
-    }
-    return rr;
+
+  simdutf_warn_unused bool
+  validate_utf32(const char32_t *buf,
+                 size_t len) const noexcept final override {
+    return set_best()->validate_utf32(buf, len);
   }
-  rr.count += input_index;
-  return rr;
-}
 
-simdutf_warn_unused size_t convert_latin1_to_utf8_safe(
-    const char *buf, size_t len, char *utf8_output, size_t utf8_len) noexcept {
-  const auto start{utf8_output};
+  simdutf_warn_unused result validate_utf32_with_errors(
+      const char32_t *buf, size_t len) const noexcept final override {
+    return set_best()->validate_utf32_with_errors(buf, len);
+  }
 
-  while (true) {
-    // convert_latin1_to_utf8 will never write more than input length * 2
-    auto read_len = std::min(len, utf8_len >> 1);
-    if (read_len <= 16) {
-      break;
-    }
+  simdutf_warn_unused size_t
+  convert_latin1_to_utf8(const char *buf, size_t len,
+                         char *utf8_output) const noexcept final override {
+    return set_best()->convert_latin1_to_utf8(buf, len, utf8_output);
+  }
 
-    const auto write_len =
-        simdutf::convert_latin1_to_utf8(buf, read_len, utf8_output);
+  simdutf_warn_unused size_t convert_latin1_to_utf16le(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_latin1_to_utf16le(buf, len, utf16_output);
+  }
 
-    utf8_output += write_len;
-    utf8_len -= write_len;
-    buf += read_len;
-    len -= read_len;
+  simdutf_warn_unused size_t convert_latin1_to_utf16be(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_latin1_to_utf16be(buf, len, utf16_output);
   }
 
-  utf8_output +=
-      scalar::latin1_to_utf8::convert_safe(buf, len, utf8_output, utf8_len);
+  simdutf_warn_unused size_t convert_latin1_to_utf32(
+      const char *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_latin1_to_utf32(buf, len, utf32_output);
+  }
 
-  return utf8_output - start;
-}
+  simdutf_warn_unused size_t
+  convert_utf8_to_latin1(const char *buf, size_t len,
+                         char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf8_to_latin1(buf, len, latin1_output);
+  }
 
-simdutf_warn_unused result base64_to_binary_safe(
-    const char *input, size_t length, char *output, size_t &outlen,
-    base64_options options,
-    last_chunk_handling_options last_chunk_handling_options) noexcept {
-  return base64_to_binary_safe_impl<char>(input, length, output, outlen,
-                                          options, last_chunk_handling_options);
-}
-simdutf_warn_unused result base64_to_binary_safe(
-    const char16_t *input, size_t length, char *output, size_t &outlen,
-    base64_options options,
-    last_chunk_handling_options last_chunk_handling_options) noexcept {
-  return base64_to_binary_safe_impl<char16_t>(
-      input, length, output, outlen, options, last_chunk_handling_options);
-}
+  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+      const char *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf8_to_latin1_with_errors(buf, len,
+                                                          latin1_output);
+  }
 
-simdutf_warn_unused size_t
-base64_length_from_binary(size_t length, base64_options options) noexcept {
-  return get_default_implementation()->base64_length_from_binary(length,
-                                                                 options);
-}
+  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+      const char *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_valid_utf8_to_latin1(buf, len, latin1_output);
+  }
 
-size_t binary_to_base64(const char *input, size_t length, char *output,
-                        base64_options options) noexcept {
-  return get_default_implementation()->binary_to_base64(input, length, output,
-                                                        options);
-}
+  simdutf_warn_unused size_t convert_utf8_to_utf16le(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf8_to_utf16le(buf, len, utf16_output);
+  }
 
-simdutf_warn_unused simdutf::encoding_type
-autodetect_encoding(const char *buf, size_t length) noexcept {
-  return get_default_implementation()->autodetect_encoding(buf, length);
-}
-simdutf_warn_unused int detect_encodings(const char *buf,
-                                         size_t length) noexcept {
-  return get_default_implementation()->detect_encodings(buf, length);
-}
-const implementation *builtin_implementation() {
-  static const implementation *builtin_impl =
-      get_available_implementations()[SIMDUTF_STRINGIFY(
-          SIMDUTF_BUILTIN_IMPLEMENTATION)];
-  return builtin_impl;
-}
+  simdutf_warn_unused size_t convert_utf8_to_utf16be(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf8_to_utf16be(buf, len, utf16_output);
+  }
 
-simdutf_warn_unused size_t trim_partial_utf8(const char *input, size_t length) {
-  return scalar::utf8::trim_partial_utf8(input, length);
-}
+  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf8_to_utf16le_with_errors(buf, len,
+                                                           utf16_output);
+  }
 
-simdutf_warn_unused size_t trim_partial_utf16be(const char16_t *input,
-                                                size_t length) {
-  return scalar::utf16::trim_partial_utf16<BIG>(input, length);
-}
+  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf8_to_utf16be_with_errors(buf, len,
+                                                           utf16_output);
+  }
 
-simdutf_warn_unused size_t trim_partial_utf16le(const char16_t *input,
-                                                size_t length) {
-  return scalar::utf16::trim_partial_utf16<LITTLE>(input, length);
-}
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_valid_utf8_to_utf16le(buf, len, utf16_output);
+  }
 
-simdutf_warn_unused size_t trim_partial_utf16(const char16_t *input,
-                                              size_t length) {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return trim_partial_utf16be(input, length);
-#else
-  return trim_partial_utf16le(input, length);
-#endif
-}
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+      const char *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_valid_utf8_to_utf16be(buf, len, utf16_output);
+  }
 
-} // namespace simdutf
-/* end file src/implementation.cpp */
-/* begin file src/encoding_types.cpp */
+  simdutf_warn_unused size_t
+  convert_utf8_to_utf32(const char *buf, size_t len,
+                        char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_utf8_to_utf32(buf, len, utf32_output);
+  }
 
-namespace simdutf {
-bool match_system(endianness e) {
-#if SIMDUTF_IS_BIG_ENDIAN
-  return e == endianness::BIG;
-#else
-  return e == endianness::LITTLE;
-#endif
-}
+  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+      const char *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_utf8_to_utf32_with_errors(buf, len,
+                                                         utf32_output);
+  }
 
-std::string to_string(encoding_type bom) {
-  switch (bom) {
-  case UTF16_LE:
-    return "UTF16 little-endian";
-  case UTF16_BE:
-    return "UTF16 big-endian";
-  case UTF32_LE:
-    return "UTF32 little-endian";
-  case UTF32_BE:
-    return "UTF32 big-endian";
-  case UTF8:
-    return "UTF8";
-  case unspecified:
-    return "unknown";
-  default:
-    return "error";
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+      const char *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_valid_utf8_to_utf32(buf, len, utf32_output);
   }
-}
 
-namespace BOM {
-// Note that BOM for UTF8 is discouraged.
-encoding_type check_bom(const uint8_t *byte, size_t length) {
-  if (length >= 2 && byte[0] == 0xff and byte[1] == 0xfe) {
-    if (length >= 4 && byte[2] == 0x00 and byte[3] == 0x0) {
-      return encoding_type::UTF32_LE;
-    } else {
-      return encoding_type::UTF16_LE;
-    }
-  } else if (length >= 2 && byte[0] == 0xfe and byte[1] == 0xff) {
-    return encoding_type::UTF16_BE;
-  } else if (length >= 4 && byte[0] == 0x00 and byte[1] == 0x00 and
-             byte[2] == 0xfe and byte[3] == 0xff) {
-    return encoding_type::UTF32_BE;
-  } else if (length >= 4 && byte[0] == 0xef and byte[1] == 0xbb and
-             byte[2] == 0xbf) {
-    return encoding_type::UTF8;
+  simdutf_warn_unused size_t
+  convert_utf16le_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf16le_to_latin1(buf, len, latin1_output);
   }
-  return encoding_type::unspecified;
-}
 
-encoding_type check_bom(const char *byte, size_t length) {
-  return check_bom(reinterpret_cast<const uint8_t *>(byte), length);
-}
+  simdutf_warn_unused size_t
+  convert_utf16be_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf16be_to_latin1(buf, len, latin1_output);
+  }
 
-size_t bom_byte_size(encoding_type bom) {
-  switch (bom) {
-  case UTF16_LE:
-    return 2;
-  case UTF16_BE:
-    return 2;
-  case UTF32_LE:
-    return 4;
-  case UTF32_BE:
-    return 4;
-  case UTF8:
-    return 3;
-  case unspecified:
-    return 0;
-  default:
-    return 0;
+  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf16le_to_latin1_with_errors(buf, len,
+                                                             latin1_output);
   }
-}
 
-} // namespace BOM
-} // namespace simdutf
-/* end file src/encoding_types.cpp */
-/* begin file src/error.cpp */
-namespace simdutf {
-// deliberately empty
-}
-/* end file src/error.cpp */
-// The large tables should be included once and they
-// should not depend on a kernel.
-/* begin file src/tables/utf8_to_utf16_tables.h */
-#ifndef SIMDUTF_UTF8_TO_UTF16_TABLES_H
-#define SIMDUTF_UTF8_TO_UTF16_TABLES_H
-#include <cstdint>
+  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+      const char16_t *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf16be_to_latin1_with_errors(buf, len,
+                                                             latin1_output);
+  }
 
-namespace simdutf {
-namespace {
-namespace tables {
-namespace utf8_to_utf16 {
-/**
- * utf8bigindex uses about 8 kB
- * shufutf8 uses about 3344 B
- *
- * So we use a bit over 11 kB. It would be
- * easy to save about 4 kB by only
- * storing the index in utf8bigindex, and
- * deriving the consumed bytes otherwise.
- * However, this may come at a significant (10% to 20%)
- * performance penalty.
- */
+  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(
+      const char16_t *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_valid_utf16le_to_latin1(buf, len, latin1_output);
+  }
 
-const uint8_t shufutf8[209][16] = {
-    {0, 255, 1, 255, 2, 255, 3, 255, 4, 255, 5, 255, 0, 0, 0, 0},
-    {0, 255, 1, 255, 2, 255, 3, 255, 4, 255, 6, 5, 0, 0, 0, 0},
-    {0, 255, 1, 255, 2, 255, 3, 255, 5, 4, 6, 255, 0, 0, 0, 0},
-    {0, 255, 1, 255, 2, 255, 3, 255, 5, 4, 7, 6, 0, 0, 0, 0},
-    {0, 255, 1, 255, 2, 255, 4, 3, 5, 255, 6, 255, 0, 0, 0, 0},
-    {0, 255, 1, 255, 2, 255, 4, 3, 5, 255, 7, 6, 0, 0, 0, 0},
-    {0, 255, 1, 255, 2, 255, 4, 3, 6, 5, 7, 255, 0, 0, 0, 0},
-    {0, 255, 1, 255, 2, 255, 4, 3, 6, 5, 8, 7, 0, 0, 0, 0},
-    {0, 255, 1, 255, 3, 2, 4, 255, 5, 255, 6, 255, 0, 0, 0, 0},
-    {0, 255, 1, 255, 3, 2, 4, 255, 5, 255, 7, 6, 0, 0, 0, 0},
-    {0, 255, 1, 255, 3, 2, 4, 255, 6, 5, 7, 255, 0, 0, 0, 0},
-    {0, 255, 1, 255, 3, 2, 4, 255, 6, 5, 8, 7, 0, 0, 0, 0},
-    {0, 255, 1, 255, 3, 2, 5, 4, 6, 255, 7, 255, 0, 0, 0, 0},
-    {0, 255, 1, 255, 3, 2, 5, 4, 6, 255, 8, 7, 0, 0, 0, 0},
-    {0, 255, 1, 255, 3, 2, 5, 4, 7, 6, 8, 255, 0, 0, 0, 0},
-    {0, 255, 1, 255, 3, 2, 5, 4, 7, 6, 9, 8, 0, 0, 0, 0},
-    {0, 255, 2, 1, 3, 255, 4, 255, 5, 255, 6, 255, 0, 0, 0, 0},
-    {0, 255, 2, 1, 3, 255, 4, 255, 5, 255, 7, 6, 0, 0, 0, 0},
-    {0, 255, 2, 1, 3, 255, 4, 255, 6, 5, 7, 255, 0, 0, 0, 0},
-    {0, 255, 2, 1, 3, 255, 4, 255, 6, 5, 8, 7, 0, 0, 0, 0},
-    {0, 255, 2, 1, 3, 255, 5, 4, 6, 255, 7, 255, 0, 0, 0, 0},
-    {0, 255, 2, 1, 3, 255, 5, 4, 6, 255, 8, 7, 0, 0, 0, 0},
-    {0, 255, 2, 1, 3, 255, 5, 4, 7, 6, 8, 255, 0, 0, 0, 0},
-    {0, 255, 2, 1, 3, 255, 5, 4, 7, 6, 9, 8, 0, 0, 0, 0},
-    {0, 255, 2, 1, 4, 3, 5, 255, 6, 255, 7, 255, 0, 0, 0, 0},
-    {0, 255, 2, 1, 4, 3, 5, 255, 6, 255, 8, 7, 0, 0, 0, 0},
-    {0, 255, 2, 1, 4, 3, 5, 255, 7, 6, 8, 255, 0, 0, 0, 0},
-    {0, 255, 2, 1, 4, 3, 5, 255, 7, 6, 9, 8, 0, 0, 0, 0},
-    {0, 255, 2, 1, 4, 3, 6, 5, 7, 255, 8, 255, 0, 0, 0, 0},
-    {0, 255, 2, 1, 4, 3, 6, 5, 7, 255, 9, 8, 0, 0, 0, 0},
-    {0, 255, 2, 1, 4, 3, 6, 5, 8, 7, 9, 255, 0, 0, 0, 0},
-    {0, 255, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 0, 0, 0, 0},
-    {1, 0, 2, 255, 3, 255, 4, 255, 5, 255, 6, 255, 0, 0, 0, 0},
-    {1, 0, 2, 255, 3, 255, 4, 255, 5, 255, 7, 6, 0, 0, 0, 0},
-    {1, 0, 2, 255, 3, 255, 4, 255, 6, 5, 7, 255, 0, 0, 0, 0},
-    {1, 0, 2, 255, 3, 255, 4, 255, 6, 5, 8, 7, 0, 0, 0, 0},
-    {1, 0, 2, 255, 3, 255, 5, 4, 6, 255, 7, 255, 0, 0, 0, 0},
-    {1, 0, 2, 255, 3, 255, 5, 4, 6, 255, 8, 7, 0, 0, 0, 0},
-    {1, 0, 2, 255, 3, 255, 5, 4, 7, 6, 8, 255, 0, 0, 0, 0},
-    {1, 0, 2, 255, 3, 255, 5, 4, 7, 6, 9, 8, 0, 0, 0, 0},
-    {1, 0, 2, 255, 4, 3, 5, 255, 6, 255, 7, 255, 0, 0, 0, 0},
-    {1, 0, 2, 255, 4, 3, 5, 255, 6, 255, 8, 7, 0, 0, 0, 0},
-    {1, 0, 2, 255, 4, 3, 5, 255, 7, 6, 8, 255, 0, 0, 0, 0},
-    {1, 0, 2, 255, 4, 3, 5, 255, 7, 6, 9, 8, 0, 0, 0, 0},
-    {1, 0, 2, 255, 4, 3, 6, 5, 7, 255, 8, 255, 0, 0, 0, 0},
-    {1, 0, 2, 255, 4, 3, 6, 5, 7, 255, 9, 8, 0, 0, 0, 0},
-    {1, 0, 2, 255, 4, 3, 6, 5, 8, 7, 9, 255, 0, 0, 0, 0},
-    {1, 0, 2, 255, 4, 3, 6, 5, 8, 7, 10, 9, 0, 0, 0, 0},
-    {1, 0, 3, 2, 4, 255, 5, 255, 6, 255, 7, 255, 0, 0, 0, 0},
-    {1, 0, 3, 2, 4, 255, 5, 255, 6, 255, 8, 7, 0, 0, 0, 0},
-    {1, 0, 3, 2, 4, 255, 5, 255, 7, 6, 8, 255, 0, 0, 0, 0},
-    {1, 0, 3, 2, 4, 255, 5, 255, 7, 6, 9, 8, 0, 0, 0, 0},
-    {1, 0, 3, 2, 4, 255, 6, 5, 7, 255, 8, 255, 0, 0, 0, 0},
-    {1, 0, 3, 2, 4, 255, 6, 5, 7, 255, 9, 8, 0, 0, 0, 0},
-    {1, 0, 3, 2, 4, 255, 6, 5, 8, 7, 9, 255, 0, 0, 0, 0},
-    {1, 0, 3, 2, 4, 255, 6, 5, 8, 7, 10, 9, 0, 0, 0, 0},
-    {1, 0, 3, 2, 5, 4, 6, 255, 7, 255, 8, 255, 0, 0, 0, 0},
-    {1, 0, 3, 2, 5, 4, 6, 255, 7, 255, 9, 8, 0, 0, 0, 0},
-    {1, 0, 3, 2, 5, 4, 6, 255, 8, 7, 9, 255, 0, 0, 0, 0},
-    {1, 0, 3, 2, 5, 4, 6, 255, 8, 7, 10, 9, 0, 0, 0, 0},
-    {1, 0, 3, 2, 5, 4, 7, 6, 8, 255, 9, 255, 0, 0, 0, 0},
-    {1, 0, 3, 2, 5, 4, 7, 6, 8, 255, 10, 9, 0, 0, 0, 0},
-    {1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 255, 0, 0, 0, 0},
-    {1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 0, 0, 0, 0},
-    {0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255},
-    {0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255},
-    {0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255},
-    {0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255},
-    {0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255},
-    {0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255},
-    {0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255},
-    {0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255},
-    {0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255},
-    {0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 4, 255, 255, 255},
-    {0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 5, 4, 255, 255},
-    {0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 6, 5, 4, 255},
-    {0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 5, 255, 255, 255},
-    {0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 6, 5, 255, 255},
-    {0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 7, 6, 5, 255},
-    {0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 6, 255, 255, 255},
-    {0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 7, 6, 255, 255},
-    {0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 8, 7, 6, 255},
-    {0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 5, 255, 255, 255},
-    {0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 6, 5, 255, 255},
-    {0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 7, 6, 5, 255},
-    {0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 6, 255, 255, 255},
-    {0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 7, 6, 255, 255},
-    {0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 8, 7, 6, 255},
-    {0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 7, 255, 255, 255},
-    {0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 8, 7, 255, 255},
-    {0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 9, 8, 7, 255},
-    {1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 4, 255, 255, 255},
-    {1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 5, 4, 255, 255},
-    {1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 6, 5, 4, 255},
-    {1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 5, 255, 255, 255},
-    {1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 6, 5, 255, 255},
-    {1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 7, 6, 5, 255},
-    {1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 6, 255, 255, 255},
-    {1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 7, 6, 255, 255},
-    {1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 8, 7, 6, 255},
-    {1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 5, 255, 255, 255},
-    {1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 6, 5, 255, 255},
-    {1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 7, 6, 5, 255},
-    {1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 6, 255, 255, 255},
-    {1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 7, 6, 255, 255},
-    {1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 8, 7, 6, 255},
-    {1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 7, 255, 255, 255},
-    {1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 8, 7, 255, 255},
-    {1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 9, 8, 7, 255},
-    {1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 6, 255, 255, 255},
-    {1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 7, 6, 255, 255},
-    {1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 8, 7, 6, 255},
-    {1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 7, 255, 255, 255},
-    {1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 8, 7, 255, 255},
-    {1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 9, 8, 7, 255},
-    {1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 8, 255, 255, 255},
-    {1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 9, 8, 255, 255},
-    {1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 10, 9, 8, 255},
-    {2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 5, 255, 255, 255},
-    {2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 6, 5, 255, 255},
-    {2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 7, 6, 5, 255},
-    {2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 6, 255, 255, 255},
-    {2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 7, 6, 255, 255},
-    {2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 8, 7, 6, 255},
-    {2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 7, 255, 255, 255},
-    {2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 8, 7, 255, 255},
-    {2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 9, 8, 7, 255},
-    {2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 6, 255, 255, 255},
-    {2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 7, 6, 255, 255},
-    {2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 8, 7, 6, 255},
-    {2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 7, 255, 255, 255},
-    {2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 8, 7, 255, 255},
-    {2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 9, 8, 7, 255},
-    {2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 8, 255, 255, 255},
-    {2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 9, 8, 255, 255},
-    {2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 10, 9, 8, 255},
-    {2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 7, 255, 255, 255},
-    {2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 8, 7, 255, 255},
-    {2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 9, 8, 7, 255},
-    {2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 8, 255, 255, 255},
-    {2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 9, 8, 255, 255},
-    {2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 10, 9, 8, 255},
-    {2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 9, 255, 255, 255},
-    {2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 10, 9, 255, 255},
-    {2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 11, 10, 9, 255},
-    {0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 0, 0, 0, 0},
-    {0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 0, 0, 0, 0},
-    {0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 0, 0, 0, 0},
-    {0, 255, 255, 255, 1, 255, 255, 255, 5, 4, 3, 2, 0, 0, 0, 0},
-    {0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 0, 0, 0, 0},
-    {0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 0, 0, 0, 0},
-    {0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 0, 0, 0, 0},
-    {0, 255, 255, 255, 2, 1, 255, 255, 6, 5, 4, 3, 0, 0, 0, 0},
-    {0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 0, 0, 0, 0},
-    {0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 0, 0, 0, 0},
-    {0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 0, 0, 0, 0},
-    {0, 255, 255, 255, 3, 2, 1, 255, 7, 6, 5, 4, 0, 0, 0, 0},
-    {0, 255, 255, 255, 4, 3, 2, 1, 5, 255, 255, 255, 0, 0, 0, 0},
-    {0, 255, 255, 255, 4, 3, 2, 1, 6, 5, 255, 255, 0, 0, 0, 0},
-    {0, 255, 255, 255, 4, 3, 2, 1, 7, 6, 5, 255, 0, 0, 0, 0},
-    {0, 255, 255, 255, 4, 3, 2, 1, 8, 7, 6, 5, 0, 0, 0, 0},
-    {1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 0, 0, 0, 0},
-    {1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 0, 0, 0, 0},
-    {1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 0, 0, 0, 0},
-    {1, 0, 255, 255, 2, 255, 255, 255, 6, 5, 4, 3, 0, 0, 0, 0},
-    {1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 0, 0, 0, 0},
-    {1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 0, 0, 0, 0},
-    {1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 0, 0, 0, 0},
-    {1, 0, 255, 255, 3, 2, 255, 255, 7, 6, 5, 4, 0, 0, 0, 0},
-    {1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 0, 0, 0, 0},
-    {1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 0, 0, 0, 0},
-    {1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 0, 0, 0, 0},
-    {1, 0, 255, 255, 4, 3, 2, 255, 8, 7, 6, 5, 0, 0, 0, 0},
-    {1, 0, 255, 255, 5, 4, 3, 2, 6, 255, 255, 255, 0, 0, 0, 0},
-    {1, 0, 255, 255, 5, 4, 3, 2, 7, 6, 255, 255, 0, 0, 0, 0},
-    {1, 0, 255, 255, 5, 4, 3, 2, 8, 7, 6, 255, 0, 0, 0, 0},
-    {1, 0, 255, 255, 5, 4, 3, 2, 9, 8, 7, 6, 0, 0, 0, 0},
-    {2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 0, 0, 0, 0},
-    {2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 0, 0, 0, 0},
-    {2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 0, 0, 0, 0},
-    {2, 1, 0, 255, 3, 255, 255, 255, 7, 6, 5, 4, 0, 0, 0, 0},
-    {2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 0, 0, 0, 0},
-    {2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 0, 0, 0, 0},
-    {2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 0, 0, 0, 0},
-    {2, 1, 0, 255, 4, 3, 255, 255, 8, 7, 6, 5, 0, 0, 0, 0},
-    {2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 0, 0, 0, 0},
-    {2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 0, 0, 0, 0},
-    {2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 0, 0, 0, 0},
-    {2, 1, 0, 255, 5, 4, 3, 255, 9, 8, 7, 6, 0, 0, 0, 0},
-    {2, 1, 0, 255, 6, 5, 4, 3, 7, 255, 255, 255, 0, 0, 0, 0},
-    {2, 1, 0, 255, 6, 5, 4, 3, 8, 7, 255, 255, 0, 0, 0, 0},
-    {2, 1, 0, 255, 6, 5, 4, 3, 9, 8, 7, 255, 0, 0, 0, 0},
-    {2, 1, 0, 255, 6, 5, 4, 3, 10, 9, 8, 7, 0, 0, 0, 0},
-    {3, 2, 1, 0, 4, 255, 255, 255, 5, 255, 255, 255, 0, 0, 0, 0},
-    {3, 2, 1, 0, 4, 255, 255, 255, 6, 5, 255, 255, 0, 0, 0, 0},
-    {3, 2, 1, 0, 4, 255, 255, 255, 7, 6, 5, 255, 0, 0, 0, 0},
-    {3, 2, 1, 0, 4, 255, 255, 255, 8, 7, 6, 5, 0, 0, 0, 0},
-    {3, 2, 1, 0, 5, 4, 255, 255, 6, 255, 255, 255, 0, 0, 0, 0},
-    {3, 2, 1, 0, 5, 4, 255, 255, 7, 6, 255, 255, 0, 0, 0, 0},
-    {3, 2, 1, 0, 5, 4, 255, 255, 8, 7, 6, 255, 0, 0, 0, 0},
-    {3, 2, 1, 0, 5, 4, 255, 255, 9, 8, 7, 6, 0, 0, 0, 0},
-    {3, 2, 1, 0, 6, 5, 4, 255, 7, 255, 255, 255, 0, 0, 0, 0},
-    {3, 2, 1, 0, 6, 5, 4, 255, 8, 7, 255, 255, 0, 0, 0, 0},
-    {3, 2, 1, 0, 6, 5, 4, 255, 9, 8, 7, 255, 0, 0, 0, 0},
-    {3, 2, 1, 0, 6, 5, 4, 255, 10, 9, 8, 7, 0, 0, 0, 0},
-    {3, 2, 1, 0, 7, 6, 5, 4, 8, 255, 255, 255, 0, 0, 0, 0},
-    {3, 2, 1, 0, 7, 6, 5, 4, 9, 8, 255, 255, 0, 0, 0, 0},
-    {3, 2, 1, 0, 7, 6, 5, 4, 10, 9, 8, 255, 0, 0, 0, 0},
-    {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 0, 0, 0, 0}};
-/* number of two bytes : 64 */
-/* number of two + three bytes : 145 */
-/* number of two + three + four bytes : 209 */
-const uint8_t utf8bigindex[4096][2] = {
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},
-    {161, 4},  {64, 4},   {209, 12}, {209, 12}, {209, 12}, {147, 5},  {209, 12},
-    {150, 5},  {162, 5},  {65, 5},   {209, 12}, {153, 5},  {165, 5},  {67, 5},
-    {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {209, 12},
-    {148, 6},  {209, 12}, {151, 6},  {163, 6},  {66, 6},   {209, 12}, {154, 6},
-    {166, 6},  {68, 6},   {178, 6},  {74, 6},   {92, 6},   {64, 4},   {209, 12},
-    {157, 6},  {169, 6},  {70, 6},   {181, 6},  {76, 6},   {94, 6},   {65, 5},
-    {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},
-    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {152, 7},
-    {164, 7},  {145, 3},  {209, 12}, {155, 7},  {167, 7},  {69, 7},   {179, 7},
-    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {170, 7},  {71, 7},
-    {182, 7},  {77, 7},   {95, 7},   {65, 5},   {194, 7},  {83, 7},   {101, 7},
-    {67, 5},   {119, 7},  {73, 5},   {91, 5},   {1, 7},    {209, 12}, {209, 12},
-    {173, 7},  {148, 6},  {185, 7},  {79, 7},   {97, 7},   {66, 6},   {197, 7},
-    {85, 7},   {103, 7},  {68, 6},   {121, 7},  {74, 6},   {92, 6},   {2, 7},
-    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {76, 6},   {94, 6},
-    {4, 7},    {193, 6},  {82, 6},   {100, 6},  {8, 7},    {118, 6},  {16, 7},
-    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {145, 3},  {209, 12}, {156, 8},  {168, 8},  {146, 4},
-    {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {171, 8},
-    {72, 8},   {183, 8},  {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},
-    {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
-    {209, 12}, {174, 8},  {148, 6},  {186, 8},  {80, 8},   {98, 8},   {66, 6},
-    {198, 8},  {86, 8},   {104, 8},  {68, 6},   {122, 8},  {74, 6},   {92, 6},
-    {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {76, 6},
-    {94, 6},   {5, 8},    {193, 6},  {82, 6},   {100, 6},  {9, 8},    {118, 6},
-    {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},
-    {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
-    {112, 8},  {71, 7},   {130, 8},  {77, 7},   {95, 7},   {6, 8},    {194, 7},
-    {83, 7},   {101, 7},  {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},
-    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},
-    {66, 6},   {197, 7},  {85, 7},   {103, 7},  {12, 8},   {121, 7},  {20, 8},
-    {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
-    {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},
-    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12},
-    {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12},
-    {160, 9},  {172, 9},  {147, 5},  {184, 9},  {150, 5},  {162, 5},  {65, 5},
-    {196, 9},  {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},
-    {64, 4},   {209, 12}, {209, 12}, {175, 9},  {148, 6},  {187, 9},  {81, 9},
-    {99, 9},   {66, 6},   {199, 9},  {87, 9},   {105, 9},  {68, 6},   {123, 9},
-    {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {111, 9},  {70, 6},
-    {129, 9},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},
-    {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {190, 9},  {152, 7},  {164, 7},  {145, 3},  {202, 9},
-    {89, 9},   {107, 9},  {69, 7},   {125, 9},  {75, 7},   {93, 7},   {64, 4},
-    {209, 12}, {158, 7},  {113, 9},  {71, 7},   {131, 9},  {77, 7},   {95, 7},
-    {7, 9},    {194, 7},  {83, 7},   {101, 7},  {11, 9},   {119, 7},  {19, 9},
-    {35, 9},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {137, 9},
-    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},  {13, 9},
-    {121, 7},  {21, 9},   {37, 9},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
-    {70, 6},   {127, 7},  {25, 9},   {41, 9},   {4, 7},    {193, 6},  {82, 6},
-    {49, 9},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
-    {205, 9},  {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},
-    {64, 4},   {209, 12}, {159, 8},  {115, 9},  {72, 8},   {133, 9},  {78, 8},
-    {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},
-    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},
-    {139, 9},  {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {104, 8},
-    {14, 9},   {122, 8},  {22, 9},   {38, 9},   {3, 8},    {209, 12}, {157, 6},
-    {110, 8},  {70, 6},   {128, 8},  {26, 9},   {42, 9},   {5, 8},    {193, 6},
-    {82, 6},   {50, 9},   {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},
-    {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},
-    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},
-    {28, 9},   {44, 9},   {6, 8},    {194, 7},  {83, 7},   {52, 9},   {10, 8},
-    {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
-    {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
-    {56, 9},   {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12},
-    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},
-    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},
-    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12},
-    {149, 4},  {161, 4},  {64, 4},   {209, 12}, {209, 12}, {209, 12}, {147, 5},
-    {209, 12}, {150, 5},  {162, 5},  {65, 5},   {209, 12}, {153, 5},  {165, 5},
-    {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12},
-    {176, 10}, {148, 6},  {188, 10}, {151, 6},  {163, 6},  {66, 6},   {200, 10},
-    {154, 6},  {166, 6},  {68, 6},   {178, 6},  {74, 6},   {92, 6},   {64, 4},
-    {209, 12}, {157, 6},  {169, 6},  {70, 6},   {181, 6},  {76, 6},   {94, 6},
-    {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},
-    {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {191, 10},
-    {152, 7},  {164, 7},  {145, 3},  {203, 10}, {90, 10},  {108, 10}, {69, 7},
-    {126, 10}, {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {114, 10},
-    {71, 7},   {132, 10}, {77, 7},   {95, 7},   {65, 5},   {194, 7},  {83, 7},
-    {101, 7},  {67, 5},   {119, 7},  {73, 5},   {91, 5},   {1, 7},    {209, 12},
-    {209, 12}, {173, 7},  {148, 6},  {138, 10}, {79, 7},   {97, 7},   {66, 6},
-    {197, 7},  {85, 7},   {103, 7},  {68, 6},   {121, 7},  {74, 6},   {92, 6},
-    {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {76, 6},
-    {94, 6},   {4, 7},    {193, 6},  {82, 6},   {100, 6},  {8, 7},    {118, 6},
-    {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {145, 3},  {206, 10}, {156, 8},  {168, 8},
-    {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},
-    {116, 10}, {72, 8},   {134, 10}, {78, 8},   {96, 8},   {65, 5},   {195, 8},
-    {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},
-    {209, 12}, {209, 12}, {174, 8},  {148, 6},  {140, 10}, {80, 8},   {98, 8},
-    {66, 6},   {198, 8},  {86, 8},   {104, 8},  {15, 10},  {122, 8},  {23, 10},
-    {39, 10},  {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},
-    {27, 10},  {43, 10},  {5, 8},    {193, 6},  {82, 6},   {51, 10},  {9, 8},
-    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},
-    {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12},
-    {158, 7},  {112, 8},  {71, 7},   {130, 8},  {29, 10},  {45, 10},  {6, 8},
-    {194, 7},  {83, 7},   {53, 10},  {10, 8},   {119, 7},  {18, 8},   {34, 8},
-    {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},
-    {97, 7},   {66, 6},   {197, 7},  {85, 7},   {57, 10},  {12, 8},   {121, 7},
-    {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},
-    {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},
-    {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12},
-    {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},
-    {209, 12}, {160, 9},  {172, 9},  {147, 5},  {184, 9},  {150, 5},  {162, 5},
-    {65, 5},   {196, 9},  {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},
-    {91, 5},   {64, 4},   {209, 12}, {209, 12}, {175, 9},  {148, 6},  {142, 10},
-    {81, 9},   {99, 9},   {66, 6},   {199, 9},  {87, 9},   {105, 9},  {68, 6},
-    {123, 9},  {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {111, 9},
-    {70, 6},   {129, 9},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},
-    {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {190, 9},  {152, 7},  {164, 7},  {145, 3},
-    {202, 9},  {89, 9},   {107, 9},  {69, 7},   {125, 9},  {75, 7},   {93, 7},
-    {64, 4},   {209, 12}, {158, 7},  {113, 9},  {71, 7},   {131, 9},  {30, 10},
-    {46, 10},  {7, 9},    {194, 7},  {83, 7},   {54, 10},  {11, 9},   {119, 7},
-    {19, 9},   {35, 9},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},
-    {137, 9},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {58, 10},
-    {13, 9},   {121, 7},  {21, 9},   {37, 9},   {2, 7},    {209, 12}, {157, 6},
-    {109, 7},  {70, 6},   {127, 7},  {25, 9},   {41, 9},   {4, 7},    {193, 6},
-    {82, 6},   {49, 9},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {145, 3},  {205, 9},  {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},
-    {161, 4},  {64, 4},   {209, 12}, {159, 8},  {115, 9},  {72, 8},   {133, 9},
-    {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},
-    {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},
-    {148, 6},  {139, 9},  {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},
-    {60, 10},  {14, 9},   {122, 8},  {22, 9},   {38, 9},   {3, 8},    {209, 12},
-    {157, 6},  {110, 8},  {70, 6},   {128, 8},  {26, 9},   {42, 9},   {5, 8},
-    {193, 6},  {82, 6},   {50, 9},   {9, 8},    {118, 6},  {17, 8},   {33, 8},
-    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},
-    {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},
-    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},
-    {130, 8},  {28, 9},   {44, 9},   {6, 8},    {194, 7},  {83, 7},   {52, 9},
-    {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12},
-    {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},
-    {85, 7},   {56, 9},   {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},
-    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},
-    {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},
-    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},
-    {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12}, {209, 12}, {209, 12},
-    {147, 5},  {209, 12}, {150, 5},  {162, 5},  {65, 5},   {209, 12}, {153, 5},
-    {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
-    {209, 12}, {209, 12}, {148, 6},  {209, 12}, {151, 6},  {163, 6},  {66, 6},
-    {209, 12}, {154, 6},  {166, 6},  {68, 6},   {178, 6},  {74, 6},   {92, 6},
-    {64, 4},   {209, 12}, {157, 6},  {169, 6},  {70, 6},   {181, 6},  {76, 6},
-    {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},
-    {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {192, 11}, {152, 7},  {164, 7},  {145, 3},  {204, 11}, {155, 7},  {167, 7},
-    {69, 7},   {179, 7},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
-    {170, 7},  {71, 7},   {182, 7},  {77, 7},   {95, 7},   {65, 5},   {194, 7},
-    {83, 7},   {101, 7},  {67, 5},   {119, 7},  {73, 5},   {91, 5},   {1, 7},
-    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {185, 7},  {79, 7},   {97, 7},
-    {66, 6},   {197, 7},  {85, 7},   {103, 7},  {68, 6},   {121, 7},  {74, 6},
-    {92, 6},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
-    {76, 6},   {94, 6},   {4, 7},    {193, 6},  {82, 6},   {100, 6},  {8, 7},
-    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {207, 11}, {156, 8},
-    {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12},
-    {159, 8},  {117, 11}, {72, 8},   {135, 11}, {78, 8},   {96, 8},   {65, 5},
-    {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},
-    {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},  {141, 11}, {80, 8},
-    {98, 8},   {66, 6},   {198, 8},  {86, 8},   {104, 8},  {68, 6},   {122, 8},
-    {74, 6},   {92, 6},   {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},
-    {128, 8},  {76, 6},   {94, 6},   {5, 8},    {193, 6},  {82, 6},   {100, 6},
-    {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},
-    {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},
-    {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},  {77, 7},   {95, 7},
-    {6, 8},    {194, 7},  {83, 7},   {101, 7},  {10, 8},   {119, 7},  {18, 8},
-    {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},
-    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},  {12, 8},
-    {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
-    {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},
-    {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
-    {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},
-    {64, 4},   {209, 12}, {160, 9},  {172, 9},  {147, 5},  {184, 9},  {150, 5},
-    {162, 5},  {65, 5},   {196, 9},  {153, 5},  {165, 5},  {67, 5},   {177, 5},
-    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {175, 9},  {148, 6},
-    {143, 11}, {81, 9},   {99, 9},   {66, 6},   {199, 9},  {87, 9},   {105, 9},
-    {68, 6},   {123, 9},  {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},
-    {111, 9},  {70, 6},   {129, 9},  {76, 6},   {94, 6},   {65, 5},   {193, 6},
-    {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {190, 9},  {152, 7},  {164, 7},
-    {145, 3},  {202, 9},  {89, 9},   {107, 9},  {69, 7},   {125, 9},  {75, 7},
-    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {113, 9},  {71, 7},   {131, 9},
-    {31, 11},  {47, 11},  {7, 9},    {194, 7},  {83, 7},   {55, 11},  {11, 9},
-    {119, 7},  {19, 9},   {35, 9},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
-    {148, 6},  {137, 9},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
-    {59, 11},  {13, 9},   {121, 7},  {21, 9},   {37, 9},   {2, 7},    {209, 12},
-    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {25, 9},   {41, 9},   {4, 7},
-    {193, 6},  {82, 6},   {49, 9},   {8, 7},    {118, 6},  {16, 7},   {32, 7},
-    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {145, 3},  {205, 9},  {156, 8},  {168, 8},  {146, 4},  {180, 8},
-    {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {115, 9},  {72, 8},
-    {133, 9},  {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},
-    {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12},
-    {174, 8},  {148, 6},  {139, 9},  {80, 8},   {98, 8},   {66, 6},   {198, 8},
-    {86, 8},   {61, 11},  {14, 9},   {122, 8},  {22, 9},   {38, 9},   {3, 8},
-    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {26, 9},   {42, 9},
-    {5, 8},    {193, 6},  {82, 6},   {50, 9},   {9, 8},    {118, 6},  {17, 8},
-    {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},
-    {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},
-    {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},
-    {71, 7},   {130, 8},  {28, 9},   {44, 9},   {6, 8},    {194, 7},  {83, 7},
-    {52, 9},   {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12},
-    {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},
-    {197, 7},  {85, 7},   {56, 9},   {12, 8},   {121, 7},  {20, 8},   {36, 8},
-    {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},
-    {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},
-    {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12},
-    {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12}, {209, 12},
-    {209, 12}, {147, 5},  {209, 12}, {150, 5},  {162, 5},  {65, 5},   {209, 12},
-    {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},
-    {209, 12}, {209, 12}, {176, 10}, {148, 6},  {188, 10}, {151, 6},  {163, 6},
-    {66, 6},   {200, 10}, {154, 6},  {166, 6},  {68, 6},   {178, 6},  {74, 6},
-    {92, 6},   {64, 4},   {209, 12}, {157, 6},  {169, 6},  {70, 6},   {181, 6},
-    {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},
-    {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {191, 10}, {152, 7},  {164, 7},  {145, 3},  {203, 10}, {90, 10},
-    {108, 10}, {69, 7},   {126, 10}, {75, 7},   {93, 7},   {64, 4},   {209, 12},
-    {158, 7},  {114, 10}, {71, 7},   {132, 10}, {77, 7},   {95, 7},   {65, 5},
-    {194, 7},  {83, 7},   {101, 7},  {67, 5},   {119, 7},  {73, 5},   {91, 5},
-    {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {138, 10}, {79, 7},
-    {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},  {68, 6},   {121, 7},
-    {74, 6},   {92, 6},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},
-    {127, 7},  {76, 6},   {94, 6},   {4, 7},    {193, 6},  {82, 6},   {100, 6},
-    {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {206, 10},
-    {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},
-    {209, 12}, {159, 8},  {116, 10}, {72, 8},   {134, 10}, {78, 8},   {96, 8},
-    {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},
-    {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},  {140, 10},
-    {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {62, 11},  {15, 10},
-    {122, 8},  {23, 10},  {39, 10},  {3, 8},    {209, 12}, {157, 6},  {110, 8},
-    {70, 6},   {128, 8},  {27, 10},  {43, 10},  {5, 8},    {193, 6},  {82, 6},
-    {51, 10},  {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},
-    {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},
-    {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},  {29, 10},
-    {45, 10},  {6, 8},    {194, 7},  {83, 7},   {53, 10},  {10, 8},   {119, 7},
-    {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},
-    {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {57, 10},
-    {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},
-    {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},
-    {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},
-    {161, 4},  {64, 4},   {209, 12}, {160, 9},  {172, 9},  {147, 5},  {184, 9},
-    {150, 5},  {162, 5},  {65, 5},   {196, 9},  {153, 5},  {165, 5},  {67, 5},
-    {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {175, 9},
-    {148, 6},  {142, 10}, {81, 9},   {99, 9},   {66, 6},   {199, 9},  {87, 9},
-    {105, 9},  {68, 6},   {123, 9},  {74, 6},   {92, 6},   {64, 4},   {209, 12},
-    {157, 6},  {111, 9},  {70, 6},   {129, 9},  {76, 6},   {94, 6},   {65, 5},
-    {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},
-    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {190, 9},  {152, 7},
-    {164, 7},  {145, 3},  {202, 9},  {89, 9},   {107, 9},  {69, 7},   {125, 9},
-    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {113, 9},  {71, 7},
-    {131, 9},  {30, 10},  {46, 10},  {7, 9},    {194, 7},  {83, 7},   {54, 10},
-    {11, 9},   {119, 7},  {19, 9},   {35, 9},   {1, 7},    {209, 12}, {209, 12},
-    {173, 7},  {148, 6},  {137, 9},  {79, 7},   {97, 7},   {66, 6},   {197, 7},
-    {85, 7},   {58, 10},  {13, 9},   {121, 7},  {21, 9},   {37, 9},   {2, 7},
-    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {25, 9},   {41, 9},
-    {4, 7},    {193, 6},  {82, 6},   {49, 9},   {8, 7},    {118, 6},  {16, 7},
-    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {145, 3},  {205, 9},  {156, 8},  {168, 8},  {146, 4},
-    {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {115, 9},
-    {72, 8},   {133, 9},  {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},
-    {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
-    {209, 12}, {174, 8},  {148, 6},  {139, 9},  {80, 8},   {98, 8},   {66, 6},
-    {198, 8},  {86, 8},   {60, 10},  {14, 9},   {122, 8},  {22, 9},   {38, 9},
-    {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {26, 9},
-    {42, 9},   {5, 8},    {193, 6},  {82, 6},   {50, 9},   {9, 8},    {118, 6},
-    {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},
-    {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
-    {112, 8},  {71, 7},   {130, 8},  {28, 9},   {44, 9},   {6, 8},    {194, 7},
-    {83, 7},   {52, 9},   {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},
-    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},
-    {66, 6},   {197, 7},  {85, 7},   {56, 9},   {12, 8},   {121, 7},  {20, 8},
-    {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
-    {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},
-    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12},
-    {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12},
-    {209, 12}, {209, 12}, {147, 5},  {209, 12}, {150, 5},  {162, 5},  {65, 5},
-    {209, 12}, {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},
-    {64, 4},   {209, 12}, {209, 12}, {209, 12}, {148, 6},  {209, 12}, {151, 6},
-    {163, 6},  {66, 6},   {209, 12}, {154, 6},  {166, 6},  {68, 6},   {178, 6},
-    {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {169, 6},  {70, 6},
-    {181, 6},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},
-    {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {152, 7},  {164, 7},  {145, 3},  {209, 12},
-    {155, 7},  {167, 7},  {69, 7},   {179, 7},  {75, 7},   {93, 7},   {64, 4},
-    {209, 12}, {158, 7},  {170, 7},  {71, 7},   {182, 7},  {77, 7},   {95, 7},
-    {65, 5},   {194, 7},  {83, 7},   {101, 7},  {67, 5},   {119, 7},  {73, 5},
-    {91, 5},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {185, 7},
-    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},  {68, 6},
-    {121, 7},  {74, 6},   {92, 6},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
-    {70, 6},   {127, 7},  {76, 6},   {94, 6},   {4, 7},    {193, 6},  {82, 6},
-    {100, 6},  {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
-    {208, 12}, {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},
-    {64, 4},   {209, 12}, {159, 8},  {171, 8},  {72, 8},   {183, 8},  {78, 8},
-    {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},
-    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},
-    {186, 8},  {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {104, 8},
-    {68, 6},   {122, 8},  {74, 6},   {92, 6},   {3, 8},    {209, 12}, {157, 6},
-    {110, 8},  {70, 6},   {128, 8},  {76, 6},   {94, 6},   {5, 8},    {193, 6},
-    {82, 6},   {100, 6},  {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},
-    {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},
-    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},
-    {77, 7},   {95, 7},   {6, 8},    {194, 7},  {83, 7},   {101, 7},  {10, 8},
-    {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
-    {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
-    {103, 7},  {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12},
-    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},
-    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},
-    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12},
-    {149, 4},  {161, 4},  {64, 4},   {209, 12}, {160, 9},  {172, 9},  {147, 5},
-    {184, 9},  {150, 5},  {162, 5},  {65, 5},   {196, 9},  {153, 5},  {165, 5},
-    {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12},
-    {175, 9},  {148, 6},  {144, 12}, {81, 9},   {99, 9},   {66, 6},   {199, 9},
-    {87, 9},   {105, 9},  {68, 6},   {123, 9},  {74, 6},   {92, 6},   {64, 4},
-    {209, 12}, {157, 6},  {111, 9},  {70, 6},   {129, 9},  {76, 6},   {94, 6},
-    {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},
-    {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {190, 9},
-    {152, 7},  {164, 7},  {145, 3},  {202, 9},  {89, 9},   {107, 9},  {69, 7},
-    {125, 9},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {113, 9},
-    {71, 7},   {131, 9},  {77, 7},   {95, 7},   {7, 9},    {194, 7},  {83, 7},
-    {101, 7},  {11, 9},   {119, 7},  {19, 9},   {35, 9},   {1, 7},    {209, 12},
-    {209, 12}, {173, 7},  {148, 6},  {137, 9},  {79, 7},   {97, 7},   {66, 6},
-    {197, 7},  {85, 7},   {103, 7},  {13, 9},   {121, 7},  {21, 9},   {37, 9},
-    {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {25, 9},
-    {41, 9},   {4, 7},    {193, 6},  {82, 6},   {49, 9},   {8, 7},    {118, 6},
-    {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {145, 3},  {205, 9},  {156, 8},  {168, 8},
-    {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},
-    {115, 9},  {72, 8},   {133, 9},  {78, 8},   {96, 8},   {65, 5},   {195, 8},
-    {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},
-    {209, 12}, {209, 12}, {174, 8},  {148, 6},  {139, 9},  {80, 8},   {98, 8},
-    {66, 6},   {198, 8},  {86, 8},   {104, 8},  {14, 9},   {122, 8},  {22, 9},
-    {38, 9},   {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},
-    {26, 9},   {42, 9},   {5, 8},    {193, 6},  {82, 6},   {50, 9},   {9, 8},
-    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},
-    {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12},
-    {158, 7},  {112, 8},  {71, 7},   {130, 8},  {28, 9},   {44, 9},   {6, 8},
-    {194, 7},  {83, 7},   {52, 9},   {10, 8},   {119, 7},  {18, 8},   {34, 8},
-    {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},
-    {97, 7},   {66, 6},   {197, 7},  {85, 7},   {56, 9},   {12, 8},   {121, 7},
-    {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},
-    {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},
-    {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12},
-    {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},
-    {209, 12}, {209, 12}, {209, 12}, {147, 5},  {209, 12}, {150, 5},  {162, 5},
-    {65, 5},   {209, 12}, {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},
-    {91, 5},   {64, 4},   {209, 12}, {209, 12}, {176, 10}, {148, 6},  {188, 10},
-    {151, 6},  {163, 6},  {66, 6},   {200, 10}, {154, 6},  {166, 6},  {68, 6},
-    {178, 6},  {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {169, 6},
-    {70, 6},   {181, 6},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},
-    {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {191, 10}, {152, 7},  {164, 7},  {145, 3},
-    {203, 10}, {90, 10},  {108, 10}, {69, 7},   {126, 10}, {75, 7},   {93, 7},
-    {64, 4},   {209, 12}, {158, 7},  {114, 10}, {71, 7},   {132, 10}, {77, 7},
-    {95, 7},   {65, 5},   {194, 7},  {83, 7},   {101, 7},  {67, 5},   {119, 7},
-    {73, 5},   {91, 5},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},
-    {138, 10}, {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},
-    {68, 6},   {121, 7},  {74, 6},   {92, 6},   {2, 7},    {209, 12}, {157, 6},
-    {109, 7},  {70, 6},   {127, 7},  {76, 6},   {94, 6},   {4, 7},    {193, 6},
-    {82, 6},   {100, 6},  {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {145, 3},  {206, 10}, {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},
-    {161, 4},  {64, 4},   {209, 12}, {159, 8},  {116, 10}, {72, 8},   {134, 10},
-    {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},
-    {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},
-    {148, 6},  {140, 10}, {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},
-    {63, 12},  {15, 10},  {122, 8},  {23, 10},  {39, 10},  {3, 8},    {209, 12},
-    {157, 6},  {110, 8},  {70, 6},   {128, 8},  {27, 10},  {43, 10},  {5, 8},
-    {193, 6},  {82, 6},   {51, 10},  {9, 8},    {118, 6},  {17, 8},   {33, 8},
-    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},
-    {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},
-    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},
-    {130, 8},  {29, 10},  {45, 10},  {6, 8},    {194, 7},  {83, 7},   {53, 10},
-    {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12},
-    {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},
-    {85, 7},   {57, 10},  {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},
-    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},
-    {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},
-    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},
-    {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12}, {160, 9},  {172, 9},
-    {147, 5},  {184, 9},  {150, 5},  {162, 5},  {65, 5},   {196, 9},  {153, 5},
-    {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
-    {209, 12}, {175, 9},  {148, 6},  {142, 10}, {81, 9},   {99, 9},   {66, 6},
-    {199, 9},  {87, 9},   {105, 9},  {68, 6},   {123, 9},  {74, 6},   {92, 6},
-    {64, 4},   {209, 12}, {157, 6},  {111, 9},  {70, 6},   {129, 9},  {76, 6},
-    {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},
-    {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {190, 9},  {152, 7},  {164, 7},  {145, 3},  {202, 9},  {89, 9},   {107, 9},
-    {69, 7},   {125, 9},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
-    {113, 9},  {71, 7},   {131, 9},  {30, 10},  {46, 10},  {7, 9},    {194, 7},
-    {83, 7},   {54, 10},  {11, 9},   {119, 7},  {19, 9},   {35, 9},   {1, 7},
-    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {137, 9},  {79, 7},   {97, 7},
-    {66, 6},   {197, 7},  {85, 7},   {58, 10},  {13, 9},   {121, 7},  {21, 9},
-    {37, 9},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
-    {25, 9},   {41, 9},   {4, 7},    {193, 6},  {82, 6},   {49, 9},   {8, 7},
-    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {205, 9},  {156, 8},
-    {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12},
-    {159, 8},  {115, 9},  {72, 8},   {133, 9},  {78, 8},   {96, 8},   {65, 5},
-    {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},
-    {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},  {139, 9},  {80, 8},
-    {98, 8},   {66, 6},   {198, 8},  {86, 8},   {60, 10},  {14, 9},   {122, 8},
-    {22, 9},   {38, 9},   {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},
-    {128, 8},  {26, 9},   {42, 9},   {5, 8},    {193, 6},  {82, 6},   {50, 9},
-    {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},
-    {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},
-    {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},  {28, 9},   {44, 9},
-    {6, 8},    {194, 7},  {83, 7},   {52, 9},   {10, 8},   {119, 7},  {18, 8},
-    {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},
-    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {56, 9},   {12, 8},
-    {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
-    {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},
-    {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
-    {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},
-    {64, 4},   {209, 12}, {209, 12}, {209, 12}, {147, 5},  {209, 12}, {150, 5},
-    {162, 5},  {65, 5},   {209, 12}, {153, 5},  {165, 5},  {67, 5},   {177, 5},
-    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {209, 12}, {148, 6},
-    {209, 12}, {151, 6},  {163, 6},  {66, 6},   {209, 12}, {154, 6},  {166, 6},
-    {68, 6},   {178, 6},  {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},
-    {169, 6},  {70, 6},   {181, 6},  {76, 6},   {94, 6},   {65, 5},   {193, 6},
-    {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {192, 11}, {152, 7},  {164, 7},
-    {145, 3},  {204, 11}, {155, 7},  {167, 7},  {69, 7},   {179, 7},  {75, 7},
-    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {170, 7},  {71, 7},   {182, 7},
-    {77, 7},   {95, 7},   {65, 5},   {194, 7},  {83, 7},   {101, 7},  {67, 5},
-    {119, 7},  {73, 5},   {91, 5},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
-    {148, 6},  {185, 7},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
-    {103, 7},  {68, 6},   {121, 7},  {74, 6},   {92, 6},   {2, 7},    {209, 12},
-    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {76, 6},   {94, 6},   {4, 7},
-    {193, 6},  {82, 6},   {100, 6},  {8, 7},    {118, 6},  {16, 7},   {32, 7},
-    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {145, 3},  {207, 11}, {156, 8},  {168, 8},  {146, 4},  {180, 8},
-    {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {117, 11}, {72, 8},
-    {135, 11}, {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},
-    {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12},
-    {174, 8},  {148, 6},  {141, 11}, {80, 8},   {98, 8},   {66, 6},   {198, 8},
-    {86, 8},   {104, 8},  {68, 6},   {122, 8},  {74, 6},   {92, 6},   {3, 8},
-    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {76, 6},   {94, 6},
-    {5, 8},    {193, 6},  {82, 6},   {100, 6},  {9, 8},    {118, 6},  {17, 8},
-    {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},
-    {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},
-    {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},
-    {71, 7},   {130, 8},  {77, 7},   {95, 7},   {6, 8},    {194, 7},  {83, 7},
-    {101, 7},  {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12},
-    {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},
-    {197, 7},  {85, 7},   {103, 7},  {12, 8},   {121, 7},  {20, 8},   {36, 8},
-    {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},
-    {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},
-    {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12},
-    {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12}, {160, 9},
-    {172, 9},  {147, 5},  {184, 9},  {150, 5},  {162, 5},  {65, 5},   {196, 9},
-    {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},
-    {209, 12}, {209, 12}, {175, 9},  {148, 6},  {143, 11}, {81, 9},   {99, 9},
-    {66, 6},   {199, 9},  {87, 9},   {105, 9},  {68, 6},   {123, 9},  {74, 6},
-    {92, 6},   {64, 4},   {209, 12}, {157, 6},  {111, 9},  {70, 6},   {129, 9},
-    {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},
-    {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {190, 9},  {152, 7},  {164, 7},  {145, 3},  {202, 9},  {89, 9},
-    {107, 9},  {69, 7},   {125, 9},  {75, 7},   {93, 7},   {64, 4},   {209, 12},
-    {158, 7},  {113, 9},  {71, 7},   {131, 9},  {31, 11},  {47, 11},  {7, 9},
-    {194, 7},  {83, 7},   {55, 11},  {11, 9},   {119, 7},  {19, 9},   {35, 9},
-    {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {137, 9},  {79, 7},
-    {97, 7},   {66, 6},   {197, 7},  {85, 7},   {59, 11},  {13, 9},   {121, 7},
-    {21, 9},   {37, 9},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},
-    {127, 7},  {25, 9},   {41, 9},   {4, 7},    {193, 6},  {82, 6},   {49, 9},
-    {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {205, 9},
-    {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},
-    {209, 12}, {159, 8},  {115, 9},  {72, 8},   {133, 9},  {78, 8},   {96, 8},
-    {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},
-    {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},  {139, 9},
-    {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {61, 11},  {14, 9},
-    {122, 8},  {22, 9},   {38, 9},   {3, 8},    {209, 12}, {157, 6},  {110, 8},
-    {70, 6},   {128, 8},  {26, 9},   {42, 9},   {5, 8},    {193, 6},  {82, 6},
-    {50, 9},   {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},
-    {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},
-    {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},  {28, 9},
-    {44, 9},   {6, 8},    {194, 7},  {83, 7},   {52, 9},   {10, 8},   {119, 7},
-    {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},
-    {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {56, 9},
-    {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},
-    {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},
-    {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},
-    {161, 4},  {64, 4},   {209, 12}, {209, 12}, {209, 12}, {147, 5},  {209, 12},
-    {150, 5},  {162, 5},  {65, 5},   {209, 12}, {153, 5},  {165, 5},  {67, 5},
-    {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {176, 10},
-    {148, 6},  {188, 10}, {151, 6},  {163, 6},  {66, 6},   {200, 10}, {154, 6},
-    {166, 6},  {68, 6},   {178, 6},  {74, 6},   {92, 6},   {64, 4},   {209, 12},
-    {157, 6},  {169, 6},  {70, 6},   {181, 6},  {76, 6},   {94, 6},   {65, 5},
-    {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},
-    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {191, 10}, {152, 7},
-    {164, 7},  {145, 3},  {203, 10}, {90, 10},  {108, 10}, {69, 7},   {126, 10},
-    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {114, 10}, {71, 7},
-    {132, 10}, {77, 7},   {95, 7},   {65, 5},   {194, 7},  {83, 7},   {101, 7},
-    {67, 5},   {119, 7},  {73, 5},   {91, 5},   {1, 7},    {209, 12}, {209, 12},
-    {173, 7},  {148, 6},  {138, 10}, {79, 7},   {97, 7},   {66, 6},   {197, 7},
-    {85, 7},   {103, 7},  {68, 6},   {121, 7},  {74, 6},   {92, 6},   {2, 7},
-    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {76, 6},   {94, 6},
-    {4, 7},    {193, 6},  {82, 6},   {100, 6},  {8, 7},    {118, 6},  {16, 7},
-    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {145, 3},  {206, 10}, {156, 8},  {168, 8},  {146, 4},
-    {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {116, 10},
-    {72, 8},   {134, 10}, {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},
-    {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
-    {209, 12}, {174, 8},  {148, 6},  {140, 10}, {80, 8},   {98, 8},   {66, 6},
-    {198, 8},  {86, 8},   {62, 11},  {15, 10},  {122, 8},  {23, 10},  {39, 10},
-    {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {27, 10},
-    {43, 10},  {5, 8},    {193, 6},  {82, 6},   {51, 10},  {9, 8},    {118, 6},
-    {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
-    {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},
-    {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
-    {112, 8},  {71, 7},   {130, 8},  {29, 10},  {45, 10},  {6, 8},    {194, 7},
-    {83, 7},   {53, 10},  {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},
-    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},
-    {66, 6},   {197, 7},  {85, 7},   {57, 10},  {12, 8},   {121, 7},  {20, 8},
-    {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
-    {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},
-    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12},
-    {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12},
-    {160, 9},  {172, 9},  {147, 5},  {184, 9},  {150, 5},  {162, 5},  {65, 5},
-    {196, 9},  {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},
-    {64, 4},   {209, 12}, {209, 12}, {175, 9},  {148, 6},  {142, 10}, {81, 9},
-    {99, 9},   {66, 6},   {199, 9},  {87, 9},   {105, 9},  {68, 6},   {123, 9},
-    {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {111, 9},  {70, 6},
-    {129, 9},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},
-    {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12},
-    {209, 12}, {209, 12}, {190, 9},  {152, 7},  {164, 7},  {145, 3},  {202, 9},
-    {89, 9},   {107, 9},  {69, 7},   {125, 9},  {75, 7},   {93, 7},   {64, 4},
-    {209, 12}, {158, 7},  {113, 9},  {71, 7},   {131, 9},  {30, 10},  {46, 10},
-    {7, 9},    {194, 7},  {83, 7},   {54, 10},  {11, 9},   {119, 7},  {19, 9},
-    {35, 9},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {137, 9},
-    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {58, 10},  {13, 9},
-    {121, 7},  {21, 9},   {37, 9},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
-    {70, 6},   {127, 7},  {25, 9},   {41, 9},   {4, 7},    {193, 6},  {82, 6},
-    {49, 9},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
-    {205, 9},  {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},
-    {64, 4},   {209, 12}, {159, 8},  {115, 9},  {72, 8},   {133, 9},  {78, 8},
-    {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},
-    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},
-    {139, 9},  {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {60, 10},
-    {14, 9},   {122, 8},  {22, 9},   {38, 9},   {3, 8},    {209, 12}, {157, 6},
-    {110, 8},  {70, 6},   {128, 8},  {26, 9},   {42, 9},   {5, 8},    {193, 6},
-    {82, 6},   {50, 9},   {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},
-    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},
-    {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},
-    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},
-    {28, 9},   {44, 9},   {6, 8},    {194, 7},  {83, 7},   {52, 9},   {10, 8},
-    {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
-    {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
-    {56, 9},   {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12},
-    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},
-    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},
-    {0, 6}};
-} // namespace utf8_to_utf16
-} // namespace tables
-} // unnamed namespace
-} // namespace simdutf
+  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(
+      const char16_t *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_valid_utf16be_to_latin1(buf, len, latin1_output);
+  }
 
-#endif // SIMDUTF_UTF8_TO_UTF16_TABLES_H
-/* end file src/tables/utf8_to_utf16_tables.h */
-/* begin file src/tables/utf16_to_utf8_tables.h */
-// file generated by scripts/sse_convert_utf16_to_utf8.py
-#ifndef SIMDUTF_UTF16_TO_UTF8_TABLES_H
-#define SIMDUTF_UTF16_TO_UTF8_TABLES_H
+  simdutf_warn_unused size_t
+  convert_utf16le_to_utf8(const char16_t *buf, size_t len,
+                          char *utf8_output) const noexcept final override {
+    return set_best()->convert_utf16le_to_utf8(buf, len, utf8_output);
+  }
 
-namespace simdutf {
-namespace {
-namespace tables {
-namespace utf16_to_utf8 {
+  simdutf_warn_unused size_t
+  convert_utf16be_to_utf8(const char16_t *buf, size_t len,
+                          char *utf8_output) const noexcept final override {
+    return set_best()->convert_utf16be_to_utf8(buf, len, utf8_output);
+  }
 
-// 1 byte for length, 16 bytes for mask
+  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+      const char16_t *buf, size_t len,
+      char *utf8_output) const noexcept final override {
+    return set_best()->convert_utf16le_to_utf8_with_errors(buf, len,
+                                                           utf8_output);
+  }
+
+  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+      const char16_t *buf, size_t len,
+      char *utf8_output) const noexcept final override {
+    return set_best()->convert_utf16be_to_utf8_with_errors(buf, len,
+                                                           utf8_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+      const char16_t *buf, size_t len,
+      char *utf8_output) const noexcept final override {
+    return set_best()->convert_valid_utf16le_to_utf8(buf, len, utf8_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+      const char16_t *buf, size_t len,
+      char *utf8_output) const noexcept final override {
+    return set_best()->convert_valid_utf16be_to_utf8(buf, len, utf8_output);
+  }
+
+  simdutf_warn_unused size_t
+  convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                          char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf32_to_latin1(buf, len, latin1_output);
+  }
+
+  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(
+      const char32_t *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_utf32_to_latin1_with_errors(buf, len,
+                                                           latin1_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(
+      const char32_t *buf, size_t len,
+      char *latin1_output) const noexcept final override {
+    return set_best()->convert_valid_utf32_to_latin1(buf, len, latin1_output);
+  }
+
+  simdutf_warn_unused size_t
+  convert_utf32_to_utf8(const char32_t *buf, size_t len,
+                        char *utf8_output) const noexcept final override {
+    return set_best()->convert_utf32_to_utf8(buf, len, utf8_output);
+  }
+
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *buf, size_t len,
+      char *utf8_output) const noexcept final override {
+    return set_best()->convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
+  }
+
+  simdutf_warn_unused size_t
+  convert_valid_utf32_to_utf8(const char32_t *buf, size_t len,
+                              char *utf8_output) const noexcept final override {
+    return set_best()->convert_valid_utf32_to_utf8(buf, len, utf8_output);
+  }
+
+  simdutf_warn_unused size_t convert_utf32_to_utf16le(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf32_to_utf16le(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused size_t convert_utf32_to_utf16be(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf32_to_utf16be(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf32_to_utf16le_with_errors(buf, len,
+                                                            utf16_output);
+  }
+
+  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_utf32_to_utf16be_with_errors(buf, len,
+                                                            utf16_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_valid_utf32_to_utf16le(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(
+      const char32_t *buf, size_t len,
+      char16_t *utf16_output) const noexcept final override {
+    return set_best()->convert_valid_utf32_to_utf16be(buf, len, utf16_output);
+  }
+
+  simdutf_warn_unused size_t convert_utf16le_to_utf32(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_utf16le_to_utf32(buf, len, utf32_output);
+  }
+
+  simdutf_warn_unused size_t convert_utf16be_to_utf32(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_utf16be_to_utf32(buf, len, utf32_output);
+  }
+
+  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_utf16le_to_utf32_with_errors(buf, len,
+                                                            utf32_output);
+  }
+
+  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_utf16be_to_utf32_with_errors(buf, len,
+                                                            utf32_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_valid_utf16le_to_utf32(buf, len, utf32_output);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(
+      const char16_t *buf, size_t len,
+      char32_t *utf32_output) const noexcept final override {
+    return set_best()->convert_valid_utf16be_to_utf32(buf, len, utf32_output);
+  }
+
+  void change_endianness_utf16(const char16_t *buf, size_t len,
+                               char16_t *output) const noexcept final override {
+    set_best()->change_endianness_utf16(buf, len, output);
+  }
+
+  simdutf_warn_unused size_t
+  count_utf16le(const char16_t *buf, size_t len) const noexcept final override {
+    return set_best()->count_utf16le(buf, len);
+  }
+
+  simdutf_warn_unused size_t
+  count_utf16be(const char16_t *buf, size_t len) const noexcept final override {
+    return set_best()->count_utf16be(buf, len);
+  }
+
+  simdutf_warn_unused size_t
+  count_utf8(const char *buf, size_t len) const noexcept final override {
+    return set_best()->count_utf8(buf, len);
+  }
+
+  simdutf_warn_unused size_t
+  latin1_length_from_utf8(const char *buf, size_t len) const noexcept override {
+    return set_best()->latin1_length_from_utf8(buf, len);
+  }
+
+  simdutf_warn_unused size_t
+  latin1_length_from_utf16(size_t len) const noexcept override {
+    return set_best()->latin1_length_from_utf16(len);
+  }
+
+  simdutf_warn_unused size_t
+  latin1_length_from_utf32(size_t len) const noexcept override {
+    return set_best()->latin1_length_from_utf32(len);
+  }
+
+  simdutf_warn_unused size_t
+  utf8_length_from_latin1(const char *buf, size_t len) const noexcept override {
+    return set_best()->utf8_length_from_latin1(buf, len);
+  }
+
+  simdutf_warn_unused size_t utf8_length_from_utf16le(
+      const char16_t *buf, size_t len) const noexcept override {
+    return set_best()->utf8_length_from_utf16le(buf, len);
+  }
+
+  simdutf_warn_unused size_t utf8_length_from_utf16be(
+      const char16_t *buf, size_t len) const noexcept override {
+    return set_best()->utf8_length_from_utf16be(buf, len);
+  }
+
+  simdutf_warn_unused size_t
+  utf16_length_from_latin1(size_t len) const noexcept override {
+    return set_best()->utf16_length_from_latin1(len);
+  }
+
+  simdutf_warn_unused size_t
+  utf32_length_from_latin1(size_t len) const noexcept override {
+    return set_best()->utf32_length_from_latin1(len);
+  }
+
+  simdutf_warn_unused size_t utf32_length_from_utf16le(
+      const char16_t *buf, size_t len) const noexcept override {
+    return set_best()->utf32_length_from_utf16le(buf, len);
+  }
+
+  simdutf_warn_unused size_t utf32_length_from_utf16be(
+      const char16_t *buf, size_t len) const noexcept override {
+    return set_best()->utf32_length_from_utf16be(buf, len);
+  }
+
+  simdutf_warn_unused size_t
+  utf16_length_from_utf8(const char *buf, size_t len) const noexcept override {
+    return set_best()->utf16_length_from_utf8(buf, len);
+  }
+
+  simdutf_warn_unused size_t utf8_length_from_utf32(
+      const char32_t *buf, size_t len) const noexcept override {
+    return set_best()->utf8_length_from_utf32(buf, len);
+  }
+
+  simdutf_warn_unused size_t utf16_length_from_utf32(
+      const char32_t *buf, size_t len) const noexcept override {
+    return set_best()->utf16_length_from_utf32(buf, len);
+  }
+
+  simdutf_warn_unused size_t
+  utf32_length_from_utf8(const char *buf, size_t len) const noexcept override {
+    return set_best()->utf32_length_from_utf8(buf, len);
+  }
+
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char *input, size_t length) const noexcept override {
+    return set_best()->maximal_binary_length_from_base64(input, length);
+  }
+
+  simdutf_warn_unused result base64_to_binary(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_handling_options =
+          last_chunk_handling_options::loose) const noexcept override {
+    return set_best()->base64_to_binary(input, length, output, options,
+                                        last_chunk_handling_options);
+  }
+
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char *input, size_t length, char *output, base64_options options,
+      last_chunk_handling_options last_chunk_handling_options =
+          last_chunk_handling_options::loose) const noexcept override {
+    return set_best()->base64_to_binary_details(input, length, output, options,
+                                                last_chunk_handling_options);
+  }
+
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char16_t *input, size_t length) const noexcept override {
+    return set_best()->maximal_binary_length_from_base64(input, length);
+  }
+
+  simdutf_warn_unused result base64_to_binary(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_handling_options =
+          last_chunk_handling_options::loose) const noexcept override {
+    return set_best()->base64_to_binary(input, length, output, options,
+                                        last_chunk_handling_options);
+  }
+
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char16_t *input, size_t length, char *output,
+      base64_options options,
+      last_chunk_handling_options last_chunk_handling_options =
+          last_chunk_handling_options::loose) const noexcept override {
+    return set_best()->base64_to_binary_details(input, length, output, options,
+                                                last_chunk_handling_options);
+  }
+
+  simdutf_warn_unused size_t base64_length_from_binary(
+      size_t length, base64_options options) const noexcept override {
+    return set_best()->base64_length_from_binary(length, options);
+  }
+
+  size_t binary_to_base64(const char *input, size_t length, char *output,
+                          base64_options options) const noexcept override {
+    return set_best()->binary_to_base64(input, length, output, options);
+  }
+
+  simdutf_really_inline
+  detect_best_supported_implementation_on_first_use() noexcept
+      : implementation("best_supported_detector",
+                       "Detects the best supported implementation and sets it",
+                       0) {}
+
+private:
+  const implementation *set_best() const noexcept;
+};
+
+static_assert(std::is_trivially_destructible<
+                  detect_best_supported_implementation_on_first_use>::value,
+              "detect_best_supported_implementation_on_first_use should be "
+              "trivially destructible");
+
+static const std::initializer_list<const implementation *> &
+get_available_implementation_pointers() {
+  static const std::initializer_list<const implementation *>
+      available_implementation_pointers{
+#if SIMDUTF_IMPLEMENTATION_ICELAKE
+          get_icelake_singleton(),
+#endif
+#if SIMDUTF_IMPLEMENTATION_HASWELL
+          get_haswell_singleton(),
+#endif
+#if SIMDUTF_IMPLEMENTATION_WESTMERE
+          get_westmere_singleton(),
+#endif
+#if SIMDUTF_IMPLEMENTATION_ARM64
+          get_arm64_singleton(),
+#endif
+#if SIMDUTF_IMPLEMENTATION_PPC64
+          get_ppc64_singleton(),
+#endif
+#if SIMDUTF_IMPLEMENTATION_RVV
+          get_rvv_singleton(),
+#endif
+#if SIMDUTF_IMPLEMENTATION_LSX
+          get_lsx_singleton(),
+#endif
+#if SIMDUTF_IMPLEMENTATION_LASX
+          get_lasx_singleton(),
+#endif
+#if SIMDUTF_IMPLEMENTATION_FALLBACK
+          get_fallback_singleton(),
+#endif
+      }; // available_implementation_pointers
+  return available_implementation_pointers;
+}
+
+// Placeholder implementation used when no compiled-in implementation is
+// supported by the CPU: every operation fails or reports zero output.
+class unsupported_implementation final : public implementation {
+public:
+  simdutf_warn_unused int detect_encodings(const char *,
+                                           size_t) const noexcept override {
+    return encoding_type::unspecified;
+  }
+
+  simdutf_warn_unused bool validate_utf8(const char *,
+                                         size_t) const noexcept final override {
+    // Just refuse to validate. Given that we have a fallback implementation,
+    // it seems unlikely that unsupported_implementation will ever be used. If
+    // it is used, then it will flag all strings as invalid. The alternative is
+    // to return an error_code from which the user has to figure out whether the
+    // string is valid UTF-8, which seems like a lot of work just to handle
+    // the very unlikely case that we have an unsupported implementation. And,
+    // when it does happen (that we have an unsupported implementation), what
+    // are the chances that the programmer has a fallback? Given that *we*
+    // provide the fallback, it implies that the programmer would need a
+    // fallback for our fallback.
+    return false;
+  }
+
+  simdutf_warn_unused result validate_utf8_with_errors(
+      const char *, size_t) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused bool
+  validate_ascii(const char *, size_t) const noexcept final override {
+    return false;
+  }
+
+  simdutf_warn_unused result validate_ascii_with_errors(
+      const char *, size_t) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused bool
+  validate_utf16le(const char16_t *, size_t) const noexcept final override {
+    return false;
+  }
+
+  simdutf_warn_unused bool
+  validate_utf16be(const char16_t *, size_t) const noexcept final override {
+    return false;
+  }
+
+  simdutf_warn_unused result validate_utf16le_with_errors(
+      const char16_t *, size_t) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused result validate_utf16be_with_errors(
+      const char16_t *, size_t) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused bool
+  validate_utf32(const char32_t *, size_t) const noexcept final override {
+    return false;
+  }
+
+  simdutf_warn_unused result validate_utf32_with_errors(
+      const char32_t *, size_t) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused size_t convert_latin1_to_utf8(
+      const char *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_latin1_to_utf16le(
+      const char *, size_t, char16_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_latin1_to_utf16be(
+      const char *, size_t, char16_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_latin1_to_utf32(
+      const char *, size_t, char32_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf8_to_latin1(
+      const char *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+      const char *, size_t, char *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+      const char *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf8_to_utf16le(
+      const char *, size_t, char16_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf8_to_utf16be(
+      const char *, size_t, char16_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+      const char *, size_t, char16_t *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+      const char *, size_t, char16_t *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+      const char *, size_t, char16_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+      const char *, size_t, char16_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf8_to_utf32(
+      const char *, size_t, char32_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+      const char *, size_t, char32_t *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+      const char *, size_t, char32_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf16le_to_latin1(
+      const char16_t *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf16be_to_latin1(
+      const char16_t *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+      const char16_t *, size_t, char *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+      const char16_t *, size_t, char *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(
+      const char16_t *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(
+      const char16_t *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf16le_to_utf8(
+      const char16_t *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf16be_to_utf8(
+      const char16_t *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+      const char16_t *, size_t, char *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+      const char16_t *, size_t, char *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+      const char16_t *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+      const char16_t *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf32_to_latin1(
+      const char32_t *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused result convert_utf32_to_latin1_with_errors(
+      const char32_t *, size_t, char *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf32_to_latin1(
+      const char32_t *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf32_to_utf8(
+      const char32_t *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+      const char32_t *, size_t, char *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+      const char32_t *, size_t, char *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf32_to_utf16le(
+      const char32_t *, size_t, char16_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf32_to_utf16be(
+      const char32_t *, size_t, char16_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+      const char32_t *, size_t, char16_t *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+      const char32_t *, size_t, char16_t *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(
+      const char32_t *, size_t, char16_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(
+      const char32_t *, size_t, char16_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf16le_to_utf32(
+      const char16_t *, size_t, char32_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_utf16be_to_utf32(
+      const char16_t *, size_t, char32_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+      const char16_t *, size_t, char32_t *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+      const char16_t *, size_t, char32_t *) const noexcept final override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(
+      const char16_t *, size_t, char32_t *) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(
+      const char16_t *, size_t, char32_t *) const noexcept final override {
+    return 0;
+  }
+
+  void change_endianness_utf16(const char16_t *, size_t,
+                               char16_t *) const noexcept final override {}
+
+  simdutf_warn_unused size_t
+  count_utf16le(const char16_t *, size_t) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t
+  count_utf16be(const char16_t *, size_t) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t count_utf8(const char *,
+                                        size_t) const noexcept final override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t
+  latin1_length_from_utf8(const char *, size_t) const noexcept override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t
+  latin1_length_from_utf16(size_t) const noexcept override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t
+  latin1_length_from_utf32(size_t) const noexcept override {
+    return 0;
+  }
+  simdutf_warn_unused size_t
+  utf8_length_from_latin1(const char *, size_t) const noexcept override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16le(const char16_t *, size_t) const noexcept override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t
+  utf8_length_from_utf16be(const char16_t *, size_t) const noexcept override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t
+  utf32_length_from_utf16le(const char16_t *, size_t) const noexcept override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t
+  utf32_length_from_utf16be(const char16_t *, size_t) const noexcept override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t
+  utf32_length_from_latin1(size_t) const noexcept override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t
+  utf16_length_from_utf8(const char *, size_t) const noexcept override {
+    return 0;
+  }
+  simdutf_warn_unused size_t
+  utf16_length_from_latin1(size_t) const noexcept override {
+    return 0;
+  }
+  simdutf_warn_unused size_t
+  utf8_length_from_utf32(const char32_t *, size_t) const noexcept override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t
+  utf16_length_from_utf32(const char32_t *, size_t) const noexcept override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t
+  utf32_length_from_utf8(const char *, size_t) const noexcept override {
+    return 0;
+  }
+
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char *, size_t) const noexcept override {
+    return 0;
+  }
+
+  simdutf_warn_unused result
+  base64_to_binary(const char *, size_t, char *, base64_options,
+                   last_chunk_handling_options) const noexcept override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char *, size_t, char *, base64_options,
+      last_chunk_handling_options) const noexcept override {
+    return full_result(error_code::OTHER, 0, 0);
+  }
+
+  simdutf_warn_unused size_t maximal_binary_length_from_base64(
+      const char16_t *, size_t) const noexcept override {
+    return 0;
+  }
+
+  simdutf_warn_unused result
+  base64_to_binary(const char16_t *, size_t, char *, base64_options,
+                   last_chunk_handling_options) const noexcept override {
+    return result(error_code::OTHER, 0);
+  }
+
+  simdutf_warn_unused full_result base64_to_binary_details(
+      const char16_t *, size_t, char *, base64_options,
+      last_chunk_handling_options) const noexcept override {
+    return full_result(error_code::OTHER, 0, 0);
+  }
+
+  simdutf_warn_unused size_t
+  base64_length_from_binary(size_t, base64_options) const noexcept override {
+    return 0;
+  }
+
+  size_t binary_to_base64(const char *, size_t, char *,
+                          base64_options) const noexcept override {
+    return 0;
+  }
+
+  unsupported_implementation()
+      : implementation("unsupported",
+                       "Unsupported CPU (no detected SIMD instructions)", 0) {}
+};
+
+const unsupported_implementation *get_unsupported_singleton() {
+  static const unsupported_implementation unsupported_singleton{};
+  return &unsupported_singleton;
+}
+static_assert(std::is_trivially_destructible<unsupported_implementation>::value,
+              "unsupported_singleton should be trivially destructible");
+
+size_t available_implementation_list::size() const noexcept {
+  return internal::get_available_implementation_pointers().size();
+}
+const implementation *const *
+available_implementation_list::begin() const noexcept {
+  return internal::get_available_implementation_pointers().begin();
+}
+const implementation *const *
+available_implementation_list::end() const noexcept {
+  return internal::get_available_implementation_pointers().end();
+}
+const implementation *
+available_implementation_list::detect_best_supported() const noexcept {
+  // Listed in priority order: return the first one the CPU fully supports.
+  uint32_t supported_instruction_sets =
+      internal::detect_supported_architectures();
+  for (const implementation *impl :
+       internal::get_available_implementation_pointers()) {
+    uint32_t required_instruction_sets = impl->required_instruction_sets();
+    if ((supported_instruction_sets & required_instruction_sets) ==
+        required_instruction_sets) {
+      return impl;
+    }
+  }
+  return get_unsupported_singleton(); // no compiled-in implementation matched
+}
+
+const implementation *
+detect_best_supported_implementation_on_first_use::set_best() const noexcept {
+  // Disable CRT_SECURE warning on MSVC: manually verified this is safe.
+  SIMDUTF_PUSH_DISABLE_WARNINGS
+  SIMDUTF_DISABLE_DEPRECATED_WARNING
+  char *force_implementation_name = getenv("SIMDUTF_FORCE_IMPLEMENTATION");
+  SIMDUTF_POP_DISABLE_WARNINGS
+
+  if (force_implementation_name) {
+    auto force_implementation =
+        get_available_implementations()[force_implementation_name];
+    if (force_implementation) {
+      return get_active_implementation() = force_implementation;
+    } else {
+      // Unknown name. abort() and stderr usage in the library is forbidden.
+      return get_active_implementation() = get_unsupported_singleton();
+    }
+  }
+  return get_active_implementation() =
+             get_available_implementations().detect_best_supported();
+}
+
+} // namespace internal
+
+/**
+ * The list of available implementations compiled into simdutf.
+ */
+SIMDUTF_DLLIMPORTEXPORT const internal::available_implementation_list &
+get_available_implementations() {
+  static const internal::available_implementation_list
+      available_implementations{};
+  return available_implementations;
+}
+
+/**
+ * The active implementation.
+ */
+SIMDUTF_DLLIMPORTEXPORT internal::atomic_ptr<const implementation> &
+get_active_implementation() {
+#if SIMDUTF_SINGLE_IMPLEMENTATION
+  // skip runtime detection
+  static internal::atomic_ptr<const implementation> active_implementation{
+      internal::get_single_implementation()};
+  return active_implementation;
+#else
+  static const internal::detect_best_supported_implementation_on_first_use
+      detect_best_supported_implementation_on_first_use_singleton;
+  static internal::atomic_ptr<const implementation> active_implementation{
+      &detect_best_supported_implementation_on_first_use_singleton};
+  return active_implementation;
+#endif
+}
+
+#if SIMDUTF_SINGLE_IMPLEMENTATION
+const implementation *get_default_implementation() {
+  return internal::get_single_implementation();
+}
+#else
+internal::atomic_ptr<const implementation> &get_default_implementation() {
+  return get_active_implementation();
+}
+#endif
+#define SIMDUTF_GET_CURRENT_IMPLEMENTION
+
+simdutf_warn_unused bool validate_utf8(const char *buf, size_t len) noexcept {
+  return get_default_implementation()->validate_utf8(buf, len);
+}
+simdutf_warn_unused result validate_utf8_with_errors(const char *buf,
+                                                     size_t len) noexcept {
+  return get_default_implementation()->validate_utf8_with_errors(buf, len);
+}
+simdutf_warn_unused bool validate_ascii(const char *buf, size_t len) noexcept {
+  return get_default_implementation()->validate_ascii(buf, len);
+}
+simdutf_warn_unused result validate_ascii_with_errors(const char *buf,
+                                                      size_t len) noexcept {
+  return get_default_implementation()->validate_ascii_with_errors(buf, len);
+}
+simdutf_warn_unused size_t convert_utf8_to_utf16(
+    const char *input, size_t length, char16_t *utf16_output) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_utf8_to_utf16be(input, length, utf16_output);
+#else
+  return convert_utf8_to_utf16le(input, length, utf16_output);
+#endif
+}
+simdutf_warn_unused size_t convert_latin1_to_utf8(const char *buf, size_t len,
+                                                  char *utf8_output) noexcept {
+  return get_default_implementation()->convert_latin1_to_utf8(buf, len,
+                                                              utf8_output);
+}
+simdutf_warn_unused size_t convert_latin1_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) noexcept {
+  return get_default_implementation()->convert_latin1_to_utf16le(buf, len,
+                                                                 utf16_output);
+}
+simdutf_warn_unused size_t convert_latin1_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) noexcept {
+  return get_default_implementation()->convert_latin1_to_utf16be(buf, len,
+                                                                 utf16_output);
+}
+simdutf_warn_unused size_t convert_latin1_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) noexcept {
+  return get_default_implementation()->convert_latin1_to_utf32(buf, len,
+                                                               utf32_output);
+}
+simdutf_warn_unused size_t convert_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_latin1(buf, len,
+                                                              latin1_output);
+}
+simdutf_warn_unused result convert_utf8_to_latin1_with_errors(
+    const char *buf, size_t len, char *latin1_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_latin1_with_errors(
+      buf, len, latin1_output);
+}
+simdutf_warn_unused size_t convert_valid_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) noexcept {
+  return get_default_implementation()->convert_valid_utf8_to_latin1(
+      buf, len, latin1_output);
+}
+simdutf_warn_unused size_t convert_utf8_to_utf16le(
+    const char *input, size_t length, char16_t *utf16_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_utf16le(input, length,
+                                                               utf16_output);
+}
+simdutf_warn_unused size_t convert_utf8_to_utf16be(
+    const char *input, size_t length, char16_t *utf16_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_utf16be(input, length,
+                                                               utf16_output);
+}
+simdutf_warn_unused result convert_utf8_to_utf16_with_errors(
+    const char *input, size_t length, char16_t *utf16_output) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_utf8_to_utf16be_with_errors(input, length, utf16_output);
+#else
+  return convert_utf8_to_utf16le_with_errors(input, length, utf16_output);
+#endif
+}
+simdutf_warn_unused result convert_utf8_to_utf16le_with_errors(
+    const char *input, size_t length, char16_t *utf16_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_utf16le_with_errors(
+      input, length, utf16_output);
+}
+simdutf_warn_unused result convert_utf8_to_utf16be_with_errors(
+    const char *input, size_t length, char16_t *utf16_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_utf16be_with_errors(
+      input, length, utf16_output);
+}
+simdutf_warn_unused size_t convert_utf8_to_utf32(
+    const char *input, size_t length, char32_t *utf32_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_utf32(input, length,
+                                                             utf32_output);
+}
+simdutf_warn_unused result convert_utf8_to_utf32_with_errors(
+    const char *input, size_t length, char32_t *utf32_output) noexcept {
+  return get_default_implementation()->convert_utf8_to_utf32_with_errors(
+      input, length, utf32_output);
+}
+simdutf_warn_unused bool validate_utf16(const char16_t *buf,
+                                        size_t len) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return validate_utf16be(buf, len);
+#else
+  return validate_utf16le(buf, len);
+#endif
+}
+simdutf_warn_unused bool validate_utf16le(const char16_t *buf,
+                                          size_t len) noexcept {
+  return get_default_implementation()->validate_utf16le(buf, len);
+}
+simdutf_warn_unused bool validate_utf16be(const char16_t *buf,
+                                          size_t len) noexcept {
+  return get_default_implementation()->validate_utf16be(buf, len);
+}
+simdutf_warn_unused result validate_utf16_with_errors(const char16_t *buf,
+                                                      size_t len) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return validate_utf16be_with_errors(buf, len);
+#else
+  return validate_utf16le_with_errors(buf, len);
+#endif
+}
+simdutf_warn_unused result validate_utf16le_with_errors(const char16_t *buf,
+                                                        size_t len) noexcept {
+  return get_default_implementation()->validate_utf16le_with_errors(buf, len);
+}
+simdutf_warn_unused result validate_utf16be_with_errors(const char16_t *buf,
+                                                        size_t len) noexcept {
+  return get_default_implementation()->validate_utf16be_with_errors(buf, len);
+}
+simdutf_warn_unused bool validate_utf32(const char32_t *buf,
+                                        size_t len) noexcept {
+  return get_default_implementation()->validate_utf32(buf, len);
+}
+simdutf_warn_unused result validate_utf32_with_errors(const char32_t *buf,
+                                                      size_t len) noexcept {
+  return get_default_implementation()->validate_utf32_with_errors(buf, len);
+}
+simdutf_warn_unused size_t convert_valid_utf8_to_utf16(
+    const char *input, size_t length, char16_t *utf16_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_valid_utf8_to_utf16be(input, length, utf16_buffer);
+#else
+  return convert_valid_utf8_to_utf16le(input, length, utf16_buffer);
+#endif
+}
+simdutf_warn_unused size_t convert_valid_utf8_to_utf16le(
+    const char *input, size_t length, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf8_to_utf16le(
+      input, length, utf16_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf8_to_utf16be(
+    const char *input, size_t length, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf8_to_utf16be(
+      input, length, utf16_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf8_to_utf32(
+    const char *input, size_t length, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf8_to_utf32(
+      input, length, utf32_buffer);
+}
+simdutf_warn_unused size_t convert_utf16_to_utf8(const char16_t *buf,
+                                                 size_t len,
+                                                 char *utf8_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_utf16be_to_utf8(buf, len, utf8_buffer);
+#else
+  return convert_utf16le_to_utf8(buf, len, utf8_buffer);
+#endif
+}
+simdutf_warn_unused size_t convert_utf16_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_utf16be_to_latin1(buf, len, latin1_buffer);
+#else
+  return convert_utf16le_to_latin1(buf, len, latin1_buffer);
+#endif
+}
+simdutf_warn_unused size_t convert_latin1_to_utf16(
+    const char *buf, size_t len, char16_t *utf16_output) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_latin1_to_utf16be(buf, len, utf16_output);
+#else
+  return convert_latin1_to_utf16le(buf, len, utf16_output);
+#endif
+}
+simdutf_warn_unused size_t convert_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+  return get_default_implementation()->convert_utf16be_to_latin1(buf, len,
+                                                                 latin1_buffer);
+}
+simdutf_warn_unused size_t convert_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+  return get_default_implementation()->convert_utf16le_to_latin1(buf, len,
+                                                                 latin1_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf16be_to_latin1(
+      buf, len, latin1_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf16le_to_latin1(
+      buf, len, latin1_buffer);
+}
+simdutf_warn_unused result convert_utf16le_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+  return get_default_implementation()->convert_utf16le_to_latin1_with_errors(
+      buf, len, latin1_buffer);
+}
+simdutf_warn_unused result convert_utf16be_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+  return get_default_implementation()->convert_utf16be_to_latin1_with_errors(
+      buf, len, latin1_buffer);
+}
+simdutf_warn_unused size_t convert_utf16le_to_utf8(const char16_t *buf,
+                                                   size_t len,
+                                                   char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_utf16le_to_utf8(buf, len,
+                                                               utf8_buffer);
+}
+simdutf_warn_unused size_t convert_utf16be_to_utf8(const char16_t *buf,
+                                                   size_t len,
+                                                   char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_utf16be_to_utf8(buf, len,
+                                                               utf8_buffer);
+}
+simdutf_warn_unused result convert_utf16_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_utf16be_to_utf8_with_errors(buf, len, utf8_buffer);
+#else
+  return convert_utf16le_to_utf8_with_errors(buf, len, utf8_buffer);
+#endif
+}
+simdutf_warn_unused result convert_utf16_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_utf16be_to_latin1_with_errors(buf, len, latin1_buffer);
+#else
+  return convert_utf16le_to_latin1_with_errors(buf, len, latin1_buffer);
+#endif
+}
+simdutf_warn_unused result convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_utf16le_to_utf8_with_errors(
+      buf, len, utf8_buffer);
+}
+simdutf_warn_unused result convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_utf16be_to_utf8_with_errors(
+      buf, len, utf8_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf16_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_valid_utf16be_to_utf8(buf, len, utf8_buffer);
+#else
+  return convert_valid_utf16le_to_utf8(buf, len, utf8_buffer);
+#endif
+}
+simdutf_warn_unused size_t convert_valid_utf16_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_valid_utf16be_to_latin1(buf, len, latin1_buffer);
+#else
+  return convert_valid_utf16le_to_latin1(buf, len, latin1_buffer);
+#endif
+}
+simdutf_warn_unused size_t convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf16le_to_utf8(
+      buf, len, utf8_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf16be_to_utf8(
+      buf, len, utf8_buffer);
+}
+simdutf_warn_unused size_t convert_utf32_to_utf8(const char32_t *buf,
+                                                 size_t len,
+                                                 char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_utf32_to_utf8(buf, len,
+                                                             utf8_buffer);
+}
+simdutf_warn_unused result convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_utf32_to_utf8_with_errors(
+      buf, len, utf8_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf32_to_utf8(buf, len,
+                                                                   utf8_buffer);
+}
+simdutf_warn_unused size_t convert_utf32_to_utf16(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_utf32_to_utf16be(buf, len, utf16_buffer);
+#else
+  return convert_utf32_to_utf16le(buf, len, utf16_buffer);
+#endif
+}
+simdutf_warn_unused size_t convert_utf32_to_latin1(
+    const char32_t *input, size_t length, char *latin1_output) noexcept {
+  return get_default_implementation()->convert_utf32_to_latin1(input, length,
+                                                               latin1_output);
+}
+simdutf_warn_unused size_t convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_utf32_to_utf16le(buf, len,
+                                                                utf16_buffer);
+}
+simdutf_warn_unused size_t convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_utf32_to_utf16be(buf, len,
+                                                                utf16_buffer);
+}
+simdutf_warn_unused result convert_utf32_to_utf16_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_utf32_to_utf16be_with_errors(buf, len, utf16_buffer);
+#else
+  return convert_utf32_to_utf16le_with_errors(buf, len, utf16_buffer);
+#endif
+}
+simdutf_warn_unused result convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_utf32_to_utf16le_with_errors(
+      buf, len, utf16_buffer);
+}
+simdutf_warn_unused result convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_utf32_to_utf16be_with_errors(
+      buf, len, utf16_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf32_to_utf16(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_valid_utf32_to_utf16be(buf, len, utf16_buffer);
+#else
+  return convert_valid_utf32_to_utf16le(buf, len, utf16_buffer);
+#endif
+}
+simdutf_warn_unused size_t convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf32_to_utf16le(
+      buf, len, utf16_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf32_to_utf16be(
+      buf, len, utf16_buffer);
+}
+simdutf_warn_unused size_t convert_utf16_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_utf16be_to_utf32(buf, len, utf32_buffer);
+#else
+  return convert_utf16le_to_utf32(buf, len, utf32_buffer);
+#endif
+}
+simdutf_warn_unused size_t convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_utf16le_to_utf32(buf, len,
+                                                                utf32_buffer);
+}
+simdutf_warn_unused size_t convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_utf16be_to_utf32(buf, len,
+                                                                utf32_buffer);
+}
+simdutf_warn_unused result convert_utf16_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_utf16be_to_utf32_with_errors(buf, len, utf32_buffer);
+#else
+  return convert_utf16le_to_utf32_with_errors(buf, len, utf32_buffer);
+#endif
+}
+simdutf_warn_unused result convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_utf16le_to_utf32_with_errors(
+      buf, len, utf32_buffer);
+}
+simdutf_warn_unused result convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_utf16be_to_utf32_with_errors(
+      buf, len, utf32_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf16_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return convert_valid_utf16be_to_utf32(buf, len, utf32_buffer);
+#else
+  return convert_valid_utf16le_to_utf32(buf, len, utf32_buffer);
+#endif
+}
+simdutf_warn_unused size_t convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf16le_to_utf32(
+      buf, len, utf32_buffer);
+}
+simdutf_warn_unused size_t convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_buffer) noexcept {
+  return get_default_implementation()->convert_valid_utf16be_to_utf32(
+      buf, len, utf32_buffer);
+}
+void change_endianness_utf16(const char16_t *input, size_t length,
+                             char16_t *output) noexcept {
+  get_default_implementation()->change_endianness_utf16(input, length, output);
+}
+simdutf_warn_unused size_t count_utf16(const char16_t *input,
+                                       size_t length) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return count_utf16be(input, length);
+#else
+  return count_utf16le(input, length);
+#endif
+}
+simdutf_warn_unused size_t count_utf16le(const char16_t *input,
+                                         size_t length) noexcept {
+  return get_default_implementation()->count_utf16le(input, length);
+}
+simdutf_warn_unused size_t count_utf16be(const char16_t *input,
+                                         size_t length) noexcept {
+  return get_default_implementation()->count_utf16be(input, length);
+}
+simdutf_warn_unused size_t count_utf8(const char *input,
+                                      size_t length) noexcept {
+  return get_default_implementation()->count_utf8(input, length);
+}
+simdutf_warn_unused size_t latin1_length_from_utf8(const char *buf,
+                                                   size_t len) noexcept {
+  return get_default_implementation()->latin1_length_from_utf8(buf, len);
+}
+simdutf_warn_unused size_t latin1_length_from_utf16(size_t len) noexcept {
+  return get_default_implementation()->latin1_length_from_utf16(len);
+}
+simdutf_warn_unused size_t latin1_length_from_utf32(size_t len) noexcept {
+  return get_default_implementation()->latin1_length_from_utf32(len);
+}
+simdutf_warn_unused size_t utf8_length_from_latin1(const char *buf,
+                                                   size_t len) noexcept {
+  return get_default_implementation()->utf8_length_from_latin1(buf, len);
+}
+simdutf_warn_unused size_t utf8_length_from_utf16(const char16_t *input,
+                                                  size_t length) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return utf8_length_from_utf16be(input, length);
+#else
+  return utf8_length_from_utf16le(input, length);
+#endif
+}
+simdutf_warn_unused size_t utf8_length_from_utf16le(const char16_t *input,
+                                                    size_t length) noexcept {
+  return get_default_implementation()->utf8_length_from_utf16le(input, length);
+}
+simdutf_warn_unused size_t utf8_length_from_utf16be(const char16_t *input,
+                                                    size_t length) noexcept {
+  return get_default_implementation()->utf8_length_from_utf16be(input, length);
+}
+simdutf_warn_unused size_t utf32_length_from_utf16(const char16_t *input,
+                                                   size_t length) noexcept {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return utf32_length_from_utf16be(input, length);
+#else
+  return utf32_length_from_utf16le(input, length);
+#endif
+}
+simdutf_warn_unused size_t utf32_length_from_utf16le(const char16_t *input,
+                                                     size_t length) noexcept {
+  return get_default_implementation()->utf32_length_from_utf16le(input, length);
+}
+simdutf_warn_unused size_t utf32_length_from_utf16be(const char16_t *input,
+                                                     size_t length) noexcept {
+  return get_default_implementation()->utf32_length_from_utf16be(input, length);
+}
+simdutf_warn_unused size_t utf16_length_from_utf8(const char *input,
+                                                  size_t length) noexcept {
+  return get_default_implementation()->utf16_length_from_utf8(input, length);
+}
+simdutf_warn_unused size_t utf16_length_from_latin1(size_t length) noexcept {
+  return get_default_implementation()->utf16_length_from_latin1(length);
+}
+simdutf_warn_unused size_t utf8_length_from_utf32(const char32_t *input,
+                                                  size_t length) noexcept {
+  return get_default_implementation()->utf8_length_from_utf32(input, length);
+}
+simdutf_warn_unused size_t utf16_length_from_utf32(const char32_t *input,
+                                                   size_t length) noexcept {
+  return get_default_implementation()->utf16_length_from_utf32(input, length);
+}
+simdutf_warn_unused size_t utf32_length_from_utf8(const char *input,
+                                                  size_t length) noexcept {
+  return get_default_implementation()->utf32_length_from_utf8(input, length);
+}
+
+simdutf_warn_unused size_t
+maximal_binary_length_from_base64(const char *input, size_t length) noexcept {
+  return get_default_implementation()->maximal_binary_length_from_base64(
+      input, length);
+}
+
+simdutf_warn_unused result base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_handling_options) noexcept {
+  return get_default_implementation()->base64_to_binary(
+      input, length, output, options, last_chunk_handling_options);
+}
+
+simdutf_warn_unused size_t maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) noexcept {
+  return get_default_implementation()->maximal_binary_length_from_base64(
+      input, length);
+}
+
+simdutf_warn_unused result base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_handling_options) noexcept {
+  return get_default_implementation()->base64_to_binary(
+      input, length, output, options, last_chunk_handling_options);
+}
+
+template <typename chartype>
+simdutf_warn_unused result base64_to_binary_safe_impl(
+    const chartype *input, size_t length, char *output, size_t &outlen,
+    base64_options options,
+    last_chunk_handling_options last_chunk_handling_options) noexcept {
+  static_assert(std::is_same<chartype, char>::value ||
+                    std::is_same<chartype, char16_t>::value,
+                "Only char and char16_t are supported.");
+  // The implementation could be nicer, but we expect that most times, the user
+  // will provide us with a buffer that is large enough.
+  size_t max_length = maximal_binary_length_from_base64(input, length);
+  if (outlen >= max_length) {
+    // fast path
+    full_result r = get_default_implementation()->base64_to_binary_details(
+        input, length, output, options, last_chunk_handling_options);
+    if (r.error != error_code::INVALID_BASE64_CHARACTER &&
+        r.error != error_code::BASE64_EXTRA_BITS) {
+      outlen = r.output_count;
+      if (last_chunk_handling_options == stop_before_partial) {
+        if ((r.output_count % 3) != 0) {
+          bool empty_trail = true;
+          for (size_t i = r.input_count; i < length; i++) {
+            if (!scalar::base64::is_ascii_white_space_or_padding(input[i])) {
+              empty_trail = false;
+              break;
+            }
+          }
+          if (empty_trail) {
+            r.input_count = length;
+          }
+        }
+        return {r.error, r.input_count};
+      }
+      return {r.error, length};
+    }
+    return r;
+  }
+  // The output buffer may be too small, so we decode a truncated version of
+  // the input.
+  size_t outlen3 = outlen / 3 * 3; // round down to multiple of 3
+  size_t safe_input = base64_length_from_binary(outlen3, options);
+  full_result r = get_default_implementation()->base64_to_binary_details(
+      input, safe_input, output, options, loose);
+  if (r.error == error_code::INVALID_BASE64_CHARACTER) {
+    return r;
+  }
+  size_t offset =
+      (r.error == error_code::BASE64_INPUT_REMAINDER)
+          ? 1
+          : ((r.output_count % 3) == 0 ? 0 : (r.output_count % 3) + 1);
+  size_t output_index = r.output_count - (r.output_count % 3);
+  size_t input_index = safe_input;
+  // offset is a value that is no larger than 3. We backtrack
+  // by up to offset characters + an undetermined number of
+  // white space characters. It is expected that the next loop
+  // runs at most 3 times + the number of white space characters
+  // in between them, so we are not worried about performance.
+  while (offset > 0 && input_index > 0) {
+    chartype c = input[--input_index];
+    if (scalar::base64::is_ascii_white_space(c)) {
+      // skipping
+    } else {
+      offset--;
+    }
+  }
+  size_t remaining_out = outlen - output_index;
+  const chartype *tail_input = input + input_index;
+  size_t tail_length = length - input_index;
+  while (tail_length > 0 &&
+         scalar::base64::is_ascii_white_space(tail_input[tail_length - 1])) {
+    tail_length--;
+  }
+  size_t padding_characters = 0;
+  if (tail_length > 0 && tail_input[tail_length - 1] == '=') {
+    tail_length--;
+    padding_characters++;
+    while (tail_length > 0 &&
+           scalar::base64::is_ascii_white_space(tail_input[tail_length - 1])) {
+      tail_length--;
+    }
+    if (tail_length > 0 && tail_input[tail_length - 1] == '=') {
+      tail_length--;
+      padding_characters++;
+    }
+  }
+  // this will advance tail_input and tail_length
+  result rr = scalar::base64::base64_tail_decode_safe(
+      output + output_index, remaining_out, tail_input, tail_length,
+      padding_characters, options, last_chunk_handling_options);
+  outlen = output_index + remaining_out;
+  if (last_chunk_handling_options != stop_before_partial &&
+      rr.error == error_code::SUCCESS && padding_characters > 0) {
+    // additional checks
+    if ((outlen % 3 == 0) || ((outlen % 3) + 1 + padding_characters != 4)) {
+      rr.error = error_code::INVALID_BASE64_CHARACTER;
+    }
+  }
+  if (rr.error == error_code::SUCCESS &&
+      last_chunk_handling_options == stop_before_partial) {
+    if (tail_input > input + input_index) {
+      rr.count = tail_input - input;
+    } else if (r.input_count > 0) {
+      rr.count = r.input_count + rr.count;
+    }
+    return rr;
+  }
+  rr.count += input_index;
+  return rr;
+}
+
+simdutf_warn_unused size_t convert_latin1_to_utf8_safe(
+    const char *buf, size_t len, char *utf8_output, size_t utf8_len) noexcept {
+  const auto start{utf8_output};
+
+  while (true) {
+    // convert_latin1_to_utf8 will never write more than input length * 2
+    auto read_len = std::min(len, utf8_len >> 1);
+    if (read_len <= 16) {
+      break;
+    }
+
+    const auto write_len =
+        simdutf::convert_latin1_to_utf8(buf, read_len, utf8_output);
+
+    utf8_output += write_len;
+    utf8_len -= write_len;
+    buf += read_len;
+    len -= read_len;
+  }
+
+  utf8_output +=
+      scalar::latin1_to_utf8::convert_safe(buf, len, utf8_output, utf8_len);
+
+  return utf8_output - start;
+}
+
+simdutf_warn_unused result base64_to_binary_safe(
+    const char *input, size_t length, char *output, size_t &outlen,
+    base64_options options,
+    last_chunk_handling_options last_chunk_handling_options) noexcept {
+  return base64_to_binary_safe_impl<char>(input, length, output, outlen,
+                                          options, last_chunk_handling_options);
+}
+simdutf_warn_unused result base64_to_binary_safe(
+    const char16_t *input, size_t length, char *output, size_t &outlen,
+    base64_options options,
+    last_chunk_handling_options last_chunk_handling_options) noexcept {
+  return base64_to_binary_safe_impl<char16_t>(
+      input, length, output, outlen, options, last_chunk_handling_options);
+}
+
+simdutf_warn_unused size_t
+base64_length_from_binary(size_t length, base64_options options) noexcept {
+  return get_default_implementation()->base64_length_from_binary(length,
+                                                                 options);
+}
+
+size_t binary_to_base64(const char *input, size_t length, char *output,
+                        base64_options options) noexcept {
+  return get_default_implementation()->binary_to_base64(input, length, output,
+                                                        options);
+}
+
+simdutf_warn_unused simdutf::encoding_type
+autodetect_encoding(const char *buf, size_t length) noexcept {
+  return get_default_implementation()->autodetect_encoding(buf, length);
+}
+simdutf_warn_unused int detect_encodings(const char *buf,
+                                         size_t length) noexcept {
+  return get_default_implementation()->detect_encodings(buf, length);
+}
+const implementation *builtin_implementation() {
+  static const implementation *builtin_impl =
+      get_available_implementations()[SIMDUTF_STRINGIFY(
+          SIMDUTF_BUILTIN_IMPLEMENTATION)];
+  return builtin_impl;
+}
+
+simdutf_warn_unused size_t trim_partial_utf8(const char *input, size_t length) {
+  return scalar::utf8::trim_partial_utf8(input, length);
+}
+
+simdutf_warn_unused size_t trim_partial_utf16be(const char16_t *input,
+                                                size_t length) {
+  return scalar::utf16::trim_partial_utf16<BIG>(input, length);
+}
+
+simdutf_warn_unused size_t trim_partial_utf16le(const char16_t *input,
+                                                size_t length) {
+  return scalar::utf16::trim_partial_utf16<LITTLE>(input, length);
+}
+
+simdutf_warn_unused size_t trim_partial_utf16(const char16_t *input,
+                                              size_t length) {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return trim_partial_utf16be(input, length);
+#else
+  return trim_partial_utf16le(input, length);
+#endif
+}
+
+} // namespace simdutf
+/* end file src/implementation.cpp */
+/* begin file src/encoding_types.cpp */
+
+namespace simdutf {
+bool match_system(endianness e) {
+#if SIMDUTF_IS_BIG_ENDIAN
+  return e == endianness::BIG;
+#else
+  return e == endianness::LITTLE;
+#endif
+}
+
+std::string to_string(encoding_type bom) {
+  switch (bom) {
+  case UTF16_LE:
+    return "UTF16 little-endian";
+  case UTF16_BE:
+    return "UTF16 big-endian";
+  case UTF32_LE:
+    return "UTF32 little-endian";
+  case UTF32_BE:
+    return "UTF32 big-endian";
+  case UTF8:
+    return "UTF8";
+  case unspecified:
+    return "unknown";
+  default:
+    return "error";
+  }
+}
+
+namespace BOM {
+// Note that BOM for UTF8 is discouraged.
+encoding_type check_bom(const uint8_t *byte, size_t length) {
+  if (length >= 2 && byte[0] == 0xff and byte[1] == 0xfe) {
+    if (length >= 4 && byte[2] == 0x00 and byte[3] == 0x0) {
+      return encoding_type::UTF32_LE;
+    } else {
+      return encoding_type::UTF16_LE;
+    }
+  } else if (length >= 2 && byte[0] == 0xfe and byte[1] == 0xff) {
+    return encoding_type::UTF16_BE;
+  } else if (length >= 4 && byte[0] == 0x00 and byte[1] == 0x00 and
+             byte[2] == 0xfe and byte[3] == 0xff) {
+    return encoding_type::UTF32_BE;
+  } else if (length >= 3 && byte[0] == 0xef and byte[1] == 0xbb and
+             byte[2] == 0xbf) {
+    return encoding_type::UTF8;
+  }
+  return encoding_type::unspecified;
+}
+
+encoding_type check_bom(const char *byte, size_t length) {
+  return check_bom(reinterpret_cast<const uint8_t *>(byte), length);
+}
+
+size_t bom_byte_size(encoding_type bom) {
+  switch (bom) {
+  case UTF16_LE:
+    return 2;
+  case UTF16_BE:
+    return 2;
+  case UTF32_LE:
+    return 4;
+  case UTF32_BE:
+    return 4;
+  case UTF8:
+    return 3;
+  case unspecified:
+    return 0;
+  default:
+    return 0;
+  }
+}
+
+} // namespace BOM
+} // namespace simdutf
+/* end file src/encoding_types.cpp */
+/* begin file src/error.cpp */
+namespace simdutf {
+// deliberately empty
+}
+/* end file src/error.cpp */
+// The large tables should be included once and they
+// should not depend on a kernel.
+/* begin file src/tables/utf8_to_utf16_tables.h */
+#ifndef SIMDUTF_UTF8_TO_UTF16_TABLES_H
+#define SIMDUTF_UTF8_TO_UTF16_TABLES_H
+#include <cstdint>
+
+namespace simdutf {
+namespace {
+namespace tables {
+namespace utf8_to_utf16 {
+/**
+ * utf8bigindex uses about 8 kB
+ * shufutf8 uses about 3344 B
+ *
+ * So we use a bit over 11 kB. It would be
+ * easy to save about 4 kB by only
+ * storing the index in utf8bigindex, and
+ * deriving the consumed bytes otherwise.
+ * However, this may come at a significant (10% to 20%)
+ * performance penalty.
+ */
+
+const uint8_t shufutf8[209][16] = {
+    {0, 255, 1, 255, 2, 255, 3, 255, 4, 255, 5, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 3, 255, 4, 255, 6, 5, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 3, 255, 5, 4, 6, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 3, 255, 5, 4, 7, 6, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 4, 3, 5, 255, 6, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 4, 3, 5, 255, 7, 6, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 4, 3, 6, 5, 7, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 2, 255, 4, 3, 6, 5, 8, 7, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 4, 255, 5, 255, 6, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 4, 255, 5, 255, 7, 6, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 4, 255, 6, 5, 7, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 4, 255, 6, 5, 8, 7, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 5, 4, 6, 255, 7, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 5, 4, 6, 255, 8, 7, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 5, 4, 7, 6, 8, 255, 0, 0, 0, 0},
+    {0, 255, 1, 255, 3, 2, 5, 4, 7, 6, 9, 8, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 4, 255, 5, 255, 6, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 4, 255, 5, 255, 7, 6, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 4, 255, 6, 5, 7, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 4, 255, 6, 5, 8, 7, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 5, 4, 6, 255, 7, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 5, 4, 6, 255, 8, 7, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 5, 4, 7, 6, 8, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 3, 255, 5, 4, 7, 6, 9, 8, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 5, 255, 6, 255, 7, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 5, 255, 6, 255, 8, 7, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 5, 255, 7, 6, 8, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 5, 255, 7, 6, 9, 8, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 6, 5, 7, 255, 8, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 6, 5, 7, 255, 9, 8, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 6, 5, 8, 7, 9, 255, 0, 0, 0, 0},
+    {0, 255, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 4, 255, 5, 255, 6, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 4, 255, 5, 255, 7, 6, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 4, 255, 6, 5, 7, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 4, 255, 6, 5, 8, 7, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 5, 4, 6, 255, 7, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 5, 4, 6, 255, 8, 7, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 5, 4, 7, 6, 8, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 3, 255, 5, 4, 7, 6, 9, 8, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 5, 255, 6, 255, 7, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 5, 255, 6, 255, 8, 7, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 5, 255, 7, 6, 8, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 5, 255, 7, 6, 9, 8, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 6, 5, 7, 255, 8, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 6, 5, 7, 255, 9, 8, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 6, 5, 8, 7, 9, 255, 0, 0, 0, 0},
+    {1, 0, 2, 255, 4, 3, 6, 5, 8, 7, 10, 9, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 5, 255, 6, 255, 7, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 5, 255, 6, 255, 8, 7, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 5, 255, 7, 6, 8, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 5, 255, 7, 6, 9, 8, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 6, 5, 7, 255, 8, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 6, 5, 7, 255, 9, 8, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 6, 5, 8, 7, 9, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 4, 255, 6, 5, 8, 7, 10, 9, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 6, 255, 7, 255, 8, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 6, 255, 7, 255, 9, 8, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 6, 255, 8, 7, 9, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 6, 255, 8, 7, 10, 9, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 7, 6, 8, 255, 9, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 7, 6, 8, 255, 10, 9, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 255, 0, 0, 0, 0},
+    {1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 0, 0, 0, 0},
+    {0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 4, 255, 255, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 5, 4, 255, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 6, 5, 4, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 5, 255, 255, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 6, 5, 255, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 7, 6, 5, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 6, 255, 255, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 7, 6, 255, 255},
+    {0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 8, 7, 6, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 5, 255, 255, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 6, 5, 255, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 7, 6, 5, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 6, 255, 255, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 7, 6, 255, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 8, 7, 6, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 7, 255, 255, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 8, 7, 255, 255},
+    {0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 9, 8, 7, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 4, 255, 255, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 5, 4, 255, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 6, 5, 4, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 5, 255, 255, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 6, 5, 255, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 7, 6, 5, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 6, 255, 255, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 7, 6, 255, 255},
+    {1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 8, 7, 6, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 5, 255, 255, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 6, 5, 255, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 7, 6, 5, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 6, 255, 255, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 7, 6, 255, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 8, 7, 6, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 7, 255, 255, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 8, 7, 255, 255},
+    {1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 9, 8, 7, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 6, 255, 255, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 7, 6, 255, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 8, 7, 6, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 7, 255, 255, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 8, 7, 255, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 9, 8, 7, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 8, 255, 255, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 9, 8, 255, 255},
+    {1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 10, 9, 8, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 5, 255, 255, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 6, 5, 255, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 7, 6, 5, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 6, 255, 255, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 7, 6, 255, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 8, 7, 6, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 7, 255, 255, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 8, 7, 255, 255},
+    {2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 9, 8, 7, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 6, 255, 255, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 7, 6, 255, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 8, 7, 6, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 7, 255, 255, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 8, 7, 255, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 9, 8, 7, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 8, 255, 255, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 9, 8, 255, 255},
+    {2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 10, 9, 8, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 7, 255, 255, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 8, 7, 255, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 9, 8, 7, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 8, 255, 255, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 9, 8, 255, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 10, 9, 8, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 9, 255, 255, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 10, 9, 255, 255},
+    {2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 11, 10, 9, 255},
+    {0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 1, 255, 255, 255, 3, 2, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 1, 255, 255, 255, 4, 3, 2, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 1, 255, 255, 255, 5, 4, 3, 2, 0, 0, 0, 0},
+    {0, 255, 255, 255, 2, 1, 255, 255, 3, 255, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 2, 1, 255, 255, 4, 3, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 2, 1, 255, 255, 5, 4, 3, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 2, 1, 255, 255, 6, 5, 4, 3, 0, 0, 0, 0},
+    {0, 255, 255, 255, 3, 2, 1, 255, 4, 255, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 3, 2, 1, 255, 5, 4, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 3, 2, 1, 255, 6, 5, 4, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 3, 2, 1, 255, 7, 6, 5, 4, 0, 0, 0, 0},
+    {0, 255, 255, 255, 4, 3, 2, 1, 5, 255, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 4, 3, 2, 1, 6, 5, 255, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 4, 3, 2, 1, 7, 6, 5, 255, 0, 0, 0, 0},
+    {0, 255, 255, 255, 4, 3, 2, 1, 8, 7, 6, 5, 0, 0, 0, 0},
+    {1, 0, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 2, 255, 255, 255, 4, 3, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 2, 255, 255, 255, 5, 4, 3, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 2, 255, 255, 255, 6, 5, 4, 3, 0, 0, 0, 0},
+    {1, 0, 255, 255, 3, 2, 255, 255, 4, 255, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 3, 2, 255, 255, 5, 4, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 3, 2, 255, 255, 6, 5, 4, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 3, 2, 255, 255, 7, 6, 5, 4, 0, 0, 0, 0},
+    {1, 0, 255, 255, 4, 3, 2, 255, 5, 255, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 4, 3, 2, 255, 6, 5, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 4, 3, 2, 255, 7, 6, 5, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 4, 3, 2, 255, 8, 7, 6, 5, 0, 0, 0, 0},
+    {1, 0, 255, 255, 5, 4, 3, 2, 6, 255, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 5, 4, 3, 2, 7, 6, 255, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 5, 4, 3, 2, 8, 7, 6, 255, 0, 0, 0, 0},
+    {1, 0, 255, 255, 5, 4, 3, 2, 9, 8, 7, 6, 0, 0, 0, 0},
+    {2, 1, 0, 255, 3, 255, 255, 255, 4, 255, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 3, 255, 255, 255, 5, 4, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 3, 255, 255, 255, 6, 5, 4, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 3, 255, 255, 255, 7, 6, 5, 4, 0, 0, 0, 0},
+    {2, 1, 0, 255, 4, 3, 255, 255, 5, 255, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 4, 3, 255, 255, 6, 5, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 4, 3, 255, 255, 7, 6, 5, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 4, 3, 255, 255, 8, 7, 6, 5, 0, 0, 0, 0},
+    {2, 1, 0, 255, 5, 4, 3, 255, 6, 255, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 5, 4, 3, 255, 7, 6, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 5, 4, 3, 255, 8, 7, 6, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 5, 4, 3, 255, 9, 8, 7, 6, 0, 0, 0, 0},
+    {2, 1, 0, 255, 6, 5, 4, 3, 7, 255, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 6, 5, 4, 3, 8, 7, 255, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 6, 5, 4, 3, 9, 8, 7, 255, 0, 0, 0, 0},
+    {2, 1, 0, 255, 6, 5, 4, 3, 10, 9, 8, 7, 0, 0, 0, 0},
+    {3, 2, 1, 0, 4, 255, 255, 255, 5, 255, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 4, 255, 255, 255, 6, 5, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 4, 255, 255, 255, 7, 6, 5, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 4, 255, 255, 255, 8, 7, 6, 5, 0, 0, 0, 0},
+    {3, 2, 1, 0, 5, 4, 255, 255, 6, 255, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 5, 4, 255, 255, 7, 6, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 5, 4, 255, 255, 8, 7, 6, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 5, 4, 255, 255, 9, 8, 7, 6, 0, 0, 0, 0},
+    {3, 2, 1, 0, 6, 5, 4, 255, 7, 255, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 6, 5, 4, 255, 8, 7, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 6, 5, 4, 255, 9, 8, 7, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 6, 5, 4, 255, 10, 9, 8, 7, 0, 0, 0, 0},
+    {3, 2, 1, 0, 7, 6, 5, 4, 8, 255, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 7, 6, 5, 4, 9, 8, 255, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 7, 6, 5, 4, 10, 9, 8, 255, 0, 0, 0, 0},
+    {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 0, 0, 0, 0}};
+/* number of two bytes : 64 */
+/* number of two + three bytes : 145 */
+/* number of two + three + four bytes : 209 */
+const uint8_t utf8bigindex[4096][2] = {
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},
+    {161, 4},  {64, 4},   {209, 12}, {209, 12}, {209, 12}, {147, 5},  {209, 12},
+    {150, 5},  {162, 5},  {65, 5},   {209, 12}, {153, 5},  {165, 5},  {67, 5},
+    {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {209, 12},
+    {148, 6},  {209, 12}, {151, 6},  {163, 6},  {66, 6},   {209, 12}, {154, 6},
+    {166, 6},  {68, 6},   {178, 6},  {74, 6},   {92, 6},   {64, 4},   {209, 12},
+    {157, 6},  {169, 6},  {70, 6},   {181, 6},  {76, 6},   {94, 6},   {65, 5},
+    {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {152, 7},
+    {164, 7},  {145, 3},  {209, 12}, {155, 7},  {167, 7},  {69, 7},   {179, 7},
+    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {170, 7},  {71, 7},
+    {182, 7},  {77, 7},   {95, 7},   {65, 5},   {194, 7},  {83, 7},   {101, 7},
+    {67, 5},   {119, 7},  {73, 5},   {91, 5},   {1, 7},    {209, 12}, {209, 12},
+    {173, 7},  {148, 6},  {185, 7},  {79, 7},   {97, 7},   {66, 6},   {197, 7},
+    {85, 7},   {103, 7},  {68, 6},   {121, 7},  {74, 6},   {92, 6},   {2, 7},
+    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {76, 6},   {94, 6},
+    {4, 7},    {193, 6},  {82, 6},   {100, 6},  {8, 7},    {118, 6},  {16, 7},
+    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {145, 3},  {209, 12}, {156, 8},  {168, 8},  {146, 4},
+    {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {171, 8},
+    {72, 8},   {183, 8},  {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},
+    {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
+    {209, 12}, {174, 8},  {148, 6},  {186, 8},  {80, 8},   {98, 8},   {66, 6},
+    {198, 8},  {86, 8},   {104, 8},  {68, 6},   {122, 8},  {74, 6},   {92, 6},
+    {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {76, 6},
+    {94, 6},   {5, 8},    {193, 6},  {82, 6},   {100, 6},  {9, 8},    {118, 6},
+    {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},
+    {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
+    {112, 8},  {71, 7},   {130, 8},  {77, 7},   {95, 7},   {6, 8},    {194, 7},
+    {83, 7},   {101, 7},  {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},
+    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},
+    {66, 6},   {197, 7},  {85, 7},   {103, 7},  {12, 8},   {121, 7},  {20, 8},
+    {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
+    {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},
+    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12},
+    {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12},
+    {160, 9},  {172, 9},  {147, 5},  {184, 9},  {150, 5},  {162, 5},  {65, 5},
+    {196, 9},  {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},
+    {64, 4},   {209, 12}, {209, 12}, {175, 9},  {148, 6},  {187, 9},  {81, 9},
+    {99, 9},   {66, 6},   {199, 9},  {87, 9},   {105, 9},  {68, 6},   {123, 9},
+    {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {111, 9},  {70, 6},
+    {129, 9},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},
+    {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {190, 9},  {152, 7},  {164, 7},  {145, 3},  {202, 9},
+    {89, 9},   {107, 9},  {69, 7},   {125, 9},  {75, 7},   {93, 7},   {64, 4},
+    {209, 12}, {158, 7},  {113, 9},  {71, 7},   {131, 9},  {77, 7},   {95, 7},
+    {7, 9},    {194, 7},  {83, 7},   {101, 7},  {11, 9},   {119, 7},  {19, 9},
+    {35, 9},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {137, 9},
+    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},  {13, 9},
+    {121, 7},  {21, 9},   {37, 9},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
+    {70, 6},   {127, 7},  {25, 9},   {41, 9},   {4, 7},    {193, 6},  {82, 6},
+    {49, 9},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
+    {205, 9},  {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},
+    {64, 4},   {209, 12}, {159, 8},  {115, 9},  {72, 8},   {133, 9},  {78, 8},
+    {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},
+    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},
+    {139, 9},  {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {104, 8},
+    {14, 9},   {122, 8},  {22, 9},   {38, 9},   {3, 8},    {209, 12}, {157, 6},
+    {110, 8},  {70, 6},   {128, 8},  {26, 9},   {42, 9},   {5, 8},    {193, 6},
+    {82, 6},   {50, 9},   {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},
+    {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},
+    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},
+    {28, 9},   {44, 9},   {6, 8},    {194, 7},  {83, 7},   {52, 9},   {10, 8},
+    {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
+    {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
+    {56, 9},   {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12},
+    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},
+    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12},
+    {149, 4},  {161, 4},  {64, 4},   {209, 12}, {209, 12}, {209, 12}, {147, 5},
+    {209, 12}, {150, 5},  {162, 5},  {65, 5},   {209, 12}, {153, 5},  {165, 5},
+    {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12},
+    {176, 10}, {148, 6},  {188, 10}, {151, 6},  {163, 6},  {66, 6},   {200, 10},
+    {154, 6},  {166, 6},  {68, 6},   {178, 6},  {74, 6},   {92, 6},   {64, 4},
+    {209, 12}, {157, 6},  {169, 6},  {70, 6},   {181, 6},  {76, 6},   {94, 6},
+    {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},
+    {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {191, 10},
+    {152, 7},  {164, 7},  {145, 3},  {203, 10}, {90, 10},  {108, 10}, {69, 7},
+    {126, 10}, {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {114, 10},
+    {71, 7},   {132, 10}, {77, 7},   {95, 7},   {65, 5},   {194, 7},  {83, 7},
+    {101, 7},  {67, 5},   {119, 7},  {73, 5},   {91, 5},   {1, 7},    {209, 12},
+    {209, 12}, {173, 7},  {148, 6},  {138, 10}, {79, 7},   {97, 7},   {66, 6},
+    {197, 7},  {85, 7},   {103, 7},  {68, 6},   {121, 7},  {74, 6},   {92, 6},
+    {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {76, 6},
+    {94, 6},   {4, 7},    {193, 6},  {82, 6},   {100, 6},  {8, 7},    {118, 6},
+    {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {145, 3},  {206, 10}, {156, 8},  {168, 8},
+    {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},
+    {116, 10}, {72, 8},   {134, 10}, {78, 8},   {96, 8},   {65, 5},   {195, 8},
+    {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},
+    {209, 12}, {209, 12}, {174, 8},  {148, 6},  {140, 10}, {80, 8},   {98, 8},
+    {66, 6},   {198, 8},  {86, 8},   {104, 8},  {15, 10},  {122, 8},  {23, 10},
+    {39, 10},  {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},
+    {27, 10},  {43, 10},  {5, 8},    {193, 6},  {82, 6},   {51, 10},  {9, 8},
+    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},
+    {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12},
+    {158, 7},  {112, 8},  {71, 7},   {130, 8},  {29, 10},  {45, 10},  {6, 8},
+    {194, 7},  {83, 7},   {53, 10},  {10, 8},   {119, 7},  {18, 8},   {34, 8},
+    {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},
+    {97, 7},   {66, 6},   {197, 7},  {85, 7},   {57, 10},  {12, 8},   {121, 7},
+    {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},
+    {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},
+    {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12},
+    {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},
+    {209, 12}, {160, 9},  {172, 9},  {147, 5},  {184, 9},  {150, 5},  {162, 5},
+    {65, 5},   {196, 9},  {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},
+    {91, 5},   {64, 4},   {209, 12}, {209, 12}, {175, 9},  {148, 6},  {142, 10},
+    {81, 9},   {99, 9},   {66, 6},   {199, 9},  {87, 9},   {105, 9},  {68, 6},
+    {123, 9},  {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {111, 9},
+    {70, 6},   {129, 9},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},
+    {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {190, 9},  {152, 7},  {164, 7},  {145, 3},
+    {202, 9},  {89, 9},   {107, 9},  {69, 7},   {125, 9},  {75, 7},   {93, 7},
+    {64, 4},   {209, 12}, {158, 7},  {113, 9},  {71, 7},   {131, 9},  {30, 10},
+    {46, 10},  {7, 9},    {194, 7},  {83, 7},   {54, 10},  {11, 9},   {119, 7},
+    {19, 9},   {35, 9},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},
+    {137, 9},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {58, 10},
+    {13, 9},   {121, 7},  {21, 9},   {37, 9},   {2, 7},    {209, 12}, {157, 6},
+    {109, 7},  {70, 6},   {127, 7},  {25, 9},   {41, 9},   {4, 7},    {193, 6},
+    {82, 6},   {49, 9},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {145, 3},  {205, 9},  {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},
+    {161, 4},  {64, 4},   {209, 12}, {159, 8},  {115, 9},  {72, 8},   {133, 9},
+    {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},
+    {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},
+    {148, 6},  {139, 9},  {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},
+    {60, 10},  {14, 9},   {122, 8},  {22, 9},   {38, 9},   {3, 8},    {209, 12},
+    {157, 6},  {110, 8},  {70, 6},   {128, 8},  {26, 9},   {42, 9},   {5, 8},
+    {193, 6},  {82, 6},   {50, 9},   {9, 8},    {118, 6},  {17, 8},   {33, 8},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},
+    {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},
+    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},
+    {130, 8},  {28, 9},   {44, 9},   {6, 8},    {194, 7},  {83, 7},   {52, 9},
+    {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12},
+    {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},
+    {85, 7},   {56, 9},   {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},
+    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},
+    {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},
+    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},
+    {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12}, {209, 12}, {209, 12},
+    {147, 5},  {209, 12}, {150, 5},  {162, 5},  {65, 5},   {209, 12}, {153, 5},
+    {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
+    {209, 12}, {209, 12}, {148, 6},  {209, 12}, {151, 6},  {163, 6},  {66, 6},
+    {209, 12}, {154, 6},  {166, 6},  {68, 6},   {178, 6},  {74, 6},   {92, 6},
+    {64, 4},   {209, 12}, {157, 6},  {169, 6},  {70, 6},   {181, 6},  {76, 6},
+    {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},
+    {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {192, 11}, {152, 7},  {164, 7},  {145, 3},  {204, 11}, {155, 7},  {167, 7},
+    {69, 7},   {179, 7},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
+    {170, 7},  {71, 7},   {182, 7},  {77, 7},   {95, 7},   {65, 5},   {194, 7},
+    {83, 7},   {101, 7},  {67, 5},   {119, 7},  {73, 5},   {91, 5},   {1, 7},
+    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {185, 7},  {79, 7},   {97, 7},
+    {66, 6},   {197, 7},  {85, 7},   {103, 7},  {68, 6},   {121, 7},  {74, 6},
+    {92, 6},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
+    {76, 6},   {94, 6},   {4, 7},    {193, 6},  {82, 6},   {100, 6},  {8, 7},
+    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {207, 11}, {156, 8},
+    {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12},
+    {159, 8},  {117, 11}, {72, 8},   {135, 11}, {78, 8},   {96, 8},   {65, 5},
+    {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},
+    {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},  {141, 11}, {80, 8},
+    {98, 8},   {66, 6},   {198, 8},  {86, 8},   {104, 8},  {68, 6},   {122, 8},
+    {74, 6},   {92, 6},   {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},
+    {128, 8},  {76, 6},   {94, 6},   {5, 8},    {193, 6},  {82, 6},   {100, 6},
+    {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},
+    {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},
+    {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},  {77, 7},   {95, 7},
+    {6, 8},    {194, 7},  {83, 7},   {101, 7},  {10, 8},   {119, 7},  {18, 8},
+    {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},
+    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},  {12, 8},
+    {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
+    {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},
+    {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
+    {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},
+    {64, 4},   {209, 12}, {160, 9},  {172, 9},  {147, 5},  {184, 9},  {150, 5},
+    {162, 5},  {65, 5},   {196, 9},  {153, 5},  {165, 5},  {67, 5},   {177, 5},
+    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {175, 9},  {148, 6},
+    {143, 11}, {81, 9},   {99, 9},   {66, 6},   {199, 9},  {87, 9},   {105, 9},
+    {68, 6},   {123, 9},  {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},
+    {111, 9},  {70, 6},   {129, 9},  {76, 6},   {94, 6},   {65, 5},   {193, 6},
+    {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {190, 9},  {152, 7},  {164, 7},
+    {145, 3},  {202, 9},  {89, 9},   {107, 9},  {69, 7},   {125, 9},  {75, 7},
+    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {113, 9},  {71, 7},   {131, 9},
+    {31, 11},  {47, 11},  {7, 9},    {194, 7},  {83, 7},   {55, 11},  {11, 9},
+    {119, 7},  {19, 9},   {35, 9},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
+    {148, 6},  {137, 9},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
+    {59, 11},  {13, 9},   {121, 7},  {21, 9},   {37, 9},   {2, 7},    {209, 12},
+    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {25, 9},   {41, 9},   {4, 7},
+    {193, 6},  {82, 6},   {49, 9},   {8, 7},    {118, 6},  {16, 7},   {32, 7},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {145, 3},  {205, 9},  {156, 8},  {168, 8},  {146, 4},  {180, 8},
+    {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {115, 9},  {72, 8},
+    {133, 9},  {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},
+    {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12},
+    {174, 8},  {148, 6},  {139, 9},  {80, 8},   {98, 8},   {66, 6},   {198, 8},
+    {86, 8},   {61, 11},  {14, 9},   {122, 8},  {22, 9},   {38, 9},   {3, 8},
+    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {26, 9},   {42, 9},
+    {5, 8},    {193, 6},  {82, 6},   {50, 9},   {9, 8},    {118, 6},  {17, 8},
+    {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},
+    {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},
+    {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},
+    {71, 7},   {130, 8},  {28, 9},   {44, 9},   {6, 8},    {194, 7},  {83, 7},
+    {52, 9},   {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12},
+    {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},
+    {197, 7},  {85, 7},   {56, 9},   {12, 8},   {121, 7},  {20, 8},   {36, 8},
+    {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},
+    {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},
+    {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12},
+    {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12}, {209, 12},
+    {209, 12}, {147, 5},  {209, 12}, {150, 5},  {162, 5},  {65, 5},   {209, 12},
+    {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},
+    {209, 12}, {209, 12}, {176, 10}, {148, 6},  {188, 10}, {151, 6},  {163, 6},
+    {66, 6},   {200, 10}, {154, 6},  {166, 6},  {68, 6},   {178, 6},  {74, 6},
+    {92, 6},   {64, 4},   {209, 12}, {157, 6},  {169, 6},  {70, 6},   {181, 6},
+    {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},
+    {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {191, 10}, {152, 7},  {164, 7},  {145, 3},  {203, 10}, {90, 10},
+    {108, 10}, {69, 7},   {126, 10}, {75, 7},   {93, 7},   {64, 4},   {209, 12},
+    {158, 7},  {114, 10}, {71, 7},   {132, 10}, {77, 7},   {95, 7},   {65, 5},
+    {194, 7},  {83, 7},   {101, 7},  {67, 5},   {119, 7},  {73, 5},   {91, 5},
+    {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {138, 10}, {79, 7},
+    {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},  {68, 6},   {121, 7},
+    {74, 6},   {92, 6},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},
+    {127, 7},  {76, 6},   {94, 6},   {4, 7},    {193, 6},  {82, 6},   {100, 6},
+    {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {206, 10},
+    {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},
+    {209, 12}, {159, 8},  {116, 10}, {72, 8},   {134, 10}, {78, 8},   {96, 8},
+    {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},
+    {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},  {140, 10},
+    {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {62, 11},  {15, 10},
+    {122, 8},  {23, 10},  {39, 10},  {3, 8},    {209, 12}, {157, 6},  {110, 8},
+    {70, 6},   {128, 8},  {27, 10},  {43, 10},  {5, 8},    {193, 6},  {82, 6},
+    {51, 10},  {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},
+    {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},
+    {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},  {29, 10},
+    {45, 10},  {6, 8},    {194, 7},  {83, 7},   {53, 10},  {10, 8},   {119, 7},
+    {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},
+    {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {57, 10},
+    {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},
+    {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},
+    {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},
+    {161, 4},  {64, 4},   {209, 12}, {160, 9},  {172, 9},  {147, 5},  {184, 9},
+    {150, 5},  {162, 5},  {65, 5},   {196, 9},  {153, 5},  {165, 5},  {67, 5},
+    {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {175, 9},
+    {148, 6},  {142, 10}, {81, 9},   {99, 9},   {66, 6},   {199, 9},  {87, 9},
+    {105, 9},  {68, 6},   {123, 9},  {74, 6},   {92, 6},   {64, 4},   {209, 12},
+    {157, 6},  {111, 9},  {70, 6},   {129, 9},  {76, 6},   {94, 6},   {65, 5},
+    {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {190, 9},  {152, 7},
+    {164, 7},  {145, 3},  {202, 9},  {89, 9},   {107, 9},  {69, 7},   {125, 9},
+    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {113, 9},  {71, 7},
+    {131, 9},  {30, 10},  {46, 10},  {7, 9},    {194, 7},  {83, 7},   {54, 10},
+    {11, 9},   {119, 7},  {19, 9},   {35, 9},   {1, 7},    {209, 12}, {209, 12},
+    {173, 7},  {148, 6},  {137, 9},  {79, 7},   {97, 7},   {66, 6},   {197, 7},
+    {85, 7},   {58, 10},  {13, 9},   {121, 7},  {21, 9},   {37, 9},   {2, 7},
+    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {25, 9},   {41, 9},
+    {4, 7},    {193, 6},  {82, 6},   {49, 9},   {8, 7},    {118, 6},  {16, 7},
+    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {145, 3},  {205, 9},  {156, 8},  {168, 8},  {146, 4},
+    {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {115, 9},
+    {72, 8},   {133, 9},  {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},
+    {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
+    {209, 12}, {174, 8},  {148, 6},  {139, 9},  {80, 8},   {98, 8},   {66, 6},
+    {198, 8},  {86, 8},   {60, 10},  {14, 9},   {122, 8},  {22, 9},   {38, 9},
+    {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {26, 9},
+    {42, 9},   {5, 8},    {193, 6},  {82, 6},   {50, 9},   {9, 8},    {118, 6},
+    {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},
+    {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
+    {112, 8},  {71, 7},   {130, 8},  {28, 9},   {44, 9},   {6, 8},    {194, 7},
+    {83, 7},   {52, 9},   {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},
+    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},
+    {66, 6},   {197, 7},  {85, 7},   {56, 9},   {12, 8},   {121, 7},  {20, 8},
+    {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
+    {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},
+    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12},
+    {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12},
+    {209, 12}, {209, 12}, {147, 5},  {209, 12}, {150, 5},  {162, 5},  {65, 5},
+    {209, 12}, {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},
+    {64, 4},   {209, 12}, {209, 12}, {209, 12}, {148, 6},  {209, 12}, {151, 6},
+    {163, 6},  {66, 6},   {209, 12}, {154, 6},  {166, 6},  {68, 6},   {178, 6},
+    {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {169, 6},  {70, 6},
+    {181, 6},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},
+    {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {152, 7},  {164, 7},  {145, 3},  {209, 12},
+    {155, 7},  {167, 7},  {69, 7},   {179, 7},  {75, 7},   {93, 7},   {64, 4},
+    {209, 12}, {158, 7},  {170, 7},  {71, 7},   {182, 7},  {77, 7},   {95, 7},
+    {65, 5},   {194, 7},  {83, 7},   {101, 7},  {67, 5},   {119, 7},  {73, 5},
+    {91, 5},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {185, 7},
+    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},  {68, 6},
+    {121, 7},  {74, 6},   {92, 6},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
+    {70, 6},   {127, 7},  {76, 6},   {94, 6},   {4, 7},    {193, 6},  {82, 6},
+    {100, 6},  {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
+    {208, 12}, {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},
+    {64, 4},   {209, 12}, {159, 8},  {171, 8},  {72, 8},   {183, 8},  {78, 8},
+    {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},
+    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},
+    {186, 8},  {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {104, 8},
+    {68, 6},   {122, 8},  {74, 6},   {92, 6},   {3, 8},    {209, 12}, {157, 6},
+    {110, 8},  {70, 6},   {128, 8},  {76, 6},   {94, 6},   {5, 8},    {193, 6},
+    {82, 6},   {100, 6},  {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},
+    {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},
+    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},
+    {77, 7},   {95, 7},   {6, 8},    {194, 7},  {83, 7},   {101, 7},  {10, 8},
+    {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
+    {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
+    {103, 7},  {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12},
+    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},
+    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12},
+    {149, 4},  {161, 4},  {64, 4},   {209, 12}, {160, 9},  {172, 9},  {147, 5},
+    {184, 9},  {150, 5},  {162, 5},  {65, 5},   {196, 9},  {153, 5},  {165, 5},
+    {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12},
+    {175, 9},  {148, 6},  {144, 12}, {81, 9},   {99, 9},   {66, 6},   {199, 9},
+    {87, 9},   {105, 9},  {68, 6},   {123, 9},  {74, 6},   {92, 6},   {64, 4},
+    {209, 12}, {157, 6},  {111, 9},  {70, 6},   {129, 9},  {76, 6},   {94, 6},
+    {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},
+    {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {190, 9},
+    {152, 7},  {164, 7},  {145, 3},  {202, 9},  {89, 9},   {107, 9},  {69, 7},
+    {125, 9},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {113, 9},
+    {71, 7},   {131, 9},  {77, 7},   {95, 7},   {7, 9},    {194, 7},  {83, 7},
+    {101, 7},  {11, 9},   {119, 7},  {19, 9},   {35, 9},   {1, 7},    {209, 12},
+    {209, 12}, {173, 7},  {148, 6},  {137, 9},  {79, 7},   {97, 7},   {66, 6},
+    {197, 7},  {85, 7},   {103, 7},  {13, 9},   {121, 7},  {21, 9},   {37, 9},
+    {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {25, 9},
+    {41, 9},   {4, 7},    {193, 6},  {82, 6},   {49, 9},   {8, 7},    {118, 6},
+    {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {145, 3},  {205, 9},  {156, 8},  {168, 8},
+    {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},
+    {115, 9},  {72, 8},   {133, 9},  {78, 8},   {96, 8},   {65, 5},   {195, 8},
+    {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},
+    {209, 12}, {209, 12}, {174, 8},  {148, 6},  {139, 9},  {80, 8},   {98, 8},
+    {66, 6},   {198, 8},  {86, 8},   {104, 8},  {14, 9},   {122, 8},  {22, 9},
+    {38, 9},   {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},
+    {26, 9},   {42, 9},   {5, 8},    {193, 6},  {82, 6},   {50, 9},   {9, 8},
+    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},
+    {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12},
+    {158, 7},  {112, 8},  {71, 7},   {130, 8},  {28, 9},   {44, 9},   {6, 8},
+    {194, 7},  {83, 7},   {52, 9},   {10, 8},   {119, 7},  {18, 8},   {34, 8},
+    {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},
+    {97, 7},   {66, 6},   {197, 7},  {85, 7},   {56, 9},   {12, 8},   {121, 7},
+    {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},
+    {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},
+    {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12},
+    {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},
+    {209, 12}, {209, 12}, {209, 12}, {147, 5},  {209, 12}, {150, 5},  {162, 5},
+    {65, 5},   {209, 12}, {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},
+    {91, 5},   {64, 4},   {209, 12}, {209, 12}, {176, 10}, {148, 6},  {188, 10},
+    {151, 6},  {163, 6},  {66, 6},   {200, 10}, {154, 6},  {166, 6},  {68, 6},
+    {178, 6},  {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {169, 6},
+    {70, 6},   {181, 6},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},
+    {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {191, 10}, {152, 7},  {164, 7},  {145, 3},
+    {203, 10}, {90, 10},  {108, 10}, {69, 7},   {126, 10}, {75, 7},   {93, 7},
+    {64, 4},   {209, 12}, {158, 7},  {114, 10}, {71, 7},   {132, 10}, {77, 7},
+    {95, 7},   {65, 5},   {194, 7},  {83, 7},   {101, 7},  {67, 5},   {119, 7},
+    {73, 5},   {91, 5},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},
+    {138, 10}, {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {103, 7},
+    {68, 6},   {121, 7},  {74, 6},   {92, 6},   {2, 7},    {209, 12}, {157, 6},
+    {109, 7},  {70, 6},   {127, 7},  {76, 6},   {94, 6},   {4, 7},    {193, 6},
+    {82, 6},   {100, 6},  {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {145, 3},  {206, 10}, {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},
+    {161, 4},  {64, 4},   {209, 12}, {159, 8},  {116, 10}, {72, 8},   {134, 10},
+    {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},
+    {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},
+    {148, 6},  {140, 10}, {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},
+    {63, 12},  {15, 10},  {122, 8},  {23, 10},  {39, 10},  {3, 8},    {209, 12},
+    {157, 6},  {110, 8},  {70, 6},   {128, 8},  {27, 10},  {43, 10},  {5, 8},
+    {193, 6},  {82, 6},   {51, 10},  {9, 8},    {118, 6},  {17, 8},   {33, 8},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},
+    {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},
+    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},
+    {130, 8},  {29, 10},  {45, 10},  {6, 8},    {194, 7},  {83, 7},   {53, 10},
+    {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12},
+    {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},
+    {85, 7},   {57, 10},  {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},
+    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},
+    {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},
+    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},
+    {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12}, {160, 9},  {172, 9},
+    {147, 5},  {184, 9},  {150, 5},  {162, 5},  {65, 5},   {196, 9},  {153, 5},
+    {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
+    {209, 12}, {175, 9},  {148, 6},  {142, 10}, {81, 9},   {99, 9},   {66, 6},
+    {199, 9},  {87, 9},   {105, 9},  {68, 6},   {123, 9},  {74, 6},   {92, 6},
+    {64, 4},   {209, 12}, {157, 6},  {111, 9},  {70, 6},   {129, 9},  {76, 6},
+    {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},
+    {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {190, 9},  {152, 7},  {164, 7},  {145, 3},  {202, 9},  {89, 9},   {107, 9},
+    {69, 7},   {125, 9},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
+    {113, 9},  {71, 7},   {131, 9},  {30, 10},  {46, 10},  {7, 9},    {194, 7},
+    {83, 7},   {54, 10},  {11, 9},   {119, 7},  {19, 9},   {35, 9},   {1, 7},
+    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {137, 9},  {79, 7},   {97, 7},
+    {66, 6},   {197, 7},  {85, 7},   {58, 10},  {13, 9},   {121, 7},  {21, 9},
+    {37, 9},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
+    {25, 9},   {41, 9},   {4, 7},    {193, 6},  {82, 6},   {49, 9},   {8, 7},
+    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {205, 9},  {156, 8},
+    {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12},
+    {159, 8},  {115, 9},  {72, 8},   {133, 9},  {78, 8},   {96, 8},   {65, 5},
+    {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},
+    {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},  {139, 9},  {80, 8},
+    {98, 8},   {66, 6},   {198, 8},  {86, 8},   {60, 10},  {14, 9},   {122, 8},
+    {22, 9},   {38, 9},   {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},
+    {128, 8},  {26, 9},   {42, 9},   {5, 8},    {193, 6},  {82, 6},   {50, 9},
+    {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},
+    {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},
+    {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},  {28, 9},   {44, 9},
+    {6, 8},    {194, 7},  {83, 7},   {52, 9},   {10, 8},   {119, 7},  {18, 8},
+    {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},
+    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {56, 9},   {12, 8},
+    {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
+    {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},
+    {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
+    {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},
+    {64, 4},   {209, 12}, {209, 12}, {209, 12}, {147, 5},  {209, 12}, {150, 5},
+    {162, 5},  {65, 5},   {209, 12}, {153, 5},  {165, 5},  {67, 5},   {177, 5},
+    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {209, 12}, {148, 6},
+    {209, 12}, {151, 6},  {163, 6},  {66, 6},   {209, 12}, {154, 6},  {166, 6},
+    {68, 6},   {178, 6},  {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},
+    {169, 6},  {70, 6},   {181, 6},  {76, 6},   {94, 6},   {65, 5},   {193, 6},
+    {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {192, 11}, {152, 7},  {164, 7},
+    {145, 3},  {204, 11}, {155, 7},  {167, 7},  {69, 7},   {179, 7},  {75, 7},
+    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {170, 7},  {71, 7},   {182, 7},
+    {77, 7},   {95, 7},   {65, 5},   {194, 7},  {83, 7},   {101, 7},  {67, 5},
+    {119, 7},  {73, 5},   {91, 5},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
+    {148, 6},  {185, 7},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
+    {103, 7},  {68, 6},   {121, 7},  {74, 6},   {92, 6},   {2, 7},    {209, 12},
+    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {76, 6},   {94, 6},   {4, 7},
+    {193, 6},  {82, 6},   {100, 6},  {8, 7},    {118, 6},  {16, 7},   {32, 7},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {145, 3},  {207, 11}, {156, 8},  {168, 8},  {146, 4},  {180, 8},
+    {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {117, 11}, {72, 8},
+    {135, 11}, {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},
+    {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12},
+    {174, 8},  {148, 6},  {141, 11}, {80, 8},   {98, 8},   {66, 6},   {198, 8},
+    {86, 8},   {104, 8},  {68, 6},   {122, 8},  {74, 6},   {92, 6},   {3, 8},
+    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {76, 6},   {94, 6},
+    {5, 8},    {193, 6},  {82, 6},   {100, 6},  {9, 8},    {118, 6},  {17, 8},
+    {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},
+    {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},
+    {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},
+    {71, 7},   {130, 8},  {77, 7},   {95, 7},   {6, 8},    {194, 7},  {83, 7},
+    {101, 7},  {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12},
+    {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},
+    {197, 7},  {85, 7},   {103, 7},  {12, 8},   {121, 7},  {20, 8},   {36, 8},
+    {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},
+    {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},
+    {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12}, {209, 12},
+    {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12}, {160, 9},
+    {172, 9},  {147, 5},  {184, 9},  {150, 5},  {162, 5},  {65, 5},   {196, 9},
+    {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},   {64, 4},
+    {209, 12}, {209, 12}, {175, 9},  {148, 6},  {143, 11}, {81, 9},   {99, 9},
+    {66, 6},   {199, 9},  {87, 9},   {105, 9},  {68, 6},   {123, 9},  {74, 6},
+    {92, 6},   {64, 4},   {209, 12}, {157, 6},  {111, 9},  {70, 6},   {129, 9},
+    {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},  {67, 5},
+    {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {190, 9},  {152, 7},  {164, 7},  {145, 3},  {202, 9},  {89, 9},
+    {107, 9},  {69, 7},   {125, 9},  {75, 7},   {93, 7},   {64, 4},   {209, 12},
+    {158, 7},  {113, 9},  {71, 7},   {131, 9},  {31, 11},  {47, 11},  {7, 9},
+    {194, 7},  {83, 7},   {55, 11},  {11, 9},   {119, 7},  {19, 9},   {35, 9},
+    {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {137, 9},  {79, 7},
+    {97, 7},   {66, 6},   {197, 7},  {85, 7},   {59, 11},  {13, 9},   {121, 7},
+    {21, 9},   {37, 9},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},
+    {127, 7},  {25, 9},   {41, 9},   {4, 7},    {193, 6},  {82, 6},   {49, 9},
+    {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {205, 9},
+    {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},  {64, 4},
+    {209, 12}, {159, 8},  {115, 9},  {72, 8},   {133, 9},  {78, 8},   {96, 8},
+    {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},  {73, 5},
+    {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},  {139, 9},
+    {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {61, 11},  {14, 9},
+    {122, 8},  {22, 9},   {38, 9},   {3, 8},    {209, 12}, {157, 6},  {110, 8},
+    {70, 6},   {128, 8},  {26, 9},   {42, 9},   {5, 8},    {193, 6},  {82, 6},
+    {50, 9},   {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},  {145, 3},
+    {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},   {93, 7},
+    {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},  {28, 9},
+    {44, 9},   {6, 8},    {194, 7},  {83, 7},   {52, 9},   {10, 8},   {119, 7},
+    {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},
+    {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {56, 9},
+    {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12}, {157, 6},
+    {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},    {193, 6},
+    {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {145, 3},  {209, 12}, {209, 12}, {209, 12}, {146, 4},  {209, 12}, {149, 4},
+    {161, 4},  {64, 4},   {209, 12}, {209, 12}, {209, 12}, {147, 5},  {209, 12},
+    {150, 5},  {162, 5},  {65, 5},   {209, 12}, {153, 5},  {165, 5},  {67, 5},
+    {177, 5},  {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {176, 10},
+    {148, 6},  {188, 10}, {151, 6},  {163, 6},  {66, 6},   {200, 10}, {154, 6},
+    {166, 6},  {68, 6},   {178, 6},  {74, 6},   {92, 6},   {64, 4},   {209, 12},
+    {157, 6},  {169, 6},  {70, 6},   {181, 6},  {76, 6},   {94, 6},   {65, 5},
+    {193, 6},  {82, 6},   {100, 6},  {67, 5},   {118, 6},  {73, 5},   {91, 5},
+    {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {191, 10}, {152, 7},
+    {164, 7},  {145, 3},  {203, 10}, {90, 10},  {108, 10}, {69, 7},   {126, 10},
+    {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},  {114, 10}, {71, 7},
+    {132, 10}, {77, 7},   {95, 7},   {65, 5},   {194, 7},  {83, 7},   {101, 7},
+    {67, 5},   {119, 7},  {73, 5},   {91, 5},   {1, 7},    {209, 12}, {209, 12},
+    {173, 7},  {148, 6},  {138, 10}, {79, 7},   {97, 7},   {66, 6},   {197, 7},
+    {85, 7},   {103, 7},  {68, 6},   {121, 7},  {74, 6},   {92, 6},   {2, 7},
+    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},  {76, 6},   {94, 6},
+    {4, 7},    {193, 6},  {82, 6},   {100, 6},  {8, 7},    {118, 6},  {16, 7},
+    {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {145, 3},  {206, 10}, {156, 8},  {168, 8},  {146, 4},
+    {180, 8},  {149, 4},  {161, 4},  {64, 4},   {209, 12}, {159, 8},  {116, 10},
+    {72, 8},   {134, 10}, {78, 8},   {96, 8},   {65, 5},   {195, 8},  {84, 8},
+    {102, 8},  {67, 5},   {120, 8},  {73, 5},   {91, 5},   {64, 4},   {209, 12},
+    {209, 12}, {174, 8},  {148, 6},  {140, 10}, {80, 8},   {98, 8},   {66, 6},
+    {198, 8},  {86, 8},   {62, 11},  {15, 10},  {122, 8},  {23, 10},  {39, 10},
+    {3, 8},    {209, 12}, {157, 6},  {110, 8},  {70, 6},   {128, 8},  {27, 10},
+    {43, 10},  {5, 8},    {193, 6},  {82, 6},   {51, 10},  {9, 8},    {118, 6},
+    {17, 8},   {33, 8},   {0, 6},    {209, 12}, {209, 12}, {209, 12}, {209, 12},
+    {189, 8},  {152, 7},  {164, 7},  {145, 3},  {201, 8},  {88, 8},   {106, 8},
+    {69, 7},   {124, 8},  {75, 7},   {93, 7},   {64, 4},   {209, 12}, {158, 7},
+    {112, 8},  {71, 7},   {130, 8},  {29, 10},  {45, 10},  {6, 8},    {194, 7},
+    {83, 7},   {53, 10},  {10, 8},   {119, 7},  {18, 8},   {34, 8},   {1, 7},
+    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {136, 8},  {79, 7},   {97, 7},
+    {66, 6},   {197, 7},  {85, 7},   {57, 10},  {12, 8},   {121, 7},  {20, 8},
+    {36, 8},   {2, 7},    {209, 12}, {157, 6},  {109, 7},  {70, 6},   {127, 7},
+    {24, 8},   {40, 8},   {4, 7},    {193, 6},  {82, 6},   {48, 8},   {8, 7},
+    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12}, {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},  {209, 12}, {209, 12},
+    {209, 12}, {146, 4},  {209, 12}, {149, 4},  {161, 4},  {64, 4},   {209, 12},
+    {160, 9},  {172, 9},  {147, 5},  {184, 9},  {150, 5},  {162, 5},  {65, 5},
+    {196, 9},  {153, 5},  {165, 5},  {67, 5},   {177, 5},  {73, 5},   {91, 5},
+    {64, 4},   {209, 12}, {209, 12}, {175, 9},  {148, 6},  {142, 10}, {81, 9},
+    {99, 9},   {66, 6},   {199, 9},  {87, 9},   {105, 9},  {68, 6},   {123, 9},
+    {74, 6},   {92, 6},   {64, 4},   {209, 12}, {157, 6},  {111, 9},  {70, 6},
+    {129, 9},  {76, 6},   {94, 6},   {65, 5},   {193, 6},  {82, 6},   {100, 6},
+    {67, 5},   {118, 6},  {73, 5},   {91, 5},   {0, 6},    {209, 12}, {209, 12},
+    {209, 12}, {209, 12}, {190, 9},  {152, 7},  {164, 7},  {145, 3},  {202, 9},
+    {89, 9},   {107, 9},  {69, 7},   {125, 9},  {75, 7},   {93, 7},   {64, 4},
+    {209, 12}, {158, 7},  {113, 9},  {71, 7},   {131, 9},  {30, 10},  {46, 10},
+    {7, 9},    {194, 7},  {83, 7},   {54, 10},  {11, 9},   {119, 7},  {19, 9},
+    {35, 9},   {1, 7},    {209, 12}, {209, 12}, {173, 7},  {148, 6},  {137, 9},
+    {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},   {58, 10},  {13, 9},
+    {121, 7},  {21, 9},   {37, 9},   {2, 7},    {209, 12}, {157, 6},  {109, 7},
+    {70, 6},   {127, 7},  {25, 9},   {41, 9},   {4, 7},    {193, 6},  {82, 6},
+    {49, 9},   {8, 7},    {118, 6},  {16, 7},   {32, 7},   {0, 6},    {209, 12},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {209, 12}, {145, 3},
+    {205, 9},  {156, 8},  {168, 8},  {146, 4},  {180, 8},  {149, 4},  {161, 4},
+    {64, 4},   {209, 12}, {159, 8},  {115, 9},  {72, 8},   {133, 9},  {78, 8},
+    {96, 8},   {65, 5},   {195, 8},  {84, 8},   {102, 8},  {67, 5},   {120, 8},
+    {73, 5},   {91, 5},   {64, 4},   {209, 12}, {209, 12}, {174, 8},  {148, 6},
+    {139, 9},  {80, 8},   {98, 8},   {66, 6},   {198, 8},  {86, 8},   {60, 10},
+    {14, 9},   {122, 8},  {22, 9},   {38, 9},   {3, 8},    {209, 12}, {157, 6},
+    {110, 8},  {70, 6},   {128, 8},  {26, 9},   {42, 9},   {5, 8},    {193, 6},
+    {82, 6},   {50, 9},   {9, 8},    {118, 6},  {17, 8},   {33, 8},   {0, 6},
+    {209, 12}, {209, 12}, {209, 12}, {209, 12}, {189, 8},  {152, 7},  {164, 7},
+    {145, 3},  {201, 8},  {88, 8},   {106, 8},  {69, 7},   {124, 8},  {75, 7},
+    {93, 7},   {64, 4},   {209, 12}, {158, 7},  {112, 8},  {71, 7},   {130, 8},
+    {28, 9},   {44, 9},   {6, 8},    {194, 7},  {83, 7},   {52, 9},   {10, 8},
+    {119, 7},  {18, 8},   {34, 8},   {1, 7},    {209, 12}, {209, 12}, {173, 7},
+    {148, 6},  {136, 8},  {79, 7},   {97, 7},   {66, 6},   {197, 7},  {85, 7},
+    {56, 9},   {12, 8},   {121, 7},  {20, 8},   {36, 8},   {2, 7},    {209, 12},
+    {157, 6},  {109, 7},  {70, 6},   {127, 7},  {24, 8},   {40, 8},   {4, 7},
+    {193, 6},  {82, 6},   {48, 8},   {8, 7},    {118, 6},  {16, 7},   {32, 7},
+    {0, 6}};
+} // namespace utf8_to_utf16
+} // namespace tables
+} // unnamed namespace
+} // namespace simdutf
+
+#endif // SIMDUTF_UTF8_TO_UTF16_TABLES_H
+/* end file src/tables/utf8_to_utf16_tables.h */
+/* begin file src/tables/utf16_to_utf8_tables.h */
+// file generated by scripts/sse_convert_utf16_to_utf8.py
+#ifndef SIMDUTF_UTF16_TO_UTF8_TABLES_H
+#define SIMDUTF_UTF16_TO_UTF8_TABLES_H
+
+namespace simdutf {
+namespace {
+namespace tables {
+namespace utf16_to_utf8 {
+
+// 1 byte for length, 16 bytes for mask
 const uint8_t pack_1_2_utf8_bytes[256][17] = {
     {16, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14},
     {15, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 0x80},
@@ -10734,14577 +13507,24317 @@ const uint8_t pack_1_2_utf8_bytes[256][17] = {
     {10, 1, 0, 2, 4, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
     {9, 0, 2, 4, 6, 8, 10, 12, 15, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
      0x80},
-    {15, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80},
-    {14, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
-    {14, 1, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
-    {13, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {14, 1, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
-    {13, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {13, 1, 0, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {14, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80},
-    {13, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {13, 1, 0, 3, 2, 5, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 3, 2, 5, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {13, 1, 0, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 2, 5, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 2, 5, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {14, 1, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
-    {13, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {13, 1, 0, 3, 2, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 3, 2, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {13, 1, 0, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 2, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 2, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {13, 1, 0, 3, 2, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 3, 2, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 3, 2, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 3, 2, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 2, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 2, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 2, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 2, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {14, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80},
-    {13, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
-    {13, 1, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {13, 1, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 2, 5, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 2, 5, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {13, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 3, 2, 5, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 3, 2, 5, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 2, 5, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 2, 5, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 2, 5, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 2, 5, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {13, 1, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 3, 2, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 3, 2, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 2, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 2, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 2, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 2, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 3, 2, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 3, 2, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 3, 2, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 3, 2, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 2, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 2, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 1, 0, 2, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {9, 0, 2, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {14, 1, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
-    {13, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {13, 1, 0, 3, 2, 5, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 3, 2, 5, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {13, 1, 0, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 2, 5, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 2, 5, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {13, 1, 0, 3, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 3, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 3, 2, 5, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 3, 2, 5, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 2, 5, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 2, 5, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {13, 1, 0, 3, 2, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 3, 2, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 3, 2, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 3, 2, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 2, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 2, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 2, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 2, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 3, 2, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 3, 2, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 3, 2, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 3, 2, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 2, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 2, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 1, 0, 2, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {9, 0, 2, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+    {15, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80},
+    {14, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {14, 1, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
+    {13, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 4, 7, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 7, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 7, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 7, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 5, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 5, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 5, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 5, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 7, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 7, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 3, 2, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 3, 2, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 7, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 1, 0, 2, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 2, 4, 7, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {14, 1, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80},
+    {13, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 2, 5, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 5, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 5, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 5, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 5, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {13, 1, 0, 3, 2, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 4, 6, 9, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 6, 8, 11, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 3, 2, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 3, 2, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 6, 9, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 1, 0, 2, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 2, 4, 6, 8, 10, 13, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {13, 1, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
+    {12, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 5, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 5, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 2, 5, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 2, 5, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 5, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 5, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 5, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 5, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 3, 2, 5, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 3, 2, 5, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 5, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 5, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 1, 0, 2, 5, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 2, 5, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {12, 1, 0, 3, 2, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
+    {11, 0, 3, 2, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 3, 2, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 3, 2, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {11, 1, 0, 2, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 2, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 1, 0, 2, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 2, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {11, 1, 0, 3, 2, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 3, 2, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 1, 0, 3, 2, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 3, 2, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 1, 0, 2, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 2, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 1, 0, 2, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 0, 2, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80}};
+
+// 1 byte for length, 16 bytes for mask
+const uint8_t pack_1_2_3_utf8_bytes[256][17] = {
+    {12, 2, 3, 1, 6, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80},
+    {9, 6, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {11, 3, 1, 6, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {10, 0, 6, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 2, 3, 1, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {11, 2, 3, 1, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {10, 3, 1, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {10, 2, 3, 1, 4, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {7, 4, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {9, 3, 1, 4, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 0, 4, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 2, 3, 1, 6, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 6, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 6, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 6, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 2, 3, 1, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {8, 2, 3, 1, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 2, 3, 1, 4, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 4, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 4, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 4, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {11, 2, 3, 1, 6, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 6, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {10, 3, 1, 6, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 6, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 2, 3, 1, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {10, 2, 3, 1, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {7, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 3, 1, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 0, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 2, 3, 1, 4, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 4, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 4, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 4, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {10, 2, 3, 1, 6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {7, 6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 3, 1, 6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 0, 6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 2, 3, 1, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {9, 2, 3, 1, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 2, 3, 1, 4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {9, 2, 3, 1, 6, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 6, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 6, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 6, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 2, 3, 1, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {8, 2, 3, 1, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 2, 3, 1, 4, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 4, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 4, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 4, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 2, 3, 1, 6, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {3, 6, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 6, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 6, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 2, 3, 1, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {0, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80, 0x80},
+    {2, 3, 1, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {1, 0, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80, 0x80},
+    {5, 2, 3, 1, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {2, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {4, 3, 1, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 0, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {4, 2, 3, 1, 4, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {1, 4, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80, 0x80},
+    {3, 3, 1, 4, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {2, 0, 4, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {8, 2, 3, 1, 6, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 6, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 6, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 6, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 2, 3, 1, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {2, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {4, 3, 1, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 0, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {7, 2, 3, 1, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 3, 1, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 0, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 2, 3, 1, 4, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {3, 4, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 4, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 4, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 2, 3, 1, 6, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 6, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 3, 1, 6, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 0, 6, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 2, 3, 1, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {1, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80, 0x80},
+    {3, 3, 1, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {2, 0, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 2, 3, 1, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {3, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 2, 3, 1, 4, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {2, 4, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {4, 3, 1, 4, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 0, 4, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {11, 2, 3, 1, 6, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 6, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
      0x80},
-    {13, 1, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80},
-    {12, 0, 3, 2, 5, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 3, 2, 5, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 3, 2, 5, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 2, 5, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 2, 5, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 2, 5, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 2, 5, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 3, 2, 5, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 3, 2, 5, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 3, 2, 5, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 3, 2, 5, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 2, 5, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 2, 5, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 1, 0, 2, 5, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {9, 0, 2, 5, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {12, 1, 0, 3, 2, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80},
-    {11, 0, 3, 2, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 3, 2, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 3, 2, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {11, 1, 0, 2, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 2, 4, 6, 9, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 1, 0, 2, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {9, 0, 2, 4, 6, 8, 11, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+    {10, 3, 1, 6, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {9, 0, 6, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
      0x80},
-    {11, 1, 0, 3, 2, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 3, 2, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 1, 0, 3, 2, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {9, 0, 3, 2, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 1, 0, 2, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {9, 0, 2, 4, 6, 9, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {9, 1, 0, 2, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {8, 0, 2, 4, 6, 8, 10, 12, 14, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80}};
+    {8, 2, 3, 1, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {10, 2, 3, 1, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {7, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 3, 1, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 0, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 2, 3, 1, 4, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 4, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 4, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 4, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 2, 3, 1, 6, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 6, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 6, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 6, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 2, 3, 1, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {2, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80, 0x80},
+    {4, 3, 1, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {3, 0, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {7, 2, 3, 1, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 2, 3, 1, 4, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 4, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 4, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 4, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {10, 2, 3, 1, 6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {7, 6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 3, 1, 6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 0, 6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 2, 3, 1, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {9, 2, 3, 1, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 2, 3, 1, 4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {9, 2, 3, 1, 6, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 6, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 6, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 6, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 2, 3, 1, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {8, 2, 3, 1, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 2, 3, 1, 4, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 4, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 4, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 4, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {10, 2, 3, 1, 6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {7, 6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {9, 3, 1, 6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {8, 0, 6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 2, 3, 1, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {9, 2, 3, 1, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {8, 2, 3, 1, 4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 2, 3, 1, 6, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 6, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 3, 1, 6, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 0, 6, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 2, 3, 1, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {1, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80, 0x80},
+    {3, 3, 1, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {2, 0, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 2, 3, 1, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {3, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 2, 3, 1, 4, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {2, 4, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {4, 3, 1, 4, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 0, 4, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {9, 2, 3, 1, 6, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
+    {6, 6, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 3, 1, 6, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {7, 0, 6, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 2, 3, 1, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {8, 2, 3, 1, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 2, 3, 1, 4, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 4, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {6, 3, 1, 4, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {5, 0, 4, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {8, 2, 3, 1, 6, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 6, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {7, 3, 1, 6, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {6, 0, 6, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 2, 3, 1, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {2, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {4, 3, 1, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {3, 0, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {7, 2, 3, 1, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {4, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 3, 1, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {5, 0, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {6, 2, 3, 1, 4, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80},
+    {3, 4, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80, 0x80},
+    {5, 3, 1, 4, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80},
+    {4, 0, 4, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+     0x80, 0x80}};
+
+} // namespace utf16_to_utf8
+} // namespace tables
+} // unnamed namespace
+} // namespace simdutf
+
+#endif // SIMDUTF_UTF16_TO_UTF8_TABLES_H
+/* end file src/tables/utf16_to_utf8_tables.h */
+// End of tables.
+
+// The scalar routines should be included once.
+/* begin file src/scalar/ascii.h */
+#ifndef SIMDUTF_ASCII_H
+#define SIMDUTF_ASCII_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace ascii {
+#if SIMDUTF_IMPLEMENTATION_FALLBACK
+// Only used by the fallback kernel.
+inline simdutf_warn_unused bool validate(const char *buf, size_t len) noexcept {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  uint64_t pos = 0;
+  // process in blocks of 16 bytes when possible
+  for (; pos + 16 <= len; pos += 16) {
+    uint64_t v1;
+    std::memcpy(&v1, data + pos, sizeof(uint64_t));
+    uint64_t v2;
+    std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+    uint64_t v{v1 | v2};
+    if ((v & 0x8080808080808080) != 0) {
+      return false;
+    }
+  }
+  // process the tail byte-by-byte
+  for (; pos < len; pos++) {
+    if (data[pos] >= 0b10000000) {
+      return false;
+    }
+  }
+  return true;
+}
+#endif
+
+inline simdutf_warn_unused result validate_with_errors(const char *buf,
+                                                       size_t len) noexcept {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  // process in blocks of 16 bytes when possible
+  for (; pos + 16 <= len; pos += 16) {
+    uint64_t v1;
+    std::memcpy(&v1, data + pos, sizeof(uint64_t));
+    uint64_t v2;
+    std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+    uint64_t v{v1 | v2};
+    if ((v & 0x8080808080808080) != 0) {
+      for (; pos < len; pos++) {
+        if (data[pos] >= 0b10000000) {
+          return result(error_code::TOO_LARGE, pos);
+        }
+      }
+    }
+  }
+  // process the tail byte-by-byte
+  for (; pos < len; pos++) {
+    if (data[pos] >= 0b10000000) {
+      return result(error_code::TOO_LARGE, pos);
+    }
+  }
+  return result(error_code::SUCCESS, pos);
+}
+
+} // namespace ascii
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/ascii.h */
+/* begin file src/scalar/latin1.h */
+#ifndef SIMDUTF_LATIN1_H
+#define SIMDUTF_LATIN1_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace latin1 {
+
+inline size_t utf32_length_from_latin1(size_t len) {
+  // We are not BOM aware.
+  return len; // a utf32 unit will always represent 1 latin1 character
+}
+
+inline size_t utf8_length_from_latin1(const char *buf, size_t len) {
+  const uint8_t *c = reinterpret_cast<const uint8_t *>(buf);
+  size_t answer = 0;
+  for (size_t i = 0; i < len; i++) {
+    if ((c[i] >> 7)) {
+      answer++;
+    }
+  }
+  return answer + len;
+}
+
+inline size_t utf16_length_from_latin1(size_t len) { return len; }
+
+} // namespace latin1
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/latin1.h */
+
+/* begin file src/scalar/utf32_to_utf8/valid_utf32_to_utf8.h */
+#ifndef SIMDUTF_VALID_UTF32_TO_UTF8_H
+#define SIMDUTF_VALID_UTF32_TO_UTF8_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf32_to_utf8 {
+
+#if SIMDUTF_IMPLEMENTATION_FALLBACK || SIMDUTF_IMPLEMENTATION_PPC64
+// only used by the fallback and POWER kernels
+inline size_t convert_valid(const char32_t *buf, size_t len,
+                            char *utf8_output) {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+  size_t pos = 0;
+  char *start{utf8_output};
+  while (pos < len) {
+    // try to convert the next block of 2 ASCII characters
+    if (pos + 2 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
+      uint64_t v;
+      ::memcpy(&v, data + pos, sizeof(uint64_t));
+      if ((v & 0xFFFFFF80FFFFFF80) == 0) {
+        *utf8_output++ = char(buf[pos]);
+        *utf8_output++ = char(buf[pos + 1]);
+        pos += 2;
+        continue;
+      }
+    }
+    uint32_t word = data[pos];
+    if ((word & 0xFFFFFF80) == 0) {
+      // will generate one UTF-8 byte
+      *utf8_output++ = char(word);
+      pos++;
+    } else if ((word & 0xFFFFF800) == 0) {
+      // will generate two UTF-8 bytes
+      // we have 0b110XXXXX 0b10XXXXXX
+      *utf8_output++ = char((word >> 6) | 0b11000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    } else if ((word & 0xFFFF0000) == 0) {
+      // will generate three UTF-8 bytes
+      // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    } else {
+      // will generate four UTF-8 bytes
+      // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
+      *utf8_output++ = char((word >> 18) | 0b11110000);
+      *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    }
+  }
+  return utf8_output - start;
+}
+#endif // SIMDUTF_IMPLEMENTATION_FALLBACK || SIMDUTF_IMPLEMENTATION_PPC64
+
+} // namespace utf32_to_utf8
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf32_to_utf8/valid_utf32_to_utf8.h */
+/* begin file src/scalar/utf32_to_utf8/utf32_to_utf8.h */
+#ifndef SIMDUTF_UTF32_TO_UTF8_H
+#define SIMDUTF_UTF32_TO_UTF8_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf32_to_utf8 {
+
+inline size_t convert(const char32_t *buf, size_t len, char *utf8_output) {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+  size_t pos = 0;
+  char *start{utf8_output};
+  while (pos < len) {
+    // try to convert the next block of 2 ASCII characters
+    if (pos + 2 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
+      uint64_t v;
+      ::memcpy(&v, data + pos, sizeof(uint64_t));
+      if ((v & 0xFFFFFF80FFFFFF80) == 0) {
+        *utf8_output++ = char(buf[pos]);
+        *utf8_output++ = char(buf[pos + 1]);
+        pos += 2;
+        continue;
+      }
+    }
+    uint32_t word = data[pos];
+    if ((word & 0xFFFFFF80) == 0) {
+      // will generate one UTF-8 byte
+      *utf8_output++ = char(word);
+      pos++;
+    } else if ((word & 0xFFFFF800) == 0) {
+      // will generate two UTF-8 bytes
+      // we have 0b110XXXXX 0b10XXXXXX
+      *utf8_output++ = char((word >> 6) | 0b11000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    } else if ((word & 0xFFFF0000) == 0) {
+      // will generate three UTF-8 bytes
+      // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
+      if (word >= 0xD800 && word <= 0xDFFF) {
+        return 0;
+      }
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    } else {
+      // will generate four UTF-8 bytes
+      // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
+      if (word > 0x10FFFF) {
+        return 0;
+      }
+      *utf8_output++ = char((word >> 18) | 0b11110000);
+      *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    }
+  }
+  return utf8_output - start;
+}
+
+inline result convert_with_errors(const char32_t *buf, size_t len,
+                                  char *utf8_output) {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+  size_t pos = 0;
+  char *start{utf8_output};
+  while (pos < len) {
+    // try to convert the next block of 2 ASCII characters
+    if (pos + 2 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
+      uint64_t v;
+      ::memcpy(&v, data + pos, sizeof(uint64_t));
+      if ((v & 0xFFFFFF80FFFFFF80) == 0) {
+        *utf8_output++ = char(buf[pos]);
+        *utf8_output++ = char(buf[pos + 1]);
+        pos += 2;
+        continue;
+      }
+    }
+    uint32_t word = data[pos];
+    if ((word & 0xFFFFFF80) == 0) {
+      // will generate one UTF-8 byte
+      *utf8_output++ = char(word);
+      pos++;
+    } else if ((word & 0xFFFFF800) == 0) {
+      // will generate two UTF-8 bytes
+      // we have 0b110XXXXX 0b10XXXXXX
+      *utf8_output++ = char((word >> 6) | 0b11000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    } else if ((word & 0xFFFF0000) == 0) {
+      // will generate three UTF-8 bytes
+      // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
+      if (word >= 0xD800 && word <= 0xDFFF) {
+        return result(error_code::SURROGATE, pos);
+      }
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    } else {
+      // will generate four UTF-8 bytes
+      // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
+      if (word > 0x10FFFF) {
+        return result(error_code::TOO_LARGE, pos);
+      }
+      *utf8_output++ = char((word >> 18) | 0b11110000);
+      *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    }
+  }
+  return result(error_code::SUCCESS, utf8_output - start);
+}
+
+} // namespace utf32_to_utf8
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf32_to_utf8/utf32_to_utf8.h */
+
+/* begin file src/scalar/utf32_to_utf16/valid_utf32_to_utf16.h */
+#ifndef SIMDUTF_VALID_UTF32_TO_UTF16_H
+#define SIMDUTF_VALID_UTF32_TO_UTF16_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf32_to_utf16 {
+
+template <endianness big_endian>
+inline size_t convert_valid(const char32_t *buf, size_t len,
+                            char16_t *utf16_output) {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+  size_t pos = 0;
+  char16_t *start{utf16_output};
+  while (pos < len) {
+    uint32_t word = data[pos];
+    if ((word & 0xFFFF0000) == 0) {
+      // will not generate a surrogate pair
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(utf16::swap_bytes(uint16_t(word)))
+                            : char16_t(word);
+      pos++;
+    } else {
+      // will generate a surrogate pair
+      word -= 0x10000;
+      uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
+      uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
+      if (!match_system(big_endian)) {
+        high_surrogate = utf16::swap_bytes(high_surrogate);
+        low_surrogate = utf16::swap_bytes(low_surrogate);
+      }
+      *utf16_output++ = char16_t(high_surrogate);
+      *utf16_output++ = char16_t(low_surrogate);
+      pos++;
+    }
+  }
+  return utf16_output - start;
+}
+
+} // namespace utf32_to_utf16
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf32_to_utf16/valid_utf32_to_utf16.h */
+/* begin file src/scalar/utf32_to_utf16/utf32_to_utf16.h */
+#ifndef SIMDUTF_UTF32_TO_UTF16_H
+#define SIMDUTF_UTF32_TO_UTF16_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf32_to_utf16 {
+
+template <endianness big_endian>
+inline size_t convert(const char32_t *buf, size_t len, char16_t *utf16_output) {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+  size_t pos = 0;
+  char16_t *start{utf16_output};
+  while (pos < len) {
+    uint32_t word = data[pos];
+    if ((word & 0xFFFF0000) == 0) {
+      if (word >= 0xD800 && word <= 0xDFFF) {
+        return 0;
+      }
+      // will not generate a surrogate pair
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(utf16::swap_bytes(uint16_t(word)))
+                            : char16_t(word);
+    } else {
+      // will generate a surrogate pair
+      if (word > 0x10FFFF) {
+        return 0;
+      }
+      word -= 0x10000;
+      uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
+      uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
+      if (!match_system(big_endian)) {
+        high_surrogate = utf16::swap_bytes(high_surrogate);
+        low_surrogate = utf16::swap_bytes(low_surrogate);
+      }
+      *utf16_output++ = char16_t(high_surrogate);
+      *utf16_output++ = char16_t(low_surrogate);
+    }
+    pos++;
+  }
+  return utf16_output - start;
+}
+
+template <endianness big_endian>
+inline result convert_with_errors(const char32_t *buf, size_t len,
+                                  char16_t *utf16_output) {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+  size_t pos = 0;
+  char16_t *start{utf16_output};
+  while (pos < len) {
+    uint32_t word = data[pos];
+    if ((word & 0xFFFF0000) == 0) {
+      if (word >= 0xD800 && word <= 0xDFFF) {
+        return result(error_code::SURROGATE, pos);
+      }
+      // will not generate a surrogate pair
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(utf16::swap_bytes(uint16_t(word)))
+                            : char16_t(word);
+    } else {
+      // will generate a surrogate pair
+      if (word > 0x10FFFF) {
+        return result(error_code::TOO_LARGE, pos);
+      }
+      word -= 0x10000;
+      uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
+      uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
+      if (!match_system(big_endian)) {
+        high_surrogate = utf16::swap_bytes(high_surrogate);
+        low_surrogate = utf16::swap_bytes(low_surrogate);
+      }
+      *utf16_output++ = char16_t(high_surrogate);
+      *utf16_output++ = char16_t(low_surrogate);
+    }
+    pos++;
+  }
+  return result(error_code::SUCCESS, utf16_output - start);
+}
+
+} // namespace utf32_to_utf16
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf32_to_utf16/utf32_to_utf16.h */
+
+/* begin file src/scalar/utf16_to_utf8/valid_utf16_to_utf8.h */
+#ifndef SIMDUTF_VALID_UTF16_TO_UTF8_H
+#define SIMDUTF_VALID_UTF16_TO_UTF8_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf16_to_utf8 {
+
+template <endianness big_endian>
+inline size_t convert_valid(const char16_t *buf, size_t len,
+                            char *utf8_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  size_t pos = 0;
+  char *start{utf8_output};
+  while (pos < len) {
+    // try to convert the next block of 4 ASCII characters
+    if (pos + 4 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
+      uint64_t v;
+      ::memcpy(&v, data + pos, sizeof(uint64_t));
+      if (!match_system(big_endian)) {
+        v = (v >> 8) | (v << (64 - 8));
+      }
+      if ((v & 0xFF80FF80FF80FF80) == 0) {
+        size_t final_pos = pos + 4;
+        while (pos < final_pos) {
+          *utf8_output++ = !match_system(big_endian)
+                               ? char(utf16::swap_bytes(buf[pos]))
+                               : char(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+
+    uint16_t word =
+        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xFF80) == 0) {
+      // will generate one UTF-8 byte
+      *utf8_output++ = char(word);
+      pos++;
+    } else if ((word & 0xF800) == 0) {
+      // will generate two UTF-8 bytes
+      // we have 0b110XXXXX 0b10XXXXXX
+      *utf8_output++ = char((word >> 6) | 0b11000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    } else if ((word & 0xF800) != 0xD800) {
+      // will generate three UTF-8 bytes
+      // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    } else {
+      // must be a surrogate pair
+      uint16_t diff = uint16_t(word - 0xD800);
+      if (pos + 1 >= len) {
+        return 0;
+      } // minimal bound checking
+      uint16_t next_word = !match_system(big_endian)
+                               ? utf16::swap_bytes(data[pos + 1])
+                               : data[pos + 1];
+      uint16_t diff2 = uint16_t(next_word - 0xDC00);
+      uint32_t value = (diff << 10) + diff2 + 0x10000;
+      // will generate four UTF-8 bytes
+      // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
+      *utf8_output++ = char((value >> 18) | 0b11110000);
+      *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((value & 0b111111) | 0b10000000);
+      pos += 2;
+    }
+  }
+  return utf8_output - start;
+}
+
+} // namespace utf16_to_utf8
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf16_to_utf8/valid_utf16_to_utf8.h */
+/* begin file src/scalar/utf16_to_utf8/utf16_to_utf8.h */
+#ifndef SIMDUTF_UTF16_TO_UTF8_H
+#define SIMDUTF_UTF16_TO_UTF8_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf16_to_utf8 {
+
+template <endianness big_endian>
+inline size_t convert(const char16_t *buf, size_t len, char *utf8_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  size_t pos = 0;
+  char *start{utf8_output};
+  while (pos < len) {
+    // try to convert the next block of 4 ASCII characters (8 bytes)
+    if (pos + 4 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
+      uint64_t v;
+      ::memcpy(&v, data + pos, sizeof(uint64_t));
+      if (!match_system(big_endian)) {
+        v = (v >> 8) | (v << (64 - 8));
+      }
+      if ((v & 0xFF80FF80FF80FF80) == 0) {
+        size_t final_pos = pos + 4;
+        while (pos < final_pos) {
+          *utf8_output++ = !match_system(big_endian)
+                               ? char(utf16::swap_bytes(buf[pos]))
+                               : char(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+    uint16_t word =
+        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xFF80) == 0) {
+      // will generate one UTF-8 byte
+      *utf8_output++ = char(word);
+      pos++;
+    } else if ((word & 0xF800) == 0) {
+      // will generate two UTF-8 bytes
+      // we have 0b110XXXXX 0b10XXXXXX
+      *utf8_output++ = char((word >> 6) | 0b11000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    } else if ((word & 0xF800) != 0xD800) {
+      // will generate three UTF-8 bytes
+      // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    } else {
+      // must be a surrogate pair
+      if (pos + 1 >= len) {
+        return 0;
+      }
+      uint16_t diff = uint16_t(word - 0xD800);
+      if (diff > 0x3FF) {
+        return 0;
+      }
+      uint16_t next_word = !match_system(big_endian)
+                               ? utf16::swap_bytes(data[pos + 1])
+                               : data[pos + 1];
+      uint16_t diff2 = uint16_t(next_word - 0xDC00);
+      if (diff2 > 0x3FF) {
+        return 0;
+      }
+      uint32_t value = (diff << 10) + diff2 + 0x10000;
+      // will generate four UTF-8 bytes
+      // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
+      *utf8_output++ = char((value >> 18) | 0b11110000);
+      *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((value & 0b111111) | 0b10000000);
+      pos += 2;
+    }
+  }
+  return utf8_output - start;
+}
+
+template <endianness big_endian>
+inline result convert_with_errors(const char16_t *buf, size_t len,
+                                  char *utf8_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  size_t pos = 0;
+  char *start{utf8_output};
+  while (pos < len) {
+    // try to convert the next block of 4 ASCII characters (8 bytes)
+    if (pos + 4 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
+      uint64_t v;
+      ::memcpy(&v, data + pos, sizeof(uint64_t));
+      if (!match_system(big_endian))
+        v = (v >> 8) | (v << (64 - 8));
+      if ((v & 0xFF80FF80FF80FF80) == 0) {
+        size_t final_pos = pos + 4;
+        while (pos < final_pos) {
+          *utf8_output++ = !match_system(big_endian)
+                               ? char(utf16::swap_bytes(buf[pos]))
+                               : char(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+    uint16_t word =
+        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xFF80) == 0) {
+      // will generate one UTF-8 byte
+      *utf8_output++ = char(word);
+      pos++;
+    } else if ((word & 0xF800) == 0) {
+      // will generate two UTF-8 bytes
+      // we have 0b110XXXXX 0b10XXXXXX
+      *utf8_output++ = char((word >> 6) | 0b11000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    } else if ((word & 0xF800) != 0xD800) {
+      // will generate three UTF-8 bytes
+      // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+      pos++;
+    } else {
+      // must be a surrogate pair
+      if (pos + 1 >= len) {
+        return result(error_code::SURROGATE, pos);
+      }
+      uint16_t diff = uint16_t(word - 0xD800);
+      if (diff > 0x3FF) {
+        return result(error_code::SURROGATE, pos);
+      }
+      uint16_t next_word = !match_system(big_endian)
+                               ? utf16::swap_bytes(data[pos + 1])
+                               : data[pos + 1];
+      uint16_t diff2 = uint16_t(next_word - 0xDC00);
+      if (diff2 > 0x3FF) {
+        return result(error_code::SURROGATE, pos);
+      }
+      uint32_t value = (diff << 10) + diff2 + 0x10000;
+      // will generate four UTF-8 bytes
+      // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
+      *utf8_output++ = char((value >> 18) | 0b11110000);
+      *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((value & 0b111111) | 0b10000000);
+      pos += 2;
+    }
+  }
+  return result(error_code::SUCCESS, utf8_output - start);
+}
+
+} // namespace utf16_to_utf8
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf16_to_utf8/utf16_to_utf8.h */
+
+/* begin file src/scalar/utf16_to_utf32/valid_utf16_to_utf32.h */
+#ifndef SIMDUTF_VALID_UTF16_TO_UTF32_H
+#define SIMDUTF_VALID_UTF16_TO_UTF32_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf16_to_utf32 {
+
+template <endianness big_endian>
+inline size_t convert_valid(const char16_t *buf, size_t len,
+                            char32_t *utf32_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  size_t pos = 0;
+  char32_t *start{utf32_output};
+  while (pos < len) {
+    uint16_t word =
+        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xF800) != 0xD800) {
+      // No surrogate pair, extend 16-bit word to 32-bit word
+      *utf32_output++ = char32_t(word);
+      pos++;
+    } else {
+      // must be a surrogate pair
+      uint16_t diff = uint16_t(word - 0xD800);
+      if (pos + 1 >= len) {
+        return 0;
+      } // minimal bound checking
+      uint16_t next_word = !match_system(big_endian)
+                               ? utf16::swap_bytes(data[pos + 1])
+                               : data[pos + 1];
+      uint16_t diff2 = uint16_t(next_word - 0xDC00);
+      uint32_t value = (diff << 10) + diff2 + 0x10000;
+      *utf32_output++ = char32_t(value);
+      pos += 2;
+    }
+  }
+  return utf32_output - start;
+}
+
+} // namespace utf16_to_utf32
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf16_to_utf32/valid_utf16_to_utf32.h */
+/* begin file src/scalar/utf16_to_utf32/utf16_to_utf32.h */
+#ifndef SIMDUTF_UTF16_TO_UTF32_H
+#define SIMDUTF_UTF16_TO_UTF32_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf16_to_utf32 {
+
+template <endianness big_endian>
+inline size_t convert(const char16_t *buf, size_t len, char32_t *utf32_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  size_t pos = 0;
+  char32_t *start{utf32_output};
+  while (pos < len) {
+    uint16_t word =
+        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xF800) != 0xD800) {
+      // No surrogate pair, extend 16-bit word to 32-bit word
+      *utf32_output++ = char32_t(word);
+      pos++;
+    } else {
+      // must be a surrogate pair
+      uint16_t diff = uint16_t(word - 0xD800);
+      if (diff > 0x3FF) {
+        return 0;
+      }
+      if (pos + 1 >= len) {
+        return 0;
+      } // minimal bound checking
+      uint16_t next_word = !match_system(big_endian)
+                               ? utf16::swap_bytes(data[pos + 1])
+                               : data[pos + 1];
+      uint16_t diff2 = uint16_t(next_word - 0xDC00);
+      if (diff2 > 0x3FF) {
+        return 0;
+      }
+      uint32_t value = (diff << 10) + diff2 + 0x10000;
+      *utf32_output++ = char32_t(value);
+      pos += 2;
+    }
+  }
+  return utf32_output - start;
+}
+
+template <endianness big_endian>
+inline result convert_with_errors(const char16_t *buf, size_t len,
+                                  char32_t *utf32_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  size_t pos = 0;
+  char32_t *start{utf32_output};
+  while (pos < len) {
+    uint16_t word =
+        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xF800) != 0xD800) {
+      // No surrogate pair, extend 16-bit word to 32-bit word
+      *utf32_output++ = char32_t(word);
+      pos++;
+    } else {
+      // must be a surrogate pair
+      uint16_t diff = uint16_t(word - 0xD800);
+      if (diff > 0x3FF) {
+        return result(error_code::SURROGATE, pos);
+      }
+      if (pos + 1 >= len) {
+        return result(error_code::SURROGATE, pos);
+      } // minimal bound checking
+      uint16_t next_word = !match_system(big_endian)
+                               ? utf16::swap_bytes(data[pos + 1])
+                               : data[pos + 1];
+      uint16_t diff2 = uint16_t(next_word - 0xDC00);
+      if (diff2 > 0x3FF) {
+        return result(error_code::SURROGATE, pos);
+      }
+      uint32_t value = (diff << 10) + diff2 + 0x10000;
+      *utf32_output++ = char32_t(value);
+      pos += 2;
+    }
+  }
+  return result(error_code::SUCCESS, utf32_output - start);
+}
+
+} // namespace utf16_to_utf32
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf16_to_utf32/utf16_to_utf32.h */
+
+/* begin file src/scalar/utf8_to_utf16/valid_utf8_to_utf16.h */
+#ifndef SIMDUTF_VALID_UTF8_TO_UTF16_H
+#define SIMDUTF_VALID_UTF8_TO_UTF16_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf8_to_utf16 {
+
+template <endianness big_endian>
+inline size_t convert_valid(const char *buf, size_t len,
+                            char16_t *utf16_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  char16_t *start{utf16_output};
+  while (pos < len) {
+    // try to convert the next block of 8 ASCII bytes
+    if (pos + 8 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
+      uint64_t v;
+      ::memcpy(&v, data + pos, sizeof(uint64_t));
+      if ((v & 0x8080808080808080) == 0) {
+        size_t final_pos = pos + 8;
+        while (pos < final_pos) {
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(utf16::swap_bytes(buf[pos]))
+                                : char16_t(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+    uint8_t leading_byte = data[pos]; // leading byte
+    if (leading_byte < 0b10000000) {
+      // converting one ASCII byte !!!
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(utf16::swap_bytes(leading_byte))
+                            : char16_t(leading_byte);
+      pos++;
+    } else if ((leading_byte & 0b11100000) == 0b11000000) {
+      // We have a two-byte UTF-8, it should become
+      // a single UTF-16 word.
+      if (pos + 1 >= len) {
+        break;
+      } // minimal bound checking
+      uint16_t code_point = uint16_t(((leading_byte & 0b00011111) << 6) |
+                                     (data[pos + 1] & 0b00111111));
+      if (!match_system(big_endian)) {
+        code_point = utf16::swap_bytes(uint16_t(code_point));
+      }
+      *utf16_output++ = char16_t(code_point);
+      pos += 2;
+    } else if ((leading_byte & 0b11110000) == 0b11100000) {
+      // We have a three-byte UTF-8, it should become
+      // a single UTF-16 word.
+      if (pos + 2 >= len) {
+        break;
+      } // minimal bound checking
+      uint16_t code_point = uint16_t(((leading_byte & 0b00001111) << 12) |
+                                     ((data[pos + 1] & 0b00111111) << 6) |
+                                     (data[pos + 2] & 0b00111111));
+      if (!match_system(big_endian)) {
+        code_point = utf16::swap_bytes(uint16_t(code_point));
+      }
+      *utf16_output++ = char16_t(code_point);
+      pos += 3;
+    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
+      // we have a 4-byte UTF-8 word.
+      if (pos + 3 >= len) {
+        break;
+      } // minimal bound checking
+      uint32_t code_point = ((leading_byte & 0b00000111) << 18) |
+                            ((data[pos + 1] & 0b00111111) << 12) |
+                            ((data[pos + 2] & 0b00111111) << 6) |
+                            (data[pos + 3] & 0b00111111);
+      code_point -= 0x10000;
+      uint16_t high_surrogate = uint16_t(0xD800 + (code_point >> 10));
+      uint16_t low_surrogate = uint16_t(0xDC00 + (code_point & 0x3FF));
+      if (!match_system(big_endian)) {
+        high_surrogate = utf16::swap_bytes(high_surrogate);
+        low_surrogate = utf16::swap_bytes(low_surrogate);
+      }
+      *utf16_output++ = char16_t(high_surrogate);
+      *utf16_output++ = char16_t(low_surrogate);
+      pos += 4;
+    } else {
+      // we may have a continuation but we do not do error checking
+      return 0;
+    }
+  }
+  return utf16_output - start;
+}
+
+} // namespace utf8_to_utf16
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf8_to_utf16/valid_utf8_to_utf16.h */
+/* begin file src/scalar/utf8_to_utf16/utf8_to_utf16.h */
+#ifndef SIMDUTF_UTF8_TO_UTF16_H
+#define SIMDUTF_UTF8_TO_UTF16_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf8_to_utf16 {
+
+template <endianness big_endian>
+inline size_t convert(const char *buf, size_t len, char16_t *utf16_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  char16_t *start{utf16_output};
+  while (pos < len) {
+    // try to convert the next block of 16 ASCII bytes
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
+      uint64_t v1;
+      ::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 | v2};
+      if ((v & 0x8080808080808080) == 0) {
+        size_t final_pos = pos + 16;
+        while (pos < final_pos) {
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(utf16::swap_bytes(buf[pos]))
+                                : char16_t(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+
+    uint8_t leading_byte = data[pos]; // leading byte
+    if (leading_byte < 0b10000000) {
+      // converting one ASCII byte !!!
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(utf16::swap_bytes(leading_byte))
+                            : char16_t(leading_byte);
+      pos++;
+    } else if ((leading_byte & 0b11100000) == 0b11000000) {
+      // We have a two-byte UTF-8, it should become
+      // a single UTF-16 word.
+      if (pos + 1 >= len) {
+        return 0;
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      // range check
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
+      if (code_point < 0x80 || 0x7ff < code_point) {
+        return 0;
+      }
+      if (!match_system(big_endian)) {
+        code_point = uint32_t(utf16::swap_bytes(uint16_t(code_point)));
+      }
+      *utf16_output++ = char16_t(code_point);
+      pos += 2;
+    } else if ((leading_byte & 0b11110000) == 0b11100000) {
+      // We have a three-byte UTF-8, it should become
+      // a single UTF-16 word.
+      if (pos + 2 >= len) {
+        return 0;
+      } // minimal bound checking
+
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      // range check
+      uint32_t code_point = (leading_byte & 0b00001111) << 12 |
+                            (data[pos + 1] & 0b00111111) << 6 |
+                            (data[pos + 2] & 0b00111111);
+      if (code_point < 0x800 || 0xffff < code_point ||
+          (0xd7ff < code_point && code_point < 0xe000)) {
+        return 0;
+      }
+      if (!match_system(big_endian)) {
+        code_point = uint32_t(utf16::swap_bytes(uint16_t(code_point)));
+      }
+      *utf16_output++ = char16_t(code_point);
+      pos += 3;
+    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
+      // we have a 4-byte UTF-8 word.
+      if (pos + 3 >= len) {
+        return 0;
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+
+      // range check
+      uint32_t code_point = (leading_byte & 0b00000111) << 18 |
+                            (data[pos + 1] & 0b00111111) << 12 |
+                            (data[pos + 2] & 0b00111111) << 6 |
+                            (data[pos + 3] & 0b00111111);
+      if (code_point <= 0xffff || 0x10ffff < code_point) {
+        return 0;
+      }
+      code_point -= 0x10000;
+      uint16_t high_surrogate = uint16_t(0xD800 + (code_point >> 10));
+      uint16_t low_surrogate = uint16_t(0xDC00 + (code_point & 0x3FF));
+      if (!match_system(big_endian)) {
+        high_surrogate = utf16::swap_bytes(high_surrogate);
+        low_surrogate = utf16::swap_bytes(low_surrogate);
+      }
+      *utf16_output++ = char16_t(high_surrogate);
+      *utf16_output++ = char16_t(low_surrogate);
+      pos += 4;
+    } else {
+      return 0;
+    }
+  }
+  return utf16_output - start;
+}
+
+template <endianness big_endian>
+inline result convert_with_errors(const char *buf, size_t len,
+                                  char16_t *utf16_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  char16_t *start{utf16_output};
+  while (pos < len) {
+    // try to convert the next block of 16 ASCII bytes
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
+      uint64_t v1;
+      ::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 | v2};
+      if ((v & 0x8080808080808080) == 0) {
+        size_t final_pos = pos + 16;
+        while (pos < final_pos) {
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(utf16::swap_bytes(buf[pos]))
+                                : char16_t(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+    uint8_t leading_byte = data[pos]; // leading byte
+    if (leading_byte < 0b10000000) {
+      // converting one ASCII byte !!!
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(utf16::swap_bytes(leading_byte))
+                            : char16_t(leading_byte);
+      pos++;
+    } else if ((leading_byte & 0b11100000) == 0b11000000) {
+      // We have a two-byte UTF-8, it should become
+      // a single UTF-16 word.
+      if (pos + 1 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      // range check
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
+      if (code_point < 0x80 || 0x7ff < code_point) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (!match_system(big_endian)) {
+        code_point = uint32_t(utf16::swap_bytes(uint16_t(code_point)));
+      }
+      *utf16_output++ = char16_t(code_point);
+      pos += 2;
+    } else if ((leading_byte & 0b11110000) == 0b11100000) {
+      // We have a three-byte UTF-8, it should become
+      // a single UTF-16 word.
+      if (pos + 2 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
+
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      // range check
+      uint32_t code_point = (leading_byte & 0b00001111) << 12 |
+                            (data[pos + 1] & 0b00111111) << 6 |
+                            (data[pos + 2] & 0b00111111);
+      if ((code_point < 0x800) || (0xffff < code_point)) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (0xd7ff < code_point && code_point < 0xe000) {
+        return result(error_code::SURROGATE, pos);
+      }
+      if (!match_system(big_endian)) {
+        code_point = uint32_t(utf16::swap_bytes(uint16_t(code_point)));
+      }
+      *utf16_output++ = char16_t(code_point);
+      pos += 3;
+    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
+      // we have a 4-byte UTF-8 word.
+      if (pos + 3 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+
+      // range check
+      uint32_t code_point = (leading_byte & 0b00000111) << 18 |
+                            (data[pos + 1] & 0b00111111) << 12 |
+                            (data[pos + 2] & 0b00111111) << 6 |
+                            (data[pos + 3] & 0b00111111);
+      if (code_point <= 0xffff) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (0x10ffff < code_point) {
+        return result(error_code::TOO_LARGE, pos);
+      }
+      code_point -= 0x10000;
+      uint16_t high_surrogate = uint16_t(0xD800 + (code_point >> 10));
+      uint16_t low_surrogate = uint16_t(0xDC00 + (code_point & 0x3FF));
+      if (!match_system(big_endian)) {
+        high_surrogate = utf16::swap_bytes(high_surrogate);
+        low_surrogate = utf16::swap_bytes(low_surrogate);
+      }
+      *utf16_output++ = char16_t(high_surrogate);
+      *utf16_output++ = char16_t(low_surrogate);
+      pos += 4;
+    } else {
+      // we either have too many continuation bytes or an invalid leading byte
+      if ((leading_byte & 0b11000000) == 0b10000000) {
+        return result(error_code::TOO_LONG, pos);
+      } else {
+        return result(error_code::HEADER_BITS, pos);
+      }
+    }
+  }
+  return result(error_code::SUCCESS, utf16_output - start);
+}
+
+/**
+ * When rewind_and_convert_with_errors is called, we are pointing at 'buf' and
+ * we have up to len input bytes left, and we encountered some error. It is
+ * possible that the error is at 'buf' exactly, but it could also be in the
+ * previous bytes (up to 3 bytes back).
+ *
+ * prior_bytes indicates how many bytes, prior to 'buf', may belong to the
+ * current memory section and can be safely accessed. We use prior_bytes to
+ * safely access up to three bytes before 'buf'.
+ *
+ * The caller is responsible for ensuring that len > 0.
+ *
+ * If the error is believed to have occurred prior to 'buf', the count value
+ * contained in the result will be SIZE_T - 1, SIZE_T - 2, or SIZE_T - 3.
+ */
+template <endianness endian>
+inline result rewind_and_convert_with_errors(size_t prior_bytes,
+                                             const char *buf, size_t len,
+                                             char16_t *utf16_output) {
+  size_t extra_len{0};
+  // We potentially need to go back in time and find a leading byte.
+  // In theory '3' would be sufficient, but sometimes the error can go back
+  // quite far.
+  size_t how_far_back = prior_bytes;
+  // size_t how_far_back = 3; // 3 bytes in the past + current position
+  // if(how_far_back >= prior_bytes) { how_far_back = prior_bytes; }
+  bool found_leading_bytes{false};
+  // important: it is i <= how_far_back and not 'i < how_far_back'.
+  for (size_t i = 0; i <= how_far_back; i++) {
+    unsigned char byte = buf[-static_cast<std::ptrdiff_t>(i)];
+    found_leading_bytes = ((byte & 0b11000000) != 0b10000000);
+    if (found_leading_bytes) {
+      if (i > 0 && byte < 128) {
+        // If we had to go back and the leading byte is ascii
+        // then we can stop right away.
+        return result(error_code::TOO_LONG, 0 - i + 1);
+      }
+      buf -= i;
+      extra_len = i;
+      break;
+    }
+  }
+  //
+  // It is possible for this function to return a negative count in its result.
+  // C++ Standard Section 18.1 defines size_t in <cstddef>, which the C Standard
+  // describes as <stddef.h>. C Standard Section 4.1.5 defines size_t as the
+  // unsigned integral type of the result of the sizeof operator.
+  //
+  // An unsigned type simply wraps around arithmetically (well defined).
+  //
+  if (!found_leading_bytes) {
+    // If how_far_back == 3, we may have four consecutive continuation bytes!!!
+    // [....] [continuation] [continuation] [continuation] | [continuation]
+    // (the last one being at buf). Or we possibly have a stream that does not
+    // start with a leading byte.
+    return result(error_code::TOO_LONG, 0 - how_far_back);
+  }
+  result res = convert_with_errors<endian>(buf, len + extra_len, utf16_output);
+  if (res.error) {
+    res.count -= extra_len;
+  }
+  return res;
+}
+
+} // namespace utf8_to_utf16
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf8_to_utf16/utf8_to_utf16.h */
+
+/* begin file src/scalar/utf8_to_utf32/valid_utf8_to_utf32.h */
+#ifndef SIMDUTF_VALID_UTF8_TO_UTF32_H
+#define SIMDUTF_VALID_UTF8_TO_UTF32_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf8_to_utf32 {
+
+inline size_t convert_valid(const char *buf, size_t len,
+                            char32_t *utf32_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  char32_t *start{utf32_output};
+  while (pos < len) {
+    // try to convert the next block of 8 ASCII bytes
+    if (pos + 8 <=
+        len) { // if it is safe to read 8 more bytes, check that they are ascii
+      uint64_t v;
+      ::memcpy(&v, data + pos, sizeof(uint64_t));
+      if ((v & 0x8080808080808080) == 0) {
+        size_t final_pos = pos + 8;
+        while (pos < final_pos) {
+          *utf32_output++ = char32_t(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+    uint8_t leading_byte = data[pos]; // leading byte
+    if (leading_byte < 0b10000000) {
+      // converting one ASCII byte !!!
+      *utf32_output++ = char32_t(leading_byte);
+      pos++;
+    } else if ((leading_byte & 0b11100000) == 0b11000000) {
+      // We have a two-byte UTF-8
+      if (pos + 1 >= len) {
+        break;
+      } // minimal bound checking
+      *utf32_output++ = char32_t(((leading_byte & 0b00011111) << 6) |
+                                 (data[pos + 1] & 0b00111111));
+      pos += 2;
+    } else if ((leading_byte & 0b11110000) == 0b11100000) {
+      // We have a three-byte UTF-8
+      if (pos + 2 >= len) {
+        break;
+      } // minimal bound checking
+      *utf32_output++ = char32_t(((leading_byte & 0b00001111) << 12) |
+                                 ((data[pos + 1] & 0b00111111) << 6) |
+                                 (data[pos + 2] & 0b00111111));
+      pos += 3;
+    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
+      // we have a 4-byte UTF-8 word.
+      if (pos + 3 >= len) {
+        break;
+      } // minimal bound checking
+      uint32_t code_word = ((leading_byte & 0b00000111) << 18) |
+                           ((data[pos + 1] & 0b00111111) << 12) |
+                           ((data[pos + 2] & 0b00111111) << 6) |
+                           (data[pos + 3] & 0b00111111);
+      *utf32_output++ = char32_t(code_word);
+      pos += 4;
+    } else {
+      // we may have a continuation but we do not do error checking
+      return 0;
+    }
+  }
+  return utf32_output - start;
+}
+
+} // namespace utf8_to_utf32
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf8_to_utf32/valid_utf8_to_utf32.h */
+/* begin file src/scalar/utf8_to_utf32/utf8_to_utf32.h */
+#ifndef SIMDUTF_UTF8_TO_UTF32_H
+#define SIMDUTF_UTF8_TO_UTF32_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf8_to_utf32 {
+
+inline size_t convert(const char *buf, size_t len, char32_t *utf32_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  char32_t *start{utf32_output};
+  while (pos < len) {
+    // try to convert the next block of 16 ASCII bytes
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
+      uint64_t v1;
+      ::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 | v2};
+      if ((v & 0x8080808080808080) == 0) {
+        size_t final_pos = pos + 16;
+        while (pos < final_pos) {
+          *utf32_output++ = char32_t(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+    uint8_t leading_byte = data[pos]; // leading byte
+    if (leading_byte < 0b10000000) {
+      // converting one ASCII byte !!!
+      *utf32_output++ = char32_t(leading_byte);
+      pos++;
+    } else if ((leading_byte & 0b11100000) == 0b11000000) {
+      // We have a two-byte UTF-8
+      if (pos + 1 >= len) {
+        return 0;
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      // range check
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
+      if (code_point < 0x80 || 0x7ff < code_point) {
+        return 0;
+      }
+      *utf32_output++ = char32_t(code_point);
+      pos += 2;
+    } else if ((leading_byte & 0b11110000) == 0b11100000) {
+      // We have a three-byte UTF-8
+      if (pos + 2 >= len) {
+        return 0;
+      } // minimal bound checking
+
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      // range check
+      uint32_t code_point = (leading_byte & 0b00001111) << 12 |
+                            (data[pos + 1] & 0b00111111) << 6 |
+                            (data[pos + 2] & 0b00111111);
+      if (code_point < 0x800 || 0xffff < code_point ||
+          (0xd7ff < code_point && code_point < 0xe000)) {
+        return 0;
+      }
+      *utf32_output++ = char32_t(code_point);
+      pos += 3;
+    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
+      // we have a 4-byte UTF-8 word.
+      if (pos + 3 >= len) {
+        return 0;
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
+        return 0;
+      }
+
+      // range check
+      uint32_t code_point = (leading_byte & 0b00000111) << 18 |
+                            (data[pos + 1] & 0b00111111) << 12 |
+                            (data[pos + 2] & 0b00111111) << 6 |
+                            (data[pos + 3] & 0b00111111);
+      if (code_point <= 0xffff || 0x10ffff < code_point) {
+        return 0;
+      }
+      *utf32_output++ = char32_t(code_point);
+      pos += 4;
+    } else {
+      return 0;
+    }
+  }
+  return utf32_output - start;
+}
+
+inline result convert_with_errors(const char *buf, size_t len,
+                                  char32_t *utf32_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  char32_t *start{utf32_output};
+  while (pos < len) {
+    // try to convert the next block of 16 ASCII bytes
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
+      uint64_t v1;
+      ::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 | v2};
+      if ((v & 0x8080808080808080) == 0) {
+        size_t final_pos = pos + 16;
+        while (pos < final_pos) {
+          *utf32_output++ = char32_t(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+    uint8_t leading_byte = data[pos]; // leading byte
+    if (leading_byte < 0b10000000) {
+      // converting one ASCII byte !!!
+      *utf32_output++ = char32_t(leading_byte);
+      pos++;
+    } else if ((leading_byte & 0b11100000) == 0b11000000) {
+      // We have a two-byte UTF-8
+      if (pos + 1 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      // range check
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
+      if (code_point < 0x80 || 0x7ff < code_point) {
+        return result(error_code::OVERLONG, pos);
+      }
+      *utf32_output++ = char32_t(code_point);
+      pos += 2;
+    } else if ((leading_byte & 0b11110000) == 0b11100000) {
+      // We have a three-byte UTF-8
+      if (pos + 2 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
+
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      // range check
+      uint32_t code_point = (leading_byte & 0b00001111) << 12 |
+                            (data[pos + 1] & 0b00111111) << 6 |
+                            (data[pos + 2] & 0b00111111);
+      if (code_point < 0x800 || 0xffff < code_point) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (0xd7ff < code_point && code_point < 0xe000) {
+        return result(error_code::SURROGATE, pos);
+      }
+      *utf32_output++ = char32_t(code_point);
+      pos += 3;
+    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
+      // we have a 4-byte UTF-8 word.
+      if (pos + 3 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      }
+
+      // range check
+      uint32_t code_point = (leading_byte & 0b00000111) << 18 |
+                            (data[pos + 1] & 0b00111111) << 12 |
+                            (data[pos + 2] & 0b00111111) << 6 |
+                            (data[pos + 3] & 0b00111111);
+      if (code_point <= 0xffff) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (0x10ffff < code_point) {
+        return result(error_code::TOO_LARGE, pos);
+      }
+      *utf32_output++ = char32_t(code_point);
+      pos += 4;
+    } else {
+      // we either have too many continuation bytes or an invalid leading byte
+      if ((leading_byte & 0b11000000) == 0b10000000) {
+        return result(error_code::TOO_LONG, pos);
+      } else {
+        return result(error_code::HEADER_BITS, pos);
+      }
+    }
+  }
+  return result(error_code::SUCCESS, utf32_output - start);
+}
+
+/**
+ * When rewind_and_convert_with_errors is called, we are pointing at 'buf' and
+ * we have up to len input bytes left, and we encountered some error. It is
+ * possible that the error is at 'buf' exactly, but it could also be in the
+ * previous bytes (up to 3 bytes back).
+ *
+ * prior_bytes indicates how many bytes, prior to 'buf', may belong to the
+ * current memory section and can be safely accessed. We use prior_bytes to
+ * safely access up to three bytes before 'buf'.
+ *
+ * The caller is responsible for ensuring that len > 0.
+ *
+ * If the error is believed to have occurred prior to 'buf', the count value
+ * contained in the result will be SIZE_T - 1, SIZE_T - 2, or SIZE_T - 3.
+ */
+inline result rewind_and_convert_with_errors(size_t prior_bytes,
+                                             const char *buf, size_t len,
+                                             char32_t *utf32_output) {
+  size_t extra_len{0};
+  // We potentially need to go back in time and find a leading byte.
+  size_t how_far_back = 3; // 3 bytes in the past + current position
+  if (how_far_back > prior_bytes) {
+    how_far_back = prior_bytes;
+  }
+  bool found_leading_bytes{false};
+  // important: it is i <= how_far_back and not 'i < how_far_back'.
+  for (size_t i = 0; i <= how_far_back; i++) {
+    unsigned char byte = buf[-static_cast<std::ptrdiff_t>(i)];
+    found_leading_bytes = ((byte & 0b11000000) != 0b10000000);
+    if (found_leading_bytes) {
+      if (i > 0 && byte < 128) {
+        // If we had to go back and the leading byte is ascii
+        // then we can stop right away.
+        return result(error_code::TOO_LONG, 0 - i + 1);
+      }
+      buf -= i;
+      extra_len = i;
+      break;
+    }
+  }
+  //
+  // It is possible for this function to return a negative count in its result.
+  // C++ Standard Section 18.1 defines size_t in <cstddef>, which the C Standard
+  // describes as <stddef.h>. C Standard Section 4.1.5 defines size_t as the
+  // unsigned integral type of the result of the sizeof operator.
+  //
+  // An unsigned type simply wraps around arithmetically (well defined).
+  //
+  if (!found_leading_bytes) {
+    // If how_far_back == 3, we may have four consecutive continuation bytes!!!
+    // [....] [continuation] [continuation] [continuation] | [continuation]
+    // (the last one being at buf). Or we possibly have a stream that does not
+    // start with a leading byte.
+    return result(error_code::TOO_LONG, 0 - how_far_back);
+  }
+
+  result res = convert_with_errors(buf, len + extra_len, utf32_output);
+  if (res.error) {
+    res.count -= extra_len;
+  }
+  return res;
+}
+
+} // namespace utf8_to_utf32
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf8_to_utf32/utf8_to_utf32.h */
+
+/* begin file src/scalar/latin1_to_utf16/latin1_to_utf16.h */
+#ifndef SIMDUTF_LATIN1_TO_UTF16_H
+#define SIMDUTF_LATIN1_TO_UTF16_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace latin1_to_utf16 {
+
+template <endianness big_endian>
+inline size_t convert(const char *buf, size_t len, char16_t *utf16_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  char16_t *start{utf16_output};
+
+  while (pos < len) {
+    uint16_t word =
+        uint16_t(data[pos]); // extend Latin-1 char to 16-bit Unicode code point
+    *utf16_output++ =
+        char16_t(match_system(big_endian) ? word : utf16::swap_bytes(word));
+    pos++;
+  }
+
+  return utf16_output - start;
+}
+
+template <endianness big_endian>
+inline result convert_with_errors(const char *buf, size_t len,
+                                  char16_t *utf16_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  char16_t *start{utf16_output};
+
+  while (pos < len) {
+    uint16_t word =
+        uint16_t(data[pos]); // extend Latin-1 char to 16-bit Unicode code point
+    *utf16_output++ =
+        char16_t(match_system(big_endian) ? word : utf16::swap_bytes(word));
+    pos++;
+  }
+
+  return result(error_code::SUCCESS, utf16_output - start);
+}
+
+} // namespace latin1_to_utf16
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/latin1_to_utf16/latin1_to_utf16.h */
+/* begin file src/scalar/latin1_to_utf32/latin1_to_utf32.h */
+#ifndef SIMDUTF_LATIN1_TO_UTF32_H
+#define SIMDUTF_LATIN1_TO_UTF32_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace latin1_to_utf32 {
+
+inline size_t convert(const char *buf, size_t len, char32_t *utf32_output) {
+  const unsigned char *data = reinterpret_cast<const unsigned char *>(buf);
+  char32_t *start{utf32_output};
+  for (size_t i = 0; i < len; i++) {
+    *utf32_output++ = (char32_t)data[i];
+  }
+  return utf32_output - start;
+}
+
+} // namespace latin1_to_utf32
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/latin1_to_utf32/latin1_to_utf32.h */
+
+/* begin file src/scalar/utf8_to_latin1/utf8_to_latin1.h */
+#ifndef SIMDUTF_UTF8_TO_LATIN1_H
+#define SIMDUTF_UTF8_TO_LATIN1_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf8_to_latin1 {
+
+inline size_t convert(const char *buf, size_t len, char *latin_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  char *start{latin_output};
+
+  while (pos < len) {
+    // try to convert the next block of 16 ASCII bytes
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
+      uint64_t v1;
+      ::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 | v2}; // We are only interested in the high bit of each
+                           // byte: 1000 0000 1000 0000 ... etc.
+      if ((v & 0x8080808080808080) ==
+          0) { // if NONE of these are set, i.e. all of them are zero, then
+               // everything is ASCII
+        size_t final_pos = pos + 16;
+        while (pos < final_pos) {
+          *latin_output++ = char(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+
+    // suppose it is not an all ASCII byte sequence
+    uint8_t leading_byte = data[pos]; // leading byte
+    if (leading_byte < 0b10000000) {
+      // converting one ASCII byte !!!
+      *latin_output++ = char(leading_byte);
+      pos++;
+    } else if ((leading_byte & 0b11100000) ==
+               0b11000000) { // the first three bits (110) indicate that
+      // we have a two-byte UTF-8 sequence
+      if (pos + 1 >= len) {
+        return 0;
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      } // checks if the next byte is a valid continuation byte in UTF-8. A
+        // valid continuation byte starts with 10.
+      // range check
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 |
+          (data[pos + 1] &
+           0b00111111); // assembles the Unicode code point from the two bytes.
+                        // It does this by discarding the leading 110 and 10
+                        // bits from the two bytes, shifting the remaining bits
+                        // of the first byte, and then combining the results
+                        // with a bitwise OR operation.
+      if (code_point < 0x80 || 0xFF < code_point) {
+        return 0; // We only accept the range 128-255 (non-ASCII Latin-1). A
+                  // code_point below 0x80 is invalid (overlong) because it is
+                  // already covered by bytes whose leading bit is zero.
+      }
+      *latin_output++ = char(code_point);
+      pos += 2;
+    } else {
+      return 0;
+    }
+  }
+  return latin_output - start;
+}
+
+inline result convert_with_errors(const char *buf, size_t len,
+                                  char *latin_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  size_t pos = 0;
+  char *start{latin_output};
+
+  while (pos < len) {
+    // try to convert the next block of 16 ASCII bytes
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
+      uint64_t v1;
+      ::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 | v2}; // We are only interested in the high bit of each
+                           // byte: 1000 0000 1000 0000 ... etc.
+      if ((v & 0x8080808080808080) ==
+          0) { // if NONE of these are set, i.e. all of them are zero, then
+               // everything is ASCII
+        size_t final_pos = pos + 16;
+        while (pos < final_pos) {
+          *latin_output++ = char(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+    // suppose it is not an all ASCII byte sequence
+    uint8_t leading_byte = data[pos]; // leading byte
+    if (leading_byte < 0b10000000) {
+      // converting one ASCII byte !!!
+      *latin_output++ = char(leading_byte);
+      pos++;
+    } else if ((leading_byte & 0b11100000) ==
+               0b11000000) { // the first three bits indicate:
+      // We have a two-byte UTF-8
+      if (pos + 1 >= len) {
+        return result(error_code::TOO_SHORT, pos);
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return result(error_code::TOO_SHORT, pos);
+      } // checks if the next byte is a valid continuation byte in UTF-8. A
+        // valid continuation byte starts with 10.
+      // range check -
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 |
+          (data[pos + 1] &
+           0b00111111); // assembles the Unicode code point from the two bytes.
+                        // It does this by discarding the leading 110 and 10
+                        // bits from the two bytes, shifting the remaining bits
+                        // of the first byte, and then combining the results
+                        // with a bitwise OR operation.
+      if (code_point < 0x80) {
+        return result(error_code::OVERLONG, pos);
+      }
+      if (0xFF < code_point) {
+        return result(error_code::TOO_LARGE, pos);
+      } // We only care about the range 128-255 which is Non-ASCII latin1
+        // characters
+      *latin_output++ = char(code_point);
+      pos += 2;
+    } else if ((leading_byte & 0b11110000) == 0b11100000) {
+      // We have a three-byte UTF-8
+      return result(error_code::TOO_LARGE, pos);
+    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
+      // we have a 4-byte UTF-8 word.
+      return result(error_code::TOO_LARGE, pos);
+    } else {
+      // we either have too many continuation bytes or an invalid leading byte
+      if ((leading_byte & 0b11000000) == 0b10000000) {
+        return result(error_code::TOO_LONG, pos);
+      }
+
+      return result(error_code::HEADER_BITS, pos);
+    }
+  }
+  return result(error_code::SUCCESS, latin_output - start);
+}
+
+inline result rewind_and_convert_with_errors(size_t prior_bytes,
+                                             const char *buf, size_t len,
+                                             char *latin1_output) {
+  size_t extra_len{0};
+  // We potentially need to go back in time and find a leading byte.
+  // In theory '3' would be sufficient, but sometimes the error can go back
+  // quite far.
+  size_t how_far_back = prior_bytes;
+  // size_t how_far_back = 3; // 3 bytes in the past + current position
+  // if(how_far_back >= prior_bytes) { how_far_back = prior_bytes; }
+  bool found_leading_bytes{false};
+  // important: it is i <= how_far_back and not 'i < how_far_back'.
+  for (size_t i = 0; i <= how_far_back; i++) {
+    unsigned char byte = buf[-static_cast<std::ptrdiff_t>(i)];
+    found_leading_bytes = ((byte & 0b11000000) != 0b10000000);
+    if (found_leading_bytes) {
+      if (i > 0 && byte < 128) {
+        // If we had to go back and the leading byte is ascii
+        // then we can stop right away.
+        return result(error_code::TOO_LONG, 0 - i + 1);
+      }
+      buf -= i;
+      extra_len = i;
+      break;
+    }
+  }
+  //
+  // It is possible for this function to return a negative count in its result.
+  // C++ Standard Section 18.1 places size_t in <cstddef>, which the C Standard
+  // describes as <stddef.h>. C Standard Section 4.1.5 defines size_t as the
+  // unsigned integral type of the result of the sizeof operator.
+  //
+  // An unsigned type will simply wrap round arithmetically (well defined).
+  //
+  if (!found_leading_bytes) {
+    // If how_far_back == 3, we may have four consecutive continuation bytes!!!
+    // [....] [continuation] [continuation] [continuation] | [buf is
+    // continuation] Or we possibly have a stream that does not start with a
+    // leading byte.
+    return result(error_code::TOO_LONG, 0 - how_far_back);
+  }
+  result res = convert_with_errors(buf, len + extra_len, latin1_output);
+  if (res.error) {
+    res.count -= extra_len;
+  }
+  return res;
+}
+
+} // namespace utf8_to_latin1
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf8_to_latin1/utf8_to_latin1.h */
+/* begin file src/scalar/utf16_to_latin1/utf16_to_latin1.h */
+#ifndef SIMDUTF_UTF16_TO_LATIN1_H
+#define SIMDUTF_UTF16_TO_LATIN1_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf16_to_latin1 {
+
+#include <cstring> // for std::memcpy
+
+template <endianness big_endian>
+inline size_t convert(const char16_t *buf, size_t len, char *latin_output) {
+  if (len == 0) {
+    return 0;
+  }
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  size_t pos = 0;
+  char *current_write = latin_output;
+  uint16_t word = 0;
+  uint16_t too_large = 0;
+
+  while (pos < len) {
+    word = !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    too_large |= word;
+    *current_write++ = char(word & 0xFF);
+    pos++;
+  }
+  if ((too_large & 0xFF00) != 0) {
+    return 0;
+  }
+
+  return current_write - latin_output;
+}
+
+template <endianness big_endian>
+inline result convert_with_errors(const char16_t *buf, size_t len,
+                                  char *latin_output) {
+  if (len == 0) {
+    return result(error_code::SUCCESS, 0);
+  }
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  size_t pos = 0;
+  char *start{latin_output};
+  uint16_t word;
+
+  while (pos < len) {
+    if (pos + 16 <= len) { // if it is safe to read 32 more bytes, check that
+                           // they are Latin1
+      uint64_t v1, v2, v3, v4;
+      ::memcpy(&v1, data + pos, sizeof(uint64_t));
+      ::memcpy(&v2, data + pos + 4, sizeof(uint64_t));
+      ::memcpy(&v3, data + pos + 8, sizeof(uint64_t));
+      ::memcpy(&v4, data + pos + 12, sizeof(uint64_t));
+
+      if (!match_system(big_endian)) {
+        v1 = (v1 >> 8) | (v1 << (64 - 8));
+      }
+      if (!match_system(big_endian)) {
+        v2 = (v2 >> 8) | (v2 << (64 - 8));
+      }
+      if (!match_system(big_endian)) {
+        v3 = (v3 >> 8) | (v3 << (64 - 8));
+      }
+      if (!match_system(big_endian)) {
+        v4 = (v4 >> 8) | (v4 << (64 - 8));
+      }
+
+      if (((v1 | v2 | v3 | v4) & 0xFF00FF00FF00FF00) == 0) {
+        size_t final_pos = pos + 16;
+        while (pos < final_pos) {
+          *latin_output++ = !match_system(big_endian)
+                                ? char(utf16::swap_bytes(data[pos]))
+                                : char(data[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+    word = !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    if ((word & 0xFF00) == 0) {
+      *latin_output++ = char(word & 0xFF);
+      pos++;
+    } else {
+      return result(error_code::TOO_LARGE, pos);
+    }
+  }
+  return result(error_code::SUCCESS, latin_output - start);
+}
+
+} // namespace utf16_to_latin1
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf16_to_latin1/utf16_to_latin1.h */
+/* begin file src/scalar/utf32_to_latin1/utf32_to_latin1.h */
+#ifndef SIMDUTF_UTF32_TO_LATIN1_H
+#define SIMDUTF_UTF32_TO_LATIN1_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf32_to_latin1 {
+
+inline size_t convert(const char32_t *buf, size_t len, char *latin1_output) {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+  char *start = latin1_output;
+  uint32_t utf32_char;
+  size_t pos = 0;
+  uint32_t too_large = 0;
+
+  while (pos < len) {
+    utf32_char = (uint32_t)data[pos];
+    too_large |= utf32_char;
+    *latin1_output++ = (char)(utf32_char & 0xFF);
+    pos++;
+  }
+  if ((too_large & 0xFFFFFF00) != 0) {
+    return 0;
+  }
+  return latin1_output - start;
+}
+
+inline result convert_with_errors(const char32_t *buf, size_t len,
+                                  char *latin1_output) {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+  char *start{latin1_output};
+  size_t pos = 0;
+  while (pos < len) {
+    if (pos + 2 <=
+        len) { // if it is safe to read 8 more bytes, check that they are Latin1
+      uint64_t v;
+      ::memcpy(&v, data + pos, sizeof(uint64_t));
+      if ((v & 0xFFFFFF00FFFFFF00) == 0) {
+        *latin1_output++ = char(buf[pos]);
+        *latin1_output++ = char(buf[pos + 1]);
+        pos += 2;
+        continue;
+      }
+    }
+    uint32_t utf32_char = data[pos];
+    if ((utf32_char & 0xFFFFFF00) ==
+        0) { // Check if the character can be represented in Latin-1
+      *latin1_output++ = (char)(utf32_char & 0xFF);
+      pos++;
+    } else {
+      return result(error_code::TOO_LARGE, pos);
+    };
+  }
+  return result(error_code::SUCCESS, latin1_output - start);
+}
+
+} // namespace utf32_to_latin1
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf32_to_latin1/utf32_to_latin1.h */
+
+/* begin file src/scalar/utf8_to_latin1/valid_utf8_to_latin1.h */
+#ifndef SIMDUTF_VALID_UTF8_TO_LATIN1_H
+#define SIMDUTF_VALID_UTF8_TO_LATIN1_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf8_to_latin1 {
+
+inline size_t convert_valid(const char *buf, size_t len, char *latin_output) {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+
+  size_t pos = 0;
+  char *start{latin_output};
+
+  while (pos < len) {
+    // try to convert the next block of 16 ASCII bytes
+    if (pos + 16 <=
+        len) { // if it is safe to read 16 more bytes, check that they are ascii
+      uint64_t v1;
+      ::memcpy(&v1, data + pos, sizeof(uint64_t));
+      uint64_t v2;
+      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
+      uint64_t v{v1 |
+                 v2}; // We are only interested in these bits: 1000 1000 1000
+                      // 1000, so it makes sense to OR everything together
+      if ((v & 0x8080808080808080) ==
+          0) { // if NONE of these are set, i.e. all of them are zero, then
+               // everything is ASCII
+        size_t final_pos = pos + 16;
+        while (pos < final_pos) {
+          *latin_output++ = char(buf[pos]);
+          pos++;
+        }
+        continue;
+      }
+    }
+
+    // suppose it is not an all ASCII byte sequence
+    uint8_t leading_byte = data[pos]; // leading byte
+    if (leading_byte < 0b10000000) {
+      // converting one ASCII byte !!!
+      *latin_output++ = char(leading_byte);
+      pos++;
+    } else if ((leading_byte & 0b11100000) ==
+               0b11000000) { // the first three bits indicate:
+      // We have a two-byte UTF-8
+      if (pos + 1 >= len) {
+        break;
+      } // minimal bound checking
+      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
+        return 0;
+      } // checks if the next byte is a valid continuation byte in UTF-8. A
+        // valid continuation byte starts with 10.
+      // range check -
+      uint32_t code_point =
+          (leading_byte & 0b00011111) << 6 |
+          (data[pos + 1] &
+           0b00111111); // assembles the Unicode code point from the two bytes.
+                        // It does this by discarding the leading 110 and 10
+                        // bits from the two bytes, shifting the remaining bits
+                        // of the first byte, and then combining the results
+                        // with a bitwise OR operation.
+      *latin_output++ = char(code_point);
+      pos += 2;
+    } else {
+      // we may have a continuation but we do not do error checking
+      return 0;
+    }
+  }
+  return latin_output - start;
+}
+
+} // namespace utf8_to_latin1
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf8_to_latin1/valid_utf8_to_latin1.h */
+/* begin file src/scalar/utf16_to_latin1/valid_utf16_to_latin1.h */
+#ifndef SIMDUTF_VALID_UTF16_TO_LATIN1_H
+#define SIMDUTF_VALID_UTF16_TO_LATIN1_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf16_to_latin1 {
+
+template <endianness big_endian>
+inline size_t convert_valid(const char16_t *buf, size_t len,
+                            char *latin_output) {
+  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  size_t pos = 0;
+  char *start{latin_output};
+  uint16_t word = 0;
+
+  while (pos < len) {
+    word = !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
+    *latin_output++ = char(word);
+    pos++;
+  }
+
+  return latin_output - start;
+}
+
+} // namespace utf16_to_latin1
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf16_to_latin1/valid_utf16_to_latin1.h */
+/* begin file src/scalar/utf32_to_latin1/valid_utf32_to_latin1.h */
+#ifndef SIMDUTF_VALID_UTF32_TO_LATIN1_H
+#define SIMDUTF_VALID_UTF32_TO_LATIN1_H
+
+namespace simdutf {
+namespace scalar {
+namespace {
+namespace utf32_to_latin1 {
+
+inline size_t convert_valid(const char32_t *buf, size_t len,
+                            char *latin1_output) {
+  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
+  char *start = latin1_output;
+  uint32_t utf32_char;
+  size_t pos = 0;
+
+  while (pos < len) {
+    utf32_char = (uint32_t)data[pos];
+
+    if (pos + 2 <=
+        len) { // if it is safe to read 8 more bytes, check that they are Latin1
+      uint64_t v;
+      ::memcpy(&v, data + pos, sizeof(uint64_t));
+      if ((v & 0xFFFFFF00FFFFFF00) == 0) {
+        *latin1_output++ = char(buf[pos]);
+        *latin1_output++ = char(buf[pos + 1]);
+        pos += 2;
+        continue;
+      } else {
+        // output can not be represented in latin1
+        return 0;
+      }
+    }
+    if ((utf32_char & 0xFFFFFF00) == 0) {
+      *latin1_output++ = char(utf32_char);
+    } else {
+      // output can not be represented in latin1
+      return 0;
+    }
+    pos++;
+  }
+  return latin1_output - start;
+}
+
+} // namespace utf32_to_latin1
+} // unnamed namespace
+} // namespace scalar
+} // namespace simdutf
+
+#endif
+/* end file src/scalar/utf32_to_latin1/valid_utf32_to_latin1.h */
+
+SIMDUTF_PUSH_DISABLE_WARNINGS
+SIMDUTF_DISABLE_UNDESIRED_WARNINGS
+
+#if SIMDUTF_IMPLEMENTATION_ARM64
+/* begin file src/arm64/implementation.cpp */
+/* begin file src/simdutf/arm64/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "arm64"
+// #define SIMDUTF_IMPLEMENTATION arm64
+/* end file src/simdutf/arm64/begin.h */
+namespace simdutf {
+namespace arm64 {
+namespace {
+#ifndef SIMDUTF_ARM64_H
+  #error "arm64.h must be included"
+#endif
+using namespace simd;
+
+simdutf_really_inline bool is_ascii(const simd8x64<uint8_t> &input) {
+  simd8<uint8_t> bits = input.reduce_or();
+  return bits.max_val() < 0b10000000u;
+}
+
+simdutf_unused simdutf_really_inline simd8<bool>
+must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2,
+                     const simd8<uint8_t> prev3) {
+  simd8<bool> is_second_byte = prev1 >= uint8_t(0b11000000u);
+  simd8<bool> is_third_byte = prev2 >= uint8_t(0b11100000u);
+  simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
+  // Use ^ instead of | for is_*_byte, because ^ is commutative, and the caller
+  // is using ^ as well. This will work fine because we only have to report
+  // errors for cases with 0-1 lead bytes. Multiple lead bytes implies 2
+  // overlapping multibyte characters, and if that happens, there is guaranteed
+  // to be at least *one* lead byte that is part of only 1 other multibyte
+  // character. The error will be detected there.
+  return is_second_byte ^ is_third_byte ^ is_fourth_byte;
+}
+
+simdutf_really_inline simd8<bool>
+must_be_2_3_continuation(const simd8<uint8_t> prev2,
+                         const simd8<uint8_t> prev3) {
+  simd8<bool> is_third_byte = prev2 >= uint8_t(0b11100000u);
+  simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
+  return is_third_byte ^ is_fourth_byte;
+}
+
+// common functions for utf8 conversions
+simdutf_really_inline uint16x4_t convert_utf8_3_byte_to_utf16(uint8x16_t in) {
+  // Low half contains  10cccccc|1110aaaa
+  // High half contains 10bbbbbb|10bbbbbb
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  const uint8x16_t sh = simdutf_make_uint8x16_t(0, 2, 3, 5, 6, 8, 9, 11, 1, 1,
+                                                4, 4, 7, 7, 10, 10);
+#else
+  const uint8x16_t sh = {0, 2, 3, 5, 6, 8, 9, 11, 1, 1, 4, 4, 7, 7, 10, 10};
+#endif
+  uint8x16_t perm = vqtbl1q_u8(in, sh);
+  // Split into half vectors.
+  // 10cccccc|1110aaaa
+  uint8x8_t perm_low = vget_low_u8(perm); // no-op
+  // 10bbbbbb|10bbbbbb
+  uint8x8_t perm_high = vget_high_u8(perm);
+  // xxxxxxxx 10bbbbbb
+  uint16x4_t mid = vreinterpret_u16_u8(perm_high); // no-op
+  // xxxxxxxx 1110aaaa
+  uint16x4_t high = vreinterpret_u16_u8(perm_low); // no-op
+  // Assemble with shift left insert.
+  // xxxxxxaa aabbbbbb
+  uint16x4_t mid_high = vsli_n_u16(mid, high, 6);
+  // (perm_low << 8) | (perm_low >> 8)
+  // xxxxxxxx 10cccccc
+  uint16x4_t low = vreinterpret_u16_u8(vrev16_u8(perm_low));
+  // Shift left insert into the low bits
+  // aaaabbbb bbcccccc
+  uint16x4_t composed = vsli_n_u16(low, mid_high, 6);
+  return composed;
+}
+
+simdutf_really_inline uint16x8_t convert_utf8_2_byte_to_utf16(uint8x16_t in) {
+  // Converts 6 2 byte UTF-8 characters to 6 UTF-16 characters.
+  // Technically this calculates 8, but 6 does better and happens more often
+  // (The languages which use these codepoints use ASCII spaces so 8 would need
+  // to be in the middle of a very long word).
+
+  // 10bbbbbb 110aaaaa
+  uint16x8_t upper = vreinterpretq_u16_u8(in);
+  // (in << 8) | (in >> 8)
+  // 110aaaaa 10bbbbbb
+  uint16x8_t lower = vreinterpretq_u16_u8(vrev16q_u8(in));
+  // 00000000 000aaaaa
+  uint16x8_t upper_masked = vandq_u16(upper, vmovq_n_u16(0x1F));
+  // Assemble with shift left insert.
+  // 00000aaa aabbbbbb
+  uint16x8_t composed = vsliq_n_u16(lower, upper_masked, 6);
+  return composed;
+}
+
+simdutf_really_inline uint16x8_t
+convert_utf8_1_to_2_byte_to_utf16(uint8x16_t in, size_t shufutf8_idx) {
+  // Converts 6 1-2 byte UTF-8 characters to 6 UTF-16 characters.
+  // This is a relatively easy scenario:
+  // we process SIX (6) input code points. The max length in bytes of six
+  // code points spanning between 1 and 2 bytes each is 12 bytes.
+  uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
+      simdutf::tables::utf8_to_utf16::shufutf8[shufutf8_idx]));
+  // Shuffle
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 110aaaaa 10bbbbbb
+  uint16x8_t perm = vreinterpretq_u16_u8(vqtbl1q_u8(in, sh));
+  // Mask
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 00000000 00bbbbbb
+  uint16x8_t ascii = vandq_u16(perm, vmovq_n_u16(0x7f)); // 6 or 7 bits
+  // 1 byte: 00000000 00000000
+  // 2 byte: 000aaaaa 00000000
+  uint16x8_t highbyte = vandq_u16(perm, vmovq_n_u16(0x1f00)); // 5 bits
+  // Combine with a shift right accumulate
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 00000aaa aabbbbbb
+  uint16x8_t composed = vsraq_n_u16(ascii, highbyte, 2);
+  return composed;
+}
+
+/* begin file src/arm64/arm_validate_utf16.cpp */
+template <endianness big_endian>
+const char16_t *arm_validate_utf16(const char16_t *input, size_t size) {
+  const char16_t *end = input + size;
+  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
+  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
+  const auto v_fc = simd8<uint8_t>::splat(0xfc);
+  const auto v_dc = simd8<uint8_t>::splat(0xdc);
+  while (end - input >= 16) {
+    // 0. Load data: since the validation takes into account only higher
+    //    byte of each word, we compress the two vectors into one which
+    //    consists only of the higher bytes.
+    auto in0 = simd16<uint16_t>(input);
+    auto in1 =
+        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
+    if (!match_system(big_endian)) {
+      in0 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in0)));
+      in1 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in1)));
+    }
+    const auto t0 = in0.shr<8>();
+    const auto t1 = in1.shr<8>();
+    const simd8<uint8_t> in = simd16<uint16_t>::pack(t0, t1);
+    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
+    const uint64_t surrogates_wordmask = ((in & v_f8) == v_d8).to_bitmask64();
+    if (surrogates_wordmask == 0) {
+      input += 16;
+    } else {
+      // 2. We have some surrogates that have to be distinguished:
+      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
+      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
+      //
+      //    Fact: high surrogate has 11th bit set (3rd bit in the higher byte)
+
+      // V - non-surrogate code units
+      //     V = not surrogates_wordmask
+      const uint64_t V = ~surrogates_wordmask;
+
+      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
+      const auto vH = ((in & v_fc) == v_dc);
+      const uint64_t H = vH.to_bitmask64();
+
+      // L - word mask for low surrogates
+      //     L = not H and surrogates_wordmask
+      const uint64_t L = ~H & surrogates_wordmask;
+
+      const uint64_t a =
+          L & (H >> 4); // A low surrogate must be followed by a high one.
+                        // (A low surrogate placed in the register's last word
+                        // is an exception we handle.)
+      const uint64_t b =
+          a << 4; // Just mark that the opposite fact holds;
+                  // thanks to that we have only two masks for valid case.
+      const uint64_t c = V | a | b; // Combine all the masks into the final one.
+      if (c == ~0ull) {
+        // The whole input register contains valid UTF-16, i.e.,
+        // either single code units or proper surrogate pairs.
+        input += 16;
+      } else if (c == 0xfffffffffffffffull) {
+        // The 15 lower code units of the input register contain valid UTF-16.
+        // The last word may be either a low or high surrogate. In the next
+        // iteration we 1) check if the low surrogate is followed by a high
+        // one, 2) reject a sole high surrogate.
+        input += 15;
+      } else {
+        return nullptr;
+      }
+    }
+  }
+  return input;
+}
+
+template <endianness big_endian>
+const result arm_validate_utf16_with_errors(const char16_t *input,
+                                            size_t size) {
+  const char16_t *start = input;
+  const char16_t *end = input + size;
+
+  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
+  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
+  const auto v_fc = simd8<uint8_t>::splat(0xfc);
+  const auto v_dc = simd8<uint8_t>::splat(0xdc);
+  while (input + 16 < end) {
+    // 0. Load data: since the validation takes into account only higher
+    //    byte of each word, we compress the two vectors into one which
+    //    consists only of the higher bytes.
+    auto in0 = simd16<uint16_t>(input);
+    auto in1 =
+        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
+
+    if (!match_system(big_endian)) {
+      in0 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in0)));
+      in1 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in1)));
+    }
+    const auto t0 = in0.shr<8>();
+    const auto t1 = in1.shr<8>();
+    const simd8<uint8_t> in = simd16<uint16_t>::pack(t0, t1);
+    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
+    const uint64_t surrogates_wordmask = ((in & v_f8) == v_d8).to_bitmask64();
+    if (surrogates_wordmask == 0) {
+      input += 16;
+    } else {
+      // 2. We have some surrogates that have to be distinguished:
+      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
+      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
+      //
+      //    Fact: high surrogate has 11th bit set (3rd bit in the higher byte)
+
+      // V - non-surrogate code units
+      //     V = not surrogates_wordmask
+      const uint64_t V = ~surrogates_wordmask;
+
+      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
+      const auto vH = ((in & v_fc) == v_dc);
+      const uint64_t H = vH.to_bitmask64();
+
+      // L - word mask for low surrogates
+      //     L = not H and surrogates_wordmask
+      const uint64_t L = ~H & surrogates_wordmask;
+
+      const uint64_t a =
+          L & (H >> 4); // A low surrogate must be followed by a high one.
+                        // (A low surrogate placed in the register's last word
+                        // is an exception we handle.)
+      const uint64_t b =
+          a << 4; // Just mark that the opposite fact holds;
+                  // thanks to that we have only two masks for valid case.
+      const uint64_t c = V | a | b; // Combine all the masks into the final one.
+      if (c == ~0ull) {
+        // The whole input register contains valid UTF-16, i.e.,
+        // either single code units or proper surrogate pairs.
+        input += 16;
+      } else if (c == 0xfffffffffffffffull) {
+        // The 15 lower code units of the input register contain valid UTF-16.
+        // The last word may be either a low or high surrogate. In the next
+        // iteration we 1) check if the low surrogate is followed by a high
+        // one, 2) reject a sole high surrogate.
+        input += 15;
+      } else {
+        return result(error_code::SURROGATE, input - start);
+      }
+    }
+  }
+  return result(error_code::SUCCESS, input - start);
+}
+/* end file src/arm64/arm_validate_utf16.cpp */
+/* begin file src/arm64/arm_validate_utf32le.cpp */
+
+const char32_t *arm_validate_utf32le(const char32_t *input, size_t size) {
+  const char32_t *end = input + size;
+
+  const uint32x4_t standardmax = vmovq_n_u32(0x10ffff);
+  const uint32x4_t offset = vmovq_n_u32(0xffff2000);
+  const uint32x4_t standardoffsetmax = vmovq_n_u32(0xfffff7ff);
+  uint32x4_t currentmax = vmovq_n_u32(0x0);
+  uint32x4_t currentoffsetmax = vmovq_n_u32(0x0);
+
+  while (end - input >= 4) {
+    const uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(input));
+    currentmax = vmaxq_u32(in, currentmax);
+    currentoffsetmax = vmaxq_u32(vaddq_u32(in, offset), currentoffsetmax);
+    input += 4;
+  }
+
+  uint32x4_t is_zero =
+      veorq_u32(vmaxq_u32(currentmax, standardmax), standardmax);
+  if (vmaxvq_u32(is_zero) != 0) {
+    return nullptr;
+  }
+
+  is_zero = veorq_u32(vmaxq_u32(currentoffsetmax, standardoffsetmax),
+                      standardoffsetmax);
+  if (vmaxvq_u32(is_zero) != 0) {
+    return nullptr;
+  }
+
+  return input;
+}
+
+const result arm_validate_utf32le_with_errors(const char32_t *input,
+                                              size_t size) {
+  const char32_t *start = input;
+  const char32_t *end = input + size;
+
+  const uint32x4_t standardmax = vmovq_n_u32(0x10ffff);
+  const uint32x4_t offset = vmovq_n_u32(0xffff2000);
+  const uint32x4_t standardoffsetmax = vmovq_n_u32(0xfffff7ff);
+  uint32x4_t currentmax = vmovq_n_u32(0x0);
+  uint32x4_t currentoffsetmax = vmovq_n_u32(0x0);
+
+  while (end - input >= 4) {
+    const uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(input));
+    currentmax = vmaxq_u32(in, currentmax);
+    currentoffsetmax = vmaxq_u32(vaddq_u32(in, offset), currentoffsetmax);
+
+    uint32x4_t is_zero =
+        veorq_u32(vmaxq_u32(currentmax, standardmax), standardmax);
+    if (vmaxvq_u32(is_zero) != 0) {
+      return result(error_code::TOO_LARGE, input - start);
+    }
+
+    is_zero = veorq_u32(vmaxq_u32(currentoffsetmax, standardoffsetmax),
+                        standardoffsetmax);
+    if (vmaxvq_u32(is_zero) != 0) {
+      return result(error_code::SURROGATE, input - start);
+    }
+
+    input += 4;
+  }
+
+  return result(error_code::SUCCESS, input - start);
+}
+/* end file src/arm64/arm_validate_utf32le.cpp */
+
+/* begin file src/arm64/arm_convert_latin1_to_utf16.cpp */
+template <endianness big_endian>
+std::pair<const char *, char16_t *>
+arm_convert_latin1_to_utf16(const char *buf, size_t len,
+                            char16_t *utf16_output) {
+  const char *end = buf + len;
+
+  while (end - buf >= 16) {
+    uint8x16_t in8 = vld1q_u8(reinterpret_cast<const uint8_t *>(buf));
+    uint16x8_t inlow = vmovl_u8(vget_low_u8(in8));
+    if (!match_system(big_endian)) {
+      inlow = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(inlow)));
+    }
+    vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output), inlow);
+    uint16x8_t inhigh = vmovl_u8(vget_high_u8(in8));
+    if (!match_system(big_endian)) {
+      inhigh = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(inhigh)));
+    }
+    vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output + 8), inhigh);
+    utf16_output += 16;
+    buf += 16;
+  }
+
+  return std::make_pair(buf, utf16_output);
+}
+/* end file src/arm64/arm_convert_latin1_to_utf16.cpp */
+/* begin file src/arm64/arm_convert_latin1_to_utf32.cpp */
+std::pair<const char *, char32_t *>
+arm_convert_latin1_to_utf32(const char *buf, size_t len,
+                            char32_t *utf32_output) {
+  const char *end = buf + len;
+
+  while (end - buf >= 16) {
+    uint8x16_t in8 = vld1q_u8(reinterpret_cast<const uint8_t *>(buf));
+    uint16x8_t in8low = vmovl_u8(vget_low_u8(in8));
+    uint32x4_t in16lowlow = vmovl_u16(vget_low_u16(in8low));
+    uint32x4_t in16lowhigh = vmovl_u16(vget_high_u16(in8low));
+    uint16x8_t in8high = vmovl_u8(vget_high_u8(in8));
+    uint32x4_t in8highlow = vmovl_u16(vget_low_u16(in8high));
+    uint32x4_t in8highhigh = vmovl_u16(vget_high_u16(in8high));
+    vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output), in16lowlow);
+    vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output + 4), in16lowhigh);
+    vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output + 8), in8highlow);
+    vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output + 12), in8highhigh);
+
+    utf32_output += 16;
+    buf += 16;
+  }
+
+  return std::make_pair(buf, utf32_output);
+}
+/* end file src/arm64/arm_convert_latin1_to_utf32.cpp */
+/* begin file src/arm64/arm_convert_latin1_to_utf8.cpp */
+/*
+  Returns a pair: the first unprocessed byte from buf and utf8_output
+  A scalar routine should carry on the conversion of the tail.
+*/
+std::pair<const char *, char *>
+arm_convert_latin1_to_utf8(const char *latin1_input, size_t len,
+                           char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char *end = latin1_input + len;
+  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
+  // We always write 16 bytes, of which more than the first 8 bytes
+  // are valid. A safety margin of 8 is more than sufficient.
+  while (end - latin1_input >= 16 + 8) {
+    uint8x16_t in8 = vld1q_u8(reinterpret_cast<const uint8_t *>(latin1_input));
+    if (vmaxvq_u8(in8) <= 0x7F) { // ASCII fast path!!!!
+      vst1q_u8(utf8_output, in8);
+      utf8_output += 16;
+      latin1_input += 16;
+      continue;
+    }
+
+    // We just fallback on UTF-16 code. This could be optimized/simplified
+    // further.
+    uint16x8_t in16 = vmovl_u8(vget_low_u8(in8));
+    // 1. prepare 2-byte values
+    // input 8-bit word : [aabb|bbbb] x 8
+    // expected output   : [1100|00aa|10bb|bbbb] x 8
+    const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
+    const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
+
+    // t0 = [0000|00aa|bbbb|bb00]
+    const uint16x8_t t0 = vshlq_n_u16(in16, 2);
+    // t1 = [0000|00aa|0000|0000]
+    const uint16x8_t t1 = vandq_u16(t0, v_1f00);
+    // t2 = [0000|0000|00bb|bbbb]
+    const uint16x8_t t2 = vandq_u16(in16, v_003f);
+    // t3 = [0000|00aa|00bb|bbbb]
+    const uint16x8_t t3 = vorrq_u16(t1, t2);
+    // t4 = [1100|00aa|10bb|bbbb]
+    const uint16x8_t t4 = vorrq_u16(t3, v_c080);
+    // 2. merge ASCII and 2-byte codewords
+    const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+    const uint16x8_t one_byte_bytemask = vcleq_u16(in16, v_007f);
+    const uint8x16_t utf8_unpacked =
+        vreinterpretq_u8_u16(vbslq_u16(one_byte_bytemask, in16, t4));
+    // 3. prepare bitmask for 8-bit lookup
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+    const uint16x8_t mask = simdutf_make_uint16x8_t(
+        0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
+#else
+    const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
+                             0x0002, 0x0008, 0x0020, 0x0080};
+#endif
+    uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
+    // 4. pack the bytes
+    const uint8_t *row =
+        &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
+    const uint8x16_t shuffle = vld1q_u8(row + 1);
+    const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
+
+    // 5. store bytes
+    vst1q_u8(utf8_output, utf8_packed);
+    // 6. adjust pointers
+    latin1_input += 8;
+    utf8_output += row[0];
+
+  } // while
+
+  return std::make_pair(latin1_input, reinterpret_cast<char *>(utf8_output));
+}
+/* end file src/arm64/arm_convert_latin1_to_utf8.cpp */
+
+/* begin file src/arm64/arm_convert_utf8_to_latin1.cpp */
+// Convert up to 12 bytes from utf8 to latin1 using a mask indicating the
+// end of the code points. Only the least significant 12 bits of the mask
+// are accessed.
+// It returns how many bytes were consumed (up to 12).
+size_t convert_masked_utf8_to_latin1(const char *input,
+                                     uint64_t utf8_end_of_code_point_mask,
+                                     char *&latin1_output) {
+  // we use an approach where we try to process up to 12 input bytes.
+  // Why 12 input bytes and not 16? Because we are concerned with the size of
+  // the lookup tables. Also 12 is nicely divisible by two and three.
+  //
+  uint8x16_t in = vld1q_u8(reinterpret_cast<const uint8_t *>(input));
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask & 0xfff;
+  //
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // is maybe beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
+
+  // We first try a few fast paths.
+  // The obvious first test is ASCII, which here consumes 12 bytes.
+  if (utf8_end_of_code_point_mask == 0xfff) {
+    // We process in chunks of 12 bytes
+    vst1q_u8(reinterpret_cast<uint8_t *>(latin1_output), in);
+    latin1_output += 12; // We wrote 12 8-bit characters.
+    return 12;           // We consumed 12 bytes.
+  }
+  /// We do not have a fast path available, or the fast path is unimportant, so
+  /// we fall back.
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
+
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
+  // this indicates an invalid input:
+  if (idx >= 64) {
+    return consumed;
+  }
+  // Here we should have (idx < 64); if not, there is a bug in the validation
+  // or elsewhere.
+  //
+  // SIX (6) input code points: this is a relatively easy scenario. Each code
+  // point spans 1 or 2 bytes, so the six of them occupy at most 12 input
+  // bytes. We shuffle them into six 16-bit lanes, compose each value, and
+  // narrow the result to 6 Latin-1 bytes.
+  uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
+      simdutf::tables::utf8_to_utf16::shufutf8[idx]));
+  // Shuffle
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 110aaaaa 10bbbbbb
+  uint16x8_t perm = vreinterpretq_u16_u8(vqtbl1q_u8(in, sh));
+  // Mask
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 00000000 00bbbbbb
+  uint16x8_t ascii = vandq_u16(perm, vmovq_n_u16(0x7f)); // 6 or 7 bits
+  // 1 byte: 00000000 00000000
+  // 2 byte: 000aaaaa 00000000
+  uint16x8_t highbyte = vandq_u16(perm, vmovq_n_u16(0x1f00)); // 5 bits
+  // Combine with a shift right accumulate
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 00000aaa aabbbbbb
+  uint16x8_t composed = vsraq_n_u16(ascii, highbyte, 2);
+  // writing 8 bytes even though we only care about the first 6 bytes.
+  uint8x8_t latin1_packed = vmovn_u16(composed);
+  vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
+  latin1_output += 6; // We wrote 6 bytes.
+  return consumed;
+}
+/* end file src/arm64/arm_convert_utf8_to_latin1.cpp */
+/* begin file src/arm64/arm_convert_utf8_to_utf16.cpp */
+// Convert up to 16 bytes from utf8 to utf16 using a mask indicating the
+// end of the code points. Only the least significant 12 bits of the mask
+// are accessed.
+// It returns how many bytes were consumed (up to 16, usually 12).
+template <endianness big_endian>
+size_t convert_masked_utf8_to_utf16(const char *input,
+                                    uint64_t utf8_end_of_code_point_mask,
+                                    char16_t *&utf16_output) {
+  // we use an approach where we try to process up to 12 input bytes.
+  // Why 12 input bytes and not 16? Because we are concerned with the size of
+  // the lookup tables. Also 12 is nicely divisible by two and three.
+  //
+  uint8x16_t in = vld1q_u8(reinterpret_cast<const uint8_t *>(input));
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask & 0xfff;
+  //
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // is maybe beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
+
+  // We first try a few fast paths.
+  // The obvious first test is ASCII, which actually consumes the full 16.
+  if ((utf8_end_of_code_point_mask & 0xFFFF) == 0xffff) {
+    // We process in chunks of 16 bytes
+    // The routine in simd.h is reused.
+    simd8<int8_t> temp{vreinterpretq_s8_u8(in)};
+    temp.store_ascii_as_utf16<big_endian>(utf16_output);
+    utf16_output += 16; // We wrote 16 16-bit characters.
+    return 16;          // We consumed 16 bytes.
+  }
+
+  // 3 byte sequences are the next most common, as seen in CJK, which has long
+  // sequences of these.
+  if (input_utf8_end_of_code_point_mask == 0x924) {
+    // We want to take 4 3-byte UTF-8 code units and turn them into 4 2-byte
+    // UTF-16 code units.
+    uint16x4_t composed = convert_utf8_3_byte_to_utf16(in);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      composed = vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(composed)));
+    }
+    vst1_u16(reinterpret_cast<uint16_t *>(utf16_output), composed);
+    utf16_output += 4; // We wrote 4 16-bit characters.
+    return 12;         // We consumed 12 bytes.
+  }
+
+  // 2 byte sequences occur in short bursts in languages like Greek and Russian.
+  if ((utf8_end_of_code_point_mask & 0xFFF) == 0xaaa) {
+    // We want to take 6 2-byte UTF-8 code units and turn them into 6 2-byte
+    // UTF-16 code units.
+    uint16x8_t composed = convert_utf8_2_byte_to_utf16(in);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      composed =
+          vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(composed)));
+    }
+    vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output), composed);
+
+    utf16_output += 6; // We wrote 6 16-bit characters.
+    return 12;         // We consumed 12 bytes.
+  }
+
+  /// We do not have a fast path available, or the fast path is unimportant, so
+  /// we fall back.
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
+
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
+
+  if (idx < 64) {
+    // SIX (6) input code-code units
+    // Convert to UTF-16
+    uint16x8_t composed = convert_utf8_1_to_2_byte_to_utf16(in, idx);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      composed =
+          vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(composed)));
+    }
+    // Store
+    vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output), composed);
+    utf16_output += 6; // We wrote 6 16-bit characters.
+    return consumed;
+  } else if (idx < 145) {
+    // FOUR (4) input code-code units
+    // UTF-16 and UTF-32 use similar algorithms, but UTF-32 skips the narrowing.
+    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
+        simdutf::tables::utf8_to_utf16::shufutf8[idx]));
+    // XXX: depending on the system scalar instructions might be faster.
+    // 1 byte: 00000000 00000000 0ccccccc
+    // 2 byte: 00000000 110bbbbb 10cccccc
+    // 3 byte: 1110aaaa 10bbbbbb 10cccccc
+    uint32x4_t perm = vreinterpretq_u32_u8(vqtbl1q_u8(in, sh));
+    // 1 byte: 00000000 0ccccccc
+    // 2 byte: xx0bbbbb x0cccccc
+    // 3 byte: xxbbbbbb x0cccccc
+    uint16x4_t lowperm = vmovn_u32(perm);
+    // Partially mask with bic (doesn't require a temporary register unlike and)
+    // The shift left insert below will clear the top bits.
+    // 1 byte: 00000000 00000000
+    // 2 byte: xx0bbbbb 00000000
+    // 3 byte: xxbbbbbb 00000000
+    uint16x4_t middlebyte = vbic_u16(lowperm, vmov_n_u16(uint16_t(~0xFF00)));
+    // ASCII
+    // 1 byte: 00000000 0ccccccc
+    // 2+byte: 00000000 00cccccc
+    uint16x4_t ascii = vand_u16(lowperm, vmov_n_u16(0x7F));
+    // Split into narrow vectors.
+    // 2 byte: 00000000 00000000
+    // 3 byte: 00000000 xxxxaaaa
+    uint16x4_t highperm = vshrn_n_u32(perm, 16);
+    // Shift right accumulate the middle byte
+    // 1 byte: 00000000 0ccccccc
+    // 2 byte: 00xx0bbb bbcccccc
+    // 3 byte: 00xxbbbb bbcccccc
+    uint16x4_t middlelow = vsra_n_u16(ascii, middlebyte, 2);
+    // Shift left and insert the top 4 bits, overwriting the garbage
+    // 1 byte: 00000000 0ccccccc
+    // 2 byte: 00000bbb bbcccccc
+    // 3 byte: aaaabbbb bbcccccc
+    uint16x4_t composed = vsli_n_u16(middlelow, highperm, 12);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      composed = vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(composed)));
+    }
+    vst1_u16(reinterpret_cast<uint16_t *>(utf16_output), composed);
+
+    utf16_output += 4; // We wrote 4 16-bit codepoints
+    return consumed;
+  } else if (idx < 209) {
+    // THREE (3) input code-code units
+    if (input_utf8_end_of_code_point_mask == 0x888) {
+      // We want to take 3 4-byte UTF-8 code units and turn them into 3 4-byte
+      // UTF-16 pairs. Generating surrogate pairs is a little tricky though, but
+      // it is easier when we can assume they are all pairs. This version does
+      // not use the LUT, but 4 byte sequences are less common and the overhead
+      // of the extra memory access is less important than the early branch
+      // overhead in shorter sequences.
+
+      // Swap byte pairs
+      // 10dddddd 10cccccc|10bbbbbb 11110aaa
+      // 10cccccc 10dddddd|11110aaa 10bbbbbb
+      uint8x16_t swap = vrev16q_u8(in);
+      // Shift left 2 bits
+      // cccccc00 dddddd00 xxxxxxxx bbbbbb00
+      uint32x4_t shift = vreinterpretq_u32_u8(vshlq_n_u8(swap, 2));
+      // Magic number: holds the trail's low 2 bits and all pair corrections.
+      // UTF-8 4b prefix   = -0x0000|0xF000
+      // surrogate offset  = -0x0000|0x0040 (0x10000 << 6)
+      // surrogate high    = +0x0000|0xD800
+      // surrogate low     = +0xDC00|0x0000
+      // -------------------------------
+      //                   = +0xDC00|0xE7C0
+      uint32x4_t magic = vmovq_n_u32(0xDC00E7C0);
+      // Generate unadjusted trail surrogate minus lowest 2 bits
+      // xxxxxxxx xxxxxxxx|11110aaa bbbbbb00
+      uint32x4_t trail =
+          vbslq_u32(vmovq_n_u32(0x0000FF00), vreinterpretq_u32_u8(swap), shift);
+      // Insert low 2 bits of trail surrogate to magic number for later
+      // 11011100 00000000 11100111 110000cc
+      uint16x8_t magic_with_low_2 =
+          vreinterpretq_u16_u32(vsraq_n_u32(magic, shift, 30));
+      // Generate lead surrogate
+      // xxxxcccc ccdddddd|xxxxxxxx xxxxxxxx
+      uint32x4_t lead = vreinterpretq_u32_u16(
+          vsliq_n_u16(vreinterpretq_u16_u8(swap), vreinterpretq_u16_u8(in), 6));
+      // Mask out lead
+      // 000000cc ccdddddd|xxxxxxxx xxxxxxxx
+      lead = vbicq_u32(lead, vmovq_n_u32(uint32_t(~0x03FFFFFF)));
+      // Blend pairs
+      // 000000cc ccdddddd|11110aaa bbbbbb00
+      uint16x8_t blend = vreinterpretq_u16_u32(
+          vbslq_u32(vmovq_n_u32(0x0000FFFF), trail, lead));
+      // Add magic number to finish the result
+      // 110111CC CCDDDDDD|110110AA BBBBBBCC
+      uint16x8_t composed = vaddq_u16(blend, magic_with_low_2);
+      // Byte swap if necessary
+      if (!match_system(big_endian)) {
+        composed =
+            vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(composed)));
+      }
+      uint16_t buffer[8];
+      vst1q_u16(reinterpret_cast<uint16_t *>(buffer), composed);
+      for (int k = 0; k < 6; k++) {
+        utf16_output[k] = buffer[k];
+      } // the loop might compile to a couple of instructions.
+      utf16_output += 6; // We wrote 3 32-bit surrogate pairs.
+      return 12;         // We consumed 12 bytes.
+    }
+    // 3 1-4 byte sequences
+    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
+        simdutf::tables::utf8_to_utf16::shufutf8[idx]));
+
+    // 1 byte: 00000000 00000000 00000000 0ddddddd
+    // 2 byte: 00000000 00000000 110ccccc 10dddddd
+    // 3 byte: 00000000 1110bbbb 10cccccc 10dddddd
+    // 4 byte: 11110aaa 10bbbbbb 10cccccc 10dddddd
+    uint32x4_t perm = vreinterpretq_u32_u8(vqtbl1q_u8(in, sh));
+    // added to fix issue https://github.com/simdutf/simdutf/issues/514
+    // We only want to write 2 * 16-bit code units when that is actually what we
+    // have. Unfortunately, we cannot trust the input. So it is possible to get
+    // 0xff as an input byte and it should not result in a surrogate pair. We
+    // need to check for that.
+    uint32_t permbuffer[4];
+    vst1q_u32(permbuffer, perm);
+    // Mask the low and middle bytes
+    // 00000000 00000000 00000000 0ddddddd
+    uint32x4_t ascii = vandq_u32(perm, vmovq_n_u32(0x7f));
+    // Because the surrogates need more work, the high surrogate is computed
+    // first.
+    uint32x4_t middlehigh = vshlq_n_u32(perm, 2);
+    // 00000000 00000000 00cccccc 00000000
+    uint32x4_t middlebyte = vandq_u32(perm, vmovq_n_u32(0x3F00));
+    // Start assembling the sequence. The 4th byte is already where a surrogate
+    // needs it and there is no dependency, so shift left instead of right.
+    // 3 byte: 00000000 10bbbbxx xxxxxxxx xxxxxxxx
+    // 4 byte: 11110aaa bbbbbbxx xxxxxxxx xxxxxxxx
+    uint32x4_t ab = vbslq_u32(vmovq_n_u32(0xFF000000), perm, middlehigh);
+    // Top 16 bits contain the high ten bits of the pair, before correction.
+    // 3 byte: 00000000 10bbbbcc|cccc0000 00000000
+    // 4 byte: 11110aaa bbbbbbcc|cccc0000 00000000 - high 10 bits in place
+    uint32x4_t abc =
+        vbslq_u32(vmovq_n_u32(0xFFFC0000), ab, vshlq_n_u32(middlebyte, 4));
+    // Combine the low 6 or 7 bits by a shift right accumulate
+    // 3 byte: 00000000 00000010|bbbbcccc ccdddddd - low 16 bits correct
+    // 4 byte: 00000011 110aaabb|bbbbcccc ccdddddd - low 10 bits correct w/o
+    // correction
+    uint32x4_t composed = vsraq_n_u32(ascii, abc, 6);
+    // After this is for surrogates
+    // Blend the low and high surrogates
+    // 4 byte: 11110aaa bbbbbbcc|bbbbcccc ccdddddd
+    uint32x4_t mixed = vbslq_u32(vmovq_n_u32(0xFFFF0000), abc, composed);
+    // Clear the upper 6 bits of the low surrogate. Don't clear the upper bits
+    // yet as 0x10000 was not subtracted from the codepoint yet.
+    // 4 byte: 11110aaa bbbbbbcc|000000cc ccdddddd
+    uint16x8_t masked_pair = vreinterpretq_u16_u32(
+        vbicq_u32(mixed, vmovq_n_u32(uint32_t(~0xFFFF03FF))));
+    // Fix the UTF-8 prefix and surrogate offset and add the surrogate prefixes
+    // in one 16-bit add; same magic, halfword swapped, no continuation adjust.
+    // UTF-8 4b prefix   = -0xF000|0x0000
+    // surrogate offset  = -0x0040|0x0000 (0x10000 << 6)
+    // surrogate high    = +0xD800|0x0000
+    // surrogate low     = +0x0000|0xDC00
+    // -----------------------------------
+    //                   = +0xE7C0|0xDC00
+    uint16x8_t magic = vreinterpretq_u16_u32(vmovq_n_u32(0xE7C0DC00));
+    // 4 byte: 110110AA BBBBBBCC|110111CC CCDDDDDD - surrogate pair complete
+    uint32x4_t surrogates =
+        vreinterpretq_u32_u16(vaddq_u16(masked_pair, magic));
+    // If the high bit is 1 (s32 less than zero), this needs a surrogate pair
+    uint32x4_t is_pair = vcltzq_s32(vreinterpretq_s32_u32(perm));
+
+    // Select either the 4 byte surrogate pair or the 2 byte solo codepoint
+    // 3 byte: 0xxxxxxx xxxxxxxx|bbbbcccc ccdddddd
+    // 4 byte: 110110AA BBBBBBCC|110111CC CCDDDDDD
+    uint32x4_t selected = vbslq_u32(is_pair, surrogates, composed);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      selected =
+          vreinterpretq_u32_u8(vrev16q_u8(vreinterpretq_u8_u32(selected)));
+    }
+    // Attempting to shuffle and store would be complex, just scalarize.
+    uint32_t buffer[4];
+    vst1q_u32(buffer, selected);
+    // Test for the top bit of the surrogate mask. Removed due to issue 514
+    // const uint32_t SURROGATE_MASK = match_system(big_endian) ? 0x80000000 :
+    // 0x00800000;
+    for (size_t i = 0; i < 3; i++) {
+      // Surrogate
+      // Used to be if (buffer[i] & SURROGATE_MASK) {
+      // See discussion above.
+      // patch for issue https://github.com/simdutf/simdutf/issues/514
+      if ((permbuffer[i] & 0xf8000000) == 0xf0000000) {
+        utf16_output[0] = uint16_t(buffer[i] >> 16);
+        utf16_output[1] = uint16_t(buffer[i] & 0xFFFF);
+        utf16_output += 2;
+      } else {
+        utf16_output[0] = uint16_t(buffer[i] & 0xFFFF);
+        utf16_output++;
+      }
+    }
+    return consumed;
+  } else {
+    // here we know that there is an error but we do not handle errors
+    return 12;
+  }
+}
+/* end file src/arm64/arm_convert_utf8_to_utf16.cpp */
+/* begin file src/arm64/arm_convert_utf8_to_utf32.cpp */
+// Convert up to 12 bytes from utf8 to utf32 using a mask indicating the
+// end of the code points. Only the least significant 12 bits of the mask
+// are accessed.
+// It returns how many bytes were consumed (up to 12).
+size_t convert_masked_utf8_to_utf32(const char *input,
+                                    uint64_t utf8_end_of_code_point_mask,
+                                    char32_t *&utf32_out) {
+  // we use an approach where we try to process up to 12 input bytes.
+  // Why 12 input bytes and not 16? Because we are concerned with the size of
+  // the lookup tables. Also 12 is nicely divisible by two and three.
+  //
+  uint32_t *&utf32_output = reinterpret_cast<uint32_t *&>(utf32_out);
+  uint8x16_t in = vld1q_u8(reinterpret_cast<const uint8_t *>(input));
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask & 0xFFF;
+  //
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // is maybe beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
+  //
+  // We first try a few fast paths.
+  if (utf8_end_of_code_point_mask == 0xfff) {
+    // We process in chunks of 12 bytes.
+    // use fast implementation in src/simdutf/arm64/simd.h
+    // Ideally the compiler can keep the tables in registers.
+    simd8<int8_t> temp{vreinterpretq_s8_u8(in)};
+    temp.store_ascii_as_utf32_tbl(utf32_out);
+    utf32_output += 12; // We wrote 12 32-bit characters.
+    return 12;          // We consumed 12 bytes.
+  }
+  if (input_utf8_end_of_code_point_mask == 0x924) {
+    // We want to take 4 3-byte UTF-8 code units and turn them into 4 4-byte
+    // UTF-32 code units. Convert to UTF-16
+    uint16x4_t composed_utf16 = convert_utf8_3_byte_to_utf16(in);
+    // Zero extend and store via ST2 with a zero.
+    uint16x4x2_t interleaver = {{composed_utf16, vmov_n_u16(0)}};
+    vst2_u16(reinterpret_cast<uint16_t *>(utf32_output), interleaver);
+    utf32_output += 4; // We wrote 4 32-bit characters.
+    return 12;         // We consumed 12 bytes.
+  }
+
+  // 2 byte sequences occur in short bursts in languages like Greek and Russian.
+  if (input_utf8_end_of_code_point_mask == 0xaaa) {
+    // We want to take 6 2-byte UTF-8 code units and turn them into 6 4-byte
+    // UTF-32 code units. Convert to UTF-16
+    uint16x8_t composed_utf16 = convert_utf8_2_byte_to_utf16(in);
+    // Zero extend and store via ST2 with a zero.
+    uint16x8x2_t interleaver = {{composed_utf16, vmovq_n_u16(0)}};
+    vst2q_u16(reinterpret_cast<uint16_t *>(utf32_output), interleaver);
+    utf32_output += 6; // We wrote 6 32-bit characters.
+    return 12;         // We consumed 12 bytes.
+  }
+  /// Either no fast path or an unimportant fast path.
+
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
+
+  if (idx < 64) {
+    // SIX (6) input code-code units
+    // Convert to UTF-16
+    uint16x8_t composed_utf16 = convert_utf8_1_to_2_byte_to_utf16(in, idx);
+    // Zero extend and store with ST2 and zero
+    uint16x8x2_t interleaver = {{composed_utf16, vmovq_n_u16(0)}};
+    vst2q_u16(reinterpret_cast<uint16_t *>(utf32_output), interleaver);
+    utf32_output += 6; // We wrote 6 32-bit characters.
+    return consumed;
+  } else if (idx < 145) {
+    // FOUR (4) input code-code units
+    // UTF-16 and UTF-32 use similar algorithms, but UTF-32 skips the narrowing.
+    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
+        simdutf::tables::utf8_to_utf16::shufutf8[idx]));
+    // Shuffle
+    // 1 byte: 00000000 00000000 0ccccccc
+    // 2 byte: 00000000 110bbbbb 10cccccc
+    // 3 byte: 1110aaaa 10bbbbbb 10cccccc
+    uint32x4_t perm = vreinterpretq_u32_u8(vqtbl1q_u8(in, sh));
+    // Split
+    // 00000000 00000000 0ccccccc
+    uint32x4_t ascii = vandq_u32(perm, vmovq_n_u32(0x7F)); // 6 or 7 bits
+    // Note: unmasked
+    // xxxxxxxx aaaaxxxx xxxxxxxx
+    uint32x4_t high = vshrq_n_u32(perm, 4); // 4 bits
+    // Use 16 bit bic instead of and.
+    // The top bits will be corrected later in the bsl
+    // 00000000 10bbbbbb 00000000
+    uint32x4_t middle = vreinterpretq_u32_u16(
+        vbicq_u16(vreinterpretq_u16_u32(perm),
+                  vmovq_n_u16(uint16_t(~0xff00)))); // 5 or 6 bits
+    // Combine low and middle with shift right accumulate
+    // 00000000 00xxbbbb bbcccccc
+    uint32x4_t lowmid = vsraq_n_u32(ascii, middle, 2);
+    // Insert top 4 bits from high byte with bitwise select
+    // 00000000 aaaabbbb bbcccccc
+    uint32x4_t composed = vbslq_u32(vmovq_n_u32(0x0000F000), high, lowmid);
+    vst1q_u32(utf32_output, composed);
+    utf32_output += 4; // We wrote 4 32-bit characters.
+    return consumed;
+  } else if (idx < 209) {
+    // THREE (3) input code-code units
+    if (input_utf8_end_of_code_point_mask == 0x888) {
+      // We want to take 3 4-byte UTF-8 code units and turn them into 3 4-byte
+      // UTF-32 code units. This uses the same method as the fixed 3 byte
+      // version, reversing and shift left insert. However, there is no need for
+      // a shuffle mask now, just rev16 and rev32.
+      //
+      // This version does not use the LUT, but 4 byte sequences are less common
+      // and the overhead of the extra memory access is less important than the
+      // early branch overhead in shorter sequences, so it comes last.
+
+      // Swap pairs of bytes
+      // 10dddddd|10cccccc|10bbbbbb|11110aaa
+      // 10cccccc 10dddddd|11110aaa 10bbbbbb
+      uint16x8_t swap1 = vreinterpretq_u16_u8(vrev16q_u8(in));
+      // Shift left and insert
+      // xxxxcccc ccdddddd|xxxxxxxa aabbbbbb
+      uint16x8_t merge1 = vsliq_n_u16(swap1, vreinterpretq_u16_u8(in), 6);
+      // Swap 16-bit lanes
+      // xxxxcccc ccdddddd xxxxxxxa aabbbbbb
+      // xxxxxxxa aabbbbbb xxxxcccc ccdddddd
+      uint32x4_t swap2 = vreinterpretq_u32_u16(vrev32q_u16(merge1));
+      // Shift insert again
+      // xxxxxxxx xxxaaabb bbbbcccc ccdddddd
+      uint32x4_t merge2 = vsliq_n_u32(swap2, vreinterpretq_u32_u16(merge1), 12);
+      // Clear the garbage
+      // 00000000 000aaabb bbbbcccc ccdddddd
+      uint32x4_t composed = vandq_u32(merge2, vmovq_n_u32(0x1FFFFF));
+      // Store
+      vst1q_u32(utf32_output, composed);
+
+      utf32_output += 3; // We wrote 3 32-bit characters.
+      return 12;         // We consumed 12 bytes.
+    }
+    // Unlike UTF-16, doing a fast codepath doesn't have nearly as much benefit
+    // due to surrogates no longer being involved.
+    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
+        simdutf::tables::utf8_to_utf16::shufutf8[idx]));
+    // 1 byte: 00000000 00000000 00000000 0ddddddd
+    // 2 byte: 00000000 00000000 110ccccc 10dddddd
+    // 3 byte: 00000000 1110bbbb 10cccccc 10dddddd
+    // 4 byte: 11110aaa 10bbbbbb 10cccccc 10dddddd
+    uint32x4_t perm = vreinterpretq_u32_u8(vqtbl1q_u8(in, sh));
+    // Ascii
+    uint32x4_t ascii = vandq_u32(perm, vmovq_n_u32(0x7F));
+    uint32x4_t middle = vandq_u32(perm, vmovq_n_u32(0x3f00));
+    // When converting the way we do, the 3 byte prefix will be interpreted as
+    // the 18th bit being set, since the code would interpret the lead byte
+    // (0b1110bbbb) as a continuation byte (0b10bbbbbb). To fix this, we can
+    // either xor or do an 8 bit add of the 6th bit shifted right by 1. Since
+    // NEON has shift right accumulate, we use that.
+    //  4 byte   3 byte
+    // 10bbbbbb 1110bbbb
+    // 00000000 01000000 6th bit
+    // 00000000 00100000 shift right
+    // 10bbbbbb 0000bbbb add
+    // 00bbbbbb 0000bbbb mask
+    uint8x16_t correction =
+        vreinterpretq_u8_u32(vandq_u32(perm, vmovq_n_u32(0x00400000)));
+    uint32x4_t corrected = vreinterpretq_u32_u8(
+        vsraq_n_u8(vreinterpretq_u8_u32(perm), correction, 1));
+    // 00000000 00000000 0000cccc ccdddddd
+    uint32x4_t cd = vsraq_n_u32(ascii, middle, 2);
+    // Insert twice
+    // xxxxxxxx xxxaaabb bbbbxxxx xxxxxxxx
+    uint32x4_t ab = vbslq_u32(vmovq_n_u32(0x01C0000), vshrq_n_u32(corrected, 6),
+                              vshrq_n_u32(corrected, 4));
+    // 00000000 000aaabb bbbbcccc ccdddddd
+    uint32x4_t composed = vbslq_u32(vmovq_n_u32(0xFFE00FFF), cd, ab);
+    // Store
+    vst1q_u32(utf32_output, composed);
+    utf32_output += 3; // We wrote 3 32-bit characters.
+    return consumed;
+  } else {
+    // here we know that there is an error but we do not handle errors
+    return 12;
+  }
+}
+/* end file src/arm64/arm_convert_utf8_to_utf32.cpp */
+
+/* begin file src/arm64/arm_convert_utf16_to_latin1.cpp */
+
+template <endianness big_endian>
+std::pair<const char16_t *, char *>
+arm_convert_utf16_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_output) {
+  const char16_t *end = buf + len;
+  while (end - buf >= 8) {
+    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
+    if (!match_system(big_endian)) {
+      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
+    }
+    if (vmaxvq_u16(in) <= 0xff) {
+      // 1. pack the bytes
+      uint8x8_t latin1_packed = vmovn_u16(in);
+      // 2. store (8 bytes)
+      vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
+      // 3. adjust pointers
+      buf += 8;
+      latin1_output += 8;
+    } else {
+      return std::make_pair(nullptr, reinterpret_cast<char *>(latin1_output));
+    }
+  } // while
+  return std::make_pair(buf, latin1_output);
+}
+
+template <endianness big_endian>
+std::pair<result, char *>
+arm_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
+                                        char *latin1_output) {
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
+  while (end - buf >= 8) {
+    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
+    if (!match_system(big_endian)) {
+      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
+    }
+    if (vmaxvq_u16(in) <= 0xff) {
+      // 1. pack the bytes
+      uint8x8_t latin1_packed = vmovn_u16(in);
+      // 2. store (8 bytes)
+      vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
+      // 3. adjust pointers
+      buf += 8;
+      latin1_output += 8;
+    } else {
+      // Let us do a scalar fallback.
+      for (int k = 0; k < 8; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if (word <= 0xff) {
+          *latin1_output++ = char(word);
+        } else {
+          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
+                                latin1_output);
+        }
+      }
+    }
+  } // while
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        latin1_output);
+}
+/* end file src/arm64/arm_convert_utf16_to_latin1.cpp */
+/* begin file src/arm64/arm_convert_utf16_to_utf32.cpp */
+/*
+    The vectorized algorithm works on single SSE register i.e., it
+    loads eight 16-bit code units.
+
+    We consider three cases:
+    1. an input register contains no surrogates and each value
+       is in range 0x0000 .. 0x07ff.
+    2. an input register contains no surrogates and values are
+       in range 0x0000 .. 0xffff.
+    3. an input register contains surrogates --- i.e. codepoints
+       can have 16 or 32 bits.
+
+    Ad 1.
+
+    When values are less than 0x0800, it means that a 16-bit code unit
+    can be converted into: 1) single UTF8 byte (when it is an ASCII
+    char) or 2) two UTF8 bytes.
+
+    For this case we do only some shuffle to obtain these 2-byte
+    codes and finally compress the whole SSE register with a single
+    shuffle.
+
+    We need 256-entry lookup table to get a compression pattern
+    and the number of output bytes in the compressed vector register.
+    Each entry occupies 17 bytes.
+
+    Ad 2.
+
+    When values fit in 16-bit code units, but are above 0x07ff, then
+    a single word may produce one, two or three UTF8 bytes.
+
+    We prepare data for all these three cases in two registers.
+    The first register contains lower two UTF8 bytes (used in all
+    cases), while the second one contains just the third byte for
+    the three-UTF8-bytes case.
+
+    Finally these two registers are interleaved forming eight-element
+    array of 32-bit values. The array spans two SSE registers.
+    The bytes from the registers are compressed using two shuffles.
+
+    We need 256-entry lookup table to get a compression pattern
+    and the number of output bytes in the compressed vector register.
+    Each entry occupies 17 bytes.
+
+
+    To summarize:
+    - We need two 256-entry tables that have 8704 bytes in total.
+*/
+/*
+  Returns a pair: the first unprocessed code unit from buf and utf32_output
+  A scalar routine should carry on the conversion of the tail.
+*/
+template <endianness big_endian>
+std::pair<const char16_t *, char32_t *>
+arm_convert_utf16_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_out) {
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
+  const char16_t *end = buf + len;
+
+  const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
+  const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
+
+  while (end - buf >= 8) {
+    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
+    if (!match_system(big_endian)) {
+      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
+    }
+
+    const uint16x8_t surrogates_bytemask =
+        vceqq_u16(vandq_u16(in, v_f800), v_d800);
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
+    if (vmaxvq_u16(surrogates_bytemask) == 0) {
+      // case: no surrogate pairs, extend all 16-bit code units to 32-bit code
+      // units
+      vst1q_u32(utf32_output, vmovl_u16(vget_low_u16(in)));
+      vst1q_u32(utf32_output + 4, vmovl_high_u16(in));
+      utf32_output += 8;
+      buf += 8;
+      // surrogate pair(s) in a register
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if ((word & 0xF800) != 0xD800) {
+          *utf32_output++ = char32_t(word);
+        } else {
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char32_t *>(utf32_output));
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf32_output++ = char32_t(value);
+        }
+      }
+      buf += k;
+    }
+  } // while
+  return std::make_pair(buf, reinterpret_cast<char32_t *>(utf32_output));
+}
+
+/*
+  Returns a pair: a result struct and utf32_output.
+  If there is an error, the count field of the result is the position of the
+  error. Otherwise, it is the position of the first unprocessed code unit in
+  buf (even if finished). A scalar routine should carry on the conversion of
+  the tail if needed.
+*/
+template <endianness big_endian>
+std::pair<result, char32_t *>
+arm_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
+                                       char32_t *utf32_out) {
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
+
+  const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
+  const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
+
+  while ((end - buf) >= 8) {
+    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
+    if (!match_system(big_endian)) {
+      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
+    }
+
+    const uint16x8_t surrogates_bytemask =
+        vceqq_u16(vandq_u16(in, v_f800), v_d800);
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
+    if (vmaxvq_u16(surrogates_bytemask) == 0) {
+      // case: no surrogate pairs, extend all 16-bit code units to 32-bit code
+      // units
+      vst1q_u32(utf32_output, vmovl_u16(vget_low_u16(in)));
+      vst1q_u32(utf32_output + 4, vmovl_high_u16(in));
+      utf32_output += 8;
+      buf += 8;
+      // surrogate pair(s) in a register
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if ((word & 0xF800) != 0xD800) {
+          *utf32_output++ = char32_t(word);
+        } else {
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k - 1),
+                reinterpret_cast<char32_t *>(utf32_output));
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf32_output++ = char32_t(value);
+        }
+      }
+      buf += k;
+    }
+  } // while
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char32_t *>(utf32_output));
+}
+/* end file src/arm64/arm_convert_utf16_to_utf32.cpp */
+/* begin file src/arm64/arm_convert_utf16_to_utf8.cpp */
+/*
+    The vectorized algorithm works on a single NEON register, i.e., it
+    loads eight 16-bit code units.
+
+    We consider three cases:
+    1. an input register contains no surrogates and each value
+       is in range 0x0000 .. 0x07ff.
+    2. an input register contains no surrogates and values are
+       in range 0x0000 .. 0xffff.
+    3. an input register contains surrogates --- i.e. codepoints
+       can have 16 or 32 bits.
+
+    Ad 1.
+
+    When values are less than 0x0800, it means that a 16-bit code unit
+    can be converted into: 1) single UTF8 byte (when it is an ASCII
+    char) or 2) two UTF8 bytes.
+
+    For this case we do only some shuffle to obtain these 2-byte
+    codes and finally compress the whole NEON register with a single
+    shuffle.
+
+    We need a 256-entry lookup table to get a compression pattern
+    and the number of output bytes in the compressed vector register.
+    Each entry occupies 17 bytes.
+
+    Ad 2.
+
+    When values fit in 16-bit code units, but are above 0x07ff, then
+    a single word may produce one, two or three UTF8 bytes.
+
+    We prepare data for all these three cases in two registers.
+    The first register contains lower two UTF8 bytes (used in all
+    cases), while the second one contains just the third byte for
+    the three-UTF8-bytes case.
+
+    Finally these two registers are interleaved, forming an eight-element
+    array of 32-bit values. The array spans two NEON registers.
+    The bytes from the registers are compressed using two shuffles.
+
+    We need a 256-entry lookup table to get a compression pattern
+    and the number of output bytes in the compressed vector register.
+    Each entry occupies 17 bytes.
+
+
+    To summarize:
+    - We need two 256-entry tables that have 8704 bytes in total.
+*/
+/*
+  Returns a pair: the first unprocessed code unit from buf and utf8_output.
+  A scalar routine should carry on the conversion of the tail.
+*/
+template <endianness big_endian>
+std::pair<const char16_t *, char *>
+arm_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char16_t *end = buf + len;
+
+  const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
+  const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
+  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
+    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
+    if (!match_system(big_endian)) {
+      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
+    }
+    if (vmaxvq_u16(in) <= 0x7F) { // ASCII fast path!!!!
+      // It is common enough that we have sequences of 16 consecutive ASCII
+      // characters.
+      uint16x8_t nextin =
+          vld1q_u16(reinterpret_cast<const uint16_t *>(buf) + 8);
+      if (!match_system(big_endian)) {
+        nextin = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(nextin)));
+      }
+      if (vmaxvq_u16(nextin) > 0x7F) {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        uint8x8_t utf8_packed = vmovn_u16(in);
+        // 2. store (8 bytes)
+        vst1_u8(utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        in = nextin;
+      } else {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        uint8x16_t utf8_packed = vmovn_high_u16(vmovn_u16(in), nextin);
+        // 2. store (16 bytes)
+        vst1q_u8(utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 16;
+        utf8_output += 16;
+        continue; // we are done for this round!
+      }
+    }
+
+    if (vmaxvq_u16(in) <= 0x7FF) {
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+      // expected output   : [110a|aaaa|10bb|bbbb] x 8
+      const uint16x8_t v_1f00 = vmovq_n_u16((uint16_t)0x1f00);
+      const uint16x8_t v_003f = vmovq_n_u16((uint16_t)0x003f);
+
+      // t0 = [000a|aaaa|bbbb|bb00]
+      const uint16x8_t t0 = vshlq_n_u16(in, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      const uint16x8_t t1 = vandq_u16(t0, v_1f00);
+      // t2 = [0000|0000|00bb|bbbb]
+      const uint16x8_t t2 = vandq_u16(in, v_003f);
+      // t3 = [000a|aaaa|00bb|bbbb]
+      const uint16x8_t t3 = vorrq_u16(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      const uint16x8_t t4 = vorrq_u16(t3, v_c080);
+      // 2. merge ASCII and 2-byte codewords
+      const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+      const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
+      const uint8x16_t utf8_unpacked =
+          vreinterpretq_u8_u16(vbslq_u16(one_byte_bytemask, in, t4));
+      // 3. prepare bitmask for 8-bit lookup
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+      const uint16x8_t mask = simdutf_make_uint16x8_t(
+          0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
+#else
+      const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
+                               0x0002, 0x0008, 0x0020, 0x0080};
+#endif
+      uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
+      // 4. pack the bytes
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
+      const uint8x16_t shuffle = vld1q_u8(row + 1);
+      const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
+
+      // 5. store bytes
+      vst1q_u8(utf8_output, utf8_packed);
+
+      // 6. adjust pointers
+      buf += 8;
+      utf8_output += row[0];
+      continue;
+    }
+    const uint16x8_t surrogates_bytemask =
+        vceqq_u16(vandq_u16(in, v_f800), v_d800);
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
+    if (vmaxvq_u16(surrogates_bytemask) == 0) {
+      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+      const uint16x8_t dup_even = simdutf_make_uint16x8_t(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+#else
+      const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
+                                   0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
+#endif
+      /* In this branch we handle three cases:
+         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]
+            - single UTF-8 byte
+         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]
+            - two UTF-8 bytes
+         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc]
+            - three UTF-8 bytes
+
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
+
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
+
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
+
+        Finally, from these two code units we build a proper UTF-8 sequence,
+        taking into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
+#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const uint16x8_t t0 = vreinterpretq_u16_u8(
+          vqtbl1q_u8(vreinterpretq_u8_u16(in), vreinterpretq_u8_u16(dup_even)));
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const uint16x8_t t2 = vorrq_u16(t1, simdutf_vec(0b1000000000000000));
+
+      // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+      const uint16x8_t s0 = vshrq_n_u16(in, 12);
+      // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
+      const uint16x8_t s1 = vandq_u16(in, simdutf_vec(0b0000111111000000));
+      // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
+      const uint16x8_t s1s = vshlq_n_u16(s1, 2);
+      // [00bb|bbbb|0000|aaaa]
+      const uint16x8_t s2 = vorrq_u16(s0, s1s);
+      // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
+      const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
+      const uint16x8_t one_or_two_bytes_bytemask = vcleq_u16(in, v_07ff);
+      const uint16x8_t m0 =
+          vbicq_u16(simdutf_vec(0b0100000000000000), one_or_two_bytes_bytemask);
+      const uint16x8_t s4 = veorq_u16(s3, m0);
+#undef simdutf_vec
+
+      // 4. expand code units 16-bit => 32-bit
+      const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
+      const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
+
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+      const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+      const uint16x8_t onemask = simdutf_make_uint16x8_t(
+          0x0001, 0x0004, 0x0010, 0x0040, 0x0100, 0x0400, 0x1000, 0x4000);
+      const uint16x8_t twomask = simdutf_make_uint16x8_t(
+          0x0002, 0x0008, 0x0020, 0x0080, 0x0200, 0x0800, 0x2000, 0x8000);
+#else
+      const uint16x8_t onemask = {0x0001, 0x0004, 0x0010, 0x0040,
+                                  0x0100, 0x0400, 0x1000, 0x4000};
+      const uint16x8_t twomask = {0x0002, 0x0008, 0x0020, 0x0080,
+                                  0x0200, 0x0800, 0x2000, 0x8000};
+#endif
+      const uint16x8_t combined =
+          vorrq_u16(vandq_u16(one_byte_bytemask, onemask),
+                    vandq_u16(one_or_two_bytes_bytemask, twomask));
+      const uint16_t mask = vaddvq_u16(combined);
+      // The following fast path may or may not be beneficial.
+      /*if(mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
+        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
+        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
+        vst1q_u8(utf8_output, utf8_0);
+        utf8_output += 12;
+        vst1q_u8(utf8_output, utf8_1);
+        utf8_output += 12;
+        buf += 8;
+        continue;
+      }*/
+      const uint8_t mask0 = uint8_t(mask);
+
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
+      const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
+
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
+      const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
+
+      vst1q_u8(utf8_output, utf8_0);
+      utf8_output += row0[0];
+      vst1q_u8(utf8_output, utf8_1);
+      utf8_output += row1[0];
+
+      buf += 8;
+      // surrogate pair(s) in a register
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if ((word & 0xFF80) == 0) {
+          *utf8_output++ = char(word);
+        } else if ((word & 0xF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xF800) != 0xD800) {
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else {
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char *>(utf8_output));
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf8_output++ = char((value >> 18) | 0b11110000);
+          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((value & 0b111111) | 0b10000000);
+        }
+      }
+      buf += k;
+    }
+  } // while
+
+  return std::make_pair(buf, reinterpret_cast<char *>(utf8_output));
+}
+
+/*
+  Returns a pair: a result struct and utf8_output.
+  If there is an error, the count field of the result is the position of the
+  error. Otherwise, it is the position of the first unprocessed code unit in
+  buf (even if finished). A scalar routine should carry on the conversion of
+  the tail if needed.
+*/
+template <endianness big_endian>
+std::pair<result, char *>
+arm_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
+                                      char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
+
+  const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
+  const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
+  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+
+  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
+    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
+    if (!match_system(big_endian)) {
+      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
+    }
+    if (vmaxvq_u16(in) <= 0x7F) { // ASCII fast path!!!!
+      // It is common enough that we have sequences of 16 consecutive ASCII
+      // characters.
+      uint16x8_t nextin =
+          vld1q_u16(reinterpret_cast<const uint16_t *>(buf) + 8);
+      if (!match_system(big_endian)) {
+        nextin = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(nextin)));
+      }
+      if (vmaxvq_u16(nextin) > 0x7F) {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        uint8x8_t utf8_packed = vmovn_u16(in);
+        // 2. store (8 bytes)
+        vst1_u8(utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        in = nextin;
+      } else {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        uint8x16_t utf8_packed = vmovn_high_u16(vmovn_u16(in), nextin);
+        // 2. store (16 bytes)
+        vst1q_u8(utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 16;
+        utf8_output += 16;
+        continue; // we are done for this round!
+      }
+    }
+
+    if (vmaxvq_u16(in) <= 0x7FF) {
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+      // expected output   : [110a|aaaa|10bb|bbbb] x 8
+      const uint16x8_t v_1f00 = vmovq_n_u16((uint16_t)0x1f00);
+      const uint16x8_t v_003f = vmovq_n_u16((uint16_t)0x003f);
+
+      // t0 = [000a|aaaa|bbbb|bb00]
+      const uint16x8_t t0 = vshlq_n_u16(in, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      const uint16x8_t t1 = vandq_u16(t0, v_1f00);
+      // t2 = [0000|0000|00bb|bbbb]
+      const uint16x8_t t2 = vandq_u16(in, v_003f);
+      // t3 = [000a|aaaa|00bb|bbbb]
+      const uint16x8_t t3 = vorrq_u16(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      const uint16x8_t t4 = vorrq_u16(t3, v_c080);
+      // 2. merge ASCII and 2-byte codewords
+      const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+      const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
+      const uint8x16_t utf8_unpacked =
+          vreinterpretq_u8_u16(vbslq_u16(one_byte_bytemask, in, t4));
+      // 3. prepare bitmask for 8-bit lookup
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+      const uint16x8_t mask = simdutf_make_uint16x8_t(
+          0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
+#else
+      const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
+                               0x0002, 0x0008, 0x0020, 0x0080};
+#endif
+      uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
+      // 4. pack the bytes
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
+      const uint8x16_t shuffle = vld1q_u8(row + 1);
+      const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
+
+      // 5. store bytes
+      vst1q_u8(utf8_output, utf8_packed);
+
+      // 6. adjust pointers
+      buf += 8;
+      utf8_output += row[0];
+      continue;
+    }
+    const uint16x8_t surrogates_bytemask =
+        vceqq_u16(vandq_u16(in, v_f800), v_d800);
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
+    if (vmaxvq_u16(surrogates_bytemask) == 0) {
+      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+      const uint16x8_t dup_even = simdutf_make_uint16x8_t(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+#else
+      const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
+                                   0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
+#endif
+      /* In this branch we handle three cases:
+         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]
+            - single UTF-8 byte
+         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]
+            - two UTF-8 bytes
+         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc]
+            - three UTF-8 bytes
+
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
+
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
+
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
+
+        Finally, from these two code units we build a proper UTF-8 sequence,
+        taking into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
+#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const uint16x8_t t0 = vreinterpretq_u16_u8(
+          vqtbl1q_u8(vreinterpretq_u8_u16(in), vreinterpretq_u8_u16(dup_even)));
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const uint16x8_t t2 = vorrq_u16(t1, simdutf_vec(0b1000000000000000));
+
+      // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+      const uint16x8_t s0 = vshrq_n_u16(in, 12);
+      // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
+      const uint16x8_t s1 = vandq_u16(in, simdutf_vec(0b0000111111000000));
+      // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
+      const uint16x8_t s1s = vshlq_n_u16(s1, 2);
+      // [00bb|bbbb|0000|aaaa]
+      const uint16x8_t s2 = vorrq_u16(s0, s1s);
+      // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
+      const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
+      const uint16x8_t one_or_two_bytes_bytemask = vcleq_u16(in, v_07ff);
+      const uint16x8_t m0 =
+          vbicq_u16(simdutf_vec(0b0100000000000000), one_or_two_bytes_bytemask);
+      const uint16x8_t s4 = veorq_u16(s3, m0);
+#undef simdutf_vec
+
+      // 4. expand code units 16-bit => 32-bit
+      const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
+      const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
+
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+      const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+      const uint16x8_t onemask = simdutf_make_uint16x8_t(
+          0x0001, 0x0004, 0x0010, 0x0040, 0x0100, 0x0400, 0x1000, 0x4000);
+      const uint16x8_t twomask = simdutf_make_uint16x8_t(
+          0x0002, 0x0008, 0x0020, 0x0080, 0x0200, 0x0800, 0x2000, 0x8000);
+#else
+      const uint16x8_t onemask = {0x0001, 0x0004, 0x0010, 0x0040,
+                                  0x0100, 0x0400, 0x1000, 0x4000};
+      const uint16x8_t twomask = {0x0002, 0x0008, 0x0020, 0x0080,
+                                  0x0200, 0x0800, 0x2000, 0x8000};
+#endif
+      const uint16x8_t combined =
+          vorrq_u16(vandq_u16(one_byte_bytemask, onemask),
+                    vandq_u16(one_or_two_bytes_bytemask, twomask));
+      const uint16_t mask = vaddvq_u16(combined);
+      // The following fast path may or may not be beneficial.
+      /*if(mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
+        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
+        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
+        vst1q_u8(utf8_output, utf8_0);
+        utf8_output += 12;
+        vst1q_u8(utf8_output, utf8_1);
+        utf8_output += 12;
+        buf += 8;
+        continue;
+      }*/
+      const uint8_t mask0 = uint8_t(mask);
+
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
+      const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
+
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
+      const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
+
+      vst1q_u8(utf8_output, utf8_0);
+      utf8_output += row0[0];
+      vst1q_u8(utf8_output, utf8_1);
+      utf8_output += row1[0];
+
+      buf += 8;
+      // surrogate pair(s) in a register
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if ((word & 0xFF80) == 0) {
+          *utf8_output++ = char(word);
+        } else if ((word & 0xF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xF800) != 0xD800) {
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else {
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k - 1),
+                reinterpret_cast<char *>(utf8_output));
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf8_output++ = char((value >> 18) | 0b11110000);
+          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((value & 0b111111) | 0b10000000);
+        }
+      }
+      buf += k;
+    }
+  } // while
+
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char *>(utf8_output));
+}
+/* end file src/arm64/arm_convert_utf16_to_utf8.cpp */
+
+/* begin file src/arm64/arm_base64.cpp */
+/**
+ * References and further reading:
+ *
+ * Wojciech Muła, Daniel Lemire, Base64 encoding and decoding at almost the
+ * speed of a memory copy, Software: Practice and Experience 50 (2), 2020.
+ * https://arxiv.org/abs/1910.05109
+ *
+ * Wojciech Muła, Daniel Lemire, Faster Base64 Encoding and Decoding using AVX2
+ * Instructions, ACM Transactions on the Web 12 (3), 2018.
+ * https://arxiv.org/abs/1704.00605
+ *
+ * Simon Josefsson. 2006. The Base16, Base32, and Base64 Data Encodings.
+ * https://tools.ietf.org/html/rfc4648. (2006). Internet Engineering Task Force,
+ * Request for Comments: 4648.
+ *
+ * Alfred Klomp. 2014a. Fast Base64 encoding/decoding with SSE vectorization.
+ * http://www.alfredklomp.com/programming/sse-base64/. (2014).
+ *
+ * Alfred Klomp. 2014b. Fast Base64 stream encoder/decoder in C99, with SIMD
+ * acceleration. https://github.com/aklomp/base64. (2014).
+ *
+ * Hanson Char. 2014. A Fast and Correct Base 64 Codec. (2014).
+ * https://aws.amazon.com/blogs/developer/a-fast-and-correct-base-64-codec/
+ *
+ * Nick Kopp. 2013. Base64 Encoding on a GPU.
+ * https://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU. (2013).
+ */
+
+size_t encode_base64(char *dst, const char *src, size_t srclen,
+                     base64_options options) {
+  // credit: Wojciech Muła
+  uint8_t *out = (uint8_t *)dst;
+  constexpr static uint8_t source_table[64] = {
+      'A', 'Q', 'g', 'w', 'B', 'R', 'h', 'x', 'C', 'S', 'i', 'y', 'D',
+      'T', 'j', 'z', 'E', 'U', 'k', '0', 'F', 'V', 'l', '1', 'G', 'W',
+      'm', '2', 'H', 'X', 'n', '3', 'I', 'Y', 'o', '4', 'J', 'Z', 'p',
+      '5', 'K', 'a', 'q', '6', 'L', 'b', 'r', '7', 'M', 'c', 's', '8',
+      'N', 'd', 't', '9', 'O', 'e', 'u', '+', 'P', 'f', 'v', '/',
+  };
+  constexpr static uint8_t source_table_url[64] = {
+      'A', 'Q', 'g', 'w', 'B', 'R', 'h', 'x', 'C', 'S', 'i', 'y', 'D',
+      'T', 'j', 'z', 'E', 'U', 'k', '0', 'F', 'V', 'l', '1', 'G', 'W',
+      'm', '2', 'H', 'X', 'n', '3', 'I', 'Y', 'o', '4', 'J', 'Z', 'p',
+      '5', 'K', 'a', 'q', '6', 'L', 'b', 'r', '7', 'M', 'c', 's', '8',
+      'N', 'd', 't', '9', 'O', 'e', 'u', '-', 'P', 'f', 'v', '_',
+  };
+  const uint8x16_t v3f = vdupq_n_u8(0x3f);
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  // When trying to load a uint8_t array, Visual Studio might
+  // error with: error C2664: '__n128x4 neon_ld4m_q8(const char *)':
+  // cannot convert argument 1 from 'const uint8_t [64]' to 'const char *'
+  const uint8x16x4_t table = vld4q_u8(
+      reinterpret_cast<const char *>((options & base64_url) ? source_table_url
+                                                            : source_table));
+#else
+  const uint8x16x4_t table =
+      vld4q_u8((options & base64_url) ? source_table_url : source_table);
+#endif
+  size_t i = 0;
+  for (; i + 16 * 3 <= srclen; i += 16 * 3) {
+    const uint8x16x3_t in = vld3q_u8((const uint8_t *)src + i);
+    uint8x16x4_t result;
+    result.val[0] = vshrq_n_u8(in.val[0], 2);
+    result.val[1] =
+        vandq_u8(vsliq_n_u8(vshrq_n_u8(in.val[1], 4), in.val[0], 4), v3f);
+    result.val[2] =
+        vandq_u8(vsliq_n_u8(vshrq_n_u8(in.val[2], 6), in.val[1], 2), v3f);
+    result.val[3] = vandq_u8(in.val[2], v3f);
+    result.val[0] = vqtbl4q_u8(table, result.val[0]);
+    result.val[1] = vqtbl4q_u8(table, result.val[1]);
+    result.val[2] = vqtbl4q_u8(table, result.val[2]);
+    result.val[3] = vqtbl4q_u8(table, result.val[3]);
+    vst4q_u8(out, result);
+    out += 64;
+  }
+  out += scalar::base64::tail_encode_base64((char *)out, src + i, srclen - i,
+                                            options);
+
+  return size_t((char *)out - dst);
+}
+
+static inline void compress(uint8x16_t data, uint16_t mask, char *output) {
+  if (mask == 0) {
+    vst1q_u8((uint8_t *)output, data);
+    return;
+  }
+  uint8_t mask1 = uint8_t(mask);      // least significant 8 bits
+  uint8_t mask2 = uint8_t(mask >> 8); // most significant 8 bits
+  uint64x2_t compactmasku64 = {tables::base64::thintable_epi8[mask1],
+                               tables::base64::thintable_epi8[mask2]};
+  uint8x16_t compactmask = vreinterpretq_u8_u64(compactmasku64);
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  const uint8x16_t off =
+      simdutf_make_uint8x16_t(0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8);
+#else
+  const uint8x16_t off = {0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8};
+#endif
+
+  compactmask = vaddq_u8(compactmask, off);
+  uint8x16_t pruned = vqtbl1q_u8(data, compactmask);
+
+  int pop1 = tables::base64::BitsSetTable256mul2[mask1];
+  // Then load the corresponding mask. It writes only the first
+  // pop1 bytes from the first 8 bytes, and then it fills in with
+  // the bytes from the second 8 bytes, plus some filling
+  // at the end.
+  compactmask = vld1q_u8(tables::base64::pshufb_combine_table + pop1 * 8);
+  uint8x16_t answer = vqtbl1q_u8(pruned, compactmask);
+  vst1q_u8((uint8_t *)output, answer);
+}
+
+struct block64 {
+  uint8x16_t chunks[4];
+};
+
+static_assert(sizeof(block64) == 64, "block64 is not 64 bytes");
+template <bool base64_url> uint64_t to_base64_mask(block64 *b, bool *error) {
+  uint8x16_t v0f = vdupq_n_u8(0xf);
+
+  uint8x16_t underscore0, underscore1, underscore2, underscore3;
+  if (base64_url) {
+    underscore0 = vceqq_u8(b->chunks[0], vdupq_n_u8(0x5f));
+    underscore1 = vceqq_u8(b->chunks[1], vdupq_n_u8(0x5f));
+    underscore2 = vceqq_u8(b->chunks[2], vdupq_n_u8(0x5f));
+    underscore3 = vceqq_u8(b->chunks[3], vdupq_n_u8(0x5f));
+  } else {
+    (void)underscore0;
+    (void)underscore1;
+    (void)underscore2;
+    (void)underscore3;
+  }
+
+  uint8x16_t lo_nibbles0 = vandq_u8(b->chunks[0], v0f);
+  uint8x16_t lo_nibbles1 = vandq_u8(b->chunks[1], v0f);
+  uint8x16_t lo_nibbles2 = vandq_u8(b->chunks[2], v0f);
+  uint8x16_t lo_nibbles3 = vandq_u8(b->chunks[3], v0f);
+
+  // Needed by the decoding step.
+  uint8x16_t hi_nibbles0 = vshrq_n_u8(b->chunks[0], 4);
+  uint8x16_t hi_nibbles1 = vshrq_n_u8(b->chunks[1], 4);
+  uint8x16_t hi_nibbles2 = vshrq_n_u8(b->chunks[2], 4);
+  uint8x16_t hi_nibbles3 = vshrq_n_u8(b->chunks[3], 4);
+  uint8x16_t lut_lo;
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  if (base64_url) {
+    lut_lo =
+        simdutf_make_uint8x16_t(0x3a, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70,
+                                0x70, 0x61, 0xe1, 0xf4, 0xe5, 0xa5, 0xf4, 0xf4);
+  } else {
+    lut_lo =
+        simdutf_make_uint8x16_t(0x3a, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70,
+                                0x70, 0x61, 0xe1, 0xb4, 0xe5, 0xe5, 0xf4, 0xb4);
+  }
+#else
+  if (base64_url) {
+    lut_lo = uint8x16_t{0x3a, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70,
+                        0x70, 0x61, 0xe1, 0xf4, 0xe5, 0xa5, 0xf4, 0xf4};
+  } else {
+    lut_lo = uint8x16_t{0x3a, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70,
+                        0x70, 0x61, 0xe1, 0xb4, 0xe5, 0xe5, 0xf4, 0xb4};
+  }
+#endif
+  uint8x16_t lo0 = vqtbl1q_u8(lut_lo, lo_nibbles0);
+  uint8x16_t lo1 = vqtbl1q_u8(lut_lo, lo_nibbles1);
+  uint8x16_t lo2 = vqtbl1q_u8(lut_lo, lo_nibbles2);
+  uint8x16_t lo3 = vqtbl1q_u8(lut_lo, lo_nibbles3);
+  uint8x16_t lut_hi;
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  if (base64_url) {
+    lut_hi =
+        simdutf_make_uint8x16_t(0x11, 0x20, 0x42, 0x80, 0x8, 0x4, 0x8, 0x4,
+                                0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20);
+  } else {
+    lut_hi =
+        simdutf_make_uint8x16_t(0x11, 0x20, 0x42, 0x80, 0x8, 0x4, 0x8, 0x4,
+                                0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20);
+  }
+#else
+  if (base64_url) {
+    lut_hi = uint8x16_t{0x11, 0x20, 0x42, 0x80, 0x8,  0x4,  0x8,  0x4,
+                        0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20};
+  } else {
+    lut_hi = uint8x16_t{0x11, 0x20, 0x42, 0x80, 0x8,  0x4,  0x8,  0x4,
+                        0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20};
+  }
+#endif
+  uint8x16_t hi0 = vqtbl1q_u8(lut_hi, hi_nibbles0);
+  uint8x16_t hi1 = vqtbl1q_u8(lut_hi, hi_nibbles1);
+  uint8x16_t hi2 = vqtbl1q_u8(lut_hi, hi_nibbles2);
+  uint8x16_t hi3 = vqtbl1q_u8(lut_hi, hi_nibbles3);
+
+  if (base64_url) {
+    hi0 = vbicq_u8(hi0, underscore0);
+    hi1 = vbicq_u8(hi1, underscore1);
+    hi2 = vbicq_u8(hi2, underscore2);
+    hi3 = vbicq_u8(hi3, underscore3);
+  }
+
+  uint8_t checks =
+      vmaxvq_u8(vorrq_u8(vorrq_u8(vandq_u8(lo0, hi0), vandq_u8(lo1, hi1)),
+                         vorrq_u8(vandq_u8(lo2, hi2), vandq_u8(lo3, hi3))));
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  const uint8x16_t bit_mask =
+      simdutf_make_uint8x16_t(0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
+                              0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80);
+#else
+  const uint8x16_t bit_mask = {0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
+                               0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
+#endif
+  uint64_t badcharmask = 0;
+  *error = checks > 0x3;
+  if (checks) {
+    // Add each of the elements next to each other, successively, to stuff each
+    // 8 byte mask into one.
+    uint8x16_t test0 = vtstq_u8(lo0, hi0);
+    uint8x16_t test1 = vtstq_u8(lo1, hi1);
+    uint8x16_t test2 = vtstq_u8(lo2, hi2);
+    uint8x16_t test3 = vtstq_u8(lo3, hi3);
+    uint8x16_t sum0 =
+        vpaddq_u8(vandq_u8(test0, bit_mask), vandq_u8(test1, bit_mask));
+    uint8x16_t sum1 =
+        vpaddq_u8(vandq_u8(test2, bit_mask), vandq_u8(test3, bit_mask));
+    sum0 = vpaddq_u8(sum0, sum1);
+    sum0 = vpaddq_u8(sum0, sum0);
+    badcharmask = vgetq_lane_u64(vreinterpretq_u64_u8(sum0), 0);
+  }
+  // This is the transformation step that can be done while we are waiting for
+  // sum0
+  uint8x16_t roll_lut;
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+  if (base64_url) {
+    roll_lut =
+        simdutf_make_uint8x16_t(0xe0, 0x11, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
+                                0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0);
+  } else {
+    roll_lut =
+        simdutf_make_uint8x16_t(0x0, 0x10, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
+                                0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0);
+  }
+#else
+  if (base64_url) {
+    roll_lut = uint8x16_t{0xe0, 0x11, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
+                          0x0,  0x0,  0x0,  0x0, 0x0,  0x0,  0x0,  0x0};
+  } else {
+    roll_lut = uint8x16_t{0x0, 0x10, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
+                          0x0, 0x0,  0x0,  0x0, 0x0,  0x0,  0x0,  0x0};
+  }
+#endif
+  uint8x16_t vsecond_last = base64_url ? vdupq_n_u8(0x2d) : vdupq_n_u8(0x2f);
+  if (base64_url) {
+    hi_nibbles0 = vbicq_u8(hi_nibbles0, underscore0);
+    hi_nibbles1 = vbicq_u8(hi_nibbles1, underscore1);
+    hi_nibbles2 = vbicq_u8(hi_nibbles2, underscore2);
+    hi_nibbles3 = vbicq_u8(hi_nibbles3, underscore3);
+  }
+  uint8x16_t roll0 = vqtbl1q_u8(
+      roll_lut, vaddq_u8(vceqq_u8(b->chunks[0], vsecond_last), hi_nibbles0));
+  uint8x16_t roll1 = vqtbl1q_u8(
+      roll_lut, vaddq_u8(vceqq_u8(b->chunks[1], vsecond_last), hi_nibbles1));
+  uint8x16_t roll2 = vqtbl1q_u8(
+      roll_lut, vaddq_u8(vceqq_u8(b->chunks[2], vsecond_last), hi_nibbles2));
+  uint8x16_t roll3 = vqtbl1q_u8(
+      roll_lut, vaddq_u8(vceqq_u8(b->chunks[3], vsecond_last), hi_nibbles3));
+  b->chunks[0] = vaddq_u8(b->chunks[0], roll0);
+  b->chunks[1] = vaddq_u8(b->chunks[1], roll1);
+  b->chunks[2] = vaddq_u8(b->chunks[2], roll2);
+  b->chunks[3] = vaddq_u8(b->chunks[3], roll3);
+  return badcharmask;
+}
+
+void copy_block(block64 *b, char *output) {
+  vst1q_u8((uint8_t *)output, b->chunks[0]);
+  vst1q_u8((uint8_t *)output + 16, b->chunks[1]);
+  vst1q_u8((uint8_t *)output + 32, b->chunks[2]);
+  vst1q_u8((uint8_t *)output + 48, b->chunks[3]);
+}
+
+uint64_t compress_block(block64 *b, uint64_t mask, char *output) {
+  uint64_t popcounts =
+      vget_lane_u64(vreinterpret_u64_u8(vcnt_u8(vcreate_u8(~mask))), 0);
+  uint64_t offsets = popcounts * 0x0101010101010101;
+  compress(b->chunks[0], uint16_t(mask), output);
+  compress(b->chunks[1], uint16_t(mask >> 16), &output[(offsets >> 8) & 0xFF]);
+  compress(b->chunks[2], uint16_t(mask >> 32), &output[(offsets >> 24) & 0xFF]);
+  compress(b->chunks[3], uint16_t(mask >> 48), &output[(offsets >> 40) & 0xFF]);
+  return offsets >> 56;
+}
+
+// The caller of this function is responsible for ensuring that 64 bytes are
+// available for reading at src. The data is read into a block64 structure.
+void load_block(block64 *b, const char *src) {
+  b->chunks[0] = vld1q_u8(reinterpret_cast<const uint8_t *>(src));
+  b->chunks[1] = vld1q_u8(reinterpret_cast<const uint8_t *>(src) + 16);
+  b->chunks[2] = vld1q_u8(reinterpret_cast<const uint8_t *>(src) + 32);
+  b->chunks[3] = vld1q_u8(reinterpret_cast<const uint8_t *>(src) + 48);
+}
+
+// The caller of this function is responsible for ensuring that 32 bytes are
+// available for reading at data. It returns a 16-byte value, narrowing the
+// 16-bit words with saturation.
+inline uint8x16_t load_satured(const uint16_t *data) {
+  uint16x8_t in1 = vld1q_u16(data);
+  uint16x8_t in2 = vld1q_u16(data + 8);
+  return vqmovn_high_u16(vqmovn_u16(in1), in2);
+}
+
+// The caller of this function is responsible for ensuring that 128 bytes are
+// available for reading at src. The data is read into a block64 structure.
+void load_block(block64 *b, const char16_t *src) {
+  b->chunks[0] = load_satured(reinterpret_cast<const uint16_t *>(src));
+  b->chunks[1] = load_satured(reinterpret_cast<const uint16_t *>(src) + 16);
+  b->chunks[2] = load_satured(reinterpret_cast<const uint16_t *>(src) + 32);
+  b->chunks[3] = load_satured(reinterpret_cast<const uint16_t *>(src) + 48);
+}
+
+// decode 64 bytes and output 48 bytes
+void base64_decode_block(char *out, const char *src) {
+  uint8x16x4_t str = vld4q_u8((uint8_t *)src);
+  uint8x16x3_t outvec;
+  outvec.val[0] =
+      vorrq_u8(vshlq_n_u8(str.val[0], 2), vshrq_n_u8(str.val[1], 4));
+  outvec.val[1] =
+      vorrq_u8(vshlq_n_u8(str.val[1], 4), vshrq_n_u8(str.val[2], 2));
+  outvec.val[2] = vorrq_u8(vshlq_n_u8(str.val[2], 6), str.val[3]);
+  vst3q_u8((uint8_t *)out, outvec);
+}
+
+template <bool base64_url, typename char_type>
+full_result
+compress_decode_base64(char *dst, const char_type *src, size_t srclen,
+                       base64_options options,
+                       last_chunk_handling_options last_chunk_options) {
+  const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
+                                        : tables::base64::to_base64_value;
+  size_t equallocation =
+      srclen; // location of the first padding character if any
+  // skip trailing spaces
+  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+         to_base64[uint8_t(src[srclen - 1])] == 64) {
+    srclen--;
+  }
+  size_t equalsigns = 0;
+  if (srclen > 0 && src[srclen - 1] == '=') {
+    equallocation = srclen - 1;
+    srclen--;
+    equalsigns = 1;
+    // skip trailing spaces
+    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+           to_base64[uint8_t(src[srclen - 1])] == 64) {
+      srclen--;
+    }
+    if (srclen > 0 && src[srclen - 1] == '=') {
+      equallocation = srclen - 1;
+      srclen--;
+      equalsigns = 2;
+    }
+  }
+  if (srclen == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+    }
+    return {SUCCESS, 0, 0};
+  }
+  const char_type *const srcinit = src;
+  const char *const dstinit = dst;
+  const char_type *const srcend = src + srclen;
+
+  constexpr size_t block_size = 10;
+  char buffer[block_size * 64];
+  char *bufferptr = buffer;
+  if (srclen >= 64) {
+    const char_type *const srcend64 = src + srclen - 64;
+    while (src <= srcend64) {
+      block64 b;
+      load_block(&b, src);
+      src += 64;
+      bool error = false;
+      uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
+      if (badcharmask) {
+        if (error) {
+          src -= 64;
+          while (src < srcend && scalar::base64::is_eight_byte(*src) &&
+                 to_base64[uint8_t(*src)] <= 64) {
+            src++;
+          }
+          if (src < srcend) {
+            // should never happen
+          }
+          return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                  size_t(dst - dstinit)};
+        }
+      }
+
+      if (badcharmask != 0) {
+        // optimization opportunity: check for simple masks like those made of
+        // continuous 1s followed by continuous 0s. And masks containing a
+        // single bad character.
+        bufferptr += compress_block(&b, badcharmask, bufferptr);
+      } else {
+        // optimization opportunity: if bufferptr == buffer and mask == 0, we
+        // can avoid the call to compress_block and decode directly.
+        copy_block(&b, bufferptr);
+        bufferptr += 64;
+      }
+      if (bufferptr >= (block_size - 1) * 64 + buffer) {
+        for (size_t i = 0; i < (block_size - 1); i++) {
+          base64_decode_block(dst, buffer + i * 64);
+          dst += 48;
+        }
+        std::memcpy(buffer, buffer + (block_size - 1) * 64,
+                    64); // 64 might be too much
+        bufferptr -= (block_size - 1) * 64;
+      }
+    }
+  }
+  char *buffer_start = buffer;
+  // Optimization note: if this is almost full, then it is worth our
+  // time, otherwise, we should just decode directly.
+  int last_block = (int)((bufferptr - buffer_start) % 64);
+  if (last_block != 0 && srcend - src + last_block >= 64) {
+    while ((bufferptr - buffer_start) % 64 != 0 && src < srcend) {
+      uint8_t val = to_base64[uint8_t(*src)];
+      *bufferptr = char(val);
+      if (!scalar::base64::is_eight_byte(*src) || val > 64) {
+        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
+      }
+      bufferptr += (val <= 63);
+      src++;
+    }
+  }
+
+  for (; buffer_start + 64 <= bufferptr; buffer_start += 64) {
+    base64_decode_block(dst, buffer_start);
+    dst += 48;
+  }
+  if ((bufferptr - buffer_start) % 64 != 0) {
+    while (buffer_start + 4 < bufferptr) {
+      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
+                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
+                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
+                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
+                        << 8;
+      triple = scalar::utf32::swap_bytes(triple);
+      std::memcpy(dst, &triple, 4);
+
+      dst += 3;
+      buffer_start += 4;
+    }
+    if (buffer_start + 4 <= bufferptr) {
+      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
+                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
+                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
+                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
+                        << 8;
+      triple = scalar::utf32::swap_bytes(triple);
+      std::memcpy(dst, &triple, 3);
+
+      dst += 3;
+      buffer_start += 4;
+    }
+    // we may have 1, 2 or 3 bytes left and we need to decode them so let us
+    // backtrack
+    int leftover = int(bufferptr - buffer_start);
+    while (leftover > 0) {
+      while (to_base64[uint8_t(*(src - 1))] == 64) {
+        src--;
+      }
+      src--;
+      leftover--;
+    }
+  }
+  if (src < srcend + equalsigns) {
+    full_result r = scalar::base64::base64_tail_decode(
+        dst, src, srcend - src, equalsigns, options, last_chunk_options);
+    r.input_count += size_t(src - srcinit);
+    if (r.error == error_code::INVALID_BASE64_CHARACTER ||
+        r.error == error_code::BASE64_EXTRA_BITS) {
+      return r;
+    } else {
+      r.output_count += size_t(dst - dstinit);
+    }
+    if (last_chunk_options != stop_before_partial &&
+        r.error == error_code::SUCCESS && equalsigns > 0) {
+      // additional checks
+      if ((r.output_count % 3 == 0) ||
+          ((r.output_count % 3) + 1 + equalsigns != 4)) {
+        r.error = error_code::INVALID_BASE64_CHARACTER;
+        r.input_count = equallocation;
+      }
+    }
+    return r;
+  }
+  if (equalsigns > 0) {
+    if ((size_t(dst - dstinit) % 3 == 0) ||
+        ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation, size_t(dst - dstinit)};
+    }
+  }
+  return {SUCCESS, srclen, size_t(dst - dstinit)};
+}
+/* end file src/arm64/arm_base64.cpp */
+/* begin file src/arm64/arm_convert_utf32_to_latin1.cpp */
+std::pair<const char32_t *, char *>
+arm_convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                            char *latin1_output) {
+  const char32_t *end = buf + len;
+  while (end - buf >= 8) {
+    uint32x4_t in1 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
+    uint32x4_t in2 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf + 4));
+
+    uint16x8_t utf16_packed = vcombine_u16(vqmovn_u32(in1), vqmovn_u32(in2));
+    if (vmaxvq_u16(utf16_packed) <= 0xff) {
+      // 1. pack the bytes
+      uint8x8_t latin1_packed = vmovn_u16(utf16_packed);
+      // 2. store (8 bytes)
+      vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
+      // 3. adjust pointers
+      buf += 8;
+      latin1_output += 8;
+    } else {
+      return std::make_pair(nullptr, reinterpret_cast<char *>(latin1_output));
+    }
+  } // while
+  return std::make_pair(buf, latin1_output);
+}
+
+std::pair<result, char *>
+arm_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                        char *latin1_output) {
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
+
+  while (end - buf >= 8) {
+    uint32x4_t in1 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
+    uint32x4_t in2 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf + 4));
+
+    uint16x8_t utf16_packed = vcombine_u16(vqmovn_u32(in1), vqmovn_u32(in2));
+
+    if (vmaxvq_u16(utf16_packed) <= 0xff) {
+      // 1. pack the bytes
+      uint8x8_t latin1_packed = vmovn_u16(utf16_packed);
+      // 2. store (8 bytes)
+      vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
+      // 3. adjust pointers
+      buf += 8;
+      latin1_output += 8;
+    } else {
+      // Let us do a scalar fallback.
+      for (int k = 0; k < 8; k++) {
+        uint32_t word = buf[k];
+        if (word <= 0xff) {
+          *latin1_output++ = char(word);
+        } else {
+          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
+                                latin1_output);
+        }
+      }
+    }
+  } // while
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        latin1_output);
+}
+/* end file src/arm64/arm_convert_utf32_to_latin1.cpp */
+/* begin file src/arm64/arm_convert_utf32_to_utf16.cpp */
+template <endianness big_endian>
+std::pair<const char32_t *, char16_t *>
+arm_convert_utf32_to_utf16(const char32_t *buf, size_t len,
+                           char16_t *utf16_out) {
+  uint16_t *utf16_output = reinterpret_cast<uint16_t *>(utf16_out);
+  const char32_t *end = buf + len;
+
+  uint16x4_t forbidden_bytemask = vmov_n_u16(0x0);
+
+  while (end - buf >= 4) {
+    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
+
+    // Check that no bits are set above the 16th
+    if (vmaxvq_u32(in) <= 0xFFFF) {
+      uint16x4_t utf16_packed = vmovn_u32(in);
+
+      const uint16x4_t v_d800 = vmov_n_u16((uint16_t)0xd800);
+      const uint16x4_t v_dfff = vmov_n_u16((uint16_t)0xdfff);
+      forbidden_bytemask = vorr_u16(vand_u16(vcle_u16(utf16_packed, v_dfff),
+                                             vcge_u16(utf16_packed, v_d800)),
+                                    forbidden_bytemask);
+
+      if (!match_system(big_endian)) {
+        utf16_packed =
+            vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(utf16_packed)));
+      }
+      vst1_u16(utf16_output, utf16_packed);
+      utf16_output += 4;
+      buf += 4;
+    } else {
+      size_t forward = 3;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFF0000) == 0) {
+          // will not generate a surrogate pair
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char16_t *>(utf16_output));
+          }
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(word >> 8 | word << 8)
+                                : char16_t(word);
+        } else {
+          // will generate a surrogate pair
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char16_t *>(utf16_output));
+          }
+          word -= 0x10000;
+          uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
+          uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
+          if (!match_system(big_endian)) {
+            high_surrogate =
+                uint16_t(high_surrogate >> 8 | high_surrogate << 8);
+            low_surrogate = uint16_t(low_surrogate << 8 | low_surrogate >> 8);
+          }
+          *utf16_output++ = char16_t(high_surrogate);
+          *utf16_output++ = char16_t(low_surrogate);
+        }
+      }
+      buf += k;
+    }
+  }
+
+  // check for invalid input
+  if (vmaxv_u16(forbidden_bytemask) != 0) {
+    return std::make_pair(nullptr, reinterpret_cast<char16_t *>(utf16_output));
+  }
+
+  return std::make_pair(buf, reinterpret_cast<char16_t *>(utf16_output));
+}
+
+template <endianness big_endian>
+std::pair<result, char16_t *>
+arm_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
+                                       char16_t *utf16_out) {
+  uint16_t *utf16_output = reinterpret_cast<uint16_t *>(utf16_out);
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
+
+  while (end - buf >= 4) {
+    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
+
+    // Check that no bits are set above the 16th
+    if (vmaxvq_u32(in) <= 0xFFFF) {
+      uint16x4_t utf16_packed = vmovn_u32(in);
+
+      const uint16x4_t v_d800 = vmov_n_u16((uint16_t)0xd800);
+      const uint16x4_t v_dfff = vmov_n_u16((uint16_t)0xdfff);
+      const uint16x4_t forbidden_bytemask = vand_u16(
+          vcle_u16(utf16_packed, v_dfff), vcge_u16(utf16_packed, v_d800));
+      if (vmaxv_u16(forbidden_bytemask) != 0) {
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              reinterpret_cast<char16_t *>(utf16_output));
+      }
+
+      if (!match_system(big_endian)) {
+        utf16_packed =
+            vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(utf16_packed)));
+      }
+      vst1_u16(utf16_output, utf16_packed);
+      utf16_output += 4;
+      buf += 4;
+    } else {
+      size_t forward = 3;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFF0000) == 0) {
+          // will not generate a surrogate pair
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k),
+                reinterpret_cast<char16_t *>(utf16_output));
+          }
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(word >> 8 | word << 8)
+                                : char16_t(word);
+        } else {
+          // will generate a surrogate pair
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k),
+                reinterpret_cast<char16_t *>(utf16_output));
+          }
+          word -= 0x10000;
+          uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
+          uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
+          if (!match_system(big_endian)) {
+            high_surrogate =
+                uint16_t(high_surrogate >> 8 | high_surrogate << 8);
+            low_surrogate = uint16_t(low_surrogate << 8 | low_surrogate >> 8);
+          }
+          *utf16_output++ = char16_t(high_surrogate);
+          *utf16_output++ = char16_t(low_surrogate);
+        }
+      }
+      buf += k;
+    }
+  }
+
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char16_t *>(utf16_output));
+}
+/* end file src/arm64/arm_convert_utf32_to_utf16.cpp */
+/* begin file src/arm64/arm_convert_utf32_to_utf8.cpp */
+std::pair<const char32_t *, char *>
+arm_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char32_t *end = buf + len;
+
+  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
+
+  uint16x8_t forbidden_bytemask = vmovq_n_u16(0x0);
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+
+  while (buf + 16 + safety_margin < end) {
+    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
+    uint32x4_t nextin = vld1q_u32(reinterpret_cast<const uint32_t *>(buf + 4));
+
+    // Check that no bits are set above the 16th
+    if (vmaxvq_u32(vorrq_u32(in, nextin)) <= 0xFFFF) {
+      // Pack UTF-32 to UTF-16 safely (without surrogate pairs)
+      // Apply UTF-16 => UTF-8 routine (arm_convert_utf16_to_utf8.cpp)
+      uint16x8_t utf16_packed = vcombine_u16(vmovn_u32(in), vmovn_u32(nextin));
+      if (vmaxvq_u16(utf16_packed) <= 0x7F) { // ASCII fast path!!!!
+        // 1. pack the bytes
+        // obviously suboptimal.
+        uint8x8_t utf8_packed = vmovn_u16(utf16_packed);
+        // 2. store (8 bytes)
+        vst1_u8(utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        continue; // we are done for this round!
+      }
+
+      if (vmaxvq_u16(utf16_packed) <= 0x7FF) {
+        // 1. prepare 2-byte values
+        // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+        // expected output   : [110a|aaaa|10bb|bbbb] x 8
+        const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
+        const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
+
+        // t0 = [000a|aaaa|bbbb|bb00]
+        const uint16x8_t t0 = vshlq_n_u16(utf16_packed, 2);
+        // t1 = [000a|aaaa|0000|0000]
+        const uint16x8_t t1 = vandq_u16(t0, v_1f00);
+        // t2 = [0000|0000|00bb|bbbb]
+        const uint16x8_t t2 = vandq_u16(utf16_packed, v_003f);
+        // t3 = [000a|aaaa|00bb|bbbb]
+        const uint16x8_t t3 = vorrq_u16(t1, t2);
+        // t4 = [110a|aaaa|10bb|bbbb]
+        const uint16x8_t t4 = vorrq_u16(t3, v_c080);
+        // 2. merge ASCII and 2-byte codewords
+        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
+        const uint8x16_t utf8_unpacked = vreinterpretq_u8_u16(
+            vbslq_u16(one_byte_bytemask, utf16_packed, t4));
+        // 3. prepare bitmask for 8-bit lookup
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+        const uint16x8_t mask = simdutf_make_uint16x8_t(
+            0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
+#else
+        const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
+                                 0x0002, 0x0008, 0x0020, 0x0080};
+#endif
+        uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
+        // 4. pack the bytes
+        const uint8_t *row =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
+        const uint8x16_t shuffle = vld1q_u8(row + 1);
+        const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
+
+        // 5. store bytes
+        vst1q_u8(utf8_output, utf8_packed);
+
+        // 6. adjust pointers
+        buf += 8;
+        utf8_output += row[0];
+        continue;
+      } else {
+        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+        const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
+        const uint16x8_t v_dfff = vmovq_n_u16((uint16_t)0xdfff);
+        forbidden_bytemask =
+            vorrq_u16(vandq_u16(vcleq_u16(utf16_packed, v_dfff),
+                                vcgeq_u16(utf16_packed, v_d800)),
+                      forbidden_bytemask);
+
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+        const uint16x8_t dup_even = simdutf_make_uint16x8_t(
+            0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+#else
+        const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
+                                     0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
+#endif
+        /* In this branch we handle three cases:
+          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+          single UTF-8 byte
+          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+          two UTF-8 bytes
+          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+          three UTF-8 bytes
+
+          We expand the input word (16-bit) into two code units (32-bit), thus
+          we have room for four bytes. However, we need five distinct bit
+          layouts. Note that the last byte in cases #2 and #3 is the same.
+
+          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+          in register t2.
+
+          We precompute byte 1 for case #3 and -- **conditionally** --
+          precompute either byte 1 for case #2 or byte 2 for case #3. Note that
+          they differ by exactly one bit.
+
+          Finally, from these two code units we build a proper UTF-8 sequence,
+          taking into account the case (i.e., the number of bytes to write).
+        */
+        /**
+         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+         * t2 => [0ccc|cccc] [10cc|cccc]
+         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+         */
+#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
+        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+        const uint16x8_t t0 =
+            vreinterpretq_u16_u8(vqtbl1q_u8(vreinterpretq_u8_u16(utf16_packed),
+                                            vreinterpretq_u8_u16(dup_even)));
+        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+        const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
+        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+        const uint16x8_t t2 = vorrq_u16(t1, simdutf_vec(0b1000000000000000));
+
+        // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+        const uint16x8_t s0 = vshrq_n_u16(utf16_packed, 12);
+        // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
+        const uint16x8_t s1 =
+            vandq_u16(utf16_packed, simdutf_vec(0b0000111111000000));
+        // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
+        const uint16x8_t s1s = vshlq_n_u16(s1, 2);
+        // [00bb|bbbb|0000|aaaa]
+        const uint16x8_t s2 = vorrq_u16(s0, s1s);
+        // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+        const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
+        const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
+        const uint16x8_t one_or_two_bytes_bytemask =
+            vcleq_u16(utf16_packed, v_07ff);
+        const uint16x8_t m0 = vbicq_u16(simdutf_vec(0b0100000000000000),
+                                        one_or_two_bytes_bytemask);
+        const uint16x8_t s4 = veorq_u16(s3, m0);
+#undef simdutf_vec
+
+        // 4. expand code units 16-bit => 32-bit
+        const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
+        const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
+
+        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+        const uint16x8_t onemask = simdutf_make_uint16x8_t(
+            0x0001, 0x0004, 0x0010, 0x0040, 0x0100, 0x0400, 0x1000, 0x4000);
+        const uint16x8_t twomask = simdutf_make_uint16x8_t(
+            0x0002, 0x0008, 0x0020, 0x0080, 0x0200, 0x0800, 0x2000, 0x8000);
+#else
+        const uint16x8_t onemask = {0x0001, 0x0004, 0x0010, 0x0040,
+                                    0x0100, 0x0400, 0x1000, 0x4000};
+        const uint16x8_t twomask = {0x0002, 0x0008, 0x0020, 0x0080,
+                                    0x0200, 0x0800, 0x2000, 0x8000};
+#endif
+        const uint16x8_t combined =
+            vorrq_u16(vandq_u16(one_byte_bytemask, onemask),
+                      vandq_u16(one_or_two_bytes_bytemask, twomask));
+        const uint16_t mask = vaddvq_u16(combined);
+        // The following fast path may or may not be beneficial.
+        /*if(mask == 0) {
+          // We only have three-byte code units. Use fast path.
+          const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
+          const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
+          const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
+          vst1q_u8(utf8_output, utf8_0);
+          utf8_output += 12;
+          vst1q_u8(utf8_output, utf8_1);
+          utf8_output += 12;
+          buf += 8;
+          continue;
+        }*/
+        const uint8_t mask0 = uint8_t(mask);
+        const uint8_t *row0 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+        const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
+        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
+
+        const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+        const uint8_t *row1 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+        const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
+        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
+
+        vst1q_u8(utf8_output, utf8_0);
+        utf8_output += row0[0];
+        vst1q_u8(utf8_output, utf8_1);
+        utf8_output += row1[0];
+
+        buf += 8;
+      }
+      // At least one 32-bit word will produce a surrogate pair in UTF-16 <=>
+      // will produce four UTF-8 bytes.
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFFFF80) == 0) {
+          *utf8_output++ = char(word);
+        } else if ((word & 0xFFFFF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) {
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char *>(utf8_output));
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else {
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char *>(utf8_output));
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        }
+      }
+      buf += k;
+    }
+  } // while
+
+  // check for invalid input
+  if (vmaxvq_u16(forbidden_bytemask) != 0) {
+    return std::make_pair(nullptr, reinterpret_cast<char *>(utf8_output));
+  }
+  return std::make_pair(buf, reinterpret_cast<char *>(utf8_output));
+}
+
+std::pair<result, char *>
+arm_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
+                                      char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
+
+  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+
+  while (buf + 16 + safety_margin < end) {
+    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
+    uint32x4_t nextin = vld1q_u32(reinterpret_cast<const uint32_t *>(buf + 4));
+
+    // Check if no bits set above 16th
+    if (vmaxvq_u32(vorrq_u32(in, nextin)) <= 0xFFFF) {
+      // Pack UTF-32 to UTF-16 safely (without surrogate pairs)
+      // Apply UTF-16 => UTF-8 routine (arm_convert_utf16_to_utf8.cpp)
+      uint16x8_t utf16_packed = vcombine_u16(vmovn_u32(in), vmovn_u32(nextin));
+      if (vmaxvq_u16(utf16_packed) <= 0x7F) { // ASCII fast path!!!!
+        // 1. pack the bytes
+        // obviously suboptimal.
+        uint8x8_t utf8_packed = vmovn_u16(utf16_packed);
+        // 2. store (8 bytes)
+        vst1_u8(utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        continue; // we are done for this round!
+      }
+
+      if (vmaxvq_u16(utf16_packed) <= 0x7FF) {
+        // 1. prepare 2-byte values
+        // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+        // expected output   : [110a|aaaa|10bb|bbbb] x 8
+        const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
+        const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
+
+        // t0 = [000a|aaaa|bbbb|bb00]
+        const uint16x8_t t0 = vshlq_n_u16(utf16_packed, 2);
+        // t1 = [000a|aaaa|0000|0000]
+        const uint16x8_t t1 = vandq_u16(t0, v_1f00);
+        // t2 = [0000|0000|00bb|bbbb]
+        const uint16x8_t t2 = vandq_u16(utf16_packed, v_003f);
+        // t3 = [000a|aaaa|00bb|bbbb]
+        const uint16x8_t t3 = vorrq_u16(t1, t2);
+        // t4 = [110a|aaaa|10bb|bbbb]
+        const uint16x8_t t4 = vorrq_u16(t3, v_c080);
+        // 2. merge ASCII and 2-byte codewords
+        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
+        const uint8x16_t utf8_unpacked = vreinterpretq_u8_u16(
+            vbslq_u16(one_byte_bytemask, utf16_packed, t4));
+        // 3. prepare bitmask for 8-bit lookup
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+        const uint16x8_t mask = simdutf_make_uint16x8_t(
+            0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
+#else
+        const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
+                                 0x0002, 0x0008, 0x0020, 0x0080};
+#endif
+        uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
+        // 4. pack the bytes
+        const uint8_t *row =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
+        const uint8x16_t shuffle = vld1q_u8(row + 1);
+        const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
+
+        // 5. store bytes
+        vst1q_u8(utf8_output, utf8_packed);
+
+        // 6. adjust pointers
+        buf += 8;
+        utf8_output += row[0];
+        continue;
+      } else {
+        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+
+        // check for invalid input
+        const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
+        const uint16x8_t v_dfff = vmovq_n_u16((uint16_t)0xdfff);
+        const uint16x8_t forbidden_bytemask = vandq_u16(
+            vcleq_u16(utf16_packed, v_dfff), vcgeq_u16(utf16_packed, v_d800));
+        if (vmaxvq_u16(forbidden_bytemask) != 0) {
+          return std::make_pair(result(error_code::SURROGATE, buf - start),
+                                reinterpret_cast<char *>(utf8_output));
+        }
+
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+        const uint16x8_t dup_even = simdutf_make_uint16x8_t(
+            0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+#else
+        const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
+                                     0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
+#endif
+        /* In this branch we handle three cases:
+          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+          single UTF-8 byte
+          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+          two UTF-8 bytes
+          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+          three UTF-8 bytes
+
+          We expand the input word (16-bit) into two code units (32-bit), thus
+          we have room for four bytes. However, we need five distinct bit
+          layouts. Note that the last byte in cases #2 and #3 is the same.
+
+          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+          in register t2.
+
+          We precompute byte 1 for case #3 and -- **conditionally** --
+          precompute either byte 1 for case #2 or byte 2 for case #3. Note that
+          they differ by exactly one bit.
+
+          Finally, from these two code units we build a proper UTF-8 sequence,
+          taking into account the case (i.e., the number of bytes to write).
+        */
+        /**
+         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+         * t2 => [0ccc|cccc] [10cc|cccc]
+         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+         */
+#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
+        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+        const uint16x8_t t0 =
+            vreinterpretq_u16_u8(vqtbl1q_u8(vreinterpretq_u8_u16(utf16_packed),
+                                            vreinterpretq_u8_u16(dup_even)));
+        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+        const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
+        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+        const uint16x8_t t2 = vorrq_u16(t1, simdutf_vec(0b1000000000000000));
+
+        // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+        const uint16x8_t s0 = vshrq_n_u16(utf16_packed, 12);
+        // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
+        const uint16x8_t s1 =
+            vandq_u16(utf16_packed, simdutf_vec(0b0000111111000000));
+        // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
+        const uint16x8_t s1s = vshlq_n_u16(s1, 2);
+        // [00bb|bbbb|0000|aaaa]
+        const uint16x8_t s2 = vorrq_u16(s0, s1s);
+        // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+        const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
+        const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
+        const uint16x8_t one_or_two_bytes_bytemask =
+            vcleq_u16(utf16_packed, v_07ff);
+        const uint16x8_t m0 = vbicq_u16(simdutf_vec(0b0100000000000000),
+                                        one_or_two_bytes_bytemask);
+        const uint16x8_t s4 = veorq_u16(s3, m0);
+#undef simdutf_vec
+
+        // 4. expand code units 16-bit => 32-bit
+        const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
+        const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
+
+        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
+        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
+#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
+        const uint16x8_t onemask = simdutf_make_uint16x8_t(
+            0x0001, 0x0004, 0x0010, 0x0040, 0x0100, 0x0400, 0x1000, 0x4000);
+        const uint16x8_t twomask = simdutf_make_uint16x8_t(
+            0x0002, 0x0008, 0x0020, 0x0080, 0x0200, 0x0800, 0x2000, 0x8000);
+#else
+        const uint16x8_t onemask = {0x0001, 0x0004, 0x0010, 0x0040,
+                                    0x0100, 0x0400, 0x1000, 0x4000};
+        const uint16x8_t twomask = {0x0002, 0x0008, 0x0020, 0x0080,
+                                    0x0200, 0x0800, 0x2000, 0x8000};
+#endif
+        const uint16x8_t combined =
+            vorrq_u16(vandq_u16(one_byte_bytemask, onemask),
+                      vandq_u16(one_or_two_bytes_bytemask, twomask));
+        const uint16_t mask = vaddvq_u16(combined);
+        // The following fast path may or may not be beneficial.
+        /*if(mask == 0) {
+          // We only have three-byte code units. Use fast path.
+          const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
+          const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
+          const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
+          vst1q_u8(utf8_output, utf8_0);
+          utf8_output += 12;
+          vst1q_u8(utf8_output, utf8_1);
+          utf8_output += 12;
+          buf += 8;
+          continue;
+        }*/
+        const uint8_t mask0 = uint8_t(mask);
+
+        const uint8_t *row0 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+        const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
+        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
+
+        const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+        const uint8_t *row1 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+        const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
+        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
+
+        vst1q_u8(utf8_output, utf8_0);
+        utf8_output += row0[0];
+        vst1q_u8(utf8_output, utf8_1);
+        utf8_output += row1[0];
+
+        buf += 8;
+      }
+      // At least one 32-bit word will produce a surrogate pair in UTF-16 <=>
+      // will produce four UTF-8 bytes.
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFFFF80) == 0) {
+          *utf8_output++ = char(word);
+        } else if ((word & 0xFFFFF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) {
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k),
+                reinterpret_cast<char *>(utf8_output));
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else {
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k),
+                reinterpret_cast<char *>(utf8_output));
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        }
+      }
+      buf += k;
+    }
+  } // while
+
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char *>(utf8_output));
+}
+/* end file src/arm64/arm_convert_utf32_to_utf8.cpp */
+
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdutf
+/* begin file src/generic/buf_block_reader.h */
+namespace simdutf {
+namespace arm64 {
+namespace {
+
+// Walks through a buffer in block-sized increments, loading the last part with
+// spaces
+template <size_t STEP_SIZE> struct buf_block_reader {
+public:
+  simdutf_really_inline buf_block_reader(const uint8_t *_buf, size_t _len);
+  simdutf_really_inline size_t block_index();
+  simdutf_really_inline bool has_full_block() const;
+  simdutf_really_inline const uint8_t *full_block() const;
+  /**
+   * Get the last block, padded with spaces.
+   *
+   * There will always be a last block, with at least 1 byte, unless len == 0
+   * (in which case this function returns 0 without writing to dst). In
+   * particular, if len == STEP_SIZE there will be 0 full_blocks and 1 remainder
+   * block with STEP_SIZE bytes and no spaces for padding.
+   *
+   * @return the number of effective characters in the last block.
+   */
+  simdutf_really_inline size_t get_remainder(uint8_t *dst) const;
+  simdutf_really_inline void advance();
+
+private:
+  const uint8_t *buf;
+  const size_t len;
+  const size_t lenminusstep;
+  size_t idx;
+};
+
+// Routines to print masks and text for debugging bitmask operations
+simdutf_unused static char *format_input_text_64(const uint8_t *text) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
+    buf[i] = int8_t(text[i]) < ' ' ? '_' : int8_t(text[i]);
+  }
+  buf[sizeof(simd8x64<uint8_t>)] = '\0';
+  return buf;
+}
+
+// Routines to print masks and text for debugging bitmask operations
+simdutf_unused static char *format_input_text(const simd8x64<uint8_t> &in) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  in.store(reinterpret_cast<uint8_t *>(buf));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
+    if (buf[i] < ' ') {
+      buf[i] = '_';
+    }
+  }
+  buf[sizeof(simd8x64<uint8_t>)] = '\0';
+  return buf;
+}
+
+simdutf_unused static char *format_mask(uint64_t mask) {
+  static char *buf = reinterpret_cast<char *>(malloc(64 + 1));
+  for (size_t i = 0; i < 64; i++) {
+    buf[i] = (mask & (size_t(1) << i)) ? 'X' : ' ';
+  }
+  buf[64] = '\0';
+  return buf;
+}
+
+template <size_t STEP_SIZE>
+simdutf_really_inline
+buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len)
+    : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE},
+      idx{0} {}
+
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() {
+  return idx;
+}
+
+template <size_t STEP_SIZE>
+simdutf_really_inline bool buf_block_reader<STEP_SIZE>::has_full_block() const {
+  return idx < lenminusstep;
+}
+
+template <size_t STEP_SIZE>
+simdutf_really_inline const uint8_t *
+buf_block_reader<STEP_SIZE>::full_block() const {
+  return &buf[idx];
+}
+
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t
+buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
+  if (len == idx) {
+    return 0;
+  } // memcpy(dst, null, 0) will trigger an error with some sanitizers
+  std::memset(dst, 0x20,
+              STEP_SIZE); // std::memset STEP_SIZE because it is more efficient
+                          // to write out 8 or 16 bytes at once.
+  std::memcpy(dst, buf + idx, len - idx);
+  return len - idx;
+}
+
+template <size_t STEP_SIZE>
+simdutf_really_inline void buf_block_reader<STEP_SIZE>::advance() {
+  idx += STEP_SIZE;
+}
+
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdutf
+/* end file src/generic/buf_block_reader.h */
+/* begin file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
+namespace simdutf {
+namespace arm64 {
+namespace {
+namespace utf8_validation {
+
+using namespace simd;
+
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      TOO_SHORT,
+      // 1110____ ________ <three byte lead in byte 1>
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
+
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+//
+// Return nonzero if there are incomplete multibyte characters at the end of the
+// block: e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+//
+simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
+  // If the previous input's last 3 bytes match this, they're too short (they
+  // ended at EOF):
+  // ... 1111____ 111_____ 11______
+  static const uint8_t max_array[32] = {255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        0b11110000u - 1,
+                                        0b11100000u - 1,
+                                        0b11000000u - 1};
+  const simd8<uint8_t> max_value(
+      &max_array[sizeof(max_array) - sizeof(simd8<uint8_t>)]);
+  return input.gt_bits(max_value);
+}
+
+struct utf8_checker {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+  // The last input we received
+  simd8<uint8_t> prev_input_block;
+  // Whether the last input we received was incomplete (used for ASCII fast
+  // path)
+  simd8<uint8_t> prev_incomplete;
+
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  // The only problem that can happen at EOF is that a multibyte character is
+  // too short or a byte value too large in the last bytes: check_special_cases
+  // only checks for bytes too large in the first of two bytes.
+  simdutf_really_inline void check_eof() {
+    // If the previous block had incomplete UTF-8 characters at the end, an
+    // ASCII block can't possibly finish them.
+    this->error |= this->prev_incomplete;
+  }
+
+  simdutf_really_inline void check_next_input(const simd8x64<uint8_t> &input) {
+    if (simdutf_likely(is_ascii(input))) {
+      this->error |= this->prev_incomplete;
+    } else {
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                        (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+                    "We support either two or four chunks per 64-byte block.");
+      if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+      } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+        this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+      }
+      this->prev_incomplete =
+          is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1]);
+      this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1];
+    }
+  }
+
+  // do not forget to call check_eof!
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
+
+}; // struct utf8_checker
+} // namespace utf8_validation
+
+using utf8_validation::utf8_checker;
+
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdutf
+/* end file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
+/* begin file src/generic/utf8_validation/utf8_validator.h */
+namespace simdutf {
+namespace arm64 {
+namespace {
+namespace utf8_validation {
+
+/**
+ * Validates that the string is actual UTF-8.
+ */
+template <class checker>
+bool generic_validate_utf8(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    c.check_next_input(in);
+    reader.advance();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  return !c.errors();
+}
+
+bool generic_validate_utf8(const char *input, size_t length) {
+  return generic_validate_utf8<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
+}
+
+/**
+ * Validates that the string is actual UTF-8 and stops on errors.
+ */
+template <class checker>
+result generic_validate_utf8_with_errors(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  size_t count{0};
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    c.check_next_input(in);
+    if (c.errors()) {
+      if (count != 0) {
+        count--;
+      } // Sometimes the error is only detected in the next chunk
+      result res = scalar::utf8::rewind_and_validate_with_errors(
+          reinterpret_cast<const char *>(input),
+          reinterpret_cast<const char *>(input + count), length - count);
+      res.count += count;
+      return res;
+    }
+    reader.advance();
+    count += 64;
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  if (c.errors()) {
+    if (count != 0) {
+      count--;
+    } // Sometimes the error is only detected in the next chunk
+    result res = scalar::utf8::rewind_and_validate_with_errors(
+        reinterpret_cast<const char *>(input),
+        reinterpret_cast<const char *>(input) + count, length - count);
+    res.count += count;
+    return res;
+  } else {
+    return result(error_code::SUCCESS, length);
+  }
+}
+
+result generic_validate_utf8_with_errors(const char *input, size_t length) {
+  return generic_validate_utf8_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
+}
+
+template <class checker>
+bool generic_validate_ascii(const uint8_t *input, size_t length) {
+  buf_block_reader<64> reader(input, length);
+  uint8_t blocks[64]{};
+  simd::simd8x64<uint8_t> running_or(blocks);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    running_or |= in;
+    reader.advance();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  running_or |= in;
+  return running_or.is_ascii();
+}
+
+bool generic_validate_ascii(const char *input, size_t length) {
+  return generic_validate_ascii<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
+}
+
+template <class checker>
+result generic_validate_ascii_with_errors(const uint8_t *input, size_t length) {
+  buf_block_reader<64> reader(input, length);
+  size_t count{0};
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    if (!in.is_ascii()) {
+      result res = scalar::ascii::validate_with_errors(
+          reinterpret_cast<const char *>(input + count), length - count);
+      return result(res.error, count + res.count);
+    }
+    reader.advance();
+
+    count += 64;
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  if (!in.is_ascii()) {
+    result res = scalar::ascii::validate_with_errors(
+        reinterpret_cast<const char *>(input + count), length - count);
+    return result(res.error, count + res.count);
+  } else {
+    return result(error_code::SUCCESS, length);
+  }
+}
+
+result generic_validate_ascii_with_errors(const char *input, size_t length) {
+  return generic_validate_ascii_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
+}
+
+} // namespace utf8_validation
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdutf
+/* end file src/generic/utf8_validation/utf8_validator.h */
+// transcoding from UTF-8 to UTF-16
+/* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
+
+namespace simdutf {
+namespace arm64 {
+namespace {
+namespace utf8_to_utf16 {
+using namespace simd;
+
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      TOO_SHORT,
+      // 1110____ ________ <three byte lead in byte 1>
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
+
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  template <endianness endian>
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the
+    // eighth-last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany = scalar::utf8_to_utf16::convert<endian>(
+          in + pos, size - pos, utf16_output);
+      if (howmany == 0) {
+        return 0;
+      }
+      utf16_output += howmany;
+    }
+    return utf16_output - start;
+  }
+
+  template <endianness endian>
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the
+    // eighth-last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res =
+              scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+                  pos, in + pos, size - pos, utf16_output);
+          res.count += pos;
+          return res;
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf16_output += res.count;
+      }
+    }
+    return result(error_code::SUCCESS, utf16_output - start);
+  }
+
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
+
+}; // struct validating_transcoder
+} // namespace utf8_to_utf16
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf16/utf8_to_utf16.h */
+/* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
+
+namespace simdutf {
+namespace arm64 {
+namespace {
+namespace utf8_to_utf16 {
+
+using namespace simd;
+
+template <endianness endian>
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char16_t *utf16_output) noexcept {
+  // The implementation is not specific to haswell, which is why it lives in
+  // the generic directory.
+  size_t pos = 0;
+  char16_t *start{utf16_output};
+  const size_t safety_margin = 16; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    // this loop could be unrolled further. For example, we could process the
+    // mask far more than 64 bytes.
+    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
+    if (in.is_ascii()) {
+      in.store_ascii_as_utf16<endian>(utf16_output);
+      utf16_output += 64;
+      pos += 64;
+    } else {
+      // Slow path. We hope that the compiler will recognize that this is a slow
+      // path. Anything that is not a continuation mask is a 'leading byte',
+      // that is, the start of a new code point.
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte.
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      // The *start* of code points is not so useful, rather, we want the *end*
+      // of code points.
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      // We process in blocks of up to 12 bytes except possibly
+      // for fast paths which may process up to 16 bytes. For the
+      // slow path to work, we should have at least 12 input bytes left.
+      size_t max_starting_point = (pos + 64) - 12;
+      // Next loop is going to run at least five times when using solely
+      // the slow/regular path, and at least four times if there are fast paths.
+      while (pos < max_starting_point) {
+        // Performance note: our ability to compute 'consumed' and
+        // then shift and recompute is critical. If there is a
+        // latency of, say, 4 cycles on getting 'consumed', then
+        // the inner loop might have a total latency of about 6 cycles.
+        // Yet we process between 6 and 12 input bytes, thus we get
+        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+        // for this section of the code. Hence, there is a limit
+        // to how much we can further increase this latency before
+        // it seriously harms performance.
+        //
+        // Thus we may allow convert_masked_utf8_to_utf16 to process
+        // more bytes at a time under a fast-path mode where 16 bytes
+        // are consumed at once (e.g., when encountering ASCII).
+        size_t consumed = convert_masked_utf8_to_utf16<endian>(
+            input + pos, utf8_end_of_code_point_mask, utf16_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
+      }
+      // At this point there may remain between 0 and 12 bytes in the
+      // 64-byte block. These bytes will be processed again. So we have an
+      // 80% efficiency (in the worst case). In practice we expect an
+      // 85% to 90% efficiency.
+    }
+  }
+  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(
+      input + pos, size - pos, utf16_output);
+  return utf16_output - start;
+}
+
+} // namespace utf8_to_utf16
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
+// transcoding from UTF-8 to UTF-32
+/* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
+
+namespace simdutf {
+namespace arm64 {
+namespace {
+namespace utf8_to_utf32 {
+using namespace simd;
+
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      TOO_SHORT,
+      // 1110____ ________ <three byte lead in byte 1>
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
+
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 words when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the
+    // eighth-last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // we have an error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
+      if (howmany == 0) {
+        return 0;
+      }
+      utf32_output += howmany;
+    }
+    return utf32_output - start;
+  }
+
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then in[margin] is the eighth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, utf32_output);
+          res.count += pos;
+          return res;
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf32_output += res.count;
+      }
+    }
+    return result(error_code::SUCCESS, utf32_output - start);
+  }
+
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
+
+}; // struct validating_transcoder
+} // namespace utf8_to_utf32
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf32/utf8_to_utf32.h */
+/* begin file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
+
+namespace simdutf {
+namespace arm64 {
+namespace {
+namespace utf8_to_utf32 {
+
+using namespace simd;
+
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char32_t *utf32_output) noexcept {
+  size_t pos = 0;
+  char32_t *start{utf32_output};
+  const size_t safety_margin = 16; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
+    if (in.is_ascii()) {
+      in.store_ascii_as_utf32(utf32_output);
+      utf32_output += 64;
+      pos += 64;
+    } else {
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      size_t max_starting_point = (pos + 64) - 12;
+      while (pos < max_starting_point) {
+        size_t consumed = convert_masked_utf8_to_utf32(
+            input + pos, utf8_end_of_code_point_mask, utf32_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
+      }
+    }
+  }
+  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos,
+                                                       utf32_output);
+  return utf32_output - start;
+}
+
+} // namespace utf8_to_utf32
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
+// other functions
+/* begin file src/generic/utf16.h */
+namespace simdutf {
+namespace arm64 {
+namespace {
+namespace utf16 {
+
+template <endianness big_endian>
+simdutf_really_inline size_t count_code_points(const char16_t *in,
+                                               size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
+    }
+    uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
+    count += count_ones(not_pair) / 2;
+  }
+  return count +
+         scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
+}
+
+template <endianness big_endian>
+simdutf_really_inline size_t utf8_length_from_utf16(const char16_t *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
+    }
+    uint64_t ascii_mask = input.lteq(0x7F);
+    uint64_t twobyte_mask = input.lteq(0x7FF);
+    uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
+
+    size_t ascii_count = count_ones(ascii_mask) / 2;
+    size_t twobyte_count = count_ones(twobyte_mask & ~ascii_mask) / 2;
+    size_t threebyte_count = count_ones(not_pair_mask & ~twobyte_mask) / 2;
+    size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
+    count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count +
+             ascii_count;
+  }
+  return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos,
+                                                                   size - pos);
+}
+
+template <endianness big_endian>
+simdutf_really_inline size_t utf32_length_from_utf16(const char16_t *in,
+                                                     size_t size) {
+  return count_code_points<big_endian>(in, size);
+}
+
+simdutf_really_inline void
+change_endianness_utf16(const char16_t *in, size_t size, char16_t *output) {
+  size_t pos = 0;
+
+  while (pos < size / 32 * 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    input.swap_bytes();
+    input.store(reinterpret_cast<uint16_t *>(output));
+    pos += 32;
+    output += 32;
+  }
+
+  scalar::utf16::change_endianness_utf16(in + pos, size - pos, output);
+}
+
+} // namespace utf16
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdutf
+/* end file src/generic/utf16.h */
+/* begin file src/generic/utf8.h */
+
+namespace simdutf {
+namespace arm64 {
+namespace {
+namespace utf8 {
+
+using namespace simd;
+
+simdutf_really_inline size_t count_code_points(const char *in, size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.gt(-65);
+    count += count_ones(utf8_continuation_mask);
+  }
+  return count + scalar::utf8::count_code_points(in + pos, size - pos);
+}
+
+simdutf_really_inline size_t utf16_length_from_utf8(const char *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+    // We count one word for anything that is not a continuation (so
+    // leading bytes).
+    count += 64 - count_ones(utf8_continuation_mask);
+    int64_t utf8_4byte = input.gteq_unsigned(240);
+    count += count_ones(utf8_4byte);
+  }
+  return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
+}
+} // namespace utf8
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdutf
+/* end file src/generic/utf8.h */
+// transcoding from UTF-8 to Latin 1
+/* begin file src/generic/utf8_to_latin1/utf8_to_latin1.h */
+
+namespace simdutf {
+namespace arm64 {
+namespace {
+namespace utf8_to_latin1 {
+using namespace simd;
+
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // For UTF-8 to Latin 1, we can allow any ASCII character, and any
+  // continuation byte, but the non-ASCII leading bytes must be 0b11000011 or
+  // 0b11000010 and nothing else.
+  //
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+  constexpr const uint8_t FORBIDDEN = 0xff;
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      FORBIDDEN,
+      // 1110____ ________ <three byte lead in byte 1>
+      FORBIDDEN,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      FORBIDDEN);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              FORBIDDEN,
+              // ____0101 ________
+              FORBIDDEN,
+              // ____011_ ________
+              FORBIDDEN, FORBIDDEN,
+
+              // ____1___ ________
+              FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN,
+              // ____1101 ________
+              FORBIDDEN, FORBIDDEN, FORBIDDEN);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
+
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    this->error |= check_special_cases(input, prev1);
+  }
+
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char *latin1_output) {
+    size_t pos = 0;
+    char *start{latin1_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 16 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 16; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) >
+                       -65); // -65 is 1011 1111 in two's complement
+    }
+    // If the input is long enough, then in[margin] is the sixteenth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask =
+            input.lt(-65 + 1); // -64 is 1100 0000 in two's complement. Note: in
+                               // this case, we also have ASCII to account for.
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_latin1::convert(in + pos, size - pos, latin1_output);
+      if (howmany == 0) {
+        return 0;
+      }
+      latin1_output += howmany;
+    }
+    return latin1_output - start;
+  }
+
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char *latin1_output) {
+    size_t pos = 0;
+    char *start{latin1_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then in[margin] is the eighth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        if (errors()) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, latin1_output);
+          res.count += pos;
+          return res;
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        latin1_output += res.count;
+      }
+    }
+    return result(error_code::SUCCESS, latin1_output - start);
+  }
+
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
+
+}; // struct validating_transcoder
+} // namespace utf8_to_latin1
+} // unnamed namespace
+} // namespace arm64
+} // namespace simdutf
+/* end file src/generic/utf8_to_latin1/utf8_to_latin1.h */
+/* begin file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
+
+namespace simdutf {
+namespace arm64 {
+namespace {
+namespace utf8_to_latin1 {
+using namespace simd;
+
+simdutf_really_inline size_t convert_valid(const char *in, size_t size,
+                                           char *latin1_output) {
+  size_t pos = 0;
+  char *start{latin1_output};
+  // In the worst case, we have the haswell kernel which can cause an overflow
+  // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last
+  // 16 bytes, and if the data is valid, then it is entirely safe because 16
+  // UTF-8 bytes generate much more than 8 bytes. However, you cannot generally
+  // assume that you have valid UTF-8 input, so we are going to go back from the
+  // end counting 8 leading bytes, to give us a good margin.
+  size_t leading_byte = 0;
+  size_t margin = size;
+  for (; margin > 0 && leading_byte < 8; margin--) {
+    leading_byte += (int8_t(in[margin - 1]) >
+                     -65); // -65 is 1011 1111 in two's complement
+  }
+  // If the input is long enough, then in[margin] is the eighth-to-last
+  // leading byte.
+  const size_t safety_margin = size - margin + 1; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    if (input.is_ascii()) {
+      input.store((int8_t *)latin1_output);
+      latin1_output += 64;
+      pos += 64;
+    } else {
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      uint64_t utf8_continuation_mask =
+          input.lt(-65 + 1); // -64 is 1100 0000 in two's complement. Note: in
+                             // this case, we also have ASCII to account for.
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      // We process in blocks of up to 12 bytes except possibly
+      // for fast paths which may process up to 16 bytes. For the
+      // slow path to work, we should have at least 12 input bytes left.
+      size_t max_starting_point = (pos + 64) - 12;
+      // Next loop is going to run at least five times.
+      while (pos < max_starting_point) {
+        // Performance note: our ability to compute 'consumed' and
+        // then shift and recompute is critical. If there is a
+        // latency of, say, 4 cycles on getting 'consumed', then
+        // the inner loop might have a total latency of about 6 cycles.
+        // Yet we process between 6 and 12 input bytes, thus we get
+        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+        // for this section of the code. Hence, there is a limit
+        // to how much we can further increase this latency before
+        // it seriously harms performance.
+        size_t consumed = convert_masked_utf8_to_latin1(
+            in + pos, utf8_end_of_code_point_mask, latin1_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
+      }
+      // At this point there may remain between 0 and 12 bytes in the
+      // 64-byte block. These bytes will be processed again. So we have an
+      // 80% efficiency (in the worst case). In practice we expect an
+      // 85% to 90% efficiency.
+    }
+  }
+  if (pos < size) {
+    size_t howmany = scalar::utf8_to_latin1::convert_valid(in + pos, size - pos,
+                                                           latin1_output);
+    latin1_output += howmany;
+  }
+  return latin1_output - start;
+}
+
+} // namespace utf8_to_latin1
+} // namespace
+} // namespace arm64
+} // namespace simdutf
+  // namespace simdutf
+/* end file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
+
+// placeholder scalars
+
+//
+// Implementation-specific overrides
+//
+namespace simdutf {
+namespace arm64 {
+
+simdutf_warn_unused int
+implementation::detect_encodings(const char *input,
+                                 size_t length) const noexcept {
+  // If there is a BOM, then we trust it.
+  auto bom_encoding = simdutf::BOM::check_bom(input, length);
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
+  }
+  // todo: reimplement as a one-pass algorithm.
+  int out = 0;
+  if (validate_utf8(input, length)) {
+    out |= encoding_type::UTF8;
+  }
+  if ((length % 2) == 0) {
+    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
+                         length / 2)) {
+      out |= encoding_type::UTF16_LE;
+    }
+  }
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      out |= encoding_type::UTF32_LE;
+    }
+  }
+  return out;
+}
+
+simdutf_warn_unused bool
+implementation::validate_utf8(const char *buf, size_t len) const noexcept {
+  return arm64::utf8_validation::generic_validate_utf8(buf, len);
+}
+
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return arm64::utf8_validation::generic_validate_utf8_with_errors(buf, len);
+}
+
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *buf, size_t len) const noexcept {
+  return arm64::utf8_validation::generic_validate_ascii(buf, len);
+}
+
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return arm64::utf8_validation::generic_validate_ascii_with_errors(buf, len);
+}
+
+simdutf_warn_unused bool
+implementation::validate_utf16le(const char16_t *buf,
+                                 size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    // Empty input is valid; this protects the implementation from nullptr.
+    return true;
+  }
+  const char16_t *tail = arm_validate_utf16<endianness::LITTLE>(buf, len);
+  if (tail) {
+    return scalar::utf16::validate<endianness::LITTLE>(tail,
+                                                       len - (tail - buf));
+  } else {
+    return false;
+  }
+}
+
+simdutf_warn_unused bool
+implementation::validate_utf16be(const char16_t *buf,
+                                 size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    // Empty input is valid; this protects the implementation from nullptr.
+    return true;
+  }
+  const char16_t *tail = arm_validate_utf16<endianness::BIG>(buf, len);
+  if (tail) {
+    return scalar::utf16::validate<endianness::BIG>(tail, len - (tail - buf));
+  } else {
+    return false;
+  }
+}
+
+simdutf_warn_unused result implementation::validate_utf16le_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return result(error_code::SUCCESS, 0);
+  }
+  result res = arm_validate_utf16_with_errors<endianness::LITTLE>(buf, len);
+  if (res.count != len) {
+    result scalar_res = scalar::utf16::validate_with_errors<endianness::LITTLE>(
+        buf + res.count, len - res.count);
+    return result(scalar_res.error, res.count + scalar_res.count);
+  } else {
+    return res;
+  }
+}
+
+simdutf_warn_unused result implementation::validate_utf16be_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return result(error_code::SUCCESS, 0);
+  }
+  result res = arm_validate_utf16_with_errors<endianness::BIG>(buf, len);
+  if (res.count != len) {
+    result scalar_res = scalar::utf16::validate_with_errors<endianness::BIG>(
+        buf + res.count, len - res.count);
+    return result(scalar_res.error, res.count + scalar_res.count);
+  } else {
+    return res;
+  }
+}
+
+simdutf_warn_unused bool
+implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    // Empty input is valid; this protects the implementation from nullptr.
+    return true;
+  }
+  const char32_t *tail = arm_validate_utf32le(buf, len);
+  if (tail) {
+    return scalar::utf32::validate(tail, len - (tail - buf));
+  } else {
+    return false;
+  }
+}
+
+simdutf_warn_unused result implementation::validate_utf32_with_errors(
+    const char32_t *buf, size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return result(error_code::SUCCESS, 0);
+  }
+  result res = arm_validate_utf32le_with_errors(buf, len);
+  if (res.count != len) {
+    result scalar_res =
+        scalar::utf32::validate_with_errors(buf + res.count, len - res.count);
+    return result(scalar_res.error, res.count + scalar_res.count);
+  } else {
+    return res;
+  }
+}
+
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
+    const char *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char *, char *> ret =
+      arm_convert_latin1_to_utf8(buf, len, utf8_output);
+  size_t converted_chars = ret.second - utf8_output;
+
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars = scalar::latin1_to_utf8::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
+}
+
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char *, char16_t *> ret =
+      arm_convert_latin1_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+  size_t converted_chars = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars =
+        scalar::latin1_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
+}
+
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char *, char16_t *> ret =
+      arm_convert_latin1_to_utf16<endianness::BIG>(buf, len, utf16_output);
+  size_t converted_chars = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars =
+        scalar::latin1_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
+}
+
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char *, char32_t *> ret =
+      arm_convert_latin1_to_utf32(buf, len, utf32_output);
+  size_t converted_chars = ret.second - utf32_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
+}
+
+simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  utf8_to_latin1::validating_transcoder converter;
+  return converter.convert(buf, len, latin1_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  utf8_to_latin1::validating_transcoder converter;
+  return converter.convert_with_errors(buf, len, latin1_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  return arm64::utf8_to_latin1::convert_valid(buf, len, latin1_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16::validating_transcoder converter;
+  return converter.convert<endianness::LITTLE>(buf, len, utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16::validating_transcoder converter;
+  return converter.convert<endianness::BIG>(buf, len, utf16_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16::validating_transcoder converter;
+  return converter.convert_with_errors<endianness::LITTLE>(buf, len,
+                                                           utf16_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16::validating_transcoder converter;
+  return converter.convert_with_errors<endianness::BIG>(buf, len, utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char *input, size_t size, char16_t *utf16_output) const noexcept {
+  return utf8_to_utf16::convert_valid<endianness::LITTLE>(input, size,
+                                                          utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char *input, size_t size, char16_t *utf16_output) const noexcept {
+  return utf8_to_utf16::convert_valid<endianness::BIG>(input, size,
+                                                       utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  utf8_to_utf32::validating_transcoder converter;
+  return converter.convert(buf, len, utf32_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  utf8_to_utf32::validating_transcoder converter;
+  return converter.convert_with_errors(buf, len, utf32_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char *input, size_t size, char32_t *utf32_output) const noexcept {
+  return utf8_to_utf32::convert_valid(input, size, utf32_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      arm_convert_utf16_to_latin1<endianness::LITTLE>(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - latin1_output;
+
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_latin1::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      arm_convert_utf16_to_latin1<endianness::BIG>(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - latin1_output;
+
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_latin1::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused result
+implementation::convert_utf16le_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      arm_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
+          buf, len, latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
+
+simdutf_warn_unused result
+implementation::convert_utf16be_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      arm_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len,
+                                                               latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  // optimization opportunity: implement a custom function.
+  return convert_utf16be_to_latin1(buf, len, latin1_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  // optimization opportunity: implement a custom function.
+  return convert_utf16le_to_latin1(buf, len, latin1_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      arm_convert_utf16_to_utf8<endianness::LITTLE>(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf8_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf8::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      arm_convert_utf16_to_utf8<endianness::BIG>(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf8_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf8::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      arm_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(buf, len,
+                                                                utf8_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
+
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      arm_convert_utf16_to_utf8_with_errors<endianness::BIG>(buf, len,
+                                                             utf8_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return convert_utf16le_to_utf8(buf, len, utf8_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return convert_utf16be_to_utf8(buf, len, utf8_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return 0;
+  }
+  std::pair<const char32_t *, char *> ret =
+      arm_convert_utf32_to_utf8(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf8_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes = scalar::utf32_to_utf8::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return result(error_code::SUCCESS, 0);
+  }
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      arm_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
+  if (ret.first.count != len) {
+    result scalar_res = scalar::utf32_to_utf8::convert_with_errors(
+        buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
+
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char16_t *, char32_t *> ret =
+      arm_convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf32_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char16_t *, char32_t *> ret =
+      arm_convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf32_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char32_t *> ret =
+      arm_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(buf, len,
+                                                                 utf32_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      utf32_output; // Set count to the number of 32-bit code units written
+  return ret.first;
+}
+
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char32_t *> ret =
+      arm_convert_utf16_to_utf32_with_errors<endianness::BIG>(buf, len,
+                                                              utf32_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      utf32_output; // Set count to the number of 32-bit code units written
+  return ret.first;
+}
+
+simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      arm_convert_utf32_to_latin1(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - latin1_output;
+
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      arm_convert_utf32_to_latin1_with_errors(buf, len, latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res = scalar::utf32_to_latin1::convert_with_errors(
+        buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      arm_convert_utf32_to_latin1(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - latin1_output;
+
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert_valid(
+        ret.first, len - (ret.first - buf), ret.second);
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  // optimization opportunity: implement a custom function.
+  return convert_utf32_to_utf8(buf, len, utf8_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      arm_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      arm_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      arm_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(buf, len,
+                                                                 utf16_output);
+  if (ret.first.count != len) {
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
+  return ret.first;
+}
+
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      arm_convert_utf32_to_utf16_with_errors<endianness::BIG>(buf, len,
+                                                              utf16_output);
+  if (ret.first.count != len) {
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
+  return ret.first;
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return convert_utf32_to_utf16le(buf, len, utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return convert_utf32_to_utf16be(buf, len, utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return convert_utf16le_to_utf32(buf, len, utf32_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return convert_utf16be_to_utf32(buf, len, utf32_output);
+}
+
+void implementation::change_endianness_utf16(const char16_t *input,
+                                             size_t length,
+                                             char16_t *output) const noexcept {
+  utf16::change_endianness_utf16(input, length, output);
+}
+
+simdutf_warn_unused size_t implementation::count_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::count_code_points<endianness::LITTLE>(input, length);
+}
+
+simdutf_warn_unused size_t implementation::count_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::count_code_points<endianness::BIG>(input, length);
+}
+
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *input, size_t length) const noexcept {
+  return utf8::count_code_points(input, length);
+}
+
+simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
+    const char *buf, size_t len) const noexcept {
+  return count_utf8(buf, len);
+}
+
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf16(size_t length) const noexcept {
+  return scalar::utf16::latin1_length_from_utf16(length);
+}
+
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf32(size_t length) const noexcept {
+  return scalar::utf32::latin1_length_from_utf32(length);
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
+    const char *input, size_t length) const noexcept {
+  // See
+  // https://lemire.me/blog/2023/05/15/computing-the-utf-8-size-of-a-latin-1-string-quickly-arm-neon-edition/
+  // credit to Pete Cawley
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(input);
+  uint64_t result = 0;
+  const int lanes = sizeof(uint8x16_t);
+  uint8_t rem = length % lanes;
+  const uint8_t *simd_end = data + (length / lanes) * lanes;
+  const uint8x16_t threshold = vdupq_n_u8(0x80);
+  for (; data < simd_end; data += lanes) {
+    // load 16 bytes
+    uint8x16_t input_vec = vld1q_u8(data);
+    // compare to threshold (0x80)
+    uint8x16_t withhighbit = vcgeq_u8(input_vec, threshold);
+    // sum across lanes: each matching lane is -1, so subtracting adds the count
+    result -= vaddvq_s8(vreinterpretq_s8_u8(withhighbit));
+  }
+  return result + (length / lanes) * lanes +
+         scalar::latin1::utf8_length_from_latin1((const char *)simd_end, rem);
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf8_length_from_utf16<endianness::LITTLE>(input, length);
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
+}
+
+simdutf_warn_unused size_t
+implementation::utf16_length_from_latin1(size_t length) const noexcept {
+  return scalar::latin1::utf16_length_from_latin1(length);
+}
+
+simdutf_warn_unused size_t
+implementation::utf32_length_from_latin1(size_t length) const noexcept {
+  return scalar::latin1::utf32_length_from_latin1(length);
+}
+
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf32_length_from_utf16<endianness::LITTLE>(input, length);
+}
+
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
+}
+
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  return utf8::utf16_length_from_utf8(input, length);
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  const uint32x4_t v_7f = vmovq_n_u32((uint32_t)0x7f);
+  const uint32x4_t v_7ff = vmovq_n_u32((uint32_t)0x7ff);
+  const uint32x4_t v_ffff = vmovq_n_u32((uint32_t)0xffff);
+  const uint32x4_t v_1 = vmovq_n_u32((uint32_t)0x1);
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos + 4 <= length; pos += 4) {
+    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(input + pos));
+    const uint32x4_t ascii_bytes_bytemask = vcleq_u32(in, v_7f);
+    const uint32x4_t one_two_bytes_bytemask = vcleq_u32(in, v_7ff);
+    const uint32x4_t two_bytes_bytemask =
+        veorq_u32(one_two_bytes_bytemask, ascii_bytes_bytemask);
+    const uint32x4_t three_bytes_bytemask =
+        veorq_u32(vcleq_u32(in, v_ffff), one_two_bytes_bytemask);
+
+    const uint16x8_t reduced_ascii_bytes_bytemask =
+        vreinterpretq_u16_u32(vandq_u32(ascii_bytes_bytemask, v_1));
+    const uint16x8_t reduced_two_bytes_bytemask =
+        vreinterpretq_u16_u32(vandq_u32(two_bytes_bytemask, v_1));
+    const uint16x8_t reduced_three_bytes_bytemask =
+        vreinterpretq_u16_u32(vandq_u32(three_bytes_bytemask, v_1));
+
+    const uint16x8_t compressed_bytemask0 =
+        vpaddq_u16(reduced_ascii_bytes_bytemask, reduced_two_bytes_bytemask);
+    const uint16x8_t compressed_bytemask1 =
+        vpaddq_u16(reduced_three_bytes_bytemask, reduced_three_bytes_bytemask);
+
+    size_t ascii_count = count_ones(
+        vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask0), 0));
+    size_t two_bytes_count = count_ones(
+        vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask0), 1));
+    size_t three_bytes_count = count_ones(
+        vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask1), 0));
+
+    count += 16 - 3 * ascii_count - 2 * two_bytes_count - three_bytes_count;
+  }
+  return count +
+         scalar::utf32::utf8_length_from_utf32(input + pos, length - pos);
+}
+
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  const uint32x4_t v_ffff = vmovq_n_u32((uint32_t)0xffff);
+  const uint32x4_t v_1 = vmovq_n_u32((uint32_t)0x1);
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos + 4 <= length; pos += 4) {
+    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(input + pos));
+    const uint32x4_t surrogate_bytemask = vcgtq_u32(in, v_ffff);
+    const uint16x8_t reduced_bytemask =
+        vreinterpretq_u16_u32(vandq_u32(surrogate_bytemask, v_1));
+    const uint16x8_t compressed_bytemask =
+        vpaddq_u16(reduced_bytemask, reduced_bytemask);
+    size_t surrogate_count = count_ones(
+        vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask), 0));
+    count += 4 + surrogate_count;
+  }
+  return count +
+         scalar::utf32::utf16_length_from_utf32(input + pos, length - pos);
+}
+
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  return utf8::count_code_points(input, length);
+}
+
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char *input, size_t length) const noexcept {
+  return scalar::base64::maximal_binary_length_from_base64(input, length);
+}
+
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::base64::maximal_binary_length_from_base64(input, length);
+}
+
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
+simdutf_warn_unused size_t implementation::base64_length_from_binary(
+    size_t length, base64_options options) const noexcept {
+  return scalar::base64::base64_length_from_binary(length, options);
+}
+
+size_t implementation::binary_to_base64(const char *input, size_t length,
+                                        char *output,
+                                        base64_options options) const noexcept {
+  return encode_base64(output, input, length, options);
+}
+
+} // namespace arm64
+} // namespace simdutf
+
+/* begin file src/simdutf/arm64/end.h */
+/* end file src/simdutf/arm64/end.h */
+/* end file src/arm64/implementation.cpp */
+#endif
+#if SIMDUTF_IMPLEMENTATION_FALLBACK
+/* begin file src/fallback/implementation.cpp */
+/* begin file src/simdutf/fallback/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "fallback"
+// #define SIMDUTF_IMPLEMENTATION fallback
+/* end file src/simdutf/fallback/begin.h */
+
+
+
+
+
+
+
+
+#include <cstdint>
+#include <cstring>
+
+namespace simdutf {
+namespace fallback {
+
+simdutf_warn_unused int
+implementation::detect_encodings(const char *input,
+                                 size_t length) const noexcept {
+  // If there is a BOM, then we trust it.
+  auto bom_encoding = simdutf::BOM::check_bom(input, length);
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
+  }
+  // todo: reimplement as a one-pass algorithm.
+  int out = 0;
+  if (validate_utf8(input, length)) {
+    out |= encoding_type::UTF8;
+  }
+  if ((length % 2) == 0) {
+    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
+                         length / 2)) {
+      out |= encoding_type::UTF16_LE;
+    }
+  }
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      out |= encoding_type::UTF32_LE;
+    }
+  }
+  return out;
+}
+
+simdutf_warn_unused bool
+implementation::validate_utf8(const char *buf, size_t len) const noexcept {
+  return scalar::utf8::validate(buf, len);
+}
+
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return scalar::utf8::validate_with_errors(buf, len);
+}
+
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *buf, size_t len) const noexcept {
+  return scalar::ascii::validate(buf, len);
+}
+
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return scalar::ascii::validate_with_errors(buf, len);
+}
+
+simdutf_warn_unused bool
+implementation::validate_utf16le(const char16_t *buf,
+                                 size_t len) const noexcept {
+  return scalar::utf16::validate<endianness::LITTLE>(buf, len);
+}
+
+simdutf_warn_unused bool
+implementation::validate_utf16be(const char16_t *buf,
+                                 size_t len) const noexcept {
+  return scalar::utf16::validate<endianness::BIG>(buf, len);
+}
+
+simdutf_warn_unused result implementation::validate_utf16le_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  return scalar::utf16::validate_with_errors<endianness::LITTLE>(buf, len);
+}
+
+simdutf_warn_unused result implementation::validate_utf16be_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  return scalar::utf16::validate_with_errors<endianness::BIG>(buf, len);
+}
+
+simdutf_warn_unused bool
+implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
+  return scalar::utf32::validate(buf, len);
+}
+
+simdutf_warn_unused result implementation::validate_utf32_with_errors(
+    const char32_t *buf, size_t len) const noexcept {
+  return scalar::utf32::validate_with_errors(buf, len);
+}
+
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
+    const char *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::latin1_to_utf8::convert(buf, len, utf8_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::latin1_to_utf16::convert<endianness::LITTLE>(buf, len,
+                                                              utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::latin1_to_utf16::convert<endianness::BIG>(buf, len,
+                                                           utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::latin1_to_utf32::convert(buf, len, utf32_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf8_to_latin1::convert(buf, len, latin1_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf8_to_latin1::convert_with_errors(buf, len, latin1_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf8_to_latin1::convert_valid(buf, len, latin1_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf8_to_utf16::convert<endianness::LITTLE>(buf, len,
+                                                            utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf8_to_utf16::convert<endianness::BIG>(buf, len,
+                                                         utf16_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf8_to_utf16::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf16_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf8_to_utf16::convert_with_errors<endianness::BIG>(
+      buf, len, utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf8_to_utf16::convert_valid<endianness::LITTLE>(buf, len,
+                                                                  utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf8_to_utf16::convert_valid<endianness::BIG>(buf, len,
+                                                               utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf8_to_utf32::convert(buf, len, utf32_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf8_to_utf32::convert_with_errors(buf, len, utf32_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char *input, size_t size, char32_t *utf32_output) const noexcept {
+  return scalar::utf8_to_utf32::convert_valid(input, size, utf32_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf16_to_latin1::convert<endianness::LITTLE>(buf, len,
+                                                              latin1_output);
+}
 
-// 1 byte for length, 16 bytes for mask
-const uint8_t pack_1_2_3_utf8_bytes[256][17] = {
-    {12, 2, 3, 1, 6, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80},
-    {9, 6, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {11, 3, 1, 6, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {10, 0, 6, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {9, 2, 3, 1, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {8, 3, 1, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {7, 0, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {11, 2, 3, 1, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {8, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {10, 3, 1, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {9, 0, 7, 5, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {10, 2, 3, 1, 4, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {7, 4, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {9, 3, 1, 4, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {8, 0, 4, 10, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {9, 2, 3, 1, 6, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {6, 6, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {8, 3, 1, 6, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {7, 0, 6, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 2, 3, 1, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {3, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {5, 3, 1, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {4, 0, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {8, 2, 3, 1, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 3, 1, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 0, 7, 5, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 2, 3, 1, 4, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {4, 4, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {6, 3, 1, 4, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {5, 0, 4, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {11, 2, 3, 1, 6, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {8, 6, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {10, 3, 1, 6, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {9, 0, 6, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {8, 2, 3, 1, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 3, 1, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 0, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {10, 2, 3, 1, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {7, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {9, 3, 1, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {8, 0, 7, 5, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {9, 2, 3, 1, 4, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 4, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {8, 3, 1, 4, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {7, 0, 4, 11, 9, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {10, 2, 3, 1, 6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {7, 6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {9, 3, 1, 6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {8, 0, 6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {7, 2, 3, 1, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {6, 3, 1, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {5, 0, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {9, 2, 3, 1, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {6, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {8, 3, 1, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {7, 0, 7, 5, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {8, 2, 3, 1, 4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 3, 1, 4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 0, 4, 8, 14, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {9, 2, 3, 1, 6, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {6, 6, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {8, 3, 1, 6, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {7, 0, 6, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 2, 3, 1, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {3, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {5, 3, 1, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {4, 0, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {8, 2, 3, 1, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 3, 1, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 0, 7, 5, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 2, 3, 1, 4, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {4, 4, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {6, 3, 1, 4, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {5, 0, 4, 10, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {6, 2, 3, 1, 6, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {3, 6, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {5, 3, 1, 6, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {4, 0, 6, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {3, 2, 3, 1, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {0, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80, 0x80},
-    {2, 3, 1, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {1, 0, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80, 0x80},
-    {5, 2, 3, 1, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {2, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {4, 3, 1, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {3, 0, 7, 5, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {4, 2, 3, 1, 4, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {1, 4, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80, 0x80},
-    {3, 3, 1, 4, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {2, 0, 4, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {8, 2, 3, 1, 6, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 6, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 3, 1, 6, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 0, 6, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 2, 3, 1, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {2, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {4, 3, 1, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {3, 0, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {7, 2, 3, 1, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {4, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {6, 3, 1, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 0, 7, 5, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {6, 2, 3, 1, 4, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {3, 4, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {5, 3, 1, 4, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {4, 0, 4, 11, 9, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 2, 3, 1, 6, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {4, 6, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {6, 3, 1, 6, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 0, 6, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {4, 2, 3, 1, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {1, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80, 0x80},
-    {3, 3, 1, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {2, 0, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {6, 2, 3, 1, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {3, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {5, 3, 1, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {4, 0, 7, 5, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {5, 2, 3, 1, 4, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {2, 4, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {4, 3, 1, 4, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {3, 0, 4, 8, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {11, 2, 3, 1, 6, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {8, 6, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {10, 3, 1, 6, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {9, 0, 6, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {8, 2, 3, 1, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 3, 1, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 0, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {10, 2, 3, 1, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {7, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {9, 3, 1, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {8, 0, 7, 5, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {9, 2, 3, 1, 4, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 4, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {8, 3, 1, 4, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {7, 0, 4, 10, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {8, 2, 3, 1, 6, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 6, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 3, 1, 6, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 0, 6, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {5, 2, 3, 1, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {2, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80, 0x80},
-    {4, 3, 1, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {3, 0, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {7, 2, 3, 1, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {4, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {6, 3, 1, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {5, 0, 7, 5, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {6, 2, 3, 1, 4, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {3, 4, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {5, 3, 1, 4, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {4, 0, 4, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {10, 2, 3, 1, 6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {7, 6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {9, 3, 1, 6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {8, 0, 6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {7, 2, 3, 1, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {6, 3, 1, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {5, 0, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {9, 2, 3, 1, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {6, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {8, 3, 1, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {7, 0, 7, 5, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {8, 2, 3, 1, 4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 3, 1, 4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 0, 4, 11, 9, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {9, 2, 3, 1, 6, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {6, 6, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {8, 3, 1, 6, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {7, 0, 6, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 2, 3, 1, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {3, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {5, 3, 1, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {4, 0, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {8, 2, 3, 1, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 3, 1, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 0, 7, 5, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 2, 3, 1, 4, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {4, 4, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {6, 3, 1, 4, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {5, 0, 4, 8, 15, 13, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {10, 2, 3, 1, 6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {7, 6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {9, 3, 1, 6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {8, 0, 6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {7, 2, 3, 1, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {6, 3, 1, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {5, 0, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {9, 2, 3, 1, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {6, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {8, 3, 1, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {7, 0, 7, 5, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {8, 2, 3, 1, 4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 3, 1, 4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 0, 4, 10, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 2, 3, 1, 6, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {4, 6, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {6, 3, 1, 6, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 0, 6, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {4, 2, 3, 1, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {1, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80, 0x80},
-    {3, 3, 1, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {2, 0, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {6, 2, 3, 1, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {3, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {5, 3, 1, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {4, 0, 7, 5, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {5, 2, 3, 1, 4, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {2, 4, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {4, 3, 1, 4, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {3, 0, 4, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {9, 2, 3, 1, 6, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80},
-    {6, 6, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {8, 3, 1, 6, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {7, 0, 6, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 2, 3, 1, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {3, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {5, 3, 1, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {4, 0, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {8, 2, 3, 1, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 3, 1, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 0, 7, 5, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 2, 3, 1, 4, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {4, 4, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {6, 3, 1, 4, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {5, 0, 4, 11, 9, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {8, 2, 3, 1, 6, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 6, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {7, 3, 1, 6, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {6, 0, 6, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 2, 3, 1, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {2, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {4, 3, 1, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {3, 0, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {7, 2, 3, 1, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {4, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {6, 3, 1, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {5, 0, 7, 5, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {6, 2, 3, 1, 4, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80},
-    {3, 4, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80, 0x80},
-    {5, 3, 1, 4, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80},
-    {4, 0, 4, 8, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
-     0x80, 0x80}};
+simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf16_to_latin1::convert<endianness::BIG>(buf, len,
+                                                           latin1_output);
+}
+
+simdutf_warn_unused result
+implementation::convert_utf16le_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(
+      buf, len, latin1_output);
+}
+
+simdutf_warn_unused result
+implementation::convert_utf16be_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(
+      buf, len, latin1_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf16_to_latin1::convert_valid<endianness::LITTLE>(
+      buf, len, latin1_output);
+}
 
-} // namespace utf16_to_utf8
-} // namespace tables
-} // unnamed namespace
-} // namespace simdutf
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf16_to_latin1::convert_valid<endianness::BIG>(buf, len,
+                                                                 latin1_output);
+}
 
-#endif // SIMDUTF_UTF16_TO_UTF8_TABLES_H
-/* end file src/tables/utf16_to_utf8_tables.h */
-// End of tables.
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert<endianness::LITTLE>(buf, len,
+                                                            utf8_output);
+}
 
-// The scalar routines should be included once.
-/* begin file src/scalar/ascii.h */
-#ifndef SIMDUTF_ASCII_H
-#define SIMDUTF_ASCII_H
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert<endianness::BIG>(buf, len, utf8_output);
+}
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace ascii {
-#if SIMDUTF_IMPLEMENTATION_FALLBACK
-// Only used by the fallback kernel.
-inline simdutf_warn_unused bool validate(const char *buf, size_t len) noexcept {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  uint64_t pos = 0;
-  // process in blocks of 16 bytes when possible
-  for (; pos + 16 <= len; pos += 16) {
-    uint64_t v1;
-    std::memcpy(&v1, data + pos, sizeof(uint64_t));
-    uint64_t v2;
-    std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-    uint64_t v{v1 | v2};
-    if ((v & 0x8080808080808080) != 0) {
-      return false;
-    }
-  }
-  // process the tail byte-by-byte
-  for (; pos < len; pos++) {
-    if (data[pos] >= 0b10000000) {
-      return false;
-    }
-  }
-  return true;
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf8_output);
 }
-#endif
 
-inline simdutf_warn_unused result validate_with_errors(const char *buf,
-                                                       size_t len) noexcept {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  size_t pos = 0;
-  // process in blocks of 16 bytes when possible
-  for (; pos + 16 <= len; pos += 16) {
-    uint64_t v1;
-    std::memcpy(&v1, data + pos, sizeof(uint64_t));
-    uint64_t v2;
-    std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-    uint64_t v{v1 | v2};
-    if ((v & 0x8080808080808080) != 0) {
-      for (; pos < len; pos++) {
-        if (data[pos] >= 0b10000000) {
-          return result(error_code::TOO_LARGE, pos);
-        }
-      }
-    }
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
+      buf, len, utf8_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_valid<endianness::LITTLE>(buf, len,
+                                                                  utf8_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_valid<endianness::BIG>(buf, len,
+                                                               utf8_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf32_to_latin1::convert(buf, len, latin1_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf32_to_latin1::convert_with_errors(buf, len, latin1_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  return scalar::utf32_to_latin1::convert_valid(buf, len, latin1_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf32_to_utf8::convert(buf, len, utf8_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf32_to_utf8::convert_with_errors(buf, len, utf8_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf32_to_utf8::convert_valid(buf, len, utf8_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert<endianness::LITTLE>(buf, len,
+                                                             utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert<endianness::BIG>(buf, len,
+                                                          utf16_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf16_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
+      buf, len, utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_valid<endianness::LITTLE>(
+      buf, len, utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_valid<endianness::BIG>(buf, len,
+                                                                utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert<endianness::LITTLE>(buf, len,
+                                                             utf32_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert<endianness::BIG>(buf, len,
+                                                          utf32_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf32_output);
+}
+
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+      buf, len, utf32_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_valid<endianness::LITTLE>(
+      buf, len, utf32_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_valid<endianness::BIG>(buf, len,
+                                                                utf32_output);
+}
+
+void implementation::change_endianness_utf16(const char16_t *input,
+                                             size_t length,
+                                             char16_t *output) const noexcept {
+  scalar::utf16::change_endianness_utf16(input, length, output);
+}
+
+simdutf_warn_unused size_t implementation::count_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::count_code_points<endianness::LITTLE>(input, length);
+}
+
+simdutf_warn_unused size_t implementation::count_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::count_code_points<endianness::BIG>(input, length);
+}
+
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *input, size_t length) const noexcept {
+  return scalar::utf8::count_code_points(input, length);
+}
+
+simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
+    const char *buf, size_t len) const noexcept {
+  return scalar::utf8::count_code_points(buf, len);
+}
+
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf16(size_t length) const noexcept {
+  return scalar::utf16::latin1_length_from_utf16(length);
+}
+
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf32(size_t length) const noexcept {
+  return length;
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
+    const char *input, size_t length) const noexcept {
+  size_t answer = length;
+  size_t i = 0;
+  auto pop = [](uint64_t v) {
+    return (size_t)(((v >> 7) & UINT64_C(0x0101010101010101)) *
+                        UINT64_C(0x0101010101010101) >>
+                    56);
+  };
+  for (; i + 32 <= length; i += 32) {
+    uint64_t v;
+    memcpy(&v, input + i, 8);
+    answer += pop(v);
+    memcpy(&v, input + i + 8, sizeof(v));
+    answer += pop(v);
+    memcpy(&v, input + i + 16, sizeof(v));
+    answer += pop(v);
+    memcpy(&v, input + i + 24, sizeof(v));
+    answer += pop(v);
   }
-  // process the tail byte-by-byte
-  for (; pos < len; pos++) {
-    if (data[pos] >= 0b10000000) {
-      return result(error_code::TOO_LARGE, pos);
-    }
+  for (; i + 8 <= length; i += 8) {
+    uint64_t v;
+    memcpy(&v, input + i, sizeof(v));
+    answer += pop(v);
   }
-  return result(error_code::SUCCESS, pos);
+  for (; i + 1 <= length; i += 1) {
+    answer += static_cast<uint8_t>(input[i]) >> 7;
+  }
+  return answer;
 }
 
-} // namespace ascii
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::utf8_length_from_utf16<endianness::LITTLE>(input,
+                                                                   length);
+}
 
-#endif
-/* end file src/scalar/ascii.h */
-/* begin file src/scalar/latin1.h */
-#ifndef SIMDUTF_LATIN1_H
-#define SIMDUTF_LATIN1_H
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
+}
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace latin1 {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::utf32_length_from_utf16<endianness::LITTLE>(input,
+                                                                    length);
+}
 
-inline size_t utf32_length_from_latin1(size_t len) {
-  // We are not BOM aware.
-  return len; // a utf32 unit will always represent 1 latin1 character
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
 }
 
-inline size_t utf8_length_from_latin1(const char *buf, size_t len) {
-  const uint8_t *c = reinterpret_cast<const uint8_t *>(buf);
-  size_t answer = 0;
-  for (size_t i = 0; i < len; i++) {
-    if ((c[i] >> 7)) {
-      answer++;
-    }
-  }
-  return answer + len;
+simdutf_warn_unused size_t
+implementation::utf16_length_from_latin1(size_t length) const noexcept {
+  return scalar::latin1::utf16_length_from_latin1(length);
 }
 
-inline size_t utf16_length_from_latin1(size_t len) { return len; }
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  return scalar::utf8::utf16_length_from_utf8(input, length);
+}
 
-} // namespace latin1
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  return scalar::utf32::utf8_length_from_utf32(input, length);
+}
 
-#endif
-/* end file src/scalar/latin1.h */
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  return scalar::utf32::utf16_length_from_utf32(input, length);
+}
 
-/* begin file src/scalar/utf32_to_utf8/valid_utf32_to_utf8.h */
-#ifndef SIMDUTF_VALID_UTF32_TO_UTF8_H
-#define SIMDUTF_VALID_UTF32_TO_UTF8_H
+simdutf_warn_unused size_t
+implementation::utf32_length_from_latin1(size_t length) const noexcept {
+  return scalar::latin1::utf32_length_from_latin1(length);
+}
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf32_to_utf8 {
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  return scalar::utf8::count_code_points(input, length);
+}
 
-#if SIMDUTF_IMPLEMENTATION_FALLBACK || SIMDUTF_IMPLEMENTATION_PPC64
-// only used by the fallback and POWER kernel
-inline size_t convert_valid(const char32_t *buf, size_t len,
-                            char *utf8_output) {
-  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  size_t pos = 0;
-  char *start{utf8_output};
-  while (pos < len) {
-    // try to convert the next block of 2 ASCII characters
-    if (pos + 2 <=
-        len) { // if it is safe to read 8 more bytes, check that they are ascii
-      uint64_t v;
-      ::memcpy(&v, data + pos, sizeof(uint64_t));
-      if ((v & 0xFFFFFF80FFFFFF80) == 0) {
-        *utf8_output++ = char(buf[pos]);
-        *utf8_output++ = char(buf[pos + 1]);
-        pos += 2;
-        continue;
-      }
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char *input, size_t length) const noexcept {
+  return scalar::base64::maximal_binary_length_from_base64(input, length);
+}
+
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
+  }
+  size_t equallocation =
+      length; // location of the first padding character if any
+  size_t equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
+    }
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
+    }
+  }
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
+    }
+    return {SUCCESS, 0};
+  }
+  result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // additional checks
+    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
+    }
+  }
+  return r;
+}
+
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
+  }
+  size_t equallocation =
+      length; // location of the first padding character if any
+  size_t equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
+    }
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
+    }
+  }
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+    }
+    return {SUCCESS, 0, 0};
+  }
+  full_result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // additional checks
+    if ((r.output_count % 3 == 0) ||
+        ((r.output_count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation, r.output_count};
+    }
+  }
+  return r;
+}
+
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::base64::maximal_binary_length_from_base64(input, length);
+}
+
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
+  }
+  size_t equallocation =
+      length; // location of the first padding character if any
+  size_t equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
+    }
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
+    }
+  }
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
+    }
+    return {SUCCESS, 0};
+  }
+  result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // additional checks
+    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
+    }
+  }
+  return r;
+}
+
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
+  }
+  size_t equallocation =
+      length; // location of the first padding character if any
+  size_t equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
     }
-    uint32_t word = data[pos];
-    if ((word & 0xFFFFFF80) == 0) {
-      // will generate one UTF-8 bytes
-      *utf8_output++ = char(word);
-      pos++;
-    } else if ((word & 0xFFFFF800) == 0) {
-      // will generate two UTF-8 bytes
-      // we have 0b110XXXXX 0b10XXXXXX
-      *utf8_output++ = char((word >> 6) | 0b11000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
-    } else if ((word & 0xFFFF0000) == 0) {
-      // will generate three UTF-8 bytes
-      // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((word >> 12) | 0b11100000);
-      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
-    } else {
-      // will generate four UTF-8 bytes
-      // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((word >> 18) | 0b11110000);
-      *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
-      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
     }
   }
-  return utf8_output - start;
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+    }
+    return {SUCCESS, 0, 0};
+  }
+  full_result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // additional checks
+    if ((r.output_count % 3 == 0) ||
+        ((r.output_count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation, r.output_count};
+    }
+  }
+  return r;
 }
-#endif // SIMDUTF_IMPLEMENTATION_FALLBACK || SIMDUTF_IMPLEMENTATION_PPC64
 
-} // namespace utf32_to_utf8
-} // unnamed namespace
-} // namespace scalar
+simdutf_warn_unused size_t implementation::base64_length_from_binary(
+    size_t length, base64_options options) const noexcept {
+  return scalar::base64::base64_length_from_binary(length, options);
+}
+
+size_t implementation::binary_to_base64(const char *input, size_t length,
+                                        char *output,
+                                        base64_options options) const noexcept {
+  return scalar::base64::tail_encode_base64(output, input, length, options);
+}
+} // namespace fallback
 } // namespace simdutf
 
+/* begin file src/simdutf/fallback/end.h */
+/* end file src/simdutf/fallback/end.h */
+/* end file src/fallback/implementation.cpp */
 #endif
-/* end file src/scalar/utf32_to_utf8/valid_utf32_to_utf8.h */
-/* begin file src/scalar/utf32_to_utf8/utf32_to_utf8.h */
-#ifndef SIMDUTF_UTF32_TO_UTF8_H
-#define SIMDUTF_UTF32_TO_UTF8_H
+#if SIMDUTF_IMPLEMENTATION_ICELAKE
+/* begin file src/icelake/implementation.cpp */
+
 
+/* begin file src/simdutf/icelake/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "icelake"
+// #define SIMDUTF_IMPLEMENTATION icelake
+
+#if SIMDUTF_CAN_ALWAYS_RUN_ICELAKE
+// nothing needed.
+#else
+SIMDUTF_TARGET_ICELAKE
+#endif
+
+#if SIMDUTF_GCC11ORMORE // workaround for
+                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+// clang-format off
+SIMDUTF_DISABLE_GCC_WARNING(-Wmaybe-uninitialized)
+// clang-format on
+#endif // end of workaround
+/* end file src/simdutf/icelake/begin.h */
 namespace simdutf {
-namespace scalar {
+namespace icelake {
 namespace {
-namespace utf32_to_utf8 {
+#ifndef SIMDUTF_ICELAKE_H
+  #error "icelake.h must be included"
+#endif
+/* begin file src/icelake/icelake_utf8_common.inl.cpp */
+// Common procedures for both validating and non-validating conversions from
+// UTF-8.
+enum block_processing_mode { SIMDUTF_FULL, SIMDUTF_TAIL };
 
-inline size_t convert(const char32_t *buf, size_t len, char *utf8_output) {
-  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  size_t pos = 0;
-  char *start{utf8_output};
-  while (pos < len) {
-    // try to convert the next block of 2 ASCII characters
-    if (pos + 2 <=
-        len) { // if it is safe to read 8 more bytes, check that they are ascii
-      uint64_t v;
-      ::memcpy(&v, data + pos, sizeof(uint64_t));
-      if ((v & 0xFFFFFF80FFFFFF80) == 0) {
-        *utf8_output++ = char(buf[pos]);
-        *utf8_output++ = char(buf[pos + 1]);
-        pos += 2;
-        continue;
+using utf8_to_utf16_result = std::pair<const char *, char16_t *>;
+using utf8_to_utf32_result = std::pair<const char *, uint32_t *>;
+
+/*
+    process_block_utf8_to_utf16 converts up to 64 bytes from 'in' from UTF-8
+    to UTF-16. When tail = SIMDUTF_FULL, then the full input buffer (64 bytes)
+    might be used. When tail = SIMDUTF_TAIL, we take into account 'gap' which
+    indicates how many input bytes are relevant.
+
+    Returns true when the result is correct, otherwise it returns false.
+
+    The provided in and out pointers are advanced according to how many input
+    bytes have been processed, upon success.
+*/
+template <block_processing_mode tail, endianness big_endian>
+simdutf_really_inline bool
+process_block_utf8_to_utf16(const char *&in, char16_t *&out, size_t gap) {
+  // constants
+  __m512i mask_identity = _mm512_set_epi8(
+      63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46,
+      45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28,
+      27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9,
+      8, 7, 6, 5, 4, 3, 2, 1, 0);
+  __m512i mask_c0c0c0c0 = _mm512_set1_epi32(0xc0c0c0c0);
+  __m512i mask_80808080 = _mm512_set1_epi32(0x80808080);
+  __m512i mask_f0f0f0f0 = _mm512_set1_epi32(0xf0f0f0f0);
+  __m512i mask_dfdfdfdf_tail = _mm512_set_epi64(
+      0xffffdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf,
+      0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf,
+      0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf);
+  __m512i mask_c2c2c2c2 = _mm512_set1_epi32(0xc2c2c2c2);
+  __m512i mask_ffffffff = _mm512_set1_epi32(0xffffffff);
+  __m512i mask_d7c0d7c0 = _mm512_set1_epi32(0xd7c0d7c0);
+  __m512i mask_dc00dc00 = _mm512_set1_epi32(0xdc00dc00);
+  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  // Note that 'tail' is a compile-time constant!
+  __mmask64 b =
+      (tail == SIMDUTF_FULL) ? 0xFFFFFFFFFFFFFFFF : (uint64_t(1) << gap) - 1;
+  __m512i input = (tail == SIMDUTF_FULL) ? _mm512_loadu_si512(in)
+                                         : _mm512_maskz_loadu_epi8(b, in);
+  __mmask64 m1 = (tail == SIMDUTF_FULL)
+                     ? _mm512_cmplt_epu8_mask(input, mask_80808080)
+                     : _mm512_mask_cmplt_epu8_mask(b, input, mask_80808080);
+  if (_ktestc_mask64_u8(m1,
+                        b)) { // NOT(m1) AND b -- if all zeroes, then all ASCII
+                              // alternatively, we could do 'if (m1 == b) { '
+    if (tail == SIMDUTF_FULL) {
+      in += 64; // consumed 64 bytes
+      // we convert a full 64-byte block, writing 128 bytes.
+      __m512i input1 = _mm512_cvtepu8_epi16(_mm512_castsi512_si256(input));
+      if (big_endian) {
+        input1 = _mm512_shuffle_epi8(input1, byteflip);
       }
-    }
-    uint32_t word = data[pos];
-    if ((word & 0xFFFFFF80) == 0) {
-      // will generate one UTF-8 bytes
-      *utf8_output++ = char(word);
-      pos++;
-    } else if ((word & 0xFFFFF800) == 0) {
-      // will generate two UTF-8 bytes
-      // we have 0b110XXXXX 0b10XXXXXX
-      *utf8_output++ = char((word >> 6) | 0b11000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
-    } else if ((word & 0xFFFF0000) == 0) {
-      // will generate three UTF-8 bytes
-      // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
-      if (word >= 0xD800 && word <= 0xDFFF) {
-        return 0;
+      _mm512_storeu_si512(out, input1);
+      out += 32;
+      __m512i input2 =
+          _mm512_cvtepu8_epi16(_mm512_extracti64x4_epi64(input, 1));
+      if (big_endian) {
+        input2 = _mm512_shuffle_epi8(input2, byteflip);
       }
-      *utf8_output++ = char((word >> 12) | 0b11100000);
-      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
+      _mm512_storeu_si512(out, input2);
+      out += 32;
+      return true; // we are done
     } else {
-      // will generate four UTF-8 bytes
-      // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-      if (word > 0x10FFFF) {
-        return 0;
+      in += gap;
+      if (gap <= 32) {
+        __m512i input1 = _mm512_cvtepu8_epi16(_mm512_castsi512_si256(input));
+        if (big_endian) {
+          input1 = _mm512_shuffle_epi8(input1, byteflip);
+        }
+        _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << (gap)) - 1),
+                                 input1);
+        out += gap;
+      } else {
+        __m512i input1 = _mm512_cvtepu8_epi16(_mm512_castsi512_si256(input));
+        if (big_endian) {
+          input1 = _mm512_shuffle_epi8(input1, byteflip);
+        }
+        _mm512_storeu_si512(out, input1);
+        out += 32;
+        __m512i input2 =
+            _mm512_cvtepu8_epi16(_mm512_extracti64x4_epi64(input, 1));
+        if (big_endian) {
+          input2 = _mm512_shuffle_epi8(input2, byteflip);
+        }
+        _mm512_mask_storeu_epi16(
+            out, __mmask32((uint32_t(1) << (gap - 32)) - 1), input2);
+        out += gap - 32;
       }
-      *utf8_output++ = char((word >> 18) | 0b11110000);
-      *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
-      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
+      return true; // we are done
     }
   }
-  return utf8_output - start;
-}
+  // classify characters further
+  __mmask64 m234 = _mm512_cmp_epu8_mask(
+      mask_c0c0c0c0, input,
+      _MM_CMPINT_LE); // 0xc0 <= input, 2, 3, or 4 leading byte
+  __mmask64 m34 =
+      _mm512_cmp_epu8_mask(mask_dfdfdfdf_tail, input,
+                           _MM_CMPINT_LT); // 0xdf < input,  3 or 4 leading byte
 
-inline result convert_with_errors(const char32_t *buf, size_t len,
-                                  char *utf8_output) {
-  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  size_t pos = 0;
-  char *start{utf8_output};
-  while (pos < len) {
-    // try to convert the next block of 2 ASCII characters
-    if (pos + 2 <=
-        len) { // if it is safe to read 8 more bytes, check that they are ascii
-      uint64_t v;
-      ::memcpy(&v, data + pos, sizeof(uint64_t));
-      if ((v & 0xFFFFFF80FFFFFF80) == 0) {
-        *utf8_output++ = char(buf[pos]);
-        *utf8_output++ = char(buf[pos + 1]);
-        pos += 2;
-        continue;
-      }
-    }
-    uint32_t word = data[pos];
-    if ((word & 0xFFFFFF80) == 0) {
-      // will generate one UTF-8 bytes
-      *utf8_output++ = char(word);
-      pos++;
-    } else if ((word & 0xFFFFF800) == 0) {
-      // will generate two UTF-8 bytes
-      // we have 0b110XXXXX 0b10XXXXXX
-      *utf8_output++ = char((word >> 6) | 0b11000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
-    } else if ((word & 0xFFFF0000) == 0) {
-      // will generate three UTF-8 bytes
-      // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
-      if (word >= 0xD800 && word <= 0xDFFF) {
-        return result(error_code::SURROGATE, pos);
+  __mmask64 milltwobytes = _mm512_mask_cmp_epu8_mask(
+      m234, input, mask_c2c2c2c2,
+      _MM_CMPINT_LT); // 0xc0 <= input < 0xc2 (illegal two byte sequence)
+                      // Overlong 2-byte sequence
+  if (_ktestz_mask64_u8(milltwobytes, milltwobytes) == 0) {
+    // Overlong 2-byte sequence
+    return false;
+  }
+  if (_ktestz_mask64_u8(m34, m34) == 0) {
+    // We have a 3-byte sequence and/or a 2-byte sequence, or possibly even a
+    // 4-byte sequence!
+    __mmask64 m4 = _mm512_cmp_epu8_mask(
+        input, mask_f0f0f0f0,
+        _MM_CMPINT_NLT); // 0xf0 <= zmm0 (4 byte start bytes)
+
+    __mmask64 mask_not_ascii = (tail == SIMDUTF_FULL)
+                                   ? _knot_mask64(m1)
+                                   : _kand_mask64(_knot_mask64(m1), b);
+
+    __mmask64 mp1 = _kshiftli_mask64(m234, 1);
+    __mmask64 mp2 = _kshiftli_mask64(m34, 2);
+    // We could test this with _kortestz_mask64_u8(m4, m4), which computes the
+    // bitwise OR of two 64-bit masks and returns 1 if the result is all
+    // zeroes, but GCC generates better code when we compare against zero
+    // directly:
+    if (m4 == 0) { // true when the block contains no 4-byte lead byte
+                   // (0xf0 or above)
+      // Fast path with 1,2,3 bytes
+      __mmask64 mc = _kor_mask64(mp1, mp2); // expected continuation bytes
+      __mmask64 m1234 = _kor_mask64(m1, m234);
+      // mismatched continuation bytes:
+      if (tail == SIMDUTF_FULL) {
+        __mmask64 xnormcm1234 = _kxnor_mask64(
+            mc,
+            m1234); // XNOR of mc and m1234 should be all zero if they differ
+        // the presence of a 1 bit indicates that they overlap.
+        // _kortestz_mask64_u8: compute the bitwise OR of 64-bit masks and
+        // return 1 if all zeroes.
+        if (!_kortestz_mask64_u8(xnormcm1234, xnormcm1234)) {
+          return false;
+        }
+      } else {
+        __mmask64 bxorm1234 = _kxor_mask64(b, m1234);
+        if (mc != bxorm1234) {
+          return false;
+        }
       }
-      *utf8_output++ = char((word >> 12) | 0b11100000);
-      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
-    } else {
-      // will generate four UTF-8 bytes
-      // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-      if (word > 0x10FFFF) {
-        return result(error_code::TOO_LARGE, pos);
+      // mend: identifying the last bytes of each sequence to be decoded
+      __mmask64 mend = _kshiftri_mask64(m1234, 1);
+      if (tail != SIMDUTF_FULL) {
+        mend = _kor_mask64(mend, (uint64_t(1) << (gap - 1)));
       }
-      *utf8_output++ = char((word >> 18) | 0b11110000);
-      *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
-      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
-    }
-  }
-  return result(error_code::SUCCESS, utf8_output - start);
-}
 
-} // namespace utf32_to_utf8
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+      __m512i last_and_third = _mm512_maskz_compress_epi8(mend, mask_identity);
+      __m512i last_and_thirdu16 =
+          _mm512_cvtepu8_epi16(_mm512_castsi512_si256(last_and_third));
 
-#endif
-/* end file src/scalar/utf32_to_utf8/utf32_to_utf8.h */
+      __m512i nonasciitags = _mm512_maskz_mov_epi8(
+          mask_not_ascii, mask_c0c0c0c0); // ASCII: 00000000  other: 11000000
+      __m512i clearedbytes = _mm512_andnot_si512(
+          nonasciitags, input); // high two bits cleared where not ASCII
+      __m512i lastbytes = _mm512_maskz_permutexvar_epi8(
+          0x5555555555555555, last_and_thirdu16,
+          clearedbytes); // the last byte of each character
 
-/* begin file src/scalar/utf32_to_utf16/valid_utf32_to_utf16.h */
-#ifndef SIMDUTF_VALID_UTF32_TO_UTF16_H
-#define SIMDUTF_VALID_UTF32_TO_UTF16_H
+      __mmask64 mask_before_non_ascii = _kshiftri_mask64(
+          mask_not_ascii, 1); // bytes that precede non-ASCII bytes
+      __m512i indexofsecondlastbytes = _mm512_add_epi16(
+          mask_ffffffff, last_and_thirdu16); // indices of the second last bytes
+      __m512i beforeasciibytes =
+          _mm512_maskz_mov_epi8(mask_before_non_ascii, clearedbytes);
+      __m512i secondlastbytes = _mm512_maskz_permutexvar_epi8(
+          0x5555555555555555, indexofsecondlastbytes,
+          beforeasciibytes); // the second last bytes (of two, three byte seq,
+                             // surrogates)
+      secondlastbytes =
+          _mm512_slli_epi16(secondlastbytes, 6); // shifted into position
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf32_to_utf16 {
+      __m512i indexofthirdlastbytes = _mm512_add_epi16(
+          mask_ffffffff,
+          indexofsecondlastbytes); // indices of the third last bytes
+      __m512i thirdlastbyte =
+          _mm512_maskz_mov_epi8(m34,
+                                clearedbytes); // only those that are the third
+                                               // last byte of a sequence
+      __m512i thirdlastbytes = _mm512_maskz_permutexvar_epi8(
+          0x5555555555555555, indexofthirdlastbytes,
+          thirdlastbyte); // the third last bytes (of three byte sequences, hi
+                          // surrogate)
+      thirdlastbytes =
+          _mm512_slli_epi16(thirdlastbytes, 12); // shifted into position
+      __m512i Wout = _mm512_ternarylogic_epi32(lastbytes, secondlastbytes,
+                                               thirdlastbytes, 254);
+      // the elements of Wout excluding the last element if it happens to be a
+      // high surrogate:
 
-template <endianness big_endian>
-inline size_t convert_valid(const char32_t *buf, size_t len,
-                            char16_t *utf16_output) {
-  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  size_t pos = 0;
-  char16_t *start{utf16_output};
-  while (pos < len) {
-    uint32_t word = data[pos];
-    if ((word & 0xFFFF0000) == 0) {
-      // will not generate a surrogate pair
-      *utf16_output++ = !match_system(big_endian)
-                            ? char16_t(utf16::swap_bytes(uint16_t(word)))
-                            : char16_t(word);
-      pos++;
-    } else {
-      // will generate a surrogate pair
-      word -= 0x10000;
-      uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
-      uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
-      if (!match_system(big_endian)) {
-        high_surrogate = utf16::swap_bytes(high_surrogate);
-        low_surrogate = utf16::swap_bytes(low_surrogate);
+      __mmask64 mprocessed =
+          (tail == SIMDUTF_FULL)
+              ? _pdep_u64(0xFFFFFFFF, mend)
+              : _pdep_u64(
+                    0xFFFFFFFF,
+                    _kand_mask64(
+                        mend, b)); // we adjust mend at the end of the output.
+
+      // Encodings out of range...
+      {
+        // the location of 3-byte sequence start bytes in the input
+        __mmask64 m3 = m34 & (b ^ m4);
+        // code units in Wout corresponding to 3-byte sequences.
+        __mmask32 M3 = __mmask32(_pext_u64(m3 << 2, mend));
+        __m512i mask_08000800 = _mm512_set1_epi32(0x08000800);
+        __mmask32 Msmall800 =
+            _mm512_mask_cmplt_epu16_mask(M3, Wout, mask_08000800);
+        __m512i mask_d800d800 = _mm512_set1_epi32(0xd800d800);
+        __m512i Moutminusd800 = _mm512_sub_epi16(Wout, mask_d800d800);
+        __mmask32 M3s =
+            _mm512_mask_cmplt_epu16_mask(M3, Moutminusd800, mask_08000800);
+        if (_kor_mask32(Msmall800, M3s)) {
+          return false;
+        }
       }
-      *utf16_output++ = char16_t(high_surrogate);
-      *utf16_output++ = char16_t(low_surrogate);
-      pos++;
+      int64_t nout = _mm_popcnt_u64(mprocessed);
+      in += 64 - _lzcnt_u64(mprocessed);
+      if (big_endian) {
+        Wout = _mm512_shuffle_epi8(Wout, byteflip);
+      }
+      _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << nout) - 1), Wout);
+      out += nout;
+      return true; // ok
     }
-  }
-  return utf16_output - start;
-}
+    //
+    // We have a 4-byte sequence, this is the general case.
+    // Slow!
+    __mmask64 mp3 = _kshiftli_mask64(m4, 3);
+    __mmask64 mc =
+        _kor_mask64(_kor_mask64(mp1, mp2), mp3); // expected continuation bytes
+    __mmask64 m1234 = _kor_mask64(m1, m234);
 
-} // namespace utf32_to_utf16
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+    // mend: identifying the last bytes of each sequence to be decoded
+    __mmask64 mend =
+        _kor_mask64(_kshiftri_mask64(_kor_mask64(mp3, m1234), 1), mp3);
+    if (tail != SIMDUTF_FULL) {
+      mend = _kor_mask64(mend, __mmask64(uint64_t(1) << (gap - 1)));
+    }
+    __m512i last_and_third = _mm512_maskz_compress_epi8(mend, mask_identity);
+    __m512i last_and_thirdu16 =
+        _mm512_cvtepu8_epi16(_mm512_castsi512_si256(last_and_third));
 
-#endif
-/* end file src/scalar/utf32_to_utf16/valid_utf32_to_utf16.h */
-/* begin file src/scalar/utf32_to_utf16/utf32_to_utf16.h */
-#ifndef SIMDUTF_UTF32_TO_UTF16_H
-#define SIMDUTF_UTF32_TO_UTF16_H
+    __m512i nonasciitags = _mm512_maskz_mov_epi8(
+        mask_not_ascii, mask_c0c0c0c0); // ASCII: 00000000  other: 11000000
+    __m512i clearedbytes = _mm512_andnot_si512(
+        nonasciitags, input); // high two bits cleared where not ASCII
+    __m512i lastbytes = _mm512_maskz_permutexvar_epi8(
+        0x5555555555555555, last_and_thirdu16,
+        clearedbytes); // the last byte of each character
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf32_to_utf16 {
+    __mmask64 mask_before_non_ascii = _kshiftri_mask64(
+        mask_not_ascii, 1); // bytes that precede non-ASCII bytes
+    __m512i indexofsecondlastbytes = _mm512_add_epi16(
+        mask_ffffffff, last_and_thirdu16); // indices of the second last bytes
+    __m512i beforeasciibytes =
+        _mm512_maskz_mov_epi8(mask_before_non_ascii, clearedbytes);
+    __m512i secondlastbytes = _mm512_maskz_permutexvar_epi8(
+        0x5555555555555555, indexofsecondlastbytes,
+        beforeasciibytes); // the second last bytes (of two, three byte seq,
+                           // surrogates)
+    secondlastbytes =
+        _mm512_slli_epi16(secondlastbytes, 6); // shifted into position
 
-template <endianness big_endian>
-inline size_t convert(const char32_t *buf, size_t len, char16_t *utf16_output) {
-  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  size_t pos = 0;
-  char16_t *start{utf16_output};
-  while (pos < len) {
-    uint32_t word = data[pos];
-    if ((word & 0xFFFF0000) == 0) {
-      if (word >= 0xD800 && word <= 0xDFFF) {
-        return 0;
+    __m512i indexofthirdlastbytes = _mm512_add_epi16(
+        mask_ffffffff,
+        indexofsecondlastbytes); // indices of the third last bytes
+    __m512i thirdlastbyte = _mm512_maskz_mov_epi8(
+        m34,
+        clearedbytes); // only those that are the third last byte of a sequence
+    __m512i thirdlastbytes = _mm512_maskz_permutexvar_epi8(
+        0x5555555555555555, indexofthirdlastbytes,
+        thirdlastbyte); // the third last bytes (of three byte sequences, hi
+                        // surrogate)
+    thirdlastbytes =
+        _mm512_slli_epi16(thirdlastbytes, 12); // shifted into position
+    __m512i thirdsecondandlastbytes = _mm512_ternarylogic_epi32(
+        lastbytes, secondlastbytes, thirdlastbytes, 254);
+    uint64_t Mlo_uint64 = _pext_u64(mp3, mend);
+    __mmask32 Mlo = __mmask32(Mlo_uint64);
+    __mmask32 Mhi = __mmask32(Mlo_uint64 >> 1);
+    __m512i lo_surr_mask = _mm512_maskz_mov_epi16(
+        Mlo,
+        mask_dc00dc00); // lo surr: 1101110000000000, other:  0000000000000000
+    __m512i shifted4_thirdsecondandlastbytes =
+        _mm512_srli_epi16(thirdsecondandlastbytes,
+                          4); // hi surr: 00000WVUTSRQPNML  vuts = WVUTS - 1
+    __m512i tagged_lo_surrogates = _mm512_or_si512(
+        thirdsecondandlastbytes,
+        lo_surr_mask); // lo surr: 110111KJHGFEDCBA, other:  unchanged
+    __m512i Wout = _mm512_mask_add_epi16(
+        tagged_lo_surrogates, Mhi, shifted4_thirdsecondandlastbytes,
+        mask_d7c0d7c0); // hi sur: 110110vutsRQPNML, other:  unchanged
+    // the elements of Wout excluding the last element if it happens to be a
+    // high surrogate:
+    __mmask32 Mout = ~(Mhi & 0x80000000);
+    __mmask64 mprocessed =
+        (tail == SIMDUTF_FULL)
+            ? _pdep_u64(Mout, mend)
+            : _pdep_u64(
+                  Mout,
+                  _kand_mask64(mend,
+                               b)); // we adjust mend at the end of the output.
+
+    // mismatched continuation bytes:
+    if (tail == SIMDUTF_FULL) {
+      __mmask64 xnormcm1234 = _kxnor_mask64(
+          mc, m1234); // XNOR of mc and m1234 should be all zero if they differ
+      // the presence of a 1 bit indicates that they overlap.
+      // _kortestz_mask64_u8: compute the bitwise OR of 64-bit masks and return
+      // 1 if all zeroes.
+      if (!_kortestz_mask64_u8(xnormcm1234, xnormcm1234)) {
+        return false;
       }
-      // will not generate a surrogate pair
-      *utf16_output++ = !match_system(big_endian)
-                            ? char16_t(utf16::swap_bytes(uint16_t(word)))
-                            : char16_t(word);
     } else {
-      // will generate a surrogate pair
-      if (word > 0x10FFFF) {
-        return 0;
+      __mmask64 bxorm1234 = _kxor_mask64(b, m1234);
+      if (mc != bxorm1234) {
+        return false;
       }
-      word -= 0x10000;
-      uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
-      uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
-      if (!match_system(big_endian)) {
-        high_surrogate = utf16::swap_bytes(high_surrogate);
-        low_surrogate = utf16::swap_bytes(low_surrogate);
+    }
+    // Encodings out of range...
+    {
+      // the location of 3-byte sequence start bytes in the input
+      __mmask64 m3 = m34 & (b ^ m4);
+      // code units in Wout corresponding to 3-byte sequences.
+      __mmask32 M3 = __mmask32(_pext_u64(m3 << 2, mend));
+      __m512i mask_08000800 = _mm512_set1_epi32(0x08000800);
+      __mmask32 Msmall800 =
+          _mm512_mask_cmplt_epu16_mask(M3, Wout, mask_08000800);
+      __m512i mask_d800d800 = _mm512_set1_epi32(0xd800d800);
+      __m512i Moutminusd800 = _mm512_sub_epi16(Wout, mask_d800d800);
+      __mmask32 M3s =
+          _mm512_mask_cmplt_epu16_mask(M3, Moutminusd800, mask_08000800);
+      __m512i mask_04000400 = _mm512_set1_epi32(0x04000400);
+      __mmask32 M4s =
+          _mm512_mask_cmpge_epu16_mask(Mhi, Moutminusd800, mask_04000400);
+      if (!_kortestz_mask32_u8(M4s, _kor_mask32(Msmall800, M3s))) {
+        return false;
       }
-      *utf16_output++ = char16_t(high_surrogate);
-      *utf16_output++ = char16_t(low_surrogate);
     }
-    pos++;
+    in += 64 - _lzcnt_u64(mprocessed);
+    int64_t nout = _mm_popcnt_u64(mprocessed);
+    if (big_endian) {
+      Wout = _mm512_shuffle_epi8(Wout, byteflip);
+    }
+    _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << nout) - 1), Wout);
+    out += nout;
+    return true; // ok
   }
-  return utf16_output - start;
-}
-
-template <endianness big_endian>
-inline result convert_with_errors(const char32_t *buf, size_t len,
-                                  char16_t *utf16_output) {
-  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  size_t pos = 0;
-  char16_t *start{utf16_output};
-  while (pos < len) {
-    uint32_t word = data[pos];
-    if ((word & 0xFFFF0000) == 0) {
-      if (word >= 0xD800 && word <= 0xDFFF) {
-        return result(error_code::SURROGATE, pos);
-      }
-      // will not generate a surrogate pair
-      *utf16_output++ = !match_system(big_endian)
-                            ? char16_t(utf16::swap_bytes(uint16_t(word)))
-                            : char16_t(word);
-    } else {
-      // will generate a surrogate pair
-      if (word > 0x10FFFF) {
-        return result(error_code::TOO_LARGE, pos);
-      }
-      word -= 0x10000;
-      uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
-      uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
-      if (!match_system(big_endian)) {
-        high_surrogate = utf16::swap_bytes(high_surrogate);
-        low_surrogate = utf16::swap_bytes(low_surrogate);
-      }
-      *utf16_output++ = char16_t(high_surrogate);
-      *utf16_output++ = char16_t(low_surrogate);
+  // Fast path 2: all ASCII or 2 byte
+  __mmask64 continuation_or_ascii = (tail == SIMDUTF_FULL)
+                                        ? _knot_mask64(m234)
+                                        : _kand_mask64(_knot_mask64(m234), b);
+  // Besides the 0xc0 tag we subtract an extra 2, which we get back later from
+  // the continuation byte tags (2 << 6 == 0x80).
+  __m512i leading2byte = _mm512_maskz_sub_epi8(m234, input, mask_c2c2c2c2);
+  __mmask64 leading = (tail == SIMDUTF_FULL)
+                          ? _kor_mask64(m1, m234)
+                          : _kand_mask64(_kor_mask64(m1, m234),
+                                         b); // first bytes of each sequence
+  if (tail == SIMDUTF_FULL) {
+    __mmask64 xnor234leading =
+        _kxnor_mask64(_kshiftli_mask64(m234, 1), leading);
+    if (!_kortestz_mask64_u8(xnor234leading, xnor234leading)) {
+      return false;
+    }
+  } else {
+    __mmask64 bxorleading = _kxor_mask64(b, leading);
+    if (_kshiftli_mask64(m234, 1) != bxorleading) {
+      return false;
     }
-    pos++;
   }
-  return result(error_code::SUCCESS, utf16_output - start);
-}
+  //
+  if (tail == SIMDUTF_FULL) {
+    // In the two-byte/ASCII scenario, we are easily latency bound, so we want
+    // to increment the input buffer as quickly as possible.
+    // We process 32 bytes unless the byte at index 32 is a continuation byte,
+    // in which case we include it as well for a total of 33 bytes.
+    // Note that if x is an ASCII byte, then the following is false:
+    // int8_t(x) <= int8_t(0xc0) under two's complement.
+    in += 32;
+    if (int8_t(*in) <= int8_t(0xc0))
+      in++;
+    // The alternative is to do
+    // in += 64 - _lzcnt_u64(_pdep_u64(0xFFFFFFFF, continuation_or_ascii));
+    // but it requires loading the input, doing the mask computation, and
+    // converting back the mask to a general register. It just takes too long,
+    // leaving the processor likely to be idle.
+  } else {
+    in += 64 - _lzcnt_u64(_pdep_u64(0xFFFFFFFF, continuation_or_ascii));
+  }
+  __m512i lead = _mm512_maskz_compress_epi8(
+      leading, leading2byte); // will contain zero for ascii, and the data
+  lead = _mm512_cvtepu8_epi16(
+      _mm512_castsi512_si256(lead)); // ... zero extended into code units
+  __m512i follow = _mm512_maskz_compress_epi8(
+      continuation_or_ascii, input); // the last bytes of each sequence
+  follow = _mm512_cvtepu8_epi16(
+      _mm512_castsi512_si256(follow)); // ... zero extended into code units
+  lead = _mm512_slli_epi16(lead, 6);   // shifted into position
+  __m512i final = _mm512_add_epi16(follow, lead); // combining lead and follow
 
-} // namespace utf32_to_utf16
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+  if (big_endian) {
+    final = _mm512_shuffle_epi8(final, byteflip);
+  }
+  if (tail == SIMDUTF_FULL) {
+    // Next part is UTF-16 specific and can be generalized to UTF-32.
+    int nout = _mm_popcnt_u32(uint32_t(leading));
+    _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << nout) - 1), final);
+    out += nout; // UTF-8 to UTF-16 is only expansionary in this case.
+  } else {
+    int nout = int(_mm_popcnt_u64(_pdep_u64(0xFFFFFFFF, leading)));
+    _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << nout) - 1), final);
+    out += nout; // UTF-8 to UTF-16 is only expansionary in this case.
+  }
 
-#endif
-/* end file src/scalar/utf32_to_utf16/utf32_to_utf16.h */
+  return true; // we are fine.
+}
 
-/* begin file src/scalar/utf16_to_utf8/valid_utf16_to_utf8.h */
-#ifndef SIMDUTF_VALID_UTF16_TO_UTF8_H
-#define SIMDUTF_VALID_UTF16_TO_UTF8_H
+/*
+    utf32_to_utf16_masked converts `count` lower UTF-32 code units
+    from input `utf32` into UTF-16. It differs from utf32_to_utf16
+    in that it 'masks' the writes.
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf16_to_utf8 {
+    Returns how many 16-bit code units were stored.
 
+    byteflip is used for flipping 16-bit code units, and it should be
+        __m512i byteflip = _mm512_setr_epi64(
+            0x0607040502030001,
+            0x0e0f0c0d0a0b0809,
+            0x0607040502030001,
+            0x0e0f0c0d0a0b0809,
+            0x0607040502030001,
+            0x0e0f0c0d0a0b0809,
+            0x0607040502030001,
+            0x0e0f0c0d0a0b0809
+        );
+    We pass it to the (always inlined) function to encourage the compiler to
+    keep the value in a (constant) register.
+*/
 template <endianness big_endian>
-inline size_t convert_valid(const char16_t *buf, size_t len,
-                            char *utf8_output) {
-  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
-  size_t pos = 0;
-  char *start{utf8_output};
-  while (pos < len) {
-    // try to convert the next block of 4 ASCII characters
-    if (pos + 4 <=
-        len) { // if it is safe to read 8 more bytes, check that they are ascii
-      uint64_t v;
-      ::memcpy(&v, data + pos, sizeof(uint64_t));
-      if (!match_system(big_endian)) {
-        v = (v >> 8) | (v << (64 - 8));
-      }
-      if ((v & 0xFF80FF80FF80FF80) == 0) {
-        size_t final_pos = pos + 4;
-        while (pos < final_pos) {
-          *utf8_output++ = !match_system(big_endian)
-                               ? char(utf16::swap_bytes(buf[pos]))
-                               : char(buf[pos]);
-          pos++;
-        }
-        continue;
-      }
-    }
+simdutf_really_inline size_t utf32_to_utf16_masked(const __m512i byteflip,
+                                                   __m512i utf32,
+                                                   unsigned int count,
+                                                   char16_t *output) {
+
+  const __mmask16 valid = uint16_t((1 << count) - 1);
+  // 1. check if we have any surrogate pairs
+  const __m512i v_0000_ffff = _mm512_set1_epi32(0x0000ffff);
+  const __mmask16 sp_mask =
+      _mm512_mask_cmpgt_epu32_mask(valid, utf32, v_0000_ffff);
+
+  if (sp_mask == 0) {
+    if (big_endian) {
+      _mm256_mask_storeu_epi16(
+          (__m256i *)output, valid,
+          _mm256_shuffle_epi8(_mm512_cvtepi32_epi16(utf32),
+                              _mm512_castsi512_si256(byteflip)));
 
-    uint16_t word =
-        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if ((word & 0xFF80) == 0) {
-      // will generate one UTF-8 bytes
-      *utf8_output++ = char(word);
-      pos++;
-    } else if ((word & 0xF800) == 0) {
-      // will generate two UTF-8 bytes
-      // we have 0b110XXXXX 0b10XXXXXX
-      *utf8_output++ = char((word >> 6) | 0b11000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
-    } else if ((word & 0xF800) != 0xD800) {
-      // will generate three UTF-8 bytes
-      // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((word >> 12) | 0b11100000);
-      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
     } else {
-      // must be a surrogate pair
-      uint16_t diff = uint16_t(word - 0xD800);
-      if (pos + 1 >= len) {
-        return 0;
-      } // minimal bound checking
-      uint16_t next_word = !match_system(big_endian)
-                               ? utf16::swap_bytes(data[pos + 1])
-                               : data[pos + 1];
-      uint16_t diff2 = uint16_t(next_word - 0xDC00);
-      uint32_t value = (diff << 10) + diff2 + 0x10000;
-      // will generate four UTF-8 bytes
-      // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((value >> 18) | 0b11110000);
-      *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
-      *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
-      *utf8_output++ = char((value & 0b111111) | 0b10000000);
-      pos += 2;
+      _mm256_mask_storeu_epi16((__m256i *)output, valid,
+                               _mm512_cvtepi32_epi16(utf32));
     }
+    return count;
   }
-  return utf8_output - start;
-}
 
-} // namespace utf16_to_utf8
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+  {
+    // build surrogate pair code units in 32-bit lanes
 
-#endif
-/* end file src/scalar/utf16_to_utf8/valid_utf16_to_utf8.h */
-/* begin file src/scalar/utf16_to_utf8/utf16_to_utf8.h */
-#ifndef SIMDUTF_UTF16_TO_UTF8_H
-#define SIMDUTF_UTF16_TO_UTF8_H
+    //    t0 = 16 x [000000000000aaaa|aaaaaabbbbbbbbbb]
+    const __m512i v_0001_0000 = _mm512_set1_epi32(0x00010000);
+    const __m512i t0 = _mm512_sub_epi32(utf32, v_0001_0000);
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf16_to_utf8 {
+    //    t1 = 16 x [000000aaaaaaaaaa|bbbbbbbbbb000000]
+    const __m512i t1 = _mm512_slli_epi32(t0, 6);
 
-template <endianness big_endian>
-inline size_t convert(const char16_t *buf, size_t len, char *utf8_output) {
-  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
-  size_t pos = 0;
-  char *start{utf8_output};
-  while (pos < len) {
-    // try to convert the next block of 8 bytes
-    if (pos + 4 <=
-        len) { // if it is safe to read 8 more bytes, check that they are ascii
-      uint64_t v;
-      ::memcpy(&v, data + pos, sizeof(uint64_t));
-      if (!match_system(big_endian)) {
-        v = (v >> 8) | (v << (64 - 8));
-      }
-      if ((v & 0xFF80FF80FF80FF80) == 0) {
-        size_t final_pos = pos + 4;
-        while (pos < final_pos) {
-          *utf8_output++ = !match_system(big_endian)
-                               ? char(utf16::swap_bytes(buf[pos]))
-                               : char(buf[pos]);
-          pos++;
-        }
-        continue;
-      }
-    }
-    uint16_t word =
-        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if ((word & 0xFF80) == 0) {
-      // will generate one UTF-8 bytes
-      *utf8_output++ = char(word);
-      pos++;
-    } else if ((word & 0xF800) == 0) {
-      // will generate two UTF-8 bytes
-      // we have 0b110XXXXX 0b10XXXXXX
-      *utf8_output++ = char((word >> 6) | 0b11000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
-    } else if ((word & 0xF800) != 0xD800) {
-      // will generate three UTF-8 bytes
-      // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((word >> 12) | 0b11100000);
-      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
-    } else {
-      // must be a surrogate pair
-      if (pos + 1 >= len) {
-        return 0;
-      }
-      uint16_t diff = uint16_t(word - 0xD800);
-      if (diff > 0x3FF) {
-        return 0;
-      }
-      uint16_t next_word = !match_system(big_endian)
-                               ? utf16::swap_bytes(data[pos + 1])
-                               : data[pos + 1];
-      uint16_t diff2 = uint16_t(next_word - 0xDC00);
-      if (diff2 > 0x3FF) {
-        return 0;
-      }
-      uint32_t value = (diff << 10) + diff2 + 0x10000;
-      // will generate four UTF-8 bytes
-      // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((value >> 18) | 0b11110000);
-      *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
-      *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
-      *utf8_output++ = char((value & 0b111111) | 0b10000000);
-      pos += 2;
-    }
-  }
-  return utf8_output - start;
-}
+    //    t2 = 16 x [000000aaaaaaaaaa|aaaaaabbbbbbbbbb] -- copy hi word from
+    //    t1 to t0
+    //         0xe4 = (t1 and v_ffff_0000) or (t0 and not v_ffff_0000)
+    const __m512i v_ffff_0000 = _mm512_set1_epi32(0xffff0000);
+    const __m512i t2 = _mm512_ternarylogic_epi32(t1, t0, v_ffff_0000, 0xe4);
 
-template <endianness big_endian>
-inline result convert_with_errors(const char16_t *buf, size_t len,
-                                  char *utf8_output) {
-  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
-  size_t pos = 0;
-  char *start{utf8_output};
-  while (pos < len) {
-    // try to convert the next block of 8 bytes
-    if (pos + 4 <=
-        len) { // if it is safe to read 8 more bytes, check that they are ascii
-      uint64_t v;
-      ::memcpy(&v, data + pos, sizeof(uint64_t));
-      if (!match_system(big_endian))
-        v = (v >> 8) | (v << (64 - 8));
-      if ((v & 0xFF80FF80FF80FF80) == 0) {
-        size_t final_pos = pos + 4;
-        while (pos < final_pos) {
-          *utf8_output++ = !match_system(big_endian)
-                               ? char(utf16::swap_bytes(buf[pos]))
-                               : char(buf[pos]);
-          pos++;
-        }
-        continue;
-      }
-    }
-    uint16_t word =
-        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if ((word & 0xFF80) == 0) {
-      // will generate one UTF-8 bytes
-      *utf8_output++ = char(word);
-      pos++;
-    } else if ((word & 0xF800) == 0) {
-      // will generate two UTF-8 bytes
-      // we have 0b110XXXXX 0b10XXXXXX
-      *utf8_output++ = char((word >> 6) | 0b11000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
-    } else if ((word & 0xF800) != 0xD800) {
-      // will generate three UTF-8 bytes
-      // we have 0b1110XXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((word >> 12) | 0b11100000);
-      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-      *utf8_output++ = char((word & 0b111111) | 0b10000000);
-      pos++;
-    } else {
-      // must be a surrogate pair
-      if (pos + 1 >= len) {
-        return result(error_code::SURROGATE, pos);
-      }
-      uint16_t diff = uint16_t(word - 0xD800);
-      if (diff > 0x3FF) {
-        return result(error_code::SURROGATE, pos);
-      }
-      uint16_t next_word = !match_system(big_endian)
-                               ? utf16::swap_bytes(data[pos + 1])
-                               : data[pos + 1];
-      uint16_t diff2 = uint16_t(next_word - 0xDC00);
-      if (diff2 > 0x3FF) {
-        return result(error_code::SURROGATE, pos);
-      }
-      uint32_t value = (diff << 10) + diff2 + 0x10000;
-      // will generate four UTF-8 bytes
-      // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-      *utf8_output++ = char((value >> 18) | 0b11110000);
-      *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
-      *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
-      *utf8_output++ = char((value & 0b111111) | 0b10000000);
-      pos += 2;
+    //    t3 = 16 x [110110aaaaaaaaaa|110111bbbbbbbbbb] -- add the surrogate
+    //    prefixes 0xd800 (hi word) and 0xdc00 (lo word)
+    //         0xba = (t2 and not v_fc00_fc00) or v_d800_dc00
+    const __m512i v_fc00_fc00 = _mm512_set1_epi32(0xfc00fc00);
+    const __m512i v_d800_dc00 = _mm512_set1_epi32(0xd800dc00);
+    const __m512i t3 =
+        _mm512_ternarylogic_epi32(t2, v_fc00_fc00, v_d800_dc00, 0xba);
+    const __m512i t4 = _mm512_mask_blend_epi32(sp_mask, utf32, t3);
+    __m512i t5 = _mm512_ror_epi32(t4, 16);
+    // Here we want to trim all of the upper 16-bit code units from the 2-byte
+    // characters represented as 4-byte values. We can compute it from
+    // sp_mask or the following... It can be more optimized!
+    const __mmask32 nonzero = _kor_mask32(
+        0xaaaaaaaa, _mm512_cmpneq_epi16_mask(t5, _mm512_setzero_si512()));
+    const __mmask32 nonzero_masked =
+        _kand_mask32(nonzero, __mmask32((uint64_t(1) << (2 * count)) - 1));
+    if (big_endian) {
+      t5 = _mm512_shuffle_epi8(t5, byteflip);
     }
+    // we deliberately avoid _mm512_mask_compressstoreu_epi16 for portability
+    // (zen4)
+    __m512i compressed = _mm512_maskz_compress_epi16(nonzero_masked, t5);
+    _mm512_mask_storeu_epi16(
+        output,
+        (1 << (count + static_cast<unsigned int>(count_ones(sp_mask)))) - 1,
+        compressed);
+    //_mm512_mask_compressstoreu_epi16(output, nonzero_masked, t5);
   }
-  return result(error_code::SUCCESS, utf8_output - start);
-}
-
-} // namespace utf16_to_utf8
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
 
-#endif
-/* end file src/scalar/utf16_to_utf8/utf16_to_utf8.h */
+  return count + static_cast<unsigned int>(count_ones(sp_mask));
+}
 
-/* begin file src/scalar/utf16_to_utf32/valid_utf16_to_utf32.h */
-#ifndef SIMDUTF_VALID_UTF16_TO_UTF32_H
-#define SIMDUTF_VALID_UTF16_TO_UTF32_H
+/*
+    utf32_to_utf16 converts `count` lower UTF-32 code units from input
+    `utf32` into UTF-16. It may overflow, i.e. write past the converted output.
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf16_to_utf32 {
+    Returns how many 16-bit code units were stored.
 
+    byteflip is used for flipping 16-bit code units, and it should be
+        __m512i byteflip = _mm512_setr_epi64(
+            0x0607040502030001,
+            0x0e0f0c0d0a0b0809,
+            0x0607040502030001,
+            0x0e0f0c0d0a0b0809,
+            0x0607040502030001,
+            0x0e0f0c0d0a0b0809,
+            0x0607040502030001,
+            0x0e0f0c0d0a0b0809
+        );
+    We pass it to the (always inlined) function to encourage the compiler to
+    keep the value in a (constant) register.
+*/
 template <endianness big_endian>
-inline size_t convert_valid(const char16_t *buf, size_t len,
-                            char32_t *utf32_output) {
-  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
-  size_t pos = 0;
-  char32_t *start{utf32_output};
-  while (pos < len) {
-    uint16_t word =
-        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if ((word & 0xF800) != 0xD800) {
-      // No surrogate pair, extend 16-bit word to 32-bit word
-      *utf32_output++ = char32_t(word);
-      pos++;
+simdutf_really_inline size_t utf32_to_utf16(const __m512i byteflip,
+                                            __m512i utf32, unsigned int count,
+                                            char16_t *output) {
+  // check if we have any surrogate pairs
+  const __m512i v_0000_ffff = _mm512_set1_epi32(0x0000ffff);
+  const __mmask16 sp_mask = _mm512_cmpgt_epu32_mask(utf32, v_0000_ffff);
+
+  if (sp_mask == 0) {
+    // technically, it should be _mm256_storeu_epi16
+    if (big_endian) {
+      _mm256_storeu_si256(
+          (__m256i *)output,
+          _mm256_shuffle_epi8(_mm512_cvtepi32_epi16(utf32),
+                              _mm512_castsi512_si256(byteflip)));
     } else {
-      // must be a surrogate pair
-      uint16_t diff = uint16_t(word - 0xD800);
-      if (pos + 1 >= len) {
-        return 0;
-      } // minimal bound checking
-      uint16_t next_word = !match_system(big_endian)
-                               ? utf16::swap_bytes(data[pos + 1])
-                               : data[pos + 1];
-      uint16_t diff2 = uint16_t(next_word - 0xDC00);
-      uint32_t value = (diff << 10) + diff2 + 0x10000;
-      *utf32_output++ = char32_t(value);
-      pos += 2;
+      _mm256_storeu_si256((__m256i *)output, _mm512_cvtepi32_epi16(utf32));
     }
+    return count;
   }
-  return utf32_output - start;
-}
 
-} // namespace utf16_to_utf32
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+  {
+    // build surrogate pair code units in 32-bit lanes
 
-#endif
-/* end file src/scalar/utf16_to_utf32/valid_utf16_to_utf32.h */
-/* begin file src/scalar/utf16_to_utf32/utf16_to_utf32.h */
-#ifndef SIMDUTF_UTF16_TO_UTF32_H
-#define SIMDUTF_UTF16_TO_UTF32_H
+    //    t0 = 16 x [000000000000aaaa|aaaaaabbbbbbbbbb]
+    const __m512i v_0001_0000 = _mm512_set1_epi32(0x00010000);
+    const __m512i t0 = _mm512_sub_epi32(utf32, v_0001_0000);
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf16_to_utf32 {
+    //    t1 = 16 x [000000aaaaaaaaaa|bbbbbbbbbb000000]
+    const __m512i t1 = _mm512_slli_epi32(t0, 6);
 
-template <endianness big_endian>
-inline size_t convert(const char16_t *buf, size_t len, char32_t *utf32_output) {
-  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
-  size_t pos = 0;
-  char32_t *start{utf32_output};
-  while (pos < len) {
-    uint16_t word =
-        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if ((word & 0xF800) != 0xD800) {
-      // No surrogate pair, extend 16-bit word to 32-bit word
-      *utf32_output++ = char32_t(word);
-      pos++;
-    } else {
-      // must be a surrogate pair
-      uint16_t diff = uint16_t(word - 0xD800);
-      if (diff > 0x3FF) {
-        return 0;
-      }
-      if (pos + 1 >= len) {
-        return 0;
-      } // minimal bound checking
-      uint16_t next_word = !match_system(big_endian)
-                               ? utf16::swap_bytes(data[pos + 1])
-                               : data[pos + 1];
-      uint16_t diff2 = uint16_t(next_word - 0xDC00);
-      if (diff2 > 0x3FF) {
-        return 0;
-      }
-      uint32_t value = (diff << 10) + diff2 + 0x10000;
-      *utf32_output++ = char32_t(value);
-      pos += 2;
+    //    t2 = 16 x [000000aaaaaaaaaa|aaaaaabbbbbbbbbb] -- copy hi word from
+    //    t1 to t0
+    //         0xe4 = (t1 and v_ffff_0000) or (t0 and not v_ffff_0000)
+    const __m512i v_ffff_0000 = _mm512_set1_epi32(0xffff0000);
+    const __m512i t2 = _mm512_ternarylogic_epi32(t1, t0, v_ffff_0000, 0xe4);
+
+    //    t3 = 16 x [110110aaaaaaaaaa|110111bbbbbbbbbb] -- add the surrogate
+    //    prefixes 0xd800 (hi word) and 0xdc00 (lo word)
+    //         0xba = (t2 and not v_fc00_fc00) or v_d800_dc00
+    const __m512i v_fc00_fc00 = _mm512_set1_epi32(0xfc00fc00);
+    const __m512i v_d800_dc00 = _mm512_set1_epi32(0xd800dc00);
+    const __m512i t3 =
+        _mm512_ternarylogic_epi32(t2, v_fc00_fc00, v_d800_dc00, 0xba);
+    const __m512i t4 = _mm512_mask_blend_epi32(sp_mask, utf32, t3);
+    __m512i t5 = _mm512_ror_epi32(t4, 16);
+    const __mmask32 nonzero = _kor_mask32(
+        0xaaaaaaaa, _mm512_cmpneq_epi16_mask(t5, _mm512_setzero_si512()));
+    if (big_endian) {
+      t5 = _mm512_shuffle_epi8(t5, byteflip);
     }
+    // we deliberately avoid _mm512_mask_compressstoreu_epi16 for portability
+    // (zen4)
+    __m512i compressed = _mm512_maskz_compress_epi16(nonzero, t5);
+    _mm512_mask_storeu_epi16(
+        output,
+        (1 << (count + static_cast<unsigned int>(count_ones(sp_mask)))) - 1,
+        compressed);
+    //_mm512_mask_compressstoreu_epi16(output, nonzero, t5);
   }
-  return utf32_output - start;
+
+  return count + static_cast<unsigned int>(count_ones(sp_mask));
 }
 
-template <endianness big_endian>
-inline result convert_with_errors(const char16_t *buf, size_t len,
-                                  char32_t *utf32_output) {
-  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
-  size_t pos = 0;
-  char32_t *start{utf32_output};
-  while (pos < len) {
-    uint16_t word =
-        !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if ((word & 0xF800) != 0xD800) {
-      // No surrogate pair, extend 16-bit word to 32-bit word
-      *utf32_output++ = char32_t(word);
-      pos++;
-    } else {
-      // must be a surrogate pair
-      uint16_t diff = uint16_t(word - 0xD800);
-      if (diff > 0x3FF) {
-        return result(error_code::SURROGATE, pos);
-      }
-      if (pos + 1 >= len) {
-        return result(error_code::SURROGATE, pos);
-      } // minimal bound checking
-      uint16_t next_word = !match_system(big_endian)
-                               ? utf16::swap_bytes(data[pos + 1])
-                               : data[pos + 1];
-      uint16_t diff2 = uint16_t(next_word - 0xDC00);
-      if (diff2 > 0x3FF) {
-        return result(error_code::SURROGATE, pos);
-      }
-      uint32_t value = (diff << 10) + diff2 + 0x10000;
-      *utf32_output++ = char32_t(value);
-      pos += 2;
-    }
+/**
+ * Return the last N bytes of previous followed by the first 64-N bytes of input.
+ */
+template <int N> __m512i prev(__m512i input, __m512i previous) {
+  static_assert(N <= 32, "N must be no larger than 32");
+  const __m512i movemask =
+      _mm512_setr_epi32(28, 29, 30, 31, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+  const __m512i rotated = _mm512_permutex2var_epi32(input, movemask, previous);
+#if SIMDUTF_GCC8 || SIMDUTF_GCC9
+  constexpr int shift = 16 - N; // workaround for GCC8,9
+  return _mm512_alignr_epi8(input, rotated, shift);
+#else
+  return _mm512_alignr_epi8(input, rotated, 16 - N);
+#endif // SIMDUTF_GCC8 || SIMDUTF_GCC9
+}
+
+template <unsigned idx0, unsigned idx1, unsigned idx2, unsigned idx3>
+__m512i shuffle_epi128(__m512i v) {
+  static_assert((idx0 >= 0 && idx0 <= 3), "idx0 must be in range 0..3");
+  static_assert((idx1 >= 0 && idx1 <= 3), "idx1 must be in range 0..3");
+  static_assert((idx2 >= 0 && idx2 <= 3), "idx2 must be in range 0..3");
+  static_assert((idx3 >= 0 && idx3 <= 3), "idx3 must be in range 0..3");
+
+  constexpr unsigned shuffle = idx0 | (idx1 << 2) | (idx2 << 4) | (idx3 << 6);
+  return _mm512_shuffle_i32x4(v, v, shuffle);
+}
+
+template <unsigned idx> constexpr __m512i broadcast_epi128(__m512i v) {
+  return shuffle_epi128<idx, idx, idx, idx>(v);
+}
+
+/**
+ * Currently unused.
+ */
+template <int N> __m512i rotate_by_N_epi8(const __m512i input) {
+
+  // lanes order: 1, 2, 3, 0 => 0b00_11_10_01
+  const __m512i permuted = _mm512_shuffle_i32x4(input, input, 0x39);
+
+  return _mm512_alignr_epi8(permuted, input, N);
+}
+
+/*
+    expanded_utf8_to_utf32 converts expanded UTF-8 characters (`utf8`),
+    stored in separate 32-bit lanes, into UTF-32.
+
+    For each lane we also have a character class (`char_class`), given in the
+    form 0x8080800N, where N is the 4 highest bits of the leading byte; 0x80
+    resets the corresponding bytes during pshufb.
+*/
+simdutf_really_inline __m512i expanded_utf8_to_utf32(__m512i char_class,
+                                                     __m512i utf8) {
+  /*
+      Input:
+      - utf8: bytes stored in separate 32-bit lanes
+      - char_class: the character class of each lane (see above)
+
+      Bit layout of a single lane. We show 4 cases, one for each possible
+      UTF-8 character encoding. The `?` denotes bits whose value we must
+      not assume.
+
+      |10dd.dddd|10cc.cccc|10bb.bbbb|1111.0aaa| 4-byte char
+      |????.????|10cc.cccc|10bb.bbbb|1110.aaaa| 3-byte char
+      |????.????|????.????|10bb.bbbb|110a.aaaa| 2-byte char
+      |????.????|????.????|????.????|0aaa.aaaa| ASCII char
+        byte 3    byte 2    byte 1     byte 0
+  */
+
+  /* 1. Reset control bits of continuation bytes and the MSB
+        of the leading byte; this makes all bytes unsigned (and
+        does not alter ASCII char).
+
+      |00dd.dddd|00cc.cccc|00bb.bbbb|0111.0aaa| 4-byte char
+      |00??.????|00cc.cccc|00bb.bbbb|0110.aaaa| 3-byte char
+      |00??.????|00??.????|00bb.bbbb|010a.aaaa| 2-byte char
+      |00??.????|00??.????|00??.????|0aaa.aaaa| ASCII char
+       ^^        ^^        ^^        ^
+  */
+  __m512i values;
+  const __m512i v_3f3f_3f7f = _mm512_set1_epi32(0x3f3f3f7f);
+  values = _mm512_and_si512(utf8, v_3f3f_3f7f);
+
+  /* 2. Swap and join fields A-B and C-D
+
+      |0000.cccc|ccdd.dddd|0001.110a|aabb.bbbb| 4-byte char
+      |0000.cccc|cc??.????|0001.10aa|aabb.bbbb| 3-byte char
+      |0000.????|????.????|0001.0aaa|aabb.bbbb| 2-byte char
+      |0000.????|????.????|000a.aaaa|aa??.????| ASCII char */
+  const __m512i v_0140_0140 = _mm512_set1_epi32(0x01400140);
+  values = _mm512_maddubs_epi16(values, v_0140_0140);
+
+  /* 3. Swap and join fields AB & CD
+
+      |0000.0001|110a.aabb|bbbb.cccc|ccdd.dddd| 4-byte char
+      |0000.0001|10aa.aabb|bbbb.cccc|cc??.????| 3-byte char
+      |0000.0001|0aaa.aabb|bbbb.????|????.????| 2-byte char
+      |0000.000a|aaaa.aa??|????.????|????.????| ASCII char */
+  const __m512i v_0001_1000 = _mm512_set1_epi32(0x00011000);
+  values = _mm512_madd_epi16(values, v_0001_1000);
+
+  /* 4. Shift left the values by variable amounts to reset highest UTF-8 bits
+      |aaab.bbbb|bccc.cccd|dddd.d000|0000.0000| 4-byte char -- by 11
+      |aaaa.bbbb|bbcc.cccc|????.??00|0000.0000| 3-byte char -- by 10
+      |aaaa.abbb|bbb?.????|????.???0|0000.0000| 2-byte char -- by 9
+      |aaaa.aaa?|????.????|????.????|?000.0000| ASCII char -- by 7 */
+  {
+    /** pshufb
+
+    continuation = 0
+    ascii    = 7
+    _2_bytes = 9
+    _3_bytes = 10
+    _4_bytes = 11
+
+    shift_left_v3 = 4 * [
+        ascii, # 0000
+        ascii, # 0001
+        ascii, # 0010
+        ascii, # 0011
+        ascii, # 0100
+        ascii, # 0101
+        ascii, # 0110
+        ascii, # 0111
+        continuation, # 1000
+        continuation, # 1001
+        continuation, # 1010
+        continuation, # 1011
+        _2_bytes, # 1100
+        _2_bytes, # 1101
+        _3_bytes, # 1110
+        _4_bytes, # 1111
+    ] */
+    const __m512i shift_left_v3 = _mm512_setr_epi64(
+        0x0707070707070707, 0x0b0a090900000000, 0x0707070707070707,
+        0x0b0a090900000000, 0x0707070707070707, 0x0b0a090900000000,
+        0x0707070707070707, 0x0b0a090900000000);
+
+    const __m512i shift = _mm512_shuffle_epi8(shift_left_v3, char_class);
+    values = _mm512_sllv_epi32(values, shift);
   }
-  return result(error_code::SUCCESS, utf32_output - start);
+
+  /* 5. Shift right the values by variable amounts to reset lowest bits
+      |0000.0000|000a.aabb|bbbb.cccc|ccdd.dddd| 4-byte char -- by 11
+      |0000.0000|0000.0000|aaaa.bbbb|bbcc.cccc| 3-byte char -- by 16
+      |0000.0000|0000.0000|0000.0aaa|aabb.bbbb| 2-byte char -- by 21
+      |0000.0000|0000.0000|0000.0000|0aaa.aaaa| ASCII char -- by 25 */
+  {
+    // 4 * [25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 21, 21, 16, 11]
+    const __m512i shift_right = _mm512_setr_epi64(
+        0x1919191919191919, 0x0b10151500000000, 0x1919191919191919,
+        0x0b10151500000000, 0x1919191919191919, 0x0b10151500000000,
+        0x1919191919191919, 0x0b10151500000000);
+
+    const __m512i shift = _mm512_shuffle_epi8(shift_right, char_class);
+    values = _mm512_srlv_epi32(values, shift);
+  }
+
+  return values;
 }
 
-} // namespace utf16_to_utf32
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+simdutf_really_inline __m512i expand_and_identify(__m512i lane0, __m512i lane1,
+                                                  int &count) {
+  const __m512i merged = _mm512_mask_mov_epi32(lane0, 0x1000, lane1);
+  const __m512i expand_ver2 = _mm512_setr_epi64(
+      0x0403020103020100, 0x0605040305040302, 0x0807060507060504,
+      0x0a09080709080706, 0x0c0b0a090b0a0908, 0x0e0d0c0b0d0c0b0a,
+      0x000f0e0d0f0e0d0c, 0x0201000f01000f0e);
+  const __m512i input = _mm512_shuffle_epi8(merged, expand_ver2);
+  const __m512i v_0000_00c0 = _mm512_set1_epi32(0xc0);
+  const __m512i t0 = _mm512_and_si512(input, v_0000_00c0);
+  const __m512i v_0000_0080 = _mm512_set1_epi32(0x80);
+  const __mmask16 leading_bytes = _mm512_cmpneq_epu32_mask(t0, v_0000_0080);
+  count = static_cast<int>(count_ones(leading_bytes));
+  return _mm512_mask_compress_epi32(_mm512_setzero_si512(), leading_bytes,
+                                    input);
+}
 
-#endif
-/* end file src/scalar/utf16_to_utf32/utf16_to_utf32.h */
+simdutf_really_inline __m512i expand_utf8_to_utf32(__m512i input) {
+  __m512i char_class = _mm512_srli_epi32(input, 4);
+  /*  char_class = ((input >> 4) & 0x0f) | 0x80808000 */
+  const __m512i v_0000_000f = _mm512_set1_epi32(0x0f);
+  const __m512i v_8080_8000 = _mm512_set1_epi32(0x80808000);
+  char_class =
+      _mm512_ternarylogic_epi32(char_class, v_0000_000f, v_8080_8000, 0xea);
+  return expanded_utf8_to_utf32(char_class, input);
+}
+/* end file src/icelake/icelake_utf8_common.inl.cpp */
+/* begin file src/icelake/icelake_macros.inl.cpp */
 
-/* begin file src/scalar/utf8_to_utf16/valid_utf8_to_utf16.h */
-#ifndef SIMDUTF_VALID_UTF8_TO_UTF16_H
-#define SIMDUTF_VALID_UTF8_TO_UTF16_H
+/*
+    This upcoming macro (SIMDUTF_ICELAKE_TRANSCODE16) takes 16 + 4 bytes (of a
+    UTF-8 string) and loads all possible 4-byte substrings into an AVX512
+    register.
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf8_to_utf16 {
+    For example, given bytes abcdefgh... we create the following 32-bit lanes
 
-template <endianness big_endian>
-inline size_t convert_valid(const char *buf, size_t len,
-                            char16_t *utf16_output) {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  size_t pos = 0;
-  char16_t *start{utf16_output};
-  while (pos < len) {
-    // try to convert the next block of 8 ASCII bytes
-    if (pos + 8 <=
-        len) { // if it is safe to read 8 more bytes, check that they are ascii
-      uint64_t v;
-      ::memcpy(&v, data + pos, sizeof(uint64_t));
-      if ((v & 0x8080808080808080) == 0) {
-        size_t final_pos = pos + 8;
-        while (pos < final_pos) {
-          *utf16_output++ = !match_system(big_endian)
-                                ? char16_t(utf16::swap_bytes(buf[pos]))
-                                : char16_t(buf[pos]);
-          pos++;
-        }
-        continue;
-      }
-    }
-    uint8_t leading_byte = data[pos]; // leading byte
-    if (leading_byte < 0b10000000) {
-      // converting one ASCII byte !!!
-      *utf16_output++ = !match_system(big_endian)
-                            ? char16_t(utf16::swap_bytes(leading_byte))
-                            : char16_t(leading_byte);
-      pos++;
-    } else if ((leading_byte & 0b11100000) == 0b11000000) {
-      // We have a two-byte UTF-8, it should become
-      // a single UTF-16 word.
-      if (pos + 1 >= len) {
-        break;
-      } // minimal bound checking
-      uint16_t code_point = uint16_t(((leading_byte & 0b00011111) << 6) |
-                                     (data[pos + 1] & 0b00111111));
-      if (!match_system(big_endian)) {
-        code_point = utf16::swap_bytes(uint16_t(code_point));
-      }
-      *utf16_output++ = char16_t(code_point);
-      pos += 2;
-    } else if ((leading_byte & 0b11110000) == 0b11100000) {
-      // We have a three-byte UTF-8, it should become
-      // a single UTF-16 word.
-      if (pos + 2 >= len) {
-        break;
-      } // minimal bound checking
-      uint16_t code_point = uint16_t(((leading_byte & 0b00001111) << 12) |
-                                     ((data[pos + 1] & 0b00111111) << 6) |
-                                     (data[pos + 2] & 0b00111111));
-      if (!match_system(big_endian)) {
-        code_point = utf16::swap_bytes(uint16_t(code_point));
-      }
-      *utf16_output++ = char16_t(code_point);
-      pos += 3;
-    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
-      // we have a 4-byte UTF-8 word.
-      if (pos + 3 >= len) {
-        break;
-      } // minimal bound checking
-      uint32_t code_point = ((leading_byte & 0b00000111) << 18) |
-                            ((data[pos + 1] & 0b00111111) << 12) |
-                            ((data[pos + 2] & 0b00111111) << 6) |
-                            (data[pos + 3] & 0b00111111);
-      code_point -= 0x10000;
-      uint16_t high_surrogate = uint16_t(0xD800 + (code_point >> 10));
-      uint16_t low_surrogate = uint16_t(0xDC00 + (code_point & 0x3FF));
-      if (!match_system(big_endian)) {
-        high_surrogate = utf16::swap_bytes(high_surrogate);
-        low_surrogate = utf16::swap_bytes(low_surrogate);
-      }
-      *utf16_output++ = char16_t(high_surrogate);
-      *utf16_output++ = char16_t(low_surrogate);
-      pos += 4;
-    } else {
-      // we may have a continuation but we do not do error checking
-      return 0;
-    }
+    [abcd|bcde|cdef|defg|efgh|...]
+     ^                          ^
+     byte 0 of reg              byte 63 of reg
+*/
+/** pshufb
+        # lane{0,1,2} have got bytes:
+        #   [ 0,  1,  2,  3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
+        # lane3: [16, 17, 18, 19, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
+
+        expand_ver2 = [
+            # lane 0:
+            0, 1, 2, 3,
+            1, 2, 3, 4,
+            2, 3, 4, 5,
+            3, 4, 5, 6,
+
+            # lane 1:
+            4, 5, 6, 7,
+            5, 6, 7, 8,
+            6, 7, 8, 9,
+            7, 8, 9, 10,
+
+            # lane 2:
+             8,  9, 10, 11,
+             9, 10, 11, 12,
+            10, 11, 12, 13,
+            11, 12, 13, 14,
+
+            # lane 3 order: 13, 14, 15, 16 14, 15, 16, 17, 15, 16, 17, 18, 16, 17, 18, 19
+            12, 13, 14, 15, 13, 14, 15,  0, 14, 15,  0,  1, 15,  0,  1,  2,
+        ]
+*/
+
+#define SIMDUTF_ICELAKE_TRANSCODE16(LANE0, LANE1, MASKED)                      \
+  {                                                                            \
+    const __m512i merged = _mm512_mask_mov_epi32(LANE0, 0x1000, LANE1);        \
+    const __m512i expand_ver2 = _mm512_setr_epi64(                             \
+        0x0403020103020100, 0x0605040305040302, 0x0807060507060504,            \
+        0x0a09080709080706, 0x0c0b0a090b0a0908, 0x0e0d0c0b0d0c0b0a,            \
+        0x000f0e0d0f0e0d0c, 0x0201000f01000f0e);                               \
+    const __m512i input = _mm512_shuffle_epi8(merged, expand_ver2);            \
+                                                                               \
+    __mmask16 leading_bytes;                                                   \
+    const __m512i v_0000_00c0 = _mm512_set1_epi32(0xc0);                       \
+    const __m512i t0 = _mm512_and_si512(input, v_0000_00c0);                   \
+    const __m512i v_0000_0080 = _mm512_set1_epi32(0x80);                       \
+    leading_bytes = _mm512_cmpneq_epu32_mask(t0, v_0000_0080);                 \
+                                                                               \
+    __m512i char_class;                                                        \
+    char_class = _mm512_srli_epi32(input, 4);                                  \
+    /*  char_class = ((input >> 4) & 0x0f) | 0x80808000 */                     \
+    const __m512i v_0000_000f = _mm512_set1_epi32(0x0f);                       \
+    const __m512i v_8080_8000 = _mm512_set1_epi32(0x80808000);                 \
+    char_class =                                                               \
+        _mm512_ternarylogic_epi32(char_class, v_0000_000f, v_8080_8000, 0xea); \
+                                                                               \
+    const int valid_count = static_cast<int>(count_ones(leading_bytes));       \
+    const __m512i utf32 = expanded_utf8_to_utf32(char_class, input);           \
+                                                                               \
+    const __m512i out = _mm512_mask_compress_epi32(_mm512_setzero_si512(),     \
+                                                   leading_bytes, utf32);      \
+                                                                               \
+    if (UTF32) {                                                               \
+      if (MASKED) {                                                            \
+        const __mmask16 valid = uint16_t((1 << valid_count) - 1);              \
+        _mm512_mask_storeu_epi32((__m512i *)output, valid, out);               \
+      } else {                                                                 \
+        _mm512_storeu_si512((__m512i *)output, out);                           \
+      }                                                                        \
+      output += valid_count;                                                   \
+    } else {                                                                   \
+      if (MASKED) {                                                            \
+        output += utf32_to_utf16_masked<big_endian>(                           \
+            byteflip, out, valid_count, reinterpret_cast<char16_t *>(output)); \
+      } else {                                                                 \
+        output += utf32_to_utf16<big_endian>(                                  \
+            byteflip, out, valid_count, reinterpret_cast<char16_t *>(output)); \
+      }                                                                        \
+    }                                                                          \
   }
-  return utf16_output - start;
-}
 
-} // namespace utf8_to_utf16
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+#define SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(INPUT, VALID_COUNT, MASKED)       \
+  {                                                                            \
+    if (UTF32) {                                                               \
+      if (MASKED) {                                                            \
+        const __mmask16 valid_mask = uint16_t((1 << VALID_COUNT) - 1);         \
+        _mm512_mask_storeu_epi32((__m512i *)output, valid_mask, INPUT);        \
+      } else {                                                                 \
+        _mm512_storeu_si512((__m512i *)output, INPUT);                         \
+      }                                                                        \
+      output += VALID_COUNT;                                                   \
+    } else {                                                                   \
+      if (MASKED) {                                                            \
+        output += utf32_to_utf16_masked<big_endian>(                           \
+            byteflip, INPUT, VALID_COUNT,                                      \
+            reinterpret_cast<char16_t *>(output));                             \
+      } else {                                                                 \
+        output +=                                                              \
+            utf32_to_utf16<big_endian>(byteflip, INPUT, VALID_COUNT,           \
+                                       reinterpret_cast<char16_t *>(output));  \
+      }                                                                        \
+    }                                                                          \
+  }
 
-#endif
-/* end file src/scalar/utf8_to_utf16/valid_utf8_to_utf16.h */
-/* begin file src/scalar/utf8_to_utf16/utf8_to_utf16.h */
-#ifndef SIMDUTF_UTF8_TO_UTF16_H
-#define SIMDUTF_UTF8_TO_UTF16_H
+#define SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)                       \
+  if (UTF32) {                                                                 \
+    const __m128i t0 = _mm512_castsi512_si128(utf8);                           \
+    const __m128i t1 = _mm512_extracti32x4_epi32(utf8, 1);                     \
+    const __m128i t2 = _mm512_extracti32x4_epi32(utf8, 2);                     \
+    const __m128i t3 = _mm512_extracti32x4_epi32(utf8, 3);                     \
+    _mm512_storeu_si512((__m512i *)(output + 0 * 16),                          \
+                        _mm512_cvtepu8_epi32(t0));                             \
+    _mm512_storeu_si512((__m512i *)(output + 1 * 16),                          \
+                        _mm512_cvtepu8_epi32(t1));                             \
+    _mm512_storeu_si512((__m512i *)(output + 2 * 16),                          \
+                        _mm512_cvtepu8_epi32(t2));                             \
+    _mm512_storeu_si512((__m512i *)(output + 3 * 16),                          \
+                        _mm512_cvtepu8_epi32(t3));                             \
+  } else {                                                                     \
+    const __m256i h0 = _mm512_castsi512_si256(utf8);                           \
+    const __m256i h1 = _mm512_extracti64x4_epi64(utf8, 1);                     \
+    if (big_endian) {                                                          \
+      _mm512_storeu_si512(                                                     \
+          (__m512i *)(output + 0 * 16),                                        \
+          _mm512_shuffle_epi8(_mm512_cvtepu8_epi16(h0), byteflip));            \
+      _mm512_storeu_si512(                                                     \
+          (__m512i *)(output + 2 * 16),                                        \
+          _mm512_shuffle_epi8(_mm512_cvtepu8_epi16(h1), byteflip));            \
+    } else {                                                                   \
+      _mm512_storeu_si512((__m512i *)(output + 0 * 16),                        \
+                          _mm512_cvtepu8_epi16(h0));                           \
+      _mm512_storeu_si512((__m512i *)(output + 2 * 16),                        \
+                          _mm512_cvtepu8_epi16(h1));                           \
+    }                                                                          \
+  }
+/* end file src/icelake/icelake_macros.inl.cpp */
+/* begin file src/icelake/icelake_from_valid_utf8.inl.cpp */
+// file included directly
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf8_to_utf16 {
+// This file contains the conversion procedure for VALID UTF-8 strings.
 
-template <endianness big_endian>
-inline size_t convert(const char *buf, size_t len, char16_t *utf16_output) {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  size_t pos = 0;
-  char16_t *start{utf16_output};
-  while (pos < len) {
-    // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <=
-        len) { // if it is safe to read 16 more bytes, check that they are ascii
-      uint64_t v1;
-      ::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2};
-      if ((v & 0x8080808080808080) == 0) {
-        size_t final_pos = pos + 16;
-        while (pos < final_pos) {
-          *utf16_output++ = !match_system(big_endian)
-                                ? char16_t(utf16::swap_bytes(buf[pos]))
-                                : char16_t(buf[pos]);
-          pos++;
-        }
-        continue;
-      }
-    }
+/*
+    valid_utf8_to_fixed_length converts a valid UTF-8 string into UTF-32.
 
-    uint8_t leading_byte = data[pos]; // leading byte
-    if (leading_byte < 0b10000000) {
-      // converting one ASCII byte !!!
-      *utf16_output++ = !match_system(big_endian)
-                            ? char16_t(utf16::swap_bytes(leading_byte))
-                            : char16_t(leading_byte);
-      pos++;
-    } else if ((leading_byte & 0b11100000) == 0b11000000) {
-      // We have a two-byte UTF-8, it should become
-      // a single UTF-16 word.
-      if (pos + 1 >= len) {
-        return 0;
-      } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return 0;
-      }
-      // range check
-      uint32_t code_point =
-          (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
-      if (code_point < 0x80 || 0x7ff < code_point) {
-        return 0;
-      }
-      if (!match_system(big_endian)) {
-        code_point = uint32_t(utf16::swap_bytes(uint16_t(code_point)));
-      }
-      *utf16_output++ = char16_t(code_point);
-      pos += 2;
-    } else if ((leading_byte & 0b11110000) == 0b11100000) {
-      // We have a three-byte UTF-8, it should become
-      // a single UTF-16 word.
-      if (pos + 2 >= len) {
-        return 0;
-      } // minimal bound checking
+    The `OUTPUT` template type decides what to do with UTF-32: store
+    it directly or convert into UTF-16 (with AVX512).
 
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return 0;
-      }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
-        return 0;
-      }
-      // range check
-      uint32_t code_point = (leading_byte & 0b00001111) << 12 |
-                            (data[pos + 1] & 0b00111111) << 6 |
-                            (data[pos + 2] & 0b00111111);
-      if (code_point < 0x800 || 0xffff < code_point ||
-          (0xd7ff < code_point && code_point < 0xe000)) {
-        return 0;
-      }
-      if (!match_system(big_endian)) {
-        code_point = uint32_t(utf16::swap_bytes(uint16_t(code_point)));
-      }
-      *utf16_output++ = char16_t(code_point);
-      pos += 3;
-    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
-      // we have a 4-byte UTF-8 word.
-      if (pos + 3 >= len) {
-        return 0;
-      } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return 0;
-      }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
-        return 0;
-      }
-      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
-        return 0;
-      }
+    Input:
+    - str           - valid UTF-8 string
+    - len           - string length
+    - dwords        - output buffer
 
-      // range check
-      uint32_t code_point = (leading_byte & 0b00000111) << 18 |
-                            (data[pos + 1] & 0b00111111) << 12 |
-                            (data[pos + 2] & 0b00111111) << 6 |
-                            (data[pos + 3] & 0b00111111);
-      if (code_point <= 0xffff || 0x10ffff < code_point) {
-        return 0;
-      }
-      code_point -= 0x10000;
-      uint16_t high_surrogate = uint16_t(0xD800 + (code_point >> 10));
-      uint16_t low_surrogate = uint16_t(0xDC00 + (code_point & 0x3FF));
-      if (!match_system(big_endian)) {
-        high_surrogate = utf16::swap_bytes(high_surrogate);
-        low_surrogate = utf16::swap_bytes(low_surrogate);
-      }
-      *utf16_output++ = char16_t(high_surrogate);
-      *utf16_output++ = char16_t(low_surrogate);
-      pos += 4;
-    } else {
-      return 0;
-    }
-  }
-  return utf16_output - start;
-}
+    Result:
+    - pair.first    - the first unprocessed input byte
+    - pair.second   - the first unprocessed output word
+*/
+template <endianness big_endian, typename OUTPUT>
+std::pair<const char *, OUTPUT *>
+valid_utf8_to_fixed_length(const char *str, size_t len, OUTPUT *dwords) {
+  constexpr bool UTF32 = std::is_same<OUTPUT, uint32_t>::value;
+  constexpr bool UTF16 = std::is_same<OUTPUT, char16_t>::value;
+  static_assert(
+      UTF32 or UTF16,
+      "output type has to be uint32_t (for UTF-32) or char16_t (for UTF-16)");
+  static_assert(!(UTF32 and big_endian),
+                "we do not currently support big-endian UTF-32");
 
-template <endianness big_endian>
-inline result convert_with_errors(const char *buf, size_t len,
-                                  char16_t *utf16_output) {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  size_t pos = 0;
-  char16_t *start{utf16_output};
-  while (pos < len) {
-    // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <=
-        len) { // if it is safe to read 16 more bytes, check that they are ascii
-      uint64_t v1;
-      ::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2};
-      if ((v & 0x8080808080808080) == 0) {
-        size_t final_pos = pos + 16;
-        while (pos < final_pos) {
-          *utf16_output++ = !match_system(big_endian)
-                                ? char16_t(utf16::swap_bytes(buf[pos]))
-                                : char16_t(buf[pos]);
-          pos++;
-        }
-        continue;
-      }
-    }
-    uint8_t leading_byte = data[pos]; // leading byte
-    if (leading_byte < 0b10000000) {
-      // converting one ASCII byte !!!
-      *utf16_output++ = !match_system(big_endian)
-                            ? char16_t(utf16::swap_bytes(leading_byte))
-                            : char16_t(leading_byte);
-      pos++;
-    } else if ((leading_byte & 0b11100000) == 0b11000000) {
-      // We have a two-byte UTF-8, it should become
-      // a single UTF-16 word.
-      if (pos + 1 >= len) {
-        return result(error_code::TOO_SHORT, pos);
-      } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      // range check
-      uint32_t code_point =
-          (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
-      if (code_point < 0x80 || 0x7ff < code_point) {
-        return result(error_code::OVERLONG, pos);
-      }
-      if (!match_system(big_endian)) {
-        code_point = uint32_t(utf16::swap_bytes(uint16_t(code_point)));
-      }
-      *utf16_output++ = char16_t(code_point);
-      pos += 2;
-    } else if ((leading_byte & 0b11110000) == 0b11100000) {
-      // We have a three-byte UTF-8, it should become
-      // a single UTF-16 word.
-      if (pos + 2 >= len) {
-        return result(error_code::TOO_SHORT, pos);
-      } // minimal bound checking
+  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  const char *ptr = str;
+  const char *end = ptr + len;
 
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      // range check
-      uint32_t code_point = (leading_byte & 0b00001111) << 12 |
-                            (data[pos + 1] & 0b00111111) << 6 |
-                            (data[pos + 2] & 0b00111111);
-      if ((code_point < 0x800) || (0xffff < code_point)) {
-        return result(error_code::OVERLONG, pos);
-      }
-      if (0xd7ff < code_point && code_point < 0xe000) {
-        return result(error_code::SURROGATE, pos);
-      }
-      if (!match_system(big_endian)) {
-        code_point = uint32_t(utf16::swap_bytes(uint16_t(code_point)));
-      }
-      *utf16_output++ = char16_t(code_point);
-      pos += 3;
-    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
-      // we have a 4-byte UTF-8 word.
-      if (pos + 3 >= len) {
-        return result(error_code::TOO_SHORT, pos);
-      } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
+  OUTPUT *output = dwords;
+  /**
+   * In the main loop, we consume 64 bytes per iteration,
+   * but we access 64 + 4 bytes.
+   * We check for end - ptr >= 64 + 4 so that reading the
+   * 4 bytes that follow the 64-byte block cannot overrun the input.
+   */
+  while (end - ptr >= 64 + 4) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    const __m512i v_80 = _mm512_set1_epi8(char(0x80));
+    const __mmask64 ascii = _mm512_test_epi8_mask(utf8, v_80);
+    if (ascii == 0) {
+      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
+      output += 64;
+      ptr += 64;
+      continue;
+    }
 
-      // range check
-      uint32_t code_point = (leading_byte & 0b00000111) << 18 |
-                            (data[pos + 1] & 0b00111111) << 12 |
-                            (data[pos + 2] & 0b00111111) << 6 |
-                            (data[pos + 3] & 0b00111111);
-      if (code_point <= 0xffff) {
-        return result(error_code::OVERLONG, pos);
-      }
-      if (0x10ffff < code_point) {
-        return result(error_code::TOO_LARGE, pos);
-      }
-      code_point -= 0x10000;
-      uint16_t high_surrogate = uint16_t(0xD800 + (code_point >> 10));
-      uint16_t low_surrogate = uint16_t(0xDC00 + (code_point & 0x3FF));
-      if (!match_system(big_endian)) {
-        high_surrogate = utf16::swap_bytes(high_surrogate);
-        low_surrogate = utf16::swap_bytes(low_surrogate);
-      }
-      *utf16_output++ = char16_t(high_surrogate);
-      *utf16_output++ = char16_t(low_surrogate);
-      pos += 4;
+    const __m512i lane0 = broadcast_epi128<0>(utf8);
+    const __m512i lane1 = broadcast_epi128<1>(utf8);
+    int valid_count0;
+    __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
+    const __m512i lane2 = broadcast_epi128<2>(utf8);
+    int valid_count1;
+    __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
+    if (valid_count0 + valid_count1 <= 16) {
+      vec0 = _mm512_mask_expand_epi32(
+          vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
+      valid_count0 += valid_count1;
+      vec0 = expand_utf8_to_utf32(vec0);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
     } else {
-      // we either have too many continuation bytes or an invalid leading byte
-      if ((leading_byte & 0b11000000) == 0b10000000) {
-        return result(error_code::TOO_LONG, pos);
-      } else {
-        return result(error_code::HEADER_BITS, pos);
-      }
+      vec0 = expand_utf8_to_utf32(vec0);
+      vec1 = expand_utf8_to_utf32(vec1);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
     }
-  }
-  return result(error_code::SUCCESS, utf16_output - start);
-}
-
-/**
- * When rewind_and_convert_with_errors is called, we are pointing at 'buf' and
- * we have up to len input bytes left, and we encountered some error. It is
- * possible that the error is at 'buf' exactly, but it could also be in the
- * previous bytes  (up to 3 bytes back).
- *
- * prior_bytes indicates how many bytes, prior to 'buf' may belong to the
- * current memory section and can be safely accessed. We prior_bytes to access
- * safely up to three bytes before 'buf'.
- *
- * The caller is responsible to ensure that len > 0.
- *
- * If the error is believed to have occurred prior to 'buf', the count value
- * contain in the result will be SIZE_T - 1, SIZE_T - 2, or SIZE_T - 3.
- */
-template <endianness endian>
-inline result rewind_and_convert_with_errors(size_t prior_bytes,
-                                             const char *buf, size_t len,
-                                             char16_t *utf16_output) {
-  size_t extra_len{0};
-  // We potentially need to go back in time and find a leading byte.
-  // In theory '3' would be sufficient, but sometimes the error can go back
-  // quite far.
-  size_t how_far_back = prior_bytes;
-  // size_t how_far_back = 3; // 3 bytes in the past + current position
-  // if(how_far_back >= prior_bytes) { how_far_back = prior_bytes; }
-  bool found_leading_bytes{false};
-  // important: it is i <= how_far_back and not 'i < how_far_back'.
-  for (size_t i = 0; i <= how_far_back; i++) {
-    unsigned char byte = buf[-static_cast<std::ptrdiff_t>(i)];
-    found_leading_bytes = ((byte & 0b11000000) != 0b10000000);
-    if (found_leading_bytes) {
-      if (i > 0 && byte < 128) {
-        // If we had to go back and the leading byte is ascii
-        // then we can stop right away.
-        return result(error_code::TOO_LONG, 0 - i + 1);
-      }
-      buf -= i;
-      extra_len = i;
-      break;
+    const __m512i lane3 = broadcast_epi128<3>(utf8);
+    int valid_count2;
+    __m512i vec2 = expand_and_identify(lane2, lane3, valid_count2);
+    uint32_t tmp1;
+    ::memcpy(&tmp1, ptr + 64, sizeof(tmp1));
+    const __m512i lane4 = _mm512_set1_epi32(tmp1);
+    int valid_count3;
+    __m512i vec3 = expand_and_identify(lane3, lane4, valid_count3);
+    if (valid_count2 + valid_count3 <= 16) {
+      vec2 = _mm512_mask_expand_epi32(
+          vec2, __mmask16(((1 << valid_count3) - 1) << valid_count2), vec3);
+      valid_count2 += valid_count3;
+      vec2 = expand_utf8_to_utf32(vec2);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
+    } else {
+      vec2 = expand_utf8_to_utf32(vec2);
+      vec3 = expand_utf8_to_utf32(vec3);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec3, valid_count3, true)
     }
+    ptr += 4 * 16;
   }
-  //
-  // It is possible for this function to return a negative count in its result.
-  // C++ Standard Section 18.1 defines size_t is in <cstddef> which is described
-  // in C Standard as <stddef.h>. C Standard Section 4.1.5 defines size_t as an
-  // unsigned integral type of the result of the sizeof operator
-  //
-  // An unsigned type will simply wrap round arithmetically (well defined).
-  //
-  if (!found_leading_bytes) {
-    // If how_far_back == 3, we may have four consecutive continuation bytes!!!
-    // [....] [continuation] [continuation] [continuation] | [buf is
-    // continuation] Or we possibly have a stream that does not start with a
-    // leading byte.
-    return result(error_code::TOO_LONG, 0 - how_far_back);
-  }
-  result res = convert_with_errors<endian>(buf, len + extra_len, utf16_output);
-  if (res.error) {
-    res.count -= extra_len;
+
+  if (end - ptr >= 64) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    const __m512i v_80 = _mm512_set1_epi8(char(0x80));
+    const __mmask64 ascii = _mm512_test_epi8_mask(utf8, v_80);
+    if (ascii == 0) {
+      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
+      output += 64;
+      ptr += 64;
+    } else {
+      const __m512i lane0 = broadcast_epi128<0>(utf8);
+      const __m512i lane1 = broadcast_epi128<1>(utf8);
+      int valid_count0;
+      __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
+      const __m512i lane2 = broadcast_epi128<2>(utf8);
+      int valid_count1;
+      __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
+      if (valid_count0 + valid_count1 <= 16) {
+        vec0 = _mm512_mask_expand_epi32(
+            vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
+        valid_count0 += valid_count1;
+        vec0 = expand_utf8_to_utf32(vec0);
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+      } else {
+        vec0 = expand_utf8_to_utf32(vec0);
+        vec1 = expand_utf8_to_utf32(vec1);
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
+      }
+
+      const __m512i lane3 = broadcast_epi128<3>(utf8);
+      SIMDUTF_ICELAKE_TRANSCODE16(lane2, lane3, true)
+
+      ptr += 3 * 16;
+    }
   }
-  return res;
+  return {ptr, output};
 }
 
-} // namespace utf8_to_utf16
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+using utf8_to_utf16_result = std::pair<const char *, char16_t *>;
+/* end file src/icelake/icelake_from_valid_utf8.inl.cpp */
+/* begin file src/icelake/icelake_utf8_validation.inl.cpp */
+// file included directly
 
-#endif
-/* end file src/scalar/utf8_to_utf16/utf8_to_utf16.h */
+simdutf_really_inline __m512i check_special_cases(__m512i input,
+                                                  const __m512i prev1) {
+  __m512i mask1 = _mm512_setr_epi64(0x0202020202020202, 0x4915012180808080,
+                                    0x0202020202020202, 0x4915012180808080,
+                                    0x0202020202020202, 0x4915012180808080,
+                                    0x0202020202020202, 0x4915012180808080);
+  const __m512i v_0f = _mm512_set1_epi8(0x0f);
+  __m512i index1 = _mm512_and_si512(_mm512_srli_epi16(prev1, 4), v_0f);
 
-/* begin file src/scalar/utf8_to_utf32/valid_utf8_to_utf32.h */
-#ifndef SIMDUTF_VALID_UTF8_TO_UTF32_H
-#define SIMDUTF_VALID_UTF8_TO_UTF32_H
+  __m512i byte_1_high = _mm512_shuffle_epi8(mask1, index1);
+  __m512i mask2 = _mm512_setr_epi64(0xcbcbcb8b8383a3e7, 0xcbcbdbcbcbcbcbcb,
+                                    0xcbcbcb8b8383a3e7, 0xcbcbdbcbcbcbcbcb,
+                                    0xcbcbcb8b8383a3e7, 0xcbcbdbcbcbcbcbcb,
+                                    0xcbcbcb8b8383a3e7, 0xcbcbdbcbcbcbcbcb);
+  __m512i index2 = _mm512_and_si512(prev1, v_0f);
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf8_to_utf32 {
+  __m512i byte_1_low = _mm512_shuffle_epi8(mask2, index2);
+  __m512i mask3 =
+      _mm512_setr_epi64(0x101010101010101, 0x1010101babaaee6, 0x101010101010101,
+                        0x1010101babaaee6, 0x101010101010101, 0x1010101babaaee6,
+                        0x101010101010101, 0x1010101babaaee6);
+  __m512i index3 = _mm512_and_si512(_mm512_srli_epi16(input, 4), v_0f);
+  __m512i byte_2_high = _mm512_shuffle_epi8(mask3, index3);
+  return _mm512_ternarylogic_epi64(byte_1_high, byte_1_low, byte_2_high, 128);
+}
 
-inline size_t convert_valid(const char *buf, size_t len,
-                            char32_t *utf32_output) {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  size_t pos = 0;
-  char32_t *start{utf32_output};
-  while (pos < len) {
-    // try to convert the next block of 8 ASCII bytes
-    if (pos + 8 <=
-        len) { // if it is safe to read 8 more bytes, check that they are ascii
-      uint64_t v;
-      ::memcpy(&v, data + pos, sizeof(uint64_t));
-      if ((v & 0x8080808080808080) == 0) {
-        size_t final_pos = pos + 8;
-        while (pos < final_pos) {
-          *utf32_output++ = char32_t(buf[pos]);
-          pos++;
-        }
-        continue;
-      }
-    }
-    uint8_t leading_byte = data[pos]; // leading byte
-    if (leading_byte < 0b10000000) {
-      // converting one ASCII byte !!!
-      *utf32_output++ = char32_t(leading_byte);
-      pos++;
-    } else if ((leading_byte & 0b11100000) == 0b11000000) {
-      // We have a two-byte UTF-8
-      if (pos + 1 >= len) {
-        break;
-      } // minimal bound checking
-      *utf32_output++ = char32_t(((leading_byte & 0b00011111) << 6) |
-                                 (data[pos + 1] & 0b00111111));
-      pos += 2;
-    } else if ((leading_byte & 0b11110000) == 0b11100000) {
-      // We have a three-byte UTF-8
-      if (pos + 2 >= len) {
-        break;
-      } // minimal bound checking
-      *utf32_output++ = char32_t(((leading_byte & 0b00001111) << 12) |
-                                 ((data[pos + 1] & 0b00111111) << 6) |
-                                 (data[pos + 2] & 0b00111111));
-      pos += 3;
-    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
-      // we have a 4-byte UTF-8 word.
-      if (pos + 3 >= len) {
-        break;
-      } // minimal bound checking
-      uint32_t code_word = ((leading_byte & 0b00000111) << 18) |
-                           ((data[pos + 1] & 0b00111111) << 12) |
-                           ((data[pos + 2] & 0b00111111) << 6) |
-                           (data[pos + 3] & 0b00111111);
-      *utf32_output++ = char32_t(code_word);
-      pos += 4;
-    } else {
-      // we may have a continuation but we do not do error checking
-      return 0;
-    }
-  }
-  return utf32_output - start;
+simdutf_really_inline __m512i check_multibyte_lengths(const __m512i input,
+                                                      const __m512i prev_input,
+                                                      const __m512i sc) {
+  __m512i prev2 = prev<2>(input, prev_input);
+  __m512i prev3 = prev<3>(input, prev_input);
+  __m512i is_third_byte = _mm512_subs_epu8(
+      prev2, _mm512_set1_epi8(0b11100000u - 1)); // Only 111_____ will be > 0
+  __m512i is_fourth_byte = _mm512_subs_epu8(
+      prev3, _mm512_set1_epi8(0b11110000u - 1)); // Only 1111____ will be > 0
+  __m512i is_third_or_fourth_byte =
+      _mm512_or_si512(is_third_byte, is_fourth_byte);
+  const __m512i v_7f = _mm512_set1_epi8(char(0x7f));
+  is_third_or_fourth_byte = _mm512_adds_epu8(v_7f, is_third_or_fourth_byte);
+  // We want to compute (is_third_or_fourth_byte AND v80) XOR sc.
+  const __m512i v_80 = _mm512_set1_epi8(char(0x80));
+  return _mm512_ternarylogic_epi32(is_third_or_fourth_byte, v_80, sc,
+                                   0b1101010);
+  // __m512i is_third_or_fourth_byte_mask =
+  //     _mm512_and_si512(is_third_or_fourth_byte, v_80);
+  // return _mm512_xor_si512(is_third_or_fourth_byte_mask, sc);
+}
+//
+// Return nonzero if there are incomplete multibyte characters at the end of the
+// block: e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+//
+simdutf_really_inline __m512i is_incomplete(const __m512i input) {
+  // If the previous input's last 3 bytes match this, they're too short (they
+  // ended at EOF):
+  // ... 1111____ 111_____ 11______
+  __m512i max_value = _mm512_setr_epi64(0xffffffffffffffff, 0xffffffffffffffff,
+                                        0xffffffffffffffff, 0xffffffffffffffff,
+                                        0xffffffffffffffff, 0xffffffffffffffff,
+                                        0xffffffffffffffff, 0xbfdfefffffffffff);
+  return _mm512_subs_epu8(input, max_value);
 }
 
-} // namespace utf8_to_utf32
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+struct avx512_utf8_checker {
+  // If this is nonzero, there has been a UTF-8 error.
+  __m512i error{};
 
-#endif
-/* end file src/scalar/utf8_to_utf32/valid_utf8_to_utf32.h */
-/* begin file src/scalar/utf8_to_utf32/utf8_to_utf32.h */
-#ifndef SIMDUTF_UTF8_TO_UTF32_H
-#define SIMDUTF_UTF8_TO_UTF32_H
+  // The last input we received
+  __m512i prev_input_block{};
+  // Whether the last input we received was incomplete (used for ASCII fast
+  // path)
+  __m512i prev_incomplete{};
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf8_to_utf32 {
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const __m512i input,
+                                              const __m512i prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    __m512i prev1 = prev<1>(input, prev_input);
+    __m512i sc = check_special_cases(input, prev1);
+    this->error = _mm512_or_si512(
+        check_multibyte_lengths(input, prev_input, sc), this->error);
+  }
 
-inline size_t convert(const char *buf, size_t len, char32_t *utf32_output) {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  size_t pos = 0;
-  char32_t *start{utf32_output};
-  while (pos < len) {
-    // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <=
-        len) { // if it is safe to read 16 more bytes, check that they are ascii
-      uint64_t v1;
-      ::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2};
-      if ((v & 0x8080808080808080) == 0) {
-        size_t final_pos = pos + 16;
-        while (pos < final_pos) {
-          *utf32_output++ = char32_t(buf[pos]);
-          pos++;
-        }
-        continue;
-      }
+  // The only problem that can happen at EOF is that a multibyte character is
+  // too short or a byte value too large in the last bytes: check_special_cases
+  // only checks for bytes too large in the first of two bytes.
+  simdutf_really_inline void check_eof() {
+    // If the previous block had incomplete UTF-8 characters at the end, an
+    // ASCII block can't possibly finish them.
+    this->error = _mm512_or_si512(this->error, this->prev_incomplete);
+  }
+
+  // returns true if ASCII.
+  simdutf_really_inline bool check_next_input(const __m512i input) {
+    const __m512i v_80 = _mm512_set1_epi8(char(0x80));
+    const __mmask64 ascii = _mm512_test_epi8_mask(input, v_80);
+    if (ascii == 0) {
+      this->error = _mm512_or_si512(this->error, this->prev_incomplete);
+      return true;
+    } else {
+      this->check_utf8_bytes(input, this->prev_input_block);
+      this->prev_incomplete = is_incomplete(input);
+      this->prev_input_block = input;
+      return false;
     }
-    uint8_t leading_byte = data[pos]; // leading byte
-    if (leading_byte < 0b10000000) {
-      // converting one ASCII byte !!!
-      *utf32_output++ = char32_t(leading_byte);
-      pos++;
-    } else if ((leading_byte & 0b11100000) == 0b11000000) {
-      // We have a two-byte UTF-8
-      if (pos + 1 >= len) {
-        return 0;
-      } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return 0;
-      }
-      // range check
-      uint32_t code_point =
-          (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
-      if (code_point < 0x80 || 0x7ff < code_point) {
-        return 0;
-      }
-      *utf32_output++ = char32_t(code_point);
-      pos += 2;
-    } else if ((leading_byte & 0b11110000) == 0b11100000) {
-      // We have a three-byte UTF-8
-      if (pos + 2 >= len) {
-        return 0;
-      } // minimal bound checking
+  }
+  // do not forget to call check_eof!
+  simdutf_really_inline bool errors() const {
+    return _mm512_test_epi8_mask(this->error, this->error) != 0;
+  }
+}; // struct avx512_utf8_checker
+/* end file src/icelake/icelake_utf8_validation.inl.cpp */
+/* begin file src/icelake/icelake_from_utf8.inl.cpp */
+// file included directly
 
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return 0;
-      }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
-        return 0;
-      }
-      // range check
-      uint32_t code_point = (leading_byte & 0b00001111) << 12 |
-                            (data[pos + 1] & 0b00111111) << 6 |
-                            (data[pos + 2] & 0b00111111);
-      if (code_point < 0x800 || 0xffff < code_point ||
-          (0xd7ff < code_point && code_point < 0xe000)) {
-        return 0;
-      }
-      *utf32_output++ = char32_t(code_point);
-      pos += 3;
-    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
-      // we have a 4-byte UTF-8 word.
-      if (pos + 3 >= len) {
-        return 0;
-      } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return 0;
-      }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
-        return 0;
-      }
-      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
-        return 0;
-      }
+// This file contains conversion procedures from possibly invalid UTF-8 strings.
 
-      // range check
-      uint32_t code_point = (leading_byte & 0b00000111) << 18 |
-                            (data[pos + 1] & 0b00111111) << 12 |
-                            (data[pos + 2] & 0b00111111) << 6 |
-                            (data[pos + 3] & 0b00111111);
-      if (code_point <= 0xffff || 0x10ffff < code_point) {
-        return 0;
-      }
-      *utf32_output++ = char32_t(code_point);
-      pos += 4;
+/**
+ * Attempts to convert up to len 1-byte code units from in (in UTF-8 format) to
+ * out.
+ * Returns the position of the input and output after the processing is
+ * completed. Upon error, the output is set to null.
+ */
+
+template <endianness big_endian>
+utf8_to_utf16_result
+fast_avx512_convert_utf8_to_utf16(const char *in, size_t len, char16_t *out) {
+  const char *const final_in = in + len;
+  bool result = true;
+  while (result) {
+    if (final_in - in >= 64) {
+      result = process_block_utf8_to_utf16<SIMDUTF_FULL, big_endian>(
+          in, out, final_in - in);
+    } else if (in < final_in) {
+      result = process_block_utf8_to_utf16<SIMDUTF_TAIL, big_endian>(
+          in, out, final_in - in);
     } else {
-      return 0;
+      break;
     }
   }
-  return utf32_output - start;
+  if (!result) {
+    out = nullptr;
+  }
+  return std::make_pair(in, out);
 }
 
-inline result convert_with_errors(const char *buf, size_t len,
-                                  char32_t *utf32_output) {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  size_t pos = 0;
-  char32_t *start{utf32_output};
-  while (pos < len) {
-    // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <=
-        len) { // if it is safe to read 16 more bytes, check that they are ascii
-      uint64_t v1;
-      ::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2};
-      if ((v & 0x8080808080808080) == 0) {
-        size_t final_pos = pos + 16;
-        while (pos < final_pos) {
-          *utf32_output++ = char32_t(buf[pos]);
-          pos++;
-        }
-        continue;
-      }
+template <endianness big_endian>
+simdutf::result fast_avx512_convert_utf8_to_utf16_with_errors(const char *in,
+                                                              size_t len,
+                                                              char16_t *out) {
+  const char *const init_in = in;
+  const char16_t *const init_out = out;
+  const char *const final_in = in + len;
+  bool result = true;
+  while (result) {
+    if (final_in - in >= 64) {
+      result = process_block_utf8_to_utf16<SIMDUTF_FULL, big_endian>(
+          in, out, final_in - in);
+    } else if (in < final_in) {
+      result = process_block_utf8_to_utf16<SIMDUTF_TAIL, big_endian>(
+          in, out, final_in - in);
+    } else {
+      break;
     }
-    uint8_t leading_byte = data[pos]; // leading byte
-    if (leading_byte < 0b10000000) {
-      // converting one ASCII byte !!!
-      *utf32_output++ = char32_t(leading_byte);
-      pos++;
-    } else if ((leading_byte & 0b11100000) == 0b11000000) {
-      // We have a two-byte UTF-8
-      if (pos + 1 >= len) {
-        return result(error_code::TOO_SHORT, pos);
-      } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      // range check
-      uint32_t code_point =
-          (leading_byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
-      if (code_point < 0x80 || 0x7ff < code_point) {
-        return result(error_code::OVERLONG, pos);
+  }
+  if (!result) {
+    size_t pos = size_t(in - init_in);
+    if (pos < len && (init_in[pos] & 0xc0) == 0x80 && pos >= 64) {
+      // We must check whether we are the fourth continuation byte
+      bool c1 = (init_in[pos - 1] & 0xc0) == 0x80;
+      bool c2 = (init_in[pos - 2] & 0xc0) == 0x80;
+      bool c3 = (init_in[pos - 3] & 0xc0) == 0x80;
+      if (c1 && c2 && c3) {
+        return {simdutf::TOO_LONG, pos};
       }
-      *utf32_output++ = char32_t(code_point);
-      pos += 2;
-    } else if ((leading_byte & 0b11110000) == 0b11100000) {
-      // We have a three-byte UTF-8
-      if (pos + 2 >= len) {
-        return result(error_code::TOO_SHORT, pos);
-      } // minimal bound checking
+    }
+    // rewind_and_convert_with_errors will seek a potential error from in
+    // onward, with the ability to go back up to in - init_in bytes, and read
+    // final_in - in bytes forward.
+    simdutf::result res =
+        scalar::utf8_to_utf16::rewind_and_convert_with_errors<big_endian>(
+            in - init_in, in, final_in - in, out);
+    res.count += (in - init_in);
+    return res;
+  } else {
+    return simdutf::result(error_code::SUCCESS, out - init_out);
+  }
+}
 
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      // range check
-      uint32_t code_point = (leading_byte & 0b00001111) << 12 |
-                            (data[pos + 1] & 0b00111111) << 6 |
-                            (data[pos + 2] & 0b00111111);
-      if (code_point < 0x800 || 0xffff < code_point) {
-        return result(error_code::OVERLONG, pos);
-      }
-      if (0xd7ff < code_point && code_point < 0xe000) {
-        return result(error_code::SURROGATE, pos);
-      }
-      *utf32_output++ = char32_t(code_point);
-      pos += 3;
-    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
-      // we have a 4-byte UTF-8 word.
-      if (pos + 3 >= len) {
-        return result(error_code::TOO_SHORT, pos);
-      } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      if ((data[pos + 2] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
-      if ((data[pos + 3] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      }
+template <endianness big_endian, typename OUTPUT>
+// todo: replace with the utf-8 to utf-16 routine adapted to utf-32. This code
+// is legacy.
+std::pair<const char *, OUTPUT *>
+validating_utf8_to_fixed_length(const char *str, size_t len, OUTPUT *dwords) {
+  constexpr bool UTF32 = std::is_same<OUTPUT, uint32_t>::value;
+  constexpr bool UTF16 = std::is_same<OUTPUT, char16_t>::value;
+  static_assert(
+      UTF32 or UTF16,
+      "output type has to be uint32_t (for UTF-32) or char16_t (for UTF-16)");
+  static_assert(!(UTF32 and big_endian),
+                "we do not currently support big-endian UTF-32");
 
-      // range check
-      uint32_t code_point = (leading_byte & 0b00000111) << 18 |
-                            (data[pos + 1] & 0b00111111) << 12 |
-                            (data[pos + 2] & 0b00111111) << 6 |
-                            (data[pos + 3] & 0b00111111);
-      if (code_point <= 0xffff) {
-        return result(error_code::OVERLONG, pos);
-      }
-      if (0x10ffff < code_point) {
-        return result(error_code::TOO_LARGE, pos);
-      }
-      *utf32_output++ = char32_t(code_point);
-      pos += 4;
+  const char *ptr = str;
+  const char *end = ptr + len;
+  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  OUTPUT *output = dwords;
+  avx512_utf8_checker checker{};
+  /**
+   * In the main loop, we consume 64 bytes per iteration,
+   * but we access 64 + 4 bytes.
+   * We use masked writes to avoid overruns, see
+   * https://github.com/simdutf/simdutf/issues/471
+   */
+  while (end - ptr >= 64 + 4) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    if (checker.check_next_input(utf8)) {
+      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
+      output += 64;
+      ptr += 64;
+      continue;
+    }
+    const __m512i lane0 = broadcast_epi128<0>(utf8);
+    const __m512i lane1 = broadcast_epi128<1>(utf8);
+    int valid_count0;
+    __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
+    const __m512i lane2 = broadcast_epi128<2>(utf8);
+    int valid_count1;
+    __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
+    if (valid_count0 + valid_count1 <= 16) {
+      vec0 = _mm512_mask_expand_epi32(
+          vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
+      valid_count0 += valid_count1;
+      vec0 = expand_utf8_to_utf32(vec0);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+    } else {
+      vec0 = expand_utf8_to_utf32(vec0);
+      vec1 = expand_utf8_to_utf32(vec1);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
+    }
+    const __m512i lane3 = broadcast_epi128<3>(utf8);
+    int valid_count2;
+    __m512i vec2 = expand_and_identify(lane2, lane3, valid_count2);
+    uint32_t tmp1;
+    ::memcpy(&tmp1, ptr + 64, sizeof(tmp1));
+    const __m512i lane4 = _mm512_set1_epi32(tmp1);
+    int valid_count3;
+    __m512i vec3 = expand_and_identify(lane3, lane4, valid_count3);
+    if (valid_count2 + valid_count3 <= 16) {
+      vec2 = _mm512_mask_expand_epi32(
+          vec2, __mmask16(((1 << valid_count3) - 1) << valid_count2), vec3);
+      valid_count2 += valid_count3;
+      vec2 = expand_utf8_to_utf32(vec2);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
     } else {
-      // we either have too many continuation bytes or an invalid leading byte
-      if ((leading_byte & 0b11000000) == 0b10000000) {
-        return result(error_code::TOO_LONG, pos);
-      } else {
-        return result(error_code::HEADER_BITS, pos);
-      }
+      vec2 = expand_utf8_to_utf32(vec2);
+      vec3 = expand_utf8_to_utf32(vec3);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec3, valid_count3, true)
     }
+    ptr += 4 * 16;
   }
-  return result(error_code::SUCCESS, utf32_output - start);
-}
+  const char *validatedptr = ptr; // validated up to ptr
 
-/**
- * When rewind_and_convert_with_errors is called, we are pointing at 'buf' and
- * we have up to len input bytes left, and we encountered some error. It is
- * possible that the error is at 'buf' exactly, but it could also be in the
- * previous bytes location (up to 3 bytes back).
- *
- * prior_bytes indicates how many bytes, prior to 'buf' may belong to the
- * current memory section and can be safely accessed. We prior_bytes to access
- * safely up to three bytes before 'buf'.
- *
- * The caller is responsible to ensure that len > 0.
- *
- * If the error is believed to have occurred prior to 'buf', the count value
- * contain in the result will be SIZE_T - 1, SIZE_T - 2, or SIZE_T - 3.
- */
-inline result rewind_and_convert_with_errors(size_t prior_bytes,
-                                             const char *buf, size_t len,
-                                             char32_t *utf32_output) {
-  size_t extra_len{0};
-  // We potentially need to go back in time and find a leading byte.
-  size_t how_far_back = 3; // 3 bytes in the past + current position
-  if (how_far_back > prior_bytes) {
-    how_far_back = prior_bytes;
-  }
-  bool found_leading_bytes{false};
-  // important: it is i <= how_far_back and not 'i < how_far_back'.
-  for (size_t i = 0; i <= how_far_back; i++) {
-    unsigned char byte = buf[-static_cast<std::ptrdiff_t>(i)];
-    found_leading_bytes = ((byte & 0b11000000) != 0b10000000);
-    if (found_leading_bytes) {
-      if (i > 0 && byte < 128) {
-        // If we had to go back and the leading byte is ascii
-        // then we can stop right away.
-        return result(error_code::TOO_LONG, 0 - i + 1);
+  // For the final pass, we validate 64 bytes, but we only transcode
+  // 3*16 bytes, so we may end up double-validating 16 bytes.
+  if (end - ptr >= 64) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    if (checker.check_next_input(utf8)) {
+      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
+      output += 64;
+      ptr += 64;
+    } else {
+      const __m512i lane0 = broadcast_epi128<0>(utf8);
+      const __m512i lane1 = broadcast_epi128<1>(utf8);
+      int valid_count0;
+      __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
+      const __m512i lane2 = broadcast_epi128<2>(utf8);
+      int valid_count1;
+      __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
+      if (valid_count0 + valid_count1 <= 16) {
+        vec0 = _mm512_mask_expand_epi32(
+            vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
+        valid_count0 += valid_count1;
+        vec0 = expand_utf8_to_utf32(vec0);
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+      } else {
+        vec0 = expand_utf8_to_utf32(vec0);
+        vec1 = expand_utf8_to_utf32(vec1);
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
       }
-      buf -= i;
-      extra_len = i;
-      break;
+
+      const __m512i lane3 = broadcast_epi128<3>(utf8);
+      SIMDUTF_ICELAKE_TRANSCODE16(lane2, lane3, true)
+
+      ptr += 3 * 16;
     }
+    validatedptr += 4 * 16;
   }
-  //
-  // It is possible for this function to return a negative count in its result.
-  // C++ Standard Section 18.1 defines size_t is in <cstddef> which is described
-  // in C Standard as <stddef.h>. C Standard Section 4.1.5 defines size_t as an
-  // unsigned integral type of the result of the sizeof operator
-  //
-  // An unsigned type will simply wrap round arithmetically (well defined).
-  //
-  if (!found_leading_bytes) {
-    // If how_far_back == 3, we may have four consecutive continuation bytes!!!
-    // [....] [continuation] [continuation] [continuation] | [buf is
-    // continuation] Or we possibly have a stream that does not start with a
-    // leading byte.
-    return result(error_code::TOO_LONG, 0 - how_far_back);
+  if (end != validatedptr) {
+    const __m512i utf8 =
+        _mm512_maskz_loadu_epi8(~UINT64_C(0) >> (64 - (end - validatedptr)),
+                                (const __m512i *)validatedptr);
+    checker.check_next_input(utf8);
   }
-
-  result res = convert_with_errors(buf, len + extra_len, utf32_output);
-  if (res.error) {
-    res.count -= extra_len;
+  checker.check_eof();
+  if (checker.errors()) {
+    return {ptr, nullptr}; // We found an error.
   }
-  return res;
+  return {ptr, output};
 }
 
-} // namespace utf8_to_utf32
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
-
-#endif
-/* end file src/scalar/utf8_to_utf32/utf8_to_utf32.h */
+// Like validating_utf8_to_fixed_length, but returns as soon as an error is
+// identified. todo: replace with the utf-8 to utf-16 routine adapted to
+// utf-32. This code is legacy.
+template <endianness big_endian, typename OUTPUT>
+std::tuple<const char *, OUTPUT *, bool>
+validating_utf8_to_fixed_length_with_constant_checks(const char *str,
+                                                     size_t len,
+                                                     OUTPUT *dwords) {
+  constexpr bool UTF32 = std::is_same<OUTPUT, uint32_t>::value;
+  constexpr bool UTF16 = std::is_same<OUTPUT, char16_t>::value;
+  static_assert(
+      UTF32 or UTF16,
+      "output type has to be uint32_t (for UTF-32) or char16_t (for UTF-16)");
+  static_assert(!(UTF32 and big_endian),
+                "we do not currently support big-endian UTF-32");
 
-/* begin file src/scalar/latin1_to_utf16/latin1_to_utf16.h */
-#ifndef SIMDUTF_LATIN1_TO_UTF16_H
-#define SIMDUTF_LATIN1_TO_UTF16_H
+  const char *ptr = str;
+  const char *end = ptr + len;
+  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  OUTPUT *output = dwords;
+  avx512_utf8_checker checker{};
+  /**
+   * In the main loop, we consume 64 bytes per iteration,
+   * but we access 64 + 4 bytes.
+   */
+  while (end - ptr >= 4 + 64) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    bool ascii = checker.check_next_input(utf8);
+    if (checker.errors()) {
+      return {ptr, output, false}; // We found an error.
+    }
+    if (ascii) {
+      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
+      output += 64;
+      ptr += 64;
+      continue;
+    }
+    const __m512i lane0 = broadcast_epi128<0>(utf8);
+    const __m512i lane1 = broadcast_epi128<1>(utf8);
+    int valid_count0;
+    __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
+    const __m512i lane2 = broadcast_epi128<2>(utf8);
+    int valid_count1;
+    __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
+    if (valid_count0 + valid_count1 <= 16) {
+      vec0 = _mm512_mask_expand_epi32(
+          vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
+      valid_count0 += valid_count1;
+      vec0 = expand_utf8_to_utf32(vec0);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+    } else {
+      vec0 = expand_utf8_to_utf32(vec0);
+      vec1 = expand_utf8_to_utf32(vec1);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
+    }
+    const __m512i lane3 = broadcast_epi128<3>(utf8);
+    int valid_count2;
+    __m512i vec2 = expand_and_identify(lane2, lane3, valid_count2);
+    uint32_t tmp1;
+    ::memcpy(&tmp1, ptr + 64, sizeof(tmp1));
+    const __m512i lane4 = _mm512_set1_epi32(tmp1);
+    int valid_count3;
+    __m512i vec3 = expand_and_identify(lane3, lane4, valid_count3);
+    if (valid_count2 + valid_count3 <= 16) {
+      vec2 = _mm512_mask_expand_epi32(
+          vec2, __mmask16(((1 << valid_count3) - 1) << valid_count2), vec3);
+      valid_count2 += valid_count3;
+      vec2 = expand_utf8_to_utf32(vec2);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
+    } else {
+      vec2 = expand_utf8_to_utf32(vec2);
+      vec3 = expand_utf8_to_utf32(vec3);
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
+      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec3, valid_count3, true)
+    }
+    ptr += 4 * 16;
+  }
+  const char *validatedptr = ptr; // validated up to ptr
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace latin1_to_utf16 {
+  // For the final pass, we validate 64 bytes, but we only transcode
+  // 3*16 bytes, so we may end up double-validating 16 bytes.
+  if (end - ptr >= 64) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    bool ascii = checker.check_next_input(utf8);
+    if (checker.errors()) {
+      return {ptr, output, false}; // We found an error.
+    }
+    if (ascii) {
+      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
+      output += 64;
+      ptr += 64;
+    } else {
+      const __m512i lane0 = broadcast_epi128<0>(utf8);
+      const __m512i lane1 = broadcast_epi128<1>(utf8);
+      int valid_count0;
+      __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
+      const __m512i lane2 = broadcast_epi128<2>(utf8);
+      int valid_count1;
+      __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
+      if (valid_count0 + valid_count1 <= 16) {
+        vec0 = _mm512_mask_expand_epi32(
+            vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
+        valid_count0 += valid_count1;
+        vec0 = expand_utf8_to_utf32(vec0);
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+      } else {
+        vec0 = expand_utf8_to_utf32(vec0);
+        vec1 = expand_utf8_to_utf32(vec1);
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
+        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
+      }
 
-template <endianness big_endian>
-inline size_t convert(const char *buf, size_t len, char16_t *utf16_output) {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  size_t pos = 0;
-  char16_t *start{utf16_output};
+      const __m512i lane3 = broadcast_epi128<3>(utf8);
+      SIMDUTF_ICELAKE_TRANSCODE16(lane2, lane3, true)
 
-  while (pos < len) {
-    uint16_t word =
-        uint16_t(data[pos]); // extend Latin-1 char to 16-bit Unicode code point
-    *utf16_output++ =
-        char16_t(match_system(big_endian) ? word : utf16::swap_bytes(word));
-    pos++;
+      ptr += 3 * 16;
+    }
+    validatedptr += 4 * 16;
   }
-
-  return utf16_output - start;
+  if (end != validatedptr) {
+    const __m512i utf8 =
+        _mm512_maskz_loadu_epi8(~UINT64_C(0) >> (64 - (end - validatedptr)),
+                                (const __m512i *)validatedptr);
+    checker.check_next_input(utf8);
+  }
+  checker.check_eof();
+  if (checker.errors()) {
+    return {ptr, output, false}; // We found an error.
+  }
+  return {ptr, output, true};
 }
+/* end file src/icelake/icelake_from_utf8.inl.cpp */
+/* begin file src/icelake/icelake_convert_utf8_to_latin1.inl.cpp */
+// file included directly
 
-template <endianness big_endian>
-inline result convert_with_errors(const char *buf, size_t len,
-                                  char16_t *utf16_output) {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  size_t pos = 0;
-  char16_t *start{utf16_output};
+// File contains conversion procedure from possibly invalid UTF-8 strings.
 
-  while (pos < len) {
-    uint16_t word =
-        uint16_t(data[pos]); // extend Latin-1 char to 16-bit Unicode code point
-    *utf16_output++ =
-        char16_t(match_system(big_endian) ? word : utf16::swap_bytes(word));
-    pos++;
+template <bool is_remaining>
+simdutf_really_inline size_t process_block_from_utf8_to_latin1(
+    const char *buf, size_t len, char *latin_output, __m512i minus64,
+    __m512i one, __mmask64 *next_leading_ptr, __mmask64 *next_bit6_ptr) {
+  __mmask64 load_mask =
+      is_remaining ? _bzhi_u64(~0ULL, (unsigned int)len) : ~0ULL;
+  __m512i input = _mm512_maskz_loadu_epi8(load_mask, (__m512i *)buf);
+  __mmask64 nonascii = _mm512_movepi8_mask(input);
+  if (nonascii == 0) {
+    if (*next_leading_ptr) { // If we ended with a leading byte, it is an error.
+      return 0;              // Indicates error
+    }
+    is_remaining
+        ? _mm512_mask_storeu_epi8((__m512i *)latin_output, load_mask, input)
+        : _mm512_storeu_si512((__m512i *)latin_output, input);
+    return len;
   }
 
-  return result(error_code::SUCCESS, utf16_output - start);
-}
-
-} // namespace latin1_to_utf16
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
-
-#endif
-/* end file src/scalar/latin1_to_utf16/latin1_to_utf16.h */
-/* begin file src/scalar/latin1_to_utf32/latin1_to_utf32.h */
-#ifndef SIMDUTF_LATIN1_TO_UTF32_H
-#define SIMDUTF_LATIN1_TO_UTF32_H
+  const __mmask64 leading = _mm512_cmpge_epu8_mask(input, minus64);
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace latin1_to_utf32 {
+  __m512i highbits = _mm512_xor_si512(input, _mm512_set1_epi8(-62));
+  __mmask64 invalid_leading_bytes =
+      _mm512_mask_cmpgt_epu8_mask(leading, highbits, one);
 
-inline size_t convert(const char *buf, size_t len, char32_t *utf32_output) {
-  const unsigned char *data = reinterpret_cast<const unsigned char *>(buf);
-  char32_t *start{utf32_output};
-  for (size_t i = 0; i < len; i++) {
-    *utf32_output++ = (char32_t)data[i];
+  if (invalid_leading_bytes) {
+    return 0; // Indicates error
   }
-  return utf32_output - start;
-}
 
-} // namespace latin1_to_utf32
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+  __mmask64 leading_shift = (leading << 1) | *next_leading_ptr;
 
-#endif
-/* end file src/scalar/latin1_to_utf32/latin1_to_utf32.h */
+  if ((nonascii ^ leading) != leading_shift) {
+    return 0; // Indicates error
+  }
 
-/* begin file src/scalar/utf8_to_latin1/utf8_to_latin1.h */
-#ifndef SIMDUTF_UTF8_TO_LATIN1_H
-#define SIMDUTF_UTF8_TO_LATIN1_H
+  const __mmask64 bit6 = _mm512_cmpeq_epi8_mask(highbits, one);
+  input =
+      _mm512_mask_sub_epi8(input, (bit6 << 1) | *next_bit6_ptr, input, minus64);
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf8_to_latin1 {
+  __mmask64 retain = ~leading & load_mask;
+  __m512i output = _mm512_maskz_compress_epi8(retain, input);
+  int64_t written_out = count_ones(retain);
+  if (written_out == 0) {
+    return 0; // Indicates error
+  }
+  *next_bit6_ptr = bit6 >> 63;
+  *next_leading_ptr = leading >> 63;
 
-inline size_t convert(const char *buf, size_t len, char *latin_output) {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  size_t pos = 0;
-  char *start{latin_output};
+  __mmask64 store_mask = ~UINT64_C(0) >> (64 - written_out);
 
-  while (pos < len) {
-    // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <=
-        len) { // if it is safe to read 16 more bytes, check that they are ascii
-      uint64_t v1;
-      ::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2}; // We are only interested in these bits: 1000 1000
-                           // 1000 1000 .... etc
-      if ((v & 0x8080808080808080) ==
-          0) { // if NONE of these are set, e.g. all of them are zero, then
-               // everything is ASCII
-        size_t final_pos = pos + 16;
-        while (pos < final_pos) {
-          *latin_output++ = char(buf[pos]);
-          pos++;
-        }
-        continue;
-      }
-    }
+  _mm512_mask_storeu_epi8((__m512i *)latin_output, store_mask, output);
 
-    // suppose it is not an all ASCII byte sequence
-    uint8_t leading_byte = data[pos]; // leading byte
-    if (leading_byte < 0b10000000) {
-      // converting one ASCII byte !!!
-      *latin_output++ = char(leading_byte);
-      pos++;
-    } else if ((leading_byte & 0b11100000) ==
-               0b11000000) { // the first three bits indicate:
-      // We have a two-byte UTF-8
-      if (pos + 1 >= len) {
-        return 0;
-      } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return 0;
-      } // checks if the next byte is a valid continuation byte in UTF-8. A
-        // valid continuation byte starts with 10.
-      // range check -
-      uint32_t code_point =
-          (leading_byte & 0b00011111) << 6 |
-          (data[pos + 1] &
-           0b00111111); // assembles the Unicode code point from the two bytes.
-                        // It does this by discarding the leading 110 and 10
-                        // bits from the two bytes, shifting the remaining bits
-                        // of the first byte, and then combining the results
-                        // with a bitwise OR operation.
-      if (code_point < 0x80 || 0xFF < code_point) {
-        return 0; // We only care about the range 129-255 which is Non-ASCII
-                  // latin1 characters. A code_point beneath 0x80 is invalid as
-                  // it is already covered by bytes whose leading bit is zero.
-      }
-      *latin_output++ = char(code_point);
-      pos += 2;
-    } else {
-      return 0;
-    }
-  }
-  return latin_output - start;
+  return written_out;
 }
 
-inline result convert_with_errors(const char *buf, size_t len,
-                                  char *latin_output) {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
-  size_t pos = 0;
-  char *start{latin_output};
-
-  while (pos < len) {
-    // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <=
-        len) { // if it is safe to read 16 more bytes, check that they are ascii
-      uint64_t v1;
-      ::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 | v2}; // We are only interested in these bits: 1000 1000
-                           // 1000 1000...etc
-      if ((v & 0x8080808080808080) ==
-          0) { // if NONE of these are set, e.g. all of them are zero, then
-               // everything is ASCII
-        size_t final_pos = pos + 16;
-        while (pos < final_pos) {
-          *latin_output++ = char(buf[pos]);
-          pos++;
-        }
-        continue;
-      }
-    }
-    // suppose it is not an all ASCII byte sequence
-    uint8_t leading_byte = data[pos]; // leading byte
-    if (leading_byte < 0b10000000) {
-      // converting one ASCII byte !!!
-      *latin_output++ = char(leading_byte);
-      pos++;
-    } else if ((leading_byte & 0b11100000) ==
-               0b11000000) { // the first three bits indicate:
-      // We have a two-byte UTF-8
-      if (pos + 1 >= len) {
-        return result(error_code::TOO_SHORT, pos);
-      } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return result(error_code::TOO_SHORT, pos);
-      } // checks if the next byte is a valid continuation byte in UTF-8. A
-        // valid continuation byte starts with 10.
-      // range check -
-      uint32_t code_point =
-          (leading_byte & 0b00011111) << 6 |
-          (data[pos + 1] &
-           0b00111111); // assembles the Unicode code point from the two bytes.
-                        // It does this by discarding the leading 110 and 10
-                        // bits from the two bytes, shifting the remaining bits
-                        // of the first byte, and then combining the results
-                        // with a bitwise OR operation.
-      if (code_point < 0x80) {
-        return result(error_code::OVERLONG, pos);
-      }
-      if (0xFF < code_point) {
-        return result(error_code::TOO_LARGE, pos);
-      } // We only care about the range 129-255 which is Non-ASCII latin1
-        // characters
-      *latin_output++ = char(code_point);
-      pos += 2;
-    } else if ((leading_byte & 0b11110000) == 0b11100000) {
-      // We have a three-byte UTF-8
-      return result(error_code::TOO_LARGE, pos);
-    } else if ((leading_byte & 0b11111000) == 0b11110000) { // 0b11110000
-      // we have a 4-byte UTF-8 word.
-      return result(error_code::TOO_LARGE, pos);
-    } else {
-      // we either have too many continuation bytes or an invalid leading byte
-      if ((leading_byte & 0b11000000) == 0b10000000) {
-        return result(error_code::TOO_LONG, pos);
-      }
+size_t utf8_to_latin1_avx512(const char *&inbuf, size_t len,
+                             char *&inlatin_output) {
+  const char *buf = inbuf;
+  char *latin_output = inlatin_output;
+  char *start = latin_output;
+  size_t pos = 0;
+  __m512i minus64 = _mm512_set1_epi8(-64); // 11111111111 ... 1100 0000
+  __m512i one = _mm512_set1_epi8(1);
+  __mmask64 next_leading = 0;
+  __mmask64 next_bit6 = 0;
 
-      return result(error_code::HEADER_BITS, pos);
+  while (pos + 64 <= len) {
+    size_t written = process_block_from_utf8_to_latin1<false>(
+        buf + pos, 64, latin_output, minus64, one, &next_leading, &next_bit6);
+    if (written == 0) {
+      inlatin_output = latin_output;
+      inbuf = buf + pos - next_leading;
+      return 0; // Indicates error at pos or after, or just before pos (too
+                // short error)
     }
+    latin_output += written;
+    pos += 64;
   }
-  return result(error_code::SUCCESS, latin_output - start);
-}
 
-inline result rewind_and_convert_with_errors(size_t prior_bytes,
-                                             const char *buf, size_t len,
-                                             char *latin1_output) {
-  size_t extra_len{0};
-  // We potentially need to go back in time and find a leading byte.
-  // In theory '3' would be sufficient, but sometimes the error can go back
-  // quite far.
-  size_t how_far_back = prior_bytes;
-  // size_t how_far_back = 3; // 3 bytes in the past + current position
-  // if(how_far_back >= prior_bytes) { how_far_back = prior_bytes; }
-  bool found_leading_bytes{false};
-  // important: it is i <= how_far_back and not 'i < how_far_back'.
-  for (size_t i = 0; i <= how_far_back; i++) {
-    unsigned char byte = buf[-static_cast<std::ptrdiff_t>(i)];
-    found_leading_bytes = ((byte & 0b11000000) != 0b10000000);
-    if (found_leading_bytes) {
-      if (i > 0 && byte < 128) {
-        // If we had to go back and the leading byte is ascii
-        // then we can stop right away.
-        return result(error_code::TOO_LONG, 0 - i + 1);
-      }
-      buf -= i;
-      extra_len = i;
-      break;
+  if (pos < len) {
+    size_t remaining = len - pos;
+    size_t written = process_block_from_utf8_to_latin1<true>(
+        buf + pos, remaining, latin_output, minus64, one, &next_leading,
+        &next_bit6);
+    if (written == 0) {
+      inbuf = buf + pos - next_leading;
+      inlatin_output = latin_output;
+      return 0; // Indicates error at pos or after, or just before pos (too
+                // short error)
     }
+    latin_output += written;
   }
-  //
-  // It is possible for this function to return a negative count in its result.
-  // C++ Standard Section 18.1 defines size_t is in <cstddef> which is described
-  // in C Standard as <stddef.h>. C Standard Section 4.1.5 defines size_t as an
-  // unsigned integral type of the result of the sizeof operator
-  //
-  // An unsigned type will simply wrap round arithmetically (well defined).
-  //
-  if (!found_leading_bytes) {
-    // If how_far_back == 3, we may have four consecutive continuation bytes!!!
-    // [....] [continuation] [continuation] [continuation] | [buf is
-    // continuation] Or we possibly have a stream that does not start with a
-    // leading byte.
-    return result(error_code::TOO_LONG, 0 - how_far_back);
-  }
-  result res = convert_with_errors(buf, len + extra_len, latin1_output);
-  if (res.error) {
-    res.count -= extra_len;
+  if (next_leading) {
+    inbuf = buf + len - next_leading;
+    inlatin_output = latin_output;
+    return 0; // Indicates error at end of buffer
   }
-  return res;
+  inlatin_output = latin_output;
+  inbuf += len;
+  return size_t(latin_output - start);
 }
+/* end file src/icelake/icelake_convert_utf8_to_latin1.inl.cpp */
+/* begin file src/icelake/icelake_convert_valid_utf8_to_latin1.inl.cpp */
+// file included directly
 
-} // namespace utf8_to_latin1
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+// This file contains the conversion procedure from valid UTF-8 to Latin-1.
 
-#endif
-/* end file src/scalar/utf8_to_latin1/utf8_to_latin1.h */
-/* begin file src/scalar/utf16_to_latin1/utf16_to_latin1.h */
-#ifndef SIMDUTF_UTF16_TO_LATIN1_H
-#define SIMDUTF_UTF16_TO_LATIN1_H
+template <bool is_remaining>
+simdutf_really_inline size_t process_valid_block_from_utf8_to_latin1(
+    const char *buf, size_t len, char *latin_output, __m512i minus64,
+    __m512i one, __mmask64 *next_leading_ptr, __mmask64 *next_bit6_ptr) {
+  __mmask64 load_mask =
+      is_remaining ? _bzhi_u64(~0ULL, (unsigned int)len) : ~0ULL;
+  __m512i input = _mm512_maskz_loadu_epi8(load_mask, (__m512i *)buf);
+  __mmask64 nonascii = _mm512_movepi8_mask(input);
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf16_to_latin1 {
+  if (nonascii == 0) {
+    is_remaining
+        ? _mm512_mask_storeu_epi8((__m512i *)latin_output, load_mask, input)
+        : _mm512_storeu_si512((__m512i *)latin_output, input);
+    return len;
+  }
 
-#include <cstring> // for std::memcpy
+  __mmask64 leading = _mm512_cmpge_epu8_mask(input, minus64);
 
-template <endianness big_endian>
-inline size_t convert(const char16_t *buf, size_t len, char *latin_output) {
-  if (len == 0) {
-    return 0;
+  __m512i highbits = _mm512_xor_si512(input, _mm512_set1_epi8(-62));
+
+  *next_leading_ptr = leading >> 63;
+
+  __mmask64 bit6 = _mm512_cmpeq_epi8_mask(highbits, one);
+  input =
+      _mm512_mask_sub_epi8(input, (bit6 << 1) | *next_bit6_ptr, input, minus64);
+  *next_bit6_ptr = bit6 >> 63;
+
+  __mmask64 retain = ~leading & load_mask;
+  __m512i output = _mm512_maskz_compress_epi8(retain, input);
+  int64_t written_out = count_ones(retain);
+  if (written_out == 0) {
+    return 0; // Indicates error
   }
-  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
+  __mmask64 store_mask = ~UINT64_C(0) >> (64 - written_out);
+  // Optimization opportunity: sometimes, masked writes are not needed.
+  _mm512_mask_storeu_epi8((__m512i *)latin_output, store_mask, output);
+  return written_out;
+}
+
+size_t valid_utf8_to_latin1_avx512(const char *buf, size_t len,
+                                   char *latin_output) {
+  char *start = latin_output;
   size_t pos = 0;
-  char *current_write = latin_output;
-  uint16_t word = 0;
-  uint16_t too_large = 0;
+  __m512i minus64 = _mm512_set1_epi8(-64); // 11111111111 ... 1100 0000
+  __m512i one = _mm512_set1_epi8(1);
+  __mmask64 next_leading = 0;
+  __mmask64 next_bit6 = 0;
 
-  while (pos < len) {
-    word = !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    too_large |= word;
-    *current_write++ = char(word & 0xFF);
-    pos++;
+  while (pos + 64 <= len) {
+    size_t written = process_valid_block_from_utf8_to_latin1<false>(
+        buf + pos, 64, latin_output, minus64, one, &next_leading, &next_bit6);
+    latin_output += written;
+    pos += 64;
   }
-  if ((too_large & 0xFF00) != 0) {
-    return 0;
+
+  if (pos < len) {
+    size_t remaining = len - pos;
+    size_t written = process_valid_block_from_utf8_to_latin1<true>(
+        buf + pos, remaining, latin_output, minus64, one, &next_leading,
+        &next_bit6);
+    latin_output += written;
   }
 
-  return current_write - latin_output;
+  return (size_t)(latin_output - start);
 }
-
+/* end file src/icelake/icelake_convert_valid_utf8_to_latin1.inl.cpp */
+/* begin file src/icelake/icelake_convert_utf16_to_latin1.inl.cpp */
+// file included directly
 template <endianness big_endian>
-inline result convert_with_errors(const char16_t *buf, size_t len,
-                                  char *latin_output) {
-  if (len == 0) {
-    return result(error_code::SUCCESS, 0);
+size_t icelake_convert_utf16_to_latin1(const char16_t *buf, size_t len,
+                                       char *latin1_output) {
+  const char16_t *end = buf + len;
+  __m512i v_0xFF = _mm512_set1_epi16(0xff);
+  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  __m512i shufmask = _mm512_set_epi8(
+      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+      0, 0, 0, 0, 0, 0, 0, 62, 60, 58, 56, 54, 52, 50, 48, 46, 44, 42, 40, 38,
+      36, 34, 32, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0);
+  while (end - buf >= 32) {
+    __m512i in = _mm512_loadu_si512((__m512i *)buf);
+    if (big_endian) {
+      in = _mm512_shuffle_epi8(in, byteflip);
+    }
+    if (_mm512_cmpgt_epu16_mask(in, v_0xFF)) {
+      return 0;
+    }
+    _mm256_storeu_si256(
+        (__m256i *)latin1_output,
+        _mm512_castsi512_si256(_mm512_permutexvar_epi8(shufmask, in)));
+    latin1_output += 32;
+    buf += 32;
   }
-  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
-  size_t pos = 0;
-  char *start{latin_output};
-  uint16_t word;
-
-  while (pos < len) {
-    if (pos + 16 <= len) { // if it is safe to read 32 more bytes, check that
-                           // they are Latin1
-      uint64_t v1, v2, v3, v4;
-      ::memcpy(&v1, data + pos, sizeof(uint64_t));
-      ::memcpy(&v2, data + pos + 4, sizeof(uint64_t));
-      ::memcpy(&v3, data + pos + 8, sizeof(uint64_t));
-      ::memcpy(&v4, data + pos + 12, sizeof(uint64_t));
-
-      if (!match_system(big_endian)) {
-        v1 = (v1 >> 8) | (v1 << (64 - 8));
-      }
-      if (!match_system(big_endian)) {
-        v2 = (v2 >> 8) | (v2 << (64 - 8));
-      }
-      if (!match_system(big_endian)) {
-        v3 = (v3 >> 8) | (v3 << (64 - 8));
-      }
-      if (!match_system(big_endian)) {
-        v4 = (v4 >> 8) | (v4 << (64 - 8));
-      }
-
-      if (((v1 | v2 | v3 | v4) & 0xFF00FF00FF00FF00) == 0) {
-        size_t final_pos = pos + 16;
-        while (pos < final_pos) {
-          *latin_output++ = !match_system(big_endian)
-                                ? char(utf16::swap_bytes(data[pos]))
-                                : char(data[pos]);
-          pos++;
-        }
-        continue;
-      }
+  if (buf < end) {
+    uint32_t mask(uint32_t(1 << (end - buf)) - 1);
+    __m512i in = _mm512_maskz_loadu_epi16(mask, buf);
+    if (big_endian) {
+      in = _mm512_shuffle_epi8(in, byteflip);
     }
-    word = !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    if ((word & 0xFF00) == 0) {
-      *latin_output++ = char(word & 0xFF);
-      pos++;
-    } else {
-      return result(error_code::TOO_LARGE, pos);
+    if (_mm512_cmpgt_epu16_mask(in, v_0xFF)) {
+      return 0;
     }
+    _mm256_mask_storeu_epi8(
+        latin1_output, mask,
+        _mm512_castsi512_si256(_mm512_permutexvar_epi8(shufmask, in)));
   }
-  return result(error_code::SUCCESS, latin_output - start);
+  return len;
 }
 
-} // namespace utf16_to_latin1
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
-
-#endif
-/* end file src/scalar/utf16_to_latin1/utf16_to_latin1.h */
-/* begin file src/scalar/utf32_to_latin1/utf32_to_latin1.h */
-#ifndef SIMDUTF_UTF32_TO_LATIN1_H
-#define SIMDUTF_UTF32_TO_LATIN1_H
-
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf32_to_latin1 {
-
-inline size_t convert(const char32_t *buf, size_t len, char *latin1_output) {
-  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  char *start = latin1_output;
-  uint32_t utf32_char;
-  size_t pos = 0;
-  uint32_t too_large = 0;
-
-  while (pos < len) {
-    utf32_char = (uint32_t)data[pos];
-    too_large |= utf32_char;
-    *latin1_output++ = (char)(utf32_char & 0xFF);
-    pos++;
-  }
-  if ((too_large & 0xFFFFFF00) != 0) {
-    return 0;
+template <endianness big_endian>
+std::pair<result, char *>
+icelake_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
+                                            char *latin1_output) {
+  const char16_t *end = buf + len;
+  const char16_t *start = buf;
+  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  __m512i v_0xFF = _mm512_set1_epi16(0xff);
+  __m512i shufmask = _mm512_set_epi8(
+      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+      0, 0, 0, 0, 0, 0, 0, 62, 60, 58, 56, 54, 52, 50, 48, 46, 44, 42, 40, 38,
+      36, 34, 32, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0);
+  while (end - buf >= 32) {
+    __m512i in = _mm512_loadu_si512((__m512i *)buf);
+    if (big_endian) {
+      in = _mm512_shuffle_epi8(in, byteflip);
+    }
+    if (_mm512_cmpgt_epu16_mask(in, v_0xFF)) {
+      uint16_t word;
+      while ((word = (big_endian ? scalar::utf16::swap_bytes(uint16_t(*buf))
+                                 : uint16_t(*buf))) <= 0xff) {
+        *latin1_output++ = uint8_t(word);
+        buf++;
+      }
+      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
+                            latin1_output);
+    }
+    _mm256_storeu_si256(
+        (__m256i *)latin1_output,
+        _mm512_castsi512_si256(_mm512_permutexvar_epi8(shufmask, in)));
+    latin1_output += 32;
+    buf += 32;
   }
-  return latin1_output - start;
-}
+  if (buf < end) {
+    uint32_t mask(uint32_t(1 << (end - buf)) - 1);
+    __m512i in = _mm512_maskz_loadu_epi16(mask, buf);
+    if (big_endian) {
+      in = _mm512_shuffle_epi8(in, byteflip);
+    }
+    if (_mm512_cmpgt_epu16_mask(in, v_0xFF)) {
 
-inline result convert_with_errors(const char32_t *buf, size_t len,
-                                  char *latin1_output) {
-  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  char *start{latin1_output};
-  size_t pos = 0;
-  while (pos < len) {
-    if (pos + 2 <=
-        len) { // if it is safe to read 8 more bytes, check that they are Latin1
-      uint64_t v;
-      ::memcpy(&v, data + pos, sizeof(uint64_t));
-      if ((v & 0xFFFFFF00FFFFFF00) == 0) {
-        *latin1_output++ = char(buf[pos]);
-        *latin1_output++ = char(buf[pos + 1]);
-        pos += 2;
-        continue;
+      uint16_t word;
+      while ((word = (big_endian ? scalar::utf16::swap_bytes(uint16_t(*buf))
+                                 : uint16_t(*buf))) <= 0xff) {
+        *latin1_output++ = uint8_t(word);
+        buf++;
       }
+      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
+                            latin1_output);
     }
-    uint32_t utf32_char = data[pos];
-    if ((utf32_char & 0xFFFFFF00) ==
-        0) { // Check if the character can be represented in Latin-1
-      *latin1_output++ = (char)(utf32_char & 0xFF);
-      pos++;
-    } else {
-      return result(error_code::TOO_LARGE, pos);
-    };
+    _mm256_mask_storeu_epi8(
+        latin1_output, mask,
+        _mm512_castsi512_si256(_mm512_permutexvar_epi8(shufmask, in)));
   }
-  return result(error_code::SUCCESS, latin1_output - start);
+  return std::make_pair(result(error_code::SUCCESS, len), latin1_output);
 }
+/* end file src/icelake/icelake_convert_utf16_to_latin1.inl.cpp */
+/* begin file src/icelake/icelake_convert_utf16_to_utf8.inl.cpp */
+// file included directly
 
-} // namespace utf32_to_latin1
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
-
-#endif
-/* end file src/scalar/utf32_to_latin1/utf32_to_latin1.h */
-
-/* begin file src/scalar/utf8_to_latin1/valid_utf8_to_latin1.h */
-#ifndef SIMDUTF_VALID_UTF8_TO_LATIN1_H
-#define SIMDUTF_VALID_UTF8_TO_LATIN1_H
+/**
+ * This function converts the input (inbuf, inlen), assumed to be valid
+ * UTF-16 (byte order given by the template parameter), into UTF-8 (to outbuf).
+ * The number of code units written is stored in 'outlen', and the function
+ * reports the number of input words consumed.
+ */
+template <endianness big_endian>
+size_t utf16_to_utf8_avx512i(const char16_t *inbuf, size_t inlen,
+                             unsigned char *outbuf, size_t *outlen) {
+  __m512i in;
+  __mmask32 inmask = _cvtu32_mask32(0x7fffffff);
+  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  const char16_t *const inbuf_orig = inbuf;
+  const unsigned char *const outbuf_orig = outbuf;
+  int adjust = 0;
+  int carry = 0;
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf8_to_latin1 {
+  while (inlen >= 32) {
+    in = _mm512_loadu_si512(inbuf);
+    if (big_endian) {
+      in = _mm512_shuffle_epi8(in, byteflip);
+    }
+    inlen -= 31;
+  lastiteration:
+    inbuf += 31;
 
-inline size_t convert_valid(const char *buf, size_t len, char *latin_output) {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+  failiteration:
+    const __mmask32 is234byte = _mm512_mask_cmp_epu16_mask(
+        inmask, in, _mm512_set1_epi16(0x0080), _MM_CMPINT_NLT);
 
-  size_t pos = 0;
-  char *start{latin_output};
+    if (_ktestz_mask32_u8(inmask, is234byte)) {
+      // fast path for ASCII only
+      _mm512_mask_cvtepi16_storeu_epi8(outbuf, inmask, in);
+      outbuf += 31;
+      carry = 0;
 
-  while (pos < len) {
-    // try to convert the next block of 16 ASCII bytes
-    if (pos + 16 <=
-        len) { // if it is safe to read 16 more bytes, check that they are ascii
-      uint64_t v1;
-      ::memcpy(&v1, data + pos, sizeof(uint64_t));
-      uint64_t v2;
-      ::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
-      uint64_t v{v1 |
-                 v2}; // We are only interested in these bits: 1000 1000 1000
-                      // 1000, so it makes sense to concatenate everything
-      if ((v & 0x8080808080808080) ==
-          0) { // if NONE of these are set, e.g. all of them are zero, then
-               // everything is ASCII
-        size_t final_pos = pos + 16;
-        while (pos < final_pos) {
-          *latin_output++ = char(buf[pos]);
-          pos++;
-        }
+      if (inlen < 32) {
+        goto tail;
+      } else {
         continue;
       }
     }
 
-    // suppose it is not an all ASCII byte sequence
-    uint8_t leading_byte = data[pos]; // leading byte
-    if (leading_byte < 0b10000000) {
-      // converting one ASCII byte !!!
-      *latin_output++ = char(leading_byte);
-      pos++;
-    } else if ((leading_byte & 0b11100000) ==
-               0b11000000) { // the first three bits indicate:
-      // We have a two-byte UTF-8
-      if (pos + 1 >= len) {
-        break;
-      } // minimal bound checking
-      if ((data[pos + 1] & 0b11000000) != 0b10000000) {
-        return 0;
-      } // checks if the next byte is a valid continuation byte in UTF-8. A
-        // valid continuation byte starts with 10.
-      // range check -
-      uint32_t code_point =
-          (leading_byte & 0b00011111) << 6 |
-          (data[pos + 1] &
-           0b00111111); // assembles the Unicode code point from the two bytes.
-                        // It does this by discarding the leading 110 and 10
-                        // bits from the two bytes, shifting the remaining bits
-                        // of the first byte, and then combining the results
-                        // with a bitwise OR operation.
-      *latin_output++ = char(code_point);
-      pos += 2;
-    } else {
-      // we may have a continuation but we do not do error checking
-      return 0;
-    }
-  }
-  return latin_output - start;
-}
-
-} // namespace utf8_to_latin1
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+    const __mmask32 is12byte =
+        _mm512_cmp_epu16_mask(in, _mm512_set1_epi16(0x0800), _MM_CMPINT_LT);
 
-#endif
-/* end file src/scalar/utf8_to_latin1/valid_utf8_to_latin1.h */
-/* begin file src/scalar/utf16_to_latin1/valid_utf16_to_latin1.h */
-#ifndef SIMDUTF_VALID_UTF16_TO_LATIN1_H
-#define SIMDUTF_VALID_UTF16_TO_LATIN1_H
+    if (_ktestc_mask32_u8(is12byte, inmask)) {
+      // fast path for 1 and 2 byte only
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf16_to_latin1 {
+      const __m512i twobytes = _mm512_ternarylogic_epi32(
+          _mm512_slli_epi16(in, 8), _mm512_srli_epi16(in, 6),
+          _mm512_set1_epi16(0x3f3f), 0xa8); // (A|B)&C
+      in = _mm512_mask_add_epi16(in, is234byte, twobytes,
+                                 _mm512_set1_epi16(int16_t(0x80c0)));
+      const __m512i cmpmask =
+          _mm512_mask_blend_epi16(inmask, _mm512_set1_epi16(int16_t(0xffff)),
+                                  _mm512_set1_epi16(0x0800));
+      const __mmask64 smoosh =
+          _mm512_cmp_epu8_mask(in, cmpmask, _MM_CMPINT_NLT);
+      const __m512i out = _mm512_maskz_compress_epi8(smoosh, in);
+      _mm512_mask_storeu_epi8(outbuf,
+                              _cvtu64_mask64(_pext_u64(_cvtmask64_u64(smoosh),
+                                                       _cvtmask64_u64(smoosh))),
+                              out);
+      outbuf += 31 + _mm_popcnt_u32(_cvtmask32_u32(is234byte));
+      carry = 0;
 
-template <endianness big_endian>
-inline size_t convert_valid(const char16_t *buf, size_t len,
-                            char *latin_output) {
-  const uint16_t *data = reinterpret_cast<const uint16_t *>(buf);
-  size_t pos = 0;
-  char *start{latin_output};
-  uint16_t word = 0;
+      if (inlen < 32) {
+        goto tail;
+      } else {
+        continue;
+      }
+    }
+    __m512i lo = _mm512_cvtepu16_epi32(_mm512_castsi512_si256(in));
+    __m512i hi = _mm512_cvtepu16_epi32(_mm512_extracti32x8_epi32(in, 1));
 
-  while (pos < len) {
-    word = !match_system(big_endian) ? utf16::swap_bytes(data[pos]) : data[pos];
-    *latin_output++ = char(word);
-    pos++;
-  }
+    __m512i taglo = _mm512_set1_epi32(0x8080e000);
+    __m512i taghi = taglo;
 
-  return latin_output - start;
-}
+    const __m512i fc00masked =
+        _mm512_and_epi32(in, _mm512_set1_epi16(int16_t(0xfc00)));
+    const __mmask32 hisurr = _mm512_mask_cmp_epu16_mask(
+        inmask, fc00masked, _mm512_set1_epi16(int16_t(0xd800)), _MM_CMPINT_EQ);
+    const __mmask32 losurr = _mm512_cmp_epu16_mask(
+        fc00masked, _mm512_set1_epi16(int16_t(0xdc00)), _MM_CMPINT_EQ);
 
-} // namespace utf16_to_latin1
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+    int carryout = 0;
+    if (!_kortestz_mask32_u8(hisurr, losurr)) {
+      // handle surrogates
 
-#endif
-/* end file src/scalar/utf16_to_latin1/valid_utf16_to_latin1.h */
-/* begin file src/scalar/utf32_to_latin1/valid_utf32_to_latin1.h */
-#ifndef SIMDUTF_VALID_UTF32_TO_LATIN1_H
-#define SIMDUTF_VALID_UTF32_TO_LATIN1_H
+      __m512i los = _mm512_alignr_epi32(hi, lo, 1);
+      __m512i his = _mm512_alignr_epi32(lo, hi, 1);
 
-namespace simdutf {
-namespace scalar {
-namespace {
-namespace utf32_to_latin1 {
+      const __mmask32 hisurrhi = _kshiftri_mask32(hisurr, 16);
+      taglo = _mm512_mask_mov_epi32(taglo, __mmask16(hisurr),
+                                    _mm512_set1_epi32(0x808080f0));
+      taghi = _mm512_mask_mov_epi32(taghi, __mmask16(hisurrhi),
+                                    _mm512_set1_epi32(0x808080f0));
 
-inline size_t convert_valid(const char32_t *buf, size_t len,
-                            char *latin1_output) {
-  const uint32_t *data = reinterpret_cast<const uint32_t *>(buf);
-  char *start = latin1_output;
-  uint32_t utf32_char;
-  size_t pos = 0;
+      lo = _mm512_mask_slli_epi32(lo, __mmask16(hisurr), lo, 10);
+      hi = _mm512_mask_slli_epi32(hi, __mmask16(hisurrhi), hi, 10);
+      los = _mm512_add_epi32(los, _mm512_set1_epi32(0xfca02400));
+      his = _mm512_add_epi32(his, _mm512_set1_epi32(0xfca02400));
+      lo = _mm512_mask_add_epi32(lo, __mmask16(hisurr), lo, los);
+      hi = _mm512_mask_add_epi32(hi, __mmask16(hisurrhi), hi, his);
 
-  while (pos < len) {
-    utf32_char = (uint32_t)data[pos];
+      carryout = _cvtu32_mask32(_kshiftri_mask32(hisurr, 30));
 
-    if (pos + 2 <=
-        len) { // if it is safe to read 8 more bytes, check that they are Latin1
-      uint64_t v;
-      ::memcpy(&v, data + pos, sizeof(uint64_t));
-      if ((v & 0xFFFFFF00FFFFFF00) == 0) {
-        *latin1_output++ = char(buf[pos]);
-        *latin1_output++ = char(buf[pos + 1]);
-        pos += 2;
-        continue;
-      } else {
-        // output can not be represented in latin1
-        return 0;
+      const uint32_t h = _cvtmask32_u32(hisurr);
+      const uint32_t l = _cvtmask32_u32(losurr);
+      // check for mismatched surrogates
+      if ((h + h + carry) ^ l) {
+        const uint32_t lonohi = l & ~(h + h + carry);
+        const uint32_t hinolo = h & ~(l >> 1);
+        inlen = _tzcnt_u32(hinolo | lonohi);
+        inmask = __mmask32(0x7fffffff & ((1U << inlen) - 1));
+        in = _mm512_maskz_mov_epi16(inmask, in);
+        adjust = (int)inlen - 31;
+        inlen = 0;
+        goto failiteration;
       }
     }
-    if ((utf32_char & 0xFFFFFF00) == 0) {
-      *latin1_output++ = char(utf32_char);
-    } else {
-      // output can not be represented in latin1
-      return 0;
-    }
-    pos++;
-  }
-  return latin1_output - start;
-}
 
-} // namespace utf32_to_latin1
-} // unnamed namespace
-} // namespace scalar
-} // namespace simdutf
+    hi = _mm512_maskz_mov_epi32(_cvtu32_mask16(0x7fff), hi);
+    carry = carryout;
 
-#endif
-/* end file src/scalar/utf32_to_latin1/valid_utf32_to_latin1.h */
+    __m512i mslo =
+        _mm512_multishift_epi64_epi8(_mm512_set1_epi64(0x20262c3200060c12), lo);
 
-SIMDUTF_PUSH_DISABLE_WARNINGS
-SIMDUTF_DISABLE_UNDESIRED_WARNINGS
+    __m512i mshi =
+        _mm512_multishift_epi64_epi8(_mm512_set1_epi64(0x20262c3200060c12), hi);
 
-#if SIMDUTF_IMPLEMENTATION_ARM64
-/* begin file src/arm64/implementation.cpp */
-/* begin file src/simdutf/arm64/begin.h */
-// redefining SIMDUTF_IMPLEMENTATION to "arm64"
-// #define SIMDUTF_IMPLEMENTATION arm64
-/* end file src/simdutf/arm64/begin.h */
-namespace simdutf {
-namespace arm64 {
-namespace {
-#ifndef SIMDUTF_ARM64_H
-  #error "arm64.h must be included"
-#endif
-using namespace simd;
+    const __mmask32 outmask = __mmask32(_kandn_mask64(losurr, inmask));
+    const __mmask64 outmhi = _kshiftri_mask64(outmask, 16);
 
-simdutf_really_inline bool is_ascii(const simd8x64<uint8_t> &input) {
-  simd8<uint8_t> bits = input.reduce_or();
-  return bits.max_val() < 0b10000000u;
-}
+    const __mmask32 is1byte = __mmask32(_knot_mask64(is234byte));
+    const __mmask64 is1bhi = _kshiftri_mask64(is1byte, 16);
+    const __mmask64 is12bhi = _kshiftri_mask64(is12byte, 16);
 
-simdutf_unused simdutf_really_inline simd8<bool>
-must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2,
-                     const simd8<uint8_t> prev3) {
-  simd8<bool> is_second_byte = prev1 >= uint8_t(0b11000000u);
-  simd8<bool> is_third_byte = prev2 >= uint8_t(0b11100000u);
-  simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
-  // Use ^ instead of | for is_*_byte, because ^ is commutative, and the caller
-  // is using ^ as well. This will work fine because we only have to report
-  // errors for cases with 0-1 lead bytes. Multiple lead bytes implies 2
-  // overlapping multibyte characters, and if that happens, there is guaranteed
-  // to be at least *one* lead byte that is part of only 1 other multibyte
-  // character. The error will be detected there.
-  return is_second_byte ^ is_third_byte ^ is_fourth_byte;
-}
+    taglo = _mm512_mask_mov_epi32(taglo, __mmask16(is12byte),
+                                  _mm512_set1_epi32(0x80c00000));
+    taghi = _mm512_mask_mov_epi32(taghi, __mmask16(is12bhi),
+                                  _mm512_set1_epi32(0x80c00000));
+    __m512i magiclo = _mm512_mask_blend_epi32(__mmask16(outmask),
+                                              _mm512_set1_epi32(0xffffffff),
+                                              _mm512_set1_epi32(0x00010101));
+    __m512i magichi = _mm512_mask_blend_epi32(__mmask16(outmhi),
+                                              _mm512_set1_epi32(0xffffffff),
+                                              _mm512_set1_epi32(0x00010101));
 
-simdutf_really_inline simd8<bool>
-must_be_2_3_continuation(const simd8<uint8_t> prev2,
-                         const simd8<uint8_t> prev3) {
-  simd8<bool> is_third_byte = prev2 >= uint8_t(0b11100000u);
-  simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
-  return is_third_byte ^ is_fourth_byte;
-}
+    magiclo = _mm512_mask_blend_epi32(__mmask16(outmask),
+                                      _mm512_set1_epi32(0xffffffff),
+                                      _mm512_set1_epi32(0x00010101));
+    magichi = _mm512_mask_blend_epi32(__mmask16(outmhi),
+                                      _mm512_set1_epi32(0xffffffff),
+                                      _mm512_set1_epi32(0x00010101));
 
-// common functions for utf8 conversions
-simdutf_really_inline uint16x4_t convert_utf8_3_byte_to_utf16(uint8x16_t in) {
-  // Low half contains  10cccccc|1110aaaa
-  // High half contains 10bbbbbb|10bbbbbb
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  const uint8x16_t sh = simdutf_make_uint8x16_t(0, 2, 3, 5, 6, 8, 9, 11, 1, 1,
-                                                4, 4, 7, 7, 10, 10);
-#else
-  const uint8x16_t sh = {0, 2, 3, 5, 6, 8, 9, 11, 1, 1, 4, 4, 7, 7, 10, 10};
-#endif
-  uint8x16_t perm = vqtbl1q_u8(in, sh);
-  // Split into half vectors.
-  // 10cccccc|1110aaaa
-  uint8x8_t perm_low = vget_low_u8(perm); // no-op
-  // 10bbbbbb|10bbbbbb
-  uint8x8_t perm_high = vget_high_u8(perm);
-  // xxxxxxxx 10bbbbbb
-  uint16x4_t mid = vreinterpret_u16_u8(perm_high); // no-op
-  // xxxxxxxx 1110aaaa
-  uint16x4_t high = vreinterpret_u16_u8(perm_low); // no-op
-  // Assemble with shift left insert.
-  // xxxxxxaa aabbbbbb
-  uint16x4_t mid_high = vsli_n_u16(mid, high, 6);
-  // (perm_low << 8) | (perm_low >> 8)
-  // xxxxxxxx 10cccccc
-  uint16x4_t low = vreinterpret_u16_u8(vrev16_u8(perm_low));
-  // Shift left insert into the low bits
-  // aaaabbbb bbcccccc
-  uint16x4_t composed = vsli_n_u16(low, mid_high, 6);
-  return composed;
-}
+    mslo = _mm512_ternarylogic_epi32(mslo, _mm512_set1_epi32(0x3f3f3f3f), taglo,
+                                     0xea); // A&B|C
+    mshi = _mm512_ternarylogic_epi32(mshi, _mm512_set1_epi32(0x3f3f3f3f), taghi,
+                                     0xea);
+    mslo = _mm512_mask_slli_epi32(mslo, __mmask16(is1byte), lo, 24);
 
-simdutf_really_inline uint16x8_t convert_utf8_2_byte_to_utf16(uint8x16_t in) {
-  // Converts 6 2 byte UTF-8 characters to 6 UTF-16 characters.
-  // Technically this calculates 8, but 6 does better and happens more often
-  // (The languages which use these codepoints use ASCII spaces so 8 would need
-  // to be in the middle of a very long word).
+    mshi = _mm512_mask_slli_epi32(mshi, __mmask16(is1bhi), hi, 24);
 
-  // 10bbbbbb 110aaaaa
-  uint16x8_t upper = vreinterpretq_u16_u8(in);
-  // (in << 8) | (in >> 8)
-  // 110aaaaa 10bbbbbb
-  uint16x8_t lower = vreinterpretq_u16_u8(vrev16q_u8(in));
-  // 00000000 000aaaaa
-  uint16x8_t upper_masked = vandq_u16(upper, vmovq_n_u16(0x1F));
-  // Assemble with shift left insert.
-  // 00000aaa aabbbbbb
-  uint16x8_t composed = vsliq_n_u16(lower, upper_masked, 6);
-  return composed;
-}
+    const __mmask64 wantlo =
+        _mm512_cmp_epu8_mask(mslo, magiclo, _MM_CMPINT_NLT);
+    const __mmask64 wanthi =
+        _mm512_cmp_epu8_mask(mshi, magichi, _MM_CMPINT_NLT);
+    const __m512i outlo = _mm512_maskz_compress_epi8(wantlo, mslo);
+    const __m512i outhi = _mm512_maskz_compress_epi8(wanthi, mshi);
+    const uint64_t wantlo_uint64 = _cvtmask64_u64(wantlo);
+    const uint64_t wanthi_uint64 = _cvtmask64_u64(wanthi);
 
-simdutf_really_inline uint16x8_t
-convert_utf8_1_to_2_byte_to_utf16(uint8x16_t in, size_t shufutf8_idx) {
-  // Converts 6 1-2 byte UTF-8 characters to 6 UTF-16 characters.
-  // This is a relatively easy scenario
-  // we process SIX (6) input code-code units. The max length in bytes of six
-  // code code units spanning between 1 and 2 bytes each is 12 bytes.
-  uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
-      simdutf::tables::utf8_to_utf16::shufutf8[shufutf8_idx]));
-  // Shuffle
-  // 1 byte: 00000000 0bbbbbbb
-  // 2 byte: 110aaaaa 10bbbbbb
-  uint16x8_t perm = vreinterpretq_u16_u8(vqtbl1q_u8(in, sh));
-  // Mask
-  // 1 byte: 00000000 0bbbbbbb
-  // 2 byte: 00000000 00bbbbbb
-  uint16x8_t ascii = vandq_u16(perm, vmovq_n_u16(0x7f)); // 6 or 7 bits
-  // 1 byte: 00000000 00000000
-  // 2 byte: 000aaaaa 00000000
-  uint16x8_t highbyte = vandq_u16(perm, vmovq_n_u16(0x1f00)); // 5 bits
-  // Combine with a shift right accumulate
-  // 1 byte: 00000000 0bbbbbbb
-  // 2 byte: 00000aaa aabbbbbb
-  uint16x8_t composed = vsraq_n_u16(ascii, highbyte, 2);
-  return composed;
+    uint64_t advlo = _mm_popcnt_u64(wantlo_uint64);
+    uint64_t advhi = _mm_popcnt_u64(wanthi_uint64);
+
+    _mm512_mask_storeu_epi8(
+        outbuf, _cvtu64_mask64(_pext_u64(wantlo_uint64, wantlo_uint64)), outlo);
+    _mm512_mask_storeu_epi8(
+        outbuf + advlo, _cvtu64_mask64(_pext_u64(wanthi_uint64, wanthi_uint64)),
+        outhi);
+    outbuf += advlo + advhi;
+  }
+  outbuf += -adjust;
+
+tail:
+  if (inlen != 0) {
+    // We must have inlen < 31.
+    inmask = _cvtu32_mask32((1U << inlen) - 1);
+    in = _mm512_maskz_loadu_epi16(inmask, inbuf);
+    if (big_endian) {
+      in = _mm512_shuffle_epi8(in, byteflip);
+    }
+    adjust = (int)inlen - 31;
+    inlen = 0;
+    goto lastiteration;
+  }
+  *outlen = (outbuf - outbuf_orig) + adjust;
+  return ((inbuf - inbuf_orig) + adjust);
 }
+/* end file src/icelake/icelake_convert_utf16_to_utf8.inl.cpp */
+/* begin file src/icelake/icelake_convert_utf16_to_utf32.inl.cpp */
+// file included directly
 
-/* begin file src/arm64/arm_validate_utf16.cpp */
+/*
+  Returns a tuple: the first unprocessed code unit from buf, utf32_output, and
+  a validity flag. A scalar routine should carry on the conversion of the tail.
+*/
 template <endianness big_endian>
-const char16_t *arm_validate_utf16(const char16_t *input, size_t size) {
-  const char16_t *end = input + size;
-  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
-  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
-  const auto v_fc = simd8<uint8_t>::splat(0xfc);
-  const auto v_dc = simd8<uint8_t>::splat(0xdc);
-  while (end - input >= 16) {
-    // 0. Load data: since the validation takes into account only higher
-    //    byte of each word, we compress the two vectors into one which
-    //    consists only the higher bytes.
-    auto in0 = simd16<uint16_t>(input);
-    auto in1 =
-        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
-    if (!match_system(big_endian)) {
-      in0 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in0)));
-      in1 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in1)));
+std::tuple<const char16_t *, char32_t *, bool>
+convert_utf16_to_utf32(const char16_t *buf, size_t len,
+                       char32_t *utf32_output) {
+  const char16_t *end = buf + len;
+  const __m512i v_fc00 = _mm512_set1_epi16((uint16_t)0xfc00);
+  const __m512i v_d800 = _mm512_set1_epi16((uint16_t)0xd800);
+  const __m512i v_dc00 = _mm512_set1_epi16((uint16_t)0xdc00);
+  __mmask32 carry{0};
+  const __m512i byteflip = _mm512_setr_epi64(
+      0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
+      0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
+      0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  while (std::distance(buf, end) >= 32) {
+    // Always safe because buf + 32 <= end, so end - buf >= 32 code units:
+    __m512i in = _mm512_loadu_si512((__m512i *)buf);
+    if (big_endian) {
+      in = _mm512_shuffle_epi8(in, byteflip);
     }
-    const auto t0 = in0.shr<8>();
-    const auto t1 = in1.shr<8>();
-    const simd8<uint8_t> in = simd16<uint16_t>::pack(t0, t1);
-    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
-    const uint64_t surrogates_wordmask = ((in & v_f8) == v_d8).to_bitmask64();
-    if (surrogates_wordmask == 0) {
-      input += 16;
-    } else {
-      // 2. We have some surrogates that have to be distinguished:
-      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
-      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
-      //
-      //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
 
-      // V - non-surrogate code units
-      //     V = not surrogates_wordmask
-      const uint64_t V = ~surrogates_wordmask;
+    // H - bitmask for high surrogates
+    const __mmask32 H =
+        _mm512_cmpeq_epi16_mask(_mm512_and_si512(in, v_fc00), v_d800);
+    // L - bitmask for low surrogates
+    const __mmask32 L =
+        _mm512_cmpeq_epi16_mask(_mm512_and_si512(in, v_fc00), v_dc00);
 
-      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
-      const auto vH = ((in & v_fc) == v_dc);
-      const uint64_t H = vH.to_bitmask64();
+    if ((H | L)) {
+      // surrogate pair(s) in a register
+      const __mmask32 V =
+          (L ^
+           (carry | (H << 1))); // A high surrogate must be followed by low one
+                                // and a low one must be preceded by a high one.
+                                // If valid, V should be equal to 0
 
-      // L - word mask for low surrogates
-      //     L = not H and surrogates_wordmask
-      const uint64_t L = ~H & surrogates_wordmask;
+      if (V == 0) {
+        // valid case
+        /*
+            Input surrogate pair:
+            |1101.11aa.aaaa.aaaa|1101.10bb.bbbb.bbbb|
+                low surrogate      high surrogate
+        */
+        /*  1. Expand all code units to 32-bit code units
+            in
+           |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0000.0000.0000.1101.10bb.bbbb.bbbb|
+        */
+        const __m512i first = _mm512_cvtepu16_epi32(_mm512_castsi512_si256(in));
+        const __m512i second =
+            _mm512_cvtepu16_epi32(_mm512_extracti32x8_epi32(in, 1));
 
-      const uint64_t a =
-          L & (H >> 4); // A low surrogate must be followed by high one.
-                        // (A low surrogate placed in the 7th register's word
-                        // is an exception we handle.)
-      const uint64_t b =
-          a << 4; // Just mark that the opposite fact is hold,
-                  // thanks to that we have only two masks for valid case.
-      const uint64_t c = V | a | b; // Combine all the masks into the final one.
-      if (c == ~0ull) {
-        // The whole input register contains valid UTF-16, i.e.,
-        // either single code units or proper surrogate pairs.
-        input += 16;
-      } else if (c == 0xfffffffffffffffull) {
-        // The 15 lower code units of the input register contains valid UTF-16.
-        // The 15th word may be either a low or high surrogate. It the next
-        // iteration we 1) check if the low surrogate is followed by a high
-        // one, 2) reject sole high surrogate.
-        input += 15;
+        /*  2. Shift by one 16-bit word to align low surrogates with high
+           surrogates in
+           |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0000.0000.0000.1101.10bb.bbbb.bbbb|
+            shifted
+           |????.????.????.????.????.????.????.????|0000.0000.0000.0000.1101.11aa.aaaa.aaaa|
+        */
+        const __m512i shifted_first = _mm512_alignr_epi32(second, first, 1);
+        const __m512i shifted_second =
+            _mm512_alignr_epi32(_mm512_setzero_si512(), second, 1);
+
+        /*  3. Align all high surrogates in first and second by shifting to the
+           left by 10 bits
+            |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0011.0110.bbbb.bbbb.bb00.0000.0000|
+        */
+        const __m512i aligned_first =
+            _mm512_mask_slli_epi32(first, (__mmask16)H, first, 10);
+        const __m512i aligned_second =
+            _mm512_mask_slli_epi32(second, (__mmask16)(H >> 16), second, 10);
+
+        /*  4. Remove surrogate prefixes and add offset 0x10000 by adding in,
+           shifted and constant in
+           |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0011.0110.bbbb.bbbb.bb00.0000.0000|
+            shifted
+           |????.????.????.????.????.????.????.????|0000.0000.0000.0000.1101.11aa.aaaa.aaaa|
+            constant|1111.1100.1010.0000.0010.0100.0000.0000|1111.1100.1010.0000.0010.0100.0000.0000|
+        */
+        const __m512i constant = _mm512_set1_epi32((uint32_t)0xfca02400);
+        const __m512i added_first = _mm512_mask_add_epi32(
+            aligned_first, (__mmask16)H, aligned_first, shifted_first);
+        const __m512i utf32_first = _mm512_mask_add_epi32(
+            added_first, (__mmask16)H, added_first, constant);
+
+        const __m512i added_second =
+            _mm512_mask_add_epi32(aligned_second, (__mmask16)(H >> 16),
+                                  aligned_second, shifted_second);
+        const __m512i utf32_second = _mm512_mask_add_epi32(
+            added_second, (__mmask16)(H >> 16), added_second, constant);
+
+        //  5. Store all valid UTF-32 code units (low surrogate positions and
+        //  32nd word are invalid)
+        const __mmask32 valid = ~L & 0x7fffffff;
+        // We deliberately do a _mm512_maskz_compress_epi32 followed by
+        // storeu_epi32 to ease performance portability to Zen 4.
+        const __m512i compressed_first =
+            _mm512_maskz_compress_epi32((__mmask16)(valid), utf32_first);
+        const size_t howmany1 = count_ones((uint16_t)(valid));
+        _mm512_storeu_si512((__m512i *)utf32_output, compressed_first);
+        utf32_output += howmany1;
+        const __m512i compressed_second =
+            _mm512_maskz_compress_epi32((__mmask16)(valid >> 16), utf32_second);
+        const size_t howmany2 = count_ones((uint16_t)(valid >> 16));
+        // The following could be unsafe in some cases?
+        //_mm512_storeu_epi32((__m512i *) utf32_output, compressed_second);
+        _mm512_mask_storeu_epi32((__m512i *)utf32_output,
+                                 __mmask16((1 << howmany2) - 1),
+                                 compressed_second);
+        utf32_output += howmany2;
+        // Only process 31 code units, but keep track if the 31st word is a high
+        // surrogate as a carry
+        buf += 31;
+        carry = (H >> 30) & 0x1;
       } else {
-        return nullptr;
+        // invalid case
+        return std::make_tuple(buf + carry, utf32_output, false);
+      }
+    } else {
+      // no surrogates
+      // extend all thirty-two 16-bit code units to thirty-two 32-bit code units
+      _mm512_storeu_si512((__m512i *)(utf32_output),
+                          _mm512_cvtepu16_epi32(_mm512_castsi512_si256(in)));
+      _mm512_storeu_si512(
+          (__m512i *)(utf32_output) + 1,
+          _mm512_cvtepu16_epi32(_mm512_extracti32x8_epi32(in, 1)));
+      utf32_output += 32;
+      buf += 32;
+      carry = 0;
+    }
+  } // while
+  return std::make_tuple(buf + carry, utf32_output, true);
+}
+/* end file src/icelake/icelake_convert_utf16_to_utf32.inl.cpp */
+/* begin file src/icelake/icelake_convert_utf32_to_latin1.inl.cpp */
+// file included directly
+size_t icelake_convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                                       char *latin1_output) {
+  const char32_t *end = buf + len;
+  __m512i v_0xFF = _mm512_set1_epi32(0xff);
+  __m512i shufmask = _mm512_set_epi8(
+      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60,
+      56, 52, 48, 44, 40, 36, 32, 28, 24, 20, 16, 12, 8, 4, 0);
+  while (end - buf >= 16) {
+    __m512i in = _mm512_loadu_si512((__m512i *)buf);
+    if (_mm512_cmpgt_epu32_mask(in, v_0xFF)) {
+      return 0;
+    }
+    _mm_storeu_si128(
+        (__m128i *)latin1_output,
+        _mm512_castsi512_si128(_mm512_permutexvar_epi8(shufmask, in)));
+    latin1_output += 16;
+    buf += 16;
+  }
+  if (buf < end) {
+    uint16_t mask = uint16_t((1 << (end - buf)) - 1);
+    __m512i in = _mm512_maskz_loadu_epi32(mask, buf);
+    if (_mm512_cmpgt_epu32_mask(in, v_0xFF)) {
+      return 0;
+    }
+    _mm_mask_storeu_epi8(
+        latin1_output, mask,
+        _mm512_castsi512_si128(_mm512_permutexvar_epi8(shufmask, in)));
+  }
+  return len;
+}
+
+std::pair<result, char *>
+icelake_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                            char *latin1_output) {
+  const char32_t *end = buf + len;
+  const char32_t *start = buf;
+  __m512i v_0xFF = _mm512_set1_epi32(0xff);
+  __m512i shufmask = _mm512_set_epi8(
+      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60,
+      56, 52, 48, 44, 40, 36, 32, 28, 24, 20, 16, 12, 8, 4, 0);
+  while (end - buf >= 16) {
+    __m512i in = _mm512_loadu_si512((__m512i *)buf);
+    if (_mm512_cmpgt_epu32_mask(in, v_0xFF)) {
+      while (uint32_t(*buf) <= 0xff) {
+        *latin1_output++ = uint8_t(*buf++);
       }
+      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
+                            latin1_output);
     }
+    _mm_storeu_si128(
+        (__m128i *)latin1_output,
+        _mm512_castsi512_si128(_mm512_permutexvar_epi8(shufmask, in)));
+    latin1_output += 16;
+    buf += 16;
   }
-  return input;
+  if (buf < end) {
+    uint16_t mask = uint16_t((1 << (end - buf)) - 1);
+    __m512i in = _mm512_maskz_loadu_epi32(mask, buf);
+    if (_mm512_cmpgt_epu32_mask(in, v_0xFF)) {
+      while (uint32_t(*buf) <= 0xff) {
+        *latin1_output++ = uint8_t(*buf++);
+      }
+      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
+                            latin1_output);
+    }
+    _mm_mask_storeu_epi8(
+        latin1_output, mask,
+        _mm512_castsi512_si128(_mm512_permutexvar_epi8(shufmask, in)));
+  }
+  return std::make_pair(result(error_code::SUCCESS, len), latin1_output);
 }
+/* end file src/icelake/icelake_convert_utf32_to_latin1.inl.cpp */
+/* begin file src/icelake/icelake_convert_utf32_to_utf8.inl.cpp */
+// file included directly
 
-template <endianness big_endian>
-const result arm_validate_utf16_with_errors(const char16_t *input,
-                                            size_t size) {
-  const char16_t *start = input;
-  const char16_t *end = input + size;
+// Todo: currently, this is just the haswell code, optimize for icelake kernel.
+std::pair<const char32_t *, char *>
+avx512_convert_utf32_to_utf8(const char32_t *buf, size_t len,
+                             char *utf8_output) {
+  const char32_t *end = buf + len;
+  const __m256i v_0000 = _mm256_setzero_si256();
+  const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
+  const __m256i v_ff80 = _mm256_set1_epi16((uint16_t)0xff80);
+  const __m256i v_f800 = _mm256_set1_epi16((uint16_t)0xf800);
+  const __m256i v_c080 = _mm256_set1_epi16((uint16_t)0xc080);
+  const __m256i v_7fffffff = _mm256_set1_epi32((uint32_t)0x7fffffff);
+  __m256i running_max = _mm256_setzero_si256();
+  __m256i forbidden_bytemask = _mm256_setzero_si256();
 
-  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
-  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
-  const auto v_fc = simd8<uint8_t>::splat(0xfc);
-  const auto v_dc = simd8<uint8_t>::splat(0xdc);
-  while (input + 16 < end) {
-    // 0. Load data: since the validation takes into account only higher
-    //    byte of each word, we compress the two vectors into one which
-    //    consists only the higher bytes.
-    auto in0 = simd16<uint16_t>(input);
-    auto in1 =
-        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
-    if (!match_system(big_endian)) {
-      in0 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in0)));
-      in1 = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in1)));
+  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    __m256i nextin = _mm256_loadu_si256((__m256i *)buf + 1);
+    running_max = _mm256_max_epu32(_mm256_max_epu32(in, running_max), nextin);
+
+    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
+    // saturation
+    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff),
+                                        _mm256_and_si256(nextin, v_7fffffff));
+    in_16 = _mm256_permute4x64_epi64(in_16, 0b11011000);
+
+    // Try to apply UTF-16 => UTF-8 routine on 256 bits
+    // (haswell/avx2_convert_utf16_to_utf8.cpp)
+
+    if (_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
+      // 1. pack the bytes
+      const __m128i utf8_packed = _mm_packus_epi16(
+          _mm256_castsi256_si128(in_16), _mm256_extractf128_si256(in_16, 1));
+      // 2. store (16 bytes)
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+      // 3. adjust pointers
+      buf += 16;
+      utf8_output += 16;
+      continue; // we are done for this round!
     }
-    const auto t0 = in0.shr<8>();
-    const auto t1 = in1.shr<8>();
-    const simd8<uint8_t> in = simd16<uint16_t>::pack(t0, t1);
-    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
-    const uint64_t surrogates_wordmask = ((in & v_f8) == v_d8).to_bitmask64();
-    if (surrogates_wordmask == 0) {
-      input += 16;
-    } else {
-      // 2. We have some surrogates that have to be distinguished:
-      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
-      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
-      //
-      //    Fact: high surrogate has 11th bit set (3rd bit in the higher word)
+    // no bits set above 7th bit
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
 
-      // V - non-surrogate code units
-      //     V = not surrogates_wordmask
-      const uint64_t V = ~surrogates_wordmask;
+    // no bits set above 11th bit
+    const __m256i one_or_two_bytes_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
+    const uint32_t one_or_two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
+    if (one_or_two_bytes_bitmask == 0xffffffff) {
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+      // expected output   : [110a|aaaa|10bb|bbbb] x 8
+      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
+      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
 
-      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
-      const auto vH = ((in & v_fc) == v_dc);
-      const uint64_t H = vH.to_bitmask64();
+      // t0 = [000a|aaaa|bbbb|bb00]
+      const __m256i t0 = _mm256_slli_epi16(in_16, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
+      // t2 = [0000|0000|00bb|bbbb]
+      const __m256i t2 = _mm256_and_si256(in_16, v_003f);
+      // t3 = [000a|aaaa|00bb|bbbb]
+      const __m256i t3 = _mm256_or_si256(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      const __m256i t4 = _mm256_or_si256(t3, v_c080);
 
-      // L - word mask for low surrogates
-      //     L = not H and surrogates_wordmask
-      const uint64_t L = ~H & surrogates_wordmask;
+      // 2. merge ASCII and 2-byte codewords
+      const __m256i utf8_unpacked =
+          _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
 
-      const uint64_t a =
-          L & (H >> 4); // A low surrogate must be followed by high one.
-                        // (A low surrogate placed in the 7th register's word
-                        // is an exception we handle.)
-      const uint64_t b =
-          a << 4; // Just mark that the opposite fact is hold,
-                  // thanks to that we have only two masks for valid case.
-      const uint64_t c = V | a | b; // Combine all the masks into the final one.
-      if (c == ~0ull) {
-        // The whole input register contains valid UTF-16, i.e.,
-        // either single code units or proper surrogate pairs.
-        input += 16;
-      } else if (c == 0xfffffffffffffffull) {
-        // The 15 lower code units of the input register contains valid UTF-16.
-        // The 15th word may be either a low or high surrogate. It the next
-        // iteration we 1) check if the low surrogate is followed by a high
-        // one, 2) reject sole high surrogate.
-        input += 15;
-      } else {
-        return result(error_code::SURROGATE, input - start);
-      }
-    }
-  }
-  return result(error_code::SUCCESS, input - start);
-}
-/* end file src/arm64/arm_validate_utf16.cpp */
-/* begin file src/arm64/arm_validate_utf32le.cpp */
+      // 3. prepare bitmask for 8-bit lookup
+      const uint32_t M0 = one_byte_bitmask & 0x55555555;
+      const uint32_t M1 = M0 >> 7;
+      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
+      // 4. pack the bytes
 
-const char32_t *arm_validate_utf32le(const char32_t *input, size_t size) {
-  const char32_t *end = input + size;
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
+      const uint8_t *row_2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
+                                                                       16)][0];
 
-  const uint32x4_t standardmax = vmovq_n_u32(0x10ffff);
-  const uint32x4_t offset = vmovq_n_u32(0xffff2000);
-  const uint32x4_t standardoffsetmax = vmovq_n_u32(0xfffff7ff);
-  uint32x4_t currentmax = vmovq_n_u32(0x0);
-  uint32x4_t currentoffsetmax = vmovq_n_u32(0x0);
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
 
-  while (end - input >= 4) {
-    const uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(input));
-    currentmax = vmaxq_u32(in, currentmax);
-    currentoffsetmax = vmaxq_u32(vaddq_u32(in, offset), currentoffsetmax);
-    input += 4;
-  }
+      const __m256i utf8_packed = _mm256_shuffle_epi8(
+          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
+      // 5. store bytes
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_castsi256_si128(utf8_packed));
+      utf8_output += row[0];
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_extractf128_si256(utf8_packed, 1));
+      utf8_output += row_2[0];
 
-  uint32x4_t is_zero =
-      veorq_u32(vmaxq_u32(currentmax, standardmax), standardmax);
-  if (vmaxvq_u32(is_zero) != 0) {
-    return nullptr;
-  }
+      // 6. adjust pointers
+      buf += 16;
+      continue;
+    }
+    // Must check for overflow in packing
+    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(
+        _mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+    if (saturation_bitmask == 0xffffffff) {
+      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+      const __m256i v_d800 = _mm256_set1_epi16((uint16_t)0xd800);
+      forbidden_bytemask = _mm256_or_si256(
+          forbidden_bytemask,
+          _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800));
 
-  is_zero = veorq_u32(vmaxq_u32(currentoffsetmax, standardoffsetmax),
-                      standardoffsetmax);
-  if (vmaxvq_u32(is_zero) != 0) {
-    return nullptr;
-  }
+      const __m256i dup_even = _mm256_setr_epi16(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
-  return input;
-}
+      /* In this branch we handle three cases:
+        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+        single UTF-8 byte
+        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
+        UTF-8 bytes
+        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+        three UTF-8 bytes
 
-const result arm_validate_utf32le_with_errors(const char32_t *input,
-                                              size_t size) {
-  const char32_t *start = input;
-  const char32_t *end = input + size;
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
 
-  const uint32x4_t standardmax = vmovq_n_u32(0x10ffff);
-  const uint32x4_t offset = vmovq_n_u32(0xffff2000);
-  const uint32x4_t standardoffsetmax = vmovq_n_u32(0xfffff7ff);
-  uint32x4_t currentmax = vmovq_n_u32(0x0);
-  uint32x4_t currentoffsetmax = vmovq_n_u32(0x0);
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
 
-  while (end - input >= 4) {
-    const uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(input));
-    currentmax = vmaxq_u32(in, currentmax);
-    currentoffsetmax = vmaxq_u32(vaddq_u32(in, offset), currentoffsetmax);
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
 
-    uint32x4_t is_zero =
-        veorq_u32(vmaxq_u32(currentmax, standardmax), standardmax);
-    if (vmaxvq_u32(is_zero) != 0) {
-      return result(error_code::TOO_LARGE, input - start);
-    }
+        Finally from these two code units we build proper UTF-8 sequence, taking
+        into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
+#define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const __m256i t0 = _mm256_shuffle_epi8(in_16, dup_even);
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
 
-    is_zero = veorq_u32(vmaxq_u32(currentoffsetmax, standardoffsetmax),
-                        standardoffsetmax);
-    if (vmaxvq_u32(is_zero) != 0) {
-      return result(error_code::SURROGATE, input - start);
-    }
+      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
+      const __m256i s0 = _mm256_srli_epi16(in_16, 4);
+      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
+      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
+      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
+      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
+      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
+      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
+                                             simdutf_vec(0b0100000000000000));
+      const __m256i s4 = _mm256_xor_si256(s3, m0);
+#undef simdutf_vec
 
-    input += 4;
-  }
+      // 4. expand code units 16-bit => 32-bit
+      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
+      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
 
-  return result(error_code::SUCCESS, input - start);
-}
-/* end file src/arm64/arm_validate_utf32le.cpp */
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
+                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
+      // Due to the wider registers, the following path is less likely to be
+      // useful.
+      /*if(mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const __m256i shuffle =
+      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
+      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
+      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
+      _mm256_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
+        continue;
+      }*/
+      const uint8_t mask0 = uint8_t(mask);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
 
-/* begin file src/arm64/arm_convert_latin1_to_utf16.cpp */
-template <endianness big_endian>
-std::pair<const char *, char16_t *>
-arm_convert_latin1_to_utf16(const char *buf, size_t len,
-                            char16_t *utf16_output) {
-  const char *end = buf + len;
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
 
-  while (end - buf >= 16) {
-    uint8x16_t in8 = vld1q_u8(reinterpret_cast<const uint8_t *>(buf));
-    uint16x8_t inlow = vmovl_u8(vget_low_u8(in8));
-    if (!match_system(big_endian)) {
-      inlow = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(inlow)));
-    }
-    vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output), inlow);
-    uint16x8_t inhigh = vmovl_u8(vget_high_u8(in8));
-    if (!match_system(big_endian)) {
-      inhigh = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(inhigh)));
+      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
+      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
+      const __m128i utf8_2 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
+
+      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
+      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
+      const __m128i utf8_3 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
+
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
+      utf8_output += row0[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+      utf8_output += row1[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
+      utf8_output += row2[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
+      utf8_output += row3[0];
+      buf += 16;
+    } else {
+      // case: at least one 32-bit word is larger than 0xFFFF <=> it will
+      // produce four UTF-8 bytes. Let us do a scalar fallback. It may seem
+      // wasteful to use scalar code, but being efficient with SIMD may require
+      // large, non-trivial tables?
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFFFF80) == 0) { // 1-byte (ASCII)
+          *utf8_output++ = char(word);
+        } else if ((word & 0xFFFFF800) == 0) { // 2-byte
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) { // 3-byte
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr, utf8_output);
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else { // 4-byte
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr, utf8_output);
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        }
+      }
+      buf += k;
     }
-    vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output + 8), inhigh);
-    utf16_output += 16;
-    buf += 16;
+  } // while
+
+  // check for invalid input
+  const __m256i v_10ffff = _mm256_set1_epi32((uint32_t)0x10ffff);
+  if (static_cast<uint32_t>(_mm256_movemask_epi8(_mm256_cmpeq_epi32(
+          _mm256_max_epu32(running_max, v_10ffff), v_10ffff))) != 0xffffffff) {
+    return std::make_pair(nullptr, utf8_output);
   }
 
-  return std::make_pair(buf, utf16_output);
+  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) {
+    return std::make_pair(nullptr, utf8_output);
+  }
+
+  return std::make_pair(buf, utf8_output);
 }
-/* end file src/arm64/arm_convert_latin1_to_utf16.cpp */
-/* begin file src/arm64/arm_convert_latin1_to_utf32.cpp */
-std::pair<const char *, char32_t *>
-arm_convert_latin1_to_utf32(const char *buf, size_t len,
-                            char32_t *utf32_output) {
-  const char *end = buf + len;
 
-  while (end - buf >= 16) {
-    uint8x16_t in8 = vld1q_u8(reinterpret_cast<const uint8_t *>(buf));
-    uint16x8_t in8low = vmovl_u8(vget_low_u8(in8));
-    uint32x4_t in16lowlow = vmovl_u16(vget_low_u16(in8low));
-    uint32x4_t in16lowhigh = vmovl_u16(vget_high_u16(in8low));
-    uint16x8_t in8high = vmovl_u8(vget_high_u8(in8));
-    uint32x4_t in8highlow = vmovl_u16(vget_low_u16(in8high));
-    uint32x4_t in8highhigh = vmovl_u16(vget_high_u16(in8high));
-    vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output), in16lowlow);
-    vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output + 4), in16lowhigh);
-    vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output + 8), in8highlow);
-    vst1q_u32(reinterpret_cast<uint32_t *>(utf32_output + 12), in8highhigh);
+// TODO: currently this is just the haswell code; optimize for the icelake kernel.
+std::pair<result, char *>
+avx512_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
+                                         char *utf8_output) {
+  const char32_t *end = buf + len;
+  const char32_t *start = buf;
 
-    utf32_output += 16;
-    buf += 16;
-  }
+  const __m256i v_0000 = _mm256_setzero_si256();
+  const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
+  const __m256i v_ff80 = _mm256_set1_epi16((uint16_t)0xff80);
+  const __m256i v_f800 = _mm256_set1_epi16((uint16_t)0xf800);
+  const __m256i v_c080 = _mm256_set1_epi16((uint16_t)0xc080);
+  const __m256i v_7fffffff = _mm256_set1_epi32((uint32_t)0x7fffffff);
+  const __m256i v_10ffff = _mm256_set1_epi32((uint32_t)0x10ffff);
 
-  return std::make_pair(buf, utf32_output);
-}
-/* end file src/arm64/arm_convert_latin1_to_utf32.cpp */
-/* begin file src/arm64/arm_convert_latin1_to_utf8.cpp */
-/*
-  Returns a pair: the first unprocessed byte from buf and utf8_output
-  A scalar routing should carry on the conversion of the tail.
-*/
-std::pair<const char *, char *>
-arm_convert_latin1_to_utf8(const char *latin1_input, size_t len,
-                           char *utf8_out) {
-  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
-  const char *end = latin1_input + len;
-  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
-  // We always write 16 bytes, of which more than the first 8 bytes
-  // are valid. A safety margin of 8 is more than sufficient.
-  while (end - latin1_input >= 16 + 8) {
-    uint8x16_t in8 = vld1q_u8(reinterpret_cast<const uint8_t *>(latin1_input));
-    if (vmaxvq_u8(in8) <= 0x7F) { // ASCII fast path!!!!
-      vst1q_u8(utf8_output, in8);
-      utf8_output += 16;
-      latin1_input += 16;
-      continue;
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+
+  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    __m256i nextin = _mm256_loadu_si256((__m256i *)buf + 1);
+    // Check for too large input
+    const __m256i max_input =
+        _mm256_max_epu32(_mm256_max_epu32(in, nextin), v_10ffff);
+    if (static_cast<uint32_t>(_mm256_movemask_epi8(
+            _mm256_cmpeq_epi32(max_input, v_10ffff))) != 0xffffffff) {
+      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
+                            utf8_output);
     }
 
-    // We just fallback on UTF-16 code. This could be optimized/simplified
-    // further.
-    uint16x8_t in16 = vmovl_u8(vget_low_u8(in8));
-    // 1. prepare 2-byte values
-    // input 8-bit word : [aabb|bbbb] x 8
-    // expected output   : [1100|00aa|10bb|bbbb] x 8
-    const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
-    const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
+    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
+    // saturation
+    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff),
+                                        _mm256_and_si256(nextin, v_7fffffff));
+    in_16 = _mm256_permute4x64_epi64(in_16, 0b11011000);
 
-    // t0 = [0000|00aa|bbbb|bb00]
-    const uint16x8_t t0 = vshlq_n_u16(in16, 2);
-    // t1 = [0000|00aa|0000|0000]
-    const uint16x8_t t1 = vandq_u16(t0, v_1f00);
-    // t2 = [0000|0000|00bb|bbbb]
-    const uint16x8_t t2 = vandq_u16(in16, v_003f);
-    // t3 = [0000|00aa|00bb|bbbb]
-    const uint16x8_t t3 = vorrq_u16(t1, t2);
-    // t4 = [1100|00aa|10bb|bbbb]
-    const uint16x8_t t4 = vorrq_u16(t3, v_c080);
-    // 2. merge ASCII and 2-byte codewords
-    const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-    const uint16x8_t one_byte_bytemask = vcleq_u16(in16, v_007f);
-    const uint8x16_t utf8_unpacked =
-        vreinterpretq_u8_u16(vbslq_u16(one_byte_bytemask, in16, t4));
-    // 3. prepare bitmask for 8-bit lookup
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-    const uint16x8_t mask = simdutf_make_uint16x8_t(
-        0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
-#else
-    const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
-                             0x0002, 0x0008, 0x0020, 0x0080};
-#endif
-    uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
-    // 4. pack the bytes
-    const uint8_t *row =
-        &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-    const uint8x16_t shuffle = vld1q_u8(row + 1);
-    const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
+    // Try to apply UTF-16 => UTF-8 routine on 256 bits
+    // (haswell/avx2_convert_utf16_to_utf8.cpp)
 
-    // 5. store bytes
-    vst1q_u8(utf8_output, utf8_packed);
-    // 6. adjust pointers
-    latin1_input += 8;
-    utf8_output += row[0];
+    if (_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
+      // 1. pack the bytes
+      const __m128i utf8_packed = _mm_packus_epi16(
+          _mm256_castsi256_si128(in_16), _mm256_extractf128_si256(in_16, 1));
+      // 2. store (16 bytes)
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+      // 3. adjust pointers
+      buf += 16;
+      utf8_output += 16;
+      continue; // we are done for this round!
+    }
+    // no bits set above 7th bit
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
 
-  } // while
+    // no bits set above 11th bit
+    const __m256i one_or_two_bytes_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
+    const uint32_t one_or_two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
+    if (one_or_two_bytes_bitmask == 0xffffffff) {
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 16
+      // expected output   : [110a|aaaa|10bb|bbbb] x 16
+      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
+      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
 
-  return std::make_pair(latin1_input, reinterpret_cast<char *>(utf8_output));
-}
-/* end file src/arm64/arm_convert_latin1_to_utf8.cpp */
+      // t0 = [000a|aaaa|bbbb|bb00]
+      const __m256i t0 = _mm256_slli_epi16(in_16, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
+      // t2 = [0000|0000|00bb|bbbb]
+      const __m256i t2 = _mm256_and_si256(in_16, v_003f);
+      // t3 = [000a|aaaa|00bb|bbbb]
+      const __m256i t3 = _mm256_or_si256(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      const __m256i t4 = _mm256_or_si256(t3, v_c080);
 
-/* begin file src/arm64/arm_convert_utf8_to_latin1.cpp */
-// Convert up to 16 bytes from utf8 to utf16 using a mask indicating the
-// end of the code points. Only the least significant 12 bits of the mask
-// are accessed.
-// It returns how many bytes were consumed (up to 16, usually 12).
-size_t convert_masked_utf8_to_latin1(const char *input,
-                                     uint64_t utf8_end_of_code_point_mask,
-                                     char *&latin1_output) {
-  // we use an approach where we try to process up to 12 input bytes.
-  // Why 12 input bytes and not 16? Because we are concerned with the size of
-  // the lookup tables. Also 12 is nicely divisible by two and three.
-  //
-  uint8x16_t in = vld1q_u8(reinterpret_cast<const uint8_t *>(input));
-  const uint16_t input_utf8_end_of_code_point_mask =
-      utf8_end_of_code_point_mask & 0xfff;
-  //
-  // Optimization note: our main path below is load-latency dependent. Thus it
-  // is maybe beneficial to have fast paths that depend on branch prediction but
-  // have less latency. This results in more instructions but, potentially, also
-  // higher speeds.
+      // 2. merge ASCII and 2-byte codewords
+      const __m256i utf8_unpacked =
+          _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
 
-  // We first try a few fast paths.
-  // The obvious first test is ASCII, which actually consumes the full 16.
-  if (utf8_end_of_code_point_mask == 0xfff) {
-    // We process in chunks of 12 bytes
-    vst1q_u8(reinterpret_cast<uint8_t *>(latin1_output), in);
-    latin1_output += 12; // We wrote 12 18-bit characters.
-    return 12;           // We consumed 12 bytes.
-  }
-  /// We do not have a fast path available, or the fast path is unimportant, so
-  /// we fallback.
-  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
-      [input_utf8_end_of_code_point_mask][0];
+      // 3. prepare bitmask for 8-bit lookup
+      const uint32_t M0 = one_byte_bitmask & 0x55555555;
+      const uint32_t M1 = M0 >> 7;
+      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
+      // 4. pack the bytes
 
-  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
-      [input_utf8_end_of_code_point_mask][1];
-  // this indicates an invalid input:
-  if (idx >= 64) {
-    return consumed;
-  }
-  // Here we should have (idx < 64), if not, there is a bug in the validation or
-  // elsewhere. SIX (6) input code-code units this is a relatively easy scenario
-  // we process SIX (6) input code-code units. The max length in bytes of six
-  // code code units spanning between 1 and 2 bytes each is 12 bytes. Converts 6
-  // 1-2 byte UTF-8 characters to 6 UTF-16 characters. This is a relatively easy
-  // scenario we process SIX (6) input code-code units. The max length in bytes
-  // of six code code units spanning between 1 and 2 bytes each is 12 bytes.
-  uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
-      simdutf::tables::utf8_to_utf16::shufutf8[idx]));
-  // Shuffle
-  // 1 byte: 00000000 0bbbbbbb
-  // 2 byte: 110aaaaa 10bbbbbb
-  uint16x8_t perm = vreinterpretq_u16_u8(vqtbl1q_u8(in, sh));
-  // Mask
-  // 1 byte: 00000000 0bbbbbbb
-  // 2 byte: 00000000 00bbbbbb
-  uint16x8_t ascii = vandq_u16(perm, vmovq_n_u16(0x7f)); // 6 or 7 bits
-  // 1 byte: 00000000 00000000
-  // 2 byte: 000aaaaa 00000000
-  uint16x8_t highbyte = vandq_u16(perm, vmovq_n_u16(0x1f00)); // 5 bits
-  // Combine with a shift right accumulate
-  // 1 byte: 00000000 0bbbbbbb
-  // 2 byte: 00000aaa aabbbbbb
-  uint16x8_t composed = vsraq_n_u16(ascii, highbyte, 2);
-  // writing 8 bytes even though we only care about the first 6 bytes.
-  uint8x8_t latin1_packed = vmovn_u16(composed);
-  vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
-  latin1_output += 6; // We wrote 6 bytes.
-  return consumed;
-}
-/* end file src/arm64/arm_convert_utf8_to_latin1.cpp */
-/* begin file src/arm64/arm_convert_utf8_to_utf16.cpp */
-// Convert up to 16 bytes from utf8 to utf16 using a mask indicating the
-// end of the code points. Only the least significant 12 bits of the mask
-// are accessed.
-// It returns how many bytes were consumed (up to 16, usually 12).
-template <endianness big_endian>
-size_t convert_masked_utf8_to_utf16(const char *input,
-                                    uint64_t utf8_end_of_code_point_mask,
-                                    char16_t *&utf16_output) {
-  // we use an approach where we try to process up to 12 input bytes.
-  // Why 12 input bytes and not 16? Because we are concerned with the size of
-  // the lookup tables. Also 12 is nicely divisible by two and three.
-  //
-  uint8x16_t in = vld1q_u8(reinterpret_cast<const uint8_t *>(input));
-  const uint16_t input_utf8_end_of_code_point_mask =
-      utf8_end_of_code_point_mask & 0xfff;
-  //
-  // Optimization note: our main path below is load-latency dependent. Thus it
-  // is maybe beneficial to have fast paths that depend on branch prediction but
-  // have less latency. This results in more instructions but, potentially, also
-  // higher speeds.
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
+      const uint8_t *row_2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
+                                                                       16)][0];
 
-  // We first try a few fast paths.
-  // The obvious first test is ASCII, which actually consumes the full 16.
-  if ((utf8_end_of_code_point_mask & 0xFFFF) == 0xffff) {
-    // We process in chunks of 16 bytes
-    // The routine in simd.h is reused.
-    simd8<int8_t> temp{vreinterpretq_s8_u8(in)};
-    temp.store_ascii_as_utf16<big_endian>(utf16_output);
-    utf16_output += 16; // We wrote 16 16-bit characters.
-    return 16;          // We consumed 16 bytes.
-  }
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
 
-  // 3 byte sequences are the next most common, as seen in CJK, which has long
-  // sequences of these.
-  if (input_utf8_end_of_code_point_mask == 0x924) {
-    // We want to take 4 3-byte UTF-8 code units and turn them into 4 2-byte
-    // UTF-16 code units.
-    uint16x4_t composed = convert_utf8_3_byte_to_utf16(in);
-    // Byte swap if necessary
-    if (!match_system(big_endian)) {
-      composed = vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(composed)));
-    }
-    vst1_u16(reinterpret_cast<uint16_t *>(utf16_output), composed);
-    utf16_output += 4; // We wrote 4 16-bit characters.
-    return 12;         // We consumed 12 bytes.
-  }
+      const __m256i utf8_packed = _mm256_shuffle_epi8(
+          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
+      // 5. store bytes
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_castsi256_si128(utf8_packed));
+      utf8_output += row[0];
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_extractf128_si256(utf8_packed, 1));
+      utf8_output += row_2[0];
 
-  // 2 byte sequences occur in short bursts in languages like Greek and Russian.
-  if ((utf8_end_of_code_point_mask & 0xFFF) == 0xaaa) {
-    // We want to take 6 2-byte UTF-8 code units and turn them into 6 2-byte
-    // UTF-16 code units.
-    uint16x8_t composed = convert_utf8_2_byte_to_utf16(in);
-    // Byte swap if necessary
-    if (!match_system(big_endian)) {
-      composed =
-          vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(composed)));
+      // 6. adjust pointers
+      buf += 16;
+      continue;
     }
-    vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output), composed);
+    // Must check for overflow in packing
+    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(
+        _mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+    if (saturation_bitmask == 0xffffffff) {
+      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
 
-    utf16_output += 6; // We wrote 6 16-bit characters.
-    return 12;         // We consumed 12 bytes.
-  }
+      // Check for illegal surrogate code units
+      const __m256i v_d800 = _mm256_set1_epi16((uint16_t)0xd800);
+      const __m256i forbidden_bytemask =
+          _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800);
+      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) !=
+          0x0) {
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              utf8_output);
+      }
 
-  /// We do not have a fast path available, or the fast path is unimportant, so
-  /// we fallback.
-  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
-      [input_utf8_end_of_code_point_mask][0];
+      const __m256i dup_even = _mm256_setr_epi16(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
-  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
-      [input_utf8_end_of_code_point_mask][1];
+      /* In this branch we handle three cases:
+        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+        single UTF-8 byte
+        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
+        UTF-8 bytes
+        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+        three UTF-8 bytes
 
-  if (idx < 64) {
-    // SIX (6) input code-code units
-    // Convert to UTF-16
-    uint16x8_t composed = convert_utf8_1_to_2_byte_to_utf16(in, idx);
-    // Byte swap if necessary
-    if (!match_system(big_endian)) {
-      composed =
-          vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(composed)));
-    }
-    // Store
-    vst1q_u16(reinterpret_cast<uint16_t *>(utf16_output), composed);
-    utf16_output += 6; // We wrote 6 16-bit characters.
-    return consumed;
-  } else if (idx < 145) {
-    // FOUR (4) input code-code units
-    // UTF-16 and UTF-32 use similar algorithms, but UTF-32 skips the narrowing.
-    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
-        simdutf::tables::utf8_to_utf16::shufutf8[idx]));
-    // XXX: depending on the system scalar instructions might be faster.
-    // 1 byte: 00000000 00000000 0ccccccc
-    // 2 byte: 00000000 110bbbbb 10cccccc
-    // 3 byte: 1110aaaa 10bbbbbb 10cccccc
-    uint32x4_t perm = vreinterpretq_u32_u8(vqtbl1q_u8(in, sh));
-    // 1 byte: 00000000 0ccccccc
-    // 2 byte: xx0bbbbb x0cccccc
-    // 3 byte: xxbbbbbb x0cccccc
-    uint16x4_t lowperm = vmovn_u32(perm);
-    // Partially mask with bic (doesn't require a temporary register unlike and)
-    // The shift left insert below will clear the top bits.
-    // 1 byte: 00000000 00000000
-    // 2 byte: xx0bbbbb 00000000
-    // 3 byte: xxbbbbbb 00000000
-    uint16x4_t middlebyte = vbic_u16(lowperm, vmov_n_u16(uint16_t(~0xFF00)));
-    // ASCII
-    // 1 byte: 00000000 0ccccccc
-    // 2+byte: 00000000 00cccccc
-    uint16x4_t ascii = vand_u16(lowperm, vmov_n_u16(0x7F));
-    // Split into narrow vectors.
-    // 2 byte: 00000000 00000000
-    // 3 byte: 00000000 xxxxaaaa
-    uint16x4_t highperm = vshrn_n_u32(perm, 16);
-    // Shift right accumulate the middle byte
-    // 1 byte: 00000000 0ccccccc
-    // 2 byte: 00xx0bbb bbcccccc
-    // 3 byte: 00xxbbbb bbcccccc
-    uint16x4_t middlelow = vsra_n_u16(ascii, middlebyte, 2);
-    // Shift left and insert the top 4 bits, overwriting the garbage
-    // 1 byte: 00000000 0ccccccc
-    // 2 byte: 00000bbb bbcccccc
-    // 3 byte: aaaabbbb bbcccccc
-    uint16x4_t composed = vsli_n_u16(middlelow, highperm, 12);
-    // Byte swap if necessary
-    if (!match_system(big_endian)) {
-      composed = vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(composed)));
-    }
-    vst1_u16(reinterpret_cast<uint16_t *>(utf16_output), composed);
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
 
-    utf16_output += 4; // We wrote 4 16-bit codepoints
-    return consumed;
-  } else if (idx < 209) {
-    // THREE (3) input code-code units
-    if (input_utf8_end_of_code_point_mask == 0x888) {
-      // We want to take 3 4-byte UTF-8 code units and turn them into 3 4-byte
-      // UTF-16 pairs. Generating surrogate pairs is a little tricky though, but
-      // it is easier when we can assume they are all pairs. This version does
-      // not use the LUT, but 4 byte sequences are less common and the overhead
-      // of the extra memory access is less important than the early branch
-      // overhead in shorter sequences.
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
 
-      // Swap byte pairs
-      // 10dddddd 10cccccc|10bbbbbb 11110aaa
-      // 10cccccc 10dddddd|11110aaa 10bbbbbb
-      uint8x16_t swap = vrev16q_u8(in);
-      // Shift left 2 bits
-      // cccccc00 dddddd00 xxxxxxxx bbbbbb00
-      uint32x4_t shift = vreinterpretq_u32_u8(vshlq_n_u8(swap, 2));
-      // Create a magic number containing the low 2 bits of the trail surrogate
-      // and all the corrections needed to create the pair. UTF-8 4b prefix   =
-      // -0x0000|0xF000 surrogate offset  = -0x0000|0x0040 (0x10000 << 6)
-      // surrogate high    = +0x0000|0xD800
-      // surrogate low     = +0xDC00|0x0000
-      // -------------------------------
-      //                   = +0xDC00|0xE7C0
-      uint32x4_t magic = vmovq_n_u32(0xDC00E7C0);
-      // Generate unadjusted trail surrogate minus lowest 2 bits
-      // xxxxxxxx xxxxxxxx|11110aaa bbbbbb00
-      uint32x4_t trail =
-          vbslq_u32(vmovq_n_u32(0x0000FF00), vreinterpretq_u32_u8(swap), shift);
-      // Insert low 2 bits of trail surrogate to magic number for later
-      // 11011100 00000000 11100111 110000cc
-      uint16x8_t magic_with_low_2 =
-          vreinterpretq_u16_u32(vsraq_n_u32(magic, shift, 30));
-      // Generate lead surrogate
-      // xxxxcccc ccdddddd|xxxxxxxx xxxxxxxx
-      uint32x4_t lead = vreinterpretq_u32_u16(
-          vsliq_n_u16(vreinterpretq_u16_u8(swap), vreinterpretq_u16_u8(in), 6));
-      // Mask out lead
-      // 000000cc ccdddddd|xxxxxxxx xxxxxxxx
-      lead = vbicq_u32(lead, vmovq_n_u32(uint32_t(~0x03FFFFFF)));
-      // Blend pairs
-      // 000000cc ccdddddd|11110aaa bbbbbb00
-      uint16x8_t blend = vreinterpretq_u16_u32(
-          vbslq_u32(vmovq_n_u32(0x0000FFFF), trail, lead));
-      // Add magic number to finish the result
-      // 110111CC CCDDDDDD|110110AA BBBBBBCC
-      uint16x8_t composed = vaddq_u16(blend, magic_with_low_2);
-      // Byte swap if necessary
-      if (!match_system(big_endian)) {
-        composed =
-            vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(composed)));
-      }
-      uint16_t buffer[8];
-      vst1q_u16(reinterpret_cast<uint16_t *>(buffer), composed);
-      for (int k = 0; k < 6; k++) {
-        utf16_output[k] = buffer[k];
-      } // the loop might compiler to a couple of instructions.
-      utf16_output += 6; // We wrote 3 32-bit surrogate pairs.
-      return 12;         // We consumed 12 bytes.
-    }
-    // 3 1-4 byte sequences
-    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
-        simdutf::tables::utf8_to_utf16::shufutf8[idx]));
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
 
-    // 1 byte: 00000000 00000000 00000000 0ddddddd
-    // 3 byte: 00000000 00000000 110ccccc 10dddddd
-    // 3 byte: 00000000 1110bbbb 10cccccc 10dddddd
-    // 4 byte: 11110aaa 10bbbbbb 10cccccc 10dddddd
-    uint32x4_t perm = vreinterpretq_u32_u8(vqtbl1q_u8(in, sh));
-    // added to fix issue https://github.com/simdutf/simdutf/issues/514
-    // We only want to write 2 * 16-bit code units when that is actually what we
-    // have. Unfortunately, we cannot trust the input. So it is possible to get
-    // 0xff as an input byte and it should not result in a surrogate pair. We
-    // need to check for that.
-    uint32_t permbuffer[4];
-    vst1q_u32(permbuffer, perm);
-    // Mask the low and middle bytes
-    // 00000000 00000000 00000000 0ddddddd
-    uint32x4_t ascii = vandq_u32(perm, vmovq_n_u32(0x7f));
-    // Because the surrogates need more work, the high surrogate is computed
-    // first.
-    uint32x4_t middlehigh = vshlq_n_u32(perm, 2);
-    // 00000000 00000000 00cccccc 00000000
-    uint32x4_t middlebyte = vandq_u32(perm, vmovq_n_u32(0x3F00));
-    // Start assembling the sequence. Since the 4th byte is in the same position
-    // as it would be in a surrogate and there is no dependency, shift left
-    // instead of right. 3 byte: 00000000 10bbbbxx xxxxxxxx xxxxxxxx 4 byte:
-    // 11110aaa bbbbbbxx xxxxxxxx xxxxxxxx
-    uint32x4_t ab = vbslq_u32(vmovq_n_u32(0xFF000000), perm, middlehigh);
-    // Top 16 bits contains the high ten bits of the surrogate pair before
-    // correction 3 byte: 00000000 10bbbbcc|cccc0000 00000000 4 byte: 11110aaa
-    // bbbbbbcc|cccc0000 00000000 - high 10 bits correct w/o correction
-    uint32x4_t abc =
-        vbslq_u32(vmovq_n_u32(0xFFFC0000), ab, vshlq_n_u32(middlebyte, 4));
-    // Combine the low 6 or 7 bits by a shift right accumulate
-    // 3 byte: 00000000 00000010|bbbbcccc ccdddddd - low 16 bits correct
-    // 4 byte: 00000011 110aaabb|bbbbcccc ccdddddd - low 10 bits correct w/o
-    // correction
-    uint32x4_t composed = vsraq_n_u32(ascii, abc, 6);
-    // After this is for surrogates
-    // Blend the low and high surrogates
-    // 4 byte: 11110aaa bbbbbbcc|bbbbcccc ccdddddd
-    uint32x4_t mixed = vbslq_u32(vmovq_n_u32(0xFFFF0000), abc, composed);
-    // Clear the upper 6 bits of the low surrogate. Don't clear the upper bits
-    // yet as 0x10000 was not subtracted from the codepoint yet. 4 byte:
-    // 11110aaa bbbbbbcc|000000cc ccdddddd
-    uint16x8_t masked_pair = vreinterpretq_u16_u32(
-        vbicq_u32(mixed, vmovq_n_u32(uint32_t(~0xFFFF03FF))));
-    // Correct the remaining UTF-8 prefix, surrogate offset, and add the
-    // surrogate prefixes in one magic 16-bit addition. similar magic number but
-    // without the continue byte adjust and halfword swapped UTF-8 4b prefix   =
-    // -0xF000|0x0000 surrogate offset  = -0x0040|0x0000 (0x10000 << 6)
-    // surrogate high    = +0xD800|0x0000
-    // surrogate low     = +0x0000|0xDC00
-    // -----------------------------------
-    //                   = +0xE7C0|0xDC00
-    uint16x8_t magic = vreinterpretq_u16_u32(vmovq_n_u32(0xE7C0DC00));
-    // 4 byte: 110110AA BBBBBBCC|110111CC CCDDDDDD - surrogate pair complete
-    uint32x4_t surrogates =
-        vreinterpretq_u32_u16(vaddq_u16(masked_pair, magic));
-    // If the high bit is 1 (s32 less than zero), this needs a surrogate pair
-    uint32x4_t is_pair = vcltzq_s32(vreinterpretq_s32_u32(perm));
+        Finally, from these two code units we build a proper UTF-8 sequence,
+        taking into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
+#define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const __m256i t0 = _mm256_shuffle_epi8(in_16, dup_even);
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
+
+      // [aaaa|bbbb|bbcc|cccc] => [0000|aaaa|bbbb|bbcc]
+      const __m256i s0 = _mm256_srli_epi16(in_16, 4);
+      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
+      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
+      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
+      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
+      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
+      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
+                                             simdutf_vec(0b0100000000000000));
+      const __m256i s4 = _mm256_xor_si256(s3, m0);
+#undef simdutf_vec
+
+      // 4. expand code units 16-bit => 32-bit
+      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
+      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
+
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
+                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
+      // Due to the wider registers, the following path is less likely to be
+      // useful.
+      /*if(mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const __m256i shuffle =
+      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
+      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
+      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
+      _mm256_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
+        continue;
+      }*/
+      const uint8_t mask0 = uint8_t(mask);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
 
-    // Select either the 4 byte surrogate pair or the 2 byte solo codepoint
-    // 3 byte: 0xxxxxxx xxxxxxxx|bbbbcccc ccdddddd
-    // 4 byte: 110110AA BBBBBBCC|110111CC CCDDDDDD
-    uint32x4_t selected = vbslq_u32(is_pair, surrogates, composed);
-    // Byte swap if necessary
-    if (!match_system(big_endian)) {
-      selected =
-          vreinterpretq_u32_u8(vrev16q_u8(vreinterpretq_u8_u32(selected)));
-    }
-    // Attempting to shuffle and store would be complex, just scalarize.
-    uint32_t buffer[4];
-    vst1q_u32(buffer, selected);
-    // Test for the top bit of the surrogate mask. Remove due to issue 514
-    // const uint32_t SURROGATE_MASK = match_system(big_endian) ? 0x80000000 :
-    // 0x00800000;
-    for (size_t i = 0; i < 3; i++) {
-      // Surrogate
-      // Used to be if (buffer[i] & SURROGATE_MASK) {
-      // See discussion above.
-      // patch for issue https://github.com/simdutf/simdutf/issues/514
-      if ((permbuffer[i] & 0xf8000000) == 0xf0000000) {
-        utf16_output[0] = uint16_t(buffer[i] >> 16);
-        utf16_output[1] = uint16_t(buffer[i] & 0xFFFF);
-        utf16_output += 2;
-      } else {
-        utf16_output[0] = uint16_t(buffer[i] & 0xFFFF);
-        utf16_output++;
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
+
+      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
+      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
+      const __m128i utf8_2 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
+
+      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
+      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
+      const __m128i utf8_3 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
+
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
+      utf8_output += row0[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+      utf8_output += row1[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
+      utf8_output += row2[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
+      utf8_output += row3[0];
+      buf += 16;
+    } else {
+      // case: at least one 32-bit word is larger than 0xFFFF <=> it will
+      // produce four UTF-8 bytes. Let us do a scalar fallback. It may seem
+      // wasteful to use scalar code, but being efficient with SIMD may require
+      // large, non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFFFF80) == 0) { // 1-byte (ASCII)
+          *utf8_output++ = char(word);
+        } else if ((word & 0xFFFFF800) == 0) { // 2-byte
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) { // 3-byte
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k), utf8_output);
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else { // 4-byte
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k), utf8_output);
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        }
       }
+      buf += k;
     }
-    return consumed;
-  } else {
-    // here we know that there is an error but we do not handle errors
-    return 12;
-  }
+  } // while
+
+  return std::make_pair(result(error_code::SUCCESS, buf - start), utf8_output);
 }
-/* end file src/arm64/arm_convert_utf8_to_utf16.cpp */
-/* begin file src/arm64/arm_convert_utf8_to_utf32.cpp */
-// Convert up to 12 bytes from utf8 to utf32 using a mask indicating the
-// end of the code points. Only the least significant 12 bits of the mask
-// are accessed.
-// It returns how many bytes were consumed (up to 12).
-size_t convert_masked_utf8_to_utf32(const char *input,
-                                    uint64_t utf8_end_of_code_point_mask,
-                                    char32_t *&utf32_out) {
-  // we use an approach where we try to process up to 12 input bytes.
-  // Why 12 input bytes and not 16? Because we are concerned with the size of
-  // the lookup tables. Also 12 is nicely divisible by two and three.
-  //
-  uint32_t *&utf32_output = reinterpret_cast<uint32_t *&>(utf32_out);
-  uint8x16_t in = vld1q_u8(reinterpret_cast<const uint8_t *>(input));
-  const uint16_t input_utf8_end_of_code_point_mask =
-      utf8_end_of_code_point_mask & 0xFFF;
-  //
-  // Optimization note: our main path below is load-latency dependent. Thus it
-  // is maybe beneficial to have fast paths that depend on branch prediction but
-  // have less latency. This results in more instructions but, potentially, also
-  // higher speeds.
-  //
-  // We first try a few fast paths.
-  if (utf8_end_of_code_point_mask == 0xfff) {
-    // We process in chunks of 12 bytes.
-    // use fast implementation in src/simdutf/arm64/simd.h
-    // Ideally the compiler can keep the tables in registers.
-    simd8<int8_t> temp{vreinterpretq_s8_u8(in)};
-    temp.store_ascii_as_utf32_tbl(utf32_out);
-    utf32_output += 12; // We wrote 12 32-bit characters.
-    return 12;          // We consumed 12 bytes.
-  }
-  if (input_utf8_end_of_code_point_mask == 0x924) {
-    // We want to take 4 3-byte UTF-8 code units and turn them into 4 4-byte
-    // UTF-32 code units. Convert to UTF-16
-    uint16x4_t composed_utf16 = convert_utf8_3_byte_to_utf16(in);
-    // Zero extend and store via ST2 with a zero.
-    uint16x4x2_t interleaver = {{composed_utf16, vmov_n_u16(0)}};
-    vst2_u16(reinterpret_cast<uint16_t *>(utf32_output), interleaver);
-    utf32_output += 4; // We wrote 4 32-bit characters.
-    return 12;         // We consumed 12 bytes.
-  }
+/* end file src/icelake/icelake_convert_utf32_to_utf8.inl.cpp */
+/* begin file src/icelake/icelake_convert_utf32_to_utf16.inl.cpp */
+// file included directly
 
-  // 2 byte sequences occur in short bursts in languages like Greek and Russian.
-  if (input_utf8_end_of_code_point_mask == 0xaaa) {
-    // We want to take 6 2-byte UTF-8 code units and turn them into 6 4-byte
-    // UTF-32 code units. Convert to UTF-16
-    uint16x8_t composed_utf16 = convert_utf8_2_byte_to_utf16(in);
-    // Zero extend and store via ST2 with a zero.
-    uint16x8x2_t interleaver = {{composed_utf16, vmovq_n_u16(0)}};
-    vst2q_u16(reinterpret_cast<uint16_t *>(utf32_output), interleaver);
-    utf32_output += 6; // We wrote 6 32-bit characters.
-    return 12;         // We consumed 12 bytes.
-  }
-  /// Either no fast path or an unimportant fast path.
+// TODO: this is currently just the Haswell code; optimize for Icelake.
+template <endianness big_endian>
+std::pair<const char32_t *, char16_t *>
+avx512_convert_utf32_to_utf16(const char32_t *buf, size_t len,
+                              char16_t *utf16_output) {
+  const char32_t *end = buf + len;
 
-  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
-      [input_utf8_end_of_code_point_mask][0];
-  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
-      [input_utf8_end_of_code_point_mask][1];
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+  __m256i forbidden_bytemask = _mm256_setzero_si256();
 
-  if (idx < 64) {
-    // SIX (6) input code-code units
-    // Convert to UTF-16
-    uint16x8_t composed_utf16 = convert_utf8_1_to_2_byte_to_utf16(in, idx);
-    // Zero extend and store with ST2 and zero
-    uint16x8x2_t interleaver = {{composed_utf16, vmovq_n_u16(0)}};
-    vst2q_u16(reinterpret_cast<uint16_t *>(utf32_output), interleaver);
-    utf32_output += 6; // We wrote 6 32-bit characters.
-    return consumed;
-  } else if (idx < 145) {
-    // FOUR (4) input code-code units
-    // UTF-16 and UTF-32 use similar algorithms, but UTF-32 skips the narrowing.
-    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
-        simdutf::tables::utf8_to_utf16::shufutf8[idx]));
-    // Shuffle
-    // 1 byte: 00000000 00000000 0ccccccc
-    // 2 byte: 00000000 110bbbbb 10cccccc
-    // 3 byte: 1110aaaa 10bbbbbb 10cccccc
-    uint32x4_t perm = vreinterpretq_u32_u8(vqtbl1q_u8(in, sh));
-    // Split
-    // 00000000 00000000 0ccccccc
-    uint32x4_t ascii = vandq_u32(perm, vmovq_n_u32(0x7F)); // 6 or 7 bits
-    // Note: unmasked
-    // xxxxxxxx aaaaxxxx xxxxxxxx
-    uint32x4_t high = vshrq_n_u32(perm, 4); // 4 bits
-    // Use 16 bit bic instead of and.
-    // The top bits will be corrected later in the bsl
-    // 00000000 10bbbbbb 00000000
-    uint32x4_t middle = vreinterpretq_u32_u16(
-        vbicq_u16(vreinterpretq_u16_u32(perm),
-                  vmovq_n_u16(uint16_t(~0xff00)))); // 5 or 6 bits
-    // Combine low and middle with shift right accumulate
-    // 00000000 00xxbbbb bbcccccc
-    uint32x4_t lowmid = vsraq_n_u32(ascii, middle, 2);
-    // Insert top 4 bits from high byte with bitwise select
-    // 00000000 aaaabbbb bbcccccc
-    uint32x4_t composed = vbslq_u32(vmovq_n_u32(0x0000F000), high, lowmid);
-    vst1q_u32(utf32_output, composed);
-    utf32_output += 4; // We wrote 4 32-bit characters.
-    return consumed;
-  } else if (idx < 209) {
-    // THREE (3) input code-code units
-    if (input_utf8_end_of_code_point_mask == 0x888) {
-      // We want to take 3 4-byte UTF-8 code units and turn them into 3 4-byte
-      // UTF-32 code units. This uses the same method as the fixed 3 byte
-      // version, reversing and shift left insert. However, there is no need for
-      // a shuffle mask now, just rev16 and rev32.
-      //
-      // This version does not use the LUT, but 4 byte sequences are less common
-      // and the overhead of the extra memory access is less important than the
-      // early branch overhead in shorter sequences, so it comes last.
+  while (end - buf >= std::ptrdiff_t(8 + safety_margin)) {
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
 
-      // Swap pairs of bytes
-      // 10dddddd|10cccccc|10bbbbbb|11110aaa
-      // 10cccccc 10dddddd|11110aaa 10bbbbbb
-      uint16x8_t swap1 = vreinterpretq_u16_u8(vrev16q_u8(in));
-      // Shift left and insert
-      // xxxxcccc ccdddddd|xxxxxxxa aabbbbbb
-      uint16x8_t merge1 = vsliq_n_u16(swap1, vreinterpretq_u16_u8(in), 6);
-      // Swap 16-bit lanes
-      // xxxxcccc ccdddddd xxxxxxxa aabbbbbb
-      // xxxxxxxa aabbbbbb xxxxcccc ccdddddd
-      uint32x4_t swap2 = vreinterpretq_u32_u16(vrev32q_u16(merge1));
-      // Shift insert again
-      // xxxxxxxx xxxaaabb bbbbcccc ccdddddd
-      uint32x4_t merge2 = vsliq_n_u32(swap2, vreinterpretq_u32_u16(merge1), 12);
-      // Clear the garbage
-      // 00000000 000aaabb bbbbcccc ccdddddd
-      uint32x4_t composed = vandq_u32(merge2, vmovq_n_u32(0x1FFFFF));
-      // Store
-      vst1q_u32(utf32_output, composed);
+    const __m256i v_00000000 = _mm256_setzero_si256();
+    const __m256i v_ffff0000 = _mm256_set1_epi32((int32_t)0xffff0000);
 
-      utf32_output += 3; // We wrote 3 32-bit characters.
-      return 12;         // We consumed 12 bytes.
-    }
-    // Unlike UTF-16, doing a fast codepath doesn't have nearly as much benefit
-    // due to surrogates no longer being involved.
-    uint8x16_t sh = vld1q_u8(reinterpret_cast<const uint8_t *>(
-        simdutf::tables::utf8_to_utf16::shufutf8[idx]));
-    // 1 byte: 00000000 00000000 00000000 0ddddddd
-    // 2 byte: 00000000 00000000 110ccccc 10dddddd
-    // 3 byte: 00000000 1110bbbb 10cccccc 10dddddd
-    // 4 byte: 11110aaa 10bbbbbb 10cccccc 10dddddd
-    uint32x4_t perm = vreinterpretq_u32_u8(vqtbl1q_u8(in, sh));
-    // Ascii
-    uint32x4_t ascii = vandq_u32(perm, vmovq_n_u32(0x7F));
-    uint32x4_t middle = vandq_u32(perm, vmovq_n_u32(0x3f00));
-    // When converting the way we do, the 3 byte prefix will be interpreted as
-    // the 18th bit being set, since the code would interpret the lead byte
-    // (0b1110bbbb) as a continuation byte (0b10bbbbbb). To fix this, we can
-    // either xor or do an 8 bit add of the 6th bit shifted right by 1. Since
-    // NEON has shift right accumulate, we use that.
-    //  4 byte   3 byte
-    // 10bbbbbb 1110bbbb
-    // 00000000 01000000 6th bit
-    // 00000000 00100000 shift right
-    // 10bbbbbb 0000bbbb add
-    // 00bbbbbb 0000bbbb mask
-    uint8x16_t correction =
-        vreinterpretq_u8_u32(vandq_u32(perm, vmovq_n_u32(0x00400000)));
-    uint32x4_t corrected = vreinterpretq_u32_u8(
-        vsraq_n_u8(vreinterpretq_u8_u32(perm), correction, 1));
-    // 00000000 00000000 0000cccc ccdddddd
-    uint32x4_t cd = vsraq_n_u32(ascii, middle, 2);
-    // Insert twice
-    // xxxxxxxx xxxaaabb bbbbxxxx xxxxxxxx
-    uint32x4_t ab = vbslq_u32(vmovq_n_u32(0x01C0000), vshrq_n_u32(corrected, 6),
-                              vshrq_n_u32(corrected, 4));
-    // 00000000 000aaabb bbbbcccc ccdddddd
-    uint32x4_t composed = vbslq_u32(vmovq_n_u32(0xFFE00FFF), cd, ab);
-    // Store
-    vst1q_u32(utf32_output, composed);
-    utf32_output += 3; // We wrote 3 32-bit characters.
-    return consumed;
-  } else {
-    // here we know that there is an error but we do not handle errors
-    return 12;
-  }
-}
-/* end file src/arm64/arm_convert_utf8_to_utf32.cpp */
+    // no bits set above 16th bit <=> can pack to UTF16 without surrogate pairs
+    const __m256i saturation_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
 
-/* begin file src/arm64/arm_convert_utf16_to_latin1.cpp */
+    if (saturation_bitmask == 0xffffffff) {
+      const __m256i v_f800 = _mm256_set1_epi32((uint32_t)0xf800);
+      const __m256i v_d800 = _mm256_set1_epi32((uint32_t)0xd800);
+      forbidden_bytemask = _mm256_or_si256(
+          forbidden_bytemask,
+          _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800));
 
-template <endianness big_endian>
-std::pair<const char16_t *, char *>
-arm_convert_utf16_to_latin1(const char16_t *buf, size_t len,
-                            char *latin1_output) {
-  const char16_t *end = buf + len;
-  while (end - buf >= 8) {
-    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
-    if (!match_system(big_endian)) {
-      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
-    }
-    if (vmaxvq_u16(in) <= 0xff) {
-      // 1. pack the bytes
-      uint8x8_t latin1_packed = vmovn_u16(in);
-      // 2. store (8 bytes)
-      vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
-      // 3. adjust pointers
+      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),
+                                              _mm256_extractf128_si256(in, 1));
+      if (big_endian) {
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
+      }
+      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
+      utf16_output += 8;
       buf += 8;
-      latin1_output += 8;
     } else {
-      return std::make_pair(nullptr, reinterpret_cast<char *>(latin1_output));
+      size_t forward = 7;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFF0000) == 0) {
+          // will not generate a surrogate pair
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr, utf16_output);
+          }
+          *utf16_output++ =
+              big_endian
+                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
+                  : char16_t(word);
+        } else {
+          // will generate a surrogate pair
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr, utf16_output);
+          }
+          word -= 0x10000;
+          uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
+          uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
+          if (big_endian) {
+            high_surrogate =
+                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
+            low_surrogate =
+                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
+          }
+          *utf16_output++ = char16_t(high_surrogate);
+          *utf16_output++ = char16_t(low_surrogate);
+        }
+      }
+      buf += k;
     }
-  } // while
-  return std::make_pair(buf, latin1_output);
+  }
+
+  // check for invalid input
+  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) {
+    return std::make_pair(nullptr, utf16_output);
+  }
+
+  return std::make_pair(buf, utf16_output);
 }
 
+// TODO: this is currently just the Haswell code; optimize for Icelake.
 template <endianness big_endian>
-std::pair<result, char *>
-arm_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
-                                        char *latin1_output) {
-  const char16_t *start = buf;
-  const char16_t *end = buf + len;
-  while (end - buf >= 8) {
-    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
-    if (!match_system(big_endian)) {
-      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
-    }
-    if (vmaxvq_u16(in) <= 0xff) {
-      // 1. pack the bytes
-      uint8x8_t latin1_packed = vmovn_u16(in);
-      // 2. store (8 bytes)
-      vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
-      // 3. adjust pointers
+std::pair<result, char16_t *>
+avx512_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
+                                          char16_t *utf16_output) {
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
+
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+
+  while (end - buf >= std::ptrdiff_t(8 + safety_margin)) {
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+
+    const __m256i v_00000000 = _mm256_setzero_si256();
+    const __m256i v_ffff0000 = _mm256_set1_epi32((int32_t)0xffff0000);
+
+    // no bits set above 16th bit <=> can pack to UTF16 without surrogate pairs
+    const __m256i saturation_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+
+    if (saturation_bitmask == 0xffffffff) {
+      const __m256i v_f800 = _mm256_set1_epi32((uint32_t)0xf800);
+      const __m256i v_d800 = _mm256_set1_epi32((uint32_t)0xd800);
+      const __m256i forbidden_bytemask =
+          _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800);
+      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) !=
+          0x0) {
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              utf16_output);
+      }
+
+      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),
+                                              _mm256_extractf128_si256(in, 1));
+      if (big_endian) {
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
+      }
+      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
+      utf16_output += 8;
       buf += 8;
-      latin1_output += 8;
     } else {
-      // Let us do a scalar fallback.
-      for (int k = 0; k < 8; k++) {
-        uint16_t word = !match_system(big_endian)
-                            ? scalar::utf16::swap_bytes(buf[k])
-                            : buf[k];
-        if (word <= 0xff) {
-          *latin1_output++ = char(word);
+      size_t forward = 7;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFF0000) == 0) {
+          // will not generate a surrogate pair
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k), utf16_output);
+          }
+          *utf16_output++ =
+              big_endian
+                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
+                  : char16_t(word);
         } else {
-          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
-                                latin1_output);
+          // will generate a surrogate pair
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k), utf16_output);
+          }
+          word -= 0x10000;
+          uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
+          uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
+          if (big_endian) {
+            high_surrogate =
+                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
+            low_surrogate =
+                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
+          }
+          *utf16_output++ = char16_t(high_surrogate);
+          *utf16_output++ = char16_t(low_surrogate);
         }
       }
+      buf += k;
     }
-  } // while
-  return std::make_pair(result(error_code::SUCCESS, buf - start),
-                        latin1_output);
+  }
+
+  return std::make_pair(result(error_code::SUCCESS, buf - start), utf16_output);
+}
+/* end file src/icelake/icelake_convert_utf32_to_utf16.inl.cpp */
+/* begin file src/icelake/icelake_ascii_validation.inl.cpp */
+// file included directly
+
+bool validate_ascii(const char *buf, size_t len) {
+  const char *end = buf + len;
+  const __m512i ascii = _mm512_set1_epi8((uint8_t)0x80);
+  __m512i running_or = _mm512_setzero_si512();
+  for (; end - buf >= 64; buf += 64) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)buf);
+    running_or = _mm512_ternarylogic_epi32(running_or, utf8, ascii,
+                                           0xf8); // running_or | (utf8 & ascii)
+  }
+  if (buf < end) {
+    const __m512i utf8 = _mm512_maskz_loadu_epi8(
+        (uint64_t(1) << (end - buf)) - 1, (const __m512i *)buf);
+    running_or = _mm512_ternarylogic_epi32(running_or, utf8, ascii,
+                                           0xf8); // running_or | (utf8 & ascii)
+  }
+  return (_mm512_test_epi8_mask(running_or, running_or) == 0);
 }
-/* end file src/arm64/arm_convert_utf16_to_latin1.cpp */
-/* begin file src/arm64/arm_convert_utf16_to_utf32.cpp */
-/*
-    The vectorized algorithm works on single SSE register i.e., it
-    loads eight 16-bit code units.
+/* end file src/icelake/icelake_ascii_validation.inl.cpp */
+/* begin file src/icelake/icelake_utf32_validation.inl.cpp */
+// file included directly
 
-    We consider three cases:
-    1. an input register contains no surrogates and each value
-       is in range 0x0000 .. 0x07ff.
-    2. an input register contains no surrogates and values are
-       is in range 0x0000 .. 0xffff.
-    3. an input register contains surrogates --- i.e. codepoints
-       can have 16 or 32 bits.
+const char32_t *validate_utf32(const char32_t *buf, size_t len) {
+  if (len < 16) {
+    return buf;
+  }
+  const char32_t *end = buf + len - 16;
 
-    Ad 1.
+  const __m512i offset = _mm512_set1_epi32((uint32_t)0xffff2000);
+  __m512i currentmax = _mm512_setzero_si512();
+  __m512i currentoffsetmax = _mm512_setzero_si512();
 
-    When values are less than 0x0800, it means that a 16-bit code unit
-    can be converted into: 1) single UTF8 byte (when it is an ASCII
-    char) or 2) two UTF8 bytes.
+  while (buf <= end) {
+    __m512i utf32 = _mm512_loadu_si512((const __m512i *)buf);
+    buf += 16;
+    currentoffsetmax =
+        _mm512_max_epu32(_mm512_add_epi32(utf32, offset), currentoffsetmax);
+    currentmax = _mm512_max_epu32(utf32, currentmax);
+  }
 
-    For this case we do only some shuffle to obtain these 2-byte
-    codes and finally compress the whole SSE register with a single
-    shuffle.
+  const __m512i standardmax = _mm512_set1_epi32((uint32_t)0x10ffff);
+  const __m512i standardoffsetmax = _mm512_set1_epi32((uint32_t)0xfffff7ff);
+  __m512i is_zero =
+      _mm512_xor_si512(_mm512_max_epu32(currentmax, standardmax), standardmax);
+  if (_mm512_test_epi8_mask(is_zero, is_zero) != 0) {
+    return nullptr;
+  }
+  is_zero = _mm512_xor_si512(
+      _mm512_max_epu32(currentoffsetmax, standardoffsetmax), standardoffsetmax);
+  if (_mm512_test_epi8_mask(is_zero, is_zero) != 0) {
+    return nullptr;
+  }
 
-    We need 256-entry lookup table to get a compression pattern
-    and the number of output bytes in the compressed vector register.
-    Each entry occupies 17 bytes.
+  return buf;
+}
+/* end file src/icelake/icelake_utf32_validation.inl.cpp */
+/* begin file src/icelake/icelake_convert_latin1_to_utf8.inl.cpp */
+// file included directly
 
-    Ad 2.
+static inline size_t latin1_to_utf8_avx512_vec(__m512i input, size_t input_len,
+                                               char *utf8_output,
+                                               int mask_output) {
+  __mmask64 nonascii = _mm512_movepi8_mask(input);
+  size_t output_size = input_len + (size_t)count_ones(nonascii);
 
-    When values fit in 16-bit code units, but are above 0x07ff, then
-    a single word may produce one, two or three UTF8 bytes.
+  // Mask to denote whether the byte is a leading byte that is not ascii
+  __mmask64 sixth = _mm512_cmpge_epu8_mask(
+      input, _mm512_set1_epi8(-64)); // binary representation of -64: 1100 0000
 
-    We prepare data for all these three cases in two registers.
-    The first register contains lower two UTF8 bytes (used in all
-    cases), while the second one contains just the third byte for
-    the three-UTF8-bytes case.
+  const uint64_t alternate_bits = UINT64_C(0x5555555555555555);
+  uint64_t ascii = ~nonascii;
+  // the bits in ascii are inverted and zeros are interspersed in between them
+  uint64_t maskA = ~_pdep_u64(ascii, alternate_bits);
+  uint64_t maskB = ~_pdep_u64(ascii >> 32, alternate_bits);
 
-    Finally these two registers are interleaved forming eight-element
-    array of 32-bit values. The array spans two SSE registers.
-    The bytes from the registers are compressed using two shuffles.
+  // interleave bytes from top and bottom halves (abcd...ABCD -> aAbBcCdD)
+  __m512i input_interleaved = _mm512_permutexvar_epi8(
+      _mm512_set_epi32(0x3f1f3e1e, 0x3d1d3c1c, 0x3b1b3a1a, 0x39193818,
+                       0x37173616, 0x35153414, 0x33133212, 0x31113010,
+                       0x2f0f2e0e, 0x2d0d2c0c, 0x2b0b2a0a, 0x29092808,
+                       0x27072606, 0x25052404, 0x23032202, 0x21012000),
+      input);
 
-    We need 256-entry lookup table to get a compression pattern
-    and the number of output bytes in the compressed vector register.
-    Each entry occupies 17 bytes.
+  // double size of each byte, and insert the leading byte 1100 0010
 
+  /*
+  Upscale the bytes to 16-bit values, adding the 0b11000010 leading byte in
+  the process. We adjust for the bytes that have their two most significant
+  bits set. This takes care of the first 32 bytes of the interleaved input. */
+  __m512i outputA =
+      _mm512_shldi_epi16(input_interleaved, _mm512_set1_epi8(-62), 8);
+  outputA = _mm512_mask_add_epi16(
+      outputA, (__mmask32)sixth, outputA,
+      _mm512_set1_epi16(1 - 0x4000)); // 1 - 0x4000 = 1100 0000 0000 0001
 
-    To summarize:
-    - We need two 256-entry tables that have 8704 bytes in total.
-*/
-/*
-  Returns a pair: the first unprocessed byte from buf and utf8_output
-  A scalar routing should carry on the conversion of the tail.
-*/
-template <endianness big_endian>
-std::pair<const char16_t *, char32_t *>
-arm_convert_utf16_to_utf32(const char16_t *buf, size_t len,
-                           char32_t *utf32_out) {
-  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
-  const char16_t *end = buf + len;
+  // in the second 32-bit half, set first or second option based on whether
+  // original input is leading byte (second case) or not (first case)
+  __m512i leadingB =
+      _mm512_mask_blend_epi16((__mmask32)(sixth >> 32),
+                              _mm512_set1_epi16(0x00c2),  // 0000 0000 1100 0010
+                              _mm512_set1_epi16(0x40c3)); // 0100 0000 1100 0011
+  __m512i outputB = _mm512_ternarylogic_epi32(
+      input_interleaved, leadingB, _mm512_set1_epi16((short)0xff00),
+      (240 & 170) ^ 204); // (input_interleaved & 0xff00) ^ leadingB
 
-  const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
-  const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
+  // prune redundant bytes
+  outputA = _mm512_maskz_compress_epi8(maskA, outputA);
+  outputB = _mm512_maskz_compress_epi8(maskB, outputB);
 
-  while (end - buf >= 8) {
-    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
-    if (!match_system(big_endian)) {
-      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
-    }
+  size_t output_sizeA = (size_t)count_ones((uint32_t)nonascii) + 32;
 
-    const uint16x8_t surrogates_bytemask =
-        vceqq_u16(vandq_u16(in, v_f800), v_d800);
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
-    // However, it is likely an uncommon occurrence.
-    if (vmaxvq_u16(surrogates_bytemask) == 0) {
-      // case: no surrogate pairs, extend all 16-bit code units to 32-bit code
-      // units
-      vst1q_u32(utf32_output, vmovl_u16(vget_low_u16(in)));
-      vst1q_u32(utf32_output + 4, vmovl_high_u16(in));
-      utf32_output += 8;
-      buf += 8;
-      // surrogate pair(s) in a register
+  if (mask_output) {
+    if (input_len > 32) { // is the second half of the input vector used?
+      __mmask64 write_mask = _bzhi_u64(~0ULL, (unsigned int)output_sizeA);
+      _mm512_mask_storeu_epi8(utf8_output, write_mask, outputA);
+      utf8_output += output_sizeA;
+      write_mask = _bzhi_u64(~0ULL, (unsigned int)(output_size - output_sizeA));
+      _mm512_mask_storeu_epi8(utf8_output, write_mask, outputB);
     } else {
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // in the presence of surrogate pairs may require non-trivial tables.
-      size_t forward = 15;
-      size_t k = 0;
-      if (size_t(end - buf) < forward + 1) {
-        forward = size_t(end - buf - 1);
-      }
-      for (; k < forward; k++) {
-        uint16_t word = !match_system(big_endian)
-                            ? scalar::utf16::swap_bytes(buf[k])
-                            : buf[k];
-        if ((word & 0xF800) != 0xD800) {
-          *utf32_output++ = char32_t(word);
-        } else {
-          // must be a surrogate pair
-          uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = !match_system(big_endian)
-                                   ? scalar::utf16::swap_bytes(buf[k + 1])
-                                   : buf[k + 1];
-          k++;
-          uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if ((diff | diff2) > 0x3FF) {
-            return std::make_pair(nullptr,
-                                  reinterpret_cast<char32_t *>(utf32_output));
-          }
-          uint32_t value = (diff << 10) + diff2 + 0x10000;
-          *utf32_output++ = char32_t(value);
-        }
-      }
-      buf += k;
+      __mmask64 write_mask = _bzhi_u64(~0ULL, (unsigned int)output_size);
+      _mm512_mask_storeu_epi8(utf8_output, write_mask, outputA);
     }
-  } // while
-  return std::make_pair(buf, reinterpret_cast<char32_t *>(utf32_output));
+  } else {
+    _mm512_storeu_si512(utf8_output, outputA);
+    utf8_output += output_sizeA;
+    _mm512_storeu_si512(utf8_output, outputB);
+  }
+  return output_size;
 }
 
-/*
-  Returns a pair: a result struct and utf8_output.
-  If there is an error, the count field of the result is the position of the
-  error. Otherwise, it is the position of the first unprocessed byte in buf
-  (even if finished). A scalar routing should carry on the conversion of the
-  tail if needed.
-*/
-template <endianness big_endian>
-std::pair<result, char32_t *>
-arm_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
-                                       char32_t *utf32_out) {
-  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
-  const char16_t *start = buf;
-  const char16_t *end = buf + len;
+static inline size_t latin1_to_utf8_avx512_branch(__m512i input,
+                                                  char *utf8_output) {
+  __mmask64 nonascii = _mm512_movepi8_mask(input);
+  if (nonascii) {
+    return latin1_to_utf8_avx512_vec(input, 64, utf8_output, 0);
+  } else {
+    _mm512_storeu_si512(utf8_output, input);
+    return 64;
+  }
+}
 
-  const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
-  const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
+size_t latin1_to_utf8_avx512_start(const char *buf, size_t len,
+                                   char *utf8_output) {
+  char *start = utf8_output;
+  size_t pos = 0;
+  // if there are at least 128 bytes remaining, we don't need to mask the output
+  for (; pos + 128 <= len; pos += 64) {
+    __m512i input = _mm512_loadu_si512((__m512i *)(buf + pos));
+    utf8_output += latin1_to_utf8_avx512_branch(input, utf8_output);
+  }
+  // in the last 128 bytes, the first 64 may require masking the output
+  if (pos + 64 <= len) {
+    __m512i input = _mm512_loadu_si512((__m512i *)(buf + pos));
+    utf8_output += latin1_to_utf8_avx512_vec(input, 64, utf8_output, 1);
+    pos += 64;
+  }
+  // with the last 64 bytes, the input also needs to be masked
+  if (pos < len) {
+    __mmask64 load_mask = _bzhi_u64(~0ULL, (unsigned int)(len - pos));
+    __m512i input = _mm512_maskz_loadu_epi8(load_mask, (__m512i *)(buf + pos));
+    utf8_output += latin1_to_utf8_avx512_vec(input, len - pos, utf8_output, 1);
+  }
+  return (size_t)(utf8_output - start);
+}
+/* end file src/icelake/icelake_convert_latin1_to_utf8.inl.cpp */
+/* begin file src/icelake/icelake_convert_latin1_to_utf16.inl.cpp */
+// file included directly
+template <endianness big_endian>
+size_t icelake_convert_latin1_to_utf16(const char *latin1_input, size_t len,
+                                       char16_t *utf16_output) {
+  size_t rounded_len = len & ~0x1F; // Round down to nearest multiple of 32
 
-  while ((end - buf) >= 8) {
-    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
-    if (!match_system(big_endian)) {
-      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
+  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
+                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  for (size_t i = 0; i < rounded_len; i += 32) {
+    // Load 32 Latin1 characters into a 256-bit register
+    __m256i in = _mm256_loadu_si256((__m256i *)&latin1_input[i]);
+    // Zero extend each of the 32 8-bit Latin1 characters to a 16-bit integer
+    __m512i out = _mm512_cvtepu8_epi16(in);
+    if (big_endian) {
+      out = _mm512_shuffle_epi8(out, byteflip);
     }
+    // Store the results back to memory
+    _mm512_storeu_si512((__m512i *)&utf16_output[i], out);
+  }
+  if (rounded_len != len) {
+    uint32_t mask = uint32_t(1 << (len - rounded_len)) - 1;
+    __m256i in = _mm256_maskz_loadu_epi8(mask, latin1_input + rounded_len);
 
-    const uint16x8_t surrogates_bytemask =
-        vceqq_u16(vandq_u16(in, v_f800), v_d800);
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
-    // However, it is likely an uncommon occurrence.
-    if (vmaxvq_u16(surrogates_bytemask) == 0) {
-      // case: no surrogate pairs, extend all 16-bit code units to 32-bit code
-      // units
-      vst1q_u32(utf32_output, vmovl_u16(vget_low_u16(in)));
-      vst1q_u32(utf32_output + 4, vmovl_high_u16(in));
-      utf32_output += 8;
-      buf += 8;
-      // surrogate pair(s) in a register
-    } else {
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // in the presence of surrogate pairs may require non-trivial tables.
-      size_t forward = 15;
-      size_t k = 0;
-      if (size_t(end - buf) < forward + 1) {
-        forward = size_t(end - buf - 1);
-      }
-      for (; k < forward; k++) {
-        uint16_t word = !match_system(big_endian)
-                            ? scalar::utf16::swap_bytes(buf[k])
-                            : buf[k];
-        if ((word & 0xF800) != 0xD800) {
-          *utf32_output++ = char32_t(word);
-        } else {
-          // must be a surrogate pair
-          uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = !match_system(big_endian)
-                                   ? scalar::utf16::swap_bytes(buf[k + 1])
-                                   : buf[k + 1];
-          k++;
-          uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if ((diff | diff2) > 0x3FF) {
-            return std::make_pair(
-                result(error_code::SURROGATE, buf - start + k - 1),
-                reinterpret_cast<char32_t *>(utf32_output));
-          }
-          uint32_t value = (diff << 10) + diff2 + 0x10000;
-          *utf32_output++ = char32_t(value);
-        }
-      }
-      buf += k;
+    // Zero extend each of the 32 8-bit Latin1 characters to a 16-bit integer
+    __m512i out = _mm512_cvtepu8_epi16(in);
+    if (big_endian) {
+      out = _mm512_shuffle_epi8(out, byteflip);
     }
-  } // while
-  return std::make_pair(result(error_code::SUCCESS, buf - start),
-                        reinterpret_cast<char32_t *>(utf32_output));
+    // Store the results back to memory
+    _mm512_mask_storeu_epi16(utf16_output + rounded_len, mask, out);
+  }
+
+  return len;
 }
-/* end file src/arm64/arm_convert_utf16_to_utf32.cpp */
-/* begin file src/arm64/arm_convert_utf16_to_utf8.cpp */
-/*
-    The vectorized algorithm works on single SSE register i.e., it
-    loads eight 16-bit code units.
+/* end file src/icelake/icelake_convert_latin1_to_utf16.inl.cpp */
+/* begin file src/icelake/icelake_convert_latin1_to_utf32.inl.cpp */
+std::pair<const char *, char32_t *>
+avx512_convert_latin1_to_utf32(const char *buf, size_t len,
+                               char32_t *utf32_output) {
+  size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 16
 
-    We consider three cases:
-    1. an input register contains no surrogates and each value
-       is in range 0x0000 .. 0x07ff.
-    2. an input register contains no surrogates and values are
-       is in range 0x0000 .. 0xffff.
-    3. an input register contains surrogates --- i.e. codepoints
-       can have 16 or 32 bits.
+  for (size_t i = 0; i < rounded_len; i += 16) {
+    // Load 16 Latin1 characters into a 128-bit register
+    __m128i in = _mm_loadu_si128((__m128i *)&buf[i]);
 
-    Ad 1.
+    // Zero extend each of the 16 8-bit Latin1 characters to a 32-bit integer
+    // using vpmovzxbd
+    __m512i out = _mm512_cvtepu8_epi32(in);
 
-    When values are less than 0x0800, it means that a 16-bit code unit
-    can be converted into: 1) single UTF8 byte (when it is an ASCII
-    char) or 2) two UTF8 bytes.
+    // Store the results back to memory
+    _mm512_storeu_si512((__m512i *)&utf32_output[i], out);
+  }
 
-    For this case we do only some shuffle to obtain these 2-byte
-    codes and finally compress the whole SSE register with a single
-    shuffle.
+  // Return pointers pointing to where we left off
+  return std::make_pair(buf + rounded_len, utf32_output + rounded_len);
+}
+/* end file src/icelake/icelake_convert_latin1_to_utf32.inl.cpp */
+/* begin file src/icelake/icelake_base64.inl.cpp */
+// file included directly
+/**
+ * References and further reading:
+ *
+ * Wojciech Muła, Daniel Lemire, Base64 encoding and decoding at almost the
+ * speed of a memory copy, Software: Practice and Experience 50 (2), 2020.
+ * https://arxiv.org/abs/1910.05109
+ *
+ * Wojciech Muła, Daniel Lemire, Faster Base64 Encoding and Decoding using AVX2
+ * Instructions, ACM Transactions on the Web 12 (3), 2018.
+ * https://arxiv.org/abs/1704.00605
+ *
+ * Simon Josefsson. 2006. The Base16, Base32, and Base64 Data Encodings.
+ * https://tools.ietf.org/html/rfc4648. (2006). Internet Engineering Task Force,
+ * Request for Comments: 4648.
+ *
+ * Alfred Klomp. 2014a. Fast Base64 encoding/decoding with SSE vectorization.
+ * http://www.alfredklomp.com/programming/sse-base64/. (2014).
+ *
+ * Alfred Klomp. 2014b. Fast Base64 stream encoder/decoder in C99, with SIMD
+ * acceleration. https://github.com/aklomp/base64. (2014).
+ *
+ * Hanson Char. 2014. A Fast and Correct Base 64 Codec. (2014).
+ * https://aws.amazon.com/blogs/developer/a-fast-and-correct-base-64-codec/
+ *
+ * Nick Kopp. 2013. Base64 Encoding on a GPU.
+ * https://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU. (2013).
+ */
 
-    We need 256-entry lookup table to get a compression pattern
-    and the number of output bytes in the compressed vector register.
-    Each entry occupies 17 bytes.
+struct block64 {
+  __m512i chunks[1];
+};
+
+template <bool base64_url>
+size_t encode_base64(char *dst, const char *src, size_t srclen,
+                     base64_options options) {
+  // credit: Wojciech Muła
+  const uint8_t *input = (const uint8_t *)src;
+
+  uint8_t *out = (uint8_t *)dst;
+  static const char *lookup_tbl =
+      base64_url
+          ? "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
+          : "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+
+  const __m512i shuffle_input = _mm512_setr_epi32(
+      0x01020001, 0x04050304, 0x07080607, 0x0a0b090a, 0x0d0e0c0d, 0x10110f10,
+      0x13141213, 0x16171516, 0x191a1819, 0x1c1d1b1c, 0x1f201e1f, 0x22232122,
+      0x25262425, 0x28292728, 0x2b2c2a2b, 0x2e2f2d2e);
+  const __m512i lookup =
+      _mm512_loadu_si512(reinterpret_cast<const __m512i *>(lookup_tbl));
+  const __m512i multi_shifts = _mm512_set1_epi64(UINT64_C(0x3036242a1016040a));
+  size_t size = srclen;
+  __mmask64 input_mask = 0xffffffffffff; // (1 << 48) - 1
+  while (size >= 48) {
+    const __m512i v = _mm512_maskz_loadu_epi8(
+        input_mask, reinterpret_cast<const __m512i *>(input));
+    const __m512i in = _mm512_permutexvar_epi8(shuffle_input, v);
+    const __m512i indices = _mm512_multishift_epi64_epi8(multi_shifts, in);
+    const __m512i result = _mm512_permutexvar_epi8(indices, lookup);
+    _mm512_storeu_si512(reinterpret_cast<__m512i *>(out), result);
+    out += 64;
+    input += 48;
+    size -= 48;
+  }
+  input_mask = ((__mmask64)1 << size) - 1;
+  const __m512i v = _mm512_maskz_loadu_epi8(
+      input_mask, reinterpret_cast<const __m512i *>(input));
+  const __m512i in = _mm512_permutexvar_epi8(shuffle_input, v);
+  const __m512i indices = _mm512_multishift_epi64_epi8(multi_shifts, in);
+  bool padding_needed =
+      (((options & base64_url) == 0) ^
+       ((options & base64_reverse_padding) == base64_reverse_padding));
+  size_t padding_amount = ((size % 3) > 0) ? (3 - (size % 3)) : 0;
+  size_t output_len = ((size + 2) / 3) * 4;
+  size_t non_padded_output_len = output_len - padding_amount;
+  if (!padding_needed) {
+    output_len = non_padded_output_len;
+  }
+  __mmask64 output_mask = output_len == 64 ? (__mmask64)UINT64_MAX
+                                           : ((__mmask64)1 << output_len) - 1;
+  __m512i result = _mm512_mask_permutexvar_epi8(
+      _mm512_set1_epi8('='), ((__mmask64)1 << non_padded_output_len) - 1,
+      indices, lookup);
+  _mm512_mask_storeu_epi8(reinterpret_cast<__m512i *>(out), output_mask,
+                          result);
+  return (size_t)(out - (uint8_t *)dst) + output_len;
+}
+
+template <bool base64_url>
+static inline uint64_t to_base64_mask(block64 *b, uint64_t *error,
+                                      uint64_t input_mask = UINT64_MAX) {
+  __m512i input = b->chunks[0];
+  const __m512i ascii_space_tbl = _mm512_set_epi8(
+      0, 0, 13, 12, 0, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 13, 12, 0, 10,
+      9, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 13, 12, 0, 10, 9, 0, 0, 0, 0, 0, 0,
+      0, 0, 32, 0, 0, 13, 12, 0, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0, 32);
+  __m512i lookup0;
+  if (base64_url) {
+    lookup0 = _mm512_set_epi8(
+        -128, -128, -128, -128, -128, -128, 61, 60, 59, 58, 57, 56, 55, 54, 53,
+        52, -128, -128, 62, -128, -128, -128, -128, -128, -128, -128, -128,
+        -128, -128, -128, -128, -1, -128, -128, -128, -128, -128, -128, -128,
+        -128, -128, -128, -128, -128, -128, -128, -128, -128, -128, -128, -1,
+        -128, -128, -1, -1, -128, -128, -128, -128, -128, -128, -128, -128, -1);
+  } else {
+    lookup0 = _mm512_set_epi8(
+        -128, -128, -128, -128, -128, -128, 61, 60, 59, 58, 57, 56, 55, 54, 53,
+        52, 63, -128, -128, -128, 62, -128, -128, -128, -128, -128, -128, -128,
+        -128, -128, -128, -1, -128, -128, -128, -128, -128, -128, -128, -128,
+        -128, -128, -128, -128, -128, -128, -128, -128, -128, -128, -1, -128,
+        -128, -1, -1, -128, -128, -128, -128, -128, -128, -128, -128, -128);
+  }
+  __m512i lookup1;
+  if (base64_url) {
+    lookup1 = _mm512_set_epi8(
+        -128, -128, -128, -128, -128, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42,
+        41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, -128,
+        63, -128, -128, -128, -128, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15,
+        14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -128);
+  } else {
+    lookup1 = _mm512_set_epi8(
+        -128, -128, -128, -128, -128, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42,
+        41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, -128,
+        -128, -128, -128, -128, -128, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16,
+        15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -128);
+  }
+
+  const __m512i translated = _mm512_permutex2var_epi8(lookup0, input, lookup1);
+  const __m512i combined = _mm512_or_si512(translated, input);
+  const __mmask64 mask = _mm512_movepi8_mask(combined) & input_mask;
+  if (mask) {
+    const __mmask64 spaces =
+        _mm512_cmpeq_epi8_mask(_mm512_shuffle_epi8(ascii_space_tbl, input),
+                               input) &
+        input_mask;
+    *error = (mask ^ spaces);
+  }
+  b->chunks[0] = translated;
+
+  return mask | (~input_mask);
+}
 
-    Ad 2.
+static inline void copy_block(block64 *b, char *output) {
+  _mm512_storeu_si512(reinterpret_cast<__m512i *>(output), b->chunks[0]);
+}
 
-    When values fit in 16-bit code units, but are above 0x07ff, then
-    a single word may produce one, two or three UTF8 bytes.
+static inline uint64_t compress_block(block64 *b, uint64_t mask, char *output) {
+  uint64_t nmask = ~mask;
+  __m512i c = _mm512_maskz_compress_epi8(nmask, b->chunks[0]);
+  _mm512_storeu_si512(reinterpret_cast<__m512i *>(output), c);
+  return _mm_popcnt_u64(nmask);
+}
 
-    We prepare data for all these three cases in two registers.
-    The first register contains lower two UTF8 bytes (used in all
-    cases), while the second one contains just the third byte for
-    the three-UTF8-bytes case.
+// The caller of this function is responsible for ensuring that 64 bytes are
+// available for reading at src. The data is read into a block64 structure.
+static inline void load_block(block64 *b, const char *src) {
+  b->chunks[0] = _mm512_loadu_si512(reinterpret_cast<const __m512i *>(src));
+}
 
-    Finally these two registers are interleaved forming eight-element
-    array of 32-bit values. The array spans two SSE registers.
-    The bytes from the registers are compressed using two shuffles.
+static inline void load_block_partial(block64 *b, const char *src,
+                                      __mmask64 input_mask) {
+  b->chunks[0] = _mm512_maskz_loadu_epi8(
+      input_mask, reinterpret_cast<const __m512i *>(src));
+}
 
-    We need 256-entry lookup table to get a compression pattern
-    and the number of output bytes in the compressed vector register.
-    Each entry occupies 17 bytes.
+// The caller of this function is responsible for ensuring that 128 bytes are
+// available for reading at src. The data is read into a block64 structure.
+static inline void load_block(block64 *b, const char16_t *src) {
+  __m512i m1 = _mm512_loadu_si512(reinterpret_cast<const __m512i *>(src));
+  __m512i m2 = _mm512_loadu_si512(reinterpret_cast<const __m512i *>(src + 32));
+  __m512i p = _mm512_packus_epi16(m1, m2);
+  b->chunks[0] =
+      _mm512_permutexvar_epi64(_mm512_setr_epi64(0, 2, 4, 6, 1, 3, 5, 7), p);
+}
 
+static inline void load_block_partial(block64 *b, const char16_t *src,
+                                      __mmask64 input_mask) {
+  __m512i m1 = _mm512_maskz_loadu_epi16((__mmask32)input_mask,
+                                        reinterpret_cast<const __m512i *>(src));
+  __m512i m2 =
+      _mm512_maskz_loadu_epi16((__mmask32)(input_mask >> 32),
+                               reinterpret_cast<const __m512i *>(src + 32));
+  __m512i p = _mm512_packus_epi16(m1, m2);
+  b->chunks[0] =
+      _mm512_permutexvar_epi64(_mm512_setr_epi64(0, 2, 4, 6, 1, 3, 5, 7), p);
+}
 
-    To summarize:
-    - We need two 256-entry tables that have 8704 bytes in total.
-*/
-/*
-  Returns a pair: the first unprocessed byte from buf and utf8_output
-  A scalar routing should carry on the conversion of the tail.
-*/
-template <endianness big_endian>
-std::pair<const char16_t *, char *>
-arm_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_out) {
-  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
-  const char16_t *end = buf + len;
+static inline void base64_decode(char *out, __m512i str) {
+  const __m512i merge_ab_and_bc =
+      _mm512_maddubs_epi16(str, _mm512_set1_epi32(0x01400140));
+  const __m512i merged =
+      _mm512_madd_epi16(merge_ab_and_bc, _mm512_set1_epi32(0x00011000));
+  const __m512i pack = _mm512_set_epi8(
+      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 61, 62, 56, 57, 58,
+      52, 53, 54, 48, 49, 50, 44, 45, 46, 40, 41, 42, 36, 37, 38, 32, 33, 34,
+      28, 29, 30, 24, 25, 26, 20, 21, 22, 16, 17, 18, 12, 13, 14, 8, 9, 10, 4,
+      5, 6, 0, 1, 2);
+  const __m512i shuffled = _mm512_permutexvar_epi8(pack, merged);
+  _mm512_mask_storeu_epi8(
+      (__m512i *)out, 0xffffffffffff,
+      shuffled); // mask would be 0xffffffffffff since we write 48 bytes.
+}
+// decode 64 bytes and output 48 bytes
+static inline void base64_decode_block(char *out, const char *src) {
+  base64_decode(out,
+                _mm512_loadu_si512(reinterpret_cast<const __m512i *>(src)));
+}
+static inline void base64_decode_block(char *out, block64 *b) {
+  base64_decode(out, b->chunks[0]);
+}
 
-  const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
-  const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
-  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
-  const size_t safety_margin =
-      12; // to avoid overruns, see issue
-          // https://github.com/simdutf/simdutf/issues/92
-  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
-    if (!match_system(big_endian)) {
-      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
+template <bool base64_url, typename chartype>
+full_result
+compress_decode_base64(char *dst, const chartype *src, size_t srclen,
+                       base64_options options,
+                       last_chunk_handling_options last_chunk_options) {
+  (void)options;
+  const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
+                                        : tables::base64::to_base64_value;
+  size_t equallocation =
+      srclen; // location of the first padding character if any
+  size_t equalsigns = 0;
+  // skip trailing spaces
+  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+         to_base64[uint8_t(src[srclen - 1])] == 64) {
+    srclen--;
+  }
+  if (srclen > 0 && src[srclen - 1] == '=') {
+    equallocation = srclen - 1;
+    srclen--;
+    equalsigns = 1;
+    // skip trailing spaces
+    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+           to_base64[uint8_t(src[srclen - 1])] == 64) {
+      srclen--;
     }
-    if (vmaxvq_u16(in) <= 0x7F) { // ASCII fast path!!!!
-      // It is common enough that we have sequences of 16 consecutive ASCII
-      // characters.
-      uint16x8_t nextin =
-          vld1q_u16(reinterpret_cast<const uint16_t *>(buf) + 8);
-      if (!match_system(big_endian)) {
-        nextin = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(nextin)));
+    if (srclen > 0 && src[srclen - 1] == '=') {
+      equallocation = srclen - 1;
+      srclen--;
+      equalsigns = 2;
+    }
+  }
+  if (srclen == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+    }
+    return {SUCCESS, 0, 0};
+  }
+  const chartype *const srcinit = src;
+  const char *const dstinit = dst;
+  const chartype *const srcend = src + srclen;
+
+  // figure out why block_size == 2 is sometimes best???
+  constexpr size_t block_size = 6;
+  char buffer[block_size * 64];
+  char *bufferptr = buffer;
+  if (srclen >= 64) {
+    const chartype *const srcend64 = src + srclen - 64;
+    while (src <= srcend64) {
+      block64 b;
+      load_block(&b, src);
+      src += 64;
+      uint64_t error = 0;
+      uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
+      if (error) {
+        src -= 64;
+        size_t error_offset = _tzcnt_u64(error);
+        return {error_code::INVALID_BASE64_CHARACTER,
+                size_t(src - srcinit + error_offset), size_t(dst - dstinit)};
       }
-      if (vmaxvq_u16(nextin) > 0x7F) {
-        // 1. pack the bytes
-        // obviously suboptimal.
-        uint8x8_t utf8_packed = vmovn_u16(in);
-        // 2. store (8 bytes)
-        vst1_u8(utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 8;
-        utf8_output += 8;
-        in = nextin;
+      if (badcharmask != 0) {
+        // optimization opportunity: check for simple masks like those made of
+        // continuous 1s followed by continuous 0s. And masks containing a
+        // single bad character.
+        bufferptr += compress_block(&b, badcharmask, bufferptr);
+      } else if (bufferptr != buffer) {
+        copy_block(&b, bufferptr);
+        bufferptr += 64;
       } else {
-        // 1. pack the bytes
-        // obviously suboptimal.
-        uint8x16_t utf8_packed = vmovn_high_u16(vmovn_u16(in), nextin);
-        // 2. store (16 bytes)
-        vst1q_u8(utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 16;
-        utf8_output += 16;
-        continue; // we are done for this round!
+        base64_decode_block(dst, &b);
+        dst += 48;
+      }
+      if (bufferptr >= (block_size - 1) * 64 + buffer) {
+        for (size_t i = 0; i < (block_size - 1); i++) {
+          base64_decode_block(dst, buffer + i * 64);
+          dst += 48;
+        }
+        std::memcpy(buffer, buffer + (block_size - 1) * 64,
+                    64); // 64 might be too much
+        bufferptr -= (block_size - 1) * 64;
       }
     }
+  }
 
-    if (vmaxvq_u16(in) <= 0x7FF) {
-      // 1. prepare 2-byte values
-      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-      // expected output   : [110a|aaaa|10bb|bbbb] x 8
-      const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
-      const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
-
-      // t0 = [000a|aaaa|bbbb|bb00]
-      const uint16x8_t t0 = vshlq_n_u16(in, 2);
-      // t1 = [000a|aaaa|0000|0000]
-      const uint16x8_t t1 = vandq_u16(t0, v_1f00);
-      // t2 = [0000|0000|00bb|bbbb]
-      const uint16x8_t t2 = vandq_u16(in, v_003f);
-      // t3 = [000a|aaaa|00bb|bbbb]
-      const uint16x8_t t3 = vorrq_u16(t1, t2);
-      // t4 = [110a|aaaa|10bb|bbbb]
-      const uint16x8_t t4 = vorrq_u16(t3, v_c080);
-      // 2. merge ASCII and 2-byte codewords
-      const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-      const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
-      const uint8x16_t utf8_unpacked =
-          vreinterpretq_u8_u16(vbslq_u16(one_byte_bytemask, in, t4));
-      // 3. prepare bitmask for 8-bit lookup
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-      const uint16x8_t mask = simdutf_make_uint16x8_t(
-          0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
-#else
-      const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
-                               0x0002, 0x0008, 0x0020, 0x0080};
-#endif
-      uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
-      // 4. pack the bytes
-      const uint8_t *row =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-      const uint8x16_t shuffle = vld1q_u8(row + 1);
-      const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
-
-      // 5. store bytes
-      vst1q_u8(utf8_output, utf8_packed);
-
-      // 6. adjust pointers
-      buf += 8;
-      utf8_output += row[0];
-      continue;
+  int last_block_len = (int)(srcend - src);
+  if (last_block_len != 0) {
+    __mmask64 input_mask = ((__mmask64)1 << last_block_len) - 1;
+    block64 b;
+    load_block_partial(&b, src, input_mask);
+    uint64_t error = 0;
+    uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error, input_mask);
+    if (error) {
+      size_t error_offset = _tzcnt_u64(error);
+      return {error_code::INVALID_BASE64_CHARACTER,
+              size_t(src - srcinit + error_offset), size_t(dst - dstinit)};
     }
-    const uint16x8_t surrogates_bytemask =
-        vceqq_u16(vandq_u16(in, v_f800), v_d800);
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
-    // However, it is likely an uncommon occurrence.
-    if (vmaxvq_u16(surrogates_bytemask) == 0) {
-      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-      const uint16x8_t dup_even = simdutf_make_uint16x8_t(
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
-#else
-      const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
-                                   0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
-#endif
-      /* In this branch we handle three cases:
-         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
-        single UFT-8 byte
-         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
-        UTF-8 bytes
-         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
-        three UTF-8 bytes
+    src += last_block_len;
+    bufferptr += compress_block(&b, badcharmask, bufferptr);
+  }
 
-        We expand the input word (16-bit) into two code units (32-bit), thus
-        we have room for four bytes. However, we need five distinct bit
-        layouts. Note that the last byte in cases #2 and #3 is the same.
+  char *buffer_start = buffer;
+  for (; buffer_start + 64 <= bufferptr; buffer_start += 64) {
+    base64_decode_block(dst, buffer_start);
+    dst += 48;
+  }
 
-        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-        in register t2.
+  if ((bufferptr - buffer_start) != 0) {
+    size_t rem = (bufferptr - buffer_start);
+    int idx = rem % 4;
+    __mmask64 mask = ((__mmask64)1 << rem) - 1;
+    __m512i input = _mm512_maskz_loadu_epi8(mask, buffer_start);
+    size_t output_len = (rem / 4) * 3;
+    __mmask64 output_mask = mask >> (rem - output_len);
+    const __m512i merge_ab_and_bc =
+        _mm512_maddubs_epi16(input, _mm512_set1_epi32(0x01400140));
+    const __m512i merged =
+        _mm512_madd_epi16(merge_ab_and_bc, _mm512_set1_epi32(0x00011000));
+    const __m512i pack = _mm512_set_epi8(
+        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 61, 62, 56, 57, 58,
+        52, 53, 54, 48, 49, 50, 44, 45, 46, 40, 41, 42, 36, 37, 38, 32, 33, 34,
+        28, 29, 30, 24, 25, 26, 20, 21, 22, 16, 17, 18, 12, 13, 14, 8, 9, 10, 4,
+        5, 6, 0, 1, 2);
+    const __m512i shuffled = _mm512_permutexvar_epi8(pack, merged);
+
+    if (last_chunk_options == last_chunk_handling_options::strict &&
+        (idx != 1) && ((idx + equalsigns) & 3) != 0) {
+      // The partial chunk was at src - idx
+      _mm512_mask_storeu_epi8((__m512i *)dst, output_mask, shuffled);
+      dst += output_len;
+      return {BASE64_INPUT_REMAINDER, size_t(src - srcinit),
+              size_t(dst - dstinit)};
+    } else if (last_chunk_options ==
+                   last_chunk_handling_options::stop_before_partial &&
+               (idx != 1) && ((idx + equalsigns) & 3) != 0) {
+      // Rewind src to before partial chunk
+      _mm512_mask_storeu_epi8((__m512i *)dst, output_mask, shuffled);
+      dst += output_len;
+      src -= idx;
+    } else {
+      if (idx == 2) {
+        if (last_chunk_options == last_chunk_handling_options::strict) {
+          uint32_t triple = (uint32_t(bufferptr[-2]) << 3 * 6) +
+                            (uint32_t(bufferptr[-1]) << 2 * 6);
+          if (triple & 0xffff) {
+            _mm512_mask_storeu_epi8((__m512i *)dst, output_mask, shuffled);
+            dst += output_len;
+            return {BASE64_EXTRA_BITS, size_t(src - srcinit),
+                    size_t(dst - dstinit)};
+          }
+        }
+        output_mask = (output_mask << 1) | 1;
+        output_len += 1;
+        _mm512_mask_storeu_epi8((__m512i *)dst, output_mask, shuffled);
+        dst += output_len;
+      } else if (idx == 3) {
+        if (last_chunk_options == last_chunk_handling_options::strict) {
+          uint32_t triple = (uint32_t(bufferptr[-3]) << 3 * 6) +
+                            (uint32_t(bufferptr[-2]) << 2 * 6) +
+                            (uint32_t(bufferptr[-1]) << 1 * 6);
+          if (triple & 0xff) {
+            _mm512_mask_storeu_epi8((__m512i *)dst, output_mask, shuffled);
+            dst += output_len;
+            return {BASE64_EXTRA_BITS, size_t(src - srcinit),
+                    size_t(dst - dstinit)};
+          }
+        }
+        output_mask = (output_mask << 2) | 3;
+        output_len += 2;
+        _mm512_mask_storeu_epi8((__m512i *)dst, output_mask, shuffled);
+        dst += output_len;
+      } else if (idx == 1) {
+        _mm512_mask_storeu_epi8((__m512i *)dst, output_mask, shuffled);
+        dst += output_len;
+        return {BASE64_INPUT_REMAINDER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
+      } else {
+        _mm512_mask_storeu_epi8((__m512i *)dst, output_mask, shuffled);
+        dst += output_len;
+      }
+    }
 
-        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-        either byte 1 for case #2 or byte 2 for case #3. Note that they
-        differ by exactly one bit.
+    if (last_chunk_options != stop_before_partial && equalsigns > 0) {
+      size_t output_count = size_t(dst - dstinit);
+      if ((output_count % 3 == 0) ||
+          ((output_count % 3) + 1 + equalsigns != 4)) {
+        return {INVALID_BASE64_CHARACTER, equallocation, output_count};
+      }
+    }
 
-        Finally from these two code units we build proper UTF-8 sequence, taking
-        into account the case (i.e, the number of bytes to write).
-      */
-      /**
-       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-       * t2 => [0ccc|cccc] [10cc|cccc]
-       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-       */
-#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
-      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-      const uint16x8_t t0 = vreinterpretq_u16_u8(
-          vqtbl1q_u8(vreinterpretq_u8_u16(in), vreinterpretq_u8_u16(dup_even)));
-      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-      const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
-      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const uint16x8_t t2 = vorrq_u16(t1, simdutf_vec(0b1000000000000000));
+    return {SUCCESS, srclen, size_t(dst - dstinit)};
+  }
 
-      // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
-      const uint16x8_t s0 = vshrq_n_u16(in, 12);
-      // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
-      const uint16x8_t s1 = vandq_u16(in, simdutf_vec(0b0000111111000000));
-      // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
-      const uint16x8_t s1s = vshlq_n_u16(s1, 2);
-      // [00bb|bbbb|0000|aaaa]
-      const uint16x8_t s2 = vorrq_u16(s0, s1s);
-      // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-      const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
-      const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
-      const uint16x8_t one_or_two_bytes_bytemask = vcleq_u16(in, v_07ff);
-      const uint16x8_t m0 =
-          vbicq_u16(simdutf_vec(0b0100000000000000), one_or_two_bytes_bytemask);
-      const uint16x8_t s4 = veorq_u16(s3, m0);
-#undef simdutf_vec
+  if (equalsigns > 0) {
+    if ((size_t(dst - dstinit) % 3 == 0) ||
+        ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation, size_t(dst - dstinit)};
+    }
+  }
+  return {SUCCESS, srclen, size_t(dst - dstinit)};
+}
+/* end file src/icelake/icelake_base64.inl.cpp */
 
-      // 4. expand code units 16-bit => 32-bit
-      const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
-      const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
+#include <cstdint>
 
-      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-      const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-      const uint16x8_t onemask = simdutf_make_uint16x8_t(
-          0x0001, 0x0004, 0x0010, 0x0040, 0x0100, 0x0400, 0x1000, 0x4000);
-      const uint16x8_t twomask = simdutf_make_uint16x8_t(
-          0x0002, 0x0008, 0x0020, 0x0080, 0x0200, 0x0800, 0x2000, 0x8000);
-#else
-      const uint16x8_t onemask = {0x0001, 0x0004, 0x0010, 0x0040,
-                                  0x0100, 0x0400, 0x1000, 0x4000};
-      const uint16x8_t twomask = {0x0002, 0x0008, 0x0020, 0x0080,
-                                  0x0200, 0x0800, 0x2000, 0x8000};
-#endif
-      const uint16x8_t combined =
-          vorrq_u16(vandq_u16(one_byte_bytemask, onemask),
-                    vandq_u16(one_or_two_bytes_bytemask, twomask));
-      const uint16_t mask = vaddvq_u16(combined);
-      // The following fast path may or may not be beneficial.
-      /*if(mask == 0) {
-        // We only have three-byte code units. Use fast path.
-        const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
-        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
-        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
-        vst1q_u8(utf8_output, utf8_0);
-        utf8_output += 12;
-        vst1q_u8(utf8_output, utf8_1);
-        utf8_output += 12;
-        buf += 8;
-        continue;
-      }*/
-      const uint8_t mask0 = uint8_t(mask);
+} // namespace
+} // namespace icelake
+} // namespace simdutf
 
-      const uint8_t *row0 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
-      const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
+namespace simdutf {
+namespace icelake {
 
-      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-      const uint8_t *row1 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
-      const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
+simdutf_warn_unused int
+implementation::detect_encodings(const char *input,
+                                 size_t length) const noexcept {
+  // If there is a BOM, then we trust it.
+  auto bom_encoding = simdutf::BOM::check_bom(input, length);
+  // TODO: convert to a one-pass algorithm
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
+  }
+  int out = 0;
+  if (validate_utf8(input, length)) {
+    out |= encoding_type::UTF8;
+  }
+  if ((length % 2) == 0) {
+    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
+                         length / 2)) {
+      out |= encoding_type::UTF16_LE;
+    }
+  }
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      out |= encoding_type::UTF32_LE;
+    }
+  }
+  return out;
+}
 
-      vst1q_u8(utf8_output, utf8_0);
-      utf8_output += row0[0];
-      vst1q_u8(utf8_output, utf8_1);
-      utf8_output += row1[0];
+simdutf_warn_unused bool
+implementation::validate_utf8(const char *buf, size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return true;
+  }
+  avx512_utf8_checker checker{};
+  const char *ptr = buf;
+  const char *end = ptr + len;
+  for (; end - ptr >= 64; ptr += 64) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    checker.check_next_input(utf8);
+  }
+  if (end != ptr) {
+    const __m512i utf8 = _mm512_maskz_loadu_epi8(
+        ~UINT64_C(0) >> (64 - (end - ptr)), (const __m512i *)ptr);
+    checker.check_next_input(utf8);
+  }
+  checker.check_eof();
+  return !checker.errors();
+}
 
-      buf += 8;
-      // surrogate pair(s) in a register
-    } else {
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // in the presence of surrogate pairs may require non-trivial tables.
-      size_t forward = 15;
-      size_t k = 0;
-      if (size_t(end - buf) < forward + 1) {
-        forward = size_t(end - buf - 1);
-      }
-      for (; k < forward; k++) {
-        uint16_t word = !match_system(big_endian)
-                            ? scalar::utf16::swap_bytes(buf[k])
-                            : buf[k];
-        if ((word & 0xFF80) == 0) {
-          *utf8_output++ = char(word);
-        } else if ((word & 0xF800) == 0) {
-          *utf8_output++ = char((word >> 6) | 0b11000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if ((word & 0xF800) != 0xD800) {
-          *utf8_output++ = char((word >> 12) | 0b11100000);
-          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else {
-          // must be a surrogate pair
-          uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = !match_system(big_endian)
-                                   ? scalar::utf16::swap_bytes(buf[k + 1])
-                                   : buf[k + 1];
-          k++;
-          uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if ((diff | diff2) > 0x3FF) {
-            return std::make_pair(nullptr,
-                                  reinterpret_cast<char *>(utf8_output));
-          }
-          uint32_t value = (diff << 10) + diff2 + 0x10000;
-          *utf8_output++ = char((value >> 18) | 0b11110000);
-          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((value & 0b111111) | 0b10000000);
-        }
-      }
-      buf += k;
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *buf, size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return result(error_code::SUCCESS, len);
+  }
+  avx512_utf8_checker checker{};
+  const char *ptr = buf;
+  const char *end = ptr + len;
+  size_t count{0};
+  for (; end - ptr >= 64; ptr += 64) {
+    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
+    checker.check_next_input(utf8);
+    if (checker.errors()) {
+      if (count != 0) {
+        count--;
+      } // Sometimes the error is only detected in the next chunk
+      result res = scalar::utf8::rewind_and_validate_with_errors(
+          reinterpret_cast<const char *>(buf),
+          reinterpret_cast<const char *>(buf + count), len - count);
+      res.count += count;
+      return res;
     }
-  } // while
+    count += 64;
+  }
+  if (end != ptr) {
+    const __m512i utf8 = _mm512_maskz_loadu_epi8(
+        ~UINT64_C(0) >> (64 - (end - ptr)), (const __m512i *)ptr);
+    checker.check_next_input(utf8);
+  }
+  checker.check_eof();
+  if (checker.errors()) {
+    if (count != 0) {
+      count--;
+    } // Sometimes the error is only detected in the next chunk
+    result res = scalar::utf8::rewind_and_validate_with_errors(
+        reinterpret_cast<const char *>(buf),
+        reinterpret_cast<const char *>(buf + count), len - count);
+    res.count += count;
+    return res;
+  }
+  return result(error_code::SUCCESS, len);
+}
+
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *buf, size_t len) const noexcept {
+  return icelake::validate_ascii(buf, len);
+}
 
-  return std::make_pair(buf, reinterpret_cast<char *>(utf8_output));
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *buf, size_t len) const noexcept {
+  const char *buf_orig = buf;
+  const char *end = buf + len;
+  const __m512i ascii = _mm512_set1_epi8((uint8_t)0x80);
+  for (; end - buf >= 64; buf += 64) {
+    const __m512i input = _mm512_loadu_si512((const __m512i *)buf);
+    __mmask64 notascii = _mm512_cmp_epu8_mask(input, ascii, _MM_CMPINT_NLT);
+    if (notascii) {
+      return result(error_code::TOO_LARGE,
+                    buf - buf_orig + _tzcnt_u64(notascii));
+    }
+  }
+  if (end != buf) {
+    const __m512i input = _mm512_maskz_loadu_epi8(
+        ~UINT64_C(0) >> (64 - (end - buf)), (const __m512i *)buf);
+    __mmask64 notascii = _mm512_cmp_epu8_mask(input, ascii, _MM_CMPINT_NLT);
+    if (notascii) {
+      return result(error_code::TOO_LARGE,
+                    buf - buf_orig + _tzcnt_u64(notascii));
+    }
+  }
+  return result(error_code::SUCCESS, len);
 }
 
-/*
-  Returns a pair: a result struct and utf8_output.
-  If there is an error, the count field of the result is the position of the
-  error. Otherwise, it is the position of the first unprocessed byte in buf
-  (even if finished). A scalar routing should carry on the conversion of the
-  tail if needed.
-*/
-template <endianness big_endian>
-std::pair<result, char *>
-arm_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
-                                      char *utf8_out) {
-  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
-  const char16_t *start = buf;
+simdutf_warn_unused bool
+implementation::validate_utf16le(const char16_t *buf,
+                                 size_t len) const noexcept {
   const char16_t *end = buf + len;
 
-  const uint16x8_t v_f800 = vmovq_n_u16((uint16_t)0xf800);
-  const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
-  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
-  const size_t safety_margin =
-      12; // to avoid overruns, see issue
-          // https://github.com/simdutf/simdutf/issues/92
-
-  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(buf));
-    if (!match_system(big_endian)) {
-      in = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(in)));
-    }
-    if (vmaxvq_u16(in) <= 0x7F) { // ASCII fast path!!!!
-      // It is common enough that we have sequences of 16 consecutive ASCII
-      // characters.
-      uint16x8_t nextin =
-          vld1q_u16(reinterpret_cast<const uint16_t *>(buf) + 8);
-      if (!match_system(big_endian)) {
-        nextin = vreinterpretq_u16_u8(vrev16q_u8(vreinterpretq_u8_u16(nextin)));
+  for (; end - buf >= 32;) {
+    __m512i in = _mm512_loadu_si512((__m512i *)buf);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        return false;
       }
-      if (vmaxvq_u16(nextin) > 0x7F) {
-        // 1. pack the bytes
-        // obviously suboptimal.
-        uint8x8_t utf8_packed = vmovn_u16(in);
-        // 2. store (8 bytes)
-        vst1_u8(utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 8;
-        utf8_output += 8;
-        in = nextin;
+      bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
+      if (ends_with_high) {
+        buf += 31; // advance only by 31 code units so that we start with the
+                   // high surrogate on the next round.
       } else {
-        // 1. pack the bytes
-        // obviously suboptimal.
-        uint8x16_t utf8_packed = vmovn_high_u16(vmovn_u16(in), nextin);
-        // 2. store (16 bytes)
-        vst1q_u8(utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 16;
-        utf8_output += 16;
-        continue; // we are done for this round!
+        buf += 32;
       }
+    } else {
+      buf += 32;
     }
-
-    if (vmaxvq_u16(in) <= 0x7FF) {
-      // 1. prepare 2-byte values
-      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-      // expected output   : [110a|aaaa|10bb|bbbb] x 8
-      const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
-      const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
-
-      // t0 = [000a|aaaa|bbbb|bb00]
-      const uint16x8_t t0 = vshlq_n_u16(in, 2);
-      // t1 = [000a|aaaa|0000|0000]
-      const uint16x8_t t1 = vandq_u16(t0, v_1f00);
-      // t2 = [0000|0000|00bb|bbbb]
-      const uint16x8_t t2 = vandq_u16(in, v_003f);
-      // t3 = [000a|aaaa|00bb|bbbb]
-      const uint16x8_t t3 = vorrq_u16(t1, t2);
-      // t4 = [110a|aaaa|10bb|bbbb]
-      const uint16x8_t t4 = vorrq_u16(t3, v_c080);
-      // 2. merge ASCII and 2-byte codewords
-      const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-      const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
-      const uint8x16_t utf8_unpacked =
-          vreinterpretq_u8_u16(vbslq_u16(one_byte_bytemask, in, t4));
-      // 3. prepare bitmask for 8-bit lookup
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-      const uint16x8_t mask = simdutf_make_uint16x8_t(
-          0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
-#else
-      const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
-                               0x0002, 0x0008, 0x0020, 0x0080};
-#endif
-      uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
-      // 4. pack the bytes
-      const uint8_t *row =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-      const uint8x16_t shuffle = vld1q_u8(row + 1);
-      const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
-
-      // 5. store bytes
-      vst1q_u8(utf8_output, utf8_packed);
-
-      // 6. adjust pointers
-      buf += 8;
-      utf8_output += row[0];
-      continue;
+  }
+  if (buf < end) {
+    __m512i in =
+        _mm512_maskz_loadu_epi16((1U << (end - buf)) - 1, (__m512i *)buf);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        return false;
+      }
     }
-    const uint16x8_t surrogates_bytemask =
-        vceqq_u16(vandq_u16(in, v_f800), v_d800);
-    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
-    // However, it is likely an uncommon occurrence.
-    if (vmaxvq_u16(surrogates_bytemask) == 0) {
-      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-      const uint16x8_t dup_even = simdutf_make_uint16x8_t(
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
-#else
-      const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
-                                   0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
-#endif
-      /* In this branch we handle three cases:
-         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
-        single UFT-8 byte
-         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
-        UTF-8 bytes
-         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
-        three UTF-8 bytes
-
-        We expand the input word (16-bit) into two code units (32-bit), thus
-        we have room for four bytes. However, we need five distinct bit
-        layouts. Note that the last byte in cases #2 and #3 is the same.
-
-        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-        in register t2.
-
-        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-        either byte 1 for case #2 or byte 2 for case #3. Note that they
-        differ by exactly one bit.
-
-        Finally from these two code units we build proper UTF-8 sequence, taking
-        into account the case (i.e, the number of bytes to write).
-      */
-      /**
-       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-       * t2 => [0ccc|cccc] [10cc|cccc]
-       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-       */
-#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
-      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-      const uint16x8_t t0 = vreinterpretq_u16_u8(
-          vqtbl1q_u8(vreinterpretq_u8_u16(in), vreinterpretq_u8_u16(dup_even)));
-      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-      const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
-      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const uint16x8_t t2 = vorrq_u16(t1, simdutf_vec(0b1000000000000000));
+  }
+  return true;
+}
 
-      // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
-      const uint16x8_t s0 = vshrq_n_u16(in, 12);
-      // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
-      const uint16x8_t s1 = vandq_u16(in, simdutf_vec(0b0000111111000000));
-      // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
-      const uint16x8_t s1s = vshlq_n_u16(s1, 2);
-      // [00bb|bbbb|0000|aaaa]
-      const uint16x8_t s2 = vorrq_u16(s0, s1s);
-      // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-      const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
-      const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
-      const uint16x8_t one_or_two_bytes_bytemask = vcleq_u16(in, v_07ff);
-      const uint16x8_t m0 =
-          vbicq_u16(simdutf_vec(0b0100000000000000), one_or_two_bytes_bytemask);
-      const uint16x8_t s4 = veorq_u16(s3, m0);
-#undef simdutf_vec
+simdutf_warn_unused bool
+implementation::validate_utf16be(const char16_t *buf,
+                                 size_t len) const noexcept {
+  const char16_t *end = buf + len;
+  const __m512i byteflip = _mm512_setr_epi64(
+      0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
+      0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
+      0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  for (; end - buf >= 32;) {
+    __m512i in =
+        _mm512_shuffle_epi8(_mm512_loadu_si512((__m512i *)buf), byteflip);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        return false;
+      }
+      bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
+      if (ends_with_high) {
+        buf += 31; // advance only by 31 code units so that we start with the
+                   // high surrogate on the next round.
+      } else {
+        buf += 32;
+      }
+    } else {
+      buf += 32;
+    }
+  }
+  if (buf < end) {
+    __m512i in = _mm512_shuffle_epi8(
+        _mm512_maskz_loadu_epi16((1U << (end - buf)) - 1, (__m512i *)buf),
+        byteflip);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        return false;
+      }
+    }
+  }
+  return true;
+}
 
-      // 4. expand code units 16-bit => 32-bit
-      const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
-      const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
+simdutf_warn_unused result implementation::validate_utf16le_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  const char16_t *start_buf = buf;
+  const char16_t *end = buf + len;
+  for (; end - buf >= 32;) {
+    __m512i in = _mm512_loadu_si512((__m512i *)buf);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        uint32_t extra_low = _tzcnt_u32(lowsurrogates & ~(highsurrogates << 1));
+        uint32_t extra_high =
+            _tzcnt_u32(highsurrogates & ~(lowsurrogates >> 1));
+        return result(error_code::SURROGATE,
+                      (buf - start_buf) +
+                          (extra_low < extra_high ? extra_low : extra_high));
+      }
+      bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
+      if (ends_with_high) {
+        buf += 31; // advance only by 31 code units so that we start with the
+                   // high surrogate on the next round.
+      } else {
+        buf += 32;
+      }
+    } else {
+      buf += 32;
+    }
+  }
+  if (buf < end) {
+    __m512i in =
+        _mm512_maskz_loadu_epi16((1U << (end - buf)) - 1, (__m512i *)buf);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        uint32_t extra_low = _tzcnt_u32(lowsurrogates & ~(highsurrogates << 1));
+        uint32_t extra_high =
+            _tzcnt_u32(highsurrogates & ~(lowsurrogates >> 1));
+        return result(error_code::SURROGATE,
+                      (buf - start_buf) +
+                          (extra_low < extra_high ? extra_low : extra_high));
+      }
+    }
+  }
+  return result(error_code::SUCCESS, len);
+}
 
-      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-      const uint16x8_t one_byte_bytemask = vcleq_u16(in, v_007f);
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-      const uint16x8_t onemask = simdutf_make_uint16x8_t(
-          0x0001, 0x0004, 0x0010, 0x0040, 0x0100, 0x0400, 0x1000, 0x4000);
-      const uint16x8_t twomask = simdutf_make_uint16x8_t(
-          0x0002, 0x0008, 0x0020, 0x0080, 0x0200, 0x0800, 0x2000, 0x8000);
-#else
-      const uint16x8_t onemask = {0x0001, 0x0004, 0x0010, 0x0040,
-                                  0x0100, 0x0400, 0x1000, 0x4000};
-      const uint16x8_t twomask = {0x0002, 0x0008, 0x0020, 0x0080,
-                                  0x0200, 0x0800, 0x2000, 0x8000};
-#endif
-      const uint16x8_t combined =
-          vorrq_u16(vandq_u16(one_byte_bytemask, onemask),
-                    vandq_u16(one_or_two_bytes_bytemask, twomask));
-      const uint16_t mask = vaddvq_u16(combined);
-      // The following fast path may or may not be beneficial.
-      /*if(mask == 0) {
-        // We only have three-byte code units. Use fast path.
-        const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
-        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
-        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
-        vst1q_u8(utf8_output, utf8_0);
-        utf8_output += 12;
-        vst1q_u8(utf8_output, utf8_1);
-        utf8_output += 12;
-        buf += 8;
-        continue;
-      }*/
-      const uint8_t mask0 = uint8_t(mask);
+simdutf_warn_unused result implementation::validate_utf16be_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  const char16_t *start_buf = buf;
+  const char16_t *end = buf + len;
+  const __m512i byteflip = _mm512_setr_epi64(
+      0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
+      0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
+      0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  for (; end - buf >= 32;) {
+    __m512i in =
+        _mm512_shuffle_epi8(_mm512_loadu_si512((__m512i *)buf), byteflip);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        uint32_t extra_low = _tzcnt_u32(lowsurrogates & ~(highsurrogates << 1));
+        uint32_t extra_high =
+            _tzcnt_u32(highsurrogates & ~(lowsurrogates >> 1));
+        return result(error_code::SURROGATE,
+                      (buf - start_buf) +
+                          (extra_low < extra_high ? extra_low : extra_high));
+      }
+      bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
+      if (ends_with_high) {
+        buf += 31; // advance only by 31 code units so that we start with the
+                   // high surrogate on the next round.
+      } else {
+        buf += 32;
+      }
+    } else {
+      buf += 32;
+    }
+  }
+  if (buf < end) {
+    __m512i in = _mm512_shuffle_epi8(
+        _mm512_maskz_loadu_epi16((1U << (end - buf)) - 1, (__m512i *)buf),
+        byteflip);
+    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
+    __mmask32 surrogates =
+        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
+    if (surrogates) {
+      __mmask32 highsurrogates =
+          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
+      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
+      // high must be followed by low
+      if ((highsurrogates << 1) != lowsurrogates) {
+        uint32_t extra_low = _tzcnt_u32(lowsurrogates & ~(highsurrogates << 1));
+        uint32_t extra_high =
+            _tzcnt_u32(highsurrogates & ~(lowsurrogates >> 1));
+        return result(error_code::SURROGATE,
+                      (buf - start_buf) +
+                          (extra_low < extra_high ? extra_low : extra_high));
+      }
+    }
+  }
+  return result(error_code::SUCCESS, len);
+}
 
-      const uint8_t *row0 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
-      const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
+simdutf_warn_unused bool
+implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
+  const char32_t *tail = icelake::validate_utf32(buf, len);
+  if (tail) {
+    return scalar::utf32::validate(tail, len - (tail - buf));
+  } else {
+    // we come here if there was an error, or buf was nullptr which may happen
+    // for empty input.
+    return len == 0;
+  }
+}
 
-      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-      const uint8_t *row1 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
-      const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
+simdutf_warn_unused result implementation::validate_utf32_with_errors(
+    const char32_t *buf, size_t len) const noexcept {
+  const char32_t *buf_orig = buf;
+  if (len >= 16) {
+    const char32_t *end = buf + len - 16;
+    while (buf <= end) {
+      __m512i utf32 = _mm512_loadu_si512((const __m512i *)buf);
+      __mmask16 outside_range = _mm512_cmp_epu32_mask(
+          utf32, _mm512_set1_epi32(0x10ffff), _MM_CMPINT_GT);
 
-      vst1q_u8(utf8_output, utf8_0);
-      utf8_output += row0[0];
-      vst1q_u8(utf8_output, utf8_1);
-      utf8_output += row1[0];
+      __m512i utf32_off =
+          _mm512_add_epi32(utf32, _mm512_set1_epi32(0xffff2000));
 
-      buf += 8;
-      // surrogate pair(s) in a register
-    } else {
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // in the presence of surrogate pairs may require non-trivial tables.
-      size_t forward = 15;
-      size_t k = 0;
-      if (size_t(end - buf) < forward + 1) {
-        forward = size_t(end - buf - 1);
-      }
-      for (; k < forward; k++) {
-        uint16_t word = !match_system(big_endian)
-                            ? scalar::utf16::swap_bytes(buf[k])
-                            : buf[k];
-        if ((word & 0xFF80) == 0) {
-          *utf8_output++ = char(word);
-        } else if ((word & 0xF800) == 0) {
-          *utf8_output++ = char((word >> 6) | 0b11000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if ((word & 0xF800) != 0xD800) {
-          *utf8_output++ = char((word >> 12) | 0b11100000);
-          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else {
-          // must be a surrogate pair
-          uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word = !match_system(big_endian)
-                                   ? scalar::utf16::swap_bytes(buf[k + 1])
-                                   : buf[k + 1];
-          k++;
-          uint16_t diff2 = uint16_t(next_word - 0xDC00);
-          if ((diff | diff2) > 0x3FF) {
-            return std::make_pair(
-                result(error_code::SURROGATE, buf - start + k - 1),
-                reinterpret_cast<char *>(utf8_output));
-          }
-          uint32_t value = (diff << 10) + diff2 + 0x10000;
-          *utf8_output++ = char((value >> 18) | 0b11110000);
-          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((value & 0b111111) | 0b10000000);
+      __mmask16 surrogate_range = _mm512_cmp_epu32_mask(
+          utf32_off, _mm512_set1_epi32(0xfffff7ff), _MM_CMPINT_GT);
+      if ((outside_range | surrogate_range)) {
+        auto outside_idx = _tzcnt_u32(outside_range);
+        auto surrogate_idx = _tzcnt_u32(surrogate_range);
+
+        if (outside_idx < surrogate_idx) {
+          return result(error_code::TOO_LARGE, buf - buf_orig + outside_idx);
         }
+
+        return result(error_code::SURROGATE, buf - buf_orig + surrogate_idx);
       }
-      buf += k;
+
+      buf += 16;
     }
-  } // while
+  }
+  if (len > 0) {
+    __m512i utf32 = _mm512_maskz_loadu_epi32(
+        __mmask16((1U << (buf_orig + len - buf)) - 1), (const __m512i *)buf);
+    __mmask16 outside_range = _mm512_cmp_epu32_mask(
+        utf32, _mm512_set1_epi32(0x10ffff), _MM_CMPINT_GT);
+    __m512i utf32_off = _mm512_add_epi32(utf32, _mm512_set1_epi32(0xffff2000));
 
-  return std::make_pair(result(error_code::SUCCESS, buf - start),
-                        reinterpret_cast<char *>(utf8_output));
+    __mmask16 surrogate_range = _mm512_cmp_epu32_mask(
+        utf32_off, _mm512_set1_epi32(0xfffff7ff), _MM_CMPINT_GT);
+    if ((outside_range | surrogate_range)) {
+      auto outside_idx = _tzcnt_u32(outside_range);
+      auto surrogate_idx = _tzcnt_u32(surrogate_range);
+
+      if (outside_idx < surrogate_idx) {
+        return result(error_code::TOO_LARGE, buf - buf_orig + outside_idx);
+      }
+
+      return result(error_code::SURROGATE, buf - buf_orig + surrogate_idx);
+    }
+  }
+
+  return result(error_code::SUCCESS, len);
 }
-/* end file src/arm64/arm_convert_utf16_to_utf8.cpp */
 
-/* begin file src/arm64/arm_base64.cpp */
-/**
- * References and further reading:
- *
- * Wojciech Muła, Daniel Lemire, Base64 encoding and decoding at almost the
- * speed of a memory copy, Software: Practice and Experience 50 (2), 2020.
- * https://arxiv.org/abs/1910.05109
- *
- * Wojciech Muła, Daniel Lemire, Faster Base64 Encoding and Decoding using AVX2
- * Instructions, ACM Transactions on the Web 12 (3), 2018.
- * https://arxiv.org/abs/1704.00605
- *
- * Simon Josefsson. 2006. The Base16, Base32, and Base64 Data Encodings.
- * https://tools.ietf.org/html/rfc4648. (2006). Internet Engineering Task Force,
- * Request for Comments: 4648.
- *
- * Alfred Klomp. 2014a. Fast Base64 encoding/decoding with SSE vectorization.
- * http://www.alfredklomp.com/programming/sse-base64/. (2014).
- *
- * Alfred Klomp. 2014b. Fast Base64 stream encoder/decoder in C99, with SIMD
- * acceleration. https://github.com/aklomp/base64. (2014).
- *
- * Hanson Char. 2014. A Fast and Correct Base 64 Codec. (2014).
- * https://aws.amazon.com/blogs/developer/a-fast-and-correct-base-64-codec/
- *
- * Nick Kopp. 2013. Base64 Encoding on a GPU.
- * https://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU. (2013).
- */
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
+    const char *buf, size_t len, char *utf8_output) const noexcept {
+  return icelake::latin1_to_utf8_avx512_start(buf, len, utf8_output);
+}
 
-size_t encode_base64(char *dst, const char *src, size_t srclen,
-                     base64_options options) {
-  // credit: Wojciech Muła
-  uint8_t *out = (uint8_t *)dst;
-  constexpr static uint8_t source_table[64] = {
-      'A', 'Q', 'g', 'w', 'B', 'R', 'h', 'x', 'C', 'S', 'i', 'y', 'D',
-      'T', 'j', 'z', 'E', 'U', 'k', '0', 'F', 'V', 'l', '1', 'G', 'W',
-      'm', '2', 'H', 'X', 'n', '3', 'I', 'Y', 'o', '4', 'J', 'Z', 'p',
-      '5', 'K', 'a', 'q', '6', 'L', 'b', 'r', '7', 'M', 'c', 's', '8',
-      'N', 'd', 't', '9', 'O', 'e', 'u', '+', 'P', 'f', 'v', '/',
-  };
-  constexpr static uint8_t source_table_url[64] = {
-      'A', 'Q', 'g', 'w', 'B', 'R', 'h', 'x', 'C', 'S', 'i', 'y', 'D',
-      'T', 'j', 'z', 'E', 'U', 'k', '0', 'F', 'V', 'l', '1', 'G', 'W',
-      'm', '2', 'H', 'X', 'n', '3', 'I', 'Y', 'o', '4', 'J', 'Z', 'p',
-      '5', 'K', 'a', 'q', '6', 'L', 'b', 'r', '7', 'M', 'c', 's', '8',
-      'N', 'd', 't', '9', 'O', 'e', 'u', '-', 'P', 'f', 'v', '_',
-  };
-  const uint8x16_t v3f = vdupq_n_u8(0x3f);
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  // When trying to load a uint8_t array, Visual Studio might
-  // error with: error C2664: '__n128x4 neon_ld4m_q8(const char *)':
-  // cannot convert argument 1 from 'const uint8_t [64]' to 'const char *
-  const uint8x16x4_t table = vld4q_u8(
-      (reinterpret_cast<const char *>(options & base64_url) ? source_table_url
-                                                            : source_table));
-#else
-  const uint8x16x4_t table =
-      vld4q_u8((options & base64_url) ? source_table_url : source_table);
-#endif
-  size_t i = 0;
-  for (; i + 16 * 3 <= srclen; i += 16 * 3) {
-    const uint8x16x3_t in = vld3q_u8((const uint8_t *)src + i);
-    uint8x16x4_t result;
-    result.val[0] = vshrq_n_u8(in.val[0], 2);
-    result.val[1] =
-        vandq_u8(vsliq_n_u8(vshrq_n_u8(in.val[1], 4), in.val[0], 4), v3f);
-    result.val[2] =
-        vandq_u8(vsliq_n_u8(vshrq_n_u8(in.val[2], 6), in.val[1], 2), v3f);
-    result.val[3] = vandq_u8(in.val[2], v3f);
-    result.val[0] = vqtbl4q_u8(table, result.val[0]);
-    result.val[1] = vqtbl4q_u8(table, result.val[1]);
-    result.val[2] = vqtbl4q_u8(table, result.val[2]);
-    result.val[3] = vqtbl4q_u8(table, result.val[3]);
-    vst4q_u8(out, result);
-    out += 64;
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return icelake_convert_latin1_to_utf16<endianness::LITTLE>(buf, len,
+                                                             utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return icelake_convert_latin1_to_utf16<endianness::BIG>(buf, len,
+                                                          utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char *, char32_t *> ret =
+      avx512_convert_latin1_to_utf32(buf, len, utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
   }
-  out += scalar::base64::tail_encode_base64((char *)out, src + i, srclen - i,
-                                            options);
+  size_t converted_chars = ret.second - utf32_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_converted_chars == 0) {
+      return 0;
+    }
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
+}
 
-  return size_t((char *)out - dst);
+simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake::utf8_to_latin1_avx512(buf, len, latin1_output);
 }
 
-static inline void compress(uint8x16_t data, uint16_t mask, char *output) {
-  if (mask == 0) {
-    vst1q_u8((uint8_t *)output, data);
-    return;
+simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  // First, try to convert as much as possible using the SIMD implementation.
+  const char *obuf = buf;
+  char *olatin1_output = latin1_output;
+  size_t written = icelake::utf8_to_latin1_avx512(obuf, len, olatin1_output);
+
+  // If we have completely converted the string
+  if (obuf == buf + len) {
+    return {simdutf::SUCCESS, written};
   }
-  uint8_t mask1 = uint8_t(mask);      // least significant 8 bits
-  uint8_t mask2 = uint8_t(mask >> 8); // most significant 8 bits
-  uint64x2_t compactmasku64 = {tables::base64::thintable_epi8[mask1],
-                               tables::base64::thintable_epi8[mask2]};
-  uint8x16_t compactmask = vreinterpretq_u8_u64(compactmasku64);
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  const uint8x16_t off =
-      simdutf_make_uint8x16_t(0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8);
-#else
-  const uint8x16_t off = {0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8};
-#endif
+  size_t pos = obuf - buf;
+  result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+      pos, buf + pos, len - pos, latin1_output);
+  res.count += pos;
+  return res;
+}
 
-  compactmask = vaddq_u8(compactmask, off);
-  uint8x16_t pruned = vqtbl1q_u8(data, compactmask);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake::valid_utf8_to_latin1_avx512(buf, len, latin1_output);
+}
 
-  int pop1 = tables::base64::BitsSetTable256mul2[mask1];
-  // then load the corresponding mask, what it does is to write
-  // only the first pop1 bytes from the first 8 bytes, and then
-  // it fills in with the bytes from the second 8 bytes + some filling
-  // at the end.
-  compactmask = vld1q_u8(tables::base64::pshufb_combine_table + pop1 * 8);
-  uint8x16_t answer = vqtbl1q_u8(pruned, compactmask);
-  vst1q_u8((uint8_t *)output, answer);
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16_result ret =
+      fast_avx512_convert_utf8_to_utf16<endianness::LITTLE>(buf, len,
+                                                            utf16_output);
+  if (ret.second == nullptr) {
+    return 0;
+  }
+  return ret.second - utf16_output;
 }
 
-struct block64 {
-  uint8x16_t chunks[4];
-};
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16_result ret = fast_avx512_convert_utf8_to_utf16<endianness::BIG>(
+      buf, len, utf16_output);
+  if (ret.second == nullptr) {
+    return 0;
+  }
+  return ret.second - utf16_output;
+}
 
-static_assert(sizeof(block64) == 64, "block64 is not 64 bytes");
-template <bool base64_url> uint64_t to_base64_mask(block64 *b, bool *error) {
-  uint8x16_t v0f = vdupq_n_u8(0xf);
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return fast_avx512_convert_utf8_to_utf16_with_errors<endianness::LITTLE>(
+      buf, len, utf16_output);
+}
 
-  uint8x16_t underscore0, underscore1, underscore2, underscore3;
-  if (base64_url) {
-    underscore0 = vceqq_u8(b->chunks[0], vdupq_n_u8(0x5f));
-    underscore1 = vceqq_u8(b->chunks[1], vdupq_n_u8(0x5f));
-    underscore2 = vceqq_u8(b->chunks[2], vdupq_n_u8(0x5f));
-    underscore3 = vceqq_u8(b->chunks[3], vdupq_n_u8(0x5f));
-  } else {
-    (void)underscore0;
-    (void)underscore1;
-    (void)underscore2;
-    (void)underscore3;
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return fast_avx512_convert_utf8_to_utf16_with_errors<endianness::BIG>(
+      buf, len, utf16_output);
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16_result ret =
+      icelake::valid_utf8_to_fixed_length<endianness::LITTLE, char16_t>(
+          buf, len, utf16_output);
+  size_t saved_bytes = ret.second - utf16_output;
+  const char *end = buf + len;
+  if (ret.first == end) {
+    return saved_bytes;
   }
 
-  uint8x16_t lo_nibbles0 = vandq_u8(b->chunks[0], v0f);
-  uint8x16_t lo_nibbles1 = vandq_u8(b->chunks[1], v0f);
-  uint8x16_t lo_nibbles2 = vandq_u8(b->chunks[2], v0f);
-  uint8x16_t lo_nibbles3 = vandq_u8(b->chunks[3], v0f);
+  // Note: AVX512 procedure looks up 4 bytes forward, and
+  //       correctly converts multi-byte chars even if their
+  //       continuation bytes lie outside 16-byte window.
+  //       It means, we have to skip continuation bytes from
+  //       the beginning ret.first, as they were already consumed.
+  while (ret.first != end && ((uint8_t(*ret.first) & 0xc0) == 0x80)) {
+    ret.first += 1;
+  }
 
-  // Needed by the decoding step.
-  uint8x16_t hi_nibbles0 = vshrq_n_u8(b->chunks[0], 4);
-  uint8x16_t hi_nibbles1 = vshrq_n_u8(b->chunks[1], 4);
-  uint8x16_t hi_nibbles2 = vshrq_n_u8(b->chunks[2], 4);
-  uint8x16_t hi_nibbles3 = vshrq_n_u8(b->chunks[3], 4);
-  uint8x16_t lut_lo;
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  if (base64_url) {
-    lut_lo =
-        simdutf_make_uint8x16_t(0x3a, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70,
-                                0x70, 0x61, 0xe1, 0xf4, 0xe5, 0xa5, 0xf4, 0xf4);
-  } else {
-    lut_lo =
-        simdutf_make_uint8x16_t(0x3a, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70,
-                                0x70, 0x61, 0xe1, 0xb4, 0xe5, 0xe5, 0xf4, 0xb4);
+  if (ret.first != end) {
+    const size_t scalar_saved_bytes =
+        scalar::utf8_to_utf16::convert_valid<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
   }
-#else
-  if (base64_url) {
-    lut_lo = uint8x16_t{0x3a, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70,
-                        0x70, 0x61, 0xe1, 0xf4, 0xe5, 0xa5, 0xf4, 0xf4};
-  } else {
-    lut_lo = uint8x16_t{0x3a, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70, 0x70,
-                        0x70, 0x61, 0xe1, 0xb4, 0xe5, 0xe5, 0xf4, 0xb4};
+
+  return saved_bytes;
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16_result ret =
+      icelake::valid_utf8_to_fixed_length<endianness::BIG, char16_t>(
+          buf, len, utf16_output);
+  size_t saved_bytes = ret.second - utf16_output;
+  const char *end = buf + len;
+  if (ret.first == end) {
+    return saved_bytes;
   }
-#endif
-  uint8x16_t lo0 = vqtbl1q_u8(lut_lo, lo_nibbles0);
-  uint8x16_t lo1 = vqtbl1q_u8(lut_lo, lo_nibbles1);
-  uint8x16_t lo2 = vqtbl1q_u8(lut_lo, lo_nibbles2);
-  uint8x16_t lo3 = vqtbl1q_u8(lut_lo, lo_nibbles3);
-  uint8x16_t lut_hi;
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  if (base64_url) {
-    lut_hi =
-        simdutf_make_uint8x16_t(0x11, 0x20, 0x42, 0x80, 0x8, 0x4, 0x8, 0x4,
-                                0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20);
-  } else {
-    lut_hi =
-        simdutf_make_uint8x16_t(0x11, 0x20, 0x42, 0x80, 0x8, 0x4, 0x8, 0x4,
-                                0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20);
+
+  // Note: AVX512 procedure looks up 4 bytes forward, and
+  //       correctly converts multi-byte chars even if their
+  //       continuation bytes lie outside 16-byte window.
+  //       It means, we have to skip continuation bytes from
+  //       the beginning ret.first, as they were already consumed.
+  while (ret.first != end && ((uint8_t(*ret.first) & 0xc0) == 0x80)) {
+    ret.first += 1;
   }
-#else
-  if (base64_url) {
-    lut_hi = uint8x16_t{0x11, 0x20, 0x42, 0x80, 0x8,  0x4,  0x8,  0x4,
-                        0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20};
-  } else {
-    lut_hi = uint8x16_t{0x11, 0x20, 0x42, 0x80, 0x8,  0x4,  0x8,  0x4,
-                        0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20};
+
+  if (ret.first != end) {
+    const size_t scalar_saved_bytes =
+        scalar::utf8_to_utf16::convert_valid<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
   }
-#endif
-  uint8x16_t hi0 = vqtbl1q_u8(lut_hi, hi_nibbles0);
-  uint8x16_t hi1 = vqtbl1q_u8(lut_hi, hi_nibbles1);
-  uint8x16_t hi2 = vqtbl1q_u8(lut_hi, hi_nibbles2);
-  uint8x16_t hi3 = vqtbl1q_u8(lut_hi, hi_nibbles3);
 
-  if (base64_url) {
-    hi0 = vbicq_u8(hi0, underscore0);
-    hi1 = vbicq_u8(hi1, underscore1);
-    hi2 = vbicq_u8(hi2, underscore2);
-    hi3 = vbicq_u8(hi3, underscore3);
+  return saved_bytes;
+}
+
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_out) const noexcept {
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
+  utf8_to_utf32_result ret =
+      icelake::validating_utf8_to_fixed_length<endianness::LITTLE, uint32_t>(
+          buf, len, utf32_output);
+  if (ret.second == nullptr)
+    return 0;
+
+  size_t saved_bytes = ret.second - utf32_output;
+  const char *end = buf + len;
+  if (ret.first == end) {
+    return saved_bytes;
   }
 
-  uint8_t checks =
-      vmaxvq_u8(vorrq_u8(vorrq_u8(vandq_u8(lo0, hi0), vandq_u8(lo1, hi1)),
-                         vorrq_u8(vandq_u8(lo2, hi2), vandq_u8(lo3, hi3))));
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  const uint8x16_t bit_mask =
-      simdutf_make_uint8x16_t(0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
-                              0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80);
-#else
-  const uint8x16_t bit_mask = {0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80,
-                               0x01, 0x02, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80};
-#endif
-  uint64_t badcharmask = 0;
-  *error = checks > 0x3;
-  if (checks) {
-    // Add each of the elements next to each other, successively, to stuff each
-    // 8 byte mask into one.
-    uint8x16_t test0 = vtstq_u8(lo0, hi0);
-    uint8x16_t test1 = vtstq_u8(lo1, hi1);
-    uint8x16_t test2 = vtstq_u8(lo2, hi2);
-    uint8x16_t test3 = vtstq_u8(lo3, hi3);
-    uint8x16_t sum0 =
-        vpaddq_u8(vandq_u8(test0, bit_mask), vandq_u8(test1, bit_mask));
-    uint8x16_t sum1 =
-        vpaddq_u8(vandq_u8(test2, bit_mask), vandq_u8(test3, bit_mask));
-    sum0 = vpaddq_u8(sum0, sum1);
-    sum0 = vpaddq_u8(sum0, sum0);
-    badcharmask = vgetq_lane_u64(vreinterpretq_u64_u8(sum0), 0);
+  // Note: the AVX512 procedure looks up 4 bytes forward, and
+  //       correctly converts multi-byte chars even if their
+  //       continuation bytes lie outside 16-byte window.
+  //       It means, we have to skip continuation bytes from
+  //       the beginning ret.first, as they were already consumed.
+  while (ret.first != end && ((uint8_t(*ret.first) & 0xc0) == 0x80)) {
+    ret.first += 1;
   }
-  // This is the transformation step that can be done while we are waiting for
-  // sum0
-  uint8x16_t roll_lut;
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-  if (base64_url) {
-    roll_lut =
-        simdutf_make_uint8x16_t(0xe0, 0x11, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
-                                0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0);
-  } else {
-    roll_lut =
-        simdutf_make_uint8x16_t(0x0, 0x10, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
-                                0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0);
+  if (ret.first != end) {
+    const size_t scalar_saved_bytes = scalar::utf8_to_utf32::convert(
+        ret.first, len - (ret.first - buf), utf32_out + saved_bytes);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
   }
-#else
-  if (base64_url) {
-    roll_lut = uint8x16_t{0xe0, 0x11, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
-                          0x0,  0x0,  0x0,  0x0, 0x0,  0x0,  0x0,  0x0};
-  } else {
-    roll_lut = uint8x16_t{0x0, 0x10, 0x13, 0x4, 0xbf, 0xbf, 0xb9, 0xb9,
-                          0x0, 0x0,  0x0,  0x0, 0x0,  0x0,  0x0,  0x0};
+
+  return saved_bytes;
+}
+
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char *buf, size_t len, char32_t *utf32) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return {error_code::SUCCESS, 0};
+  }
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32);
+  auto ret = icelake::validating_utf8_to_fixed_length_with_constant_checks<
+      endianness::LITTLE, uint32_t>(buf, len, utf32_output);
+
+  if (!std::get<2>(ret)) {
+    size_t pos = std::get<0>(ret) - buf;
+    // We might have an error that occurs right before pos.
+    // This is only a concern if buf[pos] is not a continuation byte.
+    if ((buf[pos] & 0xc0) != 0x80 && pos >= 64) {
+      pos -= 1;
+    } else if ((buf[pos] & 0xc0) == 0x80 && pos >= 64) {
+      // We must check whether we are the fourth continuation byte
+      bool c1 = (buf[pos - 1] & 0xc0) == 0x80;
+      bool c2 = (buf[pos - 2] & 0xc0) == 0x80;
+      bool c3 = (buf[pos - 3] & 0xc0) == 0x80;
+      if (c1 && c2 && c3) {
+        return {simdutf::TOO_LONG, pos};
+      }
+    }
+    // todo: we reset the output to utf32 instead of using std::get<1>(ret) as
+    // you'd expect. that is because
+    // validating_utf8_to_fixed_length_with_constant_checks may have processed
+    // data beyond the error.
+    result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+        pos, buf + pos, len - pos, utf32);
+    res.count += pos;
+    return res;
+  }
+  size_t saved_bytes = std::get<1>(ret) - utf32_output;
+  const char *end = buf + len;
+  if (std::get<0>(ret) == end) {
+    return {simdutf::SUCCESS, saved_bytes};
+  }
+
+  // Note: the AVX512 procedure looks up 4 bytes forward, and
+  //       correctly converts multi-byte chars even if their
+  //       continuation bytes lie outside 16-byte window.
+  //       It means, we have to skip continuation bytes from
+  //       the beginning of std::get<0>(ret), as they were already consumed.
+  while (std::get<0>(ret) != end and
+         ((uint8_t(*std::get<0>(ret)) & 0xc0) == 0x80)) {
+    std::get<0>(ret) += 1;
+  }
+
+  if (std::get<0>(ret) != end) {
+    auto scalar_result = scalar::utf8_to_utf32::convert_with_errors(
+        std::get<0>(ret), len - (std::get<0>(ret) - buf),
+        reinterpret_cast<char32_t *>(utf32_output) + saved_bytes);
+    if (scalar_result.error != simdutf::SUCCESS) {
+      scalar_result.count += (std::get<0>(ret) - buf);
+    } else {
+      scalar_result.count += saved_bytes;
+    }
+    return scalar_result;
+  }
+
+  return {simdutf::SUCCESS, size_t(std::get<1>(ret) - utf32_output)};
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_out) const noexcept {
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
+  utf8_to_utf32_result ret =
+      icelake::valid_utf8_to_fixed_length<endianness::LITTLE, uint32_t>(
+          buf, len, utf32_output);
+  size_t saved_bytes = ret.second - utf32_output;
+  const char *end = buf + len;
+  if (ret.first == end) {
+    return saved_bytes;
+  }
+
+  // Note: AVX512 procedure looks up 4 bytes forward, and
+  //       correctly converts multi-byte chars even if their
+  //       continuation bytes lie outside 16-byte window.
+  //       It means, we have to skip continuation bytes from
+  //       the beginning ret.first, as they were already consumed.
+  while (ret.first != end && ((uint8_t(*ret.first) & 0xc0) == 0x80)) {
+    ret.first += 1;
   }
-#endif
-  uint8x16_t vsecond_last = base64_url ? vdupq_n_u8(0x2d) : vdupq_n_u8(0x2f);
-  if (base64_url) {
-    hi_nibbles0 = vbicq_u8(hi_nibbles0, underscore0);
-    hi_nibbles1 = vbicq_u8(hi_nibbles1, underscore1);
-    hi_nibbles2 = vbicq_u8(hi_nibbles2, underscore2);
-    hi_nibbles3 = vbicq_u8(hi_nibbles3, underscore3);
+
+  if (ret.first != end) {
+    const size_t scalar_saved_bytes = scalar::utf8_to_utf32::convert_valid(
+        ret.first, len - (ret.first - buf), utf32_out + saved_bytes);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
   }
-  uint8x16_t roll0 = vqtbl1q_u8(
-      roll_lut, vaddq_u8(vceqq_u8(b->chunks[0], vsecond_last), hi_nibbles0));
-  uint8x16_t roll1 = vqtbl1q_u8(
-      roll_lut, vaddq_u8(vceqq_u8(b->chunks[1], vsecond_last), hi_nibbles1));
-  uint8x16_t roll2 = vqtbl1q_u8(
-      roll_lut, vaddq_u8(vceqq_u8(b->chunks[2], vsecond_last), hi_nibbles2));
-  uint8x16_t roll3 = vqtbl1q_u8(
-      roll_lut, vaddq_u8(vceqq_u8(b->chunks[3], vsecond_last), hi_nibbles3));
-  b->chunks[0] = vaddq_u8(b->chunks[0], roll0);
-  b->chunks[1] = vaddq_u8(b->chunks[1], roll1);
-  b->chunks[2] = vaddq_u8(b->chunks[2], roll2);
-  b->chunks[3] = vaddq_u8(b->chunks[3], roll3);
-  return badcharmask;
+
+  return saved_bytes;
 }
 
-void copy_block(block64 *b, char *output) {
-  vst1q_u8((uint8_t *)output, b->chunks[0]);
-  vst1q_u8((uint8_t *)output + 16, b->chunks[1]);
-  vst1q_u8((uint8_t *)output + 32, b->chunks[2]);
-  vst1q_u8((uint8_t *)output + 48, b->chunks[3]);
+simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf16_to_latin1<endianness::LITTLE>(buf, len,
+                                                             latin1_output);
 }
 
-uint64_t compress_block(block64 *b, uint64_t mask, char *output) {
-  uint64_t popcounts =
-      vget_lane_u64(vreinterpret_u64_u8(vcnt_u8(vcreate_u8(~mask))), 0);
-  uint64_t offsets = popcounts * 0x0101010101010101;
-  compress(b->chunks[0], uint16_t(mask), output);
-  compress(b->chunks[1], uint16_t(mask >> 16), &output[(offsets >> 8) & 0xFF]);
-  compress(b->chunks[2], uint16_t(mask >> 32), &output[(offsets >> 24) & 0xFF]);
-  compress(b->chunks[3], uint16_t(mask >> 48), &output[(offsets >> 40) & 0xFF]);
-  return offsets >> 56;
+simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf16_to_latin1<endianness::BIG>(buf, len,
+                                                          latin1_output);
 }
 
-// The caller of this function is responsible to ensure that there are 64 bytes
-// available from reading at src. The data is read into a block64 structure.
-void load_block(block64 *b, const char *src) {
-  b->chunks[0] = vld1q_u8(reinterpret_cast<const uint8_t *>(src));
-  b->chunks[1] = vld1q_u8(reinterpret_cast<const uint8_t *>(src) + 16);
-  b->chunks[2] = vld1q_u8(reinterpret_cast<const uint8_t *>(src) + 32);
-  b->chunks[3] = vld1q_u8(reinterpret_cast<const uint8_t *>(src) + 48);
+simdutf_warn_unused result
+implementation::convert_utf16le_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
+             buf, len, latin1_output)
+      .first;
 }
 
-// The caller of this function is responsible to ensure that there are 32 bytes
-// available from reading at data. It returns a 16-byte value, narrowing with
-// saturation the 16-bit words.
-inline uint8x16_t load_satured(const uint16_t *data) {
-  uint16x8_t in1 = vld1q_u16(data);
-  uint16x8_t in2 = vld1q_u16(data + 8);
-  return vqmovn_high_u16(vqmovn_u16(in1), in2);
+simdutf_warn_unused result
+implementation::convert_utf16be_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf16_to_latin1_with_errors<endianness::BIG>(
+             buf, len, latin1_output)
+      .first;
 }
 
-// The caller of this function is responsible to ensure that there are 128 bytes
-// available from reading at src. The data is read into a block64 structure.
-void load_block(block64 *b, const char16_t *src) {
-  b->chunks[0] = load_satured(reinterpret_cast<const uint16_t *>(src));
-  b->chunks[1] = load_satured(reinterpret_cast<const uint16_t *>(src) + 16);
-  b->chunks[2] = load_satured(reinterpret_cast<const uint16_t *>(src) + 32);
-  b->chunks[3] = load_satured(reinterpret_cast<const uint16_t *>(src) + 48);
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  // optimization opportunity: implement custom function
+  return convert_utf16be_to_latin1(buf, len, latin1_output);
 }
 
-// decode 64 bytes and output 48 bytes
-void base64_decode_block(char *out, const char *src) {
-  uint8x16x4_t str = vld4q_u8((uint8_t *)src);
-  uint8x16x3_t outvec;
-  outvec.val[0] =
-      vorrq_u8(vshlq_n_u8(str.val[0], 2), vshrq_n_u8(str.val[1], 4));
-  outvec.val[1] =
-      vorrq_u8(vshlq_n_u8(str.val[1], 4), vshrq_n_u8(str.val[2], 2));
-  outvec.val[2] = vorrq_u8(vshlq_n_u8(str.val[2], 6), str.val[3]);
-  vst3q_u8((uint8_t *)out, outvec);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  // optimization opportunity: implement custom function
+  return convert_utf16le_to_latin1(buf, len, latin1_output);
 }
 
-template <bool base64_url, typename char_type>
-full_result
-compress_decode_base64(char *dst, const char_type *src, size_t srclen,
-                       base64_options options,
-                       last_chunk_handling_options last_chunk_options) {
-  const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
-                                        : tables::base64::to_base64_value;
-  size_t equallocation =
-      srclen; // location of the first padding character if any
-  // skip trailing spaces
-  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
-         to_base64[uint8_t(src[srclen - 1])] == 64) {
-    srclen--;
-  }
-  size_t equalsigns = 0;
-  if (srclen > 0 && src[srclen - 1] == '=') {
-    equallocation = srclen - 1;
-    srclen--;
-    equalsigns = 1;
-    // skip trailing spaces
-    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
-           to_base64[uint8_t(src[srclen - 1])] == 64) {
-      srclen--;
-    }
-    if (srclen > 0 && src[srclen - 1] == '=') {
-      equallocation = srclen - 1;
-      srclen--;
-      equalsigns = 2;
-    }
-  }
-  if (srclen == 0) {
-    if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation, 0};
-    }
-    return {SUCCESS, 0, 0};
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  size_t outlen;
+  size_t inlen = utf16_to_utf8_avx512i<endianness::LITTLE>(
+      buf, len, (unsigned char *)utf8_output, &outlen);
+  if (inlen != len) {
+    return 0;
   }
-  const char_type *const srcinit = src;
-  const char *const dstinit = dst;
-  const char_type *const srcend = src + srclen;
-
-  constexpr size_t block_size = 10;
-  char buffer[block_size * 64];
-  char *bufferptr = buffer;
-  if (srclen >= 64) {
-    const char_type *const srcend64 = src + srclen - 64;
-    while (src <= srcend64) {
-      block64 b;
-      load_block(&b, src);
-      src += 64;
-      bool error = false;
-      uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
-      if (badcharmask) {
-        if (error) {
-          src -= 64;
-          while (src < srcend && scalar::base64::is_eight_byte(*src) &&
-                 to_base64[uint8_t(*src)] <= 64) {
-            src++;
-          }
-          if (src < srcend) {
-            // should never happen
-          }
-          return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
-                  size_t(dst - dstinit)};
-        }
-      }
+  return outlen;
+}
 
-      if (badcharmask != 0) {
-        // optimization opportunity: check for simple masks like those made of
-        // continuous 1s followed by continuous 0s. And masks containing a
-        // single bad character.
-        bufferptr += compress_block(&b, badcharmask, bufferptr);
-      } else {
-        // optimization opportunity: if bufferptr == buffer and mask == 0, we
-        // can avoid the call to compress_block and decode directly.
-        copy_block(&b, bufferptr);
-        bufferptr += 64;
-      }
-      if (bufferptr >= (block_size - 1) * 64 + buffer) {
-        for (size_t i = 0; i < (block_size - 1); i++) {
-          base64_decode_block(dst, buffer + i * 64);
-          dst += 48;
-        }
-        std::memcpy(buffer, buffer + (block_size - 1) * 64,
-                    64); // 64 might be too much
-        bufferptr -= (block_size - 1) * 64;
-      }
-    }
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  size_t outlen;
+  size_t inlen = utf16_to_utf8_avx512i<endianness::BIG>(
+      buf, len, (unsigned char *)utf8_output, &outlen);
+  if (inlen != len) {
+    return 0;
   }
-  char *buffer_start = buffer;
-  // Optimization note: if this is almost full, then it is worth our
-  // time, otherwise, we should just decode directly.
-  int last_block = (int)((bufferptr - buffer_start) % 64);
-  if (last_block != 0 && srcend - src + last_block >= 64) {
-    while ((bufferptr - buffer_start) % 64 != 0 && src < srcend) {
-      uint8_t val = to_base64[uint8_t(*src)];
-      *bufferptr = char(val);
-      if (!scalar::base64::is_eight_byte(*src) || val > 64) {
-        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
-                size_t(dst - dstinit)};
-      }
-      bufferptr += (val <= 63);
-      src++;
-    }
+  return outlen;
+}
+
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  size_t outlen;
+  size_t inlen = utf16_to_utf8_avx512i<endianness::LITTLE>(
+      buf, len, (unsigned char *)utf8_output, &outlen);
+  if (inlen != len) {
+    result res = scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
+        buf + inlen, len - inlen, utf8_output + outlen);
+    res.count += inlen;
+    return res;
   }
+  return {simdutf::SUCCESS, outlen};
+}
 
-  for (; buffer_start + 64 <= bufferptr; buffer_start += 64) {
-    base64_decode_block(dst, buffer_start);
-    dst += 48;
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  size_t outlen;
+  size_t inlen = utf16_to_utf8_avx512i<endianness::BIG>(
+      buf, len, (unsigned char *)utf8_output, &outlen);
+  if (inlen != len) {
+    result res = scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
+        buf + inlen, len - inlen, utf8_output + outlen);
+    res.count += inlen;
+    return res;
   }
-  if ((bufferptr - buffer_start) % 64 != 0) {
-    while (buffer_start + 4 < bufferptr) {
-      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
-                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
-                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
-                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
-                        << 8;
-      triple = scalar::utf32::swap_bytes(triple);
-      std::memcpy(dst, &triple, 4);
+  return {simdutf::SUCCESS, outlen};
+}
 
-      dst += 3;
-      buffer_start += 4;
-    }
-    if (buffer_start + 4 <= bufferptr) {
-      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
-                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
-                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
-                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
-                        << 8;
-      triple = scalar::utf32::swap_bytes(triple);
-      std::memcpy(dst, &triple, 3);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return convert_utf16le_to_utf8(buf, len, utf8_output);
+}
 
-      dst += 3;
-      buffer_start += 4;
-    }
-    // we may have 1, 2 or 3 bytes left and we need to decode them so let us
-    // backtrack
-    int leftover = int(bufferptr - buffer_start);
-    while (leftover > 0) {
-      while (to_base64[uint8_t(*(src - 1))] == 64) {
-        src--;
-      }
-      src--;
-      leftover--;
-    }
-  }
-  if (src < srcend + equalsigns) {
-    full_result r = scalar::base64::base64_tail_decode(
-        dst, src, srcend - src, equalsigns, options, last_chunk_options);
-    r.input_count += size_t(src - srcinit);
-    if (r.error == error_code::INVALID_BASE64_CHARACTER ||
-        r.error == error_code::BASE64_EXTRA_BITS) {
-      return r;
-    } else {
-      r.output_count += size_t(dst - dstinit);
-    }
-    if (last_chunk_options != stop_before_partial &&
-        r.error == error_code::SUCCESS && equalsigns > 0) {
-      // additional checks
-      if ((r.output_count % 3 == 0) ||
-          ((r.output_count % 3) + 1 + equalsigns != 4)) {
-        r.error = error_code::INVALID_BASE64_CHARACTER;
-        r.input_count = equallocation;
-      }
-    }
-    return r;
-  }
-  if (equalsigns > 0) {
-    if ((size_t(dst - dstinit) % 3 == 0) ||
-        ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation, size_t(dst - dstinit)};
-    }
-  }
-  return {SUCCESS, srclen, size_t(dst - dstinit)};
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return convert_utf16be_to_utf8(buf, len, utf8_output);
 }
-/* end file src/arm64/arm_base64.cpp */
-/* begin file src/arm64/arm_convert_utf32_to_latin1.cpp */
-std::pair<const char32_t *, char *>
-arm_convert_utf32_to_latin1(const char32_t *buf, size_t len,
-                            char *latin1_output) {
-  const char32_t *end = buf + len;
-  while (end - buf >= 8) {
-    uint32x4_t in1 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
-    uint32x4_t in2 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf + 4));
 
-    uint16x8_t utf16_packed = vcombine_u16(vqmovn_u32(in1), vqmovn_u32(in2));
-    if (vmaxvq_u16(utf16_packed) <= 0xff) {
-      // 1. pack the bytes
-      uint8x8_t latin1_packed = vmovn_u16(utf16_packed);
-      // 2. store (8 bytes)
-      vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
-      // 3. adjust pointers
-      buf += 8;
-      latin1_output += 8;
-    } else {
-      return std::make_pair(nullptr, reinterpret_cast<char *>(latin1_output));
-    }
-  } // while
-  return std::make_pair(buf, latin1_output);
+simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf32_to_latin1(buf, len, latin1_output);
 }
 
-std::pair<result, char *>
-arm_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
-                                        char *latin1_output) {
-  const char32_t *start = buf;
-  const char32_t *end = buf + len;
+simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf32_to_latin1_with_errors(buf, len, latin1_output)
+      .first;
+}
 
-  while (end - buf >= 8) {
-    uint32x4_t in1 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
-    uint32x4_t in2 = vld1q_u32(reinterpret_cast<const uint32_t *>(buf + 4));
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  return icelake_convert_utf32_to_latin1(buf, len, latin1_output);
+}
 
-    uint16x8_t utf16_packed = vcombine_u16(vqmovn_u32(in1), vqmovn_u32(in2));
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      avx512_convert_utf32_to_utf8(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf8_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes = scalar::utf32_to_utf8::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
 
-    if (vmaxvq_u16(utf16_packed) <= 0xff) {
-      // 1. pack the bytes
-      uint8x8_t latin1_packed = vmovn_u16(utf16_packed);
-      // 2. store (8 bytes)
-      vst1_u8(reinterpret_cast<uint8_t *>(latin1_output), latin1_packed);
-      // 3. adjust pointers
-      buf += 8;
-      latin1_output += 8;
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      icelake::avx512_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
+  if (ret.first.count != len) {
+    result scalar_res = scalar::utf32_to_utf8::convert_with_errors(
+        buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
     } else {
-      // Let us do a scalar fallback.
-      for (int k = 0; k < 8; k++) {
-        uint32_t word = buf[k];
-        if (word <= 0xff) {
-          *latin1_output++ = char(word);
-        } else {
-          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
-                                latin1_output);
-        }
-      }
+      ret.second += scalar_res.count;
     }
-  } // while
-  return std::make_pair(result(error_code::SUCCESS, buf - start),
-                        latin1_output);
+  }
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
+  return ret.first;
 }
-/* end file src/arm64/arm_convert_utf32_to_latin1.cpp */
-/* begin file src/arm64/arm_convert_utf32_to_utf16.cpp */
-template <endianness big_endian>
-std::pair<const char32_t *, char16_t *>
-arm_convert_utf32_to_utf16(const char32_t *buf, size_t len,
-                           char16_t *utf16_out) {
-  uint16_t *utf16_output = reinterpret_cast<uint16_t *>(utf16_out);
-  const char32_t *end = buf + len;
 
-  uint16x4_t forbidden_bytemask = vmov_n_u16(0x0);
-
-  while (end - buf >= 4) {
-    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
-
-    // Check if no bits set above 16th
-    if (vmaxvq_u32(in) <= 0xFFFF) {
-      uint16x4_t utf16_packed = vmovn_u32(in);
-
-      const uint16x4_t v_d800 = vmov_n_u16((uint16_t)0xd800);
-      const uint16x4_t v_dfff = vmov_n_u16((uint16_t)0xdfff);
-      forbidden_bytemask = vorr_u16(vand_u16(vcle_u16(utf16_packed, v_dfff),
-                                             vcge_u16(utf16_packed, v_d800)),
-                                    forbidden_bytemask);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  return convert_utf32_to_utf8(buf, len, utf8_output);
+}
 
-      if (!match_system(big_endian)) {
-        utf16_packed =
-            vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(utf16_packed)));
-      }
-      vst1_u16(utf16_output, utf16_packed);
-      utf16_output += 4;
-      buf += 4;
-    } else {
-      size_t forward = 3;
-      size_t k = 0;
-      if (size_t(end - buf) < forward + 1) {
-        forward = size_t(end - buf - 1);
-      }
-      for (; k < forward; k++) {
-        uint32_t word = buf[k];
-        if ((word & 0xFFFF0000) == 0) {
-          // will not generate a surrogate pair
-          if (word >= 0xD800 && word <= 0xDFFF) {
-            return std::make_pair(nullptr,
-                                  reinterpret_cast<char16_t *>(utf16_output));
-          }
-          *utf16_output++ = !match_system(big_endian)
-                                ? char16_t(word >> 8 | word << 8)
-                                : char16_t(word);
-        } else {
-          // will generate a surrogate pair
-          if (word > 0x10FFFF) {
-            return std::make_pair(nullptr,
-                                  reinterpret_cast<char16_t *>(utf16_output));
-          }
-          word -= 0x10000;
-          uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
-          uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
-          if (!match_system(big_endian)) {
-            high_surrogate =
-                uint16_t(high_surrogate >> 8 | high_surrogate << 8);
-            low_surrogate = uint16_t(low_surrogate << 8 | low_surrogate >> 8);
-          }
-          *utf16_output++ = char16_t(high_surrogate);
-          *utf16_output++ = char16_t(low_surrogate);
-        }
-      }
-      buf += k;
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      avx512_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      avx512_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
     }
+    saved_bytes += scalar_saved_bytes;
   }
+  return saved_bytes;
+}
 
-  // check for invalid input
-  if (vmaxv_u16(forbidden_bytemask) != 0) {
-    return std::make_pair(nullptr, reinterpret_cast<char16_t *>(utf16_output));
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      avx512_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(
+          buf, len, utf16_output);
+  if (ret.first.count != len) {
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
   }
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
+  return ret.first;
+}
 
-  return std::make_pair(buf, reinterpret_cast<char16_t *>(utf16_output));
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      avx512_convert_utf32_to_utf16_with_errors<endianness::BIG>(buf, len,
+                                                                 utf16_output);
+  if (ret.first.count != len) {
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
+  return ret.first;
 }
 
-template <endianness big_endian>
-std::pair<result, char16_t *>
-arm_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
-                                       char16_t *utf16_out) {
-  uint16_t *utf16_output = reinterpret_cast<uint16_t *>(utf16_out);
-  const char32_t *start = buf;
-  const char32_t *end = buf + len;
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return convert_utf32_to_utf16le(buf, len, utf16_output);
+}
 
-  while (end - buf >= 4) {
-    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return convert_utf32_to_utf16be(buf, len, utf16_output);
+}
 
-    // Check if no bits set above 16th
-    if (vmaxvq_u32(in) <= 0xFFFF) {
-      uint16x4_t utf16_packed = vmovn_u32(in);
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::tuple<const char16_t *, char32_t *, bool> ret =
+      icelake::convert_utf16_to_utf32<endianness::LITTLE>(buf, len,
+                                                          utf32_output);
+  if (!std::get<2>(ret)) {
+    return 0;
+  }
+  size_t saved_bytes = std::get<1>(ret) - utf32_output;
+  if (std::get<0>(ret) != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::LITTLE>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
 
-      const uint16x4_t v_d800 = vmov_n_u16((uint16_t)0xd800);
-      const uint16x4_t v_dfff = vmov_n_u16((uint16_t)0xdfff);
-      const uint16x4_t forbidden_bytemask = vand_u16(
-          vcle_u16(utf16_packed, v_dfff), vcge_u16(utf16_packed, v_d800));
-      if (vmaxv_u16(forbidden_bytemask) != 0) {
-        return std::make_pair(result(error_code::SURROGATE, buf - start),
-                              reinterpret_cast<char16_t *>(utf16_output));
-      }
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::tuple<const char16_t *, char32_t *, bool> ret =
+      icelake::convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
+  if (!std::get<2>(ret)) {
+    return 0;
+  }
+  size_t saved_bytes = std::get<1>(ret) - utf32_output;
+  if (std::get<0>(ret) != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::BIG>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
 
-      if (!match_system(big_endian)) {
-        utf16_packed =
-            vreinterpret_u16_u8(vrev16_u8(vreinterpret_u8_u16(utf16_packed)));
-      }
-      vst1_u16(utf16_output, utf16_packed);
-      utf16_output += 4;
-      buf += 4;
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::tuple<const char16_t *, char32_t *, bool> ret =
+      icelake::convert_utf16_to_utf32<endianness::LITTLE>(buf, len,
+                                                          utf32_output);
+  if (!std::get<2>(ret)) {
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    scalar_res.count += (std::get<0>(ret) - buf);
+    return scalar_res;
+  }
+  size_t saved_bytes = std::get<1>(ret) - utf32_output;
+  if (std::get<0>(ret) != buf + len) {
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    if (scalar_res.error) {
+      scalar_res.count += (std::get<0>(ret) - buf);
+      return scalar_res;
     } else {
-      size_t forward = 3;
-      size_t k = 0;
-      if (size_t(end - buf) < forward + 1) {
-        forward = size_t(end - buf - 1);
-      }
-      for (; k < forward; k++) {
-        uint32_t word = buf[k];
-        if ((word & 0xFFFF0000) == 0) {
-          // will not generate a surrogate pair
-          if (word >= 0xD800 && word <= 0xDFFF) {
-            return std::make_pair(
-                result(error_code::SURROGATE, buf - start + k),
-                reinterpret_cast<char16_t *>(utf16_output));
-          }
-          *utf16_output++ = !match_system(big_endian)
-                                ? char16_t(word >> 8 | word << 8)
-                                : char16_t(word);
-        } else {
-          // will generate a surrogate pair
-          if (word > 0x10FFFF) {
-            return std::make_pair(
-                result(error_code::TOO_LARGE, buf - start + k),
-                reinterpret_cast<char16_t *>(utf16_output));
-          }
-          word -= 0x10000;
-          uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
-          uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
-          if (!match_system(big_endian)) {
-            high_surrogate =
-                uint16_t(high_surrogate >> 8 | high_surrogate << 8);
-            low_surrogate = uint16_t(low_surrogate << 8 | low_surrogate >> 8);
-          }
-          *utf16_output++ = char16_t(high_surrogate);
-          *utf16_output++ = char16_t(low_surrogate);
-        }
-      }
-      buf += k;
+      scalar_res.count += saved_bytes;
+      return scalar_res;
     }
   }
-
-  return std::make_pair(result(error_code::SUCCESS, buf - start),
-                        reinterpret_cast<char16_t *>(utf16_output));
+  return simdutf::result(simdutf::SUCCESS, saved_bytes);
 }
-/* end file src/arm64/arm_convert_utf32_to_utf16.cpp */
-/* begin file src/arm64/arm_convert_utf32_to_utf8.cpp */
-std::pair<const char32_t *, char *>
-arm_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_out) {
-  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
-  const char32_t *end = buf + len;
-
-  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
 
-  uint16x8_t forbidden_bytemask = vmovq_n_u16(0x0);
-  const size_t safety_margin =
-      12; // to avoid overruns, see issue
-          // https://github.com/simdutf/simdutf/issues/92
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::tuple<const char16_t *, char32_t *, bool> ret =
+      icelake::convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
+  if (!std::get<2>(ret)) {
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    scalar_res.count += (std::get<0>(ret) - buf);
+    return scalar_res;
+  }
+  size_t saved_bytes = std::get<1>(ret) - utf32_output;
+  if (std::get<0>(ret) != buf + len) {
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    if (scalar_res.error) {
+      scalar_res.count += (std::get<0>(ret) - buf);
+      return scalar_res;
+    } else {
+      scalar_res.count += saved_bytes;
+      return scalar_res;
+    }
+  }
+  return simdutf::result(simdutf::SUCCESS, saved_bytes);
+}
 
-  while (buf + 16 + safety_margin < end) {
-    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
-    uint32x4_t nextin = vld1q_u32(reinterpret_cast<const uint32_t *>(buf + 4));
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::tuple<const char16_t *, char32_t *, bool> ret =
+      icelake::convert_utf16_to_utf32<endianness::LITTLE>(buf, len,
+                                                          utf32_output);
+  if (!std::get<2>(ret)) {
+    return 0;
+  }
+  size_t saved_bytes = std::get<1>(ret) - utf32_output;
+  if (std::get<0>(ret) != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::LITTLE>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
 
-    // Check if no bits set above 16th
-    if (vmaxvq_u32(vorrq_u32(in, nextin)) <= 0xFFFF) {
-      // Pack UTF-32 to UTF-16 safely (without surrogate pairs)
-      // Apply UTF-16 => UTF-8 routine (arm_convert_utf16_to_utf8.cpp)
-      uint16x8_t utf16_packed = vcombine_u16(vmovn_u32(in), vmovn_u32(nextin));
-      if (vmaxvq_u16(utf16_packed) <= 0x7F) { // ASCII fast path!!!!
-        // 1. pack the bytes
-        // obviously suboptimal.
-        uint8x8_t utf8_packed = vmovn_u16(utf16_packed);
-        // 2. store (8 bytes)
-        vst1_u8(utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 8;
-        utf8_output += 8;
-        continue; // we are done for this round!
-      }
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::tuple<const char16_t *, char32_t *, bool> ret =
+      icelake::convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
+  if (!std::get<2>(ret)) {
+    return 0;
+  }
+  size_t saved_bytes = std::get<1>(ret) - utf32_output;
+  if (std::get<0>(ret) != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::BIG>(
+            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
 
-      if (vmaxvq_u16(utf16_packed) <= 0x7FF) {
-        // 1. prepare 2-byte values
-        // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-        // expected output   : [110a|aaaa|10bb|bbbb] x 8
-        const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
-        const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
+void implementation::change_endianness_utf16(const char16_t *input,
+                                             size_t length,
+                                             char16_t *output) const noexcept {
+  size_t pos = 0;
+  const __m512i byteflip = _mm512_setr_epi64(
+      0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
+      0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
+      0x0607040502030001, 0x0e0f0c0d0a0b0809);
+  while (pos + 32 <= length) {
+    __m512i utf16 = _mm512_loadu_si512((const __m512i *)(input + pos));
+    utf16 = _mm512_shuffle_epi8(utf16, byteflip);
+    _mm512_storeu_si512(output + pos, utf16);
+    pos += 32;
+  }
+  if (pos < length) {
+    __mmask32 m((1U << (length - pos)) - 1);
+    __m512i utf16 = _mm512_maskz_loadu_epi16(m, (const __m512i *)(input + pos));
+    utf16 = _mm512_shuffle_epi8(utf16, byteflip);
+    _mm512_mask_storeu_epi16(output + pos, m, utf16);
+  }
+}
 
-        // t0 = [000a|aaaa|bbbb|bb00]
-        const uint16x8_t t0 = vshlq_n_u16(utf16_packed, 2);
-        // t1 = [000a|aaaa|0000|0000]
-        const uint16x8_t t1 = vandq_u16(t0, v_1f00);
-        // t2 = [0000|0000|00bb|bbbb]
-        const uint16x8_t t2 = vandq_u16(utf16_packed, v_003f);
-        // t3 = [000a|aaaa|00bb|bbbb]
-        const uint16x8_t t3 = vorrq_u16(t1, t2);
-        // t4 = [110a|aaaa|10bb|bbbb]
-        const uint16x8_t t4 = vorrq_u16(t3, v_c080);
-        // 2. merge ASCII and 2-byte codewords
-        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
-        const uint8x16_t utf8_unpacked = vreinterpretq_u8_u16(
-            vbslq_u16(one_byte_bytemask, utf16_packed, t4));
-        // 3. prepare bitmask for 8-bit lookup
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-        const uint16x8_t mask = simdutf_make_uint16x8_t(
-            0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
-#else
-        const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
-                                 0x0002, 0x0008, 0x0020, 0x0080};
-#endif
-        uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
-        // 4. pack the bytes
-        const uint8_t *row =
-            &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-        const uint8x16_t shuffle = vld1q_u8(row + 1);
-        const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
+simdutf_warn_unused size_t implementation::count_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  const char16_t *ptr = input;
+  size_t count{0};
 
-        // 5. store bytes
-        vst1q_u8(utf8_output, utf8_packed);
+  if (length >= 32) {
+    const char16_t *end = input + length - 32;
 
-        // 6. adjust pointers
-        buf += 8;
-        utf8_output += row[0];
-        continue;
-      } else {
-        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-        const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
-        const uint16x8_t v_dfff = vmovq_n_u16((uint16_t)0xdfff);
-        forbidden_bytemask =
-            vorrq_u16(vandq_u16(vcleq_u16(utf16_packed, v_dfff),
-                                vcgeq_u16(utf16_packed, v_d800)),
-                      forbidden_bytemask);
+    const __m512i low = _mm512_set1_epi16((uint16_t)0xdc00);
+    const __m512i high = _mm512_set1_epi16((uint16_t)0xdfff);
 
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-        const uint16x8_t dup_even = simdutf_make_uint16x8_t(
-            0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
-#else
-        const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
-                                     0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
-#endif
-        /* In this branch we handle three cases:
-          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
-          single UFT-8 byte
-          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
-          two UTF-8 bytes
-          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
-          three UTF-8 bytes
+    while (ptr <= end) {
+      __m512i utf16 = _mm512_loadu_si512((const __m512i *)ptr);
+      ptr += 32;
+      uint64_t not_low_surrogate =
+          static_cast<uint64_t>(_mm512_cmpgt_epu16_mask(utf16, high) |
+                                _mm512_cmplt_epu16_mask(utf16, low));
+      count += count_ones(not_low_surrogate);
+    }
+  }
 
-          We expand the input word (16-bit) into two code units (32-bit), thus
-          we have room for four bytes. However, we need five distinct bit
-          layouts. Note that the last byte in cases #2 and #3 is the same.
+  return count + scalar::utf16::count_code_points<endianness::LITTLE>(
+                     ptr, length - (ptr - input));
+}
 
-          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-          in register t2.
+simdutf_warn_unused size_t implementation::count_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  const char16_t *ptr = input;
+  size_t count{0};
+  if (length >= 32) {
 
-          We precompute byte 1 for case #3 and -- **conditionally** --
-          precompute either byte 1 for case #2 or byte 2 for case #3. Note that
-          they differ by exactly one bit.
+    const char16_t *end = input + length - 32;
 
-          Finally from these two code units we build proper UTF-8 sequence,
-          taking into account the case (i.e, the number of bytes to write).
-        */
-        /**
-         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-         * t2 => [0ccc|cccc] [10cc|cccc]
-         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-         */
-#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
-        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-        const uint16x8_t t0 =
-            vreinterpretq_u16_u8(vqtbl1q_u8(vreinterpretq_u8_u16(utf16_packed),
-                                            vreinterpretq_u8_u16(dup_even)));
-        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-        const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
-        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-        const uint16x8_t t2 = vorrq_u16(t1, simdutf_vec(0b1000000000000000));
+    const __m512i low = _mm512_set1_epi16((uint16_t)0xdc00);
+    const __m512i high = _mm512_set1_epi16((uint16_t)0xdfff);
 
-        // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
-        const uint16x8_t s0 = vshrq_n_u16(utf16_packed, 12);
-        // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
-        const uint16x8_t s1 =
-            vandq_u16(utf16_packed, simdutf_vec(0b0000111111000000));
-        // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
-        const uint16x8_t s1s = vshlq_n_u16(s1, 2);
-        // [00bb|bbbb|0000|aaaa]
-        const uint16x8_t s2 = vorrq_u16(s0, s1s);
-        // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-        const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
-        const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
-        const uint16x8_t one_or_two_bytes_bytemask =
-            vcleq_u16(utf16_packed, v_07ff);
-        const uint16x8_t m0 = vbicq_u16(simdutf_vec(0b0100000000000000),
-                                        one_or_two_bytes_bytemask);
-        const uint16x8_t s4 = veorq_u16(s3, m0);
-#undef simdutf_vec
+    const __m512i byteflip = _mm512_setr_epi64(
+        0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
+        0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
+        0x0607040502030001, 0x0e0f0c0d0a0b0809);
+    while (ptr <= end) {
+      __m512i utf16 = _mm512_loadu_si512((const __m512i *)ptr);
+      utf16 = _mm512_shuffle_epi8(utf16, byteflip);
+      ptr += 32;
+      uint64_t not_low_surrogate =
+          static_cast<uint64_t>(_mm512_cmpgt_epu16_mask(utf16, high) |
+                                _mm512_cmplt_epu16_mask(utf16, low));
+      count += count_ones(not_low_surrogate);
+    }
+  }
 
-        // 4. expand code units 16-bit => 32-bit
-        const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
-        const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
+  return count + scalar::utf16::count_code_points<endianness::BIG>(
+                     ptr, length - (ptr - input));
+}
 
-        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-        const uint16x8_t onemask = simdutf_make_uint16x8_t(
-            0x0001, 0x0004, 0x0010, 0x0040, 0x0100, 0x0400, 0x1000, 0x4000);
-        const uint16x8_t twomask = simdutf_make_uint16x8_t(
-            0x0002, 0x0008, 0x0020, 0x0080, 0x0200, 0x0800, 0x2000, 0x8000);
-#else
-        const uint16x8_t onemask = {0x0001, 0x0004, 0x0010, 0x0040,
-                                    0x0100, 0x0400, 0x1000, 0x4000};
-        const uint16x8_t twomask = {0x0002, 0x0008, 0x0020, 0x0080,
-                                    0x0200, 0x0800, 0x2000, 0x8000};
-#endif
-        const uint16x8_t combined =
-            vorrq_u16(vandq_u16(one_byte_bytemask, onemask),
-                      vandq_u16(one_or_two_bytes_bytemask, twomask));
-        const uint16_t mask = vaddvq_u16(combined);
-        // The following fast path may or may not be beneficial.
-        /*if(mask == 0) {
-          // We only have three-byte code units. Use fast path.
-          const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
-          const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
-          const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
-          vst1q_u8(utf8_output, utf8_0);
-          utf8_output += 12;
-          vst1q_u8(utf8_output, utf8_1);
-          utf8_output += 12;
-          buf += 8;
-          continue;
-        }*/
-        const uint8_t mask0 = uint8_t(mask);
-        const uint8_t *row0 =
-            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-        const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
-        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *input, size_t length) const noexcept {
+  const uint8_t *str = reinterpret_cast<const uint8_t *>(input);
+  size_t answer =
+      length / sizeof(__m512i) *
+      sizeof(__m512i); // Bytes covered by whole 512-bit chunks of the input.
+  size_t i = 0;
+  __m512i unrolled_popcount{0};
 
-        const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-        const uint8_t *row1 =
-            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-        const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
-        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
+  const __m512i continuation = _mm512_set1_epi8(char(0b10111111));
 
-        vst1q_u8(utf8_output, utf8_0);
-        utf8_output += row0[0];
-        vst1q_u8(utf8_output, utf8_1);
-        utf8_output += row1[0];
+  while (i + sizeof(__m512i) <= length) {
+    size_t iterations = (length - i) / sizeof(__m512i);
 
-        buf += 8;
-      }
-      // At least one 32-bit word will produce a surrogate pair in UTF-16 <=>
-      // will produce four UTF-8 bytes.
-    } else {
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // in the presence of surrogate pairs may require non-trivial tables.
-      size_t forward = 15;
-      size_t k = 0;
-      if (size_t(end - buf) < forward + 1) {
-        forward = size_t(end - buf - 1);
-      }
-      for (; k < forward; k++) {
-        uint32_t word = buf[k];
-        if ((word & 0xFFFFFF80) == 0) {
-          *utf8_output++ = char(word);
-        } else if ((word & 0xFFFFF800) == 0) {
-          *utf8_output++ = char((word >> 6) | 0b11000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if ((word & 0xFFFF0000) == 0) {
-          if (word >= 0xD800 && word <= 0xDFFF) {
-            return std::make_pair(nullptr,
-                                  reinterpret_cast<char *>(utf8_output));
-          }
-          *utf8_output++ = char((word >> 12) | 0b11100000);
-          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else {
-          if (word > 0x10FFFF) {
-            return std::make_pair(nullptr,
-                                  reinterpret_cast<char *>(utf8_output));
-          }
-          *utf8_output++ = char((word >> 18) | 0b11110000);
-          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        }
-      }
-      buf += k;
+    size_t max_i = i + iterations * sizeof(__m512i) - sizeof(__m512i);
+    for (; i + 8 * sizeof(__m512i) <= max_i; i += 8 * sizeof(__m512i)) {
+      __m512i input1 = _mm512_loadu_si512((const __m512i *)(str + i));
+      __m512i input2 =
+          _mm512_loadu_si512((const __m512i *)(str + i + sizeof(__m512i)));
+      __m512i input3 =
+          _mm512_loadu_si512((const __m512i *)(str + i + 2 * sizeof(__m512i)));
+      __m512i input4 =
+          _mm512_loadu_si512((const __m512i *)(str + i + 3 * sizeof(__m512i)));
+      __m512i input5 =
+          _mm512_loadu_si512((const __m512i *)(str + i + 4 * sizeof(__m512i)));
+      __m512i input6 =
+          _mm512_loadu_si512((const __m512i *)(str + i + 5 * sizeof(__m512i)));
+      __m512i input7 =
+          _mm512_loadu_si512((const __m512i *)(str + i + 6 * sizeof(__m512i)));
+      __m512i input8 =
+          _mm512_loadu_si512((const __m512i *)(str + i + 7 * sizeof(__m512i)));
+
+      __mmask64 mask1 = _mm512_cmple_epi8_mask(input1, continuation);
+      __mmask64 mask2 = _mm512_cmple_epi8_mask(input2, continuation);
+      __mmask64 mask3 = _mm512_cmple_epi8_mask(input3, continuation);
+      __mmask64 mask4 = _mm512_cmple_epi8_mask(input4, continuation);
+      __mmask64 mask5 = _mm512_cmple_epi8_mask(input5, continuation);
+      __mmask64 mask6 = _mm512_cmple_epi8_mask(input6, continuation);
+      __mmask64 mask7 = _mm512_cmple_epi8_mask(input7, continuation);
+      __mmask64 mask8 = _mm512_cmple_epi8_mask(input8, continuation);
+
+      __m512i mask_register = _mm512_set_epi64(mask8, mask7, mask6, mask5,
+                                               mask4, mask3, mask2, mask1);
+
+      unrolled_popcount = _mm512_add_epi64(unrolled_popcount,
+                                           _mm512_popcnt_epi64(mask_register));
     }
-  } // while
 
-  // check for invalid input
-  if (vmaxvq_u16(forbidden_bytemask) != 0) {
-    return std::make_pair(nullptr, reinterpret_cast<char *>(utf8_output));
+    for (; i <= max_i; i += sizeof(__m512i)) {
+      __m512i more_input = _mm512_loadu_si512((const __m512i *)(str + i));
+      uint64_t continuation_bitmask = static_cast<uint64_t>(
+          _mm512_cmple_epi8_mask(more_input, continuation));
+      answer -= count_ones(continuation_bitmask);
+    }
   }
-  return std::make_pair(buf, reinterpret_cast<char *>(utf8_output));
-}
 
-std::pair<result, char *>
-arm_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
-                                      char *utf8_out) {
-  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
-  const char32_t *start = buf;
-  const char32_t *end = buf + len;
+  __m256i first_half = _mm512_extracti64x4_epi64(unrolled_popcount, 0);
+  __m256i second_half = _mm512_extracti64x4_epi64(unrolled_popcount, 1);
+  answer -= (size_t)_mm256_extract_epi64(first_half, 0) +
+            (size_t)_mm256_extract_epi64(first_half, 1) +
+            (size_t)_mm256_extract_epi64(first_half, 2) +
+            (size_t)_mm256_extract_epi64(first_half, 3) +
+            (size_t)_mm256_extract_epi64(second_half, 0) +
+            (size_t)_mm256_extract_epi64(second_half, 1) +
+            (size_t)_mm256_extract_epi64(second_half, 2) +
+            (size_t)_mm256_extract_epi64(second_half, 3);
 
-  const uint16x8_t v_c080 = vmovq_n_u16((uint16_t)0xc080);
-  const size_t safety_margin =
-      12; // to avoid overruns, see issue
-          // https://github.com/simdutf/simdutf/issues/92
+  return answer + scalar::utf8::count_code_points(
+                      reinterpret_cast<const char *>(str + i), length - i);
+}
 
-  while (buf + 16 + safety_margin < end) {
-    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(buf));
-    uint32x4_t nextin = vld1q_u32(reinterpret_cast<const uint32_t *>(buf + 4));
+simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
+    const char *buf, size_t len) const noexcept {
+  return count_utf8(buf, len);
+}
 
-    // Check if no bits set above 16th
-    if (vmaxvq_u32(vorrq_u32(in, nextin)) <= 0xFFFF) {
-      // Pack UTF-32 to UTF-16 safely (without surrogate pairs)
-      // Apply UTF-16 => UTF-8 routine (arm_convert_utf16_to_utf8.cpp)
-      uint16x8_t utf16_packed = vcombine_u16(vmovn_u32(in), vmovn_u32(nextin));
-      if (vmaxvq_u16(utf16_packed) <= 0x7F) { // ASCII fast path!!!!
-        // 1. pack the bytes
-        // obviously suboptimal.
-        uint8x8_t utf8_packed = vmovn_u16(utf16_packed);
-        // 2. store (8 bytes)
-        vst1_u8(utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 8;
-        utf8_output += 8;
-        continue; // we are done for this round!
-      }
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf16(size_t length) const noexcept {
+  return scalar::utf16::latin1_length_from_utf16(length);
+}
 
-      if (vmaxvq_u16(utf16_packed) <= 0x7FF) {
-        // 1. prepare 2-byte values
-        // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-        // expected output   : [110a|aaaa|10bb|bbbb] x 8
-        const uint16x8_t v_1f00 = vmovq_n_u16((int16_t)0x1f00);
-        const uint16x8_t v_003f = vmovq_n_u16((int16_t)0x003f);
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf32(size_t length) const noexcept {
+  return scalar::utf32::latin1_length_from_utf32(length);
+}
 
-        // t0 = [000a|aaaa|bbbb|bb00]
-        const uint16x8_t t0 = vshlq_n_u16(utf16_packed, 2);
-        // t1 = [000a|aaaa|0000|0000]
-        const uint16x8_t t1 = vandq_u16(t0, v_1f00);
-        // t2 = [0000|0000|00bb|bbbb]
-        const uint16x8_t t2 = vandq_u16(utf16_packed, v_003f);
-        // t3 = [000a|aaaa|00bb|bbbb]
-        const uint16x8_t t3 = vorrq_u16(t1, t2);
-        // t4 = [110a|aaaa|10bb|bbbb]
-        const uint16x8_t t4 = vorrq_u16(t3, v_c080);
-        // 2. merge ASCII and 2-byte codewords
-        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
-        const uint8x16_t utf8_unpacked = vreinterpretq_u8_u16(
-            vbslq_u16(one_byte_bytemask, utf16_packed, t4));
-        // 3. prepare bitmask for 8-bit lookup
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-        const uint16x8_t mask = simdutf_make_uint16x8_t(
-            0x0001, 0x0004, 0x0010, 0x0040, 0x0002, 0x0008, 0x0020, 0x0080);
-#else
-        const uint16x8_t mask = {0x0001, 0x0004, 0x0010, 0x0040,
-                                 0x0002, 0x0008, 0x0020, 0x0080};
-#endif
-        uint16_t m2 = vaddvq_u16(vandq_u16(one_byte_bytemask, mask));
-        // 4. pack the bytes
-        const uint8_t *row =
-            &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-        const uint8x16_t shuffle = vld1q_u8(row + 1);
-        const uint8x16_t utf8_packed = vqtbl1q_u8(utf8_unpacked, shuffle);
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  const char16_t *ptr = input;
+  size_t count{0};
+  if (length >= 32) {
+    const char16_t *end = input + length - 32;
 
-        // 5. store bytes
-        vst1q_u8(utf8_output, utf8_packed);
+    const __m512i v_007f = _mm512_set1_epi16((uint16_t)0x007f);
+    const __m512i v_07ff = _mm512_set1_epi16((uint16_t)0x07ff);
+    const __m512i v_dfff = _mm512_set1_epi16((uint16_t)0xdfff);
+    const __m512i v_d800 = _mm512_set1_epi16((uint16_t)0xd800);
 
-        // 6. adjust pointers
-        buf += 8;
-        utf8_output += row[0];
-        continue;
-      } else {
-        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+    while (ptr <= end) {
+      __m512i utf16 = _mm512_loadu_si512((const __m512i *)ptr);
+      ptr += 32;
+      __mmask32 ascii_bitmask = _mm512_cmple_epu16_mask(utf16, v_007f);
+      __mmask32 two_bytes_bitmask =
+          _mm512_mask_cmple_epu16_mask(~ascii_bitmask, utf16, v_07ff);
+      __mmask32 not_one_two_bytes = ~(ascii_bitmask | two_bytes_bitmask);
+      __mmask32 surrogates_bitmask =
+          _mm512_mask_cmple_epu16_mask(not_one_two_bytes, utf16, v_dfff) &
+          _mm512_mask_cmpge_epu16_mask(not_one_two_bytes, utf16, v_d800);
 
-        // check for invalid input
-        const uint16x8_t v_d800 = vmovq_n_u16((uint16_t)0xd800);
-        const uint16x8_t v_dfff = vmovq_n_u16((uint16_t)0xdfff);
-        const uint16x8_t forbidden_bytemask = vandq_u16(
-            vcleq_u16(utf16_packed, v_dfff), vcgeq_u16(utf16_packed, v_d800));
-        if (vmaxvq_u16(forbidden_bytemask) != 0) {
-          return std::make_pair(result(error_code::SURROGATE, buf - start),
-                                reinterpret_cast<char *>(utf8_output));
-        }
+      size_t ascii_count = count_ones(ascii_bitmask);
+      size_t two_bytes_count = count_ones(two_bytes_bitmask);
+      size_t surrogate_bytes_count = count_ones(surrogates_bitmask);
+      size_t three_bytes_count =
+          32 - ascii_count - two_bytes_count - surrogate_bytes_count;
 
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-        const uint16x8_t dup_even = simdutf_make_uint16x8_t(
-            0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
-#else
-        const uint16x8_t dup_even = {0x0000, 0x0202, 0x0404, 0x0606,
-                                     0x0808, 0x0a0a, 0x0c0c, 0x0e0e};
-#endif
-        /* In this branch we handle three cases:
-          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
-          single UFT-8 byte
-          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
-          two UTF-8 bytes
-          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
-          three UTF-8 bytes
+      count += ascii_count + 2 * two_bytes_count + 3 * three_bytes_count +
+               2 * surrogate_bytes_count;
+    }
+  }
 
-          We expand the input word (16-bit) into two code units (32-bit), thus
-          we have room for four bytes. However, we need five distinct bit
-          layouts. Note that the last byte in cases #2 and #3 is the same.
+  return count + scalar::utf16::utf8_length_from_utf16<endianness::LITTLE>(
+                     ptr, length - (ptr - input));
+}
 
-          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-          in register t2.
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  const char16_t *ptr = input;
+  size_t count{0};
 
-          We precompute byte 1 for case #3 and -- **conditionally** --
-          precompute either byte 1 for case #2 or byte 2 for case #3. Note that
-          they differ by exactly one bit.
+  if (length >= 32) {
+    const char16_t *end = input + length - 32;
 
-          Finally from these two code units we build proper UTF-8 sequence,
-          taking into account the case (i.e, the number of bytes to write).
-        */
-        /**
-         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-         * t2 => [0ccc|cccc] [10cc|cccc]
-         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-         */
-#define simdutf_vec(x) vmovq_n_u16(static_cast<uint16_t>(x))
-        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-        const uint16x8_t t0 =
-            vreinterpretq_u16_u8(vqtbl1q_u8(vreinterpretq_u8_u16(utf16_packed),
-                                            vreinterpretq_u8_u16(dup_even)));
-        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-        const uint16x8_t t1 = vandq_u16(t0, simdutf_vec(0b0011111101111111));
-        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-        const uint16x8_t t2 = vorrq_u16(t1, simdutf_vec(0b1000000000000000));
+    const __m512i v_007f = _mm512_set1_epi16((uint16_t)0x007f);
+    const __m512i v_07ff = _mm512_set1_epi16((uint16_t)0x07ff);
+    const __m512i v_dfff = _mm512_set1_epi16((uint16_t)0xdfff);
+    const __m512i v_d800 = _mm512_set1_epi16((uint16_t)0xd800);
 
-        // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
-        const uint16x8_t s0 = vshrq_n_u16(utf16_packed, 12);
-        // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
-        const uint16x8_t s1 =
-            vandq_u16(utf16_packed, simdutf_vec(0b0000111111000000));
-        // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
-        const uint16x8_t s1s = vshlq_n_u16(s1, 2);
-        // [00bb|bbbb|0000|aaaa]
-        const uint16x8_t s2 = vorrq_u16(s0, s1s);
-        // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-        const uint16x8_t s3 = vorrq_u16(s2, simdutf_vec(0b1100000011100000));
-        const uint16x8_t v_07ff = vmovq_n_u16((uint16_t)0x07FF);
-        const uint16x8_t one_or_two_bytes_bytemask =
-            vcleq_u16(utf16_packed, v_07ff);
-        const uint16x8_t m0 = vbicq_u16(simdutf_vec(0b0100000000000000),
-                                        one_or_two_bytes_bytemask);
-        const uint16x8_t s4 = veorq_u16(s3, m0);
-#undef simdutf_vec
+    const __m512i byteflip = _mm512_setr_epi64(
+        0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
+        0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
+        0x0607040502030001, 0x0e0f0c0d0a0b0809);
+    while (ptr <= end) {
+      __m512i utf16 = _mm512_loadu_si512((const __m512i *)ptr);
+      utf16 = _mm512_shuffle_epi8(utf16, byteflip);
+      ptr += 32;
+      __mmask32 ascii_bitmask = _mm512_cmple_epu16_mask(utf16, v_007f);
+      __mmask32 two_bytes_bitmask =
+          _mm512_mask_cmple_epu16_mask(~ascii_bitmask, utf16, v_07ff);
+      __mmask32 not_one_two_bytes = ~(ascii_bitmask | two_bytes_bitmask);
+      __mmask32 surrogates_bitmask =
+          _mm512_mask_cmple_epu16_mask(not_one_two_bytes, utf16, v_dfff) &
+          _mm512_mask_cmpge_epu16_mask(not_one_two_bytes, utf16, v_d800);
 
-        // 4. expand code units 16-bit => 32-bit
-        const uint8x16_t out0 = vreinterpretq_u8_u16(vzip1q_u16(t2, s4));
-        const uint8x16_t out1 = vreinterpretq_u8_u16(vzip2q_u16(t2, s4));
+      size_t ascii_count = count_ones(ascii_bitmask);
+      size_t two_bytes_count = count_ones(two_bytes_bitmask);
+      size_t surrogate_bytes_count = count_ones(surrogates_bitmask);
+      size_t three_bytes_count =
+          32 - ascii_count - two_bytes_count - surrogate_bytes_count;
+      count += ascii_count + 2 * two_bytes_count + 3 * three_bytes_count +
+               2 * surrogate_bytes_count;
+    }
+  }
 
-        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-        const uint16x8_t v_007f = vmovq_n_u16((uint16_t)0x007F);
-        const uint16x8_t one_byte_bytemask = vcleq_u16(utf16_packed, v_007f);
-#ifdef SIMDUTF_REGULAR_VISUAL_STUDIO
-        const uint16x8_t onemask = simdutf_make_uint16x8_t(
-            0x0001, 0x0004, 0x0010, 0x0040, 0x0100, 0x0400, 0x1000, 0x4000);
-        const uint16x8_t twomask = simdutf_make_uint16x8_t(
-            0x0002, 0x0008, 0x0020, 0x0080, 0x0200, 0x0800, 0x2000, 0x8000);
-#else
-        const uint16x8_t onemask = {0x0001, 0x0004, 0x0010, 0x0040,
-                                    0x0100, 0x0400, 0x1000, 0x4000};
-        const uint16x8_t twomask = {0x0002, 0x0008, 0x0020, 0x0080,
-                                    0x0200, 0x0800, 0x2000, 0x8000};
-#endif
-        const uint16x8_t combined =
-            vorrq_u16(vandq_u16(one_byte_bytemask, onemask),
-                      vandq_u16(one_or_two_bytes_bytemask, twomask));
-        const uint16_t mask = vaddvq_u16(combined);
-        // The following fast path may or may not be beneficial.
-        /*if(mask == 0) {
-          // We only have three-byte code units. Use fast path.
-          const uint8x16_t shuffle = {2,3,1,6,7,5,10,11,9,14,15,13,0,0,0,0};
-          const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle);
-          const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle);
-          vst1q_u8(utf8_output, utf8_0);
-          utf8_output += 12;
-          vst1q_u8(utf8_output, utf8_1);
-          utf8_output += 12;
-          buf += 8;
-          continue;
-        }*/
-        const uint8_t mask0 = uint8_t(mask);
+  return count + scalar::utf16::utf8_length_from_utf16<endianness::BIG>(
+                     ptr, length - (ptr - input));
+}
 
-        const uint8_t *row0 =
-            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-        const uint8x16_t shuffle0 = vld1q_u8(row0 + 1);
-        const uint8x16_t utf8_0 = vqtbl1q_u8(out0, shuffle0);
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return implementation::count_utf16le(input, length);
+}
 
-        const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-        const uint8_t *row1 =
-            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-        const uint8x16_t shuffle1 = vld1q_u8(row1 + 1);
-        const uint8x16_t utf8_1 = vqtbl1q_u8(out1, shuffle1);
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return implementation::count_utf16be(input, length);
+}
 
-        vst1q_u8(utf8_output, utf8_0);
-        utf8_output += row0[0];
-        vst1q_u8(utf8_output, utf8_1);
-        utf8_output += row1[0];
+simdutf_warn_unused size_t
+implementation::utf16_length_from_latin1(size_t length) const noexcept {
+  return scalar::latin1::utf16_length_from_latin1(length);
+}
 
-        buf += 8;
+simdutf_warn_unused size_t
+implementation::utf32_length_from_latin1(size_t length) const noexcept {
+  return scalar::latin1::utf32_length_from_latin1(length);
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
+    const char *input, size_t length) const noexcept {
+  const uint8_t *str = reinterpret_cast<const uint8_t *>(input);
+  size_t answer = length / sizeof(__m512i) * sizeof(__m512i);
+  size_t i = 0;
+  if (answer >= 2048) { // long strings optimization
+    unsigned char v_0xFF = 0xff;
+    __m512i eight_64bits = _mm512_setzero_si512();
+    while (i + sizeof(__m512i) <= length) {
+      __m512i runner = _mm512_setzero_si512();
+      size_t iterations = (length - i) / sizeof(__m512i);
+      if (iterations > 255) {
+        iterations = 255;
       }
-      // At least one 32-bit word will produce a surrogate pair in UTF-16 <=>
-      // will produce four UTF-8 bytes.
-    } else {
-      // Let us do a scalar fallback.
-      // It may seem wasteful to use scalar code, but being efficient with SIMD
-      // in the presence of surrogate pairs may require non-trivial tables.
-      size_t forward = 15;
-      size_t k = 0;
-      if (size_t(end - buf) < forward + 1) {
-        forward = size_t(end - buf - 1);
+      size_t max_i = i + iterations * sizeof(__m512i) - sizeof(__m512i);
+      for (; i + 4 * sizeof(__m512i) <= max_i; i += 4 * sizeof(__m512i)) {
+        // Load four __m512i vectors
+        __m512i input1 = _mm512_loadu_si512((const __m512i *)(str + i));
+        __m512i input2 =
+            _mm512_loadu_si512((const __m512i *)(str + i + sizeof(__m512i)));
+        __m512i input3 = _mm512_loadu_si512(
+            (const __m512i *)(str + i + 2 * sizeof(__m512i)));
+        __m512i input4 = _mm512_loadu_si512(
+            (const __m512i *)(str + i + 3 * sizeof(__m512i)));
+
+        // Generate four masks
+        __mmask64 mask1 =
+            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input1);
+        __mmask64 mask2 =
+            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input2);
+        __mmask64 mask3 =
+            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input3);
+        __mmask64 mask4 =
+            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input4);
+        // Apply the masks and subtract from the runner
+        __m512i not_ascii1 =
+            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask1, v_0xFF);
+        __m512i not_ascii2 =
+            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask2, v_0xFF);
+        __m512i not_ascii3 =
+            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask3, v_0xFF);
+        __m512i not_ascii4 =
+            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask4, v_0xFF);
+
+        runner = _mm512_sub_epi8(runner, not_ascii1);
+        runner = _mm512_sub_epi8(runner, not_ascii2);
+        runner = _mm512_sub_epi8(runner, not_ascii3);
+        runner = _mm512_sub_epi8(runner, not_ascii4);
       }
-      for (; k < forward; k++) {
-        uint32_t word = buf[k];
-        if ((word & 0xFFFFFF80) == 0) {
-          *utf8_output++ = char(word);
-        } else if ((word & 0xFFFFF800) == 0) {
-          *utf8_output++ = char((word >> 6) | 0b11000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if ((word & 0xFFFF0000) == 0) {
-          if (word >= 0xD800 && word <= 0xDFFF) {
-            return std::make_pair(
-                result(error_code::SURROGATE, buf - start + k),
-                reinterpret_cast<char *>(utf8_output));
-          }
-          *utf8_output++ = char((word >> 12) | 0b11100000);
-          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else {
-          if (word > 0x10FFFF) {
-            return std::make_pair(
-                result(error_code::TOO_LARGE, buf - start + k),
-                reinterpret_cast<char *>(utf8_output));
-          }
-          *utf8_output++ = char((word >> 18) | 0b11110000);
-          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        }
+
+      for (; i <= max_i; i += sizeof(__m512i)) {
+        __m512i more_input = _mm512_loadu_si512((const __m512i *)(str + i));
+
+        __mmask64 mask =
+            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), more_input);
+        __m512i not_ascii =
+            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask, v_0xFF);
+        runner = _mm512_sub_epi8(runner, not_ascii);
       }
-      buf += k;
+
+      eight_64bits = _mm512_add_epi64(
+          eight_64bits, _mm512_sad_epu8(runner, _mm512_setzero_si512()));
+    }
+
+    __m256i first_half = _mm512_extracti64x4_epi64(eight_64bits, 0);
+    __m256i second_half = _mm512_extracti64x4_epi64(eight_64bits, 1);
+    answer += (size_t)_mm256_extract_epi64(first_half, 0) +
+              (size_t)_mm256_extract_epi64(first_half, 1) +
+              (size_t)_mm256_extract_epi64(first_half, 2) +
+              (size_t)_mm256_extract_epi64(first_half, 3) +
+              (size_t)_mm256_extract_epi64(second_half, 0) +
+              (size_t)_mm256_extract_epi64(second_half, 1) +
+              (size_t)_mm256_extract_epi64(second_half, 2) +
+              (size_t)_mm256_extract_epi64(second_half, 3);
+  } else if (answer > 0) {
+    for (; i + sizeof(__m512i) <= length; i += sizeof(__m512i)) {
+      __m512i latin = _mm512_loadu_si512((const __m512i *)(str + i));
+      uint64_t non_ascii = _mm512_movepi8_mask(latin);
+      answer += count_ones(non_ascii);
     }
-  } // while
+  }
+  return answer + scalar::latin1::utf8_length_from_latin1(
+                      reinterpret_cast<const char *>(str + i), length - i);
+}
 
-  return std::make_pair(result(error_code::SUCCESS, buf - start),
-                        reinterpret_cast<char *>(utf8_output));
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos + 64 <= length; pos += 64) {
+    __m512i utf8 = _mm512_loadu_si512((const __m512i *)(input + pos));
+    uint64_t utf8_continuation_mask =
+        _mm512_cmplt_epi8_mask(utf8, _mm512_set1_epi8(-65 + 1));
+    // We count one word for anything that is not a continuation (so
+    // leading bytes).
+    count += 64 - count_ones(utf8_continuation_mask);
+    uint64_t utf8_4byte =
+        _mm512_cmpge_epu8_mask(utf8, _mm512_set1_epi8(int8_t(240)));
+    count += count_ones(utf8_4byte);
+  }
+  return count +
+         scalar::utf8::utf16_length_from_utf8(input + pos, length - pos);
 }
-/* end file src/arm64/arm_convert_utf32_to_utf8.cpp */
 
-} // unnamed namespace
-} // namespace arm64
-} // namespace simdutf
-/* begin file src/generic/buf_block_reader.h */
-namespace simdutf {
-namespace arm64 {
-namespace {
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  const char32_t *ptr = input;
+  size_t count{0};
 
-// Walks through a buffer in block-sized increments, loading the last part with
-// spaces
-template <size_t STEP_SIZE> struct buf_block_reader {
-public:
-  simdutf_really_inline buf_block_reader(const uint8_t *_buf, size_t _len);
-  simdutf_really_inline size_t block_index();
-  simdutf_really_inline bool has_full_block() const;
-  simdutf_really_inline const uint8_t *full_block() const;
-  /**
-   * Get the last block, padded with spaces.
-   *
-   * There will always be a last block, with at least 1 byte, unless len == 0
-   * (in which case this function fills the buffer with spaces and returns 0. In
-   * particular, if len == STEP_SIZE there will be 0 full_blocks and 1 remainder
-   * block with STEP_SIZE bytes and no spaces for padding.
-   *
-   * @return the number of effective characters in the last block.
-   */
-  simdutf_really_inline size_t get_remainder(uint8_t *dst) const;
-  simdutf_really_inline void advance();
+  if (length >= 16) {
+    const char32_t *end = input + length - 16;
 
-private:
-  const uint8_t *buf;
-  const size_t len;
-  const size_t lenminusstep;
-  size_t idx;
-};
+    const __m512i v_0000_007f = _mm512_set1_epi32((uint32_t)0x7f);
+    const __m512i v_0000_07ff = _mm512_set1_epi32((uint32_t)0x7ff);
+    const __m512i v_0000_ffff = _mm512_set1_epi32((uint32_t)0x0000ffff);
 
-// Routines to print masks and text for debugging bitmask operations
-simdutf_unused static char *format_input_text_64(const uint8_t *text) {
-  static char *buf =
-      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
-  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
-    buf[i] = int8_t(text[i]) < ' ' ? '_' : int8_t(text[i]);
+    while (ptr <= end) {
+      __m512i utf32 = _mm512_loadu_si512((const __m512i *)ptr);
+      ptr += 16;
+      __mmask16 ascii_bitmask = _mm512_cmple_epu32_mask(utf32, v_0000_007f);
+      __mmask16 two_bytes_bitmask = _mm512_mask_cmple_epu32_mask(
+          _knot_mask16(ascii_bitmask), utf32, v_0000_07ff);
+      __mmask16 three_bytes_bitmask = _mm512_mask_cmple_epu32_mask(
+          _knot_mask16(_mm512_kor(ascii_bitmask, two_bytes_bitmask)), utf32,
+          v_0000_ffff);
+
+      size_t ascii_count = count_ones(ascii_bitmask);
+      size_t two_bytes_count = count_ones(two_bytes_bitmask);
+      size_t three_bytes_count = count_ones(three_bytes_bitmask);
+      size_t four_bytes_count =
+          16 - ascii_count - two_bytes_count - three_bytes_count;
+      count += ascii_count + 2 * two_bytes_count + 3 * three_bytes_count +
+               4 * four_bytes_count;
+    }
   }
-  buf[sizeof(simd8x64<uint8_t>)] = '\0';
-  return buf;
+
+  return count +
+         scalar::utf32::utf8_length_from_utf32(ptr, length - (ptr - input));
 }
 
-// Routines to print masks and text for debugging bitmask operations
-simdutf_unused static char *format_input_text(const simd8x64<uint8_t> &in) {
-  static char *buf =
-      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
-  in.store(reinterpret_cast<uint8_t *>(buf));
-  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
-    if (buf[i] < ' ') {
-      buf[i] = '_';
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  const char32_t *ptr = input;
+  size_t count{0};
+
+  if (length >= 16) {
+    const char32_t *end = input + length - 16;
+
+    const __m512i v_0000_ffff = _mm512_set1_epi32((uint32_t)0x0000ffff);
+
+    while (ptr <= end) {
+      __m512i utf32 = _mm512_loadu_si512((const __m512i *)ptr);
+      ptr += 16;
+      __mmask16 surrogates_bitmask =
+          _mm512_cmpgt_epu32_mask(utf32, v_0000_ffff);
+
+      count += 16 + count_ones(surrogates_bitmask);
     }
   }
-  buf[sizeof(simd8x64<uint8_t>)] = '\0';
-  return buf;
+
+  return count +
+         scalar::utf32::utf16_length_from_utf32(ptr, length - (ptr - input));
 }
 
-simdutf_unused static char *format_mask(uint64_t mask) {
-  static char *buf = reinterpret_cast<char *>(malloc(64 + 1));
-  for (size_t i = 0; i < 64; i++) {
-    buf[i] = (mask & (size_t(1) << i)) ? 'X' : ' ';
-  }
-  buf[64] = '\0';
-  return buf;
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  return implementation::count_utf8(input, length);
 }
 
-template <size_t STEP_SIZE>
-simdutf_really_inline
-buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len)
-    : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE},
-      idx{0} {}
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char *input, size_t length) const noexcept {
+  return scalar::base64::maximal_binary_length_from_base64(input, length);
+}
 
-template <size_t STEP_SIZE>
-simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() {
-  return idx;
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
-template <size_t STEP_SIZE>
-simdutf_really_inline bool buf_block_reader<STEP_SIZE>::has_full_block() const {
-  return idx < lenminusstep;
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
-template <size_t STEP_SIZE>
-simdutf_really_inline const uint8_t *
-buf_block_reader<STEP_SIZE>::full_block() const {
-  return &buf[idx];
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-template <size_t STEP_SIZE>
-simdutf_really_inline size_t
-buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
-  if (len == idx) {
-    return 0;
-  } // memcpy(dst, null, 0) will trigger an error with some sanitizers
-  std::memset(dst, 0x20,
-              STEP_SIZE); // std::memset STEP_SIZE because it is more efficient
-                          // to write out 8 or 16 bytes at once.
-  std::memcpy(dst, buf + idx, len - idx);
-  return len - idx;
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
-template <size_t STEP_SIZE>
-simdutf_really_inline void buf_block_reader<STEP_SIZE>::advance() {
-  idx += STEP_SIZE;
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
-} // unnamed namespace
-} // namespace arm64
+simdutf_warn_unused size_t implementation::base64_length_from_binary(
+    size_t length, base64_options options) const noexcept {
+  return scalar::base64::base64_length_from_binary(length, options);
+}
+
+size_t implementation::binary_to_base64(const char *input, size_t length,
+                                        char *output,
+                                        base64_options options) const noexcept {
+  if (options & base64_url) {
+    return encode_base64<true>(output, input, length, options);
+  } else {
+    return encode_base64<false>(output, input, length, options);
+  }
+}
+
+} // namespace icelake
 } // namespace simdutf
-/* end file src/generic/buf_block_reader.h */
-/* begin file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
-namespace simdutf {
-namespace arm64 {
-namespace {
-namespace utf8_validation {
 
-using namespace simd;
+/* begin file src/simdutf/icelake/end.h */
+#if SIMDUTF_CAN_ALWAYS_RUN_ICELAKE
+// nothing needed.
+#else
+SIMDUTF_UNTARGET_REGION
+#endif
 
-simdutf_really_inline simd8<uint8_t>
-check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-  // Bit 1 = Too Long (ASCII followed by continuation)
-  // Bit 2 = Overlong 3-byte
-  // Bit 4 = Surrogate
-  // Bit 5 = Overlong 2-byte
-  // Bit 7 = Two Continuations
-  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
-                                               // 11______ 11______
-  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
-  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
-  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
-  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
-  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
-  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
-                                               // 11110100 101_____
-                                               // 11110101 1001____
-                                               // 11110101 101_____
-                                               // 1111011_ 1001____
-                                               // 1111011_ 101_____
-                                               // 11111___ 1001____
-                                               // 11111___ 101_____
-  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
-  // 11110101 1000____
-  // 1111011_ 1000____
-  // 11111___ 1000____
-  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
 
-  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
-      // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG,
-      // 10______ ________ <continuation in byte 1>
-      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
-      // 1100____ ________ <two byte lead in byte 1>
-      TOO_SHORT | OVERLONG_2,
-      // 1101____ ________ <two byte lead in byte 1>
-      TOO_SHORT,
-      // 1110____ ________ <three byte lead in byte 1>
-      TOO_SHORT | OVERLONG_3 | SURROGATE,
-      // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
-  constexpr const uint8_t CARRY =
-      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-  const simd8<uint8_t> byte_1_low =
-      (prev1 & 0x0F)
-          .lookup_16<uint8_t>(
-              // ____0000 ________
-              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-              // ____0001 ________
-              CARRY | OVERLONG_2,
-              // ____001_ ________
-              CARRY, CARRY,
+#if SIMDUTF_GCC11ORMORE // workaround for
+                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+SIMDUTF_POP_DISABLE_WARNINGS
+#endif // end of workaround
+/* end file src/simdutf/icelake/end.h */
+/* end file src/icelake/implementation.cpp */
+#endif
+#if SIMDUTF_IMPLEMENTATION_HASWELL
+/* begin file src/haswell/implementation.cpp */
 
-              // ____0100 ________
-              CARRY | TOO_LARGE,
-              // ____0101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              // ____011_ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
+/* begin file src/simdutf/haswell/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "haswell"
+// #define SIMDUTF_IMPLEMENTATION haswell
 
-              // ____1___ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              // ____1101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000);
-  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
-      // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT,
+#if SIMDUTF_CAN_ALWAYS_RUN_HASWELL
+// nothing needed.
+#else
+SIMDUTF_TARGET_HASWELL
+#endif
 
-      // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
-          OVERLONG_4,
-      // ________ 1001____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
-      // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+#if SIMDUTF_GCC11ORMORE // workaround for
+                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+// clang-format off
+SIMDUTF_DISABLE_GCC_WARNING(-Wmaybe-uninitialized)
+// clang-format on
+#endif // end of workaround
+/* end file src/simdutf/haswell/begin.h */
+namespace simdutf {
+namespace haswell {
+namespace {
+#ifndef SIMDUTF_HASWELL_H
+  #error "haswell.h must be included"
+#endif
+using namespace simd;
 
-      // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
-  return (byte_1_high & byte_1_low & byte_2_high);
+simdutf_really_inline bool is_ascii(const simd8x64<uint8_t> &input) {
+  return input.reduce_or().is_ascii();
 }
-simdutf_really_inline simd8<uint8_t>
-check_multibyte_lengths(const simd8<uint8_t> input,
-                        const simd8<uint8_t> prev_input,
-                        const simd8<uint8_t> sc) {
-  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-  simd8<uint8_t> must23 =
-      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-  return must23_80 ^ sc;
+
+simdutf_unused simdutf_really_inline simd8<bool>
+must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2,
+                     const simd8<uint8_t> prev3) {
+  simd8<uint8_t> is_second_byte =
+      prev1.saturating_sub(0b11000000u - 1); // Only 11______ will be > 0
+  simd8<uint8_t> is_third_byte =
+      prev2.saturating_sub(0b11100000u - 1); // Only 111_____ will be > 0
+  simd8<uint8_t> is_fourth_byte =
+      prev3.saturating_sub(0b11110000u - 1); // Only 1111____ will be > 0
+  // Caller requires a bool (all 1's). All values resulting from the subtraction
+  // will be <= 64, so signed comparison is fine.
+  return simd8<int8_t>(is_second_byte | is_third_byte | is_fourth_byte) >
+         int8_t(0);
 }
 
-//
-// Return nonzero if there are incomplete multibyte characters at the end of the
-// block: e.g. if there is a 4-byte character, but it is 3 bytes from the end.
-//
-simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
-  // If the previous input's last 3 bytes match this, they're too short (they
-  // ended at EOF):
-  // ... 1111____ 111_____ 11______
-  static const uint8_t max_array[32] = {255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        0b11110000u - 1,
-                                        0b11100000u - 1,
-                                        0b11000000u - 1};
-  const simd8<uint8_t> max_value(
-      &max_array[sizeof(max_array) - sizeof(simd8<uint8_t>)]);
-  return input.gt_bits(max_value);
+simdutf_really_inline simd8<bool>
+must_be_2_3_continuation(const simd8<uint8_t> prev2,
+                         const simd8<uint8_t> prev3) {
+  simd8<uint8_t> is_third_byte =
+      prev2.saturating_sub(0xe0u - 0x80); // Only 111_____ will be > 0x80
+  simd8<uint8_t> is_fourth_byte =
+      prev3.saturating_sub(0xf0u - 0x80); // Only 1111____ will be > 0x80
+  return simd8<bool>(is_third_byte | is_fourth_byte);
 }
 
-struct utf8_checker {
-  // If this is nonzero, there has been a UTF-8 error.
-  simd8<uint8_t> error;
-  // The last input we received
-  simd8<uint8_t> prev_input_block;
-  // Whether the last input we received was incomplete (used for ASCII fast
-  // path)
-  simd8<uint8_t> prev_incomplete;
+/* begin file src/haswell/avx2_validate_utf16.cpp */
+/*
+    In UTF-16, code units in the range 0xD800 to 0xDFFF have special meaning.
+
+    In a vectorized algorithm we want to examine the most significant
+    nibble in order to select a fast path. If none of the highest nibbles
+    are 0xD (13), then we are sure that the UTF-16 chunk in a vector
+    register is valid.
+
+    Let us analyze what we need to check if the nibble is 0xD. The
+    value of the preceding nibble determines what we have:
+
+    0xd000 .. 0xd7ff - a valid word
+    0xd800 .. 0xdbff - low surrogate
+    0xdc00 .. 0xdfff - high surrogate
+
+    Other constraints we have to consider:
+    - there must not be two consecutive low surrogates (0xd800 .. 0xdbff)
+    - there must not be two consecutive high surrogates (0xdc00 .. 0xdfff)
+    - there must not be a sole low surrogate or a sole high surrogate
+
+    We're going to build three bitmasks based on the 3rd nibble:
+    - V = valid word,
+    - L = low surrogate (0xd800 .. 0xdbff)
+    - H = high surrogate (0xdc00 .. 0xdfff)
+
+      0   1   2   3   4   5   6   7    <--- word index
+    [ V | L | H | L | H | V | V | L ]
+      1   0   0   0   0   1   1   0     - V = valid masks
+      0   1   0   1   0   0   0   1     - L = low surrogate
+      0   0   1   0   1   0   0   0     - H = high surrogate
+
+
+      1   0   0   0   0   1   1   0   V = valid masks
+      0   1   0   1   0   0   0   0   a = L & (H >> 1)
+      0   0   1   0   1   0   0   0   b = a << 1
+      1   1   1   1   1   1   1   0   c = V | a | b
+                                  ^
+                                  the last bit can be zero; we just consume 7
+   code units and recheck this word in the next iteration
+*/
+
+/* Returns:
+   - pointer to the last unprocessed character (a scalar fallback should check
+   the rest);
+   - nullptr if an error was detected.
+*/
+template <endianness big_endian>
+const char16_t *avx2_validate_utf16(const char16_t *input, size_t size) {
+  const char16_t *end = input + size;
+
+  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
+  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
+  const auto v_fc = simd8<uint8_t>::splat(0xfc);
+  const auto v_dc = simd8<uint8_t>::splat(0xdc);
+
+  while (input + simd16<uint16_t>::ELEMENTS * 2 < end) {
+    // 0. Load data: since the validation takes into account only the higher
+    //    byte of each word, we compress the two vectors into one which
+    //    consists only of the higher bytes.
+    auto in0 = simd16<uint16_t>(input);
+    auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::ELEMENTS);
+
+    if (big_endian) {
+      in0 = in0.swap_bytes();
+      in1 = in1.swap_bytes();
+    }
+
+    const auto t0 = in0.shr<8>();
+    const auto t1 = in1.shr<8>();
+
+    const auto in = simd16<uint16_t>::pack(t0, t1);
+
+    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
+    const auto surrogates_wordmask = (in & v_f8) == v_d8;
+    const uint32_t surrogates_bitmask = surrogates_wordmask.to_bitmask();
+    if (surrogates_bitmask == 0x0) {
+      input += simd16<uint16_t>::ELEMENTS * 2;
+    } else {
+      // 2. We have some surrogates that have to be distinguished:
+      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
+      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
+      //
+      //    Fact: high surrogate has 11th bit set (3rd bit in the higher byte)
+
+      // V - non-surrogate code units
+      //     V = not surrogates_wordmask
+      const uint32_t V = ~surrogates_bitmask;
 
-  //
-  // Check whether the current bytes are valid UTF-8.
-  //
-  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
-                                              const simd8<uint8_t> prev_input) {
-    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
-    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
-    // small negative numbers)
-    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-    simd8<uint8_t> sc = check_special_cases(input, prev1);
-    this->error |= check_multibyte_lengths(input, prev_input, sc);
-  }
+      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
+      const auto vH = (in & v_fc) == v_dc;
+      const uint32_t H = vH.to_bitmask();
 
-  // The only problem that can happen at EOF is that a multibyte character is
-  // too short or a byte value too large in the last bytes: check_special_cases
-  // only checks for bytes too large in the first of two bytes.
-  simdutf_really_inline void check_eof() {
-    // If the previous block had incomplete UTF-8 characters at the end, an
-    // ASCII block can't possibly finish them.
-    this->error |= this->prev_incomplete;
-  }
+      // L - word mask for low surrogates
+      //     L = not H and surrogates_wordmask
+      const uint32_t L = ~H & surrogates_bitmask;
 
-  simdutf_really_inline void check_next_input(const simd8x64<uint8_t> &input) {
-    if (simdutf_likely(is_ascii(input))) {
-      this->error |= this->prev_incomplete;
-    } else {
-      // you might think that a for-loop would work, but under Visual Studio, it
-      // is not good enough.
-      static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
-                        (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-                    "We support either two or four chunks per 64-byte block.");
-      if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
-        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-      } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
-        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-        this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+      const uint32_t a =
+          L & (H >> 1); // A low surrogate must be followed by a high one.
+                        // (A low surrogate placed in the last word of the
+                        // register is an exception we handle.)
+      const uint32_t b =
+          a << 1; // Just mark that the opposite fact holds;
+                  // thanks to that we have only two masks for the valid case.
+      const uint32_t c = V | a | b; // Combine all the masks into the final one.
+
+      if (c == 0xffffffff) {
+        // The whole input register contains valid UTF-16, i.e.,
+        // either single code units or proper surrogate pairs.
+        input += simd16<uint16_t>::ELEMENTS * 2;
+      } else if (c == 0x7fffffff) {
+        // The 31 lower code units of the input register contain valid UTF-16.
+        // The last word may be either a low or high surrogate. In the next
+        // iteration we 1) check if the low surrogate is followed by a high
+        // one, 2) reject a sole high surrogate.
+        input += simd16<uint16_t>::ELEMENTS * 2 - 1;
+      } else {
+        return nullptr;
       }
-      this->prev_incomplete =
-          is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1]);
-      this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1];
     }
   }
 
-  // do not forget to call check_eof!
-  simdutf_really_inline bool errors() const {
-    return this->error.any_bits_set_anywhere();
+  return input;
+}
+
+template <endianness big_endian>
+const result avx2_validate_utf16_with_errors(const char16_t *input,
+                                             size_t size) {
+  if (simdutf_unlikely(size == 0)) {
+    return result(error_code::SUCCESS, 0);
   }
+  const char16_t *start = input;
+  const char16_t *end = input + size;
 
-}; // struct utf8_checker
-} // namespace utf8_validation
+  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
+  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
+  const auto v_fc = simd8<uint8_t>::splat(0xfc);
+  const auto v_dc = simd8<uint8_t>::splat(0xdc);
 
-using utf8_validation::utf8_checker;
+  while (input + simd16<uint16_t>::ELEMENTS * 2 < end) {
+    // 0. Load data: since the validation takes into account only the higher
+    //    byte of each word, we compress the two vectors into one which
+    //    consists only of the higher bytes.
+    auto in0 = simd16<uint16_t>(input);
+    auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::ELEMENTS);
 
-} // unnamed namespace
-} // namespace arm64
-} // namespace simdutf
-/* end file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
-/* begin file src/generic/utf8_validation/utf8_validator.h */
-namespace simdutf {
-namespace arm64 {
-namespace {
-namespace utf8_validation {
+    if (big_endian) {
+      in0 = in0.swap_bytes();
+      in1 = in1.swap_bytes();
+    }
 
-/**
- * Validates that the string is actual UTF-8.
- */
-template <class checker>
-bool generic_validate_utf8(const uint8_t *input, size_t length) {
-  checker c{};
-  buf_block_reader<64> reader(input, length);
-  while (reader.has_full_block()) {
-    simd::simd8x64<uint8_t> in(reader.full_block());
-    c.check_next_input(in);
-    reader.advance();
-  }
-  uint8_t block[64]{};
-  reader.get_remainder(block);
-  simd::simd8x64<uint8_t> in(block);
-  c.check_next_input(in);
-  reader.advance();
-  c.check_eof();
-  return !c.errors();
-}
+    const auto t0 = in0.shr<8>();
+    const auto t1 = in1.shr<8>();
 
-bool generic_validate_utf8(const char *input, size_t length) {
-  return generic_validate_utf8<utf8_checker>(
-      reinterpret_cast<const uint8_t *>(input), length);
-}
+    const auto in = simd16<uint16_t>::pack(t0, t1);
 
-/**
- * Validates that the string is actual UTF-8 and stops on errors.
- */
-template <class checker>
-result generic_validate_utf8_with_errors(const uint8_t *input, size_t length) {
-  checker c{};
-  buf_block_reader<64> reader(input, length);
-  size_t count{0};
-  while (reader.has_full_block()) {
-    simd::simd8x64<uint8_t> in(reader.full_block());
-    c.check_next_input(in);
-    if (c.errors()) {
-      if (count != 0) {
-        count--;
-      } // Sometimes the error is only detected in the next chunk
-      result res = scalar::utf8::rewind_and_validate_with_errors(
-          reinterpret_cast<const char *>(input),
-          reinterpret_cast<const char *>(input + count), length - count);
-      res.count += count;
-      return res;
+    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
+    const auto surrogates_wordmask = (in & v_f8) == v_d8;
+    const uint32_t surrogates_bitmask = surrogates_wordmask.to_bitmask();
+    if (surrogates_bitmask == 0x0) {
+      input += simd16<uint16_t>::ELEMENTS * 2;
+    } else {
+      // 2. We have some surrogates that have to be distinguished:
+      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
+      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
+      //
+      //    Fact: high surrogate has 11th bit set (3rd bit in the higher byte)
+
+      // V - non-surrogate code units
+      //     V = not surrogates_wordmask
+      const uint32_t V = ~surrogates_bitmask;
+
+      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
+      const auto vH = (in & v_fc) == v_dc;
+      const uint32_t H = vH.to_bitmask();
+
+      // L - word mask for low surrogates
+      //     L = not H and surrogates_wordmask
+      const uint32_t L = ~H & surrogates_bitmask;
+
+      const uint32_t a =
+          L & (H >> 1); // A low surrogate must be followed by a high one.
+                        // (A low surrogate placed in the last word of the
+                        // register is an exception we handle.)
+      const uint32_t b =
+          a << 1; // Just mark that the opposite fact holds;
+                  // thanks to that we have only two masks for the valid case.
+      const uint32_t c = V | a | b; // Combine all the masks into the final one.
+
+      if (c == 0xffffffff) {
+        // The whole input register contains valid UTF-16, i.e.,
+        // either single code units or proper surrogate pairs.
+        input += simd16<uint16_t>::ELEMENTS * 2;
+      } else if (c == 0x7fffffff) {
+        // The 31 lower code units of the input register contain valid UTF-16.
+        // The last word may be either a low or high surrogate. In the next
+        // iteration we 1) check if the low surrogate is followed by a high
+        // one, 2) reject a sole high surrogate.
+        input += simd16<uint16_t>::ELEMENTS * 2 - 1;
+      } else {
+        return result(error_code::SURROGATE, input - start);
+      }
     }
-    reader.advance();
-    count += 64;
-  }
-  uint8_t block[64]{};
-  reader.get_remainder(block);
-  simd::simd8x64<uint8_t> in(block);
-  c.check_next_input(in);
-  reader.advance();
-  c.check_eof();
-  if (c.errors()) {
-    if (count != 0) {
-      count--;
-    } // Sometimes the error is only detected in the next chunk
-    result res = scalar::utf8::rewind_and_validate_with_errors(
-        reinterpret_cast<const char *>(input),
-        reinterpret_cast<const char *>(input) + count, length - count);
-    res.count += count;
-    return res;
-  } else {
-    return result(error_code::SUCCESS, length);
   }
-}
 
-result generic_validate_utf8_with_errors(const char *input, size_t length) {
-  return generic_validate_utf8_with_errors<utf8_checker>(
-      reinterpret_cast<const uint8_t *>(input), length);
+  return result(error_code::SUCCESS, input - start);
 }
+/* end file src/haswell/avx2_validate_utf16.cpp */
+/* begin file src/haswell/avx2_validate_utf32le.cpp */
+/* Returns:
+   - pointer to the last unprocessed character (a scalar fallback should check
+   the rest);
+   - nullptr if an error was detected.
+*/
+const char32_t *avx2_validate_utf32le(const char32_t *input, size_t size) {
+  const char32_t *end = input + size;
 
-template <class checker>
-bool generic_validate_ascii(const uint8_t *input, size_t length) {
-  buf_block_reader<64> reader(input, length);
-  uint8_t blocks[64]{};
-  simd::simd8x64<uint8_t> running_or(blocks);
-  while (reader.has_full_block()) {
-    simd::simd8x64<uint8_t> in(reader.full_block());
-    running_or |= in;
-    reader.advance();
+  const __m256i standardmax = _mm256_set1_epi32(0x10ffff);
+  const __m256i offset = _mm256_set1_epi32(0xffff2000);
+  const __m256i standardoffsetmax = _mm256_set1_epi32(0xfffff7ff);
+  __m256i currentmax = _mm256_setzero_si256();
+  __m256i currentoffsetmax = _mm256_setzero_si256();
+
+  while (input + 8 < end) {
+    const __m256i in = _mm256_loadu_si256((__m256i *)input);
+    currentmax = _mm256_max_epu32(in, currentmax);
+    currentoffsetmax =
+        _mm256_max_epu32(_mm256_add_epi32(in, offset), currentoffsetmax);
+    input += 8;
+  }
+  __m256i is_zero =
+      _mm256_xor_si256(_mm256_max_epu32(currentmax, standardmax), standardmax);
+  if (_mm256_testz_si256(is_zero, is_zero) == 0) {
+    return nullptr;
   }
-  uint8_t block[64]{};
-  reader.get_remainder(block);
-  simd::simd8x64<uint8_t> in(block);
-  running_or |= in;
-  return running_or.is_ascii();
-}
 
-bool generic_validate_ascii(const char *input, size_t length) {
-  return generic_validate_ascii<utf8_checker>(
-      reinterpret_cast<const uint8_t *>(input), length);
+  is_zero = _mm256_xor_si256(
+      _mm256_max_epu32(currentoffsetmax, standardoffsetmax), standardoffsetmax);
+  if (_mm256_testz_si256(is_zero, is_zero) == 0) {
+    return nullptr;
+  }
+
+  return input;
 }
 
-template <class checker>
-result generic_validate_ascii_with_errors(const uint8_t *input, size_t length) {
-  buf_block_reader<64> reader(input, length);
-  size_t count{0};
-  while (reader.has_full_block()) {
-    simd::simd8x64<uint8_t> in(reader.full_block());
-    if (!in.is_ascii()) {
-      result res = scalar::ascii::validate_with_errors(
-          reinterpret_cast<const char *>(input + count), length - count);
-      return result(res.error, count + res.count);
+const result avx2_validate_utf32le_with_errors(const char32_t *input,
+                                               size_t size) {
+  const char32_t *start = input;
+  const char32_t *end = input + size;
+
+  const __m256i standardmax = _mm256_set1_epi32(0x10ffff);
+  const __m256i offset = _mm256_set1_epi32(0xffff2000);
+  const __m256i standardoffsetmax = _mm256_set1_epi32(0xfffff7ff);
+  __m256i currentmax = _mm256_setzero_si256();
+  __m256i currentoffsetmax = _mm256_setzero_si256();
+
+  while (input + 8 < end) {
+    const __m256i in = _mm256_loadu_si256((__m256i *)input);
+    currentmax = _mm256_max_epu32(in, currentmax);
+    currentoffsetmax =
+        _mm256_max_epu32(_mm256_add_epi32(in, offset), currentoffsetmax);
+
+    __m256i is_zero = _mm256_xor_si256(
+        _mm256_max_epu32(currentmax, standardmax), standardmax);
+    if (_mm256_testz_si256(is_zero, is_zero) == 0) {
+      return result(error_code::TOO_LARGE, input - start);
     }
-    reader.advance();
 
-    count += 64;
-  }
-  uint8_t block[64]{};
-  reader.get_remainder(block);
-  simd::simd8x64<uint8_t> in(block);
-  if (!in.is_ascii()) {
-    result res = scalar::ascii::validate_with_errors(
-        reinterpret_cast<const char *>(input + count), length - count);
-    return result(res.error, count + res.count);
-  } else {
-    return result(error_code::SUCCESS, length);
+    is_zero =
+        _mm256_xor_si256(_mm256_max_epu32(currentoffsetmax, standardoffsetmax),
+                         standardoffsetmax);
+    if (_mm256_testz_si256(is_zero, is_zero) == 0) {
+      return result(error_code::SURROGATE, input - start);
+    }
+    input += 8;
   }
+
+  return result(error_code::SUCCESS, input - start);
 }
+/* end file src/haswell/avx2_validate_utf32le.cpp */
 
-result generic_validate_ascii_with_errors(const char *input, size_t length) {
-  return generic_validate_ascii_with_errors<utf8_checker>(
-      reinterpret_cast<const uint8_t *>(input), length);
+/* begin file src/haswell/avx2_convert_latin1_to_utf8.cpp */
+std::pair<const char *, char *>
+avx2_convert_latin1_to_utf8(const char *latin1_input, size_t len,
+                            char *utf8_output) {
+  const char *end = latin1_input + len;
+  const __m256i v_0000 = _mm256_setzero_si256();
+  const __m256i v_c080 = _mm256_set1_epi16((int16_t)0xc080);
+  const __m256i v_ff80 = _mm256_set1_epi16((int16_t)0xff80);
+  const size_t safety_margin = 12;
+
+  while (end - latin1_input >= std::ptrdiff_t(16 + safety_margin)) {
+    __m128i in8 = _mm_loadu_si128((__m128i *)latin1_input);
+    // a single Latin-1 byte can yield 1 or 2 UTF-8 bytes
+    const __m128i v_80 = _mm_set1_epi8((char)0x80);
+    if (_mm_testz_si128(in8, v_80)) { // ASCII fast path!!!!
+      // 1. store (16 bytes)
+      _mm_storeu_si128((__m128i *)utf8_output, in8);
+      // 2. adjust pointers
+      latin1_input += 16;
+      utf8_output += 16;
+      continue; // we are done for this round!
+    }
+    // We proceed only with the first 16 bytes.
+    const __m256i in = _mm256_cvtepu8_epi16((in8));
+
+    // 1. prepare 2-byte values
+    // input 16-bit word : [0000|0000|aabb|bbbb] x 16
+    // expected output   : [1100|00aa|10bb|bbbb] x 16
+    const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
+    const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
+
+    // t0 = [0000|00aa|bbbb|bb00]
+    const __m256i t0 = _mm256_slli_epi16(in, 2);
+    // t1 = [0000|00aa|0000|0000]
+    const __m256i t1 = _mm256_and_si256(t0, v_1f00);
+    // t2 = [0000|0000|00bb|bbbb]
+    const __m256i t2 = _mm256_and_si256(in, v_003f);
+    // t3 = [0000|00aa|00bb|bbbb]
+    const __m256i t3 = _mm256_or_si256(t1, t2);
+    // t4 = [1100|00aa|10bb|bbbb]
+    const __m256i t4 = _mm256_or_si256(t3, v_c080);
+
+    // 2. merge ASCII and 2-byte codewords
+
+    // no bits set above 7th bit
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
+
+    const __m256i utf8_unpacked = _mm256_blendv_epi8(t4, in, one_byte_bytemask);
+
+    // 3. prepare bitmask for 8-bit lookup
+    const uint32_t M0 = one_byte_bitmask & 0x55555555;
+    const uint32_t M1 = M0 >> 7;
+    const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
+    // 4. pack the bytes
+
+    const uint8_t *row =
+        &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
+    const uint8_t *row_2 =
+        &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >> 16)]
+                                                            [0];
+
+    const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+    const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
+
+    const __m256i utf8_packed = _mm256_shuffle_epi8(
+        utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
+    // 5. store bytes
+    _mm_storeu_si128((__m128i *)utf8_output,
+                     _mm256_castsi256_si128(utf8_packed));
+    utf8_output += row[0];
+    _mm_storeu_si128((__m128i *)utf8_output,
+                     _mm256_extractf128_si256(utf8_packed, 1));
+    utf8_output += row_2[0];
+
+    // 6. adjust pointers
+    latin1_input += 16;
+    continue;
+
+  } // while
+  return std::make_pair(latin1_input, utf8_output);
 }
+/* end file src/haswell/avx2_convert_latin1_to_utf8.cpp */
+/* begin file src/haswell/avx2_convert_latin1_to_utf16.cpp */
+template <endianness big_endian>
+std::pair<const char *, char16_t *>
+avx2_convert_latin1_to_utf16(const char *latin1_input, size_t len,
+                             char16_t *utf16_output) {
+  size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 16
 
-} // namespace utf8_validation
-} // unnamed namespace
-} // namespace arm64
-} // namespace simdutf
-/* end file src/generic/utf8_validation/utf8_validator.h */
-// transcoding from UTF-8 to UTF-16
-/* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
+  size_t i = 0;
+  for (; i < rounded_len; i += 16) {
+    // Load 16 bytes from the address (input + i) into an xmm register
+    __m128i xmm0 =
+        _mm_loadu_si128(reinterpret_cast<const __m128i *>(latin1_input + i));
 
-namespace simdutf {
-namespace arm64 {
-namespace {
-namespace utf8_to_utf16 {
-using namespace simd;
+    // Zero extend each byte in xmm0 to word and put it in another xmm register
+    __m128i xmm1 = _mm_cvtepu8_epi16(xmm0);
 
-simdutf_really_inline simd8<uint8_t>
-check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-  // Bit 1 = Too Long (ASCII followed by continuation)
-  // Bit 2 = Overlong 3-byte
-  // Bit 4 = Surrogate
-  // Bit 5 = Overlong 2-byte
-  // Bit 7 = Two Continuations
-  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
-                                               // 11______ 11______
-  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
-  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
-  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
-  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
-  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
-  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
-                                               // 11110100 101_____
-                                               // 11110101 1001____
-                                               // 11110101 101_____
-                                               // 1111011_ 1001____
-                                               // 1111011_ 101_____
-                                               // 11111___ 1001____
-                                               // 11111___ 101_____
-  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
-  // 11110101 1000____
-  // 1111011_ 1000____
-  // 11111___ 1000____
-  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+    // Shift xmm0 to the right by 8 bytes
+    xmm0 = _mm_srli_si128(xmm0, 8);
 
-  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
-      // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG,
-      // 10______ ________ <continuation in byte 1>
-      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
-      // 1100____ ________ <two byte lead in byte 1>
-      TOO_SHORT | OVERLONG_2,
-      // 1101____ ________ <two byte lead in byte 1>
-      TOO_SHORT,
-      // 1110____ ________ <three byte lead in byte 1>
-      TOO_SHORT | OVERLONG_3 | SURROGATE,
-      // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
-  constexpr const uint8_t CARRY =
-      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-  const simd8<uint8_t> byte_1_low =
-      (prev1 & 0x0F)
-          .lookup_16<uint8_t>(
-              // ____0000 ________
-              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-              // ____0001 ________
-              CARRY | OVERLONG_2,
-              // ____001_ ________
-              CARRY, CARRY,
+    // Zero extend each byte in the shifted xmm0 to word in xmm0
+    xmm0 = _mm_cvtepu8_epi16(xmm0);
 
-              // ____0100 ________
-              CARRY | TOO_LARGE,
-              // ____0101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              // ____011_ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
+    if (big_endian) {
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      xmm0 = _mm_shuffle_epi8(xmm0, swap);
+      xmm1 = _mm_shuffle_epi8(xmm1, swap);
+    }
 
-              // ____1___ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              // ____1101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000);
-  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
-      // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT,
+    // Store the contents of xmm1 into the address pointed by (output + i)
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf16_output + i), xmm1);
 
-      // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
-          OVERLONG_4,
-      // ________ 1001____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
-      // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+    // Store the contents of xmm0 into the address pointed by (output + i + 8)
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf16_output + i + 8), xmm0);
+  }
 
-      // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
-  return (byte_1_high & byte_1_low & byte_2_high);
+  return std::make_pair(latin1_input + rounded_len, utf16_output + rounded_len);
 }
-simdutf_really_inline simd8<uint8_t>
-check_multibyte_lengths(const simd8<uint8_t> input,
-                        const simd8<uint8_t> prev_input,
-                        const simd8<uint8_t> sc) {
-  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-  simd8<uint8_t> must23 =
-      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-  return must23_80 ^ sc;
+/* end file src/haswell/avx2_convert_latin1_to_utf16.cpp */
+/* begin file src/haswell/avx2_convert_latin1_to_utf32.cpp */
+std::pair<const char *, char32_t *>
+avx2_convert_latin1_to_utf32(const char *buf, size_t len,
+                             char32_t *utf32_output) {
+  size_t rounded_len = ((len | 7) ^ 7); // Round down to nearest multiple of 8
+
+  for (size_t i = 0; i < rounded_len; i += 8) {
+    // Load 8 Latin1 characters into the low 64 bits of an xmm register
+    __m128i in = _mm_loadl_epi64((__m128i *)&buf[i]);
+
+    // Zero extend each set of 8 Latin1 characters to 8 32-bit integers using
+    // vpmovzxbd
+    __m256i out = _mm256_cvtepu8_epi32(in);
+
+    // Store the results back to memory
+    _mm256_storeu_si256((__m256i *)&utf32_output[i], out);
+  }
+
+  // return pointers pointing to where we left off
+  return std::make_pair(buf + rounded_len, utf32_output + rounded_len);
 }
+/* end file src/haswell/avx2_convert_latin1_to_utf32.cpp */
 
-struct validating_transcoder {
-  // If this is nonzero, there has been a UTF-8 error.
-  simd8<uint8_t> error;
+/* begin file src/haswell/avx2_convert_utf8_to_utf16.cpp */
+// depends on "tables/utf8_to_utf16_tables.h"
 
-  validating_transcoder() : error(uint8_t(0)) {}
+// Convert up to 12 bytes from utf8 to utf16 using a mask indicating the
+// end of the code points. Only the least significant 12 bits of the mask
+// are accessed, except on the 2-byte fast path, which reads 16 bits.
+// It returns how many bytes were consumed (up to 12, or 16 on that path).
+template <endianness big_endian>
+size_t convert_masked_utf8_to_utf16(const char *input,
+                                    uint64_t utf8_end_of_code_point_mask,
+                                    char16_t *&utf16_output) {
+  // we use an approach where we try to process up to 12 input bytes.
+  // Why 12 input bytes and not 16? Because we are concerned with the size of
+  // the lookup tables. Also 12 is nicely divisible by two and three.
   //
-  // Check whether the current bytes are valid UTF-8.
   //
-  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
-                                              const simd8<uint8_t> prev_input) {
-    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
-    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
-    // small negative numbers)
-    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-    simd8<uint8_t> sc = check_special_cases(input, prev1);
-    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // may be beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
+  //
+  // We first try a few fast paths.
+  const __m128i swap =
+      _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+  const __m128i in = _mm_loadu_si128((__m128i *)input);
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask & 0xfff;
+  if (utf8_end_of_code_point_mask == 0xfff) {
+    // We process the data in chunks of 12 bytes.
+    __m256i ascii = _mm256_cvtepu8_epi16(in);
+    if (big_endian) {
+      const __m256i swap256 = _mm256_setr_epi8(
+          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+      ascii = _mm256_shuffle_epi8(ascii, swap256);
+    }
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf16_output), ascii);
+    utf16_output += 12; // We wrote 12 16-bit characters.
+    return 12;          // We consumed 12 bytes.
+  }
+  if (((utf8_end_of_code_point_mask & 0xffff) == 0xaaaa)) {
+    // We want to take 8 2-byte UTF-8 sequences and turn them into 8 2-byte
+    // UTF-16 code units. There is probably a more efficient sequence, but the
+    // following might do.
+    const __m128i sh =
+        _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+    const __m128i perm = _mm_shuffle_epi8(in, sh);
+    const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
+    const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
+    __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
+    if (big_endian)
+      composed = _mm_shuffle_epi8(composed, swap);
+    _mm_storeu_si128((__m128i *)utf16_output, composed);
+    utf16_output += 8; // We wrote 16 bytes, 8 code points.
+    return 16;
+  }
+  if (input_utf8_end_of_code_point_mask == 0x924) {
+    // We want to take 4 3-byte UTF-8 sequences and turn them into 4 2-byte
+    // UTF-16 code units. There is probably a more efficient sequence, but the
+    // following might do.
+    const __m128i sh =
+        _mm_setr_epi8(2, 1, 0, -1, 5, 4, 3, -1, 8, 7, 6, -1, 11, 10, 9, -1);
+    const __m128i perm = _mm_shuffle_epi8(in, sh);
+    const __m128i ascii =
+        _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
+    const __m128i middlebyte =
+        _mm_and_si128(perm, _mm_set1_epi32(0x3f00)); // 5 or 6 bits
+    const __m128i middlebyte_shifted = _mm_srli_epi32(middlebyte, 2);
+    const __m128i highbyte =
+        _mm_and_si128(perm, _mm_set1_epi32(0x0f0000)); // 4 bits
+    const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 4);
+    const __m128i composed =
+        _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted), highbyte_shifted);
+    __m128i composed_repacked = _mm_packus_epi32(composed, composed);
+    if (big_endian)
+      composed_repacked = _mm_shuffle_epi8(composed_repacked, swap);
+    _mm_storeu_si128((__m128i *)utf16_output, composed_repacked);
+    utf16_output += 4;
+    return 12;
   }
 
-  template <endianness endian>
-  simdutf_really_inline size_t convert(const char *in, size_t size,
-                                       char16_t *utf16_output) {
-    size_t pos = 0;
-    char16_t *start{utf16_output};
-    // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
-    // last 16 bytes, and if the data is valid, then it is entirely safe because
-    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
-    // generally assume that you have valid UTF-8 input, so we are going to go
-    // back from the end counting 8 leading bytes, to give us a good margin.
-    size_t leading_byte = 0;
-    size_t margin = size;
-    for (; margin > 0 && leading_byte < 8; margin--) {
-      leading_byte += (int8_t(in[margin - 1]) > -65);
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
+  if (idx < 64) {
+    // SIX (6) input code points
+    // this is a relatively easy scenario
+    // we process SIX (6) input code points. The max length in bytes of six
+    // code points spanning between 1 and 2 bytes each is 12 bytes. On
+    // processors where pdep/pext is fast, we might be able to use a small
+    // lookup table.
+    const __m128i sh = _mm_loadu_si128(
+        (const __m128i *)simdutf::tables::utf8_to_utf16::shufutf8[idx]);
+    const __m128i perm = _mm_shuffle_epi8(in, sh);
+    const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
+    const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
+    __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
+    if (big_endian)
+      composed = _mm_shuffle_epi8(composed, swap);
+    _mm_storeu_si128((__m128i *)utf16_output, composed);
+    utf16_output += 6; // We wrote 12 bytes, 6 code points. There is a potential
+                       // overflow of 4 bytes.
+  } else if (idx < 145) {
+    // FOUR (4) input code points
+    const __m128i sh = _mm_loadu_si128(
+        (const __m128i *)simdutf::tables::utf8_to_utf16::shufutf8[idx]);
+    const __m128i perm = _mm_shuffle_epi8(in, sh);
+    const __m128i ascii =
+        _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
+    const __m128i middlebyte =
+        _mm_and_si128(perm, _mm_set1_epi32(0x3f00)); // 5 or 6 bits
+    const __m128i middlebyte_shifted = _mm_srli_epi32(middlebyte, 2);
+    const __m128i highbyte =
+        _mm_and_si128(perm, _mm_set1_epi32(0x0f0000)); // 4 bits
+    const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 4);
+    const __m128i composed =
+        _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted), highbyte_shifted);
+    __m128i composed_repacked = _mm_packus_epi32(composed, composed);
+    if (big_endian)
+      composed_repacked = _mm_shuffle_epi8(composed_repacked, swap);
+    _mm_storeu_si128((__m128i *)utf16_output, composed_repacked);
+    utf16_output += 4; // Here we overflow by 8 bytes.
+  } else if (idx < 209) {
+    // TWO (2) input code points
+    //////////////
+    // There might be garbage inputs where a leading byte masquerades as a
+    // four-byte leading byte (by being followed by 3 continuation bytes), but
+    // is smaller than 0xf0. This could trigger a buffer overflow if we only
+    // counted leading bytes of the form 0xf0 as generating surrogate pairs,
+    // without further UTF-8 validation. Thus we must be careful to ensure that
+    // only leading bytes at least as large as 0xf0 generate surrogate pairs. We
+    // do so at the cost of an extra mask.
+    /////////////
+    const __m128i sh = _mm_loadu_si128(
+        (const __m128i *)simdutf::tables::utf8_to_utf16::shufutf8[idx]);
+    const __m128i perm = _mm_shuffle_epi8(in, sh);
+    const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi32(0x7f));
+    const __m128i middlebyte = _mm_and_si128(perm, _mm_set1_epi32(0x3f00));
+    const __m128i middlebyte_shifted = _mm_srli_epi32(middlebyte, 2);
+    __m128i middlehighbyte = _mm_and_si128(perm, _mm_set1_epi32(0x3f0000));
+    // correct for spurious high bit
+    const __m128i correct =
+        _mm_srli_epi32(_mm_and_si128(perm, _mm_set1_epi32(0x400000)), 1);
+    middlehighbyte = _mm_xor_si128(correct, middlehighbyte);
+    const __m128i middlehighbyte_shifted = _mm_srli_epi32(middlehighbyte, 4);
+    // We deliberately carry the leading four bits in highbyte if they are
+    // present; we remove them later when computing hightenbits.
+    const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi32(0xff000000));
+    const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 6);
+    // When we need to generate a surrogate pair (leading byte >= 0xF0), then
+    // the corresponding 32-bit value in 'composed' will be greater than
+    // (0xf0000000 >> 6), that is, greater than 0x3c00000. This can be used
+    // later to identify the location of the surrogate pairs.
+    const __m128i composed =
+        _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted),
+                     _mm_or_si128(highbyte_shifted, middlehighbyte_shifted));
+    const __m128i composedminus =
+        _mm_sub_epi32(composed, _mm_set1_epi32(0x10000));
+    const __m128i lowtenbits =
+        _mm_and_si128(composedminus, _mm_set1_epi32(0x3ff));
+    // Notice the 0x3ff mask:
+    const __m128i hightenbits =
+        _mm_and_si128(_mm_srli_epi32(composedminus, 10), _mm_set1_epi32(0x3ff));
+    const __m128i lowtenbitsadd =
+        _mm_add_epi32(lowtenbits, _mm_set1_epi32(0xDC00));
+    const __m128i hightenbitsadd =
+        _mm_add_epi32(hightenbits, _mm_set1_epi32(0xD800));
+    const __m128i lowtenbitsaddshifted = _mm_slli_epi32(lowtenbitsadd, 16);
+    __m128i surrogates = _mm_or_si128(hightenbitsadd, lowtenbitsaddshifted);
+    uint32_t basic_buffer[4];
+    uint32_t basic_buffer_swap[4];
+    if (big_endian) {
+      _mm_storeu_si128((__m128i *)basic_buffer_swap,
+                       _mm_shuffle_epi8(composed, swap));
+      surrogates = _mm_shuffle_epi8(surrogates, swap);
     }
-    // If the input is long enough, then we have that margin-1 is the eight last
-    // leading byte.
-    const size_t safety_margin = size - margin + 1; // to avoid overruns!
-    while (pos + 64 + safety_margin <= size) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      if (input.is_ascii()) {
-        input.store_ascii_as_utf16<endian>(utf16_output);
-        utf16_output += 64;
-        pos += 64;
+    _mm_storeu_si128((__m128i *)basic_buffer, composed);
+    uint32_t surrogate_buffer[4];
+    _mm_storeu_si128((__m128i *)surrogate_buffer, surrogates);
+    for (size_t i = 0; i < 3; i++) {
+      if (basic_buffer[i] > 0x3c00000) {
+        utf16_output[0] = uint16_t(surrogate_buffer[i] & 0xffff);
+        utf16_output[1] = uint16_t(surrogate_buffer[i] >> 16);
+        utf16_output += 2;
       } else {
-        // you might think that a for-loop would work, but under Visual Studio,
-        // it is not good enough.
-        static_assert(
-            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
-                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        auto zero = simd8<uint8_t>{uint8_t(0)};
-        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-        if (utf8_continuation_mask & 1) {
-          return 0; // error
-        }
-        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-        // We process in blocks of up to 12 bytes except possibly
-        // for fast paths which may process up to 16 bytes. For the
-        // slow path to work, we should have at least 12 input bytes left.
-        size_t max_starting_point = (pos + 64) - 12;
-        // Next loop is going to run at least five times.
-        while (pos < max_starting_point) {
-          // Performance note: our ability to compute 'consumed' and
-          // then shift and recompute is critical. If there is a
-          // latency of, say, 4 cycles on getting 'consumed', then
-          // the inner loop might have a total latency of about 6 cycles.
-          // Yet we process between 6 to 12 inputs bytes, thus we get
-          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-          // for this section of the code. Hence, there is a limit
-          // to how much we can further increase this latency before
-          // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_utf16<endian>(
-              in + pos, utf8_end_of_code_point_mask, utf16_output);
-          pos += consumed;
-          utf8_end_of_code_point_mask >>= consumed;
-        }
-        // At this point there may remain between 0 and 12 bytes in the
-        // 64-byte block. These bytes will be processed again. So we have an
-        // 80% efficiency (in the worst case). In practice we expect an
-        // 85% to 90% efficiency.
+        utf16_output[0] = big_endian ? uint16_t(basic_buffer_swap[i])
+                                     : uint16_t(basic_buffer[i]);
+        utf16_output++;
       }
     }
-    if (errors()) {
-      return 0;
+  } else {
+    // here we know that there is an error but we do not handle errors
+  }
+  return consumed;
+}
+/* end file src/haswell/avx2_convert_utf8_to_utf16.cpp */
+/* begin file src/haswell/avx2_convert_utf8_to_utf32.cpp */
+// depends on "tables/utf8_to_utf16_tables.h"
+
+// Convert up to 12 bytes from utf8 to utf32 using a mask indicating the
+// end of the code points. Only the least significant 12 bits of the mask
+// are accessed.
+// It returns how many bytes were consumed (up to 12).
+size_t convert_masked_utf8_to_utf32(const char *input,
+                                    uint64_t utf8_end_of_code_point_mask,
+                                    char32_t *&utf32_output) {
+  // we use an approach where we try to process up to 12 input bytes.
+  // Why 12 input bytes and not 16? Because we are concerned with the size of
+  // the lookup tables. Also 12 is nicely divisible by two and three.
+  //
+  //
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // is maybe beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
+  //
+  // We first try a few fast paths.
+  const __m128i in = _mm_loadu_si128((__m128i *)input);
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask & 0xfff;
+  if (utf8_end_of_code_point_mask == 0xfff) {
+    // We process the data in chunks of 12 bytes.
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output),
+                        _mm256_cvtepu8_epi32(in));
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output + 8),
+                        _mm256_cvtepu8_epi32(_mm_srli_si128(in, 8)));
+    utf32_output += 12; // We wrote 12 32-bit characters.
+    return 12;          // We consumed 12 bytes.
+  }
+  if (((utf8_end_of_code_point_mask & 0xffff) == 0xaaaa)) {
+    // We want to take 8 2-byte UTF-8 sequences and turn them into 8 4-byte
+    // UTF-32 code units. There is probably a more efficient sequence, but the
+    // following might do.
+    const __m128i sh =
+        _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+    const __m128i perm = _mm_shuffle_epi8(in, sh);
+    const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
+    const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
+    const __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
+    _mm256_storeu_si256((__m256i *)utf32_output,
+                        _mm256_cvtepu16_epi32(composed));
+    utf32_output += 8; // We wrote 32 bytes, 8 code points.
+    return 16;
+  }
+  if (input_utf8_end_of_code_point_mask == 0x924) {
+    // We want to take 4 3-byte UTF-8 sequences and turn them into 4 4-byte
+    // UTF-32 code units. There is probably a more efficient sequence, but the
+    // following might do.
+    const __m128i sh =
+        _mm_setr_epi8(2, 1, 0, -1, 5, 4, 3, -1, 8, 7, 6, -1, 11, 10, 9, -1);
+    const __m128i perm = _mm_shuffle_epi8(in, sh);
+    const __m128i ascii =
+        _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
+    const __m128i middlebyte =
+        _mm_and_si128(perm, _mm_set1_epi32(0x3f00)); // 5 or 6 bits
+    const __m128i middlebyte_shifted = _mm_srli_epi32(middlebyte, 2);
+    const __m128i highbyte =
+        _mm_and_si128(perm, _mm_set1_epi32(0x0f0000)); // 4 bits
+    const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 4);
+    const __m128i composed =
+        _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted), highbyte_shifted);
+    _mm_storeu_si128((__m128i *)utf32_output, composed);
+    utf32_output += 4;
+    return 12;
+  }
+  /// We do not have a fast path available, so we fall back.
+
+  const uint8_t idx =
+      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][0];
+  const uint8_t consumed =
+      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
+  if (idx < 64) {
+    // SIX (6) input code points
+    // This is a relatively easy scenario:
+    // we process SIX (6) input code points. The max length in bytes of six
+    // code points spanning between 1 and 2 bytes each is 12 bytes. On
+    // processors where pdep/pext is fast, we might be able to use a small
+    // lookup table.
+    const __m128i sh =
+        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
+    const __m128i perm = _mm_shuffle_epi8(in, sh);
+    const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
+    const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
+    const __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
+    _mm256_storeu_si256((__m256i *)utf32_output,
+                        _mm256_cvtepu16_epi32(composed));
+    utf32_output += 6; // We wrote 24 bytes, 6 code points. There is a potential
+    // overflow of 32 - 24 = 8 bytes.
+  } else if (idx < 145) {
+    // FOUR (4) input code points
+    const __m128i sh =
+        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
+    const __m128i perm = _mm_shuffle_epi8(in, sh);
+    const __m128i ascii =
+        _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
+    const __m128i middlebyte =
+        _mm_and_si128(perm, _mm_set1_epi32(0x3f00)); // 5 or 6 bits
+    const __m128i middlebyte_shifted = _mm_srli_epi32(middlebyte, 2);
+    const __m128i highbyte =
+        _mm_and_si128(perm, _mm_set1_epi32(0x0f0000)); // 4 bits
+    const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 4);
+    const __m128i composed =
+        _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted), highbyte_shifted);
+    _mm_storeu_si128((__m128i *)utf32_output, composed);
+    utf32_output += 4;
+  } else if (idx < 209) {
+    // THREE (3) input code points
+    const __m128i sh =
+        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
+    const __m128i perm = _mm_shuffle_epi8(in, sh);
+    const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi32(0x7f));
+    const __m128i middlebyte = _mm_and_si128(perm, _mm_set1_epi32(0x3f00));
+    const __m128i middlebyte_shifted = _mm_srli_epi32(middlebyte, 2);
+    __m128i middlehighbyte = _mm_and_si128(perm, _mm_set1_epi32(0x3f0000));
+    // correct for spurious high bit
+    const __m128i correct =
+        _mm_srli_epi32(_mm_and_si128(perm, _mm_set1_epi32(0x400000)), 1);
+    middlehighbyte = _mm_xor_si128(correct, middlehighbyte);
+    const __m128i middlehighbyte_shifted = _mm_srli_epi32(middlehighbyte, 4);
+    const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi32(0x07000000));
+    const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 6);
+    const __m128i composed =
+        _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted),
+                     _mm_or_si128(highbyte_shifted, middlehighbyte_shifted));
+    _mm_storeu_si128((__m128i *)utf32_output, composed);
+    utf32_output +=
+        3; // We wrote 3 * 4 bytes, there is a potential overflow of 4 bytes.
+  } else {
+    // here we know that there is an error but we do not handle errors
+  }
+  return consumed;
+}
+/* end file src/haswell/avx2_convert_utf8_to_utf32.cpp */
+
+/* begin file src/haswell/avx2_convert_utf16_to_latin1.cpp */
+template <endianness big_endian>
+std::pair<const char16_t *, char *>
+avx2_convert_utf16_to_latin1(const char16_t *buf, size_t len,
+                             char *latin1_output) {
+  const char16_t *end = buf + len;
+  while (end - buf >= 16) {
+    // Load 16 UTF-16 characters into 256-bit AVX2 register
+    __m256i in = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(buf));
+
+    if (!match_system(big_endian)) {
+      const __m256i swap = _mm256_setr_epi8(
+          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+      in = _mm256_shuffle_epi8(in, swap);
     }
-    if (pos < size) {
-      size_t howmany = scalar::utf8_to_utf16::convert<endian>(
-          in + pos, size - pos, utf16_output);
-      if (howmany == 0) {
-        return 0;
-      }
-      utf16_output += howmany;
+
+    __m256i high_byte_mask = _mm256_set1_epi16((int16_t)0xFF00);
+    if (_mm256_testz_si256(in, high_byte_mask)) {
+      // Pack 16-bit characters into 8-bit and store in latin1_output
+      __m128i lo = _mm256_extractf128_si256(in, 0);
+      __m128i hi = _mm256_extractf128_si256(in, 1);
+      __m128i latin1_packed_lo = _mm_packus_epi16(lo, lo);
+      __m128i latin1_packed_hi = _mm_packus_epi16(hi, hi);
+      _mm_storel_epi64(reinterpret_cast<__m128i *>(latin1_output),
+                       latin1_packed_lo);
+      _mm_storel_epi64(reinterpret_cast<__m128i *>(latin1_output + 8),
+                       latin1_packed_hi);
+      // Adjust pointers for next iteration
+      buf += 16;
+      latin1_output += 16;
+    } else {
+      return std::make_pair(nullptr, reinterpret_cast<char *>(latin1_output));
     }
-    return utf16_output - start;
-  }
+  } // while
+  return std::make_pair(buf, latin1_output);
+}
 
-  template <endianness endian>
-  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
-                                                   char16_t *utf16_output) {
-    size_t pos = 0;
-    char16_t *start{utf16_output};
-    // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
-    // last 16 bytes, and if the data is valid, then it is entirely safe because
-    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
-    // generally assume that you have valid UTF-8 input, so we are going to go
-    // back from the end counting 8 leading bytes, to give us a good margin.
-    size_t leading_byte = 0;
-    size_t margin = size;
-    for (; margin > 0 && leading_byte < 8; margin--) {
-      leading_byte += (int8_t(in[margin - 1]) > -65);
+template <endianness big_endian>
+std::pair<result, char *>
+avx2_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
+                                         char *latin1_output) {
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
+  while (end - buf >= 16) {
+    __m256i in = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(buf));
+
+    if (!match_system(big_endian)) {
+      const __m256i swap = _mm256_setr_epi8(
+          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+      in = _mm256_shuffle_epi8(in, swap);
     }
-    // If the input is long enough, then we have that margin-1 is the eight last
-    // leading byte.
-    const size_t safety_margin = size - margin + 1; // to avoid overruns!
-    while (pos + 64 + safety_margin <= size) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      if (input.is_ascii()) {
-        input.store_ascii_as_utf16<endian>(utf16_output);
-        utf16_output += 64;
-        pos += 64;
-      } else {
-        // you might think that a for-loop would work, but under Visual Studio,
-        // it is not good enough.
-        static_assert(
-            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
-                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        auto zero = simd8<uint8_t>{uint8_t(0)};
-        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-        if (errors() || (utf8_continuation_mask & 1)) {
-          // rewind_and_convert_with_errors will seek a potential error from
-          // in+pos onward, with the ability to go back up to pos bytes, and
-          // read size-pos bytes forward.
-          result res =
-              scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
-                  pos, in + pos, size - pos, utf16_output);
-          res.count += pos;
-          return res;
-        }
-        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-        // We process in blocks of up to 12 bytes except possibly
-        // for fast paths which may process up to 16 bytes. For the
-        // slow path to work, we should have at least 12 input bytes left.
-        size_t max_starting_point = (pos + 64) - 12;
-        // Next loop is going to run at least five times.
-        while (pos < max_starting_point) {
-          // Performance note: our ability to compute 'consumed' and
-          // then shift and recompute is critical. If there is a
-          // latency of, say, 4 cycles on getting 'consumed', then
-          // the inner loop might have a total latency of about 6 cycles.
-          // Yet we process between 6 to 12 inputs bytes, thus we get
-          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-          // for this section of the code. Hence, there is a limit
-          // to how much we can further increase this latency before
-          // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_utf16<endian>(
-              in + pos, utf8_end_of_code_point_mask, utf16_output);
-          pos += consumed;
-          utf8_end_of_code_point_mask >>= consumed;
+
+    __m256i high_byte_mask = _mm256_set1_epi16((int16_t)0xFF00);
+    if (_mm256_testz_si256(in, high_byte_mask)) {
+      __m128i lo = _mm256_extractf128_si256(in, 0);
+      __m128i hi = _mm256_extractf128_si256(in, 1);
+      __m128i latin1_packed_lo = _mm_packus_epi16(lo, lo);
+      __m128i latin1_packed_hi = _mm_packus_epi16(hi, hi);
+      _mm_storel_epi64(reinterpret_cast<__m128i *>(latin1_output),
+                       latin1_packed_lo);
+      _mm_storel_epi64(reinterpret_cast<__m128i *>(latin1_output + 8),
+                       latin1_packed_hi);
+      buf += 16;
+      latin1_output += 16;
+    } else {
+      // Fall back to scalar code for handling errors
+      for (int k = 0; k < 16; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if (word <= 0xff) {
+          *latin1_output++ = char(word);
+        } else {
+          return std::make_pair(
+              result{error_code::TOO_LARGE, (size_t)(buf - start + k)},
+              latin1_output);
         }
-        // At this point there may remain between 0 and 12 bytes in the
-        // 64-byte block. These bytes will be processed again. So we have an
-        // 80% efficiency (in the worst case). In practice we expect an
-        // 85% to 90% efficiency.
-      }
-    }
-    if (errors()) {
-      // rewind_and_convert_with_errors will seek a potential error from in+pos
-      // onward, with the ability to go back up to pos bytes, and read size-pos
-      // bytes forward.
-      result res =
-          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
-              pos, in + pos, size - pos, utf16_output);
-      res.count += pos;
-      return res;
-    }
-    if (pos < size) {
-      // rewind_and_convert_with_errors will seek a potential error from in+pos
-      // onward, with the ability to go back up to pos bytes, and read size-pos
-      // bytes forward.
-      result res =
-          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
-              pos, in + pos, size - pos, utf16_output);
-      if (res.error) { // In case of error, we want the error position
-        res.count += pos;
-        return res;
-      } else { // In case of success, we want the number of word written
-        utf16_output += res.count;
       }
+      buf += 16;
     }
-    return result(error_code::SUCCESS, utf16_output - start);
-  }
+  } // while
+  return std::make_pair(result{error_code::SUCCESS, (size_t)(buf - start)},
+                        latin1_output);
+}
+/* end file src/haswell/avx2_convert_utf16_to_latin1.cpp */
+/* begin file src/haswell/avx2_convert_utf16_to_utf8.cpp */
+/*
+    The vectorized algorithm works on a single SSE register, i.e., it
+    loads eight 16-bit code units.
 
-  simdutf_really_inline bool errors() const {
-    return this->error.any_bits_set_anywhere();
-  }
+    We consider three cases:
+    1. an input register contains no surrogates and each value
+       is in range 0x0000 .. 0x07ff.
+    2. an input register contains no surrogates and values are
+       in range 0x0000 .. 0xffff.
+    3. an input register contains surrogates --- i.e. codepoints
+       can have 16 or 32 bits.
 
-}; // struct utf8_checker
-} // namespace utf8_to_utf16
-} // unnamed namespace
-} // namespace arm64
-} // namespace simdutf
-/* end file src/generic/utf8_to_utf16/utf8_to_utf16.h */
-/* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
+    Ad 1.
 
-namespace simdutf {
-namespace arm64 {
-namespace {
-namespace utf8_to_utf16 {
+    When values are less than 0x0800, it means that a 16-bit code unit
+    can be converted into: 1) single UTF8 byte (when it is an ASCII
+    char) or 2) two UTF8 bytes.
 
-using namespace simd;
+    For this case we only do some bit shuffling to obtain these 2-byte
+    codes and finally compress the whole SSE register with a single
+    shuffle.
 
-template <endianness endian>
-simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
-                                         char16_t *utf16_output) noexcept {
-  // The implementation is not specific to haswell and should be moved to the
-  // generic directory.
-  size_t pos = 0;
-  char16_t *start{utf16_output};
-  const size_t safety_margin = 16; // to avoid overruns!
-  while (pos + 64 + safety_margin <= size) {
-    // this loop could be unrolled further. For example, we could process the
-    // mask far more than 64 bytes.
-    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
-    if (in.is_ascii()) {
-      in.store_ascii_as_utf16<endian>(utf16_output);
-      utf16_output += 64;
-      pos += 64;
-    } else {
-      // Slow path. We hope that the compiler will recognize that this is a slow
-      // path. Anything that is not a continuation mask is a 'leading byte',
-      // that is, the start of a new code point.
-      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
-      // -65 is 0b10111111 in two-complement's, so largest possible continuation
-      // byte
-      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-      // The *start* of code points is not so useful, rather, we want the *end*
-      // of code points.
-      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-      // We process in blocks of up to 12 bytes except possibly
-      // for fast paths which may process up to 16 bytes. For the
-      // slow path to work, we should have at least 12 input bytes left.
-      size_t max_starting_point = (pos + 64) - 12;
-      // Next loop is going to run at least five times when using solely
-      // the slow/regular path, and at least four times if there are fast paths.
-      while (pos < max_starting_point) {
-        // Performance note: our ability to compute 'consumed' and
-        // then shift and recompute is critical. If there is a
-        // latency of, say, 4 cycles on getting 'consumed', then
-        // the inner loop might have a total latency of about 6 cycles.
-        // Yet we process between 6 to 12 inputs bytes, thus we get
-        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-        // for this section of the code. Hence, there is a limit
-        // to how much we can further increase this latency before
-        // it seriously harms performance.
-        //
-        // Thus we may allow convert_masked_utf8_to_utf16 to process
-        // more bytes at a time under a fast-path mode where 16 bytes
-        // are consumed at once (e.g., when encountering ASCII).
-        size_t consumed = convert_masked_utf8_to_utf16<endian>(
-            input + pos, utf8_end_of_code_point_mask, utf16_output);
-        pos += consumed;
-        utf8_end_of_code_point_mask >>= consumed;
-      }
-      // At this point there may remain between 0 and 12 bytes in the
-      // 64-byte block. These bytes will be processed again. So we have an
-      // 80% efficiency (in the worst case). In practice we expect an
-      // 85% to 90% efficiency.
+    We need a 256-entry lookup table to get a compression pattern
+    and the number of output bytes in the compressed vector register.
+    Each entry occupies 17 bytes.
+
+    Ad 2.
+
+    When values fit in 16-bit code units, but are above 0x07ff, then
+    a single word may produce one, two or three UTF8 bytes.
+
+    We prepare data for all these three cases in two registers.
+    The first register contains lower two UTF8 bytes (used in all
+    cases), while the second one contains just the third byte for
+    the three-UTF8-bytes case.
+
+    Finally these two registers are interleaved, forming an eight-element
+    array of 32-bit values. The array spans two SSE registers.
+    The bytes from the registers are compressed using two shuffles.
+
+    We need a 256-entry lookup table to get a compression pattern
+    and the number of output bytes in the compressed vector register.
+    Each entry occupies 17 bytes.
+
+
+    To summarize:
+    - We need two 256-entry tables that have 8704 bytes in total.
+*/
+
+/*
+  Returns a pair: the first unprocessed byte from buf and utf8_output
+  A scalar routine should carry on the conversion of the tail.
+*/
+template <endianness big_endian>
+std::pair<const char16_t *, char *>
+avx2_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_output) {
+  const char16_t *end = buf + len;
+  const __m256i v_0000 = _mm256_setzero_si256();
+  const __m256i v_f800 = _mm256_set1_epi16((int16_t)0xf800);
+  const __m256i v_d800 = _mm256_set1_epi16((int16_t)0xd800);
+  const __m256i v_c080 = _mm256_set1_epi16((int16_t)0xc080);
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+
+  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    if (big_endian) {
+      const __m256i swap = _mm256_setr_epi8(
+          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+      in = _mm256_shuffle_epi8(in, swap);
     }
-  }
-  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(
-      input + pos, size - pos, utf16_output);
-  return utf16_output - start;
-}
+    // a single 16-bit UTF-16 word can yield 1, 2 or 3 UTF-8 bytes
+    const __m256i v_ff80 = _mm256_set1_epi16((int16_t)0xff80);
+    if (_mm256_testz_si256(in, v_ff80)) { // ASCII fast path!!!!
+      // 1. pack the bytes
+      const __m128i utf8_packed = _mm_packus_epi16(
+          _mm256_castsi256_si128(in), _mm256_extractf128_si256(in, 1));
+      // 2. store (16 bytes)
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+      // 3. adjust pointers
+      buf += 16;
+      utf8_output += 16;
+      continue; // we are done for this round!
+    }
+    // no bits set above 7th bit
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
 
-} // namespace utf8_to_utf16
-} // unnamed namespace
-} // namespace arm64
-} // namespace simdutf
-/* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
-// transcoding from UTF-8 to UTF-32
-/* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
+    // no bits set above 11th bit
+    const __m256i one_or_two_bytes_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_0000);
+    const uint32_t one_or_two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
+    if (one_or_two_bytes_bitmask == 0xffffffff) {
 
-namespace simdutf {
-namespace arm64 {
-namespace {
-namespace utf8_to_utf32 {
-using namespace simd;
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+      // expected output   : [110a|aaaa|10bb|bbbb] x 8
+      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
+      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
 
-simdutf_really_inline simd8<uint8_t>
-check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-  // Bit 1 = Too Long (ASCII followed by continuation)
-  // Bit 2 = Overlong 3-byte
-  // Bit 4 = Surrogate
-  // Bit 5 = Overlong 2-byte
-  // Bit 7 = Two Continuations
-  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
-                                               // 11______ 11______
-  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
-  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
-  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
-  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
-  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
-  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
-                                               // 11110100 101_____
-                                               // 11110101 1001____
-                                               // 11110101 101_____
-                                               // 1111011_ 1001____
-                                               // 1111011_ 101_____
-                                               // 11111___ 1001____
-                                               // 11111___ 101_____
-  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
-  // 11110101 1000____
-  // 1111011_ 1000____
-  // 11111___ 1000____
-  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+      // t0 = [000a|aaaa|bbbb|bb00]
+      const __m256i t0 = _mm256_slli_epi16(in, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
+      // t2 = [0000|0000|00bb|bbbb]
+      const __m256i t2 = _mm256_and_si256(in, v_003f);
+      // t3 = [000a|aaaa|00bb|bbbb]
+      const __m256i t3 = _mm256_or_si256(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      const __m256i t4 = _mm256_or_si256(t3, v_c080);
 
-  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
-      // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG,
-      // 10______ ________ <continuation in byte 1>
-      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
-      // 1100____ ________ <two byte lead in byte 1>
-      TOO_SHORT | OVERLONG_2,
-      // 1101____ ________ <two byte lead in byte 1>
-      TOO_SHORT,
-      // 1110____ ________ <three byte lead in byte 1>
-      TOO_SHORT | OVERLONG_3 | SURROGATE,
-      // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
-  constexpr const uint8_t CARRY =
-      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-  const simd8<uint8_t> byte_1_low =
-      (prev1 & 0x0F)
-          .lookup_16<uint8_t>(
-              // ____0000 ________
-              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-              // ____0001 ________
-              CARRY | OVERLONG_2,
-              // ____001_ ________
-              CARRY, CARRY,
+      // 2. merge ASCII and 2-byte codewords
+      const __m256i utf8_unpacked =
+          _mm256_blendv_epi8(t4, in, one_byte_bytemask);
 
-              // ____0100 ________
-              CARRY | TOO_LARGE,
-              // ____0101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              // ____011_ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
+      // 3. prepare bitmask for 8-bit lookup
+      const uint32_t M0 = one_byte_bitmask & 0x55555555;
+      const uint32_t M1 = M0 >> 7;
+      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
+      // 4. pack the bytes
 
-              // ____1___ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              // ____1101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000);
-  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
-      // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT,
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
+      const uint8_t *row_2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
+                                                                       16)][0];
 
-      // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
-          OVERLONG_4,
-      // ________ 1001____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
-      // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
 
-      // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
-  return (byte_1_high & byte_1_low & byte_2_high);
-}
-simdutf_really_inline simd8<uint8_t>
-check_multibyte_lengths(const simd8<uint8_t> input,
-                        const simd8<uint8_t> prev_input,
-                        const simd8<uint8_t> sc) {
-  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-  simd8<uint8_t> must23 =
-      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-  return must23_80 ^ sc;
-}
+      const __m256i utf8_packed = _mm256_shuffle_epi8(
+          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
+      // 5. store bytes
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_castsi256_si128(utf8_packed));
+      utf8_output += row[0];
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_extractf128_si256(utf8_packed, 1));
+      utf8_output += row_2[0];
+
+      // 6. adjust pointers
+      buf += 16;
+      continue;
+    }
+    // 1. Check if there are any surrogate words in the input chunk.
+    //    We also have to deal with the situation when there is a surrogate
+    //    word at the end of a chunk.
+    const __m256i surrogates_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
+
+    // bitmask = 0x00000000 if there are no surrogates
+    //         = 0xc0000000 if the last word is a surrogate
+    const uint32_t surrogates_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
+    // It might seem like checking for surrogates_bitmask == 0xc0000000 could
+    // help. However, it is likely an uncommon occurrence.
+    if (surrogates_bitmask == 0x00000000) {
+      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+      const __m256i dup_even = _mm256_setr_epi16(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+
+      /* In this branch we handle three cases:
+         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]
+            - single UTF-8 byte
+         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]
+            - two UTF-8 bytes
+         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc]
+            - three UTF-8 bytes
 
-struct validating_transcoder {
-  // If this is nonzero, there has been a UTF-8 error.
-  simd8<uint8_t> error;
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
 
-  validating_transcoder() : error(uint8_t(0)) {}
-  //
-  // Check whether the current bytes are valid UTF-8.
-  //
-  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
-                                              const simd8<uint8_t> prev_input) {
-    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
-    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
-    // small negative numbers)
-    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-    simd8<uint8_t> sc = check_special_cases(input, prev1);
-    this->error |= check_multibyte_lengths(input, prev_input, sc);
-  }
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
 
-  simdutf_really_inline size_t convert(const char *in, size_t size,
-                                       char32_t *utf32_output) {
-    size_t pos = 0;
-    char32_t *start{utf32_output};
-    // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 words when calling convert_masked_utf8_to_utf32. If you skip the
-    // last 16 bytes, and if the data is valid, then it is entirely safe because
-    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
-    // generally assume that you have valid UTF-8 input, so we are going to go
-    // back from the end counting 16 leading bytes, to give us a good margin.
-    size_t leading_byte = 0;
-    size_t margin = size;
-    for (; margin > 0 && leading_byte < 8; margin--) {
-      leading_byte += (int8_t(in[margin - 1]) > -65);
-    }
-    // If the input is long enough, then we have that margin-1 is the fourth
-    // last leading byte.
-    const size_t safety_margin = size - margin + 1; // to avoid overruns!
-    while (pos + 64 + safety_margin <= size) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      if (input.is_ascii()) {
-        input.store_ascii_as_utf32(utf32_output);
-        utf32_output += 64;
-        pos += 64;
-      } else {
-        // you might think that a for-loop would work, but under Visual Studio,
-        // it is not good enough.
-        static_assert(
-            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
-                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        auto zero = simd8<uint8_t>{uint8_t(0)};
-        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-        if (utf8_continuation_mask & 1) {
-          return 0; // we have an error
-        }
-        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-        // We process in blocks of up to 12 bytes except possibly
-        // for fast paths which may process up to 16 bytes. For the
-        // slow path to work, we should have at least 12 input bytes left.
-        size_t max_starting_point = (pos + 64) - 12;
-        // Next loop is going to run at least five times.
-        while (pos < max_starting_point) {
-          // Performance note: our ability to compute 'consumed' and
-          // then shift and recompute is critical. If there is a
-          // latency of, say, 4 cycles on getting 'consumed', then
-          // the inner loop might have a total latency of about 6 cycles.
-          // Yet we process between 6 to 12 inputs bytes, thus we get
-          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-          // for this section of the code. Hence, there is a limit
-          // to how much we can further increase this latency before
-          // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_utf32(
-              in + pos, utf8_end_of_code_point_mask, utf32_output);
-          pos += consumed;
-          utf8_end_of_code_point_mask >>= consumed;
-        }
-        // At this point there may remain between 0 and 12 bytes in the
-        // 64-byte block. These bytes will be processed again. So we have an
-        // 80% efficiency (in the worst case). In practice we expect an
-        // 85% to 90% efficiency.
-      }
-    }
-    if (errors()) {
-      return 0;
-    }
-    if (pos < size) {
-      size_t howmany =
-          scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
-      if (howmany == 0) {
-        return 0;
-      }
-      utf32_output += howmany;
-    }
-    return utf32_output - start;
-  }
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
 
-  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
-                                                   char32_t *utf32_output) {
-    size_t pos = 0;
-    char32_t *start{utf32_output};
-    // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the
-    // last 16 bytes, and if the data is valid, then it is entirely safe because
-    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
-    // generally assume that you have valid UTF-8 input, so we are going to go
-    // back from the end counting 8 leading bytes, to give us a good margin.
-    size_t leading_byte = 0;
-    size_t margin = size;
-    for (; margin > 0 && leading_byte < 8; margin--) {
-      leading_byte += (int8_t(in[margin - 1]) > -65);
-    }
-    // If the input is long enough, then we have that margin-1 is the fourth
-    // last leading byte.
-    const size_t safety_margin = size - margin + 1; // to avoid overruns!
-    while (pos + 64 + safety_margin <= size) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      if (input.is_ascii()) {
-        input.store_ascii_as_utf32(utf32_output);
-        utf32_output += 64;
-        pos += 64;
-      } else {
-        // you might think that a for-loop would work, but under Visual Studio,
-        // it is not good enough.
-        static_assert(
-            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
-                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        auto zero = simd8<uint8_t>{uint8_t(0)};
-        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-        if (errors() || (utf8_continuation_mask & 1)) {
-          result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
-              pos, in + pos, size - pos, utf32_output);
-          res.count += pos;
-          return res;
-        }
-        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-        // We process in blocks of up to 12 bytes except possibly
-        // for fast paths which may process up to 16 bytes. For the
-        // slow path to work, we should have at least 12 input bytes left.
-        size_t max_starting_point = (pos + 64) - 12;
-        // Next loop is going to run at least five times.
-        while (pos < max_starting_point) {
-          // Performance note: our ability to compute 'consumed' and
-          // then shift and recompute is critical. If there is a
-          // latency of, say, 4 cycles on getting 'consumed', then
-          // the inner loop might have a total latency of about 6 cycles.
-          // Yet we process between 6 to 12 inputs bytes, thus we get
-          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-          // for this section of the code. Hence, there is a limit
-          // to how much we can further increase this latency before
-          // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_utf32(
-              in + pos, utf8_end_of_code_point_mask, utf32_output);
-          pos += consumed;
-          utf8_end_of_code_point_mask >>= consumed;
-        }
-        // At this point there may remain between 0 and 12 bytes in the
-        // 64-byte block. These bytes will be processed again. So we have an
-        // 80% efficiency (in the worst case). In practice we expect an
-        // 85% to 90% efficiency.
-      }
-    }
-    if (errors()) {
-      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
-          pos, in + pos, size - pos, utf32_output);
-      res.count += pos;
-      return res;
-    }
-    if (pos < size) {
-      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
-          pos, in + pos, size - pos, utf32_output);
-      if (res.error) { // In case of error, we want the error position
-        res.count += pos;
-        return res;
-      } else { // In case of success, we want the number of word written
-        utf32_output += res.count;
-      }
-    }
-    return result(error_code::SUCCESS, utf32_output - start);
-  }
+        Finally, from these two code units we build a proper UTF-8 sequence,
+        taking into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
+#define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const __m256i t0 = _mm256_shuffle_epi8(in, dup_even);
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
 
-  simdutf_really_inline bool errors() const {
-    return this->error.any_bits_set_anywhere();
-  }
+      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
+      const __m256i s0 = _mm256_srli_epi16(in, 4);
+      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
+      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
+      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
+      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
+      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
+      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
+                                             simdutf_vec(0b0100000000000000));
+      const __m256i s4 = _mm256_xor_si256(s3, m0);
+#undef simdutf_vec
 
-}; // struct utf8_checker
-} // namespace utf8_to_utf32
-} // unnamed namespace
-} // namespace arm64
-} // namespace simdutf
-/* end file src/generic/utf8_to_utf32/utf8_to_utf32.h */
-/* begin file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
+      // 4. expand code units 16-bit => 32-bit
+      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
+      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
 
-namespace simdutf {
-namespace arm64 {
-namespace {
-namespace utf8_to_utf32 {
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
+                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
+      // Due to the wider registers, the following path is less likely to be
+      // useful.
+      /*if(mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const __m256i shuffle =
+      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
+      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
+      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
+      _mm256_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
+        continue;
+      }*/
+      const uint8_t mask0 = uint8_t(mask);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
 
-using namespace simd;
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
 
-simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
-                                         char32_t *utf32_output) noexcept {
-  size_t pos = 0;
-  char32_t *start{utf32_output};
-  const size_t safety_margin = 16; // to avoid overruns!
-  while (pos + 64 + safety_margin <= size) {
-    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
-    if (in.is_ascii()) {
-      in.store_ascii_as_utf32(utf32_output);
-      utf32_output += 64;
-      pos += 64;
+      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
+      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
+      const __m128i utf8_2 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
+
+      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
+      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
+      const __m128i utf8_3 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
+
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
+      utf8_output += row0[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+      utf8_output += row1[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
+      utf8_output += row2[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
+      utf8_output += row3[0];
+      buf += 16;
+      // surrogate pair(s) in a register
     } else {
-      // -65 is 0b10111111 in two-complement's, so largest possible continuation
-      // byte
-      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
-      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-      size_t max_starting_point = (pos + 64) - 12;
-      while (pos < max_starting_point) {
-        size_t consumed = convert_masked_utf8_to_utf32(
-            input + pos, utf8_end_of_code_point_mask, utf32_output);
-        pos += consumed;
-        utf8_end_of_code_point_mask >>= consumed;
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
+        if ((word & 0xFF80) == 0) {
+          *utf8_output++ = char(word);
+        } else if ((word & 0xF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xF800) != 0xD800) {
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else {
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word =
+              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(nullptr, utf8_output);
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf8_output++ = char((value >> 18) | 0b11110000);
+          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((value & 0b111111) | 0b10000000);
+        }
       }
+      buf += k;
     }
-  }
-  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos,
-                                                       utf32_output);
-  return utf32_output - start;
+  } // while
+  return std::make_pair(buf, utf8_output);
 }
 
-} // namespace utf8_to_utf32
-} // unnamed namespace
-} // namespace arm64
-} // namespace simdutf
-/* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
-// other functions
-/* begin file src/generic/utf16.h */
-namespace simdutf {
-namespace arm64 {
-namespace {
-namespace utf16 {
-
+/*
+  Returns a pair: a result struct and utf8_output.
+  If there is an error, the count field of the result is the position of the
+  error. Otherwise, it is the position of the first unprocessed byte in buf
+  (even if finished). A scalar routine should carry on the conversion of the
+  tail if needed.
+*/
 template <endianness big_endian>
-simdutf_really_inline size_t count_code_points(const char16_t *in,
-                                               size_t size) {
-  size_t pos = 0;
-  size_t count = 0;
-  for (; pos < size / 32 * 32; pos += 32) {
-    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-    if (!match_system(big_endian)) {
-      input.swap_bytes();
-    }
-    uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
-    count += count_ones(not_pair) / 2;
-  }
-  return count +
-         scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
-}
+std::pair<result, char *>
+avx2_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
+                                       char *utf8_output) {
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
 
-template <endianness big_endian>
-simdutf_really_inline size_t utf8_length_from_utf16(const char16_t *in,
-                                                    size_t size) {
-  size_t pos = 0;
-  size_t count = 0;
-  // This algorithm could no doubt be improved!
-  for (; pos < size / 32 * 32; pos += 32) {
-    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-    if (!match_system(big_endian)) {
-      input.swap_bytes();
+  const __m256i v_0000 = _mm256_setzero_si256();
+  const __m256i v_f800 = _mm256_set1_epi16((int16_t)0xf800);
+  const __m256i v_d800 = _mm256_set1_epi16((int16_t)0xd800);
+  const __m256i v_c080 = _mm256_set1_epi16((int16_t)0xc080);
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+
+  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    if (big_endian) {
+      const __m256i swap = _mm256_setr_epi8(
+          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+      in = _mm256_shuffle_epi8(in, swap);
     }
-    uint64_t ascii_mask = input.lteq(0x7F);
-    uint64_t twobyte_mask = input.lteq(0x7FF);
-    uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
+    // a single 16-bit UTF-16 word can yield 1, 2 or 3 UTF-8 bytes
+    const __m256i v_ff80 = _mm256_set1_epi16((int16_t)0xff80);
+    if (_mm256_testz_si256(in, v_ff80)) { // ASCII fast path!!!!
+      // 1. pack the bytes
+      const __m128i utf8_packed = _mm_packus_epi16(
+          _mm256_castsi256_si128(in), _mm256_extractf128_si256(in, 1));
+      // 2. store (16 bytes)
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+      // 3. adjust pointers
+      buf += 16;
+      utf8_output += 16;
+      continue; // we are done for this round!
+    }
+    // no bits set above 7th bit
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
 
-    size_t ascii_count = count_ones(ascii_mask) / 2;
-    size_t twobyte_count = count_ones(twobyte_mask & ~ascii_mask) / 2;
-    size_t threebyte_count = count_ones(not_pair_mask & ~twobyte_mask) / 2;
-    size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
-    count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count +
-             ascii_count;
-  }
-  return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos,
-                                                                   size - pos);
-}
+    // no bits set above 11th bit
+    const __m256i one_or_two_bytes_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_0000);
+    const uint32_t one_or_two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
+    if (one_or_two_bytes_bitmask == 0xffffffff) {
 
-template <endianness big_endian>
-simdutf_really_inline size_t utf32_length_from_utf16(const char16_t *in,
-                                                     size_t size) {
-  return count_code_points<big_endian>(in, size);
-}
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 16
+      // expected output   : [110a|aaaa|10bb|bbbb] x 16
+      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
+      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
 
-simdutf_really_inline void
-change_endianness_utf16(const char16_t *in, size_t size, char16_t *output) {
-  size_t pos = 0;
+      // t0 = [000a|aaaa|bbbb|bb00]
+      const __m256i t0 = _mm256_slli_epi16(in, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
+      // t2 = [0000|0000|00bb|bbbb]
+      const __m256i t2 = _mm256_and_si256(in, v_003f);
+      // t3 = [000a|aaaa|00bb|bbbb]
+      const __m256i t3 = _mm256_or_si256(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      const __m256i t4 = _mm256_or_si256(t3, v_c080);
 
-  while (pos < size / 32 * 32) {
-    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-    input.swap_bytes();
-    input.store(reinterpret_cast<uint16_t *>(output));
-    pos += 32;
-    output += 32;
-  }
+      // 2. merge ASCII and 2-byte codewords
+      const __m256i utf8_unpacked =
+          _mm256_blendv_epi8(t4, in, one_byte_bytemask);
+
+      // 3. prepare bitmask for 8-bit lookup
+      const uint32_t M0 = one_byte_bitmask & 0x55555555;
+      const uint32_t M1 = M0 >> 7;
+      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
+      // 4. pack the bytes
 
-  scalar::utf16::change_endianness_utf16(in + pos, size - pos, output);
-}
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
+      const uint8_t *row_2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
+                                                                       16)][0];
 
-} // namespace utf16
-} // unnamed namespace
-} // namespace arm64
-} // namespace simdutf
-/* end file src/generic/utf16.h */
-/* begin file src/generic/utf8.h */
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
 
-namespace simdutf {
-namespace arm64 {
-namespace {
-namespace utf8 {
+      const __m256i utf8_packed = _mm256_shuffle_epi8(
+          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
+      // 5. store bytes
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_castsi256_si128(utf8_packed));
+      utf8_output += row[0];
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_extractf128_si256(utf8_packed, 1));
+      utf8_output += row_2[0];
 
-using namespace simd;
+      // 6. adjust pointers
+      buf += 16;
+      continue;
+    }
+    // 1. Check if there are any surrogate words in the input chunk.
+    //    We also have to deal with the situation where there is a surrogate
+    //    word at the end of a chunk.
+    const __m256i surrogates_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
 
-simdutf_really_inline size_t count_code_points(const char *in, size_t size) {
-  size_t pos = 0;
-  size_t count = 0;
-  for (; pos + 64 <= size; pos += 64) {
-    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-    uint64_t utf8_continuation_mask = input.gt(-65);
-    count += count_ones(utf8_continuation_mask);
-  }
-  return count + scalar::utf8::count_code_points(in + pos, size - pos);
-}
+    // bitmask = 0x00000000 if there are no surrogates
+    //         = 0xc0000000 if only the last word is a surrogate
+    const uint32_t surrogates_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
+    // It might seem like checking for surrogates_bitmask == 0xc0000000 could
+    // help. However, it is likely an uncommon occurrence.
+    if (surrogates_bitmask == 0x00000000) {
+      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+      const __m256i dup_even = _mm256_setr_epi16(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
-simdutf_really_inline size_t utf16_length_from_utf8(const char *in,
-                                                    size_t size) {
-  size_t pos = 0;
-  size_t count = 0;
-  // This algorithm could no doubt be improved!
-  for (; pos + 64 <= size; pos += 64) {
-    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-    uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-    // We count one word for anything that is not a continuation (so
-    // leading bytes).
-    count += 64 - count_ones(utf8_continuation_mask);
-    int64_t utf8_4byte = input.gteq_unsigned(240);
-    count += count_ones(utf8_4byte);
-  }
-  return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
-}
-} // namespace utf8
-} // unnamed namespace
-} // namespace arm64
-} // namespace simdutf
-/* end file src/generic/utf8.h */
-// transcoding from UTF-8 to Latin 1
-/* begin file src/generic/utf8_to_latin1/utf8_to_latin1.h */
+      /* In this branch we handle three cases:
+         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+        single UTF-8 byte
+         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
+        UTF-8 bytes
+         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+        three UTF-8 bytes
 
-namespace simdutf {
-namespace arm64 {
-namespace {
-namespace utf8_to_latin1 {
-using namespace simd;
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
 
-simdutf_really_inline simd8<uint8_t>
-check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-  // For UTF-8 to Latin 1, we can allow any ASCII character, and any
-  // continuation byte, but the non-ASCII leading bytes must be 0b11000011 or
-  // 0b11000010 and nothing else.
-  //
-  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-  // Bit 1 = Too Long (ASCII followed by continuation)
-  // Bit 2 = Overlong 3-byte
-  // Bit 4 = Surrogate
-  // Bit 5 = Overlong 2-byte
-  // Bit 7 = Two Continuations
-  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
-                                               // 11______ 11______
-  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
-  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
-  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
-  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
-  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
-  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
-                                               // 11110100 101_____
-                                               // 11110101 1001____
-                                               // 11110101 101_____
-                                               // 1111011_ 1001____
-                                               // 1111011_ 101_____
-                                               // 11111___ 1001____
-                                               // 11111___ 101_____
-  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
-  // 11110101 1000____
-  // 1111011_ 1000____
-  // 11111___ 1000____
-  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
-  constexpr const uint8_t FORBIDDEN = 0xff;
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
 
-  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
-      // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG,
-      // 10______ ________ <continuation in byte 1>
-      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
-      // 1100____ ________ <two byte lead in byte 1>
-      TOO_SHORT | OVERLONG_2,
-      // 1101____ ________ <two byte lead in byte 1>
-      FORBIDDEN,
-      // 1110____ ________ <three byte lead in byte 1>
-      FORBIDDEN,
-      // 1111____ ________ <four+ byte lead in byte 1>
-      FORBIDDEN);
-  constexpr const uint8_t CARRY =
-      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-  const simd8<uint8_t> byte_1_low =
-      (prev1 & 0x0F)
-          .lookup_16<uint8_t>(
-              // ____0000 ________
-              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-              // ____0001 ________
-              CARRY | OVERLONG_2,
-              // ____001_ ________
-              CARRY, CARRY,
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
 
-              // ____0100 ________
-              FORBIDDEN,
-              // ____0101 ________
-              FORBIDDEN,
-              // ____011_ ________
-              FORBIDDEN, FORBIDDEN,
+        Finally, from these two code units we build the proper UTF-8 sequence,
+        taking into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
+#define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const __m256i t0 = _mm256_shuffle_epi8(in, dup_even);
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
 
-              // ____1___ ________
-              FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN,
-              // ____1101 ________
-              FORBIDDEN, FORBIDDEN, FORBIDDEN);
-  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
-      // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT,
+      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
+      const __m256i s0 = _mm256_srli_epi16(in, 4);
+      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
+      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
+      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
+      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
+      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
+      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
+                                             simdutf_vec(0b0100000000000000));
+      const __m256i s4 = _mm256_xor_si256(s3, m0);
+#undef simdutf_vec
 
-      // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
-          OVERLONG_4,
-      // ________ 1001____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
-      // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      // 4. expand code units 16-bit => 32-bit
+      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
+      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
 
-      // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
-  return (byte_1_high & byte_1_low & byte_2_high);
-}
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
+                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
+      // Due to the wider registers, the following path is less likely to be
+      // useful.
+      /*if(mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const __m256i shuffle =
+      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
+      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
+      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
+      _mm256_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
+        continue;
+      }*/
+      const uint8_t mask0 = uint8_t(mask);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
 
-struct validating_transcoder {
-  // If this is nonzero, there has been a UTF-8 error.
-  simd8<uint8_t> error;
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
 
-  validating_transcoder() : error(uint8_t(0)) {}
-  //
-  // Check whether the current bytes are valid UTF-8.
-  //
-  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
-                                              const simd8<uint8_t> prev_input) {
-    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
-    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
-    // small negative numbers)
-    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-    this->error |= check_special_cases(input, prev1);
-  }
+      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
+      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
+      const __m128i utf8_2 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
 
-  simdutf_really_inline size_t convert(const char *in, size_t size,
-                                       char *latin1_output) {
-    size_t pos = 0;
-    char *start{latin1_output};
-    // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
-    // last 16 bytes, and if the data is valid, then it is entirely safe because
-    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
-    // generally assume that you have valid UTF-8 input, so we are going to go
-    // back from the end counting 16 leading bytes, to give us a good margin.
-    size_t leading_byte = 0;
-    size_t margin = size;
-    for (; margin > 0 && leading_byte < 16; margin--) {
-      leading_byte += (int8_t(in[margin - 1]) >
-                       -65); // twos complement of -65 is 1011 1111 ...
-    }
-    // If the input is long enough, then we have that margin-1 is the eight last
-    // leading byte.
-    const size_t safety_margin = size - margin + 1; // to avoid overruns!
-    while (pos + 64 + safety_margin <= size) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      if (input.is_ascii()) {
-        input.store((int8_t *)latin1_output);
-        latin1_output += 64;
-        pos += 64;
-      } else {
-        // you might think that a for-loop would work, but under Visual Studio,
-        // it is not good enough.
-        static_assert(
-            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
-                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        auto zero = simd8<uint8_t>{uint8_t(0)};
-        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        uint64_t utf8_continuation_mask =
-            input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
-                               // this case, we also have ASCII to account for.
-        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-        // We process in blocks of up to 12 bytes except possibly
-        // for fast paths which may process up to 16 bytes. For the
-        // slow path to work, we should have at least 12 input bytes left.
-        size_t max_starting_point = (pos + 64) - 12;
-        // Next loop is going to run at least five times.
-        while (pos < max_starting_point) {
-          // Performance note: our ability to compute 'consumed' and
-          // then shift and recompute is critical. If there is a
-          // latency of, say, 4 cycles on getting 'consumed', then
-          // the inner loop might have a total latency of about 6 cycles.
-          // Yet we process between 6 to 12 inputs bytes, thus we get
-          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-          // for this section of the code. Hence, there is a limit
-          // to how much we can further increase this latency before
-          // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_latin1(
-              in + pos, utf8_end_of_code_point_mask, latin1_output);
-          pos += consumed;
-          utf8_end_of_code_point_mask >>= consumed;
-        }
-        // At this point there may remain between 0 and 12 bytes in the
-        // 64-byte block. These bytes will be processed again. So we have an
-        // 80% efficiency (in the worst case). In practice we expect an
-        // 85% to 90% efficiency.
-      }
-    }
-    if (errors()) {
-      return 0;
-    }
-    if (pos < size) {
-      size_t howmany =
-          scalar::utf8_to_latin1::convert(in + pos, size - pos, latin1_output);
-      if (howmany == 0) {
-        return 0;
-      }
-      latin1_output += howmany;
-    }
-    return latin1_output - start;
-  }
+      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
+      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
+      const __m128i utf8_3 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
 
-  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
-                                                   char *latin1_output) {
-    size_t pos = 0;
-    char *start{latin1_output};
-    // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
-    // last 16 bytes, and if the data is valid, then it is entirely safe because
-    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
-    // generally assume that you have valid UTF-8 input, so we are going to go
-    // back from the end counting 8 leading bytes, to give us a good margin.
-    size_t leading_byte = 0;
-    size_t margin = size;
-    for (; margin > 0 && leading_byte < 8; margin--) {
-      leading_byte += (int8_t(in[margin - 1]) > -65);
-    }
-    // If the input is long enough, then we have that margin-1 is the eight last
-    // leading byte.
-    const size_t safety_margin = size - margin + 1; // to avoid overruns!
-    while (pos + 64 + safety_margin <= size) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      if (input.is_ascii()) {
-        input.store((int8_t *)latin1_output);
-        latin1_output += 64;
-        pos += 64;
-      } else {
-        // you might think that a for-loop would work, but under Visual Studio,
-        // it is not good enough.
-        static_assert(
-            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
-                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        auto zero = simd8<uint8_t>{uint8_t(0)};
-        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        if (errors()) {
-          // rewind_and_convert_with_errors will seek a potential error from
-          // in+pos onward, with the ability to go back up to pos bytes, and
-          // read size-pos bytes forward.
-          result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
-              pos, in + pos, size - pos, latin1_output);
-          res.count += pos;
-          return res;
-        }
-        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-        // We process in blocks of up to 12 bytes except possibly
-        // for fast paths which may process up to 16 bytes. For the
-        // slow path to work, we should have at least 12 input bytes left.
-        size_t max_starting_point = (pos + 64) - 12;
-        // Next loop is going to run at least five times.
-        while (pos < max_starting_point) {
-          // Performance note: our ability to compute 'consumed' and
-          // then shift and recompute is critical. If there is a
-          // latency of, say, 4 cycles on getting 'consumed', then
-          // the inner loop might have a total latency of about 6 cycles.
-          // Yet we process between 6 to 12 inputs bytes, thus we get
-          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-          // for this section of the code. Hence, there is a limit
-          // to how much we can further increase this latency before
-          // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_latin1(
-              in + pos, utf8_end_of_code_point_mask, latin1_output);
-          pos += consumed;
-          utf8_end_of_code_point_mask >>= consumed;
-        }
-        // At this point there may remain between 0 and 12 bytes in the
-        // 64-byte block. These bytes will be processed again. So we have an
-        // 80% efficiency (in the worst case). In practice we expect an
-        // 85% to 90% efficiency.
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
+      utf8_output += row0[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+      utf8_output += row1[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
+      utf8_output += row2[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
+      utf8_output += row3[0];
+      buf += 16;
+      // surrogate pair(s) in a register
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
       }
-    }
-    if (errors()) {
-      // rewind_and_convert_with_errors will seek a potential error from in+pos
-      // onward, with the ability to go back up to pos bytes, and read size-pos
-      // bytes forward.
-      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
-          pos, in + pos, size - pos, latin1_output);
-      res.count += pos;
-      return res;
-    }
-    if (pos < size) {
-      // rewind_and_convert_with_errors will seek a potential error from in+pos
-      // onward, with the ability to go back up to pos bytes, and read size-pos
-      // bytes forward.
-      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
-          pos, in + pos, size - pos, latin1_output);
-      if (res.error) { // In case of error, we want the error position
-        res.count += pos;
-        return res;
-      } else { // In case of success, we want the number of word written
-        latin1_output += res.count;
+      for (; k < forward; k++) {
+        uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
+        if ((word & 0xFF80) == 0) {
+          *utf8_output++ = char(word);
+        } else if ((word & 0xF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xF800) != 0xD800) {
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else {
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word =
+              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k - 1),
+                utf8_output);
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf8_output++ = char((value >> 18) | 0b11110000);
+          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((value & 0b111111) | 0b10000000);
+        }
       }
-    }
-    return result(error_code::SUCCESS, latin1_output - start);
-  }
+      buf += k;
+    }
+  } // while
+  return std::make_pair(result(error_code::SUCCESS, buf - start), utf8_output);
+}
+/* end file src/haswell/avx2_convert_utf16_to_utf8.cpp */
+/* begin file src/haswell/avx2_convert_utf16_to_utf32.cpp */
+/*
+    The vectorized algorithm works on a single SSE register, i.e., it
+    loads eight 16-bit code units.
 
-  simdutf_really_inline bool errors() const {
-    return this->error.any_bits_set_anywhere();
-  }
+    We consider three cases:
+    1. an input register contains no surrogates and each value
+       is in range 0x0000 .. 0x07ff.
+    2. an input register contains no surrogates and values are
+       in range 0x0000 .. 0xffff.
+    3. an input register contains surrogates --- i.e. codepoints
+       can have 16 or 32 bits.
 
-}; // struct utf8_checker
-} // namespace utf8_to_latin1
-} // unnamed namespace
-} // namespace arm64
-} // namespace simdutf
-/* end file src/generic/utf8_to_latin1/utf8_to_latin1.h */
-/* begin file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
+    Ad 1.
 
-namespace simdutf {
-namespace arm64 {
-namespace {
-namespace utf8_to_latin1 {
-using namespace simd;
+    When values are less than 0x0800, it means that a 16-bit code unit
+    can be converted into: 1) single UTF8 byte (when it is an ASCII
+    char) or 2) two UTF8 bytes.
 
-simdutf_really_inline size_t convert_valid(const char *in, size_t size,
-                                           char *latin1_output) {
-  size_t pos = 0;
-  char *start{latin1_output};
-  // In the worst case, we have the haswell kernel which can cause an overflow
-  // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last
-  // 16 bytes, and if the data is valid, then it is entirely safe because 16
-  // UTF-8 bytes generate much more than 8 bytes. However, you cannot generally
-  // assume that you have valid UTF-8 input, so we are going to go back from the
-  // end counting 8 leading bytes, to give us a good margin.
-  size_t leading_byte = 0;
-  size_t margin = size;
-  for (; margin > 0 && leading_byte < 8; margin--) {
-    leading_byte += (int8_t(in[margin - 1]) >
-                     -65); // twos complement of -65 is 1011 1111 ...
-  }
-  // If the input is long enough, then we have that margin-1 is the eight last
-  // leading byte.
-  const size_t safety_margin = size - margin + 1; // to avoid overruns!
-  while (pos + 64 + safety_margin <= size) {
-    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-    if (input.is_ascii()) {
-      input.store((int8_t *)latin1_output);
-      latin1_output += 64;
-      pos += 64;
+    For this case we only do some shuffling to obtain these 2-byte
+    codes and finally compress the whole SSE register with a single
+    shuffle.
+
+    We need a 256-entry lookup table to get a compression pattern
+    and the number of output bytes in the compressed vector register.
+    Each entry occupies 17 bytes.
+
+    Ad 2.
+
+    When values fit in 16-bit code units, but are above 0x07ff, then
+    a single word may produce one, two or three UTF8 bytes.
+
+    We prepare data for all three of these cases in two registers.
+    The first register contains the lower two UTF8 bytes (used in all
+    cases), while the second one contains just the third byte for
+    the three-UTF8-bytes case.
+
+    Finally, these two registers are interleaved, forming an eight-element
+    array of 32-bit values. The array spans two SSE registers.
+    The bytes from the registers are compressed using two shuffles.
+
+    We need a 256-entry lookup table to get a compression pattern
+    and the number of output bytes in the compressed vector register.
+    Each entry occupies 17 bytes.
+
+
+    To summarize:
+    - We need two 256-entry tables that have 8704 bytes in total.
+*/
+
+/*
+  Returns a pair: a pointer to the first unprocessed code unit in buf and
+  utf32_output. A scalar routine should carry on the conversion of the tail.
+*/
+template <endianness big_endian>
+std::pair<const char16_t *, char32_t *>
+avx2_convert_utf16_to_utf32(const char16_t *buf, size_t len,
+                            char32_t *utf32_output) {
+  const char16_t *end = buf + len;
+  const __m256i v_f800 = _mm256_set1_epi16((int16_t)0xf800);
+  const __m256i v_d800 = _mm256_set1_epi16((int16_t)0xd800);
+
+  while (end - buf >= 16) {
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    if (big_endian) {
+      const __m256i swap = _mm256_setr_epi8(
+          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+      in = _mm256_shuffle_epi8(in, swap);
+    }
+
+    // 1. Check if there are any surrogate words in the input chunk.
+    //    We also have to deal with the situation where there is a surrogate
+    //    word at the end of a chunk.
+    const __m256i surrogates_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
+
+    // bitmask = 0x00000000 if there are no surrogates
+    //         = 0xc0000000 if only the last word is a surrogate
+    const uint32_t surrogates_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
+    // It might seem like checking for surrogates_bitmask == 0xc0000000 could
+    // help. However, it is likely an uncommon occurrence.
+    if (surrogates_bitmask == 0x00000000) {
+      // case: we extend all sixteen 16-bit code units to sixteen 32-bit code
+      // units
+      _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output),
+                          _mm256_cvtepu16_epi32(_mm256_castsi256_si128(in)));
+      _mm256_storeu_si256(
+          reinterpret_cast<__m256i *>(utf32_output + 8),
+          _mm256_cvtepu16_epi32(_mm256_extractf128_si256(in, 1)));
+      utf32_output += 16;
+      buf += 16;
+      // surrogate pair(s) in a register
     } else {
-      // you might think that a for-loop would work, but under Visual Studio, it
-      // is not good enough.
-      uint64_t utf8_continuation_mask =
-          input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
-                             // this case, we also have ASCII to account for.
-      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-      // We process in blocks of up to 12 bytes except possibly
-      // for fast paths which may process up to 16 bytes. For the
-      // slow path to work, we should have at least 12 input bytes left.
-      size_t max_starting_point = (pos + 64) - 12;
-      // Next loop is going to run at least five times.
-      while (pos < max_starting_point) {
-        // Performance note: our ability to compute 'consumed' and
-        // then shift and recompute is critical. If there is a
-        // latency of, say, 4 cycles on getting 'consumed', then
-        // the inner loop might have a total latency of about 6 cycles.
-        // Yet we process between 6 to 12 inputs bytes, thus we get
-        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-        // for this section of the code. Hence, there is a limit
-        // to how much we can further increase this latency before
-        // it seriously harms performance.
-        size_t consumed = convert_masked_utf8_to_latin1(
-            in + pos, utf8_end_of_code_point_mask, latin1_output);
-        pos += consumed;
-        utf8_end_of_code_point_mask >>= consumed;
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
       }
-      // At this point there may remain between 0 and 12 bytes in the
-      // 64-byte block. These bytes will be processed again. So we have an
-      // 80% efficiency (in the worst case). In practice we expect an
-      // 85% to 90% efficiency.
+      for (; k < forward; k++) {
+        uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
+        if ((word & 0xF800) != 0xD800) {
+          // No surrogate pair
+          *utf32_output++ = char32_t(word);
+        } else {
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word =
+              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(nullptr, utf32_output);
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf32_output++ = char32_t(value);
+        }
+      }
+      buf += k;
     }
-  }
-  if (pos < size) {
-    size_t howmany = scalar::utf8_to_latin1::convert_valid(in + pos, size - pos,
-                                                           latin1_output);
-    latin1_output += howmany;
-  }
-  return latin1_output - start;
+  } // while
+  return std::make_pair(buf, utf32_output);
 }
 
-} // namespace utf8_to_latin1
-} // namespace
-} // namespace arm64
-} // namespace simdutf
-  // namespace simdutf
-/* end file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
+/*
+  Returns a pair: a result struct and utf32_output.
+  If there is an error, the count field of the result is the position of the
+  error. Otherwise, it is the position of the first unprocessed byte in buf
+  (even if finished). A scalar routine should carry on the conversion of the
+  tail if needed.
+*/
+template <endianness big_endian>
+std::pair<result, char32_t *>
+avx2_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
+                                        char32_t *utf32_output) {
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
+  const __m256i v_f800 = _mm256_set1_epi16((int16_t)0xf800);
+  const __m256i v_d800 = _mm256_set1_epi16((int16_t)0xd800);
 
-// placeholder scalars
+  while (end - buf >= 16) {
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    if (big_endian) {
+      const __m256i swap = _mm256_setr_epi8(
+          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
+          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
+      in = _mm256_shuffle_epi8(in, swap);
+    }
 
-//
-// Implementation-specific overrides
-//
-namespace simdutf {
-namespace arm64 {
+    // 1. Check if there are any surrogate words in the input chunk.
+    //    We also have to deal with the situation where there is a surrogate
+    //    word at the end of a chunk.
+    const __m256i surrogates_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
 
-simdutf_warn_unused int
-implementation::detect_encodings(const char *input,
-                                 size_t length) const noexcept {
-  // If there is a BOM, then we trust it.
-  auto bom_encoding = simdutf::BOM::check_bom(input, length);
-  if (bom_encoding != encoding_type::unspecified) {
-    return bom_encoding;
-  }
-  // todo: reimplement as a one-pass algorithm.
-  int out = 0;
-  if (validate_utf8(input, length)) {
-    out |= encoding_type::UTF8;
-  }
-  if ((length % 2) == 0) {
-    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
-                         length / 2)) {
-      out |= encoding_type::UTF16_LE;
-    }
-  }
-  if ((length % 4) == 0) {
-    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
-      out |= encoding_type::UTF32_LE;
+    // bitmask = 0x0000 if there are no surrogates
+    //         = 0xc000 if the last word is a surrogate
+    const uint32_t surrogates_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
+    if (surrogates_bitmask == 0x00000000) {
+      // case: we extend all sixteen 16-bit code units to sixteen 32-bit code
+      // units
+      _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output),
+                          _mm256_cvtepu16_epi32(_mm256_castsi256_si128(in)));
+      _mm256_storeu_si256(
+          reinterpret_cast<__m256i *>(utf32_output + 8),
+          _mm256_cvtepu16_epi32(_mm256_extractf128_si256(in, 1)));
+      utf32_output += 16;
+      buf += 16;
+      // surrogate pair(s) in a register
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
+        if ((word & 0xF800) != 0xD800) {
+          // No surrogate pair
+          *utf32_output++ = char32_t(word);
+        } else {
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word =
+              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k - 1),
+                utf32_output);
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf32_output++ = char32_t(value);
+        }
+      }
+      buf += k;
     }
-  }
-  return out;
+  } // while
+  return std::make_pair(result(error_code::SUCCESS, buf - start), utf32_output);
 }
+/* end file src/haswell/avx2_convert_utf16_to_utf32.cpp */
 
-simdutf_warn_unused bool
-implementation::validate_utf8(const char *buf, size_t len) const noexcept {
-  return arm64::utf8_validation::generic_validate_utf8(buf, len);
-}
+/* begin file src/haswell/avx2_convert_utf32_to_latin1.cpp */
+std::pair<const char32_t *, char *>
+avx2_convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                             char *latin1_output) {
+  const size_t rounded_len =
+      len & ~0x1F; // Round down to nearest multiple of 32
 
-simdutf_warn_unused result implementation::validate_utf8_with_errors(
-    const char *buf, size_t len) const noexcept {
-  return arm64::utf8_validation::generic_validate_utf8_with_errors(buf, len);
-}
+  __m256i high_bytes_mask = _mm256_set1_epi32(0xFFFFFF00);
 
-simdutf_warn_unused bool
-implementation::validate_ascii(const char *buf, size_t len) const noexcept {
-  return arm64::utf8_validation::generic_validate_ascii(buf, len);
-}
+  __m256i shufmask = _mm256_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+                                     -1, 12, 8, 4, 0, -1, -1, -1, -1, -1, -1,
+                                     -1, -1, -1, -1, -1, -1, 12, 8, 4, 0);
 
-simdutf_warn_unused result implementation::validate_ascii_with_errors(
-    const char *buf, size_t len) const noexcept {
-  return arm64::utf8_validation::generic_validate_ascii_with_errors(buf, len);
-}
+  for (size_t i = 0; i < rounded_len; i += 16) {
+    __m256i in1 = _mm256_loadu_si256((__m256i *)buf);
+    __m256i in2 = _mm256_loadu_si256((__m256i *)(buf + 8));
 
-simdutf_warn_unused bool
-implementation::validate_utf16le(const char16_t *buf,
-                                 size_t len) const noexcept {
-  if (simdutf_unlikely(len == 0)) {
-    // empty input is valid. protected the implementation from nullptr.
-    return true;
-  }
-  const char16_t *tail = arm_validate_utf16<endianness::LITTLE>(buf, len);
-  if (tail) {
-    return scalar::utf16::validate<endianness::LITTLE>(tail,
-                                                       len - (tail - buf));
-  } else {
-    return false;
-  }
-}
+    __m256i check_combined = _mm256_or_si256(in1, in2);
 
-simdutf_warn_unused bool
-implementation::validate_utf16be(const char16_t *buf,
-                                 size_t len) const noexcept {
-  if (simdutf_unlikely(len == 0)) {
-    // empty input is valid. protected the implementation from nullptr.
-    return true;
-  }
-  const char16_t *tail = arm_validate_utf16<endianness::BIG>(buf, len);
-  if (tail) {
-    return scalar::utf16::validate<endianness::BIG>(tail, len - (tail - buf));
-  } else {
-    return false;
-  }
-}
+    if (!_mm256_testz_si256(check_combined, high_bytes_mask)) {
+      return std::make_pair(nullptr, latin1_output);
+    }
 
-simdutf_warn_unused result implementation::validate_utf16le_with_errors(
-    const char16_t *buf, size_t len) const noexcept {
-  if (simdutf_unlikely(len == 0)) {
-    return result(error_code::SUCCESS, 0);
-  }
-  result res = arm_validate_utf16_with_errors<endianness::LITTLE>(buf, len);
-  if (res.count != len) {
-    result scalar_res = scalar::utf16::validate_with_errors<endianness::LITTLE>(
-        buf + res.count, len - res.count);
-    return result(scalar_res.error, res.count + scalar_res.count);
-  } else {
-    return res;
-  }
-}
+    // Turn UTF32 bytes into latin 1 bytes
+    __m256i shuffled1 = _mm256_shuffle_epi8(in1, shufmask);
+    __m256i shuffled2 = _mm256_shuffle_epi8(in2, shufmask);
 
-simdutf_warn_unused result implementation::validate_utf16be_with_errors(
-    const char16_t *buf, size_t len) const noexcept {
-  if (simdutf_unlikely(len == 0)) {
-    return result(error_code::SUCCESS, 0);
-  }
-  result res = arm_validate_utf16_with_errors<endianness::BIG>(buf, len);
-  if (res.count != len) {
-    result scalar_res = scalar::utf16::validate_with_errors<endianness::BIG>(
-        buf + res.count, len - res.count);
-    return result(scalar_res.error, res.count + scalar_res.count);
-  } else {
-    return res;
-  }
-}
+    // move Latin1 bytes to their correct spot
+    __m256i idx1 = _mm256_set_epi32(-1, -1, -1, -1, -1, -1, 4, 0);
+    __m256i idx2 = _mm256_set_epi32(-1, -1, -1, -1, 4, 0, -1, -1);
+    __m256i reshuffled1 = _mm256_permutevar8x32_epi32(shuffled1, idx1);
+    __m256i reshuffled2 = _mm256_permutevar8x32_epi32(shuffled2, idx2);
 
-simdutf_warn_unused bool
-implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
-  if (simdutf_unlikely(len == 0)) {
-    // empty input is valid. protected the implementation from nullptr.
-    return true;
-  }
-  const char32_t *tail = arm_validate_utf32le(buf, len);
-  if (tail) {
-    return scalar::utf32::validate(tail, len - (tail - buf));
-  } else {
-    return false;
-  }
-}
+    __m256i result = _mm256_or_si256(reshuffled1, reshuffled2);
+    _mm_storeu_si128((__m128i *)latin1_output, _mm256_castsi256_si128(result));
 
-simdutf_warn_unused result implementation::validate_utf32_with_errors(
-    const char32_t *buf, size_t len) const noexcept {
-  if (simdutf_unlikely(len == 0)) {
-    return result(error_code::SUCCESS, 0);
-  }
-  result res = arm_validate_utf32le_with_errors(buf, len);
-  if (res.count != len) {
-    result scalar_res =
-        scalar::utf32::validate_with_errors(buf + res.count, len - res.count);
-    return result(scalar_res.error, res.count + scalar_res.count);
-  } else {
-    return res;
+    latin1_output += 16;
+    buf += 16;
   }
+
+  return std::make_pair(buf, latin1_output);
 }
+std::pair<result, char *>
+avx2_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                         char *latin1_output) {
+  const size_t rounded_len =
+      len & ~0x1F; // Round down to nearest multiple of 32
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
-    const char *buf, size_t len, char *utf8_output) const noexcept {
-  std::pair<const char *, char *> ret =
-      arm_convert_latin1_to_utf8(buf, len, utf8_output);
-  size_t converted_chars = ret.second - utf8_output;
+  __m256i high_bytes_mask = _mm256_set1_epi32(0xFFFFFF00);
+  __m256i shufmask = _mm256_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+                                     -1, 12, 8, 4, 0, -1, -1, -1, -1, -1, -1,
+                                     -1, -1, -1, -1, -1, -1, 12, 8, 4, 0);
 
-  if (ret.first != buf + len) {
-    const size_t scalar_converted_chars = scalar::latin1_to_utf8::convert(
-        ret.first, len - (ret.first - buf), ret.second);
-    converted_chars += scalar_converted_chars;
-  }
-  return converted_chars;
-}
+  const char32_t *start = buf;
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  std::pair<const char *, char16_t *> ret =
-      arm_convert_latin1_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
-  size_t converted_chars = ret.second - utf16_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_converted_chars =
-        scalar::latin1_to_utf16::convert<endianness::LITTLE>(
-            ret.first, len - (ret.first - buf), ret.second);
-    converted_chars += scalar_converted_chars;
-  }
-  return converted_chars;
-}
+  for (size_t i = 0; i < rounded_len; i += 16) {
+    __m256i in1 = _mm256_loadu_si256((__m256i *)buf);
+    __m256i in2 = _mm256_loadu_si256((__m256i *)(buf + 8));
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  std::pair<const char *, char16_t *> ret =
-      arm_convert_latin1_to_utf16<endianness::BIG>(buf, len, utf16_output);
-  size_t converted_chars = ret.second - utf16_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_converted_chars =
-        scalar::latin1_to_utf16::convert<endianness::BIG>(
-            ret.first, len - (ret.first - buf), ret.second);
-    converted_chars += scalar_converted_chars;
-  }
-  return converted_chars;
-}
+    __m256i check_combined = _mm256_or_si256(in1, in2);
+
+    if (!_mm256_testz_si256(check_combined, high_bytes_mask)) {
+      // Fallback to scalar code for handling errors
+      for (int k = 0; k < 8; k++) {
+        char32_t codepoint = buf[k];
+        if (codepoint <= 0xFF) {
+          *latin1_output++ = static_cast<char>(codepoint);
+        } else {
+          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
+                                latin1_output);
+        }
+      }
+      buf += 8;
+    } else {
+      __m256i shuffled1 = _mm256_shuffle_epi8(in1, shufmask);
+      __m256i shuffled2 = _mm256_shuffle_epi8(in2, shufmask);
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
-    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
-  std::pair<const char *, char32_t *> ret =
-      arm_convert_latin1_to_utf32(buf, len, utf32_output);
-  size_t converted_chars = ret.second - utf32_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
-        ret.first, len - (ret.first - buf), ret.second);
-    converted_chars += scalar_converted_chars;
+      __m256i idx1 = _mm256_set_epi32(-1, -1, -1, -1, -1, -1, 4, 0);
+      __m256i idx2 = _mm256_set_epi32(-1, -1, -1, -1, 4, 0, -1, -1);
+      __m256i reshuffled1 = _mm256_permutevar8x32_epi32(shuffled1, idx1);
+      __m256i reshuffled2 = _mm256_permutevar8x32_epi32(shuffled2, idx2);
+
+      __m256i result = _mm256_or_si256(reshuffled1, reshuffled2);
+      _mm_storeu_si128((__m128i *)latin1_output,
+                       _mm256_castsi256_si128(result));
+
+      latin1_output += 16;
+      buf += 16;
+    }
   }
-  return converted_chars;
-}
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
-    const char *buf, size_t len, char *latin1_output) const noexcept {
-  utf8_to_latin1::validating_transcoder converter;
-  return converter.convert(buf, len, latin1_output);
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        latin1_output);
 }
+/* end file src/haswell/avx2_convert_utf32_to_latin1.cpp */
+/* begin file src/haswell/avx2_convert_utf32_to_utf8.cpp */
+std::pair<const char32_t *, char *>
+avx2_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_output) {
+  const char32_t *end = buf + len;
+  const __m256i v_0000 = _mm256_setzero_si256();
+  const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
+  const __m256i v_ff80 = _mm256_set1_epi16((uint16_t)0xff80);
+  const __m256i v_f800 = _mm256_set1_epi16((uint16_t)0xf800);
+  const __m256i v_c080 = _mm256_set1_epi16((uint16_t)0xc080);
+  const __m256i v_7fffffff = _mm256_set1_epi32((uint32_t)0x7fffffff);
+  __m256i running_max = _mm256_setzero_si256();
+  __m256i forbidden_bytemask = _mm256_setzero_si256();
 
-simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
-    const char *buf, size_t len, char *latin1_output) const noexcept {
-  utf8_to_latin1::validating_transcoder converter;
-  return converter.convert_with_errors(buf, len, latin1_output);
-}
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
-    const char *buf, size_t len, char *latin1_output) const noexcept {
-  return arm64::utf8_to_latin1::convert_valid(buf, len, latin1_output);
-}
+  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    __m256i nextin = _mm256_loadu_si256((__m256i *)buf + 1);
+    running_max = _mm256_max_epu32(_mm256_max_epu32(in, running_max), nextin);
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  utf8_to_utf16::validating_transcoder converter;
-  return converter.convert<endianness::LITTLE>(buf, len, utf16_output);
-}
+    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
+    // saturation
+    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff),
+                                        _mm256_and_si256(nextin, v_7fffffff));
+    in_16 = _mm256_permute4x64_epi64(in_16, 0b11011000);
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  utf8_to_utf16::validating_transcoder converter;
-  return converter.convert<endianness::BIG>(buf, len, utf16_output);
-}
+    // Try to apply UTF-16 => UTF-8 routine on 256 bits
+    // (haswell/avx2_convert_utf16_to_utf8.cpp)
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  utf8_to_utf16::validating_transcoder converter;
-  return converter.convert_with_errors<endianness::LITTLE>(buf, len,
-                                                           utf16_output);
-}
+    if (_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
+      // 1. pack the bytes
+      const __m128i utf8_packed = _mm_packus_epi16(
+          _mm256_castsi256_si128(in_16), _mm256_extractf128_si256(in_16, 1));
+      // 2. store (16 bytes)
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+      // 3. adjust pointers
+      buf += 16;
+      utf8_output += 16;
+      continue; // we are done for this round!
+    }
+    // no bits set above 7th bit
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  utf8_to_utf16::validating_transcoder converter;
-  return converter.convert_with_errors<endianness::BIG>(buf, len, utf16_output);
-}
+    // no bits set above 11th bit
+    const __m256i one_or_two_bytes_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
+    const uint32_t one_or_two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
+    if (one_or_two_bytes_bitmask == 0xffffffff) {
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+      // expected output   : [110a|aaaa|10bb|bbbb] x 8
+      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
+      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
-    const char *input, size_t size, char16_t *utf16_output) const noexcept {
-  return utf8_to_utf16::convert_valid<endianness::LITTLE>(input, size,
-                                                          utf16_output);
-}
+      // t0 = [000a|aaaa|bbbb|bb00]
+      const __m256i t0 = _mm256_slli_epi16(in_16, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
+      // t2 = [0000|0000|00bb|bbbb]
+      const __m256i t2 = _mm256_and_si256(in_16, v_003f);
+      // t3 = [000a|aaaa|00bb|bbbb]
+      const __m256i t3 = _mm256_or_si256(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      const __m256i t4 = _mm256_or_si256(t3, v_c080);
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
-    const char *input, size_t size, char16_t *utf16_output) const noexcept {
-  return utf8_to_utf16::convert_valid<endianness::BIG>(input, size,
-                                                       utf16_output);
-}
+      // 2. merge ASCII and 2-byte codewords
+      const __m256i utf8_unpacked =
+          _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
-    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
-  utf8_to_utf32::validating_transcoder converter;
-  return converter.convert(buf, len, utf32_output);
-}
+      // 3. prepare bitmask for 8-bit lookup
+      const uint32_t M0 = one_byte_bitmask & 0x55555555;
+      const uint32_t M1 = M0 >> 7;
+      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
+      // 4. pack the bytes
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
-    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
-  utf8_to_utf32::validating_transcoder converter;
-  return converter.convert_with_errors(buf, len, utf32_output);
-}
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
+      const uint8_t *row_2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
+                                                                       16)][0];
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
-    const char *input, size_t size, char32_t *utf32_output) const noexcept {
-  return utf8_to_utf32::convert_valid(input, size, utf32_output);
-}
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  std::pair<const char16_t *, char *> ret =
-      arm_convert_utf16_to_latin1<endianness::LITTLE>(buf, len, latin1_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - latin1_output;
+      const __m256i utf8_packed = _mm256_shuffle_epi8(
+          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
+      // 5. store bytes
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_castsi256_si128(utf8_packed));
+      utf8_output += row[0];
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_extractf128_si256(utf8_packed, 1));
+      utf8_output += row_2[0];
 
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf16_to_latin1::convert<endianness::LITTLE>(
-            ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
+      // 6. adjust pointers
+      buf += 16;
+      continue;
     }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
+    // Must check for overflow in packing
+    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(
+        _mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+    if (saturation_bitmask == 0xffffffff) {
+      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+      const __m256i v_d800 = _mm256_set1_epi16((uint16_t)0xd800);
+      forbidden_bytemask = _mm256_or_si256(
+          forbidden_bytemask,
+          _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800));
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  std::pair<const char16_t *, char *> ret =
-      arm_convert_utf16_to_latin1<endianness::BIG>(buf, len, latin1_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - latin1_output;
+      const __m256i dup_even = _mm256_setr_epi16(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf16_to_latin1::convert<endianness::BIG>(
-            ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
+      /* In this branch we handle three cases:
+        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+        single UTF-8 byte
+        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
+        UTF-8 bytes
+        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+        three UTF-8 bytes
 
-simdutf_warn_unused result
-implementation::convert_utf16le_to_latin1_with_errors(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  std::pair<result, char *> ret =
-      arm_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
-          buf, len, latin1_output);
-  if (ret.first.error) {
-    return ret.first;
-  } // Can return directly since scalar fallback already found correct
-    // ret.first.count
-  if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res =
-        scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(
-            buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
-    } else {
-      ret.second += scalar_res.count;
-    }
-  }
-  ret.first.count =
-      ret.second -
-      latin1_output; // Set count to the number of 8-bit code units written
-  return ret.first;
-}
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
 
-simdutf_warn_unused result
-implementation::convert_utf16be_to_latin1_with_errors(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  std::pair<result, char *> ret =
-      arm_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len,
-                                                               latin1_output);
-  if (ret.first.error) {
-    return ret.first;
-  } // Can return directly since scalar fallback already found correct
-    // ret.first.count
-  if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res =
-        scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(
-            buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
-    } else {
-      ret.second += scalar_res.count;
-    }
-  }
-  ret.first.count =
-      ret.second -
-      latin1_output; // Set count to the number of 8-bit code units written
-  return ret.first;
-}
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  // optimization opportunity: implement a custom function.
-  return convert_utf16be_to_latin1(buf, len, latin1_output);
-}
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  // optimization opportunity: implement a custom function.
-  return convert_utf16le_to_latin1(buf, len, latin1_output);
-}
+        Finally, from these two code units we build a proper UTF-8 sequence,
+        taking into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
+#define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const __m256i t0 = _mm256_shuffle_epi8(in_16, dup_even);
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  std::pair<const char16_t *, char *> ret =
-      arm_convert_utf16_to_utf8<endianness::LITTLE>(buf, len, utf8_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - utf8_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf16_to_utf8::convert<endianness::LITTLE>(
-            ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
+      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
+      const __m256i s0 = _mm256_srli_epi16(in_16, 4);
+      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
+      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
+      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
+      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
+      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
+      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
+                                             simdutf_vec(0b0100000000000000));
+      const __m256i s4 = _mm256_xor_si256(s3, m0);
+#undef simdutf_vec
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  std::pair<const char16_t *, char *> ret =
-      arm_convert_utf16_to_utf8<endianness::BIG>(buf, len, utf8_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - utf8_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf16_to_utf8::convert<endianness::BIG>(
-            ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
+      // 4. expand code units 16-bit => 32-bit
+      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
+      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of
-  // code units written even if finished
-  std::pair<result, char *> ret =
-      arm_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(buf, len,
-                                                                utf8_output);
-  if (ret.first.error) {
-    return ret.first;
-  } // Can return directly since scalar fallback already found correct
-    // ret.first.count
-  if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res =
-        scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
-            buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
-    } else {
-      ret.second += scalar_res.count;
-    }
-  }
-  ret.first.count =
-      ret.second -
-      utf8_output; // Set count to the number of 8-bit code units written
-  return ret.first;
-}
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
+                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
+      // Due to the wider registers, the following path is less likely to be
+      // useful.
+      /*if(mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const __m256i shuffle =
+      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
+      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
+      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
+      _mm256_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
+        continue;
+      }*/
+      const uint8_t mask0 = uint8_t(mask);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of
-  // code units written even if finished
-  std::pair<result, char *> ret =
-      arm_convert_utf16_to_utf8_with_errors<endianness::BIG>(buf, len,
-                                                             utf8_output);
-  if (ret.first.error) {
-    return ret.first;
-  } // Can return directly since scalar fallback already found correct
-    // ret.first.count
-  if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res =
-        scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
-            buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
+
+      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
+      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
+      const __m128i utf8_2 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
+
+      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
+      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
+      const __m128i utf8_3 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
+
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
+      utf8_output += row0[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+      utf8_output += row1[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
+      utf8_output += row2[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
+      utf8_output += row3[0];
+      buf += 16;
     } else {
-      ret.second += scalar_res.count;
+      // case: at least one 32-bit word is larger than 0xFFFF <=> it will
+      // produce four UTF-8 bytes. Let us do a scalar fallback. It may seem
+      // wasteful to use scalar code, but being efficient with SIMD may require
+      // large, non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFFFF80) == 0) { // 1-byte (ASCII)
+          *utf8_output++ = char(word);
+        } else if ((word & 0xFFFFF800) == 0) { // 2-byte
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) { // 3-byte
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr, utf8_output);
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else { // 4-byte
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr, utf8_output);
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        }
+      }
+      buf += k;
     }
+  } // while
+
+  // check for invalid input
+  const __m256i v_10ffff = _mm256_set1_epi32((uint32_t)0x10ffff);
+  if (static_cast<uint32_t>(_mm256_movemask_epi8(_mm256_cmpeq_epi32(
+          _mm256_max_epu32(running_max, v_10ffff), v_10ffff))) != 0xffffffff) {
+    return std::make_pair(nullptr, utf8_output);
   }
-  ret.first.count =
-      ret.second -
-      utf8_output; // Set count to the number of 8-bit code units written
-  return ret.first;
-}
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return convert_utf16le_to_utf8(buf, len, utf8_output);
-}
+  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) {
+    return std::make_pair(nullptr, utf8_output);
+  }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return convert_utf16be_to_utf8(buf, len, utf8_output);
+  return std::make_pair(buf, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
-    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
-  if (simdutf_unlikely(len == 0)) {
-    return 0;
-  }
-  std::pair<const char32_t *, char *> ret =
-      arm_convert_utf32_to_utf8(buf, len, utf8_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - utf8_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_utf8::convert(
-        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
+std::pair<result, char *>
+avx2_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
+                                       char *utf8_output) {
+  const char32_t *end = buf + len;
+  const char32_t *start = buf;
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
-    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
-  if (simdutf_unlikely(len == 0)) {
-    return result(error_code::SUCCESS, 0);
-  }
-  // ret.first.count is always the position in the buffer, not the number of
-  // code units written even if finished
-  std::pair<result, char *> ret =
-      arm_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
-  if (ret.first.count != len) {
-    result scalar_res = scalar::utf32_to_utf8::convert_with_errors(
-        buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
-    } else {
-      ret.second += scalar_res.count;
-    }
-  }
-  ret.first.count =
-      ret.second -
-      utf8_output; // Set count to the number of 8-bit code units written
-  return ret.first;
-}
+  const __m256i v_0000 = _mm256_setzero_si256();
+  const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
+  const __m256i v_ff80 = _mm256_set1_epi16((uint16_t)0xff80);
+  const __m256i v_f800 = _mm256_set1_epi16((uint16_t)0xf800);
+  const __m256i v_c080 = _mm256_set1_epi16((uint16_t)0xc080);
+  const __m256i v_7fffffff = _mm256_set1_epi32((uint32_t)0x7fffffff);
+  const __m256i v_10ffff = _mm256_set1_epi32((uint32_t)0x10ffff);
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  std::pair<const char16_t *, char32_t *> ret =
-      arm_convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - utf32_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf16_to_utf32::convert<endianness::LITTLE>(
-            ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  std::pair<const char16_t *, char32_t *> ret =
-      arm_convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - utf32_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf16_to_utf32::convert<endianness::BIG>(
-            ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
+  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    __m256i nextin = _mm256_loadu_si256((__m256i *)buf + 1);
+    // Check for out-of-range input (code points above 0x10FFFF)
+    const __m256i max_input =
+        _mm256_max_epu32(_mm256_max_epu32(in, nextin), v_10ffff);
+    if (static_cast<uint32_t>(_mm256_movemask_epi8(
+            _mm256_cmpeq_epi32(max_input, v_10ffff))) != 0xffffffff) {
+      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
+                            utf8_output);
     }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of
-  // code units written even if finished
-  std::pair<result, char32_t *> ret =
-      arm_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(buf, len,
-                                                                 utf32_output);
-  if (ret.first.error) {
-    return ret.first;
-  } // Can return directly since scalar fallback already found correct
-    // ret.first.count
-  if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res =
-        scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
-            buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
-    } else {
-      ret.second += scalar_res.count;
-    }
-  }
-  ret.first.count =
-      ret.second -
-      utf32_output; // Set count to the number of 8-bit code units written
-  return ret.first;
-}
+    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
+    // saturation
+    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff),
+                                        _mm256_and_si256(nextin, v_7fffffff));
+    in_16 = _mm256_permute4x64_epi64(in_16, 0b11011000);
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of
-  // code units written even if finished
-  std::pair<result, char32_t *> ret =
-      arm_convert_utf16_to_utf32_with_errors<endianness::BIG>(buf, len,
-                                                              utf32_output);
-  if (ret.first.error) {
-    return ret.first;
-  } // Can return directly since scalar fallback already found correct
-    // ret.first.count
-  if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res =
-        scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
-            buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
-    } else {
-      ret.second += scalar_res.count;
+    // Try to apply UTF-16 => UTF-8 routine on 256 bits
+    // (haswell/avx2_convert_utf16_to_utf8.cpp)
+
+    if (_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path
+      // 1. pack the bytes
+      const __m128i utf8_packed = _mm_packus_epi16(
+          _mm256_castsi256_si128(in_16), _mm256_extractf128_si256(in_16, 1));
+      // 2. store (16 bytes)
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+      // 3. adjust pointers
+      buf += 16;
+      utf8_output += 16;
+      continue; // we are done for this round!
     }
-  }
-  ret.first.count =
-      ret.second -
-      utf32_output; // Set count to the number of 8-bit code units written
-  return ret.first;
-}
+    // no bits set above 7th bit
+    const __m256i one_byte_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
+    const uint32_t one_byte_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
-    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
-  std::pair<const char32_t *, char *> ret =
-      arm_convert_utf32_to_latin1(buf, len, latin1_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - latin1_output;
+    // no bits set above 11th bit
+    const __m256i one_or_two_bytes_bytemask =
+        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
+    const uint32_t one_or_two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
+    if (one_or_two_bytes_bitmask == 0xffffffff) {
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+      // expected output   : [110a|aaaa|10bb|bbbb] x 8
+      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
+      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
 
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert(
-        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
+      // t0 = [000a|aaaa|bbbb|bb00]
+      const __m256i t0 = _mm256_slli_epi16(in_16, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
+      // t2 = [0000|0000|00bb|bbbb]
+      const __m256i t2 = _mm256_and_si256(in_16, v_003f);
+      // t3 = [000a|aaaa|00bb|bbbb]
+      const __m256i t3 = _mm256_or_si256(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      const __m256i t4 = _mm256_or_si256(t3, v_c080);
 
-simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
-    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
-  std::pair<result, char *> ret =
-      arm_convert_utf32_to_latin1_with_errors(buf, len, latin1_output);
-  if (ret.first.error) {
-    return ret.first;
-  } // Can return directly since scalar fallback already found correct
-    // ret.first.count
-  if (ret.first.count != len) { // All good so far, but not finished
-    result scalar_res = scalar::utf32_to_latin1::convert_with_errors(
-        buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
-    } else {
-      ret.second += scalar_res.count;
-    }
-  }
-  ret.first.count =
-      ret.second -
-      latin1_output; // Set count to the number of 8-bit code units written
-  return ret.first;
-}
+      // 2. merge ASCII and 2-byte codewords
+      const __m256i utf8_unpacked =
+          _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
-    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
-  std::pair<const char32_t *, char *> ret =
-      arm_convert_utf32_to_latin1(buf, len, latin1_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - latin1_output;
+      // 3. prepare bitmask for 8-bit lookup
+      const uint32_t M0 = one_byte_bitmask & 0x55555555;
+      const uint32_t M1 = M0 >> 7;
+      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
+      // 4. pack the bytes
 
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert_valid(
-        ret.first, len - (ret.first - buf), ret.second);
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
+      const uint8_t *row =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
+      const uint8_t *row_2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
+                                                                       16)][0];
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
-    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
-  // optimization opportunity: implement a custom function.
-  return convert_utf32_to_utf8(buf, len, utf8_output);
-}
+      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  std::pair<const char32_t *, char16_t *> ret =
-      arm_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - utf16_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf32_to_utf16::convert<endianness::LITTLE>(
-            ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
+      const __m256i utf8_packed = _mm256_shuffle_epi8(
+          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
+      // 5. store bytes
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_castsi256_si128(utf8_packed));
+      utf8_output += row[0];
+      _mm_storeu_si128((__m128i *)utf8_output,
+                       _mm256_extractf128_si256(utf8_packed, 1));
+      utf8_output += row_2[0];
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  std::pair<const char32_t *, char16_t *> ret =
-      arm_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - utf16_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf32_to_utf16::convert<endianness::BIG>(
-            ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
+      // 6. adjust pointers
+      buf += 16;
+      continue;
     }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
+    // Must check for overflow in packing
+    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(
+        _mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+    if (saturation_bitmask == 0xffffffff) {
+      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of
-  // code units written even if finished
-  std::pair<result, char16_t *> ret =
-      arm_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(buf, len,
-                                                                 utf16_output);
-  if (ret.first.count != len) {
-    result scalar_res =
-        scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
-            buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
-    } else {
-      ret.second += scalar_res.count;
-    }
-  }
-  ret.first.count =
-      ret.second -
-      utf16_output; // Set count to the number of 8-bit code units written
-  return ret.first;
-}
+      // Check for illegal surrogate code units
+      const __m256i v_d800 = _mm256_set1_epi16((uint16_t)0xd800);
+      const __m256i forbidden_bytemask =
+          _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800);
+      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) !=
+          0x0) {
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              utf8_output);
+      }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of
-  // code units written even if finished
-  std::pair<result, char16_t *> ret =
-      arm_convert_utf32_to_utf16_with_errors<endianness::BIG>(buf, len,
-                                                              utf16_output);
-  if (ret.first.count != len) {
-    result scalar_res =
-        scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
-            buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
-    } else {
-      ret.second += scalar_res.count;
-    }
-  }
-  ret.first.count =
-      ret.second -
-      utf16_output; // Set count to the number of 8-bit code units written
-  return ret.first;
-}
+      const __m256i dup_even = _mm256_setr_epi16(
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
+          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return convert_utf32_to_utf16le(buf, len, utf16_output);
-}
+      /* In this branch we handle three cases:
+        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]
+           (a single UTF-8 byte)
+        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]
+           (two UTF-8 bytes)
+        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc]
+           (three UTF-8 bytes)
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return convert_utf32_to_utf16be(buf, len, utf16_output);
-}
+        We expand the input word (16-bit) into two code units (32-bit), thus
+        we have room for four bytes. However, we need five distinct bit
+        layouts. Note that the last byte in cases #2 and #3 is the same.
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return convert_utf16le_to_utf32(buf, len, utf32_output);
-}
+        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+        in register t2.
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return convert_utf16be_to_utf32(buf, len, utf32_output);
-}
+        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+        either byte 1 for case #2 or byte 2 for case #3. Note that they
+        differ by exactly one bit.
 
-void implementation::change_endianness_utf16(const char16_t *input,
-                                             size_t length,
-                                             char16_t *output) const noexcept {
-  utf16::change_endianness_utf16(input, length, output);
-}
+        Finally, from these two code units we build the proper UTF-8 sequence,
+        taking into account the case (i.e., the number of bytes to write).
+      */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
+#define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      const __m256i t0 = _mm256_shuffle_epi8(in_16, dup_even);
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
 
-simdutf_warn_unused size_t implementation::count_utf16le(
-    const char16_t *input, size_t length) const noexcept {
-  return utf16::count_code_points<endianness::LITTLE>(input, length);
-}
+      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
+      const __m256i s0 = _mm256_srli_epi16(in_16, 4);
+      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
+      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
+      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
+      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
+      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
+      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
+                                             simdutf_vec(0b0100000000000000));
+      const __m256i s4 = _mm256_xor_si256(s3, m0);
+#undef simdutf_vec
+
+      // 4. expand code units 16-bit => 32-bit
+      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
+      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
+
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
+                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
+      // Due to the wider registers, the following path is less likely to be
+      // useful.
+      /*if(mask == 0) {
+        // We only have three-byte code units. Use fast path.
+        const __m256i shuffle =
+      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
+      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
+      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
+      _mm256_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
+        utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
+        _mm_storeu_si128((__m128i*)utf8_output,
+      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
+        continue;
+      }*/
+      const uint8_t mask0 = uint8_t(mask);
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
+      const __m128i utf8_0 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
 
-simdutf_warn_unused size_t implementation::count_utf16be(
-    const char16_t *input, size_t length) const noexcept {
-  return utf16::count_code_points<endianness::BIG>(input, length);
-}
+      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
+      const __m128i utf8_1 =
+          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
 
-simdutf_warn_unused size_t
-implementation::count_utf8(const char *input, size_t length) const noexcept {
-  return utf8::count_code_points(input, length);
-}
+      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
+      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
+      const __m128i utf8_2 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
-    const char *buf, size_t len) const noexcept {
-  return count_utf8(buf, len);
-}
+      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
+      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
+      const __m128i utf8_3 =
+          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
 
-simdutf_warn_unused size_t
-implementation::latin1_length_from_utf16(size_t length) const noexcept {
-  return scalar::utf16::latin1_length_from_utf16(length);
-}
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
+      utf8_output += row0[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+      utf8_output += row1[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
+      utf8_output += row2[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
+      utf8_output += row3[0];
+      buf += 16;
+    } else {
+      // case: at least one 32-bit word is larger than 0xFFFF <=> it will
+      // produce four UTF-8 bytes. Let us do a scalar fallback. It may seem
+      // wasteful to use scalar code, but being efficient with SIMD may require
+      // large, non-trivial tables?
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFFFF80) == 0) { // 1-byte (ASCII)
+          *utf8_output++ = char(word);
+        } else if ((word & 0xFFFFF800) == 0) { // 2-byte
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) { // 3-byte
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k), utf8_output);
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else { // 4-byte
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k), utf8_output);
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        }
+      }
+      buf += k;
+    }
+  } // while
 
-simdutf_warn_unused size_t
-implementation::latin1_length_from_utf32(size_t length) const noexcept {
-  return scalar::utf32::latin1_length_from_utf32(length);
+  return std::make_pair(result(error_code::SUCCESS, buf - start), utf8_output);
 }
+/* end file src/haswell/avx2_convert_utf32_to_utf8.cpp */
+/* begin file src/haswell/avx2_convert_utf32_to_utf16.cpp */
+template <endianness big_endian>
+std::pair<const char32_t *, char16_t *>
+avx2_convert_utf32_to_utf16(const char32_t *buf, size_t len,
+                            char16_t *utf16_output) {
+  const char32_t *end = buf + len;
 
-simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
-    const char *input, size_t length) const noexcept {
-  // See
-  // https://lemire.me/blog/2023/05/15/computing-the-utf-8-size-of-a-latin-1-string-quickly-arm-neon-edition/
-  // credit to Pete Cawley
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(input);
-  uint64_t result = 0;
-  const int lanes = sizeof(uint8x16_t);
-  uint8_t rem = length % lanes;
-  const uint8_t *simd_end = data + (length / lanes) * lanes;
-  const uint8x16_t threshold = vdupq_n_u8(0x80);
-  for (; data < simd_end; data += lanes) {
-    // load 16 bytes
-    uint8x16_t input_vec = vld1q_u8(data);
-    // compare to threshold (0x80)
-    uint8x16_t withhighbit = vcgeq_u8(input_vec, threshold);
-    // vertical addition
-    result -= vaddvq_s8(vreinterpretq_s8_u8(withhighbit));
-  }
-  return result + (length / lanes) * lanes +
-         scalar::latin1::utf8_length_from_latin1((const char *)simd_end, rem);
-}
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+  __m256i forbidden_bytemask = _mm256_setzero_si256();
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
-    const char16_t *input, size_t length) const noexcept {
-  return utf16::utf8_length_from_utf16<endianness::LITTLE>(input, length);
-}
+  while (end - buf >= std::ptrdiff_t(8 + safety_margin)) {
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
-    const char16_t *input, size_t length) const noexcept {
-  return utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
-}
+    const __m256i v_00000000 = _mm256_setzero_si256();
+    const __m256i v_ffff0000 = _mm256_set1_epi32((int32_t)0xffff0000);
 
-simdutf_warn_unused size_t
-implementation::utf16_length_from_latin1(size_t length) const noexcept {
-  return scalar::latin1::utf16_length_from_latin1(length);
-}
+    // no bits set above 16th bit <=> can pack to UTF16 without surrogate pairs
+    const __m256i saturation_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
 
-simdutf_warn_unused size_t
-implementation::utf32_length_from_latin1(size_t length) const noexcept {
-  return scalar::latin1::utf32_length_from_latin1(length);
-}
+    if (saturation_bitmask == 0xffffffff) {
+      const __m256i v_f800 = _mm256_set1_epi32((uint32_t)0xf800);
+      const __m256i v_d800 = _mm256_set1_epi32((uint32_t)0xd800);
+      forbidden_bytemask = _mm256_or_si256(
+          forbidden_bytemask,
+          _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800));
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
-    const char16_t *input, size_t length) const noexcept {
-  return utf16::utf32_length_from_utf16<endianness::LITTLE>(input, length);
-}
+      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),
+                                              _mm256_extractf128_si256(in, 1));
+      if (big_endian) {
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
+      }
+      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
+      utf16_output += 8;
+      buf += 8;
+    } else {
+      size_t forward = 7;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFF0000) == 0) {
+          // will not generate a surrogate pair
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr, utf16_output);
+          }
+          *utf16_output++ =
+              big_endian
+                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
+                  : char16_t(word);
+        } else {
+          // will generate a surrogate pair
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr, utf16_output);
+          }
+          word -= 0x10000;
+          uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
+          uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
+          if (big_endian) {
+            high_surrogate =
+                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
+            low_surrogate =
+                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
+          }
+          *utf16_output++ = char16_t(high_surrogate);
+          *utf16_output++ = char16_t(low_surrogate);
+        }
+      }
+      buf += k;
+    }
+  }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
-    const char16_t *input, size_t length) const noexcept {
-  return utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
-}
+  // check for invalid input
+  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) {
+    return std::make_pair(nullptr, utf16_output);
+  }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
-    const char *input, size_t length) const noexcept {
-  return utf8::utf16_length_from_utf8(input, length);
+  return std::make_pair(buf, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
-    const char32_t *input, size_t length) const noexcept {
-  const uint32x4_t v_7f = vmovq_n_u32((uint32_t)0x7f);
-  const uint32x4_t v_7ff = vmovq_n_u32((uint32_t)0x7ff);
-  const uint32x4_t v_ffff = vmovq_n_u32((uint32_t)0xffff);
-  const uint32x4_t v_1 = vmovq_n_u32((uint32_t)0x1);
-  size_t pos = 0;
-  size_t count = 0;
-  for (; pos + 4 <= length; pos += 4) {
-    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(input + pos));
-    const uint32x4_t ascii_bytes_bytemask = vcleq_u32(in, v_7f);
-    const uint32x4_t one_two_bytes_bytemask = vcleq_u32(in, v_7ff);
-    const uint32x4_t two_bytes_bytemask =
-        veorq_u32(one_two_bytes_bytemask, ascii_bytes_bytemask);
-    const uint32x4_t three_bytes_bytemask =
-        veorq_u32(vcleq_u32(in, v_ffff), one_two_bytes_bytemask);
+template <endianness big_endian>
+std::pair<result, char16_t *>
+avx2_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
+                                        char16_t *utf16_output) {
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
 
-    const uint16x8_t reduced_ascii_bytes_bytemask =
-        vreinterpretq_u16_u32(vandq_u32(ascii_bytes_bytemask, v_1));
-    const uint16x8_t reduced_two_bytes_bytemask =
-        vreinterpretq_u16_u32(vandq_u32(two_bytes_bytemask, v_1));
-    const uint16x8_t reduced_three_bytes_bytemask =
-        vreinterpretq_u16_u32(vandq_u32(three_bytes_bytemask, v_1));
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
-    const uint16x8_t compressed_bytemask0 =
-        vpaddq_u16(reduced_ascii_bytes_bytemask, reduced_two_bytes_bytemask);
-    const uint16x8_t compressed_bytemask1 =
-        vpaddq_u16(reduced_three_bytes_bytemask, reduced_three_bytes_bytemask);
+  while (end - buf >= std::ptrdiff_t(8 + safety_margin)) {
+    __m256i in = _mm256_loadu_si256((__m256i *)buf);
 
-    size_t ascii_count = count_ones(
-        vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask0), 0));
-    size_t two_bytes_count = count_ones(
-        vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask0), 1));
-    size_t three_bytes_count = count_ones(
-        vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask1), 0));
+    const __m256i v_00000000 = _mm256_setzero_si256();
+    const __m256i v_ffff0000 = _mm256_set1_epi32((int32_t)0xffff0000);
 
-    count += 16 - 3 * ascii_count - 2 * two_bytes_count - three_bytes_count;
-  }
-  return count +
-         scalar::utf32::utf8_length_from_utf32(input + pos, length - pos);
-}
+    // no bits set above 16th bit <=> can pack to UTF16 without surrogate pairs
+    const __m256i saturation_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+    const uint32_t saturation_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
-    const char32_t *input, size_t length) const noexcept {
-  const uint32x4_t v_ffff = vmovq_n_u32((uint32_t)0xffff);
-  const uint32x4_t v_1 = vmovq_n_u32((uint32_t)0x1);
-  size_t pos = 0;
-  size_t count = 0;
-  for (; pos + 4 <= length; pos += 4) {
-    uint32x4_t in = vld1q_u32(reinterpret_cast<const uint32_t *>(input + pos));
-    const uint32x4_t surrogate_bytemask = vcgtq_u32(in, v_ffff);
-    const uint16x8_t reduced_bytemask =
-        vreinterpretq_u16_u32(vandq_u32(surrogate_bytemask, v_1));
-    const uint16x8_t compressed_bytemask =
-        vpaddq_u16(reduced_bytemask, reduced_bytemask);
-    size_t surrogate_count = count_ones(
-        vgetq_lane_u64(vreinterpretq_u64_u16(compressed_bytemask), 0));
-    count += 4 + surrogate_count;
+    if (saturation_bitmask == 0xffffffff) {
+      const __m256i v_f800 = _mm256_set1_epi32((uint32_t)0xf800);
+      const __m256i v_d800 = _mm256_set1_epi32((uint32_t)0xd800);
+      const __m256i forbidden_bytemask =
+          _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800);
+      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) !=
+          0x0) {
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              utf16_output);
+      }
+
+      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),
+                                              _mm256_extractf128_si256(in, 1));
+      if (big_endian) {
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
+      }
+      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
+      utf16_output += 8;
+      buf += 8;
+    } else {
+      size_t forward = 7;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFF0000) == 0) {
+          // will not generate a surrogate pair
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k), utf16_output);
+          }
+          *utf16_output++ =
+              big_endian
+                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
+                  : char16_t(word);
+        } else {
+          // will generate a surrogate pair
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k), utf16_output);
+          }
+          word -= 0x10000;
+          uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
+          uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
+          if (big_endian) {
+            high_surrogate =
+                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
+            low_surrogate =
+                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
+          }
+          *utf16_output++ = char16_t(high_surrogate);
+          *utf16_output++ = char16_t(low_surrogate);
+        }
+      }
+      buf += k;
+    }
   }
-  return count +
-         scalar::utf32::utf16_length_from_utf32(input + pos, length - pos);
-}
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
-    const char *input, size_t length) const noexcept {
-  return utf8::count_code_points(input, length);
+  return std::make_pair(result(error_code::SUCCESS, buf - start), utf16_output);
 }
+/* end file src/haswell/avx2_convert_utf32_to_utf16.cpp */
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
-    const char *input, size_t length) const noexcept {
-  return scalar::base64::maximal_binary_length_from_base64(input, length);
-}
+/* begin file src/haswell/avx2_convert_utf8_to_latin1.cpp */
+// depends on "tables/utf8_to_utf16_tables.h"
 
-simdutf_warn_unused result implementation::base64_to_binary(
-    const char *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_options) const noexcept {
-  return (options & base64_url)
-             ? compress_decode_base64<true>(output, input, length, options,
-                                            last_chunk_options)
-             : compress_decode_base64<false>(output, input, length, options,
-                                             last_chunk_options);
-}
+// Convert up to 12 bytes from utf8 to latin1 using a mask indicating the
+// end of the code points. Only the least significant 12 bits of the mask
+// are accessed.
+// It returns how many bytes were consumed (up to 12).
+size_t convert_masked_utf8_to_latin1(const char *input,
+                                     uint64_t utf8_end_of_code_point_mask,
+                                     char *&latin1_output) {
+  // we use an approach where we try to process up to 12 input bytes.
+  // Why 12 input bytes and not 16? Because we are concerned with the size of
+  // the lookup tables. Also 12 is nicely divisible by two and three.
+  //
+  //
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // may be beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
+  //
+  const __m128i in = _mm_loadu_si128((__m128i *)input);
 
-simdutf_warn_unused full_result implementation::base64_to_binary_details(
-    const char *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_options) const noexcept {
-  return (options & base64_url)
-             ? compress_decode_base64<true>(output, input, length, options,
-                                            last_chunk_options)
-             : compress_decode_base64<false>(output, input, length, options,
-                                             last_chunk_options);
-}
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask &
+      0xfff; // we are only processing 12 bytes in case it is not all ASCII
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::base64::maximal_binary_length_from_base64(input, length);
+  if (utf8_end_of_code_point_mask == 0xfff) {
+    // We process the data in chunks of 12 bytes.
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(latin1_output), in);
+    latin1_output += 12; // We wrote 12 characters.
+    return 12;           // We consumed 12 bytes.
+  }
+  /// We do not have a fast path available, so we fall back.
+  const uint8_t idx =
+      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][0];
+  const uint8_t consumed =
+      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
+  // this indicates an invalid input:
+  if (idx >= 64) {
+    return consumed;
+  }
+  // Here we should have (idx < 64); if not, there is a bug in the validation
+  // or elsewhere. This is a relatively easy scenario: we process SIX (6)
+  // input code units. The maximum length in bytes of six code units
+  // spanning between 1 and 2 bytes each is 12 bytes.
+  // On processors where pdep/pext is fast, we might be able to use a small
+  // lookup table.
+  const __m128i sh =
+      _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
+  const __m128i perm = _mm_shuffle_epi8(in, sh);
+  const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
+  const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
+  __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
+  const __m128i latin1_packed = _mm_packus_epi16(composed, composed);
+  // writing 8 bytes even though we only care about the first 6 bytes.
+  // performance note: it would be faster to use _mm_storeu_si128, we should
+  // investigate.
+  _mm_storel_epi64((__m128i *)latin1_output, latin1_packed);
+  latin1_output += 6; // We wrote 6 bytes.
+  return consumed;
 }
+/* end file src/haswell/avx2_convert_utf8_to_latin1.cpp */
 
-simdutf_warn_unused result implementation::base64_to_binary(
-    const char16_t *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_options) const noexcept {
-  return (options & base64_url)
-             ? compress_decode_base64<true>(output, input, length, options,
-                                            last_chunk_options)
-             : compress_decode_base64<false>(output, input, length, options,
-                                             last_chunk_options);
-}
+/* begin file src/haswell/avx2_base64.cpp */
+/**
+ * References and further reading:
+ *
+ * Wojciech Muła, Daniel Lemire, Base64 encoding and decoding at almost the
+ * speed of a memory copy, Software: Practice and Experience 50 (2), 2020.
+ * https://arxiv.org/abs/1910.05109
+ *
+ * Wojciech Muła, Daniel Lemire, Faster Base64 Encoding and Decoding using AVX2
+ * Instructions, ACM Transactions on the Web 12 (3), 2018.
+ * https://arxiv.org/abs/1704.00605
+ *
+ * Simon Josefsson. 2006. The Base16, Base32, and Base64 Data Encodings.
+ * https://tools.ietf.org/html/rfc4648. (2006). Internet Engineering Task Force,
+ * Request for Comments: 4648.
+ *
+ * Alfred Klomp. 2014a. Fast Base64 encoding/decoding with SSE vectorization.
+ * http://www.alfredklomp.com/programming/sse-base64/. (2014).
+ *
+ * Alfred Klomp. 2014b. Fast Base64 stream encoder/decoder in C99, with SIMD
+ * acceleration. https://github.com/aklomp/base64. (2014).
+ *
+ * Hanson Char. 2014. A Fast and Correct Base 64 Codec. (2014).
+ * https://aws.amazon.com/blogs/developer/a-fast-and-correct-base-64-codec/
+ *
+ * Nick Kopp. 2013. Base64 Encoding on a GPU.
+ * https://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU. (2013).
+ */
 
-simdutf_warn_unused full_result implementation::base64_to_binary_details(
-    const char16_t *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_options) const noexcept {
-  return (options & base64_url)
-             ? compress_decode_base64<true>(output, input, length, options,
-                                            last_chunk_options)
-             : compress_decode_base64<false>(output, input, length, options,
-                                             last_chunk_options);
-}
+template <bool base64_url>
+simdutf_really_inline __m256i lookup_pshufb_improved(const __m256i input) {
+  // credit: Wojciech Muła
+  __m256i result = _mm256_subs_epu8(input, _mm256_set1_epi8(51));
+  const __m256i less = _mm256_cmpgt_epi8(_mm256_set1_epi8(26), input);
+  result =
+      _mm256_or_si256(result, _mm256_and_si256(less, _mm256_set1_epi8(13)));
+  __m256i shift_LUT;
+  if (base64_url) {
+    shift_LUT = _mm256_setr_epi8(
+        'a' - 26, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
+        '0' - 52, '0' - 52, '0' - 52, '0' - 52, '-' - 62, '_' - 63, 'A', 0, 0,
 
-simdutf_warn_unused size_t implementation::base64_length_from_binary(
-    size_t length, base64_options options) const noexcept {
-  return scalar::base64::base64_length_from_binary(length, options);
-}
+        'a' - 26, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
+        '0' - 52, '0' - 52, '0' - 52, '0' - 52, '-' - 62, '_' - 63, 'A', 0, 0);
+  } else {
+    shift_LUT = _mm256_setr_epi8(
+        'a' - 26, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
+        '0' - 52, '0' - 52, '0' - 52, '0' - 52, '+' - 62, '/' - 63, 'A', 0, 0,
 
-size_t implementation::binary_to_base64(const char *input, size_t length,
-                                        char *output,
-                                        base64_options options) const noexcept {
-  return encode_base64(output, input, length, options);
+        'a' - 26, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
+        '0' - 52, '0' - 52, '0' - 52, '0' - 52, '+' - 62, '/' - 63, 'A', 0, 0);
+  }
+
+  result = _mm256_shuffle_epi8(shift_LUT, result);
+  return _mm256_add_epi8(result, input);
 }
 
-} // namespace arm64
-} // namespace simdutf
+template <bool isbase64url>
+size_t encode_base64(char *dst, const char *src, size_t srclen,
+                     base64_options options) {
+  // credit: Wojciech Muła
+  const uint8_t *input = (const uint8_t *)src;
 
-/* begin file src/simdutf/arm64/end.h */
-/* end file src/simdutf/arm64/end.h */
-/* end file src/arm64/implementation.cpp */
-#endif
-#if SIMDUTF_IMPLEMENTATION_FALLBACK
-/* begin file src/fallback/implementation.cpp */
-/* begin file src/simdutf/fallback/begin.h */
-// redefining SIMDUTF_IMPLEMENTATION to "fallback"
-// #define SIMDUTF_IMPLEMENTATION fallback
-/* end file src/simdutf/fallback/begin.h */
+  uint8_t *out = (uint8_t *)dst;
+  const __m256i shuf =
+      _mm256_set_epi8(10, 11, 9, 10, 7, 8, 6, 7, 4, 5, 3, 4, 1, 2, 0, 1,
 
+                      10, 11, 9, 10, 7, 8, 6, 7, 4, 5, 3, 4, 1, 2, 0, 1);
+  size_t i = 0;
+  for (; i + 100 <= srclen; i += 96) {
+    const __m128i lo0 = _mm_loadu_si128(
+        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 0));
+    const __m128i hi0 = _mm_loadu_si128(
+        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 1));
+    const __m128i lo1 = _mm_loadu_si128(
+        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 2));
+    const __m128i hi1 = _mm_loadu_si128(
+        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 3));
+    const __m128i lo2 = _mm_loadu_si128(
+        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 4));
+    const __m128i hi2 = _mm_loadu_si128(
+        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 5));
+    const __m128i lo3 = _mm_loadu_si128(
+        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 6));
+    const __m128i hi3 = _mm_loadu_si128(
+        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 7));
 
+    __m256i in0 = _mm256_shuffle_epi8(_mm256_set_m128i(hi0, lo0), shuf);
+    __m256i in1 = _mm256_shuffle_epi8(_mm256_set_m128i(hi1, lo1), shuf);
+    __m256i in2 = _mm256_shuffle_epi8(_mm256_set_m128i(hi2, lo2), shuf);
+    __m256i in3 = _mm256_shuffle_epi8(_mm256_set_m128i(hi3, lo3), shuf);
 
+    const __m256i t0_0 = _mm256_and_si256(in0, _mm256_set1_epi32(0x0fc0fc00));
+    const __m256i t0_1 = _mm256_and_si256(in1, _mm256_set1_epi32(0x0fc0fc00));
+    const __m256i t0_2 = _mm256_and_si256(in2, _mm256_set1_epi32(0x0fc0fc00));
+    const __m256i t0_3 = _mm256_and_si256(in3, _mm256_set1_epi32(0x0fc0fc00));
 
+    const __m256i t1_0 =
+        _mm256_mulhi_epu16(t0_0, _mm256_set1_epi32(0x04000040));
+    const __m256i t1_1 =
+        _mm256_mulhi_epu16(t0_1, _mm256_set1_epi32(0x04000040));
+    const __m256i t1_2 =
+        _mm256_mulhi_epu16(t0_2, _mm256_set1_epi32(0x04000040));
+    const __m256i t1_3 =
+        _mm256_mulhi_epu16(t0_3, _mm256_set1_epi32(0x04000040));
 
+    const __m256i t2_0 = _mm256_and_si256(in0, _mm256_set1_epi32(0x003f03f0));
+    const __m256i t2_1 = _mm256_and_si256(in1, _mm256_set1_epi32(0x003f03f0));
+    const __m256i t2_2 = _mm256_and_si256(in2, _mm256_set1_epi32(0x003f03f0));
+    const __m256i t2_3 = _mm256_and_si256(in3, _mm256_set1_epi32(0x003f03f0));
 
+    const __m256i t3_0 =
+        _mm256_mullo_epi16(t2_0, _mm256_set1_epi32(0x01000010));
+    const __m256i t3_1 =
+        _mm256_mullo_epi16(t2_1, _mm256_set1_epi32(0x01000010));
+    const __m256i t3_2 =
+        _mm256_mullo_epi16(t2_2, _mm256_set1_epi32(0x01000010));
+    const __m256i t3_3 =
+        _mm256_mullo_epi16(t2_3, _mm256_set1_epi32(0x01000010));
 
+    const __m256i input0 = _mm256_or_si256(t1_0, t3_0);
+    const __m256i input1 = _mm256_or_si256(t1_1, t3_1);
+    const __m256i input2 = _mm256_or_si256(t1_2, t3_2);
+    const __m256i input3 = _mm256_or_si256(t1_3, t3_3);
 
-#include <cstdint>
-#include <cstring>
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(out),
+                        lookup_pshufb_improved<isbase64url>(input0));
+    out += 32;
 
-namespace simdutf {
-namespace fallback {
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(out),
+                        lookup_pshufb_improved<isbase64url>(input1));
+    out += 32;
 
-simdutf_warn_unused int
-implementation::detect_encodings(const char *input,
-                                 size_t length) const noexcept {
-  // If there is a BOM, then we trust it.
-  auto bom_encoding = simdutf::BOM::check_bom(input, length);
-  if (bom_encoding != encoding_type::unspecified) {
-    return bom_encoding;
-  }
-  // todo: reimplement as a one-pass algorithm.
-  int out = 0;
-  if (validate_utf8(input, length)) {
-    out |= encoding_type::UTF8;
-  }
-  if ((length % 2) == 0) {
-    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
-                         length / 2)) {
-      out |= encoding_type::UTF16_LE;
-    }
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(out),
+                        lookup_pshufb_improved<isbase64url>(input2));
+    out += 32;
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(out),
+                        lookup_pshufb_improved<isbase64url>(input3));
+    out += 32;
   }
-  if ((length % 4) == 0) {
-    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
-      out |= encoding_type::UTF32_LE;
-    }
+  for (; i + 28 <= srclen; i += 24) {
+    // lo = [xxxx|DDDC|CCBB|BAAA]
+    // hi = [xxxx|HHHG|GGFF|FEEE]
+    const __m128i lo =
+        _mm_loadu_si128(reinterpret_cast<const __m128i *>(input + i));
+    const __m128i hi =
+        _mm_loadu_si128(reinterpret_cast<const __m128i *>(input + i + 4 * 3));
+
+    // bytes from groups A, B and C are needed in separate 32-bit lanes
+    // in = [0HHH|0GGG|0FFF|0EEE|0DDD|0CCC|0BBB|0AAA]
+    __m256i in = _mm256_shuffle_epi8(_mm256_set_m128i(hi, lo), shuf);
+
+    // this part is well commented in encode.sse.cpp
+
+    const __m256i t0 = _mm256_and_si256(in, _mm256_set1_epi32(0x0fc0fc00));
+    const __m256i t1 = _mm256_mulhi_epu16(t0, _mm256_set1_epi32(0x04000040));
+    const __m256i t2 = _mm256_and_si256(in, _mm256_set1_epi32(0x003f03f0));
+    const __m256i t3 = _mm256_mullo_epi16(t2, _mm256_set1_epi32(0x01000010));
+    const __m256i indices = _mm256_or_si256(t1, t3);
+
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(out),
+                        lookup_pshufb_improved<isbase64url>(indices));
+    out += 32;
   }
-  return out;
+  return i / 3 * 4 + scalar::base64::tail_encode_base64((char *)out, src + i,
+                                                        srclen - i, options);
 }
 
-simdutf_warn_unused bool
-implementation::validate_utf8(const char *buf, size_t len) const noexcept {
-  return scalar::utf8::validate(buf, len);
-}
+static inline void compress(__m128i data, uint16_t mask, char *output) {
+  if (mask == 0) {
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(output), data);
+    return;
+  }
+  // this particular implementation was inspired by work done by @animetosho
+  // we do it in two steps, first 8 bytes and then second 8 bytes
+  uint8_t mask1 = uint8_t(mask);      // least significant 8 bits
+  uint8_t mask2 = uint8_t(mask >> 8); // most significant 8 bits
+  // next line just loads the 64-bit values thintable_epi8[mask1] and
+  // thintable_epi8[mask2] into a 128-bit register, using only
+  // two instructions on most compilers.
 
-simdutf_warn_unused result implementation::validate_utf8_with_errors(
-    const char *buf, size_t len) const noexcept {
-  return scalar::utf8::validate_with_errors(buf, len);
-}
+  __m128i shufmask = _mm_set_epi64x(tables::base64::thintable_epi8[mask2],
+                                    tables::base64::thintable_epi8[mask1]);
+  // we increment by 0x08 the second half of the mask
+  shufmask =
+      _mm_add_epi8(shufmask, _mm_set_epi32(0x08080808, 0x08080808, 0, 0));
+  // this is the "nearly pruned" version
+  __m128i pruned = _mm_shuffle_epi8(data, shufmask);
+  // we still need to put the two halves together.
+  // we compute the popcount of the first half:
+  int pop1 = tables::base64::BitsSetTable256mul2[mask1];
+  // then load the corresponding mask, what it does is to write
+  // only the first pop1 bytes from the first 8 bytes, and then
+  // it fills in with the bytes from the second 8 bytes + some filling
+  // at the end.
+  __m128i compactmask = _mm_loadu_si128(reinterpret_cast<const __m128i *>(
+      tables::base64::pshufb_combine_table + pop1 * 8));
+  __m128i answer = _mm_shuffle_epi8(pruned, compactmask);
 
-simdutf_warn_unused bool
-implementation::validate_ascii(const char *buf, size_t len) const noexcept {
-  return scalar::ascii::validate(buf, len);
+  _mm_storeu_si128(reinterpret_cast<__m128i *>(output), answer);
 }
 
-simdutf_warn_unused result implementation::validate_ascii_with_errors(
-    const char *buf, size_t len) const noexcept {
-  return scalar::ascii::validate_with_errors(buf, len);
+static inline void compress(__m256i data, uint32_t mask, char *output) {
+  if (mask == 0) {
+    _mm256_storeu_si256(reinterpret_cast<__m256i *>(output), data);
+    return;
+  }
+  compress(_mm256_castsi256_si128(data), uint16_t(mask), output);
+  compress(_mm256_extracti128_si256(data, 1), uint16_t(mask >> 16),
+           output + _mm_popcnt_u32(~mask & 0xFFFF));
 }
 
-simdutf_warn_unused bool
-implementation::validate_utf16le(const char16_t *buf,
-                                 size_t len) const noexcept {
-  return scalar::utf16::validate<endianness::LITTLE>(buf, len);
-}
+struct block64 {
+  __m256i chunks[2];
+};
 
-simdutf_warn_unused bool
-implementation::validate_utf16be(const char16_t *buf,
-                                 size_t len) const noexcept {
-  return scalar::utf16::validate<endianness::BIG>(buf, len);
-}
+template <bool base64_url>
+static inline uint32_t to_base64_mask(__m256i *src, uint32_t *error) {
+  const __m256i ascii_space_tbl =
+      _mm256_setr_epi8(0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x9, 0xa,
+                       0x0, 0xc, 0xd, 0x0, 0x0, 0x20, 0x0, 0x0, 0x0, 0x0, 0x0,
+                       0x0, 0x0, 0x0, 0x9, 0xa, 0x0, 0xc, 0xd, 0x0, 0x0);
+  // credit: aqrit
+  __m256i delta_asso;
+  if (base64_url) {
+    delta_asso =
+        _mm256_setr_epi8(0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x0, 0x0, 0x0,
+                         0x0, 0x0, 0xF, 0x0, 0xF, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1,
+                         0x1, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0xF, 0x0, 0xF);
+  } else {
+    delta_asso = _mm256_setr_epi8(
+        0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x0F, 0x00, 0x0F, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x0F, 0x00, 0x0F);
+  }
 
-simdutf_warn_unused result implementation::validate_utf16le_with_errors(
-    const char16_t *buf, size_t len) const noexcept {
-  return scalar::utf16::validate_with_errors<endianness::LITTLE>(buf, len);
-}
+  __m256i delta_values;
+  if (base64_url) {
+    delta_values = _mm256_setr_epi8(
+        0x0, 0x0, 0x0, 0x13, 0x4, uint8_t(0xBF), uint8_t(0xBF), uint8_t(0xB9),
+        uint8_t(0xB9), 0x0, 0x11, uint8_t(0xC3), uint8_t(0xBF), uint8_t(0xE0),
+        uint8_t(0xB9), uint8_t(0xB9), 0x0, 0x0, 0x0, 0x13, 0x4, uint8_t(0xBF),
+        uint8_t(0xBF), uint8_t(0xB9), uint8_t(0xB9), 0x0, 0x11, uint8_t(0xC3),
+        uint8_t(0xBF), uint8_t(0xE0), uint8_t(0xB9), uint8_t(0xB9));
+  } else {
+    delta_values = _mm256_setr_epi8(
+        int8_t(0x00), int8_t(0x00), int8_t(0x00), int8_t(0x13), int8_t(0x04),
+        int8_t(0xBF), int8_t(0xBF), int8_t(0xB9), int8_t(0xB9), int8_t(0x00),
+        int8_t(0x10), int8_t(0xC3), int8_t(0xBF), int8_t(0xBF), int8_t(0xB9),
+        int8_t(0xB9), int8_t(0x00), int8_t(0x00), int8_t(0x00), int8_t(0x13),
+        int8_t(0x04), int8_t(0xBF), int8_t(0xBF), int8_t(0xB9), int8_t(0xB9),
+        int8_t(0x00), int8_t(0x10), int8_t(0xC3), int8_t(0xBF), int8_t(0xBF),
+        int8_t(0xB9), int8_t(0xB9));
+  }
+  __m256i check_asso;
 
-simdutf_warn_unused result implementation::validate_utf16be_with_errors(
-    const char16_t *buf, size_t len) const noexcept {
-  return scalar::utf16::validate_with_errors<endianness::BIG>(buf, len);
-}
+  if (base64_url) {
+    check_asso =
+        _mm256_setr_epi8(0xD, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x3,
+                         0x7, 0xB, 0xE, 0xB, 0x6, 0xD, 0x1, 0x1, 0x1, 0x1, 0x1,
+                         0x1, 0x1, 0x1, 0x1, 0x3, 0x7, 0xB, 0xE, 0xB, 0x6);
+  } else {
 
-simdutf_warn_unused bool
-implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
-  return scalar::utf32::validate(buf, len);
+    check_asso = _mm256_setr_epi8(
+        0x0D, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x03, 0x07,
+        0x0B, 0x0B, 0x0B, 0x0F, 0x0D, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
+        0x01, 0x01, 0x03, 0x07, 0x0B, 0x0B, 0x0B, 0x0F);
+  }
+  __m256i check_values;
+  if (base64_url) {
+    check_values = _mm256_setr_epi8(
+        uint8_t(0x80), uint8_t(0x80), uint8_t(0x80), uint8_t(0x80),
+        uint8_t(0xCF), uint8_t(0xBF), uint8_t(0xB6), uint8_t(0xA6),
+        uint8_t(0xB5), uint8_t(0xA1), 0x0, uint8_t(0x80), 0x0, uint8_t(0x80),
+        0x0, uint8_t(0x80), uint8_t(0x80), uint8_t(0x80), uint8_t(0x80),
+        uint8_t(0x80), uint8_t(0xCF), uint8_t(0xBF), uint8_t(0xB6),
+        uint8_t(0xA6), uint8_t(0xB5), uint8_t(0xA1), 0x0, uint8_t(0x80), 0x0,
+        uint8_t(0x80), 0x0, uint8_t(0x80));
+  } else {
+    check_values = _mm256_setr_epi8(
+        int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0xCF),
+        int8_t(0xBF), int8_t(0xD5), int8_t(0xA6), int8_t(0xB5), int8_t(0x86),
+        int8_t(0xD1), int8_t(0x80), int8_t(0xB1), int8_t(0x80), int8_t(0x91),
+        int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0x80),
+        int8_t(0xCF), int8_t(0xBF), int8_t(0xD5), int8_t(0xA6), int8_t(0xB5),
+        int8_t(0x86), int8_t(0xD1), int8_t(0x80), int8_t(0xB1), int8_t(0x80),
+        int8_t(0x91), int8_t(0x80));
+  }
+  const __m256i shifted = _mm256_srli_epi32(*src, 3);
+  const __m256i delta_hash =
+      _mm256_avg_epu8(_mm256_shuffle_epi8(delta_asso, *src), shifted);
+  const __m256i check_hash =
+      _mm256_avg_epu8(_mm256_shuffle_epi8(check_asso, *src), shifted);
+  const __m256i out =
+      _mm256_adds_epi8(_mm256_shuffle_epi8(delta_values, delta_hash), *src);
+  const __m256i chk =
+      _mm256_adds_epi8(_mm256_shuffle_epi8(check_values, check_hash), *src);
+  const int mask = _mm256_movemask_epi8(chk);
+  if (mask) {
+    __m256i ascii_space =
+        _mm256_cmpeq_epi8(_mm256_shuffle_epi8(ascii_space_tbl, *src), *src);
+    *error = (mask ^ _mm256_movemask_epi8(ascii_space));
+  }
+  *src = out;
+  return (uint32_t)mask;
 }
 
-simdutf_warn_unused result implementation::validate_utf32_with_errors(
-    const char32_t *buf, size_t len) const noexcept {
-  return scalar::utf32::validate_with_errors(buf, len);
+template <bool base64_url>
+static inline uint64_t to_base64_mask(block64 *b, uint64_t *error) {
+  uint32_t err0 = 0;
+  uint32_t err1 = 0;
+  uint64_t m0 = to_base64_mask<base64_url>(&b->chunks[0], &err0);
+  uint64_t m1 = to_base64_mask<base64_url>(&b->chunks[1], &err1);
+  *error = err0 | ((uint64_t)err1 << 32);
+  return m0 | (m1 << 32);
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
-    const char *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::latin1_to_utf8::convert(buf, len, utf8_output);
+static inline void copy_block(block64 *b, char *output) {
+  _mm256_storeu_si256(reinterpret_cast<__m256i *>(output), b->chunks[0]);
+  _mm256_storeu_si256(reinterpret_cast<__m256i *>(output + 32), b->chunks[1]);
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::latin1_to_utf16::convert<endianness::LITTLE>(buf, len,
-                                                              utf16_output);
+static inline uint64_t compress_block(block64 *b, uint64_t mask, char *output) {
+  uint64_t nmask = ~mask;
+  compress(b->chunks[0], uint32_t(mask), output);
+  compress(b->chunks[1], uint32_t(mask >> 32),
+           output + _mm_popcnt_u64(nmask & 0xFFFFFFFF));
+  return _mm_popcnt_u64(nmask);
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::latin1_to_utf16::convert<endianness::BIG>(buf, len,
-                                                           utf16_output);
+// The caller is responsible for ensuring that at least 64 bytes can be read
+// from src. The data is read into a block64 structure.
+static inline void load_block(block64 *b, const char *src) {
+  b->chunks[0] = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src));
+  b->chunks[1] =
+      _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src + 32));
 }
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
-    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::latin1_to_utf32::convert(buf, len, utf32_output);
+// The caller is responsible for ensuring that at least 128 bytes can be read
+// from src. The data is read into a block64 structure.
+static inline void load_block(block64 *b, const char16_t *src) {
+  __m256i m1 = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src));
+  __m256i m2 = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src + 16));
+  __m256i m3 = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src + 32));
+  __m256i m4 = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src + 48));
+  __m256i m1p = _mm256_permute2x128_si256(m1, m2, 0x20);
+  __m256i m2p = _mm256_permute2x128_si256(m1, m2, 0x31);
+  __m256i m3p = _mm256_permute2x128_si256(m3, m4, 0x20);
+  __m256i m4p = _mm256_permute2x128_si256(m3, m4, 0x31);
+  b->chunks[0] = _mm256_packus_epi16(m1p, m2p);
+  b->chunks[1] = _mm256_packus_epi16(m3p, m4p);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
-    const char *buf, size_t len, char *latin1_output) const noexcept {
-  return scalar::utf8_to_latin1::convert(buf, len, latin1_output);
-}
+static inline void base64_decode(char *out, __m256i str) {
+  // credit: aqrit
+  const __m256i pack_shuffle =
+      _mm256_setr_epi8(2, 1, 0, 6, 5, 4, 10, 9, 8, 14, 13, 12, -1, -1, -1, -1,
+                       2, 1, 0, 6, 5, 4, 10, 9, 8, 14, 13, 12, -1, -1, -1, -1);
+  const __m256i t0 = _mm256_maddubs_epi16(str, _mm256_set1_epi32(0x01400140));
+  const __m256i t1 = _mm256_madd_epi16(t0, _mm256_set1_epi32(0x00011000));
+  const __m256i t2 = _mm256_shuffle_epi8(t1, pack_shuffle);
 
-simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
-    const char *buf, size_t len, char *latin1_output) const noexcept {
-  return scalar::utf8_to_latin1::convert_with_errors(buf, len, latin1_output);
+  // Store the output:
+  _mm_storeu_si128((__m128i *)out, _mm256_castsi256_si128(t2));
+  _mm_storeu_si128((__m128i *)(out + 12), _mm256_extracti128_si256(t2, 1));
 }
-
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
-    const char *buf, size_t len, char *latin1_output) const noexcept {
-  return scalar::utf8_to_latin1::convert_valid(buf, len, latin1_output);
+// decode 64 bytes and output 48 bytes
+static inline void base64_decode_block(char *out, const char *src) {
+  base64_decode(out,
+                _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src)));
+  base64_decode(out + 24, _mm256_loadu_si256(
+                              reinterpret_cast<const __m256i *>(src + 32)));
 }
-
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf8_to_utf16::convert<endianness::LITTLE>(buf, len,
-                                                            utf16_output);
+static inline void base64_decode_block_safe(char *out, const char *src) {
+  base64_decode(out,
+                _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src)));
+  char buffer[32]; // We enforce safety with a buffer.
+  base64_decode(
+      buffer, _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src + 32)));
+  std::memcpy(out + 24, buffer, 24);
 }
-
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf8_to_utf16::convert<endianness::BIG>(buf, len,
-                                                         utf16_output);
+static inline void base64_decode_block(char *out, block64 *b) {
+  base64_decode(out, b->chunks[0]);
+  base64_decode(out + 24, b->chunks[1]);
 }
-
-simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf8_to_utf16::convert_with_errors<endianness::LITTLE>(
-      buf, len, utf16_output);
+static inline void base64_decode_block_safe(char *out, block64 *b) {
+  base64_decode(out, b->chunks[0]);
+  char buffer[32]; // We enforce safety with a buffer.
+  base64_decode(buffer, b->chunks[1]);
+  std::memcpy(out + 24, buffer, 24);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf8_to_utf16::convert_with_errors<endianness::BIG>(
-      buf, len, utf16_output);
-}
+template <bool base64_url, typename chartype>
+full_result
+compress_decode_base64(char *dst, const chartype *src, size_t srclen,
+                       base64_options options,
+                       last_chunk_handling_options last_chunk_options) {
+  const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
+                                        : tables::base64::to_base64_value;
+  size_t equallocation =
+      srclen; // location of the first padding character if any
+  // skip trailing spaces
+  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+         to_base64[uint8_t(src[srclen - 1])] == 64) {
+    srclen--;
+  }
+  size_t equalsigns = 0;
+  if (srclen > 0 && src[srclen - 1] == '=') {
+    equallocation = srclen - 1;
+    srclen--;
+    equalsigns = 1;
+    // skip trailing spaces
+    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+           to_base64[uint8_t(src[srclen - 1])] == 64) {
+      srclen--;
+    }
+    if (srclen > 0 && src[srclen - 1] == '=') {
+      equallocation = srclen - 1;
+      srclen--;
+      equalsigns = 2;
+    }
+  }
+  if (srclen == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+    }
+    return {SUCCESS, 0, 0};
+  }
+  char *end_of_safe_64byte_zone =
+      (srclen + 3) / 4 * 3 >= 63 ? dst + (srclen + 3) / 4 * 3 - 63 : dst;
+
+  const chartype *const srcinit = src;
+  const char *const dstinit = dst;
+  const chartype *const srcend = src + srclen;
+
+  constexpr size_t block_size = 6;
+  static_assert(block_size >= 2, "block_size must be at least two");
+  char buffer[block_size * 64];
+  char *bufferptr = buffer;
+  if (srclen >= 64) {
+    const chartype *const srcend64 = src + srclen - 64;
+    while (src <= srcend64) {
+      block64 b;
+      load_block(&b, src);
+      src += 64;
+      uint64_t error = 0;
+      uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
+      if (error) {
+        src -= 64;
+        size_t error_offset = _tzcnt_u64(error);
+        return {error_code::INVALID_BASE64_CHARACTER,
+                size_t(src - srcinit + error_offset), size_t(dst - dstinit)};
+      }
+      if (badcharmask != 0) {
+        // optimization opportunity: check for simple masks like those made of
+        // continuous 1s followed by continuous 0s. And masks containing a
+        // single bad character.
+        bufferptr += compress_block(&b, badcharmask, bufferptr);
+      } else if (bufferptr != buffer) {
+        copy_block(&b, bufferptr);
+        bufferptr += 64;
+      } else {
+        if (dst >= end_of_safe_64byte_zone) {
+          base64_decode_block_safe(dst, &b);
+        } else {
+          base64_decode_block(dst, &b);
+        }
+        dst += 48;
+      }
+      if (bufferptr >= (block_size - 1) * 64 + buffer) {
+        for (size_t i = 0; i < (block_size - 2); i++) {
+          base64_decode_block(dst, buffer + i * 64);
+          dst += 48;
+        }
+        if (dst >= end_of_safe_64byte_zone) {
+          base64_decode_block_safe(dst, buffer + (block_size - 2) * 64);
+        } else {
+          base64_decode_block(dst, buffer + (block_size - 2) * 64);
+        }
+        dst += 48;
+        std::memcpy(buffer, buffer + (block_size - 1) * 64,
+                    64); // 64 might be too much
+        bufferptr -= (block_size - 1) * 64;
+      }
+    }
+  }
+
+  char *buffer_start = buffer;
+  // Optimization note: if this is almost full, then it is worth our
+  // time, otherwise, we should just decode directly.
+  int last_block = (int)((bufferptr - buffer_start) % 64);
+  if (last_block != 0 && srcend - src + last_block >= 64) {
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf8_to_utf16::convert_valid<endianness::LITTLE>(buf, len,
-                                                                  utf16_output);
-}
+    while ((bufferptr - buffer_start) % 64 != 0 && src < srcend) {
+      uint8_t val = to_base64[uint8_t(*src)];
+      *bufferptr = char(val);
+      if (!scalar::base64::is_eight_byte(*src) || val > 64) {
+        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
+      }
+      bufferptr += (val <= 63);
+      src++;
+    }
+  }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf8_to_utf16::convert_valid<endianness::BIG>(buf, len,
-                                                               utf16_output);
-}
+  for (; buffer_start + 64 <= bufferptr; buffer_start += 64) {
+    if (dst >= end_of_safe_64byte_zone) {
+      base64_decode_block_safe(dst, buffer_start);
+    } else {
+      base64_decode_block(dst, buffer_start);
+    }
+    dst += 48;
+  }
+  if ((bufferptr - buffer_start) % 64 != 0) {
+    while (buffer_start + 4 < bufferptr) {
+      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
+                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
+                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
+                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
+                        << 8;
+      triple = scalar::utf32::swap_bytes(triple);
+      std::memcpy(dst, &triple, 4);
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
-    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf8_to_utf32::convert(buf, len, utf32_output);
-}
+      dst += 3;
+      buffer_start += 4;
+    }
+    if (buffer_start + 4 <= bufferptr) {
+      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
+                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
+                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
+                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
+                        << 8;
+      triple = scalar::utf32::swap_bytes(triple);
+      std::memcpy(dst, &triple, 3);
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
-    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf8_to_utf32::convert_with_errors(buf, len, utf32_output);
+      dst += 3;
+      buffer_start += 4;
+    }
+    // we may have 1, 2 or 3 bytes left and we need to decode them so let us
+    // backtrack
+    int leftover = int(bufferptr - buffer_start);
+    while (leftover > 0) {
+      while (to_base64[uint8_t(*(src - 1))] == 64) {
+        src--;
+      }
+      src--;
+      leftover--;
+    }
+  }
+  if (src < srcend + equalsigns) {
+    full_result r = scalar::base64::base64_tail_decode(
+        dst, src, srcend - src, equalsigns, options, last_chunk_options);
+    r.input_count += size_t(src - srcinit);
+    if (r.error == error_code::INVALID_BASE64_CHARACTER ||
+        r.error == error_code::BASE64_EXTRA_BITS) {
+      return r;
+    } else {
+      r.output_count += size_t(dst - dstinit);
+    }
+    if (last_chunk_options != stop_before_partial &&
+        r.error == error_code::SUCCESS && equalsigns > 0) {
+      // additional checks
+      if ((r.output_count % 3 == 0) ||
+          ((r.output_count % 3) + 1 + equalsigns != 4)) {
+        r.error = error_code::INVALID_BASE64_CHARACTER;
+        r.input_count = equallocation;
+      }
+    }
+    return r;
+  }
+  if (equalsigns > 0) {
+    if ((size_t(dst - dstinit) % 3 == 0) ||
+        ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation, size_t(dst - dstinit)};
+    }
+  }
+  return {SUCCESS, srclen, size_t(dst - dstinit)};
 }
+/* end file src/haswell/avx2_base64.cpp */
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
-    const char *input, size_t size, char32_t *utf32_output) const noexcept {
-  return scalar::utf8_to_utf32::convert_valid(input, size, utf32_output);
-}
+} // unnamed namespace
+} // namespace haswell
+} // namespace simdutf
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  return scalar::utf16_to_latin1::convert<endianness::LITTLE>(buf, len,
-                                                              latin1_output);
-}
+/* begin file src/generic/buf_block_reader.h */
+namespace simdutf {
+namespace haswell {
+namespace {
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  return scalar::utf16_to_latin1::convert<endianness::BIG>(buf, len,
-                                                           latin1_output);
-}
+// Walks through a buffer in block-sized increments, loading the last part with
+// spaces
+template <size_t STEP_SIZE> struct buf_block_reader {
+public:
+  simdutf_really_inline buf_block_reader(const uint8_t *_buf, size_t _len);
+  simdutf_really_inline size_t block_index();
+  simdutf_really_inline bool has_full_block() const;
+  simdutf_really_inline const uint8_t *full_block() const;
+  /**
+   * Get the last block, padded with spaces.
+   *
+   * There will always be a last block, with at least 1 byte, unless len == 0
+   * (in which case this function returns 0 without writing to dst). In
+   * particular, if len == STEP_SIZE there will be 0 full_blocks and 1 remainder
+   * block with STEP_SIZE bytes and no spaces for padding.
+   *
+   * @return the number of effective characters in the last block.
+   */
+  simdutf_really_inline size_t get_remainder(uint8_t *dst) const;
+  simdutf_really_inline void advance();
 
-simdutf_warn_unused result
-implementation::convert_utf16le_to_latin1_with_errors(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  return scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(
-      buf, len, latin1_output);
-}
+private:
+  const uint8_t *buf;
+  const size_t len;
+  const size_t lenminusstep;
+  size_t idx;
+};
 
-simdutf_warn_unused result
-implementation::convert_utf16be_to_latin1_with_errors(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  return scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(
-      buf, len, latin1_output);
+// Routines to print masks and text for debugging bitmask operations
+simdutf_unused static char *format_input_text_64(const uint8_t *text) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
+    buf[i] = int8_t(text[i]) < ' ' ? '_' : int8_t(text[i]);
+  }
+  buf[sizeof(simd8x64<uint8_t>)] = '\0';
+  return buf;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  return scalar::utf16_to_latin1::convert_valid<endianness::LITTLE>(
-      buf, len, latin1_output);
+// Routines to print masks and text for debugging bitmask operations
+simdutf_unused static char *format_input_text(const simd8x64<uint8_t> &in) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  in.store(reinterpret_cast<uint8_t *>(buf));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
+    if (buf[i] < ' ') {
+      buf[i] = '_';
+    }
+  }
+  buf[sizeof(simd8x64<uint8_t>)] = '\0';
+  return buf;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  return scalar::utf16_to_latin1::convert_valid<endianness::BIG>(buf, len,
-                                                                 latin1_output);
+simdutf_unused static char *format_mask(uint64_t mask) {
+  static char *buf = reinterpret_cast<char *>(malloc(64 + 1));
+  for (size_t i = 0; i < 64; i++) {
+    buf[i] = (mask & (size_t(1) << i)) ? 'X' : ' ';
+  }
+  buf[64] = '\0';
+  return buf;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert<endianness::LITTLE>(buf, len,
-                                                            utf8_output);
-}
+template <size_t STEP_SIZE>
+simdutf_really_inline
+buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len)
+    : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE},
+      idx{0} {}
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert<endianness::BIG>(buf, len, utf8_output);
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() {
+  return idx;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
-      buf, len, utf8_output);
+template <size_t STEP_SIZE>
+simdutf_really_inline bool buf_block_reader<STEP_SIZE>::has_full_block() const {
+  return idx < lenminusstep;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
-      buf, len, utf8_output);
+template <size_t STEP_SIZE>
+simdutf_really_inline const uint8_t *
+buf_block_reader<STEP_SIZE>::full_block() const {
+  return &buf[idx];
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_valid<endianness::LITTLE>(buf, len,
-                                                                  utf8_output);
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t
+buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
+  if (len == idx) {
+    return 0;
+  } // memcpy(dst, null, 0) will trigger an error with some sanitizers
+  std::memset(dst, 0x20,
+              STEP_SIZE); // std::memset STEP_SIZE because it is more efficient
+                          // to write out 8 or 16 bytes at once.
+  std::memcpy(dst, buf + idx, len - idx);
+  return len - idx;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_valid<endianness::BIG>(buf, len,
-                                                               utf8_output);
+template <size_t STEP_SIZE>
+simdutf_really_inline void buf_block_reader<STEP_SIZE>::advance() {
+  idx += STEP_SIZE;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
-    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
-  return scalar::utf32_to_latin1::convert(buf, len, latin1_output);
-}
+} // unnamed namespace
+} // namespace haswell
+} // namespace simdutf
+/* end file src/generic/buf_block_reader.h */
+/* begin file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
+namespace simdutf {
+namespace haswell {
+namespace {
+namespace utf8_validation {
 
-simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
-    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
-  return scalar::utf32_to_latin1::convert_with_errors(buf, len, latin1_output);
-}
+using namespace simd;
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
-    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
-  return scalar::utf32_to_latin1::convert_valid(buf, len, latin1_output);
-}
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
-    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf32_to_utf8::convert(buf, len, utf8_output);
-}
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      TOO_SHORT,
+      // 1110____ ________ <three byte lead in byte 1>
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
-    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf32_to_utf8::convert_with_errors(buf, len, utf8_output);
-}
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
-    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf32_to_utf8::convert_valid(buf, len, utf8_output);
-}
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert<endianness::LITTLE>(buf, len,
-                                                             utf16_output);
-}
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert<endianness::BIG>(buf, len,
-                                                          utf16_output);
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
 }
-
-simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
-      buf, len, utf16_output);
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
-      buf, len, utf16_output);
+//
+// Return nonzero if there are incomplete multibyte characters at the end of the
+// block: e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+//
+simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
+  // If the previous input's last 3 bytes match this, they're too short (they
+  // ended at EOF):
+  // ... 1111____ 111_____ 11______
+  static const uint8_t max_array[32] = {255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        0b11110000u - 1,
+                                        0b11100000u - 1,
+                                        0b11000000u - 1};
+  const simd8<uint8_t> max_value(
+      &max_array[sizeof(max_array) - sizeof(simd8<uint8_t>)]);
+  return input.gt_bits(max_value);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_valid<endianness::LITTLE>(
-      buf, len, utf16_output);
-}
+struct utf8_checker {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+  // The last input we received
+  simd8<uint8_t> prev_input_block;
+  // Whether the last input we received was incomplete (used for ASCII fast
+  // path)
+  simd8<uint8_t> prev_incomplete;
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_valid<endianness::BIG>(buf, len,
-                                                                utf16_output);
-}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert<endianness::LITTLE>(buf, len,
-                                                             utf32_output);
-}
+  // The only problem that can happen at EOF is that a multibyte character is
+  // too short or a byte value too large in the last bytes: check_special_cases
+  // only checks for bytes too large in the first of two bytes.
+  simdutf_really_inline void check_eof() {
+    // If the previous block had incomplete UTF-8 characters at the end, an
+    // ASCII block can't possibly finish them.
+    this->error |= this->prev_incomplete;
+  }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert<endianness::BIG>(buf, len,
-                                                          utf32_output);
-}
+  simdutf_really_inline void check_next_input(const simd8x64<uint8_t> &input) {
+    if (simdutf_likely(is_ascii(input))) {
+      this->error |= this->prev_incomplete;
+    } else {
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                        (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+                    "We support either two or four chunks per 64-byte block.");
+      if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+      } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+        this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+      }
+      this->prev_incomplete =
+          is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1]);
+      this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1];
+    }
+  }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
-      buf, len, utf32_output);
-}
+  // do not forget to call check_eof!
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
-      buf, len, utf32_output);
-}
+}; // struct utf8_checker
+} // namespace utf8_validation
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_valid<endianness::LITTLE>(
-      buf, len, utf32_output);
-}
+using utf8_validation::utf8_checker;
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_valid<endianness::BIG>(buf, len,
-                                                                utf32_output);
-}
+} // unnamed namespace
+} // namespace haswell
+} // namespace simdutf
+/* end file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
+/* begin file src/generic/utf8_validation/utf8_validator.h */
+namespace simdutf {
+namespace haswell {
+namespace {
+namespace utf8_validation {
 
-void implementation::change_endianness_utf16(const char16_t *input,
-                                             size_t length,
-                                             char16_t *output) const noexcept {
-  scalar::utf16::change_endianness_utf16(input, length, output);
+/**
+ * Validates that the string is actual UTF-8.
+ */
+template <class checker>
+bool generic_validate_utf8(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    c.check_next_input(in);
+    reader.advance();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  return !c.errors();
 }
 
-simdutf_warn_unused size_t implementation::count_utf16le(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::utf16::count_code_points<endianness::LITTLE>(input, length);
+bool generic_validate_utf8(const char *input, size_t length) {
+  return generic_validate_utf8<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
-simdutf_warn_unused size_t implementation::count_utf16be(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::utf16::count_code_points<endianness::BIG>(input, length);
+/**
+ * Validates that the string is actual UTF-8 and stops on errors.
+ */
+template <class checker>
+result generic_validate_utf8_with_errors(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  size_t count{0};
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    c.check_next_input(in);
+    if (c.errors()) {
+      if (count != 0) {
+        count--;
+      } // Sometimes the error is only detected in the next chunk
+      result res = scalar::utf8::rewind_and_validate_with_errors(
+          reinterpret_cast<const char *>(input),
+          reinterpret_cast<const char *>(input + count), length - count);
+      res.count += count;
+      return res;
+    }
+    reader.advance();
+    count += 64;
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  if (c.errors()) {
+    if (count != 0) {
+      count--;
+    } // Sometimes the error is only detected in the next chunk
+    result res = scalar::utf8::rewind_and_validate_with_errors(
+        reinterpret_cast<const char *>(input),
+        reinterpret_cast<const char *>(input) + count, length - count);
+    res.count += count;
+    return res;
+  } else {
+    return result(error_code::SUCCESS, length);
+  }
 }
 
-simdutf_warn_unused size_t
-implementation::count_utf8(const char *input, size_t length) const noexcept {
-  return scalar::utf8::count_code_points(input, length);
+result generic_validate_utf8_with_errors(const char *input, size_t length) {
+  return generic_validate_utf8_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
-    const char *buf, size_t len) const noexcept {
-  return scalar::utf8::count_code_points(buf, len);
+template <class checker>
+bool generic_validate_ascii(const uint8_t *input, size_t length) {
+  buf_block_reader<64> reader(input, length);
+  uint8_t blocks[64]{};
+  simd::simd8x64<uint8_t> running_or(blocks);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    running_or |= in;
+    reader.advance();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  running_or |= in;
+  return running_or.is_ascii();
 }
 
-simdutf_warn_unused size_t
-implementation::latin1_length_from_utf16(size_t length) const noexcept {
-  return scalar::utf16::latin1_length_from_utf16(length);
+bool generic_validate_ascii(const char *input, size_t length) {
+  return generic_validate_ascii<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
-simdutf_warn_unused size_t
-implementation::latin1_length_from_utf32(size_t length) const noexcept {
-  return length;
-}
+template <class checker>
+result generic_validate_ascii_with_errors(const uint8_t *input, size_t length) {
+  buf_block_reader<64> reader(input, length);
+  size_t count{0};
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    if (!in.is_ascii()) {
+      result res = scalar::ascii::validate_with_errors(
+          reinterpret_cast<const char *>(input + count), length - count);
+      return result(res.error, count + res.count);
+    }
+    reader.advance();
 
-simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
-    const char *input, size_t length) const noexcept {
-  size_t answer = length;
-  size_t i = 0;
-  auto pop = [](uint64_t v) {
-    return (size_t)(((v >> 7) & UINT64_C(0x0101010101010101)) *
-                        UINT64_C(0x0101010101010101) >>
-                    56);
-  };
-  for (; i + 32 <= length; i += 32) {
-    uint64_t v;
-    memcpy(&v, input + i, 8);
-    answer += pop(v);
-    memcpy(&v, input + i + 8, sizeof(v));
-    answer += pop(v);
-    memcpy(&v, input + i + 16, sizeof(v));
-    answer += pop(v);
-    memcpy(&v, input + i + 24, sizeof(v));
-    answer += pop(v);
-  }
-  for (; i + 8 <= length; i += 8) {
-    uint64_t v;
-    memcpy(&v, input + i, sizeof(v));
-    answer += pop(v);
+    count += 64;
   }
-  for (; i + 1 <= length; i += 1) {
-    answer += static_cast<uint8_t>(input[i]) >> 7;
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  if (!in.is_ascii()) {
+    result res = scalar::ascii::validate_with_errors(
+        reinterpret_cast<const char *>(input + count), length - count);
+    return result(res.error, count + res.count);
+  } else {
+    return result(error_code::SUCCESS, length);
   }
-  return answer;
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::utf16::utf8_length_from_utf16<endianness::LITTLE>(input,
-                                                                   length);
+result generic_validate_ascii_with_errors(const char *input, size_t length) {
+  return generic_validate_ascii_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
-}
+} // namespace utf8_validation
+} // unnamed namespace
+} // namespace haswell
+} // namespace simdutf
+/* end file src/generic/utf8_validation/utf8_validator.h */
+// transcoding from UTF-8 to UTF-16
+/* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::utf16::utf32_length_from_utf16<endianness::LITTLE>(input,
-                                                                    length);
-}
+namespace simdutf {
+namespace haswell {
+namespace {
+namespace utf8_to_utf16 {
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
-}
+using namespace simd;
 
-simdutf_warn_unused size_t
-implementation::utf16_length_from_latin1(size_t length) const noexcept {
-  return scalar::latin1::utf16_length_from_latin1(length);
+template <endianness endian>
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char16_t *utf16_output) noexcept {
+  // The implementation is not specific to haswell and should be moved to the
+  // generic directory.
+  size_t pos = 0;
+  char16_t *start{utf16_output};
+  const size_t safety_margin = 16; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    // This loop could be unrolled further. For example, we could process
+    // far more than 64 bytes per iteration.
+    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
+    if (in.is_ascii()) {
+      in.store_ascii_as_utf16<endian>(utf16_output);
+      utf16_output += 64;
+      pos += 64;
+    } else {
+      // Slow path. We hope that the compiler will recognize that this is a slow
+      // path. Anything that is not a continuation mask is a 'leading byte',
+      // that is, the start of a new code point.
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      // The *start* of code points is not so useful, rather, we want the *end*
+      // of code points.
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      // We process in blocks of up to 12 bytes except possibly
+      // for fast paths which may process up to 16 bytes. For the
+      // slow path to work, we should have at least 12 input bytes left.
+      size_t max_starting_point = (pos + 64) - 12;
+      // Next loop is going to run at least five times when using solely
+      // the slow/regular path, and at least four times if there are fast paths.
+      while (pos < max_starting_point) {
+        // Performance note: our ability to compute 'consumed' and
+        // then shift and recompute is critical. If there is a
+        // latency of, say, 4 cycles on getting 'consumed', then
+        // the inner loop might have a total latency of about 6 cycles.
+        // Yet we process between 6 and 12 input bytes, thus we get
+        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+        // for this section of the code. Hence, there is a limit
+        // to how much we can further increase this latency before
+        // it seriously harms performance.
+        //
+        // Thus we may allow convert_masked_utf8_to_utf16 to process
+        // more bytes at a time under a fast-path mode where 16 bytes
+        // are consumed at once (e.g., when encountering ASCII).
+        size_t consumed = convert_masked_utf8_to_utf16<endian>(
+            input + pos, utf8_end_of_code_point_mask, utf16_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
+      }
+      // At this point there may remain between 0 and 12 bytes in the
+      // 64-byte block. These bytes will be processed again. So we have an
+      // 80% efficiency (in the worst case). In practice we expect an
+      // 85% to 90% efficiency.
+    }
+  }
+  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(
+      input + pos, size - pos, utf16_output);
+  return utf16_output - start;
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
-    const char *input, size_t length) const noexcept {
-  return scalar::utf8::utf16_length_from_utf8(input, length);
-}
+} // namespace utf8_to_utf16
+} // unnamed namespace
+} // namespace haswell
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
+/* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
-    const char32_t *input, size_t length) const noexcept {
-  return scalar::utf32::utf8_length_from_utf32(input, length);
-}
+namespace simdutf {
+namespace haswell {
+namespace {
+namespace utf8_to_utf16 {
+using namespace simd;
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
-    const char32_t *input, size_t length) const noexcept {
-  return scalar::utf32::utf16_length_from_utf32(input, length);
-}
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
 
-simdutf_warn_unused size_t
-implementation::utf32_length_from_latin1(size_t length) const noexcept {
-  return scalar::latin1::utf32_length_from_latin1(length);
-}
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      TOO_SHORT,
+      // 1110____ ________ <three byte lead in byte 1>
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
-    const char *input, size_t length) const noexcept {
-  return scalar::utf8::count_code_points(input, length);
-}
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
-    const char *input, size_t length) const noexcept {
-  return scalar::base64::maximal_binary_length_from_base64(input, length);
-}
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
-simdutf_warn_unused result implementation::base64_to_binary(
-    const char *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_options) const noexcept {
-  while (length > 0 &&
-         scalar::base64::is_ascii_white_space(input[length - 1])) {
-    length--;
-  }
-  size_t equallocation =
-      length; // location of the first padding character if any
-  size_t equalsigns = 0;
-  if (length > 0 && input[length - 1] == '=') {
-    equallocation = length - 1;
-    length -= 1;
-    equalsigns++;
-    while (length > 0 &&
-           scalar::base64::is_ascii_white_space(input[length - 1])) {
-      length--;
-    }
-    if (length > 0 && input[length - 1] == '=') {
-      equallocation = length - 1;
-      equalsigns++;
-      length -= 1;
-    }
-  }
-  if (length == 0) {
-    if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
-    }
-    return {SUCCESS, 0};
-  }
-  result r = scalar::base64::base64_tail_decode(
-      output, input, length, equalsigns, options, last_chunk_options);
-  if (last_chunk_options != stop_before_partial &&
-      r.error == error_code::SUCCESS && equalsigns > 0) {
-    // additional checks
-    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
-    }
-  }
-  return r;
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
 }
 
-simdutf_warn_unused full_result implementation::base64_to_binary_details(
-    const char *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_options) const noexcept {
-  while (length > 0 &&
-         scalar::base64::is_ascii_white_space(input[length - 1])) {
-    length--;
-  }
-  size_t equallocation =
-      length; // location of the first padding character if any
-  size_t equalsigns = 0;
-  if (length > 0 && input[length - 1] == '=') {
-    equallocation = length - 1;
-    length -= 1;
-    equalsigns++;
-    while (length > 0 &&
-           scalar::base64::is_ascii_white_space(input[length - 1])) {
-      length--;
-    }
-    if (length > 0 && input[length - 1] == '=') {
-      equallocation = length - 1;
-      equalsigns++;
-      length -= 1;
-    }
-  }
-  if (length == 0) {
-    if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation, 0};
-    }
-    return {SUCCESS, 0, 0};
-  }
-  full_result r = scalar::base64::base64_tail_decode(
-      output, input, length, equalsigns, options, last_chunk_options);
-  if (last_chunk_options != stop_before_partial &&
-      r.error == error_code::SUCCESS && equalsigns > 0) {
-    // additional checks
-    if ((r.output_count % 3 == 0) ||
-        ((r.output_count % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation, r.output_count};
-    }
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
   }
-  return r;
-}
-
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::base64::maximal_binary_length_from_base64(input, length);
-}
 
-simdutf_warn_unused result implementation::base64_to_binary(
-    const char16_t *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_options) const noexcept {
-  while (length > 0 &&
-         scalar::base64::is_ascii_white_space(input[length - 1])) {
-    length--;
-  }
-  size_t equallocation =
-      length; // location of the first padding character if any
-  size_t equalsigns = 0;
-  if (length > 0 && input[length - 1] == '=') {
-    equallocation = length - 1;
-    length -= 1;
-    equalsigns++;
-    while (length > 0 &&
-           scalar::base64::is_ascii_white_space(input[length - 1])) {
-      length--;
+  template <endianness endian>
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
     }
-    if (length > 0 && input[length - 1] == '=') {
-      equallocation = length - 1;
-      equalsigns++;
-      length -= 1;
+    // If the input is long enough, the backward scan above stopped at the
+    // eighth-last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
     }
-  }
-  if (length == 0) {
-    if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
+    if (errors()) {
+      return 0;
     }
-    return {SUCCESS, 0};
-  }
-  result r = scalar::base64::base64_tail_decode(
-      output, input, length, equalsigns, options, last_chunk_options);
-  if (last_chunk_options != stop_before_partial &&
-      r.error == error_code::SUCCESS && equalsigns > 0) {
-    // additional checks
-    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
+    if (pos < size) {
+      size_t howmany = scalar::utf8_to_utf16::convert<endian>(
+          in + pos, size - pos, utf16_output);
+      if (howmany == 0) {
+        return 0;
+      }
+      utf16_output += howmany;
     }
+    return utf16_output - start;
   }
-  return r;
-}
 
-simdutf_warn_unused full_result implementation::base64_to_binary_details(
-    const char16_t *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_options) const noexcept {
-  while (length > 0 &&
-         scalar::base64::is_ascii_white_space(input[length - 1])) {
-    length--;
-  }
-  size_t equallocation =
-      length; // location of the first padding character if any
-  size_t equalsigns = 0;
-  if (length > 0 && input[length - 1] == '=') {
-    equallocation = length - 1;
-    length -= 1;
-    equalsigns++;
-    while (length > 0 &&
-           scalar::base64::is_ascii_white_space(input[length - 1])) {
-      length--;
+  template <endianness endian>
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
     }
-    if (length > 0 && input[length - 1] == '=') {
-      equallocation = length - 1;
-      equalsigns++;
-      length -= 1;
+    // If the input is long enough, the backward scan above stopped at the
+    // eighth-last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res =
+              scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+                  pos, in + pos, size - pos, utf16_output);
+          res.count += pos;
+          return res;
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
     }
-  }
-  if (length == 0) {
-    if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      res.count += pos;
+      return res;
     }
-    return {SUCCESS, 0, 0};
-  }
-  full_result r = scalar::base64::base64_tail_decode(
-      output, input, length, equalsigns, options, last_chunk_options);
-  if (last_chunk_options != stop_before_partial &&
-      r.error == error_code::SUCCESS && equalsigns > 0) {
-    // additional checks
-    if ((r.output_count % 3 == 0) ||
-        ((r.output_count % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation, r.output_count};
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf16_output += res.count;
+      }
     }
+    return result(error_code::SUCCESS, utf16_output - start);
   }
-  return r;
-}
 
-simdutf_warn_unused size_t implementation::base64_length_from_binary(
-    size_t length, base64_options options) const noexcept {
-  return scalar::base64::base64_length_from_binary(length, options);
-}
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-size_t implementation::binary_to_base64(const char *input, size_t length,
-                                        char *output,
-                                        base64_options options) const noexcept {
-  return scalar::base64::tail_encode_base64(output, input, length, options);
-}
-} // namespace fallback
+}; // struct validating_transcoder
+} // namespace utf8_to_utf16
+} // unnamed namespace
+} // namespace haswell
 } // namespace simdutf
+/* end file src/generic/utf8_to_utf16/utf8_to_utf16.h */
+// transcoding from UTF-8 to UTF-32
+/* begin file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
 
-/* begin file src/simdutf/fallback/end.h */
-/* end file src/simdutf/fallback/end.h */
-/* end file src/fallback/implementation.cpp */
-#endif
-#if SIMDUTF_IMPLEMENTATION_ICELAKE
-/* begin file src/icelake/implementation.cpp */
+namespace simdutf {
+namespace haswell {
+namespace {
+namespace utf8_to_utf32 {
 
+using namespace simd;
 
-/* begin file src/simdutf/icelake/begin.h */
-// redefining SIMDUTF_IMPLEMENTATION to "icelake"
-// #define SIMDUTF_IMPLEMENTATION icelake
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char32_t *utf32_output) noexcept {
+  size_t pos = 0;
+  char32_t *start{utf32_output};
+  const size_t safety_margin = 16; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
+    if (in.is_ascii()) {
+      in.store_ascii_as_utf32(utf32_output);
+      utf32_output += 64;
+      pos += 64;
+    } else {
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      size_t max_starting_point = (pos + 64) - 12;
+      while (pos < max_starting_point) {
+        size_t consumed = convert_masked_utf8_to_utf32(
+            input + pos, utf8_end_of_code_point_mask, utf32_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
+      }
+    }
+  }
+  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos,
+                                                       utf32_output);
+  return utf32_output - start;
+}
 
-#if SIMDUTF_CAN_ALWAYS_RUN_ICELAKE
-// nothing needed.
-#else
-SIMDUTF_TARGET_ICELAKE
-#endif
+} // namespace utf8_to_utf32
+} // unnamed namespace
+} // namespace haswell
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
+/* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
 
-#if SIMDUTF_GCC11ORMORE // workaround for
-                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
-// clang-format off
-SIMDUTF_DISABLE_GCC_WARNING(-Wmaybe-uninitialized)
-// clang-format on
-#endif // end of workaround
-/* end file src/simdutf/icelake/begin.h */
 namespace simdutf {
-namespace icelake {
+namespace haswell {
 namespace {
-#ifndef SIMDUTF_ICELAKE_H
-  #error "icelake.h must be included"
-#endif
-/* begin file src/icelake/icelake_utf8_common.inl.cpp */
-// Common procedures for both validating and non-validating conversions from
-// UTF-8.
-enum block_processing_mode { SIMDUTF_FULL, SIMDUTF_TAIL };
+namespace utf8_to_utf32 {
+using namespace simd;
 
-using utf8_to_utf16_result = std::pair<const char *, char16_t *>;
-using utf8_to_utf32_result = std::pair<const char *, uint32_t *>;
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
 
-/*
-    process_block_utf8_to_utf16 converts up to 64 bytes from 'in' from UTF-8
-    to UTF-16. When tail = SIMDUTF_FULL, then the full input buffer (64 bytes)
-    might be used. When tail = SIMDUTF_TAIL, we take into account 'gap' which
-    indicates how many input bytes are relevant.
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      TOO_SHORT,
+      // 1110____ ________ <three byte lead in byte 1>
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
 
-    Returns true when the result is correct, otherwise it returns false.
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
 
-    The provided in and out pointers are advanced according to how many input
-    bytes have been processed, upon success.
-*/
-template <block_processing_mode tail, endianness big_endian>
-simdutf_really_inline bool
-process_block_utf8_to_utf16(const char *&in, char16_t *&out, size_t gap) {
-  // constants
-  __m512i mask_identity = _mm512_set_epi8(
-      63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46,
-      45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28,
-      27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9,
-      8, 7, 6, 5, 4, 3, 2, 1, 0);
-  __m512i mask_c0c0c0c0 = _mm512_set1_epi32(0xc0c0c0c0);
-  __m512i mask_80808080 = _mm512_set1_epi32(0x80808080);
-  __m512i mask_f0f0f0f0 = _mm512_set1_epi32(0xf0f0f0f0);
-  __m512i mask_dfdfdfdf_tail = _mm512_set_epi64(
-      0xffffdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf,
-      0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf,
-      0xdfdfdfdfdfdfdfdf, 0xdfdfdfdfdfdfdfdf);
-  __m512i mask_c2c2c2c2 = _mm512_set1_epi32(0xc2c2c2c2);
-  __m512i mask_ffffffff = _mm512_set1_epi32(0xffffffff);
-  __m512i mask_d7c0d7c0 = _mm512_set1_epi32(0xd7c0d7c0);
-  __m512i mask_dc00dc00 = _mm512_set1_epi32(0xdc00dc00);
-  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
-  // Note that 'tail' is a compile-time constant !
-  __mmask64 b =
-      (tail == SIMDUTF_FULL) ? 0xFFFFFFFFFFFFFFFF : (uint64_t(1) << gap) - 1;
-  __m512i input = (tail == SIMDUTF_FULL) ? _mm512_loadu_si512(in)
-                                         : _mm512_maskz_loadu_epi8(b, in);
-  __mmask64 m1 = (tail == SIMDUTF_FULL)
-                     ? _mm512_cmplt_epu8_mask(input, mask_80808080)
-                     : _mm512_mask_cmplt_epu8_mask(b, input, mask_80808080);
-  if (_ktestc_mask64_u8(m1,
-                        b)) { // NOT(m1) AND b -- if all zeroes, then all ASCII
-                              // alternatively, we could do 'if (m1 == b) { '
-    if (tail == SIMDUTF_FULL) {
-      in += 64; // consumed 64 bytes
-      // we convert a full 64-byte block, writing 128 bytes.
-      __m512i input1 = _mm512_cvtepu8_epi16(_mm512_castsi512_si256(input));
-      if (big_endian) {
-        input1 = _mm512_shuffle_epi8(input1, byteflip);
-      }
-      _mm512_storeu_si512(out, input1);
-      out += 32;
-      __m512i input2 =
-          _mm512_cvtepu8_epi16(_mm512_extracti64x4_epi64(input, 1));
-      if (big_endian) {
-        input2 = _mm512_shuffle_epi8(input2, byteflip);
-      }
-      _mm512_storeu_si512(out, input2);
-      out += 32;
-      return true; // we are done
-    } else {
-      in += gap;
-      if (gap <= 32) {
-        __m512i input1 = _mm512_cvtepu8_epi16(_mm512_castsi512_si256(input));
-        if (big_endian) {
-          input1 = _mm512_shuffle_epi8(input1, byteflip);
-        }
-        _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << (gap)) - 1),
-                                 input1);
-        out += gap;
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
+
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
+
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 words when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, the backward scan above stopped at the
+    // eighth-last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
       } else {
-        __m512i input1 = _mm512_cvtepu8_epi16(_mm512_castsi512_si256(input));
-        if (big_endian) {
-          input1 = _mm512_shuffle_epi8(input1, byteflip);
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-        _mm512_storeu_si512(out, input1);
-        out += 32;
-        __m512i input2 =
-            _mm512_cvtepu8_epi16(_mm512_extracti64x4_epi64(input, 1));
-        if (big_endian) {
-          input2 = _mm512_shuffle_epi8(input2, byteflip);
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // we have an error
         }
-        _mm512_mask_storeu_epi16(
-            out, __mmask32((uint32_t(1) << (gap - 32)) - 1), input2);
-        out += gap - 32;
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      return true; // we are done
     }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
+      if (howmany == 0) {
+        return 0;
+      }
+      utf32_output += howmany;
+    }
+    return utf32_output - start;
   }
-  // classify characters further
-  __mmask64 m234 = _mm512_cmp_epu8_mask(
-      mask_c0c0c0c0, input,
-      _MM_CMPINT_LE); // 0xc0 <= input, 2, 3, or 4 leading byte
-  __mmask64 m34 =
-      _mm512_cmp_epu8_mask(mask_dfdfdfdf_tail, input,
-                           _MM_CMPINT_LT); // 0xdf < input,  3 or 4 leading byte
-
-  __mmask64 milltwobytes = _mm512_mask_cmp_epu8_mask(
-      m234, input, mask_c2c2c2c2,
-      _MM_CMPINT_LT); // 0xc0 <= input < 0xc2 (illegal two byte sequence)
-                      // Overlong 2-byte sequence
-  if (_ktestz_mask64_u8(milltwobytes, milltwobytes) == 0) {
-    // Overlong 2-byte sequence
-    return false;
-  }
-  if (_ktestz_mask64_u8(m34, m34) == 0) {
-    // We have a 3-byte sequence and/or a 2-byte sequence, or possibly even a
-    // 4-byte sequence!
-    __mmask64 m4 = _mm512_cmp_epu8_mask(
-        input, mask_f0f0f0f0,
-        _MM_CMPINT_NLT); // 0xf0 <= zmm0 (4 byte start bytes)
-
-    __mmask64 mask_not_ascii = (tail == SIMDUTF_FULL)
-                                   ? _knot_mask64(m1)
-                                   : _kand_mask64(_knot_mask64(m1), b);
 
-    __mmask64 mp1 = _kshiftli_mask64(m234, 1);
-    __mmask64 mp2 = _kshiftli_mask64(m34, 2);
-    // We could do it as follows...
-    // if (_kortestz_mask64_u8(m4,m4)) { // compute the bitwise OR of the 64-bit
-    // masks a and b and return 1 if all zeroes but GCC generates better code
-    // when we do:
-    if (m4 == 0) { // compute the bitwise OR of the 64-bit masks a and b and
-                   // return 1 if all zeroes
-      // Fast path with 1,2,3 bytes
-      __mmask64 mc = _kor_mask64(mp1, mp2); // expected continuation bytes
-      __mmask64 m1234 = _kor_mask64(m1, m234);
-      // mismatched continuation bytes:
-      if (tail == SIMDUTF_FULL) {
-        __mmask64 xnormcm1234 = _kxnor_mask64(
-            mc,
-            m1234); // XNOR of mc and m1234 should be all zero if they differ
-        // the presence of a 1 bit indicates that they overlap.
-        // _kortestz_mask64_u8: compute the bitwise OR of 64-bit masksand return
-        // 1 if all zeroes.
-        if (!_kortestz_mask64_u8(xnormcm1234, xnormcm1234)) {
-          return false;
-        }
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the eighth
+    // last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
       } else {
-        __mmask64 bxorm1234 = _kxor_mask64(b, m1234);
-        if (mc != bxorm1234) {
-          return false;
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-      }
-      // mend: identifying the last bytes of each sequence to be decoded
-      __mmask64 mend = _kshiftri_mask64(m1234, 1);
-      if (tail != SIMDUTF_FULL) {
-        mend = _kor_mask64(mend, (uint64_t(1) << (gap - 1)));
-      }
-
-      __m512i last_and_third = _mm512_maskz_compress_epi8(mend, mask_identity);
-      __m512i last_and_thirdu16 =
-          _mm512_cvtepu8_epi16(_mm512_castsi512_si256(last_and_third));
-
-      __m512i nonasciitags = _mm512_maskz_mov_epi8(
-          mask_not_ascii, mask_c0c0c0c0); // ASCII: 00000000  other: 11000000
-      __m512i clearedbytes = _mm512_andnot_si512(
-          nonasciitags, input); // high two bits cleared where not ASCII
-      __m512i lastbytes = _mm512_maskz_permutexvar_epi8(
-          0x5555555555555555, last_and_thirdu16,
-          clearedbytes); // the last byte of each character
-
-      __mmask64 mask_before_non_ascii = _kshiftri_mask64(
-          mask_not_ascii, 1); // bytes that precede non-ASCII bytes
-      __m512i indexofsecondlastbytes = _mm512_add_epi16(
-          mask_ffffffff, last_and_thirdu16); // indices of the second last bytes
-      __m512i beforeasciibytes =
-          _mm512_maskz_mov_epi8(mask_before_non_ascii, clearedbytes);
-      __m512i secondlastbytes = _mm512_maskz_permutexvar_epi8(
-          0x5555555555555555, indexofsecondlastbytes,
-          beforeasciibytes); // the second last bytes (of two, three byte seq,
-                             // surrogates)
-      secondlastbytes =
-          _mm512_slli_epi16(secondlastbytes, 6); // shifted into position
-
-      __m512i indexofthirdlastbytes = _mm512_add_epi16(
-          mask_ffffffff,
-          indexofsecondlastbytes); // indices of the second last bytes
-      __m512i thirdlastbyte =
-          _mm512_maskz_mov_epi8(m34,
-                                clearedbytes); // only those that are the third
-                                               // last byte of a sequence
-      __m512i thirdlastbytes = _mm512_maskz_permutexvar_epi8(
-          0x5555555555555555, indexofthirdlastbytes,
-          thirdlastbyte); // the third last bytes (of three byte sequences, hi
-                          // surrogate)
-      thirdlastbytes =
-          _mm512_slli_epi16(thirdlastbytes, 12); // shifted into position
-      __m512i Wout = _mm512_ternarylogic_epi32(lastbytes, secondlastbytes,
-                                               thirdlastbytes, 254);
-      // the elements of Wout excluding the last element if it happens to be a
-      // high surrogate:
-
-      __mmask64 mprocessed =
-          (tail == SIMDUTF_FULL)
-              ? _pdep_u64(0xFFFFFFFF, mend)
-              : _pdep_u64(
-                    0xFFFFFFFF,
-                    _kand_mask64(
-                        mend, b)); // we adjust mend at the end of the output.
-
-      // Encodings out of range...
-      {
-        // the location of 3-byte sequence start bytes in the input
-        __mmask64 m3 = m34 & (b ^ m4);
-        // code units in Wout corresponding to 3-byte sequences.
-        __mmask32 M3 = __mmask32(_pext_u64(m3 << 2, mend));
-        __m512i mask_08000800 = _mm512_set1_epi32(0x08000800);
-        __mmask32 Msmall800 =
-            _mm512_mask_cmplt_epu16_mask(M3, Wout, mask_08000800);
-        __m512i mask_d800d800 = _mm512_set1_epi32(0xd800d800);
-        __m512i Moutminusd800 = _mm512_sub_epi16(Wout, mask_d800d800);
-        __mmask32 M3s =
-            _mm512_mask_cmplt_epu16_mask(M3, Moutminusd800, mask_08000800);
-        if (_kor_mask32(Msmall800, M3s)) {
-          return false;
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, utf32_output);
+          res.count += pos;
+          return res;
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
         }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
-      int64_t nout = _mm_popcnt_u64(mprocessed);
-      in += 64 - _lzcnt_u64(mprocessed);
-      if (big_endian) {
-        Wout = _mm512_shuffle_epi8(Wout, byteflip);
+    }
+    if (errors()) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf32_output += res.count;
       }
-      _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << nout) - 1), Wout);
-      out += nout;
-      return true; // ok
     }
-    //
-    // We have a 4-byte sequence, this is the general case.
-    // Slow!
-    __mmask64 mp3 = _kshiftli_mask64(m4, 3);
-    __mmask64 mc =
-        _kor_mask64(_kor_mask64(mp1, mp2), mp3); // expected continuation bytes
-    __mmask64 m1234 = _kor_mask64(m1, m234);
+    return result(error_code::SUCCESS, utf32_output - start);
+  }
 
-    // mend: identifying the last bytes of each sequence to be decoded
-    __mmask64 mend =
-        _kor_mask64(_kshiftri_mask64(_kor_mask64(mp3, m1234), 1), mp3);
-    if (tail != SIMDUTF_FULL) {
-      mend = _kor_mask64(mend, __mmask64(uint64_t(1) << (gap - 1)));
-    }
-    __m512i last_and_third = _mm512_maskz_compress_epi8(mend, mask_identity);
-    __m512i last_and_thirdu16 =
-        _mm512_cvtepu8_epi16(_mm512_castsi512_si256(last_and_third));
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-    __m512i nonasciitags = _mm512_maskz_mov_epi8(
-        mask_not_ascii, mask_c0c0c0c0); // ASCII: 00000000  other: 11000000
-    __m512i clearedbytes = _mm512_andnot_si512(
-        nonasciitags, input); // high two bits cleared where not ASCII
-    __m512i lastbytes = _mm512_maskz_permutexvar_epi8(
-        0x5555555555555555, last_and_thirdu16,
-        clearedbytes); // the last byte of each character
+}; // struct utf8_checker
+} // namespace utf8_to_utf32
+} // unnamed namespace
+} // namespace haswell
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf32/utf8_to_utf32.h */
+// other functions
+/* begin file src/generic/utf8.h */
 
-    __mmask64 mask_before_non_ascii = _kshiftri_mask64(
-        mask_not_ascii, 1); // bytes that precede non-ASCII bytes
-    __m512i indexofsecondlastbytes = _mm512_add_epi16(
-        mask_ffffffff, last_and_thirdu16); // indices of the second last bytes
-    __m512i beforeasciibytes =
-        _mm512_maskz_mov_epi8(mask_before_non_ascii, clearedbytes);
-    __m512i secondlastbytes = _mm512_maskz_permutexvar_epi8(
-        0x5555555555555555, indexofsecondlastbytes,
-        beforeasciibytes); // the second last bytes (of two, three byte seq,
-                           // surrogates)
-    secondlastbytes =
-        _mm512_slli_epi16(secondlastbytes, 6); // shifted into position
+namespace simdutf {
+namespace haswell {
+namespace {
+namespace utf8 {
 
-    __m512i indexofthirdlastbytes = _mm512_add_epi16(
-        mask_ffffffff,
-        indexofsecondlastbytes); // indices of the second last bytes
-    __m512i thirdlastbyte = _mm512_maskz_mov_epi8(
-        m34,
-        clearedbytes); // only those that are the third last byte of a sequence
-    __m512i thirdlastbytes = _mm512_maskz_permutexvar_epi8(
-        0x5555555555555555, indexofthirdlastbytes,
-        thirdlastbyte); // the third last bytes (of three byte sequences, hi
-                        // surrogate)
-    thirdlastbytes =
-        _mm512_slli_epi16(thirdlastbytes, 12); // shifted into position
-    __m512i thirdsecondandlastbytes = _mm512_ternarylogic_epi32(
-        lastbytes, secondlastbytes, thirdlastbytes, 254);
-    uint64_t Mlo_uint64 = _pext_u64(mp3, mend);
-    __mmask32 Mlo = __mmask32(Mlo_uint64);
-    __mmask32 Mhi = __mmask32(Mlo_uint64 >> 1);
-    __m512i lo_surr_mask = _mm512_maskz_mov_epi16(
-        Mlo,
-        mask_dc00dc00); // lo surr: 1101110000000000, other:  0000000000000000
-    __m512i shifted4_thirdsecondandlastbytes =
-        _mm512_srli_epi16(thirdsecondandlastbytes,
-                          4); // hi surr: 00000WVUTSRQPNML  vuts = WVUTS - 1
-    __m512i tagged_lo_surrogates = _mm512_or_si512(
-        thirdsecondandlastbytes,
-        lo_surr_mask); // lo surr: 110111KJHGFEDCBA, other:  unchanged
-    __m512i Wout = _mm512_mask_add_epi16(
-        tagged_lo_surrogates, Mhi, shifted4_thirdsecondandlastbytes,
-        mask_d7c0d7c0); // hi sur: 110110vutsRQPNML, other:  unchanged
-    // the elements of Wout excluding the last element if it happens to be a
-    // high surrogate:
-    __mmask32 Mout = ~(Mhi & 0x80000000);
-    __mmask64 mprocessed =
-        (tail == SIMDUTF_FULL)
-            ? _pdep_u64(Mout, mend)
-            : _pdep_u64(
-                  Mout,
-                  _kand_mask64(mend,
-                               b)); // we adjust mend at the end of the output.
+using namespace simd;
 
-    // mismatched continuation bytes:
-    if (tail == SIMDUTF_FULL) {
-      __mmask64 xnormcm1234 = _kxnor_mask64(
-          mc, m1234); // XNOR of mc and m1234 should be all zero if they differ
-      // the presence of a 1 bit indicates that they overlap.
-      // _kortestz_mask64_u8: compute the bitwise OR of 64-bit masksand return 1
-      // if all zeroes.
-      if (!_kortestz_mask64_u8(xnormcm1234, xnormcm1234)) {
-        return false;
-      }
-    } else {
-      __mmask64 bxorm1234 = _kxor_mask64(b, m1234);
-      if (mc != bxorm1234) {
-        return false;
-      }
-    }
-    // Encodings out of range...
-    {
-      // the location of 3-byte sequence start bytes in the input
-      __mmask64 m3 = m34 & (b ^ m4);
-      // code units in Wout corresponding to 3-byte sequences.
-      __mmask32 M3 = __mmask32(_pext_u64(m3 << 2, mend));
-      __m512i mask_08000800 = _mm512_set1_epi32(0x08000800);
-      __mmask32 Msmall800 =
-          _mm512_mask_cmplt_epu16_mask(M3, Wout, mask_08000800);
-      __m512i mask_d800d800 = _mm512_set1_epi32(0xd800d800);
-      __m512i Moutminusd800 = _mm512_sub_epi16(Wout, mask_d800d800);
-      __mmask32 M3s =
-          _mm512_mask_cmplt_epu16_mask(M3, Moutminusd800, mask_08000800);
-      __m512i mask_04000400 = _mm512_set1_epi32(0x04000400);
-      __mmask32 M4s =
-          _mm512_mask_cmpge_epu16_mask(Mhi, Moutminusd800, mask_04000400);
-      if (!_kortestz_mask32_u8(M4s, _kor_mask32(Msmall800, M3s))) {
-        return false;
-      }
-    }
-    in += 64 - _lzcnt_u64(mprocessed);
-    int64_t nout = _mm_popcnt_u64(mprocessed);
-    if (big_endian) {
-      Wout = _mm512_shuffle_epi8(Wout, byteflip);
+simdutf_really_inline size_t count_code_points(const char *in, size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.gt(-65);
+    count += count_ones(utf8_continuation_mask);
+  }
+  return count + scalar::utf8::count_code_points(in + pos, size - pos);
+}
+
+simdutf_really_inline size_t utf16_length_from_utf8(const char *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+    // We count one word for anything that is not a continuation (so
+    // leading bytes).
+    count += 64 - count_ones(utf8_continuation_mask);
+    int64_t utf8_4byte = input.gteq_unsigned(240);
+    count += count_ones(utf8_4byte);
+  }
+  return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
+}
+} // namespace utf8
+} // unnamed namespace
+} // namespace haswell
+} // namespace simdutf
+/* end file src/generic/utf8.h */
+/* begin file src/generic/utf16.h */
+namespace simdutf {
+namespace haswell {
+namespace {
+namespace utf16 {
+
+template <endianness big_endian>
+simdutf_really_inline size_t count_code_points(const char16_t *in,
+                                               size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
     }
-    _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << nout) - 1), Wout);
-    out += nout;
-    return true; // ok
+    uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
+    count += count_ones(not_pair) / 2;
   }
-  // Fast path 2: all ASCII or 2 byte
-  __mmask64 continuation_or_ascii = (tail == SIMDUTF_FULL)
-                                        ? _knot_mask64(m234)
-                                        : _kand_mask64(_knot_mask64(m234), b);
-  // on top of -0xc0 we subtract -2 which we get back later of the
-  // continuation byte tags
-  __m512i leading2byte = _mm512_maskz_sub_epi8(m234, input, mask_c2c2c2c2);
-  __mmask64 leading = tail == (tail == SIMDUTF_FULL)
-                          ? _kor_mask64(m1, m234)
-                          : _kand_mask64(_kor_mask64(m1, m234),
-                                         b); // first bytes of each sequence
-  if (tail == SIMDUTF_FULL) {
-    __mmask64 xnor234leading =
-        _kxnor_mask64(_kshiftli_mask64(m234, 1), leading);
-    if (!_kortestz_mask64_u8(xnor234leading, xnor234leading)) {
-      return false;
-    }
-  } else {
-    __mmask64 bxorleading = _kxor_mask64(b, leading);
-    if (_kshiftli_mask64(m234, 1) != bxorleading) {
-      return false;
+  return count +
+         scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
+}
+
+template <endianness big_endian>
+simdutf_really_inline size_t utf8_length_from_utf16(const char16_t *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
     }
-  }
-  //
-  if (tail == SIMDUTF_FULL) {
-    // In the two-byte/ASCII scenario, we are easily latency bound, so we want
-    // to increment the input buffer as quickly as possible.
-    // We process 32 bytes unless the byte at index 32 is a continuation byte,
-    // in which case we include it as well for a total of 33 bytes.
-    // Note that if x is an ASCII byte, then the following is false:
-    // int8_t(x) <= int8_t(0xc0) under two's complement.
-    in += 32;
-    if (int8_t(*in) <= int8_t(0xc0))
-      in++;
-    // The alternative is to do
-    // in += 64 - _lzcnt_u64(_pdep_u64(0xFFFFFFFF, continuation_or_ascii));
-    // but it requires loading the input, doing the mask computation, and
-    // converting back the mask to a general register. It just takes too long,
-    // leaving the processor likely to be idle.
-  } else {
-    in += 64 - _lzcnt_u64(_pdep_u64(0xFFFFFFFF, continuation_or_ascii));
-  }
-  __m512i lead = _mm512_maskz_compress_epi8(
-      leading, leading2byte); // will contain zero for ascii, and the data
-  lead = _mm512_cvtepu8_epi16(
-      _mm512_castsi512_si256(lead)); // ... zero extended into code units
-  __m512i follow = _mm512_maskz_compress_epi8(
-      continuation_or_ascii, input); // the last bytes of each sequence
-  follow = _mm512_cvtepu8_epi16(
-      _mm512_castsi512_si256(follow)); // ... zero extended into code units
-  lead = _mm512_slli_epi16(lead, 6);   // shifted into position
-  __m512i final = _mm512_add_epi16(follow, lead); // combining lead and follow
+    uint64_t ascii_mask = input.lteq(0x7F);
+    uint64_t twobyte_mask = input.lteq(0x7FF);
+    uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
 
-  if (big_endian) {
-    final = _mm512_shuffle_epi8(final, byteflip);
+    size_t ascii_count = count_ones(ascii_mask) / 2;
+    size_t twobyte_count = count_ones(twobyte_mask & ~ascii_mask) / 2;
+    size_t threebyte_count = count_ones(not_pair_mask & ~twobyte_mask) / 2;
+    size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
+    count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count +
+             ascii_count;
   }
-  if (tail == SIMDUTF_FULL) {
-    // Next part is UTF-16 specific and can be generalized to UTF-32.
-    int nout = _mm_popcnt_u32(uint32_t(leading));
-    _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << nout) - 1), final);
-    out += nout; // UTF-8 to UTF-16 is only expansionary in this case.
-  } else {
-    int nout = int(_mm_popcnt_u64(_pdep_u64(0xFFFFFFFF, leading)));
-    _mm512_mask_storeu_epi16(out, __mmask32((uint64_t(1) << nout) - 1), final);
-    out += nout; // UTF-8 to UTF-16 is only expansionary in this case.
+  return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos,
+                                                                   size - pos);
+}
+
+template <endianness big_endian>
+simdutf_really_inline size_t utf32_length_from_utf16(const char16_t *in,
+                                                     size_t size) {
+  return count_code_points<big_endian>(in, size);
+}
+
+simdutf_really_inline void
+change_endianness_utf16(const char16_t *in, size_t size, char16_t *output) {
+  size_t pos = 0;
+
+  while (pos < size / 32 * 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    input.swap_bytes();
+    input.store(reinterpret_cast<uint16_t *>(output));
+    pos += 32;
+    output += 32;
   }
 
-  return true; // we are fine.
+  scalar::utf16::change_endianness_utf16(in + pos, size - pos, output);
 }
 
-/*
-    utf32_to_utf16_masked converts `count` lower UTF-32 code units
-    from input `utf32` into UTF-16. It differs from utf32_to_utf16
-    in that it 'masks' the writes.
+} // namespace utf16
+} // unnamed namespace
+} // namespace haswell
+} // namespace simdutf
+/* end file src/generic/utf16.h */
 
-    Returns how many 16-bit code units were stored.
+// transcoding from UTF-8 to Latin 1
+/* begin file src/generic/utf8_to_latin1/utf8_to_latin1.h */
 
-    byteflip is used for flipping 16-bit code units, and it should be
-        __m512i byteflip = _mm512_setr_epi64(
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809
-        );
-    We pass it to the (always inlined) function to encourage the compiler to
-    keep the value in a (constant) register.
-*/
-template <endianness big_endian>
-simdutf_really_inline size_t utf32_to_utf16_masked(const __m512i byteflip,
-                                                   __m512i utf32,
-                                                   unsigned int count,
-                                                   char16_t *output) {
+namespace simdutf {
+namespace haswell {
+namespace {
+namespace utf8_to_latin1 {
+using namespace simd;
 
-  const __mmask16 valid = uint16_t((1 << count) - 1);
-  // 1. check if we have any surrogate pairs
-  const __m512i v_0000_ffff = _mm512_set1_epi32(0x0000ffff);
-  const __mmask16 sp_mask =
-      _mm512_mask_cmpgt_epu32_mask(valid, utf32, v_0000_ffff);
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // For UTF-8 to Latin 1, we can allow any ASCII character, and any
+  // continuation byte, but the non-ASCII leading bytes must be 0b11000011 or
+  // 0b11000010 and nothing else.
+  //
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+  constexpr const uint8_t FORBIDDEN = 0xff;
 
-  if (sp_mask == 0) {
-    if (big_endian) {
-      _mm256_mask_storeu_epi16(
-          (__m256i *)output, valid,
-          _mm256_shuffle_epi8(_mm512_cvtepi32_epi16(utf32),
-                              _mm512_castsi512_si256(byteflip)));
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      FORBIDDEN,
+      // 1110____ ________ <three byte lead in byte 1>
+      FORBIDDEN,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      FORBIDDEN);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
 
-    } else {
-      _mm256_mask_storeu_epi16((__m256i *)output, valid,
-                               _mm512_cvtepi32_epi16(utf32));
-    }
-    return count;
-  }
+              // ____0100 ________
+              FORBIDDEN,
+              // ____0101 ________
+              FORBIDDEN,
+              // ____011_ ________
+              FORBIDDEN, FORBIDDEN,
 
-  {
-    // build surrogate pair code units in 32-bit lanes
+              // ____1___ ________
+              FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN,
+              // ____1101 ________
+              FORBIDDEN, FORBIDDEN, FORBIDDEN);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
-    //    t0 = 8 x [000000000000aaaa|aaaaaabbbbbbbbbb]
-    const __m512i v_0001_0000 = _mm512_set1_epi32(0x00010000);
-    const __m512i t0 = _mm512_sub_epi32(utf32, v_0001_0000);
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
-    //    t1 = 8 x [000000aaaaaaaaaa|bbbbbbbbbb000000]
-    const __m512i t1 = _mm512_slli_epi32(t0, 6);
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
 
-    //    t2 = 8 x [000000aaaaaaaaaa|aaaaaabbbbbbbbbb] -- copy hi word from t1
-    //    to t0
-    //         0xe4 = (t1 and v_ffff_0000) or (t0 and not v_ffff_0000)
-    const __m512i v_ffff_0000 = _mm512_set1_epi32(0xffff0000);
-    const __m512i t2 = _mm512_ternarylogic_epi32(t1, t0, v_ffff_0000, 0xe4);
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
 
-    //    t2 = 8 x [110110aaaaaaaaaa|110111bbbbbbbbbb] -- copy hi word from t1
-    //    to t0
-    //         0xba = (t2 and not v_fc00_fc000) or v_d800_dc00
-    const __m512i v_fc00_fc00 = _mm512_set1_epi32(0xfc00fc00);
-    const __m512i v_d800_dc00 = _mm512_set1_epi32(0xd800dc00);
-    const __m512i t3 =
-        _mm512_ternarylogic_epi32(t2, v_fc00_fc00, v_d800_dc00, 0xba);
-    const __m512i t4 = _mm512_mask_blend_epi32(sp_mask, utf32, t3);
-    __m512i t5 = _mm512_ror_epi32(t4, 16);
-    // Here we want to trim all of the upper 16-bit code units from the 2-byte
-    // characters represented as 4-byte values. We can compute it from
-    // sp_mask or the following... It can be more optimized!
-    const __mmask32 nonzero = _kor_mask32(
-        0xaaaaaaaa, _mm512_cmpneq_epi16_mask(t5, _mm512_setzero_si512()));
-    const __mmask32 nonzero_masked =
-        _kand_mask32(nonzero, __mmask32((uint64_t(1) << (2 * count)) - 1));
-    if (big_endian) {
-      t5 = _mm512_shuffle_epi8(t5, byteflip);
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    this->error |= check_special_cases(input, prev1);
+  }
+
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char *latin1_output) {
+    size_t pos = 0;
+    char *start{latin1_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 16 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 16; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) >
+                       -65); // twos complement of -65 is 1011 1111 ...
     }
-    // we deliberately avoid _mm512_mask_compressstoreu_epi16 for portability
-    // (zen4)
-    __m512i compressed = _mm512_maskz_compress_epi16(nonzero_masked, t5);
-    _mm512_mask_storeu_epi16(
-        output,
-        (1 << (count + static_cast<unsigned int>(count_ones(sp_mask)))) - 1,
-        compressed);
-    //_mm512_mask_compressstoreu_epi16(output, nonzero_masked, t5);
+    // If the input is long enough, then in[margin] is the sixteenth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask =
+            input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
+                               // this case, we also have ASCII to account for.
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_latin1::convert(in + pos, size - pos, latin1_output);
+      if (howmany == 0) {
+        return 0;
+      }
+      latin1_output += howmany;
+    }
+    return latin1_output - start;
   }
 
-  return count + static_cast<unsigned int>(count_ones(sp_mask));
-}
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char *latin1_output) {
+    size_t pos = 0;
+    char *start{latin1_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then in[margin] is the eighth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        if (errors()) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, latin1_output);
+          res.count += pos;
+          return res;
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of bytes written
+        latin1_output += res.count;
+      }
+    }
+    return result(error_code::SUCCESS, latin1_output - start);
+  }
 
-/*
-    utf32_to_utf16 converts `count` lower UTF-32 code units
-    from input `utf32` into UTF-16. It may overflow.
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-    Returns how many 16-bit code units were stored.
+}; // struct validating_transcoder
+} // namespace utf8_to_latin1
+} // unnamed namespace
+} // namespace haswell
+} // namespace simdutf
+/* end file src/generic/utf8_to_latin1/utf8_to_latin1.h */
+/* begin file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
 
-    byteflip is used for flipping 16-bit code units, and it should be
-        __m512i byteflip = _mm512_setr_epi64(
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809,
-            0x0607040502030001,
-            0x0e0f0c0d0a0b0809
-        );
-    We pass it to the (always inlined) function to encourage the compiler to
-    keep the value in a (constant) register.
-*/
-template <endianness big_endian>
-simdutf_really_inline size_t utf32_to_utf16(const __m512i byteflip,
-                                            __m512i utf32, unsigned int count,
-                                            char16_t *output) {
-  // check if we have any surrogate pairs
-  const __m512i v_0000_ffff = _mm512_set1_epi32(0x0000ffff);
-  const __mmask16 sp_mask = _mm512_cmpgt_epu32_mask(utf32, v_0000_ffff);
+namespace simdutf {
+namespace haswell {
+namespace {
+namespace utf8_to_latin1 {
+using namespace simd;
 
-  if (sp_mask == 0) {
-    // technically, it should be _mm256_storeu_epi16
-    if (big_endian) {
-      _mm256_storeu_si256(
-          (__m256i *)output,
-          _mm256_shuffle_epi8(_mm512_cvtepi32_epi16(utf32),
-                              _mm512_castsi512_si256(byteflip)));
+simdutf_really_inline size_t convert_valid(const char *in, size_t size,
+                                           char *latin1_output) {
+  size_t pos = 0;
+  char *start{latin1_output};
+  // In the worst case, we have the haswell kernel which can cause an overflow
+  // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last
+  // 16 bytes, and if the data is valid, then it is entirely safe because 16
+  // UTF-8 bytes generate much more than 8 bytes. However, you cannot generally
+  // assume that you have valid UTF-8 input, so we are going to go back from the
+  // end counting 8 leading bytes, to give us a good margin.
+  size_t leading_byte = 0;
+  size_t margin = size;
+  for (; margin > 0 && leading_byte < 8; margin--) {
+    leading_byte += (int8_t(in[margin - 1]) >
+                     -65); // twos complement of -65 is 1011 1111 ...
+  }
+  // If the input is long enough, then in[margin] is the eighth-to-last
+  // leading byte.
+  const size_t safety_margin = size - margin + 1; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    if (input.is_ascii()) {
+      input.store((int8_t *)latin1_output);
+      latin1_output += 64;
+      pos += 64;
     } else {
-      _mm256_storeu_si256((__m256i *)output, _mm512_cvtepi32_epi16(utf32));
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      uint64_t utf8_continuation_mask =
+          input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
+                             // this case, we also have ASCII to account for.
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      // We process in blocks of up to 12 bytes except possibly
+      // for fast paths which may process up to 16 bytes. For the
+      // slow path to work, we should have at least 12 input bytes left.
+      size_t max_starting_point = (pos + 64) - 12;
+      // Next loop is going to run at least five times.
+      while (pos < max_starting_point) {
+        // Performance note: our ability to compute 'consumed' and
+        // then shift and recompute is critical. If there is a
+        // latency of, say, 4 cycles on getting 'consumed', then
+        // the inner loop might have a total latency of about 6 cycles.
+        // Yet we process between 6 and 12 input bytes, thus we get
+        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+        // for this section of the code. Hence, there is a limit
+        // to how much we can further increase this latency before
+        // it seriously harms performance.
+        size_t consumed = convert_masked_utf8_to_latin1(
+            in + pos, utf8_end_of_code_point_mask, latin1_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
+      }
+      // At this point there may remain between 0 and 12 bytes in the
+      // 64-byte block. These bytes will be processed again. So we have an
+      // 80% efficiency (in the worst case). In practice we expect an
+      // 85% to 90% efficiency.
     }
-    return count;
   }
+  if (pos < size) {
+    size_t howmany = scalar::utf8_to_latin1::convert_valid(in + pos, size - pos,
+                                                           latin1_output);
+    latin1_output += howmany;
+  }
+  return latin1_output - start;
+}
 
-  {
-    // build surrogate pair code units in 32-bit lanes
-
-    //    t0 = 8 x [000000000000aaaa|aaaaaabbbbbbbbbb]
-    const __m512i v_0001_0000 = _mm512_set1_epi32(0x00010000);
-    const __m512i t0 = _mm512_sub_epi32(utf32, v_0001_0000);
-
-    //    t1 = 8 x [000000aaaaaaaaaa|bbbbbbbbbb000000]
-    const __m512i t1 = _mm512_slli_epi32(t0, 6);
+} // namespace utf8_to_latin1
+} // namespace
+} // namespace haswell
+} // namespace simdutf
+  // namespace simdutf
+/* end file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
 
-    //    t2 = 8 x [000000aaaaaaaaaa|aaaaaabbbbbbbbbb] -- copy hi word from t1
-    //    to t0
-    //         0xe4 = (t1 and v_ffff_0000) or (t0 and not v_ffff_0000)
-    const __m512i v_ffff_0000 = _mm512_set1_epi32(0xffff0000);
-    const __m512i t2 = _mm512_ternarylogic_epi32(t1, t0, v_ffff_0000, 0xe4);
+namespace simdutf {
+namespace haswell {
 
-    //    t2 = 8 x [110110aaaaaaaaaa|110111bbbbbbbbbb] -- copy hi word from t1
-    //    to t0
-    //         0xba = (t2 and not v_fc00_fc000) or v_d800_dc00
-    const __m512i v_fc00_fc00 = _mm512_set1_epi32(0xfc00fc00);
-    const __m512i v_d800_dc00 = _mm512_set1_epi32(0xd800dc00);
-    const __m512i t3 =
-        _mm512_ternarylogic_epi32(t2, v_fc00_fc00, v_d800_dc00, 0xba);
-    const __m512i t4 = _mm512_mask_blend_epi32(sp_mask, utf32, t3);
-    __m512i t5 = _mm512_ror_epi32(t4, 16);
-    const __mmask32 nonzero = _kor_mask32(
-        0xaaaaaaaa, _mm512_cmpneq_epi16_mask(t5, _mm512_setzero_si512()));
-    if (big_endian) {
-      t5 = _mm512_shuffle_epi8(t5, byteflip);
+simdutf_warn_unused int
+implementation::detect_encodings(const char *input,
+                                 size_t length) const noexcept {
+  // If there is a BOM, then we trust it.
+  auto bom_encoding = simdutf::BOM::check_bom(input, length);
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
+  }
+  int out = 0;
+  if (validate_utf8(input, length)) {
+    out |= encoding_type::UTF8;
+  }
+  if ((length % 2) == 0) {
+    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
+                         length / 2)) {
+      out |= encoding_type::UTF16_LE;
     }
-    // we deliberately avoid _mm512_mask_compressstoreu_epi16 for portability
-    // (zen4)
-    __m512i compressed = _mm512_maskz_compress_epi16(nonzero, t5);
-    _mm512_mask_storeu_epi16(
-        output,
-        (1 << (count + static_cast<unsigned int>(count_ones(sp_mask)))) - 1,
-        compressed);
-    //_mm512_mask_compressstoreu_epi16(output, nonzero, t5);
   }
-
-  return count + static_cast<unsigned int>(count_ones(sp_mask));
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      out |= encoding_type::UTF32_LE;
+    }
+  }
+  return out;
 }
 
-/**
- * Store the last N bytes of previous followed by 512-N bytes from input.
- */
-template <int N> __m512i prev(__m512i input, __m512i previous) {
-  static_assert(N <= 32, "N must be no larger than 32");
-  const __m512i movemask =
-      _mm512_setr_epi32(28, 29, 30, 31, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
-  const __m512i rotated = _mm512_permutex2var_epi32(input, movemask, previous);
-#if SIMDUTF_GCC8 || SIMDUTF_GCC9
-  constexpr int shift = 16 - N; // workaround for GCC8,9
-  return _mm512_alignr_epi8(input, rotated, shift);
-#else
-  return _mm512_alignr_epi8(input, rotated, 16 - N);
-#endif // SIMDUTF_GCC8 || SIMDUTF_GCC9
+simdutf_warn_unused bool
+implementation::validate_utf8(const char *buf, size_t len) const noexcept {
+  return haswell::utf8_validation::generic_validate_utf8(buf, len);
 }
 
-template <unsigned idx0, unsigned idx1, unsigned idx2, unsigned idx3>
-__m512i shuffle_epi128(__m512i v) {
-  static_assert((idx0 >= 0 && idx0 <= 3), "idx0 must be in range 0..3");
-  static_assert((idx1 >= 0 && idx1 <= 3), "idx1 must be in range 0..3");
-  static_assert((idx2 >= 0 && idx2 <= 3), "idx2 must be in range 0..3");
-  static_assert((idx3 >= 0 && idx3 <= 3), "idx3 must be in range 0..3");
-
-  constexpr unsigned shuffle = idx0 | (idx1 << 2) | (idx2 << 4) | (idx3 << 6);
-  return _mm512_shuffle_i32x4(v, v, shuffle);
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return haswell::utf8_validation::generic_validate_utf8_with_errors(buf, len);
 }
 
-template <unsigned idx> constexpr __m512i broadcast_epi128(__m512i v) {
-  return shuffle_epi128<idx, idx, idx, idx>(v);
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *buf, size_t len) const noexcept {
+  return haswell::utf8_validation::generic_validate_ascii(buf, len);
 }
 
-/**
- * Current unused.
- */
-template <int N> __m512i rotate_by_N_epi8(const __m512i input) {
-
-  // lanes order: 1, 2, 3, 0 => 0b00_11_10_01
-  const __m512i permuted = _mm512_shuffle_i32x4(input, input, 0x39);
-
-  return _mm512_alignr_epi8(permuted, input, N);
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return haswell::utf8_validation::generic_validate_ascii_with_errors(buf, len);
 }
 
-/*
-    expanded_utf8_to_utf32 converts expanded UTF-8 characters (`utf8`)
-    stored at separate 32-bit lanes.
-
-    For each lane we have also a character class (`char_class), given in form
-    0x8080800N, where N is 4 highest bits from the leading byte; 0x80 resets
-    corresponding bytes during pshufb.
-*/
-simdutf_really_inline __m512i expanded_utf8_to_utf32(__m512i char_class,
-                                                     __m512i utf8) {
-  /*
-      Input:
-      - utf8: bytes stored at separate 32-bit code units
-      - valid: which code units have valid UTF-8 characters
-
-      Bit layout of single word. We show 4 cases for each possible
-      UTF-8 character encoding. The `?` denotes bits we must not
-      assume their value.
-
-      |10dd.dddd|10cc.cccc|10bb.bbbb|1111.0aaa| 4-byte char
-      |????.????|10cc.cccc|10bb.bbbb|1110.aaaa| 3-byte char
-      |????.????|????.????|10bb.bbbb|110a.aaaa| 2-byte char
-      |????.????|????.????|????.????|0aaa.aaaa| ASCII char
-        byte 3    byte 2    byte 1     byte 0
-  */
-
-  /* 1. Reset control bits of continuation bytes and the MSB
-        of the leading byte; this makes all bytes unsigned (and
-        does not alter ASCII char).
-
-      |00dd.dddd|00cc.cccc|00bb.bbbb|0111.0aaa| 4-byte char
-      |00??.????|00cc.cccc|00bb.bbbb|0110.aaaa| 3-byte char
-      |00??.????|00??.????|00bb.bbbb|010a.aaaa| 2-byte char
-      |00??.????|00??.????|00??.????|0aaa.aaaa| ASCII char
-       ^^        ^^        ^^        ^
-  */
-  __m512i values;
-  const __m512i v_3f3f_3f7f = _mm512_set1_epi32(0x3f3f3f7f);
-  values = _mm512_and_si512(utf8, v_3f3f_3f7f);
-
-  /* 2. Swap and join fields A-B and C-D
-
-      |0000.cccc|ccdd.dddd|0001.110a|aabb.bbbb| 4-byte char
-      |0000.cccc|cc??.????|0001.10aa|aabb.bbbb| 3-byte char
-      |0000.????|????.????|0001.0aaa|aabb.bbbb| 2-byte char
-      |0000.????|????.????|000a.aaaa|aa??.????| ASCII char */
-  const __m512i v_0140_0140 = _mm512_set1_epi32(0x01400140);
-  values = _mm512_maddubs_epi16(values, v_0140_0140);
-
-  /* 3. Swap and join fields AB & CD
+simdutf_warn_unused bool
+implementation::validate_utf16le(const char16_t *buf,
+                                 size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    // empty input is valid UTF-16. protect the implementation from
+    // handling nullptr
+    return true;
+  }
+  const char16_t *tail = avx2_validate_utf16<endianness::LITTLE>(buf, len);
+  if (tail) {
+    return scalar::utf16::validate<endianness::LITTLE>(tail,
+                                                       len - (tail - buf));
+  } else {
+    return false;
+  }
+}
 
-      |0000.0001|110a.aabb|bbbb.cccc|ccdd.dddd| 4-byte char
-      |0000.0001|10aa.aabb|bbbb.cccc|cc??.????| 3-byte char
-      |0000.0001|0aaa.aabb|bbbb.????|????.????| 2-byte char
-      |0000.000a|aaaa.aa??|????.????|????.????| ASCII char */
-  const __m512i v_0001_1000 = _mm512_set1_epi32(0x00011000);
-  values = _mm512_madd_epi16(values, v_0001_1000);
+simdutf_warn_unused bool
+implementation::validate_utf16be(const char16_t *buf,
+                                 size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    // empty input is valid UTF-16. protect the implementation from
+    // handling nullptr
+    return true;
+  }
+  const char16_t *tail = avx2_validate_utf16<endianness::BIG>(buf, len);
+  if (tail) {
+    return scalar::utf16::validate<endianness::BIG>(tail, len - (tail - buf));
+  } else {
+    return false;
+  }
+}
 
-  /* 4. Shift left the values by variable amounts to reset highest UTF-8 bits
-      |aaab.bbbb|bccc.cccd|dddd.d000|0000.0000| 4-byte char -- by 11
-      |aaaa.bbbb|bbcc.cccc|????.??00|0000.0000| 3-byte char -- by 10
-      |aaaa.abbb|bbb?.????|????.???0|0000.0000| 2-byte char -- by 9
-      |aaaa.aaa?|????.????|????.????|?000.0000| ASCII char -- by 7 */
-  {
-    /** pshufb
+simdutf_warn_unused result implementation::validate_utf16le_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  result res = avx2_validate_utf16_with_errors<endianness::LITTLE>(buf, len);
+  if (res.count != len) {
+    result scalar_res = scalar::utf16::validate_with_errors<endianness::LITTLE>(
+        buf + res.count, len - res.count);
+    return result(scalar_res.error, res.count + scalar_res.count);
+  } else {
+    return res;
+  }
+}
 
-    continuation = 0
-    ascii    = 7
-    _2_bytes = 9
-    _3_bytes = 10
-    _4_bytes = 11
+simdutf_warn_unused result implementation::validate_utf16be_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  result res = avx2_validate_utf16_with_errors<endianness::BIG>(buf, len);
+  if (res.count != len) {
+    result scalar_res = scalar::utf16::validate_with_errors<endianness::BIG>(
+        buf + res.count, len - res.count);
+    return result(scalar_res.error, res.count + scalar_res.count);
+  } else {
+    return res;
+  }
+}
 
-    shift_left_v3 = 4 * [
-        ascii, # 0000
-        ascii, # 0001
-        ascii, # 0010
-        ascii, # 0011
-        ascii, # 0100
-        ascii, # 0101
-        ascii, # 0110
-        ascii, # 0111
-        continuation, # 1000
-        continuation, # 1001
-        continuation, # 1010
-        continuation, # 1011
-        _2_bytes, # 1100
-        _2_bytes, # 1101
-        _3_bytes, # 1110
-        _4_bytes, # 1111
-    ] */
-    const __m512i shift_left_v3 = _mm512_setr_epi64(
-        0x0707070707070707, 0x0b0a090900000000, 0x0707070707070707,
-        0x0b0a090900000000, 0x0707070707070707, 0x0b0a090900000000,
-        0x0707070707070707, 0x0b0a090900000000);
+simdutf_warn_unused bool
+implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    // empty input is valid UTF-32. protect the implementation from
+    // handling nullptr
+    return true;
+  }
+  const char32_t *tail = avx2_validate_utf32le(buf, len);
+  if (tail) {
+    return scalar::utf32::validate(tail, len - (tail - buf));
+  } else {
+    return false;
+  }
+}
 
-    const __m512i shift = _mm512_shuffle_epi8(shift_left_v3, char_class);
-    values = _mm512_sllv_epi32(values, shift);
+simdutf_warn_unused result implementation::validate_utf32_with_errors(
+    const char32_t *buf, size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    // empty input is valid UTF-32. protect the implementation from
+    // handling nullptr
+    return result(error_code::SUCCESS, 0);
+  }
+  result res = avx2_validate_utf32le_with_errors(buf, len);
+  if (res.count != len) {
+    result scalar_res =
+        scalar::utf32::validate_with_errors(buf + res.count, len - res.count);
+    return result(scalar_res.error, res.count + scalar_res.count);
+  } else {
+    return res;
   }
+}
 
-  /* 5. Shift right the values by variable amounts to reset lowest bits
-      |0000.0000|000a.aabb|bbbb.cccc|ccdd.dddd| 4-byte char -- by 11
-      |0000.0000|0000.0000|aaaa.bbbb|bbcc.cccc| 3-byte char -- by 16
-      |0000.0000|0000.0000|0000.0aaa|aabb.bbbb| 2-byte char -- by 21
-      |0000.0000|0000.0000|0000.0000|0aaa.aaaa| ASCII char -- by 25 */
-  {
-    // 4 * [25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 21, 21, 16, 11]
-    const __m512i shift_right = _mm512_setr_epi64(
-        0x1919191919191919, 0x0b10151500000000, 0x1919191919191919,
-        0x0b10151500000000, 0x1919191919191919, 0x0b10151500000000,
-        0x1919191919191919, 0x0b10151500000000);
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
+    const char *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char *, char *> ret =
+      avx2_convert_latin1_to_utf8(buf, len, utf8_output);
+  size_t converted_chars = ret.second - utf8_output;
 
-    const __m512i shift = _mm512_shuffle_epi8(shift_right, char_class);
-    values = _mm512_srlv_epi32(values, shift);
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars = scalar::latin1_to_utf8::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    converted_chars += scalar_converted_chars;
   }
 
-  return values;
+  return converted_chars;
 }
 
-simdutf_really_inline __m512i expand_and_identify(__m512i lane0, __m512i lane1,
-                                                  int &count) {
-  const __m512i merged = _mm512_mask_mov_epi32(lane0, 0x1000, lane1);
-  const __m512i expand_ver2 = _mm512_setr_epi64(
-      0x0403020103020100, 0x0605040305040302, 0x0807060507060504,
-      0x0a09080709080706, 0x0c0b0a090b0a0908, 0x0e0d0c0b0d0c0b0a,
-      0x000f0e0d0f0e0d0c, 0x0201000f01000f0e);
-  const __m512i input = _mm512_shuffle_epi8(merged, expand_ver2);
-  const __m512i v_0000_00c0 = _mm512_set1_epi32(0xc0);
-  const __m512i t0 = _mm512_and_si512(input, v_0000_00c0);
-  const __m512i v_0000_0080 = _mm512_set1_epi32(0x80);
-  const __mmask16 leading_bytes = _mm512_cmpneq_epu32_mask(t0, v_0000_0080);
-  count = static_cast<int>(count_ones(leading_bytes));
-  return _mm512_mask_compress_epi32(_mm512_setzero_si512(), leading_bytes,
-                                    input);
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char *, char16_t *> ret =
+      avx2_convert_latin1_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t converted_chars = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars =
+        scalar::latin1_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_converted_chars == 0) {
+      return 0;
+    }
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
 }
 
-simdutf_really_inline __m512i expand_utf8_to_utf32(__m512i input) {
-  __m512i char_class = _mm512_srli_epi32(input, 4);
-  /*  char_class = ((input >> 4) & 0x0f) | 0x80808000 */
-  const __m512i v_0000_000f = _mm512_set1_epi32(0x0f);
-  const __m512i v_8080_8000 = _mm512_set1_epi32(0x80808000);
-  char_class =
-      _mm512_ternarylogic_epi32(char_class, v_0000_000f, v_8080_8000, 0xea);
-  return expanded_utf8_to_utf32(char_class, input);
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char *, char16_t *> ret =
+      avx2_convert_latin1_to_utf16<endianness::BIG>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t converted_chars = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars =
+        scalar::latin1_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_converted_chars == 0) {
+      return 0;
+    }
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
 }
-/* end file src/icelake/icelake_utf8_common.inl.cpp */
-/* begin file src/icelake/icelake_macros.inl.cpp */
-
-/*
-    This upcoming macro (SIMDUTF_ICELAKE_TRANSCODE16) takes 16 + 4 bytes (of a
-   UTF-8 string) and loads all possible 4-byte substring into an AVX512
-   register.
-
-    For example if we have bytes abcdefgh... we create following 32-bit lanes
-
-    [abcd|bcde|cdef|defg|efgh|...]
-     ^                          ^
-     byte 0 of reg              byte 63 of reg
-*/
-/** pshufb
-        # lane{0,1,2} have got bytes: [  0,  1,  2,  3,  4,  5,  6,  8,  9, 10,
-   11, 12, 13, 14, 15] # lane3 has got bytes:        [ 16, 17, 18, 19,  4,  5,
-   6,  8,  9, 10, 11, 12, 13, 14, 15]
 
-        expand_ver2 = [
-            # lane 0:
-            0, 1, 2, 3,
-            1, 2, 3, 4,
-            2, 3, 4, 5,
-            3, 4, 5, 6,
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char *, char32_t *> ret =
+      avx2_convert_latin1_to_utf32(buf, len, utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t converted_chars = ret.second - utf32_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_converted_chars == 0) {
+      return 0;
+    }
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
+}
 
-            # lane 1:
-            4, 5, 6, 7,
-            5, 6, 7, 8,
-            6, 7, 8, 9,
-            7, 8, 9, 10,
+simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  utf8_to_latin1::validating_transcoder converter;
+  return converter.convert(buf, len, latin1_output);
+}
 
-            # lane 2:
-             8,  9, 10, 11,
-             9, 10, 11, 12,
-            10, 11, 12, 13,
-            11, 12, 13, 14,
+simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  utf8_to_latin1::validating_transcoder converter;
+  return converter.convert_with_errors(buf, len, latin1_output);
+}
 
-            # lane 3 order: 13, 14, 15, 16 14, 15, 16, 17, 15, 16, 17, 18, 16,
-   17, 18, 19 12, 13, 14, 15, 13, 14, 15,  0, 14, 15,  0,  1, 15,  0,  1,  2,
-        ]
-*/
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
+    const char *input, size_t size, char *latin1_output) const noexcept {
+  return utf8_to_latin1::convert_valid(input, size, latin1_output);
+}
 
-#define SIMDUTF_ICELAKE_TRANSCODE16(LANE0, LANE1, MASKED)                      \
-  {                                                                            \
-    const __m512i merged = _mm512_mask_mov_epi32(LANE0, 0x1000, LANE1);        \
-    const __m512i expand_ver2 = _mm512_setr_epi64(                             \
-        0x0403020103020100, 0x0605040305040302, 0x0807060507060504,            \
-        0x0a09080709080706, 0x0c0b0a090b0a0908, 0x0e0d0c0b0d0c0b0a,            \
-        0x000f0e0d0f0e0d0c, 0x0201000f01000f0e);                               \
-    const __m512i input = _mm512_shuffle_epi8(merged, expand_ver2);            \
-                                                                               \
-    __mmask16 leading_bytes;                                                   \
-    const __m512i v_0000_00c0 = _mm512_set1_epi32(0xc0);                       \
-    const __m512i t0 = _mm512_and_si512(input, v_0000_00c0);                   \
-    const __m512i v_0000_0080 = _mm512_set1_epi32(0x80);                       \
-    leading_bytes = _mm512_cmpneq_epu32_mask(t0, v_0000_0080);                 \
-                                                                               \
-    __m512i char_class;                                                        \
-    char_class = _mm512_srli_epi32(input, 4);                                  \
-    /*  char_class = ((input >> 4) & 0x0f) | 0x80808000 */                     \
-    const __m512i v_0000_000f = _mm512_set1_epi32(0x0f);                       \
-    const __m512i v_8080_8000 = _mm512_set1_epi32(0x80808000);                 \
-    char_class =                                                               \
-        _mm512_ternarylogic_epi32(char_class, v_0000_000f, v_8080_8000, 0xea); \
-                                                                               \
-    const int valid_count = static_cast<int>(count_ones(leading_bytes));       \
-    const __m512i utf32 = expanded_utf8_to_utf32(char_class, input);           \
-                                                                               \
-    const __m512i out = _mm512_mask_compress_epi32(_mm512_setzero_si512(),     \
-                                                   leading_bytes, utf32);      \
-                                                                               \
-    if (UTF32) {                                                               \
-      if (MASKED) {                                                            \
-        const __mmask16 valid = uint16_t((1 << valid_count) - 1);              \
-        _mm512_mask_storeu_epi32((__m512i *)output, valid, out);               \
-      } else {                                                                 \
-        _mm512_storeu_si512((__m512i *)output, out);                           \
-      }                                                                        \
-      output += valid_count;                                                   \
-    } else {                                                                   \
-      if (MASKED) {                                                            \
-        output += utf32_to_utf16_masked<big_endian>(                           \
-            byteflip, out, valid_count, reinterpret_cast<char16_t *>(output)); \
-      } else {                                                                 \
-        output += utf32_to_utf16<big_endian>(                                  \
-            byteflip, out, valid_count, reinterpret_cast<char16_t *>(output)); \
-      }                                                                        \
-    }                                                                          \
-  }
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16::validating_transcoder converter;
+  return converter.convert<endianness::LITTLE>(buf, len, utf16_output);
+}
 
-#define SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(INPUT, VALID_COUNT, MASKED)       \
-  {                                                                            \
-    if (UTF32) {                                                               \
-      if (MASKED) {                                                            \
-        const __mmask16 valid_mask = uint16_t((1 << VALID_COUNT) - 1);         \
-        _mm512_mask_storeu_epi32((__m512i *)output, valid_mask, INPUT);        \
-      } else {                                                                 \
-        _mm512_storeu_si512((__m512i *)output, INPUT);                         \
-      }                                                                        \
-      output += VALID_COUNT;                                                   \
-    } else {                                                                   \
-      if (MASKED) {                                                            \
-        output += utf32_to_utf16_masked<big_endian>(                           \
-            byteflip, INPUT, VALID_COUNT,                                      \
-            reinterpret_cast<char16_t *>(output));                             \
-      } else {                                                                 \
-        output +=                                                              \
-            utf32_to_utf16<big_endian>(byteflip, INPUT, VALID_COUNT,           \
-                                       reinterpret_cast<char16_t *>(output));  \
-      }                                                                        \
-    }                                                                          \
-  }
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16::validating_transcoder converter;
+  return converter.convert<endianness::BIG>(buf, len, utf16_output);
+}
 
-#define SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)                       \
-  if (UTF32) {                                                                 \
-    const __m128i t0 = _mm512_castsi512_si128(utf8);                           \
-    const __m128i t1 = _mm512_extracti32x4_epi32(utf8, 1);                     \
-    const __m128i t2 = _mm512_extracti32x4_epi32(utf8, 2);                     \
-    const __m128i t3 = _mm512_extracti32x4_epi32(utf8, 3);                     \
-    _mm512_storeu_si512((__m512i *)(output + 0 * 16),                          \
-                        _mm512_cvtepu8_epi32(t0));                             \
-    _mm512_storeu_si512((__m512i *)(output + 1 * 16),                          \
-                        _mm512_cvtepu8_epi32(t1));                             \
-    _mm512_storeu_si512((__m512i *)(output + 2 * 16),                          \
-                        _mm512_cvtepu8_epi32(t2));                             \
-    _mm512_storeu_si512((__m512i *)(output + 3 * 16),                          \
-                        _mm512_cvtepu8_epi32(t3));                             \
-  } else {                                                                     \
-    const __m256i h0 = _mm512_castsi512_si256(utf8);                           \
-    const __m256i h1 = _mm512_extracti64x4_epi64(utf8, 1);                     \
-    if (big_endian) {                                                          \
-      _mm512_storeu_si512(                                                     \
-          (__m512i *)(output + 0 * 16),                                        \
-          _mm512_shuffle_epi8(_mm512_cvtepu8_epi16(h0), byteflip));            \
-      _mm512_storeu_si512(                                                     \
-          (__m512i *)(output + 2 * 16),                                        \
-          _mm512_shuffle_epi8(_mm512_cvtepu8_epi16(h1), byteflip));            \
-    } else {                                                                   \
-      _mm512_storeu_si512((__m512i *)(output + 0 * 16),                        \
-                          _mm512_cvtepu8_epi16(h0));                           \
-      _mm512_storeu_si512((__m512i *)(output + 2 * 16),                        \
-                          _mm512_cvtepu8_epi16(h1));                           \
-    }                                                                          \
-  }
-/* end file src/icelake/icelake_macros.inl.cpp */
-/* begin file src/icelake/icelake_from_valid_utf8.inl.cpp */
-// file included directly
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16::validating_transcoder converter;
+  return converter.convert_with_errors<endianness::LITTLE>(buf, len,
+                                                           utf16_output);
+}
 
-// File contains conversion procedure from VALID UTF-8 strings.
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16::validating_transcoder converter;
+  return converter.convert_with_errors<endianness::BIG>(buf, len, utf16_output);
+}
 
-/*
-    valid_utf8_to_fixed_length converts a valid UTF-8 string into UTF-32.
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char *input, size_t size, char16_t *utf16_output) const noexcept {
+  return utf8_to_utf16::convert_valid<endianness::LITTLE>(input, size,
+                                                          utf16_output);
+}
 
-    The `OUTPUT` template type decides what to do with UTF-32: store
-    it directly or convert into UTF-16 (with AVX512).
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char *input, size_t size, char16_t *utf16_output) const noexcept {
+  return utf8_to_utf16::convert_valid<endianness::BIG>(input, size,
+                                                       utf16_output);
+}
 
-    Input:
-    - str           - valid UTF-8 string
-    - len           - string length
-    - out_buffer    - output buffer
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  utf8_to_utf32::validating_transcoder converter;
+  return converter.convert(buf, len, utf32_output);
+}
 
-    Result:
-    - pair.first    - the first unprocessed input byte
-    - pair.second   - the first unprocessed output word
-*/
-template <endianness big_endian, typename OUTPUT>
-std::pair<const char *, OUTPUT *>
-valid_utf8_to_fixed_length(const char *str, size_t len, OUTPUT *dwords) {
-  constexpr bool UTF32 = std::is_same<OUTPUT, uint32_t>::value;
-  constexpr bool UTF16 = std::is_same<OUTPUT, char16_t>::value;
-  static_assert(
-      UTF32 or UTF16,
-      "output type has to be uint32_t (for UTF-32) or char16_t (for UTF-16)");
-  static_assert(!(UTF32 and big_endian),
-                "we do not currently support big-endian UTF-32");
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  utf8_to_utf32::validating_transcoder converter;
+  return converter.convert_with_errors(buf, len, utf32_output);
+}
 
-  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
-  const char *ptr = str;
-  const char *end = ptr + len;
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char *input, size_t size, char32_t *utf32_output) const noexcept {
+  return utf8_to_utf32::convert_valid(input, size, utf32_output);
+}
 
-  OUTPUT *output = dwords;
-  /**
-   * In the main loop, we consume 64 bytes per iteration,
-   * but we access 64 + 4 bytes.
-   * We check for ptr + 64 + 64 <= end because
-   * we want to be do maskless writes without overruns.
-   */
-  while (end - ptr >= 64 + 4) {
-    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
-    const __m512i v_80 = _mm512_set1_epi8(char(0x80));
-    const __mmask64 ascii = _mm512_test_epi8_mask(utf8, v_80);
-    if (ascii == 0) {
-      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
-      output += 64;
-      ptr += 64;
-      continue;
+simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      haswell::avx2_convert_utf16_to_latin1<endianness::LITTLE>(buf, len,
+                                                                latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - latin1_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_latin1::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
     }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
 
-    const __m512i lane0 = broadcast_epi128<0>(utf8);
-    const __m512i lane1 = broadcast_epi128<1>(utf8);
-    int valid_count0;
-    __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
-    const __m512i lane2 = broadcast_epi128<2>(utf8);
-    int valid_count1;
-    __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
-    if (valid_count0 + valid_count1 <= 16) {
-      vec0 = _mm512_mask_expand_epi32(
-          vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
-      valid_count0 += valid_count1;
-      vec0 = expand_utf8_to_utf32(vec0);
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-    } else {
-      vec0 = expand_utf8_to_utf32(vec0);
-      vec1 = expand_utf8_to_utf32(vec1);
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
+simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      haswell::avx2_convert_utf16_to_latin1<endianness::BIG>(buf, len,
+                                                             latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - latin1_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_latin1::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
     }
-    const __m512i lane3 = broadcast_epi128<3>(utf8);
-    int valid_count2;
-    __m512i vec2 = expand_and_identify(lane2, lane3, valid_count2);
-    uint32_t tmp1;
-    ::memcpy(&tmp1, ptr + 64, sizeof(tmp1));
-    const __m512i lane4 = _mm512_set1_epi32(tmp1);
-    int valid_count3;
-    __m512i vec3 = expand_and_identify(lane3, lane4, valid_count3);
-    if (valid_count2 + valid_count3 <= 16) {
-      vec2 = _mm512_mask_expand_epi32(
-          vec2, __mmask16(((1 << valid_count3) - 1) << valid_count2), vec3);
-      valid_count2 += valid_count3;
-      vec2 = expand_utf8_to_utf32(vec2);
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused result
+implementation::convert_utf16le_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      avx2_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
+          buf, len, latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
     } else {
-      vec2 = expand_utf8_to_utf32(vec2);
-      vec3 = expand_utf8_to_utf32(vec3);
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec3, valid_count3, true)
+      ret.second += scalar_res.count;
     }
-    ptr += 4 * 16;
   }
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
 
-  if (end - ptr >= 64) {
-    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
-    const __m512i v_80 = _mm512_set1_epi8(char(0x80));
-    const __mmask64 ascii = _mm512_test_epi8_mask(utf8, v_80);
-    if (ascii == 0) {
-      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
-      output += 64;
-      ptr += 64;
+simdutf_warn_unused result
+implementation::convert_utf16be_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      avx2_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len,
+                                                                latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
     } else {
-      const __m512i lane0 = broadcast_epi128<0>(utf8);
-      const __m512i lane1 = broadcast_epi128<1>(utf8);
-      int valid_count0;
-      __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
-      const __m512i lane2 = broadcast_epi128<2>(utf8);
-      int valid_count1;
-      __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
-      if (valid_count0 + valid_count1 <= 16) {
-        vec0 = _mm512_mask_expand_epi32(
-            vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
-        valid_count0 += valid_count1;
-        vec0 = expand_utf8_to_utf32(vec0);
-        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-      } else {
-        vec0 = expand_utf8_to_utf32(vec0);
-        vec1 = expand_utf8_to_utf32(vec1);
-        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
-      }
-
-      const __m512i lane3 = broadcast_epi128<3>(utf8);
-      SIMDUTF_ICELAKE_TRANSCODE16(lane2, lane3, true)
-
-      ptr += 3 * 16;
+      ret.second += scalar_res.count;
     }
   }
-  return {ptr, output};
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
+  return ret.first;
 }
 
-using utf8_to_utf16_result = std::pair<const char *, char16_t *>;
-/* end file src/icelake/icelake_from_valid_utf8.inl.cpp */
-/* begin file src/icelake/icelake_utf8_validation.inl.cpp */
-// file included directly
-
-simdutf_really_inline __m512i check_special_cases(__m512i input,
-                                                  const __m512i prev1) {
-  __m512i mask1 = _mm512_setr_epi64(0x0202020202020202, 0x4915012180808080,
-                                    0x0202020202020202, 0x4915012180808080,
-                                    0x0202020202020202, 0x4915012180808080,
-                                    0x0202020202020202, 0x4915012180808080);
-  const __m512i v_0f = _mm512_set1_epi8(0x0f);
-  __m512i index1 = _mm512_and_si512(_mm512_srli_epi16(prev1, 4), v_0f);
-
-  __m512i byte_1_high = _mm512_shuffle_epi8(mask1, index1);
-  __m512i mask2 = _mm512_setr_epi64(0xcbcbcb8b8383a3e7, 0xcbcbdbcbcbcbcbcb,
-                                    0xcbcbcb8b8383a3e7, 0xcbcbdbcbcbcbcbcb,
-                                    0xcbcbcb8b8383a3e7, 0xcbcbdbcbcbcbcbcb,
-                                    0xcbcbcb8b8383a3e7, 0xcbcbdbcbcbcbcbcb);
-  __m512i index2 = _mm512_and_si512(prev1, v_0f);
-
-  __m512i byte_1_low = _mm512_shuffle_epi8(mask2, index2);
-  __m512i mask3 =
-      _mm512_setr_epi64(0x101010101010101, 0x1010101babaaee6, 0x101010101010101,
-                        0x1010101babaaee6, 0x101010101010101, 0x1010101babaaee6,
-                        0x101010101010101, 0x1010101babaaee6);
-  __m512i index3 = _mm512_and_si512(_mm512_srli_epi16(input, 4), v_0f);
-  __m512i byte_2_high = _mm512_shuffle_epi8(mask3, index3);
-  return _mm512_ternarylogic_epi64(byte_1_high, byte_1_low, byte_2_high, 128);
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  // optimization opportunity: implement a custom function
+  return convert_utf16be_to_latin1(buf, len, latin1_output);
 }
 
-simdutf_really_inline __m512i check_multibyte_lengths(const __m512i input,
-                                                      const __m512i prev_input,
-                                                      const __m512i sc) {
-  __m512i prev2 = prev<2>(input, prev_input);
-  __m512i prev3 = prev<3>(input, prev_input);
-  __m512i is_third_byte = _mm512_subs_epu8(
-      prev2, _mm512_set1_epi8(0b11100000u - 1)); // Only 111_____ will be > 0
-  __m512i is_fourth_byte = _mm512_subs_epu8(
-      prev3, _mm512_set1_epi8(0b11110000u - 1)); // Only 1111____ will be > 0
-  __m512i is_third_or_fourth_byte =
-      _mm512_or_si512(is_third_byte, is_fourth_byte);
-  const __m512i v_7f = _mm512_set1_epi8(char(0x7f));
-  is_third_or_fourth_byte = _mm512_adds_epu8(v_7f, is_third_or_fourth_byte);
-  // We want to compute (is_third_or_fourth_byte AND v80) XOR sc.
-  const __m512i v_80 = _mm512_set1_epi8(char(0x80));
-  return _mm512_ternarylogic_epi32(is_third_or_fourth_byte, v_80, sc,
-                                   0b1101010);
-  //__m512i is_third_or_fourth_byte_mask =
-  //_mm512_and_si512(is_third_or_fourth_byte, v_80); return
-  // _mm512_xor_si512(is_third_or_fourth_byte_mask, sc);
-}
-//
-// Return nonzero if there are incomplete multibyte characters at the end of the
-// block: e.g. if there is a 4-byte character, but it is 3 bytes from the end.
-//
-simdutf_really_inline __m512i is_incomplete(const __m512i input) {
-  // If the previous input's last 3 bytes match this, they're too short (they
-  // ended at EOF):
-  // ... 1111____ 111_____ 11______
-  __m512i max_value = _mm512_setr_epi64(0xffffffffffffffff, 0xffffffffffffffff,
-                                        0xffffffffffffffff, 0xffffffffffffffff,
-                                        0xffffffffffffffff, 0xffffffffffffffff,
-                                        0xffffffffffffffff, 0xbfdfefffffffffff);
-  return _mm512_subs_epu8(input, max_value);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  // optimization opportunity: implement a custom function
+  return convert_utf16le_to_latin1(buf, len, latin1_output);
 }
 
-struct avx512_utf8_checker {
-  // If this is nonzero, there has been a UTF-8 error.
-  __m512i error{};
-
-  // The last input we received
-  __m512i prev_input_block{};
-  // Whether the last input we received was incomplete (used for ASCII fast
-  // path)
-  __m512i prev_incomplete{};
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      haswell::avx2_convert_utf16_to_utf8<endianness::LITTLE>(buf, len,
+                                                              utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf8_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf8::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
 
-  //
-  // Check whether the current bytes are valid UTF-8.
-  //
-  simdutf_really_inline void check_utf8_bytes(const __m512i input,
-                                              const __m512i prev_input) {
-    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
-    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
-    // small negative numbers)
-    __m512i prev1 = prev<1>(input, prev_input);
-    __m512i sc = check_special_cases(input, prev1);
-    this->error = _mm512_or_si512(
-        check_multibyte_lengths(input, prev_input, sc), this->error);
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      haswell::avx2_convert_utf16_to_utf8<endianness::BIG>(buf, len,
+                                                           utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf8_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf8::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
   }
+  return saved_bytes;
+}
 
-  // The only problem that can happen at EOF is that a multibyte character is
-  // too short or a byte value too large in the last bytes: check_special_cases
-  // only checks for bytes too large in the first of two bytes.
-  simdutf_really_inline void check_eof() {
-    // If the previous block had incomplete UTF-8 characters at the end, an
-    // ASCII block can't possibly finish them.
-    this->error = _mm512_or_si512(this->error, this->prev_incomplete);
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      haswell::avx2_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(
+          buf, len, utf8_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
   }
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
 
-  // returns true if ASCII.
-  simdutf_really_inline bool check_next_input(const __m512i input) {
-    const __m512i v_80 = _mm512_set1_epi8(char(0x80));
-    const __mmask64 ascii = _mm512_test_epi8_mask(input, v_80);
-    if (ascii == 0) {
-      this->error = _mm512_or_si512(this->error, this->prev_incomplete);
-      return true;
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      haswell::avx2_convert_utf16_to_utf8_with_errors<endianness::BIG>(
+          buf, len, utf8_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
     } else {
-      this->check_utf8_bytes(input, this->prev_input_block);
-      this->prev_incomplete = is_incomplete(input);
-      this->prev_input_block = input;
-      return false;
+      ret.second += scalar_res.count;
     }
   }
-  // do not forget to call check_eof!
-  simdutf_really_inline bool errors() const {
-    return _mm512_test_epi8_mask(this->error, this->error) != 0;
-  }
-}; // struct avx512_utf8_checker
-/* end file src/icelake/icelake_utf8_validation.inl.cpp */
-/* begin file src/icelake/icelake_from_utf8.inl.cpp */
-// file included directly
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
 
-// File contains conversion procedure from possibly invalid UTF-8 strings.
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return convert_utf16le_to_utf8(buf, len, utf8_output);
+}
 
-/**
- * Attempts to convert up to len 1-byte code units from in (in UTF-8 format) to
- * out.
- * Returns the position of the input and output after the processing is
- * completed. Upon error, the output is set to null.
- */
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return convert_utf16be_to_utf8(buf, len, utf8_output);
+}
 
-template <endianness big_endian>
-utf8_to_utf16_result
-fast_avx512_convert_utf8_to_utf16(const char *in, size_t len, char16_t *out) {
-  const char *const final_in = in + len;
-  bool result = true;
-  while (result) {
-    if (final_in - in >= 64) {
-      result = process_block_utf8_to_utf16<SIMDUTF_FULL, big_endian>(
-          in, out, final_in - in);
-    } else if (in < final_in) {
-      result = process_block_utf8_to_utf16<SIMDUTF_TAIL, big_endian>(
-          in, out, final_in - in);
-    } else {
-      break;
-    }
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      avx2_convert_utf32_to_utf8(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
   }
-  if (!result) {
-    out = nullptr;
+  size_t saved_bytes = ret.second - utf8_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes = scalar::utf32_to_utf8::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
   }
-  return std::make_pair(in, out);
+  return saved_bytes;
 }
 
-template <endianness big_endian>
-simdutf::result fast_avx512_convert_utf8_to_utf16_with_errors(const char *in,
-                                                              size_t len,
-                                                              char16_t *out) {
-  const char *const init_in = in;
-  const char16_t *const init_out = out;
-  const char *const final_in = in + len;
-  bool result = true;
-  while (result) {
-    if (final_in - in >= 64) {
-      result = process_block_utf8_to_utf16<SIMDUTF_FULL, big_endian>(
-          in, out, final_in - in);
-    } else if (in < final_in) {
-      result = process_block_utf8_to_utf16<SIMDUTF_TAIL, big_endian>(
-          in, out, final_in - in);
-    } else {
-      break;
-    }
+simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      avx2_convert_utf32_to_latin1(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
   }
-  if (!result) {
-    size_t pos = size_t(in - init_in);
-    if (pos < len && (init_in[pos] & 0xc0) == 0x80 && pos >= 64) {
-      // We must check whether we are the fourth continuation byte
-      bool c1 = (init_in[pos - 1] & 0xc0) == 0x80;
-      bool c2 = (init_in[pos - 2] & 0xc0) == 0x80;
-      bool c3 = (init_in[pos - 3] & 0xc0) == 0x80;
-      if (c1 && c2 && c3) {
-        return {simdutf::TOO_LONG, pos};
-      }
+  size_t saved_bytes = ret.second - latin1_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
     }
-    // rewind_and_convert_with_errors will seek a potential error from in
-    // onward, with the ability to go back up to in - init_in bytes, and read
-    // final_in - in bytes forward.
-    simdutf::result res =
-        scalar::utf8_to_utf16::rewind_and_convert_with_errors<big_endian>(
-            in - init_in, in, final_in - in, out);
-    res.count += (in - init_in);
-    return res;
-  } else {
-    return simdutf::result(error_code::SUCCESS, out - init_out);
+    saved_bytes += scalar_saved_bytes;
   }
+  return saved_bytes;
 }
 
-template <endianness big_endian, typename OUTPUT>
-// todo: replace with the utf-8 to utf-16 routine adapted to utf-32. This code
-// is legacy.
-std::pair<const char *, OUTPUT *>
-validating_utf8_to_fixed_length(const char *str, size_t len, OUTPUT *dwords) {
-  constexpr bool UTF32 = std::is_same<OUTPUT, uint32_t>::value;
-  constexpr bool UTF16 = std::is_same<OUTPUT, char16_t>::value;
-  static_assert(
-      UTF32 or UTF16,
-      "output type has to be uint32_t (for UTF-32) or char16_t (for UTF-16)");
-  static_assert(!(UTF32 and big_endian),
-                "we do not currently support big-endian UTF-32");
-
-  const char *ptr = str;
-  const char *end = ptr + len;
-  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
-  OUTPUT *output = dwords;
-  avx512_utf8_checker checker{};
-  /**
-   * In the main loop, we consume 64 bytes per iteration,
-   * but we access 64 + 4 bytes.
-   * We use masked writes to avoid overruns, see
-   * https://github.com/simdutf/simdutf/issues/471
-   */
-  while (end - ptr >= 64 + 4) {
-    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
-    if (checker.check_next_input(utf8)) {
-      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
-      output += 64;
-      ptr += 64;
-      continue;
-    }
-    const __m512i lane0 = broadcast_epi128<0>(utf8);
-    const __m512i lane1 = broadcast_epi128<1>(utf8);
-    int valid_count0;
-    __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
-    const __m512i lane2 = broadcast_epi128<2>(utf8);
-    int valid_count1;
-    __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
-    if (valid_count0 + valid_count1 <= 16) {
-      vec0 = _mm512_mask_expand_epi32(
-          vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
-      valid_count0 += valid_count1;
-      vec0 = expand_utf8_to_utf32(vec0);
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-    } else {
-      vec0 = expand_utf8_to_utf32(vec0);
-      vec1 = expand_utf8_to_utf32(vec1);
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
-    }
-    const __m512i lane3 = broadcast_epi128<3>(utf8);
-    int valid_count2;
-    __m512i vec2 = expand_and_identify(lane2, lane3, valid_count2);
-    uint32_t tmp1;
-    ::memcpy(&tmp1, ptr + 64, sizeof(tmp1));
-    const __m512i lane4 = _mm512_set1_epi32(tmp1);
-    int valid_count3;
-    __m512i vec3 = expand_and_identify(lane3, lane4, valid_count3);
-    if (valid_count2 + valid_count3 <= 16) {
-      vec2 = _mm512_mask_expand_epi32(
-          vec2, __mmask16(((1 << valid_count3) - 1) << valid_count2), vec3);
-      valid_count2 += valid_count3;
-      vec2 = expand_utf8_to_utf32(vec2);
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
+simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      avx2_convert_utf32_to_latin1_with_errors(buf, len, latin1_output);
+  if (ret.first.count != len) {
+    result scalar_res = scalar::utf32_to_latin1::convert_with_errors(
+        buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
     } else {
-      vec2 = expand_utf8_to_utf32(vec2);
-      vec3 = expand_utf8_to_utf32(vec3);
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec3, valid_count3, true)
+      ret.second += scalar_res.count;
     }
-    ptr += 4 * 16;
   }
-  const char *validatedptr = ptr; // validated up to ptr
-
-  // For the final pass, we validate 64 bytes, but we only transcode
-  // 3*16 bytes, so we may end up double-validating 16 bytes.
-  if (end - ptr >= 64) {
-    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
-    if (checker.check_next_input(utf8)) {
-      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
-      output += 64;
-      ptr += 64;
-    } else {
-      const __m512i lane0 = broadcast_epi128<0>(utf8);
-      const __m512i lane1 = broadcast_epi128<1>(utf8);
-      int valid_count0;
-      __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
-      const __m512i lane2 = broadcast_epi128<2>(utf8);
-      int valid_count1;
-      __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
-      if (valid_count0 + valid_count1 <= 16) {
-        vec0 = _mm512_mask_expand_epi32(
-            vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
-        valid_count0 += valid_count1;
-        vec0 = expand_utf8_to_utf32(vec0);
-        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-      } else {
-        vec0 = expand_utf8_to_utf32(vec0);
-        vec1 = expand_utf8_to_utf32(vec1);
-        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
-      }
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
 
-      const __m512i lane3 = broadcast_epi128<3>(utf8);
-      SIMDUTF_ICELAKE_TRANSCODE16(lane2, lane3, true)
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  return convert_utf32_to_latin1(buf, len, latin1_output);
+}
 
-      ptr += 3 * 16;
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      haswell::avx2_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
+  if (ret.first.count != len) {
+    result scalar_res = scalar::utf32_to_utf8::convert_with_errors(
+        buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
     }
-    validatedptr += 4 * 16;
   }
-  if (end != validatedptr) {
-    const __m512i utf8 =
-        _mm512_maskz_loadu_epi8(~UINT64_C(0) >> (64 - (end - validatedptr)),
-                                (const __m512i *)validatedptr);
-    checker.check_next_input(utf8);
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
+
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char16_t *, char32_t *> ret =
+      haswell::avx2_convert_utf16_to_utf32<endianness::LITTLE>(buf, len,
+                                                               utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
   }
-  checker.check_eof();
-  if (checker.errors()) {
-    return {ptr, nullptr}; // We found an error.
+  size_t saved_bytes = ret.second - utf32_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
   }
-  return {ptr, output};
+  return saved_bytes;
 }
 
-// Like validating_utf8_to_fixed_length but returns as soon as an error is
-// identified todo: replace with the utf-8 to utf-16 routine adapted to utf-32.
-// This code is legacy.
-template <endianness big_endian, typename OUTPUT>
-std::tuple<const char *, OUTPUT *, bool>
-validating_utf8_to_fixed_length_with_constant_checks(const char *str,
-                                                     size_t len,
-                                                     OUTPUT *dwords) {
-  constexpr bool UTF32 = std::is_same<OUTPUT, uint32_t>::value;
-  constexpr bool UTF16 = std::is_same<OUTPUT, char16_t>::value;
-  static_assert(
-      UTF32 or UTF16,
-      "output type has to be uint32_t (for UTF-32) or char16_t (for UTF-16)");
-  static_assert(!(UTF32 and big_endian),
-                "we do not currently support big-endian UTF-32");
-
-  const char *ptr = str;
-  const char *end = ptr + len;
-  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
-  OUTPUT *output = dwords;
-  avx512_utf8_checker checker{};
-  /**
-   * In the main loop, we consume 64 bytes per iteration,
-   * but we access 64 + 4 bytes.
-   */
-  while (end - ptr >= 4 + 64) {
-    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
-    bool ascii = checker.check_next_input(utf8);
-    if (checker.errors()) {
-      return {ptr, output, false}; // We found an error.
-    }
-    if (ascii) {
-      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
-      output += 64;
-      ptr += 64;
-      continue;
-    }
-    const __m512i lane0 = broadcast_epi128<0>(utf8);
-    const __m512i lane1 = broadcast_epi128<1>(utf8);
-    int valid_count0;
-    __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
-    const __m512i lane2 = broadcast_epi128<2>(utf8);
-    int valid_count1;
-    __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
-    if (valid_count0 + valid_count1 <= 16) {
-      vec0 = _mm512_mask_expand_epi32(
-          vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
-      valid_count0 += valid_count1;
-      vec0 = expand_utf8_to_utf32(vec0);
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-    } else {
-      vec0 = expand_utf8_to_utf32(vec0);
-      vec1 = expand_utf8_to_utf32(vec1);
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
-    }
-    const __m512i lane3 = broadcast_epi128<3>(utf8);
-    int valid_count2;
-    __m512i vec2 = expand_and_identify(lane2, lane3, valid_count2);
-    uint32_t tmp1;
-    ::memcpy(&tmp1, ptr + 64, sizeof(tmp1));
-    const __m512i lane4 = _mm512_set1_epi32(tmp1);
-    int valid_count3;
-    __m512i vec3 = expand_and_identify(lane3, lane4, valid_count3);
-    if (valid_count2 + valid_count3 <= 16) {
-      vec2 = _mm512_mask_expand_epi32(
-          vec2, __mmask16(((1 << valid_count3) - 1) << valid_count2), vec3);
-      valid_count2 += valid_count3;
-      vec2 = expand_utf8_to_utf32(vec2);
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
-    } else {
-      vec2 = expand_utf8_to_utf32(vec2);
-      vec3 = expand_utf8_to_utf32(vec3);
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec2, valid_count2, true)
-      SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec3, valid_count3, true)
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char16_t *, char32_t *> ret =
+      haswell::avx2_convert_utf16_to_utf32<endianness::BIG>(buf, len,
+                                                            utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf32_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
     }
-    ptr += 4 * 16;
+    saved_bytes += scalar_saved_bytes;
   }
-  const char *validatedptr = ptr; // validated up to ptr
+  return saved_bytes;
+}
 
-  // For the final pass, we validate 64 bytes, but we only transcode
-  // 3*16 bytes, so we may end up double-validating 16 bytes.
-  if (end - ptr >= 64) {
-    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
-    bool ascii = checker.check_next_input(utf8);
-    if (checker.errors()) {
-      return {ptr, output, false}; // We found an error.
-    }
-    if (ascii) {
-      SIMDUTF_ICELAKE_STORE_ASCII(UTF32, utf8, output)
-      output += 64;
-      ptr += 64;
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char32_t *> ret =
+      haswell::avx2_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(
+          buf, len, utf32_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
     } else {
-      const __m512i lane0 = broadcast_epi128<0>(utf8);
-      const __m512i lane1 = broadcast_epi128<1>(utf8);
-      int valid_count0;
-      __m512i vec0 = expand_and_identify(lane0, lane1, valid_count0);
-      const __m512i lane2 = broadcast_epi128<2>(utf8);
-      int valid_count1;
-      __m512i vec1 = expand_and_identify(lane1, lane2, valid_count1);
-      if (valid_count0 + valid_count1 <= 16) {
-        vec0 = _mm512_mask_expand_epi32(
-            vec0, __mmask16(((1 << valid_count1) - 1) << valid_count0), vec1);
-        valid_count0 += valid_count1;
-        vec0 = expand_utf8_to_utf32(vec0);
-        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-      } else {
-        vec0 = expand_utf8_to_utf32(vec0);
-        vec1 = expand_utf8_to_utf32(vec1);
-        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec0, valid_count0, true)
-        SIMDUTF_ICELAKE_WRITE_UTF16_OR_UTF32(vec1, valid_count1, true)
-      }
-
-      const __m512i lane3 = broadcast_epi128<3>(utf8);
-      SIMDUTF_ICELAKE_TRANSCODE16(lane2, lane3, true)
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      utf32_output; // Set count to the number of 32-bit code units written
+  return ret.first;
+}
 
-      ptr += 3 * 16;
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char32_t *> ret =
+      haswell::avx2_convert_utf16_to_utf32_with_errors<endianness::BIG>(
+          buf, len, utf32_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
     }
-    validatedptr += 4 * 16;
-  }
-  if (end != validatedptr) {
-    const __m512i utf8 =
-        _mm512_maskz_loadu_epi8(~UINT64_C(0) >> (64 - (end - validatedptr)),
-                                (const __m512i *)validatedptr);
-    checker.check_next_input(utf8);
-  }
-  checker.check_eof();
-  if (checker.errors()) {
-    return {ptr, output, false}; // We found an error.
   }
-  return {ptr, output, true};
+  ret.first.count =
+      ret.second -
+      utf32_output; // Set count to the number of 32-bit code units written
+  return ret.first;
 }
-/* end file src/icelake/icelake_from_utf8.inl.cpp */
-/* begin file src/icelake/icelake_convert_utf8_to_latin1.inl.cpp */
-// file included directly
 
-// File contains conversion procedure from possibly invalid UTF-8 strings.
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  return convert_utf32_to_utf8(buf, len, utf8_output);
+}
 
-template <bool is_remaining>
-simdutf_really_inline size_t process_block_from_utf8_to_latin1(
-    const char *buf, size_t len, char *latin_output, __m512i minus64,
-    __m512i one, __mmask64 *next_leading_ptr, __mmask64 *next_bit6_ptr) {
-  __mmask64 load_mask =
-      is_remaining ? _bzhi_u64(~0ULL, (unsigned int)len) : ~0ULL;
-  __m512i input = _mm512_maskz_loadu_epi8(load_mask, (__m512i *)buf);
-  __mmask64 nonascii = _mm512_movepi8_mask(input);
-  if (nonascii == 0) {
-    if (*next_leading_ptr) { // If we ended with a leading byte, it is an error.
-      return 0;              // Indicates error
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      avx2_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
     }
-    is_remaining
-        ? _mm512_mask_storeu_epi8((__m512i *)latin_output, load_mask, input)
-        : _mm512_storeu_si512((__m512i *)latin_output, input);
-    return len;
+    saved_bytes += scalar_saved_bytes;
   }
+  return saved_bytes;
+}
 
-  const __mmask64 leading = _mm512_cmpge_epu8_mask(input, minus64);
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      avx2_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
 
-  __m512i highbits = _mm512_xor_si512(input, _mm512_set1_epi8(-62));
-  __mmask64 invalid_leading_bytes =
-      _mm512_mask_cmpgt_epu8_mask(leading, highbits, one);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      haswell::avx2_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(
+          buf, len, utf16_output);
+  if (ret.first.count != len) {
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
+  return ret.first;
+}
 
-  if (invalid_leading_bytes) {
-    return 0; // Indicates error
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      haswell::avx2_convert_utf32_to_utf16_with_errors<endianness::BIG>(
+          buf, len, utf16_output);
+  if (ret.first.count != len) {
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
   }
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
+  return ret.first;
+}
 
-  __mmask64 leading_shift = (leading << 1) | *next_leading_ptr;
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return convert_utf32_to_utf16le(buf, len, utf16_output);
+}
 
-  if ((nonascii ^ leading) != leading_shift) {
-    return 0; // Indicates error
-  }
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return convert_utf32_to_utf16be(buf, len, utf16_output);
+}
 
-  const __mmask64 bit6 = _mm512_cmpeq_epi8_mask(highbits, one);
-  input =
-      _mm512_mask_sub_epi8(input, (bit6 << 1) | *next_bit6_ptr, input, minus64);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return convert_utf16le_to_utf32(buf, len, utf32_output);
+}
 
-  __mmask64 retain = ~leading & load_mask;
-  __m512i output = _mm512_maskz_compress_epi8(retain, input);
-  int64_t written_out = count_ones(retain);
-  if (written_out == 0) {
-    return 0; // Indicates error
-  }
-  *next_bit6_ptr = bit6 >> 63;
-  *next_leading_ptr = leading >> 63;
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return convert_utf16be_to_utf32(buf, len, utf32_output);
+}
 
-  __mmask64 store_mask = ~UINT64_C(0) >> (64 - written_out);
+void implementation::change_endianness_utf16(const char16_t *input,
+                                             size_t length,
+                                             char16_t *output) const noexcept {
+  utf16::change_endianness_utf16(input, length, output);
+}
 
-  _mm512_mask_storeu_epi8((__m512i *)latin_output, store_mask, output);
+simdutf_warn_unused size_t implementation::count_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::count_code_points<endianness::LITTLE>(input, length);
+}
 
-  return written_out;
+simdutf_warn_unused size_t implementation::count_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::count_code_points<endianness::BIG>(input, length);
 }
 
-size_t utf8_to_latin1_avx512(const char *&inbuf, size_t len,
-                             char *&inlatin_output) {
-  const char *buf = inbuf;
-  char *latin_output = inlatin_output;
-  char *start = latin_output;
-  size_t pos = 0;
-  __m512i minus64 = _mm512_set1_epi8(-64); // 11111111111 ... 1100 0000
-  __m512i one = _mm512_set1_epi8(1);
-  __mmask64 next_leading = 0;
-  __mmask64 next_bit6 = 0;
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *input, size_t length) const noexcept {
+  return utf8::count_code_points(input, length);
+}
 
-  while (pos + 64 <= len) {
-    size_t written = process_block_from_utf8_to_latin1<false>(
-        buf + pos, 64, latin_output, minus64, one, &next_leading, &next_bit6);
-    if (written == 0) {
-      inlatin_output = latin_output;
-      inbuf = buf + pos - next_leading;
-      return 0; // Indicates error at pos or after, or just before pos (too
-                // short error)
-    }
-    latin_output += written;
-    pos += 64;
-  }
+simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
+    const char *buf, size_t len) const noexcept {
+  return count_utf8(buf, len);
+}
 
-  if (pos < len) {
-    size_t remaining = len - pos;
-    size_t written = process_block_from_utf8_to_latin1<true>(
-        buf + pos, remaining, latin_output, minus64, one, &next_leading,
-        &next_bit6);
-    if (written == 0) {
-      inbuf = buf + pos - next_leading;
-      inlatin_output = latin_output;
-      return 0; // Indicates error at pos or after, or just before pos (too
-                // short error)
-    }
-    latin_output += written;
-  }
-  if (next_leading) {
-    inbuf = buf + len - next_leading;
-    inlatin_output = latin_output;
-    return 0; // Indicates error at end of buffer
-  }
-  inlatin_output = latin_output;
-  inbuf += len;
-  return size_t(latin_output - start);
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf16(size_t length) const noexcept {
+  return scalar::utf16::latin1_length_from_utf16(length);
 }
-/* end file src/icelake/icelake_convert_utf8_to_latin1.inl.cpp */
-/* begin file src/icelake/icelake_convert_valid_utf8_to_latin1.inl.cpp */
-// file included directly
 
-// File contains conversion procedure from valid UTF-8 strings.
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf32(size_t length) const noexcept {
+  return scalar::utf32::latin1_length_from_utf32(length);
+}
 
-template <bool is_remaining>
-simdutf_really_inline size_t process_valid_block_from_utf8_to_latin1(
-    const char *buf, size_t len, char *latin_output, __m512i minus64,
-    __m512i one, __mmask64 *next_leading_ptr, __mmask64 *next_bit6_ptr) {
-  __mmask64 load_mask =
-      is_remaining ? _bzhi_u64(~0ULL, (unsigned int)len) : ~0ULL;
-  __m512i input = _mm512_maskz_loadu_epi8(load_mask, (__m512i *)buf);
-  __mmask64 nonascii = _mm512_movepi8_mask(input);
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf8_length_from_utf16<endianness::LITTLE>(input, length);
+}
 
-  if (nonascii == 0) {
-    is_remaining
-        ? _mm512_mask_storeu_epi8((__m512i *)latin_output, load_mask, input)
-        : _mm512_storeu_si512((__m512i *)latin_output, input);
-    return len;
-  }
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
+}
 
-  __mmask64 leading = _mm512_cmpge_epu8_mask(input, minus64);
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf32_length_from_utf16<endianness::LITTLE>(input, length);
+}
 
-  __m512i highbits = _mm512_xor_si512(input, _mm512_set1_epi8(-62));
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
+}
 
-  *next_leading_ptr = leading >> 63;
+simdutf_warn_unused size_t
+implementation::utf16_length_from_latin1(size_t length) const noexcept {
+  return scalar::latin1::utf16_length_from_latin1(length);
+}
 
-  __mmask64 bit6 = _mm512_cmpeq_epi8_mask(highbits, one);
-  input =
-      _mm512_mask_sub_epi8(input, (bit6 << 1) | *next_bit6_ptr, input, minus64);
-  *next_bit6_ptr = bit6 >> 63;
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  return utf8::utf16_length_from_utf8(input, length);
+}
 
-  __mmask64 retain = ~leading & load_mask;
-  __m512i output = _mm512_maskz_compress_epi8(retain, input);
-  int64_t written_out = count_ones(retain);
-  if (written_out == 0) {
-    return 0; // Indicates error
+simdutf_warn_unused size_t
+implementation::utf32_length_from_latin1(size_t length) const noexcept {
+  return scalar::latin1::utf32_length_from_latin1(length);
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
+    const char *input, size_t len) const noexcept {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(input);
+  size_t answer = len / sizeof(__m256i) * sizeof(__m256i);
+  size_t i = 0;
+  if (answer >= 2048) { // long strings optimization
+    __m256i four_64bits = _mm256_setzero_si256();
+    while (i + sizeof(__m256i) <= len) {
+      __m256i runner = _mm256_setzero_si256();
+      // We can do up to 255 loops without overflow.
+      size_t iterations = (len - i) / sizeof(__m256i);
+      if (iterations > 255) {
+        iterations = 255;
+      }
+      size_t max_i = i + iterations * sizeof(__m256i) - sizeof(__m256i);
+      for (; i + 4 * sizeof(__m256i) <= max_i; i += 4 * sizeof(__m256i)) {
+        __m256i input1 = _mm256_loadu_si256((const __m256i *)(data + i));
+        __m256i input2 =
+            _mm256_loadu_si256((const __m256i *)(data + i + sizeof(__m256i)));
+        __m256i input3 = _mm256_loadu_si256(
+            (const __m256i *)(data + i + 2 * sizeof(__m256i)));
+        __m256i input4 = _mm256_loadu_si256(
+            (const __m256i *)(data + i + 3 * sizeof(__m256i)));
+        __m256i input12 =
+            _mm256_add_epi8(_mm256_cmpgt_epi8(_mm256_setzero_si256(), input1),
+                            _mm256_cmpgt_epi8(_mm256_setzero_si256(), input2));
+        __m256i input23 =
+            _mm256_add_epi8(_mm256_cmpgt_epi8(_mm256_setzero_si256(), input3),
+                            _mm256_cmpgt_epi8(_mm256_setzero_si256(), input4));
+        __m256i input1234 = _mm256_add_epi8(input12, input23);
+        runner = _mm256_sub_epi8(runner, input1234);
+      }
+      for (; i <= max_i; i += sizeof(__m256i)) {
+        __m256i input_256_chunk =
+            _mm256_loadu_si256((const __m256i *)(data + i));
+        runner = _mm256_sub_epi8(
+            runner, _mm256_cmpgt_epi8(_mm256_setzero_si256(), input_256_chunk));
+      }
+      four_64bits = _mm256_add_epi64(
+          four_64bits, _mm256_sad_epu8(runner, _mm256_setzero_si256()));
+    }
+    answer += _mm256_extract_epi64(four_64bits, 0) +
+              _mm256_extract_epi64(four_64bits, 1) +
+              _mm256_extract_epi64(four_64bits, 2) +
+              _mm256_extract_epi64(four_64bits, 3);
+  } else if (answer > 0) {
+    for (; i + sizeof(__m256i) <= len; i += sizeof(__m256i)) {
+      __m256i latin = _mm256_loadu_si256((const __m256i *)(data + i));
+      uint32_t non_ascii = _mm256_movemask_epi8(latin);
+      answer += count_ones(non_ascii);
+    }
   }
-  __mmask64 store_mask = ~UINT64_C(0) >> (64 - written_out);
-  // Optimization opportunity: sometimes, masked writes are not needed.
-  _mm512_mask_storeu_epi8((__m512i *)latin_output, store_mask, output);
-  return written_out;
+  return answer + scalar::latin1::utf8_length_from_latin1(
+                      reinterpret_cast<const char *>(data + i), len - i);
 }
 
-size_t valid_utf8_to_latin1_avx512(const char *buf, size_t len,
-                                   char *latin_output) {
-  char *start = latin_output;
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  const __m256i v_00000000 = _mm256_setzero_si256();
+  const __m256i v_ffffff80 = _mm256_set1_epi32((uint32_t)0xffffff80);
+  const __m256i v_fffff800 = _mm256_set1_epi32((uint32_t)0xfffff800);
+  const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
   size_t pos = 0;
-  __m512i minus64 = _mm512_set1_epi8(-64); // 11111111111 ... 1100 0000
-  __m512i one = _mm512_set1_epi8(1);
-  __mmask64 next_leading = 0;
-  __mmask64 next_bit6 = 0;
+  size_t count = 0;
+  for (; pos + 8 <= length; pos += 8) {
+    __m256i in = _mm256_loadu_si256((__m256i *)(input + pos));
+    const __m256i ascii_bytes_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffffff80), v_00000000);
+    const __m256i one_two_bytes_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_fffff800), v_00000000);
+    const __m256i two_bytes_bytemask =
+        _mm256_xor_si256(one_two_bytes_bytemask, ascii_bytes_bytemask);
+    const __m256i one_two_three_bytes_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+    const __m256i three_bytes_bytemask =
+        _mm256_xor_si256(one_two_three_bytes_bytemask, one_two_bytes_bytemask);
+    const uint32_t ascii_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(ascii_bytes_bytemask));
+    const uint32_t two_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(two_bytes_bytemask));
+    const uint32_t three_bytes_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(three_bytes_bytemask));
 
-  while (pos + 64 <= len) {
-    size_t written = process_valid_block_from_utf8_to_latin1<false>(
-        buf + pos, 64, latin_output, minus64, one, &next_leading, &next_bit6);
-    latin_output += written;
-    pos += 64;
+    size_t ascii_count = count_ones(ascii_bytes_bitmask) / 4;
+    size_t two_bytes_count = count_ones(two_bytes_bitmask) / 4;
+    size_t three_bytes_count = count_ones(three_bytes_bitmask) / 4;
+    count += 32 - 3 * ascii_count - 2 * two_bytes_count - three_bytes_count;
   }
+  return count +
+         scalar::utf32::utf8_length_from_utf32(input + pos, length - pos);
+}
 
-  if (pos < len) {
-    size_t remaining = len - pos;
-    size_t written = process_valid_block_from_utf8_to_latin1<true>(
-        buf + pos, remaining, latin_output, minus64, one, &next_leading,
-        &next_bit6);
-    latin_output += written;
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  const __m256i v_00000000 = _mm256_setzero_si256();
+  const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos + 8 <= length; pos += 8) {
+    __m256i in = _mm256_loadu_si256((__m256i *)(input + pos));
+    const __m256i surrogate_bytemask =
+        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+    const uint32_t surrogate_bitmask =
+        static_cast<uint32_t>(_mm256_movemask_epi8(surrogate_bytemask));
+    size_t surrogate_count = (32 - count_ones(surrogate_bitmask)) / 4;
+    count += 8 + surrogate_count;
   }
+  return count +
+         scalar::utf32::utf16_length_from_utf32(input + pos, length - pos);
+}
 
-  return (size_t)(latin_output - start);
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  return utf8::count_code_points(input, length);
 }
-/* end file src/icelake/icelake_convert_valid_utf8_to_latin1.inl.cpp */
-/* begin file src/icelake/icelake_convert_utf16_to_latin1.inl.cpp */
-// file included directly
-template <endianness big_endian>
-size_t icelake_convert_utf16_to_latin1(const char16_t *buf, size_t len,
-                                       char *latin1_output) {
-  const char16_t *end = buf + len;
-  __m512i v_0xFF = _mm512_set1_epi16(0xff);
-  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
-  __m512i shufmask = _mm512_set_epi8(
-      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
-      0, 0, 0, 0, 0, 0, 0, 62, 60, 58, 56, 54, 52, 50, 48, 46, 44, 42, 40, 38,
-      36, 34, 32, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0);
-  while (end - buf >= 32) {
-    __m512i in = _mm512_loadu_si512((__m512i *)buf);
-    if (big_endian) {
-      in = _mm512_shuffle_epi8(in, byteflip);
-    }
-    if (_mm512_cmpgt_epu16_mask(in, v_0xFF)) {
-      return 0;
-    }
-    _mm256_storeu_si256(
-        (__m256i *)latin1_output,
-        _mm512_castsi512_si256(_mm512_permutexvar_epi8(shufmask, in)));
-    latin1_output += 32;
-    buf += 32;
-  }
-  if (buf < end) {
-    uint32_t mask(uint32_t(1 << (end - buf)) - 1);
-    __m512i in = _mm512_maskz_loadu_epi16(mask, buf);
-    if (big_endian) {
-      in = _mm512_shuffle_epi8(in, byteflip);
-    }
-    if (_mm512_cmpgt_epu16_mask(in, v_0xFF)) {
-      return 0;
-    }
-    _mm256_mask_storeu_epi8(
-        latin1_output, mask,
-        _mm512_castsi512_si256(_mm512_permutexvar_epi8(shufmask, in)));
-  }
-  return len;
+
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char *input, size_t length) const noexcept {
+  return scalar::base64::maximal_binary_length_from_base64(input, length);
+}
+
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::base64::maximal_binary_length_from_base64(input, length);
+}
+
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
-template <endianness big_endian>
-std::pair<result, char *>
-icelake_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
-                                            char *latin1_output) {
-  const char16_t *end = buf + len;
-  const char16_t *start = buf;
-  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
-  __m512i v_0xFF = _mm512_set1_epi16(0xff);
-  __m512i shufmask = _mm512_set_epi8(
-      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
-      0, 0, 0, 0, 0, 0, 0, 62, 60, 58, 56, 54, 52, 50, 48, 46, 44, 42, 40, 38,
-      36, 34, 32, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0);
-  while (end - buf >= 32) {
-    __m512i in = _mm512_loadu_si512((__m512i *)buf);
-    if (big_endian) {
-      in = _mm512_shuffle_epi8(in, byteflip);
-    }
-    if (_mm512_cmpgt_epu16_mask(in, v_0xFF)) {
-      uint16_t word;
-      while ((word = (big_endian ? scalar::utf16::swap_bytes(uint16_t(*buf))
-                                 : uint16_t(*buf))) <= 0xff) {
-        *latin1_output++ = uint8_t(word);
-        buf++;
-      }
-      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
-                            latin1_output);
-    }
-    _mm256_storeu_si256(
-        (__m256i *)latin1_output,
-        _mm512_castsi512_si256(_mm512_permutexvar_epi8(shufmask, in)));
-    latin1_output += 32;
-    buf += 32;
-  }
-  if (buf < end) {
-    uint32_t mask(uint32_t(1 << (end - buf)) - 1);
-    __m512i in = _mm512_maskz_loadu_epi16(mask, buf);
-    if (big_endian) {
-      in = _mm512_shuffle_epi8(in, byteflip);
-    }
-    if (_mm512_cmpgt_epu16_mask(in, v_0xFF)) {
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
 
-      uint16_t word;
-      while ((word = (big_endian ? scalar::utf16::swap_bytes(uint16_t(*buf))
-                                 : uint16_t(*buf))) <= 0xff) {
-        *latin1_output++ = uint8_t(word);
-        buf++;
-      }
-      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
-                            latin1_output);
-    }
-    _mm256_mask_storeu_epi8(
-        latin1_output, mask,
-        _mm512_castsi512_si256(_mm512_permutexvar_epi8(shufmask, in)));
+simdutf_warn_unused size_t implementation::base64_length_from_binary(
+    size_t length, base64_options options) const noexcept {
+  return scalar::base64::base64_length_from_binary(length, options);
+}
+
+size_t implementation::binary_to_base64(const char *input, size_t length,
+                                        char *output,
+                                        base64_options options) const noexcept {
+  if (options & base64_url) {
+    return encode_base64<true>(output, input, length, options);
+  } else {
+    return encode_base64<false>(output, input, length, options);
   }
-  return std::make_pair(result(error_code::SUCCESS, len), latin1_output);
 }
-/* end file src/icelake/icelake_convert_utf16_to_latin1.inl.cpp */
-/* begin file src/icelake/icelake_convert_utf16_to_utf8.inl.cpp */
-// file included directly
+} // namespace haswell
+} // namespace simdutf
 
-/**
- * This function converts the input (inbuf, inlen), assumed to be valid
- * UTF16 (little endian) into UTF-8 (to outbuf). The number of code units
- * written is written to 'outlen' and the function reports the number of input
- * word consumed.
- */
-template <endianness big_endian>
-size_t utf16_to_utf8_avx512i(const char16_t *inbuf, size_t inlen,
-                             unsigned char *outbuf, size_t *outlen) {
-  __m512i in;
-  __mmask32 inmask = _cvtu32_mask32(0x7fffffff);
-  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
-  const char16_t *const inbuf_orig = inbuf;
-  const unsigned char *const outbuf_orig = outbuf;
-  int adjust = 0;
-  int carry = 0;
+/* begin file src/simdutf/haswell/end.h */
+#if SIMDUTF_CAN_ALWAYS_RUN_HASWELL
+// nothing needed.
+#else
+SIMDUTF_UNTARGET_REGION
+#endif
 
-  while (inlen >= 32) {
-    in = _mm512_loadu_si512(inbuf);
-    if (big_endian) {
-      in = _mm512_shuffle_epi8(in, byteflip);
-    }
-    inlen -= 31;
-  lastiteration:
-    inbuf += 31;
 
-  failiteration:
-    const __mmask32 is234byte = _mm512_mask_cmp_epu16_mask(
-        inmask, in, _mm512_set1_epi16(0x0080), _MM_CMPINT_NLT);
+#if SIMDUTF_GCC11ORMORE // workaround for
+                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
+SIMDUTF_POP_DISABLE_WARNINGS
+#endif // end of workaround
+/* end file src/simdutf/haswell/end.h */
+/* end file src/haswell/implementation.cpp */
+#endif
+#if SIMDUTF_IMPLEMENTATION_PPC64
+/* begin file src/ppc64/implementation.cpp */
 
-    if (_ktestz_mask32_u8(inmask, is234byte)) {
-      // fast path for ASCII only
-      _mm512_mask_cvtepi16_storeu_epi8(outbuf, inmask, in);
-      outbuf += 31;
-      carry = 0;
 
-      if (inlen < 32) {
-        goto tail;
-      } else {
-        continue;
-      }
-    }
 
-    const __mmask32 is12byte =
-        _mm512_cmp_epu16_mask(in, _mm512_set1_epi16(0x0800), _MM_CMPINT_LT);
 
-    if (_ktestc_mask32_u8(is12byte, inmask)) {
-      // fast path for 1 and 2 byte only
 
-      const __m512i twobytes = _mm512_ternarylogic_epi32(
-          _mm512_slli_epi16(in, 8), _mm512_srli_epi16(in, 6),
-          _mm512_set1_epi16(0x3f3f), 0xa8); // (A|B)&C
-      in = _mm512_mask_add_epi16(in, is234byte, twobytes,
-                                 _mm512_set1_epi16(int16_t(0x80c0)));
-      const __m512i cmpmask =
-          _mm512_mask_blend_epi16(inmask, _mm512_set1_epi16(int16_t(0xffff)),
-                                  _mm512_set1_epi16(0x0800));
-      const __mmask64 smoosh =
-          _mm512_cmp_epu8_mask(in, cmpmask, _MM_CMPINT_NLT);
-      const __m512i out = _mm512_maskz_compress_epi8(smoosh, in);
-      _mm512_mask_storeu_epi8(outbuf,
-                              _cvtu64_mask64(_pext_u64(_cvtmask64_u64(smoosh),
-                                                       _cvtmask64_u64(smoosh))),
-                              out);
-      outbuf += 31 + _mm_popcnt_u32(_cvtmask32_u32(is234byte));
-      carry = 0;
+/* begin file src/simdutf/ppc64/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "ppc64"
+// #define SIMDUTF_IMPLEMENTATION ppc64
+/* end file src/simdutf/ppc64/begin.h */
+namespace simdutf {
+namespace ppc64 {
+namespace {
+#ifndef SIMDUTF_PPC64_H
+  #error "ppc64.h must be included"
+#endif
+using namespace simd;
 
-      if (inlen < 32) {
-        goto tail;
-      } else {
-        continue;
-      }
-    }
-    __m512i lo = _mm512_cvtepu16_epi32(_mm512_castsi512_si256(in));
-    __m512i hi = _mm512_cvtepu16_epi32(_mm512_extracti32x8_epi32(in, 1));
+simdutf_really_inline bool is_ascii(const simd8x64<uint8_t> &input) {
+  // careful: 0x80 is not ascii.
+  return input.reduce_or().saturating_sub(0b01111111u).bits_not_set_anywhere();
+}
 
-    __m512i taglo = _mm512_set1_epi32(0x8080e000);
-    __m512i taghi = taglo;
+simdutf_unused simdutf_really_inline simd8<bool>
+must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2,
+                     const simd8<uint8_t> prev3) {
+  simd8<uint8_t> is_second_byte =
+      prev1.saturating_sub(0b11000000u - 1); // Only 11______ will be > 0
+  simd8<uint8_t> is_third_byte =
+      prev2.saturating_sub(0b11100000u - 1); // Only 111_____ will be > 0
+  simd8<uint8_t> is_fourth_byte =
+      prev3.saturating_sub(0b11110000u - 1); // Only 1111____ will be > 0
+  // Caller requires a bool (all 1's). All values resulting from the subtraction
+  // will be <= 64, so signed comparison is fine.
+  return simd8<int8_t>(is_second_byte | is_third_byte | is_fourth_byte) >
+         int8_t(0);
+}
 
-    const __m512i fc00masked =
-        _mm512_and_epi32(in, _mm512_set1_epi16(int16_t(0xfc00)));
-    const __mmask32 hisurr = _mm512_mask_cmp_epu16_mask(
-        inmask, fc00masked, _mm512_set1_epi16(int16_t(0xd800)), _MM_CMPINT_EQ);
-    const __mmask32 losurr = _mm512_cmp_epu16_mask(
-        fc00masked, _mm512_set1_epi16(int16_t(0xdc00)), _MM_CMPINT_EQ);
+simdutf_really_inline simd8<bool>
+must_be_2_3_continuation(const simd8<uint8_t> prev2,
+                         const simd8<uint8_t> prev3) {
+  simd8<uint8_t> is_third_byte =
+      prev2.saturating_sub(0xe0u - 0x80); // Only 111_____ will be >= 0x80
+  simd8<uint8_t> is_fourth_byte =
+      prev3.saturating_sub(0xf0u - 0x80); // Only 1111____ will be >= 0x80
+  // The high bit (0x80) ends up set only where prev2 is 111_____ or prev3 is
+  // 1111____, so the OR is converted to a mask directly, with no comparison.
+  return simd8<bool>(is_third_byte | is_fourth_byte);
+}
 
-    int carryout = 0;
-    if (!_kortestz_mask32_u8(hisurr, losurr)) {
-      // handle surrogates
+} // unnamed namespace
+} // namespace ppc64
+} // namespace simdutf
 
-      __m512i los = _mm512_alignr_epi32(hi, lo, 1);
-      __m512i his = _mm512_alignr_epi32(lo, hi, 1);
+/* begin file src/generic/buf_block_reader.h */
+namespace simdutf {
+namespace ppc64 {
+namespace {
 
-      const __mmask32 hisurrhi = _kshiftri_mask32(hisurr, 16);
-      taglo = _mm512_mask_mov_epi32(taglo, __mmask16(hisurr),
-                                    _mm512_set1_epi32(0x808080f0));
-      taghi = _mm512_mask_mov_epi32(taghi, __mmask16(hisurrhi),
-                                    _mm512_set1_epi32(0x808080f0));
+// Walks through a buffer in block-sized increments, loading the last part with
+// spaces
+template <size_t STEP_SIZE> struct buf_block_reader {
+public:
+  simdutf_really_inline buf_block_reader(const uint8_t *_buf, size_t _len);
+  simdutf_really_inline size_t block_index();
+  simdutf_really_inline bool has_full_block() const;
+  simdutf_really_inline const uint8_t *full_block() const;
+  /**
+   * Get the last block, padded with spaces.
+   *
+   * There will always be a last block, with at least 1 byte, unless len == 0
+   * (in which case this function returns 0 without writing to dst). In
+   * particular, if len == STEP_SIZE there will be 0 full_blocks and 1 remainder
+   * block with STEP_SIZE bytes and no spaces for padding.
+   *
+   * @return the number of effective characters in the last block.
+   */
+  simdutf_really_inline size_t get_remainder(uint8_t *dst) const;
+  simdutf_really_inline void advance();
 
-      lo = _mm512_mask_slli_epi32(lo, __mmask16(hisurr), lo, 10);
-      hi = _mm512_mask_slli_epi32(hi, __mmask16(hisurrhi), hi, 10);
-      los = _mm512_add_epi32(los, _mm512_set1_epi32(0xfca02400));
-      his = _mm512_add_epi32(his, _mm512_set1_epi32(0xfca02400));
-      lo = _mm512_mask_add_epi32(lo, __mmask16(hisurr), lo, los);
-      hi = _mm512_mask_add_epi32(hi, __mmask16(hisurrhi), hi, his);
+private:
+  const uint8_t *buf;
+  const size_t len;
+  const size_t lenminusstep;
+  size_t idx;
+};
 
-      carryout = _cvtu32_mask32(_kshiftri_mask32(hisurr, 30));
+// Routines to print masks and text for debugging bitmask operations
+simdutf_unused static char *format_input_text_64(const uint8_t *text) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
+    buf[i] = int8_t(text[i]) < ' ' ? '_' : int8_t(text[i]);
+  }
+  buf[sizeof(simd8x64<uint8_t>)] = '\0';
+  return buf;
+}
 
-      const uint32_t h = _cvtmask32_u32(hisurr);
-      const uint32_t l = _cvtmask32_u32(losurr);
-      // check for mismatched surrogates
-      if ((h + h + carry) ^ l) {
-        const uint32_t lonohi = l & ~(h + h + carry);
-        const uint32_t hinolo = h & ~(l >> 1);
-        inlen = _tzcnt_u32(hinolo | lonohi);
-        inmask = __mmask32(0x7fffffff & ((1U << inlen) - 1));
-        in = _mm512_maskz_mov_epi16(inmask, in);
-        adjust = (int)inlen - 31;
-        inlen = 0;
-        goto failiteration;
-      }
+// Routines to print masks and text for debugging bitmask operations
+simdutf_unused static char *format_input_text(const simd8x64<uint8_t> &in) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  in.store(reinterpret_cast<uint8_t *>(buf));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
+    if (buf[i] < ' ') {
+      buf[i] = '_';
     }
+  }
+  buf[sizeof(simd8x64<uint8_t>)] = '\0';
+  return buf;
+}
 
-    hi = _mm512_maskz_mov_epi32(_cvtu32_mask16(0x7fff), hi);
-    carry = carryout;
+simdutf_unused static char *format_mask(uint64_t mask) {
+  static char *buf = reinterpret_cast<char *>(malloc(64 + 1));
+  for (size_t i = 0; i < 64; i++) {
+    buf[i] = (mask & (size_t(1) << i)) ? 'X' : ' ';
+  }
+  buf[64] = '\0';
+  return buf;
+}
 
-    __m512i mslo =
-        _mm512_multishift_epi64_epi8(_mm512_set1_epi64(0x20262c3200060c12), lo);
+template <size_t STEP_SIZE>
+simdutf_really_inline
+buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len)
+    : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE},
+      idx{0} {}
 
-    __m512i mshi =
-        _mm512_multishift_epi64_epi8(_mm512_set1_epi64(0x20262c3200060c12), hi);
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() {
+  return idx;
+}
 
-    const __mmask32 outmask = __mmask32(_kandn_mask64(losurr, inmask));
-    const __mmask64 outmhi = _kshiftri_mask64(outmask, 16);
+template <size_t STEP_SIZE>
+simdutf_really_inline bool buf_block_reader<STEP_SIZE>::has_full_block() const {
+  return idx < lenminusstep;
+}
 
-    const __mmask32 is1byte = __mmask32(_knot_mask64(is234byte));
-    const __mmask64 is1bhi = _kshiftri_mask64(is1byte, 16);
-    const __mmask64 is12bhi = _kshiftri_mask64(is12byte, 16);
+template <size_t STEP_SIZE>
+simdutf_really_inline const uint8_t *
+buf_block_reader<STEP_SIZE>::full_block() const {
+  return &buf[idx];
+}
 
-    taglo = _mm512_mask_mov_epi32(taglo, __mmask16(is12byte),
-                                  _mm512_set1_epi32(0x80c00000));
-    taghi = _mm512_mask_mov_epi32(taghi, __mmask16(is12bhi),
-                                  _mm512_set1_epi32(0x80c00000));
-    __m512i magiclo = _mm512_mask_blend_epi32(__mmask16(outmask),
-                                              _mm512_set1_epi32(0xffffffff),
-                                              _mm512_set1_epi32(0x00010101));
-    __m512i magichi = _mm512_mask_blend_epi32(__mmask16(outmhi),
-                                              _mm512_set1_epi32(0xffffffff),
-                                              _mm512_set1_epi32(0x00010101));
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t
+buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
+  if (len == idx) {
+    return 0;
+  } // memcpy(dst, null, 0) will trigger an error with some sanitizers
+  std::memset(dst, 0x20,
+              STEP_SIZE); // std::memset STEP_SIZE because it is more efficient
+                          // to write out 8 or 16 bytes at once.
+  std::memcpy(dst, buf + idx, len - idx);
+  return len - idx;
+}
 
-    magiclo = _mm512_mask_blend_epi32(__mmask16(outmask),
-                                      _mm512_set1_epi32(0xffffffff),
-                                      _mm512_set1_epi32(0x00010101));
-    magichi = _mm512_mask_blend_epi32(__mmask16(outmhi),
-                                      _mm512_set1_epi32(0xffffffff),
-                                      _mm512_set1_epi32(0x00010101));
+template <size_t STEP_SIZE>
+simdutf_really_inline void buf_block_reader<STEP_SIZE>::advance() {
+  idx += STEP_SIZE;
+}
 
-    mslo = _mm512_ternarylogic_epi32(mslo, _mm512_set1_epi32(0x3f3f3f3f), taglo,
-                                     0xea); // A&B|C
-    mshi = _mm512_ternarylogic_epi32(mshi, _mm512_set1_epi32(0x3f3f3f3f), taghi,
-                                     0xea);
-    mslo = _mm512_mask_slli_epi32(mslo, __mmask16(is1byte), lo, 24);
+} // unnamed namespace
+} // namespace ppc64
+} // namespace simdutf
+/* end file src/generic/buf_block_reader.h */
+/* begin file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
+namespace simdutf {
+namespace ppc64 {
+namespace {
+namespace utf8_validation {
 
-    mshi = _mm512_mask_slli_epi32(mshi, __mmask16(is1bhi), hi, 24);
+using namespace simd;
 
-    const __mmask64 wantlo =
-        _mm512_cmp_epu8_mask(mslo, magiclo, _MM_CMPINT_NLT);
-    const __mmask64 wanthi =
-        _mm512_cmp_epu8_mask(mshi, magichi, _MM_CMPINT_NLT);
-    const __m512i outlo = _mm512_maskz_compress_epi8(wantlo, mslo);
-    const __m512i outhi = _mm512_maskz_compress_epi8(wanthi, mshi);
-    const uint64_t wantlo_uint64 = _cvtmask64_u64(wantlo);
-    const uint64_t wanthi_uint64 = _cvtmask64_u64(wanthi);
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      TOO_SHORT,
+      // 1110____ ________ <three byte lead in byte 1>
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
+
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
+
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
+
+//
+// Return nonzero if there are incomplete multibyte characters at the end of the
+// block: e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+//
+simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
+  // If the previous input's last 3 bytes match this, they're too short (they
+  // ended at EOF):
+  // ... 1111____ 111_____ 11______
+  static const uint8_t max_array[32] = {255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        0b11110000u - 1,
+                                        0b11100000u - 1,
+                                        0b11000000u - 1};
+  const simd8<uint8_t> max_value(
+      &max_array[sizeof(max_array) - sizeof(simd8<uint8_t>)]);
+  return input.gt_bits(max_value);
+}
 
-    uint64_t advlo = _mm_popcnt_u64(wantlo_uint64);
-    uint64_t advhi = _mm_popcnt_u64(wanthi_uint64);
+struct utf8_checker {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+  // The last input we received
+  simd8<uint8_t> prev_input_block;
+  // Whether the last input we received was incomplete (used for ASCII fast
+  // path)
+  simd8<uint8_t> prev_incomplete;
 
-    _mm512_mask_storeu_epi8(
-        outbuf, _cvtu64_mask64(_pext_u64(wantlo_uint64, wantlo_uint64)), outlo);
-    _mm512_mask_storeu_epi8(
-        outbuf + advlo, _cvtu64_mask64(_pext_u64(wanthi_uint64, wanthi_uint64)),
-        outhi);
-    outbuf += advlo + advhi;
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Shift the input by one byte (pulling the last byte of prev_input in) so
+    // that each byte can be checked against its predecessor; the 2- and
+    // 3-byte shifts happen inside check_multibyte_lengths.
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
   }
-  outbuf += -adjust;
 
-tail:
-  if (inlen != 0) {
-    // We must have inlen < 31.
-    inmask = _cvtu32_mask32((1U << inlen) - 1);
-    in = _mm512_maskz_loadu_epi16(inmask, inbuf);
-    if (big_endian) {
-      in = _mm512_shuffle_epi8(in, byteflip);
-    }
-    adjust = (int)inlen - 31;
-    inlen = 0;
-    goto lastiteration;
+  // The only problem that can happen at EOF is that a multibyte character is
+  // too short or a byte value too large in the last bytes: check_special_cases
+  // only checks for bytes too large in the first of two bytes.
+  simdutf_really_inline void check_eof() {
+    // If the previous block had incomplete UTF-8 characters at the end, an
+    // ASCII block can't possibly finish them.
+    this->error |= this->prev_incomplete;
   }
-  *outlen = (outbuf - outbuf_orig) + adjust;
-  return ((inbuf - inbuf_orig) + adjust);
-}
-/* end file src/icelake/icelake_convert_utf16_to_utf8.inl.cpp */
-/* begin file src/icelake/icelake_convert_utf16_to_utf32.inl.cpp */
-// file included directly
 
-/*
-  Returns a pair: the first unprocessed byte from buf and utf32_output
-  A scalar routing should carry on the conversion of the tail.
-*/
-template <endianness big_endian>
-std::tuple<const char16_t *, char32_t *, bool>
-convert_utf16_to_utf32(const char16_t *buf, size_t len,
-                       char32_t *utf32_output) {
-  const char16_t *end = buf + len;
-  const __m512i v_fc00 = _mm512_set1_epi16((uint16_t)0xfc00);
-  const __m512i v_d800 = _mm512_set1_epi16((uint16_t)0xd800);
-  const __m512i v_dc00 = _mm512_set1_epi16((uint16_t)0xdc00);
-  __mmask32 carry{0};
-  const __m512i byteflip = _mm512_setr_epi64(
-      0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
-      0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
-      0x0607040502030001, 0x0e0f0c0d0a0b0809);
-  while (std::distance(buf, end) >= 32) {
-    // Always safe because buf + 32 <= end so that end - buf >= 32 bytes:
-    __m512i in = _mm512_loadu_si512((__m512i *)buf);
-    if (big_endian) {
-      in = _mm512_shuffle_epi8(in, byteflip);
+  simdutf_really_inline void check_next_input(const simd8x64<uint8_t> &input) {
+    if (simdutf_likely(is_ascii(input))) {
+      this->error |= this->prev_incomplete;
+    } else {
+      // You might think that a for-loop would work, but under Visual Studio it
+      // is not good enough, so the per-chunk calls are written out by hand.
+      static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                        (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+                    "We support either two or four chunks per 64-byte block.");
+      if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+      } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+        this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+      }
+      this->prev_incomplete =
+          is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1]);
+      this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1];
     }
+  }
 
-    // H - bitmask for high surrogates
-    const __mmask32 H =
-        _mm512_cmpeq_epi16_mask(_mm512_and_si512(in, v_fc00), v_d800);
-    // H - bitmask for low surrogates
-    const __mmask32 L =
-        _mm512_cmpeq_epi16_mask(_mm512_and_si512(in, v_fc00), v_dc00);
-
-    if ((H | L)) {
-      // surrogate pair(s) in a register
-      const __mmask32 V =
-          (L ^
-           (carry | (H << 1))); // A high surrogate must be followed by low one
-                                // and a low one must be preceded by a high one.
-                                // If valid, V should be equal to 0
-
-      if (V == 0) {
-        // valid case
-        /*
-            Input surrogate pair:
-            |1101.11aa.aaaa.aaaa|1101.10bb.bbbb.bbbb|
-                low surrogate      high surrogate
-        */
-        /*  1. Expand all code units to 32-bit code units
-            in
-           |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0000.0000.0000.1101.10bb.bbbb.bbbb|
-        */
-        const __m512i first = _mm512_cvtepu16_epi32(_mm512_castsi512_si256(in));
-        const __m512i second =
-            _mm512_cvtepu16_epi32(_mm512_extracti32x8_epi32(in, 1));
-
-        /*  2. Shift by one 16-bit word to align low surrogates with high
-           surrogates in
-           |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0000.0000.0000.1101.10bb.bbbb.bbbb|
-            shifted
-           |????.????.????.????.????.????.????.????|0000.0000.0000.0000.1101.11aa.aaaa.aaaa|
-        */
-        const __m512i shifted_first = _mm512_alignr_epi32(second, first, 1);
-        const __m512i shifted_second =
-            _mm512_alignr_epi32(_mm512_setzero_si512(), second, 1);
+  // Do not forget to call check_eof() before the final call to errors()!
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-        /*  3. Align all high surrogates in first and second by shifting to the
-           left by 10 bits
-            |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0011.0110.bbbb.bbbb.bb00.0000.0000|
-        */
-        const __m512i aligned_first =
-            _mm512_mask_slli_epi32(first, (__mmask16)H, first, 10);
-        const __m512i aligned_second =
-            _mm512_mask_slli_epi32(second, (__mmask16)(H >> 16), second, 10);
+}; // struct utf8_checker
+} // namespace utf8_validation
 
-        /*  4. Remove surrogate prefixes and add offset 0x10000 by adding in,
-           shifted and constant in
-           |0000.0000.0000.0000.1101.11aa.aaaa.aaaa|0000.0011.0110.bbbb.bbbb.bb00.0000.0000|
-            shifted
-           |????.????.????.????.????.????.????.????|0000.0000.0000.0000.1101.11aa.aaaa.aaaa|
-            constant|1111.1100.1010.0000.0010.0100.0000.0000|1111.1100.1010.0000.0010.0100.0000.0000|
-        */
-        const __m512i constant = _mm512_set1_epi32((uint32_t)0xfca02400);
-        const __m512i added_first = _mm512_mask_add_epi32(
-            aligned_first, (__mmask16)H, aligned_first, shifted_first);
-        const __m512i utf32_first = _mm512_mask_add_epi32(
-            added_first, (__mmask16)H, added_first, constant);
+using utf8_validation::utf8_checker;
 
-        const __m512i added_second =
-            _mm512_mask_add_epi32(aligned_second, (__mmask16)(H >> 16),
-                                  aligned_second, shifted_second);
-        const __m512i utf32_second = _mm512_mask_add_epi32(
-            added_second, (__mmask16)(H >> 16), added_second, constant);
+} // unnamed namespace
+} // namespace ppc64
+} // namespace simdutf
+/* end file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
+/* begin file src/generic/utf8_validation/utf8_validator.h */
+namespace simdutf {
+namespace ppc64 {
+namespace {
+namespace utf8_validation {
 
-        //  5. Store all valid UTF-32 code units (low surrogate positions and
-        //  32nd word are invalid)
-        const __mmask32 valid = ~L & 0x7fffffff;
-        // We deliberately do a _mm512_maskz_compress_epi32 followed by
-        // storeu_epi32 to ease performance portability to Zen 4.
-        const __m512i compressed_first =
-            _mm512_maskz_compress_epi32((__mmask16)(valid), utf32_first);
-        const size_t howmany1 = count_ones((uint16_t)(valid));
-        _mm512_storeu_si512((__m512i *)utf32_output, compressed_first);
-        utf32_output += howmany1;
-        const __m512i compressed_second =
-            _mm512_maskz_compress_epi32((__mmask16)(valid >> 16), utf32_second);
-        const size_t howmany2 = count_ones((uint16_t)(valid >> 16));
-        // The following could be unsafe in some cases?
-        //_mm512_storeu_epi32((__m512i *) utf32_output, compressed_second);
-        _mm512_mask_storeu_epi32((__m512i *)utf32_output,
-                                 __mmask16((1 << howmany2) - 1),
-                                 compressed_second);
-        utf32_output += howmany2;
-        // Only process 31 code units, but keep track if the 31st word is a high
-        // surrogate as a carry
-        buf += 31;
-        carry = (H >> 30) & 0x1;
-      } else {
-        // invalid case
-        return std::make_tuple(buf + carry, utf32_output, false);
-      }
-    } else {
-      // no surrogates
-      // extend all thirty-two 16-bit code units to thirty-two 32-bit code units
-      _mm512_storeu_si512((__m512i *)(utf32_output),
-                          _mm512_cvtepu16_epi32(_mm512_castsi512_si256(in)));
-      _mm512_storeu_si512(
-          (__m512i *)(utf32_output) + 1,
-          _mm512_cvtepu16_epi32(_mm512_extracti32x8_epi32(in, 1)));
-      utf32_output += 32;
-      buf += 32;
-      carry = 0;
-    }
-  } // while
-  return std::make_tuple(buf + carry, utf32_output, true);
-}
-/* end file src/icelake/icelake_convert_utf16_to_utf32.inl.cpp */
-/* begin file src/icelake/icelake_convert_utf32_to_latin1.inl.cpp */
-// file included directly
-size_t icelake_convert_utf32_to_latin1(const char32_t *buf, size_t len,
-                                       char *latin1_output) {
-  const char32_t *end = buf + len;
-  __m512i v_0xFF = _mm512_set1_epi32(0xff);
-  __m512i shufmask = _mm512_set_epi8(
-      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
-      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60,
-      56, 52, 48, 44, 40, 36, 32, 28, 24, 20, 16, 12, 8, 4, 0);
-  while (end - buf >= 16) {
-    __m512i in = _mm512_loadu_si512((__m512i *)buf);
-    if (_mm512_cmpgt_epu32_mask(in, v_0xFF)) {
-      return 0;
-    }
-    _mm_storeu_si128(
-        (__m128i *)latin1_output,
-        _mm512_castsi512_si128(_mm512_permutexvar_epi8(shufmask, in)));
-    latin1_output += 16;
-    buf += 16;
-  }
-  if (buf < end) {
-    uint16_t mask = uint16_t((1 << (end - buf)) - 1);
-    __m512i in = _mm512_maskz_loadu_epi32(mask, buf);
-    if (_mm512_cmpgt_epu32_mask(in, v_0xFF)) {
-      return 0;
-    }
-    _mm_mask_storeu_epi8(
-        latin1_output, mask,
-        _mm512_castsi512_si128(_mm512_permutexvar_epi8(shufmask, in)));
+/**
+ * Validates that the input is valid UTF-8.
+ */
+template <class checker>
+bool generic_validate_utf8(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    c.check_next_input(in);
+    reader.advance();
   }
-  return len;
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  return !c.errors();
 }
 
-std::pair<result, char *>
-icelake_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
-                                            char *latin1_output) {
-  const char32_t *end = buf + len;
-  const char32_t *start = buf;
-  __m512i v_0xFF = _mm512_set1_epi32(0xff);
-  __m512i shufmask = _mm512_set_epi8(
-      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
-      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60,
-      56, 52, 48, 44, 40, 36, 32, 28, 24, 20, 16, 12, 8, 4, 0);
-  while (end - buf >= 16) {
-    __m512i in = _mm512_loadu_si512((__m512i *)buf);
-    if (_mm512_cmpgt_epu32_mask(in, v_0xFF)) {
-      while (uint32_t(*buf) <= 0xff) {
-        *latin1_output++ = uint8_t(*buf++);
-      }
-      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
-                            latin1_output);
+bool generic_validate_utf8(const char *input, size_t length) {
+  return generic_validate_utf8<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
+}
+
+/**
+ * Validates that the input is valid UTF-8 and stops at the first error.
+ */
+template <class checker>
+result generic_validate_utf8_with_errors(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  size_t count{0};
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    c.check_next_input(in);
+    if (c.errors()) {
+      if (count != 0) {
+        count--;
+      } // Sometimes the error is only detected in the next chunk
+      result res = scalar::utf8::rewind_and_validate_with_errors(
+          reinterpret_cast<const char *>(input),
+          reinterpret_cast<const char *>(input + count), length - count);
+      res.count += count;
+      return res;
     }
-    _mm_storeu_si128(
-        (__m128i *)latin1_output,
-        _mm512_castsi512_si128(_mm512_permutexvar_epi8(shufmask, in)));
-    latin1_output += 16;
-    buf += 16;
+    reader.advance();
+    count += 64;
   }
-  if (buf < end) {
-    uint16_t mask = uint16_t((1 << (end - buf)) - 1);
-    __m512i in = _mm512_maskz_loadu_epi32(mask, buf);
-    if (_mm512_cmpgt_epu32_mask(in, v_0xFF)) {
-      while (uint32_t(*buf) <= 0xff) {
-        *latin1_output++ = uint8_t(*buf++);
-      }
-      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
-                            latin1_output);
-    }
-    _mm_mask_storeu_epi8(
-        latin1_output, mask,
-        _mm512_castsi512_si128(_mm512_permutexvar_epi8(shufmask, in)));
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  if (c.errors()) {
+    if (count != 0) {
+      count--;
+    } // Sometimes the error is only detected in the next chunk
+    result res = scalar::utf8::rewind_and_validate_with_errors(
+        reinterpret_cast<const char *>(input),
+        reinterpret_cast<const char *>(input) + count, length - count);
+    res.count += count;
+    return res;
+  } else {
+    return result(error_code::SUCCESS, length);
   }
-  return std::make_pair(result(error_code::SUCCESS, len), latin1_output);
 }
-/* end file src/icelake/icelake_convert_utf32_to_latin1.inl.cpp */
-/* begin file src/icelake/icelake_convert_utf32_to_utf8.inl.cpp */
-// file included directly
-
-// Todo: currently, this is just the haswell code, optimize for icelake kernel.
-std::pair<const char32_t *, char *>
-avx512_convert_utf32_to_utf8(const char32_t *buf, size_t len,
-                             char *utf8_output) {
-  const char32_t *end = buf + len;
-  const __m256i v_0000 = _mm256_setzero_si256();
-  const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
-  const __m256i v_ff80 = _mm256_set1_epi16((uint16_t)0xff80);
-  const __m256i v_f800 = _mm256_set1_epi16((uint16_t)0xf800);
-  const __m256i v_c080 = _mm256_set1_epi16((uint16_t)0xc080);
-  const __m256i v_7fffffff = _mm256_set1_epi32((uint32_t)0x7fffffff);
-  __m256i running_max = _mm256_setzero_si256();
-  __m256i forbidden_bytemask = _mm256_setzero_si256();
-
-  const size_t safety_margin =
-      12; // to avoid overruns, see issue
-          // https://github.com/simdutf/simdutf/issues/92
 
-  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i *)buf);
-    __m256i nextin = _mm256_loadu_si256((__m256i *)buf + 1);
-    running_max = _mm256_max_epu32(_mm256_max_epu32(in, running_max), nextin);
+result generic_validate_utf8_with_errors(const char *input, size_t length) {
+  return generic_validate_utf8_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
+}
 
-    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
-    // saturation
-    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff),
-                                        _mm256_and_si256(nextin, v_7fffffff));
-    in_16 = _mm256_permute4x64_epi64(in_16, 0b11011000);
+template <class checker>
+bool generic_validate_ascii(const uint8_t *input, size_t length) {
+  buf_block_reader<64> reader(input, length);
+  uint8_t blocks[64]{};
+  simd::simd8x64<uint8_t> running_or(blocks);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    running_or |= in;
+    reader.advance();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  running_or |= in;
+  return running_or.is_ascii();
+}
 
-    // Try to apply UTF-16 => UTF-8 routine on 256 bits
-    // (haswell/avx2_convert_utf16_to_utf8.cpp)
+bool generic_validate_ascii(const char *input, size_t length) {
+  return generic_validate_ascii<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
+}
 
-    if (_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
-      // 1. pack the bytes
-      const __m128i utf8_packed = _mm_packus_epi16(
-          _mm256_castsi256_si128(in_16), _mm256_extractf128_si256(in_16, 1));
-      // 2. store (16 bytes)
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
-      // 3. adjust pointers
-      buf += 16;
-      utf8_output += 16;
-      continue; // we are done for this round!
+template <class checker>
+result generic_validate_ascii_with_errors(const uint8_t *input, size_t length) {
+  buf_block_reader<64> reader(input, length);
+  size_t count{0};
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    if (!in.is_ascii()) {
+      result res = scalar::ascii::validate_with_errors(
+          reinterpret_cast<const char *>(input + count), length - count);
+      return result(res.error, count + res.count);
     }
-    // no bits set above 7th bit
-    const __m256i one_byte_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
-
-    // no bits set above 11th bit
-    const __m256i one_or_two_bytes_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
-    const uint32_t one_or_two_bytes_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
-    if (one_or_two_bytes_bitmask == 0xffffffff) {
-      // 1. prepare 2-byte values
-      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-      // expected output   : [110a|aaaa|10bb|bbbb] x 8
-      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
-      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
-
-      // t0 = [000a|aaaa|bbbb|bb00]
-      const __m256i t0 = _mm256_slli_epi16(in_16, 2);
-      // t1 = [000a|aaaa|0000|0000]
-      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
-      // t2 = [0000|0000|00bb|bbbb]
-      const __m256i t2 = _mm256_and_si256(in_16, v_003f);
-      // t3 = [000a|aaaa|00bb|bbbb]
-      const __m256i t3 = _mm256_or_si256(t1, t2);
-      // t4 = [110a|aaaa|10bb|bbbb]
-      const __m256i t4 = _mm256_or_si256(t3, v_c080);
+    reader.advance();
 
-      // 2. merge ASCII and 2-byte codewords
-      const __m256i utf8_unpacked =
-          _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
+    count += 64;
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  if (!in.is_ascii()) {
+    result res = scalar::ascii::validate_with_errors(
+        reinterpret_cast<const char *>(input + count), length - count);
+    return result(res.error, count + res.count);
+  } else {
+    return result(error_code::SUCCESS, length);
+  }
+}
 
-      // 3. prepare bitmask for 8-bit lookup
-      const uint32_t M0 = one_byte_bitmask & 0x55555555;
-      const uint32_t M1 = M0 >> 7;
-      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
-      // 4. pack the bytes
+result generic_validate_ascii_with_errors(const char *input, size_t length) {
+  return generic_validate_ascii_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
+}
 
-      const uint8_t *row =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
-      const uint8_t *row_2 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
-                                                                       16)][0];
+} // namespace utf8_validation
+} // unnamed namespace
+} // namespace ppc64
+} // namespace simdutf
+/* end file src/generic/utf8_validation/utf8_validator.h */
+// transcoding from UTF-8 to UTF-16
+/* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
 
-      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
-      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
+namespace simdutf {
+namespace ppc64 {
+namespace {
+namespace utf8_to_utf16 {
+using namespace simd;
 
-      const __m256i utf8_packed = _mm256_shuffle_epi8(
-          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
-      // 5. store bytes
-      _mm_storeu_si128((__m128i *)utf8_output,
-                       _mm256_castsi256_si128(utf8_packed));
-      utf8_output += row[0];
-      _mm_storeu_si128((__m128i *)utf8_output,
-                       _mm256_extractf128_si256(utf8_packed, 1));
-      utf8_output += row_2[0];
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte; Bit 3 = Too Large
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte; Bit 6 = Too Large (1000) / Overlong 4-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
 
-      // 6. adjust pointers
-      buf += 16;
-      continue;
-    }
-    // Must check for overflow in packing
-    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(
-        _mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
-    if (saturation_bitmask == 0xffffffff) {
-      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-      const __m256i v_d800 = _mm256_set1_epi16((uint16_t)0xd800);
-      forbidden_bytemask = _mm256_or_si256(
-          forbidden_bytemask,
-          _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800));
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      TOO_SHORT,
+      // 1110____ ________ <three byte lead in byte 1>
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
 
-      const __m256i dup_even = _mm256_setr_epi16(
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
 
-      /* In this branch we handle three cases:
-        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
-        single UFT-8 byte
-        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
-        UTF-8 bytes
-        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
-        three UTF-8 bytes
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
-        We expand the input word (16-bit) into two code units (32-bit), thus
-        we have room for four bytes. However, we need five distinct bit
-        layouts. Note that the last byte in cases #2 and #3 is the same.
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
-        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-        in register t2.
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
 
-        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-        either byte 1 for case #2 or byte 2 for case #3. Note that they
-        differ by exactly one bit.
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
 
-        Finally from these two code units we build proper UTF-8 sequence, taking
-        into account the case (i.e, the number of bytes to write).
-      */
-      /**
-       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-       * t2 => [0ccc|cccc] [10cc|cccc]
-       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-       */
-#define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
-      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-      const __m256i t0 = _mm256_shuffle_epi8(in_16, dup_even);
-      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
-      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
 
-      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-      const __m256i s0 = _mm256_srli_epi16(in_16, 4);
-      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
-      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
-      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
-      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
-                                             simdutf_vec(0b0100000000000000));
-      const __m256i s4 = _mm256_xor_si256(s3, m0);
-#undef simdutf_vec
+  template <endianness endian>
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the
+    // eighth-to-last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany = scalar::utf8_to_utf16::convert<endian>(
+          in + pos, size - pos, utf16_output);
+      if (howmany == 0) {
+        return 0;
+      }
+      utf16_output += howmany;
+    }
+    return utf16_output - start;
+  }
 
-      // 4. expand code units 16-bit => 32-bit
-      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
-      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
+  template <endianness endian>
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the
+    // eighth-to-last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res =
+              scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+                  pos, in + pos, size - pos, utf16_output);
+          res.count += pos;
+          return res;
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf16_output += res.count;
+      }
+    }
+    return result(error_code::SUCCESS, utf16_output - start);
+  }
 
-      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
-                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
-      // Due to the wider registers, the following path is less likely to be
-      // useful.
-      /*if(mask == 0) {
-        // We only have three-byte code units. Use fast path.
-        const __m256i shuffle =
-      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
-      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
-      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
-      _mm256_shuffle_epi8(out1, shuffle);
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
-        utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
-        utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output,
-      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output,
-      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
-        continue;
-      }*/
-      const uint8_t mask0 = uint8_t(mask);
-      const uint8_t *row0 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
-      const __m128i utf8_0 =
-          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-      const uint8_t *row1 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
-      const __m128i utf8_1 =
-          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
+}; // struct validating_transcoder
+} // namespace utf8_to_utf16
+} // unnamed namespace
+} // namespace ppc64
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf16/utf8_to_utf16.h */
+/* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
 
-      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
-      const uint8_t *row2 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
-      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
-      const __m128i utf8_2 =
-          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
+namespace simdutf {
+namespace ppc64 {
+namespace {
+namespace utf8_to_utf16 {
 
-      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
-      const uint8_t *row3 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
-      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
-      const __m128i utf8_3 =
-          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
+using namespace simd;
 
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
-      utf8_output += row0[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
-      utf8_output += row1[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
-      utf8_output += row2[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
-      utf8_output += row3[0];
-      buf += 16;
-    } else {
-      // case: at least one 32-bit word is larger than 0xFFFF <=> it will
-      // produce four UTF-8 bytes. Let us do a scalar fallback. It may seem
-      // wasteful to use scalar code, but being efficient with SIMD may require
-      // large, non-trivial tables?
-      size_t forward = 15;
-      size_t k = 0;
-      if (size_t(end - buf) < forward + 1) {
-        forward = size_t(end - buf - 1);
-      }
-      for (; k < forward; k++) {
-        uint32_t word = buf[k];
-        if ((word & 0xFFFFFF80) == 0) { // 1-byte (ASCII)
-          *utf8_output++ = char(word);
-        } else if ((word & 0xFFFFF800) == 0) { // 2-byte
-          *utf8_output++ = char((word >> 6) | 0b11000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if ((word & 0xFFFF0000) == 0) { // 3-byte
-          if (word >= 0xD800 && word <= 0xDFFF) {
-            return std::make_pair(nullptr, utf8_output);
-          }
-          *utf8_output++ = char((word >> 12) | 0b11100000);
-          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else { // 4-byte
-          if (word > 0x10FFFF) {
-            return std::make_pair(nullptr, utf8_output);
-          }
-          *utf8_output++ = char((word >> 18) | 0b11110000);
-          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        }
+template <endianness endian>
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char16_t *utf16_output) noexcept {
+  // The implementation is not specific to haswell and should be moved to the
+  // generic directory.
+  size_t pos = 0;
+  char16_t *start{utf16_output};
+  const size_t safety_margin = 16; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    // this loop could be unrolled further. For example, we could process the
+    // mask far more than 64 bytes.
+    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
+    if (in.is_ascii()) {
+      in.store_ascii_as_utf16<endian>(utf16_output);
+      utf16_output += 64;
+      pos += 64;
+    } else {
+      // Slow path. We hope that the compiler will recognize that this is a slow
+      // path. Anything that is not a continuation mask is a 'leading byte',
+      // that is, the start of a new code point.
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      // The *start* of code points is not so useful, rather, we want the *end*
+      // of code points.
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      // We process in blocks of up to 12 bytes except possibly
+      // for fast paths which may process up to 16 bytes. For the
+      // slow path to work, we should have at least 12 input bytes left.
+      size_t max_starting_point = (pos + 64) - 12;
+      // Next loop is going to run at least five times when using solely
+      // the slow/regular path, and at least four times if there are fast paths.
+      while (pos < max_starting_point) {
+        // Performance note: our ability to compute 'consumed' and
+        // then shift and recompute is critical. If there is a
+        // latency of, say, 4 cycles on getting 'consumed', then
+        // the inner loop might have a total latency of about 6 cycles.
+        // Yet we process between 6 and 12 input bytes, thus we get
+        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+        // for this section of the code. Hence, there is a limit
+        // to how much we can further increase this latency before
+        // it seriously harms performance.
+        //
+        // Thus we may allow convert_masked_utf8_to_utf16 to process
+        // more bytes at a time under a fast-path mode where 16 bytes
+        // are consumed at once (e.g., when encountering ASCII).
+        size_t consumed = convert_masked_utf8_to_utf16<endian>(
+            input + pos, utf8_end_of_code_point_mask, utf16_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
       }
-      buf += k;
+      // At this point there may remain between 0 and 12 bytes in the
+      // 64-byte block. These bytes will be processed again. So we have an
+      // 80% efficiency (in the worst case). In practice we expect an
+      // 85% to 90% efficiency.
     }
-  } // while
-
-  // check for invalid input
-  const __m256i v_10ffff = _mm256_set1_epi32((uint32_t)0x10ffff);
-  if (static_cast<uint32_t>(_mm256_movemask_epi8(_mm256_cmpeq_epi32(
-          _mm256_max_epu32(running_max, v_10ffff), v_10ffff))) != 0xffffffff) {
-    return std::make_pair(nullptr, utf8_output);
   }
-
-  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) {
-    return std::make_pair(nullptr, utf8_output);
-  }
-
-  return std::make_pair(buf, utf8_output);
+  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(
+      input + pos, size - pos, utf16_output);
+  return utf16_output - start;
 }
 
-// Todo: currently, this is just the haswell code, optimize for icelake kernel.
-std::pair<result, char *>
-avx512_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
-                                         char *utf8_output) {
-  const char32_t *end = buf + len;
-  const char32_t *start = buf;
+} // namespace utf8_to_utf16
+} // unnamed namespace
+} // namespace ppc64
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
+// transcoding from UTF-8 to UTF-32
+/* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
 
-  const __m256i v_0000 = _mm256_setzero_si256();
-  const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
-  const __m256i v_ff80 = _mm256_set1_epi16((uint16_t)0xff80);
-  const __m256i v_f800 = _mm256_set1_epi16((uint16_t)0xf800);
-  const __m256i v_c080 = _mm256_set1_epi16((uint16_t)0xc080);
-  const __m256i v_7fffffff = _mm256_set1_epi32((uint32_t)0x7fffffff);
-  const __m256i v_10ffff = _mm256_set1_epi32((uint32_t)0x10ffff);
+namespace simdutf {
+namespace ppc64 {
+namespace {
+namespace utf8_to_utf32 {
+using namespace simd;
 
-  const size_t safety_margin =
-      12; // to avoid overruns, see issue
-          // https://github.com/simdutf/simdutf/issues/92
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
 
-  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i *)buf);
-    __m256i nextin = _mm256_loadu_si256((__m256i *)buf + 1);
-    // Check for too large input
-    const __m256i max_input =
-        _mm256_max_epu32(_mm256_max_epu32(in, nextin), v_10ffff);
-    if (static_cast<uint32_t>(_mm256_movemask_epi8(
-            _mm256_cmpeq_epi32(max_input, v_10ffff))) != 0xffffffff) {
-      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
-                            utf8_output);
-    }
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      TOO_SHORT,
+      // 1110____ ________ <three byte lead in byte 1>
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
 
-    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
-    // saturation
-    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff),
-                                        _mm256_and_si256(nextin, v_7fffffff));
-    in_16 = _mm256_permute4x64_epi64(in_16, 0b11011000);
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
 
-    // Try to apply UTF-16 => UTF-8 routine on 256 bits
-    // (haswell/avx2_convert_utf16_to_utf8.cpp)
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
-    if (_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
-      // 1. pack the bytes
-      const __m128i utf8_packed = _mm_packus_epi16(
-          _mm256_castsi256_si128(in_16), _mm256_extractf128_si256(in_16, 1));
-      // 2. store (16 bytes)
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
-      // 3. adjust pointers
-      buf += 16;
-      utf8_output += 16;
-      continue; // we are done for this round!
-    }
-    // no bits set above 7th bit
-    const __m256i one_byte_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
-    // no bits set above 11th bit
-    const __m256i one_or_two_bytes_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
-    const uint32_t one_or_two_bytes_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
-    if (one_or_two_bytes_bitmask == 0xffffffff) {
-      // 1. prepare 2-byte values
-      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-      // expected output   : [110a|aaaa|10bb|bbbb] x 8
-      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
-      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
 
-      // t0 = [000a|aaaa|bbbb|bb00]
-      const __m256i t0 = _mm256_slli_epi16(in_16, 2);
-      // t1 = [000a|aaaa|0000|0000]
-      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
-      // t2 = [0000|0000|00bb|bbbb]
-      const __m256i t2 = _mm256_and_si256(in_16, v_003f);
-      // t3 = [000a|aaaa|00bb|bbbb]
-      const __m256i t3 = _mm256_or_si256(t1, t2);
-      // t4 = [110a|aaaa|10bb|bbbb]
-      const __m256i t4 = _mm256_or_si256(t3, v_c080);
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
 
-      // 2. merge ASCII and 2-byte codewords
-      const __m256i utf8_unpacked =
-          _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
 
-      // 3. prepare bitmask for 8-bit lookup
-      const uint32_t M0 = one_byte_bitmask & 0x55555555;
-      const uint32_t M1 = M0 >> 7;
-      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
-      // 4. pack the bytes
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 words when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then in[margin] is the eighth-last leading
+    // byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // we have an error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
+      if (howmany == 0) {
+        return 0;
+      }
+      utf32_output += howmany;
+    }
+    return utf32_output - start;
+  }
 
-      const uint8_t *row =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
-      const uint8_t *row_2 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
-                                                                       16)][0];
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then in[margin] is the eighth-last leading
+    // byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, utf32_output);
+          res.count += pos;
+          return res;
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf32_output += res.count;
+      }
+    }
+    return result(error_code::SUCCESS, utf32_output - start);
+  }
 
-      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
-      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-      const __m256i utf8_packed = _mm256_shuffle_epi8(
-          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
-      // 5. store bytes
-      _mm_storeu_si128((__m128i *)utf8_output,
-                       _mm256_castsi256_si128(utf8_packed));
-      utf8_output += row[0];
-      _mm_storeu_si128((__m128i *)utf8_output,
-                       _mm256_extractf128_si256(utf8_packed, 1));
-      utf8_output += row_2[0];
+}; // struct validating_transcoder
+} // namespace utf8_to_utf32
+} // unnamed namespace
+} // namespace ppc64
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf32/utf8_to_utf32.h */
+/* begin file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
 
-      // 6. adjust pointers
-      buf += 16;
-      continue;
-    }
-    // Must check for overflow in packing
-    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(
-        _mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
-    if (saturation_bitmask == 0xffffffff) {
-      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+namespace simdutf {
+namespace ppc64 {
+namespace {
+namespace utf8_to_utf32 {
 
-      // Check for illegal surrogate code units
-      const __m256i v_d800 = _mm256_set1_epi16((uint16_t)0xd800);
-      const __m256i forbidden_bytemask =
-          _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800);
-      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) !=
-          0x0) {
-        return std::make_pair(result(error_code::SURROGATE, buf - start),
-                              utf8_output);
+using namespace simd;
+
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char32_t *utf32_output) noexcept {
+  size_t pos = 0;
+  char32_t *start{utf32_output};
+  const size_t safety_margin = 16; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
+    if (in.is_ascii()) {
+      in.store_ascii_as_utf32(utf32_output);
+      utf32_output += 64;
+      pos += 64;
+    } else {
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      size_t max_starting_point = (pos + 64) - 12;
+      while (pos < max_starting_point) {
+        size_t consumed = convert_masked_utf8_to_utf32(
+            input + pos, utf8_end_of_code_point_mask, utf32_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
       }
+    }
+  }
+  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos,
+                                                       utf32_output);
+  return utf32_output - start;
+}
 
-      const __m256i dup_even = _mm256_setr_epi16(
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
-
-      /* In this branch we handle three cases:
-        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
-        single UFT-8 byte
-        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
-        UTF-8 bytes
-        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
-        three UTF-8 bytes
+} // namespace utf8_to_utf32
+} // unnamed namespace
+} // namespace ppc64
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
+// other functions
+/* begin file src/generic/utf16.h */
+namespace simdutf {
+namespace ppc64 {
+namespace {
+namespace utf16 {
 
-        We expand the input word (16-bit) into two code units (32-bit), thus
-        we have room for four bytes. However, we need five distinct bit
-        layouts. Note that the last byte in cases #2 and #3 is the same.
+template <endianness big_endian>
+simdutf_really_inline size_t count_code_points(const char16_t *in,
+                                               size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
+    }
+    uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
+    count += count_ones(not_pair) / 2;
+  }
+  return count +
+         scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
+}
 
-        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-        in register t2.
+template <endianness big_endian>
+simdutf_really_inline size_t utf8_length_from_utf16(const char16_t *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
+    }
+    uint64_t ascii_mask = input.lteq(0x7F);
+    uint64_t twobyte_mask = input.lteq(0x7FF);
+    uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
 
-        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-        either byte 1 for case #2 or byte 2 for case #3. Note that they
-        differ by exactly one bit.
+    size_t ascii_count = count_ones(ascii_mask) / 2;
+    size_t twobyte_count = count_ones(twobyte_mask & ~ascii_mask) / 2;
+    size_t threebyte_count = count_ones(not_pair_mask & ~twobyte_mask) / 2;
+    size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
+    count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count +
+             ascii_count;
+  }
+  return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos,
+                                                                   size - pos);
+}
 
-        Finally from these two code units we build proper UTF-8 sequence, taking
-        into account the case (i.e, the number of bytes to write).
-      */
-      /**
-       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-       * t2 => [0ccc|cccc] [10cc|cccc]
-       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-       */
-#define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
-      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-      const __m256i t0 = _mm256_shuffle_epi8(in_16, dup_even);
-      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
-      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
+template <endianness big_endian>
+simdutf_really_inline size_t utf32_length_from_utf16(const char16_t *in,
+                                                     size_t size) {
+  return count_code_points<big_endian>(in, size);
+}
 
-      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-      const __m256i s0 = _mm256_srli_epi16(in_16, 4);
-      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
-      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
-      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
-      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
-                                             simdutf_vec(0b0100000000000000));
-      const __m256i s4 = _mm256_xor_si256(s3, m0);
-#undef simdutf_vec
+simdutf_really_inline void
+change_endianness_utf16(const char16_t *in, size_t size, char16_t *output) {
+  size_t pos = 0;
 
-      // 4. expand code units 16-bit => 32-bit
-      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
-      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
+  while (pos < size / 32 * 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    input.swap_bytes();
+    input.store(reinterpret_cast<uint16_t *>(output));
+    pos += 32;
+    output += 32;
+  }
 
-      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
-                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
-      // Due to the wider registers, the following path is less likely to be
-      // useful.
-      /*if(mask == 0) {
-        // We only have three-byte code units. Use fast path.
-        const __m256i shuffle =
-      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
-      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
-      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
-      _mm256_shuffle_epi8(out1, shuffle);
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
-        utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
-        utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output,
-      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output,
-      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
-        continue;
-      }*/
-      const uint8_t mask0 = uint8_t(mask);
-      const uint8_t *row0 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
-      const __m128i utf8_0 =
-          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
+  scalar::utf16::change_endianness_utf16(in + pos, size - pos, output);
+}
 
-      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
-      const uint8_t *row1 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
-      const __m128i utf8_1 =
-          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
+} // namespace utf16
+} // unnamed namespace
+} // namespace ppc64
+} // namespace simdutf
+/* end file src/generic/utf16.h */
+/* begin file src/generic/utf8.h */
 
-      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
-      const uint8_t *row2 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
-      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
-      const __m128i utf8_2 =
-          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
+namespace simdutf {
+namespace ppc64 {
+namespace {
+namespace utf8 {
 
-      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
-      const uint8_t *row3 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
-      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
-      const __m128i utf8_3 =
-          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
+using namespace simd;
 
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
-      utf8_output += row0[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
-      utf8_output += row1[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
-      utf8_output += row2[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
-      utf8_output += row3[0];
-      buf += 16;
-    } else {
-      // case: at least one 32-bit word is larger than 0xFFFF <=> it will
-      // produce four UTF-8 bytes. Let us do a scalar fallback. It may seem
-      // wasteful to use scalar code, but being efficient with SIMD may require
-      // large, non-trivial tables?
-      size_t forward = 15;
-      size_t k = 0;
-      if (size_t(end - buf) < forward + 1) {
-        forward = size_t(end - buf - 1);
-      }
-      for (; k < forward; k++) {
-        uint32_t word = buf[k];
-        if ((word & 0xFFFFFF80) == 0) { // 1-byte (ASCII)
-          *utf8_output++ = char(word);
-        } else if ((word & 0xFFFFF800) == 0) { // 2-byte
-          *utf8_output++ = char((word >> 6) | 0b11000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if ((word & 0xFFFF0000) == 0) { // 3-byte
-          if (word >= 0xD800 && word <= 0xDFFF) {
-            return std::make_pair(
-                result(error_code::SURROGATE, buf - start + k), utf8_output);
-          }
-          *utf8_output++ = char((word >> 12) | 0b11100000);
-          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else { // 4-byte
-          if (word > 0x10FFFF) {
-            return std::make_pair(
-                result(error_code::TOO_LARGE, buf - start + k), utf8_output);
-          }
-          *utf8_output++ = char((word >> 18) | 0b11110000);
-          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
-          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
-          *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        }
-      }
-      buf += k;
-    }
-  } // while
+simdutf_really_inline size_t count_code_points(const char *in, size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.gt(-65);
+    count += count_ones(utf8_continuation_mask);
+  }
+  return count + scalar::utf8::count_code_points(in + pos, size - pos);
+}
 
-  return std::make_pair(result(error_code::SUCCESS, buf - start), utf8_output);
+simdutf_really_inline size_t utf16_length_from_utf8(const char *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+    // We count one word for anything that is not a continuation (so
+    // leading bytes).
+    count += 64 - count_ones(utf8_continuation_mask);
+    uint64_t utf8_4byte = input.gteq_unsigned(240);
+    count += count_ones(utf8_4byte);
+  }
+  return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
 }
-/* end file src/icelake/icelake_convert_utf32_to_utf8.inl.cpp */
-/* begin file src/icelake/icelake_convert_utf32_to_utf16.inl.cpp */
-// file included directly
+} // namespace utf8
+} // unnamed namespace
+} // namespace ppc64
+} // namespace simdutf
+/* end file src/generic/utf8.h */
 
-// Todo: currently, this is just the haswell code, optimize for icelake kernel.
-template <endianness big_endian>
-std::pair<const char32_t *, char16_t *>
-avx512_convert_utf32_to_utf16(const char32_t *buf, size_t len,
-                              char16_t *utf16_output) {
-  const char32_t *end = buf + len;
+//
+// Implementation-specific overrides
+//
+namespace simdutf {
+namespace ppc64 {
 
-  const size_t safety_margin =
-      12; // to avoid overruns, see issue
-          // https://github.com/simdutf/simdutf/issues/92
-  __m256i forbidden_bytemask = _mm256_setzero_si256();
+simdutf_warn_unused int
+implementation::detect_encodings(const char *input,
+                                 size_t length) const noexcept {
+  // If there is a BOM, then we trust it.
+  auto bom_encoding = simdutf::BOM::check_bom(input, length);
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
+  }
+  // todo: reimplement as a one-pass algorithm.
+  int out = 0;
+  if (validate_utf8(input, length)) {
+    out |= encoding_type::UTF8;
+  }
+  if ((length % 2) == 0) {
+    if (validate_utf16(reinterpret_cast<const char16_t *>(input), length / 2)) {
+      out |= encoding_type::UTF16_LE;
+    }
+  }
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      out |= encoding_type::UTF32_LE;
+    }
+  }
 
-  while (end - buf >= std::ptrdiff_t(8 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+  return out;
+}
 
-    const __m256i v_00000000 = _mm256_setzero_si256();
-    const __m256i v_ffff0000 = _mm256_set1_epi32((int32_t)0xffff0000);
+simdutf_warn_unused bool
+implementation::validate_utf8(const char *buf, size_t len) const noexcept {
+  return ppc64::utf8_validation::generic_validate_utf8(buf, len);
+}
 
-    // no bits set above 16th bit <=> can pack to UTF16 without surrogate pairs
-    const __m256i saturation_bytemask =
-        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
-    const uint32_t saturation_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return ppc64::utf8_validation::generic_validate_utf8_with_errors(buf, len);
+}
 
-    if (saturation_bitmask == 0xffffffff) {
-      const __m256i v_f800 = _mm256_set1_epi32((uint32_t)0xf800);
-      const __m256i v_d800 = _mm256_set1_epi32((uint32_t)0xd800);
-      forbidden_bytemask = _mm256_or_si256(
-          forbidden_bytemask,
-          _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800));
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *buf, size_t len) const noexcept {
+  return ppc64::utf8_validation::generic_validate_ascii(buf, len);
+}
 
-      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),
-                                              _mm256_extractf128_si256(in, 1));
-      if (big_endian) {
-        const __m128i swap =
-            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-        utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
-      }
-      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
-      utf16_output += 8;
-      buf += 8;
-    } else {
-      size_t forward = 7;
-      size_t k = 0;
-      if (size_t(end - buf) < forward + 1) {
-        forward = size_t(end - buf - 1);
-      }
-      for (; k < forward; k++) {
-        uint32_t word = buf[k];
-        if ((word & 0xFFFF0000) == 0) {
-          // will not generate a surrogate pair
-          if (word >= 0xD800 && word <= 0xDFFF) {
-            return std::make_pair(nullptr, utf16_output);
-          }
-          *utf16_output++ =
-              big_endian
-                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
-                  : char16_t(word);
-        } else {
-          // will generate a surrogate pair
-          if (word > 0x10FFFF) {
-            return std::make_pair(nullptr, utf16_output);
-          }
-          word -= 0x10000;
-          uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
-          uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
-          if (big_endian) {
-            high_surrogate =
-                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
-            low_surrogate =
-                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
-          }
-          *utf16_output++ = char16_t(high_surrogate);
-          *utf16_output++ = char16_t(low_surrogate);
-        }
-      }
-      buf += k;
-    }
-  }
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return ppc64::utf8_validation::generic_validate_ascii_with_errors(buf, len);
+}
 
-  // check for invalid input
-  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) {
-    return std::make_pair(nullptr, utf16_output);
-  }
+simdutf_warn_unused bool
+implementation::validate_utf16le(const char16_t *buf,
+                                 size_t len) const noexcept {
+  return scalar::utf16::validate<endianness::LITTLE>(buf, len);
+}
 
-  return std::make_pair(buf, utf16_output);
+simdutf_warn_unused bool
+implementation::validate_utf16be(const char16_t *buf,
+                                 size_t len) const noexcept {
+  return scalar::utf16::validate<endianness::BIG>(buf, len);
 }
 
-// Todo: currently, this is just the haswell code, optimize for icelake kernel.
-template <endianness big_endian>
-std::pair<result, char16_t *>
-avx512_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
-                                          char16_t *utf16_output) {
-  const char32_t *start = buf;
-  const char32_t *end = buf + len;
+simdutf_warn_unused result implementation::validate_utf16le_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  return scalar::utf16::validate_with_errors<endianness::LITTLE>(buf, len);
+}
 
-  const size_t safety_margin =
-      12; // to avoid overruns, see issue
-          // https://github.com/simdutf/simdutf/issues/92
+simdutf_warn_unused result implementation::validate_utf16be_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  return scalar::utf16::validate_with_errors<endianness::BIG>(buf, len);
+}
 
-  while (end - buf >= std::ptrdiff_t(8 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+simdutf_warn_unused result implementation::validate_utf32_with_errors(
+    const char32_t *buf, size_t len) const noexcept {
+  return scalar::utf32::validate_with_errors(buf, len);
+}
 
-    const __m256i v_00000000 = _mm256_setzero_si256();
-    const __m256i v_ffff0000 = _mm256_set1_epi32((int32_t)0xffff0000);
+simdutf_warn_unused bool
+implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
+  return scalar::utf32::validate(buf, len);
+}
 
-    // no bits set above 16th bit <=> can pack to UTF16 without surrogate pairs
-    const __m256i saturation_bytemask =
-        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
-    const uint32_t saturation_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char * /*buf*/, size_t /*len*/,
+    char16_t * /*utf16_output*/) const noexcept {
+  return 0; // stub
+}
 
-    if (saturation_bitmask == 0xffffffff) {
-      const __m256i v_f800 = _mm256_set1_epi32((uint32_t)0xf800);
-      const __m256i v_d800 = _mm256_set1_epi32((uint32_t)0xd800);
-      const __m256i forbidden_bytemask =
-          _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800);
-      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) !=
-          0x0) {
-        return std::make_pair(result(error_code::SURROGATE, buf - start),
-                              utf16_output);
-      }
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char * /*buf*/, size_t /*len*/,
+    char16_t * /*utf16_output*/) const noexcept {
+  return 0; // stub
+}
 
-      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),
-                                              _mm256_extractf128_si256(in, 1));
-      if (big_endian) {
-        const __m128i swap =
-            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-        utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
-      }
-      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
-      utf16_output += 8;
-      buf += 8;
-    } else {
-      size_t forward = 7;
-      size_t k = 0;
-      if (size_t(end - buf) < forward + 1) {
-        forward = size_t(end - buf - 1);
-      }
-      for (; k < forward; k++) {
-        uint32_t word = buf[k];
-        if ((word & 0xFFFF0000) == 0) {
-          // will not generate a surrogate pair
-          if (word >= 0xD800 && word <= 0xDFFF) {
-            return std::make_pair(
-                result(error_code::SURROGATE, buf - start + k), utf16_output);
-          }
-          *utf16_output++ =
-              big_endian
-                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
-                  : char16_t(word);
-        } else {
-          // will generate a surrogate pair
-          if (word > 0x10FFFF) {
-            return std::make_pair(
-                result(error_code::TOO_LARGE, buf - start + k), utf16_output);
-          }
-          word -= 0x10000;
-          uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
-          uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
-          if (big_endian) {
-            high_surrogate =
-                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
-            low_surrogate =
-                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
-          }
-          *utf16_output++ = char16_t(high_surrogate);
-          *utf16_output++ = char16_t(low_surrogate);
-        }
-      }
-      buf += k;
-    }
-  }
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char * /*buf*/, size_t /*len*/,
+    char16_t * /*utf16_output*/) const noexcept {
+  return result(error_code::OTHER, 0); // stub
+}
 
-  return std::make_pair(result(error_code::SUCCESS, buf - start), utf16_output);
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char * /*buf*/, size_t /*len*/,
+    char16_t * /*utf16_output*/) const noexcept {
+  return result(error_code::OTHER, 0); // stub
 }
-/* end file src/icelake/icelake_convert_utf32_to_utf16.inl.cpp */
-/* begin file src/icelake/icelake_ascii_validation.inl.cpp */
-// file included directly
 
-bool validate_ascii(const char *buf, size_t len) {
-  const char *end = buf + len;
-  const __m512i ascii = _mm512_set1_epi8((uint8_t)0x80);
-  __m512i running_or = _mm512_setzero_si512();
-  for (; end - buf >= 64; buf += 64) {
-    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)buf);
-    running_or = _mm512_ternarylogic_epi32(running_or, utf8, ascii,
-                                           0xf8); // running_or | (utf8 & ascii)
-  }
-  if (buf < end) {
-    const __m512i utf8 = _mm512_maskz_loadu_epi8(
-        (uint64_t(1) << (end - buf)) - 1, (const __m512i *)buf);
-    running_or = _mm512_ternarylogic_epi32(running_or, utf8, ascii,
-                                           0xf8); // running_or | (utf8 & ascii)
-  }
-  return (_mm512_test_epi8_mask(running_or, running_or) == 0);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char * /*buf*/, size_t /*len*/,
+    char16_t * /*utf16_output*/) const noexcept {
+  return 0; // stub
 }
-/* end file src/icelake/icelake_ascii_validation.inl.cpp */
-/* begin file src/icelake/icelake_utf32_validation.inl.cpp */
-// file included directly
 
-const char32_t *validate_utf32(const char32_t *buf, size_t len) {
-  if (len < 16) {
-    return buf;
-  }
-  const char32_t *end = buf + len - 16;
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char * /*buf*/, size_t /*len*/,
+    char16_t * /*utf16_output*/) const noexcept {
+  return 0; // stub
+}
 
-  const __m512i offset = _mm512_set1_epi32((uint32_t)0xffff2000);
-  __m512i currentmax = _mm512_setzero_si512();
-  __m512i currentoffsetmax = _mm512_setzero_si512();
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char * /*buf*/, size_t /*len*/,
+    char32_t * /*utf32_output*/) const noexcept {
+  return 0; // stub
+}
 
-  while (buf <= end) {
-    __m512i utf32 = _mm512_loadu_si512((const __m512i *)buf);
-    buf += 16;
-    currentoffsetmax =
-        _mm512_max_epu32(_mm512_add_epi32(utf32, offset), currentoffsetmax);
-    currentmax = _mm512_max_epu32(utf32, currentmax);
-  }
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char * /*buf*/, size_t /*len*/,
+    char32_t * /*utf32_output*/) const noexcept {
+  return result(error_code::OTHER, 0); // stub
+}
 
-  const __m512i standardmax = _mm512_set1_epi32((uint32_t)0x10ffff);
-  const __m512i standardoffsetmax = _mm512_set1_epi32((uint32_t)0xfffff7ff);
-  __m512i is_zero =
-      _mm512_xor_si512(_mm512_max_epu32(currentmax, standardmax), standardmax);
-  if (_mm512_test_epi8_mask(is_zero, is_zero) != 0) {
-    return nullptr;
-  }
-  is_zero = _mm512_xor_si512(
-      _mm512_max_epu32(currentoffsetmax, standardoffsetmax), standardoffsetmax);
-  if (_mm512_test_epi8_mask(is_zero, is_zero) != 0) {
-    return nullptr;
-  }
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char * /*buf*/, size_t /*len*/,
+    char32_t * /*utf32_output*/) const noexcept {
+  return 0; // stub
+}
 
-  return buf;
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert<endianness::LITTLE>(buf, len,
+                                                            utf8_output);
 }
-/* end file src/icelake/icelake_utf32_validation.inl.cpp */
-/* begin file src/icelake/icelake_convert_latin1_to_utf8.inl.cpp */
-// file included directly
 
-static inline size_t latin1_to_utf8_avx512_vec(__m512i input, size_t input_len,
-                                               char *utf8_output,
-                                               int mask_output) {
-  __mmask64 nonascii = _mm512_movepi8_mask(input);
-  size_t output_size = input_len + (size_t)count_ones(nonascii);
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert<endianness::BIG>(buf, len, utf8_output);
+}
 
-  // Mask to denote whether the byte is a leading byte that is not ascii
-  __mmask64 sixth = _mm512_cmpge_epu8_mask(
-      input, _mm512_set1_epi8(-64)); // binary representation of -64: 1100 0000
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf8_output);
+}
 
-  const uint64_t alternate_bits = UINT64_C(0x5555555555555555);
-  uint64_t ascii = ~nonascii;
-  // the bits in ascii are inverted and zeros are interspersed in between them
-  uint64_t maskA = ~_pdep_u64(ascii, alternate_bits);
-  uint64_t maskB = ~_pdep_u64(ascii >> 32, alternate_bits);
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
+      buf, len, utf8_output);
+}
 
-  // interleave bytes from top and bottom halves (abcd...ABCD -> aAbBcCdD)
-  __m512i input_interleaved = _mm512_permutexvar_epi8(
-      _mm512_set_epi32(0x3f1f3e1e, 0x3d1d3c1c, 0x3b1b3a1a, 0x39193818,
-                       0x37173616, 0x35153414, 0x33133212, 0x31113010,
-                       0x2f0f2e0e, 0x2d0d2c0c, 0x2b0b2a0a, 0x29092808,
-                       0x27072606, 0x25052404, 0x23032202, 0x21012000),
-      input);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_valid<endianness::LITTLE>(buf, len,
+                                                                  utf8_output);
+}
 
-  // double size of each byte, and insert the leading byte 1100 0010
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf16_to_utf8::convert_valid<endianness::BIG>(buf, len,
+                                                               utf8_output);
+}
 
-  /*
-  upscale the bytes to 16-bit value, adding the 0b11000000 leading byte in the
-  process. We adjust for the bytes that have their two most significant bits.
-  This takes care of the first 32 bytes, assuming we interleaved the bytes. */
-  __m512i outputA =
-      _mm512_shldi_epi16(input_interleaved, _mm512_set1_epi8(-62), 8);
-  outputA = _mm512_mask_add_epi16(
-      outputA, (__mmask32)sixth, outputA,
-      _mm512_set1_epi16(1 - 0x4000)); // 1- 0x4000 = 1100 0000 0000 0001????
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf32_to_utf8::convert(buf, len, utf8_output);
+}
 
-  // in the second 32-bit half, set first or second option based on whether
-  // original input is leading byte (second case) or not (first case)
-  __m512i leadingB =
-      _mm512_mask_blend_epi16((__mmask32)(sixth >> 32),
-                              _mm512_set1_epi16(0x00c2),  // 0000 0000 1101 0010
-                              _mm512_set1_epi16(0x40c3)); // 0100 0000 1100 0011
-  __m512i outputB = _mm512_ternarylogic_epi32(
-      input_interleaved, leadingB, _mm512_set1_epi16((short)0xff00),
-      (240 & 170) ^ 204); // (input_interleaved & 0xff00) ^ leadingB
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf32_to_utf8::convert_with_errors(buf, len, utf8_output);
+}
 
-  // prune redundant bytes
-  outputA = _mm512_maskz_compress_epi8(maskA, outputA);
-  outputB = _mm512_maskz_compress_epi8(maskB, outputB);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  return scalar::utf32_to_utf8::convert_valid(buf, len, utf8_output);
+}
 
-  size_t output_sizeA = (size_t)count_ones((uint32_t)nonascii) + 32;
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert<endianness::LITTLE>(buf, len,
+                                                             utf16_output);
+}
 
-  if (mask_output) {
-    if (input_len > 32) { // is the second half of the input vector used?
-      __mmask64 write_mask = _bzhi_u64(~0ULL, (unsigned int)output_sizeA);
-      _mm512_mask_storeu_epi8(utf8_output, write_mask, outputA);
-      utf8_output += output_sizeA;
-      write_mask = _bzhi_u64(~0ULL, (unsigned int)(output_size - output_sizeA));
-      _mm512_mask_storeu_epi8(utf8_output, write_mask, outputB);
-    } else {
-      __mmask64 write_mask = _bzhi_u64(~0ULL, (unsigned int)output_size);
-      _mm512_mask_storeu_epi8(utf8_output, write_mask, outputA);
-    }
-  } else {
-    _mm512_storeu_si512(utf8_output, outputA);
-    utf8_output += output_sizeA;
-    _mm512_storeu_si512(utf8_output, outputB);
-  }
-  return output_size;
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert<endianness::BIG>(buf, len,
+                                                          utf16_output);
 }
 
-static inline size_t latin1_to_utf8_avx512_branch(__m512i input,
-                                                  char *utf8_output) {
-  __mmask64 nonascii = _mm512_movepi8_mask(input);
-  if (nonascii) {
-    return latin1_to_utf8_avx512_vec(input, 64, utf8_output, 0);
-  } else {
-    _mm512_storeu_si512(utf8_output, input);
-    return 64;
-  }
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf16_output);
 }
 
-size_t latin1_to_utf8_avx512_start(const char *buf, size_t len,
-                                   char *utf8_output) {
-  char *start = utf8_output;
-  size_t pos = 0;
-  // if there's at least 128 bytes remaining, we don't need to mask the output
-  for (; pos + 128 <= len; pos += 64) {
-    __m512i input = _mm512_loadu_si512((__m512i *)(buf + pos));
-    utf8_output += latin1_to_utf8_avx512_branch(input, utf8_output);
-  }
-  // in the last 128 bytes, the first 64 may require masking the output
-  if (pos + 64 <= len) {
-    __m512i input = _mm512_loadu_si512((__m512i *)(buf + pos));
-    utf8_output += latin1_to_utf8_avx512_vec(input, 64, utf8_output, 1);
-    pos += 64;
-  }
-  // with the last 64 bytes, the input also needs to be masked
-  if (pos < len) {
-    __mmask64 load_mask = _bzhi_u64(~0ULL, (unsigned int)(len - pos));
-    __m512i input = _mm512_maskz_loadu_epi8(load_mask, (__m512i *)(buf + pos));
-    utf8_output += latin1_to_utf8_avx512_vec(input, len - pos, utf8_output, 1);
-  }
-  return (size_t)(utf8_output - start);
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
+      buf, len, utf16_output);
 }
-/* end file src/icelake/icelake_convert_latin1_to_utf8.inl.cpp */
-/* begin file src/icelake/icelake_convert_latin1_to_utf16.inl.cpp */
-// file included directly
-template <endianness big_endian>
-size_t icelake_convert_latin1_to_utf16(const char *latin1_input, size_t len,
-                                       char16_t *utf16_output) {
-  size_t rounded_len = len & ~0x1F; // Round down to nearest multiple of 32
 
-  __m512i byteflip = _mm512_setr_epi64(0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809,
-                                       0x0607040502030001, 0x0e0f0c0d0a0b0809);
-  for (size_t i = 0; i < rounded_len; i += 32) {
-    // Load 32 Latin1 characters into a 256-bit register
-    __m256i in = _mm256_loadu_si256((__m256i *)&latin1_input[i]);
-    // Zero extend each set of 8 Latin1 characters to 32 16-bit integers
-    __m512i out = _mm512_cvtepu8_epi16(in);
-    if (big_endian) {
-      out = _mm512_shuffle_epi8(out, byteflip);
-    }
-    // Store the results back to memory
-    _mm512_storeu_si512((__m512i *)&utf16_output[i], out);
-  }
-  if (rounded_len != len) {
-    uint32_t mask = uint32_t(1 << (len - rounded_len)) - 1;
-    __m256i in = _mm256_maskz_loadu_epi8(mask, latin1_input + rounded_len);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_valid<endianness::LITTLE>(
+      buf, len, utf16_output);
+}
 
-    // Zero extend each set of 8 Latin1 characters to 32 16-bit integers
-    __m512i out = _mm512_cvtepu8_epi16(in);
-    if (big_endian) {
-      out = _mm512_shuffle_epi8(out, byteflip);
-    }
-    // Store the results back to memory
-    _mm512_mask_storeu_epi16(utf16_output + rounded_len, mask, out);
-  }
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return scalar::utf32_to_utf16::convert_valid<endianness::BIG>(buf, len,
+                                                                utf16_output);
+}
 
-  return len;
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert<endianness::LITTLE>(buf, len,
+                                                             utf32_output);
 }
-/* end file src/icelake/icelake_convert_latin1_to_utf16.inl.cpp */
-/* begin file src/icelake/icelake_convert_latin1_to_utf32.inl.cpp */
-std::pair<const char *, char32_t *>
-avx512_convert_latin1_to_utf32(const char *buf, size_t len,
-                               char32_t *utf32_output) {
-  size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 16
 
-  for (size_t i = 0; i < rounded_len; i += 16) {
-    // Load 16 Latin1 characters into a 128-bit register
-    __m128i in = _mm_loadu_si128((__m128i *)&buf[i]);
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert<endianness::BIG>(buf, len,
+                                                          utf32_output);
+}
 
-    // Zero extend each set of 8 Latin1 characters to 16 32-bit integers using
-    // vpmovzxbd
-    __m512i out = _mm512_cvtepu8_epi32(in);
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+      buf, len, utf32_output);
+}
 
-    // Store the results back to memory
-    _mm512_storeu_si512((__m512i *)&utf32_output[i], out);
-  }
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+      buf, len, utf32_output);
+}
 
-  // Return pointers pointing to where we left off
-  return std::make_pair(buf + rounded_len, utf32_output + rounded_len);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_valid<endianness::LITTLE>(
+      buf, len, utf32_output);
 }
-/* end file src/icelake/icelake_convert_latin1_to_utf32.inl.cpp */
-/* begin file src/icelake/icelake_base64.inl.cpp */
-// file included directly
-/**
- * References and further reading:
- *
- * Wojciech Muła, Daniel Lemire, Base64 encoding and decoding at almost the
- * speed of a memory copy, Software: Practice and Experience 50 (2), 2020.
- * https://arxiv.org/abs/1910.05109
- *
- * Wojciech Muła, Daniel Lemire, Faster Base64 Encoding and Decoding using AVX2
- * Instructions, ACM Transactions on the Web 12 (3), 2018.
- * https://arxiv.org/abs/1704.00605
- *
- * Simon Josefsson. 2006. The Base16, Base32, and Base64 Data Encodings.
- * https://tools.ietf.org/html/rfc4648. (2006). Internet Engineering Task Force,
- * Request for Comments: 4648.
- *
- * Alfred Klomp. 2014a. Fast Base64 encoding/decoding with SSE vectorization.
- * http://www.alfredklomp.com/programming/sse-base64/. (2014).
- *
- * Alfred Klomp. 2014b. Fast Base64 stream encoder/decoder in C99, with SIMD
- * acceleration. https://github.com/aklomp/base64. (2014).
- *
- * Hanson Char. 2014. A Fast and Correct Base 64 Codec. (2014).
- * https://aws.amazon.com/blogs/developer/a-fast-and-correct-base-64-codec/
- *
- * Nick Kopp. 2013. Base64 Encoding on a GPU.
- * https://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU. (2013).
- */
 
-struct block64 {
-  __m512i chunks[1];
-};
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return scalar::utf16_to_utf32::convert_valid<endianness::BIG>(buf, len,
+                                                                utf32_output);
+}
 
-template <bool base64_url>
-size_t encode_base64(char *dst, const char *src, size_t srclen,
-                     base64_options options) {
-  // credit: Wojciech Muła
-  const uint8_t *input = (const uint8_t *)src;
+void implementation::change_endianness_utf16(const char16_t *input,
+                                             size_t length,
+                                             char16_t *output) const noexcept {
+  scalar::utf16::change_endianness_utf16(input, length, output);
+}
 
-  uint8_t *out = (uint8_t *)dst;
-  static const char *lookup_tbl =
-      base64_url
-          ? "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
-          : "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+simdutf_warn_unused size_t implementation::count_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::count_code_points<endianness::LITTLE>(input, length);
+}
 
-  const __m512i shuffle_input = _mm512_setr_epi32(
-      0x01020001, 0x04050304, 0x07080607, 0x0a0b090a, 0x0d0e0c0d, 0x10110f10,
-      0x13141213, 0x16171516, 0x191a1819, 0x1c1d1b1c, 0x1f201e1f, 0x22232122,
-      0x25262425, 0x28292728, 0x2b2c2a2b, 0x2e2f2d2e);
-  const __m512i lookup =
-      _mm512_loadu_si512(reinterpret_cast<const __m512i *>(lookup_tbl));
-  const __m512i multi_shifts = _mm512_set1_epi64(UINT64_C(0x3036242a1016040a));
-  size_t size = srclen;
-  __mmask64 input_mask = 0xffffffffffff; // (1 << 48) - 1
-  while (size >= 48) {
-    const __m512i v = _mm512_maskz_loadu_epi8(
-        input_mask, reinterpret_cast<const __m512i *>(input));
-    const __m512i in = _mm512_permutexvar_epi8(shuffle_input, v);
-    const __m512i indices = _mm512_multishift_epi64_epi8(multi_shifts, in);
-    const __m512i result = _mm512_permutexvar_epi8(indices, lookup);
-    _mm512_storeu_si512(reinterpret_cast<__m512i *>(out), result);
-    out += 64;
-    input += 48;
-    size -= 48;
-  }
-  input_mask = ((__mmask64)1 << size) - 1;
-  const __m512i v = _mm512_maskz_loadu_epi8(
-      input_mask, reinterpret_cast<const __m512i *>(input));
-  const __m512i in = _mm512_permutexvar_epi8(shuffle_input, v);
-  const __m512i indices = _mm512_multishift_epi64_epi8(multi_shifts, in);
-  bool padding_needed =
-      (((options & base64_url) == 0) ^
-       ((options & base64_reverse_padding) == base64_reverse_padding));
-  size_t padding_amount = ((size % 3) > 0) ? (3 - (size % 3)) : 0;
-  size_t output_len = ((size + 2) / 3) * 4;
-  size_t non_padded_output_len = output_len - padding_amount;
-  if (!padding_needed) {
-    output_len = non_padded_output_len;
-  }
-  __mmask64 output_mask = output_len == 64 ? (__mmask64)UINT64_MAX
-                                           : ((__mmask64)1 << output_len) - 1;
-  __m512i result = _mm512_mask_permutexvar_epi8(
-      _mm512_set1_epi8('='), ((__mmask64)1 << non_padded_output_len) - 1,
-      indices, lookup);
-  _mm512_mask_storeu_epi8(reinterpret_cast<__m512i *>(out), output_mask,
-                          result);
-  return (size_t)(out - (uint8_t *)dst) + output_len;
+simdutf_warn_unused size_t implementation::count_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::count_code_points<endianness::BIG>(input, length);
 }
 
-template <bool base64_url>
-static inline uint64_t to_base64_mask(block64 *b, uint64_t *error) {
-  __m512i input = b->chunks[0];
-  const __m512i ascii_space_tbl = _mm512_set_epi8(
-      0, 0, 13, 12, 0, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 13, 12, 0, 10,
-      9, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 13, 12, 0, 10, 9, 0, 0, 0, 0, 0, 0,
-      0, 0, 32, 0, 0, 13, 12, 0, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0, 32);
-  __m512i lookup0;
-  if (base64_url) {
-    lookup0 = _mm512_set_epi8(
-        -128, -128, -128, -128, -128, -128, 61, 60, 59, 58, 57, 56, 55, 54, 53,
-        52, -128, -128, 62, -128, -128, -128, -128, -128, -128, -128, -128,
-        -128, -128, -128, -128, -1, -128, -128, -128, -128, -128, -128, -128,
-        -128, -128, -128, -128, -128, -128, -128, -128, -128, -128, -128, -1,
-        -128, -128, -1, -1, -128, -128, -128, -128, -128, -128, -128, -128, -1);
-  } else {
-    lookup0 = _mm512_set_epi8(
-        -128, -128, -128, -128, -128, -128, 61, 60, 59, 58, 57, 56, 55, 54, 53,
-        52, 63, -128, -128, -128, 62, -128, -128, -128, -128, -128, -128, -128,
-        -128, -128, -128, -1, -128, -128, -128, -128, -128, -128, -128, -128,
-        -128, -128, -128, -128, -128, -128, -128, -128, -128, -128, -1, -128,
-        -128, -1, -1, -128, -128, -128, -128, -128, -128, -128, -128, -128);
-  }
-  __m512i lookup1;
-  if (base64_url) {
-    lookup1 = _mm512_set_epi8(
-        -128, -128, -128, -128, -128, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42,
-        41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, -128,
-        63, -128, -128, -128, -128, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15,
-        14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -128);
-  } else {
-    lookup1 = _mm512_set_epi8(
-        -128, -128, -128, -128, -128, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42,
-        41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, -128,
-        -128, -128, -128, -128, -128, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16,
-        15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -128);
-  }
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *input, size_t length) const noexcept {
+  return utf8::count_code_points(input, length);
+}
 
-  const __m512i translated = _mm512_permutex2var_epi8(lookup0, input, lookup1);
-  const __m512i combined = _mm512_or_si512(translated, input);
-  const __mmask64 mask = _mm512_movepi8_mask(combined);
-  if (mask) {
-    const __mmask64 spaces = _mm512_cmpeq_epi8_mask(
-        _mm512_shuffle_epi8(ascii_space_tbl, input), input);
-    *error = (mask ^ spaces);
-  }
-  b->chunks[0] = translated;
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::utf8_length_from_utf16<endianness::LITTLE>(input,
+                                                                   length);
+}
 
-  return mask;
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
 }
 
-static inline void copy_block(block64 *b, char *output) {
-  _mm512_storeu_si512(reinterpret_cast<__m512i *>(output), b->chunks[0]);
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::utf32_length_from_utf16<endianness::LITTLE>(input,
+                                                                    length);
 }
 
-static inline uint64_t compress_block(block64 *b, uint64_t mask, char *output) {
-  uint64_t nmask = ~mask;
-  __m512i c = _mm512_maskz_compress_epi8(nmask, b->chunks[0]);
-  _mm512_storeu_si512(reinterpret_cast<__m512i *>(output), c);
-  return _mm_popcnt_u64(nmask);
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
 }
 
-// The caller of this function is responsible to ensure that there are 64 bytes
-// available from reading at src. The data is read into a block64 structure.
-static inline void load_block(block64 *b, const char *src) {
-  b->chunks[0] = _mm512_loadu_si512(reinterpret_cast<const __m512i *>(src));
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  return scalar::utf8::utf16_length_from_utf8(input, length);
 }
 
-// The caller of this function is responsible to ensure that there are 128 bytes
-// available from reading at src. The data is read into a block64 structure.
-static inline void load_block(block64 *b, const char16_t *src) {
-  __m512i m1 = _mm512_loadu_si512(reinterpret_cast<const __m512i *>(src));
-  __m512i m2 = _mm512_loadu_si512(reinterpret_cast<const __m512i *>(src + 32));
-  __m512i p = _mm512_packus_epi16(m1, m2);
-  b->chunks[0] =
-      _mm512_permutexvar_epi64(_mm512_setr_epi64(0, 2, 4, 6, 1, 3, 5, 7), p);
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  return scalar::utf32::utf8_length_from_utf32(input, length);
+}
+
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  return scalar::utf32::utf16_length_from_utf32(input, length);
 }
 
-static inline void base64_decode(char *out, __m512i str) {
-  const __m512i merge_ab_and_bc =
-      _mm512_maddubs_epi16(str, _mm512_set1_epi32(0x01400140));
-  const __m512i merged =
-      _mm512_madd_epi16(merge_ab_and_bc, _mm512_set1_epi32(0x00011000));
-  const __m512i pack = _mm512_set_epi8(
-      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 60, 61, 62, 56, 57, 58,
-      52, 53, 54, 48, 49, 50, 44, 45, 46, 40, 41, 42, 36, 37, 38, 32, 33, 34,
-      28, 29, 30, 24, 25, 26, 20, 21, 22, 16, 17, 18, 12, 13, 14, 8, 9, 10, 4,
-      5, 6, 0, 1, 2);
-  const __m512i shuffled = _mm512_permutexvar_epi8(pack, merged);
-  _mm512_mask_storeu_epi8(
-      (__m512i *)out, 0xffffffffffff,
-      shuffled); // mask would be 0xffffffffffff since we write 48 bytes.
-}
-// decode 64 bytes and output 48 bytes
-static inline void base64_decode_block(char *out, const char *src) {
-  base64_decode(out,
-                _mm512_loadu_si512(reinterpret_cast<const __m512i *>(src)));
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  return scalar::utf8::count_code_points(input, length);
 }
-static inline void base64_decode_block(char *out, block64 *b) {
-  base64_decode(out, b->chunks[0]);
+
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char *input, size_t length) const noexcept {
+  return scalar::base64::maximal_binary_length_from_base64(input, length);
 }
 
-template <bool base64_url, typename chartype>
-full_result
-compress_decode_base64(char *dst, const chartype *src, size_t srclen,
-                       base64_options options,
-                       last_chunk_handling_options last_chunk_options) {
-  const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
-                                        : tables::base64::to_base64_value;
-  size_t equallocation =
-      srclen; // location of the first padding character if any
-  size_t equalsigns = 0;
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
   // skip trailing spaces
-  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
-         to_base64[uint8_t(src[srclen - 1])] == 64) {
-    srclen--;
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
   }
-  if (srclen > 0 && src[srclen - 1] == '=') {
-    equallocation = srclen - 1;
-    srclen--;
-    equalsigns = 1;
-    // skip trailing spaces
-    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
-           to_base64[uint8_t(src[srclen - 1])] == 64) {
-      srclen--;
+  size_t equallocation =
+      length; // location of the first padding character if any
+  size_t equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
     }
-    if (srclen > 0 && src[srclen - 1] == '=') {
-      equallocation = srclen - 1;
-      srclen--;
-      equalsigns = 2;
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
     }
   }
-  if (srclen == 0) {
+  if (length == 0) {
     if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+      return {INVALID_BASE64_CHARACTER, equallocation};
     }
-    return {SUCCESS, 0, 0};
+    return {SUCCESS, 0};
   }
-  const chartype *const srcinit = src;
-  const char *const dstinit = dst;
-  const chartype *const srcend = src + srclen;
-
-  // figure out why block_size == 2 is sometimes best???
-  constexpr size_t block_size = 6;
-  char buffer[block_size * 64];
-  char *bufferptr = buffer;
-  if (srclen >= 64) {
-    const chartype *const srcend64 = src + srclen - 64;
-    while (src <= srcend64) {
-      block64 b;
-      load_block(&b, src);
-      src += 64;
-      uint64_t error = 0;
-      uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
-      if (error) {
-        src -= 64;
-        size_t error_offset = _tzcnt_u64(error);
-        return {error_code::INVALID_BASE64_CHARACTER,
-                size_t(src - srcinit + error_offset), size_t(dst - dstinit)};
-      }
-      if (badcharmask != 0) {
-        // optimization opportunity: check for simple masks like those made of
-        // continuous 1s followed by continuous 0s. And masks containing a
-        // single bad character.
-        bufferptr += compress_block(&b, badcharmask, bufferptr);
-      } else if (bufferptr != buffer) {
-        copy_block(&b, bufferptr);
-        bufferptr += 64;
-      } else {
-        base64_decode_block(dst, &b);
-        dst += 48;
-      }
-      if (bufferptr >= (block_size - 1) * 64 + buffer) {
-        for (size_t i = 0; i < (block_size - 1); i++) {
-          base64_decode_block(dst, buffer + i * 64);
-          dst += 48;
-        }
-        std::memcpy(buffer, buffer + (block_size - 1) * 64,
-                    64); // 64 might be too much
-        bufferptr -= (block_size - 1) * 64;
-      }
+  result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // the padding must be consistent with the decoded byte count
+    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
     }
   }
+  return r;
+}
 
-  char *buffer_start = buffer;
-  // Optimization note: if this is almost full, then it is worth our
-  // time, otherwise, we should just decode directly.
-  int last_block = (int)((bufferptr - buffer_start) % 64);
-  if (last_block != 0 && srcend - src + last_block >= 64) {
-
-    while ((bufferptr - buffer_start) % 64 != 0 && src < srcend) {
-      uint8_t val = to_base64[uint8_t(*src)];
-      *bufferptr = char(val);
-      if (!scalar::base64::is_eight_byte(*src) || val > 64) {
-        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
-                size_t(dst - dstinit)};
-      }
-      bufferptr += (val <= 63);
-      src++;
-    }
-  }
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::base64::maximal_binary_length_from_base64(input, length);
+}
 
-  for (; buffer_start + 64 <= bufferptr; buffer_start += 64) {
-    base64_decode_block(dst, buffer_start);
-    dst += 48;
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  // skip trailing spaces
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
   }
-  if ((bufferptr - buffer_start) % 64 != 0) {
-    while (buffer_start + 4 < bufferptr) {
-      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
-                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
-                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
-                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
-                        << 8;
-      triple = scalar::utf32::swap_bytes(triple);
-      std::memcpy(dst, &triple, 4);
-      dst += 3;
-      buffer_start += 4;
-    }
-    if (buffer_start + 4 <= bufferptr) {
-      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
-                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
-                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
-                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
-                        << 8;
-      triple = scalar::utf32::swap_bytes(triple);
-      std::memcpy(dst, &triple, 3);
-      dst += 3;
-      buffer_start += 4;
+  size_t equallocation =
+      length; // location of the first padding character if any
+  size_t equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
     }
-    // we may have 1, 2 or 3 bytes left and we need to decode them so let us
-    // backtrack
-    int leftover = int(bufferptr - buffer_start);
-    while (leftover > 0) {
-      while (to_base64[uint8_t(*(src - 1))] == 64) {
-        src--;
-      }
-      src--;
-      leftover--;
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
     }
   }
-  if (src < srcend + equalsigns) {
-    full_result r = scalar::base64::base64_tail_decode(
-        dst, src, srcend - src, equalsigns, options, last_chunk_options);
-    r.input_count += size_t(src - srcinit);
-    if (r.error == error_code::INVALID_BASE64_CHARACTER ||
-        r.error == error_code::BASE64_EXTRA_BITS) {
-      return r;
-    } else {
-      r.output_count += size_t(dst - dstinit);
-    }
-    if (last_chunk_options != stop_before_partial &&
-        r.error == error_code::SUCCESS && equalsigns > 0) {
-      // additional checks
-      if ((r.output_count % 3 == 0) ||
-          ((r.output_count % 3) + 1 + equalsigns != 4)) {
-        r.error = error_code::INVALID_BASE64_CHARACTER;
-        r.input_count = equallocation;
-      }
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
     }
-    return r;
+    return {SUCCESS, 0};
   }
-  if (equalsigns > 0) {
-    if ((size_t(dst - dstinit) % 3 == 0) ||
-        ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation, size_t(dst - dstinit)};
+  result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // the padding must be consistent with the decoded byte count
+    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
     }
   }
-  return {SUCCESS, srclen, size_t(dst - dstinit)};
+  return r;
 }
-/* end file src/icelake/icelake_base64.inl.cpp */
 
-#include <cstdint>
+simdutf_warn_unused size_t implementation::base64_length_from_binary(
+    size_t length, base64_options options) const noexcept {
+  return scalar::base64::base64_length_from_binary(length, options);
+}
 
-} // namespace
-} // namespace icelake
+size_t implementation::binary_to_base64(const char *input, size_t length,
+                                        char *output,
+                                        base64_options options) const noexcept {
+  return scalar::base64::binary_to_base64(input, length, output, options);
+}
+} // namespace ppc64
 } // namespace simdutf
 
+/* begin file src/simdutf/ppc64/end.h */
+/* end file src/simdutf/ppc64/end.h */
+/* end file src/ppc64/implementation.cpp */
+#endif
+#if SIMDUTF_IMPLEMENTATION_RVV
+/* begin file src/rvv/implementation.cpp */
+
+
+
+
+
+/* begin file src/simdutf/rvv/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "rvv"
+// #define SIMDUTF_IMPLEMENTATION rvv
+
+#if SIMDUTF_CAN_ALWAYS_RUN_RVV
+// nothing needed.
+#else
+SIMDUTF_TARGET_RVV
+#endif
+/* end file src/simdutf/rvv/begin.h */
 namespace simdutf {
-namespace icelake {
+namespace rvv {
+namespace {
+#ifndef SIMDUTF_RVV_H
+  #error "rvv.h must be included"
+#endif
 
-simdutf_warn_unused int
-implementation::detect_encodings(const char *input,
-                                 size_t length) const noexcept {
-  // If there is a BOM, then we trust it.
-  auto bom_encoding = simdutf::BOM::check_bom(input, length);
-  // todo: convert to a one-pass algorithm
-  if (bom_encoding != encoding_type::unspecified) {
-    return bom_encoding;
+} // unnamed namespace
+} // namespace rvv
+} // namespace simdutf
+
+//
+// Implementation-specific overrides
+//
+namespace simdutf {
+namespace rvv {
+/* begin file src/rvv/rvv_helpers.inl.cpp */
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static size_t
+rvv_utf32_store_utf16_m4(uint16_t *dst, vuint32m4_t utf32, size_t vl,
+                         vbool4_t m4even) {
+  /* convert [000000000000aaaa|aaaaaabbbbbbbbbb]
+   * to      [110111bbbbbbbbbb|110110aaaaaaaaaa] */
+  vuint32m4_t sur = __riscv_vsub_vx_u32m4(utf32, 0x10000, vl);
+  sur = __riscv_vor_vv_u32m4(__riscv_vsll_vx_u32m4(sur, 16, vl),
+                             __riscv_vsrl_vx_u32m4(sur, 10, vl), vl);
+  sur = __riscv_vand_vx_u32m4(sur, 0x3FF03FF, vl);
+  sur = __riscv_vor_vx_u32m4(sur, 0xDC00D800, vl);
+  /* merge 1-unit utf32 and 2-unit surrogate pairs */
+  vbool8_t m4 = __riscv_vmsgtu_vx_u32m4_b8(utf32, 0xFFFF, vl);
+  vuint16m4_t utf32_16 = __riscv_vreinterpret_v_u32m4_u16m4(
+      __riscv_vmerge_vvm_u32m4(utf32, sur, m4, vl));
+  /* compress and store */
+  vbool4_t mOut = __riscv_vmor_mm_b4(
+      __riscv_vmsne_vx_u16m4_b4(utf32_16, 0, vl * 2), m4even, vl * 2);
+  vuint16m4_t vout = __riscv_vcompress_vm_u16m4(utf32_16, mOut, vl * 2);
+  vl = __riscv_vcpop_m_b4(mOut, vl * 2);
+  __riscv_vse16_v_u16m4(dst, simdutf_byteflip<bflip>(vout, vl), vl);
+  return vl;
+};
+/* end file src/rvv/rvv_helpers.inl.cpp */
+
+/* begin file src/rvv/rvv_length_from.inl.cpp */
+
+simdutf_warn_unused size_t
+implementation::count_utf16le(const char16_t *src, size_t len) const noexcept {
+  return utf32_length_from_utf16le(src, len);
+}
+
+simdutf_warn_unused size_t
+implementation::count_utf16be(const char16_t *src, size_t len) const noexcept {
+  return utf32_length_from_utf16be(src, len);
+}
+
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *src, size_t len) const noexcept {
+  return utf32_length_from_utf8(src, len);
+}
+
+simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
+    const char *src, size_t len) const noexcept {
+  return utf32_length_from_utf8(src, len);
+}
+
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf16(size_t len) const noexcept {
+  return len;
+}
+
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf32(size_t len) const noexcept {
+  return len;
+}
+
+simdutf_warn_unused size_t
+implementation::utf16_length_from_latin1(size_t len) const noexcept {
+  return len;
+}
+
+simdutf_warn_unused size_t
+implementation::utf32_length_from_latin1(size_t len) const noexcept {
+  return len;
+}
+
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *src, size_t len) const noexcept {
+  size_t count = 0;
+  for (size_t vl; len > 0; len -= vl, src += vl) {
+    vl = __riscv_vsetvl_e8m8(len);
+    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
+    vbool1_t mask = __riscv_vmsgt_vx_i8m8_b1(v, -65, vl);
+    count += __riscv_vcpop_m_b1(mask, vl);
   }
-  int out = 0;
-  if (validate_utf8(input, length)) {
-    out |= encoding_type::UTF8;
+  return count;
+}
+
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static size_t
+rvv_utf32_length_from_utf16(const char16_t *src, size_t len) {
+  size_t count = 0;
+  for (size_t vl; len > 0; len -= vl, src += vl) {
+    vl = __riscv_vsetvl_e16m8(len);
+    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
+    v = simdutf_byteflip<bflip>(v, vl);
+    vbool2_t notHigh =
+        __riscv_vmor_mm_b2(__riscv_vmsgtu_vx_u16m8_b2(v, 0xDFFF, vl),
+                           __riscv_vmsltu_vx_u16m8_b2(v, 0xDC00, vl), vl);
+    count += __riscv_vcpop_m_b2(notHigh, vl);
   }
-  if ((length % 2) == 0) {
-    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
-                         length / 2)) {
-      out |= encoding_type::UTF16_LE;
-    }
+  return count;
+}
+
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *src, size_t len) const noexcept {
+  return rvv_utf32_length_from_utf16<simdutf_ByteFlip::NONE>(src, len);
+}
+
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *src, size_t len) const noexcept {
+  if (supports_zvbb())
+    return rvv_utf32_length_from_utf16<simdutf_ByteFlip::ZVBB>(src, len);
+  else
+    return rvv_utf32_length_from_utf16<simdutf_ByteFlip::V>(src, len);
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
+    const char *src, size_t len) const noexcept {
+  size_t count = len;
+  for (size_t vl; len > 0; len -= vl, src += vl) {
+    vl = __riscv_vsetvl_e8m8(len);
+    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
+    count += __riscv_vcpop_m_b1(__riscv_vmslt_vx_i8m8_b1(v, 0, vl), vl);
   }
-  if ((length % 4) == 0) {
-    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
-      out |= encoding_type::UTF32_LE;
-    }
+  return count;
+}
+
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static size_t
+rvv_utf8_length_from_utf16(const char16_t *src, size_t len) {
+  size_t count = 0;
+  for (size_t vl; len > 0; len -= vl, src += vl) {
+    vl = __riscv_vsetvl_e16m8(len);
+    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
+    v = simdutf_byteflip<bflip>(v, vl);
+    vbool2_t m234 = __riscv_vmsgtu_vx_u16m8_b2(v, 0x7F, vl);
+    vbool2_t m34 = __riscv_vmsgtu_vx_u16m8_b2(v, 0x7FF, vl);
+    vbool2_t notSur =
+        __riscv_vmor_mm_b2(__riscv_vmsltu_vx_u16m8_b2(v, 0xD800, vl),
+                           __riscv_vmsgtu_vx_u16m8_b2(v, 0xDFFF, vl), vl);
+    vbool2_t m3 = __riscv_vmand_mm_b2(m34, notSur, vl);
+    count += vl + __riscv_vcpop_m_b2(m234, vl) + __riscv_vcpop_m_b2(m3, vl);
+  }
+  return count;
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *src, size_t len) const noexcept {
+  return rvv_utf8_length_from_utf16<simdutf_ByteFlip::NONE>(src, len);
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *src, size_t len) const noexcept {
+  if (supports_zvbb())
+    return rvv_utf8_length_from_utf16<simdutf_ByteFlip::ZVBB>(src, len);
+  else
+    return rvv_utf8_length_from_utf16<simdutf_ByteFlip::V>(src, len);
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *src, size_t len) const noexcept {
+  size_t count = 0;
+  for (size_t vl; len > 0; len -= vl, src += vl) {
+    vl = __riscv_vsetvl_e32m8(len);
+    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
+    vbool4_t m234 = __riscv_vmsgtu_vx_u32m8_b4(v, 0x7F, vl);
+    vbool4_t m34 = __riscv_vmsgtu_vx_u32m8_b4(v, 0x7FF, vl);
+    vbool4_t m4 = __riscv_vmsgtu_vx_u32m8_b4(v, 0xFFFF, vl);
+    count += vl + __riscv_vcpop_m_b4(m234, vl) + __riscv_vcpop_m_b4(m34, vl) +
+             __riscv_vcpop_m_b4(m4, vl);
+  }
+  return count;
+}
+
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *src, size_t len) const noexcept {
+  size_t count = 0;
+  for (size_t vl; len > 0; len -= vl, src += vl) {
+    vl = __riscv_vsetvl_e8m8(len);
+    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
+    vbool1_t m1234 = __riscv_vmsgt_vx_i8m8_b1(v, -65, vl);
+    vbool1_t m4 = __riscv_vmsgtu_vx_u8m8_b1(__riscv_vreinterpret_u8m8(v),
+                                            (uint8_t)0b11101111, vl);
+    count += __riscv_vcpop_m_b1(m1234, vl) + __riscv_vcpop_m_b1(m4, vl);
+  }
+  return count;
+}
+
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *src, size_t len) const noexcept {
+  size_t count = 0;
+  for (size_t vl; len > 0; len -= vl, src += vl) {
+    vl = __riscv_vsetvl_e32m8(len);
+    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
+    vbool4_t m4 = __riscv_vmsgtu_vx_u32m8_b4(v, 0xFFFF, vl);
+    count += vl + __riscv_vcpop_m_b4(m4, vl);
+  }
+  return count;
+}
+/* end file src/rvv/rvv_length_from.inl.cpp */
+/* begin file src/rvv/rvv_validate.inl.cpp */
+
+
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *src, size_t len) const noexcept {
+  size_t vlmax = __riscv_vsetvlmax_e8m8();
+  vint8m8_t mask = __riscv_vmv_v_x_i8m8(0, vlmax);
+  for (size_t vl; len > 0; len -= vl, src += vl) {
+    vl = __riscv_vsetvl_e8m8(len);
+    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
+    mask = __riscv_vor_vv_i8m8_tu(mask, mask, v, vl);
   }
-  return out;
+  return __riscv_vfirst_m_b1(__riscv_vmslt_vx_i8m8_b1(mask, 0, vlmax), vlmax) <
+         0;
 }
 
-simdutf_warn_unused bool
-implementation::validate_utf8(const char *buf, size_t len) const noexcept {
-  if (simdutf_unlikely(len == 0)) {
-    return true;
-  }
-  avx512_utf8_checker checker{};
-  const char *ptr = buf;
-  const char *end = ptr + len;
-  for (; end - ptr >= 64; ptr += 64) {
-    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
-    checker.check_next_input(utf8);
-  }
-  if (end != ptr) {
-    const __m512i utf8 = _mm512_maskz_loadu_epi8(
-        ~UINT64_C(0) >> (64 - (end - ptr)), (const __m512i *)ptr);
-    checker.check_next_input(utf8);
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *src, size_t len) const noexcept {
+  const char *beg = src;
+  for (size_t vl; len > 0; len -= vl, src += vl) {
+    vl = __riscv_vsetvl_e8m8(len);
+    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
+    long idx = __riscv_vfirst_m_b1(__riscv_vmslt_vx_i8m8_b1(v, 0, vl), vl);
+    if (idx >= 0)
+      return result(error_code::TOO_LARGE, src - beg + idx);
   }
-  checker.check_eof();
-  return !checker.errors();
+  return result(error_code::SUCCESS, src - beg);
 }
 
-simdutf_warn_unused result implementation::validate_utf8_with_errors(
-    const char *buf, size_t len) const noexcept {
-  if (simdutf_unlikely(len == 0)) {
-    return result(error_code::SUCCESS, len);
-  }
-  avx512_utf8_checker checker{};
-  const char *ptr = buf;
-  const char *end = ptr + len;
-  size_t count{0};
-  for (; end - ptr >= 64; ptr += 64) {
-    const __m512i utf8 = _mm512_loadu_si512((const __m512i *)ptr);
-    checker.check_next_input(utf8);
-    if (checker.errors()) {
-      if (count != 0) {
-        count--;
-      } // Sometimes the error is only detected in the next chunk
-      result res = scalar::utf8::rewind_and_validate_with_errors(
-          reinterpret_cast<const char *>(buf),
-          reinterpret_cast<const char *>(buf + count), len - count);
-      res.count += count;
-      return res;
-    }
-    count += 64;
-  }
-  if (end != ptr) {
-    const __m512i utf8 = _mm512_maskz_loadu_epi8(
-        ~UINT64_C(0) >> (64 - (end - ptr)), (const __m512i *)ptr);
-    checker.check_next_input(utf8);
+/* Returns a close estimate of the number of valid UTF-8 bytes before the
+ * first invalid one; it never overestimates. */
+simdutf_really_inline static size_t rvv_count_valid_utf8(const char *src,
+                                                         size_t len) {
+  const char *beg = src;
+  if (len < 32)
+    return 0;
+
+  /* validate the first three bytes and any continuation bytes after them */
+  {
+    size_t idx = 3;
+    while (idx < len && (src[idx] >> 6) == 0b10)
+      ++idx;
+    if (idx > 3 + 3 || !scalar::utf8::validate(src, idx))
+      return 0;
   }
-  checker.check_eof();
-  if (checker.errors()) {
-    if (count != 0) {
-      count--;
-    } // Sometimes the error is only detected in the next chunk
-    result res = scalar::utf8::rewind_and_validate_with_errors(
-        reinterpret_cast<const char *>(buf),
-        reinterpret_cast<const char *>(buf + count), len - count);
-    res.count += count;
-    return res;
+
+  static const uint64_t err1m[] = {0x0202020202020202, 0x4915012180808080};
+  static const uint64_t err2m[] = {0xCBCBCB8B8383A3E7, 0xCBCBDBCBCBCBCBCB};
+  static const uint64_t err3m[] = {0x0101010101010101, 0X01010101BABAAEE6};
+
+  const vuint8m1_t err1tbl =
+      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err1m, 2));
+  const vuint8m1_t err2tbl =
+      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err2m, 2));
+  const vuint8m1_t err3tbl =
+      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err3m, 2));
+
+  size_t tail = 3;
+  size_t n = len - tail;
+
+  for (size_t vl; n > 0; n -= vl, src += vl) {
+    vl = __riscv_vsetvl_e8m4(n);
+    vuint8m4_t v0 = __riscv_vle8_v_u8m4((uint8_t const *)src, vl);
+
+    uint8_t next0 = src[vl + 0];
+    uint8_t next1 = src[vl + 1];
+    uint8_t next2 = src[vl + 2];
+
+    /* fast path: ASCII */
+    if (__riscv_vfirst_m_b2(__riscv_vmsgtu_vx_u8m4_b2(v0, 0b01111111, vl), vl) <
+            0 &&
+        (next0 | next1 | next2) < 0b10000000)
+      continue;
+
+    /* see "Validating UTF-8 In Less Than One Instruction Per Byte"
+     * https://arxiv.org/abs/2010.03090 */
+    vuint8m4_t v1 = __riscv_vslide1down_vx_u8m4(v0, next0, vl);
+    vuint8m4_t v2 = __riscv_vslide1down_vx_u8m4(v1, next1, vl);
+    vuint8m4_t v3 = __riscv_vslide1down_vx_u8m4(v2, next2, vl);
+
+    vuint8m4_t s1 = __riscv_vreinterpret_v_u16m4_u8m4(__riscv_vsrl_vx_u16m4(
+        __riscv_vreinterpret_v_u8m4_u16m4(v2), 4, __riscv_vsetvlmax_e16m4()));
+    vuint8m4_t s3 = __riscv_vreinterpret_v_u16m4_u8m4(__riscv_vsrl_vx_u16m4(
+        __riscv_vreinterpret_v_u8m4_u16m4(v3), 4, __riscv_vsetvlmax_e16m4()));
+
+    vuint8m4_t idx2 = __riscv_vand_vx_u8m4(v2, 0xF, vl);
+    vuint8m4_t idx1 = __riscv_vand_vx_u8m4(s1, 0xF, vl);
+    vuint8m4_t idx3 = __riscv_vand_vx_u8m4(s3, 0xF, vl);
+
+    vuint8m4_t err1 = simdutf_vrgather_u8m1x4(err1tbl, idx1);
+    vuint8m4_t err2 = simdutf_vrgather_u8m1x4(err2tbl, idx2);
+    vuint8m4_t err3 = simdutf_vrgather_u8m1x4(err3tbl, idx3);
+    vint8m4_t errs = __riscv_vreinterpret_v_u8m4_i8m4(
+        __riscv_vand_vv_u8m4(__riscv_vand_vv_u8m4(err1, err2, vl), err3, vl));
+
+    vbool2_t is_3 = __riscv_vmsgtu_vx_u8m4_b2(v1, 0b11100000 - 1, vl);
+    vbool2_t is_4 = __riscv_vmsgtu_vx_u8m4_b2(v0, 0b11110000 - 1, vl);
+    vbool2_t is_34 = __riscv_vmor_mm_b2(is_3, is_4, vl);
+    vbool2_t err34 =
+        __riscv_vmxor_mm_b2(is_34, __riscv_vmslt_vx_i8m4_b2(errs, 0, vl), vl);
+    vbool2_t errm =
+        __riscv_vmor_mm_b2(__riscv_vmsgt_vx_i8m4_b2(errs, 0, vl), err34, vl);
+    if (__riscv_vfirst_m_b2(errm, vl) >= 0)
+      break;
   }
-  return result(error_code::SUCCESS, len);
+
+  /* we need to validate the last character */
+  while (tail < len && (src[0] >> 6) == 0b10)
+    --src, ++tail;
+  return src - beg;
 }
 
 simdutf_warn_unused bool
-implementation::validate_ascii(const char *buf, size_t len) const noexcept {
-  return icelake::validate_ascii(buf, len);
+implementation::validate_utf8(const char *src, size_t len) const noexcept {
+  size_t count = rvv_count_valid_utf8(src, len);
+  return scalar::utf8::validate(src + count, len - count);
 }
 
-simdutf_warn_unused result implementation::validate_ascii_with_errors(
-    const char *buf, size_t len) const noexcept {
-  const char *buf_orig = buf;
-  const char *end = buf + len;
-  const __m512i ascii = _mm512_set1_epi8((uint8_t)0x80);
-  for (; end - buf >= 64; buf += 64) {
-    const __m512i input = _mm512_loadu_si512((const __m512i *)buf);
-    __mmask64 notascii = _mm512_cmp_epu8_mask(input, ascii, _MM_CMPINT_NLT);
-    if (notascii) {
-      return result(error_code::TOO_LARGE,
-                    buf - buf_orig + _tzcnt_u64(notascii));
-    }
-  }
-  if (end != buf) {
-    const __m512i input = _mm512_maskz_loadu_epi8(
-        ~UINT64_C(0) >> (64 - (end - buf)), (const __m512i *)buf);
-    __mmask64 notascii = _mm512_cmp_epu8_mask(input, ascii, _MM_CMPINT_NLT);
-    if (notascii) {
-      return result(error_code::TOO_LARGE,
-                    buf - buf_orig + _tzcnt_u64(notascii));
-    }
-  }
-  return result(error_code::SUCCESS, len);
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *src, size_t len) const noexcept {
+  size_t count = rvv_count_valid_utf8(src, len);
+  result res = scalar::utf8::validate_with_errors(src + count, len - count);
+  return result(res.error, count + res.count);
 }
 
 simdutf_warn_unused bool
-implementation::validate_utf16le(const char16_t *buf,
+implementation::validate_utf16le(const char16_t *src,
                                  size_t len) const noexcept {
-  const char16_t *end = buf + len;
-
-  for (; end - buf >= 32;) {
-    __m512i in = _mm512_loadu_si512((__m512i *)buf);
-    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-    __mmask32 surrogates =
-        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-    if (surrogates) {
-      __mmask32 highsurrogates =
-          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-      // high must be followed by low
-      if ((highsurrogates << 1) != lowsurrogates) {
-        return false;
-      }
-      bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
-      if (ends_with_high) {
-        buf += 31; // advance only by 31 code units so that we start with the
-                   // high surrogate on the next round.
-      } else {
-        buf += 32;
-      }
-    } else {
-      buf += 32;
-    }
-  }
-  if (buf < end) {
-    __m512i in =
-        _mm512_maskz_loadu_epi16((1U << (end - buf)) - 1, (__m512i *)buf);
-    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-    __mmask32 surrogates =
-        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-    if (surrogates) {
-      __mmask32 highsurrogates =
-          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-      // high must be followed by low
-      if ((highsurrogates << 1) != lowsurrogates) {
-        return false;
-      }
-    }
-  }
-  return true;
+  return validate_utf16le_with_errors(src, len).error == error_code::SUCCESS;
 }
 
 simdutf_warn_unused bool
-implementation::validate_utf16be(const char16_t *buf,
+implementation::validate_utf16be(const char16_t *src,
                                  size_t len) const noexcept {
-  const char16_t *end = buf + len;
-  const __m512i byteflip = _mm512_setr_epi64(
-      0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
-      0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
-      0x0607040502030001, 0x0e0f0c0d0a0b0809);
-  for (; end - buf >= 32;) {
-    __m512i in =
-        _mm512_shuffle_epi8(_mm512_loadu_si512((__m512i *)buf), byteflip);
-    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-    __mmask32 surrogates =
-        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-    if (surrogates) {
-      __mmask32 highsurrogates =
-          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-      // high must be followed by low
-      if ((highsurrogates << 1) != lowsurrogates) {
-        return false;
-      }
-      bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
-      if (ends_with_high) {
-        buf += 31; // advance only by 31 code units so that we start with the
-                   // high surrogate on the next round.
-      } else {
-        buf += 32;
-      }
-    } else {
-      buf += 32;
+  return validate_utf16be_with_errors(src, len).error == error_code::SUCCESS;
+}
+
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static result
+rvv_validate_utf16_with_errors(const char16_t *src, size_t len) {
+  const char16_t *beg = src;
+  uint16_t last = 0;
+  for (size_t vl; len > 0;
+       len -= vl, src += vl, last = simdutf_byteflip<bflip>(src[-1])) {
+    vl = __riscv_vsetvl_e16m8(len);
+    vuint16m8_t v1 = __riscv_vle16_v_u16m8((const uint16_t *)src, vl);
+    v1 = simdutf_byteflip<bflip>(v1, vl);
+    vuint16m8_t v0 = __riscv_vslide1up_vx_u16m8(v1, last, vl);
+
+    vbool2_t surhi = __riscv_vmseq_vx_u16m8_b2(
+        __riscv_vand_vx_u16m8(v0, 0xFC00, vl), 0xD800, vl);
+    vbool2_t surlo = __riscv_vmseq_vx_u16m8_b2(
+        __riscv_vand_vx_u16m8(v1, 0xFC00, vl), 0xDC00, vl);
+
+    long idx = __riscv_vfirst_m_b2(__riscv_vmxor_mm_b2(surhi, surlo, vl), vl);
+    if (idx >= 0) {
+      last = idx > 0 ? simdutf_byteflip<bflip>(src[idx - 1]) : last;
+      return result(error_code::SURROGATE,
+                    src - beg + idx - (last - 0xD800u < 0x400u));
+      break;
     }
   }
-  if (buf < end) {
-    __m512i in = _mm512_shuffle_epi8(
-        _mm512_maskz_loadu_epi16((1U << (end - buf)) - 1, (__m512i *)buf),
-        byteflip);
-    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-    __mmask32 surrogates =
-        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-    if (surrogates) {
-      __mmask32 highsurrogates =
-          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-      // high must be followed by low
-      if ((highsurrogates << 1) != lowsurrogates) {
-        return false;
-      }
-    }
+  if (last - 0xD800u < 0x400u) {
+    return result(error_code::SURROGATE,
+                  src - beg - 1); /* end on high surrogate */
+  } else {
+    return result(error_code::SUCCESS, src - beg);
   }
-  return true;
 }
 
 simdutf_warn_unused result implementation::validate_utf16le_with_errors(
-    const char16_t *buf, size_t len) const noexcept {
-  const char16_t *start_buf = buf;
-  const char16_t *end = buf + len;
-  for (; end - buf >= 32;) {
-    __m512i in = _mm512_loadu_si512((__m512i *)buf);
-    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-    __mmask32 surrogates =
-        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-    if (surrogates) {
-      __mmask32 highsurrogates =
-          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-      // high must be followed by low
-      if ((highsurrogates << 1) != lowsurrogates) {
-        uint32_t extra_low = _tzcnt_u32(lowsurrogates & ~(highsurrogates << 1));
-        uint32_t extra_high =
-            _tzcnt_u32(highsurrogates & ~(lowsurrogates >> 1));
-        return result(error_code::SURROGATE,
-                      (buf - start_buf) +
-                          (extra_low < extra_high ? extra_low : extra_high));
-      }
-      bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
-      if (ends_with_high) {
-        buf += 31; // advance only by 31 code units so that we start with the
-                   // high surrogate on the next round.
-      } else {
-        buf += 32;
-      }
-    } else {
-      buf += 32;
-    }
-  }
-  if (buf < end) {
-    __m512i in =
-        _mm512_maskz_loadu_epi16((1U << (end - buf)) - 1, (__m512i *)buf);
-    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-    __mmask32 surrogates =
-        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-    if (surrogates) {
-      __mmask32 highsurrogates =
-          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-      // high must be followed by low
-      if ((highsurrogates << 1) != lowsurrogates) {
-        uint32_t extra_low = _tzcnt_u32(lowsurrogates & ~(highsurrogates << 1));
-        uint32_t extra_high =
-            _tzcnt_u32(highsurrogates & ~(lowsurrogates >> 1));
-        return result(error_code::SURROGATE,
-                      (buf - start_buf) +
-                          (extra_low < extra_high ? extra_low : extra_high));
-      }
-    }
-  }
-  return result(error_code::SUCCESS, len);
+    const char16_t *src, size_t len) const noexcept {
+  return rvv_validate_utf16_with_errors<simdutf_ByteFlip::NONE>(src, len);
 }
 
 simdutf_warn_unused result implementation::validate_utf16be_with_errors(
-    const char16_t *buf, size_t len) const noexcept {
-  const char16_t *start_buf = buf;
-  const char16_t *end = buf + len;
-  const __m512i byteflip = _mm512_setr_epi64(
-      0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
-      0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
-      0x0607040502030001, 0x0e0f0c0d0a0b0809);
-  for (; end - buf >= 32;) {
-    __m512i in =
-        _mm512_shuffle_epi8(_mm512_loadu_si512((__m512i *)buf), byteflip);
-    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-    __mmask32 surrogates =
-        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-    if (surrogates) {
-      __mmask32 highsurrogates =
-          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-      // high must be followed by low
-      if ((highsurrogates << 1) != lowsurrogates) {
-        uint32_t extra_low = _tzcnt_u32(lowsurrogates & ~(highsurrogates << 1));
-        uint32_t extra_high =
-            _tzcnt_u32(highsurrogates & ~(lowsurrogates >> 1));
-        return result(error_code::SURROGATE,
-                      (buf - start_buf) +
-                          (extra_low < extra_high ? extra_low : extra_high));
-      }
-      bool ends_with_high = ((highsurrogates & 0x80000000) != 0);
-      if (ends_with_high) {
-        buf += 31; // advance only by 31 code units so that we start with the
-                   // high surrogate on the next round.
-      } else {
-        buf += 32;
-      }
-    } else {
-      buf += 32;
-    }
-  }
-  if (buf < end) {
-    __m512i in = _mm512_shuffle_epi8(
-        _mm512_maskz_loadu_epi16((1U << (end - buf)) - 1, (__m512i *)buf),
-        byteflip);
-    __m512i diff = _mm512_sub_epi16(in, _mm512_set1_epi16(uint16_t(0xD800)));
-    __mmask32 surrogates =
-        _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0800)));
-    if (surrogates) {
-      __mmask32 highsurrogates =
-          _mm512_cmplt_epu16_mask(diff, _mm512_set1_epi16(uint16_t(0x0400)));
-      __mmask32 lowsurrogates = surrogates ^ highsurrogates;
-      // high must be followed by low
-      if ((highsurrogates << 1) != lowsurrogates) {
-        uint32_t extra_low = _tzcnt_u32(lowsurrogates & ~(highsurrogates << 1));
-        uint32_t extra_high =
-            _tzcnt_u32(highsurrogates & ~(lowsurrogates >> 1));
-        return result(error_code::SURROGATE,
-                      (buf - start_buf) +
-                          (extra_low < extra_high ? extra_low : extra_high));
-      }
-    }
-  }
-  return result(error_code::SUCCESS, len);
+    const char16_t *src, size_t len) const noexcept {
+  if (supports_zvbb())
+    return rvv_validate_utf16_with_errors<simdutf_ByteFlip::ZVBB>(src, len);
+  else
+    return rvv_validate_utf16_with_errors<simdutf_ByteFlip::V>(src, len);
 }
 
 simdutf_warn_unused bool
-implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
-  const char32_t *tail = icelake::validate_utf32(buf, len);
-  if (tail) {
-    return scalar::utf32::validate(tail, len - (tail - buf));
-  } else {
-    // we come here if there was an error, or buf was nullptr which may happen
-    // for empty input.
-    return len == 0;
+implementation::validate_utf32(const char32_t *src, size_t len) const noexcept {
+  size_t vlmax = __riscv_vsetvlmax_e32m8();
+  vuint32m8_t max = __riscv_vmv_v_x_u32m8(0x10FFFF, vlmax);
+  vuint32m8_t maxOff = __riscv_vmv_v_x_u32m8(0xFFFFF7FF, vlmax);
+  for (size_t vl; len > 0; len -= vl, src += vl) {
+    vl = __riscv_vsetvl_e32m8(len);
+    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
+    vuint32m8_t off = __riscv_vadd_vx_u32m8(v, 0xFFFF2000, vl);
+    max = __riscv_vmaxu_vv_u32m8_tu(max, max, v, vl);
+    maxOff = __riscv_vmaxu_vv_u32m8_tu(maxOff, maxOff, off, vl);
   }
+  return __riscv_vfirst_m_b4(
+             __riscv_vmor_mm_b4(
+                 __riscv_vmsne_vx_u32m8_b4(max, 0x10FFFF, vlmax),
+                 __riscv_vmsne_vx_u32m8_b4(maxOff, 0xFFFFF7FF, vlmax), vlmax),
+             vlmax) < 0;
 }
 
 simdutf_warn_unused result implementation::validate_utf32_with_errors(
-    const char32_t *buf, size_t len) const noexcept {
-  const char32_t *buf_orig = buf;
-  if (len >= 16) {
-    const char32_t *end = buf + len - 16;
-    while (buf <= end) {
-      __m512i utf32 = _mm512_loadu_si512((const __m512i *)buf);
-      __mmask16 outside_range = _mm512_cmp_epu32_mask(
-          utf32, _mm512_set1_epi32(0x10ffff), _MM_CMPINT_GT);
-
-      __m512i utf32_off =
-          _mm512_add_epi32(utf32, _mm512_set1_epi32(0xffff2000));
-
-      __mmask16 surrogate_range = _mm512_cmp_epu32_mask(
-          utf32_off, _mm512_set1_epi32(0xfffff7ff), _MM_CMPINT_GT);
-      if ((outside_range | surrogate_range)) {
-        auto outside_idx = _tzcnt_u32(outside_range);
-        auto surrogate_idx = _tzcnt_u32(surrogate_range);
-
-        if (outside_idx < surrogate_idx) {
-          return result(error_code::TOO_LARGE, buf - buf_orig + outside_idx);
-        }
-
-        return result(error_code::SURROGATE, buf - buf_orig + surrogate_idx);
-      }
-
-      buf += 16;
-    }
-  }
-  if (len > 0) {
-    __m512i utf32 = _mm512_maskz_loadu_epi32(
-        __mmask16((1U << (buf_orig + len - buf)) - 1), (const __m512i *)buf);
-    __mmask16 outside_range = _mm512_cmp_epu32_mask(
-        utf32, _mm512_set1_epi32(0x10ffff), _MM_CMPINT_GT);
-    __m512i utf32_off = _mm512_add_epi32(utf32, _mm512_set1_epi32(0xffff2000));
-
-    __mmask16 surrogate_range = _mm512_cmp_epu32_mask(
-        utf32_off, _mm512_set1_epi32(0xfffff7ff), _MM_CMPINT_GT);
-    if ((outside_range | surrogate_range)) {
-      auto outside_idx = _tzcnt_u32(outside_range);
-      auto surrogate_idx = _tzcnt_u32(surrogate_range);
-
-      if (outside_idx < surrogate_idx) {
-        return result(error_code::TOO_LARGE, buf - buf_orig + outside_idx);
+    const char32_t *src, size_t len) const noexcept {
+  const char32_t *beg = src;
+  for (size_t vl; len > 0; len -= vl, src += vl) {
+    vl = __riscv_vsetvl_e32m8(len);
+    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
+    vuint32m8_t off = __riscv_vadd_vx_u32m8(v, 0xFFFF2000, vl);
+    long idx1 =
+        __riscv_vfirst_m_b4(__riscv_vmsgtu_vx_u32m8_b4(v, 0x10FFFF, vl), vl);
+    long idx2 = __riscv_vfirst_m_b4(
+        __riscv_vmsgtu_vx_u32m8_b4(off, 0xFFFFF7FF, vl), vl);
+    if (idx1 >= 0 && idx2 >= 0) {
+      if (idx1 <= idx2) {
+        return result(error_code::TOO_LARGE, src - beg + idx1);
+      } else {
+        return result(error_code::SURROGATE, src - beg + idx2);
       }
-
-      return result(error_code::SURROGATE, buf - buf_orig + surrogate_idx);
+    }
+    if (idx1 >= 0) {
+      return result(error_code::TOO_LARGE, src - beg + idx1);
+    }
+    if (idx2 >= 0) {
+      return result(error_code::SURROGATE, src - beg + idx2);
     }
   }
-
-  return result(error_code::SUCCESS, len);
+  return result(error_code::SUCCESS, src - beg);
 }
+/* end file src/rvv/rvv_validate.inl.cpp */
+
+/* begin file src/rvv/rvv_latin1_to.inl.cpp */
 
 simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
-    const char *buf, size_t len, char *utf8_output) const noexcept {
-  return icelake::latin1_to_utf8_avx512_start(buf, len, utf8_output);
+    const char *src, size_t len, char *dst) const noexcept {
+  char *beg = dst;
+  for (size_t vl, vlOut; len > 0; len -= vl, src += vl, dst += vlOut) {
+    vl = __riscv_vsetvl_e8m2(len);
+    vuint8m2_t v1 = __riscv_vle8_v_u8m2((uint8_t *)src, vl);
+    vbool4_t nascii =
+        __riscv_vmslt_vx_i8m2_b4(__riscv_vreinterpret_v_u8m2_i8m2(v1), 0, vl);
+    size_t cnt = __riscv_vcpop_m_b4(nascii, vl);
+    vlOut = vl + cnt;
+    if (cnt == 0) {
+      __riscv_vse8_v_u8m2((uint8_t *)dst, v1, vlOut);
+      continue;
+    }
+
+    vuint8m2_t v0 =
+        __riscv_vor_vx_u8m2(__riscv_vsrl_vx_u8m2(v1, 6, vl), 0b11000000, vl);
+    v1 = __riscv_vand_vx_u8m2_mu(nascii, v1, v1, 0b10111111, vl);
+
+    vuint8m4_t wide =
+        __riscv_vreinterpret_v_u16m4_u8m4(__riscv_vwmaccu_vx_u16m4(
+            __riscv_vwaddu_vv_u16m4(v0, v1, vl), 0xFF, v1, vl));
+    vbool2_t mask = __riscv_vmsgtu_vx_u8m4_b2(
+        __riscv_vsub_vx_u8m4(wide, 0b11000000, vl * 2), 1, vl * 2);
+    vuint8m4_t comp = __riscv_vcompress_vm_u8m4(wide, mask, vl * 2);
+
+    __riscv_vse8_v_u8m4((uint8_t *)dst, comp, vlOut);
+  }
+  return dst - beg;
 }
 
 simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return icelake_convert_latin1_to_utf16<endianness::LITTLE>(buf, len,
-                                                             utf16_output);
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  char16_t *beg = dst;
+  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
+    vl = __riscv_vsetvl_e8m4(len);
+    vuint8m4_t v = __riscv_vle8_v_u8m4((uint8_t *)src, vl);
+    __riscv_vse16_v_u16m8((uint16_t *)dst, __riscv_vzext_vf2_u16m8(v, vl), vl);
+  }
+  return dst - beg;
 }
 
 simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return icelake_convert_latin1_to_utf16<endianness::BIG>(buf, len,
-                                                          utf16_output);
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  char16_t *beg = dst;
+  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
+    vl = __riscv_vsetvl_e8m4(len);
+    vuint8m4_t v = __riscv_vle8_v_u8m4((uint8_t *)src, vl);
+    __riscv_vse16_v_u16m8(
+        (uint16_t *)dst,
+        __riscv_vsll_vx_u16m8(__riscv_vzext_vf2_u16m8(v, vl), 8, vl), vl);
+  }
+  return dst - beg;
 }
 
 simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
-    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
-  std::pair<const char *, char32_t *> ret =
-      avx512_convert_latin1_to_utf32(buf, len, utf32_output);
-  if (ret.first == nullptr) {
-    return 0;
+    const char *src, size_t len, char32_t *dst) const noexcept {
+  char32_t *beg = dst;
+  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
+    vl = __riscv_vsetvl_e8m2(len);
+    vuint8m2_t v = __riscv_vle8_v_u8m2((uint8_t *)src, vl);
+    __riscv_vse32_v_u32m8((uint32_t *)dst, __riscv_vzext_vf4_u32m8(v, vl), vl);
   }
-  size_t converted_chars = ret.second - utf32_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
-        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_converted_chars == 0) {
-      return 0;
-    }
-    converted_chars += scalar_converted_chars;
+  return dst - beg;
+}
+/* end file src/rvv/rvv_latin1_to.inl.cpp */
+/* begin file src/rvv/rvv_utf16_to.inl.cpp */
+#include <cstdio>
+
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static result
+rvv_utf16_to_latin1_with_errors(const char16_t *src, size_t len, char *dst) {
+  const char16_t *const beg = src;
+  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
+    vl = __riscv_vsetvl_e16m8(len);
+    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
+    v = simdutf_byteflip<bflip>(v, vl);
+    long idx = __riscv_vfirst_m_b2(__riscv_vmsgtu_vx_u16m8_b2(v, 255, vl), vl);
+    if (idx >= 0)
+      return result(error_code::TOO_LARGE, src - beg + idx);
+    __riscv_vse8_v_u8m4((uint8_t *)dst, __riscv_vncvt_x_x_w_u8m4(v, vl), vl);
   }
-  return converted_chars;
+  return result(error_code::SUCCESS, src - beg);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
-    const char *buf, size_t len, char *latin1_output) const noexcept {
-  return icelake::utf8_to_latin1_avx512(buf, len, latin1_output);
+simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
+    const char16_t *src, size_t len, char *dst) const noexcept {
+  result res = convert_utf16le_to_latin1_with_errors(src, len, dst);
+  return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
-    const char *buf, size_t len, char *latin1_output) const noexcept {
-  // First, try to convert as much as possible using the SIMD implementation.
-  const char *obuf = buf;
-  char *olatin1_output = latin1_output;
-  size_t written = icelake::utf8_to_latin1_avx512(obuf, len, olatin1_output);
+simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
+    const char16_t *src, size_t len, char *dst) const noexcept {
+  result res = convert_utf16be_to_latin1_with_errors(src, len, dst);
+  return res.error == error_code::SUCCESS ? res.count : 0;
+}
 
-  // If we have completely converted the string
-  if (obuf == buf + len) {
-    return {simdutf::SUCCESS, written};
-  }
-  size_t pos = obuf - buf;
-  result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
-      pos, buf + pos, len - pos, latin1_output);
-  res.count += pos;
-  return res;
+simdutf_warn_unused result
+implementation::convert_utf16le_to_latin1_with_errors(
+    const char16_t *src, size_t len, char *dst) const noexcept {
+  return rvv_utf16_to_latin1_with_errors<simdutf_ByteFlip::NONE>(src, len, dst);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
-    const char *buf, size_t len, char *latin1_output) const noexcept {
-  return icelake::valid_utf8_to_latin1_avx512(buf, len, latin1_output);
+simdutf_warn_unused result
+implementation::convert_utf16be_to_latin1_with_errors(
+    const char16_t *src, size_t len, char *dst) const noexcept {
+  if (supports_zvbb())
+    return rvv_utf16_to_latin1_with_errors<simdutf_ByteFlip::ZVBB>(src, len,
+                                                                   dst);
+  else
+    return rvv_utf16_to_latin1_with_errors<simdutf_ByteFlip::V>(src, len, dst);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  utf8_to_utf16_result ret =
-      fast_avx512_convert_utf8_to_utf16<endianness::LITTLE>(buf, len,
-                                                            utf16_output);
-  if (ret.second == nullptr) {
-    return 0;
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
+    const char16_t *src, size_t len, char *dst) const noexcept {
+  const char16_t *const beg = src;
+  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
+    vl = __riscv_vsetvl_e16m8(len);
+    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
+    __riscv_vse8_v_u8m4((uint8_t *)dst, __riscv_vncvt_x_x_w_u8m4(v, vl), vl);
   }
-  return ret.second - utf16_output;
+  return src - beg;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  utf8_to_utf16_result ret = fast_avx512_convert_utf8_to_utf16<endianness::BIG>(
-      buf, len, utf16_output);
-  if (ret.second == nullptr) {
-    return 0;
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
+    const char16_t *src, size_t len, char *dst) const noexcept {
+  const char16_t *const beg = src;
+  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
+    vl = __riscv_vsetvl_e16m8(len);
+    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
+    __riscv_vse8_v_u8m4((uint8_t *)dst, __riscv_vnsrl_wx_u8m4(v, 8, vl), vl);
   }
-  return ret.second - utf16_output;
+  return src - beg;
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return fast_avx512_convert_utf8_to_utf16_with_errors<endianness::LITTLE>(
-      buf, len, utf16_output);
-}
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static result
+rvv_utf16_to_utf8_with_errors(const char16_t *src, size_t len, char *dst) {
+  size_t n = len;
+  const char16_t *srcBeg = src;
+  const char *dstBeg = dst;
+  size_t vl8m4 = __riscv_vsetvlmax_e8m4();
+  vbool2_t m4mulp2 = __riscv_vmseq_vx_u8m4_b2(
+      __riscv_vand_vx_u8m4(__riscv_vid_v_u8m4(vl8m4), 3, vl8m4), 2, vl8m4);
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return fast_avx512_convert_utf8_to_utf16_with_errors<endianness::BIG>(
-      buf, len, utf16_output);
-}
+  for (size_t vl, vlOut; n > 0;) {
+    vl = __riscv_vsetvl_e16m2(n);
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  utf8_to_utf16_result ret =
-      icelake::valid_utf8_to_fixed_length<endianness::LITTLE, char16_t>(
-          buf, len, utf16_output);
-  size_t saved_bytes = ret.second - utf16_output;
-  const char *end = buf + len;
-  if (ret.first == end) {
-    return saved_bytes;
-  }
+    vuint16m2_t v = __riscv_vle16_v_u16m2((uint16_t const *)src, vl);
+    v = simdutf_byteflip<bflip>(v, vl);
+    vbool8_t m234 = __riscv_vmsgtu_vx_u16m2_b8(v, 0x80 - 1, vl);
 
-  // Note: AVX512 procedure looks up 4 bytes forward, and
-  //       correctly converts multi-byte chars even if their
-  //       continuation bytes lie outsiede 16-byte window.
-  //       It meas, we have to skip continuation bytes from
-  //       the beginning ret.first, as they were already consumed.
-  while (ret.first != end && ((uint8_t(*ret.first) & 0xc0) == 0x80)) {
-    ret.first += 1;
-  }
+    if (__riscv_vfirst_m_b8(m234, vl) < 0) { /* 1 byte utf8 */
+      vlOut = vl;
+      __riscv_vse8_v_u8m1((uint8_t *)dst, __riscv_vncvt_x_x_w_u8m1(v, vlOut),
+                          vlOut);
+      n -= vl, src += vl, dst += vlOut;
+      continue;
+    }
 
-  if (ret.first != end) {
-    const size_t scalar_saved_bytes =
-        scalar::utf8_to_utf16::convert_valid<endianness::LITTLE>(
-            ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
+    vbool8_t m34 = __riscv_vmsgtu_vx_u16m2_b8(v, 0x800 - 1, vl);
+
+    if (__riscv_vfirst_m_b8(m34, vl) < 0) { /* 1/2 byte utf8 */
+      /* 0: [     aaa|aabbbbbb]
+       * 1: [aabbbbbb|        ] vsll 8
+       * 2: [        |   aaaaa] vsrl 6
+       * 3: [00111111|00011111]
+       * 4: [  bbbbbb|000aaaaa] (1|2)&3
+       * 5: [11000000|11000000]
+       * 6: [10bbbbbb|110aaaaa] 4|5 */
+      vuint16m2_t twoByte = __riscv_vand_vx_u16m2(
+          __riscv_vor_vv_u16m2(__riscv_vsll_vx_u16m2(v, 8, vl),
+                               __riscv_vsrl_vx_u16m2(v, 6, vl), vl),
+          0b0011111100011111, vl);
+      vuint16m2_t vout16 =
+          __riscv_vor_vx_u16m2_mu(m234, v, twoByte, 0b1000000011000000, vl);
+      vuint8m2_t vout = __riscv_vreinterpret_v_u16m2_u8m2(vout16);
+
+      /* Drop (compress away) every high byte that is zero. Low bytes
+       * must always be kept, so set them to all ones first and then
+       * build the mask from the non-zero bytes. */
+      vbool4_t mcomp =
+          __riscv_vmsne_vx_u8m2_b4(__riscv_vreinterpret_v_u16m2_u8m2(
+                                       __riscv_vor_vx_u16m2(vout16, 0xFF, vl)),
+                                   0, vl * 2);
+      vlOut = __riscv_vcpop_m_b4(mcomp, vl * 2);
+
+      vout = __riscv_vcompress_vm_u8m2(vout, mcomp, vl * 2);
+      __riscv_vse8_v_u8m2((uint8_t *)dst, vout, vlOut);
+
+      n -= vl, src += vl, dst += vlOut;
+      continue;
     }
-    saved_bytes += scalar_saved_bytes;
-  }
 
-  return saved_bytes;
-}
+    vbool8_t sur = __riscv_vmseq_vx_u16m2_b8(
+        __riscv_vand_vx_u16m2(v, 0xF800, vl), 0xD800, vl);
+    long first = __riscv_vfirst_m_b8(sur, vl);
+    size_t tail = vl - first;
+    vl = first < 0 ? vl : first;
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
-    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
-  utf8_to_utf16_result ret =
-      icelake::valid_utf8_to_fixed_length<endianness::BIG, char16_t>(
-          buf, len, utf16_output);
-  size_t saved_bytes = ret.second - utf16_output;
-  const char *end = buf + len;
-  if (ret.first == end) {
-    return saved_bytes;
-  }
+    if (vl > 0) { /* 1/2/3 byte utf8 */
+      /* in: [aaaabbbb|bbcccccc]
+       * v1: [0bcccccc|        ] vsll  8
+       * v1: [10cccccc|        ] vsll  8 & 0b00111111 | 0b10000000
+       * v2: [        |110bbbbb] vsrl  6 & 0b00111111 | 0b11000000
+       * v2: [        |10bbbbbb] vsrl  6 & 0b00111111 | 0b10000000
+       * v3: [        |1110aaaa] vsrl 12 | 0b11100000
+       *  1: [00000000|0bcccccc|00000000|00000000] => [0bcccccc]
+       *  2: [00000000|10cccccc|110bbbbb|00000000] => [110bbbbb] [10cccccc]
+       *  3: [00000000|10cccccc|10bbbbbb|1110aaaa] => [1110aaaa] [10bbbbbb]
+       *                                              [10cccccc]
+       */
+      vuint16m2_t v1, v2, v3, v12;
+      v1 = __riscv_vor_vx_u16m2_mu(
+          m234, v, __riscv_vand_vx_u16m2(v, 0b00111111, vl), 0b10000000, vl);
+      v1 = __riscv_vsll_vx_u16m2(v1, 8, vl);
 
-  // Note: AVX512 procedure looks up 4 bytes forward, and
-  //       correctly converts multi-byte chars even if their
-  //       continuation bytes lie outsiede 16-byte window.
-  //       It meas, we have to skip continuation bytes from
-  //       the beginning ret.first, as they were already consumed.
-  while (ret.first != end && ((uint8_t(*ret.first) & 0xc0) == 0x80)) {
-    ret.first += 1;
-  }
+      v2 = __riscv_vor_vx_u16m2(
+          __riscv_vand_vx_u16m2(__riscv_vsrl_vx_u16m2(v, 6, vl), 0b00111111,
+                                vl),
+          0b10000000, vl);
+      v2 = __riscv_vor_vx_u16m2_mu(__riscv_vmnot_m_b8(m34, vl), v2, v2,
+                                   0b01000000, vl);
+      v3 = __riscv_vor_vx_u16m2(__riscv_vsrl_vx_u16m2(v, 12, vl), 0b11100000,
+                                vl);
+      v12 = __riscv_vor_vv_u16m2_mu(m234, v1, v1, v2, vl);
 
-  if (ret.first != end) {
-    const size_t scalar_saved_bytes =
-        scalar::utf8_to_utf16::convert_valid<endianness::BIG>(
-            ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
+      vuint32m4_t w12 = __riscv_vwmulu_vx_u32m4(v12, 1 << 8, vl);
+      vuint32m4_t w123 = __riscv_vwaddu_wv_u32m4_mu(m34, w12, w12, v3, vl);
+      vuint8m4_t vout = __riscv_vreinterpret_v_u32m4_u8m4(w123);
+
+      vbool2_t mcomp = __riscv_vmor_mm_b2(
+          m4mulp2, __riscv_vmsne_vx_u8m4_b2(vout, 0, vl * 4), vl * 4);
+      vlOut = __riscv_vcpop_m_b2(mcomp, vl * 4);
+
+      vout = __riscv_vcompress_vm_u8m4(vout, mcomp, vl * 4);
+      __riscv_vse8_v_u8m4((uint8_t *)dst, vout, vlOut);
+
+      n -= vl, src += vl, dst += vlOut;
     }
-    saved_bytes += scalar_saved_bytes;
+
+    if (tail)
+      while (n) {
+        uint16_t word = simdutf_byteflip<bflip>(src[0]);
+        if ((word & 0xFF80) == 0) {
+          break;
+        } else if ((word & 0xF800) == 0) {
+          break;
+        } else if ((word & 0xF800) != 0xD800) {
+          break;
+        } else {
+          // must be a surrogate pair
+          if (n <= 1)
+            return result(error_code::SURROGATE, src - srcBeg);
+          uint16_t diff = word - 0xD800;
+          if (diff > 0x3FF)
+            return result(error_code::SURROGATE, src - srcBeg);
+          uint16_t diff2 = simdutf_byteflip<bflip>(src[1]) - 0xDC00;
+          if (diff2 > 0x3FF)
+            return result(error_code::SURROGATE, src - srcBeg);
+
+          uint32_t value = ((diff + 0x40) << 10) + diff2;
+
+          // will generate four UTF-8 bytes
+          // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
+          *dst++ = (char)((value >> 18) | 0b11110000);
+          *dst++ = (char)(((value >> 12) & 0b111111) | 0b10000000);
+          *dst++ = (char)(((value >> 6) & 0b111111) | 0b10000000);
+          *dst++ = (char)((value & 0b111111) | 0b10000000);
+          src += 2;
+          n -= 2;
+        }
+      }
   }
 
-  return saved_bytes;
+  return result(error_code::SUCCESS, dst - dstBeg);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
-    const char *buf, size_t len, char32_t *utf32_out) const noexcept {
-  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
-  utf8_to_utf32_result ret =
-      icelake::validating_utf8_to_fixed_length<endianness::LITTLE, uint32_t>(
-          buf, len, utf32_output);
-  if (ret.second == nullptr)
-    return 0;
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *src, size_t len, char *dst) const noexcept {
+  result res = convert_utf16le_to_utf8_with_errors(src, len, dst);
+  return res.error == error_code::SUCCESS ? res.count : 0;
+}
 
-  size_t saved_bytes = ret.second - utf32_output;
-  const char *end = buf + len;
-  if (ret.first == end) {
-    return saved_bytes;
-  }
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *src, size_t len, char *dst) const noexcept {
+  result res = convert_utf16be_to_utf8_with_errors(src, len, dst);
+  return res.error == error_code::SUCCESS ? res.count : 0;
+}
 
-  // Note: the AVX512 procedure looks up 4 bytes forward, and
-  //       correctly converts multi-byte chars even if their
-  //       continuation bytes lie outside 16-byte window.
-  //       It means, we have to skip continuation bytes from
-  //       the beginning ret.first, as they were already consumed.
-  while (ret.first != end && ((uint8_t(*ret.first) & 0xc0) == 0x80)) {
-    ret.first += 1;
-  }
-  if (ret.first != end) {
-    const size_t scalar_saved_bytes = scalar::utf8_to_utf32::convert(
-        ret.first, len - (ret.first - buf), utf32_out + saved_bytes);
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *src, size_t len, char *dst) const noexcept {
+  return rvv_utf16_to_utf8_with_errors<simdutf_ByteFlip::NONE>(src, len, dst);
+}
 
-  return saved_bytes;
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *src, size_t len, char *dst) const noexcept {
+  if (supports_zvbb())
+    return rvv_utf16_to_utf8_with_errors<simdutf_ByteFlip::ZVBB>(src, len, dst);
+  else
+    return rvv_utf16_to_utf8_with_errors<simdutf_ByteFlip::V>(src, len, dst);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
-    const char *buf, size_t len, char32_t *utf32) const noexcept {
-  if (simdutf_unlikely(len == 0)) {
-    return {error_code::SUCCESS, 0};
-  }
-  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32);
-  auto ret = icelake::validating_utf8_to_fixed_length_with_constant_checks<
-      endianness::LITTLE, uint32_t>(buf, len, utf32_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *src, size_t len, char *dst) const noexcept {
+  return convert_utf16le_to_utf8(src, len, dst);
+}
 
-  if (!std::get<2>(ret)) {
-    size_t pos = std::get<0>(ret) - buf;
-    // We might have an error that occurs right before  pos.
-    // This is only a concern if buf[pos] is not a continuation byte.
-    if ((buf[pos] & 0xc0) != 0x80 && pos >= 64) {
-      pos -= 1;
-    } else if ((buf[pos] & 0xc0) == 0x80 && pos >= 64) {
-      // We must check whether we are the fourth continuation byte
-      bool c1 = (buf[pos - 1] & 0xc0) == 0x80;
-      bool c2 = (buf[pos - 2] & 0xc0) == 0x80;
-      bool c3 = (buf[pos - 3] & 0xc0) == 0x80;
-      if (c1 && c2 && c3) {
-        return {simdutf::TOO_LONG, pos};
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *src, size_t len, char *dst) const noexcept {
+  return convert_utf16be_to_utf8(src, len, dst);
+}
+
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static result
+rvv_utf16_to_utf32_with_errors(const char16_t *src, size_t len, char32_t *dst) {
+  const char16_t *const srcBeg = src;
+  char32_t *const dstBeg = dst;
+
+  constexpr const uint16_t ANY_SURROGATE_MASK = 0xf800;
+  constexpr const uint16_t ANY_SURROGATE_VALUE = 0xd800;
+  constexpr const uint16_t LO_SURROGATE_MASK = 0xfc00;
+  constexpr const uint16_t LO_SURROGATE_VALUE = 0xdc00;
+  constexpr const uint16_t HI_SURROGATE_MASK = 0xfc00;
+  constexpr const uint16_t HI_SURROGATE_VALUE = 0xd800;
+
+  uint16_t last = 0;
+  while (len > 0) {
+    size_t vl = __riscv_vsetvl_e16m2(len);
+    vuint16m2_t v0 = __riscv_vle16_v_u16m2((uint16_t const *)src, vl);
+    v0 = simdutf_byteflip<bflip>(v0, vl);
+
+    { // check fast-path
+      const vuint16m2_t v = __riscv_vand_vx_u16m2(v0, ANY_SURROGATE_MASK, vl);
+      const vbool8_t any_surrogate =
+          __riscv_vmseq_vx_u16m2_b8(v, ANY_SURROGATE_VALUE, vl);
+      if (__riscv_vfirst_m_b8(any_surrogate, vl) < 0) {
+        /* no surrogates */
+        __riscv_vse32_v_u32m4((uint32_t *)dst, __riscv_vzext_vf2_u32m4(v0, vl),
+                              vl);
+        len -= vl;
+        src += vl;
+        dst += vl;
+        continue;
       }
     }
-    // todo: we reset the output to utf32 instead of using std::get<2.(ret) as
-    // you'd expect. that is because
-    // validating_utf8_to_fixed_length_with_constant_checks may have processed
-    // data beyond the error.
-    result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
-        pos, buf + pos, len - pos, utf32);
-    res.count += pos;
-    return res;
-  }
-  size_t saved_bytes = std::get<1>(ret) - utf32_output;
-  const char *end = buf + len;
-  if (std::get<0>(ret) == end) {
-    return {simdutf::SUCCESS, saved_bytes};
-  }
 
-  // Note: the AVX512 procedure looks up 4 bytes forward, and
-  //       correctly converts multi-byte chars even if their
-  //       continuation bytes lie outside 16-byte window.
-  //       It means, we have to skip continuation bytes from
-  //       the beginning ret.first, as they were already consumed.
-  while (std::get<0>(ret) != end and
-         ((uint8_t(*std::get<0>(ret)) & 0xc0) == 0x80)) {
-    std::get<0>(ret) += 1;
-  }
+    if ((simdutf_byteflip<bflip>(src[0]) & LO_SURROGATE_MASK) ==
+        LO_SURROGATE_VALUE) {
+      return result(error_code::SURROGATE, src - srcBeg);
+    }
+
+    // decode surrogates
+    vuint16m2_t v1 = __riscv_vslide1down_vx_u16m2(v0, 0, vl);
+    vl = __riscv_vsetvl_e16m2(vl - 1);
+    if (vl == 0) {
+      return result(error_code::SURROGATE, src - srcBeg);
+    }
+
+    const vbool8_t surhi = __riscv_vmseq_vx_u16m2_b8(
+        __riscv_vand_vx_u16m2(v0, HI_SURROGATE_MASK, vl), HI_SURROGATE_VALUE,
+        vl);
+    const vbool8_t surlo = __riscv_vmseq_vx_u16m2_b8(
+        __riscv_vand_vx_u16m2(v1, LO_SURROGATE_MASK, vl), LO_SURROGATE_VALUE,
+        vl);
+
+    // compress everything but lo surrogates
+    const vbool8_t compress = __riscv_vmsne_vx_u16m2_b8(
+        __riscv_vand_vx_u16m2(v0, LO_SURROGATE_MASK, vl), LO_SURROGATE_VALUE,
+        vl);
 
-  if (std::get<0>(ret) != end) {
-    auto scalar_result = scalar::utf8_to_utf32::convert_with_errors(
-        std::get<0>(ret), len - (std::get<0>(ret) - buf),
-        reinterpret_cast<char32_t *>(utf32_output) + saved_bytes);
-    if (scalar_result.error != simdutf::SUCCESS) {
-      scalar_result.count += (std::get<0>(ret) - buf);
-    } else {
-      scalar_result.count += saved_bytes;
+    {
+      const vbool8_t diff = __riscv_vmxor_mm_b8(surhi, surlo, vl);
+      const long idx = __riscv_vfirst_m_b8(diff, vl);
+      if (idx >= 0) {
+        uint16_t word = simdutf_byteflip<bflip>(src[idx]);
+        if (word < 0xD800 || word > 0xDBFF) {
+          return result(error_code::SURROGATE, src - srcBeg + idx + 1);
+        }
+        return result(error_code::SURROGATE, src - srcBeg + idx);
+      }
     }
-    return scalar_result;
-  }
 
-  return {simdutf::SUCCESS, size_t(std::get<1>(ret) - utf32_output)};
-}
+    last = simdutf_byteflip<bflip>(src[vl]);
+    vuint32m4_t utf32 = __riscv_vzext_vf2_u32m4(v0, vl);
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
-    const char *buf, size_t len, char32_t *utf32_out) const noexcept {
-  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
-  utf8_to_utf32_result ret =
-      icelake::valid_utf8_to_fixed_length<endianness::LITTLE, uint32_t>(
-          buf, len, utf32_output);
-  size_t saved_bytes = ret.second - utf32_output;
-  const char *end = buf + len;
-  if (ret.first == end) {
-    return saved_bytes;
-  }
+    // v0 = 110110yyyyyyyyyy (0xd800 + yyyyyyyyyy) --- hi surrogate
+    // v1 = 110111xxxxxxxxxx (0xdc00 + xxxxxxxxxx) --- lo surrogate
 
-  // Note: AVX512 procedure looks up 4 bytes forward, and
-  //       correctly converts multi-byte chars even if their
-  //       continuation bytes lie outsiede 16-byte window.
-  //       It meas, we have to skip continuation bytes from
-  //       the beginning ret.first, as they were already consumed.
-  while (ret.first != end && ((uint8_t(*ret.first) & 0xc0) == 0x80)) {
-    ret.first += 1;
-  }
+    // t0 = u32(0000_0000_0000_0000_0000_00yy_yyyy_yyyy)
+    const vuint32m4_t t0 =
+        __riscv_vzext_vf2_u32m4(__riscv_vand_vx_u16m2(v0, 0x03ff, vl), vl);
+    // t1 = u32(0000_0000_0000_yyyy_yyyy_yy00_0000_0000)
+    const vuint32m4_t t1 = __riscv_vsll_vx_u32m4(t0, 10, vl);
 
-  if (ret.first != end) {
-    const size_t scalar_saved_bytes = scalar::utf8_to_utf32::convert_valid(
-        ret.first, len - (ret.first - buf), utf32_out + saved_bytes);
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
+    // t2 = u32(0000_0000_0000_0000_0000_00xx_xxxx_xxxx)
+    const vuint32m4_t t2 =
+        __riscv_vzext_vf2_u32m4(__riscv_vand_vx_u16m2(v1, 0x03ff, vl), vl);
 
-  return saved_bytes;
-}
+    // t3 = u32(0000_0000_0000_yyyy_yyyy_yyxx_xxxx_xxxx)
+    const vuint32m4_t t3 = __riscv_vor_vv_u32m4(t1, t2, vl);
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  return icelake_convert_utf16_to_latin1<endianness::LITTLE>(buf, len,
-                                                             latin1_output);
-}
+    // t4 = utf32 from surrogate pairs
+    const vuint32m4_t t4 = __riscv_vadd_vx_u32m4(t3, 0x10000, vl);
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  return icelake_convert_utf16_to_latin1<endianness::BIG>(buf, len,
-                                                          latin1_output);
-}
+    const vuint32m4_t result = __riscv_vmerge_vvm_u32m4(utf32, t4, surhi, vl);
 
-simdutf_warn_unused result
-implementation::convert_utf16le_to_latin1_with_errors(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  return icelake_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
-             buf, len, latin1_output)
-      .first;
-}
+    const vuint32m4_t comp = __riscv_vcompress_vm_u32m4(result, compress, vl);
+    const size_t vlOut = __riscv_vcpop_m_b8(compress, vl);
+    __riscv_vse32_v_u32m4((uint32_t *)dst, comp, vlOut);
 
-simdutf_warn_unused result
-implementation::convert_utf16be_to_latin1_with_errors(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  return icelake_convert_utf16_to_latin1_with_errors<endianness::BIG>(
-             buf, len, latin1_output)
-      .first;
-}
+    len -= vl;
+    src += vl;
+    dst += vlOut;
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  // optimization opportunity: implement custom function
-  return convert_utf16be_to_latin1(buf, len, latin1_output);
-}
+    if ((last & LO_SURROGATE_MASK) == LO_SURROGATE_VALUE) {
+      // the last item is a lo surrogate and was already consumed
+      len -= 1;
+      src += 1;
+    }
+  }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
-    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  // optimization opportunity: implement custom function
-  return convert_utf16le_to_latin1(buf, len, latin1_output);
+  return result(error_code::SUCCESS, dst - dstBeg);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  size_t outlen;
-  size_t inlen = utf16_to_utf8_avx512i<endianness::LITTLE>(
-      buf, len, (unsigned char *)utf8_output, &outlen);
-  if (inlen != len) {
-    return 0;
-  }
-  return outlen;
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *src, size_t len, char32_t *dst) const noexcept {
+  result res = convert_utf16le_to_utf32_with_errors(src, len, dst);
+  return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  size_t outlen;
-  size_t inlen = utf16_to_utf8_avx512i<endianness::BIG>(
-      buf, len, (unsigned char *)utf8_output, &outlen);
-  if (inlen != len) {
-    return 0;
-  }
-  return outlen;
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *src, size_t len, char32_t *dst) const noexcept {
+  result res = convert_utf16be_to_utf32_with_errors(src, len, dst);
+  return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  size_t outlen;
-  size_t inlen = utf16_to_utf8_avx512i<endianness::LITTLE>(
-      buf, len, (unsigned char *)utf8_output, &outlen);
-  if (inlen != len) {
-    result res = scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
-        buf + inlen, len - inlen, utf8_output + outlen);
-    res.count += inlen;
-    return res;
-  }
-  return {simdutf::SUCCESS, outlen};
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *src, size_t len, char32_t *dst) const noexcept {
+  return rvv_utf16_to_utf32_with_errors<simdutf_ByteFlip::NONE>(src, len, dst);
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  size_t outlen;
-  size_t inlen = utf16_to_utf8_avx512i<endianness::BIG>(
-      buf, len, (unsigned char *)utf8_output, &outlen);
-  if (inlen != len) {
-    result res = scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
-        buf + inlen, len - inlen, utf8_output + outlen);
-    res.count += inlen;
-    return res;
-  }
-  return {simdutf::SUCCESS, outlen};
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *src, size_t len, char32_t *dst) const noexcept {
+  if (supports_zvbb())
+    return rvv_utf16_to_utf32_with_errors<simdutf_ByteFlip::ZVBB>(src, len,
+                                                                  dst);
+  else
+    return rvv_utf16_to_utf32_with_errors<simdutf_ByteFlip::V>(src, len, dst);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return convert_utf16le_to_utf8(buf, len, utf8_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *src, size_t len, char32_t *dst) const noexcept {
+  return convert_utf16le_to_utf32(src, len, dst);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return convert_utf16be_to_utf8(buf, len, utf8_output);
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *src, size_t len, char32_t *dst) const noexcept {
+  return convert_utf16be_to_utf32(src, len, dst);
 }
+/* end file src/rvv/rvv_utf16_to.inl.cpp */
+/* begin file src/rvv/rvv_utf32_to.inl.cpp */
 
 simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
-    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
-  return icelake_convert_utf32_to_latin1(buf, len, latin1_output);
+    const char32_t *src, size_t len, char *dst) const noexcept {
+  result res = convert_utf32_to_latin1_with_errors(src, len, dst);
+  return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
 simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
-    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
-  return icelake_convert_utf32_to_latin1_with_errors(buf, len, latin1_output)
-      .first;
+    const char32_t *src, size_t len, char *dst) const noexcept {
+  const char32_t *const beg = src;
+  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
+    vl = __riscv_vsetvl_e32m8(len);
+    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
+    long idx = __riscv_vfirst_m_b4(__riscv_vmsgtu_vx_u32m8_b4(v, 255, vl), vl);
+    if (idx >= 0)
+      return result(error_code::TOO_LARGE, src - beg + idx);
+    /* We don't use vcompress here, because its performance varies widely on
+     * current platforms. This might be worth reconsidering once there is more
+     * hardware available. */
+    __riscv_vse8_v_u8m2(
+        (uint8_t *)dst,
+        __riscv_vncvt_x_x_w_u8m2(__riscv_vncvt_x_x_w_u16m4(v, vl), vl), vl);
+  }
+  return result(error_code::SUCCESS, src - beg);
 }
 
 simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
-    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
-  return icelake_convert_utf32_to_latin1(buf, len, latin1_output);
+    const char32_t *src, size_t len, char *dst) const noexcept {
+  return convert_utf32_to_latin1(src, len, dst);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
-    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
-  std::pair<const char32_t *, char *> ret =
-      avx512_convert_utf32_to_utf8(buf, len, utf8_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - utf8_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_utf8::convert(
-        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *src, size_t len, char *dst) const noexcept {
+  size_t n = len;
+  const char32_t *srcBeg = src;
+  const char *dstBeg = dst;
+  size_t vl8m4 = __riscv_vsetvlmax_e8m4();
+  vbool2_t m4mulp2 = __riscv_vmseq_vx_u8m4_b2(
+      __riscv_vand_vx_u8m4(__riscv_vid_v_u8m4(vl8m4), 3, vl8m4), 2, vl8m4);
+
+  for (size_t vl, vlOut; n > 0;) {
+    vl = __riscv_vsetvl_e32m4(n);
+
+    vuint32m4_t v = __riscv_vle32_v_u32m4((uint32_t const *)src, vl);
+    vbool8_t m234 = __riscv_vmsgtu_vx_u32m4_b8(v, 0x80 - 1, vl);
+    vuint16m2_t vn = __riscv_vncvt_x_x_w_u16m2(v, vl);
+
+    if (__riscv_vfirst_m_b8(m234, vl) < 0) { /* 1 byte utf8 */
+      vlOut = vl;
+      __riscv_vse8_v_u8m1((uint8_t *)dst, __riscv_vncvt_x_x_w_u8m1(vn, vlOut),
+                          vlOut);
+      n -= vl, src += vl, dst += vlOut;
+      continue;
     }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
-    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of
-  // code units written even if finished
-  std::pair<result, char *> ret =
-      icelake::avx512_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
-  if (ret.first.count != len) {
-    result scalar_res = scalar::utf32_to_utf8::convert_with_errors(
-        buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
-    } else {
-      ret.second += scalar_res.count;
+    vbool8_t m34 = __riscv_vmsgtu_vx_u32m4_b8(v, 0x800 - 1, vl);
+
+    if (__riscv_vfirst_m_b8(m34, vl) < 0) { /* 1/2 byte utf8 */
+      /* 0: [     aaa|aabbbbbb]
+       * 1: [aabbbbbb|        ] vsll 8
+       * 2: [        |   aaaaa] vsrl 6
+       * 3: [00111111|00111111]
+       * 4: [  bbbbbb|000aaaaa] (1|2)&3
+       * 5: [10000000|11000000]
+       * 6: [10bbbbbb|110aaaaa] 4|5 */
+      vuint16m2_t twoByte = __riscv_vand_vx_u16m2(
+          __riscv_vor_vv_u16m2(__riscv_vsll_vx_u16m2(vn, 8, vl),
+                               __riscv_vsrl_vx_u16m2(vn, 6, vl), vl),
+          0b0011111100111111, vl);
+      vuint16m2_t vout16 =
+          __riscv_vor_vx_u16m2_mu(m234, vn, twoByte, 0b1000000011000000, vl);
+      vuint8m2_t vout = __riscv_vreinterpret_v_u16m2_u8m2(vout16);
+
+      /* Every high byte that is zero should be removed by the compress;
+       * low bytes must always be kept, so we set them to all ones
+       * and then build a mask of the non-zero bytes. */
+      vbool4_t mcomp =
+          __riscv_vmsne_vx_u8m2_b4(__riscv_vreinterpret_v_u16m2_u8m2(
+                                       __riscv_vor_vx_u16m2(vout16, 0xFF, vl)),
+                                   0, vl * 2);
+      vlOut = __riscv_vcpop_m_b4(mcomp, vl * 2);
+
+      vout = __riscv_vcompress_vm_u8m2(vout, mcomp, vl * 2);
+      __riscv_vse8_v_u8m2((uint8_t *)dst, vout, vlOut);
+
+      n -= vl, src += vl, dst += vlOut;
+      continue;
+    }
+    long idx1 =
+        __riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0x10FFFF, vl), vl);
+    vbool8_t sur = __riscv_vmseq_vx_u32m4_b8(
+        __riscv_vand_vx_u32m4(v, 0xFFFFF800, vl), 0xD800, vl);
+    long idx2 = __riscv_vfirst_m_b8(sur, vl);
+    if (idx1 >= 0 && idx2 >= 0) {
+      if (idx1 <= idx2) {
+        return result(error_code::TOO_LARGE, src - srcBeg + idx1);
+      } else {
+        return result(error_code::SURROGATE, src - srcBeg + idx2);
+      }
+    }
+    if (idx1 >= 0) {
+      return result(error_code::TOO_LARGE, src - srcBeg + idx1);
+    }
+    if (idx2 >= 0) {
+      return result(error_code::SURROGATE, src - srcBeg + idx2);
     }
-  }
-  ret.first.count =
-      ret.second -
-      utf8_output; // Set count to the number of 8-bit code units written
-  return ret.first;
-}
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
-    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
-  return convert_utf32_to_utf8(buf, len, utf8_output);
-}
+    vbool8_t m4 = __riscv_vmsgtu_vx_u32m4_b8(v, 0x10000 - 1, vl);
+    long first = __riscv_vfirst_m_b8(m4, vl);
+    size_t tail = vl - first;
+    vl = first < 0 ? vl : first;
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  std::pair<const char32_t *, char16_t *> ret =
-      avx512_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - utf16_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf32_to_utf16::convert<endianness::LITTLE>(
-            ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
+    if (vl > 0) { /* 1/2/3 byte utf8 */
+      /* vn: [aaaabbbb|bbcccccc]
+       * v1: [0bcccccc|        ] vsll  8
+       * v1: [10cccccc|        ] vsll  8 & 0b00111111 | 0b10000000
+       * v2: [        |110bbbbb] vsrl  6 & 0b00111111 | 0b11000000
+       * v2: [        |10bbbbbb] vsrl  6 & 0b00111111 | 0b10000000
+       * v3: [        |1110aaaa] vsrl 12 | 0b11100000
+       *  1: [00000000|0bcccccc|00000000|00000000] => [0bcccccc]
+       *  2: [00000000|10cccccc|110bbbbb|00000000] => [110bbbbb] [10cccccc]
+       *  3: [00000000|10cccccc|10bbbbbb|1110aaaa] => [1110aaaa] [10bbbbbb]
+       * [10cccccc]
+       */
+      vuint16m2_t v1, v2, v3, v12;
+      v1 = __riscv_vor_vx_u16m2_mu(
+          m234, vn, __riscv_vand_vx_u16m2(vn, 0b00111111, vl), 0b10000000, vl);
+      v1 = __riscv_vsll_vx_u16m2(v1, 8, vl);
+
+      v2 = __riscv_vor_vx_u16m2(
+          __riscv_vand_vx_u16m2(__riscv_vsrl_vx_u16m2(vn, 6, vl), 0b00111111,
+                                vl),
+          0b10000000, vl);
+      v2 = __riscv_vor_vx_u16m2_mu(__riscv_vmnot_m_b8(m34, vl), v2, v2,
+                                   0b01000000, vl);
+      v3 = __riscv_vor_vx_u16m2(__riscv_vsrl_vx_u16m2(vn, 12, vl), 0b11100000,
+                                vl);
+      v12 = __riscv_vor_vv_u16m2_mu(m234, v1, v1, v2, vl);
+
+      vuint32m4_t w12 = __riscv_vwmulu_vx_u32m4(v12, 1 << 8, vl);
+      vuint32m4_t w123 = __riscv_vwaddu_wv_u32m4_mu(m34, w12, w12, v3, vl);
+      vuint8m4_t vout = __riscv_vreinterpret_v_u32m4_u8m4(w123);
+
+      vbool2_t mcomp = __riscv_vmor_mm_b2(
+          m4mulp2, __riscv_vmsne_vx_u8m4_b2(vout, 0, vl * 4), vl * 4);
+      vlOut = __riscv_vcpop_m_b2(mcomp, vl * 4);
+
+      vout = __riscv_vcompress_vm_u8m4(vout, mcomp, vl * 4);
+      __riscv_vse8_v_u8m4((uint8_t *)dst, vout, vlOut);
+
+      n -= vl, src += vl, dst += vlOut;
     }
-    saved_bytes += scalar_saved_bytes;
+
+    if (tail)
+      while (n) {
+        uint32_t word = src[0];
+        if (word < 0x10000)
+          break;
+        if (word > 0x10FFFF)
+          return result(error_code::TOO_LARGE, src - srcBeg);
+        *dst++ = (uint8_t)((word >> 18) | 0b11110000);
+        *dst++ = (uint8_t)(((word >> 12) & 0b111111) | 0b10000000);
+        *dst++ = (uint8_t)(((word >> 6) & 0b111111) | 0b10000000);
+        *dst++ = (uint8_t)((word & 0b111111) | 0b10000000);
+        ++src;
+        --n;
+      }
   }
-  return saved_bytes;
+
+  return result(error_code::SUCCESS, dst - dstBeg);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  std::pair<const char32_t *, char16_t *> ret =
-      avx512_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - utf16_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf32_to_utf16::convert<endianness::BIG>(
-            ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *src, size_t len, char *dst) const noexcept {
+  result res = convert_utf32_to_utf8_with_errors(src, len, dst);
+  return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of
-  // code units written even if finished
-  std::pair<result, char16_t *> ret =
-      avx512_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(
-          buf, len, utf16_output);
-  if (ret.first.count != len) {
-    result scalar_res =
-        scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
-            buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
-    } else {
-      ret.second += scalar_res.count;
-    }
-  }
-  ret.first.count =
-      ret.second -
-      utf16_output; // Set count to the number of 8-bit code units written
-  return ret.first;
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *src, size_t len, char *dst) const noexcept {
+  return convert_utf32_to_utf8(src, len, dst);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of
-  // code units written even if finished
-  std::pair<result, char16_t *> ret =
-      avx512_convert_utf32_to_utf16_with_errors<endianness::BIG>(buf, len,
-                                                                 utf16_output);
-  if (ret.first.count != len) {
-    result scalar_res =
-        scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
-            buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
-    } else {
-      ret.second += scalar_res.count;
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static result
+rvv_convert_utf32_to_utf16_with_errors(const char32_t *src, size_t len,
+                                       char16_t *dst) {
+  size_t vl8m2 = __riscv_vsetvlmax_e8m2();
+  vbool4_t m4even = __riscv_vmseq_vx_u8m2_b4(
+      __riscv_vand_vx_u8m2(__riscv_vid_v_u8m2(vl8m2), 1, vl8m2), 0, vl8m2);
+  const char16_t *dstBeg = dst;
+  const char32_t *srcBeg = src;
+  for (size_t vl, vlOut; len > 0; len -= vl, src += vl, dst += vlOut) {
+    vl = __riscv_vsetvl_e32m4(len);
+    vuint32m4_t v = __riscv_vle32_v_u32m4((uint32_t *)src, vl);
+    vuint32m4_t off = __riscv_vadd_vx_u32m4(v, 0xFFFF2000, vl);
+    long idx1 =
+        __riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0x10FFFF, vl), vl);
+    long idx2 = __riscv_vfirst_m_b8(
+        __riscv_vmsgtu_vx_u32m4_b8(off, 0xFFFFF7FF, vl), vl);
+    if (idx1 >= 0 && idx2 >= 0) {
+      if (idx1 <= idx2)
+        return result(error_code::TOO_LARGE, src - srcBeg + idx1);
+      return result(error_code::SURROGATE, src - srcBeg + idx2);
+    }
+    if (idx1 >= 0)
+      return result(error_code::TOO_LARGE, src - srcBeg + idx1);
+    if (idx2 >= 0)
+      return result(error_code::SURROGATE, src - srcBeg + idx2);
+    long idx =
+        __riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0xFFFF, vl), vl);
+    if (idx < 0) {
+      vlOut = vl;
+      vuint16m2_t n =
+          simdutf_byteflip<bflip>(__riscv_vncvt_x_x_w_u16m2(v, vlOut), vlOut);
+      __riscv_vse16_v_u16m2((uint16_t *)dst, n, vlOut);
+      continue;
     }
+    vlOut = rvv_utf32_store_utf16_m4<bflip>((uint16_t *)dst, v, vl, m4even);
   }
-  ret.first.count =
-      ret.second -
-      utf16_output; // Set count to the number of 8-bit code units written
-  return ret.first;
+  return result(error_code::SUCCESS, dst - dstBeg);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return convert_utf32_to_utf16le(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *src, size_t len, char16_t *dst) const noexcept {
+  result res = convert_utf32_to_utf16le_with_errors(src, len, dst);
+  return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return convert_utf32_to_utf16be(buf, len, utf16_output);
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *src, size_t len, char16_t *dst) const noexcept {
+  result res = convert_utf32_to_utf16be_with_errors(src, len, dst);
+  return res.error == error_code::SUCCESS ? res.count : 0;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  std::tuple<const char16_t *, char32_t *, bool> ret =
-      icelake::convert_utf16_to_utf32<endianness::LITTLE>(buf, len,
-                                                          utf32_output);
-  if (!std::get<2>(ret)) {
-    return 0;
-  }
-  size_t saved_bytes = std::get<1>(ret) - utf32_output;
-  if (std::get<0>(ret) != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf16_to_utf32::convert<endianness::LITTLE>(
-            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *src, size_t len, char16_t *dst) const noexcept {
+  return rvv_convert_utf32_to_utf16_with_errors<simdutf_ByteFlip::NONE>(
+      src, len, dst);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  std::tuple<const char16_t *, char32_t *, bool> ret =
-      icelake::convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
-  if (!std::get<2>(ret)) {
-    return 0;
-  }
-  size_t saved_bytes = std::get<1>(ret) - utf32_output;
-  if (std::get<0>(ret) != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf16_to_utf32::convert<endianness::BIG>(
-            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *src, size_t len, char16_t *dst) const noexcept {
+  if (supports_zvbb())
+    return rvv_convert_utf32_to_utf16_with_errors<simdutf_ByteFlip::ZVBB>(
+        src, len, dst);
+  else
+    return rvv_convert_utf32_to_utf16_with_errors<simdutf_ByteFlip::V>(src, len,
+                                                                       dst);
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  std::tuple<const char16_t *, char32_t *, bool> ret =
-      icelake::convert_utf16_to_utf32<endianness::LITTLE>(buf, len,
-                                                          utf32_output);
-  if (!std::get<2>(ret)) {
-    result scalar_res =
-        scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
-            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
-    scalar_res.count += (std::get<0>(ret) - buf);
-    return scalar_res;
-  }
-  size_t saved_bytes = std::get<1>(ret) - utf32_output;
-  if (std::get<0>(ret) != buf + len) {
-    result scalar_res =
-        scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
-            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
-    if (scalar_res.error) {
-      scalar_res.count += (std::get<0>(ret) - buf);
-      return scalar_res;
-    } else {
-      scalar_res.count += saved_bytes;
-      return scalar_res;
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static size_t
+rvv_convert_valid_utf32_to_utf16(const char32_t *src, size_t len,
+                                 char16_t *dst) {
+  size_t vl8m2 = __riscv_vsetvlmax_e8m2();
+  vbool4_t m4even = __riscv_vmseq_vx_u8m2_b4(
+      __riscv_vand_vx_u8m2(__riscv_vid_v_u8m2(vl8m2), 1, vl8m2), 0, vl8m2);
+  char16_t *dstBeg = dst;
+  for (size_t vl, vlOut; len > 0; len -= vl, src += vl, dst += vlOut) {
+    vl = __riscv_vsetvl_e32m4(len);
+    vuint32m4_t v = __riscv_vle32_v_u32m4((uint32_t *)src, vl);
+    if (__riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0xFFFF, vl), vl) <
+        0) {
+      vlOut = vl;
+      vuint16m2_t n =
+          simdutf_byteflip<bflip>(__riscv_vncvt_x_x_w_u16m2(v, vlOut), vlOut);
+      __riscv_vse16_v_u16m2((uint16_t *)dst, n, vlOut);
+      continue;
     }
+    vlOut = rvv_utf32_store_utf16_m4<bflip>((uint16_t *)dst, v, vl, m4even);
   }
-  return simdutf::result(simdutf::SUCCESS, saved_bytes);
+  return dst - dstBeg;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  std::tuple<const char16_t *, char32_t *, bool> ret =
-      icelake::convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
-  if (!std::get<2>(ret)) {
-    result scalar_res =
-        scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
-            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
-    scalar_res.count += (std::get<0>(ret) - buf);
-    return scalar_res;
-  }
-  size_t saved_bytes = std::get<1>(ret) - utf32_output;
-  if (std::get<0>(ret) != buf + len) {
-    result scalar_res =
-        scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
-            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
-    if (scalar_res.error) {
-      scalar_res.count += (std::get<0>(ret) - buf);
-      return scalar_res;
-    } else {
-      scalar_res.count += saved_bytes;
-      return scalar_res;
-    }
-  }
-  return simdutf::result(simdutf::SUCCESS, saved_bytes);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *src, size_t len, char16_t *dst) const noexcept {
+  return rvv_convert_valid_utf32_to_utf16<simdutf_ByteFlip::NONE>(src, len,
+                                                                  dst);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  std::tuple<const char16_t *, char32_t *, bool> ret =
-      icelake::convert_utf16_to_utf32<endianness::LITTLE>(buf, len,
-                                                          utf32_output);
-  if (!std::get<2>(ret)) {
-    return 0;
-  }
-  size_t saved_bytes = std::get<1>(ret) - utf32_output;
-  if (std::get<0>(ret) != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf16_to_utf32::convert<endianness::LITTLE>(
-            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *src, size_t len, char16_t *dst) const noexcept {
+  if (supports_zvbb())
+    return rvv_convert_valid_utf32_to_utf16<simdutf_ByteFlip::ZVBB>(src, len,
+                                                                    dst);
+  else
+    return rvv_convert_valid_utf32_to_utf16<simdutf_ByteFlip::V>(src, len, dst);
 }
+/* end file src/rvv/rvv_utf32_to.inl.cpp */
+/* begin file src/rvv/rvv_utf8_to.inl.cpp */
+template <typename Tdst, simdutf_ByteFlip bflip, bool validate = true>
+simdutf_really_inline static size_t rvv_utf8_to_common(char const *src,
+                                                       size_t len, Tdst *dst) {
+  static_assert(std::is_same<Tdst, uint16_t>() ||
+                    std::is_same<Tdst, uint32_t>(),
+                "invalid type");
+  constexpr bool is16 = std::is_same<Tdst, uint16_t>();
+  constexpr endianness endian =
+      bflip == simdutf_ByteFlip::NONE ? endianness::LITTLE : endianness::BIG;
+  const auto scalar = [](char const *in, size_t count, Tdst *out) {
+    return is16 ? scalar::utf8_to_utf16::convert<endian>(in, count,
+                                                         (char16_t *)out)
+                : scalar::utf8_to_utf32::convert(in, count, (char32_t *)out);
+  };
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  std::tuple<const char16_t *, char32_t *, bool> ret =
-      icelake::convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
-  if (!std::get<2>(ret)) {
-    return 0;
-  }
-  size_t saved_bytes = std::get<1>(ret) - utf32_output;
-  if (std::get<0>(ret) != buf + len) {
-    const size_t scalar_saved_bytes =
-        scalar::utf16_to_utf32::convert<endianness::BIG>(
-            std::get<0>(ret), len - (std::get<0>(ret) - buf), std::get<1>(ret));
-    if (scalar_saved_bytes == 0) {
+  if (len < 32)
+    return scalar(src, len, dst);
+
+  /* validate first three bytes */
+  if (validate) {
+    size_t idx = 3;
+    while (idx < len && (src[idx] >> 6) == 0b10)
+      ++idx;
+    if (idx > 3 + 3 || !scalar::utf8::validate(src, idx))
       return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
   }
-  return saved_bytes;
-}
 
-void implementation::change_endianness_utf16(const char16_t *input,
-                                             size_t length,
-                                             char16_t *output) const noexcept {
-  size_t pos = 0;
-  const __m512i byteflip = _mm512_setr_epi64(
-      0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
-      0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
-      0x0607040502030001, 0x0e0f0c0d0a0b0809);
-  while (pos + 32 <= length) {
-    __m512i utf16 = _mm512_loadu_si512((const __m512i *)(input + pos));
-    utf16 = _mm512_shuffle_epi8(utf16, byteflip);
-    _mm512_storeu_si512(output + pos, utf16);
-    pos += 32;
-  }
-  if (pos < length) {
-    __mmask32 m((1U << (length - pos)) - 1);
-    __m512i utf16 = _mm512_maskz_loadu_epi16(m, (const __m512i *)(input + pos));
-    utf16 = _mm512_shuffle_epi8(utf16, byteflip);
-    _mm512_mask_storeu_epi16(output + pos, m, utf16);
-  }
-}
+  size_t tail = 3;
+  size_t n = len - tail;
+  Tdst *beg = dst;
 
-simdutf_warn_unused size_t implementation::count_utf16le(
-    const char16_t *input, size_t length) const noexcept {
-  const char16_t *ptr = input;
-  size_t count{0};
+  static const uint64_t err1m[] = {0x0202020202020202, 0x4915012180808080};
+  static const uint64_t err2m[] = {0xCBCBCB8B8383A3E7, 0xCBCBDBCBCBCBCBCB};
+  static const uint64_t err3m[] = {0x0101010101010101, 0X01010101BABAAEE6};
 
-  if (length >= 32) {
-    const char16_t *end = input + length - 32;
+  const vuint8m1_t err1tbl =
+      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err1m, 2));
+  const vuint8m1_t err2tbl =
+      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err2m, 2));
+  const vuint8m1_t err3tbl =
+      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err3m, 2));
 
-    const __m512i low = _mm512_set1_epi16((uint16_t)0xdc00);
-    const __m512i high = _mm512_set1_epi16((uint16_t)0xdfff);
+  size_t vl8m2 = __riscv_vsetvlmax_e8m2();
+  vbool4_t m4even = __riscv_vmseq_vx_u8m2_b4(
+      __riscv_vand_vx_u8m2(__riscv_vid_v_u8m2(vl8m2), 1, vl8m2), 0, vl8m2);
 
-    while (ptr <= end) {
-      __m512i utf16 = _mm512_loadu_si512((const __m512i *)ptr);
-      ptr += 32;
-      uint64_t not_high_surrogate =
-          static_cast<uint64_t>(_mm512_cmpgt_epu16_mask(utf16, high) |
-                                _mm512_cmplt_epu16_mask(utf16, low));
-      count += count_ones(not_high_surrogate);
+  for (size_t vl, vlOut; n > 0; n -= vl, src += vl, dst += vlOut) {
+    vl = __riscv_vsetvl_e8m2(n);
+
+    vuint8m2_t v0 = __riscv_vle8_v_u8m2((uint8_t const *)src, vl);
+    uint64_t max = __riscv_vmv_x_s_u8m1_u8(
+        __riscv_vredmaxu_vs_u8m2_u8m1(v0, __riscv_vmv_s_x_u8m1(0, vl), vl));
+
+    uint8_t next0 = src[vl + 0];
+    uint8_t next1 = src[vl + 1];
+    uint8_t next2 = src[vl + 2];
+
+    /* fast path: ASCII */
+    if ((max | next0 | next1 | next2) < 0b10000000) {
+      vlOut = vl;
+      if (is16)
+        __riscv_vse16_v_u16m4(
+            (uint16_t *)dst,
+            simdutf_byteflip<bflip>(__riscv_vzext_vf2_u16m4(v0, vlOut), vlOut),
+            vlOut);
+      else
+        __riscv_vse32_v_u32m8((uint32_t *)dst,
+                              __riscv_vzext_vf4_u32m8(v0, vlOut), vlOut);
+      continue;
     }
-  }
 
-  return count + scalar::utf16::count_code_points<endianness::LITTLE>(
-                     ptr, length - (ptr - input));
-}
+    /* see "Validating UTF-8 In Less Than One Instruction Per Byte"
+     * https://arxiv.org/abs/2010.03090 */
+    vuint8m2_t v1 = __riscv_vslide1down_vx_u8m2(v0, next0, vl);
+    vuint8m2_t v2 = __riscv_vslide1down_vx_u8m2(v1, next1, vl);
+    vuint8m2_t v3 = __riscv_vslide1down_vx_u8m2(v2, next2, vl);
 
-simdutf_warn_unused size_t implementation::count_utf16be(
-    const char16_t *input, size_t length) const noexcept {
-  const char16_t *ptr = input;
-  size_t count{0};
-  if (length >= 32) {
+    if (validate) {
+      vuint8m2_t s1 = __riscv_vreinterpret_v_u16m2_u8m2(__riscv_vsrl_vx_u16m2(
+          __riscv_vreinterpret_v_u8m2_u16m2(v2), 4, __riscv_vsetvlmax_e16m2()));
+      vuint8m2_t s3 = __riscv_vreinterpret_v_u16m2_u8m2(__riscv_vsrl_vx_u16m2(
+          __riscv_vreinterpret_v_u8m2_u16m2(v3), 4, __riscv_vsetvlmax_e16m2()));
 
-    const char16_t *end = input + length - 32;
+      vuint8m2_t idx2 = __riscv_vand_vx_u8m2(v2, 0xF, vl);
+      vuint8m2_t idx1 = __riscv_vand_vx_u8m2(s1, 0xF, vl);
+      vuint8m2_t idx3 = __riscv_vand_vx_u8m2(s3, 0xF, vl);
 
-    const __m512i low = _mm512_set1_epi16((uint16_t)0xdc00);
-    const __m512i high = _mm512_set1_epi16((uint16_t)0xdfff);
+      vuint8m2_t err1 = simdutf_vrgather_u8m1x2(err1tbl, idx1);
+      vuint8m2_t err2 = simdutf_vrgather_u8m1x2(err2tbl, idx2);
+      vuint8m2_t err3 = simdutf_vrgather_u8m1x2(err3tbl, idx3);
+      vint8m2_t errs = __riscv_vreinterpret_v_u8m2_i8m2(
+          __riscv_vand_vv_u8m2(__riscv_vand_vv_u8m2(err1, err2, vl), err3, vl));
 
-    const __m512i byteflip = _mm512_setr_epi64(
-        0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
-        0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
-        0x0607040502030001, 0x0e0f0c0d0a0b0809);
-    while (ptr <= end) {
-      __m512i utf16 =
-          _mm512_shuffle_epi8(_mm512_loadu_si512((__m512i *)ptr), byteflip);
-      ptr += 32;
-      uint64_t not_high_surrogate =
-          static_cast<uint64_t>(_mm512_cmpgt_epu16_mask(utf16, high) |
-                                _mm512_cmplt_epu16_mask(utf16, low));
-      count += count_ones(not_high_surrogate);
+      vbool4_t is_3 = __riscv_vmsgtu_vx_u8m2_b4(v1, 0b11100000 - 1, vl);
+      vbool4_t is_4 = __riscv_vmsgtu_vx_u8m2_b4(v0, 0b11110000 - 1, vl);
+      vbool4_t is_34 = __riscv_vmor_mm_b4(is_3, is_4, vl);
+      vbool4_t err34 =
+          __riscv_vmxor_mm_b4(is_34, __riscv_vmslt_vx_i8m2_b4(errs, 0, vl), vl);
+      vbool4_t errm =
+          __riscv_vmor_mm_b4(__riscv_vmsgt_vx_i8m2_b4(errs, 0, vl), err34, vl);
+      if (__riscv_vfirst_m_b4(errm, vl) >= 0)
+        return 0;
     }
-  }
 
-  return count + scalar::utf16::count_code_points<endianness::BIG>(
-                     ptr, length - (ptr - input));
-}
+    /* decoding */
 
-simdutf_warn_unused size_t
-implementation::count_utf8(const char *input, size_t length) const noexcept {
-  const uint8_t *str = reinterpret_cast<const uint8_t *>(input);
-  size_t answer =
-      length / sizeof(__m512i) *
-      sizeof(__m512i); // Number of 512-bit chunks that fits into the length.
-  size_t i = 0;
-  __m512i unrolled_popcount{0};
+    /* mask of non continuation bytes */
+    vbool4_t m =
+        __riscv_vmsgt_vx_i8m2_b4(__riscv_vreinterpret_v_u8m2_i8m2(v0), -65, vl);
+    vlOut = __riscv_vcpop_m_b4(m, vl);
 
-  const __m512i continuation = _mm512_set1_epi8(char(0b10111111));
+    /* extract first and second bytes */
+    vuint8m2_t b1 = __riscv_vcompress_vm_u8m2(v0, m, vl);
+    vuint8m2_t b2 = __riscv_vcompress_vm_u8m2(v1, m, vl);
 
-  while (i + sizeof(__m512i) <= length) {
-    size_t iterations = (length - i) / sizeof(__m512i);
+    /* fast path: one and two byte */
+    if (max < 0b11100000) {
+      b2 = __riscv_vand_vx_u8m2(b2, 0b00111111, vlOut);
 
-    size_t max_i = i + iterations * sizeof(__m512i) - sizeof(__m512i);
-    for (; i + 8 * sizeof(__m512i) <= max_i; i += 8 * sizeof(__m512i)) {
-      __m512i input1 = _mm512_loadu_si512((const __m512i *)(str + i));
-      __m512i input2 =
-          _mm512_loadu_si512((const __m512i *)(str + i + sizeof(__m512i)));
-      __m512i input3 =
-          _mm512_loadu_si512((const __m512i *)(str + i + 2 * sizeof(__m512i)));
-      __m512i input4 =
-          _mm512_loadu_si512((const __m512i *)(str + i + 3 * sizeof(__m512i)));
-      __m512i input5 =
-          _mm512_loadu_si512((const __m512i *)(str + i + 4 * sizeof(__m512i)));
-      __m512i input6 =
-          _mm512_loadu_si512((const __m512i *)(str + i + 5 * sizeof(__m512i)));
-      __m512i input7 =
-          _mm512_loadu_si512((const __m512i *)(str + i + 6 * sizeof(__m512i)));
-      __m512i input8 =
-          _mm512_loadu_si512((const __m512i *)(str + i + 7 * sizeof(__m512i)));
+      vbool4_t m1 = __riscv_vmsgtu_vx_u8m2_b4(b1, 0b10111111, vlOut);
+      b1 = __riscv_vand_vx_u8m2_mu(m1, b1, b1, 63, vlOut);
 
-      __mmask64 mask1 = _mm512_cmple_epi8_mask(input1, continuation);
-      __mmask64 mask2 = _mm512_cmple_epi8_mask(input2, continuation);
-      __mmask64 mask3 = _mm512_cmple_epi8_mask(input3, continuation);
-      __mmask64 mask4 = _mm512_cmple_epi8_mask(input4, continuation);
-      __mmask64 mask5 = _mm512_cmple_epi8_mask(input5, continuation);
-      __mmask64 mask6 = _mm512_cmple_epi8_mask(input6, continuation);
-      __mmask64 mask7 = _mm512_cmple_epi8_mask(input7, continuation);
-      __mmask64 mask8 = _mm512_cmple_epi8_mask(input8, continuation);
+      vuint16m4_t b12 = __riscv_vwmulu_vv_u16m4(
+          b1,
+          __riscv_vmerge_vxm_u8m2(__riscv_vmv_v_x_u8m2(1, vlOut), 1 << 6, m1,
+                                  vlOut),
+          vlOut);
+      b12 = __riscv_vwaddu_wv_u16m4_mu(m1, b12, b12, b2, vlOut);
+      if (is16)
+        __riscv_vse16_v_u16m4((uint16_t *)dst,
+                              simdutf_byteflip<bflip>(b12, vlOut), vlOut);
+      else
+        __riscv_vse32_v_u32m8((uint32_t *)dst,
+                              __riscv_vzext_vf2_u32m8(b12, vlOut), vlOut);
+      continue;
+    }
 
-      __m512i mask_register = _mm512_set_epi64(mask8, mask7, mask6, mask5,
-                                               mask4, mask3, mask2, mask1);
+    /* fast path: one, two and three byte */
+    if (max < 0b11110000) {
+      vuint8m2_t b3 = __riscv_vcompress_vm_u8m2(v2, m, vl);
 
-      unrolled_popcount = _mm512_add_epi64(unrolled_popcount,
-                                           _mm512_popcnt_epi64(mask_register));
-    }
+      b2 = __riscv_vand_vx_u8m2(b2, 0b00111111, vlOut);
+      b3 = __riscv_vand_vx_u8m2(b3, 0b00111111, vlOut);
 
-    for (; i <= max_i; i += sizeof(__m512i)) {
-      __m512i more_input = _mm512_loadu_si512((const __m512i *)(str + i));
-      uint64_t continuation_bitmask = static_cast<uint64_t>(
-          _mm512_cmple_epi8_mask(more_input, continuation));
-      answer -= count_ones(continuation_bitmask);
+      vbool4_t m1 = __riscv_vmsgtu_vx_u8m2_b4(b1, 0b10111111, vlOut);
+      vbool4_t m3 = __riscv_vmsgtu_vx_u8m2_b4(b1, 0b11011111, vlOut);
+
+      vuint8m2_t t1 = __riscv_vand_vx_u8m2_mu(m1, b1, b1, 63, vlOut);
+      b1 = __riscv_vand_vx_u8m2_mu(m3, t1, b1, 15, vlOut);
+
+      vuint16m4_t b12 = __riscv_vwmulu_vv_u16m4(
+          b1,
+          __riscv_vmerge_vxm_u8m2(__riscv_vmv_v_x_u8m2(1, vlOut), 1 << 6, m1,
+                                  vlOut),
+          vlOut);
+      b12 = __riscv_vwaddu_wv_u16m4_mu(m1, b12, b12, b2, vlOut);
+      vuint16m4_t b123 = __riscv_vwaddu_wv_u16m4_mu(
+          m3, b12, __riscv_vsll_vx_u16m4_mu(m3, b12, b12, 6, vlOut), b3, vlOut);
+      if (is16)
+        __riscv_vse16_v_u16m4((uint16_t *)dst,
+                              simdutf_byteflip<bflip>(b123, vlOut), vlOut);
+      else
+        __riscv_vse32_v_u32m8((uint32_t *)dst,
+                              __riscv_vzext_vf2_u32m8(b123, vlOut), vlOut);
+      continue;
     }
-  }
 
-  __m256i first_half = _mm512_extracti64x4_epi64(unrolled_popcount, 0);
-  __m256i second_half = _mm512_extracti64x4_epi64(unrolled_popcount, 1);
-  answer -= (size_t)_mm256_extract_epi64(first_half, 0) +
-            (size_t)_mm256_extract_epi64(first_half, 1) +
-            (size_t)_mm256_extract_epi64(first_half, 2) +
-            (size_t)_mm256_extract_epi64(first_half, 3) +
-            (size_t)_mm256_extract_epi64(second_half, 0) +
-            (size_t)_mm256_extract_epi64(second_half, 1) +
-            (size_t)_mm256_extract_epi64(second_half, 2) +
-            (size_t)_mm256_extract_epi64(second_half, 3);
+    /* extract third and fourth bytes */
+    vuint8m2_t b3 = __riscv_vcompress_vm_u8m2(v2, m, vl);
+    vuint8m2_t b4 = __riscv_vcompress_vm_u8m2(v3, m, vl);
+
+    /* remove prefix from leading bytes
+     *
+     * We could also use vrgather here, but it increases register pressure,
+     * and its performance varies widely on current platforms. It might be
+     * worth reconsidering, though, once there is more hardware available.
+     * Same goes for the __riscv_vsrl_vv_u32m4 correction step.
+     *
+     * We shift left and then right by the number of bits in the prefix,
+     * which can be calculated as follows:
+     *         x                                max(x-10, 0)
+     * 0xxx -> 0000-0111 -> shift by 0 or 1  -> 0
+     * 10xx -> 1000-1011 -> don't care
+     * 110x -> 1100,1101 -> shift by 3       -> 2,3
+     * 1110 -> 1110      -> shift by 4       -> 4
+     * 1111 -> 1111      -> shift by 5       -> 5
+     *
+     * vssubu.vx v, 10, (max(x-10, 0)) almost gives us what we want, we
+     * just need to manually detect and handle the one special case:
+     */
+#define SIMDUTF_RVV_UTF8_TO_COMMON_M1(idx)                                     \
+  vuint8m1_t c1 = __riscv_vget_v_u8m2_u8m1(b1, idx);                           \
+  vuint8m1_t c2 = __riscv_vget_v_u8m2_u8m1(b2, idx);                           \
+  vuint8m1_t c3 = __riscv_vget_v_u8m2_u8m1(b3, idx);                           \
+  vuint8m1_t c4 = __riscv_vget_v_u8m2_u8m1(b4, idx);                           \
+  /* remove prefix from trailing bytes */                                      \
+  c2 = __riscv_vand_vx_u8m1(c2, 0b00111111, vlOut);                            \
+  c3 = __riscv_vand_vx_u8m1(c3, 0b00111111, vlOut);                            \
+  c4 = __riscv_vand_vx_u8m1(c4, 0b00111111, vlOut);                            \
+  vuint8m1_t shift = __riscv_vsrl_vx_u8m1(c1, 4, vlOut);                       \
+  shift = __riscv_vmerge_vxm_u8m1(__riscv_vssubu_vx_u8m1(shift, 10, vlOut), 3, \
+                                  __riscv_vmseq_vx_u8m1_b8(shift, 12, vlOut),  \
+                                  vlOut);                                      \
+  c1 = __riscv_vsll_vv_u8m1(c1, shift, vlOut);                                 \
+  c1 = __riscv_vsrl_vv_u8m1(c1, shift, vlOut);                                 \
+  /* unconditionally widen and combine to c1234 */                             \
+  vuint16m2_t c34 = __riscv_vwaddu_wv_u16m2(                                   \
+      __riscv_vwmulu_vx_u16m2(c3, 1 << 6, vlOut), c4, vlOut);                  \
+  vuint16m2_t c12 = __riscv_vwaddu_wv_u16m2(                                   \
+      __riscv_vwmulu_vx_u16m2(c1, 1 << 6, vlOut), c2, vlOut);                  \
+  vuint32m4_t c1234 = __riscv_vwaddu_wv_u32m4(                                 \
+      __riscv_vwmulu_vx_u32m4(c12, 1 << 12, vlOut), c34, vlOut);               \
+  /* derive required right-shift amount from `shift` to reduce                 \
+   * c1234 to the required number of bytes */                                  \
+  c1234 = __riscv_vsrl_vv_u32m4(                                               \
+      c1234,                                                                   \
+      __riscv_vzext_vf4_u32m4(                                                 \
+          __riscv_vmul_vx_u8m1(                                                \
+              __riscv_vrsub_vx_u8m1(__riscv_vssubu_vx_u8m1(shift, 2, vlOut),   \
+                                    3, vlOut),                                 \
+              6, vlOut),                                                       \
+          vlOut),                                                              \
+      vlOut);                                                                  \
+  /* store result in desired format */                                         \
+  if (is16)                                                                    \
+    vlDst = rvv_utf32_store_utf16_m4<bflip>((uint16_t *)dst, c1234, vlOut,     \
+                                            m4even);                           \
+  else                                                                         \
+    vlDst = vlOut, __riscv_vse32_v_u32m4((uint32_t *)dst, c1234, vlOut);
 
-  return answer + scalar::utf8::count_code_points(
-                      reinterpret_cast<const char *>(str + i), length - i);
-}
+    /* Unrolling this manually reduces register pressure and allows
+     * us to terminate early. */
+    {
+      size_t vlOutm2 = vlOut, vlDst;
+      vlOut = __riscv_vsetvl_e8m1(vlOut);
+      SIMDUTF_RVV_UTF8_TO_COMMON_M1(0)
+      if (vlOutm2 == vlOut) {
+        vlOut = vlDst;
+        continue;
+      }
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
-    const char *buf, size_t len) const noexcept {
-  return count_utf8(buf, len);
-}
+      dst += vlDst;
+      vlOut = vlOutm2 - vlOut;
+    }
+    {
+      size_t vlDst;
+      SIMDUTF_RVV_UTF8_TO_COMMON_M1(1)
+      vlOut = vlDst;
+    }
 
-simdutf_warn_unused size_t
-implementation::latin1_length_from_utf16(size_t length) const noexcept {
-  return scalar::utf16::latin1_length_from_utf16(length);
-}
+#undef SIMDUTF_RVV_UTF8_TO_COMMON_M1
+  }
 
-simdutf_warn_unused size_t
-implementation::latin1_length_from_utf32(size_t length) const noexcept {
-  return scalar::utf32::latin1_length_from_utf32(length);
+  /* validate the last character and reparse it + tail */
+  if (len > tail) {
+    if ((src[0] >> 6) == 0b10)
+      --dst;
+    while ((src[0] >> 6) == 0b10 && tail < len)
+      --src, ++tail;
+    if (is16) {
+      /* go back one more, when on high surrogate */
+      if (simdutf_byteflip<bflip>((uint16_t)dst[-1]) >= 0xD800 &&
+          simdutf_byteflip<bflip>((uint16_t)dst[-1]) <= 0xDBFF)
+        --dst;
+    }
+  }
+  size_t ret = scalar(src, tail, dst);
+  if (ret == 0)
+    return 0;
+  return (size_t)(dst - beg) + ret;
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
-    const char16_t *input, size_t length) const noexcept {
-  const char16_t *ptr = input;
-  size_t count{0};
-  if (length >= 32) {
-    const char16_t *end = input + length - 32;
-
-    const __m512i v_007f = _mm512_set1_epi16((uint16_t)0x007f);
-    const __m512i v_07ff = _mm512_set1_epi16((uint16_t)0x07ff);
-    const __m512i v_dfff = _mm512_set1_epi16((uint16_t)0xdfff);
-    const __m512i v_d800 = _mm512_set1_epi16((uint16_t)0xd800);
+simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
+    const char *src, size_t len, char *dst) const noexcept {
+  const char *beg = dst;
+  uint8_t last = 0;
+  for (size_t vl, vlOut; len > 0;
+       len -= vl, src += vl, dst += vlOut, last = src[-1]) {
+    vl = __riscv_vsetvl_e8m2(len);
+    vuint8m2_t v1 = __riscv_vle8_v_u8m2((uint8_t *)src, vl);
+    // check which bytes are ASCII
+    vbool4_t ascii = __riscv_vmsltu_vx_u8m2_b4(v1, 0b10000000, vl);
+    // count ASCII bytes
+    vlOut = __riscv_vcpop_m_b4(ascii, vl);
+    // The original code would only enter the next block after this check:
+    //   vbool4_t m = __riscv_vmsltu_vx_u8m2_b4(v1, 0b11000000, vl);
+    //   vlOut = __riscv_vcpop_m_b4(m, vl);
+    //   if (vlOut != vl || last > 0b01111111) {...}
+    // So when everything was ASCII or continuation bytes, it proceeded
+    // without any processing, going straight to __riscv_vse8_v_u8m2.
+    // But the __riscv_vslide1up_vx_u8m2 is needed whenever there is a
+    // non-ASCII byte.
+    if (vlOut != vl) { // If not pure ASCII
+      // Non-ASCII characters
+      // We now want to mark the ASCII and continuation bytes
+      vbool4_t m = __riscv_vmsltu_vx_u8m2_b4(v1, 0b11000000, vl);
+      // We count them, that's our new vlOut (output vector length)
+      vlOut = __riscv_vcpop_m_b4(m, vl);
 
-    while (ptr <= end) {
-      __m512i utf16 = _mm512_loadu_si512((const __m512i *)ptr);
-      ptr += 32;
-      __mmask32 ascii_bitmask = _mm512_cmple_epu16_mask(utf16, v_007f);
-      __mmask32 two_bytes_bitmask =
-          _mm512_mask_cmple_epu16_mask(~ascii_bitmask, utf16, v_07ff);
-      __mmask32 not_one_two_bytes = ~(ascii_bitmask | two_bytes_bitmask);
-      __mmask32 surrogates_bitmask =
-          _mm512_mask_cmple_epu16_mask(not_one_two_bytes, utf16, v_dfff) &
-          _mm512_mask_cmpge_epu16_mask(not_one_two_bytes, utf16, v_d800);
+      vuint8m2_t v0 = __riscv_vslide1up_vx_u8m2(v1, last, vl);
 
-      size_t ascii_count = count_ones(ascii_bitmask);
-      size_t two_bytes_count = count_ones(two_bytes_bitmask);
-      size_t surrogate_bytes_count = count_ones(surrogates_bitmask);
-      size_t three_bytes_count =
-          32 - ascii_count - two_bytes_count - surrogate_bytes_count;
+      vbool4_t leading0 = __riscv_vmsgtu_vx_u8m2_b4(v0, 0b10111111, vl);
+      vbool4_t trailing1 = __riscv_vmslt_vx_i8m2_b4(
+          __riscv_vreinterpret_v_u8m2_i8m2(v1), (uint8_t)0b11000000, vl);
+      // -62 is 0b11000010, so we check whether any of v0 is too big
+      vbool4_t tobig = __riscv_vmand_mm_b4(
+          leading0,
+          __riscv_vmsgtu_vx_u8m2_b4(__riscv_vxor_vx_u8m2(v0, (uint8_t)-62, vl),
+                                    1, vl),
+          vl);
+      if (__riscv_vfirst_m_b4(
+              __riscv_vmor_mm_b4(
+                  tobig, __riscv_vmxor_mm_b4(leading0, trailing1, vl), vl),
+              vl) >= 0)
+        return 0;
 
-      count += ascii_count + 2 * two_bytes_count + 3 * three_bytes_count +
-               2 * surrogate_bytes_count;
+      v1 = __riscv_vor_vx_u8m2_mu(__riscv_vmseq_vx_u8m2_b4(v0, 0b11000011, vl),
+                                  v1, v1, 0b01000000, vl);
+      v1 = __riscv_vcompress_vm_u8m2(v1, m, vl);
+    } else if (last >= 0b11000000) { // If last byte is a leading byte and we
+                                     // got only ASCII, error!
+      return 0;
     }
+    __riscv_vse8_v_u8m2((uint8_t *)dst, v1, vlOut);
   }
-
-  return count + scalar::utf16::utf8_length_from_utf16<endianness::LITTLE>(
-                     ptr, length - (ptr - input));
+  if (last > 0b10111111)
+    return 0;
+  return dst - beg;
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
-    const char16_t *input, size_t length) const noexcept {
-  const char16_t *ptr = input;
-  size_t count{0};
-
-  if (length >= 32) {
-    const char16_t *end = input + length - 32;
-
-    const __m512i v_007f = _mm512_set1_epi16((uint16_t)0x007f);
-    const __m512i v_07ff = _mm512_set1_epi16((uint16_t)0x07ff);
-    const __m512i v_dfff = _mm512_set1_epi16((uint16_t)0xdfff);
-    const __m512i v_d800 = _mm512_set1_epi16((uint16_t)0xd800);
-
-    const __m512i byteflip = _mm512_setr_epi64(
-        0x0607040502030001, 0x0e0f0c0d0a0b0809, 0x0607040502030001,
-        0x0e0f0c0d0a0b0809, 0x0607040502030001, 0x0e0f0c0d0a0b0809,
-        0x0607040502030001, 0x0e0f0c0d0a0b0809);
-    while (ptr <= end) {
-      __m512i utf16 = _mm512_loadu_si512((const __m512i *)ptr);
-      utf16 = _mm512_shuffle_epi8(utf16, byteflip);
-      ptr += 32;
-      __mmask32 ascii_bitmask = _mm512_cmple_epu16_mask(utf16, v_007f);
-      __mmask32 two_bytes_bitmask =
-          _mm512_mask_cmple_epu16_mask(~ascii_bitmask, utf16, v_07ff);
-      __mmask32 not_one_two_bytes = ~(ascii_bitmask | two_bytes_bitmask);
-      __mmask32 surrogates_bitmask =
-          _mm512_mask_cmple_epu16_mask(not_one_two_bytes, utf16, v_dfff) &
-          _mm512_mask_cmpge_epu16_mask(not_one_two_bytes, utf16, v_d800);
+simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
+    const char *src, size_t len, char *dst) const noexcept {
+  size_t res = convert_utf8_to_latin1(src, len, dst);
+  if (res)
+    return result(error_code::SUCCESS, res);
+  return scalar::utf8_to_latin1::convert_with_errors(src, len, dst);
+}
 
-      size_t ascii_count = count_ones(ascii_bitmask);
-      size_t two_bytes_count = count_ones(two_bytes_bitmask);
-      size_t surrogate_bytes_count = count_ones(surrogates_bitmask);
-      size_t three_bytes_count =
-          32 - ascii_count - two_bytes_count - surrogate_bytes_count;
-      count += ascii_count + 2 * two_bytes_count + 3 * three_bytes_count +
-               2 * surrogate_bytes_count;
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
+    const char *src, size_t len, char *dst) const noexcept {
+  const char *beg = dst;
+  uint8_t last = 0;
+  for (size_t vl, vlOut; len > 0;
+       len -= vl, src += vl, dst += vlOut, last = src[-1]) {
+    vl = __riscv_vsetvl_e8m2(len);
+    vuint8m2_t v1 = __riscv_vle8_v_u8m2((uint8_t *)src, vl);
+    vbool4_t ascii = __riscv_vmsltu_vx_u8m2_b4(v1, 0b10000000, vl);
+    vlOut = __riscv_vcpop_m_b4(ascii, vl);
+    if (vlOut != vl) { // If not pure ASCII
+      vbool4_t m = __riscv_vmsltu_vx_u8m2_b4(v1, 0b11000000, vl);
+      vlOut = __riscv_vcpop_m_b4(m, vl);
+      vuint8m2_t v0 = __riscv_vslide1up_vx_u8m2(v1, last, vl);
+      v1 = __riscv_vor_vx_u8m2_mu(__riscv_vmseq_vx_u8m2_b4(v0, 0b11000011, vl),
+                                  v1, v1, 0b01000000, vl);
+      v1 = __riscv_vcompress_vm_u8m2(v1, m, vl);
     }
+    __riscv_vse8_v_u8m2((uint8_t *)dst, v1, vlOut);
   }
-
-  return count + scalar::utf16::utf8_length_from_utf16<endianness::BIG>(
-                     ptr, length - (ptr - input));
+  return dst - beg;
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
-    const char16_t *input, size_t length) const noexcept {
-  return implementation::count_utf16le(input, length);
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::NONE>(src, len,
+                                                              (uint16_t *)dst);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
-    const char16_t *input, size_t length) const noexcept {
-  return implementation::count_utf16be(input, length);
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  if (supports_zvbb())
+    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::ZVBB>(
+        src, len, (uint16_t *)dst);
+  else
+    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::V>(src, len,
+                                                             (uint16_t *)dst);
 }
 
-simdutf_warn_unused size_t
-implementation::utf16_length_from_latin1(size_t length) const noexcept {
-  return scalar::latin1::utf16_length_from_latin1(length);
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  size_t res = convert_utf8_to_utf16le(src, len, dst);
+  if (res)
+    return result(error_code::SUCCESS, res);
+  return scalar::utf8_to_utf16::convert_with_errors<endianness::LITTLE>(
+      src, len, dst);
 }
 
-simdutf_warn_unused size_t
-implementation::utf32_length_from_latin1(size_t length) const noexcept {
-  return scalar::latin1::utf32_length_from_latin1(length);
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  size_t res = convert_utf8_to_utf16be(src, len, dst);
+  if (res)
+    return result(error_code::SUCCESS, res);
+  return scalar::utf8_to_utf16::convert_with_errors<endianness::BIG>(src, len,
+                                                                     dst);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
-    const char *input, size_t length) const noexcept {
-  const uint8_t *str = reinterpret_cast<const uint8_t *>(input);
-  size_t answer = length / sizeof(__m512i) * sizeof(__m512i);
-  size_t i = 0;
-  if (answer >= 2048) { // long strings optimization
-    unsigned char v_0xFF = 0xff;
-    __m512i eight_64bits = _mm512_setzero_si512();
-    while (i + sizeof(__m512i) <= length) {
-      __m512i runner = _mm512_setzero_si512();
-      size_t iterations = (length - i) / sizeof(__m512i);
-      if (iterations > 255) {
-        iterations = 255;
-      }
-      size_t max_i = i + iterations * sizeof(__m512i) - sizeof(__m512i);
-      for (; i + 4 * sizeof(__m512i) <= max_i; i += 4 * sizeof(__m512i)) {
-        // Load four __m512i vectors
-        __m512i input1 = _mm512_loadu_si512((const __m512i *)(str + i));
-        __m512i input2 =
-            _mm512_loadu_si512((const __m512i *)(str + i + sizeof(__m512i)));
-        __m512i input3 = _mm512_loadu_si512(
-            (const __m512i *)(str + i + 2 * sizeof(__m512i)));
-        __m512i input4 = _mm512_loadu_si512(
-            (const __m512i *)(str + i + 3 * sizeof(__m512i)));
-
-        // Generate four masks
-        __mmask64 mask1 =
-            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input1);
-        __mmask64 mask2 =
-            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input2);
-        __mmask64 mask3 =
-            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input3);
-        __mmask64 mask4 =
-            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), input4);
-        // Apply the masks and subtract from the runner
-        __m512i not_ascii1 =
-            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask1, v_0xFF);
-        __m512i not_ascii2 =
-            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask2, v_0xFF);
-        __m512i not_ascii3 =
-            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask3, v_0xFF);
-        __m512i not_ascii4 =
-            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask4, v_0xFF);
-
-        runner = _mm512_sub_epi8(runner, not_ascii1);
-        runner = _mm512_sub_epi8(runner, not_ascii2);
-        runner = _mm512_sub_epi8(runner, not_ascii3);
-        runner = _mm512_sub_epi8(runner, not_ascii4);
-      }
-
-      for (; i <= max_i; i += sizeof(__m512i)) {
-        __m512i more_input = _mm512_loadu_si512((const __m512i *)(str + i));
-
-        __mmask64 mask =
-            _mm512_cmpgt_epi8_mask(_mm512_setzero_si512(), more_input);
-        __m512i not_ascii =
-            _mm512_mask_set1_epi8(_mm512_setzero_si512(), mask, v_0xFF);
-        runner = _mm512_sub_epi8(runner, not_ascii);
-      }
-
-      eight_64bits = _mm512_add_epi64(
-          eight_64bits, _mm512_sad_epu8(runner, _mm512_setzero_si512()));
-    }
-
-    __m256i first_half = _mm512_extracti64x4_epi64(eight_64bits, 0);
-    __m256i second_half = _mm512_extracti64x4_epi64(eight_64bits, 1);
-    answer += (size_t)_mm256_extract_epi64(first_half, 0) +
-              (size_t)_mm256_extract_epi64(first_half, 1) +
-              (size_t)_mm256_extract_epi64(first_half, 2) +
-              (size_t)_mm256_extract_epi64(first_half, 3) +
-              (size_t)_mm256_extract_epi64(second_half, 0) +
-              (size_t)_mm256_extract_epi64(second_half, 1) +
-              (size_t)_mm256_extract_epi64(second_half, 2) +
-              (size_t)_mm256_extract_epi64(second_half, 3);
-  } else if (answer > 0) {
-    for (; i + sizeof(__m512i) <= length; i += sizeof(__m512i)) {
-      __m512i latin = _mm512_loadu_si512((const __m512i *)(str + i));
-      uint64_t non_ascii = _mm512_movepi8_mask(latin);
-      answer += count_ones(non_ascii);
-    }
-  }
-  return answer + scalar::latin1::utf8_length_from_latin1(
-                      reinterpret_cast<const char *>(str + i), length - i);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::NONE, false>(
+      src, len, (uint16_t *)dst);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
-    const char *input, size_t length) const noexcept {
-  size_t pos = 0;
-  size_t count = 0;
-  // This algorithm could no doubt be improved!
-  for (; pos + 64 <= length; pos += 64) {
-    __m512i utf8 = _mm512_loadu_si512((const __m512i *)(input + pos));
-    uint64_t utf8_continuation_mask =
-        _mm512_cmplt_epi8_mask(utf8, _mm512_set1_epi8(-65 + 1));
-    // We count one word for anything that is not a continuation (so
-    // leading bytes).
-    count += 64 - count_ones(utf8_continuation_mask);
-    uint64_t utf8_4byte =
-        _mm512_cmpge_epu8_mask(utf8, _mm512_set1_epi8(int8_t(240)));
-    count += count_ones(utf8_4byte);
-  }
-  return count +
-         scalar::utf8::utf16_length_from_utf8(input + pos, length - pos);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char *src, size_t len, char16_t *dst) const noexcept {
+  if (supports_zvbb())
+    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::ZVBB, false>(
+        src, len, (uint16_t *)dst);
+  else
+    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::V, false>(
+        src, len, (uint16_t *)dst);
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
-    const char32_t *input, size_t length) const noexcept {
-  const char32_t *ptr = input;
-  size_t count{0};
-
-  if (length >= 16) {
-    const char32_t *end = input + length - 16;
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char *src, size_t len, char32_t *dst) const noexcept {
+  return rvv_utf8_to_common<uint32_t, simdutf_ByteFlip::NONE>(src, len,
+                                                              (uint32_t *)dst);
+}
 
-    const __m512i v_0000_007f = _mm512_set1_epi32((uint32_t)0x7f);
-    const __m512i v_0000_07ff = _mm512_set1_epi32((uint32_t)0x7ff);
-    const __m512i v_0000_ffff = _mm512_set1_epi32((uint32_t)0x0000ffff);
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char *src, size_t len, char32_t *dst) const noexcept {
+  size_t res = convert_utf8_to_utf32(src, len, dst);
+  if (res)
+    return result(error_code::SUCCESS, res);
+  return scalar::utf8_to_utf32::convert_with_errors(src, len, dst);
+}
 
-    while (ptr <= end) {
-      __m512i utf32 = _mm512_loadu_si512((const __m512i *)ptr);
-      ptr += 16;
-      __mmask16 ascii_bitmask = _mm512_cmple_epu32_mask(utf32, v_0000_007f);
-      __mmask16 two_bytes_bitmask = _mm512_mask_cmple_epu32_mask(
-          _knot_mask16(ascii_bitmask), utf32, v_0000_07ff);
-      __mmask16 three_bytes_bitmask = _mm512_mask_cmple_epu32_mask(
-          _knot_mask16(_mm512_kor(ascii_bitmask, two_bytes_bitmask)), utf32,
-          v_0000_ffff);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char *src, size_t len, char32_t *dst) const noexcept {
+  return rvv_utf8_to_common<uint32_t, simdutf_ByteFlip::NONE, false>(
+      src, len, (uint32_t *)dst);
+}
+/* end file src/rvv/rvv_utf8_to.inl.cpp */
 
-      size_t ascii_count = count_ones(ascii_bitmask);
-      size_t two_bytes_count = count_ones(two_bytes_bitmask);
-      size_t three_bytes_count = count_ones(three_bytes_bitmask);
-      size_t four_bytes_count =
-          16 - ascii_count - two_bytes_count - three_bytes_count;
-      count += ascii_count + 2 * two_bytes_count + 3 * three_bytes_count +
-               4 * four_bytes_count;
-    }
+simdutf_warn_unused int
+implementation::detect_encodings(const char *input,
+                                 size_t length) const noexcept {
+  // If there is a BOM, then we trust it.
+  auto bom_encoding = simdutf::BOM::check_bom(input, length);
+  if (bom_encoding != encoding_type::unspecified)
+    return bom_encoding;
+  // todo: reimplement as a one-pass algorithm.
+  int out = 0;
+  if (validate_utf8(input, length))
+    out |= encoding_type::UTF8;
+  if (length % 2 == 0) {
+    if (validate_utf16(reinterpret_cast<const char16_t *>(input), length / 2))
+      out |= encoding_type::UTF16_LE;
+  }
+  if (length % 4 == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4))
+      out |= encoding_type::UTF32_LE;
   }
 
-  return count +
-         scalar::utf32::utf8_length_from_utf32(ptr, length - (ptr - input));
+  return out;
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
-    const char32_t *input, size_t length) const noexcept {
-  const char32_t *ptr = input;
-  size_t count{0};
-
-  if (length >= 16) {
-    const char32_t *end = input + length - 16;
-
-    const __m512i v_0000_ffff = _mm512_set1_epi32((uint32_t)0x0000ffff);
-
-    while (ptr <= end) {
-      __m512i utf32 = _mm512_loadu_si512((const __m512i *)ptr);
-      ptr += 16;
-      __mmask16 surrogates_bitmask =
-          _mm512_cmpgt_epu32_mask(utf32, v_0000_ffff);
-
-      count += 16 + count_ones(surrogates_bitmask);
-    }
+template <simdutf_ByteFlip bflip>
+simdutf_really_inline static void
+rvv_change_endianness_utf16(const char16_t *src, size_t len, char16_t *dst) {
+  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
+    vl = __riscv_vsetvl_e16m8(len);
+    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
+    __riscv_vse16_v_u16m8((uint16_t *)dst, simdutf_byteflip<bflip>(v, vl), vl);
   }
-
-  return count +
-         scalar::utf32::utf16_length_from_utf32(ptr, length - (ptr - input));
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
-    const char *input, size_t length) const noexcept {
-  return implementation::count_utf8(input, length);
+void implementation::change_endianness_utf16(const char16_t *src, size_t len,
+                                             char16_t *dst) const noexcept {
+  if (supports_zvbb())
+    return rvv_change_endianness_utf16<simdutf_ByteFlip::ZVBB>(src, len, dst);
+  else
+    return rvv_change_endianness_utf16<simdutf_ByteFlip::V>(src, len, dst);
 }
 
 simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
@@ -25315,21 +37828,86 @@ simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
 simdutf_warn_unused result implementation::base64_to_binary(
     const char *input, size_t length, char *output, base64_options options,
     last_chunk_handling_options last_chunk_options) const noexcept {
-  return (options & base64_url)
-             ? compress_decode_base64<true>(output, input, length, options,
-                                            last_chunk_options)
-             : compress_decode_base64<false>(output, input, length, options,
-                                             last_chunk_options);
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
+  }
+  size_t equallocation =
+      length; // location of the first padding character if any
+  size_t equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
+    }
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
+    }
+  }
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
+    }
+    return {SUCCESS, 0};
+  }
+  result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // additional checks
+    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
+    }
+  }
+  return r;
 }
 
 simdutf_warn_unused full_result implementation::base64_to_binary_details(
     const char *input, size_t length, char *output, base64_options options,
     last_chunk_handling_options last_chunk_options) const noexcept {
-  return (options & base64_url)
-             ? compress_decode_base64<true>(output, input, length, options,
-                                            last_chunk_options)
-             : compress_decode_base64<false>(output, input, length, options,
-                                             last_chunk_options);
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
+  }
+  size_t equallocation =
+      length; // location of the first padding character if any
+  size_t equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
+    }
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
+    }
+  }
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+    }
+    return {SUCCESS, 0, 0};
+  }
+  full_result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // additional checks
+    if ((r.output_count % 3 == 0) ||
+        ((r.output_count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation, r.output_count};
+    }
+  }
+  return r;
 }
 
 simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
@@ -25340,21 +37918,86 @@ simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
 simdutf_warn_unused result implementation::base64_to_binary(
     const char16_t *input, size_t length, char *output, base64_options options,
     last_chunk_handling_options last_chunk_options) const noexcept {
-  return (options & base64_url)
-             ? compress_decode_base64<true>(output, input, length, options,
-                                            last_chunk_options)
-             : compress_decode_base64<false>(output, input, length, options,
-                                             last_chunk_options);
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
+  }
+  size_t equallocation =
+      length; // location of the first padding character if any
+  auto equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
+    }
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
+    }
+  }
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
+    }
+    return {SUCCESS, 0};
+  }
+  result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // additional checks
+    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation};
+    }
+  }
+  return r;
 }
 
 simdutf_warn_unused full_result implementation::base64_to_binary_details(
     const char16_t *input, size_t length, char *output, base64_options options,
     last_chunk_handling_options last_chunk_options) const noexcept {
-  return (options & base64_url)
-             ? compress_decode_base64<true>(output, input, length, options,
-                                            last_chunk_options)
-             : compress_decode_base64<false>(output, input, length, options,
-                                             last_chunk_options);
+  while (length > 0 &&
+         scalar::base64::is_ascii_white_space(input[length - 1])) {
+    length--;
+  }
+  size_t equallocation =
+      length; // location of the first padding character if any
+  size_t equalsigns = 0;
+  if (length > 0 && input[length - 1] == '=') {
+    equallocation = length - 1;
+    length -= 1;
+    equalsigns++;
+    while (length > 0 &&
+           scalar::base64::is_ascii_white_space(input[length - 1])) {
+      length--;
+    }
+    if (length > 0 && input[length - 1] == '=') {
+      equallocation = length - 1;
+      equalsigns++;
+      length -= 1;
+    }
+  }
+  if (length == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+    }
+    return {SUCCESS, 0, 0};
+  }
+  full_result r = scalar::base64::base64_tail_decode(
+      output, input, length, equalsigns, options, last_chunk_options);
+  if (last_chunk_options != stop_before_partial &&
+      r.error == error_code::SUCCESS && equalsigns > 0) {
+    // additional checks
+    if ((r.output_count % 3 == 0) ||
+        ((r.output_count % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation, r.output_count};
+    }
+  }
+  return r;
 }
 
 simdutf_warn_unused size_t implementation::base64_length_from_binary(
@@ -25365,56 +38008,38 @@ simdutf_warn_unused size_t implementation::base64_length_from_binary(
 size_t implementation::binary_to_base64(const char *input, size_t length,
                                         char *output,
                                         base64_options options) const noexcept {
-  if (options & base64_url) {
-    return encode_base64<true>(output, input, length, options);
-  } else {
-    return encode_base64<false>(output, input, length, options);
-  }
+  return scalar::base64::tail_encode_base64(output, input, length, options);
 }
-
-} // namespace icelake
+} // namespace rvv
 } // namespace simdutf
 
-/* begin file src/simdutf/icelake/end.h */
-#if SIMDUTF_CAN_ALWAYS_RUN_ICELAKE
+/* begin file src/simdutf/rvv/end.h */
+#if SIMDUTF_CAN_ALWAYS_RUN_RVV
 // nothing needed.
 #else
 SIMDUTF_UNTARGET_REGION
 #endif
 
-
-#if SIMDUTF_GCC11ORMORE // workaround for
-                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
-SIMDUTF_POP_DISABLE_WARNINGS
-#endif // end of workaround
-/* end file src/simdutf/icelake/end.h */
-/* end file src/icelake/implementation.cpp */
+/* end file src/simdutf/rvv/end.h */
+/* end file src/rvv/implementation.cpp */
 #endif
-#if SIMDUTF_IMPLEMENTATION_HASWELL
-/* begin file src/haswell/implementation.cpp */
-
-/* begin file src/simdutf/haswell/begin.h */
-// redefining SIMDUTF_IMPLEMENTATION to "haswell"
-// #define SIMDUTF_IMPLEMENTATION haswell
+#if SIMDUTF_IMPLEMENTATION_WESTMERE
+/* begin file src/westmere/implementation.cpp */
+/* begin file src/simdutf/westmere/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "westmere"
+// #define SIMDUTF_IMPLEMENTATION westmere
 
-#if SIMDUTF_CAN_ALWAYS_RUN_HASWELL
+#if SIMDUTF_CAN_ALWAYS_RUN_WESTMERE
 // nothing needed.
 #else
-SIMDUTF_TARGET_HASWELL
+SIMDUTF_TARGET_WESTMERE
 #endif
-
-#if SIMDUTF_GCC11ORMORE // workaround for
-                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
-// clang-format off
-SIMDUTF_DISABLE_GCC_WARNING(-Wmaybe-uninitialized)
-// clang-format on
-#endif // end of workaround
-/* end file src/simdutf/haswell/begin.h */
+/* end file src/simdutf/westmere/begin.h */
 namespace simdutf {
-namespace haswell {
+namespace westmere {
 namespace {
-#ifndef SIMDUTF_HASWELL_H
-  #error "haswell.h must be included"
+#ifndef SIMDUTF_WESTMERE_H
+  #error "westmere.h must be included"
 #endif
 using namespace simd;
 
@@ -25441,13 +38066,90 @@ simdutf_really_inline simd8<bool>
 must_be_2_3_continuation(const simd8<uint8_t> prev2,
                          const simd8<uint8_t> prev3) {
   simd8<uint8_t> is_third_byte =
-      prev2.saturating_sub(0xe0u - 0x80); // Only 111_____ will be > 0x80
+      prev2.saturating_sub(0xe0u - 0x80); // Only 111_____ will be >= 0x80
   simd8<uint8_t> is_fourth_byte =
-      prev3.saturating_sub(0xf0u - 0x80); // Only 1111____ will be > 0x80
+      prev3.saturating_sub(0xf0u - 0x80); // Only 1111____ will be >= 0x80
   return simd8<bool>(is_third_byte | is_fourth_byte);
 }
 
-/* begin file src/haswell/avx2_validate_utf16.cpp */
+/* begin file src/westmere/internal/loader.cpp */
+namespace internal {
+namespace westmere {
+
+/* begin file src/westmere/internal/write_v_u16_11bits_to_utf8.cpp */
+/*
+ * reads a vector of uint16 values
+ * bits after 11th are ignored
+ * first 11 bits are encoded into utf8
+ * !important! utf8_output must have at least 16 writable bytes
+ */
+
+inline void write_v_u16_11bits_to_utf8(const __m128i v_u16, char *&utf8_output,
+                                       const __m128i one_byte_bytemask,
+                                       const uint16_t one_byte_bitmask) {
+  // 0b1100_0000_1000_0000
+  const __m128i v_c080 = _mm_set1_epi16((int16_t)0xc080);
+  // 0b0001_1111_0000_0000
+  const __m128i v_1f00 = _mm_set1_epi16((int16_t)0x1f00);
+  // 0b0000_0000_0011_1111
+  const __m128i v_003f = _mm_set1_epi16((int16_t)0x003f);
+
+  // 1. prepare 2-byte values
+  // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+  // expected output   : [110a|aaaa|10bb|bbbb] x 8
+
+  // t0 = [000a|aaaa|bbbb|bb00]
+  const __m128i t0 = _mm_slli_epi16(v_u16, 2);
+  // t1 = [000a|aaaa|0000|0000]
+  const __m128i t1 = _mm_and_si128(t0, v_1f00);
+  // t2 = [0000|0000|00bb|bbbb]
+  const __m128i t2 = _mm_and_si128(v_u16, v_003f);
+  // t3 = [000a|aaaa|00bb|bbbb]
+  const __m128i t3 = _mm_or_si128(t1, t2);
+  // t4 = [110a|aaaa|10bb|bbbb]
+  const __m128i t4 = _mm_or_si128(t3, v_c080);
+
+  // 2. merge ASCII and 2-byte codewords
+  const __m128i utf8_unpacked = _mm_blendv_epi8(t4, v_u16, one_byte_bytemask);
+
+  // 3. prepare bitmask for 8-bit lookup
+  //    one_byte_bitmask = hhggffeeddccbbaa -- the bits are doubled (h - MSB, a
+  //    - LSB)
+  const uint16_t m0 = one_byte_bitmask & 0x5555;      // m0 = 0h0g0f0e0d0c0b0a
+  const uint16_t m1 = static_cast<uint16_t>(m0 >> 7); // m1 = 00000000h0g0f0e0
+  const uint8_t m2 = static_cast<uint8_t>((m0 | m1) & 0xff); // m2 = hdgcfbea
+  // 4. pack the bytes
+  const uint8_t *row =
+      &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
+  const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
+  const __m128i utf8_packed = _mm_shuffle_epi8(utf8_unpacked, shuffle);
+
+  // 5. store bytes
+  _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+
+  // 6. adjust pointers
+  utf8_output += row[0];
+}
+
+inline void write_v_u16_11bits_to_utf8(const __m128i v_u16, char *&utf8_output,
+                                       const __m128i v_0000,
+                                       const __m128i v_ff80) {
+  // no bits set above 7th bit
+  const __m128i one_byte_bytemask =
+      _mm_cmpeq_epi16(_mm_and_si128(v_u16, v_ff80), v_0000);
+  const uint16_t one_byte_bitmask =
+      static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
+
+  write_v_u16_11bits_to_utf8(v_u16, utf8_output, one_byte_bytemask,
+                             one_byte_bitmask);
+}
+/* end file src/westmere/internal/write_v_u16_11bits_to_utf8.cpp */
+
+} // namespace westmere
+} // namespace internal
+/* end file src/westmere/internal/loader.cpp */
+
+/* begin file src/westmere/sse_validate_utf16.cpp */
 /*
     In UTF-16 code units in range 0xD800 to 0xDFFF have special meaning.
 
@@ -25468,7 +38170,7 @@ must_be_2_3_continuation(const simd8<uint8_t> prev2,
     - there must not be two consecutive high surrogates (0xdc00 .. 0xdfff)
     - there must not be sole low surrogate nor high surrogate
 
-    We're going to build three bitmasks based on the 3rd nibble:
+    We are going to build three bitmasks based on the 3rd nibble:
     - V = valid word,
     - L = low surrogate (0xd800 .. 0xdbff)
     - H = high surrogate (0xdc00 .. 0xdfff)
@@ -25495,7 +38197,7 @@ must_be_2_3_continuation(const simd8<uint8_t> prev2,
    - nullptr if an error was detected.
 */
 template <endianness big_endian>
-const char16_t *avx2_validate_utf16(const char16_t *input, size_t size) {
+const char16_t *sse_validate_utf16(const char16_t *input, size_t size) {
   const char16_t *end = input + size;
 
   const auto v_d8 = simd8<uint8_t>::splat(0xd8);
@@ -25503,13 +38205,13 @@ const char16_t *avx2_validate_utf16(const char16_t *input, size_t size) {
   const auto v_fc = simd8<uint8_t>::splat(0xfc);
   const auto v_dc = simd8<uint8_t>::splat(0xdc);
 
-  while (input + simd16<uint16_t>::ELEMENTS * 2 < end) {
+  while (input + simd16<uint16_t>::SIZE * 2 < end) {
     // 0. Load data: since the validation takes into account only higher
     //    byte of each word, we compress the two vectors into one which
     //    consists only the higher bytes.
     auto in0 = simd16<uint16_t>(input);
-    auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::ELEMENTS);
-
+    auto in1 =
+        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
     if (big_endian) {
       in0 = in0.swap_bytes();
       in1 = in1.swap_bytes();
@@ -25522,9 +38224,10 @@ const char16_t *avx2_validate_utf16(const char16_t *input, size_t size) {
 
     // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
     const auto surrogates_wordmask = (in & v_f8) == v_d8;
-    const uint32_t surrogates_bitmask = surrogates_wordmask.to_bitmask();
-    if (surrogates_bitmask == 0x0) {
-      input += simd16<uint16_t>::ELEMENTS * 2;
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(surrogates_wordmask.to_bitmask());
+    if (surrogates_bitmask == 0x0000) {
+      input += 16;
     } else {
       // 2. We have some surrogates that have to be distinguished:
       //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
@@ -25534,35 +38237,36 @@ const char16_t *avx2_validate_utf16(const char16_t *input, size_t size) {
 
       // V - non-surrogate code units
       //     V = not surrogates_wordmask
-      const uint32_t V = ~surrogates_bitmask;
+      const uint16_t V = static_cast<uint16_t>(~surrogates_bitmask);
 
       // H - word-mask for high surrogates: the six highest bits are 0b1101'11
       const auto vH = (in & v_fc) == v_dc;
-      const uint32_t H = vH.to_bitmask();
+      const uint16_t H = static_cast<uint16_t>(vH.to_bitmask());
 
       // L - word mask for low surrogates
       //     L = not H and surrogates_wordmask
-      const uint32_t L = ~H & surrogates_bitmask;
+      const uint16_t L = static_cast<uint16_t>(~H & surrogates_bitmask);
 
-      const uint32_t a =
-          L & (H >> 1); // A low surrogate must be followed by high one.
-                        // (A low surrogate placed in the 7th register's word
-                        // is an exception we handle.)
-      const uint32_t b =
-          a << 1; // Just mark that the opposite fact is hold,
-                  // thanks to that we have only two masks for valid case.
-      const uint32_t c = V | a | b; // Combine all the masks into the final one.
+      const uint16_t a = static_cast<uint16_t>(
+          L & (H >> 1)); // A low surrogate must be followed by high one.
+                         // (A low surrogate placed in the 7th register's word
+                         // is an exception we handle.)
+      const uint16_t b = static_cast<uint16_t>(
+          a << 1); // Just mark that the opposite fact holds,
+                   // thanks to that we have only two masks for valid case.
+      const uint16_t c = static_cast<uint16_t>(
+          V | a | b); // Combine all the masks into the final one.
 
-      if (c == 0xffffffff) {
+      if (c == 0xffff) {
         // The whole input register contains valid UTF-16, i.e.,
         // either single code units or proper surrogate pairs.
-        input += simd16<uint16_t>::ELEMENTS * 2;
-      } else if (c == 0x7fffffff) {
-        // The 31 lower code units of the input register contains valid UTF-16.
-        // The 31 word may be either a low or high surrogate. It the next
+        input += 16;
+      } else if (c == 0x7fff) {
+        // The 15 lower code units of the input register contain valid UTF-16.
+        // The 15th word may be either a low or high surrogate. In the next
         // iteration we 1) check if the low surrogate is followed by a high
         // one, 2) reject sole high surrogate.
-        input += simd16<uint16_t>::ELEMENTS * 2 - 1;
+        input += 15;
       } else {
         return nullptr;
       }
@@ -25573,8 +38277,8 @@ const char16_t *avx2_validate_utf16(const char16_t *input, size_t size) {
 }
 
 template <endianness big_endian>
-const result avx2_validate_utf16_with_errors(const char16_t *input,
-                                             size_t size) {
+const result sse_validate_utf16_with_errors(const char16_t *input,
+                                            size_t size) {
   if (simdutf_unlikely(size == 0)) {
     return result(error_code::SUCCESS, 0);
   }
@@ -25586,12 +38290,13 @@ const result avx2_validate_utf16_with_errors(const char16_t *input,
   const auto v_fc = simd8<uint8_t>::splat(0xfc);
   const auto v_dc = simd8<uint8_t>::splat(0xdc);
 
-  while (input + simd16<uint16_t>::ELEMENTS * 2 < end) {
+  while (input + simd16<uint16_t>::SIZE * 2 < end) {
     // 0. Load data: since the validation takes into account only higher
     //    byte of each word, we compress the two vectors into one which
     //    consists only the higher bytes.
     auto in0 = simd16<uint16_t>(input);
-    auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::ELEMENTS);
+    auto in1 =
+        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
 
     if (big_endian) {
       in0 = in0.swap_bytes();
@@ -25605,9 +38310,10 @@ const result avx2_validate_utf16_with_errors(const char16_t *input,
 
     // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
     const auto surrogates_wordmask = (in & v_f8) == v_d8;
-    const uint32_t surrogates_bitmask = surrogates_wordmask.to_bitmask();
-    if (surrogates_bitmask == 0x0) {
-      input += simd16<uint16_t>::ELEMENTS * 2;
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(surrogates_wordmask.to_bitmask());
+    if (surrogates_bitmask == 0x0000) {
+      input += 16;
     } else {
       // 2. We have some surrogates that have to be distinguished:
       //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
@@ -25617,35 +38323,36 @@ const result avx2_validate_utf16_with_errors(const char16_t *input,
 
       // V - non-surrogate code units
       //     V = not surrogates_wordmask
-      const uint32_t V = ~surrogates_bitmask;
+      const uint16_t V = static_cast<uint16_t>(~surrogates_bitmask);
 
       // H - word-mask for high surrogates: the six highest bits are 0b1101'11
       const auto vH = (in & v_fc) == v_dc;
-      const uint32_t H = vH.to_bitmask();
+      const uint16_t H = static_cast<uint16_t>(vH.to_bitmask());
 
       // L - word mask for low surrogates
       //     L = not H and surrogates_wordmask
-      const uint32_t L = ~H & surrogates_bitmask;
+      const uint16_t L = static_cast<uint16_t>(~H & surrogates_bitmask);
 
-      const uint32_t a =
-          L & (H >> 1); // A low surrogate must be followed by high one.
-                        // (A low surrogate placed in the 7th register's word
-                        // is an exception we handle.)
-      const uint32_t b =
-          a << 1; // Just mark that the opposite fact is hold,
-                  // thanks to that we have only two masks for valid case.
-      const uint32_t c = V | a | b; // Combine all the masks into the final one.
+      const uint16_t a = static_cast<uint16_t>(
+          L & (H >> 1)); // A low surrogate must be followed by high one.
+                         // (A low surrogate placed in the 7th register's word
+                         // is an exception we handle.)
+      const uint16_t b = static_cast<uint16_t>(
+          a << 1); // Just mark that the opposite fact holds,
+                   // thanks to that we have only two masks for valid case.
+      const uint16_t c = static_cast<uint16_t>(
+          V | a | b); // Combine all the masks into the final one.
 
-      if (c == 0xffffffff) {
+      if (c == 0xffff) {
         // The whole input register contains valid UTF-16, i.e.,
         // either single code units or proper surrogate pairs.
-        input += simd16<uint16_t>::ELEMENTS * 2;
-      } else if (c == 0x7fffffff) {
-        // The 31 lower code units of the input register contains valid UTF-16.
-        // The 31 word may be either a low or high surrogate. It the next
+        input += 16;
+      } else if (c == 0x7fff) {
+        // The 15 lower code units of the input register contain valid UTF-16.
+        // The 15th word may be either a low or high surrogate. In the next
         // iteration we 1) check if the low surrogate is followed by a high
         // one, 2) reject sole high surrogate.
-        input += simd16<uint16_t>::ELEMENTS * 2 - 1;
+        input += 15;
       } else {
         return result(error_code::SURROGATE, input - start);
       }
@@ -25654,228 +38361,210 @@ const result avx2_validate_utf16_with_errors(const char16_t *input,
 
   return result(error_code::SUCCESS, input - start);
 }
-/* end file src/haswell/avx2_validate_utf16.cpp */
-/* begin file src/haswell/avx2_validate_utf32le.cpp */
+/* end file src/westmere/sse_validate_utf16.cpp */
+/* begin file src/westmere/sse_validate_utf32le.cpp */
 /* Returns:
    - pointer to the last unprocessed character (a scalar fallback should check
    the rest);
    - nullptr if an error was detected.
 */
-const char32_t *avx2_validate_utf32le(const char32_t *input, size_t size) {
+const char32_t *sse_validate_utf32le(const char32_t *input, size_t size) {
   const char32_t *end = input + size;
 
-  const __m256i standardmax = _mm256_set1_epi32(0x10ffff);
-  const __m256i offset = _mm256_set1_epi32(0xffff2000);
-  const __m256i standardoffsetmax = _mm256_set1_epi32(0xfffff7ff);
-  __m256i currentmax = _mm256_setzero_si256();
-  __m256i currentoffsetmax = _mm256_setzero_si256();
+  const __m128i standardmax = _mm_set1_epi32(0x10ffff);
+  const __m128i offset = _mm_set1_epi32(0xffff2000);
+  const __m128i standardoffsetmax = _mm_set1_epi32(0xfffff7ff);
+  __m128i currentmax = _mm_setzero_si128();
+  __m128i currentoffsetmax = _mm_setzero_si128();
 
-  while (input + 8 < end) {
-    const __m256i in = _mm256_loadu_si256((__m256i *)input);
-    currentmax = _mm256_max_epu32(in, currentmax);
+  while (input + 4 < end) {
+    const __m128i in = _mm_loadu_si128((__m128i *)input);
+    currentmax = _mm_max_epu32(in, currentmax);
     currentoffsetmax =
-        _mm256_max_epu32(_mm256_add_epi32(in, offset), currentoffsetmax);
-    input += 8;
+        _mm_max_epu32(_mm_add_epi32(in, offset), currentoffsetmax);
+    input += 4;
   }
-  __m256i is_zero =
-      _mm256_xor_si256(_mm256_max_epu32(currentmax, standardmax), standardmax);
-  if (_mm256_testz_si256(is_zero, is_zero) == 0) {
+  __m128i is_zero =
+      _mm_xor_si128(_mm_max_epu32(currentmax, standardmax), standardmax);
+  if (_mm_test_all_zeros(is_zero, is_zero) == 0) {
     return nullptr;
   }
 
-  is_zero = _mm256_xor_si256(
-      _mm256_max_epu32(currentoffsetmax, standardoffsetmax), standardoffsetmax);
-  if (_mm256_testz_si256(is_zero, is_zero) == 0) {
+  is_zero = _mm_xor_si128(_mm_max_epu32(currentoffsetmax, standardoffsetmax),
+                          standardoffsetmax);
+  if (_mm_test_all_zeros(is_zero, is_zero) == 0) {
     return nullptr;
   }
 
   return input;
 }
 
-const result avx2_validate_utf32le_with_errors(const char32_t *input,
-                                               size_t size) {
+const result sse_validate_utf32le_with_errors(const char32_t *input,
+                                              size_t size) {
   const char32_t *start = input;
   const char32_t *end = input + size;
 
-  const __m256i standardmax = _mm256_set1_epi32(0x10ffff);
-  const __m256i offset = _mm256_set1_epi32(0xffff2000);
-  const __m256i standardoffsetmax = _mm256_set1_epi32(0xfffff7ff);
-  __m256i currentmax = _mm256_setzero_si256();
-  __m256i currentoffsetmax = _mm256_setzero_si256();
+  const __m128i standardmax = _mm_set1_epi32(0x10ffff);
+  const __m128i offset = _mm_set1_epi32(0xffff2000);
+  const __m128i standardoffsetmax = _mm_set1_epi32(0xfffff7ff);
+  __m128i currentmax = _mm_setzero_si128();
+  __m128i currentoffsetmax = _mm_setzero_si128();
 
-  while (input + 8 < end) {
-    const __m256i in = _mm256_loadu_si256((__m256i *)input);
-    currentmax = _mm256_max_epu32(in, currentmax);
+  while (input + 4 < end) {
+    const __m128i in = _mm_loadu_si128((__m128i *)input);
+    currentmax = _mm_max_epu32(in, currentmax);
     currentoffsetmax =
-        _mm256_max_epu32(_mm256_add_epi32(in, offset), currentoffsetmax);
+        _mm_max_epu32(_mm_add_epi32(in, offset), currentoffsetmax);
 
-    __m256i is_zero = _mm256_xor_si256(
-        _mm256_max_epu32(currentmax, standardmax), standardmax);
-    if (_mm256_testz_si256(is_zero, is_zero) == 0) {
+    __m128i is_zero =
+        _mm_xor_si128(_mm_max_epu32(currentmax, standardmax), standardmax);
+    if (_mm_test_all_zeros(is_zero, is_zero) == 0) {
       return result(error_code::TOO_LARGE, input - start);
     }
 
-    is_zero =
-        _mm256_xor_si256(_mm256_max_epu32(currentoffsetmax, standardoffsetmax),
-                         standardoffsetmax);
-    if (_mm256_testz_si256(is_zero, is_zero) == 0) {
-      return result(error_code::SURROGATE, input - start);
-    }
-    input += 8;
-  }
-
-  return result(error_code::SUCCESS, input - start);
-}
-/* end file src/haswell/avx2_validate_utf32le.cpp */
-
-/* begin file src/haswell/avx2_convert_latin1_to_utf8.cpp */
-std::pair<const char *, char *>
-avx2_convert_latin1_to_utf8(const char *latin1_input, size_t len,
-                            char *utf8_output) {
-  const char *end = latin1_input + len;
-  const __m256i v_0000 = _mm256_setzero_si256();
-  const __m256i v_c080 = _mm256_set1_epi16((int16_t)0xc080);
-  const __m256i v_ff80 = _mm256_set1_epi16((int16_t)0xff80);
-  const size_t safety_margin = 12;
-
-  while (end - latin1_input >= std::ptrdiff_t(16 + safety_margin)) {
-    __m128i in8 = _mm_loadu_si128((__m128i *)latin1_input);
-    // a single 16-bit UTF-16 word can yield 1, 2 or 3 UTF-8 bytes
-    const __m128i v_80 = _mm_set1_epi8((char)0x80);
-    if (_mm_testz_si128(in8, v_80)) { // ASCII fast path!!!!
-      // 1. store (16 bytes)
-      _mm_storeu_si128((__m128i *)utf8_output, in8);
-      // 2. adjust pointers
-      latin1_input += 16;
-      utf8_output += 16;
-      continue; // we are done for this round!
-    }
-    // We proceed only with the first 16 bytes.
-    const __m256i in = _mm256_cvtepu8_epi16((in8));
-
-    // 1. prepare 2-byte values
-    // input 16-bit word : [0000|0000|aabb|bbbb] x 8
-    // expected output   : [1100|00aa|10bb|bbbb] x 8
-    const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
-    const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
-
-    // t0 = [0000|00aa|bbbb|bb00]
-    const __m256i t0 = _mm256_slli_epi16(in, 2);
-    // t1 = [0000|00aa|0000|0000]
-    const __m256i t1 = _mm256_and_si256(t0, v_1f00);
-    // t2 = [0000|0000|00bb|bbbb]
-    const __m256i t2 = _mm256_and_si256(in, v_003f);
-    // t3 = [000a|aaaa|00bb|bbbb]
-    const __m256i t3 = _mm256_or_si256(t1, t2);
-    // t4 = [1100|00aa|10bb|bbbb]
-    const __m256i t4 = _mm256_or_si256(t3, v_c080);
-
-    // 2. merge ASCII and 2-byte codewords
-
-    // no bits set above 7th bit
-    const __m256i one_byte_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
-
-    const __m256i utf8_unpacked = _mm256_blendv_epi8(t4, in, one_byte_bytemask);
-
-    // 3. prepare bitmask for 8-bit lookup
-    const uint32_t M0 = one_byte_bitmask & 0x55555555;
-    const uint32_t M1 = M0 >> 7;
-    const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
-    // 4. pack the bytes
-
-    const uint8_t *row =
-        &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
-    const uint8_t *row_2 =
-        &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >> 16)]
-                                                            [0];
-
-    const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
-    const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
-
-    const __m256i utf8_packed = _mm256_shuffle_epi8(
-        utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
-    // 5. store bytes
-    _mm_storeu_si128((__m128i *)utf8_output,
-                     _mm256_castsi256_si128(utf8_packed));
-    utf8_output += row[0];
-    _mm_storeu_si128((__m128i *)utf8_output,
-                     _mm256_extractf128_si256(utf8_packed, 1));
-    utf8_output += row_2[0];
-
-    // 6. adjust pointers
-    latin1_input += 16;
-    continue;
-
-  } // while
-  return std::make_pair(latin1_input, utf8_output);
+    is_zero = _mm_xor_si128(_mm_max_epu32(currentoffsetmax, standardoffsetmax),
+                            standardoffsetmax);
+    if (_mm_test_all_zeros(is_zero, is_zero) == 0) {
+      return result(error_code::SURROGATE, input - start);
+    }
+    input += 4;
+  }
+
+  return result(error_code::SUCCESS, input - start);
 }
-/* end file src/haswell/avx2_convert_latin1_to_utf8.cpp */
-/* begin file src/haswell/avx2_convert_latin1_to_utf16.cpp */
-template <endianness big_endian>
-std::pair<const char *, char16_t *>
-avx2_convert_latin1_to_utf16(const char *latin1_input, size_t len,
-                             char16_t *utf16_output) {
-  size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 32
+/* end file src/westmere/sse_validate_utf32le.cpp */
 
-  size_t i = 0;
-  for (; i < rounded_len; i += 16) {
-    // Load 16 bytes from the address (input + i) into a xmm register
-    __m128i xmm0 =
-        _mm_loadu_si128(reinterpret_cast<const __m128i *>(latin1_input + i));
+/* begin file src/westmere/sse_convert_latin1_to_utf8.cpp */
+std::pair<const char *const, char *const>
+sse_convert_latin1_to_utf8(const char *latin_input,
+                           const size_t latin_input_length, char *utf8_output) {
+  const char *end = latin_input + latin_input_length;
 
-    // Zero extend each byte in xmm0 to word and put it in another xmm register
-    __m128i xmm1 = _mm_cvtepu8_epi16(xmm0);
+  const __m128i v_0000 = _mm_setzero_si128();
+  // 0b1000_0000
+  const __m128i v_80 = _mm_set1_epi8((uint8_t)0x80);
+  // 0b1111_1111_1000_0000
+  const __m128i v_ff80 = _mm_set1_epi16((uint16_t)0xff80);
 
-    // Shift xmm0 to the right by 8 bytes
-    xmm0 = _mm_srli_si128(xmm0, 8);
+  const __m128i latin_1_half_into_u16_byte_mask =
+      _mm_setr_epi8(0, '\x80', 1, '\x80', 2, '\x80', 3, '\x80', 4, '\x80', 5,
+                    '\x80', 6, '\x80', 7, '\x80');
 
-    // Zero extend each byte in the shifted xmm0 to word in xmm0
-    xmm0 = _mm_cvtepu8_epi16(xmm0);
+  const __m128i latin_2_half_into_u16_byte_mask =
+      _mm_setr_epi8(8, '\x80', 9, '\x80', 10, '\x80', 11, '\x80', 12, '\x80',
+                    13, '\x80', 14, '\x80', 15, '\x80');
 
-    if (big_endian) {
-      const __m128i swap =
-          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-      xmm0 = _mm_shuffle_epi8(xmm0, swap);
-      xmm1 = _mm_shuffle_epi8(xmm1, swap);
+  // Each Latin-1 byte takes 1-2 UTF-8 bytes.
+  // The slow path writes 8-15 useful bytes twice (it eagerly stores 16 bytes,
+  // then adjusts the pointer), so the last store can exceed the utf8_output
+  // size by 1-8 bytes. By reserving 8 extra input bytes, we expect the output
+  // to have 8-16 bytes free.
+  while (end - latin_input >= 16 + 8) {
+    // Load 16 Latin1 characters (16 bytes) into a 128-bit register
+    __m128i v_latin = _mm_loadu_si128((__m128i *)latin_input);
+
+    if (_mm_testz_si128(v_latin, v_80)) { // ASCII fast path!!!!
+      _mm_storeu_si128((__m128i *)utf8_output, v_latin);
+      latin_input += 16;
+      utf8_output += 16;
+      continue;
     }
 
-    // Store the contents of xmm1 into the address pointed by (output + i)
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf16_output + i), xmm1);
+    // assuming a/b are bytes and A/B are uint16 of the same value
+    // aaaa_aaaa_bbbb_bbbb -> AAAA_AAAA
+    __m128i v_u16_latin_1_half =
+        _mm_shuffle_epi8(v_latin, latin_1_half_into_u16_byte_mask);
+    // aaaa_aaaa_bbbb_bbbb -> BBBB_BBBB
+    __m128i v_u16_latin_2_half =
+        _mm_shuffle_epi8(v_latin, latin_2_half_into_u16_byte_mask);
 
-    // Store the contents of xmm0 into the address pointed by (output + i + 8)
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf16_output + i + 8), xmm0);
+    internal::westmere::write_v_u16_11bits_to_utf8(v_u16_latin_1_half,
+                                                   utf8_output, v_0000, v_ff80);
+    internal::westmere::write_v_u16_11bits_to_utf8(v_u16_latin_2_half,
+                                                   utf8_output, v_0000, v_ff80);
+    latin_input += 16;
+  }
+
+  if (end - latin_input >= 16) {
+    // Load 16 Latin1 characters (16 bytes) into a 128-bit register
+    __m128i v_latin = _mm_loadu_si128((__m128i *)latin_input);
+
+    if (_mm_testz_si128(v_latin, v_80)) { // ASCII fast path!!!!
+      _mm_storeu_si128((__m128i *)utf8_output, v_latin);
+      latin_input += 16;
+      utf8_output += 16;
+    } else {
+      // assuming a/b are bytes and A/B are uint16 of the same value
+      // aaaa_aaaa_bbbb_bbbb -> AAAA_AAAA
+      __m128i v_u16_latin_1_half =
+          _mm_shuffle_epi8(v_latin, latin_1_half_into_u16_byte_mask);
+      internal::westmere::write_v_u16_11bits_to_utf8(
+          v_u16_latin_1_half, utf8_output, v_0000, v_ff80);
+      latin_input += 8;
+    }
   }
 
+  return std::make_pair(latin_input, utf8_output);
+}
+/* end file src/westmere/sse_convert_latin1_to_utf8.cpp */
+/* begin file src/westmere/sse_convert_latin1_to_utf16.cpp */
+template <endianness big_endian>
+std::pair<const char *, char16_t *>
+sse_convert_latin1_to_utf16(const char *latin1_input, size_t len,
+                            char16_t *utf16_output) {
+  size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 16
+  for (size_t i = 0; i < rounded_len; i += 16) {
+    // Load 16 Latin1 characters into a 128-bit register
+    __m128i in =
+        _mm_loadu_si128(reinterpret_cast<const __m128i *>(&latin1_input[i]));
+    __m128i out1 = big_endian ? _mm_unpacklo_epi8(_mm_setzero_si128(), in)
+                              : _mm_unpacklo_epi8(in, _mm_setzero_si128());
+    __m128i out2 = big_endian ? _mm_unpackhi_epi8(_mm_setzero_si128(), in)
+                              : _mm_unpackhi_epi8(in, _mm_setzero_si128());
+    // Zero extend each Latin1 character to 16-bit integers and store the
+    // results back to memory
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(&utf16_output[i]), out1);
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(&utf16_output[i + 8]), out2);
+  }
+  // return pointers pointing to where we left off
   return std::make_pair(latin1_input + rounded_len, utf16_output + rounded_len);
 }
-/* end file src/haswell/avx2_convert_latin1_to_utf16.cpp */
-/* begin file src/haswell/avx2_convert_latin1_to_utf32.cpp */
+/* end file src/westmere/sse_convert_latin1_to_utf16.cpp */
+/* begin file src/westmere/sse_convert_latin1_to_utf32.cpp */
 std::pair<const char *, char32_t *>
-avx2_convert_latin1_to_utf32(const char *buf, size_t len,
-                             char32_t *utf32_output) {
-  size_t rounded_len = ((len | 7) ^ 7); // Round down to nearest multiple of 8
+sse_convert_latin1_to_utf32(const char *buf, size_t len,
+                            char32_t *utf32_output) {
+  const char *end = buf + len;
 
-  for (size_t i = 0; i < rounded_len; i += 8) {
-    // Load 8 Latin1 characters into a 64-bit register
-    __m128i in = _mm_loadl_epi64((__m128i *)&buf[i]);
+  while (end - buf >= 16) {
+    // Load 16 Latin1 characters (16 bytes) into a 128-bit register
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
 
-    // Zero extend each set of 8 Latin1 characters to 8 32-bit integers using
-    // vpmovzxbd
-    __m256i out = _mm256_cvtepu8_epi32(in);
+    // Shift input to process next 4 bytes
+    __m128i in_shifted1 = _mm_srli_si128(in, 4);
+    __m128i in_shifted2 = _mm_srli_si128(in, 8);
+    __m128i in_shifted3 = _mm_srli_si128(in, 12);
 
-    // Store the results back to memory
-    _mm256_storeu_si256((__m256i *)&utf32_output[i], out);
+    // expand 8-bit to 32-bit unit
+    __m128i out1 = _mm_cvtepu8_epi32(in);
+    __m128i out2 = _mm_cvtepu8_epi32(in_shifted1);
+    __m128i out3 = _mm_cvtepu8_epi32(in_shifted2);
+    __m128i out4 = _mm_cvtepu8_epi32(in_shifted3);
+
+    _mm_storeu_si128((__m128i *)utf32_output, out1);
+    _mm_storeu_si128((__m128i *)(utf32_output + 4), out2);
+    _mm_storeu_si128((__m128i *)(utf32_output + 8), out3);
+    _mm_storeu_si128((__m128i *)(utf32_output + 12), out4);
+
+    utf32_output += 16;
+    buf += 16;
   }
 
-  // return pointers pointing to where we left off
-  return std::make_pair(buf + rounded_len, utf32_output + rounded_len);
+  return std::make_pair(buf, utf32_output);
 }
-/* end file src/haswell/avx2_convert_latin1_to_utf32.cpp */
+/* end file src/westmere/sse_convert_latin1_to_utf32.cpp */
 
-/* begin file src/haswell/avx2_convert_utf8_to_utf16.cpp */
+/* begin file src/westmere/sse_convert_utf8_to_utf16.cpp */
 // depends on "tables/utf8_to_utf16_tables.h"
 
 // Convert up to 12 bytes from utf8 to utf16 using a mask indicating the
@@ -25904,18 +38593,20 @@ size_t convert_masked_utf8_to_utf16(const char *input,
       utf8_end_of_code_point_mask & 0xfff;
   if (utf8_end_of_code_point_mask == 0xfff) {
     // We process the data in chunks of 12 bytes.
-    __m256i ascii = _mm256_cvtepu8_epi16(in);
+    // Note: using 16 bytes is unsafe, see issue_ossfuzz_71218
+    __m128i ascii_first = _mm_cvtepu8_epi16(in);
+    __m128i ascii_second = _mm_cvtepu8_epi16(_mm_srli_si128(in, 8));
     if (big_endian) {
-      const __m256i swap256 = _mm256_setr_epi8(
-          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
-          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
-      ascii = _mm256_shuffle_epi8(ascii, swap256);
+      ascii_first = _mm_shuffle_epi8(ascii_first, swap);
+      ascii_second = _mm_shuffle_epi8(ascii_second, swap);
     }
-    _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf16_output), ascii);
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf16_output), ascii_first);
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf16_output + 8),
+                     ascii_second);
     utf16_output += 12; // We wrote 12 16-bit characters.
     return 12;          // We consumed 12 bytes.
   }
-  if (((utf8_end_of_code_point_mask & 0xffff) == 0xaaaa)) {
+  if (((utf8_end_of_code_point_mask & 0xFFFF) == 0xaaaa)) {
     // We want to take 8 2-byte UTF-8 code units and turn them into 8 2-byte
     // UTF-16 code units. There is probably a more efficient sequence, but the
     // following might do.
@@ -25955,11 +38646,12 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     utf16_output += 4;
     return 12;
   }
+  /// We do not have a fast path available, so we fall back.
 
-  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
-      [input_utf8_end_of_code_point_mask][0];
-  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
-      [input_utf8_end_of_code_point_mask][1];
+  const uint8_t idx =
+      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][0];
+  const uint8_t consumed =
+      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
   if (idx < 64) {
     // SIX (6) input code-code units
     // this is a relatively easy scenario
@@ -25967,8 +38659,8 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     // code code units spanning between 1 and 2 bytes each is 12 bytes. On
     // processors where pdep/pext is fast, we might be able to use a small
     // lookup table.
-    const __m128i sh = _mm_loadu_si128(
-        (const __m128i *)simdutf::tables::utf8_to_utf16::shufutf8[idx]);
+    const __m128i sh =
+        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
     const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
@@ -25976,12 +38668,11 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     if (big_endian)
       composed = _mm_shuffle_epi8(composed, swap);
     _mm_storeu_si128((__m128i *)utf16_output, composed);
-    utf16_output += 6; // We wrote 12 bytes, 6 code points. There is a potential
-                       // overflow of 4 bytes.
+    utf16_output += 6; // We wrote 12 bytes, 6 code points.
   } else if (idx < 145) {
     // FOUR (4) input code-code units
-    const __m128i sh = _mm_loadu_si128(
-        (const __m128i *)simdutf::tables::utf8_to_utf16::shufutf8[idx]);
+    const __m128i sh =
+        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii =
         _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
@@ -25997,7 +38688,7 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     if (big_endian)
       composed_repacked = _mm_shuffle_epi8(composed_repacked, swap);
     _mm_storeu_si128((__m128i *)utf16_output, composed_repacked);
-    utf16_output += 4; // Here we overflow by 8 bytes.
+    utf16_output += 4;
   } else if (idx < 209) {
     // TWO (2) input code-code units
     //////////////
@@ -26009,8 +38700,8 @@ size_t convert_masked_utf8_to_utf16(const char *input,
     // only leading bytes at least as large as 0xf0 generate surrogate pairs. We
     // do as at the cost of an extra mask.
     /////////////
-    const __m128i sh = _mm_loadu_si128(
-        (const __m128i *)simdutf::tables::utf8_to_utf16::shufutf8[idx]);
+    const __m128i sh =
+        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
     const __m128i perm = _mm_shuffle_epi8(in, sh);
     const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi32(0x7f));
     const __m128i middlebyte = _mm_and_si128(perm, _mm_set1_epi32(0x3f00));
@@ -26071,8 +38762,8 @@ size_t convert_masked_utf8_to_utf16(const char *input,
   }
   return consumed;
 }
-/* end file src/haswell/avx2_convert_utf8_to_utf16.cpp */
-/* begin file src/haswell/avx2_convert_utf8_to_utf32.cpp */
+/* end file src/westmere/sse_convert_utf8_to_utf16.cpp */
+/* begin file src/westmere/sse_convert_utf8_to_utf32.cpp */
 // depends on "tables/utf8_to_utf16_tables.h"
 
 // Convert up to 12 bytes from utf8 to utf32 using a mask indicating the
@@ -26098,10 +38789,14 @@ size_t convert_masked_utf8_to_utf32(const char *input,
       utf8_end_of_code_point_mask & 0xfff;
   if (utf8_end_of_code_point_mask == 0xfff) {
     // We process the data in chunks of 12 bytes.
-    _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output),
-                        _mm256_cvtepu8_epi32(in));
-    _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output + 8),
-                        _mm256_cvtepu8_epi32(_mm_srli_si128(in, 8)));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
+                     _mm_cvtepu8_epi32(in));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
+                     _mm_cvtepu8_epi32(_mm_srli_si128(in, 4)));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 8),
+                     _mm_cvtepu8_epi32(_mm_srli_si128(in, 8)));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 12),
+                     _mm_cvtepu8_epi32(_mm_srli_si128(in, 12)));
     utf32_output += 12; // We wrote 12 32-bit characters.
     return 12;          // We consumed 12 bytes.
   }
@@ -26115,9 +38810,11 @@ size_t convert_masked_utf8_to_utf32(const char *input,
     const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
     const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
     const __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    _mm256_storeu_si256((__m256i *)utf32_output,
-                        _mm256_cvtepu16_epi32(composed));
-    utf32_output += 8; // We wrote 16 bytes, 8 code points.
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
+                     _mm_cvtepu16_epi32(composed));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
+                     _mm_cvtepu16_epi32(_mm_srli_si128(composed, 8)));
+    utf32_output += 8; // We wrote 32 bytes, 8 code points.
     return 16;
   }
   if (input_utf8_end_of_code_point_mask == 0x924) {
@@ -26160,10 +38857,11 @@ size_t convert_masked_utf8_to_utf32(const char *input,
     const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
     const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
     const __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    _mm256_storeu_si256((__m256i *)utf32_output,
-                        _mm256_cvtepu16_epi32(composed));
-    utf32_output += 6; // We wrote 24 bytes, 6 code points. There is a potential
-    // overflow of 32 - 24 = 8 bytes.
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
+                     _mm_cvtepu16_epi32(composed));
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
+                     _mm_cvtepu16_epi32(_mm_srli_si128(composed, 8)));
+    utf32_output += 6; // We wrote 24 bytes, 6 code points.
   } else if (idx < 145) {
     // FOUR (4) input code-code units
     const __m128i sh =
@@ -26201,46 +38899,99 @@ size_t convert_masked_utf8_to_utf32(const char *input,
         _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted),
                      _mm_or_si128(highbyte_shifted, middlehighbyte_shifted));
     _mm_storeu_si128((__m128i *)utf32_output, composed);
-    utf32_output +=
-        3; // We wrote 3 * 4 bytes, there is a potential overflow of 4 bytes.
+    utf32_output += 3;
   } else {
     // here we know that there is an error but we do not handle errors
   }
   return consumed;
 }
-/* end file src/haswell/avx2_convert_utf8_to_utf32.cpp */
+/* end file src/westmere/sse_convert_utf8_to_utf32.cpp */
+/* begin file src/westmere/sse_convert_utf8_to_latin1.cpp */
+// depends on "tables/utf8_to_utf16_tables.h"
 
-/* begin file src/haswell/avx2_convert_utf16_to_latin1.cpp */
+// Convert up to 12 bytes from utf8 to latin1 using a mask indicating the
+// end of the code points. Only the least significant 12 bits of the mask
+// are accessed.
+// It returns how many bytes were consumed (up to 12).
+size_t convert_masked_utf8_to_latin1(const char *input,
+                                     uint64_t utf8_end_of_code_point_mask,
+                                     char *&latin1_output) {
+  // we use an approach where we try to process up to 12 input bytes.
+  // Why 12 input bytes and not 16? Because we are concerned with the size of
+  // the lookup tables. Also 12 is nicely divisible by two and three.
+  //
+  //
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // is maybe beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
+  //
+  const __m128i in = _mm_loadu_si128((__m128i *)input);
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask &
+      0xfff; // we are only processing 12 bytes in case it is not all ASCII
+  if (utf8_end_of_code_point_mask == 0xfff) {
+    // We process the data in chunks of 12 bytes.
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(latin1_output), in);
+    latin1_output += 12; // We wrote 12 characters.
+    return 12;           // We consumed 12 bytes.
+  }
+  /// We do not have a fast path available, so we fall back.
+  const uint8_t idx =
+      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][0];
+  const uint8_t consumed =
+      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
+  // this indicates an invalid input:
+  if (idx >= 64) {
+    return consumed;
+  }
+  // Here we should have (idx < 64); if not, there is a bug in the validation
+  // or elsewhere. SIX (6) input code units: this is a relatively easy
+  // scenario. We process SIX (6) input code units. The max length in bytes of
+  // six code units spanning between 1 and 2 bytes each is 12 bytes. On
+  // processors where pdep/pext is fast, we might be able to use a small lookup
+  // table.
+  const __m128i sh =
+      _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
+  const __m128i perm = _mm_shuffle_epi8(in, sh);
+  const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
+  const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
+  __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
+  const __m128i latin1_packed = _mm_packus_epi16(composed, composed);
+  // writing 8 bytes even though we only care about the first 6 bytes.
+  // performance note: it would be faster to use _mm_storeu_si128, we should
+  // investigate.
+  _mm_storel_epi64((__m128i *)latin1_output, latin1_packed);
+  latin1_output += 6; // We wrote 6 bytes.
+  return consumed;
+}
+/* end file src/westmere/sse_convert_utf8_to_latin1.cpp */
+
+/* begin file src/westmere/sse_convert_utf16_to_latin1.cpp */
 template <endianness big_endian>
 std::pair<const char16_t *, char *>
-avx2_convert_utf16_to_latin1(const char16_t *buf, size_t len,
-                             char *latin1_output) {
+sse_convert_utf16_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_output) {
   const char16_t *end = buf + len;
-  while (end - buf >= 16) {
-    // Load 16 UTF-16 characters into 256-bit AVX2 register
-    __m256i in = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(buf));
+  while (end - buf >= 8) {
+    // Load 8 UTF-16 characters into 128-bit SSE register
+    __m128i in = _mm_loadu_si128(reinterpret_cast<const __m128i *>(buf));
 
     if (!match_system(big_endian)) {
-      const __m256i swap = _mm256_setr_epi8(
-          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
-          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
-      in = _mm256_shuffle_epi8(in, swap);
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      in = _mm_shuffle_epi8(in, swap);
     }
 
-    __m256i high_byte_mask = _mm256_set1_epi16((int16_t)0xFF00);
-    if (_mm256_testz_si256(in, high_byte_mask)) {
+    __m128i high_byte_mask = _mm_set1_epi16((int16_t)0xFF00);
+    if (_mm_testz_si128(in, high_byte_mask)) {
       // Pack 16-bit characters into 8-bit and store in latin1_output
-      __m128i lo = _mm256_extractf128_si256(in, 0);
-      __m128i hi = _mm256_extractf128_si256(in, 1);
-      __m128i latin1_packed_lo = _mm_packus_epi16(lo, lo);
-      __m128i latin1_packed_hi = _mm_packus_epi16(hi, hi);
+      __m128i latin1_packed = _mm_packus_epi16(in, in);
       _mm_storel_epi64(reinterpret_cast<__m128i *>(latin1_output),
-                       latin1_packed_lo);
-      _mm_storel_epi64(reinterpret_cast<__m128i *>(latin1_output + 8),
-                       latin1_packed_hi);
+                       latin1_packed);
       // Adjust pointers for next iteration
-      buf += 16;
-      latin1_output += 16;
+      buf += 8;
+      latin1_output += 8;
     } else {
       return std::make_pair(nullptr, reinterpret_cast<char *>(latin1_output));
     }
@@ -26250,54 +39001,47 @@ avx2_convert_utf16_to_latin1(const char16_t *buf, size_t len,
 
 template <endianness big_endian>
 std::pair<result, char *>
-avx2_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
-                                         char *latin1_output) {
+sse_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
+                                        char *latin1_output) {
   const char16_t *start = buf;
   const char16_t *end = buf + len;
-  while (end - buf >= 16) {
-    __m256i in = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(buf));
+  while (end - buf >= 8) {
+    __m128i in = _mm_loadu_si128(reinterpret_cast<const __m128i *>(buf));
 
     if (!match_system(big_endian)) {
-      const __m256i swap = _mm256_setr_epi8(
-          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
-          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
-      in = _mm256_shuffle_epi8(in, swap);
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      in = _mm_shuffle_epi8(in, swap);
     }
 
-    __m256i high_byte_mask = _mm256_set1_epi16((int16_t)0xFF00);
-    if (_mm256_testz_si256(in, high_byte_mask)) {
-      __m128i lo = _mm256_extractf128_si256(in, 0);
-      __m128i hi = _mm256_extractf128_si256(in, 1);
-      __m128i latin1_packed_lo = _mm_packus_epi16(lo, lo);
-      __m128i latin1_packed_hi = _mm_packus_epi16(hi, hi);
+    __m128i high_byte_mask = _mm_set1_epi16((int16_t)0xFF00);
+    if (_mm_testz_si128(in, high_byte_mask)) {
+      __m128i latin1_packed = _mm_packus_epi16(in, in);
       _mm_storel_epi64(reinterpret_cast<__m128i *>(latin1_output),
-                       latin1_packed_lo);
-      _mm_storel_epi64(reinterpret_cast<__m128i *>(latin1_output + 8),
-                       latin1_packed_hi);
-      buf += 16;
-      latin1_output += 16;
+                       latin1_packed);
+      buf += 8;
+      latin1_output += 8;
     } else {
       // Fallback to scalar code for handling errors
-      for (int k = 0; k < 16; k++) {
+      for (int k = 0; k < 8; k++) {
         uint16_t word = !match_system(big_endian)
                             ? scalar::utf16::swap_bytes(buf[k])
                             : buf[k];
         if (word <= 0xff) {
           *latin1_output++ = char(word);
         } else {
-          return std::make_pair(
-              result{error_code::TOO_LARGE, (size_t)(buf - start + k)},
-              latin1_output);
+          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
+                                latin1_output);
         }
       }
-      buf += 16;
+      buf += 8;
     }
   } // while
-  return std::make_pair(result{error_code::SUCCESS, (size_t)(buf - start)},
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
                         latin1_output);
 }
-/* end file src/haswell/avx2_convert_utf16_to_latin1.cpp */
-/* begin file src/haswell/avx2_convert_utf16_to_utf8.cpp */
+/* end file src/westmere/sse_convert_utf16_to_latin1.cpp */
+/* begin file src/westmere/sse_convert_utf16_to_utf8.cpp */
 /*
     The vectorized algorithm works on single SSE register i.e., it
     loads eight 16-bit code units.
@@ -26353,117 +39097,91 @@ avx2_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
 */
 template <endianness big_endian>
 std::pair<const char16_t *, char *>
-avx2_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_output) {
+sse_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_output) {
+
   const char16_t *end = buf + len;
-  const __m256i v_0000 = _mm256_setzero_si256();
-  const __m256i v_f800 = _mm256_set1_epi16((int16_t)0xf800);
-  const __m256i v_d800 = _mm256_set1_epi16((int16_t)0xd800);
-  const __m256i v_c080 = _mm256_set1_epi16((int16_t)0xc080);
+
+  const __m128i v_0000 = _mm_setzero_si128();
+  const __m128i v_f800 = _mm_set1_epi16((int16_t)0xf800);
+  const __m128i v_d800 = _mm_set1_epi16((int16_t)0xd800);
   const size_t safety_margin =
       12; // to avoid overruns, see issue
           // https://github.com/simdutf/simdutf/issues/92
 
   while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
     if (big_endian) {
-      const __m256i swap = _mm256_setr_epi8(
-          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
-          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
-      in = _mm256_shuffle_epi8(in, swap);
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      in = _mm_shuffle_epi8(in, swap);
     }
     // a single 16-bit UTF-16 word can yield 1, 2 or 3 UTF-8 bytes
-    const __m256i v_ff80 = _mm256_set1_epi16((int16_t)0xff80);
-    if (_mm256_testz_si256(in, v_ff80)) { // ASCII fast path!!!!
-      // 1. pack the bytes
-      const __m128i utf8_packed = _mm_packus_epi16(
-          _mm256_castsi256_si128(in), _mm256_extractf128_si256(in, 1));
-      // 2. store (16 bytes)
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
-      // 3. adjust pointers
-      buf += 16;
-      utf8_output += 16;
-      continue; // we are done for this round!
+    const __m128i v_ff80 = _mm_set1_epi16((int16_t)0xff80);
+    if (_mm_testz_si128(in, v_ff80)) { // ASCII fast path!!!!
+      __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
+      if (big_endian) {
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        nextin = _mm_shuffle_epi8(nextin, swap);
+      }
+      if (!_mm_testz_si128(nextin, v_ff80)) {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        const __m128i utf8_packed = _mm_packus_epi16(in, in);
+        // 2. store (16 bytes)
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        in = nextin;
+      } else {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        const __m128i utf8_packed = _mm_packus_epi16(in, nextin);
+        // 2. store (16 bytes)
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 16;
+        utf8_output += 16;
+        continue; // we are done for this round!
+      }
     }
-    // no bits set above 7th bit
-    const __m256i one_byte_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
-
-    // no bits set above 11th bit
-    const __m256i one_or_two_bytes_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_0000);
-    const uint32_t one_or_two_bytes_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
-    if (one_or_two_bytes_bitmask == 0xffffffff) {
-
-      // 1. prepare 2-byte values
-      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-      // expected output   : [110a|aaaa|10bb|bbbb] x 8
-      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
-      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
-
-      // t0 = [000a|aaaa|bbbb|bb00]
-      const __m256i t0 = _mm256_slli_epi16(in, 2);
-      // t1 = [000a|aaaa|0000|0000]
-      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
-      // t2 = [0000|0000|00bb|bbbb]
-      const __m256i t2 = _mm256_and_si256(in, v_003f);
-      // t3 = [000a|aaaa|00bb|bbbb]
-      const __m256i t3 = _mm256_or_si256(t1, t2);
-      // t4 = [110a|aaaa|10bb|bbbb]
-      const __m256i t4 = _mm256_or_si256(t3, v_c080);
 
-      // 2. merge ASCII and 2-byte codewords
-      const __m256i utf8_unpacked =
-          _mm256_blendv_epi8(t4, in, one_byte_bytemask);
-
-      // 3. prepare bitmask for 8-bit lookup
-      const uint32_t M0 = one_byte_bitmask & 0x55555555;
-      const uint32_t M1 = M0 >> 7;
-      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
-      // 4. pack the bytes
-
-      const uint8_t *row =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
-      const uint8_t *row_2 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
-                                                                       16)][0];
-
-      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
-      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
+    // no bits set above 7th bit
+    const __m128i one_byte_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_ff80), v_0000);
+    const uint16_t one_byte_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
 
-      const __m256i utf8_packed = _mm256_shuffle_epi8(
-          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
-      // 5. store bytes
-      _mm_storeu_si128((__m128i *)utf8_output,
-                       _mm256_castsi256_si128(utf8_packed));
-      utf8_output += row[0];
-      _mm_storeu_si128((__m128i *)utf8_output,
-                       _mm256_extractf128_si256(utf8_packed, 1));
-      utf8_output += row_2[0];
+    // no bits set above 11th bit
+    const __m128i one_or_two_bytes_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_0000);
+    const uint16_t one_or_two_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
 
-      // 6. adjust pointers
-      buf += 16;
+    if (one_or_two_bytes_bitmask == 0xffff) {
+      internal::westmere::write_v_u16_11bits_to_utf8(
+          in, utf8_output, one_byte_bytemask, one_byte_bitmask);
+      buf += 8;
       continue;
     }
+
     // 1. Check if there are any surrogate word in the input chunk.
     //    We have also deal with situation when there is a surrogate word
     //    at the end of a chunk.
-    const __m256i surrogates_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
+    const __m128i surrogates_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
 
     // bitmask = 0x0000 if there are no surrogates
     //         = 0xc000 if the last word is a surrogate
-    const uint32_t surrogates_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
     // It might seem like checking for surrogates_bitmask == 0xc000 could help.
     // However, it is likely an uncommon occurrence.
-    if (surrogates_bitmask == 0x00000000) {
+    if (surrogates_bitmask == 0x0000) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-      const __m256i dup_even = _mm256_setr_epi16(
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+      const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
+                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
       /* In this branch we handle three cases:
          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
@@ -26492,90 +39210,67 @@ avx2_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_output) {
        * t2 => [0ccc|cccc] [10cc|cccc]
        * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
        */
-#define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
+#define simdutf_vec(x) _mm_set1_epi16(static_cast<uint16_t>(x))
       // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-      const __m256i t0 = _mm256_shuffle_epi8(in, dup_even);
+      const __m128i t0 = _mm_shuffle_epi8(in, dup_even);
       // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
+      const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
       // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
+      const __m128i t2 = _mm_or_si128(t1, simdutf_vec(0b1000000000000000));
 
       // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-      const __m256i s0 = _mm256_srli_epi16(in, 4);
+      const __m128i s0 = _mm_srli_epi16(in, 4);
       // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
+      const __m128i s1 = _mm_and_si128(s0, simdutf_vec(0b0000111111111100));
       // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
+      const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
       // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
-      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
-                                             simdutf_vec(0b0100000000000000));
-      const __m256i s4 = _mm256_xor_si256(s3, m0);
+      const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
+      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask,
+                                          simdutf_vec(0b0100000000000000));
+      const __m128i s4 = _mm_xor_si128(s3, m0);
 #undef simdutf_vec
 
       // 4. expand code units 16-bit => 32-bit
-      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
-      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
+      const __m128i out0 = _mm_unpacklo_epi16(t2, s4);
+      const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
 
       // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
-                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
-      // Due to the wider registers, the following path is less likely to be
-      // useful.
-      /*if(mask == 0) {
+      const uint16_t mask =
+          (one_byte_bitmask & 0x5555) | (one_or_two_bytes_bitmask & 0xaaaa);
+      if (mask == 0) {
         // We only have three-byte code units. Use fast path.
-        const __m256i shuffle =
-      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
-      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
-      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
-      _mm256_shuffle_epi8(out1, shuffle);
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
+        const __m128i shuffle = _mm_setr_epi8(2, 3, 1, 6, 7, 5, 10, 11, 9, 14,
+                                              15, 13, -1, -1, -1, -1);
+        const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
+        const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output,
-      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output,
-      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
+        buf += 8;
         continue;
-      }*/
+      }
       const uint8_t mask0 = uint8_t(mask);
+
       const uint8_t *row0 =
           &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
       const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
-      const __m128i utf8_0 =
-          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
+      const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
 
       const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+
       const uint8_t *row1 =
           &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
       const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
-      const __m128i utf8_1 =
-          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
-
-      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
-      const uint8_t *row2 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
-      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
-      const __m128i utf8_2 =
-          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
-
-      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
-      const uint8_t *row3 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
-      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
-      const __m128i utf8_3 =
-          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
+      const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
 
       _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
       utf8_output += row0[0];
       _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
       utf8_output += row1[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
-      utf8_output += row2[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
-      utf8_output += row3[0];
-      buf += 16;
+
+      buf += 8;
       // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
@@ -26617,6 +39312,7 @@ avx2_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_output) {
       buf += k;
     }
   } // while
+
   return std::make_pair(buf, utf8_output);
 }
 
@@ -26629,120 +39325,92 @@ avx2_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_output) {
 */
 template <endianness big_endian>
 std::pair<result, char *>
-avx2_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
-                                       char *utf8_output) {
+sse_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
+                                      char *utf8_output) {
   const char16_t *start = buf;
   const char16_t *end = buf + len;
 
-  const __m256i v_0000 = _mm256_setzero_si256();
-  const __m256i v_f800 = _mm256_set1_epi16((int16_t)0xf800);
-  const __m256i v_d800 = _mm256_set1_epi16((int16_t)0xd800);
-  const __m256i v_c080 = _mm256_set1_epi16((int16_t)0xc080);
+  const __m128i v_0000 = _mm_setzero_si128();
+  const __m128i v_f800 = _mm_set1_epi16((int16_t)0xf800);
+  const __m128i v_d800 = _mm_set1_epi16((int16_t)0xd800);
   const size_t safety_margin =
       12; // to avoid overruns, see issue
           // https://github.com/simdutf/simdutf/issues/92
 
   while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
     if (big_endian) {
-      const __m256i swap = _mm256_setr_epi8(
-          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
-          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
-      in = _mm256_shuffle_epi8(in, swap);
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      in = _mm_shuffle_epi8(in, swap);
     }
     // a single 16-bit UTF-16 word can yield 1, 2 or 3 UTF-8 bytes
-    const __m256i v_ff80 = _mm256_set1_epi16((int16_t)0xff80);
-    if (_mm256_testz_si256(in, v_ff80)) { // ASCII fast path!!!!
-      // 1. pack the bytes
-      const __m128i utf8_packed = _mm_packus_epi16(
-          _mm256_castsi256_si128(in), _mm256_extractf128_si256(in, 1));
-      // 2. store (16 bytes)
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
-      // 3. adjust pointers
-      buf += 16;
-      utf8_output += 16;
-      continue; // we are done for this round!
+    const __m128i v_ff80 = _mm_set1_epi16((int16_t)0xff80);
+    if (_mm_testz_si128(in, v_ff80)) { // ASCII fast path!!!!
+      __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
+      if (big_endian) {
+        const __m128i swap =
+            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+        nextin = _mm_shuffle_epi8(nextin, swap);
+      }
+      if (!_mm_testz_si128(nextin, v_ff80)) {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        const __m128i utf8_packed = _mm_packus_epi16(in, in);
+        // 2. store (16 bytes)
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        in = nextin;
+      } else {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        const __m128i utf8_packed = _mm_packus_epi16(in, nextin);
+        // 2. store (16 bytes)
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 16;
+        utf8_output += 16;
+        continue; // we are done for this round!
+      }
     }
+
     // no bits set above 7th bit
-    const __m256i one_byte_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
+    const __m128i one_byte_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_ff80), v_0000);
+    const uint16_t one_byte_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
 
     // no bits set above 11th bit
-    const __m256i one_or_two_bytes_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_0000);
-    const uint32_t one_or_two_bytes_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
-    if (one_or_two_bytes_bitmask == 0xffffffff) {
-
-      // 1. prepare 2-byte values
-      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-      // expected output   : [110a|aaaa|10bb|bbbb] x 8
-      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
-      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
-
-      // t0 = [000a|aaaa|bbbb|bb00]
-      const __m256i t0 = _mm256_slli_epi16(in, 2);
-      // t1 = [000a|aaaa|0000|0000]
-      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
-      // t2 = [0000|0000|00bb|bbbb]
-      const __m256i t2 = _mm256_and_si256(in, v_003f);
-      // t3 = [000a|aaaa|00bb|bbbb]
-      const __m256i t3 = _mm256_or_si256(t1, t2);
-      // t4 = [110a|aaaa|10bb|bbbb]
-      const __m256i t4 = _mm256_or_si256(t3, v_c080);
-
-      // 2. merge ASCII and 2-byte codewords
-      const __m256i utf8_unpacked =
-          _mm256_blendv_epi8(t4, in, one_byte_bytemask);
-
-      // 3. prepare bitmask for 8-bit lookup
-      const uint32_t M0 = one_byte_bitmask & 0x55555555;
-      const uint32_t M1 = M0 >> 7;
-      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
-      // 4. pack the bytes
-
-      const uint8_t *row =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
-      const uint8_t *row_2 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
-                                                                       16)][0];
-
-      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
-      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
-
-      const __m256i utf8_packed = _mm256_shuffle_epi8(
-          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
-      // 5. store bytes
-      _mm_storeu_si128((__m128i *)utf8_output,
-                       _mm256_castsi256_si128(utf8_packed));
-      utf8_output += row[0];
-      _mm_storeu_si128((__m128i *)utf8_output,
-                       _mm256_extractf128_si256(utf8_packed, 1));
-      utf8_output += row_2[0];
+    const __m128i one_or_two_bytes_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_0000);
+    const uint16_t one_or_two_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
 
-      // 6. adjust pointers
-      buf += 16;
+    if (one_or_two_bytes_bitmask == 0xffff) {
+      internal::westmere::write_v_u16_11bits_to_utf8(
+          in, utf8_output, one_byte_bytemask, one_byte_bitmask);
+      buf += 8;
       continue;
     }
+
     // 1. Check if there are any surrogate word in the input chunk.
     //    We have also deal with situation when there is a surrogate word
     //    at the end of a chunk.
-    const __m256i surrogates_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
+    const __m128i surrogates_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
 
     // bitmask = 0x0000 if there are no surrogates
     //         = 0xc000 if the last word is a surrogate
-    const uint32_t surrogates_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
     // It might seem like checking for surrogates_bitmask == 0xc000 could help.
     // However, it is likely an uncommon occurrence.
-    if (surrogates_bitmask == 0x00000000) {
+    if (surrogates_bitmask == 0x0000) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-      const __m256i dup_even = _mm256_setr_epi16(
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+      const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
+                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
       /* In this branch we handle three cases:
          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
@@ -26771,90 +39439,67 @@ avx2_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
        * t2 => [0ccc|cccc] [10cc|cccc]
        * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
        */
-#define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
+#define simdutf_vec(x) _mm_set1_epi16(static_cast<uint16_t>(x))
       // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-      const __m256i t0 = _mm256_shuffle_epi8(in, dup_even);
+      const __m128i t0 = _mm_shuffle_epi8(in, dup_even);
       // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
+      const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
       // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
+      const __m128i t2 = _mm_or_si128(t1, simdutf_vec(0b1000000000000000));
 
       // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-      const __m256i s0 = _mm256_srli_epi16(in, 4);
+      const __m128i s0 = _mm_srli_epi16(in, 4);
       // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
+      const __m128i s1 = _mm_and_si128(s0, simdutf_vec(0b0000111111111100));
       // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
+      const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
       // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
-      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
-                                             simdutf_vec(0b0100000000000000));
-      const __m256i s4 = _mm256_xor_si256(s3, m0);
+      const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
+      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask,
+                                          simdutf_vec(0b0100000000000000));
+      const __m128i s4 = _mm_xor_si128(s3, m0);
 #undef simdutf_vec
 
       // 4. expand code units 16-bit => 32-bit
-      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
-      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
+      const __m128i out0 = _mm_unpacklo_epi16(t2, s4);
+      const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
 
       // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
-                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
-      // Due to the wider registers, the following path is less likely to be
-      // useful.
-      /*if(mask == 0) {
+      const uint16_t mask =
+          (one_byte_bitmask & 0x5555) | (one_or_two_bytes_bitmask & 0xaaaa);
+      if (mask == 0) {
         // We only have three-byte code units. Use fast path.
-        const __m256i shuffle =
-      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
-      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
-      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
-      _mm256_shuffle_epi8(out1, shuffle);
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
+        const __m128i shuffle = _mm_setr_epi8(2, 3, 1, 6, 7, 5, 10, 11, 9, 14,
+                                              15, 13, -1, -1, -1, -1);
+        const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
+        const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output,
-      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output,
-      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
+        buf += 8;
         continue;
-      }*/
+      }
       const uint8_t mask0 = uint8_t(mask);
+
       const uint8_t *row0 =
           &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
       const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
-      const __m128i utf8_0 =
-          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
+      const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
 
       const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+
       const uint8_t *row1 =
           &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
       const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
-      const __m128i utf8_1 =
-          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
-
-      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
-      const uint8_t *row2 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
-      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
-      const __m128i utf8_2 =
-          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
-
-      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
-      const uint8_t *row3 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
-      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
-      const __m128i utf8_3 =
-          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
+      const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
 
       _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
       utf8_output += row0[0];
       _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
       utf8_output += row1[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
-      utf8_output += row2[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
-      utf8_output += row3[0];
-      buf += 16;
+
+      buf += 8;
       // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
@@ -26898,10 +39543,11 @@ avx2_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
       buf += k;
     }
   } // while
+
   return std::make_pair(result(error_code::SUCCESS, buf - start), utf8_output);
 }
-/* end file src/haswell/avx2_convert_utf16_to_utf8.cpp */
-/* begin file src/haswell/avx2_convert_utf16_to_utf32.cpp */
+/* end file src/westmere/sse_convert_utf16_to_utf8.cpp */
+/* begin file src/westmere/sse_convert_utf16_to_utf32.cpp */
 /*
     The vectorized algorithm works on single SSE register i.e., it
     loads eight 16-bit code units.
@@ -26910,14 +39556,14 @@ avx2_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
     1. an input register contains no surrogates and each value
        is in range 0x0000 .. 0x07ff.
     2. an input register contains no surrogates and values are
-       in range 0x0000 .. 0xffff.
+       is in range 0x0000 .. 0xffff.
     3. an input register contains surrogates --- i.e. codepoints
        can have 16 or 32 bits.
 
     Ad 1.
 
     When values are less than 0x0800, it means that a 16-bit code unit
-    can be converted into: 1) single UTF8 byte (when it is an ASCII
+    can be converted into: 1) single UTF8 byte (when it's an ASCII
     char) or 2) two UTF8 bytes.
 
     For this case we do only some shuffle to obtain these 2-byte
@@ -26952,48 +39598,47 @@ avx2_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
 */
 
 /*
-  Returns a pair: the first unprocessed byte from buf and utf32_output
+  Returns a pair: the first unprocessed byte from buf and utf32_output
   A scalar routing should carry on the conversion of the tail.
 */
 template <endianness big_endian>
 std::pair<const char16_t *, char32_t *>
-avx2_convert_utf16_to_utf32(const char16_t *buf, size_t len,
-                            char32_t *utf32_output) {
+sse_convert_utf16_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_output) {
   const char16_t *end = buf + len;
-  const __m256i v_f800 = _mm256_set1_epi16((int16_t)0xf800);
-  const __m256i v_d800 = _mm256_set1_epi16((int16_t)0xd800);
 
-  while (end - buf >= 16) {
-    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+  const __m128i v_f800 = _mm_set1_epi16((int16_t)0xf800);
+  const __m128i v_d800 = _mm_set1_epi16((int16_t)0xd800);
+
+  while (end - buf >= 8) {
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
+
     if (big_endian) {
-      const __m256i swap = _mm256_setr_epi8(
-          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
-          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
-      in = _mm256_shuffle_epi8(in, swap);
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      in = _mm_shuffle_epi8(in, swap);
     }
 
     // 1. Check if there are any surrogate word in the input chunk.
     //    We have also deal with situation when there is a surrogate word
     //    at the end of a chunk.
-    const __m256i surrogates_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
+    const __m128i surrogates_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
 
     // bitmask = 0x0000 if there are no surrogates
     //         = 0xc000 if the last word is a surrogate
-    const uint32_t surrogates_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
     // It might seem like checking for surrogates_bitmask == 0xc000 could help.
     // However, it is likely an uncommon occurrence.
-    if (surrogates_bitmask == 0x00000000) {
-      // case: we extend all sixteen 16-bit code units to sixteen 32-bit code
-      // units
-      _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output),
-                          _mm256_cvtepu16_epi32(_mm256_castsi256_si128(in)));
-      _mm256_storeu_si256(
-          reinterpret_cast<__m256i *>(utf32_output + 8),
-          _mm256_cvtepu16_epi32(_mm256_extractf128_si256(in, 1)));
-      utf32_output += 16;
-      buf += 16;
+    if (surrogates_bitmask == 0x0000) {
+      // case: no surrogate pair, extend 16-bit code units to 32-bit code units
+      _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
+                       _mm_cvtepu16_epi32(in));
+      _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
+                       _mm_cvtepu16_epi32(_mm_srli_si128(in, 8)));
+      utf32_output += 8;
+      buf += 8;
       // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
@@ -27007,7 +39652,6 @@ avx2_convert_utf16_to_utf32(const char16_t *buf, size_t len,
       for (; k < forward; k++) {
         uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
         if ((word & 0xF800) != 0xD800) {
-          // No surrogate pair
           *utf32_output++ = char32_t(word);
         } else {
           // must be a surrogate pair
@@ -27038,44 +39682,43 @@ avx2_convert_utf16_to_utf32(const char16_t *buf, size_t len,
 */
 template <endianness big_endian>
 std::pair<result, char32_t *>
-avx2_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
-                                        char32_t *utf32_output) {
+sse_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
+                                       char32_t *utf32_output) {
   const char16_t *start = buf;
   const char16_t *end = buf + len;
-  const __m256i v_f800 = _mm256_set1_epi16((int16_t)0xf800);
-  const __m256i v_d800 = _mm256_set1_epi16((int16_t)0xd800);
 
-  while (end - buf >= 16) {
-    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+  const __m128i v_f800 = _mm_set1_epi16((int16_t)0xf800);
+  const __m128i v_d800 = _mm_set1_epi16((int16_t)0xd800);
+
+  while (end - buf >= 8) {
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
+
     if (big_endian) {
-      const __m256i swap = _mm256_setr_epi8(
-          1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18,
-          21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 31, 30);
-      in = _mm256_shuffle_epi8(in, swap);
+      const __m128i swap =
+          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
+      in = _mm_shuffle_epi8(in, swap);
     }
 
     // 1. Check if there are any surrogate word in the input chunk.
     //    We have also deal with situation when there is a surrogate word
     //    at the end of a chunk.
-    const __m256i surrogates_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in, v_f800), v_d800);
+    const __m128i surrogates_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
 
     // bitmask = 0x0000 if there are no surrogates
     //         = 0xc000 if the last word is a surrogate
-    const uint32_t surrogates_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(surrogates_bytemask));
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
     // It might seem like checking for surrogates_bitmask == 0xc000 could help.
     // However, it is likely an uncommon occurrence.
-    if (surrogates_bitmask == 0x00000000) {
-      // case: we extend all sixteen 16-bit code units to sixteen 32-bit code
-      // units
-      _mm256_storeu_si256(reinterpret_cast<__m256i *>(utf32_output),
-                          _mm256_cvtepu16_epi32(_mm256_castsi256_si128(in)));
-      _mm256_storeu_si256(
-          reinterpret_cast<__m256i *>(utf32_output + 8),
-          _mm256_cvtepu16_epi32(_mm256_extractf128_si256(in, 1)));
-      utf32_output += 16;
-      buf += 16;
+    if (surrogates_bitmask == 0x0000) {
+      // case: no surrogate pair, extend 16-bit code units to 32-bit code units
+      _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
+                       _mm_cvtepu16_epi32(in));
+      _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
+                       _mm_cvtepu16_epi32(_mm_srli_si128(in, 8)));
+      utf32_output += 8;
+      buf += 8;
       // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
@@ -27089,7 +39732,6 @@ avx2_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
       for (; k < forward; k++) {
         uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
         if ((word & 0xF800) != 0xD800) {
-          // No surrogate pair
           *utf32_output++ = char32_t(word);
         } else {
           // must be a surrogate pair
@@ -27112,229 +39754,290 @@ avx2_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
   } // while
   return std::make_pair(result(error_code::SUCCESS, buf - start), utf32_output);
 }
-/* end file src/haswell/avx2_convert_utf16_to_utf32.cpp */
+/* end file src/westmere/sse_convert_utf16_to_utf32.cpp */
 
-/* begin file src/haswell/avx2_convert_utf32_to_latin1.cpp */
+/* begin file src/westmere/sse_convert_utf32_to_latin1.cpp */
 std::pair<const char32_t *, char *>
-avx2_convert_utf32_to_latin1(const char32_t *buf, size_t len,
-                             char *latin1_output) {
-  const size_t rounded_len =
-      len & ~0x1F; // Round down to nearest multiple of 32
-
-  __m256i high_bytes_mask = _mm256_set1_epi32(0xFFFFFF00);
+sse_convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                            char *latin1_output) {
+  const size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 16
 
-  __m256i shufmask = _mm256_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-                                     -1, 12, 8, 4, 0, -1, -1, -1, -1, -1, -1,
-                                     -1, -1, -1, -1, -1, -1, 12, 8, 4, 0);
+  __m128i high_bytes_mask = _mm_set1_epi32(0xFFFFFF00);
+  __m128i shufmask =
+      _mm_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 12, 8, 4, 0);
 
   for (size_t i = 0; i < rounded_len; i += 16) {
-    __m256i in1 = _mm256_loadu_si256((__m256i *)buf);
-    __m256i in2 = _mm256_loadu_si256((__m256i *)(buf + 8));
+    __m128i in1 = _mm_loadu_si128((__m128i *)buf);
+    __m128i in2 = _mm_loadu_si128((__m128i *)(buf + 4));
+    __m128i in3 = _mm_loadu_si128((__m128i *)(buf + 8));
+    __m128i in4 = _mm_loadu_si128((__m128i *)(buf + 12));
 
-    __m256i check_combined = _mm256_or_si256(in1, in2);
+    __m128i check_combined = _mm_or_si128(in1, in2);
+    check_combined = _mm_or_si128(check_combined, in3);
+    check_combined = _mm_or_si128(check_combined, in4);
 
-    if (!_mm256_testz_si256(check_combined, high_bytes_mask)) {
+    if (!_mm_testz_si128(check_combined, high_bytes_mask)) {
       return std::make_pair(nullptr, latin1_output);
     }
-
-    // Turn UTF32 bytes into latin 1 bytes
-    __m256i shuffled1 = _mm256_shuffle_epi8(in1, shufmask);
-    __m256i shuffled2 = _mm256_shuffle_epi8(in2, shufmask);
-
-    // move Latin1 bytes to their correct spot
-    __m256i idx1 = _mm256_set_epi32(-1, -1, -1, -1, -1, -1, 4, 0);
-    __m256i idx2 = _mm256_set_epi32(-1, -1, -1, -1, 4, 0, -1, -1);
-    __m256i reshuffled1 = _mm256_permutevar8x32_epi32(shuffled1, idx1);
-    __m256i reshuffled2 = _mm256_permutevar8x32_epi32(shuffled2, idx2);
-
-    __m256i result = _mm256_or_si256(reshuffled1, reshuffled2);
-    _mm_storeu_si128((__m128i *)latin1_output, _mm256_castsi256_si128(result));
-
+    __m128i pack1 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in1, shufmask),
+                                       _mm_shuffle_epi8(in2, shufmask));
+    __m128i pack2 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in3, shufmask),
+                                       _mm_shuffle_epi8(in4, shufmask));
+    __m128i pack = _mm_unpacklo_epi64(pack1, pack2);
+    _mm_storeu_si128((__m128i *)latin1_output, pack);
     latin1_output += 16;
     buf += 16;
   }
 
   return std::make_pair(buf, latin1_output);
 }
-std::pair<result, char *>
-avx2_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
-                                         char *latin1_output) {
-  const size_t rounded_len =
-      len & ~0x1F; // Round down to nearest multiple of 32
 
-  __m256i high_bytes_mask = _mm256_set1_epi32(0xFFFFFF00);
-  __m256i shufmask = _mm256_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-                                     -1, 12, 8, 4, 0, -1, -1, -1, -1, -1, -1,
-                                     -1, -1, -1, -1, -1, -1, 12, 8, 4, 0);
+std::pair<result, char *>
+sse_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                        char *latin1_output) {
+  const char32_t *start = buf;
+  const size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 16
 
-  const char32_t *start = buf;
+  __m128i high_bytes_mask = _mm_set1_epi32(0xFFFFFF00);
+  __m128i shufmask =
+      _mm_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 12, 8, 4, 0);
 
   for (size_t i = 0; i < rounded_len; i += 16) {
-    __m256i in1 = _mm256_loadu_si256((__m256i *)buf);
-    __m256i in2 = _mm256_loadu_si256((__m256i *)(buf + 8));
+    __m128i in1 = _mm_loadu_si128((__m128i *)buf);
+    __m128i in2 = _mm_loadu_si128((__m128i *)(buf + 4));
+    __m128i in3 = _mm_loadu_si128((__m128i *)(buf + 8));
+    __m128i in4 = _mm_loadu_si128((__m128i *)(buf + 12));
 
-    __m256i check_combined = _mm256_or_si256(in1, in2);
+    __m128i check_combined = _mm_or_si128(in1, in2);
+    check_combined = _mm_or_si128(check_combined, in3);
+    check_combined = _mm_or_si128(check_combined, in4);
 
-    if (!_mm256_testz_si256(check_combined, high_bytes_mask)) {
+    if (!_mm_testz_si128(check_combined, high_bytes_mask)) {
       // Fallback to scalar code for handling errors
-      for (int k = 0; k < 8; k++) {
+      for (int k = 0; k < 16; k++) {
         char32_t codepoint = buf[k];
-        if (codepoint <= 0xFF) {
-          *latin1_output++ = static_cast<char>(codepoint);
+        if (codepoint <= 0xff) {
+          *latin1_output++ = char(codepoint);
         } else {
           return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
                                 latin1_output);
         }
       }
-      buf += 8;
-    } else {
-      __m256i shuffled1 = _mm256_shuffle_epi8(in1, shufmask);
-      __m256i shuffled2 = _mm256_shuffle_epi8(in2, shufmask);
-
-      __m256i idx1 = _mm256_set_epi32(-1, -1, -1, -1, -1, -1, 4, 0);
-      __m256i idx2 = _mm256_set_epi32(-1, -1, -1, -1, 4, 0, -1, -1);
-      __m256i reshuffled1 = _mm256_permutevar8x32_epi32(shuffled1, idx1);
-      __m256i reshuffled2 = _mm256_permutevar8x32_epi32(shuffled2, idx2);
-
-      __m256i result = _mm256_or_si256(reshuffled1, reshuffled2);
-      _mm_storeu_si128((__m128i *)latin1_output,
-                       _mm256_castsi256_si128(result));
-
-      latin1_output += 16;
       buf += 16;
+      continue;
     }
+    __m128i pack1 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in1, shufmask),
+                                       _mm_shuffle_epi8(in2, shufmask));
+    __m128i pack2 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in3, shufmask),
+                                       _mm_shuffle_epi8(in4, shufmask));
+    __m128i pack = _mm_unpacklo_epi64(pack1, pack2);
+    _mm_storeu_si128((__m128i *)latin1_output, pack);
+    latin1_output += 16;
+    buf += 16;
   }
 
   return std::make_pair(result(error_code::SUCCESS, buf - start),
                         latin1_output);
 }
-/* end file src/haswell/avx2_convert_utf32_to_latin1.cpp */
-/* begin file src/haswell/avx2_convert_utf32_to_utf8.cpp */
+/* end file src/westmere/sse_convert_utf32_to_latin1.cpp */
+/* begin file src/westmere/sse_convert_utf32_to_utf8.cpp */
 std::pair<const char32_t *, char *>
-avx2_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_output) {
+sse_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_output) {
   const char32_t *end = buf + len;
-  const __m256i v_0000 = _mm256_setzero_si256();
-  const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
-  const __m256i v_ff80 = _mm256_set1_epi16((uint16_t)0xff80);
-  const __m256i v_f800 = _mm256_set1_epi16((uint16_t)0xf800);
-  const __m256i v_c080 = _mm256_set1_epi16((uint16_t)0xc080);
-  const __m256i v_7fffffff = _mm256_set1_epi32((uint32_t)0x7fffffff);
-  __m256i running_max = _mm256_setzero_si256();
-  __m256i forbidden_bytemask = _mm256_setzero_si256();
 
+  const __m128i v_0000 = _mm_setzero_si128();              //__m128 = 128 bits
+  const __m128i v_f800 = _mm_set1_epi16((uint16_t)0xf800); // 1111 1000 0000
+                                                           // 0000
+  const __m128i v_c080 = _mm_set1_epi16((uint16_t)0xc080); // 1100 0000 1000
+                                                           // 0000
+  const __m128i v_ff80 = _mm_set1_epi16((uint16_t)0xff80); // 1111 1111 1000
+                                                           // 0000
+  const __m128i v_ffff0000 = _mm_set1_epi32(
+      (uint32_t)0xffff0000); // 1111 1111 1111 1111 0000 0000 0000 0000
+  const __m128i v_7fffffff = _mm_set1_epi32(
+      (uint32_t)0x7fffffff); // 0111 1111 1111 1111 1111 1111 1111 1111
+  __m128i running_max = _mm_setzero_si128();
+  __m128i forbidden_bytemask = _mm_setzero_si128();
   const size_t safety_margin =
       12; // to avoid overruns, see issue
           // https://github.com/simdutf/simdutf/issues/92
 
-  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i *)buf);
-    __m256i nextin = _mm256_loadu_si256((__m256i *)buf + 1);
-    running_max = _mm256_max_epu32(_mm256_max_epu32(in, running_max), nextin);
+  while (end - buf >=
+         std::ptrdiff_t(
+             16 + safety_margin)) { // buf is a char32_t pointer, each char32_t
+                                    // has 4 bytes or 32 bits, thus buf + 16 *
+                                    // char32_t = 512 bits = 64 bytes
+    // We load two 16-byte registers for a total of 32 bytes or 8 characters.
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
+    __m128i nextin = _mm_loadu_si128(
+        (__m128i *)buf + 1); // These two values can hold only 8 UTF32 chars
+    running_max = _mm_max_epu32(
+        _mm_max_epu32(in, running_max), // take element-wise max char32_t from
+                                        // in and running_max vector
+        nextin); // and take element-wise max element from nextin and
+                 // running_max vector
 
     // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
     // saturation
-    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff),
-                                        _mm256_and_si256(nextin, v_7fffffff));
-    in_16 = _mm256_permute4x64_epi64(in_16, 0b11011000);
+    __m128i in_16 = _mm_packus_epi32(
+        _mm_and_si128(in, v_7fffffff),
+        _mm_and_si128(
+            nextin,
+            v_7fffffff)); // pack the two __m128i into a single __m128i
+    // By ensuring the highest bit is set to 0 (& v_7fffffff), we make sure
+    // all values are interpreted as non-negative by _mm_packus_epi32, which
+    // treats its inputs as signed. Remember: a leading 0 bit means a
+    // non-negative number in two's complement. Valid Unicode code points are
+    // well beneath the range where the masking changes the value, so
+    // that's OK.
 
-    // Try to apply UTF-16 => UTF-8 routine on 256 bits
-    // (haswell/avx2_convert_utf16_to_utf8.cpp)
+    // Try to apply UTF-16 => UTF-8 from ./sse_convert_utf16_to_utf8.cpp
 
-    if (_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
-      // 1. pack the bytes
-      const __m128i utf8_packed = _mm_packus_epi16(
-          _mm256_castsi256_si128(in_16), _mm256_extractf128_si256(in_16, 1));
-      // 2. store (16 bytes)
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
-      // 3. adjust pointers
-      buf += 16;
-      utf8_output += 16;
-      continue; // we are done for this round!
+    // Check for ASCII fast path
+
+    // ASCII fast path!!!!
+    // We eagerly load another 32 bytes, hoping that they will be ASCII too.
+    // The intuition is that we try to collect 16 ASCII characters which
+    // requires a total of 64 bytes of input. If we fail, we just pass thirdin
+    // and fourthin as our new inputs.
+    if (_mm_testz_si128(in_16, v_ff80)) { // if the first two blocks are ASCII
+      __m128i thirdin = _mm_loadu_si128((__m128i *)buf + 2);
+      __m128i fourthin = _mm_loadu_si128((__m128i *)buf + 3);
+      running_max = _mm_max_epu32(
+          _mm_max_epu32(thirdin, running_max),
+          fourthin); // take the running max of all 4 vectors thus far
+      __m128i nextin_16 = _mm_packus_epi32(
+          _mm_and_si128(thirdin, v_7fffffff),
+          _mm_and_si128(fourthin,
+                        v_7fffffff)); // pack into 1 vector, now you have two
+      if (!_mm_testz_si128(
+              nextin_16,
+              v_ff80)) { // checks if the second packed vector is ASCII, if not:
+        // 1. pack the bytes
+        // obviously suboptimal.
+        const __m128i utf8_packed = _mm_packus_epi16(
+            in_16, in_16); // creates two copies of in_16 in 1 vector
+        // 2. store (16 bytes)
+        _mm_storeu_si128((__m128i *)utf8_output,
+                         utf8_packed); // put them into the output
+        // 3. adjust pointers
+        buf += 8; // the char32_t buffer pointer goes up 8 char32_t chars * 32
+                  // bits = 256 bits
+        utf8_output +=
+            8; // same for the output: advance past the first two blocks.
+        // Proceed with next input
+        in_16 = nextin_16;
+        // We need to update in and nextin because they are used later.
+        in = thirdin;
+        nextin = fourthin;
+      } else {
+        // 1. pack the bytes
+        const __m128i utf8_packed = _mm_packus_epi16(in_16, nextin_16);
+        // 2. store (16 bytes)
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+        // 3. adjust pointers
+        buf += 16;
+        utf8_output += 16;
+        continue; // we are done for this round!
+      }
     }
-    // no bits set above 7th bit
-    const __m256i one_byte_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
+
+    // no bits set above 7th bit -- find out all the ASCII characters
+    const __m128i one_byte_bytemask =
+        _mm_cmpeq_epi16( // this takes two bytes at a time and compares:
+            _mm_and_si128(in_16, v_ff80), // the vector that keeps only the top
+                                          // 9 bits of each 16-bit/2-byte unit
+            v_0000                        //
+        ); // they should be all zero if they are ASCII, i.e. ASCII in a 16-bit
+           // unit is of the form 0000 0000 0XXX XXXX
+    // _mm_cmpeq_epi16 should now return 1111 1111 1111 1111 for equal, and
+    // 0000 0000 0000 0000 otherwise, for each 16-bit/2-byte unit
+    const uint16_t one_byte_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(
+        one_byte_bytemask)); // collect the MSB of each byte of the previous
+                             // vector and put them into a uint16_t mask
 
     // no bits set above 11th bit
-    const __m256i one_or_two_bytes_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
-    const uint32_t one_or_two_bytes_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
-    if (one_or_two_bytes_bitmask == 0xffffffff) {
+    const __m128i one_or_two_bytes_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_0000);
+    const uint16_t one_or_two_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
+
+    if (one_or_two_bytes_bitmask == 0xffff) {
+      // case: all code units either produce 1 or 2 UTF-8 bytes (at least one
+      // produces 2 bytes)
       // 1. prepare 2-byte values
       // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
       // expected output   : [110a|aaaa|10bb|bbbb] x 8
-      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
-      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
+      const __m128i v_1f00 =
+          _mm_set1_epi16((int16_t)0x1f00); // 0001 1111 0000 0000
+      const __m128i v_003f =
+          _mm_set1_epi16((int16_t)0x003f); // 0000 0000 0011 1111
 
       // t0 = [000a|aaaa|bbbb|bb00]
-      const __m256i t0 = _mm256_slli_epi16(in_16, 2);
+      const __m128i t0 = _mm_slli_epi16(in_16, 2); // shift packed vector by two
       // t1 = [000a|aaaa|0000|0000]
-      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
+      const __m128i t1 =
+          _mm_and_si128(t0, v_1f00); // potential first utf8 byte
       // t2 = [0000|0000|00bb|bbbb]
-      const __m256i t2 = _mm256_and_si256(in_16, v_003f);
+      const __m128i t2 =
+          _mm_and_si128(in_16, v_003f); // potential second utf8 byte
       // t3 = [000a|aaaa|00bb|bbbb]
-      const __m256i t3 = _mm256_or_si256(t1, t2);
+      const __m128i t3 =
+          _mm_or_si128(t1, t2); // first and second potential utf8 byte together
       // t4 = [110a|aaaa|10bb|bbbb]
-      const __m256i t4 = _mm256_or_si256(t3, v_c080);
+      const __m128i t4 = _mm_or_si128(
+          t3,
+          v_c080); // t3 | 1100 0000 1000 0000 = full potential 2-byte utf8 unit
 
       // 2. merge ASCII and 2-byte codewords
-      const __m256i utf8_unpacked =
-          _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
+      const __m128i utf8_unpacked =
+          _mm_blendv_epi8(t4, in_16, one_byte_bytemask);
 
       // 3. prepare bitmask for 8-bit lookup
-      const uint32_t M0 = one_byte_bitmask & 0x55555555;
-      const uint32_t M1 = M0 >> 7;
-      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
+      //    one_byte_bitmask = hhggffeeddccbbaa -- the bits are doubled (h -
+      //    MSB, a - LSB)
+      const uint16_t m0 = one_byte_bitmask & 0x5555; // m0 = 0h0g0f0e0d0c0b0a
+      const uint16_t m1 =
+          static_cast<uint16_t>(m0 >> 7); // m1 = 00000000h0g0f0e0
+      const uint8_t m2 =
+          static_cast<uint8_t>((m0 | m1) & 0xff); // m2 =         hdgcfbea
       // 4. pack the bytes
-
       const uint8_t *row =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
-      const uint8_t *row_2 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
-                                                                       16)][0];
-
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
       const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
-      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
+      const __m128i utf8_packed = _mm_shuffle_epi8(utf8_unpacked, shuffle);
 
-      const __m256i utf8_packed = _mm256_shuffle_epi8(
-          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
       // 5. store bytes
-      _mm_storeu_si128((__m128i *)utf8_output,
-                       _mm256_castsi256_si128(utf8_packed));
-      utf8_output += row[0];
-      _mm_storeu_si128((__m128i *)utf8_output,
-                       _mm256_extractf128_si256(utf8_packed, 1));
-      utf8_output += row_2[0];
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
 
       // 6. adjust pointers
-      buf += 16;
+      buf += 8;
+      utf8_output += row[0];
       continue;
     }
-    // Must check for overflow in packing
-    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(
-        _mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
+
+    // Check for overflow in packing
+
+    const __m128i saturation_bytemask = _mm_cmpeq_epi32(
+        _mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
     const uint32_t saturation_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
-    if (saturation_bitmask == 0xffffffff) {
+        static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
+    if (saturation_bitmask == 0xffff) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-      const __m256i v_d800 = _mm256_set1_epi16((uint16_t)0xd800);
-      forbidden_bytemask = _mm256_or_si256(
-          forbidden_bytemask,
-          _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800));
+      const __m128i v_d800 = _mm_set1_epi16((uint16_t)0xd800);
+      forbidden_bytemask =
+          _mm_or_si128(forbidden_bytemask,
+                       _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_d800));
 
-      const __m256i dup_even = _mm256_setr_epi16(
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+      const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
+                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
       /* In this branch we handle three cases:
-        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
         single UFT-8 byte
-        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
-        UTF-8 bytes
-        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+        two UTF-8 bytes
+          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
         three UTF-8 bytes
 
         We expand the input word (16-bit) into two code units (32-bit), thus
@@ -27356,95 +40059,72 @@ avx2_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_output) {
        * t2 => [0ccc|cccc] [10cc|cccc]
        * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
        */
-#define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
+#define simdutf_vec(x) _mm_set1_epi16(static_cast<uint16_t>(x))
       // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-      const __m256i t0 = _mm256_shuffle_epi8(in_16, dup_even);
+      const __m128i t0 = _mm_shuffle_epi8(in_16, dup_even);
       // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
+      const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
       // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
+      const __m128i t2 = _mm_or_si128(t1, simdutf_vec(0b1000000000000000));
 
       // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-      const __m256i s0 = _mm256_srli_epi16(in_16, 4);
+      const __m128i s0 = _mm_srli_epi16(in_16, 4);
       // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
+      const __m128i s1 = _mm_and_si128(s0, simdutf_vec(0b0000111111111100));
       // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
+      const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
       // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
-      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
-                                             simdutf_vec(0b0100000000000000));
-      const __m256i s4 = _mm256_xor_si256(s3, m0);
+      const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
+      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask,
+                                          simdutf_vec(0b0100000000000000));
+      const __m128i s4 = _mm_xor_si128(s3, m0);
 #undef simdutf_vec
 
       // 4. expand code units 16-bit => 32-bit
-      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
-      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
+      const __m128i out0 = _mm_unpacklo_epi16(t2, s4);
+      const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
 
       // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
-                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
-      // Due to the wider registers, the following path is less likely to be
-      // useful.
-      /*if(mask == 0) {
+      const uint16_t mask =
+          (one_byte_bitmask & 0x5555) | (one_or_two_bytes_bitmask & 0xaaaa);
+      if (mask == 0) {
         // We only have three-byte code units. Use fast path.
-        const __m256i shuffle =
-      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
-      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
-      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
-      _mm256_shuffle_epi8(out1, shuffle);
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
+        const __m128i shuffle = _mm_setr_epi8(2, 3, 1, 6, 7, 5, 10, 11, 9, 14,
+                                              15, 13, -1, -1, -1, -1);
+        const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
+        const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output,
-      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output,
-      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
+        buf += 8;
         continue;
-      }*/
+      }
       const uint8_t mask0 = uint8_t(mask);
+
       const uint8_t *row0 =
           &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
       const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
-      const __m128i utf8_0 =
-          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
+      const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
 
       const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+
       const uint8_t *row1 =
           &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
       const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
-      const __m128i utf8_1 =
-          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
-
-      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
-      const uint8_t *row2 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
-      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
-      const __m128i utf8_2 =
-          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
-
-      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
-      const uint8_t *row3 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
-      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
-      const __m128i utf8_3 =
-          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
+      const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
 
       _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
       utf8_output += row0[0];
       _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
       utf8_output += row1[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
-      utf8_output += row2[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
-      utf8_output += row3[0];
-      buf += 16;
+
+      buf += 8;
     } else {
-      // case: at least one 32-bit word is larger than 0xFFFF <=> it will
-      // produce four UTF-8 bytes. Let us do a scalar fallback. It may seem
-      // wasteful to use scalar code, but being efficient with SIMD may require
-      // large, non-trivial tables?
+      // case: at least one 32-bit word produces a surrogate pair in UTF-16 <=>
+      // it will produce four UTF-8 bytes. Let us do a scalar fallback. It may
+      // seem wasteful to use scalar code, but being efficient with SIMD in the
+      // presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
       if (size_t(end - buf) < forward + 1) {
@@ -27452,19 +40132,19 @@ avx2_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_output) {
       }
       for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if ((word & 0xFFFFFF80) == 0) { // 1-byte (ASCII)
+        if ((word & 0xFFFFFF80) == 0) {
           *utf8_output++ = char(word);
-        } else if ((word & 0xFFFFF800) == 0) { // 2-byte
+        } else if ((word & 0xFFFFF800) == 0) {
           *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if ((word & 0xFFFF0000) == 0) { // 3-byte
+        } else if ((word & 0xFFFF0000) == 0) {
           if (word >= 0xD800 && word <= 0xDFFF) {
             return std::make_pair(nullptr, utf8_output);
           }
           *utf8_output++ = char((word >> 12) | 0b11100000);
           *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else { // 4-byte
+        } else {
           if (word > 0x10FFFF) {
             return std::make_pair(nullptr, utf8_output);
           }
@@ -27479,13 +40159,13 @@ avx2_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_output) {
   } // while
 
   // check for invalid input
-  const __m256i v_10ffff = _mm256_set1_epi32((uint32_t)0x10ffff);
-  if (static_cast<uint32_t>(_mm256_movemask_epi8(_mm256_cmpeq_epi32(
-          _mm256_max_epu32(running_max, v_10ffff), v_10ffff))) != 0xffffffff) {
+  const __m128i v_10ffff = _mm_set1_epi32((uint32_t)0x10ffff);
+  if (static_cast<uint16_t>(_mm_movemask_epi8(_mm_cmpeq_epi32(
+          _mm_max_epu32(running_max, v_10ffff), v_10ffff))) != 0xffff) {
     return std::make_pair(nullptr, utf8_output);
   }
 
-  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) {
+  if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) {
     return std::make_pair(nullptr, utf8_output);
   }
 
@@ -27493,145 +40173,141 @@ avx2_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_output) {
 }
 
 std::pair<result, char *>
-avx2_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
-                                       char *utf8_output) {
+sse_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
+                                      char *utf8_output) {
   const char32_t *end = buf + len;
   const char32_t *start = buf;
 
-  const __m256i v_0000 = _mm256_setzero_si256();
-  const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
-  const __m256i v_ff80 = _mm256_set1_epi16((uint16_t)0xff80);
-  const __m256i v_f800 = _mm256_set1_epi16((uint16_t)0xf800);
-  const __m256i v_c080 = _mm256_set1_epi16((uint16_t)0xc080);
-  const __m256i v_7fffffff = _mm256_set1_epi32((uint32_t)0x7fffffff);
-  const __m256i v_10ffff = _mm256_set1_epi32((uint32_t)0x10ffff);
+  const __m128i v_0000 = _mm_setzero_si128();
+  const __m128i v_f800 = _mm_set1_epi16((uint16_t)0xf800);
+  const __m128i v_c080 = _mm_set1_epi16((uint16_t)0xc080);
+  const __m128i v_ff80 = _mm_set1_epi16((uint16_t)0xff80);
+  const __m128i v_ffff0000 = _mm_set1_epi32((uint32_t)0xffff0000);
+  const __m128i v_7fffffff = _mm_set1_epi32((uint32_t)0x7fffffff);
+  const __m128i v_10ffff = _mm_set1_epi32((uint32_t)0x10ffff);
 
   const size_t safety_margin =
       12; // to avoid overruns, see issue
           // https://github.com/simdutf/simdutf/issues/92
 
   while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i *)buf);
-    __m256i nextin = _mm256_loadu_si256((__m256i *)buf + 1);
+    // We load two 16-byte registers for a total of 32 bytes or 8 characters.
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
+    __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
     // Check for too large input
-    const __m256i max_input =
-        _mm256_max_epu32(_mm256_max_epu32(in, nextin), v_10ffff);
-    if (static_cast<uint32_t>(_mm256_movemask_epi8(
-            _mm256_cmpeq_epi32(max_input, v_10ffff))) != 0xffffffff) {
+    __m128i max_input = _mm_max_epu32(_mm_max_epu32(in, nextin), v_10ffff);
+    if (static_cast<uint16_t>(_mm_movemask_epi8(
+            _mm_cmpeq_epi32(max_input, v_10ffff))) != 0xffff) {
       return std::make_pair(result(error_code::TOO_LARGE, buf - start),
                             utf8_output);
     }
 
     // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
     // saturation
-    __m256i in_16 = _mm256_packus_epi32(_mm256_and_si256(in, v_7fffffff),
-                                        _mm256_and_si256(nextin, v_7fffffff));
-    in_16 = _mm256_permute4x64_epi64(in_16, 0b11011000);
+    __m128i in_16 = _mm_packus_epi32(_mm_and_si128(in, v_7fffffff),
+                                     _mm_and_si128(nextin, v_7fffffff));
 
-    // Try to apply UTF-16 => UTF-8 routine on 256 bits
-    // (haswell/avx2_convert_utf16_to_utf8.cpp)
+    // Try to apply UTF-16 => UTF-8 from ./sse_convert_utf16_to_utf8.cpp
 
-    if (_mm256_testz_si256(in_16, v_ff80)) { // ASCII fast path!!!!
+    // Check for ASCII fast path
+    if (_mm_testz_si128(in_16, v_ff80)) { // ASCII fast path!!!!
       // 1. pack the bytes
-      const __m128i utf8_packed = _mm_packus_epi16(
-          _mm256_castsi256_si128(in_16), _mm256_extractf128_si256(in_16, 1));
+      // obviously suboptimal.
+      const __m128i utf8_packed = _mm_packus_epi16(in_16, in_16);
       // 2. store (16 bytes)
       _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
       // 3. adjust pointers
-      buf += 16;
-      utf8_output += 16;
-      continue; // we are done for this round!
+      buf += 8;
+      utf8_output += 8;
+      continue;
     }
+
     // no bits set above 7th bit
-    const __m256i one_byte_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_ff80), v_0000);
-    const uint32_t one_byte_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(one_byte_bytemask));
+    const __m128i one_byte_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in_16, v_ff80), v_0000);
+    const uint16_t one_byte_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
 
     // no bits set above 11th bit
-    const __m256i one_or_two_bytes_bytemask =
-        _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_0000);
-    const uint32_t one_or_two_bytes_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(one_or_two_bytes_bytemask));
-    if (one_or_two_bytes_bitmask == 0xffffffff) {
+    const __m128i one_or_two_bytes_bytemask =
+        _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_0000);
+    const uint16_t one_or_two_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
+
+    if (one_or_two_bytes_bitmask == 0xffff) {
+      // case: all code units either produce 1 or 2 UTF-8 bytes (at least one
+      // produces 2 bytes)
       // 1. prepare 2-byte values
       // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
       // expected output   : [110a|aaaa|10bb|bbbb] x 8
-      const __m256i v_1f00 = _mm256_set1_epi16((int16_t)0x1f00);
-      const __m256i v_003f = _mm256_set1_epi16((int16_t)0x003f);
+      const __m128i v_1f00 = _mm_set1_epi16((int16_t)0x1f00);
+      const __m128i v_003f = _mm_set1_epi16((int16_t)0x003f);
 
       // t0 = [000a|aaaa|bbbb|bb00]
-      const __m256i t0 = _mm256_slli_epi16(in_16, 2);
+      const __m128i t0 = _mm_slli_epi16(in_16, 2);
       // t1 = [000a|aaaa|0000|0000]
-      const __m256i t1 = _mm256_and_si256(t0, v_1f00);
+      const __m128i t1 = _mm_and_si128(t0, v_1f00);
       // t2 = [0000|0000|00bb|bbbb]
-      const __m256i t2 = _mm256_and_si256(in_16, v_003f);
+      const __m128i t2 = _mm_and_si128(in_16, v_003f);
       // t3 = [000a|aaaa|00bb|bbbb]
-      const __m256i t3 = _mm256_or_si256(t1, t2);
+      const __m128i t3 = _mm_or_si128(t1, t2);
       // t4 = [110a|aaaa|10bb|bbbb]
-      const __m256i t4 = _mm256_or_si256(t3, v_c080);
+      const __m128i t4 = _mm_or_si128(t3, v_c080);
 
       // 2. merge ASCII and 2-byte codewords
-      const __m256i utf8_unpacked =
-          _mm256_blendv_epi8(t4, in_16, one_byte_bytemask);
+      const __m128i utf8_unpacked =
+          _mm_blendv_epi8(t4, in_16, one_byte_bytemask);
 
       // 3. prepare bitmask for 8-bit lookup
-      const uint32_t M0 = one_byte_bitmask & 0x55555555;
-      const uint32_t M1 = M0 >> 7;
-      const uint32_t M2 = (M1 | M0) & 0x00ff00ff;
+      //    one_byte_bitmask = hhggffeeddccbbaa -- the bits are doubled (h -
+      //    MSB, a - LSB)
+      const uint16_t m0 = one_byte_bitmask & 0x5555; // m0 = 0h0g0f0e0d0c0b0a
+      const uint16_t m1 =
+          static_cast<uint16_t>(m0 >> 7); // m1 = 00000000h0g0f0e0
+      const uint8_t m2 =
+          static_cast<uint8_t>((m0 | m1) & 0xff); // m2 =         hdgcfbea
       // 4. pack the bytes
-
       const uint8_t *row =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2)][0];
-      const uint8_t *row_2 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[uint8_t(M2 >>
-                                                                       16)][0];
-
+          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
       const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
-      const __m128i shuffle_2 = _mm_loadu_si128((__m128i *)(row_2 + 1));
-
-      const __m256i utf8_packed = _mm256_shuffle_epi8(
-          utf8_unpacked, _mm256_setr_m128i(shuffle, shuffle_2));
-      // 5. store bytes
-      _mm_storeu_si128((__m128i *)utf8_output,
-                       _mm256_castsi256_si128(utf8_packed));
-      utf8_output += row[0];
-      _mm_storeu_si128((__m128i *)utf8_output,
-                       _mm256_extractf128_si256(utf8_packed, 1));
-      utf8_output += row_2[0];
+      const __m128i utf8_packed = _mm_shuffle_epi8(utf8_unpacked, shuffle);
+
+      // 5. store bytes
+      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
 
       // 6. adjust pointers
-      buf += 16;
+      buf += 8;
+      utf8_output += row[0];
       continue;
     }
-    // Must check for overflow in packing
-    const __m256i saturation_bytemask = _mm256_cmpeq_epi32(
-        _mm256_and_si256(_mm256_or_si256(in, nextin), v_ffff0000), v_0000);
+
+    // Check for overflow in packing
+    const __m128i saturation_bytemask = _mm_cmpeq_epi32(
+        _mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
     const uint32_t saturation_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
-    if (saturation_bitmask == 0xffffffff) {
+        static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
+
+    if (saturation_bitmask == 0xffff) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
 
       // Check for illegal surrogate code units
-      const __m256i v_d800 = _mm256_set1_epi16((uint16_t)0xd800);
-      const __m256i forbidden_bytemask =
-          _mm256_cmpeq_epi16(_mm256_and_si256(in_16, v_f800), v_d800);
-      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) !=
-          0x0) {
+      const __m128i v_d800 = _mm_set1_epi16((uint16_t)0xd800);
+      const __m128i forbidden_bytemask =
+          _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_d800);
+      if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) {
         return std::make_pair(result(error_code::SURROGATE, buf - start),
                               utf8_output);
       }
 
-      const __m256i dup_even = _mm256_setr_epi16(
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e,
-          0x0000, 0x0202, 0x0404, 0x0606, 0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
+      const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
+                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
 
       /* In this branch we handle three cases:
-        1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
         single UFT-8 byte
-        2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
-        UTF-8 bytes
-        3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+        two UTF-8 bytes
+          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
         three UTF-8 bytes
 
         We expand the input word (16-bit) into two code units (32-bit), thus
@@ -27653,95 +40329,72 @@ avx2_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
        * t2 => [0ccc|cccc] [10cc|cccc]
        * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
        */
-#define simdutf_vec(x) _mm256_set1_epi16(static_cast<uint16_t>(x))
+#define simdutf_vec(x) _mm_set1_epi16(static_cast<uint16_t>(x))
       // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-      const __m256i t0 = _mm256_shuffle_epi8(in_16, dup_even);
+      const __m128i t0 = _mm_shuffle_epi8(in_16, dup_even);
       // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-      const __m256i t1 = _mm256_and_si256(t0, simdutf_vec(0b0011111101111111));
+      const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
       // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m256i t2 = _mm256_or_si256(t1, simdutf_vec(0b1000000000000000));
+      const __m128i t2 = _mm_or_si128(t1, simdutf_vec(0b1000000000000000));
 
       // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-      const __m256i s0 = _mm256_srli_epi16(in_16, 4);
+      const __m128i s0 = _mm_srli_epi16(in_16, 4);
       // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-      const __m256i s1 = _mm256_and_si256(s0, simdutf_vec(0b0000111111111100));
+      const __m128i s1 = _mm_and_si128(s0, simdutf_vec(0b0000111111111100));
       // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-      const __m256i s2 = _mm256_maddubs_epi16(s1, simdutf_vec(0x0140));
+      const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
       // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-      const __m256i s3 = _mm256_or_si256(s2, simdutf_vec(0b1100000011100000));
-      const __m256i m0 = _mm256_andnot_si256(one_or_two_bytes_bytemask,
-                                             simdutf_vec(0b0100000000000000));
-      const __m256i s4 = _mm256_xor_si256(s3, m0);
+      const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
+      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask,
+                                          simdutf_vec(0b0100000000000000));
+      const __m128i s4 = _mm_xor_si128(s3, m0);
 #undef simdutf_vec
 
       // 4. expand code units 16-bit => 32-bit
-      const __m256i out0 = _mm256_unpacklo_epi16(t2, s4);
-      const __m256i out1 = _mm256_unpackhi_epi16(t2, s4);
+      const __m128i out0 = _mm_unpacklo_epi16(t2, s4);
+      const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
 
       // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint32_t mask = (one_byte_bitmask & 0x55555555) |
-                            (one_or_two_bytes_bitmask & 0xaaaaaaaa);
-      // Due to the wider registers, the following path is less likely to be
-      // useful.
-      /*if(mask == 0) {
+      const uint16_t mask =
+          (one_byte_bitmask & 0x5555) | (one_or_two_bytes_bitmask & 0xaaaa);
+      if (mask == 0) {
         // We only have three-byte code units. Use fast path.
-        const __m256i shuffle =
-      _mm256_setr_epi8(2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1,
-      2,3,1,6,7,5,10,11,9,14,15,13,-1,-1,-1,-1); const __m256i utf8_0 =
-      _mm256_shuffle_epi8(out0, shuffle); const __m256i utf8_1 =
-      _mm256_shuffle_epi8(out1, shuffle);
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_0));
+        const __m128i shuffle = _mm_setr_epi8(2, 3, 1, 6, 7, 5, 10, 11, 9, 14,
+                                              15, 13, -1, -1, -1, -1);
+        const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
+        const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output, _mm256_castsi256_si128(utf8_1));
+        _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
         utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output,
-      _mm256_extractf128_si256(utf8_0,1)); utf8_output += 12;
-        _mm_storeu_si128((__m128i*)utf8_output,
-      _mm256_extractf128_si256(utf8_1,1)); utf8_output += 12; buf += 16;
+        buf += 8;
         continue;
-      }*/
+      }
       const uint8_t mask0 = uint8_t(mask);
+
       const uint8_t *row0 =
           &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
       const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
-      const __m128i utf8_0 =
-          _mm_shuffle_epi8(_mm256_castsi256_si128(out0), shuffle0);
+      const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
 
       const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+
       const uint8_t *row1 =
           &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
       const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
-      const __m128i utf8_1 =
-          _mm_shuffle_epi8(_mm256_castsi256_si128(out1), shuffle1);
-
-      const uint8_t mask2 = static_cast<uint8_t>(mask >> 16);
-      const uint8_t *row2 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask2][0];
-      const __m128i shuffle2 = _mm_loadu_si128((__m128i *)(row2 + 1));
-      const __m128i utf8_2 =
-          _mm_shuffle_epi8(_mm256_extractf128_si256(out0, 1), shuffle2);
-
-      const uint8_t mask3 = static_cast<uint8_t>(mask >> 24);
-      const uint8_t *row3 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask3][0];
-      const __m128i shuffle3 = _mm_loadu_si128((__m128i *)(row3 + 1));
-      const __m128i utf8_3 =
-          _mm_shuffle_epi8(_mm256_extractf128_si256(out1, 1), shuffle3);
+      const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
 
       _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
       utf8_output += row0[0];
       _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
       utf8_output += row1[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_2);
-      utf8_output += row2[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_3);
-      utf8_output += row3[0];
-      buf += 16;
+
+      buf += 8;
     } else {
-      // case: at least one 32-bit word is larger than 0xFFFF <=> it will
-      // produce four UTF-8 bytes. Let us do a scalar fallback. It may seem
-      // wasteful to use scalar code, but being efficient with SIMD may require
-      // large, non-trivial tables?
+      // case: at least one 32-bit word produces a surrogate pair in UTF-16 <=>
+      // it will produce four UTF-8 bytes. Let us do a scalar fallback. It may
+      // seem wasteful to use scalar code, but being efficient with SIMD in the
+      // presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
       if (size_t(end - buf) < forward + 1) {
@@ -27749,12 +40402,12 @@ avx2_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
       }
       for (; k < forward; k++) {
         uint32_t word = buf[k];
-        if ((word & 0xFFFFFF80) == 0) { // 1-byte (ASCII)
+        if ((word & 0xFFFFFF80) == 0) {
           *utf8_output++ = char(word);
-        } else if ((word & 0xFFFFF800) == 0) { // 2-byte
+        } else if ((word & 0xFFFFF800) == 0) {
           *utf8_output++ = char((word >> 6) | 0b11000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else if ((word & 0xFFFF0000) == 0) { // 3-byte
+        } else if ((word & 0xFFFF0000) == 0) {
           if (word >= 0xD800 && word <= 0xDFFF) {
             return std::make_pair(
                 result(error_code::SURROGATE, buf - start + k), utf8_output);
@@ -27762,7 +40415,7 @@ avx2_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
           *utf8_output++ = char((word >> 12) | 0b11100000);
           *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
-        } else { // 4-byte
+        } else {
           if (word > 0x10FFFF) {
             return std::make_pair(
                 result(error_code::TOO_LARGE, buf - start + k), utf8_output);
@@ -27776,48 +40429,46 @@ avx2_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
       buf += k;
     }
   } // while
-
   return std::make_pair(result(error_code::SUCCESS, buf - start), utf8_output);
 }
-/* end file src/haswell/avx2_convert_utf32_to_utf8.cpp */
-/* begin file src/haswell/avx2_convert_utf32_to_utf16.cpp */
+/* end file src/westmere/sse_convert_utf32_to_utf8.cpp */
+/* begin file src/westmere/sse_convert_utf32_to_utf16.cpp */
 template <endianness big_endian>
 std::pair<const char32_t *, char16_t *>
-avx2_convert_utf32_to_utf16(const char32_t *buf, size_t len,
-                            char16_t *utf16_output) {
-  const char32_t *end = buf + len;
-
-  const size_t safety_margin =
-      12; // to avoid overruns, see issue
-          // https://github.com/simdutf/simdutf/issues/92
-  __m256i forbidden_bytemask = _mm256_setzero_si256();
+sse_convert_utf32_to_utf16(const char32_t *buf, size_t len,
+                           char16_t *utf16_output) {
 
-  while (end - buf >= std::ptrdiff_t(8 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i *)buf);
+  const char32_t *end = buf + len;
 
-    const __m256i v_00000000 = _mm256_setzero_si256();
-    const __m256i v_ffff0000 = _mm256_set1_epi32((int32_t)0xffff0000);
+  const __m128i v_0000 = _mm_setzero_si128();
+  const __m128i v_ffff0000 = _mm_set1_epi32((int32_t)0xffff0000);
+  __m128i forbidden_bytemask = _mm_setzero_si128();
 
-    // no bits set above 16th bit <=> can pack to UTF16 without surrogate pairs
-    const __m256i saturation_bytemask =
-        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+  while (end - buf >= 8) {
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
+    __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
+    const __m128i saturation_bytemask = _mm_cmpeq_epi32(
+        _mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
     const uint32_t saturation_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+        static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
 
-    if (saturation_bitmask == 0xffffffff) {
-      const __m256i v_f800 = _mm256_set1_epi32((uint32_t)0xf800);
-      const __m256i v_d800 = _mm256_set1_epi32((uint32_t)0xd800);
-      forbidden_bytemask = _mm256_or_si256(
+    // Check if no bits are set above the 16th bit
+    if (saturation_bitmask == 0xffff) {
+      // Pack UTF-32 to UTF-16
+      __m128i utf16_packed = _mm_packus_epi32(in, nextin);
+
+      const __m128i v_f800 = _mm_set1_epi16((uint16_t)0xf800);
+      const __m128i v_d800 = _mm_set1_epi16((uint16_t)0xd800);
+      forbidden_bytemask = _mm_or_si128(
           forbidden_bytemask,
-          _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800));
+          _mm_cmpeq_epi16(_mm_and_si128(utf16_packed, v_f800), v_d800));
 
-      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),
-                                              _mm256_extractf128_si256(in, 1));
       if (big_endian) {
         const __m128i swap =
             _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
         utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
       }
+
       _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
       utf16_output += 8;
       buf += 8;
@@ -27861,7 +40512,7 @@ avx2_convert_utf32_to_utf16(const char32_t *buf, size_t len,
   }
 
   // check for invalid input
-  if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) != 0) {
+  if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) {
     return std::make_pair(nullptr, utf16_output);
   }
 
@@ -27870,45 +40521,42 @@ avx2_convert_utf32_to_utf16(const char32_t *buf, size_t len,
 
 template <endianness big_endian>
 std::pair<result, char16_t *>
-avx2_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
-                                        char16_t *utf16_output) {
+sse_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
+                                       char16_t *utf16_output) {
   const char32_t *start = buf;
   const char32_t *end = buf + len;
 
-  const size_t safety_margin =
-      12; // to avoid overruns, see issue
-          // https://github.com/simdutf/simdutf/issues/92
-
-  while (end - buf >= std::ptrdiff_t(8 + safety_margin)) {
-    __m256i in = _mm256_loadu_si256((__m256i *)buf);
-
-    const __m256i v_00000000 = _mm256_setzero_si256();
-    const __m256i v_ffff0000 = _mm256_set1_epi32((int32_t)0xffff0000);
+  const __m128i v_0000 = _mm_setzero_si128();
+  const __m128i v_ffff0000 = _mm_set1_epi32((int32_t)0xffff0000);
 
-    // no bits set above 16th bit <=> can pack to UTF16 without surrogate pairs
-    const __m256i saturation_bytemask =
-        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
+  while (end - buf >= 8) {
+    __m128i in = _mm_loadu_si128((__m128i *)buf);
+    __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
+    const __m128i saturation_bytemask = _mm_cmpeq_epi32(
+        _mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
     const uint32_t saturation_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(saturation_bytemask));
+        static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
 
-    if (saturation_bitmask == 0xffffffff) {
-      const __m256i v_f800 = _mm256_set1_epi32((uint32_t)0xf800);
-      const __m256i v_d800 = _mm256_set1_epi32((uint32_t)0xd800);
-      const __m256i forbidden_bytemask =
-          _mm256_cmpeq_epi32(_mm256_and_si256(in, v_f800), v_d800);
-      if (static_cast<uint32_t>(_mm256_movemask_epi8(forbidden_bytemask)) !=
-          0x0) {
+    // Check if no bits are set above the 16th bit
+    if (saturation_bitmask == 0xffff) {
+      // Pack UTF-32 to UTF-16
+      __m128i utf16_packed = _mm_packus_epi32(in, nextin);
+
+      const __m128i v_f800 = _mm_set1_epi16((uint16_t)0xf800);
+      const __m128i v_d800 = _mm_set1_epi16((uint16_t)0xd800);
+      const __m128i forbidden_bytemask =
+          _mm_cmpeq_epi16(_mm_and_si128(utf16_packed, v_f800), v_d800);
+      if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) {
         return std::make_pair(result(error_code::SURROGATE, buf - start),
                               utf16_output);
       }
 
-      __m128i utf16_packed = _mm_packus_epi32(_mm256_castsi256_si128(in),
-                                              _mm256_extractf128_si256(in, 1));
       if (big_endian) {
         const __m128i swap =
             _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
         utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
       }
+
       _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
       utf16_output += 8;
       buf += 8;
@@ -27955,72 +40603,8 @@ avx2_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
 
   return std::make_pair(result(error_code::SUCCESS, buf - start), utf16_output);
 }
-/* end file src/haswell/avx2_convert_utf32_to_utf16.cpp */
-
-/* begin file src/haswell/avx2_convert_utf8_to_latin1.cpp */
-// depends on "tables/utf8_to_utf16_tables.h"
-
-// Convert up to 12 bytes from utf8 to latin1 using a mask indicating the
-// end of the code points. Only the least significant 12 bits of the mask
-// are accessed.
-// It returns how many bytes were consumed (up to 12).
-size_t convert_masked_utf8_to_latin1(const char *input,
-                                     uint64_t utf8_end_of_code_point_mask,
-                                     char *&latin1_output) {
-  // we use an approach where we try to process up to 12 input bytes.
-  // Why 12 input bytes and not 16? Because we are concerned with the size of
-  // the lookup tables. Also 12 is nicely divisible by two and three.
-  //
-  //
-  // Optimization note: our main path below is load-latency dependent. Thus it
-  // is maybe beneficial to have fast paths that depend on branch prediction but
-  // have less latency. This results in more instructions but, potentially, also
-  // higher speeds.
-  //
-  const __m128i in = _mm_loadu_si128((__m128i *)input);
-
-  const uint16_t input_utf8_end_of_code_point_mask =
-      utf8_end_of_code_point_mask &
-      0xfff; // we are only processing 12 bytes in case it is not all ASCII
-
-  if (utf8_end_of_code_point_mask == 0xfff) {
-    // We process the data in chunks of 12 bytes.
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(latin1_output), in);
-    latin1_output += 12; // We wrote 12 characters.
-    return 12;           // We consumed 1 bytes.
-  }
-  /// We do not have a fast path available, so we fallback.
-  const uint8_t idx =
-      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][0];
-  const uint8_t consumed =
-      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
-  // this indicates an invalid input:
-  if (idx >= 64) {
-    return consumed;
-  }
-  // Here we should have (idx < 64), if not, there is a bug in the validation or
-  // elsewhere. SIX (6) input code-code units this is a relatively easy scenario
-  // we process SIX (6) input code-code units. The max length in bytes of six
-  // code code units spanning between 1 and 2 bytes each is 12 bytes. On
-  // processors where pdep/pext is fast, we might be able to use a small lookup
-  // table.
-  const __m128i sh =
-      _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
-  const __m128i perm = _mm_shuffle_epi8(in, sh);
-  const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
-  const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
-  __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-  const __m128i latin1_packed = _mm_packus_epi16(composed, composed);
-  // writing 8 bytes even though we only care about the first 6 bytes.
-  // performance note: it would be faster to use _mm_storeu_si128, we should
-  // investigate.
-  _mm_storel_epi64((__m128i *)latin1_output, latin1_packed);
-  latin1_output += 6; // We wrote 6 bytes.
-  return consumed;
-}
-/* end file src/haswell/avx2_convert_utf8_to_latin1.cpp */
-
-/* begin file src/haswell/avx2_base64.cpp */
+/* end file src/westmere/sse_convert_utf32_to_utf16.cpp */
+/* begin file src/westmere/sse_base64.cpp */
 /**
  * References and further reading:
  *
@@ -28048,151 +40632,155 @@ size_t convert_masked_utf8_to_latin1(const char *input,
  * Nick Kopp. 2013. Base64 Encoding on a GPU.
  * https://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU. (2013).
  */
-
-template <bool base64_url>
-simdutf_really_inline __m256i lookup_pshufb_improved(const __m256i input) {
+template <bool base64_url> __m128i lookup_pshufb_improved(const __m128i input) {
   // credit: Wojciech Muła
-  __m256i result = _mm256_subs_epu8(input, _mm256_set1_epi8(51));
-  const __m256i less = _mm256_cmpgt_epi8(_mm256_set1_epi8(26), input);
-  result =
-      _mm256_or_si256(result, _mm256_and_si256(less, _mm256_set1_epi8(13)));
-  __m256i shift_LUT;
-  if (base64_url) {
-    shift_LUT = _mm256_setr_epi8(
-        'a' - 26, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
-        '0' - 52, '0' - 52, '0' - 52, '0' - 52, '-' - 62, '_' - 63, 'A', 0, 0,
+  // reduce  0..51 -> 0
+  //        52..61 -> 1 .. 10
+  //            62 -> 11
+  //            63 -> 12
+  __m128i result = _mm_subs_epu8(input, _mm_set1_epi8(51));
 
-        'a' - 26, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
-        '0' - 52, '0' - 52, '0' - 52, '0' - 52, '-' - 62, '_' - 63, 'A', 0, 0);
-  } else {
-    shift_LUT = _mm256_setr_epi8(
-        'a' - 26, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
-        '0' - 52, '0' - 52, '0' - 52, '0' - 52, '+' - 62, '/' - 63, 'A', 0, 0,
+  // distinguish between ranges 0..25 and 26..51:
+  //         0 .. 25 -> remains 0
+  //        26 .. 51 -> becomes 13
+  const __m128i less = _mm_cmpgt_epi8(_mm_set1_epi8(26), input);
+  result = _mm_or_si128(result, _mm_and_si128(less, _mm_set1_epi8(13)));
 
-        'a' - 26, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
-        '0' - 52, '0' - 52, '0' - 52, '0' - 52, '+' - 62, '/' - 63, 'A', 0, 0);
+  __m128i shift_LUT;
+  if (base64_url) {
+    shift_LUT = _mm_setr_epi8('a' - 26, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
+                              '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
+                              '0' - 52, '-' - 62, '_' - 63, 'A', 0, 0);
+  } else {
+    shift_LUT = _mm_setr_epi8('a' - 26, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
+                              '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
+                              '0' - 52, '+' - 62, '/' - 63, 'A', 0, 0);
   }
 
-  result = _mm256_shuffle_epi8(shift_LUT, result);
-  return _mm256_add_epi8(result, input);
+  // read shift
+  result = _mm_shuffle_epi8(shift_LUT, result);
+
+  return _mm_add_epi8(result, input);
 }
 
 template <bool isbase64url>
 size_t encode_base64(char *dst, const char *src, size_t srclen,
                      base64_options options) {
   // credit: Wojciech Muła
+  // SSE (lookup: pshufb improved unrolled)
   const uint8_t *input = (const uint8_t *)src;
 
   uint8_t *out = (uint8_t *)dst;
-  const __m256i shuf =
-      _mm256_set_epi8(10, 11, 9, 10, 7, 8, 6, 7, 4, 5, 3, 4, 1, 2, 0, 1,
+  const __m128i shuf =
+      _mm_set_epi8(10, 11, 9, 10, 7, 8, 6, 7, 4, 5, 3, 4, 1, 2, 0, 1);
 
-                      10, 11, 9, 10, 7, 8, 6, 7, 4, 5, 3, 4, 1, 2, 0, 1);
   size_t i = 0;
-  for (; i + 100 <= srclen; i += 96) {
-    const __m128i lo0 = _mm_loadu_si128(
+  for (; i + 52 <= srclen; i += 48) {
+    __m128i in0 = _mm_loadu_si128(
         reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 0));
-    const __m128i hi0 = _mm_loadu_si128(
+    __m128i in1 = _mm_loadu_si128(
         reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 1));
-    const __m128i lo1 = _mm_loadu_si128(
+    __m128i in2 = _mm_loadu_si128(
         reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 2));
-    const __m128i hi1 = _mm_loadu_si128(
+    __m128i in3 = _mm_loadu_si128(
         reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 3));
-    const __m128i lo2 = _mm_loadu_si128(
-        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 4));
-    const __m128i hi2 = _mm_loadu_si128(
-        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 5));
-    const __m128i lo3 = _mm_loadu_si128(
-        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 6));
-    const __m128i hi3 = _mm_loadu_si128(
-        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 7));
 
-    __m256i in0 = _mm256_shuffle_epi8(_mm256_set_m128i(hi0, lo0), shuf);
-    __m256i in1 = _mm256_shuffle_epi8(_mm256_set_m128i(hi1, lo1), shuf);
-    __m256i in2 = _mm256_shuffle_epi8(_mm256_set_m128i(hi2, lo2), shuf);
-    __m256i in3 = _mm256_shuffle_epi8(_mm256_set_m128i(hi3, lo3), shuf);
+    in0 = _mm_shuffle_epi8(in0, shuf);
+    in1 = _mm_shuffle_epi8(in1, shuf);
+    in2 = _mm_shuffle_epi8(in2, shuf);
+    in3 = _mm_shuffle_epi8(in3, shuf);
 
-    const __m256i t0_0 = _mm256_and_si256(in0, _mm256_set1_epi32(0x0fc0fc00));
-    const __m256i t0_1 = _mm256_and_si256(in1, _mm256_set1_epi32(0x0fc0fc00));
-    const __m256i t0_2 = _mm256_and_si256(in2, _mm256_set1_epi32(0x0fc0fc00));
-    const __m256i t0_3 = _mm256_and_si256(in3, _mm256_set1_epi32(0x0fc0fc00));
+    const __m128i t0_0 = _mm_and_si128(in0, _mm_set1_epi32(0x0fc0fc00));
+    const __m128i t0_1 = _mm_and_si128(in1, _mm_set1_epi32(0x0fc0fc00));
+    const __m128i t0_2 = _mm_and_si128(in2, _mm_set1_epi32(0x0fc0fc00));
+    const __m128i t0_3 = _mm_and_si128(in3, _mm_set1_epi32(0x0fc0fc00));
 
-    const __m256i t1_0 =
-        _mm256_mulhi_epu16(t0_0, _mm256_set1_epi32(0x04000040));
-    const __m256i t1_1 =
-        _mm256_mulhi_epu16(t0_1, _mm256_set1_epi32(0x04000040));
-    const __m256i t1_2 =
-        _mm256_mulhi_epu16(t0_2, _mm256_set1_epi32(0x04000040));
-    const __m256i t1_3 =
-        _mm256_mulhi_epu16(t0_3, _mm256_set1_epi32(0x04000040));
+    const __m128i t1_0 = _mm_mulhi_epu16(t0_0, _mm_set1_epi32(0x04000040));
+    const __m128i t1_1 = _mm_mulhi_epu16(t0_1, _mm_set1_epi32(0x04000040));
+    const __m128i t1_2 = _mm_mulhi_epu16(t0_2, _mm_set1_epi32(0x04000040));
+    const __m128i t1_3 = _mm_mulhi_epu16(t0_3, _mm_set1_epi32(0x04000040));
 
-    const __m256i t2_0 = _mm256_and_si256(in0, _mm256_set1_epi32(0x003f03f0));
-    const __m256i t2_1 = _mm256_and_si256(in1, _mm256_set1_epi32(0x003f03f0));
-    const __m256i t2_2 = _mm256_and_si256(in2, _mm256_set1_epi32(0x003f03f0));
-    const __m256i t2_3 = _mm256_and_si256(in3, _mm256_set1_epi32(0x003f03f0));
+    const __m128i t2_0 = _mm_and_si128(in0, _mm_set1_epi32(0x003f03f0));
+    const __m128i t2_1 = _mm_and_si128(in1, _mm_set1_epi32(0x003f03f0));
+    const __m128i t2_2 = _mm_and_si128(in2, _mm_set1_epi32(0x003f03f0));
+    const __m128i t2_3 = _mm_and_si128(in3, _mm_set1_epi32(0x003f03f0));
 
-    const __m256i t3_0 =
-        _mm256_mullo_epi16(t2_0, _mm256_set1_epi32(0x01000010));
-    const __m256i t3_1 =
-        _mm256_mullo_epi16(t2_1, _mm256_set1_epi32(0x01000010));
-    const __m256i t3_2 =
-        _mm256_mullo_epi16(t2_2, _mm256_set1_epi32(0x01000010));
-    const __m256i t3_3 =
-        _mm256_mullo_epi16(t2_3, _mm256_set1_epi32(0x01000010));
+    const __m128i t3_0 = _mm_mullo_epi16(t2_0, _mm_set1_epi32(0x01000010));
+    const __m128i t3_1 = _mm_mullo_epi16(t2_1, _mm_set1_epi32(0x01000010));
+    const __m128i t3_2 = _mm_mullo_epi16(t2_2, _mm_set1_epi32(0x01000010));
+    const __m128i t3_3 = _mm_mullo_epi16(t2_3, _mm_set1_epi32(0x01000010));
 
-    const __m256i input0 = _mm256_or_si256(t1_0, t3_0);
-    const __m256i input1 = _mm256_or_si256(t1_1, t3_1);
-    const __m256i input2 = _mm256_or_si256(t1_2, t3_2);
-    const __m256i input3 = _mm256_or_si256(t1_3, t3_3);
+    const __m128i input0 = _mm_or_si128(t1_0, t3_0);
+    const __m128i input1 = _mm_or_si128(t1_1, t3_1);
+    const __m128i input2 = _mm_or_si128(t1_2, t3_2);
+    const __m128i input3 = _mm_or_si128(t1_3, t3_3);
 
-    _mm256_storeu_si256(reinterpret_cast<__m256i *>(out),
-                        lookup_pshufb_improved<isbase64url>(input0));
-    out += 32;
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(out),
+                     lookup_pshufb_improved<isbase64url>(input0));
+    out += 16;
 
-    _mm256_storeu_si256(reinterpret_cast<__m256i *>(out),
-                        lookup_pshufb_improved<isbase64url>(input1));
-    out += 32;
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(out),
+                     lookup_pshufb_improved<isbase64url>(input1));
+    out += 16;
 
-    _mm256_storeu_si256(reinterpret_cast<__m256i *>(out),
-                        lookup_pshufb_improved<isbase64url>(input2));
-    out += 32;
-    _mm256_storeu_si256(reinterpret_cast<__m256i *>(out),
-                        lookup_pshufb_improved<isbase64url>(input3));
-    out += 32;
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(out),
+                     lookup_pshufb_improved<isbase64url>(input2));
+    out += 16;
+
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(out),
+                     lookup_pshufb_improved<isbase64url>(input3));
+    out += 16;
   }
-  for (; i + 28 <= srclen; i += 24) {
-    // lo = [xxxx|DDDC|CCBB|BAAA]
-    // hi = [xxxx|HHHG|GGFF|FEEE]
-    const __m128i lo =
-        _mm_loadu_si128(reinterpret_cast<const __m128i *>(input + i));
-    const __m128i hi =
-        _mm_loadu_si128(reinterpret_cast<const __m128i *>(input + i + 4 * 3));
+  for (; i + 16 <= srclen; i += 12) {
+
+    __m128i in = _mm_loadu_si128(reinterpret_cast<const __m128i *>(input + i));
 
     // bytes from groups A, B and C are needed in separate 32-bit lanes
-    // in = [0HHH|0GGG|0FFF|0EEE[0DDD|0CCC|0BBB|0AAA]
-    __m256i in = _mm256_shuffle_epi8(_mm256_set_m128i(hi, lo), shuf);
+    // in = [DDDD|CCCC|BBBB|AAAA]
+    //
+    //      an input triplet has layout
+    //      [????????|ccdddddd|bbbbcccc|aaaaaabb]
+    //        byte 3   byte 2   byte 1   byte 0    -- byte 3 comes from the next
+    //        triplet
+    //
+    //      shuffling changes the order of bytes: 1, 0, 2, 1
+    //      [bbbbcccc|ccdddddd|aaaaaabb|bbbbcccc]
+    //           ^^^^ ^^^^^^^^ ^^^^^^^^ ^^^^
+    //                  processed bits
+    in = _mm_shuffle_epi8(in, shuf);
 
-    // this part is well commented in encode.sse.cpp
+    // unpacking
 
-    const __m256i t0 = _mm256_and_si256(in, _mm256_set1_epi32(0x0fc0fc00));
-    const __m256i t1 = _mm256_mulhi_epu16(t0, _mm256_set1_epi32(0x04000040));
-    const __m256i t2 = _mm256_and_si256(in, _mm256_set1_epi32(0x003f03f0));
-    const __m256i t3 = _mm256_mullo_epi16(t2, _mm256_set1_epi32(0x01000010));
-    const __m256i indices = _mm256_or_si256(t1, t3);
+    // t0    = [0000cccc|cc000000|aaaaaa00|00000000]
+    const __m128i t0 = _mm_and_si128(in, _mm_set1_epi32(0x0fc0fc00));
+    // t1    = [00000000|00cccccc|00000000|00aaaaaa]
+    //          (c * (1 << 10), a * (1 << 6)) >> 16 (note: an unsigned
+    //          multiplication)
+    const __m128i t1 = _mm_mulhi_epu16(t0, _mm_set1_epi32(0x04000040));
 
-    _mm256_storeu_si256(reinterpret_cast<__m256i *>(out),
-                        lookup_pshufb_improved<isbase64url>(indices));
-    out += 32;
+    // t2    = [00000000|00dddddd|000000bb|bbbb0000]
+    const __m128i t2 = _mm_and_si128(in, _mm_set1_epi32(0x003f03f0));
+    // t3    = [00dddddd|00000000|00bbbbbb|00000000]
+    //          (d * (1 << 8), b * (1 << 4))
+    const __m128i t3 = _mm_mullo_epi16(t2, _mm_set1_epi32(0x01000010));
+
+    // res   = [00dddddd|00cccccc|00bbbbbb|00aaaaaa] = t1 | t3
+    const __m128i indices = _mm_or_si128(t1, t3);
+
+    _mm_storeu_si128(reinterpret_cast<__m128i *>(out),
+                     lookup_pshufb_improved<isbase64url>(indices));
+    out += 16;
   }
+
   return i / 3 * 4 + scalar::base64::tail_encode_base64((char *)out, src + i,
                                                         srclen - i, options);
 }
-
 static inline void compress(__m128i data, uint16_t mask, char *output) {
   if (mask == 0) {
     _mm_storeu_si128(reinterpret_cast<__m128i *>(output), data);
     return;
   }
+
   // this particular implementation was inspired by work done by @animetosho
   // we do it in two steps, first 8 bytes and then second 8 bytes
   uint8_t mask1 = uint8_t(mask);      // least significant 8 bits
@@ -28218,198 +40806,209 @@ static inline void compress(__m128i data, uint16_t mask, char *output) {
   __m128i compactmask = _mm_loadu_si128(reinterpret_cast<const __m128i *>(
       tables::base64::pshufb_combine_table + pop1 * 8));
   __m128i answer = _mm_shuffle_epi8(pruned, compactmask);
-
   _mm_storeu_si128(reinterpret_cast<__m128i *>(output), answer);
 }
 
-static inline void compress(__m256i data, uint32_t mask, char *output) {
-  if (mask == 0) {
-    _mm256_storeu_si256(reinterpret_cast<__m256i *>(output), data);
-    return;
-  }
-  compress(_mm256_castsi256_si128(data), uint16_t(mask), output);
-  compress(_mm256_extracti128_si256(data, 1), uint16_t(mask >> 16),
-           output + _mm_popcnt_u32(~mask & 0xFFFF));
-}
-
 struct block64 {
-  __m256i chunks[2];
+  __m128i chunks[4];
 };
 
 template <bool base64_url>
-static inline uint32_t to_base64_mask(__m256i *src, uint32_t *error) {
-  const __m256i ascii_space_tbl =
-      _mm256_setr_epi8(0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x9, 0xa,
-                       0x0, 0xc, 0xd, 0x0, 0x0, 0x20, 0x0, 0x0, 0x0, 0x0, 0x0,
-                       0x0, 0x0, 0x0, 0x9, 0xa, 0x0, 0xc, 0xd, 0x0, 0x0);
+static inline uint16_t to_base64_mask(__m128i *src, uint32_t *error) {
+  const __m128i ascii_space_tbl =
+      _mm_setr_epi8(0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x9, 0xa, 0x0,
+                    0xc, 0xd, 0x0, 0x0);
   // credit: aqrit
-  __m256i delta_asso;
+  __m128i delta_asso;
   if (base64_url) {
-    delta_asso =
-        _mm256_setr_epi8(0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x0, 0x0, 0x0,
-                         0x0, 0x0, 0xF, 0x0, 0xF, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1,
-                         0x1, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0xF, 0x0, 0xF);
+    delta_asso = _mm_setr_epi8(0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x0, 0x0,
+                               0x0, 0x0, 0x0, 0xF, 0x0, 0xF);
   } else {
-    delta_asso = _mm256_setr_epi8(
-        0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00,
-        0x00, 0x0F, 0x00, 0x0F, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
-        0x00, 0x00, 0x00, 0x00, 0x00, 0x0F, 0x00, 0x0F);
-  }
 
-  __m256i delta_values;
+    delta_asso = _mm_setr_epi8(0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
+                               0x00, 0x00, 0x00, 0x00, 0x00, 0x0F, 0x00, 0x0F);
+  }
+  __m128i delta_values;
   if (base64_url) {
-    delta_values = _mm256_setr_epi8(
-        0x0, 0x0, 0x0, 0x13, 0x4, uint8_t(0xBF), uint8_t(0xBF), uint8_t(0xB9),
-        uint8_t(0xB9), 0x0, 0x11, uint8_t(0xC3), uint8_t(0xBF), uint8_t(0xE0),
-        uint8_t(0xB9), uint8_t(0xB9), 0x0, 0x0, 0x0, 0x13, 0x4, uint8_t(0xBF),
-        uint8_t(0xBF), uint8_t(0xB9), uint8_t(0xB9), 0x0, 0x11, uint8_t(0xC3),
-        uint8_t(0xBF), uint8_t(0xE0), uint8_t(0xB9), uint8_t(0xB9));
+    delta_values = _mm_setr_epi8(0x0, 0x0, 0x0, 0x13, 0x4, uint8_t(0xBF),
+                                 uint8_t(0xBF), uint8_t(0xB9), uint8_t(0xB9),
+                                 0x0, 0x11, uint8_t(0xC3), uint8_t(0xBF),
+                                 uint8_t(0xE0), uint8_t(0xB9), uint8_t(0xB9));
   } else {
-    delta_values = _mm256_setr_epi8(
-        int8_t(0x00), int8_t(0x00), int8_t(0x00), int8_t(0x13), int8_t(0x04),
-        int8_t(0xBF), int8_t(0xBF), int8_t(0xB9), int8_t(0xB9), int8_t(0x00),
-        int8_t(0x10), int8_t(0xC3), int8_t(0xBF), int8_t(0xBF), int8_t(0xB9),
-        int8_t(0xB9), int8_t(0x00), int8_t(0x00), int8_t(0x00), int8_t(0x13),
-        int8_t(0x04), int8_t(0xBF), int8_t(0xBF), int8_t(0xB9), int8_t(0xB9),
-        int8_t(0x00), int8_t(0x10), int8_t(0xC3), int8_t(0xBF), int8_t(0xBF),
-        int8_t(0xB9), int8_t(0xB9));
-  }
-  __m256i check_asso;
 
+    delta_values =
+        _mm_setr_epi8(int8_t(0x00), int8_t(0x00), int8_t(0x00), int8_t(0x13),
+                      int8_t(0x04), int8_t(0xBF), int8_t(0xBF), int8_t(0xB9),
+                      int8_t(0xB9), int8_t(0x00), int8_t(0x10), int8_t(0xC3),
+                      int8_t(0xBF), int8_t(0xBF), int8_t(0xB9), int8_t(0xB9));
+  }
+  __m128i check_asso;
   if (base64_url) {
-    check_asso =
-        _mm256_setr_epi8(0xD, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x3,
-                         0x7, 0xB, 0xE, 0xB, 0x6, 0xD, 0x1, 0x1, 0x1, 0x1, 0x1,
-                         0x1, 0x1, 0x1, 0x1, 0x3, 0x7, 0xB, 0xE, 0xB, 0x6);
+    check_asso = _mm_setr_epi8(0xD, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1,
+                               0x3, 0x7, 0xB, 0xE, 0xB, 0x6);
   } else {
 
-    check_asso = _mm256_setr_epi8(
-        0x0D, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x03, 0x07,
-        0x0B, 0x0B, 0x0B, 0x0F, 0x0D, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
-        0x01, 0x01, 0x03, 0x07, 0x0B, 0x0B, 0x0B, 0x0F);
+    check_asso = _mm_setr_epi8(0x0D, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
+                               0x01, 0x01, 0x03, 0x07, 0x0B, 0x0B, 0x0B, 0x0F);
   }
-  __m256i check_values;
+  __m128i check_values;
   if (base64_url) {
-    check_values = _mm256_setr_epi8(
-        uint8_t(0x80), uint8_t(0x80), uint8_t(0x80), uint8_t(0x80),
-        uint8_t(0xCF), uint8_t(0xBF), uint8_t(0xB6), uint8_t(0xA6),
-        uint8_t(0xB5), uint8_t(0xA1), 0x0, uint8_t(0x80), 0x0, uint8_t(0x80),
-        0x0, uint8_t(0x80), uint8_t(0x80), uint8_t(0x80), uint8_t(0x80),
-        uint8_t(0x80), uint8_t(0xCF), uint8_t(0xBF), uint8_t(0xB6),
-        uint8_t(0xA6), uint8_t(0xB5), uint8_t(0xA1), 0x0, uint8_t(0x80), 0x0,
-        uint8_t(0x80), 0x0, uint8_t(0x80));
+    check_values = _mm_setr_epi8(uint8_t(0x80), uint8_t(0x80), uint8_t(0x80),
+                                 uint8_t(0x80), uint8_t(0xCF), uint8_t(0xBF),
+                                 uint8_t(0xB6), uint8_t(0xA6), uint8_t(0xB5),
+                                 uint8_t(0xA1), 0x0, uint8_t(0x80), 0x0,
+                                 uint8_t(0x80), 0x0, uint8_t(0x80));
   } else {
-    check_values = _mm256_setr_epi8(
-        int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0xCF),
-        int8_t(0xBF), int8_t(0xD5), int8_t(0xA6), int8_t(0xB5), int8_t(0x86),
-        int8_t(0xD1), int8_t(0x80), int8_t(0xB1), int8_t(0x80), int8_t(0x91),
-        int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0x80),
-        int8_t(0xCF), int8_t(0xBF), int8_t(0xD5), int8_t(0xA6), int8_t(0xB5),
-        int8_t(0x86), int8_t(0xD1), int8_t(0x80), int8_t(0xB1), int8_t(0x80),
-        int8_t(0x91), int8_t(0x80));
+
+    check_values =
+        _mm_setr_epi8(int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0x80),
+                      int8_t(0xCF), int8_t(0xBF), int8_t(0xD5), int8_t(0xA6),
+                      int8_t(0xB5), int8_t(0x86), int8_t(0xD1), int8_t(0x80),
+                      int8_t(0xB1), int8_t(0x80), int8_t(0x91), int8_t(0x80));
   }
-  const __m256i shifted = _mm256_srli_epi32(*src, 3);
-  const __m256i delta_hash =
-      _mm256_avg_epu8(_mm256_shuffle_epi8(delta_asso, *src), shifted);
-  const __m256i check_hash =
-      _mm256_avg_epu8(_mm256_shuffle_epi8(check_asso, *src), shifted);
-  const __m256i out =
-      _mm256_adds_epi8(_mm256_shuffle_epi8(delta_values, delta_hash), *src);
-  const __m256i chk =
-      _mm256_adds_epi8(_mm256_shuffle_epi8(check_values, check_hash), *src);
-  const int mask = _mm256_movemask_epi8(chk);
+  const __m128i shifted = _mm_srli_epi32(*src, 3);
+
+  const __m128i delta_hash =
+      _mm_avg_epu8(_mm_shuffle_epi8(delta_asso, *src), shifted);
+  const __m128i check_hash =
+      _mm_avg_epu8(_mm_shuffle_epi8(check_asso, *src), shifted);
+
+  const __m128i out =
+      _mm_adds_epi8(_mm_shuffle_epi8(delta_values, delta_hash), *src);
+  const __m128i chk =
+      _mm_adds_epi8(_mm_shuffle_epi8(check_values, check_hash), *src);
+  const int mask = _mm_movemask_epi8(chk);
   if (mask) {
-    __m256i ascii_space =
-        _mm256_cmpeq_epi8(_mm256_shuffle_epi8(ascii_space_tbl, *src), *src);
-    *error = (mask ^ _mm256_movemask_epi8(ascii_space));
+    __m128i ascii_space =
+        _mm_cmpeq_epi8(_mm_shuffle_epi8(ascii_space_tbl, *src), *src);
+    *error = (mask ^ _mm_movemask_epi8(ascii_space));
   }
   *src = out;
-  return (uint32_t)mask;
+  return (uint16_t)mask;
 }
 
 template <bool base64_url>
 static inline uint64_t to_base64_mask(block64 *b, uint64_t *error) {
   uint32_t err0 = 0;
   uint32_t err1 = 0;
+  uint32_t err2 = 0;
+  uint32_t err3 = 0;
   uint64_t m0 = to_base64_mask<base64_url>(&b->chunks[0], &err0);
   uint64_t m1 = to_base64_mask<base64_url>(&b->chunks[1], &err1);
-  *error = err0 | ((uint64_t)err1 << 32);
-  return m0 | (m1 << 32);
+  uint64_t m2 = to_base64_mask<base64_url>(&b->chunks[2], &err2);
+  uint64_t m3 = to_base64_mask<base64_url>(&b->chunks[3], &err3);
+  *error = (err0) | ((uint64_t)err1 << 16) | ((uint64_t)err2 << 32) |
+           ((uint64_t)err3 << 48);
+  return m0 | (m1 << 16) | (m2 << 32) | (m3 << 48);
+}
+
+#if defined(_MSC_VER) && !defined(__clang__)
+static inline size_t simdutf_tzcnt_u64(uint64_t num) {
+  unsigned long ret;
+  if (num == 0) {
+    return 64;
+  }
+  _BitScanForward64(&ret, num);
+  return ret;
+}
+#else // GCC or Clang
+static inline size_t simdutf_tzcnt_u64(uint64_t num) {
+  return num ? __builtin_ctzll(num) : 64;
 }
+#endif
 
 static inline void copy_block(block64 *b, char *output) {
-  _mm256_storeu_si256(reinterpret_cast<__m256i *>(output), b->chunks[0]);
-  _mm256_storeu_si256(reinterpret_cast<__m256i *>(output + 32), b->chunks[1]);
+  _mm_storeu_si128(reinterpret_cast<__m128i *>(output), b->chunks[0]);
+  _mm_storeu_si128(reinterpret_cast<__m128i *>(output + 16), b->chunks[1]);
+  _mm_storeu_si128(reinterpret_cast<__m128i *>(output + 32), b->chunks[2]);
+  _mm_storeu_si128(reinterpret_cast<__m128i *>(output + 48), b->chunks[3]);
 }
 
 static inline uint64_t compress_block(block64 *b, uint64_t mask, char *output) {
   uint64_t nmask = ~mask;
-  compress(b->chunks[0], uint32_t(mask), output);
-  compress(b->chunks[1], uint32_t(mask >> 32),
+  compress(b->chunks[0], uint16_t(mask), output);
+  compress(b->chunks[1], uint16_t(mask >> 16),
+           output + _mm_popcnt_u64(nmask & 0xFFFF));
+  compress(b->chunks[2], uint16_t(mask >> 32),
            output + _mm_popcnt_u64(nmask & 0xFFFFFFFF));
+  compress(b->chunks[3], uint16_t(mask >> 48),
+           output + _mm_popcnt_u64(nmask & 0xFFFFFFFFFFFFULL));
   return _mm_popcnt_u64(nmask);
 }
 
 // The caller of this function is responsible to ensure that there are 64 bytes
 // available from reading at src. The data is read into a block64 structure.
 static inline void load_block(block64 *b, const char *src) {
-  b->chunks[0] = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src));
-  b->chunks[1] =
-      _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src + 32));
+  b->chunks[0] = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));
+  b->chunks[1] = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 16));
+  b->chunks[2] = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 32));
+  b->chunks[3] = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 48));
 }
 
 // The caller of this function is responsible to ensure that there are 128 bytes
 // available from reading at src. The data is read into a block64 structure.
 static inline void load_block(block64 *b, const char16_t *src) {
-  __m256i m1 = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src));
-  __m256i m2 = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src + 16));
-  __m256i m3 = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src + 32));
-  __m256i m4 = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src + 48));
-  __m256i m1p = _mm256_permute2x128_si256(m1, m2, 0x20);
-  __m256i m2p = _mm256_permute2x128_si256(m1, m2, 0x31);
-  __m256i m3p = _mm256_permute2x128_si256(m3, m4, 0x20);
-  __m256i m4p = _mm256_permute2x128_si256(m3, m4, 0x31);
-  b->chunks[0] = _mm256_packus_epi16(m1p, m2p);
-  b->chunks[1] = _mm256_packus_epi16(m3p, m4p);
+  __m128i m1 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));
+  __m128i m2 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 8));
+  __m128i m3 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 16));
+  __m128i m4 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 24));
+  __m128i m5 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 32));
+  __m128i m6 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 40));
+  __m128i m7 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 48));
+  __m128i m8 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 56));
+  b->chunks[0] = _mm_packus_epi16(m1, m2);
+  b->chunks[1] = _mm_packus_epi16(m3, m4);
+  b->chunks[2] = _mm_packus_epi16(m5, m6);
+  b->chunks[3] = _mm_packus_epi16(m7, m8);
 }
 
-static inline void base64_decode(char *out, __m256i str) {
+static inline void base64_decode(char *out, __m128i str) {
   // credit: aqrit
-  const __m256i pack_shuffle =
-      _mm256_setr_epi8(2, 1, 0, 6, 5, 4, 10, 9, 8, 14, 13, 12, -1, -1, -1, -1,
-                       2, 1, 0, 6, 5, 4, 10, 9, 8, 14, 13, 12, -1, -1, -1, -1);
-  const __m256i t0 = _mm256_maddubs_epi16(str, _mm256_set1_epi32(0x01400140));
-  const __m256i t1 = _mm256_madd_epi16(t0, _mm256_set1_epi32(0x00011000));
-  const __m256i t2 = _mm256_shuffle_epi8(t1, pack_shuffle);
 
+  const __m128i pack_shuffle =
+      _mm_setr_epi8(2, 1, 0, 6, 5, 4, 10, 9, 8, 14, 13, 12, -1, -1, -1, -1);
+
+  const __m128i t0 = _mm_maddubs_epi16(str, _mm_set1_epi32(0x01400140));
+  const __m128i t1 = _mm_madd_epi16(t0, _mm_set1_epi32(0x00011000));
+  const __m128i t2 = _mm_shuffle_epi8(t1, pack_shuffle);
   // Store the output:
-  _mm_storeu_si128((__m128i *)out, _mm256_castsi256_si128(t2));
-  _mm_storeu_si128((__m128i *)(out + 12), _mm256_extracti128_si256(t2, 1));
+  // this writes 16 bytes, but we only need 12.
+  _mm_storeu_si128((__m128i *)out, t2);
 }
 // decode 64 bytes and output 48 bytes
 static inline void base64_decode_block(char *out, const char *src) {
-  base64_decode(out,
-                _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src)));
-  base64_decode(out + 24, _mm256_loadu_si256(
-                              reinterpret_cast<const __m256i *>(src + 32)));
+  base64_decode(out, _mm_loadu_si128(reinterpret_cast<const __m128i *>(src)));
+  base64_decode(out + 12,
+                _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 16)));
+  base64_decode(out + 24,
+                _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 32)));
+  base64_decode(out + 36,
+                _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 48)));
 }
 static inline void base64_decode_block_safe(char *out, const char *src) {
-  base64_decode(out,
-                _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src)));
-  char buffer[32]; // We enforce safety with a buffer.
-  base64_decode(
-      buffer, _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src + 32)));
-  std::memcpy(out + 24, buffer, 24);
+  base64_decode(out, _mm_loadu_si128(reinterpret_cast<const __m128i *>(src)));
+  base64_decode(out + 12,
+                _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 16)));
+  base64_decode(out + 24,
+                _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 32)));
+  char buffer[16];
+  base64_decode(buffer,
+                _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 48)));
+  std::memcpy(out + 36, buffer, 12);
 }
 static inline void base64_decode_block(char *out, block64 *b) {
   base64_decode(out, b->chunks[0]);
-  base64_decode(out + 24, b->chunks[1]);
+  base64_decode(out + 12, b->chunks[1]);
+  base64_decode(out + 24, b->chunks[2]);
+  base64_decode(out + 36, b->chunks[3]);
 }
 static inline void base64_decode_block_safe(char *out, block64 *b) {
   base64_decode(out, b->chunks[0]);
-  char buffer[32]; // We enforce safety with a buffer.
-  base64_decode(buffer, b->chunks[1]);
-  std::memcpy(out + 24, buffer, 24);
+  base64_decode(out + 12, b->chunks[1]);
+  base64_decode(out + 24, b->chunks[2]);
+  char buffer[16];
+  base64_decode(buffer, b->chunks[3]);
+  std::memcpy(out + 36, buffer, 12);
 }
 
 template <bool base64_url, typename chartype>
@@ -28456,7 +41055,7 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   const chartype *const srcend = src + srclen;
 
   constexpr size_t block_size = 6;
-  static_assert(block_size >= 2, "block_size must be at least two");
+  static_assert(block_size >= 2, "block_size must be at least two");
   char buffer[block_size * 64];
   char *bufferptr = buffer;
   if (srclen >= 64) {
@@ -28469,7 +41068,7 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
       if (error) {
         src -= 64;
-        size_t error_offset = _tzcnt_u64(error);
+        size_t error_offset = simdutf_tzcnt_u64(error);
         return {error_code::INVALID_BASE64_CHARACTER,
                 size_t(src - srcinit + error_offset), size_t(dst - dstinit)};
       }
@@ -28512,7 +41111,6 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   // time, otherwise, we should just decode directly.
   int last_block = (int)((bufferptr - buffer_start) % 64);
   if (last_block != 0 && srcend - src + last_block >= 64) {
-
     while ((bufferptr - buffer_start) % 64 != 0 && src < srcend) {
       uint8_t val = to_base64[uint8_t(*src)];
       *bufferptr = char(val);
@@ -28598,15 +41196,15 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   }
   return {SUCCESS, srclen, size_t(dst - dstinit)};
 }
-/* end file src/haswell/avx2_base64.cpp */
+/* end file src/westmere/sse_base64.cpp */
 
 } // unnamed namespace
-} // namespace haswell
+} // namespace westmere
 } // namespace simdutf
 
 /* begin file src/generic/buf_block_reader.h */
 namespace simdutf {
-namespace haswell {
+namespace westmere {
 namespace {
 
 // Walks through a buffer in block-sized increments, loading the last part with
@@ -28712,12 +41310,12 @@ simdutf_really_inline void buf_block_reader<STEP_SIZE>::advance() {
 }
 
 } // unnamed namespace
-} // namespace haswell
+} // namespace westmere
 } // namespace simdutf
 /* end file src/generic/buf_block_reader.h */
 /* begin file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
 namespace simdutf {
-namespace haswell {
+namespace westmere {
 namespace {
 namespace utf8_validation {
 
@@ -28937,12 +41535,12 @@ struct utf8_checker {
 using utf8_validation::utf8_checker;
 
 } // unnamed namespace
-} // namespace haswell
+} // namespace westmere
 } // namespace simdutf
 /* end file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
 /* begin file src/generic/utf8_validation/utf8_validator.h */
 namespace simdutf {
-namespace haswell {
+namespace westmere {
 namespace {
 namespace utf8_validation {
 
@@ -29077,14 +41675,14 @@ result generic_validate_ascii_with_errors(const char *input, size_t length) {
 
 } // namespace utf8_validation
 } // unnamed namespace
-} // namespace haswell
+} // namespace westmere
 } // namespace simdutf
 /* end file src/generic/utf8_validation/utf8_validator.h */
 // transcoding from UTF-8 to UTF-16
 /* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
 
 namespace simdutf {
-namespace haswell {
+namespace westmere {
 namespace {
 namespace utf8_to_utf16 {
 
@@ -29155,13 +41753,13 @@ simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
 
 } // namespace utf8_to_utf16
 } // unnamed namespace
-} // namespace haswell
+} // namespace westmere
 } // namespace simdutf
 /* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
 /* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
 
 namespace simdutf {
-namespace haswell {
+namespace westmere {
 namespace {
 namespace utf8_to_utf16 {
 using namespace simd;
@@ -29490,14 +42088,14 @@ struct validating_transcoder {
 }; // struct utf8_checker
 } // namespace utf8_to_utf16
 } // unnamed namespace
-} // namespace haswell
+} // namespace westmere
 } // namespace simdutf
 /* end file src/generic/utf8_to_utf16/utf8_to_utf16.h */
 // transcoding from UTF-8 to UTF-32
 /* begin file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
 
 namespace simdutf {
-namespace haswell {
+namespace westmere {
 namespace {
 namespace utf8_to_utf32 {
 
@@ -29536,13 +42134,13 @@ simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
 
 } // namespace utf8_to_utf32
 } // unnamed namespace
-} // namespace haswell
+} // namespace westmere
 } // namespace simdutf
 /* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
 /* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
 
 namespace simdutf {
-namespace haswell {
+namespace westmere {
 namespace {
 namespace utf8_to_utf32 {
 using namespace simd;
@@ -29857,14 +42455,14 @@ struct validating_transcoder {
 }; // struct utf8_checker
 } // namespace utf8_to_utf32
 } // unnamed namespace
-} // namespace haswell
+} // namespace westmere
 } // namespace simdutf
 /* end file src/generic/utf8_to_utf32/utf8_to_utf32.h */
 // other functions
 /* begin file src/generic/utf8.h */
 
 namespace simdutf {
-namespace haswell {
+namespace westmere {
 namespace {
 namespace utf8 {
 
@@ -29899,12 +42497,12 @@ simdutf_really_inline size_t utf16_length_from_utf8(const char *in,
 }
 } // namespace utf8
 } // unnamed namespace
-} // namespace haswell
+} // namespace westmere
 } // namespace simdutf
 /* end file src/generic/utf8.h */
 /* begin file src/generic/utf16.h */
 namespace simdutf {
-namespace haswell {
+namespace westmere {
 namespace {
 namespace utf16 {
 
@@ -29974,15 +42572,14 @@ change_endianness_utf16(const char16_t *in, size_t size, char16_t *output) {
 
 } // namespace utf16
 } // unnamed namespace
-} // namespace haswell
+} // namespace westmere
 } // namespace simdutf
 /* end file src/generic/utf16.h */
-
 // transcoding from UTF-8 to Latin 1
 /* begin file src/generic/utf8_to_latin1/utf8_to_latin1.h */
 
 namespace simdutf {
-namespace haswell {
+namespace westmere {
 namespace {
 namespace utf8_to_latin1 {
 using namespace simd;
@@ -30292,13 +42889,13 @@ struct validating_transcoder {
 }; // struct utf8_checker
 } // namespace utf8_to_latin1
 } // unnamed namespace
-} // namespace haswell
+} // namespace westmere
 } // namespace simdutf
 /* end file src/generic/utf8_to_latin1/utf8_to_latin1.h */
 /* begin file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
 
 namespace simdutf {
-namespace haswell {
+namespace westmere {
 namespace {
 namespace utf8_to_latin1 {
 using namespace simd;
@@ -30372,19 +42969,24 @@ simdutf_really_inline size_t convert_valid(const char *in, size_t size,
 
 } // namespace utf8_to_latin1
 } // namespace
-} // namespace haswell
+} // namespace westmere
 } // namespace simdutf
   // namespace simdutf
 /* end file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
 
+//
+// Implementation-specific overrides
+//
+
 namespace simdutf {
-namespace haswell {
+namespace westmere {
 
 simdutf_warn_unused int
 implementation::detect_encodings(const char *input,
                                  size_t length) const noexcept {
   // If there is a BOM, then we trust it.
   auto bom_encoding = simdutf::BOM::check_bom(input, length);
+  // todo: reimplement as a one-pass algorithm.
   if (bom_encoding != encoding_type::unspecified) {
     return bom_encoding;
   }
@@ -30408,22 +43010,23 @@ implementation::detect_encodings(const char *input,
 
 simdutf_warn_unused bool
 implementation::validate_utf8(const char *buf, size_t len) const noexcept {
-  return haswell::utf8_validation::generic_validate_utf8(buf, len);
+  return westmere::utf8_validation::generic_validate_utf8(buf, len);
 }
 
 simdutf_warn_unused result implementation::validate_utf8_with_errors(
     const char *buf, size_t len) const noexcept {
-  return haswell::utf8_validation::generic_validate_utf8_with_errors(buf, len);
+  return westmere::utf8_validation::generic_validate_utf8_with_errors(buf, len);
 }
 
 simdutf_warn_unused bool
 implementation::validate_ascii(const char *buf, size_t len) const noexcept {
-  return haswell::utf8_validation::generic_validate_ascii(buf, len);
+  return westmere::utf8_validation::generic_validate_ascii(buf, len);
 }
 
 simdutf_warn_unused result implementation::validate_ascii_with_errors(
     const char *buf, size_t len) const noexcept {
-  return haswell::utf8_validation::generic_validate_ascii_with_errors(buf, len);
+  return westmere::utf8_validation::generic_validate_ascii_with_errors(buf,
+                                                                       len);
 }
 
 simdutf_warn_unused bool
@@ -30434,7 +43037,7 @@ implementation::validate_utf16le(const char16_t *buf,
     // handling nullptr
     return true;
   }
-  const char16_t *tail = avx2_validate_utf16<endianness::LITTLE>(buf, len);
+  const char16_t *tail = sse_validate_utf16<endianness::LITTLE>(buf, len);
   if (tail) {
     return scalar::utf16::validate<endianness::LITTLE>(tail,
                                                        len - (tail - buf));
@@ -30451,7 +43054,7 @@ implementation::validate_utf16be(const char16_t *buf,
     // handling nullptr
     return true;
   }
-  const char16_t *tail = avx2_validate_utf16<endianness::BIG>(buf, len);
+  const char16_t *tail = sse_validate_utf16<endianness::BIG>(buf, len);
   if (tail) {
     return scalar::utf16::validate<endianness::BIG>(tail, len - (tail - buf));
   } else {
@@ -30461,7 +43064,7 @@ implementation::validate_utf16be(const char16_t *buf,
 
 simdutf_warn_unused result implementation::validate_utf16le_with_errors(
     const char16_t *buf, size_t len) const noexcept {
-  result res = avx2_validate_utf16_with_errors<endianness::LITTLE>(buf, len);
+  result res = sse_validate_utf16_with_errors<endianness::LITTLE>(buf, len);
   if (res.count != len) {
     result scalar_res = scalar::utf16::validate_with_errors<endianness::LITTLE>(
         buf + res.count, len - res.count);
@@ -30473,7 +43076,7 @@ simdutf_warn_unused result implementation::validate_utf16le_with_errors(
 
 simdutf_warn_unused result implementation::validate_utf16be_with_errors(
     const char16_t *buf, size_t len) const noexcept {
-  result res = avx2_validate_utf16_with_errors<endianness::BIG>(buf, len);
+  result res = sse_validate_utf16_with_errors<endianness::BIG>(buf, len);
   if (res.count != len) {
     result scalar_res = scalar::utf16::validate_with_errors<endianness::BIG>(
         buf + res.count, len - res.count);
@@ -30490,7 +43093,7 @@ implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
     // handling nullptr
     return true;
   }
-  const char32_t *tail = avx2_validate_utf32le(buf, len);
+  const char32_t *tail = sse_validate_utf32le(buf, len);
   if (tail) {
     return scalar::utf32::validate(tail, len - (tail - buf));
   } else {
@@ -30500,12 +43103,12 @@ implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
 
 simdutf_warn_unused result implementation::validate_utf32_with_errors(
     const char32_t *buf, size_t len) const noexcept {
-  if (simdutf_unlikely(len == 0)) {
+  if (len == 0) {
     // empty input is valid UTF-32. protect the implementation from
     // handling nullptr
     return result(error_code::SUCCESS, 0);
   }
-  result res = avx2_validate_utf32le_with_errors(buf, len);
+  result res = sse_validate_utf32le_with_errors(buf, len);
   if (res.count != len) {
     result scalar_res =
         scalar::utf32::validate_with_errors(buf + res.count, len - res.count);
@@ -30517,8 +43120,9 @@ simdutf_warn_unused result implementation::validate_utf32_with_errors(
 
 simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
     const char *buf, size_t len, char *utf8_output) const noexcept {
+
   std::pair<const char *, char *> ret =
-      avx2_convert_latin1_to_utf8(buf, len, utf8_output);
+      sse_convert_latin1_to_utf8(buf, len, utf8_output);
   size_t converted_chars = ret.second - utf8_output;
 
   if (ret.first != buf + len) {
@@ -30533,7 +43137,7 @@ simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
 simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
     const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   std::pair<const char *, char16_t *> ret =
-      avx2_convert_latin1_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+      sse_convert_latin1_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -30553,7 +43157,7 @@ simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
 simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
     const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   std::pair<const char *, char16_t *> ret =
-      avx2_convert_latin1_to_utf16<endianness::BIG>(buf, len, utf16_output);
+      sse_convert_latin1_to_utf16<endianness::BIG>(buf, len, utf16_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -30573,7 +43177,7 @@ simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
 simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
     const char *buf, size_t len, char32_t *utf32_output) const noexcept {
   std::pair<const char *, char32_t *> ret =
-      avx2_convert_latin1_to_utf32(buf, len, utf32_output);
+      sse_convert_latin1_to_utf32(buf, len, utf32_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -30602,8 +43206,8 @@ simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
 }
 
 simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
-    const char *input, size_t size, char *latin1_output) const noexcept {
-  return utf8_to_latin1::convert_valid(input, size, latin1_output);
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  return westmere::utf8_to_latin1::convert_valid(buf, len, latin1_output);
 }
 
 simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
@@ -30663,12 +43267,12 @@ simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
 simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
     const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   std::pair<const char16_t *, char *> ret =
-      haswell::avx2_convert_utf16_to_latin1<endianness::LITTLE>(buf, len,
-                                                                latin1_output);
+      sse_convert_utf16_to_latin1<endianness::LITTLE>(buf, len, latin1_output);
   if (ret.first == nullptr) {
     return 0;
   }
   size_t saved_bytes = ret.second - latin1_output;
+
   if (ret.first != buf + len) {
     const size_t scalar_saved_bytes =
         scalar::utf16_to_latin1::convert<endianness::LITTLE>(
@@ -30684,12 +43288,12 @@ simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
 simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
     const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   std::pair<const char16_t *, char *> ret =
-      haswell::avx2_convert_utf16_to_latin1<endianness::BIG>(buf, len,
-                                                             latin1_output);
+      sse_convert_utf16_to_latin1<endianness::BIG>(buf, len, latin1_output);
   if (ret.first == nullptr) {
     return 0;
   }
   size_t saved_bytes = ret.second - latin1_output;
+
   if (ret.first != buf + len) {
     const size_t scalar_saved_bytes =
         scalar::utf16_to_latin1::convert<endianness::BIG>(
@@ -30706,7 +43310,7 @@ simdutf_warn_unused result
 implementation::convert_utf16le_to_latin1_with_errors(
     const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   std::pair<result, char *> ret =
-      avx2_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
+      sse_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
           buf, len, latin1_output);
   if (ret.first.error) {
     return ret.first;
@@ -30733,8 +43337,8 @@ simdutf_warn_unused result
 implementation::convert_utf16be_to_latin1_with_errors(
     const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   std::pair<result, char *> ret =
-      avx2_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len,
-                                                                latin1_output);
+      sse_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len,
+                                                               latin1_output);
   if (ret.first.error) {
     return ret.first;
   } // Can return directly since scalar fallback already found correct
@@ -30758,21 +43362,20 @@ implementation::convert_utf16be_to_latin1_with_errors(
 
 simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
     const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  // optimization opportunity: implement a custom function
+  // optimization opportunity: we could provide an optimized function.
   return convert_utf16be_to_latin1(buf, len, latin1_output);
 }
 
 simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
     const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  // optimization opportunity: implement a custom function
+  // optimization opportunity: we could provide an optimized function.
   return convert_utf16le_to_latin1(buf, len, latin1_output);
 }
 
 simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
     const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   std::pair<const char16_t *, char *> ret =
-      haswell::avx2_convert_utf16_to_utf8<endianness::LITTLE>(buf, len,
-                                                              utf8_output);
+      sse_convert_utf16_to_utf8<endianness::LITTLE>(buf, len, utf8_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -30792,8 +43395,7 @@ simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
 simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
     const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   std::pair<const char16_t *, char *> ret =
-      haswell::avx2_convert_utf16_to_utf8<endianness::BIG>(buf, len,
-                                                           utf8_output);
+      sse_convert_utf16_to_utf8<endianness::BIG>(buf, len, utf8_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -30815,7 +43417,7 @@ simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char *> ret =
-      haswell::avx2_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(
+      westmere::sse_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(
           buf, len, utf8_output);
   if (ret.first.error) {
     return ret.first;
@@ -30843,7 +43445,7 @@ simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char *> ret =
-      haswell::avx2_convert_utf16_to_utf8_with_errors<endianness::BIG>(
+      westmere::sse_convert_utf16_to_utf8_with_errors<endianness::BIG>(
           buf, len, utf8_output);
   if (ret.first.error) {
     return ret.first;
@@ -30876,34 +43478,16 @@ simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
   return convert_utf16be_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
-    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
-  std::pair<const char32_t *, char *> ret =
-      avx2_convert_utf32_to_utf8(buf, len, utf8_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - utf8_output;
-  if (ret.first != buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_utf8::convert(
-        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
-
 simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
     const char32_t *buf, size_t len, char *latin1_output) const noexcept {
   std::pair<const char32_t *, char *> ret =
-      avx2_convert_utf32_to_latin1(buf, len, latin1_output);
+      sse_convert_utf32_to_latin1(buf, len, latin1_output);
   if (ret.first == nullptr) {
     return 0;
   }
   size_t saved_bytes = ret.second - latin1_output;
-  if (ret.first != buf + len) {
+  // Run the scalar fallback only when unprocessed input remains.
+  if (ret.first < buf + len) {
     const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert(
         ret.first, len - (ret.first - buf), ret.second);
     if (scalar_saved_bytes == 0) {
@@ -30919,7 +43503,8 @@ simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char *> ret =
-      avx2_convert_utf32_to_latin1_with_errors(buf, len, latin1_output);
+      westmere::sse_convert_utf32_to_latin1_with_errors(buf, len,
+                                                        latin1_output);
   if (ret.first.count != len) {
     result scalar_res = scalar::utf32_to_latin1::convert_with_errors(
         buf + ret.first.count, len - ret.first.count, ret.second);
@@ -30938,15 +43523,35 @@ simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
 
 simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
     const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  // optimization opportunity: we could provide an optimized function.
   return convert_utf32_to_latin1(buf, len, latin1_output);
 }
 
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      sse_convert_utf32_to_utf8(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf8_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes = scalar::utf32_to_utf8::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
 simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
     const char32_t *buf, size_t len, char *utf8_output) const noexcept {
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char *> ret =
-      haswell::avx2_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
+      westmere::sse_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
   if (ret.first.count != len) {
     result scalar_res = scalar::utf32_to_utf8::convert_with_errors(
         buf + ret.first.count, len - ret.first.count, ret.second);
@@ -30966,8 +43571,7 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
 simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
     const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
   std::pair<const char16_t *, char32_t *> ret =
-      haswell::avx2_convert_utf16_to_utf32<endianness::LITTLE>(buf, len,
-                                                               utf32_output);
+      sse_convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -30987,8 +43591,7 @@ simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
 simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
     const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
   std::pair<const char16_t *, char32_t *> ret =
-      haswell::avx2_convert_utf16_to_utf32<endianness::BIG>(buf, len,
-                                                            utf32_output);
+      sse_convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -31010,7 +43613,7 @@ simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char32_t *> ret =
-      haswell::avx2_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(
+      westmere::sse_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(
           buf, len, utf32_output);
   if (ret.first.error) {
     return ret.first;
@@ -31038,7 +43641,7 @@ simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char32_t *> ret =
-      haswell::avx2_convert_utf16_to_utf32_with_errors<endianness::BIG>(
+      westmere::sse_convert_utf16_to_utf32_with_errors<endianness::BIG>(
           buf, len, utf32_output);
   if (ret.first.error) {
     return ret.first;
@@ -31069,7 +43672,7 @@ simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
 simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
     const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
   std::pair<const char32_t *, char16_t *> ret =
-      avx2_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+      sse_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -31089,7 +43692,7 @@ simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
 simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
     const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
   std::pair<const char32_t *, char16_t *> ret =
-      avx2_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
+      sse_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -31111,7 +43714,7 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char16_t *> ret =
-      haswell::avx2_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(
+      westmere::sse_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(
           buf, len, utf16_output);
   if (ret.first.count != len) {
     result scalar_res =
@@ -31135,7 +43738,7 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char16_t *> ret =
-      haswell::avx2_convert_utf32_to_utf16_with_errors<endianness::BIG>(
+      westmere::sse_convert_utf32_to_utf16_with_errors<endianness::BIG>(
           buf, len, utf16_output);
   if (ret.first.count != len) {
     result scalar_res =
@@ -31220,26 +43823,11 @@ simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
   return utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
-    const char16_t *input, size_t length) const noexcept {
-  return utf16::utf32_length_from_utf16<endianness::LITTLE>(input, length);
-}
-
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
-    const char16_t *input, size_t length) const noexcept {
-  return utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
-}
-
 simdutf_warn_unused size_t
 implementation::utf16_length_from_latin1(size_t length) const noexcept {
   return scalar::latin1::utf16_length_from_latin1(length);
 }
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
-    const char *input, size_t length) const noexcept {
-  return utf8::utf16_length_from_utf8(input, length);
-}
-
 simdutf_warn_unused size_t
 implementation::utf32_length_from_latin1(size_t length) const noexcept {
   return scalar::latin1::utf32_length_from_latin1(length);
@@ -31247,91 +43835,110 @@ implementation::utf32_length_from_latin1(size_t length) const noexcept {
 
 simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
     const char *input, size_t len) const noexcept {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(input);
-  size_t answer = len / sizeof(__m256i) * sizeof(__m256i);
+  const uint8_t *str = reinterpret_cast<const uint8_t *>(input);
+  size_t answer = len / sizeof(__m128i) * sizeof(__m128i);
   size_t i = 0;
   if (answer >= 2048) { // long strings optimization
-    __m256i four_64bits = _mm256_setzero_si256();
-    while (i + sizeof(__m256i) <= len) {
-      __m256i runner = _mm256_setzero_si256();
-      // We can do up to 255 loops without overflow.
-      size_t iterations = (len - i) / sizeof(__m256i);
+    __m128i two_64bits = _mm_setzero_si128();
+    while (i + sizeof(__m128i) <= len) {
+      __m128i runner = _mm_setzero_si128();
+      size_t iterations = (len - i) / sizeof(__m128i);
       if (iterations > 255) {
         iterations = 255;
       }
-      size_t max_i = i + iterations * sizeof(__m256i) - sizeof(__m256i);
-      for (; i + 4 * sizeof(__m256i) <= max_i; i += 4 * sizeof(__m256i)) {
-        __m256i input1 = _mm256_loadu_si256((const __m256i *)(data + i));
-        __m256i input2 =
-            _mm256_loadu_si256((const __m256i *)(data + i + sizeof(__m256i)));
-        __m256i input3 = _mm256_loadu_si256(
-            (const __m256i *)(data + i + 2 * sizeof(__m256i)));
-        __m256i input4 = _mm256_loadu_si256(
-            (const __m256i *)(data + i + 3 * sizeof(__m256i)));
-        __m256i input12 =
-            _mm256_add_epi8(_mm256_cmpgt_epi8(_mm256_setzero_si256(), input1),
-                            _mm256_cmpgt_epi8(_mm256_setzero_si256(), input2));
-        __m256i input23 =
-            _mm256_add_epi8(_mm256_cmpgt_epi8(_mm256_setzero_si256(), input3),
-                            _mm256_cmpgt_epi8(_mm256_setzero_si256(), input4));
-        __m256i input1234 = _mm256_add_epi8(input12, input23);
-        runner = _mm256_sub_epi8(runner, input1234);
+      size_t max_i = i + iterations * sizeof(__m128i) - sizeof(__m128i);
+      for (; i + 4 * sizeof(__m128i) <= max_i; i += 4 * sizeof(__m128i)) {
+        __m128i input1 = _mm_loadu_si128((const __m128i *)(str + i));
+        __m128i input2 =
+            _mm_loadu_si128((const __m128i *)(str + i + sizeof(__m128i)));
+        __m128i input3 =
+            _mm_loadu_si128((const __m128i *)(str + i + 2 * sizeof(__m128i)));
+        __m128i input4 =
+            _mm_loadu_si128((const __m128i *)(str + i + 3 * sizeof(__m128i)));
+        __m128i input12 =
+            _mm_add_epi8(_mm_cmpgt_epi8(_mm_setzero_si128(), input1),
+                         _mm_cmpgt_epi8(_mm_setzero_si128(), input2));
+        __m128i input34 =
+            _mm_add_epi8(_mm_cmpgt_epi8(_mm_setzero_si128(), input3),
+                         _mm_cmpgt_epi8(_mm_setzero_si128(), input4));
+        __m128i input1234 = _mm_add_epi8(input12, input34);
+        runner = _mm_sub_epi8(runner, input1234);
       }
-      for (; i <= max_i; i += sizeof(__m256i)) {
-        __m256i input_256_chunk =
-            _mm256_loadu_si256((const __m256i *)(data + i));
-        runner = _mm256_sub_epi8(
-            runner, _mm256_cmpgt_epi8(_mm256_setzero_si256(), input_256_chunk));
+      for (; i <= max_i; i += sizeof(__m128i)) {
+        __m128i more_input = _mm_loadu_si128((const __m128i *)(str + i));
+        runner = _mm_sub_epi8(runner,
+                              _mm_cmpgt_epi8(_mm_setzero_si128(), more_input));
       }
-      four_64bits = _mm256_add_epi64(
-          four_64bits, _mm256_sad_epu8(runner, _mm256_setzero_si256()));
+      two_64bits =
+          _mm_add_epi64(two_64bits, _mm_sad_epu8(runner, _mm_setzero_si128()));
     }
-    answer += _mm256_extract_epi64(four_64bits, 0) +
-              _mm256_extract_epi64(four_64bits, 1) +
-              _mm256_extract_epi64(four_64bits, 2) +
-              _mm256_extract_epi64(four_64bits, 3);
-  } else if (answer > 0) {
-    for (; i + sizeof(__m256i) <= len; i += sizeof(__m256i)) {
-      __m256i latin = _mm256_loadu_si256((const __m256i *)(data + i));
-      uint32_t non_ascii = _mm256_movemask_epi8(latin);
+    answer +=
+        _mm_extract_epi64(two_64bits, 0) + _mm_extract_epi64(two_64bits, 1);
+  } else if (answer > 0) { // short string optimization
+    for (; i + 2 * sizeof(__m128i) <= len; i += 2 * sizeof(__m128i)) {
+      __m128i latin = _mm_loadu_si128((const __m128i *)(input + i));
+      uint16_t non_ascii = (uint16_t)_mm_movemask_epi8(latin);
+      answer += count_ones(non_ascii);
+      latin = _mm_loadu_si128((const __m128i *)(input + i) + 1);
+      non_ascii = (uint16_t)_mm_movemask_epi8(latin);
+      answer += count_ones(non_ascii);
+    }
+    for (; i + sizeof(__m128i) <= len; i += sizeof(__m128i)) {
+      __m128i latin = _mm_loadu_si128((const __m128i *)(input + i));
+      uint16_t non_ascii = (uint16_t)_mm_movemask_epi8(latin);
       answer += count_ones(non_ascii);
     }
   }
   return answer + scalar::latin1::utf8_length_from_latin1(
-                      reinterpret_cast<const char *>(data + i), len - i);
+                      reinterpret_cast<const char *>(str + i), len - i);
+}
+
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf32_length_from_utf16<endianness::LITTLE>(input, length);
+}
+
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
+}
+
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  return utf8::utf16_length_from_utf8(input, length);
 }
 
 simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
     const char32_t *input, size_t length) const noexcept {
-  const __m256i v_00000000 = _mm256_setzero_si256();
-  const __m256i v_ffffff80 = _mm256_set1_epi32((uint32_t)0xffffff80);
-  const __m256i v_fffff800 = _mm256_set1_epi32((uint32_t)0xfffff800);
-  const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
+  const __m128i v_00000000 = _mm_setzero_si128();
+  const __m128i v_ffffff80 = _mm_set1_epi32((uint32_t)0xffffff80);
+  const __m128i v_fffff800 = _mm_set1_epi32((uint32_t)0xfffff800);
+  const __m128i v_ffff0000 = _mm_set1_epi32((uint32_t)0xffff0000);
   size_t pos = 0;
   size_t count = 0;
-  for (; pos + 8 <= length; pos += 8) {
-    __m256i in = _mm256_loadu_si256((__m256i *)(input + pos));
-    const __m256i ascii_bytes_bytemask =
-        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffffff80), v_00000000);
-    const __m256i one_two_bytes_bytemask =
-        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_fffff800), v_00000000);
-    const __m256i two_bytes_bytemask =
-        _mm256_xor_si256(one_two_bytes_bytemask, ascii_bytes_bytemask);
-    const __m256i one_two_three_bytes_bytemask =
-        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
-    const __m256i three_bytes_bytemask =
-        _mm256_xor_si256(one_two_three_bytes_bytemask, one_two_bytes_bytemask);
-    const uint32_t ascii_bytes_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(ascii_bytes_bytemask));
-    const uint32_t two_bytes_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(two_bytes_bytemask));
-    const uint32_t three_bytes_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(three_bytes_bytemask));
+  for (; pos + 4 <= length; pos += 4) {
+    __m128i in = _mm_loadu_si128((__m128i *)(input + pos));
+    const __m128i ascii_bytes_bytemask =
+        _mm_cmpeq_epi32(_mm_and_si128(in, v_ffffff80), v_00000000);
+    const __m128i one_two_bytes_bytemask =
+        _mm_cmpeq_epi32(_mm_and_si128(in, v_fffff800), v_00000000);
+    const __m128i two_bytes_bytemask =
+        _mm_xor_si128(one_two_bytes_bytemask, ascii_bytes_bytemask);
+    const __m128i one_two_three_bytes_bytemask =
+        _mm_cmpeq_epi32(_mm_and_si128(in, v_ffff0000), v_00000000);
+    const __m128i three_bytes_bytemask =
+        _mm_xor_si128(one_two_three_bytes_bytemask, one_two_bytes_bytemask);
+    const uint16_t ascii_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(ascii_bytes_bytemask));
+    const uint16_t two_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(two_bytes_bytemask));
+    const uint16_t three_bytes_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(three_bytes_bytemask));
 
     size_t ascii_count = count_ones(ascii_bytes_bitmask) / 4;
     size_t two_bytes_count = count_ones(two_bytes_bitmask) / 4;
     size_t three_bytes_count = count_ones(three_bytes_bitmask) / 4;
-    count += 32 - 3 * ascii_count - 2 * two_bytes_count - three_bytes_count;
+    count += 16 - 3 * ascii_count - 2 * two_bytes_count - three_bytes_count;
   }
   return count +
          scalar::utf32::utf8_length_from_utf32(input + pos, length - pos);
@@ -31339,18 +43946,18 @@ simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
 
 simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
     const char32_t *input, size_t length) const noexcept {
-  const __m256i v_00000000 = _mm256_setzero_si256();
-  const __m256i v_ffff0000 = _mm256_set1_epi32((uint32_t)0xffff0000);
+  const __m128i v_00000000 = _mm_setzero_si128();
+  const __m128i v_ffff0000 = _mm_set1_epi32((uint32_t)0xffff0000);
   size_t pos = 0;
   size_t count = 0;
-  for (; pos + 8 <= length; pos += 8) {
-    __m256i in = _mm256_loadu_si256((__m256i *)(input + pos));
-    const __m256i surrogate_bytemask =
-        _mm256_cmpeq_epi32(_mm256_and_si256(in, v_ffff0000), v_00000000);
-    const uint32_t surrogate_bitmask =
-        static_cast<uint32_t>(_mm256_movemask_epi8(surrogate_bytemask));
-    size_t surrogate_count = (32 - count_ones(surrogate_bitmask)) / 4;
-    count += 8 + surrogate_count;
+  for (; pos + 4 <= length; pos += 4) {
+    __m128i in = _mm_loadu_si128((__m128i *)(input + pos));
+    const __m128i surrogate_bytemask =
+        _mm_cmpeq_epi32(_mm_and_si128(in, v_ffff0000), v_00000000);
+    const uint16_t surrogate_bitmask =
+        static_cast<uint16_t>(_mm_movemask_epi8(surrogate_bytemask));
+    size_t surrogate_count = (16 - count_ones(surrogate_bitmask)) / 4;
+    count += 4 + surrogate_count;
   }
   return count +
          scalar::utf32::utf16_length_from_utf32(input + pos, length - pos);
@@ -31386,3612 +43993,5867 @@ simdutf_warn_unused full_result implementation::base64_to_binary_details(
                                              last_chunk_options);
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::base64::maximal_binary_length_from_base64(input, length);
+simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
+    const char16_t *input, size_t length) const noexcept {
+  return scalar::base64::maximal_binary_length_from_base64(input, length);
+}
+
+simdutf_warn_unused result implementation::base64_to_binary(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
+simdutf_warn_unused full_result implementation::base64_to_binary_details(
+    const char16_t *input, size_t length, char *output, base64_options options,
+    last_chunk_handling_options last_chunk_options) const noexcept {
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
+}
+
+simdutf_warn_unused size_t implementation::base64_length_from_binary(
+    size_t length, base64_options options) const noexcept {
+  return scalar::base64::base64_length_from_binary(length, options);
+}
+
+size_t implementation::binary_to_base64(const char *input, size_t length,
+                                        char *output,
+                                        base64_options options) const noexcept {
+  if (options & base64_url) {
+    return encode_base64<true>(output, input, length, options);
+  } else {
+    return encode_base64<false>(output, input, length, options);
+  }
+}
+} // namespace westmere
+} // namespace simdutf
+
+/* begin file src/simdutf/westmere/end.h */
+#if SIMDUTF_CAN_ALWAYS_RUN_WESTMERE
+// nothing needed.
+#else
+SIMDUTF_UNTARGET_REGION
+#endif
+
+/* end file src/simdutf/westmere/end.h */
+/* end file src/westmere/implementation.cpp */
+#endif
+#if SIMDUTF_IMPLEMENTATION_LSX
+/* begin file src/lsx/implementation.cpp */
+/* begin file src/simdutf/lsx/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "lsx"
+// #define SIMDUTF_IMPLEMENTATION lsx
+/* end file src/simdutf/lsx/begin.h */
+namespace simdutf {
+namespace lsx {
+namespace {
+#ifndef SIMDUTF_LSX_H
+  #error "lsx.h must be included"
+#endif
+using namespace simd;
+
+// convert vmskltz/vmskgez/vmsknz to
+// simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes index
+const uint8_t lsx_1_2_utf8_bytes_mask[] = {
+    0,   1,   4,   5,   16,  17,  20,  21,  64,  65,  68,  69,  80,  81,  84,
+    85,  2,   3,   6,   7,   18,  19,  22,  23,  66,  67,  70,  71,  82,  83,
+    86,  87,  8,   9,   12,  13,  24,  25,  28,  29,  72,  73,  76,  77,  88,
+    89,  92,  93,  10,  11,  14,  15,  26,  27,  30,  31,  74,  75,  78,  79,
+    90,  91,  94,  95,  32,  33,  36,  37,  48,  49,  52,  53,  96,  97,  100,
+    101, 112, 113, 116, 117, 34,  35,  38,  39,  50,  51,  54,  55,  98,  99,
+    102, 103, 114, 115, 118, 119, 40,  41,  44,  45,  56,  57,  60,  61,  104,
+    105, 108, 109, 120, 121, 124, 125, 42,  43,  46,  47,  58,  59,  62,  63,
+    106, 107, 110, 111, 122, 123, 126, 127, 128, 129, 132, 133, 144, 145, 148,
+    149, 192, 193, 196, 197, 208, 209, 212, 213, 130, 131, 134, 135, 146, 147,
+    150, 151, 194, 195, 198, 199, 210, 211, 214, 215, 136, 137, 140, 141, 152,
+    153, 156, 157, 200, 201, 204, 205, 216, 217, 220, 221, 138, 139, 142, 143,
+    154, 155, 158, 159, 202, 203, 206, 207, 218, 219, 222, 223, 160, 161, 164,
+    165, 176, 177, 180, 181, 224, 225, 228, 229, 240, 241, 244, 245, 162, 163,
+    166, 167, 178, 179, 182, 183, 226, 227, 230, 231, 242, 243, 246, 247, 168,
+    169, 172, 173, 184, 185, 188, 189, 232, 233, 236, 237, 248, 249, 252, 253,
+    170, 171, 174, 175, 186, 187, 190, 191, 234, 235, 238, 239, 250, 251, 254,
+    255};
+
+simdutf_really_inline __m128i lsx_swap_bytes(__m128i vec) {
+  // const v16u8 shuf = {1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14};
+  // return __lsx_vshuf_b(__lsx_vldi(0), vec, shuf);
+  return __lsx_vshuf4i_b(vec, 0b10110001);
+  // return __lsx_vor_v(__lsx_vslli_h(vec, 8), __lsx_vsrli_h(vec, 8));
+}
+
+simdutf_really_inline bool is_ascii(const simd8x64<uint8_t> &input) {
+  return input.is_ascii();
+}
+
+simdutf_unused simdutf_really_inline simd8<bool>
+must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2,
+                     const simd8<uint8_t> prev3) {
+  simd8<bool> is_second_byte = prev1 >= uint8_t(0b11000000u);
+  simd8<bool> is_third_byte = prev2 >= uint8_t(0b11100000u);
+  simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
+  // Use ^ instead of | for is_*_byte, because ^ is commutative, and the caller
+  // is using ^ as well. This will work fine because we only have to report
+  // errors for cases with 0-1 lead bytes. Multiple lead bytes implies 2
+  // overlapping multibyte characters, and if that happens, there is guaranteed
+  // to be at least *one* lead byte that is part of only 1 other multibyte
+  // character. The error will be detected there.
+  return is_second_byte ^ is_third_byte ^ is_fourth_byte;
+}
+
+simdutf_really_inline simd8<bool>
+must_be_2_3_continuation(const simd8<uint8_t> prev2,
+                         const simd8<uint8_t> prev3) {
+  simd8<bool> is_third_byte = prev2 >= uint8_t(0b11100000u);
+  simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
+  return is_third_byte ^ is_fourth_byte;
+}
+
+// common functions for utf8 conversions
+simdutf_really_inline __m128i convert_utf8_3_byte_to_utf16(__m128i in) {
+  // Low half contains  10bbbbbb|10cccccc
+  // High half contains 1110aaaa|1110aaaa
+  const v16u8 sh = {2, 1, 5, 4, 8, 7, 11, 10, 0, 0, 3, 3, 6, 6, 9, 9};
+  const v8u16 v0fff = {0xfff, 0xfff, 0xfff, 0xfff, 0xfff, 0xfff, 0xfff, 0xfff};
+
+  __m128i perm = __lsx_vshuf_b(__lsx_vldi(0), in, (__m128i)sh);
+  // 1110aaaa => aaaa0000
+  __m128i perm_high = __lsx_vslli_b(__lsx_vbsrl_v(perm, 8), 4);
+  // 10bbbbbb 10cccccc => 0010bbbb bbcccccc
+  __m128i composed = __lsx_vbitsel_v(__lsx_vsrli_h(perm, 2), /* perm >> 2*/
+                                     perm, __lsx_vrepli_h(0x3f) /* 0x003f */);
+  // 0010bbbb bbcccccc => aaaabbbb bbcccccc
+  composed = __lsx_vbitsel_v(perm_high, composed, (__m128i)v0fff);
+
+  return composed;
+}
+
+simdutf_really_inline __m128i convert_utf8_2_byte_to_utf16(__m128i in) {
+  // 10bbbbbb 110aaaaa => 00bbbbbb 000aaaaa
+  __m128i composed = __lsx_vand_v(in, __lsx_vldi(0x3f));
+  // 00bbbbbb 000aaaaa => 00000aaa aabbbbbb
+  composed = __lsx_vbitsel_v(
+      __lsx_vsrli_h(__lsx_vslli_h(composed, 8), 2), /* (aaaaa << 8) >> 2 */
+      __lsx_vsrli_h(composed, 8),                   /* bbbbbb >> 8 */
+      __lsx_vrepli_h(0x3f));                        /* 0x003f */
+  return composed;
+}
+
+simdutf_really_inline __m128i
+convert_utf8_1_to_2_byte_to_utf16(__m128i in, size_t shufutf8_idx) {
+  // Converts 6 1-2 byte UTF-8 characters to 6 UTF-16 characters.
+  // This is a relatively easy scenario:
+  // we process SIX (6) input code units. The max length in bytes of six
+  // code units spanning between 1 and 2 bytes each is 12 bytes.
+  __m128i sh =
+      __lsx_vld(reinterpret_cast<const uint8_t *>(
+                    simdutf::tables::utf8_to_utf16::shufutf8[shufutf8_idx]),
+                0);
+  // Shuffle
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 110aaaaa 10bbbbbb
+  __m128i perm = __lsx_vshuf_b(__lsx_vldi(0), in, sh);
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 00000000 00bbbbbb
+  __m128i ascii = __lsx_vand_v(perm, __lsx_vrepli_h(0x7f)); // 6 or 7 bits
+  // 1 byte: 00000000 00000000
+  // 2 byte: 00000aaa aa000000
+  const __m128i v1f00 = __lsx_vldi(-2785); // -2785(13bit) => 151f
+  __m128i composed = __lsx_vsrli_h(__lsx_vand_v(perm, v1f00), 2); // 5 bits
+  // Combine with a shift right accumulate
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 00000aaa aabbbbbb
+  composed = __lsx_vadd_h(ascii, composed);
+  return composed;
+}
+
+/* begin file src/lsx/lsx_validate_utf16.cpp */
+/*
+    In UTF-16 code units in range 0xD800 to 0xDFFF have special meaning.
+
+    In a vectorized algorithm we want to examine the most significant
+    nibble in order to select a fast path. If none of the highest nibbles
+    are 0xD (13), then we are sure that the UTF-16 chunk in a vector
+    register is valid.
+
+    Let us analyze what we need to check if the nibble is 0xD. The
+    value of the preceding nibble determines what we have:
+
+    0xd000 .. 0xd7ff - a valid word
+    0xd800 .. 0xdbff - low surrogate (Unicode: high/leading surrogate)
+    0xdc00 .. 0xdfff - high surrogate (Unicode: low/trailing surrogate)
+
+    Other constraints we have to consider:
+    - there must not be two consecutive low surrogates (0xd800 .. 0xdbff)
+    - there must not be two consecutive high surrogates (0xdc00 .. 0xdfff)
+    - there must not be sole low surrogate nor high surrogate
+
+    We're going to build three bitmasks based on the 3rd nibble:
+    - V = valid word,
+    - L = low surrogate (0xd800 .. 0xdbff)
+    - H = high surrogate (0xdc00 .. 0xdfff)
+
+      0   1   2   3   4   5   6   7    <--- word index
+    [ V | L | H | L | H | V | V | L ]
+      1   0   0   0   0   1   1   0     - V = valid mask
+      0   1   0   1   0   0   0   1     - L = low surrogate
+      0   0   1   0   1   0   0   0     - H = high surrogate
+
+
+      1   0   0   0   0   1   1   0   V = valid masks
+      0   1   0   1   0   0   0   0   a = L & (H >> 1)
+      0   0   1   0   1   0   0   0   b = a << 1
+      1   1   1   1   1   1   1   0   c = V | a | b
+                                  ^
+                                  the last bit can be zero; we then consume 7
+   code units and recheck this word in the next iteration.
+*/
+
+/* Returns:
+   - pointer to the first unprocessed code unit (a scalar fallback should check
+   the rest);
+   - nullptr if an error was detected.
+*/
+template <endianness big_endian>
+const char16_t *lsx_validate_utf16(const char16_t *input, size_t size) {
+  const char16_t *end = input + size;
+
+  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
+  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
+  const auto v_fc = simd8<uint8_t>::splat(0xfc);
+  const auto v_dc = simd8<uint8_t>::splat(0xdc);
+
+  while (input + simd16<uint16_t>::SIZE * 2 < end) {
+    // 0. Load data: since the validation takes into account only the higher
+    //    byte of each word, we compress the two vectors into one which
+    //    consists only of the higher bytes.
+    auto in0 = simd16<uint16_t>(input);
+    auto in1 =
+        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
+    if (big_endian) {
+      in0 = in0.swap_bytes();
+      in1 = in1.swap_bytes();
+    }
+    const auto in = simd8<uint8_t>(__lsx_vssrlni_bu_h(in1.value, in0.value, 8));
+
+    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
+    const auto surrogates_wordmask = (in & v_f8) == v_d8;
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(surrogates_wordmask.to_bitmask());
+    if (surrogates_bitmask == 0x0000) {
+      input += 16;
+    } else {
+      // 2. We have some surrogates that have to be distinguished:
+      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
+      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
+      //
+      //    Fact: high surrogate has 11th bit set (3rd bit in the higher byte)
+
+      // V - non-surrogate code units
+      //     V = not surrogates_wordmask
+      const uint16_t V = static_cast<uint16_t>(~surrogates_bitmask);
+
+      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
+      const auto vH = (in & v_fc) == v_dc;
+      const uint16_t H = static_cast<uint16_t>(vH.to_bitmask());
+
+      // L - word mask for low surrogates
+      //     L = not H and surrogates_wordmask
+      const uint16_t L = static_cast<uint16_t>(~H & surrogates_bitmask);
+
+      const uint16_t a = static_cast<uint16_t>(
+          L & (H >> 1)); // A low surrogate must be followed by a high one.
+                         // (A low surrogate placed in the register's last
+                         // word is an exception we handle.)
+      const uint16_t b = static_cast<uint16_t>(
+          a << 1); // Just mark that the opposite fact holds;
+                   // thanks to that we have only two masks for the valid case.
+      const uint16_t c = static_cast<uint16_t>(
+          V | a | b); // Combine all the masks into the final one.
+
+      if (c == 0xffff) {
+        // The whole input register contains valid UTF-16, i.e.,
+        // either single code units or proper surrogate pairs.
+        input += 16;
+      } else if (c == 0x7fff) {
+        // The 15 lower code units of the input register contain valid UTF-16.
+        // The 15th word may be either a low or high surrogate. In the next
+        // iteration we 1) check if the low surrogate is followed by a high
+        // one, 2) reject sole high surrogate.
+        input += 15;
+      } else {
+        return nullptr;
+      }
+    }
+  }
+
+  return input;
+}
+
+template <endianness big_endian>
+const result lsx_validate_utf16_with_errors(const char16_t *input,
+                                            size_t size) {
+  const char16_t *start = input;
+  const char16_t *end = input + size;
+
+  const auto v_d8 = simd8<uint8_t>::splat(0xd8);
+  const auto v_f8 = simd8<uint8_t>::splat(0xf8);
+  const auto v_fc = simd8<uint8_t>::splat(0xfc);
+  const auto v_dc = simd8<uint8_t>::splat(0xdc);
+
+  while (input + simd16<uint16_t>::SIZE * 2 < end) {
+    // 0. Load data: since the validation takes into account only the higher
+    //    byte of each word, we compress the two vectors into one which
+    //    consists only of the higher bytes.
+    auto in0 = simd16<uint16_t>(input);
+    auto in1 =
+        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
+
+    if (big_endian) {
+      in0 = in0.swap_bytes();
+      in1 = in1.swap_bytes();
+    }
+
+    const auto in = simd8<uint8_t>(__lsx_vssrlni_bu_h(in1.value, in0.value, 8));
+
+    // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
+    const auto surrogates_wordmask = (in & v_f8) == v_d8;
+    const uint16_t surrogates_bitmask =
+        static_cast<uint16_t>(surrogates_wordmask.to_bitmask());
+    if (surrogates_bitmask == 0x0000) {
+      input += 16;
+    } else {
+      // 2. We have some surrogates that have to be distinguished:
+      //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
+      //    - high surrogates: 0b1101'11xx'yyyy'yyyy (0xDC00..0xDFFF)
+      //
+      //    Fact: high surrogate has 11th bit set (3rd bit in the higher byte)
+
+      // V - non-surrogate code units
+      //     V = not surrogates_wordmask
+      const uint16_t V = static_cast<uint16_t>(~surrogates_bitmask);
+
+      // H - word-mask for high surrogates: the six highest bits are 0b1101'11
+      const auto vH = (in & v_fc) == v_dc;
+      const uint16_t H = static_cast<uint16_t>(vH.to_bitmask());
+
+      // L - word mask for low surrogates
+      //     L = not H and surrogates_wordmask
+      const uint16_t L = static_cast<uint16_t>(~H & surrogates_bitmask);
+
+      const uint16_t a = static_cast<uint16_t>(
+          L & (H >> 1)); // A low surrogate must be followed by a high one.
+                         // (A low surrogate placed in the register's last
+                         // word is an exception we handle.)
+      const uint16_t b = static_cast<uint16_t>(
+          a << 1); // Just mark that the opposite fact holds;
+                   // thanks to that we have only two masks for the valid case.
+      const uint16_t c = static_cast<uint16_t>(
+          V | a | b); // Combine all the masks into the final one.
+
+      if (c == 0xffff) {
+        // The whole input register contains valid UTF-16, i.e.,
+        // either single code units or proper surrogate pairs.
+        input += 16;
+      } else if (c == 0x7fff) {
+        // The 15 lower code units of the input register contain valid UTF-16.
+        // The 15th word may be either a low or high surrogate. In the next
+        // iteration we 1) check if the low surrogate is followed by a high
+        // one, 2) reject sole high surrogate.
+        input += 15;
+      } else {
+        return result(error_code::SURROGATE, input - start);
+      }
+    }
+  }
+
+  return result(error_code::SUCCESS, input - start);
+}
+/* end file src/lsx/lsx_validate_utf16.cpp */
+/* begin file src/lsx/lsx_validate_utf32le.cpp */
+
+const char32_t *lsx_validate_utf32le(const char32_t *input, size_t size) {
+  const char32_t *end = input + size;
+
+  __m128i offset = __lsx_vreplgr2vr_w(uint32_t(0xffff2000));
+  __m128i standardoffsetmax = __lsx_vreplgr2vr_w(uint32_t(0xfffff7ff));
+  __m128i standardmax = __lsx_vldi(-2288); /*0x10ffff*/
+  __m128i currentmax = __lsx_vldi(0x0);
+  __m128i currentoffsetmax = __lsx_vldi(0x0);
+
+  while (input + 4 < end) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint32_t *>(input), 0);
+    currentmax = __lsx_vmax_wu(in, currentmax);
+    // 0xD8__ + 0x2000 = 0xF8__ => 0xF8__ > 0xF7FF
+    currentoffsetmax =
+        __lsx_vmax_wu(__lsx_vadd_w(in, offset), currentoffsetmax);
+
+    input += 4;
+  }
+
+  __m128i is_zero =
+      __lsx_vxor_v(__lsx_vmax_wu(currentmax, standardmax), standardmax);
+  if (__lsx_bnz_v(is_zero)) {
+    return nullptr;
+  }
+
+  is_zero = __lsx_vxor_v(__lsx_vmax_wu(currentoffsetmax, standardoffsetmax),
+                         standardoffsetmax);
+  if (__lsx_bnz_v(is_zero)) {
+    return nullptr;
+  }
+
+  return input;
+}
+
+const result lsx_validate_utf32le_with_errors(const char32_t *input,
+                                              size_t size) {
+  const char32_t *start = input;
+  const char32_t *end = input + size;
+
+  __m128i offset = __lsx_vreplgr2vr_w(uint32_t(0xffff2000));
+  __m128i standardoffsetmax = __lsx_vreplgr2vr_w(uint32_t(0xfffff7ff));
+  __m128i standardmax = __lsx_vldi(-2288); /*0x10ffff*/
+  __m128i currentmax = __lsx_vldi(0x0);
+  __m128i currentoffsetmax = __lsx_vldi(0x0);
+
+  while (input + 4 < end) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint32_t *>(input), 0);
+    currentmax = __lsx_vmax_wu(in, currentmax);
+    currentoffsetmax =
+        __lsx_vmax_wu(__lsx_vadd_w(in, offset), currentoffsetmax);
+
+    __m128i is_zero =
+        __lsx_vxor_v(__lsx_vmax_wu(currentmax, standardmax), standardmax);
+    if (__lsx_bnz_v(is_zero)) {
+      return result(error_code::TOO_LARGE, input - start);
+    }
+
+    is_zero = __lsx_vxor_v(__lsx_vmax_wu(currentoffsetmax, standardoffsetmax),
+                           standardoffsetmax);
+    if (__lsx_bnz_v(is_zero)) {
+      return result(error_code::SURROGATE, input - start);
+    }
+
+    input += 4;
+  }
+
+  return result(error_code::SUCCESS, input - start);
+}
+/* end file src/lsx/lsx_validate_utf32le.cpp */
+
+/* begin file src/lsx/lsx_convert_latin1_to_utf8.cpp */
+/*
+  Returns a pair: the first unprocessed byte from buf and utf8_output.
+  A scalar routine should carry on the conversion of the tail.
+*/
+
+std::pair<const char *, char *>
+lsx_convert_latin1_to_utf8(const char *latin1_input, size_t len,
+                           char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char *end = latin1_input + len;
+
+  __m128i zero = __lsx_vldi(0);
+  // We always write 16 bytes, of which more than the first 8 bytes
+  // are valid. A safety margin of 8 is more than sufficient.
+  while (latin1_input + 16 <= end) {
+    __m128i in8 = __lsx_vld(reinterpret_cast<const uint8_t *>(latin1_input), 0);
+    uint32_t ascii = __lsx_vpickve2gr_hu(__lsx_vmskgez_b(in8), 0);
+    if (ascii == 0xffff) { // ASCII fast path!!!!
+      __lsx_vst(in8, utf8_output, 0);
+      utf8_output += 16;
+      latin1_input += 16;
+      continue;
+    }
+    // We just fallback on UTF-16 code. This could be optimized/simplified
+    // further.
+    __m128i in16 = __lsx_vilvl_b(zero, in8);
+    // 1. prepare 2-byte values
+    // input 8-bit word : [aabb|bbbb] x 8
+    // expected output   : [1100|00aa|10bb|bbbb] x 8
+    // t0 = [0000|00aa|bbbb|bb00]
+    __m128i t0 = __lsx_vslli_h(in16, 2);
+    // t1 = [0000|00aa|0000|0000]
+    __m128i t1 = __lsx_vand_v(t0, __lsx_vldi(-2785));
+    // t2 = [0000|00aa|00bb|bbbb]
+    __m128i t2 = __lsx_vbitsel_v(t1, in16, __lsx_vrepli_h(0x3f));
+    // t3 = [1100|00aa|10bb|bbbb]
+    __m128i t3 = __lsx_vor_v(t2, __lsx_vreplgr2vr_h(uint16_t(0xc080)));
+    // merge ASCII and 2-byte codewords
+    __m128i one_byte_bytemask = __lsx_vsle_hu(in16, __lsx_vrepli_h(0x7F));
+    __m128i utf8_unpacked = __lsx_vbitsel_v(t3, in16, one_byte_bytemask);
+
+    const uint8_t *row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                             [lsx_1_2_utf8_bytes_mask[(ascii & 0xff)]][0];
+    __m128i shuffle = __lsx_vld(row + 1, 0);
+    __m128i utf8_packed = __lsx_vshuf_b(zero, utf8_unpacked, shuffle);
+
+    // store bytes
+    __lsx_vst(utf8_packed, utf8_output, 0);
+    // adjust pointers
+    latin1_input += 8;
+    utf8_output += row[0];
+
+  } // while
+
+  return std::make_pair(latin1_input, reinterpret_cast<char *>(utf8_output));
+}
+/* end file src/lsx/lsx_convert_latin1_to_utf8.cpp */
+/* begin file src/lsx/lsx_convert_latin1_to_utf16.cpp */
+std::pair<const char *, char16_t *>
+lsx_convert_latin1_to_utf16le(const char *buf, size_t len,
+                              char16_t *utf16_output) {
+  const char *end = buf + len;
+
+  __m128i zero = __lsx_vldi(0);
+  while (buf + 16 <= end) {
+    __m128i in8 = __lsx_vld(reinterpret_cast<const uint8_t *>(buf), 0);
+
+    __m128i inlow = __lsx_vilvl_b(zero, in8);
+    __m128i inhigh = __lsx_vilvh_b(zero, in8);
+    __lsx_vst(inlow, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    __lsx_vst(inhigh, reinterpret_cast<uint16_t *>(utf16_output), 16);
+
+    utf16_output += 16;
+    buf += 16;
+  }
+
+  return std::make_pair(buf, utf16_output);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(
-    const char16_t *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_options) const noexcept {
-  return (options & base64_url)
-             ? compress_decode_base64<true>(output, input, length, options,
-                                            last_chunk_options)
-             : compress_decode_base64<false>(output, input, length, options,
-                                             last_chunk_options);
-}
+std::pair<const char *, char16_t *>
+lsx_convert_latin1_to_utf16be(const char *buf, size_t len,
+                              char16_t *utf16_output) {
+  const char *end = buf + len;
+  __m128i zero = __lsx_vldi(0);
+  while (buf + 16 <= end) {
+    __m128i in8 = __lsx_vld(reinterpret_cast<const uint8_t *>(buf), 0);
+
+    __m128i inlow = __lsx_vilvl_b(in8, zero);
+    __m128i inhigh = __lsx_vilvh_b(in8, zero);
+    __lsx_vst(inlow, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    __lsx_vst(inhigh, reinterpret_cast<uint16_t *>(utf16_output), 16);
+    utf16_output += 16;
+    buf += 16;
+  }
 
-simdutf_warn_unused full_result implementation::base64_to_binary_details(
-    const char16_t *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_options) const noexcept {
-  return (options & base64_url)
-             ? compress_decode_base64<true>(output, input, length, options,
-                                            last_chunk_options)
-             : compress_decode_base64<false>(output, input, length, options,
-                                             last_chunk_options);
+  return std::make_pair(buf, utf16_output);
 }
+/* end file src/lsx/lsx_convert_latin1_to_utf16.cpp */
+/* begin file src/lsx/lsx_convert_latin1_to_utf32.cpp */
+std::pair<const char *, char32_t *>
+lsx_convert_latin1_to_utf32(const char *buf, size_t len,
+                            char32_t *utf32_output) {
+  const char *end = buf + len;
 
-simdutf_warn_unused size_t implementation::base64_length_from_binary(
-    size_t length, base64_options options) const noexcept {
-  return scalar::base64::base64_length_from_binary(length, options);
-}
+  while (buf + 16 <= end) {
+    __m128i in8 = __lsx_vld(reinterpret_cast<const uint8_t *>(buf), 0);
 
-size_t implementation::binary_to_base64(const char *input, size_t length,
-                                        char *output,
-                                        base64_options options) const noexcept {
-  if (options & base64_url) {
-    return encode_base64<true>(output, input, length, options);
-  } else {
-    return encode_base64<false>(output, input, length, options);
+    __m128i zero = __lsx_vldi(0);
+    __m128i in16low = __lsx_vilvl_b(zero, in8);
+    __m128i in16high = __lsx_vilvh_b(zero, in8);
+    __m128i in32_0 = __lsx_vilvl_h(zero, in16low);
+    __m128i in32_1 = __lsx_vilvh_h(zero, in16low);
+    __m128i in32_2 = __lsx_vilvl_h(zero, in16high);
+    __m128i in32_3 = __lsx_vilvh_h(zero, in16high);
+
+    __lsx_vst(in32_0, reinterpret_cast<uint32_t *>(utf32_output), 0);
+    __lsx_vst(in32_1, reinterpret_cast<uint32_t *>(utf32_output + 4), 0);
+    __lsx_vst(in32_2, reinterpret_cast<uint32_t *>(utf32_output + 8), 0);
+    __lsx_vst(in32_3, reinterpret_cast<uint32_t *>(utf32_output + 12), 0);
+
+    utf32_output += 16;
+    buf += 16;
   }
-}
-} // namespace haswell
-} // namespace simdutf
 
-/* begin file src/simdutf/haswell/end.h */
-#if SIMDUTF_CAN_ALWAYS_RUN_HASWELL
-// nothing needed.
-#else
-SIMDUTF_UNTARGET_REGION
-#endif
+  return std::make_pair(buf, utf32_output);
+}
+/* end file src/lsx/lsx_convert_latin1_to_utf32.cpp */
 
+/* begin file src/lsx/lsx_convert_utf8_to_utf16.cpp */
+// Convert up to 16 bytes from utf8 to utf16 using a mask indicating the
+// end of the code points. Only the least significant 12 bits of the mask
+// are accessed.
+// It returns how many bytes were consumed (up to 16, usually 12).
+template <endianness big_endian>
+size_t convert_masked_utf8_to_utf16(const char *input,
+                                    uint64_t utf8_end_of_code_point_mask,
+                                    char16_t *&utf16_output) {
+  // we use an approach where we try to process up to 12 input bytes.
+  // Why 12 input bytes and not 16? Because we are concerned with the size of
+  // the lookup tables. Also 12 is nicely divisible by two and three.
+  //
+  __m128i in = __lsx_vld(reinterpret_cast<const uint8_t *>(input), 0);
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask & 0xfff;
+  //
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // may be beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
 
-#if SIMDUTF_GCC11ORMORE // workaround for
-                        // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
-SIMDUTF_POP_DISABLE_WARNINGS
-#endif // end of workaround
-/* end file src/simdutf/haswell/end.h */
-/* end file src/haswell/implementation.cpp */
-#endif
-#if SIMDUTF_IMPLEMENTATION_PPC64
-/* begin file src/ppc64/implementation.cpp */
+  // We first try a few fast paths.
+  // The obvious first test is ASCII, which actually consumes the full 16.
+  if ((utf8_end_of_code_point_mask & 0xFFFF) == 0xFFFF) {
+    // We process in chunks of 16 bytes
+    // The routine in simd.h is reused.
+    simd8<int8_t> temp{in};
+    temp.store_ascii_as_utf16<big_endian>(utf16_output);
+    utf16_output += 16; // We wrote 16 16-bit characters.
+    return 16;          // We consumed 16 bytes.
+  }
 
+  uint64_t buffer[2];
+  // 3 byte sequences are the next most common, as seen in CJK, which has long
+  // sequences of these.
+  if (input_utf8_end_of_code_point_mask == 0x924) {
+    // We want to take 4 3-byte UTF-8 code units and turn them into 4 2-byte
+    // UTF-16 code units.
+    __m128i composed = convert_utf8_3_byte_to_utf16(in);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      composed = lsx_swap_bytes(composed);
+    }
 
+    __lsx_vst(composed, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    utf16_output += 4; // We wrote 4 16-bit characters.
+    return 12;         // We consumed 12 bytes.
+  }
 
+  // 2 byte sequences occur in short bursts in languages like Greek and Russian.
+  if ((utf8_end_of_code_point_mask & 0xFFFF) == 0xAAAA) {
+    // We want to take 6 2-byte UTF-8 code units and turn them into 6 2-byte
+    // UTF-16 code units.
+    __m128i composed = convert_utf8_2_byte_to_utf16(in);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      composed = lsx_swap_bytes(composed);
+    }
 
+    __lsx_vst(composed, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    utf16_output += 6; // We wrote 6 16-bit characters.
+    return 12;         // We consumed 12 bytes.
+  }
 
-/* begin file src/simdutf/ppc64/begin.h */
-// redefining SIMDUTF_IMPLEMENTATION to "ppc64"
-// #define SIMDUTF_IMPLEMENTATION ppc64
-/* end file src/simdutf/ppc64/begin.h */
-namespace simdutf {
-namespace ppc64 {
-namespace {
-#ifndef SIMDUTF_PPC64_H
-  #error "ppc64.h must be included"
-#endif
-using namespace simd;
+  /// We do not have a fast path available, or the fast path is unimportant, so
+  /// we fall back.
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
 
-simdutf_really_inline bool is_ascii(const simd8x64<uint8_t> &input) {
-  // careful: 0x80 is not ascii.
-  return input.reduce_or().saturating_sub(0b01111111u).bits_not_set_anywhere();
-}
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
+  const __m128i zero = __lsx_vldi(0);
+  if (idx < 64) {
+    // SIX (6) input code-code units
+    // Convert to UTF-16
+    __m128i composed = convert_utf8_1_to_2_byte_to_utf16(in, idx);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      composed = lsx_swap_bytes(composed);
+    }
+    // Store
+    __lsx_vst(composed, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    utf16_output += 6; // We wrote 6 16-bit characters.
+    return consumed;
+  } else if (idx < 145) {
+    // FOUR (4) input code-code units
+    // UTF-16 and UTF-32 use similar algorithms, but UTF-32 skips the narrowing.
+    __m128i sh = __lsx_vld(reinterpret_cast<const uint8_t *>(
+                               simdutf::tables::utf8_to_utf16::shufutf8[idx]),
+                           0);
+    // XXX: depending on the system scalar instructions might be faster.
+    // 1 byte: 00000000 00000000 0ccccccc
+    // 2 byte: 00000000 110bbbbb 10cccccc
+    // 3 byte: 1110aaaa 10bbbbbb 10cccccc
+    sh = __lsx_vand_v(sh, __lsx_vldi(0x1f));
+    __m128i perm = __lsx_vshuf_b(zero, in, sh);
+    // 1 byte: 00000000 0ccccccc
+    // 2 byte: xx0bbbbb x0cccccc
+    // 3 byte: xxbbbbbb x0cccccc
+    __m128i lowperm = __lsx_vpickev_h(perm, perm);
+    // 1 byte: 00000000 00000000
+    // 2 byte: 00000000 00000000
+    // 3 byte: 00000000 1110aaaa
+    __m128i highperm = __lsx_vpickod_h(perm, perm);
+    // 3 byte: aaaa0000 00000000
+    highperm = __lsx_vslli_h(highperm, 12);
+    // ASCII
+    // 1 byte: 00000000 0ccccccc
+    // 2+byte: 00000000 00cccccc
+    __m128i ascii = __lsx_vand_v(lowperm, __lsx_vrepli_h(0x7f));
+    // 1 byte: 00000000 00000000
+    // 2 byte: xx0bbbbb 00000000
+    // 3 byte: xxbbbbbb 00000000
+    __m128i middlebyte = __lsx_vand_v(lowperm, __lsx_vldi(-2561) /*0xFF00*/);
+    // 1 byte: 00000000 0ccccccc
+    // 2 byte: 0010bbbb bbcccccc
+    // 3 byte: 0010bbbb bbcccccc
+    __m128i composed = __lsx_vor_v(__lsx_vsrli_h(middlebyte, 2), ascii);
 
-simdutf_unused simdutf_really_inline simd8<bool>
-must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2,
-                     const simd8<uint8_t> prev3) {
-  simd8<uint8_t> is_second_byte =
-      prev1.saturating_sub(0b11000000u - 1); // Only 11______ will be > 0
-  simd8<uint8_t> is_third_byte =
-      prev2.saturating_sub(0b11100000u - 1); // Only 111_____ will be > 0
-  simd8<uint8_t> is_fourth_byte =
-      prev3.saturating_sub(0b11110000u - 1); // Only 1111____ will be > 0
-  // Caller requires a bool (all 1's). All values resulting from the subtraction
-  // will be <= 64, so signed comparison is fine.
-  return simd8<int8_t>(is_second_byte | is_third_byte | is_fourth_byte) >
-         int8_t(0);
-}
+    __m128i v0fff = __lsx_vreplgr2vr_h(uint16_t(0xfff));
+    // aaaabbbb bbcccccc
+    composed = __lsx_vbitsel_v(highperm, composed, v0fff);
 
-simdutf_really_inline simd8<bool>
-must_be_2_3_continuation(const simd8<uint8_t> prev2,
-                         const simd8<uint8_t> prev3) {
-  simd8<uint8_t> is_third_byte =
-      prev2.saturating_sub(0xe0u - 0x80); // Only 111_____ will be >= 0x80
-  simd8<uint8_t> is_fourth_byte =
-      prev3.saturating_sub(0xf0u - 0x80); // Only 1111____ will be >= 0x80
-  // Caller requires a bool (all 1's). All values resulting from the subtraction
-  // will be <= 64, so signed comparison is fine.
-  return simd8<bool>(is_third_byte | is_fourth_byte);
-}
+    if (!match_system(big_endian)) {
+      composed = lsx_swap_bytes(composed);
+    }
 
-} // unnamed namespace
-} // namespace ppc64
-} // namespace simdutf
+    __lsx_vst(composed, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    utf16_output += 4; // We wrote 4 16-bit codepoints
+    return consumed;
+  } else if (idx < 209) {
+    // THREE (3) input code-code units
+    if (input_utf8_end_of_code_point_mask == 0x888) {
+      // We want to take 3 4-byte UTF-8 code units and turn them into 3 4-byte
+      // UTF-16 pairs. Generating surrogate pairs is a little tricky, but
+      // it is easier when we can assume they are all pairs. This version does
+      // not use the LUT, but 4 byte sequences are less common and the overhead
+      // of the extra memory access is less important than the early branch
+      // overhead in shorter sequences.
 
-/* begin file src/generic/buf_block_reader.h */
-namespace simdutf {
-namespace ppc64 {
-namespace {
+      // Swap byte pairs
+      // 10dddddd 10cccccc|10bbbbbb 11110aaa
+      // 10cccccc 10dddddd|11110aaa 10bbbbbb
+      __m128i swap = lsx_swap_bytes(in);
+      // Shift left 2 bits
+      // cccccc00 dddddd00 xxxxxxxx bbbbbb00
+      __m128i shift = __lsx_vslli_b(swap, 2);
+      // Create a magic number containing the low 2 bits of the trail surrogate
+      // and all the corrections needed to create the pair.
+      // UTF-8 4b prefix   = -0x0000|0xF000
+      // surrogate offset  = -0x0000|0x0040 (0x10000 << 6)
+      // surrogate high    = +0x0000|0xD800
+      // surrogate low     = +0xDC00|0x0000
+      // sum               = +0xDC00|0xE7C0
+      __m128i magic = __lsx_vreplgr2vr_w(uint32_t(0xDC00E7C0));
+      // Generate unadjusted trail surrogate minus lowest 2 bits
+      // vec(0000FF00) = __lsx_vldi(-1758)
+      // xxxxxxxx xxxxxxxx|11110aaa bbbbbb00
+      __m128i trail =
+          __lsx_vbitsel_v(shift, swap, __lsx_vldi(-1758 /*0000FF00*/));
+      // Insert low 2 bits of trail surrogate to magic number for later
+      // 11011100 00000000 11100111 110000cc
+      __m128i magic_with_low_2 = __lsx_vor_v(__lsx_vsrli_w(shift, 30), magic);
 
-// Walks through a buffer in block-sized increments, loading the last part with
-// spaces
-template <size_t STEP_SIZE> struct buf_block_reader {
-public:
-  simdutf_really_inline buf_block_reader(const uint8_t *_buf, size_t _len);
-  simdutf_really_inline size_t block_index();
-  simdutf_really_inline bool has_full_block() const;
-  simdutf_really_inline const uint8_t *full_block() const;
-  /**
-   * Get the last block, padded with spaces.
-   *
-   * There will always be a last block, with at least 1 byte, unless len == 0
-   * (in which case this function fills the buffer with spaces and returns 0. In
-   * particular, if len == STEP_SIZE there will be 0 full_blocks and 1 remainder
-   * block with STEP_SIZE bytes and no spaces for padding.
-   *
-   * @return the number of effective characters in the last block.
-   */
-  simdutf_really_inline size_t get_remainder(uint8_t *dst) const;
-  simdutf_really_inline void advance();
+      // Generate lead surrogate
+      // xxxxcccc ccdddddd|xxxxxxxx xxxxxxxx
+      // 000000cc ccdddddd|xxxxxxxx xxxxxxxx
+      __m128i lead = __lsx_vbitsel_v(
+          __lsx_vsrli_h(__lsx_vand_v(shift, __lsx_vldi(0x3F)), 4), swap,
+          __lsx_vrepli_h(0x3f /* 0x003f*/));
 
-private:
-  const uint8_t *buf;
-  const size_t len;
-  const size_t lenminusstep;
-  size_t idx;
-};
+      // Blend pairs
+      // __lsx_vldi(-1741) => vec(0x0000FFFF)
+      // 000000cc ccdddddd|11110aaa bbbbbb00
+      __m128i blend =
+          __lsx_vbitsel_v(lead, trail, __lsx_vldi(-1741) /* (0x0000FFFF)*4 */);
 
-// Routines to print masks and text for debugging bitmask operations
-simdutf_unused static char *format_input_text_64(const uint8_t *text) {
-  static char *buf =
-      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
-  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
-    buf[i] = int8_t(text[i]) < ' ' ? '_' : int8_t(text[i]);
+      // Add magic number to finish the result
+      // 110111CC CCDDDDDD|110110AA BBBBBBCC
+      __m128i composed = __lsx_vadd_h(blend, magic_with_low_2);
+      // Byte swap if necessary
+      if (!match_system(big_endian)) {
+        composed = lsx_swap_bytes(composed);
+      }
+      // __lsx_vst(composed, reinterpret_cast<uint16_t *>(utf16_output), 0);
+      __lsx_vst(composed, reinterpret_cast<uint16_t *>(buffer), 0);
+      std::memcpy(utf16_output, buffer, 12);
+      utf16_output += 6; // We wrote 3 32-bit surrogate pairs.
+      return 12;         // We consumed 12 bytes.
+    }
+    // 3 1-4 byte sequences
+    __m128i sh = __lsx_vld(reinterpret_cast<const uint8_t *>(
+                               simdutf::tables::utf8_to_utf16::shufutf8[idx]),
+                           0);
+    // 1 byte: 00000000 00000000 00000000 0ddddddd
+    // 2 byte: 00000000 00000000 110ccccc 10dddddd
+    // 3 byte: 00000000 1110bbbb 10cccccc 10dddddd
+    // 4 byte: 11110aaa 10bbbbbb 10cccccc 10dddddd
+    sh = __lsx_vand_v(sh, __lsx_vldi(0x1f));
+    __m128i perm = __lsx_vshuf_b(zero, in, sh);
+    // added to fix issue https://github.com/simdutf/simdutf/issues/514
+    // We only want to write 2 * 16-bit code units when that is actually what we
+    // have. Unfortunately, we cannot trust the input. So it is possible to get
+    // 0xff as an input byte and it should not result in a surrogate pair. We
+    // need to check for that.
+    uint32_t permbuffer[4];
+    __lsx_vst(perm, permbuffer, 0);
+    // Mask the low and middle bytes
+    // 00000000 00000000 00000000 0ddddddd
+    __m128i ascii = __lsx_vand_v(perm, __lsx_vrepli_w(0x7f));
+    // Because the surrogates need more work, the high surrogate is computed
+    // first.
+    __m128i middlehigh = __lsx_vslli_w(perm, 2);
+    // 00000000 00000000 00cccccc 00000000
+    __m128i middlebyte = __lsx_vand_v(perm, __lsx_vldi(-3777) /* 0x00003F00 */);
+    // Start assembling the sequence. The 4th byte is already where it would be
+    // in a surrogate and there is no dependency, so shift left, not right.
+    // 3 byte: 00000000 10bbbbxx xxxxxxxx xxxxxxxx
+    // 4 byte: 11110aaa bbbbbbxx xxxxxxxx xxxxxxxx
+    __m128i ab =
+        __lsx_vbitsel_v(middlehigh, perm, __lsx_vldi(-1656) /*0xFF000000*/);
+    // Top 16 bits: the high ten bits of the surrogate pair before correction
+    // 3 byte: 00000000 10bbbbcc|cccc0000 00000000
+    // 4 byte: 11110aaa bbbbbbcc|cccc0000 00000000 - high 10 bits (uncorrected)
+    __m128i v_fffc0000 = __lsx_vreplgr2vr_w(uint32_t(0xFFFC0000));
+    __m128i abc = __lsx_vbitsel_v(__lsx_vslli_w(middlebyte, 4), ab, v_fffc0000);
+    // Combine the low 6 or 7 bits by a shift right accumulate
+    // 3 byte: 00000000 00000010|bbbbcccc ccdddddd - low 16 bits correct
+    // 4 byte: 00000011 110aaabb|bbbbcccc ccdddddd - low 10 bits correct w/o
+    // correction
+    __m128i composed = __lsx_vor_v(ascii, __lsx_vsrli_w(abc, 6));
+    // After this is for surrogates
+    // Blend the low and high surrogates
+    // 4 byte: 11110aaa bbbbbbcc|bbbbcccc ccdddddd
+    __m128i mixed =
+        __lsx_vbitsel_v(abc, composed, __lsx_vldi(-1741) /*0x0000FFFF*/);
+    // Clear the upper 6 bits of the low surrogate. Don't clear the upper bits
+    // yet as 0x10000 was not subtracted from the codepoint yet.
+    // 4 byte: 11110aaa bbbbbbcc|000000cc ccdddddd
+    __m128i v_ffff03ff = __lsx_vreplgr2vr_w(uint32_t(0xFFFF03FF));
+    __m128i masked_pair = __lsx_vand_v(mixed, v_ffff03ff);
+    // Correct the remaining UTF-8 prefix and surrogate offset, and add the
+    // surrogate prefixes, in one magic 16-bit addition. Similar magic number,
+    // but without the continue byte adjust and halfword swapped.
+    // UTF-8 4b prefix   = -0xF000|0x0000
+    // surrogate offset  = -0x0040|0x0000 (0x10000 << 6)
+    // surrogate high    = +0xD800|0x0000
+    // surrogate low     = +0x0000|0xDC00
+    // sum               = +0xE7C0|0xDC00
+    __m128i magic = __lsx_vreplgr2vr_w(uint32_t(0xE7C0DC00));
+    // 4 byte: 110110AA BBBBBBCC|110111CC CCDDDDDD - surrogate pair complete
+    __m128i surrogates = __lsx_vadd_w(masked_pair, magic);
+    // If the high bit is 1 (s32 less than zero), this needs a surrogate pair
+    __m128i is_pair = __lsx_vslt_w(perm, zero);
+    // Select either the 4 byte surrogate pair or the 2 byte solo codepoint
+    // 3 byte: 0xxxxxxx xxxxxxxx|bbbbcccc ccdddddd
+    // 4 byte: 110110AA BBBBBBCC|110111CC CCDDDDDD
+    __m128i selected = __lsx_vbitsel_v(composed, surrogates, is_pair);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      selected = lsx_swap_bytes(selected);
+    }
+    // Attempting to shuffle and store would be complex, just scalarize.
+    uint32_t buffer_tmp[4];
+    __lsx_vst(selected, buffer_tmp, 0);
+    // Test for the top bit of the surrogate mask. Removed due to issue 514
+    // const uint32_t SURROGATE_MASK = match_system(big_endian) ? 0x80000000 :
+    // 0x00800000;
+    for (size_t i = 0; i < 3; i++) {
+      // Surrogate
+      // Used to be if (buffer[i] & SURROGATE_MASK) {
+      // See discussion above.
+      // patch for issue https://github.com/simdutf/simdutf/issues/514
+      if ((permbuffer[i] & 0xf8000000) == 0xf0000000) {
+        utf16_output[0] = uint16_t(buffer_tmp[i] >> 16);
+        utf16_output[1] = uint16_t(buffer_tmp[i] & 0xFFFF);
+        utf16_output += 2;
+      } else {
+        utf16_output[0] = uint16_t(buffer_tmp[i] & 0xFFFF);
+        utf16_output++;
+      }
+    }
+    return consumed;
+  } else {
+    // here we know that there is an error but we do not handle errors
+    return 12;
   }
-  buf[sizeof(simd8x64<uint8_t>)] = '\0';
-  return buf;
 }
+/* end file src/lsx/lsx_convert_utf8_to_utf16.cpp */
+/* begin file src/lsx/lsx_convert_utf8_to_utf32.cpp */
+// Convert up to 16 bytes from utf8 to utf32 using a mask indicating the
+// end of the code points. Only the least significant 12 bits of the mask
+// are accessed.
+// It returns how many bytes were consumed (up to 16, usually 12).
+size_t convert_masked_utf8_to_utf32(const char *input,
+                                    uint64_t utf8_end_of_code_point_mask,
+                                    char32_t *&utf32_out) {
+  // we use an approach where we try to process up to 12 input bytes.
+  // Why 12 input bytes and not 16? Because we are concerned with the size of
+  // the lookup tables. Also 12 is nicely divisible by two and three.
+  //
+  uint32_t *&utf32_output = reinterpret_cast<uint32_t *&>(utf32_out);
+  __m128i in = __lsx_vld(reinterpret_cast<const uint8_t *>(input), 0);
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask & 0xFFF;
+  //
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // may be beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
+  //
+  // We first try a few fast paths.
+  if ((utf8_end_of_code_point_mask & 0xffff) == 0xffff) {
+    // We process in chunks of 16 bytes.
+    // The routine in simd.h is reused.
+    // Ideally the compiler can keep the tables in registers.
+    simd8<int8_t> temp{in};
+    temp.store_ascii_as_utf32_tbl(utf32_out);
+    utf32_output += 16; // We wrote 16 32-bit characters.
+    return 16;          // We consumed 16 bytes.
+  }
+  __m128i zero = __lsx_vldi(0);
+  if (input_utf8_end_of_code_point_mask == 0x924) {
+    // We want to take 4 3-byte UTF-8 code units and turn them into 4 4-byte
+    // UTF-32 code units. Convert to UTF-16 first, then widen.
+    __m128i composed_utf16 = convert_utf8_3_byte_to_utf16(in);
+    __m128i utf32_low = __lsx_vilvl_h(zero, composed_utf16);
 
-// Routines to print masks and text for debugging bitmask operations
-simdutf_unused static char *format_input_text(const simd8x64<uint8_t> &in) {
-  static char *buf =
-      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
-  in.store(reinterpret_cast<uint8_t *>(buf));
-  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
-    if (buf[i] < ' ') {
-      buf[i] = '_';
-    }
+    __lsx_vst(utf32_low, reinterpret_cast<uint32_t *>(utf32_output), 0);
+    utf32_output += 4; // We wrote 4 32-bit characters.
+    return 12;         // We consumed 12 bytes.
   }
-  buf[sizeof(simd8x64<uint8_t>)] = '\0';
-  return buf;
-}
+  // 2 byte sequences occur in short bursts in languages like Greek and Russian.
+  if (input_utf8_end_of_code_point_mask == 0xaaa) {
+    // We want to take 6 2-byte UTF-8 code units and turn them into 6 4-byte
+    // UTF-32 code units. Convert to UTF-16 first, then widen.
+    __m128i composed_utf16 = convert_utf8_2_byte_to_utf16(in);
 
-simdutf_unused static char *format_mask(uint64_t mask) {
-  static char *buf = reinterpret_cast<char *>(malloc(64 + 1));
-  for (size_t i = 0; i < 64; i++) {
-    buf[i] = (mask & (size_t(1) << i)) ? 'X' : ' ';
+    __m128i utf32_low = __lsx_vilvl_h(zero, composed_utf16);
+    __m128i utf32_high = __lsx_vilvh_h(zero, composed_utf16);
+
+    __lsx_vst(utf32_low, reinterpret_cast<uint32_t *>(utf32_output), 0);
+    __lsx_vst(utf32_high, reinterpret_cast<uint32_t *>(utf32_output), 16);
+    utf32_output += 6;
+    return 12; // We consumed 12 bytes.
   }
-  buf[64] = '\0';
-  return buf;
-}
+  /// Either no fast path or an unimportant fast path.
 
-template <size_t STEP_SIZE>
-simdutf_really_inline
-buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len)
-    : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE},
-      idx{0} {}
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
 
-template <size_t STEP_SIZE>
-simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() {
-  return idx;
+  if (idx < 64) {
+    // SIX (6) input code-code units
+    // Convert to UTF-16
+    __m128i composed_utf16 = convert_utf8_1_to_2_byte_to_utf16(in, idx);
+    __m128i utf32_low = __lsx_vilvl_h(zero, composed_utf16);
+    __m128i utf32_high = __lsx_vilvh_h(zero, composed_utf16);
+
+    __lsx_vst(utf32_low, reinterpret_cast<uint32_t *>(utf32_output), 0);
+    __lsx_vst(utf32_high, reinterpret_cast<uint32_t *>(utf32_output), 16);
+    utf32_output += 6;
+    return consumed;
+  } else if (idx < 145) {
+    // FOUR (4) input code-code units
+    // UTF-16 and UTF-32 use similar algorithms, but UTF-32 skips the narrowing.
+    __m128i sh = __lsx_vld(reinterpret_cast<const uint8_t *>(
+                               simdutf::tables::utf8_to_utf16::shufutf8[idx]),
+                           0);
+    // Shuffle
+    // 1 byte: 00000000 00000000 0ccccccc
+    // 2 byte: 00000000 110bbbbb 10cccccc
+    // 3 byte: 1110aaaa 10bbbbbb 10cccccc
+    sh = __lsx_vand_v(sh, __lsx_vldi(0x1f));
+    __m128i perm = __lsx_vshuf_b(zero, in, sh);
+    // Split
+    // 00000000 00000000 0ccccccc
+    __m128i ascii = __lsx_vand_v(perm, __lsx_vrepli_w(0x7F)); // 6 or 7 bits
+    // Note: unmasked
+    // xxxxxxxx aaaaxxxx xxxxxxxx
+    __m128i high =
+        __lsx_vsrli_w(__lsx_vand_v(perm, __lsx_vldi(0xf)), 4); // 4 bits
+    // Use 16 bit bic instead of and.
+    // The top bits will be corrected later in the bsl
+    // 00000000 10bbbbbb 00000000
+    __m128i middle =
+        __lsx_vand_v(perm, __lsx_vldi(-1758 /*0x0000FF00*/)); // 5 or 6 bits
+    // Combine low and middle with shift right accumulate
+    // 00000000 00xxbbbb bbcccccc
+    __m128i lowmid = __lsx_vor_v(ascii, __lsx_vsrli_w(middle, 2));
+    // Insert top 4 bits from high byte with bitwise select
+    // 00000000 aaaabbbb bbcccccc
+    __m128i composed =
+        __lsx_vbitsel_v(lowmid, high, __lsx_vldi(-3600 /*0x0000F000*/));
+    __lsx_vst(composed, utf32_output, 0);
+    utf32_output += 4; // We wrote 4 32-bit characters.
+    return consumed;
+  } else if (idx < 209) {
+    // THREE (3) input code-code units
+    if (input_utf8_end_of_code_point_mask == 0x888) {
+      // We want to take 3 4-byte UTF-8 code units and turn them into 3 4-byte
+      // UTF-32 code units. This uses the same method as the fixed 3 byte
+      // version, reversing and shift left insert. However, there is no need for
+      // a shuffle mask now, just rev16 and rev32.
+      //
+      // This version does not use the LUT, but 4 byte sequences are less common
+      // and the overhead of the extra memory access is less important than the
+      // early branch overhead in shorter sequences, so it comes last.
+
+      // Swap pairs of bytes
+      // 10dddddd|10cccccc|10bbbbbb|11110aaa
+      // 10cccccc 10dddddd|11110aaa 10bbbbbb
+      __m128i swap = lsx_swap_bytes(in);
+      // Shift right and insert
+      // xxxxcccc ccdddddd|xxxxxxxa aabbbbbb
+      __m128i merge1 = __lsx_vbitsel_v(__lsx_vsrli_h(swap, 2), swap,
+                                       __lsx_vrepli_h(0x3f /*0x003F*/));
+      // Shift insert again
+      // xxxxxxxx xxxaaabb bbbbcccc ccdddddd
+      __m128i merge2 =
+          __lsx_vbitsel_v(__lsx_vslli_w(merge1, 12), /* merge1 << 12 */
+                          __lsx_vsrli_w(merge1, 16), /* merge1 >> 16 */
+                          __lsx_vldi(-2545));        /*0x00000FFF*/
+      // Clear the garbage
+      // 00000000 000aaabb bbbbcccc ccdddddd
+      __m128i composed = __lsx_vand_v(merge2, __lsx_vldi(-2273 /*0x1FFFFF*/));
+      // Store
+      __lsx_vst(composed, utf32_output, 0);
+      utf32_output += 3; // We wrote 3 32-bit characters.
+      return 12;         // We consumed 12 bytes.
+    }
+    // Unlike UTF-16, doing a fast codepath doesn't have nearly as much benefit
+    // due to surrogates no longer being involved.
+    __m128i sh = __lsx_vld(reinterpret_cast<const uint8_t *>(
+                               simdutf::tables::utf8_to_utf16::shufutf8[idx]),
+                           0);
+    // 1 byte: 00000000 00000000 00000000 0ddddddd
+    // 2 byte: 00000000 00000000 110ccccc 10dddddd
+    // 3 byte: 00000000 1110bbbb 10cccccc 10dddddd
+    // 4 byte: 11110aaa 10bbbbbb 10cccccc 10dddddd
+    sh = __lsx_vand_v(sh, __lsx_vldi(0x1f));
+    __m128i perm = __lsx_vshuf_b(zero, in, sh);
+
+    // Ascii
+    __m128i ascii = __lsx_vand_v(perm, __lsx_vrepli_w(0x7F));
+    __m128i middle = __lsx_vand_v(perm, __lsx_vldi(-3777 /*0x00003f00*/));
+    // 00000000 00000000 0000cccc ccdddddd
+    __m128i cd =
+        __lsx_vbitsel_v(__lsx_vsrli_w(middle, 2), ascii, __lsx_vrepli_w(0x3f));
+
+    __m128i correction = __lsx_vand_v(perm, __lsx_vldi(-3520 /*0x00400000*/));
+    __m128i corrected = __lsx_vadd_b(perm, __lsx_vsrli_w(correction, 1));
+    // Insert twice
+    // 00000000 000aaabb bbbbxxxx xxxxxxxx
+    __m128i corrected_srli2 =
+        __lsx_vsrli_w(__lsx_vand_v(corrected, __lsx_vrepli_b(0x7)), 2);
+    __m128i ab =
+        __lsx_vbitsel_v(corrected_srli2, corrected, __lsx_vrepli_h(0x3f));
+    ab = __lsx_vsrli_w(ab, 4);
+    // 00000000 000aaabb bbbbcccc ccdddddd
+    __m128i composed =
+        __lsx_vbitsel_v(ab, cd, __lsx_vldi(-2545 /*0x00000FFF*/));
+    // Store
+    __lsx_vst(composed, utf32_output, 0);
+    utf32_output += 3; // We wrote 3 32-bit characters.
+    return consumed;
+  } else {
+    // here we know that there is an error but we do not handle errors
+    return 12;
+  }
 }
+/* end file src/lsx/lsx_convert_utf8_to_utf32.cpp */
+/* begin file src/lsx/lsx_convert_utf8_to_latin1.cpp */
+size_t convert_masked_utf8_to_latin1(const char *input,
+                                     uint64_t utf8_end_of_code_point_mask,
+                                     char *&latin1_output) {
+  // we use an approach where we try to process up to 12 input bytes.
+  // Why 12 input bytes and not 16? Because we are concerned with the size of
+  // the lookup tables. Also 12 is nicely divisible by two and three.
+  //
+  __m128i in = __lsx_vld(reinterpret_cast<const uint8_t *>(input), 0);
+
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask & 0xfff;
+  // Optimization note: our main path below is load-latency dependent. Thus it
+  // may be beneficial to have fast paths that depend on branch prediction but
+  // have less latency. This results in more instructions but, potentially, also
+  // higher speeds.
+
+  // We first try a few fast paths.
+  // The obvious first test is ASCII, which actually consumes the full 16.
+  if ((utf8_end_of_code_point_mask & 0xFFFF) == 0xFFFF) {
+    // We process in chunks of 16 bytes
+    __lsx_vst(in, reinterpret_cast<uint8_t *>(latin1_output), 0);
+    latin1_output += 16; // We wrote 16 8-bit characters.
+    return 16;           // We consumed 16 bytes.
+  }
+  /// We do not have a fast path available, or the fast path is unimportant, so
+  /// we fall back.
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
+
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
+  // this indicates an invalid input:
+  if (idx >= 64) {
+    return consumed;
+  }
+  // Here we should have (idx < 64); if not, there is a bug in the validation or
+  // elsewhere.
+  //
+  // SIX (6) input code points: this is a relatively easy scenario. Each code
+  // point spans between 1 and 2 bytes, so the six of them take at most 12
+  // bytes. This converts 6 1-2 byte UTF-8 characters to 6 Latin-1
+  // characters.
+  __m128i sh = __lsx_vld(reinterpret_cast<const uint8_t *>(
+                             simdutf::tables::utf8_to_utf16::shufutf8[idx]),
+                         0);
+  // Shuffle
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 110aaaaa 10bbbbbb
+  sh = __lsx_vand_v(sh, __lsx_vldi(0x1f));
+  __m128i perm = __lsx_vshuf_b(__lsx_vldi(0), in, sh);
+  // ascii mask
+  // 1 byte: 11111111 11111111
+  // 2 byte: 00000000 00000000
+  __m128i ascii_mask = __lsx_vslt_bu(perm, __lsx_vldi(0x80));
+  // utf8 mask
+  // 1 byte: 00000000 00000000
+  // 2 byte: 00111111 00111111
+  __m128i utf8_mask = __lsx_vand_v(__lsx_vsle_bu(__lsx_vldi(0x80), perm),
+                                   __lsx_vldi(0b00111111));
+  // mask
+  //  1 byte: 11111111 11111111
+  //  2 byte: 00111111 00111111
+  __m128i mask = __lsx_vor_v(utf8_mask, ascii_mask);
+
+  __m128i composed = __lsx_vbitsel_v(__lsx_vsrli_h(perm, 2), perm, mask);
+  // writing 8 bytes even though we only care about the first 6 bytes.
+  __m128i latin1_packed = __lsx_vpickev_b(__lsx_vldi(0), composed);
 
-template <size_t STEP_SIZE>
-simdutf_really_inline bool buf_block_reader<STEP_SIZE>::has_full_block() const {
-  return idx < lenminusstep;
+  uint64_t buffer[2];
+  // __lsx_vst(latin1_packed, reinterpret_cast<uint8_t *>(latin1_output), 0);
+  __lsx_vst(latin1_packed, reinterpret_cast<uint8_t *>(buffer), 0);
+  std::memcpy(latin1_output, buffer, 6);
+  latin1_output += 6; // We wrote 6 bytes.
+  return consumed;
 }
+/* end file src/lsx/lsx_convert_utf8_to_latin1.cpp */
 
-template <size_t STEP_SIZE>
-simdutf_really_inline const uint8_t *
-buf_block_reader<STEP_SIZE>::full_block() const {
-  return &buf[idx];
+/* begin file src/lsx/lsx_convert_utf16_to_latin1.cpp */
+template <endianness big_endian>
+std::pair<const char16_t *, char *>
+lsx_convert_utf16_to_latin1(const char16_t *buf, size_t len,
+                            char *latin1_output) {
+  const char16_t *end = buf + len;
+  while (buf + 16 <= end) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 0);
+    __m128i in1 = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 16);
+    if (!match_system(big_endian)) {
+      in = lsx_swap_bytes(in);
+      in1 = lsx_swap_bytes(in1);
+    }
+    if (__lsx_bz_v(__lsx_vpickod_b(in1, in))) {
+      // 1. pack the bytes
+      __m128i latin1_packed = __lsx_vpickev_b(in1, in);
+      // 2. store (16 bytes)
+      __lsx_vst(latin1_packed, reinterpret_cast<uint8_t *>(latin1_output), 0);
+      // 3. adjust pointers
+      buf += 16;
+      latin1_output += 16;
+    } else {
+      return std::make_pair(nullptr, reinterpret_cast<char *>(latin1_output));
+    }
+  } // while
+  return std::make_pair(buf, latin1_output);
 }
 
-template <size_t STEP_SIZE>
-simdutf_really_inline size_t
-buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
-  if (len == idx) {
-    return 0;
-  } // memcpy(dst, null, 0) will trigger an error with some sanitizers
-  std::memset(dst, 0x20,
-              STEP_SIZE); // std::memset STEP_SIZE because it is more efficient
-                          // to write out 8 or 16 bytes at once.
-  std::memcpy(dst, buf + idx, len - idx);
-  return len - idx;
+template <endianness big_endian>
+std::pair<result, char *>
+lsx_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
+                                        char *latin1_output) {
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
+  while (buf + 16 <= end) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 0);
+    __m128i in1 = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 16);
+    if (!match_system(big_endian)) {
+      in = lsx_swap_bytes(in);
+      in1 = lsx_swap_bytes(in1);
+    }
+    if (__lsx_bz_v(__lsx_vpickod_b(in1, in))) {
+      // 1. pack the bytes
+      __m128i latin1_packed = __lsx_vpickev_b(in1, in);
+      // 2. store (16 bytes)
+      __lsx_vst(latin1_packed, reinterpret_cast<uint8_t *>(latin1_output), 0);
+      // 3. adjust pointers
+      buf += 16;
+      latin1_output += 16;
+    } else {
+      // Let us do a scalar fallback.
+      for (int k = 0; k < 16; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if (word <= 0xff) {
+          *latin1_output++ = char(word);
+        } else {
+          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
+                                latin1_output);
+        }
+      }
+    }
+  } // while
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        latin1_output);
 }
+/* end file src/lsx/lsx_convert_utf16_to_latin1.cpp */
+/* begin file src/lsx/lsx_convert_utf16_to_utf8.cpp */
+/*
+    The vectorized algorithm works on a single 128-bit vector register,
+    i.e., it loads eight 16-bit code units.
 
-template <size_t STEP_SIZE>
-simdutf_really_inline void buf_block_reader<STEP_SIZE>::advance() {
-  idx += STEP_SIZE;
-}
+    We consider three cases:
+    1. an input register contains no surrogates and each value
+       is in range 0x0000 .. 0x07ff.
+    2. an input register contains no surrogates and values are
+       in range 0x0000 .. 0xffff.
+    3. an input register contains surrogates --- i.e. codepoints
+       can have 16 or 32 bits.
 
-} // unnamed namespace
-} // namespace ppc64
-} // namespace simdutf
-/* end file src/generic/buf_block_reader.h */
-/* begin file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
-namespace simdutf {
-namespace ppc64 {
-namespace {
-namespace utf8_validation {
+    Ad 1.
 
-using namespace simd;
+    When values are less than 0x0800, it means that a 16-bit code unit
+    can be converted into: 1) single UTF8 byte (when it's an ASCII
+    char) or 2) two UTF8 bytes.
 
-simdutf_really_inline simd8<uint8_t>
-check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-  // Bit 1 = Too Long (ASCII followed by continuation)
-  // Bit 2 = Overlong 3-byte
-  // Bit 4 = Surrogate
-  // Bit 5 = Overlong 2-byte
-  // Bit 7 = Two Continuations
-  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
-                                               // 11______ 11______
-  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
-  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
-  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
-  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
-  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
-  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
-                                               // 11110100 101_____
-                                               // 11110101 1001____
-                                               // 11110101 101_____
-                                               // 1111011_ 1001____
-                                               // 1111011_ 101_____
-                                               // 11111___ 1001____
-                                               // 11111___ 101_____
-  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
-  // 11110101 1000____
-  // 1111011_ 1000____
-  // 11111___ 1000____
-  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+    For this case we only do some shuffling to obtain these 2-byte
+    codes and finally compress the whole vector register with a single
+    shuffle.
 
-  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
-      // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG,
-      // 10______ ________ <continuation in byte 1>
-      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
-      // 1100____ ________ <two byte lead in byte 1>
-      TOO_SHORT | OVERLONG_2,
-      // 1101____ ________ <two byte lead in byte 1>
-      TOO_SHORT,
-      // 1110____ ________ <three byte lead in byte 1>
-      TOO_SHORT | OVERLONG_3 | SURROGATE,
-      // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
-  constexpr const uint8_t CARRY =
-      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-  const simd8<uint8_t> byte_1_low =
-      (prev1 & 0x0F)
-          .lookup_16<uint8_t>(
-              // ____0000 ________
-              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-              // ____0001 ________
-              CARRY | OVERLONG_2,
-              // ____001_ ________
-              CARRY, CARRY,
+    We need a 256-entry lookup table to get a compression pattern
+    and the number of output bytes in the compressed vector register.
+    Each entry occupies 17 bytes.
 
-              // ____0100 ________
-              CARRY | TOO_LARGE,
-              // ____0101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              // ____011_ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
+    Ad 2.
 
-              // ____1___ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              // ____1101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000);
-  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
-      // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT,
+    When values fit in 16-bit code units, but are above 0x07ff, then
+    a single word may produce one, two or three UTF8 bytes.
 
-      // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
-          OVERLONG_4,
-      // ________ 1001____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
-      // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+    We prepare data for all these three cases in two registers.
+    The first register contains the lower two UTF8 bytes (used in all
+    cases), while the second one contains just the third byte for
+    the three-UTF8-bytes case.
 
-      // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
-  return (byte_1_high & byte_1_low & byte_2_high);
-}
-simdutf_really_inline simd8<uint8_t>
-check_multibyte_lengths(const simd8<uint8_t> input,
-                        const simd8<uint8_t> prev_input,
-                        const simd8<uint8_t> sc) {
-  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-  simd8<uint8_t> must23 =
-      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-  return must23_80 ^ sc;
-}
+    Finally these two registers are interleaved, forming an eight-element
+    array of 32-bit values. The array spans two vector registers.
+    The bytes from the registers are compressed using two shuffles.
 
-//
-// Return nonzero if there are incomplete multibyte characters at the end of the
-// block: e.g. if there is a 4-byte character, but it is 3 bytes from the end.
-//
-simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
-  // If the previous input's last 3 bytes match this, they're too short (they
-  // ended at EOF):
-  // ... 1111____ 111_____ 11______
-  static const uint8_t max_array[32] = {255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        255,
-                                        0b11110000u - 1,
-                                        0b11100000u - 1,
-                                        0b11000000u - 1};
-  const simd8<uint8_t> max_value(
-      &max_array[sizeof(max_array) - sizeof(simd8<uint8_t>)]);
-  return input.gt_bits(max_value);
-}
+    We need a 256-entry lookup table to get a compression pattern
+    and the number of output bytes in the compressed vector register.
+    Each entry occupies 17 bytes.
 
-struct utf8_checker {
-  // If this is nonzero, there has been a UTF-8 error.
-  simd8<uint8_t> error;
-  // The last input we received
-  simd8<uint8_t> prev_input_block;
-  // Whether the last input we received was incomplete (used for ASCII fast
-  // path)
-  simd8<uint8_t> prev_incomplete;
 
-  //
-  // Check whether the current bytes are valid UTF-8.
-  //
-  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
-                                              const simd8<uint8_t> prev_input) {
-    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
-    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
-    // small negative numbers)
-    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-    simd8<uint8_t> sc = check_special_cases(input, prev1);
-    this->error |= check_multibyte_lengths(input, prev_input, sc);
-  }
+    To summarize:
+    - We need two 256-entry tables that have 8704 bytes in total.
+*/
+/*
+  Returns a pair: the first unprocessed byte from buf and utf8_output.
+  A scalar routine should carry on the conversion of the tail.
+*/
+template <endianness big_endian>
+std::pair<const char16_t *, char *>
+lsx_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char16_t *end = buf + len;
 
-  // The only problem that can happen at EOF is that a multibyte character is
-  // too short or a byte value too large in the last bytes: check_special_cases
-  // only checks for bytes too large in the first of two bytes.
-  simdutf_really_inline void check_eof() {
-    // If the previous block had incomplete UTF-8 characters at the end, an
-    // ASCII block can't possibly finish them.
-    this->error |= this->prev_incomplete;
-  }
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
-  simdutf_really_inline void check_next_input(const simd8x64<uint8_t> &input) {
-    if (simdutf_likely(is_ascii(input))) {
-      this->error |= this->prev_incomplete;
-    } else {
-      // you might think that a for-loop would work, but under Visual Studio, it
-      // is not good enough.
-      static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
-                        (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-                    "We support either two or four chunks per 64-byte block.");
-      if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
-        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-      } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
-        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-        this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+  __m128i v_07ff = __lsx_vreplgr2vr_h(uint16_t(0x7ff));
+  while (buf + 16 + safety_margin <= end) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 0);
+    if (!match_system(big_endian)) {
+      in = lsx_swap_bytes(in);
+    }
+    if (__lsx_bz_v(
+            __lsx_vslt_hu(__lsx_vrepli_h(0x7F), in))) { // ASCII fast path!!!!
+      // It is common enough that we have sequences of 16 consecutive ASCII
+      // characters.
+      __m128i nextin = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 16);
+      if (!match_system(big_endian)) {
+        nextin = lsx_swap_bytes(nextin);
+      }
+      if (__lsx_bz_v(__lsx_vslt_hu(__lsx_vrepli_h(0x7F), nextin))) {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        __m128i utf8_packed = __lsx_vpickev_b(nextin, in);
+        // 2. store (16 bytes)
+        __lsx_vst(utf8_packed, utf8_output, 0);
+        // 3. adjust pointers
+        buf += 16;
+        utf8_output += 16;
+        continue; // we are done for this round!
+      } else {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        __m128i utf8_packed = __lsx_vpickev_b(in, in);
+        // 2. store (8 bytes)
+        __lsx_vst(utf8_packed, utf8_output, 0);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        in = nextin;
       }
-      this->prev_incomplete =
-          is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1]);
-      this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1];
     }
-  }
 
-  // do not forget to call check_eof!
-  simdutf_really_inline bool errors() const {
-    return this->error.any_bits_set_anywhere();
-  }
+    __m128i zero = __lsx_vldi(0);
+    if (__lsx_bz_v(__lsx_vslt_hu(v_07ff, in))) {
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+      // expected output   : [110a|aaaa|10bb|bbbb] x 8
+      // t0 = [000a|aaaa|bbbb|bb00]
+      __m128i t0 = __lsx_vslli_h(in, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      __m128i t1 = __lsx_vand_v(t0, __lsx_vldi(-2785 /*0x1f00*/));
+      // t2 = [0000|0000|00bb|bbbb]
+      __m128i t2 = __lsx_vand_v(in, __lsx_vrepli_h(0x3f));
+      // t3 = [000a|aaaa|00bb|bbbb]
+      __m128i t3 = __lsx_vor_v(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      __m128i v_c080 = __lsx_vreplgr2vr_h(uint16_t(0xc080));
+      __m128i t4 = __lsx_vor_v(t3, v_c080);
+      // 2. merge ASCII and 2-byte codewords
+      __m128i one_byte_bytemask =
+          __lsx_vsle_hu(in, __lsx_vrepli_h(0x7F /*0x007F*/));
+      __m128i utf8_unpacked = __lsx_vbitsel_v(t4, in, one_byte_bytemask);
+      // 3. prepare bitmask for 8-bit lookup
+      uint32_t m2 = __lsx_vpickve2gr_bu(__lsx_vmskltz_h(one_byte_bytemask), 0);
+      // 4. pack the bytes
+      const uint8_t *row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                               [lsx_1_2_utf8_bytes_mask[m2]][0];
+      __m128i shuffle = __lsx_vld(row, 1);
+      __m128i utf8_packed = __lsx_vshuf_b(zero, utf8_unpacked, shuffle);
+      // 5. store bytes
+      __lsx_vst(utf8_packed, utf8_output, 0);
+      // 6. adjust pointers
+      buf += 8;
+      utf8_output += row[0];
+      continue;
+    }
+    __m128i surrogates_bytemask =
+        __lsx_vseq_h(__lsx_vand_v(in, __lsx_vldi(-2568 /*0xF800*/)),
+                     __lsx_vldi(-2600 /*0xD800*/));
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
+    if (__lsx_bz_v(surrogates_bytemask)) {
+      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+      /* In this branch we handle three cases:
+           1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+         single UTF-8 byte
+           2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+         two UTF-8 bytes
+           3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+         three UTF-8 bytes
 
-}; // struct utf8_checker
-} // namespace utf8_validation
+          We expand the input word (16-bit) into two code units (32-bit), thus
+          we have room for four bytes. However, we need five distinct bit
+          layouts. Note that the last byte in cases #2 and #3 is the same.
 
-using utf8_validation::utf8_checker;
+          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+          in register t2.
 
-} // unnamed namespace
-} // namespace ppc64
-} // namespace simdutf
-/* end file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
-/* begin file src/generic/utf8_validation/utf8_validator.h */
-namespace simdutf {
-namespace ppc64 {
-namespace {
-namespace utf8_validation {
+          We precompute byte 1 for case #3 and -- **conditionally** --
+         precompute either byte 1 for case #2 or byte 2 for case #3. Note that
+         they differ by exactly one bit.
 
-/**
- * Validates that the string is actual UTF-8.
- */
-template <class checker>
-bool generic_validate_utf8(const uint8_t *input, size_t length) {
-  checker c{};
-  buf_block_reader<64> reader(input, length);
-  while (reader.has_full_block()) {
-    simd::simd8x64<uint8_t> in(reader.full_block());
-    c.check_next_input(in);
-    reader.advance();
-  }
-  uint8_t block[64]{};
-  reader.get_remainder(block);
-  simd::simd8x64<uint8_t> in(block);
-  c.check_next_input(in);
-  reader.advance();
-  c.check_eof();
-  return !c.errors();
-}
+          Finally from these two code units we build proper UTF-8 sequence,
+         taking into account the case (i.e., the number of bytes to write).
+        */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      __m128i t0 = __lsx_vpickev_b(in, in);
+      t0 = __lsx_vilvl_b(t0, t0);
 
-bool generic_validate_utf8(const char *input, size_t length) {
-  return generic_validate_utf8<utf8_checker>(
-      reinterpret_cast<const uint8_t *>(input), length);
-}
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      __m128i v_3f7f = __lsx_vreplgr2vr_h(uint16_t(0x3F7F));
+      __m128i t1 = __lsx_vand_v(t0, v_3f7f);
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      __m128i t2 = __lsx_vor_v(t1, __lsx_vldi(-2688 /*0x8000*/));
 
-/**
- * Validates that the string is actual UTF-8 and stops on errors.
- */
-template <class checker>
-result generic_validate_utf8_with_errors(const uint8_t *input, size_t length) {
-  checker c{};
-  buf_block_reader<64> reader(input, length);
-  size_t count{0};
-  while (reader.has_full_block()) {
-    simd::simd8x64<uint8_t> in(reader.full_block());
-    c.check_next_input(in);
-    if (c.errors()) {
-      if (count != 0) {
-        count--;
-      } // Sometimes the error is only detected in the next chunk
-      result res = scalar::utf8::rewind_and_validate_with_errors(
-          reinterpret_cast<const char *>(input),
-          reinterpret_cast<const char *>(input + count), length - count);
-      res.count += count;
-      return res;
-    }
-    reader.advance();
-    count += 64;
-  }
-  uint8_t block[64]{};
-  reader.get_remainder(block);
-  simd::simd8x64<uint8_t> in(block);
-  c.check_next_input(in);
-  reader.advance();
-  c.check_eof();
-  if (c.errors()) {
-    if (count != 0) {
-      count--;
-    } // Sometimes the error is only detected in the next chunk
-    result res = scalar::utf8::rewind_and_validate_with_errors(
-        reinterpret_cast<const char *>(input),
-        reinterpret_cast<const char *>(input) + count, length - count);
-    res.count += count;
-    return res;
-  } else {
-    return result(error_code::SUCCESS, length);
-  }
-}
+      // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+      __m128i s0 = __lsx_vsrli_h(in, 12);
+      // s1: [aaaa|bbbb|bbcc|cccc] => [aabb|bbbb|cccc|cc00]
+      __m128i s1 = __lsx_vslli_h(in, 2);
+      // s1: [aabb|bbbb|cccc|cc00] => [00bb|bbbb|0000|0000]
+      s1 = __lsx_vand_v(s1, __lsx_vldi(-2753 /*0x3F00*/));
 
-result generic_validate_utf8_with_errors(const char *input, size_t length) {
-  return generic_validate_utf8_with_errors<utf8_checker>(
-      reinterpret_cast<const uint8_t *>(input), length);
+      // [00bb|bbbb|0000|aaaa]
+      __m128i s2 = __lsx_vor_v(s0, s1);
+      // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      __m128i v_c0e0 = __lsx_vreplgr2vr_h(uint16_t(0xC0E0));
+      __m128i s3 = __lsx_vor_v(s2, v_c0e0);
+      __m128i one_or_two_bytes_bytemask = __lsx_vsle_hu(in, v_07ff);
+      __m128i m0 = __lsx_vandn_v(one_or_two_bytes_bytemask,
+                                 __lsx_vldi(-2752 /*0x4000*/));
+      __m128i s4 = __lsx_vxor_v(s3, m0);
+
+      // 4. expand code units 16-bit => 32-bit
+      __m128i out0 = __lsx_vilvl_h(s4, t2);
+      __m128i out1 = __lsx_vilvh_h(s4, t2);
+
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      __m128i one_byte_bytemask = __lsx_vsle_hu(in, __lsx_vrepli_h(0x7F));
+
+      __m128i one_or_two_bytes_bytemask_low =
+          __lsx_vilvl_h(one_or_two_bytes_bytemask, zero);
+      __m128i one_or_two_bytes_bytemask_high =
+          __lsx_vilvh_h(one_or_two_bytes_bytemask, zero);
+
+      __m128i one_byte_bytemask_low =
+          __lsx_vilvl_h(one_byte_bytemask, one_byte_bytemask);
+      __m128i one_byte_bytemask_high =
+          __lsx_vilvh_h(one_byte_bytemask, one_byte_bytemask);
+
+      const uint32_t mask0 = __lsx_vpickve2gr_bu(
+          __lsx_vmskltz_h(__lsx_vor_v(one_or_two_bytes_bytemask_low,
+                                      one_byte_bytemask_low)),
+          0);
+      const uint32_t mask1 = __lsx_vpickve2gr_bu(
+          __lsx_vmskltz_h(__lsx_vor_v(one_or_two_bytes_bytemask_high,
+                                      one_byte_bytemask_high)),
+          0);
+
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      __m128i shuffle0 = __lsx_vld(row0, 1);
+      __m128i utf8_0 = __lsx_vshuf_b(zero, out0, shuffle0);
+
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      __m128i shuffle1 = __lsx_vld(row1, 1);
+      __m128i utf8_1 = __lsx_vshuf_b(zero, out1, shuffle1);
+
+      __lsx_vst(utf8_0, utf8_output, 0);
+      utf8_output += row0[0];
+      __lsx_vst(utf8_1, utf8_output, 0);
+      utf8_output += row1[0];
+
+      buf += 8;
+      // surrogate pair(s) in a register
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if ((word & 0xFF80) == 0) {
+          *utf8_output++ = char(word);
+        } else if ((word & 0xF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xF800) != 0xD800) {
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else {
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char *>(utf8_output));
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf8_output++ = char((value >> 18) | 0b11110000);
+          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((value & 0b111111) | 0b10000000);
+        }
+      }
+      buf += k;
+    }
+  } // while
+  return std::make_pair(buf, reinterpret_cast<char *>(utf8_output));
 }
 
-template <class checker>
-bool generic_validate_ascii(const uint8_t *input, size_t length) {
-  buf_block_reader<64> reader(input, length);
-  uint8_t blocks[64]{};
-  simd::simd8x64<uint8_t> running_or(blocks);
-  while (reader.has_full_block()) {
-    simd::simd8x64<uint8_t> in(reader.full_block());
-    running_or |= in;
-    reader.advance();
-  }
-  uint8_t block[64]{};
-  reader.get_remainder(block);
-  simd::simd8x64<uint8_t> in(block);
-  running_or |= in;
-  return running_or.is_ascii();
-}
+/*
+  Returns a pair: a result struct and utf8_output.
+  If there is an error, the count field of the result is the position of the
+  error. Otherwise, it is the position of the first unprocessed byte in buf
+  (even if finished). A scalar routine should carry on the conversion of the
+  tail if needed.
+*/
+template <endianness big_endian>
+std::pair<result, char *>
+lsx_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
+                                      char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
 
-bool generic_validate_ascii(const char *input, size_t length) {
-  return generic_validate_ascii<utf8_checker>(
-      reinterpret_cast<const uint8_t *>(input), length);
-}
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+  while (buf + 16 + safety_margin <= end) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 0);
+    if (!match_system(big_endian)) {
+      in = lsx_swap_bytes(in);
+    }
+    if (__lsx_bz_v(
+            __lsx_vslt_hu(__lsx_vrepli_h(0x7F), in))) { // ASCII fast path!!!!
+      // It is common enough that we have sequences of 16 consecutive ASCII
+      // characters.
+      __m128i nextin = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 16);
+      if (!match_system(big_endian)) {
+        nextin = lsx_swap_bytes(nextin);
+      }
+      if (__lsx_bz_v(__lsx_vslt_hu(__lsx_vrepli_h(0x7F), nextin))) {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        __m128i utf8_packed = __lsx_vpickev_b(nextin, in);
+        // 2. store (16 bytes)
+        __lsx_vst(utf8_packed, utf8_output, 0);
+        // 3. adjust pointers
+        buf += 16;
+        utf8_output += 16;
+        continue; // we are done for this round!
+      } else {
+        // 1. pack the bytes
+        // obviously suboptimal.
+        __m128i utf8_packed = __lsx_vpickev_b(in, in);
+        // 2. store (8 bytes)
+        __lsx_vst(utf8_packed, utf8_output, 0);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        in = nextin;
+      }
+    }
 
-template <class checker>
-result generic_validate_ascii_with_errors(const uint8_t *input, size_t length) {
-  buf_block_reader<64> reader(input, length);
-  size_t count{0};
-  while (reader.has_full_block()) {
-    simd::simd8x64<uint8_t> in(reader.full_block());
-    if (!in.is_ascii()) {
-      result res = scalar::ascii::validate_with_errors(
-          reinterpret_cast<const char *>(input + count), length - count);
-      return result(res.error, count + res.count);
+    __m128i v_07ff = __lsx_vreplgr2vr_h(uint16_t(0x7ff));
+    __m128i zero = __lsx_vldi(0);
+    if (__lsx_bz_v(__lsx_vslt_hu(v_07ff, in))) {
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+      // expected output   : [110a|aaaa|10bb|bbbb] x 8
+      // t0 = [000a|aaaa|bbbb|bb00]
+      __m128i t0 = __lsx_vslli_h(in, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      __m128i t1 = __lsx_vand_v(t0, __lsx_vldi(-2785 /*0x1f00*/));
+      // t2 = [0000|0000|00bb|bbbb]
+      __m128i t2 = __lsx_vand_v(in, __lsx_vrepli_h(0x3f));
+      // t3 = [000a|aaaa|00bb|bbbb]
+      __m128i t3 = __lsx_vor_v(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      __m128i v_c080 = __lsx_vreplgr2vr_h(uint16_t(0xc080));
+      __m128i t4 = __lsx_vor_v(t3, v_c080);
+      // 2. merge ASCII and 2-byte codewords
+      __m128i one_byte_bytemask =
+          __lsx_vsle_hu(in, __lsx_vrepli_h(0x7F /*0x007F*/));
+      __m128i utf8_unpacked = __lsx_vbitsel_v(t4, in, one_byte_bytemask);
+      // 3. prepare bitmask for 8-bit lookup
+      uint32_t m2 = __lsx_vpickve2gr_bu(__lsx_vmskltz_h(one_byte_bytemask), 0);
+      // 4. pack the bytes
+      const uint8_t *row = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                               [lsx_1_2_utf8_bytes_mask[m2]][0];
+      __m128i shuffle = __lsx_vld(row, 1);
+      __m128i utf8_packed = __lsx_vshuf_b(zero, utf8_unpacked, shuffle);
+      // 5. store bytes
+      __lsx_vst(utf8_packed, utf8_output, 0);
+      // 6. adjust pointers
+      buf += 8;
+      utf8_output += row[0];
+      continue;
     }
-    reader.advance();
+    __m128i surrogates_bytemask =
+        __lsx_vseq_h(__lsx_vand_v(in, __lsx_vldi(-2568 /*0xF800*/)),
+                     __lsx_vldi(-2600 /*0xD800*/));
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
+    if (__lsx_bz_v(surrogates_bytemask)) {
+      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+      /* In this branch we handle three cases:
+           1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+         single UTF-8 byte
+           2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+         two UTF-8 bytes
+           3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+         three UTF-8 bytes
 
-    count += 64;
-  }
-  uint8_t block[64]{};
-  reader.get_remainder(block);
-  simd::simd8x64<uint8_t> in(block);
-  if (!in.is_ascii()) {
-    result res = scalar::ascii::validate_with_errors(
-        reinterpret_cast<const char *>(input + count), length - count);
-    return result(res.error, count + res.count);
-  } else {
-    return result(error_code::SUCCESS, length);
-  }
-}
+          We expand the input word (16-bit) into two code units (32-bit), thus
+          we have room for four bytes. However, we need five distinct bit
+          layouts. Note that the last byte in cases #2 and #3 is the same.
 
-result generic_validate_ascii_with_errors(const char *input, size_t length) {
-  return generic_validate_ascii_with_errors<utf8_checker>(
-      reinterpret_cast<const uint8_t *>(input), length);
-}
+          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+          in register t2.
 
-} // namespace utf8_validation
-} // unnamed namespace
-} // namespace ppc64
-} // namespace simdutf
-/* end file src/generic/utf8_validation/utf8_validator.h */
-// transcoding from UTF-8 to UTF-16
-/* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
+          We precompute byte 1 for case #3 and -- **conditionally** --
+         precompute either byte 1 for case #2 or byte 2 for case #3. Note that
+         they differ by exactly one bit.
 
-namespace simdutf {
-namespace ppc64 {
-namespace {
-namespace utf8_to_utf16 {
-using namespace simd;
+          Finally, from these two code units we build a proper UTF-8 sequence,
+         taking into account the case (i.e., the number of bytes to write).
+        */
+      /**
+       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+       * t2 => [0ccc|cccc] [10cc|cccc]
+       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+       */
+      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+      __m128i t0 = __lsx_vpickev_b(in, in);
+      t0 = __lsx_vilvl_b(t0, t0);
 
-simdutf_really_inline simd8<uint8_t>
-check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-  // Bit 1 = Too Long (ASCII followed by continuation)
-  // Bit 2 = Overlong 3-byte
-  // Bit 4 = Surrogate
-  // Bit 5 = Overlong 2-byte
-  // Bit 7 = Two Continuations
-  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
-                                               // 11______ 11______
-  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
-  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
-  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
-  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
-  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
-  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
-                                               // 11110100 101_____
-                                               // 11110101 1001____
-                                               // 11110101 101_____
-                                               // 1111011_ 1001____
-                                               // 1111011_ 101_____
-                                               // 11111___ 1001____
-                                               // 11111___ 101_____
-  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
-  // 11110101 1000____
-  // 1111011_ 1000____
-  // 11111___ 1000____
-  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|00cc|cccc]
+      __m128i v_3f7f = __lsx_vreplgr2vr_h(uint16_t(0x3F7F));
+      __m128i t1 = __lsx_vand_v(t0, v_3f7f);
+      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+      __m128i t2 = __lsx_vor_v(t1, __lsx_vldi(-2688));
 
-  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
-      // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG,
-      // 10______ ________ <continuation in byte 1>
-      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
-      // 1100____ ________ <two byte lead in byte 1>
-      TOO_SHORT | OVERLONG_2,
-      // 1101____ ________ <two byte lead in byte 1>
-      TOO_SHORT,
-      // 1110____ ________ <three byte lead in byte 1>
-      TOO_SHORT | OVERLONG_3 | SURROGATE,
-      // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
-  constexpr const uint8_t CARRY =
-      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-  const simd8<uint8_t> byte_1_low =
-      (prev1 & 0x0F)
-          .lookup_16<uint8_t>(
-              // ____0000 ________
-              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-              // ____0001 ________
-              CARRY | OVERLONG_2,
-              // ____001_ ________
-              CARRY, CARRY,
+      // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+      __m128i s0 = __lsx_vsrli_h(in, 12);
+      // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
+      __m128i s1 = __lsx_vslli_h(in, 2);
+      // s1: [aabb|bbbb|cccc|cc00] => [00bb|bbbb|0000|0000]
+      s1 = __lsx_vand_v(s1, __lsx_vldi(-2753 /*0x3F00*/));
 
-              // ____0100 ________
-              CARRY | TOO_LARGE,
-              // ____0101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              // ____011_ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
+      // [00bb|bbbb|0000|aaaa]
+      __m128i s2 = __lsx_vor_v(s0, s1);
+      // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      __m128i v_c0e0 = __lsx_vreplgr2vr_h(uint16_t(0xC0E0));
+      __m128i s3 = __lsx_vor_v(s2, v_c0e0);
+      __m128i one_or_two_bytes_bytemask = __lsx_vsle_hu(in, v_07ff);
+      __m128i m0 = __lsx_vandn_v(one_or_two_bytes_bytemask,
+                                 __lsx_vldi(-2752 /*0x4000*/));
+      __m128i s4 = __lsx_vxor_v(s3, m0);
 
-              // ____1___ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              // ____1101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000);
-  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
-      // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT,
+      // 4. expand code units 16-bit => 32-bit
+      __m128i out0 = __lsx_vilvl_h(s4, t2);
+      __m128i out1 = __lsx_vilvh_h(s4, t2);
 
-      // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
-          OVERLONG_4,
-      // ________ 1001____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
-      // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+      __m128i one_byte_bytemask = __lsx_vsle_hu(in, __lsx_vrepli_h(0x7F));
+
+      __m128i one_or_two_bytes_bytemask_low =
+          __lsx_vilvl_h(one_or_two_bytes_bytemask, zero);
+      __m128i one_or_two_bytes_bytemask_high =
+          __lsx_vilvh_h(one_or_two_bytes_bytemask, zero);
+
+      __m128i one_byte_bytemask_low =
+          __lsx_vilvl_h(one_byte_bytemask, one_byte_bytemask);
+      __m128i one_byte_bytemask_high =
+          __lsx_vilvh_h(one_byte_bytemask, one_byte_bytemask);
+
+      const uint32_t mask0 = __lsx_vpickve2gr_bu(
+          __lsx_vmskltz_h(__lsx_vor_v(one_or_two_bytes_bytemask_low,
+                                      one_byte_bytemask_low)),
+          0);
+      const uint32_t mask1 = __lsx_vpickve2gr_bu(
+          __lsx_vmskltz_h(__lsx_vor_v(one_or_two_bytes_bytemask_high,
+                                      one_byte_bytemask_high)),
+          0);
 
-      // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
-  return (byte_1_high & byte_1_low & byte_2_high);
-}
-simdutf_really_inline simd8<uint8_t>
-check_multibyte_lengths(const simd8<uint8_t> input,
-                        const simd8<uint8_t> prev_input,
-                        const simd8<uint8_t> sc) {
-  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-  simd8<uint8_t> must23 =
-      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-  return must23_80 ^ sc;
-}
+      const uint8_t *row0 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+      __m128i shuffle0 = __lsx_vld(row0, 1);
+      __m128i utf8_0 = __lsx_vshuf_b(zero, out0, shuffle0);
 
-struct validating_transcoder {
-  // If this is nonzero, there has been a UTF-8 error.
-  simd8<uint8_t> error;
+      const uint8_t *row1 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+      __m128i shuffle1 = __lsx_vld(row1, 1);
+      __m128i utf8_1 = __lsx_vshuf_b(zero, out1, shuffle1);
 
-  validating_transcoder() : error(uint8_t(0)) {}
-  //
-  // Check whether the current bytes are valid UTF-8.
-  //
-  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
-                                              const simd8<uint8_t> prev_input) {
-    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
-    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
-    // small negative numbers)
-    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-    simd8<uint8_t> sc = check_special_cases(input, prev1);
-    this->error |= check_multibyte_lengths(input, prev_input, sc);
-  }
+      __lsx_vst(utf8_0, utf8_output, 0);
+      utf8_output += row0[0];
+      __lsx_vst(utf8_1, utf8_output, 0);
+      utf8_output += row1[0];
 
-  template <endianness endian>
-  simdutf_really_inline size_t convert(const char *in, size_t size,
-                                       char16_t *utf16_output) {
-    size_t pos = 0;
-    char16_t *start{utf16_output};
-    // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
-    // last 16 bytes, and if the data is valid, then it is entirely safe because
-    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
-    // generally assume that you have valid UTF-8 input, so we are going to go
-    // back from the end counting 8 leading bytes, to give us a good margin.
-    size_t leading_byte = 0;
-    size_t margin = size;
-    for (; margin > 0 && leading_byte < 8; margin--) {
-      leading_byte += (int8_t(in[margin - 1]) > -65);
-    }
-    // If the input is long enough, then we have that margin-1 is the eight last
-    // leading byte.
-    const size_t safety_margin = size - margin + 1; // to avoid overruns!
-    while (pos + 64 + safety_margin <= size) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      if (input.is_ascii()) {
-        input.store_ascii_as_utf16<endian>(utf16_output);
-        utf16_output += 64;
-        pos += 64;
-      } else {
-        // you might think that a for-loop would work, but under Visual Studio,
-        // it is not good enough.
-        static_assert(
-            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
-                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        auto zero = simd8<uint8_t>{uint8_t(0)};
-        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-        if (utf8_continuation_mask & 1) {
-          return 0; // error
-        }
-        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-        // We process in blocks of up to 12 bytes except possibly
-        // for fast paths which may process up to 16 bytes. For the
-        // slow path to work, we should have at least 12 input bytes left.
-        size_t max_starting_point = (pos + 64) - 12;
-        // Next loop is going to run at least five times.
-        while (pos < max_starting_point) {
-          // Performance note: our ability to compute 'consumed' and
-          // then shift and recompute is critical. If there is a
-          // latency of, say, 4 cycles on getting 'consumed', then
-          // the inner loop might have a total latency of about 6 cycles.
-          // Yet we process between 6 to 12 inputs bytes, thus we get
-          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-          // for this section of the code. Hence, there is a limit
-          // to how much we can further increase this latency before
-          // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_utf16<endian>(
-              in + pos, utf8_end_of_code_point_mask, utf16_output);
-          pos += consumed;
-          utf8_end_of_code_point_mask >>= consumed;
-        }
-        // At this point there may remain between 0 and 12 bytes in the
-        // 64-byte block. These bytes will be processed again. So we have an
-        // 80% efficiency (in the worst case). In practice we expect an
-        // 85% to 90% efficiency.
+      buf += 8;
+      // surrogate pair(s) in a register
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
       }
-    }
-    if (errors()) {
-      return 0;
-    }
-    if (pos < size) {
-      size_t howmany = scalar::utf8_to_utf16::convert<endian>(
-          in + pos, size - pos, utf16_output);
-      if (howmany == 0) {
-        return 0;
+      for (; k < forward; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if ((word & 0xFF80) == 0) {
+          *utf8_output++ = char(word);
+        } else if ((word & 0xF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xF800) != 0xD800) {
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else {
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k - 1),
+                reinterpret_cast<char *>(utf8_output));
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf8_output++ = char((value >> 18) | 0b11110000);
+          *utf8_output++ = char(((value >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((value >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((value & 0b111111) | 0b10000000);
+        }
       }
-      utf16_output += howmany;
+      buf += k;
     }
-    return utf16_output - start;
-  }
+  } // while
 
-  template <endianness endian>
-  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
-                                                   char16_t *utf16_output) {
-    size_t pos = 0;
-    char16_t *start{utf16_output};
-    // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
-    // last 16 bytes, and if the data is valid, then it is entirely safe because
-    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
-    // generally assume that you have valid UTF-8 input, so we are going to go
-    // back from the end counting 8 leading bytes, to give us a good margin.
-    size_t leading_byte = 0;
-    size_t margin = size;
-    for (; margin > 0 && leading_byte < 8; margin--) {
-      leading_byte += (int8_t(in[margin - 1]) > -65);
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char *>(utf8_output));
+}
+/* end file src/lsx/lsx_convert_utf16_to_utf8.cpp */
+/* begin file src/lsx/lsx_convert_utf16_to_utf32.cpp */
+template <endianness big_endian>
+std::pair<const char16_t *, char32_t *>
+lsx_convert_utf16_to_utf32(const char16_t *buf, size_t len,
+                           char32_t *utf32_out) {
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
+  const char16_t *end = buf + len;
+
+  __m128i zero = __lsx_vldi(0);
+  __m128i v_f800 = __lsx_vldi(-2568); /*0xF800*/
+  __m128i v_d800 = __lsx_vldi(-2600); /*0xD800*/
+
+  while (buf + 8 <= end) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 0);
+    if (!match_system(big_endian)) {
+      in = lsx_swap_bytes(in);
     }
-    // If the input is long enough, then we have that margin-1 is the eight last
-    // leading byte.
-    const size_t safety_margin = size - margin + 1; // to avoid overruns!
-    while (pos + 64 + safety_margin <= size) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      if (input.is_ascii()) {
-        input.store_ascii_as_utf16<endian>(utf16_output);
-        utf16_output += 64;
-        pos += 64;
-      } else {
-        // you might think that a for-loop would work, but under Visual Studio,
-        // it is not good enough.
-        static_assert(
-            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
-                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        auto zero = simd8<uint8_t>{uint8_t(0)};
-        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-        if (errors() || (utf8_continuation_mask & 1)) {
-          // rewind_and_convert_with_errors will seek a potential error from
-          // in+pos onward, with the ability to go back up to pos bytes, and
-          // read size-pos bytes forward.
-          result res =
-              scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
-                  pos, in + pos, size - pos, utf16_output);
-          res.count += pos;
-          return res;
-        }
-        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-        // We process in blocks of up to 12 bytes except possibly
-        // for fast paths which may process up to 16 bytes. For the
-        // slow path to work, we should have at least 12 input bytes left.
-        size_t max_starting_point = (pos + 64) - 12;
-        // Next loop is going to run at least five times.
-        while (pos < max_starting_point) {
-          // Performance note: our ability to compute 'consumed' and
-          // then shift and recompute is critical. If there is a
-          // latency of, say, 4 cycles on getting 'consumed', then
-          // the inner loop might have a total latency of about 6 cycles.
-          // Yet we process between 6 to 12 inputs bytes, thus we get
-          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-          // for this section of the code. Hence, there is a limit
-          // to how much we can further increase this latency before
-          // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_utf16<endian>(
-              in + pos, utf8_end_of_code_point_mask, utf16_output);
-          pos += consumed;
-          utf8_end_of_code_point_mask >>= consumed;
+
+    __m128i surrogates_bytemask =
+        __lsx_vseq_h(__lsx_vand_v(in, v_f800), v_d800);
+    // It might seem like checking for surrogates_bitmask == 0xc000 could help.
+    // However, it is likely an uncommon occurrence.
+    if (__lsx_bz_v(surrogates_bytemask)) {
+      // case: no surrogate pairs, extend all 16-bit code units to 32-bit code
+      // units
+      __lsx_vst(__lsx_vilvl_h(zero, in), utf32_output, 0);
+      __lsx_vst(__lsx_vilvh_h(zero, in), utf32_output, 16);
+      utf32_output += 8;
+      buf += 8;
+      // surrogate pair(s) in a register
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if ((word & 0xF800) != 0xD800) {
+          *utf32_output++ = char32_t(word);
+        } else {
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char32_t *>(utf32_output));
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf32_output++ = char32_t(value);
         }
-        // At this point there may remain between 0 and 12 bytes in the
-        // 64-byte block. These bytes will be processed again. So we have an
-        // 80% efficiency (in the worst case). In practice we expect an
-        // 85% to 90% efficiency.
       }
+      buf += k;
     }
-    if (errors()) {
-      // rewind_and_convert_with_errors will seek a potential error from in+pos
-      // onward, with the ability to go back up to pos bytes, and read size-pos
-      // bytes forward.
-      result res =
-          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
-              pos, in + pos, size - pos, utf16_output);
-      res.count += pos;
-      return res;
+  } // while
+  return std::make_pair(buf, reinterpret_cast<char32_t *>(utf32_output));
+}
+
+/*
+  Returns a pair: a result struct and utf32_output.
+  If there is an error, the count field of the result is the position of the
+  error. Otherwise, it is the position of the first unprocessed byte in buf
+  (even if finished). A scalar routine should carry on the conversion of the
+  tail if needed.
+*/
+template <endianness big_endian>
+std::pair<result, char32_t *>
+lsx_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
+                                       char32_t *utf32_out) {
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
+  const char16_t *start = buf;
+  const char16_t *end = buf + len;
+
+  __m128i zero = __lsx_vldi(0);
+  __m128i v_f800 = __lsx_vldi(-2568); /*0xF800*/
+  __m128i v_d800 = __lsx_vldi(-2600); /*0xD800*/
+
+  while (buf + 8 <= end) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 0);
+    if (!match_system(big_endian)) {
+      in = lsx_swap_bytes(in);
     }
-    if (pos < size) {
-      // rewind_and_convert_with_errors will seek a potential error from in+pos
-      // onward, with the ability to go back up to pos bytes, and read size-pos
-      // bytes forward.
-      result res =
-          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
-              pos, in + pos, size - pos, utf16_output);
-      if (res.error) { // In case of error, we want the error position
-        res.count += pos;
-        return res;
-      } else { // In case of success, we want the number of word written
-        utf16_output += res.count;
+
+    __m128i surrogates_bytemask =
+        __lsx_vseq_h(__lsx_vand_v(in, v_f800), v_d800);
+    if (__lsx_bz_v(surrogates_bytemask)) {
+      // case: no surrogate pairs, extend all 16-bit code units to 32-bit code
+      // units
+      __lsx_vst(__lsx_vilvl_h(zero, in), utf32_output, 0);
+      __lsx_vst(__lsx_vilvh_h(zero, in), utf32_output, 16);
+      utf32_output += 8;
+      buf += 8;
+      // surrogate pair(s) in a register
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
+        if ((word & 0xF800) != 0xD800) {
+          *utf32_output++ = char32_t(word);
+        } else {
+          // must be a surrogate pair
+          uint16_t diff = uint16_t(word - 0xD800);
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
+          k++;
+          uint16_t diff2 = uint16_t(next_word - 0xDC00);
+          if ((diff | diff2) > 0x3FF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k - 1),
+                reinterpret_cast<char32_t *>(utf32_output));
+          }
+          uint32_t value = (diff << 10) + diff2 + 0x10000;
+          *utf32_output++ = char32_t(value);
+        }
       }
+      buf += k;
     }
-    return result(error_code::SUCCESS, utf16_output - start);
-  }
+  } // while
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char32_t *>(utf32_output));
+}
+/* end file src/lsx/lsx_convert_utf16_to_utf32.cpp */
 
-  simdutf_really_inline bool errors() const {
-    return this->error.any_bits_set_anywhere();
-  }
+/* begin file src/lsx/lsx_convert_utf32_to_latin1.cpp */
+std::pair<const char32_t *, char *>
+lsx_convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                            char *latin1_output) {
+  const char32_t *end = buf + len;
+  const v16u8 shuf_mask = {0, 4, 8, 12, 16, 20, 24, 28, 0, 0, 0, 0, 0, 0, 0, 0};
+  __m128i v_ff = __lsx_vrepli_w(0xFF);
 
-}; // struct utf8_checker
-} // namespace utf8_to_utf16
-} // unnamed namespace
-} // namespace ppc64
-} // namespace simdutf
-/* end file src/generic/utf8_to_utf16/utf8_to_utf16.h */
-/* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
+  while (buf + 16 <= end) {
+    __m128i in1 = __lsx_vld(reinterpret_cast<const uint32_t *>(buf), 0);
+    __m128i in2 = __lsx_vld(reinterpret_cast<const uint32_t *>(buf), 16);
 
-namespace simdutf {
-namespace ppc64 {
-namespace {
-namespace utf8_to_utf16 {
+    __m128i in12 = __lsx_vor_v(in1, in2);
+    if (__lsx_bz_v(__lsx_vslt_wu(v_ff, in12))) {
+      // 1. pack the bytes
+      __m128i latin1_packed = __lsx_vshuf_b(in2, in1, (__m128i)shuf_mask);
+      // 2. store (8 bytes)
+      __lsx_vst(latin1_packed, reinterpret_cast<uint8_t *>(latin1_output), 0);
+      // 3. adjust pointers
+      buf += 8;
+      latin1_output += 8;
+    } else {
+      return std::make_pair(nullptr, reinterpret_cast<char *>(latin1_output));
+    }
+  } // while
+  return std::make_pair(buf, latin1_output);
+}
 
-using namespace simd;
+std::pair<result, char *>
+lsx_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                        char *latin1_output) {
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
 
-template <endianness endian>
-simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
-                                         char16_t *utf16_output) noexcept {
-  // The implementation is not specific to haswell and should be moved to the
-  // generic directory.
-  size_t pos = 0;
-  char16_t *start{utf16_output};
-  const size_t safety_margin = 16; // to avoid overruns!
-  while (pos + 64 + safety_margin <= size) {
-    // this loop could be unrolled further. For example, we could process the
-    // mask far more than 64 bytes.
-    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
-    if (in.is_ascii()) {
-      in.store_ascii_as_utf16<endian>(utf16_output);
-      utf16_output += 64;
-      pos += 64;
+  const v16u8 shuf_mask = {0, 4, 8, 12, 16, 20, 24, 28, 0, 0, 0, 0, 0, 0, 0, 0};
+  __m128i v_ff = __lsx_vrepli_w(0xFF);
+
+  while (buf + 16 <= end) {
+    __m128i in1 = __lsx_vld(reinterpret_cast<const uint32_t *>(buf), 0);
+    __m128i in2 = __lsx_vld(reinterpret_cast<const uint32_t *>(buf), 16);
+
+    __m128i in12 = __lsx_vor_v(in1, in2);
+
+    if (__lsx_bz_v(__lsx_vslt_wu(v_ff, in12))) {
+      // 1. pack the bytes
+      __m128i latin1_packed = __lsx_vshuf_b(in2, in1, (__m128i)shuf_mask);
+      // 2. store (8 bytes)
+      __lsx_vst(latin1_packed, reinterpret_cast<uint8_t *>(latin1_output), 0);
+      // 3. adjust pointers
+      buf += 8;
+      latin1_output += 8;
     } else {
-      // Slow path. We hope that the compiler will recognize that this is a slow
-      // path. Anything that is not a continuation mask is a 'leading byte',
-      // that is, the start of a new code point.
-      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
-      // -65 is 0b10111111 in two-complement's, so largest possible continuation
-      // byte
-      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-      // The *start* of code points is not so useful, rather, we want the *end*
-      // of code points.
-      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-      // We process in blocks of up to 12 bytes except possibly
-      // for fast paths which may process up to 16 bytes. For the
-      // slow path to work, we should have at least 12 input bytes left.
-      size_t max_starting_point = (pos + 64) - 12;
-      // Next loop is going to run at least five times when using solely
-      // the slow/regular path, and at least four times if there are fast paths.
-      while (pos < max_starting_point) {
-        // Performance note: our ability to compute 'consumed' and
-        // then shift and recompute is critical. If there is a
-        // latency of, say, 4 cycles on getting 'consumed', then
-        // the inner loop might have a total latency of about 6 cycles.
-        // Yet we process between 6 to 12 inputs bytes, thus we get
-        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-        // for this section of the code. Hence, there is a limit
-        // to how much we can further increase this latency before
-        // it seriously harms performance.
-        //
-        // Thus we may allow convert_masked_utf8_to_utf16 to process
-        // more bytes at a time under a fast-path mode where 16 bytes
-        // are consumed at once (e.g., when encountering ASCII).
-        size_t consumed = convert_masked_utf8_to_utf16<endian>(
-            input + pos, utf8_end_of_code_point_mask, utf16_output);
-        pos += consumed;
-        utf8_end_of_code_point_mask >>= consumed;
+      // Let us do a scalar fallback.
+      for (int k = 0; k < 8; k++) {
+        uint32_t word = buf[k];
+        if (word <= 0xff) {
+          *latin1_output++ = char(word);
+        } else {
+          return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
+                                latin1_output);
+        }
       }
-      // At this point there may remain between 0 and 12 bytes in the
-      // 64-byte block. These bytes will be processed again. So we have an
-      // 80% efficiency (in the worst case). In practice we expect an
-      // 85% to 90% efficiency.
     }
-  }
-  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(
-      input + pos, size - pos, utf16_output);
-  return utf16_output - start;
+  } // while
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        latin1_output);
 }
+/* end file src/lsx/lsx_convert_utf32_to_latin1.cpp */
+/* begin file src/lsx/lsx_convert_utf32_to_utf8.cpp */
+std::pair<const char32_t *, char *>
+lsx_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char32_t *end = buf + len;
 
-} // namespace utf8_to_utf16
-} // unnamed namespace
-} // namespace ppc64
-} // namespace simdutf
-/* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
-// transcoding from UTF-8 to UTF-32
-/* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
+  __m128i v_c080 = __lsx_vreplgr2vr_h(uint16_t(0xC080));
+  __m128i v_07ff = __lsx_vreplgr2vr_h(uint16_t(0x7FF));
+  __m128i v_dfff = __lsx_vreplgr2vr_h(uint16_t(0xDFFF));
+  __m128i v_d800 = __lsx_vldi(-2600); /*0xD800*/
+  __m128i forbidden_bytemask = __lsx_vldi(0x0);
 
-namespace simdutf {
-namespace ppc64 {
-namespace {
-namespace utf8_to_utf32 {
-using namespace simd;
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
 
-simdutf_really_inline simd8<uint8_t>
-check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
-  // Bit 1 = Too Long (ASCII followed by continuation)
-  // Bit 2 = Overlong 3-byte
-  // Bit 4 = Surrogate
-  // Bit 5 = Overlong 2-byte
-  // Bit 7 = Two Continuations
-  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
-                                               // 11______ 11______
-  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
-  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
-  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
-  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
-  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
-  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
-                                               // 11110100 101_____
-                                               // 11110101 1001____
-                                               // 11110101 101_____
-                                               // 1111011_ 1001____
-                                               // 1111011_ 101_____
-                                               // 11111___ 1001____
-                                               // 11111___ 101_____
-  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
-  // 11110101 1000____
-  // 1111011_ 1000____
-  // 11111___ 1000____
-  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+  while (buf + 16 + safety_margin < end) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint32_t *>(buf), 0);
+    __m128i nextin = __lsx_vld(reinterpret_cast<const uint32_t *>(buf), 16);
 
-  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
-      // 0_______ ________ <ASCII in byte 1>
-      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
-      TOO_LONG,
-      // 10______ ________ <continuation in byte 1>
-      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
-      // 1100____ ________ <two byte lead in byte 1>
-      TOO_SHORT | OVERLONG_2,
-      // 1101____ ________ <two byte lead in byte 1>
-      TOO_SHORT,
-      // 1110____ ________ <three byte lead in byte 1>
-      TOO_SHORT | OVERLONG_3 | SURROGATE,
-      // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
-  constexpr const uint8_t CARRY =
-      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
-  const simd8<uint8_t> byte_1_low =
-      (prev1 & 0x0F)
-          .lookup_16<uint8_t>(
-              // ____0000 ________
-              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
-              // ____0001 ________
-              CARRY | OVERLONG_2,
-              // ____001_ ________
-              CARRY, CARRY,
+    // Check if no bits set above 16th
+    if (__lsx_bz_v(__lsx_vpickod_h(in, nextin))) {
+      // Pack UTF-32 to UTF-16 safely (without surrogate pairs)
+      // Apply UTF-16 => UTF-8 routine (lsx_convert_utf16_to_utf8.cpp)
+      __m128i utf16_packed = __lsx_vpickev_h(nextin, in);
 
-              // ____0100 ________
-              CARRY | TOO_LARGE,
-              // ____0101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              // ____011_ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
+      if (__lsx_bz_v(__lsx_vslt_hu(__lsx_vrepli_h(0x7F),
+                                   utf16_packed))) { // ASCII fast path!!!!
+        // 1. pack the bytes
+        // obviously suboptimal.
+        __m128i utf8_packed = __lsx_vpickev_b(utf16_packed, utf16_packed);
+        // 2. store (16 bytes are written; only the low 8 are valid)
+        __lsx_vst(utf8_packed, utf8_output, 0);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        continue; // we are done for this round!
+      }
+      __m128i zero = __lsx_vldi(0);
+      if (__lsx_bz_v(__lsx_vslt_hu(v_07ff, utf16_packed))) {
+        // 1. prepare 2-byte values
+        // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+        // expected output   : [110a|aaaa|10bb|bbbb] x 8
 
-              // ____1___ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              // ____1101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000);
-  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
-      // ________ 0_______ <ASCII in byte 2>
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
-      TOO_SHORT, TOO_SHORT,
+        // t0 = [000a|aaaa|bbbb|bb00]
+        const __m128i t0 = __lsx_vslli_h(utf16_packed, 2);
+        // t1 = [000a|aaaa|0000|0000]
+        const __m128i t1 = __lsx_vand_v(t0, __lsx_vldi(-2785 /*0x1f00*/));
+        // t2 = [0000|0000|00bb|bbbb]
+        const __m128i t2 = __lsx_vand_v(utf16_packed, __lsx_vrepli_h(0x3f));
+        // t3 = [000a|aaaa|00bb|bbbb]
+        const __m128i t3 = __lsx_vor_v(t1, t2);
+        // t4 = [110a|aaaa|10bb|bbbb]
+        const __m128i t4 = __lsx_vor_v(t3, v_c080);
+        // 2. merge ASCII and 2-byte codewords
+        __m128i one_byte_bytemask =
+            __lsx_vsle_hu(utf16_packed, __lsx_vrepli_h(0x7F /*0x007F*/));
+        __m128i utf8_unpacked =
+            __lsx_vbitsel_v(t4, utf16_packed, one_byte_bytemask);
+        // 3. prepare bitmask for 8-bit lookup
+        uint32_t m2 =
+            __lsx_vpickve2gr_bu(__lsx_vmskltz_h(one_byte_bytemask), 0);
+        // 4. pack the bytes
+        const uint8_t *row =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                [lsx_1_2_utf8_bytes_mask[m2]][0];
+        __m128i shuffle = __lsx_vld(row, 1);
+        __m128i utf8_packed = __lsx_vshuf_b(zero, utf8_unpacked, shuffle);
+        // 5. store bytes
+        __lsx_vst(utf8_packed, utf8_output, 0);
 
-      // ________ 1000____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
-          OVERLONG_4,
-      // ________ 1001____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
-      // ________ 101_____
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
-      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+        // 6. adjust pointers
+        buf += 8;
+        utf8_output += row[0];
+        continue;
+      } else {
+        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+        forbidden_bytemask = __lsx_vor_v(
+            __lsx_vand_v(
+                __lsx_vsle_h(utf16_packed, v_dfff),  // utf16_packed <= 0xdfff
+                __lsx_vsle_h(v_d800, utf16_packed)), // utf16_packed >= 0xd800
+            forbidden_bytemask);
+        /* In this branch we handle three cases:
+    1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single
+    UTF-8 byte
+    2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
+    UTF-8 bytes
+    3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three
+    UTF-8 bytes
+
+    We expand the input word (16-bit) into two code units (32-bit), thus
+    we have room for four bytes. However, we need five distinct bit
+    layouts. Note that the last byte in cases #2 and #3 is the same.
+
+    We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+    in register t2.
+
+    We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+    either byte 1 for case #2 or byte 2 for case #3. Note that they
+    differ by exactly one bit.
+
+    Finally, from these two code units we build a proper UTF-8 sequence,
+    taking into account the case (i.e., the number of bytes to write).
+  */
+        /**
+         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+         * t2 => [0ccc|cccc] [10cc|cccc]
+         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+         */
+        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+        __m128i t0 = __lsx_vpickev_b(utf16_packed, utf16_packed);
+        t0 = __lsx_vilvl_b(t0, t0);
+        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+        __m128i v_3f7f = __lsx_vreplgr2vr_h(uint16_t(0x3F7F));
+        __m128i t1 = __lsx_vand_v(t0, v_3f7f);
+        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+        __m128i t2 = __lsx_vor_v(t1, __lsx_vldi(-2688 /*0x8000*/));
 
-      // ________ 11______
-      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
-  return (byte_1_high & byte_1_low & byte_2_high);
-}
-simdutf_really_inline simd8<uint8_t>
-check_multibyte_lengths(const simd8<uint8_t> input,
-                        const simd8<uint8_t> prev_input,
-                        const simd8<uint8_t> sc) {
-  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-  simd8<uint8_t> must23 =
-      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-  return must23_80 ^ sc;
-}
+        // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+        __m128i s0 = __lsx_vsrli_h(utf16_packed, 12);
+        // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
+        __m128i s1 = __lsx_vslli_h(utf16_packed, 2);
+        // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
+        s1 = __lsx_vand_v(s1, __lsx_vldi(-2753 /*0x3F00*/));
+        // [00bb|bbbb|0000|aaaa]
+        __m128i s2 = __lsx_vor_v(s0, s1);
+        // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+        __m128i v_c0e0 = __lsx_vreplgr2vr_h(uint16_t(0xC0E0));
+        __m128i s3 = __lsx_vor_v(s2, v_c0e0);
+        // v_07ff (0x07FF in every 16-bit lane) is defined at function entry.
+        __m128i one_or_two_bytes_bytemask = __lsx_vsle_hu(utf16_packed, v_07ff);
+        __m128i m0 = __lsx_vandn_v(one_or_two_bytes_bytemask,
+                                   __lsx_vldi(-2752 /*0x4000*/));
+        __m128i s4 = __lsx_vxor_v(s3, m0);
 
-struct validating_transcoder {
-  // If this is nonzero, there has been a UTF-8 error.
-  simd8<uint8_t> error;
+        // 4. expand code units 16-bit => 32-bit
+        __m128i out0 = __lsx_vilvl_h(s4, t2);
+        __m128i out1 = __lsx_vilvh_h(s4, t2);
 
-  validating_transcoder() : error(uint8_t(0)) {}
-  //
-  // Check whether the current bytes are valid UTF-8.
-  //
-  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
-                                              const simd8<uint8_t> prev_input) {
-    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
-    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
-    // small negative numbers)
-    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-    simd8<uint8_t> sc = check_special_cases(input, prev1);
-    this->error |= check_multibyte_lengths(input, prev_input, sc);
-  }
+        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+        __m128i one_byte_bytemask =
+            __lsx_vsle_hu(utf16_packed, __lsx_vrepli_h(0x7F));
+
+        __m128i one_or_two_bytes_bytemask_u16_to_u32_low =
+            __lsx_vilvl_h(one_or_two_bytes_bytemask, zero);
+        __m128i one_or_two_bytes_bytemask_u16_to_u32_high =
+            __lsx_vilvh_h(one_or_two_bytes_bytemask, zero);
+
+        __m128i one_byte_bytemask_u16_to_u32_low =
+            __lsx_vilvl_h(one_byte_bytemask, one_byte_bytemask);
+        __m128i one_byte_bytemask_u16_to_u32_high =
+            __lsx_vilvh_h(one_byte_bytemask, one_byte_bytemask);
+
+        const uint32_t mask0 =
+            __lsx_vpickve2gr_bu(__lsx_vmskltz_h(__lsx_vor_v(
+                                    one_or_two_bytes_bytemask_u16_to_u32_low,
+                                    one_byte_bytemask_u16_to_u32_low)),
+                                0);
+        const uint32_t mask1 =
+            __lsx_vpickve2gr_bu(__lsx_vmskltz_h(__lsx_vor_v(
+                                    one_or_two_bytes_bytemask_u16_to_u32_high,
+                                    one_byte_bytemask_u16_to_u32_high)),
+                                0);
 
-  simdutf_really_inline size_t convert(const char *in, size_t size,
-                                       char32_t *utf32_output) {
-    size_t pos = 0;
-    char32_t *start{utf32_output};
-    // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 words when calling convert_masked_utf8_to_utf32. If you skip the
-    // last 16 bytes, and if the data is valid, then it is entirely safe because
-    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
-    // generally assume that you have valid UTF-8 input, so we are going to go
-    // back from the end counting 16 leading bytes, to give us a good margin.
-    size_t leading_byte = 0;
-    size_t margin = size;
-    for (; margin > 0 && leading_byte < 8; margin--) {
-      leading_byte += (int8_t(in[margin - 1]) > -65);
-    }
-    // If the input is long enough, then we have that margin-1 is the fourth
-    // last leading byte.
-    const size_t safety_margin = size - margin + 1; // to avoid overruns!
-    while (pos + 64 + safety_margin <= size) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      if (input.is_ascii()) {
-        input.store_ascii_as_utf32(utf32_output);
-        utf32_output += 64;
-        pos += 64;
-      } else {
-        // you might think that a for-loop would work, but under Visual Studio,
-        // it is not good enough.
-        static_assert(
-            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
-                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        auto zero = simd8<uint8_t>{uint8_t(0)};
-        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-        if (utf8_continuation_mask & 1) {
-          return 0; // we have an error
-        }
-        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-        // We process in blocks of up to 12 bytes except possibly
-        // for fast paths which may process up to 16 bytes. For the
-        // slow path to work, we should have at least 12 input bytes left.
-        size_t max_starting_point = (pos + 64) - 12;
-        // Next loop is going to run at least five times.
-        while (pos < max_starting_point) {
-          // Performance note: our ability to compute 'consumed' and
-          // then shift and recompute is critical. If there is a
-          // latency of, say, 4 cycles on getting 'consumed', then
-          // the inner loop might have a total latency of about 6 cycles.
-          // Yet we process between 6 to 12 inputs bytes, thus we get
-          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-          // for this section of the code. Hence, there is a limit
-          // to how much we can further increase this latency before
-          // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_utf32(
-              in + pos, utf8_end_of_code_point_mask, utf32_output);
-          pos += consumed;
-          utf8_end_of_code_point_mask >>= consumed;
-        }
-        // At this point there may remain between 0 and 12 bytes in the
-        // 64-byte block. These bytes will be processed again. So we have an
-        // 80% efficiency (in the worst case). In practice we expect an
-        // 85% to 90% efficiency.
+        const uint8_t *row0 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+        __m128i shuffle0 = __lsx_vld(row0, 1);
+        __m128i utf8_0 = __lsx_vshuf_b(zero, out0, shuffle0);
+
+        const uint8_t *row1 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+        __m128i shuffle1 = __lsx_vld(row1, 1);
+        __m128i utf8_1 = __lsx_vshuf_b(zero, out1, shuffle1);
+
+        __lsx_vst(utf8_0, utf8_output, 0);
+        utf8_output += row0[0];
+        __lsx_vst(utf8_1, utf8_output, 0);
+        utf8_output += row1[0];
+
+        buf += 8;
       }
-    }
-    if (errors()) {
-      return 0;
-    }
-    if (pos < size) {
-      size_t howmany =
-          scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
-      if (howmany == 0) {
-        return 0;
+      // At least one 32-bit word will produce a surrogate pair in UTF-16 <=>
+      // will produce four UTF-8 bytes.
+    } else {
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFFFF80) == 0) {
+          *utf8_output++ = char(word);
+        } else if ((word & 0xFFFFF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) {
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char *>(utf8_output));
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else {
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char *>(utf8_output));
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        }
       }
-      utf32_output += howmany;
+      buf += k;
     }
-    return utf32_output - start;
+  } // while
+
+  // check for invalid input
+  if (__lsx_bnz_v(forbidden_bytemask)) {
+    return std::make_pair(nullptr, reinterpret_cast<char *>(utf8_output));
   }
+  return std::make_pair(buf, reinterpret_cast<char *>(utf8_output));
+}
 
-  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
-                                                   char32_t *utf32_output) {
-    size_t pos = 0;
-    char32_t *start{utf32_output};
-    // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the
-    // last 16 bytes, and if the data is valid, then it is entirely safe because
-    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
-    // generally assume that you have valid UTF-8 input, so we are going to go
-    // back from the end counting 8 leading bytes, to give us a good margin.
-    size_t leading_byte = 0;
-    size_t margin = size;
-    for (; margin > 0 && leading_byte < 8; margin--) {
-      leading_byte += (int8_t(in[margin - 1]) > -65);
-    }
-    // If the input is long enough, then we have that margin-1 is the fourth
-    // last leading byte.
-    const size_t safety_margin = size - margin + 1; // to avoid overruns!
-    while (pos + 64 + safety_margin <= size) {
-      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-      if (input.is_ascii()) {
-        input.store_ascii_as_utf32(utf32_output);
-        utf32_output += 64;
-        pos += 64;
+std::pair<result, char *>
+lsx_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
+                                      char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
+
+  __m128i v_c080 = __lsx_vreplgr2vr_h(uint16_t(0xC080));
+  __m128i v_07ff = __lsx_vreplgr2vr_h(uint16_t(0x7FF));
+  __m128i v_dfff = __lsx_vreplgr2vr_h(uint16_t(0xDFFF));
+  __m128i v_d800 = __lsx_vldi(-2600); /*0xD800*/
+  __m128i forbidden_bytemask = __lsx_vldi(0x0);
+  const size_t safety_margin =
+      12; // to avoid overruns, see issue
+          // https://github.com/simdutf/simdutf/issues/92
+
+  while (buf + 16 + safety_margin < end) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint32_t *>(buf), 0);
+    __m128i nextin = __lsx_vld(reinterpret_cast<const uint32_t *>(buf), 16);
+
+    // Check if no bits set above 16th
+    if (__lsx_bz_v(__lsx_vpickod_h(in, nextin))) {
+      // Pack UTF-32 to UTF-16 safely (without surrogate pairs)
+      // Apply UTF-16 => UTF-8 routine (lsx_convert_utf16_to_utf8.cpp)
+      __m128i utf16_packed = __lsx_vpickev_h(nextin, in);
+
+      if (__lsx_bz_v(__lsx_vslt_hu(__lsx_vrepli_h(0x7F),
+                                   utf16_packed))) { // ASCII fast path!!!!
+        // 1. pack the bytes
+        // obviously suboptimal.
+        __m128i utf8_packed = __lsx_vpickev_b(utf16_packed, utf16_packed);
+        // 2. store (16 bytes are written; only the low 8 are valid)
+        __lsx_vst(utf8_packed, utf8_output, 0);
+        // 3. adjust pointers
+        buf += 8;
+        utf8_output += 8;
+        continue; // we are done for this round!
+      }
+      __m128i zero = __lsx_vldi(0);
+      if (__lsx_bz_v(__lsx_vslt_hu(v_07ff, utf16_packed))) {
+        // 1. prepare 2-byte values
+        // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
+        // expected output   : [110a|aaaa|10bb|bbbb] x 8
+
+        // t0 = [000a|aaaa|bbbb|bb00]
+        const __m128i t0 = __lsx_vslli_h(utf16_packed, 2);
+        // t1 = [000a|aaaa|0000|0000]
+        const __m128i t1 = __lsx_vand_v(t0, __lsx_vldi(-2785 /*0x1f00*/));
+        // t2 = [0000|0000|00bb|bbbb]
+        const __m128i t2 = __lsx_vand_v(utf16_packed, __lsx_vrepli_h(0x3f));
+        // t3 = [000a|aaaa|00bb|bbbb]
+        const __m128i t3 = __lsx_vor_v(t1, t2);
+        // t4 = [110a|aaaa|10bb|bbbb]
+        const __m128i t4 = __lsx_vor_v(t3, v_c080);
+        // 2. merge ASCII and 2-byte codewords
+        __m128i one_byte_bytemask =
+            __lsx_vsle_hu(utf16_packed, __lsx_vrepli_h(0x7F /*0x007F*/));
+        __m128i utf8_unpacked =
+            __lsx_vbitsel_v(t4, utf16_packed, one_byte_bytemask);
+        // 3. prepare bitmask for 8-bit lookup
+        uint32_t m2 =
+            __lsx_vpickve2gr_bu(__lsx_vmskltz_h(one_byte_bytemask), 0);
+        // 4. pack the bytes
+        const uint8_t *row =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                [lsx_1_2_utf8_bytes_mask[m2]][0];
+        __m128i shuffle = __lsx_vld(row, 1);
+        __m128i utf8_packed = __lsx_vshuf_b(zero, utf8_unpacked, shuffle);
+        // 5. store bytes
+        __lsx_vst(utf8_packed, utf8_output, 0);
+
+        // 6. adjust pointers
+        buf += 8;
+        utf8_output += row[0];
+        continue;
       } else {
-        // you might think that a for-loop would work, but under Visual Studio,
-        // it is not good enough.
-        static_assert(
-            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
-                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
-            "We support either two or four chunks per 64-byte block.");
-        auto zero = simd8<uint8_t>{uint8_t(0)};
-        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
-          this->check_utf8_bytes(input.chunks[0], zero);
-          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
-          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
-          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
-        }
-        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-        if (errors() || (utf8_continuation_mask & 1)) {
-          result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
-              pos, in + pos, size - pos, utf32_output);
-          res.count += pos;
-          return res;
-        }
-        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-        // We process in blocks of up to 12 bytes except possibly
-        // for fast paths which may process up to 16 bytes. For the
-        // slow path to work, we should have at least 12 input bytes left.
-        size_t max_starting_point = (pos + 64) - 12;
-        // Next loop is going to run at least five times.
-        while (pos < max_starting_point) {
-          // Performance note: our ability to compute 'consumed' and
-          // then shift and recompute is critical. If there is a
-          // latency of, say, 4 cycles on getting 'consumed', then
-          // the inner loop might have a total latency of about 6 cycles.
-          // Yet we process between 6 to 12 inputs bytes, thus we get
-          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-          // for this section of the code. Hence, there is a limit
-          // to how much we can further increase this latency before
-          // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_utf32(
-              in + pos, utf8_end_of_code_point_mask, utf32_output);
-          pos += consumed;
-          utf8_end_of_code_point_mask >>= consumed;
+        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+        forbidden_bytemask = __lsx_vor_v(
+            __lsx_vand_v(
+                __lsx_vsle_h(utf16_packed, v_dfff),  // utf16_packed <= 0xdfff
+                __lsx_vsle_h(v_d800, utf16_packed)), // utf16_packed >= 0xd800
+            forbidden_bytemask);
+        if (__lsx_bnz_v(forbidden_bytemask)) {
+          return std::make_pair(result(error_code::SURROGATE, buf - start),
+                                reinterpret_cast<char *>(utf8_output));
         }
-        // At this point there may remain between 0 and 12 bytes in the
-        // 64-byte block. These bytes will be processed again. So we have an
-        // 80% efficiency (in the worst case). In practice we expect an
-        // 85% to 90% efficiency.
-      }
-    }
-    if (errors()) {
-      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
-          pos, in + pos, size - pos, utf32_output);
-      res.count += pos;
-      return res;
-    }
-    if (pos < size) {
-      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
-          pos, in + pos, size - pos, utf32_output);
-      if (res.error) { // In case of error, we want the error position
-        res.count += pos;
-        return res;
-      } else { // In case of success, we want the number of word written
-        utf32_output += res.count;
-      }
-    }
-    return result(error_code::SUCCESS, utf32_output - start);
-  }
+        /* In this branch we handle three cases:
+    1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           - single
+    UTF-8 byte
+    2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
+    UTF-8 bytes
+    3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] - three
+    UTF-8 bytes
+
+    We expand the input word (16-bit) into two code units (32-bit), thus
+    we have room for four bytes. However, we need five distinct bit
+    layouts. Note that the last byte in cases #2 and #3 is the same.
+
+    We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+    in register t2.
+
+    We precompute byte 1 for case #3 and -- **conditionally** -- precompute
+    either byte 1 for case #2 or byte 2 for case #3. Note that they
+    differ by exactly one bit.
+
+    Finally, from these two code units we build a proper UTF-8 sequence, taking
+    into account the case (i.e., the number of bytes to write).
+  */
+        /**
+         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+         * t2 => [0ccc|cccc] [10cc|cccc]
+         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+         */
+        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+        __m128i t0 = __lsx_vpickev_b(utf16_packed, utf16_packed);
+        t0 = __lsx_vilvl_b(t0, t0);
+        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+        __m128i v_3f7f = __lsx_vreplgr2vr_h(uint16_t(0x3F7F));
+        __m128i t1 = __lsx_vand_v(t0, v_3f7f);
+        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+        __m128i t2 = __lsx_vor_v(t1, __lsx_vldi(-2688 /*0x8000*/));
 
-  simdutf_really_inline bool errors() const {
-    return this->error.any_bits_set_anywhere();
-  }
+        // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+        __m128i s0 = __lsx_vsrli_h(utf16_packed, 12);
+        // s1: [aaaa|bbbb|bbcc|cccc] => [aabb|bbbb|cccc|cc00]
+        __m128i s1 = __lsx_vslli_h(utf16_packed, 2);
+        // [aabb|bbbb|cccc|cc00] => [00bb|bbbb|0000|0000]
+        s1 = __lsx_vand_v(s1, __lsx_vldi(-2753 /*0x3F00*/));
+        // [00bb|bbbb|0000|aaaa]
+        __m128i s2 = __lsx_vor_v(s0, s1);
+        // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+        __m128i v_c0e0 = __lsx_vreplgr2vr_h(uint16_t(0xC0E0));
+        __m128i s3 = __lsx_vor_v(s2, v_c0e0);
+        // __m128i v_07ff = vmovq_n_u16((uint16_t)0x07FF);
+        __m128i one_or_two_bytes_bytemask = __lsx_vsle_hu(utf16_packed, v_07ff);
+        __m128i m0 = __lsx_vandn_v(one_or_two_bytes_bytemask,
+                                   __lsx_vldi(-2752 /*0x4000*/));
+        __m128i s4 = __lsx_vxor_v(s3, m0);
 
-}; // struct utf8_checker
-} // namespace utf8_to_utf32
-} // unnamed namespace
-} // namespace ppc64
-} // namespace simdutf
-/* end file src/generic/utf8_to_utf32/utf8_to_utf32.h */
-/* begin file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
+        // 4. expand code units 16-bit => 32-bit
+        __m128i out0 = __lsx_vilvl_h(s4, t2);
+        __m128i out1 = __lsx_vilvh_h(s4, t2);
 
-namespace simdutf {
-namespace ppc64 {
-namespace {
-namespace utf8_to_utf32 {
+        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+        __m128i one_byte_bytemask =
+            __lsx_vsle_hu(utf16_packed, __lsx_vrepli_h(0x7F));
+
+        __m128i one_or_two_bytes_bytemask_u16_to_u32_low =
+            __lsx_vilvl_h(one_or_two_bytes_bytemask, zero);
+        __m128i one_or_two_bytes_bytemask_u16_to_u32_high =
+            __lsx_vilvh_h(one_or_two_bytes_bytemask, zero);
+
+        __m128i one_byte_bytemask_u16_to_u32_low =
+            __lsx_vilvl_h(one_byte_bytemask, one_byte_bytemask);
+        __m128i one_byte_bytemask_u16_to_u32_high =
+            __lsx_vilvh_h(one_byte_bytemask, one_byte_bytemask);
+
+        const uint32_t mask0 =
+            __lsx_vpickve2gr_bu(__lsx_vmskltz_h(__lsx_vor_v(
+                                    one_or_two_bytes_bytemask_u16_to_u32_low,
+                                    one_byte_bytemask_u16_to_u32_low)),
+                                0);
+        const uint32_t mask1 =
+            __lsx_vpickve2gr_bu(__lsx_vmskltz_h(__lsx_vor_v(
+                                    one_or_two_bytes_bytemask_u16_to_u32_high,
+                                    one_byte_bytemask_u16_to_u32_high)),
+                                0);
 
-using namespace simd;
+        const uint8_t *row0 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
+        __m128i shuffle0 = __lsx_vld(row0, 1);
+        __m128i utf8_0 = __lsx_vshuf_b(zero, out0, shuffle0);
 
-simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
-                                         char32_t *utf32_output) noexcept {
-  size_t pos = 0;
-  char32_t *start{utf32_output};
-  const size_t safety_margin = 16; // to avoid overruns!
-  while (pos + 64 + safety_margin <= size) {
-    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
-    if (in.is_ascii()) {
-      in.store_ascii_as_utf32(utf32_output);
-      utf32_output += 64;
-      pos += 64;
+        const uint8_t *row1 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
+        __m128i shuffle1 = __lsx_vld(row1, 1);
+        __m128i utf8_1 = __lsx_vshuf_b(zero, out1, shuffle1);
+
+        __lsx_vst(utf8_0, utf8_output, 0);
+        utf8_output += row0[0];
+        __lsx_vst(utf8_1, utf8_output, 0);
+        utf8_output += row1[0];
+
+        buf += 8;
+      }
+      // At least one 32-bit word will produce a surrogate pair in UTF-16 <=>
+      // will produce four UTF-8 bytes.
     } else {
-      // -65 is 0b10111111 in two-complement's, so largest possible continuation
-      // byte
-      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
-      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-      size_t max_starting_point = (pos + 64) - 12;
-      while (pos < max_starting_point) {
-        size_t consumed = convert_masked_utf8_to_utf32(
-            input + pos, utf8_end_of_code_point_mask, utf32_output);
-        pos += consumed;
-        utf8_end_of_code_point_mask >>= consumed;
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
+      size_t forward = 15;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
       }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFFFF80) == 0) {
+          *utf8_output++ = char(word);
+        } else if ((word & 0xFFFFF800) == 0) {
+          *utf8_output++ = char((word >> 6) | 0b11000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else if ((word & 0xFFFF0000) == 0) {
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k),
+                reinterpret_cast<char *>(utf8_output));
+          }
+          *utf8_output++ = char((word >> 12) | 0b11100000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        } else {
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k),
+                reinterpret_cast<char *>(utf8_output));
+          }
+          *utf8_output++ = char((word >> 18) | 0b11110000);
+          *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+          *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+          *utf8_output++ = char((word & 0b111111) | 0b10000000);
+        }
+      }
+      buf += k;
     }
-  }
-  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos,
-                                                       utf32_output);
-  return utf32_output - start;
+  } // while
+
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char *>(utf8_output));
 }
+/* end file src/lsx/lsx_convert_utf32_to_utf8.cpp */
+/* begin file src/lsx/lsx_convert_utf32_to_utf16.cpp */
+template <endianness big_endian>
+std::pair<const char32_t *, char16_t *>
+lsx_convert_utf32_to_utf16(const char32_t *buf, size_t len,
+                           char16_t *utf16_out) {
+  uint16_t *utf16_output = reinterpret_cast<uint16_t *>(utf16_out);
+  const char32_t *end = buf + len;
 
-} // namespace utf8_to_utf32
-} // unnamed namespace
-} // namespace ppc64
-} // namespace simdutf
-/* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
-// other functions
-/* begin file src/generic/utf16.h */
-namespace simdutf {
-namespace ppc64 {
-namespace {
-namespace utf16 {
+  __m128i forbidden_bytemask = __lsx_vrepli_h(0);
+  __m128i v_d800 = __lsx_vldi(-2600); /*0xD800*/
+  __m128i v_dfff = __lsx_vreplgr2vr_h(uint16_t(0xdfff));
+  while (buf + 8 <= end) {
+    __m128i in0 = __lsx_vld(reinterpret_cast<const uint32_t *>(buf), 0);
+    __m128i in1 = __lsx_vld(reinterpret_cast<const uint32_t *>(buf), 16);
 
-template <endianness big_endian>
-simdutf_really_inline size_t count_code_points(const char16_t *in,
-                                               size_t size) {
-  size_t pos = 0;
-  size_t count = 0;
-  for (; pos < size / 32 * 32; pos += 32) {
-    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-    if (!match_system(big_endian)) {
-      input.swap_bytes();
-    }
-    uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
-    count += count_ones(not_pair) / 2;
-  }
-  return count +
-         scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
-}
+    // Check that no bits are set above the 16th
+    if (__lsx_bz_v(__lsx_vpickod_h(in1, in0))) {
+      __m128i utf16_packed = __lsx_vpickev_h(in1, in0);
+      forbidden_bytemask = __lsx_vor_v(
+          __lsx_vand_v(
+              __lsx_vsle_h(utf16_packed, v_dfff),  // utf16_packed <= 0xdfff
+              __lsx_vsle_h(v_d800, utf16_packed)), // utf16_packed >= 0xd800
+          forbidden_bytemask);
 
-template <endianness big_endian>
-simdutf_really_inline size_t utf8_length_from_utf16(const char16_t *in,
-                                                    size_t size) {
-  size_t pos = 0;
-  size_t count = 0;
-  // This algorithm could no doubt be improved!
-  for (; pos < size / 32 * 32; pos += 32) {
-    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-    if (!match_system(big_endian)) {
-      input.swap_bytes();
+      if (!match_system(big_endian)) {
+        utf16_packed = lsx_swap_bytes(utf16_packed);
+      }
+      __lsx_vst(utf16_packed, utf16_output, 0);
+      utf16_output += 8;
+      buf += 8;
+    } else {
+      size_t forward = 3;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFF0000) == 0) {
+          // will not generate a surrogate pair
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char16_t *>(utf16_output));
+          }
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(word >> 8 | word << 8)
+                                : char16_t(word);
+        } else {
+          // will generate a surrogate pair
+          if (word > 0x10FFFF) {
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char16_t *>(utf16_output));
+          }
+          word -= 0x10000;
+          uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
+          uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
+          if (!match_system(big_endian)) {
+            high_surrogate =
+                uint16_t(high_surrogate >> 8 | high_surrogate << 8);
+            low_surrogate = uint16_t(low_surrogate << 8 | low_surrogate >> 8);
+          }
+          *utf16_output++ = char16_t(high_surrogate);
+          *utf16_output++ = char16_t(low_surrogate);
+        }
+      }
+      buf += k;
     }
-    uint64_t ascii_mask = input.lteq(0x7F);
-    uint64_t twobyte_mask = input.lteq(0x7FF);
-    uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
+  }
 
-    size_t ascii_count = count_ones(ascii_mask) / 2;
-    size_t twobyte_count = count_ones(twobyte_mask & ~ascii_mask) / 2;
-    size_t threebyte_count = count_ones(not_pair_mask & ~twobyte_mask) / 2;
-    size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
-    count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count +
-             ascii_count;
+  // check for invalid input
+  if (__lsx_bnz_v(forbidden_bytemask)) {
+    return std::make_pair(nullptr, reinterpret_cast<char16_t *>(utf16_output));
   }
-  return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos,
-                                                                   size - pos);
+  return std::make_pair(buf, reinterpret_cast<char16_t *>(utf16_output));
 }
 
 template <endianness big_endian>
-simdutf_really_inline size_t utf32_length_from_utf16(const char16_t *in,
-                                                     size_t size) {
-  return count_code_points<big_endian>(in, size);
-}
-
-simdutf_really_inline void
-change_endianness_utf16(const char16_t *in, size_t size, char16_t *output) {
-  size_t pos = 0;
-
-  while (pos < size / 32 * 32) {
-    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-    input.swap_bytes();
-    input.store(reinterpret_cast<uint16_t *>(output));
-    pos += 32;
-    output += 32;
-  }
-
-  scalar::utf16::change_endianness_utf16(in + pos, size - pos, output);
-}
+std::pair<result, char16_t *>
+lsx_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
+                                       char16_t *utf16_out) {
+  uint16_t *utf16_output = reinterpret_cast<uint16_t *>(utf16_out);
+  const char32_t *start = buf;
+  const char32_t *end = buf + len;
 
-} // namespace utf16
-} // unnamed namespace
-} // namespace ppc64
-} // namespace simdutf
-/* end file src/generic/utf16.h */
-/* begin file src/generic/utf8.h */
+  __m128i forbidden_bytemask = __lsx_vrepli_h(0);
+  __m128i v_d800 = __lsx_vldi(-2600); /*0xD800*/
+  __m128i v_dfff = __lsx_vreplgr2vr_h(uint16_t(0xdfff));
 
-namespace simdutf {
-namespace ppc64 {
-namespace {
-namespace utf8 {
+  while (buf + 8 <= end) {
+    __m128i in0 = __lsx_vld(reinterpret_cast<const uint32_t *>(buf), 0);
+    __m128i in1 = __lsx_vld(reinterpret_cast<const uint32_t *>(buf), 16);
+    // Check that no bits are set above the 16th
+    if (__lsx_bz_v(__lsx_vpickod_h(in1, in0))) {
+      __m128i utf16_packed = __lsx_vpickev_h(in1, in0);
+
+      forbidden_bytemask = __lsx_vor_v(
+          __lsx_vand_v(
+              __lsx_vsle_h(utf16_packed, v_dfff),  // utf16_packed <= 0xdfff
+              __lsx_vsle_h(v_d800, utf16_packed)), // utf16_packed >= 0xd800
+          forbidden_bytemask);
+      if (__lsx_bnz_v(forbidden_bytemask)) {
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              reinterpret_cast<char16_t *>(utf16_output));
+      }
 
-using namespace simd;
+      if (!match_system(big_endian)) {
+        utf16_packed = lsx_swap_bytes(utf16_packed);
+      }
 
-simdutf_really_inline size_t count_code_points(const char *in, size_t size) {
-  size_t pos = 0;
-  size_t count = 0;
-  for (; pos + 64 <= size; pos += 64) {
-    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-    uint64_t utf8_continuation_mask = input.gt(-65);
-    count += count_ones(utf8_continuation_mask);
+      __lsx_vst(utf16_packed, utf16_output, 0);
+      utf16_output += 8;
+      buf += 8;
+    } else {
+      size_t forward = 3;
+      size_t k = 0;
+      if (size_t(end - buf) < forward + 1) {
+        forward = size_t(end - buf - 1);
+      }
+      for (; k < forward; k++) {
+        uint32_t word = buf[k];
+        if ((word & 0xFFFF0000) == 0) {
+          // will not generate a surrogate pair
+          if (word >= 0xD800 && word <= 0xDFFF) {
+            return std::make_pair(
+                result(error_code::SURROGATE, buf - start + k),
+                reinterpret_cast<char16_t *>(utf16_output));
+          }
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(word >> 8 | word << 8)
+                                : char16_t(word);
+        } else {
+          // will generate a surrogate pair
+          if (word > 0x10FFFF) {
+            return std::make_pair(
+                result(error_code::TOO_LARGE, buf - start + k),
+                reinterpret_cast<char16_t *>(utf16_output));
+          }
+          word -= 0x10000;
+          uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
+          uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
+          if (!match_system(big_endian)) {
+            high_surrogate =
+                uint16_t(high_surrogate >> 8 | high_surrogate << 8);
+            low_surrogate = uint16_t(low_surrogate << 8 | low_surrogate >> 8);
+          }
+          *utf16_output++ = char16_t(high_surrogate);
+          *utf16_output++ = char16_t(low_surrogate);
+        }
+      }
+      buf += k;
+    }
   }
-  return count + scalar::utf8::count_code_points(in + pos, size - pos);
-}
 
-simdutf_really_inline size_t utf16_length_from_utf8(const char *in,
-                                                    size_t size) {
-  size_t pos = 0;
-  size_t count = 0;
-  // This algorithm could no doubt be improved!
-  for (; pos + 64 <= size; pos += 64) {
-    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-    uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-    // We count one word for anything that is not a continuation (so
-    // leading bytes).
-    count += 64 - count_ones(utf8_continuation_mask);
-    int64_t utf8_4byte = input.gteq_unsigned(240);
-    count += count_ones(utf8_4byte);
-  }
-  return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char16_t *>(utf16_output));
 }
-} // namespace utf8
-} // unnamed namespace
-} // namespace ppc64
-} // namespace simdutf
-/* end file src/generic/utf8.h */
+/* end file src/lsx/lsx_convert_utf32_to_utf16.cpp */
+/* begin file src/lsx/lsx_base64.cpp */
+/**
+ * References and further reading:
+ *
+ * Wojciech Muła, Daniel Lemire, Base64 encoding and decoding at almost the
+ * speed of a memory copy, Software: Practice and Experience 50 (2), 2020.
+ * https://arxiv.org/abs/1910.05109
+ *
+ * Wojciech Muła, Daniel Lemire, Faster Base64 Encoding and Decoding using AVX2
+ * Instructions, ACM Transactions on the Web 12 (3), 2018.
+ * https://arxiv.org/abs/1704.00605
+ *
+ * Simon Josefsson. 2006. The Base16, Base32, and Base64 Data Encodings.
+ * https://tools.ietf.org/html/rfc4648. (2006). Internet Engineering Task Force,
+ * Request for Comments: 4648.
+ *
+ * Alfred Klomp. 2014a. Fast Base64 encoding/decoding with SSE vectorization.
+ * http://www.alfredklomp.com/programming/sse-base64/. (2014).
+ *
+ * Alfred Klomp. 2014b. Fast Base64 stream encoder/decoder in C99, with SIMD
+ * acceleration. https://github.com/aklomp/base64. (2014).
+ *
+ * Hanson Char. 2014. A Fast and Correct Base 64 Codec. (2014).
+ * https://aws.amazon.com/blogs/developer/a-fast-and-correct-base-64-codec/
+ *
+ * Nick Kopp. 2013. Base64 Encoding on a GPU.
+ * https://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU. (2013).
+ */
 
-//
-// Implementation-specific overrides
-//
-namespace simdutf {
-namespace ppc64 {
+template <bool isbase64url>
+size_t encode_base64(char *dst, const char *src, size_t srclen,
+                     base64_options options) {
+  // credit: Wojciech Muła
+  // SSE (lookup: pshufb improved unrolled)
+  const uint8_t *input = (const uint8_t *)src;
+  static const char *lookup_tbl =
+      isbase64url
+          ? "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
+          : "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+  uint8_t *out = (uint8_t *)dst;
 
-simdutf_warn_unused int
-implementation::detect_encodings(const char *input,
-                                 size_t length) const noexcept {
-  // If there is a BOM, then we trust it.
-  auto bom_encoding = simdutf::BOM::check_bom(input, length);
-  if (bom_encoding != encoding_type::unspecified) {
-    return bom_encoding;
-  }
-  // todo: reimplement as a one-pass algorithm.
-  int out = 0;
-  if (validate_utf8(input, length)) {
-    out |= encoding_type::UTF8;
-  }
-  if ((length % 2) == 0) {
-    if (validate_utf16(reinterpret_cast<const char16_t *>(input), length / 2)) {
-      out |= encoding_type::UTF16_LE;
-    }
+  v16u8 shuf;
+  __m128i v_fc0fc00, v_3f03f0, shift_r, shift_l, base64_tbl0, base64_tbl1,
+      base64_tbl2, base64_tbl3;
+  if (srclen >= 16) {
+    shuf = v16u8{1, 0, 2, 1, 4, 3, 5, 4, 7, 6, 8, 7, 10, 9, 11, 10};
+    v_fc0fc00 = __lsx_vreplgr2vr_w(uint32_t(0x0fc0fc00));
+    v_3f03f0 = __lsx_vreplgr2vr_w(uint32_t(0x003f03f0));
+    shift_r = __lsx_vreplgr2vr_w(uint32_t(0x0006000a));
+    shift_l = __lsx_vreplgr2vr_w(uint32_t(0x00080004));
+    base64_tbl0 = __lsx_vld(lookup_tbl, 0);
+    base64_tbl1 = __lsx_vld(lookup_tbl, 16);
+    base64_tbl2 = __lsx_vld(lookup_tbl, 32);
+    base64_tbl3 = __lsx_vld(lookup_tbl, 48);
   }
-  if ((length % 4) == 0) {
-    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
-      out |= encoding_type::UTF32_LE;
-    }
+
+  size_t i = 0;
+  for (; i + 52 <= srclen; i += 48) {
+    __m128i in0 =
+        __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 4 * 3 * 0);
+    __m128i in1 =
+        __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 4 * 3 * 1);
+    __m128i in2 =
+        __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 4 * 3 * 2);
+    __m128i in3 =
+        __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 4 * 3 * 3);
+
+    in0 = __lsx_vshuf_b(in0, in0, (__m128i)shuf);
+    in1 = __lsx_vshuf_b(in1, in1, (__m128i)shuf);
+    in2 = __lsx_vshuf_b(in2, in2, (__m128i)shuf);
+    in3 = __lsx_vshuf_b(in3, in3, (__m128i)shuf);
+
+    __m128i t0_0 = __lsx_vand_v(in0, v_fc0fc00);
+    __m128i t0_1 = __lsx_vand_v(in1, v_fc0fc00);
+    __m128i t0_2 = __lsx_vand_v(in2, v_fc0fc00);
+    __m128i t0_3 = __lsx_vand_v(in3, v_fc0fc00);
+
+    __m128i t1_0 = __lsx_vsrl_h(t0_0, shift_r);
+    __m128i t1_1 = __lsx_vsrl_h(t0_1, shift_r);
+    __m128i t1_2 = __lsx_vsrl_h(t0_2, shift_r);
+    __m128i t1_3 = __lsx_vsrl_h(t0_3, shift_r);
+
+    __m128i t2_0 = __lsx_vand_v(in0, v_3f03f0);
+    __m128i t2_1 = __lsx_vand_v(in1, v_3f03f0);
+    __m128i t2_2 = __lsx_vand_v(in2, v_3f03f0);
+    __m128i t2_3 = __lsx_vand_v(in3, v_3f03f0);
+
+    __m128i t3_0 = __lsx_vsll_h(t2_0, shift_l);
+    __m128i t3_1 = __lsx_vsll_h(t2_1, shift_l);
+    __m128i t3_2 = __lsx_vsll_h(t2_2, shift_l);
+    __m128i t3_3 = __lsx_vsll_h(t2_3, shift_l);
+
+    __m128i input0 = __lsx_vor_v(t1_0, t3_0);
+    __m128i input0_shuf0 = __lsx_vshuf_b(base64_tbl1, base64_tbl0, input0);
+    __m128i input0_shuf1 = __lsx_vshuf_b(base64_tbl3, base64_tbl2,
+                                         __lsx_vsub_b(input0, __lsx_vldi(32)));
+    __m128i input0_mask = __lsx_vslei_bu(input0, 31);
+    __m128i input0_result =
+        __lsx_vbitsel_v(input0_shuf1, input0_shuf0, input0_mask);
+    __lsx_vst(input0_result, reinterpret_cast<__m128i *>(out), 0);
+    out += 16;
+
+    __m128i input1 = __lsx_vor_v(t1_1, t3_1);
+    __m128i input1_shuf0 = __lsx_vshuf_b(base64_tbl1, base64_tbl0, input1);
+    __m128i input1_shuf1 = __lsx_vshuf_b(base64_tbl3, base64_tbl2,
+                                         __lsx_vsub_b(input1, __lsx_vldi(32)));
+    __m128i input1_mask = __lsx_vslei_bu(input1, 31);
+    __m128i input1_result =
+        __lsx_vbitsel_v(input1_shuf1, input1_shuf0, input1_mask);
+    __lsx_vst(input1_result, reinterpret_cast<__m128i *>(out), 0);
+    out += 16;
+
+    __m128i input2 = __lsx_vor_v(t1_2, t3_2);
+    __m128i input2_shuf0 = __lsx_vshuf_b(base64_tbl1, base64_tbl0, input2);
+    __m128i input2_shuf1 = __lsx_vshuf_b(base64_tbl3, base64_tbl2,
+                                         __lsx_vsub_b(input2, __lsx_vldi(32)));
+    __m128i input2_mask = __lsx_vslei_bu(input2, 31);
+    __m128i input2_result =
+        __lsx_vbitsel_v(input2_shuf1, input2_shuf0, input2_mask);
+    __lsx_vst(input2_result, reinterpret_cast<__m128i *>(out), 0);
+    out += 16;
+
+    __m128i input3 = __lsx_vor_v(t1_3, t3_3);
+    __m128i input3_shuf0 = __lsx_vshuf_b(base64_tbl1, base64_tbl0, input3);
+    __m128i input3_shuf1 = __lsx_vshuf_b(base64_tbl3, base64_tbl2,
+                                         __lsx_vsub_b(input3, __lsx_vldi(32)));
+    __m128i input3_mask = __lsx_vslei_bu(input3, 31);
+    __m128i input3_result =
+        __lsx_vbitsel_v(input3_shuf1, input3_shuf0, input3_mask);
+    __lsx_vst(input3_result, reinterpret_cast<__m128i *>(out), 0);
+    out += 16;
   }
+  for (; i + 16 <= srclen; i += 12) {
 
-  return out;
-}
+    __m128i in = __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 0);
 
-simdutf_warn_unused bool
-implementation::validate_utf8(const char *buf, size_t len) const noexcept {
-  return ppc64::utf8_validation::generic_validate_utf8(buf, len);
-}
+    // bytes from groups A, B and C are needed in separate 32-bit lanes
+    // in = [DDDD|CCCC|BBBB|AAAA]
+    //
+    //      an input triplet has layout
+    //      [????????|ccdddddd|bbbbcccc|aaaaaabb]
+    //        byte 3   byte 2   byte 1   byte 0    -- byte 3 comes from the next
+    //        triplet
+    //
+    //      shuffling changes the order of bytes: 1, 0, 2, 1
+    //      [bbbbcccc|ccdddddd|aaaaaabb|bbbbcccc]
+    //           ^^^^ ^^^^^^^^ ^^^^^^^^ ^^^^
+    //                  processed bits
+    in = __lsx_vshuf_b(in, in, (__m128i)shuf);
 
-simdutf_warn_unused result implementation::validate_utf8_with_errors(
-    const char *buf, size_t len) const noexcept {
-  return ppc64::utf8_validation::generic_validate_utf8_with_errors(buf, len);
-}
+    // unpacking
+    // t0    = [0000cccc|cc000000|aaaaaa00|00000000]
+    __m128i t0 = __lsx_vand_v(in, v_fc0fc00);
+    // t1    = [00000000|00cccccc|00000000|00aaaaaa]
+    //          ((c >> 6),  (a >> 10))
+    __m128i t1 = __lsx_vsrl_h(t0, shift_r);
 
-simdutf_warn_unused bool
-implementation::validate_ascii(const char *buf, size_t len) const noexcept {
-  return ppc64::utf8_validation::generic_validate_ascii(buf, len);
-}
+    // t2    = [00000000|00dddddd|000000bb|bbbb0000]
+    __m128i t2 = __lsx_vand_v(in, v_3f03f0);
+    // t3    = [00dddddd|00000000|00bbbbbb|00000000]
+    //          ((d << 8), (b << 4))
+    __m128i t3 = __lsx_vsll_h(t2, shift_l);
 
-simdutf_warn_unused result implementation::validate_ascii_with_errors(
-    const char *buf, size_t len) const noexcept {
-  return ppc64::utf8_validation::generic_validate_ascii_with_errors(buf, len);
-}
+    // res   = [00dddddd|00cccccc|00bbbbbb|00aaaaaa] = t1 | t3
+    __m128i indices = __lsx_vor_v(t1, t3);
 
-simdutf_warn_unused bool
-implementation::validate_utf16le(const char16_t *buf,
-                                 size_t len) const noexcept {
-  return scalar::utf16::validate<endianness::LITTLE>(buf, len);
-}
+    __m128i indices_shuf0 = __lsx_vshuf_b(base64_tbl1, base64_tbl0, indices);
+    __m128i indices_shuf1 = __lsx_vshuf_b(
+        base64_tbl3, base64_tbl2, __lsx_vsub_b(indices, __lsx_vldi(32)));
+    __m128i indices_mask = __lsx_vslei_bu(indices, 31);
+    __m128i indices_result =
+        __lsx_vbitsel_v(indices_shuf1, indices_shuf0, indices_mask);
 
-simdutf_warn_unused bool
-implementation::validate_utf16be(const char16_t *buf,
-                                 size_t len) const noexcept {
-  return scalar::utf16::validate<endianness::BIG>(buf, len);
-}
+    __lsx_vst(indices_result, reinterpret_cast<__m128i *>(out), 0);
+    out += 16;
+  }
 
-simdutf_warn_unused result implementation::validate_utf16le_with_errors(
-    const char16_t *buf, size_t len) const noexcept {
-  return scalar::utf16::validate_with_errors<endianness::LITTLE>(buf, len);
+  return i / 3 * 4 + scalar::base64::tail_encode_base64((char *)out, src + i,
+                                                        srclen - i, options);
 }
 
-simdutf_warn_unused result implementation::validate_utf16be_with_errors(
-    const char16_t *buf, size_t len) const noexcept {
-  return scalar::utf16::validate_with_errors<endianness::BIG>(buf, len);
-}
+static inline void compress(__m128i data, uint16_t mask, char *output) {
+  if (mask == 0) {
+    __lsx_vst(data, reinterpret_cast<__m128i *>(output), 0);
+    return;
+  }
+  // this particular implementation was inspired by work done by @animetosho
+  // we do it in two steps, first 8 bytes and then second 8 bytes
+  uint8_t mask1 = uint8_t(mask);      // least significant 8 bits
+  uint8_t mask2 = uint8_t(mask >> 8); // most significant 8 bits
+  // next line just loads the 64-bit values thintable_epi8[mask1] and
+  // thintable_epi8[mask2] into a 128-bit register, using only
+  // two instructions on most compilers.
 
-simdutf_warn_unused result implementation::validate_utf32_with_errors(
-    const char32_t *buf, size_t len) const noexcept {
-  return scalar::utf32::validate_with_errors(buf, len);
-}
+  v2u64 shufmask = {tables::base64::thintable_epi8[mask1],
+                    tables::base64::thintable_epi8[mask2]};
 
-simdutf_warn_unused bool
-implementation::validate_utf32(const char16_t *buf, size_t len) const noexcept {
-  return scalar::utf32::validate(buf, len);
-}
+  // we increment by 0x08 the second half of the mask
+  v4u32 hi = {0, 0, 0x08080808, 0x08080808};
+  __m128i shufmask1 = __lsx_vadd_b((__m128i)shufmask, (__m128i)hi);
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
-    const char * /*buf*/, size_t /*len*/,
-    char16_t * /*utf16_output*/) const noexcept {
-  return 0; // stub
-}
+  // this is the version "nearly pruned"
+  __m128i pruned = __lsx_vshuf_b(data, data, shufmask1);
+  // we still need to put the two halves together.
+  // we compute the popcount of the first half:
+  int pop1 = tables::base64::BitsSetTable256mul2[mask1];
+  // then load the corresponding mask: it writes only the first
+  // pop1 bytes from the first 8 bytes, and then fills in with
+  // the bytes from the second 8 bytes plus some filling
+  // at the end.
+  __m128i compactmask =
+      __lsx_vld(reinterpret_cast<const __m128i *>(
+                    tables::base64::pshufb_combine_table + pop1 * 8),
+                0);
+  __m128i answer = __lsx_vshuf_b(pruned, pruned, compactmask);
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
-    const char * /*buf*/, size_t /*len*/,
-    char16_t * /*utf16_output*/) const noexcept {
-  return 0; // stub
+  __lsx_vst(answer, reinterpret_cast<__m128i *>(output), 0);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
-    const char * /*buf*/, size_t /*len*/,
-    char16_t * /*utf16_output*/) const noexcept {
-  return result(error_code::OTHER, 0); // stub
-}
+struct block64 {
+  __m128i chunks[4];
+};
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
-    const char * /*buf*/, size_t /*len*/,
-    char16_t * /*utf16_output*/) const noexcept {
-  return result(error_code::OTHER, 0); // stub
-}
+template <bool base64_url>
+static inline uint16_t to_base64_mask(__m128i *src, bool *error) {
+  const v16u8 ascii_space_tbl = {0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+                                 0x0,  0x9, 0xa, 0x0, 0xc, 0xd, 0x0, 0x0};
+  // credit: aqrit
+  /*
+  '0'(0x30)-'9'(0x39) => delta_values_index = 4
+  'A'(0x41)-'Z'(0x5a) => delta_values_index = 4/5/12(4+8)
+  'a'(0x61)-'z'(0x7a) => delta_values_index = 6/7/14(6+8)
+  '+'(0x2b)           => delta_values_index = 3
+  '/'(0x2f)           => delta_values_index = 2+8 = 10
+  '-'(0x2d)           => delta_values_index = 2+8 = 10
+  '_'(0x5f)           => delta_values_index = 5+8 = 13
+  */
+  v16u8 delta_asso = {0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1,
+                      0x0, 0x0, 0x0, 0x0, 0x0, 0xF, 0x0, 0xF};
+  v16i8 delta_values;
+  if (base64_url) {
+    delta_values =
+        v16i8{int8_t(0x00), int8_t(0x00), int8_t(0x00), int8_t(0x13),
+              int8_t(0x04), int8_t(0xBF), int8_t(0xBF), int8_t(0xB9),
+              int8_t(0xB9), int8_t(0x00), int8_t(0x11), int8_t(0xC3),
+              int8_t(0xBF), int8_t(0xE0), int8_t(0xB9), int8_t(0xB9)};
+  } else {
+    delta_values =
+        v16i8{int8_t(0x00), int8_t(0x00), int8_t(0x00), int8_t(0x13),
+              int8_t(0x04), int8_t(0xBF), int8_t(0xBF), int8_t(0xB9),
+              int8_t(0xB9), int8_t(0x00), int8_t(0x10), int8_t(0xC3),
+              int8_t(0xBF), int8_t(0xBF), int8_t(0xB9), int8_t(0xB9)};
+  }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
-    const char * /*buf*/, size_t /*len*/,
-    char16_t * /*utf16_output*/) const noexcept {
-  return 0; // stub
-}
+  v16u8 check_asso;
+  if (base64_url) {
+    check_asso = v16u8{0x0D, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
+                       0x01, 0x01, 0x03, 0x07, 0x0B, 0x06, 0x0B, 0x12};
+  } else {
+    check_asso = v16u8{0x0D, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
+                       0x01, 0x01, 0x03, 0x07, 0x0B, 0x0B, 0x0B, 0x0F};
+  }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
-    const char * /*buf*/, size_t /*len*/,
-    char16_t * /*utf16_output*/) const noexcept {
-  return 0; // stub
-}
+  v16i8 check_values;
+  if (base64_url) {
+    check_values = v16i8{int8_t(0x0),  int8_t(0x80), int8_t(0x80), int8_t(0x80),
+                         int8_t(0xCF), int8_t(0xBF), int8_t(0xD3), int8_t(0xA6),
+                         int8_t(0xB5), int8_t(0x86), int8_t(0xD0), int8_t(0x80),
+                         int8_t(0xB0), int8_t(0x80), int8_t(0x0),  int8_t(0x0)};
+  } else {
+    check_values =
+        v16i8{int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0x80),
+              int8_t(0xCF), int8_t(0xBF), int8_t(0xD5), int8_t(0xA6),
+              int8_t(0xB5), int8_t(0x86), int8_t(0xD1), int8_t(0x80),
+              int8_t(0xB1), int8_t(0x80), int8_t(0x91), int8_t(0x80)};
+  }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
-    const char * /*buf*/, size_t /*len*/,
-    char32_t * /*utf16_output*/) const noexcept {
-  return 0; // stub
-}
+  const __m128i shifted = __lsx_vsrli_b(*src, 3);
+  __m128i asso_index = __lsx_vand_v(*src, __lsx_vldi(0xF));
+  const __m128i delta_hash =
+      __lsx_vavgr_bu(__lsx_vshuf_b((__m128i)delta_asso, (__m128i)delta_asso,
+                                   (__m128i)asso_index),
+                     shifted);
+  const __m128i check_hash =
+      __lsx_vavgr_bu(__lsx_vshuf_b((__m128i)check_asso, (__m128i)check_asso,
+                                   (__m128i)asso_index),
+                     shifted);
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
-    const char * /*buf*/, size_t /*len*/,
-    char32_t * /*utf16_output*/) const noexcept {
-  return result(error_code::OTHER, 0); // stub
-}
+  const __m128i out =
+      __lsx_vsadd_b(__lsx_vshuf_b((__m128i)delta_values, (__m128i)delta_values,
+                                  (__m128i)delta_hash),
+                    *src);
+  const __m128i chk =
+      __lsx_vsadd_b(__lsx_vshuf_b((__m128i)check_values, (__m128i)check_values,
+                                  (__m128i)check_hash),
+                    *src);
+  unsigned int mask = __lsx_vpickve2gr_hu(__lsx_vmskltz_b(chk), 0);
+  if (mask) {
+    __m128i ascii_space = __lsx_vseq_b(__lsx_vshuf_b((__m128i)ascii_space_tbl,
+                                                     (__m128i)ascii_space_tbl,
+                                                     (__m128i)asso_index),
+                                       *src);
+    *error |=
+        (mask != __lsx_vpickve2gr_hu(__lsx_vmskltz_b((__m128i)ascii_space), 0));
+  }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
-    const char * /*buf*/, size_t /*len*/,
-    char32_t * /*utf16_output*/) const noexcept {
-  return 0; // stub
+  *src = out;
+  return (uint16_t)mask;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert<endianness::LITTLE>(buf, len,
-                                                            utf8_output);
+template <bool base64_url>
+static inline uint64_t to_base64_mask(block64 *b, bool *error) {
+  *error = 0;
+  uint64_t m0 = to_base64_mask<base64_url>(&b->chunks[0], error);
+  uint64_t m1 = to_base64_mask<base64_url>(&b->chunks[1], error);
+  uint64_t m2 = to_base64_mask<base64_url>(&b->chunks[2], error);
+  uint64_t m3 = to_base64_mask<base64_url>(&b->chunks[3], error);
+  return m0 | (m1 << 16) | (m2 << 32) | (m3 << 48);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert<endianness::BIG>(buf, len, utf8_output);
+static inline void copy_block(block64 *b, char *output) {
+  __lsx_vst(b->chunks[0], reinterpret_cast<__m128i *>(output), 0);
+  __lsx_vst(b->chunks[1], reinterpret_cast<__m128i *>(output), 16);
+  __lsx_vst(b->chunks[2], reinterpret_cast<__m128i *>(output), 32);
+  __lsx_vst(b->chunks[3], reinterpret_cast<__m128i *>(output), 48);
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
-      buf, len, utf8_output);
+static inline uint64_t compress_block(block64 *b, uint64_t mask, char *output) {
+  uint64_t nmask = ~mask;
+  uint64_t count =
+      __lsx_vpickve2gr_d(__lsx_vpcnt_h(__lsx_vreplgr2vr_d(nmask)), 0);
+  uint16_t *count_ptr = (uint16_t *)&count;
+  compress(b->chunks[0], uint16_t(mask), output);
+  compress(b->chunks[1], uint16_t(mask >> 16), output + count_ptr[0]);
+  compress(b->chunks[2], uint16_t(mask >> 32),
+           output + count_ptr[0] + count_ptr[1]);
+  compress(b->chunks[3], uint16_t(mask >> 48),
+           output + count_ptr[0] + count_ptr[1] + count_ptr[2]);
+  return count_ones(nmask);
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
-      buf, len, utf8_output);
+// The caller of this function is responsible for ensuring that at least 64
+// bytes are readable at src. The data is read into a block64 structure.
+static inline void load_block(block64 *b, const char *src) {
+  b->chunks[0] = __lsx_vld(reinterpret_cast<const __m128i *>(src), 0);
+  b->chunks[1] = __lsx_vld(reinterpret_cast<const __m128i *>(src), 16);
+  b->chunks[2] = __lsx_vld(reinterpret_cast<const __m128i *>(src), 32);
+  b->chunks[3] = __lsx_vld(reinterpret_cast<const __m128i *>(src), 48);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_valid<endianness::LITTLE>(buf, len,
-                                                                  utf8_output);
+// The caller of this function is responsible for ensuring that at least 128
+// bytes (64 code units) are readable at src. They are packed into a block64.
+static inline void load_block(block64 *b, const char16_t *src) {
+  __m128i m1 = __lsx_vld(reinterpret_cast<const __m128i *>(src), 0);
+  __m128i m2 = __lsx_vld(reinterpret_cast<const __m128i *>(src), 16);
+  __m128i m3 = __lsx_vld(reinterpret_cast<const __m128i *>(src), 32);
+  __m128i m4 = __lsx_vld(reinterpret_cast<const __m128i *>(src), 48);
+  __m128i m5 = __lsx_vld(reinterpret_cast<const __m128i *>(src), 64);
+  __m128i m6 = __lsx_vld(reinterpret_cast<const __m128i *>(src), 80);
+  __m128i m7 = __lsx_vld(reinterpret_cast<const __m128i *>(src), 96);
+  __m128i m8 = __lsx_vld(reinterpret_cast<const __m128i *>(src), 112);
+  b->chunks[0] = __lsx_vssrlni_bu_h(m2, m1, 0);
+  b->chunks[1] = __lsx_vssrlni_bu_h(m4, m3, 0);
+  b->chunks[2] = __lsx_vssrlni_bu_h(m6, m5, 0);
+  b->chunks[3] = __lsx_vssrlni_bu_h(m8, m7, 0);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
-    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf16_to_utf8::convert_valid<endianness::BIG>(buf, len,
-                                                               utf8_output);
-}
+static inline void base64_decode(char *out, __m128i str) {
+  __m128i t0 = __lsx_vor_v(
+      __lsx_vslli_w(str, 26),
+      __lsx_vslli_w(__lsx_vand_v(str, __lsx_vldi(-1758 /*0x0000FF00*/)), 12));
+  __m128i t1 =
+      __lsx_vsrli_w(__lsx_vand_v(str, __lsx_vldi(-3521 /*0x003F0000*/)), 2);
+  __m128i t2 = __lsx_vor_v(t0, t1);
+  __m128i t3 = __lsx_vor_v(t2, __lsx_vsrli_w(str, 16));
+  const v16u8 pack_shuffle = {3, 2,  1,  7,  6, 5, 11, 10,
+                              9, 15, 14, 13, 0, 0, 0,  0};
+  t3 = __lsx_vshuf_b(t3, t3, (__m128i)pack_shuffle);
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
-    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf32_to_utf8::convert(buf, len, utf8_output);
+  // Store the output:
+  // only the first 12 bytes are needed.
+  __lsx_vstelm_d(t3, out, 0, 0);
+  __lsx_vstelm_w(t3, out + 8, 0, 2);
 }
-
-simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
-    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf32_to_utf8::convert_with_errors(buf, len, utf8_output);
+// decode 64 bytes and output 48 bytes
+static inline void base64_decode_block(char *out, const char *src) {
+  base64_decode(out, __lsx_vld(reinterpret_cast<const __m128i *>(src), 0));
+  base64_decode(out + 12,
+                __lsx_vld(reinterpret_cast<const __m128i *>(src), 16));
+  base64_decode(out + 24,
+                __lsx_vld(reinterpret_cast<const __m128i *>(src), 32));
+  base64_decode(out + 36,
+                __lsx_vld(reinterpret_cast<const __m128i *>(src), 48));
 }
-
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
-    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
-  return scalar::utf32_to_utf8::convert_valid(buf, len, utf8_output);
+static inline void base64_decode_block_safe(char *out, const char *src) {
+  base64_decode_block(out, src);
 }
-
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert<endianness::LITTLE>(buf, len,
-                                                             utf16_output);
+static inline void base64_decode_block(char *out, block64 *b) {
+  base64_decode(out, b->chunks[0]);
+  base64_decode(out + 12, b->chunks[1]);
+  base64_decode(out + 24, b->chunks[2]);
+  base64_decode(out + 36, b->chunks[3]);
 }
-
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert<endianness::BIG>(buf, len,
-                                                          utf16_output);
+static inline void base64_decode_block_safe(char *out, block64 *b) {
+  base64_decode_block(out, b);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
-      buf, len, utf16_output);
-}
+template <bool base64_url, typename char_type>
+full_result
+compress_decode_base64(char *dst, const char_type *src, size_t srclen,
+                       base64_options options,
+                       last_chunk_handling_options last_chunk_options) {
+  const uint8_t *to_base64 = base64_url ? tables::base64::to_base64_url_value
+                                        : tables::base64::to_base64_value;
+  size_t equallocation =
+      srclen; // location of the first padding character if any
+  // skip trailing spaces
+  while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+         to_base64[uint8_t(src[srclen - 1])] == 64) {
+    srclen--;
+  }
+  size_t equalsigns = 0;
+  if (srclen > 0 && src[srclen - 1] == '=') {
+    equallocation = srclen - 1;
+    srclen--;
+    equalsigns = 1;
+    // skip trailing spaces
+    while (srclen > 0 && scalar::base64::is_eight_byte(src[srclen - 1]) &&
+           to_base64[uint8_t(src[srclen - 1])] == 64) {
+      srclen--;
+    }
+    if (srclen > 0 && src[srclen - 1] == '=') {
+      equallocation = srclen - 1;
+      srclen--;
+      equalsigns = 2;
+    }
+  }
+  if (srclen == 0) {
+    if (equalsigns > 0) {
+      return {INVALID_BASE64_CHARACTER, equallocation, 0};
+    }
+    return {SUCCESS, 0, 0};
+  }
+  const char_type *const srcinit = src;
+  const char *const dstinit = dst;
+  const char_type *const srcend = src + srclen;
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
-      buf, len, utf16_output);
-}
+  constexpr size_t block_size = 10;
+  char buffer[block_size * 64];
+  char *bufferptr = buffer;
+  if (srclen >= 64) {
+    const char_type *const srcend64 = src + srclen - 64;
+    while (src <= srcend64) {
+      block64 b;
+      load_block(&b, src);
+      src += 64;
+      bool error = false;
+      uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
+      if (badcharmask) {
+        if (error) {
+          src -= 64;
+          while (src < srcend && scalar::base64::is_eight_byte(*src) &&
+                 to_base64[uint8_t(*src)] <= 64) {
+            src++;
+          }
+          if (src < srcend) {
+            // should never happen
+          }
+          return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                  size_t(dst - dstinit)};
+        }
+      }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_valid<endianness::LITTLE>(
-      buf, len, utf16_output);
-}
+      if (badcharmask != 0) {
+        // optimization opportunity: check for simple masks, such as a run of
+        // contiguous 1s followed by a run of contiguous 0s, and masks
+        // containing a single bad character.
+        bufferptr += compress_block(&b, badcharmask, bufferptr);
+      } else {
+        // optimization opportunity: if bufferptr == buffer and badcharmask ==
+        // 0, we can skip the copy into the buffer and decode directly.
+        copy_block(&b, bufferptr);
+        bufferptr += 64;
+      }
+      if (bufferptr >= (block_size - 1) * 64 + buffer) {
+        for (size_t i = 0; i < (block_size - 1); i++) {
+          base64_decode_block(dst, buffer + i * 64);
+          dst += 48;
+        }
+        std::memcpy(buffer, buffer + (block_size - 1) * 64,
+                    64); // 64 might be too much
+        bufferptr -= (block_size - 1) * 64;
+      }
+    }
+  }
+  char *buffer_start = buffer;
+  // Optimization note: if the last block is almost full, topping it up is
+  // worth our time; otherwise, we should just decode directly.
+  int last_block = (int)((bufferptr - buffer_start) % 64);
+  if (last_block != 0 && srcend - src + last_block >= 64) {
+    while ((bufferptr - buffer_start) % 64 != 0 && src < srcend) {
+      uint8_t val = to_base64[uint8_t(*src)];
+      *bufferptr = char(val);
+      if (!scalar::base64::is_eight_byte(*src) || val > 64) {
+        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
+      }
+      bufferptr += (val <= 63);
+      src++;
+    }
+  }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
-    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
-  return scalar::utf32_to_utf16::convert_valid<endianness::BIG>(buf, len,
-                                                                utf16_output);
+  for (; buffer_start + 64 <= bufferptr; buffer_start += 64) {
+    base64_decode_block(dst, buffer_start);
+    dst += 48;
+  }
+  if ((bufferptr - buffer_start) % 64 != 0) {
+    while (buffer_start + 4 < bufferptr) {
+      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
+                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
+                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
+                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
+                        << 8;
+      triple = scalar::utf32::swap_bytes(triple);
+      std::memcpy(dst, &triple, 4);
+
+      dst += 3;
+      buffer_start += 4;
+    }
+    if (buffer_start + 4 <= bufferptr) {
+      uint32_t triple = ((uint32_t(uint8_t(buffer_start[0])) << 3 * 6) +
+                         (uint32_t(uint8_t(buffer_start[1])) << 2 * 6) +
+                         (uint32_t(uint8_t(buffer_start[2])) << 1 * 6) +
+                         (uint32_t(uint8_t(buffer_start[3])) << 0 * 6))
+                        << 8;
+      triple = scalar::utf32::swap_bytes(triple);
+      std::memcpy(dst, &triple, 3);
+
+      dst += 3;
+      buffer_start += 4;
+    }
+    // we may have 1, 2 or 3 values left in the buffer; rewind src past them
+    // (skipping spaces) so that the scalar tail decoder handles them
+    int leftover = int(bufferptr - buffer_start);
+    while (leftover > 0) {
+      while (to_base64[uint8_t(*(src - 1))] == 64) {
+        src--;
+      }
+      src--;
+      leftover--;
+    }
+  }
+  if (src < srcend + equalsigns) {
+    full_result r = scalar::base64::base64_tail_decode(
+        dst, src, srcend - src, equalsigns, options, last_chunk_options);
+    r.input_count += size_t(src - srcinit);
+    if (r.error == error_code::INVALID_BASE64_CHARACTER ||
+        r.error == error_code::BASE64_EXTRA_BITS) {
+      return r;
+    } else {
+      r.output_count += size_t(dst - dstinit);
+    }
+    if (last_chunk_options != stop_before_partial &&
+        r.error == error_code::SUCCESS && equalsigns > 0) {
+      // additional checks
+      if ((r.output_count % 3 == 0) ||
+          ((r.output_count % 3) + 1 + equalsigns != 4)) {
+        r.error = error_code::INVALID_BASE64_CHARACTER;
+        r.input_count = equallocation;
+      }
+    }
+    return r;
+  }
+  if (equalsigns > 0) {
+    if ((size_t(dst - dstinit) % 3 == 0) ||
+        ((size_t(dst - dstinit) % 3) + 1 + equalsigns != 4)) {
+      return {INVALID_BASE64_CHARACTER, equallocation, size_t(dst - dstinit)};
+    }
+  }
+  return {SUCCESS, srclen, size_t(dst - dstinit)};
 }
+/* end file src/lsx/lsx_base64.cpp */
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert<endianness::LITTLE>(buf, len,
-                                                             utf32_output);
+} // namespace
+} // namespace lsx
+} // namespace simdutf
+
+/* begin file src/generic/buf_block_reader.h */
+namespace simdutf {
+namespace lsx {
+namespace {
+
+// Walks through a buffer in block-sized increments, loading the last part with
+// spaces
+template <size_t STEP_SIZE> struct buf_block_reader {
+public:
+  simdutf_really_inline buf_block_reader(const uint8_t *_buf, size_t _len);
+  simdutf_really_inline size_t block_index();
+  simdutf_really_inline bool has_full_block() const;
+  simdutf_really_inline const uint8_t *full_block() const;
+  /**
+   * Get the last block, padded with spaces.
+   *
+   * There will always be a last block, with at least 1 byte, unless len == 0
+   * (in which case this function returns 0 without writing to dst). In
+   * particular, if len == STEP_SIZE there will be 0 full blocks and 1 remainder
+   * block with STEP_SIZE bytes and no spaces for padding.
+   *
+   * @return the number of effective characters in the last block.
+   */
+  simdutf_really_inline size_t get_remainder(uint8_t *dst) const;
+  simdutf_really_inline void advance();
+
+private:
+  const uint8_t *buf;
+  const size_t len;
+  const size_t lenminusstep;
+  size_t idx;
+};
+
+// Routines to print masks and text for debugging bitmask operations
+simdutf_unused static char *format_input_text_64(const uint8_t *text) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
+    buf[i] = int8_t(text[i]) < ' ' ? '_' : int8_t(text[i]);
+  }
+  buf[sizeof(simd8x64<uint8_t>)] = '\0';
+  return buf;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert<endianness::BIG>(buf, len,
-                                                          utf32_output);
+// Routines to print masks and text for debugging bitmask operations
+simdutf_unused static char *format_input_text(const simd8x64<uint8_t> &in) {
+  static char *buf =
+      reinterpret_cast<char *>(malloc(sizeof(simd8x64<uint8_t>) + 1));
+  in.store(reinterpret_cast<uint8_t *>(buf));
+  for (size_t i = 0; i < sizeof(simd8x64<uint8_t>); i++) {
+    if (buf[i] < ' ') {
+      buf[i] = '_';
+    }
+  }
+  buf[sizeof(simd8x64<uint8_t>)] = '\0';
+  return buf;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
-      buf, len, utf32_output);
+simdutf_unused static char *format_mask(uint64_t mask) {
+  static char *buf = reinterpret_cast<char *>(malloc(64 + 1));
+  for (size_t i = 0; i < 64; i++) {
+    buf[i] = (mask & (size_t(1) << i)) ? 'X' : ' ';
+  }
+  buf[64] = '\0';
+  return buf;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
-      buf, len, utf32_output);
-}
+template <size_t STEP_SIZE>
+simdutf_really_inline
+buf_block_reader<STEP_SIZE>::buf_block_reader(const uint8_t *_buf, size_t _len)
+    : buf{_buf}, len{_len}, lenminusstep{len < STEP_SIZE ? 0 : len - STEP_SIZE},
+      idx{0} {}
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_valid<endianness::LITTLE>(
-      buf, len, utf32_output);
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t buf_block_reader<STEP_SIZE>::block_index() {
+  return idx;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
-    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
-  return scalar::utf16_to_utf32::convert_valid<endianness::BIG>(buf, len,
-                                                                utf32_output);
+template <size_t STEP_SIZE>
+simdutf_really_inline bool buf_block_reader<STEP_SIZE>::has_full_block() const {
+  return idx < lenminusstep;
 }
 
-void implementation::change_endianness_utf16(const char16_t *input,
-                                             size_t length,
-                                             char16_t *output) const noexcept {
-  scalar::utf16::change_endianness_utf16(input, length, output);
+template <size_t STEP_SIZE>
+simdutf_really_inline const uint8_t *
+buf_block_reader<STEP_SIZE>::full_block() const {
+  return &buf[idx];
 }
 
-simdutf_warn_unused size_t implementation::count_utf16le(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::utf16::count_code_points<endianness::LITTLE>(input, length);
+template <size_t STEP_SIZE>
+simdutf_really_inline size_t
+buf_block_reader<STEP_SIZE>::get_remainder(uint8_t *dst) const {
+  if (len == idx) {
+    return 0;
+  } // memcpy(dst, null, 0) will trigger an error with some sanitizers
+  std::memset(dst, 0x20,
+              STEP_SIZE); // std::memset STEP_SIZE because it is more efficient
+                          // to write out 8 or 16 bytes at once.
+  std::memcpy(dst, buf + idx, len - idx);
+  return len - idx;
 }
 
-simdutf_warn_unused size_t implementation::count_utf16be(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::utf16::count_code_points<endianness::BIG>(input, length);
+template <size_t STEP_SIZE>
+simdutf_really_inline void buf_block_reader<STEP_SIZE>::advance() {
+  idx += STEP_SIZE;
 }
 
-simdutf_warn_unused size_t
-implementation::count_utf8(const char *input, size_t length) const noexcept {
-  return utf8::count_code_points(input, length);
-}
+} // unnamed namespace
+} // namespace lsx
+} // namespace simdutf
+/* end file src/generic/buf_block_reader.h */
+/* begin file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
+namespace simdutf {
+namespace lsx {
+namespace {
+namespace utf8_validation {
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::utf16::utf8_length_from_utf16<endianness::LITTLE>(input,
-                                                                   length);
-}
+using namespace simd;
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
-}
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::utf16::utf32_length_from_utf16<endianness::LITTLE>(input,
-                                                                    length);
-}
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      TOO_SHORT,
+      // 1110____ ________ <three byte lead in byte 1>
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
-}
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
-    const char *input, size_t length) const noexcept {
-  return scalar::utf8::utf16_length_from_utf8(input, length);
-}
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
-    const char32_t *input, size_t length) const noexcept {
-  return scalar::utf32::utf8_length_from_utf32(input, length);
-}
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
-    const char32_t *input, size_t length) const noexcept {
-  return scalar::utf32::utf16_length_from_utf32(input, length);
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
 }
-
-simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
-    const char *input, size_t length) const noexcept {
-  return scalar::utf8::count_code_points(input, length);
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
-    const char *input, size_t length) const noexcept {
-  return scalar::base64::maximal_binary_length_from_base64(input, length);
+//
+// Return nonzero if there are incomplete multibyte characters at the end of the
+// block: e.g. if there is a 4-byte character, but it is 3 bytes from the end.
+//
+simdutf_really_inline simd8<uint8_t> is_incomplete(const simd8<uint8_t> input) {
+  // If the previous input's last 3 bytes match this, they're too short (they
+  // ended at EOF):
+  // ... 1111____ 111_____ 11______
+  static const uint8_t max_array[32] = {255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        255,
+                                        0b11110000u - 1,
+                                        0b11100000u - 1,
+                                        0b11000000u - 1};
+  const simd8<uint8_t> max_value(
+      &max_array[sizeof(max_array) - sizeof(simd8<uint8_t>)]);
+  return input.gt_bits(max_value);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(
-    const char *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_options) const noexcept {
-  // skip trailing spaces
-  while (length > 0 &&
-         scalar::base64::is_ascii_white_space(input[length - 1])) {
-    length--;
+struct utf8_checker {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+  // The last input we received
+  simd8<uint8_t> prev_input_block;
+  // Whether the last input we received was incomplete (used for ASCII fast
+  // path)
+  simd8<uint8_t> prev_incomplete;
+
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // prev1 holds, for each position, the byte that precedes it (the last byte
+    // of prev_input carries over into the first lane), so check_special_cases
+    // can classify every adjacent (previous byte, current byte) pair.
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
   }
-  size_t equallocation =
-      length; // location of the first padding character if any
-  size_t equalsigns = 0;
-  if (length > 0 && input[length - 1] == '=') {
-    equallocation = length - 1;
-    length -= 1;
-    equalsigns++;
-    while (length > 0 &&
-           scalar::base64::is_ascii_white_space(input[length - 1])) {
-      length--;
-    }
-    if (length > 0 && input[length - 1] == '=') {
-      equallocation = length - 1;
-      equalsigns++;
-      length -= 1;
-    }
+
+  // The only problem that can happen at EOF is that a multibyte character is
+  // too short or a byte value too large in the last bytes: check_special_cases
+  // only checks for bytes too large in the first of two bytes.
+  simdutf_really_inline void check_eof() {
+    // If the previous block had incomplete UTF-8 characters at the end, an
+    // ASCII block can't possibly finish them.
+    this->error |= this->prev_incomplete;
   }
-  if (length == 0) {
-    if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
+
+  simdutf_really_inline void check_next_input(const simd8x64<uint8_t> &input) {
+    if (simdutf_likely(is_ascii(input))) {
+      this->error |= this->prev_incomplete;
+    } else {
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      static_assert((simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                        (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+                    "We support either two or four chunks per 64-byte block.");
+      if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+      } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+        this->check_utf8_bytes(input.chunks[0], this->prev_input_block);
+        this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+        this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+      }
+      this->prev_incomplete =
+          is_incomplete(input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1]);
+      this->prev_input_block = input.chunks[simd8x64<uint8_t>::NUM_CHUNKS - 1];
     }
-    return {SUCCESS, 0};
   }
-  result r = scalar::base64::base64_tail_decode(
-      output, input, length, equalsigns, options, last_chunk_options);
-  if (last_chunk_options != stop_before_partial &&
-      r.error == error_code::SUCCESS && equalsigns > 0) {
-    // additional checks
-    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
-    }
+
+  // do not forget to call check_eof!
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
   }
-  return r;
+
+}; // struct utf8_checker
+} // namespace utf8_validation
+
+using utf8_validation::utf8_checker;
+
+} // unnamed namespace
+} // namespace lsx
+} // namespace simdutf
+/* end file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
+/* begin file src/generic/utf8_validation/utf8_validator.h */
+namespace simdutf {
+namespace lsx {
+namespace {
+namespace utf8_validation {
+
+/**
+ * Validates that the string is actual UTF-8.
+ */
+template <class checker>
+bool generic_validate_utf8(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    c.check_next_input(in);
+    reader.advance();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  return !c.errors();
 }
 
-simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
-    const char16_t *input, size_t length) const noexcept {
-  return scalar::base64::maximal_binary_length_from_base64(input, length);
+bool generic_validate_utf8(const char *input, size_t length) {
+  return generic_validate_utf8<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
-simdutf_warn_unused result implementation::base64_to_binary(
-    const char16_t *input, size_t length, char *output, base64_options options,
-    last_chunk_handling_options last_chunk_options) const noexcept {
-  // skip trailing spaces
-  while (length > 0 &&
-         scalar::base64::is_ascii_white_space(input[length - 1])) {
-    length--;
-  }
-  size_t equallocation =
-      length; // location of the first padding character if any
-  size_t equalsigns = 0;
-  if (length > 0 && input[length - 1] == '=') {
-    equallocation = length - 1;
-    length -= 1;
-    equalsigns++;
-    while (length > 0 &&
-           scalar::base64::is_ascii_white_space(input[length - 1])) {
-      length--;
-    }
-    if (length > 0 && input[length - 1] == '=') {
-      equallocation = length - 1;
-      equalsigns++;
-      length -= 1;
-    }
-  }
-  if (length == 0) {
-    if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
+/**
+ * Validates that the string is actual UTF-8 and stops on errors.
+ */
+template <class checker>
+result generic_validate_utf8_with_errors(const uint8_t *input, size_t length) {
+  checker c{};
+  buf_block_reader<64> reader(input, length);
+  size_t count{0};
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    c.check_next_input(in);
+    if (c.errors()) {
+      if (count != 0) {
+        count--;
+      } // Sometimes the error is only detected in the next chunk
+      result res = scalar::utf8::rewind_and_validate_with_errors(
+          reinterpret_cast<const char *>(input),
+          reinterpret_cast<const char *>(input + count), length - count);
+      res.count += count;
+      return res;
     }
-    return {SUCCESS, 0};
+    reader.advance();
+    count += 64;
   }
-  result r = scalar::base64::base64_tail_decode(
-      output, input, length, equalsigns, options, last_chunk_options);
-  if (last_chunk_options != stop_before_partial &&
-      r.error == error_code::SUCCESS && equalsigns > 0) {
-    // additional checks
-    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
-    }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  c.check_next_input(in);
+  reader.advance();
+  c.check_eof();
+  if (c.errors()) {
+    if (count != 0) {
+      count--;
+    } // Sometimes the error is only detected in the next chunk
+    result res = scalar::utf8::rewind_and_validate_with_errors(
+        reinterpret_cast<const char *>(input),
+        reinterpret_cast<const char *>(input) + count, length - count);
+    res.count += count;
+    return res;
+  } else {
+    return result(error_code::SUCCESS, length);
   }
-  return r;
 }
 
-simdutf_warn_unused size_t implementation::base64_length_from_binary(
-    size_t length, base64_options options) const noexcept {
-  return scalar::base64::base64_length_from_binary(length, options);
+result generic_validate_utf8_with_errors(const char *input, size_t length) {
+  return generic_validate_utf8_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
-size_t implementation::binary_to_base64(const char *input, size_t length,
-                                        char *output,
-                                        base64_options options) const noexcept {
-  return scalar::base64::binary_to_base64(input, length, output, options);
+template <class checker>
+bool generic_validate_ascii(const uint8_t *input, size_t length) {
+  buf_block_reader<64> reader(input, length);
+  uint8_t blocks[64]{};
+  simd::simd8x64<uint8_t> running_or(blocks);
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    running_or |= in;
+    reader.advance();
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  running_or |= in;
+  return running_or.is_ascii();
 }
-} // namespace ppc64
-} // namespace simdutf
-
-/* begin file src/simdutf/ppc64/end.h */
-/* end file src/simdutf/ppc64/end.h */
-/* end file src/ppc64/implementation.cpp */
-#endif
-#if SIMDUTF_IMPLEMENTATION_RVV
-/* begin file src/rvv/implementation.cpp */
-
-
-
 
+bool generic_validate_ascii(const char *input, size_t length) {
+  return generic_validate_ascii<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
+}
 
-/* begin file src/simdutf/rvv/begin.h */
-// redefining SIMDUTF_IMPLEMENTATION to "rvv"
-// #define SIMDUTF_IMPLEMENTATION rvv
+template <class checker>
+result generic_validate_ascii_with_errors(const uint8_t *input, size_t length) {
+  buf_block_reader<64> reader(input, length);
+  size_t count{0};
+  while (reader.has_full_block()) {
+    simd::simd8x64<uint8_t> in(reader.full_block());
+    if (!in.is_ascii()) {
+      result res = scalar::ascii::validate_with_errors(
+          reinterpret_cast<const char *>(input + count), length - count);
+      return result(res.error, count + res.count);
+    }
+    reader.advance();
 
-#if SIMDUTF_CAN_ALWAYS_RUN_RVV
-// nothing needed.
-#else
-SIMDUTF_TARGET_RVV
-#endif
-/* end file src/simdutf/rvv/begin.h */
-namespace simdutf {
-namespace rvv {
-namespace {
-#ifndef SIMDUTF_RVV_H
-  #error "rvv.h must be included"
-#endif
+    count += 64;
+  }
+  uint8_t block[64]{};
+  reader.get_remainder(block);
+  simd::simd8x64<uint8_t> in(block);
+  if (!in.is_ascii()) {
+    result res = scalar::ascii::validate_with_errors(
+        reinterpret_cast<const char *>(input + count), length - count);
+    return result(res.error, count + res.count);
+  } else {
+    return result(error_code::SUCCESS, length);
+  }
+}
+
+result generic_validate_ascii_with_errors(const char *input, size_t length) {
+  return generic_validate_ascii_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
+}
 
+} // namespace utf8_validation
 } // unnamed namespace
-} // namespace rvv
+} // namespace lsx
 } // namespace simdutf
+/* end file src/generic/utf8_validation/utf8_validator.h */
+
+// transcoding from UTF-8 to Latin 1
+/* begin file src/generic/utf8_to_latin1/utf8_to_latin1.h */
 
-//
-// Implementation-specific overrides
-//
 namespace simdutf {
-namespace rvv {
-/* begin file src/rvv/rvv_helpers.inl.cpp */
-template <simdutf_ByteFlip bflip>
-simdutf_really_inline static size_t
-rvv_utf32_store_utf16_m4(uint16_t *dst, vuint32m4_t utf32, size_t vl,
-                         vbool4_t m4even) {
-  /* convert [000000000000aaaa|aaaaaabbbbbbbbbb]
-   * to      [110111bbbbbbbbbb|110110aaaaaaaaaa] */
-  vuint32m4_t sur = __riscv_vsub_vx_u32m4(utf32, 0x10000, vl);
-  sur = __riscv_vor_vv_u32m4(__riscv_vsll_vx_u32m4(sur, 16, vl),
-                             __riscv_vsrl_vx_u32m4(sur, 10, vl), vl);
-  sur = __riscv_vand_vx_u32m4(sur, 0x3FF03FF, vl);
-  sur = __riscv_vor_vx_u32m4(sur, 0xDC00D800, vl);
-  /* merge 1 byte utf32 and 2 byte sur */
-  vbool8_t m4 = __riscv_vmsgtu_vx_u32m4_b8(utf32, 0xFFFF, vl);
-  vuint16m4_t utf32_16 = __riscv_vreinterpret_v_u32m4_u16m4(
-      __riscv_vmerge_vvm_u32m4(utf32, sur, m4, vl));
-  /* compress and store */
-  vbool4_t mOut = __riscv_vmor_mm_b4(
-      __riscv_vmsne_vx_u16m4_b4(utf32_16, 0, vl * 2), m4even, vl * 2);
-  vuint16m4_t vout = __riscv_vcompress_vm_u16m4(utf32_16, mOut, vl * 2);
-  vl = __riscv_vcpop_m_b4(mOut, vl * 2);
-  __riscv_vse16_v_u16m4(dst, simdutf_byteflip<bflip>(vout, vl), vl);
-  return vl;
-};
-/* end file src/rvv/rvv_helpers.inl.cpp */
+namespace lsx {
+namespace {
+namespace utf8_to_latin1 {
+using namespace simd;
 
-/* begin file src/rvv/rvv_length_from.inl.cpp */
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // For UTF-8 to Latin 1, we can allow any ASCII character, and any
+  // continuation byte, but the non-ASCII leading bytes must be 0b11000011 or
+  // 0b11000010 and nothing else.
+  //
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+  constexpr const uint8_t FORBIDDEN = 0xff;
 
-simdutf_warn_unused size_t
-implementation::count_utf16le(const char16_t *src, size_t len) const noexcept {
-  return utf32_length_from_utf16le(src, len);
-}
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      FORBIDDEN,
+      // 1110____ ________ <three byte lead in byte 1>
+      FORBIDDEN,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      FORBIDDEN);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
 
-simdutf_warn_unused size_t
-implementation::count_utf16be(const char16_t *src, size_t len) const noexcept {
-  return utf32_length_from_utf16be(src, len);
-}
+              // ____0100 ________
+              FORBIDDEN,
+              // ____0101 ________
+              FORBIDDEN,
+              // ____011_ ________
+              FORBIDDEN, FORBIDDEN,
 
-simdutf_warn_unused size_t
-implementation::count_utf8(const char *src, size_t len) const noexcept {
-  return utf32_length_from_utf8(src, len);
-}
+              // ____1___ ________
+              FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN,
+              // ____1101 ________
+              FORBIDDEN, FORBIDDEN, FORBIDDEN);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
-simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
-    const char *src, size_t len) const noexcept {
-  return utf32_length_from_utf8(src, len);
-}
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
-simdutf_warn_unused size_t
-implementation::latin1_length_from_utf16(size_t len) const noexcept {
-  return len;
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
 }
 
-simdutf_warn_unused size_t
-implementation::latin1_length_from_utf32(size_t len) const noexcept {
-  return len;
-}
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
 
-simdutf_warn_unused size_t
-implementation::utf16_length_from_latin1(size_t len) const noexcept {
-  return len;
-}
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // prev1 holds, for each position, the byte that precedes it (the last byte
+    // of prev_input carries over into the first lane), so check_special_cases
+    // can classify every adjacent (previous byte, current byte) pair.
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    this->error |= check_special_cases(input, prev1);
+  }
 
-simdutf_warn_unused size_t
-implementation::utf32_length_from_latin1(size_t len) const noexcept {
-  return len;
-}
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char *latin1_output) {
+    size_t pos = 0;
+    char *start{latin1_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 16 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 16; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) >
+                       -65); // twos complement of -65 is 1011 1111 ...
+    }
+    // If the input is long enough, then in[margin] is the 16th-to-last leading
+    // byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask =
+            input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
+                               // this case, we also have ASCII to account for.
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_latin1::convert(in + pos, size - pos, latin1_output);
+      if (howmany == 0) {
+        return 0;
+      }
+      latin1_output += howmany;
+    }
+    return latin1_output - start;
+  }
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
-    const char *src, size_t len) const noexcept {
-  size_t count = 0;
-  for (size_t vl; len > 0; len -= vl, src += vl) {
-    vl = __riscv_vsetvl_e8m8(len);
-    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
-    vbool1_t mask = __riscv_vmsgt_vx_i8m8_b1(v, -65, vl);
-    count += __riscv_vcpop_m_b1(mask, vl);
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char *latin1_output) {
+    size_t pos = 0;
+    char *start{latin1_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then in[margin] is the 8th-to-last leading
+    // byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        if (errors()) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, latin1_output);
+          res.count += pos;
+          return res;
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of bytes written
+        latin1_output += res.count;
+      }
+    }
+    return result(error_code::SUCCESS, latin1_output - start);
   }
-  return count;
-}
 
-template <simdutf_ByteFlip bflip>
-simdutf_really_inline static size_t
-rvv_utf32_length_from_utf16(const char16_t *src, size_t len) {
-  size_t count = 0;
-  for (size_t vl; len > 0; len -= vl, src += vl) {
-    vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
-    v = simdutf_byteflip<bflip>(v, vl);
-    vbool2_t notHigh =
-        __riscv_vmor_mm_b2(__riscv_vmsgtu_vx_u16m8_b2(v, 0xDFFF, vl),
-                           __riscv_vmsltu_vx_u16m8_b2(v, 0xDC00, vl), vl);
-    count += __riscv_vcpop_m_b2(notHigh, vl);
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
   }
-  return count;
-}
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
-    const char16_t *src, size_t len) const noexcept {
-  return rvv_utf32_length_from_utf16<simdutf_ByteFlip::NONE>(src, len);
-}
+}; // struct validating_transcoder
+} // namespace utf8_to_latin1
+} // unnamed namespace
+} // namespace lsx
+} // namespace simdutf
+/* end file src/generic/utf8_to_latin1/utf8_to_latin1.h */
+/* begin file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
 
-simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
-    const char16_t *src, size_t len) const noexcept {
-  if (supports_zvbb())
-    return rvv_utf32_length_from_utf16<simdutf_ByteFlip::ZVBB>(src, len);
-  else
-    return rvv_utf32_length_from_utf16<simdutf_ByteFlip::V>(src, len);
-}
+namespace simdutf {
+namespace lsx {
+namespace {
+namespace utf8_to_latin1 {
+using namespace simd;
 
-simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
-    const char *src, size_t len) const noexcept {
-  size_t count = len;
-  for (size_t vl; len > 0; len -= vl, src += vl) {
-    vl = __riscv_vsetvl_e8m8(len);
-    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
-    count += __riscv_vcpop_m_b1(__riscv_vmslt_vx_i8m8_b1(v, 0, vl), vl);
+simdutf_really_inline size_t convert_valid(const char *in, size_t size,
+                                           char *latin1_output) {
+  size_t pos = 0;
+  char *start{latin1_output};
+  // In the worst case, we have the haswell kernel which can cause an overflow
+  // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last
+  // 16 bytes, and if the data is valid, then it is entirely safe because 16
+  // UTF-8 bytes generate much more than 8 bytes. However, you cannot generally
+  // assume that you have valid UTF-8 input, so we are going to go back from the
+  // end counting 8 leading bytes, to give us a good margin.
+  size_t leading_byte = 0;
+  size_t margin = size;
+  for (; margin > 0 && leading_byte < 8; margin--) {
+    leading_byte += (int8_t(in[margin - 1]) >
+                     -65); // two's complement of -65 is 1011 1111 ...
   }
-  return count;
-}
-
-template <simdutf_ByteFlip bflip>
-simdutf_really_inline static size_t
-rvv_utf8_length_from_utf16(const char16_t *src, size_t len) {
-  size_t count = 0;
-  for (size_t vl; len > 0; len -= vl, src += vl) {
-    vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
-    v = simdutf_byteflip<bflip>(v, vl);
-    vbool2_t m234 = __riscv_vmsgtu_vx_u16m8_b2(v, 0x7F, vl);
-    vbool2_t m34 = __riscv_vmsgtu_vx_u16m8_b2(v, 0x7FF, vl);
-    vbool2_t notSur =
-        __riscv_vmor_mm_b2(__riscv_vmsltu_vx_u16m8_b2(v, 0xD800, vl),
-                           __riscv_vmsgtu_vx_u16m8_b2(v, 0xDFFF, vl), vl);
-    vbool2_t m3 = __riscv_vmand_mm_b2(m34, notSur, vl);
-    count += vl + __riscv_vcpop_m_b2(m234, vl) + __riscv_vcpop_m_b2(m3, vl);
+  // If the input is long enough, then we have that margin-1 is the
+  // eighth-to-last leading byte.
+  const size_t safety_margin = size - margin + 1; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    if (input.is_ascii()) {
+      input.store((int8_t *)latin1_output);
+      latin1_output += 64;
+      pos += 64;
+    } else {
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      uint64_t utf8_continuation_mask =
+          input.lt(-65 + 1); // -64 is 1100 0000 in two's complement. Note: in
+                             // this case, we also have ASCII to account for.
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      // We process in blocks of up to 12 bytes except possibly
+      // for fast paths which may process up to 16 bytes. For the
+      // slow path to work, we should have at least 12 input bytes left.
+      size_t max_starting_point = (pos + 64) - 12;
+      // Next loop is going to run at least five times.
+      while (pos < max_starting_point) {
+        // Performance note: our ability to compute 'consumed' and
+        // then shift and recompute is critical. If there is a
+        // latency of, say, 4 cycles on getting 'consumed', then
+        // the inner loop might have a total latency of about 6 cycles.
+        // Yet we process between 6 and 12 input bytes, thus we get
+        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+        // for this section of the code. Hence, there is a limit
+        // to how much we can further increase this latency before
+        // it seriously harms performance.
+        size_t consumed = convert_masked_utf8_to_latin1(
+            in + pos, utf8_end_of_code_point_mask, latin1_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
+      }
+      // At this point there may remain between 0 and 12 bytes in the
+      // 64-byte block. These bytes will be processed again. So we have an
+      // 80% efficiency (in the worst case). In practice we expect an
+      // 85% to 90% efficiency.
+    }
   }
-  return count;
-}
-
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
-    const char16_t *src, size_t len) const noexcept {
-  return rvv_utf8_length_from_utf16<simdutf_ByteFlip::NONE>(src, len);
+  if (pos < size) {
+    size_t howmany = scalar::utf8_to_latin1::convert_valid(in + pos, size - pos,
+                                                           latin1_output);
+    latin1_output += howmany;
+  }
+  return latin1_output - start;
 }
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
-    const char16_t *src, size_t len) const noexcept {
-  if (supports_zvbb())
-    return rvv_utf8_length_from_utf16<simdutf_ByteFlip::ZVBB>(src, len);
-  else
-    return rvv_utf8_length_from_utf16<simdutf_ByteFlip::V>(src, len);
-}
+} // namespace utf8_to_latin1
+} // namespace
+} // namespace lsx
+} // namespace simdutf
+  // namespace simdutf
+/* end file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
+// transcoding from UTF-8 to UTF-16
+/* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
 
-simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
-    const char32_t *src, size_t len) const noexcept {
-  size_t count = 0;
-  for (size_t vl; len > 0; len -= vl, src += vl) {
-    vl = __riscv_vsetvl_e32m8(len);
-    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
-    vbool4_t m234 = __riscv_vmsgtu_vx_u32m8_b4(v, 0x7F, vl);
-    vbool4_t m34 = __riscv_vmsgtu_vx_u32m8_b4(v, 0x7FF, vl);
-    vbool4_t m4 = __riscv_vmsgtu_vx_u32m8_b4(v, 0xFFFF, vl);
-    count += vl + __riscv_vcpop_m_b4(m234, vl) + __riscv_vcpop_m_b4(m34, vl) +
-             __riscv_vcpop_m_b4(m4, vl);
-  }
-  return count;
-}
+namespace simdutf {
+namespace lsx {
+namespace {
+namespace utf8_to_utf16 {
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
-    const char *src, size_t len) const noexcept {
-  size_t count = 0;
-  for (size_t vl; len > 0; len -= vl, src += vl) {
-    vl = __riscv_vsetvl_e8m8(len);
-    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
-    vbool1_t m1234 = __riscv_vmsgt_vx_i8m8_b1(v, -65, vl);
-    vbool1_t m4 = __riscv_vmsgtu_vx_u8m8_b1(__riscv_vreinterpret_u8m8(v),
-                                            (uint8_t)0b11101111, vl);
-    count += __riscv_vcpop_m_b1(m1234, vl) + __riscv_vcpop_m_b1(m4, vl);
-  }
-  return count;
-}
+using namespace simd;
 
-simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
-    const char32_t *src, size_t len) const noexcept {
-  size_t count = 0;
-  for (size_t vl; len > 0; len -= vl, src += vl) {
-    vl = __riscv_vsetvl_e32m8(len);
-    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
-    vbool4_t m4 = __riscv_vmsgtu_vx_u32m8_b4(v, 0xFFFF, vl);
-    count += vl + __riscv_vcpop_m_b4(m4, vl);
+template <endianness endian>
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char16_t *utf16_output) noexcept {
+  // The implementation is not specific to haswell and should be moved to the
+  // generic directory.
+  size_t pos = 0;
+  char16_t *start{utf16_output};
+  const size_t safety_margin = 16; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    // This loop could be unrolled further. For example, we could process
+    // far more than 64 bytes per iteration.
+    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
+    if (in.is_ascii()) {
+      in.store_ascii_as_utf16<endian>(utf16_output);
+      utf16_output += 64;
+      pos += 64;
+    } else {
+      // Slow path. We hope that the compiler will recognize that this is a slow
+      // path. Anything that is not a continuation byte is a 'leading byte',
+      // that is, the start of a new code point.
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte.
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      // The *start* of code points is not so useful, rather, we want the *end*
+      // of code points.
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      // We process in blocks of up to 12 bytes except possibly
+      // for fast paths which may process up to 16 bytes. For the
+      // slow path to work, we should have at least 12 input bytes left.
+      size_t max_starting_point = (pos + 64) - 12;
+      // Next loop is going to run at least five times when using solely
+      // the slow/regular path, and at least four times if there are fast paths.
+      while (pos < max_starting_point) {
+        // Performance note: our ability to compute 'consumed' and
+        // then shift and recompute is critical. If there is a
+        // latency of, say, 4 cycles on getting 'consumed', then
+        // the inner loop might have a total latency of about 6 cycles.
+        // Yet we process between 6 and 12 input bytes, thus we get
+        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+        // for this section of the code. Hence, there is a limit
+        // to how much we can further increase this latency before
+        // it seriously harms performance.
+        //
+        // Thus we may allow convert_masked_utf8_to_utf16 to process
+        // more bytes at a time under a fast-path mode where 16 bytes
+        // are consumed at once (e.g., when encountering ASCII).
+        size_t consumed = convert_masked_utf8_to_utf16<endian>(
+            input + pos, utf8_end_of_code_point_mask, utf16_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
+      }
+      // At this point there may remain between 0 and 12 bytes in the
+      // 64-byte block. These bytes will be processed again. So we have an
+      // 80% efficiency (in the worst case). In practice we expect an
+      // 85% to 90% efficiency.
+    }
   }
-  return count;
+  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(
+      input + pos, size - pos, utf16_output);
+  return utf16_output - start;
 }
-/* end file src/rvv/rvv_length_from.inl.cpp */
-/* begin file src/rvv/rvv_validate.inl.cpp */
 
+} // namespace utf8_to_utf16
+} // unnamed namespace
+} // namespace lsx
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
+/* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
 
-simdutf_warn_unused bool
-implementation::validate_ascii(const char *src, size_t len) const noexcept {
-  size_t vlmax = __riscv_vsetvlmax_e8m8();
-  vint8m8_t mask = __riscv_vmv_v_x_i8m8(0, vlmax);
-  for (size_t vl; len > 0; len -= vl, src += vl) {
-    vl = __riscv_vsetvl_e8m8(len);
-    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
-    mask = __riscv_vor_vv_i8m8_tu(mask, mask, v, vl);
-  }
-  return __riscv_vfirst_m_b1(__riscv_vmslt_vx_i8m8_b1(mask, 0, vlmax), vlmax) <
-         0;
-}
+namespace simdutf {
+namespace lsx {
+namespace {
+namespace utf8_to_utf16 {
+using namespace simd;
 
-simdutf_warn_unused result implementation::validate_ascii_with_errors(
-    const char *src, size_t len) const noexcept {
-  const char *beg = src;
-  for (size_t vl; len > 0; len -= vl, src += vl) {
-    vl = __riscv_vsetvl_e8m8(len);
-    vint8m8_t v = __riscv_vle8_v_i8m8((int8_t *)src, vl);
-    long idx = __riscv_vfirst_m_b1(__riscv_vmslt_vx_i8m8_b1(v, 0, vl), vl);
-    if (idx >= 0)
-      return result(error_code::TOO_LARGE, src - beg + idx);
-  }
-  return result(error_code::SUCCESS, src - beg);
-}
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
 
-/* Returns a close estimation of the number of valid UTF-8 bytes up to the
- * first invalid one, but never overestimating. */
-simdutf_really_inline static size_t rvv_count_valid_utf8(const char *src,
-                                                         size_t len) {
-  const char *beg = src;
-  if (len < 32)
-    return 0;
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      TOO_SHORT,
+      // 1110____ ________ <three byte lead in byte 1>
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1.
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
 
-  /* validate first three bytes */
-  {
-    size_t idx = 3;
-    while (idx < len && (src[idx] >> 6) == 0b10)
-      ++idx;
-    if (idx > 3 + 3 || !scalar::utf8::validate(src, idx))
-      return 0;
-  }
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
 
-  static const uint64_t err1m[] = {0x0202020202020202, 0x4915012180808080};
-  static const uint64_t err2m[] = {0xCBCBCB8B8383A3E7, 0xCBCBDBCBCBCBCBCB};
-  static const uint64_t err3m[] = {0x0101010101010101, 0X01010101BABAAEE6};
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
-  const vuint8m1_t err1tbl =
-      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err1m, 2));
-  const vuint8m1_t err2tbl =
-      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err2m, 2));
-  const vuint8m1_t err3tbl =
-      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err3m, 2));
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
-  size_t tail = 3;
-  size_t n = len - tail;
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
+}
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
 
-  for (size_t vl; n > 0; n -= vl, src += vl) {
-    vl = __riscv_vsetvl_e8m4(n);
-    vuint8m4_t v0 = __riscv_vle8_v_u8m4((uint8_t const *)src, vl);
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
 
-    uint8_t next0 = src[vl + 0];
-    uint8_t next1 = src[vl + 1];
-    uint8_t next2 = src[vl + 2];
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
+  }
 
-    /* fast path: ASCII */
-    if (__riscv_vfirst_m_b2(__riscv_vmsgtu_vx_u8m4_b2(v0, 0b01111111, vl), vl) <
-            0 &&
-        (next0 | next1 | next2) < 0b10000000)
-      continue;
+  template <endianness endian>
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the
+    // eighth-to-last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      return 0;
+    }
+    if (pos < size) {
+      size_t howmany = scalar::utf8_to_utf16::convert<endian>(
+          in + pos, size - pos, utf16_output);
+      if (howmany == 0) {
+        return 0;
+      }
+      utf16_output += howmany;
+    }
+    return utf16_output - start;
+  }
 
-    /* see "Validating UTF-8 In Less Than One Instruction Per Byte"
-     * https://arxiv.org/abs/2010.03090 */
-    vuint8m4_t v1 = __riscv_vslide1down_vx_u8m4(v0, next0, vl);
-    vuint8m4_t v2 = __riscv_vslide1down_vx_u8m4(v1, next1, vl);
-    vuint8m4_t v3 = __riscv_vslide1down_vx_u8m4(v2, next2, vl);
+  template <endianness endian>
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char16_t *utf16_output) {
+    size_t pos = 0;
+    char16_t *start{utf16_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then we have that margin-1 is the
+    // eighth-to-last leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res =
+              scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+                  pos, in + pos, size - pos, utf16_output);
+          res.count += pos;
+          return res;
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf16_output += res.count;
+      }
+    }
+    return result(error_code::SUCCESS, utf16_output - start);
+  }
 
-    vuint8m4_t s1 = __riscv_vreinterpret_v_u16m4_u8m4(__riscv_vsrl_vx_u16m4(
-        __riscv_vreinterpret_v_u8m4_u16m4(v2), 4, __riscv_vsetvlmax_e16m4()));
-    vuint8m4_t s3 = __riscv_vreinterpret_v_u16m4_u8m4(__riscv_vsrl_vx_u16m4(
-        __riscv_vreinterpret_v_u8m4_u16m4(v3), 4, __riscv_vsetvlmax_e16m4()));
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
+  }
 
-    vuint8m4_t idx2 = __riscv_vand_vx_u8m4(v2, 0xF, vl);
-    vuint8m4_t idx1 = __riscv_vand_vx_u8m4(s1, 0xF, vl);
-    vuint8m4_t idx3 = __riscv_vand_vx_u8m4(s3, 0xF, vl);
+}; // struct validating_transcoder
+} // namespace utf8_to_utf16
+} // unnamed namespace
+} // namespace lsx
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf16/utf8_to_utf16.h */
+// transcoding from UTF-8 to UTF-32
+/* begin file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
 
-    vuint8m4_t err1 = simdutf_vrgather_u8m1x4(err1tbl, idx1);
-    vuint8m4_t err2 = simdutf_vrgather_u8m1x4(err2tbl, idx2);
-    vuint8m4_t err3 = simdutf_vrgather_u8m1x4(err3tbl, idx3);
-    vint8m4_t errs = __riscv_vreinterpret_v_u8m4_i8m4(
-        __riscv_vand_vv_u8m4(__riscv_vand_vv_u8m4(err1, err2, vl), err3, vl));
+namespace simdutf {
+namespace lsx {
+namespace {
+namespace utf8_to_utf32 {
 
-    vbool2_t is_3 = __riscv_vmsgtu_vx_u8m4_b2(v1, 0b11100000 - 1, vl);
-    vbool2_t is_4 = __riscv_vmsgtu_vx_u8m4_b2(v0, 0b11110000 - 1, vl);
-    vbool2_t is_34 = __riscv_vmor_mm_b2(is_3, is_4, vl);
-    vbool2_t err34 =
-        __riscv_vmxor_mm_b2(is_34, __riscv_vmslt_vx_i8m4_b2(errs, 0, vl), vl);
-    vbool2_t errm =
-        __riscv_vmor_mm_b2(__riscv_vmsgt_vx_i8m4_b2(errs, 0, vl), err34, vl);
-    if (__riscv_vfirst_m_b2(errm, vl) >= 0)
-      break;
-  }
+using namespace simd;
 
-  /* we need to validate the last character */
-  while (tail < len && (src[0] >> 6) == 0b10)
-    --src, ++tail;
-  return src - beg;
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char32_t *utf32_output) noexcept {
+  size_t pos = 0;
+  char32_t *start{utf32_output};
+  const size_t safety_margin = 16; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
+    if (in.is_ascii()) {
+      in.store_ascii_as_utf32(utf32_output);
+      utf32_output += 64;
+      pos += 64;
+    } else {
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      size_t max_starting_point = (pos + 64) - 12;
+      while (pos < max_starting_point) {
+        size_t consumed = convert_masked_utf8_to_utf32(
+            input + pos, utf8_end_of_code_point_mask, utf32_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
+      }
+    }
+  }
+  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos,
+                                                       utf32_output);
+  return utf32_output - start;
 }
 
-simdutf_warn_unused bool
-implementation::validate_utf8(const char *src, size_t len) const noexcept {
-  size_t count = rvv_count_valid_utf8(src, len);
-  return scalar::utf8::validate(src + count, len - count);
-}
+} // namespace utf8_to_utf32
+} // unnamed namespace
+} // namespace lsx
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
+/* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
 
-simdutf_warn_unused result implementation::validate_utf8_with_errors(
-    const char *src, size_t len) const noexcept {
-  size_t count = rvv_count_valid_utf8(src, len);
-  result res = scalar::utf8::validate_with_errors(src + count, len - count);
-  return result(res.error, count + res.count);
-}
+namespace simdutf {
+namespace lsx {
+namespace {
+namespace utf8_to_utf32 {
+using namespace simd;
 
-simdutf_warn_unused bool
-implementation::validate_utf16le(const char16_t *src,
-                                 size_t len) const noexcept {
-  return validate_utf16le_with_errors(src, len).error == error_code::SUCCESS;
-}
+simdutf_really_inline simd8<uint8_t>
+check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
+  // Bit 1 = Too Long (ASCII followed by continuation)
+  // Bit 2 = Overlong 3-byte
+  // Bit 4 = Surrogate
+  // Bit 5 = Overlong 2-byte
+  // Bit 7 = Two Continuations
+  constexpr const uint8_t TOO_SHORT = 1 << 0;  // 11______ 0_______
+                                               // 11______ 11______
+  constexpr const uint8_t TOO_LONG = 1 << 1;   // 0_______ 10______
+  constexpr const uint8_t OVERLONG_3 = 1 << 2; // 11100000 100_____
+  constexpr const uint8_t SURROGATE = 1 << 4;  // 11101101 101_____
+  constexpr const uint8_t OVERLONG_2 = 1 << 5; // 1100000_ 10______
+  constexpr const uint8_t TWO_CONTS = 1 << 7;  // 10______ 10______
+  constexpr const uint8_t TOO_LARGE = 1 << 3;  // 11110100 1001____
+                                               // 11110100 101_____
+                                               // 11110101 1001____
+                                               // 11110101 101_____
+                                               // 1111011_ 1001____
+                                               // 1111011_ 101_____
+                                               // 11111___ 1001____
+                                               // 11111___ 101_____
+  constexpr const uint8_t TOO_LARGE_1000 = 1 << 6;
+  // 11110101 1000____
+  // 1111011_ 1000____
+  // 11111___ 1000____
+  constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
 
-simdutf_warn_unused bool
-implementation::validate_utf16be(const char16_t *src,
-                                 size_t len) const noexcept {
-  return validate_utf16be_with_errors(src, len).error == error_code::SUCCESS;
-}
+  const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
+      // 0_______ ________ <ASCII in byte 1>
+      TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,
+      TOO_LONG,
+      // 10______ ________ <continuation in byte 1>
+      TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,
+      // 1100____ ________ <two byte lead in byte 1>
+      TOO_SHORT | OVERLONG_2,
+      // 1101____ ________ <two byte lead in byte 1>
+      TOO_SHORT,
+      // 1110____ ________ <three byte lead in byte 1>
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      // 1111____ ________ <four+ byte lead in byte 1>
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+  constexpr const uint8_t CARRY =
+      TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
+  const simd8<uint8_t> byte_1_low =
+      (prev1 & 0x0F)
+          .lookup_16<uint8_t>(
+              // ____0000 ________
+              CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+              // ____0001 ________
+              CARRY | OVERLONG_2,
+              // ____001_ ________
+              CARRY, CARRY,
 
-template <simdutf_ByteFlip bflip>
-simdutf_really_inline static result
-rvv_validate_utf16_with_errors(const char16_t *src, size_t len) {
-  const char16_t *beg = src;
-  uint16_t last = 0;
-  for (size_t vl; len > 0;
-       len -= vl, src += vl, last = simdutf_byteflip<bflip>(src[-1])) {
-    vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v1 = __riscv_vle16_v_u16m8((const uint16_t *)src, vl);
-    v1 = simdutf_byteflip<bflip>(v1, vl);
-    vuint16m8_t v0 = __riscv_vslide1up_vx_u16m8(v1, last, vl);
+              // ____0100 ________
+              CARRY | TOO_LARGE,
+              // ____0101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____011_ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
 
-    vbool2_t surhi = __riscv_vmseq_vx_u16m8_b2(
-        __riscv_vand_vx_u16m8(v0, 0xFC00, vl), 0xD800, vl);
-    vbool2_t surlo = __riscv_vmseq_vx_u16m8_b2(
-        __riscv_vand_vx_u16m8(v1, 0xFC00, vl), 0xDC00, vl);
+              // ____1___ ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              // ____1101 ________
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
+  const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
+      // ________ 0_______ <ASCII in byte 2>
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
+      TOO_SHORT, TOO_SHORT,
 
-    long idx = __riscv_vfirst_m_b2(__riscv_vmxor_mm_b2(surhi, surlo, vl), vl);
-    if (idx >= 0) {
-      last = idx > 0 ? simdutf_byteflip<bflip>(src[idx - 1]) : last;
-      return result(error_code::SURROGATE,
-                    src - beg + idx - (last - 0xD800u < 0x400u));
-      break;
-    }
-  }
-  if (last - 0xD800u < 0x400u) {
-    return result(error_code::SURROGATE,
-                  src - beg - 1); /* end on high surrogate */
-  } else {
-    return result(error_code::SUCCESS, src - beg);
-  }
-}
+      // ________ 1000____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 |
+          OVERLONG_4,
+      // ________ 1001____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+      // ________ 101_____
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+      TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
 
-simdutf_warn_unused result implementation::validate_utf16le_with_errors(
-    const char16_t *src, size_t len) const noexcept {
-  return rvv_validate_utf16_with_errors<simdutf_ByteFlip::NONE>(src, len);
+      // ________ 11______
+      TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
+  return (byte_1_high & byte_1_low & byte_2_high);
 }
-
-simdutf_warn_unused result implementation::validate_utf16be_with_errors(
-    const char16_t *src, size_t len) const noexcept {
-  if (supports_zvbb())
-    return rvv_validate_utf16_with_errors<simdutf_ByteFlip::ZVBB>(src, len);
-  else
-    return rvv_validate_utf16_with_errors<simdutf_ByteFlip::V>(src, len);
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
 }
 
-simdutf_warn_unused bool
-implementation::validate_utf32(const char32_t *src, size_t len) const noexcept {
-  size_t vlmax = __riscv_vsetvlmax_e32m8();
-  vuint32m8_t max = __riscv_vmv_v_x_u32m8(0x10FFFF, vlmax);
-  vuint32m8_t maxOff = __riscv_vmv_v_x_u32m8(0xFFFFF7FF, vlmax);
-  for (size_t vl; len > 0; len -= vl, src += vl) {
-    vl = __riscv_vsetvl_e32m8(len);
-    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
-    vuint32m8_t off = __riscv_vadd_vx_u32m8(v, 0xFFFF2000, vl);
-    max = __riscv_vmaxu_vv_u32m8_tu(max, max, v, vl);
-    maxOff = __riscv_vmaxu_vv_u32m8_tu(maxOff, maxOff, off, vl);
+struct validating_transcoder {
+  // If this is nonzero, there has been a UTF-8 error.
+  simd8<uint8_t> error;
+
+  validating_transcoder() : error(uint8_t(0)) {}
+  //
+  // Check whether the current bytes are valid UTF-8.
+  //
+  simdutf_really_inline void check_utf8_bytes(const simd8<uint8_t> input,
+                                              const simd8<uint8_t> prev_input) {
+    // Flip prev1...prev3 so we can easily determine if they are 2+, 3+ or 4+
+    // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
+    // small negative numbers)
+    simd8<uint8_t> prev1 = input.prev<1>(prev_input);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
   }
-  return __riscv_vfirst_m_b4(
-             __riscv_vmor_mm_b4(
-                 __riscv_vmsne_vx_u32m8_b4(max, 0x10FFFF, vlmax),
-                 __riscv_vmsne_vx_u32m8_b4(maxOff, 0xFFFFF7FF, vlmax), vlmax),
-             vlmax) < 0;
-}
 
-simdutf_warn_unused result implementation::validate_utf32_with_errors(
-    const char32_t *src, size_t len) const noexcept {
-  const char32_t *beg = src;
-  for (size_t vl; len > 0; len -= vl, src += vl) {
-    vl = __riscv_vsetvl_e32m8(len);
-    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
-    vuint32m8_t off = __riscv_vadd_vx_u32m8(v, 0xFFFF2000, vl);
-    long idx1 =
-        __riscv_vfirst_m_b4(__riscv_vmsgtu_vx_u32m8_b4(v, 0x10FFFF, vl), vl);
-    long idx2 = __riscv_vfirst_m_b4(
-        __riscv_vmsgtu_vx_u32m8_b4(off, 0xFFFFF7FF, vl), vl);
-    if (idx1 >= 0 && idx2 >= 0) {
-      if (idx1 <= idx2) {
-        return result(error_code::TOO_LARGE, src - beg + idx1);
+  simdutf_really_inline size_t convert(const char *in, size_t size,
+                                       char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 words when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
+    }
+    // If the input is long enough, then in[margin] is the eighth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
       } else {
-        return result(error_code::SURROGATE, src - beg + idx2);
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // we have an error
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
       }
     }
-    if (idx1 >= 0) {
-      return result(error_code::TOO_LARGE, src - beg + idx1);
+    if (errors()) {
+      return 0;
     }
-    if (idx2 >= 0) {
-      return result(error_code::SURROGATE, src - beg + idx2);
+    if (pos < size) {
+      size_t howmany =
+          scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
+      if (howmany == 0) {
+        return 0;
+      }
+      utf32_output += howmany;
     }
+    return utf32_output - start;
   }
-  return result(error_code::SUCCESS, src - beg);
-}
-/* end file src/rvv/rvv_validate.inl.cpp */
-
-/* begin file src/rvv/rvv_latin1_to.inl.cpp */
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
-    const char *src, size_t len, char *dst) const noexcept {
-  char *beg = dst;
-  for (size_t vl, vlOut; len > 0; len -= vl, src += vl, dst += vlOut) {
-    vl = __riscv_vsetvl_e8m2(len);
-    vuint8m2_t v1 = __riscv_vle8_v_u8m2((uint8_t *)src, vl);
-    vbool4_t nascii =
-        __riscv_vmslt_vx_i8m2_b4(__riscv_vreinterpret_v_u8m2_i8m2(v1), 0, vl);
-    size_t cnt = __riscv_vcpop_m_b4(nascii, vl);
-    vlOut = vl + cnt;
-    if (cnt == 0) {
-      __riscv_vse8_v_u8m2((uint8_t *)dst, v1, vlOut);
-      continue;
+  simdutf_really_inline result convert_with_errors(const char *in, size_t size,
+                                                   char32_t *utf32_output) {
+    size_t pos = 0;
+    char32_t *start{utf32_output};
+    // In the worst case, we have the haswell kernel which can cause an overflow
+    // of 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the
+    // last 16 bytes, and if the data is valid, then it is entirely safe because
+    // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
+    // generally assume that you have valid UTF-8 input, so we are going to go
+    // back from the end counting 8 leading bytes, to give us a good margin.
+    size_t leading_byte = 0;
+    size_t margin = size;
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
     }
-
-    vuint8m2_t v0 =
-        __riscv_vor_vx_u8m2(__riscv_vsrl_vx_u8m2(v1, 6, vl), 0b11000000, vl);
-    v1 = __riscv_vand_vx_u8m2_mu(nascii, v1, v1, 0b10111111, vl);
-
-    vuint8m4_t wide =
-        __riscv_vreinterpret_v_u16m4_u8m4(__riscv_vwmaccu_vx_u16m4(
-            __riscv_vwaddu_vv_u16m4(v0, v1, vl), 0xFF, v1, vl));
-    vbool2_t mask = __riscv_vmsgtu_vx_u8m4_b2(
-        __riscv_vsub_vx_u8m4(wide, 0b11000000, vl * 2), 1, vl * 2);
-    vuint8m4_t comp = __riscv_vcompress_vm_u8m4(wide, mask, vl * 2);
-
-    __riscv_vse8_v_u8m4((uint8_t *)dst, comp, vlOut);
-  }
-  return dst - beg;
-}
-
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
-    const char *src, size_t len, char16_t *dst) const noexcept {
-  char16_t *beg = dst;
-  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
-    vl = __riscv_vsetvl_e8m4(len);
-    vuint8m4_t v = __riscv_vle8_v_u8m4((uint8_t *)src, vl);
-    __riscv_vse16_v_u16m8((uint16_t *)dst, __riscv_vzext_vf2_u16m8(v, vl), vl);
-  }
-  return dst - beg;
-}
-
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
-    const char *src, size_t len, char16_t *dst) const noexcept {
-  char16_t *beg = dst;
-  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
-    vl = __riscv_vsetvl_e8m4(len);
-    vuint8m4_t v = __riscv_vle8_v_u8m4((uint8_t *)src, vl);
-    __riscv_vse16_v_u16m8(
-        (uint16_t *)dst,
-        __riscv_vsll_vx_u16m8(__riscv_vzext_vf2_u16m8(v, vl), 8, vl), vl);
+    // If the input is long enough, then in[margin] is the eighth-to-last
+    // leading byte.
+    const size_t safety_margin = size - margin + 1; // to avoid overruns!
+    while (pos + 64 + safety_margin <= size) {
+      simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+      if (input.is_ascii()) {
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
+        pos += 64;
+      } else {
+        // you might think that a for-loop would work, but under Visual Studio,
+        // it is not good enough.
+        static_assert(
+            (simd8x64<uint8_t>::NUM_CHUNKS == 2) ||
+                (simd8x64<uint8_t>::NUM_CHUNKS == 4),
+            "We support either two or four chunks per 64-byte block.");
+        auto zero = simd8<uint8_t>{uint8_t(0)};
+        if (simd8x64<uint8_t>::NUM_CHUNKS == 2) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+        } else if (simd8x64<uint8_t>::NUM_CHUNKS == 4) {
+          this->check_utf8_bytes(input.chunks[0], zero);
+          this->check_utf8_bytes(input.chunks[1], input.chunks[0]);
+          this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
+          this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
+        }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, utf32_output);
+          res.count += pos;
+          return res;
+        }
+        uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+        uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+        // We process in blocks of up to 12 bytes except possibly
+        // for fast paths which may process up to 16 bytes. For the
+        // slow path to work, we should have at least 12 input bytes left.
+        size_t max_starting_point = (pos + 64) - 12;
+        // Next loop is going to run at least five times.
+        while (pos < max_starting_point) {
+          // Performance note: our ability to compute 'consumed' and
+          // then shift and recompute is critical. If there is a
+          // latency of, say, 4 cycles on getting 'consumed', then
+          // the inner loop might have a total latency of about 6 cycles.
+          // Yet we process between 6 and 12 input bytes, thus we get
+          // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+          // for this section of the code. Hence, there is a limit
+          // to how much we can further increase this latency before
+          // it seriously harms performance.
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          pos += consumed;
+          utf8_end_of_code_point_mask >>= consumed;
+        }
+        // At this point there may remain between 0 and 12 bytes in the
+        // 64-byte block. These bytes will be processed again. So we have an
+        // 80% efficiency (in the worst case). In practice we expect an
+        // 85% to 90% efficiency.
+      }
+    }
+    if (errors()) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      res.count += pos;
+      return res;
+    }
+    if (pos < size) {
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
+      if (res.error) { // In case of error, we want the error position
+        res.count += pos;
+        return res;
+      } else { // In case of success, we want the number of words written
+        utf32_output += res.count;
+      }
+    }
+    return result(error_code::SUCCESS, utf32_output - start);
   }
-  return dst - beg;
-}
 
-simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
-    const char *src, size_t len, char32_t *dst) const noexcept {
-  char32_t *beg = dst;
-  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
-    vl = __riscv_vsetvl_e8m2(len);
-    vuint8m2_t v = __riscv_vle8_v_u8m2((uint8_t *)src, vl);
-    __riscv_vse32_v_u32m8((uint32_t *)dst, __riscv_vzext_vf4_u32m8(v, vl), vl);
+  simdutf_really_inline bool errors() const {
+    return this->error.any_bits_set_anywhere();
   }
-  return dst - beg;
-}
-/* end file src/rvv/rvv_latin1_to.inl.cpp */
-/* begin file src/rvv/rvv_utf16_to.inl.cpp */
-#include <cstdio>
 
-template <simdutf_ByteFlip bflip>
-simdutf_really_inline static result
-rvv_utf16_to_latin1_with_errors(const char16_t *src, size_t len, char *dst) {
-  const char16_t *const beg = src;
-  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
-    vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
-    v = simdutf_byteflip<bflip>(v, vl);
-    long idx = __riscv_vfirst_m_b2(__riscv_vmsgtu_vx_u16m8_b2(v, 255, vl), vl);
-    if (idx >= 0)
-      return result(error_code::TOO_LARGE, src - beg + idx);
-    __riscv_vse8_v_u8m4((uint8_t *)dst, __riscv_vncvt_x_x_w_u8m4(v, vl), vl);
-  }
-  return result(error_code::SUCCESS, src - beg);
-}
+}; // struct validating_transcoder
+} // namespace utf8_to_utf32
+} // unnamed namespace
+} // namespace lsx
+} // namespace simdutf
+/* end file src/generic/utf8_to_utf32/utf8_to_utf32.h */
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
-    const char16_t *src, size_t len, char *dst) const noexcept {
-  result res = convert_utf16le_to_latin1_with_errors(src, len, dst);
-  return res.error == error_code::SUCCESS ? res.count : 0;
-}
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
-    const char16_t *src, size_t len, char *dst) const noexcept {
-  result res = convert_utf16be_to_latin1_with_errors(src, len, dst);
-  return res.error == error_code::SUCCESS ? res.count : 0;
-}
+// other functions
+/* begin file src/generic/utf8.h */
 
-simdutf_warn_unused result
-implementation::convert_utf16le_to_latin1_with_errors(
-    const char16_t *src, size_t len, char *dst) const noexcept {
-  return rvv_utf16_to_latin1_with_errors<simdutf_ByteFlip::NONE>(src, len, dst);
-}
+namespace simdutf {
+namespace lsx {
+namespace {
+namespace utf8 {
 
-simdutf_warn_unused result
-implementation::convert_utf16be_to_latin1_with_errors(
-    const char16_t *src, size_t len, char *dst) const noexcept {
-  if (supports_zvbb())
-    return rvv_utf16_to_latin1_with_errors<simdutf_ByteFlip::ZVBB>(src, len,
-                                                                   dst);
-  else
-    return rvv_utf16_to_latin1_with_errors<simdutf_ByteFlip::V>(src, len, dst);
-}
+using namespace simd;
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
-    const char16_t *src, size_t len, char *dst) const noexcept {
-  const char16_t *const beg = src;
-  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
-    vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
-    __riscv_vse8_v_u8m4((uint8_t *)dst, __riscv_vncvt_x_x_w_u8m4(v, vl), vl);
+simdutf_really_inline size_t count_code_points(const char *in, size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.gt(-65);
+    count += count_ones(utf8_continuation_mask);
   }
-  return src - beg;
+  return count + scalar::utf8::count_code_points(in + pos, size - pos);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
-    const char16_t *src, size_t len, char *dst) const noexcept {
-  const char16_t *const beg = src;
-  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
-    vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
-    __riscv_vse8_v_u8m4((uint8_t *)dst, __riscv_vnsrl_wx_u8m4(v, 8, vl), vl);
+simdutf_really_inline size_t utf16_length_from_utf8(const char *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+    // We count one word for anything that is not a continuation (so
+    // leading bytes).
+    count += 64 - count_ones(utf8_continuation_mask);
+    int64_t utf8_4byte = input.gteq_unsigned(240);
+    count += count_ones(utf8_4byte);
   }
-  return src - beg;
+  return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
 }
+} // namespace utf8
+} // unnamed namespace
+} // namespace lsx
+} // namespace simdutf
+/* end file src/generic/utf8.h */
+/* begin file src/generic/utf16.h */
+namespace simdutf {
+namespace lsx {
+namespace {
+namespace utf16 {
 
-template <simdutf_ByteFlip bflip>
-simdutf_really_inline static result
-rvv_utf16_to_utf8_with_errors(const char16_t *src, size_t len, char *dst) {
-  size_t n = len;
-  const char16_t *srcBeg = src;
-  const char *dstBeg = dst;
-  size_t vl8m4 = __riscv_vsetvlmax_e8m4();
-  vbool2_t m4mulp2 = __riscv_vmseq_vx_u8m4_b2(
-      __riscv_vand_vx_u8m4(__riscv_vid_v_u8m4(vl8m4), 3, vl8m4), 2, vl8m4);
-
-  for (size_t vl, vlOut; n > 0;) {
-    vl = __riscv_vsetvl_e16m2(n);
-
-    vuint16m2_t v = __riscv_vle16_v_u16m2((uint16_t const *)src, vl);
-    v = simdutf_byteflip<bflip>(v, vl);
-    vbool8_t m234 = __riscv_vmsgtu_vx_u16m2_b8(v, 0x80 - 1, vl);
-
-    if (__riscv_vfirst_m_b8(m234, vl) < 0) { /* 1 byte utf8 */
-      vlOut = vl;
-      __riscv_vse8_v_u8m1((uint8_t *)dst, __riscv_vncvt_x_x_w_u8m1(v, vlOut),
-                          vlOut);
-      n -= vl, src += vl, dst += vlOut;
-      continue;
+template <endianness big_endian>
+simdutf_really_inline size_t count_code_points(const char16_t *in,
+                                               size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
     }
+    uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
+    count += count_ones(not_pair) / 2;
+  }
+  return count +
+         scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
+}
 
-    vbool8_t m34 = __riscv_vmsgtu_vx_u16m2_b8(v, 0x800 - 1, vl);
-
-    if (__riscv_vfirst_m_b8(m34, vl) < 0) { /* 1/2 byte utf8 */
-      /* 0: [     aaa|aabbbbbb]
-       * 1: [aabbbbbb|        ] vsll 8
-       * 2: [        |   aaaaa] vsrl 6
-       * 3: [00111111|00011111]
-       * 4: [  bbbbbb|000aaaaa] (1|2)&3
-       * 5: [11000000|11000000]
-       * 6: [10bbbbbb|110aaaaa] 4|5 */
-      vuint16m2_t twoByte = __riscv_vand_vx_u16m2(
-          __riscv_vor_vv_u16m2(__riscv_vsll_vx_u16m2(v, 8, vl),
-                               __riscv_vsrl_vx_u16m2(v, 6, vl), vl),
-          0b0011111100011111, vl);
-      vuint16m2_t vout16 =
-          __riscv_vor_vx_u16m2_mu(m234, v, twoByte, 0b1000000011000000, vl);
-      vuint8m2_t vout = __riscv_vreinterpret_v_u16m2_u8m2(vout16);
-
-      /* Every high byte that is zero should be compressed
-       * low bytes should never be compressed, so we set them
-       * to all ones, and then create a non-zero bytes mask */
-      vbool4_t mcomp =
-          __riscv_vmsne_vx_u8m2_b4(__riscv_vreinterpret_v_u16m2_u8m2(
-                                       __riscv_vor_vx_u16m2(vout16, 0xFF, vl)),
-                                   0, vl * 2);
-      vlOut = __riscv_vcpop_m_b4(mcomp, vl * 2);
-
-      vout = __riscv_vcompress_vm_u8m2(vout, mcomp, vl * 2);
-      __riscv_vse8_v_u8m2((uint8_t *)dst, vout, vlOut);
-
-      n -= vl, src += vl, dst += vlOut;
-      continue;
+template <endianness big_endian>
+simdutf_really_inline size_t utf8_length_from_utf16(const char16_t *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
     }
+    uint64_t ascii_mask = input.lteq(0x7F);
+    uint64_t twobyte_mask = input.lteq(0x7FF);
+    uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
 
-    vbool8_t sur = __riscv_vmseq_vx_u16m2_b8(
-        __riscv_vand_vx_u16m2(v, 0xF800, vl), 0xD800, vl);
-    long first = __riscv_vfirst_m_b8(sur, vl);
-    size_t tail = vl - first;
-    vl = first < 0 ? vl : first;
-
-    if (vl > 0) { /* 1/2/3 byte utf8 */
-      /* in: [aaaabbbb|bbcccccc]
-       * v1: [0bcccccc|        ] vsll  8
-       * v1: [10cccccc|        ] vsll  8 & 0b00111111 | 0b10000000
-       * v2: [        |110bbbbb] vsrl  6 & 0b00111111 | 0b11000000
-       * v2: [        |10bbbbbb] vsrl  6 & 0b00111111 | 0b10000000
-       * v3: [        |1110aaaa] vsrl 12 | 0b11100000
-       *  1: [00000000|0bcccccc|00000000|00000000] => [0bcccccc]
-       *  2: [00000000|10cccccc|110bbbbb|00000000] => [110bbbbb] [10cccccc]
-       *  3: [00000000|10cccccc|10bbbbbb|1110aaaa] => [1110aaaa] [10bbbbbb]
-       * [10cccccc]
-       */
-      vuint16m2_t v1, v2, v3, v12;
-      v1 = __riscv_vor_vx_u16m2_mu(
-          m234, v, __riscv_vand_vx_u16m2(v, 0b00111111, vl), 0b10000000, vl);
-      v1 = __riscv_vsll_vx_u16m2(v1, 8, vl);
-
-      v2 = __riscv_vor_vx_u16m2(
-          __riscv_vand_vx_u16m2(__riscv_vsrl_vx_u16m2(v, 6, vl), 0b00111111,
-                                vl),
-          0b10000000, vl);
-      v2 = __riscv_vor_vx_u16m2_mu(__riscv_vmnot_m_b8(m34, vl), v2, v2,
-                                   0b01000000, vl);
-      v3 = __riscv_vor_vx_u16m2(__riscv_vsrl_vx_u16m2(v, 12, vl), 0b11100000,
-                                vl);
-      v12 = __riscv_vor_vv_u16m2_mu(m234, v1, v1, v2, vl);
+    size_t ascii_count = count_ones(ascii_mask) / 2;
+    size_t twobyte_count = count_ones(twobyte_mask & ~ascii_mask) / 2;
+    size_t threebyte_count = count_ones(not_pair_mask & ~twobyte_mask) / 2;
+    size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
+    count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count +
+             ascii_count;
+  }
+  return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos,
+                                                                   size - pos);
+}
 
-      vuint32m4_t w12 = __riscv_vwmulu_vx_u32m4(v12, 1 << 8, vl);
-      vuint32m4_t w123 = __riscv_vwaddu_wv_u32m4_mu(m34, w12, w12, v3, vl);
-      vuint8m4_t vout = __riscv_vreinterpret_v_u32m4_u8m4(w123);
+template <endianness big_endian>
+simdutf_really_inline size_t utf32_length_from_utf16(const char16_t *in,
+                                                     size_t size) {
+  return count_code_points<big_endian>(in, size);
+}
 
-      vbool2_t mcomp = __riscv_vmor_mm_b2(
-          m4mulp2, __riscv_vmsne_vx_u8m4_b2(vout, 0, vl * 4), vl * 4);
-      vlOut = __riscv_vcpop_m_b2(mcomp, vl * 4);
+simdutf_really_inline void
+change_endianness_utf16(const char16_t *in, size_t size, char16_t *output) {
+  size_t pos = 0;
 
-      vout = __riscv_vcompress_vm_u8m4(vout, mcomp, vl * 4);
-      __riscv_vse8_v_u8m4((uint8_t *)dst, vout, vlOut);
+  while (pos < size / 32 * 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    input.swap_bytes();
+    input.store(reinterpret_cast<uint16_t *>(output));
+    pos += 32;
+    output += 32;
+  }
 
-      n -= vl, src += vl, dst += vlOut;
-    }
+  scalar::utf16::change_endianness_utf16(in + pos, size - pos, output);
+}
 
-    if (tail)
-      while (n) {
-        uint16_t word = simdutf_byteflip<bflip>(src[0]);
-        if ((word & 0xFF80) == 0) {
-          break;
-        } else if ((word & 0xF800) == 0) {
-          break;
-        } else if ((word & 0xF800) != 0xD800) {
-          break;
-        } else {
-          // must be a surrogate pair
-          if (n <= 1)
-            return result(error_code::SURROGATE, src - srcBeg);
-          uint16_t diff = word - 0xD800;
-          if (diff > 0x3FF)
-            return result(error_code::SURROGATE, src - srcBeg);
-          uint16_t diff2 = simdutf_byteflip<bflip>(src[1]) - 0xDC00;
-          if (diff2 > 0x3FF)
-            return result(error_code::SURROGATE, src - srcBeg);
+} // namespace utf16
+} // unnamed namespace
+} // namespace lsx
+} // namespace simdutf
+/* end file src/generic/utf16.h */
 
-          uint32_t value = ((diff + 0x40) << 10) + diff2;
+//
+// Implementation-specific overrides
+//
+namespace simdutf {
+namespace lsx {
 
-          // will generate four UTF-8 bytes
-          // we have 0b11110XXX 0b10XXXXXX 0b10XXXXXX 0b10XXXXXX
-          *dst++ = (char)((value >> 18) | 0b11110000);
-          *dst++ = (char)(((value >> 12) & 0b111111) | 0b10000000);
-          *dst++ = (char)(((value >> 6) & 0b111111) | 0b10000000);
-          *dst++ = (char)((value & 0b111111) | 0b10000000);
-          src += 2;
-          n -= 2;
-        }
-      }
+simdutf_warn_unused int
+implementation::detect_encodings(const char *input,
+                                 size_t length) const noexcept {
+  // If there is a BOM, then we trust it.
+  auto bom_encoding = simdutf::BOM::check_bom(input, length);
+  // todo: reimplement as a one-pass algorithm.
+  if (bom_encoding != encoding_type::unspecified) {
+    return bom_encoding;
   }
-
-  return result(error_code::SUCCESS, dst - dstBeg);
+  int out = 0;
+  if (validate_utf8(input, length)) {
+    out |= encoding_type::UTF8;
+  }
+  if ((length % 2) == 0) {
+    if (validate_utf16le(reinterpret_cast<const char16_t *>(input),
+                         length / 2)) {
+      out |= encoding_type::UTF16_LE;
+    }
+  }
+  if ((length % 4) == 0) {
+    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4)) {
+      out |= encoding_type::UTF32_LE;
+    }
+  }
+  return out;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
-    const char16_t *src, size_t len, char *dst) const noexcept {
-  result res = convert_utf16le_to_utf8_with_errors(src, len, dst);
-  return res.error == error_code::SUCCESS ? res.count : 0;
+simdutf_warn_unused bool
+implementation::validate_utf8(const char *buf, size_t len) const noexcept {
+  return lsx::utf8_validation::generic_validate_utf8(buf, len);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
-    const char16_t *src, size_t len, char *dst) const noexcept {
-  result res = convert_utf16be_to_utf8_with_errors(src, len, dst);
-  return res.error == error_code::SUCCESS ? res.count : 0;
+simdutf_warn_unused result implementation::validate_utf8_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return lsx::utf8_validation::generic_validate_utf8_with_errors(buf, len);
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
-    const char16_t *src, size_t len, char *dst) const noexcept {
-  return rvv_utf16_to_utf8_with_errors<simdutf_ByteFlip::NONE>(src, len, dst);
+simdutf_warn_unused bool
+implementation::validate_ascii(const char *buf, size_t len) const noexcept {
+  return lsx::utf8_validation::generic_validate_ascii(buf, len);
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
-    const char16_t *src, size_t len, char *dst) const noexcept {
-  if (supports_zvbb())
-    return rvv_utf16_to_utf8_with_errors<simdutf_ByteFlip::ZVBB>(src, len, dst);
-  else
-    return rvv_utf16_to_utf8_with_errors<simdutf_ByteFlip::V>(src, len, dst);
+simdutf_warn_unused result implementation::validate_ascii_with_errors(
+    const char *buf, size_t len) const noexcept {
+  return lsx::utf8_validation::generic_validate_ascii_with_errors(buf, len);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
-    const char16_t *src, size_t len, char *dst) const noexcept {
-  return convert_utf16le_to_utf8(src, len, dst);
+simdutf_warn_unused bool
+implementation::validate_utf16le(const char16_t *buf,
+                                 size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    // Empty input is valid; this protects the implementation from nullptr.
+    return true;
+  }
+  const char16_t *tail = lsx_validate_utf16<endianness::LITTLE>(buf, len);
+  if (tail) {
+    return scalar::utf16::validate<endianness::LITTLE>(tail,
+                                                       len - (tail - buf));
+  } else {
+    return false;
+  }
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
-    const char16_t *src, size_t len, char *dst) const noexcept {
-  return convert_utf16be_to_utf8(src, len, dst);
+simdutf_warn_unused bool
+implementation::validate_utf16be(const char16_t *buf,
+                                 size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    // Empty input is valid; this protects the implementation from nullptr.
+    return true;
+  }
+  const char16_t *tail = lsx_validate_utf16<endianness::BIG>(buf, len);
+  if (tail) {
+    return scalar::utf16::validate<endianness::BIG>(tail, len - (tail - buf));
+  } else {
+    return false;
+  }
 }
 
-template <simdutf_ByteFlip bflip>
-simdutf_really_inline static result
-rvv_utf16_to_utf32_with_errors(const char16_t *src, size_t len, char32_t *dst) {
-  const char16_t *const srcBeg = src;
-  char32_t *const dstBeg = dst;
-
-  constexpr const uint16_t ANY_SURROGATE_MASK = 0xf800;
-  constexpr const uint16_t ANY_SURROGATE_VALUE = 0xd800;
-  constexpr const uint16_t LO_SURROGATE_MASK = 0xfc00;
-  constexpr const uint16_t LO_SURROGATE_VALUE = 0xdc00;
-  constexpr const uint16_t HI_SURROGATE_MASK = 0xfc00;
-  constexpr const uint16_t HI_SURROGATE_VALUE = 0xd800;
-
-  uint16_t last = 0;
-  while (len > 0) {
-    size_t vl = __riscv_vsetvl_e16m2(len);
-    vuint16m2_t v0 = __riscv_vle16_v_u16m2((uint16_t const *)src, vl);
-    v0 = simdutf_byteflip<bflip>(v0, vl);
-
-    { // check fast-path
-      const vuint16m2_t v = __riscv_vand_vx_u16m2(v0, ANY_SURROGATE_MASK, vl);
-      const vbool8_t any_surrogate =
-          __riscv_vmseq_vx_u16m2_b8(v, ANY_SURROGATE_VALUE, vl);
-      if (__riscv_vfirst_m_b8(any_surrogate, vl) < 0) {
-        /* no surrogates */
-        __riscv_vse32_v_u32m4((uint32_t *)dst, __riscv_vzext_vf2_u32m4(v0, vl),
-                              vl);
-        len -= vl;
-        src += vl;
-        dst += vl;
-        continue;
-      }
-    }
-
-    if ((simdutf_byteflip<bflip>(src[0]) & LO_SURROGATE_MASK) ==
-        LO_SURROGATE_VALUE) {
-      return result(error_code::SURROGATE, src - srcBeg);
-    }
-
-    // decode surrogates
-    vuint16m2_t v1 = __riscv_vslide1down_vx_u16m2(v0, 0, vl);
-    vl = __riscv_vsetvl_e16m2(vl - 1);
-    if (vl == 0) {
-      return result(error_code::SURROGATE, src - srcBeg);
-    }
-
-    const vbool8_t surhi = __riscv_vmseq_vx_u16m2_b8(
-        __riscv_vand_vx_u16m2(v0, HI_SURROGATE_MASK, vl), HI_SURROGATE_VALUE,
-        vl);
-    const vbool8_t surlo = __riscv_vmseq_vx_u16m2_b8(
-        __riscv_vand_vx_u16m2(v1, LO_SURROGATE_MASK, vl), LO_SURROGATE_VALUE,
-        vl);
-
-    // compress everything but lo surrogates
-    const vbool8_t compress = __riscv_vmsne_vx_u16m2_b8(
-        __riscv_vand_vx_u16m2(v0, LO_SURROGATE_MASK, vl), LO_SURROGATE_VALUE,
-        vl);
-
-    {
-      const vbool8_t diff = __riscv_vmxor_mm_b8(surhi, surlo, vl);
-      const long idx = __riscv_vfirst_m_b8(diff, vl);
-      if (idx >= 0) {
-        uint16_t word = simdutf_byteflip<bflip>(src[idx]);
-        if (word < 0xD800 || word > 0xDBFF) {
-          return result(error_code::SURROGATE, src - srcBeg + idx + 1);
-        }
-        return result(error_code::SURROGATE, src - srcBeg + idx);
-      }
-    }
-
-    last = simdutf_byteflip<bflip>(src[vl]);
-    vuint32m4_t utf32 = __riscv_vzext_vf2_u32m4(v0, vl);
-
-    // v0 = 110110yyyyyyyyyy (0xd800 + yyyyyyyyyy) --- hi surrogate
-    // v1 = 110111xxxxxxxxxx (0xdc00 + xxxxxxxxxx) --- lo surrogate
-
-    // t0 = u16(                    0000_00yy_yyyy_yyyy)
-    const vuint32m4_t t0 =
-        __riscv_vzext_vf2_u32m4(__riscv_vand_vx_u16m2(v0, 0x03ff, vl), vl);
-    // t1 = u32(0000_0000_0000_yyyy_yyyy_yy00_0000_0000)
-    const vuint32m4_t t1 = __riscv_vsll_vx_u32m4(t0, 10, vl);
-
-    // t2 = u32(0000_0000_0000_0000_0000_00xx_xxxx_xxxx)
-    const vuint32m4_t t2 =
-        __riscv_vzext_vf2_u32m4(__riscv_vand_vx_u16m2(v1, 0x03ff, vl), vl);
-
-    // t3 = u32(0000_0000_0000_yyyy_yyyy_yyxx_xxxx_xxxx)
-    const vuint32m4_t t3 = __riscv_vor_vv_u32m4(t1, t2, vl);
+simdutf_warn_unused result implementation::validate_utf16le_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return result(error_code::SUCCESS, 0);
+  }
+  result res = lsx_validate_utf16_with_errors<endianness::LITTLE>(buf, len);
+  if (res.count != len) {
+    result scalar_res = scalar::utf16::validate_with_errors<endianness::LITTLE>(
+        buf + res.count, len - res.count);
+    return result(scalar_res.error, res.count + scalar_res.count);
+  } else {
+    return res;
+  }
+}
 
-    // t4 = utf32 from surrogate pairs
-    const vuint32m4_t t4 = __riscv_vadd_vx_u32m4(t3, 0x10000, vl);
+simdutf_warn_unused result implementation::validate_utf16be_with_errors(
+    const char16_t *buf, size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return result(error_code::SUCCESS, 0);
+  }
+  result res = lsx_validate_utf16_with_errors<endianness::BIG>(buf, len);
+  if (res.count != len) {
+    result scalar_res = scalar::utf16::validate_with_errors<endianness::BIG>(
+        buf + res.count, len - res.count);
+    return result(scalar_res.error, res.count + scalar_res.count);
+  } else {
+    return res;
+  }
+}
 
-    const vuint32m4_t result = __riscv_vmerge_vvm_u32m4(utf32, t4, surhi, vl);
+simdutf_warn_unused bool
+implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    // Empty input is valid; this protects the implementation from nullptr.
+    return true;
+  }
+  const char32_t *tail = lsx_validate_utf32le(buf, len);
+  if (tail) {
+    return scalar::utf32::validate(tail, len - (tail - buf));
+  } else {
+    return false;
+  }
+}
 
-    const vuint32m4_t comp = __riscv_vcompress_vm_u32m4(result, compress, vl);
-    const size_t vlOut = __riscv_vcpop_m_b8(compress, vl);
-    __riscv_vse32_v_u32m4((uint32_t *)dst, comp, vlOut);
+simdutf_warn_unused result implementation::validate_utf32_with_errors(
+    const char32_t *buf, size_t len) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return result(error_code::SUCCESS, 0);
+  }
+  result res = lsx_validate_utf32le_with_errors(buf, len);
+  if (res.count != len) {
+    result scalar_res =
+        scalar::utf32::validate_with_errors(buf + res.count, len - res.count);
+    return result(scalar_res.error, res.count + scalar_res.count);
+  } else {
+    return res;
+  }
+}
 
-    len -= vl;
-    src += vl;
-    dst += vlOut;
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
+    const char *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char *, char *> ret =
+      lsx_convert_latin1_to_utf8(buf, len, utf8_output);
+  size_t converted_chars = ret.second - utf8_output;
 
-    if ((last & LO_SURROGATE_MASK) == LO_SURROGATE_VALUE) {
-      // last item is lo surrogate and got already consumed
-      len -= 1;
-      src += 1;
-    }
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars = scalar::latin1_to_utf8::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    converted_chars += scalar_converted_chars;
   }
-
-  return result(error_code::SUCCESS, dst - dstBeg);
+  return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
-    const char16_t *src, size_t len, char32_t *dst) const noexcept {
-  result res = convert_utf16le_to_utf32_with_errors(src, len, dst);
-  return res.error == error_code::SUCCESS ? res.count : 0;
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char *, char16_t *> ret =
+      lsx_convert_latin1_to_utf16le(buf, len, utf16_output);
+  size_t converted_chars = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars =
+        scalar::latin1_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
-    const char16_t *src, size_t len, char32_t *dst) const noexcept {
-  result res = convert_utf16be_to_utf32_with_errors(src, len, dst);
-  return res.error == error_code::SUCCESS ? res.count : 0;
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char *, char16_t *> ret =
+      lsx_convert_latin1_to_utf16be(buf, len, utf16_output);
+  size_t converted_chars = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars =
+        scalar::latin1_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
 }
 
-simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
-    const char16_t *src, size_t len, char32_t *dst) const noexcept {
-  return rvv_utf16_to_utf32_with_errors<simdutf_ByteFlip::NONE>(src, len, dst);
+simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char *, char32_t *> ret =
+      lsx_convert_latin1_to_utf32(buf, len, utf32_output);
+  size_t converted_chars = ret.second - utf32_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    converted_chars += scalar_converted_chars;
+  }
+  return converted_chars;
 }
 
-simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
-    const char16_t *src, size_t len, char32_t *dst) const noexcept {
-  if (supports_zvbb())
-    return rvv_utf16_to_utf32_with_errors<simdutf_ByteFlip::ZVBB>(src, len,
-                                                                  dst);
-  else
-    return rvv_utf16_to_utf32_with_errors<simdutf_ByteFlip::V>(src, len, dst);
+simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  utf8_to_latin1::validating_transcoder converter;
+  return converter.convert(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
-    const char16_t *src, size_t len, char32_t *dst) const noexcept {
-  return convert_utf16le_to_utf32(src, len, dst);
+simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  utf8_to_latin1::validating_transcoder converter;
+  return converter.convert_with_errors(buf, len, latin1_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
-    const char16_t *src, size_t len, char32_t *dst) const noexcept {
-  return convert_utf16be_to_utf32(src, len, dst);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
+    const char *buf, size_t len, char *latin1_output) const noexcept {
+  return lsx::utf8_to_latin1::convert_valid(buf, len, latin1_output);
 }
-/* end file src/rvv/rvv_utf16_to.inl.cpp */
-/* begin file src/rvv/rvv_utf32_to.inl.cpp */
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
-    const char32_t *src, size_t len, char *dst) const noexcept {
-  result res = convert_utf32_to_latin1_with_errors(src, len, dst);
-  return res.error == error_code::SUCCESS ? res.count : 0;
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16::validating_transcoder converter;
+  return converter.convert<endianness::LITTLE>(buf, len, utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
-    const char32_t *src, size_t len, char *dst) const noexcept {
-  const char32_t *const beg = src;
-  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
-    vl = __riscv_vsetvl_e32m8(len);
-    vuint32m8_t v = __riscv_vle32_v_u32m8((uint32_t *)src, vl);
-    long idx = __riscv_vfirst_m_b4(__riscv_vmsgtu_vx_u32m8_b4(v, 255, vl), vl);
-    if (idx >= 0)
-      return result(error_code::TOO_LARGE, src - beg + idx);
-    /* We don't use vcompress here, because its performance varies widely on
-     * current platforms. This might be worth reconsidering once there is more
-     * hardware available. */
-    __riscv_vse8_v_u8m2(
-        (uint8_t *)dst,
-        __riscv_vncvt_x_x_w_u8m2(__riscv_vncvt_x_x_w_u16m4(v, vl), vl), vl);
-  }
-  return result(error_code::SUCCESS, src - beg);
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16::validating_transcoder converter;
+  return converter.convert<endianness::BIG>(buf, len, utf16_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
-    const char32_t *src, size_t len, char *dst) const noexcept {
-  return convert_utf32_to_latin1(src, len, dst);
+simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16::validating_transcoder converter;
+  return converter.convert_with_errors<endianness::LITTLE>(buf, len,
+                                                           utf16_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
-    const char32_t *src, size_t len, char *dst) const noexcept {
-  size_t n = len;
-  const char32_t *srcBeg = src;
-  const char *dstBeg = dst;
-  size_t vl8m4 = __riscv_vsetvlmax_e8m4();
-  vbool2_t m4mulp2 = __riscv_vmseq_vx_u8m4_b2(
-      __riscv_vand_vx_u8m4(__riscv_vid_v_u8m4(vl8m4), 3, vl8m4), 2, vl8m4);
-
-  for (size_t vl, vlOut; n > 0;) {
-    vl = __riscv_vsetvl_e32m4(n);
+simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
+    const char *buf, size_t len, char16_t *utf16_output) const noexcept {
+  utf8_to_utf16::validating_transcoder converter;
+  return converter.convert_with_errors<endianness::BIG>(buf, len, utf16_output);
+}
 
-    vuint32m4_t v = __riscv_vle32_v_u32m4((uint32_t const *)src, vl);
-    vbool8_t m234 = __riscv_vmsgtu_vx_u32m4_b8(v, 0x80 - 1, vl);
-    vuint16m2_t vn = __riscv_vncvt_x_x_w_u16m2(v, vl);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
+    const char *input, size_t size, char16_t *utf16_output) const noexcept {
+  return utf8_to_utf16::convert_valid<endianness::LITTLE>(input, size,
+                                                          utf16_output);
+}
 
-    if (__riscv_vfirst_m_b8(m234, vl) < 0) { /* 1 byte utf8 */
-      vlOut = vl;
-      __riscv_vse8_v_u8m1((uint8_t *)dst, __riscv_vncvt_x_x_w_u8m1(vn, vlOut),
-                          vlOut);
-      n -= vl, src += vl, dst += vlOut;
-      continue;
-    }
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
+    const char *input, size_t size, char16_t *utf16_output) const noexcept {
+  return utf8_to_utf16::convert_valid<endianness::BIG>(input, size,
+                                                       utf16_output);
+}
 
-    vbool8_t m34 = __riscv_vmsgtu_vx_u32m4_b8(v, 0x800 - 1, vl);
+simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  utf8_to_utf32::validating_transcoder converter;
+  return converter.convert(buf, len, utf32_output);
+}
 
-    if (__riscv_vfirst_m_b8(m34, vl) < 0) { /* 1/2 byte utf8 */
-      /* 0: [     aaa|aabbbbbb]
-       * 1: [aabbbbbb|        ] vsll 8
-       * 2: [        |   aaaaa] vsrl 6
-       * 3: [00111111|00111111]
-       * 4: [  bbbbbb|000aaaaa] (1|2)&3
-       * 5: [10000000|11000000]
-       * 6: [10bbbbbb|110aaaaa] 4|5 */
-      vuint16m2_t twoByte = __riscv_vand_vx_u16m2(
-          __riscv_vor_vv_u16m2(__riscv_vsll_vx_u16m2(vn, 8, vl),
-                               __riscv_vsrl_vx_u16m2(vn, 6, vl), vl),
-          0b0011111100111111, vl);
-      vuint16m2_t vout16 =
-          __riscv_vor_vx_u16m2_mu(m234, vn, twoByte, 0b1000000011000000, vl);
-      vuint8m2_t vout = __riscv_vreinterpret_v_u16m2_u8m2(vout16);
+simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
+    const char *buf, size_t len, char32_t *utf32_output) const noexcept {
+  utf8_to_utf32::validating_transcoder converter;
+  return converter.convert_with_errors(buf, len, utf32_output);
+}
 
-      /* Every high byte that is zero should be compressed
-       * low bytes should never be compressed, so we set them
-       * to all ones, and then create a non-zero bytes mask */
-      vbool4_t mcomp =
-          __riscv_vmsne_vx_u8m2_b4(__riscv_vreinterpret_v_u16m2_u8m2(
-                                       __riscv_vor_vx_u16m2(vout16, 0xFF, vl)),
-                                   0, vl * 2);
-      vlOut = __riscv_vcpop_m_b4(mcomp, vl * 2);
+simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
+    const char *input, size_t size, char32_t *utf32_output) const noexcept {
+  return utf8_to_utf32::convert_valid(input, size, utf32_output);
+}
 
-      vout = __riscv_vcompress_vm_u8m2(vout, mcomp, vl * 2);
-      __riscv_vse8_v_u8m2((uint8_t *)dst, vout, vlOut);
+simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      lsx_convert_utf16_to_latin1<endianness::LITTLE>(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - latin1_output;
 
-      n -= vl, src += vl, dst += vlOut;
-      continue;
-    }
-    long idx1 =
-        __riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0x10FFFF, vl), vl);
-    vbool8_t sur = __riscv_vmseq_vx_u32m4_b8(
-        __riscv_vand_vx_u32m4(v, 0xFFFFF800, vl), 0xD800, vl);
-    long idx2 = __riscv_vfirst_m_b8(sur, vl);
-    if (idx1 >= 0 && idx2 >= 0) {
-      if (idx1 <= idx2) {
-        return result(error_code::TOO_LARGE, src - srcBeg + idx1);
-      } else {
-        return result(error_code::SURROGATE, src - srcBeg + idx2);
-      }
-    }
-    if (idx1 >= 0) {
-      return result(error_code::TOO_LARGE, src - srcBeg + idx1);
-    }
-    if (idx2 >= 0) {
-      return result(error_code::SURROGATE, src - srcBeg + idx2);
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_latin1::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
     }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
 
-    vbool8_t m4 = __riscv_vmsgtu_vx_u32m4_b8(v, 0x10000 - 1, vl);
-    long first = __riscv_vfirst_m_b8(m4, vl);
-    size_t tail = vl - first;
-    vl = first < 0 ? vl : first;
+simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      lsx_convert_utf16_to_latin1<endianness::BIG>(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - latin1_output;
 
-    if (vl > 0) { /* 1/2/3 byte utf8 */
-      /* vn: [aaaabbbb|bbcccccc]
-       * v1: [0bcccccc|        ] vsll  8
-       * v1: [10cccccc|        ] vsll  8 & 0b00111111 | 0b10000000
-       * v2: [        |110bbbbb] vsrl  6 & 0b00111111 | 0b11000000
-       * v2: [        |10bbbbbb] vsrl  6 & 0b00111111 | 0b10000000
-       * v3: [        |1110aaaa] vsrl 12 | 0b11100000
-       *  1: [00000000|0bcccccc|00000000|00000000] => [0bcccccc]
-       *  2: [00000000|10cccccc|110bbbbb|00000000] => [110bbbbb] [10cccccc]
-       *  3: [00000000|10cccccc|10bbbbbb|1110aaaa] => [1110aaaa] [10bbbbbb]
-       * [10cccccc]
-       */
-      vuint16m2_t v1, v2, v3, v12;
-      v1 = __riscv_vor_vx_u16m2_mu(
-          m234, vn, __riscv_vand_vx_u16m2(vn, 0b00111111, vl), 0b10000000, vl);
-      v1 = __riscv_vsll_vx_u16m2(v1, 8, vl);
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_latin1::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
 
-      v2 = __riscv_vor_vx_u16m2(
-          __riscv_vand_vx_u16m2(__riscv_vsrl_vx_u16m2(vn, 6, vl), 0b00111111,
-                                vl),
-          0b10000000, vl);
-      v2 = __riscv_vor_vx_u16m2_mu(__riscv_vmnot_m_b8(m34, vl), v2, v2,
-                                   0b01000000, vl);
-      v3 = __riscv_vor_vx_u16m2(__riscv_vsrl_vx_u16m2(vn, 12, vl), 0b11100000,
-                                vl);
-      v12 = __riscv_vor_vv_u16m2_mu(m234, v1, v1, v2, vl);
+simdutf_warn_unused result
+implementation::convert_utf16le_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      lsx_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
+          buf, len, latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_latin1::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
 
-      vuint32m4_t w12 = __riscv_vwmulu_vx_u32m4(v12, 1 << 8, vl);
-      vuint32m4_t w123 = __riscv_vwaddu_wv_u32m4_mu(m34, w12, w12, v3, vl);
-      vuint8m4_t vout = __riscv_vreinterpret_v_u32m4_u8m4(w123);
+simdutf_warn_unused result
+implementation::convert_utf16be_to_latin1_with_errors(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      lsx_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len,
+                                                               latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_latin1::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
 
-      vbool2_t mcomp = __riscv_vmor_mm_b2(
-          m4mulp2, __riscv_vmsne_vx_u8m4_b2(vout, 0, vl * 4), vl * 4);
-      vlOut = __riscv_vcpop_m_b2(mcomp, vl * 4);
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  // optimization opportunity: implement a custom function.
+  return convert_utf16be_to_latin1(buf, len, latin1_output);
+}
 
-      vout = __riscv_vcompress_vm_u8m4(vout, mcomp, vl * 4);
-      __riscv_vse8_v_u8m4((uint8_t *)dst, vout, vlOut);
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
+    const char16_t *buf, size_t len, char *latin1_output) const noexcept {
+  // optimization opportunity: implement a custom function.
+  return convert_utf16le_to_latin1(buf, len, latin1_output);
+}
 
-      n -= vl, src += vl, dst += vlOut;
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      lsx_convert_utf16_to_utf8<endianness::LITTLE>(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf8_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf8::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
     }
-
-    if (tail)
-      while (n) {
-        uint32_t word = src[0];
-        if (word < 0x10000)
-          break;
-        if (word > 0x10FFFF)
-          return result(error_code::TOO_LARGE, src - srcBeg);
-        *dst++ = (uint8_t)((word >> 18) | 0b11110000);
-        *dst++ = (uint8_t)(((word >> 12) & 0b111111) | 0b10000000);
-        *dst++ = (uint8_t)(((word >> 6) & 0b111111) | 0b10000000);
-        *dst++ = (uint8_t)((word & 0b111111) | 0b10000000);
-        ++src;
-        --n;
-      }
+    saved_bytes += scalar_saved_bytes;
   }
-
-  return result(error_code::SUCCESS, dst - dstBeg);
+  return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
-    const char32_t *src, size_t len, char *dst) const noexcept {
-  result res = convert_utf32_to_utf8_with_errors(src, len, dst);
-  return res.error == error_code::SUCCESS ? res.count : 0;
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  std::pair<const char16_t *, char *> ret =
+      lsx_convert_utf16_to_utf8<endianness::BIG>(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf8_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf8::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
-    const char32_t *src, size_t len, char *dst) const noexcept {
-  return convert_utf32_to_utf8(src, len, dst);
+simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      lsx_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(buf, len,
+                                                                utf8_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_utf8::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
+  return ret.first;
 }
 
-template <simdutf_ByteFlip bflip>
-simdutf_really_inline static result
-rvv_convert_utf32_to_utf16_with_errors(const char32_t *src, size_t len,
-                                       char16_t *dst) {
-  size_t vl8m2 = __riscv_vsetvlmax_e8m2();
-  vbool4_t m4even = __riscv_vmseq_vx_u8m2_b4(
-      __riscv_vand_vx_u8m2(__riscv_vid_v_u8m2(vl8m2), 1, vl8m2), 0, vl8m2);
-  const char16_t *dstBeg = dst;
-  const char32_t *srcBeg = src;
-  for (size_t vl, vlOut; len > 0; len -= vl, src += vl, dst += vlOut) {
-    vl = __riscv_vsetvl_e32m4(len);
-    vuint32m4_t v = __riscv_vle32_v_u32m4((uint32_t *)src, vl);
-    vuint32m4_t off = __riscv_vadd_vx_u32m4(v, 0xFFFF2000, vl);
-    long idx1 =
-        __riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0x10FFFF, vl), vl);
-    long idx2 = __riscv_vfirst_m_b8(
-        __riscv_vmsgtu_vx_u32m4_b8(off, 0xFFFFF7FF, vl), vl);
-    if (idx1 >= 0 && idx2 >= 0) {
-      if (idx1 <= idx2)
-        return result(error_code::TOO_LARGE, src - srcBeg + idx1);
-      return result(error_code::SURROGATE, src - srcBeg + idx2);
-    }
-    if (idx1 >= 0)
-      return result(error_code::TOO_LARGE, src - srcBeg + idx1);
-    if (idx2 >= 0)
-      return result(error_code::SURROGATE, src - srcBeg + idx2);
-    long idx =
-        __riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0xFFFF, vl), vl);
-    if (idx < 0) {
-      vlOut = vl;
-      vuint16m2_t n =
-          simdutf_byteflip<bflip>(__riscv_vncvt_x_x_w_u16m2(v, vlOut), vlOut);
-      __riscv_vse16_v_u16m2((uint16_t *)dst, n, vlOut);
-      continue;
+simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      lsx_convert_utf16_to_utf8_with_errors<endianness::BIG>(buf, len,
+                                                             utf8_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_utf8::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
     }
-    vlOut = rvv_utf32_store_utf16_m4<bflip>((uint16_t *)dst, v, vl, m4even);
   }
-  return result(error_code::SUCCESS, dst - dstBeg);
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
+  return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
-    const char32_t *src, size_t len, char16_t *dst) const noexcept {
-  result res = convert_utf32_to_utf16le_with_errors(src, len, dst);
-  return res.error == error_code::SUCCESS ? res.count : 0;
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return convert_utf16le_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
-    const char32_t *src, size_t len, char16_t *dst) const noexcept {
-  result res = convert_utf32_to_utf16be_with_errors(src, len, dst);
-  return res.error == error_code::SUCCESS ? res.count : 0;
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
+    const char16_t *buf, size_t len, char *utf8_output) const noexcept {
+  return convert_utf16be_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
-    const char32_t *src, size_t len, char16_t *dst) const noexcept {
-  return rvv_convert_utf32_to_utf16_with_errors<simdutf_ByteFlip::NONE>(
-      src, len, dst);
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return 0;
+  }
+  std::pair<const char32_t *, char *> ret =
+      lsx_convert_utf32_to_utf8(buf, len, utf8_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf8_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes = scalar::utf32_to_utf8::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
 }
 
-simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
-    const char32_t *src, size_t len, char16_t *dst) const noexcept {
-  if (supports_zvbb())
-    return rvv_convert_utf32_to_utf16_with_errors<simdutf_ByteFlip::ZVBB>(
-        src, len, dst);
-  else
-    return rvv_convert_utf32_to_utf16_with_errors<simdutf_ByteFlip::V>(src, len,
-                                                                       dst);
+simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return result(error_code::SUCCESS, 0);
+  }
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char *> ret =
+      lsx_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
+  if (ret.first.count != len) {
+    result scalar_res = scalar::utf32_to_utf8::convert_with_errors(
+        buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      utf8_output; // Set count to the number of 8-bit code units written
+  return ret.first;
 }
 
-template <simdutf_ByteFlip bflip>
-simdutf_really_inline static size_t
-rvv_convert_valid_utf32_to_utf16(const char32_t *src, size_t len,
-                                 char16_t *dst) {
-  size_t vl8m2 = __riscv_vsetvlmax_e8m2();
-  vbool4_t m4even = __riscv_vmseq_vx_u8m2_b4(
-      __riscv_vand_vx_u8m2(__riscv_vid_v_u8m2(vl8m2), 1, vl8m2), 0, vl8m2);
-  char16_t *dstBeg = dst;
-  for (size_t vl, vlOut; len > 0; len -= vl, src += vl, dst += vlOut) {
-    vl = __riscv_vsetvl_e32m4(len);
-    vuint32m4_t v = __riscv_vle32_v_u32m4((uint32_t *)src, vl);
-    if (__riscv_vfirst_m_b8(__riscv_vmsgtu_vx_u32m4_b8(v, 0xFFFF, vl), vl) <
-        0) {
-      vlOut = vl;
-      vuint16m2_t n =
-          simdutf_byteflip<bflip>(__riscv_vncvt_x_x_w_u16m2(v, vlOut), vlOut);
-      __riscv_vse16_v_u16m2((uint16_t *)dst, n, vlOut);
-      continue;
+simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char16_t *, char32_t *> ret =
+      lsx_convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf32_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
     }
-    vlOut = rvv_utf32_store_utf16_m4<bflip>((uint16_t *)dst, v, vl, m4even);
+    saved_bytes += scalar_saved_bytes;
   }
-  return dst - dstBeg;
+  return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
-    const char32_t *src, size_t len, char16_t *dst) const noexcept {
-  return rvv_convert_valid_utf32_to_utf16<simdutf_ByteFlip::NONE>(src, len,
-                                                                  dst);
+simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  std::pair<const char16_t *, char32_t *> ret =
+      lsx_convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf32_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf16_to_utf32::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
-    const char32_t *src, size_t len, char16_t *dst) const noexcept {
-  if (supports_zvbb())
-    return rvv_convert_valid_utf32_to_utf16<simdutf_ByteFlip::ZVBB>(src, len,
-                                                                    dst);
-  else
-    return rvv_convert_valid_utf32_to_utf16<simdutf_ByteFlip::V>(src, len, dst);
+simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char32_t *> ret =
+      lsx_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(buf, len,
+                                                                 utf32_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      utf32_output; // Set count to the number of 32-bit code units written
+  return ret.first;
 }
-/* end file src/rvv/rvv_utf32_to.inl.cpp */
-/* begin file src/rvv/rvv_utf8_to.inl.cpp */
-template <typename Tdst, simdutf_ByteFlip bflip, bool validate = true>
-simdutf_really_inline static size_t rvv_utf8_to_common(char const *src,
-                                                       size_t len, Tdst *dst) {
-  static_assert(std::is_same<Tdst, uint16_t>() ||
-                    std::is_same<Tdst, uint32_t>(),
-                "invalid type");
-  constexpr bool is16 = std::is_same<Tdst, uint16_t>();
-  constexpr endianness endian =
-      bflip == simdutf_ByteFlip::NONE ? endianness::LITTLE : endianness::BIG;
-  const auto scalar = [](char const *in, size_t count, Tdst *out) {
-    return is16 ? scalar::utf8_to_utf16::convert<endian>(in, count,
-                                                         (char16_t *)out)
-                : scalar::utf8_to_utf32::convert(in, count, (char32_t *)out);
-  };
 
-  if (len < 32)
-    return scalar(src, len, dst);
-
-  /* validate first three bytes */
-  if (validate) {
-    size_t idx = 3;
-    while (idx < len && (src[idx] >> 6) == 0b10)
-      ++idx;
-    if (idx > 3 + 3 || !scalar::utf8::validate(src, idx))
-      return 0;
+simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char32_t *> ret =
+      lsx_convert_utf16_to_utf32_with_errors<endianness::BIG>(buf, len,
+                                                              utf32_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res =
+        scalar::utf16_to_utf32::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
   }
+  ret.first.count =
+      ret.second -
+      utf32_output; // Set count to the number of 32-bit code units written
+  return ret.first;
+}
 
-  size_t tail = 3;
-  size_t n = len - tail;
-  Tdst *beg = dst;
-
-  static const uint64_t err1m[] = {0x0202020202020202, 0x4915012180808080};
-  static const uint64_t err2m[] = {0xCBCBCB8B8383A3E7, 0xCBCBDBCBCBCBCBCB};
-  static const uint64_t err3m[] = {0x0101010101010101, 0X01010101BABAAEE6};
-
-  const vuint8m1_t err1tbl =
-      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err1m, 2));
-  const vuint8m1_t err2tbl =
-      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err2m, 2));
-  const vuint8m1_t err3tbl =
-      __riscv_vreinterpret_v_u64m1_u8m1(__riscv_vle64_v_u64m1(err3m, 2));
-
-  size_t vl8m2 = __riscv_vsetvlmax_e8m2();
-  vbool4_t m4even = __riscv_vmseq_vx_u8m2_b4(
-      __riscv_vand_vx_u8m2(__riscv_vid_v_u8m2(vl8m2), 1, vl8m2), 0, vl8m2);
-
-  for (size_t vl, vlOut; n > 0; n -= vl, src += vl, dst += vlOut) {
-    vl = __riscv_vsetvl_e8m2(n);
-
-    vuint8m2_t v0 = __riscv_vle8_v_u8m2((uint8_t const *)src, vl);
-    uint64_t max = __riscv_vmv_x_s_u8m1_u8(
-        __riscv_vredmaxu_vs_u8m2_u8m1(v0, __riscv_vmv_s_x_u8m1(0, vl), vl));
-
-    uint8_t next0 = src[vl + 0];
-    uint8_t next1 = src[vl + 1];
-    uint8_t next2 = src[vl + 2];
+simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      lsx_convert_utf32_to_latin1(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - latin1_output;
 
-    /* fast path: ASCII */
-    if ((max | next0 | next1 | next2) < 0b10000000) {
-      vlOut = vl;
-      if (is16)
-        __riscv_vse16_v_u16m4(
-            (uint16_t *)dst,
-            simdutf_byteflip<bflip>(__riscv_vzext_vf2_u16m4(v0, vlOut), vlOut),
-            vlOut);
-      else
-        __riscv_vse32_v_u32m8((uint32_t *)dst,
-                              __riscv_vzext_vf4_u32m8(v0, vlOut), vlOut);
-      continue;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
     }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
 
-    /* see "Validating UTF-8 In Less Than One Instruction Per Byte"
-     * https://arxiv.org/abs/2010.03090 */
-    vuint8m2_t v1 = __riscv_vslide1down_vx_u8m2(v0, next0, vl);
-    vuint8m2_t v2 = __riscv_vslide1down_vx_u8m2(v1, next1, vl);
-    vuint8m2_t v3 = __riscv_vslide1down_vx_u8m2(v2, next2, vl);
-
-    if (validate) {
-      vuint8m2_t s1 = __riscv_vreinterpret_v_u16m2_u8m2(__riscv_vsrl_vx_u16m2(
-          __riscv_vreinterpret_v_u8m2_u16m2(v2), 4, __riscv_vsetvlmax_e16m2()));
-      vuint8m2_t s3 = __riscv_vreinterpret_v_u16m2_u8m2(__riscv_vsrl_vx_u16m2(
-          __riscv_vreinterpret_v_u8m2_u16m2(v3), 4, __riscv_vsetvlmax_e16m2()));
-
-      vuint8m2_t idx2 = __riscv_vand_vx_u8m2(v2, 0xF, vl);
-      vuint8m2_t idx1 = __riscv_vand_vx_u8m2(s1, 0xF, vl);
-      vuint8m2_t idx3 = __riscv_vand_vx_u8m2(s3, 0xF, vl);
-
-      vuint8m2_t err1 = simdutf_vrgather_u8m1x2(err1tbl, idx1);
-      vuint8m2_t err2 = simdutf_vrgather_u8m1x2(err2tbl, idx2);
-      vuint8m2_t err3 = simdutf_vrgather_u8m1x2(err3tbl, idx3);
-      vint8m2_t errs = __riscv_vreinterpret_v_u8m2_i8m2(
-          __riscv_vand_vv_u8m2(__riscv_vand_vv_u8m2(err1, err2, vl), err3, vl));
-
-      vbool4_t is_3 = __riscv_vmsgtu_vx_u8m2_b4(v1, 0b11100000 - 1, vl);
-      vbool4_t is_4 = __riscv_vmsgtu_vx_u8m2_b4(v0, 0b11110000 - 1, vl);
-      vbool4_t is_34 = __riscv_vmor_mm_b4(is_3, is_4, vl);
-      vbool4_t err34 =
-          __riscv_vmxor_mm_b4(is_34, __riscv_vmslt_vx_i8m2_b4(errs, 0, vl), vl);
-      vbool4_t errm =
-          __riscv_vmor_mm_b4(__riscv_vmsgt_vx_i8m2_b4(errs, 0, vl), err34, vl);
-      if (__riscv_vfirst_m_b4(errm, vl) >= 0)
-        return 0;
+simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      lsx_convert_utf32_to_latin1_with_errors(buf, len, latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res = scalar::utf32_to_latin1::convert_with_errors(
+        buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
     }
+  }
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
 
-    /* decoding */
-
-    /* mask of non continuation bytes */
-    vbool4_t m =
-        __riscv_vmsgt_vx_i8m2_b4(__riscv_vreinterpret_v_u8m2_i8m2(v0), -65, vl);
-    vlOut = __riscv_vcpop_m_b4(m, vl);
-
-    /* extract first and second bytes */
-    vuint8m2_t b1 = __riscv_vcompress_vm_u8m2(v0, m, vl);
-    vuint8m2_t b2 = __riscv_vcompress_vm_u8m2(v1, m, vl);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      lsx_convert_utf32_to_latin1(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - latin1_output;
 
-    /* fast path: one and two byte */
-    if (max < 0b11100000) {
-      b2 = __riscv_vand_vx_u8m2(b2, 0b00111111, vlOut);
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert_valid(
+        ret.first, len - (ret.first - buf), ret.second);
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
 
-      vbool4_t m1 = __riscv_vmsgtu_vx_u8m2_b4(b1, 0b10111111, vlOut);
-      b1 = __riscv_vand_vx_u8m2_mu(m1, b1, b1, 63, vlOut);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
+    const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  // optimization opportunity: implement a custom function.
+  return convert_utf32_to_utf8(buf, len, utf8_output);
+}
 
-      vuint16m4_t b12 = __riscv_vwmulu_vv_u16m4(
-          b1,
-          __riscv_vmerge_vxm_u8m2(__riscv_vmv_v_x_u8m2(1, vlOut), 1 << 6, m1,
-                                  vlOut),
-          vlOut);
-      b12 = __riscv_vwaddu_wv_u16m4_mu(m1, b12, b12, b2, vlOut);
-      if (is16)
-        __riscv_vse16_v_u16m4((uint16_t *)dst,
-                              simdutf_byteflip<bflip>(b12, vlOut), vlOut);
-      else
-        __riscv_vse32_v_u32m8((uint32_t *)dst,
-                              __riscv_vzext_vf2_u32m8(b12, vlOut), vlOut);
-      continue;
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      lsx_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::LITTLE>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
     }
+    saved_bytes += scalar_saved_bytes;
+  }
 
-    /* fast path: one, two and three byte */
-    if (max < 0b11110000) {
-      vuint8m2_t b3 = __riscv_vcompress_vm_u8m2(v2, m, vl);
-
-      b2 = __riscv_vand_vx_u8m2(b2, 0b00111111, vlOut);
-      b3 = __riscv_vand_vx_u8m2(b3, 0b00111111, vlOut);
-
-      vbool4_t m1 = __riscv_vmsgtu_vx_u8m2_b4(b1, 0b10111111, vlOut);
-      vbool4_t m3 = __riscv_vmsgtu_vx_u8m2_b4(b1, 0b11011111, vlOut);
-
-      vuint8m2_t t1 = __riscv_vand_vx_u8m2_mu(m1, b1, b1, 63, vlOut);
-      b1 = __riscv_vand_vx_u8m2_mu(m3, t1, b1, 15, vlOut);
+  return saved_bytes;
+}
 
-      vuint16m4_t b12 = __riscv_vwmulu_vv_u16m4(
-          b1,
-          __riscv_vmerge_vxm_u8m2(__riscv_vmv_v_x_u8m2(1, vlOut), 1 << 6, m1,
-                                  vlOut),
-          vlOut);
-      b12 = __riscv_vwaddu_wv_u16m4_mu(m1, b12, b12, b2, vlOut);
-      vuint16m4_t b123 = __riscv_vwaddu_wv_u16m4_mu(
-          m3, b12, __riscv_vsll_vx_u16m4_mu(m3, b12, b12, 6, vlOut), b3, vlOut);
-      if (is16)
-        __riscv_vse16_v_u16m4((uint16_t *)dst,
-                              simdutf_byteflip<bflip>(b123, vlOut), vlOut);
-      else
-        __riscv_vse32_v_u32m8((uint32_t *)dst,
-                              __riscv_vzext_vf2_u32m8(b123, vlOut), vlOut);
-      continue;
+simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  std::pair<const char32_t *, char16_t *> ret =
+      lsx_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - utf16_output;
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes =
+        scalar::utf32_to_utf16::convert<endianness::BIG>(
+            ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
     }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
 
-    /* extract third and fourth bytes */
-    vuint8m2_t b3 = __riscv_vcompress_vm_u8m2(v2, m, vl);
-    vuint8m2_t b4 = __riscv_vcompress_vm_u8m2(v3, m, vl);
-
-    /* remove prefix from leading bytes
-     *
-     * We could also use vrgather here, but it increases register pressure,
-     * and its performance varies widely on current platforms. It might be
-     * worth reconsidering, though, once there is more hardware available.
-     * Same goes for the __riscv_vsrl_vv_u32m4 correction step.
-     *
-     * We shift left and then right by the number of bytes in the prefix,
-     * which can be calculated as follows:
-     *         x                                max(x-10, 0)
-     * 0xxx -> 0000-0111 -> sift by 0 or 1   -> 0
-     * 10xx -> 1000-1011 -> don't care
-     * 110x -> 1100,1101 -> sift by 3        -> 2,3
-     * 1110 -> 1110      -> sift by 4        -> 4
-     * 1111 -> 1111      -> sift by 5        -> 5
-     *
-     * vssubu.vx v, 10, (max(x-10, 0)) almost gives us what we want, we
-     * just need to manually detect and handle the one special case:
-     */
-#define SIMDUTF_RVV_UTF8_TO_COMMON_M1(idx)                                     \
-  vuint8m1_t c1 = __riscv_vget_v_u8m2_u8m1(b1, idx);                           \
-  vuint8m1_t c2 = __riscv_vget_v_u8m2_u8m1(b2, idx);                           \
-  vuint8m1_t c3 = __riscv_vget_v_u8m2_u8m1(b3, idx);                           \
-  vuint8m1_t c4 = __riscv_vget_v_u8m2_u8m1(b4, idx);                           \
-  /* remove prefix from trailing bytes */                                      \
-  c2 = __riscv_vand_vx_u8m1(c2, 0b00111111, vlOut);                            \
-  c3 = __riscv_vand_vx_u8m1(c3, 0b00111111, vlOut);                            \
-  c4 = __riscv_vand_vx_u8m1(c4, 0b00111111, vlOut);                            \
-  vuint8m1_t shift = __riscv_vsrl_vx_u8m1(c1, 4, vlOut);                       \
-  shift = __riscv_vmerge_vxm_u8m1(__riscv_vssubu_vx_u8m1(shift, 10, vlOut), 3, \
-                                  __riscv_vmseq_vx_u8m1_b8(shift, 12, vlOut),  \
-                                  vlOut);                                      \
-  c1 = __riscv_vsll_vv_u8m1(c1, shift, vlOut);                                 \
-  c1 = __riscv_vsrl_vv_u8m1(c1, shift, vlOut);                                 \
-  /* unconditionally widen and combine to c1234 */                             \
-  vuint16m2_t c34 = __riscv_vwaddu_wv_u16m2(                                   \
-      __riscv_vwmulu_vx_u16m2(c3, 1 << 6, vlOut), c4, vlOut);                  \
-  vuint16m2_t c12 = __riscv_vwaddu_wv_u16m2(                                   \
-      __riscv_vwmulu_vx_u16m2(c1, 1 << 6, vlOut), c2, vlOut);                  \
-  vuint32m4_t c1234 = __riscv_vwaddu_wv_u32m4(                                 \
-      __riscv_vwmulu_vx_u32m4(c12, 1 << 12, vlOut), c34, vlOut);               \
-  /* derive required right-shift amount from `shift` to reduce                 \
-   * c1234 to the required number of bytes */                                  \
-  c1234 = __riscv_vsrl_vv_u32m4(                                               \
-      c1234,                                                                   \
-      __riscv_vzext_vf4_u32m4(                                                 \
-          __riscv_vmul_vx_u8m1(                                                \
-              __riscv_vrsub_vx_u8m1(__riscv_vssubu_vx_u8m1(shift, 2, vlOut),   \
-                                    3, vlOut),                                 \
-              6, vlOut),                                                       \
-          vlOut),                                                              \
-      vlOut);                                                                  \
-  /* store result in desired format */                                         \
-  if (is16)                                                                    \
-    vlDst = rvv_utf32_store_utf16_m4<bflip>((uint16_t *)dst, c1234, vlOut,     \
-                                            m4even);                           \
-  else                                                                         \
-    vlDst = vlOut, __riscv_vse32_v_u32m4((uint32_t *)dst, c1234, vlOut);
-
-    /* Unrolling this manually reduces register pressure and allows
-     * us to terminate early. */
-    {
-      size_t vlOutm2 = vlOut, vlDst;
-      vlOut = __riscv_vsetvl_e8m1(vlOut);
-      SIMDUTF_RVV_UTF8_TO_COMMON_M1(0)
-      if (vlOutm2 == vlOut) {
-        vlOut = vlDst;
-        continue;
-      }
-
-      dst += vlDst;
-      vlOut = vlOutm2 - vlOut;
-    }
-    {
-      size_t vlDst;
-      SIMDUTF_RVV_UTF8_TO_COMMON_M1(1)
-      vlOut = vlDst;
+simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      lsx_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(buf, len,
+                                                                 utf16_output);
+  if (ret.first.count != len) {
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
     }
-
-#undef SIMDUTF_RVV_UTF8_TO_COMMON_M1
   }
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
+  return ret.first;
+}
 
-  /* validate the last character and reparse it + tail */
-  if (len > tail) {
-    if ((src[0] >> 6) == 0b10)
-      --dst;
-    while ((src[0] >> 6) == 0b10 && tail < len)
-      --src, ++tail;
-    if (is16) {
-      /* go back one more, when on high surrogate */
-      if (simdutf_byteflip<bflip>((uint16_t)dst[-1]) >= 0xD800 &&
-          simdutf_byteflip<bflip>((uint16_t)dst[-1]) <= 0xDBFF)
-        --dst;
+simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  // ret.first.count is always the position in the buffer, not the number of
+  // code units written even if finished
+  std::pair<result, char16_t *> ret =
+      lsx_convert_utf32_to_utf16_with_errors<endianness::BIG>(buf, len,
+                                                              utf16_output);
+  if (ret.first.count != len) {
+    result scalar_res =
+        scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
+            buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
     }
   }
-  size_t ret = scalar(src, tail, dst);
-  if (ret == 0)
-    return 0;
-  return (size_t)(dst - beg) + ret;
+  ret.first.count =
+      ret.second -
+      utf16_output; // Set count to the number of 16-bit code units written
+  return ret.first;
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
-    const char *src, size_t len, char *dst) const noexcept {
-  const char *beg = dst;
-  uint8_t last = 0;
-  for (size_t vl, vlOut; len > 0;
-       len -= vl, src += vl, dst += vlOut, last = src[-1]) {
-    vl = __riscv_vsetvl_e8m2(len);
-    vuint8m2_t v1 = __riscv_vle8_v_u8m2((uint8_t *)src, vl);
-    // check which bytes are ASCII
-    vbool4_t ascii = __riscv_vmsltu_vx_u8m2_b4(v1, 0b10000000, vl);
-    // count ASCII bytes
-    vlOut = __riscv_vcpop_m_b4(ascii, vl);
-    // The original code would only enter the next block after this check:
-    //   vbool4_t m = __riscv_vmsltu_vx_u8m2_b4(v1, 0b11000000, vl);
-    //   vlOut = __riscv_vcpop_m_b4(m, vl);
-    //   if (vlOut != vl || last > 0b01111111) {...}q
-    // So that everything is ASCII or continuation bytes, we just proceeded
-    // without any processing, going straight to __riscv_vse8_v_u8m2.
-    // But you need the __riscv_vslide1up_vx_u8m2 whenever there is a non-ASCII
-    // byte.
-    if (vlOut != vl) { // If not pure ASCII
-      // Non-ASCII characters
-      // We now want to mark the ascii and continuation bytes
-      vbool4_t m = __riscv_vmsltu_vx_u8m2_b4(v1, 0b11000000, vl);
-      // We count them, that's our new vlOut (output vector length)
-      vlOut = __riscv_vcpop_m_b4(m, vl);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16le(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return convert_utf32_to_utf16le(buf, len, utf16_output);
+}
 
-      vuint8m2_t v0 = __riscv_vslide1up_vx_u8m2(v1, last, vl);
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf16be(
+    const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
+  return convert_utf32_to_utf16be(buf, len, utf16_output);
+}
 
-      vbool4_t leading0 = __riscv_vmsgtu_vx_u8m2_b4(v0, 0b10111111, vl);
-      vbool4_t trailing1 = __riscv_vmslt_vx_i8m2_b4(
-          __riscv_vreinterpret_v_u8m2_i8m2(v1), (uint8_t)0b11000000, vl);
-      // -62 i 0b11000010, so we check whether any of v0 is too big
-      vbool4_t tobig = __riscv_vmand_mm_b4(
-          leading0,
-          __riscv_vmsgtu_vx_u8m2_b4(__riscv_vxor_vx_u8m2(v0, (uint8_t)-62, vl),
-                                    1, vl),
-          vl);
-      if (__riscv_vfirst_m_b4(
-              __riscv_vmor_mm_b4(
-                  tobig, __riscv_vmxor_mm_b4(leading0, trailing1, vl), vl),
-              vl) >= 0)
-        return 0;
+simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return convert_utf16le_to_utf32(buf, len, utf32_output);
+}
 
-      v1 = __riscv_vor_vx_u8m2_mu(__riscv_vmseq_vx_u8m2_b4(v0, 0b11000011, vl),
-                                  v1, v1, 0b01000000, vl);
-      v1 = __riscv_vcompress_vm_u8m2(v1, m, vl);
-    } else if (last >= 0b11000000) { // If last byte is a leading  byte and we
-                                     // got only ASCII, error!
-      return 0;
-    }
-    __riscv_vse8_v_u8m2((uint8_t *)dst, v1, vlOut);
-  }
-  if (last > 0b10111111)
-    return 0;
-  return dst - beg;
+simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf32(
+    const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
+  return convert_utf16be_to_utf32(buf, len, utf32_output);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
-    const char *src, size_t len, char *dst) const noexcept {
-  size_t res = convert_utf8_to_latin1(src, len, dst);
-  if (res)
-    return result(error_code::SUCCESS, res);
-  return scalar::utf8_to_latin1::convert_with_errors(src, len, dst);
+void implementation::change_endianness_utf16(const char16_t *input,
+                                             size_t length,
+                                             char16_t *output) const noexcept {
+  utf16::change_endianness_utf16(input, length, output);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
-    const char *src, size_t len, char *dst) const noexcept {
-  const char *beg = dst;
-  uint8_t last = 0;
-  for (size_t vl, vlOut; len > 0;
-       len -= vl, src += vl, dst += vlOut, last = src[-1]) {
-    vl = __riscv_vsetvl_e8m2(len);
-    vuint8m2_t v1 = __riscv_vle8_v_u8m2((uint8_t *)src, vl);
-    vbool4_t ascii = __riscv_vmsltu_vx_u8m2_b4(v1, 0b10000000, vl);
-    vlOut = __riscv_vcpop_m_b4(ascii, vl);
-    if (vlOut != vl) { // If not pure ASCII
-      vbool4_t m = __riscv_vmsltu_vx_u8m2_b4(v1, 0b11000000, vl);
-      vlOut = __riscv_vcpop_m_b4(m, vl);
-      vuint8m2_t v0 = __riscv_vslide1up_vx_u8m2(v1, last, vl);
-      v1 = __riscv_vor_vx_u8m2_mu(__riscv_vmseq_vx_u8m2_b4(v0, 0b11000011, vl),
-                                  v1, v1, 0b01000000, vl);
-      v1 = __riscv_vcompress_vm_u8m2(v1, m, vl);
-    }
-    __riscv_vse8_v_u8m2((uint8_t *)dst, v1, vlOut);
-  }
-  return dst - beg;
+simdutf_warn_unused size_t implementation::count_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::count_code_points<endianness::LITTLE>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
-    const char *src, size_t len, char16_t *dst) const noexcept {
-  return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::NONE>(src, len,
-                                                              (uint16_t *)dst);
+simdutf_warn_unused size_t implementation::count_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::count_code_points<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf16be(
-    const char *src, size_t len, char16_t *dst) const noexcept {
-  if (supports_zvbb())
-    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::ZVBB>(
-        src, len, (uint16_t *)dst);
-  else
-    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::V>(src, len,
-                                                             (uint16_t *)dst);
+simdutf_warn_unused size_t
+implementation::count_utf8(const char *input, size_t length) const noexcept {
+  return utf8::count_code_points(input, length);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16le_with_errors(
-    const char *src, size_t len, char16_t *dst) const noexcept {
-  size_t res = convert_utf8_to_utf16le(src, len, dst);
-  if (res)
-    return result(error_code::SUCCESS, res);
-  return scalar::utf8_to_utf16::convert_with_errors<endianness::LITTLE>(
-      src, len, dst);
+simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
+    const char *buf, size_t len) const noexcept {
+  return count_utf8(buf, len);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf16be_with_errors(
-    const char *src, size_t len, char16_t *dst) const noexcept {
-  size_t res = convert_utf8_to_utf16be(src, len, dst);
-  if (res)
-    return result(error_code::SUCCESS, res);
-  return scalar::utf8_to_utf16::convert_with_errors<endianness::BIG>(src, len,
-                                                                     dst);
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf16(size_t length) const noexcept {
+  return length;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16le(
-    const char *src, size_t len, char16_t *dst) const noexcept {
-  return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::NONE, false>(
-      src, len, (uint16_t *)dst);
+simdutf_warn_unused size_t
+implementation::latin1_length_from_utf32(size_t length) const noexcept {
+  return length;
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf16be(
-    const char *src, size_t len, char16_t *dst) const noexcept {
-  if (supports_zvbb())
-    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::ZVBB, false>(
-        src, len, (uint16_t *)dst);
-  else
-    return rvv_utf8_to_common<uint16_t, simdutf_ByteFlip::V, false>(
-        src, len, (uint16_t *)dst);
+simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
+    const char *input, size_t length) const noexcept {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(input);
+  const uint8_t *data_end = data + length;
+  uint64_t result = 0;
+  while (data + 16 < data_end) {
+    uint64_t two_bytes = 0;
+    __m128i input_vec = __lsx_vld(data, 0);
+    two_bytes =
+        __lsx_vpickve2gr_hu(__lsx_vpcnt_h(__lsx_vmskltz_b(input_vec)), 0);
+    result += 16 + two_bytes;
+    data += 16;
+  }
+  return result + scalar::latin1::utf8_length_from_latin1((const char *)data,
+                                                          data_end - data);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf8_to_utf32(
-    const char *src, size_t len, char32_t *dst) const noexcept {
-  return rvv_utf8_to_common<uint32_t, simdutf_ByteFlip::NONE>(src, len,
-                                                              (uint32_t *)dst);
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf8_length_from_utf16<endianness::LITTLE>(input, length);
 }
 
-simdutf_warn_unused result implementation::convert_utf8_to_utf32_with_errors(
-    const char *src, size_t len, char32_t *dst) const noexcept {
-  size_t res = convert_utf8_to_utf32(src, len, dst);
-  if (res)
-    return result(error_code::SUCCESS, res);
-  return scalar::utf8_to_utf32::convert_with_errors(src, len, dst);
+simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf8_length_from_utf16<endianness::BIG>(input, length);
 }
 
-simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
-    const char *src, size_t len, char32_t *dst) const noexcept {
-  return rvv_utf8_to_common<uint32_t, simdutf_ByteFlip::NONE, false>(
-      src, len, (uint32_t *)dst);
+simdutf_warn_unused size_t
+implementation::utf16_length_from_latin1(size_t length) const noexcept {
+  return length;
 }
-/* end file src/rvv/rvv_utf8_to.inl.cpp */
 
-simdutf_warn_unused int
-implementation::detect_encodings(const char *input,
-                                 size_t length) const noexcept {
-  // If there is a BOM, then we trust it.
-  auto bom_encoding = simdutf::BOM::check_bom(input, length);
-  if (bom_encoding != encoding_type::unspecified)
-    return bom_encoding;
-  // todo: reimplement as a one-pass algorithm.
-  int out = 0;
-  if (validate_utf8(input, length))
-    out |= encoding_type::UTF8;
-  if (length % 2 == 0) {
-    if (validate_utf16(reinterpret_cast<const char16_t *>(input), length / 2))
-      out |= encoding_type::UTF16_LE;
-  }
-  if (length % 4 == 0) {
-    if (validate_utf32(reinterpret_cast<const char32_t *>(input), length / 4))
-      out |= encoding_type::UTF32_LE;
-  }
+simdutf_warn_unused size_t
+implementation::utf32_length_from_latin1(size_t length) const noexcept {
+  return length;
+}
 
-  return out;
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf32_length_from_utf16<endianness::LITTLE>(input, length);
 }
 
-template <simdutf_ByteFlip bflip>
-simdutf_really_inline static void
-rvv_change_endianness_utf16(const char16_t *src, size_t len, char16_t *dst) {
-  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
-    vl = __riscv_vsetvl_e16m8(len);
-    vuint16m8_t v = __riscv_vle16_v_u16m8((uint16_t *)src, vl);
-    __riscv_vse16_v_u16m8((uint16_t *)dst, simdutf_byteflip<bflip>(v, vl), vl);
+simdutf_warn_unused size_t implementation::utf32_length_from_utf16be(
+    const char16_t *input, size_t length) const noexcept {
+  return utf16::utf32_length_from_utf16<endianness::BIG>(input, length);
+}
+
+simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  return utf8::utf16_length_from_utf8(input, length);
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  const __m128i v_80 = __lsx_vrepli_w(0x80); /*0x00000080*/
+  const __m128i v_800 = __lsx_vldi(-3832);   /*0x00000800*/
+  const __m128i v_10000 = __lsx_vldi(-3583); /*0x00010000*/
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos + 4 <= length; pos += 4) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint32_t *>(input + pos), 0);
+    const __m128i ascii_bytes_bytemask = __lsx_vslt_w(in, v_80);
+    const __m128i one_two_bytes_bytemask = __lsx_vslt_w(in, v_800);
+    const __m128i two_bytes_bytemask =
+        __lsx_vxor_v(one_two_bytes_bytemask, ascii_bytes_bytemask);
+    const __m128i three_bytes_bytemask =
+        __lsx_vxor_v(__lsx_vslt_w(in, v_10000), one_two_bytes_bytemask);
+
+    const uint32_t ascii_bytes_count = __lsx_vpickve2gr_bu(
+        __lsx_vpcnt_b(__lsx_vmskltz_w(ascii_bytes_bytemask)), 0);
+    const uint32_t two_bytes_count = __lsx_vpickve2gr_bu(
+        __lsx_vpcnt_b(__lsx_vmskltz_w(two_bytes_bytemask)), 0);
+    const uint32_t three_bytes_count = __lsx_vpickve2gr_bu(
+        __lsx_vpcnt_b(__lsx_vmskltz_w(three_bytes_bytemask)), 0);
+
+    count +=
+        16 - 3 * ascii_bytes_count - 2 * two_bytes_count - three_bytes_count;
+  }
+  return count +
+         scalar::utf32::utf8_length_from_utf32(input + pos, length - pos);
+}
+
+simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
+    const char32_t *input, size_t length) const noexcept {
+  const __m128i v_ffff = __lsx_vldi(-2304); /*0x0000ffff*/
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos + 4 <= length; pos += 4) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint32_t *>(input + pos), 0);
+    const __m128i surrogate_bytemask = __lsx_vslt_wu(v_ffff, in);
+    size_t surrogate_count = __lsx_vpickve2gr_bu(
+        __lsx_vpcnt_b(__lsx_vmskltz_w(surrogate_bytemask)), 0);
+    count += 4 + surrogate_count;
   }
+  return count +
+         scalar::utf32::utf16_length_from_utf32(input + pos, length - pos);
 }
 
-void implementation::change_endianness_utf16(const char16_t *src, size_t len,
-                                             char16_t *dst) const noexcept {
-  if (supports_zvbb())
-    return rvv_change_endianness_utf16<simdutf_ByteFlip::ZVBB>(src, len, dst);
-  else
-    return rvv_change_endianness_utf16<simdutf_ByteFlip::V>(src, len, dst);
+simdutf_warn_unused size_t implementation::utf32_length_from_utf8(
+    const char *input, size_t length) const noexcept {
+  return utf8::count_code_points(input, length);
 }
 
 simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
@@ -35002,86 +49864,21 @@ simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
 simdutf_warn_unused result implementation::base64_to_binary(
     const char *input, size_t length, char *output, base64_options options,
     last_chunk_handling_options last_chunk_options) const noexcept {
-  while (length > 0 &&
-         scalar::base64::is_ascii_white_space(input[length - 1])) {
-    length--;
-  }
-  size_t equallocation =
-      length; // location of the first padding character if any
-  size_t equalsigns = 0;
-  if (length > 0 && input[length - 1] == '=') {
-    equallocation = length - 1;
-    length -= 1;
-    equalsigns++;
-    while (length > 0 &&
-           scalar::base64::is_ascii_white_space(input[length - 1])) {
-      length--;
-    }
-    if (length > 0 && input[length - 1] == '=') {
-      equallocation = length - 1;
-      equalsigns++;
-      length -= 1;
-    }
-  }
-  if (length == 0) {
-    if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
-    }
-    return {SUCCESS, 0};
-  }
-  result r = scalar::base64::base64_tail_decode(
-      output, input, length, equalsigns, options, last_chunk_options);
-  if (last_chunk_options != stop_before_partial &&
-      r.error == error_code::SUCCESS && equalsigns > 0) {
-    // additional checks
-    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
-    }
-  }
-  return r;
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
 simdutf_warn_unused full_result implementation::base64_to_binary_details(
     const char *input, size_t length, char *output, base64_options options,
     last_chunk_handling_options last_chunk_options) const noexcept {
-  while (length > 0 &&
-         scalar::base64::is_ascii_white_space(input[length - 1])) {
-    length--;
-  }
-  size_t equallocation =
-      length; // location of the first padding character if any
-  size_t equalsigns = 0;
-  if (length > 0 && input[length - 1] == '=') {
-    equallocation = length - 1;
-    length -= 1;
-    equalsigns++;
-    while (length > 0 &&
-           scalar::base64::is_ascii_white_space(input[length - 1])) {
-      length--;
-    }
-    if (length > 0 && input[length - 1] == '=') {
-      equallocation = length - 1;
-      equalsigns++;
-      length -= 1;
-    }
-  }
-  if (length == 0) {
-    if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation, 0};
-    }
-    return {SUCCESS, 0, 0};
-  }
-  full_result r = scalar::base64::base64_tail_decode(
-      output, input, length, equalsigns, options, last_chunk_options);
-  if (last_chunk_options != stop_before_partial &&
-      r.error == error_code::SUCCESS && equalsigns > 0) {
-    // additional checks
-    if ((r.output_count % 3 == 0) ||
-        ((r.output_count % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation, r.output_count};
-    }
-  }
-  return r;
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
 simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
@@ -35092,86 +49889,21 @@ simdutf_warn_unused size_t implementation::maximal_binary_length_from_base64(
 simdutf_warn_unused result implementation::base64_to_binary(
     const char16_t *input, size_t length, char *output, base64_options options,
     last_chunk_handling_options last_chunk_options) const noexcept {
-  while (length > 0 &&
-         scalar::base64::is_ascii_white_space(input[length - 1])) {
-    length--;
-  }
-  size_t equallocation =
-      length; // location of the first padding character if any
-  auto equalsigns = 0;
-  if (length > 0 && input[length - 1] == '=') {
-    equallocation = length - 1;
-    length -= 1;
-    equalsigns++;
-    while (length > 0 &&
-           scalar::base64::is_ascii_white_space(input[length - 1])) {
-      length--;
-    }
-    if (length > 0 && input[length - 1] == '=') {
-      equallocation = length - 1;
-      equalsigns++;
-      length -= 1;
-    }
-  }
-  if (length == 0) {
-    if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
-    }
-    return {SUCCESS, 0};
-  }
-  result r = scalar::base64::base64_tail_decode(
-      output, input, length, equalsigns, options, last_chunk_options);
-  if (last_chunk_options != stop_before_partial &&
-      r.error == error_code::SUCCESS && equalsigns > 0) {
-    // additional checks
-    if ((r.count % 3 == 0) || ((r.count % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation};
-    }
-  }
-  return r;
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
 simdutf_warn_unused full_result implementation::base64_to_binary_details(
     const char16_t *input, size_t length, char *output, base64_options options,
     last_chunk_handling_options last_chunk_options) const noexcept {
-  while (length > 0 &&
-         scalar::base64::is_ascii_white_space(input[length - 1])) {
-    length--;
-  }
-  size_t equallocation =
-      length; // location of the first padding character if any
-  size_t equalsigns = 0;
-  if (length > 0 && input[length - 1] == '=') {
-    equallocation = length - 1;
-    length -= 1;
-    equalsigns++;
-    while (length > 0 &&
-           scalar::base64::is_ascii_white_space(input[length - 1])) {
-      length--;
-    }
-    if (length > 0 && input[length - 1] == '=') {
-      equallocation = length - 1;
-      equalsigns++;
-      length -= 1;
-    }
-  }
-  if (length == 0) {
-    if (equalsigns > 0) {
-      return {INVALID_BASE64_CHARACTER, equallocation, 0};
-    }
-    return {SUCCESS, 0, 0};
-  }
-  full_result r = scalar::base64::base64_tail_decode(
-      output, input, length, equalsigns, options, last_chunk_options);
-  if (last_chunk_options != stop_before_partial &&
-      r.error == error_code::SUCCESS && equalsigns > 0) {
-    // additional checks
-    if ((r.output_count % 3 == 0) ||
-        ((r.output_count % 3) + 1 + equalsigns != 4)) {
-      return {INVALID_BASE64_CHARACTER, equallocation, r.output_count};
-    }
-  }
-  return r;
+  return (options & base64_url)
+             ? compress_decode_base64<true>(output, input, length, options,
+                                            last_chunk_options)
+             : compress_decode_base64<false>(output, input, length, options,
+                                             last_chunk_options);
 }
 
 simdutf_warn_unused size_t implementation::base64_length_from_binary(
@@ -35182,148 +49914,148 @@ simdutf_warn_unused size_t implementation::base64_length_from_binary(
 size_t implementation::binary_to_base64(const char *input, size_t length,
                                         char *output,
                                         base64_options options) const noexcept {
-  return scalar::base64::tail_encode_base64(output, input, length, options);
+  if (options & base64_url) {
+    return encode_base64<true>(output, input, length, options);
+  } else {
+    return encode_base64<false>(output, input, length, options);
+  }
 }
-} // namespace rvv
+} // namespace lsx
 } // namespace simdutf
 
-/* begin file src/simdutf/rvv/end.h */
-#if SIMDUTF_CAN_ALWAYS_RUN_RVV
-// nothing needed.
-#else
-SIMDUTF_UNTARGET_REGION
-#endif
-
-/* end file src/simdutf/rvv/end.h */
-/* end file src/rvv/implementation.cpp */
-#endif
-#if SIMDUTF_IMPLEMENTATION_WESTMERE
-/* begin file src/westmere/implementation.cpp */
-/* begin file src/simdutf/westmere/begin.h */
-// redefining SIMDUTF_IMPLEMENTATION to "westmere"
-// #define SIMDUTF_IMPLEMENTATION westmere
-
-#if SIMDUTF_CAN_ALWAYS_RUN_WESTMERE
-// nothing needed.
-#else
-SIMDUTF_TARGET_WESTMERE
+/* begin file src/simdutf/lsx/end.h */
+/* end file src/simdutf/lsx/end.h */
+/* end file src/lsx/implementation.cpp */
 #endif
-/* end file src/simdutf/westmere/begin.h */
+#if SIMDUTF_IMPLEMENTATION_LASX
+/* begin file src/lasx/implementation.cpp */
+/* begin file src/simdutf/lasx/begin.h */
+// redefining SIMDUTF_IMPLEMENTATION to "lasx"
+// #define SIMDUTF_IMPLEMENTATION lasx
+/* end file src/simdutf/lasx/begin.h */
 namespace simdutf {
-namespace westmere {
+namespace lasx {
 namespace {
-#ifndef SIMDUTF_WESTMERE_H
-  #error "westmere.h must be included"
+#ifndef SIMDUTF_LASX_H
+  #error "lasx.h must be included"
 #endif
 using namespace simd;
 
+// convert vmskltz/vmskgez/vmsknz to
+// simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes index
+const uint8_t lasx_1_2_utf8_bytes_mask[] = {
+    0,   1,   4,   5,   16,  17,  20,  21,  64,  65,  68,  69,  80,  81,  84,
+    85,  2,   3,   6,   7,   18,  19,  22,  23,  66,  67,  70,  71,  82,  83,
+    86,  87,  8,   9,   12,  13,  24,  25,  28,  29,  72,  73,  76,  77,  88,
+    89,  92,  93,  10,  11,  14,  15,  26,  27,  30,  31,  74,  75,  78,  79,
+    90,  91,  94,  95,  32,  33,  36,  37,  48,  49,  52,  53,  96,  97,  100,
+    101, 112, 113, 116, 117, 34,  35,  38,  39,  50,  51,  54,  55,  98,  99,
+    102, 103, 114, 115, 118, 119, 40,  41,  44,  45,  56,  57,  60,  61,  104,
+    105, 108, 109, 120, 121, 124, 125, 42,  43,  46,  47,  58,  59,  62,  63,
+    106, 107, 110, 111, 122, 123, 126, 127, 128, 129, 132, 133, 144, 145, 148,
+    149, 192, 193, 196, 197, 208, 209, 212, 213, 130, 131, 134, 135, 146, 147,
+    150, 151, 194, 195, 198, 199, 210, 211, 214, 215, 136, 137, 140, 141, 152,
+    153, 156, 157, 200, 201, 204, 205, 216, 217, 220, 221, 138, 139, 142, 143,
+    154, 155, 158, 159, 202, 203, 206, 207, 218, 219, 222, 223, 160, 161, 164,
+    165, 176, 177, 180, 181, 224, 225, 228, 229, 240, 241, 244, 245, 162, 163,
+    166, 167, 178, 179, 182, 183, 226, 227, 230, 231, 242, 243, 246, 247, 168,
+    169, 172, 173, 184, 185, 188, 189, 232, 233, 236, 237, 248, 249, 252, 253,
+    170, 171, 174, 175, 186, 187, 190, 191, 234, 235, 238, 239, 250, 251, 254,
+    255};
+
+simdutf_really_inline __m128i lsx_swap_bytes(__m128i vec) {
+  return __lsx_vshuf4i_b(vec, 0b10110001);
+}
+simdutf_really_inline __m256i lasx_swap_bytes(__m256i vec) {
+  return __lasx_xvshuf4i_b(vec, 0b10110001);
+}
+
 simdutf_really_inline bool is_ascii(const simd8x64<uint8_t> &input) {
-  return input.reduce_or().is_ascii();
+  return input.is_ascii();
 }
 
 simdutf_unused simdutf_really_inline simd8<bool>
 must_be_continuation(const simd8<uint8_t> prev1, const simd8<uint8_t> prev2,
                      const simd8<uint8_t> prev3) {
-  simd8<uint8_t> is_second_byte =
-      prev1.saturating_sub(0b11000000u - 1); // Only 11______ will be > 0
-  simd8<uint8_t> is_third_byte =
-      prev2.saturating_sub(0b11100000u - 1); // Only 111_____ will be > 0
-  simd8<uint8_t> is_fourth_byte =
-      prev3.saturating_sub(0b11110000u - 1); // Only 1111____ will be > 0
-  // Caller requires a bool (all 1's). All values resulting from the subtraction
-  // will be <= 64, so signed comparison is fine.
-  return simd8<int8_t>(is_second_byte | is_third_byte | is_fourth_byte) >
-         int8_t(0);
+  simd8<bool> is_second_byte = prev1 >= uint8_t(0b11000000u);
+  simd8<bool> is_third_byte = prev2 >= uint8_t(0b11100000u);
+  simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
+  // Use ^ instead of | for is_*_byte, because ^ is commutative, and the caller
+  // is using ^ as well. This will work fine because we only have to report
+  // errors for cases with 0-1 lead bytes. Multiple lead bytes implies 2
+  // overlapping multibyte characters, and if that happens, there is guaranteed
+  // to be at least *one* lead byte that is part of only 1 other multibyte
+  // character. The error will be detected there.
+  return is_second_byte ^ is_third_byte ^ is_fourth_byte;
 }
 
 simdutf_really_inline simd8<bool>
 must_be_2_3_continuation(const simd8<uint8_t> prev2,
                          const simd8<uint8_t> prev3) {
-  simd8<uint8_t> is_third_byte =
-      prev2.saturating_sub(0xe0u - 0x80); // Only 111_____ will be >= 0x80
-  simd8<uint8_t> is_fourth_byte =
-      prev3.saturating_sub(0xf0u - 0x80); // Only 1111____ will be >= 0x80
-  return simd8<bool>(is_third_byte | is_fourth_byte);
+  simd8<bool> is_third_byte = prev2 >= uint8_t(0b11100000u);
+  simd8<bool> is_fourth_byte = prev3 >= uint8_t(0b11110000u);
+  return is_third_byte ^ is_fourth_byte;
 }
 
-/* begin file src/westmere/internal/loader.cpp */
-namespace internal {
-namespace westmere {
-
-/* begin file src/westmere/internal/write_v_u16_11bits_to_utf8.cpp */
-/*
- * reads a vector of uint16 values
- * bits after 11th are ignored
- * first 11 bits are encoded into utf8
- * !important! utf8_output must have at least 16 writable bytes
- */
-
-inline void write_v_u16_11bits_to_utf8(const __m128i v_u16, char *&utf8_output,
-                                       const __m128i one_byte_bytemask,
-                                       const uint16_t one_byte_bitmask) {
-  // 0b1100_0000_1000_0000
-  const __m128i v_c080 = _mm_set1_epi16((int16_t)0xc080);
-  // 0b0001_1111_0000_0000
-  const __m128i v_1f00 = _mm_set1_epi16((int16_t)0x1f00);
-  // 0b0000_0000_0011_1111
-  const __m128i v_003f = _mm_set1_epi16((int16_t)0x003f);
-
-  // 1. prepare 2-byte values
-  // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-  // expected output   : [110a|aaaa|10bb|bbbb] x 8
-
-  // t0 = [000a|aaaa|bbbb|bb00]
-  const __m128i t0 = _mm_slli_epi16(v_u16, 2);
-  // t1 = [000a|aaaa|0000|0000]
-  const __m128i t1 = _mm_and_si128(t0, v_1f00);
-  // t2 = [0000|0000|00bb|bbbb]
-  const __m128i t2 = _mm_and_si128(v_u16, v_003f);
-  // t3 = [000a|aaaa|00bb|bbbb]
-  const __m128i t3 = _mm_or_si128(t1, t2);
-  // t4 = [110a|aaaa|10bb|bbbb]
-  const __m128i t4 = _mm_or_si128(t3, v_c080);
-
-  // 2. merge ASCII and 2-byte codewords
-  const __m128i utf8_unpacked = _mm_blendv_epi8(t4, v_u16, one_byte_bytemask);
-
-  // 3. prepare bitmask for 8-bit lookup
-  //    one_byte_bitmask = hhggffeeddccbbaa -- the bits are doubled (h - MSB, a
-  //    - LSB)
-  const uint16_t m0 = one_byte_bitmask & 0x5555;      // m0 = 0h0g0f0e0d0c0b0a
-  const uint16_t m1 = static_cast<uint16_t>(m0 >> 7); // m1 = 00000000h0g0f0e0
-  const uint8_t m2 = static_cast<uint8_t>((m0 | m1) & 0xff); // m2 = hdgcfbea
-  // 4. pack the bytes
-  const uint8_t *row =
-      &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-  const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
-  const __m128i utf8_packed = _mm_shuffle_epi8(utf8_unpacked, shuffle);
-
-  // 5. store bytes
-  _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+// common functions for utf8 conversions
+simdutf_really_inline __m128i convert_utf8_3_byte_to_utf16(__m128i in) {
+  // Low half contains  10bbbbbb|10cccccc
+  // High half contains 1110aaaa|1110aaaa
+  const v16u8 sh = {2, 1, 5, 4, 8, 7, 11, 10, 0, 0, 3, 3, 6, 6, 9, 9};
+  const v8u16 v0fff = {0xfff, 0xfff, 0xfff, 0xfff, 0xfff, 0xfff, 0xfff, 0xfff};
+
+  __m128i perm = __lsx_vshuf_b(__lsx_vldi(0), in, (__m128i)sh);
+  // 1110aaaa => aaaa0000
+  __m128i perm_high = __lsx_vslli_b(__lsx_vbsrl_v(perm, 8), 4);
+  // 10bbbbbb 10cccccc => 0010bbbb bbcccccc
+  __m128i composed = __lsx_vbitsel_v(__lsx_vsrli_h(perm, 2), /* perm >> 2*/
+                                     perm, __lsx_vrepli_h(0x3f) /* 0x003f */);
+  // 0010bbbb bbcccccc => aaaabbbb bbcccccc
+  composed = __lsx_vbitsel_v(perm_high, composed, (__m128i)v0fff);
 
-  // 6. adjust pointers
-  utf8_output += row[0];
+  return composed;
 }
 
-inline void write_v_u16_11bits_to_utf8(const __m128i v_u16, char *&utf8_output,
-                                       const __m128i v_0000,
-                                       const __m128i v_ff80) {
-  // no bits set above 7th bit
-  const __m128i one_byte_bytemask =
-      _mm_cmpeq_epi16(_mm_and_si128(v_u16, v_ff80), v_0000);
-  const uint16_t one_byte_bitmask =
-      static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
-
-  write_v_u16_11bits_to_utf8(v_u16, utf8_output, one_byte_bytemask,
-                             one_byte_bitmask);
+simdutf_really_inline __m128i convert_utf8_2_byte_to_utf16(__m128i in) {
+  // 10bbbbbb 110aaaaa => 00bbbbbb 000aaaaa
+  __m128i composed = __lsx_vand_v(in, __lsx_vldi(0x3f));
+  // 00bbbbbb 000aaaaa => 00000aaa aabbbbbb
+  composed = __lsx_vbitsel_v(
+      __lsx_vsrli_h(__lsx_vslli_h(composed, 8), 2), /* (aaaaa << 8) >> 2 */
+      __lsx_vsrli_h(composed, 8),                   /* bbbbbb >> 8 */
+      __lsx_vrepli_h(0x3f));                        /* 0x003f */
+  return composed;
 }
-/* end file src/westmere/internal/write_v_u16_11bits_to_utf8.cpp */
 
-} // namespace westmere
-} // namespace internal
-/* end file src/westmere/internal/loader.cpp */
+simdutf_really_inline __m128i
+convert_utf8_1_to_2_byte_to_utf16(__m128i in, size_t shufutf8_idx) {
+  // Converts 6 1-2 byte UTF-8 characters to 6 UTF-16 characters.
+  // This is a relatively easy scenario:
+  // we process SIX (6) input code points. The max length in bytes of six
+  // code points spanning between 1 and 2 bytes each is 12 bytes.
+  __m128i sh =
+      __lsx_vld(reinterpret_cast<const uint8_t *>(
+                    simdutf::tables::utf8_to_utf16::shufutf8[shufutf8_idx]),
+                0);
+  // Shuffle
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 110aaaaa 10bbbbbb
+  __m128i perm = __lsx_vshuf_b(__lsx_vldi(0), in, sh);
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 00000000 00bbbbbb
+  __m128i ascii = __lsx_vand_v(perm, __lsx_vrepli_h(0x7f)); // 6 or 7 bits
+  // 1 byte: 00000000 00000000
+  // 2 byte: 00000aaa aa000000
+  __m128i v1f00 = __lsx_vldi(-2785); // -2785(13bit) => 151f
+  __m128i composed = __lsx_vsrli_h(__lsx_vand_v(perm, v1f00), 2); // 5 bits
+  // Combine the shifted high bits with the low bits using an add
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 00000aaa aabbbbbb
+  composed = __lsx_vadd_h(ascii, composed);
+  return composed;
+}
 
-/* begin file src/westmere/sse_validate_utf16.cpp */
+/* begin file src/lasx/lasx_validate_utf16.cpp */
 /*
     In UTF-16 code units in range 0xD800 to 0xDFFF have special meaning.
 
@@ -35344,7 +50076,7 @@ inline void write_v_u16_11bits_to_utf8(const __m128i v_u16, char *&utf8_output,
     - there must not be two consecutive high surrogates (0xdc00 .. 0xdfff)
     - there must not be sole low surrogate nor high surrogate
 
-    We are going to build three bitmasks based on the 3rd nibble:
+    We're going to build three bitmasks based on the 3rd nibble:
     - V = valid word,
     - L = low surrogate (0xd800 .. 0xdbff)
     - H = high surrogate (0xdc00 .. 0xdfff)
@@ -35371,7 +50103,7 @@ inline void write_v_u16_11bits_to_utf8(const __m128i v_u16, char *&utf8_output,
    - nullptr if an error was detected.
 */
 template <endianness big_endian>
-const char16_t *sse_validate_utf16(const char16_t *input, size_t size) {
+const char16_t *lasx_validate_utf16(const char16_t *input, size_t size) {
   const char16_t *end = input + size;
 
   const auto v_d8 = simd8<uint8_t>::splat(0xd8);
@@ -35379,29 +50111,26 @@ const char16_t *sse_validate_utf16(const char16_t *input, size_t size) {
   const auto v_fc = simd8<uint8_t>::splat(0xfc);
   const auto v_dc = simd8<uint8_t>::splat(0xdc);
 
-  while (input + simd16<uint16_t>::SIZE * 2 < end) {
+  while (input + simd16<uint16_t>::ELEMENTS * 2 < end) {
     // 0. Load data: since the validation takes into account only higher
     //    byte of each word, we compress the two vectors into one which
     //    consists only the higher bytes.
     auto in0 = simd16<uint16_t>(input);
-    auto in1 =
-        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
+    auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::ELEMENTS);
+
     if (big_endian) {
       in0 = in0.swap_bytes();
       in1 = in1.swap_bytes();
     }
 
-    const auto t0 = in0.shr<8>();
-    const auto t1 = in1.shr<8>();
-
-    const auto in = simd16<uint16_t>::pack(t0, t1);
+    const auto in = simd8<uint8_t>(__lasx_xvpermi_d(
+        __lasx_xvssrlni_bu_h(in1.value, in0.value, 8), 0b11011000));
 
     // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
     const auto surrogates_wordmask = (in & v_f8) == v_d8;
-    const uint16_t surrogates_bitmask =
-        static_cast<uint16_t>(surrogates_wordmask.to_bitmask());
-    if (surrogates_bitmask == 0x0000) {
-      input += 16;
+    const uint32_t surrogates_bitmask = surrogates_wordmask.to_bitmask();
+    if (surrogates_bitmask == 0x0) {
+      input += simd16<uint16_t>::ELEMENTS * 2;
     } else {
       // 2. We have some surrogates that have to be distinguished:
       //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
@@ -35411,36 +50140,35 @@ const char16_t *sse_validate_utf16(const char16_t *input, size_t size) {
 
       // V - non-surrogate code units
       //     V = not surrogates_wordmask
-      const uint16_t V = static_cast<uint16_t>(~surrogates_bitmask);
+      const uint32_t V = ~surrogates_bitmask;
 
       // H - word-mask for high surrogates: the six highest bits are 0b1101'11
       const auto vH = (in & v_fc) == v_dc;
-      const uint16_t H = static_cast<uint16_t>(vH.to_bitmask());
+      const uint32_t H = vH.to_bitmask();
 
       // L - word mask for low surrogates
       //     L = not H and surrogates_wordmask
-      const uint16_t L = static_cast<uint16_t>(~H & surrogates_bitmask);
+      const uint32_t L = ~H & surrogates_bitmask;
 
-      const uint16_t a = static_cast<uint16_t>(
-          L & (H >> 1)); // A low surrogate must be followed by high one.
-                         // (A low surrogate placed in the 7th register's word
-                         // is an exception we handle.)
-      const uint16_t b = static_cast<uint16_t>(
-          a << 1); // Just mark that the opinput - startite fact is hold,
-                   // thanks to that we have only two masks for valid case.
-      const uint16_t c = static_cast<uint16_t>(
-          V | a | b); // Combine all the masks into the final one.
+      const uint32_t a =
+          L & (H >> 1); // A low surrogate must be followed by a high one.
+                        // (A low surrogate placed in the register's last
+                        // word is an exception we handle.)
+      const uint32_t b =
+          a << 1; // Just mark that the opposite fact holds;
+                  // thanks to that we have only two masks for the valid case.
+      const uint32_t c = V | a | b; // Combine all the masks into the final one.
 
-      if (c == 0xffff) {
+      if (c == 0xffffffff) {
         // The whole input register contains valid UTF-16, i.e.,
         // either single code units or proper surrogate pairs.
-        input += 16;
-      } else if (c == 0x7fff) {
-        // The 15 lower code units of the input register contains valid UTF-16.
-        // The 15th word may be either a low or high surrogate. It the next
+        input += simd16<uint16_t>::ELEMENTS * 2;
+      } else if (c == 0x7fffffff) {
+        // The 31 lower code units of the input register contain valid UTF-16.
+        // The 32nd word may be either a low or high surrogate. In the next
         // iteration we 1) check if the low surrogate is followed by a high
         // one, 2) reject sole high surrogate.
-        input += 15;
+        input += simd16<uint16_t>::ELEMENTS * 2 - 1;
       } else {
         return nullptr;
       }
@@ -35451,8 +50179,8 @@ const char16_t *sse_validate_utf16(const char16_t *input, size_t size) {
 }
 
 template <endianness big_endian>
-const result sse_validate_utf16_with_errors(const char16_t *input,
-                                            size_t size) {
+const result lasx_validate_utf16_with_errors(const char16_t *input,
+                                             size_t size) {
   if (simdutf_unlikely(size == 0)) {
     return result(error_code::SUCCESS, 0);
   }
@@ -35464,30 +50192,25 @@ const result sse_validate_utf16_with_errors(const char16_t *input,
   const auto v_fc = simd8<uint8_t>::splat(0xfc);
   const auto v_dc = simd8<uint8_t>::splat(0xdc);
 
-  while (input + simd16<uint16_t>::SIZE * 2 < end) {
+  while (input + simd16<uint16_t>::ELEMENTS * 2 < end) {
     // 0. Load data: since the validation takes into account only higher
     //    byte of each word, we compress the two vectors into one which
     //    consists only the higher bytes.
     auto in0 = simd16<uint16_t>(input);
-    auto in1 =
-        simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
+    auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::ELEMENTS);
 
     if (big_endian) {
       in0 = in0.swap_bytes();
       in1 = in1.swap_bytes();
     }
-
-    const auto t0 = in0.shr<8>();
-    const auto t1 = in1.shr<8>();
-
-    const auto in = simd16<uint16_t>::pack(t0, t1);
+    const auto in = simd8<uint8_t>(__lasx_xvpermi_d(
+        __lasx_xvssrlni_bu_h(in1.value, in0.value, 8), 0b11011000));
 
     // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
     const auto surrogates_wordmask = (in & v_f8) == v_d8;
-    const uint16_t surrogates_bitmask =
-        static_cast<uint16_t>(surrogates_wordmask.to_bitmask());
-    if (surrogates_bitmask == 0x0000) {
-      input += 16;
+    const uint32_t surrogates_bitmask = surrogates_wordmask.to_bitmask();
+    if (surrogates_bitmask == 0x0) {
+      input += simd16<uint16_t>::ELEMENTS * 2;
     } else {
       // 2. We have some surrogates that have to be distinguished:
       //    - low  surrogates: 0b1101'10xx'yyyy'yyyy (0xD800..0xDBFF)
@@ -35497,36 +50220,35 @@ const result sse_validate_utf16_with_errors(const char16_t *input,
 
       // V - non-surrogate code units
       //     V = not surrogates_wordmask
-      const uint16_t V = static_cast<uint16_t>(~surrogates_bitmask);
+      const uint32_t V = ~surrogates_bitmask;
 
       // H - word-mask for high surrogates: the six highest bits are 0b1101'11
       const auto vH = (in & v_fc) == v_dc;
-      const uint16_t H = static_cast<uint16_t>(vH.to_bitmask());
+      const uint32_t H = vH.to_bitmask();
 
       // L - word mask for low surrogates
       //     L = not H and surrogates_wordmask
-      const uint16_t L = static_cast<uint16_t>(~H & surrogates_bitmask);
+      const uint32_t L = ~H & surrogates_bitmask;
 
-      const uint16_t a = static_cast<uint16_t>(
-          L & (H >> 1)); // A low surrogate must be followed by high one.
-                         // (A low surrogate placed in the 7th register's word
-                         // is an exception we handle.)
-      const uint16_t b = static_cast<uint16_t>(
-          a << 1); // Just mark that the opinput - startite fact is hold,
-                   // thanks to that we have only two masks for valid case.
-      const uint16_t c = static_cast<uint16_t>(
-          V | a | b); // Combine all the masks into the final one.
+      const uint32_t a =
+          L & (H >> 1); // A low surrogate must be followed by a high one.
+                        // (A low surrogate placed in the register's last
+                        // word is an exception we handle.)
+      const uint32_t b =
+          a << 1; // Just mark that the opposite fact holds;
+                  // thanks to that we have only two masks for the valid case.
+      const uint32_t c = V | a | b; // Combine all the masks into the final one.
 
-      if (c == 0xffff) {
+      if (c == 0xffffffff) {
         // The whole input register contains valid UTF-16, i.e.,
         // either single code units or proper surrogate pairs.
-        input += 16;
-      } else if (c == 0x7fff) {
-        // The 15 lower code units of the input register contains valid UTF-16.
-        // The 15th word may be either a low or high surrogate. It the next
+        input += simd16<uint16_t>::ELEMENTS * 2;
+      } else if (c == 0x7fffffff) {
+        // The 31 lower code units of the input register contain valid UTF-16.
+        // The 32nd word may be either a low or high surrogate. In the next
         // iteration we 1) check if the low surrogate is followed by a high
         // one, 2) reject sole high surrogate.
-        input += 15;
+        input += simd16<uint16_t>::ELEMENTS * 2 - 1;
       } else {
         return result(error_code::SURROGATE, input - start);
       }
@@ -35535,200 +50257,289 @@ const result sse_validate_utf16_with_errors(const char16_t *input,
 
   return result(error_code::SUCCESS, input - start);
 }
-/* end file src/westmere/sse_validate_utf16.cpp */
-/* begin file src/westmere/sse_validate_utf32le.cpp */
-/* Returns:
-   - pointer to the last unprocessed character (a scalar fallback should check
-   the rest);
-   - nullptr if an error was detected.
-*/
-const char32_t *sse_validate_utf32le(const char32_t *input, size_t size) {
+/* end file src/lasx/lasx_validate_utf16.cpp */
+/* begin file src/lasx/lasx_validate_utf32le.cpp */
+
+const char32_t *lasx_validate_utf32le(const char32_t *input, size_t size) {
   const char32_t *end = input + size;
 
-  const __m128i standardmax = _mm_set1_epi32(0x10ffff);
-  const __m128i offset = _mm_set1_epi32(0xffff2000);
-  const __m128i standardoffsetmax = _mm_set1_epi32(0xfffff7ff);
-  __m128i currentmax = _mm_setzero_si128();
-  __m128i currentoffsetmax = _mm_setzero_si128();
+  // Performance degradation when memory address is not 32-byte aligned
+  while (((uint64_t)input & 0x1F) && input < end) {
+    uint32_t word = *input++;
+    if (word > 0x10FFFF || (word >= 0xD800 && word <= 0xDFFF)) {
+      return nullptr;
+    }
+  }
 
-  while (input + 4 < end) {
-    const __m128i in = _mm_loadu_si128((__m128i *)input);
-    currentmax = _mm_max_epu32(in, currentmax);
+  __m256i offset = __lasx_xvreplgr2vr_w(uint32_t(0xffff2000));
+  __m256i standardoffsetmax = __lasx_xvreplgr2vr_w(uint32_t(0xfffff7ff));
+  __m256i standardmax = __lasx_xvldi(-2288); /*0x10ffff*/
+  __m256i currentmax = __lasx_xvldi(0x0);
+  __m256i currentoffsetmax = __lasx_xvldi(0x0);
+
+  while (input + 8 < end) {
+    __m256i in = __lasx_xvld(reinterpret_cast<const uint32_t *>(input), 0);
+    currentmax = __lasx_xvmax_wu(in, currentmax);
+    // 0xD8__ + 0x2000 = 0xF8__ => 0xF8__ > 0xF7FF
     currentoffsetmax =
-        _mm_max_epu32(_mm_add_epi32(in, offset), currentoffsetmax);
-    input += 4;
+        __lasx_xvmax_wu(__lasx_xvadd_w(in, offset), currentoffsetmax);
+    input += 8;
   }
-  __m128i is_zero =
-      _mm_xor_si128(_mm_max_epu32(currentmax, standardmax), standardmax);
-  if (_mm_test_all_zeros(is_zero, is_zero) == 0) {
+  __m256i is_zero =
+      __lasx_xvxor_v(__lasx_xvmax_wu(currentmax, standardmax), standardmax);
+  if (__lasx_xbnz_v(is_zero)) {
     return nullptr;
   }
 
-  is_zero = _mm_xor_si128(_mm_max_epu32(currentoffsetmax, standardoffsetmax),
-                          standardoffsetmax);
-  if (_mm_test_all_zeros(is_zero, is_zero) == 0) {
+  is_zero = __lasx_xvxor_v(__lasx_xvmax_wu(currentoffsetmax, standardoffsetmax),
+                           standardoffsetmax);
+  if (__lasx_xbnz_v(is_zero)) {
     return nullptr;
   }
-
   return input;
 }
 
-const result sse_validate_utf32le_with_errors(const char32_t *input,
-                                              size_t size) {
+const result lasx_validate_utf32le_with_errors(const char32_t *input,
+                                               size_t size) {
   const char32_t *start = input;
   const char32_t *end = input + size;
 
-  const __m128i standardmax = _mm_set1_epi32(0x10ffff);
-  const __m128i offset = _mm_set1_epi32(0xffff2000);
-  const __m128i standardoffsetmax = _mm_set1_epi32(0xfffff7ff);
-  __m128i currentmax = _mm_setzero_si128();
-  __m128i currentoffsetmax = _mm_setzero_si128();
+  // Performance degradation when memory address is not 32-byte aligned
+  while (((uint64_t)input & 0x1F) && input < end) {
+    uint32_t word = *input;
+    if (word > 0x10FFFF) {
+      return result(error_code::TOO_LARGE, input - start);
+    }
+    if (word >= 0xD800 && word <= 0xDFFF) {
+      return result(error_code::SURROGATE, input - start);
+    }
+    input++;
+  }
 
-  while (input + 4 < end) {
-    const __m128i in = _mm_loadu_si128((__m128i *)input);
-    currentmax = _mm_max_epu32(in, currentmax);
+  __m256i offset = __lasx_xvreplgr2vr_w(uint32_t(0xffff2000));
+  __m256i standardoffsetmax = __lasx_xvreplgr2vr_w(uint32_t(0xfffff7ff));
+  __m256i standardmax = __lasx_xvldi(-2288); /*0x10ffff*/
+  __m256i currentmax = __lasx_xvldi(0x0);
+  __m256i currentoffsetmax = __lasx_xvldi(0x0);
+
+  while (input + 8 < end) {
+    __m256i in = __lasx_xvld(reinterpret_cast<const uint32_t *>(input), 0);
+    currentmax = __lasx_xvmax_wu(in, currentmax);
     currentoffsetmax =
-        _mm_max_epu32(_mm_add_epi32(in, offset), currentoffsetmax);
+        __lasx_xvmax_wu(__lasx_xvadd_w(in, offset), currentoffsetmax);
 
-    __m128i is_zero =
-        _mm_xor_si128(_mm_max_epu32(currentmax, standardmax), standardmax);
-    if (_mm_test_all_zeros(is_zero, is_zero) == 0) {
+    __m256i is_zero =
+        __lasx_xvxor_v(__lasx_xvmax_wu(currentmax, standardmax), standardmax);
+    if (__lasx_xbnz_v(is_zero)) {
       return result(error_code::TOO_LARGE, input - start);
     }
-
-    is_zero = _mm_xor_si128(_mm_max_epu32(currentoffsetmax, standardoffsetmax),
-                            standardoffsetmax);
-    if (_mm_test_all_zeros(is_zero, is_zero) == 0) {
+    is_zero =
+        __lasx_xvxor_v(__lasx_xvmax_wu(currentoffsetmax, standardoffsetmax),
+                       standardoffsetmax);
+    if (__lasx_xbnz_v(is_zero)) {
       return result(error_code::SURROGATE, input - start);
     }
-    input += 4;
+    input += 8;
   }
 
   return result(error_code::SUCCESS, input - start);
 }
-/* end file src/westmere/sse_validate_utf32le.cpp */
-
-/* begin file src/westmere/sse_convert_latin1_to_utf8.cpp */
-std::pair<const char *const, char *const>
-sse_convert_latin1_to_utf8(const char *latin_input,
-                           const size_t latin_input_length, char *utf8_output) {
-  const char *end = latin_input + latin_input_length;
-
-  const __m128i v_0000 = _mm_setzero_si128();
-  // 0b1000_0000
-  const __m128i v_80 = _mm_set1_epi8((uint8_t)0x80);
-  // 0b1111_1111_1000_0000
-  const __m128i v_ff80 = _mm_set1_epi16((uint16_t)0xff80);
-
-  const __m128i latin_1_half_into_u16_byte_mask =
-      _mm_setr_epi8(0, '\x80', 1, '\x80', 2, '\x80', 3, '\x80', 4, '\x80', 5,
-                    '\x80', 6, '\x80', 7, '\x80');
+/* end file src/lasx/lasx_validate_utf32le.cpp */
 
-  const __m128i latin_2_half_into_u16_byte_mask =
-      _mm_setr_epi8(8, '\x80', 9, '\x80', 10, '\x80', 11, '\x80', 12, '\x80',
-                    13, '\x80', 14, '\x80', 15, '\x80');
+/* begin file src/lasx/lasx_convert_latin1_to_utf8.cpp */
+/*
+  Returns a pair: the first unprocessed byte from buf and utf8_output.
+  A scalar routine should carry on the conversion of the tail.
+*/
 
-  // each latin1 takes 1-2 utf8 bytes
-  // slow path writes useful 8-15 bytes twice (eagerly writes 16 bytes and then
-  // adjust the pointer) so the last write can exceed the utf8_output size by
-  // 8-1 bytes by reserving 8 extra input bytes, we expect the output to have
-  // 8-16 bytes free
-  while (end - latin_input >= 16 + 8) {
-    // Load 16 Latin1 characters (16 bytes) into a 128-bit register
-    __m128i v_latin = _mm_loadu_si128((__m128i *)latin_input);
+std::pair<const char *, char *>
+lasx_convert_latin1_to_utf8(const char *latin1_input, size_t len,
+                            char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
+  const size_t safety_margin = 12;
+  const char *end = latin1_input + len - safety_margin;
 
-    if (_mm_testz_si128(v_latin, v_80)) { // ASCII fast path!!!!
-      _mm_storeu_si128((__m128i *)utf8_output, v_latin);
-      latin_input += 16;
+  // We always write 16 bytes, of which more than the first 8 bytes
+  // are valid. A safety margin of 8 is more than sufficient.
+  while (latin1_input + 16 <= end) {
+    __m128i in8 = __lsx_vld(reinterpret_cast<const uint8_t *>(latin1_input), 0);
+    uint32_t ascii_mask = __lsx_vpickve2gr_wu(__lsx_vmskgez_b(in8), 0);
+    if (ascii_mask == 0xFFFF) {
+      __lsx_vst(in8, utf8_output, 0);
       utf8_output += 16;
+      latin1_input += 16;
       continue;
     }
+    // We just fall back on UTF-16 code. This could be optimized/simplified
+    // further.
+    __m256i in16 = __lasx_vext2xv_hu_bu(____m256i(in8));
+    // 1. prepare 2-byte values
+    // input 8-bit word : [aabb|bbbb] x 16
+    // expected output   : [1100|00aa|10bb|bbbb] x 16
+    // t0 = [0000|00aa|bbbb|bb00]
+    __m256i t0 = __lasx_xvslli_h(in16, 2);
+    // t1 = [0000|00aa|0000|0000]
+    __m256i t1 = __lasx_xvand_v(t0, __lasx_xvldi(-2785));
+    // t2 = [0000|00aa|00bb|bbbb]
+    __m256i t2 = __lasx_xvbitsel_v(t1, in16, __lasx_xvrepli_h(0x3f));
+    // t3 = [1100|00aa|10bb|bbbb]
+    __m256i t3 = __lasx_xvor_v(t2, __lasx_xvreplgr2vr_h(uint16_t(0xc080)));
+    // merge ASCII and 2-byte codewords
+    __m256i one_byte_bytemask = __lasx_xvsle_hu(in16, __lasx_xvrepli_h(0x7F));
+    __m256i utf8_unpacked = __lasx_xvbitsel_v(t3, in16, one_byte_bytemask);
+
+    const uint8_t *row0 =
+        &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+            [lasx_1_2_utf8_bytes_mask[(ascii_mask & 0xFF)]][0];
+    __m128i shuffle0 = __lsx_vld(row0 + 1, 0);
+    __m128i utf8_unpacked_lo = lasx_extracti128_lo(utf8_unpacked);
+    __m128i utf8_packed0 =
+        __lsx_vshuf_b(utf8_unpacked_lo, utf8_unpacked_lo, shuffle0);
+    __lsx_vst(utf8_packed0, utf8_output, 0);
+    utf8_output += row0[0];
+
+    const uint8_t *row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                              [lasx_1_2_utf8_bytes_mask[(ascii_mask >> 8)]][0];
+    __m128i shuffle1 = __lsx_vld(row1 + 1, 0);
+    __m128i utf8_unpacked_hi = lasx_extracti128_hi(utf8_unpacked);
+    __m128i utf8_packed1 =
+        __lsx_vshuf_b(utf8_unpacked_hi, utf8_unpacked_hi, shuffle1);
+    __lsx_vst(utf8_packed1, utf8_output, 0);
+    utf8_output += row1[0];
 
-    // assuming a/b are bytes and A/B are uint16 of the same value
-    // aaaa_aaaa_bbbb_bbbb -> AAAA_AAAA
-    __m128i v_u16_latin_1_half =
-        _mm_shuffle_epi8(v_latin, latin_1_half_into_u16_byte_mask);
-    // aaaa_aaaa_bbbb_bbbb -> BBBB_BBBB
-    __m128i v_u16_latin_2_half =
-        _mm_shuffle_epi8(v_latin, latin_2_half_into_u16_byte_mask);
+    latin1_input += 16;
+  } // while
 
-    internal::westmere::write_v_u16_11bits_to_utf8(v_u16_latin_1_half,
-                                                   utf8_output, v_0000, v_ff80);
-    internal::westmere::write_v_u16_11bits_to_utf8(v_u16_latin_2_half,
-                                                   utf8_output, v_0000, v_ff80);
-    latin_input += 16;
+  return std::make_pair(latin1_input, reinterpret_cast<char *>(utf8_output));
+}
+/* end file src/lasx/lasx_convert_latin1_to_utf8.cpp */
+/* begin file src/lasx/lasx_convert_latin1_to_utf16.cpp */
+std::pair<const char *, char16_t *>
+lasx_convert_latin1_to_utf16le(const char *buf, size_t len,
+                               char16_t *utf16_output) {
+  const char *end = buf + len;
+
+  // Performance degradation when memory address is not 32-byte aligned
+  while (((uint64_t)utf16_output & 0x1F) && buf < end) {
+    *utf16_output++ = uint8_t(*buf) & 0xFF;
+    buf++;
   }
 
-  if (end - latin_input >= 16) {
-    // Load 16 Latin1 characters (16 bytes) into a 128-bit register
-    __m128i v_latin = _mm_loadu_si128((__m128i *)latin_input);
+  while (buf + 32 <= end) {
+    __m256i in8 = __lasx_xvld(reinterpret_cast<const uint8_t *>(buf), 0);
 
-    if (_mm_testz_si128(v_latin, v_80)) { // ASCII fast path!!!!
-      _mm_storeu_si128((__m128i *)utf8_output, v_latin);
-      latin_input += 16;
-      utf8_output += 16;
-    } else {
-      // assuming a/b are bytes and A/B are uint16 of the same value
-      // aaaa_aaaa_bbbb_bbbb -> AAAA_AAAA
-      __m128i v_u16_latin_1_half =
-          _mm_shuffle_epi8(v_latin, latin_1_half_into_u16_byte_mask);
-      internal::westmere::write_v_u16_11bits_to_utf8(
-          v_u16_latin_1_half, utf8_output, v_0000, v_ff80);
-      latin_input += 8;
-    }
+    __m256i inlow = __lasx_vext2xv_hu_bu(in8);
+    __m256i in8_high = __lasx_xvpermi_q(in8, in8, 0b00000001);
+    __m256i inhigh = __lasx_vext2xv_hu_bu(in8_high);
+    __lasx_xvst(inlow, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    __lasx_xvst(inhigh, reinterpret_cast<uint16_t *>(utf16_output), 32);
+
+    utf16_output += 32;
+    buf += 32;
   }
 
-  return std::make_pair(latin_input, utf8_output);
+  if (buf + 16 <= end) {
+    __m128i zero = __lsx_vldi(0);
+    __m128i in8 = __lsx_vld(reinterpret_cast<const uint8_t *>(buf), 0);
+
+    __m128i inlow = __lsx_vilvl_b(zero, in8);
+    __m128i inhigh = __lsx_vilvh_b(zero, in8);
+    __lsx_vst(inlow, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    __lsx_vst(inhigh, reinterpret_cast<uint16_t *>(utf16_output), 16);
+
+    utf16_output += 16;
+    buf += 16;
+  }
+  return std::make_pair(buf, utf16_output);
 }
-/* end file src/westmere/sse_convert_latin1_to_utf8.cpp */
-/* begin file src/westmere/sse_convert_latin1_to_utf16.cpp */
-template <endianness big_endian>
+
 std::pair<const char *, char16_t *>
-sse_convert_latin1_to_utf16(const char *latin1_input, size_t len,
-                            char16_t *utf16_output) {
-  size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 16
-  for (size_t i = 0; i < rounded_len; i += 16) {
-    // Load 16 Latin1 characters into a 128-bit register
-    __m128i in =
-        _mm_loadu_si128(reinterpret_cast<const __m128i *>(&latin1_input[i]));
-    __m128i out1 = big_endian ? _mm_unpacklo_epi8(_mm_setzero_si128(), in)
-                              : _mm_unpacklo_epi8(in, _mm_setzero_si128());
-    __m128i out2 = big_endian ? _mm_unpackhi_epi8(_mm_setzero_si128(), in)
-                              : _mm_unpackhi_epi8(in, _mm_setzero_si128());
-    // Zero extend each Latin1 character to 16-bit integers and store the
-    // results back to memory
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(&utf16_output[i]), out1);
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(&utf16_output[i + 8]), out2);
+lasx_convert_latin1_to_utf16be(const char *buf, size_t len,
+                               char16_t *utf16_output) {
+  const char *end = buf + len;
+
+  while (((uint64_t)utf16_output & 0x1F) && buf < end) {
+    *utf16_output++ = (uint16_t(*buf++) << 8);
   }
-  // return pointers pointing to where we left off
-  return std::make_pair(latin1_input + rounded_len, utf16_output + rounded_len);
+
+  __m256i zero = __lasx_xvldi(0);
+  while (buf + 32 <= end) {
+    __m256i in8 = __lasx_xvld(reinterpret_cast<const uint8_t *>(buf), 0);
+
+    __m256i in8_shuf = __lasx_xvpermi_d(in8, 0b11011000);
+
+    __m256i inlow = __lasx_xvilvl_b(in8_shuf, zero);
+    __m256i inhigh = __lasx_xvilvh_b(in8_shuf, zero);
+    __lasx_xvst(inlow, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    __lasx_xvst(inhigh, reinterpret_cast<uint16_t *>(utf16_output), 32);
+    utf16_output += 32;
+    buf += 32;
+  }
+
+  if (buf + 16 <= end) {
+    __m128i zero_128 = __lsx_vldi(0);
+    __m128i in8 = __lsx_vld(reinterpret_cast<const uint8_t *>(buf), 0);
+
+    __m128i inlow = __lsx_vilvl_b(in8, zero_128);
+    __m128i inhigh = __lsx_vilvh_b(in8, zero_128);
+    __lsx_vst(inlow, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    __lsx_vst(inhigh, reinterpret_cast<uint16_t *>(utf16_output), 16);
+    utf16_output += 16;
+    buf += 16;
+  }
+
+  return std::make_pair(buf, utf16_output);
 }
-/* end file src/westmere/sse_convert_latin1_to_utf16.cpp */
-/* begin file src/westmere/sse_convert_latin1_to_utf32.cpp */
+/* end file src/lasx/lasx_convert_latin1_to_utf16.cpp */
+/* begin file src/lasx/lasx_convert_latin1_to_utf32.cpp */
 std::pair<const char *, char32_t *>
-sse_convert_latin1_to_utf32(const char *buf, size_t len,
-                            char32_t *utf32_output) {
+lasx_convert_latin1_to_utf32(const char *buf, size_t len,
+                             char32_t *utf32_output) {
   const char *end = buf + len;
 
-  while (end - buf >= 16) {
-    // Load 16 Latin1 characters (16 bytes) into a 128-bit register
-    __m128i in = _mm_loadu_si128((__m128i *)buf);
+  // LASX requires 32-byte alignment, otherwise performance will be degraded
+  while (((uint64_t)utf32_output & 0x1F) && buf < end) {
+    *utf32_output++ = ((uint32_t)*buf) & 0xFF;
+    buf++;
+  }
 
-    // Shift input to process next 4 bytes
-    __m128i in_shifted1 = _mm_srli_si128(in, 4);
-    __m128i in_shifted2 = _mm_srli_si128(in, 8);
-    __m128i in_shifted3 = _mm_srli_si128(in, 12);
+  while (buf + 32 <= end) {
+    __m256i in8 = __lasx_xvld(reinterpret_cast<const uint8_t *>(buf), 0);
 
-    // expand 8-bit to 32-bit unit
-    __m128i out1 = _mm_cvtepu8_epi32(in);
-    __m128i out2 = _mm_cvtepu8_epi32(in_shifted1);
-    __m128i out3 = _mm_cvtepu8_epi32(in_shifted2);
-    __m128i out4 = _mm_cvtepu8_epi32(in_shifted3);
+    __m256i in32_0 = __lasx_vext2xv_wu_bu(in8);
+    __lasx_xvst(in32_0, reinterpret_cast<uint32_t *>(utf32_output), 0);
 
-    _mm_storeu_si128((__m128i *)utf32_output, out1);
-    _mm_storeu_si128((__m128i *)(utf32_output + 4), out2);
-    _mm_storeu_si128((__m128i *)(utf32_output + 8), out3);
-    _mm_storeu_si128((__m128i *)(utf32_output + 12), out4);
+    __m256i in8_1 = __lasx_xvpermi_d(in8, 0b00000001);
+    __m256i in32_1 = __lasx_vext2xv_wu_bu(in8_1);
+    __lasx_xvst(in32_1, reinterpret_cast<uint32_t *>(utf32_output), 32);
+
+    __m256i in8_2 = __lasx_xvpermi_d(in8, 0b00000010);
+    __m256i in32_2 = __lasx_vext2xv_wu_bu(in8_2);
+    __lasx_xvst(in32_2, reinterpret_cast<uint32_t *>(utf32_output), 64);
+
+    __m256i in8_3 = __lasx_xvpermi_d(in8, 0b00000011);
+    __m256i in32_3 = __lasx_vext2xv_wu_bu(in8_3);
+    __lasx_xvst(in32_3, reinterpret_cast<uint32_t *>(utf32_output), 96);
+
+    utf32_output += 32;
+    buf += 32;
+  }
+
+  if (buf + 16 <= end) {
+    __m128i in8 = __lsx_vld(reinterpret_cast<const uint8_t *>(buf), 0);
+
+    __m128i zero = __lsx_vldi(0);
+    __m128i in16low = __lsx_vilvl_b(zero, in8);
+    __m128i in16high = __lsx_vilvh_b(zero, in8);
+    __m128i in32_0 = __lsx_vilvl_h(zero, in16low);
+    __m128i in32_1 = __lsx_vilvh_h(zero, in16low);
+    __m128i in32_2 = __lsx_vilvl_h(zero, in16high);
+    __m128i in32_3 = __lsx_vilvh_h(zero, in16high);
+
+    __lsx_vst(in32_0, reinterpret_cast<uint32_t *>(utf32_output), 0);
+    __lsx_vst(in32_1, reinterpret_cast<uint32_t *>(utf32_output), 16);
+    __lsx_vst(in32_2, reinterpret_cast<uint32_t *>(utf32_output), 32);
+    __lsx_vst(in32_3, reinterpret_cast<uint32_t *>(utf32_output), 48);
 
     utf32_output += 16;
     buf += 16;
@@ -35736,15 +50547,13 @@ sse_convert_latin1_to_utf32(const char *buf, size_t len,
 
   return std::make_pair(buf, utf32_output);
 }
-/* end file src/westmere/sse_convert_latin1_to_utf32.cpp */
-
-/* begin file src/westmere/sse_convert_utf8_to_utf16.cpp */
-// depends on "tables/utf8_to_utf16_tables.h"
+/* end file src/lasx/lasx_convert_latin1_to_utf32.cpp */
 
-// Convert up to 12 bytes from utf8 to utf16 using a mask indicating the
+/* begin file src/lasx/lasx_convert_utf8_to_utf16.cpp */
+// Convert up to 16 bytes from utf8 to utf16 using a mask indicating the
 // end of the code points. Only the least significant 12 bits of the mask
 // are accessed.
-// It returns how many bytes were consumed (up to 12).
+// It returns how many bytes were consumed (up to 16, usually 12).
 template <endianness big_endian>
 size_t convert_masked_utf8_to_utf16(const char *input,
                                     uint64_t utf8_end_of_code_point_mask,
@@ -35753,204 +50562,304 @@ size_t convert_masked_utf8_to_utf16(const char *input,
   // Why 12 input bytes and not 16? Because we are concerned with the size of
   // the lookup tables. Also 12 is nicely divisible by two and three.
   //
+  __m128i in = __lsx_vld(reinterpret_cast<const uint8_t *>(input), 0);
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask & 0xfff;
   //
   // Optimization note: our main path below is load-latency dependent. Thus it
   // is maybe beneficial to have fast paths that depend on branch prediction but
   // have less latency. This results in more instructions but, potentially, also
   // higher speeds.
-  //
+
   // We first try a few fast paths.
-  const __m128i swap =
-      _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-  const __m128i in = _mm_loadu_si128((__m128i *)input);
-  const uint16_t input_utf8_end_of_code_point_mask =
-      utf8_end_of_code_point_mask & 0xfff;
-  if (utf8_end_of_code_point_mask == 0xfff) {
-    // We process the data in chunks of 12 bytes.
-    // Note: using 16 bytes is unsafe, see issue_ossfuzz_71218
-    __m128i ascii_first = _mm_cvtepu8_epi16(in);
-    __m128i ascii_second = _mm_cvtepu8_epi16(_mm_srli_si128(in, 8));
-    if (big_endian) {
-      ascii_first = _mm_shuffle_epi8(ascii_first, swap);
-      ascii_second = _mm_shuffle_epi8(ascii_second, swap);
+  // The obvious first test is ASCII, which actually consumes the full 16.
+  if ((utf8_end_of_code_point_mask & 0xFFFF) == 0xFFFF) {
+    __m128i zero = __lsx_vldi(0);
+    if (match_system(big_endian)) {
+      __lsx_vst(__lsx_vilvl_b(zero, in),
+                reinterpret_cast<uint16_t *>(utf16_output), 0);
+      __lsx_vst(__lsx_vilvh_b(zero, in),
+                reinterpret_cast<uint16_t *>(utf16_output), 16);
+    } else {
+      __lsx_vst(__lsx_vilvl_b(in, zero),
+                reinterpret_cast<uint16_t *>(utf16_output), 0);
+      __lsx_vst(__lsx_vilvh_b(in, zero),
+                reinterpret_cast<uint16_t *>(utf16_output), 16);
     }
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf16_output), ascii_first);
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf16_output + 8),
-                     ascii_second);
-    utf16_output += 12; // We wrote 12 16-bit characters.
-    return 12;          // We consumed 12 bytes.
+    utf16_output += 16; // We wrote 16 16-bit characters.
+    return 16;          // We consumed 16 bytes.
   }
-  if (((utf8_end_of_code_point_mask & 0xFFFF) == 0xaaaa)) {
-    // We want to take 8 2-byte UTF-8 code units and turn them into 8 2-byte
-    // UTF-16 code units. There is probably a more efficient sequence, but the
-    // following might do.
-    const __m128i sh =
-        _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-    const __m128i perm = _mm_shuffle_epi8(in, sh);
-    const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
-    const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
-    __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    if (big_endian)
-      composed = _mm_shuffle_epi8(composed, swap);
-    _mm_storeu_si128((__m128i *)utf16_output, composed);
-    utf16_output += 8; // We wrote 16 bytes, 8 code points.
-    return 16;
+
+  // 3 byte sequences are the next most common, as seen in CJK, which has long
+  // sequences of these.
+  if (input_utf8_end_of_code_point_mask == 0x924) {
+    // We want to take 4 3-byte UTF-8 code units and turn them into 4 2-byte
+    // UTF-16 code units.
+    __m128i composed = convert_utf8_3_byte_to_utf16(in);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      composed = lsx_swap_bytes(composed);
+    }
+
+    __lsx_vst(composed, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    utf16_output += 4; // We wrote 4 16-bit characters.
+    return 12;         // We consumed 12 bytes.
   }
-  if (input_utf8_end_of_code_point_mask == 0x924) {
-    // We want to take 4 3-byte UTF-8 code units and turn them into 4 2-byte
-    // UTF-16 code units. There is probably a more efficient sequence, but the
-    // following might do.
-    const __m128i sh =
-        _mm_setr_epi8(2, 1, 0, -1, 5, 4, 3, -1, 8, 7, 6, -1, 11, 10, 9, -1);
-    const __m128i perm = _mm_shuffle_epi8(in, sh);
-    const __m128i ascii =
-        _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
-    const __m128i middlebyte =
-        _mm_and_si128(perm, _mm_set1_epi32(0x3f00)); // 5 or 6 bits
-    const __m128i middlebyte_shifted = _mm_srli_epi32(middlebyte, 2);
-    const __m128i highbyte =
-        _mm_and_si128(perm, _mm_set1_epi32(0x0f0000)); // 4 bits
-    const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 4);
-    const __m128i composed =
-        _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted), highbyte_shifted);
-    __m128i composed_repacked = _mm_packus_epi32(composed, composed);
-    if (big_endian)
-      composed_repacked = _mm_shuffle_epi8(composed_repacked, swap);
-    _mm_storeu_si128((__m128i *)utf16_output, composed_repacked);
-    utf16_output += 4;
-    return 12;
+
+  // 2 byte sequences occur in short bursts in languages like Greek and Russian.
+  if ((utf8_end_of_code_point_mask & 0xFFFF) == 0xAAAA) {
+    // We want to take 8 2-byte UTF-8 code units and turn them into 8 2-byte
+    // UTF-16 code units.
+    __m128i composed = convert_utf8_2_byte_to_utf16(in);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      composed = lsx_swap_bytes(composed);
+    }
+
+    __lsx_vst(composed, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    utf16_output += 8; // We wrote 8 16-bit characters.
+    return 16;         // We consumed 16 bytes.
   }
-  /// We do not have a fast path available, so we fallback.
 
-  const uint8_t idx =
-      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][0];
-  const uint8_t consumed =
-      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
+  /// We do not have a fast path available, or the fast path is unimportant, so
+  /// we fall back.
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
+
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
+  const __m128i zero = __lsx_vldi(0);
   if (idx < 64) {
     // SIX (6) input code-code units
-    // this is a relatively easy scenario
-    // we process SIX (6) input code-code units. The max length in bytes of six
-    // code code units spanning between 1 and 2 bytes each is 12 bytes. On
-    // processors where pdep/pext is fast, we might be able to use a small
-    // lookup table.
-    const __m128i sh =
-        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
-    const __m128i perm = _mm_shuffle_epi8(in, sh);
-    const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
-    const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
-    __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    if (big_endian)
-      composed = _mm_shuffle_epi8(composed, swap);
-    _mm_storeu_si128((__m128i *)utf16_output, composed);
-    utf16_output += 6; // We wrote 12 bytes, 6 code points.
+    // Convert to UTF-16
+    __m128i composed = convert_utf8_1_to_2_byte_to_utf16(in, idx);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      composed = lsx_swap_bytes(composed);
+    }
+    // Store
+    __lsx_vst(composed, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    utf16_output += 6; // We wrote 6 16-bit characters.
+    return consumed;
   } else if (idx < 145) {
     // FOUR (4) input code-code units
-    const __m128i sh =
-        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
-    const __m128i perm = _mm_shuffle_epi8(in, sh);
-    const __m128i ascii =
-        _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
-    const __m128i middlebyte =
-        _mm_and_si128(perm, _mm_set1_epi32(0x3f00)); // 5 or 6 bits
-    const __m128i middlebyte_shifted = _mm_srli_epi32(middlebyte, 2);
-    const __m128i highbyte =
-        _mm_and_si128(perm, _mm_set1_epi32(0x0f0000)); // 4 bits
-    const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 4);
-    const __m128i composed =
-        _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted), highbyte_shifted);
-    __m128i composed_repacked = _mm_packus_epi32(composed, composed);
-    if (big_endian)
-      composed_repacked = _mm_shuffle_epi8(composed_repacked, swap);
-    _mm_storeu_si128((__m128i *)utf16_output, composed_repacked);
-    utf16_output += 4;
+    // UTF-16 and UTF-32 use similar algorithms, but UTF-32 skips the narrowing.
+    __m128i sh = __lsx_vld(reinterpret_cast<const uint8_t *>(
+                               simdutf::tables::utf8_to_utf16::shufutf8[idx]),
+                           0);
+    // XXX: depending on the system scalar instructions might be faster.
+    // 1 byte: 00000000 00000000 0ccccccc
+    // 2 byte: 00000000 110bbbbb 10cccccc
+    // 3 byte: 1110aaaa 10bbbbbb 10cccccc
+    sh = __lsx_vand_v(sh, __lsx_vldi(0x1f));
+    __m128i perm = __lsx_vshuf_b(zero, in, sh);
+    // 1 byte: 00000000 0ccccccc
+    // 2 byte: xx0bbbbb x0cccccc
+    // 3 byte: xxbbbbbb x0cccccc
+    __m128i lowperm = __lsx_vpickev_h(perm, perm);
+    // 1 byte: 00000000 00000000
+    // 2 byte: 00000000 00000000
+    // 3 byte: 00000000 1110aaaa
+    __m128i highperm = __lsx_vpickod_h(perm, perm);
+    // 3 byte: aaaa0000 00000000
+    highperm = __lsx_vslli_h(highperm, 12);
+    // ASCII
+    // 1 byte: 00000000 0ccccccc
+    // 2+byte: 00000000 00cccccc
+    __m128i ascii = __lsx_vand_v(lowperm, __lsx_vrepli_h(0x7f));
+    // 1 byte: 00000000 00000000
+    // 2 byte: xx0bbbbb 00000000
+    // 3 byte: xxbbbbbb 00000000
+    __m128i middlebyte = __lsx_vand_v(lowperm, __lsx_vldi(-2561) /*0xFF00*/);
+    // 1 byte: 00000000 0ccccccc
+    // 2 byte: 0010bbbb bbcccccc
+    // 3 byte: 0010bbbb bbcccccc
+    __m128i composed = __lsx_vor_v(__lsx_vsrli_h(middlebyte, 2), ascii);
+
+    __m128i v0fff = __lsx_vreplgr2vr_h(uint16_t(0xfff));
+    // aaaabbbb bbcccccc
+    composed = __lsx_vbitsel_v(highperm, composed, v0fff);
+
+    if (!match_system(big_endian)) {
+      composed = lsx_swap_bytes(composed);
+    }
+
+    __lsx_vst(composed, reinterpret_cast<uint16_t *>(utf16_output), 0);
+    utf16_output += 4; // We wrote 4 16-bit codepoints
+    return consumed;
   } else if (idx < 209) {
-    // TWO (2) input code-code units
-    //////////////
-    // There might be garbage inputs where a leading byte mascarades as a
-    // four-byte leading byte (by being followed by 3 continuation byte), but is
-    // not greater than 0xf0. This could trigger a buffer overflow if we only
-    // counted leading bytes of the form 0xf0 as generating surrogate pairs,
-    // without further UTF-8 validation. Thus we must be careful to ensure that
-    // only leading bytes at least as large as 0xf0 generate surrogate pairs. We
-    // do as at the cost of an extra mask.
-    /////////////
-    const __m128i sh =
-        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
-    const __m128i perm = _mm_shuffle_epi8(in, sh);
-    const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi32(0x7f));
-    const __m128i middlebyte = _mm_and_si128(perm, _mm_set1_epi32(0x3f00));
-    const __m128i middlebyte_shifted = _mm_srli_epi32(middlebyte, 2);
-    __m128i middlehighbyte = _mm_and_si128(perm, _mm_set1_epi32(0x3f0000));
-    // correct for spurious high bit
-    const __m128i correct =
-        _mm_srli_epi32(_mm_and_si128(perm, _mm_set1_epi32(0x400000)), 1);
-    middlehighbyte = _mm_xor_si128(correct, middlehighbyte);
-    const __m128i middlehighbyte_shifted = _mm_srli_epi32(middlehighbyte, 4);
-    // We deliberately carry the leading four bits in highbyte if they are
-    // present, we remove them later when computing hightenbits.
-    const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi32(0xff000000));
-    const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 6);
-    // When we need to generate a surrogate pair (leading byte > 0xF0), then
-    // the corresponding 32-bit value in 'composed'  will be greater than
-    // > (0xff00000>>6) or > 0x3c00000. This can be used later to identify the
-    // location of the surrogate pairs.
-    const __m128i composed =
-        _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted),
-                     _mm_or_si128(highbyte_shifted, middlehighbyte_shifted));
-    const __m128i composedminus =
-        _mm_sub_epi32(composed, _mm_set1_epi32(0x10000));
-    const __m128i lowtenbits =
-        _mm_and_si128(composedminus, _mm_set1_epi32(0x3ff));
-    // Notice the 0x3ff mask:
-    const __m128i hightenbits =
-        _mm_and_si128(_mm_srli_epi32(composedminus, 10), _mm_set1_epi32(0x3ff));
-    const __m128i lowtenbitsadd =
-        _mm_add_epi32(lowtenbits, _mm_set1_epi32(0xDC00));
-    const __m128i hightenbitsadd =
-        _mm_add_epi32(hightenbits, _mm_set1_epi32(0xD800));
-    const __m128i lowtenbitsaddshifted = _mm_slli_epi32(lowtenbitsadd, 16);
-    __m128i surrogates = _mm_or_si128(hightenbitsadd, lowtenbitsaddshifted);
-    uint32_t basic_buffer[4];
-    uint32_t basic_buffer_swap[4];
-    if (big_endian) {
-      _mm_storeu_si128((__m128i *)basic_buffer_swap,
-                       _mm_shuffle_epi8(composed, swap));
-      surrogates = _mm_shuffle_epi8(surrogates, swap);
+    // THREE (3) input code-code units
+    if (input_utf8_end_of_code_point_mask == 0x888) {
+      // We want to take 3 4-byte UTF-8 code units and turn them into 3 4-byte
+      // UTF-16 pairs. Generating surrogate pairs is a little tricky, but
+      // it is easier when we can assume they are all pairs. This version does
+      // not use the LUT, but 4 byte sequences are less common and the overhead
+      // of the extra memory access is less important than the early branch
+      // overhead in shorter sequences.
+
+      // Swap byte pairs
+      // 10dddddd 10cccccc|10bbbbbb 11110aaa
+      // 10cccccc 10dddddd|11110aaa 10bbbbbb
+      __m128i swap = lsx_swap_bytes(in);
+      // Shift left 2 bits
+      // cccccc00 dddddd00 xxxxxxxx bbbbbb00
+      __m128i shift = __lsx_vslli_b(swap, 2);
+      // Magic number: low 2 bits of the trail surrogate plus all corrections.
+      // UTF-8 4b prefix   = -0x0000|0xF000
+      // surrogate offset  = -0x0000|0x0040 (0x10000 << 6)
+      // surrogate high    = +0x0000|0xD800
+      // surrogate low     = +0xDC00|0x0000
+      // -------------------------------
+      //                   = +0xDC00|0xE7C0
+      __m128i magic = __lsx_vreplgr2vr_w(uint32_t(0xDC00E7C0));
+      // Generate unadjusted trail surrogate minus lowest 2 bits
+      // vec(0000FF00) = __lsx_vldi(-1758)
+      // xxxxxxxx xxxxxxxx|11110aaa bbbbbb00
+      __m128i trail =
+          __lsx_vbitsel_v(shift, swap, __lsx_vldi(-1758 /*0000FF00*/));
+      // Insert low 2 bits of trail surrogate to magic number for later
+      // 11011100 00000000 11100111 110000cc
+      __m128i magic_with_low_2 = __lsx_vor_v(__lsx_vsrli_w(shift, 30), magic);
+
+      // Generate lead surrogate
+      // xxxxcccc ccdddddd|xxxxxxxx xxxxxxxx
+      // 000000cc ccdddddd|xxxxxxxx xxxxxxxx
+      __m128i lead = __lsx_vbitsel_v(
+          __lsx_vsrli_h(__lsx_vand_v(shift, __lsx_vldi(0x3F)), 4), swap,
+          __lsx_vrepli_h(0x3f /* 0x003f*/));
+
+      // Blend pairs
+      // __lsx_vldi(-1741) => vec(0x0000FFFF)
+      // 000000cc ccdddddd|11110aaa bbbbbb00
+      __m128i blend =
+          __lsx_vbitsel_v(lead, trail, __lsx_vldi(-1741) /* (0x0000FFFF)*4 */);
+
+      // Add magic number to finish the result
+      // 110111CC CCDDDDDD|110110AA BBBBBBCC
+      __m128i composed = __lsx_vadd_h(blend, magic_with_low_2);
+      // Byte swap if necessary
+      if (!match_system(big_endian)) {
+        composed = lsx_swap_bytes(composed);
+      }
+      __lsx_vst(composed, reinterpret_cast<uint16_t *>(utf16_output), 0);
+      utf16_output += 6; // We wrote 3 32-bit surrogate pairs.
+      return 12;         // We consumed 12 bytes.
     }
-    _mm_storeu_si128((__m128i *)basic_buffer, composed);
-    uint32_t surrogate_buffer[4];
-    _mm_storeu_si128((__m128i *)surrogate_buffer, surrogates);
+    // 3 1-4 byte sequences
+    __m128i sh = __lsx_vld(reinterpret_cast<const uint8_t *>(
+                               simdutf::tables::utf8_to_utf16::shufutf8[idx]),
+                           0);
+    // 1 byte: 00000000 00000000 00000000 0ddddddd
+    // 2 byte: 00000000 00000000 110ccccc 10dddddd
+    // 3 byte: 00000000 1110bbbb 10cccccc 10dddddd
+    // 4 byte: 11110aaa 10bbbbbb 10cccccc 10dddddd
+    sh = __lsx_vand_v(sh, __lsx_vldi(0x1f));
+    __m128i perm = __lsx_vshuf_b(zero, in, sh);
+    // added to fix issue https://github.com/simdutf/simdutf/issues/514
+    // We only want to write 2 * 16-bit code units when that is actually what we
+    // have. Unfortunately, we cannot trust the input. So it is possible to get
+    // 0xff as an input byte and it should not result in a surrogate pair. We
+    // need to check for that.
+    uint32_t permbuffer[4];
+    __lsx_vst(perm, permbuffer, 0);
+    // Mask the low and middle bytes
+    // 00000000 00000000 00000000 0ddddddd
+    __m128i ascii = __lsx_vand_v(perm, __lsx_vrepli_w(0x7f));
+    // Because the surrogates need more work, the high surrogate is computed
+    // first.
+    __m128i middlehigh = __lsx_vslli_w(perm, 2);
+    // 00000000 00000000 00cccccc 00000000
+    __m128i middlebyte = __lsx_vand_v(perm, __lsx_vldi(-3777) /* 0x00003F00 */);
+    // Start assembling the sequence. The 4th byte is already where it would be
+    // in a surrogate and there is no dependency, so shift left, not right.
+    // 3 byte: 00000000 10bbbbxx xxxxxxxx xxxxxxxx
+    // 4 byte: 11110aaa bbbbbbxx xxxxxxxx xxxxxxxx
+    __m128i ab =
+        __lsx_vbitsel_v(middlehigh, perm, __lsx_vldi(-1656) /*0xFF000000*/);
+    // Top 16 bits: high ten bits of the surrogate pair, before correction.
+    // 3 byte: 00000000 10bbbbcc|cccc0000 00000000
+    // 4 byte: 11110aaa bbbbbbcc|cccc0000 00000000 - high 10 bits, uncorrected
+    __m128i v_fffc0000 = __lsx_vreplgr2vr_w(uint32_t(0xFFFC0000));
+    __m128i abc = __lsx_vbitsel_v(__lsx_vslli_w(middlebyte, 4), ab, v_fffc0000);
+    // Combine the low 6 or 7 bits by a shift right accumulate
+    // 3 byte: 00000000 00000010|bbbbcccc ccdddddd - low 16 bits correct
+    // 4 byte: 00000011 110aaabb|bbbbcccc ccdddddd - low 10 bits correct w/o
+    // correction
+    __m128i composed = __lsx_vor_v(ascii, __lsx_vsrli_w(abc, 6));
+    // After this is for surrogates
+    // Blend the low and high surrogates
+    // 4 byte: 11110aaa bbbbbbcc|bbbbcccc ccdddddd
+    __m128i mixed =
+        __lsx_vbitsel_v(abc, composed, __lsx_vldi(-1741) /*0x0000FFFF*/);
+    // Clear the upper 6 bits of the low surrogate. Don't clear the upper bits
+    // yet, as 0x10000 has not been subtracted from the codepoint.
+    // 4 byte: 11110aaa bbbbbbcc|000000cc ccdddddd
+    __m128i v_ffff03ff = __lsx_vreplgr2vr_w(uint32_t(0xFFFF03FF));
+    __m128i masked_pair = __lsx_vand_v(mixed, v_ffff03ff);
+    // One magic addition fixes the UTF-8 prefix and surrogate offset and adds
+    // the surrogate prefixes (no continuation byte adjust, halfword swapped).
+    // UTF-8 4b prefix   = -0xF000|0x0000
+    // surrogate offset  = -0x0040|0x0000 (0x10000 << 6)
+    // surrogate high    = +0xD800|0x0000
+    // surrogate low     = +0x0000|0xDC00
+    // -----------------------------------
+    //                   = +0xE7C0|0xDC00
+    __m128i magic = __lsx_vreplgr2vr_w(uint32_t(0xE7C0DC00));
+    // 4 byte: 110110AA BBBBBBCC|110111CC CCDDDDDD - surrogate pair complete
+    __m128i surrogates = __lsx_vadd_w(masked_pair, magic);
+    // If the high bit is 1 (s32 less than zero), this needs a surrogate pair
+    __m128i is_pair = __lsx_vslt_w(perm, zero);
+    // Select either the 4 byte surrogate pair or the 2 byte solo codepoint
+    // 3 byte: 0xxxxxxx xxxxxxxx|bbbbcccc ccdddddd
+    // 4 byte: 110110AA BBBBBBCC|110111CC CCDDDDDD
+    __m128i selected = __lsx_vbitsel_v(composed, surrogates, is_pair);
+    // Byte swap if necessary
+    if (!match_system(big_endian)) {
+      selected = lsx_swap_bytes(selected);
+    }
+    // Attempting to shuffle and store would be complex, just scalarize.
+    uint32_t buffer_tmp[4];
+    __lsx_vst(selected, buffer_tmp, 0);
+    // Test for the top bit of the surrogate mask. Removed due to issue 514
+    // const uint32_t SURROGATE_MASK = match_system(big_endian) ? 0x80000000 :
+    // 0x00800000;
     for (size_t i = 0; i < 3; i++) {
-      if (basic_buffer[i] > 0x3c00000) {
-        utf16_output[0] = uint16_t(surrogate_buffer[i] & 0xffff);
-        utf16_output[1] = uint16_t(surrogate_buffer[i] >> 16);
+      // Surrogate
+      // Used to be if (buffer[i] & SURROGATE_MASK) {
+      // See discussion above.
+      // patch for issue https://github.com/simdutf/simdutf/issues/514
+      if ((permbuffer[i] & 0xf8000000) == 0xf0000000) {
+        utf16_output[0] = uint16_t(buffer_tmp[i] >> 16);
+        utf16_output[1] = uint16_t(buffer_tmp[i] & 0xFFFF);
         utf16_output += 2;
       } else {
-        utf16_output[0] = big_endian ? uint16_t(basic_buffer_swap[i])
-                                     : uint16_t(basic_buffer[i]);
+        utf16_output[0] = uint16_t(buffer_tmp[i] & 0xFFFF);
         utf16_output++;
       }
     }
+    return consumed;
   } else {
     // here we know that there is an error but we do not handle errors
+    return 12;
   }
-  return consumed;
 }
-/* end file src/westmere/sse_convert_utf8_to_utf16.cpp */
-/* begin file src/westmere/sse_convert_utf8_to_utf32.cpp */
-// depends on "tables/utf8_to_utf16_tables.h"
-
+/* end file src/lasx/lasx_convert_utf8_to_utf16.cpp */
+/* begin file src/lasx/lasx_convert_utf8_to_utf32.cpp */
 // Convert up to 12 bytes from utf8 to utf32 using a mask indicating the
 // end of the code points. Only the least significant 12 bits of the mask
 // are accessed.
 // It returns how many bytes were consumed (up to 12).
 size_t convert_masked_utf8_to_utf32(const char *input,
                                     uint64_t utf8_end_of_code_point_mask,
-                                    char32_t *&utf32_output) {
+                                    char32_t *&utf32_out) {
   // we use an approach where we try to process up to 12 input bytes.
   // Why 12 input bytes and not 16? Because we are concerned with the size of
   // the lookup tables. Also 12 is nicely divisible by two and three.
   //
+  uint32_t *&utf32_output = reinterpret_cast<uint32_t *&>(utf32_out);
+  __m128i in = __lsx_vld(reinterpret_cast<const uint8_t *>(input), 0);
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask & 0xFFF;
   //
   // Optimization note: our main path below is load-latency dependent. Thus it
   // is maybe beneficial to have fast paths that depend on branch prediction but
@@ -35958,135 +50867,179 @@ size_t convert_masked_utf8_to_utf32(const char *input,
   // higher speeds.
   //
   // We first try a few fast paths.
-  const __m128i in = _mm_loadu_si128((__m128i *)input);
-  const uint16_t input_utf8_end_of_code_point_mask =
-      utf8_end_of_code_point_mask & 0xfff;
-  if (utf8_end_of_code_point_mask == 0xfff) {
-    // We process the data in chunks of 12 bytes.
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
-                     _mm_cvtepu8_epi32(in));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
-                     _mm_cvtepu8_epi32(_mm_srli_si128(in, 4)));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 8),
-                     _mm_cvtepu8_epi32(_mm_srli_si128(in, 8)));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 12),
-                     _mm_cvtepu8_epi32(_mm_srli_si128(in, 12)));
-    utf32_output += 12; // We wrote 12 32-bit characters.
-    return 12;          // We consumed 12 bytes.
-  }
-  if (((utf8_end_of_code_point_mask & 0xffff) == 0xaaaa)) {
-    // We want to take 8 2-byte UTF-8 code units and turn them into 8 4-byte
-    // UTF-32 code units. There is probably a more efficient sequence, but the
-    // following might do.
-    const __m128i sh =
-        _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-    const __m128i perm = _mm_shuffle_epi8(in, sh);
-    const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
-    const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
-    const __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
-                     _mm_cvtepu16_epi32(composed));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
-                     _mm_cvtepu16_epi32(_mm_srli_si128(composed, 8)));
-    utf32_output += 8; // We wrote 32 bytes, 8 code points.
-    return 16;
+  if ((utf8_end_of_code_point_mask & 0xffff) == 0xffff) {
+    // We process in chunks of 16 bytes.
+    // use fast implementation in src/simdutf/arm64/simd.h
+    // Ideally the compiler can keep the tables in registers.
+    __m128i zero = __lsx_vldi(0);
+    __m128i in16low = __lsx_vilvl_b(zero, in);
+    __m128i in16high = __lsx_vilvh_b(zero, in);
+    __m128i in32_0 = __lsx_vilvl_h(zero, in16low);
+    __m128i in32_1 = __lsx_vilvh_h(zero, in16low);
+    __m128i in32_2 = __lsx_vilvl_h(zero, in16high);
+    __m128i in32_3 = __lsx_vilvh_h(zero, in16high);
+
+    __lsx_vst(in32_0, reinterpret_cast<uint32_t *>(utf32_output), 0);
+    __lsx_vst(in32_1, reinterpret_cast<uint32_t *>(utf32_output), 16);
+    __lsx_vst(in32_2, reinterpret_cast<uint32_t *>(utf32_output), 32);
+    __lsx_vst(in32_3, reinterpret_cast<uint32_t *>(utf32_output), 48);
+
+    utf32_output += 16; // We wrote 16 32-bit characters.
+    return 16;          // We consumed 16 bytes.
   }
+  __m128i zero = __lsx_vldi(0);
   if (input_utf8_end_of_code_point_mask == 0x924) {
     // We want to take 4 3-byte UTF-8 code units and turn them into 4 4-byte
-    // UTF-32 code units. There is probably a more efficient sequence, but the
-    // following might do.
-    const __m128i sh =
-        _mm_setr_epi8(2, 1, 0, -1, 5, 4, 3, -1, 8, 7, 6, -1, 11, 10, 9, -1);
-    const __m128i perm = _mm_shuffle_epi8(in, sh);
-    const __m128i ascii =
-        _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
-    const __m128i middlebyte =
-        _mm_and_si128(perm, _mm_set1_epi32(0x3f00)); // 5 or 6 bits
-    const __m128i middlebyte_shifted = _mm_srli_epi32(middlebyte, 2);
-    const __m128i highbyte =
-        _mm_and_si128(perm, _mm_set1_epi32(0x0f0000)); // 4 bits
-    const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 4);
-    const __m128i composed =
-        _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted), highbyte_shifted);
-    _mm_storeu_si128((__m128i *)utf32_output, composed);
-    utf32_output += 4;
-    return 12;
+    // UTF-32 code units. First convert to UTF-16, then widen.
+    __m128i composed_utf16 = convert_utf8_3_byte_to_utf16(in);
+    __m128i utf32_low = __lsx_vilvl_h(zero, composed_utf16);
+
+    __lsx_vst(utf32_low, reinterpret_cast<uint32_t *>(utf32_output), 0);
+    utf32_output += 4; // We wrote 4 32-bit characters.
+    return 12;         // We consumed 12 bytes.
   }
-  /// We do not have a fast path available, so we fallback.
+  // 2 byte sequences occur in short bursts in languages like Greek and Russian.
+  if (input_utf8_end_of_code_point_mask == 0xaaa) {
+    // We want to take 6 2-byte UTF-8 code units and turn them into 6 4-byte
+    // UTF-32 code units. First convert to UTF-16, then widen.
+    __m128i composed_utf16 = convert_utf8_2_byte_to_utf16(in);
+
+    __m128i utf32_low = __lsx_vilvl_h(zero, composed_utf16);
+    __m128i utf32_high = __lsx_vilvh_h(zero, composed_utf16);
+
+    __lsx_vst(utf32_low, reinterpret_cast<uint32_t *>(utf32_output), 0);
+    __lsx_vst(utf32_high, reinterpret_cast<uint32_t *>(utf32_output), 16);
+    utf32_output += 6;
+    return 12; // We consumed 12 bytes.
+  }
+  // Either no fast path or an unimportant fast path.
+
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
 
-  const uint8_t idx =
-      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][0];
-  const uint8_t consumed =
-      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
   if (idx < 64) {
     // SIX (6) input code-code units
-    // this is a relatively easy scenario
-    // we process SIX (6) input code-code units. The max length in bytes of six
-    // code code units spanning between 1 and 2 bytes each is 12 bytes. On
-    // processors where pdep/pext is fast, we might be able to use a small
-    // lookup table.
-    const __m128i sh =
-        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
-    const __m128i perm = _mm_shuffle_epi8(in, sh);
-    const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
-    const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
-    const __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
-                     _mm_cvtepu16_epi32(composed));
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
-                     _mm_cvtepu16_epi32(_mm_srli_si128(composed, 8)));
-    utf32_output += 6; // We wrote 12 bytes, 6 code points.
+    // Convert to UTF-16
+    __m128i composed_utf16 = convert_utf8_1_to_2_byte_to_utf16(in, idx);
+    __m128i utf32_low = __lsx_vilvl_h(zero, composed_utf16);
+    __m128i utf32_high = __lsx_vilvh_h(zero, composed_utf16);
+
+    __lsx_vst(utf32_low, reinterpret_cast<uint32_t *>(utf32_output), 0);
+    __lsx_vst(utf32_high, reinterpret_cast<uint32_t *>(utf32_output), 16);
+    utf32_output += 6;
+    return consumed;
   } else if (idx < 145) {
     // FOUR (4) input code-code units
-    const __m128i sh =
-        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
-    const __m128i perm = _mm_shuffle_epi8(in, sh);
-    const __m128i ascii =
-        _mm_and_si128(perm, _mm_set1_epi32(0x7f)); // 7 or 6 bits
-    const __m128i middlebyte =
-        _mm_and_si128(perm, _mm_set1_epi32(0x3f00)); // 5 or 6 bits
-    const __m128i middlebyte_shifted = _mm_srli_epi32(middlebyte, 2);
-    const __m128i highbyte =
-        _mm_and_si128(perm, _mm_set1_epi32(0x0f0000)); // 4 bits
-    const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 4);
-    const __m128i composed =
-        _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted), highbyte_shifted);
-    _mm_storeu_si128((__m128i *)utf32_output, composed);
-    utf32_output += 4;
+    // UTF-16 and UTF-32 use similar algorithms, but UTF-32 skips the narrowing.
+    __m128i sh = __lsx_vld(reinterpret_cast<const uint8_t *>(
+                               simdutf::tables::utf8_to_utf16::shufutf8[idx]),
+                           0);
+    // Shuffle
+    // 1 byte: 00000000 00000000 0ccccccc
+    // 2 byte: 00000000 110bbbbb 10cccccc
+    // 3 byte: 1110aaaa 10bbbbbb 10cccccc
+    sh = __lsx_vand_v(sh, __lsx_vldi(0x1f));
+    __m128i perm = __lsx_vshuf_b(zero, in, sh);
+    // Split
+    // 00000000 00000000 0ccccccc
+    __m128i ascii = __lsx_vand_v(perm, __lsx_vrepli_w(0x7F)); // 6 or 7 bits
+    // Note: unmasked
+    // xxxxxxxx aaaaxxxx xxxxxxxx
+    __m128i high =
+        __lsx_vsrli_w(__lsx_vand_v(perm, __lsx_vldi(0xf)), 4); // 4 bits
+    // Isolate the middle byte with a plain AND.
+    // The top bits will be corrected later in the bitwise select
+    // 00000000 10bbbbbb 00000000
+    __m128i middle =
+        __lsx_vand_v(perm, __lsx_vldi(-1758 /*0x0000FF00*/)); // 5 or 6 bits
+    // Combine low and middle with a shift right and OR
+    // 00000000 00xxbbbb bbcccccc
+    __m128i lowmid = __lsx_vor_v(ascii, __lsx_vsrli_w(middle, 2));
+    // Insert top 4 bits from high byte with bitwise select
+    // 00000000 aaaabbbb bbcccccc
+    __m128i composed =
+        __lsx_vbitsel_v(lowmid, high, __lsx_vldi(-3600 /*0x0000F000*/));
+    __lsx_vst(composed, utf32_output, 0);
+    utf32_output += 4; // We wrote 4 32-bit characters.
+    return consumed;
   } else if (idx < 209) {
-    // TWO (2) input code-code units
-    const __m128i sh =
-        _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
-    const __m128i perm = _mm_shuffle_epi8(in, sh);
-    const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi32(0x7f));
-    const __m128i middlebyte = _mm_and_si128(perm, _mm_set1_epi32(0x3f00));
-    const __m128i middlebyte_shifted = _mm_srli_epi32(middlebyte, 2);
-    __m128i middlehighbyte = _mm_and_si128(perm, _mm_set1_epi32(0x3f0000));
-    // correct for spurious high bit
-    const __m128i correct =
-        _mm_srli_epi32(_mm_and_si128(perm, _mm_set1_epi32(0x400000)), 1);
-    middlehighbyte = _mm_xor_si128(correct, middlehighbyte);
-    const __m128i middlehighbyte_shifted = _mm_srli_epi32(middlehighbyte, 4);
-    const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi32(0x07000000));
-    const __m128i highbyte_shifted = _mm_srli_epi32(highbyte, 6);
-    const __m128i composed =
-        _mm_or_si128(_mm_or_si128(ascii, middlebyte_shifted),
-                     _mm_or_si128(highbyte_shifted, middlehighbyte_shifted));
-    _mm_storeu_si128((__m128i *)utf32_output, composed);
-    utf32_output += 3;
+    // THREE (3) input code-code units
+    if (input_utf8_end_of_code_point_mask == 0x888) {
+      // We want to take 3 4-byte UTF-8 code units and turn them into 3 4-byte
+      // UTF-32 code units. This uses the same method as the fixed 3 byte
+      // version, reversing and shift left insert. However, there is no need for
+      // a shuffle mask now, just rev16 and rev32.
+      //
+      // This version does not use the LUT, but 4 byte sequences are less common
+      // and the overhead of the extra memory access is less important than the
+      // early branch overhead in shorter sequences, so it comes last.
+
+      // Swap pairs of bytes
+      // 10dddddd|10cccccc|10bbbbbb|11110aaa
+      // 10cccccc 10dddddd|11110aaa 10bbbbbb
+      __m128i swap = lsx_swap_bytes(in);
+      // Shift right and insert with bitwise select
+      // xxxxcccc ccdddddd|xxxxxxxa aabbbbbb
+      __m128i merge1 = __lsx_vbitsel_v(__lsx_vsrli_h(swap, 2), swap,
+                                       __lsx_vrepli_h(0x3f /*0x003F*/));
+      // Shift insert again
+      // xxxxxxxx xxxaaabb bbbbcccc ccdddddd
+      __m128i merge2 =
+          __lsx_vbitsel_v(__lsx_vslli_w(merge1, 12), /* merge1 << 12 */
+                          __lsx_vsrli_w(merge1, 16), /* merge1 >> 16 */
+                          __lsx_vldi(-2545));        /*0x00000FFF*/
+      // Clear the garbage
+      // 00000000 000aaabb bbbbcccc ccdddddd
+      __m128i composed = __lsx_vand_v(merge2, __lsx_vldi(-2273 /*0x1FFFFF*/));
+      // Store
+      __lsx_vst(composed, utf32_output, 0);
+      utf32_output += 3; // We wrote 3 32-bit characters.
+      return 12;         // We consumed 12 bytes.
+    }
+    // Unlike UTF-16, doing a fast codepath doesn't have nearly as much benefit
+    // due to surrogates no longer being involved.
+    __m128i sh = __lsx_vld(reinterpret_cast<const uint8_t *>(
+                               simdutf::tables::utf8_to_utf16::shufutf8[idx]),
+                           0);
+    // 1 byte: 00000000 00000000 00000000 0ddddddd
+    // 2 byte: 00000000 00000000 110ccccc 10dddddd
+    // 3 byte: 00000000 1110bbbb 10cccccc 10dddddd
+    // 4 byte: 11110aaa 10bbbbbb 10cccccc 10dddddd
+    sh = __lsx_vand_v(sh, __lsx_vldi(0x1f));
+    __m128i perm = __lsx_vshuf_b(zero, in, sh);
+
+    // Ascii
+    __m128i ascii = __lsx_vand_v(perm, __lsx_vrepli_w(0x7F));
+    __m128i middle = __lsx_vand_v(perm, __lsx_vldi(-3777 /*0x00003f00*/));
+    // 00000000 00000000 0000cccc ccdddddd
+    __m128i cd =
+        __lsx_vbitsel_v(__lsx_vsrli_w(middle, 2), ascii, __lsx_vrepli_w(0x3f));
+
+    __m128i correction = __lsx_vand_v(perm, __lsx_vldi(-3520 /*0x00400000*/));
+    __m128i corrected = __lsx_vadd_b(perm, __lsx_vsrli_w(correction, 1));
+    // Insert twice
+    // 00000000 000aaabb bbbbxxxx xxxxxxxx
+    __m128i corrected_srli2 =
+        __lsx_vsrli_w(__lsx_vand_v(corrected, __lsx_vrepli_b(0x7)), 2);
+    __m128i ab =
+        __lsx_vbitsel_v(corrected_srli2, corrected, __lsx_vrepli_h(0x3f));
+    ab = __lsx_vsrli_w(ab, 4);
+    // 00000000 000aaabb bbbbcccc ccdddddd
+    __m128i composed =
+        __lsx_vbitsel_v(ab, cd, __lsx_vldi(-2545 /*0x00000FFF*/));
+    // Store
+    __lsx_vst(composed, utf32_output, 0);
+    utf32_output += 3; // We wrote 3 32-bit characters.
+    return consumed;
   } else {
     // here we know that there is an error but we do not handle errors
+    return 12;
   }
-  return consumed;
 }
-/* end file src/westmere/sse_convert_utf8_to_utf32.cpp */
-/* begin file src/westmere/sse_convert_utf8_to_latin1.cpp */
-// depends on "tables/utf8_to_utf16_tables.h"
-
-// Convert up to 12 bytes from utf8 to latin1 using a mask indicating the
-// end of the code points. Only the least significant 12 bits of the mask
-// are accessed.
-// It returns how many bytes were consumed (up to 12).
+/* end file src/lasx/lasx_convert_utf8_to_utf32.cpp */
+/* begin file src/lasx/lasx_convert_utf8_to_latin1.cpp */
 size_t convert_masked_utf8_to_latin1(const char *input,
                                      uint64_t utf8_end_of_code_point_mask,
                                      char *&latin1_output) {
@@ -36094,27 +51047,30 @@ size_t convert_masked_utf8_to_latin1(const char *input,
   // Why 12 input bytes and not 16? Because we are concerned with the size of
   // the lookup tables. Also 12 is nicely divisible by two and three.
   //
-  //
+  __m128i in = __lsx_vld(reinterpret_cast<const uint8_t *>(input), 0);
+
+  const uint16_t input_utf8_end_of_code_point_mask =
+      utf8_end_of_code_point_mask & 0xfff;
   // Optimization note: our main path below is load-latency dependent. Thus it
   // is maybe beneficial to have fast paths that depend on branch prediction but
   // have less latency. This results in more instructions but, potentially, also
   // higher speeds.
-  //
-  const __m128i in = _mm_loadu_si128((__m128i *)input);
-  const uint16_t input_utf8_end_of_code_point_mask =
-      utf8_end_of_code_point_mask &
-      0xfff; // we are only processing 12 bytes in case it is not all ASCII
-  if (utf8_end_of_code_point_mask == 0xfff) {
-    // We process the data in chunks of 12 bytes.
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(latin1_output), in);
-    latin1_output += 12; // We wrote 12 characters.
-    return 12;           // We consumed 12 bytes.
+
+  // We first try a few fast paths.
+  // The obvious first test is ASCII, which actually consumes the full 16.
+  if ((utf8_end_of_code_point_mask & 0xFFFF) == 0xFFFF) {
+    // We process in chunks of 16 bytes
+    __lsx_vst(in, reinterpret_cast<uint8_t *>(latin1_output), 0);
+    latin1_output += 16; // We wrote 16 8-bit characters.
+    return 16;           // We consumed 16 bytes.
   }
-  /// We do not have a fast path available, so we fallback.
-  const uint8_t idx =
-      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][0];
-  const uint8_t consumed =
-      tables::utf8_to_utf16::utf8bigindex[input_utf8_end_of_code_point_mask][1];
+  /// We do not have a fast path available, or the fast path is unimportant, so
+  /// we fall back.
+  const uint8_t idx = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][0];
+
+  const uint8_t consumed = simdutf::tables::utf8_to_utf16::utf8bigindex
+      [input_utf8_end_of_code_point_mask][1];
   // this indicates an invalid input:
   if (idx >= 64) {
     return consumed;
@@ -36122,50 +51078,63 @@ size_t convert_masked_utf8_to_latin1(const char *input,
   // Here we should have (idx < 64), if not, there is a bug in the validation or
   // elsewhere. SIX (6) input code-code units this is a relatively easy scenario
   // we process SIX (6) input code-code units. The max length in bytes of six
-  // code code units spanning between 1 and 2 bytes each is 12 bytes. On
-  // processors where pdep/pext is fast, we might be able to use a small lookup
-  // table.
-  const __m128i sh =
-      _mm_loadu_si128((const __m128i *)tables::utf8_to_utf16::shufutf8[idx]);
-  const __m128i perm = _mm_shuffle_epi8(in, sh);
-  const __m128i ascii = _mm_and_si128(perm, _mm_set1_epi16(0x7f));
-  const __m128i highbyte = _mm_and_si128(perm, _mm_set1_epi16(0x1f00));
-  __m128i composed = _mm_or_si128(ascii, _mm_srli_epi16(highbyte, 2));
-  const __m128i latin1_packed = _mm_packus_epi16(composed, composed);
+  // code code units spanning between 1 and 2 bytes each is 12 bytes. This
+  // converts six 1-2 byte UTF-8 characters to six Latin-1 characters. The
+  // shuffle below first places each character in a 16-bit lane, and the
+  // lanes are narrowed to single bytes before the store.
+  __m128i sh = __lsx_vld(reinterpret_cast<const uint8_t *>(
+                             simdutf::tables::utf8_to_utf16::shufutf8[idx]),
+                         0);
+  // Shuffle
+  // 1 byte: 00000000 0bbbbbbb
+  // 2 byte: 110aaaaa 10bbbbbb
+  sh = __lsx_vand_v(sh, __lsx_vldi(0x1f));
+  __m128i perm = __lsx_vshuf_b(__lsx_vldi(0), in, sh);
+  // ascii mask
+  // 1 byte: 11111111 11111111
+  // 2 byte: 00000000 00000000
+  __m128i ascii_mask = __lsx_vslt_bu(perm, __lsx_vldi(0x80));
+  // utf8 mask
+  // 1 byte: 00000000 00000000
+  // 2 byte: 00111111 00111111
+  __m128i utf8_mask = __lsx_vand_v(__lsx_vsle_bu(__lsx_vldi(0x80), perm),
+                                   __lsx_vldi(0b00111111));
+  // mask
+  //  1 byte: 11111111 11111111
+  //  2 byte: 00111111 00111111
+  __m128i mask = __lsx_vor_v(utf8_mask, ascii_mask);
+
+  __m128i composed = __lsx_vbitsel_v(__lsx_vsrli_h(perm, 2), perm, mask);
   // writing 8 bytes even though we only care about the first 6 bytes.
-  // performance note: it would be faster to use _mm_storeu_si128, we should
-  // investigate.
-  _mm_storel_epi64((__m128i *)latin1_output, latin1_packed);
+  __m128i latin1_packed = __lsx_vpickev_b(__lsx_vldi(0), composed);
+
+  __lsx_vst(latin1_packed, reinterpret_cast<uint8_t *>(latin1_output), 0);
   latin1_output += 6; // We wrote 6 bytes.
   return consumed;
 }
-/* end file src/westmere/sse_convert_utf8_to_latin1.cpp */
+/* end file src/lasx/lasx_convert_utf8_to_latin1.cpp */
 
-/* begin file src/westmere/sse_convert_utf16_to_latin1.cpp */
+/* begin file src/lasx/lasx_convert_utf16_to_latin1.cpp */
 template <endianness big_endian>
 std::pair<const char16_t *, char *>
-sse_convert_utf16_to_latin1(const char16_t *buf, size_t len,
-                            char *latin1_output) {
+lasx_convert_utf16_to_latin1(const char16_t *buf, size_t len,
+                             char *latin1_output) {
   const char16_t *end = buf + len;
-  while (end - buf >= 8) {
-    // Load 8 UTF-16 characters into 128-bit SSE register
-    __m128i in = _mm_loadu_si128(reinterpret_cast<const __m128i *>(buf));
-
+  while (buf + 16 <= end) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 0);
+    __m128i in1 = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 16);
     if (!match_system(big_endian)) {
-      const __m128i swap =
-          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-      in = _mm_shuffle_epi8(in, swap);
+      in = lsx_swap_bytes(in);
+      in1 = lsx_swap_bytes(in1);
     }
-
-    __m128i high_byte_mask = _mm_set1_epi16((int16_t)0xFF00);
-    if (_mm_testz_si128(in, high_byte_mask)) {
-      // Pack 16-bit characters into 8-bit and store in latin1_output
-      __m128i latin1_packed = _mm_packus_epi16(in, in);
-      _mm_storel_epi64(reinterpret_cast<__m128i *>(latin1_output),
-                       latin1_packed);
-      // Adjust pointers for next iteration
-      buf += 8;
-      latin1_output += 8;
+    if (__lsx_bz_v(__lsx_vpickod_b(in1, in))) {
+      // 1. pack the bytes
+      __m128i latin1_packed = __lsx_vpickev_b(in1, in);
+      // 2. store (16 bytes)
+      __lsx_vst(latin1_packed, reinterpret_cast<uint8_t *>(latin1_output), 0);
+      // 3. adjust pointers
+      buf += 16;
+      latin1_output += 16;
     } else {
       return std::make_pair(nullptr, reinterpret_cast<char *>(latin1_output));
     }
@@ -36175,29 +51144,28 @@ sse_convert_utf16_to_latin1(const char16_t *buf, size_t len,
 
 template <endianness big_endian>
 std::pair<result, char *>
-sse_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
-                                        char *latin1_output) {
+lasx_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
+                                         char *latin1_output) {
   const char16_t *start = buf;
   const char16_t *end = buf + len;
-  while (end - buf >= 8) {
-    __m128i in = _mm_loadu_si128(reinterpret_cast<const __m128i *>(buf));
-
+  while (buf + 16 <= end) {
+    __m128i in = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 0);
+    __m128i in1 = __lsx_vld(reinterpret_cast<const uint16_t *>(buf), 16);
     if (!match_system(big_endian)) {
-      const __m128i swap =
-          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-      in = _mm_shuffle_epi8(in, swap);
+      in = lsx_swap_bytes(in);
+      in1 = lsx_swap_bytes(in1);
     }
-
-    __m128i high_byte_mask = _mm_set1_epi16((int16_t)0xFF00);
-    if (_mm_testz_si128(in, high_byte_mask)) {
-      __m128i latin1_packed = _mm_packus_epi16(in, in);
-      _mm_storel_epi64(reinterpret_cast<__m128i *>(latin1_output),
-                       latin1_packed);
-      buf += 8;
-      latin1_output += 8;
+    if (__lsx_bz_v(__lsx_vpickod_b(in1, in))) {
+      // 1. pack the bytes
+      __m128i latin1_packed = __lsx_vpickev_b(in1, in);
+      // 2. store (8 bytes)
+      __lsx_vst(latin1_packed, reinterpret_cast<uint8_t *>(latin1_output), 0);
+      // 3. adjust pointers
+      buf += 16;
+      latin1_output += 16;
     } else {
-      // Fallback to scalar code for handling errors
-      for (int k = 0; k < 8; k++) {
+      // Let us do a scalar fallback.
+      for (int k = 0; k < 16; k++) {
         uint16_t word = !match_system(big_endian)
                             ? scalar::utf16::swap_bytes(buf[k])
                             : buf[k];
@@ -36208,16 +51176,15 @@ sse_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
                                 latin1_output);
         }
       }
-      buf += 8;
     }
   } // while
   return std::make_pair(result(error_code::SUCCESS, buf - start),
                         latin1_output);
 }
-/* end file src/westmere/sse_convert_utf16_to_latin1.cpp */
-/* begin file src/westmere/sse_convert_utf16_to_utf8.cpp */
+/* end file src/lasx/lasx_convert_utf16_to_latin1.cpp */
+/* begin file src/lasx/lasx_convert_utf16_to_utf8.cpp */
 /*
-    The vectorized algorithm works on single SSE register i.e., it
+    The vectorized algorithm works on a single LASX register i.e., it
     loads eight 16-bit code units.
 
     We consider three cases:
@@ -36231,11 +51198,11 @@ sse_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
     Ad 1.
 
     When values are less than 0x0800, it means that a 16-bit code unit
-    can be converted into: 1) single UTF8 byte (when it is an ASCII
+    can be converted into: 1) single UTF8 byte (when it's an ASCII
     char) or 2) two UTF8 bytes.
 
     For this case we do only some shuffle to obtain these 2-byte
-    codes and finally compress the whole SSE register with a single
+    codes and finally compress the whole LASX register with a single
     shuffle.
 
     We need 256-entry lookup table to get a compression pattern
@@ -36253,7 +51220,7 @@ sse_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
     the three-UTF8-bytes case.
 
     Finally these two registers are interleaved forming eight-element
-    array of 32-bit values. The array spans two SSE registers.
+    array of 32-bit values. The array spans two LASX registers.
     The bytes from the registers are compressed using two shuffles.
 
     We need 256-entry lookup table to get a compression pattern
@@ -36264,187 +51231,210 @@ sse_convert_utf16_to_latin1_with_errors(const char16_t *buf, size_t len,
     To summarize:
     - We need two 256-entry tables that have 8704 bytes in total.
 */
-
 /*
   Returns a pair: the first unprocessed byte from buf and utf8_output
   A scalar routing should carry on the conversion of the tail.
 */
+
 template <endianness big_endian>
 std::pair<const char16_t *, char *>
-sse_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_output) {
-
+lasx_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
   const char16_t *end = buf + len;
 
-  const __m128i v_0000 = _mm_setzero_si128();
-  const __m128i v_f800 = _mm_set1_epi16((int16_t)0xf800);
-  const __m128i v_d800 = _mm_set1_epi16((int16_t)0xd800);
   const size_t safety_margin =
       12; // to avoid overruns, see issue
           // https://github.com/simdutf/simdutf/issues/92
 
-  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m128i in = _mm_loadu_si128((__m128i *)buf);
-    if (big_endian) {
-      const __m128i swap =
-          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-      in = _mm_shuffle_epi8(in, swap);
+  __m256i v_07ff = __lasx_xvreplgr2vr_h(uint16_t(0x7ff));
+  __m256i zero = __lasx_xvldi(0);
+  __m128i zero_128 = __lsx_vldi(0);
+  while (buf + 16 + safety_margin <= end) {
+    __m256i in = __lasx_xvld(reinterpret_cast<const uint16_t *>(buf), 0);
+    if (!match_system(big_endian)) {
+      in = lasx_swap_bytes(in);
     }
-    // a single 16-bit UTF-16 word can yield 1, 2 or 3 UTF-8 bytes
-    const __m128i v_ff80 = _mm_set1_epi16((int16_t)0xff80);
-    if (_mm_testz_si128(in, v_ff80)) { // ASCII fast path!!!!
-      __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
-      if (big_endian) {
-        const __m128i swap =
-            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-        nextin = _mm_shuffle_epi8(nextin, swap);
-      }
-      if (!_mm_testz_si128(nextin, v_ff80)) {
-        // 1. pack the bytes
-        // obviously suboptimal.
-        const __m128i utf8_packed = _mm_packus_epi16(in, in);
-        // 2. store (16 bytes)
-        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 8;
-        utf8_output += 8;
-        in = nextin;
-      } else {
-        // 1. pack the bytes
-        // obviously suboptimal.
-        const __m128i utf8_packed = _mm_packus_epi16(in, nextin);
-        // 2. store (16 bytes)
-        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 16;
-        utf8_output += 16;
-        continue; // we are done for this round!
-      }
+    if (__lasx_xbnz_h(__lasx_xvslt_hu(
+            in, __lasx_xvrepli_h(0x7F)))) { // ASCII fast path!!!!
+      // 1. pack the bytes
+      __m256i utf8_packed =
+          __lasx_xvpermi_d(__lasx_xvpickev_b(in, in), 0b00001000);
+      // 2. store (16 bytes)
+      __lsx_vst(lasx_extracti128_lo(utf8_packed), utf8_output, 0);
+      // 3. adjust pointers
+      buf += 16;
+      utf8_output += 16;
+      continue; // we are done for this round!
     }
 
-    // no bits set above 7th bit
-    const __m128i one_byte_bytemask =
-        _mm_cmpeq_epi16(_mm_and_si128(in, v_ff80), v_0000);
-    const uint16_t one_byte_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
+    if (__lasx_xbz_v(__lasx_xvslt_hu(v_07ff, in))) {
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 16
+      // expected output   : [110a|aaaa|10bb|bbbb] x 16
+      // t0 = [000a|aaaa|bbbb|bb00]
+      __m256i t0 = __lasx_xvslli_h(in, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      __m256i t1 = __lasx_xvand_v(t0, __lasx_xvldi(-2785 /*0x1f00*/));
+      // t2 = [0000|0000|00bb|bbbb]
+      __m256i t2 = __lasx_xvand_v(in, __lasx_xvrepli_h(0x3f));
+      // t3 = [000a|aaaa|00bb|bbbb]
+      __m256i t3 = __lasx_xvor_v(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      __m256i v_c080 = __lasx_xvreplgr2vr_h(uint16_t(0xc080));
+      __m256i t4 = __lasx_xvor_v(t3, v_c080);
+      // 2. merge ASCII and 2-byte codewords
+      __m256i one_byte_bytemask =
+          __lasx_xvsle_hu(in, __lasx_xvrepli_h(0x7F /*0x007F*/));
+      __m256i utf8_unpacked = __lasx_xvbitsel_v(t4, in, one_byte_bytemask);
+      // 3. prepare bitmask for 8-bit lookup
+      __m256i mask = __lasx_xvmskltz_h(one_byte_bytemask);
+      uint32_t m1 = __lasx_xvpickve2gr_wu(mask, 0);
+      uint32_t m2 = __lasx_xvpickve2gr_wu(mask, 4);
+      // 4. pack the bytes
+      const uint8_t *row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                                [lasx_1_2_utf8_bytes_mask[m1]][0];
+      __m128i shuffle1 = __lsx_vld(row1, 1);
+      __m128i utf8_packed1 =
+          __lsx_vshuf_b(zero_128, lasx_extracti128_lo(utf8_unpacked), shuffle1);
+
+      const uint8_t *row2 = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                                [lasx_1_2_utf8_bytes_mask[m2]][0];
+      __m128i shuffle2 = __lsx_vld(row2, 1);
+      __m128i utf8_packed2 =
+          __lsx_vshuf_b(zero_128, lasx_extracti128_hi(utf8_unpacked), shuffle2);
+      // 5. store bytes
+      __lsx_vst(utf8_packed1, utf8_output, 0);
+      utf8_output += row1[0];
 
-    // no bits set above 11th bit
-    const __m128i one_or_two_bytes_bytemask =
-        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_0000);
-    const uint16_t one_or_two_bytes_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
+      __lsx_vst(utf8_packed2, utf8_output, 0);
+      utf8_output += row2[0];
 
-    if (one_or_two_bytes_bitmask == 0xffff) {
-      internal::westmere::write_v_u16_11bits_to_utf8(
-          in, utf8_output, one_byte_bytemask, one_byte_bitmask);
-      buf += 8;
+      buf += 16;
       continue;
     }
-
-    // 1. Check if there are any surrogate word in the input chunk.
-    //    We have also deal with situation when there is a surrogate word
-    //    at the end of a chunk.
-    const __m128i surrogates_bytemask =
-        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
-
-    // bitmask = 0x0000 if there are no surrogates
-    //         = 0xc000 if the last word is a surrogate
-    const uint16_t surrogates_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
+    __m256i surrogates_bytemask =
+        __lasx_xvseq_h(__lasx_xvand_v(in, __lasx_xvldi(-2568 /*0xF800*/)),
+                       __lasx_xvldi(-2600 /*0xD800*/));
     // It might seem like checking for surrogates_bitmask == 0xc000 could help.
     // However, it is likely an uncommon occurrence.
-    if (surrogates_bitmask == 0x0000) {
+    if (__lasx_xbz_v(surrogates_bytemask)) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-      const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
-                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
-
       /* In this branch we handle three cases:
-         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
-        single UFT-8 byte
-         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
-        UTF-8 bytes
-         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
-        three UTF-8 bytes
+           1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+         single UTF-8 byte
+           2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+         two UTF-8 bytes
+           3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+         three UTF-8 bytes
 
-        We expand the input word (16-bit) into two code units (32-bit), thus
-        we have room for four bytes. However, we need five distinct bit
-        layouts. Note that the last byte in cases #2 and #3 is the same.
+          We expand the input word (16-bit) into two code units (32-bit), thus
+          we have room for four bytes. However, we need five distinct bit
+          layouts. Note that the last byte in cases #2 and #3 is the same.
 
-        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-        in register t2.
+          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+          in register t2.
 
-        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-        either byte 1 for case #2 or byte 2 for case #3. Note that they
-        differ by exactly one bit.
+          We precompute byte 1 for case #3 and -- **conditionally** --
+         precompute either byte 1 for case #2 or byte 2 for case #3. Note that
+         they differ by exactly one bit.
 
-        Finally from these two code units we build proper UTF-8 sequence, taking
-        into account the case (i.e, the number of bytes to write).
-      */
+          Finally from these two code units we build proper UTF-8 sequence,
+         taking into account the case (i.e., the number of bytes to write).
+        */
       /**
        * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
        * t2 => [0ccc|cccc] [10cc|cccc]
        * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
        */
-#define simdutf_vec(x) _mm_set1_epi16(static_cast<uint16_t>(x))
       // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-      const __m128i t0 = _mm_shuffle_epi8(in, dup_even);
-      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-      const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
+      __m256i t0 = __lasx_xvpickev_b(in, in);
+      t0 = __lasx_xvilvl_b(t0, t0);
+
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      __m256i v_3f7f = __lasx_xvreplgr2vr_h(uint16_t(0x3F7F));
+      __m256i t1 = __lasx_xvand_v(t0, v_3f7f);
       // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m128i t2 = _mm_or_si128(t1, simdutf_vec(0b1000000000000000));
+      __m256i t2 = __lasx_xvor_v(t1, __lasx_xvldi(-2688 /*0x8000*/));
 
-      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-      const __m128i s0 = _mm_srli_epi16(in, 4);
-      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-      const __m128i s1 = _mm_and_si128(s0, simdutf_vec(0b0000111111111100));
-      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-      const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
-      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-      const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
-      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask,
-                                          simdutf_vec(0b0100000000000000));
-      const __m128i s4 = _mm_xor_si128(s3, m0);
-#undef simdutf_vec
+      // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+      __m256i s0 = __lasx_xvsrli_h(in, 12);
+      // s1: [aaaa|bbbb|bbcc|cccc] => [aabb|bbbb|cccc|cc00]
+      __m256i s1 = __lasx_xvslli_h(in, 2);
+      // s1: [aabb|bbbb|cccc|cc00] => [00bb|bbbb|0000|0000]
+      s1 = __lasx_xvand_v(s1, __lasx_xvldi(-2753 /*0x3F00*/));
+
+      // [00bb|bbbb|0000|aaaa]
+      __m256i s2 = __lasx_xvor_v(s0, s1);
+      // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      __m256i v_c0e0 = __lasx_xvreplgr2vr_h(uint16_t(0xC0E0));
+      __m256i s3 = __lasx_xvor_v(s2, v_c0e0);
+      __m256i one_or_two_bytes_bytemask = __lasx_xvsle_hu(in, v_07ff);
+      __m256i m0 = __lasx_xvandn_v(one_or_two_bytes_bytemask,
+                                   __lasx_xvldi(-2752 /*0x4000*/));
+      __m256i s4 = __lasx_xvxor_v(s3, m0);
 
       // 4. expand code units 16-bit => 32-bit
-      const __m128i out0 = _mm_unpacklo_epi16(t2, s4);
-      const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
+      __m256i out0 = __lasx_xvilvl_h(s4, t2);
+      __m256i out1 = __lasx_xvilvh_h(s4, t2);
 
       // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint16_t mask =
-          (one_byte_bitmask & 0x5555) | (one_or_two_bytes_bitmask & 0xaaaa);
-      if (mask == 0) {
-        // We only have three-byte code units. Use fast path.
-        const __m128i shuffle = _mm_setr_epi8(2, 3, 1, 6, 7, 5, 10, 11, 9, 14,
-                                              15, 13, -1, -1, -1, -1);
-        const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
-        const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
-        _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
-        utf8_output += 12;
-        _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
-        utf8_output += 12;
-        buf += 8;
-        continue;
-      }
-      const uint8_t mask0 = uint8_t(mask);
-
+      __m256i one_byte_bytemask = __lasx_xvsle_hu(in, __lasx_xvrepli_h(0x7F));
+      __m256i one_byte_bytemask_low =
+          __lasx_xvilvl_h(one_byte_bytemask, one_byte_bytemask);
+      __m256i one_byte_bytemask_high =
+          __lasx_xvilvh_h(one_byte_bytemask, one_byte_bytemask);
+
+      __m256i one_or_two_bytes_bytemask_low =
+          __lasx_xvilvl_h(one_or_two_bytes_bytemask, zero);
+      __m256i one_or_two_bytes_bytemask_high =
+          __lasx_xvilvh_h(one_or_two_bytes_bytemask, zero);
+
+      __m256i mask0 = __lasx_xvmskltz_h(
+          __lasx_xvor_v(one_or_two_bytes_bytemask_low, one_byte_bytemask_low));
+      __m256i mask1 = __lasx_xvmskltz_h(__lasx_xvor_v(
+          one_or_two_bytes_bytemask_high, one_byte_bytemask_high));
+
+      uint32_t mask = __lasx_xvpickve2gr_wu(mask0, 0);
       const uint8_t *row0 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
-      const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
-
-      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                [0];
+      __m128i shuffle0 = __lsx_vld(row0, 1);
+      __m128i utf8_0 =
+          __lsx_vshuf_b(zero_128, lasx_extracti128_lo(out0), shuffle0);
+      __lsx_vst(utf8_0, utf8_output, 0);
+      utf8_output += row0[0];
 
+      mask = __lasx_xvpickve2gr_wu(mask1, 0);
       const uint8_t *row1 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
-      const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
-
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
-      utf8_output += row0[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                [0];
+      __m128i shuffle1 = __lsx_vld(row1, 1);
+      __m128i utf8_1 =
+          __lsx_vshuf_b(zero_128, lasx_extracti128_lo(out1), shuffle1);
+      __lsx_vst(utf8_1, utf8_output, 0);
       utf8_output += row1[0];
 
-      buf += 8;
+      mask = __lasx_xvpickve2gr_wu(mask0, 4);
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                [0];
+      __m128i shuffle2 = __lsx_vld(row2, 1);
+      __m128i utf8_2 =
+          __lsx_vshuf_b(zero_128, lasx_extracti128_hi(out0), shuffle2);
+      __lsx_vst(utf8_2, utf8_output, 0);
+      utf8_output += row2[0];
+
+      mask = __lasx_xvpickve2gr_wu(mask1, 4);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                [0];
+      __m128i shuffle3 = __lsx_vld(row3, 1);
+      __m128i utf8_3 =
+          __lsx_vshuf_b(zero_128, lasx_extracti128_hi(out1), shuffle3);
+      __lsx_vst(utf8_3, utf8_output, 0);
+      utf8_output += row3[0];
+
+      buf += 16;
       // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
@@ -36456,7 +51446,9 @@ sse_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_output) {
         forward = size_t(end - buf - 1);
       }
       for (; k < forward; k++) {
-        uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
         if ((word & 0xFF80) == 0) {
           *utf8_output++ = char(word);
         } else if ((word & 0xF800) == 0) {
@@ -36469,12 +51461,14 @@ sse_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_output) {
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word =
-              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
           if ((diff | diff2) > 0x3FF) {
-            return std::make_pair(nullptr, utf8_output);
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char *>(utf8_output));
           }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
           *utf8_output++ = char((value >> 18) | 0b11110000);
@@ -36486,8 +51480,7 @@ sse_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_output) {
       buf += k;
     }
   } // while
-
-  return std::make_pair(buf, utf8_output);
+  return std::make_pair(buf, reinterpret_cast<char *>(utf8_output));
 }
 
 /*
@@ -36499,181 +51492,205 @@ sse_convert_utf16_to_utf8(const char16_t *buf, size_t len, char *utf8_output) {
 */
 template <endianness big_endian>
 std::pair<result, char *>
-sse_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
-                                      char *utf8_output) {
+lasx_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
+                                       char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
   const char16_t *start = buf;
   const char16_t *end = buf + len;
 
-  const __m128i v_0000 = _mm_setzero_si128();
-  const __m128i v_f800 = _mm_set1_epi16((int16_t)0xf800);
-  const __m128i v_d800 = _mm_set1_epi16((int16_t)0xd800);
   const size_t safety_margin =
       12; // to avoid overruns, see issue
           // https://github.com/simdutf/simdutf/issues/92
 
-  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    __m128i in = _mm_loadu_si128((__m128i *)buf);
-    if (big_endian) {
-      const __m128i swap =
-          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-      in = _mm_shuffle_epi8(in, swap);
+  __m256i v_07ff = __lasx_xvreplgr2vr_h(uint16_t(0x7ff));
+  __m256i zero = __lasx_xvldi(0);
+  __m128i zero_128 = __lsx_vldi(0);
+  while (buf + 16 + safety_margin <= end) {
+    __m256i in = __lasx_xvld(reinterpret_cast<const uint16_t *>(buf), 0);
+    if (!match_system(big_endian)) {
+      in = lasx_swap_bytes(in);
     }
-    // a single 16-bit UTF-16 word can yield 1, 2 or 3 UTF-8 bytes
-    const __m128i v_ff80 = _mm_set1_epi16((int16_t)0xff80);
-    if (_mm_testz_si128(in, v_ff80)) { // ASCII fast path!!!!
-      __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
-      if (big_endian) {
-        const __m128i swap =
-            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-        nextin = _mm_shuffle_epi8(nextin, swap);
-      }
-      if (!_mm_testz_si128(nextin, v_ff80)) {
-        // 1. pack the bytes
-        // obviously suboptimal.
-        const __m128i utf8_packed = _mm_packus_epi16(in, in);
-        // 2. store (16 bytes)
-        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 8;
-        utf8_output += 8;
-        in = nextin;
-      } else {
-        // 1. pack the bytes
-        // obviously suboptimal.
-        const __m128i utf8_packed = _mm_packus_epi16(in, nextin);
-        // 2. store (16 bytes)
-        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
-        // 3. adjust pointers
-        buf += 16;
-        utf8_output += 16;
-        continue; // we are done for this round!
-      }
+    if (__lasx_xbnz_h(__lasx_xvslt_hu(
+            in, __lasx_xvrepli_h(0x7F)))) { // ASCII fast path!!!!
+      // 1. pack the bytes
+      __m256i utf8_packed =
+          __lasx_xvpermi_d(__lasx_xvpickev_b(in, in), 0b00001000);
+      // 2. store (16 bytes)
+      __lsx_vst(lasx_extracti128_lo(utf8_packed), utf8_output, 0);
+      // 3. adjust pointers
+      buf += 16;
+      utf8_output += 16;
+      continue; // we are done for this round!
     }
 
-    // no bits set above 7th bit
-    const __m128i one_byte_bytemask =
-        _mm_cmpeq_epi16(_mm_and_si128(in, v_ff80), v_0000);
-    const uint16_t one_byte_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
+    if (__lasx_xbz_v(__lasx_xvslt_hu(v_07ff, in))) {
+      // 1. prepare 2-byte values
+      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 16
+      // expected output   : [110a|aaaa|10bb|bbbb] x 16
+      // t0 = [000a|aaaa|bbbb|bb00]
+      __m256i t0 = __lasx_xvslli_h(in, 2);
+      // t1 = [000a|aaaa|0000|0000]
+      __m256i t1 = __lasx_xvand_v(t0, __lasx_xvldi(-2785 /*0x1f00*/));
+      // t2 = [0000|0000|00bb|bbbb]
+      __m256i t2 = __lasx_xvand_v(in, __lasx_xvrepli_h(0x3f));
+      // t3 = [000a|aaaa|00bb|bbbb]
+      __m256i t3 = __lasx_xvor_v(t1, t2);
+      // t4 = [110a|aaaa|10bb|bbbb]
+      __m256i v_c080 = __lasx_xvreplgr2vr_h(uint16_t(0xc080));
+      __m256i t4 = __lasx_xvor_v(t3, v_c080);
+      // 2. merge ASCII and 2-byte codewords
+      __m256i one_byte_bytemask =
+          __lasx_xvsle_hu(in, __lasx_xvrepli_h(0x7F /*0x007F*/));
+      __m256i utf8_unpacked = __lasx_xvbitsel_v(t4, in, one_byte_bytemask);
+      // 3. prepare bitmask for 8-bit lookup
+      __m256i mask = __lasx_xvmskltz_h(one_byte_bytemask);
+      uint32_t m1 = __lasx_xvpickve2gr_wu(mask, 0);
+      uint32_t m2 = __lasx_xvpickve2gr_wu(mask, 4);
+      // 4. pack the bytes
+      const uint8_t *row1 = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                                [lasx_1_2_utf8_bytes_mask[m1]][0];
+      __m128i shuffle1 = __lsx_vld(row1, 1);
+      __m128i utf8_packed1 =
+          __lsx_vshuf_b(zero_128, lasx_extracti128_lo(utf8_unpacked), shuffle1);
+
+      const uint8_t *row2 = &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                                [lasx_1_2_utf8_bytes_mask[m2]][0];
+      __m128i shuffle2 = __lsx_vld(row2, 1);
+      __m128i utf8_packed2 =
+          __lsx_vshuf_b(zero_128, lasx_extracti128_hi(utf8_unpacked), shuffle2);
+      // 5. store bytes
+      __lsx_vst(utf8_packed1, utf8_output, 0);
+      utf8_output += row1[0];
 
-    // no bits set above 11th bit
-    const __m128i one_or_two_bytes_bytemask =
-        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_0000);
-    const uint16_t one_or_two_bytes_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
+      __lsx_vst(utf8_packed2, utf8_output, 0);
+      utf8_output += row2[0];
 
-    if (one_or_two_bytes_bitmask == 0xffff) {
-      internal::westmere::write_v_u16_11bits_to_utf8(
-          in, utf8_output, one_byte_bytemask, one_byte_bitmask);
-      buf += 8;
+      buf += 16;
       continue;
     }
-
-    // 1. Check if there are any surrogate word in the input chunk.
-    //    We have also deal with situation when there is a surrogate word
-    //    at the end of a chunk.
-    const __m128i surrogates_bytemask =
-        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
-
-    // bitmask = 0x0000 if there are no surrogates
-    //         = 0xc000 if the last word is a surrogate
-    const uint16_t surrogates_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
+    __m256i surrogates_bytemask =
+        __lasx_xvseq_h(__lasx_xvand_v(in, __lasx_xvldi(-2568 /*0xF800*/)),
+                       __lasx_xvldi(-2600 /*0xD800*/));
     // It might seem like checking for surrogates_bitmask == 0xc000 could help.
     // However, it is likely an uncommon occurrence.
-    if (surrogates_bitmask == 0x0000) {
+    if (__lasx_xbz_v(surrogates_bytemask)) {
       // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-      const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
-                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
-
       /* In this branch we handle three cases:
-         1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
-        single UFT-8 byte
-         2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              - two
-        UTF-8 bytes
-         3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
-        three UTF-8 bytes
+           1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+         single UTF-8 byte
+           2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+         two UTF-8 bytes
+           3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+         three UTF-8 bytes
 
-        We expand the input word (16-bit) into two code units (32-bit), thus
-        we have room for four bytes. However, we need five distinct bit
-        layouts. Note that the last byte in cases #2 and #3 is the same.
+          We expand the input word (16-bit) into two code units (32-bit), thus
+          we have room for four bytes. However, we need five distinct bit
+          layouts. Note that the last byte in cases #2 and #3 is the same.
 
-        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-        in register t2.
+          We precompute byte 1 for case #1 and the common byte for cases #2 & #3
+          in register t2.
 
-        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-        either byte 1 for case #2 or byte 2 for case #3. Note that they
-        differ by exactly one bit.
+          We precompute byte 1 for case #3 and -- **conditionally** --
+         precompute either byte 1 for case #2 or byte 2 for case #3. Note that
+         they differ by exactly one bit.
 
-        Finally from these two code units we build proper UTF-8 sequence, taking
-        into account the case (i.e, the number of bytes to write).
-      */
+          Finally from these two code units we build proper UTF-8 sequence,
+         taking into account the case (i.e., the number of bytes to write).
+        */
       /**
        * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
        * t2 => [0ccc|cccc] [10cc|cccc]
        * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
        */
-#define simdutf_vec(x) _mm_set1_epi16(static_cast<uint16_t>(x))
       // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-      const __m128i t0 = _mm_shuffle_epi8(in, dup_even);
-      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-      const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
+      __m256i t0 = __lasx_xvpickev_b(in, in);
+      t0 = __lasx_xvilvl_b(t0, t0);
+
+      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+      __m256i v_3f7f = __lasx_xvreplgr2vr_h(uint16_t(0x3F7F));
+      __m256i t1 = __lasx_xvand_v(t0, v_3f7f);
       // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m128i t2 = _mm_or_si128(t1, simdutf_vec(0b1000000000000000));
+      __m256i t2 = __lasx_xvor_v(t1, __lasx_xvldi(-2688));
 
-      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-      const __m128i s0 = _mm_srli_epi16(in, 4);
-      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-      const __m128i s1 = _mm_and_si128(s0, simdutf_vec(0b0000111111111100));
-      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-      const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
-      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-      const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
-      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask,
-                                          simdutf_vec(0b0100000000000000));
-      const __m128i s4 = _mm_xor_si128(s3, m0);
-#undef simdutf_vec
+      // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+      __m256i s0 = __lasx_xvsrli_h(in, 12);
+      // s1: [aaaa|bbbb|bbcc|cccc] => [aabb|bbbb|cccc|cc00]
+      __m256i s1 = __lasx_xvslli_h(in, 2);
+      // s1: [aabb|bbbb|cccc|cc00] => [00bb|bbbb|0000|0000]
+      s1 = __lasx_xvand_v(s1, __lasx_xvldi(-2753 /*0x3F00*/));
+
+      // [00bb|bbbb|0000|aaaa]
+      __m256i s2 = __lasx_xvor_v(s0, s1);
+      // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+      __m256i v_c0e0 = __lasx_xvreplgr2vr_h(uint16_t(0xC0E0));
+      __m256i s3 = __lasx_xvor_v(s2, v_c0e0);
+      __m256i one_or_two_bytes_bytemask = __lasx_xvsle_hu(in, v_07ff);
+      __m256i m0 = __lasx_xvandn_v(one_or_two_bytes_bytemask,
+                                   __lasx_xvldi(-2752 /*0x4000*/));
+      __m256i s4 = __lasx_xvxor_v(s3, m0);
 
       // 4. expand code units 16-bit => 32-bit
-      const __m128i out0 = _mm_unpacklo_epi16(t2, s4);
-      const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
+      __m256i out0 = __lasx_xvilvl_h(s4, t2);
+      __m256i out1 = __lasx_xvilvh_h(s4, t2);
 
       // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint16_t mask =
-          (one_byte_bitmask & 0x5555) | (one_or_two_bytes_bitmask & 0xaaaa);
-      if (mask == 0) {
-        // We only have three-byte code units. Use fast path.
-        const __m128i shuffle = _mm_setr_epi8(2, 3, 1, 6, 7, 5, 10, 11, 9, 14,
-                                              15, 13, -1, -1, -1, -1);
-        const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
-        const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
-        _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
-        utf8_output += 12;
-        _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
-        utf8_output += 12;
-        buf += 8;
-        continue;
-      }
-      const uint8_t mask0 = uint8_t(mask);
-
+      __m256i one_byte_bytemask = __lasx_xvsle_hu(in, __lasx_xvrepli_h(0x7F));
+      __m256i one_byte_bytemask_low =
+          __lasx_xvilvl_h(one_byte_bytemask, one_byte_bytemask);
+      __m256i one_byte_bytemask_high =
+          __lasx_xvilvh_h(one_byte_bytemask, one_byte_bytemask);
+
+      __m256i one_or_two_bytes_bytemask_low =
+          __lasx_xvilvl_h(one_or_two_bytes_bytemask, zero);
+      __m256i one_or_two_bytes_bytemask_high =
+          __lasx_xvilvh_h(one_or_two_bytes_bytemask, zero);
+
+      __m256i mask0 = __lasx_xvmskltz_h(
+          __lasx_xvor_v(one_or_two_bytes_bytemask_low, one_byte_bytemask_low));
+      __m256i mask1 = __lasx_xvmskltz_h(__lasx_xvor_v(
+          one_or_two_bytes_bytemask_high, one_byte_bytemask_high));
+
+      uint32_t mask = __lasx_xvpickve2gr_wu(mask0, 0);
       const uint8_t *row0 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
-      const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
-
-      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                [0];
+      __m128i shuffle0 = __lsx_vld(row0, 1);
+      __m128i utf8_0 =
+          __lsx_vshuf_b(zero_128, lasx_extracti128_lo(out0), shuffle0);
+      __lsx_vst(utf8_0, utf8_output, 0);
+      utf8_output += row0[0];
 
+      mask = __lasx_xvpickve2gr_wu(mask1, 0);
       const uint8_t *row1 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
-      const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
-
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
-      utf8_output += row0[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                [0];
+      __m128i shuffle1 = __lsx_vld(row1, 1);
+      __m128i utf8_1 =
+          __lsx_vshuf_b(zero_128, lasx_extracti128_lo(out1), shuffle1);
+      __lsx_vst(utf8_1, utf8_output, 0);
       utf8_output += row1[0];
 
-      buf += 8;
+      mask = __lasx_xvpickve2gr_wu(mask0, 4);
+      const uint8_t *row2 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                [0];
+      __m128i shuffle2 = __lsx_vld(row2, 1);
+      __m128i utf8_2 =
+          __lsx_vshuf_b(zero_128, lasx_extracti128_hi(out0), shuffle2);
+      __lsx_vst(utf8_2, utf8_output, 0);
+      utf8_output += row2[0];
+
+      mask = __lasx_xvpickve2gr_wu(mask1, 4);
+      const uint8_t *row3 =
+          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                [0];
+      __m128i shuffle3 = __lsx_vld(row3, 1);
+      __m128i utf8_3 =
+          __lsx_vshuf_b(zero_128, lasx_extracti128_hi(out1), shuffle3);
+      __lsx_vst(utf8_3, utf8_output, 0);
+      utf8_output += row3[0];
+
+      buf += 16;
       // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
@@ -36685,7 +51702,9 @@ sse_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
         forward = size_t(end - buf - 1);
       }
       for (; k < forward; k++) {
-        uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
         if ((word & 0xFF80) == 0) {
           *utf8_output++ = char(word);
         } else if ((word & 0xF800) == 0) {
@@ -36698,14 +51717,15 @@ sse_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word =
-              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
           if ((diff | diff2) > 0x3FF) {
             return std::make_pair(
                 result(error_code::SURROGATE, buf - start + k - 1),
-                utf8_output);
+                reinterpret_cast<char *>(utf8_output));
           }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
           *utf8_output++ = char((value >> 18) | 0b11110000);
@@ -36718,101 +51738,67 @@ sse_convert_utf16_to_utf8_with_errors(const char16_t *buf, size_t len,
     }
   } // while
 
-  return std::make_pair(result(error_code::SUCCESS, buf - start), utf8_output);
-}
-/* end file src/westmere/sse_convert_utf16_to_utf8.cpp */
-/* begin file src/westmere/sse_convert_utf16_to_utf32.cpp */
-/*
-    The vectorized algorithm works on single SSE register i.e., it
-    loads eight 16-bit code units.
-
-    We consider three cases:
-    1. an input register contains no surrogates and each value
-       is in range 0x0000 .. 0x07ff.
-    2. an input register contains no surrogates and values are
-       is in range 0x0000 .. 0xffff.
-    3. an input register contains surrogates --- i.e. codepoints
-       can have 16 or 32 bits.
-
-    Ad 1.
-
-    When values are less than 0x0800, it means that a 16-bit code unit
-    can be converted into: 1) single UTF8 byte (when it's an ASCII
-    char) or 2) two UTF8 bytes.
-
-    For this case we do only some shuffle to obtain these 2-byte
-    codes and finally compress the whole SSE register with a single
-    shuffle.
-
-    We need 256-entry lookup table to get a compression pattern
-    and the number of output bytes in the compressed vector register.
-    Each entry occupies 17 bytes.
-
-    Ad 2.
-
-    When values fit in 16-bit code units, but are above 0x07ff, then
-    a single word may produce one, two or three UTF8 bytes.
-
-    We prepare data for all these three cases in two registers.
-    The first register contains lower two UTF8 bytes (used in all
-    cases), while the second one contains just the third byte for
-    the three-UTF8-bytes case.
-
-    Finally these two registers are interleaved forming eight-element
-    array of 32-bit values. The array spans two SSE registers.
-    The bytes from the registers are compressed using two shuffles.
-
-    We need 256-entry lookup table to get a compression pattern
-    and the number of output bytes in the compressed vector register.
-    Each entry occupies 17 bytes.
-
-
-    To summarize:
-    - We need two 256-entry tables that have 8704 bytes in total.
-*/
-
-/*
-  Returns a pair: the first unprocessed byte from buf and utf8_output
-  A scalar routing should carry on the conversion of the tail.
-*/
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char *>(utf8_output));
+}
+/* end file src/lasx/lasx_convert_utf16_to_utf8.cpp */
+/* begin file src/lasx/lasx_convert_utf16_to_utf32.cpp */
 template <endianness big_endian>
 std::pair<const char16_t *, char32_t *>
-sse_convert_utf16_to_utf32(const char16_t *buf, size_t len,
-                           char32_t *utf32_output) {
+lasx_convert_utf16_to_utf32(const char16_t *buf, size_t len,
+                            char32_t *utf32_out) {
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
   const char16_t *end = buf + len;
 
-  const __m128i v_f800 = _mm_set1_epi16((int16_t)0xf800);
-  const __m128i v_d800 = _mm_set1_epi16((int16_t)0xd800);
+  // Unaligned 32-byte stores are slow: go scalar until utf32_output is aligned
+  while (((uint64_t)utf32_output & 0x1f) && buf < end) {
+    uint16_t word =
+        !match_system(big_endian) ? scalar::utf16::swap_bytes(buf[0]) : buf[0];
+    if ((word & 0xF800) != 0xD800) {
+      *utf32_output++ = char32_t(word);
+      buf++;
+    } else {
+      if (buf + 1 >= end) {
+        return std::make_pair(nullptr,
+                              reinterpret_cast<char32_t *>(utf32_output));
+      }
+      // must be a surrogate pair
+      uint16_t diff = uint16_t(word - 0xD800);
+      uint16_t next_word = !match_system(big_endian)
+                               ? scalar::utf16::swap_bytes(buf[1])
+                               : buf[1];
+      uint16_t diff2 = uint16_t(next_word - 0xDC00);
+      if ((diff | diff2) > 0x3FF) {
+        return std::make_pair(nullptr,
+                              reinterpret_cast<char32_t *>(utf32_output));
+      }
+      uint32_t value = (diff << 10) + diff2 + 0x10000;
+      *utf32_output++ = char32_t(value);
+      buf += 2;
+    }
+  }
 
-  while (end - buf >= 8) {
-    __m128i in = _mm_loadu_si128((__m128i *)buf);
+  __m256i v_f800 = __lasx_xvldi(-2568); /*0xF800*/
+  __m256i v_d800 = __lasx_xvldi(-2600); /*0xD800*/
 
-    if (big_endian) {
-      const __m128i swap =
-          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-      in = _mm_shuffle_epi8(in, swap);
+  while (buf + 16 <= end) {
+    __m256i in = __lasx_xvld(reinterpret_cast<const uint16_t *>(buf), 0);
+    if (!match_system(big_endian)) {
+      in = lasx_swap_bytes(in);
     }
 
-    // 1. Check if there are any surrogate word in the input chunk.
-    //    We have also deal with situation when there is a surrogate word
-    //    at the end of a chunk.
-    const __m128i surrogates_bytemask =
-        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
-
-    // bitmask = 0x0000 if there are no surrogates
-    //         = 0xc000 if the last word is a surrogate
-    const uint16_t surrogates_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
+    __m256i surrogates_bytemask =
+        __lasx_xvseq_h(__lasx_xvand_v(in, v_f800), v_d800);
     // It might seem like checking for surrogates_bitmask == 0xc000 could help.
     // However, it is likely an uncommon occurrence.
-    if (surrogates_bitmask == 0x0000) {
-      // case: no surrogate pair, extend 16-bit code units to 32-bit code units
-      _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
-                       _mm_cvtepu16_epi32(in));
-      _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
-                       _mm_cvtepu16_epi32(_mm_srli_si128(in, 8)));
-      utf32_output += 8;
-      buf += 8;
+    if (__lasx_xbz_v(surrogates_bytemask)) {
+      // case: no surrogate pairs, extend all 16-bit code units to 32-bit code
+      // units
+      __m256i in_hi = __lasx_xvpermi_q(in, in, 0b00000001);
+      __lasx_xvst(__lasx_vext2xv_wu_hu(in), utf32_output, 0);
+      __lasx_xvst(__lasx_vext2xv_wu_hu(in_hi), utf32_output, 32);
+      utf32_output += 16;
+      buf += 16;
       // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
@@ -36824,18 +51810,22 @@ sse_convert_utf16_to_utf32(const char16_t *buf, size_t len,
         forward = size_t(end - buf - 1);
       }
       for (; k < forward; k++) {
-        uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
         if ((word & 0xF800) != 0xD800) {
           *utf32_output++ = char32_t(word);
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word =
-              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
           if ((diff | diff2) > 0x3FF) {
-            return std::make_pair(nullptr, utf32_output);
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char32_t *>(utf32_output));
           }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
           *utf32_output++ = char32_t(value);
@@ -36844,7 +51834,7 @@ sse_convert_utf16_to_utf32(const char16_t *buf, size_t len,
       buf += k;
     }
   } // while
-  return std::make_pair(buf, utf32_output);
+  return std::make_pair(buf, reinterpret_cast<char32_t *>(utf32_output));
 }
 
 /*
@@ -36856,43 +51846,59 @@ sse_convert_utf16_to_utf32(const char16_t *buf, size_t len,
 */
 template <endianness big_endian>
 std::pair<result, char32_t *>
-sse_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
-                                       char32_t *utf32_output) {
+lasx_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
+                                        char32_t *utf32_out) {
+  uint32_t *utf32_output = reinterpret_cast<uint32_t *>(utf32_out);
   const char16_t *start = buf;
   const char16_t *end = buf + len;
 
-  const __m128i v_f800 = _mm_set1_epi16((int16_t)0xf800);
-  const __m128i v_d800 = _mm_set1_epi16((int16_t)0xd800);
-
-  while (end - buf >= 8) {
-    __m128i in = _mm_loadu_si128((__m128i *)buf);
-
-    if (big_endian) {
-      const __m128i swap =
-          _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-      in = _mm_shuffle_epi8(in, swap);
+  // Unaligned 32-byte stores are slow: go scalar until utf32_output is aligned
+  while (((uint64_t)utf32_output & 0x1f) && buf < end) {
+    uint16_t word =
+        !match_system(big_endian) ? scalar::utf16::swap_bytes(buf[0]) : buf[0];
+    if ((word & 0xF800) != 0xD800) {
+      *utf32_output++ = char32_t(word);
+      buf++;
+    } else if (buf + 1 < end) {
+      // must be a surrogate pair
+      uint16_t diff = uint16_t(word - 0xD800);
+      uint16_t next_word = !match_system(big_endian)
+                               ? scalar::utf16::swap_bytes(buf[1])
+                               : buf[1];
+      uint16_t diff2 = uint16_t(next_word - 0xDC00);
+      if ((diff | diff2) > 0x3FF) {
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              reinterpret_cast<char32_t *>(utf32_output));
+      }
+      uint32_t value = (diff << 10) + diff2 + 0x10000;
+      *utf32_output++ = char32_t(value);
+      buf += 2;
+    } else {
+      return std::make_pair(result(error_code::SURROGATE, buf - start),
+                            reinterpret_cast<char32_t *>(utf32_output));
     }
+  }
 
-    // 1. Check if there are any surrogate word in the input chunk.
-    //    We have also deal with situation when there is a surrogate word
-    //    at the end of a chunk.
-    const __m128i surrogates_bytemask =
-        _mm_cmpeq_epi16(_mm_and_si128(in, v_f800), v_d800);
+  __m256i v_f800 = __lasx_xvldi(-2568); /*0xF800*/
+  __m256i v_d800 = __lasx_xvldi(-2600); /*0xD800*/
+  while (buf + 16 <= end) {
+    __m256i in = __lasx_xvld(reinterpret_cast<const uint16_t *>(buf), 0);
+    if (!match_system(big_endian)) {
+      in = lasx_swap_bytes(in);
+    }
 
-    // bitmask = 0x0000 if there are no surrogates
-    //         = 0xc000 if the last word is a surrogate
-    const uint16_t surrogates_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(surrogates_bytemask));
+    __m256i surrogates_bytemask =
+        __lasx_xvseq_h(__lasx_xvand_v(in, v_f800), v_d800);
     // It might seem like checking for surrogates_bitmask == 0xc000 could help.
     // However, it is likely an uncommon occurrence.
-    if (surrogates_bitmask == 0x0000) {
-      // case: no surrogate pair, extend 16-bit code units to 32-bit code units
-      _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output),
-                       _mm_cvtepu16_epi32(in));
-      _mm_storeu_si128(reinterpret_cast<__m128i *>(utf32_output + 4),
-                       _mm_cvtepu16_epi32(_mm_srli_si128(in, 8)));
-      utf32_output += 8;
-      buf += 8;
+    if (__lasx_xbz_v(surrogates_bytemask)) {
+      // case: no surrogate pairs, extend all 16-bit code units to 32-bit code
+      // units
+      __m256i in_hi = __lasx_xvpermi_q(in, in, 0b00000001);
+      __lasx_xvst(__lasx_vext2xv_wu_hu(in), utf32_output, 0);
+      __lasx_xvst(__lasx_vext2xv_wu_hu(in_hi), utf32_output, 32);
+      utf32_output += 16;
+      buf += 16;
       // surrogate pair(s) in a register
     } else {
       // Let us do a scalar fallback.
@@ -36904,20 +51910,23 @@ sse_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
         forward = size_t(end - buf - 1);
       }
       for (; k < forward; k++) {
-        uint16_t word = big_endian ? scalar::utf16::swap_bytes(buf[k]) : buf[k];
+        uint16_t word = !match_system(big_endian)
+                            ? scalar::utf16::swap_bytes(buf[k])
+                            : buf[k];
         if ((word & 0xF800) != 0xD800) {
           *utf32_output++ = char32_t(word);
         } else {
           // must be a surrogate pair
           uint16_t diff = uint16_t(word - 0xD800);
-          uint16_t next_word =
-              big_endian ? scalar::utf16::swap_bytes(buf[k + 1]) : buf[k + 1];
+          uint16_t next_word = !match_system(big_endian)
+                                   ? scalar::utf16::swap_bytes(buf[k + 1])
+                                   : buf[k + 1];
           k++;
           uint16_t diff2 = uint16_t(next_word - 0xDC00);
           if ((diff | diff2) > 0x3FF) {
             return std::make_pair(
                 result(error_code::SURROGATE, buf - start + k - 1),
-                utf32_output);
+                reinterpret_cast<char32_t *>(utf32_output));
           }
           uint32_t value = (diff << 10) + diff2 + 0x10000;
           *utf32_output++ = char32_t(value);
@@ -36926,379 +51935,337 @@ sse_convert_utf16_to_utf32_with_errors(const char16_t *buf, size_t len,
       buf += k;
     }
   } // while
-  return std::make_pair(result(error_code::SUCCESS, buf - start), utf32_output);
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char32_t *>(utf32_output));
 }
-/* end file src/westmere/sse_convert_utf16_to_utf32.cpp */
+/* end file src/lasx/lasx_convert_utf16_to_utf32.cpp */
 
-/* begin file src/westmere/sse_convert_utf32_to_latin1.cpp */
+/* begin file src/lasx/lasx_convert_utf32_to_latin1.cpp */
 std::pair<const char32_t *, char *>
-sse_convert_utf32_to_latin1(const char32_t *buf, size_t len,
-                            char *latin1_output) {
-  const size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 16
-
-  __m128i high_bytes_mask = _mm_set1_epi32(0xFFFFFF00);
-  __m128i shufmask =
-      _mm_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 12, 8, 4, 0);
-
-  for (size_t i = 0; i < rounded_len; i += 16) {
-    __m128i in1 = _mm_loadu_si128((__m128i *)buf);
-    __m128i in2 = _mm_loadu_si128((__m128i *)(buf + 4));
-    __m128i in3 = _mm_loadu_si128((__m128i *)(buf + 8));
-    __m128i in4 = _mm_loadu_si128((__m128i *)(buf + 12));
+lasx_convert_utf32_to_latin1(const char32_t *buf, size_t len,
+                             char *latin1_output) {
+  const char32_t *end = buf + len;
+  const __m256i shuf_mask = ____m256i(
+      (__m128i)v16u8{0, 4, 8, 12, 16, 20, 24, 28, 0, 0, 0, 0, 0, 0, 0, 0});
+  __m256i v_ff = __lasx_xvrepli_w(0xFF);
 
-    __m128i check_combined = _mm_or_si128(in1, in2);
-    check_combined = _mm_or_si128(check_combined, in3);
-    check_combined = _mm_or_si128(check_combined, in4);
+  while (buf + 16 <= end) {
+    __m256i in1 = __lasx_xvld(reinterpret_cast<const uint32_t *>(buf), 0);
+    __m256i in2 = __lasx_xvld(reinterpret_cast<const uint32_t *>(buf), 32);
 
-    if (!_mm_testz_si128(check_combined, high_bytes_mask)) {
-      return std::make_pair(nullptr, latin1_output);
+    __m256i in12 = __lasx_xvor_v(in1, in2);
+    if (__lasx_xbz_v(__lasx_xvslt_wu(v_ff, in12))) {
+      // 1. pack the bytes
+      __m256i latin1_packed_tmp = __lasx_xvshuf_b(in2, in1, shuf_mask);
+      latin1_packed_tmp = __lasx_xvpermi_d(latin1_packed_tmp, 0b00001000);
+      __m128i latin1_packed = lasx_extracti128_lo(latin1_packed_tmp);
+      latin1_packed = __lsx_vpermi_w(latin1_packed, latin1_packed, 0b11011000);
+      // 2. store (16 bytes)
+      __lsx_vst(latin1_packed, reinterpret_cast<uint8_t *>(latin1_output), 0);
+      // 3. adjust pointers
+      buf += 16;
+      latin1_output += 16;
+    } else {
+      return std::make_pair(nullptr, reinterpret_cast<char *>(latin1_output));
     }
-    __m128i pack1 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in1, shufmask),
-                                       _mm_shuffle_epi8(in2, shufmask));
-    __m128i pack2 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in3, shufmask),
-                                       _mm_shuffle_epi8(in4, shufmask));
-    __m128i pack = _mm_unpacklo_epi64(pack1, pack2);
-    _mm_storeu_si128((__m128i *)latin1_output, pack);
-    latin1_output += 16;
-    buf += 16;
-  }
-
+  } // while
   return std::make_pair(buf, latin1_output);
 }
 
 std::pair<result, char *>
-sse_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
-                                        char *latin1_output) {
+lasx_convert_utf32_to_latin1_with_errors(const char32_t *buf, size_t len,
+                                         char *latin1_output) {
   const char32_t *start = buf;
-  const size_t rounded_len = len & ~0xF; // Round down to nearest multiple of 16
-
-  __m128i high_bytes_mask = _mm_set1_epi32(0xFFFFFF00);
-  __m128i shufmask =
-      _mm_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 12, 8, 4, 0);
+  const char32_t *end = buf + len;
 
-  for (size_t i = 0; i < rounded_len; i += 16) {
-    __m128i in1 = _mm_loadu_si128((__m128i *)buf);
-    __m128i in2 = _mm_loadu_si128((__m128i *)(buf + 4));
-    __m128i in3 = _mm_loadu_si128((__m128i *)(buf + 8));
-    __m128i in4 = _mm_loadu_si128((__m128i *)(buf + 12));
+  const __m256i shuf_mask = ____m256i(
+      (__m128i)v16u8{0, 4, 8, 12, 16, 20, 24, 28, 0, 0, 0, 0, 0, 0, 0, 0});
+  __m256i v_ff = __lasx_xvrepli_w(0xFF);
 
-    __m128i check_combined = _mm_or_si128(in1, in2);
-    check_combined = _mm_or_si128(check_combined, in3);
-    check_combined = _mm_or_si128(check_combined, in4);
+  while (buf + 16 <= end) {
+    __m256i in1 = __lasx_xvld(reinterpret_cast<const uint32_t *>(buf), 0);
+    __m256i in2 = __lasx_xvld(reinterpret_cast<const uint32_t *>(buf), 32);
 
-    if (!_mm_testz_si128(check_combined, high_bytes_mask)) {
-      // Fallback to scalar code for handling errors
+    __m256i in12 = __lasx_xvor_v(in1, in2);
+    if (__lasx_xbz_v(__lasx_xvslt_wu(v_ff, in12))) {
+      // 1. pack the bytes
+      __m256i latin1_packed_tmp = __lasx_xvshuf_b(in2, in1, shuf_mask);
+      latin1_packed_tmp = __lasx_xvpermi_d(latin1_packed_tmp, 0b00001000);
+      __m128i latin1_packed = lasx_extracti128_lo(latin1_packed_tmp);
+      latin1_packed = __lsx_vpermi_w(latin1_packed, latin1_packed, 0b11011000);
+      // 2. store (16 bytes)
+      __lsx_vst(latin1_packed, reinterpret_cast<uint8_t *>(latin1_output), 0);
+      // 3. adjust pointers
+      buf += 16;
+      latin1_output += 16;
+    } else {
+      // Let us do a scalar fallback.
       for (int k = 0; k < 16; k++) {
-        char32_t codepoint = buf[k];
-        if (codepoint <= 0xff) {
-          *latin1_output++ = char(codepoint);
+        uint32_t word = buf[k];
+        if (word <= 0xff) {
+          *latin1_output++ = char(word);
         } else {
           return std::make_pair(result(error_code::TOO_LARGE, buf - start + k),
                                 latin1_output);
         }
       }
-      buf += 16;
-      continue;
     }
-    __m128i pack1 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in1, shufmask),
-                                       _mm_shuffle_epi8(in2, shufmask));
-    __m128i pack2 = _mm_unpacklo_epi32(_mm_shuffle_epi8(in3, shufmask),
-                                       _mm_shuffle_epi8(in4, shufmask));
-    __m128i pack = _mm_unpacklo_epi64(pack1, pack2);
-    _mm_storeu_si128((__m128i *)latin1_output, pack);
-    latin1_output += 16;
-    buf += 16;
-  }
-
+  } // while
   return std::make_pair(result(error_code::SUCCESS, buf - start),
                         latin1_output);
 }
-/* end file src/westmere/sse_convert_utf32_to_latin1.cpp */
-/* begin file src/westmere/sse_convert_utf32_to_utf8.cpp */
+/* end file src/lasx/lasx_convert_utf32_to_latin1.cpp */
+/* begin file src/lasx/lasx_convert_utf32_to_utf8.cpp */
 std::pair<const char32_t *, char *>
-sse_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_output) {
+lasx_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
   const char32_t *end = buf + len;
 
-  const __m128i v_0000 = _mm_setzero_si128();              //__m128 = 128 bits
-  const __m128i v_f800 = _mm_set1_epi16((uint16_t)0xf800); // 1111 1000 0000
-                                                           // 0000
-  const __m128i v_c080 = _mm_set1_epi16((uint16_t)0xc080); // 1100 0000 1000
-                                                           // 0000
-  const __m128i v_ff80 = _mm_set1_epi16((uint16_t)0xff80); // 1111 1111 1000
-                                                           // 0000
-  const __m128i v_ffff0000 = _mm_set1_epi32(
-      (uint32_t)0xffff0000); // 1111 1111 1111 1111 0000 0000 0000 0000
-  const __m128i v_7fffffff = _mm_set1_epi32(
-      (uint32_t)0x7fffffff); // 0111 1111 1111 1111 1111 1111 1111 1111
-  __m128i running_max = _mm_setzero_si128();
-  __m128i forbidden_bytemask = _mm_setzero_si128();
+  // Scalar loop until buf is 32-byte aligned (for the vector loads below)
+  while (((uint64_t)buf & 0x1F) && buf < end) {
+    uint32_t word = *buf;
+    if ((word & 0xFFFFFF80) == 0) {
+      *utf8_output++ = char(word);
+    } else if ((word & 0xFFFFF800) == 0) {
+      *utf8_output++ = char((word >> 6) | 0b11000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+    } else if ((word & 0xFFFF0000) == 0) {
+      if (word >= 0xD800 && word <= 0xDFFF) {
+        return std::make_pair(nullptr, reinterpret_cast<char *>(utf8_output));
+      }
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+    } else {
+      if (word > 0x10FFFF) {
+        return std::make_pair(nullptr, reinterpret_cast<char *>(utf8_output));
+      }
+      *utf8_output++ = char((word >> 18) | 0b11110000);
+      *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+    }
+    buf++;
+  }
+
+  __m256i v_c080 = __lasx_xvreplgr2vr_h(uint16_t(0xC080));
+  __m256i v_07ff = __lasx_xvreplgr2vr_h(uint16_t(0x7FF));
+  __m256i v_dfff = __lasx_xvreplgr2vr_h(uint16_t(0xDFFF));
+  __m256i v_d800 = __lasx_xvldi(-2600); /*0xD800*/
+  __m256i zero = __lasx_xvldi(0);
+  __m128i zero_128 = __lsx_vldi(0);
+  __m256i forbidden_bytemask = __lasx_xvldi(0x0);
+
   const size_t safety_margin =
       12; // to avoid overruns, see issue
           // https://github.com/simdutf/simdutf/issues/92
 
-  while (end - buf >=
-         std::ptrdiff_t(
-             16 + safety_margin)) { // buf is a char32_t pointer, each char32_t
-                                    // has 4 bytes or 32 bits, thus buf + 16 *
-                                    // char_32t = 512 bits = 64 bytes
-    // We load two 16 bytes registers for a total of 32 bytes or 16 characters.
-    __m128i in = _mm_loadu_si128((__m128i *)buf);
-    __m128i nextin = _mm_loadu_si128(
-        (__m128i *)buf + 1); // These two values can hold only 8 UTF32 chars
-    running_max = _mm_max_epu32(
-        _mm_max_epu32(in, running_max), // take element-wise max char32_t from
-                                        // in and running_max vector
-        nextin); // and take element-wise max element from nextin and
-                 // running_max vector
-
-    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
-    // saturation
-    __m128i in_16 = _mm_packus_epi32(
-        _mm_and_si128(in, v_7fffffff),
-        _mm_and_si128(
-            nextin,
-            v_7fffffff)); // in this context pack the two __m128 into a single
-    // By ensuring the highest bit is set to 0(&v_7fffffff), we are making sure
-    // all values are interpreted as non-negative, or specifically, the values
-    // are within the range of valid Unicode code points. remember : having
-    // leading byte 0 means a positive number by the two complements system.
-    // Unicode is well beneath the range where you'll start getting issues so
-    // that's OK.
-
-    // Try to apply UTF-16 => UTF-8 from ./sse_convert_utf16_to_utf8.cpp
+  while (buf + 16 + safety_margin < end) {
+    __m256i in = __lasx_xvld(reinterpret_cast<const uint32_t *>(buf), 0);
+    __m256i nextin = __lasx_xvld(reinterpret_cast<const uint32_t *>(buf), 32);
 
-    // Check for ASCII fast path
+    // Check that no bits are set above the 16th
+    if (__lasx_xbz_v(__lasx_xvpickod_h(in, nextin))) {
+      // Pack UTF-32 to UTF-16 safely (without surrogate pairs)
+      // Apply UTF-16 => UTF-8 routine (lasx_convert_utf16_to_utf8.cpp)
+      __m256i utf16_packed =
+          __lasx_xvpermi_d(__lasx_xvpickev_h(nextin, in), 0b11011000);
 
-    // ASCII fast path!!!!
-    // We eagerly load another 32 bytes, hoping that they will be ASCII too.
-    // The intuition is that we try to collect 16 ASCII characters which
-    // requires a total of 64 bytes of input. If we fail, we just pass thirdin
-    // and fourthin as our new inputs.
-    if (_mm_testz_si128(in_16, v_ff80)) { // if the first two blocks are ASCII
-      __m128i thirdin = _mm_loadu_si128((__m128i *)buf + 2);
-      __m128i fourthin = _mm_loadu_si128((__m128i *)buf + 3);
-      running_max = _mm_max_epu32(
-          _mm_max_epu32(thirdin, running_max),
-          fourthin); // take the running max of all 4 vectors thus far
-      __m128i nextin_16 = _mm_packus_epi32(
-          _mm_and_si128(thirdin, v_7fffffff),
-          _mm_and_si128(fourthin,
-                        v_7fffffff)); // pack into 1 vector, now you have two
-      if (!_mm_testz_si128(
-              nextin_16,
-              v_ff80)) { // checks if the second packed vector is ASCII, if not:
+      if (__lasx_xbz_v(__lasx_xvslt_hu(__lasx_xvrepli_h(0x7F),
+                                       utf16_packed))) { // ASCII fast path!!!!
         // 1. pack the bytes
         // obviously suboptimal.
-        const __m128i utf8_packed = _mm_packus_epi16(
-            in_16, in_16); // creates two copy of in_16 in 1 vector
-        // 2. store (16 bytes)
-        _mm_storeu_si128((__m128i *)utf8_output,
-                         utf8_packed); // put them into the output
-        // 3. adjust pointers
-        buf += 8; // the char32_t buffer pointer goes up 8 char32_t chars* 32
-                  // bits =  256 bits
-        utf8_output +=
-            8; // same with output, e.g. lift the first two blocks alone.
-        // Proceed with next input
-        in_16 = nextin_16;
-        // We need to update in and nextin because they are used later.
-        in = thirdin;
-        nextin = fourthin;
-      } else {
-        // 1. pack the bytes
-        const __m128i utf8_packed = _mm_packus_epi16(in_16, nextin_16);
-        // 2. store (16 bytes)
-        _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
+        __m256i utf8_packed = __lasx_xvpermi_d(
+            __lasx_xvpickev_b(utf16_packed, utf16_packed), 0b00001000);
+        // 2. store (16 bytes)
+        __lsx_vst(lasx_extracti128_lo(utf8_packed), utf8_output, 0);
         // 3. adjust pointers
         buf += 16;
         utf8_output += 16;
         continue; // we are done for this round!
       }
-    }
-
-    // no bits set above 7th bit -- find out all the ASCII characters
-    const __m128i one_byte_bytemask =
-        _mm_cmpeq_epi16( // this takes four bytes at a time and compares:
-            _mm_and_si128(in_16, v_ff80), // the vector that get only the first
-                                          // 9 bits of each 16-bit/2-byte units
-            v_0000                        //
-        ); // they should be all zero if they are ASCII. E.g. ASCII in UTF32 is
-           // of format 0000 0000 0000 0XXX XXXX
-    // _mm_cmpeq_epi16 should now return a 1111 1111 1111 1111 for equals, and
-    // 0000 0000 0000 0000 if not for each 16-bit/2-byte units
-    const uint16_t one_byte_bitmask = static_cast<uint16_t>(_mm_movemask_epi8(
-        one_byte_bytemask)); // collect the MSB from previous vector and put
-                             // them into uint16_t mas
-
-    // no bits set above 11th bit
-    const __m128i one_or_two_bytes_bytemask =
-        _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_0000);
-    const uint16_t one_or_two_bytes_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
-
-    if (one_or_two_bytes_bitmask == 0xffff) {
-      // case: all code units either produce 1 or 2 UTF-8 bytes (at least one
-      // produces 2 bytes)
-      // 1. prepare 2-byte values
-      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-      // expected output   : [110a|aaaa|10bb|bbbb] x 8
-      const __m128i v_1f00 =
-          _mm_set1_epi16((int16_t)0x1f00); // 0001 1111 0000 0000
-      const __m128i v_003f =
-          _mm_set1_epi16((int16_t)0x003f); // 0000 0000 0011 1111
-
-      // t0 = [000a|aaaa|bbbb|bb00]
-      const __m128i t0 = _mm_slli_epi16(in_16, 2); // shift packed vector by two
-      // t1 = [000a|aaaa|0000|0000]
-      const __m128i t1 =
-          _mm_and_si128(t0, v_1f00); // potentital first utf8 byte
-      // t2 = [0000|0000|00bb|bbbb]
-      const __m128i t2 =
-          _mm_and_si128(in_16, v_003f); // potential second utf8 byte
-      // t3 = [000a|aaaa|00bb|bbbb]
-      const __m128i t3 =
-          _mm_or_si128(t1, t2); // first and second potential utf8 byte together
-      // t4 = [110a|aaaa|10bb|bbbb]
-      const __m128i t4 = _mm_or_si128(
-          t3,
-          v_c080); // t3 | 1100 0000 1000 0000 = full potential 2-byte utf8 unit
-
-      // 2. merge ASCII and 2-byte codewords
-      const __m128i utf8_unpacked =
-          _mm_blendv_epi8(t4, in_16, one_byte_bytemask);
-
-      // 3. prepare bitmask for 8-bit lookup
-      //    one_byte_bitmask = hhggffeeddccbbaa -- the bits are doubled (h -
-      //    MSB, a - LSB)
-      const uint16_t m0 = one_byte_bitmask & 0x5555; // m0 = 0h0g0f0e0d0c0b0a
-      const uint16_t m1 =
-          static_cast<uint16_t>(m0 >> 7); // m1 = 00000000h0g0f0e0
-      const uint8_t m2 =
-          static_cast<uint8_t>((m0 | m1) & 0xff); // m2 =         hdgcfbea
-      // 4. pack the bytes
-      const uint8_t *row =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
-      const __m128i utf8_packed = _mm_shuffle_epi8(utf8_unpacked, shuffle);
-
-      // 5. store bytes
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
-
-      // 6. adjust pointers
-      buf += 8;
-      utf8_output += row[0];
-      continue;
-    }
-
-    // Check for overflow in packing
-
-    const __m128i saturation_bytemask = _mm_cmpeq_epi32(
-        _mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask =
-        static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
-    if (saturation_bitmask == 0xffff) {
-      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
-      const __m128i v_d800 = _mm_set1_epi16((uint16_t)0xd800);
-      forbidden_bytemask =
-          _mm_or_si128(forbidden_bytemask,
-                       _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_d800));
-
-      const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
-                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
-
-      /* In this branch we handle three cases:
-          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
-        single UFT-8 byte
-          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
-        two UTF-8 bytes
-          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
-        three UTF-8 bytes
-
-        We expand the input word (16-bit) into two code units (32-bit), thus
-        we have room for four bytes. However, we need five distinct bit
-        layouts. Note that the last byte in cases #2 and #3 is the same.
 
-        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-        in register t2.
-
-        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-        either byte 1 for case #2 or byte 2 for case #3. Note that they
-        differ by exactly one bit.
-
-        Finally from these two code units we build proper UTF-8 sequence, taking
-        into account the case (i.e, the number of bytes to write).
-      */
-      /**
-       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-       * t2 => [0ccc|cccc] [10cc|cccc]
-       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-       */
-#define simdutf_vec(x) _mm_set1_epi16(static_cast<uint16_t>(x))
-      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-      const __m128i t0 = _mm_shuffle_epi8(in_16, dup_even);
-      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-      const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
-      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m128i t2 = _mm_or_si128(t1, simdutf_vec(0b1000000000000000));
+      if (__lasx_xbz_v(__lasx_xvslt_hu(v_07ff, utf16_packed))) {
+        // 1. prepare 2-byte values
+        // input 16-bit word : [0000|0aaa|aabb|bbbb] x 16
+        // expected output   : [110a|aaaa|10bb|bbbb] x 16
 
-      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-      const __m128i s0 = _mm_srli_epi16(in_16, 4);
-      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-      const __m128i s1 = _mm_and_si128(s0, simdutf_vec(0b0000111111111100));
-      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-      const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
-      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-      const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
-      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask,
-                                          simdutf_vec(0b0100000000000000));
-      const __m128i s4 = _mm_xor_si128(s3, m0);
-#undef simdutf_vec
+        // t0 = [000a|aaaa|bbbb|bb00]
+        const __m256i t0 = __lasx_xvslli_h(utf16_packed, 2);
+        // t1 = [000a|aaaa|0000|0000]
+        const __m256i t1 = __lasx_xvand_v(t0, __lasx_xvldi(-2785 /*0x1f00*/));
+        // t2 = [0000|0000|00bb|bbbb]
+        const __m256i t2 = __lasx_xvand_v(utf16_packed, __lasx_xvrepli_h(0x3f));
+        // t3 = [000a|aaaa|00bb|bbbb]
+        const __m256i t3 = __lasx_xvor_v(t1, t2);
+        // t4 = [110a|aaaa|10bb|bbbb]
+        const __m256i t4 = __lasx_xvor_v(t3, v_c080);
+        // 2. merge ASCII and 2-byte codewords
+        __m256i one_byte_bytemask =
+            __lasx_xvsle_hu(utf16_packed, __lasx_xvrepli_h(0x7F /*0x007F*/));
+        __m256i utf8_unpacked =
+            __lasx_xvbitsel_v(t4, utf16_packed, one_byte_bytemask);
+        // 3. prepare bitmask for 8-bit lookup
+        __m256i mask = __lasx_xvmskltz_h(one_byte_bytemask);
+        uint32_t m1 = __lasx_xvpickve2gr_wu(mask, 0);
+        uint32_t m2 = __lasx_xvpickve2gr_wu(mask, 4);
+        // 4. pack the bytes
+        const uint8_t *row1 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                [lasx_1_2_utf8_bytes_mask[m1]][0];
+        __m128i shuffle1 = __lsx_vld(row1, 1);
+        __m128i utf8_packed1 = __lsx_vshuf_b(
+            zero_128, lasx_extracti128_lo(utf8_unpacked), shuffle1);
+
+        const uint8_t *row2 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                [lasx_1_2_utf8_bytes_mask[m2]][0];
+        __m128i shuffle2 = __lsx_vld(row2, 1);
+        __m128i utf8_packed2 = __lsx_vshuf_b(
+            zero_128, lasx_extracti128_hi(utf8_unpacked), shuffle2);
+        // 5. store bytes
+        __lsx_vst(utf8_packed1, utf8_output, 0);
+        utf8_output += row1[0];
 
-      // 4. expand code units 16-bit => 32-bit
-      const __m128i out0 = _mm_unpacklo_epi16(t2, s4);
-      const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
+        __lsx_vst(utf8_packed2, utf8_output, 0);
+        utf8_output += row2[0];
 
-      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint16_t mask =
-          (one_byte_bitmask & 0x5555) | (one_or_two_bytes_bitmask & 0xaaaa);
-      if (mask == 0) {
-        // We only have three-byte code units. Use fast path.
-        const __m128i shuffle = _mm_setr_epi8(2, 3, 1, 6, 7, 5, 10, 11, 9, 14,
-                                              15, 13, -1, -1, -1, -1);
-        const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
-        const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
-        _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
-        utf8_output += 12;
-        _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
-        utf8_output += 12;
-        buf += 8;
+        buf += 16;
         continue;
-      }
-      const uint8_t mask0 = uint8_t(mask);
+      } else {
+        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+        forbidden_bytemask = __lasx_xvor_v(
+            __lasx_xvand_v(
+                __lasx_xvsle_h(utf16_packed, v_dfff),  // utf16_packed <= 0xdfff
+                __lasx_xvsle_h(v_d800, utf16_packed)), // utf16_packed >= 0xd800
+            forbidden_bytemask);
+        /* In this branch we handle three cases:
+            1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+           single UTF-8 byte
+            2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+           two UTF-8 bytes
+            3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+           three UTF-8 bytes
+
+            We expand the input word (16-bit) into two code units (32-bit), thus
+            we have room for four bytes. However, we need five distinct bit
+            layouts. Note that the last byte in cases #2 and #3 is the same.
+
+            We precompute byte 1 for case #1 and the common byte for cases #2 &
+           #3 in register t2.
+
+            We precompute byte 1 for case #3 and -- **conditionally** --
+           precompute either byte 1 for case #2 or byte 2 for case #3. Note that
+           they differ by exactly one bit.
+
+            Finally from these two code units we build proper UTF-8 sequence,
+           taking into account the case (i.e., the number of bytes to write).
+        */
+        /**
+         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+         * t2 => [0ccc|cccc] [10cc|cccc]
+         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+         */
+        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+        __m256i t0 = __lasx_xvpickev_b(utf16_packed, utf16_packed);
+        t0 = __lasx_xvilvl_b(t0, t0);
+        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+        __m256i v_3f7f = __lasx_xvreplgr2vr_h(uint16_t(0x3F7F));
+        __m256i t1 = __lasx_xvand_v(t0, v_3f7f);
+        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+        __m256i t2 = __lasx_xvor_v(t1, __lasx_xvldi(-2688 /*0x8000*/));
 
-      const uint8_t *row0 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
-      const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
+        // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+        __m256i s0 = __lasx_xvsrli_h(utf16_packed, 12);
+        // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
+        __m256i s1 = __lasx_xvslli_h(utf16_packed, 2);
+        // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
+        s1 = __lasx_xvand_v(s1, __lasx_xvldi(-2753 /*0x3F00*/));
+        // [00bb|bbbb|0000|aaaa]
+        __m256i s2 = __lasx_xvor_v(s0, s1);
+        // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+        __m256i v_c0e0 = __lasx_xvreplgr2vr_h(uint16_t(0xC0E0));
+        __m256i s3 = __lasx_xvor_v(s2, v_c0e0);
+        // one_or_two_bytes_bytemask: lanes whose code unit is <= 0x07FF
+        __m256i one_or_two_bytes_bytemask =
+            __lasx_xvsle_hu(utf16_packed, v_07ff);
+        __m256i m0 = __lasx_xvandn_v(one_or_two_bytes_bytemask,
+                                     __lasx_xvldi(-2752 /*0x4000*/));
+        __m256i s4 = __lasx_xvxor_v(s3, m0);
 
-      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+        // 4. expand code units 16-bit => 32-bit
+        __m256i out0 = __lasx_xvilvl_h(s4, t2);
+        __m256i out1 = __lasx_xvilvh_h(s4, t2);
 
-      const uint8_t *row1 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
-      const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
+        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+        __m256i one_byte_bytemask =
+            __lasx_xvsle_hu(utf16_packed, __lasx_xvrepli_h(0x7F));
+
+        __m256i one_or_two_bytes_bytemask_u16_to_u32_low =
+            __lasx_xvilvl_h(one_or_two_bytes_bytemask, zero);
+        __m256i one_or_two_bytes_bytemask_u16_to_u32_high =
+            __lasx_xvilvh_h(one_or_two_bytes_bytemask, zero);
+
+        __m256i one_byte_bytemask_u16_to_u32_low =
+            __lasx_xvilvl_h(one_byte_bytemask, one_byte_bytemask);
+        __m256i one_byte_bytemask_u16_to_u32_high =
+            __lasx_xvilvh_h(one_byte_bytemask, one_byte_bytemask);
+
+        __m256i mask0 = __lasx_xvmskltz_h(
+            __lasx_xvor_v(one_or_two_bytes_bytemask_u16_to_u32_low,
+                          one_byte_bytemask_u16_to_u32_low));
+        __m256i mask1 = __lasx_xvmskltz_h(
+            __lasx_xvor_v(one_or_two_bytes_bytemask_u16_to_u32_high,
+                          one_byte_bytemask_u16_to_u32_high));
+
+        uint32_t mask = __lasx_xvpickve2gr_wu(mask0, 0);
+        const uint8_t *row0 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                  [0];
+        __m128i shuffle0 = __lsx_vld(row0, 1);
+        __m128i utf8_0 =
+            __lsx_vshuf_b(zero_128, lasx_extracti128_lo(out0), shuffle0);
+        __lsx_vst(utf8_0, utf8_output, 0);
+        utf8_output += row0[0];
 
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
-      utf8_output += row0[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
-      utf8_output += row1[0];
+        mask = __lasx_xvpickve2gr_wu(mask1, 0);
+        const uint8_t *row1 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                  [0];
+        __m128i shuffle1 = __lsx_vld(row1, 1);
+        __m128i utf8_1 =
+            __lsx_vshuf_b(zero_128, lasx_extracti128_lo(out1), shuffle1);
+        __lsx_vst(utf8_1, utf8_output, 0);
+        utf8_output += row1[0];
 
-      buf += 8;
+        mask = __lasx_xvpickve2gr_wu(mask0, 4);
+        const uint8_t *row2 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                  [0];
+        __m128i shuffle2 = __lsx_vld(row2, 1);
+        __m128i utf8_2 =
+            __lsx_vshuf_b(zero_128, lasx_extracti128_hi(out0), shuffle2);
+        __lsx_vst(utf8_2, utf8_output, 0);
+        utf8_output += row2[0];
+
+        mask = __lasx_xvpickve2gr_wu(mask1, 4);
+        const uint8_t *row3 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                  [0];
+        __m128i shuffle3 = __lsx_vld(row3, 1);
+        __m128i utf8_3 =
+            __lsx_vshuf_b(zero_128, lasx_extracti128_hi(out1), shuffle3);
+        __lsx_vst(utf8_3, utf8_output, 0);
+        utf8_output += row3[0];
+
+        buf += 16;
+      }
+      // At least one 32-bit word will produce a surrogate pair in UTF-16 <=>
+      // will produce four UTF-8 bytes.
     } else {
-      // case: at least one 32-bit word produce a surrogate pair in UTF-16 <=>
-      // will produce four UTF-8 bytes Let us do a scalar fallback. It may seem
-      // wasteful to use scalar code, but being efficient with SIMD in the
-      // presence of surrogate pairs may require non-trivial tables.
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
       if (size_t(end - buf) < forward + 1) {
@@ -37313,14 +52280,16 @@ sse_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_output) {
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         } else if ((word & 0xFFFF0000) == 0) {
           if (word >= 0xD800 && word <= 0xDFFF) {
-            return std::make_pair(nullptr, utf8_output);
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char *>(utf8_output));
           }
           *utf8_output++ = char((word >> 12) | 0b11100000);
           *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
           *utf8_output++ = char((word & 0b111111) | 0b10000000);
         } else {
           if (word > 0x10FFFF) {
-            return std::make_pair(nullptr, utf8_output);
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char *>(utf8_output));
           }
           *utf8_output++ = char((word >> 18) | 0b11110000);
           *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
@@ -37333,242 +52302,269 @@ sse_convert_utf32_to_utf8(const char32_t *buf, size_t len, char *utf8_output) {
   } // while
 
   // check for invalid input
-  const __m128i v_10ffff = _mm_set1_epi32((uint32_t)0x10ffff);
-  if (static_cast<uint16_t>(_mm_movemask_epi8(_mm_cmpeq_epi32(
-          _mm_max_epu32(running_max, v_10ffff), v_10ffff))) != 0xffff) {
-    return std::make_pair(nullptr, utf8_output);
-  }
-
-  if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) {
-    return std::make_pair(nullptr, utf8_output);
+  if (__lasx_xbnz_v(forbidden_bytemask)) {
+    return std::make_pair(nullptr, reinterpret_cast<char *>(utf8_output));
   }
-
-  return std::make_pair(buf, utf8_output);
+  return std::make_pair(buf, reinterpret_cast<char *>(utf8_output));
 }
 
 std::pair<result, char *>
-sse_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
-                                      char *utf8_output) {
-  const char32_t *end = buf + len;
+lasx_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
+                                       char *utf8_out) {
+  uint8_t *utf8_output = reinterpret_cast<uint8_t *>(utf8_out);
   const char32_t *start = buf;
+  const char32_t *end = buf + len;
 
-  const __m128i v_0000 = _mm_setzero_si128();
-  const __m128i v_f800 = _mm_set1_epi16((uint16_t)0xf800);
-  const __m128i v_c080 = _mm_set1_epi16((uint16_t)0xc080);
-  const __m128i v_ff80 = _mm_set1_epi16((uint16_t)0xff80);
-  const __m128i v_ffff0000 = _mm_set1_epi32((uint32_t)0xffff0000);
-  const __m128i v_7fffffff = _mm_set1_epi32((uint32_t)0x7fffffff);
-  const __m128i v_10ffff = _mm_set1_epi32((uint32_t)0x10ffff);
+  // Scalar prologue: advance until the load address is 32-byte aligned.
+  while (((uint64_t)buf & 0x1F) && buf < end) {
+    uint32_t word = *buf;
+    if ((word & 0xFFFFFF80) == 0) {
+      *utf8_output++ = char(word);
+    } else if ((word & 0xFFFFF800) == 0) {
+      *utf8_output++ = char((word >> 6) | 0b11000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+    } else if ((word & 0xFFFF0000) == 0) {
+      if (word >= 0xD800 && word <= 0xDFFF) {
+        return std::make_pair(result(error_code::SURROGATE, buf - start),
+                              reinterpret_cast<char *>(utf8_output));
+      }
+      *utf8_output++ = char((word >> 12) | 0b11100000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+    } else {
+      if (word > 0x10FFFF) {
+        return std::make_pair(result(error_code::TOO_LARGE, buf - start),
+                              reinterpret_cast<char *>(utf8_output));
+      }
+      *utf8_output++ = char((word >> 18) | 0b11110000);
+      *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
+      *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
+      *utf8_output++ = char((word & 0b111111) | 0b10000000);
+    }
+    buf++;
+  }
 
+  __m256i v_c080 = __lasx_xvreplgr2vr_h(uint16_t(0xC080));
+  __m256i v_07ff = __lasx_xvreplgr2vr_h(uint16_t(0x7FF));
+  __m256i v_dfff = __lasx_xvreplgr2vr_h(uint16_t(0xDFFF));
+  __m256i v_d800 = __lasx_xvldi(-2600); /*0xD800*/
+  __m256i zero = __lasx_xvldi(0);
+  __m128i zero_128 = __lsx_vldi(0);
+  __m256i forbidden_bytemask = __lasx_xvldi(0x0);
   const size_t safety_margin =
       12; // to avoid overruns, see issue
           // https://github.com/simdutf/simdutf/issues/92
 
-  while (end - buf >= std::ptrdiff_t(16 + safety_margin)) {
-    // We load two 16 bytes registers for a total of 32 bytes or 8 characters.
-    __m128i in = _mm_loadu_si128((__m128i *)buf);
-    __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
-    // Check for too large input
-    __m128i max_input = _mm_max_epu32(_mm_max_epu32(in, nextin), v_10ffff);
-    if (static_cast<uint16_t>(_mm_movemask_epi8(
-            _mm_cmpeq_epi32(max_input, v_10ffff))) != 0xffff) {
-      return std::make_pair(result(error_code::TOO_LARGE, buf - start),
-                            utf8_output);
-    }
-
-    // Pack 32-bit UTF-32 code units to 16-bit UTF-16 code units with unsigned
-    // saturation
-    __m128i in_16 = _mm_packus_epi32(_mm_and_si128(in, v_7fffffff),
-                                     _mm_and_si128(nextin, v_7fffffff));
-
-    // Try to apply UTF-16 => UTF-8 from ./sse_convert_utf16_to_utf8.cpp
-
-    // Check for ASCII fast path
-    if (_mm_testz_si128(in_16, v_ff80)) { // ASCII fast path!!!!
-      // 1. pack the bytes
-      // obviously suboptimal.
-      const __m128i utf8_packed = _mm_packus_epi16(in_16, in_16);
-      // 2. store (16 bytes)
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
-      // 3. adjust pointers
-      buf += 8;
-      utf8_output += 8;
-      continue;
-    }
-
-    // no bits set above 7th bit
-    const __m128i one_byte_bytemask =
-        _mm_cmpeq_epi16(_mm_and_si128(in_16, v_ff80), v_0000);
-    const uint16_t one_byte_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(one_byte_bytemask));
-
-    // no bits set above 11th bit
-    const __m128i one_or_two_bytes_bytemask =
-        _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_0000);
-    const uint16_t one_or_two_bytes_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(one_or_two_bytes_bytemask));
-
-    if (one_or_two_bytes_bitmask == 0xffff) {
-      // case: all code units either produce 1 or 2 UTF-8 bytes (at least one
-      // produces 2 bytes)
-      // 1. prepare 2-byte values
-      // input 16-bit word : [0000|0aaa|aabb|bbbb] x 8
-      // expected output   : [110a|aaaa|10bb|bbbb] x 8
-      const __m128i v_1f00 = _mm_set1_epi16((int16_t)0x1f00);
-      const __m128i v_003f = _mm_set1_epi16((int16_t)0x003f);
-
-      // t0 = [000a|aaaa|bbbb|bb00]
-      const __m128i t0 = _mm_slli_epi16(in_16, 2);
-      // t1 = [000a|aaaa|0000|0000]
-      const __m128i t1 = _mm_and_si128(t0, v_1f00);
-      // t2 = [0000|0000|00bb|bbbb]
-      const __m128i t2 = _mm_and_si128(in_16, v_003f);
-      // t3 = [000a|aaaa|00bb|bbbb]
-      const __m128i t3 = _mm_or_si128(t1, t2);
-      // t4 = [110a|aaaa|10bb|bbbb]
-      const __m128i t4 = _mm_or_si128(t3, v_c080);
-
-      // 2. merge ASCII and 2-byte codewords
-      const __m128i utf8_unpacked =
-          _mm_blendv_epi8(t4, in_16, one_byte_bytemask);
-
-      // 3. prepare bitmask for 8-bit lookup
-      //    one_byte_bitmask = hhggffeeddccbbaa -- the bits are doubled (h -
-      //    MSB, a - LSB)
-      const uint16_t m0 = one_byte_bitmask & 0x5555; // m0 = 0h0g0f0e0d0c0b0a
-      const uint16_t m1 =
-          static_cast<uint16_t>(m0 >> 7); // m1 = 00000000h0g0f0e0
-      const uint8_t m2 =
-          static_cast<uint8_t>((m0 | m1) & 0xff); // m2 =         hdgcfbea
-      // 4. pack the bytes
-      const uint8_t *row =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes[m2][0];
-      const __m128i shuffle = _mm_loadu_si128((__m128i *)(row + 1));
-      const __m128i utf8_packed = _mm_shuffle_epi8(utf8_unpacked, shuffle);
-
-      // 5. store bytes
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_packed);
-
-      // 6. adjust pointers
-      buf += 8;
-      utf8_output += row[0];
-      continue;
-    }
-
-    // Check for overflow in packing
-    const __m128i saturation_bytemask = _mm_cmpeq_epi32(
-        _mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask =
-        static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
+  while (buf + 16 + safety_margin < end) {
+    __m256i in = __lasx_xvld(reinterpret_cast<const uint32_t *>(buf), 0);
+    __m256i nextin = __lasx_xvld(reinterpret_cast<const uint32_t *>(buf), 32);
 
-    if (saturation_bitmask == 0xffff) {
-      // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+    // Check if no bits set above 16th
+    if (__lasx_xbz_v(__lasx_xvpickod_h(in, nextin))) {
+      // Pack UTF-32 to UTF-16 safely (without surrogate pairs)
+      // Apply UTF-16 => UTF-8 routine (lasx_convert_utf16_to_utf8.cpp)
+      __m256i utf16_packed =
+          __lasx_xvpermi_d(__lasx_xvpickev_h(nextin, in), 0b11011000);
 
-      // Check for illegal surrogate code units
-      const __m128i v_d800 = _mm_set1_epi16((uint16_t)0xd800);
-      const __m128i forbidden_bytemask =
-          _mm_cmpeq_epi16(_mm_and_si128(in_16, v_f800), v_d800);
-      if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) {
-        return std::make_pair(result(error_code::SURROGATE, buf - start),
-                              utf8_output);
+      if (__lasx_xbz_v(__lasx_xvslt_hu(__lasx_xvrepli_h(0x7F),
+                                       utf16_packed))) { // ASCII fast path!!!!
+        // 1. pack the bytes
+        // obviously suboptimal.
+        __m256i utf8_packed = __lasx_xvpermi_d(
+            __lasx_xvpickev_b(utf16_packed, utf16_packed), 0b00001000);
+        // 2. store (16 bytes)
+        __lsx_vst(lasx_extracti128_lo(utf8_packed), utf8_output, 0);
+        // 3. adjust pointers
+        buf += 16;
+        utf8_output += 16;
+        continue; // we are done for this round!
       }
 
-      const __m128i dup_even = _mm_setr_epi16(0x0000, 0x0202, 0x0404, 0x0606,
-                                              0x0808, 0x0a0a, 0x0c0c, 0x0e0e);
-
-      /* In this branch we handle three cases:
-          1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
-        single UFT-8 byte
-          2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
-        two UTF-8 bytes
-          3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
-        three UTF-8 bytes
-
-        We expand the input word (16-bit) into two code units (32-bit), thus
-        we have room for four bytes. However, we need five distinct bit
-        layouts. Note that the last byte in cases #2 and #3 is the same.
-
-        We precompute byte 1 for case #1 and the common byte for cases #2 & #3
-        in register t2.
-
-        We precompute byte 1 for case #3 and -- **conditionally** -- precompute
-        either byte 1 for case #2 or byte 2 for case #3. Note that they
-        differ by exactly one bit.
-
-        Finally from these two code units we build proper UTF-8 sequence, taking
-        into account the case (i.e, the number of bytes to write).
-      */
-      /**
-       * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
-       * t2 => [0ccc|cccc] [10cc|cccc]
-       * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
-       */
-#define simdutf_vec(x) _mm_set1_epi16(static_cast<uint16_t>(x))
-      // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
-      const __m128i t0 = _mm_shuffle_epi8(in_16, dup_even);
-      // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
-      const __m128i t1 = _mm_and_si128(t0, simdutf_vec(0b0011111101111111));
-      // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
-      const __m128i t2 = _mm_or_si128(t1, simdutf_vec(0b1000000000000000));
+      if (__lasx_xbz_v(__lasx_xvslt_hu(v_07ff, utf16_packed))) {
+        // 1. prepare 2-byte values
+        // input 16-bit word : [0000|0aaa|aabb|bbbb] x 16
+        // expected output   : [110a|aaaa|10bb|bbbb] x 16
 
-      // [aaaa|bbbb|bbcc|cccc] =>  [0000|aaaa|bbbb|bbcc]
-      const __m128i s0 = _mm_srli_epi16(in_16, 4);
-      // [0000|aaaa|bbbb|bbcc] => [0000|aaaa|bbbb|bb00]
-      const __m128i s1 = _mm_and_si128(s0, simdutf_vec(0b0000111111111100));
-      // [0000|aaaa|bbbb|bb00] => [00bb|bbbb|0000|aaaa]
-      const __m128i s2 = _mm_maddubs_epi16(s1, simdutf_vec(0x0140));
-      // [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
-      const __m128i s3 = _mm_or_si128(s2, simdutf_vec(0b1100000011100000));
-      const __m128i m0 = _mm_andnot_si128(one_or_two_bytes_bytemask,
-                                          simdutf_vec(0b0100000000000000));
-      const __m128i s4 = _mm_xor_si128(s3, m0);
-#undef simdutf_vec
+        // t0 = [000a|aaaa|bbbb|bb00]
+        const __m256i t0 = __lasx_xvslli_h(utf16_packed, 2);
+        // t1 = [000a|aaaa|0000|0000]
+        const __m256i t1 = __lasx_xvand_v(t0, __lasx_xvldi(-2785 /*0x1f00*/));
+        // t2 = [0000|0000|00bb|bbbb]
+        const __m256i t2 = __lasx_xvand_v(utf16_packed, __lasx_xvrepli_h(0x3f));
+        // t3 = [000a|aaaa|00bb|bbbb]
+        const __m256i t3 = __lasx_xvor_v(t1, t2);
+        // t4 = [110a|aaaa|10bb|bbbb]
+        const __m256i t4 = __lasx_xvor_v(t3, v_c080);
+        // 2. merge ASCII and 2-byte codewords
+        __m256i one_byte_bytemask =
+            __lasx_xvsle_hu(utf16_packed, __lasx_xvrepli_h(0x7F /*0x007F*/));
+        __m256i utf8_unpacked =
+            __lasx_xvbitsel_v(t4, utf16_packed, one_byte_bytemask);
+        // 3. prepare bitmask for 8-bit lookup
+        __m256i mask = __lasx_xvmskltz_h(one_byte_bytemask);
+        uint32_t m1 = __lasx_xvpickve2gr_wu(mask, 0);
+        uint32_t m2 = __lasx_xvpickve2gr_wu(mask, 4);
+        // 4. pack the bytes
+        const uint8_t *row1 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                [lasx_1_2_utf8_bytes_mask[m1]][0];
+        __m128i shuffle1 = __lsx_vld(row1, 1);
+        __m128i utf8_packed1 = __lsx_vshuf_b(
+            zero_128, lasx_extracti128_lo(utf8_unpacked), shuffle1);
+
+        const uint8_t *row2 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_utf8_bytes
+                [lasx_1_2_utf8_bytes_mask[m2]][0];
+        __m128i shuffle2 = __lsx_vld(row2, 1);
+        __m128i utf8_packed2 = __lsx_vshuf_b(
+            zero_128, lasx_extracti128_hi(utf8_unpacked), shuffle2);
+        // 5. store bytes
+        __lsx_vst(utf8_packed1, utf8_output, 0);
+        utf8_output += row1[0];
 
-      // 4. expand code units 16-bit => 32-bit
-      const __m128i out0 = _mm_unpacklo_epi16(t2, s4);
-      const __m128i out1 = _mm_unpackhi_epi16(t2, s4);
+        __lsx_vst(utf8_packed2, utf8_output, 0);
+        utf8_output += row2[0];
 
-      // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
-      const uint16_t mask =
-          (one_byte_bitmask & 0x5555) | (one_or_two_bytes_bitmask & 0xaaaa);
-      if (mask == 0) {
-        // We only have three-byte code units. Use fast path.
-        const __m128i shuffle = _mm_setr_epi8(2, 3, 1, 6, 7, 5, 10, 11, 9, 14,
-                                              15, 13, -1, -1, -1, -1);
-        const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle);
-        const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle);
-        _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
-        utf8_output += 12;
-        _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
-        utf8_output += 12;
-        buf += 8;
+        buf += 16;
         continue;
-      }
-      const uint8_t mask0 = uint8_t(mask);
+      } else {
+        // case: code units from register produce either 1, 2 or 3 UTF-8 bytes
+        forbidden_bytemask = __lasx_xvor_v(
+            __lasx_xvand_v(
+                __lasx_xvsle_h(utf16_packed, v_dfff),  // utf16_packed <= 0xdfff
+                __lasx_xvsle_h(v_d800, utf16_packed)), // utf16_packed >= 0xd800
+            forbidden_bytemask);
+        if (__lasx_xbnz_v(forbidden_bytemask)) {
+          return std::make_pair(result(error_code::SURROGATE, buf - start),
+                                reinterpret_cast<char *>(utf8_output));
+        }
+        /* In this branch we handle three cases:
+            1. [0000|0000|0ccc|cccc] => [0ccc|cccc]                           -
+           single UTF-8 byte
+            2. [0000|0bbb|bbcc|cccc] => [110b|bbbb], [10cc|cccc]              -
+           two UTF-8 bytes
+            3. [aaaa|bbbb|bbcc|cccc] => [1110|aaaa], [10bb|bbbb], [10cc|cccc] -
+           three UTF-8 bytes
+
+            We expand the input word (16-bit) into two code units (32-bit), thus
+            we have room for four bytes. However, we need five distinct bit
+            layouts. Note that the last byte in cases #2 and #3 is the same.
+
+            We precompute byte 1 for case #1 and the common byte for cases #2 &
+           #3 in register t2.
+
+            We precompute byte 1 for case #3 and -- **conditionally** --
+           precompute either byte 1 for case #2 or byte 2 for case #3. Note that
+           they differ by exactly one bit.
+
+            Finally from these two code units we build proper UTF-8 sequence,
+           taking into account the case (i.e., the number of bytes to write).
+        */
+        /**
+         * Given [aaaa|bbbb|bbcc|cccc] our goal is to produce:
+         * t2 => [0ccc|cccc] [10cc|cccc]
+         * s4 => [1110|aaaa] ([110b|bbbb] OR [10bb|bbbb])
+         */
+        // [aaaa|bbbb|bbcc|cccc] => [bbcc|cccc|bbcc|cccc]
+        __m256i t0 = __lasx_xvpickev_b(utf16_packed, utf16_packed);
+        t0 = __lasx_xvilvl_b(t0, t0);
+        // [bbcc|cccc|bbcc|cccc] => [00cc|cccc|0bcc|cccc]
+        __m256i v_3f7f = __lasx_xvreplgr2vr_h(uint16_t(0x3F7F));
+        __m256i t1 = __lasx_xvand_v(t0, v_3f7f);
+        // [00cc|cccc|0bcc|cccc] => [10cc|cccc|0bcc|cccc]
+        __m256i t2 = __lasx_xvor_v(t1, __lasx_xvldi(-2688 /*0x8000*/));
 
-      const uint8_t *row0 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask0][0];
-      const __m128i shuffle0 = _mm_loadu_si128((__m128i *)(row0 + 1));
-      const __m128i utf8_0 = _mm_shuffle_epi8(out0, shuffle0);
+        // s0: [aaaa|bbbb|bbcc|cccc] => [0000|0000|0000|aaaa]
+        __m256i s0 = __lasx_xvsrli_h(utf16_packed, 12);
+        // s1: [aaaa|bbbb|bbcc|cccc] => [0000|bbbb|bb00|0000]
+        __m256i s1 = __lasx_xvslli_h(utf16_packed, 2);
+        // [0000|bbbb|bb00|0000] => [00bb|bbbb|0000|0000]
+        s1 = __lasx_xvand_v(s1, __lasx_xvldi(-2753 /*0x3F00*/));
+        // [00bb|bbbb|0000|aaaa]
+        __m256i s2 = __lasx_xvor_v(s0, s1);
+        // s3: [00bb|bbbb|0000|aaaa] => [11bb|bbbb|1110|aaaa]
+        __m256i v_c0e0 = __lasx_xvreplgr2vr_h(uint16_t(0xC0E0));
+        __m256i s3 = __lasx_xvor_v(s2, v_c0e0);
+        // v_07ff (0x07FF) is defined before the loop.
+        __m256i one_or_two_bytes_bytemask =
+            __lasx_xvsle_hu(utf16_packed, v_07ff);
+        __m256i m0 = __lasx_xvandn_v(one_or_two_bytes_bytemask,
+                                     __lasx_xvldi(-2752 /*0x4000*/));
+        __m256i s4 = __lasx_xvxor_v(s3, m0);
 
-      const uint8_t mask1 = static_cast<uint8_t>(mask >> 8);
+        // 4. expand code units 16-bit => 32-bit
+        __m256i out0 = __lasx_xvilvl_h(s4, t2);
+        __m256i out1 = __lasx_xvilvh_h(s4, t2);
 
-      const uint8_t *row1 =
-          &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask1][0];
-      const __m128i shuffle1 = _mm_loadu_si128((__m128i *)(row1 + 1));
-      const __m128i utf8_1 = _mm_shuffle_epi8(out1, shuffle1);
+        // 5. compress 32-bit code units into 1, 2 or 3 bytes -- 2 x shuffle
+        __m256i one_byte_bytemask =
+            __lasx_xvsle_hu(utf16_packed, __lasx_xvrepli_h(0x7F));
+
+        __m256i one_or_two_bytes_bytemask_u16_to_u32_low =
+            __lasx_xvilvl_h(one_or_two_bytes_bytemask, zero);
+        __m256i one_or_two_bytes_bytemask_u16_to_u32_high =
+            __lasx_xvilvh_h(one_or_two_bytes_bytemask, zero);
+
+        __m256i one_byte_bytemask_u16_to_u32_low =
+            __lasx_xvilvl_h(one_byte_bytemask, one_byte_bytemask);
+        __m256i one_byte_bytemask_u16_to_u32_high =
+            __lasx_xvilvh_h(one_byte_bytemask, one_byte_bytemask);
+
+        __m256i mask0 = __lasx_xvmskltz_h(
+            __lasx_xvor_v(one_or_two_bytes_bytemask_u16_to_u32_low,
+                          one_byte_bytemask_u16_to_u32_low));
+        __m256i mask1 = __lasx_xvmskltz_h(
+            __lasx_xvor_v(one_or_two_bytes_bytemask_u16_to_u32_high,
+                          one_byte_bytemask_u16_to_u32_high));
+
+        uint32_t mask = __lasx_xvpickve2gr_wu(mask0, 0);
+        const uint8_t *row0 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                  [0];
+        __m128i shuffle0 = __lsx_vld(row0, 1);
+        __m128i utf8_0 =
+            __lsx_vshuf_b(zero_128, lasx_extracti128_lo(out0), shuffle0);
+        __lsx_vst(utf8_0, utf8_output, 0);
+        utf8_output += row0[0];
 
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_0);
-      utf8_output += row0[0];
-      _mm_storeu_si128((__m128i *)utf8_output, utf8_1);
-      utf8_output += row1[0];
+        mask = __lasx_xvpickve2gr_wu(mask1, 0);
+        const uint8_t *row1 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                  [0];
+        __m128i shuffle1 = __lsx_vld(row1, 1);
+        __m128i utf8_1 =
+            __lsx_vshuf_b(zero_128, lasx_extracti128_lo(out1), shuffle1);
+        __lsx_vst(utf8_1, utf8_output, 0);
+        utf8_output += row1[0];
 
-      buf += 8;
+        mask = __lasx_xvpickve2gr_wu(mask0, 4);
+        const uint8_t *row2 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                  [0];
+        __m128i shuffle2 = __lsx_vld(row2, 1);
+        __m128i utf8_2 =
+            __lsx_vshuf_b(zero_128, lasx_extracti128_hi(out0), shuffle2);
+        __lsx_vst(utf8_2, utf8_output, 0);
+        utf8_output += row2[0];
+
+        mask = __lasx_xvpickve2gr_wu(mask1, 4);
+        const uint8_t *row3 =
+            &simdutf::tables::utf16_to_utf8::pack_1_2_3_utf8_bytes[mask & 0xFF]
+                                                                  [0];
+        __m128i shuffle3 = __lsx_vld(row3, 1);
+        __m128i utf8_3 =
+            __lsx_vshuf_b(zero_128, lasx_extracti128_hi(out1), shuffle3);
+        __lsx_vst(utf8_3, utf8_output, 0);
+        utf8_output += row3[0];
+
+        buf += 16;
+      }
+      // At least one 32-bit word will produce a surrogate pair in UTF-16 <=>
+      // will produce four UTF-8 bytes.
     } else {
-      // case: at least one 32-bit word produce a surrogate pair in UTF-16 <=>
-      // will produce four UTF-8 bytes Let us do a scalar fallback. It may seem
-      // wasteful to use scalar code, but being efficient with SIMD in the
-      // presence of surrogate pairs may require non-trivial tables.
+      // Let us do a scalar fallback.
+      // It may seem wasteful to use scalar code, but being efficient with SIMD
+      // in the presence of surrogate pairs may require non-trivial tables.
       size_t forward = 15;
       size_t k = 0;
       if (size_t(end - buf) < forward + 1) {
@@ -37584,7 +52580,8 @@ sse_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
         } else if ((word & 0xFFFF0000) == 0) {
           if (word >= 0xD800 && word <= 0xDFFF) {
             return std::make_pair(
-                result(error_code::SURROGATE, buf - start + k), utf8_output);
+                result(error_code::SURROGATE, buf - start + k),
+                reinterpret_cast<char *>(utf8_output));
           }
           *utf8_output++ = char((word >> 12) | 0b11100000);
           *utf8_output++ = char(((word >> 6) & 0b111111) | 0b10000000);
@@ -37592,7 +52589,8 @@ sse_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
         } else {
           if (word > 0x10FFFF) {
             return std::make_pair(
-                result(error_code::TOO_LARGE, buf - start + k), utf8_output);
+                result(error_code::TOO_LARGE, buf - start + k),
+                reinterpret_cast<char *>(utf8_output));
           }
           *utf8_output++ = char((word >> 18) | 0b11110000);
           *utf8_output++ = char(((word >> 12) & 0b111111) | 0b10000000);
@@ -37603,51 +52601,76 @@ sse_convert_utf32_to_utf8_with_errors(const char32_t *buf, size_t len,
       buf += k;
     }
   } // while
-  return std::make_pair(result(error_code::SUCCESS, buf - start), utf8_output);
+
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char *>(utf8_output));
 }
-/* end file src/westmere/sse_convert_utf32_to_utf8.cpp */
-/* begin file src/westmere/sse_convert_utf32_to_utf16.cpp */
+/* end file src/lasx/lasx_convert_utf32_to_utf8.cpp */
+/* begin file src/lasx/lasx_convert_utf32_to_utf16.cpp */
 template <endianness big_endian>
 std::pair<const char32_t *, char16_t *>
-sse_convert_utf32_to_utf16(const char32_t *buf, size_t len,
-                           char16_t *utf16_output) {
-
+lasx_convert_utf32_to_utf16(const char32_t *buf, size_t len,
+                            char16_t *utf16_out) {
+  uint16_t *utf16_output = reinterpret_cast<uint16_t *>(utf16_out);
   const char32_t *end = buf + len;
 
-  const __m128i v_0000 = _mm_setzero_si128();
-  const __m128i v_ffff0000 = _mm_set1_epi32((int32_t)0xffff0000);
-  __m128i forbidden_bytemask = _mm_setzero_si128();
+  // Align the output to 32 bytes first: unaligned stores degrade performance
+  while (((uint64_t)utf16_output & 0x1F) && buf < end) {
+    uint32_t word = *buf++;
+    if ((word & 0xFFFF0000) == 0) {
+      // will not generate a surrogate pair
+      if (word >= 0xD800 && word <= 0xDFFF) {
+        return std::make_pair(nullptr,
+                              reinterpret_cast<char16_t *>(utf16_output));
+      }
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(word >> 8 | word << 8)
+                            : char16_t(word);
+      // buf++;
+    } else {
+      // will generate a surrogate pair
+      if (word > 0x10FFFF) {
+        return std::make_pair(nullptr,
+                              reinterpret_cast<char16_t *>(utf16_output));
+      }
+      word -= 0x10000;
+      uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
+      uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
+      if (!match_system(big_endian)) {
+        high_surrogate = uint16_t(high_surrogate >> 8 | high_surrogate << 8);
+        low_surrogate = uint16_t(low_surrogate << 8 | low_surrogate >> 8);
+      }
+      *utf16_output++ = char16_t(high_surrogate);
+      *utf16_output++ = char16_t(low_surrogate);
+      // buf++;
+    }
+  }
 
-  while (end - buf >= 8) {
-    __m128i in = _mm_loadu_si128((__m128i *)buf);
-    __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
-    const __m128i saturation_bytemask = _mm_cmpeq_epi32(
-        _mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask =
-        static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
+  __m256i forbidden_bytemask = __lasx_xvrepli_h(0);
+  __m256i v_d800 = __lasx_xvldi(-2600); /*0xD800*/
+  __m256i v_dfff = __lasx_xvreplgr2vr_h(uint16_t(0xdfff));
+  while (buf + 16 <= end) {
+    __m256i in0 = __lasx_xvld(reinterpret_cast<const uint32_t *>(buf), 0);
+    __m256i in1 = __lasx_xvld(reinterpret_cast<const uint32_t *>(buf), 32);
 
     // Check if no bits set above 16th
-    if (saturation_bitmask == 0xffff) {
-      // Pack UTF-32 to UTF-16
-      __m128i utf16_packed = _mm_packus_epi32(in, nextin);
-
-      const __m128i v_f800 = _mm_set1_epi16((uint16_t)0xf800);
-      const __m128i v_d800 = _mm_set1_epi16((uint16_t)0xd800);
-      forbidden_bytemask = _mm_or_si128(
-          forbidden_bytemask,
-          _mm_cmpeq_epi16(_mm_and_si128(utf16_packed, v_f800), v_d800));
+    if (__lasx_xbz_v(__lasx_xvpickod_h(in1, in0))) {
+      __m256i utf16_packed =
+          __lasx_xvpermi_d(__lasx_xvpickev_h(in1, in0), 0b11011000);
+      forbidden_bytemask = __lasx_xvor_v(
+          __lasx_xvand_v(
+              __lasx_xvsle_h(utf16_packed, v_dfff),  // utf16_packed <= 0xdfff
+              __lasx_xvsle_h(v_d800, utf16_packed)), // utf16_packed >= 0xd800
+          forbidden_bytemask);
 
-      if (big_endian) {
-        const __m128i swap =
-            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-        utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
+      if (!match_system(big_endian)) {
+        utf16_packed = lasx_swap_bytes(utf16_packed);
       }
-
-      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
-      utf16_output += 8;
-      buf += 8;
+      __lasx_xvst(utf16_packed, utf16_output, 0);
+      utf16_output += 16;
+      buf += 16;
     } else {
-      size_t forward = 7;
+      size_t forward = 15;
       size_t k = 0;
       if (size_t(end - buf) < forward + 1) {
         forward = size_t(end - buf - 1);
@@ -37657,25 +52680,25 @@ sse_convert_utf32_to_utf16(const char32_t *buf, size_t len,
         if ((word & 0xFFFF0000) == 0) {
           // will not generate a surrogate pair
           if (word >= 0xD800 && word <= 0xDFFF) {
-            return std::make_pair(nullptr, utf16_output);
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char16_t *>(utf16_output));
           }
-          *utf16_output++ =
-              big_endian
-                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
-                  : char16_t(word);
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(word >> 8 | word << 8)
+                                : char16_t(word);
         } else {
           // will generate a surrogate pair
           if (word > 0x10FFFF) {
-            return std::make_pair(nullptr, utf16_output);
+            return std::make_pair(nullptr,
+                                  reinterpret_cast<char16_t *>(utf16_output));
           }
           word -= 0x10000;
           uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
           uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
-          if (big_endian) {
+          if (!match_system(big_endian)) {
             high_surrogate =
-                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
-            low_surrogate =
-                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
+                uint16_t(high_surrogate >> 8 | high_surrogate << 8);
+            low_surrogate = uint16_t(low_surrogate << 8 | low_surrogate >> 8);
           }
           *utf16_output++ = char16_t(high_surrogate);
           *utf16_output++ = char16_t(low_surrogate);
@@ -37686,56 +52709,80 @@ sse_convert_utf32_to_utf16(const char32_t *buf, size_t len,
   }
 
   // check for invalid input
-  if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) {
-    return std::make_pair(nullptr, utf16_output);
+  if (__lasx_xbnz_v(forbidden_bytemask)) {
+    return std::make_pair(nullptr, reinterpret_cast<char16_t *>(utf16_output));
   }
-
-  return std::make_pair(buf, utf16_output);
+  return std::make_pair(buf, reinterpret_cast<char16_t *>(utf16_output));
 }
 
 template <endianness big_endian>
 std::pair<result, char16_t *>
-sse_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
-                                       char16_t *utf16_output) {
+lasx_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
+                                        char16_t *utf16_out) {
+  uint16_t *utf16_output = reinterpret_cast<uint16_t *>(utf16_out);
   const char32_t *start = buf;
   const char32_t *end = buf + len;
 
-  const __m128i v_0000 = _mm_setzero_si128();
-  const __m128i v_ffff0000 = _mm_set1_epi32((int32_t)0xffff0000);
+  // Scalar loop until output is 32-byte aligned; unaligned stores are slow
+  while (((uint64_t)utf16_output & 0x1F) && buf < end) {
+    uint32_t word = *buf++;
+    if ((word & 0xFFFF0000) == 0) {
+      // will not generate a surrogate pair
+      if (word >= 0xD800 && word <= 0xDFFF) {
+        return std::make_pair(result(error_code::SURROGATE, buf - start - 1),
+                              reinterpret_cast<char16_t *>(utf16_output));
+      }
+      *utf16_output++ = !match_system(big_endian)
+                            ? char16_t(word >> 8 | word << 8)
+                            : char16_t(word);
+    } else {
+      // will generate a surrogate pair
+      if (word > 0x10FFFF) {
+        return std::make_pair(result(error_code::TOO_LARGE, buf - start - 1),
+                              reinterpret_cast<char16_t *>(utf16_output));
+      }
+      word -= 0x10000;
+      uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
+      uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
+      if (!match_system(big_endian)) {
+        high_surrogate = uint16_t(high_surrogate >> 8 | high_surrogate << 8);
+        low_surrogate = uint16_t(low_surrogate << 8 | low_surrogate >> 8);
+      }
+      *utf16_output++ = char16_t(high_surrogate);
+      *utf16_output++ = char16_t(low_surrogate);
+    }
+  }
 
-  while (end - buf >= 8) {
-    __m128i in = _mm_loadu_si128((__m128i *)buf);
-    __m128i nextin = _mm_loadu_si128((__m128i *)buf + 1);
-    const __m128i saturation_bytemask = _mm_cmpeq_epi32(
-        _mm_and_si128(_mm_or_si128(in, nextin), v_ffff0000), v_0000);
-    const uint32_t saturation_bitmask =
-        static_cast<uint32_t>(_mm_movemask_epi8(saturation_bytemask));
+  __m256i forbidden_bytemask = __lasx_xvrepli_h(0);
+  __m256i v_d800 = __lasx_xvldi(-2600); /*0xD800*/
+  __m256i v_dfff = __lasx_xvreplgr2vr_h(uint16_t(0xdfff));
+  while (buf + 16 <= end) {
+    __m256i in0 = __lasx_xvld(reinterpret_cast<const uint32_t *>(buf), 0);
+    __m256i in1 = __lasx_xvld(reinterpret_cast<const uint32_t *>(buf), 32);
 
     // Check if no bits set above 16th
-    if (saturation_bitmask == 0xffff) {
-      // Pack UTF-32 to UTF-16
-      __m128i utf16_packed = _mm_packus_epi32(in, nextin);
-
-      const __m128i v_f800 = _mm_set1_epi16((uint16_t)0xf800);
-      const __m128i v_d800 = _mm_set1_epi16((uint16_t)0xd800);
-      const __m128i forbidden_bytemask =
-          _mm_cmpeq_epi16(_mm_and_si128(utf16_packed, v_f800), v_d800);
-      if (static_cast<uint32_t>(_mm_movemask_epi8(forbidden_bytemask)) != 0) {
+    if (__lasx_xbz_v(__lasx_xvpickod_h(in1, in0))) {
+      __m256i utf16_packed =
+          __lasx_xvpermi_d(__lasx_xvpickev_h(in1, in0), 0b11011000);
+      forbidden_bytemask = __lasx_xvor_v(
+          __lasx_xvand_v(
+              __lasx_xvsle_h(utf16_packed, v_dfff),  // utf16_packed <= 0xdfff
+              __lasx_xvsle_h(v_d800, utf16_packed)), // utf16_packed >= 0xd800
+          forbidden_bytemask);
+      if (__lasx_xbnz_v(forbidden_bytemask)) {
         return std::make_pair(result(error_code::SURROGATE, buf - start),
-                              utf16_output);
+                              reinterpret_cast<char16_t *>(utf16_output));
       }
 
-      if (big_endian) {
-        const __m128i swap =
-            _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
-        utf16_packed = _mm_shuffle_epi8(utf16_packed, swap);
+      if (!match_system(big_endian)) {
+        utf16_packed = lasx_swap_bytes(utf16_packed);
       }
 
-      _mm_storeu_si128((__m128i *)utf16_output, utf16_packed);
-      utf16_output += 8;
-      buf += 8;
+      __lasx_xvst(utf16_packed, utf16_output, 0);
+      utf16_output += 16;
+      buf += 16;
     } else {
-      size_t forward = 7;
+      size_t forward = 15;
       size_t k = 0;
       if (size_t(end - buf) < forward + 1) {
         forward = size_t(end - buf - 1);
@@ -37746,26 +52793,26 @@ sse_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
           // will not generate a surrogate pair
           if (word >= 0xD800 && word <= 0xDFFF) {
             return std::make_pair(
-                result(error_code::SURROGATE, buf - start + k), utf16_output);
+                result(error_code::SURROGATE, buf - start + k),
+                reinterpret_cast<char16_t *>(utf16_output));
           }
-          *utf16_output++ =
-              big_endian
-                  ? char16_t((uint16_t(word) >> 8) | (uint16_t(word) << 8))
-                  : char16_t(word);
+          *utf16_output++ = !match_system(big_endian)
+                                ? char16_t(word >> 8 | word << 8)
+                                : char16_t(word);
         } else {
           // will generate a surrogate pair
           if (word > 0x10FFFF) {
             return std::make_pair(
-                result(error_code::TOO_LARGE, buf - start + k), utf16_output);
+                result(error_code::TOO_LARGE, buf - start + k),
+                reinterpret_cast<char16_t *>(utf16_output));
           }
           word -= 0x10000;
           uint16_t high_surrogate = uint16_t(0xD800 + (word >> 10));
           uint16_t low_surrogate = uint16_t(0xDC00 + (word & 0x3FF));
-          if (big_endian) {
+          if (!match_system(big_endian)) {
             high_surrogate =
-                uint16_t((high_surrogate >> 8) | (high_surrogate << 8));
-            low_surrogate =
-                uint16_t((low_surrogate >> 8) | (low_surrogate << 8));
+                uint16_t(high_surrogate >> 8 | high_surrogate << 8);
+            low_surrogate = uint16_t(low_surrogate << 8 | low_surrogate >> 8);
           }
           *utf16_output++ = char16_t(high_surrogate);
           *utf16_output++ = char16_t(low_surrogate);
@@ -37775,10 +52822,11 @@ sse_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
     }
   }
 
-  return std::make_pair(result(error_code::SUCCESS, buf - start), utf16_output);
+  return std::make_pair(result(error_code::SUCCESS, buf - start),
+                        reinterpret_cast<char16_t *>(utf16_output));
 }
-/* end file src/westmere/sse_convert_utf32_to_utf16.cpp */
-/* begin file src/westmere/sse_base64.cpp */
+/* end file src/lasx/lasx_convert_utf32_to_utf16.cpp */
+/* begin file src/lasx/lasx_base64.cpp */
 /**
  * References and further reading:
  *
@@ -37806,36 +52854,6 @@ sse_convert_utf32_to_utf16_with_errors(const char32_t *buf, size_t len,
  * Nick Kopp. 2013. Base64 Encoding on a GPU.
  * https://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU. (2013).
  */
-template <bool base64_url> __m128i lookup_pshufb_improved(const __m128i input) {
-  // credit: Wojciech Muła
-  // reduce  0..51 -> 0
-  //        52..61 -> 1 .. 10
-  //            62 -> 11
-  //            63 -> 12
-  __m128i result = _mm_subs_epu8(input, _mm_set1_epi8(51));
-
-  // distinguish between ranges 0..25 and 26..51:
-  //         0 .. 25 -> remains 0
-  //        26 .. 51 -> becomes 13
-  const __m128i less = _mm_cmpgt_epi8(_mm_set1_epi8(26), input);
-  result = _mm_or_si128(result, _mm_and_si128(less, _mm_set1_epi8(13)));
-
-  __m128i shift_LUT;
-  if (base64_url) {
-    shift_LUT = _mm_setr_epi8('a' - 26, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
-                              '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
-                              '0' - 52, '-' - 62, '_' - 63, 'A', 0, 0);
-  } else {
-    shift_LUT = _mm_setr_epi8('a' - 26, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
-                              '0' - 52, '0' - 52, '0' - 52, '0' - 52, '0' - 52,
-                              '0' - 52, '+' - 62, '/' - 63, 'A', 0, 0);
-  }
-
-  // read shift
-  result = _mm_shuffle_epi8(shift_LUT, result);
-
-  return _mm_add_epi8(result, input);
-}
 
 template <bool isbase64url>
 size_t encode_base64(char *dst, const char *src, size_t srclen,
@@ -37843,71 +52861,124 @@ size_t encode_base64(char *dst, const char *src, size_t srclen,
   // credit: Wojciech Muła
   // SSE (lookup: pshufb improved unrolled)
   const uint8_t *input = (const uint8_t *)src;
-
+  static const char *lookup_tbl =
+      isbase64url
+          ? "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
+          : "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
   uint8_t *out = (uint8_t *)dst;
-  const __m128i shuf =
-      _mm_set_epi8(10, 11, 9, 10, 7, 8, 6, 7, 4, 5, 3, 4, 1, 2, 0, 1);
 
+  v32u8 shuf;
+  __m256i v_fc0fc00, v_3f03f0, shift_r, shift_l, base64_tbl0, base64_tbl1,
+      base64_tbl2, base64_tbl3;
+  if (srclen >= 28) {
+    shuf = v32u8{1, 0, 2, 1, 4, 3, 5, 4, 7, 6, 8, 7, 10, 9, 11, 10,
+                 1, 0, 2, 1, 4, 3, 5, 4, 7, 6, 8, 7, 10, 9, 11, 10};
+
+    v_fc0fc00 = __lasx_xvreplgr2vr_w(uint32_t(0x0fc0fc00));
+    v_3f03f0 = __lasx_xvreplgr2vr_w(uint32_t(0x003f03f0));
+    shift_r = __lasx_xvreplgr2vr_w(uint32_t(0x0006000a));
+    shift_l = __lasx_xvreplgr2vr_w(uint32_t(0x00080004));
+    base64_tbl0 = ____m256i(__lsx_vld(lookup_tbl, 0));
+    base64_tbl1 = ____m256i(__lsx_vld(lookup_tbl, 16));
+    base64_tbl2 = ____m256i(__lsx_vld(lookup_tbl, 32));
+    base64_tbl3 = ____m256i(__lsx_vld(lookup_tbl, 48));
+  }
   size_t i = 0;
-  for (; i + 52 <= srclen; i += 48) {
-    __m128i in0 = _mm_loadu_si128(
-        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 0));
-    __m128i in1 = _mm_loadu_si128(
-        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 1));
-    __m128i in2 = _mm_loadu_si128(
-        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 2));
-    __m128i in3 = _mm_loadu_si128(
-        reinterpret_cast<const __m128i *>(input + i + 4 * 3 * 3));
-
-    in0 = _mm_shuffle_epi8(in0, shuf);
-    in1 = _mm_shuffle_epi8(in1, shuf);
-    in2 = _mm_shuffle_epi8(in2, shuf);
-    in3 = _mm_shuffle_epi8(in3, shuf);
-
-    const __m128i t0_0 = _mm_and_si128(in0, _mm_set1_epi32(0x0fc0fc00));
-    const __m128i t0_1 = _mm_and_si128(in1, _mm_set1_epi32(0x0fc0fc00));
-    const __m128i t0_2 = _mm_and_si128(in2, _mm_set1_epi32(0x0fc0fc00));
-    const __m128i t0_3 = _mm_and_si128(in3, _mm_set1_epi32(0x0fc0fc00));
-
-    const __m128i t1_0 = _mm_mulhi_epu16(t0_0, _mm_set1_epi32(0x04000040));
-    const __m128i t1_1 = _mm_mulhi_epu16(t0_1, _mm_set1_epi32(0x04000040));
-    const __m128i t1_2 = _mm_mulhi_epu16(t0_2, _mm_set1_epi32(0x04000040));
-    const __m128i t1_3 = _mm_mulhi_epu16(t0_3, _mm_set1_epi32(0x04000040));
-
-    const __m128i t2_0 = _mm_and_si128(in0, _mm_set1_epi32(0x003f03f0));
-    const __m128i t2_1 = _mm_and_si128(in1, _mm_set1_epi32(0x003f03f0));
-    const __m128i t2_2 = _mm_and_si128(in2, _mm_set1_epi32(0x003f03f0));
-    const __m128i t2_3 = _mm_and_si128(in3, _mm_set1_epi32(0x003f03f0));
-
-    const __m128i t3_0 = _mm_mullo_epi16(t2_0, _mm_set1_epi32(0x01000010));
-    const __m128i t3_1 = _mm_mullo_epi16(t2_1, _mm_set1_epi32(0x01000010));
-    const __m128i t3_2 = _mm_mullo_epi16(t2_2, _mm_set1_epi32(0x01000010));
-    const __m128i t3_3 = _mm_mullo_epi16(t2_3, _mm_set1_epi32(0x01000010));
-
-    const __m128i input0 = _mm_or_si128(t1_0, t3_0);
-    const __m128i input1 = _mm_or_si128(t1_1, t3_1);
-    const __m128i input2 = _mm_or_si128(t1_2, t3_2);
-    const __m128i input3 = _mm_or_si128(t1_3, t3_3);
-
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(out),
-                     lookup_pshufb_improved<isbase64url>(input0));
-    out += 16;
+  for (; i + 100 <= srclen; i += 96) {
+    __m128i in0_lo =
+        __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 4 * 3 * 0);
+    __m128i in0_hi =
+        __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 4 * 3 * 1);
+    __m128i in1_lo =
+        __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 4 * 3 * 2);
+    __m128i in1_hi =
+        __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 4 * 3 * 3);
+    __m128i in2_lo =
+        __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 4 * 3 * 4);
+    __m128i in2_hi =
+        __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 4 * 3 * 5);
+    __m128i in3_lo =
+        __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 4 * 3 * 6);
+    __m128i in3_hi =
+        __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 4 * 3 * 7);
+
+    __m256i in0 = lasx_set_q(in0_hi, in0_lo);
+    __m256i in1 = lasx_set_q(in1_hi, in1_lo);
+    __m256i in2 = lasx_set_q(in2_hi, in2_lo);
+    __m256i in3 = lasx_set_q(in3_hi, in3_lo);
+
+    in0 = __lasx_xvshuf_b(in0, in0, (__m256i)shuf);
+    in1 = __lasx_xvshuf_b(in1, in1, (__m256i)shuf);
+    in2 = __lasx_xvshuf_b(in2, in2, (__m256i)shuf);
+    in3 = __lasx_xvshuf_b(in3, in3, (__m256i)shuf);
+
+    __m256i t0_0 = __lasx_xvand_v(in0, v_fc0fc00);
+    __m256i t0_1 = __lasx_xvand_v(in1, v_fc0fc00);
+    __m256i t0_2 = __lasx_xvand_v(in2, v_fc0fc00);
+    __m256i t0_3 = __lasx_xvand_v(in3, v_fc0fc00);
+
+    __m256i t1_0 = __lasx_xvsrl_h(t0_0, shift_r);
+    __m256i t1_1 = __lasx_xvsrl_h(t0_1, shift_r);
+    __m256i t1_2 = __lasx_xvsrl_h(t0_2, shift_r);
+    __m256i t1_3 = __lasx_xvsrl_h(t0_3, shift_r);
+
+    __m256i t2_0 = __lasx_xvand_v(in0, v_3f03f0);
+    __m256i t2_1 = __lasx_xvand_v(in1, v_3f03f0);
+    __m256i t2_2 = __lasx_xvand_v(in2, v_3f03f0);
+    __m256i t2_3 = __lasx_xvand_v(in3, v_3f03f0);
+
+    __m256i t3_0 = __lasx_xvsll_h(t2_0, shift_l);
+    __m256i t3_1 = __lasx_xvsll_h(t2_1, shift_l);
+    __m256i t3_2 = __lasx_xvsll_h(t2_2, shift_l);
+    __m256i t3_3 = __lasx_xvsll_h(t2_3, shift_l);
+
+    __m256i input0 = __lasx_xvor_v(t1_0, t3_0);
+    __m256i input0_shuf0 = __lasx_xvshuf_b(base64_tbl1, base64_tbl0, input0);
+    __m256i input0_shuf1 = __lasx_xvshuf_b(
+        base64_tbl3, base64_tbl2, __lasx_xvsub_b(input0, __lasx_xvldi(32)));
+    __m256i input0_mask = __lasx_xvslei_bu(input0, 31);
+    __m256i input0_result =
+        __lasx_xvbitsel_v(input0_shuf1, input0_shuf0, input0_mask);
+    __lasx_xvst(input0_result, reinterpret_cast<__m256i *>(out), 0);
+    out += 32;
 
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(out),
-                     lookup_pshufb_improved<isbase64url>(input1));
-    out += 16;
+    __m256i input1 = __lasx_xvor_v(t1_1, t3_1);
+    __m256i input1_shuf0 = __lasx_xvshuf_b(base64_tbl1, base64_tbl0, input1);
+    __m256i input1_shuf1 = __lasx_xvshuf_b(
+        base64_tbl3, base64_tbl2, __lasx_xvsub_b(input1, __lasx_xvldi(32)));
+    __m256i input1_mask = __lasx_xvslei_bu(input1, 31);
+    __m256i input1_result =
+        __lasx_xvbitsel_v(input1_shuf1, input1_shuf0, input1_mask);
+    __lasx_xvst(input1_result, reinterpret_cast<__m256i *>(out), 0);
+    out += 32;
 
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(out),
-                     lookup_pshufb_improved<isbase64url>(input2));
-    out += 16;
+    __m256i input2 = __lasx_xvor_v(t1_2, t3_2);
+    __m256i input2_shuf0 = __lasx_xvshuf_b(base64_tbl1, base64_tbl0, input2);
+    __m256i input2_shuf1 = __lasx_xvshuf_b(
+        base64_tbl3, base64_tbl2, __lasx_xvsub_b(input2, __lasx_xvldi(32)));
+    __m256i input2_mask = __lasx_xvslei_bu(input2, 31);
+    __m256i input2_result =
+        __lasx_xvbitsel_v(input2_shuf1, input2_shuf0, input2_mask);
+    __lasx_xvst(input2_result, reinterpret_cast<__m256i *>(out), 0);
+    out += 32;
 
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(out),
-                     lookup_pshufb_improved<isbase64url>(input3));
-    out += 16;
+    __m256i input3 = __lasx_xvor_v(t1_3, t3_3);
+    __m256i input3_shuf0 = __lasx_xvshuf_b(base64_tbl1, base64_tbl0, input3);
+    __m256i input3_shuf1 = __lasx_xvshuf_b(
+        base64_tbl3, base64_tbl2, __lasx_xvsub_b(input3, __lasx_xvldi(32)));
+    __m256i input3_mask = __lasx_xvslei_bu(input3, 31);
+    __m256i input3_result =
+        __lasx_xvbitsel_v(input3_shuf1, input3_shuf0, input3_mask);
+    __lasx_xvst(input3_result, reinterpret_cast<__m256i *>(out), 0);
+    out += 32;
   }
-  for (; i + 16 <= srclen; i += 12) {
+  for (; i + 28 <= srclen; i += 24) {
 
-    __m128i in = _mm_loadu_si128(reinterpret_cast<const __m128i *>(input + i));
+    __m128i in_lo = __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 0);
+    __m128i in_hi =
+        __lsx_vld(reinterpret_cast<const __m128i *>(input + i), 4 * 3 * 1);
+
+    __m256i in = lasx_set_q(in_hi, in_lo);
 
     // bytes from groups A, B and C are needed in separate 32-bit lanes
     // in = [DDDD|CCCC|BBBB|AAAA]
@@ -37921,40 +52992,43 @@ size_t encode_base64(char *dst, const char *src, size_t srclen,
     //      [bbbbcccc|ccdddddd|aaaaaabb|bbbbcccc]
     //           ^^^^ ^^^^^^^^ ^^^^^^^^ ^^^^
     //                  processed bits
-    in = _mm_shuffle_epi8(in, shuf);
+    in = __lasx_xvshuf_b(in, in, (__m256i)shuf);
 
     // unpacking
-
     // t0    = [0000cccc|cc000000|aaaaaa00|00000000]
-    const __m128i t0 = _mm_and_si128(in, _mm_set1_epi32(0x0fc0fc00));
+    __m256i t0 = __lasx_xvand_v(in, v_fc0fc00);
     // t1    = [00000000|00cccccc|00000000|00aaaaaa]
-    //          (c * (1 << 10), a * (1 << 6)) >> 16 (note: an unsigned
-    //          multiplication)
-    const __m128i t1 = _mm_mulhi_epu16(t0, _mm_set1_epi32(0x04000040));
+    //          ((c >> 6),  (a >> 10))
+    __m256i t1 = __lasx_xvsrl_h(t0, shift_r);
 
     // t2    = [00000000|00dddddd|000000bb|bbbb0000]
-    const __m128i t2 = _mm_and_si128(in, _mm_set1_epi32(0x003f03f0));
-    // t3    = [00dddddd|00000000|00bbbbbb|00000000](
-    //          (d * (1 << 8), b * (1 << 4))
-    const __m128i t3 = _mm_mullo_epi16(t2, _mm_set1_epi32(0x01000010));
+    __m256i t2 = __lasx_xvand_v(in, v_3f03f0);
+    // t3    = [00dddddd|00000000|00bbbbbb|00000000]
+    //          ((d << 8), (b << 4))
+    __m256i t3 = __lasx_xvsll_h(t2, shift_l);
 
     // res   = [00dddddd|00cccccc|00bbbbbb|00aaaaaa] = t1 | t3
-    const __m128i indices = _mm_or_si128(t1, t3);
-
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(out),
-                     lookup_pshufb_improved<isbase64url>(indices));
-    out += 16;
+    __m256i indices = __lasx_xvor_v(t1, t3);
+
+    __m256i indices_shuf0 = __lasx_xvshuf_b(base64_tbl1, base64_tbl0, indices);
+    __m256i indices_shuf1 = __lasx_xvshuf_b(
+        base64_tbl3, base64_tbl2, __lasx_xvsub_b(indices, __lasx_xvldi(32)));
+    __m256i indices_mask = __lasx_xvslei_bu(indices, 31);
+    __m256i indices_result =
+        __lasx_xvbitsel_v(indices_shuf1, indices_shuf0, indices_mask);
+    __lasx_xvst(indices_result, reinterpret_cast<__m256i *>(out), 0);
+    out += 32;
   }
 
   return i / 3 * 4 + scalar::base64::tail_encode_base64((char *)out, src + i,
                                                         srclen - i, options);
 }
+
 static inline void compress(__m128i data, uint16_t mask, char *output) {
   if (mask == 0) {
-    _mm_storeu_si128(reinterpret_cast<__m128i *>(output), data);
+    __lsx_vst(data, reinterpret_cast<__m128i *>(output), 0);
     return;
   }
-
   // this particular implementation was inspired by work done by @animetosho
   // we do it in two steps, first 8 bytes and then second 8 bytes
   uint8_t mask1 = uint8_t(mask);      // least significant 8 bits
@@ -37963,13 +53037,15 @@ static inline void compress(__m128i data, uint16_t mask, char *output) {
   // thintable_epi8[mask2] into a 128-bit register, using only
   // two instructions on most compilers.
 
-  __m128i shufmask = _mm_set_epi64x(tables::base64::thintable_epi8[mask2],
-                                    tables::base64::thintable_epi8[mask1]);
+  v2u64 shufmask = {tables::base64::thintable_epi8[mask1],
+                    tables::base64::thintable_epi8[mask2]};
+
   // we increment by 0x08 the second half of the mask
-  shufmask =
-      _mm_add_epi8(shufmask, _mm_set_epi32(0x08080808, 0x08080808, 0, 0));
+  const v4u32 hi = {0, 0, 0x08080808, 0x08080808};
+  __m128i shufmask1 = __lsx_vadd_b((__m128i)shufmask, (__m128i)hi);
+
   // this is the version "nearly pruned"
-  __m128i pruned = _mm_shuffle_epi8(data, shufmask);
+  __m128i pruned = __lsx_vshuf_b(data, data, shufmask1);
   // we still need to put the two halves together.
   // we compute the popcount of the first half:
   int pop1 = tables::base64::BitsSetTable256mul2[mask1];
@@ -37977,212 +53053,185 @@ static inline void compress(__m128i data, uint16_t mask, char *output) {
   // only the first pop1 bytes from the first 8 bytes, and then
   // it fills in with the bytes from the second 8 bytes + some filling
   // at the end.
-  __m128i compactmask = _mm_loadu_si128(reinterpret_cast<const __m128i *>(
-      tables::base64::pshufb_combine_table + pop1 * 8));
-  __m128i answer = _mm_shuffle_epi8(pruned, compactmask);
-  _mm_storeu_si128(reinterpret_cast<__m128i *>(output), answer);
+  __m128i compactmask =
+      __lsx_vld(reinterpret_cast<const __m128i *>(
+                    tables::base64::pshufb_combine_table + pop1 * 8),
+                0);
+  __m128i answer = __lsx_vshuf_b(pruned, pruned, compactmask);
+
+  __lsx_vst(answer, reinterpret_cast<__m128i *>(output), 0);
 }
 
 struct block64 {
-  __m128i chunks[4];
+  __m256i chunks[2];
 };
 
 template <bool base64_url>
-static inline uint16_t to_base64_mask(__m128i *src, uint32_t *error) {
-  const __m128i ascii_space_tbl =
-      _mm_setr_epi8(0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x9, 0xa, 0x0,
-                    0xc, 0xd, 0x0, 0x0);
+static inline uint32_t to_base64_mask(__m256i *src, bool *error) {
+  __m256i ascii_space_tbl =
+      ____m256i((__m128i)v16u8{0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+                               0x9, 0xa, 0x0, 0xc, 0xd, 0x0, 0x0});
   // credit: aqrit
-  __m128i delta_asso;
+  __m256i delta_asso =
+      ____m256i((__m128i)v16u8{0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x0, 0x0,
+                               0x0, 0x0, 0x0, 0xF, 0x0, 0xF});
+  __m256i delta_values;
   if (base64_url) {
-    delta_asso = _mm_setr_epi8(0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x0, 0x0,
-                               0x0, 0x0, 0x0, 0xF, 0x0, 0xF);
+    delta_values = ____m256i(
+        (__m128i)v16i8{int8_t(0x00), int8_t(0x00), int8_t(0x00), int8_t(0x13),
+                       int8_t(0x04), int8_t(0xBF), int8_t(0xBF), int8_t(0xB9),
+                       int8_t(0xB9), int8_t(0x00), int8_t(0x11), int8_t(0xC3),
+                       int8_t(0xBF), int8_t(0xE0), int8_t(0xB9), int8_t(0xB9)});
   } else {
-
-    delta_asso = _mm_setr_epi8(0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
-                               0x00, 0x00, 0x00, 0x00, 0x00, 0x0F, 0x00, 0x0F);
+    delta_values = ____m256i(
+        (__m128i)v16i8{int8_t(0x00), int8_t(0x00), int8_t(0x00), int8_t(0x13),
+                       int8_t(0x04), int8_t(0xBF), int8_t(0xBF), int8_t(0xB9),
+                       int8_t(0xB9), int8_t(0x00), int8_t(0x10), int8_t(0xC3),
+                       int8_t(0xBF), int8_t(0xBF), int8_t(0xB9), int8_t(0xB9)});
   }
-  __m128i delta_values;
-  if (base64_url) {
-    delta_values = _mm_setr_epi8(0x0, 0x0, 0x0, 0x13, 0x4, uint8_t(0xBF),
-                                 uint8_t(0xBF), uint8_t(0xB9), uint8_t(0xB9),
-                                 0x0, 0x11, uint8_t(0xC3), uint8_t(0xBF),
-                                 uint8_t(0xE0), uint8_t(0xB9), uint8_t(0xB9));
-  } else {
 
-    delta_values =
-        _mm_setr_epi8(int8_t(0x00), int8_t(0x00), int8_t(0x00), int8_t(0x13),
-                      int8_t(0x04), int8_t(0xBF), int8_t(0xBF), int8_t(0xB9),
-                      int8_t(0xB9), int8_t(0x00), int8_t(0x10), int8_t(0xC3),
-                      int8_t(0xBF), int8_t(0xBF), int8_t(0xB9), int8_t(0xB9));
-  }
-  __m128i check_asso;
+  __m256i check_asso;
   if (base64_url) {
-    check_asso = _mm_setr_epi8(0xD, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1,
-                               0x3, 0x7, 0xB, 0xE, 0xB, 0x6);
+    check_asso = ____m256i((__m128i)v16u8{0x0D, 0x01, 0x01, 0x01, 0x01, 0x01,
+                                          0x01, 0x01, 0x01, 0x01, 0x03, 0x07,
+                                          0x0B, 0x06, 0x0B, 0x12});
   } else {
-
-    check_asso = _mm_setr_epi8(0x0D, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
-                               0x01, 0x01, 0x03, 0x07, 0x0B, 0x0B, 0x0B, 0x0F);
+    check_asso = ____m256i((__m128i)v16u8{0x0D, 0x01, 0x01, 0x01, 0x01, 0x01,
+                                          0x01, 0x01, 0x01, 0x01, 0x03, 0x07,
+                                          0x0B, 0x0B, 0x0B, 0x0F});
   }
-  __m128i check_values;
+
+  __m256i check_values;
   if (base64_url) {
-    check_values = _mm_setr_epi8(uint8_t(0x80), uint8_t(0x80), uint8_t(0x80),
-                                 uint8_t(0x80), uint8_t(0xCF), uint8_t(0xBF),
-                                 uint8_t(0xB6), uint8_t(0xA6), uint8_t(0xB5),
-                                 uint8_t(0xA1), 0x0, uint8_t(0x80), 0x0,
-                                 uint8_t(0x80), 0x0, uint8_t(0x80));
+    check_values = ____m256i(
+        (__m128i)v16i8{int8_t(0x0), int8_t(0x80), int8_t(0x80), int8_t(0x80),
+                       int8_t(0xCF), int8_t(0xBF), int8_t(0xD3), int8_t(0xA6),
+                       int8_t(0xB5), int8_t(0x86), int8_t(0xD0), int8_t(0x80),
+                       int8_t(0xB0), int8_t(0x80), int8_t(0x0), int8_t(0x0)});
   } else {
-
-    check_values =
-        _mm_setr_epi8(int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0x80),
-                      int8_t(0xCF), int8_t(0xBF), int8_t(0xD5), int8_t(0xA6),
-                      int8_t(0xB5), int8_t(0x86), int8_t(0xD1), int8_t(0x80),
-                      int8_t(0xB1), int8_t(0x80), int8_t(0x91), int8_t(0x80));
-  }
-  const __m128i shifted = _mm_srli_epi32(*src, 3);
-
-  const __m128i delta_hash =
-      _mm_avg_epu8(_mm_shuffle_epi8(delta_asso, *src), shifted);
-  const __m128i check_hash =
-      _mm_avg_epu8(_mm_shuffle_epi8(check_asso, *src), shifted);
-
-  const __m128i out =
-      _mm_adds_epi8(_mm_shuffle_epi8(delta_values, delta_hash), *src);
-  const __m128i chk =
-      _mm_adds_epi8(_mm_shuffle_epi8(check_values, check_hash), *src);
-  const int mask = _mm_movemask_epi8(chk);
+    check_values = ____m256i(
+        (__m128i)v16i8{int8_t(0x80), int8_t(0x80), int8_t(0x80), int8_t(0x80),
+                       int8_t(0xCF), int8_t(0xBF), int8_t(0xD5), int8_t(0xA6),
+                       int8_t(0xB5), int8_t(0x86), int8_t(0xD1), int8_t(0x80),
+                       int8_t(0xB1), int8_t(0x80), int8_t(0x91), int8_t(0x80)});
+  }
+
+  __m256i shifted = __lasx_xvsrli_b(*src, 3);
+  __m256i asso_index = __lasx_xvand_v(*src, __lasx_xvldi(0xF));
+  __m256i delta_hash = __lasx_xvavgr_bu(
+      __lasx_xvshuf_b(delta_asso, delta_asso, asso_index), shifted);
+  __m256i check_hash = __lasx_xvavgr_bu(
+      __lasx_xvshuf_b(check_asso, check_asso, asso_index), shifted);
+
+  __m256i out = __lasx_xvsadd_b(
+      __lasx_xvshuf_b(delta_values, delta_values, delta_hash), *src);
+  __m256i chk = __lasx_xvsadd_b(
+      __lasx_xvshuf_b(check_values, check_values, check_hash), *src);
+  __m256i chk_ltz = __lasx_xvmskltz_b(chk);
+  unsigned int mask = __lasx_xvpickve2gr_wu(chk_ltz, 0);
+  mask = mask | (__lsx_vpickve2gr_hu(lasx_extracti128_hi(chk_ltz), 0) << 16);
   if (mask) {
-    __m128i ascii_space =
-        _mm_cmpeq_epi8(_mm_shuffle_epi8(ascii_space_tbl, *src), *src);
-    *error = (mask ^ _mm_movemask_epi8(ascii_space));
+    __m256i ascii_space = __lasx_xvseq_b(
+        __lasx_xvshuf_b(ascii_space_tbl, ascii_space_tbl, asso_index), *src);
+    __m256i ascii_space_ltz = __lasx_xvmskltz_b(ascii_space);
+    unsigned int ascii_space_mask = __lasx_xvpickve2gr_wu(ascii_space_ltz, 0);
+    ascii_space_mask =
+        ascii_space_mask |
+        (__lsx_vpickve2gr_hu(lasx_extracti128_hi(ascii_space_ltz), 0) << 16);
+    *error |= (mask != ascii_space_mask);
   }
+
   *src = out;
-  return (uint16_t)mask;
+  return (uint32_t)mask;
 }
 
 template <bool base64_url>
-static inline uint64_t to_base64_mask(block64 *b, uint64_t *error) {
-  uint32_t err0 = 0;
-  uint32_t err1 = 0;
-  uint32_t err2 = 0;
-  uint32_t err3 = 0;
-  uint64_t m0 = to_base64_mask<base64_url>(&b->chunks[0], &err0);
-  uint64_t m1 = to_base64_mask<base64_url>(&b->chunks[1], &err1);
-  uint64_t m2 = to_base64_mask<base64_url>(&b->chunks[2], &err2);
-  uint64_t m3 = to_base64_mask<base64_url>(&b->chunks[3], &err3);
-  *error = (err0) | ((uint64_t)err1 << 16) | ((uint64_t)err2 << 32) |
-           ((uint64_t)err3 << 48);
-  return m0 | (m1 << 16) | (m2 << 32) | (m3 << 48);
-}
-
-#if defined(_MSC_VER) && !defined(__clang__)
-static inline size_t simdutf_tzcnt_u64(uint64_t num) {
-  unsigned long ret;
-  if (num == 0) {
-    return 64;
-  }
-  _BitScanForward64(&ret, num);
-  return ret;
-}
-#else // GCC or Clang
-static inline size_t simdutf_tzcnt_u64(uint64_t num) {
-  return num ? __builtin_ctzll(num) : 64;
+static inline uint64_t to_base64_mask(block64 *b, bool *error) {
+  *error = 0;
+  uint64_t m0 = to_base64_mask<base64_url>(&b->chunks[0], error);
+  uint64_t m1 = to_base64_mask<base64_url>(&b->chunks[1], error);
+  return m0 | (m1 << 32);
 }
-#endif
 
 static inline void copy_block(block64 *b, char *output) {
-  _mm_storeu_si128(reinterpret_cast<__m128i *>(output), b->chunks[0]);
-  _mm_storeu_si128(reinterpret_cast<__m128i *>(output + 16), b->chunks[1]);
-  _mm_storeu_si128(reinterpret_cast<__m128i *>(output + 32), b->chunks[2]);
-  _mm_storeu_si128(reinterpret_cast<__m128i *>(output + 48), b->chunks[3]);
+  __lasx_xvst(b->chunks[0], reinterpret_cast<__m256i *>(output), 0);
+  __lasx_xvst(b->chunks[1], reinterpret_cast<__m256i *>(output), 32);
 }
 
 static inline uint64_t compress_block(block64 *b, uint64_t mask, char *output) {
   uint64_t nmask = ~mask;
-  compress(b->chunks[0], uint16_t(mask), output);
-  compress(b->chunks[1], uint16_t(mask >> 16),
-           output + _mm_popcnt_u64(nmask & 0xFFFF));
-  compress(b->chunks[2], uint16_t(mask >> 32),
-           output + _mm_popcnt_u64(nmask & 0xFFFFFFFF));
-  compress(b->chunks[3], uint16_t(mask >> 48),
-           output + _mm_popcnt_u64(nmask & 0xFFFFFFFFFFFFULL));
-  return _mm_popcnt_u64(nmask);
+  uint64_t count =
+      __lsx_vpickve2gr_d(__lsx_vpcnt_h(__lsx_vreplgr2vr_d(nmask)), 0);
+  uint16_t *count_ptr = (uint16_t *)&count;
+  compress(lasx_extracti128_lo(b->chunks[0]), uint16_t(mask), output);
+  compress(lasx_extracti128_hi(b->chunks[0]), uint16_t(mask >> 16),
+           output + count_ptr[0]);
+  compress(lasx_extracti128_lo(b->chunks[1]), uint16_t(mask >> 32),
+           output + count_ptr[0] + count_ptr[1]);
+  compress(lasx_extracti128_hi(b->chunks[1]), uint16_t(mask >> 48),
+           output + count_ptr[0] + count_ptr[1] + count_ptr[2]);
+  return count_ones(nmask);
 }
 
 // The caller of this function is responsible to ensure that there are 64 bytes
 // available from reading at src. The data is read into a block64 structure.
 static inline void load_block(block64 *b, const char *src) {
-  b->chunks[0] = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));
-  b->chunks[1] = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 16));
-  b->chunks[2] = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 32));
-  b->chunks[3] = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 48));
+  b->chunks[0] = __lasx_xvld(reinterpret_cast<const __m256i *>(src), 0);
+  b->chunks[1] = __lasx_xvld(reinterpret_cast<const __m256i *>(src), 32);
 }
 
 // The caller of this function is responsible to ensure that there are 128 bytes
 // available from reading at src. The data is read into a block64 structure.
 static inline void load_block(block64 *b, const char16_t *src) {
-  __m128i m1 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));
-  __m128i m2 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 8));
-  __m128i m3 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 16));
-  __m128i m4 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 24));
-  __m128i m5 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 32));
-  __m128i m6 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 40));
-  __m128i m7 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 48));
-  __m128i m8 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 56));
-  b->chunks[0] = _mm_packus_epi16(m1, m2);
-  b->chunks[1] = _mm_packus_epi16(m3, m4);
-  b->chunks[2] = _mm_packus_epi16(m5, m6);
-  b->chunks[3] = _mm_packus_epi16(m7, m8);
+  __m256i m1 = __lasx_xvld(reinterpret_cast<const __m256i *>(src), 0);
+  __m256i m2 = __lasx_xvld(reinterpret_cast<const __m256i *>(src), 32);
+  __m256i m3 = __lasx_xvld(reinterpret_cast<const __m256i *>(src), 64);
+  __m256i m4 = __lasx_xvld(reinterpret_cast<const __m256i *>(src), 96);
+  b->chunks[0] = __lasx_xvpermi_d(__lasx_xvssrlni_bu_h(m2, m1, 0), 0b11011000);
+  b->chunks[1] = __lasx_xvpermi_d(__lasx_xvssrlni_bu_h(m4, m3, 0), 0b11011000);
 }
 
-static inline void base64_decode(char *out, __m128i str) {
-  // credit: aqrit
-
-  const __m128i pack_shuffle =
-      _mm_setr_epi8(2, 1, 0, 6, 5, 4, 10, 9, 8, 14, 13, 12, -1, -1, -1, -1);
+static inline void base64_decode(char *out, __m256i str) {
+  __m256i t0 = __lasx_xvor_v(
+      __lasx_xvslli_w(str, 26),
+      __lasx_xvslli_w(__lasx_xvand_v(str, __lasx_xvldi(-1758 /*0x0000FF00*/)),
+                      12));
+  __m256i t1 = __lasx_xvsrli_w(
+      __lasx_xvand_v(str, __lasx_xvldi(-3521 /*0x003F0000*/)), 2);
+  __m256i t2 = __lasx_xvor_v(t0, t1);
+  __m256i t3 = __lasx_xvor_v(t2, __lasx_xvsrli_w(str, 16));
+  __m256i pack_shuffle = ____m256i(
+      (__m128i)v16u8{3, 2, 1, 7, 6, 5, 11, 10, 9, 15, 14, 13, 0, 0, 0, 0});
+  t3 = __lasx_xvshuf_b(t3, t3, (__m256i)pack_shuffle);
 
-  const __m128i t0 = _mm_maddubs_epi16(str, _mm_set1_epi32(0x01400140));
-  const __m128i t1 = _mm_madd_epi16(t0, _mm_set1_epi32(0x00011000));
-  const __m128i t2 = _mm_shuffle_epi8(t1, pack_shuffle);
   // Store the output:
-  // this writes 16 bytes, but we only need 12.
-  _mm_storeu_si128((__m128i *)out, t2);
+  __lsx_vst(lasx_extracti128_lo(t3), out, 0);
+  __lsx_vst(lasx_extracti128_hi(t3), out, 12);
 }
 // decode 64 bytes and output 48 bytes
 static inline void base64_decode_block(char *out, const char *src) {
-  base64_decode(out, _mm_loadu_si128(reinterpret_cast<const __m128i *>(src)));
-  base64_decode(out + 12,
-                _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 16)));
+  base64_decode(out, __lasx_xvld(reinterpret_cast<const __m256i *>(src), 0));
   base64_decode(out + 24,
-                _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 32)));
-  base64_decode(out + 36,
-                _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 48)));
+                __lasx_xvld(reinterpret_cast<const __m256i *>(src), 32));
 }
+
 static inline void base64_decode_block_safe(char *out, const char *src) {
-  base64_decode(out, _mm_loadu_si128(reinterpret_cast<const __m128i *>(src)));
-  base64_decode(out + 12,
-                _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 16)));
-  base64_decode(out + 24,
-                _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 32)));
-  char buffer[16];
+  base64_decode(out, __lasx_xvld(reinterpret_cast<const __m256i *>(src), 0));
+  char buffer[32];
   base64_decode(buffer,
-                _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 48)));
-  std::memcpy(out + 36, buffer, 12);
+                __lasx_xvld(reinterpret_cast<const __m256i *>(src), 32));
+  std::memcpy(out + 24, buffer, 24);
 }
+
 static inline void base64_decode_block(char *out, block64 *b) {
   base64_decode(out, b->chunks[0]);
-  base64_decode(out + 12, b->chunks[1]);
-  base64_decode(out + 24, b->chunks[2]);
-  base64_decode(out + 36, b->chunks[3]);
+  base64_decode(out + 24, b->chunks[1]);
 }
 static inline void base64_decode_block_safe(char *out, block64 *b) {
   base64_decode(out, b->chunks[0]);
-  base64_decode(out + 12, b->chunks[1]);
-  base64_decode(out + 24, b->chunks[2]);
-  char buffer[16];
-  base64_decode(buffer, b->chunks[3]);
-  std::memcpy(out + 36, buffer, 12);
+  char buffer[32];
+  base64_decode(buffer, b->chunks[1]);
+  std::memcpy(out + 24, buffer, 24);
 }
 
 template <bool base64_url, typename chartype>
@@ -38229,7 +53278,7 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   const chartype *const srcend = src + srclen;
 
   constexpr size_t block_size = 6;
-  static_assert(block_size >= 2, "block should of size 2 or more");
+  static_assert(block_size >= 2, "block_size must be at least two");
   char buffer[block_size * 64];
   char *bufferptr = buffer;
   if (srclen >= 64) {
@@ -38238,13 +53287,16 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
       block64 b;
       load_block(&b, src);
       src += 64;
-      uint64_t error = 0;
+      bool error = false;
       uint64_t badcharmask = to_base64_mask<base64_url>(&b, &error);
       if (error) {
         src -= 64;
-        size_t error_offset = simdutf_tzcnt_u64(error);
-        return {error_code::INVALID_BASE64_CHARACTER,
-                size_t(src - srcinit + error_offset), size_t(dst - dstinit)};
+        while (src < srcend && scalar::base64::is_eight_byte(*src) &&
+               to_base64[uint8_t(*src)] <= 64) {
+          src++;
+        }
+        return {error_code::INVALID_BASE64_CHARACTER, size_t(src - srcinit),
+                size_t(dst - dstinit)};
       }
       if (badcharmask != 0) {
         // optimization opportunity: check for simple masks like those made of
@@ -38285,6 +53337,7 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   // time, otherwise, we should just decode directly.
   int last_block = (int)((bufferptr - buffer_start) % 64);
   if (last_block != 0 && srcend - src + last_block >= 64) {
+
     while ((bufferptr - buffer_start) % 64 != 0 && src < srcend) {
       uint8_t val = to_base64[uint8_t(*src)];
       *bufferptr = char(val);
@@ -38370,15 +53423,15 @@ compress_decode_base64(char *dst, const chartype *src, size_t srclen,
   }
   return {SUCCESS, srclen, size_t(dst - dstinit)};
 }
-/* end file src/westmere/sse_base64.cpp */
+/* end file src/lasx/lasx_base64.cpp */
 
-} // unnamed namespace
-} // namespace westmere
+} // namespace
+} // namespace lasx
 } // namespace simdutf
 
 /* begin file src/generic/buf_block_reader.h */
 namespace simdutf {
-namespace westmere {
+namespace lasx {
 namespace {
 
 // Walks through a buffer in block-sized increments, loading the last part with
@@ -38484,12 +53537,12 @@ simdutf_really_inline void buf_block_reader<STEP_SIZE>::advance() {
 }
 
 } // unnamed namespace
-} // namespace westmere
+} // namespace lasx
 } // namespace simdutf
 /* end file src/generic/buf_block_reader.h */
 /* begin file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
 namespace simdutf {
-namespace westmere {
+namespace lasx {
 namespace {
 namespace utf8_validation {
 
@@ -38709,12 +53762,12 @@ struct utf8_checker {
 using utf8_validation::utf8_checker;
 
 } // unnamed namespace
-} // namespace westmere
+} // namespace lasx
 } // namespace simdutf
 /* end file src/generic/utf8_validation/utf8_lookup4_algorithm.h */
 /* begin file src/generic/utf8_validation/utf8_validator.h */
 namespace simdutf {
-namespace westmere {
+namespace lasx {
 namespace {
 namespace utf8_validation {
 
@@ -38843,103 +53896,31 @@ result generic_validate_ascii_with_errors(const uint8_t *input, size_t length) {
 }
 
 result generic_validate_ascii_with_errors(const char *input, size_t length) {
-  return generic_validate_ascii_with_errors<utf8_checker>(
-      reinterpret_cast<const uint8_t *>(input), length);
-}
-
-} // namespace utf8_validation
-} // unnamed namespace
-} // namespace westmere
-} // namespace simdutf
-/* end file src/generic/utf8_validation/utf8_validator.h */
-// transcoding from UTF-8 to UTF-16
-/* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
-
-namespace simdutf {
-namespace westmere {
-namespace {
-namespace utf8_to_utf16 {
-
-using namespace simd;
-
-template <endianness endian>
-simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
-                                         char16_t *utf16_output) noexcept {
-  // The implementation is not specific to haswell and should be moved to the
-  // generic directory.
-  size_t pos = 0;
-  char16_t *start{utf16_output};
-  const size_t safety_margin = 16; // to avoid overruns!
-  while (pos + 64 + safety_margin <= size) {
-    // this loop could be unrolled further. For example, we could process the
-    // mask far more than 64 bytes.
-    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
-    if (in.is_ascii()) {
-      in.store_ascii_as_utf16<endian>(utf16_output);
-      utf16_output += 64;
-      pos += 64;
-    } else {
-      // Slow path. We hope that the compiler will recognize that this is a slow
-      // path. Anything that is not a continuation mask is a 'leading byte',
-      // that is, the start of a new code point.
-      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
-      // -65 is 0b10111111 in two-complement's, so largest possible continuation
-      // byte
-      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-      // The *start* of code points is not so useful, rather, we want the *end*
-      // of code points.
-      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-      // We process in blocks of up to 12 bytes except possibly
-      // for fast paths which may process up to 16 bytes. For the
-      // slow path to work, we should have at least 12 input bytes left.
-      size_t max_starting_point = (pos + 64) - 12;
-      // Next loop is going to run at least five times when using solely
-      // the slow/regular path, and at least four times if there are fast paths.
-      while (pos < max_starting_point) {
-        // Performance note: our ability to compute 'consumed' and
-        // then shift and recompute is critical. If there is a
-        // latency of, say, 4 cycles on getting 'consumed', then
-        // the inner loop might have a total latency of about 6 cycles.
-        // Yet we process between 6 to 12 inputs bytes, thus we get
-        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-        // for this section of the code. Hence, there is a limit
-        // to how much we can further increase this latency before
-        // it seriously harms performance.
-        //
-        // Thus we may allow convert_masked_utf8_to_utf16 to process
-        // more bytes at a time under a fast-path mode where 16 bytes
-        // are consumed at once (e.g., when encountering ASCII).
-        size_t consumed = convert_masked_utf8_to_utf16<endian>(
-            input + pos, utf8_end_of_code_point_mask, utf16_output);
-        pos += consumed;
-        utf8_end_of_code_point_mask >>= consumed;
-      }
-      // At this point there may remain between 0 and 12 bytes in the
-      // 64-byte block. These bytes will be processed again. So we have an
-      // 80% efficiency (in the worst case). In practice we expect an
-      // 85% to 90% efficiency.
-    }
-  }
-  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(
-      input + pos, size - pos, utf16_output);
-  return utf16_output - start;
+  return generic_validate_ascii_with_errors<utf8_checker>(
+      reinterpret_cast<const uint8_t *>(input), length);
 }
 
-} // namespace utf8_to_utf16
+} // namespace utf8_validation
 } // unnamed namespace
-} // namespace westmere
+} // namespace lasx
 } // namespace simdutf
-/* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
-/* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
+/* end file src/generic/utf8_validation/utf8_validator.h */
+
+// transcoding from UTF-8 to Latin 1
+/* begin file src/generic/utf8_to_latin1/utf8_to_latin1.h */
 
 namespace simdutf {
-namespace westmere {
+namespace lasx {
 namespace {
-namespace utf8_to_utf16 {
+namespace utf8_to_latin1 {
 using namespace simd;
 
 simdutf_really_inline simd8<uint8_t>
 check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
+  // For UTF-8 to Latin 1, we can allow any ASCII character, and any
+  // continuation byte, but the non-ASCII leading bytes must be 0b11000011 or
+  // 0b11000010 and nothing else.
+  //
   // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
   // Bit 1 = Too Long (ASCII followed by continuation)
   // Bit 2 = Overlong 3-byte
@@ -38966,6 +53947,7 @@ check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
   // 1111011_ 1000____
   // 11111___ 1000____
   constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
+  constexpr const uint8_t FORBIDDEN = 0xff;
 
   const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
@@ -38976,11 +53958,11 @@ check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
       // 1100____ ________ <two byte lead in byte 1>
       TOO_SHORT | OVERLONG_2,
       // 1101____ ________ <two byte lead in byte 1>
-      TOO_SHORT,
+      FORBIDDEN,
       // 1110____ ________ <three byte lead in byte 1>
-      TOO_SHORT | OVERLONG_3 | SURROGATE,
+      FORBIDDEN,
       // 1111____ ________ <four+ byte lead in byte 1>
-      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
+      FORBIDDEN);
   constexpr const uint8_t CARRY =
       TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
   const simd8<uint8_t> byte_1_low =
@@ -38994,23 +53976,16 @@ check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
               CARRY, CARRY,
 
               // ____0100 ________
-              CARRY | TOO_LARGE,
+              FORBIDDEN,
               // ____0101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              FORBIDDEN,
               // ____011_ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              FORBIDDEN, FORBIDDEN,
 
               // ____1___ ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN,
               // ____1101 ________
-              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
-              CARRY | TOO_LARGE | TOO_LARGE_1000,
-              CARRY | TOO_LARGE | TOO_LARGE_1000);
+              FORBIDDEN, FORBIDDEN, FORBIDDEN);
   const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
       TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
@@ -39029,17 +54004,6 @@ check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
       TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
   return (byte_1_high & byte_1_low & byte_2_high);
 }
-simdutf_really_inline simd8<uint8_t>
-check_multibyte_lengths(const simd8<uint8_t> input,
-                        const simd8<uint8_t> prev_input,
-                        const simd8<uint8_t> sc) {
-  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
-  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
-  simd8<uint8_t> must23 =
-      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
-  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
-  return must23_80 ^ sc;
-}
 
 struct validating_transcoder {
   // If this is nonzero, there has been a UTF-8 error.
@@ -39055,25 +54019,24 @@ struct validating_transcoder {
     // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
     // small negative numbers)
     simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-    simd8<uint8_t> sc = check_special_cases(input, prev1);
-    this->error |= check_multibyte_lengths(input, prev_input, sc);
+    this->error |= check_special_cases(input, prev1);
   }
 
-  template <endianness endian>
   simdutf_really_inline size_t convert(const char *in, size_t size,
-                                       char16_t *utf16_output) {
+                                       char *latin1_output) {
     size_t pos = 0;
-    char16_t *start{utf16_output};
+    char *start{latin1_output};
     // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
     // last 16 bytes, and if the data is valid, then it is entirely safe because
     // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
     // generally assume that you have valid UTF-8 input, so we are going to go
-    // back from the end counting 8 leading bytes, to give us a good margin.
+    // back from the end counting 16 leading bytes, to give us a good margin.
     size_t leading_byte = 0;
     size_t margin = size;
-    for (; margin > 0 && leading_byte < 8; margin--) {
-      leading_byte += (int8_t(in[margin - 1]) > -65);
+    for (; margin > 0 && leading_byte < 16; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) >
+                       -65); // twos complement of -65 is 1011 1111 ...
     }
     // If the input is long enough, then we have that margin-1 is the eight last
     // leading byte.
@@ -39081,8 +54044,8 @@ struct validating_transcoder {
     while (pos + 64 + safety_margin <= size) {
       simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
       if (input.is_ascii()) {
-        input.store_ascii_as_utf16<endian>(utf16_output);
-        utf16_output += 64;
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
         pos += 64;
       } else {
         // you might think that a for-loop would work, but under Visual Studio,
@@ -39101,10 +54064,9 @@ struct validating_transcoder {
           this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
           this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-        if (utf8_continuation_mask & 1) {
-          return 0; // error
-        }
+        uint64_t utf8_continuation_mask =
+            input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
+                               // this case, we also have ASCII to account for.
         uint64_t utf8_leading_mask = ~utf8_continuation_mask;
         uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
         // We process in blocks of up to 12 bytes except possibly
@@ -39122,8 +54084,8 @@ struct validating_transcoder {
           // for this section of the code. Hence, there is a limit
           // to how much we can further increase this latency before
           // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_utf16<endian>(
-              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
           pos += consumed;
           utf8_end_of_code_point_mask >>= consumed;
         }
@@ -39137,23 +54099,22 @@ struct validating_transcoder {
       return 0;
     }
     if (pos < size) {
-      size_t howmany = scalar::utf8_to_utf16::convert<endian>(
-          in + pos, size - pos, utf16_output);
+      size_t howmany =
+          scalar::utf8_to_latin1::convert(in + pos, size - pos, latin1_output);
       if (howmany == 0) {
         return 0;
       }
-      utf16_output += howmany;
+      latin1_output += howmany;
     }
-    return utf16_output - start;
+    return latin1_output - start;
   }
 
-  template <endianness endian>
   simdutf_really_inline result convert_with_errors(const char *in, size_t size,
-                                                   char16_t *utf16_output) {
+                                                   char *latin1_output) {
     size_t pos = 0;
-    char16_t *start{utf16_output};
+    char *start{latin1_output};
     // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
+    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
     // last 16 bytes, and if the data is valid, then it is entirely safe because
     // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
     // generally assume that you have valid UTF-8 input, so we are going to go
@@ -39169,8 +54130,8 @@ struct validating_transcoder {
     while (pos + 64 + safety_margin <= size) {
       simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
       if (input.is_ascii()) {
-        input.store_ascii_as_utf16<endian>(utf16_output);
-        utf16_output += 64;
+        input.store((int8_t *)latin1_output);
+        latin1_output += 64;
         pos += 64;
       } else {
         // you might think that a for-loop would work, but under Visual Studio,
@@ -39189,17 +54150,16 @@ struct validating_transcoder {
           this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
           this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-        if (errors() || (utf8_continuation_mask & 1)) {
+        if (errors()) {
           // rewind_and_convert_with_errors will seek a potential error from
           // in+pos onward, with the ability to go back up to pos bytes, and
           // read size-pos bytes forward.
-          result res =
-              scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
-                  pos, in + pos, size - pos, utf16_output);
+          result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, latin1_output);
           res.count += pos;
           return res;
         }
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
         uint64_t utf8_leading_mask = ~utf8_continuation_mask;
         uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
         // We process in blocks of up to 12 bytes except possibly
@@ -39217,8 +54177,8 @@ struct validating_transcoder {
           // for this section of the code. Hence, there is a limit
           // to how much we can further increase this latency before
           // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_utf16<endian>(
-              in + pos, utf8_end_of_code_point_mask, utf16_output);
+          size_t consumed = convert_masked_utf8_to_latin1(
+              in + pos, utf8_end_of_code_point_mask, latin1_output);
           pos += consumed;
           utf8_end_of_code_point_mask >>= consumed;
         }
@@ -39232,9 +54192,8 @@ struct validating_transcoder {
       // rewind_and_convert_with_errors will seek a potential error from in+pos
       // onward, with the ability to go back up to pos bytes, and read size-pos
       // bytes forward.
-      result res =
-          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
-              pos, in + pos, size - pos, utf16_output);
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
       res.count += pos;
       return res;
     }
@@ -39242,17 +54201,16 @@ struct validating_transcoder {
       // rewind_and_convert_with_errors will seek a potential error from in+pos
       // onward, with the ability to go back up to pos bytes, and read size-pos
       // bytes forward.
-      result res =
-          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
-              pos, in + pos, size - pos, utf16_output);
+      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, latin1_output);
       if (res.error) { // In case of error, we want the error position
         res.count += pos;
         return res;
       } else { // In case of success, we want the number of word written
-        utf16_output += res.count;
+        latin1_output += res.count;
       }
     }
-    return result(error_code::SUCCESS, utf16_output - start);
+    return result(error_code::SUCCESS, latin1_output - start);
   }
 
   simdutf_really_inline bool errors() const {
@@ -39260,63 +54218,176 @@ struct validating_transcoder {
   }
 
 }; // struct utf8_checker
-} // namespace utf8_to_utf16
+} // namespace utf8_to_latin1
 } // unnamed namespace
-} // namespace westmere
+} // namespace lasx
 } // namespace simdutf
-/* end file src/generic/utf8_to_utf16/utf8_to_utf16.h */
-// transcoding from UTF-8 to UTF-32
-/* begin file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
+/* end file src/generic/utf8_to_latin1/utf8_to_latin1.h */
+/* begin file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
 
 namespace simdutf {
-namespace westmere {
+namespace lasx {
 namespace {
-namespace utf8_to_utf32 {
+namespace utf8_to_latin1 {
+using namespace simd;
+
+simdutf_really_inline size_t convert_valid(const char *in, size_t size,
+                                           char *latin1_output) {
+  size_t pos = 0;
+  char *start{latin1_output};
+  // In the worst case, we have the haswell kernel which can cause an overflow
+  // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last
+  // 16 bytes, and if the data is valid, then it is entirely safe because 16
+  // UTF-8 bytes generate much more than 8 bytes. However, you cannot generally
+  // assume that you have valid UTF-8 input, so we are going to go back from the
+  // end counting 8 leading bytes, to give us a good margin.
+  size_t leading_byte = 0;
+  size_t margin = size;
+  for (; margin > 0 && leading_byte < 8; margin--) {
+    leading_byte += (int8_t(in[margin - 1]) >
+                     -65); // twos complement of -65 is 1011 1111 ...
+  }
+  // If the input is long enough, then margin-1 is the eighth-to-last
+  // leading byte.
+  const size_t safety_margin = size - margin + 1; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    if (input.is_ascii()) {
+      input.store((int8_t *)latin1_output);
+      latin1_output += 64;
+      pos += 64;
+    } else {
+      // you might think that a for-loop would work, but under Visual Studio, it
+      // is not good enough.
+      uint64_t utf8_continuation_mask =
+          input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
+                             // this case, we also have ASCII to account for.
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      // We process in blocks of up to 12 bytes except possibly
+      // for fast paths which may process up to 16 bytes. For the
+      // slow path to work, we should have at least 12 input bytes left.
+      size_t max_starting_point = (pos + 64) - 12;
+      // Next loop is going to run at least five times.
+      while (pos < max_starting_point) {
+        // Performance note: our ability to compute 'consumed' and
+        // then shift and recompute is critical. If there is a
+        // latency of, say, 4 cycles on getting 'consumed', then
+        // the inner loop might have a total latency of about 6 cycles.
+        // Yet we process between 6 and 12 input bytes, thus we get
+        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+        // for this section of the code. Hence, there is a limit
+        // to how much we can further increase this latency before
+        // it seriously harms performance.
+        size_t consumed = convert_masked_utf8_to_latin1(
+            in + pos, utf8_end_of_code_point_mask, latin1_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
+      }
+      // At this point there may remain between 0 and 12 bytes in the
+      // 64-byte block. These bytes will be processed again. So we have an
+      // 80% efficiency (in the worst case). In practice we expect an
+      // 85% to 90% efficiency.
+    }
+  }
+  if (pos < size) {
+    size_t howmany = scalar::utf8_to_latin1::convert_valid(in + pos, size - pos,
+                                                           latin1_output);
+    latin1_output += howmany;
+  }
+  return latin1_output - start;
+}
+
+} // namespace utf8_to_latin1
+} // namespace
+} // namespace lasx
+} // namespace simdutf
+  // namespace simdutf
+/* end file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
+// transcoding from UTF-8 to UTF-16
+/* begin file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
+
+namespace simdutf {
+namespace lasx {
+namespace {
+namespace utf8_to_utf16 {
 
 using namespace simd;
 
+template <endianness endian>
 simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
-                                         char32_t *utf32_output) noexcept {
+                                         char16_t *utf16_output) noexcept {
+  // The implementation is not specific to haswell and should be moved to the
+  // generic directory.
   size_t pos = 0;
-  char32_t *start{utf32_output};
+  char16_t *start{utf16_output};
   const size_t safety_margin = 16; // to avoid overruns!
   while (pos + 64 + safety_margin <= size) {
+    // this loop could be unrolled further. For example, we could process the
+    // mask far more than 64 bytes.
     simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
     if (in.is_ascii()) {
-      in.store_ascii_as_utf32(utf32_output);
-      utf32_output += 64;
+      in.store_ascii_as_utf16<endian>(utf16_output);
+      utf16_output += 64;
       pos += 64;
     } else {
+      // Slow path. We hope that the compiler will recognize that this is a slow
+      // path. Anything that is not a continuation byte is a 'leading byte',
+      // that is, the start of a new code point.
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
       // -65 is 0b10111111 in two-complement's, so largest possible continuation
       // byte
-      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
       uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      // The *start* of code points is not so useful, rather, we want the *end*
+      // of code points.
       uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      // We process in blocks of up to 12 bytes except possibly
+      // for fast paths which may process up to 16 bytes. For the
+      // slow path to work, we should have at least 12 input bytes left.
       size_t max_starting_point = (pos + 64) - 12;
+      // Next loop is going to run at least five times when using solely
+      // the slow/regular path, and at least four times if there are fast paths.
       while (pos < max_starting_point) {
-        size_t consumed = convert_masked_utf8_to_utf32(
-            input + pos, utf8_end_of_code_point_mask, utf32_output);
+        // Performance note: our ability to compute 'consumed' and
+        // then shift and recompute is critical. If there is a
+        // latency of, say, 4 cycles on getting 'consumed', then
+        // the inner loop might have a total latency of about 6 cycles.
+        // Yet we process between 6 and 12 input bytes, thus we get
+        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
+        // for this section of the code. Hence, there is a limit
+        // to how much we can further increase this latency before
+        // it seriously harms performance.
+        //
+        // Thus we may allow convert_masked_utf8_to_utf16 to process
+        // more bytes at a time under a fast-path mode where 16 bytes
+        // are consumed at once (e.g., when encountering ASCII).
+        size_t consumed = convert_masked_utf8_to_utf16<endian>(
+            input + pos, utf8_end_of_code_point_mask, utf16_output);
         pos += consumed;
         utf8_end_of_code_point_mask >>= consumed;
       }
+      // At this point there may remain between 0 and 12 bytes in the
+      // 64-byte block. These bytes will be processed again. So we have an
+      // 80% efficiency (in the worst case). In practice we expect an
+      // 85% to 90% efficiency.
     }
   }
-  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos,
-                                                       utf32_output);
-  return utf32_output - start;
+  utf16_output += scalar::utf8_to_utf16::convert_valid<endian>(
+      input + pos, size - pos, utf16_output);
+  return utf16_output - start;
 }
 
-} // namespace utf8_to_utf32
+} // namespace utf8_to_utf16
 } // unnamed namespace
-} // namespace westmere
+} // namespace lasx
 } // namespace simdutf
-/* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
-/* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
+/* end file src/generic/utf8_to_utf16/valid_utf8_to_utf16.h */
+/* begin file src/generic/utf8_to_utf16/utf8_to_utf16.h */
 
 namespace simdutf {
-namespace westmere {
+namespace lasx {
 namespace {
-namespace utf8_to_utf32 {
+namespace utf8_to_utf16 {
 using namespace simd;
 
 simdutf_really_inline simd8<uint8_t>
@@ -39440,29 +54511,30 @@ struct validating_transcoder {
     this->error |= check_multibyte_lengths(input, prev_input, sc);
   }
 
+  template <endianness endian>
   simdutf_really_inline size_t convert(const char *in, size_t size,
-                                       char32_t *utf32_output) {
+                                       char16_t *utf16_output) {
     size_t pos = 0;
-    char32_t *start{utf32_output};
+    char16_t *start{utf16_output};
     // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 words when calling convert_masked_utf8_to_utf32. If you skip the
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
     // last 16 bytes, and if the data is valid, then it is entirely safe because
     // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
     // generally assume that you have valid UTF-8 input, so we are going to go
-    // back from the end counting 16 leading bytes, to give us a good margin.
+    // back from the end counting 8 leading bytes, to give us a good margin.
     size_t leading_byte = 0;
     size_t margin = size;
     for (; margin > 0 && leading_byte < 8; margin--) {
       leading_byte += (int8_t(in[margin - 1]) > -65);
     }
-    // If the input is long enough, then we have that margin-1 is the fourth
-    // last leading byte.
+    // If the input is long enough, then margin-1 is the eighth-last leading
+    // byte.
     const size_t safety_margin = size - margin + 1; // to avoid overruns!
     while (pos + 64 + safety_margin <= size) {
       simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
       if (input.is_ascii()) {
-        input.store_ascii_as_utf32(utf32_output);
-        utf32_output += 64;
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
         pos += 64;
       } else {
         // you might think that a for-loop would work, but under Visual Studio,
@@ -39483,7 +54555,7 @@ struct validating_transcoder {
         }
         uint64_t utf8_continuation_mask = input.lt(-65 + 1);
         if (utf8_continuation_mask & 1) {
-          return 0; // we have an error
+          return 0; // error
         }
         uint64_t utf8_leading_mask = ~utf8_continuation_mask;
         uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
@@ -39502,8 +54574,8 @@ struct validating_transcoder {
           // for this section of the code. Hence, there is a limit
           // to how much we can further increase this latency before
           // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_utf32(
-              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
           pos += consumed;
           utf8_end_of_code_point_mask >>= consumed;
         }
@@ -39517,22 +54589,23 @@ struct validating_transcoder {
       return 0;
     }
     if (pos < size) {
-      size_t howmany =
-          scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
+      size_t howmany = scalar::utf8_to_utf16::convert<endian>(
+          in + pos, size - pos, utf16_output);
       if (howmany == 0) {
         return 0;
       }
-      utf32_output += howmany;
+      utf16_output += howmany;
     }
-    return utf32_output - start;
+    return utf16_output - start;
   }
 
+  template <endianness endian>
   simdutf_really_inline result convert_with_errors(const char *in, size_t size,
-                                                   char32_t *utf32_output) {
+                                                   char16_t *utf16_output) {
     size_t pos = 0;
-    char32_t *start{utf32_output};
+    char16_t *start{utf16_output};
     // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the
+    // of 8 bytes when calling convert_masked_utf8_to_utf16. If you skip the
     // last 16 bytes, and if the data is valid, then it is entirely safe because
     // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
     // generally assume that you have valid UTF-8 input, so we are going to go
@@ -39542,14 +54615,14 @@ struct validating_transcoder {
     for (; margin > 0 && leading_byte < 8; margin--) {
       leading_byte += (int8_t(in[margin - 1]) > -65);
     }
-    // If the input is long enough, then we have that margin-1 is the fourth
-    // last leading byte.
+    // If the input is long enough, then margin-1 is the eighth-last leading
+    // byte.
     const size_t safety_margin = size - margin + 1; // to avoid overruns!
     while (pos + 64 + safety_margin <= size) {
       simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
       if (input.is_ascii()) {
-        input.store_ascii_as_utf32(utf32_output);
-        utf32_output += 64;
+        input.store_ascii_as_utf16<endian>(utf16_output);
+        utf16_output += 64;
         pos += 64;
       } else {
         // you might think that a for-loop would work, but under Visual Studio,
@@ -39570,8 +54643,12 @@ struct validating_transcoder {
         }
         uint64_t utf8_continuation_mask = input.lt(-65 + 1);
         if (errors() || (utf8_continuation_mask & 1)) {
-          result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
-              pos, in + pos, size - pos, utf32_output);
+          // rewind_and_convert_with_errors will seek a potential error from
+          // in+pos onward, with the ability to go back up to pos bytes, and
+          // read size-pos bytes forward.
+          result res =
+              scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+                  pos, in + pos, size - pos, utf16_output);
           res.count += pos;
           return res;
         }
@@ -39592,8 +54669,8 @@ struct validating_transcoder {
           // for this section of the code. Hence, there is a limit
           // to how much we can further increase this latency before
           // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_utf32(
-              in + pos, utf8_end_of_code_point_mask, utf32_output);
+          size_t consumed = convert_masked_utf8_to_utf16<endian>(
+              in + pos, utf8_end_of_code_point_mask, utf16_output);
           pos += consumed;
           utf8_end_of_code_point_mask >>= consumed;
         }
@@ -39604,22 +54681,30 @@ struct validating_transcoder {
       }
     }
     if (errors()) {
-      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
-          pos, in + pos, size - pos, utf32_output);
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
       res.count += pos;
       return res;
     }
     if (pos < size) {
-      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
-          pos, in + pos, size - pos, utf32_output);
+      // rewind_and_convert_with_errors will seek a potential error from in+pos
+      // onward, with the ability to go back up to pos bytes, and read size-pos
+      // bytes forward.
+      result res =
+          scalar::utf8_to_utf16::rewind_and_convert_with_errors<endian>(
+              pos, in + pos, size - pos, utf16_output);
       if (res.error) { // In case of error, we want the error position
         res.count += pos;
         return res;
       } else { // In case of success, we want the number of word written
-        utf32_output += res.count;
+        utf16_output += res.count;
       }
     }
-    return result(error_code::SUCCESS, utf32_output - start);
+    return result(error_code::SUCCESS, utf16_output - start);
   }
 
   simdutf_really_inline bool errors() const {
@@ -39627,143 +54712,67 @@ struct validating_transcoder {
   }
 
 }; // struct utf8_checker
-} // namespace utf8_to_utf32
+} // namespace utf8_to_utf16
 } // unnamed namespace
-} // namespace westmere
+} // namespace lasx
 } // namespace simdutf
-/* end file src/generic/utf8_to_utf32/utf8_to_utf32.h */
-// other functions
-/* begin file src/generic/utf8.h */
+/* end file src/generic/utf8_to_utf16/utf8_to_utf16.h */
+// transcoding from UTF-8 to UTF-32
+/* begin file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
 
 namespace simdutf {
-namespace westmere {
+namespace lasx {
 namespace {
-namespace utf8 {
+namespace utf8_to_utf32 {
 
 using namespace simd;
 
-simdutf_really_inline size_t count_code_points(const char *in, size_t size) {
-  size_t pos = 0;
-  size_t count = 0;
-  for (; pos + 64 <= size; pos += 64) {
-    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-    uint64_t utf8_continuation_mask = input.gt(-65);
-    count += count_ones(utf8_continuation_mask);
-  }
-  return count + scalar::utf8::count_code_points(in + pos, size - pos);
-}
-
-simdutf_really_inline size_t utf16_length_from_utf8(const char *in,
-                                                    size_t size) {
-  size_t pos = 0;
-  size_t count = 0;
-  // This algorithm could no doubt be improved!
-  for (; pos + 64 <= size; pos += 64) {
-    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-    uint64_t utf8_continuation_mask = input.lt(-65 + 1);
-    // We count one word for anything that is not a continuation (so
-    // leading bytes).
-    count += 64 - count_ones(utf8_continuation_mask);
-    int64_t utf8_4byte = input.gteq_unsigned(240);
-    count += count_ones(utf8_4byte);
-  }
-  return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
-}
-} // namespace utf8
-} // unnamed namespace
-} // namespace westmere
-} // namespace simdutf
-/* end file src/generic/utf8.h */
-/* begin file src/generic/utf16.h */
-namespace simdutf {
-namespace westmere {
-namespace {
-namespace utf16 {
-
-template <endianness big_endian>
-simdutf_really_inline size_t count_code_points(const char16_t *in,
-                                               size_t size) {
-  size_t pos = 0;
-  size_t count = 0;
-  for (; pos < size / 32 * 32; pos += 32) {
-    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-    if (!match_system(big_endian)) {
-      input.swap_bytes();
-    }
-    uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
-    count += count_ones(not_pair) / 2;
-  }
-  return count +
-         scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
-}
-
-template <endianness big_endian>
-simdutf_really_inline size_t utf8_length_from_utf16(const char16_t *in,
-                                                    size_t size) {
+simdutf_warn_unused size_t convert_valid(const char *input, size_t size,
+                                         char32_t *utf32_output) noexcept {
   size_t pos = 0;
-  size_t count = 0;
-  // This algorithm could no doubt be improved!
-  for (; pos < size / 32 * 32; pos += 32) {
-    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-    if (!match_system(big_endian)) {
-      input.swap_bytes();
+  char32_t *start{utf32_output};
+  const size_t safety_margin = 16; // to avoid overruns!
+  while (pos + 64 + safety_margin <= size) {
+    simd8x64<int8_t> in(reinterpret_cast<const int8_t *>(input + pos));
+    if (in.is_ascii()) {
+      in.store_ascii_as_utf32(utf32_output);
+      utf32_output += 64;
+      pos += 64;
+    } else {
+      // -65 is 0b10111111 in two's complement, so it is the largest possible
+      // continuation byte
+      uint64_t utf8_continuation_mask = in.lt(-65 + 1);
+      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
+      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
+      size_t max_starting_point = (pos + 64) - 12;
+      while (pos < max_starting_point) {
+        size_t consumed = convert_masked_utf8_to_utf32(
+            input + pos, utf8_end_of_code_point_mask, utf32_output);
+        pos += consumed;
+        utf8_end_of_code_point_mask >>= consumed;
+      }
     }
-    uint64_t ascii_mask = input.lteq(0x7F);
-    uint64_t twobyte_mask = input.lteq(0x7FF);
-    uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
-
-    size_t ascii_count = count_ones(ascii_mask) / 2;
-    size_t twobyte_count = count_ones(twobyte_mask & ~ascii_mask) / 2;
-    size_t threebyte_count = count_ones(not_pair_mask & ~twobyte_mask) / 2;
-    size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
-    count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count +
-             ascii_count;
-  }
-  return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos,
-                                                                   size - pos);
-}
-
-template <endianness big_endian>
-simdutf_really_inline size_t utf32_length_from_utf16(const char16_t *in,
-                                                     size_t size) {
-  return count_code_points<big_endian>(in, size);
-}
-
-simdutf_really_inline void
-change_endianness_utf16(const char16_t *in, size_t size, char16_t *output) {
-  size_t pos = 0;
-
-  while (pos < size / 32 * 32) {
-    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
-    input.swap_bytes();
-    input.store(reinterpret_cast<uint16_t *>(output));
-    pos += 32;
-    output += 32;
   }
-
-  scalar::utf16::change_endianness_utf16(in + pos, size - pos, output);
+  utf32_output += scalar::utf8_to_utf32::convert_valid(input + pos, size - pos,
+                                                       utf32_output);
+  return utf32_output - start;
 }
 
-} // namespace utf16
+} // namespace utf8_to_utf32
 } // unnamed namespace
-} // namespace westmere
+} // namespace lasx
 } // namespace simdutf
-/* end file src/generic/utf16.h */
-// transcoding from UTF-8 to Latin 1
-/* begin file src/generic/utf8_to_latin1/utf8_to_latin1.h */
+/* end file src/generic/utf8_to_utf32/valid_utf8_to_utf32.h */
+/* begin file src/generic/utf8_to_utf32/utf8_to_utf32.h */
 
 namespace simdutf {
-namespace westmere {
+namespace lasx {
 namespace {
-namespace utf8_to_latin1 {
+namespace utf8_to_utf32 {
 using namespace simd;
 
 simdutf_really_inline simd8<uint8_t>
 check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
-  // For UTF-8 to Latin 1, we can allow any ASCII character, and any
-  // continuation byte, but the non-ASCII leading bytes must be 0b11000011 or
-  // 0b11000010 and nothing else.
-  //
   // Bit 0 = Too Short (lead byte/ASCII followed by lead byte/ASCII)
   // Bit 1 = Too Long (ASCII followed by continuation)
   // Bit 2 = Overlong 3-byte
@@ -39790,7 +54799,6 @@ check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
   // 1111011_ 1000____
   // 11111___ 1000____
   constexpr const uint8_t OVERLONG_4 = 1 << 6; // 11110000 1000____
-  constexpr const uint8_t FORBIDDEN = 0xff;
 
   const simd8<uint8_t> byte_1_high = prev1.shr<4>().lookup_16<uint8_t>(
       // 0_______ ________ <ASCII in byte 1>
@@ -39801,11 +54809,11 @@ check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
       // 1100____ ________ <two byte lead in byte 1>
       TOO_SHORT | OVERLONG_2,
       // 1101____ ________ <two byte lead in byte 1>
-      FORBIDDEN,
+      TOO_SHORT,
       // 1110____ ________ <three byte lead in byte 1>
-      FORBIDDEN,
+      TOO_SHORT | OVERLONG_3 | SURROGATE,
       // 1111____ ________ <four+ byte lead in byte 1>
-      FORBIDDEN);
+      TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4);
   constexpr const uint8_t CARRY =
       TOO_SHORT | TOO_LONG | TWO_CONTS; // These all have ____ in byte 1 .
   const simd8<uint8_t> byte_1_low =
@@ -39819,16 +54827,23 @@ check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
               CARRY, CARRY,
 
               // ____0100 ________
-              FORBIDDEN,
+              CARRY | TOO_LARGE,
               // ____0101 ________
-              FORBIDDEN,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
               // ____011_ ________
-              FORBIDDEN, FORBIDDEN,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
 
               // ____1___ ________
-              FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN, FORBIDDEN,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
               // ____1101 ________
-              FORBIDDEN, FORBIDDEN, FORBIDDEN);
+              CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+              CARRY | TOO_LARGE | TOO_LARGE_1000,
+              CARRY | TOO_LARGE | TOO_LARGE_1000);
   const simd8<uint8_t> byte_2_high = input.shr<4>().lookup_16<uint8_t>(
       // ________ 0_______ <ASCII in byte 2>
       TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
@@ -39847,6 +54862,17 @@ check_special_cases(const simd8<uint8_t> input, const simd8<uint8_t> prev1) {
       TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT);
   return (byte_1_high & byte_1_low & byte_2_high);
 }
+simdutf_really_inline simd8<uint8_t>
+check_multibyte_lengths(const simd8<uint8_t> input,
+                        const simd8<uint8_t> prev_input,
+                        const simd8<uint8_t> sc) {
+  simd8<uint8_t> prev2 = input.prev<2>(prev_input);
+  simd8<uint8_t> prev3 = input.prev<3>(prev_input);
+  simd8<uint8_t> must23 =
+      simd8<uint8_t>(must_be_2_3_continuation(prev2, prev3));
+  simd8<uint8_t> must23_80 = must23 & uint8_t(0x80);
+  return must23_80 ^ sc;
+}
 
 struct validating_transcoder {
   // If this is nonzero, there has been a UTF-8 error.
@@ -39862,33 +54888,33 @@ struct validating_transcoder {
     // lead bytes (2, 3, 4-byte leads become large positive numbers instead of
     // small negative numbers)
     simd8<uint8_t> prev1 = input.prev<1>(prev_input);
-    this->error |= check_special_cases(input, prev1);
+    simd8<uint8_t> sc = check_special_cases(input, prev1);
+    this->error |= check_multibyte_lengths(input, prev_input, sc);
   }
 
   simdutf_really_inline size_t convert(const char *in, size_t size,
-                                       char *latin1_output) {
+                                       char32_t *utf32_output) {
     size_t pos = 0;
-    char *start{latin1_output};
+    char32_t *start{utf32_output};
     // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // of 8 words when calling convert_masked_utf8_to_utf32. If you skip the
     // last 16 bytes, and if the data is valid, then it is entirely safe because
     // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
     // generally assume that you have valid UTF-8 input, so we are going to go
     // back from the end counting 16 leading bytes, to give us a good margin.
     size_t leading_byte = 0;
     size_t margin = size;
-    for (; margin > 0 && leading_byte < 16; margin--) {
-      leading_byte += (int8_t(in[margin - 1]) >
-                       -65); // twos complement of -65 is 1011 1111 ...
+    for (; margin > 0 && leading_byte < 8; margin--) {
+      leading_byte += (int8_t(in[margin - 1]) > -65);
     }
-    // If the input is long enough, then we have that margin-1 is the eight last
-    // leading byte.
+    // If the input is long enough, then margin-1 is the eighth-last leading
+    // byte (the loop above counts 8 leading bytes).
     const size_t safety_margin = size - margin + 1; // to avoid overruns!
     while (pos + 64 + safety_margin <= size) {
       simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
       if (input.is_ascii()) {
-        input.store((int8_t *)latin1_output);
-        latin1_output += 64;
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
         pos += 64;
       } else {
         // you might think that a for-loop would work, but under Visual Studio,
@@ -39907,9 +54933,10 @@ struct validating_transcoder {
           this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
           this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-        uint64_t utf8_continuation_mask =
-            input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
-                               // this case, we also have ASCII to account for.
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (utf8_continuation_mask & 1) {
+          return 0; // we have an error
+        }
         uint64_t utf8_leading_mask = ~utf8_continuation_mask;
         uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
         // We process in blocks of up to 12 bytes except possibly
@@ -39927,8 +54954,8 @@ struct validating_transcoder {
           // for this section of the code. Hence, there is a limit
           // to how much we can further increase this latency before
           // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_latin1(
-              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
           pos += consumed;
           utf8_end_of_code_point_mask >>= consumed;
         }
@@ -39943,21 +54970,21 @@ struct validating_transcoder {
     }
     if (pos < size) {
       size_t howmany =
-          scalar::utf8_to_latin1::convert(in + pos, size - pos, latin1_output);
+          scalar::utf8_to_utf32::convert(in + pos, size - pos, utf32_output);
       if (howmany == 0) {
         return 0;
       }
-      latin1_output += howmany;
+      utf32_output += howmany;
     }
-    return latin1_output - start;
+    return utf32_output - start;
   }
 
   simdutf_really_inline result convert_with_errors(const char *in, size_t size,
-                                                   char *latin1_output) {
+                                                   char32_t *utf32_output) {
     size_t pos = 0;
-    char *start{latin1_output};
+    char32_t *start{utf32_output};
     // In the worst case, we have the haswell kernel which can cause an overflow
-    // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the
+    // of 8 bytes when calling convert_masked_utf8_to_utf32. If you skip the
     // last 16 bytes, and if the data is valid, then it is entirely safe because
     // 16 UTF-8 bytes generate much more than 8 bytes. However, you cannot
     // generally assume that you have valid UTF-8 input, so we are going to go
@@ -39967,14 +54994,14 @@ struct validating_transcoder {
     for (; margin > 0 && leading_byte < 8; margin--) {
       leading_byte += (int8_t(in[margin - 1]) > -65);
     }
-    // If the input is long enough, then we have that margin-1 is the eight last
-    // leading byte.
+    // If the input is long enough, then margin-1 is the eighth-last leading
+    // byte (the loop above counts 8 leading bytes).
     const size_t safety_margin = size - margin + 1; // to avoid overruns!
     while (pos + 64 + safety_margin <= size) {
       simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
       if (input.is_ascii()) {
-        input.store((int8_t *)latin1_output);
-        latin1_output += 64;
+        input.store_ascii_as_utf32(utf32_output);
+        utf32_output += 64;
         pos += 64;
       } else {
         // you might think that a for-loop would work, but under Visual Studio,
@@ -39993,16 +55020,13 @@ struct validating_transcoder {
           this->check_utf8_bytes(input.chunks[2], input.chunks[1]);
           this->check_utf8_bytes(input.chunks[3], input.chunks[2]);
         }
-        if (errors()) {
-          // rewind_and_convert_with_errors will seek a potential error from
-          // in+pos onward, with the ability to go back up to pos bytes, and
-          // read size-pos bytes forward.
-          result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
-              pos, in + pos, size - pos, latin1_output);
+        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+        if (errors() || (utf8_continuation_mask & 1)) {
+          result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+              pos, in + pos, size - pos, utf32_output);
           res.count += pos;
           return res;
         }
-        uint64_t utf8_continuation_mask = input.lt(-65 + 1);
         uint64_t utf8_leading_mask = ~utf8_continuation_mask;
         uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
         // We process in blocks of up to 12 bytes except possibly
@@ -40020,8 +55044,8 @@ struct validating_transcoder {
           // for this section of the code. Hence, there is a limit
           // to how much we can further increase this latency before
           // it seriously harms performance.
-          size_t consumed = convert_masked_utf8_to_latin1(
-              in + pos, utf8_end_of_code_point_mask, latin1_output);
+          size_t consumed = convert_masked_utf8_to_utf32(
+              in + pos, utf8_end_of_code_point_mask, utf32_output);
           pos += consumed;
           utf8_end_of_code_point_mask >>= consumed;
         }
@@ -40032,28 +55056,22 @@ struct validating_transcoder {
       }
     }
     if (errors()) {
-      // rewind_and_convert_with_errors will seek a potential error from in+pos
-      // onward, with the ability to go back up to pos bytes, and read size-pos
-      // bytes forward.
-      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
-          pos, in + pos, size - pos, latin1_output);
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
       res.count += pos;
       return res;
     }
     if (pos < size) {
-      // rewind_and_convert_with_errors will seek a potential error from in+pos
-      // onward, with the ability to go back up to pos bytes, and read size-pos
-      // bytes forward.
-      result res = scalar::utf8_to_latin1::rewind_and_convert_with_errors(
-          pos, in + pos, size - pos, latin1_output);
+      result res = scalar::utf8_to_utf32::rewind_and_convert_with_errors(
+          pos, in + pos, size - pos, utf32_output);
       if (res.error) { // In case of error, we want the error position
         res.count += pos;
         return res;
       } else { // In case of success, we want the number of word written
-        latin1_output += res.count;
+        utf32_output += res.count;
       }
     }
-    return result(error_code::SUCCESS, latin1_output - start);
+    return result(error_code::SUCCESS, utf32_output - start);
   }
 
   simdutf_really_inline bool errors() const {
@@ -40061,99 +55079,136 @@ struct validating_transcoder {
   }
 
 }; // struct utf8_checker
-} // namespace utf8_to_latin1
+} // namespace utf8_to_utf32
 } // unnamed namespace
-} // namespace westmere
+} // namespace lasx
 } // namespace simdutf
-/* end file src/generic/utf8_to_latin1/utf8_to_latin1.h */
-/* begin file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
+/* end file src/generic/utf8_to_utf32/utf8_to_utf32.h */
+
+
+// other functions
+/* begin file src/generic/utf8.h */
 
 namespace simdutf {
-namespace westmere {
+namespace lasx {
 namespace {
-namespace utf8_to_latin1 {
+namespace utf8 {
+
 using namespace simd;
 
-simdutf_really_inline size_t convert_valid(const char *in, size_t size,
-                                           char *latin1_output) {
+simdutf_really_inline size_t count_code_points(const char *in, size_t size) {
   size_t pos = 0;
-  char *start{latin1_output};
-  // In the worst case, we have the haswell kernel which can cause an overflow
-  // of 8 bytes when calling convert_masked_utf8_to_latin1. If you skip the last
-  // 16 bytes, and if the data is valid, then it is entirely safe because 16
-  // UTF-8 bytes generate much more than 8 bytes. However, you cannot generally
-  // assume that you have valid UTF-8 input, so we are going to go back from the
-  // end counting 8 leading bytes, to give us a good margin.
-  size_t leading_byte = 0;
-  size_t margin = size;
-  for (; margin > 0 && leading_byte < 8; margin--) {
-    leading_byte += (int8_t(in[margin - 1]) >
-                     -65); // twos complement of -65 is 1011 1111 ...
+  size_t count = 0;
+  for (; pos + 64 <= size; pos += 64) {
+    simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
+    uint64_t utf8_continuation_mask = input.gt(-65);
+    count += count_ones(utf8_continuation_mask);
   }
-  // If the input is long enough, then we have that margin-1 is the eight last
-  // leading byte.
-  const size_t safety_margin = size - margin + 1; // to avoid overruns!
-  while (pos + 64 + safety_margin <= size) {
+  return count + scalar::utf8::count_code_points(in + pos, size - pos);
+}
+
+simdutf_really_inline size_t utf16_length_from_utf8(const char *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos + 64 <= size; pos += 64) {
     simd8x64<int8_t> input(reinterpret_cast<const int8_t *>(in + pos));
-    if (input.is_ascii()) {
-      input.store((int8_t *)latin1_output);
-      latin1_output += 64;
-      pos += 64;
-    } else {
-      // you might think that a for-loop would work, but under Visual Studio, it
-      // is not good enough.
-      uint64_t utf8_continuation_mask =
-          input.lt(-65 + 1); // -64 is 1100 0000 in twos complement. Note: in
-                             // this case, we also have ASCII to account for.
-      uint64_t utf8_leading_mask = ~utf8_continuation_mask;
-      uint64_t utf8_end_of_code_point_mask = utf8_leading_mask >> 1;
-      // We process in blocks of up to 12 bytes except possibly
-      // for fast paths which may process up to 16 bytes. For the
-      // slow path to work, we should have at least 12 input bytes left.
-      size_t max_starting_point = (pos + 64) - 12;
-      // Next loop is going to run at least five times.
-      while (pos < max_starting_point) {
-        // Performance note: our ability to compute 'consumed' and
-        // then shift and recompute is critical. If there is a
-        // latency of, say, 4 cycles on getting 'consumed', then
-        // the inner loop might have a total latency of about 6 cycles.
-        // Yet we process between 6 to 12 inputs bytes, thus we get
-        // a speed limit between 1 cycle/byte and 0.5 cycle/byte
-        // for this section of the code. Hence, there is a limit
-        // to how much we can further increase this latency before
-        // it seriously harms performance.
-        size_t consumed = convert_masked_utf8_to_latin1(
-            in + pos, utf8_end_of_code_point_mask, latin1_output);
-        pos += consumed;
-        utf8_end_of_code_point_mask >>= consumed;
-      }
-      // At this point there may remain between 0 and 12 bytes in the
-      // 64-byte block. These bytes will be processed again. So we have an
-      // 80% efficiency (in the worst case). In practice we expect an
-      // 85% to 90% efficiency.
+    uint64_t utf8_continuation_mask = input.lt(-65 + 1);
+    // We count one UTF-16 code unit for every byte that is not a
+    // continuation byte (that is, for every leading byte).
+    count += 64 - count_ones(utf8_continuation_mask);
+    uint64_t utf8_4byte = input.gteq_unsigned(240);
+    count += count_ones(utf8_4byte);
+  }
+  return count + scalar::utf8::utf16_length_from_utf8(in + pos, size - pos);
+}
+} // namespace utf8
+} // unnamed namespace
+} // namespace lasx
+} // namespace simdutf
+/* end file src/generic/utf8.h */
+/* begin file src/generic/utf16.h */
+namespace simdutf {
+namespace lasx {
+namespace {
+namespace utf16 {
+
+template <endianness big_endian>
+simdutf_really_inline size_t count_code_points(const char16_t *in,
+                                               size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
     }
+    uint64_t not_pair = input.not_in_range(0xDC00, 0xDFFF);
+    count += count_ones(not_pair) / 2;
   }
-  if (pos < size) {
-    size_t howmany = scalar::utf8_to_latin1::convert_valid(in + pos, size - pos,
-                                                           latin1_output);
-    latin1_output += howmany;
+  return count +
+         scalar::utf16::count_code_points<big_endian>(in + pos, size - pos);
+}
+
+template <endianness big_endian>
+simdutf_really_inline size_t utf8_length_from_utf16(const char16_t *in,
+                                                    size_t size) {
+  size_t pos = 0;
+  size_t count = 0;
+  // This algorithm could no doubt be improved!
+  for (; pos < size / 32 * 32; pos += 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    if (!match_system(big_endian)) {
+      input.swap_bytes();
+    }
+    uint64_t ascii_mask = input.lteq(0x7F);
+    uint64_t twobyte_mask = input.lteq(0x7FF);
+    uint64_t not_pair_mask = input.not_in_range(0xD800, 0xDFFF);
+
+    size_t ascii_count = count_ones(ascii_mask) / 2;
+    size_t twobyte_count = count_ones(twobyte_mask & ~ascii_mask) / 2;
+    size_t threebyte_count = count_ones(not_pair_mask & ~twobyte_mask) / 2;
+    size_t fourbyte_count = 32 - count_ones(not_pair_mask) / 2;
+    count += 2 * fourbyte_count + 3 * threebyte_count + 2 * twobyte_count +
+             ascii_count;
   }
-  return latin1_output - start;
+  return count + scalar::utf16::utf8_length_from_utf16<big_endian>(in + pos,
+                                                                   size - pos);
 }
 
-} // namespace utf8_to_latin1
-} // namespace
-} // namespace westmere
+template <endianness big_endian>
+simdutf_really_inline size_t utf32_length_from_utf16(const char16_t *in,
+                                                     size_t size) {
+  return count_code_points<big_endian>(in, size);
+}
+
+simdutf_really_inline void
+change_endianness_utf16(const char16_t *in, size_t size, char16_t *output) {
+  size_t pos = 0;
+
+  while (pos < size / 32 * 32) {
+    simd16x32<uint16_t> input(reinterpret_cast<const uint16_t *>(in + pos));
+    input.swap_bytes();
+    input.store(reinterpret_cast<uint16_t *>(output));
+    pos += 32;
+    output += 32;
+  }
+
+  scalar::utf16::change_endianness_utf16(in + pos, size - pos, output);
+}
+
+} // namespace utf16
+} // unnamed namespace
+} // namespace lasx
 } // namespace simdutf
-  // namespace simdutf
-/* end file src/generic/utf8_to_latin1/valid_utf8_to_latin1.h */
+/* end file src/generic/utf16.h */
 
 //
 // Implementation-specific overrides
 //
-
 namespace simdutf {
-namespace westmere {
+namespace lasx {
 
 simdutf_warn_unused int
 implementation::detect_encodings(const char *input,
@@ -40184,34 +55239,32 @@ implementation::detect_encodings(const char *input,
 
 simdutf_warn_unused bool
 implementation::validate_utf8(const char *buf, size_t len) const noexcept {
-  return westmere::utf8_validation::generic_validate_utf8(buf, len);
+  return lasx::utf8_validation::generic_validate_utf8(buf, len);
 }
 
 simdutf_warn_unused result implementation::validate_utf8_with_errors(
     const char *buf, size_t len) const noexcept {
-  return westmere::utf8_validation::generic_validate_utf8_with_errors(buf, len);
+  return lasx::utf8_validation::generic_validate_utf8_with_errors(buf, len);
 }
 
 simdutf_warn_unused bool
 implementation::validate_ascii(const char *buf, size_t len) const noexcept {
-  return westmere::utf8_validation::generic_validate_ascii(buf, len);
+  return lasx::utf8_validation::generic_validate_ascii(buf, len);
 }
 
 simdutf_warn_unused result implementation::validate_ascii_with_errors(
     const char *buf, size_t len) const noexcept {
-  return westmere::utf8_validation::generic_validate_ascii_with_errors(buf,
-                                                                       len);
+  return lasx::utf8_validation::generic_validate_ascii_with_errors(buf, len);
 }
 
 simdutf_warn_unused bool
 implementation::validate_utf16le(const char16_t *buf,
                                  size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
-    // empty input is valid UTF-16. protect the implementation from
-    // handling nullptr
+    // empty input is valid. protect the implementation from nullptr.
     return true;
   }
-  const char16_t *tail = sse_validate_utf16<endianness::LITTLE>(buf, len);
+  const char16_t *tail = lasx_validate_utf16<endianness::LITTLE>(buf, len);
   if (tail) {
     return scalar::utf16::validate<endianness::LITTLE>(tail,
                                                        len - (tail - buf));
@@ -40224,11 +55277,10 @@ simdutf_warn_unused bool
 implementation::validate_utf16be(const char16_t *buf,
                                  size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
-    // empty input is valid UTF-16. protect the implementation from
-    // handling nullptr
+    // empty input is valid. protect the implementation from nullptr.
     return true;
   }
-  const char16_t *tail = sse_validate_utf16<endianness::BIG>(buf, len);
+  const char16_t *tail = lasx_validate_utf16<endianness::BIG>(buf, len);
   if (tail) {
     return scalar::utf16::validate<endianness::BIG>(tail, len - (tail - buf));
   } else {
@@ -40238,7 +55290,10 @@ implementation::validate_utf16be(const char16_t *buf,
 
 simdutf_warn_unused result implementation::validate_utf16le_with_errors(
     const char16_t *buf, size_t len) const noexcept {
-  result res = sse_validate_utf16_with_errors<endianness::LITTLE>(buf, len);
+  if (simdutf_unlikely(len == 0)) {
+    return result(error_code::SUCCESS, 0);
+  }
+  result res = lasx_validate_utf16_with_errors<endianness::LITTLE>(buf, len);
   if (res.count != len) {
     result scalar_res = scalar::utf16::validate_with_errors<endianness::LITTLE>(
         buf + res.count, len - res.count);
@@ -40250,7 +55305,10 @@ simdutf_warn_unused result implementation::validate_utf16le_with_errors(
 
 simdutf_warn_unused result implementation::validate_utf16be_with_errors(
     const char16_t *buf, size_t len) const noexcept {
-  result res = sse_validate_utf16_with_errors<endianness::BIG>(buf, len);
+  if (simdutf_unlikely(len == 0)) {
+    return result(error_code::SUCCESS, 0);
+  }
+  result res = lasx_validate_utf16_with_errors<endianness::BIG>(buf, len);
   if (res.count != len) {
     result scalar_res = scalar::utf16::validate_with_errors<endianness::BIG>(
         buf + res.count, len - res.count);
@@ -40263,11 +55321,10 @@ simdutf_warn_unused result implementation::validate_utf16be_with_errors(
 simdutf_warn_unused bool
 implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
   if (simdutf_unlikely(len == 0)) {
-    // empty input is valid UTF-32. protect the implementation from
-    // handling nullptr
+    // empty input is valid. protect the implementation from nullptr.
     return true;
   }
-  const char32_t *tail = sse_validate_utf32le(buf, len);
+  const char32_t *tail = lasx_validate_utf32le(buf, len);
   if (tail) {
     return scalar::utf32::validate(tail, len - (tail - buf));
   } else {
@@ -40277,12 +55334,10 @@ implementation::validate_utf32(const char32_t *buf, size_t len) const noexcept {
 
 simdutf_warn_unused result implementation::validate_utf32_with_errors(
     const char32_t *buf, size_t len) const noexcept {
-  if (len == 0) {
-    // empty input is valid UTF-32. protect the implementation from
-    // handling nullptr
+  if (simdutf_unlikely(len == 0)) {
     return result(error_code::SUCCESS, 0);
   }
-  result res = sse_validate_utf32le_with_errors(buf, len);
+  result res = lasx_validate_utf32le_with_errors(buf, len);
   if (res.count != len) {
     result scalar_res =
         scalar::utf32::validate_with_errors(buf + res.count, len - res.count);
@@ -40294,9 +55349,8 @@ simdutf_warn_unused result implementation::validate_utf32_with_errors(
 
 simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
     const char *buf, size_t len, char *utf8_output) const noexcept {
-
   std::pair<const char *, char *> ret =
-      sse_convert_latin1_to_utf8(buf, len, utf8_output);
+      lasx_convert_latin1_to_utf8(buf, len, utf8_output);
   size_t converted_chars = ret.second - utf8_output;
 
   if (ret.first != buf + len) {
@@ -40304,25 +55358,18 @@ simdutf_warn_unused size_t implementation::convert_latin1_to_utf8(
         ret.first, len - (ret.first - buf), ret.second);
     converted_chars += scalar_converted_chars;
   }
-
   return converted_chars;
 }
 
 simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
     const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   std::pair<const char *, char16_t *> ret =
-      sse_convert_latin1_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
+      lasx_convert_latin1_to_utf16le(buf, len, utf16_output);
   size_t converted_chars = ret.second - utf16_output;
   if (ret.first != buf + len) {
     const size_t scalar_converted_chars =
         scalar::latin1_to_utf16::convert<endianness::LITTLE>(
             ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_converted_chars == 0) {
-      return 0;
-    }
     converted_chars += scalar_converted_chars;
   }
   return converted_chars;
@@ -40331,18 +55378,12 @@ simdutf_warn_unused size_t implementation::convert_latin1_to_utf16le(
 simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
     const char *buf, size_t len, char16_t *utf16_output) const noexcept {
   std::pair<const char *, char16_t *> ret =
-      sse_convert_latin1_to_utf16<endianness::BIG>(buf, len, utf16_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
+      lasx_convert_latin1_to_utf16be(buf, len, utf16_output);
   size_t converted_chars = ret.second - utf16_output;
   if (ret.first != buf + len) {
     const size_t scalar_converted_chars =
         scalar::latin1_to_utf16::convert<endianness::BIG>(
             ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_converted_chars == 0) {
-      return 0;
-    }
     converted_chars += scalar_converted_chars;
   }
   return converted_chars;
@@ -40351,17 +55392,11 @@ simdutf_warn_unused size_t implementation::convert_latin1_to_utf16be(
 simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
     const char *buf, size_t len, char32_t *utf32_output) const noexcept {
   std::pair<const char *, char32_t *> ret =
-      sse_convert_latin1_to_utf32(buf, len, utf32_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
+      lasx_convert_latin1_to_utf32(buf, len, utf32_output);
   size_t converted_chars = ret.second - utf32_output;
   if (ret.first != buf + len) {
     const size_t scalar_converted_chars = scalar::latin1_to_utf32::convert(
         ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_converted_chars == 0) {
-      return 0;
-    }
     converted_chars += scalar_converted_chars;
   }
   return converted_chars;
@@ -40369,19 +55404,117 @@ simdutf_warn_unused size_t implementation::convert_latin1_to_utf32(
 
 simdutf_warn_unused size_t implementation::convert_utf8_to_latin1(
     const char *buf, size_t len, char *latin1_output) const noexcept {
+  size_t pos = 0;
+  char *output_start{latin1_output};
+  // Scalar loop until latin1_output is 32-byte aligned (unaligned is slow).
+  while (((uint64_t)latin1_output & 0x1F) && pos < len) {
+    if (buf[pos] & 0x80) {
+      if (pos + 1 >= len)
+        return 0;
+      if ((buf[pos] & 0b11100000) == 0b11000000) {
+        if ((buf[pos + 1] & 0b11000000) != 0b10000000)
+          return 0;
+        uint32_t code_point =
+            (buf[pos] & 0b00011111) << 6 | (buf[pos + 1] & 0b00111111);
+        if (code_point < 0x80 || 0xFF < code_point) {
+          return 0;
+        }
+        *latin1_output++ = char(code_point);
+        pos += 2;
+      } else {
+        return 0;
+      }
+    } else {
+      *latin1_output++ = char(buf[pos]);
+      pos++;
+    }
+  }
+  size_t convert_size = latin1_output - output_start;
+  if (pos == len)
+    return convert_size;
   utf8_to_latin1::validating_transcoder converter;
-  return converter.convert(buf, len, latin1_output);
+  size_t convert_result =
+      converter.convert(buf + pos, len - pos, latin1_output);
+  return convert_result ? convert_size + convert_result : 0;
 }
 
 simdutf_warn_unused result implementation::convert_utf8_to_latin1_with_errors(
     const char *buf, size_t len, char *latin1_output) const noexcept {
+  size_t pos = 0;
+  char *output_start{latin1_output};
+  // Scalar loop until latin1_output is 32-byte aligned (unaligned is slow).
+  while (((uint64_t)latin1_output & 0x1F) && pos < len) {
+    if (buf[pos] & 0x80) {
+      if ((buf[pos] & 0b11100000) == 0b11000000) {
+        if (pos + 1 >= len)
+          return result(error_code::TOO_SHORT, pos);
+        if ((buf[pos + 1] & 0b11000000) != 0b10000000)
+          return result(error_code::TOO_SHORT, pos);
+        uint32_t code_point =
+            (buf[pos] & 0b00011111) << 6 | (buf[pos + 1] & 0b00111111);
+        if (code_point < 0x80)
+          return result(error_code::OVERLONG, pos);
+        if (0xFF < code_point)
+          return result(error_code::TOO_LARGE, pos);
+        *latin1_output++ = char(code_point);
+        pos += 2;
+      } else if ((buf[pos] & 0b11110000) == 0b11100000) {
+        return result(error_code::TOO_LARGE, pos);
+      } else if ((buf[pos] & 0b11111000) == 0b11110000) {
+        return result(error_code::TOO_LARGE, pos);
+      } else {
+        if ((buf[pos] & 0b11000000) == 0b10000000) {
+          return result(error_code::TOO_LONG, pos);
+        }
+        return result(error_code::HEADER_BITS, pos);
+      }
+    } else {
+      *latin1_output++ = char(buf[pos]);
+      pos++;
+    }
+  }
+  size_t convert_size = latin1_output - output_start;
+  if (pos == len)
+    return result(error_code::SUCCESS, convert_size);
+
   utf8_to_latin1::validating_transcoder converter;
-  return converter.convert_with_errors(buf, len, latin1_output);
+  result res =
+      converter.convert_with_errors(buf + pos, len - pos, latin1_output);
+  return res.error ? result(res.error, res.count + pos)
+                   : result(res.error, res.count + convert_size);
 }
 
 simdutf_warn_unused size_t implementation::convert_valid_utf8_to_latin1(
     const char *buf, size_t len, char *latin1_output) const noexcept {
-  return westmere::utf8_to_latin1::convert_valid(buf, len, latin1_output);
+  size_t pos = 0;
+  char *output_start{latin1_output};
+  // Scalar loop until latin1_output is 32-byte aligned (unaligned is slow).
+  while (((uint64_t)latin1_output & 0x1F) && pos < len) {
+    if (buf[pos] & 0x80) {
+      if (pos + 1 >= len)
+        break;
+      if ((buf[pos] & 0b11100000) == 0b11000000) {
+        if ((buf[pos + 1] & 0b11000000) != 0b10000000)
+          return 0;
+        uint32_t code_point =
+            (buf[pos] & 0b00011111) << 6 | (buf[pos + 1] & 0b00111111);
+        *latin1_output++ = char(code_point);
+        pos += 2;
+      } else {
+        return 0;
+      }
+    } else {
+      *latin1_output++ = char(buf[pos]);
+      pos++;
+    }
+  }
+  size_t convert_size = latin1_output - output_start;
+  if (pos == len)
+    return convert_size;
+
+  size_t convert_result =
+      lasx::utf8_to_latin1::convert_valid(buf + pos, len - pos, latin1_output);
+  return convert_result ? convert_size + convert_result : 0;
 }
 
 simdutf_warn_unused size_t implementation::convert_utf8_to_utf16le(
@@ -40441,7 +55574,7 @@ simdutf_warn_unused size_t implementation::convert_valid_utf8_to_utf32(
 simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
     const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   std::pair<const char16_t *, char *> ret =
-      sse_convert_utf16_to_latin1<endianness::LITTLE>(buf, len, latin1_output);
+      lasx_convert_utf16_to_latin1<endianness::LITTLE>(buf, len, latin1_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -40462,7 +55595,7 @@ simdutf_warn_unused size_t implementation::convert_utf16le_to_latin1(
 simdutf_warn_unused size_t implementation::convert_utf16be_to_latin1(
     const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   std::pair<const char16_t *, char *> ret =
-      sse_convert_utf16_to_latin1<endianness::BIG>(buf, len, latin1_output);
+      lasx_convert_utf16_to_latin1<endianness::BIG>(buf, len, latin1_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -40484,7 +55617,7 @@ simdutf_warn_unused result
 implementation::convert_utf16le_to_latin1_with_errors(
     const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   std::pair<result, char *> ret =
-      sse_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
+      lasx_convert_utf16_to_latin1_with_errors<endianness::LITTLE>(
           buf, len, latin1_output);
   if (ret.first.error) {
     return ret.first;
@@ -40511,8 +55644,8 @@ simdutf_warn_unused result
 implementation::convert_utf16be_to_latin1_with_errors(
     const char16_t *buf, size_t len, char *latin1_output) const noexcept {
   std::pair<result, char *> ret =
-      sse_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len,
-                                                               latin1_output);
+      lasx_convert_utf16_to_latin1_with_errors<endianness::BIG>(buf, len,
+                                                                latin1_output);
   if (ret.first.error) {
     return ret.first;
   } // Can return directly since scalar fallback already found correct
@@ -40536,20 +55669,20 @@ implementation::convert_utf16be_to_latin1_with_errors(
 
 simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_latin1(
     const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  // optimization opportunity: we could provide an optimized function.
+  // optimization opportunity: implement a custom function.
   return convert_utf16be_to_latin1(buf, len, latin1_output);
 }
 
 simdutf_warn_unused size_t implementation::convert_valid_utf16le_to_latin1(
     const char16_t *buf, size_t len, char *latin1_output) const noexcept {
-  // optimization opportunity: we could provide an optimized function.
+  // optimization opportunity: implement a custom function.
   return convert_utf16le_to_latin1(buf, len, latin1_output);
 }
 
 simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
     const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   std::pair<const char16_t *, char *> ret =
-      sse_convert_utf16_to_utf8<endianness::LITTLE>(buf, len, utf8_output);
+      lasx_convert_utf16_to_utf8<endianness::LITTLE>(buf, len, utf8_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -40569,7 +55702,7 @@ simdutf_warn_unused size_t implementation::convert_utf16le_to_utf8(
 simdutf_warn_unused size_t implementation::convert_utf16be_to_utf8(
     const char16_t *buf, size_t len, char *utf8_output) const noexcept {
   std::pair<const char16_t *, char *> ret =
-      sse_convert_utf16_to_utf8<endianness::BIG>(buf, len, utf8_output);
+      lasx_convert_utf16_to_utf8<endianness::BIG>(buf, len, utf8_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -40591,8 +55724,8 @@ simdutf_warn_unused result implementation::convert_utf16le_to_utf8_with_errors(
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char *> ret =
-      westmere::sse_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(
-          buf, len, utf8_output);
+      lasx_convert_utf16_to_utf8_with_errors<endianness::LITTLE>(buf, len,
+                                                                 utf8_output);
   if (ret.first.error) {
     return ret.first;
   } // Can return directly since scalar fallback already found correct
@@ -40619,8 +55752,8 @@ simdutf_warn_unused result implementation::convert_utf16be_to_utf8_with_errors(
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char *> ret =
-      westmere::sse_convert_utf16_to_utf8_with_errors<endianness::BIG>(
-          buf, len, utf8_output);
+      lasx_convert_utf16_to_utf8_with_errors<endianness::BIG>(buf, len,
+                                                              utf8_output);
   if (ret.first.error) {
     return ret.first;
   } // Can return directly since scalar fallback already found correct
@@ -40652,59 +55785,13 @@ simdutf_warn_unused size_t implementation::convert_valid_utf16be_to_utf8(
   return convert_utf16be_to_utf8(buf, len, utf8_output);
 }
 
-simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
-    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
-  std::pair<const char32_t *, char *> ret =
-      sse_convert_utf32_to_latin1(buf, len, latin1_output);
-  if (ret.first == nullptr) {
-    return 0;
-  }
-  size_t saved_bytes = ret.second - latin1_output;
-  // if (ret.first != buf + len) {
-  if (ret.first < buf + len) {
-    const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert(
-        ret.first, len - (ret.first - buf), ret.second);
-    if (scalar_saved_bytes == 0) {
-      return 0;
-    }
-    saved_bytes += scalar_saved_bytes;
-  }
-  return saved_bytes;
-}
-
-simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
-    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
-  // ret.first.count is always the position in the buffer, not the number of
-  // code units written even if finished
-  std::pair<result, char *> ret =
-      westmere::sse_convert_utf32_to_latin1_with_errors(buf, len,
-                                                        latin1_output);
-  if (ret.first.count != len) {
-    result scalar_res = scalar::utf32_to_latin1::convert_with_errors(
-        buf + ret.first.count, len - ret.first.count, ret.second);
-    if (scalar_res.error) {
-      scalar_res.count += ret.first.count;
-      return scalar_res;
-    } else {
-      ret.second += scalar_res.count;
-    }
-  }
-  ret.first.count =
-      ret.second -
-      latin1_output; // Set count to the number of 8-bit code units written
-  return ret.first;
-}
-
-simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
-    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
-  // optimization opportunity: we could provide an optimized function.
-  return convert_utf32_to_latin1(buf, len, latin1_output);
-}
-
 simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
     const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return 0;
+  }
   std::pair<const char32_t *, char *> ret =
-      sse_convert_utf32_to_utf8(buf, len, utf8_output);
+      lasx_convert_utf32_to_utf8(buf, len, utf8_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -40722,10 +55809,13 @@ simdutf_warn_unused size_t implementation::convert_utf32_to_utf8(
 
 simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
     const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  if (simdutf_unlikely(len == 0)) {
+    return result(error_code::SUCCESS, 0);
+  }
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char *> ret =
-      westmere::sse_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
+      lasx_convert_utf32_to_utf8_with_errors(buf, len, utf8_output);
   if (ret.first.count != len) {
     result scalar_res = scalar::utf32_to_utf8::convert_with_errors(
         buf + ret.first.count, len - ret.first.count, ret.second);
@@ -40745,7 +55835,7 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf8_with_errors(
 simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
     const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
   std::pair<const char16_t *, char32_t *> ret =
-      sse_convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
+      lasx_convert_utf16_to_utf32<endianness::LITTLE>(buf, len, utf32_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -40765,7 +55855,7 @@ simdutf_warn_unused size_t implementation::convert_utf16le_to_utf32(
 simdutf_warn_unused size_t implementation::convert_utf16be_to_utf32(
     const char16_t *buf, size_t len, char32_t *utf32_output) const noexcept {
   std::pair<const char16_t *, char32_t *> ret =
-      sse_convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
+      lasx_convert_utf16_to_utf32<endianness::BIG>(buf, len, utf32_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -40787,8 +55877,8 @@ simdutf_warn_unused result implementation::convert_utf16le_to_utf32_with_errors(
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char32_t *> ret =
-      westmere::sse_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(
-          buf, len, utf32_output);
+      lasx_convert_utf16_to_utf32_with_errors<endianness::LITTLE>(buf, len,
+                                                                  utf32_output);
   if (ret.first.error) {
     return ret.first;
   } // Can return directly since scalar fallback already found correct
@@ -40815,8 +55905,8 @@ simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char32_t *> ret =
-      westmere::sse_convert_utf16_to_utf32_with_errors<endianness::BIG>(
-          buf, len, utf32_output);
+      lasx_convert_utf16_to_utf32_with_errors<endianness::BIG>(buf, len,
+                                                               utf32_output);
   if (ret.first.error) {
     return ret.first;
   } // Can return directly since scalar fallback already found correct
@@ -40838,15 +55928,77 @@ simdutf_warn_unused result implementation::convert_utf16be_to_utf32_with_errors(
   return ret.first;
 }
 
+simdutf_warn_unused size_t implementation::convert_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      lasx_convert_utf32_to_latin1(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - latin1_output;
+
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert(
+        ret.first, len - (ret.first - buf), ret.second);
+    if (scalar_saved_bytes == 0) {
+      return 0;
+    }
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
+simdutf_warn_unused result implementation::convert_utf32_to_latin1_with_errors(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<result, char *> ret =
+      lasx_convert_utf32_to_latin1_with_errors(buf, len, latin1_output);
+  if (ret.first.error) {
+    return ret.first;
+  } // Can return directly since scalar fallback already found correct
+    // ret.first.count
+  if (ret.first.count != len) { // All good so far, but not finished
+    result scalar_res = scalar::utf32_to_latin1::convert_with_errors(
+        buf + ret.first.count, len - ret.first.count, ret.second);
+    if (scalar_res.error) {
+      scalar_res.count += ret.first.count;
+      return scalar_res;
+    } else {
+      ret.second += scalar_res.count;
+    }
+  }
+  ret.first.count =
+      ret.second -
+      latin1_output; // Set count to the number of 8-bit code units written
+  return ret.first;
+}
+
+simdutf_warn_unused size_t implementation::convert_valid_utf32_to_latin1(
+    const char32_t *buf, size_t len, char *latin1_output) const noexcept {
+  std::pair<const char32_t *, char *> ret =
+      lasx_convert_utf32_to_latin1(buf, len, latin1_output);
+  if (ret.first == nullptr) {
+    return 0;
+  }
+  size_t saved_bytes = ret.second - latin1_output;
+
+  if (ret.first != buf + len) {
+    const size_t scalar_saved_bytes = scalar::utf32_to_latin1::convert_valid(
+        ret.first, len - (ret.first - buf), ret.second);
+    saved_bytes += scalar_saved_bytes;
+  }
+  return saved_bytes;
+}
+
 simdutf_warn_unused size_t implementation::convert_valid_utf32_to_utf8(
     const char32_t *buf, size_t len, char *utf8_output) const noexcept {
+  // optimization opportunity: implement a custom function.
   return convert_utf32_to_utf8(buf, len, utf8_output);
 }
 
 simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
     const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
   std::pair<const char32_t *, char16_t *> ret =
-      sse_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
+      lasx_convert_utf32_to_utf16<endianness::LITTLE>(buf, len, utf16_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -40860,13 +56012,14 @@ simdutf_warn_unused size_t implementation::convert_utf32_to_utf16le(
     }
     saved_bytes += scalar_saved_bytes;
   }
+
   return saved_bytes;
 }
 
 simdutf_warn_unused size_t implementation::convert_utf32_to_utf16be(
     const char32_t *buf, size_t len, char16_t *utf16_output) const noexcept {
   std::pair<const char32_t *, char16_t *> ret =
-      sse_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
+      lasx_convert_utf32_to_utf16<endianness::BIG>(buf, len, utf16_output);
   if (ret.first == nullptr) {
     return 0;
   }
@@ -40888,8 +56041,8 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf16le_with_errors(
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char16_t *> ret =
-      westmere::sse_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(
-          buf, len, utf16_output);
+      lasx_convert_utf32_to_utf16_with_errors<endianness::LITTLE>(buf, len,
+                                                                  utf16_output);
   if (ret.first.count != len) {
     result scalar_res =
         scalar::utf32_to_utf16::convert_with_errors<endianness::LITTLE>(
@@ -40912,8 +56065,8 @@ simdutf_warn_unused result implementation::convert_utf32_to_utf16be_with_errors(
   // ret.first.count is always the position in the buffer, not the number of
   // code units written even if finished
   std::pair<result, char16_t *> ret =
-      westmere::sse_convert_utf32_to_utf16_with_errors<endianness::BIG>(
-          buf, len, utf16_output);
+      lasx_convert_utf32_to_utf16_with_errors<endianness::BIG>(buf, len,
+                                                               utf16_output);
   if (ret.first.count != len) {
     result scalar_res =
         scalar::utf32_to_utf16::convert_with_errors<endianness::BIG>(
@@ -40969,7 +56122,23 @@ simdutf_warn_unused size_t implementation::count_utf16be(
 
 simdutf_warn_unused size_t
 implementation::count_utf8(const char *input, size_t length) const noexcept {
-  return utf8::count_code_points(input, length);
+  size_t pos = 0;
+  size_t count = 0;
+  // Performance degradation when memory address is not 32-byte aligned
+  while ((((uint64_t)input + pos) & 0x1F && pos < length)) {
+    if (input[pos++] > -65) {
+      count++;
+    }
+  }
+  __m256i v_bf = __lasx_xvldi(0xBF); // 0b10111111
+  for (; pos + 32 <= length; pos += 32) {
+    __m256i in = __lasx_xvld(reinterpret_cast<const int8_t *>(input + pos), 0);
+    __m256i utf8_count =
+        __lasx_xvpcnt_h(__lasx_xvmskltz_b(__lasx_xvslt_b(v_bf, in)));
+    count = count + __lasx_xvpickve2gr_wu(utf8_count, 0) +
+            __lasx_xvpickve2gr_wu(utf8_count, 4);
+  }
+  return count + scalar::utf8::count_code_points(input + pos, length - pos);
 }
 
 simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
@@ -40979,12 +56148,29 @@ simdutf_warn_unused size_t implementation::latin1_length_from_utf8(
 
 simdutf_warn_unused size_t
 implementation::latin1_length_from_utf16(size_t length) const noexcept {
-  return scalar::utf16::latin1_length_from_utf16(length);
+  return length;
 }
 
 simdutf_warn_unused size_t
 implementation::latin1_length_from_utf32(size_t length) const noexcept {
-  return scalar::utf32::latin1_length_from_utf32(length);
+  return length;
+}
+
+simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
+    const char *input, size_t length) const noexcept {
+  const uint8_t *data = reinterpret_cast<const uint8_t *>(input);
+  const uint8_t *data_end = data + length;
+  uint64_t result = 0;
+  while (data + 16 < data_end) {
+    uint64_t two_bytes = 0;
+    __m128i input_vec = __lsx_vld(data, 0);
+    two_bytes =
+        __lsx_vpickve2gr_hu(__lsx_vpcnt_h(__lsx_vmskltz_b(input_vec)), 0);
+    result += 16 + two_bytes;
+    data += 16;
+  }
+  return result + scalar::latin1::utf8_length_from_latin1((const char *)data,
+                                                          data_end - data);
 }
 
 simdutf_warn_unused size_t implementation::utf8_length_from_utf16le(
@@ -40999,72 +56185,12 @@ simdutf_warn_unused size_t implementation::utf8_length_from_utf16be(
 
 simdutf_warn_unused size_t
 implementation::utf16_length_from_latin1(size_t length) const noexcept {
-  return scalar::latin1::utf16_length_from_latin1(length);
+  return length;
 }
 
 simdutf_warn_unused size_t
 implementation::utf32_length_from_latin1(size_t length) const noexcept {
-  return scalar::latin1::utf32_length_from_latin1(length);
-}
-
-simdutf_warn_unused size_t implementation::utf8_length_from_latin1(
-    const char *input, size_t len) const noexcept {
-  const uint8_t *str = reinterpret_cast<const uint8_t *>(input);
-  size_t answer = len / sizeof(__m128i) * sizeof(__m128i);
-  size_t i = 0;
-  if (answer >= 2048) { // long strings optimization
-    __m128i two_64bits = _mm_setzero_si128();
-    while (i + sizeof(__m128i) <= len) {
-      __m128i runner = _mm_setzero_si128();
-      size_t iterations = (len - i) / sizeof(__m128i);
-      if (iterations > 255) {
-        iterations = 255;
-      }
-      size_t max_i = i + iterations * sizeof(__m128i) - sizeof(__m128i);
-      for (; i + 4 * sizeof(__m128i) <= max_i; i += 4 * sizeof(__m128i)) {
-        __m128i input1 = _mm_loadu_si128((const __m128i *)(str + i));
-        __m128i input2 =
-            _mm_loadu_si128((const __m128i *)(str + i + sizeof(__m128i)));
-        __m128i input3 =
-            _mm_loadu_si128((const __m128i *)(str + i + 2 * sizeof(__m128i)));
-        __m128i input4 =
-            _mm_loadu_si128((const __m128i *)(str + i + 3 * sizeof(__m128i)));
-        __m128i input12 =
-            _mm_add_epi8(_mm_cmpgt_epi8(_mm_setzero_si128(), input1),
-                         _mm_cmpgt_epi8(_mm_setzero_si128(), input2));
-        __m128i input34 =
-            _mm_add_epi8(_mm_cmpgt_epi8(_mm_setzero_si128(), input3),
-                         _mm_cmpgt_epi8(_mm_setzero_si128(), input4));
-        __m128i input1234 = _mm_add_epi8(input12, input34);
-        runner = _mm_sub_epi8(runner, input1234);
-      }
-      for (; i <= max_i; i += sizeof(__m128i)) {
-        __m128i more_input = _mm_loadu_si128((const __m128i *)(str + i));
-        runner = _mm_sub_epi8(runner,
-                              _mm_cmpgt_epi8(_mm_setzero_si128(), more_input));
-      }
-      two_64bits =
-          _mm_add_epi64(two_64bits, _mm_sad_epu8(runner, _mm_setzero_si128()));
-    }
-    answer +=
-        _mm_extract_epi64(two_64bits, 0) + _mm_extract_epi64(two_64bits, 1);
-  } else if (answer > 0) { // short string optimization
-    for (; i + 2 * sizeof(__m128i) <= len; i += 2 * sizeof(__m128i)) {
-      __m128i latin = _mm_loadu_si128((const __m128i *)(input + i));
-      uint16_t non_ascii = (uint16_t)_mm_movemask_epi8(latin);
-      answer += count_ones(non_ascii);
-      latin = _mm_loadu_si128((const __m128i *)(input + i) + 1);
-      non_ascii = (uint16_t)_mm_movemask_epi8(latin);
-      answer += count_ones(non_ascii);
-    }
-    for (; i + sizeof(__m128i) <= len; i += sizeof(__m128i)) {
-      __m128i latin = _mm_loadu_si128((const __m128i *)(input + i));
-      uint16_t non_ascii = (uint16_t)_mm_movemask_epi8(latin);
-      answer += count_ones(non_ascii);
-    }
-  }
-  return answer + scalar::latin1::utf8_length_from_latin1(
-                      reinterpret_cast<const char *>(str + i), len - i);
+  return length;
 }
 
 simdutf_warn_unused size_t implementation::utf32_length_from_utf16le(
@@ -41084,35 +56210,35 @@ simdutf_warn_unused size_t implementation::utf16_length_from_utf8(
 
 simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
     const char32_t *input, size_t length) const noexcept {
-  const __m128i v_00000000 = _mm_setzero_si128();
-  const __m128i v_ffffff80 = _mm_set1_epi32((uint32_t)0xffffff80);
-  const __m128i v_fffff800 = _mm_set1_epi32((uint32_t)0xfffff800);
-  const __m128i v_ffff0000 = _mm_set1_epi32((uint32_t)0xffff0000);
+  __m256i v_80 = __lasx_xvrepli_w(0x80); /*0x00000080*/
+  __m256i v_800 = __lasx_xvldi(-3832);   /*0x00000800*/
+  __m256i v_10000 = __lasx_xvldi(-3583); /*0x00010000*/
   size_t pos = 0;
   size_t count = 0;
-  for (; pos + 4 <= length; pos += 4) {
-    __m128i in = _mm_loadu_si128((__m128i *)(input + pos));
-    const __m128i ascii_bytes_bytemask =
-        _mm_cmpeq_epi32(_mm_and_si128(in, v_ffffff80), v_00000000);
-    const __m128i one_two_bytes_bytemask =
-        _mm_cmpeq_epi32(_mm_and_si128(in, v_fffff800), v_00000000);
-    const __m128i two_bytes_bytemask =
-        _mm_xor_si128(one_two_bytes_bytemask, ascii_bytes_bytemask);
-    const __m128i one_two_three_bytes_bytemask =
-        _mm_cmpeq_epi32(_mm_and_si128(in, v_ffff0000), v_00000000);
-    const __m128i three_bytes_bytemask =
-        _mm_xor_si128(one_two_three_bytes_bytemask, one_two_bytes_bytemask);
-    const uint16_t ascii_bytes_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(ascii_bytes_bytemask));
-    const uint16_t two_bytes_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(two_bytes_bytemask));
-    const uint16_t three_bytes_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(three_bytes_bytemask));
-
-    size_t ascii_count = count_ones(ascii_bytes_bitmask) / 4;
-    size_t two_bytes_count = count_ones(two_bytes_bitmask) / 4;
-    size_t three_bytes_count = count_ones(three_bytes_bitmask) / 4;
-    count += 16 - 3 * ascii_count - 2 * two_bytes_count - three_bytes_count;
+  for (; pos + 8 <= length; pos += 8) {
+    __m256i in =
+        __lasx_xvld(reinterpret_cast<const uint32_t *>(input + pos), 0);
+    __m256i ascii_bytes_bytemask = __lasx_xvslt_w(in, v_80);
+    __m256i one_two_bytes_bytemask = __lasx_xvslt_w(in, v_800);
+    __m256i two_bytes_bytemask =
+        __lasx_xvxor_v(one_two_bytes_bytemask, ascii_bytes_bytemask);
+    __m256i three_bytes_bytemask =
+        __lasx_xvxor_v(__lasx_xvslt_w(in, v_10000), one_two_bytes_bytemask);
+
+    __m256i ascii_bytes =
+        __lasx_xvpcnt_w(__lasx_xvmskltz_w(ascii_bytes_bytemask));
+    const uint32_t ascii_bytes_count = __lasx_xvpickve2gr_wu(ascii_bytes, 0) +
+                                       __lasx_xvpickve2gr_wu(ascii_bytes, 4);
+    __m256i two_bytes = __lasx_xvpcnt_w(__lasx_xvmskltz_w(two_bytes_bytemask));
+    const uint32_t two_bytes_count = __lasx_xvpickve2gr_wu(two_bytes, 0) +
+                                     __lasx_xvpickve2gr_wu(two_bytes, 4);
+    __m256i three_bytes =
+        __lasx_xvpcnt_w(__lasx_xvmskltz_w(three_bytes_bytemask));
+    const uint32_t three_bytes_count = __lasx_xvpickve2gr_wu(three_bytes, 0) +
+                                       __lasx_xvpickve2gr_wu(three_bytes, 4);
+
+    count +=
+        32 - 3 * ascii_bytes_count - 2 * two_bytes_count - three_bytes_count;
   }
   return count +
          scalar::utf32::utf8_length_from_utf32(input + pos, length - pos);
@@ -41120,17 +56246,14 @@ simdutf_warn_unused size_t implementation::utf8_length_from_utf32(
 
 simdutf_warn_unused size_t implementation::utf16_length_from_utf32(
     const char32_t *input, size_t length) const noexcept {
-  const __m128i v_00000000 = _mm_setzero_si128();
-  const __m128i v_ffff0000 = _mm_set1_epi32((uint32_t)0xffff0000);
+  __m128i v_ffff = __lsx_vldi(-2304); /*0x0000ffff*/
   size_t pos = 0;
   size_t count = 0;
   for (; pos + 4 <= length; pos += 4) {
-    __m128i in = _mm_loadu_si128((__m128i *)(input + pos));
-    const __m128i surrogate_bytemask =
-        _mm_cmpeq_epi32(_mm_and_si128(in, v_ffff0000), v_00000000);
-    const uint16_t surrogate_bitmask =
-        static_cast<uint16_t>(_mm_movemask_epi8(surrogate_bytemask));
-    size_t surrogate_count = (16 - count_ones(surrogate_bitmask)) / 4;
+    __m128i in = __lsx_vld(reinterpret_cast<const uint32_t *>(input + pos), 0);
+    __m128i surrogate_bytemask = __lsx_vslt_wu(v_ffff, in);
+    size_t surrogate_count = __lsx_vpickve2gr_bu(
+        __lsx_vpcnt_b(__lsx_vmskltz_w(surrogate_bytemask)), 0);
     count += 4 + surrogate_count;
   }
   return count +
@@ -41206,18 +56329,12 @@ size_t implementation::binary_to_base64(const char *input, size_t length,
     return encode_base64<false>(output, input, length, options);
   }
 }
-} // namespace westmere
+} // namespace lasx
 } // namespace simdutf
 
-/* begin file src/simdutf/westmere/end.h */
-#if SIMDUTF_CAN_ALWAYS_RUN_WESTMERE
-// nothing needed.
-#else
-SIMDUTF_UNTARGET_REGION
-#endif
-
-/* end file src/simdutf/westmere/end.h */
-/* end file src/westmere/implementation.cpp */
+/* begin file src/simdutf/lasx/end.h */
+/* end file src/simdutf/lasx/end.h */
+/* end file src/lasx/implementation.cpp */
 #endif
 
 SIMDUTF_POP_DISABLE_WARNINGS
diff --git a/deps/simdutf/simdutf.h b/deps/simdutf/simdutf.h
index 5f82ca372ccfe3..2d984f40e7bc3f 100644
--- a/deps/simdutf/simdutf.h
+++ b/deps/simdutf/simdutf.h
@@ -1,4 +1,4 @@
-/* auto-generated on 2024-11-21 10:33:28 -0500. Do not edit! */
+/* auto-generated on 2024-12-10 14:54:53 -0500. Do not edit! */
 /* begin file include/simdutf.h */
 #ifndef SIMDUTF_H
 #define SIMDUTF_H
@@ -178,7 +178,12 @@
   #endif
 
 #elif defined(__loongarch_lp64)
-// LoongArch 64-bit
+  #if defined(__loongarch_sx) && defined(__loongarch_asx)
+    #define SIMDUTF_IS_LSX 1
+    #define SIMDUTF_IS_LASX 1
+  #elif defined(__loongarch_sx)
+    #define SIMDUTF_IS_LSX 1
+  #endif
 #else
   // The simdutf library is designed
   // for 64-bit processors and it seems that you are not
@@ -670,7 +675,7 @@ SIMDUTF_DISABLE_UNDESIRED_WARNINGS
 #define SIMDUTF_SIMDUTF_VERSION_H
 
 /** The version of simdutf being used (major.minor.revision) */
-#define SIMDUTF_VERSION "5.6.3"
+#define SIMDUTF_VERSION "5.6.4"
 
 namespace simdutf {
 enum {
@@ -685,7 +690,7 @@ enum {
   /**
    * The revision (major.minor.REVISION) of simdutf being used.
    */
-  SIMDUTF_VERSION_REVISION = 3
+  SIMDUTF_VERSION_REVISION = 4
 };
 } // namespace simdutf
 
@@ -796,6 +801,8 @@ enum instruction_set {
   AVX512VPOPCNTDQ = 0x2000,
   RVV = 0x4000,
   ZVBB = 0x8000,
+  LSX = 0x40000,
+  LASX = 0x80000,
 };
 
 #if defined(__PPC64__)
@@ -987,6 +994,28 @@ static inline uint32_t detect_supported_architectures() {
   }
   return host_isa;
 }
+#elif defined(__loongarch__)
+  #if defined(__linux__)
+    #include <sys/auxv.h>
+  // bits/hwcap.h
+  // #define HWCAP_LOONGARCH_LSX             (1 << 4)
+  // #define HWCAP_LOONGARCH_LASX            (1 << 5)
+  #endif
+
+static inline uint32_t detect_supported_architectures() {
+  uint32_t host_isa = instruction_set::DEFAULT;
+  #if defined(__linux__)
+  uint64_t hwcap = 0;
+  hwcap = getauxval(AT_HWCAP);
+  if (hwcap & HWCAP_LOONGARCH_LSX) {
+    host_isa |= instruction_set::LSX;
+  }
+  if (hwcap & HWCAP_LOONGARCH_LASX) {
+    host_isa |= instruction_set::LASX;
+  }
+  #endif
+  return host_isa;
+}
 #else // fallback
 
 // includes 32-bit ARM.

From f67147ec478d00b48228d4b0f40250de4369bd9b Mon Sep 17 00:00:00 2001
From: "Node.js GitHub Bot" <github-bot@iojs.org>
Date: Tue, 17 Dec 2024 08:20:20 -0500
Subject: [PATCH 203/216] tools: update github_reporter to 1.7.2

PR-URL: https://github.com/nodejs/node/pull/56205
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
---
 tools/github_reporter/index.js     | 1893 ++++++++++++++++++++--------
 tools/github_reporter/package.json |    2 +-
 2 files changed, 1355 insertions(+), 540 deletions(-)
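
Note on the main behavioural change in this update: the bundled uuid
package is dropped, and prepareKeyValueMessage() now builds its delimiter
with crypto.randomUUID() from Node.js core. Below is a minimal sketch of
that function for readers who want to try it in isolation. The local
toCommandValue() helper is a hypothetical stand-in for utils_1.toCommandValue,
and the second delimiter check and the return format are assumptions taken
from the upstream @actions/core heredoc convention; neither is shown in
this diff.

```javascript
"use strict";
const crypto = require("crypto");
const os = require("os");

// Hypothetical stand-in for utils_1.toCommandValue (not part of this diff):
// null/undefined become "", strings pass through, anything else is JSON.
function toCommandValue(input) {
  if (input === null || input === undefined) {
    return "";
  }
  if (typeof input === "string" || input instanceof String) {
    return input;
  }
  return JSON.stringify(input);
}

// Sketch of prepareKeyValueMessage after the update: the random delimiter
// comes from crypto.randomUUID() rather than uuid.v4().
function prepareKeyValueMessage(key, value) {
  const delimiter = `ghadelimiter_${crypto.randomUUID()}`;
  const convertedValue = toCommandValue(value);
  if (key.includes(delimiter)) {
    throw new Error(`Unexpected input: name should not contain the delimiter "${delimiter}"`);
  }
  // Assumed from upstream, not shown in this diff: same check on the value.
  if (convertedValue.includes(delimiter)) {
    throw new Error(`Unexpected input: value should not contain the delimiter "${delimiter}"`);
  }
  // Assumed heredoc-style layout: key<<delimiter, value, delimiter.
  return `${key}<<${delimiter}${os.EOL}${convertedValue}${os.EOL}${delimiter}`;
}

// The delimiter differs on every call, so only the message shape is stable.
console.log(prepareKeyValueMessage("summary", "all tests passed"));
```

Both calls yield a random version 4 UUID string, so the delimiter keeps the
same form while the reporter no longer has to bundle a separate dependency.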

diff --git a/tools/github_reporter/index.js b/tools/github_reporter/index.js
index 9abd7e231e1421..3bab152c932229 100644
--- a/tools/github_reporter/index.js
+++ b/tools/github_reporter/index.js
@@ -1,37 +1,8 @@
 "use strict";
-var __create = Object.create;
-var __defProp = Object.defineProperty;
-var __getOwnPropDesc = Object.getOwnPropertyDescriptor;
 var __getOwnPropNames = Object.getOwnPropertyNames;
-var __getProtoOf = Object.getPrototypeOf;
-var __hasOwnProp = Object.prototype.hasOwnProperty;
-var __esm = (fn, res) => function __init() {
-  return fn && (res = (0, fn[__getOwnPropNames(fn)[0]])(fn = 0)), res;
-};
 var __commonJS = (cb, mod) => function __require() {
   return mod || (0, cb[__getOwnPropNames(cb)[0]])((mod = { exports: {} }).exports, mod), mod.exports;
 };
-var __export = (target, all) => {
-  for (var name in all)
-    __defProp(target, name, { get: all[name], enumerable: true });
-};
-var __copyProps = (to, from, except, desc) => {
-  if (from && typeof from === "object" || typeof from === "function") {
-    for (let key of __getOwnPropNames(from))
-      if (!__hasOwnProp.call(to, key) && key !== except)
-        __defProp(to, key, { get: () => from[key], enumerable: !(desc = __getOwnPropDesc(from, key)) || desc.enumerable });
-  }
-  return to;
-};
-var __toESM = (mod, isNodeMode, target) => (target = mod != null ? __create(__getProtoOf(mod)) : {}, __copyProps(
-  // If the importer is in node compatibility mode or this is not an ESM
-  // file that has been converted to a CommonJS file using a Babel-
-  // compatible transform (i.e. "__esModule" has not been set), then set
-  // "default" to the CommonJS "module.exports" for node compatibility.
-  isNodeMode || !mod || !mod.__esModule ? __defProp(target, "default", { value: mod, enumerable: true }) : target,
-  mod
-));
-var __toCommonJS = (mod) => __copyProps(__defProp({}, "__esModule", { value: true }), mod);
 
 // node_modules/@reporters/github/node_modules/@actions/core/lib/utils.js
 var require_utils = __commonJS({
@@ -72,9 +43,13 @@ var require_command = __commonJS({
     var __createBinding = exports2 && exports2.__createBinding || (Object.create ? function(o, m, k, k2) {
       if (k2 === void 0)
         k2 = k;
-      Object.defineProperty(o, k2, { enumerable: true, get: function() {
-        return m[k];
-      } });
+      var desc = Object.getOwnPropertyDescriptor(m, k);
+      if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+        desc = { enumerable: true, get: function() {
+          return m[k];
+        } };
+      }
+      Object.defineProperty(o, k2, desc);
     } : function(o, m, k, k2) {
       if (k2 === void 0)
         k2 = k;
@@ -91,7 +66,7 @@ var require_command = __commonJS({
       var result = {};
       if (mod != null) {
         for (var k in mod)
-          if (k !== "default" && Object.hasOwnProperty.call(mod, k))
+          if (k !== "default" && Object.prototype.hasOwnProperty.call(mod, k))
             __createBinding(result, mod, k);
       }
       __setModuleDefault(result, mod);
@@ -144,345 +119,12 @@ var require_command = __commonJS({
       }
     };
     function escapeData(s) {
-      return utils_1.toCommandValue(s).replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
+      return (0, utils_1.toCommandValue)(s).replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
     }
     function escapeProperty(s) {
-      return utils_1.toCommandValue(s).replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A").replace(/:/g, "%3A").replace(/,/g, "%2C");
-    }
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/rng.js
-function rng() {
-  if (poolPtr > rnds8Pool.length - 16) {
-    import_crypto.default.randomFillSync(rnds8Pool);
-    poolPtr = 0;
-  }
-  return rnds8Pool.slice(poolPtr, poolPtr += 16);
-}
-var import_crypto, rnds8Pool, poolPtr;
-var init_rng = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/rng.js"() {
-    import_crypto = __toESM(require("crypto"));
-    rnds8Pool = new Uint8Array(256);
-    poolPtr = rnds8Pool.length;
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/regex.js
-var regex_default;
-var init_regex = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/regex.js"() {
-    regex_default = /^(?:[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}|00000000-0000-0000-0000-000000000000)$/i;
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/validate.js
-function validate(uuid) {
-  return typeof uuid === "string" && regex_default.test(uuid);
-}
-var validate_default;
-var init_validate = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/validate.js"() {
-    init_regex();
-    validate_default = validate;
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/stringify.js
-function stringify(arr, offset = 0) {
-  const uuid = (byteToHex[arr[offset + 0]] + byteToHex[arr[offset + 1]] + byteToHex[arr[offset + 2]] + byteToHex[arr[offset + 3]] + "-" + byteToHex[arr[offset + 4]] + byteToHex[arr[offset + 5]] + "-" + byteToHex[arr[offset + 6]] + byteToHex[arr[offset + 7]] + "-" + byteToHex[arr[offset + 8]] + byteToHex[arr[offset + 9]] + "-" + byteToHex[arr[offset + 10]] + byteToHex[arr[offset + 11]] + byteToHex[arr[offset + 12]] + byteToHex[arr[offset + 13]] + byteToHex[arr[offset + 14]] + byteToHex[arr[offset + 15]]).toLowerCase();
-  if (!validate_default(uuid)) {
-    throw TypeError("Stringified UUID is invalid");
-  }
-  return uuid;
-}
-var byteToHex, stringify_default;
-var init_stringify = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/stringify.js"() {
-    init_validate();
-    byteToHex = [];
-    for (let i = 0; i < 256; ++i) {
-      byteToHex.push((i + 256).toString(16).substr(1));
-    }
-    stringify_default = stringify;
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/v1.js
-function v1(options, buf, offset) {
-  let i = buf && offset || 0;
-  const b = buf || new Array(16);
-  options = options || {};
-  let node = options.node || _nodeId;
-  let clockseq = options.clockseq !== void 0 ? options.clockseq : _clockseq;
-  if (node == null || clockseq == null) {
-    const seedBytes = options.random || (options.rng || rng)();
-    if (node == null) {
-      node = _nodeId = [seedBytes[0] | 1, seedBytes[1], seedBytes[2], seedBytes[3], seedBytes[4], seedBytes[5]];
-    }
-    if (clockseq == null) {
-      clockseq = _clockseq = (seedBytes[6] << 8 | seedBytes[7]) & 16383;
+      return (0, utils_1.toCommandValue)(s).replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A").replace(/:/g, "%3A").replace(/,/g, "%2C");
     }
   }
-  let msecs = options.msecs !== void 0 ? options.msecs : Date.now();
-  let nsecs = options.nsecs !== void 0 ? options.nsecs : _lastNSecs + 1;
-  const dt = msecs - _lastMSecs + (nsecs - _lastNSecs) / 1e4;
-  if (dt < 0 && options.clockseq === void 0) {
-    clockseq = clockseq + 1 & 16383;
-  }
-  if ((dt < 0 || msecs > _lastMSecs) && options.nsecs === void 0) {
-    nsecs = 0;
-  }
-  if (nsecs >= 1e4) {
-    throw new Error("uuid.v1(): Can't create more than 10M uuids/sec");
-  }
-  _lastMSecs = msecs;
-  _lastNSecs = nsecs;
-  _clockseq = clockseq;
-  msecs += 122192928e5;
-  const tl = ((msecs & 268435455) * 1e4 + nsecs) % 4294967296;
-  b[i++] = tl >>> 24 & 255;
-  b[i++] = tl >>> 16 & 255;
-  b[i++] = tl >>> 8 & 255;
-  b[i++] = tl & 255;
-  const tmh = msecs / 4294967296 * 1e4 & 268435455;
-  b[i++] = tmh >>> 8 & 255;
-  b[i++] = tmh & 255;
-  b[i++] = tmh >>> 24 & 15 | 16;
-  b[i++] = tmh >>> 16 & 255;
-  b[i++] = clockseq >>> 8 | 128;
-  b[i++] = clockseq & 255;
-  for (let n = 0; n < 6; ++n) {
-    b[i + n] = node[n];
-  }
-  return buf || stringify_default(b);
-}
-var _nodeId, _clockseq, _lastMSecs, _lastNSecs, v1_default;
-var init_v1 = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/v1.js"() {
-    init_rng();
-    init_stringify();
-    _lastMSecs = 0;
-    _lastNSecs = 0;
-    v1_default = v1;
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/parse.js
-function parse(uuid) {
-  if (!validate_default(uuid)) {
-    throw TypeError("Invalid UUID");
-  }
-  let v;
-  const arr = new Uint8Array(16);
-  arr[0] = (v = parseInt(uuid.slice(0, 8), 16)) >>> 24;
-  arr[1] = v >>> 16 & 255;
-  arr[2] = v >>> 8 & 255;
-  arr[3] = v & 255;
-  arr[4] = (v = parseInt(uuid.slice(9, 13), 16)) >>> 8;
-  arr[5] = v & 255;
-  arr[6] = (v = parseInt(uuid.slice(14, 18), 16)) >>> 8;
-  arr[7] = v & 255;
-  arr[8] = (v = parseInt(uuid.slice(19, 23), 16)) >>> 8;
-  arr[9] = v & 255;
-  arr[10] = (v = parseInt(uuid.slice(24, 36), 16)) / 1099511627776 & 255;
-  arr[11] = v / 4294967296 & 255;
-  arr[12] = v >>> 24 & 255;
-  arr[13] = v >>> 16 & 255;
-  arr[14] = v >>> 8 & 255;
-  arr[15] = v & 255;
-  return arr;
-}
-var parse_default;
-var init_parse = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/parse.js"() {
-    init_validate();
-    parse_default = parse;
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/v35.js
-function stringToBytes(str) {
-  str = unescape(encodeURIComponent(str));
-  const bytes = [];
-  for (let i = 0; i < str.length; ++i) {
-    bytes.push(str.charCodeAt(i));
-  }
-  return bytes;
-}
-function v35_default(name, version2, hashfunc) {
-  function generateUUID(value, namespace, buf, offset) {
-    if (typeof value === "string") {
-      value = stringToBytes(value);
-    }
-    if (typeof namespace === "string") {
-      namespace = parse_default(namespace);
-    }
-    if (namespace.length !== 16) {
-      throw TypeError("Namespace must be array-like (16 iterable integer values, 0-255)");
-    }
-    let bytes = new Uint8Array(16 + value.length);
-    bytes.set(namespace);
-    bytes.set(value, namespace.length);
-    bytes = hashfunc(bytes);
-    bytes[6] = bytes[6] & 15 | version2;
-    bytes[8] = bytes[8] & 63 | 128;
-    if (buf) {
-      offset = offset || 0;
-      for (let i = 0; i < 16; ++i) {
-        buf[offset + i] = bytes[i];
-      }
-      return buf;
-    }
-    return stringify_default(bytes);
-  }
-  try {
-    generateUUID.name = name;
-  } catch (err) {
-  }
-  generateUUID.DNS = DNS;
-  generateUUID.URL = URL2;
-  return generateUUID;
-}
-var DNS, URL2;
-var init_v35 = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/v35.js"() {
-    init_stringify();
-    init_parse();
-    DNS = "6ba7b810-9dad-11d1-80b4-00c04fd430c8";
-    URL2 = "6ba7b811-9dad-11d1-80b4-00c04fd430c8";
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/md5.js
-function md5(bytes) {
-  if (Array.isArray(bytes)) {
-    bytes = Buffer.from(bytes);
-  } else if (typeof bytes === "string") {
-    bytes = Buffer.from(bytes, "utf8");
-  }
-  return import_crypto2.default.createHash("md5").update(bytes).digest();
-}
-var import_crypto2, md5_default;
-var init_md5 = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/md5.js"() {
-    import_crypto2 = __toESM(require("crypto"));
-    md5_default = md5;
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/v3.js
-var v3, v3_default;
-var init_v3 = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/v3.js"() {
-    init_v35();
-    init_md5();
-    v3 = v35_default("v3", 48, md5_default);
-    v3_default = v3;
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/v4.js
-function v4(options, buf, offset) {
-  options = options || {};
-  const rnds = options.random || (options.rng || rng)();
-  rnds[6] = rnds[6] & 15 | 64;
-  rnds[8] = rnds[8] & 63 | 128;
-  if (buf) {
-    offset = offset || 0;
-    for (let i = 0; i < 16; ++i) {
-      buf[offset + i] = rnds[i];
-    }
-    return buf;
-  }
-  return stringify_default(rnds);
-}
-var v4_default;
-var init_v4 = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/v4.js"() {
-    init_rng();
-    init_stringify();
-    v4_default = v4;
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/sha1.js
-function sha1(bytes) {
-  if (Array.isArray(bytes)) {
-    bytes = Buffer.from(bytes);
-  } else if (typeof bytes === "string") {
-    bytes = Buffer.from(bytes, "utf8");
-  }
-  return import_crypto3.default.createHash("sha1").update(bytes).digest();
-}
-var import_crypto3, sha1_default;
-var init_sha1 = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/sha1.js"() {
-    import_crypto3 = __toESM(require("crypto"));
-    sha1_default = sha1;
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/v5.js
-var v5, v5_default;
-var init_v5 = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/v5.js"() {
-    init_v35();
-    init_sha1();
-    v5 = v35_default("v5", 80, sha1_default);
-    v5_default = v5;
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/nil.js
-var nil_default;
-var init_nil = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/nil.js"() {
-    nil_default = "00000000-0000-0000-0000-000000000000";
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/version.js
-function version(uuid) {
-  if (!validate_default(uuid)) {
-    throw TypeError("Invalid UUID");
-  }
-  return parseInt(uuid.substr(14, 1), 16);
-}
-var version_default;
-var init_version = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/version.js"() {
-    init_validate();
-    version_default = version;
-  }
-});
-
-// node_modules/@reporters/github/node_modules/uuid/dist/esm-node/index.js
-var esm_node_exports = {};
-__export(esm_node_exports, {
-  NIL: () => nil_default,
-  parse: () => parse_default,
-  stringify: () => stringify_default,
-  v1: () => v1_default,
-  v3: () => v3_default,
-  v4: () => v4_default,
-  v5: () => v5_default,
-  validate: () => validate_default,
-  version: () => version_default
-});
-var init_esm_node = __esm({
-  "node_modules/@reporters/github/node_modules/uuid/dist/esm-node/index.js"() {
-    init_v1();
-    init_v3();
-    init_v4();
-    init_v5();
-    init_nil();
-    init_version();
-    init_validate();
-    init_stringify();
-    init_parse();
-  }
 });
 
 // node_modules/@reporters/github/node_modules/@actions/core/lib/file-command.js
@@ -492,9 +134,13 @@ var require_file_command = __commonJS({
     var __createBinding = exports2 && exports2.__createBinding || (Object.create ? function(o, m, k, k2) {
       if (k2 === void 0)
         k2 = k;
-      Object.defineProperty(o, k2, { enumerable: true, get: function() {
-        return m[k];
-      } });
+      var desc = Object.getOwnPropertyDescriptor(m, k);
+      if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+        desc = { enumerable: true, get: function() {
+          return m[k];
+        } };
+      }
+      Object.defineProperty(o, k2, desc);
     } : function(o, m, k, k2) {
       if (k2 === void 0)
         k2 = k;
@@ -511,7 +157,7 @@ var require_file_command = __commonJS({
       var result = {};
       if (mod != null) {
         for (var k in mod)
-          if (k !== "default" && Object.hasOwnProperty.call(mod, k))
+          if (k !== "default" && Object.prototype.hasOwnProperty.call(mod, k))
             __createBinding(result, mod, k);
       }
       __setModuleDefault(result, mod);
@@ -519,9 +165,9 @@ var require_file_command = __commonJS({
     };
     Object.defineProperty(exports2, "__esModule", { value: true });
     exports2.prepareKeyValueMessage = exports2.issueFileCommand = void 0;
+    var crypto = __importStar(require("crypto"));
     var fs = __importStar(require("fs"));
     var os = __importStar(require("os"));
-    var uuid_1 = (init_esm_node(), __toCommonJS(esm_node_exports));
     var utils_1 = require_utils();
     function issueFileCommand(command, message) {
       const filePath = process.env[`GITHUB_${command}`];
@@ -531,14 +177,14 @@ var require_file_command = __commonJS({
       if (!fs.existsSync(filePath)) {
         throw new Error(`Missing file at path: ${filePath}`);
       }
-      fs.appendFileSync(filePath, `${utils_1.toCommandValue(message)}${os.EOL}`, {
+      fs.appendFileSync(filePath, `${(0, utils_1.toCommandValue)(message)}${os.EOL}`, {
         encoding: "utf8"
       });
     }
     exports2.issueFileCommand = issueFileCommand;
     function prepareKeyValueMessage(key, value) {
-      const delimiter = `ghadelimiter_${uuid_1.v4()}`;
-      const convertedValue = utils_1.toCommandValue(value);
+      const delimiter = `ghadelimiter_${crypto.randomUUID()}`;
+      const convertedValue = (0, utils_1.toCommandValue)(value);
       if (key.includes(delimiter)) {
         throw new Error(`Unexpected input: name should not contain the delimiter "${delimiter}"`);
       }
@@ -1281,7 +927,7 @@ var require_util = __commonJS({
     var { InvalidArgumentError } = require_errors();
     var { Blob: Blob2 } = require("buffer");
     var nodeUtil = require("util");
-    var { stringify: stringify2 } = require("querystring");
+    var { stringify } = require("querystring");
     var { headerNameLowerCasedRecord } = require_constants();
     var [nodeMajor, nodeMinor] = process.versions.node.split(".").map((v) => Number(v));
     function nop() {
@@ -1296,7 +942,7 @@ var require_util = __commonJS({
       if (url.includes("?") || url.includes("#")) {
         throw new Error('Query params cannot be passed when url already contains "?" or "#".');
       }
-      const stringified = stringify2(queryParams);
+      const stringified = stringify(queryParams);
       if (stringified) {
         url += "?" + stringified;
       }
@@ -3973,11 +3619,11 @@ var require_util2 = __commonJS({
     var assert = require("assert");
     var { isUint8Array } = require("util/types");
     var supportedHashes = [];
-    var crypto4;
+    var crypto;
     try {
-      crypto4 = require("crypto");
+      crypto = require("crypto");
       const possibleRelevantHashes = ["sha256", "sha384", "sha512"];
-      supportedHashes = crypto4.getHashes().filter((hash) => possibleRelevantHashes.includes(hash));
+      supportedHashes = crypto.getHashes().filter((hash) => possibleRelevantHashes.includes(hash));
     } catch {
     }
     function responseURL(response) {
@@ -4243,7 +3889,7 @@ var require_util2 = __commonJS({
       }
     }
     function bytesMatch(bytes, metadataList) {
-      if (crypto4 === void 0) {
+      if (crypto === void 0) {
         return true;
       }
       const parsedMetadata = parseMetadata(metadataList);
@@ -4258,7 +3904,7 @@ var require_util2 = __commonJS({
       for (const item of metadata) {
         const algorithm = item.algo;
         const expectedValue = item.hash;
-        let actualValue = crypto4.createHash(algorithm).update(bytes).digest("base64");
+        let actualValue = crypto.createHash(algorithm).update(bytes).digest("base64");
         if (actualValue[actualValue.length - 1] === "=") {
           if (actualValue[actualValue.length - 2] === "=") {
             actualValue = actualValue.slice(0, -2);
@@ -11400,7 +11046,7 @@ var require_proxy_agent = __commonJS({
   "node_modules/@reporters/github/node_modules/undici/lib/proxy-agent.js"(exports2, module2) {
     "use strict";
     var { kProxy, kClose, kDestroy, kInterceptors } = require_symbols();
-    var { URL: URL3 } = require("url");
+    var { URL: URL2 } = require("url");
     var Agent = require_agent();
     var Pool = require_pool();
     var DispatcherBase = require_dispatcher_base();
@@ -11449,7 +11095,7 @@ var require_proxy_agent = __commonJS({
         this[kRequestTls] = opts.requestTls;
         this[kProxyTls] = opts.proxyTls;
         this[kProxyHeaders] = opts.headers || {};
-        const resolvedUrl = new URL3(opts.uri);
+        const resolvedUrl = new URL2(opts.uri);
         const { origin, port, host, username, password } = resolvedUrl;
         if (opts.auth && opts.token) {
           throw new InvalidArgumentError("opts.auth cannot be used in combination with opts.token");
@@ -11504,7 +11150,7 @@ var require_proxy_agent = __commonJS({
         });
       }
       dispatch(opts, handler) {
-        const { host } = new URL3(opts.origin);
+        const { host } = new URL2(opts.origin);
         const headers = buildHeaders(opts.headers);
         throwIfProxyAuthIsSent(headers);
         return this[kAgent].dispatch(
@@ -15919,7 +15565,7 @@ var require_util6 = __commonJS({
         throw new Error("Invalid cookie max-age");
       }
     }
-    function stringify2(cookie) {
+    function stringify(cookie) {
       if (cookie.name.length === 0) {
         return null;
       }
@@ -15984,7 +15630,7 @@ var require_util6 = __commonJS({
     }
     module2.exports = {
       isCTLExcludingHtab,
-      stringify: stringify2,
+      stringify,
       getHeadersList
     };
   }
@@ -16135,7 +15781,7 @@ var require_cookies = __commonJS({
   "node_modules/@reporters/github/node_modules/undici/lib/cookies/index.js"(exports2, module2) {
     "use strict";
     var { parseSetCookie } = require_parse();
-    var { stringify: stringify2, getHeadersList } = require_util6();
+    var { stringify, getHeadersList } = require_util6();
     var { webidl } = require_webidl();
     var { Headers } = require_headers();
     function getCookies(headers) {
@@ -16177,9 +15823,9 @@ var require_cookies = __commonJS({
       webidl.argumentLengthCheck(arguments, 2, { header: "setCookie" });
       webidl.brandCheck(headers, Headers, { strict: false });
       cookie = webidl.converters.Cookie(cookie);
-      const str = stringify2(cookie);
+      const str = stringify(cookie);
       if (str) {
-        headers.append("Set-Cookie", stringify2(cookie));
+        headers.append("Set-Cookie", stringify(cookie));
       }
     }
     webidl.converters.DeleteCookieAttributes = webidl.dictionaryConverter([
@@ -16675,9 +16321,9 @@ var require_connection = __commonJS({
     channels.open = diagnosticsChannel.channel("undici:websocket:open");
     channels.close = diagnosticsChannel.channel("undici:websocket:close");
     channels.socketError = diagnosticsChannel.channel("undici:websocket:socket_error");
-    var crypto4;
+    var crypto;
     try {
-      crypto4 = require("crypto");
+      crypto = require("crypto");
     } catch {
     }
     function establishWebSocketConnection(url, protocols, ws, onEstablish, options) {
@@ -16696,7 +16342,7 @@ var require_connection = __commonJS({
         const headersList = new Headers(options.headers)[kHeadersList];
         request.headersList = headersList;
       }
-      const keyValue = crypto4.randomBytes(16).toString("base64");
+      const keyValue = crypto.randomBytes(16).toString("base64");
       request.headersList.append("sec-websocket-key", keyValue);
       request.headersList.append("sec-websocket-version", "13");
       for (const protocol of protocols) {
@@ -16725,7 +16371,7 @@ var require_connection = __commonJS({
             return;
           }
           const secWSAccept = response.headersList.get("Sec-WebSocket-Accept");
-          const digest = crypto4.createHash("sha1").update(keyValue + uid).digest("base64");
+          const digest = crypto.createHash("sha1").update(keyValue + uid).digest("base64");
           if (secWSAccept !== digest) {
             failWebsocketConnection(ws, "Incorrect hash received in Sec-WebSocket-Accept header.");
             return;
@@ -16805,9 +16451,9 @@ var require_frame = __commonJS({
   "node_modules/@reporters/github/node_modules/undici/lib/websocket/frame.js"(exports2, module2) {
     "use strict";
     var { maxUnsigned16Bit } = require_constants5();
-    var crypto4;
+    var crypto;
     try {
-      crypto4 = require("crypto");
+      crypto = require("crypto");
     } catch {
     }
     var WebsocketFrameSend = class {
@@ -16816,7 +16462,7 @@ var require_frame = __commonJS({
        */
       constructor(data) {
         this.frameData = data;
-        this.maskKey = crypto4.randomBytes(4);
+        this.maskKey = crypto.randomBytes(4);
       }
       createFrame(opcode) {
         const bodyLength = this.frameData?.byteLength ?? 0;
@@ -18449,9 +18095,9 @@ var require_oidc_utils = __commonJS({
               const encodedAudience = encodeURIComponent(audience);
               id_token_url = `${id_token_url}&audience=${encodedAudience}`;
             }
-            core_1.debug(`ID token url is ${id_token_url}`);
+            (0, core_1.debug)(`ID token url is ${id_token_url}`);
             const id_token = yield OidcClient.getCall(id_token_url);
-            core_1.setSecret(id_token);
+            (0, core_1.setSecret)(id_token);
             return id_token;
           } catch (error) {
             throw new Error(`Error message: ${error.message}`);
@@ -18764,9 +18410,13 @@ var require_path_utils = __commonJS({
     var __createBinding = exports2 && exports2.__createBinding || (Object.create ? function(o, m, k, k2) {
       if (k2 === void 0)
         k2 = k;
-      Object.defineProperty(o, k2, { enumerable: true, get: function() {
-        return m[k];
-      } });
+      var desc = Object.getOwnPropertyDescriptor(m, k);
+      if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+        desc = { enumerable: true, get: function() {
+          return m[k];
+        } };
+      }
+      Object.defineProperty(o, k2, desc);
     } : function(o, m, k, k2) {
       if (k2 === void 0)
         k2 = k;
@@ -18783,7 +18433,7 @@ var require_path_utils = __commonJS({
       var result = {};
       if (mod != null) {
         for (var k in mod)
-          if (k !== "default" && Object.hasOwnProperty.call(mod, k))
+          if (k !== "default" && Object.prototype.hasOwnProperty.call(mod, k))
             __createBinding(result, mod, k);
       }
       __setModuleDefault(result, mod);
@@ -18807,9 +18457,9 @@ var require_path_utils = __commonJS({
   }
 });
 
-// node_modules/@reporters/github/node_modules/@actions/core/lib/core.js
-var require_core = __commonJS({
-  "node_modules/@reporters/github/node_modules/@actions/core/lib/core.js"(exports2) {
+// node_modules/@reporters/github/node_modules/@actions/io/lib/io-util.js
+var require_io_util = __commonJS({
+  "node_modules/@reporters/github/node_modules/@actions/io/lib/io-util.js"(exports2) {
     "use strict";
     var __createBinding = exports2 && exports2.__createBinding || (Object.create ? function(o, m, k, k2) {
       if (k2 === void 0)
@@ -18866,146 +18516,1306 @@ var require_core = __commonJS({
         step((generator = generator.apply(thisArg, _arguments || [])).next());
       });
     };
+    var _a;
     Object.defineProperty(exports2, "__esModule", { value: true });
-    exports2.getIDToken = exports2.getState = exports2.saveState = exports2.group = exports2.endGroup = exports2.startGroup = exports2.info = exports2.notice = exports2.warning = exports2.error = exports2.debug = exports2.isDebug = exports2.setFailed = exports2.setCommandEcho = exports2.setOutput = exports2.getBooleanInput = exports2.getMultilineInput = exports2.getInput = exports2.addPath = exports2.setSecret = exports2.exportVariable = exports2.ExitCode = void 0;
-    var command_1 = require_command();
-    var file_command_1 = require_file_command();
-    var utils_1 = require_utils();
-    var os = __importStar(require("os"));
+    exports2.getCmdPath = exports2.tryGetExecutablePath = exports2.isRooted = exports2.isDirectory = exports2.exists = exports2.READONLY = exports2.UV_FS_O_EXLOCK = exports2.IS_WINDOWS = exports2.unlink = exports2.symlink = exports2.stat = exports2.rmdir = exports2.rm = exports2.rename = exports2.readlink = exports2.readdir = exports2.open = exports2.mkdir = exports2.lstat = exports2.copyFile = exports2.chmod = void 0;
+    var fs = __importStar(require("fs"));
     var path2 = __importStar(require("path"));
-    var oidc_utils_1 = require_oidc_utils();
-    var ExitCode;
-    (function(ExitCode2) {
-      ExitCode2[ExitCode2["Success"] = 0] = "Success";
-      ExitCode2[ExitCode2["Failure"] = 1] = "Failure";
-    })(ExitCode = exports2.ExitCode || (exports2.ExitCode = {}));
-    function exportVariable(name, val) {
-      const convertedVal = utils_1.toCommandValue(val);
-      process.env[name] = convertedVal;
-      const filePath = process.env["GITHUB_ENV"] || "";
-      if (filePath) {
-        return file_command_1.issueFileCommand("ENV", file_command_1.prepareKeyValueMessage(name, val));
-      }
-      command_1.issueCommand("set-env", { name }, convertedVal);
-    }
-    exports2.exportVariable = exportVariable;
-    function setSecret(secret) {
-      command_1.issueCommand("add-mask", {}, secret);
-    }
-    exports2.setSecret = setSecret;
-    function addPath(inputPath) {
-      const filePath = process.env["GITHUB_PATH"] || "";
-      if (filePath) {
-        file_command_1.issueFileCommand("PATH", inputPath);
-      } else {
-        command_1.issueCommand("add-path", {}, inputPath);
-      }
-      process.env["PATH"] = `${inputPath}${path2.delimiter}${process.env["PATH"]}`;
+    _a = fs.promises, exports2.chmod = _a.chmod, exports2.copyFile = _a.copyFile, exports2.lstat = _a.lstat, exports2.mkdir = _a.mkdir, exports2.open = _a.open, exports2.readdir = _a.readdir, exports2.readlink = _a.readlink, exports2.rename = _a.rename, exports2.rm = _a.rm, exports2.rmdir = _a.rmdir, exports2.stat = _a.stat, exports2.symlink = _a.symlink, exports2.unlink = _a.unlink;
+    exports2.IS_WINDOWS = process.platform === "win32";
+    exports2.UV_FS_O_EXLOCK = 268435456;
+    exports2.READONLY = fs.constants.O_RDONLY;
+    function exists(fsPath) {
+      return __awaiter(this, void 0, void 0, function* () {
+        try {
+          yield exports2.stat(fsPath);
+        } catch (err) {
+          if (err.code === "ENOENT") {
+            return false;
+          }
+          throw err;
+        }
+        return true;
+      });
     }
-    exports2.addPath = addPath;
-    function getInput(name, options) {
-      const val = process.env[`INPUT_${name.replace(/ /g, "_").toUpperCase()}`] || "";
-      if (options && options.required && !val) {
-        throw new Error(`Input required and not supplied: ${name}`);
-      }
-      if (options && options.trimWhitespace === false) {
-        return val;
-      }
-      return val.trim();
+    exports2.exists = exists;
+    function isDirectory(fsPath, useStat = false) {
+      return __awaiter(this, void 0, void 0, function* () {
+        const stats = useStat ? yield exports2.stat(fsPath) : yield exports2.lstat(fsPath);
+        return stats.isDirectory();
+      });
     }
-    exports2.getInput = getInput;
-    function getMultilineInput(name, options) {
-      const inputs = getInput(name, options).split("\n").filter((x) => x !== "");
-      if (options && options.trimWhitespace === false) {
-        return inputs;
+    exports2.isDirectory = isDirectory;
+    function isRooted(p) {
+      p = normalizeSeparators(p);
+      if (!p) {
+        throw new Error('isRooted() parameter "p" cannot be empty');
       }
-      return inputs.map((input) => input.trim());
-    }
-    exports2.getMultilineInput = getMultilineInput;
-    function getBooleanInput(name, options) {
-      const trueValue = ["true", "True", "TRUE"];
-      const falseValue = ["false", "False", "FALSE"];
-      const val = getInput(name, options);
-      if (trueValue.includes(val))
-        return true;
-      if (falseValue.includes(val))
-        return false;
-      throw new TypeError(`Input does not meet YAML 1.2 "Core Schema" specification: ${name}
-Support boolean input list: \`true | True | TRUE | false | False | FALSE\``);
-    }
-    exports2.getBooleanInput = getBooleanInput;
-    function setOutput(name, value) {
-      const filePath = process.env["GITHUB_OUTPUT"] || "";
-      if (filePath) {
-        return file_command_1.issueFileCommand("OUTPUT", file_command_1.prepareKeyValueMessage(name, value));
+      if (exports2.IS_WINDOWS) {
+        return p.startsWith("\\") || /^[A-Z]:/i.test(p);
       }
-      process.stdout.write(os.EOL);
-      command_1.issueCommand("set-output", { name }, utils_1.toCommandValue(value));
-    }
-    exports2.setOutput = setOutput;
-    function setCommandEcho(enabled) {
-      command_1.issue("echo", enabled ? "on" : "off");
-    }
-    exports2.setCommandEcho = setCommandEcho;
-    function setFailed(message) {
-      process.exitCode = ExitCode.Failure;
-      error(message);
-    }
-    exports2.setFailed = setFailed;
-    function isDebug() {
-      return process.env["RUNNER_DEBUG"] === "1";
-    }
-    exports2.isDebug = isDebug;
-    function debug(message) {
-      command_1.issueCommand("debug", {}, message);
-    }
-    exports2.debug = debug;
-    function error(message, properties = {}) {
-      command_1.issueCommand("error", utils_1.toCommandProperties(properties), message instanceof Error ? message.toString() : message);
-    }
-    exports2.error = error;
-    function warning(message, properties = {}) {
-      command_1.issueCommand("warning", utils_1.toCommandProperties(properties), message instanceof Error ? message.toString() : message);
+      return p.startsWith("/");
     }
-    exports2.warning = warning;
-    function notice(message, properties = {}) {
-      command_1.issueCommand("notice", utils_1.toCommandProperties(properties), message instanceof Error ? message.toString() : message);
-    }
-    exports2.notice = notice;
-    function info(message) {
-      process.stdout.write(message + os.EOL);
-    }
-    exports2.info = info;
-    function startGroup(name) {
-      command_1.issue("group", name);
-    }
-    exports2.startGroup = startGroup;
-    function endGroup() {
-      command_1.issue("endgroup");
-    }
-    exports2.endGroup = endGroup;
-    function group(name, fn) {
+    exports2.isRooted = isRooted;
+    function tryGetExecutablePath(filePath, extensions) {
       return __awaiter(this, void 0, void 0, function* () {
-        startGroup(name);
-        let result;
+        let stats = void 0;
         try {
-          result = yield fn();
-        } finally {
-          endGroup();
+          stats = yield exports2.stat(filePath);
+        } catch (err) {
+          if (err.code !== "ENOENT") {
+            console.log(`Unexpected error attempting to determine if executable file exists '${filePath}': ${err}`);
+          }
         }
-        return result;
+        if (stats && stats.isFile()) {
+          if (exports2.IS_WINDOWS) {
+            const upperExt = path2.extname(filePath).toUpperCase();
+            if (extensions.some((validExt) => validExt.toUpperCase() === upperExt)) {
+              return filePath;
+            }
+          } else {
+            if (isUnixExecutable(stats)) {
+              return filePath;
+            }
+          }
+        }
+        const originalFilePath = filePath;
+        for (const extension of extensions) {
+          filePath = originalFilePath + extension;
+          stats = void 0;
+          try {
+            stats = yield exports2.stat(filePath);
+          } catch (err) {
+            if (err.code !== "ENOENT") {
+              console.log(`Unexpected error attempting to determine if executable file exists '${filePath}': ${err}`);
+            }
+          }
+          if (stats && stats.isFile()) {
+            if (exports2.IS_WINDOWS) {
+              try {
+                const directory = path2.dirname(filePath);
+                const upperName = path2.basename(filePath).toUpperCase();
+                for (const actualName of yield exports2.readdir(directory)) {
+                  if (upperName === actualName.toUpperCase()) {
+                    filePath = path2.join(directory, actualName);
+                    break;
+                  }
+                }
+              } catch (err) {
+                console.log(`Unexpected error attempting to determine the actual case of the file '${filePath}': ${err}`);
+              }
+              return filePath;
+            } else {
+              if (isUnixExecutable(stats)) {
+                return filePath;
+              }
+            }
+          }
+        }
+        return "";
       });
     }
-    exports2.group = group;
-    function saveState(name, value) {
-      const filePath = process.env["GITHUB_STATE"] || "";
-      if (filePath) {
-        return file_command_1.issueFileCommand("STATE", file_command_1.prepareKeyValueMessage(name, value));
+    exports2.tryGetExecutablePath = tryGetExecutablePath;
+    function normalizeSeparators(p) {
+      p = p || "";
+      if (exports2.IS_WINDOWS) {
+        p = p.replace(/\//g, "\\");
+        return p.replace(/\\\\+/g, "\\");
       }
-      command_1.issueCommand("save-state", { name }, utils_1.toCommandValue(value));
+      return p.replace(/\/\/+/g, "/");
     }
-    exports2.saveState = saveState;
-    function getState(name) {
+    function isUnixExecutable(stats) {
+      return (stats.mode & 1) > 0 || (stats.mode & 8) > 0 && stats.gid === process.getgid() || (stats.mode & 64) > 0 && stats.uid === process.getuid();
+    }
+    function getCmdPath() {
+      var _a2;
+      return (_a2 = process.env["COMSPEC"]) !== null && _a2 !== void 0 ? _a2 : `cmd.exe`;
+    }
+    exports2.getCmdPath = getCmdPath;
+  }
+});
+
+// node_modules/@reporters/github/node_modules/@actions/io/lib/io.js
+var require_io = __commonJS({
+  "node_modules/@reporters/github/node_modules/@actions/io/lib/io.js"(exports2) {
+    "use strict";
+    var __createBinding = exports2 && exports2.__createBinding || (Object.create ? function(o, m, k, k2) {
+      if (k2 === void 0)
+        k2 = k;
+      Object.defineProperty(o, k2, { enumerable: true, get: function() {
+        return m[k];
+      } });
+    } : function(o, m, k, k2) {
+      if (k2 === void 0)
+        k2 = k;
+      o[k2] = m[k];
+    });
+    var __setModuleDefault = exports2 && exports2.__setModuleDefault || (Object.create ? function(o, v) {
+      Object.defineProperty(o, "default", { enumerable: true, value: v });
+    } : function(o, v) {
+      o["default"] = v;
+    });
+    var __importStar = exports2 && exports2.__importStar || function(mod) {
+      if (mod && mod.__esModule)
+        return mod;
+      var result = {};
+      if (mod != null) {
+        for (var k in mod)
+          if (k !== "default" && Object.hasOwnProperty.call(mod, k))
+            __createBinding(result, mod, k);
+      }
+      __setModuleDefault(result, mod);
+      return result;
+    };
+    var __awaiter = exports2 && exports2.__awaiter || function(thisArg, _arguments, P, generator) {
+      function adopt(value) {
+        return value instanceof P ? value : new P(function(resolve) {
+          resolve(value);
+        });
+      }
+      return new (P || (P = Promise))(function(resolve, reject) {
+        function fulfilled(value) {
+          try {
+            step(generator.next(value));
+          } catch (e) {
+            reject(e);
+          }
+        }
+        function rejected(value) {
+          try {
+            step(generator["throw"](value));
+          } catch (e) {
+            reject(e);
+          }
+        }
+        function step(result) {
+          result.done ? resolve(result.value) : adopt(result.value).then(fulfilled, rejected);
+        }
+        step((generator = generator.apply(thisArg, _arguments || [])).next());
+      });
+    };
+    Object.defineProperty(exports2, "__esModule", { value: true });
+    exports2.findInPath = exports2.which = exports2.mkdirP = exports2.rmRF = exports2.mv = exports2.cp = void 0;
+    var assert_1 = require("assert");
+    var path2 = __importStar(require("path"));
+    var ioUtil = __importStar(require_io_util());
+    function cp(source, dest, options = {}) {
+      return __awaiter(this, void 0, void 0, function* () {
+        const { force, recursive, copySourceDirectory } = readCopyOptions(options);
+        const destStat = (yield ioUtil.exists(dest)) ? yield ioUtil.stat(dest) : null;
+        if (destStat && destStat.isFile() && !force) {
+          return;
+        }
+        const newDest = destStat && destStat.isDirectory() && copySourceDirectory ? path2.join(dest, path2.basename(source)) : dest;
+        if (!(yield ioUtil.exists(source))) {
+          throw new Error(`no such file or directory: ${source}`);
+        }
+        const sourceStat = yield ioUtil.stat(source);
+        if (sourceStat.isDirectory()) {
+          if (!recursive) {
+            throw new Error(`Failed to copy. ${source} is a directory, but tried to copy without recursive flag.`);
+          } else {
+            yield cpDirRecursive(source, newDest, 0, force);
+          }
+        } else {
+          if (path2.relative(source, newDest) === "") {
+            throw new Error(`'${newDest}' and '${source}' are the same file`);
+          }
+          yield copyFile(source, newDest, force);
+        }
+      });
+    }
+    exports2.cp = cp;
+    function mv(source, dest, options = {}) {
+      return __awaiter(this, void 0, void 0, function* () {
+        if (yield ioUtil.exists(dest)) {
+          let destExists = true;
+          if (yield ioUtil.isDirectory(dest)) {
+            dest = path2.join(dest, path2.basename(source));
+            destExists = yield ioUtil.exists(dest);
+          }
+          if (destExists) {
+            if (options.force == null || options.force) {
+              yield rmRF(dest);
+            } else {
+              throw new Error("Destination already exists");
+            }
+          }
+        }
+        yield mkdirP(path2.dirname(dest));
+        yield ioUtil.rename(source, dest);
+      });
+    }
+    exports2.mv = mv;
+    function rmRF(inputPath) {
+      return __awaiter(this, void 0, void 0, function* () {
+        if (ioUtil.IS_WINDOWS) {
+          if (/[*"<>|]/.test(inputPath)) {
+            throw new Error('File path must not contain `*`, `"`, `<`, `>` or `|` on Windows');
+          }
+        }
+        try {
+          yield ioUtil.rm(inputPath, {
+            force: true,
+            maxRetries: 3,
+            recursive: true,
+            retryDelay: 300
+          });
+        } catch (err) {
+          throw new Error(`File was unable to be removed ${err}`);
+        }
+      });
+    }
+    exports2.rmRF = rmRF;
+    function mkdirP(fsPath) {
+      return __awaiter(this, void 0, void 0, function* () {
+        assert_1.ok(fsPath, "a path argument must be provided");
+        yield ioUtil.mkdir(fsPath, { recursive: true });
+      });
+    }
+    exports2.mkdirP = mkdirP;
+    function which(tool, check) {
+      return __awaiter(this, void 0, void 0, function* () {
+        if (!tool) {
+          throw new Error("parameter 'tool' is required");
+        }
+        if (check) {
+          const result = yield which(tool, false);
+          if (!result) {
+            if (ioUtil.IS_WINDOWS) {
+              throw new Error(`Unable to locate executable file: ${tool}. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also verify the file has a valid extension for an executable file.`);
+            } else {
+              throw new Error(`Unable to locate executable file: ${tool}. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable.`);
+            }
+          }
+          return result;
+        }
+        const matches = yield findInPath(tool);
+        if (matches && matches.length > 0) {
+          return matches[0];
+        }
+        return "";
+      });
+    }
+    exports2.which = which;
+    function findInPath(tool) {
+      return __awaiter(this, void 0, void 0, function* () {
+        if (!tool) {
+          throw new Error("parameter 'tool' is required");
+        }
+        const extensions = [];
+        if (ioUtil.IS_WINDOWS && process.env["PATHEXT"]) {
+          for (const extension of process.env["PATHEXT"].split(path2.delimiter)) {
+            if (extension) {
+              extensions.push(extension);
+            }
+          }
+        }
+        if (ioUtil.isRooted(tool)) {
+          const filePath = yield ioUtil.tryGetExecutablePath(tool, extensions);
+          if (filePath) {
+            return [filePath];
+          }
+          return [];
+        }
+        if (tool.includes(path2.sep)) {
+          return [];
+        }
+        const directories = [];
+        if (process.env.PATH) {
+          for (const p of process.env.PATH.split(path2.delimiter)) {
+            if (p) {
+              directories.push(p);
+            }
+          }
+        }
+        const matches = [];
+        for (const directory of directories) {
+          const filePath = yield ioUtil.tryGetExecutablePath(path2.join(directory, tool), extensions);
+          if (filePath) {
+            matches.push(filePath);
+          }
+        }
+        return matches;
+      });
+    }
+    exports2.findInPath = findInPath;
+    function readCopyOptions(options) {
+      const force = options.force == null ? true : options.force;
+      const recursive = Boolean(options.recursive);
+      const copySourceDirectory = options.copySourceDirectory == null ? true : Boolean(options.copySourceDirectory);
+      return { force, recursive, copySourceDirectory };
+    }
+    function cpDirRecursive(sourceDir, destDir, currentDepth, force) {
+      return __awaiter(this, void 0, void 0, function* () {
+        if (currentDepth >= 255)
+          return;
+        currentDepth++;
+        yield mkdirP(destDir);
+        const files = yield ioUtil.readdir(sourceDir);
+        for (const fileName of files) {
+          const srcFile = `${sourceDir}/${fileName}`;
+          const destFile = `${destDir}/${fileName}`;
+          const srcFileStat = yield ioUtil.lstat(srcFile);
+          if (srcFileStat.isDirectory()) {
+            yield cpDirRecursive(srcFile, destFile, currentDepth, force);
+          } else {
+            yield copyFile(srcFile, destFile, force);
+          }
+        }
+        yield ioUtil.chmod(destDir, (yield ioUtil.stat(sourceDir)).mode);
+      });
+    }
+    function copyFile(srcFile, destFile, force) {
+      return __awaiter(this, void 0, void 0, function* () {
+        if ((yield ioUtil.lstat(srcFile)).isSymbolicLink()) {
+          try {
+            yield ioUtil.lstat(destFile);
+            yield ioUtil.unlink(destFile);
+          } catch (e) {
+            if (e.code === "EPERM") {
+              yield ioUtil.chmod(destFile, "0666");
+              yield ioUtil.unlink(destFile);
+            }
+          }
+          const symlinkFull = yield ioUtil.readlink(srcFile);
+          yield ioUtil.symlink(symlinkFull, destFile, ioUtil.IS_WINDOWS ? "junction" : null);
+        } else if (!(yield ioUtil.exists(destFile)) || force) {
+          yield ioUtil.copyFile(srcFile, destFile);
+        }
+      });
+    }
+  }
+});
+
+// node_modules/@reporters/github/node_modules/@actions/exec/lib/toolrunner.js
+var require_toolrunner = __commonJS({
+  "node_modules/@reporters/github/node_modules/@actions/exec/lib/toolrunner.js"(exports2) {
+    "use strict";
+    var __createBinding = exports2 && exports2.__createBinding || (Object.create ? function(o, m, k, k2) {
+      if (k2 === void 0)
+        k2 = k;
+      Object.defineProperty(o, k2, { enumerable: true, get: function() {
+        return m[k];
+      } });
+    } : function(o, m, k, k2) {
+      if (k2 === void 0)
+        k2 = k;
+      o[k2] = m[k];
+    });
+    var __setModuleDefault = exports2 && exports2.__setModuleDefault || (Object.create ? function(o, v) {
+      Object.defineProperty(o, "default", { enumerable: true, value: v });
+    } : function(o, v) {
+      o["default"] = v;
+    });
+    var __importStar = exports2 && exports2.__importStar || function(mod) {
+      if (mod && mod.__esModule)
+        return mod;
+      var result = {};
+      if (mod != null) {
+        for (var k in mod)
+          if (k !== "default" && Object.hasOwnProperty.call(mod, k))
+            __createBinding(result, mod, k);
+      }
+      __setModuleDefault(result, mod);
+      return result;
+    };
+    var __awaiter = exports2 && exports2.__awaiter || function(thisArg, _arguments, P, generator) {
+      function adopt(value) {
+        return value instanceof P ? value : new P(function(resolve) {
+          resolve(value);
+        });
+      }
+      return new (P || (P = Promise))(function(resolve, reject) {
+        function fulfilled(value) {
+          try {
+            step(generator.next(value));
+          } catch (e) {
+            reject(e);
+          }
+        }
+        function rejected(value) {
+          try {
+            step(generator["throw"](value));
+          } catch (e) {
+            reject(e);
+          }
+        }
+        function step(result) {
+          result.done ? resolve(result.value) : adopt(result.value).then(fulfilled, rejected);
+        }
+        step((generator = generator.apply(thisArg, _arguments || [])).next());
+      });
+    };
+    Object.defineProperty(exports2, "__esModule", { value: true });
+    exports2.argStringToArray = exports2.ToolRunner = void 0;
+    var os = __importStar(require("os"));
+    var events = __importStar(require("events"));
+    var child = __importStar(require("child_process"));
+    var path2 = __importStar(require("path"));
+    var io = __importStar(require_io());
+    var ioUtil = __importStar(require_io_util());
+    var timers_1 = require("timers");
+    var IS_WINDOWS = process.platform === "win32";
+    var ToolRunner = class extends events.EventEmitter {
+      constructor(toolPath, args, options) {
+        super();
+        if (!toolPath) {
+          throw new Error("Parameter 'toolPath' cannot be null or empty.");
+        }
+        this.toolPath = toolPath;
+        this.args = args || [];
+        this.options = options || {};
+      }
+      _debug(message) {
+        if (this.options.listeners && this.options.listeners.debug) {
+          this.options.listeners.debug(message);
+        }
+      }
+      _getCommandString(options, noPrefix) {
+        const toolPath = this._getSpawnFileName();
+        const args = this._getSpawnArgs(options);
+        let cmd = noPrefix ? "" : "[command]";
+        if (IS_WINDOWS) {
+          if (this._isCmdFile()) {
+            cmd += toolPath;
+            for (const a of args) {
+              cmd += ` ${a}`;
+            }
+          } else if (options.windowsVerbatimArguments) {
+            cmd += `"${toolPath}"`;
+            for (const a of args) {
+              cmd += ` ${a}`;
+            }
+          } else {
+            cmd += this._windowsQuoteCmdArg(toolPath);
+            for (const a of args) {
+              cmd += ` ${this._windowsQuoteCmdArg(a)}`;
+            }
+          }
+        } else {
+          cmd += toolPath;
+          for (const a of args) {
+            cmd += ` ${a}`;
+          }
+        }
+        return cmd;
+      }
+      _processLineBuffer(data, strBuffer, onLine) {
+        try {
+          let s = strBuffer + data.toString();
+          let n = s.indexOf(os.EOL);
+          while (n > -1) {
+            const line = s.substring(0, n);
+            onLine(line);
+            s = s.substring(n + os.EOL.length);
+            n = s.indexOf(os.EOL);
+          }
+          return s;
+        } catch (err) {
+          this._debug(`error processing line. Failed with error ${err}`);
+          return "";
+        }
+      }
+      _getSpawnFileName() {
+        if (IS_WINDOWS) {
+          if (this._isCmdFile()) {
+            return process.env["COMSPEC"] || "cmd.exe";
+          }
+        }
+        return this.toolPath;
+      }
+      _getSpawnArgs(options) {
+        if (IS_WINDOWS) {
+          if (this._isCmdFile()) {
+            let argline = `/D /S /C "${this._windowsQuoteCmdArg(this.toolPath)}`;
+            for (const a of this.args) {
+              argline += " ";
+              argline += options.windowsVerbatimArguments ? a : this._windowsQuoteCmdArg(a);
+            }
+            argline += '"';
+            return [argline];
+          }
+        }
+        return this.args;
+      }
+      _endsWith(str, end) {
+        return str.endsWith(end);
+      }
+      _isCmdFile() {
+        const upperToolPath = this.toolPath.toUpperCase();
+        return this._endsWith(upperToolPath, ".CMD") || this._endsWith(upperToolPath, ".BAT");
+      }
+      _windowsQuoteCmdArg(arg) {
+        if (!this._isCmdFile()) {
+          return this._uvQuoteCmdArg(arg);
+        }
+        if (!arg) {
+          return '""';
+        }
+        const cmdSpecialChars = [
+          " ",
+          "\t",
+          "&",
+          "(",
+          ")",
+          "[",
+          "]",
+          "{",
+          "}",
+          "^",
+          "=",
+          ";",
+          "!",
+          "'",
+          "+",
+          ",",
+          "`",
+          "~",
+          "|",
+          "<",
+          ">",
+          '"'
+        ];
+        let needsQuotes = false;
+        for (const char of arg) {
+          if (cmdSpecialChars.some((x) => x === char)) {
+            needsQuotes = true;
+            break;
+          }
+        }
+        if (!needsQuotes) {
+          return arg;
+        }
+        let reverse = '"';
+        let quoteHit = true;
+        for (let i = arg.length; i > 0; i--) {
+          reverse += arg[i - 1];
+          if (quoteHit && arg[i - 1] === "\\") {
+            reverse += "\\";
+          } else if (arg[i - 1] === '"') {
+            quoteHit = true;
+            reverse += '"';
+          } else {
+            quoteHit = false;
+          }
+        }
+        reverse += '"';
+        return reverse.split("").reverse().join("");
+      }
+      _uvQuoteCmdArg(arg) {
+        if (!arg) {
+          return '""';
+        }
+        if (!arg.includes(" ") && !arg.includes("\t") && !arg.includes('"')) {
+          return arg;
+        }
+        if (!arg.includes('"') && !arg.includes("\\")) {
+          return `"${arg}"`;
+        }
+        let reverse = '"';
+        let quoteHit = true;
+        for (let i = arg.length; i > 0; i--) {
+          reverse += arg[i - 1];
+          if (quoteHit && arg[i - 1] === "\\") {
+            reverse += "\\";
+          } else if (arg[i - 1] === '"') {
+            quoteHit = true;
+            reverse += "\\";
+          } else {
+            quoteHit = false;
+          }
+        }
+        reverse += '"';
+        return reverse.split("").reverse().join("");
+      }
+      _cloneExecOptions(options) {
+        options = options || {};
+        const result = {
+          cwd: options.cwd || process.cwd(),
+          env: options.env || process.env,
+          silent: options.silent || false,
+          windowsVerbatimArguments: options.windowsVerbatimArguments || false,
+          failOnStdErr: options.failOnStdErr || false,
+          ignoreReturnCode: options.ignoreReturnCode || false,
+          delay: options.delay || 1e4 // ms to wait for STDIO to close after the exit event
+        };
+        result.outStream = options.outStream || process.stdout;
+        result.errStream = options.errStream || process.stderr;
+        return result;
+      }
+      _getSpawnOptions(options, toolPath) {
+        options = options || {};
+        const result = {};
+        result.cwd = options.cwd;
+        result.env = options.env;
+        result["windowsVerbatimArguments"] = options.windowsVerbatimArguments || this._isCmdFile();
+        if (options.windowsVerbatimArguments) {
+          result.argv0 = `"${toolPath}"`;
+        }
+        return result;
+      }
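+      /**
+       * Execute the tool configured in the constructor.
+       * Output is streamed to the live console unless `options.silent` is set.
+       * Takes no arguments; uses `this.toolPath`, `this.args` and `this.options`.
+       *
+       * Rejects, among other cases, if the tool fails to start, exits non-zero
+       * (unless `ignoreReturnCode` is set), or writes to stderr with `failOnStdErr` set.
+       * @returns   Promise<number> resolving to the exit code
+       */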
+      exec() {
+        return __awaiter(this, void 0, void 0, function* () {
+          if (!ioUtil.isRooted(this.toolPath) && (this.toolPath.includes("/") || IS_WINDOWS && this.toolPath.includes("\\"))) {
+            this.toolPath = path2.resolve(process.cwd(), this.options.cwd || process.cwd(), this.toolPath);
+          }
+          this.toolPath = yield io.which(this.toolPath, true);
+          return new Promise((resolve, reject) => __awaiter(this, void 0, void 0, function* () {
+            this._debug(`exec tool: ${this.toolPath}`);
+            this._debug("arguments:");
+            for (const arg of this.args) {
+              this._debug(`   ${arg}`);
+            }
+            const optionsNonNull = this._cloneExecOptions(this.options);
+            if (!optionsNonNull.silent && optionsNonNull.outStream) {
+              optionsNonNull.outStream.write(this._getCommandString(optionsNonNull) + os.EOL);
+            }
+            const state = new ExecState(optionsNonNull, this.toolPath);
+            state.on("debug", (message) => {
+              this._debug(message);
+            });
+            if (this.options.cwd && !(yield ioUtil.exists(this.options.cwd))) {
+              return reject(new Error(`The cwd: ${this.options.cwd} does not exist!`));
+            }
+            const fileName = this._getSpawnFileName();
+            const cp = child.spawn(fileName, this._getSpawnArgs(optionsNonNull), this._getSpawnOptions(this.options, fileName));
+            let stdbuffer = "";
+            if (cp.stdout) {
+              cp.stdout.on("data", (data) => {
+                if (this.options.listeners && this.options.listeners.stdout) {
+                  this.options.listeners.stdout(data);
+                }
+                if (!optionsNonNull.silent && optionsNonNull.outStream) {
+                  optionsNonNull.outStream.write(data);
+                }
+                stdbuffer = this._processLineBuffer(data, stdbuffer, (line) => {
+                  if (this.options.listeners && this.options.listeners.stdline) {
+                    this.options.listeners.stdline(line);
+                  }
+                });
+              });
+            }
+            let errbuffer = "";
+            if (cp.stderr) {
+              cp.stderr.on("data", (data) => {
+                state.processStderr = true;
+                if (this.options.listeners && this.options.listeners.stderr) {
+                  this.options.listeners.stderr(data);
+                }
+                if (!optionsNonNull.silent && optionsNonNull.errStream && optionsNonNull.outStream) {
+                  const s = optionsNonNull.failOnStdErr ? optionsNonNull.errStream : optionsNonNull.outStream;
+                  s.write(data);
+                }
+                errbuffer = this._processLineBuffer(data, errbuffer, (line) => {
+                  if (this.options.listeners && this.options.listeners.errline) {
+                    this.options.listeners.errline(line);
+                  }
+                });
+              });
+            }
+            cp.on("error", (err) => {
+              state.processError = err.message;
+              state.processExited = true;
+              state.processClosed = true;
+              state.CheckComplete();
+            });
+            cp.on("exit", (code) => {
+              state.processExitCode = code;
+              state.processExited = true;
+              this._debug(`Exit code ${code} received from tool '${this.toolPath}'`);
+              state.CheckComplete();
+            });
+            cp.on("close", (code) => {
+              state.processExitCode = code;
+              state.processExited = true;
+              state.processClosed = true;
+              this._debug(`STDIO streams have closed for tool '${this.toolPath}'`);
+              state.CheckComplete();
+            });
+            state.on("done", (error, exitCode) => {
+              if (stdbuffer.length > 0) {
+                this.emit("stdline", stdbuffer);
+              }
+              if (errbuffer.length > 0) {
+                this.emit("errline", errbuffer);
+              }
+              cp.removeAllListeners();
+              if (error) {
+                reject(error);
+              } else {
+                resolve(exitCode);
+              }
+            });
+            if (this.options.input) {
+              if (!cp.stdin) {
+                throw new Error("child process missing stdin");
+              }
+              cp.stdin.end(this.options.input);
+            }
+          }));
+        });
+      }
+    };
+    exports2.ToolRunner = ToolRunner;
+    function argStringToArray(argString) {
+      const args = [];
+      let inQuotes = false;
+      let escaped = false;
+      let arg = "";
+      function append(c) {
+        if (escaped && c !== '"') {
+          arg += "\\";
+        }
+        arg += c;
+        escaped = false;
+      }
+      for (let i = 0; i < argString.length; i++) {
+        const c = argString.charAt(i);
+        if (c === '"') {
+          if (!escaped) {
+            inQuotes = !inQuotes;
+          } else {
+            append(c);
+          }
+          continue;
+        }
+        if (c === "\\" && escaped) {
+          append(c);
+          continue;
+        }
+        if (c === "\\" && inQuotes) {
+          escaped = true;
+          continue;
+        }
+        if (c === " " && !inQuotes) {
+          if (arg.length > 0) {
+            args.push(arg);
+            arg = "";
+          }
+          continue;
+        }
+        append(c);
+      }
+      if (arg.length > 0) {
+        args.push(arg.trim());
+      }
+      return args;
+    }
+    exports2.argStringToArray = argStringToArray;
+    var ExecState = class extends events.EventEmitter {
+      constructor(options, toolPath) {
+        super();
+        this.processClosed = false;
+        this.processError = "";
+        this.processExitCode = 0;
+        this.processExited = false;
+        this.processStderr = false;
+        this.delay = 1e4;
+        this.done = false;
+        this.timeout = null;
+        if (!toolPath) {
+          throw new Error("toolPath must not be empty");
+        }
+        this.options = options;
+        this.toolPath = toolPath;
+        if (options.delay) {
+          this.delay = options.delay;
+        }
+      }
+      CheckComplete() {
+        if (this.done) {
+          return;
+        }
+        if (this.processClosed) {
+          this._setResult();
+        } else if (this.processExited) {
+          this.timeout = timers_1.setTimeout(ExecState.HandleTimeout, this.delay, this);
+        }
+      }
+      _debug(message) {
+        this.emit("debug", message);
+      }
+      _setResult() {
+        let error;
+        if (this.processExited) {
+          if (this.processError) {
+            error = new Error(`There was an error when attempting to execute the process '${this.toolPath}'. This may indicate the process failed to start. Error: ${this.processError}`);
+          } else if (this.processExitCode !== 0 && !this.options.ignoreReturnCode) {
+            error = new Error(`The process '${this.toolPath}' failed with exit code ${this.processExitCode}`);
+          } else if (this.processStderr && this.options.failOnStdErr) {
+            error = new Error(`The process '${this.toolPath}' failed because one or more lines were written to the STDERR stream`);
+          }
+        }
+        if (this.timeout) {
+          clearTimeout(this.timeout);
+          this.timeout = null;
+        }
+        this.done = true;
+        this.emit("done", error, this.processExitCode);
+      }
+      static HandleTimeout(state) {
+        if (state.done) {
+          return;
+        }
+        if (!state.processClosed && state.processExited) {
+          const message = `The STDIO streams did not close within ${state.delay / 1e3} seconds of the exit event from process '${state.toolPath}'. This may indicate a child process inherited the STDIO streams and has not yet exited.`;
+          state._debug(message);
+        }
+        state._setResult();
+      }
+    };
+  }
+});
+
+// node_modules/@reporters/github/node_modules/@actions/exec/lib/exec.js
+var require_exec = __commonJS({
+  "node_modules/@reporters/github/node_modules/@actions/exec/lib/exec.js"(exports2) {
+    "use strict";
+    var __createBinding = exports2 && exports2.__createBinding || (Object.create ? function(o, m, k, k2) {
+      if (k2 === void 0)
+        k2 = k;
+      Object.defineProperty(o, k2, { enumerable: true, get: function() {
+        return m[k];
+      } });
+    } : function(o, m, k, k2) {
+      if (k2 === void 0)
+        k2 = k;
+      o[k2] = m[k];
+    });
+    var __setModuleDefault = exports2 && exports2.__setModuleDefault || (Object.create ? function(o, v) {
+      Object.defineProperty(o, "default", { enumerable: true, value: v });
+    } : function(o, v) {
+      o["default"] = v;
+    });
+    var __importStar = exports2 && exports2.__importStar || function(mod) {
+      if (mod && mod.__esModule)
+        return mod;
+      var result = {};
+      if (mod != null) {
+        for (var k in mod)
+          if (k !== "default" && Object.hasOwnProperty.call(mod, k))
+            __createBinding(result, mod, k);
+      }
+      __setModuleDefault(result, mod);
+      return result;
+    };
+    var __awaiter = exports2 && exports2.__awaiter || function(thisArg, _arguments, P, generator) {
+      function adopt(value) {
+        return value instanceof P ? value : new P(function(resolve) {
+          resolve(value);
+        });
+      }
+      return new (P || (P = Promise))(function(resolve, reject) {
+        function fulfilled(value) {
+          try {
+            step(generator.next(value));
+          } catch (e) {
+            reject(e);
+          }
+        }
+        function rejected(value) {
+          try {
+            step(generator["throw"](value));
+          } catch (e) {
+            reject(e);
+          }
+        }
+        function step(result) {
+          result.done ? resolve(result.value) : adopt(result.value).then(fulfilled, rejected);
+        }
+        step((generator = generator.apply(thisArg, _arguments || [])).next());
+      });
+    };
+    Object.defineProperty(exports2, "__esModule", { value: true });
+    exports2.getExecOutput = exports2.exec = void 0;
+    var string_decoder_1 = require("string_decoder");
+    var tr = __importStar(require_toolrunner());
+    function exec(commandLine, args, options) {
+      return __awaiter(this, void 0, void 0, function* () {
+        const commandArgs = tr.argStringToArray(commandLine);
+        if (commandArgs.length === 0) {
+          throw new Error(`Parameter 'commandLine' cannot be null or empty.`);
+        }
+        const toolPath = commandArgs[0];
+        args = commandArgs.slice(1).concat(args || []);
+        const runner = new tr.ToolRunner(toolPath, args, options);
+        return runner.exec();
+      });
+    }
+    exports2.exec = exec;
+    function getExecOutput(commandLine, args, options) {
+      var _a, _b;
+      return __awaiter(this, void 0, void 0, function* () {
+        let stdout = "";
+        let stderr = "";
+        const stdoutDecoder = new string_decoder_1.StringDecoder("utf8");
+        const stderrDecoder = new string_decoder_1.StringDecoder("utf8");
+        const originalStdoutListener = (_a = options === null || options === void 0 ? void 0 : options.listeners) === null || _a === void 0 ? void 0 : _a.stdout;
+        const originalStdErrListener = (_b = options === null || options === void 0 ? void 0 : options.listeners) === null || _b === void 0 ? void 0 : _b.stderr;
+        const stdErrListener = (data) => {
+          stderr += stderrDecoder.write(data);
+          if (originalStdErrListener) {
+            originalStdErrListener(data);
+          }
+        };
+        const stdOutListener = (data) => {
+          stdout += stdoutDecoder.write(data);
+          if (originalStdoutListener) {
+            originalStdoutListener(data);
+          }
+        };
+        const listeners = Object.assign(Object.assign({}, options === null || options === void 0 ? void 0 : options.listeners), { stdout: stdOutListener, stderr: stdErrListener });
+        const exitCode = yield exec(commandLine, args, Object.assign(Object.assign({}, options), { listeners }));
+        stdout += stdoutDecoder.end();
+        stderr += stderrDecoder.end();
+        return {
+          exitCode,
+          stdout,
+          stderr
+        };
+      });
+    }
+    exports2.getExecOutput = getExecOutput;
+  }
+});
+
+// node_modules/@reporters/github/node_modules/@actions/core/lib/platform.js
+var require_platform = __commonJS({
+  "node_modules/@reporters/github/node_modules/@actions/core/lib/platform.js"(exports2) {
+    "use strict";
+    var __createBinding = exports2 && exports2.__createBinding || (Object.create ? function(o, m, k, k2) {
+      if (k2 === void 0)
+        k2 = k;
+      var desc = Object.getOwnPropertyDescriptor(m, k);
+      if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+        desc = { enumerable: true, get: function() {
+          return m[k];
+        } };
+      }
+      Object.defineProperty(o, k2, desc);
+    } : function(o, m, k, k2) {
+      if (k2 === void 0)
+        k2 = k;
+      o[k2] = m[k];
+    });
+    var __setModuleDefault = exports2 && exports2.__setModuleDefault || (Object.create ? function(o, v) {
+      Object.defineProperty(o, "default", { enumerable: true, value: v });
+    } : function(o, v) {
+      o["default"] = v;
+    });
+    var __importStar = exports2 && exports2.__importStar || function(mod) {
+      if (mod && mod.__esModule)
+        return mod;
+      var result = {};
+      if (mod != null) {
+        for (var k in mod)
+          if (k !== "default" && Object.prototype.hasOwnProperty.call(mod, k))
+            __createBinding(result, mod, k);
+      }
+      __setModuleDefault(result, mod);
+      return result;
+    };
+    var __awaiter = exports2 && exports2.__awaiter || function(thisArg, _arguments, P, generator) {
+      function adopt(value) {
+        return value instanceof P ? value : new P(function(resolve) {
+          resolve(value);
+        });
+      }
+      return new (P || (P = Promise))(function(resolve, reject) {
+        function fulfilled(value) {
+          try {
+            step(generator.next(value));
+          } catch (e) {
+            reject(e);
+          }
+        }
+        function rejected(value) {
+          try {
+            step(generator["throw"](value));
+          } catch (e) {
+            reject(e);
+          }
+        }
+        function step(result) {
+          result.done ? resolve(result.value) : adopt(result.value).then(fulfilled, rejected);
+        }
+        step((generator = generator.apply(thisArg, _arguments || [])).next());
+      });
+    };
+    var __importDefault = exports2 && exports2.__importDefault || function(mod) {
+      return mod && mod.__esModule ? mod : { "default": mod };
+    };
+    Object.defineProperty(exports2, "__esModule", { value: true });
+    exports2.getDetails = exports2.isLinux = exports2.isMacOS = exports2.isWindows = exports2.arch = exports2.platform = void 0;
+    var os_1 = __importDefault(require("os"));
+    var exec = __importStar(require_exec());
+    var getWindowsInfo = () => __awaiter(void 0, void 0, void 0, function* () {
+      const { stdout: version } = yield exec.getExecOutput('powershell -command "(Get-CimInstance -ClassName Win32_OperatingSystem).Version"', void 0, {
+        silent: true
+      });
+      const { stdout: name } = yield exec.getExecOutput('powershell -command "(Get-CimInstance -ClassName Win32_OperatingSystem).Caption"', void 0, {
+        silent: true
+      });
+      return {
+        name: name.trim(),
+        version: version.trim()
+      };
+    });
+    var getMacOsInfo = () => __awaiter(void 0, void 0, void 0, function* () {
+      var _a, _b, _c, _d;
+      const { stdout } = yield exec.getExecOutput("sw_vers", void 0, {
+        silent: true
+      });
+      const version = (_b = (_a = stdout.match(/ProductVersion:\s*(.+)/)) === null || _a === void 0 ? void 0 : _a[1]) !== null && _b !== void 0 ? _b : "";
+      const name = (_d = (_c = stdout.match(/ProductName:\s*(.+)/)) === null || _c === void 0 ? void 0 : _c[1]) !== null && _d !== void 0 ? _d : "";
+      return {
+        name,
+        version
+      };
+    });
+    var getLinuxInfo = () => __awaiter(void 0, void 0, void 0, function* () {
+      const { stdout } = yield exec.getExecOutput("lsb_release", ["-i", "-r", "-s"], {
+        silent: true
+      });
+      const [name, version] = stdout.trim().split("\n");
+      return {
+        name,
+        version
+      };
+    });
+    exports2.platform = os_1.default.platform();
+    exports2.arch = os_1.default.arch();
+    exports2.isWindows = exports2.platform === "win32";
+    exports2.isMacOS = exports2.platform === "darwin";
+    exports2.isLinux = exports2.platform === "linux";
+    function getDetails() {
+      return __awaiter(this, void 0, void 0, function* () {
+        return Object.assign(Object.assign({}, yield exports2.isWindows ? getWindowsInfo() : exports2.isMacOS ? getMacOsInfo() : getLinuxInfo()), {
+          platform: exports2.platform,
+          arch: exports2.arch,
+          isWindows: exports2.isWindows,
+          isMacOS: exports2.isMacOS,
+          isLinux: exports2.isLinux
+        });
+      });
+    }
+    exports2.getDetails = getDetails;
+  }
+});
+
+// node_modules/@reporters/github/node_modules/@actions/core/lib/core.js
+var require_core = __commonJS({
+  "node_modules/@reporters/github/node_modules/@actions/core/lib/core.js"(exports2) {
+    "use strict";
+    var __createBinding = exports2 && exports2.__createBinding || (Object.create ? function(o, m, k, k2) {
+      if (k2 === void 0)
+        k2 = k;
+      var desc = Object.getOwnPropertyDescriptor(m, k);
+      if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+        desc = { enumerable: true, get: function() {
+          return m[k];
+        } };
+      }
+      Object.defineProperty(o, k2, desc);
+    } : function(o, m, k, k2) {
+      if (k2 === void 0)
+        k2 = k;
+      o[k2] = m[k];
+    });
+    var __setModuleDefault = exports2 && exports2.__setModuleDefault || (Object.create ? function(o, v) {
+      Object.defineProperty(o, "default", { enumerable: true, value: v });
+    } : function(o, v) {
+      o["default"] = v;
+    });
+    var __importStar = exports2 && exports2.__importStar || function(mod) {
+      if (mod && mod.__esModule)
+        return mod;
+      var result = {};
+      if (mod != null) {
+        for (var k in mod)
+          if (k !== "default" && Object.prototype.hasOwnProperty.call(mod, k))
+            __createBinding(result, mod, k);
+      }
+      __setModuleDefault(result, mod);
+      return result;
+    };
+    var __awaiter = exports2 && exports2.__awaiter || function(thisArg, _arguments, P, generator) {
+      function adopt(value) {
+        return value instanceof P ? value : new P(function(resolve) {
+          resolve(value);
+        });
+      }
+      return new (P || (P = Promise))(function(resolve, reject) {
+        function fulfilled(value) {
+          try {
+            step(generator.next(value));
+          } catch (e) {
+            reject(e);
+          }
+        }
+        function rejected(value) {
+          try {
+            step(generator["throw"](value));
+          } catch (e) {
+            reject(e);
+          }
+        }
+        function step(result) {
+          result.done ? resolve(result.value) : adopt(result.value).then(fulfilled, rejected);
+        }
+        step((generator = generator.apply(thisArg, _arguments || [])).next());
+      });
+    };
+    Object.defineProperty(exports2, "__esModule", { value: true });
+    exports2.platform = exports2.toPlatformPath = exports2.toWin32Path = exports2.toPosixPath = exports2.markdownSummary = exports2.summary = exports2.getIDToken = exports2.getState = exports2.saveState = exports2.group = exports2.endGroup = exports2.startGroup = exports2.info = exports2.notice = exports2.warning = exports2.error = exports2.debug = exports2.isDebug = exports2.setFailed = exports2.setCommandEcho = exports2.setOutput = exports2.getBooleanInput = exports2.getMultilineInput = exports2.getInput = exports2.addPath = exports2.setSecret = exports2.exportVariable = exports2.ExitCode = void 0;
+    var command_1 = require_command();
+    var file_command_1 = require_file_command();
+    var utils_1 = require_utils();
+    var os = __importStar(require("os"));
+    var path2 = __importStar(require("path"));
+    var oidc_utils_1 = require_oidc_utils();
+    var ExitCode;
+    (function(ExitCode2) {
+      ExitCode2[ExitCode2["Success"] = 0] = "Success";
+      ExitCode2[ExitCode2["Failure"] = 1] = "Failure";
+    })(ExitCode || (exports2.ExitCode = ExitCode = {}));
+    function exportVariable(name, val) {
+      const convertedVal = (0, utils_1.toCommandValue)(val);
+      process.env[name] = convertedVal;
+      const filePath = process.env["GITHUB_ENV"] || "";
+      if (filePath) {
+        return (0, file_command_1.issueFileCommand)("ENV", (0, file_command_1.prepareKeyValueMessage)(name, val));
+      }
+      (0, command_1.issueCommand)("set-env", { name }, convertedVal);
+    }
+    exports2.exportVariable = exportVariable;
+    function setSecret(secret) {
+      (0, command_1.issueCommand)("add-mask", {}, secret);
+    }
+    exports2.setSecret = setSecret;
+    function addPath(inputPath) {
+      const filePath = process.env["GITHUB_PATH"] || "";
+      if (filePath) {
+        (0, file_command_1.issueFileCommand)("PATH", inputPath);
+      } else {
+        (0, command_1.issueCommand)("add-path", {}, inputPath);
+      }
+      process.env["PATH"] = `${inputPath}${path2.delimiter}${process.env["PATH"]}`;
+    }
+    exports2.addPath = addPath;
+    function getInput(name, options) {
+      const val = process.env[`INPUT_${name.replace(/ /g, "_").toUpperCase()}`] || "";
+      if (options && options.required && !val) {
+        throw new Error(`Input required and not supplied: ${name}`);
+      }
+      if (options && options.trimWhitespace === false) {
+        return val;
+      }
+      return val.trim();
+    }
+    exports2.getInput = getInput;
+    function getMultilineInput(name, options) {
+      const inputs = getInput(name, options).split("\n").filter((x) => x !== "");
+      if (options && options.trimWhitespace === false) {
+        return inputs;
+      }
+      return inputs.map((input) => input.trim());
+    }
+    exports2.getMultilineInput = getMultilineInput;
+    function getBooleanInput(name, options) {
+      const trueValue = ["true", "True", "TRUE"];
+      const falseValue = ["false", "False", "FALSE"];
+      const val = getInput(name, options);
+      if (trueValue.includes(val))
+        return true;
+      if (falseValue.includes(val))
+        return false;
+      throw new TypeError(`Input does not meet YAML 1.2 "Core Schema" specification: ${name}
+Support boolean input list: \`true | True | TRUE | false | False | FALSE\``);
+    }
+    exports2.getBooleanInput = getBooleanInput;
+    function setOutput(name, value) {
+      const filePath = process.env["GITHUB_OUTPUT"] || "";
+      if (filePath) {
+        return (0, file_command_1.issueFileCommand)("OUTPUT", (0, file_command_1.prepareKeyValueMessage)(name, value));
+      }
+      process.stdout.write(os.EOL);
+      (0, command_1.issueCommand)("set-output", { name }, (0, utils_1.toCommandValue)(value));
+    }
+    exports2.setOutput = setOutput;
+    function setCommandEcho(enabled) {
+      (0, command_1.issue)("echo", enabled ? "on" : "off");
+    }
+    exports2.setCommandEcho = setCommandEcho;
+    function setFailed(message) {
+      process.exitCode = ExitCode.Failure;
+      error(message);
+    }
+    exports2.setFailed = setFailed;
+    function isDebug() {
+      return process.env["RUNNER_DEBUG"] === "1";
+    }
+    exports2.isDebug = isDebug;
+    function debug(message) {
+      (0, command_1.issueCommand)("debug", {}, message);
+    }
+    exports2.debug = debug;
+    function error(message, properties = {}) {
+      (0, command_1.issueCommand)("error", (0, utils_1.toCommandProperties)(properties), message instanceof Error ? message.toString() : message);
+    }
+    exports2.error = error;
+    function warning(message, properties = {}) {
+      (0, command_1.issueCommand)("warning", (0, utils_1.toCommandProperties)(properties), message instanceof Error ? message.toString() : message);
+    }
+    exports2.warning = warning;
+    function notice(message, properties = {}) {
+      (0, command_1.issueCommand)("notice", (0, utils_1.toCommandProperties)(properties), message instanceof Error ? message.toString() : message);
+    }
+    exports2.notice = notice;
+    function info(message) {
+      process.stdout.write(message + os.EOL);
+    }
+    exports2.info = info;
+    function startGroup(name) {
+      (0, command_1.issue)("group", name);
+    }
+    exports2.startGroup = startGroup;
+    function endGroup() {
+      (0, command_1.issue)("endgroup");
+    }
+    exports2.endGroup = endGroup;
+    function group(name, fn) {
+      return __awaiter(this, void 0, void 0, function* () {
+        startGroup(name);
+        let result;
+        try {
+          result = yield fn();
+        } finally {
+          endGroup();
+        }
+        return result;
+      });
+    }
+    exports2.group = group;
+    function saveState(name, value) {
+      const filePath = process.env["GITHUB_STATE"] || "";
+      if (filePath) {
+        return (0, file_command_1.issueFileCommand)("STATE", (0, file_command_1.prepareKeyValueMessage)(name, value));
+      }
+      (0, command_1.issueCommand)("save-state", { name }, (0, utils_1.toCommandValue)(value));
+    }
+    exports2.saveState = saveState;
+    function getState(name) {
       return process.env[`STATE_${name}`] || "";
     }
     exports2.getState = getState;
@@ -19033,6 +19843,7 @@ Support boolean input list: \`true | True | TRUE | false | False | FALSE\``);
     Object.defineProperty(exports2, "toPlatformPath", { enumerable: true, get: function() {
       return path_utils_1.toPlatformPath;
     } });
+    exports2.platform = __importStar(require_platform());
   }
 });
 
@@ -19305,6 +20116,7 @@ var require_stack_utils = __commonJS({
 
 // node_modules/@reporters/github/index.js
 var path = require("node:path");
+var { fileURLToPath } = require("node:url");
 var util = require("node:util");
 var { EOL } = require("node:os");
 var core = require_core();
@@ -19312,10 +20124,13 @@ var StackUtils = require_stack_utils();
 var WORKSPACE = process.env.GITHUB_WORKSPACE ?? "";
 var stack = new StackUtils({ cwd: WORKSPACE, internals: StackUtils.nodeInternals() });
 var isFile = (name) => name?.startsWith(WORKSPACE);
-var getRelativeFilePath = (name) => isFile(name) ? path.relative(WORKSPACE, require.resolve(name) ?? "") : null;
+var getRelativeFilePath = (name) => isFile(name) ? path.relative(WORKSPACE, name) : null;
 function getFilePath(fileName) {
   if (fileName.startsWith("file://")) {
-    return getRelativeFilePath(new URL(fileName).pathname);
+    return getRelativeFilePath(fileURLToPath(fileName));
+  }
+  if (!path.isAbsolute(fileName)) {
+    return getRelativeFilePath(path.resolve(fileName) ?? "");
   }
   return getRelativeFilePath(fileName);
 }
@@ -19390,7 +20205,7 @@ module.exports = async function githubReporter(source) {
         break;
     }
   }
-  const formatedDiagnostics = diagnostics.map((d) => {
+  const formattedDiagnostics = diagnostics.map((d) => {
     const [key, ...rest] = d.split(" ");
     const value = rest.join(" ");
     return [
@@ -19398,11 +20213,11 @@ module.exports = async function githubReporter(source) {
       DIAGNOSTIC_VALUES[key] ? DIAGNOSTIC_VALUES[key](value) : value
     ];
   });
-  core.startGroup(`Test results (${formatedDiagnostics.find(([key]) => key === DIAGNOSTIC_KEYS.pass)?.[1] ?? counter.pass} passed, ${formatedDiagnostics.find(([key]) => key === DIAGNOSTIC_KEYS.fail)?.[1] ?? counter.fail} failed)`);
-  core.notice(formatedDiagnostics.map((d) => d.join(": ")).join(EOL));
+  core.startGroup(`Test results (${formattedDiagnostics.find(([key]) => key === DIAGNOSTIC_KEYS.pass)?.[1] ?? counter.pass} passed, ${formattedDiagnostics.find(([key]) => key === DIAGNOSTIC_KEYS.fail)?.[1] ?? counter.fail} failed)`);
+  core.notice(formattedDiagnostics.map((d) => d.join(": ")).join(EOL));
   core.endGroup();
   if (process.env.GITHUB_STEP_SUMMARY) {
-    await core.summary.addHeading("Test Results").addTable(formatedDiagnostics).write();
+    await core.summary.addHeading("Test Results").addTable(formattedDiagnostics).write();
   }
 };
 /*! Bundled license information:
diff --git a/tools/github_reporter/package.json b/tools/github_reporter/package.json
index 5e37b4fc5a0307..d9e5273b7ce46e 100644
--- a/tools/github_reporter/package.json
+++ b/tools/github_reporter/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@reporters/github",
-  "version": "1.7.1",
+  "version": "1.7.2",
   "description": "A github actions reporter for `node:test`",
   "type": "commonjs",
   "keywords": [

From af020edf96c6414009d5eb282c44836674fc2702 Mon Sep 17 00:00:00 2001
From: Joyee Cheung <joyeec9h3@gmail.com>
Date: Tue, 17 Dec 2024 23:18:32 +0100
Subject: [PATCH 204/216] build: fix missing fp16 dependency in d8 builds
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR-URL: https://github.com/nodejs/node/pull/56266
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 tools/v8_gypfiles/d8.gyp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/v8_gypfiles/d8.gyp b/tools/v8_gypfiles/d8.gyp
index ef071b40710c49..4dd989724d3b6f 100644
--- a/tools/v8_gypfiles/d8.gyp
+++ b/tools/v8_gypfiles/d8.gyp
@@ -21,6 +21,7 @@
         'v8.gyp:v8_libplatform',
         'v8.gyp:generate_bytecode_builtins_list',
         'v8.gyp:v8_abseil',
+        'v8.gyp:fp16',
       ],
       # Generated source files need this explicitly:
       'include_dirs+': [

From ae683a9e1f1a2cd59fedabb3be87faf781d00d1d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=90=B4=E5=B0=8F=E7=99=BD?= <296015668@qq.com>
Date: Wed, 18 Dec 2024 17:42:47 +0800
Subject: [PATCH 205/216] build: set DESTCPU correctly for 'make binary' on
 loongarch64
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: 吴小白 <296015668@qq.com>
PR-URL: https://github.com/nodejs/node/pull/56271
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
---
 Makefile | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Makefile b/Makefile
index 22c58a2676a564..b31a252ff0fd63 100644
--- a/Makefile
+++ b/Makefile
@@ -933,6 +933,9 @@ else
 ifeq ($(findstring riscv64,$(UNAME_M)),riscv64)
 DESTCPU ?= riscv64
 else
+ifeq ($(findstring loongarch64,$(UNAME_M)),loongarch64)
+DESTCPU ?= loong64
+else
 DESTCPU ?= x86
 endif
 endif
@@ -946,6 +949,7 @@ endif
 endif
 endif
 endif
+endif
 ifeq ($(DESTCPU),x64)
 ARCH=x64
 else
@@ -970,6 +974,9 @@ else
 ifeq ($(DESTCPU),riscv64)
 ARCH=riscv64
 else
+ifeq ($(DESTCPU),loong64)
+ARCH=loong64
+else
 ARCH=x86
 endif
 endif
@@ -979,6 +986,7 @@ endif
 endif
 endif
 endif
+endif
 
 # node and v8 use different arch names (e.g. node 'x86' vs v8 'ia32').
 # pass the proper v8 arch name to $V8_ARCH based on user-specified $DESTCPU.

From f89f4ff8563f38cc851915e600769aecf1c5bd8a Mon Sep 17 00:00:00 2001
From: Rich Trott <rtrott@gmail.com>
Date: Wed, 18 Dec 2024 08:19:08 -0800
Subject: [PATCH 206/216] doc: fix color contrast issue in light mode
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Attributes are being highlighted as #f00 on a background of #f2f2f2.
That's a color contrast of 3.98:1, failing to meet the 4.5:1 requirement
of WCAG 2.1 AA. This changes the attribute color to #d00, which has a
color contrast of 5.09:1 meeting the 4.5:1 requirement.

PR-URL: https://github.com/nodejs/node/pull/56272
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Claudio Wunder <cwunder@gnome.org>
---
 doc/api_assets/hljs.css | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
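
Note (not part of the commit): the check described in the message can be
sketched with the WCAG 2.x relative-luminance formula. This is a minimal
sketch, not the tool used for the figures above; the helper names
`parseHex`, `relativeLuminance` and `contrastRatio` are hypothetical, and
the ratios it prints may not match the quoted figures digit for digit.
The point is only the pass/fail outcome against the 4.5:1 AA threshold.

```js
'use strict';

// Expand shorthand such as "#d00" to "dd0000" and return [r, g, b] as 0-255.
function parseHex(hex) {
  let h = hex.replace(/^#/, '');
  if (h.length === 3) h = [...h].map((c) => c + c).join('');
  return [0, 2, 4].map((i) => parseInt(h.slice(i, i + 2), 16));
}

// WCAG 2.x relative luminance: linearize each sRGB channel, then weight.
function relativeLuminance(hex) {
  const [r, g, b] = parseHex(hex).map((v) => {
    const c = v / 255;
    return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio is (lighter + 0.05) / (darker + 0.05), so it is >= 1.
function contrastRatio(a, b) {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const [lighter, darker] = la > lb ? [la, lb] : [lb, la];
  return (lighter + 0.05) / (darker + 0.05);
}

// Compare the old and new attribute colors against the stated background.
for (const attr of ['#f00', '#d00']) {
  const ratio = contrastRatio(attr, '#f2f2f2');
  console.log(attr, ratio.toFixed(2), ratio >= 4.5 ? 'meets AA' : 'fails AA');
}
```

The argument order of `contrastRatio` does not matter, because the lighter
color is always placed in the numerator.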

diff --git a/doc/api_assets/hljs.css b/doc/api_assets/hljs.css
index 86bd405709276c..91cece59e7cced 100644
--- a/doc/api_assets/hljs.css
+++ b/doc/api_assets/hljs.css
@@ -49,7 +49,7 @@
     color: #808080;
   }
   .hljs-attr {
-    color: #f00;
+    color: #d00;
   }
   .hljs-symbol,
   .hljs-bullet,

From 558e6588a4d1b1b04b223d5dc46281279911a021 Mon Sep 17 00:00:00 2001
From: Shu-yu Guo <syg@chromium.org>
Date: Fri, 26 Jan 2024 14:20:40 -0800
Subject: [PATCH 207/216] deps: V8: backport ae5a4db8ad86

Original commit message:

    [import-attributes] Deprecate 'assert' for removal in 12.6

    See https://groups.google.com/a/chromium.org/g/blink-dev/c/ZHvzLaJZRvo/m/FgNDBjrtBQAJ

    Bug: v8:10958
    Change-Id: I4d21c9f7aad1024b198b4a1cdfb4792a011da464
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5055681
    Reviewed-by: Rezvan Mahdavi Hezaveh <rezvan@chromium.org>
    Auto-Submit: Shu-yu Guo <syg@chromium.org>
    Commit-Queue: Shu-yu Guo <syg@chromium.org>
    Cr-Commit-Position: refs/heads/main@{#92044}

Refs: https://github.com/v8/v8/commit/ae5a4db8ad86c817f80856901ea121829f8b5184
Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/55961
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
---
 common.gypi                                   |  2 +-
 deps/v8/src/common/message-template.h         |  3 ++
 deps/v8/src/parsing/parser.cc                 |  3 ++
 .../fail/modules-import-assertions-fail-1.out |  1 +
 .../fail/modules-import-assertions-fail-2.out |  3 +-
 .../PrivateAccessorAccess.golden              |  8 ++---
 .../PrivateMethodAccess.golden                |  4 +--
 .../StaticPrivateMethodAccess.golden          | 30 +++++++++----------
 8 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/common.gypi b/common.gypi
index 1ece4f5e494533..c56785cbadfa2d 100644
--- a/common.gypi
+++ b/common.gypi
@@ -36,7 +36,7 @@
 
     # Reset this number to 0 on major V8 upgrades.
     # Increment by one for each non-official patch applied to deps/v8.
-    'v8_embedder_string': '-node.23',
+    'v8_embedder_string': '-node.24',
 
     ##### V8 defaults for Node.js #####
 
diff --git a/deps/v8/src/common/message-template.h b/deps/v8/src/common/message-template.h
index 99c9230107368d..fe26746d806def 100644
--- a/deps/v8/src/common/message-template.h
+++ b/deps/v8/src/common/message-template.h
@@ -113,6 +113,9 @@ namespace internal {
   T(IllegalInvocation, "Illegal invocation")                                   \
   T(ImmutablePrototypeSet,                                                     \
     "Immutable prototype object '%' cannot have their prototype set")          \
+  T(ImportAssertDeprecated,                                                    \
+    "'assert' is deprecated in import statements and support will be removed " \
+    "in %; use 'with' instead")                                                \
   T(ImportAssertionDuplicateKey, "Import assertion has duplicate key '%'")     \
   T(ImportCallNotNewExpression, "Cannot use new with import")                  \
   T(ImportOutsideModule, "Cannot use import statement outside a module")       \
diff --git a/deps/v8/src/parsing/parser.cc b/deps/v8/src/parsing/parser.cc
index 4a3fceed6af7c6..0d9085af752618 100644
--- a/deps/v8/src/parsing/parser.cc
+++ b/deps/v8/src/parsing/parser.cc
@@ -1365,6 +1365,9 @@ ImportAssertions* Parser::ParseImportAssertClause() {
              !scanner()->HasLineTerminatorBeforeNext() &&
              CheckContextualKeyword(ast_value_factory()->assert_string())) {
     // 'assert' keyword consumed
+    info_->pending_error_handler()->ReportWarningAt(
+        position(), end_position(), MessageTemplate::kImportAssertDeprecated,
+        "a future version");
   } else {
     return import_assertions;
   }
diff --git a/deps/v8/test/message/fail/modules-import-assertions-fail-1.out b/deps/v8/test/message/fail/modules-import-assertions-fail-1.out
index 1b3be22192a342..2ba99a283f69bc 100644
--- a/deps/v8/test/message/fail/modules-import-assertions-fail-1.out
+++ b/deps/v8/test/message/fail/modules-import-assertions-fail-1.out
@@ -1 +1,2 @@
+*%(basename)s:9: 'assert' is deprecated in import statements and support will be removed in a future version; use 'with' instead
 undefined:0: Error: Invalid module type was asserted
\ No newline at end of file
diff --git a/deps/v8/test/message/fail/modules-import-assertions-fail-2.out b/deps/v8/test/message/fail/modules-import-assertions-fail-2.out
index 47ac5621187d97..0a82154d871ddc 100644
--- a/deps/v8/test/message/fail/modules-import-assertions-fail-2.out
+++ b/deps/v8/test/message/fail/modules-import-assertions-fail-2.out
@@ -1,4 +1,5 @@
+*%(basename)s:9: 'assert' is deprecated in import statements and support will be removed in a future version; use 'with' instead
 undefined:1: SyntaxError: Unexpected token '/', "// Copyrig"... is not valid JSON
 // Copyright 2021 the V8 project authors. All rights reserved.
 ^
-SyntaxError: Unexpected token '/', "// Copyrig"... is not valid JSON
\ No newline at end of file
+SyntaxError: Unexpected token '/', "// Copyrig"... is not valid JSON
diff --git a/deps/v8/test/unittests/interpreter/bytecode_expectations/PrivateAccessorAccess.golden b/deps/v8/test/unittests/interpreter/bytecode_expectations/PrivateAccessorAccess.golden
index 3594623033d02b..d6091fbf9b7fdc 100644
--- a/deps/v8/test/unittests/interpreter/bytecode_expectations/PrivateAccessorAccess.golden
+++ b/deps/v8/test/unittests/interpreter/bytecode_expectations/PrivateAccessorAccess.golden
@@ -83,7 +83,7 @@ bytecodes: [
   /*   48 E> */ B(DefineKeyedOwnProperty), R(this), R(0), U8(0), U8(0),
   /*   53 S> */ B(LdaImmutableCurrentContextSlot), U8(3),
   /*   58 E> */ B(GetKeyedProperty), R(this), U8(2),
-                B(Wide), B(LdaSmi), I16(311),
+                B(Wide), B(LdaSmi), I16(312),
                 B(Star2),
                 B(LdaConstant), U8(0),
                 B(Star3),
@@ -115,7 +115,7 @@ bytecodes: [
   /*   41 E> */ B(DefineKeyedOwnProperty), R(this), R(0), U8(0), U8(0),
   /*   46 S> */ B(LdaImmutableCurrentContextSlot), U8(3),
   /*   51 E> */ B(GetKeyedProperty), R(this), U8(2),
-                B(Wide), B(LdaSmi), I16(310),
+                B(Wide), B(LdaSmi), I16(311),
                 B(Star2),
                 B(LdaConstant), U8(0),
                 B(Star3),
@@ -149,7 +149,7 @@ bytecodes: [
                 B(Star2),
                 B(LdaImmutableCurrentContextSlot), U8(3),
   /*   58 E> */ B(GetKeyedProperty), R(this), U8(2),
-                B(Wide), B(LdaSmi), I16(311),
+                B(Wide), B(LdaSmi), I16(312),
                 B(Star3),
                 B(LdaConstant), U8(0),
                 B(Star4),
@@ -181,7 +181,7 @@ bytecodes: [
   /*   41 E> */ B(DefineKeyedOwnProperty), R(this), R(0), U8(0), U8(0),
   /*   46 S> */ B(LdaImmutableCurrentContextSlot), U8(3),
   /*   51 E> */ B(GetKeyedProperty), R(this), U8(2),
-                B(Wide), B(LdaSmi), I16(310),
+                B(Wide), B(LdaSmi), I16(311),
                 B(Star2),
                 B(LdaConstant), U8(0),
                 B(Star3),
diff --git a/deps/v8/test/unittests/interpreter/bytecode_expectations/PrivateMethodAccess.golden b/deps/v8/test/unittests/interpreter/bytecode_expectations/PrivateMethodAccess.golden
index b7bb831e618647..1c8dc4d9e9a6dd 100644
--- a/deps/v8/test/unittests/interpreter/bytecode_expectations/PrivateMethodAccess.golden
+++ b/deps/v8/test/unittests/interpreter/bytecode_expectations/PrivateMethodAccess.golden
@@ -58,7 +58,7 @@ bytecodes: [
                 B(Star2),
                 B(LdaImmutableCurrentContextSlot), U8(3),
   /*   54 E> */ B(GetKeyedProperty), R(this), U8(2),
-                B(Wide), B(LdaSmi), I16(309),
+                B(Wide), B(LdaSmi), I16(310),
                 B(Star3),
                 B(LdaConstant), U8(0),
                 B(Star4),
@@ -91,7 +91,7 @@ bytecodes: [
   /*   44 E> */ B(DefineKeyedOwnProperty), R(this), R(0), U8(0), U8(0),
   /*   49 S> */ B(LdaImmutableCurrentContextSlot), U8(3),
   /*   54 E> */ B(GetKeyedProperty), R(this), U8(2),
-                B(Wide), B(LdaSmi), I16(309),
+                B(Wide), B(LdaSmi), I16(310),
                 B(Star2),
                 B(LdaConstant), U8(0),
                 B(Star3),
diff --git a/deps/v8/test/unittests/interpreter/bytecode_expectations/StaticPrivateMethodAccess.golden b/deps/v8/test/unittests/interpreter/bytecode_expectations/StaticPrivateMethodAccess.golden
index e8350d6b7b42ca..2b21d2260fc0ff 100644
--- a/deps/v8/test/unittests/interpreter/bytecode_expectations/StaticPrivateMethodAccess.golden
+++ b/deps/v8/test/unittests/interpreter/bytecode_expectations/StaticPrivateMethodAccess.golden
@@ -24,7 +24,7 @@ bytecodes: [
                 B(TestReferenceEqual), R(this),
                 B(Mov), R(this), R(1),
                 B(JumpIfTrue), U8(16),
-                B(Wide), B(LdaSmi), I16(303),
+                B(Wide), B(LdaSmi), I16(304),
                 B(Star2),
                 B(LdaConstant), U8(0),
                 B(Star3),
@@ -61,13 +61,13 @@ bytecodes: [
                 B(TestReferenceEqual), R(this),
                 B(Mov), R(this), R(0),
                 B(JumpIfTrue), U8(16),
-                B(Wide), B(LdaSmi), I16(303),
+                B(Wide), B(LdaSmi), I16(304),
                 B(Star2),
                 B(LdaConstant), U8(0),
                 B(Star3),
   /*   61 E> */ B(CallRuntime), U16(Runtime::kNewTypeError), R(2), U8(2),
                 B(Throw),
-                B(Wide), B(LdaSmi), I16(309),
+                B(Wide), B(LdaSmi), I16(310),
                 B(Star2),
                 B(LdaConstant), U8(1),
                 B(Star3),
@@ -99,13 +99,13 @@ bytecodes: [
                 B(TestReferenceEqual), R(this),
                 B(Mov), R(this), R(0),
                 B(JumpIfTrue), U8(16),
-                B(Wide), B(LdaSmi), I16(303),
+                B(Wide), B(LdaSmi), I16(304),
                 B(Star1),
                 B(LdaConstant), U8(0),
                 B(Star2),
   /*   61 E> */ B(CallRuntime), U16(Runtime::kNewTypeError), R(1), U8(2),
                 B(Throw),
-                B(Wide), B(LdaSmi), I16(309),
+                B(Wide), B(LdaSmi), I16(310),
                 B(Star1),
                 B(LdaConstant), U8(1),
                 B(Star2),
@@ -145,7 +145,7 @@ bytecodes: [
                 B(TestReferenceEqual), R(this),
                 B(Mov), R(this), R(0),
                 B(JumpIfTrue), U8(16),
-                B(Wide), B(LdaSmi), I16(303),
+                B(Wide), B(LdaSmi), I16(304),
                 B(Star2),
                 B(LdaConstant), U8(0),
                 B(Star3),
@@ -167,7 +167,7 @@ bytecodes: [
                 B(TestReferenceEqual), R(this),
                 B(Mov), R(this), R(0),
                 B(JumpIfTrue), U8(16),
-                B(Wide), B(LdaSmi), I16(303),
+                B(Wide), B(LdaSmi), I16(304),
                 B(Star3),
                 B(LdaConstant), U8(0),
                 B(Star4),
@@ -182,7 +182,7 @@ bytecodes: [
                 B(TestReferenceEqual), R(this),
                 B(Mov), R(this), R(0),
                 B(JumpIfTrue), U8(16),
-                B(Wide), B(LdaSmi), I16(303),
+                B(Wide), B(LdaSmi), I16(304),
                 B(Star2),
                 B(LdaConstant), U8(0),
                 B(Star3),
@@ -216,13 +216,13 @@ bytecodes: [
                 B(TestReferenceEqual), R(this),
                 B(Mov), R(this), R(0),
                 B(JumpIfTrue), U8(16),
-                B(Wide), B(LdaSmi), I16(303),
+                B(Wide), B(LdaSmi), I16(304),
                 B(Star1),
                 B(LdaConstant), U8(0),
                 B(Star2),
   /*   65 E> */ B(CallRuntime), U16(Runtime::kNewTypeError), R(1), U8(2),
                 B(Throw),
-                B(Wide), B(LdaSmi), I16(311),
+                B(Wide), B(LdaSmi), I16(312),
                 B(Star1),
                 B(LdaConstant), U8(1),
                 B(Star2),
@@ -253,13 +253,13 @@ bytecodes: [
                 B(TestReferenceEqual), R(this),
                 B(Mov), R(this), R(0),
                 B(JumpIfTrue), U8(16),
-                B(Wide), B(LdaSmi), I16(303),
+                B(Wide), B(LdaSmi), I16(304),
                 B(Star1),
                 B(LdaConstant), U8(0),
                 B(Star2),
   /*   58 E> */ B(CallRuntime), U16(Runtime::kNewTypeError), R(1), U8(2),
                 B(Throw),
-                B(Wide), B(LdaSmi), I16(310),
+                B(Wide), B(LdaSmi), I16(311),
                 B(Star1),
                 B(LdaConstant), U8(1),
                 B(Star2),
@@ -292,13 +292,13 @@ bytecodes: [
                 B(TestReferenceEqual), R(this),
                 B(Mov), R(this), R(0),
                 B(JumpIfTrue), U8(16),
-                B(Wide), B(LdaSmi), I16(303),
+                B(Wide), B(LdaSmi), I16(304),
                 B(Star2),
                 B(LdaConstant), U8(0),
                 B(Star3),
   /*   65 E> */ B(CallRuntime), U16(Runtime::kNewTypeError), R(2), U8(2),
                 B(Throw),
-                B(Wide), B(LdaSmi), I16(311),
+                B(Wide), B(LdaSmi), I16(312),
                 B(Star2),
                 B(LdaConstant), U8(1),
                 B(Star3),
@@ -327,7 +327,7 @@ bytecode array length: 19
 bytecodes: [
   /*   46 S> */ B(LdaImmutableCurrentContextSlot), U8(3),
   /*   51 E> */ B(GetKeyedProperty), R(this), U8(0),
-                B(Wide), B(LdaSmi), I16(310),
+                B(Wide), B(LdaSmi), I16(311),
                 B(Star1),
                 B(LdaConstant), U8(0),
                 B(Star2),

From 3c4262a1713198ed873c0f3e8c85384f3fd70422 Mon Sep 17 00:00:00 2001
From: Shu-yu Guo <syg@chromium.org>
Date: Tue, 30 Jan 2024 16:28:06 -0800
Subject: [PATCH 208/216] deps: V8: cherry-pick 26fd1dfa9cd6

Original commit message:

    [import-attributes] Deprecate 'assert' for dynamic import as well

    Bug: v8:10958
    Change-Id: I7847bdb5d2c79f057f4e1df99f8f5889788f09cb
    Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5249778
    Commit-Queue: Shu-yu Guo <syg@chromium.org>
    Reviewed-by: Leszek Swirski <leszeks@chromium.org>
    Cr-Commit-Position: refs/heads/main@{#92123}

Refs: https://github.com/v8/v8/commit/26fd1dfa9cd6d56eae8ecfc7a136fd6709fba161
PR-URL: https://github.com/nodejs/node/pull/55961
Reviewed-By: Jacob Smith <jacob@frende.me>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
---
 common.gypi                      |  2 +-
 deps/v8/src/execution/isolate.cc | 14 ++++++++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/common.gypi b/common.gypi
index c56785cbadfa2d..04852d81103ef8 100644
--- a/common.gypi
+++ b/common.gypi
@@ -36,7 +36,7 @@
 
     # Reset this number to 0 on major V8 upgrades.
     # Increment by one for each non-official patch applied to deps/v8.
-    'v8_embedder_string': '-node.24',
+    'v8_embedder_string': '-node.25',
 
     ##### V8 defaults for Node.js #####
 
diff --git a/deps/v8/src/execution/isolate.cc b/deps/v8/src/execution/isolate.cc
index 33ff1348f58989..242c611f7582aa 100644
--- a/deps/v8/src/execution/isolate.cc
+++ b/deps/v8/src/execution/isolate.cc
@@ -5193,6 +5193,20 @@ MaybeHandle<FixedArray> Isolate::GetImportAssertionsFromArgument(
       // an error.
       return MaybeHandle<FixedArray>();
     }
+
+    if (V8_UNLIKELY(!import_assertions_object->IsUndefined())) {
+      MessageLocation* location = nullptr;
+      MessageLocation computed_location;
+      if (ComputeLocation(&computed_location)) {
+        location = &computed_location;
+      }
+      Handle<JSMessageObject> message = MessageHandler::MakeMessageObject(
+          this, MessageTemplate::kImportAssertDeprecated, location,
+          factory()->NewStringFromAsciiChecked("a future version"),
+          Handle<FixedArray>::null());
+      message->set_error_level(v8::Isolate::kMessageWarning);
+      MessageHandler::ReportMessage(this, location, message);
+    }
   }
 
   // If there is no 'with' or 'assert' option in the options bag, it's not an

From 030f1559864167979fe03b655a554a48fdc7c9f4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicol=C3=B2=20Ribaudo?= <nribaudo@igalia.com>
Date: Sat, 12 Oct 2024 13:21:09 +0200
Subject: [PATCH 209/216] esm: mark import attributes and JSON module as stable
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The two proposals reached Stage 4 at the October 2024 TC39 meeting.

PR-URL: https://github.com/nodejs/node/pull/55333
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Backport-PR-URL: https://github.com/nodejs/node/pull/55961
---
 doc/api/esm.md                          | 17 +++++++----------
 lib/internal/modules/esm/translators.js |  1 -
 test/es-module/test-esm-json.mjs        |  4 ++--
 3 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/doc/api/esm.md b/doc/api/esm.md
index 7016e24065feab..90a3d0c9903be8 100644
--- a/doc/api/esm.md
+++ b/doc/api/esm.md
@@ -262,13 +262,9 @@ changes:
     description: Switch from Import Assertions to Import Attributes.
 -->
 
-> Stability: 1.1 - Active development
-
-> This feature was previously named "Import assertions", and using the `assert`
-> keyword instead of `with`. Any uses in code of the prior `assert` keyword
-> should be updated to use `with` instead.
+> Stability: 2 - Stable
 
-The [Import Attributes proposal][] adds an inline syntax for module import
+[Import attributes][Import Attributes MDN] are an inline syntax for module import
 statements to pass on more information alongside the module specifier.
 
 ```js
@@ -278,13 +274,14 @@ const { default: barData } =
   await import('./bar.json', { with: { type: 'json' } });
 ```
 
-Node.js supports the following `type` values, for which the attribute is
-mandatory:
+Node.js only supports the `type` attribute, for which it supports the following values:
 
 | Attribute `type` | Needed for       |
 | ---------------- | ---------------- |
 | `'json'`         | [JSON modules][] |
 
+The `type: 'json'` attribute is mandatory when importing JSON modules.
+
 ## Built-in modules
 
 [Built-in modules][] provide named exports of their public API. A
@@ -591,7 +588,7 @@ separate cache.
 
 ## JSON modules
 
-> Stability: 1 - Experimental
+> Stability: 2 - Stable
 
 JSON files can be referenced by `import`:
 
@@ -1129,7 +1126,7 @@ resolution for ESM specifiers is [commonjs-extension-resolution-loader][].
 [Dynamic `import()`]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/import
 [ES Module Integration Proposal for WebAssembly]: https://github.com/webassembly/esm-integration
 [Import Attributes]: #import-attributes
-[Import Attributes proposal]: https://github.com/tc39/proposal-import-attributes
+[Import Attributes MDN]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import/with
 [JSON modules]: #json-modules
 [Loading ECMAScript modules using `require()`]: modules.md#loading-ecmascript-modules-using-require
 [Module customization hooks]: module.md#customization-hooks
diff --git a/lib/internal/modules/esm/translators.js b/lib/internal/modules/esm/translators.js
index 8f4b6b25d88896..044d820161a5f9 100644
--- a/lib/internal/modules/esm/translators.js
+++ b/lib/internal/modules/esm/translators.js
@@ -450,7 +450,6 @@ translators.set('builtin', function builtinStrategy(url) {
 // Strategy for loading a JSON file
 const isWindows = process.platform === 'win32';
 translators.set('json', function jsonStrategy(url, source) {
-  emitExperimentalWarning('Importing JSON modules');
   assertBufferSource(source, true, 'load');
   debug(`Loading JSONModule ${url}`);
   const pathname = StringPrototypeStartsWith(url, 'file:') ?
diff --git a/test/es-module/test-esm-json.mjs b/test/es-module/test-esm-json.mjs
index 8be7b60d322573..083194ab4237b0 100644
--- a/test/es-module/test-esm-json.mjs
+++ b/test/es-module/test-esm-json.mjs
@@ -16,12 +16,12 @@ describe('ESM: importing JSON', () => {
     assert.strictEqual(secret.ofLife, 42);
   });
 
-  it('should print an experimental warning', async () => {
+  it('should not print an experimental warning', async () => {
     const { code, signal, stderr } = await spawnPromisified(execPath, [
       fixtures.path('/es-modules/json-modules.mjs'),
     ]);
 
-    assert.match(stderr, /ExperimentalWarning: Importing JSON modules/);
+    assert.strictEqual(stderr, '');
     assert.strictEqual(code, 0);
     assert.strictEqual(signal, null);
   });

From 618e0056724142feee8eb77759d4fedf38d1f689 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Sat, 16 Nov 2024 16:36:27 +0000
Subject: [PATCH 210/216] doc: add history entries for JSON modules
 stabilization

PR-URL: https://github.com/nodejs/node/pull/55855
Refs: https://github.com/nodejs/node/pull/55333
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
Backport-PR-URL: https://github.com/nodejs/node/pull/55961
---
 doc/api/esm.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/doc/api/esm.md b/doc/api/esm.md
index 90a3d0c9903be8..65174473f8e37c 100644
--- a/doc/api/esm.md
+++ b/doc/api/esm.md
@@ -7,6 +7,9 @@
 <!-- YAML
 added: v8.5.0
 changes:
+  - version: REPLACEME
+    pr-url: https://github.com/nodejs/node/pull/55333
+    description: Import attributes are no longer experimental.
   - version: v20.10.0
     pr-url: https://github.com/nodejs/node/pull/50140
     description: Add experimental support for import attributes.
@@ -588,6 +591,13 @@ separate cache.
 
 ## JSON modules
 
+<!-- YAML
+changes:
+  - version: REPLACEME
+    pr-url: https://github.com/nodejs/node/pull/55333
+    description: JSON modules are no longer experimental.
+-->
+
 > Stability: 2 - Stable
 
 JSON files can be referenced by `import`:

From f07be5e3cdbd2b0dfb74980f8edfcd306dc96c6e Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Thu, 23 Jan 2025 09:11:22 +0100
Subject: [PATCH 211/216] doc: add note for features using `InternalWorker`
 with permission model

PR-URL: https://github.com/nodejs/node/pull/56706
Backport-PR-URL: https://github.com/nodejs/node/pull/56721
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
---
 doc/api/cli.md    | 11 +++++++++++
 doc/api/module.md |  7 +++++++
 2 files changed, 18 insertions(+)

diff --git a/doc/api/cli.md b/doc/api/cli.md
index 0b38b15ec47e39..df7031f65d44f9 100644
--- a/doc/api/cli.md
+++ b/doc/api/cli.md
@@ -951,6 +951,10 @@ Previously gated the entire `import.meta.resolve` feature.
 <!-- YAML
 added: v8.8.0
 changes:
+  - version: v20.18.2
+    pr-url: https://github.com/nodejs-private/node-private/pull/629
+    description: Using this feature with the permission model enabled requires
+                 passing `--allow-worker`.
   - version: v12.11.1
     pr-url: https://github.com/nodejs/node/pull/29752
     description: This flag was renamed from `--loader` to
@@ -964,6 +968,8 @@ changes:
 Specify the `module` containing exported [module customization hooks][].
 `module` may be any string accepted as an [`import` specifier][].
 
+This feature requires `--allow-worker` if used with the [Permission Model][].
+
 ### `--experimental-network-imports`
 
 <!-- YAML
@@ -1070,6 +1076,11 @@ report is not generated. See the documentation on
 
 <!-- YAML
 added: v20.18.0
+changes:
+  - version: v20.18.2
+    pr-url: https://github.com/nodejs-private/node-private/pull/629
+    description: Using this feature with the permission model enabled requires
+                 passing `--allow-worker`.
 -->
 
 > Stability: 1.0 - Early development
diff --git a/doc/api/module.md b/doc/api/module.md
index e624c4a13bb1b7..8cd718ab8d8093 100644
--- a/doc/api/module.md
+++ b/doc/api/module.md
@@ -87,6 +87,10 @@ isBuiltin('wss'); // false
 <!-- YAML
 added: v20.6.0
 changes:
+  - version: v20.18.2
+    pr-url: https://github.com/nodejs-private/node-private/pull/629
+    description: Using this feature with the permission model enabled requires
+                 passing `--allow-worker`.
   - version: v20.8.0
     pr-url: https://github.com/nodejs/node/pull/49655
     description: Add support for WHATWG URL instances.
@@ -113,6 +117,8 @@ changes:
 Register a module that exports [hooks][] that customize Node.js module
 resolution and loading behavior. See [Customization hooks][].
 
+This feature requires `--allow-worker` if used with the [Permission Model][].
+
 ### `module.syncBuiltinESMExports()`
 
 <!-- YAML
@@ -1117,6 +1123,7 @@ returned object contains the following keys:
 [Customization hooks]: #customization-hooks
 [ES Modules]: esm.md
 [HTTPS and HTTP imports]: esm.md#https-and-http-imports
+[Permission Model]: permissions.md#permission-model
 [Source map v3 format]: https://sourcemaps.info/spec.html#h.mofvlxcwqzej
 [`"exports"`]: packages.md#exports
 [`--enable-source-maps`]: cli.md#--enable-source-maps

From f78508cd30bce720c76381d65c1486481644a8d6 Mon Sep 17 00:00:00 2001
From: Antoine du Hamel <duhamelantoine1995@gmail.com>
Date: Wed, 22 Jan 2025 21:02:09 +0100
Subject: [PATCH 212/216] doc: add history info for Permission Model

PR-URL: https://github.com/nodejs/node/pull/56707
Backport-PR-URL: https://github.com/nodejs/node/pull/56724
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
---
 doc/api/permissions.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/doc/api/permissions.md b/doc/api/permissions.md
index cc8bde3320550a..80516643e890d4 100644
--- a/doc/api/permissions.md
+++ b/doc/api/permissions.md
@@ -1,5 +1,9 @@
 # Permissions
 
+<!--introduced_in=v20.0.0-->
+
+<!-- source_link=src/permission.cc -->
+
 Permissions can be used to control what system resources the
 Node.js process has access to or what actions the process can take
 with those resources. Permissions can also control what modules can
@@ -475,12 +479,12 @@ Additionally, import maps only work on `import` so it may be desirable to add a
 
 ### Permission Model
 
-<!-- type=misc -->
+<!-- YAML
+added: v20.0.0
+-->
 
 > Stability: 1.1 - Active development
 
-<!-- name=permission-model -->
-
 The Node.js Permission Model is a mechanism for restricting access to specific
 resources during execution.
 The API exists behind a flag [`--experimental-permission`][] which when enabled,

From 5c3f18be0472d6d524c36abdc3543db3b7ffd3a5 Mon Sep 17 00:00:00 2001
From: Rafael Gonzaga <rafael.nunu@hotmail.com>
Date: Wed, 29 Jan 2025 12:21:48 -0300
Subject: [PATCH 213/216] test: temporarily remove resource check from fs
 read-write

Since the last security release, the resource check has been
flaky on Windows. This commit temporarily disables those checks
to unblock the next regular release.

PR-URL: https://github.com/nodejs/node/pull/56789
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
---
 test/fixtures/permission/fs-read.js  | 100 +++++++++++++--------------
 test/fixtures/permission/fs-write.js |  76 ++++++++++----------
 2 files changed, 87 insertions(+), 89 deletions(-)

diff --git a/test/fixtures/permission/fs-read.js b/test/fixtures/permission/fs-read.js
index a32e12f82aa8ab..03261d975ab94c 100644
--- a/test/fixtures/permission/fs-read.js
+++ b/test/fixtures/permission/fs-read.js
@@ -18,26 +18,26 @@ const regularFile = __filename;
   fs.readFile(blockedFile, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   fs.readFile(bufferBlockedFile, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.readFileSync(blockedFile);
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.readFileSync(blockedFileURL);
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
 }
 
@@ -51,7 +51,7 @@ const regularFile = __filename;
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   })).then(common.mustCall());
   assert.rejects(() => {
     return new Promise((_resolve, reject) => {
@@ -61,7 +61,7 @@ const regularFile = __filename;
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   })).then(common.mustCall());
 
   assert.rejects(() => {
@@ -72,7 +72,7 @@ const regularFile = __filename;
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   })).then(common.mustCall());
 }
 
@@ -81,31 +81,31 @@ const regularFile = __filename;
   fs.stat(blockedFile, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   fs.stat(bufferBlockedFile, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.statSync(blockedFile);
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.statSync(blockedFileURL);
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   fs.stat(path.join(blockedFolder, 'anyfile'), common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(path.join(blockedFolder, 'anyfile')),
+    // resource: path.toNamespacedPath(path.join(blockedFolder, 'anyfile')),
   }));
 
   // doesNotThrow
@@ -119,26 +119,26 @@ const regularFile = __filename;
   fs.access(blockedFile, fs.constants.R_OK, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   fs.access(bufferBlockedFile, fs.constants.R_OK, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.accessSync(blockedFileURL, fs.constants.R_OK);
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.accessSync(path.join(blockedFolder, 'anyfile'), fs.constants.R_OK);
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(path.join(blockedFolder, 'anyfile')),
+    // resource: path.toNamespacedPath(path.join(blockedFolder, 'anyfile')),
   }));
 
   // doesNotThrow
@@ -152,26 +152,26 @@ const regularFile = __filename;
   fs.copyFile(blockedFile, path.join(blockedFolder, 'any-other-file'), common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   fs.copyFile(bufferBlockedFile, path.join(blockedFolder, 'any-other-file'), common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.copyFileSync(blockedFileURL, path.join(blockedFolder, 'any-other-file'));
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.copyFileSync(blockedFile, path.join(__dirname, 'any-other-file'));
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
 }
 
@@ -182,30 +182,28 @@ const regularFile = __filename;
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    // cpSync calls lstatSync before reading blockedFile
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.cpSync(bufferBlockedFile, path.join(blockedFolder, 'any-other-file'));
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.cpSync(blockedFileURL, path.join(blockedFolder, 'any-other-file'));
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    // cpSync calls lstatSync before reading blockedFile
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.cpSync(blockedFile, path.join(__dirname, 'any-other-file'));
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
 }
 
@@ -214,26 +212,26 @@ const regularFile = __filename;
   fs.open(blockedFile, 'r', common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   fs.open(bufferBlockedFile, 'r', common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.openSync(blockedFileURL, 'r');
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.openSync(path.join(blockedFolder, 'anyfile'), 'r');
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(path.join(blockedFolder, 'anyfile')),
+    // resource: path.toNamespacedPath(path.join(blockedFolder, 'anyfile')),
   }));
 
   // doesNotThrow
@@ -260,14 +258,14 @@ const regularFile = __filename;
   fs.opendir(blockedFolder, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFolder),
+    // resource: path.toNamespacedPath(blockedFolder),
   }));
   assert.throws(() => {
     fs.opendirSync(blockedFolder);
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFolder),
+    // resource: path.toNamespacedPath(blockedFolder),
   }));
   // doesNotThrow
   fs.opendir(allowedFolder, (err, dir) => {
@@ -281,14 +279,14 @@ const regularFile = __filename;
   fs.readdir(blockedFolder, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFolder),
+    // resource: path.toNamespacedPath(blockedFolder),
   }));
   assert.throws(() => {
     fs.readdirSync(blockedFolder);
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFolder),
+    // resource: path.toNamespacedPath(blockedFolder),
   }));
 
   // doesNotThrow
@@ -304,14 +302,14 @@ const regularFile = __filename;
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.watch(blockedFileURL, () => {});
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
 
   // doesNotThrow
@@ -327,14 +325,14 @@ const regularFile = __filename;
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.watchFile(blockedFileURL, common.mustNotCall());
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
 }
 
@@ -343,26 +341,26 @@ const regularFile = __filename;
   fs.rename(blockedFile, 'newfile', common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   fs.rename(bufferBlockedFile, 'newfile', common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.renameSync(blockedFile, 'newfile');
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.renameSync(blockedFileURL, 'newfile');
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
 }
 
@@ -373,21 +371,21 @@ const regularFile = __filename;
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.openAsBlob(bufferBlockedFile);
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.openAsBlob(blockedFileURL);
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
 }
 
@@ -405,7 +403,7 @@ const regularFile = __filename;
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
 }
 
@@ -414,26 +412,26 @@ const regularFile = __filename;
   fs.statfs(blockedFile, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   fs.statfs(bufferBlockedFile, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.statfsSync(blockedFile);
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.statfsSync(blockedFileURL);
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
 }
 
@@ -444,7 +442,7 @@ const regularFile = __filename;
   }, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemRead',
-    resource: blockedFolder,
+    // resource: blockedFolder,
   }));
 }
 
diff --git a/test/fixtures/permission/fs-write.js b/test/fixtures/permission/fs-write.js
index 0c0ec72602041a..5dd3b07ed9a0cf 100644
--- a/test/fixtures/permission/fs-write.js
+++ b/test/fixtures/permission/fs-write.js
@@ -28,31 +28,31 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
   fs.writeFile(blockedFile, 'example', common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   fs.writeFile(bufferBlockedFile, 'example', common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.writeFileSync(blockedFileURL, 'example');
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
   assert.throws(() => {
     fs.writeFileSync(relativeProtectedFile, 'example');
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(relativeProtectedFile),
+    // resource: path.toNamespacedPath(relativeProtectedFile),
   });
 
   assert.throws(() => {
@@ -60,7 +60,7 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(path.join(blockedFolder, 'anyfile')),
+    // resource: path.toNamespacedPath(path.join(blockedFolder, 'anyfile')),
   });
 }
 
@@ -74,7 +74,7 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }).then(common.mustCall());
   assert.rejects(() => {
     return new Promise((_resolve, reject) => {
@@ -84,7 +84,7 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(relativeProtectedFile),
+    // resource: path.toNamespacedPath(relativeProtectedFile),
   }).then(common.mustCall());
 
   assert.rejects(() => {
@@ -95,7 +95,7 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(path.join(blockedFolder, 'example')),
+    // resource: path.toNamespacedPath(path.join(blockedFolder, 'example')),
   }).then(common.mustCall());
 }
 
@@ -106,28 +106,28 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
   assert.throws(() => {
     fs.utimes(bufferBlockedFile, new Date(), new Date(), () => {});
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
   assert.throws(() => {
     fs.utimes(blockedFileURL, new Date(), new Date(), () => {});
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
   assert.throws(() => {
     fs.utimes(relativeProtectedFile, new Date(), new Date(), () => {});
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(relativeProtectedFile),
+    // resource: path.toNamespacedPath(relativeProtectedFile),
   });
 
   assert.throws(() => {
@@ -135,7 +135,7 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(path.join(blockedFolder, 'anyfile')),
+    // resource: path.toNamespacedPath(path.join(blockedFolder, 'anyfile')),
   });
 }
 
@@ -146,21 +146,21 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   },{
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
   assert.throws(() => {
     fs.lutimes(bufferBlockedFile, new Date(), new Date(), () => {});
   },{
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
   assert.throws(() => {
     fs.lutimes(blockedFileURL, new Date(), new Date(), () => {});
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
 }
 
@@ -173,7 +173,7 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   },{
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(path.join(blockedFolder, 'any-folder')),
+    // resource: path.toNamespacedPath(path.join(blockedFolder, 'any-folder')),
   });
   assert.throws(() => {
     fs.mkdir(path.join(relativeProtectedFolder, 'any-folder'), (err) => {
@@ -182,7 +182,7 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   },{
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(path.join(relativeProtectedFolder, 'any-folder')),
+    // resource: path.toNamespacedPath(path.join(relativeProtectedFolder, 'any-folder')),
   });
 }
 
@@ -206,38 +206,38 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
   fs.rename(blockedFile, path.join(blockedFile, 'renamed'), common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   fs.rename(bufferBlockedFile, path.join(blockedFile, 'renamed'), common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.renameSync(blockedFileURL, path.join(blockedFile, 'renamed'));
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
   assert.throws(() => {
     fs.renameSync(relativeProtectedFile, path.join(relativeProtectedFile, 'renamed'));
   },{
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(relativeProtectedFile),
+    // resource: path.toNamespacedPath(relativeProtectedFile),
   });
   assert.throws(() => {
     fs.renameSync(blockedFile, path.join(regularFolder, 'renamed'));
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
 
   assert.throws(() => {
@@ -245,7 +245,7 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   },{
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(path.join(blockedFolder, 'renamed')),
+    // resource: path.toNamespacedPath(path.join(blockedFolder, 'renamed')),
   });
 }
 
@@ -256,24 +256,24 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   },{
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(path.join(blockedFolder, 'any-file')),
+    // resource: path.toNamespacedPath(path.join(blockedFolder, 'any-file')),
   });
   assert.throws(() => {
     fs.copyFileSync(regularFile, path.join(relativeProtectedFolder, 'any-file'));
   },{
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(path.join(relativeProtectedFolder, 'any-file')),
+    // resource: path.toNamespacedPath(path.join(relativeProtectedFolder, 'any-file')),
   });
   fs.copyFile(regularFile, path.join(relativeProtectedFolder, 'any-file'), common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(path.join(relativeProtectedFolder, 'any-file')),
+    // resource: path.toNamespacedPath(path.join(relativeProtectedFolder, 'any-file')),
   }));
   fs.copyFile(bufferBlockedFile, path.join(relativeProtectedFolder, 'any-file'), common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(path.join(relativeProtectedFolder, 'any-file')),
+    // resource: path.toNamespacedPath(path.join(relativeProtectedFolder, 'any-file')),
   }));
 }
 
@@ -284,14 +284,14 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   },{
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(path.join(blockedFolder, 'any-file')),
+    // resource: path.toNamespacedPath(path.join(blockedFolder, 'any-file')),
   });
   assert.throws(() => {
     fs.cpSync(regularFile, path.join(relativeProtectedFolder, 'any-file'));
   },{
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(path.join(relativeProtectedFolder, 'any-file')),
+    // resource: path.toNamespacedPath(path.join(relativeProtectedFolder, 'any-file')),
   });
 }
 
@@ -302,14 +302,14 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   },{
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFolder),
+    // resource: path.toNamespacedPath(blockedFolder),
   });
   assert.throws(() => {
     fs.rmSync(relativeProtectedFolder, { recursive: true });
   },{
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(relativeProtectedFolder),
+    // resource: path.toNamespacedPath(relativeProtectedFolder),
   });
 }
 
@@ -504,26 +504,26 @@ const relativeProtectedFolder = process.env.RELATIVEBLOCKEDFOLDER;
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
   assert.throws(() => {
     fs.unlinkSync(bufferBlockedFile);
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
   fs.unlink(blockedFile, common.expectsError({
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   }));
   assert.throws(() => {
     fs.unlinkSync(blockedFileURL);
   }, {
     code: 'ERR_ACCESS_DENIED',
     permission: 'FileSystemWrite',
-    resource: path.toNamespacedPath(blockedFile),
+    // resource: path.toNamespacedPath(blockedFile),
   });
 }
 

From 39d608f9d829fab8078968270f81776d858110c2 Mon Sep 17 00:00:00 2001
From: Joyee Cheung <joyeec9h3@gmail.com>
Date: Thu, 9 Jan 2025 16:31:01 +0100
Subject: [PATCH 214/216] test: mark test-http-server-request-timeouts-mixed as
 flaky

This test has been flaky in the CI for more than 2 years, and various
attempts to fix it have not succeeded. It is still flaky (it failed
19 out of 100 recent testing CI runs), so it is time to mark it as
flaky.

PR-URL: https://github.com/nodejs/node/pull/56503
Refs: https://github.com/nodejs/node/issues/43465
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
---
 test/sequential/sequential.status | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/test/sequential/sequential.status b/test/sequential/sequential.status
index 5f4445416d95fa..dd2275ddc95404 100644
--- a/test/sequential/sequential.status
+++ b/test/sequential/sequential.status
@@ -28,6 +28,9 @@ test-http-server-request-timeouts-mixed: PASS, FLAKY
 # https://github.com/nodejs/node/issues/54816
 test-single-executable-application-empty: PASS, FLAKY
 
+# https://github.com/nodejs/node/issues/43465
+test-http-server-request-timeouts-mixed: PASS, FLAKY
+
 [$system==solaris] # Also applies to SmartOS
 
 [$system==freebsd]

From 1c8b474319123695171becd1c79ab018af47d03c Mon Sep 17 00:00:00 2001
From: Marco Ippolito <marcoippolito54@gmail.com>
Date: Thu, 23 Jan 2025 13:38:17 +0100
Subject: [PATCH 215/216] test: skip test-buffer-tostring-range on smartos

PR-URL: https://github.com/nodejs/node/pull/56727
Refs: https://github.com/nodejs/node/issues/56726
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
---
 test/parallel/parallel.status | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/test/parallel/parallel.status b/test/parallel/parallel.status
index ae1763d1ee4eda..a7860449225092 100644
--- a/test/parallel/parallel.status
+++ b/test/parallel/parallel.status
@@ -66,6 +66,8 @@ test-pipe-file-to-http: PASS, FLAKY
 test-debugger-heap-profiler: PASS, FLAKY
 
 [$system==solaris] # Also applies to SmartOS
+# https://github.com/nodejs/node/issues/56726
+test-buffer-tostring-range: SKIP
 # https://github.com/nodejs/node/issues/43457
 test-domain-no-error-handler-abort-on-uncaught-0: PASS, FLAKY
 test-domain-no-error-handler-abort-on-uncaught-1: PASS,FLAKY

From 4819c99baa28bf2c1baf411ba100c467fec3d486 Mon Sep 17 00:00:00 2001
From: Marco Ippolito <marcoippolito54@gmail.com>
Date: Wed, 22 Jan 2025 14:04:10 +0100
Subject: [PATCH 216/216] 2025-02-10, Version 20.18.3 'Iron' (LTS)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Notable changes:

crypto:
  * update root certificates to NSS 3.104 (Richard Lau) https://github.com/nodejs/node/pull/55681
doc:
  * add LJHarb to collaborators (Jordan Harband) https://github.com/nodejs/node/pull/56132
  * enforce strict policy to semver-major releases (Rafael Gonzaga) https://github.com/nodejs/node/pull/55732
  * add jazelly to collaborators (Jason Zhang) https://github.com/nodejs/node/pull/55531
esm:
  * mark import attributes and JSON module as stable (Nicolò Ribaudo) https://github.com/nodejs/node/pull/55333
tools:
  * fix root certificate updater (Richard Lau) https://github.com/nodejs/node/pull/55681

PR-URL: https://github.com/nodejs/node/pull/56699
---
 CHANGELOG.md                    |   3 +-
 doc/api/esm.md                  |   4 +-
 doc/changelogs/CHANGELOG_V20.md | 232 ++++++++++++++++++++++++++++++++
 src/node_version.h              |   2 +-
 4 files changed, 237 insertions(+), 4 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a60d13281b71e4..c8f7e219141cf6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -35,7 +35,8 @@ release.
 </tr>
 <tr>
   <td valign="top">
-<b><a href="doc/changelogs/CHANGELOG_V20.md#20.18.2">20.18.2</a></b><br/>
+<b><a href="doc/changelogs/CHANGELOG_V20.md#20.18.3">20.18.3</a></b><br/>
+<a href="doc/changelogs/CHANGELOG_V20.md#20.18.2">20.18.2</a><br/>
 <a href="doc/changelogs/CHANGELOG_V20.md#20.18.1">20.18.1</a><br/>
 <a href="doc/changelogs/CHANGELOG_V20.md#20.18.0">20.18.0</a><br/>
 <a href="doc/changelogs/CHANGELOG_V20.md#20.17.0">20.17.0</a><br/>
diff --git a/doc/api/esm.md b/doc/api/esm.md
index 65174473f8e37c..2419c64c2084f0 100644
--- a/doc/api/esm.md
+++ b/doc/api/esm.md
@@ -7,7 +7,7 @@
 <!-- YAML
 added: v8.5.0
 changes:
-  - version: REPLACEME
+  - version: v20.18.3
     pr-url: https://github.com/nodejs/node/pull/55333
     description: Import attributes are no longer experimental.
   - version: v20.10.0
@@ -593,7 +593,7 @@ separate cache.
 
 <!-- YAML
 changes:
-  - version: REPLACEME
+  - version: v20.18.3
     pr-url: https://github.com/nodejs/node/pull/55333
     description: JSON modules are no longer experimental.
 -->
diff --git a/doc/changelogs/CHANGELOG_V20.md b/doc/changelogs/CHANGELOG_V20.md
index 608d77fee98475..fc81e35e8264db 100644
--- a/doc/changelogs/CHANGELOG_V20.md
+++ b/doc/changelogs/CHANGELOG_V20.md
@@ -9,6 +9,7 @@
 </tr>
 <tr>
 <td>
+<a href="#20.18.3">20.18.3</a><br/>
 <a href="#20.18.2">20.18.2</a><br/>
 <a href="#20.18.1">20.18.1</a><br/>
 <a href="#20.18.0">20.18.0</a><br/>
@@ -67,6 +68,237 @@
   * [io.js](CHANGELOG_IOJS.md)
   * [Archive](CHANGELOG_ARCHIVE.md)
 
+<a id="20.18.3"></a>
+
+## 2025-02-10, Version 20.18.3 'Iron' (LTS), @marco-ippolito
+
+### Notable Changes
+
+* \[[`030f155986`](https://github.com/nodejs/node/commit/030f155986)] - **esm**: mark import attributes and JSON module as stable (Nicolò Ribaudo) [#55333](https://github.com/nodejs/node/pull/55333)
+* \[[`b9b006331f`](https://github.com/nodejs/node/commit/b9b006331f)] - **doc**: add LJHarb to collaborators (Jordan Harband) [#56132](https://github.com/nodejs/node/pull/56132)
+* \[[`39b89e90b4`](https://github.com/nodejs/node/commit/39b89e90b4)] - **doc**: enforce strict policy to semver-major releases (Rafael Gonzaga) [#55732](https://github.com/nodejs/node/pull/55732)
+* \[[`247fa1959f`](https://github.com/nodejs/node/commit/247fa1959f)] - **crypto**: update root certificates to NSS 3.104 (Richard Lau) [#55681](https://github.com/nodejs/node/pull/55681)
+* \[[`adfc2f993a`](https://github.com/nodejs/node/commit/adfc2f993a)] - **tools**: fix root certificate updater (Richard Lau) [#55681](https://github.com/nodejs/node/pull/55681)
+* \[[`29862ae105`](https://github.com/nodejs/node/commit/29862ae105)] - **doc**: add jazelly to collaborators (Jason Zhang) [#55531](https://github.com/nodejs/node/pull/55531)
+
+### Commits
+
+* \[[`b4f5da18a5`](https://github.com/nodejs/node/commit/b4f5da18a5)] - **benchmark**: add `test-reporters` (Aviv Keller) [#55757](https://github.com/nodejs/node/pull/55757)
+* \[[`407992e272`](https://github.com/nodejs/node/commit/407992e272)] - **benchmark**: add `test_runner/mock-fn` (Aviv Keller) [#55771](https://github.com/nodejs/node/pull/55771)
+* \[[`17abec4367`](https://github.com/nodejs/node/commit/17abec4367)] - **benchmark**: add nodeTiming.uvmetricsinfo bench (RafaelGSS) [#55614](https://github.com/nodejs/node/pull/55614)
+* \[[`43f7050338`](https://github.com/nodejs/node/commit/43f7050338)] - **benchmark**: add --runs support to run.js (Rafael Gonzaga) [#55158](https://github.com/nodejs/node/pull/55158)
+* \[[`470789a981`](https://github.com/nodejs/node/commit/470789a981)] - **benchmark**: adjust byte size for buffer-copy (Rafael Gonzaga) [#55295](https://github.com/nodejs/node/pull/55295)
+* \[[`ea1c97ac16`](https://github.com/nodejs/node/commit/ea1c97ac16)] - **buffer**: document concat zero-fill (Duncan) [#55562](https://github.com/nodejs/node/pull/55562)
+* \[[`ae683a9e1f`](https://github.com/nodejs/node/commit/ae683a9e1f)] - **build**: set DESTCPU correctly for 'make binary' on loongarch64 (吴小白) [#56271](https://github.com/nodejs/node/pull/56271)
+* \[[`af020edf96`](https://github.com/nodejs/node/commit/af020edf96)] - **build**: fix missing fp16 dependency in d8 builds (Joyee Cheung) [#56266](https://github.com/nodejs/node/pull/56266)
+* \[[`d6a1b74404`](https://github.com/nodejs/node/commit/d6a1b74404)] - **build**: add major release action (Rafael Gonzaga) [#56199](https://github.com/nodejs/node/pull/56199)
+* \[[`bc92a96a5a`](https://github.com/nodejs/node/commit/bc92a96a5a)] - **build**: allow overriding clang usage (Shelley Vohr) [#56016](https://github.com/nodejs/node/pull/56016)
+* \[[`f370ec0989`](https://github.com/nodejs/node/commit/f370ec0989)] - **build**: remove defaults for create-release-proposal (Rafael Gonzaga) [#56042](https://github.com/nodejs/node/pull/56042)
+* \[[`25e1862e87`](https://github.com/nodejs/node/commit/25e1862e87)] - **build**: set node\_arch to target\_cpu in GN (Shelley Vohr) [#55967](https://github.com/nodejs/node/pull/55967)
+* \[[`55c205e5f6`](https://github.com/nodejs/node/commit/55c205e5f6)] - **build**: add create release proposal action (Rafael Gonzaga) [#55690](https://github.com/nodejs/node/pull/55690)
+* \[[`9f14ba808d`](https://github.com/nodejs/node/commit/9f14ba808d)] - **build**: implement node\_use\_amaro flag in GN build (Cheng) [#55798](https://github.com/nodejs/node/pull/55798)
+* \[[`046430c47e`](https://github.com/nodejs/node/commit/046430c47e)] - **build**: fix building with system icu 76 (Michael Cho) [#55563](https://github.com/nodejs/node/pull/55563)
+* \[[`0b6d62c812`](https://github.com/nodejs/node/commit/0b6d62c812)] - **build**: fix GN arg used in generate\_config\_gypi.py (Shelley Vohr) [#55530](https://github.com/nodejs/node/pull/55530)
+* \[[`8f9c642369`](https://github.com/nodejs/node/commit/8f9c642369)] - **build**: fix GN build for cares/uv deps (Cheng) [#55477](https://github.com/nodejs/node/pull/55477)
+* \[[`284e932326`](https://github.com/nodejs/node/commit/284e932326)] - **build**: fix uninstall script for AIX 7.1 (Cloorc) [#55438](https://github.com/nodejs/node/pull/55438)
+* \[[`2f71f168ef`](https://github.com/nodejs/node/commit/2f71f168ef)] - **build**: tidy up cares.gyp (Richard Lau) [#55445](https://github.com/nodejs/node/pull/55445)
+* \[[`e89e807522`](https://github.com/nodejs/node/commit/e89e807522)] - **build**: synchronize list of c-ares source files (Richard Lau) [#55445](https://github.com/nodejs/node/pull/55445)
+* \[[`5eb6c94851`](https://github.com/nodejs/node/commit/5eb6c94851)] - **build**: fix path concatenation (Mohammed Keyvanzadeh) [#55387](https://github.com/nodejs/node/pull/55387)
+* \[[`720d23f3ac`](https://github.com/nodejs/node/commit/720d23f3ac)] - **build**: fix make errors that occur in Makefile (minkyu\_kim) [#55287](https://github.com/nodejs/node/pull/55287)
+* \[[`dc552c6739`](https://github.com/nodejs/node/commit/dc552c6739)] - **build,win**: enable pch for clang-cl (Stefan Stojanovic) [#55249](https://github.com/nodejs/node/pull/55249)
+* \[[`64b140d484`](https://github.com/nodejs/node/commit/64b140d484)] - **cli**: add `--heap-prof` flag available to `NODE_OPTIONS` (Juan José) [#54259](https://github.com/nodejs/node/pull/54259)
+* \[[`23fb644037`](https://github.com/nodejs/node/commit/23fb644037)] - **crypto**: ensure CryptoKey usages and algorithm are cached objects (Filip Skokan) [#56108](https://github.com/nodejs/node/pull/56108)
+* \[[`247fa1959f`](https://github.com/nodejs/node/commit/247fa1959f)] - **crypto**: update root certificates to NSS 3.104 (Richard Lau) [#55681](https://github.com/nodejs/node/pull/55681)
+* \[[`3c4262a171`](https://github.com/nodejs/node/commit/3c4262a171)] - **deps**: V8: cherry-pick 26fd1dfa9cd6 (Shu-yu Guo) [#55961](https://github.com/nodejs/node/pull/55961)
+* \[[`558e6588a4`](https://github.com/nodejs/node/commit/558e6588a4)] - **deps**: V8: backport ae5a4db8ad86 (Shu-yu Guo) [#55961](https://github.com/nodejs/node/pull/55961)
+* \[[`169bc58447`](https://github.com/nodejs/node/commit/169bc58447)] - **deps**: update simdutf to 5.6.4 (Node.js GitHub Bot) [#56255](https://github.com/nodejs/node/pull/56255)
+* \[[`bc7bb1e269`](https://github.com/nodejs/node/commit/bc7bb1e269)] - **deps**: update c-ares to v1.34.4 (Node.js GitHub Bot) [#56256](https://github.com/nodejs/node/pull/56256)
+* \[[`782bb6cac4`](https://github.com/nodejs/node/commit/782bb6cac4)] - **deps**: update zlib to 1.3.0.1-motley-82a5fec (Node.js GitHub Bot) [#55980](https://github.com/nodejs/node/pull/55980)
+* \[[`f7131cf178`](https://github.com/nodejs/node/commit/f7131cf178)] - **deps**: update corepack to 0.30.0 (Node.js GitHub Bot) [#55977](https://github.com/nodejs/node/pull/55977)
+* \[[`b09f6abcd3`](https://github.com/nodejs/node/commit/b09f6abcd3)] - **deps**: update simdutf to 5.6.3 (Node.js GitHub Bot) [#55973](https://github.com/nodejs/node/pull/55973)
+* \[[`d63ccb60ea`](https://github.com/nodejs/node/commit/d63ccb60ea)] - **deps**: update zlib to 1.3.0.1-motley-7e2e4d7 (Node.js GitHub Bot) [#54432](https://github.com/nodejs/node/pull/54432)
+* \[[`a2f315ef8b`](https://github.com/nodejs/node/commit/a2f315ef8b)] - **deps**: update simdutf to 5.6.2 (Node.js GitHub Bot) [#55889](https://github.com/nodejs/node/pull/55889)
+* \[[`afed723b6c`](https://github.com/nodejs/node/commit/afed723b6c)] - **deps**: update simdutf to 5.6.1 (Node.js GitHub Bot) [#55850](https://github.com/nodejs/node/pull/55850)
+* \[[`753c3b322f`](https://github.com/nodejs/node/commit/753c3b322f)] - **deps**: update c-ares to v1.34.3 (Node.js GitHub Bot) [#55803](https://github.com/nodejs/node/pull/55803)
+* \[[`4f89af8a6f`](https://github.com/nodejs/node/commit/4f89af8a6f)] - **deps**: update acorn to 8.14.0 (Node.js GitHub Bot) [#55699](https://github.com/nodejs/node/pull/55699)
+* \[[`07359ec14f`](https://github.com/nodejs/node/commit/07359ec14f)] - **deps**: update acorn to 8.13.0 (Node.js GitHub Bot) [#55558](https://github.com/nodejs/node/pull/55558)
+* \[[`c6236571fc`](https://github.com/nodejs/node/commit/c6236571fc)] - **deps**: update googletest to df1544b (Node.js GitHub Bot) [#55465](https://github.com/nodejs/node/pull/55465)
+* \[[`f63413c6f3`](https://github.com/nodejs/node/commit/f63413c6f3)] - **deps**: update c-ares to v1.34.2 (Node.js GitHub Bot) [#55463](https://github.com/nodejs/node/pull/55463)
+* \[[`ad725c766d`](https://github.com/nodejs/node/commit/ad725c766d)] - **deps**: update ada to 2.9.1 (Node.js GitHub Bot) [#54679](https://github.com/nodejs/node/pull/54679)
+* \[[`33367cbd62`](https://github.com/nodejs/node/commit/33367cbd62)] - **deps**: update simdutf to 5.6.0 (Node.js GitHub Bot) [#55379](https://github.com/nodejs/node/pull/55379)
+* \[[`f2a55d9d2d`](https://github.com/nodejs/node/commit/f2a55d9d2d)] - **deps**: update c-ares to v1.34.1 (Node.js GitHub Bot) [#55369](https://github.com/nodejs/node/pull/55369)
+* \[[`1d14886266`](https://github.com/nodejs/node/commit/1d14886266)] - **dgram**: check udp buffer size to avoid fd leak (theanarkh) [#56084](https://github.com/nodejs/node/pull/56084)
+* \[[`de265b9558`](https://github.com/nodejs/node/commit/de265b9558)] - **diagnostics\_channel**: fix unsubscribe during publish (simon-id) [#55116](https://github.com/nodejs/node/pull/55116)
+* \[[`22e0d17097`](https://github.com/nodejs/node/commit/22e0d17097)] - **dns**: stop using deprecated `ares_query` (Aviv Keller) [#55430](https://github.com/nodejs/node/pull/55430)
+* \[[`44f3b23749`](https://github.com/nodejs/node/commit/44f3b23749)] - **dns**: honor the order option (Luigi Pinca) [#55392](https://github.com/nodejs/node/pull/55392)
+* \[[`f78508cd30`](https://github.com/nodejs/node/commit/f78508cd30)] - **doc**: add history info for Permission Model (Antoine du Hamel) [#56707](https://github.com/nodejs/node/pull/56707)
+* \[[`f07be5e3cd`](https://github.com/nodejs/node/commit/f07be5e3cd)] - **doc**: add note for features using `InternalWorker` with permission model (Antoine du Hamel) [#56706](https://github.com/nodejs/node/pull/56706)
+* \[[`618e005672`](https://github.com/nodejs/node/commit/618e005672)] - **doc**: add history entries for JSON modules stabilization (Antoine du Hamel) [#55855](https://github.com/nodejs/node/pull/55855)
+* \[[`f89f4ff856`](https://github.com/nodejs/node/commit/f89f4ff856)] - **doc**: fix color contrast issue in light mode (Rich Trott) [#56272](https://github.com/nodejs/node/pull/56272)
+* \[[`a51ef9d829`](https://github.com/nodejs/node/commit/a51ef9d829)] - **doc**: clarify util.aborted resource usage (Kunal Kumar) [#55780](https://github.com/nodejs/node/pull/55780)
+* \[[`2d88c4b425`](https://github.com/nodejs/node/commit/2d88c4b425)] - **doc**: add esm examples to node:repl (Alfredo González) [#55432](https://github.com/nodejs/node/pull/55432)
+* \[[`722dada673`](https://github.com/nodejs/node/commit/722dada673)] - **doc**: add esm examples to node:readline (Alfredo González) [#55335](https://github.com/nodejs/node/pull/55335)
+* \[[`090c7a3b01`](https://github.com/nodejs/node/commit/090c7a3b01)] - **doc**: fix 'which' to 'that' and add commas (Selveter Senitro) [#56216](https://github.com/nodejs/node/pull/56216)
+* \[[`ae3f6fbe59`](https://github.com/nodejs/node/commit/ae3f6fbe59)] - **doc**: `sea.getRawAsset(key)` always returns an ArrayBuffer (沈鸿飞) [#56206](https://github.com/nodejs/node/pull/56206)
+* \[[`d103917d92`](https://github.com/nodejs/node/commit/d103917d92)] - **doc**: update announce documentation for releases (Rafael Gonzaga) [#56200](https://github.com/nodejs/node/pull/56200)
+* \[[`80e5bb87c4`](https://github.com/nodejs/node/commit/80e5bb87c4)] - **doc**: update blog link to /vulnerability (Rafael Gonzaga) [#56198](https://github.com/nodejs/node/pull/56198)
+* \[[`b739c2a926`](https://github.com/nodejs/node/commit/b739c2a926)] - **doc**: call out import.meta is only supported in ES modules (Anton Kastritskii) [#56186](https://github.com/nodejs/node/pull/56186)
+* \[[`bbd0222a10`](https://github.com/nodejs/node/commit/bbd0222a10)] - **doc**: add ambassador message - benefits of Node.js (Michael Dawson) [#56085](https://github.com/nodejs/node/pull/56085)
+* \[[`0e9abf2754`](https://github.com/nodejs/node/commit/0e9abf2754)] - **doc**: fix incorrect link to style guide (Yuan-Ming Hsu) [#56181](https://github.com/nodejs/node/pull/56181)
+* \[[`1dbc7e87d7`](https://github.com/nodejs/node/commit/1dbc7e87d7)] - **doc**: fix c++ addon hello world sample (Edigleysson Silva (Edy)) [#56172](https://github.com/nodejs/node/pull/56172)
+* \[[`026f0198c8`](https://github.com/nodejs/node/commit/026f0198c8)] - **doc**: update blog release-post link (Ruy Adorno) [#56123](https://github.com/nodejs/node/pull/56123)
+* \[[`c2fa359f7a`](https://github.com/nodejs/node/commit/c2fa359f7a)] - **doc**: mention `-a` flag for the release script (Ruy Adorno) [#56124](https://github.com/nodejs/node/pull/56124)
+* \[[`b9b006331f`](https://github.com/nodejs/node/commit/b9b006331f)] - **doc**: add LJHarb to collaborators (Jordan Harband) [#56132](https://github.com/nodejs/node/pull/56132)
+* \[[`7a1365ba62`](https://github.com/nodejs/node/commit/7a1365ba62)] - **doc**: add create-release-action to process (Rafael Gonzaga) [#55993](https://github.com/nodejs/node/pull/55993)
+* \[[`51262ec84e`](https://github.com/nodejs/node/commit/51262ec84e)] - **doc**: rename file to advocacy-ambassador-program.md (Tobias Nießen) [#56046](https://github.com/nodejs/node/pull/56046)
+* \[[`6fc7328831`](https://github.com/nodejs/node/commit/6fc7328831)] - **doc**: remove unused import from sample code (Blended Bram) [#55570](https://github.com/nodejs/node/pull/55570)
+* \[[`9f3ef4a434`](https://github.com/nodejs/node/commit/9f3ef4a434)] - **doc**: add FAQ to releases section (Rafael Gonzaga) [#55992](https://github.com/nodejs/node/pull/55992)
+* \[[`1dcf8dfedb`](https://github.com/nodejs/node/commit/1dcf8dfedb)] - **doc**: move history entry to class description (Luigi Pinca) [#55991](https://github.com/nodejs/node/pull/55991)
+* \[[`e016f68c73`](https://github.com/nodejs/node/commit/e016f68c73)] - **doc**: add history entry for textEncoder.encodeInto() (Luigi Pinca) [#55990](https://github.com/nodejs/node/pull/55990)
+* \[[`1b31638262`](https://github.com/nodejs/node/commit/1b31638262)] - **doc**: improve GN build documentation a bit (Shelley Vohr) [#55968](https://github.com/nodejs/node/pull/55968)
+* \[[`d25bcfd0b2`](https://github.com/nodejs/node/commit/d25bcfd0b2)] - **doc**: remove confusing and outdated sentence (Luigi Pinca) [#55988](https://github.com/nodejs/node/pull/55988)
+* \[[`65c1784337`](https://github.com/nodejs/node/commit/65c1784337)] - **doc**: add doc for PerformanceObserver.takeRecords() (skyclouds2001) [#55786](https://github.com/nodejs/node/pull/55786)
+* \[[`682ae41f86`](https://github.com/nodejs/node/commit/682ae41f86)] - **doc**: add vetted courses to the ambassador benefits (Matteo Collina) [#55934](https://github.com/nodejs/node/pull/55934)
+* \[[`9b6cc54b50`](https://github.com/nodejs/node/commit/9b6cc54b50)] - **doc**: doc how to add message for promotion (Michael Dawson) [#55843](https://github.com/nodejs/node/pull/55843)
+* \[[`db5378c8b9`](https://github.com/nodejs/node/commit/db5378c8b9)] - **doc**: add esm example for zlib (Leonardo Peixoto) [#55946](https://github.com/nodejs/node/pull/55946)
+* \[[`58a6fbb9cf`](https://github.com/nodejs/node/commit/58a6fbb9cf)] - **doc**: document approach for building wasm in deps (Michael Dawson) [#55940](https://github.com/nodejs/node/pull/55940)
+* \[[`41e3bcd752`](https://github.com/nodejs/node/commit/41e3bcd752)] - **doc**: add esm examples to node:timers (Alfredo González) [#55857](https://github.com/nodejs/node/pull/55857)
+* \[[`61de8f9b04`](https://github.com/nodejs/node/commit/61de8f9b04)] - **doc**: include git node release --promote to steps (Rafael Gonzaga) [#55835](https://github.com/nodejs/node/pull/55835)
+* \[[`559a0bfa2e`](https://github.com/nodejs/node/commit/559a0bfa2e)] - **doc**: add a note on console stream behavior (Gireesh Punathil) [#55616](https://github.com/nodejs/node/pull/55616)
+* \[[`3d11a85fe5`](https://github.com/nodejs/node/commit/3d11a85fe5)] - **doc**: add `-S` flag release preparation example (Antoine du Hamel) [#55836](https://github.com/nodejs/node/pull/55836)
+* \[[`955690e6cf`](https://github.com/nodejs/node/commit/955690e6cf)] - **doc**: clarify UV\_THREADPOOL\_SIZE env var usage (Preveen P) [#55832](https://github.com/nodejs/node/pull/55832)
+* \[[`d6738e919a`](https://github.com/nodejs/node/commit/d6738e919a)] - **doc**: add notable-change mention to sec release (Rafael Gonzaga) [#55830](https://github.com/nodejs/node/pull/55830)
+* \[[`79876f0dfd`](https://github.com/nodejs/node/commit/79876f0dfd)] - **doc**: fix history info for `URL.prototype.toJSON` (Antoine du Hamel) [#55818](https://github.com/nodejs/node/pull/55818)
+* \[[`c14776fbaa`](https://github.com/nodejs/node/commit/c14776fbaa)] - **doc**: correct max-semi-space-size statement (Joe Bowbeer) [#55812](https://github.com/nodejs/node/pull/55812)
+* \[[`83b415e8f3`](https://github.com/nodejs/node/commit/83b415e8f3)] - **doc**: run license-builder (github-actions\[bot]) [#55813](https://github.com/nodejs/node/pull/55813)
+* \[[`07f53b1d75`](https://github.com/nodejs/node/commit/07f53b1d75)] - **doc**: clarify triager role (Gireesh Punathil) [#55775](https://github.com/nodejs/node/pull/55775)
+* \[[`2abfdefcf3`](https://github.com/nodejs/node/commit/2abfdefcf3)] - **doc**: clarify removal of experimental API does not require a deprecation (Antoine du Hamel) [#55746](https://github.com/nodejs/node/pull/55746)
+* \[[`39b89e90b4`](https://github.com/nodejs/node/commit/39b89e90b4)] - **doc**: enforce strict policy to semver-major releases (Rafael Gonzaga) [#55732](https://github.com/nodejs/node/pull/55732)
+* \[[`d0417eaec9`](https://github.com/nodejs/node/commit/d0417eaec9)] - **doc**: add esm example in `path.md` (Aviv Keller) [#55745](https://github.com/nodejs/node/pull/55745)
+* \[[`032ff07a2d`](https://github.com/nodejs/node/commit/032ff07a2d)] - **doc**: consistent use of word child process (Gireesh Punathil) [#55654](https://github.com/nodejs/node/pull/55654)
+* \[[`16eef6461e`](https://github.com/nodejs/node/commit/16eef6461e)] - **doc**: clarity to available addon options (Preveen P) [#55715](https://github.com/nodejs/node/pull/55715)
+* \[[`a7ce82e3cc`](https://github.com/nodejs/node/commit/a7ce82e3cc)] - **doc**: update `--max-semi-space-size` description (Joe Bowbeer) [#55495](https://github.com/nodejs/node/pull/55495)
+* \[[`1bb461e2b6`](https://github.com/nodejs/node/commit/1bb461e2b6)] - **doc**: add write flag when open file as the demo code's intention (robberfree) [#54626](https://github.com/nodejs/node/pull/54626)
+* \[[`8cd619f8d7`](https://github.com/nodejs/node/commit/8cd619f8d7)] - **doc**: remove mention of ECDH-ES in crypto.diffieHellman (Filip Skokan) [#55611](https://github.com/nodejs/node/pull/55611)
+* \[[`4576d14d0f`](https://github.com/nodejs/node/commit/4576d14d0f)] - **doc**: improve c++ embedder API doc (Gireesh Punathil) [#55597](https://github.com/nodejs/node/pull/55597)
+* \[[`12bd57fbaa`](https://github.com/nodejs/node/commit/12bd57fbaa)] - **doc**: capitalize "MIT License" (Aviv Keller) [#55575](https://github.com/nodejs/node/pull/55575)
+* \[[`362b01b275`](https://github.com/nodejs/node/commit/362b01b275)] - **doc**: add esm examples to node:string\_decoder (Alfredo González) [#55507](https://github.com/nodejs/node/pull/55507)
+* \[[`29862ae105`](https://github.com/nodejs/node/commit/29862ae105)] - **doc**: add jazelly to collaborators (Jason Zhang) [#55531](https://github.com/nodejs/node/pull/55531)
+* \[[`c1b63e5e6b`](https://github.com/nodejs/node/commit/c1b63e5e6b)] - **doc**: changed the command used to verify SHASUMS256 (adriancuadrado) [#55420](https://github.com/nodejs/node/pull/55420)
+* \[[`9db657532b`](https://github.com/nodejs/node/commit/9db657532b)] - **doc**: add note about stdio streams in child\_process (Ederin (Ed) Igharoro) [#55322](https://github.com/nodejs/node/pull/55322)
+* \[[`475e478713`](https://github.com/nodejs/node/commit/475e478713)] - **doc**: add `isBigIntObject` to documentation (leviscar) [#55450](https://github.com/nodejs/node/pull/55450)
+* \[[`0487e70475`](https://github.com/nodejs/node/commit/0487e70475)] - **doc**: remove outdated remarks about `highWaterMark` in fs (Ian Kerins) [#55462](https://github.com/nodejs/node/pull/55462)
+* \[[`e9a8feb44a`](https://github.com/nodejs/node/commit/e9a8feb44a)] - **doc**: move Danielle Adams key to old gpg keys (RafaelGSS) [#55399](https://github.com/nodejs/node/pull/55399)
+* \[[`bfbe651626`](https://github.com/nodejs/node/commit/bfbe651626)] - **doc**: move Bryan English key to old gpg keys (RafaelGSS) [#55399](https://github.com/nodejs/node/pull/55399)
+* \[[`c1cab9b4d7`](https://github.com/nodejs/node/commit/c1cab9b4d7)] - **doc**: move Beth Griggs keys to old gpg keys (RafaelGSS) [#55399](https://github.com/nodejs/node/pull/55399)
+* \[[`85d8eb397c`](https://github.com/nodejs/node/commit/85d8eb397c)] - **doc**: spell out condition restrictions (Jan Martin) [#55187](https://github.com/nodejs/node/pull/55187)
+* \[[`de8de542b5`](https://github.com/nodejs/node/commit/de8de542b5)] - **doc**: add missing return values in buffer docs (Karl Horky) [#55273](https://github.com/nodejs/node/pull/55273)
+* \[[`a5df7087fd`](https://github.com/nodejs/node/commit/a5df7087fd)] - **doc**: fix ambasador markdown list (Rafael Gonzaga) [#55361](https://github.com/nodejs/node/pull/55361)
+* \[[`fbfcb0cc08`](https://github.com/nodejs/node/commit/fbfcb0cc08)] - **doc**: edit onboarding guide to clarify when mailmap addition is needed (Antoine du Hamel) [#55334](https://github.com/nodejs/node/pull/55334)
+* \[[`e70abce96a`](https://github.com/nodejs/node/commit/e70abce96a)] - **doc**: fix the return type of outgoingMessage.setHeaders() (Jimmy Leung) [#55290](https://github.com/nodejs/node/pull/55290)
+* \[[`030f155986`](https://github.com/nodejs/node/commit/030f155986)] - **esm**: mark import attributes and JSON module as stable (Nicolò Ribaudo) [#55333](https://github.com/nodejs/node/pull/55333)
+* \[[`86cb697b81`](https://github.com/nodejs/node/commit/86cb697b81)] - **esm**: add a fallback when importer in not a file (Antoine du Hamel) [#55471](https://github.com/nodejs/node/pull/55471)
+* \[[`8c8de30680`](https://github.com/nodejs/node/commit/8c8de30680)] - **esm**: fix inconsistency with `importAssertion` in `resolve` hook (Wei Zhu) [#55365](https://github.com/nodejs/node/pull/55365)
+* \[[`a41b0e1247`](https://github.com/nodejs/node/commit/a41b0e1247)] - **events**: optimize EventTarget.addEventListener (Robert Nagy) [#55312](https://github.com/nodejs/node/pull/55312)
+* \[[`2c6dcf7209`](https://github.com/nodejs/node/commit/2c6dcf7209)] - **fs**: make mutating `options` in Promises `readdir()` not affect results (LiviaMedeiros) [#56057](https://github.com/nodejs/node/pull/56057)
+* \[[`9317feb829`](https://github.com/nodejs/node/commit/9317feb829)] - **fs**: lazily load ReadFileContext (Gürgün Dayıoğlu) [#55998](https://github.com/nodejs/node/pull/55998)
+* \[[`739ee18430`](https://github.com/nodejs/node/commit/739ee18430)] - **http2**: support ALPNCallback option (ZYSzys) [#56187](https://github.com/nodejs/node/pull/56187)
+* \[[`7ba6dcf180`](https://github.com/nodejs/node/commit/7ba6dcf180)] - **http2**: fix memory leak caused by premature listener removing (ywave620) [#55966](https://github.com/nodejs/node/pull/55966)
+* \[[`4c15bd44a0`](https://github.com/nodejs/node/commit/4c15bd44a0)] - **http2**: fix client async storage persistence (Orgad Shaneh) [#55460](https://github.com/nodejs/node/pull/55460)
+* \[[`ac57dadd9a`](https://github.com/nodejs/node/commit/ac57dadd9a)] - **lib**: add validation for options in compileFunction (Taejin Kim) [#56023](https://github.com/nodejs/node/pull/56023)
+* \[[`a5b0d8900a`](https://github.com/nodejs/node/commit/a5b0d8900a)] - **lib**: remove startsWith/endsWith primordials for char checks (Gürgün Dayıoğlu) [#55407](https://github.com/nodejs/node/pull/55407)
+* \[[`f10857828f`](https://github.com/nodejs/node/commit/f10857828f)] - **lib**: test\_runner#mock:timers respeced timeout\_max behaviour (BadKey) [#55375](https://github.com/nodejs/node/pull/55375)
+* \[[`1a193bf256`](https://github.com/nodejs/node/commit/1a193bf256)] - **meta**: bump github/codeql-action from 3.27.0 to 3.27.5 (dependabot\[bot]) [#56103](https://github.com/nodejs/node/pull/56103)
+* \[[`23f319803d`](https://github.com/nodejs/node/commit/23f319803d)] - **meta**: bump actions/checkout from 4.1.7 to 4.2.2 (dependabot\[bot]) [#56102](https://github.com/nodejs/node/pull/56102)
+* \[[`a953301a1c`](https://github.com/nodejs/node/commit/a953301a1c)] - **meta**: bump step-security/harden-runner from 2.10.1 to 2.10.2 (dependabot\[bot]) [#56101](https://github.com/nodejs/node/pull/56101)
+* \[[`c58065ae77`](https://github.com/nodejs/node/commit/c58065ae77)] - **meta**: bump actions/setup-node from 4.0.3 to 4.1.0 (dependabot\[bot]) [#56100](https://github.com/nodejs/node/pull/56100)
+* \[[`12b0cecc20`](https://github.com/nodejs/node/commit/12b0cecc20)] - **meta**: add releasers as CODEOWNERS to proposal action (Rafael Gonzaga) [#56043](https://github.com/nodejs/node/pull/56043)
+* \[[`070aa9d6a5`](https://github.com/nodejs/node/commit/070aa9d6a5)] - **meta**: bump actions/setup-python from 5.2.0 to 5.3.0 (dependabot\[bot]) [#55688](https://github.com/nodejs/node/pull/55688)
+* \[[`7a46ffd18a`](https://github.com/nodejs/node/commit/7a46ffd18a)] - **meta**: bump actions/setup-node from 4.0.4 to 4.1.0 (dependabot\[bot]) [#55687](https://github.com/nodejs/node/pull/55687)
+* \[[`8b4f2e0c6a`](https://github.com/nodejs/node/commit/8b4f2e0c6a)] - **meta**: bump rtCamp/action-slack-notify from 2.3.0 to 2.3.2 (dependabot\[bot]) [#55686](https://github.com/nodejs/node/pull/55686)
+* \[[`024c5b2ab3`](https://github.com/nodejs/node/commit/024c5b2ab3)] - **meta**: bump actions/upload-artifact from 4.4.0 to 4.4.3 (dependabot\[bot]) [#55685](https://github.com/nodejs/node/pull/55685)
+* \[[`3d06971a15`](https://github.com/nodejs/node/commit/3d06971a15)] - **meta**: bump actions/cache from 4.0.2 to 4.1.2 (dependabot\[bot]) [#55684](https://github.com/nodejs/node/pull/55684)
+* \[[`c33de63a86`](https://github.com/nodejs/node/commit/c33de63a86)] - **meta**: bump actions/checkout from 4.2.0 to 4.2.2 (dependabot\[bot]) [#55683](https://github.com/nodejs/node/pull/55683)
+* \[[`ccc1ea0576`](https://github.com/nodejs/node/commit/ccc1ea0576)] - **meta**: bump github/codeql-action from 3.26.10 to 3.27.0 (dependabot\[bot]) [#55682](https://github.com/nodejs/node/pull/55682)
+* \[[`9c2d0fd242`](https://github.com/nodejs/node/commit/9c2d0fd242)] - **meta**: make review-wanted message minimal (Aviv Keller) [#55607](https://github.com/nodejs/node/pull/55607)
+* \[[`0c14cae2b2`](https://github.com/nodejs/node/commit/0c14cae2b2)] - **meta**: show PR/issue title on review-wanted (Aviv Keller) [#55606](https://github.com/nodejs/node/pull/55606)
+* \[[`aeae7e1e6f`](https://github.com/nodejs/node/commit/aeae7e1e6f)] - **meta**: move one or more collaborators to emeritus (Node.js GitHub Bot) [#55381](https://github.com/nodejs/node/pull/55381)
+* \[[`6d7b78c3d8`](https://github.com/nodejs/node/commit/6d7b78c3d8)] - **meta**: change color to blue notify review-wanted (Rafael Gonzaga) [#55423](https://github.com/nodejs/node/pull/55423)
+* \[[`7441e289db`](https://github.com/nodejs/node/commit/7441e289db)] - **meta**: bump codecov/codecov-action from 4.5.0 to 4.6.0 (dependabot\[bot]) [#55222](https://github.com/nodejs/node/pull/55222)
+* \[[`158c8ad77c`](https://github.com/nodejs/node/commit/158c8ad77c)] - **meta**: bump github/codeql-action from 3.26.6 to 3.26.10 (dependabot\[bot]) [#55221](https://github.com/nodejs/node/pull/55221)
+* \[[`8d3d4a9fab`](https://github.com/nodejs/node/commit/8d3d4a9fab)] - **meta**: bump step-security/harden-runner from 2.9.1 to 2.10.1 (dependabot\[bot]) [#55220](https://github.com/nodejs/node/pull/55220)
+* \[[`6797a35a5b`](https://github.com/nodejs/node/commit/6797a35a5b)] - **module**: prevent main thread exiting before esm worker ends (Shima Ryuhei) [#56183](https://github.com/nodejs/node/pull/56183)
+* \[[`bd99bf109f`](https://github.com/nodejs/node/commit/bd99bf109f)] - **node-api**: allow napi\_delete\_reference in finalizers (Chengzhong Wu) [#55620](https://github.com/nodejs/node/pull/55620)
+* \[[`6308c18dbb`](https://github.com/nodejs/node/commit/6308c18dbb)] - **report**: fix network queries in getReport libuv with exclude-network (Adrien Foulon) [#55602](https://github.com/nodejs/node/pull/55602)
+* \[[`ff2eec7275`](https://github.com/nodejs/node/commit/ff2eec7275)] - **sea**: only assert snapshot main function for main threads (Joyee Cheung) [#56120](https://github.com/nodejs/node/pull/56120)
+* \[[`f9f3003de7`](https://github.com/nodejs/node/commit/f9f3003de7)] - **src**: fix outdated js2c.cc references (Chengzhong Wu) [#56133](https://github.com/nodejs/node/pull/56133)
+* \[[`a882536596`](https://github.com/nodejs/node/commit/a882536596)] - **src**: fix kill signal on Windows (Hüseyin Açacak) [#55514](https://github.com/nodejs/node/pull/55514)
+* \[[`df1002438a`](https://github.com/nodejs/node/commit/df1002438a)] - **src**: improve `node:os` userInfo performance (Yagiz Nizipli) [#55719](https://github.com/nodejs/node/pull/55719)
+* \[[`f17416ec3e`](https://github.com/nodejs/node/commit/f17416ec3e)] - **src**: fix dns crash when failed to create NodeAresTask (theanarkh) [#55521](https://github.com/nodejs/node/pull/55521)
+* \[[`8d5b8c31d8`](https://github.com/nodejs/node/commit/8d5b8c31d8)] - **src**: use NewFromUtf8Literal in NODE\_DEFINE\_CONSTANT (Charles Kerr) [#55581](https://github.com/nodejs/node/pull/55581)
+* \[[`0977bb6c1d`](https://github.com/nodejs/node/commit/0977bb6c1d)] - **src**: remove icu based `ToASCII` and `ToUnicode` (Yagiz Nizipli) [#55156](https://github.com/nodejs/node/pull/55156)
+* \[[`72817072e2`](https://github.com/nodejs/node/commit/72817072e2)] - **src**: fix winapi\_strerror error string (Hüseyin Açacak) [#55207](https://github.com/nodejs/node/pull/55207)
+* \[[`6f47f53f90`](https://github.com/nodejs/node/commit/6f47f53f90)] - **src,lib**: optimize nodeTiming.uvMetricsInfo (RafaelGSS) [#55614](https://github.com/nodejs/node/pull/55614)
+* \[[`ac583d4549`](https://github.com/nodejs/node/commit/ac583d4549)] - **stream**: propagate AbortSignal reason (Marvin ROGER) [#55473](https://github.com/nodejs/node/pull/55473)
+* \[[`1c8b474319`](https://github.com/nodejs/node/commit/1c8b474319)] - **test**: skip test-buffer-tostring-range on smartos (Marco Ippolito) [#56727](https://github.com/nodejs/node/pull/56727)
+* \[[`39d608f9d8`](https://github.com/nodejs/node/commit/39d608f9d8)] - **test**: mark test-http-server-request-timeouts-mixed as flaky (Joyee Cheung) [#56503](https://github.com/nodejs/node/pull/56503)
+* \[[`5c3f18be04`](https://github.com/nodejs/node/commit/5c3f18be04)] - **test**: temporarily remove resource check from fs read-write (Rafael Gonzaga) [#56789](https://github.com/nodejs/node/pull/56789)
+* \[[`4196aaf033`](https://github.com/nodejs/node/commit/4196aaf033)] - **test**: remove excludes for sea tests on PPC (Michael Dawson) [#56217](https://github.com/nodejs/node/pull/56217)
+* \[[`3ea738fc26`](https://github.com/nodejs/node/commit/3ea738fc26)] - **test**: remove `hasOpenSSL3x` utils (Antoine du Hamel) [#56164](https://github.com/nodejs/node/pull/56164)
+* \[[`21e21a270e`](https://github.com/nodejs/node/commit/21e21a270e)] - **test**: remove test-fs-utimes flaky designation (Luigi Pinca) [#56052](https://github.com/nodejs/node/pull/56052)
+* \[[`e464c6f7a5`](https://github.com/nodejs/node/commit/e464c6f7a5)] - **test**: move test-worker-arraybuffer-zerofill to parallel (Luigi Pinca) [#56053](https://github.com/nodejs/node/pull/56053)
+* \[[`e99584cd57`](https://github.com/nodejs/node/commit/e99584cd57)] - **test**: make HTTP/1.0 connection test more robust (Arne Keller) [#55959](https://github.com/nodejs/node/pull/55959)
+* \[[`2d03f87ef7`](https://github.com/nodejs/node/commit/2d03f87ef7)] - **test**: convert readdir test to use test runner (Thomas Chetwin) [#55750](https://github.com/nodejs/node/pull/55750)
+* \[[`207562fa3d`](https://github.com/nodejs/node/commit/207562fa3d)] - **test**: make x509 crypto tests work with BoringSSL (Shelley Vohr) [#55927](https://github.com/nodejs/node/pull/55927)
+* \[[`a17d9e1acf`](https://github.com/nodejs/node/commit/a17d9e1acf)] - **test**: fix determining lower priority (Livia Medeiros) [#55908](https://github.com/nodejs/node/pull/55908)
+* \[[`50b6729d8c`](https://github.com/nodejs/node/commit/50b6729d8c)] - **test**: increase coverage of `pathToFileURL` (Antoine du Hamel) [#55493](https://github.com/nodejs/node/pull/55493)
+* \[[`0aa9e74027`](https://github.com/nodejs/node/commit/0aa9e74027)] - **test**: improve test coverage for child process message sending (Juan José) [#55710](https://github.com/nodejs/node/pull/55710)
+* \[[`ebdbbc3ec8`](https://github.com/nodejs/node/commit/ebdbbc3ec8)] - **test**: ensure that test priority is not higher than current priority (Livia Medeiros) [#55739](https://github.com/nodejs/node/pull/55739)
+* \[[`b40789e085`](https://github.com/nodejs/node/commit/b40789e085)] - **test**: add buffer to fs\_permission tests (Rafael Gonzaga) [#55734](https://github.com/nodejs/node/pull/55734)
+* \[[`a9998799be`](https://github.com/nodejs/node/commit/a9998799be)] - **test**: improve test coverage for `ServerResponse` (Juan José) [#55711](https://github.com/nodejs/node/pull/55711)
+* \[[`d2421f3c92`](https://github.com/nodejs/node/commit/d2421f3c92)] - **test**: ignore unrelated events in FW watch tests (Carlos Espa) [#55605](https://github.com/nodejs/node/pull/55605)
+* \[[`0ac0afc4a9`](https://github.com/nodejs/node/commit/0ac0afc4a9)] - **test**: refactor some esm tests (Antoine du Hamel) [#55472](https://github.com/nodejs/node/pull/55472)
+* \[[`0f8b8269d1`](https://github.com/nodejs/node/commit/0f8b8269d1)] - **test**: split up test-runner-mock-timers test (Julian Gassner) [#55506](https://github.com/nodejs/node/pull/55506)
+* \[[`8f6462f40b`](https://github.com/nodejs/node/commit/8f6462f40b)] - **test**: avoid `apply()` calls with large amount of elements (Livia Medeiros) [#55501](https://github.com/nodejs/node/pull/55501)
+* \[[`e9b0ff482b`](https://github.com/nodejs/node/commit/e9b0ff482b)] - **test**: increase test coverage for `http.OutgoingMessage.appendHeader()` (Juan José) [#55467](https://github.com/nodejs/node/pull/55467)
+* \[[`d5ad060073`](https://github.com/nodejs/node/commit/d5ad060073)] - **test**: fix addons and node-api test assumptions (Antoine du Hamel) [#55441](https://github.com/nodejs/node/pull/55441)
+* \[[`a28376bb85`](https://github.com/nodejs/node/commit/a28376bb85)] - **test**: deflake `test-cluster-shared-handle-bind-privileged-port` (Aviv Keller) [#55378](https://github.com/nodejs/node/pull/55378)
+* \[[`22c07867d1`](https://github.com/nodejs/node/commit/22c07867d1)] - **test**: remove duplicate tests (Luigi Pinca) [#55393](https://github.com/nodejs/node/pull/55393)
+* \[[`5489656b35`](https://github.com/nodejs/node/commit/5489656b35)] - **test**: update test\_util.cc for coverage (minkyu\_kim) [#55291](https://github.com/nodejs/node/pull/55291)
+* \[[`ceafb3250d`](https://github.com/nodejs/node/commit/ceafb3250d)] - **test,crypto**: make crypto tests work with BoringSSL (Shelley Vohr) [#55491](https://github.com/nodejs/node/pull/55491)
+* \[[`7021b3b276`](https://github.com/nodejs/node/commit/7021b3b276)] - **test\_runner**: simplify hook running logic (Colin Ihrig) [#55963](https://github.com/nodejs/node/pull/55963)
+* \[[`d9fd632f56`](https://github.com/nodejs/node/commit/d9fd632f56)] - **test\_runner**: error on mocking an already mocked date (Aviv Keller) [#55858](https://github.com/nodejs/node/pull/55858)
+* \[[`3fcca16374`](https://github.com/nodejs/node/commit/3fcca16374)] - **test\_runner**: add support for scheduler.wait on mock timers (Erick Wendel) [#55244](https://github.com/nodejs/node/pull/55244)
+* \[[`f67147ec47`](https://github.com/nodejs/node/commit/f67147ec47)] - **tools**: update github\_reporter to 1.7.2 (Node.js GitHub Bot) [#56205](https://github.com/nodejs/node/pull/56205)
+* \[[`5c819f1043`](https://github.com/nodejs/node/commit/5c819f1043)] - **tools**: add REPLACEME check to workflow (Mert Can Altin) [#56251](https://github.com/nodejs/node/pull/56251)
+* \[[`b24a85b00b`](https://github.com/nodejs/node/commit/b24a85b00b)] - **tools**: use `github.actor` instead of bot username for release proposals (Antoine du Hamel) [#56232](https://github.com/nodejs/node/pull/56232)
+* \[[`33cd7d3d8c`](https://github.com/nodejs/node/commit/33cd7d3d8c)] - **tools**: fix release proposal linter to support more than 1 folk preparing (Antoine du Hamel) [#56203](https://github.com/nodejs/node/pull/56203)
+* \[[`10d55e3d73`](https://github.com/nodejs/node/commit/10d55e3d73)] - **tools**: use commit title as PR title when creating release proposal (Antoine du Hamel) [#56165](https://github.com/nodejs/node/pull/56165)
+* \[[`b3d40e3be5`](https://github.com/nodejs/node/commit/b3d40e3be5)] - **tools**: improve release proposal PR opening (Antoine du Hamel) [#56161](https://github.com/nodejs/node/pull/56161)
+* \[[`13455ca9ce`](https://github.com/nodejs/node/commit/13455ca9ce)] - **tools**: update `create-release-proposal` workflow (Antoine du Hamel) [#56054](https://github.com/nodejs/node/pull/56054)
+* \[[`851a3d7d8d`](https://github.com/nodejs/node/commit/851a3d7d8d)] - **tools**: fix update-undici script (Michaël Zasso) [#56069](https://github.com/nodejs/node/pull/56069)
+* \[[`e1635fbd4e`](https://github.com/nodejs/node/commit/e1635fbd4e)] - **tools**: allow dispatch of `tools.yml` from forks (Antoine du Hamel) [#56008](https://github.com/nodejs/node/pull/56008)
+* \[[`5f15d8b3f5`](https://github.com/nodejs/node/commit/5f15d8b3f5)] - **tools**: fix nghttp3 updater script (Antoine du Hamel) [#56007](https://github.com/nodejs/node/pull/56007)
+* \[[`bbf39b8c46`](https://github.com/nodejs/node/commit/bbf39b8c46)] - **tools**: filter release keys to reduce interactivity (Antoine du Hamel) [#55950](https://github.com/nodejs/node/pull/55950)
+* \[[`954e60b87d`](https://github.com/nodejs/node/commit/954e60b87d)] - **tools**: update WPT updater (Antoine du Hamel) [#56003](https://github.com/nodejs/node/pull/56003)
+* \[[`1e09d258da`](https://github.com/nodejs/node/commit/1e09d258da)] - **tools**: add WPT updater for specific subsystems (Mert Can Altin) [#54460](https://github.com/nodejs/node/pull/54460)
+* \[[`b95c4f5bf0`](https://github.com/nodejs/node/commit/b95c4f5bf0)] - **tools**: use tokenless Codecov uploads (Michaël Zasso) [#55943](https://github.com/nodejs/node/pull/55943)
+* \[[`6327554706`](https://github.com/nodejs/node/commit/6327554706)] - **tools**: add linter for release commit proposals (Antoine du Hamel) [#55923](https://github.com/nodejs/node/pull/55923)
+* \[[`aad478e58d`](https://github.com/nodejs/node/commit/aad478e58d)] - **tools**: fix exclude labels for commit-queue (Richard Lau) [#55809](https://github.com/nodejs/node/pull/55809)
+* \[[`1c8c881aef`](https://github.com/nodejs/node/commit/1c8c881aef)] - **tools**: make commit-queue check blocked label (Marco Ippolito) [#55781](https://github.com/nodejs/node/pull/55781)
+* \[[`c3913f9c87`](https://github.com/nodejs/node/commit/c3913f9c87)] - **tools**: fix c-ares updater script for Node.js 18 (Richard Lau) [#55717](https://github.com/nodejs/node/pull/55717)
+* \[[`adfc2f993a`](https://github.com/nodejs/node/commit/adfc2f993a)] - **tools**: fix root certificate updater (Richard Lau) [#55681](https://github.com/nodejs/node/pull/55681)
+* \[[`d336f8de15`](https://github.com/nodejs/node/commit/d336f8de15)] - **tools**: compact jq output in daily-wpt-fyi.yml action (Filip Skokan) [#55695](https://github.com/nodejs/node/pull/55695)
+* \[[`cdb7839a0c`](https://github.com/nodejs/node/commit/cdb7839a0c)] - **tools**: run daily WPT.fyi report on all supported releases (Filip Skokan) [#55619](https://github.com/nodejs/node/pull/55619)
+* \[[`274d0b4062`](https://github.com/nodejs/node/commit/274d0b4062)] - **tools**: update lint-md-dependencies (Node.js GitHub Bot) [#55470](https://github.com/nodejs/node/pull/55470)
+* \[[`3dceeb8b15`](https://github.com/nodejs/node/commit/3dceeb8b15)] - **tools**: add script to synch c-ares source lists (Richard Lau) [#55445](https://github.com/nodejs/node/pull/55445)
+* \[[`bd0ec907da`](https://github.com/nodejs/node/commit/bd0ec907da)] - **url**: handle "unsafe" characters properly in `pathToFileURL` (Antoine du Hamel) [#54545](https://github.com/nodejs/node/pull/54545)
+* \[[`83137bceb6`](https://github.com/nodejs/node/commit/83137bceb6)] - **util**: fix Latin1 decoding to return string output (Mert Can Altin) [#56222](https://github.com/nodejs/node/pull/56222)
+* \[[`195cc42935`](https://github.com/nodejs/node/commit/195cc42935)] - **util**: do not rely on mutable `Object` and `Function`' `constructor` prop (Antoine du Hamel) [#56188](https://github.com/nodejs/node/pull/56188)
+* \[[`cca7c518de`](https://github.com/nodejs/node/commit/cca7c518de)] - **util**: add fast path for Latin1 decoding (Mert Can Altin) [#55275](https://github.com/nodejs/node/pull/55275)
+* \[[`7ed346d8fd`](https://github.com/nodejs/node/commit/7ed346d8fd)] - **util**: do not catch on circular `@@toStringTag` errors (Aviv Keller) [#55544](https://github.com/nodejs/node/pull/55544)
+* \[[`aa031b3eec`](https://github.com/nodejs/node/commit/aa031b3eec)] - **worker**: fix crash when a worker joins after exit (Stephen Belanger) [#56191](https://github.com/nodejs/node/pull/56191)
+
 <a id="20.18.2"></a>
 
 ## 2025-01-21, Version 20.18.2 'Iron' (LTS), @RafaelGSS
diff --git a/src/node_version.h b/src/node_version.h
index 693d1f057092fa..fdf856b3f475ad 100644
--- a/src/node_version.h
+++ b/src/node_version.h
@@ -29,7 +29,7 @@
 #define NODE_VERSION_IS_LTS 1
 #define NODE_VERSION_LTS_CODENAME "Iron"
 
-#define NODE_VERSION_IS_RELEASE 0
+#define NODE_VERSION_IS_RELEASE 1
 
 #ifndef NODE_STRINGIFY
 #define NODE_STRINGIFY(n) NODE_STRINGIFY_HELPER(n)